The Probabilistic Model Checker Storm

We present the probabilistic model checker Storm. Storm supports the analysis of discrete- and continuous-time variants of both Markov chains and Markov decision processes. Storm has three major distinguishing features. It supports multiple input languages for Markov models, including the JANI and PRISM modeling languages, dynamic fault trees, generalized stochastic Petri nets, and the probabilistic guarded command language. It has a modular set-up in which solvers and symbolic engines can easily be exchanged. Its Python API allows for rapid prototyping by encapsulating Storm's fast and scalable algorithms. This paper reports on the main features of Storm and explains how to effectively use them. A description is provided of the main distinguishing functionalities of Storm. Finally, an empirical evaluation of different configurations of Storm on the QComp 2019 benchmark set is presented.

A model checker takes the formal system model and the formal property as inputs and, somewhat simplifying, returns one of three results, see Figure 2. It reports that the property holds or that it is violated, and these reports are, given a correct implementation, guaranteed to be correct. The third outcome is that the model checker ran out of computational resources. Model checking has written numerous success stories [13,66], and major contributors Edmund M. Clarke, E. Allen Emerson and Joseph Sifakis were awarded the Turing Award in 2007. Probabilistic model checking extends traditional model checking with tools and techniques for the analysis of systems involving random phenomena or other forms of behavior that can be approximated by randomization. Alur, Henzinger and Vardi [4] state: "A promising new direction in formal methods is probabilistic model checking, with associated tools for quantitative evaluation of system performance along with correctness". Distributed algorithms and communication protocols are natural examples, as they often use randomization to efficiently break symmetry. Other examples are cyber-physical systems that tightly integrate software and hardware such as sensors, actuators and micro-controllers. In particular, sensor readings may be noisy, actuators may not always have the same effects, and physical components may fail. Other domains that give rise to models involving probabilistic aspects include, e.g., security protocols and systems biology. All these systems are naturally mapped to Markov models, and probabilistic model checking takes exactly such models as input.
Probabilistic model checking is not new. Initial theoretical results and algorithms for Markov chains [54,55] and Markov decision processes [92,29] were provided about thirty years ago. The use of symbolic data structures led to the first serious tool support [8,61]. Tool realizations for continuous-time Markov chains appeared shortly thereafter [65], and Prism evolved as one of the main probabilistic model checkers (see footnote 1), covering all these models in a symbolic way [74]. In more recent years, tool support has been extended to cover probabilistic real-time and hybrid systems, as well as multi-player games.
Meanwhile, research in probabilistic model checking continued, changed directions, and progressed in new application areas. The combination of changing goals and new results led to the development of a modular and adaptive in-house model checker, called Storm. Storm's main aim is to be a performant, easily extensible platform supplying various probabilistic model checking algorithms. After five years of development, Storm was released as an open-source project in 2017 [33]. Despite its relatively young age, Storm has established the following in pursuit of its original goals:
- In the first edition of QComp [50], Storm compared favorably with other model checkers. Consider the quantile plot in Figure 1. The quantile plot expresses how many benchmark instances (on the x-axis) were each solved in at most the time given on the y-axis. In other words, the point (x, y) is contained in the quantile plot for tool c if the maximal runtime when using c on the x fastest solved instances (for c) is y seconds. Storm solved more instances and was generally faster in solving them. We elaborate on these results in Section 7.4.
- Storm's modularity paid off on various occasions: the tool has been adapted to include various novel variants of the typical value iteration algorithm, and has been extended with parameter synthesis for probabilistic systems and multi-objective model checking. In many of these areas, Storm has helped to push the state of the art considerably. We elaborate on these results in Section 4.
Footnote 1: Resulting in the Haifa Verification Conference 2016 Award.
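The definition of a quantile plot given above can be made concrete with a minimal sketch. The function name, the `time_limit` parameter, and the runtimes below are hypothetical and chosen for illustration only; they are not QComp measurements.

```python
def quantile_points(runtimes, time_limit):
    """Return the points (x, y) of a quantile plot for one tool.

    runtimes maps each benchmark instance to its runtime in seconds,
    or to None if the tool did not solve the instance.
    """
    solved = sorted(t for t in runtimes.values()
                    if t is not None and t <= time_limit)
    # The x fastest solved instances all finish within solved[x - 1] seconds.
    return list(enumerate(solved, start=1))


# Hypothetical runtimes for a made-up tool (illustration only).
example = {"a": 3.0, "b": 0.5, "c": None, "d": 12.0, "e": 1.5}
points = quantile_points(example, time_limit=10.0)
print(points)  # [(1, 0.5), (2, 1.5), (3, 3.0)]
```

A curve that extends further to the right means more solved instances, and a lower curve means faster solving.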
In this paper, we report on Storm's main features and how to use them. We start with a very quick overview introducing Storm before elaborating the supported models and properties. We survey Storm's most prominent building blocks and unique features in greater detail, and discuss the possibilities to interface with these features in Storm. Finally, we report on its internal tool architecture, and provide some empirical evaluation of the main configurations of Storm on the QComp 2019 benchmark set.

Storm in a Nutshell
Research to advance concepts and methods for probabilistic model checking often combines key routines and a variety of essential model checking algorithms. Storm delivers these. Some main characteristic features that help to push the state of the art in probabilistic model checking are that Storm
- contains efficient implementations of well-known and mature model checking algorithms for discrete-time and continuous-time Markov chains and Markov decision processes, but also for the more general Markov automata [40], a model containing probabilistic branching, nondeterminism, and exponentially distributed delays;
- supports explicit-state and symbolic (BDD-based) model checking as well as a mixture of these modes to handle a wider range of models;
- has a modular set-up, enabling the easy exchange of different solvers and distinct decision diagram packages; its current release supports about 15 solvers and two BDD packages;
- extends probabilistic model checking with the possibility of generating (high-level) counterexamples [31], synthesizing permissive schedulers [37], symbolic bisimulation minimization [36,95], as well as game-based abstraction of infinite-state MDPs [94];
- offers the possibility to improve the reliability of model checking by supporting exact rational arithmetic using recent techniques [15], and techniques to avoid premature termination of value iteration [86];
- supports advanced properties such as multi-objective model checking [42,43,85], efficient algorithms for conditional probabilities and rewards [10], and long-run averages on MDPs [3,6] and MAs [22];
- contains the essential building blocks for handling parametric models such as [30,84,90].
Storm can also be used as a black-box tool to investigate the application of model checking in novel domains. In particular,
- Storm supports various native input formats: the PRISM and JANI languages, generalized stochastic Petri nets, dynamic fault trees, and conditioned probabilistic programs. This support makes it easier to apply probabilistic model checking and amounts to more than just providing another parser: state-space reduction and generation techniques as well as analysis algorithms are partly tailored to these modeling formalisms;
- besides a command line interface with many optional arguments, Storm provides a Python API facilitating easy and rapid prototyping of other tools using the engines and algorithms of Storm;
- it provides advanced approaches to model checking (see above) and good performance in terms of verification speed and memory footprint, cf. Figure 1, under one roof.
How does Storm relate to other probabilistic model checkers? Storm has not reinvented the wheel, but has rather been inspired by, and has learned from, the successes of in particular Prism [76] and the explicit model checker Mrmc [71]. Like its main competitors Prism, mcsta [56], and Epmc [52], Storm relies on numerical and symbolic computations. Although many functionalities are covered by Storm, there are some significant areas to which Storm has not been extended. It does not support discrete-event simulation against temporal logic formulas, known as statistical model checking [80,2]. Storm does not support LTL model checking (as supported by Epmc and Prism), does not support probabilistic timed automata (as supported by mcsta and Prism), has no equivalent of Prism's hybrid engine (a crossover between full MTBDD and Storm's hybrid engine), and does not support the analysis of stochastic games. A longer survey of both features and performance of the various model checkers can be found in [50]. A detailed comparison to Storm can be found in [63].

Probabilistic Model Checking with Storm
We give a gentle introduction to probabilistic model checking with Storm, clarifying the different parts as outlined in Figure 2. For surveys and more formal introductions to probabilistic model checking, we refer to [9,69,7].
Footnote 3: Readers familiar with probabilistic model checking may safely skip this section.

Model Types
Storm supports the analysis of several different formalisms. They differ regarding (i) their notion of time and (ii) whether or not nondeterministic choices are allowed. Table 1 shows a categorization of the models supported by Storm along these two dimensions. Discrete-time models abstract from timing behavior by viewing the progression of time in terms of discrete steps. In contrast, continuous-time models use real numbers to model the flow of time and therefore have a dense notion of time. Deterministic models (also referred to as Markov chains from now on) behave purely probabilistically. Dually, in Markov decision processes (MDPs) and Markov automata (MAs), nondeterministic choices can be used to model, for instance, the interaction with an adversarial environment or underspecification of the model with the goal of synthesizing the optimal concrete system. In general, all model types can be enriched with cost structures. Together with the probabilities in the model, this allows for reasoning about, for instance, expected costs until a certain goal is reached. Rather than providing formal definitions, we illustrate a typical use case for each model type.

Discrete-time Markov Chains (DTMCs)
We start with the simplest model. In DTMCs, every state is equipped with a single probability distribution over successor states. The evolution of the system is therefore fully probabilistic in the sense that it is governed only by repeated randomized trials. A famous example that can be captured in terms of a DTMC is the Herman protocol [77]. The general setting is this: a ring consists of identical processes that each start either with a token or without one. If more than one process holds a token, the protocol is in an unstable state. The goal is to reach a configuration in which exactly one token remains, a situation called a stable configuration. This problem cannot be solved by deterministic algorithms, and randomization is crucial. Herman's protocol uses synchronous, unidirectional communication and can be shown to eventually reach a stable configuration with probability 1.
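To illustrate how such a DTMC evolves, the following sketch simulates Herman's protocol on a ring of seven processes. The bit encoding (a process holds a token if and only if its bit equals that of its left neighbour) follows the usual presentation of the protocol and is an assumption here, as the text above does not spell it out; all function names are hypothetical. The sketch simulates single runs and is not a model checking procedure.

```python
import random


def tokens(bits):
    """Process i holds a token iff its bit equals that of its left neighbour."""
    return [bits[i] == bits[i - 1] for i in range(len(bits))]


def herman_step(bits, rng):
    """One synchronous step: token holders draw a fresh random bit,
    all other processes copy the bit of their left neighbour."""
    holds = tokens(bits)
    return [rng.randint(0, 1) if holds[i] else bits[i - 1]
            for i in range(len(bits))]


def run_until_stable(bits, rng, max_steps=100_000):
    """Simulate until exactly one token remains; return (steps, final bits)."""
    for step in range(max_steps + 1):
        if sum(tokens(bits)) == 1:
            return step, bits
        bits = herman_step(bits, rng)
    raise RuntimeError("no stable configuration within max_steps")


rng = random.Random(2024)
# All bits equal: every one of the 7 processes starts with a token.
steps, final = run_until_stable([0] * 7, rng)
print(steps, sum(tokens(final)))
```

For a ring of odd size the number of tokens is always odd, so the protocol can never lose its last token.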

Continuous-time Markov Chains (CTMCs)
CTMCs extend DTMCs with a continuous notion of time. Here, the sojourn time of the system in a state is also determined by a random experiment. More specifically, the time is sampled according to a negative exponential distribution. The transitions between states happen just as for DTMCs, i.e., they are governed by the associated probability distributions. Examples of CTMCs can be found in, for instance, systems biology [23]. In that work, CTMCs are used to analyze the effect of protein concentrations and reaction rates on signal transduction pathways. In other words, the model combines discrete aspects (the molecule concentration) and continuous aspects (time). Here, not only the probabilistic but also the timing effects are important: since both the underlying chemical reactions and the spatial distribution of molecules take time, fundamental questions like "what is the probability that the concentration of X is high after 10 seconds?" require a proper modeling of time.
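The role of exponentially distributed sojourn times can be sketched as follows. The two-state model, its rate of 0.2 per second, and all names are hypothetical and chosen only to mirror the question quoted above; the simulation is a plain Monte Carlo estimate and not one of the numerical methods used by a model checker.

```python
import math
import random


def prob_leave_within(rate, t):
    """P(sojourn time <= t) for an exponentially distributed sojourn time."""
    return 1.0 - math.exp(-rate * t)


def simulate_ctmc(rates, probs, state, horizon, rng):
    """Simulate one run and return the state occupied at time `horizon`.

    rates[s] is the exit rate of state s (0 means absorbing);
    probs[s] is a list of (successor, probability) pairs.
    """
    time = 0.0
    while rates[state] > 0:
        time += rng.expovariate(rates[state])  # random sojourn time
        if time > horizon:
            break
        successors, weights = zip(*probs[state])
        state = rng.choices(successors, weights)[0]
    return state


# Hypothetical model: "low" concentration becomes "high" at rate 0.2 per second.
rates = {"low": 0.2, "high": 0.0}
probs = {"low": [("high", 1.0)], "high": []}
rng = random.Random(7)
runs = 20_000
hits = sum(simulate_ctmc(rates, probs, "low", 10.0, rng) == "high"
           for _ in range(runs))
estimate = hits / runs
exact = prob_leave_within(0.2, 10.0)  # 1 - exp(-2)
print(estimate, exact)
```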

Markov Decision Processes (MDPs)
MDPs extend DTMCs with nondeterministic choices. That is, instead of a single distribution governing the successor states, the system can nondeterministically select between several distributions. After a selection has been made, the successor state is resolved probabilistically, and in the successor state a new selection process is initiated. As already mentioned, nondeterminism can be used to model the possible interaction with an adversarial environment. Important examples of this are distributed protocols. Such protocols are often randomized to efficiently break symmetry. However, because of their distributed nature, the progress of the processes is not synchronized and they may be scheduled differently. A well-known example is the randomized consensus algorithm by Aspnes and Herlihy [78]. In this protocol, the participating processes repeatedly modify a shared global counter based on the outcome of a coin flip until the whole system agrees on one of two outcomes, i.e., consensus has been reached. To faithfully model the protocol, nondeterminism can be used to account for the missing information about the scheduling of the competing accesses to the counter. Probabilistic automata (PAs) [88] extend MDPs with action labels, which allow for slightly more flexible modeling.

Markov Automata (MAs)
Finally, MAs extend PAs using the notion of continuous time that CTMCs use. In probabilistic states no time passes, and the system nondeterministically selects one of the available probability distributions. In Markovian states, an amount of time passes that is distributed in a negative exponential manner, as in CTMCs. A well-known example is the stochastic job scheduling problem [85]. Here, the task is to schedule n jobs with (different) exponential service times onto k processors. The processors are assumed to run a preemptive scheduling strategy: upon completion of any job, all k processors can take over any of the remaining jobs. The corresponding MA uses nondeterministic choices to model the assignment of jobs to processors whenever such a choice can be made. Thus, the nondeterminism is used to underspecify the concrete behavior. Determining the job assignment that maximizes the probability of completion within a given time limit can thus be seen as synthesizing a scheduling policy that one would like to impose in the actual system.

Modeling Languages
Markov models for practical purposes are often too large to denote explicitly, but may be described by various more powerful and concise modeling languages. Depending on the domain, different modeling languages are more or less suitable. Furthermore, the structure of the model is often more apparent from a symbolic description than on the state level. Storm therefore tries to support a variety of different input languages. In order to be compatible with the widespread usage of Prism, the PRISM language is supported. For testing small models, explicit enumeration of states and transitions is supported in two different formats. Furthermore, Storm accepts models given in JANI [20], a modeling language that was devised in a joint effort across multiple tools (involving Epmc, Modest, Fig) in an attempt to unify the cluttered language landscape. Storm supports three other modeling languages. First, the user can input generalized stochastic Petri nets (GSPNs) [81] specified in an extension of the Petri Net Markup Language (PNML), which are then translated to JANI automatically. GSPNs are an important modeling formalism in dependability and performance evaluation. Secondly, dynamic fault trees (DFTs) are a means to specify the fault behavior of systems and are a reliability engineering formalism that is widely used in industry [87]. DFTs can be specified in the Galileo format [91]. Finally, a recent trend in the analysis of probabilistic systems is probabilistic programming [45]. The latter refers to programs written in a probabilistic extension of regular programming languages. One such extension of imperative while programs is pGCL [62], which can additionally be extended with statements expressing conditional reasoning [83], an ingredient that is essential to describe Bayesian networks. Storm can parse and translate programs written in pGCL to JANI, which makes such programs amenable to existing probabilistic model checking techniques.

Properties
Storm offers support for a multitude of properties. The most fundamental properties are reachability properties. Intuitively, they ask for the probability with which a system reaches a certain state. One may, e.g., ask
- "is the probability to reach an unsafe state of the system less than 0.1?"
- "is the probability to reach a target within 20 steps at least 0.9?"
For models involving nondeterministic choices, such an analysis will reason about all possible resolutions of nondeterminism and assert that the desired property holds in all cases. Alternatively, an easy extension is to ask for some resolution of the nondeterminism such that the property holds. Besides asking whether the probability meets some threshold, one may also ask "what is the probability to reach an unsafe state of the system?".
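The difference between "all resolutions" and "some resolution" of nondeterminism can be sketched with a plain value iteration on a small explicit MDP. The model, the threshold 0.6, and all names are hypothetical; the simple stopping criterion used here is the standard one and does not in general guarantee the stated precision, so this is an illustration rather than a sound procedure.

```python
def reach_values(mdp, targets, maximize, tol=1e-10):
    """Value iteration for reachability probabilities in a small explicit MDP.

    mdp[s] is a list of actions; each action is a list of (successor, prob).
    States without actions are treated as absorbing.
    """
    values = {s: 1.0 if s in targets else 0.0 for s in mdp}
    pick = max if maximize else min
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if s in targets or not actions:
                continue
            new = pick(sum(p * values[t] for t, p in a) for a in actions)
            delta = max(delta, abs(new - values[s]))
            values[s] = new
        if delta < tol:
            return values


# Hypothetical MDP: in "s" one may either gamble once or retry repeatedly.
mdp = {
    "s": [[("unsafe", 0.5), ("safe", 0.5)],   # action "gamble"
          [("s", 0.9), ("unsafe", 0.1)]],     # action "retry"
    "unsafe": [],
    "safe": [],
}
p_max = reach_values(mdp, {"unsafe"}, maximize=True)["s"]
p_min = reach_values(mdp, {"unsafe"}, maximize=False)["s"]

# "At most 0.6 under all resolutions" is decided by the maximal probability,
# "at most 0.6 under some resolution" by the minimal one.
print(p_max <= 0.6, p_min <= 0.6)  # False True
```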
As models can be equipped with cost structures, properties allow for retrieving, e.g.,
- "what is the expected cost that is accumulated until reaching a given scenario?"
- "what is the expected cost that is accumulated after t time units?"
- "what is the expected cost incurred at time point t?"
Further properties include conditional probability and cost queries [10,11], long-run average values [3,6,22] (also known as steady-state or mean payoff values), cost-bounded properties [58] (see Section 4.2), and support for multi-objective queries [42,85] (see Section 4.5).

Model Checking Methods
In probabilistic model checking, and arguably in verification in general, there is (sadly) no known "one-size-fits-all" solution. Instead, the best tools and techniques depend heavily on the input model and the properties.
Storm, like other model checkers, implements a variety of approaches that allow a knowledgeable user to pick the appropriate method as part of the input, and allows developers to extend and combine their favorite methods. In particular, we provide approaches based on solving (explicit) linear (in)equation systems, value-iteration variants on explicit or symbolic representations of (parts of) the model, policy iteration methods, and methods using abstraction techniques and bisimulation minimization. We refer to Section 4 for some of Storm's distinguishing features for model checking, and to Section 6 for specifics on the technical realization.

Storm's Features
In this section we detail some of the outstanding features of Storm that set it apart from other probabilistic model checkers. We give an overview in Table 2.
In particular, we have chosen four aspects that improve probabilistic model checking of standard properties such as reachability or expected rewards. These are reflected by the first four rows. Sound/exact model checking reflects a collection of approaches that, compared to the classical numerical algorithms, provide stronger guarantees on the accuracy of the obtained results. Cost-bounded model checking, symbolic bisimulation minimization, and game-based abstraction reduce the size of the analyzed model in various ways to make probabilistic model checking more scalable.
Furthermore, we have selected four extensions that go beyond the classical variants of probabilistic model checking: we discuss how to extract counterexamples using Storm, how to find strategies that satisfy multiple properties simultaneously using multi-objective model checking, how to handle parametric models in which probabilities are not fixed constants but rather unknown symbols, and tailored model checking of dynamic fault trees. We stress that the modular structure of Storm allows the latter approaches to use the previous methods to speed up regular model checking.
Table 2: Overview of distinguishing features of Storm and their applicability based on the model types. Table notes: "except for time-bounded reachability properties"; (·) = not meaningful.

Exact and Sound Model Checking
Several works [97,95,48,15] observed that the numerical methods applied by probabilistic model checkers are prone to numerical errors. There are two main reasons for this. First, the floating point data types used by the tools are inherently imprecise. For example, representing the probability 1/10 using IEEE 754 compliant double precision introduces an error of about 5 · 10^{-18}. In the presence of numerical algorithms, these errors accumulate and may lead to incorrect results. An alternative is to employ rational arithmetic. That is, by representing probabilities (and costs) in the model and also the results as rational numbers, models may be analyzed without introducing any numerical errors. Storm implements these ideas and allows for the exact solution of many properties. However, approaches that are efficient for floating point arithmetic, such as value iteration, become inefficient when using rational numbers, as the representation of the latter grows very large. Storm offers two tailored techniques to solve systems of (in)equations using rational arithmetic. The first is based on policy iteration and Gaussian elimination and the second on a recent technique called rational search [15]. The idea of the latter is to use an (imprecise) approximation of the exact solution and then sharpen this to a precise rational solution using the Kwek-Mehlhorn algorithm [73]. If a straightforward check then returns that the sharpened values constitute an actual solution, the technique can return it. Otherwise, the precision of the imprecise underlying solver is increased and the loop is restarted.
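The representation error mentioned above, and the way rational arithmetic avoids it, can be reproduced in a few lines. This is a minimal illustration using Python's standard `fractions` module; it does not reflect how Storm represents rational numbers internally.

```python
from fractions import Fraction

# Exact value stored by an IEEE 754 double for the literal 0.1.
stored = Fraction(0.1)
error = stored - Fraction(1, 10)
print(float(error))  # roughly 5.55e-18

# Errors accumulate under repeated floating point operations ...
float_sum = sum([0.1] * 10)
# ... whereas rational arithmetic stays exact.
exact_sum = sum([Fraction(1, 10)] * 10)
print(float_sum == 1.0, exact_sum == 1)  # False True
```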
Secondly, the numerical algorithms themselves are sometimes, strictly speaking, unsound. For example, standard value iteration for computing reachability probabilities approximates the solution in the limit, but the termination criterion implemented by most tools does not guarantee that the obtained result differs by at most the given precision ε from the actual solution. One way to combat these problems is to approach the solution from both directions, a technique referred to as interval iteration [19,48,12]. Storm implements the latter and additionally the more recent sound value iteration [86]. This method ensures a correct result within a user-defined accuracy and comes with a small time penalty, as shown in Section 7.
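The idea of approaching the solution from both directions can be sketched for a small explicit DTMC as follows. This is a simplified illustration under the assumption that all states that cannot reach a target are known in advance (`zero_states`); the function name and the example chain are hypothetical, and the sketch is not Storm's implementation.

```python
def interval_iteration(chain, targets, zero_states, eps=1e-6):
    """Bound reachability probabilities in a DTMC from below and above.

    chain[s] is a list of (successor, probability) pairs. zero_states must
    contain every state that cannot reach a target; with those fixed to 0,
    the upper sequence converges to the same limit as the lower one.
    """
    fixed = set(targets) | set(zero_states)
    lower = {s: 1.0 if s in targets else 0.0 for s in chain}
    upper = {s: 0.0 if s in zero_states else 1.0 for s in chain}
    while max(upper[s] - lower[s] for s in chain) > eps:
        lower = {s: lower[s] if s in fixed else
                 sum(p * lower[t] for t, p in chain[s]) for s in chain}
        upper = {s: upper[s] if s in fixed else
                 sum(p * upper[t] for t, p in chain[s]) for s in chain}
    return lower, upper


# Hypothetical symmetric random walk on 0..4; state 4 is the target and
# state 0 an absorbing state from which the target is unreachable.
chain = {
    0: [(0, 1.0)],
    1: [(0, 0.5), (2, 0.5)],
    2: [(1, 0.5), (3, 0.5)],
    3: [(2, 0.5), (4, 0.5)],
    4: [(4, 1.0)],
}
lower, upper = interval_iteration(chain, targets={4}, zero_states={0})
print(lower[2], upper[2])  # both within 1e-6 of the exact value 0.5
```

On termination, the true value of every state lies between its lower and upper bound, so any value in that interval is within ε of the solution.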

Cost-bounded Reachability
A typical application for Markov models is to analyze the probability to, e.g., reach a goal state before some resource like time or energy is depleted. Another typical application is to analyze the expected time before a number of tasks have been fulfilled. Both instances can be generalized to cost-bounded reachability. In cost-bounded reachability one is interested in the behavior of the system that does not violate the bounds on the resources. The classical approach to analyze cost-bounded reachability is to model this behavior in the model description by keeping track of the resources explicitly and then rely on standard reachability queries [5]. That is, the states of the model keep track of the consumed resources, and the reachability query asks, e.g., what the probability is that one of the target states is reached in which the resource bounds are not violated. The downside is that the model grows with these bounds.
Storm alternatively allows modeling the (non-negative) costs of actions or states in the modeling language. These costs are attached to the model, and one may then analyze cost-bounded reachability with the adequate query. The clear advantage of this approach is that the resources are not encoded in the state space, which keeps the model much smaller. Rather, Storm performs a series of model-checking calls on the much smaller model [58,59], generalizing ideas from [49,72] to multiple cost dimensions. The reduced memory footprint makes it possible to handle much larger models, and often the reduced memory consumption also yields faster verification times.
Cost-bounded reachability is closely related to quantile properties [72,59], where one fixes a desired reachability probability and asks how many resources have to be invested in order to achieve this probability.
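The idea of keeping the resource out of the state space can be sketched for the simplest case: a DTMC with a single cost dimension and strictly positive integer costs. Under these assumptions, the values for budget b depend only on values for smaller budgets, so one small computation per budget level suffices. The function names and the example are hypothetical, and the sketch is far simpler than the multi-dimensional method of [58,59].

```python
def cost_bounded_reach(chain, cost, targets, budget):
    """P(reach a target with accumulated cost <= b) for b = 0..budget.

    chain[s]: list of (successor, probability); cost[s]: positive integer
    cost paid when leaving s. Returns a list of dicts, one per budget level.
    """
    levels = []
    for b in range(budget + 1):
        level = {}
        for s in chain:
            if s in targets:
                level[s] = 1.0
            elif cost[s] > b:
                level[s] = 0.0  # leaving s would exceed the budget
            else:
                prev = levels[b - cost[s]]
                level[s] = sum(p * prev[t] for t, p in chain[s])
        levels.append(level)
    return levels


def quantile(levels, state, threshold):
    """Smallest budget whose probability meets the threshold, or None."""
    for b, level in enumerate(levels):
        if level[state] >= threshold:
            return b
    return None


# Hypothetical chain: each attempt costs 1 and succeeds with probability 0.5.
chain = {"try": [("done", 0.5), ("try", 0.5)], "done": [("done", 1.0)]}
cost = {"try": 1, "done": 1}
levels = cost_bounded_reach(chain, cost, {"done"}, budget=5)
print(levels[3]["try"])              # 0.875
print(quantile(levels, "try", 0.8))  # 3
```

The `quantile` helper answers the related quantile question: it fixes a probability threshold and reports the smallest budget that achieves it.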

Symbolic Bisimulation Minimization
A typical approach to alleviate the state space explosion is to represent the state space symbolically. In the probabilistic setting, employing variants of decision diagrams (DDs) such as multi-terminal binary DDs (MTBDDs) or multi-valued DDs (MDDs) is the most widely used approach to deal with large state spaces [8]. DDs are graph-based data structures that can exploit structure and symmetry in the underlying model to represent gigantic models very compactly.
A different angle to approach the problem is abstraction. Here, the idea is to remove details from the model that are unnecessary for the desired analysis. A well-studied technique is bisimulation minimization. Its core idea is that states with equivalent behavior (in some suitable sense) can be merged to obtain a quotient model that preserves the properties of the original input. Then, the (potentially much smaller) quotient can be analyzed instead. Bisimulation minimization was shown to yield substantial reductions in the case that models are represented explicitly (for instance in terms of a probability matrix) [70].
Storm allows combining a symbolic representation with bisimulation minimization, thereby extending previous work [95,36]. We extended the approach to deal with nondeterministic models, which makes it available for all four model types supported by Storm (see Section 3.1). This combination leads to significant reductions in memory and time consumption for a variety of models, and enables the analysis of models that are otherwise out of reach [63]. The resulting quotient model is often small enough to be represented explicitly, which enables a wide range of efficient analysis methods.
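The core idea of bisimulation minimization, merging states with equivalent behavior, can be sketched on an explicit DTMC using signature-based partition refinement. Note that this sketch works on an explicit state space and uses exact fractions, so it illustrates the notion of a quotient rather than the symbolic (DD-based) algorithm implemented in Storm; all names are hypothetical.

```python
from fractions import Fraction


def bisimulation_quotient(chain, label):
    """Partition the states of an explicit DTMC into bisimulation classes.

    chain[s] is a list of (successor, probability) pairs with exact
    probabilities; label[s] is the labelling that must be preserved.
    Returns a dict mapping each state to the index of its block.
    """
    names = {}
    block = {s: names.setdefault(label[s], len(names)) for s in chain}
    while True:
        signature = {}
        for s in chain:
            mass = {}  # probability mass from s into each current block
            for t, p in chain[s]:
                mass[block[t]] = mass.get(block[t], 0) + p
            signature[s] = (block[s], frozenset(mass.items()))
        names = {}
        refined = {s: names.setdefault(signature[s], len(names))
                   for s in chain}
        # Signatures include the old block, so the partition only gets finer;
        # an unchanged block count means it is stable.
        if len(set(refined.values())) == len(set(block.values())):
            return refined
        block = refined


half = Fraction(1, 2)
chain = {
    "s0": [("a", half), ("b", half)],
    "a": [("goal", Fraction(1))],
    "b": [("goal", Fraction(1))],
    "goal": [("goal", Fraction(1))],
}
label = {"s0": "", "a": "", "b": "", "goal": "goal"}
blocks = bisimulation_quotient(chain, label)
print(blocks["a"] == blocks["b"], len(set(blocks.values())))  # True 3
```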

Game-Based Abstraction-Refinement
Even though bisimulation minimization effectively helps to reduce the model, it has two major drawbacks. First, it is not guided by the concrete analysis that is to be performed. The quotient model may be much too fine for the analysis of a given property as it preserves a whole class of properties. Secondly, with few exceptions [34], the algorithms to compute the bisimulation quotient require the entire state space and transitions to be available. If the model is very large or even infinite, the algorithms fail to produce a quotient even if the quotient is very small.
Game-based abstraction [75] addresses these two challenges. It is based on two fundamental ideas. The first is that states are merged much more aggressively than in bisimulation minimization. That is, they may be collapsed even if they have distinguishable behavior. The behavior of the original model is over-approximated by the abstraction and the latter can therefore be used to obtain sound bounds for the measures on the former. Note that the abstraction contains two sources of nondeterminism: the one present in the original model and the nondeterminism that is introduced by the abstraction process. Merging these sources of nondeterminism results in very loose and unsatisfactory bounds on the target values. The second idea therefore is to keep the two kinds of nondeterminism apart. This gives rise to a stochastic game [27] whose solution gives lower and upper bounds on both minimal and maximal probabilities in the original model.
Storm implements a game-based abstraction-refinement loop based on the ideas in [94]. The loop is illustrated in Figure 3. As a first step, the abstract game is derived from the model and the current partitioning of the states, which is initially induced by the given property. If the bounds obtained by the analysis of the game are precise enough, they can be returned. Otherwise, the abstraction is refined by splitting the partition in a suitable way and the process is repeated. To enable the analysis of gigantic or even infinite models, the abstraction is extracted directly from the high-level model description (given in terms of a PRISM or JANI model). This extraction is achieved by formulating it as a (series of) satisfiability problem(s), which are dispatched to an off-the-shelf solver. While this has the aforementioned advantages, it is often the computationally most expensive part of the overall procedure.

Multi-Objective Model Checking
Initially, the focus in many probabilistic model checkers was mostly on computing the probability that a certain event happens. However, probabilistic model checking can provide meaningful data beyond the probability to reach some state, such as the optimal strategies for MDPs, i.e., functions that describe how to resolve the nondeterminism in an MDP such that the induced behavior satisfies a given property.
However, if a strategy should satisfy multiple properties, standard model checking techniques do not suffice. Consider two properties limiting time and energy usage. Standard techniques would independently compute two strategies, one optimizing time, the other optimizing energy consumption. Each strategy might waste the other resource, thus violating the limits described in the combined property. Multi-objective model checking [41,42] helps in finding strategies that satisfy multiple properties at once, and can be used to clarify the trade-offs between various properties.
Essentially, state-of-the-art multi-objective model checking boils down to a series of preprocessing steps on the model, and then either solving a linear program [42] or iteratively applying standard model checking techniques [43]. Storm supports multi-objective model checking on MDPs, and in addition on MAs [85]. Furthermore, it allows for a more flexible combination of various properties, including properties with (multiple) cost bounds [58,59], and incorporates some particularly efficient preprocessing steps.

Synthesis of High-Level Counterexamples
Besides the computation of a single strategy, the synthesis of counterexamples and/or of sets of strategies that all satisfy or violate a given property has gained some traction. Here, we discuss counterexamples, but similar ideas have been used for so-called permissive strategies [37] as implemented in Storm using [68]. Suppose that a system reaches a bad state with a probability above some threshold. To locate the reason for this behavior, it is helpful to obtain the part of the system that leads to this behavior, by means of a counterexample. Counterexamples try to capture the essence of the failed verification attempt and help the user of the model checker (be it a human or another algorithm) to revise the system or its model accordingly. In the non-probabilistic setting, a counterexample may be represented as one offending run of the system. However, such a representation is not necessarily possible in the probabilistic setting, as there may be infinitely many paths that contribute to the overall probability mass reaching the bad state [53]. A single run ending in a bad state is therefore typically insufficient as a counterexample. While it is possible to consider sets of paths for probabilistic safety properties, the resulting counterexamples are large and hard to comprehend. Alternatively, counterexamples can be computed as sub-Markov models [1,25].
Rather than considering counterexamples at the state-space level, Storm computes counterexamples in terms of the high-level model specification using the ideas of [96]. More concretely, given a JANI (or PRISM) model that violates a safety property, Storm computes the smallest portion of the JANI code that already witnesses the violation, based on the method proposed in [31]. It does so by a guided exploration of all candidate sub-models. Ultimately, the smallest sub-model highlights the core of the problem, and it does so at the abstraction level of the user. High-level counterexamples are thus valuable as diagnostic feedback to (human) tool users. Recent work has illustrated that these counterexamples can be effectively used in a counterexample-guided inductive synthesis approach for finite Markov chains [24].

Parametric Model Checking
Naturally, the model checking result of Markov models crucially depends on the transition probabilities. Often, these probabilities are approximations based on data or reflect configurable parts of a modeled system. To represent the uncertainty about the probabilities, parametric Markov models were first considered in [30,79]. In parametric Markov models, the probabilities are symbolic expressions rather than concrete values. For any valuation of the parameters, replacing the parameters in a parametric Markov model yields an instantiated parameter-free Markov model.
There are many interesting questions that one can ask revolving around parametric systems. The simplest is feasibility, i.e., whether there exists a valuation such that the instantiated Markov model satisfies a property. More advanced is parameter space partitioning, where the goal is to decompose the parameter space into regions in which a predefined property is either satisfied or violated. Such a decomposition indicates for most parameter valuations whether they lead to a system that satisfies the given property. An alternative question is to find the solution function, i.e., a function in closed form that gives the model checking result of the instantiated Markov model in terms of the parameter values.
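To make instantiation, feasibility and the solution function concrete, the following sketch uses a hypothetical four-state parametric Markov chain with parameters p and q. The chain, the function names and the grid search are illustrative assumptions and not part of Storm. The closed form follows from the equations x0 = p + (1-p) x1 and x1 = q x0.

```python
from fractions import Fraction

def instantiate(p, q):
    """Transition matrix of the instantiated chain (hypothetical example).

    States: 0 = start, 1 = retry, 2 = goal, 3 = fail.
    """
    return [
        [0, 1 - p, p, 0],   # start: reach goal with p, otherwise retry
        [q, 0, 0, 1 - q],   # retry: back to start with q, otherwise fail
        [0, 0, 1, 0],       # goal is absorbing
        [0, 0, 0, 1],       # fail is absorbing
    ]

def solution_function(p, q):
    """Closed-form probability of eventually reaching goal from start."""
    return p / (1 - (1 - p) * q)

def find_feasible(threshold, steps=10):
    """Naive feasibility check: scan a grid of valuations (assumption:
    a coarse grid is enough for illustration, it is not a complete method)."""
    grid = [Fraction(i, steps) for i in range(1, steps)]
    for p in grid:
        for q in grid:
            if solution_function(p, q) >= threshold:
                return p, q
    return None

if __name__ == "__main__":
    print(find_feasible(Fraction(9, 10)))
```

Exact rationals keep the comparison with the threshold free of rounding effects. A grid scan can miss feasible valuations between grid points, which is one reason why dedicated synthesis methods are of interest.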
Storm supports the construction and analysis of parametric Markov models. Besides handling models and supporting efficient instantiation of parametric models, Storm provides three methods to perform parameter synthesis. The first is based on computing the aforementioned solution function through state elimination [30,51] that can also be seen as Gaussian elimination. This basic algorithm is improved by heuristics that order the operations, and a representation of the rational functions that allows for faster operations [32]. The second method, referred to as parameter lifting, avoids computing a potentially large rational function and determines validity of a formula over a region of parameter valuations through a sound abstraction into a nonparametric system [84]. The third method [90] aims to analyze whether the solution function is monotonic in some parameter without actually computing the solution function, as the latter can be exponential in the number of parameters. These and further methods are all used by the parameter synthesis tool PROPhESY [67] which provides a playground for parameter synthesis approaches using Storm as a back-end.
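The state elimination idea behind the first method can be sketched on a concrete chain. The code below is a minimal illustration with exact rationals, not Storm's implementation: Storm performs the same kind of elimination on rational functions and adds ordering heuristics, whereas the dictionary-based representation and the function names here are assumptions.

```python
from fractions import Fraction

def eliminate_state(P, s):
    """Remove state s and redirect every path u -> s -> t to an edge u -> t."""
    out = P.pop(s)
    loop = out.pop(s, Fraction(0))
    for row in P.values():
        if s not in row:
            continue
        into_s = row.pop(s)
        if loop == 1:
            continue  # s is a trap: probability mass entering s is lost
        for t, prob in out.items():
            # 1 / (1 - loop) accounts for taking the self-loop any number of times
            row[t] = row.get(t, Fraction(0)) + into_s * prob / (1 - loop)

def reachability(P, init, target):
    """Probability of eventually reaching target from init."""
    if init == target:
        return Fraction(1)
    P = {s: dict(row) for s, row in P.items()}  # work on a copy
    for s in [s for s in P if s not in (init, target)]:
        eliminate_state(P, s)
    loop = P[init].get(init, Fraction(0))
    if loop == 1:
        return Fraction(0)
    return P[init].get(target, Fraction(0)) / (1 - loop)

# Hypothetical chain: from state 0, reach "goal" or move to state 1 (1/2 each);
# state 1 returns to 0 with 9/10 and fails with 1/10.
CHAIN = {
    0: {1: Fraction(1, 2), "goal": Fraction(1, 2)},
    1: {0: Fraction(9, 10), "fail": Fraction(1, 10)},
    "goal": {"goal": Fraction(1)},
    "fail": {"fail": Fraction(1)},
}

if __name__ == "__main__":
    print(reachability(CHAIN, 0, "goal"))
```

The order in which states are eliminated does not change the result, but it can strongly affect the size of intermediate expressions in the parametric case.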

Model Checking Dynamic Fault Trees
Fault trees [87] are widely used in reliability engineering and model how component failures lead to failures of the complete system. Dynamic fault trees (DFTs) [38] extend (static) fault trees by dynamic gates. DFTs more faithfully model systems by allowing order-dependent failures, functional dependencies and spare management.
Dynamic fault trees may be translated into corresponding Markov models [38,17] whose analysis yields common measures on dynamic fault trees, such as reliability and mean-time-to-failure. The analysis of the corresponding Markov models also allows more complex measures, e. g., dealing with degraded modes [44]. The essential step here is that Storm supports all these queries out-of-the-box. Due to the modular architecture of Storm, features such as parametric DFTs are supported off-the-shelf without dedicated implementation.
To drastically improve the analysis of DFTs, Storm contains a dedicated translation of such models into Markov models [93]. To make the state-space generation as fast as possible, Storm utilizes the structure of the DFT, and constructs a Markov model that contains only the relevant behavior of the DFT. Symmetries in the fault trees are exploited to further collapse the model, which is then subject to regular model checking with Storm. As the state-space explosion might still be present during translation, Storm also supports a partial state-space generation for DFTs [93]. This partial state space yields a sound abstraction, which may be model checked to obtain safe lower and upper bounds. The state space can be iteratively extended to obtain the desired precision of the analysis result.
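As a small illustration of how a fault tree induces a Markov model, the sketch below computes the mean-time-to-failure of a static AND gate over two components with exponentially distributed failure times. The component names and rates are hypothetical, and the code does not cover dynamic gates, which would require order information in the state; it is not Storm's translation.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical failure rates (failures per hour) of two basic events.
RATES = {"A": Fraction(1, 100), "B": Fraction(1, 200)}

def system_failed(failed):
    """Top-level AND gate: the system fails once all components have failed."""
    return all(component in failed for component in RATES)

@lru_cache(maxsize=None)
def mttf(failed=frozenset()):
    """Expected time until system failure in the induced CTMC.

    A CTMC state is the set of components that have already failed.
    """
    if system_failed(failed):
        return Fraction(0)
    working = [c for c in RATES if c not in failed]
    exit_rate = sum(RATES[c] for c in working)
    expected = 1 / exit_rate  # expected sojourn time in the current state
    for c in working:
        # next failure is component c with probability rate / exit_rate
        expected += RATES[c] / exit_rate * mttf(failed | {c})
    return expected

if __name__ == "__main__":
    print(mttf())
```

For two components the result agrees with the textbook formula 1/a + 1/b - 1/(a+b) for rates a and b.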

Using Storm
Storm is available as free and open-source software. Below, we give an overview of how to use Storm. A detailed and up-to-date guide may be found on Storm's website: http://stormchecker.org
Before you start. Storm has to be configured and compiled on the target machine. This procedure automatically looks up various dependencies, and (optionally) adds them if they are not found on the system. While this configuration and compilation procedure offers some advantages, see Section 6.5, it is often cumbersome. Therefore, we recommend that users who only want to experiment rely on the Docker containers 4 containing Storm with all the key dependencies, and all interfaces and extensions. One may start right away, at the cost of slightly reduced performance.

Command Line Interface
The key way to interact with Storm is through its command line interface. The command line interface allows the user to specify the input model and properties and, after the analysis, reports the requested results. The command storm --prism brp.pm --prop brp.props invokes Storm with a PRISM description in brp.pm, and the properties listed in a file brp.props. Storm will build the model and perform model checking on each property. For advanced users, the methods used for model checking can be flexibly yet simply set, e.g., storm ... --engine hybrid --eqsolver elimination sets the engine to hybrid (see Section 6.3) and sets the linear equation solver to state elimination, see Section 6.4. Experts may exploit the possibility to configure even details of the various procedures, e.g., the order in which state elimination is applied.
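When Storm is driven from scripts, the invocations above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of Storm. It uses only the flags quoted in the text (--prism, --prop, --engine, --eqsolver) and assumes that the storm binary is on the PATH.

```python
import shutil
import subprocess

def storm_command(model, props, engine=None, eqsolver=None):
    """Build the argument list for a Storm invocation on a PRISM model."""
    cmd = ["storm", "--prism", model, "--prop", props]
    if engine is not None:
        cmd += ["--engine", engine]
    if eqsolver is not None:
        cmd += ["--eqsolver", eqsolver]
    return cmd

def run_storm(cmd):
    """Run Storm and return its standard output, or None if Storm is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return completed.stdout

if __name__ == "__main__":
    command = storm_command("brp.pm", "brp.props", engine="hybrid", eqsolver="elimination")
    print(" ".join(command))
    print(run_storm(command))
```

Passing the arguments as a list avoids shell quoting issues with file names.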

C++ Extensions
To be able to flexibly use the internal data structures of Storm, one may build one's own tool using Storm as a library. This approach is also taken by the Storm command-line interface, as well as other extensions shipped and tightly bound to Storm, such as the analysis of DFTs outlined in Section 4.8. This approach is the most flexible and powerful way of using Storm, but also requires the most effort. We illustrate model checking DTMCs with the sparse engine in Figure 4. The code parses a string and a property, builds a DTMC corresponding to the model, and applies model checking on the property to compute the corresponding probability for all states. The output is then created based on the model checking result of (some) initial state. We provide a minimal working example to build your own C++ tool based on Storm as a template repository 5 .

Python Interface
A much quicker way to flexibly interact with (a selection of) Storm's internal data structures is the Python API called Stormpy 6 . We exemplify the ease of use in Figure 5. The code is equivalent to Figure 4. Using Python may induce some runtime penalty, but it enables flexible access to the main functionality of Storm. We stress that the code is powerful enough to also drive larger projects, e.g., the parameter synthesis tool PROPhESY [32] relies on Stormpy. We provide a minimal working example to build your own Python tool based on Stormpy as a template repository 7 .
5 http://stormchecker.org/api/starter-project
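The workflow described above (parse a model and a property, build the model, run model checking, read the result for an initial state) might look as follows in Stormpy. This is a sketch and not the code of Figure 5: the tiny PRISM model and the property are hypothetical, and the Stormpy calls follow its getting-started documentation as we understand it, so names may differ between versions.

```python
import os
import tempfile

# Hypothetical DTMC: a fair coin flip leading to one of two absorbing states.
PRISM_MODEL = """dtmc

module coin
  s : [0..2] init 0;
  [] s=0 -> 0.5 : (s'=1) + 0.5 : (s'=2);
  [] s=1 -> 1 : (s'=1);
  [] s=2 -> 1 : (s'=2);
endmodule
"""
FORMULA = "P=? [F s=1]"

def check_with_stormpy(model_text=PRISM_MODEL, formula=FORMULA):
    """Model check the formula and return the result for an initial state."""
    import stormpy  # imported lazily so this file loads without Stormpy installed

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "coin.pm")
        with open(path, "w") as handle:
            handle.write(model_text)
        program = stormpy.parse_prism_program(path)
        properties = stormpy.parse_properties(formula, program)
        model = stormpy.build_model(program, properties)
        result = stormpy.model_checking(model, properties[0])
        return result.at(model.initial_states[0])

if __name__ == "__main__":
    print(check_with_stormpy())
```

Writing the model to a temporary file keeps the example self-contained. In a real project the model would typically live in its own file.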

Architecture
In this section, we report on some internal aspects of Storm. In particular, we aim to address how we realized performance and modularity. Naturally, we cannot go into the details of the various algorithms. Rather, we discuss some design choices that will help a user to feel more familiar with the code base.

Logical Structure
The root directory of Storm contains, among others, sources and resources. The latter contains the logic for the configuration routines as well as various third-party dependencies. The sources are divided into various libraries and executables. The core functionality is found in the storm library. Inside that library, one finds data structures for the representation of matrices, models, expressions, modeling languages, as well as the model checking engines and solvers, which are discussed below. Besides this library, there are libraries for parsing, handling parametric models, and handling various modeling formalisms such as GSPNs and DFTs. All libraries depend on the core storm library. Moreover, most libraries are accompanied by executables that provide adequate command line interfaces.

Models
Storm features two different in-memory representations of Markov models. First, it can use sparse matrices, an explicit representation form that uses memory roughly proportional to the number of transitions with nonzero probability. Sparse matrices are suited for small and moderately-sized models and allow for fast operations also on models with irregular structure. Secondly, Storm can store models symbolically using MTBDDs, cf. Section 4.3. The MTBDDs are built from the model description directly. While it is possible to go from MTBDDs to the explicit representation, the other direction is not (efficiently) possible. While MTBDDs often store a model compactly, typical operations for the analysis of models yield a growth in the MTBDDs and are therefore often slow. All models can be built representing the reachability probability with floating-point arithmetic, exact rational numbers, or rational functions.
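The memory argument for sparse matrices can be illustrated with a compressed row layout, in which only nonzero entries are stored. The sketch below is a generic compressed-sparse-row encoding written for illustration. Its layout and names are assumptions and do not describe Storm's actual C++ data structure.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) to a compressed-sparse-row triple."""
    values, columns, row_start = [], [], [0]
    for row in dense:
        for j, entry in enumerate(row):
            if entry != 0:
                values.append(entry)
                columns.append(j)
        row_start.append(len(values))  # index where the next row begins
    return values, columns, row_start

def multiply(csr, vector):
    """Matrix-vector product that touches only the stored nonzero entries."""
    values, columns, row_start = csr
    return [
        sum(values[k] * vector[columns[k]] for k in range(row_start[i], row_start[i + 1]))
        for i in range(len(row_start) - 1)
    ]

if __name__ == "__main__":
    # Hypothetical 3-state DTMC: state 0 moves to state 1 or 2 with probability 0.5 each.
    dense = [[0, 0.5, 0.5], [0, 1, 0], [0, 0, 1]]
    csr = to_csr(dense)
    print(len(csr[0]), "nonzero entries stored")
    print(multiply(csr, [0, 1, 0]))
```

Multiplying the transition matrix with the indicator vector of a target state is the basic step of iterative reachability analysis, which is why a fast product on the nonzero entries matters.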

Model Checking Engines
Storm's engines are built around the two model representations. The sparse engine exclusively uses the sparse matrix-based representation. It first constructs the matrix representation of the state space by exploring the reachable state space specified in the modeling language, and then analyses the model using one of the many (standard, numerical) approaches, which are encapsulated as solvers (see below). While the exploration engine also uses sparse matrices, it uses ideas from reinforcement learning to avoid exploring all reachable states [19]. Instead, it proceeds in an "on-the-fly" manner and explores those parts of the system that appear to be most relevant to the verification task.
The next two engines use MTBDDs as their primary form of representation. Except for the concrete in-memory representation, the dd engine is the counterpart to the sparse engine in the sense that model building and verification is done on the very same representation and no translation takes place. Storm's hybrid engine tries to avoid the costly numerical operations on MTBDDs by transforming only the relevant parts of the system into a sparse matrix representation 8 .
Finally, the abstraction-refinement engine implements the technique described in Section 4.4 and is able to compute bounds for both minimal and maximal reachability probabilities for (infinite) MDPs.
Support for queries and model descriptions. The sparse engine supports all model checking queries present in Storm and all DTMCs, CTMCs, MDPs and MAs described in PRISM or JANI. The engine can be paired with sound or exact model checking as in Section 4.1. However, exact arithmetic does not support time-bounded properties in CTMCs and MAs as these involve exponentials. Many advanced features such as cost-bounded reachability and multi-objective model checking are only implemented in the sparse engine. The support within other engines is more limited. A precise list is intricate to provide, but the following restrictions cover the typical benchmarks. The dd engine does not support continuous time models (considered too slow) and the hybrid engine has no support for MAs. The exploration engine and the abstraction-refinement engine are both limited to reachability queries on discrete-time models. Moreover, some advanced features of the JANI language (indexed assignments, non-trivial system compositions) currently cannot be translated into DDs.

Solvers
Probably the most outstanding trait of Storm's architecture is the concept of solvers. Ultimately, many tasks related to (probabilistic) verification revolve around solving subproblems. For example, computing reachability probabilities or expected costs in a DTMC reduces to solving a system of linear equations. Similarly, for an MDP a system of equations needs to be solved, with the difference that the equations are Bellman equations involving minima and maxima. However, these are by no means the only kinds of problems appearing in probabilistic verification. Figure 6 illustrates some functionalities of Storm which depend on one or more solvers. For example, (explicit) model building employs SMT solving. As the initial states of symbolic models (e. g. PRISM or JANI) are given by the satisfying assignments of an expression, Storm uses SMT solvers to enumerate the possible initial states. Similarly, the extraction of the abstract model from the symbolic model (as presented in Section 4.4) in the abstraction-refinement engine crucially depends on enumerating satisfying assignments and therefore on SMT solvers. As yet another example, consider the synthesis of high-level counterexamples as in Section 4.6. Here, one of the offered techniques relies on the solution of a MILP while the other uses SMT solvers.
Two of the main goals in the development of Storm were the ability to exchange central building blocks (like solvers) and to benefit from (re)using high-performance implementations provided by other libraries. It therefore offers abstract interfaces for the solver types mentioned above that are oblivious to the underlying implementation. Offering these interfaces has several key advantages. First, it provides easy and coherent access to the tasks commonly involved in probabilistic model checking. Secondly, it enables the use of dedicated state-of-the-art high-performance libraries for the task at hand. More specifically, as the performance characteristics of different backend solvers can vary drastically for the same input, this permits choosing the best solver for a given task. Licensing problems are avoided, because implementations can be easily enabled and disabled, depending on whether or not the particular license fits the requirements. Finally, implementing new solver functionality is easy and can be done without detailed knowledge of the global code base. This flexibility makes it possible to keep Storm up to date with new state-of-the-art solvers.
For each of the solver interfaces, several actual implementations exist. For example, Storm currently has four implementations (each of them with a range of further options) of the linear equation solver interface for problems given as sparse matrices: one is based on Gmm++, one on Eigen [46], one uses Storm's native data structures and numerical algorithms, and another one is based on Gaussian elimination [30]. Table 3 gives an overview of the currently available implementations. Here, all solvers that are purely implemented in terms of Storm's data structures and do not use libraries are marked with an asterisk to indicate that they are "built-in".
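The Bellman equations for an MDP can be solved approximately by value iteration. The following sketch computes minimal or maximal reachability probabilities on a hypothetical MDP given as nested dictionaries. The representation and the simple stopping criterion are assumptions made for illustration. In particular, stopping when successive values are close does not by itself give a precision guarantee.

```python
def value_iteration(mdp, target, maximize=True, epsilon=1e-8):
    """Approximate min/max probabilities of eventually reaching a target state.

    mdp maps each state to {action: {successor: probability}}.
    """
    values = {s: (1.0 if s in target else 0.0) for s in mdp}
    pick = max if maximize else min
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if s in target:
                continue
            # Bellman update: optimize over actions the expected successor value
            new = pick(
                sum(p * values[t] for t, p in dist.items())
                for dist in actions.values()
            )
            delta = max(delta, abs(new - values[s]))
            values[s] = new
        if delta < epsilon:
            return values

# Hypothetical MDP: in s0, action "a" gambles directly, action "b" defers to s1.
MDP = {
    "s0": {"a": {"goal": 0.5, "fail": 0.5}, "b": {"s1": 1.0}},
    "s1": {"c": {"goal": 0.7, "fail": 0.3}},
    "goal": {"stay": {"goal": 1.0}},
    "fail": {"stay": {"fail": 1.0}},
}

if __name__ == "__main__":
    print(value_iteration(MDP, {"goal"}, maximize=True)["s0"])
    print(value_iteration(MDP, {"goal"}, maximize=False)["s0"])
```

Starting from zero for all non-target states, the iteration approaches the least fixed point of the Bellman equations from below, which is the desired value for reachability probabilities.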
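The benefit of an implementation-oblivious solver interface can be sketched in a few lines. The classes below are hypothetical and only mirror the idea: Storm's actual interfaces are C++ classes, and the two backends shown here (a Jacobi-style iteration and exact Gaussian elimination) merely stand in for the implementations listed in the text.

```python
from abc import ABC, abstractmethod
from fractions import Fraction

class LinearEquationSolver(ABC):
    """Solves x = A x + b (hypothetical interface, not Storm's API)."""

    @abstractmethod
    def solve(self, A, b):
        ...

class JacobiSolver(LinearEquationSolver):
    """Numerical backend: repeatedly applies x <- A x + b."""

    def __init__(self, iterations=1000):
        self.iterations = iterations

    def solve(self, A, b):
        n = len(b)
        x = [0.0] * n
        for _ in range(self.iterations):
            x = [sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
        return x

class EliminationSolver(LinearEquationSolver):
    """Exact backend: Gaussian elimination on (I - A) x = b with rationals."""

    def solve(self, A, b):
        n = len(b)
        M = [
            [Fraction(int(i == j)) - Fraction(A[i][j]) for j in range(n)] + [Fraction(b[i])]
            for i in range(n)
        ]
        for col in range(n):
            pivot = next(r for r in range(col, n) if M[r][col] != 0)
            M[col], M[pivot] = M[pivot], M[col]
            M[col] = [v / M[col][col] for v in M[col]]
            for r in range(n):
                if r != col and M[r][col] != 0:
                    factor = M[r][col]
                    M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
        return [M[i][n] for i in range(n)]

# Hypothetical reachability equations: x0 = 1/2 x1 + 1/2 and x1 = 9/10 x0.
A = [[Fraction(0), Fraction(1, 2)], [Fraction(9, 10), Fraction(0)]]
B = [Fraction(1, 2), Fraction(0)]

def reach_probability(solver):
    """Client code depends only on the interface, not on the backend."""
    return solver.solve(A, B)[0]

if __name__ == "__main__":
    print(reach_probability(EliminationSolver()))
    print(reach_probability(JacobiSolver()))
```

Because reach_probability only sees the abstract interface, the backend can be swapped without touching the calling code, which is the property the text attributes to Storm's solver design.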
To realize the support for DD-based representations of systems, Storm relies on two different libraries: CUDD [89] and Sylvan [35]. While the former is very well established in the field, the latter is more recent and tries to make use of modern multi-core CPU architectures by parallelizing costly operations. An abstraction layer on top of the two libraries makes it possible to write code that is independent of the underlying library and does not incur runtime costs.
Table 3 The solvers Storm provides out-of-the-box.

Technicalities
By far the largest part (over 170,000 lines of code) of Storm is written in the C++ programming language and extensively uses template meta-programming. This has several positive and negative implications. On the one hand, it serves the purpose of high performance for several reasons. First, C++ allows fine-grained control over implementation details like memory allocations. Secondly, C++ templates allow code to be heavily reused while maintaining performance as the static polymorphism enables type-dependent optimizations at compile-time. Large parts of the code are written agnostic of the data type (floating point, rational number or even rational functions) and only the core parts are specialized based on the data type. As this happens at compile-time, no runtime cost is incurred. Finally, we observe that many high-performance solvers and data structure libraries that are well-suited for the context of (probabilistic) verification are written in C or C++ (and also partially make use of template meta-programming), such as
- SMT solvers (Z3 [82], MathSat [26], Smt-Rat [28]),
- LP solvers (Gurobi [47], glpk 9),
- linear algebra libraries (Gmm++ 10, Eigen [46]),
- DD libraries (CUDD [89], Sylvan [35]), and
- rational arithmetic libraries (CArL [28], GMP 11).
Choosing C++ as the language for Storm therefore allows easy and fast interfacing with these solvers. On the other hand, the advantages come at a price. Advanced templating patterns can be difficult to understand and increase compile-times significantly.

Evaluation
This section contains an empirical evaluation of some key functionalities of Storm. Furthermore, we recap results of QComp 2019 [50] to emphasize the competitiveness of Storm.

Setup and Methodology
We consider the set of 100 benchmark instances that were selected in QComp 2019 [50]. Each instance consists of a symbolic model description and a property specification from the Quantitative Verification Benchmark Set (QVBS) [60]. If available, we consider model descriptions in the PRISM language. Otherwise, the model is built from the JANI description. For a better comparison across Storm's engines, we did not employ the techniques from Section 4.8 to solve DFTs. Since Storm has no native support for PTA, we used the tool moconv (part of the Modest Toolset 12 [56]) to translate PTAs into MDPs. For four instances either moconv did not support the PTA or Storm did not support the output of moconv. We therefore restrict our evaluation to the remaining 96 benchmark instances.
For each instance, the task is to solve the corresponding model checking query within a time limit of 30 minutes and a memory limit of 12 GB. The results are compared to the reference results provided by the QVBS. If the relative difference between these values is greater than 10^{-3}, the result is considered incorrect. Whenever the invoked model checking method is sound (i. e. provides precision guarantees), the precision of Storm is set to 10^{-3} (relative). Otherwise, Storm's default precision 10^{-6} (relative) is used. We select Sylvan [35] as DD-library, and set its memory limit to 4 GB. We also consider a "fastest" configuration that takes the best result from the six configurations, i.e., a configuration which runs all six configurations and terminates whenever the fastest terminates (and further runs the six configurations independently on different machines). All benchmark files, log files and replication scripts are available at [64].
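The correctness criterion can be stated as a one-line check. The helper below is a sketch of that criterion. The function name and the fallback to an absolute comparison when the reference value is zero are assumptions that the text does not specify.

```python
def is_correct(result, reference, tolerance=1e-3):
    """Return True if result is within the relative tolerance of the reference."""
    if reference == 0:
        # Assumption: use an absolute check when the relative error is undefined.
        return abs(result) <= tolerance
    return abs(result - reference) / abs(reference) <= tolerance

if __name__ == "__main__":
    print(is_correct(0.5004, 0.5))  # relative difference 0.0008
    print(is_correct(0.501, 0.5))   # relative difference 0.002
```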

Results
We summarize the outcomes of our experiments in Table 4. The six columns refer to the six configurations as described above. In the first row we indicate how many of the 96 considered instances were correctly solved for each configuration. The subsequent rows indicate the number of not supported instances 13 , the number of times the time or memory limit was exceeded, respectively, and the number of incorrect results 14 that were obtained. Observe that these rows always sum to 96. For the "fastest" configuration, we obtain 86 solved instances and 0 incorrect results. The next rows (after the horizontal line) show how often each configuration was either the fastest among the tested ones or only 1% (50%) slower than the fastest one, i.e., terminated within 101% (150%) of the fastest configuration.
We further compare the runtimes of the different engines and features in Figure 7. The shown quantile plot expresses how many benchmark instances (measured on the x-axis) each were solved in at most the time given on the y-axis. In other words, the point (x, y) is contained in the quantile plot for configuration c if the maximal runtime when using c on the x fastest solved instances (for c) is y seconds. Timeouts, memory-outs, incorrect results and unsupported experiments may skew the lines of the affected configurations as all these outcomes do not count as solved. Besides the six considered configurations we also depict the runtime obtained by the fastest engine or feature for each individual benchmark.
Finally, we compare the configurations of Storm one-by-one and give the results in Figure 8. Each point in the depicted scatter plots indicates the runtimes of the two compared configurations for one benchmark instance. The type (DTMC, CTMC, MDP, MA, or PTA) of the verification task is indicated by means of different marks. The scatter plots use logarithmic scales on both axes and indicate speed-ups of 10 by means of dotted lines. If an experiment ran out of resources (time or memory), was not supported, or yielded an incorrect result, we draw the point on separate lines, labeled OOR, NS, and INC, respectively. We compare the engines (sparse, hybrid, and dd) with each other. More detailed results of our experiments can be found on http://stormchecker.org/benchmarks.
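The "within 101% (150%) of the fastest configuration" rows can be computed as sketched below. The data layout, the function name and the runtimes in the example are hypothetical and are not taken from Table 4.

```python
def count_near_fastest(runtimes, factor):
    """Count per configuration how often it ran within factor times the best runtime.

    runtimes maps instance -> {configuration: seconds, or None if not solved}.
    """
    counts = {}
    for per_config in runtimes.values():
        solved = {c: t for c, t in per_config.items() if t is not None}
        if not solved:
            continue  # no configuration solved this instance
        best = min(solved.values())
        for config, seconds in solved.items():
            if seconds <= factor * best:
                counts[config] = counts.get(config, 0) + 1
    return counts

if __name__ == "__main__":
    # Hypothetical runtimes in seconds for two instances.
    data = {
        "m1": {"sparse": 10.0, "hybrid": 10.05, "dd": None},
        "m2": {"sparse": 30.0, "hybrid": 4.0, "dd": 5.0},
    }
    print(count_near_fastest(data, 1.01))
    print(count_near_fastest(data, 1.5))
```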
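The definition of the quantile plot translates directly into code: sort the runtimes of the solved instances, and the x-th smallest value is the y-coordinate for x. The sketch below assumes that unsolved instances (time-outs, memory-outs, unsupported or incorrect results) are encoded as None, which is a convention chosen here for illustration.

```python
def quantile_points(runtimes):
    """Return the points (x, y) of a quantile plot for one configuration.

    y is the largest runtime among the x fastest solved instances.
    """
    solved = sorted(t for t in runtimes if t is not None)
    return list(enumerate(solved, start=1))

if __name__ == "__main__":
    # Hypothetical runtimes in seconds; None marks an instance that was not solved.
    print(quantile_points([3.0, None, 0.5, 12.0]))
```

Since unsolved instances are dropped, a configuration with many unsupported benchmarks produces a shorter line, which is the skew mentioned in the text.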

Discussion
The sparse engine was the most versatile engine during our experiments since it supports all 96 instances and successfully solved the majority (71) of them, outperforming the other engines. However, looking at Figure 7 we see that other engines are competitive. In fact, picking the "right" configuration for a given benchmark may drastically reduce verification times. As indicated in Figure 8, several instances could only be solved using symbolic techniques based on the hybrid or the dd engine. We emphasize that the benchmark selection can have a strong impact when comparing the engines of Storm because the symbolic engines are strongly reliant on the model structure. Moreover, many benchmarks are not supported by the hybrid and/or the dd engine which skews the lines in Figure 7. Symbolic bisimulation was extremely effective on models with a concise decision diagram (DD)-based representation and a small bisimulation quotient. The export into a sparse quotient allows Storm to make use of the versatility of the sparse engine.
In Figure 8(e) we see that the overhead for sound model checking is often negligible. As mentioned above, we invoke classical model checking (such as value iteration) with the default precision parameters (10^{-6} relative precision) whereas sound model checking is invoked with the actual precision requirements (10^{-3} relative precision), yielding speed-ups for some instances.
Exact model checking is comparatively costly. The use of exact (infinite precision) arithmetic induces increasingly larger number representations. Moreover, approximate numerical solution methods cannot be applied. However, on a few instances where numerical methods do not work well, exact model checking was superior to the remaining configurations.

Summary from QComp 2019
We briefly recap the results of QComp 2019, focusing on the performance evaluation. For further details we refer to the competition report [50].
The experimental setup of QComp 2019 (benchmark selection, precision requirements, time and memory limits, ...) coincides with our setup as detailed above, except that a different machine was used and Storm was considered in version 1.3.0.
Each tool was executed in two different modes: Once with default settings (which for Storm coincides with using the sparse engine) and once with benchmark specific settings. For the latter mode, the participants could provide a tailored tool invocation for each individual benchmark instance. For Storm this was realized by empirically determining the fastest configuration for a given instance, where we considered the configurations sparse, hybrid, dd, bisim, sound, and exact (as above). Figure 9 depicts the performance results of QComp 2019 that are relevant for Storm. The quantile plots in Figures 9(a) and 9(b) compare Storm with the other participating general-purpose probabilistic model checkers Epmc [52], mcsta [57], and Prism [76] using the default and specific modes, respectively. Storm supported 96 of the 100 considered benchmark instances, whereas Epmc, Prism, and mcsta supported 63, 58, and 86 instances, respectively. For the quantile plots only the 43 instances that were supported by all 4 tools were taken into account. In particular, all benchmarks are given in PRISM language since Prism does not support JANI. The scatter plots in Figures 9(c) and 9(d) compare Storm with the best of the other 8 participating tools. A point above the solid diagonal line indicates that on the corresponding instance, Storm was the fastest tool among all participants.
Considering the results for the default mode in Figure 9(a), Storm is the strongest competitor of the other three tools. However, the performance results of Storm and Prism are very close to each other. For instance-specific invocations (Figure 9(b)), Storm clearly outperformed all its competitors. The scatter plots show that Storm performed best among all tools for 1/3 of the supported benchmarks in default mode and 1/2 of the supported benchmarks in specific mode.
Since QComp 2019, further progress of the participating tools has been made. For example, new and efficient model-checking techniques for MAs have been implemented in mcsta [21]. We remark that a repetition of the evaluation of QComp with recent tool versions could have yielded different results.

Conclusion
This paper presented the state-of-the-art probabilistic model checker Storm. We have discussed its main distinguishing features, and described how it can be used for rapid prototyping of new algorithms and tools. Key aspects of Storm are its modularity, its accessibility through a Python interface, its various modeling formalisms, as well as the functionalities that go beyond the standard probabilistic model-checking algorithms. We believe that its modularity, careful crafting of the most time-consuming operations, and our experience with earlier in-house developed model checkers have led to a tool that is competitive with existing probabilistic model checkers. Storm provides an effective and efficient platform for future-proof developments in probabilistic model checking. It is open source and publicly available from http://stormchecker.org. A challenge will be to keep up with the rapid progress in the field. This involves not only the implementation of new algorithms, but also constantly revising existing code fragments.