1 Introduction

The verification of systems involving stochastic uncertainty is a prominent research challenge. Among the many approaches is probabilistic model checking, a mature technique that grew out of model checking.

A model checker takes the formal system model and the formal property as inputs and, somewhat simplifying, returns one of three results, see Fig. 2. It reports that the property holds or is violated, and these reports are, given a correct implementation, guaranteed to be correct. The third outcome is that the model checker ran out of computational resources. Model checking has produced numerous success stories [16, 79], and major contributors Edmund M. Clarke, E. Allen Emerson and Joseph Sifakis were awarded the Turing Award in 2007. Probabilistic model checking extends traditional model checking with tools and techniques for the analysis of systems involving random phenomena or other forms of behavior that can be approximated by randomization. Alur, Henzinger and Vardi [3] state: “A promising new direction in formal methods is probabilistic model checking, with associated tools for quantitative evaluation of system performance along with correctness.” Distributed algorithms and communication protocols are natural examples, as they often use randomization to efficiently break symmetry. Other examples are cyber-physical systems that tightly integrate software and hardware such as sensors, actuators and microcontrollers. In particular, sensor readings may be noisy, actuators may not always have the same effects, and physical components may fail. Other domains that give rise to models involving probabilistic aspects include, e.g., security protocols and systems biology. All these systems are naturally mapped to Markov models, and probabilistic model checking takes exactly such models as input.

Probabilistic model checking is not new. Initial theoretical results and algorithms for Markov chains [65, 66] and Markov decision processes [37, 116] were provided about thirty years ago. First tool support using explicit [53] and symbolic data structures [10, 73] followed. Tool realizations for continuous-time Markov chains appeared shortly thereafter [78]. Prism evolved as one of the main probabilistic model checkersFootnote 1 covering all these models in a symbolic way [91]. In more recent years, tool support extended to cover probabilistic real-time and hybrid systems, as well as multi-player games.

Meanwhile, research in probabilistic model checking continued, changed directions, and progressed in new application areas. The diversity of this field motivated the development of a modular and adaptive model checker, called Storm. Storm’s main aim is to be a performant, easily extensible platform supplying various probabilistic model checking algorithms. After five years of development, Storm was released as an open-source project in 2017 [41]. Despite its relatively young age, Storm has established the following in pursuit of its original goals:

  • In the first edition of QComp [60], Storm compared favorably with other model checkers. Consider the quantile plot in Fig. 1. The quantile plot expresses how many benchmark instances (on the x-axis) were each solved within the time given on the y-axis. In other words, the point \(\left\langle x, y \right\rangle \) is contained in the quantile plot for tool c if the maximal runtime when using c on the x fastest solved instances (for c) is y seconds. Storm solved more instances and was generally faster in solving them. We elaborate on these results in Sect. 7.4.

  • Storm’s modularity paid off on various occasions: The tool has been adapted to include various novel variants of the typical value iteration algorithm and has been extended with parameter synthesis for probabilistic systems and multi-objective model checking. In many of these areas, Storm has helped to push the state of the art considerably. We elaborate on these results in Sect. 4.

Fig. 1 Runtime comparison of general-purpose probabilistic model checkers, taken from the QComp 2019 report [60], licensed under the Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/

In this paper, we report on Storm’s main features and how to use them. We start with a very quick overview introducing Storm before elaborating on the supported models and properties. We survey Storm’s most prominent building blocks and unique features in greater detail and discuss the possibilities to interface with these features in Storm. Finally, we report on its internal tool architecture and provide some empirical evaluation of the main configurations of Storm on the QComp 2019 benchmark set.

A video tutorial covering Storm and some of its core features is available at

http://stormchecker.org/video-tutorial.

2 Storm in a nutshell

Research to advance concepts and methods for probabilistic model checking often combines key routines and a variety of essential model checking algorithms. Storm delivers these. Some of Storm’s main characteristic features that help push the state of the art in probabilistic model checking are that Storm

  • contains efficient implementations of well-known and mature model checking algorithms for discrete-time and continuous-time Markov chains and Markov decision processes, but also for the more general Markov automata [49], a model containing probabilistic branching, nondeterminism, and exponentially distributed delaysFootnote 2;

  • supports explicit state and symbolic (BDD-based) model checking as well as a mixture of these modes to handle a wider range of models;

  • has a modular setup, enabling the easy exchange of different solvers and distinct decision diagram packages; its current release supports about 15 solvers and two BDD packages.

  • extends probabilistic model checking with the possibility of generating (high-level) counterexamples [39] and synthesizing permissive schedulers [46], as well as with symbolic bisimulation minimization [119, 121] and game-based abstraction of infinite-state MDPs [120];

  • offers the possibility to improve the reliability of model checking by supporting exact rational arithmetic using recent techniques [18] and techniques to avoid premature termination of value iteration [110].

  • supports advanced properties such as multi-objective model checking [51, 52, 109], efficient algorithms for conditional probabilities and rewards [13], and long-run averages on MDPs [6, 44] and MAs [28]. Storm also contains the essential building blocks for handling parametric models, cf. [38, 108, 114];

Storm can also be used to investigate the application of model checking in novel domains: In particular,

  • Storm supports various native input formats: the Prism and Jani languages, generalized stochastic Petri nets, dynamic fault trees, and conditioned probabilistic programs. This support makes it easier to apply probabilistic model checking and amounts to more than just providing another parser; state-space reduction and generation techniques as well as analysis algorithms are partly tailored to these modeling formalisms;

  • besides a command line interface with many optional arguments, Storm provides a Python API facilitating easy and rapid prototyping of other tools using the engines and algorithms of Storm;

  • it provides advanced approaches to model checking (see above) and good performance in terms of verification speed and memory footprint, cf. Fig. 1, under one roof.

How does Storm relate to other probabilistic model checkers? Storm has not reinvented the wheel, but has rather been inspired by and learned from the successes of, in particular, Prism [93] and the explicit-state model checker Mrmc [88]. Like its main competitors Prism, mcsta [67], and Epmc [62], Storm relies on numerical and symbolic computations. Although many functionalities are covered by Storm, there are some significant areas that Storm does not cover. It does not support discrete-event simulation against temporal logic formulas, known as statistical model checking [2, 97]. Storm does not support LTL model checking (as supported by Epmc and Prism), does not support probabilistic timed automata (as supported by mcsta and Prism), has no equivalent of Prism’s hybrid engine (a crossover between full MTBDD and Storm’s hybrid engine), and does not support the analysis of stochastic games. A longer survey of both features and performance of the various model checkers can be found in [26, 60]. A detailed comparison between Storm, Epmc, mcsta, and Prism is given in [76].

3 Probabilistic model checking with Storm

Fig. 2 Overview of the model checking approach [12]

We give a gentle introduction to probabilistic model checkingFootnote 3 with Storm, clarifying the different parts as outlined in Fig. 2. For surveys and more formal introductions to probabilistic model checking, we refer to [9, 12, 86].

Table 1 Overview of model types

3.1 Model types

Storm supports the analysis of several different formalisms. They differ regarding (i) their notion of time and (ii) whether or not nondeterministic choices are allowed. Table 1 shows a categorization of the models supported by Storm along the two dimensions. In a third dimension, Storm supports partially observable models, in which the way nondeterminism is resolved is restricted.

Discrete-time models abstract from timing behavior by viewing the progression of time in terms of discrete steps. In contrast, continuous-time models use real numbers to model the flow of time and therefore have a dense notion of time. Deterministic models (also referred to as Markov chains from now on) behave purely probabilistically. Dually, in MDPs and MAs, nondeterministic choices can be used to model, for instance, the interaction with an adversarial environment or underspecification of the model with the goal of synthesizing the optimal concrete system. In general, all model types can be enriched with cost structures. Together with the probabilities in the model, this allows reasoning about, for instance, expected costs until a certain goal is reached. Rather than providing formal definitions, we will illustrate a typical use case for each model type.

3.1.1 Discrete-time Markov chains (DTMCs)

We start with the simplest model. In DTMCs [103], every state is equipped with a single probability distribution over successor states. The evolution of the system therefore is fully probabilistic in the sense that it is governed only by repeated randomized trials. A famous example that can be captured in terms of a DTMC is the Herman protocol [94]. The general setting is this: a ring consisting of identical processes that each start either with a token or without one. If more than one process holds a token, the protocol is in an unstable state. The goal is to reach a configuration in which exactly one token remains, a situation called a stable configuration. This problem cannot be solved by deterministic algorithms and randomization is crucial. Herman’s protocol uses synchronous, unidirectional communication and can be shown to eventually reach a stable configuration with probability 1.

3.1.2 Continuous-time Markov chains (CTMCs)

CTMCs [103] extend DTMCs with a continuous notion of time. Here, the sojourn time of the system in a state is also determined by a random experiment. More specifically, the time is sampled according to a negative exponential distribution. The transitions between states happen just like for DTMCs, i.e., governed by the associated probability distributions. Examples of CTMCs can be found in, for instance, systems biology [29]. There, they are used to analyze the effect of protein concentrations and reaction rates on signal transduction pathways. In other words, the model combines discrete aspects (the molecule concentration) and continuous aspects (time). Here, not only the probabilistic but also the timing effects are important: Since both the underlying chemical reactions and the spatial distribution of molecules take time, fundamental questions like “what is the probability that the concentration of X is high after 10 seconds?” require a proper modeling of time.

3.1.3 Markov decision processes (MDPs)

MDPs [107] extend DTMCs with nondeterministic actions. That is, instead of a single distribution governing the successor states, the system can nondeterministically select between several actions, each identifying a different distribution. After a selection has been made, the successor states are resolved probabilistically, and in the successor state, a new selection process is initiated. As already mentioned, nondeterminism can be used to model the possible interaction with an adversarial environment. Important examples of this are distributed protocols. Such protocols are often randomized to efficiently break symmetry. However, because of their distributed nature, the progress of the processes is not synchronized and they may be scheduled differently. A well-known example is the randomized consensus algorithm by Aspnes and Herlihy [95]. In this protocol, the participating processes repeatedly modify a shared global counter based on the outcome of a coin flip until the whole system agrees on one of two outcomes, i.e., consensus has been reached. To faithfully model the protocol, nondeterminism can be used to account for the missing information about the scheduling of the competing accesses to the counter. Probabilistic automata (PAs) [112] extend MDPs with a more flexible action labeling.

3.1.4 Markov automata (MAs)

Finally, MAs [49] extend PAs with the notion of continuous time found in CTMCs. In probabilistic states no time passes, and the system nondeterministically selects one of the available probability distributions. In Markovian states, a negative exponentially distributed amount of time passes, as in CTMCs. A well-known example is the stochastic job scheduling problem [109]. Here, the task is to schedule n jobs with (different) exponential service times onto k processors. The processors are assumed to run a preemptive scheduling strategy: Upon completion of any job, all k processors can take over any of the remaining jobs. The corresponding MA uses nondeterministic choices to model the assignment of jobs to processors whenever such a choice can be made. Thus, the nondeterminism is used to underspecify the concrete behavior. Determining the job assignment that maximizes the probability for completion within a given time limit can thus be seen as synthesizing a scheduling policy that one would like to impose in the actual system.

3.1.5 Partially observable MDPs (POMDPs)

Partially observable MDPs [7, 85] are a popular extension that caters for a common issue with the analysis of MDPs. That analysis typically assumes that the nondeterminism can be resolved arbitrarily; the policy resolving the nondeterminism might, for example, depend on the internal state of a remotely running process. Consequently, the policies that are synthesized by such an analysis may be unrealistic, and the verification results too pessimistic. Consider a game like Mastermind, where the adversary has a trivial strategy if it knows the secret it has to guess. Intuitively, to analyze an adversary that has to find a secret, we must assume it cannot observe this secret. For a range of privacy, security, and robotic domains, we may instead assume that the adversary must decide based on system observations. In widespread examples [85], the position of a robot is unknown and can only be determined by landmarks (such as doors), or the position of other agents in the same environment can only be observed if these agents are sufficiently close.

Formally, POMDPs extend MDPs by a set of observations and label every state with one of these observations. Extensions in which actions are labeled or where states are labeled with distributions over observations can be reduced to this simpler case.

3.2 Modeling languages

Markov models for practical purposes are often too large to specify explicitly, but may be described by various more powerful and concise modeling languages. Depending on the domain, different modeling languages are more or less suitable. Furthermore, the structure of the model is often more apparent from a symbolic description than on the state level. Storm therefore tries to support a variety of different input languages. In order to be compatible with the widespread usage of Prism, the Prism language is supported. For testing small models, explicit enumeration of states and transitions is supported in two different formats. Furthermore, Storm accepts models given in Jani [25], a modeling language that was devised in a joint effort across multiple tools (involving Epmc, Modest, Fig) in an attempt to unify the cluttered language landscape. Storm supports three other modeling languages. First, the user can input generalized stochastic Petri nets (GSPNs) [100] specified in an extension of the Petri net Markup Language PNML, which is then translated to Jani automatically. GSPNs are an important modeling formalism in dependability and performance evaluation. Secondly, dynamic fault trees (DFTs) are a means to specify the fault behavior of systems and a reliability engineering formalism that is widely used in industry [111]. DFTs can be specified in the Galileo format [115]. Finally, a recent trend in the analysis of probabilistic systems is probabilistic programming [55]. The latter refers to programs written in a probabilistic extension of regular programs. One such extension of imperative while-programs is pGCL [74], which can additionally be extended with statements expressing conditioning [104], an ingredient that is essential to describe inference as in Bayesian networks. Storm can parse and translate programs written in pGCL to Jani, which makes such programs amenable to existing probabilistic model checking techniques.
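
As an illustration of how these input formats are consumed programmatically, the following sketch uses the Python API (see Sect. 5.3) to parse a Prism and a Jani description. The file names brp.pm and brp.jani are placeholders, and the snippet assumes a Stormpy installation as described on the website.

```python
import stormpy

# Parse a model given in the Prism language (placeholder file name).
prism_program = stormpy.parse_prism_program("brp.pm")

# Parse a model given in the Jani format; this returns the Jani model
# together with the properties defined in the file.
jani_model, jani_properties = stormpy.parse_jani_model("brp.jani")
```

Other formalisms such as GSPNs and pGCL programs are handled by translating them to Jani first, as described above.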

3.3 Properties

Storm offers support for a multitude of properties. The most fundamental properties are reachability properties. Intuitively, they ask for the probability with which a system reaches a certain state. One may, e.g., ask

  • “is the probability to reach an unsafe state of the system less than 0.1?”

  • “is the probability to reach a target within 20 steps at least 0.9?”

For models involving nondeterministic choices, such an analysis will reason about all possible resolutions of the nondeterminism and assert that the desired property holds in all cases. Alternatively, an easy extension is to ask for some resolution of the nondeterminism such that the property holds. Besides asking whether the probability meets some threshold, one may also ask “what is the probability to reach an unsafe state of the system?”

As models can be equipped with cost structures, properties allow for retrieving, e.g.,

  • “what is the expected number of coin flips until consensus has been reached?”

  • “what is the expected energy consumption after t time units?”

  • “what is the expected molecule concentration at time point t?”

Further properties include temporal logic formulas based on PCTL [66] and CSL [8, 11], conditional probability and cost queries [13, 14], long-run average values [6, 28, 44] (also known as steady-state or mean payoff values), cost-bounded properties [69] (see Sect. 4.2), and support for multi-objective queries [51, 109] (see Sect. 4.5).
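
To give a flavor of how such properties are written concretely, the following sketch parses a few of the above queries with the Python API (see Sect. 5.3). The property strings follow the Prism-style syntax accepted by Storm; the label and reward names ("unsafe", "target", "coin_flips", "done") and the model file are placeholders that have to exist in the actual model.

```python
import stormpy

program = stormpy.parse_prism_program("brp.pm")  # placeholder model file

# A threshold query, a step-bounded query, a quantitative query,
# and an expected-reward query (semicolon-separated).
properties = stormpy.parse_properties_for_prism_program(
    'P<0.1 [ F "unsafe" ]; '
    'P>=0.9 [ F<=20 "target" ]; '
    'P=? [ F "unsafe" ]; '
    'R{"coin_flips"}=? [ F "done" ]',
    program)
```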

3.4 Model checking methods

In probabilistic model checking and arguably in verification in general, (sadly) there is no known “one-size-fits-all” solution. Instead, the best tools and techniques depend heavily on the input model and the properties. Storm—as well as other model checkers—implements a variety of approaches that allow a knowledgeable user to pick the appropriate method as part of the input, and allows developers to extend and combine their favorite methods. In particular, we provide approaches based on solving (explicit) linear (in)equation systems, value iteration variants on explicit or symbolic representations of (parts of the) model, policy iteration methods, methods using abstraction techniques and bisimulation minimization. We refer to Sect. 4 for some of Storm’s distinguishing features for model checking, and Sect. 6 for specifics on the technical realization.

4 Storm’s features

Table 2 Overview of distinguishing features of Storm and their applicability based on the model types

In this section, we detail some of the outstanding features of Storm that go beyond conventional probabilistic model checking methods. We give an overview in Table 2.

In particular, we have chosen four aspects that improve probabilistic model checking of standard properties such as reachability or expected rewards. These are reflected by the first four rows. Sound/exact model checking reflects a collection of approaches that, compared to the classical numerical algorithms, provide stronger guarantees on the accuracy of the obtained results. Cost-bounded model checking, symbolic bisimulation minimization, and game-based abstraction reduce the size of the analyzed model in various ways to make probabilistic model checking more scalable.

Furthermore, we have selected three extensions that go beyond the classical variants of probabilistic model checking: We discuss how to extract counterexamples using Storm, how to find strategies that satisfy multiple properties simultaneously using multi-objective model checking, and how to handle parametric models in which probabilities are not fixed constants but rather unknown symbols.

Finally, we discuss tailored model checking methods for POMDPs and dynamic fault trees. We stress that the modular structure of Storm enables these approaches to easily reuse the regular model checking methods and the other methods outlined in this section.

4.1 Exact and sound model checking

Several works [18, 58, 121, 123] observed that the numerical methods applied by probabilistic model checkers are prone to numerical errors. This has two main reasons. First, the floating point data types used by the tools are inherently imprecise. For example, representing the probability \(\frac{1}{10}\) using IEEE 754 compliant double precision introduces an error of about \(5 \cdot 10^{-18}\). In the presence of numerical algorithms, these errors accumulate and may lead to incorrect results. An alternative to the above is to employ rational arithmetic. That is, by representing probabilities (and costs) in the model and also the results as rational numbers, models may be analyzed without introducing any numerical errors. Storm implements these ideas and allows for the exact solution of many properties. However, approaches that are efficient with floating point arithmetic, such as value iteration, become inefficient when using rational numbers, as the representations of the latter grow very large. Storm offers two tailored techniques to solve systems of (in)equations using rational arithmetic. The first is based on policy iteration and Gaussian elimination, and the second on a recent technique called rational search [18]. The idea of the latter is to use an (imprecise) approximation of the exact solution and then sharpen it to a precise rational solution using the Kwek–Mehlhorn algorithm [90]. If a straightforward check confirms that the sharpened values constitute an actual solution, the technique returns them. Otherwise, the precision of the imprecise underlying solver is increased and the loop is restarted.
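
The following is a minimal, self-contained sketch of the rational-search idea (an illustration only, not Storm’s actual implementation): an imprecise floating-point solution is sharpened to rationals with small denominators and then checked exactly; if the check fails, the precision of the underlying solver is increased and the loop restarts.

```python
from fractions import Fraction

def sharpen(value, max_denominator):
    # Find a nearby rational with a bounded denominator; limit_denominator
    # plays the role of the Kwek-Mehlhorn sharpening step in this sketch.
    return Fraction(value).limit_denominator(max_denominator)

def rational_search(solve_approx, check_exact, precisions=(1e-6, 1e-9, 1e-12)):
    """solve_approx(eps): floating-point solution vector with precision eps.
    check_exact(candidate): exact (rational-arithmetic) check of a candidate."""
    for eps in precisions:
        approx = solve_approx(eps)                        # imprecise numerical solution
        candidate = [sharpen(v, round(1 / eps)) for v in approx]
        if check_exact(candidate):                        # exact verification step
            return candidate                              # provably exact solution
    raise RuntimeError("no exact solution found at the tried precisions")
```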

Secondly, the numerical algorithms themselves are sometimes, strictly speaking, unsound. For example, standard value iteration for computing reachability probabilities approximates the solution in the limit, but the termination criterion implemented by most tools does not guarantee that the obtained result differs from the actual solution by at most the given precision \(\epsilon \). One way to combat these problems is to approach the solution from both directions, a technique referred to as interval iteration [15, 23, 58]. Storm implements the latter and additionally the more recent sound value iteration [110] and optimistic value iteration [71]. Numerical errors asideFootnote 4, these methods ensure a correct result within a user-defined accuracy and come with a small time penalty as shown in Sect. 7.
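
To illustrate the idea behind interval iteration, the following self-contained sketch computes reachability probabilities in a DTMC by iterating a lower bound (initialized to 0) and an upper bound (initialized to 1) and terminating only when the two bounds are provably close. It assumes that the usual graph-based preprocessing has already identified the states with reachability probability exactly 1 (target) and exactly 0 (zero); it is an illustration of the principle, not Storm’s implementation.

```python
def interval_iteration(P, target, zero, epsilon=1e-3):
    """P: transition probabilities as {state: {successor: probability}}.
    target/zero: states with probability 1/0, from qualitative preprocessing.
    Returns bounds (lo, up) with up - lo <= epsilon for every state."""
    states = list(P)
    lo = {s: 1.0 if s in target else 0.0 for s in states}
    up = {s: 0.0 if s in zero else 1.0 for s in states}
    while max(up[s] - lo[s] for s in states) > epsilon:
        lo = {s: lo[s] if s in target or s in zero
              else sum(p * lo[t] for t, p in P[s].items()) for s in states}
        up = {s: up[s] if s in target or s in zero
              else sum(p * up[t] for t, p in P[s].items()) for s in states}
    return lo, up

# Tiny example: from s0, the true probability to reach "goal" is 0.6.
P = {"s0": {"s0": 0.5, "goal": 0.3, "fail": 0.2},
     "goal": {"goal": 1.0}, "fail": {"fail": 1.0}}
lo, up = interval_iteration(P, target={"goal"}, zero={"fail"})
print(lo["s0"], up["s0"])  # both within 1e-3 of 0.6
```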

4.2 Cost-bounded reachability

A typical application for Markov models is to analyze the probability of, e.g., reaching a goal state before some resource like time or energy is depleted. Another typical application is to analyze the expected time before a number of tasks have been fulfilled. Both instances can be generalized to cost-bounded reachability. In cost-bounded reachability, one is interested in the behavior of the system that does not violate the bounds on the resources. The classical approach to analyzing cost-bounded reachability is to encode this behavior in the model description by keeping track of the resources explicitly and then rely on standard reachability queries [5]. That is, the states of the model keep track of the consumed resources, and the reachability query asks, e.g., what the probability is that one of the target states is reached in which the resource bounds are not violated. The downside is that the model grows with these bounds.

Storm alternatively allows modeling the (nonnegative) costs of actions or states in the modeling language. These costs are attached to the model, and then one may analyze cost-bounded reachability with an appropriate query. The clear advantage of this approach is that the resources are not encoded in the state space, which keeps the model much smaller. Rather, Storm performs a series of model checking calls on the much smaller model [69, 70], generalizing ideas from [59, 89] to multiple cost dimensions. The reduced memory footprint allows handling much larger models, and the reduced memory consumption often also yields faster verification times.

Cost-bounded reachability is closely related to quantile properties [70, 89], where one fixes a desired reachability probability and asks how many resources have to be invested in order to achieve this probability.

4.3 Symbolic bisimulation minimization

A typical approach to alleviate the state-space explosion is to represent the state space symbolically. In the probabilistic setting, employing variants of decision diagrams (DDs) such as multi-terminal binary DDs (MTBDDs) or multi-valued DDs (MDDs) is the most widely used approach to deal with large state spaces [10]. They are a graph-based data structure that can exploit structure and symmetry in the underlying model to represent gigantic models very compactly.

A different angle to approach the problem is abstraction. Here, the idea is to remove details from the model that are unnecessary for the desired analysis. A well-studied technique is bisimulation minimization. Its core idea is that states with equivalent behavior (in some suitable sense) can be merged to obtain a quotient model that preserves the properties of the original input. Then, the (potentially much smaller) quotient can be analyzed instead. Bisimulation minimization was shown to yield substantial reductions in the case that models are represented explicitly (for instance, in terms of a probability matrix) [87].

Storm allows combining a symbolic representation with bisimulation minimization, thereby extending previous work [119, 121]. We extended the approach to deal with nondeterministic models, which makes it available for all four model types supported by Storm (see Sect. 3.1). This combination leads to significant reductions in memory and time consumption for a variety of models, and enables the analysis of models that are otherwise out of reach [76]. The resulting quotient model is often small enough to be represented explicitly, which enables a wide range of efficient analysis methods.
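
As a small illustration of bisimulation minimization via the Python API (see Sect. 5.3), the following sketch builds a model and replaces it by its bisimulation quotient before model checking. The file name and property are placeholders, and the sketch uses the explicit-state variant of the reduction exposed by Stormpy; the symbolic variant discussed above is available, e.g., through the dd-to-sparse engine of the command line interface.

```python
import stormpy

program = stormpy.parse_prism_program("brp.pm")  # placeholder model file
properties = stormpy.parse_properties_for_prism_program('P=? [ F "target" ]', program)
model = stormpy.build_model(program, properties)

# Replace the model by its (property-preserving) bisimulation quotient.
quotient = stormpy.perform_bisimulation(model, properties,
                                        stormpy.BisimulationType.STRONG)
print(model.nr_states, "states reduced to", quotient.nr_states)

result = stormpy.model_checking(quotient, properties[0])
```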

4.4 Game-based abstraction–refinement

Even though bisimulation minimization effectively helps to reduce the model, it has two major drawbacks. First, it is not guided by the concrete analysis that is to be performed. The quotient model may be much too fine for the analysis of a given property as it preserves a whole class of properties. Secondly, with few exceptions [42], the algorithms to compute the bisimulation quotient require the entire state space and transitions to be available. If the model is very large or even infinite, the algorithms fail to produce a quotient even if the quotient is very small.

Game-based abstraction [92] addresses these two challenges. It is based on two fundamental ideas. The first is that states are merged much more aggressively than in bisimulation minimization. That is, they may be collapsed even if they have distinguishable behavior. The behavior of the original model is over-approximated by the abstraction, and the latter can therefore be used to obtain sound bounds for the measures on the former. Note that the abstraction contains two sources of nondeterminism: the one present in the original model and the nondeterminism that is introduced by the abstraction process. Merging these sources of nondeterminism results in very loose and unsatisfactory bounds on the target values. The second idea therefore is to keep the two kinds of nondeterminism apart. This gives rise to a stochastic game [35] whose solution gives lower and upper bounds on both minimal and maximal probabilities in the original model.

Storm implements a game-based abstraction–refinement loop based on the ideas in [120]. The loop is illustrated in Fig. 3. As a first step, the abstract game is derived from the model and the current partitioning of the states, which is initially induced by the given property. If the bounds obtained by the analysis of the game are precise enough, they can be returned. Otherwise, the abstraction is refined by splitting the partition in a suitable way and the process is repeated.

Fig. 3 Overview of abstraction–refinement using games

To enable the analysis of gigantic or even infinite models, the abstraction is extracted directly from the high-level model description (given in terms of a Prism or Jani model). This extraction is achieved by the formulation as a (series of) satisfiability problem(s), which are dispatched to an off-the-shelf solver. While this has the aforementioned advantages, it is often the computationally most expensive part of the overall procedure. To combat this, Storm implements several optimizations outlined in [76, Ch. 6].

4.5 Multi-objective model checking

Initially, the focus in many probabilistic model checkers was mostly on computing the probability that a certain event happens. However, probabilistic model checking can provide meaningful data beyond the probability to reach some state, such as the optimal strategies for MDPs, i.e., functions that describe how to resolve the nondeterminism in an MDP such that the induced behavior satisfies a given property.

However, if a strategy should satisfy multiple properties, standard model checking techniques do not suffice. Consider two properties limiting time and energy usage. Standard techniques would independently compute two strategies, one optimizing time, the other optimizing energy consumption. Each strategy might waste the other resource and thus violate the limit imposed by the other property. Multi-objective model checking [50, 51] helps in finding strategies that satisfy multiple properties at once, and can be used to clarify the trade-offs between various properties.

Essentially, state-of-the-art multi-objective model checking boils down to a series of preprocessing steps on the model, and then either solving a linear program [51] or iteratively applying standard model checking techniques [52]. Storm supports multi-objective model checking on MDPs, and in addition on MAs [109] under general as well as more restricted strategies [43]. Furthermore, it allows for a more flexible combination of various properties, including properties with (multiple) cost bounds [69, 70], and incorporates some particularly efficient preprocessing steps.

4.6 Synthesis of high-level counterexamples

Besides the computation of a single strategy, the synthesis of counterexamples and/or of sets of strategies that all satisfy or violate a given property has gained traction. Here, we discuss counterexamples, but similar ideas have been used for so-called permissive strategies [46], as implemented in Storm using [82].

Suppose that a system reaches a bad state with a probability above some threshold. To locate the reason for this behavior, it is helpful to obtain the part of the system that leads to this behavior, by means of a counterexample. Counterexamples try to capture the essence of the failed verification attempt and help the user of the model checker (be it a human or another algorithm) to revise the system or its model accordingly. In the nonprobabilistic setting, a counterexample may be represented as one offending run of the system. However, such a representation is not necessarily possible in the probabilistic setting, as there may be infinitely many paths that contribute to the overall probability mass reaching the bad state [63]. A single run ending in a bad state is therefore typically insufficient as a counterexample. While it is possible to consider sets of paths for probabilistic safety properties, the resulting counterexamples are large and hard to comprehend. Alternatively, counterexamples can be computed as sub-Markov models [1, 31].

Rather than considering counterexamples at the state-space level, Storm computes counterexamples in terms of the high-level model specification using the ideas of [122]. More concretely, given a Jani (or Prism) model that violates a safety property, Storm computes the smallest portion of the Jani code that already witnesses the violation based on the method proposed in [39]. It does so by a guided exploration of all candidate sub-models. Ultimately, the smallest sub-model highlights the core of the problem, at the abstraction level of the user. High-level counterexamples are thus valuable as diagnostic feedback to (human) tool users. Recent work has illustrated that these counterexamples can be effectively used in a counterexample-guided inductive synthesis approach for finite Markov chains [30].

4.7 Parametric model checking

Naturally, the model checking result of Markov models crucially depends on the transition probabilities. Often, these probabilities are approximations based on data or reflect configurable parts of a modeled system. To represent the uncertainty about the probabilities, parametric Markov models were first considered in [38, 96]. In parametric Markov models, the probabilities are symbolic expressions rather than concrete values. For any valuation of the parameters, replacing the parameters in a parametric Markov model yields an instantiated parameter-free Markov model.

There are many interesting questions that one can ask about parametric systems. The simplest is feasibility, i.e., whether there exists a valuation such that the instantiated Markov model satisfies a property. More advanced is parameter space partitioning, where the goal is to decompose the parameter space into regions in which a predefined property is either satisfied or violated. Such a decomposition indicates for most parameter valuations whether they lead to a system that satisfies the given property. An alternative question is to find the solution function, i.e., a function in closed form that gives the model checking result of the instantiated Markov model in terms of the parameter values. Already the feasibility problem is ETR-complete, that is, it is asymptotically as hard as finding the root of a multivariate polynomial [124].

Storm supports the construction and analysis of parametric Markov models. Besides handling models and supporting efficient instantiation of parametric models, Storm provides three methods to perform parameter synthesis. The first is based on computing the aforementioned solution function through state elimination [38, 61], which can also be seen as Gaussian elimination. This basic algorithm is improved by heuristics that order the operations, and a representation of the rational functions that allows for faster operations [40]. The second method, referred to as parameter lifting, avoids computing a potentially large rational function and determines the validity of a formula over a region of parameter valuations through a sound abstraction into a nonparametric system [108]. The third method [114] aims to analyze whether the solution function is monotonic in some parameter without actually computing the solution function, as the latter can be exponential in the number of parameters. These and further methods are all used by the parameter synthesis tool PROPhESY [81], which provides a playground for parameter synthesis approaches using Storm as a back end.
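
A minimal sketch of the first approach (computing the solution function) via the Python API might look as follows. It assumes a parametric model description in the Prism language (the file name parametric_brp.pm and the label "target" are placeholders) and a Stormpy build with support for parametric models.

```python
import stormpy

program = stormpy.parse_prism_program("parametric_brp.pm")  # placeholder
properties = stormpy.parse_properties_for_prism_program('P=? [ F "target" ]', program)

# Build a Markov chain whose transition probabilities are rational functions.
model = stormpy.build_parametric_model(program, properties)

# Model checking yields the solution function:
# a rational function over the model's parameters.
result = stormpy.model_checking(model, properties[0])
solution_function = result.at(model.initial_states[0])
print(solution_function)
```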

4.8 Partially observable Markov decision processes

Storm supports three methods for POMDP analysis:

First, Storm supports the verification of (quantitative) reachability in POMDPs, e.g., to check whether for each policy resolving the nondeterminism based on the available observations, the probability to reach a bad state is less than 0.1. In general, this problem is undecidable [99]. We consider an equivalent reformulation of the POMDP as an (infinite) belief MDP: Here, each state is a distribution over POMDP states. Such a belief MDP has additional properties that have been exploited to allow verification [80, 98, 102]. Storm uses a combination of abstraction-and-refinement techniques to iteratively generate a finite abstract belief MDP that soundly approximates the extremal reachability probabilities in the POMDP [19].

Often, POMDPs are analyzed in settings where the nondeterminism is controllable: The main interest is then in the dual of the verification problem: Find a policy such that the induced probability to reach a bad state is less than 0.1. The problem remains undecidable. A popular approach to overcome the hardness of the problem is to restrict the policies, e.g., by putting a (small) a priori bound on the memory of the policy [4, 24, 64, 101, 105, 125]. Such limits are especially reasonable when the nondeterminism is controllable, i.e., if a policy is to be synthesized. There are various cases in which small-memory policies deliver adequate performance. Additionally, these policies are small (and arguably simple) by construction. Storm translates POMDPs under observation-based policies with a fixed amount of memory to parametric DTMCs [84]. Consider memoryless, observation-based policies: These policies map the current observation to a distribution over the available actions. We can encode all possible actions with the help of parameters. Finding values for these parameters then corresponds to finding an observation-based policy, and reasoning over all parameters corresponds to reasoning over all observation-based policies adhering to the memory limit.

Third, in POMDPs, even a qualitative variant of reachability is hard: In particular, to decide whether there exists a policy—resolving the nondeterminism based on the available observations—such that the probability to reach a bad state is 1 is EXPTIME-complete [33]. Storm can compute small memory policies via SAT encodings [32], and finds more general policies by an incremental procedure [83].

4.9 Model checking dynamic fault trees

Fault trees [111] are widely used in reliability engineering and model how component failures lead to failures of the complete system. Dynamic fault trees (DFTs) [47] extend (static) fault trees by dynamic gates. DFTs more faithfully model systems by allowing order-dependent failures, functional dependencies and spare management.

Dynamic fault trees may be translated into corresponding Markov models [21, 47] whose analysis yields common measures on dynamic fault trees, such as reliability and mean time to failure. The analysis of the corresponding Markov models also allows more complex measures, e.g., dealing with degraded modes [54]. The essential point here is that Storm supports all these queries out of the box. Due to the modular architecture of Storm, features such as parametric DFTs are supported off the shelf without a dedicated implementation.

To drastically improve the analysis of DFTs, Storm contains a dedicated translation of such models into Markov models [117]. To make the state-space generation as fast as possible, Storm utilizes the structure of the DFT and constructs a Markov model that contains only the relevant behavior of the DFT. Symmetries in the fault tree are exploited to further collapse the model, which is then subject to regular model checking with Storm. As the state-space explosion might still be present during translation, Storm also supports a partial state-space generation for DFTs [117]. This partial state space yields a sound abstraction, which may be model checked to obtain safe lower and upper bounds. The state space can be iteratively extended to obtain the desired precision of the analysis result.

5 Using Storm

Storm is available as free and open-source software. Below, we give an overview of how to use Storm. A detailed and up-to-date guide may be found on Storm’s website:

http://stormchecker.org

Before you start. Storm has to be configured and compiled on the target machine. This procedure automatically looks up various dependencies and (optionally) adds them if they are not found on the system. While this configuration and compilation procedure offers some advantages, see Sect. 6.5, it is often cumbersome. Therefore, we recommend users who only want to experiment to rely on the docker containersFootnote 5 containing Storm with all the key dependencies, and all interfaces and extensions. One may start right away, at the cost of slightly reduced performance.

Model descriptions. Storm can be used with a variety of input languages including Jani and Prism. A complete up-to-date list and further resources can be found at Storm’s websiteFootnote 6. For the sake of conciseness, we do not discuss the details of these languages here.

Below, we consider a Prism description of the Bounded Retransmission Protocol (brp) [75]. This and many other examples can be found in the Quantitative Verification Benchmark Set (QVBS)Footnote 7 [72].

5.1 Command line interface

The key way to interact with Storm is through its command line interface. The command line interface allows specifying the input model and properties and, after the analysis, reports the requested results. The command

storm --prism brp.pm --prop brp.props

invokes Storm with a Prism description in brp.pm, and the properties listed in a file brp.props. Storm will build the model and perform model checking on each property. For advanced users, the methods used for model checking can be flexibly yet simply set, e.g.,

storm ... --engine hybrid --eqsolver elimination

sets the engine to hybrid (see Sect. 6.3) and sets the linear equation solver to state elimination, see Sect. 6.4. Experts may exploit the possibility to configure even details of the various procedures, e.g., the order in which state elimination is applied.

5.2 C++ extensions

To be able to flexibly use the internal data structures of Storm, one may build one’s own tool using Storm as a library. This approach is also taken by the Storm command line interface, as well as by other extensions shipped with and tightly bound to Storm, such as the analysis of DFTs outlined in Sect. 4.9. This approach is the most flexible and powerful way of using Storm, but also requires the most effort. We illustrate model checking DTMCs with the sparse engine in Fig. 4. The code parses a model description and a property, builds a DTMC corresponding to the model, and applies model checking to the property to compute the corresponding probability for all states. The output is then created based on the model checking result of (some) initial state. We provide a minimal working example to build your own C++ tool based on Storm as a template repositoryFootnote 8.

Fig. 4 Using the C++ interface (with Storm version 1.6.2). Note that we have omitted the necessary includes. An annotated version for the latest Storm version is given in the starter project

5.3 Python interface

A much quicker way to flexibly interact with (a selection of) Storm’s internal data structures is the Python API called StormpyFootnote 9. We exemplify the ease of use in Fig. 5. The code is equivalent to Fig. 4. Using Python may induce some runtime penalty, but it enables flexible access to the main functionality of Storm. We stress that the interface is powerful enough to also drive larger projects, e.g., the parameter synthesis tool PROPhESY [40] relies on Stormpy. We provide a minimal working example to build your own Python tool based on Stormpy as a template repositoryFootnote 10.

Fig. 5 Using Stormpy 1.6.2
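
Since the figure itself is not reproduced here, the following sketch conveys the flavor of such a Stormpy script; the file name and property are placeholders, and the exact code shown in Fig. 5 may differ in details.

```python
import stormpy

# Parse the model description and a property.
program = stormpy.parse_prism_program("brp.pm")
properties = stormpy.parse_properties_for_prism_program('P=? [ F "target" ]', program)

# Build the DTMC with the sparse engine and check the property for all states.
model = stormpy.build_model(program, properties)
result = stormpy.model_checking(model, properties[0])

# Report the value in (some) initial state.
initial_state = model.initial_states[0]
print("Result:", result.at(initial_state))
```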

6 Architecture

In this section, we report on some internal aspects of Storm. In particular, we aim to address how we realized performance and modularity. Naturally, we cannot go into the details of the various algorithms. Rather, we discuss some design choices that will help a user to feel more familiar with the code base.

6.1 Logical structure

The root directory of Storm contains, among others, sources and resources. The latter contains the logic for the configuration routines as well as various third-party dependencies. The sources are divided into various libraries and executables. The core functionality is found in the storm library. Inside that library, one finds data structures for the representation of matrices, models, expressions, and modeling languages, as well as the model checking engines and solvers, which are discussed below. Besides this library, there are libraries for parsing, handling parametric models, and handling various modeling formalisms such as GSPNs and DFTs. All libraries depend on the core storm library. Moreover, most libraries are accompanied by executables that provide adequate command line interfaces.

6.2 Models

Storm features two different in-memory representations of Markov models. First, it can use sparse matrices, an explicit representation that uses memory roughly proportional to the number of transitions with nonzero probability. Sparse matrices are suited for small- and moderate-sized models and allow for fast operations also on models with irregular structure. Secondly, Storm can store models symbolically using MTBDDs, cf. Sect. 4.3. The MTBDDs are built from the model description directly. While it is possible to go from MTBDDs to the explicit representation, the other direction is not (efficiently) possible. While MTBDDs often store a model compactly, typical operations for the analysis of models let the MTBDDs grow and are therefore often slow. All models can be built with probabilities represented using floating point arithmetic, exact rational numbers, or rational functions.

Table 3 Overview of engines and supported features in Storm

6.3 Model checking engines

Storm’s engines are built around the two model representations. The sparse engine exclusively uses the sparse matrix-based representation. It first constructs the matrix representation of the state space by exploring the reachable state space specified in the modeling language and then analyzes the model using one of the many (standard, numerical) approaches, which are encapsulated as solvers (see below). While the exploration engine also uses sparse matrices, it uses ideas from reinforcement learning to avoid exploring all reachable states [23]. Instead, it proceeds in an “on-the-fly” manner and explores those parts of the system that appear to be most relevant to the verification task.

The next two engines use MTBDDs as their primary form of representation. Except for the concrete in-memory representation, the dd engine is the counterpart to the sparse engine in the sense that model building and verification is done on the very same representation and no translation takes place. Storm’s hybrid engine tries to avoid the costly numerical operations on MTBDDs by transforming only parts of the system that are relevant for the considered property into a sparse matrix representationFootnote 11.

The dd-to-sparse engine is similar, but performs the translation independent of the property. This can be useful when multiple properties are to be checked on the same model or when symbolic bisimulation minimization is applied. In the latter case, the quotient model will directly be constructed in a sparse matrix representation.

The abstraction–refinement engine implements the technique described in Sect. 4.4 and is able to compute bounds for both minimal and maximal reachability probabilities for (infinite) MDPs.

Given simple features of the input Prism or Jani model (such as the number of parallel automata or the average variable range), the automatic engine automatically selects reasonable settings for Storm. The current implementation uses a decision tree with 30 leaf nodes and a height of 7. It has been generated with the tool scikit-learn [106] using training data from experiments on the QComp benchmark set [60]. To avoid over-fitting, the automatic choice only selects either

  • the sparse engine,

  • the sparse engine with exact model checking and rational arithmetic (cf. Sect. 4.1),

  • the hybrid engine, or

  • the dd-to-sparse engine with symbolic bisimulation minimization (cf. Sect. 4.3).

Support for queries and model descriptions. Table 3 provides an overview of the models and queries supported by each engine. The sparse engine supports all model checking queries present in Storm and all DTMCs, CTMCs, MDPs, MAs, and POMDPs described in Prism or Jani. The engine can be paired with sound or exact model checking as in Sect. 4.1. However, exact arithmetic does not support time-bounded properties in CTMCs and MAs as these involve exponentials. Many advanced features such as cost-bounded reachability and multi-objective model checking are only implemented in the sparse engine. The dd-to-sparse engine can often make use of these implementations, as well. The support within other engines is more limited. The dd engine does not support continuous-time models (considered too slow) and POMDPs (typically sufficiently small). The exploration engine and the abstraction–refinement engine are both limited to reachability queries on discrete-time models. Moreover, some advanced features of the Jani language (indexed assignments, nontrivial system compositions) currently cannot be translated into DDs. The automatic engine falls back to the sparse engine if the input model is not supported by the predicted configuration.
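
In the Python API, the choice between these representations is reflected in how the model is built and checked. A small sketch could look as follows; the file and property are placeholders, and the function names build_symbolic_model, check_model_sparse, and check_model_hybrid are assumed here as exposed by recent Stormpy versions.

```python
import stormpy

program = stormpy.parse_prism_program("brp.pm")  # placeholder model file
props = stormpy.parse_properties_for_prism_program('P=? [ F "target" ]', program)

# Sparse engine: explicit, matrix-based representation.
sparse_model = stormpy.build_model(program, props)
sparse_result = stormpy.check_model_sparse(sparse_model, props[0])

# Symbolic (MTBDD-based) representation, checked with the hybrid engine.
symbolic_model = stormpy.build_symbolic_model(program, props)
hybrid_result = stormpy.check_model_hybrid(symbolic_model, props[0])
```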

6.4 Solvers

Probably the most outstanding trait of Storm’s architecture is the concept of solvers. Ultimately, many tasks related to (probabilistic) verification revolve around solving subproblems. For example, computing reachability probabilities or expected costs in a DTMC reduces to solving a system of linear equations. Similarly, for an MDP a system of equations needs to be solved, with the difference that the equations are Bellman equations involving minima and maxima. However, these are by no means the only kinds of problems appearing in probabilistic verification.

Fig. 6 Most important solvers used by Storm

Table 4 Solvers Storm provides out of the box

Figure 6 illustrates some functionalities of Storm that depend on one or more solvers. For example, (explicit) model building employs Smt solving. As the initial states of symbolic models (e.g., Prism or Jani) are given by the satisfying assignments of an expression, Storm uses Smt solvers to enumerate the possible initial states. Similarly, the extraction of the abstract model from the symbolic model (as presented in Sect. 4.4) in the abstraction–refinement engine crucially depends on enumerating satisfying assignments and therefore on Smt solvers. As yet another example, consider the synthesis of high-level counterexamples as in Sect. 4.6. Here, one of the offered techniques relies on the solution of a Milp while the other uses Smt solvers.

Two of the main goals in the development of Storm were the ability to exchange central building blocks (like solvers) and to benefit from (re)using high-performance implementations provided by other libraries. Storm therefore offers abstract interfaces for the solver types mentioned above that are oblivious to the underlying implementation. Offering these interfaces has several key advantages. First, it provides easy and coherent access to the tasks commonly involved in probabilistic model checking. Secondly, it enables the use of dedicated state-of-the-art high-performance libraries for the task at hand. More specifically, as the performance characteristics of different backend solvers can vary drastically for the same input, this permits choosing the best solver for a given task. Licensing problems are avoided, because implementations can be easily enabled and disabled, depending on whether or not the particular license fits the requirements. Finally, implementing new solver functionality is easy and can be done without detailed knowledge of the global code base. This flexibility allows keeping Storm up to date with new state-of-the-art solvers.
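
The following sketch illustrates the underlying design idea (in Python and heavily simplified; it does not reflect Storm’s actual C++ classes): model checking code talks to an abstract equation solver interface, and concrete backends can be swapped without touching the client code.

```python
from abc import ABC, abstractmethod

class LinearEquationSolver(ABC):
    """Abstract interface: solve x = A*x + b, oblivious to the backend used."""

    @abstractmethod
    def solve(self, A, b):
        ...

class GaussSeidelSolver(LinearEquationSolver):
    """One possible backend: a simple Gauss-Seidel-style fixpoint iteration."""

    def solve(self, A, b, iterations=10000):
        x = [0.0] * len(b)
        for _ in range(iterations):
            for i in range(len(b)):
                x[i] = b[i] + sum(A[i][j] * x[j] for j in range(len(b)))
        return x

def reachability_probabilities(solver: LinearEquationSolver, A, b):
    # Client code depends only on the interface, not on the chosen backend.
    return solver.solve(A, b)
```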

For each of the solver interfaces, several actual implementations exist. For example, Storm currently has four implementations (each of them with a range of further options) of the linear equation solver interface for problems given as sparse matrices: One is based on Gmm++, one is based on Eigen [56], one uses Storm’s native data structures and numerical algorithms, and another one is based on Gaussian elimination [38]. Table 4 gives an overview of the currently available implementations. Here, all solvers that are implemented purely in terms of Storm’s data structures and do not use external libraries are marked with an asterisk to indicate that they are “built in.”

To realize the support for DD-based representations of systems, Storm relies on two different libraries: CUDD [113] and Sylvan [118]. While the former is very well established in the field, the latter is more recent and tries to make use of modern multi-core CPU architectures by parallelizing costly operations. The parallelization comes at the price of more expensive bookkeeping and in general CUDD performs better if there are many operations on smaller DDs, while Sylvan is faster when fewer operations on larger DDs are involved. Storm implements an abstraction layer on top of the two libraries that uses static polymorphism. This way, it is possible to write code that is independent of the underlying library and does not incur runtime costs.

6.5 Technicalities

By far the largest part (over 170,000 lines of code) of Storm is written in the C++ programming language and extensively uses template meta-programming. This has several positive and negative implications. On the one hand, it serves the purpose of high performance for several reasons. First, C++ allows fine-grained control over implementation details like memory allocations. Secondly, C++ templates allow code to be heavily reused while maintaining performance as the static polymorphism enables type-dependent optimizations at compile time. Large parts of the code are written agnostic of the data type (floating point, rational number, or even rational functions) and only the core parts are specialized based on the data type. As this happens at compile time, no runtime cost is incurred. Finally, we observe that many high-performance solvers and data structure libraries that are well suited for the context of (probabilistic) verification are written in C or C++ (and also partially make use of template meta-programming), such as

  • SMT solvers (Z3 [45], MathSat [34], Smt-Rat [36]),

  • LP solvers (Gurobi [57], glpkFootnote 12),

  • linear algebra libraries (Gmm++Footnote 13, Eigen [56]),

  • DD libraries (CUDD [113], Sylvan [118]), and

  • rational arithmetic libraries (CArL [36], GMPFootnote 14).

Choosing C++ as the language for Storm therefore allows easy and fast interfacing with these solvers. On the other hand, the advantages come at a price. Advanced templating patterns can be difficult to understand and increase compile times significantly.

Table 5 Outcomes of experiments on 96 benchmark instances

7 Evaluation

This section contains an empirical evaluation of some key functionalities of Storm. Furthermore, we recap results of QComp 2019 [60] and QComp 2020 [26] to emphasize the competitiveness of Storm.

7.1 Setup and methodology

We consider the set of 100 benchmark instances that were selected in QComp 2019 and 2020 [26, 60]. Each instance consists of a symbolic model description and a property specification from the Quantitative Verification Benchmark Set (QVBS) [72]. If available, we consider model descriptions in the Prism language; otherwise, the model is built from the Jani description. For a better comparison across Storm's engines, we did not employ the techniques from Sect. 4.9 to solve DFTs. Since Storm has no native support for PTAs, we used the tool moconv (part of the Modest Toolset [67]) to translate PTAs into MDPs. For four instances, either moconv did not support the PTA or Storm did not support the output of moconv. We therefore restrict our evaluation to the remaining 96 benchmark instances.

For each instance, the task is to solve the corresponding model checking query within a time limit of 30 minutes and a memory limit of 12 GB. The results are compared to the reference results provided by the QVBS; if the relative difference between the computed value and the reference value is greater than \(10^{-3}\), the result is considered incorrect. This setup coincides with the setup of QComp 2019. All experiments were run on 4 cores of an Intel® Xeon® Platinum 8160 processor. We measure wall-clock runtimes (including model building and model checking) for all experiments. Note that this machine is more powerful than the one used in QComp 2019.
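Assuming the standard notion of relative difference and a nonzero reference value, this correctness criterion can be written as

\[ \frac{\left| v_{\mathrm{tool}} - v_{\mathrm{ref}} \right|}{\left| v_{\mathrm{ref}} \right|} > 10^{-3}, \]

where \(v_{\mathrm{tool}}\) denotes the value computed by Storm and \(v_{\mathrm{ref}}\) the reference value provided by the QVBS; results exceeding this bound are counted as incorrect.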

For our evaluation, we consider Storm version 1.6.2 in seven different configurations comprising

  • the main engines of Storm: sparse, hybrid, and dd,

  • symbolic bisimulation (bisim) with sparse quotient (Sect. 4.3),

  • sound and exact model checking within the sparse engine (Sect. 4.1), and

  • the automatic engine.

Whenever the invoked model checking method is sound (i.e., provides precision guarantees), the precision of Storm is set to \(10^{-3}\) (relative); otherwise, Storm's default precision of \(10^{-6}\) (relative) is used. We select Sylvan [118] as the DD library and set its memory limit to 4 GB. We also consider a notional "fastest" configuration that takes the best result from the seven configurations, i.e., it corresponds to running all seven configurations independently on separate machines and terminating as soon as the fastest of them terminates.

All benchmark files, log files, and replication scripts are available at [77].

7.2 Results

Fig. 7 Runtime comparison of Storm’s key features

Table 5 summarizes the outcomes of our experiments. The seven columns refer to the seven configurations described above. The first row indicates how many of the 96 considered instances were solved correctly by each configuration. The subsequent rows indicate the number of unsupported instances, the number of times the time or memory limit was exceeded, and the number of incorrect results. Observe that these rows always sum to 96.

For the “fastest” configuration, we obtain 87 solved instances and no incorrect results. The next rows (after the horizontal line) show how often each configuration was either the fastest among the tested ones or at most 1% (50%) slower than the fastest one, i.e., terminated within 101% (150%) of the runtime of the fastest configuration.

We further compare the runtimes of the different engines and features in Fig. 7. The quantile plot expresses how many benchmark instances (on the x-axis) each were solved in at most the time given on the y-axis. In other words, the point \(\left\langle x, y \right\rangle \) is contained in the quantile plot for configuration c if the maximal runtime when using c on the x fastest solved instances (for c) is y seconds. Time-outs, memory-outs, incorrect results, and unsupported experiments may skew the lines of the affected configurations, as none of these outcomes counts as solved. Besides the seven considered configurations, we also depict the runtime obtained by the fastest engine or feature for each individual benchmark.
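For reference, the quantile points for one configuration can be computed from its per-instance runtimes as sketched below (illustrative code, not part of Storm): only solved instances contribute, and the point \(\left\langle x, y \right\rangle \) is the runtime of the x-th fastest solved instance.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Given the runtimes (in seconds) of all instances a configuration solved,
// return the points of its quantile plot.
std::vector<std::pair<std::size_t, double>> quantilePoints(std::vector<double> solvedRuntimes) {
    std::sort(solvedRuntimes.begin(), solvedRuntimes.end());
    std::vector<std::pair<std::size_t, double>> points;
    for (std::size_t x = 1; x <= solvedRuntimes.size(); ++x) {
        points.emplace_back(x, solvedRuntimes[x - 1]);  // x-th fastest runtime
    }
    return points;
}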

Finally, we compare the configurations of Storm one by one and give the results in Figs. 8 and 9. Each point in the depicted scatter plots indicates the runtimes of the two compared configurations for one benchmark instance. The type (DTMC, CTMC, MDP, MA, or PTA) of the verification task is indicated by means of different marks. The scatter plots use logarithmic scales on both axes and indicate speed-ups of 10 by means of dotted lines. If an experiment ran out of resources (time or memory), was not supported, or yielded an incorrect result, we draw the point on separate lines, labeled OOR, NS, and INC, respectively. We compare the engines (sparse, hybrid, and dd) with each other in Fig. 8a–c. Symbolic bisimulation, sound, and exact model checking are compared with the sparse engine (the default of Storm) in Fig. 8d–f. For the comparison with sound model checking, we do not depict benchmark instances where the default method is already sound. Figure 9a, b compares the automatic engine with the sparse engine and with the fastest configuration, respectively.

More detailed results of our experiments can be found at http://stormchecker.org/benchmarks.

Fig. 8 Comparison of engines and features of Storm

Fig. 9 Comparison of engines and features of Storm (continued)

7.3 Discussion

Comparing the three main engines of Storm (sparse, hybrid, and dd), the sparse engine was the most versatile one in our experiments: it supports all 96 instances and successfully solved the majority (73) of them, outperforming the other two engines. However, Fig. 7 shows that the other engines are competitive. The automatic engine often manages to pick the “right” configuration for a given benchmark and thus almost matches the performance of the (notional) fastest configuration. As indicated in Fig. 8, several instances could only be solved using symbolic techniques based on the hybrid or the dd engine. We emphasize that the benchmark selection can have a strong impact when comparing the engines of Storm, because the performance of the symbolic engines strongly depends on the model structure. Moreover, many benchmarks are not supported by the hybrid and/or the dd engine, which skews the corresponding lines in Fig. 7.

Symbolic bisimulation was extremely effective on models with a concise DD-based representation and a small bisimulation quotient. The export into a sparse quotient allows Storm to make use of the versatility of the sparse engine.

In Fig. 8e, we see that the overhead for sound model checking is often negligible. As mentioned above, we invoke classical model checking (such as value iteration) with the default precision parameters (\(10^{-6}\) relative precision), whereas sound model checking is invoked with the actual precision requirements (\(10^{-3}\) relative precision), yielding speed-ups for some instances.

Exact model checking is comparatively costly. The use of exact (infinite-precision) arithmetic induces increasingly large number representations. Moreover, approximative numerical solution methods cannot be applied. However, on a few instances where numerical methods do not work well, exact model checking was superior to the remaining configurations.
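To illustrate why exact arithmetic becomes costly, consider the small standalone example below; it assumes GMP's C++ interface (gmpxx) and is unrelated to Storm's code base. Repeatedly combining rationals with different denominators makes numerators and denominators grow, whereas floating-point numbers keep a fixed size at the cost of rounding.

#include <gmpxx.h>
#include <iostream>

int main() {
    mpq_class value(1, 3);
    for (int i = 0; i < 5; ++i) {
        // Mixing denominators 3, 11, and 13 lets the exact representation
        // grow with every iteration.
        value = value * mpq_class(7, 11) + mpq_class(1, 13);
        std::cout << value << '\n';
    }
    return 0;
}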

Figure 9a, b shows that—for this benchmark set—the automatic engine improved the runtime of the sparse engine in many cases and that there were only a few instances where it was outperformed by the (notional) fastest engine.

Fig. 10 Performance of Storm compared with other state-of-the-art model checkers. All figures are taken from [60], licensed under the Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/

7.4 Summary from QComp 2019

We briefly recap the results of QComp 2019, focusing on the performance evaluation. For further details, we refer to the competition report [60].

The experimental setup of QComp 2019 (benchmark selection, precision requirements, time and memory limits, etc.) coincides with our setup as detailed above, except that

  • a different machine was used, and

  • Storm was considered in version 1.3.0.

Each tool was executed in two different modes: once with default settings (which for Storm coincides with using the sparse engine) and once with benchmark-specific settings. For the latter mode, the participants could provide a tailored tool invocation for each individual benchmark instance. For Storm, this was realized by empirically determining the fastest configuration for a given instance, where we considered the configurations sparse, hybrid, dd, bisim, sound, and exact (as above).

Figure 10 depicts the performance results of QComp 2019 that are relevant for Storm. The quantile plots in Fig. 10a, b compare Storm with the other participating general-purpose probabilistic model checkers Epmc [62], mcsta [68], and Prism [93], using the default and specific modes, respectively. Storm supported 96 of the 100 considered benchmark instances, whereas Epmc, Prism, and mcsta supported 63, 58, and 86 instances, respectively. The quantile plots only take into account the 43 instances that were supported by all four tools. In particular, all of these benchmarks are given in the Prism language, since Prism does not support Jani. The scatter plots in Fig. 10c, d compare Storm with the best of the other 8 participating tools; a point above the solid diagonal line indicates that on the corresponding instance, Storm was the fastest tool among all participants.

Considering the results for the default mode in Fig. 10a, Storm outperforms the other three tools, although the results of Storm and Prism are very close to each other. For instance-specific invocations (Fig. 10b), Storm clearly outperformed all its competitors. The scatter plots show that Storm performed best among all tools on one third of the supported benchmarks in default mode and on half of the supported benchmarks in specific mode.

7.5 Outlook to QComp 2020

Since QComp 2019, the participating tools have made further progress. For example, new and efficient model checking techniques for MDPs and MAs have been implemented in mcsta [27, 71]. QComp 2020 [26] captures some of these changes and places special emphasis on the correctness of the results produced by the tools. In contrast to the 2019 edition, the performance evaluation is divided into six tracks. The tracks consider the same benchmark set but impose different correctness requirements, ranging from exact results to often \(\varepsilon \)-correct results. Among all nine participants, Storm was the only tool implementing algorithms for all tracks, and it proved competitive in each of them. More details on QComp 2020 can be found in its competition report [26].

We remark that both QComp 2019 and QComp 2020 necessarily only provide a snapshot of the tool landscape at the time of the evaluation. A repetition of the evaluation of QComp with newer tool versions can yield different results.

8 Conclusion

This paper presented the state-of-the-art probabilistic model checker Storm. We have discussed its main distinguishing features and described how it can be used for the rapid prototyping of new algorithms and tools. Key aspects of Storm are its modularity, its accessibility through a Python interface, its various modeling formalisms, and its functionalities that go beyond standard probabilistic model checking algorithms. We believe that its modularity, the careful crafting of the most time-consuming operations, and our experience with earlier, in-house developed model checkers have led to a tool that is competitive with existing probabilistic model checkers. Storm provides an effective and efficient platform for future-proof developments in probabilistic model checking. It is open source and publicly available at http://stormchecker.org. A major challenge will be to keep up with the rapid progress in the field. This involves not only implementing new algorithms, but also constantly revising existing code.