AMYTISS: Parallelized Automated Controller Synthesis for Large-Scale Stochastic Systems

In this paper, we propose a software tool, called AMYTISS, implemented in C++/OpenCL, for designing correct-by-construction controllers for large-scale discrete-time stochastic systems. This tool is employed to (i) build finite Markov decision processes (MDPs) as finite abstractions of given original systems, and (ii) synthesize controllers for the constructed finite MDPs satisfying bounded-time high-level properties including safety, reachability and reach-avoid specifications. In AMYTISS, scalable parallel algorithms are designed such that they support the parallel execution within CPUs, GPUs and hardware accelerators (HWAs). Unlike all existing tools for stochastic systems, AMYTISS can utilize high-performance computing (HPC) platforms and cloud-computing services to mitigate the effects of the state-explosion problem, which is always present in analyzing large-scale stochastic systems. We benchmark AMYTISS against the most recent tools in the literature using several physical case studies including robot examples, room temperature and road traffic networks. We also apply our algorithms to a 3-dimensional autonomous vehicle and 7-dimensional nonlinear model of a BMW 320i car by synthesizing an autonomous parking controller.


Motivations
Large-scale stochastic systems are an important modeling framework to describe many real-life safety-critical systems such as power grids, traffic networks, selfdriving cars, and many other applications. For this type of complex systems, automating the controller synthesis procedure to achieve high-level specifications, e.g., those expressed as linear temporal logic (LTL) formulae [24], is inherently very challenging mainly due to their computational complexity arising from uncountable sets of states and actions. To mitigate the encountered difficulty, finite abstractions, i.e., systems with finite state sets, are usually employed as replacements of original continuous-space systems in the controller synthesis procedure. More precisely, one can first abstract a given continuous-space system by a simpler one, e.g., a finite Markov decision process (MDP), and then perform analysis and synthesis over the abstract model (using algorithmic techniques from computer science [3]). Finally, the results are carried back to the original system, while providing a guaranteed error bound [5,[13][14][15][16][17][18][19][20][21]23].
Unfortunately, construction of finite MDPs for large-scale complex systems suffers severely from the so-called curse of dimensionality: the computational complexity grows exponentially as the number of state variables increases. To alleviate this issue, one promising solution is to employ high-performance computing (HPC) platforms together with cloud-computing services to mitigate the state-explosion problem. In particular, HPC platforms have a large number of processing elements (PEs) and this significantly affects the time complexity when serial algorithms are parallelized [7].

Contributions
The main contributions and merits of this work are: (1) We propose a novel data-parallel algorithm for constructing finite MDPs from discrete-time stochastic systems and storing them in efficient distributed data containers. The proposed algorithm handles large-scale systems. (2) We propose a parallel algorithm for synthesizing discrete controllers using the constructed MDPs to satisfy safety, reachability, or reach-avoid specifications. More specifically, we introduce a parallel algorithm for the iterative computation of Bellman equation in standard dynamic programming [26,27]. (3) Unlike the existing tools in the literature, AMYTISS accepts bounded disturbances and natively supports both additive and multiplicative noises with different practical distributions including normal, uniform, exponential, and beta.
We apply the proposed implementations to real-world applications including robot examples, room temperature and road traffic networks, and autonomous vehicles. This extends the applicability of formal methods to some safetycritical real-world applications with high dimensions. The results show remarkable reductions in the memory usage and computation time outperforming all existing tools in the literature. We provide AMYTISS as an open-source tool. After compilation, AMYTISS is loaded via pFaces [10] and launched for parallel execution within available parallel computing resources. The source of AMYTISS and detailed instructions on its building and running can be found in: https://github.com/mkhaled87/pFaces-AMYTISS Due to lack of space, we provide details of traditional serial and proposed parallel algorithms, case studies, etc. in an arXiv version of the paper [12].

Related Literature
There exist several software tools on verification and synthesis of stochastic systems with different classes of models. SReachTools [30] performs stochastic reachability analysis for linear, potentially time-varying, discrete-time stochastic systems. ProbReach [25] is a tool for verifying the probabilistic reachability for stochastic hybrid systems. SReach [31] solves probabilistic bounded reachability problems for two classes of models: (i) nonlinear hybrid automata with parametric uncertainty, and (ii) probabilistic hybrid automata with additional randomness for both transition probabilities and variable resets. Modest Toolset [6] performs modeling and analysis for hybrid, real-time, distributed and stochastic systems. Two competitions on tools for formal verification and policy synthesis of stochastic models are organized with reports in [1,2].
FAUST 2 [29] generates formal abstractions for continuous-space discrete-time stochastic processes, and performs verification and synthesis for safety and reachability specifications. However, FAUST 2 is originally implemented in MATLAB and suffers from the curse of dimensionality due to its lack of scalability for large-scale models. StocHy [4] provides the quantitative analysis of discrete-time stochastic hybrid systems such that it constructs finite abstractions, and performs verification and synthesis for safety and reachability specifications.
AMYTISS differs from FAUST 2 and StocHy in two main directions. First, AMYTISS implements novel parallel algorithms and data structures targeting HPC platforms to reduce the undesirable effects of the state-explosion problem. Accordingly, it is able to perform parallel execution in different heterogeneous computing platforms including CPUs, GPUs and HWAs. Whereas, FAUST 2 and StocHy can only run serially on one CPU, and consequently, it is limited to small systems. Additionally, AMYTISS can handle the abstraction construction and controller synthesis for two and a half player games (e.g., stochastic systems with bounded disturbances), whereas FAUST 2 and StocHy only handle one and a half player games (e.g., disturbance-free systems).
Unlike all existing tools, AMYTISS offers highly scalable, distributed execution of parallel algorithms utilizing all available processing elements (PEs) in any heterogeneous computing platform. To the best of our knowledge, AMYTISS is the only tool of its kind for continuous-space stochastic systems that is able to utilize all types of compute units (CUs), simultaneously.
We compare AMYTISS with FAUST 2 and StocHy in Table 1 in detail in terms of different technical aspects. Although there have been some efforts in FAUST 2 and StocHy for parallel implementations, these are not compatible with HPC platforms. Specifically, FAUST 2 employs some parallelization techniques using parallel for-loops and sparse matrices inside Matlab, and StocHy uses Armadillo, a multithreaded library for scientific computing. However, these tools are not designed for the parallel computation on HPC platforms. Consequently, they can only utilize CPUs and cannot run on GPUs or HWAs. In comparison, AMYTISS is developed in OpenCL, a language specially designed for data-parallel tasks, and supports heterogeneous computing platforms combining CPUs, GPUs and HWAs. Note that FAUST 2 and StocHy do not natively support reach-avoid specifications in the sense that users can explicitly provide some avoid sets. Implementing this type of properties requires some modifications inside those tools. In addition, we do not make a comparison here with SReachTools since it is mainly for stochastic reachability analysis of linear, potentially time-varying, discrete-time stochastic systems, while AMYTISS is not limited to reachability analysis and can handle nonlinear systems as well.
Note that we also provide a script in the tool repository 1 that converts the MDPs constructed by AMYTISS into PRISM-input-files [11]. In particular, AMYTISS can natively construct finite MDPs from continuous-space stochastic control systems. PRISM can then be employed to perform the controller synthesis for those classes of complex specifications that AMYTISS does not support.

Discrete-Time Stochastic Control Systems
We formally introduce discrete-time stochastic control systems (dt-SCS) below.

Definition 1. A discrete-time stochastic control system (dt-SCS) is a tuple
where, -X ⊆ R n is a Borel space as the state set and (X,B(X)) is its measurable space; -U ⊆ R m is a Borel space as the input set; -W ⊆ R p is a Borel space as the disturbance set; ς is a sequence of independent and identically distributed (i.i.d.) random variables from a sample space Ω to a measurable set V ς f : X ×U ×W → X is a measurable function characterizing the state evolution of the system.
The state evolution of Σ, for a given initial state x(0) ∈ X, an input sequence ν(·) : N → U , and a disturbance sequence w(·) : N → W , is characterized by the difference equations where Υ (k) := ς(k) with V ς = R n for the case of the additive noise, and Υ (k) := ς(k)x(k) with V ς equals to the set of diagonal matrices of the dimension n for the case of the multiplicative noise [22]. We keep the notation Σ to indicate both cases and use respectively Σ a and Σ m when discussing these cases individually. We should mention that our parallel algorithms are independent of the noise distribution. For an easier presentation of the contribution, we present our algorithms and case studies based on normal distributions but our tool natively supports other practical distributions including uniform, exponential, and beta. In addition, we provide a subroutine in our software tool so that the user can still employ the parallel algorithms by providing the density function of the desired class of distributions.

Remark 1.
Our synthesis is based on a max-min optimization problem for two and a half player games by considering the disturbance and input of the system as players [9]. Particularly, we consider the disturbance affecting the system as an adversary and maximize the probability of satisfaction under the worstcase strategy of a rational adversary. Hence, we minimize the probability of satisfaction with respect to disturbances, and maximize it over control inputs.
One may be interested in analyzing dt-SCSs without disturbances (cf. case studies). In this case, the tuple (1) reduces to Σ = (X, U, ς, f ), where f : X ×U → X, and the Eq. (2) can be re-written as Note that input models in this tool paper are given inside configuration text files. Systems are described by stochastic difference equations as (2)-(3), and the user should provide the right-hand-side of equations 2 . In the next section, we formally define MDPs and discuss how to build finite MDPs from given dt-SCSs.

Finite Markov Decision Processes (MDPs)
A dt-SCS Σ in (1) is equivalently represented by the following MDP [8, Proposition 7.6]: , is a conditional stochastic kernel that assigns to any x ∈ X, ν ∈ U , and w ∈ W, a probability measure T x (·|x, ν, w). The alternative representation as the MDP is utilized in [28] to approximate a dt-SCS Σ with a finite MDP Σ using an abstraction algorithm. This algorithm first constructs a finite partition of the state set X = ∪ i X i , the input set U = ∪ i U i , and the disturbance set W = ∪ i W i . Then representative pointsx i ∈ X i ,ν i ∈ U i , andw i ∈ W i are selected as abstract states, inputs, and disturbances. The transition probability matrix for the finite MDP Σ is also computed aŝ where the map Ξ : X → 2 X assigns to any x ∈ X, the corresponding partition element it belongs to, i.e., Ξ(x) = X i if x ∈ X i . SinceX,Û andŴ are finite sets,T x is a static map. It can be represented with a matrix and we refer to it, from now on, as the transition probability matrix. For a given logic specification ϕ and accuracy level , the discretization parameter δ can be selected a priori such that where depends on the horizon of formula ϕ, the Lipschitz constant of the stochastic kernel, and the state discretization parameter δ (cf. [28,Theorem 9]). We refer the interested reader to the arXiv version [12] for more details.
In the next sections, we propose novel parallel algorithms for the construction of finite MDPs and the synthesis of their controllers.

Parallel Construction of Finite MDPs
In this section, we propose an approach to efficiently compute the transition probability matrixT x of the finite MDP Σ, which is essential for any controller synthesis procedure, as we discuss later in Sect. 5.

Data-Parallel Threads for ComputingT X
The serial algorithm for computingT x is presented in Algorithm 1 in the arXiv version [12]. Computations of mean μ = f (x i ,ν j ,w k , 0), PDF(x | μ, Σ), where PDF stands for probability density functions and Σ is a noise covariance matrix, and ofT x all do not share data from one inner-loop to another. Hence, this is an embarrassingly data-parallel section of the algorithm. pFaces [10] can be utilized to launch necessary number of parallel threads on the employed hardware configuration (HWC) to improve the computation time of the algorithm. Each thread will eventually compute and store, independently, its corresponding values withinT x .

Less Memory for Post States inT X
T x is a matrix with the dimension of (n x ×n ν ×n w , n x ). The number of columns is n x as we need to compute and store the probability for each reachable partition element Ξ(x l ), corresponding to the representing post state x l . Here, we consider the Gaussian PDFs for the sake of a simpler presentation. For simplicity, we now focus on the computation of tuple (x i ,ν j ,w k ). In many cases, when the PDF is decaying fast, only partition elements near μ have high probabilities of being reached, starting fromx i and applying an inputν j .
We set a cutting probability threshold γ ∈ [0, 1] to control how many partition elements around μ should be stored. For a given mean value μ, a covariance matrix Σ and a cutting probability threshold γ, x ∈ X is called a PDF cutting point if γ = PDF(x|μ, Σ). Since Gaussian PDFs are symmetric, by repeating this cutting process dimension-wise, we end up with a set of points forming a hyper-rectangle in X, which we call it the cutting region and denote it byX Σ γ . This is visualized in Fig. 1 in the arXiv version [12] for a 2-dimensional system. Any partition element Ξ(x l ) with x l outside the cutting region is considered to have zero probability of being reached. Such approximation allows controlling the sparsity of the columns ofT x . The closer the value of γ to zero, the more accurateT x in representing transitions of Σ. On the other hand, the closer the value of γ to one, less post state values need to be stored as columns inT x . The number of probabilities to be stored for each (x i ,ν j ,w k ) is then |X Σ γ |. Note that since Σ is fixed prior to running the algorithm, number of columns needed for a fixed γ can be identified before launching the computation. We can then accurately allocate a uniform fixed number of memory locations for any tuple (x i ,ν j ,w k ) inT x . Hence, there is no need for a dynamic sparse matrix data structure andT x is now a matrix with a dimension of (n x × n ν × n w , |X Σ γ |).

A Parallel Algorithm for Constructing Finite MDP Σ
We present a novel parallel algorithm (Algorithm 2 in the arXiv version [12]) to efficiently construct and storeT x as a successor. We employ the discussed enhancements in Subsect. 4.1 and 4.2 within the proposed algorithm. We do not parallelize the for-loop in Algorithm 2, Step 2, to avoid excessive parallelism (i.e., we parallelize loops only over X and U , but not over W ). Note that, practically, for large-scale systems, |X ×Û | can reach up to billions. We are interested in the number of parallel threads that can be scheduled reasonably by available HW computing units.

Parallel Synthesis of Controllers
In this section, we employ dynamic programming to synthesize controllers for constructed finite MDPs Σ satisfying safety, reachability, and reach-avoid properties [26,27]. The classical serial algorithm and its proposed parallelized version are respectively presented as Algorithms 3 and 4 in the arXiv version [12]. We should highlight that the parallelism here mainly comes from the parallelization of matrix multiplication and the loop over time-steps cannot be parallelized due to the data dependency. More details can be found in the arXiv version.

On-the-Fly Construction ofT X
In AMYTISS, we also use another technique that further reduces the required memory for computingT x . We refer to this approach as on-the-fly abstractions (OFA). In OFA version of Algorithm 4 [12], we skip computing and storing the MDPT x and the matrixT 0x (i.e., Steps 1 and 5). We instead compute the required entries ofT x andT 0x on-the-fly as they are needed (i.e., Steps 13 and 15). This significantly reduces the required memory forT x andT 0x but at the cost of repeated computation of their entries in each time step from 1 to T d . This gives the user an additional control over the trade-off between the computation time and memory.

Supporting Multiplicative Noises and Practical Distributions
AMYTISS natively supports multiplicative noises and practical distributions such as uniform, exponential, and beta distributions. The technique introduced in Subsect. 4.2 for reducing the memory usage is also tuned for other distributions based on the support of their PDFs. Since AMYTISS is designed for extensibility, it allows also for customized distributions. Users need to specify their desired PDFs and hyper-rectangles enclosing their supports so that AMYTISS can include them in the parallel computation ofT x . Further details on specifying customized distributions are provided in the README file.
AMYTISS also supports multiplicative noises as introduced in (2). Currently, the memory reduction technique of Subsect. 4.2 is disabled for systems with multiplicative noises. This means users should expect larger memory requirements for systems with multiplicative noises. However, users can still benefit from the proposed OFA version to compensate for the increase in memory requirement. We plan to include this feature for multiplicative noises in a future update of AMYTISS. Note that for a better demonstration, previous sections were presented by the additive noise and Gaussian normal PDF to introduce the concepts.

Benchmarking and Case Studies
AMYTISS is self-contained and requires only a modern C++ compiler. It supports all major operating systems: Windows, Linux and Mac OS. Once compiled, utilizing AMYTISS is a matter of providing text configuration files and launching the tool. AMYTISS implements scalable parallel algorithms that run on top of pFaces [10]. Hence, users can utilize computing power in HPC platforms and cloud computing to scale the computation and control the computational complexities of their problems. Table 2 lists the HW configuration we use to benchmark AMYTISS. The devices range from local devices in desktop computers to advanced compute devices in Amazon AWS cloud computing services.  Table 3 shows the benchmarking results running AMYTISS with these HWCs for several case studies and makes comparisons between AMYTISS, FAUST 2 , and StocHy. We employ a machine with Windows operating system (Intel i7@3.6 GHz CPU and 16 GB of RAM) for FAUST 2 , and StocHy. It should be mentioned that FAUST 2 predefines a minimum number of representative points based on the desired abstraction error, and accordingly the computation time and memory usage reported in Table 3 are based on the minimum number of representative points. In addition, to have a fair comparison, we run all the case studies with additive noises since neither FAUST 2 nor StocHy supports multiplicative noises.
To show the applicability of our results to large-scale systems, we apply our techniques to several physical case studies. We synthesize controllers for 3-and 5-dimensional room temperature networks to keep temperatures in a comfort zone. Furthermore, we synthesize controllers for road traffic networks with 3 and 5 dimensions to keep the density of the traffic below some desired level. In addition, we apply our algorithms to a 2-dimensional nonlinear robot and synthesize controllers satisfying safety and reach-avoid specifications. Finally, we consider 3-and 7-dimensional nonlinear models of an autonomous vehicle and synthesize reach-avoid controllers to automatically park the vehicles. For details of case studies, see the arXiv version [12]. Table 3 presents a comparison between AMYTISS, FAUST 2 and StocHy w.r.t the computation time and required memory. For each HWC, we show the time in seconds to solve the problem. Clearly, employing HWCs with more PEs reduces the time to solve the problem. This is a strong indication for the scalability of the proposed algorithms. Since AMYTISS is the only tool for stochastic systems that can utilize the reported HWCs, we do not compare it with other similar tools.
In Table 3, first 13 rows, we also include the benchmark provided in StocHy [4, Case study 3]. Table 4 in the arXiv version [12] shows an additional comparison between StocHy and AMYTISS on a machine with the same configuration as the one employed in [4] (a laptop having an Intel Core i7 − 8550U CPU at 1.80GHz with 8 GB of RAM). StocHy suffers significantly from the stateexplosion problem as seen from its exponentially growing computation time. AMYTISS, on the other hand, outperforms StocHy and can handle bigger systems using the same hardware. Table 3.

AMYTISS, FAUST 2
and StocHy based on their native features for several (physical) case studies. CSB refers to the continuous-space benchmark provided in [4]. † refers to cases when we run AMYTISS with the OFA algorithm. N/M refers to the situation when there is not enough memory to run the case study. N/S refers to the lack of native support for nonlinear systems.
(Kx) refers to an 1000-times speedup. The presented speedup is the maximum speedup value across all reported devices. The required memory usage and computation time for   As seen in Table 3, AMYTISS outperforms FAUST 2 and StocHy in all the case studies (maximum speedups up to 692000 times). Moreover, AMYTISS is the only tool that can utilize the available HW resources. The OFA feature in AMYTISS reduces dramatically the required memory, while still solves the problems in a reasonable time. FAUST 2 and StocHy fail to solve many of the problems since they lack the native support for nonlinear systems, they require large amounts of memory, or they do not finish computing within 24 hours.
Note that considering only dimensions of systems can be sometimes misleading. In fact, number of transitions in MDPs (|X ×Û |) can give a better judgment on the size of systems since it directly affects the memory/time needed for solving the problem. For instance in Table 3, the number of transitions for the 14-dimensional case study is 16384, while for the 5-dimensional room temperature example is 279936 transitions (i.e., almost 17 times bigger). This means AMYTISS can clearly handle much larger systems than existing tools.