Inductive Synthesis for Probabilistic Programs Reaches New Horizons

This paper presents a novel method for the automated synthesis of probabilistic programs. The starting point is a program sketch representing a finite family of finite-state Markov chains with related but distinct topologies, and a reachability specification. The method builds on a novel inductive oracle that greedily generates counter-examples (CEs) for violating programs and uses them to prune the family. These CEs leverage the semantics of the family in the form of bounds on its best- and worst-case behaviour provided by a deductive oracle using an MDP abstraction. The method further monitors the performance of the synthesis and adaptively switches between inductive and deductive reasoning. Our experiments demonstrate that the novel CE construction provides a significantly faster and more effective pruning strategy leading to an accelerated synthesis process on a wide range of benchmarks. For challenging problems, such as the synthesis of decentralized partially-observable controllers, we reduce the run-time from a day to minutes.


Introduction
Background and motivation. Controller synthesis for Markov decision processes (MDPs [35]) and temporal logic constraints is a well-understood and tractable problem, with a plethora of mature tools providing efficient solving capabilities. However, the applicability of these controllers to a variety of systems is limited: Systems may be decentralized, controllers may not be able to observe the complete system state, cost constraints may apply, and so forth. Adequate operational models for these systems exist in the form of decentralized partially-observable MDPs (DEC-POMDPs [33]). The controller synthesis problem for these models is undecidable [30], and tool support (for verification tasks) is scarce. This paper takes a different approach: the controller together with the environment can be modelled as probabilistic program sketches where "holes" in the probabilistic program model choices that the controller may make. Conceptually, the controllers of the DEC-POMDP are described by a user-defined finite family M of Markov chains. The synthesis problem that we consider is to find a Markov chain M (i.e., a probabilistic program) in the family M, such that M |= ϕ, where ϕ is the specification. To allow efficient algorithms, the family must have some structure. In particular, in our setting, the family is parameterized by a set of discrete parameters K; an assignment K → V of these parameters with concrete values V from its associated domain yields a family member, i.e., a Markov chain (MC). Such a parameterization is naturally obtained from the probabilistic program sketch, where some constants (or program parts) can be left open. The search for a family member can thus be considered as the search for a hole-assignment. This approach fits within the realm of syntax-guided synthesis [2].
Motivating example. Herman's protocol [24] is a well-studied randomized distributed algorithm aimed to obtain fast stabilization on average. In [26], a family M of MCs is used to model different protocol instances. They considered each instance separately, and found which of the controllers for Herman's protocol performs best. Let us consider the protocol in a bit more detail: It considers self-stabilization of a unidirectional ring of network stations where all stations have to behave similarly-an anonymous network. Each station stores a single bit, and can read the internal bit of one (say left) neighbour. To achieve stabilization, a station for which the two legible bits coincide updates its own bit based on the outcome of a coin flip. The challenge is to select a controller that flips this coin with an optimal bias, i.e., minimizing the expected time until stabilization. In a setting where the probabilities range over 0.1, 0.2, . . . , 0.9, this results in analyzing nine different MCs. Does the expected time until stabilization reduce if the controllers are additionally allowed to have a single bit of memory? In every step, there are 9·9 combinations for selecting the coin flip and for each memory cell and coin flip outcome, the memory can now be updated, yielding 2·2·2 possibilities. This one-bit extension thus results in a family of 648 models. If, in addition, one allows stations to make decisions depending on the token-bits, both the coin flips and the memory updates are multiplied by a factor 4, yielding 10, 368 models. Eventually, analyzing all individual MCs is infeasible.
Oracle-guided synthesis. To tackle the synthesis problem, we introduce an oracleguided inductive synthesis approach [25,39]. A learner selects a family member and passes it to the oracle. The oracle answers whether the family member satisfies ϕ, and crucially, gives additional information in case this is not the case. Inspired by [9], if the family member violates the specification ϕ, our oracle returns a set K of parameters such that all family members obtained by changing only the values assigned to K violate ϕ. We argue that such an oracle must (1) induce little overhead in providing K , (2) be aware of the existence of parameters in the family, and (3) have (resemblance of) awareness about the semantics of the parameters and their values.
Oracles. With these requirements in mind, we construct a counterexample (CE)based oracle from scratch. We do so by carefully exploiting existing methods. We construct critical subsystems as CEs [1]. Critical subsystems are parts of the MC that suffice to refute the specification. If a hole is absent in a CE, its value is irrelevant. To avoid the cost of finding optimal CEs-an NP-hard problem [19]-we consider greedy CEs that are similar to [9]. However, our greedy CEs are aware of the parameters, and try to limit the occurrence of parameters in the CE. Finally, to provide awareness of the semantics of parameter values, we provide lower and upper bounds on all states: Their difference indicates how much varying the value at a hole may change the overall reachability probability. These bounds are efficiently computed by another oracle. This oracle analyses a quotient MDP obtained by employing an abstraction method that is part of the abstraction-refinement loop in [10].
A hybrid variant. The two oracles are significantly different. Abstraction refinement is deductive: it argues about single family members by considering (an aggregation of) all family members. The critical subsystem oracle is inductive: by examining a single family member, it infers statements about other family members. This suggests a middle ground: a hybrid strategy monitors the performance of the two oracles during the synthesis and suggests their best usage. More precisely, the hybrid strategy integrates the counterexample-based oracle into the abstraction-refinement loop.
Major results. We present a novel and dedicated oracle deployed in an efficacious synthesis loop. We use model-checking results on an abstraction to tailor smaller CEs. Our greedy and family-aware CE construction is substantially faster than the use of optimal CEs. Together, these two improvements yield CEs that are on par with optimal CEs, but are found much faster. The integration of multiple abstraction-refinement steps yields a superior performance:x We compare our performance with the abstraction-refinement loop from [10] using benchmarks from [10]. Benchmarks can be classified along two dimensions: (A) Benchmarks with a structure good for CE-generation. (B) Benchmarks with a structure good for abstraction-refinement. A-benchmarks are a natural strength of our novel oracle. Our simple, efficient hybrid strategy significantly outperforms the state-ofthe-art on A-benchmarks, while it only yields limited overhead for B-benchmarks. Most importantly, the novel hybrid strategy can solve benchmarks that are out of reach for pure abstraction-refinement or pure CE-based reasoning. In particular, our hybrid method is able to synthesize the optimal Herman protocol with memory-the synthesis time on a design space with 3.1 millions of candidate programs reduces from a day to minutes.

Related work
The synthesis problems for parametric probabilistic systems can be divided into the following two categories.
Topology synthesis, akin to the problem considered in this paper, assumes a finite set of parameters affecting the MC topology. Finding an instantiation satisfying a reachability property is NP-complete in the number of parameters [12], and can naively be solved by analyzing all individual family members. An alternative is to model the MC family by an MDP and resort to standard MDP modelchecking algorithms. Tools such as ProFeat [13] or QFLan [40] take this approach to quantitatively analyze alternative designs of software product lines [21,28]. These methods are limited to small families. This motivated (1) abstractionrefinement over the MDP representation [10], and (2) counterexample-guided inductive synthesis (CEGIS) for MCs [9], mentioned earlier. The alternative problem of sketching for probabilistic programs that fit given data is studied, e.g., in [32,38].
Parameter synthesis considers models with uncertain parameters associated to transition probabilities, and analyses how the system behaviour depends on the parameter values. The most promising techniques are based on parameter lifting that treats identical parameters in different transitions independently [8,36] and has been implemented in the state-of-the-art probabilistic model checkers Storm [18] and PRISM [27]. An alternative approach based on building rational functions for the satisfaction probability has been proposed in [15] and further improved in [22,17,4]. This approach has been also applied to different problems such as model repair [5,34,11].
Both synthesis problems can be also attacked by search-based techniques that do not ensure an exhaustive exploration of the parameter space. These include evolutionary techniques [23,31] and genetic algorithms [20]. Combinations with parameter synthesis have been used [7] to synthesize robust systems.

Problem Statement
We formalize the essential ingredients and the problem statement. See [3] for more material.

Definition 1 (MC).
A Markov chain (MC) is a tuple D = (S, s 0 , P ), where S is a finite set of states, s 0 ∈ S is an initial state, and P : S → Distr(S) is a transition probability function. We write P (s, t) to denote P (s)(t). The state s is absorbing if P (s, s) = 1.
Let K denote a finite set of discrete parameters with finite domain V k . For brevity, we often assume that all domains are the same, and omit the subscript k. A realization r maps parameters to values in their domain, i.e., r : K → V . Let R D denote the set of all realizations of a set D of MCs. A K-parameterized set of MCs D(K) contains the MCs D r , for every r ∈ R D . In Sect. 3, we give an operational model for such sets. In particular, realizations will fix the targets of transitions. In our experiments, we describe these sets using the PRISM modelling language where parameters are described by undefined integer values.
Properties and specifications. For simplicity, we consider (unbounded) reachability properties 1 . For a set T ⊆ S of target states, let P[D, s |= ♦T ] denote the probability in MC D to eventually reach some state in T when starting in the state s ∈ S. A property ϕ ≡ P λ [♦T ] with λ ∈ [0, 1] and ∈ {≤, ≥} expresses that the probability to reach T does relate to λ according to . If = ≤, then ϕ is a safety property; otherwise, it is a liveness property. Formally, state s in MC D satisfies ϕ if P[D, s |= ♦T ] ≥ λ. The MC D satisfies ϕ if the above holds for its initial state. A specification is a set of properties Φ = {ϕ i } i∈I , and D |= Φ if ∀i ∈ I : D |= ϕ i .

Problem statement.
The key problem statement in this paper is feasibility: Given a parameterized set of Markov chains D(K) over parameters K and a specification Φ, find a realization r : When D is clear from the context, we often write r |= Φ to denote D r |= Φ.
We additionally consider the optimizing variant of the synthesis problem. The maximal synthesis problem asks: given a maximizing property The minimal synthesis problem is defined analogously.
As the state space S, the set K of parameters, and their domains are all finite, the above synthesis problems are decidable. One possible solution, called the one-by-one approach [14], considers each realization r ∈ R D . The state-space and parameter-space explosion renders this approach unusable for large problems, necessitating the usage of advanced techniques that exploit the family structure.

Counterexample-Guided Inductive Synthesis
In this section, we recap a baseline for a counterexample-guided inductive synthesis (CEGIS) loop, as put forward in [9]. In particular, we first instantiate an oracle-guided synthesis method, discuss an operational model for families, giving structure to the parameterized set of Markov chains, and finally detail the usage of CEs to create an oracle.

Learner
Oracle  Fig. 1. A learner takes a set R of realizations, and has to find a realization D r satisfying the specification Φ. The learner maintains (a symbolic representation of) a set Q ⊆ R of realizations that need to be checked. It iteratively asks the oracle whether a particular r ∈ Q is a solution. If it is a solution, the oracle reports success. Otherwise, the oracle returns a set R containing r and potentially more realizations all violating Φ. The learner then prunes R from Q. In Section 4, we focus on creating an efficient oracle that computes a set R (with r ∈ R ) of realizations that are all violating Φ. In Section 5, we provide a more advanced framework that extends this method. The remainder of this section lays the groundwork for these sections.

Families of Markov chains
To avoid the need to iterate over all realizations, an efficient oracle exploits some structure of the family. In this paper, we focus on sets of Markov chains having different topologies. We explain our concepts using the operational model of families given in [10]. Our implementation supports (more expressive) PRISM programs with undefined integer constants.

Definition 2 (Family of MCs).
A family of MCs is a tuple D = (S, s 0 , K, B) with S and s 0 as before, K is a finite set of parameters with domains V k ⊆ S for each k ∈ K, and B : S → Distr(K) is a family of transition probability functions.
Function B of a family D of MCs maps each state to a distribution over parameters K. In the context of the synthesis of probabilistic models, these parameters represent unknown options or features of a system under design. Realizations are now defined as follows.

Definition 3 (Realization). A realization of a family
The set R D = k∈K V k of all possible realizations is exponential in |K|.

Counterexample-guided oracles
We first consider the feasibility synthesis for a single-property specification and later, cf. Remark 1, generalize this to multiple properties and to optimal synthesis. The notion of counterexamples is at the heart of the oracle from [9] and Sect. 4.
If an MC D |= ϕ, a counterexample (CE) based on a critical subsystem can serve as diagnostic information about the source of the failure. We consider the following CE, motivated by the notion of critical subsystem in [37].

Definition 4 (Counterexample). Let
where the transition probability function P is defined by: The set C and the sub-MC D↓C are called a counterexample (CE) for the property Let D r be an MC violating the specification ϕ. To compute other realizations violating ϕ, the oracle computes a critical subsystem D r ↓C, which is then used to deduce a so-called conflict for D r and ϕ.  It is straightforward to compute a set of violating realizations from a conflict. A generalization of realization r induced by the set We often use the term conflict to refer to its generalization. The size of a conflict, i.e., the number |K C | of relevant parameters K C is crucial. Small conflicts potentially lead to generalizing r to larger subfamilies r↑K C . It is thus important that the CEs contain as few parameterized transitions as possible. The size of a CE in terms of the number of states is not of interest. Furthermore, the overhead of providing CEs should be bounded from below by the payoff: Finding a large generalization may take some time, but small generalizations should be returned quickly. The CE-based oracle in [9] uses an off-the-shelf CE procedure [16,41], and mostly does not provide small CEs.

A Smart Oracle with Counterexamples and Abstraction
This section develops an oracle based on CEs, tailored for the use in an oracleguided inductive synthesis loop described in Sect. 3. Its main features are: -a fast greedy approach to compute CEs that provide small conflicts: We achieve this by taking into account the position of the parameters. -awareness about the semantics of parameters by using model-checking results from an abstraction of the family. Before going into details, we provide some illustrative examples.
A motivating example First, we illustrate what it means to take CEs that lead to small conflicts. Consider and a family B of transition probability functions defined in Fig. 3 (left). As the Can we do better? To answer this, consider CE generation as a game: The Pruner creates a critical subsystem C. The Adversary wins if it finds a MC satisfying ϕ containing C, thus refuting that C is a counterexample. In our setting, we change the game: The Adversary must select a family member rather than an arbitrary MC. Analogously, off-the-shelf CE generators construct a critical subsystem C that for every possible extension of C is a CE. These are CEs without context. In our game, the Adversary may not extend the MC arbitrarily, but must choose a family member. These are CEs modulo a family.
Back to the example: Observe that for a CE for D r0 , we could omit states t and s 1 from the set C of critical states: we know for sure that, once D r0 takes transition (s 0 , s 1 ), it will reach target state t with probability at least 0.6. This exceeds the threshold 0.3, regardless of the value of the parameter Y . Hence, for family D, the set C = {s 0 } is a critical subsystem. The immediate advantage is that this set induces conflict K C = {X} (parameter Y has been generalized). This enables us to reject all realizations from the set r 0 ↑K C = {r 0 , r 1 }. It is 'easier' to construct a CE for a (sub)family than for arbitrary MCs. More generally, a successful oracle needs to have access to useful bounds, and effectively integrate them in the CE generation.

Counterexample construction
We develop an algorithm using bounds on reachability probabilities, similar to the bounds used above. Let us assume that for some set of realizations R and for every state s, we have bounds lb R (s), ub R (s), such that for every r ∈ R we have lb R (s) ≤ P[D r , s |= ♦T ] ≤ ub R (s). Such bounds always exist (take 0 and 1). We see later how we compute these bounds.
In what follows, we fix r and denote D r = (S, s 0 , P ). Let us assume D r violates a safety property ϕ ≡ P ≤λ [♦T ]. The following definition is central:  − −− → s may be considered a 'shortcut' that by-passes successors of s and leads straight to target s with probability γ(s). To ensure that D↓C[γ] is a CE, the value γ(s) must be a lower bound on the reachability probability from s in D. When constructing a CE for a singular MC, we pick γ = 0, whereas when this MC is induced by a realization r ∈ R, we can safely pick γ = lb R . The CE will be valid for every r ∈ R. It is a CE-modulo-R.
Algorithmically, we employ a state-exploration approach and therefore start with C (0) = ∅, i.e., all states are initially rerouted. If this is a CE, we are done. Otherwise, if the rerouting D↓C (0) [γ] satisfies ϕ , then we 'expand' some states to obtain a CE. Naturally, we must expand reachable states to change the satisfaction of ϕ. By expanding some state s ∈ S, we abandon the abstraction associated with the shortcut s γ(s) − −− → s and replace it with concrete behavior that was inherent to state s in MC D. Expanding a state cannot decrease the induced reachability probability as lb R is a valid lower bound. This gradual expansion of the reachable state space continues until for some C ⊆ S the corresponding rerouting D↓C[γ] violates ϕ . This gradual expansion process terminates as D↓S[γ] ≡ D and our assumption is D |= ϕ. We show this process on an example.
and the corresponding conflict is K C (1) = supp(s 0 ) = {X}. This is smaller than the naively computed conflict {X, Y }. Fig. 2 that for an MC D r with D r |= ϕ, multiple CEs may exist inducing different conflicts. An efficient expansion strategy should yield a CE that induces a small amount of relevant parameters (to prune more family members) and this CE is preferably obtained by a small number of model-checking queries. The method presented in Alg. 1 meets these criteria. The algorithm expands multiple states between subsequent model checks, while expanding only states that are associated with parameters that are relevant. In particular, in each iteration, we keep track of the set K (i) of relevant parameters optimistically starting with K (0) = ∅. We compute (see line 3) the set C (i) of states that are reachable from the initial state via states which are associated only with relevant parameters in K (i) , i.e., via states for which supp(B(s)) ⊆ K (i) . Here, H (i) represents a state exploration 'horizon': the set of states reachable from C (i) but containing some (still) irrelevant parameters. We then construct the corresponding rerouting D↓C (i) [γ] and check whether it is a CE. Otherwise, we greedily choose a state s from the horizon H (i) containing the least number of irrelevant parameters and add these parameters to our conflict (see line 7). The resulting conflict may not be minimal, but is computed fast. Our algorithm applies to probabilistic liveness properties 2 too using γ = ub R .

Computing bounds
We compute lb R and ub R using an abstraction [10]. The method considers some set R of realizations and computes the corresponding quotient Markov decision process (MDP) that over-approximates the behavior of all MCs in the family R. Model checking this MDP yields an upper and a lower bound of the induced probabilities for all states over all realizations in R. That is, Bound(D, R) computes lb R ∈ R S and ub R ∈ R S such that for each s ∈ S: To allow for refinement, two properties are crucial (with point-wise inequalities): In [10], the abstraction and refinement together define an abstraction-refinement loop (AR) that addresses the feasibility problem. In the worst case, this loop analyses 2 · |R| quotient MDPs, which (as of now) may be arbitrarily larger than the number of family members they represent.

Hybrid Dual-Oracle Synthesis
We introduce an extended synthesis loop in which the abstraction-based reasoning is used to prune the family R, and to accelerate the CE-based oracle from Sect. 4. The intuitive idea is outlined in Fig. 5. Note that if the CE-based oracle is not exploited, we emulate AR (explained in computing bounds above), whereas if the abstraction oracle is not used, we emulate CEGIS (with the novel oracle). Let us motivate combining these oracles in a flexible way. The naive version outlined in the previous section assumed a single abstraction step, and invokes CEGIS with the bounds obtained from that step. Evidently, the better (tighter) the bounds γ, the better the CEs. However, the abstraction-based bounds for R may be very loose. These bounds can be improved by splitting the set R and using the bounds on the two sub-families. The idea is to run a limited number of AR steps and then invoke CEGIS. Our experiments reveal that it can be crucial to be adaptive, i.e., the integrated method must be able to detect at run time when to switch.
The proposed hybrid method switches between AR and CEGIS, where we allow for refining during the AR phase and use the obtained refined bounds during CEGIS. Additionally, we estimate the efficiency σ (e.g., the number of pruned MCs per time unit) of the two methods and allocate more time t to the method with superior performance. That is, if we detect that CEGIS prunes sub-families twice as fast as AR, we double the time in the next round for CEGIS. The resulting algorithm is summarized in Alg. 2. Recall that AR (at line 5) takes one family from R, either solves it or splits it and returns the set of undecided families R . In contrast, CEGIS processes multiple families from R until the timeout and then returns the set of undecided families R . This workflow is motivated by the fact that one iteration of AR (i.e., the involved MDP model-checking) is typically significantly slower that one CEGIS iteration.

Remark 1.
Although the developed framework for integrated synthesis has been discussed in the context of feasibility with respect to a single property ϕ, it can be easily generalized to handle multiple-property specifications as well as to treat optimal synthesis. Regarding multiple properties, the idea remains the same: Analyzing the quotient MDP with respect to multiple properties yields multiple probability bounds. After initiating a CEGIS-loop and obtaining an unsatisfiable realization, we can construct a separate conflict for each unsatisfied property, while using the corresponding probability bound to enhance the CE generation process. Optimal synthesis is handled similarly to feasibility, but, after obtaining a satisfiable solution, we update the optimizing property to exclude this solution: e.g., for maximal synthesis this translates to increasing the threshold of the maximizing property. Having exhausted the search space of family members, the last obtained solution is declared to be the optimal one.

Experimental evaluation
Implementation. We implemented the hybrid oracle on top of the probabilistic model checker Storm [18]. While the high-performance parts were implemented in C++, we used a python API to flexibly construct the overall synthesis loop. For SMT solving, we used Z3 [29]. The tool chain takes a PRISM [27] or JANI [6] sketch and a set of temporal properties, and returns a satisfying realization, if such exists, or outputs that such realization does not exist. The implementation in the form of an artefact is available at https://zenodo.org/record/4422543.

Set-up.
We compare the adaptive oracle-guided synthesis with two state-of-the-art synthesis methods: program-level CEGIS [9] using a MaxSat CE generation [16,41] and AR [10]. These use the same architecture and data structures from Storm. All experiments are run on an Ubuntu 19.04 machine with Intel i5-8300H (4 cores at 2.3 GHz) and using up to 8 GB RAM, with all the algorithms being executed on a single thread. The benchmarks consists of five different models, see Table 1, from various domains that were used in [9,10]. As opposed to the benchmark considered in [9,10], we use larger variants of Grid and Herman to better demonstrate differences in the performance of individual methods.
To investigate the scalability of the methods, we consider a new variant of the Herman model, that allows us to scale the number of randomization strategies and thus the family size. In particular, we will compare performance on two instances of different sizes: small Herman * (5k members) and large Herman * (3.1M members, other statistics are reported in Table 1).
To reason about the pruning efficiency of different synthesis methods, we want to avoid feasible synthesis problems, where the order of family exploration can lead to inconsistent performance. Instead, we will primarily focus on nonfeasible problems, where all realizations need to be explored in order to prove unsatisfiability. The experimental evaluation is presented in three parts. (1) We evaluate the novel CE construction method and compare it with the MaxSat-based oracle from [9]. (2) We compare the hybrid synthesis loop with the two baselines AR and CEGIS. (3) We consider novel hard synthesis instances (multi-property synthesis, finding optimal programs) on instances of the model Herman * .

Comparing CE construction methods
We consider the quality of the CEs and their generation time. In particular, we want to investigate (1) whether using CEs-modulo-families yields better CEes, (2) how the quality of CEs from the smart oracle compares to the MaxSat-based oracle, and how their time consumption compares. As a measure of quality of a CE, the average number of its relevant parameters w.r.t. the total number of its parameters is taken. That is, smaller ratios imply better CEs. To measure the influence of using CEs-modulo-families, two types of bounds are used: (i) trivial bounds (i.e., γ = 0 for safety and γ = 1 for liveness properties), and (ii) non-trivial bounds corresponding to the entire family R D representing the most conservative estimate. The results are reported in (the left part of) Table 2. In the next subsection, we investigate this same benchmark from the point of view of the performance of the synthesis methods, which also shows the immediate effect of the new CE generation strategy.
The first observation is that using non-trivial bounds (as opposed to trivial ones) for the state expansion approach can drastically decrease the conflict size. It turns out that the CEs obtained using the greedy approach are mostly larger than those obtained with the MaxSat method. However (see Grid ), even for trivial bounds, we may obtain smaller CEs than for MaxSat: computing a minimal-command CE does not necessarily induce an optimal conflict. On the other hand, comparing the run-times in the parentheses, one can see that computing CEs via the greedy state expansion is orders of magnitude faster than computing command-optimal ones using MaxSat. It is good to realize that the greedy method makes at most |K| model-checking queries to compute CEs, while the MaxSat method may make exponentially many such queries. Overall, the greedy method using the non-trivial bounds is able to obtain CEs of comparable quality as the MaxSat method, while being orders of magnitude faster.
Performance comparison with AR/CEGIS We compare the hybrid synthesis loop from Sect. 5 with two state-of-the-art baselines: CEGIS and AR. The results are displayed in (the right half of) Table 2. In all 10 cases, the hybrid method outperforms the baselines. It is up to an order of magnitude faster.
Let us discuss the performance of the hybrid method. We classify benchmarks along two dimensions: (1) the performance of CEGIS and (2) the performance of AR. Based on the empirical performance, we classify (Grid ) as good-for-CEGIS (and not for AR), Maze, Pole and DPM as good-for-AR (and not for CEGIS), and Herman as hard (for both). Roughly, AR works well when the quotient MDP does not blow up and its analysis is precise due to consistent schedulers, i.e., when the parameter dependencies are not crucial for a precise analysis. CEGIS performs well when the CEs are small and fast to compute. On the other hand, synthesis problems for which neither pure CEGIS nor pure AR are able to effectively reason about non-trivial subfamilies, inherently profit from a hybrid method. The main point we want to discuss is how the hybrid method reinforces the strengths of both methods, rather than their weaknesses.
In the hybrid method, there are two factors that determine the efficiency: (i) how fast do we get bounds on the reachability probability that are tight enough to enable construction of good counterexamples? and (ii) how good are the constructed counterexamples? The former factor is attributed to the proposed adaptive scheme (see Alg. 2), where the method will prefer AR-like analysis and continue refinement until the computed bounds allow construction of small counterexamples. The latter is reflected above. Let us now discuss how these two aspects are reflected in the benchmarks.
In good-for-CEGIS benchmarks like Grid, after analyzing a quotient MDP for the whole family, the hybrid method mostly profits from better CEs yielding better bounds, thus outperforming CEGIS. Indeed, the CEs are found so fast that the bottleneck is no longer their generation. This also explains why the speedup is not immediately translated to the speedup on the overall synthesis loop. In the good-for-AR benchmark DPM, the hybrid method provides only a minor improvement as it has to perform a large number of AR-iterations before the novel CE-based pruning can be effectively used. This can be considered as the worst-case scenario for the hybrid method. On other good-for-AR benchmarks like Maze and Pole, the good performance on AR allows to quickly obtain tight bounds which can then be exploited by CEGIS. Finally, in hard models like Herman, abstraction-refinement is very expensive, but even the bounds from the first round yield bounds that, as opposed to the trivial bounds, now enable good CEs: CEGIS can keep using these bounds to quickly prune the state space.
More complicated synthesis problems Our new approach can push the limits of synthesis benchmarks significantly. We illustrate this by considering a new variant of the Herman model, Herman * , and a property imposing an upper bound on the expected number of rounds until stabilization. We put this bound just below the optimal (i.e., the minimal) value, yielding a hard non-feasible problem. The synthesis results are summarized in Table 3. As CEGIS performs poorly on Herman, it is excluded here.  Table 3. The impact of scaling the family size (of the Herman * model) and handling more complex synthesis problems. The left part shows the results for the smaller variant (5k members), the right part for the larger one (3.1M members).
First, we investigate on small Herman * how the methods can handle the synthesis for multi-property specifications. We add one feasible property to the (still non-feasible) specification (row 'two properties'). While including more properties typically slows down the AR computation, the performance of the hybrid method is not affected as the corresponding overhead is mitigated by additional pruning opportunities. Second, we consider optimal synthesis for the property as used in the feasibility synthesis. The hybrid method requires only a minor overhead to find an optimal solution compared to checking feasibility. This overhead is significantly larger for AR.
Next, we consider larger Herman * model having significantly more randomization strategies (3.1M members) that include solutions leading to a considerably faster stabilization. This model is out of reach for existing synthesis approaches: one-by-one enumeration takes more than 27 hours and the AR performs even worse-solving the feasibility and optimality problems requires 47 and 55 hours, respectively. On the other hand, the proposed hybrid method is able to solve these problems within minutes. Finally, we consider a relaxed variant of optimal synthesis (5%-optimality) guaranteeing that the found solution is up to 5% worse than the optimal. Relaxing the optimally criterion speeds up the hybrid synthesis method by about a factor three.
These experiments clearly demonstrate that scaling up the synthesis problem several orders of magnitude renders existing synthesis methods infeasible: they need tens of hours to solve the synthesis problems. Meanwhile, the hybrid method tackles these difficult synthesis problems without significant penalty and is capable of producing a solution within minutes.

Conclusion
We present a novel method for the automated synthesis of probabilistic programs. Pairing the counterexample-guided inductive synthesis with the deductive oracle using an MDP abstraction, we develop a synthesis technique enabling faster construction of smaller counterexamples. Evaluating the method on case studies from different domains, we demonstrate that the novel CE construction and the adaptive strategy lead to a significant acceleration of the synthesis process. The proposed method is able to reduce the run-time for challenging problems from days to minutes. In our future work, we plan to investigate counterexamples on the quotient MDPs and improve the abstraction refinement strategy. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.