Reactive Control Improvisation

Reactive synthesis is a paradigm for automatically building correct-by-construction systems that interact with an unknown or adversarial environment. We study how to do reactive synthesis when part of the specification of the system is that its behavior should be random. Randomness can be useful, for example, in a network protocol fuzz tester whose output should be varied, or a planner for a surveillance robot whose route should be unpredictable. However, existing reactive synthesis techniques do not provide a way to ensure random behavior while maintaining functional correctness. Towards this end, we generalize the recently-proposed framework of control improvisation (CI) to add reactivity. The resulting framework of reactive control improvisation provides a natural way to integrate a randomness requirement with the usual functional specifications of reactive synthesis over a finite window. We theoretically characterize when such problems are realizable, and give a general method for solving them. For specifications given by reachability or safety games or by deterministic finite automata, our method yields a polynomial-time synthesis algorithm. For various other types of specifications including temporal logic formulas, we obtain a polynomial-space algorithm and prove matching PSPACE-hardness results. We show that all of these randomized variants of reactive synthesis are no harder in a complexity-theoretic sense than their non-randomized counterparts.


Introduction
Many interesting programs, including protocol handlers, task planners, and concurrent software generally, are open systems that interact over time with an external environment. Synthesis of such reactive systems requires finding an implementation that satisfies the desired specification no matter what the environment does. This problem, reactive synthesis, has a long history (see [7] for a survey). Reactive synthesis from temporal logic specifications [18] has been particularly well-studied and is being increasingly used in applications such as hardware synthesis [3] and robotic task planning [14].
In this paper, we investigate how to synthesize reactive systems with random behavior: in fact, systems where being random in a prescribed way is part of their specification. This is in contrast to prior work on stochastic games where randomness is used to model uncertain environments or randomized strategies are merely allowed, not required. Solvers for stochastic games may incidentally produce randomized strategies to satisfy a functional specification (and some types of specification, e.g. multi-objective queries [4], may only be realizable by randomized strategies), but do not provide a general way to enforce randomness. Unlike most specifications used in reactive synthesis, our randomness requirement is a property of a system's distribution of behaviors, not of an individual behavior. While probabilistic specification languages like PCTL [11] can capture some such properties, the simple and natural randomness requirement we study here cannot be concisely expressed by existing languages (even those as powerful as SGL [2]). Thus, randomized reactive synthesis in our sense requires significantly different methods than those previously studied.
However, we argue that this type of synthesis is quite useful, because introducing randomness into the behavior of a system can often be beneficial, enhancing variety, robustness, and unpredictability. Example applications include:
- Synthesizing a black-box fuzz tester for a network service, we want a program that not only conforms to the protocol (perhaps only most of the time) but can generate many different sequences of packets: randomness ensures this.
- Synthesizing a controller for a robot exploring an unknown environment, randomness provides a low-memory way to increase coverage of the space. It can also help to reduce systematic bias in the exploration procedure.
- Synthesizing a controller for a patrolling surveillance robot, introducing randomness in planning makes the robot's future location harder to predict.
Adding randomness to a system in an ad hoc way could easily compromise its correctness. This paper shows how a randomness requirement can be integrated into the synthesis process, ensuring correctness as well as allowing trade-offs to be explored: how much randomness can be added while staying correct, or how strong can a specification be while admitting a desired amount of randomness?
To formalize randomized reactive synthesis we build on the idea of control improvisation, introduced in [6], formalized in [9], and further generalized in [8]. Control improvisation (CI) is the problem of constructing an improviser, a probabilistic algorithm which generates finite words subject to three constraints: a hard constraint that must always be satisfied, a soft constraint that need only be satisfied with some probability, and a randomness constraint that no word be generated with probability higher than a given bound. We define reactive control improvisation (RCI), where the improviser generates a word incrementally, alternating adding symbols with an adversarial environment. To perform synthesis in a finite window, we encode functional specifications and environment assumptions into the hard constraint, while the soft and randomness constraints allow us to tune how randomness is added to the system. The improviser obtained by solving the RCI problem is then a solution to the original synthesis problem.
The difficulty of solving reactive CI problems depends on the type of specification. We study several types commonly used in reactive synthesis, including reachability games (and variants, e.g. safety games) and formulas in the temporal logics LTL and LDL [17,5]. We also investigate the specification types studied in [8], showing how the complexity of the CI problem changes when adding reactivity. For every type of specification we obtain a randomized synthesis algorithm whose complexity matches that of ordinary reactive synthesis (in a finite window). This suggests that reactive control improvisation should be feasible in applications like robotic task planning where reactive synthesis tools have proved effective.
In summary, the main contributions of this paper are:
- The reactive control improvisation (RCI) problem definition (Sec. 3);
- The notion of width, a quantitative generalization of "winning" game positions that measures how many ways a player can win from that position (Sec. 4);
- A characterization of when RCI problems are realizable in terms of width, and an explicit construction of an improviser (Sec. 4);
- A general method for constructing efficient improvisation schemes (Sec. 5);
- A polynomial-time improvisation scheme for reachability/safety games and deterministic finite automaton specifications (Sec. 6);
- PSPACE-hardness results for many other specification types including temporal logics, and matching polynomial-space improvisation schemes (Sec. 7).
Finally, Sec. 8 summarizes our results and gives directions for future work.

Notation
Given an alphabet Σ, we write |w| for the length of a finite word w ∈ Σ^*, λ for the empty word, Σ^n for the words of length n, and Σ^{≤n} for ∪_{0≤i≤n} Σ^i, the set of all words of length at most n. We abbreviate deterministic/nondeterministic finite automaton by DFA/NFA, and context-free grammar by CFG. For an instance 𝒳 of any such formalism, which we call a specification, we write L(𝒳) for the language (subset of Σ^*) it defines (note the distinction between a language and a representation thereof). We view formulas of Linear Temporal Logic (LTL) [17] and Linear Dynamic Logic (LDL) [5] as specifications using their natural semantics on finite words (see [5]). We use the standard complexity classes #P and PSPACE, and the PSPACE-complete problem QBF of determining the truth of a quantified Boolean formula. For background on these classes and problems see for example [1].
Some specifications we use as examples are reachability games [15], where players' actions cause transitions in a state space and the goal is to reach a target state. We group these games, safety games where the goal is to avoid a set of states, and reach-avoid games combining reachability and safety goals [19], together as reachability/safety games (RSGs). We draw reachability games as graphs in the usual way: squares are adversary-controlled states, and states with a double border are target states.

Synthesis Games
Reactive control improvisation will be formalized in terms of a 2-player game which is essentially the standard synthesis game used in reactive synthesis [7]. However, our formulation is slightly different for compatibility with the definition of control improvisation, so we give a self-contained presentation here.
Fix a finite alphabet Σ. The players of the game will alternate picking symbols from Σ, building up a word. We can then specify the set of winning plays with a language over Σ. To simplify our presentation we assume that players strictly alternate turns and that any symbol from Σ is a legal move. These assumptions can be relaxed in the usual way by modifying the winning set appropriately.
Finite words: While reactive synthesis is usually considered over infinite words, in this paper we focus on synthesis in a finite window, as it is unclear how best to generalize our randomness requirement to the infinite case. This assumption is not too restrictive, as solutions of bounded length are adequate for many applications. In fuzz testing, for example, we do not want to generate arbitrarily long files or sequences of packets. In robotic planning, we often want a plan that accomplishes a task within a certain amount of time. Furthermore, planning problems with liveness specifications can often be segmented into finite pieces: we do not need an infinite route for a patrolling robot, but can plan within a finite horizon and replan periodically. Replanning may even be necessary when environment assumptions become invalid. At any rate, we will see that the bounded case of reactive control improvisation is already highly nontrivial.
As a final simplification, we require that all plays have length exactly n ∈ N. To allow a range [m, n] we can simply add a new padding symbol to Σ and extend all shorter words to length n, modifying the winning set appropriately.
Definition 2.1. A history h is an element of Σ^{≤n}, representing the moves of the game played so far. We say the game has ended after h if |h| = n; otherwise it is our turn after h if |h| is even, and the adversary's turn if |h| is odd. A strategy is a function σ : Σ^{≤n} × Σ → [0, 1] such that for any history h ∈ Σ^{≤n} with |h| < n, σ(h, ·) is a probability distribution over Σ. We write x ← σ(h) to indicate that x is a symbol randomly drawn from σ(h, ·).
The next definition is just the conditional probability of a play given a history, but works for histories with probability zero, simplifying our presentation.

Motivating Example
Consider synthesizing a planner for a surveillance drone operating near another, potentially adversarial drone. Discretizing the map into the 7x7 grid in Fig. 1 (ignoring the depicted trajectories for the moment), a route is a word over the four movement directions. Our specification is to visit the 4 circled locations in 30 moves without colliding with the adversary, assuming it cannot move into the 5 highlighted central locations.
Existing reactive synthesis tools can produce a strategy for the patroller ensuring that the specification is always satisfied. However, the strategy may be deterministic, so that in response to a fixed adversary the patroller will always follow the same route. Then it is easy for a third party to predict the route, which could be undesirable, and is in fact unnecessary if there are many other ways the drone can satisfy its specification. Reactive control improvisation addresses this problem by adding a new type of specification to the hard constraint above: a randomness requirement stating that no behavior should be generated with probability greater than a threshold ρ. If we set (say) ρ = 1/5, then any controller solving the synthesis problem must be able to satisfy the hard constraint in at least 5 different ways, never producing any given behavior more than 20% of the time. Our synthesis algorithm can in fact compute the smallest ρ for which synthesis is possible, yielding a controller that is maximally-randomized in that the system's behavior is as close to a uniform distribution as possible.
To allow finer tuning of how randomness is introduced into the controller, our definition also includes a soft constraint which need only be satisfied with some probability 1 − ε. This allows us to prefer certain safe behaviors over others. In our drone example, we require that with probability at least 3/4, we do not visit a circled location twice.
These hard, soft, and randomness constraints form an instance of our reactive control improvisation problem. Encoding the hard and soft constraints as DFAs, our algorithm (Sec. 6) produced a controller achieving the smallest realizable ρ = 2.2 × 10^{−12}. We tested the controller using the PX4 autopilot [16] to refine the generated routes into control actions for a drone simulated in Gazebo [13] (videos and code are available online [10]). A selection of the resulting trajectories is shown in Fig. 1 (the remainder in Appendix A): starting from the triangles, the patroller's path is solid, the adversary's dashed. The left run uses an adversary that moves towards the patroller when possible. The right runs, with a simple adversary moving in a fixed loop, illustrate the randomness of the synthesized controller.

Reactive Control Improvisation
Our formal notion of randomized reactive synthesis in a finite window is a reactive extension of control improvisation [8,9], which captures the three types of constraint (hard, soft, randomness) seen above. We use the notation of [8] for the specifications and languages defining the hard and soft constraints. Given hard and soft specifications H and S of languages over Σ, an improvisation is a word w ∈ L(H) ∩ Σ^n. It is admissible if w ∈ L(S). The set of all improvisations is denoted I, and the set of admissible improvisations A.
Fig. 2. The hard specification DFA H in our running example. The soft specification S is the same but with only the shaded states accepting.
Running Example. We will use the following simple example throughout the paper: each player may increment (+), decrement (−), or leave unchanged (=) a counter which is initially zero. The alphabet is Σ = {+, −, =}, and we set n = 4. The hard specification H is the DFA in Fig. 2 requiring that the counter stay within [−2, 2]. The soft specification S is a similar DFA requiring that the counter end at a nonnegative value. Then for example the word ++== is an admissible improvisation, satisfying both hard and soft constraints, and so is in A. The word +−=− on the other hand satisfies H but not S, so it is in I but not A. Finally, +++− does not satisfy H, so it is not an improvisation at all and is not in I.
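The three example words can be checked mechanically. The following is a minimal sketch, assuming the counter semantics described above; the function names are hypothetical, and the ASCII character "-" stands in for the decrement symbol −.

```python
# Running example: counter game over {+, -, =} with plays of length n = 4.
# All names here are illustrative; ASCII "-" stands for the decrement symbol.
DELTA = {"+": 1, "-": -1, "=": 0}
N = 4

def run_counter(word):
    """Return the final counter value, or None if it ever leaves [-2, 2]."""
    counter = 0
    for symbol in word:
        counter += DELTA[symbol]
        if abs(counter) > 2:
            return None
    return counter

def is_improvisation(word):
    """Membership in I: length n and the hard constraint holds throughout."""
    return len(word) == N and run_counter(word) is not None

def is_admissible(word):
    """Membership in A: an improvisation whose counter ends nonnegative."""
    return is_improvisation(word) and run_counter(word) >= 0
```

Under these assumptions, `is_admissible("++==")` holds, `"+-=-"` is an improvisation that is not admissible, and `"+++-"` is rejected by `is_improvisation`.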
If there is an improvising strategy σ, we say that C is realizable. An improviser for C is then an expected-finite-time probabilistic algorithm implementing such a strategy σ, i.e. one whose output distribution on input h ∈ Σ^{≤n} is σ(h, ·).
Definition 3.3. Given an RCI instance C = (H, S, n, ε, ρ), the reactive control improvisation (RCI) problem is to decide whether C is realizable, and if so to generate an improviser for C.
Running Example. Suppose we set ε = 1/2 and ρ = 1/2. Let σ be the strategy which picks + or − with equal probability in the first move, and thenceforth picks the action which moves the counter closest to ±1 respectively. This satisfies the hard constraint, since if the adversary ever moves the counter to ±2 we immediately move it back. The strategy also satisfies the soft constraint, since with probability 1/2 we set the counter to +1 on the first move, and if the adversary moves to 0 we move back to +1 and remain nonnegative. Finally, σ also satisfies the randomness constraint, since each choice of first move happens with probability 1/2 and so no play can be generated with higher probability. So σ is an improvising strategy and this RCI instance is realizable.
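The argument above can be double-checked by brute force. The sketch below is only an illustration: it enumerates every pair of adversary moves against the described strategy for n = 4, and all function names are hypothetical. It reports whether the hard constraint always holds, the worst-case probability of ending nonnegative, and the largest probability of any single play.

```python
from fractions import Fraction
from itertools import product

DELTA = {"+": 1, "-": -1, "=": 0}  # ASCII "-" stands for the decrement symbol

def our_reply(counter, target):
    """Symbol that moves the counter closest to the target (+1 or -1)."""
    return min(DELTA, key=lambda s: abs(counter + DELTA[s] - target))

def check_strategy():
    """Enumerate every adversary response to the example strategy (n = 4)."""
    hard_ok = True
    soft_prob = Fraction(0)      # worst-case probability of ending nonnegative
    max_play_prob = Fraction(0)  # largest probability of any single play
    for first, target in (("+", 1), ("-", -1)):
        branch_prob = Fraction(1, 2)  # each first move has probability 1/2
        branch_always_soft = True
        for adv1, adv2 in product(DELTA, repeat=2):
            counter = DELTA[first] + DELTA[adv1]
            in_range = abs(counter) <= 2
            counter += DELTA[our_reply(counter, target)]
            in_range = in_range and abs(counter) <= 2
            counter += DELTA[adv2]
            in_range = in_range and abs(counter) <= 2
            hard_ok = hard_ok and in_range
            branch_always_soft = branch_always_soft and in_range and counter >= 0
            # Our second move is deterministic, so against a deterministic
            # adversary each play has exactly the branch probability.
            max_play_prob = max(max_play_prob, branch_prob)
        if branch_always_soft:
            soft_prob += branch_prob
    return hard_ok, soft_prob, max_play_prob
```

Only deterministic adversary responses are enumerated here; a randomized adversary mixes these cases, so it can neither break the hard constraint nor push the soft-constraint probability below the worst case found.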
We will study classes of RCI problems with different types of specifications:
Definition 3.4. If HSPEC and SSPEC are classes of specifications, then the class of RCI instances C = (H, S, n, ε, ρ) where H ∈ HSPEC and S ∈ SSPEC is denoted RCI(HSPEC, SSPEC). We use the same notation for the decision problem associated with the class, i.e., given C ∈ RCI(HSPEC, SSPEC), decide whether C is realizable. The size |C| of an RCI instance is the total size of the bit representations of its parameters, with n represented in unary and ε, ρ in binary.
Finally, a synthesis algorithm in our context takes a specification in the form of an RCI instance and produces an implementation in the form of an improviser. This corresponds exactly to the notion of an improvisation scheme from [8]. A polynomial-time improvisation scheme for a class P of RCI instances is an algorithm S with the following properties:
- Correctness: For any C ∈ P, if C is realizable then S(C) is an improviser for C, and otherwise S(C) = ⊥.
- Scheme efficiency: There is a polynomial p : R → R such that the runtime of S on any C ∈ P is at most p(|C|).
- Improviser efficiency: There is a polynomial q : R → R such that for every C ∈ P, if G = S(C) ≠ ⊥ then G has expected runtime at most q(|C|).
The first two requirements simply say that the scheme produces valid improvisers in polynomial time. The third is necessary to ensure that the improvisers themselves are efficient: otherwise, the scheme might for example produce improvisers running in time exponential in the size of the specification. A main goal of our paper is to determine for which types of specifications there exist polynomial-time improvisation schemes. While we do find such algorithms for important classes of specifications, we will also see that determining the realizability of an RCI instance is often PSPACE-hard. Therefore we also consider polynomial-space improvisation schemes, defined as above but replacing time with space.

Width and Realizability
The most basic question in reactive synthesis is whether a specification is realizable. In randomized reactive synthesis, the question is more delicate because the randomness requirement means that it is no longer enough to ensure some property regardless of what the adversary does: there must be many ways to do so. Specifically, there must be at least 1/ρ improvisations if we are to generate each of them with probability at most ρ. Furthermore, at least this many improvisations must be possible given an unknown adversary: even if many exist, the adversary may be able to force us to use only a single one. We introduce a new notion of the size of a set of plays that takes this into account.
The width counts how many distinct plays can be generated regardless of what the adversary does. Intuitively, a "narrow" game (one whose set of winning plays has small width) is one in which the adversary can force us to choose among only a few winning plays, while in a "wide" one we always have many safe choices available. Note that which particular plays can be generated depends on the adversary: the width only measures how many can be generated. For example, W(X) = 1 means that a play in X can always be generated, but possibly a different element of X for different adversaries.
Running Example. Figure 3 shows the synthesis game for our running example: paths ending in circled or shaded states are plays in I or A respectively (ignore the state labels for now). At left, the bold arrows show the 4 plays in I possible against the adversary that moves away from 0, and down at 0. This shows W (I) ≤ 4, and in fact 4 plays are possible against any adversary, so W (I) = 4. Similarly, at right we see that W (A) = 1.
It will be useful later to have a relative version of width that counts how many plays are possible from a given position: Given a set of plays X ⊆ Σ^n and a history h ∈ Σ^{≤n}, the width of X given h is W(X|h) = max_σ min_τ |{π | hπ ∈ X ∧ P_{σ,τ}(π|h) > 0}|. This is a direct generalization of "winning" positions: if X is the set of winning plays, then W(X|h) counts the number of ways to win from h.
We will often use the following basic properties of W (X|h) without comment (for this proof, and the details of later proof sketches, see Appendix B). Note that (3)-(5) provide a recursive way to compute widths that we will use later, and which is illustrated by the state labels in Fig. 3.
For any set of plays X ⊆ Σ^n and history h ∈ Σ^{≤n}: if it is the adversary's turn after h, then W(X|h) = min_{u∈Σ} W(X|hu).
Now we can state the realizability conditions, which are simply that I and A have sufficiently large width. In fact, the conditions turn out to be exactly the same as those for non-reactive CI except that width takes the place of size [9].
(1) C is realizable.
Running Example. We saw above that our example was realizable with ε = ρ = 1/2, and indeed 4 = W(I) ≥ 1/ρ = 2 and 1 = W(A) ≥ (1 − ε)/ρ = 1. However, if we put ρ = 1/3 we violate the second inequality and the instance is not realizable: essentially, we need to distribute probability 1 − ε = 1/2 among plays in A (to satisfy the soft constraint), but since W(A) = 1, against some adversaries we can only generate one play in A and would have to give it the whole 1/2 (violating the randomness requirement).
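The two inequalities used in this example are easy to test with exact rational arithmetic. The following is a small sketch, assuming the widths W(I) and W(A) have already been computed; the function name is hypothetical.

```python
from fractions import Fraction

def is_realizable(width_i, width_a, eps, rho):
    """Check W(I) >= 1/rho and W(A) >= (1 - eps)/rho with exact arithmetic."""
    eps, rho = Fraction(eps), Fraction(rho)
    return width_i >= 1 / rho and width_a >= (1 - eps) / rho
```

With W(I) = 4 and W(A) = 1 as in the running example, the check succeeds for ε = ρ = 1/2 and fails for ε = 1/2, ρ = 1/3, where (1 − ε)/ρ = 3/2 exceeds W(A).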
The difficult part of the Theorem is constructing an improviser when the inequalities (2) hold. Despite the similarity in these conditions to the non-reactive case, the construction is much more involved. We begin with a general overview.

Improviser Construction: Discussion
Our improviser can be viewed as an extension of the classical random-walk reduction of uniform sampling to counting [20]. In that algorithm (which was used in a similar way for DFA specifications in [8,9]), a uniform distribution over paths in a DAG is obtained by moving to the next vertex with probability proportional to the number of paths originating at it. In our case, which plays are possible depends on the adversary, but the width still tells us how many plays are possible. So we could try a random walk using widths as weights: e.g. on the first turn in Fig. 3, picking +, =, and − with probabilities 1/4, 2/4, and 1/4 respectively (the hard constraint is symmetric in the counter, so + and − receive equal weight). Against the adversary shown in Fig. 3, this would indeed yield a uniform distribution over the four possible plays in I.
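The classical counting-based random walk can be sketched as follows. This illustrates only the non-reactive technique (uniform sampling of root-to-sink paths in a DAG), not the improviser itself; the DAG encoding as a dictionary from each node to its successor list and all function names are assumptions made for the example.

```python
import random
from fractions import Fraction

def count_paths(dag, node, memo=None):
    """Number of paths from node to any sink (a node with no successors)."""
    memo = {} if memo is None else memo
    if node not in memo:
        succ = dag[node]
        memo[node] = 1 if not succ else sum(count_paths(dag, s, memo) for s in succ)
    return memo[node]

def sample_path(dag, start, rng=random):
    """Walk from start, choosing each successor with weight = its path count."""
    path = [start]
    while dag[path[-1]]:
        succ = dag[path[-1]]
        weights = [count_paths(dag, s) for s in succ]
        path.append(rng.choices(succ, weights=weights)[0])
    return path

def path_probability(dag, path):
    """Exact probability that sample_path produces the given path."""
    prob = Fraction(1)
    for here, nxt in zip(path, path[1:]):
        prob *= Fraction(count_paths(dag, nxt), count_paths(dag, here))
    return prob
```

The product of ratios telescopes, so every complete path gets probability 1 divided by the number of paths from the start node, which is the uniform distribution the text refers to.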
However, the soft constraint may require a non-uniform distribution. In the running example with ε = ρ = 1/2, we need to generate the single possible play in A with probability 1/2, not just the uniform probability 1/4. This is easily fixed by doing the random walk with a weighted average of the widths of I and A: specifically, move to position h with probability proportional to αW(A|h) + β(W(I|h) − W(A|h)). In the example, this would result in plays in A getting probability α and those in I \ A getting probability β. Taking α sufficiently large, we can ensure the soft constraint is satisfied.
Unfortunately, this strategy can fail if the adversary makes more plays available than the width guarantees. Consider the game on the left of Fig. 4, where W(I) = 3 and W(A) = 2. This is realizable with ε = ρ = 1/3, but no values of α and β yield improvising strategies, essentially because an adversary moving from X to Z breaks the worst-case assumption that the adversary will minimize the number of possible plays by moving to Y. In fact, this instance is realizable but not by any memoryless strategy. To see this, note that all such strategies can be parametrized by the probabilities p and q in Fig. 4. To satisfy the randomness constraint against the adversary that moves from X to Y, both p and (1 − p)q must be at most 1/3. To satisfy the soft constraint against the adversary that moves from X to Z we must have pq + (1 − p)q ≥ 2/3, so q ≥ 2/3. But then (1 − p)q ≥ (1 − 1/3)(2/3) = 4/9 > 1/3, a contradiction.
To fix this problem, our improvising strategy σ̂ (which we will fully specify in Algorithm 1 below) takes a simplistic approach: it tracks how many plays in A and I are expected to be possible based on their widths, and if more are available it ignores them. For example, entering state Z from X there are 2 ways to produce a play in I, but since W(I|X) = 1 we ignore the play in I \ A. Extra plays in A are similarly ignored by being treated as members of I \ A. Ignoring unneeded plays may seem wasteful, but the proof of Theorem 4.1 will show that σ̂ nevertheless achieves the best possible ε: Against any adversary, the error probability of Algorithm 1 is at most ε_opt.
Fig. 4. Reachability games where a naïve random walk, and all memoryless strategies, fail (left) and where no strategy can optimize either ε or ρ against every adversary simultaneously (right).
Thus, if any improviser can achieve an error probability ε, ours does. We could ask for a stronger property, namely that against each adversary the improviser achieves the smallest possible error probability for that adversary. Unfortunately, this is impossible in general. Consider the game on the right in Fig. 4, with ρ = 1. Against the adversary which always moves up, we can achieve ε = 0 with the strategy that at P moves to Q. We can also achieve ε = 0 against the adversary that always moves down, but only with a different strategy, namely the one that at P moves to R. So there is no single strategy that achieves the optimal ε for every adversary. A similar argument shows that there is also no strategy achieving the smallest possible ρ for every adversary. In essence, optimizing ε or ρ in every case would require the strategy to depend on the adversary.

Improviser Construction: Details
Our improvising strategy, as outlined in the previous section, is shown in Algorithm 1. We first compute α and β, the (maximum) probabilities for generating elements of A and I \ A respectively. As in [8], we take α as large as possible given α ≤ ρ, and determine β from the probability left over (modulo a couple corner cases).
Next we initialize m_A and m_I, our expectations for how many plays in A and I respectively are still possible to generate. Initially these are given by W(A) and W(I), but as we saw above it is possible for more plays to become available. The function PARTITION handles this, deciding which m_A (resp., m_I) out of the available W(A|h) (W(I|h)) plays we will use. The behavior of PARTITION is defined by the following lemma; its proof (in Appendix B) greedily takes the first m_A possible plays in A under some canonical order and the first m_I − m_A of the remaining plays in I. Finally, we perform the random walk, moving from position h to hu with (unnormalized) probability t_u, the weighted average described above.
Algorithm 1 (excerpt):
  if it is our turn after h then
    pick u ∈ Σ with probability proportional to t_u and append it to h
  else
    the adversary picks u ∈ Σ given the history h; append it to h
  return h
The next few lemmas establish that σ̂ is well-defined and in fact an improvising strategy, allowing us to prove Theorem 4.1. Throughout, we write m_A(h) (resp., m_I(h)) for the value of m_A (m_I) at the start of the iteration for history h. We also write t(h) = α m_A(h) + β(m_I(h) − m_A(h)) (so t(hu) = t_u when we pick u).
Lemma 4.5. If W(I) ≥ 1/ρ, then P_{σ̂,τ}(π) ≤ ρ for every π ∈ Σ^n and τ.
Proof (sketch). If the adversary is deterministic, the weights we use for our random walk yield a distribution where each play π has probability either α or β (depending on whether m_A(π) = 1 or 0). If the adversary assigns nonzero probability to multiple choices this only decreases the probability of individual plays. Finally, since W(I) ≥ 1/ρ we have α, β ≤ ρ.
Proof (of Theorem 4.1). We use a similar argument to that of [8].

A Generic Improviser
We now use the construction of Sec. 4 to develop a generic improvisation scheme usable with any class of specifications SPEC supporting the following operations:
- Intersection: Given specifications 𝒳 and 𝒴, find 𝒵 such that L(𝒵) = L(𝒳) ∩ L(𝒴).
- Width Measurement: Given a specification 𝒳, a length n ∈ N in unary, and a history h ∈ Σ^{≤n}, compute W(X|h) where X = L(𝒳) ∩ Σ^n.
Efficient algorithms for these operations lead to efficient improvisation schemes: Theorem 5.1. If the operations on SPEC above take polynomial time (resp. space), then RCI (SPEC, SPEC) has a polynomial-time (space) improvisation scheme.
Proof. Given an instance C = (H, S, n, ε, ρ) in RCI(SPEC, SPEC), we first apply intersection to H and S to obtain a specification 𝒜 ∈ SPEC such that L(𝒜) ∩ Σ^n = A. Since intersection takes polynomial time (space), 𝒜 has size polynomial in |C|. Next we use width measurement to compute W(I) = W(L(H) ∩ Σ^n | λ) and W(A) = W(L(𝒜) ∩ Σ^n | λ). If these violate the inequalities in Theorem 4.1, then C is not realizable and we return ⊥.
Otherwise C is realizable, and σ̂ above is an improvising strategy. Furthermore, we can construct an expected-finite-time probabilistic algorithm implementing σ̂, using width measurement to instantiate the oracles needed by Lemma 4.2. Determining m_A(h) and m_I(h) takes O(n) invocations of PARTITION, each of which is poly-time relative to the width measurements. These take time (space) polynomial in |C|, since H and 𝒜 have size polynomial in |C|. As m_A, m_I ≤ |Σ|^n, they have polynomial bitwidth and so the arithmetic required to compute t_u for each u ∈ Σ takes polynomial time. Therefore the total expected runtime (space) of the improviser is polynomial.
Note that as a byproduct of testing the inequalities in Theorem 4.1, our algorithm can compute the best possible error probability ε_opt given H, S, and ρ (see Corollary 4.1). Alternatively, given ε, we can compute the best possible ρ.
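One way to read this remark is to rearrange the inequalities W(I) ≥ 1/ρ and W(A) ≥ (1 − ε)/ρ that appear in the running example. The sketch below does exactly that; the closed forms are our own rearrangement rather than a quotation of Corollary 4.1, and the function names are hypothetical.

```python
from fractions import Fraction

def best_error_probability(width_a, rho):
    """Smallest eps with W(A) >= (1 - eps)/rho, assuming W(I) >= 1/rho holds."""
    return max(Fraction(0), 1 - Fraction(rho) * width_a)

def best_rho(width_i, width_a, eps):
    """Smallest rho satisfying both inequalities, or None if none exists."""
    eps = Fraction(eps)
    if width_i == 0 or (width_a == 0 and eps < 1):
        return None  # no rho can make the instance realizable
    bounds = [Fraction(1, width_i)]            # from W(I) >= 1/rho
    if width_a > 0:
        bounds.append((1 - eps) / width_a)     # from W(A) >= (1 - eps)/rho
    return max(bounds)
```

For the running example (W(I) = 4, W(A) = 1), this gives a best ε of 1/2 when ρ = 1/2 and a best ρ of 1/2 when ε = 1/2, in line with the earlier observation that ρ = 1/3 is not achievable for that ε.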
We will see below how to efficiently compute widths for DFAs, so Theorem 5.1 yields a polynomial-time improvisation scheme. If we allow polynomial-space schemes, we can use a general technique for width measurement that only requires a very weak assumption on the specifications, namely testability in polynomial space.
Proof (sketch). We apply Theorem 5.1, computing widths recursively using Lemma 4.1, (3)-(5). As in the PSPACE QBF algorithm, the current path in the recursive tree and required auxiliary storage need only polynomial space.
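The recursive width computation can be sketched with nothing more than a membership test. This is a minimal illustration, assuming the recursion takes an indicator at complete plays, a minimum on the adversary's turns, and a sum on our turns (the sum rule is our reading of the max-min definition of width). The function names and the counter-game oracles are hypothetical.

```python
DELTA = {"+": 1, "-": -1, "=": 0}  # ASCII "-" stands for the decrement symbol

def width(accepts, alphabet, n, history=""):
    """W(X|h) for X = {w of length n : accepts(w)}, by depth-first recursion."""
    if len(history) == n:
        return 1 if accepts(history) else 0
    children = [width(accepts, alphabet, n, history + u) for u in alphabet]
    # Even-length histories are our turn (sum); odd ones the adversary's (min).
    return sum(children) if len(history) % 2 == 0 else min(children)

def final_counter(word):
    """Final counter value, or None if the counter ever leaves [-2, 2]."""
    counter = 0
    for symbol in word:
        counter += DELTA[symbol]
        if abs(counter) > 2:
            return None
    return counter

def hard(word):
    """Membership oracle for the hard specification of the running example."""
    return final_counter(word) is not None

def admissible(word):
    """Membership oracle for hard and soft specifications together."""
    final = final_counter(word)
    return final is not None and final >= 0
```

Only the current history and one list of child widths per level are live at any time, so the space used is polynomial in n and |Σ| even though the running time is exponential. On the running example with n = 4 this yields widths 4 and 1 for I and A.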

Reachability Games and DFAs
Now we develop a polynomial-time improvisation scheme for RCI instances with DFA specifications. This also provides a scheme for reachability/safety games, whose winning conditions can be straightforwardly encoded as DFAs.
Suppose D is a DFA with states V, accepting states T, and transition function δ : V × Σ → V. Our scheme is based on the fact that W(L(D)|h) depends only on the state of D reached on input h, allowing these widths to be computed by dynamic programming. Specifically, for all v ∈ V and i ∈ {0, ..., n} we define:
C(v, i) = 1 if i = n and v ∈ T,
C(v, i) = 0 if i = n and v ∉ T,
C(v, i) = min_{u∈Σ} C(δ(v, u), i + 1) if i < n and i is odd,
C(v, i) = Σ_{u∈Σ} C(δ(v, u), i + 1) otherwise.
Running Example. Figure 6 shows the values C(v, i) in rows from i = n downward.
Proof. We implement Theorem 5.1. Intersection can be done with the standard product construction. For width measurement we compute the quantities C(v, i) by dynamic programming (from i = n down to i = 0) and apply Lemma 6.1.
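The dynamic program can be sketched as a table filled from i = n down to i = 0. This is an illustration under an assumed recurrence: C(v, n) is 1 exactly for accepting states, odd i (adversary's turn) takes the minimum over successors, and even i (our turn) takes the sum. The DFA encoding and all names are hypothetical.

```python
def width_table(states, alphabet, delta, accepting, n):
    """Fill C(v, i) for every state v and step i, from i = n down to 0."""
    table = {(v, n): int(v in accepting) for v in states}
    for i in range(n - 1, -1, -1):
        for v in states:
            nxt = [table[(delta[(v, u)], i + 1)] for u in alphabet]
            # Even i is our turn (sum); odd i is the adversary's turn (min).
            table[(v, i)] = sum(nxt) if i % 2 == 0 else min(nxt)
    return table

STEP = {"+": 1, "-": -1, "=": 0}  # ASCII "-" stands for the decrement symbol
STATES = [-2, -1, 0, 1, 2, "dead"]

def counter_delta():
    """Transition function of the counter DFA; leaving [-2, 2] is absorbing."""
    delta = {}
    for v in STATES:
        for u, d in STEP.items():
            if v == "dead" or abs(v + d) > 2:
                delta[(v, u)] = "dead"
            else:
                delta[(v, u)] = v + d
    return delta
```

The table has |V|(n + 1) entries and each costs |Σ| lookups, so the whole computation is polynomial. For the counter DFA with n = 4, the entry for the initial state 0 at i = 0 is 4 when all in-range states accept and 1 when only the nonnegative states accept.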

Temporal Logics and Other Specifications
In this section we analyze the complexity of reactive control improvisation for specifications in the popular temporal logics LTL and LDL. We also look at NFA and CFG specifications, previously studied for non-reactive CI [8], to see how their complexities change in the reactive case. For LTL specifications, reactive control improvisation is PSPACE-hard because this is already true of ordinary reactive synthesis in a finite window (we suspect this has been observed but could not find a proof in the literature).
Theorem 7.1. Finite-window reactive synthesis for LTL is PSPACE-hard.
Proof (sketch). Given a QBF φ = ∃x∀y . . . χ, we can view assignments to its variables as traces over a single proposition. In polynomial time we can construct an LTL formula ψ whose models are the satisfying assignments of χ. Then there is a winning strategy to generate a play satisfying ψ iff φ is true.

This is perhaps disappointing, but is an inevitable consequence of LTL subsuming Boolean formulas. On the other hand, our general polynomial-space scheme applies to LTL and its much more expressive generalization LDL:

Theorem 7.2. RCI (LDL, LDL) has a polynomial-space improvisation scheme.
Proof. This follows from Theorem 5.2, since satisfaction of an LDL formula by a finite word can be checked in polynomial time (e.g. by combining dynamic programming on subformulas with a regular expression parser).
Thus for temporal logics polynomial-time algorithms are unlikely, but adding randomization to reactive synthesis does not increase its complexity.
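The correspondence behind Theorem 7.1, between a QBF with an alternating prefix and a finite-window game, can be made concrete with a small sketch. It assumes a prefix of the form ∃∀∃∀..., so that we choose the variables at even positions and the adversary those at odd positions; the matrix χ is given directly as a Python predicate rather than as an LTL formula, and the name `has_winning_strategy` is hypothetical.

```python
from typing import Callable, Tuple

Assignment = Tuple[bool, ...]


def has_winning_strategy(n: int, chi: Callable[[Assignment], bool],
                         prefix: Assignment = ()) -> bool:
    """Decide whether we can force a length-n play whose assignment satisfies chi.

    Even positions are ours (existential), odd positions are the
    adversary's (universal); this turn order is an assumption.
    """
    if len(prefix) == n:
        return chi(prefix)
    outcomes = (has_winning_strategy(n, chi, prefix + (b,))
                for b in (False, True))
    # The adversary must fail with every choice; we need only one good choice.
    return all(outcomes) if len(prefix) % 2 == 1 else any(outcomes)


if __name__ == "__main__":
    # Exists x, for all y: (x or y) and (x or not y). True, by choosing x = True.
    print(has_winning_strategy(2, lambda a: (a[0] or a[1]) and (a[0] or not a[1])))
    # Exists x, for all y: x == y. False, since the adversary picks y != x.
    print(has_winning_strategy(2, lambda a: a[0] == a[1]))
```

A winning strategy exists exactly when the quantified formula is true, which is why finding even one such strategy inherits the hardness of QBF.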
The same is true for NFA and CFG specifications, where it is again PSPACE-hard to find even a single winning strategy:

Theorem 7.3. Finite-window reactive synthesis for NFAs is PSPACE-hard.

Proof (sketch). Reduce from QBF as in Theorem 7.1, constructing an NFA accepting the satisfying assignments of χ (as done in [12]).

Proof. By Theorem 5.2, since CFG parsing can be done in polynomial time.
Since NFAs can be converted to CFGs in polynomial time, this completes the picture for the kinds of CI specifications previously studied. In non-reactive CI, DFA specifications admit a polynomial-time improvisation scheme while for NFAs/CFGs the CI problem is #P-equivalent [8]. Adding reactivity, DFA specifications remain polynomial-time while NFAs and CFGs move up to PSPACE.

Conclusion
In this paper we introduced reactive control improvisation as a framework for modeling reactive synthesis problems where random but controlled behavior is desired. RCI provides a natural way to tune the amount of randomness while ensuring that safety or other constraints remain satisfied. We showed that RCI problems can be efficiently solved in many cases occurring in practice, giving a polynomial-time improvisation scheme for reachability/safety or DFA specifications. We also showed that RCI problems with specifications in LTL or LDL, popularly used in planning, have the PSPACE-hardness typical of bounded games, and gave a matching polynomial-space improvisation scheme. This scheme generalizes to any specification checkable in polynomial space, including NFAs, CFGs, and many more expressive formalisms. Table 1 summarizes these results.
These results show that, at a high level, finding a maximally-randomized strategy using RCI is no harder than finding any winning strategy at all: for specifications yielding games solvable in polynomial time (respectively, space), we gave polynomial-time (space) improvisation schemes. We therefore hope that in applications where ordinary reactive synthesis has proved tractable, our notion of randomized reactive synthesis will prove tractable as well. In particular, we expect our DFA scheme to be quite practical, and are experimenting with applications in robotic planning. On the other hand, our scheme for temporal logic specifications seems unlikely to be useful in practice without further refinement. An interesting direction for future work would be to see if modern solvers for quantified Boolean formulas (QBF) could be leveraged or extended to solve these RCI problems. This could be useful even for DFA specifications, as conjoining many simple properties can lead to exponentially large automata. Symbolic methods based on constraint solvers would avoid such blow-up.
We are also interested in extending the RCI problem definition to unbounded or infinite words, as typically used in reactive synthesis. These extensions, as well as that to continuous signals, would be useful in robotic planning, cyber-physical system testing, and other applications. However, it is unclear how best to adapt our randomness constraint to settings where the improviser can generate infinitely many words. In such settings the improviser could assign arbitrarily small or even zero probability to every word, rendering the randomness constraint trivial. Even in the bounded case, RCI extensions with more complex randomness constraints than a simple upper bound on individual word probabilities would be worthy of study. One possibility would be to more directly control diversity and/or unpredictability by requiring the distribution of the improviser's output to be close to uniform after transformation by a given function.

A Patrolling Drone Experiments
As described above, we ran experiments with two adversary strategies: one that moves towards the patrolling drone whenever possible, and one that moves in a fixed loop. We ran the improviser four times against each adversary, obtaining the trajectories in Figures 7 and 8. Animations showing the trajectories over time (and so illustrating that collisions do not in fact occur) are available online [10]. This site also provides our implementation of the DFA improvisation scheme, and implementations of the specifications and adversaries used in our drone experiments (as well as an adversary controlled by the user, so that one can type in actions and see how the improviser responds).
Let $\tilde{\tau}$ be a strategy which on history $h$ picks $\tilde{u}$ and on histories prefixed with $h\tilde{u}$ follows $\tau_{\tilde{u}}$ (otherwise picking arbitrarily). Then […]

Proof. Index the elements of $\Sigma$ via some canonical order as $(u_j)_{0 \le j < \ell}$ for some $\ell \ge 1$. We first construct the partition $\sum_{j<\ell} m^A_j$ of $m^A$. Find the greatest $k \le \ell$ such that $\sum_{j<k} W(A|hu_j) \le m^A$. This is well-defined, since if $k = 0$ then the sum is zero and the condition is satisfied. If $\sum_{j<k} W(A|hu_j) = m^A$ we put $m^A_j = W(A|hu_j)$ for $j < k$ and $m^A_j = 0$ for $j \ge k$. If instead $\sum_{j<k} W(A|hu_j) < m^A$ we must have $k < \ell$, since $\sum_{j<\ell} W(A|hu_j) = \sum_{u \in \Sigma} W(A|hu) = W(A|h) \ge m^A$. Then by the definition of $k$ we have $\sum_{j \le k} W(A|hu_j) > m^A$, so $W(A|hu_k) > m^A - \sum_{j<k} W(A|hu_j)$. Therefore we put $m^A_j = W(A|hu_j)$ for $j < k$, $m^A_k = m^A - \sum_{j<k} W(A|hu_j)$, and $m^A_j = 0$ for $j > k$.

Now we construct the partition $\sum_{j<\ell} m^I_j$ of $m^I$. We do this by partitioning the difference $m^I - m^A$ along the same lines as above, then adding back $m^A_j$ to ensure […] Find the greatest $k \le \ell$ such that $\sum_{j<k} d_j \le m^I - m^A$. This is well-defined since if $k = 0$ the sum is zero, and $m^I - m^A \ge 0$ by assumption. If […]

These partitions are canonical since the values of $k$ used in each construction are uniquely determined (and the ordering of $\Sigma$ is fixed). Also, $k$ may be found by a linear search from $0$ up to $\ell$, which has value at most $|\Sigma|$. The quantities $W(I|hu_j)$ all have polynomial bitwidth (they are bounded above by $|\Sigma|^n$), so the arithmetic above can be done in polynomial time. Therefore the total time needed to construct the partitions is polynomial relative to oracles for $W(I|\cdot)$ and $W(A|\cdot)$.

Proof. First we show by induction on $i$ that for all plays $h\pi \in \Pi_{\tilde{\sigma},\tau}$ with $|h| = i$, we have $0 \le m^A(h) \le m^I(h) \le W(I|h)$, $m^A(h) \le W(A|h)$, and $t(h), m^I(h) > 0$. Now take any play $h\pi \in \Pi_{\tilde{\sigma},\tau}$ with $|h| = i < n$ and suppose the hypothesis holds. If it is the adversary's turn after $h$, then if the adversary outputs $u \in \Sigma$ we have $m^A(h) = m^A(hu)$ and $m^I(h) = m^I(hu)$. So since $m^A(h) \le W(A|h) = \min_{v \in \Sigma} W(A|hv) \le W(A|hu)$ and $m^I(h) \le W(I|h) = \min_{v \in \Sigma} W(I|hv) \le W(I|hu)$, the hypothesis holds in the next step. If instead it is our turn after $h$ and we output $u \in \Sigma$, then $m^A(hu)$ and $m^I(hu)$ are given by Lemma 4.2, and $0 \le m^A(hu) \le m^I(hu) \le W(I|hu)$ and $m^A(hu) \le W(A|hu)$ by construction. Furthermore $t(hu) > 0$, since if $t(hu) = 0$ then $\tilde{\sigma}$ has probability zero to output $u$, a contradiction. This implies $m^I(hu) > 0$, since if $m^I(hu) = 0$ then $m^A(hu) = 0$ and so $t(hu) = 0$. Therefore by induction we always have $0 \le m^A(h) \le m^I(h) \le W(I|h)$, $m^A(h) \le W(A|h)$, and $t(h), m^I(h) > 0$.

Now for any history $h \in \Sigma^{\le n}$ after which it is our turn, by construction the quantities $m^A(hu)$ and $m^I(hu)$ for $u \in \Sigma$ form partitions of $m^A(h)$ and $m^I(h)$ respectively. So $\sum_{u \in \Sigma} t(hu) = \sum_{u \in \Sigma} \left[ \alpha m^A(hu) + \beta (m^I(hu) - m^A(hu)) \right] = \alpha m^A(h) + \beta (m^I(h) - m^A(h)) = t(h) > 0$. So $\tilde{\sigma}(h, \cdot)$ is a probability distribution over $\Sigma$, and $\tilde{\sigma}$ is a well-defined strategy.
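The canonical partitions and the resulting move probabilities can be sketched in code. This is a hedged illustration, not the authors' implementation. The greedy fill below reproduces the "greatest k" construction for the partition of m^A. For the partition of m^I it assumes the capacities are d_j = W(I|hu_j) − m^A_j, which the surviving text does not define. It also assumes the strategy outputs u with probability t(hu)/t(h), as the final step of the proof suggests. All function names are hypothetical, and α and β are taken as given parameters.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def greedy_partition(total: int, capacities: Sequence[int]) -> List[int]:
    """Split `total` into parts bounded by `capacities`, filling in index order."""
    parts, remaining = [], total
    for cap in capacities:
        take = min(cap, remaining)  # full capacity until the total runs out
        parts.append(take)
        remaining -= take
    if remaining != 0:
        raise ValueError("total exceeds the sum of the capacities")
    return parts


def partition_both(m_a: int, m_i: int, w_a: Sequence[int],
                   w_i: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Partition m_a under the widths w_a, then m_i so each part dominates m_a's."""
    a_parts = greedy_partition(m_a, w_a)
    # Assumed capacities d_j = W(I|hu_j) - m^A_j for the difference m_i - m_a.
    slack = [wi - a for wi, a in zip(w_i, a_parts)]
    extra = greedy_partition(m_i - m_a, slack)
    return a_parts, [a + e for a, e in zip(a_parts, extra)]


def move_distribution(a_parts: Sequence[int], i_parts: Sequence[int],
                      alpha: Fraction, beta: Fraction) -> List[Fraction]:
    """Probability of each symbol, taken to be t(hu) / t(h) (an assumption)."""
    t = [alpha * a + beta * (i - a) for a, i in zip(a_parts, i_parts)]
    total = sum(t)  # equals t(h), since the parts sum to m_a and m_i
    return [Fraction(x) / total for x in t]


if __name__ == "__main__":
    a_parts, i_parts = partition_both(2, 4, w_a=[1, 2], w_i=[3, 3])
    print(a_parts, i_parts)  # prints [1, 1] [3, 1]
    print(move_distribution(a_parts, i_parts, Fraction(1, 2), Fraction(1, 4)))
```

Exact rational arithmetic via `Fraction` avoids rounding, in keeping with the observation that all quantities involved have polynomial bitwidth.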