1 Introduction

Hybrid systems are important models of many applications, capturing their differential equations and control [3, 4, 27, 28, 33, 41]. For overall system safety, the correctness of the control decisions in a hybrid system is crucial. Formal verification techniques can justify correctness properties. Such correct controllers have been identified in a sequence of challenging case studies [12, 14, 19, 22, 32, 34, 40]. A useful approach to verified control is to design and verify a safe control envelope around possible safe control actions. Safe control envelopes are nondeterministic programs whose every execution is safe. In contrast with controllers, control envelopes define entire families of controllers to allow control actions under as many circumstances as possible, as long as they maintain the safety of the hybrid system. Safe control envelopes allow the verification of abstractions of control systems, isolating the parts relevant to the safety feature of interest, without involving the full complexity of a specific control implementation. The full control system is then monitored for adherence to the safe control envelope at runtime [29]. The control envelope approach allows a single verification result to apply to multiple specialized control implementations, optimized for different objectives. It puts industrial controllers that are too complex to verify directly within the reach of verification, because a control envelope only needs to model the safety-critical aspects of the controller. Control envelopes also enable applications such as justified speculative control [17], where machine-learning-based agents control safety-critical systems safeguarded within a verified control envelope, and reward signal generation for reinforcement learning [36].

Control envelope design is challenging. Engineers are good at specifying the shape of a model and listing the possible control actions by translating client specifications, which is crucial for the fidelity of the resulting model. But identifying the exact control conditions required for safety in a model is a much harder problem that requires design insights and creativity, and is a central concern of the deep field of control theory. Most initial system designs are incorrect and need to be fixed before verification succeeds. Fully rigorous justification of the safety of the control conditions requires full verification of the resulting controller in the hybrid systems model. We present a synthesis technique that addresses this hard problem by filling in the holes of a hybrid systems model to identify a correct-by-construction control envelope that is as permissive as possible.

Our approach is called Control Envelope Synthesis via Angelic Refinements (CESAR). The idea is to implicitly characterize the optimal safe control envelope via hybrid games yielding maximally permissive safe solutions in differential game logic [33]. To derive explicit solutions used for controller monitoring at runtime, we successively refine the games while preserving safety and, if possible, optimality. Our experiments demonstrate that CESAR solves hybrid systems synthesis challenges requiring different control insights.

Contributions. The primary contributions of this paper behind CESAR are:

  • optimal hybrid systems control envelope synthesis via hybrid games.

  • differential game logic formulas identifying optimal safe control envelopes.

  • refinement techniques for safe control envelope approximation, including bounded fixpoint unrollings via a recurrence, which exploits action permanence (a hybrid analogue to idempotence).

  • a primal/dual game counterpart optimality criterion.

2 Background: Differential Game Logic

We use hybrid games written in differential game logic (dGL, [33]) to represent solutions to the synthesis problem. Hybrid games are two-player noncooperative zero-sum sequential games with no draws that are played on a hybrid system with differential equations. Players take turns and in their turn can choose to act arbitrarily within the game rules. At the end of the game, one player wins, the other one loses. The players are classically called Angel and Demon. Hybrid systems, in contrast, have no agents, only a nondeterministic controller running in a nondeterministic environment. The synthesis problem consists of filling in holes in a hybrid system. Thus, expressing solutions for hybrid system synthesis with hybrid games is one of the insights of this paper.

An example of a game is \({(v\mathrel {{:}{=}}1 \cap v\mathrel {{:}{=}}-1) \,;\,\{x' = v\}}\). In this game, first Demon chooses between setting velocity v to 1, or to -1. Then, Angel evolves position x as \(x'=v\) for a duration of her choice. Differential game logic uses modalities to set win conditions for the players. For example, in the formula \([(v\mathrel {{:}{=}}1 \cap v\mathrel {{:}{=}}-1) \,;\,\{x' = v\}] \, x \ne 0\), Demon wins the game when \(x\ne 0\) at the end of the game and Angel wins otherwise. The overall formula represents the set of states from which Demon can win the game, which is \(x\ne 0\) because when \(x<0\), Demon has the winning strategy to pick \(v\mathrel {{:}{=}}-1\), so no matter how long Angel evolves \(x'=v\), \(x\) remains negative. Likewise, when \(x>0\), Demon can pick \(v\mathrel {{:}{=}}1\). However, when \(x=0\), Angel has a winning strategy: to evolve \(x'=v\) for zero time, so that \(x\) remains zero regardless of Demon’s choice.

We summarize dGL's program notation (Table 1); see [33] for a full exposition. Assignment \(x\mathrel {{:}{=}}\theta \) instantly changes the value of variable \(x\) to the value of \(\theta \). Challenge \(?\psi \) continues the game if \(\psi \) is satisfied in the current state, otherwise Angel loses immediately. In continuous evolution \( x'=\theta ~ \& ~ \psi \), Angel follows the differential equation \(x'=\theta \) for some duration of her choice, but loses immediately on violating \(\psi \) at any time. Sequential game \(\alpha ;\beta \) first plays \(\alpha \) and, when it terminates without a player having lost, continues with \(\beta \). Choice \({\alpha }\cup {\beta }\) lets Angel choose whether to play \(\alpha \) or \(\beta \). For repetition \({\alpha }^{*}\), Angel repeats \(\alpha \) some number of times, choosing to continue or terminate after each round. The dual game \(\alpha ^d\) switches the roles of players. For example, in the game \(?\psi ^d\), Demon passes the challenge if the current state satisfies \(\psi \), and otherwise loses immediately.

Table 1. Hybrid game operators for two-player hybrid systems

In games restricted to the structures listed above but without \(\alpha ^d\), all choices are resolved by Angel alone with no adversary, and hybrid games coincide with hybrid systems in differential dynamic logic (\(\textsf{dL}\)) [33]. We will use this restriction to specify the synthesis question: the sketch that specifies the shape and safety properties of control envelopes. But to characterize the solution that fills in the blanks of the control envelope sketch, we use games where both Angel and Demon play. Notation we use includes demonic choice \(\alpha \cap \beta \), which lets Demon choose whether to run \(\alpha \) or \(\beta \). Demonic repetition \(\alpha ^\times \) lets Demon repeat \(\alpha \), choosing whether to stop or go at the end of every run. We define \({\alpha }^{* \le n}\) and \({\alpha }^{\times \le n}\) for angelic and demonic repetitions, respectively, of at most n rounds.

In order to express properties about hybrid games, differential game logic formulas refer to the existence of winning strategies for objectives of the games (e.g., a controller has a winning strategy to achieve collision avoidance despite an adversarial environment). The set of dGL  formulas is generated by the following grammar (where \({\sim }\in \{<,\le ,=,\ge ,>\}\) and \(\theta _1,\theta _2\) are arithmetic expressions in \(+,-,\cdot ,/\) over the reals, \(x\) is a variable, \(\alpha \) is a hybrid game):

$$ \phi {:}{=}\theta _1 \sim \theta _2 \mid \lnot \phi \mid \phi \wedge \psi \mid \phi \vee \psi \mid \phi \rightarrow \psi \mid \forall x{\,} \phi \mid \exists x{\,} \phi \mid [\alpha ] \, \phi \mid \langle \alpha \rangle \, \phi $$

Comparisons of arithmetic expressions, Boolean connectives, and quantifiers over the reals are as usual. The modal formula \(\langle \alpha \rangle \, \phi \) expresses that player Angel has a winning strategy to reach a state satisfying \(\phi \) in hybrid game \(\alpha \). Modal formula \([\alpha ] \, \phi \) expresses the same for Demon. The fragment without modalities is first-order real arithmetic. Its fragment without quantifiers is called propositional arithmetic \(\mathcal {P}_{\mathbb {R}}\). Details on the semantics of dGL can be found in [33]. A formula \(\phi \) is valid, written \(\vDash _{}\phi \), iff it is true in every state \(\omega \). States are functions assigning a real number to each variable. For instance, \(\phi \rightarrow [\alpha ] \, \psi \) is valid iff, from all initial states satisfying \(\phi \), Demon has a winning strategy in game \(\alpha \) to achieve \(\psi \).

Control Safety Envelopes by Example. In order to separate safety critical aspects from other system goals during control design, we abstractly describe the safe choices of a controller with safe control envelopes that deliberately underspecify when and how to exactly execute certain actions. They focus on describing in which regions it is safe to take actions. For example, Model 1 designs a train control envelope [34] that must stop the train by the end of the movement authority \(e\) located somewhere ahead, as assigned by the train network scheduler. Past \(e\), there may be obstacles or other trains. The train's control choices are to accelerate or brake as it moves along the track. The goal of CESAR is to synthesize the framed formulas in the model, which are initially blank.

Model 1. Train control envelope sketch (its numbered lines are referenced in the text below; the framed formulas on Lines 2–4 are the synthesis targets).

Line 6 describes the safety property that is to be enforced at all times: the train driving at position p with velocity v must not go past position e. Line 1 lists modeling assumptions: the train is capable of both acceleration (\(A{>}0\)) and deceleration (\(B{>}0\)), the controller latency is positive (\(T{>}0\)), and the train cannot move backwards as a result of braking (this last fact is also reflected by having \(v \ge 0\) as a domain constraint for the plant on Line 5). These assumptions are fundamentally about the physics of the problem being considered. In contrast, Line 2 features a controllability assumption that can be derived from careful analysis. Here, this synthesized assumption says that the train cannot start so close to \(e\) that it won't stop in time even if it starts braking immediately. Line 3 and Line 4 describe a train controller with two actions: accelerating (\(a\mathrel {{:}{=}}A\)) and braking (\(a\mathrel {{:}{=}}-B\)). Each action is guarded by a synthesized formula, called an action guard, that indicates when the action is safe to use. Angel has control over which action runs, and adversarially plays with the objective of violating the safety condition. But Angel's options are limited to only safe ones because of the synthesized action guards, ensuring that Demon still wins and the overall formula is valid. In this case, braking is always safe whereas acceleration can only be allowed when the distance to the end position \(e\) is sufficiently large. Finally, the plant on Line 5 uses differential equations to describe the train's kinematics. A timer variable t ensures that no two consecutive runs of the controller are separated by more than time T. Thus, this controller is time-triggered.

Overview of CESAR. CESAR first identifies the optimal solution for the blank of Line 2. Intuitively, this blank should identify a controllable invariant, which denotes a set of states where a controller with choice between acceleration and braking has some strategy (to be enforced by the conditions of Line 3 and Line 4) that guarantees safe control forever. Such states can be characterized by the following dGL formula where Demon, as a proxy for the controller, decides whether to accelerate or brake: \([((a\mathrel {{:}{=}}A \cap a\mathrel {{:}{=}}-B) \,;\,{ \textsf {plant}})^*] \, { \textsf {safe}} \) where plant and safe are from Model 1. When this formula is true, Demon, who decides when to brake to maintain the safety contract, has a winning strategy that the controller can mimic. When it is false, Demon, a perfect player striving to maintain safety, has no winning strategy, so a controller has no guaranteed way to stay safe either.

This dGL formula provides an implicit characterization of the optimal controllable invariant, from which we derive an explicit formula in \(\mathcal {P}_{\mathbb {R}}\) to fill the blank, using symbolic execution. Symbolic execution solves a game following the axioms of dGL to produce an equivalent \(\mathcal {P}_{\mathbb {R}}\) formula (Section 3.7). However, our dGL formula contains a loop, for which symbolic execution will not terminate in finite time. To reason about the loop, we refine the game, modifying it so that it is easier to symbolically execute, but still at least as hard for Demon to win, so that the controllable invariant that it generates remains sound. In this example, the required game transformation first restricts Demon's options to braking. Then, it eliminates the loop using the observation that the repeated hybrid iterations \((a\mathrel {{:}{=}}-B;{ \textsf {plant}})^*\) behave the same as just following the continuous dynamics of braking for unbounded time. It replaces the original game with \( a\mathrel {{:}{=}}-B \,;\,t\mathrel {{:}{=}}0 \,;\,\{p'=v, v'=a \ \& \ v \ge 0\}\), which is loop-free and easily symbolically executed. Symbolically executing this game to reach safety condition safe yields the controllable invariant \(e-p>\frac{v^2}{2B}\) to fill the blank of Line 2.
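Concretely, since the braking dynamics are solvable, this symbolic execution amounts to a stopping-distance computation (a worked instance, included here for illustration):

$$\begin{aligned} & [a\mathrel {{:}{=}}-B \,;\,t\mathrel {{:}{=}}0 \,;\,\{p'=v, v'=a \ \& \ v \ge 0\}] \, e - p > 0 \\ \ \equiv \ \ & \forall s{\ge }0\, \big(v - Bs \ge 0 \rightarrow e - (p + vs - \tfrac{B}{2}s^2) > 0\big) \\ \ \equiv \ \ & e - p > \frac{v^2}{2B} \end{aligned}$$

where the last step holds by QE because the position \(p + vs - \frac{B}{2}s^2\) peaks at the stopping time \(s = v/B\), where it equals \(p + \frac{v^2}{2B}\).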

Intuitively, this refinement (formalized in Section 3.4) captures situations where the controller stays safe forever by picking a single control action (braking). It generates the optimal solution for this example because braking forever is the dominant strategy: given any state, if braking forever does not keep the train safe, then certainly no other strategy will. However, there are other problems where the dominant control strategy requires the controller to strategically switch between actions, and this refinement misses some controllable invariant states. So we introduce a new refinement: bounded game unrolling via a recurrence (Section 3.5). A solution generated by unrolling n times captures states where the controller can stay safe by switching control actions up to n times.

Having synthesized the controllable invariant, CESAR fills in the action guards (Line 3 and Line 4). An action should be permissible when running it for one iteration maintains the controllable invariant. For example, acceleration is safe to execute exactly when \([a\mathrel {{:}{=}}A;{ \textsf {plant}} ] \, e-p>\frac{v^2}{2B}\) holds. We symbolically execute this game to synthesize the formula that fills the guard of Line 3.
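To make this concrete, the following Scala sketch shows how the two synthesized formulas could serve as a runtime sandbox for Model 1. The object and method names are illustrative assumptions, not CESAR output; only the two formulas themselves are the ones synthesized in this paper.

```scala
// Sketch of a runtime monitor built from the synthesized envelope of Model 1.
object TrainEnvelope {
  final case class Params(A: Double, B: Double, T: Double, e: Double)
  final case class State(p: Double, v: Double)

  // Controllable invariant (Line 2): safe braking distance remains.
  def controllable(s: State, c: Params): Boolean =
    c.e - s.p > s.v * s.v / (2 * c.B)

  // Guard for acceleration (Line 3): braking distance still suffices after
  // accelerating for up to one control cycle T.
  def mayAccelerate(s: State, c: Params): Boolean =
    c.e - s.p > s.v * c.T + c.A * c.T * c.T / 2 +
      math.pow(s.v + c.A * c.T, 2) / (2 * c.B)

  // Guard for braking (Line 4): always safe inside the controllable invariant.
  def mayBrake(s: State, c: Params): Boolean = controllable(s, c)

  // Sandbox an untrusted controller: grant acceleration only when its guard
  // holds; otherwise fall back to braking.
  def sandbox(s: State, wantsAcceleration: Boolean, c: Params): Double =
    if (wantsAcceleration && mayAccelerate(s, c)) c.A else -c.B
}
```

Wrapping, e.g., a learned policy with sandbox recovers the runtime-monitoring use of envelopes discussed in Section 1.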

3 Approach

This section formally introduces the Control Envelope Synthesis via Angelic Refinements (CESAR) approach for hybrid systems control envelope synthesis.

3.1 Problem Definition

We frame the problem of control envelope synthesis in terms of filling in holes in a problem of the following shape:

$$\begin{aligned} { \textsf {assum}} \wedge \boxed{I} \ \rightarrow \ [((\cup _i \, ?\boxed{G_i} \,;\,{ \textsf {act}} _i) \,;\,{ \textsf {plant}})^{*}] \, { \textsf {safe}} \end{aligned}$$
(1)

Here, the control envelope consists of a nondeterministic choice between a finite number of guarded actions. Each action \({ \textsf {act}} _i\) is guarded by a condition \(G_i\), to be determined in a way that ensures safety within a controllable invariant [6, 18], which is also to be synthesized. The plant is defined by the following template:

$$ \begin{aligned} { \textsf {plant}} \ \equiv \ t\mathrel {{:}{=}}0 \,;\,\{{{x' = f(x), \, t'=1 \,}}\, \& \,\, { \textsf {domain}} \wedge t \le T\}. \end{aligned}$$
(2)

This ensures that the plant must yield to the controller after time T at most, where T is assumed to be positive and constant. In addition, we make the following assumptions:

  1.

    Components \({ \textsf {assum}} \), \({ \textsf {safe}} \) and \({ \textsf {domain}} \) are propositional arithmetic formulas.

  2.

    Timer variable t is fresh (it does not occur except where shown in the template).

  3.

    Programs \({ \textsf {act}} _i\) are discrete \(\textsf{dL} \) programs that can involve choices, assignments and tests with propositional arithmetic. Variables assigned by \({ \textsf {act}} _i\) must not appear in \({ \textsf {safe}} \). In addition, \({ \textsf {act}} _i\) must terminate in the sense that \(\vDash _{}\langle { \textsf {act}} _i \rangle \, \text {true}\).

  4.

    The modeling assumptions \({ \textsf {assum}} \) are invariant in the sense that \(\vDash _{}{ \textsf {assum}} \rightarrow [(\cup _i \, { \textsf {act}} _i) \,;\,{ \textsf {plant}} ] \, { \textsf {assum}} \). This holds trivially for assumptions about constant parameters such as \(A>0\) in Model 1, and it ensures that the controller can always rely on them being true.

Definition 1

A solution to the synthesis problem above is defined as a pair \((I, G)\) where I is a formula and G maps each action index i to a formula \(G_i\). In addition, the following conditions must hold:

  1.

    Safety is guaranteed: Eq. (1) is valid with the holes filled by \((I, G)\), and \(({ \textsf {assum}} \wedge I)\) is a loop invariant that proves it so.

  2.

    There is always some action: \(({ \textsf {assum}} \wedge I) \rightarrow \bigvee _i G_i\) is valid.

Condition 2 is crucial for using the resulting nondeterministic control envelope, since it guarantees that safe actions are always available as a fallback.
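Viewed as data, the synthesis problem and its candidate solutions have a simple shape; the following Scala sketch records it. The names and representations are illustrative assumptions, not CESAR's actual data structures.

```scala
// Shape of the synthesis problem of Eq. (1) and of its solutions (Def. 1).
sealed trait Formula                       // propositional arithmetic P_R
sealed trait Program                       // discrete dL action programs
final case class Ode(rhs: Map[String, String], domain: Formula) // x'=f(x) & domain

final case class SynthesisProblem(
  assum: Formula,                          // modeling assumptions (invariant)
  safe: Formula,                           // safety property to enforce
  actions: Vector[Program],                // act_1, ..., act_k, each terminating
  plant: Ode                               // runs for at most time T per cycle
)

// A solution fills the holes of Eq. (1): an invariant I and one guard per
// action; Def. 1 additionally requires that some guard is always enabled.
final case class Solution(I: Formula, G: Vector[Formula])
```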

3.2 An Optimal Solution

Solutions to a synthesis problem may differ in quality. Intuitively, a solution is better than another if it allows for a strictly larger controllable invariant. In case of equality, the solution with the more permissive control envelope wins. Formally, given two solutions \(S = (I, G)\) and \(S' = (I', G')\), we say that \(S'\) is better or equal to S (written \(S \sqsubseteq S'\)) if and only if \(\vDash _{}{ \textsf {assum}} \rightarrow (I \rightarrow I')\) and additionally either \(\vDash _{}{ \textsf {assum}} \rightarrow \lnot (I' \rightarrow I)\) or \(\vDash _{}({ \textsf {assum}} \wedge I) \rightarrow \bigwedge _i \, (G_i \rightarrow G'_i)\). Given two solutions S and \(S'\), one can define a solution \(S \sqcap S' = (I \vee I',\, i \mapsto (I \wedge G_i \,\vee \, I' \wedge G_i'))\) that is better or equal to both S and \(S'\) (\(S \sqsubseteq S \sqcap S'\) and \(S' \sqsubseteq S \sqcap S'\)). A solution \(S'\) is called the optimal solution when it is the maximum element in the ordering, so that for any other solution S, \(S \sqsubseteq S'\). The optimal solution exists and is expressible in \(\textsf {dGL} \):

$$\begin{aligned} I^{{{\,\textrm{opt}\,}}}&\ \equiv \ [((\cap _i \, { \textsf {act}} _i) \,;\,{ \textsf {plant}})^*] \, { \textsf {safe}} \end{aligned}$$
(3)
$$\begin{aligned} G^{{{\,\textrm{opt}\,}}}_i &\ \equiv \ [{ \textsf {act}} _i \,;\,{ \textsf {plant}} ] \, I^{{{\,\textrm{opt}\,}}}. \end{aligned}$$
(4)

Intuitively, \(I^{{{\,\textrm{opt}\,}}}\) characterizes the set of all states from which an optimal controller (played here by Demon) can keep the system safe forever. In turn, \(G^{{{\,\textrm{opt}\,}}}\) is defined to allow any control action that is guaranteed to keep the system within \(I^{{{\,\textrm{opt}\,}}}\) until the next control cycle as characterized by a modal formula. Section 3.3 formally establishes the correctness and optimality of \(S^{{{\,\textrm{opt}\,}}}\equiv (I^{{{\,\textrm{opt}\,}}}, \, G^{{{\,\textrm{opt}\,}}})\).

While it is theoretically reassuring that an optimal solution exists that is at least as good as all others and that this optimum can be characterized in dGL, such a solution is of limited practical usefulness since Eq. (3) cannot be executed without solving a game at runtime. Rather, we are interested in explicit solutions where I and G are quantifier-free real arithmetic formulas. There is no guarantee in general that such solutions exist that are also optimal, but our goal is to devise an algorithm to find them in the many cases where they exist or find safe approximations otherwise.

3.3 Controllable Invariants

The fact that \(S^{{{\,\textrm{opt}\,}}}\) is a solution can be characterized in logic with the notion of a controllable invariant that, at each of its points, admits some control action that keeps the plant in the invariant for one round. All lemmas and theorems throughout this paper are proved in the extended preprint [21, Appendix B].

Definition 2 (Controllable Invariant)

A controllable invariant is a formula I such that \(\,\vDash _{}I \rightarrow { \textsf {safe}} \) and \(\, \vDash _{}I \rightarrow \bigvee _i\, [{ \textsf {act}} _i \,;\,{ \textsf {plant}} ] \, I\).

From this perspective, \(I^{{{\,\textrm{opt}\,}}}\) can be seen as the largest controllable invariant.

Lemma 1

\(I^{{{\,\textrm{opt}\,}}}\) is a controllable invariant and it is optimal in the sense that \(\vDash _{}I \rightarrow I^{{{\,\textrm{opt}\,}}}\) for any controllable invariant I.

Moreover, not just \(I^{{{\,\textrm{opt}\,}}}\), but every controllable invariant induces a solution. Indeed, given a controllable invariant I, we can define \(\mathcal {G}(I) \equiv (i \mapsto [{ \textsf {act}} _i \,;\,{ \textsf {plant}} ] \, I)\) for the control guards induced by I. \(\mathcal {G}(I)\) chooses as the guard for each action \({ \textsf {act}} _i\) the modal condition ensuring that \({ \textsf {act}} _i\) preserves I after the \({ \textsf {plant}} \).

Lemma 2

If I is a controllable invariant, then \((I, \mathcal {G}(I))\) is a solution (Def. 1).

Conversely, a controllable invariant can be derived from any solution.

Lemma 3

If \((I, G)\) is a solution, then \(I' \equiv ({ \textsf {assum}} \wedge I)\) is a controllable invariant. Moreover, we have \((I, G) \sqsubseteq (I', \mathcal {G}(I'))\).

Solution comparisons w.r.t. \(\sqsubseteq \) reduce to implications for controllable invariants.

Lemma 4

If I and \(I'\) are controllable invariants, then \((I, \mathcal {G}(I)) \sqsubseteq (I', \mathcal {G}(I'))\) if and only if \(\,\vDash _{}{ \textsf {assum}} \rightarrow (I \rightarrow I')\).

Taken together, these lemmas allow us to establish the optimality of \(S^{{{\,\textrm{opt}\,}}}\).

Theorem 1

\(S^{{{\,\textrm{opt}\,}}}\) is an optimal solution (i.e. a maximum w.r.t. \(\sqsubseteq \)) of Def. 1.

This shows the roadmap for the rest of the paper: finding solutions to the control envelope synthesis problem reduces to finding controllable invariants that imply \(I^{{{\,\textrm{opt}\,}}}\), which can be found by restricting the actions available to Demon in \(I^{{{\,\textrm{opt}\,}}}\) to guarantee safety, thereby refining the associated game.

3.4 One-Shot Fallback Refinement

The simplest refinement of \(I^{{{\,\textrm{opt}\,}}}\) is obtained by fixing a single fallback action to use in all states (if that is safe). A more general refinement considers different fallback actions in different states, but still only plays one such action forever.

Using the \(\textsf {dGL} \) axioms, any loop-free \(\textsf {dGL} \) formula whose ODEs admit solutions expressible in real arithmetic can be automatically reduced to an equivalent first-order arithmetic formula (in \(\text {FOL}_\mathbb {R} \)). An equivalent propositional arithmetic formula in \(\mathcal {P}_{\mathbb {R}}\) can be computed via quantifier elimination (QE). For example:

$$\begin{aligned} & [(v\mathrel {{:}{=}}1 \cap v\mathrel {{:}{=}}-1) \,;\,\{x' = v\}] \, x \ne 0 \\ \ \equiv \ & [v\mathrel {{:}{=}}1 \cap v\mathrel {{:}{=}}-1] \, [\{x' = v\}] \, x \ne 0 &\,& \text {by }[{;}] \\ \ \equiv \ \ {} & [v\mathrel {{:}{=}}1] \, [\{x' = v\}] \, x \ne 0 \ \vee \ [v\mathrel {{:}{=}}-1] \, [\{x' = v\}] \, x \ne 0 &\,& \text {by }[\cap ] \\ \ \equiv \ \ {} & [\{x' = 1\}] \, x \ne 0 \ \vee \ [\{x' = -1\}] \, x \ne 0 &\,& \text {by }[:=] \\ \ \equiv \ \ {} & (\forall t{\ge }0\, x + t \ne 0) \vee (\forall t{\ge }0\, x - t \ne 0) &\,& \text {by }['],[:=] \\ \ \equiv \ \ {} & x > 0 \ \vee \ x < 0 &\,& \text {by QE} . \end{aligned}$$

Even when a formula features nonsolvable ODEs, techniques exist to compute weakest preconditions for differential equations, with conservative approximations [38] or even exactly in some cases [8, 35]. In the rest of this section and for most of this paper, we therefore assume the existence of a reduce oracle that takes as input a loop-free dGL formula and returns a quantifier-free arithmetic formula that is equivalent modulo some assumptions. Section 3.7 shows how to implement and optimize reduce.

Definition 3 (Reduction Oracle)

A reduction oracle is a function \({ \textsf {reduce}} \) that takes as input a loop-free dGL formula F and an assumption \(A \in \mathcal {P}_{\mathbb {R}} \). It returns a formula \(R \in \mathcal {P}_{\mathbb {R}} \) along with a boolean flag \({ \textsf {exact}} \) such that the formula \(A \rightarrow (R \rightarrow F)\) is valid, and if \({ \textsf {exact}} \) is true, then \(A \rightarrow (R \leftrightarrow F)\) is valid as well.
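The axiom-driven part of such an oracle can be sketched compactly; the following Scala fragment mirrors the derivation above on a miniature loop-free game AST. It is illustrative only, not CESAR's implementation: formulas are plain strings, substitution is naive text replacement, and quantifier elimination is left to a back end.

```scala
// Minimal sketch of symbolic execution for loop-free games via the dGL box axioms.
object ReduceSketch extends App {
  sealed trait Game
  final case class Assign(x: String, e: String)  extends Game // x := e
  final case class Test(psi: String)             extends Game // ?psi
  final case class Seq2(a: Game, b: Game)        extends Game // a ; b
  final case class Choice(a: Game, b: Game)      extends Game // a ∪ b (Angel)
  final case class DChoice(a: Game, b: Game)     extends Game // a ∩ b (Demon)
  final case class OdeBox(wp: String => String)  extends Game // solved-ODE oracle

  // First-order formula equivalent to [g] post, computed by the box axioms.
  def box(g: Game, post: String): String = g match {
    case Assign(x, e)  => post.replace(x, s"($e)")                // [:=] (naive subst.)
    case Test(psi)     => s"($psi -> $post)"                      // [?]
    case Seq2(a, b)    => box(a, box(b, post))                    // [;]
    case Choice(a, b)  => s"(${box(a, post)} & ${box(b, post)})"  // [∪]
    case DChoice(a, b) => s"(${box(a, post)} | ${box(b, post)})"  // [∩]
    case OdeBox(wp)    => wp(post)                                // ['] for solvable ODEs
  }

  // Running example: [(v := 1 ∩ v := -1); {x' = v}] x != 0.
  val ode  = OdeBox(p => s"forall t (t >= 0 -> ${p.replace("x", "(x + v*t)")})")
  val game = Seq2(DChoice(Assign("v", "1"), Assign("v", "-1")), ode)
  println(box(game, "x != 0"))  // QE would then simplify this to x > 0 | x < 0
}
```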

Back to our original problem, \(I^{{{\,\textrm{opt}\,}}}\) is not directly reducible since it involves a loop. However, conservative approximations can be computed by restricting the set of strategies that the Demon player is allowed to use. One extreme case allows Demon to only use a single action \({ \textsf {act}} _i\) repeatedly as a fallback (e.g. braking in the train example). In this case, we get a controllable invariant \([{({ \textsf {act}} _i \,;\,{ \textsf {plant}})}^{*}] \, { \textsf {safe}} \), which further simplifies into \([{ \textsf {act}} _i \,;\,{ \textsf {plant}} _\infty ] \, { \textsf {safe}} \) with

$$ \begin{aligned} { \textsf {plant}} _\infty \!\equiv \{{{ x' = f(x), t'=1\,}}\, \& \,\,{ \textsf {domain}} \} \end{aligned}$$

a variant of plant that never yields control. For this last step to be valid though, a technical assumption is needed on \({ \textsf {act}} _i\), which we call action permanence.

Definition 4 (Action Permanence)

An action \({ \textsf {act}} _i\) is said to be permanent if and only if \(({ \textsf {act}} _i \,;\,{ \textsf {plant}} \,;\,{ \textsf {act}} _i) \equiv ({ \textsf {act}} _i \,;\,{ \textsf {plant}})\), i.e., they are equivalent games.

Intuitively, an action is permanent if executing it more than once in a row has no consequence for the system dynamics. This is true in the common case of actions that only assign constant values to control variables that are read but not modified by the plant, such as \(a\mathrel {{:}{=}}A\) and \(a\mathrel {{:}{=}}-B\) in Model 1.
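A simple syntactic check suffices for this common case; the sketch below is an illustrative assumption (not from the paper), with string containment standing in for real term analysis.

```scala
// Sufficient syntactic condition for action permanence (Def. 4): the action is
// a list of assignments of state-independent terms to variables that the plant
// reads but never writes.
final case class ConstAssign(lhs: String, rhs: String)   // e.g. ConstAssign("a", "-B")

def permanentSufficient(act: List[ConstAssign],
                        plantWrites: Set[String]): Boolean =
  act.map(_.lhs).distinct.size == act.size &&                   // each lhs assigned once
  act.forall(a => !plantWrites.contains(a.lhs)) &&              // plant never writes lhs
  act.forall(a => plantWrites.forall(x => !a.rhs.contains(x))) && // rhs plant-independent
  act.forall(a => act.forall(b => !a.rhs.contains(b.lhs)))      // rhs free of assigned vars

// Model 1: permanentSufficient(List(ConstAssign("a", "-B")), Set("p", "v", "t"))
```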

Lemma 5

If \({ \textsf {act}} _i\) is permanent, \( \vDash _{}[{({ \textsf {act}} _i \,;\,{ \textsf {plant}})}^{*}] \, { \textsf {safe}} \leftrightarrow [{ \textsf {act}} _i \,;\,{ \textsf {plant}} _\infty ] \, { \textsf {safe}} \).

Our discussion so far identifies the following approximation to our original synthesis problem, where \(\textsf {P}\) denotes the set of all indexes of permanent actions:

$$\begin{aligned} I^{ 0} &\ \equiv \ [(\cap _{i\in \textsf {P}} \, { \textsf {act}} _i) \,;\,{ \textsf {plant}} _\infty ] \, { \textsf {safe}}, \\ G^{ 0}_i &\ \equiv \ [{ \textsf {act}} _i \,;\,{ \textsf {plant}} ] \, I^{ 0}. \end{aligned}$$

Here, \(I^{ 0}\) encompasses all states from which the agent can guarantee safety indefinitely with a single permanent action. \(G^{ 0}\) is constructed according to \(\mathcal {G}(I^{ 0})\) and only allows actions that are guaranteed to keep the agent within \(I^{ 0}\) until the next control cycle. Note that \(I^{ 0}\) degenerates to \(\text {false}\) in cases where there are no permanent actions, which does not make it less of a controllable invariant.

Theorem 2

\(I^{ 0}\) is a controllable invariant.

Moreover, in many examples of interest, \(I^{ 0}\) and \(I^{{{\,\textrm{opt}\,}}}\) are equivalent since an optimal fallback strategy exists that only involves executing a single action. This is the case in particular for Model 1, where

$$ \begin{aligned} I^{ 0} & \ \equiv \ [a\mathrel {{:}{=}}-B \,;\,\{p'=v, v'=a \ \& \ v \ge 0\}] \, e - p > 0 \\ & \ \equiv \ \, e - p > v^2/2B \end{aligned}$$

characterizes all states at safe braking distance to the obstacle and \(G^{ 0}\) associates the following guard to the acceleration action:

$$ \begin{aligned} G^{ 0}_{a\mathrel {{:}{=}}A} & \ \equiv \ [a\mathrel {{:}{=}}A \,;\,\{p'=v, v'=a, t'=1 \ \& \ v \ge 0 \wedge t \le T\}] \, e - p > v^2/2B \\ & \ \equiv \ \, e - p > vT + {AT^2}/{2} + {(v + AT)^2}/{2B} \end{aligned}$$

That is, accelerating is allowed if doing so is guaranteed to maintain sufficient braking distance until the next control opportunity. Section 3.6 discusses automatic generation of a proof that \((I^{ 0}, G^{ 0})\) is an optimal solution for Model 1.
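As a quick numeric spot-check (not a proof; parameter values and sampling grid are illustrative assumptions), one can confirm that every sampled state satisfying \(G^{ 0}_{a\mathrel {{:}{=}}A}\) stays inside \(I^{ 0}\) after an accelerating control cycle of any duration \(\tau \le T\):

```scala
// Numeric spot-check of guard/invariant consistency for Model 1, using the
// closed-form kinematics (no integration). Parameters are arbitrary samples.
object CheckModel1 extends App {
  val aMax = 2.0; val bBrake = 3.0; val tMax = 0.5; val e = 1000.0
  def inv(p: Double, v: Double): Boolean =            // I^0: safe braking distance
    e - p > v * v / (2 * bBrake)
  def guardAccel(p: Double, v: Double): Boolean =     // G^0 for a := A
    e - p > v * tMax + aMax * tMax * tMax / 2 +
      math.pow(v + aMax * tMax, 2) / (2 * bBrake)
  def accelCycle(p: Double, v: Double, tau: Double): (Double, Double) =
    (p + v * tau + aMax * tau * tau / 2, v + aMax * tau)
  val samples = for {
    p <- 0 to 990 by 10; v <- 0 to 40; k <- 1 to 10
  } yield (p.toDouble, v.toDouble, k * tMax / 10)
  val ok = samples.forall { case (p, v, tau) =>
    !guardAccel(p, v) || { val (p1, v1) = accelCycle(p, v, tau); inv(p1, v1) }
  }
  println(s"guarded acceleration preserves I^0 on all samples: $ok")
}
```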

3.5 Bounded Fallback Unrolling Refinement

In Section 3.4, we derived a solution by computing an underapproximation of \(I^{{{\,\textrm{opt}\,}}}\) where the fallback controller (played by Demon) is only allowed to use a one-shot strategy that picks a single action and plays it forever. Although this approximation is always safe and, in many cases of interest, happens to be exact, it does lead to a suboptimal solution in others. In this section, we allow the fallback controller to switch actions a bounded number of times before it plays one forever. There are still cases where doing so is suboptimal (imagine a car on a circular race track that is forced to maintain constant velocity). But this restriction is in line with the typical understanding of a fallback controller, whose mission is not to take over a system indefinitely but rather to maneuver it into a state where it can safely get to a full stop [32].

For all bounds \(n \in \mathbb {N}\), we define a game where the fallback controller (played by Demon) takes at most n turns to reach the region \(I^0\) in which safety is guaranteed indefinitely. During each turn, it picks a permanent action and chooses a time \(\theta \) in advance for when it wishes to play its next move. Because the environment (played by Angel) has control over the duration of each control cycle, the fallback controller cannot expect to be woken up after time \(\theta \) exactly. However, it can expect to be provided with an opportunity for its next move within the \([\theta , \theta +T]\) time window since the plant can never execute for time greater than T. Formally, we define \(I^{ n}\) as follows:

$$\begin{aligned} I^{ n} &\,\equiv \, [{{ \textsf {step}}}^{\times \le n} \,;\,{ \textsf {forever}} ] \, { \textsf {safe}} \qquad { \textsf {forever}} \,\equiv \, (\cap _{i\in \textsf {P}} \, { \textsf {act}} _i) \,;\,{ \textsf {plant}} _\infty \\ { \textsf {step}} &\,\equiv \, (\theta \mathrel {{:}{=}}* \,;\,?\theta \ge 0)^d \,;\,(\cap _{i\in \textsf {P}} \, { \textsf {act}} _i) \,;\,{ \textsf {plant}} _{\theta + T} \,;\,?{ \textsf {safe}} ^d \,;\,?t\ge \theta \end{aligned}$$

where \({ \textsf {plant}} _{\theta +T}\) is the same as \({ \textsf {plant}} \), except that the domain constraint \(t\le T\) is replaced by \(t\le \theta +T\). Equivalently, we can define \(I^n\) by induction as follows:

$$\begin{aligned} I^{ n+1} \, \equiv \,I^{ n} \,\vee \, [{ \textsf {step}} ] \, I^{ n} \qquad I^{ 0} \, \equiv \,[{ \textsf {forever}} ] \, { \textsf {safe}}, \end{aligned}$$
(5)

where the base case coincides with the definition of \(I^{ 0}\) in Section 3.4. Importantly, \(I^{ n}\) is a loop-free controllable invariant and so \({ \textsf {reduce}} \) can compute an explicit solution to the synthesis problem from \(I^{ n}\).

Theorem 3

\(I^{ n}\) is a controllable invariant for all \(n \ge 0\).

Theorem 3 establishes a nontrivial result since it overcomes the significant gap between the fantasized game that defines \(I^n\) and the real game being played by a time-triggered controller. The proof critically relies on the action permanence assumption along with a result [21, Lemma 6] establishing that ODEs preserve a specific form of reach-avoid property as a result of being deterministic.

Fig. 1. Robot navigating a corridor (Model 2). A 2D robot must navigate safely within a corridor with a dead-end without crashing against a wall. The corridor extends infinitely on the bottom and on the right. The robot can choose between going left and going down with a constant speed V. The left diagram shows \(I^0\) in gray. The right diagram shows \(I^1\) under the additional assumption \(VT < 2R\) (\(I^1\) and \(I^0\) are otherwise equivalent). A darker shade of gray is used for regions of \(I^1\) where only one of the two available actions is safe according to \(G^1\).

Example. As an illustration, consider the example in Fig. 1 and Model 2 of a 2D robot moving in a corridor that forms an angle. The robot is only allowed to move left or down at a constant velocity and must not crash against a wall. Computing \(I^{ 0}\) gives us the vertical section of the corridor, in which going down is a safe one-step fallback. Computing \(I^{ 1}\) forces us to distinguish two cases. If the corridor is wider than the maximal distance travelled by the robot in a control cycle (\(VT < 2R\)), then the upper section of the corridor is controllable (with the exception of a dead-end that we prove to be uncontrollable in Section 3.6). On the other hand, if the corridor is too narrow, then \(I^{ 1}\) is equivalent to \(I^{ 0}\). Formally, we have \(I^1 \ \equiv \ (y>-R \,\wedge \, |x| < R) \ \vee \ (VT<2R \,\wedge \, (x>-R \,\wedge \, |y| < R)).\) Moreover, computing \(I^2\) gives a result that is equivalent to \(I^1\). From this, we can conclude that \(I^1\) is equivalent to \(I^n\) for all \(n \ge 1\). Intuitively, it is optimal with respect to any finite fallback strategy (restricted to permanent actions).

Model 2. Robot navigating a corridor (model listing; see Fig. 1 for the geometry).
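For reference, the explicit form of \(I^1\) above transcribes directly into a membership test (illustrative Scala; R is the corridor half-width, V the speed, T the control cycle time):

```scala
// Membership test for the corridor's I^1, transcribed from the formula above.
def i1(x: Double, y: Double, R: Double, V: Double, T: Double): Boolean =
  (y > -R && math.abs(x) < R) ||                 // vertical section (this is I^0)
  (V * T < 2 * R && x > -R && math.abs(y) < R)   // upper section, if one switch fits
```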

The controllable invariant unrolling \(I^{ n}\) has a natural stopping criterion.

Lemma 6

If \(I^{ n} \leftrightarrow I^{ n+1}\) is valid for some \(n \ge 0\), then \(I^{ n} \leftrightarrow I^{ m}\) is valid for all \(m \ge n\) and \(I^{ n} \leftrightarrow I^{ \omega }\) is valid where \(I^{ \omega } \, \equiv \,[{ \textsf {step}} ^\times \,;\,{ \textsf {forever}} ] \, { \textsf {safe}} \).
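Operationally, Eq. (5) together with Lemma 6 suggests the following unrolling loop, sketched here in Scala with the reduction and validity oracles abstracted away (all names are illustrative):

```scala
sealed trait Formula
final case class Or(a: Formula, b: Formula)  extends Formula
final case class Iff(a: Formula, b: Formula) extends Formula

// Bounded unrolling of Eq. (5) with the stopping criterion of Lemma 6.
def unroll(i0: Formula,                  // I^0 from the one-shot refinement
           stepBox: Formula => Formula,  // I  |->  (reduced form of) [step] I
           valid: Formula => Boolean,    // validity oracle, e.g. QE-based
           maxN: Int): (Formula, Boolean) = {
  var i: Formula = i0
  for (_ <- 1 to maxN) {
    val next = Or(i, stepBox(i))              // I^{n+1} := I^n ∨ [step] I^n
    if (valid(Iff(next, i))) return (i, true) // fixpoint: ω-optimal by Lemma 6
    i = next
  }
  (i, false)  // bound exhausted: optimal only w.r.t. at most maxN switches
}
```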

3.6 Proving Optimality via the Dual Game

Suppose one found a controllable invariant I using techniques from the previous section. To prove it optimal, one must show that \(\vDash _{}\,{ \textsf {assum}} \rightarrow (I^{{{\,\textrm{opt}\,}}}\rightarrow I)\). By contraposition and \([\alpha ] \, P \leftrightarrow \lnot \langle \alpha \rangle \, \lnot P\) (\([\cdot ]\)), this is equivalent to proving that:

$$\begin{aligned} \vDash _{}\ { \textsf {assum}} \wedge \lnot I \rightarrow \underbrace{\langle {((\cap _i \, { \textsf {act}} _i) \,;\,{ \textsf {plant}})}^{*} \rangle \, \lnot { \textsf {safe}}}_{\lnot I^{{{\,\textrm{opt}\,}}}}. \end{aligned}$$
(6)

We define the largest uncontrollable region \(U^{{{\,\textrm{opt}\,}}}\equiv \lnot I^{{{\,\textrm{opt}\,}}}\) as the right-hand side of implication (6) above. Intuitively, \(U^{{{\,\textrm{opt}\,}}}\) characterizes the set of all states from which the environment (played by Angel) has a winning strategy against the controller (played by Demon) for reaching an unsafe state. In order to prove the optimality of I, we compute a sequence of increasingly large under-approximations U of \(U^{{{\,\textrm{opt}\,}}}\), i.e. such that \(U \rightarrow U^{{{\,\textrm{opt}\,}}}\) is valid. We do so via an iterative process, in the spirit of how we approximate \(I^{{{\,\textrm{opt}\,}}}\) via bounded fallback unrolling (Section 3.5), although the process can be guided by the knowledge of I this time. If at any point we manage to prove that \({ \textsf {assum}} \rightarrow (I \vee U)\) is valid, then I is optimal.

One natural way to compute increasingly good approximations of \(U^{{{\,\textrm{opt}\,}}}\) is via loop unrolling. The idea is to improve approximation U by adding states from where the environment can reach U by running the control loop once, formally, \(\langle (\cap _i \, { \textsf {act}} _i) \,;\,{ \textsf {plant}} \rangle \, U\). This unrolling principle can be useful. However, it only augments U with new states that can reach U in time T at most. So it alone cannot prove optimality in cases where violating safety from an uncontrollable state takes an unbounded amount of time.

For concreteness, let us prove the optimality of \(I^{ 0}\) in the case of Model 1. In [34] essentially the following statement is proved when arguing for optimality: \( \ \vDash _{}{ \textsf {assum}} \wedge \lnot I^0 \rightarrow \langle (a\mathrel {{:}{=}}-B \,;\,{ \textsf {plant}})^* \rangle \, \lnot { \textsf {safe}}. \) This is identical to our optimality criterion from Eq. (6), except that Demon's actions are restricted to braking. Intuitively, this restriction is sound since accelerating always makes things worse as far as safety is concerned. If the train cannot be saved with braking alone, adding the option to accelerate will not help a bit. In this work, we propose a method for formalizing such arguments within \(\textsf {dGL} \) for arbitrary systems.

Our idea for doing so is to consider a system made of two separate copies of our model. One copy has all actions available whereas the other is only allowed a single action (e.g. braking). Given a safety metric m (i.e. a term m such that \(\vDash _{}m \le 0 \rightarrow \lnot { \textsf {safe}} \)), we can then formalize within this joint system the idea that “action i is always better w.r.t. safety metric m”.

Definition 5 (Uniform Action Optimality)

Consider a finite number of discrete \(\textsf{dL}\) programs \(\alpha _i\) and \( p \equiv \{x'=f(x) \ \& \ Q\}\). Let \(V = {{\,\textrm{BV}\,}}(p) \cup \bigcup _i {{\,\textrm{BV}\,}}(\alpha _i)\) be the set of all variables written by p or some \(\alpha _i\). For any term \(\theta \) and integer n, write \({\theta }^{(n)}\) for the term that results from \(\theta \) by renaming each variable \(v \in V\) to a fresh tagged version \({v}^{(n)}\). Using a similar notation for programs and formulas, define \( {p}^{(1,2)} \equiv \{({x}^{(1)})' = f({x}^{(1)}), ({x}^{(2)})' = f({x}^{(2)}) \ \& \ {Q}^{(1)} \wedge {Q}^{(2)} \}\). We say that action j is uniformly optimal with respect to safety metric m if and only if:

$$\begin{aligned} \vDash _{}\ {m}^{(1)} \ge {m}^{(2)} \rightarrow [{\alpha _j}^{(1)} \,;\,(\cup _i \, {\alpha _i}^{(2)}) \,;\,{p}^{(1,2)}] \, {m}^{(1)} \ge {m}^{(2)}. \end{aligned}$$

\({\textsf {best}}_{j}((\alpha _i)_i, p, m)\) denotes that action j is uniformly optimal with respect to m for actions \(\alpha _i\) and dynamics p.

With such a concept in hand, we can formally establish that criterion Eq. (6) can be relaxed in the presence of uniformly optimal actions.

Theorem 4

Consider a finite number of discrete \(\textsf{dL}\) programs \(\alpha _i\) such that \(\vDash _{}\langle \alpha _i \rangle \, \text {true}\) for all i and \( p \equiv \{x'=f(x) \ \& \ q \ge 0\}\). Then, provided that \({\textsf {best}}_{j}((\alpha _i)_i, p, m)\) and \({\textsf {best}}_{j}((\alpha _i)_i, p, -q)\) (no other action stops earlier because of the domain constraint), we have:

$$\begin{aligned} \vDash _{}\ \langle ((\cap \, \alpha _i) \,;\,p)^* \rangle \, m \le 0 \leftrightarrow \langle (\alpha _j \,;\,p)^* \rangle \, m \le 0 . \end{aligned}$$

A general heuristic for leveraging Theorem 4 to grow U automatically works as follows. First, it considers \(R \equiv { \textsf {assum}} \wedge \lnot I \wedge \lnot U\) that characterizes states that are not known to be controllable or uncontrollable. Then, it picks a disjunct \(\bigwedge _j R_{j}\) of the disjunctive normal form of R and computes a forward invariant region V that intersects with it: \(V \equiv \bigwedge _j \{ R_{j} : { \textsf {assum}}, \, R_{j} \vdash [(\cup _i \, { \textsf {act}} _i) \,;\,{ \textsf {plant}} ] \, R_{j} \}\). Using V as an assumption to simplify \(\lnot U\) may suggest metrics to be used with Theorem 4. For example, observing \(\vDash _{}V \rightarrow (\lnot U \rightarrow (\theta _1 > 0 \wedge \theta _2 > 0))\) suggests picking metric \(m \equiv \min (\theta _1, \theta _2)\) and testing whether \({\textsf {best}}_{j}({ \textsf {act}}, p, m)\) is true for some action j. If such a uniformly optimal action exists, then U can be updated as \(U \leftarrow U \,\vee \, (V \wedge \langle ({ \textsf {act}} _j \,;\,{ \textsf {plant}})^* \rangle \, m \le 0)\). The solution \(I^{ 1}\) for the corridor (Model 2) can be proved optimal automatically using this heuristic in combination with loop unrolling.

3.7 Implementing the Reduction Oracle

The CESAR algorithm assumes the existence of a reduction oracle that takes as input a loop-free \(\textsf {dGL} \) formula and attempts to compute an equivalent formula within the fragment of propositional arithmetic. When an exact solution cannot be found, an implicant is returned instead and flagged appropriately (Def. 3). This section discusses our implementation of such an oracle.

As discussed in Section 3.4, exact solutions can be computed systematically when all ODEs are solvable by first using the \(\textsf {dGL} \) axioms to eliminate modalities and then passing the result to a quantifier elimination algorithm for first-order arithmetic [9, 42]. Although straightforward in theory, a naïve implementation of this idea hits two practical barriers. First, quantifier elimination is expensive and its cost increases rapidly with formula complexity [11, 44]. Second, the output of existing QE implementations can be unnecessarily large and redundant. In iterated calls to the reduction oracle, these problems can compound each other.

To alleviate these issues, our implementation performs eager simplification at intermediate stages of computation, between some axiom application and quantifier elimination steps. This optimization significantly reduces the size of output solutions and allows CESAR to solve in 26 s a benchmark that would otherwise time out after 20 minutes. [21, Appendix E] further discusses the impact of eager simplification. Still, the doubly exponential complexity of quantifier elimination puts a limit on the complexity of problems that CESAR can currently tackle.

In the general case, when ODEs are not solvable, our reduction oracle is still often able to produce approximate solutions using differential invariants generated automatically by existing tools [38]. Differential invariants are formulas that stay true throughout the evolution of an ODE system. To see how they apply, consider the case of computing \({ \textsf {reduce}} ([\{x'=f(x)\}] \, P, A)\), where P is the postcondition formula that must be true after executing the differential equation and A is the assumption that holds initially. Suppose that formula D(x) is a differential invariant such that \(D(x)\rightarrow P\) is valid. Then a precondition sufficient to ensure that P holds after evolution is \(A \rightarrow D(x)\). For example, to compute the precondition for the dynamics of the parachute benchmark, our reduction oracle first uses the Pegasus tool [38] to identify a Darboux polynomial, suggesting an initial differential invariant \(D_0\). Once we have \(D_0\), the additional information required to conclude postcondition P is \(D_0 \rightarrow P\). To get an invariant formula that implies \(D_0 \rightarrow P\), we eliminate all the changing variables \(\{x, v\}\) in the formula \(\forall x \, \forall v\ (D_0 \rightarrow P)\), resulting in a formula \(D_1\). \(D_1\) is a differential invariant since it features no variable that is updated by the ODEs. Our reduction oracle returns \(D_0 \wedge D_1\), an invariant that entails postcondition P.
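The following Scala sketch outlines this branch of the oracle. The differential invariant generator and the QE procedure are abstract parameters here; in the implementation these roles are played by Pegasus [38] and Mathematica.

```scala
sealed trait Formula
final case class And(a: Formula, b: Formula)          extends Formula
final case class Implies(a: Formula, b: Formula)      extends Formula
final case class Forall(xs: List[String], f: Formula) extends Formula

// reduce for a non-solvable ODE [{x'=f(x)}]P: strengthen a generated
// differential invariant D0 by a constant formula D1 so that D0 ∧ D1 entails P.
def reduceOde(post: Formula,
              changedVars: List[String],           // variables the ODE updates
              genDiffInv: Formula => Formula,      // e.g. Pegasus [38]
              qe: Formula => Formula): (Formula, Boolean) = {
  val d0 = genDiffInv(post)                        // stays true along the ODE
  val d1 = qe(Forall(changedVars, Implies(d0, post))) // constant, so trivially invariant
  (And(d0, d1), false)                             // sound, flagged as inexact
}
```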

3.8 The CESAR Algorithm

The CESAR algorithm for synthesizing control envelopes is summarized in Algorithm 1. It is expressed as a generator that yields a sequence of solutions with associated optimality guarantees. Possible guarantees include “sound” (no optimality guarantee, only soundness), “k-optimal” (sound and optimal w.r.t. all k-switching fallbacks with permanent actions), “\(\omega \)-optimal” (sound and optimal w.r.t. all finite fallbacks with permanent actions) and “optimal” (sound and equivalent to \(S^{{{\,\textrm{opt}\,}}}\)). Line 11 performs the optimality test described in Section 3.6. Finally, Line 10 performs an important soundness check for the cases where an approximation has been made along the way of computing \((I^{ n}, G^{ n})\). In such cases, I is not guaranteed to be a controllable invariant and thus Case (2) of Def. 1 must be checked explicitly.

When given a problem with solvable ODEs and provided with a complete QE implementation within reduce, CESAR is guaranteed to generate a solution in finite time with at least an “n-optimal” guarantee (n being the unrolling limit).

Algorithm 1. CESAR: Control Envelope Synthesis via Angelic Refinements
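Since the listing itself is not reproduced here, the following Scala sketch reconstructs Algorithm 1's control flow from this section's description. All names are illustrative assumptions; “Line 10” and “Line 11” above refer to the original listing, which the comments mirror.

```scala
// Reconstruction sketch of Algorithm 1 (assembled from Sections 3.4-3.8;
// not the paper's listing). Formulas are strings for brevity.
final case class Solution(I: String, G: Vector[String])
final case class Labeled(sol: Solution, guarantee: String)

def cesar(synthI0: () => (Solution, Boolean),          // one-shot refinement (3.4); exact?
          unrollStep: Solution => Solution,            // one application of Eq. (5) (3.5)
          equivalent: (Solution, Solution) => Boolean, // fixpoint test of Lemma 6
          alwaysSomeAction: Solution => Boolean,       // Def. 1 Case (2), cf. "Line 10"
          provablyOptimal: Solution => Boolean,        // dual-game test (3.6), cf. "Line 11"
          maxN: Int): Vector[Labeled] = {
  val out = Vector.newBuilder[Labeled]
  val (s0, exact) = synthI0()
  var s = s0
  for (n <- 0 to maxN) {
    // With approximations, Case (2) of Def. 1 must be checked explicitly.
    if (exact || alwaysSomeAction(s)) {
      if (provablyOptimal(s)) { out += Labeled(s, "optimal"); return out.result() }
      out += Labeled(s, if (exact) s"$n-optimal" else "sound")
    }
    val next = unrollStep(s)
    if (equivalent(next, s)) {            // Lemma 6: further unrolling cannot help
      if (exact) out += Labeled(s, "omega-optimal")
      return out.result()
    }
    s = next
  }
  out.result()
}
```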

4 Benchmarks and Evaluation

To evaluate our approach to the Control Envelope Synthesis problem, we curate a benchmark suite with diverse optimal control strategies. As Table 2 summarizes, some benchmarks have non-solvable dynamics, while others require a sequence of clever control actions to reach an optimal solution. Some have state-dependent fallbacks where the current state of the system determines which action is “safer”, and some are drawn from the literature. We highlight a couple of benchmarks here. See [21, Appendix D] for a discussion of the full suite and the synthesized results, and [20] for the benchmark files and evaluation scripts.

Power Station is an example where the optimal control strategy involves two switches, corresponding to two steps of unrolling. A power station can either produce power or dispense it to meet a quota, but never give out more than it has produced. Charging is the fallback action that is safe for all time after the station has dispensed enough power. However, to cover all controllable states, we need to switch at least two times, so that the power station has a chance to produce energy and then dispense it, before settling back on the safe fallback. Parachute is an example of a benchmark with non-solvable, hyperbolic dynamics. A person jumps off a plane and can make an irreversible choice to open their parachute. The objective is to stay within a maximum speed that is greater than the terminal velocity when the parachute is open.

We implement CESAR in Scala, using Mathematica for simplification and quantifier elimination, and evaluate it on the benchmarks. Simplification is an art [23, 25]. We implement additional simplifiers with the Egg library [45] and the SMT solver z3 [30]. Experiments were run on a 32 GB RAM M2 MacBook Pro machine. Reported CESAR execution times are averaged over 5 runs.

CESAR synthesis is automatic. The optimality tests were computed manually. Table 2 summarizes the result of running CESAR. Despite a variety of different control challenges, CESAR is able to synthesize safe, and in some cases also optimal, control envelopes within a few minutes. As an extra step of validation, synthesized solutions are checked by the hybrid systems theorem prover KeYmaera X [16]. All solutions are proved correct, with verification times as reported in the last column of Table 2.

Table 2. Summary of CESAR experimental results

5 Related Work

Hybrid controller synthesis has received significant attention [7, 26, 41], with popular approaches using temporal logic [5, 7, 46], games [31, 43], and CEGIS-like guidance from counterexamples [1, 10, 37, 39]. CESAR, however, solves the different problem of synthesizing control envelopes that strive to represent not one but all safe controllers of a system. Generating valid solutions is not an issue (a trivial solution always exists that has an empty controllable set). The real challenge is optimality, which imposes a higher-order constraint because it reasons about the relationship between possible valid solutions and cannot, e.g., fit the CEGIS quantifier alternation pattern \(\exists \forall \). So simply adapting existing controller synthesis techniques does not solve symbolic control envelope synthesis.

Safety shields computed by numerical methods [2, 13, 24] serve a similar function to our control envelopes and can handle dynamical systems that are hard to analyze symbolically. However, they scale poorly with dimensionality and do not provide rigorous formal guarantees due to the need to discretize continuous systems. Compared to our symbolic approach, they cannot handle unbounded state spaces (e.g. our infinite corridor) nor produce shields that are parametric in the model's parameters without hopelessly increasing dimensionality.

On the optimality side, a systematic but manual process was used to design a safe European Train Control System (ETCS) and justify it as optimal with respect to specific train criteria [34]. Our work provides the formal argument filling the gap between such case-specific criteria and end-to-end optimality. CESAR is more general and automatic.

6 Conclusion

This paper presents the CESAR algorithm for Control Envelope Synthesis via Angelic Refinements. It is the first approach to automatically synthesize symbolic control envelopes for hybrid systems. The synthesis problem and optimal solution are characterized in differential game logic. Through successive refinements, the optimal solution in game logic is translated into a controllable invariant and control conditions. The translation preserves safety. For the many cases where refinement additionally preserves optimality, an algorithm to test optimality of the result post translation is presented. The synthesis experiments on a benchmark suite of diverse control problems demonstrate CESAR's versatility. For future work, we plan to extend CESAR to additional control shapes and to exploit the synthesized safe control envelopes for reinforcement learning.