CESAR: Control Envelope Synthesis via Angelic Refinements

This paper presents an approach for synthesizing provably correct control envelopes for hybrid systems. Control envelopes characterize families of safe controllers and are used to monitor untrusted controllers at runtime. Our algorithm fills in the blanks of a hybrid systems sketch that specifies the desired shape of the control envelope, the possible control actions, and the system's differential equations. To maximize the flexibility of the control envelope, the synthesized conditions, which specify when each control action may be chosen, should be as permissive as possible while establishing a desired safety condition from the available assumptions, which are augmented if needed. An implicit, optimal solution to this synthesis problem is characterized using hybrid systems game theory, from which explicit solutions can be derived via symbolic execution and sound, systematic game refinements. Optimality can be recovered in the face of approximation via a dual game characterization. The resulting algorithm, Control Envelope Synthesis via Angelic Refinements (CESAR), is demonstrated on a range of safe control synthesis examples with different control challenges.


Introduction
Hybrid systems are important models of many applications, capturing their differential equations and control [26,40,3,32,4,27]. For overall system safety, the correctness of the control decisions in a hybrid system is crucial. Formal verification techniques can justify correctness properties. Such correct controllers have been identified in a sequence of challenging case studies [33,39,12,31,19,14,21]. A useful approach to verified control is to design and verify a safe control envelope around possible safe control actions. Safe control envelopes are nondeterministic programs whose every execution is safe. In contrast with controllers, control envelopes define entire families of controllers to allow control actions under as many circumstances as possible, as long as they maintain the safety of the hybrid system. Safe control envelopes allow the verification of abstractions of control systems, isolating the parts relevant to the safety feature of interest, without involving the full complexity of a specific control implementation. The full control system is then monitored for adherence to the safe control envelope at runtime [28]. The control envelope approach allows a single verification result to apply to multiple specialized control implementations, optimized for different objectives. It puts industrial controllers that are too complex to verify directly within the reach of verification, because a control envelope only needs to model the safety-critical aspects of the controller. Control envelopes also enable applications like justified speculative control [17], where machine-learning-based agents control safety-critical systems safeguarded within a verified control envelope, or [35], where these envelopes generate reward signals for reinforcement learning.
Control envelope design is challenging. Engineers are good at specifying the shape of a model and listing the possible control actions by translating client specifications, which is crucial for the fidelity of the resulting model. But identifying the exact control conditions required for safety in a model is a much harder problem that requires design insights and creativity, and is a central concern of control theory. Most initial system designs are incorrect and need to be fixed before verification succeeds. Fully rigorous justification of the safety of the control conditions requires full verification of the resulting controller in the hybrid systems model. We present a synthesis technique that addresses this hard problem by filling in the holes of a hybrid systems model to identify a correct-by-construction control envelope that is as permissive as possible.
Our approach is called Control Envelope Synthesis via Angelic Refinements (CESAR). The idea is to implicitly characterize the optimal safe control envelope via hybrid games yielding maximally permissive safe solutions in differential game logic [32]. To derive explicit solutions used for controller monitoring at runtime, we successively refine the games while preserving safety and, if possible, optimality. Our experiments demonstrate that CESAR solves hybrid systems synthesis challenges requiring different control insights.
Contributions. The primary contributions of this paper behind CESAR are:
- optimal hybrid systems control envelope synthesis via hybrid games,
- differential game logic formulas identifying optimal safe control envelopes,
- refinement techniques for safe control envelope approximation, including bounded fixpoint unrollings via a recurrence, which exploits action permanence (a hybrid analogue to idempotence), and
- a primal/dual game counterpart optimality criterion.

Background: Differential Game Logic
We use hybrid games written in differential game logic (dGL, [32]) to represent solutions to the synthesis problem. Hybrid games are two-player noncooperative zero-sum sequential games with no draws that are played on a hybrid system with differential equations. Players take turns and in their turn can choose to act arbitrarily within the game rules. At the end of the game, one player wins, the other one loses. The players are classically called Angel and Demon. Hybrid systems, in contrast, have no agents, only a nondeterministic controller running in a nondeterministic environment. The synthesis problem consists of filling in holes in a hybrid system. Thus, expressing solutions for hybrid system synthesis with hybrid games is one of the insights of this paper.
An example of a game is (v := 1 ∩ v := −1) ; {x′ = v}. In this game, first Demon chooses between setting velocity v to 1 or to −1. Then, Angel evolves position x as x′ = v for a duration of her choice. Differential game logic uses modalities to set win conditions for the players. For example, in the formula [(v := 1 ∩ v := −1) ; {x′ = v}] x ≠ 0, Demon wins the game when x ≠ 0 at the end of the game and Angel wins otherwise. The overall formula represents the set of states from which Demon can win the game, which is x ≠ 0, because when x < 0, Demon has the winning strategy to pick v := −1, so no matter how long Angel evolves x′ = v, x remains negative. Likewise, when x > 0, Demon can pick v := 1. However, when x = 0, Angel has a winning strategy: to evolve x′ = v for zero time, so that x remains zero regardless of Demon's choice.
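To make the resolution of this example game concrete, the following sketch simulates it numerically. The function names and the discrete "play" interface are our own illustration; the paper treats the game purely symbolically in dGL.

```python
# Illustrative simulation of the game (v := 1 ∩ v := -1); {x' = v} with win
# condition x != 0 for Demon. Names and interface are our own; dGL resolves
# this symbolically rather than by simulation.

def demon_strategy(x0: float) -> float:
    """Demon's winning strategy for x0 != 0: pick v = -1 when x < 0, else v = 1."""
    return -1.0 if x0 < 0 else 1.0

def play(x0: float, angel_duration: float) -> float:
    """Resolve one play: Demon sets v, then Angel evolves x' = v for her duration."""
    v = demon_strategy(x0)
    return x0 + v * angel_duration

# From x0 = -2, Demon wins against every duration Angel may pick: x stays negative.
assert all(play(-2.0, t) < 0 for t in [0.0, 0.5, 10.0, 1000.0])
# From x0 = 0, Angel wins by evolving for zero time: x remains 0 regardless of v.
assert play(0.0, 0.0) == 0.0
```

The simulation mirrors the argument in the text: Demon's strategy keeps the sign of x invariant, while at x = 0 Angel's zero-duration choice defeats any Demon move.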
We summarize dGL's program notation (Table 1); see [32] for a full exposition. Assignment x := θ instantly changes the value of variable x to the value of θ. Challenge ?ψ continues the game if ψ is satisfied in the current state; otherwise Angel loses immediately. In continuous evolution x′ = θ & ψ, Angel follows the differential equation x′ = θ for some duration of her choice, but loses immediately on violating ψ at any time. Sequential game α; β first plays α and, when it finishes, plays β. In games restricted to the structures listed above but without α^d, all choices are resolved by Angel alone with no adversary, and hybrid games coincide with hybrid systems in differential dynamic logic (dL) [32]. We will use this restriction to specify the synthesis question: the sketch that specifies the shape and safety properties of control envelopes. But to characterize the solution that fills in the blanks of the control envelope sketch, we use games where both Angel and Demon play. Notation we use includes demonic choice α ∩ β, which lets Demon choose whether to run α or β. Demonic repetition α^× lets Demon repeat α, choosing whether to stop or go at the end of every run. We write α^{*≤n} and α^{×≤n} for angelic and demonic repetitions, respectively, of at most n times.
In order to express properties about hybrid games, differential game logic formulas refer to the existence of winning strategies for objectives of the games (e.g., a controller has a winning strategy to achieve collision avoidance despite an adversarial environment). The set of dGL formulas is generated by the following grammar (where ∼ ∈ {<, ≤, =, ≥, >}, θ_1, θ_2 are arithmetic expressions in +, −, ·, / over the reals, x is a variable, and α is a hybrid game): Comparisons of arithmetic expressions, Boolean connectives, and quantifiers over the reals are as usual. The modal formula ⟨α⟩ ϕ expresses that player Angel has a winning strategy to reach a state satisfying ϕ in hybrid game α. Modal formula [α] ϕ expresses the same for Demon. The fragment without modalities is first-order real arithmetic. Its fragment without quantifiers is called propositional arithmetic P_R. Details on the semantics of dGL [32] are recalled in Appendix B, but we provide examples to give the intuition. [α ∪ β] ϕ expresses that Demon has a winning strategy to achieve ϕ when Angel chooses between α and β, while [α ∩ β] ϕ expresses that Demon has a winning strategy to achieve ϕ when Demon chooses whether to play game α or β. Correspondingly, ⟨α ∩ β⟩ ϕ expresses that Angel has a winning strategy to achieve ϕ when Demon has a choice between α and β. A formula ϕ is valid, written ⊨ ϕ, iff it is true in every state ω. States are functions assigning a real number to each variable. For instance, ϕ → [α] ψ is valid iff, from all initial states satisfying ϕ, Demon has a winning strategy in game α to achieve ψ. Our proofs are syntactic derivations in the dGL proof calculus, summarized in Appendix B with a standard first-order logic sequent calculus. A sequent Γ ⊢ ∆ with a finite list of antecedent formulas Γ and succedent formulas ∆ is short for (⋀_{ϕ∈Γ} ϕ) → (⋁_{ψ∈∆} ψ).
Control Safety Envelopes by Example. In order to separate safety-critical aspects from other system goals during control design, we abstractly describe the safe choices of a controller with safe control envelopes that deliberately underspecify when and how exactly to execute certain actions. They focus on describing in which regions it is safe to take actions. For example, Model 1 designs a train control envelope [33] that must stop the train by the end of the movement authority e located somewhere ahead, as assigned by the train network scheduler. Past e, there may be obstacles or other trains. The train's control choices are to accelerate or brake as it moves along the track. The goal of CESAR is to synthesize the framed formulas in the model, which are initially blank.

Model 1: The train ETCS model (slightly modified from [33]). Framed formulas are initially blank and are automatically synthesized by our tool as indicated.
Line 6 describes the safety property that is to be enforced at all times: the train driving at position p with velocity v must not go past position e. Line 1 lists modeling assumptions: the train is capable of both acceleration (A > 0) and deceleration (B > 0), the controller latency is positive (T > 0), and the train cannot move backwards as a result of braking (this last fact is also reflected by having v ≥ 0 as a domain constraint for the plant on Line 5). These assumptions are fundamentally about the physics of the problem being considered. In contrast, Line 2 features a controllability assumption that can be derived from careful analysis. Here, this synthesized assumption says that the train cannot start so close to e that it won't stop in time even if it starts braking immediately. Lines 3 and 4 describe a train controller with two actions: accelerating (a := A) and braking (a := −B). Each action is guarded by a synthesized formula, called an action guard, that indicates when it is safe to use. Angel has control over which action runs, and adversarially plays with the objective of violating the safety conditions. But Angel's options are limited to only safe ones because of the synthesized action guards, ensuring that Demon still wins and the overall formula is valid. In this case, braking is always safe, whereas acceleration can only be allowed when the distance to end position e is sufficiently large. Finally, the plant on Line 5 uses differential equations to describe the train's kinematics. A timer variable t is used to ensure that no two consecutive runs of the controller are separated by more than time T. Thus, this controller is time-triggered.
Overview of CESAR. CESAR first identifies the optimal solution for the blank of Line 2. Intuitively, this blank should identify a controllable invariant, which denotes a set of states where a controller with choice between acceleration and braking has some strategy (to be enforced by the conditions of Lines 3 and 4) that guarantees safe control forever. Such states can be characterized by the following dGL formula, where Demon, as a proxy for the controller, decides whether to accelerate or brake: [((a := A ∩ a := −B) ; plant)*] safe, where plant and safe are from Model 1. When this formula is true, Demon, who decides when to brake to maintain the safety contract, has a winning strategy that the controller can mimic. When it is false, Demon, a perfect player striving to maintain safety, has no winning strategy, so a controller has no guaranteed way to stay safe either.
This dGL formula provides an implicit characterization of the optimal controllable invariant, from which we derive an explicit formula in P_R to fill the blank with using symbolic execution. Symbolic execution solves a game following the axioms of dGL to produce an equivalent P_R formula (Section 3.7). However, our dGL formula contains a loop, for which symbolic execution will not terminate in finite time. To reason about the loop, we refine the game, modifying it so that it is easier to symbolically execute, but still at least as hard for Demon to win, so that the controllable invariant that it generates remains sound. In this example, the required game transformation first restricts Demon's options to braking. Then, it eliminates the loop using the observation that the repeated hybrid iterations (a := −B; plant)* behave the same as just following the continuous dynamics of braking for unbounded time. It replaces the original game with a := −B ; t := 0 followed by the braking dynamics with no time bound, whose symbolic execution yields e − p ≥ v²/(2B) to fill the blank of Line 2. Intuitively, this refinement (formalized in Section 3.4) captures situations where the controller stays safe forever by picking a single control action (braking). It generates the optimal solution for this example because braking forever is the dominant strategy: given any state, if braking forever does not keep the train safe, then certainly no other strategy will. However, there are other problems where the dominant control strategy requires the controller to strategically switch between actions, and this refinement misses some controllable invariant states. So we introduce a new refinement: bounded game unrolling via a recurrence (Section 3.5). A solution generated by unrolling n times captures states where the controller can stay safe by switching control actions up to n times.
Having synthesized the controllable invariant, CESAR fills in the action guards (Lines 3 and 4). An action should be permissible when running it for one iteration maintains the controllable invariant. For example, acceleration is safe to execute exactly when [a := A; plant] e − p > v²/(2B). We symbolically execute this game to synthesize the formula that fills the guard of Line 3.
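The symbolic-execution step for this guard can be sketched with sympy. This is our own illustration in place of the paper's dGL axioms and quantifier elimination; in particular, replacing "for all t in [0, T]" by the single instance t = T is a monotonicity argument we make by hand (position and velocity only increase while accelerating), not something the code checks.

```python
import sympy as sp

# Sketch of symbolically executing [a := A; plant] (e - p > v^2/(2B)) for
# Model 1. Variable names follow the model; the reduction to t = T rests on
# the hand-made observation that p(t) and v(t) are increasing under a := A.

p, v, e, t, A, B, T = sp.symbols('p v e t A B T', positive=True)

# Solution of the plant under a := A for time t:
p_t = p + v*t + A*t**2/2
v_t = v + A*t

# Invariant after running the plant: e - p_t >= v_t^2/(2B); binding case t = T.
guard = sp.simplify((e - p_t - v_t**2/(2*B)).subs(t, T))

# guard >= 0 unfolds to: e - p > v*T + A*T^2/2 + (v + A*T)^2/(2B)
expected = e - p - (v*T + A*T**2/2 + (v + A*T)**2/(2*B))
assert sp.simplify(guard - expected) == 0
```

The resulting inequality is exactly the shape of acceleration guard discussed for Model 1: the train may accelerate if, even after a full control cycle of acceleration, the braking-distance invariant still holds.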

Approach
This section formally introduces the Control Envelope Synthesis via Angelic Refinements (CESAR) approach for hybrid systems control envelope synthesis.

Problem Definition
We frame the problem of control envelope synthesis in terms of filling in holes in a problem of the following shape: Here, the control envelope consists of a nondeterministic choice between a finite number of guarded actions. Each action act_i is guarded by a condition (the i-th hole) to be determined in a way that ensures safety within a controllable invariant [6,18], which is also to be synthesized. The plant is defined by the following template: This ensures that the plant must yield to the controller after time T at most, where T is assumed to be positive and constant. In addition, we make the following assumptions:
1. Components assum, safe, and domain are propositional arithmetic formulas.
2. Timer variable t is fresh (it does not occur except where shown in the template).
3. Programs act_i are discrete dL programs that can involve choices, assignments, and tests with propositional arithmetic. Variables assigned by act_i must not appear in safe. In addition, act_i must terminate in the sense that ⊨ ⟨act_i⟩ true.
4. The modeling assumptions assum are invariant in the sense that ⊨ assum → [(∪_i act_i) ; plant] assum. This holds trivially for assumptions about constant parameters such as A > 0 in Model 1, and it ensures that the controller can always rely on them being true.
Definition 1. A solution to the synthesis problem above is defined as a pair (I, G), where I is a formula and G maps each action index i to a formula G_i. In addition, the following conditions must hold:
1. Safety is guaranteed: prob(I, G), the problem with its holes filled in by I and the guards G_i, is valid, and (assum ∧ I) is a loop invariant that proves it so.
2. There is always some action: ⊨ (assum ∧ I) → ⋁_i G_i.
Condition 2 is crucial for using the resulting nondeterministic control envelope, since it guarantees that safe actions are always available as a fallback.

An Optimal Solution
Solutions to a synthesis problem may differ in quality. Intuitively, a solution is better than another if it allows for a strictly larger controllable invariant. In case of equality, the solution with the more permissive control envelope wins. Formally, given two solutions S = (I, G) and S′ = (I′, G′), we say that S′ is better than or equal to S (written S ⊑ S′) if and only if ⊨ assum → (I → I′) and, additionally, either ⊨ assum → ¬(I′ → I) or ⊨ (assum ∧ I) → ⋀_i (G_i → G′_i). Given two solutions S and S′, one can define a combined solution S ⊓ S′ with controllable invariant I ∨ I′ and suitably combined guards that is better than or equal to both S and S′ (S ⊑ S ⊓ S′ and S′ ⊑ S ⊓ S′). A solution S′ is called the optimal solution when it is the maximum element in this ordering, so that S ⊑ S′ for any other solution S. The optimal solution exists and is expressible in dGL:

I_opt ≡ [((∩_i act_i) ; plant)*] safe    G_opt: i ↦ [act_i ; plant] I_opt    (3)

Intuitively, I_opt characterizes the set of all states from which an optimal controller (played here by Demon) can keep the system safe forever. In turn, G_opt is defined to allow any control action that is guaranteed to keep the system within I_opt until the next control cycle, as characterized by a modal formula. Section 3.3 formally establishes the correctness and optimality of S_opt ≡ (I_opt, G_opt). While it is theoretically reassuring that an optimal solution exists that is at least as good as all others and that this optimum can be characterized in dGL, such a solution is of limited practical usefulness, since Eq. (3) cannot be executed without solving a game at runtime. Rather, we are interested in explicit solutions where I and G are quantifier-free real arithmetic formulas. There is no guarantee in general that such solutions exist that are also optimal, but our goal is to devise an algorithm that finds them in the many cases where they exist, and finds safe approximations otherwise.

Controllable Invariants
The fact that S_opt is a solution can be characterized in logic with the notion of a controllable invariant: a formula that, at each of its points, admits some control action that keeps the plant in the invariant for one round. All lemmas and theorems throughout this paper are proved in Appendix B.

Definition 2 (Controllable Invariant). A controllable invariant is a formula I such that ⊨ (assum ∧ I) → safe and ⊨ (assum ∧ I) → ⋁_i [act_i ; plant] I.
From this perspective, I_opt can be seen as the largest controllable invariant.

Lemma 1. I_opt is a controllable invariant, and it is optimal in the sense that ⊨ I → I_opt for any controllable invariant I.
Moreover, not just I_opt but every controllable invariant induces a solution. Indeed, given a controllable invariant I, we can define G(I) ≡ (i ↦ [act_i ; plant] I) for the control guards induced by I. G(I) chooses as the guard for each action act_i the modal condition ensuring that act_i preserves I after the plant.
Conversely, a controllable invariant can be derived from any solution.
Solution comparisons w.r.t. ⊑ reduce to implications for controllable invariants. Taken together, these lemmas allow us to establish the optimality of S_opt.

Theorem 1. S_opt is an optimal solution (i.e., a maximum w.r.t. ⊑) of Def. 1.
This shows the roadmap for the rest of the paper: finding solutions to the control envelope synthesis problem reduces to finding controllable invariants that imply I_opt. These can be found by restricting the actions available to Demon in I_opt so as to guarantee safety, thereby refining the associated game.

One-Shot Fallback Refinement
The simplest refinement of I_opt is obtained by fixing a single fallback action to use in all states (if that is safe). A more general refinement considers different fallback actions in different states, but still only plays one such action forever.
Using the dGL axioms, any loop-free dGL formula whose ODEs admit solutions expressible in real arithmetic can be automatically reduced to an equivalent first-order arithmetic formula (in FOL_R). An equivalent propositional arithmetic formula in P_R can then be computed via quantifier elimination (QE). For example: Even when a formula features nonsolvable ODEs, techniques exist to compute weakest preconditions for differential equations, with conservative approximations [37] or even exactly in some cases [34,8]. In the rest of this section and for most of this paper, we therefore assume the existence of a reduce oracle that takes as input a loop-free dGL formula and returns a quantifier-free arithmetic formula that is equivalent modulo some assumptions. Section 3.7 shows how to implement and optimize reduce.
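The input/output contract of such an oracle can be illustrated with a deliberately tiny stand-in. The function below handles only one hand-picked loop-free pattern (constant-rate evolution with a time bound) and is our own construction; the paper's oracle instead applies dGL axioms and full quantifier elimination.

```python
import sympy as sp

# A toy "reduce" for one loop-free pattern, illustrating the contract of a
# reduction oracle: given a formula and assumptions, return a quantifier-free
# precondition R and a flag saying whether R is exact. This is far simpler
# than CESAR's oracle and exists only to show the shape of the interface.

def reduce_box_linear(x, c, T, bound):
    """Reduce 'x + c*t <= bound for all t in [0, T]' assuming c >= 0, T >= 0.
    For a nondecreasing trajectory the supremum is at t = T, so the
    quantifier over t can be eliminated exactly."""
    R = sp.Le(x + c*T, bound)
    return R, True

x, v, T, e = sp.symbols('x v T e', nonnegative=True)
R, exact = reduce_box_linear(x, v, T, e)
assert exact and R == sp.Le(x + v*T, e)
```

Under the stated sign assumptions the reduction is exact; a real oracle must also handle the inexact case, returning an implicant and flagging it as approximate.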

Definition 3 (Reduction Oracle). A reduction oracle is a function reduce that takes as input a loop-free dGL formula F and an assumption A ∈ P_R. It returns a formula R ∈ P_R along with a boolean flag exact such that ⊨ A → (R → F) always holds, and ⊨ A → (R ↔ F) holds whenever exact is true.

Back to our original problem: I_opt is not directly reducible, since it involves a loop. However, conservative approximations can be computed by restricting the set of strategies that the Demon player is allowed to use. One extreme case allows Demon to only use a single action act_i repeatedly as a fallback (e.g., braking in the train example). In this case, we get a controllable invariant [(act_i ; plant)*] safe, which further simplifies into [act_i ; plant_∞] safe with a variant plant_∞ of plant that never yields control. For this last step to be valid, though, a technical assumption is needed on act_i, which we call action permanence.
Definition 4 (Action Permanence). An action act_i is said to be permanent if and only if (act_i ; plant ; act_i) ≡ (act_i ; plant), i.e., they are equivalent games.
Intuitively, an action is permanent if executing it more than once in a row has no consequence for the system dynamics. This is true in the common case of actions that only assign constant values to control variables that are read but not modified by the plant, such as a := A and a := −B in Model 1.
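The intuition can be checked numerically for a := A in Model 1. The closed-form plant below is our own helper; the check confirms that appending a second a := A after the plant leaves the state unchanged, which is the content of (act_i ; plant ; act_i) ≡ (act_i ; plant) for this action.

```python
# Numeric illustration (ours) of action permanence for a := A in Model 1:
# re-running the constant assignment after the plant is a no-op, so
# (a := A ; plant ; a := A) and (a := A ; plant) end in the same state.

def plant(p, v, a, T):
    """Closed-form run of p' = v, v' = a for duration T; returns (p, v, a)."""
    return p + v*T + a*T**2/2, v + a*T, a

A, T = 2.0, 0.5
p0, v0 = 0.0, 1.0

s1 = plant(p0, v0, A, T)                 # a := A ; plant
s2 = (*plant(p0, v0, A, T)[:2], A)       # a := A ; plant ; a := A

assert s1 == s2                          # the trailing assignment changed nothing
```

By contrast, an action like a := a + 1 would not be permanent: its second execution moves the state further, so the loop-elimination step of this section would be unsound for it.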
Our discussion so far identifies the following approximation to our original synthesis problem, where P denotes the set of all indexes of permanent actions:

I_0 ≡ ⋁_{i∈P} [act_i ; plant_∞] safe

Here, I_0 encompasses all states from which the agent can guarantee safety indefinitely with a single permanent action. G_0 is constructed according to G(I_0) and only allows actions that are guaranteed to keep the agent within I_0 until the next control cycle. Note that I_0 degenerates to false in cases where there are no permanent actions, which does not make it less of a controllable invariant.
Theorem 2. I_0 is a controllable invariant.
Moreover, in many examples of interest, I_0 and I_opt are equivalent, since an optimal fallback strategy exists that only involves executing a single action. This is the case in particular for Model 1, where I_0 ≡ e − p ≥ v²/(2B) characterizes all states at safe braking distance from the obstacle, and G_0 associates with the acceleration action the guard e − p > vT + AT²/2 + (v + AT)²/(2B). That is, accelerating is allowed if doing so is guaranteed to maintain sufficient braking distance until the next control opportunity. Section 3.6 discusses automatic generation of a proof that (I_0, G_0) is an optimal solution for Model 1.
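As a sanity check on this solution, the following simulation (our own, with arbitrary parameter values; CESAR itself provides a proof rather than tests) runs a time-triggered controller that accelerates exactly when the synthesized guard holds and brakes otherwise, verifying that I_0 is maintained and the train never passes e.

```python
# Numeric sanity check of the synthesized envelope for Model 1: accelerate
# whenever the synthesized guard holds, brake otherwise. Parameters are
# arbitrary values satisfying assum; tolerances absorb float rounding.

A, B, T, e = 2.0, 3.0, 0.5, 100.0

def guard_accel(p, v):
    # Synthesized guard of Line 3: I_0 still holds after accelerating for time T.
    return e - p > v*T + A*T**2/2 + (v + A*T)**2 / (2*B)

def plant(p, v, a, dur):
    """Closed-form evolution of p' = v, v' = a; braking stops when v hits 0."""
    if a < 0:
        dur = min(dur, v / -a)           # domain constraint v >= 0
    return p + v*dur + a*dur**2/2, v + a*dur

def controllable(p, v):
    return e - p >= v**2 / (2*B) - 1e-9  # synthesized invariant I_0 (with tolerance)

p, v = 0.0, 0.0                           # initial state satisfies I_0
for _ in range(500):
    a = A if guard_accel(p, v) else -B
    p, v = plant(p, v, a, T)
    assert controllable(p, v) and p <= e
```

The run accelerates away from the origin, starts braking as the guard shuts off near e, and settles safely short of the movement authority, as the envelope's safety theorem predicts.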

Bounded Fallback Unrolling Refinement
In Section 3.4, we derived a solution by computing an underapproximation of I_opt where the fallback controller (played by Demon) is only allowed a one-shot strategy that picks a single action and plays it forever. Although this approximation is always safe and, in many cases of interest, happens to be exact, it does lead to a suboptimal solution in others. In this section, we allow the fallback controller to switch actions a bounded number of times before it plays one forever. There are still cases where doing so is suboptimal (imagine a car on a circular race track that is forced to maintain constant velocity). But this restriction is in line with the typical understanding of a fallback controller, whose mission is not to take over a system indefinitely but rather to maneuver it into a state where it can safely come to a full stop [31]. For every bound n ∈ N, we define a game where the fallback controller (played by Demon) takes at most n turns to reach the region I_0 in which safety is guaranteed indefinitely. During each turn, it picks a permanent action and chooses a time θ in advance for when it wishes to play its next move. Because the environment (played by Angel) has control over the duration of each control cycle, the fallback controller cannot expect to be woken up after time θ exactly. However, it can expect to be provided with an opportunity for its next move within the [θ, θ + T] time window, since the plant can never execute for time greater than T. Formally, we define I_n as follows: where plant_{θ+T} is the same as plant, except that the domain constraint t ≤ T is replaced by t ≤ θ + T. Equivalently, we can define I_n by induction as follows: where the base case coincides with the definition of I_0 in Section 3.4. Importantly, I_n is a loop-free controllable invariant, and so reduce can compute an explicit solution to the synthesis problem from it.
Theorem 3. I_n is a controllable invariant for all n ≥ 0.
Theorem 3 establishes a nontrivial result, since it overcomes the significant gap between the fantasized game that defines I_n and the real game being played by a time-triggered controller. Our proof critically relies on the action permanence assumption, along with a property of differential equations establishing that ODE programs preserve a specific form of reach-avoid property as a result of being deterministic.

Example. As an illustration, consider the example in Fig. 1 and Model 2 of a 2D robot moving in a corridor that forms an angle. The robot is only allowed to move left or down at a constant velocity and must not crash against a wall. Computing I_0 gives us the vertical section of the corridor, in which going down is a safe one-step fallback. Computing I_1 forces us to distinguish two cases. If the corridor is wider than the maximal distance travelled by the robot in a control cycle (V T > 2R), then the upper section of the corridor is controllable (with the exception of a dead end that we prove to be uncontrollable in Section 3.6). On the other hand, if the corridor is too narrow, then I_1 is equivalent to I_0. Moreover, computing I_2 gives a result that is equivalent to I_1. From this, we can conclude that I_1 is equivalent to I_n for all n ≥ 1. Intuitively, it is optimal with respect to any finite fallback strategy (restricted to permanent actions).
Model 2: Robot navigating a corridor, with framed solutions of holes.
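The unrolling recurrence and its stopping behavior can be made concrete on a discrete toy in the spirit of Model 2. The gridworld below, its geometry, and the one-action reachability routine are all our own invention (and the geometry is simplified, with no dead end); CESAR computes these sets symbolically as dGL formulas rather than by enumeration.

```python
# Discrete toy illustrating the bounded-unrolling recurrence: I_{n+1} collects
# states from which one permanent action, played for one fallback turn, stays
# safe until reaching I_n. Geometry and routines are invented for illustration.

ARM = {(x, 3) for x in range(5)}        # horizontal corridor section
SHAFT = {(0, y) for y in range(4)}      # vertical section, assumed open downward
CORRIDOR = ARM | SHAFT

def move(s, a):
    x, y = s
    return (x - 1, y) if a == "left" else (x, y - 1)

def I0(s):
    # One-shot fallback: repeating "down" forever is safe exactly in the shaft.
    return s in SHAFT

def unroll(I_prev):
    """I_{n+1}: states from which some single action, repeated, stays inside
    the corridor until reaching a state satisfying I_prev (one fallback turn)."""
    result = set()
    for s in CORRIDOR:
        for a in ("left", "down"):
            cur = s
            while cur in CORRIDOR:
                if I_prev(cur):
                    result.add(s)
                    break
                cur = move(cur, a)
    return result

I1 = unroll(I0)
I2 = unroll(lambda s: s in I1)
assert SHAFT < I1               # one switch strictly enlarges the invariant
assert I1 == I2 == CORRIDOR     # fixpoint reached: I_n = I_1 for all n >= 1
```

The final assertion mirrors the stopping criterion of this section: once unrolling stops growing the set, all further unrollings agree with it.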
The controllable invariant unrolling I_n has a natural stopping criterion.

Lemma 7. If I_n ↔ I_{n+1} is valid for some n ≥ 0, then I_n ↔ I_m is valid for all m ≥ n, and I_n ↔ I_ω is valid, where I_ω ≡ [step^× ; forever] safe.

Proving Optimality via the Dual Game
Suppose one found a controllable invariant I using techniques from the previous sections. To prove it optimal, one must show that ⊨ assum → (I_opt → I). By contraposition and [α] P ↔ ¬⟨α⟩ ¬P ([·]), this is equivalent to proving that

⊨ assum → (¬I → ⟨((∩_i act_i) ; plant)*⟩ ¬safe).    (6)

We define the largest uncontrollable region U_opt ≡ ¬I_opt as the right-hand side of implication (6) above. Intuitively, U_opt characterizes the set of all states from which the environment (played by Angel) has a winning strategy against the controller (played by Demon) for reaching an unsafe state. In order to prove the optimality of I, we compute a sequence of increasingly strong approximations U of U_opt such that U → U_opt is valid. We do so via an iterative process, in the spirit of how we approximate I_opt via bounded fallback unrolling (Section 3.5), although this time the process can be guided by the knowledge of I. If at any point we manage to prove that assum → (I ∨ U) is valid, then I is optimal.
One natural way to compute increasingly good approximations of U_opt is via loop unrolling. The idea is to improve the approximation U by adding states from which the environment can reach U by running the control loop once, formally ⟨(∩_i act_i) ; plant⟩ U. This unrolling principle can be useful. However, it only augments U with new states that can reach U in time at most T. So it cannot alone prove optimality in cases where violating safety from an unsafe state takes an unbounded amount of time.
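One such unrolling step can be illustrated numerically on Model 1. The discretization below is our own (CESAR performs this step symbolically): a state joins U if, for every action Demon may pick, the environment can choose some duration t ≤ T that steers the train past e. The last check also demonstrates the limitation just noted: unsafe states whose safety violation takes longer than T escape a single unrolling step.

```python
# Numeric illustration (our own discretization) of one loop-unrolling step for
# the uncontrollable region U of Model 1, with U initialized to the unsafe
# states p >= e. Durations are sampled on a grid.

A, B, T, e = 2.0, 3.0, 0.5, 100.0

def evolve(p, v, a, t):
    if a < 0:
        t = min(t, v / -a)              # domain v >= 0 stops the braking ODE
    return p + v*t + a*t**2/2, v + a*t

def one_step_uncontrollable(p, v, samples=100):
    """Every Demon action reaches p >= e for some environment-chosen t in [0, T]."""
    durations = [T * k / samples for k in range(samples + 1)]
    return all(any(evolve(p, v, a, t)[0] >= e for t in durations)
               for a in (A, -B))

# Right behind e with speed, even braking overshoots within one cycle:
assert one_step_uncontrollable(99.9, 2.0)
# Far from e, the train is safe for at least one cycle:
assert not one_step_uncontrollable(0.0, 0.0)
# Uncontrollable (v^2/(2B) = 150 > e - p), yet undetected by one unrolling
# step, because the violation takes much longer than T:
assert not one_step_uncontrollable(0.0, 30.0)
```

The third case is exactly why the dual-game argument of this section is needed on top of unrolling: it handles uncontrollable states whose doom unfolds over unboundedly many control cycles.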
For concreteness, let us prove the optimality of I_0 in the case of Model 1. In [33], essentially the following statement is proved when arguing for optimality: ⊨ assum ∧ ¬I_0 → ⟨(a := −B ; plant)*⟩ ¬safe. This is identical to our optimality criterion from Eq. (6), except that Demon's actions are restricted to braking. Intuitively, this restriction is sound since accelerating always makes things worse as far as safety is concerned: if the train cannot be saved by braking alone, adding the option to accelerate will not help. In this work, we propose a method for formalizing such arguments within dGL for arbitrary systems.
Our idea for doing so is to consider a system made of two separate copies of our model. One copy has all actions available, whereas the other is only allowed a single action (e.g., braking). Given a safety metric m (i.e., a term m such that ⊨ m ≤ 0 → ¬safe), we can then formalize within this joint system the idea that "action j is always better w.r.t. safety metric m".

Definition 5 (Uniform Action Optimality). Consider a finite number of discrete dL programs α_i and p ≡ {x′ = f(x) & Q}. Let V be the set of all variables written by p or some α_i. For any term θ and integer n, write θ^(n) for the term that results from θ by renaming every variable x ∈ V to a fresh tagged version x^(n). Using a similar notation for programs and formulas, define p^(1,2) ≡ {(x^(1))′ = f(x^(1)), (x^(2))′ = f(x^(2)) & Q^(1) ∧ Q^(2)}. We say that action j is uniformly optimal with respect to safety metric m if and only if ⊨ m^(1) ≥ m^(2) → [α_j^(1) ; (∪_i α_i^(2)) ; p^(1,2)] m^(1) ≥ m^(2). We write best_j((α_i)_i, p, m) to denote that action j is uniformly optimal with respect to m for actions α_i and dynamics p.
With such a concept in hand, we can formally establish that criterion Eq. (6) can be relaxed in the presence of uniformly optimal actions.
Theorem 4. Consider a finite number of discrete dL programs α_i such that ⊨ ⟨α_i⟩ true for all i, and p ≡ {x′ = f(x) & q ≥ 0}. Then, provided that best_j((α_i)_i, p, m) and best_j((α_i)_i, p, −q) hold (the latter ensuring that no other action stops earlier because of the domain constraint), we have ⊨ ⟨(α_j ; p)*⟩ m ≤ 0 → ⟨((∩_i α_i) ; p)*⟩ m ≤ 0.

A general heuristic for leveraging Theorem 4 to grow U automatically works as follows. First, it considers R ≡ assum ∧ ¬I ∧ ¬U, which characterizes states that are not yet known to be controllable or uncontrollable. Then, it picks a disjunct R_j of the disjunctive normal form of R and computes a forward invariant region V that intersects with it. Using V as an assumption to simplify ¬U may suggest metrics to be used with Theorem 4. For example, observing ⊨ V → (¬U → (θ_1 > 0 ∧ θ_2 > 0)) suggests picking the metric m ≡ min(θ_1, θ_2) and testing whether best_j(act, p, m) is true for some action j. If such a uniformly optimal action exists, then U can be updated as U ← U ∨ (V ∧ ⟨(act_j ; plant)*⟩ m ≤ 0). The solution I_1 for the corridor (Model 2) can be proved optimal automatically using this heuristic in combination with loop unrolling.

Implementing the Reduction Oracle
The CESAR algorithm assumes the existence of a reduction oracle that takes as input a loop-free dGL formula and attempts to compute an equivalent formula within the fragment of propositional arithmetic. When an exact solution cannot be found, an implicant is returned instead and flagged appropriately (Def. 3). This section discusses our implementation of such an oracle. As discussed in Section 3.4, exact solutions can be computed systematically when all ODEs are solvable, by first using the dGL axioms to eliminate modalities (see Appendix B) and then passing the result to a quantifier elimination algorithm for first-order arithmetic [9,41]. Although straightforward in theory, a naïve implementation of this idea hits two practical barriers. First, quantifier elimination is expensive and its cost increases rapidly with formula complexity [11,43]. Second, the output of existing QE implementations can be unnecessarily large and redundant. In iterated calls to the reduction oracle, these problems compound each other.
To alleviate these issues, our implementation performs eager simplification at intermediate stages of the computation, between some axiom-application and quantifier-elimination steps. This optimization significantly reduces the size of output solutions and allows CESAR to solve in 26 s a benchmark that would otherwise time out after 20 minutes. Appendix E further discusses the impact of eager simplification. Still, the doubly exponential complexity of quantifier elimination puts a limit on the complexity of problems that CESAR can currently tackle.
In the general case, when ODEs are not solvable, our reduction oracle is still often able to produce approximate solutions using differential invariants generated automatically by existing tools [37]. Differential invariants are formulas that stay true throughout the evolution of an ODE system. To see how they apply, consider computing reduce([{x′ = f(x)}] P, A), where P is the postcondition that must be true after executing the differential equation and A is the assumption holding initially. Suppose that formula D(x) is a differential invariant such that D(x) → P is valid. Then a precondition sufficient to ensure that P holds after evolution is A → D(x). For a concrete example, Appendix C shows how our reduction oracle computes the precondition for the dynamics of the parachute benchmark. It first uses the Pegasus tool [37] to identify a Darboux polynomial, suggesting an initial differential invariant D_0. Once we have D_0, the additional information required to conclude postcondition P is D_0 → P. To get an invariant formula that implies D_0 → P, we eliminate all the changing variables {x, v} in the formula ∀x ∀v (D_0 → P), resulting in a formula D_1. D_1 is a differential invariant since it features no variable that is updated by the ODEs. Our reduction oracle returns D_0 ∧ D_1, an invariant that entails postcondition P. More details on our implementation of reduce, and on how it deals with ODEs in particular, can be found in Appendix A.
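To make the Darboux step concrete, the following SymPy snippet checks the Darboux property for falling dynamics, assuming (as a one-dimensional simplification of the benchmark) the model v′ = g − r·v²: the Lie derivative of p = g − r·v² factors as q·p with a polynomial cofactor q, so p = 0 is invariant and the sign of p never changes along solutions.

```python
import sympy as sp

v, g, r = sp.symbols('v g r', positive=True)

# assumed one-dimensional fall model: v' = f(v) = g - r*v**2
f = g - r * v**2

# candidate Darboux polynomial: signed distance to terminal velocity, scaled
p = g - r * v**2

lie = sp.expand(sp.diff(p, v) * f)   # Lie derivative of p along v' = f(v)
q = sp.cancel(lie / p)               # polynomial cofactor q with lie = q*p

assert sp.simplify(lie - q * p) == 0  # Darboux property: p' = q*p along the flow
```

Here q = −2rv, so if p starts nonnegative (the diver is below terminal velocity), it remains nonnegative forever, which is exactly the kind of fact a tool like Pegasus reports as a differential invariant.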

The CESAR Algorithm
The CESAR algorithm for synthesizing control envelopes is summarized in Algorithm 1. It is expressed as a generator that yields a sequence of solutions with associated optimality guarantees. Possible guarantees include "sound" (no optimality guarantee, only soundness), "k-optimal" (sound and optimal w.r.t. all k-switching fallbacks with permanent actions), "ω-optimal" (sound and optimal w.r.t. all finite fallbacks with permanent actions) and "optimal" (sound and equivalent to S_opt). Line 11 performs the optimality test described in Section 3.6. Finally, Line 10 performs an important soundness check for the cases where an approximation has been made along the way of computing (I_n, G_n). In such cases, I is not guaranteed to be a controllable invariant and thus Case (2) of Def. 1 must be checked explicitly.
When given a problem with solvable ODEs and provided with a complete QE implementation within reduce, CESAR is guaranteed to generate a solution in finite time with at least an "n-optimal" guarantee, where n is the unrolling limit.
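The generator structure of Algorithm 1 can be mimicked on a toy abstraction, where sets of discrete states stand in for symbolic invariants and the reduce-based unrolling step is an arbitrary monotone function; guard synthesis and the soundness and optimality checks of Lines 10-11 are omitted from this sketch:

```python
from typing import Callable, Iterator, Tuple, TypeVar

Inv = TypeVar('Inv')

def cesar(I0: Inv, step: Callable[[Inv], Inv],
          equiv: Callable[[Inv, Inv], bool],
          unroll_limit: int) -> Iterator[Tuple[Inv, str]]:
    """Sketch of the generator skeleton of Algorithm 1 (simplified).
    `step` plays the role of one reduce-based unrolling I -> I or [step] I."""
    I = I0
    for n in range(unroll_limit):
        yield I, f"{n}-optimal"       # sound; optimal w.r.t. n-switching fallbacks
        I_next = step(I)
        if equiv(I, I_next):          # fixpoint reached: Lemma 7 upgrades the guarantee
            yield I, "omega-optimal"
            return
        I = I_next
    yield I, f"{unroll_limit}-optimal"
```

For instance, with sets of states and a predecessor-adding step, the generator emits "0-optimal", "1-optimal", and so on until the set saturates and an "omega-optimal" solution is reported.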

Benchmarks and Evaluation
To evaluate our approach to the Control Envelope Synthesis problem, we curate a benchmark suite with diverse optimal control strategies. As Table 2 summarizes, some benchmarks have non-solvable dynamics, while others require a sequence of clever control actions to reach an optimal solution. Some have state-dependent fallbacks, where the current state of the system determines which action is "safer", and some are drawn from the literature. We highlight a couple of benchmarks here. See Appendix D for a discussion of the full suite and the synthesized results, and [20] for the benchmark files and evaluation scripts.

Power Station is an example where the optimal control strategy involves two switches, corresponding to two steps of unrolling. A power station can either produce power or dispense it to meet a quota, but may never give out more than it has produced. Charging is the fallback action that is safe for all time after the station has dispensed enough power. However, to cover all controllable states, we need to switch at least two times, so that the power station has a chance to produce energy and then dispense it before settling back on the safe fallback.

Parachute is an example of a benchmark with non-solvable, hyperbolic dynamics. A person jumps off a plane and can make an irreversible choice to open their parachute. The objective is to stay within a maximum speed that is greater than the terminal velocity when the parachute is open.
We implement CESAR in Scala, using Mathematica for simplification and quantifier elimination, and evaluate it on the benchmarks. Simplification is an art [24,22]: we implement additional simplifiers with the Egg library [44] and the SMT solver Z3 [29]. Experiments were run on an M2 MacBook Pro with 32 GB of RAM. Reported CESAR execution times are averaged over 5 runs.
CESAR synthesis is automatic; the optimality tests were computed manually. Table 2 summarizes the results of running CESAR. Despite a variety of different control challenges, CESAR is able to synthesize safe, and in some cases also optimal, control envelopes within a few minutes. As an extra validation step, synthesized solutions are checked by the hybrid systems theorem prover KeYmaera X [16]. All solutions are proved correct, with verification times as reported in the last column of Table 2.

Related Work
Hybrid controller synthesis has received significant attention [25,40,7], with popular approaches using temporal logic [5,7,45], games [30,42], and CEGIS-like guidance from counterexamples [38,1,36,10]. CESAR, however, solves the different problem of synthesizing control envelopes that strive to represent not one but all safe controllers of a system. Generating valid solutions is not the issue (a trivial solution always exists that has an empty controllable set). The real challenge is optimality, which imposes a higher-order constraint because it reasons about the relationship between possible valid solutions and cannot, e.g., fit the CEGIS quantifier alternation pattern ∃∀. Simply adapting existing controller synthesis techniques therefore does not solve symbolic control envelope synthesis. Safety shields computed by numerical methods [2,13,23] serve a similar function to our control envelopes and can handle dynamical systems that are hard to analyze symbolically. However, they scale poorly with dimensionality and do not provide rigorous formal guarantees due to the need to discretize continuous systems. Compared to our symbolic approach, they cannot handle unbounded state spaces (e.g. our infinite corridor), nor can they produce shields that are parametric in the model's parameters without hopelessly increasing dimensionality.
On the optimality side, a systematic but manual process was used to design a safe European Train Control System (ETCS) and justify it as optimal with respect to specific train criteria [33]. Our work provides the formal argument filling the gap between such case-specific criteria and end-to-end optimality. CESAR is more general and automatic.

Conclusion
This paper presents the CESAR algorithm for Control Envelope Synthesis via Angelic Refinements. It is the first approach to automatically synthesize symbolic control envelopes for hybrid systems. The synthesis problem and its optimal solution are characterized in differential game logic. Through successive refinements, the optimal solution in game logic is translated into a controllable invariant and control conditions. The translation preserves safety. For the many cases where refinement additionally preserves optimality, an algorithm to test optimality of the result after translation is presented. Synthesis experiments on a benchmark suite of diverse control problems demonstrate CESAR's versatility. For future work, we plan to extend CESAR to additional control shapes, and to exploit the synthesized safe control envelopes for reinforcement learning.

A Reduce Operation
We define reduce after first introducing two helper functions that it requires. Function ▷(a, b) attempts to simplify the FOL_R formula a to P_R assuming that b holds. The second helper function, odereduce (Def. 6), isolates the action of reduce on differential equations. Since it is solely continuous programs that could lead to reduce failing to produce an exact solution, the exact bit of reduce depends on odereduce. Fig. 2 shows the definition of reduce, eliding the exact bit, which is simply true if all of the odereduce calls that reduce makes are exact, and false otherwise.

Definition 6 (ODE reduction oracle). Let A, Q, and P be formulas in quantifier-free real arithmetic. An ODE reduction oracle odereduce is a function such that

For solvable ODEs, odereduce is implemented as an exact oracle in Eq. (7).
In the general case, Pegasus [37], a tool that automatically generates ODE invariants, can often produce a formula satisfying the specification of odereduce. This may come at the cost of lost precision, possibly requiring reduce to set exact to false.

Theorem 5 (Correctness of reduce). For any loop-free dGL formula F and assumptions A ∈ P_R, the function odereduce either sets exact = true, in which case the formula A → (reduce(F, A) ↔ F) is valid, or else it sets exact = false, in which case the formula A → (reduce(F, A) → F) is valid.

B.1 Background
The dGL axioms and proof rules [32] used here are summarized in Fig. 3, noting that ϕ → ψ and ϕ ⊢ ψ have the same meaning. The semantics [32] is as follows.
Definition 7 (dGL semantics). The semantics of a dGL formula ϕ is the subset

Definition 8 (Semantics of hybrid games). The semantics of a hybrid game α is a function ς_α(•) that, for each set of Angel's winning states X ⊆ S, gives the winning region, i.e. the set of states ς_α(X) from which Angel has a winning strategy to achieve X in α (whatever strategy Demon chooses). It is defined inductively as follows. The winning region of Demon, i.e. the set of states δ_α(X) from which Demon has a winning strategy to achieve X in α (whatever strategy Angel chooses), is defined inductively as follows.

B.2 Lemmas and Theorems from the Main Text
Lemma 1. I_opt is a controllable invariant and it is optimal in the sense that ⊨ I → I_opt for any controllable invariant I.
Proof. Let us first prove that I_opt ≡ [((∩_i act_i) ; plant)*] safe is a controllable invariant. First, note that axiom [*] along with the definition of I_opt derives

Safety ⊢ I_opt → safe derives from Eq. (8) propositionally. Controllable invariance ⊢ I_opt → ∨_i [act_i ; plant] I_opt derives from Eq. (8):

This concludes the proof that I_opt is a controllable invariant. Let us now prove that I_opt is optimal. That is, let us consider a controllable invariant I and derive ⊨ I → I_opt. The two remaining premises are the two parts of the definition of I being a controllable invariant. This concludes the proof. ⊓⊔

Fig. 3: dGL axiomatization and derived axioms and rules.

Lemma 2. If I is a controllable invariant, then (I, G(I)) is a solution (Def. 1).
Proof. Let us assume that I is a controllable invariant and prove that (I, G) is a solution with G ≡ G(I). We first need to prove that assum ∧ I is an invariant for prob(I, G). Since assum is already assumed to be an invariant, it is enough to prove that I is an invariant itself. I holds initially since I → I is valid, and it implies safe by definition of a controllable invariant. Preservation holds by the definition of G, where the axioms [;], [∪], [?] are used to unpack and repack the games, leading to one conjunct for each action (treated separately via ∧R). We also need to prove that an action is always available, which holds by virtue of I being a controllable invariant. ⊓⊔

Lemma 3. If (I, G) is a solution, then I′ ≡ assum ∧ I is a controllable invariant and (I, G) ⊑ (I′, G(I′)).

Proof. Consider a solution (I, G). We prove that I′ ≡ assum ∧ I is a controllable invariant. Per Def. 1, I′ is an invariant for prob(I, G) and so ⊨ I′ → safe. Also, we have the following derivation, which repacks games via axioms [∪], [?], [;] using their equivalences, where the open premises are part of the definition of (I, G) being a solution according to Def. 1. Let us now prove that (I, G) ⊑ (I′, G(I′)). Trivially, we have ⊨ assum → (I → (assum ∧ I)). Let us now derive ⊨ assum ∧ I → ∧_i (G_i → G(I′)_i), where the remaining premise is part of the definition of (I, G) being a solution. This concludes the proof. ⊓⊔

Lemma 4. If I and I′ are controllable invariants, then (I, G(I)) ⊑ (I′, G(I′)) if and only if assum ⊨ I → I′.

Proof. Let us first assume that assum ⊨ I → I′ and prove that (I, G(I)) ⊑ (I′, G(I′)). To do so, we leverage the fact that assum is an invariant. The reverse direction follows trivially from the definition of ⊑. ⊓⊔

Theorem 1. S_opt is an optimal solution (i.e. a maximum w.r.t. ⊑) of Def. 1.
Proof. We have S_opt ≡ (I_opt, G(I_opt)). From Lemma 1 and Lemma 2, S_opt is a solution. Let (I, G) be another solution. From Lemma 3, there exists a controllable invariant I′ such that (I, G) ⊑ (I′, G(I′)). Then, from Lemma 4 and from the optimality of I_opt (Lemma 1), we have (I′, G(I′)) ⊑ (I_opt, G(I_opt)). By transitivity, (I, G) ⊑ S_opt. This concludes the proof. ⊓⊔
Proof. We first prove that (act_i ; plant)^n ≡ (act_i ; plant^n) by induction on n ≥ 1. The base case is trivial. Regarding the induction case, we have From this, we get (act_i ; plant)* ≡ ?true ∪ (act_i ; plant*) from the semantics of loops in dGL. Thus, we have ⊨ [(act_i ; plant)*] safe ↔ safe ∧ [act_i ; plant^∞] safe since t does not appear free in safe. From this, we prove our theorem by noting that safe ∧ [act_i ; plant^∞] safe ↔ [act_i ; plant^∞] safe since act_i cannot write any variable that appears in safe. ⊓⊔
Proof. Trivially, we have ⊨ I_0 → safe. More interestingly, let us prove that I_0 → ∨_i [α_i] I_0 where α_i ≡ (act_i ; plant). The proof crucially leverages the permanence assumption via the identity ⊓⊔

Proof. We proceed by induction on n. The base case is covered by Theorem 2.
Assume that I_n is a controllable invariant and let us prove that I_{n+1} is one also. Abbreviate α_i ≡ (act_i ; plant). Without loss of generality, assume that all actions are permanent, since non-permanent actions play no role in computing I_n. The hard part is in proving that

The first premise is a consequence of (I_n, G_n) being a solution (our induction hypothesis) and the second one is a trivial consequence of the definition of I_{n+1}. We can now focus on proving the last premise.
To do so, it is useful to introduce the following predicate: Intuitively, R(a, b) is true if following the dynamics leads to reaching I_n within the time interval [a, b] while being safe the whole time. Using this predicate, we can reformulate [step] I_n. In addition, Lemma 6 gives us the following key property of R: We can now complete the proof using Eq. (9), where we abbreviate Γ ≡ θ ≥ 0, [act_i] R(θ, θ + T). In the case where t ≥ θ after a control cycle, the agent has reached I_n. In the case where t ≤ θ after a control cycle, the agent must perform the same action again with a reduced timeout. This concludes the proof. ⊓⊔

Lemma 6. Consider a property of the form R(a, b). Then this formula is valid:

Proof. This follows from the semantics of dL, since all involved differential equations are the same and t ≤ c ≤ b is the duration that passes during α_c, thus explaining the offset of −t on the time-interval arguments of R(a, b). ⊓⊔
Lemma 7. If I_n ↔ I_{n+1} is valid for some n ≥ 0, then I_n ↔ I_m is valid for all m ≥ n, and I_n ↔ I_ω is valid, where I_ω ≡ [step^× ; forever] safe.
Proof. The first part is simply a case of a recursive sequence I_{n+1} ≡ F(I_n) reaching a fixpoint (with F(I) ≡ I ∨ [step] I). Let us then prove the ⊨ I_n ↔ I_ω equivalence, or rather its nontrivial direction ⊨ I_ω → I_n. From ⊨ I_n ↔ I_{n+1}, we get ⊨ I_n ↔ I_n ∨ [step] I_n and so ⊨ [step] I_n → I_n. In addition, by the monotonicity of (I_n)_n, we have ⊨ I_0 → I_n. The rest follows from the FP^× rule. ⊓⊔

Theorem 4. Consider a finite number of discrete dL programs α_i such that ⊨ ⟨α_i⟩ true for all i and p ≡ {x′ = f(x) & q ≥ 0}. Then, provided that best_j((α_i)_i, p, m) and best_j((α_i)_i, p, −q) (no other action stops earlier because of the domain constraint), we have:

Table 3: Benchmark listing.

Benchmark: Description
ETCS Train: Kernel of the European Train Control System case study [33].
Sled: Swerve to avoid a wall.
Intersection: Car must either cross an intersection before the light turns red, or stop before the intersection. It may never stop at the intersection.
Curvebot: A Dubins car must avoid an obstacle.
Parachute: Use the irreversible choice to open a parachute with some air resistance to stay at a safe velocity. Drawn from [15].
Corridor: Navigate a corridor with a side passage and dead end.
Power Station: Choose between producing and distributing power while dealing with resistance loss and trying to meet a quota.
Coolant: A coolant system in an energy plant must maintain sufficient heat absorption while meeting coolant discharge limits.

D.1 ETCS Train
The European Train Control System has been systematically but manually modeled and proved safe in the literature [33]. We consider the central model of this study and apply CESAR to automate its design. It is listed as the running example, Model 1.

D.2 Sled
Model 3: Sled must swerve to avoid an obstacle.
Where I, G1 and G2 are to be synthesized.
This benchmark displays CESAR's ability to reason about state-dependent fallbacks. The slope of a hill pushes a sled forward along the y axis with constant speed v_y. However, there is a wall blocking the way: it starts at y-axis position T_y and extends along the x axis from −T_x to T_x. The sled must swerve to avoid the obstacle. It can either go left (with velocity −V) or right (with velocity V). Which action is best depends on where the sled already is. Swerving left is a safe strategy when the sled can pass from the −x side, mathematically T_y > T_x + x + y. Likewise, swerving right is a safe strategy when T_y + x > T_x + y.
Neither action alone gives the optimal invariant, but CESAR's I_0 characterization correctly captures the disjunction to find it. In synthesizing the guards for the actions, CESAR also identifies when it is still safe to switch strategies.
CESAR finds the solution below. The algebraic formulas presented are synthesized by CESAR automatically; the annotations describing their meaning are added manually for the convenience of the reader.
The guard for going left is
G_1 ≡ (despite going left one time period, can still pass from right)
The guard for going right is
There are some redundancies in these expressions, but they are correct, comprehensible, and can be simplified further.
Where I, G1 and G2 are to be synthesized.
The parachute benchmark presents the challenge of dynamics whose solution departs from the decidable fragment of real arithmetic. A person is free-falling. At most once, they are allowed to take the action of opening a parachute. Once they do, their air resistance changes to p. The objective is to land at a speed no greater than m. The benchmark is inspired by one that appears in the literature (the running example in [15]). CESAR uses Pegasus's Darboux polynomial generation to solve the problem.
CESAR finds the solution below. The algebraic formulas presented are synthesized by CESAR automatically; the annotations describing their meaning are added manually for the convenience of the reader.

I ≡
(start below terminal velocity, and terminal velocity without the parachute is already safe)
g ≥ rv² ∧ m > (gr⁻¹)^{1/2}
∨ (terminal velocity with the parachute is safe, and start below it)
m > (gp⁻¹)^{1/2} ∧ pv² ≤ g

The guard for not opening the parachute is that terminal velocity without the parachute is already safe, and the person has not yet exceeded terminal velocity.
Likewise, the guard for opening the parachute is that terminal velocity with the parachute is safe.
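As a plain numeric reading of the synthesized invariant (the parameter values below are made up for illustration; this check is not part of the synthesis pipeline):

```python
from math import sqrt

def parachute_invariant(v: float, g: float, r: float, p: float, m: float) -> bool:
    """Evaluate the synthesized invariant I on concrete numbers.
    r and p are the drag coefficients without/with the parachute,
    m is the maximum safe landing speed."""
    # first disjunct: below terminal velocity, and terminal velocity
    # without the parachute, sqrt(g/r), is already safe
    no_chute = g >= r * v**2 and m > sqrt(g / r)
    # second disjunct: terminal velocity with the parachute, sqrt(g/p),
    # is safe, and the current speed is below it
    with_chute = m > sqrt(g / p) and p * v**2 <= g
    return no_chute or with_chute
```

For instance, with a strict speed limit the first disjunct fails (the bare terminal velocity is too fast), but a slow diver still satisfies the second disjunct, so opening the parachute remains a safe fallback.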
Where I, G1 and G2 are to be synthesized.
The intersection benchmark is a simple example of a system with free choice and state-dependent fallback. A car sees a yellow light and must decide whether to coast past the intersection or to stop before it. It may never stop at the intersection, which is located at x = 0. Whether it would be safe to stop or to coast depends on the car's current position and velocity.
CESAR finds the solution below. The algebraic formulas presented are synthesized by CESAR automatically; the annotations describing their meaning are added manually for the convenience of the reader.

I ≡
(safe if already past the intersection)
x > 0
∨ (otherwise, before the intersection, safe if velocity is already 0 or the car could coast past the intersection)
∨ v > 0 ∧ timeToRed > 0 ∧ (
(signal flips before the car stops; even braking, the car crosses the intersection before the signal flips)
∨ (stop somewhere that is not the intersection; the signal flips after the car stops)
)

The guard for coasting has many repeated clauses, so we first explain them before presenting the expression. Assuming v positive and x ≤ 0:
C ≡ v³ + 2Bv(Tv + x) < 0 means that after one time period of coasting, the car still stops before the intersection.
D_1 ≡ 0 = 3·timeToRed + 2v⁻¹x means that the signal flips when the car is 2/3 of the way along coasting to the intersection.
D_2 ≡ v(3·timeToRed·v + 2x) > 0 means that the signal flips after the car is 2/3 of the way along coasting to the intersection.
D_3 ≡ v(3·timeToRed·v + 2x) < 0 means that the signal flips before the car is 2/3 of the way along coasting to the intersection.
E ≡ B = timeToRed⁻¹·v means that the signal flips exactly when the car halts if it starts braking now.
F ≡ 0 = timeToRed + v⁻¹x means that the signal flips exactly when the car reaches the intersection by coasting.
G ≡ 2B + v²·(timeToRed·v + x)⁻¹ = 0 means that the car will be at the intersection when the signal flips if it starts braking now.
H_1 ≡ v(timeToRed·v + x) < 0 means that if the car coasts, it will be before the intersection when the signal flips.
H_2 ≡ v(timeToRed·v + x) > 0 means that if the car coasts, it will be after the intersection when the signal flips.
Likewise, the guard for braking has many repeated clauses.
P ≡ 2B + v²/x ≤ 0 means that the car won't stop before the intersection.
Q ≡ timeToRed > (BT² + 2x) / (2(BT − v)) means that even after one braking cycle, the car can cross the intersection by coasting.
R ≡ x(v² + 2Bx) > 0 means that the car will stop before the intersection.
S ≡ B·timeToRed + (v² + 2Bx)^{1/2} > v means that the car will stop before the intersection, before the signal turns red.
U ≡ 2(Tv + x) < BT² means that the car will stop before time T.
V ≡ BT² = 2(Tv + x) means that, if it were to brake, the car would come to a stop at time T.
W ≡ BT² < 2(Tv + x) means that, if it were to brake, the car would come to a stop after time T.
Because of structural similarities with G_1, we do not provide a full annotation.

Curvebot models a Dubins car that must avoid an obstacle at (0, 0). The dynamics result in a solution that is not in the decidable fragment of arithmetic, so CESAR again uses Pegasus to find a controllable invariant.
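The clause R from the braking guard above admits a simple kinematic reading that can be checked numerically (a sanity check with made-up values, not part of CESAR):

```python
def stops_before_intersection(x: float, v: float, B: float) -> bool:
    """Kinematic reading of clause R = x*(v^2 + 2*B*x) > 0 for a car at
    position x < 0 approaching the intersection at 0 with speed v > 0:
    braking at rate B covers distance v^2/(2*B) before stopping, so the
    car stops short of the intersection iff that distance is below |x|."""
    return v**2 / (2 * B) < -x

# for x < 0 this is equivalent to the sign condition x*(v**2 + 2*B*x) > 0
```

This is exactly the classic stopping-distance argument; the synthesized polynomial form avoids the division so that it stays within polynomial real arithmetic.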
Our implementation generates the optimal invariant, which consists of everywhere except the origin. The algebraic formulas presented are synthesized by CESAR automatically; the annotations describing their meaning are added manually for the convenience of the reader.
The guard for setting om to 1 is simply that the origin does not lie on the resulting circular path.
This solution is almost optimal. It only misses the cases for G_1 and G_2 where, despite the obstacle lying on the circular path, T is small enough that there is time to switch paths before collision.

D.6 Corridor
Corridor, shown in Model 2, is an example of a system requiring unrolling.
Power Station is an example of a system that needs two steps of unrolling in order to reach the optimal invariant. A power station capable of producing 7000 kW can choose between charging (Line 3) and distributing stored power at a current of 100 A and a voltage of 2000 V (Line 4). Its objective is to meet an energy quota of 3000 J, excluding loss due to a resistance of 5 Ω, by the time that timer gt counts down to 0. The station must never reach a state where it has no stored power left. The system is modeled in Model 7.

The zero-shot invariant corresponds to the case where the station has already met its quota and is now charging. Discharging is not included in the invariant because, regardless of how high stored energy is, in the unbounded time of the zero-shot invariant a station that has chosen to distribute power will eventually run out of it, thus violating the condition stored > 0. The one-shot invariant catches the case where the station first chooses to discharge until it meets its quota, and then flips to charging for infinite time. The two-shot invariant, which is finally optimal, catches the case where the station first chooses to charge until it has enough energy stored to meet its quota, then discharges, and finally switches back to charging mode for infinite time.

CESAR finds the solution below. The algebraic formulas presented are synthesized by CESAR automatically; the annotations describing their meaning are added manually for the convenience of the reader. The guard for charging checks that the choice to charge for a cycle still leaves enough time for distributing the power.

The guard for drawing no water just needs to check that there is still enough time to absorb enough heat. The guard for drawing water checks that drawing water for one cycle will not exceed the discharge limit.

Finally, even on benchmark examples that would terminate without the use of eager simplification, enabling this optimization often results in shorter solutions. Table 4 shows the percentage reduction in size of solutions due to simplification, where size is measured in number of characters. A 0% reduction means no change and a 50% reduction means cutting formula size in half.

Fig. 1: Robot navigating a corridor (Model 2). A 2D robot must navigate safely within a corridor with a dead end, without crashing against a wall. The corridor extends infinitely at the bottom and on the right. The robot can choose between going left and going down with a constant speed V. The left diagram shows I_0 in gray. The right diagram shows I_1 under the additional assumption VT < 2R (I_1 and I_0 are otherwise equivalent). A darker shade of gray is used for regions of I_1 where only one of the two available actions is safe according to G_1.

Fig. 2: Definition of reduce (exact elided). The notation P{e/x} indicates P with unbound occurrences of x replaced by expression e. odereduce isolates the effect of reduce on ODEs. ▷ simplifies and quantifier-eliminates P assuming A.
(at the left boundary)
0 = T_x + x
∨ (far enough right to stay right after this cycle)
T_x + y < T_y + x ∧ T_x > x
∨ (far enough left to stay left after this cycle)
T_x + TV + x < 0 ∧ T_x + T_y + x ≤ y
∨ (right at the right boundary, but there is still time to swerve)
T_y > y ∧ T_x ≤ x

D.7 Power Station

Model 7: Power Station benchmark.

G_1 ≡
(enough time to charge, then distribute)
gt > 1 ∧ 3000(gt·50 − 51) + produced > 0
∨ (quota already met)
gt·35 < 36 ∧ produced > 3000

The guard for distributing basically checks that the choice to discharge for a cycle still leaves enough stored energy.

Table 1: Hybrid game operators for two-player hybrid systems.

Choice α∪β lets Angel choose whether to play α or β. For repetition α*, Angel repeats α some number of times, choosing to continue or terminate after each round. The dual game α^d switches the roles of the players. For example, in the game (?ψ)^d, Demon passes the challenge if the current state satisfies ψ, and otherwise loses immediately. Demonic choice α∩β, defined as (α^d ∪ β^d)^d, gives the choice between α and β to Demon. Demonic repetition α^×, defined as ((α^d)*)^d, gives control of the repetition to Demon. Sequential composition α;β first plays α and, when α terminates without a player having lost, continues with β.
which is loop-free and easily symbolically executed. Symbolically executing this game to reach the safety condition safe yields the controllable invariant e − p > v²

Table 2: Summary of CESAR experimental results.

Table 4: Simplification impact on solution size.