Dynamic Hierarchical Reactive Controller Synthesis

In the formal approach to reactive controller synthesis, a symbolic controller for a possibly hybrid system is obtained by algorithmically computing a winning strategy in a two-player game. Such game-solving algorithms scale poorly as the size of the game graph increases. However, in many applications, the game graph has a natural hierarchical structure. In this paper, we propose a modeling formalism and a synthesis algorithm that exploits this hierarchical structure for more scalable synthesis. We define local games on hierarchical graphs as a modeling formalism which decomposes a large-scale reactive synthesis problem in two dimensions. First, the construction of a hierarchical game graph introduces abstraction layers, where each layer is again a two-player game graph. Second, every such layer is decomposed into multiple local game graphs, each corresponding to a node in the higher level game graph. While local games have the potential to reduce the state space for controller synthesis, they lead to more complex synthesis problems where strategies computed for one local game can impose additional requirements on lower-level local games. Our second contribution is a procedure to construct a dynamic controller for local game graphs over hierarchies. The controller computes assume-admissible winning strategies that satisfy local specifications in the presence of environment assumptions, and dynamically updates specifications and strategies due to interactions between games at different abstraction layers at each step of the play. We show that our synthesis procedure is sound: the controller constructs a play which satisfies all local specifications. We illustrate our results through an example controlling an autonomous robot in a known, multistory building.


I. INTRODUCTION
Algorithmic reactive synthesis has recently emerged as a robust methodology to design correct-by-construction controllers for specifications given in temporal logics (see, e.g., Girard and Pappas, 2009; Tabuada, 2009; Kloetzer and Belta, 2008; Wolff et al., 2013; Wong et al., 2013). In this technique, one solves a two-player discrete-time game on a graph between the system and the environment players, where the winning condition is specified in linear-time temporal logic. The game graph is usually obtained as a discrete abstraction of the underlying, possibly continuous or hybrid, dynamics. A winning strategy for the system player in such a game can be computed by algorithmic techniques from reactive synthesis (Zielonka, 1998; Emerson and Jutla, 1991). Such a system winning strategy gives a discrete controller, which can usually be refined to a continuous controller using primitives from continuous control. This controller synthesis methodology has been implemented in symbolic tools (Wongpiromsarn et al., 2011; Mazo et al., 2010; Finucane et al., 2010) and has been successfully applied in a number of case studies, e.g., by Wong et al. (2013) and Wongpiromsarn et al. (2010).
The two major concerns in the application of reactive synthesis to large problems are (i) the poor scalability of the symbolic game solving algorithms with increasing size of the game graph, and (ii) the limited existence of winning strategies against adversarial environment players in realistic settings. In this paper, we address these challenges by extending the scope of reactive synthesis for control by (i) introducing local game graphs over hierarchies as a new decomposed model, (ii) formalizing hierarchical reactive games over such models, and (iii) proposing a sound reactive controller synthesis algorithm for such games. This algorithm allows for dynamic specification changes and uses the construction of assume-admissible winning strategies (Brenguier et al., 2015) to explicitly model and use environment assumptions.
a) Local Game Graphs over Hierarchies: The modeling formalism introduced in this paper allows us to exploit the intrinsic hierarchy and locality of a given large-scale system. This decomposes the controller synthesis problem into multiple small ones. Here, hierarchy means that the game graph allows for the introduction of abstract layers. Locality means that a state at a higher layer naturally corresponds to a sub-arena of the game graph at the next lower layer which is independent of all the other games at the same layer.
As an example, consider an autonomous robot traversing the floors of a building. The lowest layer of the game graph, the game under consideration in existing reactive synthesis techniques, would consist of states defined by grids giving the location and velocity of the robot in each room and each floor of the building, together with additional predicates, such as the location of obstacles, whether the robot is carrying something, or the open-closed status of each door. However, there is a natural hierarchy of abstractions: at the highest layer, we care only about the floors and may ask the robot to move from one floor to another; in the next layer, we would like to know the specific room it is in and specify which room to go to next; and only within the context of a room may we care about where exactly the robot is and where it has to go next. To model this hierarchy, we introduce a set of layers on top of a game graph, each being a game graph itself, where a state at a higher layer (e.g., a room) corresponds to a sub-arena of the game graph at the next lower layer (i.e., all states located inside this room), modeling locality within the hierarchy.
A.-K. Schmuck and Rupak Majumdar are with the Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern, Germany. {akschmuck,rupak}@mpi-sws.org
Such hierarchical and local decompositions are also heuristically applied in robotics. Examples are general modeling frameworks, such as hierarchical task-networks (HTN) (Erol et al., 1995) or Object-Action Complexes (OAC) (Kruger et al., 2009), or particular software architectures for incorporating long term tasks and short time motion planning for robots (Kaelbling and Lozano-Perez, 2011; Srivastava et al., 2014; Stock et al., 2015). One could view our abstraction layers, their interaction, and the system dynamics as an equivalent formalism to model task networks. Our controller synthesis algorithms should also apply to design controllers in these formalisms. To the best of our knowledge, the problem of correct-by-construction synthesis for temporal logic specifications (beyond reachability) in the presence of environment assumptions has not been considered by these other formalisms.
Hierarchical approaches for control exist for other correct-by-construction controller synthesis techniques in the control community, such as supervisory control (e.g., Schmidt et al., 2008), hybrid control (e.g., Raisch and Moor, 2005), or continuous control (e.g., Pappas et al., 2000), but these usually cannot handle temporal logic specifications.
In many large-scale projects using reactive controller synthesis, such as autonomous vehicles (Hess et al., 2014; Wongpiromsarn et al., 2012) and autonomous flight control (Koo and Sastry, 2002), similar hierarchical and local decompositions are implicitly and informally performed. However, these works offer no clear theoretical model connecting "low-layer" reactive control and "higher layer" task planning; our approach provides such a model.
b) Hierarchical Reactive Games: To effectively use the constructed hierarchies of local game graphs for reactive controller synthesis, we assume that the specification is also decomposed into a set of local requirements, each restricted to one sub-arena of a particular layer, together with one "global" game at the highest layer. While such a decomposition is not guaranteed to exist for a given specification, it usually arises naturally for specifications over large-scale systems with intrinsic hierarchy and locality. For example, for the robot, one may consider the specifications: (i) a floor-layer task "visit all floors", (ii) a room-layer task "visit all rooms" for each floor, and (iii) a low-layer task "if there is an empty bottle [in the current room], reach it and pick it up" for every room.
Synthesizing winning strategies for local games over hierarchies w.r.t. such sets of local specifications becomes challenging due to the interplay between layers both in a bottom-up and a top-down manner. The top-down interplay results because applying a strategy in a higher layer introduces additional specifications for the lower layer. For example, a requested move from one room to an adjacent one requires the local game in this room to fulfill a reachability specification in addition to its local specification. The bottom-up interplay results from the fact that moves in the lowest layer game correspond to moves in all higher layers, which might change the strategy. For example, consider a room with two doors to two different adjacent rooms. The higher layer strategy may initially pick one door to continue. However, if this door gets closed before it is reached in the lower layer game, the higher layer strategy might ask to reach the second door instead. Thus, in each local game, winning objectives are generated dynamically, based on the strategy at a higher layer, the local specification for the local game, and the current system and environment state in the lowest layer.
Intuitively, such interactive hierarchical games are similar to pushdown and modular games (Walukiewicz, 1996; Alur et al., 2003; De Crescenzo and La Torre, 2013), where the local state and the stack determine which (single) local game is played at a particular time point. In contrast, we always play one local game in every layer simultaneously, where visited states in different layers are projections of one another. Therefore, a move in one layer has to be correlated with the games at all other layers at all time steps, giving the dynamic interaction described above.
Our work also relates naturally to abstraction and refinement techniques in game solving (e.g., Cousot and Cousot, 1977; Henzinger et al., 2000; Abadi and Lamport, 1991), which relate "concrete" game structures to "abstract" ones with more abstract timing, to solve a single game for a global specification using different abstraction layers. In comparison, we propose a hierarchical structure where every system state is refined to a whole new local sub-game, having its own specification. Therefore, the game in the higher layer proceeds by only one step once the lower layer local sub-game is completed. In this sense we are "stitching" together solutions of local games in the lowest layer in a particular way which is determined by higher level games, to obtain a solution to the global game.
c) Dynamical Controller Synthesis: Given the hierarchical reactive games described above, we propose a reactive controller synthesis algorithm to solve such games, which allows for dynamic specification changes at each step of the play. Intuitively, the controller solves the dynamically constructed local games online and "stitches" their solutions together following the rules of the hierarchical game. Notice that a strategy computed at a level imposes additional conditions on games at lower levels; thus, we use a dynamic controller synthesis algorithm that updates the strategies as the game progresses.
In principle, any algorithm which calculates a winning strategy for a two-player game can be used as a building block to solve local games (e.g., Zielonka, 1998; Emerson and Jutla, 1991; Kupferman and Vardi, 2001; Ehlers and Finkbeiner, 2011; Kupferman and Weiner, 2012). However, these algorithms calculate winning strategies against any environment behavior. In most applications, such as our robot example, the requirement that the system wins against any environment strategy is too strong. For instance, in the robot example it is possible, but very unlikely, that an employee keeps an office door closed forever to prevent the robot from fulfilling its task. Therefore, assumptions on the environment behavior, which model its "likely" behaviors, have recently been considered to constrain the synthesis problem (see Bloem et al. (2014) and Brenguier et al. (2015) for a detailed overview of recent results). Intuitively, the constrained synthesis problem then asks if the system can win provided that the environment only behaves according to its assumptions. One type of strategy solving this problem is the assume-admissible winning strategy of Brenguier et al. (2015). As this is the most expressive available technique to deal with environment assumptions known to the authors, we use their synthesis algorithm as a building block in our algorithm.
We prove that, whenever the environment meets its assumptions and all dynamically generated local games have a solution, our dynamical synthesis algorithm generates a winning hierarchical play for a given specification, i.e., the algorithm is sound. If these assumptions do not hold, we show that the play gets stuck but does not violate the specification up to this point.
The dynamic nature of our controller is also similar to the receding horizon strategies proposed by Wongpiromsarn et al. (2012) and Vasile and Belta (2014), which translate long term goals into current local reachability specifications. This approach allows for a particular two-layer hierarchy and uses time horizons to decompose the synthesis problem locally. However, the general intrinsic hierarchical and local decomposability of a synthesis problem and the interaction of multiple abstract games are not formally exploited. In our presentation, our control synthesis algorithm solves local games completely; however, we can also use a receding horizon controller for each local game.
This paper was motivated by a systems project to build an end-to-end autonomous robotic telepresence system. For the scale of this model, existing reactive synthesis techniques would not work. However, the overall problem has a natural decomposition captured by our proposed model. While this paper focuses on the theoretical foundations of such a formal model and its reactive controller synthesis, we will discuss the implementation and systems aspects of our technique in a different paper.

II. PRELIMINARIES
In this section we first introduce notation and recall existing results from reactive synthesis. Then we discuss a detailed example to motivate our work.
A. Reactive Synthesis Revisited
d) Notation: For a set $W$, we denote by $W^*$, $W^+$, and $W^\omega$ the sets of finite sequences, non-empty finite sequences, and infinite sequences, respectively, over $W$. We write $W^\infty = W^* \cup W^\omega$. For $w \in W^*$, we write $|w|$ for the length of $w$; the length of $w \in W^\omega$ is $\infty$. We define $\mathrm{dom}(w) = \{0, \ldots, |w| - 1\}$ if $w \in W^*$, and $\mathrm{dom}(w) = \mathbb{N}$ if $w \in W^\omega$. We denote by $\mathrm{dom}^+(w) = \mathrm{dom}(w) \setminus \{0\}$ the positive domain of $w$. For $k \in \mathrm{dom}(w)$ we write $w(k)$ for the $k$th symbol of $w$, $\lceil w \rceil = w(|w| - 1)$ for the last symbol of $w$, and $w|_{[0,k]}$ for the restriction of $w$ to the domain $[0, k]$. Furthermore, $w \cdot w'$ for $w \in W^*$ and $w' \in W^\infty$ denotes the concatenation of two strings. The prefix relation on strings is defined by $w \sqsubseteq w'$ if $\exists w'' \in W^* .\ w \cdot w'' = w'$. Given a set of strings $\phi \subseteq W^\infty$, we denote by $\overline{\phi} = \phi \cup \{w \in W^* \mid \exists w' \in \phi .\ w \sqsubseteq w'\}$ the set of strings in $\phi$ and all their finite prefixes. Slightly abusing notation, we denote by $\overline{w}$ the set $\overline{\{w\}}$ of all prefixes of the string $w \in W^\infty$.
e) Two-Player Games: A two-player game graph $G = (X, Y, \delta, \rho)$ between environment and system consists of a set of environment states $X$, a set of system states $Y$, an environment transition map $\delta : X \times Y \to 2^X$, and a system transition map $\rho : X \times Y \to 2^Y$. We assume $G$ is serial, i.e., $\delta$ and $\rho$ map each input to non-empty sets.
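The notation above can be made concrete for finite strings and finite game graphs. The following Python fragment is only a minimal sketch: the names `is_prefix`, `prefix_closure`, and `GameGraph` are hypothetical, strings are modeled as tuples, infinite strings are not represented, and the transition maps are stored as dictionaries.

```python
from itertools import product


def is_prefix(w, v):
    """w ⊑ v: v equals w followed by some finite (possibly empty) suffix."""
    return len(w) <= len(v) and tuple(v[:len(w)]) == tuple(w)


def prefix_closure(phi):
    """The strings in phi together with all of their finite prefixes."""
    closed = set()
    for w in phi:
        w = tuple(w)
        for k in range(len(w) + 1):
            closed.add(w[:k])
    return closed


class GameGraph:
    """Finite two-player game graph G = (X, Y, delta, rho)."""

    def __init__(self, X, Y, delta, rho):
        self.X, self.Y = frozenset(X), frozenset(Y)
        self.delta = delta  # dict: (x, y) -> set of successor environment states
        self.rho = rho      # dict: (x, y) -> set of successor system states

    def is_serial(self):
        """True if delta and rho map every pair (x, y) to a non-empty set."""
        return all(self.delta.get((x, y)) and self.rho.get((x, y))
                   for x, y in product(self.X, self.Y))
```

Representing the maps as dictionaries keeps the seriality check a direct transcription of the assumption that each input is mapped to a non-empty set.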
A play π is finite if |π| < ∞ and infinite otherwise. The set of all plays is denoted by G. We model a winning condition in a two-player game as a set of plays ϕ ⊆ G. This set can be represented in different ways, e.g., by an LTL formula or by an ω-automaton. While our results do not assume a particular representation, the latter will determine the algorithm needed to solve the two-player game.
Given a game graph $G$, a set of initial strings $I = (X \times Y)^+ \subseteq G$ and a winning condition $\phi \subseteq G$, the tuple $(G, I, \phi)$ is called a game on $G$ w.r.t. $I$ and $\phi$. A play $\pi \in G$ is winning (resp. possibly winning) for $(G, I, \phi)$ if there exists an $n \in \mathrm{dom}(\pi)$ s.t. $\pi|_{[0,n]} \in I$ and $\pi \in \phi$ (resp. $\pi \in \overline{\phi}$). We denote the sets of all winning and possibly winning plays for $(G, I, \phi)$ by $\mathrm{WinningPlays}(G, I, \phi)$ and $\overline{\mathrm{WinningPlays}}(G, I, \phi)$, respectively.
f) Strategies: A system strategy is a partial function $f : (X \times Y)^+ \times X \rightharpoonup Y$ such that $f(w, x) \in \rho(x, \lceil w \rceil_2)$ for all $(w, x) \in \mathrm{dom}(f)$, where $\lceil w \rceil_2$ is the system-state component of the last symbol of $w$. An environment strategy is a left-total function $g : (X \times Y)^+ \to X$ such that $g(w) \in \delta(\lceil w \rceil)$ for all $w \in (X \times Y)^+$. We denote the sets of system and environment strategies over $G$ by $S_s(G)$ and $S_e(G)$, respectively. A play $\pi \in G$ with $\pi(k) = (x(k), y(k))$ for all $k \in \mathbb{N}$ is compliant with $f \in S_s(G)$, $g \in S_e(G)$ and $I = (X \times Y)^+ \subseteq G$ if there is an $n \in \mathrm{dom}(\pi)$ such that $\pi|_{[0,n]} \in I$ and for all $k \in \mathrm{dom}(\pi)$, $k > n$, we have $x(k) = g(\pi|_{[0,k-1]})$ and $y(k) = f(\pi|_{[0,k-1]}, x(k))$. (2)
Fig. 1. Floor plan of the 5th and 6th floor of a six-story building. Using the depicted coordinates, we denote by $q^k_{ij}$ and $r^k_{ij}$, respectively, the cell and the room in the $i$th column and $j$th row of floor $k$. Furthermore, $s_{ij}$, $i < j$, denotes the staircase from floor $f_i$ to floor $f_j$. The workspace of this building is partitioned into grid cells (bottom), rooms (middle) and floors (top), which serve as abstraction layers $l = 0$ to $l = 2$ as discussed in Sec. II-B. The line of dots depicts a path of the robot from the initial state (light gray) to the final state (dark gray) in every layer. Filled circles denote projected states while non-filled circles denote abstract (but not projected) states, as discussed in Expl. 2-3.
The set of plays compliant with $f$, $g$ and $I$ is denoted by $\mathrm{CompliantPlays}(f, g, I)$ and we define $\mathrm{CompliantPlays}(f, I) := \bigcup_{g \in S_e(G)} \mathrm{CompliantPlays}(f, g, I)$. (3) The set of winning strategies for $(G, I, \phi)$ against $g \in S_e(G)$ is denoted by $\mathrm{WinningStrategies}(G, I, \phi, g)$ and we define $\mathrm{WinningStrategies}(G, I, \phi) = \bigcap_{g \in S_e(G)} \mathrm{WinningStrategies}(G, I, \phi, g)$.
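To make compliance concrete, the following Python sketch extends an initial string round by round using a system strategy and an environment strategy. It is an illustration under stated assumptions: the move order (environment first, then system) is inferred from the strategy signatures above, the function name `extend_play` and the toy door example are hypothetical, and a partial system strategy is modeled by returning `None` where it is undefined.

```python
def extend_play(prefix, f, g, steps):
    """Extend an initial string by up to `steps` rounds compliant with f and g.

    Assumed round structure: the environment picks x(k) = g(history),
    then the system answers with y(k) = f(history, x(k)).
    """
    play = list(prefix)
    for _ in range(steps):
        history = tuple(play)
        x = g(history)
        y = f(history, x)
        if y is None:  # f is partial: stop where it is undefined
            break
        play.append((x, y))
    return play


# Toy example: the environment toggles a door, the system enters the
# room whenever the door is open and waits in the hall otherwise.
def g_toggle(history):
    return "open" if history[-1][0] == "closed" else "closed"


def f_enter(history, x):
    return "room" if x == "open" else "hall"


play = extend_play([("closed", "hall")], f_enter, g_toggle, 3)
```

The sketch does not check that the chosen moves respect the transition maps of a game graph; it only shows how a pair of strategies and an initial string determine a play.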
A system strategy which is not dominated is called admissible. The set of admissible strategies in the game (G, I, ϕ) is denoted by AdmissibleStrategies(G, I, ϕ).
g) The Synthesis Problem: The (unconstrained) synthesis problem takes as input a game (G, I, ϕ) and asks if there is a winning system strategy for the game. In most applications, the requirement that the system wins against any adversarial environment strategy is too stringent. The constrained synthesis problem additionally takes as input an assumption that models "likely" behaviors of the environment as a set of plays ζ ⊆ G. Intuitively, the constrained synthesis problem asks if the system can win provided that the environment player is restricted to play strategies that ensure ζ. In the presence of environment assumptions, the synthesis problem looks for assume-admissible winning strategies for the system (see Brenguier et al. (2015) for a discussion of why this is an appropriate notion).
By swapping the roles of system and environment we can equivalently define winning and admissible strategies for the environment in the game (G, I, ζ) as before. Then a system strategy f is assume-admissibly winning for (G, I, ϕ) w.r.t. ζ (Brenguier et al. (2015) It should be noted that every winning strategy is assume-admissibly winning w.r.t. any assumption, but not vice versa.

B. Example
To illustrate the theoretical results and their accompanying assumptions in this paper, we consider a robot that moves in a six-story building with known floor plan, depicted in Fig. 1 (bottom) for floors 5 and 6.
To model this problem as a two-player game graph $G$, we partition the workspace into small cells which form a uniform grid. The resulting grid cells are enumerated by an index set $Q$. By assuming that the robot can only be in one grid cell at a time, the system state set is given by $Y = Q$. We furthermore define the set of environment states by $X = 2^Q$, where a state $x \in X$ is a set containing all grid cells which are currently occupied by an obstacle.
This modeling formalism implies that each grid cell in Fig. 1 (bottom) represents a system state. We model additional properties by adding other binary variables. For example, by adding a predicate Bottle to the system state, we model whether the robot is carrying a bottle or not. As this additional variable might be true in any grid cell, the resulting system state set would consist of two copies of the grid world in Fig. 1 (bottom), where one is annotated with Bottle and the other one is not. To keep notation simple, such additional predicates are mostly neglected in this example.
The system transition map ρ in G results from applying an appropriate abstraction method for continuous dynamics, e.g., Tabuada (2009), while adding the obvious restrictions that (i) the robot cannot move into an obstacle-occupied cell, and (ii) the robot can only move to adjacent cells that are not separated by a wall. For the environment transition map δ, several levels of detail can be used to model the movement and (dis)appearance of obstacles; see, e.g., Wong et al. (2013) and Vasile and Belta (2014) for examples.
Now consider a task for the robot which asks it to reach a specific room on a specific floor. This corresponds to a reachability winning condition. In our setting, the winning condition is captured by the language of all plays π such that there exists k ≥ 0 with π(k) = (x(k), y(k)) and y(k) is a cell in the specified room. (It can easily be described in linear temporal logic as well.) The synthesis problem for this specification over the game graph G finds a strategy (a controller for the robot) that ensures that the robot eventually reaches the room.
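The two restrictions on ρ and the reachability condition can be sketched on a small grid. The Python fragment below is a simplified stand-in, not the abstraction procedure referred to above: it assumes 4-neighbour moves in place of abstracted continuous dynamics, allows the robot to stay in place so that the move set is never empty, and uses hypothetical names (`neighbors`, `rho`, `reaches`) and a hypothetical 3×3 grid.

```python
def neighbors(cell):
    """The four grid-adjacent cells of `cell` (assumed adjacency notion)."""
    i, j = cell
    return {(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)}


def rho(x, y, cells, walls):
    """System moves from cell y when x is the set of obstacle-occupied cells.

    cells: set of all grid cells; walls: set of frozenset({c1, c2}) pairs of
    adjacent cells separated by a wall. Staying in place is always allowed
    (an assumption made here to keep the map serial).
    """
    moves = {y}
    for c in neighbors(y) & cells:
        if c not in x and frozenset({y, c}) not in walls:
            moves.add(c)
    return moves


def reaches(play, goal_cells):
    """Reachability on a finite play prefix: some y(k) lies in goal_cells."""
    return any(y in goal_cells for _, y in play)


cells = {(i, j) for i in range(3) for j in range(3)}
walls = {frozenset({(1, 1), (1, 2)})}
x = frozenset({(2, 1)})                # one obstacle
moves = rho(x, (1, 1), cells, walls)   # blocked towards (2, 1) and (1, 2)
```

On a finite prefix, `reaches` can only confirm that the condition has already been met; it cannot refute reachability for the infinite play.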
There are two challenges in applying reactive synthesis in this scenario. First, the requirement that the robot must reach the room against all possible environments is too stringent. In such a robot motion example the environment player naturally has a very rich set of possible moves. For the specification considered above, the environment can simply keep a couple of doors closed forever to prevent the robot from reaching its goal. However, this adversarial behavior is very unlikely in a real world application as, e.g., employees in an office building will always eventually enter or leave their office. This is the reason why we introduce environment assumptions that constrain the problem. A natural environment assumption under which the above specification can be realized states that all staircases are always eventually unblocked, all doors are always eventually re-opened, and moving obstacles always eventually allow a passage to exit a room.
As discussed in Brenguier et al. (2014), one cannot simply perform reactive synthesis w.r.t. environment assumptions by considering the implication ζ ⇒ ϕ that requires the controller to ensure ϕ holds only on plays satisfying ζ. This is because the robot may win the game by simply violating the environment assumption (for example, by blocking a door and preventing the environment from opening it). Thus, we consider assume-admissible strategies in this paper.
The second challenge is that of scalability. In any realistic model of our problem, the number of states is so large that existing reactive synthesis tools do not scale. Our main contribution in this paper is to scale up reactive synthesis techniques by considering local structure. We now consider this in more detail.
As depicted in Fig. 1, there is a natural hierarchy on the states of the workspace imposed by rooms and floors. That is, the workspace can also be partitioned using the set of rooms $R$ or the set of floors $F$ as index sets. This partition introduces two abstraction layers with decreasing precision with system state sets $Y^1 = R$ and $Y^2 = F$. The sets of environment states in layers 1 and 2 are defined as the set of closed doors $X^1 = 2^D$ and the set of blocked staircases $X^2 = 2^S$, respectively. Even though the three layers in Fig. 1 are constructed separately, there is a natural abstraction relation between system states $f \in F$, $r \in R$, and $q \in Q$. A system state $q$ is obviously related to the system state $r$ if the grid cell $q$ is "inside" room $r$. Furthermore, a door $d$ is marked as closed if all cells intersecting with this door are occupied by an obstacle (usually being the door itself in this case), inducing a relation between environment states of layers 0 and 1. In Section III, we present abstract game graphs (AGGs) which capture such hierarchies in reactive games.
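The two abstraction relations just described can be sketched as plain lookups. The fragment below is a hedged illustration only: the function names `alpha_s` and `alpha_e`, the dictionaries `room_of` and `door_cells`, and the tiny three-cell layout are all hypothetical and not taken from Fig. 1.

```python
def alpha_s(cell, room_of):
    """Layer-1 system state: the room that contains the given grid cell."""
    return room_of[cell]


def alpha_e(x, y, door_cells):
    """Layer-1 environment state: the set of closed doors.

    x: set of obstacle-occupied cells; y: current cell (unused in this
    sketch, kept because the environment abstraction takes a pair).
    A door counts as closed if every cell it intersects is occupied.
    """
    return frozenset(door for door, cells in door_cells.items() if cells <= x)


room_of = {(0, 0): "r11", (1, 0): "r11", (2, 0): "r12"}
door_cells = {
    "d1": frozenset({(1, 0)}),
    "d2": frozenset({(2, 0), (2, 1)}),
}
x = frozenset({(1, 0), (2, 0)})
closed = alpha_e(x, (0, 0), door_cells)  # only d1 is fully blocked
```

Door `d2` stays open in this example because only one of its two cells is occupied.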
The abstraction relations naturally decompose every layer in the example into small, local game graphs located "inside" a higher level system state: the game graph $G$ is decomposed into local game graphs $G_r$, $r \in R$. This is possible for this example as the set of possible moves in one room is independent of the part of the environment state that does not belong to this context, e.g., all the obstacles contained in the set $x$ that are not located inside this room. In Section IV, we introduce local game graphs (LGGs) which decompose AGGs to model this locality within the hierarchy.
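A rough count illustrates why such a decomposition helps. The sketch below assumes the layer-0 model of this example (system states Y = Q, environment states X = 2^Q) and a hypothetical building with four rooms of four cells each; it ignores additional predicates and the higher-layer game graphs, so the numbers are indicative only.

```python
def num_states(n_cells):
    """|X x Y| for Y = Q and X = 2^Q on a region with n_cells grid cells."""
    return n_cells * 2 ** n_cells


rooms = [4, 4, 4, 4]  # hypothetical: four rooms with four cells each

# One monolithic layer-0 game over all 16 cells.
global_states = num_states(sum(rooms))

# One local game per room, each over that room's cells only.
local_states = sum(num_states(n) for n in rooms)
```

Under these assumptions the monolithic game has 1,048,576 state pairs, while the four local games together have 256.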
To exploit this local structure in reactive synthesis, we additionally require that the specification is also given as a set of local specifications, one for each local game; otherwise, there is no obvious way to automatically break a global specification into local synthesis problems. For example, for the reachability task, one can consider a specification of reaching a room at the higher layer, and reaching from one point of a room to a prescribed exit point in the lower layer. Correspondingly, notice that the environment assumptions can also be decomposed into layers.
As a second example, consider the more complex task: "Collect all empty bottles in the building and return them to the kitchen on the 5th floor." This task can be manually decomposed in a natural fashion as follows. The level 2 task asks the robot to visit all floors of the building and to return to floor 5 whenever its capacity to carry empty bottles is reached. While on a floor, the level 1 task asks the robot to visit all rooms until the carrying capacity is reached, and to visit the kitchen whenever the latter is true and the robot is on floor 5. Finally, the level 0 tasks ask the robot to search for empty bottles in a single room, approach each bottle and pick it up. In this paper we assume that both the system specification and the environment assumptions are already given in a decomposed manner. The automatic decomposition of a global winning condition into local ones is an orthogonal and difficult problem.
In Section IV-B, we define hierarchical reactive games (HRGs) by combining the set of LGGs over hierarchies with a set of local winning conditions and a set of local environment assumptions. This generates a set of local games over an LGG w.r.t. a local specification ϕ and a local assumption ζ.
The main challenge for reactive synthesis for HRGs is that the games played at the various layers interact. That is, a strategy at a higher layer ("go to the kitchen") introduces additional constraints at the lower layer ("the higher level strategy requires that the robot should go to the exit that takes it to the kitchen"). In Section V, we provide a synthesis algorithm that computes a dynamic controller for HRGs. The controller computes assume-admissible strategies for each local game, and dynamically updates the winning conditions and strategies through the hierarchy. We prove that the algorithm is sound and that it aborts the game only when a local subgame cannot be won by the system against admissible strategies of the environment.

III. HIERARCHICAL DECOMPOSITION
We now introduce a hierarchy of L two-player game graphs where the higher layers are a more abstract representation of the original game graph at layer l = 0.

A. Layering, Abstract Plays, and Timescales
Notice that while the system abstraction function maps system states at level $l-1$ to system states at level $l$, the environment abstraction function $\alpha^l_e$ maps a pair $(x, y)$ of environment and system states at level $l-1$ into an environment state at level $l$. This allows us to incorporate the loss of direct control with increasing abstraction level, as illustrated in the following example.
Example 1: Consider the robot in Sec. II-B and assume that the system states of layer 0 are extended by the binary variable Bottle, resulting in the state $\{q, \mathit{Bottle}\}$ if the robot is in cell $q$ and carries a bottle and the state $\{q\}$ if the latter is not true. In this example, a transition from state $\{q\}$ to $\{q, \mathit{Bottle}\}$ is enforceable in layer 0 if there is a bottle in cell $q$ (which can be modeled by a corresponding environment variable), assuming that the robot can always pick up a bottle when it is in this cell. Now assume that the specification at the room level asks the robot to go to the kitchen if it is carrying a bottle. To realize this task, a strategy in layer 1 does not need to force the robot to pick up a bottle in a particular room (because it might not actually know in which rooms bottles are located) but only needs to observe that this has happened. This intuition can only be modeled if Bottle is included in the environment states rather than the system states of layer 1. To be able to trigger this environment variable in layer 1 when the robot picks up a bottle, the tuple $(x, \{q, \mathit{Bottle}\}) \in X^0 \times Y^0$ must be projected to an environment state $\{\mathit{Bottle}\} \cup x' \in X^1$ using the map $\alpha^1_e$. ⊳
For notational convenience, we define the composition of abstraction functions and the special cases $x = \alpha^{0\uparrow}_e(x, y)$ and $y = \alpha^{0\uparrow}_s(y)$. A layering induces an abstraction for a play $\pi \in G$ for each layer $l > 0$ as follows. Given a game $G$, a play $\pi \in G$, and layers $\{X^l, Y^l\}_{l=0}^{L}$ with abstraction functions $\alpha^l_e$ and $\alpha^l_s$, we define the set of abstract plays and $\pi^l(0) = (\alpha^{l\uparrow}_e(x(0), y(0)), \alpha^{l\uparrow}_s(y(0)))$. Intuitively, the abstract plays in $\Pi$ are an abstraction of the play $\pi$ which becomes coarser the higher the layer, as multiple system and environment states are clustered into one state in a higher level. Specifically, this implies that state changes occur less frequently in a higher level than in the play $\pi$, as outlined in the following example.
Example 2: Consider the path of the robot depicted by filled circles in Fig. 1 (bottom). This path represents the system state component $y$ of a play $\pi \in G$. Applying the second line of (6), this sequence $y$ can be abstracted to layer $l = 1$ and $l = 2$ as follows.
$y = q^5_{22}\, q^5_{23}\, q^5_{33}\, q^5_{43}\, q^5_{53}\, q^5_{54}\, q^5_{55}\, q^5_{56} \ldots$
$y^1 = r^5 \ldots$
$y^2 = f_5\, f_5\, f_5\, f_5\, f_5\, f_5\, f_5\, f_5 \ldots$
The abstract sequences $y^1$ and $y^2$ are depicted in Fig. 1 (middle) and (top), respectively. The state changes in levels 1 and 2 correspond to changes in rooms and floors, respectively. While the state at level 0 changes in each time step, observe that state transitions in layers 1 and 2 only happen irregularly and not at every time point. It should be noted that environment states in layers 1 and 2, i.e., the set of closed doors and blocked stairs, can change independently of system state changes and are not illustrated in Fig. 1. ⊳
Expl. 2 illustrates that an abstract play $\pi^l$ is usually not turn-based. To obtain a turn-based game and to remove redundant information, we introduce a new time scale for every layer which is triggered by changes in the system states in an abstract play $\pi^l$ as follows. Given a play $\pi \in G$ and a layer $l \in [0, L]$, the timescale transformation $\kappa^l$ of $\pi$ in layer $l$ is the identity function if $l = 0$, and defined by the strictly monotone sequence $\kappa^l \in \mathbb{N}^\infty$ s.t.
otherwise. The set of projected plays Π = {π l } L l=0 of π with πl = (x l , yl ) is defined as the sub-sequence of the abstract play π l at time points given by κ l for every l ∈ [0, L]. A projected play π is called infinite if |π| = ∞ and finite otherwise. While plays π ∈ G can always be made infinite (by the serial assumption on the transition relations), their projections πl to layer l > 0 need not be infinite. For example, if the robot from Sec. II-B should just move within room $r^5_{11}$, this obviously induces an infinite play π. However, its projection to the room layer is given by π1 = $r^5_{11}$, i.e., π1 is finite with length 1.
Example 3: Consider the abstract sequences $y^1$ and $y^2$ in Expl. 2. Using (7) and (8), their induced time scale transformations are given by $\kappa^1 = 0\ 3\ 6 \ldots$ and $\kappa^2 = 0\ 20$, and the resulting projections for layers 1 and 2 are given by y1 = $r^5_{11}\, r^5_{12}\, r^5_{22} \ldots$ and y2 = $f_5\, f_6$, corresponding to changes in rooms and floors, respectively, at those times. In Fig. 1, system states of projected plays are depicted by filled circles, whereas states only belonging to abstract plays are depicted by non-filled circles. ⊳
It can be easily shown (see Lem. 1 in the appendix) that the range of the timescale transformation $\kappa^{l+1}$ is a subset of the range of $\kappa^l$; if there is an event at the $(l+1)$st layer, there is a corresponding event at the $l$th (and so, in each lower) layer. Using this observation we can simplify notation by defining to denote the position in the lth layer of the kth event in the (l + 1)st layer.
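The timescale transformation and the resulting projection can be sketched for a finite prefix of an abstract system-state sequence. The Python fragment below is an illustration under assumptions: it takes the transformation to list time 0 plus every time at which the abstract system state differs from its predecessor (the informal "triggered by changes" reading), the names `timescale` and `project` are hypothetical, and the room assigned to each of the eight cells of Expl. 2 is inferred from the values reported in Expl. 3 rather than read off Fig. 1.

```python
def timescale(abstract_seq):
    """Time points 0 and every k at which the abstract state changes."""
    if not abstract_seq:
        return []
    kappa = [0]
    for k in range(1, len(abstract_seq)):
        if abstract_seq[k] != abstract_seq[k - 1]:
            kappa.append(k)
    return kappa


def project(abstract_seq, kappa):
    """Sub-sequence of the abstract sequence at the time points in kappa."""
    return [abstract_seq[k] for k in kappa]


# Abstract sequences for the first eight steps of the path in Expl. 2
# (room assignment assumed, see above).
y1 = ["r11", "r11", "r11", "r12", "r12", "r12", "r22", "r22"]
y2 = ["f5"] * 8

kappa1 = timescale(y1)
kappa2 = timescale(y2)
projected_rooms = project(y1, kappa1)
projected_floors = project(y2, kappa2)
```

On this prefix every trigger time of the floor layer is also a trigger time of the room layer, which matches the subset observation above.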

B. Abstract Game Graphs
Using the notion of abstract states and plays from the previous section, we now construct game graphs for every layer l. We remark that the actual game is only played in the lowest layer, i.e., in the game graph G, and the higher layers only model projected plays of this game.
Definition 1: Let G = (X , Y , δ, ρ) be a game graph, and {X l , Y l } L l=0 a layering of G using the abstraction functions α l e and α l s . Then we define the set of abstract game graphs (AGG) and for l = 0 by G 0 := G. ⊳
Intuitively, the maps δ l and ρ l collect all transitions that can occur in projected plays πl of possible lowest level plays π ∈ G , as illustrated in the following example. It should be noted that all lowest level plays π are existentially quantified in (10), i.e., all possible plays in the lowest layer are considered.
Example 4: Consider the play π ∈ G and its abstract play π 1 depicted in Fig. 2. The existence of the play π introduces the depicted system and environment transitions using (10a) and (10b), respectively. Observe that the construction considers every environment change (induced by the play π) as an environment transition from the environment state at the last triggering instance indicated by κ. Furthermore, system transitions are only generated at triggering times. It can be seen in Fig. 2 that the environment state in layer l > 0 possibly changes multiple times before a system state change follows. ⊳
The construction in Def. 1 allows us to prove that projected plays πl as defined in (8) are also plays in the game graph G l , i.e., πl ∈ G l . Intuitively, the proof shows that there always exist transitions, as the ones emphasized in Fig. 2, connecting system and environment states at triggering times.
Proposition 1: For any game G, any play π ∈ G , and any l ∈ [0, L], we have that πl is a play in G l , i.e., πl ∈ G l .Proof: The claim follows directly from Lem. 2 in App. as (1) holds for πl and G l when we pick n = κ l (m + 1) in (35).

IV. CONTEXT-BASED DECOMPOSITION
A set of AGGs imposes an abstraction hierarchy on top of a given game graph G. However, AGGs by themselves are not enough to decompose a synthesis problem. For example, if the winning condition is given by a set of plays on the lowest layer, the induced abstraction layers cannot be exploited by a synthesis algorithm. In order to derive an efficient synthesis technique, in this section, we introduce the second ingredient: local winning conditions, which induce local game graphs.
Roughly, a local winning condition for the game G l at layer l is a set of abstract plays π l whose states belong to a single state at layer l + 1. For example, reaching a different floor is a local specification at layer 2. A synthesis procedure to enforce ϕ L would require solving games at lower levels; in our example, the robot will have to successively reach a set of rooms, followed by the stairs, to achieve its goal. Each of these "lower level" games occurs in, roughly, the "local" game structure defined by the states in the lower level that map to the current state of the higher level. We formalize this notion as local game graphs.

A. Local Game Graphs over Hierarchies
Fix a layer l and consider the games G l and G l+1 . Consider a system state ν ∈ Y l+1 . A first attempt to define a local game is to restrict the game G l to the set of system states {y ∈ Y l | α l+1 s (y) = ν}. However, this is not sufficient, because plays in the local game should be allowed to leave the region specified by ν for one step at the end. This is necessary to ensure that plays in consecutive local games can be concatenated to form a play over the game graph G l without formalizing a special reset action, as used, e.g., in modular games by Alur et al. (2003). To account for these states, we introduce the Post operation:
Including the one-step post states allows us to view the actual game as a layer 0 game and use the hierarchical and local decompositions as a modeling formalism for hierarchical controller synthesis only.
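The idea of extending a context by its one-step exit states can be sketched on a toy example. This is only an illustration under stated assumptions: the helper `context_states`, the successor relation `moves` (which ignores the environment state), and the six-cell corridor with two rooms are all hypothetical and do not reproduce the paper's Post operation, whose defining equation is not available here.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

State = Hashable


def context_states(
    states: Set[State],
    alpha: Callable[[State], Hashable],
    successors: Dict[State, Set[State]],
    nu: Hashable,
) -> Tuple[Set[State], Set[State]]:
    """Return (interior, exits) for context nu.

    interior: states whose abstraction equals nu.
    exits:    states outside the interior that are reachable from it in one
              step (assumed stand-in for the one-step post states).
    """
    interior = {y for y in states if alpha(y) == nu}
    post: Set[State] = set()
    for y in interior:
        post |= successors.get(y, set())
    exits = post - interior
    return interior, exits


# Hypothetical corridor of six cells; cells 0-2 form room 0, cells 3-5 room 1.
cells = set(range(6))
room_of = lambda c: c // 3
moves = {c: {n for n in (c - 1, c + 1) if n in cells} for c in cells}

interior, exits = context_states(cells, room_of, moves, nu=0)
local_states = interior | exits
```

Here the local state set for room 0 contains cell 3 as its only exit state, so a local play may end with one step into the neighboring room, which is what allows consecutive local plays to be concatenated.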
Considering environment states instead of system states, a straightforward restriction to a context ν is not naturally given by α l+1 ↑ e , as the following example shows.
Example 5: Consider the example from Sec. II-B and its floor plan depicted in Fig. 3. Recall from Sec. II-B that an environment state x ∈ X 0 contains all grid cells that are occupied by an obstacle. However, by playing a game in room r 5 11 one is only interested in obstacles that are located inside
[Fig. 3, caption fragment: . . . and Y 1 f 5 , respectively. The three arrows denote context changes requested by layer l which induce a reachability specification for layer l − 1 whose initial and goal states are depicted in light and dark gray, respectively.]
0 of Fig. 3. For notational convenience, we define r L as the identity map. Using the above intuition, we define local game graphs as follows.
Definition 2: Given an AGG G l , the local game graph (LGG) and transition maps δ l ν :
x ′ ∈ δ l (x, y) ∧ y ∈ Y l ν⌉ ⇒ r l ν (x ′ ) ∈ δ l ν (r l ν (x), y) and (13a)
We write ∪ {G L } for the set of LGGs over G. ⊳
Example 6: Consider the example from Sec. II-B and its floor plan depicted in Fig. 3. The striped areas in layers 0 and 1 correspond to the context restricted system state sets Y 0 r 5 11 and Y 1 f 5 , respectively. It is easy to see that Y 0 r 5 11 ⌊ = {q 5 25 , q 5 43 } and Y 1 f 5 ⌊ = {s 56 }, while layer l = 2 is not decomposed. ⊳
In the robot example of Sec. II-B the generated set of LGGs is "truly local" in the sense that the local system dynamics do not depend on environment variables from other contexts. E.g., an obstacle in another room r ′ does not influence the dynamics of the robot in room r ≠ r ′ . This inherent decomposability of the system dynamics, similar to the natural relations among states of different layers, is a feature of the system we want to control. It is necessary for the subsequently proposed synthesis algorithm and is formalized in the following assumption.
Assumption 1: For every layer l ∈ [0, L − 1] and context ν ∈ Y l+1 it holds for all x ∈ X l and y ∈ Y l ν⌉ that (14)
It should be noted that the right hand side of (14) uses ρ l instead of ρ l ν . Therefore, ρ l ν ⊆ ρ l if Ass. 1 holds, which implies that in this case (13) holds in both directions.
Similarly to Prop. 1 we can prove that the part of a play π l that takes place in context ν is actually a play in G l ν . However, to formalize this we need to define local plays which are projected to the current context. Given a set of LGGs [G], a play π ∈ G 0 and its sets of abstract and projected plays Π and Π, the local restriction of π l and πl is defined for all m ∈ dom + (π l ) by
The restriction of x l (m) (resp. xl (m)) at time k = κ l (m) is defined w.r.t. the last system state y l+1 (k − 1), as y l+1 (k) is only available after the next system move, which depends on x(k). The local restriction πl ↓ of the projected play introduces a sequence pl ↓ of local projected plays defined by ∀m ∈ dom + (π l+1 ) . pl where end(w) = |w| − 1 denotes the time of the last element of w. We write [p] π := pl for the set of all such sequences induced by π, where pL ↓ (0) = πL and end(p L ↓ ) = 0.
Example 7: Consider the play π whose y-component is depicted by filled circles in Fig. 1 (bottom). For illustration purposes, assume a static environment with a closed door between rooms r 5 11 and r 5 12 , denoted by the binary variable d, and an obstacle in q 5 63 . The closed door, which is an environment variable for layer 1, corresponds to obstacles in q 5 24 and q 5 25 for layer 0. For this play, the local plays contained in the set [p] π are given by the following strings.
p0 ↓ (0) = ({q 5 24 , q 5 25 }, q 5 22 ) ({q 5 24 , q 5 25 }, q 5 23 ) ({q 5 24 , q 5 25 }, q 5 33 ) ({q 5 24 , q 5 25 }, q 5 43 )
p0 ↓ (1) = ({q 5 24 , q 5 25 }, q 5 43 ) ({q 5 63 }, q 5 53 ) ({q 5 63 }, q 5 54 ) ({q 5 63 }, q 5 55 ) . . .
where {⊥} denotes that no obstacles are present. Due to the definition of Y l ν in Def. 2, contexts of neighboring cells overlap. This is also visible in the above local plays, which overlap for one time instant. E.g., the state ({q 5 24 , q 5 25 }, q 5 43 ) belongs both to p0 ↓ (0) and p0 ↓ (1), which are the local plays in context Y 0 (17)
Proof: (17) follows by combining the last lines of (36a) and (36b) in Lem. 3, proven in the appendix.

B. Hierarchical Reactive Games over Sets of LGGs
We have seen in the example of Sec. II-B that the motivation for constructing LGGs comes from the natural decomposability of system dynamics, environment assumptions and tasks into local and global components which are naturally restricted to a context ν ∈ Y l+1 . Recall that local specifications should intuitively only contain finite strings to eventually allow progress in the higher layer upon completion of the local task. This observation is formalized as follows. Given a set [G] of LGGs, layer l ∈ [0, L − 1], and context ν ∈ Y l+1 , the sets
Example 8: Consider the floor plan in Fig. 3 and assume that the robot is in state q 5 22 , corresponding to the states r 5 11 and f 5 in layers l = 1 and l = 2, respectively, as indicated by the light gray coloring. Now assume that the controller in layer l = 2 requests a context change from f 5 to f 6 . This induces the reachability specification ψ 1 f 5 (f 6 ) containing all sequences of rooms in G 1 f 5 with final room s 56 . A memoryless strategy for this specification first needs to request a context change from r 5 11 to r 5 21 . This request, in turn, induces the reachability specification ψ 0 r 5 11 (r 5 21 ) containing all sequences of cells in G 0 r 5 11 with final cell q 5 43 . A possible first move of the robot to fulfill this specification is from q 5 22 to q 5 32 . The respective goal states of the two specifications are indicated in dark gray in Fig. 3. ⊳
The construction in (21) implies that only a (possibly strict) prefix ξ of a play π ∈ φ l ν (ν ′ ) needs to be contained in ϕ l ν . While this might seem restrictive for non-suffix-closed specifications such as safety, one can circumvent this problem by using the idea of "weak until". Intuitively, one would specify to stay safe, i.e., to only visit states from a set Q safe , "until" the context is left. Then (21) checks if the currently requested context change can be enforced by staying in safe states. For reachability type specifications, such as the request of the completion of a certain task, this issue does not arise.
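The "weak until" reading of a local safety specification can be made concrete with a small check on finite state sequences. This is a hedged sketch only: the function `safe_until_context_left` and the sets passed to it are hypothetical, and the check ignores environment states.

```python
from typing import Hashable, Iterable, Set


def safe_until_context_left(
    states: Iterable[Hashable],
    q_safe: Set[Hashable],
    context: Set[Hashable],
) -> bool:
    """Check "stay in q_safe until the context is left" on a finite sequence.

    The obligation is weak: a sequence that never leaves the context is
    accepted as long as every visited state is safe.
    """
    for y in states:
        if y not in context:
            return True  # context left: the safety obligation is discharged
        if y not in q_safe:
            return False  # unsafe state visited while still inside the context
    return True  # context never left, and only safe states were visited


# Hypothetical context {a, b, c} in which only a and b are safe.
ctx = {"a", "b", "c"}
safe = {"a", "b"}
ok_exit = safe_until_context_left(["a", "b", "x"], safe, ctx)
violated = safe_until_context_left(["a", "c", "x"], safe, ctx)
ok_inside = safe_until_context_left(["a", "b"], safe, ctx)
```

With this reading, every prefix of an accepted sequence that remains inside the context is itself accepted, so a requested context change can be checked against the safety requirement at any point of the local play.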
Given the above definitions of local specifications, hierarchical reactive games can be constructed from a set of LGGs as follows. Definition a set [p] π is defined to be winning (resp.possibly winning) for , and (iii) πL is winning (resp.possibly winning) for (G L , I L (0), ϕ L ). ⊳

V. ASSUME-ADMISSIBLE HIERARCHICAL STRATEGY CONSTRUCTION
Let ([G], I , [ϕ]) be a HRG with initial condition I ∈ (X × Y ) and let [ζ] be a set of local environment assumptions over [G]. Then we want to synthesize a strategy (i.e., a controller) for layer 0 that generates a play whose projection is winning for the set of local system specifications [ϕ] if [ζ] holds. We assume that [ϕ] and [ζ] are both ω-regular languages. While in principle one can flatten the game and solve one global game to obtain a solution to this problem, this will be prohibitively expensive. We therefore propose an algorithm that constructs a winning strategy in each local game that is encountered and "stitches together" these winning strategies dynamically. Alternatively, one could statically solve and memorize all local games that can possibly be constructed. Our algorithm avoids this expensive construction by only solving games that actually arise online. Hence, our procedure is dynamic in that it solves a series of local games in each step, starting from the current state; this is conceptually similar to receding horizon control approaches. To incorporate environment assumptions, we use a slightly modified version of the algorithm from Brenguier et al. (2015) to compute an assume-admissible winning strategy for a local game and a local environment assumption. Our procedure treats this algorithm as a black box; in principle, a different strategy synthesis algorithm can be used.

A. Synthesis of Assume-Admissibly Winning Strategies
Assume-admissibly winning strategies for the game (G, I , ϕ) w.r.t. the assumption ζ can be computed by the algorithm given by Brenguier et al. (2015, Thm. 4) in case ϕ and ζ are ω-regular objectives. We denote the outcome of this strategy synthesis by Sol AA (G, I , ϕ, ζ). Whenever the environment does not play admissibly, the definition of assume-admissibly winning strategies only restricts the behavior of the system to an admissible one. This does not give any guarantees w.r.t. ϕ in that case. To circumvent this issue we slightly modify the outcome of the available strategy synthesis.
Definition 4: Let f AA = Sol AA (G, I , ϕ, ζ) be an assume-admissibly winning strategy, then its associated possibly winning strategy f is defined for all π ∈ G s.t. else. (23)
We define the set of all possibly winning strategies for the game (G, I , ϕ) w.r.t. ζ by Sol (G, I , ϕ, ζ). ⊳
A strategy f = Sol (G, I , ϕ, ζ) blocks whenever the environment forces the play into a state from which the play cannot be won anymore. This implies that all finite plays π compliant with f are possibly winning, i.e., π lies in the prefix closure of ϕ, even if the environment does not play admissibly. However, if it does, the compliant play is winning. This is formalized by the following proposition.
Proof: Let f AA = Sol AA (G, I , ϕ, ζ) and f its associated possibly winning strategy. Using (4), g ∈ AdmissibleStrategies(G, I , ζ) implies f AA ∈ WinningStrategies(G, I , ϕ, g). Using (3), this implies π ∈ WinningPlays(G, I , ϕ). Therefore, the second case in (23) cannot occur and we obtain f = f AA , i.e., f ∈ WinningStrategies(G, I , ϕ, g). Observe that the left side of (24b) implies that the right side of (2) holds for π and f , hence
We remark that the algorithm to compute assume-admissible strategies in Brenguier et al. (2015, Thm. 4) can be trivially adapted to ensure Prop. 3, by blocking the game whenever a losing state (one in which there is no winning strategy for the system) is entered.
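The blocking behavior of a possibly winning strategy can be sketched as a thin wrapper around an arbitrary strategy. This is an illustration under assumptions, not the construction in (23): the predicate `still_winnable` is a hypothetical stand-in for "the play has not entered a losing state", and `make_blocking`, `f_aa` and the integer states are invented for the example.

```python
from typing import Callable, Hashable, Optional, Sequence

Play = Sequence[Hashable]
Strategy = Callable[[Play], Optional[Hashable]]


def make_blocking(f_aa: Strategy, still_winnable: Callable[[Play], bool]) -> Strategy:
    """Wrap a strategy so that it blocks (returns None) on lost plays.

    still_winnable is an assumed oracle deciding whether the finite play can
    still be extended to a winning one.
    """

    def f(play: Play) -> Optional[Hashable]:
        if still_winnable(play):
            return f_aa(play)  # follow the underlying strategy
        return None  # block: no move is offered from a losing play

    return f


# Hypothetical toy game on integers: the system counts upwards, and the
# play is lost as soon as the state -1 has been visited.
f_aa = lambda play: play[-1] + 1
still_winnable = lambda play: -1 not in play

f = make_blocking(f_aa, still_winnable)
next_move = f([0, 1])
blocked = f([0, -1])
```

Because the wrapper never offers a move after a losing state has been entered, every finite play it produces can still be extended to a winning one, which is the intuition behind the notion of possibly winning.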

B. The Strategy Synthesis Algorithm
Recall that we aim to synthesize a strategy (i.e., a controller) for layer 0 that generates a play whose projection is assume-admissible winning for the HRG
Hence, the goal of each computation round of our algorithm is to determine the next system state y(k + 1) in layer 0, i.e., to calculate the current control action that needs to be applied to the system. This depends on the environment state x(k + 1) in layer 0, which is sensed at the beginning of each such computation round and projected to all layers l ∈ [1, L] in a "bottom up" fashion. The current state in the local game of every layer is given by the restriction of x l (k + 1) to the current context and the projection y l (k) of the last system state. Based on this information, the next step in the local game of every layer needs to be calculated.
This calculation is challenging due to the interaction between plays in different layers. In particular, a move from system state ν to ν ′ requested by a strategy in layer l ∈ [1, L] results in an additional reachability specification for the current local game in layer l − 1. Furthermore, such an "induced" reachability specification for the local game in layer l − 1 and context ν might change multiple times before this context is left. This is due to the fact that an environment state in layer l > 0 possibly changes multiple times before a system state change follows, as discussed in the construction of abstract game graphs (see Sec. III-B). Hence, whenever such a specification change occurs, the strategy in layer l − 1 needs to be re-calculated. The only strategy that is not influenced by this interplay is the highest level strategy, which is computed only once when initializing the algorithm. Once the strategies are updated in a "top down" manner, the controller picks the next move at layer 0 based on the updated strategy for layer 0 and plays it. This changes the states for all higher layers and the algorithm continues with the next computation cycle.
We now describe the algorithm formally.
⊲ Using I L as in (22), calculate the assume-admissible winning strategy for the highest layer L using
⊲ Initialize the play and the local history, respectively, with π = (x(0), y(0)) = I and γl (0
◮ Iteration for all k ∈ N:
⊲ Sense the environment move
⊲ Compute the local environment state x l ↓ (k + 1) using (6) and (15a), i.e., for each layer l;
⊲ Iteratively calculate the current strategy by with and the predicates are defined by
⊲ Play the next move following the current system strategy for layer l = 0
⊲ Append (x(k + 1), y(k + 1)) to the play giving
⊲ Using (16b), compute the new context restricted history
As discussed before, every computation round k of the construction in (25) starts with the sensing of the next environment move in (25c), giving the full 0-level environment state x(k + 1) = x 0 (k + 1). This state is used to compute the local restricted environment states x l ↓ (k + 1) for every layer and current context y l+1 (k) in (25d). Note that this construction is done "bottom up".
Thereafter, the current strategy f l for every layer and its respective current goal state ν ′l are calculated. Observe that this is done "top down", as ν ′l is used to calculate the current reachability specification for the reachability game in layer l − 1. The construction of f l in (25f) distinguishes three cases: the play at the highest layer has been won, or the play at the higher layer got stuck, or none of these conditions occurred. We consider these cases separately.
For the first case, observe that the specification of level L might be a set of finite strings and local specifications are sets of finite strings by definition (see Sec. IV-B). Therefore, the play constructed in (25) does not need to be infinite to be winning for [ϕ]. If the play in layer L is winning for ϕ L and the strategy does not request any other move (denoted by the predicate Done L in (25k)), then this is communicated downwards using the second line of (25f). In this case all lower level strategies must be winning for local specifications only, using the assume-admissible strategy calculated in (25h).
For the second case, observe that the strategy calculation in (25h) and (25i) does not need to have a solution. Further, even if it has a solution, system strategies are not assumed to be left-total. Hence, there might exist (non-admissible) environment moves that cause a blocking of f without the game being winning. These two situations are modeled by the predicate GotStuck l in (25k). If such a situation occurs, it is communicated downwards by the first line of (25f), resulting in GotStuck l ′ for all l ′ < l and therefore in the game being aborted. Intuitively, the first time GotStuck l occurs, it is because of an "unrealizable" local specification. We introduce a fourth predicate to remember the first layer at which the controller got stuck. We will show in Sec. V-C that an unrealizable specification is the only reason for a non-winning play constructed in (25) to be aborted.
In the third case, i.e., if neither GotStuck l nor Done l+1 is true, the strategy for level l is calculated by (25i), again using two subcases. In the first subcase, either a new context was entered (resulting in a new local game) or the "top down induced" reachability specification has changed (due to a change of ν ′l caused by a new environment state in layer l + 1). In this case the strategy for level l needs to be re-calculated. However, if neither of these two situations occurs, the strategy from the previous time step can be used, avoiding unnecessary re-computations.
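The three-way case distinction for one layer, including the reuse of a previously computed strategy, can be sketched as follows. This is a simplified illustration and not the construction in (25f)-(25l): local histories and environment assumptions are omitted, and `select_strategy`, `Memo`, `solve_local` and `solve_with_goal` are hypothetical names, with the two solver callbacks standing in for whatever local synthesis routine is used.

```python
from dataclasses import dataclass
from typing import Any, Callable, Hashable, List, Optional, Tuple


@dataclass
class Memo:
    """Context, induced goal and strategy from the previous time step."""
    context: Hashable
    goal: Hashable
    strategy: Any


def select_strategy(
    above_stuck: bool,
    above_done: bool,
    context: Hashable,
    goal: Hashable,
    memo: Optional[Memo],
    solve_local: Callable[[Hashable], Any],
    solve_with_goal: Callable[[Hashable, Hashable], Any],
) -> Tuple[Any, Optional[Memo]]:
    """Pick the strategy of one layer for the current time step."""
    if above_stuck:
        return None, memo  # stuck is propagated downwards: no strategy
    if above_done:
        return solve_local(context), memo  # only the local specification remains
    if memo is None or memo.context != context or memo.goal != goal:
        # New context or changed induced goal: re-solve the local game.
        memo = Memo(context, goal, solve_with_goal(context, goal))
    return memo.strategy, memo  # otherwise reuse the previous strategy


# Hypothetical solvers that only record how often they are called.
calls: List[Tuple[Hashable, Hashable]] = []


def fake_solve_with_goal(context, goal):
    calls.append((context, goal))
    return f"reach {goal} within {context}"


def fake_solve_local(context):
    return f"local spec only in {context}"


args = (fake_solve_local, fake_solve_with_goal)
s1, memo = select_strategy(False, False, "r11", "r21", None, *args)
s2, memo = select_strategy(False, False, "r11", "r21", memo, *args)  # reused
s3, memo = select_strategy(False, False, "r11", "r12", memo, *args)  # goal changed
s4, _ = select_strategy(True, False, "r11", "r12", memo, *args)
s5, _ = select_strategy(False, True, "r11", "r12", memo, *args)
```

In this run the goal-directed solver is invoked only twice, once for the initial request and once after the induced goal changes, which reflects the point that re-computation is needed only when the context or the induced specification changes.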
After the strategy construction in (25f)-(25l), the system state is updated to y(k + 1), using the currently selected lowest level strategy f 0 (k) in (25m). Hence, (25f)-(25l) only utilize the hierarchical structure of the game graph to compute f 0 (k), which is the only control action that is actually applied to the system, e.g., the robot in our example. Then (x(k + 1), y(k + 1)) is appended to the constructed play π. As one would expect, such plays π generated by Alg. 1 up to length k are plays in G, i.e., π ∈ G , as shown in the following proposition. Observe that this implies that also πl ∈ G l for all l ∈ [0, L] (from Prop. 1) and pl ↓ (m) ∈ G l yl+1 (m) for all l ∈ [0, L − 1] and m ∈ dom(π l+1 ) (from Prop. 2).
Proposition 4: Let π be a play computed in Alg. 1. Then π ∈ G .
Proof: It follows from (25c) and (25m) that and the definition of the latter in Sec. II gives . Now observe from (25o), (16b) and (8) that ⌈γ 0 (k − 1)⌉ 2 = y 0 (k − 1). Now using ρ 0 y 1 (k−1) ⊆ ρ 0 from Ass. 1 along with this observation, we see that (27) actually implies (1), hence π ∈ G .
We call a play π calculated in (25
One round of the construction in (25) ends by calculating the current local histories γl (k + 1) for every layer. Intuitively, γl (k + 1) models the part of πl generated after the last context change in layer l and is therefore equivalent to ⌈p l ↓ ⌉. These histories are used in the calculation of assume-admissible strategies to ensure that a re-computation of a strategy within one context results in a continuation of the already generated string w.r.t. the given specification.
While the local system strategies f l (k) are explicitly calculated for every time step k in (25f)-(25l), the local environment strategies g l (k) are only given implicitly by the observed environment move (25c) and its abstraction to every layer l. Formally, a play π calculated in (25) was played against an admissible environment strategy if for all l ∈ [0, L − 1], m ∈ dom(π l ) there exists an environment strategy g If this holds, we call π an environment admissible play.

C. Soundness
In this section we prove three different soundness results for the play constructed in Alg. 1. Intuitively, Alg. 1 is sound if a play π calculated in (25) is winning for the HRG ([G], I , [ϕ]) whenever all generated local specifications are realizable and the environment plays admissibly w.r.t. [ζ], which will be proven last in Thm. 3. As a first intermediate result we show that the only two reasons for a maximal play to terminate are that (i) a current local specification is not realizable or (ii) the play is already winning given a finite winning condition in layer L.
Theorem 1: Let π be a maximal play computed by (25). Then it holds that
Proof: To prove this theorem we need that which is proven for all k ∈ dom(π) in Lem. 5 (see the appendix). Furthermore, as we assume environment strategies to be left-total, (25c) can always be computed. Hence, π becomes finite while being maximal iff (25m) cannot be evaluated, i.e., Now we pick k = end(π) and prove both directions separately.
"⇒" Using (30b) and (25l) implies that either (i) ¬Done 0 (k) and GotStuck 0 (k), or (ii) Done 0 (k). Using (30a), (i) implies (29).right.2. As Done 0 (k) implies ∀l ∈ [0, L] . Done l (k) (from (25k)), (ii) implies (29).right.1.
"⇐" If (29).right.2 is true, it follows from (30a) that GotStuck 0 (k) and ¬Done 0 (k) (see the proof of Lem. 5). Hence, (29) and (30b) imply (29).left. If (29).right.1 is true, we know from (25f) that f 0 (k) = h 0 (k). Therefore, (25k).right.3 and (30b) imply (29).left.
While the second case in Thm. 1 is not desired w.r.t. the goal of constructing a winning play, it can usually not be avoided in a realistic scenario, as (i) we cannot force the environment to play admissibly and (ii) checking feasibility of all possibly occurring local games before startup might not be appropriate, as this set might be very large. However, Alg. 1 ensures that if this situation occurs, the local specifications are not falsified up to this point. This is formalized by the notion of possibly winning, which ensures that generated finite plays always stay in the prefix closure of the considered local specifications.
Theorem 2: Given the preliminaries of Alg. 1, let π be the play computed by (25) up to length k, and [p] π its set of local projected play sequences.Then [p] π is possibly winning for ([G], I , [ϕ]).
We now prove the main result of this paper, namely that maximal plays π calculated by Alg. 1 (finite and infinite) are actually winning for ([G], I , [ϕ]) if the environment plays admissibly and all constructed local plays have a solution, i.e.,
Theorem 3: Let π be a maximal and environment admissible play computed by (25
Proof: In this proof we use the following two observations where (34a) was proven in Lem. 11 (see the appendix), the left side of (34b) follows from Thm. 1 and (33), and the right side of (34b) is a simple consequence of the definition of projections in (8). Hence, we generally have two cases to consider when proving the three conditions for winning HRGs from Sec. IV-B. First observe that condition (i) is equivalent for winning and possibly winning, no matter whether π is finite or not. It therefore follows directly from Thm. 2. Furthermore, condition (ii) only needs to be proven if |π l+1 | < ∞, and recall that for this case Thm. 2 shows that pl ↓ (end(π l+1 )) is possibly winning for (G l yl+1 (m) , ⌈p l ↓ (m − 1)⌉, ϕ l yl+1 (m) ) for all l ∈ [0, L]. Now observe from (34b) that Done l (end(π)), which implies from (25k) and (25j) that pl ↓ (end(π l+1 )) = γl (end(π)) ∈ ϕ l yl+1 (m) , where the first equality follows from (25o) and (16). This obviously implies that pl ↓ (end(π l+1 )) is winning in the above game. For finite plays, this reasoning also proves condition (iii). We therefore assume |π L | = ∞ and recall from the proof of Thm. 2 that (2) holds for πL w.r.t. h L and I L (0). As |π L | = ∞ we have πL ∈ CompliantPlays(h L , I L (0)). As h L = Sol G L , I L (0), ϕ L , ζ L and πL ∈ G L (from Prop. 4 and Prop. 1) and g L ∈ AdmissibleStrategies(G L , I L (0), ϕ L , ζ L ), it follows from (24b) in Prop. 3 that πL is winning for (G L , I L (0), ϕ L ).
The important difference between Thm. 2 and Thm. 3 is that environment admissible infinite plays can only be generated if layer L does not win in finite time, i.e., ¬Done L (k) for all k ∈ dom(π L ). If the environment does not play admissibly, infinite plays can also be generated if Done L (k) is true, as the environment might never "help" to reach the specification (i.e., does not play admissibly) but also never moves to a losing state (i.e., never causes the game to be aborted).
Remark 1: It should be noted that Alg. 1 works identically if we use a "usual" synthesis technique to calculate winning (instead of assume-admissibly winning) strategies in Sol (•) (i.e., a procedure to solve the unconstrained synthesis problem). Such a procedure is obtained, e.g., from the methods by Zielonka (1998); Emerson and Jutla (1991) for general ω-regular conditions, or from more specialized procedures for co-safe properties (given by sets of finite-length plays) by Kupferman and Vardi (2001); Ehlers and Finkbeiner (2011); Kupferman and Weiner (2012). This underlines the modularity of our approach w.r.t. the actual strategy synthesis routine used in local games. However, it should be noted that in realistic scenarios, local games will usually not have winning strategies against a purely adversarial environment. Nevertheless, if the game gets stuck due to such an unrealizable sub-game, the result from Thm. 2 still holds, i.e., the specification is not violated in this case.

D. Comments on Completeness
Intuitively, the synthesis procedure given in Alg. 1 is complete if, whenever there exists a strategy f over the game graph G s.t.all plays π ∈ G compliant with f induce a set of local play sequences that are winning for ([G], I , [ϕ]) (if the environment plays an admissible strategy), then there exists a hierarchical strategy F s.t.its compliant play π generated by (25) induces projected plays that are also winning for ([G], I , [ϕ]) (if the environment plays an admissible strategy).
Unfortunately, this statement is not true. The major problem arises from the fact that assume-admissibly winning strategies are usually not unique for a particular game. Therefore, using one particular strategy calculated by Sol (•) disregards other winning plays. This has two important consequences. First, a move of the current layer l strategy cannot be revised if the current layer l − 1 game is not realizable for the corresponding reachability specification, even if there exists a different possibly winning extension in layer l. In our robot example, this corresponds to the case where the robot is in a particular room r with two adjacent rooms r ′ and r ′′ , where visiting either of them is winning. Now the current strategy for the room layer deterministically picks room r ′ . If the way towards room r ′ is blocked by a static obstacle, the game in layer 0 and context r does not have a solution and the play gets stuck.
Second, this problem also arises in the reverse layer interaction, as assume-admissibly winning strategies are only ensured to be winning against a "local" admissible environment strategy. They do not consider admissible environment moves in higher layers that might cause specification changes in the current layer. Hence, the local strategy synthesis might pick a strategy that leads the play to a region of the state space which is losing for a different specification that might occur later in this game due to such an admissible environment move in a higher layer. In the above example this would correspond to the case that the door to room r ′ gets closed, which is visible to layer 1 and therefore causes the strategy to request the robot to move to room r ′′ instead. Now assume that the way towards both r ′ and r ′′ was unblocked initially. Given the specification to reach r ′ , the robot might pick one of two passages which allow it to reach r ′ , but the selected one is too narrow for the robot to turn. When the specification changes, the robot cannot turn and approach r ′′ , hence the game in layer 0 and context r does not have a solution and the play gets stuck. Taking these interactions into account when synthesizing local assume-admissible winning strategies is a promising idea for future work to obtain a complete algorithm. This would also reduce blocking situations which are caused by this interplay.
Completeness holds in the special case of a trivial environment (which has no choice of moves) and the strategy only picks one among the choice of system moves (as e.g. in Kloetzer and Belta, 2008;Vasile and Belta, 2014).However, in this case, one can compute a strategy statically using a dynamic programming procedure similar to context free reachability (see Reps et al., 1995;Alur et al., 2003).

VI. CONCLUSION
We have shown in this paper how a large-scale reactive controller synthesis problem with intrinsic hierarchy and locality can be modeled as a hierarchical two-player game over a set of local game graphs w.r.t. a set of local strategies on multiple, interacting abstraction layers. We have proposed a reactive controller synthesis algorithm for such hierarchical games that allows for dynamic specification changes at each step of the play, with strategies recalculated online in every step. This re-calculation becomes computationally tractable by the proposed decomposition. We have shown that our algorithm is sound: whenever the environment meets its assumptions and all dynamically generated local games have a solution, the controller synthesis algorithm generates a winning hierarchical play for a given specification. If these assumptions do not hold, the algorithm terminates but the generated finite play does not violate the specification up to this point.
Fig. 1. Floor plan of the 5th and 6th floor of a six-story building. Using the depicted coordinates, we denote by q k ij and r k ij , respectively, the cell and the room in the ith column and jth row of floor k. Furthermore, s ij , i < j denotes the staircase from floor f i to floor f j . The workspace of this building is partitioned into grid cells (bottom), rooms (middle) and floors (top) which serve as abstraction layers l = 0 to l = 2 as discussed in Sec. II-B. The line of dots depicts a path of the robot from the initial state (light gray) to the final state (dark gray) in every layer. Filled circles denote projected states while non-filled circles denote abstract (but not projected) states, as discussed in Expl. 2-3.

Fig. 2. Generation of system and environment transitions for layer l = 1 from a play π as formalized in Def. 1 and discussed in Expl. 4.
of using α l+1 ↑ e to restrict X l to context ν, we use a restricting function r l ν .For Expl. 5, the map r 1 r 5 11 simply maps the set x of obstacle locations to the subset x ′ ⊆ x of such locations that are inside the striped area in layer l = 0

Fig. 3. Floor plan from Fig. 1. The striped areas in layers 0 and 1 correspond to Y 0 r 5 11
we use the convention that the environment moves first, the environment variables of such overlapping states are always restricted to the context that is currently being left. ⊳
Proposition 2: Let [G] be a set of LGGs and G l y the set of plays in G l y . Furthermore, let π ∈ G and [p] π its induced set of local projected play sequences. Then it holds for all l ∈ [0, L − 1] and m ∈ dom(π l+1 ) that pl ↓ (m) ∈ G l yl+1 (m) .

Definition 3: Given a set of local specifications [ϕ] over a set of LGGs [G] and a set of level 0 initial states I ⊆ (X × Y ), the tuple ([G], I , [ϕ]) is called a hierarchical reactive game (HRG) over [G]. Furthermore, given the set of local initial conditions

Algorithm 1 (Strategy Synthesis Procedure): Let ([G], I , [ϕ]) be a HRG with I ∈ (X × Y ) and [ζ] a set of local environment assumptions over [G]. Then the dynamic hierarchical strategy F = {f l } L l=0 for the game ([G], I , [ϕ]) w.r.t. [ζ] and its compliant play π are iteratively defined as follows:
◮ Initialization:

) s.t. (33) holds and let [p] π be its set of local play sequences. Then [p] π is winning for ([G], I , [ϕ]).