Information Flow Guided Synthesis

Compositional synthesis relies on the discovery of assumptions, i.e., restrictions on the behavior of the remainder of the system that allow a component to realize its specification. In order to avoid losing valid solutions, these assumptions should be necessary conditions for realizability. However, because there are typically many different behaviors that realize the same specification, necessary behavioral restrictions often do not exist. In this paper, we introduce a new class of assumptions for compositional synthesis, which we call information flow assumptions. Such assumptions capture an essential aspect of distributed computing, because components often need to act upon information that is available only in other components. The presence of a certain flow of information is therefore often a necessary requirement, while the actual behavior that establishes the information flow is unconstrained. In contrast to behavioral assumptions, which are properties of individual computation traces, information flow assumptions are hyperproperties, i.e., properties of sets of traces. We present a method for the automatic derivation of information flow assumptions from a temporal logic specification of the system. We then provide a technique for the automatic synthesis of component implementations based on information flow assumptions. This provides a new compositional approach to the synthesis of distributed systems. We report on encouraging first experiments with the approach, carried out with the BoSyHyper synthesis tool.


Introduction
In distributed synthesis, we are interested in the automatic translation of a formal specification of a distributed system's desired behavior into an implementation that satisfies the specification [26]. What makes distributed synthesis far more interesting than the standard synthesis of reactive systems, but also more challenging, is that the result consists of a set of implementations of subsystems, each of which operates based only on partial knowledge of the global system state. While algorithms for distributed synthesis have been studied since the 1990s [12,20,26], their high complexity has resulted in applications of distributed synthesis being, so far, very limited.
One of the most promising approaches to making distributed synthesis more scalable is compositional synthesis [7,10,14,21,27]. The compositional synthesis of a distributed system with two processes, p and q, avoids the construction of the product of p and q and instead focuses on one process at a time. Typically, it is impossible to realize one process without making certain assumptions about the other process. Compositional synthesis therefore critically depends on finding the assumption that p must make about q, and vice versa: once the assumptions are known, one can build each individual process, relying on the fact that the assumption will be satisfied by the synthesized implementation of the other process. Ideally, the assumptions should be both sufficient (i.e., the processes are realizable under the assumptions) and necessary (i.e., any implementation that satisfies the specification would also satisfy the assumptions). Without sufficiency, the synthesis cannot find a compositional solution; without necessity, the synthesis loses valid solutions. While sufficiency is obviously checked as part of the synthesis process, it is often impossible to find necessary conditions, because the specification can typically be realized by many different behaviors. Any specific implementation would lead to a specific assumption; however, this implementation is only known once the synthesis is complete, and an assumption that is satisfied by all implementations often does not exist.
In this paper, we propose a way out of this chicken-and-egg type of situation. Previous work on generating assumptions for compositional synthesis has focused on behavioral restrictions on the environment of a subsystem. We introduce a new class of more abstract assumptions that, instead, focus on the flow of information. Consider a system architecture (depicted in Figure 1a) where two processes a and b are linked by a communication channel c, such that a can write to c and b can read from c. Suppose also that a reads a boolean input in from the environment that is, however, not directly visible to b. We are interested in a distributed implementation for a specification that demands that b should eventually output the value of input in. Since b cannot observe in, its synthesis must rely on the assumption that the value of in is communicated over the channel c by process a. Expressing this as a behavioral assumption is difficult, because there are many different behaviors that accomplish this. Process a could, for example, literally copy the value of in to c. It could also encode the value, for example by writing to c the negation of the value of in. Alternatively, it could delay the transmission of in by an arbitrary number of steps, and even use the length of the delay to encode information about the value of in. Fixing any such communication protocol, by a corresponding behavioral assumption on a, would unnecessarily eliminate potential implementations of b. The minimal assumption that the synthesis of subsystem b must rely on is in fact an information-flow assumption, namely that b will eventually learn the value of in.
We present a method that derives necessary information flow assumptions automatically. A fundamental difference between behavioral and information flow assumptions is that behavioral assumptions are trace properties, i.e., properties of individual traces; by contrast, information flow assumptions are hyperproperties, i.e., properties of sets of traces. In our example, the assumption that a will eventually communicate the value of in to b is the hyperproperty that any two traces that differ in the value of in must eventually also differ in c. The precise difference between the two traces depends on the communication protocol chosen in the implementation of a; however, any correct implementation of a must ensure that some difference in b's input (on channel c) in the two traces occurs, so that b can then respond with a different output.
Once we have obtained information flow assumptions for all of the subsystems, we proceed to synthesize each subsystem under the assumption generated for its environment. It is important to note that, at this point, the implementation of the environment is not known yet; as a result, we only know what information will be provided to process b, but not how. This also means that we cannot yet construct an executable implementation of the process under consideration; after all, this implementation would need to correctly decode the information provided by its partner process. Clearly, we cannot determine how to decode the information before we know how the implementation of the sending process encodes the information! Our solution to this quandary is to synthesize a prototype of an implementation for the process that works with any implementation of the sender, as long as the sender satisfies the information flow requirement. The prototype differs from the actual implementation in that it has access to the original (unencoded) information. Because of this information the prototype, which we call a hyper implementation, can determine the correct output that satisfies the specification. Later, in the actual implementation, the information is no longer available in its original, unencoded form, but must instead be decoded from the communication received from the environment. However, the information flow assumption guarantees that this is actually possible, and access to the original information is, therefore, no longer necessary.
In Section 2, we explain our approach in more detail, continuing the discussion of the bit transmission example mentioned above. The paper then proceeds to make the following contributions:
- We introduce the notion of necessary information flow assumptions (Section 4.1) for distributed systems with two processes and present a method for the automatic derivation of such assumptions from process specifications given in linear-time temporal logic (LTL).
- We strengthen information flow assumptions to the notion of time-bounded information flow assumptions (Section 4.2), which characterizes information that must be received in finite time. We introduce the notion of uniform distinguishability and prove that uniform distinguishability guarantees the necessity of the information flow assumption.
- We introduce the notion of hyper implementations (Section 6) and provide a synthesis method for their automatic construction. We also explain how to transform hyper implementations into actual process implementations.
- We present a practical approach (Section 7) that simplifies the synthesis for cases where the information flow assumption refers to a finite amount of information.
- We report on encouraging experimental results (Section 8).

The Bit Transmission Problem
We use the bit transmission example from the introduction to motivate our approach. The example consists of two processes a and b that are combined into the distributed architecture shown in Figure 1a. Process a observes the (binary) input of the environment through variable in and can communicate with the second process b via a channel (modeled by the shared variable c). Process b observes its own local input from a and has a local output out. We are interested in synthesizing an implementation for our distributed system consisting of two strategies, one for each process, whose combined behavior satisfies the specification. In this example, the specification for process b is to transmit the initial value of in, an input of a, to b's own output; this is expressed by the linear-time temporal logic (LTL) formula ϕ_b = in ↔ ◇out. The specification does not restrict a's behavior, so ϕ_a = true.
Since the value of out is controlled by b, whereas in is determined by the environment and observed by a, this specification forces b to react to an input that b neither observes nor controls. To satisfy the goal, out must remain false forever if in is initially false, while out must eventually become true at least once if in starts with value true. Indeed, in order to set out to true, process b must know that in is initially true, which can only be achieved via information flow from a to b. We can capture this information flow requirement as the following hyperproperty: For every pair of traces that disagree on the initial value of in, process a must (eventually) behave differently on c. The requirement can be expressed in HyperLTL by the formula Ψ = ∀π, π′. ¬(in_π ↔ in_π′) → ◇¬(c_π ↔ c_π′). The information flow requirement does not restrict a to behave in a particular manner; the encoding of the information about in on the channel c depends on a's behavior. Under the assumption that a will behave according to the information flow requirement Ψ, one can synthesize a solution for b that is correct for every implementation of a. Given its generality, we call such a solution a hyper implementation, shown in Figure 1b. Since the point in time when the information is received by b is unknown during the local synthesis process, an additional boolean variable t is added to the specification of b. This variable signals that the information has been transmitted and is later derived from a's implementation. Setting out to true is only allowed after t is observed by process b. When the hyper implementation is composed with the actual implementation of a, as shown in Figure 1c, both local specifications are satisfied. The resulting local implementation of b, depicted in Figure 1d, branches only on local inputs and, together with a, satisfies the specification. While moving from state b_0 to b_1, process b cannot distinguish in from ¬in. It has to wait for one time step, i.e., for the first difference in the outputs of process a, to observe the difference in the shared variable c.
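To make the example concrete, the following sketch simulates one possible solution. The one-step copy protocol for process a and all names are our own illustrative choices; the information flow assumption admits many other encodings, and "eventually" is checked here only on a bounded prefix.

```python
# Bit transmission: process a forwards the initial value of `in` on channel c;
# process b raises `out` once it observes c = True. This fixes ONE possible
# encoding -- the information flow assumption allows many others.

def strat_a(hist):
    """Output of a at step k, given the environment inputs at steps 0..k-1."""
    return {"c": len(hist) > 0 and hist[0]["in"]}

def strat_b(hist):
    """Output of b at step k, given a's channel outputs at steps 0..k-1."""
    return {"out": any(h["c"] for h in hist)}

def run(env_ins):
    """Synchronous composition: valuation of all variables at each step."""
    trace, c_hist = [], []
    for k, x in enumerate(env_ins):
        out = strat_b(c_hist)                         # b reacts to c-history
        c = strat_a([{"in": v} for v in env_ins[:k]]) # a reacts to in-history
        c_hist.append(c)
        trace.append({"in": x, **c, **out})
    return trace

def satisfies(trace):
    """Bounded reading of phi_b = in <-> EVENTUALLY out (approximation)."""
    return trace[0]["in"] == any(t["out"] for t in trace)
```

With in initially true, out becomes true at step 2 (one step for a to write c, one for b to react); with in initially false, c and out stay false throughout.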

Preliminaries
Architectures. For ease of exposition we focus in this paper on systems with two processes. Let V be a set of variables. An architecture with two black-box processes p and q is given as a tuple (I_p, I_q, O_p, O_q, O_e), where I_p, I_q, O_p, O_q, and O_e are all subsets of V. O_p and O_q are the output variables of p and q. O_e are the output variables of the uncontrollable environment. The three sets O_p, O_q and O_e form a partition of V. I_p and I_q are the input variables of processes p and q, respectively. For each black-box process, the inputs and outputs are disjoint, i.e., I_p ∩ O_p = ∅ and I_q ∩ O_q = ∅. The inputs I_p and I_q of the black-box processes are all either outputs of the environment or outputs of the other black-box process, i.e., I_p ⊆ O_q ∪ O_e and I_q ⊆ O_p ∪ O_e. We assume that all variables are of boolean type. For a set V of variables, every subset V′ ⊆ V defines a valuation of V, where the variables in V′ have value true and the variables in V \ V′ have value false.
Implementations. An implementation of an architecture (I_p, I_q, O_p, O_q, O_e) is a pair (s_p, s_q), consisting of a strategy for each of the two black-box processes. A strategy for a black-box process p is a function s_p : (2^{I_p})^* → 2^{O_p} that maps finite sequences of valuations of p's input variables (i.e., histories of inputs) to a valuation of p's output variables. The (synchronous) composition s_p || s_q of the two strategies is the function s : (2^{O_e})^* → 2^V that maps finite sequences of valuations of the environment's output variables to valuations of all variables: we define s(ǫ) = s_p(ǫ) ∪ s_q(ǫ) and, for v ∈ (2^{O_e})^* and x ∈ 2^{O_e}, s(v · x) = s_p(f_p(v)) ∪ s_q(f_q(v)) ∪ x, where f_p and f_q map sequences of environment outputs to the induced sequences of process inputs, defined by f_p(ǫ) = ǫ and f_p(v · x) = f_p(v) · ((x ∪ s_q(f_q(v))) ∩ I_p), and symmetrically for f_q.
Specifications. Our specifications refer to traces over the set V of all variables. In general, for a set V of variables, a trace over V is an infinite sequence x_0 x_1 x_2 … ∈ (2^V)^ω of valuations of V. Two traces over disjoint sets V, V′ can be combined by forming the union of their valuations at each position, i.e., x_0 x_1 x_2 … ⊔ y_0 y_1 y_2 … = (x_0 ∪ y_0)(x_1 ∪ y_1)(x_2 ∪ y_2)…. Likewise, the projection π↓_{V′} of a trace π onto a set of variables V′ is formed by intersecting the valuations with V′ at each position: (x_0 x_1 x_2 …)↓_{V′} = (x_0 ∩ V′)(x_1 ∩ V′)(x_2 ∩ V′)…. For our specification language, we use propositional linear-time temporal logic (LTL) [25], with the set V of variables as atomic propositions and the usual temporal operators Next ◯, Until U, Globally □, and Eventually ◇. System specifications are given as a conjunction ϕ_p ∧ ϕ_q of two LTL formulas, where ϕ_p refers only to variables in O_p ∪ O_e, i.e., the formula relates the outputs of process p to the outputs of the environment, and ϕ_q refers only to variables in O_q ∪ O_e. The two formulas represent the local specifications for the two black-box processes.
An implementation s = (s_p, s_q) defines a set of traces Traces(s_p, s_q) = { s(γ_0) s(γ_0 γ_1) s(γ_0 γ_1 γ_2) … | γ_0 γ_1 γ_2 … ∈ (2^{O_e})^ω }, where s = s_p || s_q, i.e., the traces obtained by applying the composition to the prefixes of the infinite sequences of environment outputs. We say that the implementation satisfies the specification if the traces of the implementation are contained in the specification, i.e., Traces(s_p, s_q) ⊆ ϕ.
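The composition above can be transcribed directly into code. The sketch below represents valuations as frozensets of the variables that are currently true; the recursive definitions of f_p and f_q and the bit-transmission strategies are as discussed above, with all names being our own.

```python
# Synchronous composition s = s_p || s_q of two Moore-style strategies.
# A strategy maps a tuple of input valuations (frozensets) to an output valuation.

def compose(sp, sq, Ip, Iq):
    def fp(v):
        # f_p(eps) = eps;  f_p(v . x) = f_p(v) . ((x u s_q(f_q(v))) n I_p)
        if not v:
            return ()
        v0, x = v[:-1], v[-1]
        return fp(v0) + ((x | sq(fq(v0))) & Ip,)

    def fq(v):
        if not v:
            return ()
        v0, x = v[:-1], v[-1]
        return fq(v0) + ((x | sp(fp(v0))) & Iq,)

    def s(v):
        # s(eps) = s_p(eps) u s_q(eps);  s(v . x) = s_p(f_p(v)) u s_q(f_q(v)) u x
        if not v:
            return sp(()) | sq(())
        v0, x = v[:-1], v[-1]
        return sp(fp(v0)) | sq(fq(v0)) | x
    return s

# Bit-transmission strategies (one illustrative encoding):
strat_a = lambda h: frozenset({"c"}) if h and "in" in h[0] else frozenset()
strat_b = lambda h: frozenset({"out"}) if any("c" in x for x in h) else frozenset()
s = compose(strat_a, strat_b, frozenset({"in"}), frozenset({"c"}))
```

Applying s to a history of environment outputs yields the valuation of all variables at the corresponding step, exactly as in the definition.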
The synthesis problem. Given an architecture and a specification ϕ, the synthesis problem is to find an implementation s that satisfies ϕ. We say that a specification ϕ is realizable in a given architecture if such an implementation exists, and unrealizable if not.
Hyperproperties. A hyperproperty over V is a set H ⊆ 2^{(2^V)^ω} of sets of traces over V [6]. An implementation (s_p, s_q) satisfies the hyperproperty H iff its set of traces is an element of H, i.e., Traces(s_p, s_q) ∈ H. A specification language for hyperproperties is the temporal logic HyperLTL [5]. HyperLTL extends LTL with quantification over trace variables. The syntax of HyperLTL is given by the grammar ϕ ::= ∀π. ϕ | ∃π. ϕ | ψ and ψ ::= a_π | ¬ψ | ψ ∨ ψ | ◯ψ | ψ U ψ, where a ∈ V is a variable and π ∈ T is a trace variable. Note that the atomic propositions are indexed by trace variables. The quantification over traces makes it possible to express properties like "ψ must hold on all traces", which is expressed by ∀π. ψ. Dually, one can express that "there exists a trace on which ψ holds", denoted by ∃π. ψ. The temporal operators are defined as in LTL.
In some cases, a hyperproperty can be expressed in terms of a binary relation on traces. A relation R ⊆ (2^V)^ω × (2^V)^ω of pairs of traces defines the hyperproperty H, where a set T of traces is an element of H iff for all pairs π, π′ ∈ T of traces in T it holds that (π, π′) ∈ R. We call a hyperproperty defined in this way a 2-hyperproperty. In HyperLTL, 2-hyperproperties are expressed as formulas with two universal quantifiers and no existential quantifiers. A 2-hyperproperty can equivalently be represented as a set of infinite sequences over the product alphabet Σ², for Σ = 2^V: a pair of traces π = x_0 x_1 x_2 … and π′ = y_0 y_1 y_2 … is represented as the single sequence (x_0, y_0)(x_1, y_1)(x_2, y_2)…. This representation is convenient for the use of automata to recognize 2-hyperproperties.
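For finite sets of trace prefixes, membership in a 2-hyperproperty reduces to a pairwise check, and the zipped representation over the product alphabet is a one-liner. The sketch below is illustrative only; the relation R mirrors the running example on bounded prefixes.

```python
from itertools import product

def satisfies_2hyper(traces, R):
    """A set of traces satisfies the 2-hyperproperty induced by relation R
    iff every ordered pair of its traces is in R."""
    return all(R(p, q) for p, q in product(traces, repeat=2))

def zip_pair(p, q):
    """Represent a pair of traces as a single word over the product alphabet."""
    return list(zip(p, q))

# Illustrative relation on finite prefixes (mirroring the running example):
# traces that differ in `in` at position 0 must differ in `c` at some position.
def R(p, q):
    if ("in" in p[0]) == ("in" in q[0]):
        return True
    return any(("c" in x) != ("c" in y) for x, y in zip(p, q))
```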
Automata. A nondeterministic automaton is a tuple A = (Q, Σ, q_0, δ, F), where Q denotes a finite set of states, Σ is a finite alphabet, q_0 ∈ Q is a designated initial state, F ⊆ Q is the set of accepting states, and δ : Q × Σ → P(Q) is the transition relation that maps a state and a letter to a set of possible successor states. A run of A on a finite word w = w_0 w_1 … w_{n−1} ∈ Σ^* is a sequence of states q_0 q_1 … q_n with q_{i+1} ∈ δ(q_i, w_i) for all i < n; the run is accepting if q_n ∈ F. On infinite words we use the Büchi acceptance condition: an infinite run is accepting if it visits accepting states infinitely often. A nondeterministic automaton accepts a word if some run on the word is accepting; a universal automaton accepts a word if all runs on the word are accepting.

Necessary Information Flow in Distributed Systems
In reactive synthesis it is natural that the synthesized process reacts to different environment outputs. This is also the case for distributed synthesis, where some outputs of the environment are not observable by a local process and the hidden values must be communicated to the process. In the following we show when such information flow is necessary.

Necessary Information Flow
Our analysis focuses on pairs of situations for which the specification dictates a different reaction from a given black-box process p. Such pairs imply the need for information flow that will enable p to distinguish the two situations: if p cannot distinguish the two situations, it will behave in the same manner in both. Consequently, the specification will be violated, no matter how p is implemented, in at least one of the two situations. A process p needs to satisfy a local specification ϕ_p, which relates its outputs O_p to the outputs O_e of the environment. (Recall that O_e may contain inputs to the other black-box process.) We are therefore interested in pairs of traces over O_e for which ϕ_p does not admit a common valuation of O_p. We collect such pairs of traces in a distinguishability relation, denoted by ∆_p. Definition 1 (Distinguishability). Given a local specification ϕ_p for process p, the distinguishability relation ∆_p is the set of pairs of traces over O_e (environment outputs) such that no trace over O_p satisfies ϕ_p in combination with both traces in the pair. Formally: ∆_p = { (π_e, π′_e) ∈ (2^{O_e})^ω × (2^{O_e})^ω | ∀π_p ∈ (2^{O_p})^ω. π_e ⊔ π_p ⊭ ϕ_p or π′_e ⊔ π_p ⊭ ϕ_p }. By definition of ∆_p, process p must distinguish π_e from π′_e, because it cannot respond to both in the same manner. In our running example, ∆_b consists of all pairs of sequences of values of in that differ in the first value of in. Process b must act differently in such situations: if in is initially true then b must eventually set out to true, while if it starts as false, then b must keep out always set to false.
In general, a black-box process p must satisfy its specification ϕ_p despite having only partial access to O_e. The distinguishability relation therefore directly defines an information flow requirement: In order to satisfy ϕ_p, enough information about O_e must be communicated to p via its local inputs I_p to ensure that p can distinguish any pair of traces in ∆_p. We formalize this information flow assumption as a 2-hyperproperty, which states that if the outputs of the environment in the two traces must be distinguished, i.e., the projection on O_e is in ∆_p, then there must be a difference in the local inputs I_p. Definition 2 (Information flow assumption). The information flow assumption ψ_p induced by ∆_p is the 2-hyperproperty defined by the relation R = { (π, π′) | (π↓_{O_e}, π′↓_{O_e}) ∈ ∆_p → π↓_{I_p} ≠ π′↓_{I_p} }. In our running example, the information flow assumption for process b requires that on any two executions that disagree on the initial value of in, the values communicated to b over the channel c must differ at some point. Observe that the information flow assumption ψ_p specifies neither how the information is to be encoded on c nor the point in time when the different communication occurs. However, ψ_p requires that the communication differs eventually if the initial values of in are different. Moreover, notice that both ∆_p and ψ_p are determined by p's specification ϕ_p. The following theorem shows that the information flow assumption ψ_p is a necessary condition. Theorem 1. Every implementation that satisfies the local specification ϕ_p for p also satisfies the information flow assumption ψ_p.
Proof. Assume that there exists an implementation (s_p, s_q) that satisfies ϕ_p but not ψ_p. We show that this leads to a contradiction. Since ψ_p is not satisfied, there exists a pair of traces π, π′ such that (π↓_{O_e}, π′↓_{O_e}) ∈ ∆_p and π↓_{I_p} = π′↓_{I_p}. Let π_e = π↓_{O_e} and π′_e = π′↓_{O_e}. Since the inputs to process p are the same on π and π′, and since the strategies s_p and s_q are deterministic, the sequence of p's outputs is also the same. Let x_0 x_1 x_2 … = π↓_{I_p} = π′↓_{I_p} be the sequence of inputs. The sequence of outputs o = o_0 o_1 o_2 … generated by the implementation is then given by o_k = s_p(x_0 x_1 … x_{k−1}) for all k ∈ N. Given that the implementation satisfies ϕ_p, both π_e ⊔ o and π′_e ⊔ o satisfy ϕ_p. This, however, contradicts the assumption that (π_e, π′_e) ∈ ∆_p.
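Definition 1 can be explored by brute force over bounded prefixes. The sketch below approximates ∆_b for the running example, reading "eventually" as "within the horizon H"; this is only a finite-trace approximation of the infinite-trace definition, and the horizon and names are our own.

```python
from itertools import product

H = 4  # finite horizon (approximation of the infinite-trace semantics)

def sat_bounded(pi_e, pi_p):
    """Bounded reading of phi_b = in <-> EVENTUALLY out: the initial value
    of `in` must equal whether out ever becomes true within the horizon."""
    return pi_e[0] == any(pi_p)

def in_delta(pe1, pe2):
    """(pe1, pe2) is in (bounded) Delta_b iff no single output trace
    satisfies the specification in combination with both env traces."""
    return not any(
        sat_bounded(pe1, po) and sat_bounded(pe2, po)
        for po in product([False, True], repeat=H)
    )

env_traces = list(product([False, True], repeat=H))
delta = {(a, b) for a in env_traces for b in env_traces if in_delta(a, b)}
# `delta` turns out to be exactly the pairs that differ in the initial `in`,
# matching the discussion of Delta_b in the running example.
```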

Time-bounded Information Flow
We now introduce a strengthened version of the information flow assumption. As shown in Theorem 1, the information flow assumption is a necessary condition for the existence of an implementation that satisfies the specification. Often, however, the information flow assumption is not strong enough to allow for the separate synthesis of individual components in a compositional approach.
Consider again process b in our motivating example. The information flow assumption guarantees that any pair of traces that differ in the initial value of the global input in will differ at some point in the value of the channel c. This assumption is not strong enough to allow process b to satisfy the specification that b must eventually set out to true iff the initial value of in is true. Suppose that in is true initially. Then b must at some point set out to true. Process b can only do so when it knows that the initial value of in is true. The information flow assumption is, however, too weak to guarantee that process b will eventually obtain this knowledge. To see this, consider a hypothetical behavior of process a that sets c forever to true if in is true in the first position; if in is initially false, then a keeps c true for n − 1 steps, where n > 0 is some fixed natural number, before it sets c to false at the n-th step. This behavior of process a satisfies the information flow assumption for any number n; however, without knowing n, process b does not know how many steps it should wait for c to become false. If, at any point in time t, the channel c has not yet been set to false, process b can never rule out the possibility that the initial value of in is true; it might simply be the case that t < n and, hence, the time when c will be set to false still lies in the future of t! Hence, process b can never actually set out to true.
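This family of behaviors can be checked mechanically: for every candidate decision time t, an adversarial choice of n > t makes the c-streams for an initially true and an initially false in agree on the first t steps, while still differing eventually. The encoding below is our own rendering of the behavior just described.

```python
def c_stream(in0, n, horizon):
    """Channel values produced by the n-th variant of process a: c stays
    true forever if in0 is true; otherwise c is true for the first n-1
    steps and false from the n-th step on."""
    if in0:
        return [True] * horizon
    return [k < n - 1 for k in range(horizon)]

horizon = 20
for t in range(1, 10):        # any candidate decision time for process b
    n = t + 1                 # adversarial choice of a's parameter
    true_run = c_stream(True, n, horizon)
    false_run = c_stream(False, n, horizon)
    assert true_run[:t] == false_run[:t]   # indistinguishable up to time t
    assert true_run != false_run           # yet the flow assumption holds
```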
We begin by presenting a finer version of the distinguishability relation from Definition 1 that we call time-bounded distinguishability. Recall that by Definition 1, a pair (π_e, π′_e) is in the distinguishability relation ∆_p if every output sequence π_p for p violates p's specification ϕ_p when combined with at least one of the input sequences π_e or π′_e. Equivalently, if ϕ_p is satisfied by π_p combined with π_e, then it is violated when π_p is combined with π′_e. Observe that for p to behave differently in two scenarios, a difference must occur at a finite time t. Clearly, this will only happen if p's input shows a difference in finite time. To capture this, we say that a pair (π_e, π′_e) of environment output sequences is in the time-bounded distinguishability relation if the violation with π′_e is guaranteed to happen in finite time. In order to avoid this violation, process p must act in finite time, before the violation occurs on π′_e. We say that a trace π finitely violates an LTL formula ϕ, denoted by π ⊭_f ϕ, if there exists a finite prefix w of π such that every (infinite) trace extending w violates ϕ.
Definition 3 (Time-bounded distinguishability). Given a local specification ϕ_p for process p, the time-bounded distinguishability relation Λ_p is the set of pairs (π_e, π′_e) ∈ (2^{O_e})^ω × (2^{O_e})^ω of traces of global inputs such that every trace of local outputs π_p ∈ (2^{O_p})^ω either violates the specification ϕ_p when combined with π_e, or finitely violates p's local specification ϕ_p when combined with π′_e: Λ_p = { (π_e, π′_e) | ∀π_p ∈ (2^{O_p})^ω. π_e ⊔ π_p ⊭ ϕ_p or π′_e ⊔ π_p ⊭_f ϕ_p }. Note that, unlike the distinguishability relation ∆_p, the time-bounded distinguishability relation Λ_p is not symmetric: For (π_e, π′_e), the trace π′_e ⊔ π_p has to finitely violate ϕ_p, while the trace π_e ⊔ π_p only needs to violate ϕ_p in the infinite evaluation. As a result, the corresponding time-bounded information flow assumption will also be asymmetric: we require that on input π_e, process p eventually obtains the knowledge that the input is different from π′_e. For input π′_e we do not pose such a requirement. The intuition behind this definition is that on environment output π′_e, process p must definitely produce some output that does not finitely violate ϕ_p. This output can safely be produced without ever knowing that the input is π′_e. However, on input π_e, it becomes necessary for process p to eventually deviate from the output that would work for π′_e. In order to safely do so, p needs to realize after some finite time that the input is not π′_e. In our running example, π_e would be an input in which in is initially true, while π′_e would be one in which it starts out being false. Suppose we have a function t : (2^{O_e})^ω → N that identifies, for each environment output π′_e, a time t(π′_e) by which process p is guaranteed to know whether the actual environment output differs from π′_e. We define the information flow assumption for this particular function t as a 2-hyperproperty.
Since we do not know t in advance, the time-bounded information flow assumption is the (infinite) union of all 2-hyperproperties corresponding to the different possible functions t.
Definition 4 (Time-bounded information flow assumption). Given the time-bounded distinguishability relation Λ_p for process p, the time-bounded information flow assumption χ_p for p is the (infinite) union over the 2-hyperproperties induced by the following relations R_t, for all possible functions t : (2^{O_e})^ω → N: R_t = { (π, π′) | (π↓_{O_e}, π′↓_{O_e}) ∈ Λ_p → π↓_{I_p} and π′↓_{I_p} differ at some position k < t(π′↓_{O_e}) }. Unlike the information flow assumption (cf. Theorem 1), the time-bounded information flow assumption is not in general a necessary assumption. Consider a modification of our motivating example, where there is an additional environment output start, which is only visible to process a, not to process b. The previous specification ϕ_b is modified so that if in is true initially, then out must be true two steps after start becomes true for the first time; if in is false initially, then out must become false after two positions have passed since the first time start has become true. The specification ϕ_a ensures that the channel c is set to true until start becomes true. Clearly, this is realizable: if in is false initially, process a sets c to false once start becomes true; otherwise c stays true forever.
Process b starts by setting out to true. It then waits for c to become false, and, if and when that happens, sets out to false. In this way, process b accomplishes the correct reaction within two steps after start has occurred. However, the function t required by the time-bounded information flow assumption does not exist, because the time of the communication depends on the environment: the prefix needed to distinguish an environment output π e , where in is true initially from an environment output π ′ e , where in is false initially, depends on the time when start becomes true on π ′ e . We now characterize a set of situations in which the time-bounded information flow requirement is still a necessary requirement. For this purpose we consider time-bounded distinguishability relations where the safety violation occurs after a bounded number of steps. We call such time-bounded distinguishability relations uniform; the formal definition follows below.
Theorem 2. Let Λ_p be a uniform time-bounded distinguishability relation derived from process p's local specification ϕ_p. Every implementation that satisfies ϕ_p also satisfies the time-bounded information flow assumption χ_p.
Proof. Let (s_p, s_q) be an implementation that satisfies ϕ_p. We show that the time-bounded information flow assumption χ_p is satisfied by defining a function t : (2^{O_e})^ω → N such that the 2-hyperproperty given by R_t is satisfied. To compute t(π′_e) for some trace of inputs π′_e ∈ (2^{O_e})^ω, we consider the trace of outputs π′_p ∈ (2^{O_p})^ω obtained by applying the implementation to the prefixes of π′_e. Since Λ_p is uniform, there is a natural number n ∈ N such that for all π_e with (π_e, π′_e) ∈ Λ_p, the trace π′_e ⊔ π′_p finitely violates ϕ_p within n steps. We set t(π′_e) to n. To see that χ_p is satisfied, suppose, by way of contradiction, that R_t is violated on some pair (π_e, π′_e) ∈ Λ_p of input traces, i.e., the projection on I_p is the same for π_e and π′_e on the entire prefix of length t(π′_e). But then the output of process p must also be the same along the entire prefix; this, however, means that ϕ_p is violated on π′_e after n = t(π′_e) steps, contradicting our assumption that the implementation satisfies ϕ_p.

Automata for Information Flow Assumptions
We first give an explicit construction of an automaton that recognizes the information flow assumption ψ_p that is induced by ϕ_p. The local specification ϕ_p is given as an LTL formula, which can be translated into an equivalent Büchi automaton A_ϕ over the alphabet 2^{O_e ∪ O_p} [28]. We self-compose A_ϕ into an automaton B over the alphabet 2^{O_e ∪ O_p} × 2^{O_e ∪ O_p} such that B accepts a sequence of pairs iff both the projection on the first components and the projection on the second components are accepted by A_ϕ and, additionally, both components always agree on the values of O_p. We then construct a Büchi automaton C over the alphabet 2^{O_e} × 2^{O_e} that guesses the values of O_p nondeterministically, so that a pair of sequences is accepted by C iff there exists a valuation of O_p such that the extended sequences are accepted by B. The automaton C thus accepts all sequences of global inputs that process p does not need to distinguish, because there is a sequence of outputs that satisfies the specification in both cases. We construct another Büchi automaton D over the alphabet 2^{I_p} × 2^{I_p} that accepts a sequence of pairs of local input values iff they differ at some point. Finally, we construct a Büchi automaton E over the alphabet 2^{O_e ∪ I_p} × 2^{O_e ∪ I_p} that accepts a sequence of pairs iff the sequence of projections on O_e is accepted by C or the sequence of projections on I_p is accepted by D. The automaton E recognizes the information flow assumption ψ_p of process p.
Theorem 3. For a process p with local specification ϕ p , there exists a Büchi automaton with an exponential number of states in the length of ϕ p that recognizes the information flow assumption ψ p induced by ϕ p .
Proof. The automaton E described above recognizes ψ p . We now claim that it has the stated size. The number of states of A ϕ is exponential in the length of ϕ p . By construction, the number of states of B is quadratic in the number of states of A ϕ , and C has the same number of states as B. The automaton D needs only two states. Hence, E has only two more states than B and so its total number of states is exponential in the length of ϕ p , as claimed.
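The key step, self-composition with nondeterministic guessing of the outputs O_p (the automata B and C above), can be sketched on finite words; a Büchi version would additionally track visits to accepting states in the product. The toy automaton, its specification, and all names below are our own illustrative choices.

```python
def valuations(vs):
    """All valuations (frozensets) over a set of variables."""
    result = [frozenset()]
    for v in vs:
        result += [s | {v} for s in result]
    return result

def self_compose_project(delta, q0, F, Op):
    """Build C: accepts a word of pairs (x, y) of env valuations iff SOME
    single output word over Op makes both components accepted by A_phi.
    (Finite-word proxy; Buchi acceptance would be handled analogously.)"""
    outs = valuations(Op)
    def step(states, x, y):
        nxt = set()
        for q1, q2 in states:
            for o in outs:                       # guess the shared outputs
                for q1n in delta.get((q1, x | o), ()):
                    for q2n in delta.get((q2, y | o), ()):
                        nxt.add((q1n, q2n))
        return nxt
    def accepts(word):                           # word: sequence of (x, y)
        states = {(q0, q0)}
        for x, y in word:
            states = step(states, x, y)
        return any(q1 in F and q2 in F for q1, q2 in states)
    return accepts

# Toy A_phi over variables {e, o}: accepts iff o copies e at every step.
delta = {}
for letter in valuations({"e", "o"}):
    if ("e" in letter) == ("o" in letter):
        delta[(0, letter)] = {0}
C = self_compose_project(delta, 0, {0}, {"o"})
```

For this toy specification, C accepts a pair of environment words exactly when they are equal, i.e., exactly the pairs that p need not distinguish.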

Checking Uniformity
We begin with the construction of an automaton A Λp over alphabet 2 Oe × 2 Oe that recognizes the time-bounded distinguishability relation Λ p . Let A ¬ϕp be a deterministic ω-automaton over alphabet 2 Oe∪Op that recognizes all traces that violate the local specification ϕ p . Let B ¬ϕp be a deterministic finite-word automaton over alphabet 2 Oe∪Op that recognizes the bad prefixes of ϕ p . We combine A ¬ϕp and B ¬ϕp into a deterministic ω-automaton C over alphabet 2 Oe × 2 Oe × 2 Op that accepts traces of two inputs π e , π ′ e and an output π p such that π e ⊔ π p violates ϕ p or π ′ e ⊔ π p finitely violates ϕ p . We obtain the universal automaton D Λp with alphabet 2 Oe × 2 Oe as the universal projection of C with respect to the outputs π p .
Theorem 4. For a process p with local specification ϕ p , there exists a universal ω-automaton A Λp over alphabet 2 Oe × 2 Oe that recognizes the time-bounded distinguishability relation Λ p . The number of states of A Λp is doubly-exponential in the length of ϕ p .
Proof. Both A ¬ϕp and B ¬ϕp have doubly-exponentially many states in the length of ϕ p [23]. The size of C is the product of the sizes of A ¬ϕp and B ¬ϕp . Because of the universal projection, D Λp is universal, rather than deterministic, but still of doubly-exponential size.
Next, we check whether the time-bounded distinguishability relation is uniform. We construct an automaton that recognizes all traces of inputs and local outputs where no uniform bound exists. Let A ϕp be a universal ω-automaton over alphabet 2 Oe∪Op that recognizes all traces that satisfy the local specification ϕ p . We combine A ϕp with D Λp to a universal ω-automaton E over alphabet 2 Oe × 2 Oe × 2 Op that accepts traces of two inputs π e , π ′ e and an output π p when (π e , π ′ e ) ∈ Λ p and π e ⊔ π p satisfies ϕ p . From E we construct a universal automaton F over alphabet 2 {f} × 2 Oe × 2 Op that accepts π e and π p if there exists a π ′ e such that the bad prefix is reached on π ′ e after f becomes true for the first time. Finally, we obtain a universal automaton G over alphabet 2 Oe × 2 Op that accepts those π e and π p that are accepted by F for all traces of f that set f to true at least once.
Theorem 5. For a process p with local specification ϕ p , whether the time-bounded distinguishability relation is uniform can be checked in quadruply exponential running time.
Proof. A ϕp is exponential in the length of ϕ p , and D Λp is doubly exponential; hence, E is also doubly exponential. Because of the projection on O e , F is triply exponential. Because F is universal, the universal projection in f does not cause a further increase in the number of states; the size of G is thus triply exponential. Emptiness of a universal automaton can be checked in exponential time, resulting in an overall quadruply exponential running time.

Computing Time-bounded Information Flow Assumptions
Our final goal in this section is to compute time-bounded information flow assumptions. Time boundedness introduces a new difficulty, because an unbounded number of traces is required to satisfy the same bound; hence, the time-bounded information flow assumption is not a k-hyperproperty for any value k ∈ N. In the following, we nonetheless represent the time-bounded information flow property as a 2-hyperproperty, by employing the following trick: We introduce a fresh atomic proposition t, which is to be read by process p as a new input and is to be computed by process p's environment. The first occurrence of t indicates that the time bound has been reached. This extra proposition allows us to express the time-bounded information flow assumption as a 2-hyperproperty: we first require that t occurs on every trace that appears as a left trace in Λ p (condition 1). Furthermore, process p must observe a difference between any pair of traces in Λ p before t occurs on the left trace (condition 2).
We begin with the universal automaton A Λp over alphabet 2 Oe × 2 Oe from Theorem 4, which recognizes the time-bounded distinguishability relation Λ p . We dualize A Λp to obtain a nondeterministic automaton Ā Λp that recognizes all pairs of traces not in Λ p . For condition 1, we construct a nondeterministic automaton H 1 that checks that t occurs on the left trace; for condition 2, we construct a nondeterministic automaton H 2 that ensures that the traces differ in the local inputs before t occurs. Combining Ā Λp with H 1 and H 2 , we obtain a nondeterministic automaton I over the alphabet 2 Oe∪Ip∪Op∪{t} × 2 Oe∪Ip∪Op∪{t} that represents the time-bounded information flow assumption.
Theorem 6. For a process p with local specification ϕ p , there exists a nondeterministic ω-automaton with a doubly-exponential number of states in the length of ϕ p that recognizes the time-bounded information flow assumption χ p induced by ϕ p .
Proof. The automaton I described above recognizes χ p . We now claim that it has the stated size. By Theorem 4, the number of states of A Λp is doubly-exponential in the length of ϕ p . The dual Ā Λp has the same size as A Λp ; finally, H 1 and H 2 each have a constant number of states. Thus, the number of states of I is also doubly-exponential in the length of ϕ p .

Compositional Synthesis
We now use the time-bounded information flow assumptions to split the distributed synthesis problem for an architecture (I p , I q , O p , O q , I e ) into two separate synthesis problems. The local implementations are then composed into a correct global system, whose decomposition yields the solution for each process.

Constructing the Hyper Implementations
We begin with the synthesis of local processes. Let Λ p and Λ q be the time-bounded distinguishability relations for p and q, and let χ p and χ q be the resulting time-bounded information flow assumptions. In the individual synthesis problems, we ensure that process p provides the information needed by process q, i.e., that the implementation of p satisfies χ q , and, similarly, that q provides the information needed by p, i.e., q's implementation satisfies χ p .
We carry out the individual synthesis of a process implementation on trees that branch according to the input of the process (including t p ) and the environment's output. In such a tree, the synthesized process thus has access to full information. We call this tree a hyper implementation, rather than an implementation, because the hyper implementation describes how the process will react to certain information, without specifying how the process will receive information. This detail is left open until we know the other process' hyper implementation: at that point, both hyper implementations can be turned into standard strategies, which are trees that branch according to the process' own inputs.
Definition 6 (Hyper implementation). Let p and q be processes and e be the environment. A 2 Oe∪Ip∪{tp} -branching 2 Op∪{tq} -labeled tree h p is a hyper implementation of p.
Since the hyper implementation has access to the full information, while the time-bounded information flow assumption only guarantees that the relevant information arrives after some bounded time, the strategy has "too much" information. We compensate for this by introducing a locality condition: on two traces (π e , π ′ e ) ∈ Λ p in the distinguishability relation of process p, as long as the input to the process from the external environment is identical, process p's output must be identical until t p happens (which signals that the bound for the transmission of the information has been reached). For traces (π e , π ′ e ) ∉ Λ p outside the distinguishability relation, process p's output must be identical until there is a difference in the input to process p or in the value of t p .
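The locality condition can be sketched as a check on finite trace prefixes. The Python sketch below is our own illustrative interface, not the paper's formalism: `out_p` stands for the process's output function, traces are lists of sets of propositions, and whether a pair is in the distinguishability relation is passed in as a flag.

```python
def respects_locality(out_p, trace1, trace2, in_Lambda, Ip, tp="t_p"):
    """Check the locality condition on finite prefixes (sketch).

    out_p     -- maps a finite input prefix to p's output (hypothetical interface)
    trace1/2  -- lists of sets of propositions (environment outputs plus t_p)
    in_Lambda -- True iff (trace1, trace2) is in the distinguishability relation
    Ip        -- propositions that process p observes directly
    """
    for i in range(min(len(trace1), len(trace2))):
        pre1, pre2 = trace1[:i + 1], trace2[:i + 1]
        if in_Lambda:
            # released by t_p on the left trace or a difference in local inputs
            released = any(tp in s for s in pre1) or \
                       any(a & Ip != b & Ip for a, b in zip(pre1, pre2))
        else:
            # released only by a difference in the local inputs or in t_p
            released = any(a & (Ip | {tp}) != b & (Ip | {tp})
                           for a, b in zip(pre1, pre2))
        if not released and out_p(pre1) != out_p(pre2):
            return False
    return True
```

In the bit-transmission example, an output that copies in directly would violate the condition, while an output that reacts only to the locally visible channel c respects it.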
Definition 7 (Locality condition). Given the time-bounded distinguishability relation Λ p for process p, the locality condition η p for p is the 2-hyperproperty induced by the following relation R:

We use HyperLTL to formulate the locality condition for process b in our running example. Based on the time-bounded distinguishability relation Λ b , which relates every trace with in in the first step to all traces on which ¬in holds there, we can write the locality condition:

The order in the formula is analogous to the order in Definition 7. For all pairs of traces that are in the distinguishability relation, i.e., in is true on π and false on π ′ , the requirement that the outputs be equivalent on both traces can only be released by t on trace π or by a difference in the local inputs (c). Moreover, if the traces are not in the distinguishability relation, i.e., ¬(in π ∧ ¬in π ′ ), then only a difference in t or c can release the requirement that out be equivalent on both traces.

With the locality condition at hand, we define when a hyper implementation is locally correct:

Definition 8 (Local correctness of hyper implementations). Let p and q be processes, let ϕ p be the local specification of p, let η p be its locality condition, and let χ q be the information flow assumption of q. The hyper implementation h p of p is locally correct if it satisfies ϕ p , η p , and χ q .
The specification ϕ p is a trace property, while η p and χ q are hyperproperties. Since all properties that need to be satisfied by the process are guarantees, it is not necessary to assume explicit behavior of process q to realize process p. Local correctness relies on the guarantee that the other process satisfies the current process's own information flow assumption. Note that both the locality condition and the information flow assumption for p build on the time-bounded distinguishability relation of p.

Fig. 2: The composition of the hyper implementations of a in Figure 1c and b in Figure 1d. The states are labeled with the combination of states reached for both processes, with multiple states if they cannot be distinguished.

Composition of Hyper Implementations
The hyper implementations of the two processes are locally correct, and each satisfies the information flow assumption of the other process. However, the hyper implementations have full information about the inputs and depend on the additional variables t p and t q . To construct practically executable local implementations, we first compose the hyper implementations into one strategy.
Definition 9 (Composition of hyper implementations). Let p and q be two processes with hyper implementations given as an infinite 2 Oe∪Ip∪{tp}∪ICp -branching 2 Op∪{tq} -labeled tree h p for process p, and an infinite 2 Oe∪Iq∪{tq}∪ICq -branching 2 Oq∪{tp} -labeled tree h q for process q. Given two hyper implementations h p and h q , we define the composition h = h p ||h q to be a 2 Oe -branching 2 Op∪Oq -labeled tree, where h(v) = (h p (f p (v)) ∪ h q (f q (v))) ∩ (O p ∪ O q ) and f p , f q are defined as follows:

If each hyper implementation satisfies the time-bounded information flow assumption of the other process, then there exists a strategy for each process (given as a tree that branches according to the local inputs of the process), such that the combined behavior of the two strategies corresponds exactly to the composition of the hyper implementations.
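Once the rerouting functions f p and f q are fixed, the composition itself is a single equation. The following Python sketch models the trees as functions from finite traces to label sets and takes f p , f q as given parameters; the toy implementations in the usage below are our own assumptions, not the paper's bit-transmission solution.

```python
def compose(h_p, h_q, f_p, f_q, Op, Oq):
    """Composition h = h_p || h_q of Definition 9 (sketch).

    h_p, h_q -- hyper implementations: finite input traces -> label sets
    f_p, f_q -- rerouting functions mapping a global environment trace to
                the corresponding local input trace (taken as given here)
    """
    def h(v):
        # combine both labels, then keep only the process outputs
        return (h_p(f_p(v)) | h_q(f_q(v))) & (Op | Oq)
    return h
```

The intersection with O p ∪ O q drops the auxiliary variables t p and t q from the composed label, which is exactly why the composition no longer depends on them.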
The composition of the hyper implementations of the bit transmission protocol is shown in Figure 2. The initial state is the combination of both processes' initial states with the corresponding outputs. We change the state after the value of in is received. While process a directly reacts to in, process b cannot observe its value, and the composition can be in either h b 0 or h b 1 . Both states have the same output. In the next step, process a communicates the value of in by setting c to true or false, such that the loop states h a 1 , h b 2 and h a 2 , h b 3 are reached.
The local strategies of the processes are constructed from the composed hyper implementations. As an auxiliary notion we introduce the knowledge set: the set of finite traces in the composition that cannot be distinguished by a process.
Definition 10 (Knowledge set). Let p and q be two processes with composed hyper implementations h = h p ||h q . For a finite trace v ∈ (2 Ip ) * of inputs to p, we define the knowledge set K p (v) to be K p (v) := {w | w is a finite trace in (2 Oe ) * and f p (w) = v}.
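For finite trace sets, the knowledge set can be computed by brute force. The Python sketch below is our own illustration: it enumerates all environment traces of a fixed length and keeps those that f p maps to the given local trace v.

```python
from itertools import chain, combinations, product

def powerset(s):
    """All subsets of s, as tuples."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def knowledge_set(v, f_p, Oe, length):
    """K_p(v) of Definition 10, brute-forced over all environment traces of
    the given length: the traces that f_p maps to the local input trace v."""
    letters = [frozenset(s) for s in powerset(Oe)]
    return {w for w in product(letters, repeat=length) if f_p(w) == v}
```

With f p projecting each letter to the locally visible inputs, all traces that differ only in unobservable propositions land in the same knowledge set.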
Lemma 1. Let p and q be two processes with composed hyper implementations h = h p ||h q that satisfy the time-bounded information flow assumptions. Then, for every finite trace v ∈ (2 Ip ) * and all w, w ′ ∈ K p (v), it holds that h(w) ↓ Op = h(w ′ ) ↓ Op .

Proof. If K p (v) is a singleton or empty, then the lemma is trivially true. Assume, towards a contradiction, that there are w, w ′ ∈ K p (v) with h(w) ↓ Op ≠ h(w ′ ) ↓ Op . Since w and w ′ agree on the local inputs to p, there exists at least one a ∈ O e \I p s.t. w ↓ a ≠ w ′ ↓ a . Then, h p (w) = h p (w ′ ) has to hold following the function f p of Definition 9. Given the locality from Definition 7, this is only possible if t p was observed in the input to h p , which is replaced by the output of h q in Definition 9. Since h q satisfies the time-bounded information flow assumption χ p from Definition 4, h p observes a difference in I p before it reacts to the global inputs. Therefore, h(w) ↓ Op = h(w ′ ) ↓ Op , which contradicts the assumption.
The local strategies are then constructed from the composed hyper implementations as follows:

Definition 11 (Local strategies from hyper implementations). Let p and q be two processes with time-bounded information flow assumptions χ p and χ q , and let h = h p ||h q be the composition of their hyper implementations. For j ∈ {p, q}, the strategy s j , represented as a 2 Ij -branching 2 Oj -labeled tree for process j, is defined as follows: where min(K j (v)) is the smallest trace with respect to an arbitrary fixed order on K j (v).
The base case of the definition inserts a label for unreachable traces in the composed hyper implementation. For example, the local inputs in I p \O e are determined by s q , so not all input words in (2 Ip ) * are possible. Process p's local strategy s p can discard these input words. The second case of the definition picks the smallest trace in the knowledge set and computes the outputs from h that are local to the process. Intuitively, the outputs of h have to be the same for every trace that a process considers possible in the composed hyper implementation. We therefore pick one of them, compute the output of the composed hyper implementation, and restrict it to the local outputs of the process. The following theorem states the correctness of the construction in Definition 11.

Theorem 7. Let p and q be two processes with time-bounded information flow assumptions χ p and χ q , let h = h p ||h q be the composition of the hyper implementations, and let s p and s q be the local strategies. Then, for all v ∈ (2 Oe ) * it holds that h(v) = s p (g p (v)) ∪ s q (g q (v)), where g p , g q are defined as follows:

Proof. By induction over v ∈ (2 Oe ) * . Base case: For v = ǫ, we have s p (g p (ǫ)) ∪ s q (g q (ǫ)) = h(ǫ). Induction step: We show the claim for v · x ∈ (2 Oe ) * , with x ∈ 2 Oe , assuming it holds for v. Inserting g p from Theorem 7, we obtain g p (v · x) = g p (v) · ((x ∪ s q (g q (v))) ∩ I p ). Since s q (g q (v)) and g p (v) are correct by the induction hypothesis, we show that the input trace returned by g p and given to s p is correct: the input is local to p because intersecting x and s q (g q (v)) with I p removes the unobservable inputs, and all outputs of the previous step from q are added to the current input. It remains to show that the combined outputs of the local strategies are equal to the output of h: h(v · x) = s p (g p (v · x)) ∪ s q (g q (v · x)). Let v ′ · x ′ = g p (v · x). By Definition 11 and Definition 10, s p (v ′ · x ′ ) is computed from the knowledge set K p (v ′ · x ′ ); since the claim holds for v ′ , it remains to show that appending x ′ to v ′ preserves this correctness.
Following Lemma 1, all elements in K p (v ′ · x ′ ), and therefore all corresponding paths in h, have the same label restricted to O p ; picking any of them thus yields the same output s p (g p (v · x)). Using the same argument for s q , by interchanging p and q in every index, yields the correctness of the theorem, i.e., for all v ∈ (2 Oe ) * it holds that h(v) = s p (g p (v)) ∪ s q (g q (v)).
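The extraction of Definition 11 can be sketched in a few lines of Python. The fixed total order on traces (here: lexicographic on sorted letters) and the toy knowledge function in the usage below are our own assumptions.

```python
def local_strategy(h, knowledge, Oj):
    """s_j from Definition 11 (sketch): if the knowledge set is empty, the
    local input trace is unreachable and gets an arbitrary label; otherwise
    pick min(K_j(v)) w.r.t. a fixed order and restrict h's output to O_j."""
    def s(v):
        K = knowledge(v)
        if not K:
            return frozenset()  # unreachable local trace: arbitrary label
        w = min(K, key=lambda t: tuple(tuple(sorted(l)) for l in t))
        return h(w) & Oj
    return s
```

By Lemma 1, the composed output restricted to O j agrees on all traces in the knowledge set, so the choice of representative does not matter.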
Combining all definitions and theorems of the previous sections, we conclude with the following corollary.

Corollary 1. Let (I p , I q , O p , O q , I e ) be an architecture and ϕ = ϕ p ∧ ϕ q be a specification. If the hyper implementations h p and h q are locally correct, then the implementation (s p , s q ) satisfies ϕ.

A More Practical Approach
A major disadvantage of the synthesis approach of the preceding sections is that the hyper implementations are based on the full set of environment outputs; as a result, hyper implementations branch according to inputs that are not actually available; this, in turn, necessitates the introduction of the locality condition.
In this section, we develop a more practical approach, where the branching is limited to the information that is actually available to the process: this includes any environment output directly visible to the process and, additionally, the information the process is guaranteed to receive according to the information flow assumption. As a result, the synthesis of the process is sound without need for a locality condition. We develop this approach under two assumptions: First, we assume that the time-bounded information flow assumption only depends on environment outputs the sending process can actually see; second, we assume that the time-bounded information flow assumption can be decomposed into a finite set of classes in the following sense: For a trace π of environment outputs, the information class [π] p describes that, on the trace π, the process p eventually needs to become aware that the current trace is in the set [π] p . The information class is obtained by collecting all traces that are not related to π in the time-bounded distinguishability relation.
Definition 12 (Information classes). Given a time-bounded distinguishability relation Λ p for process p, the information class [π] p of a trace π over O e is the following set of traces:

The next definition relativizes the specification of the processes for a particular information class, reflecting the fact that the process does not know the actual environment output, but only its information class; hence, the process output needs to be correct for all environment outputs in the information class.
Definition 13 (Relativized specification). For a process p with specification ϕ p and an information class c, the relativized specification ϕ p,c is the following trace property over (I p ∩ O e ) ∪ O p :

The component specification, which is the basis for the synthesis of the process, must take into account that the process does not know the information class in advance; the behavior of the other process will only eventually reveal the information class. Let IC be the set of information classes for process p. Assume that this set is finite. We now replace the inputs of the process that come from the other process with the elements of IC as new input channels. In the hyper implementation, receiving such an input reveals the information class to the process. In the actual implementation, the information class will be revealed by the actual outputs of the other process that are observable for p. The component specification requires that the process satisfies the relativized specification under the assumption that the information class is eventually received. We encode this assumption as a trace condition ψ, which requires that exactly one of the elements of IC eventually occurs.

Definition 14 (Component specification). For process p with specification ϕ p , the component specification ϕ p over (I p ∩ O e ) ∪ IC ∪ O p is defined as follows, where ψ is the following trace property over (I p ∩ O e ) ∪ IC: {π | π ∈ [π ′ ] for the information class [π ′ ] ∈ IC that occurs on π, and exactly one element of IC occurs on π}

Fig. 3: The architecture used for our experiments in (a), where the number of outputs, inputs, and communication channels can vary. Figure 3b shows the implementation of process b for its bit transmission component specification.
The component specification allows us to replace the locality condition (Def. 7), which is a hyperproperty, with a trace property. Note, however, that the process additionally needs to satisfy the information flow assumption of the other process, which may in general depend on the full set O e of environment outputs. This would require us to synthesize the process on the full set O e , and to re-introduce the locality condition. In practice, however, the information flow assumption of one process often only depends on the information of the other process. In this case, it suffices to synthesize each process based only on the locally visible environment outputs.

Figure 3b shows the implementation of b for its component specification ϕ b . In contrast to its hyper implementation (cf. Figure 1d), it does not branch according to in and t p , but only according to the variables in IC. The specification is encoded as the following LTL formula:

The left-hand side of the implication represents the assumption ψ; the right-hand side specifies the guarantee for each information class. The composition and decomposition can be performed analogously to the hyper implementations, where we map the value of ic to the values of the communication variables. We construct the automata for the component specification as follows.
1. By complementing the automaton for the time-bounded distinguishability relation, we obtain an automaton A IC that associates each trace over O e with its information class: i.e., the pair (v, w) of traces over O e is accepted by A IC iff (v, w) is not in the time-bounded distinguishability relation.
2. We obtain the information classes in the following iterative process (under the assumption that the number of information classes is finite):
   (a) We identify some trace v such that there is a pair (v, w) in the language of A IC .
   (b) For each such trace v, we compute an automaton A [v] for the information class [v], i.e., an automaton that accepts all traces w such that (v, w) is accepted by A IC .
   (c) We eliminate all pairs (v, w) with v ∈ L(A [v] ) from A IC .
   (d) We repeat until the language of A IC is empty.
3. We build an automaton A ϕp,c for the relativized specification. The automaton uses universal branching to guess the trace from the information class and applies ϕ p to each guess.
4. Using the automata A [v] for the information classes, we build an automaton A ψ for the condition ψ from Definition 14.
5. Using A ψ and A ϕp,c , we build an automaton A ϕp for the component specification.
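Step 2 can be illustrated with finite sets of traces standing in for the automata; the representation below is our own. Each class [v] collects the traces not related to v by the distinguishability relation, and the iteration continues until every trace is covered.

```python
def information_classes(traces, Lambda):
    """Iterative computation of information classes (sketch of step 2),
    with finite sets of traces standing in for the automata.

    traces -- finite set of (abstract) traces over the environment outputs
    Lambda -- distinguishability relation as a set of ordered pairs
    """
    remaining = set(traces)
    classes = []
    while remaining:
        v = remaining.pop()  # pick some trace still uncovered (step 2a)
        # [v]: all traces not related to v in either direction (step 2b)
        cls = frozenset(w for w in traces
                        if (v, w) not in Lambda and (w, v) not in Lambda)
        classes.append(cls)
        remaining -= cls     # eliminate the covered traces (steps 2c/2d)
    return classes
```

The loop terminates because v itself always belongs to [v], so every iteration shrinks the remaining set; finiteness of the class set is an assumption of this section.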

Experiments
The focus of our experiments is on the performance of the compositional synthesis approach compared to non-compositional synthesis methods for distributed systems. While the time-bounded information flow assumptions and the component specification can be computed automatically by automata constructions, we have, for the purpose of these experiments, built them manually and encoded them as formulas in HyperLTL or LTL, which were then passed to the BoSy/BoSyHyper [13] synthesis tool. Our experiments are based on the following benchmarks:
- AC. Atomic commit. The atomic commitment protocol specifies that the output of a local process is set to true iff the observable input and the unobservable inputs are true as well. We only consider one round of communication; the initial input determines all values. The parameter indicates how many input variables each process receives; Par. = 1 corresponds to the running example.
- EC. Eventual commit. The atomic commit benchmark extended to eventual inputs: if all inputs (independently of each other) will eventually be true, then there needs to be information flow.
- SA. Send all. Every input of the sender is relevant for the receiver, so it will eventually be sent if it is set to true. The parameter represents the number of input values and therefore the number of information classes.

Table 1 shows the performance of the compositional synthesis approach. The column architecture (Arch.) indicates for each benchmark whether the information flow is directional (dir.) or bidirectional (bidir.). Column (Inflow send) indicates the running time for the sending process; where applicable, column (Inflow rec.) indicates the running time for the synthesis of the process that only receives information.
We compare the compositional approach to BoSyHyper, based on a standard encoding of distributed synthesis in HyperLTL (Inc. BoSy), and a specialized tool for distributed synthesis [2] (Distr. BoSy). All experiments were performed on a MacBook Pro with a 2.8 GHz Intel quad-core processor and 16 GB of RAM. The timeout was 30 minutes.
Information flow guided synthesis outperforms the standard approaches, especially for more complex components. For example, in the atomic commitment benchmark, scaling in the number of inputs does not impact the synthesis of the local processes, while Distr. BoSy eventually times out, and the running time of Inc. BoSy increases faster than that of the information flow synthesis. For all approaches, the Send All benchmark is the hardest one to solve. Here, each input that will eventually be set needs to be eventually sent, which leads to nontrivial communication over the shared variables and an increased state space to memorize the individual inputs. Nevertheless, the information flow guided synthesis outperforms the other approaches; it times out at parameter 3 because BoSyHyper cannot cope with the number of states needed. Synthesizing a receiver that does not have to satisfy an information flow assumption takes a negligible amount of time in every benchmark run. Since these processes are synthesized with local LTL specifications, scaling in the number of local inputs or in the information that will eventually be received is easily possible. Notably, these receivers are compatible with any implementation of the sender, whereas the solutions of the other approaches are only compatible within the same synthesis run.

Related Work
Compositional synthesis is often studied in the setting of complete information, where all processes have access to all environment outputs [11,15,19,22]. In the following, we focus on compositional approaches for the synthesis of distributed systems, where the processes have incomplete information about the environment outputs. Compositionality has been used to improve distributed synthesis in various domains, including reactive controllers [1,18]. Closest to our approach is assume-guarantee synthesis [3,4], which relies on behavioral guarantees of each process and assumptions about the behavior of the other processes. Recently, an extension of assume-guarantee synthesis for distributed systems was proposed [24], where the assumptions are iteratively refined. Using a weaker winning condition for synthesis, remorse-free dominance [8] avoids the explicit construction of assumptions and guarantees, resulting in implicit assumptions. A recent approach [16] uses behavioral guarantees in the form of certificates to guide the synthesis process. Certificates specify partial behavior of each component and are iteratively synthesized. The fundamental difference between all these approaches and this work is that their assumptions are behavioral. To the best of our knowledge, this is the first synthesis approach based on information-flow assumptions. While there is a rich body of work on the verification of information-flow properties (cf. [9,17,29]), and the synthesis from information-flow properties and other hyperproperties has also been studied before (cf. [13]), the idea of utilizing hyperproperties as assumptions for compositional synthesis is new.

Conclusion
The approach of the paper provides the foundation for a new class of distributed synthesis algorithms, where the assumptions refer to the flow of information and are represented as hyperproperties. In many situations, necessary information flow assumptions exist even if there are no necessary behavioral assumptions. There are at least two major directions for future work. The first direction concerns the insight that compositional synthesis profits from the generality of hyperproperties; at the same time, synthesis from hyperproperties is much more challenging than synthesis from trace properties. To address this issue, we have introduced the more practical method in Section 7, which replaces locality, a hyperproperty, with the component specification, a trace property. However, this method is limited to information flow assumptions that refer to a finite amount of information. It is very common that the required amount of information is infinite in the sense that the same type of information must be transmitted again and again. We conjecture that our method can be extended to such situations. A second major direction is the extension to distributed systems with more than two processes. The two-process case has the advantage that the assumptions of one process must be guaranteed by the other. With more than two processes, the localization of the assumptions becomes more difficult or even impossible, if multiple processes have (partial) access to the required information.