Multi-Scale Verification of Distributed Synchronisation

Algorithms for the synchronisation of clocks across networks are both common and important within distributed systems. We here address not only the formal modelling of these algorithms, but also the formal verification of their behaviour. Of particular importance is the strong link between the very different levels of abstraction at which the algorithms may be verified. Our contribution is primarily the formalisation of this connection between individual models and population-based models, and the subsequent verification that is then possible. While the technique is applicable across a range of synchronisation algorithms, we particularly focus on the synchronisation of (biologically-inspired) pulse-coupled oscillators, a widely used approach in practical distributed systems. For this application domain, different levels of abstraction are crucial: models based on the behaviour of an individual process are able to capture the details of distinguished nodes in possibly heterogenous networks, where each node may exhibit different behaviour. On the other hand, collective models assume homogeneous sets of processes, and allow the behaviour of the network to be analysed at the global level. System-wide parameters may be easily adjusted, for example environmental factors inhibiting the reliability of the shared communication medium. This work provides a formal bridge across the abstraction gap separating the individual models and the population-based models for this important class of synchronisation algorithms.

to employ methods to use the shared communication medium without too many conflicts, e.g., in the form of collisions. Several protocols to organise shared medium access have been developed and analysed [1,35]. These protocols typically identify a common time frame and divide this frame into slots associated to each node. Thus every node has an allocated time slot that it may use to communicate its messages onto the shared medium.
Such an approach introduces the need for a common clock between the nodes, i.e., they need to synchronise. A valuable approach to achieve synchrony of nodes is the implementation of biologically-inspired pulse-coupled oscillators (PCOs) [25]. A network of PCOs synchronises in the following way: all oscillators have a similar clock cycle at the end of which they fire. That is, they transmit a broadcast message which is received by all oscillators in their communication range. These oscillators then adjust their own position within their clock cycle according to a phase response function. Depending on the concrete implementation, they may move their current position within the clock cycle closer to its end, or closer to its start.
Most analyses of the synchronisation behaviour of PCOs are concerned with continouous clock cycles, i.e., where clocks take real values from the interval [0, 1]. However, the smaller devices get, the more important it is to save memory and computing time for such a low-level functionality. Even a floating point number may need too much memory, compared to an implementation with, for example, a four-bit vector. Hence, in previous work, we chose to analyse the behaviour of discrete time PCOs [16].
In contrast to continuous time PCOs, networks of discrete time PCOs are not always guaranteed to synchronise. Instead, whether they synchronise or not depends on the type of coupling between the oscillators and their common phaseresponse function. We analysed the behaviour of such networks for different parameters via model-checking, to check both qualitatively for which parameters the networks synchronise, as well as quantitatively for how long they need to achieve a synchronised state and how much energy is used to achieve this [17]. In the context of large numbers of single oscillators, for example in the context of wireless sensor networks, the well-known state-space explosion problem of the model-checking approach is extremely important [9]. We formalised a network of oscillators as population models [13] which exploit the behavioural homogeneity of the nodes to encode the global state efficiently. This allows the network size to be increased above what would be feasible when distinguishing each node. But the construction of a population model from a given oscillator specification is not straightforward, and in particular, it is not obvious whether the constructed population model correctly reflects the behaviour of the oscillators. This results in an 'abstraction gap': after abstracting into populations, how can we be sure that the abstraction process was correct and that the results of verification of population models actually hold for the concrete models on which they are based?
In this paper, we remedy this lack of certainty, by proving the correspondence of our population model with an explicit formalisation of the oscillators. To that end, we present the concrete oscillator model as well as its formalisation as a discrete-time Markov chain. Subsequently we describe the corresponding population model, and show how we can, in addition to the abstraction created by the populations, reduce the state space even further to facilitate the analysis. Finally, we prove that the behaviour of a network of concrete oscillators can be simulated by the population model. We cannot prove a one-to-one correspondence, since the concrete model implicitly includes the possibility of identifying individual oscillators, which is exactly what the population model abstracts from. However, by providing a formal notion of abstraction, we prove that population models are a truthful abstraction of concrete models.
The paper is structured as follows. In Sect. 2, we review a selection of related work, both for models of pulse-coupled oscillators, as well as approaches for their verification. After an introduction of preliminary notions in Sect. 3, we present the concrete model of single oscillators, both as an algorithm and as a discrete-time Markov chain derived from this algorithm, in Sect. 4. The abstract model in terms of population models and proofs about their properties are contained in Sect. 5. In Sect. 6, we prove the correspondence between these two types of models, and conclude in Sect. 7.

Related Work
The canonical model of pulse-coupled oscillators, and their synchronisation, was formulated by Mirollo and Strogatz [25], and based on Peskin's model of a cardiac pacemaker [29]. Here the progression of an oscillator through its oscillation cycle is given by a real value in the interval [0, 1]. Mirollo and Strogatz proved that with a convex phase response function, a network of mutually coupled oscillators always converges, i.e., their position within the oscillation cycle eventually coincides. Such a model has been shown to be applicable to the clock synchronisation of wireless sensor nodes [31] and swarms of robots [27].
Synchronisation algorithms based on pulse-coupled oscillators are often benefitial in unreliable, decentralised networks, where other synchronisation algorithms are not appropriate. For example, the Flooding Time Synchronisation Protocol (FTSP) [23] requires the use of an arbitrary root node. In situations where the root becomes unavailable due to communication failure or power outage, FTSP will have to assign another root node. When implemented on unreliable, decentralised networks, FTSP may spend considerable resources on repeatedly assigning root nodes, which may slow down or prevent synchronisation [8]. Other algorithms such as the Berkeley algorithm [18] and Cristian's algorithm [11] require the use of centralised time servers, which is problematic for unreliable, decentralised networks.
Several decentralised network algorithms for synchronisation are based on pulse-coupled oscillators [31,34]. For example, the Gradient Time Synchronisation Protocol (GTSP) by Sommer and Wattenhofer [30] achieves synchronisation by having nodes send their current clock value to their neighbours. Each node then calculates the average of the clock values received and its own clock value. This process is then repeated to maintain synchronisation. Another approach to synchronisation, the Pulse-Coupled Oscillator Protocol [26], makes use of refractory periods after sending messages containing time information. During the refractory period, no more messages are sent, which reduces network bandwidth and energy usage. A similar approach is used in the FiGo protocol [8], which combines biologically inspired synchronisation with information distribution via gossiping. All of these approaches use different phase response functions.
In general, synchronisation algorithms based on PCOs are more robust for unreliable networks, as they do not require centralised nodes and can work with only partial network connectivity [8]. They are particularly useful for batterypowered nodes in wireless networks, as the node can be placed in a low-power node during the refractory period, thus reducing energy usage. (The clock keeps ticking even in low-power mode, thanks to the design of microcontrollers such as the 'Atmel ATmega128L' [4].) Synchronisation of clocks for networks of nodes has been investigated from different perspectives. Heidarian et al. [20] analysed the behaviour of a synchronisation protocol based on time allocation slots for up to four nodes and different topologies, from fully connected networks to line topologies. They modelled the protocol as timed automata [2], and used the model-checker UPPAAL [7] to examine its worst-case behaviour. Their model is based on continuous time, and in particular, they did not model pulse-coupled oscillators.
Bartocci et al. [5] described pulse-coupled oscillators as extended timed automata with suitable semantics to model their peculiarities. They defined a dedicated logic to analyse the behaviour of a network of such automata along traces, and used a pacemaker as a case study to verify the eventual synchronisation and the time needed to achieve this.
Our models and methods are slightly different to all of these approaches. This is, of course, evident for all the mentioned work that is not concerned with pulsecoupled oscillators. However, we also define the oscillation cycle to consist of discrete steps. To the best of our knowledge, with the exception the paper by Webster et al. [33] and our previous work [16,17], there is no other work concerned with PCOs with discrete oscillation cycles. Furthermore, all of these approaches distinguish between single oscillators in the network, while the properties of interest relate to global behaviour. This discrepancy between local modelling and global analysis restricts the size of networks that can be analysed, due to the state-space explosion. To extend the size of analysable networks, we employ population models, a counting-abstraction of such networks [12]. Instead of identifying each oscillator on its own, we record how many oscillators are in each step of the oscillation cycle. This reduces the state-space quite tremendously by exploiting the symmetries in the model [13], and we are hence able to extend the size of networks.
The notion of population models should not be confused with population protocols [3], a formalism to express distributed algorithms. In contrast to our setting, communication in population protocols is always between two agents, where one agent initiates the communication and the other responds. Furthermore, even though the agents cannot identify the other agents in the network, within the global model each agent is uniquely associated with a state. In our model, we cannot distinguish between two different agents sharing the same state, even at the global level. Finally, our oscillators may change their state without interacting with other oscillators, while the agents in a population protocol must communicate with another agent to change their internal state.
We will present a relation between the concrete models, where each oscillator can be identified, and corresponding population models, and show that these two models are in a simulation relation [24]. More precisely, the concrete model weakly simulates its abstraction, since the oscillators have to take transitions independently, while in the population model, all oscillators evolve in a single step.
Similarly to typical definitions of counter abstractions [14,6], we use counters to model concurrent entities that are indistinguishable for our purposes. For example, to analyse the probability of eventually reaching a synchronised state, we are not interested in an order of oscillators, which would be artificial anyway. However, in contrast to these approaches, we do not include means to introduce new entities into a model. That is, the values within our population models are naturally bounded by the number of oscillators within the network.

Preliminaries
In this section we define discrete-time Markov chains (DTMCs), stochastic processes with discrete state space and discrete time, and introduce Probabilistic Computation Tree Logic (PCTL), a logic that can be used to reason about probabilistic reachability and rewards in these processes.
Throughout this paper, we use the notation f ⊕ [x → y], where f is a function, to express updating f at x by y. That is, the function that coincides with f , except for x, where it takes the value y.

Discrete-Time Markov Chains
DTMCs can be used to model systems where the evolution of the system at any moment in time can be represented by a discrete probabilistic choice over several outcomes.
Definition 1 A discrete-time Markov chain D is a tuple (Q, σ 0 , P, L) where Q is a finite set of states. σ 0 is the initial state, and L : Q → P(V) is a labelling function that assigns properties of interest from a set of labels V to states. P : Q ×Q → [0, 1] is the transition probability matrix subject to σ ′ ∈Q P(σ, σ ′ ) = 1 for all σ ∈ Q, where P(σ, σ ′ ) gives the probability of transitioning from σ to σ ′ . We say that there is a transition between two states σ, σ ′ ∈ Q if P(σ, σ ′ ) > 0.
Intuitively, a DTMC is a state transition system where transitions between states are labelled with probabilities greater than 0 and where states are labelled with properties of interest. An execution path ω of a DTMC D = (Q, σ 0 , P, L) is a nonempty finite, or infinite, sequence σ 0 σ 1 σ 2 · · · where σ i ∈ Q and P(σ i , σ i+1 ) > 0 for i 0. We denote the set of all paths starting in state σ by Paths D (σ), and the set of all finite paths starting in σ by Paths D f (σ). For paths where the first state along that path is the initial state σ 0 we will simply use Paths D and Paths D f , and we will simply use Paths and Paths f if D is clear from the context. For a finite path ω f ∈ Paths f the cylinder set of ω f is the set of all infinite paths in Paths that share prefix ω f . The probability of taking a finite path σ 0 σ 1 · · · σn ∈ Paths f is given by n i=1 P(σ i−1 , σ i ). This measure over finite paths can be extended to a probability measure Pr over the set of infinite paths Paths , where the smallest σ-algebra over Paths is the smallest set containing all cylinder sets for paths in Paths f . For a detailed description of the construction of the probability measure we refer the reader to [21].

Probabilistic Computation Tree Logic
Probabilistic Computation Tree Logic [19] (PCTL) is a probabilistic extension of the temporal logic CTL. Properties for DTMCs can be formulated in PCTL and then checked against the DTMCs using model checking.
Formulas denoted by Φ are state formulas and formulas denoted by Ψ are path formulas. A PCTL formula is always a state formula, and a path formula can only occur inside the P operator. We now give the semantics of PCTL over a DTMC. Definition 3 Given a DTMC D = (Q, σ 0 , P, L), we inductively define the satisfaction relation |= for any state σ ∈ Q as follows: where v ∈ V, and for any path ω = σ 0 σ 1 σ 2 · · · of D as follows: Disjunction, true, false, and implication are derived as usual, and we define eventuality as F k Φ ≡ true U k Φ. We simply use F Φ and Φ U Φ ′ when k = ∞.

Concrete Model of a Network of Pulse-Coupled Oscillators
In this section we give a brief introduction to the formal model of a single pulsecoupled oscillator, as originally presented in previous work [16]. Subsequently, we encode fully-coupled networks of such oscillators as discrete time Markov chains.

Pulse-Coupled Oscillator Model
We consider a fully-coupled network of pulse-coupled oscillators with identical dynamics over discrete time. The phase of an oscillator u at time t is denoted by φu(t). The phase of an oscillator progresses through a sequence of discrete integer values bounded by some T 1. The phase progression over time of a single uncoupled oscillator is determined by the successor function, where the phase increases over time until it equals T , at which point the oscillator will fire in the next moment in time and the phase will reset to one. The phase progression of an uncoupled oscillator is therefore cyclic with period T , and we refer to one cycle as an oscillation cycle. When an oscillator fires, it may happen that its firing is not perceived by any of the other oscillators coupled to it. We call this a broadcast failure and denote its probability by µ ∈ [0, 1]. Note that µ is a global parameter, hence the chance of broadcast failure is identical for all oscillators. When an oscillator fires, and a broadcast failure does not occur, it perturbs the phase of all oscillators to which it is coupled; we use αu(t) to denote the number of all other oscillators that are coupled to u and will fire at time t.
Definition 4 The phase response function is a positive increasing function ∆ : {1, . . . , T } × N × R + → N that maps the phase of an oscillator u, the number of other oscillators perceived to be firing by u, and a real value defining the strength of the coupling between oscillators, to an integer value corresponding to the perturbation to phase induced by the firing of oscillators where broadcast failures did not occur. We require ∆(Φ, 0, ǫ) = 0 for all possible phase response functions, that is, oscillators are only perturbed if they perceive at least one firing oscillator.
We can introduce a refractory period into the oscillation cycle of each oscillator. A refractory period is an interval of discrete values [1, R] ⊆ [1, T ] where R T is the size of the refractory period, such that if φu(t) is inside the interval, for some oscillator u at time t, then u cannot be perturbed by other oscillators to which it is coupled. If R = 0 then we set [1, R] = ∅, and there is no refractory period at all.
and takes as parameters δ, the degree of perturbance to the phase of an oscillator, and Φ, the phase, and returns Φ if it is in the refractory period, or Φ + δ otherwise.
The phase evolution of an oscillator u over time is then defined as follows, where the update function and firing predicate, respectively denote the updated phase of oscillator u at time t in the next moment in time, and the firing of oscillator u at time t, update u (t) = 1 + ref(φu(t), ∆(φu(t), αu(t), ǫ)),

Modelling the Network Using a DTMC
We model the whole network of oscillators as a single DTMC D = (Q, s 0 , P, L), where each state s ∈ Q denotes a global state of the network. More precisely, the labelling function uniquely maps each state s to an combined encoding of the individual state of each oscillator. For simplicity, we identify the label of a state with the state itself, and hence we omit L from the DTMC, but describe each member of Q via its internal state.
We model each transition of an oscillator as a single transition within the DTMC. However, since the oscillators may influence each other within a single time step (that is, when they are firing), we cannot simply allow for arbitrary sequences of transitions. For instance, to model that all the oscillators progress on a similar time-scale, we need to prevent a single oscillator from taking a transition and thus progressing its phase without giving the other oscillators a chance to do the same. We achieve this by the following means: we divide the internal computation of each oscillator into two modes: start and update, and we add a counter to the model, containing the number of oscillators that fire.
The counter also possesses both modes, and resets at the start of each "round" of computation. First, in the start mode, each oscillator checks whether it would fire, according to its phase response function and the current number of oscillators that already fired, as given by the counter. If it does, it increases the counter and updates its mode to update, otherwise it just updates its mode. If all oscillators are in the update mode, they compute their new phases in a single step, according to the phase response function and the current state of the environment counter. Furthermore, we impose an order on the evaluation on the oscillators in the start mode if at least one oscillator fires, starting from the highest phase to the lowest. This ensures that firing oscillators are perceived by the other nodes, and thus may lead to the firing of the latter. This way of modelling the nodes implies the assumption that the time window during which each oscillator listens on the shared medium is long enough to perceive the firing of any other oscillator.
The general idea of the progress of the network of oscillators is visualised in Fig. 1. In the figure, each rounded rectangle shows a state of a network of four oscillators. The circles represents the nodes, where we inscribe its current phase and an abbreviation of its mode. A node that is about to fire is indicated by a starred circle, while a shaded circle indicates a node that is within the refractory period. The rectangle denotes the environment counter, with its corresponding value and mode. The phase response function is arbitrarily chosen, and of minor importance for the example.
In the first state, all outgoing transitions only check whether to increase the counter. Since no oscillator is in the firing phase, all oscillators just update their mode (observe that the single arrow actually denotes four transitions). In the next step, all oscillators increase their phase by one, and reset their mode to start. In the next four transitions, oscillator 2 fires and increases the counter, which in turn is sufficient for oscillator 3 to fire as well. Hence they both increased the counter by one, while oscillators 1 and 4 did not. During the last transition of the example, oscillator 2 and 3 reset their phase to one, while oscillator 1 is perturbed and increases its phase by two steps at once. Oscillator 4 is within its refractory period, which means that it is not perturbed, and simply increments its phase. In addition to these transitions, we also need some bookkeeping transitions, to ensure that the counter is reset before the oscillators check their phase response. Furthermore, observe that in the example, it is crucial that oscillator 3 checks its response after oscillator 2 increased the counter, since otherwise 3 would not have been perturbed to fire.
Formally, we conflate the states of the oscillators and the environment into a single state of the DTMC. Each oscillator can be described by a tuple consisting of the current phase Φ of the oscillator and the mode θ within this phase. The phase ranges from 1 to T , while the mode takes values from {start , update}. Furthermore, we use a single counter to keep track of the number of oscillators that fired successfully within a single phase computation.
For a fixed sequence of N oscillators, a state of the concrete model consists of a function ν that associates a phase and mode with each oscillator, and the state of the environment η that counts the number of oscillators that fired, A state is therefore a tuple s = (η, ν), where η is the state of the environment, and ν is the state of the network. We denote the set of all concrete system states by Qc. For simplicity, we use the notation p φ (p θ , respectively) for the corresponding projection function of the network states, i.e., if ν(u) = (Φu, θu), then p φ (ν(u)) = Φu and p θ (ν(u)) = θu. Similarly, for an environment state η = (θ, c), we will refer to θ by p θ (η) and to c by pc(η). We use the notation init Φ (s) = {u | p θ (ν(u)) = start ∧ p φ (ν(u)) = Φ} for the set of all oscillators sharing phase Φ and mode start in the state s = (η, ν). Furthermore, we simply use the notation init (s) = {u | p θ (ν(u)) = start }.
We now define the transition probabilities between states. To do this we first distinguish the following cases: 1. the environment resets its counter; 2. no oscillator has a clock value of T ; 3. an oscillator is in the mode start , has a clock value lower than T , is perturbed, but not enough to fire; 4. an oscillator is in the mode start , has a clock value lower than T and is perturbed enough to fire; 5. an oscillator is in the mode start , has a clock value of T , and broadcasts its pulse; 6. an oscillator is in the mode start , has a clock value of T , and fails to broadcast its pulse; 7. all oscillators are in the mode update , update their clock and reset their state to start .
We will impose an order on certain transitions for two reasons. Firstly, we will restrict transitions that are only used for bookkeeping purposes. For example, we will require that the reset transition of the environment is taken before any of the transitions for the oscillators within a phase are activated. In particular, this means that each computation starts with a transition of the type 1. Secondly, we need to ensure that, if at least one oscillator fires, the phase response of all oscillators is evaluated starting with oscillators in the highest phase, down to the lowest phase, as described above. The cases stated above are reflected in the following definitions for the transition probability between two states s = (η, ν) and s ′ = (η ′ , ν ′ ). Case 1, where the environment resetting its counter is treated as follows. In the precondition, we require that the mode of the counter is start, and the state of the oscillators does not change from s to s ′ . Furthermore, the mode of the counter 3  changes to update in s ′ , and its value is set to 0. Since this transition is mandatory at the beginning of each round, its probability is 1.
Now we turn to the cases 5 and 6 where some oscillator is at the end of its cycle. The preconditions of both cases are similar: the counter is required to be in the update mode, and there is an oscillator w, whose phase is T and mode is start. Furthermore, in s ′ , the mode of w is update, and the state of all other oscillators does not change. The difference between the cases is whether the counter is increased, that is, whether the oscillator manages to broadcast its signal. The probability of succeeding is 1−µ |init T (s)| , since there may be more than one oscillator in phase T at state s. Hence we have to normalise the tranistion probability accordingly. Similarly, the probability of failing to fire is µ |init T (s)| .
If p θ (η) = update and there is a w s.t. (2) If p θ (η) = update and there is a w s.t. (3) If no oscillator is at the end of its cycle, that is, in case 2, we define the probability of one oscillator updating its mode as follows. Observe that we have to normalise the transition probability by the number of all oscillators that have not transitioned to their update mode yet. This is correct, since no oscillator fires, which also means that no oscillator can be activated beyond the maximum phase. This implies in particular that the order of oscillator transitions does not matter in this round.
Now we will consider the cases 3 and 4, where some oscillator already fired (i.e., pc(η) > 0), and other oscillators are perturbed. We distinguish between two cases: either an oscillator is sufficiently perturbed to also fire or the perturbation does not cause the phase to exceed the firing threshold. One complication arises in these cases: we have to ensure that we only allow the oscillators to update their mode once all oscillators with a higher phase have been considered. Since the perturbation function is increasing, a higher phase may result in a higher perturbation. That is, oscillators with a higher phase need to be perturbed by fewer firing oscillators before their phase is increased beyond the threshold and they in turn fire. Hence, if we did not enforce such an order, oscillators with a lower phases might not be perturbed when oscillators with a higher phase fire. Again, observe that we normalise the transition probabilities according to the number of oscillators satisfying similar conditions. That is, this time we need to normalise on the number of oscillators with the same phase in the start mode.
If p θ (η) = update and there is a w s.t. (5) The cases where a perturbed oscillator fires are analogous to oscillators with a maximal phase, except for the addititional conditions that some other oscillator fired, and that all oscillators with higher phases have already been considered.
The final case 7, where all oscillators update their clock values simultaneously, is given by the following equation. It requires that all oscillators have finished their computation, whether they fire, and both the counter and the oscillators will reset their mode to start after the transition.
The formula F update is an abbreviation for the conjunction of the following four conditions, which model the update of the phases of the oscillators, according to the phase response function. Observe that the phases of the oscillators had not been updated by the previously defined transitions. Hence, we now update the phases of all oscillators at once.
In this formula, (8a) handles the simple case of firing oscillators, while (8b) defines the behaviour of oscillators within their refractory period. The formulas (8c) and (8d) reflect the two cases where oscillators are perturbed, either not exceeding their oscillation cycle, or firing, respectively.
With this model, we could begin to analyse the synchronisation behaviour with respect to different phase response functions or broadcast failure probabilities. However, the state space of the model increases exponentially with the number of oscillators, which makes an analysis beyond small numbers of infeasible. To overcome this restriction, we increase the level of abstraction as presented in the next section.

Population Model
In this section, we define a population model of a network of pulse-coupled oscillators for parameters as defined in Sect. 4.1 as S = (∆, N, T, R, ǫ, µ). Oscillators in our model have identical dynamics, and two oscillators are indistinguishable if they share the same phase. That is, we can reason about groups of oscillators, instead of individuals. We therefore encode the global state of the model as a tuple k 1 , . . . , k T where each k Φ is the number of oscillators sharing a phase value of Φ. The population model does not account for the introduction of additional oscillators to a network, or the loss of existing coupled oscillators. That is, the population N remains constant. Fig. 2: Evolution of the global state over four discrete time steps.
The set of all global states of S is Γ (S), or simply Γ when S is clear from the context. Example 1 Figure 2 shows four global states for an instantiated population model of N = 8 oscillators with T = 10 discrete values for their phase and a refractory period of length R = 2. We assume that the phase response function is linear, denotes rounding to the closest integer. Furthermore, let ǫ = 0.115. For example σ 0 = 0, 0, 2, 1, 0, 0, 5, 0, 0, 0 is the global state where two oscillators have a phase of three, one oscillator has a phase of four, and five oscillators have a phase of seven. The starred node indicates the number of oscillators with phase ten that will fire in the next moment in time, while the shaded nodes indicate oscillators with phases that lie within the refractory period (one and two). If no oscillators have some phase Φ then we omit the 0 in the corresponding node. Observe that, while going from σ i−1 to σ i (1 i 3), the oscillator phases increase by one. In the next section, we will explain how transitions between these global states are made. Note that directional arrows indicate cyclic direction, and do not represent transitions.
With every state σ ∈ Γ we associate a non-empty set of failure vectors, where each failure vector is a tuple of broadcast failures that could occur in σ.
We denote the set of all possible failure vectors by F.

Given a failure vector
. . , N } indicates the number of broadcast failures that occur for all oscillators with a phase of Φ. If f Φ = ⋆ then no oscillators with a phase of Φ fire. Semantically, f Φ = 0 and f Φ = ⋆ differ in that the former indicates that all (if any) oscillators with phase Φ fire and no broadcast failures occur, while the latter indicates that all (if any) oscillators with a phase of Φ do not fire. If no oscillators fire at all in a global state then we have only one possible failure vector, namely {⋆} T .

Transitions
In Section 5.2 we will describe how we can calculate the set of all possible failure vectors for a global state, and thereby identify all of its successor states. However we must first show how we can calculate the single successor state of a global state σ, given some failure vector F .
Absorptions. For real deployments of synchronisation protocols it is often the case that the duration of a single oscillation cycle will be at least several seconds [10,28]. The perturbation induced by the firing of a group of oscillators may lead to groups of other oscillators to which they are coupled firing in turn. The firing of these other oscillators may then cause further oscillators to fire, and so forth, leading to a "chain reaction", where each group of oscillators triggered to fire is absorbed by the initial group of firing oscillators. Since the whole chain reaction of absorptions may occur within just a few milliseconds, and in our model the oscillation cycle is a sequence of discrete states, when a chain reaction occurs the phases of all perturbed oscillators are updated at one single time step.
Since we are considering a fully connected network of oscillators, two oscillators sharing the same phase will have their phase updated to the same value in the next time step. They will always perceive the same number of other oscillators firing. Therefore, for each phase Φ we define the function is the number of oscillators with a phase greater than Φ perceived to be firing by oscillators with phase Φ, in some global state, incorporating the broadcast failures defined in the failure vector F . This allows us to encode the aforementioned chain reactions of firing oscillators. Note that our encoding of chain reactions results in a global semantics that differs from typical parallelisation operations, for example, the construction of the cross product of the individual oscillators. Observe that, in the concrete model of Sect. 4.2, we modelled such a behaviour by case 4.
Given a global state σ = k 1 , . . . , k T and a failure vector F = f 1 , . . . , f T , the following mutually recursive definitions show how we calculate the values α 1 (σ, F ), . . . , α T (σ, F ), and how functions introduced in Sect. 4.1 are modified to indicate the update in phase, and firing, of all oscillators sharing the same phase Φ. Observe that to calculate any α Φ (σ, F ) we only refer to definitions for phases greater than Φ and the base case is Φ = T , that is, values are computed from T down to 1. The function ref is the refractory function as defined in Sect. 4.1.
Transition Function. We now define the transition function that maps phase values to their updated values in the next time step. Note that since we no longer distinguish different oscillators with the same phase we only need to calculate a single value for their evolution and perturbation.
Definition 8 The phase transition function τ : Γ × {1, . . . , T } × F → N maps a global state σ, a phase Φ, and some possible failure vector F for σ, to the updated phase in the next discrete time step, with respect to the broadcast failures defined in F , and is defined as Let U Φ (σ, F ) be the set of phase values Ψ where all oscillators with phase Ψ in σ will have their phase updated to Φ in the next time step, with respect to the broadcast failures defined in F . Formally, We can now calculate the successor state of a global state σ and define how the model evolves over time. Definition 9 The successor function → succ : Γ × F → Γ maps a global state σ and a failure vector F to a state σ ′ , and is defined as Example 2 Recall that the perturbation function of our example was given as denotes rounding and ǫ = 0.115. Consider the global state σ 2 of Fig 3 where no oscillators will fire since k 10 = 0. We therefore have one possible failure vector for σ 0 , namely F = {⋆} 10 . Since no oscillators fire the dynamics of the oscillators are determined solely by their standalone evolution, and all oscillators simply increase their phase by 1 in the next time step. Now consider the global state σ 3 and F = ⋆, ⋆, ⋆, ⋆, ⋆, ⋆, 1, 0, 0, 0 , a possible failure vector for σ 3 , indicating that oscillators with phases of 7 to 10 will fire and one broadcast failure will occur for the single oscillator that will fire with phase 7.

Failure Vector Calculation
We construct all possible failure vectors for a global state by considering every group of oscillators in decreasing order of phase. At each stage we determine if the oscillators would fire. If they fire then we consider each outcome where any, all, or none of the firings result in a broadcast failure. We then add a corresponding value to a partially calculated failure vector and consider the next group of oscillators with a lower phase. If the oscillators do not fire then there is nothing left to do, since by Def. 4 we know that ∆ is increasing, therefore all oscillators with a lower phase will also not fire. We can then pad the partial failure vector with ⋆ appropriately to indicate that no failure could happen since no oscillator fired. Table 1 illustrates how a possible failure vector for global state σ 3 in Fig. 3 is iteratively constructed. The first three columns respectively indicate the current iteration i, the global state σ 3 with the currently considered oscillators underlined, and the elements of the failure vector F computed so far. The fourth column is true if the oscillators with phase T +1−i would fire given the broadcast failures in the partial failure vector. We must consider all outcomes of any or all firings resulting in broadcast failure. The final column therefore indicates whether the value added to the partial failure vector in the current iteration is the only possible value (false), or a choice from one of several possible values (true).
Initially we have an empty partial failure vector. At the first iteration there are 5 oscillators with a phase of 10. These oscillators will fire so we must consider each Table 1: Construction of a possible failure vector for a global state σ 3 = 0, 0, 0, 0, 0, 2, 1, 0, 0, 5 . Here we choose 0 broadcast failures, which is then added to the partial failure vector. At iterations 2 and 3 the oscillators would have fired, but since there are no oscillators with a phase of 9 or 8 we only have one possible value to add to the partial failure vector, namely 0. At iteration 4 a single oscillator with a phase of 7 fires, and we choose the case where the firing resulted in a broadcast failure. In the final iteration oscillators with a phase of 6 do not fire, hence we can conclude that oscillators with phases less than 6 also do not fire, and can fill the partial failure vector appropriately with ⋆. Formally, we define a family of functions fail indexed by Φ, where each fail Φ takes as parameters some global state σ, and V , a vector of length T − Φ. V represents all broadcast failures for all oscillators with a phase greater than Φ. The function fail Φ then computes the set of all possible failure vectors for σ with suffix V . Here we use the notation v ⌢ v ′ to indicate vector concatenation.
Observe that the result of fail T is always a set of well defined failure vectors, since whenever ⋆ is introduced into a failure vector at index Φ, all preceding indices are also filled with ⋆, as required by Definition 7.
Definition 11 Given a global state σ ∈ Γ , we define Fσ, the set of all possible failure vectors for that state, as Fσ = fail T (σ, ), and define next (σ), the set of all successor states of σ, as next Note that for some global states |next (σ)| < |Fσ|, since we may have that Given a global state σ and a failure vector F ∈ Fσ, we will now compute the probability of a transition being made to state → succ(σ, F ) in the next time step. Recall that µ is the probability with which a broadcast failure occurs. Firstly we define the probability mass function PMF : {1, . . . , N } 2 → [0, 1], where PMF(k, f ) gives the probability of f broadcast failures occurring given that k oscillators fire, We then denote by PFV : Γ × Fσ → [0, 1] the function mapping a possible broadcast failure vector F for σ, to the probability of the failures in F occurring. That is, Lemma 2 For any global state σ, PFV is a discrete probability distribution over Fσ.
Proof Given a global state σ = k 1 , . . . , k T we can construct a tree of depth T where each leaf node is labelled with a possible failure vector for σ, and each node Λ at depth Φ is labelled with a vector of length Φ corresponding to the last Φ elements of a possible failure vector for σ. We denote the label of a node Λ by V (Λ). We label each node Λω with ω ⌢ V (Λ). We iteratively construct the tree, starting with the root node, root , at depth 0, which we label with the empty tuple . For each node Λ at depth 0 Φ < T we construct the children of Λ as follows: 1. If oscillators with phase Φ fire we define the sample space Ω = {0, . . . , n Φ } to be a set of disjoint events, where each ω ∈ Ω is the event where ω broadcast failures occur, given that k Φ oscillators fired. For each ω ∈ Ω there is a child Λω of Λ with label ω ⌢ V (Λ), and we label the edge from Λ to Λω with PMF(k Φ , ω). 2. If oscillators with phase Φ do not fire then Λ has a single child Λ⋆ labelled with ⋆ ⌢ V (Λ), and we label the edge from Λ to Λ⋆ with 1.
We denote the label of an edge from a node Λ to its child Λ ′ by L(Λ, Λ ′ ). For case 2 we can observe that if oscillators with phase Φ do not fire then we know that oscillators with any phase Ψ < Φ will also not fire, since from Def. 4 we know that ∆ is an increasing function. Hence, all descendants of Λ will also have a single child, with an edge labelled with 1, and each node is labelled with the label of its parent, prefixed with ⋆ . After constructing the tree we have a vector of length T associated with each leaf node, corresponding to a failure vector for σ. The set Fσ of all possible failure vectors for σ is therefore the set of all vectors labelling leaf nodes. We denote by P ↓ (Λ) the product of all labels on edges along the path from Λ back to the root. Given a global state σ = k 1 , . . . , k T and a failure vector F = f 1 , . . . , f T ∈ Fσ labelling some leaf node Λ at depth T , we can see that Let D Φ denote the set of all nodes at depth Φ. We show d∈D Φ P ↓ (d) = 1 by induction on Φ. For Φ = 0, i.e., D Φ = {root }, the property holds by definition. Now assume that d∈D Φ P ↓ (d) = 1 holds for some 0 Φ < T . Let Λ be some node in D Φ , and let C Λ be the set of all children of Λ. Consider the following two cases: If oscillators with phase Φ do not fire then |C Λ | = 1, and for the only c ∈ C Λ we have that L(Λ, c) = 1. If oscillators with phase Φ fire observe that PMF is a probability mass function for a random variable defined on the sample space Ω = {0, . . . , k Φ }. In either case we can see that c∈C Λ L(Λ, c) = 1. Note that D Φ+1 = d∈D Φ C d , and recall that L(d, c) · P ↓ (d) = P ↓ (c). Therefore, Since c∈C d L(d, c) = 1 for each d ∈ D Φ , and from the induction hypothesis, we then have that We have already shown that P ↓ (Λ) = PFV(σ, F ) for any leaf node Λ labelled with a failure vector F , and since the set of all labels for leaf nodes is Fσ we can conclude that This proves the lemma. ⊓ ⊔ Example 3 We consider again the global states σ 3 = 0, 0, 0, 0, 0, 2, 1, 0, 0, 5 and σ 4 = 6, 0, 0, 0, 0, 0, 0, 0, 0, 2 , given in Fig. 3, of the population model instantiated in Example 1, and the failure vector F = ⋆, ⋆, ⋆, ⋆, ⋆, ⋆, 1, 0, 0, 0 given in Example 2, noting that F ∈ Fσ 3 , → succ(σ 3 , F ) = σ 4 , and µ = 0.1. We calculate the probability of a transition being made from σ 3 to σ 4 as PFV( 0, 0, 0, 0, 0, 2, 1, 0, 0, 5 , ⋆, ⋆, ⋆, ⋆, ⋆, ⋆, 1, 0, 0, 0 ) = 1 · 1 · 1 · 1 · 1 · 1 · PMF(1, 1) · PMF(0, 0) · PMF(0, 0) · PMF(5, 0) = (0.1 1 · 0.9 0 · 1) · (1) · (1) · (0.1 0 · 0.9 5 · 1) = 0.059049 We now have everything we need to fully describe the evolution of the global state of a population model over time. An execution path of a population model S is an infinite sequence of global states ω = σ 0 σ 1 σ 2 σ 3 · · · , where σ 0 is called the initial state, and σ k+1 ∈ next (σ) for all k 0.

Synchronisation
When all oscillators in a population model have the same phase in a global state we say that the state is synchronised. Formally, a global state σ = k 1 , . . . , k T is synchronised if, and only if, there is some Φ ∈ {1, . . . , T } such that k Φ = N , and hence k Φ ′ = 0 for all Φ ′ = Φ. We will often want to reason about whether some particular run ω of a model leads to a global state that is synchronised. We say that a path ω = σ 0 σ 1 · · · synchronises if, and only if, there exists some k 0 such that σ k is synchronised. Once a synchronised global state is reached any successor states will also be synchronised. Finally we can say that a model synchronises if, and only if, all runs of the model synchronise.

Model Construction
Given a population model S = (∆, N, T, R, ǫ, µ) we construct a DTMC D(S) = (Q, σ 0 , P, L) where L ranges over the singleton {synch}. We define the set of states Q to be Γ (S) ∪ {σ 0 }, where σ 0 is the initial state of the DTMC. For each σ = k 1 , . . . , k T ∈ Γ (S), we set In the initial state all oscillators are unconfigured. That is, oscillators have not yet been assigned a value for their phase. For each σ = k 1 , . . . , k T ∈ Q \ {σ 0 } we define to be the probability of moving from σ 0 to a state where k i arbitrary oscillators are configured with the phase value i for 1 i T . The multinomial coefficient defines the number of possible assignments of phases to distinct oscillators that result in the global state σ. The fractional coefficient normalises the multinomial coefficient with respect to the total number of possible assignments of phases to all oscillators. In general, given an arbitrary set of initial configurations (global states) for the oscillators, the total number of possible phase assignments can be calculated by computing the sum of the multinomial coefficients for each configuration (global state) in that set. Since Γ is the set of all possible global states, we have that We assign probabilities to the transitions as follows: for every σ ∈ Q \ {σ 0 }, we consider each F ∈ Fσ, and set P(σ, → succ(σ, F )) = PFV(σ, F ). For every combination of σ and σ ′ where σ ′ ∈ next (σ) we set P(σ, σ ′ ) = 0.

Model Reduction
We now describe a reduction of the population model that results in a significant decrease in the size of the model, but is equivalent to the original model with respect to the reachability of synchronised states. We first distinguish between states where one or more oscillators are about to fire, and states where no oscillators will fire at all. We refer to these states as firing states and non-firing states respectively. Definition 12 Given a population model S, a global state k 1 , . . . , k T ∈ Γ is a firing state if, and only if, k T > 0. We denote by Γ F the set of all firing states of S, and denote by Γ NF = Γ \ Γ F the set of all non-firing states of S. We will again omit S if it is clear from the context Given a DTMC D = (Q, σ 0 , P, L) let |P| = |{(t, t ′ ) | t, t ′ ∈ Q 2 and P(t, t ′ ) > 0}| be the number of non-zero transitions in P, and |D| = |Q| + |P| to be the total number of states and non-zero transitions in D.
We now proceed to prove this theorem. To that end, we need some preliminary properties of non-firing states and their relation to firing states. Lemma 3 Every non-firing state σ ∈ Γ NF has exactly one successor state, and in that state all oscillator phases have increased by 1.
Reachable State Reduction. Given a path ω = σ 0 · · · σ n−1 σn where σ i ∈ Γ NF for 0 < i < n and σ 0 , σn ∈ Γ F , we omit transitions (σ i , σ i+1 ) for 0 i < n, and instead introduce a direct transition from σ 0 , the first firing state, to σn, the next firing state in the sequence. For any σ = k 1 , . . . , k T ∈ Γ let δσ = max{Φ | k Φ > 0 and 1 Φ T } be the highest phase of any oscillator in σ. The successor state of a non-firing state is then the state where all phases have increased by T − δσ. Observe that T − δσ = 0 for any σ ∈ Γ F .

Definition 13 The deterministic successor function
maps a state σ ∈ Γ to the next firing state reachable by taking T −δσ deterministic transitions. Observe that for any firing state σ we have δσ = T , and hence that ։ succ(σ) = σ.
We now update the definition for the set of all successor states for some global state σ ∈ Γ to incorporate the deterministic successor function. Definition 15 Given a firing state σ ∈ Γ F let pred (σ) be the set of all non-firing predecessors of σ, where σ is reachable from the predecessor by taking some positive number of transitions deterministically. Formally, We refer to all states σ ′ ∈ pred (σ) as deterministic predecessors of σ.

Then given
to be the reduction of Q where all non-firing states from which a firing state can be reached deterministically are removed.  Fig. 4: Five possible initial configurations in Q for N = 2, T = 6.
Proof Let P = σ∈Γ F pred (σ) be the set of all predecessors of firing states in Γ F .
and only if, P = Γ NF . From Definition 15 it follows that P ⊆ Γ NF . In addition, for any σ ∈ Γ NF there is some state σ ′ such that σ ∈ pred (σ ′ ) and σ ′ = ։ succ(σ) ∈ Γ F , hence Γ NF ⊆ P and the lemma is proved. ⊓ ⊔ Lemma 5 For a population model S = (∆, N, T, R, ǫ, µ) and its corresponding DTMC D = (Q, σ 0 , P, L) with Q = Γ ∪ {σ 0 }, the number of states in the reduction of Q is given by Proof Observe that there are ( N +T −1 N ) ways to assign T distinguishable phases to N indistinguishable oscillators [15]. Since Q = Γ ∪ {σ 0 } and Γ is the set of all possible configurations for oscillators we can see that |Q| = ( N +T −1 N ) + 1. For any non-firing state σ = k 1 , . . . k T ∈ Γ NF we know from Definition 6 that T Φ=1 k Φ = N and from Definition 12 that k T = 0, so it must be the case that T −1 Φ=1 k Φ = N . That is, there must be ( N +T −2 N ) ways to assign T −1 distinguishable phases to N indistinguishable oscillators, and so |Γ NF | = ( N +T −2 N ). From Lemma 4 we know that Q ′ = Q \ Γ NF so it must be the case that Transition Matrix Reduction. Here we describe the reduction in the number of nonzero transitions in the model. We ilustrate how initial transitions to non-firing states are removed by using a simple example, and then describe how we remove transitions from firing states to any successor non-firing states.. Figure 4 shows five possible initial configurations σ i , . . . , σ i+4 ∈ Q for N = 2 oscillators with T = 6 values for phase, where a transition is taken from σ 0 to each σ k with probability P(σ 0 , σ k ). Any infinite run of D where a transition is taken from σ 0 to one of the configured states σ i , . . . , σ i+3 will pass through σ i+4 , since all transitions (σ i+k , σ i+k+1 ) for 0 k 3 are taken deterministically. Also, observe that states σ i , . . . , σ i+3 are not in Q ′ , since σ i+4 is reachable from each by taking some number of deterministic transitions. We therefore set the probability of moving from σ 0 to σ i+4 in P ′ to be the sum of the probabilities of moving from σ 0 to σ i+4 and each of its predecessors in P. Generally, given a state σ ∈ Q ′ where σ = σ 0 , we set P ′ (σ 0 , σ) = P(σ 0 , σ) + σ ′ ∈pred(σ) P(σ 0 , σ ′ ).
We now define how we calculate the probability with which a transition is taken from a firing state to each of its possible successors. For each firing state σ ∈ Q ′ we consider each possible successor σ ′ ∈ ։ next (σ) of σ and define F σ→σ ′ to be the set of all possible failure vectors for σ for which the successor of σ is σ ′ , given by F σ→σ ′ = {F ∈ Fσ | ։ succ( → succ(σ, F )) = σ ′ }. We then set the probability with which a transition from σ to σ ′ is taken to P ′ (σ, σ ′ ) = F ∈F σ→σ ′ PFV(σ, F ). Lemma 6 For a population model S = (∆, N, T, R, ǫ, µ), the corresponding DTMC D = (Q, σ 0 , P, L) with Q = {σ 0 } ∪ Γ , and its reduction D ′ (S) = (Q ′ , σ 0 , P ′ , L ′ ), the transitions in P are reduced in P ′ such that |P ′ | |P| − 2|Γ NF | Proof From Lemma 4 we know that |Q ′ | = |Q\Γ NF |, and hence that |Γ NF | transitions from σ 0 to non-firing states are not in P ′ , and from Lemma 3 we also know that there is one transition from each non-firing state to its unique successor state that is not in P ′ . Since no additional transitions are introduced in the reduction it is clear that |P ′ | |P| − 2|Γ NF |. ⊓ ⊔ Lemma 7 For every population model DTMC D = (Q, σ 0 , P, L), unbounded-time reachability properties with respect to synchronised firing states in D are preserved in its reduction D ′ .
Proof We want to show that for every ⊲⊳ ∈ {<, , , >} and every λ ∈ [0, 1], if σ 0 |= P ⊲⊳λ [F synch] holds in D then it also holds in D ′ . From the semantics of PCTL over a DTMC we have Therefore we need to show that where Pr D and Pr D ′ denote the probability measures with respect to the sets of infinite paths from σ 0 in D and D ′ respectively.
Given a firing state σ F ∈ Q we denote by Paths D σ F the set of all infinite paths of D starting in σ 0 where the first firing state reached along that path is σ F . All such sets for all firing states in Q form a partition, such that σ F ∈Γ F Paths D σ F = Paths D . That is, for all firing states σ F , σ F′ ∈ Q where σ F = σ F′ we have that Paths D σ F ∩Paths D σ F′ = ∅. Now observe that any infinite path ω of D can be written in the form ω = σ 0 ω NF 1 σ F 1 ω NF 2 σ F 2 · · · where σ F i is the i th firing state in the path and each ω NF i = σ 1 i σ 2 i · · · σ ki i is a possibly empty sequence of k i non-firing states. Then for every such path in D there is a corresponding path ω ′ of D ′ without non-firing states, and of the form 3 · · · , as for any i we have σ j i ∈ pred (σ F i ) for all 1 j k i . As only deterministic transitions have been removed in D ′ we can see that Hence, we only have to consider the finite paths from σ 0 to σ F 1 . To that end, observe that there are pred (σ F 1 ) possible prefixes for each path from σ 0 to σ F 1 where the initial transition is taken from σ 0 to some non-firing predecessor of σ F 1 , plus the single prefix where the initial transition is taken to σ F 1 itself. Overall there are exactly pred (σ F 1 ) + 1 distinct finite prefixes that have ω ′ as their corresponding path in D ′ . We denote the set of these prefixes for a path ω ′ in D ′ by Pref (ω ′ ). Since the measure of each finite prefix extends to a measure over the set of infinite paths sharing that prefix, it is sufficient to show that the sum of the probabilities for these finite prefixes is equal to the probability of the unique We can then write where k σ ′ is the number of deterministic transitions that lead from σ ′ to σ F 1 in D. Now recall that for any σ ∈ Q ′ \ {σ 0 } we have P ′ (σ 0 , σ) = P(σ 0 , σ) + σ ′ ∈pred(σ) P(σ 0 , σ).
So we have shown that Pr D Pref (ω ′ ) = Pr D ′ {σ 0 , σ F 1 } and the lemma is proved. ⊓ ⊔ Proof (of Theorem 1) Follows from Lemmas 5 and 6 for the reduction of states and transitions respectively, and from Lemma 7 for the preservation of unbounded time reachability properties. Table 2 shows the number of reachable states and transitions of the DTMC, and corresponding reduction, for different population sizes (N) and oscillation cycle lengths (T ), using the Mirollo and Strogatz model of synchronisation [25]. The number of reachable states is stable under changes to the parameters R, ǫ, and µ, since every possible firing state is always reachable from the initial state. For the results shown here the parameters were arbitrarily set to R = 1, ǫ = 0.1. The underlying graph of the DTMC, and hence the number of transitions, is stable under changes to the parameter µ, and is not if interest here.  Table 3 shows the number of transitions of the DTMC, and corresponding reduction, for various population model instances, and again uses the Mirollo and Strogatz model of synchronisation. Increasing the length of the refractory period (R) results in an increase in the reduction of transitions in the model. A longer refractory period leads to more firing states where the firing of a group of oscillators is ignored. This results in successor states having oscillators with lower values for phase, and hence a longer sequence of deterministic transitions (later removed in the reduction) leading to the next firing state. Conversely, increasing the strength of the coupling between oscillators (ǫ) results in a decrease in the reduction of transitions in the model. For the Mirollo and Strogatz model of synchronisation used here, increasing the coupling strength results in a linear increase in the pertubation to phase induced by the firing of an oscillator. This results in successor states of firing states having oscillators with higher values for phase, and hence a shorter sequence of deterministic transitions leading to the next firing state.

Reward Structures for Reductions
While probabilistic reachability properties allow us to quantitatively analyse models with respect to the likelihood of reaching a synchronised state, they do not allow us to reason about other properties of interest, for instance the expected time taken for the network to synchronise [16], or the expected energy consumption of the network [17]. Therefore, we will often want to augment the DTMC corresponding to a population model with rewards. We do this by annotating states and transitions with real-valued rewards (respectively costs, should values be negative) that are awarded when states are visited, or transitions taken. Definition 16 Given a DTMC D = (Q, σ 0 , P, L) a reward structure for D is a pair R = (Rs, R t ) where Rs : Q → R and R t : Q × Q → R are the state reward and state transition functions that respectively map real valued rewards to states and transitions in D.
For any finite path ω = σ 0 · · · σ k of D we define the total reward accumulated along that path up to, but not including, σ k as Given a DTMC D = (Q, σ 0 , P, L) augmented with a reward structure R, and some state σ ∈ Q, we will often want to reason about the reward that is accumulated along a path ω = σ 0 σ 1 σ 2 · · · ∈ Paths that eventually passes through some set of target states Ω ⊂ Q. We first define a random variable over the set of infinite paths V Ω : Paths → R ∪ {∞}. Given the set ω Ω = {j | σ j ∈ Ω} of indices of states in ω that are in Ω we define the random variable and define the expectation of V Ω with respect to Pr σ by The logic of PCTL can be extended to include reward properties by introducing the state formula R⊲⊳r[F Ψ ], where ⊲⊳∈ {<, , , >} and r ∈ R [22]. Given a state σ ∈ Q, a real value r, and a PCTL path formula Ψ , the semantics of this formula is given by where Sat (Φ) denotes the set of states in Q that satisfy Φ. Given a reward structure R = (Rs, R t ) for D we construct the corresponding reward structure R ′ = (R ′ s , R ′ t ) as follows: -There is no reward for the initial state and we set Rs(σ 0 ) = 0. -For every firing state σ F in Q with Rs(σ F ) = r we set R ′ s (σ F ) = r. -For every pair of distinct firing states σ F We set the reward for taking the transition from σ F 1 to σ F 2 in D ′ to be the sum of the rewards that would be accumulated across that sequence by a path in D, formally -For every firing state σ F in Q ′ there is a non-zero transition from the initial state σ 0 to σ F in P ′ . Therefore, all paths of D ′ where σ F is the first firing state along that path share the same prefix, namely σ 0 , σ F . For paths of D this is not necessarily the case, since σ F is the first firing state not only along the path where the initial transition is taken to σ F itself, but also along any path where the initial transition is taken to a non-firing state from which a sequence of deterministic transitions leads to σ F (that state is a deterministic predecessor of σ F ). We therefore set the reward along a path ω ′ = σ 0 σ F 1 σ F 2 · · · for taking the initial transition to σ F in D ′ to be the sum of the total rewards accumulated along all distinct path prefixes of the form σ 0 ω NF σ F , normalised by the total probabilitiy of taking any of these paths, where ω NF is a possibly empty sequence of deterministic predecessors of σ F , and where the total reward for each prefix is weighted by the probability of taking the transitions along that sequence, Proof (of Theorem 2) We want to show that for every reward structure R for D and corresponding reward structure R ′ for D ′ , every ⊲⊳ ∈ {<, , , >} and every r ∈ R, if σ 0 |= R⊲⊳r[F synch] holds in D then it also holds in D ′ . Let V Sat(Fsynch ) and V ′ Sat (Fsynch ) respectively denote the random variables over Paths D (σ 0 ) and Paths D ′ (σ 0 ) whose expectations correspond to R and R ′ . From the semantics of PCTL over a DTMC we have Therefore, we need to show that where Pr D and Pr D ′ denote the probability measures with respect to the sets of infinite paths from σ 0 in D and D ′ respectively. There are two cases: Firstly, if there exists some path of D that does not synchronise then by definition V Sat (synch) = ∞. Also, from Lemma 7 we know that there is a corresponding path of D ′ that does not synchronise, and hence that V ′ Sat (synch ) = ∞. By definition the probability measure of all paths of D and D ′ are strictly positive. Therefore, all summands of Equation 17 are defined, and the expectation of both V Sat (synch) and V ′ Sat(synch ) is ∞. Secondly, we consider the case where all possible paths of D and D ′ synchronise. First we define the function reduce : Paths D → Paths D ′ that maps paths of D to their corresponding path in the reduction D ′ , where ω NF i is the (possibly empty) sequence of deterministic predecessors of the firing state σ F i . Let reduce −1 (ω) denote the preimage of ω under reduce . Then, we can rewrite the left side of (17) to For any path ω of D or D ′ let pre s (ω) be the prefix of that path whose last state is the first firing state along that path that is in the set Sat (synch). So we want to show that the following holds for any path ω ′ of D ′ , Given some path ω let ω[i : j] denote the sequence of states in ω from the i th firing state to the j th firing state along that path (inclusively). The notation ω[− : j] indicates that no states are removed from the start of the path i.e. the first state is σ 0 , and the notation ω[i : −] indicates that no states are removed from the end of the path. By recalling that Pr (σ 0 σ 1 · · · σn) = n i=1 P(σ i−1 , σ i ) we can see that Pr (σ 0 σ 1 · · · σn) = Pr (σ 0 · · · σ i )Pr(σ i · · · σn) for any 0 < i < n. Also from (15) it is clear that for any reward structure R, tot R (σ 0 · · · σn) = tot R (σ 0 · · · σ i ) + tot R (σ i · · · σn) holds for all 0 < i < n. Now we can rewrite (18) to By the definition of R ′ we can write the right hand side of (19) as From Lemma 7 we know that and hence obtain Since Pref (ω ′ ) is the set of all possible finite prefixes from the initial state σ 0 to the first firing state σ F 1 , and since ω[− : 1] = pre s (ω)[− : 1] clearly holds, we know that Using this fact, and by observing that by definition we can write (20) as This is the same as the left hand side of (19) and the theorem is proved. ⊓ ⊔

Connecting the Concrete Model and the Population Model
In this section, we define the abstraction function to connect a concrete model with a population model. To that end, let Dc = (Qc, s 0 , Pc) be a concrete model of a network of N PCOs with a clock cycle length T , a refractory period R, a phase response function ∆, a coupling ǫ and broadcast failure probability of µ. Furthermore, let Dp = (Qp, σ 0 , Pp) be the DTMC of a population model for the same parameters. For simpler notation, we introduce some general notation for transitions in DTMCs. If there is a possible transition between two states q and q ′ in a DTMC, that is P(q, q ′ ) > 0, then we also write q → q ′ . Observe that for this simplification, q and q ′ are either both in Qc or both in Qp. We also denote the reflexive, transitive closure of → by ⇒.

Proving the Correspondence between Concrete and Population Models
We need to associate states in Dc to states in Dp. In general, several concrete states will be mapped to a single population state, since we do not distinguish between different orders of oscillators in the latter, while we do in the former.
Furthermore, we want to abstract from different modes of the oscillators. However, it is not sensible to associate all modes within a phase to the same population state, since in the transitions from one mode to the next the system chooses, whether an oscillator fails to broadcast its pulse or not. If we want to be able to define a simulation relation, we need to represent the failures described by the transitions in the population model. To have an exact correspondence, we first collect all the concrete states where the counter and all oscillators are at the start mode into a single set. The abstraction function h : Q ′ c → Qp takes a concrete state s and counts the number of oscillators sharing the same phase, mapping s = (η, ν) to the corresponding state of the population model, To show that this abstraction is sensibly defined, we need to show that the concrete model can weakly simulate the transitions allowed by the population model, and vice versa. That is, if the abstraction σ 1 of a concrete state s 1 allows a transition to another population state σ 2 , then there is a sequence of transitions from s 1 leading to s 2 , whose abstraction is σ 2 . Furthermore, if there is a transition sequence from one concrete state s 1 to s 2 , where both statescan be abstracted to population states σ 1 and σ 2 , respectively, then there is also a sequence of transitions connecting σ 1 with σ 2 . This situation is visualised in Fig. 5.
For the first direction, we actually show this condition for a single transition in the population model. However, this result can be straightforwardly extended to transition sequences. Lemma 8 Let s 1 ∈ Q ′ c and σ 1 , σ 2 ∈ Qp such that h(s 1 ) = σ 1 and σ 1 → σ 2 . Then there is a s 2 ∈ Q ′ c such that s 1 ⇒ s 2 and h(s 2 ) = σ 2 . Furthermore, the sum of the probabilities of transition sequences from s 1 to an instantiation s 2 of σ 2 is equal to the probability of the transition from σ 1 to σ 2 .
That is, Q F c (s) denotes the set of oscillators possibly firing in s. The sets Q P c (s) and Q PF c (s) denote the sets of oscillators being perturbed but not firing (since the perturbation is not sufficient for the oscillators to reach the end of their cycle), and possibly firing, respectively. We can only say that elements of Q F c (s) and Q PF c (s) possibly fire, since they may be affected by a broadcast failure.
We now have to construct a sequence of transitions, where we draw the firing oscillators from the sets Q F c (s 1 ) and Q PF c,Φ (s 1 ), according to the broadcast failure vector F . Furthermore, all elements of Q F c (s 1 ) and the sets Q PF c,Φ (s 1 ) have to take transitions such that their phase value in the next iteration is 1.
Let σ 1 = k 1 , k 2 , . . . , k T . Now consider an arbitrary sequence u 1 , . . . , u kT of all k T elements from Q F c (s 1 ). Additionally, let C T ⊆ Q F c (s 1 ) be the set of oscillators in phase T with a broadcast failure, i.e., |C T | = f T . Observe that p φ (ν 1 (u j )) = T for all 1 j k T . Furthermore, let r 0 = s 1 . Then we define a sequence of successors of r 0 = (η 0 , ν 0 ) as follows, where 1 j k T . If k i ∈ C T , then Observe that these states define a sequence of transitions from r 0 to r kT according to conditions (2) and (3). Now, for each phase Φ, with k Φ < T , we proceed similarly. That is, we first choose a sequence u Φ 1 , . . . , u Φ kΦ of oscillators and a set C Φ ⊆ Q PF c,Φ (s 1 ) with Subsequently, we define each r Φ j to be Observe again, that these sequences exhaust Q PF c,Φ (s 1 ) for each phase Φ. Furthermore, we claim that the number of firing oscillators that are not inhibited by a broadcast failure in the concrete model coincides with the number of perceived firing oscillators in the population model in this phase. F ). Now let Φ < T and assume pc(η Φ+1 0 ) = α Φ+1 (σ 1 , F ). By definition, we have By assumption, we then get We now distinguish two cases. First, assume that {u | p φ (ν 1 (u) = T )} = ∅, and let s = (ηs, νs) be such that s 1 ⇒ s and p θ (νs(u)) = update for all u. Then there is exactly one transition s → s 2 , which is defined according to equation (8). Furthermore, due to the assumption that no oscillator fires, we have pc(ηs) = 0, which implies ∆(Φ, pc(ηs), ǫ) = 0 for all Φ by Definition 4. Hence, for all u, we have p φ (ν 2 (u)) = p φ (νs(u)) + 1 = p φ (ν 1 (u)) + 1. That is, That is, we have σ 1 → σ 2 due to a deterministic transition, which, in particular, The second case is more involved. Let us assume {u | p φ (ν 1 (u) = T )} = ∅, that is, at least one oscillator fires. Hence, due to the preconditions of the transitions, we can divide the transition sequence from s 1 to s 2 as follows: where r Φ = (η rΦ , ν rΦ ) denotes the state where all oscillators with phase Φ changed their mode to update. Our goal now is to find a broadcast failure vector F , such that → succ(σ 1 , F ) = σ 2 . To that end, let . . , f T . With this broadcast failure vector at hand, we now have to show that Recall that U Φ (σ 1 , F ) = {Ψ | τ (σ 1 , Ψ, F ) = Φ}. Together with the condition we want to prove, this implies, that we need to show p φ (ν 2 (u)) = τ (σ 1 , p φ (ν 1 (u)), F ) for all oscillators u. We now need again to distinguish several cases, according to the different cases of the transition defined by condition (8).
First, let u be such that p φ (ν 1 (u)) R, i.e., oscillator u is within its refractory period. If p φ (ν 1 (u)) = T , then we have Otherwise, if p φ (ν 1 (u)) < T , we have Now assume that p φ (ν 1 (u)) R, i.e., oscillator u is outside of its refractory period and thus will be perturbed by firing oscillators. If p φ (ν 1 (u)) = T , then we proceed as in the previous case. So, let us assume p φ (ν 1 (u)) < T . To show that the transition function of the population model coincides with the result within the concrete model, we need to ensure that the perceived firing oscillators are equal in both models for each oscillator.
Claim Within each phase, the perceived oscillators in the population model coincide with the oscillators that fired up to the next higher phase in the concrete model. Formally, for each 1 Φ < T , we have pc(η rΦ+1 ) = α Φ (σ 1 , F ).
Lemma 10 Let Dc = (Qc, s 0 , Pc) be a concrete network of oscillators and Dp = (Qp, σ 0 , Pp) be its abstraction as a population model, as well as s 1 , Q ′ c and σ 1 , σ 2 ∈ Qp, with h(s 1 ) = σ 1 . Then, the sum of the probabilities of transition sequences from s 1 to all instantiations s 2 with h(s 2 ) = σ 2 is equal to the probability of the transition from σ 1 to σ 2 .
Proof Let σ = k 1 , . . . , k T . Furthermore, let N = T i=1 k i . Now, let s be an arbitrary state corresponding to σ. If no oscillator fires, we have N ! possibilities to create an transition sequence, each of which has a probability of 1 N ! to happen. Hence, we get that the probability that one of these transitions happen is N ! · 1 N ! = 1, which coincides with the definition in the population model. For the case that at least one oscillator fires and thus perturbs the other oscillators, we consider the construction in the proof of Lemma 8 with respect to a failure vector F = f 1 , . . . , f T for σ. During each phase Φ, we have to choose the particular order of the k Φ and in addition, we have to choose the set C Φ . That is, we have k Φ ! possible orders, and k Φ f Φ possibilities for the choice of C Φ . Furthermore, the combined probability for the transitions of the oscillators that should fire but are inhibited by a broadcast failure is Observe that at the start of the construction of each phase, |init Φ (s)| = k Φ . Hence the probability above simplifies to Due to the possible choices during the construction of the transition sequence, we have that the probability of one of these sequences to happen is which is exactly the function PMF(k Φ , f Φ ) as in the population model. Furthermore, with similar reasoning as above, the transition probability for the sequences, where no oscillator is perturbed anymore, is 1. Hence, we get that the combined probability of the set of paths from one instantiation s 1 of a population model state σ 1 to an instantiation of a successor σ 2 of σ 1 is equal to the probability of the the transition from σ 1 to σ 2 . ⊓ ⊔ From Lemmas 8, 9 and 10, we immediately get that the same weak simulation relation holds between a population model and the concrete network of oscillators it represents. Theorem 3 Let Dc = (Qc, s 0 , Pc) be a concrete network of oscillators and Dp = (Qp, σ 0 , Pp) be its abstraction as a population model. If we have s 1 , s 2 ∈ Q ′ c and σ 1 , σ 2 ∈ Qp, with h(s i ) = σ i , then σ 1 → * σ 2 if, and only if, s 1 ⇒ s 2 . Furthermore, the probabilities over all paths in both models coincide. In particular, we have h(s 0 ) = σ 0 , and that s 0 weakly bisimulates σ 0 . Hence, we can use population models to analyse the global properties of a network of pulse-coupled oscillators following the concrete model as defined in Sect. 4 without loss of precision. In particular, this allows us to increase the size of the network to check such properties, while still giving us the opportunity to analyse the internal behaviour of nodes, if we restrict the network size.

Experimental Validation
As Theorem 3 implies, the synchronisation probabilities for a concrete model and its corresponding population model coincide. However, we have to keep in mind that the PCTL formulas describing synchronisation are of course different. For a concrete model with four nodes and a cycle length T = 10, the synchronisation probability can be queried with the following formula.
For a population model, the corresponding property is For both types of models, we defined a suitable input for the model checker Prism, and compared the results for different values of R, ǫ and µ. As expected, the model checking results were matching exactly. Table 4 shows the model construction and checking times for some exemplary parameter combinations of the models as reported by Prism 1 . In the concrete model, the bulk of time is spent in the model checking phase, while the construction is much faster. For the analysis of the population model, however, the situation is reversed. The model construction phase is an order of magnitude longer than the model checking phase. As expected, the model checker needs less time for the analysis of the population model, if we add the time needed for model construction and checking.

Conclusion
In this paper we have introduced a formal concrete model for a network of nodes synchronising their clocks over a set of discrete values. Furthermore, we developed a population model that can alleviate state-space explosion when reasoning about significantly larger networks. We encoded both models as discrete-time Markov chains, and formally connected them by showing that a concrete model of a network weakly simulates a population model of that same network. We then showed that these two models are equivalent with respect to the reachability of distinguished states, namely those where all nodes in the network have synchronised their clocks.
Formalising the individual nodes of a network allows for the analysis of their internal properties. However, this internal structure also inhibits the verification of global network properties. Modelling the whole network as the product of the models for the individual nodes quickly, and unsurprisingly, results in a model that is too large to analyse with existing tools and techniques. While the use of appropriate collective abstractions, such as population models, allow for the analysis of larger networks, they often impose restrictions on the topologies of the network that can be considered. We could, of course, simply take the product of individual population models to represent network structures more specialised than the fully-connected graphs considered here, but again we face the consequences of this approach when trying to analyse the resulting model. In addition, when using population models we lose the possibility to distinguish between nodes having the same internal state. However, this does not restrict our analysis when considering networks of homogeneous nodes where the properties of interest relate to global behaviours of the network itself.
Our current definition of pulse-coupled oscillators only allows for non-negative results of the phase response function. However, there are also oscillator definitions with phase response functions with possibly negative values [32]. That is, instead of shifting the state of an oscillator towards the end of the cycle, the perturbation may reduce the value of the oscillator's state. It would be interesting to study the impact of negative-valued phase response functions in the setting of discrete clock values. While a concrete model can be instantiated to incorporate different topologies by explicit encoding of possible perturbances in the nodes' transitions, it is by no means obvious how to encorporate topologies into a population model. By design, the nodes in the latter are indistinguishable, hence the differences in the connections between nodes are lost. We could alleviate this restriction slightly, by modelling the connection between networks of strongly connected components. That is, each component can be modelled by a different population model, and the firings within one model can perturb different models. However, this would mean to compute the cross-product of the population model, and hence we are back at the state-space explosion problem. Furthermore, our abstract relation would need to take the mapping of single nodes into different components into account.
Deductive approaches might serve as an additional way to verify larger systems. In particular, due to the regularity of population models, we conjecture the existence of an inductive invariant that holds from a certain size of models onwards. That is, as soon as the population grows to a size to be treated as a single entity, we can increase this size by one node, and guarantee that synchronisation still occurrs. For the population model sizes below this threshold, we could still use our proposed model-checking technique as the induction base. However, is is not clear what such an invariant should be, and how it can be verified.