
1 Introduction

Wireless sensor networks (WSNs) are (possibly large-scale) networks of sensor nodes deployed in strategic areas to gather data. Sensor nodes collaborate using wireless communications with an asymmetric many-to-one data transfer model. Typically, they send their sensed data to a sink node which collects the relevant information. WSNs are primarily designed for monitoring environments that humans cannot easily reach (e.g., for motion, target tracking, fire detection, chemicals, or temperature); they are also used as embedded systems (e.g., biomedical sensor engineering, smart homes) and in mobile applications (e.g., when attached to robots, soldiers, or vehicles).

In wireless sensor networks, a basic operation is data fusion, whereby the data from each sensor are aggregated to form a single meaningful result. The fusion of individual sensor readings is possible only by exchanging messages that are timestamped by each sensor’s local clock. This requires a common notion of time among the sensors, which is achieved by means of so-called clock synchronization protocols [13, 15].

In this paper we model check a distributed clock synchronization algorithm for WSNs developed by the Dutch company CHESS [12]. To realize an energy-efficient communication mechanism, CHESS developed a gossip-based MAC algorithm [14] (abbreviated gMAC), which regulates access to the shared wireless channel. Here we are interested in verifying the robustness of the gMAC algorithm in the presence of packet loss. Packet loss is particularly relevant in wireless sensor networks deployed in environments with significant multi-path distortion (when part of the signal goes directly to the destination while another part bounces off an obstruction and then reaches the destination). Most sensor platforms do not have enough frequency diversity to reject multi-path propagation.

Our work has been strongly inspired by a recent analysis [8] of the gMAC synchronization protocol on clique and line topologies, in the ideal case of non-lossy communication. For line topologies, [8] shows that the protocol fails to synchronize all nodes when the number of nodes grows. On clique topologies, on the other hand, the protocol behaves quite well, and [8] provides constraints on guard times (delays added before and after the transmission of sync messages) that guarantee clock synchronization independently of the clique size. In [8] the protocol is modeled as a network of timed automata and verified using the UPPAAL model checker [2, 3]. However, the model in [8] does not incorporate several features such as dynamic slot allocation, uncertain communication delays, and unreliable radio communication. In the current paper we extend their analysis by adopting a probabilistic model of radio communication that takes message loss into account, based on the packet delivery measurements reported in [17]. As our model is a network of probabilistic timed automata, we carry out our analysis by applying Statistical Model Checking (SMC) [7] within the UPPAAL toolset [2, 3]. SMC consists of monitoring a suitable number of runs of the system and then applying a statistical algorithm to estimate the result of the desired query.

Our analysis shows that low guard times (within the safety range proposed in [8]) are not sufficient to guarantee clock synchronization in clique topologies of arbitrary size. More precisely, in the case of lossy communication, the size of the clique does play a crucial role in the effectiveness of the protocol: the bigger the clique, the higher the guard time must be to ensure clock synchronization with high probability. It is important to notice that guard times cannot be arbitrarily increased without dramatically reducing the battery life of the sensor nodes [1]. Finally, we extend our analysis to grid topologies with increasing neighbour degree, which better simulate a uniform distribution of sensor nodes in a given area. Our simulations show that high values of the guard times may not be sufficient to guarantee clock synchronization in the presence of message loss, even in small \(5 \times 5\) grid networks. On the other hand, we observe that the efficiency of the protocol improves when the number of neighbours, and hence node connectivity, increases.

Outline

Section 2 introduces the gMAC protocol. Section 3 illustrates the corresponding UPPAAL probabilistic model. Section 4 details our analysis of clique and grid topologies. Section 5 concludes the paper with final remarks, related work, and future work.

2 The gMAC Protocol

The gMAC protocol is a Time Division Multiple Access (TDMA) protocol, where time is divided into fixed-length frames, and each frame is subdivided into slots. Slots can be either active or idle. During active slots, a node is either listening for incoming messages from neighbouring nodes (RX slot) or it is sending a message (TX slot). During idle slots a node is switched to energy saving mode. Active slots are gathered in a contiguous sequence placed at the beginning of each frame.

Since energy efficiency is a major concern in the design of wireless sensor networks, the number of active slots is typically much smaller than the total number of slots. In the implementation of gMAC, a frame consists of 1129 slots, of which 10 are active. A node can transmit a message only once per time frame, in its TX slot. If two neighbouring nodes choose the same send slot, a collision occurs in the intersection of their transmission ranges, preventing message delivery. In the original protocol a node randomly chooses one active slot as its send slot (TX slot) and considers all other active slots as receive slots (RX slots). However, for the sake of simplicity, as in [8], in our analysis we assume that the TX slots are fixed and have been chosen in such a way that no collision occurs.

Guard times are introduced to ensure that all neighbours of a sending node are listening. Each sender waits for some time (\(\mathsf g \) clock cycles) at the beginning of its TX slot to ensure that all its neighbours are ready to receive messages; similarly, a sender does not transmit for a certain amount of time (\(\mathsf t \) clock cycles) at the end of its TX slot. Guard times cannot be arbitrarily increased without dramatically reducing the battery life of the sensor nodes [1], so the choice of proper guard time values is crucial in the protocol design. In the current implementation, each slot consists of 29 clock cycles, of which 18 are used as guard time.

The CHESS sensor nodes are equipped with a 32 kHz crystal oscillator that drives an internal clock used to determine the beginning and the end of each slot. Sensor nodes are also equipped with an ATMega64 micro-controller and a Nordic nRF24L01 [10] packet radio. Depending on the environment, the Nordic nRF24L01 radio has a transmission range between \(0.5\) m and \(50\) m. For the sake of simplicity we assume that all nodes have the same transmission range; this means that transmission between nodes is symmetric.
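As a quick sanity check on these figures, the sketch below computes the frame duration, the radio duty cycle, and the share of a slot spent as guard time. It assumes that one clock cycle corresponds to one crystal period and a nominal frequency of 32 768 Hz (a standard "32 kHz" watch crystal); both are our assumptions, and the names are illustrative.

```python
SLOTS_PER_FRAME = 1129   # total slots per frame in the gMAC implementation
ACTIVE_SLOTS = 10        # active (RX/TX) slots per frame
CYCLES_PER_SLOT = 29     # clock cycles per slot (k0)
GUARD_CYCLES = 18        # cycles per slot used as guard time (g + t)


def frame_duration_s(crystal_hz: float = 32768.0) -> float:
    """Frame length in seconds, assuming one clock cycle per crystal period."""
    return SLOTS_PER_FRAME * CYCLES_PER_SLOT / crystal_hz


def duty_cycle() -> float:
    """Fraction of slots in which the radio is active."""
    return ACTIVE_SLOTS / SLOTS_PER_FRAME


def guard_fraction() -> float:
    """Fraction of a slot reserved as guard time."""
    return GUARD_CYCLES / CYCLES_PER_SLOT


if __name__ == "__main__":
    print(f"frame duration: {frame_duration_s():.4f} s")
    print(f"duty cycle:     {duty_cycle():.2%}")
    print(f"guard fraction: {guard_fraction():.2%}")
```

With these values a frame of \(1129 \cdot 29 = 32741\) cycles lasts just under one second, and the radio is active in less than 1% of the slots.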

3 UPPAAL Probabilistic Model for gMAC

In this section, we provide a small extension of the UPPAAL model for gMAC of [8] in which a probabilistic choice to model message loss is introduced. The model assumes a finite, fixed set of sensor nodes \(\mathsf {Nodes} = \{ 0, \ldots , \mathsf N -1 \}\). The behaviour of each individual node \(i \in \mathsf {Nodes} \) is described by means of three different timed automata: \(\mathbf {Clock} (i)\), \(\mathbf {WSN} (i)\), \(\mathbf {Synchronizer} (i)\). Automaton \(\mathbf {Clock} (i)\) models the hardware clock of the node, \(\mathbf {WSN} (i)\) takes care of sending messages, and \(\mathbf {Synchronizer} (i)\) re-synchronizes the hardware clock upon receipt of a message. The automaton \(\mathbf {Synchronizer} (i)\) is the only one where probabilities are introduced to model packet loss. The complete model consists of the composition of the three automata \(\mathbf {Clock} (i)\), \(\mathbf {WSN} (i)\) and \(\mathbf {Synchronizer} (i)\), for each \(i \in \mathsf {Nodes} \).

For each node \(i\) there are two state variables: \(\mathsf {clk} [i]\), which records the value of the hardware clock (initially \(0\)), and \(\mathsf {csn} [i]\), which records the current slot number (initially \(0\)). Furthermore, there are two broadcast channels: \(\mathsf {tick} [i]\), used to synchronize the activities within the node \(i\), and \(\mathsf {start\_message} [i]\), used to inform all neighbours of the beginning of \(i\)’s transmission. Table 1 reports the protocol parameters.

Table 1. Protocol parameters

Figure 1 depicts the automaton \(\mathbf {Clock} (i)\) of [8] modeling the hardware clock of node \(i\). The local clock variable \(\mathsf x \) measures the time between two consecutive clock ticks. A \(\mathsf {tick} [i]!\) action is enabled when \(\mathsf x \) reaches the value \(\mathsf {min} \) and must fire before \(\mathsf x \) reaches the value \(\mathsf {max} \). When the action \(\mathsf {tick} [i]!\) occurs, the variable \(\mathsf x \) is reset to \(0\) and the variable \(\mathsf {clk} [i]\) is incremented by \(1\). As explained in [8], the state variable \(\mathsf {clk} [i]\) is reset after \(\mathsf k _0\) clock ticks to make model checking feasible. A realistic clock drift rate is about \(20\) ppm (parts per million). Such a rate is achieved in the model by setting \(\mathsf {min} =10^5 - 2\) and \(\mathsf {max} =10^5 +2\). In the model of [8] ticks may occur nondeterministically within the time interval \([\mathsf {min} ,\mathsf {max} ]\); thus the delay between two \(\mathsf {tick} [i]!\) actions is nondeterministic. The stochastic semantics for timed automata of UPPAAL SMC excludes nondeterminism; thus the delay between two \(\mathsf {tick} [i]!\) actions is implemented as a uniformly distributed stochastic delay. This may be unrealistic in general: due to environmental differences, clocks usually run either too fast or too slow for long periods of time. However, in our analysis we focus on small networks and assume that all sensors operate in the same environmental conditions. Thus we rely on the random clock-speed jitter of UPPAAL SMC.
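The sketch below illustrates how \(\mathsf {min} \) and \(\mathsf {max} \) translate into a drift rate. It also contrasts the worst-case divergence between two clocks allowed by the nondeterministic model of [8] with the divergence observed in one sampled run in which each tick delay is drawn independently and uniformly from \([\mathsf {min} ,\mathsf {max} ]\), which is our reading of the stochastic semantics described above. The function names and the random seed are illustrative.

```python
import random

MIN_DELAY = 10**5 - 2   # earliest tick (model time units)
MAX_DELAY = 10**5 + 2   # latest tick (model time units)


def drift_ppm(lo: int = MIN_DELAY, hi: int = MAX_DELAY) -> float:
    """Maximal deviation from the nominal tick length, in parts per million."""
    nominal = (lo + hi) / 2
    return (hi - lo) / 2 / nominal * 1e6


def worst_case_divergence(ticks: int, lo: int = MIN_DELAY, hi: int = MAX_DELAY) -> int:
    """Largest gap between two clocks after `ticks` ticks when one clock always
    ticks after `lo` and the other always after `hi` (nondeterministic model)."""
    return ticks * (hi - lo)


def sampled_divergence(ticks: int, rng: random.Random,
                       lo: int = MIN_DELAY, hi: int = MAX_DELAY) -> float:
    """Gap between two clocks after `ticks` ticks when every tick delay is drawn
    independently and uniformly from [lo, hi]."""
    t_a = sum(rng.uniform(lo, hi) for _ in range(ticks))
    t_b = sum(rng.uniform(lo, hi) for _ in range(ticks))
    return abs(t_a - t_b)


if __name__ == "__main__":
    ticks_per_frame = 1129 * 29          # one real frame: C * k0 ticks
    rng = random.Random(1)
    print(f"drift: {drift_ppm():.1f} ppm")
    print("worst case per frame:", worst_case_divergence(ticks_per_frame))
    print("sampled run per frame:", round(sampled_divergence(ticks_per_frame, rng)))
```

Since the per-tick deviations in the sampled run are independent, they partly cancel out, so the sampled gap typically stays far below the worst-case bound of \(\mathsf C \cdot \mathsf k _0 \cdot (\mathsf {max} -\mathsf {min} )\) time units per frame. Persistently fast or slow clocks are not captured, which is why the uniform-environment assumption above matters.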

Fig. 1. \(\mathbf {Clock} (i)\)

Fig. 2. \(\mathbf {WSN} (i)\)

Figure 2 describes the automaton \(\mathbf {WSN} (i)\) of [8], which is devoted to message sending. The automaton waits in the initial location \(\mathsf {WAIT} \) until the current slot number \(\mathsf {csn} [i]\) equals the TX slot \(\mathsf {tsn} [i]\) and \(\mathsf g \) ticks have occurred in that slot. Then the automaton moves to the location \(\mathsf {GO\_SEND} \), which is left immediately by performing a \(\mathsf {start\_message} [i]!\) action. This action leads the automaton to the location \(\mathsf {SENDING} \). The automaton remains in this location until the beginning of the tail interval (which starts after \(\mathsf {k _0}{-}\mathsf t \) ticks). Then the automaton returns to the location \(\mathsf {WAIT} \), where it increments the current slot number \(\mathsf {csn} [i]\) every \(\mathsf {k _0}\) ticks.
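To make the sending schedule concrete, the following sketch steps a simplified version of \(\mathbf {WSN} (i)\) one clock tick at a time and records when the node starts and stops sending. It is a hypothetical Python reading of the automaton, not the UPPAAL model itself: we assume that \(\mathsf {clk} [i]\) counts \(0,\ldots ,\mathsf k _0-1\) within a slot and that \(\mathsf {csn} [i]\) wraps around after \(\mathsf C \) slots; the function name `wsn_events` and its parameters are ours.

```python
def wsn_events(tsn: int, g: int, t: int, k0: int = 29, n_slots: int = 7,
               frames: int = 1):
    """Simulate a simplified WSN(i) tick by tick.

    Returns a list of (event, slot, clk) tuples. n_slots plays the role of C.
    """
    loc, clk, csn = "WAIT", 0, 0
    events = []
    for _ in range(frames * n_slots * k0):
        # One tick[i] of Clock(i): advance the clock, and the slot at its end.
        clk += 1
        if clk == k0:
            clk = 0
            csn = (csn + 1) % n_slots
        if loc == "WAIT" and csn == tsn and clk == g:
            # WAIT -> GO_SEND -> SENDING, emitting start_message[i]!
            loc = "SENDING"
            events.append(("start_message", csn, clk))
        elif loc == "SENDING" and clk == k0 - t:
            # The tail guard interval begins: stop sending.
            loc = "WAIT"
            events.append(("end_send", csn, clk))
    return events


if __name__ == "__main__":
    print(wsn_events(tsn=2, g=3, t=3, k0=29, n_slots=5))
```

For \(\mathsf {tsn} [i]=2\), \(\mathsf g =\mathsf t =3\) and \(\mathsf k _0=29\), the node emits \(\mathsf {start\_message} [i]!\) at tick \(3\) of slot \(2\) and stops sending at tick \(26\) of the same slot.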

Fig. 3. \(\mathbf {Synchronizer} (i)\)

Figure 3 contains the automaton \(\mathbf {Synchronizer} (i)\), which synchronizes the hardware clock. We enrich the corresponding automaton of [8] with a simulation of message loss on the channel \(\mathsf {start\_message} [i]\). UPPAAL SMC supports branching edges with associated weights for probabilistic choices. Thus we define an integer constant \(\mathsf {loss} \), with \(0\le \mathsf {loss} \le 100\), and every node either loses a message with weight \(\mathsf {loss} \) or receives it with weight \(100{-}\mathsf {loss} \). The automaton \(\mathbf {Synchronizer} (i)\) waits in its initial location \(\mathsf {S0} \) until it detects, in an active slot (\(\mathsf {csn} [i]<\mathsf n \)), the beginning of a new message from a neighbor \(j\) (action \(\mathsf {start\_message} [j]?\)). When this happens, the automaton moves to a committed location \(\mathsf C \) and immediately takes a branching edge where: (i) with weight \(\mathsf {loss} \) it returns to its initial location \(\mathsf {S0} \), and (ii) with weight \(100{-}\mathsf {loss} \) it goes to location \(\mathsf {S1} \). Case (i) formalizes the loss of the start message from node \(j\), while case (ii) formalizes its reception. Notice that UPPAAL requires input determinism to ensure that the system under test always produces the same outputs on any given sequence of inputs. Thus we need an extra intermediate location instead of branching immediately on the \(\mathsf {start\_message} [j]?\) action. Notice also that the \(\mathsf {start\_message} [j]!\) action occurs exactly when \(\mathsf {clk} [j]=\mathsf g \) (as a result of the synchronization of automata \(\mathbf {WSN} (j)\) and \(\mathbf {Clock} (j)\)), while node \(i\), by means of automaton \(\mathbf {Synchronizer} (i)\), resets the variable \(\mathsf {clk} [i]\) to \(\mathsf g +1\) after a further tick. 
The guard \(\mathsf {neighbor} (i,j)\) indicates that \(i\) and \(j\) are within transmission range of each other. Formally, \(\mathsf {neighbor} ()\) is a symmetric function related to the slot allocation in the following manner [8]:

$$\begin{aligned} \begin{array}{rcl} \mathsf {neighbor} (i,j) &{} \Longrightarrow &{} \mathsf {tsn} [i]\ne \mathsf {tsn} [j] \\ \mathsf {neighbor} (i,j) \wedge \mathsf {neighbor} (i,k) &{} \Longrightarrow &{} \mathsf {tsn} [j]\ne \mathsf {tsn} [k] \end{array} \end{aligned}$$
(1)

This means that whenever two nodes are neighbours or have a common neighbor, they must have distinct TX slot numbers. The function \(\mathsf {neighbor} ()\) is also helpful to provide a formal definition of synchronized sensor networks. Intuitively, a sensor network is synchronized if, whenever a node is sending in a given slot, all its neighbours are in the same slot.
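The probabilistic branch of \(\mathbf {Synchronizer} (i)\) described above can be mimicked outside UPPAAL. In this hypothetical Python sketch, `synchronizer_on_message` and its arguments are ours, `random.Random.choices` plays the role of the weighted branching edge out of the committed location, and the wrap-around of \(\mathsf {clk} [i]\) at the end of a slot is ignored for brevity.

```python
import random


def synchronizer_on_message(clk_i: int, loss: int, g: int, rng: random.Random):
    """React to start_message[j]? from a neighbor j during an active slot.

    Weight `loss` models the branch back to S0 (message lost); weight
    100 - loss models the branch to S1 (message received).
    Returns (received, value of clk[i] after the next tick).
    """
    if not 0 <= loss <= 100:
        raise ValueError("loss must be in [0, 100]")
    received = rng.choices([False, True], weights=[loss, 100 - loss])[0]
    # On reception, clk[i] is set to g + 1 at the next tick (the sender had
    # clk[j] = g when it started); on loss the hardware clock just advances.
    next_clk = g + 1 if received else clk_i + 1
    return received, next_clk


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 10_000
    hits = sum(synchronizer_on_message(5, 20, 3, rng)[0] for _ in range(trials))
    print(f"empirical reception rate with loss = 20: {hits / trials:.3f}")
```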

Definition 1.

A network is said to be synchronized if for all reachable states \((\,\forall i, j \in \mathsf {Nodes} \,)(\,(\,\mathsf {SENDING} _i \wedge \mathsf {neighbor} (i,j)\,) \Longrightarrow (\,\mathsf {csn} [i] = \mathsf {csn} [j]\,)\,)\).
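Definition 1 is a state predicate that the model checker evaluates over all reachable states. The sketch below evaluates it on a single, explicitly given state; the function name and the encoding of a state as Python lists are hypothetical. Passing `lambda i, j: i != j` as the neighbour relation yields the clique case.

```python
from typing import Callable, Sequence


def is_synchronized(sending: Sequence[bool], csn: Sequence[int],
                    neighbor: Callable[[int, int], bool]) -> bool:
    """Check the condition of Definition 1 on one state: every sending node
    agrees on the current slot number with all of its neighbours."""
    nodes = range(len(csn))
    return all(csn[i] == csn[j]
               for i in nodes if sending[i]
               for j in nodes if neighbor(i, j))


if __name__ == "__main__":
    line = lambda i, j: abs(i - j) == 1        # three nodes in a line: 0 - 1 - 2
    print(is_synchronized([False, True, False], [1, 1, 1], line))
    print(is_synchronized([False, True, False], [0, 1, 1], line))
```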

4 Our Analysis

The UPPAAL Statistical Model Checker [7] evaluates properties on execution runs of a network of probabilistic timed automata. The execution time of these runs is represented by a variable \(\mathtt {time} \), whose bound is set by the user. In particular, given a constant value \(\mathtt {bound} \), the Statistical Model Checker can answer queries of the form

$$\begin{aligned} \mathtt {Pr[time<=bound]\,(<> expr)} \end{aligned}$$

by performing an adequate number of runs to estimate the probability of reaching a state that satisfies the property \(\mathtt {expr} \) within the time \(\mathtt {bound} \). The user must fix two main statistical parameters, \(\alpha \) and \(\varepsilon \), both in the real interval \(\,]0,1[\,\). The answer provided by the tool is a confidence interval \([p-\varepsilon ,p+\varepsilon ]\), where \(\alpha \) is the probability that the answer is wrong, i.e., that the true probability lies outside the interval. The higher the precision of the analysis, the larger the number of runs performed by the simulator. Thus the waiting time for a reply to a query depends both on the length of the runs, i.e., on the parameter \(\mathtt {bound} \), and on the statistical parameters \(\varepsilon \) and \(\alpha \).
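The number of runs is fixed in advance from \(\alpha \) and \(\varepsilon \). Assuming that UPPAAL SMC uses the Chernoff-Hoeffding bound \(\lceil \ln (2/\alpha )/(2\varepsilon ^2)\rceil \) for probability estimation, the following sketch computes it; this assumption reproduces the \(2952\) runs reported for Table 3 (\(\alpha =0.05\), \(\varepsilon =0.025\)). The function name is ours.

```python
import math


def runs_needed(alpha: float, eps: float) -> int:
    """Number of runs given by the Chernoff-Hoeffding bound: with probability
    at least 1 - alpha the estimate lies within eps of the true probability."""
    if not (0 < alpha < 1 and 0 < eps < 1):
        raise ValueError("alpha and eps must lie in ]0,1[")
    return math.ceil(math.log(2 / alpha) / (2 * eps ** 2))


if __name__ == "__main__":
    for alpha, eps in [(0.05, 0.025), (0.05, 0.03)]:
        print(f"alpha={alpha}, eps={eps}: {runs_needed(alpha, eps)} runs")
```

Halving \(\varepsilon \) roughly quadruples the number of runs, which is why precise estimates on long runs are expensive.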

To make our analysis feasible, we investigate whether some system parameters can be changed without affecting the quality of the analysis. In particular, we focus on the parameter \(\mathsf C \), the number of slots composing a frame. The effectiveness of any synchronization protocol crucially relies on the exchange of timing information to synchronize neighbor nodes. In other words, the longer nodes remain silent, the quicker they get out of sync, because they do not get enough information to synchronize with each other. As a consequence, for a fixed number of active slots, if the system gets out of sync with probability \(p\) for a certain value of \(\mathsf C \), then the same system will get out of sync more quickly (or with higher probability) for a bigger value of \(\mathsf C \).

The parameter \(\mathsf {loss} \) expresses the probability of message loss at the physical level due to the unreliability of the wireless medium. In our analysis, we instantiate \(\mathsf {loss} \) according to the results reported in [17], where the packet delivery performance of WSNs has been studied at the physical layer under different transmission powers and physical-layer encodings. In that analysis, \(60\) Mica motes were used to measure packet delivery in three different environmental settings: an office building, an open parking lot, and a habitat with moderate foliage. Under these settings, the results show that the physical layer affects packet-delivery performance, measured as the fraction of packets not successfully received by the receiver within a time window.

For the sake of simplicity, all nodes of our networks are instantiated with the same value of the parameter \(\mathsf {loss} \). According to [17], this parameter is set to: \(10\), to approximate the average message loss in a parking lot; \(20\), to model the average message loss in an office building; and \(30\), to represent the average message loss in a habitat with moderate foliage.

4.1 Verifying Clique Topologies

Paper [8] derives necessary and sufficient constraints on the guard times to guarantee the correctness of the protocol on clique topologies in the case of perfect communication. These constraints depend on the clock ratio \(\mathsf {min} /\mathsf {max} \), on the parameter \(\mathsf k _0\) and on the maximal distance \(\mathsf M \) between two transmitting slots (see footnote 1); they do not depend on the size \(\mathsf N \) of the network. They are:

$$\begin{aligned} \begin{array}{rl} \mathsf g > &{} \big (1- \frac{\mathsf{min }}{\mathsf{max }}\big ) \cdot \mathsf{M }\cdot \mathsf{k }_0 + \frac{\mathsf{min }}{\mathsf{max }} \\ \mathsf{g } < &{} \big (1- \frac{\mathsf{max }}{\mathsf{min }}\big ) \cdot \mathsf{M }\cdot \mathsf{k }_0 + \mathsf{k }_0 - 2 \\ \mathsf{t } > &{} \big (1- \frac{\mathsf{min }}{\mathsf{max }}\big ) \cdot (\mathsf{k }_0 -\mathsf{g }) + \frac{\mathsf{min }}{\mathsf{max }} \end{array} \end{aligned}$$
(2)

From an analysis of these conditions, paper [8] demonstrates that guard time values \(\mathsf g =\mathsf t =3\) are sufficient to guarantee clock synchronization in a clique of arbitrary size.
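The bounds in (2) are easy to evaluate exactly. The sketch below uses rational arithmetic to avoid rounding issues and reports the admissible integer guard times. The value of \(\mathsf M \) is a hypothetical choice of ours: we take \(\mathsf M =1129\), i.e., we assume that the distance between two transmitting slots never exceeds the length of a real frame. All function names are illustrative.

```python
import math
from fractions import Fraction


def guard_bounds(min_d: int, max_d: int, M: int, k0: int):
    """Evaluate the g-constraints of (2); g must satisfy g_low < g < g_high."""
    r = Fraction(min_d, max_d)                       # clock ratio min/max
    g_low = (1 - r) * M * k0 + r
    g_high = (1 - Fraction(max_d, min_d)) * M * k0 + k0 - 2
    return g_low, g_high


def t_lower(min_d: int, max_d: int, k0: int, g: int) -> Fraction:
    """Lower bound on the tail guard time t from (2)."""
    r = Fraction(min_d, max_d)
    return (1 - r) * (k0 - g) + r


def smallest_int_above(x: Fraction) -> int:
    """Smallest integer strictly greater than x."""
    return math.floor(x) + 1


def largest_int_below(x: Fraction) -> int:
    """Largest integer strictly smaller than x."""
    return math.ceil(x) - 1


if __name__ == "__main__":
    MIN_D, MAX_D, K0 = 10**5 - 2, 10**5 + 2, 29
    M = 1129                                         # hypothetical worst case
    lo, hi = guard_bounds(MIN_D, MAX_D, M, K0)
    print("integer g from", smallest_int_above(lo), "to", largest_int_below(hi))
    print("smallest integer t for g = 3:",
          smallest_int_above(t_lower(MIN_D, MAX_D, K0, 3)))
```

Under this assumption the sketch gives \(3 \le \mathsf g \le 25\) and, for \(\mathsf g =3\), \(\mathsf t \ge 2\), which is consistent with the choice \(\mathsf g =\mathsf t =3\) of [8].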

Here we show that, in the presence of packet loss, the size of the clique network does play a crucial role. In particular, the bigger the clique, the higher the value of \(\mathsf g \) (and \(\mathsf t \)) must be to ensure clock synchronization. In other words, in the presence of message loss, for any fixed value of \(\mathsf g \) (and \(\mathsf t \)) there is a clique that gets out of sync with high probability.

For networks with full connectivity, clock synchronization means that all nodes of the network agree on the current slot. As a consequence, Definition 1 can be rephrased as in [8] in the following manner:

Definition 2.

A clique network is said to be synchronized if the following holds for all reachable states: \((\forall i, j \in \mathsf {Nodes} )(\mathsf {SENDING} _i \Longrightarrow \mathsf {csn} [i] {=} \mathsf {csn} [j])\).

So, in order to estimate the probability of going out of synchronization we will use UPPAAL SMC to perform the following quantitative check:

$$\begin{aligned} \begin{array}{rl} \mathtt {Pr[time<=bound]} &{} \mathtt {(<> exists(i:Nodes) exists(j:Nodes)} \\ &{} \mathtt {(WSN(i).SENDING and not(csn[i]==csn[j])))} \end{array} \end{aligned}$$
(3)
Table 2. On the parameter \(\mathsf C \)

4.1.1 Simulation setting

In our simulations on cliques, all protocol parameters satisfy the constraints in (2). As in [8], the guard time \(\mathsf t \) is chosen equal to \(\mathsf g \). Parameter \(\mathsf {tsn} [i]\) is set to \(i\), as full connectivity requires a different TX slot for each node. We set \(\mathsf k _0=29\). Unfortunately, we cannot set \(\mathsf C =1129\), as in the real implementation, because the length of the runs that can be analyzed by UPPAAL is limited: to avoid integer overflow, the parameter \(\mathtt {bound} \) cannot exceed \(2\cdot 10^{9}\). Hence, if we kept \(\mathsf C \) close to the real value, our execution runs would last for at most a single time frame and would be too short to provide any significant result. Following the discussion at the beginning of this section, we therefore perform our analysis for low values of the parameter \(\mathsf C \). This modification does not invalidate our analysis, since results for small values of \(\mathsf C \) are lower bounds for larger ones. As an example, in Table 2 we consider cliques with \(\mathsf N =10,15\), \(\mathsf {loss} =20\) and \(\mathsf g =3,4\). We then perform the quantitative check (3) by varying \(\mathsf C \) while keeping the number of observed time frames constant. The value \(p\) represents the center of the confidence interval computed by UPPAAL SMC. Every check required \(6623\) runs of the protocol and lasted for about four days on an Intel Core i5-2420M CPU at 2.30 GHz with 6 GB RAM. We gradually decreased \(\varepsilon \) (i.e., increased the precision) until the resulting interval no longer included the value \(0\); in other words, we looked for lower bounds \(p-\varepsilon > 0\).

Table 2 shows that, for a fixed number of time frames, the probability of the system going out of sync does not decrease as the parameter \(\mathsf C \) increases (similar results can be obtained for different values of \(\mathsf N \), \(\mathsf g \), \(\mathsf {loss} \) and for different topologies). As a consequence, our simulations provide a lower bound on the probability of getting out of sync in a setting with \(\mathsf C =1129\).

Table 3. Cliques and node number. Maximal run length. \(\alpha = 0.05\), \(\varepsilon =0.025\)

In Table 3 we study the behaviour of the protocol on cliques of up to \(30\) nodes. We vary the number of nodes \(\mathsf N \), the guard time \(\mathsf g \) and the parameter \(\mathsf {loss} \). Since we consider fully connected networks and the transmitting slots are grouped at the beginning of each time frame, we fix \(\mathsf C =\mathsf N +2\), as in [8], to allow at least two idle slots at the end of each frame. We set the statistical parameters to \(\varepsilon = 0.025\) and \(\alpha =0.05\) to obtain meaningful results. We check property (3) on the longest run UPPAAL SMC can handle without incurring integer overflow. The result of the quantitative check is represented by the probability \(p\), which is the center of the confidence interval computed by UPPAAL SMC. Every check required \(2952\) runs of the protocol. In the following table we report the time required by our simulations on an Intel Core i3-2310M CPU at 2.10 GHz with 4 GB RAM.

All runs in Table 3 are quite short (from about 20 to 60 frames, depending on \(\mathsf N \)); however, they are long enough to draw some significant observations. For instance, for a fixed guard time \(\mathsf g \), the probability of going out of sync increases when either \(\mathsf N \) or \(\mathsf {loss} \) increases. Moreover, for fixed \(\mathsf N \) and \(\mathsf {loss} \), the probability \(p\) decreases when the guard time \(\mathsf g \) increases. Note that Table 3 compares probabilities associated with runs of different lengths, since larger cliques are analyzed on shorter runs. Because the probability of going out of sync cannot decrease for longer runs, this comparison is still meaningful: if we fix \(\mathsf {loss} \) and \(\mathsf g \), the probability of getting out of sync increases when \(\mathsf N \) increases, even though the larger cliques are observed for fewer frames. At the end of this section we will compare runs of the same length.
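The run lengths quoted in this section follow from the overflow limit on \(\mathtt {bound} \). Assuming that one clock tick corresponds to roughly \(10^5\) model time units (since \(\mathsf {min} \) and \(\mathsf {max} \) are \(10^5 \mp 2\)) and that a frame lasts \(\mathsf C \cdot \mathsf k _0\) ticks, the sketch below estimates how many frames one run covers. The function name is hypothetical.

```python
TICK_UNITS = 10**5    # model time units per clock tick (min, max = 10^5 -/+ 2)
BOUND = 2 * 10**9     # largest usable value of `bound` (integer overflow limit)
K0 = 29               # clock ticks per slot


def frames_per_run(n_slots: int, k0: int = K0, bound: int = BOUND,
                   tick: int = TICK_UNITS) -> float:
    """Approximate number of frames of n_slots slots covered by one run."""
    return bound / (tick * n_slots * k0)


if __name__ == "__main__":
    print(f"C = 1129 (real setting): {frames_per_run(1129):.2f} frames")
    for n in (10, 15, 20, 30):              # cliques with C = N + 2
        print(f"clique, N = {n:2d}: {frames_per_run(n + 2):5.1f} frames")
    for c in (7, 9, 11):                    # grids with max degree 4, 6, 8
        print(f"grid, C = {c:2d}: {frames_per_run(c):5.1f} frames")
```

For cliques with \(\mathsf C =\mathsf N +2\) this gives roughly 57, 41, 31 and 22 frames for \(\mathsf N =10, 15, 20, 30\), in line with the approximate module lengths of 60, 40, 30 and 20 frames; for the grids with \(\mathsf C =7, 9, 11\) it gives about 99, 77 and 63 frames, and for \(\mathsf C =1129\) a run covers less than one frame.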

The analysis provided in Table 3 also shows that the protocol is clearly unsuitable in some scenarios. For instance, in a clique of at least \(10\) nodes with \(\mathsf g =3\), the system gets out of sync almost immediately with high probability if the loss probability is greater than \(0.2\). In other settings the results are less conclusive. This is the case of a clique with \(10\) nodes, \(\mathsf g =4\) and \(\mathsf {loss} =20\). In this case, our analysis says that the system gets out of sync with probability \(0.025\). This value coincides with \(\varepsilon \), so the lower end of the confidence interval is \(0\); moreover, it is eight times smaller than the loss probability. It is too small to conclude anything, at least on such a short run. Unfortunately, we cannot predict a priori the behaviour of the system on longer runs, as the probability \(p\) may either increase or stabilize. In the following we try to overcome this limitation.

UPPAAL SMC can simulate our systems only on runs of bounded length, which we call execution modules. At the beginning of an execution module all nodes are in the same time slot and have the same value in their clock variables. At the end of an execution module, UPPAAL SMC computes an estimate of the probability \(p\) of reaching a state that does not satisfy Definition 2. This definition does not identify a single state of the system: nodes may have different clock values while still being in the same time slot. We claim that the initial state, where all nodes begin the execution module with the same clock value, is the state with the smallest probability of leading the system out of sync. To support this claim, we provide an example in Table 4. We consider cliques of \(5\) and \(10\) nodes, with \(\mathsf g =4\), \(\mathsf C =7\) and \(\mathsf {loss} =20\). Table 4 shows experiments in which the system starts from a state that satisfies Definition 2 while internal clocks may have different values. The starting value of every internal clock is chosen at random from a fixed interval \(\mathcal I \) of clock values. Runs are \(30\) time frames long, and we set \(\alpha =0.05\). Note that the smallest desync probability is obtained when the execution module starts in the initial state, where all nodes have the same clock value. Similar results can be obtained for other values of \(\mathsf N \), \(\mathsf g \), \(\mathsf C \) and \(\mathsf {loss} \).

Table 4. Module comparison – \(\mathsf g =4\), \(\alpha =0.05\), run length \(30\) frames –

By virtue of this observation, we can divide a long run into consecutive execution modules, all starting in the initial state. Then we can derive, by composition, a lower bound on the probability of desynchronization for that run. Thus, if \([p - \varepsilon ,\,p + \varepsilon ]\) is the confidence interval provided by UPPAAL SMC after performing the quantitative check (3) within an execution module, then each module that starts in a synchronized state remains in sync with probability at most \(1-(p-\varepsilon )\), and the probability of going out of sync within \(n\) execution modules is at least

$$\begin{aligned} 1-(1-(p-\varepsilon ))^n. \end{aligned}$$
(4)
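Formula (4) treats each execution module as restarting from the initial state, which by the claim above is the least likely to desynchronize. The minimal sketch below simply evaluates it; the function name is ours.

```python
def desync_lower_bound(p_low: float, n: int) -> float:
    """Formula (4): lower bound on the probability of going out of sync within
    n consecutive execution modules, given the lower end p_low = p - eps of the
    confidence interval computed for a single module."""
    if not 0.0 <= p_low <= 1.0:
        raise ValueError("p_low must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Each module stays in sync with probability at most 1 - p_low.
    return 1.0 - (1.0 - p_low) ** n


if __name__ == "__main__":
    # Grid with degree 4, loss = 20, g = 7: interval [0.03, 0.09] per ~100-frame module.
    for n in (9, 18, 27, 36):   # about 900, 1800, 2700 and 3600 frames
        print(n, round(desync_lower_bound(0.03, n), 3))
```

For instance, for the grid case with degree \(4\), \(\mathsf {loss} =20\) and \(\mathsf g =7\), whose single-module interval is \([0.03, 0.09]\), `desync_lower_bound(0.03, 27)` exceeds \(0.54\), consistent with the projection to \(2700\) time frames discussed for that case.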

Table 5 extends the results of Table 3 to longer runs, lasting \(300\), \(600\) and \(900\) time frames, respectively. The fourth column of Table 5 reports the lower bound of the confidence interval of an execution module. When \(\mathsf N =10, 15, 20, 30\), the execution module studied in Table 3 lasts for approximately \(60, 40, 30\) and \(20\) time frames, respectively. Thus, by applying formula (4) with \(n=5, 8, 10, 15\), we obtain a lower bound on the probability of being out of sync within \(300\) time frames for cliques with \(10\), \(15\), \(20\) and \(30\) nodes, respectively. Analogously, with \(n=10, 15, 20, 30\) and \(n=15, 22, 30, 45\) we obtain lower bounds on the probability of being out of sync within \(600\) and \(900\) time frames, respectively.

Table 5. Quantitative check on cliques with \(\mathsf {loss} = 20\) and \(\alpha =0.05\)

As discussed at the beginning of this section, the values of Table 5 also represent lower bounds on the probability of desynchronization for the real implementation. In the real setting, with \(\mathsf C =1129\) and a clock frequency of \(32\) kHz, a time frame lasts for about \(1\) s. As a consequence, Table 5 gives lower bounds on the probability of getting out of sync within \(5\), \(10\) and \(15\) min. Thus, when \(\mathsf g =3\), the probability of getting out of sync is low enough for small networks (around \(10\) nodes), but when \(\mathsf N =15\) it is almost \(0.4\) within \(15\) min. When \(\mathsf N =20\) the probability reaches \(0.7\) in less than \(15\) min, and when \(\mathsf N =30\) it reaches \(0.7\) in less than \(5\) min. These results show that the desync probability increases with the number of nodes.

4.2 Verifying Grid Topologies

Clock synchronization in clique topologies has been studied in [8] as a first step towards more realistic topologies. Usually sensor nodes have a limited number of neighbours and cannot communicate directly with the whole network. In this section, we study how the gMAC synchronization protocol behaves on regular topologies, which better simulate a uniform node distribution in a given area (see footnote 2). In particular, we focus on grid topologies where nodes have a uniform number of neighbours. Unlike for cliques, there are no theoretical results suggesting how to choose protocol parameters to guarantee the synchronization of grid networks in the case of non-lossy communication. The implementation of gMAC adopts a high guard time, \(\mathsf g =9\), to ensure synchronization in networks with arbitrary topologies. In this section, we study whether high values of \(\mathsf g \) guarantee node synchronization in grid-based networks in the case of lossy communication.

In our simulations we focus on a small sensor network where nodes are placed in a \(5{\times }5\) grid, thus \(\mathsf N =25\). Unlike cliques, grid topologies do not need a different TX slot for each node: we can allocate the same TX slot to different nodes provided that any two nodes that are neighbours or have a common neighbor get distinct TX slot numbers. Since the number of TX slots is limited in the gMAC implementation, we consider the minimum number of TX slots that satisfies conditions (1). This number depends on the number of neighbours of each node: if a node \(v\) has \(k\) neighbours, then we need a TX slot in which \(v\) transmits and all its neighbours listen, and \(k\) distinct slots in which each neighbor transmits and \(v\) listens. Thus, if \(k\) is the maximum node degree, we need at least \(k+1\) TX slots.
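Conditions (1) and the \(k+1\) lower bound can be checked mechanically on a small grid. The sketch below does so for the maximum degrees \(4\), \(6\) and \(8\) considered in this section. The concrete allocations (\((x+2y) \bmod 5\), \((x+3y) \bmod 7\) and \((x \bmod 3) + 3(y \bmod 3)\)) and the choice of diagonal for the degree-6 grid are our own hypothetical examples, not necessarily those used in the experiments; all function names are illustrative as well.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Alloc = Callable[[int, int], int]

# Neighbourhood offsets; the degree-6 grid uses one anti-diagonal (an assumption).
OFFSETS: Dict[int, List[Tuple[int, int]]] = {
    4: [(1, 0), (-1, 0), (0, 1), (0, -1)],
    6: [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)],
    8: [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)],
}


def allocation(degree: int) -> Alloc:
    """Hypothetical TX-slot allocation using exactly degree + 1 slots."""
    if degree == 4:
        return lambda x, y: (x + 2 * y) % 5
    if degree == 6:
        return lambda x, y: (x + 3 * y) % 7
    if degree == 8:
        return lambda x, y: (x % 3) + 3 * (y % 3)
    raise ValueError(f"unsupported degree {degree}")


def neighbours(x: int, y: int, degree: int, size: int):
    """Grid positions adjacent to (x, y) inside a size x size grid."""
    for dx, dy in OFFSETS[degree]:
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield nx, ny


def satisfies_conditions_1(alloc: Alloc, degree: int, size: int = 5) -> bool:
    """Check conditions (1) at every node of the grid."""
    for x, y in product(range(size), repeat=2):
        slots = [alloc(nx, ny) for nx, ny in neighbours(x, y, degree, size)]
        if alloc(x, y) in slots:            # a node shares a TX slot with a neighbour
            return False
        if len(set(slots)) != len(slots):   # two neighbours of (x, y) share a TX slot
            return False
    return True


def slots_used(alloc: Alloc, size: int = 5) -> int:
    """Number of distinct TX slots used on the grid."""
    return len({alloc(x, y) for x, y in product(range(size), repeat=2)})


if __name__ == "__main__":
    for degree in (4, 6, 8):
        a = allocation(degree)
        print(degree, satisfies_conditions_1(a, degree), slots_used(a))
```

A naive two-slot checkerboard, `lambda x, y: (x + y) % 2`, fails the check on the degree-4 grid, since the two neighbours to the right of and above a corner node would share a slot.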

In the following we analyze the behaviour of the protocol on grid topologies with three possible maximum node degrees: \(4\), \(6\) and \(8\). These grid networks require at least \(5\), \(7\) and \(9\) TX slots, respectively. Below we report the three topologies we consider, along with a simple slot allocation that satisfies conditions (1) using exactly \(k+1\) TX slots, where \(k\) is the maximum node degree. The grid structures outline the network topology, while the identifiers \(0,1,\ldots \) show the TX slot allocated to the corresponding node.

As for cliques, our analysis does not lose generality if we consider small values of \(\mathsf C \). Thus we pick \(\mathsf C =7,9,11\) for the three cases, respectively: depending on the maximum node degree, a single time frame is composed of \(5, 7, 9\) TX slots plus \(2\) idle slots. TX slots are allocated according to the distributions depicted above. We set \(\mathsf k _0=29\) and vary the parameter \(\mathsf {loss} \), as done for cliques. Then we apply UPPAAL SMC to perform the following quantitative check, according to Definition 1:

$$\begin{array}{l} \mathtt{Pr[time<=bound]\ (<>\ exists(i:Nodes)\ exists(j:Nodes)} \\ \quad \mathtt{(neighbor(i,j)\ and\ WSN(i).SENDING\ and\ not(csn[i]==csn[j])))} \end{array} \tag{5}$$
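Query (5) asks for the probability that, within \(\mathtt{bound}\) time units, some node \(i\) is in its SENDING location while a neighbour \(j\) holds a different current slot number. The sketch below evaluates that state predicate on a single snapshot; the snapshot format (a list of csn values, a set of sending nodes, an adjacency map) and the function name are hypothetical and do not reflect how the UPPAAL model stores its state.

```python
from typing import Dict, List, Set


def out_of_sync(csn: List[int], sending: Set[int],
                neighbours: Dict[int, Set[int]]) -> bool:
    """True if some sending node i has a neighbour j with csn[i] != csn[j]."""
    return any(
        csn[i] != csn[j]
        for i in sending
        for j in neighbours.get(i, ())
    )


# Three nodes on a line: 0 - 1 - 2
line = {0: {1}, 1: {0, 2}, 2: {1}}
print(out_of_sync([3, 3, 3], {1}, line))  # False: all nodes agree on slot 3
print(out_of_sync([3, 3, 4], {1}, line))  # True: node 1 sends, node 2 is in slot 4
print(out_of_sync([3, 3, 4], {0}, line))  # False: node 0's only neighbour agrees
```

The \(\mathtt{<>}\) operator then turns this snapshot check into a reachability question over a whole simulated run.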

Again, we consider the longest run UPPAAL SMC can handle without integer overflow, by setting \(\mathtt {bound} =2\cdot 10^9\). This means that an execution module lasts for almost \(100\), \(80\) and \(65\) time frames when the maximum node degree is \(4\), \(6\) and \(8\), respectively. These are quite short runs, but long enough to conclude that the system may get out of sync even for high values of the guard time \(\mathsf g \). The results of the quantitative check are reported in Table 6, where the value \(p\) is the center of the confidence interval computed by UPPAAL.
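The relative run lengths follow from the frame sizes alone. Let \(\theta \) be the average duration of a clock tick in model time units and \(F_k\) the number of time frames in a run for maximum node degree \(k\) (both symbols are introduced here; \(\theta \) is not given in this section). Assuming a frame consists of \(\mathsf C \) slots of \(\mathsf k _0\) ticks each,

$$F_k \approx \frac{\mathtt{bound}}{\mathsf C \cdot \mathsf k _0 \cdot \theta }, \qquad \frac{F_6}{F_4} = \frac{7 \cdot 29}{9 \cdot 29} = \frac{7}{9} \approx 0.78, \qquad \frac{F_8}{F_4} = \frac{7 \cdot 29}{11 \cdot 29} = \frac{7}{11} \approx 0.64,$$

so about \(100\) frames for degree \(4\) correspond to about \(78\) and \(64\) frames for degrees \(6\) and \(8\).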

Table 6. \(5\times 5\) grids with maximum node degree \(4\), \(6\) and \(8\); maximal run length; \(\alpha = 0.05\), \(\varepsilon =0.03\)
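The values \(\alpha \) and \(\varepsilon \) fix how many simulation runs each estimate needs. Assuming the estimation rests on the Chernoff-Hoeffding bound (the classic basis of UPPAAL SMC's probability estimation; newer releases may use other interval methods), \(N \ge \ln(2/\alpha)/(2\varepsilon^2)\) runs ensure that the true probability lies in \([p-\varepsilon, p+\varepsilon]\) with confidence \(1-\alpha \), where \(p\) is the observed frequency. The helper names are hypothetical.

```python
import math


def chernoff_runs(alpha: float, eps: float) -> int:
    """Runs N such that P(|p_hat - p| > eps) <= alpha, by Chernoff-Hoeffding."""
    return math.ceil(math.log(2 / alpha) / (2 * eps ** 2))


def estimate(successes: int, runs: int, eps: float):
    """Empirical frequency p_hat and the interval [p_hat - eps, p_hat + eps]."""
    p_hat = successes / runs
    return p_hat, (p_hat - eps, p_hat + eps)


print(chernoff_runs(0.05, 0.03))  # 2050
```

Under that assumption, each entry of Table 6 rests on \(2050\) simulated runs, and \(p\) is the midpoint of an interval of width \(2\varepsilon = 0.06\).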

The compositional reasoning on execution modules adopted for cliques easily generalizes to grid topologies. Table 7 fixes \(\mathsf {loss} =20\) and \(\mathsf g =6\), and reports lower bounds on the probability of getting out of sync within \(900\), \(1800\), \(2700\) and \(3600\) time frames. As said before, in the real implementation a time frame lasts around \(1\) s. Thus, with \(\mathsf g =6\) and a message loss of \(20\%\), the desync probability exceeds \(0.5\) in less than \(15\) min for degree \(4\), in less than \(30\) min for degree \(6\), and in less than \(45\) min for degree \(8\). Table 7 also shows how the performance of the protocol depends on the node degree: the probability of getting out of sync decreases as the node degree increases.
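One way to read the projection from execution modules to longer horizons: assume that every module starting in sync gets out of sync with probability at least \(p_{low}\) (the lower end of the interval for one module). Then \(m\) consecutive modules all stay in sync with probability at most \((1-p_{low})^m\). The sketch below encodes this hypothetical bound; the exact composition rule used for cliques is not restated in this section, and the inputs in the usage lines are illustrative only.

```python
import math


def desync_lower_bound(p_low: float, frames_per_module: float,
                       horizon_frames: int) -> float:
    """Lower bound on P(out of sync within horizon_frames).

    Assumes every execution module that starts in sync gets out of sync with
    probability at least p_low; only complete modules are counted.
    """
    modules = math.floor(horizon_frames / frames_per_module)
    return 1.0 - (1.0 - p_low) ** modules


def modules_needed(p_low: float, target: float = 0.5) -> int:
    """Smallest number of modules for which the bound reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_low))


# Illustrative inputs only (not values from Table 6 or Table 7):
print(round(desync_lower_bound(0.1, 100, 900), 4))  # 9 modules -> 0.6126
print(modules_needed(0.03))                          # 23
```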

Finally, let us give a taste of what happens when \(\mathsf g =7\). From the results in Table 6, we take the case of degree \(4\) and \(\mathsf {loss} =20\), where the probability of getting out of sync within \(100\) time frames lies in the interval \([0.03 \, , \, 0.09]\). Projected to \(2700\) time frames, the probability of getting out of sync becomes greater than \(0.54\). In a real deployment, where a time frame lasts around \(1\) s, this means that the probability of getting out of sync exceeds \(0.5\) in less than \(45\) min.
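As a sketch of that projection, assume that every execution module starting in sync gets out of sync with probability at least \(0.03\), the lower end of the interval above. Since a module lasts slightly less than \(100\) frames, a horizon of \(2700\) frames contains at least \(27\) complete modules, so

$$\Pr[\text{out of sync within } 2700 \text{ frames}] \;\ge\; 1-(1-0.03)^{27} \approx 1-0.439 = 0.561,$$

which is consistent with the projection above.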

Table 7. Quantitative check on \(5\times 5\) grids with \(\mathsf {loss} = 20\) and \(\mathsf g =6\)

In conclusion, with lossy communication, small grid topologies have a high probability of getting out of sync even for high values of the guard time \(\mathsf g \) and low values of the loss probability. Moreover, the probability of getting out of sync increases as the maximum node degree decreases.

5 Conclusions, Future and Related Work

Our work has been strongly inspired by a recent analysis [8] of the gMAC synchronization protocol on clique and line topologies, in the ideal case of non-lossy communication. That analysis provides constraints on the protocol parameters that are both necessary and sufficient for the correctness of the protocol for cliques of arbitrary size. Here we have carried the work of [8] over to the case of lossy communication. We have enriched their model, obtaining a network of probabilistic timed automata [6], which we verified by Statistical Model Checking within the UPPAAL toolset [7]. As a main result, we have shown that, in the presence of message loss, the constraints provided in [8] may not be sufficient to ensure clock synchronization of cliques of arbitrary size. We have then applied our analysis to small grid topologies and again found that, with lossy communication, the nodes of the grid may get out of sync with high probability. More interestingly, grid topologies with higher node degree have a smaller probability of desynchronization. This leads us to conjecture that higher connectivity helps synchronization protocols. In this respect, among regular topologies, cliques are those with the best performance!

As in [8], we have assumed a fixed slot allocation. However, the implementation of gMAC includes a probabilistic dynamic slot allocation algorithm. The only analysis of the probabilistic gMAC algorithm we are aware of appears in [16]. In that paper, mobile sensors do not use a fixed schedule to control medium access but instead employ gMAC’s fully decentralized slot allocation, in which gossiping allows each node to decide when to send. Paper [16] analyzes the energy efficiency of gMAC under the assumption of perfect clock synchronization. The protocol, formalised in the MoDeST language [4], is evaluated using the discrete-event simulator of the Möbius tool suite. We plan to study the performance of gMAC with dynamic slot allocation in the case of lossy communication and realistic clocks. To this end, we intend to model the delay between consecutive clock ticks more realistically, using either a (truncated) normal or a (truncated) exponential distribution.
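As a rough illustration of such a clock model, inter-tick delays can be drawn from a normal distribution, or from an exponential distribution shifted to start at the lower bound, truncated to an interval \([\theta_{min}, \theta_{max}]\). The sketch below uses rejection sampling for the normal case and inverse-CDF sampling for the exponential case; the bounds, mean, deviation and rate are hypothetical placeholders, not parameters of gMAC or of our UPPAAL model.

```python
import math
import random


def truncated_normal(rng: random.Random, mu: float, sigma: float,
                     lo: float, hi: float) -> float:
    """Normal(mu, sigma) conditioned on [lo, hi], by rejection sampling.

    Efficient only when [lo, hi] carries a reasonable share of the mass.
    """
    while True:
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x


def truncated_exponential(rng: random.Random, rate: float,
                          lo: float, hi: float) -> float:
    """lo + Exp(rate), conditioned on being <= hi, by inverse-CDF sampling."""
    mass = 1.0 - math.exp(-rate * (hi - lo))   # P(Exp(rate) <= hi - lo)
    u = rng.random()                           # uniform in [0, 1)
    return lo - math.log(1.0 - u * mass) / rate


# Hypothetical tick-delay bounds and parameters (arbitrary time units).
rng = random.Random(1)
theta_min, theta_max = 0.99, 1.01
normal_ticks = [truncated_normal(rng, 1.0, 0.005, theta_min, theta_max) for _ in range(5)]
exp_ticks = [truncated_exponential(rng, 50.0, theta_min, theta_max) for _ in range(5)]
```

Rejection sampling is the simplest correct choice when the interval is centred on the mean; inverse-CDF sampling avoids wasted draws for the exponential case, whose truncated CDF inverts in closed form.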

Statistical Model Checking allows us to study larger networks than state-of-the-art model checking tools such as PRISM [9, 11]. SMC can be seen as a trade-off between testing and formal verification: it performs an appropriate number of simulations and processes them with statistical algorithms to decide whether a given property is satisfied with a certain probability. Unlike an exhaustive approach, a simulation-based solution does not guarantee a correct result with \(100\%\) confidence; it is only possible to bound the probability of making an error. In order to study bigger systems with higher confidence, paper [5] proposes a distributed implementation of UPPAAL SMC based on a master/slave architecture, where several computers generate simulations and a single master process collects them and performs the statistical test. We plan to adopt this approach to increase the confidence of the results obtained in this paper.
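To illustrate the master/slave organisation in miniature, the sketch below splits a fixed run budget among worker threads that each simulate a stand-in Bernoulli model and return a success count, while a single master sums the counts and forms the estimate \(\hat p \pm \varepsilon \). Threads stand in for separate computers, and `simulate_run`, `worker`, `master` and `p_true` are hypothetical placeholders; this does not reproduce the actual distributed implementation of [5].

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_run(rng: random.Random, p_true: float) -> bool:
    """Hypothetical stand-in for one simulation: True if the property held."""
    return rng.random() < p_true


def worker(seed: int, runs: int, p_true: float) -> int:
    """Slave: perform `runs` simulations and report how many satisfied the property."""
    rng = random.Random(seed)
    return sum(simulate_run(rng, p_true) for _ in range(runs))


def master(total_runs: int, n_workers: int, p_true: float, eps: float):
    """Master: distribute the run budget, collect counts, return (p_hat, interval)."""
    share, extra = divmod(total_runs, n_workers)
    budgets = [share + (1 if w < extra else 0) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        counts = list(pool.map(worker, range(n_workers), budgets,
                               [p_true] * n_workers))
    p_hat = sum(counts) / total_runs
    return p_hat, (p_hat - eps, p_hat + eps)


# Usage with an illustrative model probability of 0.3:
p_hat, (lo, hi) = master(total_runs=2050, n_workers=4, p_true=0.3, eps=0.03)
print(f"p in [{lo:.3f}, {hi:.3f}]")
```

With a fixed run budget, splitting the runs among workers does not bias the estimate; sequential tests that stop adaptively need more care when simulations arrive from several machines.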