Measures of region failure survivability for wireless mesh networks

Wireless mesh networks (WMNs) are considered as a promising alternative to wired local, or metropolitan area networks. However, owing to their exposure to various disruptive events, including natural disasters, or human threats, many WMN network elements located close to the failure epicentre are frequently in danger of a simultaneous failure, referred to as a region failure. Therefore, network survivability, being the ability to provide the continuous transmission after a failure, is of great importance. In this paper, we define three new measures of wireless mesh networks survivability for a region failure scenario, including the region failure survivability function, p-fractile region survivability function, and the expected percentage of total flow delivered after a failure as a function of region radius r. These measures are next used to evaluate the vulnerability of example wireless mesh networks to region failures.

element from the failure epicentre. This in turn leads to the concept of a region failure [16][17][18][19], where many nodes may fail simultaneously, even if the failure epicentre is located far away from them, but the area of a negative influence is large enough.
According to [17], a region of failures may be defined with respect to either network topology, or network geometry. Since the main factor influencing the possibility of node failures as a consequence of real-world natural disasters or attacks is the distance of nodes from the event epicentre, geometrical representation of a failure region determined by a circular area of radius r, shown in Fig. 1, is commonly used [16,17].
Majority of works on region failures consider the model of a single region, where, at a given time, failures are restricted to one region only. However, there are also some papers investigating the case of simultaneous failures occurring in multiple regions (see e.g., [17]).
Region-based failures need to be evaluated in detail, since their impact on wireless mesh networks performance degradation, measured e.g., in terms of the fraction of flow that survived failures of nodes located inside a given failure region, depends on characteristics of disruptive events (e.g., the strength of an earthquake, or the force of a bomb explosion)-in particular on the size of the respective failure region.
However, to the best of our knowledge, there are currently no such proposals available in the literature appropriate for measuring the vulnerability of wireless mesh networks to region failures of differentiated radiuses r.
The main achievement of this paper is the introduction of three new measures of WMNs survivability for a region failure scenario assuming circular failure areas with random location of failure epicentres, i.e.: • region failure survivability function (RFS) being the cumulative probability of all region failure scenarios d, for which at least w percent of flows are successfully served after failures, • p-fractile region survivability function (PFRS), providing information on total flow reduction to at most w percent after a failure at certain probability p, • expected percentage of total flow delivered after a region failure as a function of region radius r (EPFD).
These measures can be utilized to evaluate the vulnerability of a given wireless mesh network to region failures, as well as to provide comparisons of characteristics between different WMNs.
The rest of the paper is organized as follows. Section 2 presents related works, while Sect. 3 includes details of the assumed network model. Proposed measures to evaluate the vulnerability of wireless mesh networks to region failures are next introduced in Sect. 4, and are followed by a description of methodology of WMN survivability evaluation (Sect. 5). Section 6 presents results of simulations performed for example network topologies assuming the link cost metric based on the distance between nodes. Section 7 concludes the paper.

Related works
Recent works on network survivability evaluation provide the respective methodology mainly for wired networks (see e.g., [20][21][22][23][24][25]). Only a few of them refer to wireless networks. Early papers on wireless networks survivability focused on connectivity of a network topology as a measure of fault-tolerance [26]. In general, this metric can be used to give a binary answer to the question, whether transmission after a simultaneous failure of k-1 elements (nodes/links) is possible. If the answer is positive, then the network is said to be k-connected. Specific variations of this metric include e.g., average connectivity [27], distance connectivity [28], or path connectivity [29].
However, they are all not suitable for a region failure scenario, where faults of network elements occur only in bounded areas. To capture these characteristics, regionbased connectivity was introduced (see e.g., [15][16][17], and [26]). In particular, in [26] it was shown that the difference between connectivity and region-based connectivity can be arbitrarily large.
Papers considering the circular region failure scenario in WMNs include the models of: • deterministic failures (e.g., the single circular model [17]). In this case, any node located within the failure region is always assumed to fail with probability 1, • probabilistic failures, where probability that a node is affected by a disruptive event depends on the distance between the node and the failure epicentre [15]. In r Fig. 1 Example of a region failure: dark grey circle centred at some point of Euclidean space (i.e., the epicentre of disruptions), and characterized by a given radius r, represents the area of possible failures of WMN nodes general, this probability is assumed to decrease when locating the node farther away from the failure epicentre.
Natural disasters, or attacks are rarely deterministic, and nodes being within their scope fail with a certain probability. Therefore, the use of probabilistic models is more appropriate. However, known probabilistic approaches also have limitations. In particular, the authors of [15] assume that the size of a failure region (defined by radius r) is constant. Another constraint in [15] is that probability p n of node n failure, even though it is decreasing with the distance r n of node n from the failure epicentre, is constant in each i-th area between two consecutive concentric annuluses, as given in Fig. 2a. This in turn may result in over-, or underestimating the node failure probability values in some areas (which was in fact shown in that paper).
The main purpose of connectivity measures presented in [16,17], and [26] is to determine whether transmission in WMNs is possible between pairs of non-faulty nodes. However, no survivability functions are currently available in the literature that are designed to evaluate vulnerability of WMNs to failures occurring in regions of differentiated radiuses r. Our paper is thus the first one to propose such measures for the case of varying radius r of a failure region, and utilizing the continuous function of node failure probability (see Fig. 2b and Eq. 3), thus covering the models from [15,17] as special cases.
It is worth noting that some survivability measures have been proposed in the literature so far only for random failure scenarios in wired networks (see e.g., [30]), assuming that failures of network elements are statistically independent and equally probable. This is, however, completely in contrast to common characteristics of region failures in WMNs, which makes any comparison of measures introduced here with the ones for wired networks (e.g., from [30]) inadequate.

Network model
In this paper, we assume that the network topology is given by a graph G = (N, E), where N is the set of wireless mesh nodes, and E is the set of directed edges e h = (i, j). Each WMN link between nodes i and j is modelled by two edges in opposite directions. Location of any node n is given by coordinates (x n , y n ). Nodes are considered here to be stationary (which is a common scenario in WMNs [31]). However, analysis presented in this paper can be still valid for mobile nodes assuming evaluation of a network topology at certain time t (i.e., investigating the respective instant topology at time t).
Available capacity of any WMN link is a result of influence of multiple factors, the most important ones being: medium access protocol implementation, interchannel interference implied by the respective link scheduling algorithm [32,33], as well as time-varying factors including e.g., weather-based disruptions caused by heavy rain falls (general propagation conditions) [10,34]. As a result, even in case of stationary WMN nodes, effective link capacity changes over time. In such a scenario, when evaluating survivability properties of WMNs, it is proper to analyze them at certain time t. Therefore, in this paper we assume that at the observation time t, each edge e h is characterized by capacity c h (t).
The set of demands, denoted as K, consists of demands k defined by ordered triples (p k , q k ,d k ), i.e., including the respective source and destination nodes p k and q k , as well as the demanded capacityd k .
Two matrices will be used in our model description: A nn and D nn . Node-to-node incidence matrix A nn is used to provide the connectivity information, with elements a ij defined as given in Eq. 1.
Elements d pq of matrix D nn store information about aggregate capacitiesd k required for flows (commodities) k between given pairs of end nodes.
Each time, location of a failure epicentre is chosen at random within the smallest rectangular area containing the network. We assume a probabilistic failure scenario, in which any disruptive event affects nodes localized within a given radius r from the failure epicentre. In particular, in our model: • radius r of a failure circular region is uniformly distributed over (0, r max ), where r max is assumed to be equal to half the largest Euclidean distance between any two nodes in the network, • probability P(r n ) of node n failure is given by a decreasing continuous function of distance r n between node n and the failure epicentre (see Fig. 2b and Eq. 3). P(r n ) introduced here is thus the generalization of the respective formula from [15]. where: (x n , y n ) are coordinates (location) of node n, (x, y) are coordinates (location) of the failure epicentre, r is the radius of a failure region, r n is the distance of node n from the failure epicentre.
Our definition of a WMN node failure probability function given by Eq. 3 is justifiable, since, as stated in [15], the power of real physical attacks (including e.g., bomb explosions, electromagnetic pulse (EMP) attacks), or natural disasters (e.g., earthquakes, floods, etc.) is known to attenuate gradually with the increase of the distance of WMN nodes from the failure epicentre. Following [15], the maximum value of node failure probability equals 1 (if the failure epicentre matches exactly the location of node n), while its minimal value (i.e., 0) is assumed for nodes located at a distance r n of at least r from the failure epicentre.
This gradual attenuation of P(r n ) values with the increase of the distance r n can be disturbed by environmental factors including e.g., topography, or node protection characteristics. However, if we neglect them to simplify the analysis (e.g., as in [15]), we obtain the linear decrease of probability P(r n ) of a node n failure with the increase of the distance r n between this node and the centre of a disruptive event, as proposed in Eq. 3.

Proposed measures of a WMN survivability evaluation
In the remaining part of the paper: d is used to denote a region failure scenario, determined by a set of nodes being nonoperational after the outage, P(d) is the probability of a failure scenario d occurrence, W(d) is the random variable denoting the percentage w of flows successfully delivered in a given failure scenario d, f w denotes the probability density function of percentage of flows W to survive the region failure, i.e., PðdÞ ð 4Þ We introduce the following measures of a wireless mesh network survivability for a region failure scenario: a. region failure survivability function (RFS) of the percentage w of flows successfully transmitted after region failures: According to Eq. 5, for any value of w, RFS(w) is defined as the cumulative probability of all region failure scenarios d (i.e., for differentiated radiuses r of failure regions), for which at least w percent of flows are successfully transmitted after failures. Therefore, it can be also expressed as the reverse cumulative distribution function of W. Although Eq. 5 has some similarities with the respective one from [30] for wired networks, calculation of P(d) values is completely different (see Sect. 5).
b. p-fractile region survivability (PFRS): As given in formula (6), the value of p-fractile region survivability is defined as the minimum percentage w of flows successfully delivered after a region failure, for which the probability of not exceeding this value is equal to p. In other words, PFRS gives us useful information about probability p that the total flow will be reduced to at most w percent after a region failure.
The common feature of RFS and PFRS measures is that they do not depend directly on radius r (i.e., they allow radius r to take any value from (0, r max ) interval). These functions are thus designed to give general information on network vulnerability to region failures.
RFS and PFRS measures are thus convenient to use if the objective is to analyze the performance of wireless mesh networks independent of the size r of a failure region. Although they are similar to each other in terms of utilization scenarios, information they provide is of a different type. For instance, if for a given WMN at least w percent of traffic should be delivered independent of region failures (e.g., because such a portion of traffic is considered to be critical based on the Service Level Agreement between network operator and the customers), then RFS measure will provide appropriate information about probability p of fulfilling this requirement under region failures independent of the failure region size r. Naturally, the greater is the value of p, the better.
PFRS, is in turn useful for a network operator to see, given probability p, what is the upper bound on the fraction w of flow surviving a region failure, e.g., in statements like: ''with probability 0.5 the total flow will be reduced to at most 60 %''. This measure is thus useful to provide information on probability that not all of w percent of flows (e.g., referred to as the critical flow) will survive the region failure.
Unlike RFS and PFRS, the following EPFD function provides a detailed characteristics with respect to particular radiuses r of failure regions.
c. expected percentage of total flow delivered after a failure (EPFD) as a function of region radius r: where: r is the radius of a failure region, f W (w,r) is the probability density function of W determined for region failures of a given radius r f W ðw; rÞ ¼ X d:WðdÞ¼w;r PðdÞ ð 8Þ As given in formula (7), EPFD(r) is defined as the expected value of percentage of flows to survive node failures occurring in circular regions of a given radius r, i.e., calculated using the probability density function f W (w, r) determined for region failures of a specified radius r [formula (8)].
Example scenarios of EPFD measure utilization include performance analysis/comparison of WMNs in case of region failures related e.g., to natural disasters (like flood, or volcano eruption) with expected radiuses r of the failure region. Also, when expecting a failure characterized by a given radius r (e.g., incoming flood), network operator can use this information to predict its impact on the WMN performance, and, if possible, try to take preventive actions.
In the latter part of the paper, we will provide information on how to utilize the introduced measures to evaluate vulnerability of wireless mesh networks to region failures. We will also show how to use them to compare characteristics of several different WMN topologies.

Methodology of a WMN survivability evaluation
In this section, we show how to evaluate survivability characteristics of a wireless mesh network under region failures. In particular, this section is to explain the methodology of determining RFS, PFRS, and EPFD survivability characteristics for WMNs. Proposed measures can be derived from the auxiliary function F[w], where w [ {0,1,…, 100}, providing information on the frequency a given percentage w of flows was successfully delivered after region failures.
In real life, F[w] values would be collected for an existing wireless mesh network based on observations of the network performance after consecutive occurrences of disruptive events bringing about the region failures of WMN nodes. However, taking into consideration long values of real inter-failure time, typically measured in terms of months/years, deriving any survivability characteristics based on real-life experiments would be extremely time-consuming, and, therefore, practically impossible. It is thus crucial to define an iterative procedure to simulate consecutive region failures in a realistic way that eliminates the inter-failure time from evaluations. Additional benefit would be also the possibility to analyze not only the performance of existing networks, but also to predict the survivability characteristics of planned (i.e., non-deployed) wireless mesh networks based on information about the abstract topology of a WMN, as well as on estimated traffic demands.
Such a procedure of determining F[w] values by means of simulations for a single set of demands is proposed in Fig. 3. As an input, the most important information that it takes includes: (1) the topology of an existing/planned wireless mesh network determined by graph G = (N, E), where N and E are the sets of vertices and directed edges, accordingly, representing WMN nodes and links, respectively; (2) location of network nodes determined by coordinates (x n , y n ); (3) information related to traffic demands defined by the source and destination nodes, as well as the demanded capacity.
Each iteration of this procedure (defined by Steps 3-11) is to obtain the percentage w of flows successfully delivered after a single region failure. Following [15], coordinates of each region failure epicentre, as well as radius r of a failure region can be considered as random values defined by the continuous uniform distribution function. Therefore, in each iteration of our procedure, characteristics of a failure region can be determined randomly (Step 5), implying that inter-failure time need not be simulated. In particular, it means that in each iteration: • location of a failure epicentre is chosen at random (using the continuous uniform distribution function) within the smallest rectangular area containing the WMN topology, • radius r of a failure circular region is uniformly distributed over (0, r max ), where r max is assumed to be equal to half the largest Euclidean distance between any two nodes in the network.
The set of failed nodes is next identified in Step 6, as proposed in Eq. 3. After that, in order to evaluate the percentage w of flows successfully delivered in a given region failure scenario, for each flow that can be served after a region failure (i.e., with both end nodes being non-faulty), our procedure tries to find a new (alternate) path of capacityd k (Steps 9.1-9.5). If the respective path is found, but, due to link capacity limitations, it cannot be assigned the full demanded capacityd k , this procedure tries to apply the multipath routing to increase as much as possible the capacity assigned to demand k after a region failure (Step 10). After finishing the process of finding the alternate paths for all demands in a given region failure scenario, the percentage w of flows successfully delivered after a failure is calculated based on the ratio of the aggregate flow c f restored after the failure to the total flow c being transported before the failure (Step 11). Calculations are repeated until the assumed number fr of region failures are analyzed (Step 13).
RFS, PFRS and EPFD functions can be easily obtained based on F[w]. In particular, RFS(w) can be calculated based on empirical probabilities of restoring w percent of flows after failures (each empirical probability of restoring w percent of flows can be obtained by dividing the respective value of F[w] by fr, i.e., by the total number of analyzed region failures). According to Eq. 5, RFS(w) can be next determined as the reverse cumulative distribution function of W. PFRS(p) can be also calculated based on the cumulative distribution function of W [see formula (6)]. Finally, EPFD(r) can be calculated based on probability density functions f(w, r) found separately for each radius r of a failure region, using Eq. 7.
It is worth noting that the optimal solution to the problem of finding a new set of paths after failures of nodes occurring in a given region with the objective to maximize the percentage of restored flows may be obtained e.g., by finding the solution to the respective linear programming problem (LP) [35]. However, due to its NP-completeness finding the optimal solution is feasible using only offline approaches for small problem instances (e.g., for networks up to 12-15 nodes). Therefore in our simulations, determining the detours in Step 9 of the algorithm from Fig. 3 was done using the reactive heuristic approach based on the Dijkstra's algorithm from [36], having the polynomial computational complexity bounded in above by O(|N| 2 ), where |N| is the number of WMN nodes.

Simulation analysis
In this section, we present utilization of proposed measures to evaluate the vulnerability of five example wireless mesh networks from Fig. 4 to region failures, i.e., N29, N29_2, N29_3, N44, and N59 networks. First three of them (shown in Fig. 4a-c) consist of 29 nodes connected by 68, 68, and 57 links, accordingly. The remaining two networks (Fig. 4d, e) are formed by 44, and 59 nodes, respectively, connected by 97, and 150 wireless links, accordingly. Nodes of 29-node networks were located in 4,000 9 10,000 m 2 , 6,000 9 6,000 m 2 , and 8,000 9 8,000 m 2 fields, accordingly, while for N44 and N59 networks, a field of 10,000 9 10,000 m 2 was assumed. Due to the fact that horizontal and vertical sizes of the first (i.e., N29) network rectangular areas (4,000 m and 10,000 m, accordingly) differed much from each other, this network was expected to have the worst properties concerning the portion of flows surviving the region failures of the analyzed radiuses r up to half the largest Euclidean distance between any two nodes in the network.
In simulations, both before and after failures, paths were found as the cheapest ones using the distance metric [37,38]. After failures, redirection of flows with survived end nodes took place in a reactive manner. In order to provide the appropriate statistical results concerning RFS, PFRS, and EPFD functions, the respective F[w] values were the aggregate ones achieved with respect to all u = 100 investigated demand sets of a certain size with fr = 9,000 analyzed random failure regions per each demand set.
In particular, we considered three simulation scenarios. The first two (Scenario A and B) were intended to utilize the proposed measures to evaluate characteristics of different WMN topologies, but under a similar network load. In this case, the sets of unicast transmission demands consisted of 25 % of randomly chosen node pairs. In Scenario A, special focus was put on analysing characteristics of networks of the same size in terms of the number of nodes (i.e., N29, N29_2, and N29_3 networks consisting of 29 nodes), while Scenario B was prepared to evaluate networks being similar in terms of the area they covered (i.e., not necessarily similar in terms of the number of nodes). Topologies analyzed in Scenario B included: N29, N44, and N59.
In Scenario C, we used our measures under differentiated loads of N59 network, implied by four sizes of demand sets (i.e., consisting of randomly chosen 25, 50, 75, and 100 % node pairs). For each unicast demand k, the requested capacityd k was assumed to be unitary, while each network link offered the aggregate capacity of 160 units in each direction. Radiuses r of failure regions were uniformly distributed in range (0, r max ), where r max was assumed to be equal to half the largest Euclidean distance between any two nodes in the network.
Statistical analysis of results based on 95 % confidence intervals, showed that sizes of these intervals did not exceed 1 % of the original values. Therefore, due to low visibility, these intervals are not shown in Figs. 5, 6, 7, 8, 9, 10, 11, 12, 13, 14. 6.1 Region failure survivability (RFS) Figures 5 and 6 present values of region failure survivability function (RFS) for the analyzed topologies as a function of w parameter under the assumptions of Scenario A. Recall that RFS measure, defined in Eq. 5, was designed to evaluate the probability that at least w percent of flows survives after a failure.
As shown in Figs. 5 and 6, with the increase of w, independent of the network topology, RFS starts decaying from the value of 1 (since independent of the network topology, probability of reducing the total flow to at least 0 % is equal to 1). For any value of w, when comparing RFS characteristics for any two network topologies, greater values of RFS denote a better performance of a network after a failure, since they indicate a greater chance that the total flow will be reduced to at least w percent after a failure. The general conclusion is that better results in terms of survivability under region failures offer WMN networks with RFS functions characterized by a slower decay with the increase of w (i.e., for which independent of w parameter, RFS values are higher). For instance, as shown in Fig. 5, in terms of vulnerability to circular region failures, N29_2 and N29_3 networks located inside a square area outperform the N29 Network, for which its horizontal and vertical sizes are remarkably different. This is also the reason why N44 and N59 networks outperform the N29 network in Scenario B (Fig. 6).
By definition [see formula (5)], RFS is a function of w parameter only. However, since radiuses of failure regions are differentiated, the respective probability density function f w used to obtain the RFS values can be viewed as a superposition of the respective ones achieved for all analyzed cases of radiuses r. Figure 7 shows the respective f w functions for radiuses of failure regions confined to ten equal-length subranges of (0, r max ) interval, obtained for N59 network in Scenario B. As shown in Fig. 7, with the decrease of region radius r, it is more probable that a significant fraction of flows will survive the failure. However, as region radius r increases, possibility of restoring only a small fraction of total flow also gets increased.
It is worth mentioning that the introduced RFS measure does not depend on network load (see results for Scenario C presented in Fig. 8). Therefore, it can be used to compare characteristics of different WMN topologies.    For any value of p, it is thus better if the upper bound on the portion w of flow surviving the failure is higher. Another important observation is that independent of the network topology, PFRS values are positively correlated with p. This can be explained by the fact that greater upper bounds on total flow reduction to at most w percent (represented by PFRS values) cover greater subsets of failure scenarios, i.e., occurring at a greater joint probability p.
The general conclusion is that the lower the values of PFRS, the network is more vulnerable to region failures. Similar to results from Section VIa, PFRS measure also indicated that N29 network has the worst properties among all analyzed network topologies in Scenario A as well as in Scenario B. Figure 11 presenting results for differentiated loads of N59 network (Scenario C), additionally shows that PFRS measure is the next one that does not depend on the network load. Figures 12, 13, 14 show values of EPFD function obtained in Scenarios A-C, accordingly. Recall that EPFD function, defined in Eq. 7 as the expected percentage of total flow delivered after failures occurring in circular areas, was    designed to evaluate the resistance of a network topology to region failures of certain radius r. In general, greater values of EPFD function imply that more network flows can survive the failure. As shown in Figs. 12 and 13, in Scenarios A and B better results were obtained for N29_2, N29_3, N44, and N59 networks being more compact than N29 network (see Fig. 4). The results comply with the respective ones from Figs. 5, 6 and 9, 10 (i.e., N29 was found to be the worst one in terms of all three measures).

EPFD function
Regarding different sizes of demand sets (shown in Fig. 14 for N59 network), no visible differences were observed.

Conclusions
In this paper, we addressed the issue of wireless mesh networks survivability with special focus on region failures occurring in circular areas. Three new measures dedicated to evaluation of WMN survivability were introduced. The first two-i.e., region failure survivability function (RFS), and p-fractile region survivability function (PFRS), were designed to give information on WMN vulnerability to region failures independent on the radius r of the failure region, while the third introduced measure-the expected percentage of total flow delivered after a region failure as a function of region radius r (EPFD)-allowed for evaluation of WMN performance depending on the radius r of a failure region.
In the second part of the paper, proposed measures were used to evaluate the properties of three example topologies of wireless mesh networks. Results showed that the introduced measures give adequate and consistent information on WMN networks vulnerability to region failures. Additionally, since all introduced measures did not depend on the network load, they can be utilized in comparisons of different WMNs.
Concerning network topologies, the general conclusion following from the analyzed scenarios is that better performance in terms of total flow surviving the region failure is achieved by networks consisting of nodes covering the square area in a regular way. Future work is to utilize the introduced measures in a proactive design of highly survivable WMNs.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.