1 Introduction

Robust and timely communication between all participants is essential for efficient first responder mission execution. Infrastructure-based mobile communication technologies might not be available for this if the disastrous event damages parts of the infrastructure. Therefore, first responders have to build their own network with whatever devices are available. This leads to a highly heterogeneous network structure and often ad hoc style communication paradigms [1]. Pure mobile ad hoc network (MANET) communication is however suffering from frequent disruptions due to node mobility limiting the coverage of the network and forcing the users to repeat messages, as soon as communication is possible again. To avoid this, mechanisms from delay tolerant networks (DTNs) can be added to the devices enabling them to store messages until the nodes are within coverage again [2, 3]. However, this also increases the delay to a level that might not be acceptable in first responder communications [1, 4].

In our previous work [5], we already studied different options to mitigate the experienced delay due to DTN mechanisms by analyzing hybrid DTN-MANET approaches. Such a hybrid network can consist of both mobile and stationary nodes in the MANETs where all nodes are DTN-capable. Besides that, it is possible to connect the nodes to the remaining infrastructure network if suitable devices that can act as gateway are present. This general network structure is widely discussed in literature as a good option for efficient disaster communication (e.g., in [1]).

The results of our previous study showed that the DTN protocol plays a crucial role in the overall system performance if it is placed on layer 5. In that case, the DTN protocol defines when messages are handed down to the MANET protocols on layer 3 for further transmission. However, we observed that current protocols are not able to efficiently exploit the underlaying MANET connections, especially if these represent stable, long-lasting contacts that would be available and ready to use immediately.

Previous studies on DTN protocols or hybrid MANET-DTN solutions did not observe this behavior because the focus there was on rather random scenarios leading to contact patterns according to the DTN design assumptions or, if interactions with infrastructure or other fully connected parts of the network are studied (e.g., in [6]), the assumption often is that the messages have to reach this well-connected part for success as the last node acts as gateway. However, these assumptions are not valid in disaster scenarios. The contacts are not random as nodes move according to tactical requirements of the mission. This leads to well connected groups that meet infrequently with others but have to communicate with the mission coordinators frequently and in a timely fashion. In such scenarios, it is important to utilize every communication opportunity and short-cut other parts of the network without the need for special gateways.

In this paper, we extend our work [5, 7] to enable a true hybrid scheme that is capable of efficiently exploiting any available contact opportunity by adding two mechanisms to the system. One is the integration and usage of a layer 3 service discovery mechanism that allows a DTN node to discover all connected nodes with DTN capabilities on layer 3. The other one is an algorithm that triggers the recalculation of the utility score in DTN protocols and thus is able to produce better scores for stable connections. By doing this, we are able to enhance the awareness of the DTN protocols of underlying contacts and effectively reduce the experienced delay, as far as that is possible based on the given topology. The algorithm is integrated into an enhanced version of the RAPID (Resource Allocation Protocol for Intentional DTN) protocol [8].

The paper is organized as follows. In Section 2, we will first give a more detailed review of communication in disaster scenarios and the applicability of hybrid DTN approaches to these scenarios. Afterwards, we review general design assumptions of DTN protocols with a focus on the calculation of utility values. Based on that discussion, we then introduce our concept to enable an efficient exploitation of underlying MANET multi-hop contacts and present relevant implementation details. In Section 3, we present results regarding three sets of simulations. The first is a proof of concept study, showing the impact of the different mechanisms of our approach. The second compares our modified protocol version to the original performance and the analytically obtained optimal solution. The third evaluation targets the robustness of the modified approach with respect to random node failures. Finally, we conclude the paper and indicate future work.

2 Methods

In this section, we introduce the methodology leading to our hybrid MANET-DTN approach as well as the simulation setup used during the further evaluations.

2.1 Disaster scenario review

Disaster scenarios involve many different types of events that threaten the life of people. The range lasts from large scale man-made terror attacks or natural events such as floods and wildfires to smaller scale events as, for example, search and rescue missions. Even if all scenarios are different, there are some common characteristics.

The geographical area of the event is usually large, denying an efficient coverage by one communication technologies only, if long-range communication such as public land mobile networks are not available. Infrastructure-based communication would be ideal, but is likely to be affected by the disastrous event as well. Even if the infrastructure is not hit directly, it might be overloaded and thus not reliable enough for disaster communication. However, communication between all participating rescue forces is essential for efficient recovery and response processes in order to prevent further damage. This not only includes communication to coordinate and distribute of resources but also frequent reports from users in the field. Especially, the reports of possible findings or changes in the overall situation have to be communicated fast and reliably.

As a result, the rescue forces often build their own networks, based on nodes they carry or that are otherwise part of their equipment (e.g., mounted on vehicles or special unmanned aerial vehicles). The combination of mobile nodes carried by the users and stationary nodes, e.g., mounted on vehicles creates an environment with two types of connectivity, even if pure MANET principles are used between all nodes.

The first type spans a local network with rather good connectivity between the participants resulting in stable routes. It is build from stationary nodes, but exists also between groups of nodes that move together. Here, a pure MANET solution would be sufficient.

The second type is the communication between the mobile nodes. This is features with frequent disruptions that can last for long periods, e.g., if a group of nodes moves away from the otherwise covered area to fulfill its mission task. DTNs are a good way to handle the disruptions since DTNs allow the messages to be saved until a new communication opportunity arises. Timely message delivery in this case is more challenging, because simply waiting for the next communication opportunity might take to long.

To overcome the second point, several works suggest to add additional nodes to enhance the connectivity between groups or use dedicated message ferries to deliver messages. The crucial aspect here is to exploit all available communication opportunities as efficient as possible. This ultimately results in a heterogeneous and hybrid network.

We chose to use a Search and Rescue scenario as an example for our evaluations. This represents a rather small event but shows all the elements of a bigger scenario at a better observable scale. Figure 1 shows an example screenshot of the scenario with both mobile and stationary nodes.

Fig. 1
figure 1

Screenshot of the Search and Rescue scenario in ONE with backbone nodes deployed along the center of the area

The fixed nodes in this scenario form a central backbone of fully connected nodes resulting in an area of stable MANET connectivity. In that case, all nodes were DTN-capable, and the multi-hop communication both on MANET and DTN level is enabled.

As an evaluation metric, we chose a composite metric that covers both the reliability and the timeliness of the message delivery in the network. First responder communication is time critical in most cases and requires a high delivery ratio. In German rules and regulations, it is mentioned that messages between different participants should be acknowledged within 3 min, otherwise a broken communication has to be assumed. Therefore, we use a deadline of 90 s to decide whether a message arrives late in one direction. We use the error ratio (ratioerror) as evaluation metric that is calculated according to Eq. 1, including both messages that do not get delivered and messages that get delivered too late. It therefore covers both mentioned metrics relevant in the scenario.

$$ {\text{ratio}_{\text{error}}} = \frac{{\text{num}_{\text{lost}}} + {\text{num}_{\text{late}}} (90 \mathrm{s})}{{\text{num}_{\text{sent}}}} $$
(1)

However, when we performed simulations using this setup, we observed that other than expected there was no performance improvement with an increasing number of additional nodes, both carefully placed at fixed positions or mobile. Figure 2 shows the results and the gap between the simulation and an analytically obtained ideal solution based on perfect knowledge of the contacts as well as 95% confidence intervals around the results. The optimal solution (red) was calculated via our analytical framework introduced in [9] and represents the lower bound on the error ratio of that particular scenario.

Fig. 2
figure 2

Gap between current DTN protocol performance and ideal result [5]

While analytical results showed that there are more and longer lasting connections that enhance the connectivity, the protocols under test were not able to exploit them [5]. When looking closer at the resulting paths of the messages, we observed that some of the contacts were not considered at all. These were the long-lasting contacts, that were established between the backbone nodes once they are in place and are disconnected at the end of the mission, when the equipment gets collected. The result is a large gap between the actual performance and what is theoretically possible.

2.2 DTN utility scoring

To better illustrate the problem, let us review general design assumptions for DTN protocols and their utility functions.

DTN protocols were designed for situations where nodes get frequently disconnected from the network and these disconnections can last for a long time [10]. In order to be still able to communicate and send/receive data, the nodes store otherwise undeliverable messages persistently until they meet another DTN-enabled node. Such a meeting is called a contact between two DTN nodes or a communication opportunity.

Traditional approaches assume that a contact is a direct point-2-point contact as depicted in Fig. 3a. This contact will last as long as the two nodes are within communication range of each other.

Fig. 3
figure 3

Contact types in principle. a Point-2-point b Multi-hop

When two nodes are in contact with each other, utility-based DTN routing protocols first calculate a utility score, which is then used to decide whether to forward a message to the other currently reachable node. This can also include transitive contact patterns, e.g., if a node B meets nodes A and C frequently, then also nodes A and C meet frequently, even if they are never in direct contact with each other [11]. Based on the utility score, the nodes in contact then exchange the relevant messages for which the peer node has the better score. Whether all messages in that category can be exchanged, is depending on the duration of the contact. Therefore, the recognition and exploitation of such contacts is essential for the performance in DTN networks [12].

Multi-hop communication is considered on the DTN plane as well. In this case, it is possible to forward the message along a chain of DTN nodes, as long as the utility score at each hop indicates to forward the message and if the routing scheme supports further transfers. This should be the case with the backbone nodes in our example scenario. Using these mechanisms, many of the traditional DTN use cases can be handled well. The base assumption is, however, that the contacts are relatively short and partially predictable.

To enhance the network performance, the natural choice is to add more nodes and ideally position them in a way to create additional contact opportunities. This results in a higher node density and eventually will create more and longer lasting contacts. However, the requirement that all nodes have to be DTN-enabled might be too limiting in some cases, as it requires additionally installed software on the respective devices. If it is important to actually exploit any communication opportunity, even if the nodes have heterogeneous capabilities, also non-DTN nodes should be used as relay. This is possible without changes, if a suitable MANET protocol is used at layer 3 and the DTN functionality is added on top.

The option to exploit also non-DTN nodes as relays leads to the possibility of direct logical DTN contacts that however are reachable via a multi-hop connection in the underlying MANET only. Fig. 3b shows a simple version of this case with one intermediate node. In such a situation, hybrid approaches with a combination of traditional IP-based MANET routing on the network layer (OSI layer 3) becomes important. Then, the DTN instances rely on the route information from the lower layer to actually discover and identify the respective contact. This can be obtained by service discovery mechanisms as described by various authors (e.g., [1315]). If this information is available, the DTN instances should be able to use it as a base for the utility calculations in the resulting overlay (see Fig. 4 as an example).

Fig. 4
figure 4

Illustration of underlay/overlay interdependencies for multi-hop contacts. a Layer 2 topology b Actual DTN contacts

As mentioned above, this was however not the case in the SAR scenario. This observation was surprising and to our knowledge it has not been reported before.

The reason for this behavior is only partly related to the choice of scenario which is however a typical hybrid MANET-DTN setup and thus represents a class of scenarios with mixed connectivity types. The other part comes from the way DTN protocols actually calculate the utility score. As mentioned above, this score is calculated whenever two nodes meet or when the contact breaks.

Besides that, approaches targeting DTNs as means to bridge the communication to otherwise connected parts of the network (e.g., to connect to remaining or rebuilt infrastructure networks) mostly focus on the routing towards the connected network [6] and then assume that this will finally deliver the message to its destination within the connected network. Therefore, the goal is to reach that part, but not use it as a part of the DTN network in order to reach other DTN nodes. However, also the routing towards those parts requires efficient discovery of the contacts involved.

Regarding multi-contacts, cases in which a node is in contact with multiple other nodes simultaneously, there has been one study showing that this can have a positive impact as well, but is also usually not considered in the current design of DTN protocols [16].

2.3 Hybrid solution concept

In this section, we will present our concept to enable effective utilization of stable layer 3 contacts via a hybrid interaction of the DTN and MANET instances of a node. To achieve that, two things are required. The first is that the DTN layer is aware of any multi-hop contacts and the second is to enable the protocols to exploit stable long-lasting contacts.

In order to enable DTN protocols to exploit logical DTN contacts that are available only via a multi-hop connection on the underlying MANET and also to efficiently utilize multi-hop DTN connections, the respective routes on layer 3 have to be known. Direct, point-2-point DTN contacts are usually recognized by the DTN instances as is. In case of logical contacts, an efficient MANET routing approach on layer 3 is required that is capable of finding routes to all available nodes under various and changing network conditions. Depending on the type of MANET routing protocol applied on the node, this is either done pro-actively as part of the routing procedure or has to be triggered on demand by an upper layer.

However, the routing protocols provide a route to another IP node. This does not include information on whether the corresponding node has DTN capabilities or not. If such heterogeneous setups are considered, additional service discovery mechanisms are required to identify which subset of reachable nodes actually has DTN capabilities. In previous projects [7, 17], we developed a suitable MANET routing framework as well as mechanisms for effective service discovery. These two mechanisms enable the DTN node to effectively discover multi-hop contacts to other DTN nodes that are directly reachable via an existing route on layer 3.

Secondly, the DTN protocol has to be enabled to use these contacts efficiently. The same applies to any other direct DTN contacts as well. Currently, this is not the case for rather stable networking conditions.

In such cases, the scoring gets calculated only once at the beginning of the contact, with a medium score as subsequent encounters are supposed to boost this value. A contact that does last long therefore gets one utility value that never gets reinforced by subsequent calculations since the contact exists only once. The situation is even worse, if the protocol uses some form of decay mechanism to forget about contacts that do not receive renewal updates. In that case, the initially low value of the long-lasting contact gets further reduced over time, even if the contact is still active. Therefore, shorter contacts with large disruption periods will have a better scoring value as their value gets recalculated occasionally. Consequently, nodes reachable via the long-lasting contact have a low probability to be chosen as the next hop, and the stable connection is not used as expected.

Another problem is that the check for a message exchange is also only executed if a new contact comes up. This further decreases the utilization of stable connections and affects also protocols that do not use a utility-based routing scheme. It is therefore another crucial effect.

To solve both problems, it should be sufficient to periodically re-initiate the calculations and checks done otherwise for newly discovered contacts. Such a re-initiation would imitate the effect of shorter contact durations or more frequent interruptions leading to a better utility score and also check for possible messages more often. Using a quite simple enhancement would allow the protocols to overcome both drawbacks of the current design. On the other hand, the simple re-initiation does not affect the performance of the protocol towards other scenarios with the typical short contact duration. These contacts are still handled in the same way. Algorithm 1 shows a generic version of the combined process.

The recalculation can be achieved by integrating a function to periodically monitor active contacts and, if a connection has been active for a certain period of time, trigger the recalculation of the score. Ideally this is integrated in the existing function of the service discovery procedure to get up-to-date routing information. Such an approach therefore makes the utility scoring algorithm aware of contact durations and enables any protocol to handle long-lasting contacts as well. Using this mechanism is preferable as it does not require changes to the way a DTN routing protocol calculates the utility score. Therefore, the mechanism can be integrated into any available DTN routing protocol.

A periodic check on the DTN layer is needed anyway, in order to efficiently discover changes in available nodes, even if on-demand approaches are used as MANET routing. Otherwise, the DTN instance relies on notification on connections with ongoing traffic from the MANET instance. Besides the periodic triggering of the service discovery procedure, the same function could trigger the recalculation of the scoring values. Whether the same intervals are used in both cases has to be decided based on the scenario and explicitly based on the introduced overhead to the network for each service discovery request. Depending on the chosen update period, the contact-aware utility scoring will favor the stable conditions because their score will increase and thus be sufficiently high to initiate the forwarding of messages.

Finally, the periodic procedure should trigger a check whether additional messages can be transfered based on the updated information. This aspect makes the approach even more versatile, as it allows a more transitive behavior over stable connections. For example, if a neighbor on a long-lasting contact has new messages or contacts that enable transfers that were not indicated under the original conditions when the nodes met initially.

Figure 5 shows the general interaction between the different protocol components and instances in a corresponding node.

Fig. 5
figure 5

Conceptual design of hybrid interaction between DTN protocols with contact-aware scoring and MANET service discovery

The presented concept is however an example solution specifically designed for the requirements of first responder communication networks by integrating our adaptive routing framework and extending nodes with DTN capabilities. The result is a communication system that fulfills the requirements of a robust and adaptive communication under heterogeneous conditions. However, the mechanisms in general are applicable to other scenarios and architectures as well, if these are able to collect the required information on layer 3 and provide it to the DTN instance.

2.4 Simulation setup and implementation

To validate the concept of our approach, we first repeated the simulations of the mentioned SAR scenario with backbone nodes in Opportunistic Network Environment (ONE) [18] and collected the corresponding connectivity traces that get reported by one of the default reporting tools in ONE. The detailed list of simulation parameters is given in Table 1.

Table 1 Simulation parameters in ONE

Overall, this scenario features several groups of first responders searching the area in a distributed way. While the group members remain in quite good connectivity to each other, the communication to other groups gets frequently disrupted. In such a rescue mission, a high delivery ratio and the timely delivery of messages are the crucial performance metrics. To enhance both metrics, additional DTN-enabled backbone nodes were added subsequently to the scenario and carefully positioned to be fully connected at all times and cover a large part of the search area. Even though RAPID was designed with the typical contact distribution in mind, its features with variable optimization goals make it an interesting candidate for first responder communications too. Besides that, RAPID was also designed with hybrid operations in mind [8].

2.4.1 Proof of concept

We first performed a study to prove the base assumptions of our work. Simulation results with the above setup show that both service discovery from MANET instances as well as periodic score recalculations are required.

To validate the concept, the resulting contact trace from ONE was altered by adding/deleting contacts to represent the following variants: Variant 1 representing the original connection history. Variant 2 featuring a virtual DTN node representing the backbone in order to allow the backbone to act as storage. Figure 6 shows the resulting topology. Variant 3 featuring all layer 3 connections. This corresponds to the topology in Fig. 4a with explicit additional connections for all transitive multi-hop connections as well. It therefore represents the result of a service discovery request for all available DTN nodes that are reachable from any other DTN node.

Fig. 6
figure 6

Topology with virtual DTN node (green) representing the backbone

In variant 2, all nodes of the backbone have been replaced by one single, virtual node. Afterwards, connections overlapping in time, arising from this process, have been merged to one contact. For the third variant, we combined all appropriate connections to create new contacts, in order to reproduce all possible layer 3 connections. Therefore, we created an algorithm which joins two simultaneous connections with one common node to compute a new indirect contact. Just like in the variant 2, we merged overlapping connections. These two steps have been repeated until no new connection was found. In this way, we found all indirect connections, independent of the hop count.

For each variant, we included artificial interrupts in the connectivity to check whether a pure boost of the utility score is sufficient to better utilize available contacts. To do so, we created modified traces for each variant with no additional breaks and breaks every 500, 100, 50, 10, or 5 s, respectively. With this modification, one long-lasting contact get represented by a series of shorter contacts. The change of the connection state automatically triggers the mentioned recalculation of the DTN utility score.

2.4.2 Implementation of context-aware scoring in RAPID

Finally, we applied our idea to periodically trigger the recalculation of the utility score by making the scoring mechanism aware of long-lasting contacts to the RAPID protocol. This protocol features three different utility algorithms that can be selected based on application requirements [8]. It allows to minimize the average delay of all messages to minimize the number of messages missing their deadlines in terms of time to live associated with the messages or to minimize the maximum delay of all messages and uses information on previous contacts as well as the currently transmitted messages as context information, which gets stored in tables accordingly. The calculation of the score is triggered by detected changes in the connection status. For example, if a new contact is discovered for the first time, a dummy entry is created in the respective table. Once the contact ends, this dummy entry gets updated with statistical information about this contact, such as average duration (for re-occurring contacts) and messages transferred via this contact. This statistical data is then used to calculate the utility score with variations depending on the optimization goal.

The update mechanism based on the contact status is however the reason why RAPID showed no benefit from long-lasting stable contacts in its original form. Long-lasting contacts as they are created by the backbone nodes become active once and might stay so throughout the simulation or at least change very rarely. This leads to missing statistics and thus a bad utility score for any node reachable via such a connection. It is however rather simple to avoid this situation. To do this, all active contacts have to be monitored. Once the last update to an entry referring to a still active contact is older than a certain configurable threshold value, the statistics should be reported for this entry including a reset of the interval. By doing this, the utility calculation itself remains unaltered, it is however provided with more statistical information based on the actual contact history. Algorithm 2 shows the algorithm describing the implementation in ONE.

Our modified version of the protocol is available on GitHubFootnote 1.

3 Results and discussion

In this section, we will first present and discuss the results of a general proof of concept study, showing that the proposed mechanisms are providing better results. Afterwards, we compare and discuss these theoretical results with the results of an example protocol-specific implementation of the concept.

3.1 Proof of concept

Figures 7 shows the impact of the different contact durations for each variant on the average delay (a) and the error ratio (b) respectively. Original contact intervals in this figure represent the dataset without artificially added contact interrupts.

Fig. 7
figure 7

Performance resulting from different variants. a Delay b Error ratio

Regarding the average delay, all variants show an improved situation, except variant 3 where the delay remains more or less constant due to the availability and awareness of the layer 3 multi-hop connections. The average delay is however quite high and significantly above the desired deadline of 90 s for all cases. We show these results nonetheless to illustrate the effect of periodic updates on the delay and thus the impact of periodic recalculation of the utility score.

The better metric is however the error ratio. Based on Fig. 7b, we can observe that relatively long contact intervals of 500 s already show an improvement and that the error ratio is only slightly decreased further by adding shorter intervals. More interestingly, the impact of different variants is quite significant, with variant 3 showing the best results. In case of shorter intervals, the performance of this variant is decreased due to contacts that do not last long enough to cover the complete multi-hop connection and thus leading to failed transmission attempts that have to be repeated at a later point in time. Even if the performance gets worse, this variant still outperforms the others. This shows that efficient service discovery, revealing the current contacts to all available nodes is required.

Another critical point is the overhead introduced by service discovery requests at a certain interval. If only layer 3 service discovery is used to determine the connection status as shown with the artificial breaks, this has to be done periodically which can create a high additional load to the network. To limit this, the interval length for such periodic requests has to be selected carefully. Based on the previous analysis, a value between 500 and 100 s should represent a good compromise between recent updates and overhead limitation.

To further study the benefit of different contact intervals, we repeated this preliminary experiment by stepwise adding further backbone nodes. Figure 8 shows the results for 0 to 35 backbone nodes.

Fig. 8
figure 8

Impact of different artificially introduced contact durations on the overall performance

In this diagram, we compare the performance of RAPID based on the modified traces without artificial interrupts as well as artificial break intervals of 500, 100, and 50 s. Please note that all results from these traces are between the original protocol performance and the ideal solution presented in Fig. 2. Even if the theoretical result cannot be reached, the improvement is quite significant, especially for the cases with many additional nodes. There, the error ratio can be reduced by around 25%. The exact values of the curve are application dependent, because further gains might be possible if nodes get repositioned or use different technologies. However, it is important to show that an improvement is possible by adding nodes, as expected.

When comparing the interval length, the performance differences are less significant and partially contradicting. But this actually might result from the artificial breaks, which tear down the connections for 0.1 s and re-establish them afterwards. Any message that is in mid transfer would be lost at this occasion and has to be repeated later on, leading to an unnecessary overhead. Therefore, an actual artificial breaking of the connections is not desired for the implementation of the concept in a real protocol.

3.2 Context-aware utility calculation in RAPID

The implementation was finally tested against previous results from the original protocol version, the theoretical bound and two trace versions of variant 3. Both traces with the modified connections were added in order to show the effectiveness of the implementation in comparison to the proof of concept. The first trace uses no artificial breaks and thus represents the case when only service discovery is used without periodic contact-aware utility calculation. Since we want to study the effect of a hybrid solution, this trace was also used as base for the modified protocol too. Otherwise, the knowledge on possible multi-hop contacts would be missing. The second trace includes artificial breaks with the same contact interval as configured for the contact-aware utility function, which was set to 100 s. Figure 9 shows the corresponding results.

Fig. 9
figure 9

Performance results of the modified protocol. a Comparison to artificial traces b Comparison to previous implementation

As seen before, the artificial traces and the modified protocol are able to follow the general trend with a decreasing error ratio if the number of backbone nodes increases. This shows that our proposed concept is able to reproduce the behavior that is based on artificially generated traces and that it is outperforming the original protocol version.

A bit more attention should be paid to the result in Fig. 9a however. Out of the remaining three data sets, two are artificially generated to proof the need for both service discovery and periodic recalculation of the utility score are needed. These results are obtained by artificially adding connections to all multi-hop reachable nodes and artificial breaks forcing an unaltered version of RAPID to recalculate the scoring. Both version are therefore idealized. The final data set covers the performance of our modified version of RAPID, based on the original contacts.

While the general trend is the same for both artificial data sets, the set without periodic breaks shows a slightly worse performance especially for cases with a higher number of backbone nodes. This indicates that a pure service discovery scheme alone is not able to enhance the performance as expected, if more and longer lasting contacts are present as it is the case in more dense networks. Therefore, our decision to utilize two mechanisms is justified.

All three sets show a similar performance, where the two artificial versions are slightly better than the real implementation. However, since these values are generated artificially based on perfect future knowledge and actually forced breaks of the connection, they are idealized as well. It was to be expected, that the enhanced RAPID implementation does not reach the same performance in any case, if the routing takes its own decisions based on the knowledge it has at that time. This most likely results from differences in the treatment of physically interrupted connections and the periodic contact-aware recalculations.

However, the overall improvement of the network performance is significant as shown in Fig. 9b. Nevertheless, further optimizations on the interval length of the periodic updates could help enhance the performance further and fine tune the results.

Besides that, it should be noted that similar concepts can be applied to any other DTN routing protocol as well, since we do not require a change in the fundamental utility scoring algorithms but rather force the protocols to trigger the corresponding calculations more often. Another possible enhancement for the given scenario could be the exploitation of resulting multi-contacts as already suggested in [16].

3.3 Robustness evaluation

In order to evaluate the impact of the modified protocol version on the network robustness, we added simulations with random failures of 20% of the backbone nodes for each trace configuration. The nodes are chosen using a random number generator (RNG) with uniform distribution supporting equal probability for all backbone nodes of the given trace to be selected as inactive. The selected nodes are then configured as inactive in the simulation using the existing functionality of ONE. We chose this setup, as the focus of the work is to present a hybrid scheme that is actively exploiting additional nodes and its ability to adapt to changed conditions regarding these nodes.

For each backbone configuration, we executed several runs with different seeds for the RNG resulting in different distributions of inactive nodes each. The runs however only reflect a subset of all possible combinations of failing nodes. Despite the random selection, we ensured that nodes at the edge of the spectrum are also considered.

For each run, we then calculated the average number of messages considered as error (numlost+numlate(90s)) as well as the standard deviation of that to show the variance. Based on these numbers, we calculated the corresponding average error ratio and the percentage of variance. Figure 10 shows the corresponding results in comparison to the performance without node failures.

Fig. 10
figure 10

Impact of random node failures on the overall performance

The variance is in all cases quite high, indicating that nodes with different criticality to the overall performance exist in the tree-like backbone topology. This is expected due to the topology and the expected message flow along the backbone. If the random selection targets nodes closer to the root of the tree, this is more crucial for the overall network performance than failures at the leaf-level of the tree.

Besides that, both variants with node failures show a better performance than the original fully connected backbone. This might sound surprising at first, as a smaller number of nodes is available. However, the failure of individual nodes in this case breaks the original structure with long-lasting contacts resulting in new additional contact opportunities as nodes move between the gaps. As mentioned before, these changes in the contact distribution trigger the normal utility calculation. This results in better overall performance and raises the question on an optimal backbone node placement.

Since the focus of this work is not on the optimal distribution of the backbone nodes, we concentrate more on the resulting performance in general. There, the modified RAPID implementation outperforms the original version in all cases except if ten backbone nodes are used however with a high variance. The gap between both variants is not as large as with a continuous backbone, but still the enhancement is clearly visible for scenarios with a large number of backbone nodes. The reduced gap size was expected as the random failures add unpredictability to the network and thus limit the effect of long-lasting contacts.

The results from this evaluation show clearly that the proposed scheme is robust in terms of random node failures, as the benefits are clearly visible for well-connected parts of the network, while not reducing the performance in case of random failures compared to the original protocol.

The observation that certain nodes are more critical than others and that different node distributions have a significant impact of the overall network performance was expected based on the scenario setup. For future work, it is however an interesting question to exploit this behavior further. Knowledge on vulnerable nodes in a network both in terms of traffic load and criticality, and the optimization of the deployment strategies can help further enhance the communication capabilities of first responders.

4 Conclusions

In this paper, we discussed the design assumptions of DTN routing protocols and how they affect the overall network performance if the protocols are applied in hybrid DTN-MANET scenarios featuring partially stable and long-lasting contacts on layer 3. While this is desired for efficient routing on layer 3, it proved to be problematic in the DTN context, as the utility scoring is often unaware of such contacts and thus unable to exploit them.

We then presented our concept to mitigate the observed problems by combining efficient layer 3 service discovery and adding contact-awareness to the utility scoring algorithm. Following a case study with artificially generated traces to validate our design assumptions, we developed a modified version of RAPID as an example using the contact-aware utility scoring mechanism. Comparisons with the original version of the protocol show the effectiveness of the concept and a significant improvement of the overall network performance.

The presented enhancement helps significantly close the gap between the original RAPIDs performance and the ideal performance of continuous backbones. Especially, the effective utilization of stable contacts qualifies the presented solution for the application in heterogeneous networking environments with zones of stable connectivity and common disruptions as in disaster scenarios. In case of random node failures, the concept shows a better performance than the original version for a large number of backbone nodes and otherwise a similar performance and is thus considered as robust solution.

Moreover, our concept is not limited to the RAPID protocol, even if that was chosen as an example here. The general ideas apply to any other DTN protocol as well that uses utility scoring to decide whether to forward messages to a peer node. Considering the periodic triggering of the forwarding check, this also applies to any other protocol.

As future work, we plan to work on an optimized parameter set for the contact-aware utility scoring algorithm to exploit the limits of this approach. Besides that, we want to apply our solution to further protocols as the concept itself is not depending on a single implementation and can be applied to any protocol that calculates a utility score based on contact statistics in some form.

The mentioned optimization of the backbone structure and analysis of vulnerable nodes in the topology should be targeted to better understand the impact of the chosen deployment on the communication systems performance. Finally, the efficient utilization of additional contacts should be evaluated further for mobile message ferries as well.