1 Introduction

The publish/subscribe paradigm provides anonymous, loosely coupled communication between event producers and interested subscribers [1]. This paradigm was initially used for large-scale systems, e.g., the Internet [2,3,4]. More recently, it has been used in other scenarios, e.g., wireless sensor networks [5,6,7] and the Internet of Things [8,9,10]. Several systems are relevant in the field, e.g., SIENA [2], JEDI [11], REBECA [12] and REDS [13], both for publish/subscribe in general and for their client mobility support in particular.

The main idea behind the publish/subscribe paradigm is to separate the devices that generate content from those that consume it. The content generated can range from a temperature reading of a sensor to an access notification on a web page, or even the distribution of a live television broadcast over the Internet. In a publish/subscribe system, the processes that generate content and send it to the network are called publishers, and those that consume the events are called subscribers. The two kinds of processes are completely decoupled: a publisher does not need to know which subscribers receive the information it sends, nor do the two need to be communicating at the same time for the message exchange to happen.
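The decoupling described above can be illustrated with a minimal in-memory sketch. The `Broker` class and its method names are hypothetical and introduced only for illustration; they do not correspond to any of the cited systems, and buffering events until a subscriber appears is an assumption made here to show decoupling in time.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory notification service (illustration only)."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # topic -> subscriber callbacks
        self.pending = defaultdict(list)        # topic -> events with no subscriber yet

    def subscribe(self, topic, callback):
        self.subscriptions[topic].append(callback)
        # Deliver events published before the subscriber arrived (time decoupling).
        for event in self.pending.pop(topic, []):
            callback(event)

    def publish(self, topic, event):
        # The publisher never learns who, if anyone, receives the event.
        if not self.subscriptions[topic]:
            self.pending[topic].append(event)
        for callback in self.subscriptions[topic]:
            callback(event)


broker = Broker()
broker.publish("temperature", 21.5)            # no subscriber is connected yet
received = []
broker.subscribe("temperature", received.append)
broker.publish("temperature", 22.0)
print(received)
```

The publisher and the subscriber only share the broker and a topic name; neither holds a reference to the other.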

In this paper we propose an approach that not only supports client mobility, but also handles broker mobility in a wireless ad hoc network. In this approach, the number of devices connected to the network and the connections between them can change. Whenever a broker moves physically, the network topology changes: due to the wireless nature of the network, the broker might lose connections or become able to connect to new devices. The protocol we propose tries to minimize these changes in topology to improve stability. We can also model a fault in one of the brokers as a loss of connectivity, which makes it easier for the protocol to be fault tolerant as well.

The rest of the paper is organized as follows. Section 2 introduces the related work on publish/subscribe and the mobility and fault tolerance in these systems. Section 3 presents the model and definitions. Section 4 addresses the creation of the network overlay to route events between brokers. Section 5 presents a protocol to handle broker migration in publish/subscribe systems. Section 6 presents performance results of our approach and compares it to AODV. Finally, Sect. 7 concludes the paper.

2 Related work

Most of the research done in publish/subscribe systems is centered on improving current solutions, be it the reliability of event delivery [14], the performance, or the fault tolerance [15]. Some work tries to improve on the typical tree structure for event delivery. In [16], the authors propose creating a tree for each topic a subscriber can subscribe to, with the publisher as the root of the tree, for optimal message delivery. In some cases a communication tree might be too fragile in the face of node failures, and the authors of [17] propose using gossiping so that the system can keep working while the tree is being repaired after a node failure.

Another topic is the support for mobility. Though there are various protocols for publish/subscribe middleware, few of them support mobility [18]. In [19], the authors mention some possible solutions for mobility support in publish/subscribe. Strategies are suggested to extend existing solutions, both in centralized and decentralized networks. In a mobile network, nodes need to adapt to disconnections, to partitions of the network or the merging of those partitions, and to the storage of undelivered events. In [20], Huang and Garcia-Molina study the tree construction problem in wireless ad hoc publish/subscribe systems. They define the optimality of a publish/subscribe tree by developing a metric to evaluate its efficiency, and propose a greedy algorithm that builds the publish/subscribe tree in a fully distributed fashion. Several works also address the different factors that affect the performance of a system with mobile nodes [21, 22], mostly based on mobile clients. A self-configurable and adaptive peer-to-peer architecture for implementing content-based publish/subscribe communications on top of structured overlay networks has also been proposed [23, 24].

Another possible solution to support mobility is the use of information-centric networks [25,26,27]. Since this kind of network supports mobility natively, the authors propose exploiting this property instead of using traditional TCP/IP communications.

Internet of Things (IoT) and Wireless Sensor Networks (WSN) also constitute an area that is still pushing research towards new topics [28, 29]. A number of recent contributions have also been made in the area [30,31,32,33]. Most of the approaches support mobility through the inclusion of gateway nodes and the separation of the publish/subscribe system from the WSN. The gateway nodes receive messages from any number of sensors and act as a publisher to the publish/subscribe system. This allows for the sensors to be mobile devices that send events to the gateway they are connected to, but does not fully compose a mobile publish/subscribe system.

2.1 Mobile clients

Most of the research carried out to support the mobility of nodes in a publish/subscribe system has been done with regard to supporting mobile clients, be they publishers or subscribers.

The first system to support client mobility was JEDI [11], short for Java Event-Based Distributed Infrastructure. In JEDI, a node must notify the broker to which it is connected of its intention to migrate before the migration happens. This is done with explicit moveOut and moveIn messages, which a subscriber sends in order to start and finalize the migration process.

SIENA [2, 30] was developed at the same time as JEDI and also allows for client mobility. It also uses explicit moveOut and moveIn messages, and it relies on flooding, which has been found to be excessive [34].

Another framework that added support for client mobility is the REBECA [12, 35] publish/subscribe system. In this case the moving node does not need to send an explicit moveOut message; a broker detects when one of its connected subscribers has disconnected. The broker then creates a virtual counterpart of the roaming subscriber, which is merged with the real one once the migration finishes.

Mobile XSiena [36] is a publish/subscribe platform which seeks to extend the XSiena [37] content-based publish/subscribe system in order to support user mobility. The key mobility-related features of Mobile XSiena are mobile device integration, seamless networking, reconnection support, location-based matching, and persistent events. This was later integrated into the Phoenix framework [38,39,40].

MQTT is a commonly used protocol that has also received improvements in order to support client mobility. MQTT supports subscriber mobility by allowing a subscriber to be connected to a subset of brokers, which act as backups in case of a link failure. However, it does not allow for network reconfiguration in the case of a new connection: the subscriber has to issue its subscriptions again. In [41], the authors extend the protocol to support publisher mobility by detecting a disconnection of the publisher node and storing undelivered messages while the system is reconfigured. This approach guarantees that messages are delivered in the same order in which they were created.

\(\mathcal {PSVR}\) [42] is a routing algorithm for a publish/subscribe system in a WSN. Siegemund et al. point out that the cost of maintaining a communication overlay in a dynamic environment is often very high or is simply omitted [43, 44], since systems usually recreate the overlay completely. The proposed algorithm is designed for systems with highly dynamic subscribers and publishers.

2.2 Mobile brokers

The scenario of mobility inside the event notification service is the most difficult to handle [45]. In this case the algorithms need to handle not only the migration of clients but also the reconfiguration of the subscription delivery path. There are few solutions that support full mobility in publish/subscribe systems.

In [24], an extension to SIENA is introduced in which a self-organizing algorithm executed by the brokers tries to optimize message delivery. Mechanisms are introduced to allow the reconfiguration caused by changes in topology, mostly to minimize the notification cost, but this could also be a first step towards supporting mobile nodes. However, the complexity of the algorithm, together with the need for a human administrator in case of a broker failure during the topology change procedure, makes it unsuitable for a highly mobile environment, where a broker might start the topology change but be disconnected by the time it finishes.

EMMA [46] is an extension to MQTT that not only handles client and broker migration in a transparent way, but also uses its migration mechanism to optimize QoS. It uses a controller node that constantly monitors the network and is informed of any change in device connectivity. The controller then tries to optimize event delivery and issues migrations to both clients and brokers to load balance the system. The requirement of a device that knows the connectivity of each node in the system prevents this solution from being used in a fully mobile environment, where the controller might sometimes be unreachable.

2.3 Proposed solution

In this paper, we propose a protocol where any node, at any time, can migrate in the network without the need to notify neighboring nodes or a single central controller. In a truly mobile environment nodes might not know they need to migrate before a connection is lost. Using our protocol, nodes (publishers, subscribers or brokers) will be able to join or leave the publish/subscribe service at any time.

Unlike existing solutions, in our protocol, if the network is partitioned, the service does not have to wait for connections to be restored. Each partition works as an independent service, and if connections recover after some time, the partitions merge and any messages that were not yet delivered reach their destination.

3 Model and definitions

In a publish/subscribe system we find two different components. Clients produce and consume events, while the notification service handles the subscriptions issued by the clients and ensures the correct delivery of events to the interested clients.

We can further divide the clients into two subsets: subscribers, which register their interests and consume events, and publishers, which produce those events. We will use \(s \in S\) to refer to a subscriber belonging to the set of subscribers S and \(p \in P\) to refer to a publisher that belongs to the set of publishers P. Any client in the system may behave as a subscriber, a publisher or even both at the same time. We will also use the notation \(f \in F\) when referring to a filter that belongs to the set of filters F.

The notification service is composed of a set of brokers which we will call B and refer to individually as \(b \in B\). The brokers will be connected at the logical level by an acyclic graph or a spanning tree. The brokers are responsible for storing the subscriptions issued by the subscribers and routing the published events to the matching subscribers. At any moment a broker will have a set of neighboring brokers in the graph, that it can communicate with. We will refer to this set as \(N_i\) for broker \(b_i\). A broker will also be able to communicate with clients that are connected to it. For this reason we will refer to the set of interfaces, be it other brokers or clients, that a broker \(b_i\) can communicate with at any moment as \(I_i\).

All communications are by point-to-point message passing over FIFO channels. Since participants are mobile, the set of channels linking them, as well as the neighbor set, evolves. There is no need to have previous knowledge of the sets, i.e., initially each participant knows only itself, and the number of participants in each set might change as time passes.

3.1 Simple routing

The Simple Routing [47] protocol assumes a static system where brokers are connected in an acyclic graph, and clients are permanently bound to a single broker. This routing strategy is based on the propagation of subscription (SUB) and unsubscription (UNS) messages to all of the brokers in the system. Every broker \(b_i\) maintains a routing table \(R_i\) that is based on the received SUB and UNS messages and models the subscriptions in the system. The routing tables enable brokers to filter incoming events received as PUB messages, and forward them only towards those subscribers with matching subscriptions.

The routing table \(R_i\) at every broker \(b_i\) contains, for every subscription in the system, a routing entry \((f, z)\) where \(f \in F\) and \(z \in I_i\), to indicate that the publication of an event e matching f must either be forwarded towards broker z (if \(z \in B\)) or delivered to subscriber z (if \(z \in S\)).
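As a minimal sketch of how such entries drive forwarding, the following code models a routing table as a list of pairs (f, z), with each filter f represented by a Python predicate. The function name, the example filters and the `came_from` parameter (which avoids sending an event back over the interface it arrived on) are assumptions made for illustration and are not taken from [47].

```python
def matching_interfaces(routing_table, event, came_from=None):
    """Return the interfaces z of all entries (f, z) whose filter f matches the event."""
    targets = []
    for matches, z in routing_table:
        # Skip the arrival interface and avoid forwarding twice to the same z.
        if z != came_from and matches(event) and z not in targets:
            targets.append(z)
    return targets


def hot(event):
    """Hypothetical filter: temperature readings above 30."""
    return event["temp"] > 30


def any_reading(event):
    """Hypothetical filter: any temperature reading."""
    return "temp" in event


# R_i for a hypothetical broker b_i: z is a subscriber ("s1") or a broker ("b2").
R_i = [(hot, "s1"), (any_reading, "b2"), (hot, "b2")]

to_hot_event = matching_interfaces(R_i, {"temp": 35})
to_cold_event = matching_interfaces(R_i, {"temp": 20})
to_hot_from_b2 = matching_interfaces(R_i, {"temp": 35}, came_from="b2")
```

An event is sent at most once per interface even when several subscriptions reachable through that interface match it.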

3.2 Phoenix

The Phoenix [40] protocol handles subscriber mobility in content-based publish/subscribe systems. In order to do so, the routing table at brokers also stores the identity of the subscriber that issued each subscription. This way, when a subscriber migrates, the broker to which it was connected can be notified of the change. Phoenix uses two extra types of messages, one for notifying the migration of a subscriber (MIG), and another for replaying queued events to a migrated subscriber (REP). Whenever the subscriber reconnects to the system, possibly to another broker, it issues a MIG message, whose propagation allows routing tables to be updated and published messages for the subscriber to be delivered.

4 Creating the network overlay

In order for the devices on the network to communicate efficiently we must create a logical overlay over a wireless ad hoc network. We need a way to create an acyclic graph (a spanning tree) in order to correctly route the messages. We also need a mechanism that detects when a change in the topology has occurred, so that a new link is created when an old one disappears. The algorithm that creates this graph must also support the formation of several partitions in the network, each one working independently until they can merge again. Reducing the changes made to the graph as a result of physical changes in the network also helps to reduce the migrations needed to synchronize the publish/subscribe system.

We can use any algorithm that gives us these properties. In our case, we have chosen a leader election algorithm that has a heartbeat mechanism in order to keep the leader stable [48]. Once a leader has been elected, this node keeps sending messages so that all the other nodes have it as their leader. When a node receives one of these messages it learns the path to the leader [49], and it broadcasts the message so that it spreads to all nodes within communication range. With this we create the overlay we need for constructing the publish/subscribe system.

Using this algorithm, in the event that the network is partitioned, each of the partitions chooses a leader. Eventually, when the network becomes connected again, the partitions merge, choosing a single leader and maintaining a single graph. Furthermore, when a node first receives the heartbeat message of a new round, it stores the sender as the next hop to the leader. This next hop might be modified by any physical change in the location of a node or by a failure, since the heartbeat message will then arrive via another node. With this we can detect when the topology has changed and notify the publish/subscribe system so that it can migrate accordingly.
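The next-hop bookkeeping described above can be sketched as follows. This is not the leader election algorithm of [48]; it only illustrates how the first heartbeat of each round can be used to detect a topology change. The class name, the round numbering and the decision not to report the very first heartbeat as a change are assumptions of this sketch.

```python
class OverlayNode:
    """Hypothetical per-node state for tracking the next hop to the leader."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.last_round = -1   # highest heartbeat round seen so far
        self.next_hop = None   # neighbor through which the leader is reached

    def on_heartbeat(self, round_no, sender):
        """Process a heartbeat and return True if the next hop changed."""
        if round_no <= self.last_round:
            return False  # later copies of an already seen round are ignored
        self.last_round = round_no
        changed = self.next_hop is not None and sender != self.next_hop
        self.next_hop = sender
        return changed


node = OverlayNode("b7")
events = [
    node.on_heartbeat(1, "b1"),  # first round: b1 becomes the next hop
    node.on_heartbeat(1, "b2"),  # same round via another neighbor: ignored
    node.on_heartbeat(2, "b2"),  # new round arrives first via b2: migration
]
```

A return value of True is the point at which the publish/subscribe layer would be notified that a migration has taken place.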


5 The MFT-PubSub protocol

In this section we describe the Mobile Fault Tolerant publish/subscribe (MFT-PubSub) protocol and the changes made to Phoenix. Since the brokers are moving, a change in the topology can happen at any time. These changes can range from a simple client migration to the migration of multiple brokers at the same time. Due to the changing nature of the communication tree, brokers might not have been notified of a change further down the tree.

To take this into account, we add a timestamp to every message sent by a subscriber. Phoenix also used a timestamp, so that a subscriber could request all the messages it had lost during migration, and these messages were stored on a single broker. In our case, however, subscribers are not the only ones that migrate; brokers migrate as well. A migrating broker has no knowledge of the last message received by a subscriber. Furthermore, since the broker network is also changing, we cannot designate a single broker as the one responsible for storing the events. To solve this, we need a mechanism that tells us whether a message has been delivered. With it, if an error occurs, the broker stores the message as undelivered. When a broker receives a migration message, from a subscriber or from another broker, it sends all the messages stored for the subscribers that migrate. This is why we decided to repurpose the timestamp concept. The new timestamp consists of a sequence number that increases each time a subscriber sends a message. We also include a hop count in the messages, so that any broker knows in how many hops it can reach a subscriber. This pair is referred to as (t, h) in the algorithms. When a migration occurs, these two values provide useful information about how the topology is changing. Any broker with a higher sequence number is deemed to have the latest information and the correct path to that subscriber; if the sequence numbers are equal, the broker that reports being the closest has a higher probability of being correct.
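The comparison rule for these timestamps can be written down as a short sketch. The names `Timestamp` and `is_newer` are hypothetical, and treating the hop count as a strict tie-breaker is an assumption: the text only states that the closer broker is more likely to be correct.

```python
from typing import NamedTuple


class Timestamp(NamedTuple):
    seq: int   # t: sequence number, increased by the subscriber on each message
    hops: int  # h: hop count, increased by each broker that forwards the message


def is_newer(candidate: Timestamp, known: Timestamp) -> bool:
    """Return True if `candidate` should replace `known` for a given subscriber."""
    if candidate.seq != known.seq:
        return candidate.seq > known.seq   # higher sequence number wins
    return candidate.hops < known.hops     # assumed tie-break: fewer hops wins


higher_seq = is_newer(Timestamp(seq=5, hops=4), Timestamp(seq=4, hops=1))
closer = is_newer(Timestamp(seq=4, hops=1), Timestamp(seq=4, hops=3))
identical = is_newer(Timestamp(seq=4, hops=3), Timestamp(seq=4, hops=3))
```

Note that a higher sequence number takes precedence even when the reported path is longer, since it reflects more recent information from the subscriber.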

To include this information we have to modify the previously defined SUB, UNS and MIG messages. The changes can be seen in Algorithm 1. When a broker \(b_i\) receives one of these messages, it first checks whether the message contains new information by comparing the timestamps. It then stores the new value and, before propagating the message to the rest of the network, increases the hop count of the message by one. Algorithm 2 shows how the event replay works.


We also have to take into account the possibility of a subscriber migrating from one partition of the network to another. Since both partitions function independently, the subscriber may have different subscriptions in each of them. We added a new message called FILTERS to fix this issue. How this message is sent can be seen on lines 23–27 of Algorithm 1. Whenever a subscriber sends a MIG message, the broker it migrates to answers with a FILTERS message. This message contains all the subscriptions of that subscriber that the broker has in its routing table. Using this information, the subscriber may decide that the subscriptions are outdated and issue SUB or UNS messages to fix and update the routing tables of the brokers in that partition.
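On the subscriber side, the reaction to a FILTERS message can be sketched as a set difference. The function name and the use of plain strings as filter identifiers are assumptions of this sketch; the protocol itself leaves the decision to the subscriber.

```python
def reconcile(wanted, reported):
    """Return the filters to SUB and to UNS after receiving a FILTERS message.

    `wanted` holds the subscriber's current subscriptions; `reported` holds the
    subscriptions the new broker has for this subscriber in its routing table.
    """
    wanted, reported = set(wanted), set(reported)
    to_subscribe = wanted - reported     # missing in this partition
    to_unsubscribe = reported - wanted   # stale in this partition
    return to_subscribe, to_unsubscribe


subs, unsubs = reconcile(wanted={"f1", "f2"}, reported={"f2", "f3"})
```

In this example the subscriber would issue a SUB for f1 and a UNS for f3, while f2 is already consistent and needs no message.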

Table 1 New Message descriptions

Table 1 shows the messages used to support broker mobility without forcing migrations. The main message of this protocol is called BMIG, and it is used for notifying the migration of a broker. We also add two helper messages to fix the routing tables in case of a migration: BQUERY, for asking about the subscriptions a broker has for a specific subscriber, and BSUB, for sending the subscriptions issued by a single subscriber that a broker has in its routing table.

When a broker \(b_i\) migrates from \(b_o\) to \(b_n\), it calculates two sets of subscribers, C for its children and O for the rest, based on their next hop. It updates its routing table for the subscribers in O with \(b_n\) as their new next hop. After sending a BMIG message to \(b_n\), it sends any queued message for subscribers in O through \(b_n\). The code that describes this behavior can be seen in Algorithm 3.
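A minimal sketch of this split is given below. It assumes that the routing state can be summarized as a map from each subscriber to its next hop, and that a subscriber belongs to O exactly when its next hop is the old parent \(b_o\); both are assumptions of this sketch and not a transcription of Algorithm 3.

```python
def split_on_migration(next_hop, old_parent, new_parent):
    """Split known subscribers into children C and the rest O, and re-point O.

    `next_hop` maps subscriber -> next hop and is updated in place.
    """
    children = {s for s, hop in next_hop.items() if hop != old_parent}
    others = set(next_hop) - children
    for s in others:
        next_hop[s] = new_parent  # subscribers in O are now reached through b_n
    return children, others


# Hypothetical state of b_i: s1 is a direct client, s2 is behind child broker b5,
# and s3 and s4 were reached through the old parent b_o.
hops = {"s1": "s1", "s2": "b5", "s3": "b_o", "s4": "b_o"}
C, O = split_on_migration(hops, old_parent="b_o", new_parent="b_n")
```

After the call, the BMIG message to \(b_n\) would carry C and O together with their timestamps.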

A BMIG message has three parameters: two lists of subscribers with their timestamps, separating what the sending broker believes are its children nodes, \(C_j\), from the rest, \(O_j\), and a hop count for the message. Upon reception of this message from another broker \(b_j\), a broker \(b_i\) first goes through the \(C_j\) set in order to find inconsistencies, as shown in lines 75–89 of Algorithm 4. If \(b_i\) has a newer timestamp or a lower hop count to the subscriber than what is shown in \(C_j\), \(b_i\) sends \(b_j\) a message containing the subscriptions of that subscriber with the correct timestamp and hop count (lines 132–138 of Algorithm 5). If, on the other hand, the timestamp held by \(b_i\) is lower, it asks \(b_j\) to send updated information on the subscriber. At the same time, a new list of children subscribers is created with the corrected information. The same procedure is followed for all subscribers in \(O_j\) on lines 90–96; in this case hop counts are ignored and \(b_i\) only checks the timestamps. To finish checking for inconsistencies in \(b_j\)’s routing table, on lines 97–101 the first broker that receives a BMIG message checks whether the sets \(C_j\) and \(O_j\) together contain all the subscribers \(b_i\) knows. For any subscriber that is not in the union of both sets, \(b_i\) sends a message back to \(b_j\) with its subscriptions.
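The check of the \(C_j\) set can be sketched as follows. The function and variable names are hypothetical. Two assumptions are made: the hop count is only compared when the sequence numbers are equal, and a subscriber unknown to \(b_i\) is accepted as reported. The sketch only decides which subscribers would trigger a BSUB or a BQUERY; it does not send messages.

```python
def check_children(C_j, local):
    """Compare the sender's child list C_j with the local view of broker b_i.

    Both arguments map subscriber -> (seq, hops). Returns the subscribers that
    need a BSUB sent to b_j, those that need a BQUERY, and the corrected list.
    """
    send_bsub, send_bquery, corrected = [], [], {}
    for s, (seq, hops) in C_j.items():
        if s not in local:
            corrected[s] = (seq, hops)  # assumption: accept unknown subscribers
            continue
        l_seq, l_hops = local[s]
        if l_seq > seq or (l_seq == seq and l_hops < hops):
            send_bsub.append(s)          # local information is better
            corrected[s] = (l_seq, l_hops)
        else:
            if l_seq < seq:
                send_bquery.append(s)    # b_j knows more: ask for its subscriptions
            corrected[s] = (seq, hops)
    return send_bsub, send_bquery, corrected


C_j = {"s1": (3, 2), "s2": (5, 1), "s3": (4, 3)}
local = {"s1": (4, 5), "s2": (2, 2), "s3": (4, 1)}
bsub, bquery, fixed = check_children(C_j, local)
```

For \(O_j\) the same loop would apply with the hop count comparison removed.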

Once all inconsistencies have been fixed, in lines 103–108, \(b_i\) updates its routing table to reflect the change in topology for the subscribers that are children of \(b_j\), and the previous next hops for those subscribers are stored. The corrected children list is forwarded to those stored brokers. Finally, any queued messages are forwarded to the updated subscribers.

Once a broker \(b_i\) receives a BSUB message, it checks whether the message carries a newer timestamp for the subscriber than the one \(b_i\) itself has; if it is older, \(b_i\) ignores the message. Otherwise, \(b_i\) first removes all entries for that subscriber from its routing table and then adds the ones that came with the message, updating the subscriber’s timestamp, as shown on lines 117–123 of Algorithm 5. The message is then forwarded as if it were a SUB message issued by any subscriber. Finally, any queued message is sent to the subscriber.
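The BSUB handling just described can be sketched as follows. Routing entries are modeled here as triples (filter, subscriber, next hop), which is an assumption of this sketch, as is the choice to ignore a message whose sequence number is equal to the stored one. Forwarding and the replay of queued messages are left out.

```python
def on_bsub(routing_table, stamps, subscriber, filters, seq, sender):
    """Apply a BSUB message; return True if it was applied and should be forwarded.

    `routing_table` is a list of (filter, subscriber, next_hop) entries and
    `stamps` maps subscriber -> last known sequence number; both are updated
    in place.
    """
    if seq <= stamps.get(subscriber, -1):
        return False  # not newer than the local information: ignore
    # Drop every existing entry for this subscriber, then install the received ones.
    routing_table[:] = [e for e in routing_table if e[1] != subscriber]
    routing_table.extend((f, subscriber, sender) for f in filters)
    stamps[subscriber] = seq
    return True


table = [("f1", "s1", "b2"), ("f9", "s2", "b2")]
stamps = {"s1": 3}
applied = on_bsub(table, stamps, "s1", ["f2", "f3"], seq=4, sender="b4")
stale = on_bsub(table, stamps, "s1", ["f1"], seq=2, sender="b5")
```

Replacing all entries at once, rather than merging them, keeps the table consistent with the sender's view even when subscriptions were removed in the meantime.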

When a broker receives a BQUERY message with a timestamp older than the one it has, it answers directly with the subscriptions of the subscriber the message asks about (lines 129–131 of Algorithm 5).

Fig. 1 Panel a shows a straightforward migration of one broker, whereas panel b has two migrating nodes. Numbers on links refer to the order of events

Figure 1 contains two examples of possible migrations. If we look at the messages needed to complete the migration shown in Fig. 1a, we see only two messages: a BMIG message that is sent by \(b_3\) and needs to be routed to \(b_1\) through \(b_2\). The migration that takes place in Fig. 1b, on the other hand, is more complicated. In this case we need to notify \(b_3\) that \(b_6\) has migrated before it, so that it can update its routing table during its own migration.

6 Performance evaluation

This section presents the performance evaluation of the MFT-PubSub protocol presented in the previous section. Results have been obtained by simulation, using the OMNeT++ [50] tool with the Castalia [51] simulation framework. Table 2 presents the different simulated scenarios. The area has been calculated for a node density of 0.005 nodes per square meter, which is adequate for wireless sensor networks, i.e., an area of 200 square meters per node. We also define a role (publisher, subscriber or broker) for each node. We are interested in seeing how the protocol behaves in different scenarios, not in fully stress testing it. For this reason we have chosen to use only 2 publishers for all scenarios, whereas the number of subscribers and brokers increases. The protocol does not store any information on the publishers: a publisher simply sends a message and the brokers are in charge of correctly delivering it. By increasing the number of subscribers and brokers we increase the number of nodes the protocol has to take into account. Subscribers also subscribe to 2 filters, and publishers randomly choose to send messages matching one or the other.
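The relation between node count and simulation area follows directly from the stated density. The short sketch below reproduces that arithmetic; the square shape of the field is an assumption, and the node count of 66 is used only because the conclusion mentions networks of that size.

```python
import math

NODE_DENSITY = 0.005  # nodes per square meter, as stated in the text


def simulation_area(num_nodes, density=NODE_DENSITY):
    """Return (area in m^2, side in m of a square field) for a node count."""
    area = num_nodes / density  # equivalent to 200 m^2 per node
    return area, math.sqrt(area)


area, side = simulation_area(66)
print(f"{area:.0f} m^2, square side of about {side:.1f} m")
```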

Table 2 Simulation Configurations

The duration of the simulations is set at 700 seconds, with each publisher publishing 1 message every second. This message generation rate is enough to test whether the protocol works without overloading the communication buffers of the low-power devices of the simulation. The messages have a constant data payload of 100 bytes. At the end there is a 200 second period where no new messages are sent, so that messages that are still in buffers have the time and the opportunity to be delivered. The mobility of nodes follows a random waypoint model [52], with speeds of 2, 4, 6, 8 and 10 meters per second. Under this mobility model, a node chooses a random point in the simulation area and moves towards it at a constant speed. Once the point is reached, the process is repeated. For the radio module we use one that is already configured in the Castalia framework, the CC2420 chip, with a transmission power of 0 dBm and an additive collision model. With respect to the MAC layer, we have used the Carrier-Sense Multiple Access (CSMA) configuration that comes with the Castalia installation. All possible combinations of size and speed are repeated 10 times with a different random seed for the mobility pattern, and the results are averaged. The combination of node density and movement speed means that all iterations have a large number of migrations and moments where the network is partitioned. The total number of migrations ranges from 10 in the smallest simulations to over 600 in the largest ones. The low transmission and reception power and the low speed of the radio module chosen for the simulation mean that in the larger scenarios the number of message collisions increases, causing over 600 migrations as nodes try to find better connections.
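The mobility model can be illustrated with a small standalone sketch. It is not the Castalia mobility module used in the simulations: the function name, the fixed time step, the absence of a pause time at each waypoint and the uniform choice of waypoints over a rectangular field are all assumptions made here.

```python
import math
import random


def random_waypoint(width, height, speed, duration, dt=1.0, seed=0):
    """Yield (t, x, y) positions of one node under a simple random waypoint model."""
    rng = random.Random(seed)
    x, y = rng.uniform(0, width), rng.uniform(0, height)
    tx, ty = rng.uniform(0, width), rng.uniform(0, height)
    t = 0.0
    while t <= duration:
        yield t, x, y
        step = speed * dt
        dist = math.hypot(tx - x, ty - y)
        if dist <= step:
            # Waypoint reached: move onto it and pick the next one (no pause time).
            x, y = tx, ty
            tx, ty = rng.uniform(0, width), rng.uniform(0, height)
        else:
            # Advance one step along the straight line towards the waypoint.
            x += (tx - x) / dist * step
            y += (ty - y) / dist * step
        t += dt


trace = list(random_waypoint(width=100, height=100, speed=2, duration=50))
```

Because each position lies on a segment between two points of the field, the node never leaves the simulation area.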

Due to the difficulty of finding an algorithm that allows for full broker mobility in a publish/subscribe system, we had to choose a more general communication protocol for ad hoc networks. We compare our protocol with Ad hoc On-Demand Distance Vector (AODV) [53]. AODV uses a reactive approach to route creation to compensate for the dynamic nature of the network: routes are created only when a node wants to send a message. In order to better compare the two protocols, we use the same roles that can be seen in Table 2. With AODV, publishers are informed of the subscriber identifiers via a configuration file and all nodes work as brokers.

6.1 Delivery rate

One of the metrics that tells us how well our protocol works is the delivery rate of messages. We define the delivery rate as the number of messages a subscriber receives with respect to the number that were originally sent to it. In MFT-PubSub, messages that are not yet delivered are stored on the brokers, waiting to be sent as soon as the broker receives new information about the subscriber. Eventually all messages will be delivered, but in our simulations we consider any message not yet delivered at the end of the simulation as undelivered.
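As a sketch of how this metric can be computed from per-subscriber counters, consider the function below. The function name and the counter layout are assumptions, as is the choice to aggregate over all subscribers before dividing rather than averaging per-subscriber rates.

```python
def delivery_rate(routed, delivered):
    """Fraction of messages routed towards subscribers that were delivered.

    `routed` maps subscriber -> messages that had to be delivered to it;
    `delivered` maps subscriber -> messages it actually received by the end
    of the simulation. Messages still buffered at that point count as lost.
    """
    total = sum(routed.values())
    if total == 0:
        return 0.0
    got = sum(min(delivered.get(s, 0), n) for s, n in routed.items())
    return got / total


rate = delivery_rate(routed={"s1": 100, "s2": 50}, delivered={"s1": 90, "s2": 30})
```

In this example 120 of the 150 routed messages arrive, which gives a delivery rate of 0.8.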

In Fig. 2 we can see a comparison between our protocol and AODV. MFT-PubSub seems to be more resilient to speed, even improving its delivery rate as the speed goes up. Both protocols are strongly affected by the network size: the bigger the network, the harder it is to correctly deliver a message.

In addition, Fig. 3 shows how many messages are actually delivered. The behavior seen in Fig. 2a, where the delivery rate improves at higher speeds, can be further analyzed with Fig. 3b. Here we can see a slight increase in the total number of messages delivered as the speed grows, but once the speed reaches 6 m/s it starts to drop. This behavior can be explained by the way our algorithm buffers messages. Whenever a broker cannot find the path to a subscriber, it stores the message and waits for new information on that subscriber; as the speed goes up, there are more opportunities for a subscriber to pass by a broker that holds a message for it. If we compare the delivery rate differences in Fig. 2 with the number of messages delivered in Fig. 3a, the difference in delivered messages might seem smaller than the delivery rates suggest. This is due to how a publish/subscribe system works: in order for a message to be delivered to a subscriber, that subscriber first has to subscribe to some content. In these simulations, only the messages that are routed towards a subscriber count as having to be delivered to that subscriber. If a broker receives a PUB message but does not know that a subscriber on the other side of the network is interested in it, the message is not considered a loss.

Fig. 2 Message delivery rate comparison of both algorithms depending on the size of the simulation

Fig. 3 Average number of messages correctly delivered. Panel a shows the results for a node speed of 8 m/s on different configurations; panel b shows the results of all speeds for the C16 configuration

6.2 End-to-end delay

Another metric is the time it takes a message to reach its destination, which we call the end-to-end delay. Figure 4 shows how network size affects the end-to-end delay. Even though MFT-PubSub buffers messages to be delivered at a later time, it still keeps up with AODV, which tries to deliver a message as soon as possible, and it even obtains better results on bigger networks, where AODV struggles to keep routing tables updated.

Fig. 4 Comparison of end-to-end delay, in seconds, for data messages for a node speed of 8 m/s. Note the logarithmic scale on the y axis

6.3 Number of messages exchanged

Finally, an interesting metric is the total number of messages exchanged in the network. This gives us insight into how efficiently a protocol is able to route messages, and how much overhead the protocol creates. Figure 5 shows this data as the average number of messages sent by each node, be it to find a route, to deliver a publication or for any other purpose. For the smallest configuration AODV performs better than our protocol, since MFT-PubSub has to maintain a communication tree. However, whereas the number of messages needed by our algorithm barely changes as the network size grows, AODV shows a large increase in the number of messages it needs to find the correct routes.

Fig. 5 Comparison of the average number of messages sent by each node in order to correctly route messages for a node speed of 8 m/s. Note the logarithmic scale on the y axis

In the case of MFT-PubSub we also observed a big difference in the number of messages sent by publishers and subscribers on the one hand and by brokers on the other. The former only need to send a few messages in total to stay connected to the spanning tree, while the brokers do most of the work.

7 Conclusion

In this paper we have presented an approach to introduce full mobility support (i.e., not only clients but also brokers) for a publish/subscribe system. It is based on a spanning tree created via a leader election algorithm that works in situations where it is not known a priori how many nodes there are. This algorithm also gives us a mechanism to detect the movement of nodes as a migration.

Our protocol, named MFT-PubSub, uses a mechanism to reduce the number of messages for any migration and only exchanges information when explicitly asked.

We have simulated MFT-PubSub on Castalia and compared it to AODV, to analyze the performance with respect to mobility support and number of devices supported. We improve on the message delivery rate of AODV, though the performance is significantly reduced for networks up to 66 nodes. We have also shown that the number of messages exchanged increases linearly with the number of nodes in the system and is an order of magnitude lower than AODV for simulations of more than 10 nodes.

MFT-PubSub allows any node in the network to take any role of a publish/subscribe system, be it publisher, subscriber or broker. In the future we want to further test this approach and use it as a solution for multicast communication in mobile environments.