Information Systems Frontiers

Volume 16, Issue 1, pp 19–34

Scalable and leaderless Byzantine consensus in cloud computing environments

  • JongBeom Lim
  • Taeweon Suh
  • JoonMin Gil
  • Heonchang Yu

Abstract

Traditional Byzantine consensus in distributed systems requires n ≥ 3f + 1, where n is the number of nodes and f is the number of Byzantine nodes. In this paper, we present a scalable and leaderless Byzantine consensus implementation based on gossip that requires only n ≥ 2f + 1 nodes. Unlike in conventional distributed systems, the network topology of cloud computing systems is often not fully connected but loosely coupled and layered. Hence, we revisit the Byzantine consensus problem in cloud computing environments, in which each node maintains a small number of neighbors, called its local view. The message complexity of our Byzantine consensus scheme is O(n) instead of O(n²). Experimental results and a correctness proof show that our Byzantine consensus scheme solves the Byzantine consensus problem safely and scalably, without a bottleneck or a leader, in cloud computing environments.

Keywords

Byzantine fault tolerance · Consensus · Gossip · Cloud computing

1 Introduction

Byzantine consensus is a well-known problem in distributed systems. To reach consensus, each process (or node) \(P_i\) decides (or proposes) a value for a distributed computation. The processes (or nodes) in the system then communicate with one another, exchanging their decided values. In this problem, a process (or node) may exhibit arbitrary and unpredictable behavior, intentionally or inadvertently. This class of failure is called a Byzantine failure and is considered the worst possible failure model. Traditionally, a solution requires at least 3f + 1 nodes to tolerate f Byzantine nodes (Lamport et al. 1982). Because reducing the total number of nodes in the presence of Byzantine nodes is important in distributed systems, several algorithms that require only 2f + 1 nodes have been developed (Yin et al. 2003; Correia et al. 2004; Chun et al. 2007; Kapitza et al. 2012; Veronese et al. 2013).

Another important factor that may influence the performance of Byzantine fault tolerance algorithms is scalability, i.e., an algorithm should work well, without a bottleneck, as the number of nodes increases. However, previous Byzantine fault tolerance algorithms explicitly or implicitly assume that the network topology is fully connected, because they rely on broadcast or multicast primitives. Therefore, they scale poorly to a large number of nodes. In fact, the network topology in cloud computing is often not fully connected and is layered owing to loosely coupled environments, i.e., virtualized resources are provided as a service over the Internet as needed (Wang et al. 2011).

Moreover, many Byzantine consensus algorithms rely on an elected leader to coordinate the agreement protocol; these are vulnerable to performance degradation caused by a malicious leader (Amir et al. 2011), as well as to a single point of failure at the leader. These observations motivate us to improve Byzantine consensus algorithms, especially for cloud computing environments, in which the underlying overlay is susceptible to frequent topology changes due to the dynamism and churn of participating nodes.

In this paper, we present a scalable and leaderless Byzantine consensus solution for cloud computing environments. Specifically, we use a randomized approach based on gossip to solve the Byzantine consensus problem and to address the aforementioned issues in previous work. In our approach, each node stores a small amount of membership information, called a local view, instead of full membership information, and each local view can be constructed by sampling random nodes in the system.

From an application perspective, a gossip protocol is also suitable for social cloud computing (Chard et al. 2012), in which individual users (or nodes) of a social network leverage pre-existing relationships between users (or nodes) to enable mutually beneficial sharing of resources and services, as well as for cloud computing environments oriented toward distributed computation or data-intensive workloads.

However, determining whether consensus has been reached is not a trivial task in such an environment, because there is no leader to coordinate the protocol and each node maintains only a small amount of membership information about the system. Hence, we devise a new algorithm that uses a piggybacking mechanism. Table 1 compares Byzantine consensus algorithms.
Table 1

Comparison of Byzantine consensus algorithms

| Approach | Number of nodes | Tamperproof component | Single point of failure | Leader-based | Scalability |
|---|---|---|---|---|---|
| State machine replication | 3f + 1 or 2f + 1 | not required | yes | yes | × |
| Byzantine quorum system | 4f + 1 | not required | yes/no | yes/no | △ |
| Tamperproof-based | 2f + 1 | required | yes | yes | × |
| Our approach | 2f + 1 | not required | no | no | ○ |

The contributions of this paper can be summarized as follows:
  1. It presents a Byzantine consensus solution that requires n ≥ 2f + 1 nodes, the minimum in the literature, without tamperproof components, while satisfying the correctness properties of safety and liveness;

  2. It introduces a new design rationale for Byzantine consensus in cloud computing environments that uses gossip to address scalability, bottlenecks, and single points of failure. To achieve this goal, new algorithms are devised and incorporated into the gossip protocol;

  3. It shows that our Byzantine consensus solution provides several benefits: (1) it scales well as the number of nodes increases; and (2) it has no bottleneck or single point of failure, because there is no elected leader.

The rest of this paper is organized as follows. Section 2 describes background information and the Byzantine consensus model in cloud computing environments. Section 3 presents the details of our algorithm, together with an example and a correctness proof. Section 4 evaluates our algorithm at scale. Section 5 discusses related work, and Section 6 concludes the paper.

2 Preliminaries

In this section, we begin with a brief review of fault tolerance in distributed systems and then describe the Byzantine consensus problem especially in cloud computing environments, rather than traditional distributed systems. It goes on to examine how the Byzantine consensus can be reached with only 2f + 1 nodes. The section then describes a gossip protocol, on which our Byzantine consensus algorithm is based.

2.1 Fault tolerance

Fault tolerance refers to the ability of a system to continue its intended operation, possibly at a reduced level, rather than failing completely, when some part of the system fails (Johnson 1984). There are two subareas of fault tolerance in computer systems: hardware fault tolerance and software fault tolerance. Because hardware fault tolerance techniques come under the purview of the electrical engineering community, we mainly focus on software fault tolerance techniques.

Fault-tolerant systems are usually based on the concept of redundancy or replication. In redundancy techniques, multiple identical instances of the same system or component are provided, and when an instance fails, operation switches to one of the remaining instances. In replication techniques, multiple identical instances of the same system or component are provided, and tasks or requests are executed on all instances in parallel, so that the correct result can be chosen from their outputs.
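To make the replication technique concrete, the following is a minimal Python sketch of running one request on every replica and voting on the outputs. Replicas are modeled as plain functions and executed one after another rather than in parallel; the names `majority_vote` and `run_replicated` are hypothetical and not part of the paper.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the output shared by a strict majority of replicas, or None if there is none."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def run_replicated(replicas: Sequence[Callable[[int], int]], request: int) -> Optional[int]:
    """Send the same request to every replica (sequentially here) and vote on the results."""
    return majority_vote([replica(request) for replica in replicas])


def healthy(x: int) -> int:
    return x * x


def faulty(x: int) -> int:
    return -1  # hypothetical replica that returns a wrong result


print(run_replicated([healthy, healthy, faulty], 7))  # 49
```

Unlike failover in redundancy techniques, the faulty replica here is never detected explicitly; its output is simply outvoted.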

Byzantine consensus is also based on the concept of replication, and the consensus problem is more difficult when nodes exhibit Byzantine behavior than under the fail-stop model (Schneider 1984), in which a failed node remains halted forever. In the Byzantine failure model, a failed node need not halt; it may continue to behave arbitrarily, including stopping, crashing, corrupting its local state, and producing incorrect or inconsistent outputs.

2.2 Byzantine consensus in cloud computing environments

Figure 1 shows the cloud computing architecture, consisting of a client, service end-points, and the underlying infrastructure. A client requests an application service by contacting a service end-point (a web portal) and receives a response or a result from the service end-point. To perform an application-specific service, the service end-point uses a number of (virtualized) computing nodes connected by overlay links. The physical infrastructure is generally composed of several computing clusters, often providing virtualization capabilities to support a number of virtual machines within a host machine (Buyya et al. 2009; Komal et al. 2013).
Fig. 1

Cloud computing architecture

To complete a distributed computation while achieving a common goal, the nodes must agree on some values. Examples of such applications include mutual exclusion, termination detection, and deadlock detection. Byzantine fault tolerance is important in cloud computing environments because cloud computing should guarantee robustness, reliability, and availability in the presence of node joins, departures, and failures caused by the highly dynamic nature of the system (Zhang et al. 2011). In this regard, Byzantine consensus is a basic and important building block for the aforementioned applications.

Informally, consensus is the problem of making a set of nodes agree on a value. Consensus in the presence of Byzantine nodes typically requires n ≥ 3f + 1 nodes. In Fig. 2a, the commander broadcasts the initial value a. As soon as each node other than the commander receives the initial value, it broadcasts the received value and then waits until it has collected n values (including its own). It then evaluates the majority. For example, node A will return “a” because the majority of the received values {a, a, b} is a.
Fig. 2

Byzantine consensus
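The Fig. 2a exchange can be sketched as follows, assuming a correct commander, three lieutenants (so n = 4 = 3f + 1 with f = 1), and one Byzantine lieutenant that always relays a forged value b. The function name and the forged value are illustrative assumptions; a faulty commander is not modeled here, which would require the full oral-messages algorithm of Lamport et al. (1982).

```python
from collections import Counter


def one_round_relay(commander_value, lieutenants, byzantine, forged="b"):
    """Fig. 2a-style exchange with a correct commander; returns each correct lieutenant's decision."""
    # Step 1: the (correct) commander sends the same initial value to every lieutenant.
    received = {lt: commander_value for lt in lieutenants}
    decisions = {}
    for lt in lieutenants:
        if lt in byzantine:
            continue  # decisions of Byzantine lieutenants are irrelevant
        # Step 2: collect the node's own value plus the value relayed by every other lieutenant;
        # a Byzantine lieutenant relays the forged value instead of what it received.
        collected = [received[lt]] + [
            forged if other in byzantine else received[other]
            for other in lieutenants
            if other != lt
        ]
        # Step 3: decide by majority of the collected values.
        decisions[lt] = Counter(collected).most_common(1)[0][0]
    return decisions


print(one_round_relay("a", ["A", "B", "C"], byzantine={"C"}))  # {'A': 'a', 'B': 'a'}
```

Node A collects {a, a, b}, exactly as in the text, and the majority masks the forged relay.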

However, we observe that in most applications in cloud computing environments, a decision value is determined not by an external node (the commander) but by an internal computing node, i.e., the node proposes a value derived from its own state and computation. Examples include synchronization and ordering of activities, and coordinating updates to replicated data. Figure 2b shows this situation. (For instance, a client may request a computation such as max(min(a, b), min(c, d)).) In such a situation, only n ≥ 2f + 1 nodes are required, since a correct node needs only f + 1 correct received values (including its own) to evaluate the majority. To respond to the client's request, a node within the service end-point periodically samples one of the computing nodes at random, instead of aggregating 2f + 1 matching messages. This avoids another potential source of performance degradation, and it is safe for the client because a piggybacked message consists of messages individually signed by other nodes. Further details of our Byzantine consensus scheme are given in Section 3.
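The majority argument behind n ≥ 2f + 1 can be checked exhaustively for small f. This sketch assumes the Fig. 2b setting in which every correct node computes the same value internally and each correct node collects exactly one value per node; Byzantine nodes may report any value from a small hypothetical domain. It checks only this counting argument, not the full protocol.

```python
from collections import Counter
from itertools import product


def majority(values):
    """Strict-majority value of a list, or None if no value has a strict majority."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None


def always_correct(f, n=None, correct="a", domain=("a", "b", "c")):
    """True if every possible choice of Byzantine values still yields `correct` as the majority."""
    n = 2 * f + 1 if n is None else n
    honest = [correct] * (n - f)  # values of the correct nodes, including the node's own
    return all(
        majority(honest + list(lies)) == correct
        for lies in product(domain, repeat=f)  # every assignment of values to the f Byzantine nodes
    )


print([always_correct(f) for f in range(1, 4)])  # [True, True, True]
print(always_correct(1, n=2))  # False: with only 2f nodes a single lie can block the majority
```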

2.3 Gossip protocol

A gossip protocol is a method of communication among uniquely identifiable nodes that proceeds in cycles, inspired by the way gossip spreads in social networks. For example, if node P has just received an update to some data, it tries to spread the update to other nodes: it contacts some neighbors and pushes the data to them. Conversely, if node P has not yet seen the new data, it tries to obtain it by pulling from other neighbors. Because of its inherent properties, a gossip protocol guarantees message delivery with high probability even when failures occur (Ganesh et al. 2003). Refer to Allavena et al. (2005) and Gurevich and Keidar (2009) for correctness proofs of gossip protocols. In its simplest form, a gossip protocol has two states: susceptible and infected. This form of the gossip protocol is called the SI model (Newman 2010). The logistic growth function for the SI model can be described mathematically as follows:
$$ I(c)=\frac{i_{0}e^{fc}}{1-i_{0}+i_{0}e^{fc}} $$
(1)
where \(i_0\) is the value of I(c), the fraction of infected nodes, at c = 0; f is the fanout (not to be confused with f, the number of Byzantine nodes, used elsewhere in this paper); and c is the cycle number.

In a gossip protocol, each node maintains a small amount of membership information, called its local view, rather than full membership information of the system. Hence, the overlay network can be greatly simplified. At each cycle, a node selects f (fanout) gossip targets from its local view and then communicates with them in one of the following ways: (1) push mode, (2) pull mode, or (3) push-pull mode. Algorithms 1 and 2 illustrate how a gossip protocol operates.
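Equation (1) can be evaluated numerically. The sketch below assumes that a single node is initially infected, so \(i_0 = 1/n\), and uses `fanout` for the f of Eq. (1); the helper names are hypothetical.

```python
import math


def infected_fraction(c, i0, fanout):
    """Eq. (1): fraction of infected nodes after c cycles in the SI model."""
    growth = i0 * math.exp(fanout * c)
    return growth / (1 - i0 + growth)


def cycles_until(threshold, i0, fanout):
    """Smallest whole number of cycles at which Eq. (1) reaches the threshold."""
    c = 0
    while infected_fraction(c, i0, fanout) < threshold:
        c += 1
    return c


n = 10_000
print(cycles_until(0.99, 1 / n, fanout=1))  # 14
```

With \(n = 10^{4}\) and a fanout of 1, Eq. (1) reaches 99% infected nodes after 14 cycles; this is a property of the logistic model alone, not a measured result.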

In the active thread, a node selects a gossip target (in this algorithm, we assume that f is set to 1 for brevity). In push mode, it then sends its own local information to the target. In pull mode, it receives a message containing the target's local information and updates its local information with the message. In the passive thread, a node waits until it is selected as a target by another node. Whenever it is selected, in push mode it receives a message containing the initiating node's local information and updates its local information with the message; in pull mode, it simply sends a message containing its local information. Push-pull mode can be seen as a combination of push mode and pull mode.
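The active and passive threads described above can be sketched in Python as a single exchange function. This is an illustration under stated assumptions, not a transcription of Algorithms 1 and 2: local information is a set of items, merging is set union, the fanout is 1, and the small ring-like overlay (each node's local view holds its next two nodes) is hypothetical.

```python
import random


class Node:
    """A gossiping node; its local information is modeled as a set of known items (an assumption)."""

    def __init__(self, name, view):
        self.name = name
        self.view = view    # local view: names of a few neighbors
        self.info = {name}  # initially, a node knows only its own item

    def update(self, received):
        # Hypothetical merge rule: keep every item either side knows.
        self.info |= received


def active_step(node, nodes, mode, rng):
    """One active-thread step with fanout 1 (cf. Algorithm 1); the target plays Algorithm 2."""
    target = nodes[rng.choice(node.view)]
    mine, theirs = set(node.info), set(target.info)  # snapshot both sides before exchanging
    if mode in ("push", "push-pull"):
        target.update(mine)   # passive thread, push mode: the target merges what it receives
    if mode in ("pull", "push-pull"):
        node.update(theirs)   # passive thread, pull mode: the target replies with its information


def run(mode, n=8, seed=1, max_cycles=1000):
    """Gossip until every node knows every item; return the number of cycles (None if not reached)."""
    rng = random.Random(seed)
    names = [f"n{i}" for i in range(n)]
    nodes = {nm: Node(nm, [names[(i + 1) % n], names[(i + 2) % n]]) for i, nm in enumerate(names)}
    for cycle in range(1, max_cycles + 1):
        for nm in names:
            active_step(nodes[nm], nodes, mode, rng)
        if all(len(node.info) == n for node in nodes.values()):
            return cycle
    return None


for mode in ("push", "pull", "push-pull"):
    print(mode, run(mode))
```

Push-pull is simply both branches of `active_step` taken in the same contact, matching the remark that it combines push and pull.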

3 The proposed Byzantine consensus technique

The basic idea of our Byzantine consensus scheme is that, rather than having each node broadcast only its own value, we require each node to maintain a data structure that can hold other nodes’ values. (Because, as in cloud computing environments, we do not assume a fully connected network topology, it is hard to maintain a consistent state by sending only a node's own value to a gossip target.) When gossiping, each node then sends and receives this data structure to maintain a consistent state. In contrast to algorithms that rely on an elected leader to coordinate agreement, our Byzantine consensus scheme can safely maintain a consistent state with only n ≥ 2f + 1 nodes.

3.1 System model

The system is composed of a set of nodes or processes \(\Pi = \{node_{1}, node_{2}, node_{3}, \ldots, node_{n}\}\), and every node is functionally equivalent to every other node. Henceforth, we use the terms node and process interchangeably. A node is an independent processing unit in a given environment; in cloud computing environments, a node can be considered a single virtual machine. There is no global memory, so message passing is the only way for nodes to communicate with each other. Communication channels are reliable but not necessarily FIFO. In terms of failures, we assume that any node can be subject to Byzantine failures, i.e., it may deviate arbitrarily from the specification of the algorithm, intentionally or inadvertently.

To prevent identity forgery, we use a digital signature scheme with public and private keys for signing and verifying messages. That is, a node signs a message with its private key before sending it to a gossip target, and the receiver verifies the message using the sender’s public key. This guarantees the identity of the signatory and non-repudiation.
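The paper does not name a concrete signature scheme. Purely to illustrate the sign-with-private-key, verify-with-public-key flow, the sketch below uses textbook RSA with tiny primes, which is insecure and must not be used in practice; a real deployment would use a vetted cryptographic library. The message format `node_3:LocalDecision=a` is hypothetical.

```python
import hashlib

# Toy textbook-RSA key pair from tiny primes: for illustration only, NOT secure.
P, Q, E = 61, 53, 17
N = P * Q                          # public modulus (3233)
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (2753); needs Python 3.8+


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes, private_exponent: int = D) -> int:
    """The sender signs with its private key before gossiping."""
    return pow(digest(message), private_exponent, N)


def verify(message: bytes, signature: int, public_exponent: int = E) -> bool:
    """The receiver checks the signature with the sender's public key."""
    return pow(signature, public_exponent, N) == digest(message)


msg = b"node_3:LocalDecision=a"
sig = sign(msg)
print(verify(msg, sig))            # True
print(verify(msg, (sig + 1) % N))  # False: a tampered signature is rejected
```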

3.2 Algorithmic details

Algorithms 3 and 4 present our Byzantine consensus algorithms using gossip in the active and passive threads, respectively. Our algorithms use three data structures: LocalDecision, DecisionVector, and FinalDecision. LocalDecision is the local decision computed at each node. DecisionVector is an array that can hold the n LocalDecision values of the system, where n is the number of nodes. FinalDecision is computed at each node as the majority value of its DecisionVector.

We now describe the algorithmic details. In line 2 of Algorithm 3, a node \(P_i\) computes LocalDecision. Then, the value of LocalDecision is assigned to the ith element of \(DecisionVector_i\) (line 3). At each cycle, the procedure in lines 6–12 is repeated f (fanout) times. After selecting a neighbor from its local view, the node sends its DecisionVector to the target if it is in push mode; if it is in pull mode, it receives the target's DecisionVector and calls the updateVector() function. After that, it calls the checkConsensus() function. In the updateVector() function (lines 13–16), each element of the node's own DecisionVector is updated by comparing it with the corresponding element of the target's DecisionVector. In the checkConsensus() function (lines 17–20), the node first computes the majority value of its own DecisionVector. Next, it checks whether at least \(\lfloor n/2 \rfloor + 1\) elements of DecisionVector equal the majority value. If this condition holds, it assigns the majority value to FinalDecision.
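The two helper functions can be sketched as follows. The element representation is an assumption inferred from the example in Section 3.3: each DecisionVector element is a set of the values seen for that node (an empty set plays the role of null), updateVector() merges by set union, and elements holding conflicting values do not vote in checkConsensus(). Signature checking is omitted.

```python
from collections import Counter
from typing import List, Optional, Set

DecisionVector = List[Set[str]]  # element i: values seen for node i (empty set = null)


def new_vector(n: int, i: int, local_decision: str) -> DecisionVector:
    """Lines 2-3: store node i's LocalDecision in the ith element."""
    vector: DecisionVector = [set() for _ in range(n)]
    vector[i].add(local_decision)
    return vector


def update_vector(own: DecisionVector, other: DecisionVector) -> None:
    """updateVector(): merge the target's vector element by element (assumed: set union)."""
    for i, values in enumerate(other):
        own[i] |= values


def check_consensus(own: DecisionVector) -> Optional[str]:
    """checkConsensus(): FinalDecision if the majority value fills at least floor(n/2) + 1 elements."""
    single = [next(iter(v)) for v in own if len(v) == 1]  # null or conflicting elements do not vote
    if not single:
        return None
    value, count = Counter(single).most_common(1)[0]
    return value if count >= len(own) // 2 + 1 else None


a, b = new_vector(3, 0, "x"), new_vector(3, 1, "x")
update_vector(a, b)  # one push-pull exchange: both sides merge
update_vector(b, a)
print(check_consensus(a), check_consensus(b))  # x x
```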

In the passive thread (Algorithm 4), a node waits indefinitely to be selected as a gossip target. Each time it is selected by another node, it receives that node's DecisionVector and updates its own DecisionVector if it is using push mode; it sends its DecisionVector to that node if it is using pull mode. The updateVector() function is the same as in Algorithm 3. Note also that push-pull mode is a combination of push and pull modes.

3.3 Example

Figure 3 shows an example of executing the Byzantine consensus algorithms using gossip. In this example, there are 5 nodes, including 2 Byzantine nodes (node B and node E), so n = 5 ≥ 2f + 1 = 5. Byzantine nodes are indicated with gray filled circles, and the other nodes are assumed to be correct. We assume that each node has a local view containing 2 random neighbor nodes. The initial states of the nodes are shown in Fig. 3a. The correct decision value is assumed to be a. Although the order of execution among nodes is not determined in practice, we assume for brevity that the nodes execute in alphabetical order. In addition, the push-pull gossip mode is used in the example.
Fig. 3

Example of execution

The execution during cycle 1 is as follows.
  • Node A selects node C from its local view (indicated with a right arrow in Fig. 3b) as a gossip target. Next, node A sends its DecisionVector [a, null, null, null, null] to node C and receives node C’s DecisionVector [null, null, a, null, null]. Then, nodes A and C update their DecisionVectors. The updated DecisionVectors of both nodes at this moment are [a, null, a, null, null].

  • Node B selects node E from its local view as a gossip target. Next, node B sends its DecisionVector [null, c, null, null, null] to node E and receives node E’s DecisionVector [null, null, null, null, d]. (Since both are Byzantine nodes, they may send an arbitrary value each time.) Then, nodes B and E update their DecisionVectors. The updated DecisionVectors of both nodes at this moment are [null, c, null, null, d].

  • Node C selects node B from its local view as a gossip target. Next, node C sends its DecisionVector [a, null, a, null, null] to node B and receives node B’s DecisionVector [null, e, null, null, d]. Then, nodes C and B update their DecisionVectors. The updated DecisionVectors of both nodes at this moment are [a, e, a, null, d].

  • Node D selects node C from its local view as a gossip target. Next, node D sends its DecisionVector [null, null, null, a, null] to node C and receives node C’s DecisionVector [a, e, a, null, d]. Then, nodes D and C update their DecisionVectors. The updated DecisionVectors of both nodes at this moment are [a, e, a, a, d].

  • Node E selects node A from its local view as a gossip target. Next, node E sends its DecisionVector [null, c, null, null, f] to node A and receives node A’s DecisionVector [a, null, a, null, null]. Then, nodes E and A update their DecisionVectors. The updated DecisionVectors of both nodes at this moment are [a, c, a, null, f].

The node states after cycle 1 are shown in Fig. 3b. After cycle 1, nodes C and D can decide the consensus value a, since a fills at least \(\lfloor 5/2 \rfloor + 1 = 3\) elements of their DecisionVector [a, e, a, a, d].
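The decision rule of checkConsensus() can be applied directly to the correct nodes' DecisionVectors after cycle 1. In this sketch, None stands for null, and the helper name `final_decision` is hypothetical.

```python
from collections import Counter


def final_decision(vector):
    """Majority value if it fills at least floor(n/2) + 1 elements; null (None) elements never count."""
    values = [v for v in vector if v is not None]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count >= len(vector) // 2 + 1 else None


after_cycle_1 = {  # DecisionVectors of the correct nodes after cycle 1 (Fig. 3b)
    "A": ["a", "c", "a", None, "f"],
    "C": ["a", "e", "a", "a", "d"],
    "D": ["a", "e", "a", "a", "d"],
}
print({node: final_decision(v) for node, v in after_cycle_1.items()})
# {'A': None, 'C': 'a', 'D': 'a'}
```

Node A sees a in only 2 of 5 elements and therefore cannot decide yet.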

The execution during cycle 2 is as follows.
  • Node A selects node D from its local view as a gossip target. Next, node A sends its DecisionVector [a, c, a, null, f] to node D and receives node D’s DecisionVector [a, e, a, a, d]. Then, nodes A and D update their DecisionVectors. The updated DecisionVectors of both nodes at this moment are [a, c·e, a, a, d·f]. At this stage, conflicts are found in the DecisionVector elements belonging to nodes B and E. Nodes A and D therefore suspect that nodes B and E are Byzantine.

  • Node B selects node A from its local view as a gossip target. Next, node B sends its DecisionVector [a, g, a, null, d] to node A and receives node A’s DecisionVector [a, c·e, a, a, d·f]. Then, nodes B and A update their DecisionVectors. The updated DecisionVectors of nodes B and A at this moment are [a, h, a, a, d·f] and [a, c·e·g, a, a, d·f], respectively. Because node B is Byzantine, it does not suspect itself or node E. In contrast, node A suspects both nodes B and E.

  • Node C selects node E from its local view as a gossip target. Next, node C sends its DecisionVector [a, e, a, a, d] to node E and receives node E’s DecisionVector [a, c, a, null, h]. Then, nodes C and E update their DecisionVectors. The updated DecisionVectors of nodes C and E at this moment are [a, c·e, a, a, d·h] and [a, c·e, a, a, i], respectively. Because node E is Byzantine, it does not suspect itself or node B. In contrast, node C suspects both nodes B and E.

  • Node D selects node B from its local view as a gossip target. Next, node D sends its DecisionVector [a, c·e, a, a, d·f] to node B and receives node B’s DecisionVector [a, j, a, a, d·f]. Then, nodes D and B update their DecisionVectors. The updated DecisionVectors of nodes D and B at this moment are [a, c·e·j, a, a, d·f] and [a, k, a, a, d·f], respectively. Because node B is Byzantine, it does not suspect itself or node E. In contrast, node D suspects both nodes B and E.

  • Node E selects node D from its local view as a gossip target. Next, node E sends its DecisionVector [a, c·e, a, a, l] to node D and receives node D’s DecisionVector [a, c·e·j, a, a, d·f]. Then, nodes E and D update their DecisionVectors. The updated DecisionVectors of nodes E and D at this moment are [a, c·e·j, a, a, m] and [a, c·e·j, a, a, d·f·l], respectively. Because node E is Byzantine, it does not suspect itself or node B. In contrast, node D suspects both nodes B and E.

The node states after cycle 2 are shown in Fig. 3c. After cycle 2, every correct node can decide the consensus value. Byzantine nodes could also compute the consensus value, but they may not decide it.
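The cycle-2 outcome can be reproduced with a small sketch that treats each DecisionVector element as the set of distinct values received for that node (the · notation above). Under this assumption, an element with more than one value marks its node as suspected, and such elements do not vote; the function name `analyse` is hypothetical.

```python
from collections import Counter


def analyse(vector):
    """Return (FinalDecision or None, indices of elements holding conflicting values)."""
    suspects = [i for i, values in enumerate(vector) if len(values) > 1]
    single = [next(iter(values)) for values in vector if len(values) == 1]
    decision = None
    if single:
        value, count = Counter(single).most_common(1)[0]
        if count >= len(vector) // 2 + 1:
            decision = value
    return decision, suspects


names = "ABCDE"
after_cycle_2 = {  # DecisionVectors of the correct nodes after cycle 2 (Fig. 3c)
    "A": [{"a"}, {"c", "e", "g"}, {"a"}, {"a"}, {"d", "f"}],
    "C": [{"a"}, {"c", "e"}, {"a"}, {"a"}, {"d", "h"}],
    "D": [{"a"}, {"c", "e", "j"}, {"a"}, {"a"}, {"d", "f", "l"}],
}
for node, vector in after_cycle_2.items():
    decision, suspects = analyse(vector)
    print(node, decision, [names[i] for i in suspects])
# A a ['B', 'E']
# C a ['B', 'E']
# D a ['B', 'E']
```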

3.4 Correctness conditions and proof

We prove that our proposed algorithms satisfy the correctness conditions. We consider the algorithms as presented in Algorithms 3 and 4, where each correct node has to agree on the vector of decided values to reach consensus. In this context, the problem is equivalent to the interactive consistency problem (Pease et al. 1980), where n ≥ 2f + 1. We begin by defining the requirements for the interactive consistency problem as follows.
  • Validity: If process \(P_i\) is correct, then all correct nodes set the ith element of their decision vectors to the value that \(P_i\) has decided.

  • Termination: Every correct node eventually sets its decision value from its decision vector.

  • Agreement: All correct nodes obtain the same decision value from their decision vectors.
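For a finished run, the three requirements can be checked mechanically. The sketch below assumes that decision vectors are dictionaries keyed by node name and that the set of correct nodes is known to the checker, which is only possible in a simulation; all names are hypothetical. The sample data loosely follow the correct nodes of the example in Section 3.3, with one value per element.

```python
def check_properties(proposals, vectors, decisions, correct):
    """Check validity, termination, and agreement on the final state of a run.

    proposals: node -> value it decided locally
    vectors:   node -> decision vector, modeled as a dict (node -> value or None)
    decisions: node -> decision value, or None if the node has not decided
    correct:   the set of correct nodes (Byzantine nodes are simply left out)
    """
    validity = all(vectors[j].get(i) == proposals[i] for i in correct for j in correct)
    termination = all(decisions[j] is not None for j in correct)
    decided = {decisions[j] for j in correct if decisions[j] is not None}
    agreement = len(decided) <= 1  # all correct nodes that decided chose the same value
    return {"validity": validity, "termination": termination, "agreement": agreement}


correct = {"A", "C", "D"}
proposals = {"A": "a", "C": "a", "D": "a"}
vectors = {j: {"A": "a", "B": "c", "C": "a", "D": "a", "E": "d"} for j in correct}
decisions = {"A": "a", "C": "a", "D": "a"}
print(check_properties(proposals, vectors, decisions, correct))
# {'validity': True, 'termination': True, 'agreement': True}
```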

Theorem 1

The proposed Byzantine consensus algorithms using gossip satisfy the validity requirement.

Proof

The proof is by contradiction. Suppose that the proposed Byzantine consensus algorithms using gossip do not satisfy the validity requirement. Let \(P_i\) and \(P_j\) be correct nodes, and let there be k correct nodes in the system. According to the specification of the algorithm, \(P_i\) decides its local decision value, and the value is assigned to the ith element of its decision vector. Suppose \(P_i\) selects \(P_j\) as a gossip target; then \(P_j\)’s decision vector is updated with \(P_i\)’s decision vector. Hence, \(P_j\)’s decision vector contains the value that \(P_i\) has decided. Because \(P_j\) is not a Byzantine node and the value is digitally signed with \(P_i\)’s private key, \(P_j\) cannot modify \(P_i\)’s decision value. As a result, \(P_j\) sets the ith element of its decision vector to \(P_i\)’s value. Owing to the high-probability message delivery of gossip, \(P_i\)’s decision value will be propagated to the nodes in the system. Therefore, all k correct nodes will update the ith element of their decision vectors with the value that \(P_i\) has decided. This is a contradiction. □

Theorem 2

The proposed Byzantine consensus algorithms using gossip satisfy the termination requirement.

Proof

The proof is by induction.
  • Basis: There is one correct node in the system (i.e., n = 1). Let \(P_i\) be the correct node. Since there is one node in the system, the size of the decision vector is one. As soon as \(P_i\) decides its local decision value in finite time, the decision vector is updated with that value. Therefore, the algorithms satisfy the termination requirement when there is one correct node in the system.

  • Induction step (1): There are k correct nodes in the system, where n ≥ 2f + 1. Let \(P_i\) and \(P_j\) be correct nodes in the system. By Theorem 1, if \(P_i\) decides its local decision value, all correct nodes will set the ith element of their decision vectors to the value \(P_i\) has decided. Likewise, after \(P_j\) decides its local decision value in finite time, the jth element of all correct nodes’ decision vectors will be set to the value \(P_j\) has decided as the gossip cycles progress. Eventually, all k correct nodes will hold the same k correct elements in their decision vectors. Since k > f, all k correct nodes can eventually set their decision values from their decision vectors by computing the majority. Therefore, the algorithms satisfy the termination requirement when there are k correct nodes in the system.

  • Induction step (2): There are k + 1 correct nodes in the system, where n ≥ 2f + 1. Let \(P_{k+1}\) be the (k + 1)th correct node in the system. Suppose that all k correct nodes other than \(P_{k+1}\) hold the same k correct elements in their decision vectors. If \(P_{k+1}\) selects \(P_i\) as a gossip target using push mode, \(P_i\) can set the (k + 1)th element of its decision vector. At this stage, \(P_i\) holds k + 1 correct elements in its decision vector. Hence, \(P_i\) will set its decision value from the decision vector by computing the majority, and the (k + 1)th element will be propagated to the other nodes as the gossip cycles progress. Similarly, if \(P_{k+1}\) selects \(P_i\) as a gossip target using pull mode, \(P_{k+1}\) obtains k + 1 correct elements in its decision vector, since \(P_i\)’s decision vector contains k correct elements, and the (k + 1)th element will be propagated to the other nodes as discussed. Therefore, the algorithms satisfy the termination requirement when there are k + 1 correct nodes in the system. □

Theorem 3

The proposed Byzantine consensus algorithms using gossip satisfy the agreement requirement.

Proof

The proof is by contradiction. Suppose that correct nodes derive different decision values from their decision vectors. Let \(P_i\) and \(P_j\) be correct nodes, and let there be k correct nodes in the system. \(P_i\) decides the correct local decision value in finite time. It then gossips with other nodes and updates its decision vector. When the requisite number of cycles has been reached, it computes the majority of its decision vector. Because at least \(\lfloor n/2 \rfloor + 1\) nodes are correct and correct nodes obey the specification of the algorithms, the majority will be the correct value. In the same way, the correct node \(P_j\) decides the correct local decision value in finite time, gossips with other nodes, and updates its decision vector. After the requisite number of cycles, it computes the majority of its decision vector, and the result will also be the correct value. Since both \(P_i\) and \(P_j\) are correct, their decision values will be the same. Likewise, the decision values of all k correct nodes will be the same, because they output the correct local decision value and assign it to the corresponding element of their decision vectors according to Algorithms 3 and 4. This is a contradiction. □
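The count used in this proof, namely that at least \(\lfloor n/2 \rfloor + 1\) nodes are correct, follows directly from n ≥ 2f + 1:

$$ n \ge 2f + 1 \;\Longrightarrow\; f \le \left\lfloor \frac{n-1}{2} \right\rfloor \;\Longrightarrow\; n - f \;\ge\; n - \left\lfloor \frac{n-1}{2} \right\rfloor \;=\; \left\lfloor \frac{n}{2} \right\rfloor + 1 $$

The final equality holds for both parities: for n = 2m it reads 2m - (m - 1) = m + 1, and for n = 2m + 1 it reads (2m + 1) - m = m + 1. For example, with n = 5 and f = 2, as in Section 3.3, \(n - f = 3 = \lfloor 5/2 \rfloor + 1\).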

4 Evaluation

This section presents performance results for our unstructured approach to Byzantine consensus. To the best of our knowledge, no solution that solves the Byzantine consensus problem in an unstructured form, such as via gossip, has yet been proposed, so we do not give a direct performance comparison with other methods. Instead, Section 5 compares the complexity of other approaches with ours. We measured the fraction of non-null elements in each node's DecisionVector at scale, varying the number of nodes, the gossip mode, the size of the local view, and the fanout. Table 2 shows the parameters and their values used in the evaluations, by type of execution: gracious and non-gracious. Because our Byzantine consensus algorithm uses several data structures, each node requires additional memory for LocalDecision, DecisionVector, and FinalDecision. However, storing LocalDecision and FinalDecision is trivial, because a 64-bit data type is sufficient for numerical applications. The size of DecisionVector is proportional to the number of nodes in the system: with \(10^{4}\) nodes, DecisionVector occupies 80 kB (\(10^{4}\) elements × 8 bytes). In this scenario, sending and receiving DecisionVector is also inexpensive in contemporary network environments, even without compression.
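The DecisionVector size estimate is simple arithmetic. The sketch below counts only the 64-bit values, as the text does; per-element signatures and encoding overhead are deliberately excluded, and the function name is hypothetical.

```python
def decision_vector_bytes(num_nodes: int, bits_per_value: int = 64) -> int:
    """Size of one DecisionVector with one fixed-width value per node (signatures not counted)."""
    return num_nodes * bits_per_value // 8


for exponent in (3, 4):
    print(f"10^{exponent} nodes:", decision_vector_bytes(10**exponent), "bytes")
# 10^3 nodes: 8000 bytes
# 10^4 nodes: 80000 bytes
```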
Table 2

Evaluation parameters and their values (numbers in parentheses are the default values unless specified otherwise)

| Execution | Parameter | Value |
|---|---|---|
| Both | Number of nodes | \(10^{2.5}\), \(10^{3}\), \(10^{3.5}\), (\(10^{4}\)) |
| Gracious | Gossip mode | Push, Pull, (Push-pull) |
| Gracious | Size of local view | 10, (20), 30, 40 |
| Gracious | Fanout | (1), 2, 3, 4 |
| Non-gracious | Gossip mode | Push-pull |
| Non-gracious | Size of local view | 20 |
| Non-gracious | Fanout | 1 |
| Non-gracious | Byzantine probability | 0.1, 0.2, 0.3, 0.4 |

4.1 Gracious execution

For the first part of the performance evaluation, we varied the evaluation parameters to examine their effects in the absence of Byzantine node failures.

Effects of gossip mode

As discussed in Section 2.3, there are three communication modes in a gossip protocol: push, pull, and push-pull. Figure 4 shows, for each gossip mode, the average fraction of non-null DecisionVector elements over all nodes, with standard deviation. Push mode performs better than pull mode but has a higher standard deviation. As expected, push-pull mode performs better than both push and pull modes. This pattern holds in all of Figs. 4a–d. Note that the requisite number of cycles increases linearly as the number of nodes increases exponentially. With the push-pull gossip mode, the requisite numbers of cycles are 4, 5, 6, and 6 on average when the numbers of nodes are \(10^{2.5}\), \(10^{3}\), \(10^{3.5}\), and \(10^{4}\), respectively. Recall that FinalDecision can be calculated by computing the majority once at least \(\lfloor n/2 \rfloor + 1\) correct elements of DecisionVector are present. In this regard, our Byzantine consensus scheme scales in the number of nodes without a bottleneck.
Fig. 4 Effects of gossip mode

Effects of size of local view

The local view is also key to achieving scalability, because it greatly simplifies the overlay network. To see the effects of the local view size, we set the size of each node's local view to 10, 20, 30, and 40; the results are shown in Fig. 5. The other evaluation parameters are set to the default values noted in Table 2. Surprisingly, the size of the local view has only a small effect. Because our method relies on random sampling, a larger local view is not necessarily better. This gives our Byzantine consensus scheme a significant advantage over previous methods, which explicitly or implicitly assume that the underlying overlay network is fully connected because they rely on broadcast or multicast primitives. With push-pull mode, the requisite numbers of cycles are 4, 5, 6, and 6 on average when the numbers of nodes are \(10^{2.5}\), \(10^{3}\), \(10^{3.5}\), and \(10^{4}\), respectively (see Fig. 5).
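One way to see why the local view size matters little is to simulate a single gossip cycle over random local views and count how often each node is chosen as a gossip target. This sketch assumes that local views are uniform random samples of the other nodes; `selection_stats` is a hypothetical helper, not part of the paper's algorithm.

```python
import random
from typing import Tuple


def selection_stats(n: int, view_size: int, fanout: int = 1,
                    seed: int = 0) -> Tuple[float, float]:
    """Simulate one gossip cycle; return (mean number of times a node is
    selected as a target, fraction of nodes selected at least once)."""
    rng = random.Random(seed)
    hits = [0] * n
    for i in range(n):
        # Assumed uniform local view: view_size distinct nodes other than i.
        view = [k if k < i else k + 1
                for k in rng.sample(range(n - 1), view_size)]
        for target in rng.sample(view, fanout):
            hits[target] += 1
    mean = sum(hits) / n
    covered = sum(1 for h in hits if h > 0) / n
    return mean, covered


if __name__ == "__main__":
    for v in (10, 20, 30, 40):
        mean, covered = selection_stats(n=10_000, view_size=v)
        print(f"v = {v}: mean = {mean:.2f}, selected at least once = {covered:.3f}")
```

For every v, the mean equals the fanout exactly (each of the n nodes makes `fanout` selections), and with fanout 1 roughly \(1 - 1/e \approx 63\%\) of nodes are selected at least once, because a given node is chosen by any other node with probability \(\frac{v}{n-1} \cdot \frac{1}{v} = \frac{1}{n-1}\), independently of v.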
Fig. 5 Effects of local view size

Effects of fanout

As discussed, the fanout is the number of gossip targets each node selects in a cycle. We expected that a higher fanout would reduce the requisite number of cycles at the cost of a higher message complexity. Fig. 6 confirms that a higher fanout yields better performance. For example, the triangle-up graphs (fanout: 2) require 2 or 3 fewer cycles than the graphs whose fanout is 1. When the fanout is 4, the requisite number of cycles is 2 in all subfigures of Fig. 6. Note, however, that the message complexity is proportional to the fanout.
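Plugging the reported cycle counts for \(n = 10^{4}\) with push-pull gossip (6 cycles at fanout 1, from the gossip-mode results; 2 cycles at fanout 4) into the \(nfc\) message count of Lemma 2 (Section 4.3) makes the trade-off concrete: the higher fanout decides in a third of the cycles but sends more messages in total. This is a back-of-the-envelope sketch; `total_messages` is a hypothetical helper.

```python
def total_messages(n: int, fanout: int, cycles: int) -> int:
    """Messages generated until decision, following Lemma 2: n * fanout * cycles."""
    return n * fanout * cycles


N = 10_000
# Requisite cycles reported in Section 4.1 for n = 10^4, push-pull, view size 20.
REPORTED_CYCLES = {1: 6, 4: 2}  # fanout -> cycles

if __name__ == "__main__":
    for fanout, cycles in REPORTED_CYCLES.items():
        print(f"fanout {fanout}: {cycles} cycles, "
              f"{total_messages(N, fanout, cycles):,} messages")
```

This prints 60,000 messages for fanout 1 and 80,000 for fanout 4: lower latency in cycles is paid for with more traffic.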
Fig. 6 Effects of fanout

4.2 Non-gracious execution

Besides gracious execution, we evaluate the performance of our solution in the presence of Byzantine failures. We divide the behavior of Byzantine nodes into two categories: benign and malicious. Benign Byzantine nodes send an arbitrary (incorrect) value to a gossip target when gossiping but otherwise obey the specification of the algorithm (i.e., they execute updateVector() in Algorithm 3). Because each element of DecisionVector is digitally signed with its owner's private key, a Byzantine node cannot alter any element of DecisionVector other than its own and still produce a correctly verifiable message.
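The following sketch illustrates why a Byzantine node can only tamper with its own element. The paper uses digital signatures with private keys; because Python's standard library has no public-key signatures, this sketch substitutes HMAC-SHA256 with a per-node key held in a verifier registry. That substitution is an assumption: with a MAC, anyone holding the key could forge, which real signatures prevent. `sign_entry`, `verify_entry`, and `merge_vectors` are hypothetical names, not the paper's updateVector().

```python
import hashlib
import hmac
import secrets
from typing import Dict, List, Optional, Tuple

# A signed DecisionVector element: (value, tag); None marks a null element.
Entry = Optional[Tuple[int, bytes]]


def _tag(key: bytes, owner: int, value: int) -> bytes:
    # Bind the tag to both the owner's index and the value.
    return hmac.new(key, f"{owner}:{value}".encode(), hashlib.sha256).digest()


def sign_entry(key: bytes, owner: int, value: int) -> Tuple[int, bytes]:
    """Node `owner` authenticates its own decision value."""
    return value, _tag(key, owner, value)


def verify_entry(key: bytes, owner: int, entry: Entry) -> bool:
    """True iff `entry` was produced with `owner`'s key for that position."""
    if entry is None:
        return False
    value, tag = entry
    return hmac.compare_digest(tag, _tag(key, owner, value))


def merge_vectors(local: List[Entry], received: List[Entry],
                  keys: Dict[int, bytes]) -> None:
    """Fill null elements of `local` with received elements that verify."""
    for owner, entry in enumerate(received):
        if local[owner] is None and verify_entry(keys[owner], owner, entry):
            local[owner] = entry


if __name__ == "__main__":
    keys = {i: secrets.token_bytes(32) for i in range(4)}
    local: List[Entry] = [None, sign_entry(keys[1], 1, 7), None, None]
    # Byzantine node 3 tries to forge node 0's element, and picks its own freely.
    received: List[Entry] = [sign_entry(keys[3], 0, 99), None, None,
                             sign_entry(keys[3], 3, 42)]
    merge_vectors(local, received, keys)
    print([None if e is None else e[0] for e in local])  # [None, 7, None, 42]
```

The forged element for node 0 is rejected, while node 3's arbitrary value for its own element is accepted, which is exactly the latitude a benign Byzantine node has.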

Malicious Byzantine nodes, in contrast, deviate from the specification of the algorithm: they do not execute updateVector(), in order to hinder or delay correct nodes. Hence, such a node may send a DecisionVector that is empty except for its own element. We evaluate the effects of benign and malicious behavior separately. The remaining evaluation parameters in this non-gracious scenario are the same as the defaults of the gracious scenario (see Table 2).

To show the effects of Byzantine nodes, we introduce an additional evaluation parameter: the Byzantine probability. Because our scheme needs at least f + 1 correct nodes (n ≥ 2f + 1), the Byzantine probability must be kept below 0.5; the largest value we use is 0.4. Note also that because the Byzantine probability only determines the expected number of Byzantine nodes, the actual number of Byzantine nodes in a run may differ from the value calculated from this parameter.
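The gap between the expected and the actual number of Byzantine nodes can be illustrated with a short sketch. It assumes (the paper does not state its sampling procedure) that each node becomes Byzantine independently with probability p, so the count is binomially distributed with mean np and standard deviation \(\sqrt{np(1-p)}\); for \(n = 10^{4}\) and p = 0.4 that is 4,000 ± about 49, far below the 5,000 limit implied by n ≥ 2f + 1. `sample_byzantine` and `tolerates` are hypothetical helpers.

```python
import random
from typing import Set


def sample_byzantine(n: int, p: float, seed: int = 0) -> Set[int]:
    """Assumed model: each node is Byzantine independently with probability p."""
    rng = random.Random(seed)
    return {i for i in range(n) if rng.random() < p}


def tolerates(n: int, f: int) -> bool:
    """Resilience condition of the scheme: n >= 2f + 1."""
    return n >= 2 * f + 1


if __name__ == "__main__":
    n = 10_000
    for p in (0.1, 0.2, 0.3, 0.4):
        counts = [len(sample_byzantine(n, p, seed)) for seed in range(5)]
        ok = all(tolerates(n, f) for f in counts)
        print(f"p = {p}: expected {n * p:.0f}, sampled {counts}, tolerated: {ok}")
```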

Effects of benign behavior of Byzantine nodes

Figure 7 shows the effects of benign behavior of Byzantine nodes. The fractions in Figs. 7 and 8 are computed only over the correct elements of DecisionVector. As expected, Byzantine nodes increase the requisite number of cycles. Taking the triangle-down graphs (Byzantine probability: 0.4) as an example, the requisite numbers of cycles are 5, 6, 6, and 7 when the numbers of nodes are \(10^{2.5}\), \(10^{3}\), \(10^{3.5}\), and \(10^{4}\), respectively. Because each correct node must collect at least \(\lfloor n/2 \rfloor + 1\) correct elements of DecisionVector, more gossip cycles are needed as the number of Byzantine nodes increases. Nevertheless, the correct nodes eventually set the FinalDecision value.
Fig. 7 Effects of benign behavior of Byzantine nodes

Fig. 8 Effects of malicious behavior of Byzantine nodes

Effects of malicious behavior of Byzantine nodes

When Byzantine nodes exhibit malicious behavior (i.e., they do not perform updateVector()), they make the correct nodes' job more difficult: reaching consensus requires even more gossip cycles than with benign Byzantine nodes. This is because, when a correct node selects a malicious Byzantine node as its gossip target, the exchange gives the correct node no new correct elements. Therefore, when correct nodes frequently select Byzantine nodes as gossip targets, the requisite number of cycles can be prolonged. Nevertheless, our Byzantine consensus scheme satisfies validity, termination, and agreement in the presence of malicious Byzantine nodes, since the probability of network partitioning is small (Allavena et al. 2005; Gurevich and Keidar 2009).
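The qualitative difference between benign and malicious Byzantine nodes can be reproduced with a small push-pull simulation sketch. Its assumptions are our own: each node is Byzantine independently with probability `p_byz`, local views are static uniform random samples, nodes act sequentially within a cycle, and signatures are abstracted away by simply never counting a Byzantine node's own element (matching how Figs. 7 and 8 count only correct elements). `cycles_with_byzantine` is a hypothetical name.

```python
import random
from typing import Optional


def cycles_with_byzantine(n: int, p_byz: float, behavior: str,
                          view_size: int = 20, fanout: int = 1,
                          seed: int = 0, max_cycles: int = 200) -> Optional[int]:
    """Push-pull gossip with Byzantine nodes. Returns the number of cycles
    until every correct node holds at least floor(n/2) + 1 correct
    DecisionVector elements, or None if that is impossible or not reached."""
    if behavior not in ("benign", "malicious"):
        raise ValueError(f"unknown behavior: {behavior}")
    rng = random.Random(seed)
    byz = {i for i in range(n) if rng.random() < p_byz}
    correct = [i for i in range(n) if i not in byz]
    quorum = n // 2 + 1
    if len(correct) < quorum:
        return None  # n >= 2f + 1 violated: too few correct elements exist
    views = [[k if k < i else k + 1 for k in rng.sample(range(n - 1), view_size)]
             for i in range(n)]
    # known[i]: bitset of correct owners whose signed element node i holds.
    known = [0 if i in byz else 1 << i for i in range(n)]
    # Benign Byzantine nodes run updateVector() and relay elements they cannot
    # forge; malicious ones skip it and relay only their own (uncounted) element.
    relays = [behavior == "benign" or i not in byz for i in range(n)]
    for cycle in range(1, max_cycles + 1):
        for i in range(n):
            for j in rng.sample(views[i], fanout):
                if relays[i] and relays[j]:
                    merged = known[i] | known[j]  # push-pull exchange
                    known[i] = known[j] = merged
                # Otherwise a malicious peer contributes no correct elements.
        if all(bin(known[i]).count("1") >= quorum for i in correct):
            return cycle
    return None


if __name__ == "__main__":
    for p in (0.1, 0.2, 0.3, 0.4):
        print(p, cycles_with_byzantine(1000, p, "benign"),
              cycles_with_byzantine(1000, p, "malicious"))
```

With a fixed seed both behaviors see the same Byzantine set and the same gossip targets, and a malicious run performs a subset of the benign run's exchanges, so the benign run never needs more cycles. This mirrors the reported trend without reproducing the paper's measurements.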

4.3 Complexity analysis

The complexity of our Byzantine consensus scheme is evaluated in terms of the maximum number of allowable Byzantine nodes in the system and the message complexity.

Theorem 4

The maximum number of allowable Byzantine nodes in the system is \(\lfloor n/2 \rfloor - 1\) when n is even, or \(\lfloor n/2 \rfloor\) when n is odd, where n is the number of nodes.

Proof

The Byzantine consensus scheme presented in this paper requires n ≥ 2f + 1, where n is the number of nodes and f is the number of Byzantine nodes. This means that our scheme works safely if the number of correct nodes is greater than the number of Byzantine nodes. According to the specification of Algorithm 3, each correct node periodically gossips with other nodes and computes the majority value of its DecisionVector. It then counts the number of elements whose value equals this majority value. If the count is greater than half the number of nodes, it decides on that value. Because the gossip protocol guarantees message delivery with high probability and the probability of network partitioning is small, the correct nodes eventually decide the FinalDecision value after some gossip cycles. Hence, the maximum number of allowable Byzantine nodes depends on n. Since the number of correct nodes, n − f, must be greater than f, we need f < n/2. When n is even, the largest such f is \(n/2 - 1 = \lfloor n/2 \rfloor - 1\); when n is odd, it is \((n-1)/2 = \lfloor n/2 \rfloor\). □
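The decision rule described in the proof and the bound of Theorem 4 can be sketched as follows. This is an illustration, not a reproduction of Algorithm 3: `try_decide` and `max_byzantine` are hypothetical names, and null elements are represented as `None`. Note that both cases of Theorem 4 collapse into the single expression \(\lfloor (n-1)/2 \rfloor\).

```python
from collections import Counter
from typing import Optional, Sequence


def try_decide(decision_vector: Sequence[Optional[int]], n: int) -> Optional[int]:
    """Return the FinalDecision if more than n/2 elements agree, else None.

    None marks a null element (not yet received through gossip)."""
    values = [v for v in decision_vector if v is not None]
    if not values:
        return None
    majority, count = Counter(values).most_common(1)[0]
    # For integers, count > n/2 is the same as count >= floor(n/2) + 1.
    return majority if count > n / 2 else None


def max_byzantine(n: int) -> int:
    """Largest f satisfying n >= 2f + 1, i.e. floor((n - 1) / 2)."""
    return (n - 1) // 2


if __name__ == "__main__":
    print(try_decide([1, 1, None, 0, 1], n=5))     # 1: three of five agree
    print(try_decide([1, None, None, 0, 1], n=5))  # None: only two agree
    print([max_byzantine(n) for n in (4, 5, 10, 11)])  # [1, 2, 4, 5]
```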

Lemma 1

The requisite number of cycles to compute a correct majority when \(n \geq 2f + 1\) is c, where c depends on the choice of parameters and the number of Byzantine nodes.

Proof

Unlike previous approaches to Byzantine consensus, our scheme is nondeterministic and relies on randomization. Therefore, the requisite number of cycles to compute a correct majority depends on parameters such as the gossip mode, the size of the local view, and the fanout.

Considering the gossip mode, push-pull has a higher probability of reaching consensus within a given number of cycles than push or pull alone, because, according to the specification, it performs more message-exchange steps (both a push and a pull) per gossip interaction.

Considering the size of the local view, the expected number of times a node is selected as a gossip target by other nodes in a cycle is 1 if the fanout is set to 1. Each local view contains some number v of neighbors, and at each cycle a node selects a gossip target from its local view. A given node appears in another node's local view with probability v/n and, if it appears, is selected with probability 1/v; specifically, \(\frac {v}{n}\cdot \frac {1}{v} = \frac {1}{n}\). Because all n nodes perform the gossip algorithm, the expected number of selections is \(n \cdot \frac{1}{n} = 1\), where v ≫ 1. Moreover, this quantity is proportional to the fanout f (here f denotes the fanout, not the number of Byzantine nodes), where f ≥ 1 (i.e., \(\frac {f}{n}\cdot n = f\)). Furthermore, according to the specification of the algorithm, the number of correct elements of DecisionVector held by each correct node is non-decreasing. Hence, the correct nodes eventually obtain at least \(\lfloor n/2 \rfloor + 1\) correct elements of DecisionVector with high probability after some number c of gossip cycles. Based on these statements, the requisite number of cycles is c. □
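Written out, the counting argument of this proof is as follows. It assumes, as the proof implicitly does, that local views are uniform random samples, so node j appears in node i's view with probability v/n (the proof's simplification of n − 1 to n is kept); f is the fanout. The last line shows that the "1" obtained for fanout 1 is an expected count: the probability of being selected at least once in a cycle is lower.

```latex
\begin{aligned}
\Pr[i \text{ selects } j \text{ in a cycle}]
  &= \underbrace{\frac{v}{n}}_{j \,\in\, \mathrm{view}(i)}
     \cdot \underbrace{\frac{f}{v}}_{j \text{ among the } f \text{ targets}}
   = \frac{f}{n},\\
\mathbb{E}[\text{times } j \text{ is selected per cycle}]
  &= n \cdot \frac{f}{n} = f \qquad (= 1 \text{ for fanout } 1),\\
\Pr[j \text{ selected at least once} \mid f = 1]
  &= 1 - \left(1 - \frac{1}{n}\right)^{n} \approx 1 - e^{-1} \approx 0.63 .
\end{aligned}
```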

Lemma 2

The message complexity of our Byzantine consensus scheme is nfc.

Proof

In each cycle, each node selects f (fanout) neighbors from its local view, so our Byzantine consensus scheme generates nf messages per cycle. In Lemma 1, we showed that the requisite number of cycles is c. As a result, the message complexity of our Byzantine consensus scheme is nfc. If we set the fanout to 1, the message complexity is nc; if we set the fanout higher than 1, the requisite number of cycles can be shortened. In general, therefore, the message complexity of our Byzantine consensus scheme is nfc. □

Theorem 5

The message complexity of our Byzantine consensus scheme is O(n).

Proof

In Lemma 2, we argued that the message complexity of our Byzantine consensus scheme is nfc. However, gossip cycles are periodic events and recur indefinitely if the gossip protocol is also used, for example, for a failure detection service. Amortized over cycles, the message complexity of our Byzantine consensus scheme is nf per cycle. Furthermore, because the fanout f is a constant coefficient independent of n, the amortized message complexity of our Byzantine consensus scheme is O(n). □
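The amortization step can be summarized as follows. Note that the O(n) bound is per gossip cycle, with the fanout f treated as a constant; the total for one decision remains nfc, where, per the measurements in Section 4.1, c grows only slowly with n.

```latex
\begin{aligned}
M_{\text{total}} &= n\,f\,c && \text{(Lemma 2)}\\
M_{\text{cycle}} &= M_{\text{total}} / c = n\,f && \text{(amortized per cycle)}\\
M_{\text{cycle}} &\in O(n) && \text{(fanout } f \text{ constant, independent of } n)
\end{aligned}
```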

5 Related work

There are two broad approaches to implementing Byzantine consensus services: state machine replication (Schneider 1990; Castro and Liskov 2002) and Byzantine quorum systems (Malkhi and Reiter 1997; Bessani et al. 2009). State machine replication is based on maintaining a consistent state among the nodes in the system. Byzantine quorum systems, on the other hand, are based on executing different operations in different intersecting sets of nodes. One disadvantage of Byzantine quorum systems compared with state machine replication is that they sometimes need one or more shared memory objects. Moreover, this type of Byzantine quorum system requires n ≥ 4f + 1 nodes to safely calculate a majority in the intersection of any two quorums, that is, \(\forall Q_{1}, Q_{2} \in \mathcal{Q}: |Q_{1} \cap Q_{2}| \geq 2f + 1\).

Pioneering work on the state machine approach originates from Paxos (Lamport 1998). One notable Paxos-based work on Byzantine fault tolerance is PBFT (Castro and Liskov 2002), which requires n ≥ 3f + 1 and five communication rounds in normal-case operation. Yin et al. (2003) address the high resource demand by separating agreement from execution, requiring n ≥ 3f + 1 for agreement but only n ≥ 2f + 1 for execution.

Recently, alternative solutions for Byzantine fault tolerance have been proposed that rely on trusted subsystems with cryptographic authentication (i.e., tamperproof components) to prevent equivocation (Correia et al. 2004; Chun et al. 2007; Kapitza et al. 2012; Veronese et al. 2013). In Veronese et al. (2013), for example, the authors implemented the USIG (Unique Sequential Identifier Generator) service, which provides a monotonically increasing counter: it never assigns the same identifier to two different messages, never assigns an identifier lower than a previous one, and never assigns an identifier that is not the successor of the previous one. With these trusted subsystems, Byzantine nodes cannot equivocate, and the Byzantine failure model can therefore be reduced to the fail-stop model (Schneider 1984), requiring only n ≥ 2f + 1.

However, most solutions that use the state machine replication approach are leader-based; that is, they rely on an explicitly elected leader to coordinate the protocol. Leader-based protocols assume that the network topology is fully connected, because they rely on a broadcast primitive. Moreover, leader-based protocols raise several difficulties, such as how to elect a leader and how to tolerate a faulty leader, as well as the possibility of a performance bottleneck at the leader. Another drawback is that leader-based protocols are vulnerable to performance degradation caused by a malicious leader (Amir et al. 2011). In addition, the message complexity of most state-machine-based solutions is O(n²), owing to the prepare and commit phases of PBFT (Castro and Liskov 2002), compared with O(n) for our protocol. Finally, we evaluate our Byzantine consensus scheme at scale, varying the number of (Byzantine) nodes, rather than in small-scale scenarios such as f = 1 (Kapitza et al. 2012; Veronese et al. 2013) or f = 2 (Amir et al. 2011).
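The three USIG guarantees described for Veronese et al. (2013) can be illustrated with a toy, software-only sketch. It is not their implementation: in the real service the key and counter live inside a tamperproof component, whereas here they are ordinary attributes, so the sketch shows only the interface and the receiver-side checks, not the security guarantee. The class and method names are hypothetical, and receivers calling the sender's own verification method is a simplification.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


class ToyUSIG:
    """Software-only stand-in for a USIG-like trusted counter (hypothetical)."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)
        self._counter = 0

    def _mac(self, counter: int, message: bytes) -> bytes:
        # Bind the identifier and the message together.
        return hmac.new(self._key, counter.to_bytes(8, "big") + message,
                        hashlib.sha256).digest()

    def create_ui(self, message: bytes) -> Tuple[int, bytes]:
        """Assign the next identifier to `message`; identifiers are never reused."""
        self._counter += 1
        return self._counter, self._mac(self._counter, message)

    def verify_ui(self, message: bytes, counter: int, tag: bytes) -> bool:
        return hmac.compare_digest(tag, self._mac(counter, message))


class SequentialReceiver:
    """Accepts one sender's messages only in gap-free identifier order."""

    def __init__(self, sender: ToyUSIG) -> None:
        self._sender = sender
        self._last = 0

    def accept(self, message: bytes, counter: int, tag: bytes) -> bool:
        if counter != self._last + 1:
            return False  # gap or replay: not the successor of the last one
        if not self._sender.verify_ui(message, counter, tag):
            return False  # identifier is bound to a different message
        self._last = counter
        return True


if __name__ == "__main__":
    usig = ToyUSIG()
    c1, t1 = usig.create_ui(b"a")
    receiver = SequentialReceiver(usig)
    # Equivocation attempt: reuse identifier 1 for a different message.
    print(receiver.accept(b"b", c1, t1))  # False
    print(receiver.accept(b"a", c1, t1))  # True
```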

6 Conclusions

In this paper, we present a scalable and leaderless Byzantine consensus scheme for cloud computing environments. Its unstructured, gossip-based construction provides scalability, reliability, and efficiency. Unlike previous studies, which require n ≥ 3f + 1 nodes or a trusted subsystem, our Byzantine consensus scheme requires only n ≥ 2f + 1 nodes, the minimum in the literature, without a performance bottleneck or a single point of failure (i.e., a leader). The experimental results show that our Byzantine consensus scheme scales well with the number of nodes, with an efficient message complexity of O(n), where n is the number of nodes.

References

  1. Allavena, A., Demers, A., Hopcroft, J.E. (2005). Correctness of a gossip based membership protocol. In Proceedings of the twenty-fourth annual ACM symposium on principles of distributed computing, PODC ’05 (pp. 292–301). New York: ACM. doi:10.1145/1073814.1073871.
  2. Amir, Y., Coan, B., Kirsch, J., Lane, J. (2011). Prime: Byzantine replication under attack. IEEE Transactions on Dependable and Secure Computing, 8(4), 564–577. doi:10.1109/TDSC.2010.70.
  3. Bessani, A., Correia, M., da Silva Fraga, J., Lung, L.C. (2009). An efficient byzantine-resilient tuple space. IEEE Transactions on Computers, 58(8), 1080–1094. doi:10.1109/TC.2009.71.
  4. Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J., Brandic, I. (2009). Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6), 599–616. doi:10.1016/j.future.2008.12.001. http://www.sciencedirect.com/science/article/pii/S0167739X08001957.
  5. Castro, M., & Liskov, B. (2002). Practical byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4), 398–461. doi:10.1145/571637.571640.
  6. Chard, K., Bubendorfer, K., Caton, S., Rana, O. (2012). Social cloud computing: a vision for socially motivated resource sharing. IEEE Transactions on Services Computing, 5(4), 551–563. doi:10.1109/TSC.2011.39.
  7. Chun, B.G., Maniatis, P., Shenker, S., Kubiatowicz, J. (2007). Attested append-only memory: making adversaries stick to their word. In Proceedings of twenty-first ACM SIGOPS symposium on operating systems principles, SOSP ’07 (pp. 189–204). New York: ACM. doi:10.1145/1294261.1294280.
  8. Correia, M., Neves, N., Verissimo, P. (2004). How to tolerate half less one byzantine nodes in practical distributed systems. In Proceedings of the 23rd IEEE international symposium on reliable distributed systems, 2004 (pp. 174–183). doi:10.1109/RELDIS.2004.1353018.
  9. Ganesh, A., Kermarrec, A.M., Massoulie, L. (2003). Peer-to-peer membership management for gossip-based protocols. IEEE Transactions on Computers, 52(2), 139–149. doi:10.1109/TC.2003.1176982.
  10. Gurevich, M., & Keidar, I. (2009). Correctness of gossip-based membership under message loss. In Proceedings of the 28th ACM symposium on principles of distributed computing, PODC ’09 (pp. 151–160). New York: ACM. doi:10.1145/1582716.1582743.
  11. Johnson, B. (1984). Fault-tolerant microprocessor-based systems. IEEE Micro, 4(6), 6–21. doi:10.1109/MM.1984.291277.
  12. Kapitza, R., Behl, J., Cachin, C., Distler, T., Kuhnle, S., Mohammadi, S.V., Schröder-Preikschat, W., Stengel, K. (2012). CheapBFT: resource-efficient byzantine fault tolerance. In Proceedings of the 7th ACM european conference on computer systems, EuroSys ’12 (pp. 295–308). New York: ACM. doi:10.1145/2168836.2168866.
  13. Komal, M., Ansuyia, M., Deepak, D. (2013). Round robin with server affinity: A VM load balancing algorithm for cloud based infrastructure. Journal of Information Processing Systems, 9(3), 379–394. doi:10.3745/JIPS.2013.9.3.379.
  14. Lamport, L. (1998). The part-time parliament. ACM Transactions on Computer Systems, 16(2), 133–169. doi:10.1145/279227.279229.
  15. Lamport, L., Shostak, R., Pease, M. (1982). The byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3), 382–401. doi:10.1145/357172.357176.
  16. Malkhi, D., & Reiter, M. (1997). Byzantine quorum systems. In Proceedings of the twenty-ninth annual ACM symposium on theory of computing, STOC ’97 (pp. 569–578). New York: ACM. doi:10.1145/258533.258650.
  17. Newman, M. (2010). Networks: an introduction. New York: Oxford University Press, Inc.
  18. Pease, M., Shostak, R., Lamport, L. (1980). Reaching agreement in the presence of faults. Journal of the ACM, 27(2), 228–234. doi:10.1145/322186.322188.
  19. Schneider, F.B. (1984). Byzantine generals in action: implementing fail-stop processors. ACM Transactions on Computer Systems, 2(2), 145–154. doi:10.1145/190.357399.
  20. Schneider, F.B. (1990). Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Computing Surveys, 22(4), 299–319. doi:10.1145/98163.98167.
  21. Veronese, G., Correia, M., Bessani, A., Lung, L.C., Verissimo, P. (2013). Efficient byzantine fault-tolerance. IEEE Transactions on Computers, 62(1), 16–30. doi:10.1109/TC.2011.221.
  22. Wang, S.S., Yan, K.Q., Wang, S.C. (2011). Achieving efficient agreement within a dual-failure cloud-computing environment. Expert Systems with Applications, 38(1), 906–915. doi:10.1016/j.eswa.2010.07.072.
  23. Yin, J., Martin, J.P., Venkataramani, A., Alvisi, L., Dahlin, M. (2003). Separating agreement from execution for byzantine fault tolerant services. In Proceedings of the nineteenth ACM symposium on operating systems principles, SOSP ’03 (pp. 253–267). New York: ACM. doi:10.1145/945445.945470.
  24. Zhang, Y., Zheng, Z., Lyu, M. (2011). BFTCloud: a byzantine fault tolerance framework for voluntary-resource cloud computing. In IEEE international conference on cloud computing (CLOUD), 2011 (pp. 444–451). doi:10.1109/CLOUD.2011.16.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • JongBeom Lim (1)
  • Taeweon Suh (1)
  • JoonMin Gil (2)
  • Heonchang Yu (1)
  1. Department of Computer Science Education, Korea University, Seoul, Korea
  2. School of Computer & Information Communications Engineering, Catholic University of Daegu, Daegu, Korea
