1 Introduction

Byzantine agreement (BA, aka consensus) is a classical problem introduced in [40] that asks n parties to agree on a message so that three properties are satisfied: (i) termination, (ii) agreement and (iii) validity, in a setting where any t of the parties may behave maliciously. Validity enforces the non-triviality of solutions, as it requires that if the non-faulty/“honest” parties start the execution with the same value, then that should be the output value.

BA has been classically considered in a “permissioned setting”: the parties running the protocol are set up so that they are able to reliably and directly communicate with each other, or have access to a public-key directory that reliably lists all their public keys. This is captured by a suitable network or trusted setup assumption. The “permissionless setting,” on the other hand, was introduced with the development of the Bitcoin blockchain [37], and refers to an environment where parties may enter the protocol execution at will, the communication infrastructure is assumed to deliver messages without reliably identifying their origin, and the trusted setup is reduced to the existence of an unpredictable public string—the “genesis block” (which sometimes for simplicity we will just refer to as a CRS [common reference string], or “public-state setup” [25]).

BA in the permissionless setting above using proofs of work (PoW)Footnote 1 was first (formally) studied in [26]. In terms of running time, the protocols presented in [26] run in \(O(\textsf{polylog} \kappa )\) rounds, where \(\kappa \) is the security parameter, and address the binary input case, where the parties wish to agree on a single bit. Subsequent work improved on various aspects at the expense of stronger assumptions. For example, Andrychowicz and Dziembowski [1] offered a multi-valued BA protocol also based on PoWs (RO) but with no trusted setup, assuming in addition the existence of existentially unforgeable signatures, and with a running time proportional to the number of parties. The latter was in turn improved by Garay et al. [28] to \(O(\textsf{polylog} \kappa )\) rounds, assuming just PoWs and no trusted setup. Recently, an expected-constant-round BA protocol was introduced by Das et al. [13], requiring, in addition to the assumptions of Andrychowicz and Dziembowski [1], the existence of verifiable delay functions (VDFs) [7]. Refer to Table 1 for a comparison of existing PoW-based (or “PoW-inspired”) BA protocols.

Table 1. Round complexity of PoW-based (or PoW-inspired) permissionless Byzantine agreement protocols, with their corresponding setup and cryptographic assumptions.

Given the above state of the art, in this work we focus on the question of solving permissionless BA in the original PoW-based blockchain model of Bitcoin with expected-constant round complexity.

1.1 Overview of Our Results

We present a new permissionless PoW-based multi-valued BA protocol that has expected-constant round complexity and demonstrate how it can be used to solve permissionless state machine replication (SMR, or, equivalently, a distributed ledger) [42] with fast settlement. In more detail, our results are as follows.

A new PoW-based permissionless consensus protocol. We put forth Chain-King Consensus—the first PoW-based permissionless consensus protocol that achieves agreement and validity in expected-constant time. Our construction is based on mining on parallel chains, and “emulating” a classical “phase-king” consensus protocol [6] with a randomized chain (the “chain-king”) selection rule on top of the parallel chains construction. Our protocol is based on the following ideas.

First, we revisit the parallel chain technique (cf. [3, 21, 22]) as a method for combining multiple blockchains advancing in parallel. Our key observation is that running \(m = \textsf{polylog} (\kappa )\) parallel chains is sufficient to maintain independence via an \(m{\times }1\)Footnote 2 PoW technique [26] (while prior work set \(m = \varTheta (\kappa )\) and hence at best was only able to argue “sub-independence”; see [22]). In fact, our protocol runs m independent instances of \(2{\times }1\) PoWs, with the latter component being responsible for transaction processing.Footnote 3 The key property we utilize is that in a constant number of rounds, a fraction of the m parallel chains will be sufficiently advanced to offer a form of “common prefix” property (cf. [26]) with a constant probability of success.

Second, and contrary to prior work on parallel chains, we “slice” the chain progression into stages where parallel chains can cross-reference each other. In the first stage, parties converge on their views and ensure fresh randomness is introduced; in the second stage they process transactions; and in the third, they prepare for the cross-referencing by the upcoming stage, after which the stages rotate indefinitely. A key property of our cross-referencing rule is the concept of a dense chain—a strengthening of the concepts of “chain growth” and “chain quality” [26]. Given the short length of each stage (a constant number of rounds), chain density makes it difficult for the adversary to create multiple compromised chains. The key conclusion of this chain structure is phase-oblivious agreement, which refers to the fact that, on a large fraction of chains, the majority of input values are contributed by honest parties.

The core agreement component of our protocol follows the “phase king” approach (cf. [5, 6]). The key idea of porting this protocol design technique to the permissionless setting is to map the chains in the parallel chains cluster to the roles of the different parties in the classical protocol. As a result, the king itself is one of the chains. Moreover, due to the “dilution” of adversarial power that occurs in the parallel chains setting, we can set the king deterministically to be a specific chain. This technique, which may be of independent interest, results in our “Chain-King Consensus” algorithm.

Chain-King Consensus is one-shot, in the sense that it provides just a single instance of agreement in the permissionless setting in expected-constant time. The natural question given such a protocol is whether it is possible to apply sequential self-composition with the running time remaining expected-linear in the number of instances. This is a delicate task due to non-simultaneous termination (cf. [12]). We provide a round-preserving sequential composition solution that first adapts Bracha termination [8] to the permissionless setting and reduces the “termination slack” among honest parties to 1 phase. Then, we adapt the super-phase expansion technique of [12] to widen the interval between state updates from 1 phase to 4 phases. We identify a set of good properties for a sequence of phases such that, when they occur, parties that are in different timelines can converge on the same single phase and make a unanimous decision to update their state.

A new PoW-based permissionless fast SMR protocol. Given that Chain-King Consensus is a one-shot multi-valued Byzantine agreement protocol terminating in expected-constant rounds, next we show how to build a state machine replication protocol on top of its sequential composition. The resulting protocol achieves consistency and expected-constant-time liveness for all types of transactions (including the conflicting ones). This answers a question left open in previous work on PoW-based fast ledgers [3, 22], where fast settlement of transactions was offered only for non-conflicting transactions, thus making our ledger construction the first expected-constant processing time ledger in the PoW setting. We note that fast processing of conflicting transactions can be crucial for many applications such as sequencing smart contract operations. We also describe how it is possible to “bootstrap from genesis”: this essential operation permits new parties to join the protocol execution, and facilitates third-party observers who wish to connect and parse the distributed ledger in order to issue transactions or read transaction outputs.

1.2 Related Work

Round Complexity of Synchronous BA Protocols. For “classical” BA protocols with deterministic termination, it is known that \(t + 1\) rounds [19] are necessary, where t denotes the upper bound on the number of corrupted parties, and matching upper bounds exist, both in the information-theoretic and cryptographic settings [16, 31, 35].

The linear dependency of the number of rounds on the number of corrupted parties can be circumvented by introducing randomization. Rabin [41] showed that consensus reduces to an “oblivious common coin” (OCC)—i.e., a common view of the honest parties of some public randomness. As a result, randomized protocols with linear corruption resiliency and probabilistic termination in expected-constant rounds are possible. Later on, Feldman and Micali [18] showed how to construct an OCC “from scratch” and gave the first expected-constant-time Byzantine agreement protocol, tolerating the optimal number of corrupted parties (less than 1/3 of the total number of parties), in the information-theoretic setting. In the setting where trusted private setup (i.e., a PKI) is provided, Katz and Koo [32] presented an expected-constant-round BA protocol with optimal resiliency (less than 1/2 in the cryptographic setting).

We already mentioned that with the advent of blockchains, BA protocols that do not rely on a fixed set of participants became possible. For PoW-based BA protocols, please refer to the beginning of this section. Regarding Proof-of-Stake protocols, Algorand [11] uses verifiable random functions (VRFs) to self-elect parties, and agreement and validity are achieved in expected-constant time.

Regarding BA protocols based on some other assumptions, we note that in an unpublished manuscript (also mentioned in the introduction) [17], Eckey, Faust and Loss design an expected-constant-round BA protocol based on PoWs and time-lock puzzles (TLPs). Further, Das et al. [13] propose a BA protocol, based on the much stronger primitive of verifiable delay functions (VDFs), that also terminates in expected-constant time.

Many PoWs from One PoW. As mentioned in the introduction, Garay, Kiayias and Leonardos [26] showed how to use a Nakamoto-style blockchain to solve BA. Achieving the optimal corruption threshold of less than 1/2 of the participants, however, presented some challenges, which were resolved by the introduction of a technique called “\(2{\times }1\) PoW,” which is used to compose two modes of mining, one for blocks and one for inputs. In a nutshell, in \(2{\times }1\) PoW, a random oracle output is checked twice, with respect to both its leading zeros and trailing zeros. Sufficient leading zeros imply the success of mining a block—that is the original (i.e., Bitcoin’s) approach to assess and verify whether a PoW has been produced—while sufficient trailing zeros imply the success of mining an input. This scheme guarantees that both mining procedures can be safely composed: the adversary remains bound to its original computational power and is not able to favor one PoW operation over the other.
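The double check can be sketched as follows; this is a minimal illustration, with SHA-256 standing in for the random oracle and a fixed count of leading/trailing zeros standing in for the actual difficulty target (all names here are ours, not the paper's):

```python
import hashlib

def pow_check(header: bytes, zeros: int = 6) -> tuple[bool, bool]:
    """2x1 PoW sketch: one random-oracle query, checked at both ends.

    Leading zeros yield a chain-block success (the Bitcoin-style check);
    trailing zeros yield an input-block success. Returns the pair
    (block_success, input_success).
    """
    digest = hashlib.sha256(header).digest()
    bits = "".join(f"{b:08b}" for b in digest)
    block_success = bits.startswith("0" * zeros)   # leading-zeros check
    input_success = bits.endswith("0" * zeros)     # trailing-zeros check
    return block_success, input_success
```

Since both outcomes are derived from a single query, a miner cannot redirect work from one mode to the other: each query is simultaneously an attempt at both.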

The \(2{\times }1\) PoW primitive has found applications in many other scenarios (e.g., [39]) and its generalization—\(m{\times }1\) PoW—makes parallel chains possible and has been used to improve transaction throughput [3] and to accelerate transaction confirmation [22]. We note that, in the case of parallel chains, existing \(m{\times }1\) PoW constructions cannot achieve full independence on all parallel chains. We elaborate on this in the full version of this paper [24].

Non-simultaneous Termination and Sequential Composition. A consequence of the round complexity “acceleration” provided by randomized BA protocols is that their termination is probabilistic and not necessarily simultaneous [15]. This is problematic when this type of BA protocol is invoked by a higher-level protocol. More specifically, parties would not be able to figure out when to safely return to the higher-level protocol and start the next execution. One solution is to run randomized BA protocols for \(O(\textsf{polylog} \kappa )\) rounds where \(\kappa \) is the security parameter. The running time is still independent of the number of parties, and, with overwhelming probability, parties would terminate and be able to start the next execution when \(O(\textsf{polylog} \kappa )\) rounds have elapsed. A more sophisticated sequential composition approach is to employ so-called “Bracha termination” and “super-round” expansion in order to preserve an expected-constant round complexity (cf. [12]). We adapt these techniques to the permissionless setting.

Settlement Latency in State Machine Replication. Most PoW-based SMR protocols achieve liveness in a time which is a function of the security parameter, hence suffering from long transaction settlement latency. The “Ledger Combiner” approach [22] proposes a novel grade assignment function to build a virtual ledger on top of different parallel ledgers, achieving constant settlement time but only for non-conflicting transactions. Prism [3] also gives a PoW-based parallel chain protocol with expected-constant settlement time, but only for non-conflicting transactions. Other approaches to fast transaction settlement include Algorand’s [11], which, being Proof-of-Stake-based, achieves expected-constant settlement delay for all types of transactions. Finally, Momose and Ren [36] achieve expected-constant confirmation delay, assuming a PKI and VRFs.

Due to space limitations, complementary material, the detailed specification of some of the building blocks and algorithms, as well as all the proofs, are presented in the full version of this paper [24].

2 Model and Preliminaries

Our model of computation follows Canetti’s formulation of the “real world” notion of protocol execution [9, 10] for multi-party protocols. Inputs are provided by an environment program \(\mathcal {Z}\) to parties that execute the protocol \(\varPi \). The adversary \(\mathcal {A}\) is a single entity that takes control of corrupted parties. \(\mathcal {A}\) can take control of parties on the fly (i.e., it is “adaptive”) and is allowed to observe honest parties’ actions before deciding her reaction (i.e., it is “rushing”). To specify the “resources” that may be available to the instances running protocol \(\varPi \)—for example, access to reliable point-to-point channels or a “diffuse” channel (see below)—we will follow the approach of describing them as ideal functionalities in the terminology of [10].

Clock, Random Oracle, Diffusion and CRS Functionalities. We divide time into discrete intervals called “rounds.” Parties are always aware of the current round (i.e., synchronous processors), which is captured by a global clock \(\mathcal {G}_{\textsc {Clock}}\) [33]. By convention, the hash function H used to generate PoWs is modeled as a random oracle \(\mathcal {F} _{\textsc {RO}}\). Message dissemination is synchronous: all messages sent by honest parties at the current round are guaranteed to be delivered to all honest parties at the beginning of the next round. This synchronous communication behavior is captured by \(\mathcal {F} _{\textsc {Diffuse}}\); the adversary’s power is limited to reordering messages and to having honest parties receive a message originating from \(\mathcal {A}\) in two adjacent rounds, by selectively choosing its receivers in the first round. Finally, we model a public-state setup by the common reference string (CRS) functionality \(\mathcal {F} _{\textsc {CRS}}^\mathcal {D} \), for some distribution \(\mathcal {D}\) with sufficiently high entropy. A full specification of the above resources can be found in the full version of this paper [24].

Honest Majority. We express our honest majority condition in terms of parties’ computational power, measured in particular by the number of RO queries that they are allowed per round, as opposed to by the number of parties (which are assumed to have equal computational power—cf. [26]).

Definition 1

(Honest majority). Let \(h_r, t_r\) denote the number of honest and corrupted random oracle queries at round r respectively. For all \(r \in \mathbb {N} \), it holds that \(h_r > t_r\).

To limit the adversary to a certain number of queries to \(\mathcal {F} _{\textsc {RO}}\) per round, we adopt the “wrapper functionality” approach (cf. [2, 29, 30]): a wrapper \(\mathcal {W}(\mathcal {F} _{\textsc {RO}})\) wraps the corresponding resource, thus enforcing the limited access to it.

Byzantine Agreement. We adapt the definition of the consensus problem (aka Byzantine agreement [35]) to our permissionless setting (cf. [26]). Note that here agreement implies (eventual) termination.

Definition 2

(Byzantine agreement). A protocol \(\varPi \) solves Byzantine Agreement in the synchronous setting provided it satisfies the following two properties:

  • Agreement: There is a round after which all honest parties return the same output if queried by the environment.

  • Validity: The output returned by an honest party \(\textsf{P}\) equals the input of some party \(\textsf{P} '\) at round 1, where \(\textsf{P} '\) is honest at the round \(\textsf{P}\) ’s output is produced.

Blockchain Notation. A block with target \(T \in \mathbb {N}\) is a quadruple of the form \(\mathcal {B} = \langle ctr, r, h, x \rangle \) where \(ctr, r \in \mathbb {N}\), \(h \in \{0, 1\}^\kappa \) and \(x \in \{0, 1\}^*\). A blockchain \(\mathcal {C}\) is a (possibly empty) sequence of blocks; the rightmost block by convention is denoted by \(\textrm{head}(\mathcal {C})\) (note \(\textrm{head}(\varepsilon ) = \varepsilon \)). These blocks are chained in the sense that if \(\mathcal {B} _{i + 1} = \langle ctr, r, h, x \rangle \), then \(h = H(\mathcal {B} _i)\), where \(H(\cdot )\) is a cryptographic hash function with output in \(\{0, 1\}^\kappa \). We adopt \(\textsf{TS}(\mathcal {B})\) to denote the timestamp of \(\mathcal {B}\); and, slightly abusing notation and omitting the current time r, we will use \(\mathcal {C} ^{\lceil {k}} \) to denote the chain resulting from pruning all blocks \(\mathcal {B}\) such that \(\textsf{TS}(\mathcal {B}) \ge r - k\). Let \(\mathbb {C} = \langle \mathcal {C} _1, \mathcal {C} _2, \ldots , \mathcal {C} _m \rangle \) denote m parallel chains and \(\mathbb {C} _j\) the j-th chain \(\mathcal {C} _j\) in \(\mathbb {C}\).
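The pruning operator can be sketched as follows; this is a minimal illustration in which the Block fields mirror the quadruple above (the class and function names are ours):

```python
from typing import NamedTuple

class Block(NamedTuple):
    ctr: int   # nonce
    r: int     # timestamp TS(B)
    h: str     # previous-hash reference
    x: str     # content

def prune(chain: list[Block], r: int, k: int) -> list[Block]:
    """Sketch of the pruned chain: drop every block B with TS(B) >= r - k,
    keeping only blocks whose timestamps are at least k rounds old."""
    return [b for b in chain if b.r < r - k]
```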

Finally, we introduce some basic string notation, which will be useful when describing our multi-chain-oriented PoW mechanism. For a \(\kappa \)-bit string s, where \(\kappa \) is the security parameter, we will use \(s_i~(i \in [\kappa ])\) to denote the i-th bit of s, and \([s]_{i\sim m}\) to denote the i-th segment after s is equally divided into m segments—i.e., \([s]_{i\sim m} = s_{(i - 1) \kappa / m + 1}, \ldots , s_{i \kappa / m}\). We will write \([s]^{\textsf{R}}\) for the reverse of string s (i.e., the bits of s read in reverse order), and use \([s]^{\textsf{R}}_{i\sim m}\) to denote the reverse of the i-th segment.
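In code, the segment notation could read as follows (a sketch under the stated conventions; `segment` is 1-indexed as in the text, and reversing a segment turns a trailing-zeros test into a leading-zeros test):

```python
def segment(s: str, i: int, m: int) -> str:
    """The i-th of m equal-length segments of s (1-indexed),
    i.e. bits (i-1)*len(s)/m + 1 through i*len(s)/m."""
    k = len(s) // m
    return s[(i - 1) * k : i * k]

def reverse_segment(s: str, i: int, m: int) -> str:
    """The i-th segment with its bits read in reverse order."""
    return segment(s, i, m)[::-1]
```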

3 Chain-King Consensus

In this section, we present our permissionless expected-constant-time Byzantine agreement protocol, which we name \(\textsf{ChainKingConsensus}\)Footnote 4. We first sketch the basic protocol approach—parallel chains in Sect. 3.1 and the phase-based chain-selection rule in Sect. 3.2. Then, we describe the main protocol in Sect. 3.3. We next show how to adapt the one-shot protocol execution using sequential composition in order to decide on a series of outputs in Sect. 3.4.

3.1 Parallel Chains and \(m{\times }1\) Proofs of Work

We introduce a new approach to achieve full independence of mining on parallel chains while preserving the original simple structure. At a high level, our scheme emulates an ideal setting of m parallel oracles while bounding the security loss that such parallel mining incurs. More specifically, the protocol will run \(m = \varTheta (\textsf{polylog} \kappa )\) parallel chains; note that the number of bits allocated on each chain will still be super-logarithmic in the security parameter (i.e., \(\kappa / \varTheta (\textsf{polylog} \kappa ) = \varOmega (\textsf{polylog} \kappa )\)), and hence the protocol will allow an arbitrary number of participants. Later we will show that (i) poly-logarithmically many parallel chains suffice to achieve the desired convergence; and (ii) poly-logarithmically many bits (those will be the bits available to each of the parallel random oracle invocations) will suffice to eliminate bad events with respect to the random oracle.

Our Parallel Chain Structure. We will use \(m = \varTheta (\log ^2 \kappa )\) parallel chains as the basic building block for \(\textsf{ChainKingConsensus}\). Importantly, on each chain we will employ the \(2{\times }1\) PoW technique [26] to bind the mining process of the chain with input messages (which will be used to reach consensus; details in Sect. 3.3). At a high level, this can be viewed as running m ideal parallel repetitions of a \(2{\times }1\) PoW blockchain.

We will call the blocks that form the blockchains chain-blocks (or blocks for short), denoted by \(\mathcal {B}\); the application data field, which will contain consensus-related values, we will call an input-block, denoted by \(\texttt{IB}\). Since the protocol will run m chain-production procedures and m input-block-production procedures, we will make a one-to-one correspondence between the chain-blocks and input-blocks. More precisely, the input-block produced by the i-th segment of the RO output will only be valid on the i-th parallel chain. See Fig. 1 for an illustration of the RO output and how successes in the (bound-together) mining procedures are achieved.

Fig. 1.

The mining process on our parallel chain. We assume that the target value is appropriately set so that at least 6 leading zeroes imply a success in mining a chain-block and at least 6 trailing zeroes imply a success in mining an input-block. The blocks’ superscript denotes on which chain they will be valid.

We now provide details on the blocks’ structure. Since the mining procedures of chain-blocks and input-blocks are bound together, they share the same block header \(\langle ctr, r, h, st, h', val \rangle \), which is a concatenation of a random nonce \(ctr \in \mathbb {N} \), timestamp \(r \in \mathbb {N} \), previous hash reference \(h \in \{0, 1\}^\kappa \), block state \(st \in \{0, 1\}^*\) (Merkle root of content), input freshness \(h' \in \{0, 1\}^\kappa \), and input message \(val \in \{0, 1\}^*\). Note that the previous hash h is a string of \(\kappa \) bits, consisting of m segments of the previous block hash of length \(\kappa / m\). The block state st is a concatenation of m block states; this is by convention the Merkle tree root of the block content, whose details we will omit for now (later on we will use \(\textsf{Blockify}\) to denote the procedure of generating block states). The input freshness \(h'\) is a string of \(\kappa \) bits and can be extracted from the (local) chain by procedure \(\textsf{ExtractInputFreshness}\). We defer the details of this algorithm to Sect. 3.2 and use it in a black-box way here. The input value val is the message that is of concern to the consensus protocol. For instance, in the case of binary consensus, \(val \in \{0,1\}\), and in multi-valued consensus val is a value picked from a larger input domain. Looking ahead, we note that when performing “slack” reduction and sequential composition of protocol instances, val may convey additional information (Sect. 3.4). Moreover, we remark that for all parallel chains, parties will try to mine input-blocks that contain the same input message; hence, unlike h, st and \(h'\), this field’s value only needs to appear once.

We note that, as multiple mining procedures are bound together, for a valid block with respect to a particular chain, the header bits associated with the other procedures become “dummy” and will only be useful when validating whether the block corresponds to a successful PoW. We now provide details about such dummy information. Regarding a valid chain-block on the i-th chain, only the nonce ctr, timestamp r, i-th segment of the previous hash reference \([h]_{i\sim m}\) and i-th segment of the block state \([st]_{i\sim m}\) are useful information. All other bits in h and st, along with the input freshness reference \(h'\) and input message val, are dummy information, merely used in the PoW validation.Footnote 5 In a similar vein, for input-blocks that are valid on the i-th chain, only the nonce ctr, timestamp r, input message val and i-th segment of fresh randomness \([h']_{i\sim m}\) are useful information; all other bits in \(h'\), the previous hash reference h and the block content root st are dummy information.

We are now ready to describe the mining procedure. Given a parallel chain \(\mathbb {C}\), block state st and input val, the protocol first extracts the previous hash reference by concatenating the i-th segment of the block hash computed from the tip of the i-th chain (recall that each segment is a \((\kappa / m)\)-bit string); when a chain is empty, it refers to the corresponding segment in the CRS. Next, after calling \(\textsf{ExtractInputFreshness}\) on \(\mathbb {C}\) to obtain the input randomness, the protocol queries the random oracle and gets output u. Then, it divides u into m segments of equal length and iterates over those segments. If the i-th segment is less than T, the protocol successfully mines a new chain-block on the i-th chain and appends it to \(\mathbb {C}\). If the reverse of the i-th segment is less than T, the protocol succeeds in mining an input-block, which is stored locally (and will be diffused in the future). See [24] for a full description of the mining procedure.
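One mining step of the \(m{\times }1\) scheme can be sketched as follows; this is an illustration only, with SHA-256 standing in for the random oracle, an integer target standing in for T, and a pre-assembled header standing in for the hash references and freshness extraction described above (all names are ours):

```python
import hashlib

def mine_step(header: bytes, m: int, target: int) -> tuple[list[int], list[int]]:
    """m x 1 PoW sketch: one RO query, split into m equal segments.
    Segment i below the target mines a chain-block on chain i; the
    reverse of segment i below the target mines an input-block for
    chain i. Returns the indices of successful chains for each mode."""
    u = hashlib.sha256(header).digest()
    bits = "".join(f"{b:08b}" for b in u)
    seg_len = len(bits) // m
    chain_hits, input_hits = [], []
    for i in range(m):
        seg = bits[i * seg_len : (i + 1) * seg_len]
        if int(seg, 2) < target:          # chain-block success on chain i
            chain_hits.append(i)
        if int(seg[::-1], 2) < target:    # input-block success for chain i
            input_hits.append(i)
    return chain_hits, input_hits
```

Note that a single query simultaneously attempts 2m bounded PoWs, so the adversary cannot concentrate its queries on any one of them.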

Basic Properties of our Parallel Chain Structure. As a warm-up, we present a preliminary analysis of our parallel chain structure. The goal is to show that, when appropriately parameterized, a constant fraction of the parallel chains will have “good” properties.

Our main analytical approach follows that in [23, 27], where the focus is on whether an execution on a single chain is typical—i.e., whether random variables related to the execution on this single chain stay close to their expected values and bad events with respect to the RO never happen. In [23] it is proved that any execution of the protocol for a number of rounds at least polylogarithmic in the security parameter is typical with overwhelming probability.

Here we apply the above technique to a constant number of rounds and adapt it to our setting, where the mining procedures of chain-blocks and input-blocks are bound together. Importantly, we are interested in the random variables expressing the total number of rounds in which at least one honest chain-block (resp., input-block) is produced, the total number of rounds in which exactly one honest chain-block is produced, and the total number of adversarial successes on chain-blocks (resp., input-blocks). For the sake of conciseness, here we provide an informal description of typical executions; refer to [24] for more details.

Definition 3

(Typical execution, informal). An execution is typical if for any set of at least k consecutive rounds, bad events (collisions) on the RO never happen, and the following quantities stay close to their expected values:

  (i) the number of rounds where at least one honest chain-block (resp., input-block) is produced;

  (ii) the number of rounds where exactly one honest chain-block is produced;

  (iii) the number of adversarial chain-blocks (resp., input-blocks).

Note that since in our case k is a constant, the probability that in a k-round window the execution is typical is constant, due to Chernoff bounds. Hence, the probability that an execution running for \(L = \textsf{poly} (\kappa )\) rounds is typical will be negligible. Nonetheless, let us consider a constant number \(\rho \in \mathbb {N} ^+\) of rounds. When the protocol is appropriately parameterized, the execution running for \(\rho \) rounds will be typical with constant probability. The intuition here is that the number of windows of at least k rounds within the period of \(\rho \) rounds is \(\varTheta (\rho ^2)\). For any constant \(\beta < 1\), if the probability that a k-round time window is typical is \(\alpha \), then by choosing \(\rho \le \sqrt{\ln \beta / \ln \alpha }\) we get the desired convergence probability. Moreover, for the same \(\beta \), \(\rho \) can be chosen as an arbitrary multiple of k.
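The bound on \(\rho \) can be computed directly; the following sketch treats the \(\varTheta (\rho ^2)\) window count as exactly \(\rho ^2\) and ignores constants, so it illustrates the shape of the bound rather than the exact parameterization:

```python
import math

def max_phase_length(alpha: float, beta: float) -> float:
    """Largest rho with alpha**(rho**2) >= beta: if each of the ~rho^2
    windows of at least k rounds is typical with probability alpha
    (treated here as independent), all of them are typical with
    probability about alpha**(rho**2)."""
    return math.sqrt(math.log(beta) / math.log(alpha))
```

For instance, with \(\alpha = 0.99\) and \(\beta = 0.9\), the sketch gives \(\rho \approx 3.2\), i.e., a phase of roughly three windows is typical with probability at least 0.9 under this simplification.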

Given the full independence of the m mining processes, we show that when the number m of parallel chains is sufficiently large, the success probability of a single execution being typical translates into the fraction of typical executions among the m parallel executions, yielding the following:

Theorem 1

For any \(\beta < 1\), running \(m = \varTheta (\log ^2 \kappa )\) parallel chains as described above for a constant number \(\rho \) of rounds, results in at least a \(\beta \) fraction of them being typical with overwhelming probability in \(\kappa \).

3.2 From Parallel Chains to Phase Oblivious Agreement

Given that parallel chains running from the CRS enjoy good properties only when the lifetime of the execution is bounded by a constant (Theorem 1), we now show how to combine the parallel chain structure with a novel phase-based cross-chain reference scheme in order to provide fresh randomness and extend the protocol running time to any polynomial in the security parameter. This gives us novel chain validation and selection rules. Moreover, we show that in each phase the approach achieves what we call phase-oblivious agreement, which serves as an essential building block in our \(\textsf{ChainKingConsensus}\) protocol.

In this section, we assume static participation where parties are always online and their number is fixed yet unknown to any protocol participant. Later on (Sect. 4.2), we elaborate on how to let new joining parties synchronize with other participants.

Protocol Phases. We divide the protocol execution time into sequential, non-overlapping phases of length \(\rho \) rounds. Note that \(\rho \) is a constant and at round i parties are in the \(\lceil i / \rho \rceil \)-th phase (the phase index starts at 1). As we assume synchronous processors, parties are always aware of the current round and phase numbers (they maintain local variables \(\texttt{r}\) and \(\texttt{phase}\) to store this information).

In contrast to the “conventional” longest-chain consensus rule, where parties keep extending chains starting from the genesis block, in our protocol parties build parallel chains separately in each phase; we denote the parallel chains of the i-th phase by \(\mathbb {C} ^{(i)}\), and the j-th chain in the i-th phase by \(\mathbb {C} ^{(i)} _j\). Let \(\mathbb {C}\) now denote the sequence of parallel chains over phases—i.e., \(\mathbb {C} = \mathbb {C} ^{(1)}, \mathbb {C} ^{(2)}, \ldots , \mathbb {C} ^{(i)} \). In the first phase, \(\mathbb {C} ^{(1)} \) points to the CRS, thus the adversary starts the computation simultaneously with the honest parties. Unfortunately, the CRS is only available at the onset of the execution, and hence, naïvely, there is no method to prevent the adversary from mining into the future—e.g., while in phase i, it can mine blocks for phase \(i + 1\). If pre-mining is possible for an unbounded time, then no security guarantees can be achieved in the \((i + 1)\)-th phase even if typical execution holds.

One conventional method to solve the pre-mining problem in blockchains (cf. [39]) consists of referring to a stable block with randomness that is unpredictable to the adversary (e.g., an honest block). Unfortunately, since phases here only last for a constant number of rounds, thus far there is no approach that would enable parties to explicitly agree on common unpredictable randomness in constant time (as this would directly imply full agreement on a non-trivial fact, which is our goal). Without full agreement on common randomness, the adversary can build a chain with randomness that is acceptable by, say, half of the honest parties but that will be rejected by the rest. In this way the adversary can split the honest computational power and thus completely break the security of the protocol.

To overcome the failure of the conventional common fresh randomness approach, we propose a new scheme called “cross-chain reference” to secure the execution on parallel chains in the second and subsequent phases. In short, cross-chain reference asks for all chains in the i-th phase to point to a large fraction of the chains in the \((i - 1)\)-th phase that are “dense,” a property which we will elaborate on soon.

As a preparation for securing phase-based parallel chains, we first introduce the structure of a phase (see Fig. 2). A phase, consisting of \(\rho \) rounds, is further divided into three non-overlapping stages. A block is assigned to a specific stage based on its timestamp. The first stage, view convergence, consists of the first \(\rho _{\textsf {view}}\) rounds of a phase. It guarantees that at the end of this stage, honest parties obliviously agree on a common prefix of sufficiently many parallel chains, and that they incorporate some recent randomness so that the adversary cannot pre-compute too many blocks for the next stage. The second stage, output generation, which consists of the \(\rho _{\textsf {output}}\) rounds after the view convergence stage, is used to decide the output of this phase. Only input-blocks that are included by chain-blocks in this stage will be considered in the decision-making procedure at the end of the phase (details in Sect. 3.3). The length \(\rho _{\textsf {output}}\) is chosen sufficiently large so that honest input-blocks account for the majority on sufficiently many parallel chains. Finally, the last stage, reference convergence, consists of the last \(\rho _{\textsf {ref}}\) rounds. This stage is used to secure the blocks that will be pointed to by the cross-chain references. Note that \(\rho _{\textsf {ref}}\) is also the upper bound on adversarial pre-mining—i.e., the adversary cannot start mining blocks for the next phase more than \(\rho _{\textsf {ref}}\) rounds earlier than the honest parties.
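The assignment of a block to a stage by its timestamp can be sketched as follows (the stage lengths are hypothetical; the text only requires \(\rho = \rho _{\textsf {view}} + \rho _{\textsf {output}} + \rho _{\textsf {ref}} \)):

```python
# Illustrative stage lengths (not the paper's parameterization).
RHO_VIEW, RHO_OUTPUT, RHO_REF = 4, 6, 2
RHO = RHO_VIEW + RHO_OUTPUT + RHO_REF  # total phase length in rounds

def stage_of(timestamp: int) -> str:
    """Stage of the block's phase, based on its 0-based offset within the phase."""
    offset = (timestamp - 1) % RHO
    if offset < RHO_VIEW:
        return "view-convergence"
    if offset < RHO_VIEW + RHO_OUTPUT:
        return "output-generation"
    return "reference-convergence"
```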

Fig. 2.

An illustration of a party \(\textsf{P}\)’s local parallel chains \(\mathbb {C} _{\textsf{local}}\) and dense chains \(\texttt{denseChains}\). In order for initial blocks in phase \(i + 1\) to be valid, they should point to at least 2 dense chains in phase i. In this toy example, all blocks point to the first dense chain in \(\mathbb {C} _{\textsf{local}}\) and the third dense chain in \(\texttt{denseChains}\). Note that the second chain is not dense.

Dense Chains. Next, we introduce a new concept called dense chains, which imposes a minimum density requirement on a chain (in terms of the number of blocks with timestamps in any given time window) and can also be used as a proof of “chain growth” (cf. [23]).

Definition 4

(Dense chains). A chain \(\mathcal {C}\) is a \((\tau , s, u, v)\)-dense chain if for any set \(S = \{p, \ldots , q \}\) of consecutive rounds such that \(u \le p < q \le v\) and \(|S| > s\), there are at least \(\tau \cdot |S|\) blocks in \(\mathcal {C}\) with timestamp in S. A chain \(\mathcal {C}\) is a dense chain on phase i if it is a \((\tau , \rho _{\textsf {ref}}, (i - 1) \rho + \rho _{\textsf {view}}, i \cdot \rho )\)-dense chain—i.e., the chain is dense in the last two stages of the i-th phase.
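Definition 4 can be checked directly. The following sketch (a naive quadratic scan over round windows, with a hypothetical chain representation as a list of block timestamps) tests whether a chain is \((\tau , s, u, v)\)-dense:

```python
def is_dense(timestamps, tau, s, u, v):
    """Check Definition 4: every window S of consecutive rounds inside [u, v]
    with |S| > s must contain at least tau * |S| blocks (by timestamp)."""
    for p in range(u, v + 1):
        for q in range(p + 1, v + 1):
            size = q - p + 1  # |S| for S = {p, ..., q}
            if size > s:
                count = sum(1 for t in timestamps if p <= t <= q)
                if count < tau * size:
                    return False
    return True
```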

We choose the density parameter \(\tau \) in such a way that when typical execution property holds on a single chain, the following two properties are guaranteed: (i) even if the adversary completely stops producing PoWs, the honest parties by themselves can produce a dense chain; and (ii) in the i-th phase, the adversary cannot come up with a dense chain before the reference convergence stage.

Looking ahead, the purpose of dense chains is to secure the execution of future phases, by asking parties to provide sufficiently many dense chains as proof of having invested enough computational power before the current phase.

Cross-Chain References. Next, we elaborate on the cross-chain reference approach which we use to “link” neighboring phases (this provides unpredictability so that the adversary can only pre-mine for a bounded amount of time). At a high level, a cross-chain reference on an initial block in the j-th chain and i-th phase is a \(\kappa \)-bit string consisting of m pointers to m sufficiently deep blocks on chains in the \((i - 1)\)-th phase. These deep blocks are picked as the last blocks in the output generation stage, one on each chain. Their hashes (that is, the j-th segment of a block hash, for a block on the j-th chain) are concatenated to form the \(\kappa \)-bit string. We assign this reference to the input freshness \(h'\) (recall our block header structure in Sect. 3.1) in the initial blocks on each chain’s i-th phase. For a cross-chain reference to be considered valid, it should point to at least a large fraction of deep blocks in dense chains in the previous phaseFootnote 6. However, these dense chains are not necessarily required to match the parties’ own chains of the previous phase, but can be attached as a proof of validity.
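The construction of the \(\kappa \)-bit reference string can be sketched as follows (the use of SHA-256, the parameter values, and the exact byte-level segmenting are our own illustrative assumptions):

```python
import hashlib

M = 4              # number of parallel chains (illustrative)
KAPPA = 256        # security parameter in bits (illustrative)
SEG_BYTES = (KAPPA // M) // 8  # bytes each chain contributes to the reference

def block_hash(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

def cross_chain_reference(deep_blocks):
    """deep_blocks[j]: serialized deep block of chain j in the previous phase.
    Returns the kappa-bit reference: the j-th segment of each block's hash,
    concatenated in chain order."""
    assert len(deep_blocks) == M
    ref = b"".join(
        block_hash(b)[j * SEG_BYTES:(j + 1) * SEG_BYTES]
        for j, b in enumerate(deep_blocks)
    )
    assert len(ref) * 8 == KAPPA
    return ref
```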

To facilitate the chain validation and selection algorithms, a party \(\textsf{P}\) maintains a local variable \(\mathbb {C} _{\textsf{local}}\) to record her own parallel chains and a variable \(\texttt{denseChains}\) to bookkeep all valid (single) dense chains that are not in \(\mathbb {C} _{\textsf{local}}\). Note that \(\texttt{denseChains}\) and \(\mathbb {C} _{\textsf{local}}\) are diffused together. In more detail, \(\texttt{denseChains}\) is a two-dimensional array with \(\texttt{denseChains} [i][j]\) containing a (possibly empty) set of (single) dense chains that a party has seen as the j-th chain in the i-th phase. Party \(\textsf{P}\) also maintains a local variable \(\texttt{chainBuffer}\) which contains all pairs \(\langle \mathbb {C}, \texttt{denseChains} \rangle \) that \(\textsf{P}\) receives at the beginning of the round. Refer to Fig. 2 for an illustration of our phase-based parallel chains.

We now formalize the \(\textsf{ExtractInputFreshness}\) procedure (see [24]) which parties use to extract the cross-chain reference and fresh randomness for input-blocks. Specifically, when this algorithm is called in the view convergence stage of the i-th phase (\(i > 1\)), it returns a \(\kappa \)-bit string which is the concatenation of, for each chain, the hash of the block with the largest height whose timestamp is less than \((i - 1) \rho - \rho _{\textsf {ref}} \). When \(\textsf{ExtractInputFreshness}\) is called in the output generation stage, it returns the concatenation of the m hashes of the blocks that are k rounds before the end of the view convergence stage in this phase (we will show later that k is the common-prefix parameter on typical chains). When this algorithm is called at any other time, it returns an all-zero string.
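A simplified reconstruction of this logic is sketched below; the chain representation, helper names, and hash handling are hypothetical, and the full specification is in [24]:

```python
def extract_input_freshness(chains, phase, stage, rho, rho_ref, rho_view, k,
                            hash_fn, kappa_bytes=32):
    """chains: list of m chains; each chain is a list of (timestamp, height, block)."""
    def latest_before(chain, bound):
        # Block with the largest height among those with timestamp < bound.
        eligible = [b for b in chain if b[0] < bound]
        return max(eligible, key=lambda b: b[1]) if eligible else None

    if stage == "view-convergence" and phase > 1:
        bound = (phase - 1) * rho - rho_ref
    elif stage == "output-generation":
        bound = (phase - 1) * rho + rho_view - k
    else:
        return b"\x00" * kappa_bytes  # all-zero string outside these stages

    seg = kappa_bytes // len(chains)
    parts = []
    for chain in chains:
        blk = latest_before(chain, bound)
        parts.append(hash_fn(blk[2])[:seg] if blk else b"\x00" * seg)
    return b"".join(parts)
```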

Parallel-Chain Validation Algorithm. Recall that in our protocol, we use an \(m{\times }1\) PoW scheme to mine m parallel chains, and, on each chain, we use \(2{\times }1\) PoW to bind the mining processes of chain-blocks \(\mathcal {B}\) and input-blocks \(\texttt{IB}\) together; moreover, we divide chains into phases and introduce cross-chain references to link neighboring phases. Our validation rule will consider the validity of all blocks, chains (in a single phase) and cross-chain references. Specifically, a parallel chain \(\mathbb {C}\) (with its associated \(\texttt{denseChains} \)) will be considered valid if the following holds (refer to [24] for a full description):

  • Valid single chains. For any \(\mathcal {C} = \mathcal {B} _1, \mathcal {B} _2, \ldots , \mathcal {B} _n\) (either a \(\mathbb {C} ^{(i)} _j\) or in \(\texttt{denseChains} [i][j]\)), \(\mathcal {C}\) should be a valid single chain. More specifically, \(\mathcal {C}\) is a valid chain if (i) all blocks are the result of successful PoWs; (ii) all blocks’ state st match their corresponding block content; and (iii) for all \(i > 1\), \(\mathcal {B} _{i}\) refers to the hash of \(\mathcal {B} _{i - 1}\). Additionally, for chains in the first phase, \(\mathcal {B} _1\) should point to the CRS.

  • Valid input blocks. For any input block \(\texttt{IB}\) included in \(\mathcal {C}\) in the i-th phase, \(\texttt{IB}\) should pass the following check: (i) it reports a unique hash among all input-blocks; (ii) the timestamp of \(\texttt{IB}\) falls in the output generation stage; (iii) \(\texttt{IB}\) is a successful PoW and contains a valid input message val; and (iv) \(\texttt{IB}\) points to the last block on \(\mathcal {C}\) with timestamp less than \((i - 1)\rho + \rho _{\textsf {view}}- k\) (i.e., good fresh randomness).

  • Valid cross-chain reference. In the i-th phase (\(i > 1\)), all initial blocks of chains in \(\mathbb {C} ^{(i)}\) and \(\texttt{denseChains} [i]\) report a good cross-chain reference. In order for a cross-chain reference to be good, at least a \(\beta > 3 / 4\) fraction of its hashes should match the last blocks in the output generation stage on dense chains in the \((i - 1)\)-th phase, either in \(\mathbb {C}\) or \(\texttt{denseChains} \). Note that their positions should also match—i.e., the j-th segment of the reference should match a deep block in \(\mathbb {C} ^{(i - 1)} _j\) or \(\texttt{denseChains} [i - 1][j]\).
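The check in the last bullet reduces to a simple counting rule, sketched here with a hypothetical segment representation:

```python
def good_reference(ref_segments, candidate_hash_segments, beta=0.75):
    """
    ref_segments[j]: j-th segment of the cross-chain reference on an initial block.
    candidate_hash_segments[j]: set of acceptable j-th hash segments, taken from
    deep blocks on dense chains at position j of the previous phase
    (in the party's own chains or in denseChains).
    A reference is good if more than a beta fraction of segments match in place.
    """
    m = len(ref_segments)
    matches = sum(
        1 for j in range(m) if ref_segments[j] in candidate_hash_segments[j]
    )
    return matches > beta * m
```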

We remark that our chain validation rule is different from that used in both single-chain validation and all previous parallel-chain constructions, due to its novel cross-chain reference mechanism. Specifically, starting in the second phase, the initial block on a single chain \(\mathcal {C}\) does not directly point to the last block in the previous phase—i.e., its previous state reference h becomes dummy. As long as \(\mathcal {C}\) provides a valid cross-chain reference and forms a valid single chain, \(\mathcal {C}\) will be considered valid. We note that since the previous state references (hash pointers) between neighboring phases are not continuous, the adversary is allowed to keep extending the head of the chains in the previous phase by continuing to mine and insert blocks. Moreover, as our protocol does not ask for cross-references to all previous chains, it is also possible that honest parties never hold exactly the same parallel chains.

We now provide some more intuition on these two new properties. Regarding the adversarial extension of chains from previous phases, parties will check-point their chains phase-by-phase (see the chain selection rule below), hence this does not undermine the security of online parties. Regarding the possible disagreement on a certain fraction of the parallel chains, we note that this is unavoidable. Otherwise, if parties were aware that they would achieve a full agreement on a specific phase, this would directly imply that they reach consensus (and with simultaneous termination!). Our goal is to let honest parties share parallel chains such that in each phase, they obliviously agree on the prefix of a large fraction of the chains.

Parallel-Chain Selection Algorithm. We now introduce the chain selection algorithm. In a nutshell, this algorithm does not update local parallel chains as a whole; rather, it updates each single chain in the current phase—i.e., after phase i has passed, \(\mathbb {C} _{\textsf{local}}\) is check-pointed up to phase i and all chains in the previous phases will never be changed. When parties are in the first phase, they use the longest chain rule to select each single chain separately. When parties are in the i-th phase \((i > 1)\), a party \(\textsf{P}\) processes the chains stored in \(\texttt{chainBuffer}\) as follows:

  • Filter invalid chains. For any \(\mathbb {C} \in \texttt{chainBuffer} \), if \(\mathbb {C}\) is not a valid chain, \(\textsf{P}\) rejects \(\mathbb {C}\) immediately and removes it (as well as its associated dense chains) from \(\texttt{chainBuffer}\).

  • Update \(\texttt{denseChains}\) . For all \(i' < i\) and \(j \in [m]\), \(\textsf{P}\) updates \(\texttt{denseChains} [i'][j]\) as follows. If there is a valid dense chain \(\mathcal {C}\) as the j-th chain in phase \(i'\) (either in \(\mathbb {C}\) or in \(\texttt{denseChains} [i'][j]\) from another party) that forks from all the chains in \(\texttt{denseChains} [i'][j]\) for more than \(\rho _{\textsf {ref}}\) rounds, \(\textsf{P}\) adds \(\mathcal {C}\) to \(\texttt{denseChains} [i'][j]\) (i.e., it bookkeeps new dense chains with new cross-chain reference pointer blocks).

  • Adopt longer chains. \(\textsf{P}\) uses the longest chain rule to select chains in the current phase. For any incoming chain \(\mathbb {C}\), if \(\textrm{len}(\mathbb {C} ^{(i)} _j) > \textrm{len}(\mathcal {C}) \) where \(\mathcal {C}\) is the j-th chain in \(\mathbb {C} _{\textsf{local}} ^{(i)}\), then \(\textsf{P}\) updates \(\mathcal {C}\) to \(\mathbb {C} ^{(i)} _j\).

Refer to [24] for a detailed description of the above rules.
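The per-chain adoption rule above can be sketched as follows (the chain representation is a hypothetical simplification; earlier phases are treated as check-pointed and left untouched):

```python
def adopt_longer_chains(local_chains, incoming_chains, phase):
    """
    local_chains / incoming_chains: dict mapping phase index -> list of m chains
    (each chain a list of blocks). Only the current phase's chains are updated,
    chain by chain, via the longest-chain rule.
    """
    local = local_chains[phase]
    incoming = incoming_chains.get(phase, [])
    for j, candidate in enumerate(incoming):
        if j < len(local) and len(candidate) > len(local[j]):
            local[j] = candidate  # adopt the longer j-th chain
    return local_chains
```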

Phase Oblivious Agreement. Notice that in each phase, the probability that for a large fraction of chains their execution is typical (Definition 3) is overwhelming. Further, our phase-based parallel-chain structure and density-based chain validation and selection rules guarantee that the adversary can only pre-mine for a bounded amount of time; hence, “good” properties—i.e., agreement and chain quality (a high enough fraction of honest blocks) on the input-blocks—hold on a large fraction of the chains in every phase. However, since parties are not able to discern on which chains they have agreement, agreement is achieved obliviously, yielding the following:

Theorem 2

(Phase oblivious agreement). There exist protocol parameterizations such that the following properties hold. Let \(\beta \in (3 / 4, 1)\) and consider a phase i. Let \(\mathbb {C}, \mathbb {C} '\) denote the parallel chains held by two honest parties \(\textsf{P}, \textsf{P} '\) at rounds \(r, r'\) after phase i (i.e., \(\min \{r, r'\} > i \rho \)), respectively. Then there exists a subset \(S \subseteq \{1, 2, \ldots , m \}\) of size larger than \(\beta \cdot m\) such that for all \(j \in S\), the following two properties hold on chains \(\mathcal {C} = \mathbb {C} ^{(i)}_j\) and \(\mathcal {C} ' = \mathbb {C} '^{(i)}_j\).

  • Agreement: \(\mathcal {C} ^{\lceil \rho _{\textsf {ref}}} = \mathcal {C} '^{\lceil \rho _{\textsf {ref}}}\).

  • Honest input-block majority: For all input-blocks included in the output generation stage of \(\mathcal {C}\) and \(\mathcal {C} '\), more than half of them are produced by honest parties.

3.3 From Phase Oblivious Agreement to Chain-King Consensus

In this section we explain how our chain-king consensus protocol can be derived from phase-based parallel chains. We present \(\textsf{ChainKingConsensus}\) as a multi-valued consensus protocol with input domain \(V, |V| \ge 2\).Footnote 7 For simplicity we assume inputs are scalars, but the formulation can be easily adapted to any other type of input.

At a high level, chain-king consensus can be viewed as following the “phase king” approach (cf. [5, 6]) with randomized king selection on top of phase-based parallel chains. The execution proceeds in iterations of 3 phases. Parties will only terminate at the end of an iteration (i.e., at a phase with index a multiple of 3). Two thresholds, more than one half of the number of chains \(( > m / 2)\) and more than three quarters \(( > 3m / 4)\), are of interest. Importantly, a distinguished chain—the first chain \(\mathbb {C} _1\)—is identified as the king chain. This king chain is hard-coded in the protocol and will never change during the whole execution.

Similarly to all existing consensus protocols with probabilistic termination, in \(\textsf{ChainKingConsensus}\) parties might terminate at different phases. We measure the quality of non-simultaneous termination by the maximum number of phases by which the termination of two honest parties may differ:

Definition 5

(c-slack termination). A protocol \(\varPi \) satisfies c-slack termination if any pair of honest parties \(\textsf{P}, \textsf{P} '\) are guaranteed to terminate \(\varPi \) within c phases of each other.

Input Messages and Internal Variables. So far we have not yet specified the input messages in each phase. In (multi-valued) \(\textsf{ChainKingConsensus}\), at the onset of the protocol execution, party \(\textsf{P}\) is activated with an input \(v \in V\). \(\textsf{P}\) starts to mine input messages (i.e., by setting the variable \(\texttt{val} \) in the RO query—see Sect. 3.1) carrying her current suggestion for the protocol output; \(\textsf{P}\) will terminate based on her local state, which we detail soon.

In addition to variable \(\texttt{val} \in V\), \(\textsf{P}\) locally manages two Boolean variables \(\texttt{lock}\) and \(\texttt{decide}\) which are both initialized to \(\textsf{false}\), and a three-valued variable \(\texttt{exit} \in \{\infty , 1, 0\}\) which is initialized to \(\infty \). In more detail:

  • Variable \(\texttt{val}\) reflects \(\textsf{P}\) ’s suggestion on the output, and can be modified if in certain phases \(\textsf{P}\) receives sufficiently many different input values.

  • Variable \(\texttt{lock}\) indicates whether \(\textsf{P}\) will “listen” to the king chain (see Algorithm 1 below for details) in the last phase of an iteration. It is set to \(\textsf{true}\) if parties are confident that all honest parties will set their \(\texttt{val}\) to the same value. If \(\texttt{lock}\) remains \(\textsf{false}\) at the end of an iteration, \(\textsf{P}\) will update her \(\texttt{val}\) based on her local view of the king chain. If \(\textsf{P}\) has not decided at the end of an iteration and \(\texttt{lock}\) is set to \(\textsf{true}\), it is reset to \(\textsf{false}\) for the next iteration.

  • Variable \(\texttt{decide}\) is used to record whether \(\textsf{P}\) decides on her local value \(\texttt{val}\). It is set to \(\textsf{true}\) only when \(\textsf{P}\) is confident that all honest parties are going to agree on the value that she holds, and the adversary is limited to only influencing in which phase parties will terminate. When \(\texttt{decide}\) is set to \(\textsf{true}\), \(\texttt{val}\) is fixed and will never change in the future (except with negligible probability). Further, it is set to \(\textsf{true}\) only in the first and second phases of an iteration and is checked in the last phase to see if \(\texttt{exit}\) needs to be updated.

  • Variable \(\texttt{exit}\) indicates whether \(\textsf{P}\) should stop querying the RO and producing blocks. When \(\texttt{exit} = \infty \), \(\textsf{P}\) has not yet reached the end of the iteration in which she decides, and hence keeps updating the other variables. When \(\texttt{exit} = 1\), \(\textsf{P}\) has set \(\texttt{decide}\) to \(\textsf{true}\) and hence is ready to output \(\texttt{val}\). However, \(\textsf{P}\) is not aware whether other honest parties have decided, hence she keeps producing blocks with \(\texttt{val}\). This lasts for one iteration, after which \(\texttt{exit}\) is set to 0. When \(\texttt{exit} = 0\), \(\textsf{P}\) stops making RO queries and stops the execution of (this instance of) the protocol.

We highlight one significant difference between Chain-King Consensus and classical BA protocols. In the classical setting, parties terminate the protocol once they decide on an output. For protocols with probabilistic termination, some honest parties might terminate a few rounds after other honest parties (cf. [15]). Parties who have terminated continue to send the same message to all honest parties (cf. [18, 32]), and the parties that are behind can stick to the previous message if they do not receive any new message from those parties that have already terminated. As it turns out, this strategy essentially relies on the set of participating parties being known, which does not apply in the permissionless setting, where parties can neither authenticate each other nor know the source of a message. Hence, in order to let parties that are behind safely terminate, we explicitly distinguish “\(\texttt{decide}\),” which means parties output their local variable \(\texttt{val}\), and “\(\texttt{exit}\),” which means parties stop (or will stop) the PoW mining process and exit the protocol. We provide more details on “mining for one more iteration” after we introduce the state update algorithm.

Phase Output Extraction. The decision made at the end of each phase is based on the input messages collected in that phase. Since we have m parallel chains, parties will extract a vector of size m. The j-th element of the vector is extracted as the medianFootnote 8 of the input values that appear in the input-blocks of the j-th chain, collected in the output generation stage (i.e., blocks with timestamps in \((i \rho - (\rho _{\textsf {output}} + \rho _{\textsf {ref}}), i \rho - \rho _{\textsf {ref}} ]\) in the i-th phase). Note that since parties might disagree on some bounded fraction of the chains, different honest parties will extract different phase output vectors. Nevertheless, thanks to Theorem 2, two honest output vectors will share a large fraction of common elements obliviously. (See [24] for the full description of this process.)
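The extraction step can be sketched as follows (the representations are hypothetical; we take the lower median for even-length lists, one of several consistent choices):

```python
from statistics import median_low

def phase_output(parallel_chains, i, rho, rho_output, rho_ref):
    """parallel_chains[j]: list of (timestamp, val) input-blocks on chain j.
    Returns the phase-i output vector: per chain, the median of the input
    values whose timestamps fall in the output generation stage (lo, hi]."""
    lo = i * rho - (rho_output + rho_ref)
    hi = i * rho - rho_ref
    out = []
    for chain in parallel_chains:
        vals = sorted(v for (t, v) in chain if lo < t <= hi)
        out.append(median_low(vals) if vals else None)
    return out
```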

State Update Algorithm. At the end of each phase (i.e., when local clocks reach round \(i \cdot \rho \)), parties run Algorithm 1 to decide whether to update their local variables or not. It generally follows the randomized phase-king algorithm approach [20], but introduces a novel king selection rule and an extra termination iteration.

Algorithm 1. The \(\textsf{StateUpdate}\) procedure.

We now provide a high-level overview and some intuition about the state update algorithm. In the first phase of an iteration, given phase output vector \(\vec {V}\), parties first check if more than m/2 chains report the same value val. If this is the case, they set their \(\texttt{val}\) to val. Since more than m/2 accounts for a majority of the chains, if such a value val exists then it is unique. Further, if in their local view more than 3m/4 of the chains report val, they set both \(\texttt{decide}\) and \(\texttt{lock}\) to \(\textsf{true}\) and they will decide at the end of the iteration. The second phase is almost a repetition of the first one, except that in this phase parties will not set \(\texttt{decide}\) to \(\textsf{true}\).

If during the first two phases of an iteration, a party \(\textsf{P}\) has never seen more than 3m/4 of the chains report the same value, \(\textsf{P}\) is still “confused” and its internal variable \(\texttt{lock}\) remains \(\textsf{false}\) at the end of the last phase. Under such circumstances, \(\textsf{P}\) will refer to the king chain and adopt the median value among the input-messages included—i.e., the first element in phase vector \(\vec {V}\). Note that this is different from previous phase-king style constructions: with deterministic termination, the king rotates among \(t + 1\) fixed parties (at least one of which is honest) [5, 6], while with probabilistic termination, parties first broadcast their \(\texttt{val}\) and then run an oblivious leader election algorithm to try to agree on an honest king with constant probability [32]. In contrast, in our protocol the chain-king is always the first chain. Moreover, even though the adversary knows that the first chain is the king, it will not be able to focus on it due to the basic nature of parallel chains. As a result, given that the adversary’s power is “diluted,” parties agree obliviously with constant probability on the king chain’s value. When the honest parties get lucky, they will start the next iteration with a unanimous value in \(\texttt{val}\), which guarantees decision; if they do not, they will start the next iteration with possibly different values in \(\texttt{val}\) and can hope to get lucky with the king chain in the next iteration.
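The two-threshold logic described above can be sketched as follows; we stress that this is our illustrative reconstruction of the overview, not the full Algorithm 1 (whose complete specification, including the \(\texttt{exit}\) handling, is in [24]):

```python
from collections import Counter

def state_update(state, V, phase_in_iteration, m):
    """state: dict with keys val/lock/decide; V: phase output vector (one
    entry per chain, possibly None). Variable names follow the text."""
    count = Counter(v for v in V if v is not None)
    val, freq = count.most_common(1)[0] if count else (None, 0)

    if phase_in_iteration in (1, 2):
        if freq > m / 2:
            state["val"] = val          # unique majority value
            if freq > 3 * m / 4 and phase_in_iteration == 1:
                state["decide"] = True  # will decide at the end of the iteration
                state["lock"] = True
    else:  # last phase of the iteration
        if not state["lock"]:
            state["val"] = V[0]         # adopt the king chain's value
        state["lock"] = False           # reset for the next iteration
    return state
```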

Next, we elaborate on the difference between \(\texttt{decide}\) and \(\texttt{exit}\), as well as their interaction. As mentioned earlier, even if parties have decided, they should still participate in the protocol by continuing to make RO queries and diffuse blocks with their output value. This is because, due to non-simultaneous termination, if parties that decide in the current iteration stop participating in the protocol, then parties that are going to decide in the next iteration would not be able to get enough information, since the honest majority condition might be broken. In the classical setting, this is easily circumvented: honest parties who do not receive a message from another party reuse that party’s previous message as the current input (cf. [18, 32]). However, in a PoW setting the above strategy is not feasible. Therefore, we distinguish between deciding on the output and terminating the mining of blocks by using two different variables, \(\texttt{decide}\) and \(\texttt{exit}\). Specifically, any party that decides on the output in the i-th phase should first keep mining for an extra iteration (by setting \(\texttt{exit}\) to 1 and no longer updating \(\texttt{val}\), \(\texttt{lock}\) and \(\texttt{decide}\)), and then terminate and set \(\texttt{exit}\) to 0 at the \((i + 3)\)-th phase (recall that an iteration consists of 3 phases).Footnote 9 After parties set \(\texttt{exit}\) to 0, they output \(\texttt{val}\) and exit the protocol.

The \(\textsf{ChainKingConsensus}\) protocol. Having presented the various protocol components, we are now ready to put things together and state what the protocol achieves. During the protocol execution, parties keep updating their local parallel chains and mining their output suggestion. At the end of each phase, they use \(\textsf{StateUpdate}\) to update their consensus-related internal variables. Upon setting their \(\texttt{exit}\) variable to 0, parties terminate the protocol and output \(\texttt{val}\). The full specification of the protocol is presented in [24].

\(\textsf{ChainKingConsensus}\) achieves agreement and validity in an expected-constant number of rounds, and since parties terminate at the end of neighboring iterations, it satisfies 3-slack termination (cf. Definition 5). Further, when parties start the protocol with a unanimous input configuration, they decide at the end of the third phase (except with negligible probability). If they do not start with a unanimous input, then the expected time for decision is \(3 / (3/4) + 3 = 7\) phases.

Theorem 3

There exist protocol parameterizations such that \(\textsf{ChainKingConsensus}\) satisfies agreement, validity and 3-slack termination with expected-constant round complexity.

Remark 1

\(\textsf{ChainKingConsensus}\) also achieves “strong validity” (i.e., that the output equals the input of at least one honest party) if (i) we change the phase output extraction from selecting the median of input-messages to the input-message with the highest plurality; and (ii) the adversarial computational power is bounded by \(t < (1 - \delta ) n / (|V| - 1)\). (This matches the lower bound in [20].)

Remark 2

We note that PoW-based Crusader Agreement [14] (where parties either output the same value v or \(\bot \), and if they start unanimously they output that value) can be achieved in constant time. Specifically, parties run \(\textsf{ChainKingConsensus}\) and terminate at the end of the first phase. If a party \(\textsf{P}\) has set her \(\texttt{decide}\) variable to \(\textsf{true}\), \(\textsf{P}\) outputs \(\texttt{val}\); otherwise she outputs \(\bot \).

3.4 Fast Sequential Composition

The chain-king consensus protocol presented in Sect. 3.3 is one-shot—i.e., parties start at the same time and terminate at (possibly) different phases. This non-simultaneous termination turns out to be problematic when \(\textsf{ChainKingConsensus}\) is invoked by a higher-level protocol, such as MPC or SMR, where parties need to decide on a series of outputs repeatedly. Given the non-simultaneous termination, after the first invocation parties would not be able to return to the calling higher-level protocol synchronously, and in subsequent invocations \(\textsf{ChainKingConsensus}\) does not by itself provide any security guarantees if parties start at different phasesFootnote 10. Ideally, when the same protocol is invoked multiple times, the round complexity should be preserved—i.e., for \(\ell \) sequential invocations, the total running time should be expected \(O(\ell )\) rounds.

In the classical distributed computing and cryptographic protocols literature, this is studied as the sequential composition of BA protocols, with positive results: By using so-called “Bracha termination” [8] and super-round expansion [12], a BA protocol with probabilistic termination can asymptotically preserve the same round complexity while continuously deciding on a series of outputs.

In this section we show how to achieve fast sequential composition of multiple instances of \(\textsf{ChainKingConsensus}\) by first emulating the Bracha termination strategy on parallel chains, thus enabling parties to terminate in two neighboring phases; then, for later invocations, we introduce a novel “super-phase expansion” protocol that guarantees security under non-simultaneous start while preserving the expected-constant round complexity. Note that our “super-phase expansion” works for any constant-round slack, hence Bracha termination is in fact not necessary. Nonetheless, we will first go through this strategy, since it helps to achieve a more concise and (practically) efficient result.

Bracha Termination. In our one-shot Chain-King Consensus protocol, honest parties might terminate at the end of different but adjacent iterations. We now show how to reduce this slack from one iteration (i.e., 3 phases) to one phase. The high-level idea follows Bracha’s original suggestion [8], but we adapt it to the PoW setting.

We first describe this approach in the classical setting (information-theoretic and assuming \(n \ge 3t + 1\)). In Bracha’s suggestion, as soon as a party decides on an output v, or upon receiving at least \(t + 1\) messages \((\textsf{decide}, v)\) for the same value v, it sends \((\textsf{decide}, v)\) to all parties. Then, upon receiving \(n - t\) messages \((\textsf{decide}, v)\) for the same value v, a party outputs v and terminates.

We now elaborate on our early termination strategy, which tries to emulate Bracha’s suggestion on parallel chains. Recall that in \(\textsf{ChainKingConsensus}\) the input-block content is its producer’s output suggestion val. Here we extend it to two types of messages: either an output suggestion val, or a decide suggestion \((\textsf{decide}, val)\). We say a chain \(\mathbb {C} ^{(i)}_j\) decides on val if more than half of the input-blocks included in the output generation stage report \((\textsf{decide}, val)\) for the same val. Note that when a chain does not decide on any val, the output extraction algorithm treats all \((\textsf{decide}, val)\) messages the same as val.

Thus, protocol \(\textsf{ChainKingConsensus}\) is modified with the following additional steps: (i) When \(\textsf{P}\) ’s internal variable \(\texttt{decide}\) is \(\textsf{false}\), \(\textsf{P}\) includes only \(\texttt{val}\) in her input-blocks; when \(\texttt{decide}\) is \(\textsf{true}\), \(\textsf{P}\) mines \((\textsf{decide}, \texttt{val})\); (ii) At the end of any phase, upon observing more than m/2 chains decide on val, \(\textsf{P}\) sets her \(\texttt{val}\) to val and \(\texttt{decide}\) to \(\textsf{true}\); (iii) At the end of any phase, upon observing more than 3m/4 chains decide on val, \(\textsf{P}\) sets her \(\texttt{val}\) to val and \(\texttt{exit}\) to 1; and (iv) After setting \(\texttt{exit}\) to 1 in the previous step, \(\textsf{P}\) continues to mine \((\textsf{decide}, \texttt{val})\) for one more phase and then sets \(\texttt{exit}\) to 0.
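The chain-level decision rule above can be sketched as follows (the message encoding is a hypothetical simplification):

```python
from collections import Counter

def chain_decides(input_block_msgs):
    """input_block_msgs: messages carried by one chain's input-blocks in its
    output generation stage; each is either ("decide", val) or a plain val.
    The chain decides on val if more than half of them are ("decide", val)."""
    n = len(input_block_msgs)
    decides = Counter(
        m[1] for m in input_block_msgs
        if isinstance(m, tuple) and m[0] == "decide"
    )
    for val, cnt in decides.items():
        if cnt > n / 2:
            return val
    return None  # the chain does not decide
```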

We present the new state update mechanism in [24].

Theorem 4

There exist protocol parameterizations such that \(\textsf{ChainKingConsensus}\), modified with the above state update algorithm, satisfies agreement, validity and 1-slack termination with expected-constant round complexity.

Slack-Tolerant Sequential Composition of \(\textsf{ChainKingConsensus}\) . We now present how sequential composition works in the permissionless setting. We remark that this is not a straightforward emulation of the super-round expansion technique in the classical literature, since in our setting the adversary effectively has more power in “swinging” the decision of honest parties. We elaborate below on the difference between classical round expansion and our novel “super-phase expansion.”

In order to perform sequential composition, our protocol should be appropriately adjusted so that we achieve a better quality of phase-oblivious agreement. Recall that Theorem 2 holds for any constant \(\beta < 1\). While in one-shot \(\textsf{ChainKingConsensus}\) we have protocol parameterizations such that at least three quarters of the chains reach phase-oblivious agreement, it is possible to have an arbitrary (constant) fraction of the chains reach oblivious agreement. One consequence is that the length of a phase increases in terms of the number of rounds; the asymptotic result (i.e., an expected-constant number of rounds), however, is preserved.

Furthermore, consider any \(n \in \mathbb {N} ^+\) consecutive phases in an execution of the protocol. If at least a \(\beta \) fraction of the chains reaches oblivious agreement in each phase, then, by a union bound, at least a \(1 - n(1 - \beta )\) fraction of the chains reaches oblivious agreement over all n phases. By appropriately choosing n and \(\beta \), we get the following property: in any n consecutive phases, at least three quarters of the chains achieve phase-oblivious agreement over all phases. For example, when \(\beta = 95\%\) and \(n = 3\), for any 3 consecutive phases, honest parties obliviously agree on at least three quarters (in fact, a 0.85 fraction) of the chains. As a result, we have the following corollary to Theorem 2:
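The union-bound calculation above is a one-liner; the following snippet checks the paper's example numbers.

```python
# Union bound from the text: with a beta fraction of chains in oblivious
# agreement per phase, at least a 1 - n*(1 - beta) fraction agrees across
# all n consecutive phases.

def multi_phase_fraction(beta, n):
    return 1 - n * (1 - beta)

# The paper's example: beta = 95%, n = 3 phases gives a 0.85 fraction,
# which exceeds the required three quarters.
frac = multi_phase_fraction(0.95, 3)
assert frac > 3 / 4
```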

Corollary 1

(Multi-phase oblivious agreement). There exist protocol parameterizations such that the following properties hold. Consider \(n \in \mathbb {N} ^+\) consecutive phases \(i, i + 1, \ldots , i + n - 1\), \(i \ge 1\). Let \(\mathbb {C}, \mathbb {C} '\) denote the parallel chains held by two honest parties \(\textsf{P}, \textsf{P} '\) at rounds \(r, r'\), respectively, after the \((i + n - 1)\)-th phase (i.e., \(\min \{r, r'\} > (i + n - 1) \rho \)). Then there exists a subset \(S \subseteq \{1, 2, \ldots , m \}\) of size \(|S| > 3m / 4\) such that for any \(j \in S\) and any \(k \in \{i, i + 1, \ldots , i + n - 1\}\), the following two properties hold on chains \(\mathcal {C} = \mathbb {C} ^{(k)}_j\) and \(\mathcal {C} ' = \mathbb {C} '^{(k)}_j\):

  • Agreement. \(\mathcal {C} ^{\lceil \rho _{\textsf{ref}}} = \mathcal {C} '^{\lceil \rho _{\textsf{ref}}}\).

  • Honest input-block majority. For all input blocks included in the output generation stage of \(\mathcal {C}\) and \(\mathcal {C} '\), more than half of them are produced by honest parties.

Regarding input messages, we also require that parties indicate in them the index of the current invocation, as well as the index and phase of the current iteration. That is, a valid input message in sequential composition is of the form “This is the i-th invocation, j-th iteration and k-th phase, and my output suggestion is val.” We omit the details of encoding such messages. Moreover, in some “dummy” phases, parties are allowed to send a dummy suggestion \(\bot \) that contains no information.

Given that parties can terminate and start within two neighboring phases, our super-phase expansion (which will be adopted in the second and subsequent invocations) replaces the original (aligned) phase with four (possibly unaligned) phases: “input-input-input-dummy.” I.e., parties report their suggested output during the first three phases in their local view, and leave the last phase dummy. See Fig. 3 for an illustration of an aligned super-phase and an unaligned one.

Fig. 3. Illustration of the super-phase expansion and how parties extract the super-phase output. (In the figure, one color marks the phases in which a party mines input messages with her output suggestion and another marks the dummy phase; the i-th super-phase and the phases associated with output extraction are highlighted separately.)

The decision process works as follows. When a party \(\textsf{P}\) reaches the end of a super-phase (in her local view), she decides an output (a vector of size m) for this super-phase based on the five previous (normal) phases, i.e., starting from one normal phase before the current super-phase (see the illustration of “Super-Phase Output” in Fig. 3). For each chain, \(\textsf{P}\) does the following. Recall that parties are allowed to report \(\bot \). When, in a (normal) phase, more than half of the input-blocks report \(\bot \), we say this phase reports \(\bot \); otherwise, the output of this phase is the median of all non-\(\bot \) values (after sorting). The decisions are as follows: (i) when two or more phases output non-\(\bot \) values, output the value of the second such phase; and (ii) when exactly one phase outputs a value val, output val for this super-phase.
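The per-chain decision just described can be sketched as follows. This is an illustrative Python rendering under our reading of rule (i) as "the second non-\(\bot \) phase"; \(\bot \) is modeled as `None`, and the data layout is an assumption.

```python
# Sketch of the per-chain super-phase decision rule. `phases` is a list of
# five lists, one per normal phase, each holding the values reported by
# that phase's input-blocks (None stands for a ⊥ report).

from statistics import median_low

def phase_output(blocks):
    """Output of one normal phase: None (⊥) if more than half of the
    input-blocks report ⊥, else the median of the non-⊥ values."""
    bots = sum(1 for b in blocks if b is None)
    if 2 * bots > len(blocks):
        return None
    return median_low(sorted(b for b in blocks if b is not None))

def super_phase_output(phases):
    outs = [phase_output(p) for p in phases]
    non_bot = [v for v in outs if v is not None]
    if len(non_bot) >= 2:
        return non_bot[1]      # rule (i): second non-⊥ phase
    if len(non_bot) == 1:
        return non_bot[0]      # rule (ii): the unique non-⊥ phase
    return None                # all five phases reported ⊥
```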

Next, we provide some intuition on why adding a dummy phase at the end of a super-phase is necessary. When honest parties do not start unanimously with the same value, the adversary can join forces with the late honest parties in their last phase so that the views of honest parties are inconsistent (because the parties that terminate early must make a decision while other honest parties have not yet finished their current super-phase). With the dummy phase, all honest parties share a consistent view under multi-phase oblivious agreement, which guarantees agreement and validity.

Moreover, including the output suggestion for 3 consecutive normal phases is also necessary. For a concrete example, suppose the underlying one-shot consensus protocol achieves 1-slack termination, the honest computational power accounts for \(60\%\) of the total (i.e., the adversary owns \(40\%\)), and the honest parties are equally divided into two subsets that start in two neighboring phases. In other words, the parties that start and terminate early (resp., late) account for \(30\%\) of the computational power each. Then, if parties include their output suggestion for only two phases, the adversary can refrain from mining in the first normal phase of the early parties, and join forces with the late parties in their second normal phase while injecting a non-honest input. In such a case, even if the honest parties start unanimously with v, the output of this chain under multi-phase oblivious agreement will not be v (as \(40\%\) is greater than \(30\%\)), thus violating the validity property of consensus. With 3 consecutive mining phases, at least two of them will overlap, and adopting the output of the second non-\(\bot \) phase is safe.
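The arithmetic of this attack can be checked directly. The snippet below uses the text's numbers (30%/30% honest split, 40% adversary) and the simplifying assumption, ours, that input-block counts are proportional to computational power and the plurality value wins a chain.

```python
# Numeric illustration of why two mining phases are not enough, using the
# text's example: honest power split 30% / 30% across neighboring starts,
# adversary 40%. Contributions are fractions of input-blocks on one chain.

def chain_winner(contributions):
    """contributions: value -> fraction of input-blocks reporting it;
    the plurality value wins the chain."""
    return max(contributions, key=contributions.get)

# Two mining phases: the adversary skips the early cohort's first phase and
# concentrates on the overlap phase, where only one 30% honest cohort
# faces the full 40% adversary.
assert chain_winner({"v": 0.30, "adv": 0.40}) == "adv"   # validity broken

# Three mining phases: at least two honest phases overlap, so 60% honest
# power faces 40% adversarial power in the decisive phase.
assert chain_winner({"v": 0.60, "adv": 0.40}) == "v"     # validity preserved
```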

We present more details on how our super-phase expansion can be adapted to c-slack termination for \(c > 1\) in [24].

By adopting the 1-slack termination technique and super-phase expansion, we get the following theorem for sequential composition of \(\ell \) invocations of \(\textsf{ChainKingConsensus}\).

Theorem 5

There exist protocol parameterizations such that the sequential composition of \(\ell \) invocations of \(\textsf{ChainKingConsensus}\) satisfies agreement and validity on each invocation, and the round complexity is expected \(O(\ell )\).

4 Application: Fast State Machine Replication

We now show how to adapt the sequential composition approach in Sect. 3.4 to implement a state machine replication (SMR) protocol. Our resulting protocol achieves both Consistency and expected-constant-time Liveness for all types of transactions (including conflicting ones). Namely, for any transaction \(\texttt{tx}\), when \(\texttt{tx}\) is diffused to all honest participants (miners), it takes in expectation a constant number of rounds to get settled into the immutable final ledger.

We first give our definition of SMR, and elaborate on why a fast SMR protocol cannot be directly derived from the sequential composition of multi-valued Chain-King Consensus. Then, in Sect. 4.1 we propose a new method that introduces randomness into the output of the king chain and helps circumvent the above problem while preserving expected-constant settlement time for all types of transactions. Finally, in Sect. 4.2 we show how a third-party observer, joining in the middle of the protocol, can catch up with the honest parties and learn the state of the ledger.

SMR Background. State machine replication (SMR) is the problem of distributing the operation of a state machine across a set of replicas so that the operation of the machine is resilient to the failure of a subset of the replicas. This concept was originally described in [34], and later further elaborated on by Schneider [42], where a high-level description of SMR was provided. Blockchain protocols, and in particular Bitcoin [37], have renewed interest in SMR definitions and constructions, as they can be seen as a way to realize SMR in a setting where there is no predetermined set of replicas. This has been studied and formalized in a series of works (e.g., [23, 25, 38]).

We now give a concise definition of SMR. A set of n servers, a subset \(\mathcal {H}\) of which is assumed to be non-faulty, maintains a log of transactions, denoted \(\textsf{Log}\). The log of each server also timestamps each transaction. The notation \(\textsf{Log} _i[t]\) denotes the log of server \(\textsf{P} _i\) up to time t. Furthermore, it is assumed that each server has a buffer for incoming transactions, denoted by \(I_i[t]\), that are valid with respect to its view (invalid transactions are dropped). Finally, and for simplicity, assume that all well-formed transactions are admissible in the log. In SMR, the following two conditions must be satisfied:

  • Consistency: \(\forall \textsf{P} _i, \textsf{P} _j \in \mathcal {H} \) (possibly with \(i = j\)) and \(t, t'\) it holds that \( \textsf{Log}_i[t] \preceq \textsf{Log}_j [t']\) or \(\textsf{Log}_j [t'] \preceq \textsf{Log}_i[t] \).

  • Liveness: There is a parameter \(u\in \mathbb {N}\) for which the following holds: \((\forall \textsf{P} _i \in \mathcal {H}: \texttt{tx} \in I_i[t] ) \implies \texttt{tx} \in \textsf{Log}_i[t + u]\).
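The Consistency condition above is a prefix-ordering requirement on logs; a minimal check of it can be sketched as follows (logs modeled as Python lists of transaction identifiers, an illustrative choice).

```python
# Minimal check of the Consistency condition: any two honest logs
# (possibly the same server at different times) must be prefix-ordered
# under the relation "a is a prefix of b" (the ⪯ of the definition).

def is_prefix(a, b):
    return len(a) <= len(b) and b[:len(a)] == a

def consistent(log_i, log_j):
    return is_prefix(log_i, log_j) or is_prefix(log_j, log_i)

assert consistent(["tx1"], ["tx1", "tx2"])             # one extends the other
assert not consistent(["tx1", "tx3"], ["tx1", "tx2"])  # forked logs violate it
```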

Typically, the Liveness parameter u is a pre-defined value determined by the protocol parameterization. It is natural to extend the notion and allow u to be a random variable with a distribution that depends on the specific parameterization. I.e., given a transaction \(\texttt{tx}\) appearing in all honest buffers at time t, the time until it is included in all honest logs is distributed according to u. In our protocol, u follows a geometric distribution, hence the time for \(\texttt{tx}\) to be installed in the immutable ledger is expected-constant.
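To illustrate why a geometric u gives expected-constant settlement: if each iteration settles the transaction with some constant probability p, the expected settlement time is \(1/p\) iterations. The success probability p = 0.5 below is an arbitrary stand-in, not a parameter from the paper.

```python
# Empirical illustration: a geometric settlement time with per-iteration
# success probability p has expectation 1/p, i.e., expected-constant.

import random

def settlement_time(p, rng):
    """Number of iterations until the geometric 'success' event fires."""
    t = 1
    while rng.random() >= p:
        t += 1
    return t

rng = random.Random(42)
p = 0.5                      # illustrative stand-in for the protocol's p
times = [settlement_time(p, rng) for _ in range(20000)]
avg = sum(times) / len(times)
# The empirical mean is close to the theoretical expectation 1/p = 2.
assert abs(avg - 1 / p) < 0.1
```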

Note that there are more properties of interest for SMR, such as observability, which is the requirement that a third party observer be capable of interpreting correctly the current state of the ledger by inspecting the logs of the servers.

4.1 From Sequential Composition to State Machine Replication

An SMR protocol accepts a batch of transactions as input. While we omit here the details on the particular form of transactions, we note that the input domain is of exponential size. Thus, “strong validity” (i.e., the requirement that the output be at least one honest input) is impossible even if the adversary only controls a tiny fraction of the computational power (cf. Remark 1). Also note that a unanimous start will rarely happen, given that the adversary can collude with clients and send different or conflicting transactions to different parties. Therefore, if we follow the method from Sect. 3.3 (e.g., applying the median or plurality rule) to select the output on the king chain, then as long as the adversary carefully selects his set of transactions, he can always make his input batch be selected as the output. By carefully constructing such transaction batches, the adversary can indefinitely delay the confirmation of any honest transaction \(\texttt{tx}\), even if \(\texttt{tx}\) has been provided to all honest participants.

Proof-of-Work as a Lottery. We now present a new construction that helps prevent the adversarial control described above when parties do not start unanimously. In a nutshell, when a party \(\textsf{P}\) is still “confused” at the end of an iteration (i.e., her internal variable \(\texttt{lock}\) remains \(\textsf{false}\)), \(\textsf{P}\) adopts as her new input the output of the king chain (the first chain), namely, the (valid) input-block with the smallest block hash reported in it. When the honest parties obliviously agree on the king chain (which happens with constant probability), they will refer to the same block. Notice that honest parties make more RO queries than the corrupted parties. The following lemma shows that, with probability (roughly) one half, the input-block with the smallest block hash is produced by an honest party.

Lemma 1

Let \(h = \textsf{poly} (\kappa )\) and \(t = \textsf{poly} (\kappa )\) denote the number of random oracle queries made by honest and corrupted parties, respectively. Under the honest majority assumption \((h > t)\), the probability that the smallest RO output is from an honest query is at least \(1 / 2 - \textsf{negl} (\kappa )\).
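A quick Monte-Carlo sanity check of this lemma: modeling the random oracle as uniform draws, the smallest of \(h + t\) outputs is honest with probability \(h / (h + t)\), which exceeds \(1/2\) whenever \(h > t\). The parameters \(h = 60, t = 40\) below are illustrative.

```python
# Monte-Carlo sanity check of Lemma 1: with h honest and t adversarial RO
# queries (h > t) and an ideal random oracle, the smallest output comes
# from an honest query with probability h / (h + t) > 1/2.

import random

def honest_wins(h, t, rng):
    """One experiment: draw h + t oracle outputs; is the minimum honest?"""
    outs = [(rng.random(), i < h) for i in range(h + t)]
    return min(outs)[1]       # flag of the query with the smallest output

rng = random.Random(7)
h, t, trials = 60, 40, 20000
wins = sum(honest_wins(h, t, rng) for _ in range(trials)) / trials
assert abs(wins - h / (h + t)) < 0.02   # empirically close to 0.6 > 1/2
```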

Fast State Machine Replication. We are now ready to describe our SMR protocol. At a high level, it can be viewed as the sequential composition of Chain-King Consensus, equipped with a new phase-output extraction algorithm, described as follows. When parties extract output in the first and second phase of an iteration, for each chain they output v if a majority of the input-blocks report v; otherwise they output \(\bot \) (in this way, the adversary cannot make parties decide on a batch of transactions that is not an honest input in the first two phases). In the third phase (i.e., when the “confused” parties listen to the king chain), they output the input-block with the smallest hash value. We provide the detailed analysis in the proof of the following theorem in [24].
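The modified extraction rule can be sketched per chain as follows. This is an illustrative Python sketch; representing input-blocks as (hash, batch) pairs is our simplification, not the paper's pseudocode.

```python
# Sketch of the modified phase-output extraction for the SMR protocol.
# `blocks` lists one chain's input-blocks as (hash, batch) pairs; phases
# 1 and 2 use majority-or-⊥, phase 3 uses the minimum-PoW (smallest hash)
# king rule.

from collections import Counter

def smr_phase_output(phase_idx, blocks):
    if phase_idx in (1, 2):
        # Output a batch only if a strict majority of input-blocks agree,
        # so the adversary cannot push through a non-honest batch here.
        tally = Counter(batch for _, batch in blocks)
        batch, count = tally.most_common(1)[0]
        return batch if 2 * count > len(blocks) else None   # None stands for ⊥
    # Phase 3 (king phase): PoW-as-lottery, smallest block hash wins.
    return min(blocks)[1]

blocks = [(0x3a, "B1"), (0x11, "B2"), (0x7c, "B1")]
assert smr_phase_output(1, blocks) == "B1"   # majority batch
assert smr_phase_output(3, blocks) == "B2"   # smallest hash 0x11
```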

Theorem 6

There exist protocol parameterizations such that the sequential composition of Chain-King Consensus with the minimum-PoW king selection rule satisfies Consistency and expected-constant Liveness.

4.2 Bootstrapping from the Genesis Block

In this section, we focus on the observability property of our SMR protocol. Recall from Sect. 3.2 that full agreement on all parallel chains in the previous phase is impossible, and parties that join at a specific phase cannot learn the previous execution by “tracing back” using cross-chain references. Thus, it becomes challenging, or even impossible, for a passive observer to join the protocol in the middle of the execution. To solve this, we slightly modify our Chain-King Consensus protocol and design a bootstrapping algorithm that lets fresh parties synchronize state with all honest parties. Note that a bootstrapping procedure for fresh parties is also an essential building block for protocols that support dynamic participation.

When a fresh party \(\textsf{P} _{\textsf{new}}\) joins, \(\textsf{P} _{\textsf{new}}\) has no knowledge about the protocol execution except for the CRS and global time (recall that we assume synchronous processors). To become synchronized and learn the ledger state, \(\textsf{P} _{\textsf{new}}\) needs to bootstrap by passively listening to the protocol. We highlight that, in order for \(\textsf{P} _{\textsf{new}}\) to synchronize with other honest parties (i.e., achieving phase oblivious agreement), \(\textsf{P} _{\textsf{new}}\) needs to run a bootstrapping procedure which lasts for a constant number of rounds (precisely \(\rho \) rounds).

In order to let fresh parties join the protocol, we modify our Chain-King Consensus protocol as follows. In the i-th phase \((i > 1)\), in addition to the consensus-related input message, parties include the fresh randomness extracted from their local chains in the \((i - 1)\)-th phase. More specifically, they extract the hash of the last block in the output generation stage of each chain in the \((i - 1)\)-th phase of \(\mathbb {C} _{\textsf{local}}\), assemble these hashes into a \(\kappa \)-bit string, and append it to the input-block content. For chains where a typical execution holds, honest parties adopt the same block hash. Then, in the i-th phase, a Crusader Agreement is run on the block hash of each chain in the \((i - 1)\)-th phase (recall from Remark 2 that a single phase suffices to serve as a Crusader Agreement protocol). I.e., for the j-th chain with a typical execution, parties agree on a unique block hash that is the same as in their local \(\mathbb {C} ^{(i - 1)} _j\), and for the other chains, all parties either output the same hash or \(\bot \).
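The randomness-extraction step can be sketched as follows. Compressing the m last-block hashes into a single \(\kappa \)-bit string by hashing their concatenation is our own simplification (the paper may assemble them differently), and SHA-256 is an illustrative stand-in for the protocol's hash function.

```python
# Sketch of the fresh-randomness extraction for phase i: take the hash of
# the last output-generation block on each of the m chains of phase i-1
# and compress them into a single kappa-bit string to append to the
# input-block content. Hashing the concatenation is our simplification.

import hashlib

def fresh_randomness(last_block_hashes, kappa_bytes=32):
    """last_block_hashes: list of m byte-strings, one per parallel chain."""
    h = hashlib.sha256()
    for digest in last_block_hashes:   # order of chains matters
        h.update(digest)
    return h.digest()[:kappa_bytes]    # kappa = 256 bits for kappa_bytes = 32
```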

Thus, when a fresh party \(\textsf{P} _{\textsf{new}}\) joins the protocol, she first passively listens to the protocol for \(\rho \) rounds so that she observes the end of a phase, say phase i. Our chain selection rule guarantees that \(\textsf{P} _{\textsf{new}}\) has parallel chains in phase i that obliviously agree with other honest parties on more than 3m/4 chains (recall Theorem 2). Now, \(\textsf{P} _{\textsf{new}}\) can “trace back” all the chains where typical execution holds by using the fresh randomness included in the current phase, iterating phase by phase. Specifically, when \(\textsf{P} _{\textsf{new}}\) is at the end of phase i, she runs the bootstrapping procedure (see [24] for the complete specification) to extract the hashes of dense chains in the previous phase and uses them to form her local chains \(\mathbb {C} _{\textsf{local}} ^{(i - 1)}\). For instance, consider the j-th chain in the \((i - 1)\)-th phase. If on more than 3m/4 chains in phase i, a majority of the input blocks report fresh randomness that matches a chain \(\mathcal {C} \in \texttt{denseChains} [i - 1][j]\), then \(\textsf{P} _{\textsf{new}}\) will select \(\mathcal {C}\) and add it as the j-th chain in \(\mathbb {C} _{\textsf{local}} ^{(i - 1)}\). If no such chain exists, \(\textsf{P} _{\textsf{new}}\) will randomly pick a chain or just leave it empty.
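The per-slot selection rule can be sketched as follows. The data layout (randomness reports as lists of hash identifiers, candidate dense chains as a dict) is illustrative, not the paper's `denseChains` structure.

```python
# Sketch of the fresh party's selection rule for the j-th chain of phase
# i-1: pick the candidate dense chain that is named, on more than 3m/4 of
# the phase-i chains, by a majority of that chain's input-block reports.

from collections import Counter

def select_chain(reports, dense_chains, m):
    """reports: for each of the m phase-i chains, the list of per-block
    randomness values naming a phase-(i-1) chain hash for this slot.
    dense_chains: hash -> candidate chain object for this slot."""
    support = Counter()
    for chain_reports in reports:
        top, count = Counter(chain_reports).most_common(1)[0]
        if 2 * count > len(chain_reports):   # majority within this chain
            support[top] += 1
    for h, votes in support.items():
        if 4 * votes > 3 * m and h in dense_chains:   # > 3m/4 chains agree
            return dense_chains[h]
    return None   # no candidate: pick arbitrarily or leave the slot empty
```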

Note that the security of both Chain-King Consensus and Crusader Agreement relies only on a consistent view of the chains where typical execution holds; hence, at the end of the joining procedure, \(\textsf{P} _{\textsf{new}}\) achieves phase-oblivious agreement with all honest parties. As a result, \(\textsf{P} _{\textsf{new}}\) can reconstruct the entire execution and update her internal state to build the whole ledger.