1 Introduction

Since the 1980s, significant effort has been devoted to making secure computation protocols practical. This includes novel garbling schemes [4, 5, 16, 47], programming tools [17, 19, 30, 31, 32], and their applications [18, 21, 34, 38, 44, 45]. However, these works are restricted to the passive (honest-but-curious) threat model, which is fairly weak for modeling real-world adversaries; security against active adversaries is often more desirable.

The most practical approach for building actively-secure two-party computation protocols by far is the cut-and-choose paradigm. With cut-and-choose, roughly speaking, one party generates \(\kappa \) garbled circuits where \(\kappa \) depends on the statistical security parameter s; some fraction of those are “checked” by the other party—who aborts if any misbehavior is detected—and the remaining fraction are evaluated with the results being used to derive the final output. A rigorous analysis of the cut-and-choose paradigm was first given by Lindell and Pinkas [27], which required setting \(\kappa \) to roughly 3s, and was later optimized to \(\kappa =s\) [7, 26] since it suffices to have only one honest circuit used for evaluation (hence we call them SingleCut protocols).

For better asymptotic efficiency, Nielsen and Orlandi proposed LEGO [36], which exploited the circuit evaluator’s randomness to group individual NAND gates (as opposed to circuits in the batched-execution setting) to thwart active attacks. This idea evolved into MiniLEGO [11], which is compatible with the free-XOR technique. Their recent independent work [37] provides an implementation of their protocol [12], demonstrating the practical efficiency of the LEGO approach. However, they use homomorphic commitments and the security of their protocol depends on the majority of the gadgets in every bucket being correct. In contrast, our work shows a different construction of LEGO protocols with highly competitive performance.

Researchers have also exploited the idea of batched cut-and-choose (which we call BatchedCut) to efficiently execute a batch of N computational instances of the same function f between the same two parties using possibly different inputs [20, 28, 29]. It was believed that BatchedCut allows reducing \(\kappa \) to \(O(s/\log N)\). However, we will show in Sect. 6.4 that this should really be \(2+O(s/\log N)\) and that 2 is actually a tight bound on the complexity of any BatchedCut protocol.

1.1 Contribution

New Techniques. We propose two new optimizations for constructing efficient LEGO protocols:

  1.

    The main bottleneck of LEGO protocols is wire-soldering, which converts, in a privacy-preserving way, a wire-label of a logical-gate bucket to a wire-label on a garbled gate to enable combining multiple independently garbled gates to realize a logical gate. To achieve high-performance wire-soldering, we introduce a new cryptographic primitive called XOR-homomorphic interactive hash (IHash) to replace the XOR-homomorphic commitments used in prior works. We propose a simple construction of IHash by integrating Reed-Solomon codes, pseudorandom generators (PRG), and a single invocation of a w-out-of-n oblivious transfer protocol (Sect. 4.2). We prove the security of our interactive hash construction (Sect. 4.3). IHash can be a primitive of independent interest, e.g., it may also be used to efficiently solder circuits in other BatchedCut protocols.

  2.

    Using IHash, we are able to improve existing LEGO-based cut-and-choose mechanisms in two more aspects:

    (a)

      Our protocols guarantee security assuming a single correctly garbled gate exists in every bucket. In contrast, existing LEGO-based protocols [11, 12, 36, 37] require majority correctness in every bucket. This enhancement allows us to reduce the number of gadgets in every bucket roughly by half when offering 40-bit statistical security.

    (b)

      We can increase the faulty gate detection rate from 1/4 with previous works [11, 12, 37] to 1/2. At moderate additional cost, we can even detect faulty gates with probability 1. This technique allows us to run cut-and-choose on larger circuit components rather than individual ANDs.

The above optimizations combined simplify not only the construction of LEGO protocols but also the analysis for deriving the cut-and-choose parameters. Due to these benefits, our approach has been adopted to work with pools for building highly scalable reactive secure computation services against active attacks [49]. Independent of this work, Duplo [24] is the first to apply the fully-check technique with homomorphic commitments, while our work shows how the idea works with homomorphic hashes, enabling simpler parameter analysis.

Implementation and Evaluation. We have implemented our protocol and experimentally evaluated its performance with several representative computations. In particular, our protocol exhibits very attractive performance in handling the target function’s input and output wires: 0.57 \(\upmu \)s per garbler’s input-wire and 8.24 \(\upmu \)s per evaluator’s input-wire, and 0.02 \(\upmu \)s per output-wire, which are roughly 24x, 2.4x, and 600x faster than WMK [42]’s highly optimized designs (Fig. 9). Without exploiting parallelism, our protocol is able to execute 105.3M logical XOR gates per second and (when bucket size is 5) 45.5K logical AND per second on commodity hardware (two Amazon EC2 c4.2xlarge instances over LAN). We show, for the first time, that by cut-and-choosing SubBytes, even small applications such as a single AES could run 2x faster and consume 2x less bandwidth than cut-and-choosing ANDs.

Finally, we prove an asymptotically tight bound on the duplication factor \(\kappa \) of BatchedCut protocols (Sect. 6.4). This bound was overlooked in prior works [11, 12, 28, 36].

2 Technical Overview

Notations. We assume \(P_1\) and \(P_2\), holding x and y respectively, want to securely compute a function \(f(x,y)\). We use the standard definition of actively-secure two-party computation [15]. Throughout this paper, we assume \(P_1\) is the circuit generator (who is also the IHash sender) and \(P_2\) is the circuit evaluator (who is also the IHash receiver). For simplicity, we assume that only \(P_2\) will receive the final result \(f(x,y)\). We assume f can be represented as a circuit C containing N AND gates while the rest are all XORs. All vectors in this paper are by default column vectors. We summarize the list of variables in Fig. 1.

Fig. 1. Variables and their meanings.

2.1 LEGO Protocols

LEGO protocols belong to the BatchedCut category of cut-and-choose-based secure computation protocols [50]. For a Boolean circuit C of N logical gates, the high-level steps of a LEGO protocol to compute C are:

  1.

    Generate. \(P_1\) generates a total of T garbled gates.

  2.

    Evaluate. \(P_2\) randomly picks \(B \cdot N\) gates and groups them into N buckets. Each bucket will realize a gate in C. \(P_2\) evaluates every bucket by first translating wire-labels on the bucket’s input-wires to wire-labels on individual garbled gate’s input-wires, evaluating every garbled gate in the bucket, and then translating the obtained wire-labels on the garbled gates’ output-wires back to a wire-label on the bucket’s output-wire. (The wire-label translation, also called wire-soldering, is explained in more detail below.)

  3.

    Check. \(P_2\) checks each of the remaining \(T-BN\) garbled gates for correctness. If any of these gates is found faulty, \(P_2\) aborts. However, due to the randomized nature of the checks, \(P_2\) will not always detect a faulty gate that it checks.

  4.

    Output. \(P_1\) reveals the secret mapping on the circuit’s final output-wires so that \(P_2\) is able to map the final output-wire labels into their logical bit values.
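The selection logic of the four steps above can be sketched as a small simulation. This is only an illustrative model, not the protocol itself: gates are reduced to a faulty/not-faulty flag, the function name `lego_round` and all parameter values are hypothetical, and the "all-faulty bucket" outcome reflects the single-good-gate condition claimed in Sect. 1.1.

```python
import random

def lego_round(T, N, B, num_faulty, detect_rate, rng):
    """Simulate one cut-and-choose round; returns 'abort', 'ok', or 'all-faulty bucket'."""
    gates = [True] * num_faulty + [False] * (T - num_faulty)  # True marks a faulty gate
    rng.shuffle(gates)                    # models P2's random assignment of gates
    evaluated, checked = gates[:B * N], gates[B * N:]
    # Check: a faulty gate among the T - B*N checked gates is caught with prob. detect_rate.
    if any(faulty and rng.random() < detect_rate for faulty in checked):
        return "abort"
    # Evaluate: B gates per bucket, one bucket per logical gate.
    buckets = [evaluated[i * B:(i + 1) * B] for i in range(N)]
    # Assumption from Sect. 1.1: one good gate per bucket suffices for security.
    if any(all(bucket) for bucket in buckets):
        return "all-faulty bucket"
    return "ok"

if __name__ == "__main__":
    rng = random.Random(2017)
    outcomes = [lego_round(T=400, N=100, B=3, num_faulty=6, detect_rate=0.5, rng=rng)
                for _ in range(1000)]
    for name in ("abort", "ok", "all-faulty bucket"):
        print(name, outcomes.count(name))
```

The parameters here are far too small for real security; they only make the three outcomes visible in a quick run.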

The first construction [36] was based on NANDs and required public-key operations for wire-soldering. Frederiksen et al. [11] later proposed a LEGO scheme that is compatible with the notable free-XOR optimization [25] using XOR-Homomorphic commitments as a black box. Under this paradigm, it suffices to assume all the garbled gates are ANDs since all XORs can be securely computed locally and no extra treatment is needed to ensure correct behavior on processing XORs. However, due to the use of the global secret \(\varDelta \) for free-XOR, a garbled AND cannot be fully opened for checking. Instead, a random one of the four possible pairs of inputs to a binary gate is picked to check correctness.

Wire-Soldering. As depicted in Fig. 5, each bucket realizes a logical gate, thus has input and output wires like the logical gate it realizes. In order to evaluate an independently generated garbled gate assigned to a bucket, an input-wire of the bucket (with wire-labels \(w^0_{\mathsf {bucket}}\) and \(w^1_{\mathsf {bucket}}=w^0_{\mathsf {bucket}}\oplus \varDelta \) denoting 0 and 1) needs to be connected to the corresponding input-wire (with wire-labels \(w^0_{\mathsf {gate}}\) and \(w^1_{\mathsf {gate}}=w^0_{\mathsf {gate}}\oplus \varDelta \)) of the garbled gate to evaluate. This is done by requiring \(P_1\) to send \(d=w^0_{\mathsf {bucket}}\oplus w^0_{\mathsf {gate}}\) and \(P_2\) to xor d with the wire-label on the bucket (either \(w^0_{\mathsf {bucket}}\) or \(w^1_{\mathsf {bucket}}\)) he obtained from evaluating the previous bucket. To prevent a malicious \(P_1\) from sending a forged d, existing protocols used XOR-Homomorphic commitments to let \(P_1\) commit \(\varDelta \), \(w^0_{\mathsf {bucket}}\), and \(w^0_{\mathsf {gate}}\) (which allows \(P_2\) to derive the commitment of d homomorphically), so that \(P_2\) can verify the validity of d from its decommitment without learning any extra information about \(w^0_{\mathsf {bucket}}\) and \(w^0_{\mathsf {gate}}\).
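The label translation described above is plain XOR arithmetic, as the following sketch shows. The 32-byte label length and the variable names are assumptions made for illustration, and the commitment (or i-hash) check on d is deliberately left out.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

LABEL_BYTES = 32                                # assumed label length for this sketch
delta = secrets.token_bytes(LABEL_BYTES)        # global free-XOR offset, known to P1 only

w0_bucket = secrets.token_bytes(LABEL_BYTES)    # 0-label on the bucket's input wire
w1_bucket = xor(w0_bucket, delta)               # 1-label on the bucket's input wire
w0_gate = secrets.token_bytes(LABEL_BYTES)      # 0-label on the garbled gate's input wire
w1_gate = xor(w0_gate, delta)                   # 1-label on the garbled gate's input wire

d = xor(w0_bucket, w0_gate)                     # soldering value sent by P1

# P2 holds exactly one bucket label and does not know which bit it encodes.
for held, expected in ((w0_bucket, w0_gate), (w1_bucket, w1_gate)):
    assert xor(held, d) == expected             # same logical value, now on the gate's wire
```

Because both wires share the same \(\varDelta \), a single d translates the 0-label and the 1-label consistently, which is why P2 never needs to learn the logical value.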

2.2 Our Optimizations

Below we sketch the intuition behind our optimizations of LEGO protocols.

XOR-Homomorphic Interactive Hash. XOR-homomorphic Interactive Hash (IHash) is a cryptographic protocol involving two participants, which we call the sender and the receiver, respectively. The design of IHash is directly motivated by the security goals of wire-soldering:

  1.

    Binding. Every i-hash of a secret message uniquely identifies the message with all but a negligible probability, so that the message holder cannot modify a secret message once its i-hash is sent.

  2.

    Hiding. The i-hash receiver does not learn any extra information about the secret message other than the i-hash itself. For a uniform-randomly sampled message, it is guaranteed that certain entropy remains after its i-hash is sent because by definition an i-hash needs to be shorter than the original message.

  3.

    XOR-Homomorphism. Given the i-hashes of two messages \(m_1\) and \(m_2\), the receiver can locally compute the i-hash of \(m_1\oplus m_2\). This enables the receiver (circuit evaluator) to solder wire-labels from independently garbled gates using a verifiable label-difference supplied by the circuit generator.

Unlike a commitment scheme, which requires the committer’s cooperation to match messages with commitments, IHash allows the receiver alone to verify whether a message matches an i-hash (as with traditional hashes). In addition, a commitment hides every bit of its message, whereas an i-hash may leak arbitrary information about its message through the i-hash itself, up to the length of the i-hash. Nevertheless, we find that this somewhat weaker primitive suffices to solder wires in LEGO protocols.

Fig. 2. Interactive Hash based on OT and Reed-Solomon code

Figure 2 illustrates our construction of the XOR-homomorphic IHash scheme. The high-level idea is to let the IHash sender encode his/her secret message m using a \([n, \ell ,n-\ell +1]_{2^\sigma }\) Reed-Solomon code and let the receiver secretly watch (soon we will detail how to watch secretly) w of the n symbols in \(\mathsf {Encode}(m)\). Recall that a \([n, \ell ,n-\ell +1]_{2^\sigma }\)-code is one that takes in an \(\ell \)-symbol message and outputs an n-symbol encoding (where symbols are of \(\sigma \) bits) so that the minimal distance between the codewords is \(n-\ell +1\) symbols. The w watched symbols are the i-hash of m. The receiver can verify that a particular message matches its i-hash by encoding the message and making sure all values at the w watched positions of the encoding coincide with the i-hash it holds. As a result, with respect to binding, if the sender forges a message \(m'\), at least \(n-\ell +1\) symbols of the codeword have to be different, so the forgery goes undetected only if all w watched positions fall among the at most \(\ell -1\) unchanged symbols; hence, the i-hash receiver can detect the forgery with all but (at most) \(\binom{\ell -1}{w}/\binom{n}{w}\) probability. With respect to hiding, although the i-hash reveals \(w\sigma \) bits of entropy in m to the receiver, if the original message has \(\ell \sigma \) bits of entropy, then the remaining \((\ell -w)\sigma \) bits of entropy stay perfectly hidden from the receiver. Therefore, to guarantee hiding, we can set \(\ell \) sufficiently large based on the security parameter. Finally, the additive (i.e., XOR) homomorphic property of i-hashes is inherent in the linearity of Reed-Solomon codes.
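As a quick sanity check of the binding and hiding arguments, the sketch below evaluates them for the wire-label parameters quoted elsewhere in this section (n = 86, \(\ell \) = 32, \(\sigma \) = 8, w = 21). The bound used is the counting argument that two distinct codewords agree on at most \(\ell -1\) positions, so a forgery survives only if all w watched positions land there; the function name is hypothetical.

```python
from math import comb, log2

def binding_failure_bound(n: int, ell: int, w: int) -> float:
    """Upper bound on the chance that a forged message passes all w watched positions."""
    # Two distinct codewords of an [n, ell, n-ell+1] code agree on at most ell-1 symbols.
    return comb(ell - 1, w) / comb(n, w)

n, ell, sigma, w = 86, 32, 8, 21
p = binding_failure_bound(n, ell, w)
print(f"undetected forgery probability is at most 2^{log2(p):.2f}")
print(f"entropy left in the message after i-hashing: {(ell - w) * sigma} bits")
```

With these parameters the bound comes out at roughly \(2^{-40}\), in line with the 40-bit statistical security target mentioned in Sect. 1.1, and 88 bits of entropy remain in each wire-label.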

The “secret watch” above can be realized by a w-out-of-n oblivious transfer protocol. Moreover, we only need to invoke this oblivious transfer once. The key idea is to let the sender pick n random seeds and obliviously transfer w seeds of the receiver’s choice. Later, the sender sends correction messages \(\mathsf {Encode}(m)_i\oplus \mathsf {PRG}( seed _i)\) to the receiver where \(\mathsf {Encode}(m)_i\) is the \(i^\mathrm {th}\) symbol of m’s encoding and \(i\in \{1,\dots ,n\}\). Thus, learning \( seed _i\) allows the receiver to see the corresponding symbols in m’s codeword. We also notice that the input-wire labels of all garbled gates are uniformly random. Therefore, setting the \(i^\mathrm {th}\) symbol of the \(j^\mathrm {th}\) wire-label to be \(\mathsf {PRG}( seed _i,j)\) where \(i\in \{1,\dots ,\ell \}\) while using a systematic code reduces the work to sending only the corrections on the last \(n-\ell \) symbols, i.e., \(\mathsf {Encode}(m)_i\oplus \mathsf {PRG}( seed _i,j)\) for \(i\in \{\ell +1,\dots ,n\}\).
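The construction described above can be prototyped in a few lines. The following is a toy sketch under several stated assumptions rather than the paper's implementation: the oblivious transfer is replaced by directly copying the chosen seeds, SHAKE-128 stands in for the PRG, the Reed-Solomon code is used in non-systematic evaluation form, the field polynomial is the AES one, and all function names are hypothetical.

```python
import hashlib
import secrets

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8+x^4+x^3+x+1 (polynomial choice is an assumption)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def encode(msg: bytes, n: int) -> bytes:
    """Reed-Solomon encoding in evaluation form: msg as a polynomial, evaluated at 1..n."""
    code = []
    for x in range(1, n + 1):
        acc = 0
        for coeff in reversed(msg):          # Horner's rule; msg[0] is the constant term
            acc = gf_mul(acc, x) ^ coeff
        code.append(acc)
    return bytes(code)

def prg(seed: bytes, j: int) -> int:
    """One pseudorandom symbol for the j-th message (SHAKE-128 is a stand-in PRG)."""
    return hashlib.shake_128(seed + j.to_bytes(4, "big")).digest(1)[0]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

N, ELL, W = 86, 32, 21

# Sender: n random seeds. In the protocol, w of them reach the receiver via w-out-of-n OT.
seeds = [secrets.token_bytes(16) for _ in range(N)]
# Receiver: the OT is replaced here by simply copying the seeds at the watched positions.
watch = sorted(secrets.SystemRandom().sample(range(N), W))
known = {i: seeds[i] for i in watch}

def send_corrections(msg: bytes, j: int) -> bytes:
    """Sender side: mask every codeword symbol with the PRG output of its seed."""
    return bytes(c ^ prg(s, j) for c, s in zip(encode(msg, N), seeds))

def ihash(corrections: bytes, j: int) -> bytes:
    """Receiver side: unmask only the watched symbols; these form the i-hash."""
    return bytes(corrections[i] ^ prg(known[i], j) for i in watch)

def verify(h: bytes, msg: bytes) -> bool:
    """Receiver side: re-encode msg locally and compare at the watched positions."""
    code = encode(msg, N)
    return h == bytes(code[i] for i in watch)

m1, m2 = secrets.token_bytes(ELL), secrets.token_bytes(ELL)
h1 = ihash(send_corrections(m1, 1), 1)
h2 = ihash(send_corrections(m2, 2), 2)
forged = bytes([m1[0] ^ 1]) + m1[1:]         # changes the constant term only
print(verify(h1, m1), verify(h1, forged), verify(xor(h1, h2), xor(m1, m2)))
# prints: True False True
```

The last check exercises the XOR-homomorphism: the receiver combines two i-hashes locally and can then verify the XOR of the two messages without further interaction.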

Fast Wire-Soldering. Wire-soldering is one of the most challenging efficiency barriers in LEGO protocols. Recall that \(P_1\) garbles all the AND gates independently. Thus, in the circuit evaluation phase where B random AND gates are grouped into a bucket to evaluate a logical AND gate, \(P_2\) needs to “translate” an input wire-label of a bucket to its corresponding input wire-label on a garbled gate in the bucket. To this end, we require, at the garbling stage, that, for every garbled gate, \(P_1\) i-hash one wire-label on every wire of the garbled gate to \(P_2\); and, at the gate evaluation stage, that \(P_1\) send the xor-differences between every pair of the source and target wire-labels. The validity of the xor-differences can be verified against their i-hashes. Note that even if an i-hash leaks entropy, we can increase the length of the wire-labels to ensure that enough entropy remains in the labels to guarantee the needed computational security.

Moreover, for benefits that will be clear soon, we require \(P_1\) also to i-hash the global \(\varDelta \) (required by the free-XOR technique) to \(P_2\). Recall that all the wire-labels at the bucket level also need to be i-hashed to \(P_2\). To prevent \(P_2\) from learning logical values of the intermediate wire-labels, \(P_1\) will i-hash either the 0-label or the 1-label of each wire with equal probability. Without extra treatment, however, this will allow a malicious \(P_1\) to surreptitiously flip a wire-label’s logical value. We fix this issue by adding a random permutation message \(\rho \) to each wire and use \(\rho \)’s parity bit to bind the plaintext bit the i-hashed wire-label represents. For integrity of \(\rho \), we require \(P_1\) to i-hash \(\rho \) to \(P_2\) so that \(P_2\) can verify the \(\rho \) values of each wire at the garbled gate checking stage. We stress that the value of \(\rho \) for all intermediate wires will never be revealed.

To achieve fast wire-soldering for practically efficient LEGO protocols, we found the following two optimizations indispensable.

  1.

    The Reed-Solomon encoding process can be viewed as multiplying the public encoding matrix \(A_{n\times \ell }=[a_{i,j}]_{1\le i \le n, 1\le j \le \ell }\) with the message vector \(\varvec{m}_{\ell \times 1}=[m_1,\dots ,m_\ell ]\) where \(m_i\)’s are \(\sigma \)-bit symbols. To ensure security of LEGO protocols, the \((n, \ell , \sigma )\) values would be \((n_w, \ell _w, \sigma _w)=(86, 32, 8)\) for wire-labels and \((n_p, \ell _p, \sigma _p)=(44, 20, 6)\) for permutation messages. A naïve implementation of the encoding process will require more than 2700 Galois Field (GF) multiplications per wire-label and 900 GF multiplications per permutation message, which amounts to more than 10K multiplications per garbled gate. Even if field multiplications are realized as table-lookups, 10K memory accesses per gate is already 40\(\times \) slower than AESNI-based garbling itself, making the LEGO approach noncompetitive in practice.

    Our key idea to speed up the encoding process is to pack many symbols into operands (e.g., \({\texttt {\_\,\_m128, \_\,\_m256, \_\,\_m512}}\)) of vector instructions and leverage Intel intrinsics [1] to enable efficient message encoding. Below we illustrate the idea with an example where \(n=96, \ell =32, \sigma =8\), i.e., encoding a 32-symbol message \(\varvec{m}\) into a 96-symbol codeword where symbols are 8 bits each. First, we use a systematic code so that it suffices to compute the last \(n-\ell =64\) symbols of the codeword since the first \(\ell =32\) symbols are identical to the original message. Thus, we can restrict our attention to the last 64 rows of the encoding matrix A, call it \(A'\). Let \(\varvec{a}_{\cdot ,1}=[a_{33,1},\dots ,a_{96,1}]^T\) be the first column of the matrix \(A'\); we can store the 64 symbols of \(\varvec{a}_{\cdot ,1}\) in a single \({\texttt {\_\,\_m512}}\) register. Let \(m_i\) be the \(i^{ th }\) symbol of \(\varvec{m}\). The last 64 symbols in the encoding of \(\varvec{m}\) can be computed as \(\varvec{c}=\sum _{i=1}^{\ell }m_i \varvec{a}_{\cdot ,i}=\sum _{i=1}^{\ell }[m_ia_{33,i},\dots ,m_ia_{96,i}]^T\). Since \(m_i\in \mathrm {GF}(2^8)\) and all column vectors \(\varvec{a}_{\cdot ,i}\) are publicly fixed, each \(m_i \varvec{a}_{\cdot ,i}\) can thus be efficiently derived with a single lookup into a table of 256 entries of \({\texttt {\_\,\_m512}}\) values and the sum can be computed with \(\ell -1=31\) \({\texttt {\_\,\_mm512\_xor}}\) instructions. This optimization would reduce \(32\times 96=3072\) field multiplications per encoding down to just 32 memory reads, about one hundredth of the cost of the implementation based on naïve table-lookups.
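    The table-lookup idea can be mimicked without vector hardware by letting a Python integer play the role of a 512-bit register. The sketch below is only a model of the technique: the matrix is a random stand-in rather than an actual Reed-Solomon parity matrix, the field polynomial is assumed, and the names `TABLE`, `naive_parity`, and `table_parity` are hypothetical.

```python
import random

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8+x^4+x^3+x+1 (polynomial choice is an assumption)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

MUL = [[gf_mul(a, b) for b in range(256)] for a in range(256)]  # scalar product table

ROWS, ELL = 64, 32                 # last n - ell = 64 rows, ell = 32 message symbols
rng = random.Random(0)
A = [[rng.randrange(256) for _ in range(ELL)] for _ in range(ROWS)]  # stand-in for A'

def naive_parity(m: bytes) -> bytes:
    """Row-by-row matrix-vector product: ROWS * ELL field multiplications."""
    out = []
    for row in A:
        acc = 0
        for a, s in zip(row, m):
            acc ^= MUL[a][s]
        out.append(acc)
    return bytes(out)

# TABLE[i][v] packs the 64 products v * a_{r,i} into one 512-bit integer,
# playing the role of one precomputed __m512 value per (column, symbol) pair.
TABLE = [[int.from_bytes(bytes(MUL[v][A[r][i]] for r in range(ROWS)), "big")
          for v in range(256)]
         for i in range(ELL)]

def table_parity(m: bytes) -> bytes:
    """One lookup per message symbol, combined with wide XORs."""
    acc = 0
    for i, s in enumerate(m):
        acc ^= TABLE[i][s]
    return acc.to_bytes(ROWS, "big")

m = bytes(rng.randrange(256) for _ in range(ELL))
assert table_parity(m) == naive_parity(m)
```

    The precomputation costs 32 × 256 wide entries (512 KiB at 64 bytes each), which is the memory traded for replacing per-symbol field multiplications with one lookup per message symbol.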

  2.

    We observe that, although the hiding of i-hashes for wire-labels is computational (since an adversary could use the garbled truth table to search for the “right” label offline), the hiding on the permutation message \(\rho \) is perfect because no additional constraints are provided to allow offline search for the permutation bit (i.e., the parity of \(\rho \)). Thus, it suffices to require that only 1 bit of entropy remain in \(\rho \) after i-hashing each \(\rho \) to perfectly hide which wire-label on a wire was i-hashed. This observation allows us to select much more efficient parameters to i-hash \(\rho \), i.e., \((n_p, \ell _p, \sigma _p)=(44, 20, 6)\) as opposed to (86, 32, 8) for i-hashing wire-labels.

Increased Faulty Gate Detection Rate. In existing protocols [11, 12, 37], a faulty gate being checked will only be detected with probability 1/4. This is because the garbled gates are produced with respect to the global secret \(\varDelta \) (the xor-difference between a 0-label and its 1-label) required by the free-XOR technique, hence only one out of the four garbled rows of a binary gate can be opened for checking. In contrast, our protocol allows a faulty gate to be detected with probability 1/2. We achieve this by integrating the Half-Gate garbling technique [47], which requires only two garbled rows per gate. Since each check opens one of the two garbled rows, this allows detecting a faulty gate with probability 50% without revealing \(\varDelta \), which is formally proved in Lemma 5.3.

To detect faulty gates with 100% probability, the idea is to allow a garbled gate to be fully opened at gate-checking time. This requires garbling each gate with respect to a freshly sampled \(\varDelta \) and sending the i-hash of this \(\varDelta \) with the gate. To solder two wires garbled with different \(\varDelta \) values, additional verifiable XOR messages are also needed. We detail this special soldering procedure at the end of Sect. 5.1.

Dealing with Faulty Gates Used for Evaluation. Our protocol is able to guarantee security as long as a single correctly garbled gate exists in every bucket. This improvement is due to a combination of the IHash and the free-XOR techniques. Denote the i-hash of a message \(m\in \{0,1\}^*\) by \({\left\langle m\right\rangle }\). We let the circuit evaluator learn \({\left\langle \varDelta \right\rangle }\) where \(\varDelta \) is the global secret. On each wire, the evaluator also learns an i-hash \({\left\langle w\right\rangle }\) where w defines either the 0-label or the 1-label on that wire (but the evaluator doesn’t know which). Recall that the evaluator can locally verify the validity of a wire-label using the IHash’s \(\mathsf {Verify}\) algorithm. Therefore, if at least one gate in a bucket is good, evaluating all the garbled gates in the bucket will give one or more valid output wire-labels. When translating these wire-labels to the bucket output wire-label, one of the following two cases has to happen:

  1.

    They all match with the same i-hash, either \({\left\langle w\right\rangle }\) or \({\left\langle w\right\rangle }\oplus {\left\langle \varDelta \right\rangle }\). Since all valid wire-labels are consistent on the plaintext bit they represent and one of them is known to be correct, the evaluator can directly proceed with this valid wire-label to evaluate the subsequent buckets.

  2.

    Some of them translate to \({\left\langle w\right\rangle }\) whereas others translate to \({\left\langle w\right\rangle }\oplus {\left\langle \varDelta \right\rangle }\). In this case, the evaluator can simply xor the two valid labels to recover \(\varDelta \). Once \(\varDelta \) is known, the evaluator can use it to recover the circuit generator’s private input x and locally compute and output \(f(x,y)\).

Hence, in either case our protocol can be proved secure.
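The two cases can be written down as a small decision procedure. In this sketch the IHash \(\mathsf {Verify}\) calls against \({\left\langle w\right\rangle }\) and \({\left\langle w\right\rangle }\oplus {\left\langle \varDelta \right\rangle }\) are replaced by plain equality predicates, which is an idealization for illustration; `resolve_bucket` and the predicate names are hypothetical.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def resolve_bucket(outputs, matches_w, matches_w_xor_delta):
    """Decide what P2 does with the translated output labels of one bucket."""
    as_w = [o for o in outputs if matches_w(o)]
    as_wd = [o for o in outputs if matches_w_xor_delta(o)]
    if as_w and as_wd:
        # Case 2: two valid labels with opposite logical values reveal delta.
        return "delta", xor(as_w[0], as_wd[0])
    valid = as_w or as_wd
    if valid:
        # Case 1: all valid labels agree, and one of them came from a good gate.
        return "label", valid[0]
    raise ValueError("no valid label: the bucket contained no correctly garbled gate")

delta = secrets.token_bytes(32)
w = secrets.token_bytes(32)
garbage = secrets.token_bytes(32)                 # output of a faulty gate, fails Verify
is_w = lambda o: o == w                           # stand-in for Verify(<w>, o)
is_wd = lambda o: o == xor(w, delta)              # stand-in for Verify(<w> xor <delta>, o)

print(resolve_bucket([w, garbage, w], is_w, is_wd)[0])               # prints: label
print(resolve_bucket([w, garbage, xor(w, delta)], is_w, is_wd)[0])   # prints: delta
```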

Fig. 3. Garbling with 256-bit wire-labels. AES (\(\cdot \)) denotes calling AES with a fixed, publicly-known key. \(\mathsf {Compress}(m)\) essentially computes \(A'm\) where \(A'\) is a rank-16, \(16\times 32\) matrix over GF\((2^8)\). \(A'\) is randomly picked by the circuit generator after the evaluator chose its watch symbols.

Entropy Extraction for Efficient Garbling. With fixed-key AESNI instructions, the state-of-the-art garbling technique is able to produce 20 million garbled rows per second, which is about 10\(\times \) faster than a SHA256-based garbling scheme [4, 33, 47]. However, the wire-labels in our protocol need to be longer than 128 bits to ensure enough entropy (e.g., more than 80 bits) remains even when part of a wire-label is leaked to the evaluator through its i-hash. Although it is straightforward to use SHA256 to implement garbling to accommodate longer wire-labels in the random oracle model, a priori, it is unclear how this can be efficiently realized assuming only that fixed-key AES is an ideal cipher.

Our intuition is to compress a longer wire-label down to a 128-bit label while preserving as much entropy as possible, and then run existing fixed-key AES-based garbling with the compressed labels. As a concrete example, wire-labels in our protocol are 256-bit (i.e., 32 symbols of 8 bits each). During i-hashing, a 32-symbol wire-label will be encoded into an 86-symbol codeword; and the evaluator randomly picks 21 of the 86 symbols in the codeword to watch. Since the “watch” reveals \(8 \times 21=168\) bits of entropy, 88 bits of entropy remain in each wire-label. To compress a 256-bit wire-label m to a 128-bit \(m'\) while carrying over the entropy, our strategy is to randomly sample a \(16\times 32\), rank-16 matrix \(A'=[a'_{i,j}]\) where \(a'_{i,j}\in \mathrm {GF}(2^8)\) and compute \(m'=A'm\) (in other words, \(A'\) represents a set of 16 linearly independent row-vectors in the vector space \(\mathrm {GF}(2^8)^{32}\)). Note that \(A'\) is sampled only after the evaluator has chosen its watched symbols. Intuitively, this compression preserves entropy because the chances are extremely low that any one of the 16 row-vectors happens to be a linear combination of a set of 21 row-vectors of \(\mathrm {GF}(2^8)^{32}\) that are picked randomly and independently by the evaluator. We present a formal analysis of the entropy loss in Sect. 6.2.
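A minimal sketch of the compression step follows, assuming the AES field polynomial for GF\((2^8)\) and using hypothetical helper names (`gf_inv`, `rank`, `sample_compress_matrix`, `compress`). It samples a random \(16\times 32\) matrix, rejects it unless it has rank 16, and applies it to a 32-symbol label; it does not model the timing constraint that the matrix is sampled after the watch positions are fixed.

```python
import secrets

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8+x^4+x^3+x+1 (polynomial choice is an assumption)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a: int) -> int:
    """Inverse of a nonzero field element, computed as a^254."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def rank(matrix) -> int:
    """Rank over GF(2^8) by Gauss-Jordan elimination on a copy of the matrix."""
    rows = [list(r) for r in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        inv = gf_inv(rows[rk][col])
        rows[rk] = [gf_mul(inv, x) for x in rows[rk]]
        for i in range(len(rows)):
            if i != rk and rows[i][col]:
                f = rows[i][col]
                rows[i] = [x ^ gf_mul(f, y) for x, y in zip(rows[i], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk

def sample_compress_matrix(n_rows: int = 16, n_cols: int = 32):
    """Rejection-sample a full-row-rank matrix; a random one almost always qualifies."""
    while True:
        A = [[secrets.randbelow(256) for _ in range(n_cols)] for _ in range(n_rows)]
        if rank(A) == n_rows:
            return A

def compress(A, m: bytes) -> bytes:
    """Compute m' = A m over GF(2^8): 32 symbols in, 16 symbols (128 bits) out."""
    out = []
    for row in A:
        acc = 0
        for a, s in zip(row, m):
            acc ^= gf_mul(a, s)
        out.append(acc)
    return bytes(out)

A_prime = sample_compress_matrix()
label = secrets.token_bytes(32)              # 256-bit wire-label
short = compress(A_prime, label)             # 128-bit label fed to AES-based garbling
print(len(short) * 8, "bits;", 256 - 8 * 21, "bits of entropy were left before compression")
```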

Given this \(\mathsf {Compress}\) algorithm (i.e., essentially a matrix multiplication as described above), we can formalize our garbling scheme based on that of Half-Gates [47]. The main difference lies in the function H. Our H, specified in Fig. 3, maps \(\{0,1\}^{256}\times \mathbb {N}\) to \(\{0,1\}^{256}\) and involves two calls to a fixed-key AES cipher to produce a 256-bit pseudorandom mask to encrypt a longer output wire-label.

2.3 Related Work

TinyLEGO. As an independent and concurrent work, TinyLEGO [12, 37] explored ways to improve LEGO protocols in the single-execution setting. To save on wire-soldering, Duplo [24] also explored checking multi-gate gadgets. However, our approach is different in several aspects:

  1.

    To solder the wires, previous work used additively homomorphic commitments, whereas we propose the notion of IHash and give a highly efficient construction of IHash using PRG, Reed-Solomon code and Intel Intrinsics [1]. We show that, despite IHash being leakier than homomorphic commitments, it suffices for constructing efficient LEGO protocols. Our construction of IHash shares some similarity with that of XOR-Homomorphic commitments in [13]. However, the two schemes differ in the way OTs are used, the selection of error correcting codes, and the way to pick critical protocol parameters.

  2.

    TinyLEGO involves cut-and-choosing two types of garbled gadgets (i.e., ANDs and wire-authenticators) and requires a correct majority in the total number of garbled gadgets in each bucket, whereas our protocol only uses garbled ANDs and the security holds as long as a single correctly garbled AND exists in each bucket. In addition, the faulty gate detection rate in our protocol is twice that in TinyLEGO.

  3.

    Because of our optimizations, our protocol can run more than 2x faster in a LAN and be highly competitive over a WAN. However, their protocols are about 20–50 more efficient in bandwidth thus would be advantageous in some bandwidth-stringent network environments. See Sect. 7 for detailed performance comparisons.

NNOB [35] and SPDZ [10]. Both NNOB and SPDZ require a linear number of rounds and expensive pre-processing, whereas our protocol needs only a small constant number of rounds and a lightweight setup. Thus, ours performs better when the network latency cannot be ignored, or when resources are limited in the preparation phase.

Lindell-Riva [29] and Rindal-Rosulek [41]. In the offline/online setting, Lindell-Riva and Rindal-Rosulek provided very efficient prototypes of secure computation protocols. Although the high-level idea of cut-and-choose resembles that of LEGO, their results are not applicable if only one (or just very few) executions are needed. For example, with [29], one AES can be computed in about 74 ms amortized time assuming a 75,000 ms offline delay is acceptable for preparing 1024 executions. Since wire-soldering is much less of an issue in their setting, their techniques would be far less efficient when applied to cut-and-choosing individual gates.

Wang-Malozemoff-Katz. Wang et al. [42] recently designed and implemented by far the most efficient SingleCut secure computation protocol. Thanks to a (mostly) symmetric-key cryptography based garbler’s input consistency enforcement mechanism and their careful use of SSE instructions for preventing selective failure attacks, a single AES instance can be computed in 65 ms (with input/output wires processed at roughly 20 \(\upmu \)s per wire). However, even compared with their optimized protocol, processing input/output wires in our protocol can still be much faster, hence our protocol will be competitive in computing shallow circuits with many input/output wires (see Sect. 7 for detailed performance comparisons). Moreover, due to the advantages of LEGO protocols in supporting actively secure RAM-based secure computations, it would be interesting future work to develop better homomorphic hash constructions and plug them into our framework to obtain improved LEGO protocols.

Wang-Ranellucci-Katz. Wang et al. [43] proposed a secure two-party computation scheme based on collaborative garbling over authenticated multiplicative triples. Their approach would be advantageous in speed when compared to state-of-the-art LEGO protocols [37]. However, the ideas of this work can be extended to allow fully opening a garbled gate at the verification stage and applied at the circuit level to produce protocols that are more efficient in bandwidth. Hence, our framework is still interesting in designing efficient protocols in low-bandwidth settings.

Parallelism. Noting the embarrassingly parallelizable nature of the protocols in this domain (including ours), we follow the convention of many existing works [42, 47] to restrict our attention to the single-threaded model and treat computation as an energy-consuming scarce resource.

3 Preliminaries

3.1 Oblivious Transfer

We use 1-out-of-2 oblivious transfers to send wire labels corresponding to the evaluator’s input, and two k-out-of-n oblivious transfers for wire soldering. A k-out-of-n oblivious transfer protocol takes n messages \(m_1,\dots ,m_n\) from the sender and a set of k indices \(i_1,\dots ,i_k\in [1,n]\) from the receiver, and outputs nothing but the k messages \(m_{i_1},\dots ,m_{i_k}\) to the receiver. Composably secure 1-out-of-2 OT can be efficiently instantiated from dual-mode cryptosystems [39] and efficiently extended [2, 23] from inexpensive symmetric operations plus a constant number of base OTs. Camenisch et al. [8] proposed an efficient and simulatable k-out-of-n OT in the Random Oracle Model.
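The input/output behaviour of k-out-of-n OT described above can be pinned down by its ideal functionality, i.e., what a trusted third party would do. The sketch below is only that interface, not a secure protocol such as the one in [8]; the function name is hypothetical and indices are 1-based to match the text.

```python
def k_out_of_n_ot(messages, indices):
    """Ideal functionality: hand the receiver exactly the messages at the chosen indices.

    The sender learns nothing about `indices`; the receiver learns nothing about the
    other n - k messages. Both guarantees hold here only because a trusted party
    (this function) sits between the two inputs.
    """
    n = len(messages)
    if len(set(indices)) != len(indices):
        raise ValueError("indices must be distinct")
    if not all(1 <= i <= n for i in indices):
        raise ValueError("indices must lie in [1, n]")
    return [messages[i - 1] for i in indices]

# Example: a 2-out-of-4 transfer.
print(k_out_of_n_ot(["m1", "m2", "m3", "m4"], [2, 4]))   # prints: ['m2', 'm4']
```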

3.2 Garbled Circuits

First proposed by Yao [46], garbled circuits were later formalized as a cryptographic primitive of independent interest [5]. Bellare et al. have carved out three security notions for garbling: privacy, obliviousness, and authenticity. We refer readers to their paper for the formal definitions. In the past few years, many optimizations have been proposed to improve various aspects of garbled circuits, such as bandwidth [40, 47], evaluator’s computation [40], memory consumption [19], and using dedicated hardware [5]. Our protocol leverages the Half-Gates garbling recently proposed by Zahur et al. [47], which satisfies the simulation-based definitions of privacy, obliviousness, and authenticity under a circular correlation robustness assumption on the hash function H. We summarize their garbling algorithms \(\mathsf{\small GenAND}\) and \(\mathsf{\small EvlAND}\) in Fig. 3.

More formally, a garbling scheme \(\mathcal {G}\) is a 5-tuple \((\mathsf {Gb},\mathsf {En},\mathsf {Ev},\mathsf {De}, f)\) of algorithms, where \(\mathsf {Gb}\) is an efficient randomized garbler that, on input \((1^k,f)\), outputs \((F,e,d)\); \(\mathsf {En}\) is an encoder that, on input \((e,x)\), outputs X; \(\mathsf {Ev}\) is an evaluator that, on input \((F,X)\), outputs Y; \(\mathsf {De}\) is a decoder that, on input \((d,Y)\), outputs y. The correctness of \(\mathcal {G}\) requires that for every \((F,e,d)\leftarrow \mathsf {Gb}(1^k,f)\) and every x,

$$\begin{aligned} \mathsf {De}(d, \mathsf {Ev}(F,\mathsf {En}(e,x))) = f(x). \end{aligned}$$

Let \(\varPhi \) be a fixed function modeling the acceptable information leakage, and let “\(\approx \)” denote computational indistinguishability. Privacy of \(\mathcal {G}\) implies that there exists an efficient simulator \(\mathcal {S}\) such that for every x,

$$\begin{aligned} \left\{ (F,X,d) : \begin{array}{l} (F,e,d)\leftarrow \mathsf {Gb}(1^k,f),\\ X\leftarrow \mathsf {En}(e,x). \end{array}\right\} \approx \{\mathcal {S}(1^k, f, \varPhi (f))\}. \end{aligned}$$

Obliviousness of \(\mathcal {G}\) implies that there exists an efficient simulator \(\mathcal {S}\) such that for every x,

$$\begin{aligned} \{(F,e,d)\leftarrow \mathsf {Gb}(1^k,f), X\leftarrow \mathsf {En}(e,x):(F,X)\}\approx \{\mathcal {S}(1^k,f)\}. \end{aligned}$$
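To make the \((\mathsf {Gb},\mathsf {En},\mathsf {Ev},\mathsf {De})\) syntax and the correctness condition concrete, the following is a minimal sketch that garbles a single two-input gate in the textbook Yao style. It is not the Half-Gates scheme used in our protocol; SHA-256 as the row-encryption hash, the 16-byte labels, and the all-zero tag used to recognize the right row are assumptions made only for this illustration.

```python
import hashlib
import secrets
from itertools import product

K = 16  # label length in bytes (assumption for this sketch)

def H(a, b):
    return hashlib.sha256(a + b).digest()  # 32-byte pad

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def Gb(f):
    """Garble one gate f: {0,1}^2 -> {0,1}; returns (F, e, d)."""
    lab = {w: (secrets.token_bytes(K), secrets.token_bytes(K)) for w in "lro"}
    rows = []
    for a, b in product((0, 1), repeat=2):
        plain = lab["o"][f(a, b)] + bytes(K)       # output label || zero tag
        rows.append(xor(H(lab["l"][a], lab["r"][b]), plain))
    secrets.SystemRandom().shuffle(rows)           # hide the row order
    return rows, (lab["l"], lab["r"]), lab["o"]

def En(e, x):
    """Encode input bits x = (a, b) into one label per input wire."""
    return e[0][x[0]], e[1][x[1]]

def Ev(F, X):
    """Trial-decrypt every row; the row with a zero tag carries the output label."""
    pad = H(X[0], X[1])
    for row in F:
        plain = xor(pad, row)
        if plain[K:] == bytes(K):
            return plain[:K]
    raise ValueError("no row decrypted correctly")

def De(d, Y):
    """Decode an output label back to a bit."""
    return d.index(Y)

F, e, d = Gb(lambda a, b: a & b)
outputs = {x: De(d, Ev(F, En(e, x))) for x in product((0, 1), repeat=2)}
```

Here `outputs` maps every input pair to the decoded result, which by the correctness condition should agree with the AND of the two bits.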

4 Homomorphic Interactive Hash

In this section, we describe XOR-homomorphic interactive hash, a new primitive that enables multiple enhancements in our LEGO protocol.

4.1 Definition

An interactive hash involves two parties, the sender (\(P_1\)) and the receiver (\(P_2\)), who jointly compute an interactive hash (i-hash) of the sender’s message such that the receiver can later locally verify a message against an i-hash that it holds. The ideal functionality of \({\mathcal{F}_{\textsc {XorIHash}}}\) is described in Fig. 4, where \(\mathsf {Hash}\) is an efficient two-party probabilistic algorithm that takes a message m from \(P_1\) and outputs an i-hash of m (denoted as \({\left\langle m\right\rangle }\)) to \(P_2\) without revealing any additional information to either party; and \(\mathsf {Verify}\) is an efficient algorithm (locally computable by \(P_2\)) that takes an i-hash \({\left\langle m\right\rangle }\) and a message \(m'\) and outputs a bit b indicating whether \(m=m'\). Like conventional hashes, we require \(|{\left\langle m\right\rangle }|<|m|\) and that for any two distinct messages \(m_1\) and \(m_2\), \({\left\langle m_1\right\rangle }\not ={\left\langle m_2\right\rangle }\) except with negligible probability. Finally, we require the hashes to be XOR-homomorphic, i.e., \({\left\langle m_1\right\rangle }\oplus {\left\langle m_2\right\rangle }={\left\langle m_1\oplus m_2\right\rangle }\).

Fig. 4. Ideal XOR-homomorphic interactive hashes. (\(P_1\) is the hash sender and \(P_2\) the hash receiver. “Send a delayed output x to party P” reflects a standard treatment of fairness, i.e., “send (x, P) to the adversary; when receiving ok from the adversary, output x to P.”)

Note that Fig. 4 actually describes a family of ideal functionalities for \({\mathcal{F}_{\textsc {XorIHash}}}\), as it leaves the exact definition of \({\left\langle m_i\right\rangle }\) unspecified (other than requiring \(|{\left\langle m_i\right\rangle }|<|m_i|\)). Along with a specific definition of \({\left\langle m_i\right\rangle }\), the ideal functionality defined in Fig. 4 will yield a concrete XOR-homomorphic interactive hash scheme. For example, our construction given in Sect. 4.2 realizes a concrete version of IHash in which \({\left\langle m_i\right\rangle }\) is defined as \(m'_i * v\), where \(m'_i\) is the Reed-Solomon encoding of \(m_i\), v is a binary vector supplied by \(P_2\) containing exactly w 1-bits, and “\(*\)” denotes the entry-wise multiplication of two equal-length vectors, leaving out all entries (in the product vector) that correspond to the 0-entries of v.

Interactive hash offers certain “hiding” and “binding” properties. That is, with all but negligible probability, the receiver of \({\left\langle m\right\rangle }\) learns nothing about m except for what can be efficiently computed from \({\left\langle m\right\rangle }\); and the sender of \({\left\langle m\right\rangle }\) can’t claim a different message \(m'\) to be the preimage of \({\left\langle m\right\rangle }\). However, unlike cryptographic commitments, with IHash, (1) some entropy in m can be leaked to the receiver yet the rest remains; (2) the message owner can’t compute the hash on its own; and (3) the hash receiver can verify on its own whether a message matches with a hash.
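As a rough illustration of the definition, the sketch below computes an i-hash as the w watched symbols of a Reed-Solomon codeword and checks the XOR-homomorphism. It ignores the interactive (oblivious) part entirely. The evaluation-form, non-systematic encoding, the field \(\mathrm {GF}(2^8)\) with reduction polynomial 0x11B, and the toy sizes are assumptions chosen for brevity, not the parameters of our protocol.

```python
import secrets

def gf_mul(a, b, poly=0x11B):
    """Multiply two elements of GF(2^8); the polynomial 0x11B is an assumption."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def rs_encode(msg, n):
    """Evaluation-form Reed-Solomon: symbol i is the message polynomial at x = i + 1."""
    out = []
    for x in range(1, n + 1):
        acc = 0
        for coeff in reversed(msg):          # Horner's rule over GF(2^8)
            acc = gf_mul(acc, x) ^ coeff
        out.append(acc)
    return out

def ihash(msg, n, watch):
    """The i-hash keeps only the codeword symbols at the watched positions."""
    code = rs_encode(msg, n)
    return [code[i] for i in watch]

def verify(h, msg, n, watch):
    """Local check by the receiver: re-encode and compare watched symbols."""
    return h == ihash(msg, n, watch)

ELL, N, W = 8, 20, 5                          # toy sizes with W < ELL < N
watch = sorted(secrets.SystemRandom().sample(range(N), W))
m1 = [secrets.randbelow(256) for _ in range(ELL)]
m2 = [secrets.randbelow(256) for _ in range(ELL)]
h1, h2 = ihash(m1, N, watch), ihash(m2, N, watch)
h_sum = [a ^ b for a, b in zip(h1, h2)]       # <m1> xor <m2>
m_sum = [a ^ b for a, b in zip(m1, m2)]       # m1 xor m2
```

Because the encoding is linear over \(\mathrm {GF}(2^8)\), `h_sum` verifies against `m_sum`, and each hash has W symbols while each message has ELL symbols.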

4.2 Construction

Figure 2 illustrates the high-level idea behind our construction. Let \(\mathrm {OT}^{w}_{n}\) be an ideal functionality for a w-out-of-n oblivious transfer. \(\mathsf {Encode}_{\ell ,n,d}(\cdot )\) denotes the encoding algorithm of an \([n,\ell ,n-\ell +1]_{2^{\sigma }}\) systematic Reed-Solomon code, i.e., (over \(\sigma \)-bit symbols) \(\ell \)-symbol messages are encoded into n-symbol codewords with a minimum distance of \(n-\ell +1\) symbols. Let \(\mathsf {PRG}\) be a pseudorandom generator, and let s and k be the statistical and computational security parameters, respectively.

To allow the receiver to obliviously watch the same set of w positions on every message’s n-symbol codeword without invoking an OT instance per message, we let the sender generate n secret seeds and invoke a w-out-of-n OT only once, so that the receiver learns w of these seeds (Step 1 of \(\mathsf {IHash}_{}\mathsf {.Setup}\)). These seeds are then used as keys to a \(\mathsf {PRG}\) to create n rows of pseudorandom symbols, of which the receiver is able to recover w rows. When a message m is ready to be i-hashed, the sender simply encodes m and sends the xor-difference between m’s codeword and the next column of n pseudorandom symbols generated from the seeds (Step 1b of \(\mathsf {IHash}_{}\mathsf {.Hash}\)), so that the receiver can record the symbols for which it watched the corresponding keys.

To obtain active security, our protocol actually generates \(\nu +\xi \) i-hashes when i-hashing \(\nu \) messages, then uses the extra \(\xi \) i-hashes to verify that the sender followed the protocol honestly (Step 2 of \(\mathsf {IHash}_{}\mathsf {.Hash}\)). In our protocol, we set \(\xi \) so that the failure probability of our simulator \(\mathcal {S}\) in the security proof of Theorem 4.1 is bounded by \(2^{-s}\).

The detailed construction steps are as follows.

  • \(\mathsf {IHash}_{\ell ,\sigma }\mathsf {.Setup}(\{ seed _1,\dots , seed _n\};\ \{i_1,\dots , i_w\})\)

    Note \(\{ seed _1,\dots , seed _n\}\) is \(P_1\)’s secret input and \(\{i_1,\dots , i_w\}\) is \(P_2\)’s secret input.

    1. 1.

      \(P_1\) and \(P_2\) run \(\mathrm {OT}^{w}_{n}\) where \(P_1\) is the sender with inputs \( seed _1, \dots , seed _n\), and \(P_2\) is the receiver with inputs \(i_1, \dots , i_w\). At the end of this step, \(P_2\) learns \( seed _{i_1},\dots , seed _{i_w}\).

  • \(\mathsf {IHash}_{\ell ,\sigma }\mathsf {.Hash}(\varvec{m}_1,\dots ,\varvec{m}_\nu )\)

    1. 1.

      For \(t=1,\dots , \nu + \xi \),

      1. (a)

        For \(1\le i\le \ell \), \(P_1\) computes \(x_i=\mathsf {PRG}( seed _i, t)\), where \(x_i \in \{0,1\}^\sigma \), and sets \(\varvec{m'}_t:=x_1\Vert \dots \Vert x_\ell \).

      2. (b)

        For \(\ell < i\le n\), \(P_1\) sends to \(P_2\)

        $$\begin{aligned} x'_i :=\mathsf {PRG}( seed _i, t) \oplus \mathsf {Encode}(\varvec{m'}_t)[i] \end{aligned}$$

        where \(\mathsf {Encode}(\varvec{m'}_t)[i]\) denotes the \(i^{ th }\) symbol of \(\varvec{m'}_t\)’s systematic codeword.

      3. (c)

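        \(\forall i\in \{i_1, \dots ,i_w\}\), \(P_2\) computes

        $$\begin{aligned} x_i := \begin{cases} \mathsf {PRG}( seed _i, t), &amp; 1\le i\le \ell ,\\ x'_i\oplus \mathsf {PRG}( seed _i, t), &amp; \ell < i\le n. \end{cases} \end{aligned}$$

        Then \(P_2\) sets \({\left\langle \varvec{m'}_t\right\rangle } = (x_{i_1},\dots , x_{i_w})\).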

    2. 2.

      For \(t=1,\dots , \xi \),

      1. (a)

        \(P_2\) randomly picks \(y\leftarrow \{0,1\}^{\sigma \cdot \nu }\) and sends it to \(P_1\).

      2. (b)

        \(P_1\) sends \(\hat{\varvec{m'}}_t:=\sum _{i=1}^{\nu } y_i\varvec{m'}_i+\varvec{m'}_{\nu +t}\) where \(y_i\in \{0,1\}^\sigma \) to \(P_2\).

      3. (c)

        \(P_2\) runs \(\displaystyle \mathsf {IHash}_{\ell ,\sigma }\mathsf {.Verify}\left( \sum _{i=1}^{\nu } y_i{\left\langle \varvec{m'}_i\right\rangle }+{\left\langle \varvec{m'}_{\nu +t}\right\rangle }, \hat{\varvec{m'}}_t\right) \) where \(y_i\in \{0,1\}^\sigma \) and aborts if it fails.

    3. 3.

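      For \(i=1, \dots , \nu \), \(P_1\) sends \(\varvec{x}_i:=\varvec{m}_i\oplus \varvec{m'}_i\) to \(P_2\), who then computes

      $$\begin{aligned} {\left\langle \varvec{m}_i\right\rangle } := {\left\langle \varvec{m'}_i\right\rangle } \oplus \big \Vert _{j\in \{i_1,\dots ,i_w\}} \mathsf {Encode}(\varvec{x}_i)[j], \end{aligned}$$

      where \(\big \Vert _{j=1}^{n} a_j\) means \(a_1\Vert \cdots \Vert a_n\) and \(\mathsf {Encode}(\varvec{x}_i)[j]\) denotes the \(j^{ th }\) symbol of \(\varvec{x}_i\)’s codeword. \(P_1\) outputs nothing and \(P_2\) outputs \({\left\langle \varvec{m}_1\right\rangle },\dots ,{\left\langle \varvec{m}_\nu \right\rangle }\).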

  • \(\mathsf {IHash}_{\ell ,\sigma }\mathsf {.Verify}\left( {\left\langle \varvec{m}_1\right\rangle },\dots ,{\left\langle \varvec{m}_\nu \right\rangle }, \bigoplus _{i=1}^\nu \varvec{m}'_i\right) \)

    1. 1.

      \(P_2\) computes \({\left\langle \varvec{m}\right\rangle }:=\bigoplus _{i=1}^\nu {\left\langle \varvec{m}_i\right\rangle }\) and lets \(\varvec{m}'=\bigoplus _{i=1}^\nu \varvec{m}'_i\).

    2. 2.

      \(P_2\) parses \({\left\langle \varvec{m}\right\rangle }\) into \((x_{i_1},\dots ,x_{i_w})\in \{0,1\}^{\sigma \cdot w}\) and returns 1 if for all \(i\in \{i_1,\dots ,i_w\}\), \(\mathsf {Encode}(\varvec{m}')[i] = x_i\); and 0, otherwise.

    • (Setting \(\nu =1\) allows \(P_2\) to verify any single message.)

Optimization. If the goal is only to i-hash random messages as it is used in our main protocol, it suffices to treat the \(\varvec{m'}_i\)s (generated by calling \(\mathsf {PRG}\) in Step 1a) as the random messages to i-hash, hence no need to send the first \(\ell \) symbols of \(\varvec{x}_i\) in Step 3 (where \(\varvec{x}_i\) is the xor-differences between an input message \(\varvec{m}_i\) and a random message \(\varvec{m'}_i\)), saving \(\ell \sigma \) bits per i-hashed message.
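The seed-masking idea of the construction (Steps 1 and 3 of \(\mathsf {IHash}_{}\mathsf {.Hash}\)) can be sketched as follows. To keep the example short, the sketch replaces the Reed-Solomon code by a stand-in systematic XOR-linear code, models the w-out-of-n OT as a plain dictionary lookup, and omits the consistency check of Step 2; SHA-256 as the \(\mathsf {PRG}\), the helper names, and the toy sizes are likewise assumptions.

```python
import hashlib
import secrets

ELL, N, W = 4, 10, 3      # toy sizes; each symbol is one byte (sigma = 8)

def prg(seed, t):
    """One pseudorandom symbol for column t, derived from a row seed."""
    return hashlib.sha256(seed + t.to_bytes(4, "big")).digest()[0]

def toy_encode(msg):
    """Stand-in systematic XOR-linear code (NOT Reed-Solomon): the first ELL
    symbols are the message, the rest are XORs of fixed message subsets."""
    parity = []
    for r in range(N - ELL):
        acc = 0
        for j, sym in enumerate(msg):
            if ((r >> (j % 3)) & 1) == 0:     # fixed public pattern
                acc ^= sym
        parity.append(acc)
    return list(msg) + parity

# Setup: P1 holds N seeds; a single W-out-of-N OT hands P2 the watched ones.
seeds = [secrets.token_bytes(16) for _ in range(N)]
watch = sorted(secrets.SystemRandom().sample(range(N), W))
watched = {i: seeds[i] for i in watch}

def sender_random_column(t):
    """Steps 1a/1b: derive m'_t from the first ELL rows, mask the other rows."""
    m_rand = [prg(seeds[i], t) for i in range(ELL)]
    code = toy_encode(m_rand)
    masked = {i: prg(seeds[i], t) ^ code[i] for i in range(ELL, N)}
    return m_rand, masked

def receiver_record(t, masked):
    """Step 1c: recover the watched codeword symbols of m'_t."""
    return [prg(watched[i], t) if i < ELL else masked[i] ^ prg(watched[i], t)
            for i in watch]

def receiver_shift(h_rand, x):
    """Step 3: turn <m'> into <m> using the difference x = m xor m'."""
    cx = toy_encode(x)
    return [h ^ cx[i] for h, i in zip(h_rand, watch)]

m = [secrets.randbelow(256) for _ in range(ELL)]    # message to i-hash
m_rand, masked = sender_random_column(t=1)
h_rand = receiver_record(1, masked)
x = [a ^ b for a, b in zip(m, m_rand)]
h = receiver_shift(h_rand, x)
```

Since the stand-in code is XOR-linear, `h` coincides with the watched symbols of the codeword of `m`, even though the receiver never sees `m` itself.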

4.3 Proof of Security

Theorem 4.1

Assuming there exists a secure OT, the protocol described in Sect. 4.2 securely realizes an XOR-homomorphic interactive hash.

Due to page limit, we move the proof to Appendix A.1 of the full paper [48].

Our proof uses two lemmas that we state below but prove in Appendices A.2 and A.3 of the full paper [48].

Lemma 4.2

If \(\exists i\in \{1,\dots ,k\}\) such that \(\varvec{m}'_i\not =\varvec{m}_i\), then Step 2 of \(\mathsf {IHash}_{}\mathsf {.Hash}\) has to abort except with \(2^{-s}\) probability.

Lemma 4.3

Let \(\mathbf {H}_{min}\) be the min-entropy function. For \(\mathbf {H}_{min}(\varvec{m})=\sigma \cdot \ell \) where \(\varvec{m}\in \{0,1\}^{\sigma \cdot \ell }\) and \({\left\langle \varvec{m}\right\rangle }\in \{0,1\}^{\sigma \cdot w}\),

  1. 1.

    \(\mathbf {H}_{min}(\varvec{m}|{\left\langle \varvec{m}\right\rangle })=(\ell -w)\sigma \). I.e., \((\ell -w)\sigma \) entropy remains even if \(P_2\) learns \({\left\langle \varvec{m}\right\rangle }\).

  2. 2.

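    For every \(\varvec{m}_1\) and \(\varvec{m}_2\) where \(\varvec{m}_1\not =\varvec{m}_2\), \(\Pr \big [{\left\langle \varvec{m}_1\right\rangle }={\left\langle \varvec{m}_2\right\rangle }\big ]\le \binom{\ell -1}{w}\big /\binom{n}{w}\), where the probability is taken over \(P_2\)’s choice of the w watched positions.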

5 The Main Protocol

5.1 Protocol Description

Assume \(P_1\) (the generator) and \(P_2\) (the evaluator) wish to compute f over secret inputs x, y, where f is realized as a boolean circuit C that has only AND and XOR gates. The protocol proceeds as follows.

  1. 0.

    Setup. The parties decide the public parameters \(\ell _w, \sigma _w, \ell _p, \sigma _p\) from the security parameters s, k (see Sects. 6.1 and 6.2 for the detailed discussion).

    1. (a)

      \(P_1\) randomly picks \( seed _1,\dots , seed _{n_w}\); \(P_2\) randomly picks \(i_1,\dots ,i_{n_w}\). \(P_1\) (as the sender using \( seed _1,\dots , seed _{n_w}\)) and \(P_2\) (as the receiver using \(i_1,\dots ,i_{n_w}\)) run \(\mathsf {IHash}_{\ell _w,\sigma _w}\mathsf {.Setup}\) to initialize the IHash scheme for i-hashing wire-labels.

    2. (b)

      \(P_1\) randomly picks \(\varDelta \in \{0,1\}^{\lambda _w}\) where \(\lambda _w=\ell _w\sigma _w\) and calls \(\mathsf {IHash}_{\ell _w,\sigma _w}\mathsf {.Hash}\) to send \({\left\langle \varDelta \right\rangle }\) to \(P_2\).

    3. (c)

      \(P_1\) (using seeds \(H(\varDelta ,1),\dots ,H(\varDelta ,n_p)\) where H is a random oracle) and \(P_2\) (using freshly sampled indices \(i'_1,\dots ,i'_{n_p}\)) run \(\mathsf {IHash}_{\ell _p,\sigma _p}\mathsf {.Setup}\) to initialize the IHash scheme for wire permutation strings.

    4. (d)

      \(P_1\) sends \(H(H(\varDelta ,1)),\dots ,H(H(\varDelta ,n_p))\) to \(P_2\).

Then, \(P_1\) randomly selects 16 linearly independent vectors \(a_{1},\dots ,a_{16}\) from \(\mathrm {GF}(2^8)^{\ell _w}\), which form the rows of the matrix that is left-multiplied with a wire-label to realize the \(\mathsf {Compress}\) function (compressing a 256-bit wire-label into a 128-bit one, see Fig. 3). \(P_1\) sends \(a_{1},\dots ,a_{16}\) to \(P_2\).

Fig. 5. A bucket of B garbled gates. (Wire labels and hashes exist at both the bucket level (e.g. \(w^{p_l}_l\)) and the gate level (e.g. \(w^{p_{i,l}}_l\)).)

  1. 1.

    Circuit Initialization. Let \(n_w\) be the total number of wires in C. \(P_1\) picks \(m_1,\dots ,m_{n_w}\in \{0,1\}^{\lambda _w}\) where \(\lambda _w=\ell _w\sigma _w\), then runs \(\mathsf {IHash}_{\ell _w,\sigma _w}\mathsf {.Hash}\) with \(P_2\) to send \({\left\langle m_1\right\rangle },\dots ,{\left\langle m_{n_w}\right\rangle }\) to \(P_2\). Next, \(P_1\) samples \(\rho ^1,\dots ,\rho ^{n_w}\in \{0,1\}^{\lambda _p}\) where \(\lambda _p=\ell _p\sigma _p\), then runs \(\mathsf {IHash}_{\ell _p,\sigma _p}\mathsf {.Hash}\) with \(P_2\) to send \({\left\langle \rho ^{1}\right\rangle },\dots ,{\left\langle \rho ^{n_w}\right\rangle }\) to \(P_2\). For all \(1\le i\le n_w\), \(P_1\) sets \(p_i=\rho ^i_{1}\oplus \cdots \oplus \rho ^i_{\lambda _p}\) (where \(\rho ^i_{j}\) denotes the j-th bit of \(\rho ^i\)) and \(w_i^0:=m_i\oplus p_i\varDelta \). Let \(w_i^1=m_i\oplus \bar{p}_i\varDelta \), hence \(m_i=w_i^{p_i}\). Then, \(P_1\) and \(P_2\) process the initial input-wires as follows.

    1. (a)

      For \(1\le i\le n_I^{P_1}\), let \((w_i^0, w_i^1)\) be the pair of wire labels on the wire associated with \(x_i\), \(P_1\) sends \(w^{x_i}_i\) to \(P_2\).

    2. (b)

      For every input-wire \(W_i\) associated with \(P_2\)’s private input \(y_i\):

      1. i.

        \(W_i\) is \(\oplus \)-split into s wires \(W_{i,1},\dots ,W_{i,s}\).

      2. ii.

        \(P_1\) picks \(m_1, \dots , m_s\) and runs \(\mathsf {IHash}_{\ell _w, \sigma _w}\mathsf {.Hash}\) with \(P_2\) to send \({\left\langle m_1\right\rangle }, \dots , {\left\langle m_s\right\rangle }\) to \(P_2\). For \(1\le j\le s\), \(P_1\) sets \(w_{i,j}^0=m_j\) and \(w_{i,j}^1=w_{i,j}^0\oplus \varDelta \).

      3. iii.

        \(P_2\) samples \(y_{i,1}\leftarrow \{0,1\},\dots ,y_{i,s}\leftarrow \{0,1\}\) such that \(y_{i,1}\oplus \cdots \oplus y_{i,s}=y_i\).

      4. iv.

        For \(1\le j\le s\), \(P_2\) retrieves \(w^{y_{i,j}}_{i,j}\) from \(P_1\) through oblivious transfer, and verifies \(w^{y_{i,j}}_{i,j}\) against \({\left\langle w^{y_{i,j}}_{i,j}\right\rangle }\) (note \(P_2\) can compute \({\left\langle w^{y_{i,j}}_{i,j}\right\rangle }:={\left\langle w^0_{i,j}\right\rangle }\oplus y_{i,j}{\left\langle \varDelta \right\rangle }\)). Any verification failure will result in \(P_2\)’s delayed abort at Step 5.

      5. v.

        \(P_2\) sets \(w^{y_i}_i:=w^{y_{i,1}}_{i,1}\oplus \cdots \oplus w^{y_{i,s}}_{i,s}\), \({\left\langle w^{y_i}_i\right\rangle }:={\left\langle w^{y_{i,1}}_{i,1}\right\rangle }\oplus \cdots \oplus {\left\langle w^{y_{i,s}}_{i,s}\right\rangle }\), and \(p_i=0\).

  2. 2.

    Generate. \(P_2\) randomly picks \({\mathcal {J}}\in \{0,1\}^k\) and sends a commitment to it to \(P_1\). \({\mathcal {J}}\) will be used as the randomness for cut-and-choose later.

    1. (a)

      \(P_1\) picks 2T random \(\lambda _w\)-bit messages (\(\lambda _w=\ell _w\sigma _w\)) and runs \(\mathsf {IHash}_{\ell _w,\sigma _w}\mathsf {.Hash}\) with \(P_2\) to send the i-hashes of the 2T random messages. Denote the 2T messages by \(\left\{ m_{i,l}, m_{i,r}\right\} _{i=1}^{T}\), and their i-hashes by \(\left\{ {\left\langle m_{i,l}\right\rangle }, {\left\langle m_{i,r}\right\rangle }\right\} _{i=1}^{T}\).

    2. (b)

      \(P_1\) picks 3T random \(\lambda _p\)-bit (\(\lambda _p=\ell _p\sigma _p\)) messages and runs \(\mathsf {IHash}_{\ell _p, \sigma _p}\mathsf {.Hash}\) with \(P_2\) to send the i-hashes of these 3T random messages. Denote these messages and i-hashes by \(\left\{ \rho ^{i,l}, \rho ^{i,r}, \rho ^{i,o}\right\} _{i=1}^{T}\) and \(\left\{ {\left\langle \rho ^{i,l}\right\rangle }, {\left\langle \rho ^{i,r}\right\rangle }, {\left\langle \rho ^{i,o}\right\rangle }\right\} _{i=1}^{T}\). \(P_1\) computes \(p_{i,l}=\rho ^{i,l}_1\oplus \cdots \oplus \rho ^{i,l}_{\lambda _p}\), where \(\rho ^{i,l}_j\) denotes the \(j^{ th }\) bit of \(\rho ^{i,l}\). Similarly, \(P_1\) derives \(p_{i,r}\) and \(p_{i,o}\) from \(\rho ^{i,r}\) and \(\rho ^{i,o}\), respectively. (\(p_{i,l},p_{i,r},p_{i,o}\) will be used as the i-hash permutation bits on the three wires connected to a garbled AND gate.)

    3. (c)

      For \(i\in \{1,\dots ,T\}\), \(P_1\) sets \(w_{i,l}^0:=m_{i,l}\oplus p_{i,l}\varDelta \) and \(w_{i,r}^0:=m_{i,r}\oplus p_{i,r}\varDelta \), then runs the garbling algorithm \(\mathsf{\small GenAND}\) (Fig. 3) to create T garbled AND gates:

      $$\begin{aligned} (w^0_{i,o},T_{i,G},T_{i,E})\leftarrow \mathsf{\small GenAND}(i,\varDelta ,w_{i,l}^0,w_{i,r}^0) \end{aligned}$$

      where \(w_{i,l}^{0},w_{i,r}^{0},w_{i,o}^{0}\) are the wire labels representing 0’s on the left input-wire, the right input wire, and the output-wire, respectively; \(T_{i,G}\) is the single garbled row in the generator half-gate and \(T_{i,E}\) the single row in the evaluator half-gate.

    4. (d)

      Let \(w_{i,o}^{p_{i,o}}:=w_{i,o}^0\oplus p_{i,o}\varDelta \) for all \(1\le i\le T\). \(P_1\) and \(P_2\) run

      $$\begin{aligned} \mathsf {IHash}_{\ell _w,\sigma _w}\mathsf {.Hash}\left( w_{1,o}^{p_{1,o}},\dots ,w_{T,o}^{p_{T,o}} \right) \end{aligned}$$

      so that \(P_2\) learns \({\left\langle w_{1,o}^{p_{1,o}}\right\rangle },\dots , {\left\langle w_{T,o}^{p_{T,o}}\right\rangle }\).

  3. 3.

    Evaluate. \(P_2\) opens to \(P_1\) the cut-and-choose randomness \({\mathcal {J}}\), which is used to select and group \(B\cdot N\) garbled ANDs into N buckets.

    Recall that for every logical gate in C, \(P_2\) has obtained from Step 1 two wire-labels \(w^a_l,w^b_r\), which correspond to the secret values a, b on the input-wires, and the i-hashes \({\left\langle \rho ^l\right\rangle }, {\left\langle w^{p_l}_l\right\rangle }\), \({\left\langle \rho ^r\right\rangle }, {\left\langle w^{p_r}_r\right\rangle }, {\left\langle \rho ^o\right\rangle }, {\left\langle w^{p_o}_o\right\rangle }\). \(P_1\) and \(P_2\) follow an identical topological order to process the logical gates as follows: for every XOR, \(P_2\) sets \(w_o:=w^a_l\oplus w^b_r\); for every logical AND (Fig. 5), with the B garbled AND gates of its bucket denoted by \(g_1,\dots , g_B\),

    1. (a)

      \(P_2\) sets \(\mathcal {O}\) to an empty set and executes the following for \(1\le i\le B\) (note that \(P_2\) always continues execution until Step 5 even if any check failed),

      1. i.

        Let \(p_{i,l}\) be the i-hash permutation bit of the left input-wire of \(g_i\), i.e., \(p_{i,l}=g_i.p_l\). Let \(p_{i,r}, p_{i,o}, \rho ^{i,l}, \rho ^{i,r}, \rho ^{i,o}\) be similarly defined. \(P_1\) sends \(\rho ^l \oplus \rho ^{i,l}\), \(\rho ^r \oplus \rho ^{i,r}\) and \(\rho ^o\oplus \rho ^{i,o}\) to \(P_2\), who verifies them against their i-hashes and computes

        $$\begin{aligned} p_l\oplus p_{i,l}&:=\bigoplus _{1\le j\le \lambda _p}(\rho ^l_j\oplus \rho ^{i,l}_j)\\ p_r\oplus p_{i,r}&:=\bigoplus _{1\le j\le \lambda _p}(\rho ^r_j\oplus \rho ^{i,r}_j)\\ p_o\oplus p_{i,o}&:=\bigoplus _{1\le j\le \lambda _p}(\rho ^o_j\oplus \rho ^{i,o}_j). \end{aligned}$$
      2. ii.

        For \(b\in \{0,1\}\), let \(w^b_{i,l}\) be the wire-label representing signal b on gate \(g_i\)’s left input-wire, and let \(w^b_{i,r}\), \(w^b_{i,o}\) be similarly defined with \(g_i\)’s right input-wire and output-wire. \(P_1\) sends

        $$\begin{aligned} \delta _l:=&w^{p_l}_l \oplus w^{p_{i,l}}_{i,l} \oplus (p_l\oplus p_{i,l})\varDelta \\ \delta _r:=&w^{p_r}_r \oplus w^{p_{i,r}}_{i,r} \oplus (p_r\oplus p_{i,r})\varDelta \\ \delta _o:=&w^{p_o}_o \oplus w^{p_{i,o}}_{i,o} \oplus (p_o \oplus p_{i,o})\varDelta \end{aligned}$$

        to \(P_2\), who verifies them against their hashes and computes \(w^a_{i,l} :=w^a_l \oplus \delta _l\) and \(w^b_{i,r} :=w^b_{r} \oplus \delta _r\).

      3. iii.

        Recall that \(T_{i,G}=g_i.T_G\), \(T_{i,E}=g_i.T_E\). \(P_2\) runs \(w_{i,o}:=\mathsf{\small EvlAND}(w^a_{i,l}, w^b_{i,r}, T_{i,G}, T_{i,E})\), and sets \(w_o:=w_{i,o}\oplus \delta _o\).

      4. iv.

        \(P_2\) verifies \(w_{o}\) against \({\left\langle w^{p_o}_{o}\right\rangle }\) and \({\left\langle w^{p_o}_{o}\right\rangle }\oplus {\left\langle \varDelta \right\rangle }\). If either verification succeeds, \(P_2\) adds \(w_o\) to \(\mathcal {O}\).

    2. (b)

      If \(\mathcal {O}\) contains two different labels, say w and \(w'\), \(P_2\) computes \(\varDelta ^*:=w\oplus w'\), uses \(\varDelta ^*\) to recover \(P_1\)’s private inputs x, and computes f(x, y). Otherwise, \(\mathcal {O}=\{w\}\) so \(P_2\) sets \(w_o=w\).

  4. 4.

    Check. \(P_2\) verifies the correctness of the remaining \(T-BN\) garbled AND gates. For every check-gate parsed into

    $$\begin{aligned} \left( {\left\langle \rho ^l\right\rangle },{\left\langle w_l^{p_l}\right\rangle }, {\left\langle \rho ^r\right\rangle },{\left\langle w_r^{p_r}\right\rangle }, {\left\langle \rho ^o\right\rangle },{\left\langle w_o^{p_o}\right\rangle }, T_G, T_E \right) , \end{aligned}$$
    1. (a)

      \(P_2\) samples \(a\leftarrow \{0,1\}, b\leftarrow \{0,1\}\), sends them to \(P_1\).

    2. (b)

      \(P_1\) sends \({\rho ^l},{\rho ^r},{\rho ^o}\) to \(P_2\). \(P_2\) verifies them with \({\left\langle \rho ^l\right\rangle }, {\left\langle \rho ^r\right\rangle }\), and \({\left\langle \rho ^o\right\rangle }\). Let \(p_l=\rho ^{l}_1\oplus \cdots \oplus \rho ^{l}_{\lambda _p}\), \(p_r=\rho ^{r}_1\oplus \cdots \oplus \rho ^{r}_{\lambda _p}\), \(p_o=\rho ^{o}_1\oplus \cdots \oplus \rho ^{o}_{\lambda _p}\),

      1. i.

        \(P_1\) sends \(w^a_{l}={w^{p_{l}}_{l}}\oplus (a\oplus p_l){\varDelta }\) to \(P_2\), who verifies it against \({\left\langle w^{p_{l}}_{l}\right\rangle }\oplus (a\oplus p_l){\left\langle \varDelta \right\rangle }\).

      2. ii.

        \(P_1\) sends \(w^b_{r}={w^{p_{r}}_{r}}\oplus (b\oplus p_r){\varDelta }\) to \(P_2\), who verifies it against \({\left\langle w^{p_{r}}_{r}\right\rangle }\oplus (b\oplus p_r){\left\langle \varDelta \right\rangle }\).

      3. iii.

        Let \(z=a\wedge b\). \(P_1\) sends \(w^z_{o}={w^{p_{o}}_{o}}\oplus (z\oplus p_o){\varDelta }\) to \(P_2\), who verifies it against \({\left\langle w^{p_{o}}_{o}\right\rangle }\oplus (z\oplus p_o){\left\langle \varDelta \right\rangle }\).

    3. (c)

      \(P_2\) checks \(w^z_o=\mathsf{\small EvlAND}(w^a_l,w^b_r,T_G,T_E)\).

  5. 5.

    Output determination.

    1. (a)

      If any check failed in steps 3 and 4, \(P_2\) aborts.

    2. (b)

      \(P_1\) proves in zero knowledge that it executed Step 0b and Step 0c honestly. Namely, the double-hashes received in Step 0d, the i-hash \({\left\langle \varDelta \right\rangle }\) held by \(P_2\), and the watched seeds of \(\mathsf {IHash}_{\ell _p,\sigma _p}\mathsf {.Setup}\) are all with respect to the same \(\varDelta \). \(P_2\) aborts if the ZK proof fails.

    3. (c)

      Otherwise, \(P_2\) outputs f(x, y), either from a recovered \(\varDelta \) or from the interpreted final output labels.

Remarks. In practice, Step 5b can be done efficiently with ZK proof techniques [14, 47], costing \(2n_p\) semi-honest garbled circuit executions of SHA256.
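The bucket-level logic of Step 3 (collecting verified output labels in \(\mathcal {O}\) and, if two different valid labels appear, recovering \(\varDelta \)) can be sketched as below. The i-hash verification is abstracted into a caller-supplied predicate `is_valid`, and the explicit "abort" branch for an empty \(\mathcal {O}\) is an assumption of this sketch standing in for the delayed abort of Step 5.

```python
import secrets

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def resolve_bucket(candidates, is_valid):
    """candidates: the labels w_o produced by the B gates of one bucket.
    is_valid(w): stands in for verifying w against <w_o^{p_o}> or
    <w_o^{p_o}> xor <Delta> (Step 3(a)iv)."""
    valid = []
    for w in candidates:
        if is_valid(w) and w not in valid:
            valid.append(w)
    if len(valid) == 2:
        # Two different valid labels on one wire differ by exactly Delta.
        return "delta", xor(valid[0], valid[1])
    if len(valid) == 1:
        return "label", valid[0]
    return "abort", None        # assumption: no gate produced a valid label

# Hypothetical demo with ideal knowledge of the two valid output labels.
delta = secrets.token_bytes(16)
w0 = secrets.token_bytes(16)
w1 = xor(w0, delta)
is_valid = lambda w: w in (w0, w1)
junk = b"not-a-label"           # wrong length, so it can never verify

cheat = resolve_bucket([w0, w1, w0], is_valid)    # one gate flipped the output
honest = resolve_bucket([w0, junk, w0], is_valid) # one gate produced garbage
```

In the first call a faulty gate yields the opposite but valid label, so the evaluator learns \(\varDelta \); in the second the garbage label is filtered out and the single honest label is used.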

Solder Gates Garbled with Different \(\varDelta \) s. The protocol given above assumes all the gates are garbled with respect to the same \(\varDelta \) and allows detecting faulty gates with probability 50%. To further increase the faulty gate detection rate to 100%, we can let \(P_1\) garble each gate with a freshly sampled secret \(\varDelta \), so that when a garbled gate is chosen to be checked, \(P_1\) can fully open the gate without leaking the \(\varDelta \) used in other garbled gates.

The procedure to solder a wire associated with \(({\left\langle \varDelta _1\right\rangle }, {\left\langle \rho _1\right\rangle }, {\left\langle w_1^{p_1}\right\rangle })\) (where \(p_1\) is the xor-sum of all bits of \(\rho _1\)) to another wire associated with \(({\left\langle \varDelta _2\right\rangle }, {\left\langle \rho _2\right\rangle }, {\left\langle w_2^{p_2}\right\rangle })\) (where \(p_2\) is the xor-sum of all bits of \(\rho _2\)) is also a bit different from that described in the main protocol. First, \(P_1\) reveals \(\varDelta ':=\varDelta _1\oplus \varDelta _2\) and \(\rho ':=\rho _1\oplus \rho _2\) to \(P_2\), who validates \(\varDelta '\) and \(\rho '\) using the corresponding i-hashes. Let \(p'\) be the xor-sum of all bits of \(\rho '\). If \(p'=0\), which implies \({\left\langle w_1^{p_1}\right\rangle }\) and \({\left\langle w_2^{p_2}\right\rangle }\) are hashes of wire-labels denoting the same plaintext signal, then \(P_1\) reveals \(w':=w_1^{p_1}\oplus w_2^{p_2}\) to \(P_2\); otherwise, if \(p'=1\), then \(P_1\) reveals \(w':=w_1^{p_1}\oplus w_2^{p_2} \oplus \varDelta _2\) to \(P_2\). With a wire-label w to be translated, \(P_2\) will validate \(w'\), then output \(w\oplus w'\) as the translated wire-label if \({\left\langle w\right\rangle }={\left\langle w_1^{p_1}\right\rangle }\); and output \(w\oplus w'\oplus \varDelta '\) if \({\left\langle w\right\rangle }={\left\langle w_1^{p_1}\right\rangle }\oplus {\left\langle \varDelta _1\right\rangle }\).

Figure 6 shows wire-label conversion for every possible combination of \(p_1, p_2, w\), and verifies \(P_2\) can always output a label that denotes the same plaintext signal as w. To see the security of this soldering scheme, we note that to a malicious \(P_2\), \((\varDelta ', \rho ', w')\) is always indistinguishable from a tuple of random messages.

Fig. 6. Soldering wires associated with different \(\varDelta \)s.
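The soldering rule for gates garbled with different \(\varDelta \)s can be checked exhaustively over all combinations of \(p_1\), \(p_2\) and the plaintext signal carried by w. In the sketch below, byte-string equality stands in for the i-hash comparisons, the bit \(p'\) stands in for \(\rho '\), and the function names are assumptions made for illustration.

```python
import secrets
from itertools import product

N_BYTES = 16

def xor(*parts):
    out = bytes(len(parts[0]))
    for p in parts:
        out = bytes(a ^ b for a, b in zip(out, p))
    return out

def solder_info(d1, d2, p1, p2, w1_p1, w2_p2):
    """What P1 reveals: Delta', p' (standing in for rho') and w'."""
    dprime = xor(d1, d2)
    pprime = p1 ^ p2
    wprime = xor(w1_p1, w2_p2) if pprime == 0 else xor(w1_p1, w2_p2, d2)
    return dprime, pprime, wprime

def translate(w, w1_p1, d1, dprime, wprime):
    """P2's side: map a label w on wire 1 to a label on wire 2."""
    if w == w1_p1:                      # <w> = <w_1^{p_1}>
        return xor(w, wprime)
    if w == xor(w1_p1, d1):             # <w> = <w_1^{p_1}> xor <Delta_1>
        return xor(w, wprime, dprime)
    raise ValueError("label does not verify")

results = []
for p1, p2, signal in product((0, 1), repeat=3):
    d1, d2 = secrets.token_bytes(N_BYTES), secrets.token_bytes(N_BYTES)
    w1_0, w2_0 = secrets.token_bytes(N_BYTES), secrets.token_bytes(N_BYTES)
    label1 = lambda b: xor(w1_0, d1) if b else w1_0   # label of bit b on wire 1
    label2 = lambda b: xor(w2_0, d2) if b else w2_0   # label of bit b on wire 2
    dprime, pprime, wprime = solder_info(d1, d2, p1, p2, label1(p1), label2(p2))
    out = translate(label1(signal), label1(p1), d1, dprime, wprime)
    results.append(out == label2(signal))
```

Every entry of `results` is expected to be `True`, i.e., the translated label always denotes the same plaintext signal as w.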

5.2 Proof of Security

First, we show that for any N and security parameters s, k, it is possible to set parameters \(T,B, n, \ell , w\) such that our protocol securely computes f(x, y) (except with probability \(2^{-s}\)). For concrete values of s, k, N, we detail how to optimize \(T,B,n,\ell ,w\) for performance in Sect. 6.3.

Lemma 5.1

For any N, s, k, there exist T, B such that if \(P_2\) does not abort at Step 5, then with all but \(2^{-s}\) probability every bucket has at least one correctly garbled gate.

The validity of Lemma 5.1 is implied by our cut-and-choose parameter selection strategy described in Sect. 6.3.

Lemma 5.2

If every bucket has at least one correctly garbled gate, \(P_2\) will output f(x, y) at Step 5 except with negligible probability.

Lemma 5.3

Given \({\left\langle \varDelta \right\rangle }\), a garbled AND gate \(\left( T_G, T_E \right) \), and the i-hashes of its wire-labels and permutation messages \(\left( {\left\langle \rho ^l\right\rangle },{\left\langle w_l^{p_l}\right\rangle }, {\left\langle \rho ^r\right\rangle },{\left\langle w_r^{p_r}\right\rangle }, {\left\langle \rho ^o\right\rangle },{\left\langle w_o^{p_o}\right\rangle } \right) \), where \(p_l = \rho ^l_1\oplus \dots \oplus \rho ^l_{\lambda _p}\), \(p_r = \rho ^r_1\oplus \dots \oplus \rho ^r_{\lambda _p}\), \(p_o = \rho ^o_1\oplus \dots \oplus \rho ^o_{\lambda _p}\). If any of the following is not satisfied (see Fig. 3 for \({\mathsf{\small EvlAND}}\)),

$$\begin{aligned} \mathsf{\small EvlAND}(w_l^0,w_r^0,T_G,T_E)=w_o^0;&\,\, \mathsf{\small EvlAND}(w_l^0,w_r^1,T_G,T_E)=w_o^0;\\ \mathsf{\small EvlAND}(w_l^1,w_r^0,T_G,T_E)=w_o^0;&\,\, \mathsf{\small EvlAND}(w_l^1,w_r^1,T_G,T_E)=w_o^1, \end{aligned}$$

then \(P_2\) will detect this with probability at least 1/2 at Step 4.

Due to the page limit, we moved the proofs of Lemmas 5.2 and 5.3 to Appendices A.4 and A.5 of the full paper [48].

Theorem 5.4

Under the assumptions outlined in Sect. 3, the protocol in Sect. 5.1 securely computes f in the presence of malicious adversaries.

Due to page limit, we moved the proof to Appendix A.6 of the full paper [48].

6 Parameters and Bounds

6.1 \(\mathsf {IHash}\)-ing Permutation Messages

Here the goal is to decide the best parameters \(n_p, \ell _p, w_p, \sigma _p\) that are used to i-hash the wire permutation messages, i.e., the \(\rho \)’s used in the main protocol, to achieve the necessary binding and hiding properties. This can be framed into a constrained optimization problem:

$$\begin{aligned} (n_p, \ell _p, w_p, \sigma _p)=\text {arg min}\;\,\mathsf {cost}(n,\ell ,w,\sigma ) \end{aligned}$$

subject to:

$$\begin{aligned} \binom{\ell -1}{w}\Big /\binom{n}{w}&\le 2^{-s} \qquad (1)\\ \sigma (\ell -w)&\ge k \qquad (2)\\ 2^{\sigma }&\ge n \\ n,\ell ,\sigma ,w&\in \mathbb {Z}^+ \end{aligned}$$

where inequality (1) ensures s-bit statistical binding, inequality (2) ensures k-bit hiding, and the target cost function can depend on a number of deployment-specific tradeoffs between bandwidth and computation.

We stress that hiding for the permutation messages is perfect because no additional information is revealed that would allow a malicious evaluator to verify its guesses on the hidden bits (in contrast to the wire-labels, for which the garbled truth table can be used to verify guesses). Once the cost function is fixed, an efficient solver based on aggressive pruning can be constructed. In our experiment, we set \(n_p=44, \ell _p=20, w_p=19, \sigma _p=6\), which provides 40-bit statistical binding (verify this by plugging them into (1)) and 6-bit perfect hiding since \((\ell _p-w_p)\cdot \sigma _p=6\) (although only 1-bit perfect hiding is needed).
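The parameter check mentioned above can be reproduced with a few lines of arithmetic. The sketch assumes that the binding bound is \(\binom{\ell -1}{w}/\binom{n}{w}\), i.e., the probability that all w watched positions fall among the at most \(\ell -1\) positions on which two distinct Reed-Solomon codewords can agree; the function names are hypothetical.

```python
from math import comb, log2

def binding_bits(n, ell, w):
    """-log2 of C(ell-1, w) / C(n, w): the assumed statistical binding level."""
    return log2(comb(n, w)) - log2(comb(ell - 1, w))

def hiding_bits(ell, w, sigma):
    """Bits of the message left unwatched: sigma * (ell - w)."""
    return sigma * (ell - w)

def feasible(n, ell, w, sigma, s, k):
    """Check the constraints of the optimization problem for given s and k."""
    return (binding_bits(n, ell, w) >= s
            and hiding_bits(ell, w, sigma) >= k
            and 2 ** sigma >= n)

# Permutation-message parameters from the text: n_p=44, l_p=20, w_p=19, sigma_p=6.
perm_binding = binding_bits(44, 20, 19)
perm_hiding = hiding_bits(20, 19, 6)
perm_ok = feasible(44, 20, 19, 6, s=40, k=1)
```

With these inputs `perm_binding` is slightly above 40, `perm_hiding` equals 6, and `perm_ok` holds for s = 40 and the single hidden bit that is actually needed.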

6.2 \(\mathsf {IHash}\)-ing and \(\mathsf {Compress}\)-ing Wire-Labels

Here our goal is to determine the best parameters \(n_w, \ell _w, w_w, \sigma _w\) to process the wire-labels so that s-bit statistical security and k-bit computational security can be guaranteed for the main protocol. Note the entropy hiding on the wire-labels downgrades to computational because a malicious evaluator could run offline tests on its guesses of the wire-labels using the garbled rows.

To ensure 40-bit binding and at least 80-bit hiding, \(\ell _w\cdot \sigma _w\) (the wire-labels’ length) has to be more than 128 bits. This poses a challenge to efficient garbling using fixed-key AES assembly instructions since AES only works on 128-bit blocks. We solve this challenge by making \((\ell _w-w_w)\sigma _w\) slightly larger (i.e., by a factor of \(1+\varepsilon \)) than k, followed by a linear \(\mathsf {Compress}\) function to derive 128-bit compressed labels that each carry more than 80 bits of entropy from the original, watched wire-labels. Namely, for wire-labels, we replace constraint (2) by

$$\begin{aligned} {\sigma _w(\ell _w-w_w)} \ge (1+\varepsilon ) k \end{aligned}$$

where \(\varepsilon >0\) compensates the entropy loss during the compression. We choose \(n_w=86, \ell _w=32, w_w=21\) and \(\sigma _w=8\).

To compress wire-labels, the generator samples \(128/\sigma _w=16\) linearly independent vectors (from \(\mathrm {GF}(2^{8})^{32}\)) once and left-multiplies them with the 256-bit wire-labels to obtain 128-bit compressed labels.
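A minimal sketch of such a linear \(\mathsf {Compress}\) map is given below. The reduction polynomial 0x11B for \(\mathrm {GF}(2^8)\) is an assumption (the text does not fix one), and the sketch samples the 16 vectors uniformly without checking linear independence, which a real implementation would have to enforce.

```python
import secrets

def gf_mul(a, b, poly=0x11B):
    """Multiply two elements of GF(2^8); the polynomial 0x11B is an assumption."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def compress(label, vectors):
    """Map a 32-symbol (256-bit) label to 16 symbols (128 bits):
    output symbol i is the inner product of vectors[i] with the label."""
    out = []
    for row in vectors:
        acc = 0
        for r, m in zip(row, label):
            acc ^= gf_mul(r, m)
        out.append(acc)
    return bytes(out)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# 16 public vectors in GF(2^8)^32 (independence not checked in this sketch).
vectors = [secrets.token_bytes(32) for _ in range(16)]
label_a = secrets.token_bytes(32)
label_b = secrets.token_bytes(32)
c_a, c_b = compress(label_a, vectors), compress(label_b, vectors)
c_ab = compress(xor(label_a, label_b), vectors)
```

Because the map is linear, compressing the XOR of two labels gives the XOR of their compressed forms, so `c_ab` equals `xor(c_a, c_b)`.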

Recall that these 16 linearly independent vectors are declared only after the evaluator has chosen its 21 watch symbols. The entropy analysis of this \(\mathsf {Compress}\) function can be done by considering the following experiment:

  1. 1.

    \(P_1\) randomly samples 32 symbols, \(m_1,\dots ,m_{32}\in \mathrm {GF}(2^8)\).

  2. 2.

    \(P_2\) randomly chooses 21 linearly independent vectors \(\varvec{W}=(W_1,\dots ,W_{21})\) from \(\mathrm {GF}(2^{8})^{32}\) and thus learns \((m_1,\ \dots ,\ m_{32})\cdot W_i\) for all \(1\le i\le 21\).

  3. 3.

    \(P_1\) randomly chooses and sends 16 linearly independent vectors \(\varvec{T}=(T_1,\dots ,T_{16})\in \mathrm {GF}(2^{8})^{32}\), then outputs \(v_1,\dots ,v_{16}\), where \(v_i=(m_1\ \dots \ m_{32})\cdot T_i\) for all \(1\le i\le 16\).

The question is: how much entropy in the output \(v_1,\dots ,v_{16}\) remains hidden from \(P_2\)? In other words, for every adversary \(\mathcal {A}\), every rank-21 matrix \(\varvec{W}\in \mathrm {GF}(2^8)^{32\times 21}\) and every rank-16 matrix \(\varvec{T}\in \mathrm {GF}(2^8)^{32\times 16}\), define

$$\begin{aligned} Q=\Pr \big (\;\varvec{m}\leftarrow \mathrm {GF}(2^8)^{32}\ :\ \mathcal {A}(\varvec{W}, \varvec{T},\,\varvec{m}\cdot \varvec{W})=\varvec{m}\cdot \varvec{T}\;\big ). \end{aligned}$$

We want to know the min-entropy of \(\varvec{m}\cdot \varvec{T}\), which is essentially \(\log (1/Q)\). We answer this question with Lemma 6.1 and elaborate the analysis in its proof.

Lemma 6.1

Setting \(n_w=86, \ell _w=32, w_w=21\) and \(\sigma _w=8\) ensures 40-bit statistical binding and more than 87.999 bits hiding in the compressed wire-labels; while setting \(n_w=88, \ell _w=48, w_w=32\) and \(\sigma _w=8\) ensures 40-bit statistical binding and more than 127 bits hiding in the compressed wire-labels.

Proof

Following Lemma 4.3, 40-bit statistical binding for both settings can be verified by evaluating the bound of that lemma under each of the two parameter sets.

Next, we examine the hiding aspect. Let \(\varvec{T}=(T_1,T_2,\cdots ,T_{16})\) be a \(32 \times 16\) matrix over \(\mathrm {GF}(2^8)\) of rank 16, and \(\varvec{W}=(W_1,W_2,\cdots ,W_{21})\) be a \(32 \times 21\) matrix over \(\mathrm {GF}(2^8)\) of rank 21. We want to show that for every adversary \(\mathcal {A}\),

$$\begin{aligned} -\log \Pr \big (\,\varvec{m}\leftarrow \mathrm {GF}(2^8)^{32}\ :\ \mathcal {A}(\varvec{W}, \varvec{T}, \varvec{m}\cdot \varvec{W})=\varvec{m}\cdot \varvec{T}\,\big )>87.999. \end{aligned}$$

Define \(D=\dim (\varvec{T}\oplus \varvec{W})-\dim (\varvec{W})\) where “\(\oplus \)” denotes direct sum and “\(\dim (\cdot )\)” denotes the dimension of a given vector space. For every rank-t matrix \(\varvec{T}\) and every rank-w matrix \(\varvec{W}\), we note that

$$\begin{aligned} \Pr \big (\,\varvec{m}\leftarrow \mathrm {GF}(2^8)^{32}\ :\ \mathcal {A}(\varvec{W}, \varvec{T}, \varvec{m}\cdot \varvec{W})=\varvec{m}\cdot \varvec{T}\,\big )=2^{-8D}. \end{aligned}$$

Thus, our goal is to show that \(-\log \mathbb {E} (2^{-8D})>87.999\). The concept of expectation is introduced because \(2^{-8D}\) itself is a random variable over the random choices of picking \(\varvec{T}\) and \(\varvec{W}\).

Define \(d_{i,j}=\dim (\mathrm {Span}(T_{i+1},T_{i+2},\cdots ,T_{t},\varvec{W}))-\dim (\varvec{W})\) under the condition that \(\dim (\mathrm {Span}(T_1,T_2,\cdots ,T_{i}) \cap \varvec{W})=j\). Thus, \(D=d_{0,0}\) and we can derive \(d_{i,j}\) from \(d_{i+1,j+1}\) and \(d_{i+1,j}\) using recursion. Vector \(T_{i+1}\) has to fall into one of the two cases:

  1. \(T_{i+1}\in \varvec{W}\): Thus \(\mathrm {Span}(T_{i+1},T_{i+2},\cdots ,T_{t},\varvec{W})=\mathrm {Span}(T_{i+2},T_{i+3},\cdots ,T_{t},\varvec{W})\). In addition, because \(T_{i+1} \not \in \mathrm {Span}(T_1,T_2,\cdots ,T_{i})\) and \(\mathrm {Span}(T_1,T_2,\cdots ,T_{i+1}) \cap \varvec{W} = (\mathrm {Span}(T_1,T_2,\cdots ,T_{i}) \cap \varvec{W}) \oplus \mathrm {Span}(T_{i+1})\), we know that

    $$\begin{aligned} \dim (\mathrm {Span}(T_1,T_2,\cdots ,T_{i+1}) \cap \varvec{W})&= \dim (\mathrm {Span}(T_1,T_2,\cdots ,T_{i}) \cap \varvec{W})+1\\&= j+1 \end{aligned}$$

    Note the probability of \(T_{i+1}\in \varvec{W}\), conditioned on \(T_{i+1}\not \in \mathrm {Span}(T_1,\dots ,T_i)\), is \((2^{8\cdot (21-j)}-1)/(2^{8\cdot (32-i)}-1)\). This is because there are \(2^{8\cdot (32-i)}-1\) possible non-zero \(T_{i+1}\) that satisfy \(T_{i+1}\not \in \mathrm {Span}(T_1,\dots ,T_i)\); and \(2^{8\cdot (21-j)}-1\) non-zero choices of \(T_{i+1}\) such that \(T_{i+1}\in \varvec{W}\) but \(T_{i+1}\not \in \mathrm {Span}(T_1,\dots ,T_i)\cap \varvec{W}\), since \(\dim (\mathrm {Span}(T_1,\dots ,T_i)\cap \varvec{W})=j\).

  2. \(T_{i+1} \not \in \varvec{W}\): Thus \(\mathrm {Span}(T_{i+1},T_{i+2},\cdots ,T_{t},\varvec{W})=\mathrm {Span}(T_{i+2},T_{i+3},\cdots ,T_{t},\varvec{W})\oplus \mathrm {Span}(T_{i+1})\), and

    $$\begin{aligned} \dim (\mathrm {Span}(T_1,T_2,\cdots ,T_{i+1}) \cap \varvec{W}) = \dim (\mathrm {Span}(T_1,T_2,\cdots ,T_{i}) \cap \varvec{W}) = j. \end{aligned}$$

    This happens with probability \(1-(2^{8\cdot (21-j)}-1)/(2^{8\cdot (32-i)}-1)\).

Therefore,

$$\begin{aligned} d_{i,j}=\frac{2^{8\cdot (21-j)}-1}{2^{8\cdot (32-i)}-1}d_{i+1,j+1}+\left( 1-\frac{2^{8\cdot (21-j)}-1}{2^{8\cdot (32-i)}-1}\right) (d_{i+1,j}+1) \end{aligned}$$

Moreover,

$$\begin{aligned} \mathbb {E} (2^{-8\cdot d_{i,j}}) =&\frac{2^{8\cdot (21-j)}-1}{2^{8\cdot (32-i)}-1}\mathbb {E} (2^{-8\cdot d_{i+1,j+1}}) + \left( 1-\frac{2^{8\cdot (21-j)}-1}{2^{8\cdot (32-i)}-1}\right) \mathbb {E} \left( 2^{-8\cdot (d_{i+1,j}+1)}\right) \\ =&\frac{2^{8\cdot (21-j)}-1}{2^{8\cdot (32-i)}-1}\mathbb {E}(2^{-8\cdot d_{i+1,j+1}})+\left( 1-\frac{2^{8\cdot (21-j)}-1}{2^{8\cdot (32-i)}-1}\right) \frac{1}{2^8}\mathbb {E}(2^{-8\cdot d_{i+1, j}}) \end{aligned}$$

Finally, the base cases for bootstrapping the recursive calculation are: (1) \(d_{16,j}=0\) for all j; and (2) \(d_{i,21}=i\) for all i. It is easy to calculate \(\mathbb {E}(2^{-8\cdot d_{0,0}})\) in full precision with a computer program. Thus,

$$\begin{aligned} -\log _2\mathbb {E}(2^{-8\cdot D})=&-\log _2\mathbb {E}(2^{-8\cdot d_{0,0}})\\ =&\log _2\frac{340282366920938463463374607431768211457}{1099511627777}\\>&87.9999999999986 > 88 - 10^{-11}. \end{aligned}$$

When setting \(n_w=88, \ell _w=48, w_w=32\) and \(\sigma _w=8\), we can derive a similar recurrence as above, solve it for a different \(d_{0,0}\), and verify that \(-\log _2\mathbb {E}(2^{-8\cdot d_{0,0}})>127\).    \(\square \)
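The recurrence in the proof can be evaluated exactly with rational arithmetic. The following is a minimal sketch, not the authors' program: the function names and the parameter names `ell`, `w`, `t`, `sigma` are illustrative, and only base case (1) is needed because \(j\le i\le 16\) never reaches \(w_w\) for either parameter set.

```python
from fractions import Fraction
from functools import lru_cache
from math import log2


def expected_guess_prob(ell, w, t, sigma=8):
    """Exact E(2^{-sigma*D}) for ell symbols per label, w watched symbols, t Compress vectors."""
    q = 2 ** sigma

    @lru_cache(maxsize=None)
    def e(i, j):
        # Base case (1): no vectors T_{i+1..t} remain, so no further dimensions are added.
        if i == t:
            return Fraction(1)
        # Probability of case 1 of the proof, given the current state (i, j).
        p_in = Fraction(q ** (w - j) - 1, q ** (ell - i) - 1)
        # Case 1 leaves the dimension unchanged; case 2 adds one dimension (factor 1/q).
        return p_in * e(i + 1, j + 1) + (1 - p_in) * Fraction(1, q) * e(i + 1, j)

    return e(0, 0)


def hiding_bits(ell, w, t, sigma=8):
    """Approximate -log2 of the exact expectation (float precision only)."""
    prob = expected_guess_prob(ell, w, t, sigma)
    return log2(prob.denominator) - log2(prob.numerator)


first = expected_guess_prob(32, 21, 16)   # parameters ell_w=32, w_w=21
second = expected_guess_prob(48, 32, 16)  # parameters ell_w=48, w_w=32
```

For the first parameter set the exact value is the reciprocal of the fraction shown above, whose numerator and denominator are \(2^{128}+1\) and \(2^{40}+1\). For the second set the comparison against \(2^{-127}\) should be done on the exact fractions, since the margin above 127 bits is far below double precision.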

Remark

As an alternative, the generic Leftover Hash Lemma [3, 22] could be used to solve the wire-label entropy extraction problem. However, the potentially large entropy loss (\(2\log (1/2^{-80})\,+\,O(1)=160\,+\,O(1)\) bits) makes it unsuitable in our case. In contrast, less than \(10^{-11}\) bit of entropy (out of the \(8\cdot (32\,-\,21)=88\) bits remaining entropy before \(\mathsf {Compress}\)-ing) will actually be lost due to our \(\mathsf {Compress}\) function (see Lemma 6.1’s proof)!
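The size of this loss can also be checked by hand. The numerator and denominator of the fraction in the proof of Lemma 6.1 are \(2^{128}+1\) and \(2^{40}+1\), so using only \(0<\ln (1+x)<x\) for \(x>0\):

$$\begin{aligned} \log _2\frac{2^{128}+1}{2^{40}+1}&= 88+\log _2(1+2^{-128})-\log _2(1+2^{-40})\\&> 88-\frac{2^{-40}}{\ln 2}\approx 88-1.31\times 10^{-12}. \end{aligned}$$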

6.3 LEGO Cut-and-Choose Parameters

While existing analyses of LEGO cut-and-choose rely on empirical point trials in a likely area of the parameter space, we show that the search can be fully guided to efficiently identify the best cut-and-choose parameters in practical scenarios.

Recall the goal is to determine the best \((T, B)\) to ensure s-bit statistical security for computing a circuit of N gates. Assume that, out of the total T garbled gates, b are faulty. Let \(P_\mathrm{c}\) be the probability of selecting a set of \(T-BN\) gates for checking in which t gates are faulty. Then \(P_\mathrm{c}=\binom{b}{t}\binom{T-b}{T-BN-t}/\binom{T}{T-BN}\). Let \(P_\mathrm{e}\) be the probability that at least one bucket is filled entirely by faulty gates, then \(P_\mathrm{e}\le N\binom{b-t}{B}/\binom{BN}{B}\), where the equality is approached from below when \(N\gg B\). Since the overall failure probability \(P_\mathrm{overall}\le \max _b\sum _{t=1}^{b}2^{-t} P_\mathrm{c} P_\mathrm{e}\), it suffices to find the smallest T such that \(\max _b\sum _{t=1}^{b}2^{-t} P_\mathrm{c} P_\mathrm{e}\le 2^{-s}\). Note that we only need to consider b values up to an upper bound slightly larger than s because \(P_\mathrm{c} P_\mathrm{e}<1\) and \(\sum 2^{-t}\) converges as \(t\rightarrow \infty \). In addition, we observe that the smallest T for any fixed B (which we call \(T_B\)) can be quickly determined since \(P_\mathrm{overall}\) strictly decreases when T grows. Thus, \((T, B)\) can be efficiently determined through pruning when examining \(B=2,3,\dots ,\lceil T_B/N \rceil \).
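The guided search can be sketched as follows. This is only an illustration under stated assumptions: it takes \(P_\mathrm{c}\) to be the hypergeometric probability that exactly t of the b faulty gates fall among the \(T-BN\) checked gates, and bounds \(P_\mathrm{e}\) by a union bound over the N buckets. The function names and the cap `b_max` on b are hypothetical, and the sum over t follows the range printed above.

```python
from fractions import Fraction
from math import comb


def p_check(T, B, N, b, t):
    """Assumed P_c: exactly t of the b faulty gates land among the T - B*N checked gates."""
    checked = T - B * N
    good_checked = checked - t
    if t > b or good_checked < 0 or good_checked > T - b:
        return Fraction(0)
    return Fraction(comb(b, t) * comb(T - b, good_checked), comb(T, checked))


def p_bucket(B, N, faulty_left):
    """Assumed union bound on P_e: some bucket of size B holds only faulty gates."""
    if faulty_left < B:
        return Fraction(0)
    return min(Fraction(1), Fraction(N * comb(faulty_left, B), comb(B * N, B)))


def failure_bound(T, B, N, b_max):
    """max over b of sum_{t=1}^{b} 2^{-t} * P_c * P_e, with b capped at b_max."""
    worst = Fraction(0)
    for b in range(1, b_max + 1):
        total = Fraction(0)
        for t in range(1, b + 1):
            total += Fraction(1, 2 ** t) * p_check(T, B, N, b, t) * p_bucket(B, N, b - t)
        worst = max(worst, total)
    return worst


def smallest_T(B, N, s, b_max):
    """Linear scan for T_B; a binary search also works if the bound decreases in T."""
    T = B * N + 1
    while failure_bound(T, B, N, b_max) > Fraction(1, 2 ** s):
        T += 1
    return T


# Toy parameters chosen only to keep the scan short; they are not from the paper.
toy_T = smallest_T(B=2, N=4, s=5, b_max=8)
```

Running `smallest_T` for increasing B and discarding any B whose \(BN\) already exceeds the best T found so far gives the pruning described above. Exact fractions keep the comparison against \(2^{-s}\) free of rounding, at the cost of speed for circuit-sized N.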

6.4 A Fallacy and a Tight Bound

Define \(\kappa =T/N\). Prior works claimed \(O(\kappa )=O(T/N)=O((skN/\log N)/N)=O(sk/\log N)\), implying \(\kappa \rightarrow 0\) when \(N\rightarrow +\infty \) [11, 12, 36]. However, we found that this is not the case. More precisely, T should be \(O(\kappa \cdot N+skN/\log N)\) and 2 is a tight lower bound on \(\kappa \). That is, \(\kappa \le 2\) cannot be achieved without compromising security, while any \(\kappa >2\) is securely achievable (using our protocol) if N is large enough. However, the formal proof of this seemingly intuitive result is nontrivial.

Theorem 6.2

Let \(\kappa =T/N\) and \({{\mathop {\Pr }\nolimits _{{\textit{overall}}}}}\) be the overall success rate of \(P_1\) attacking LEGO-style cut-and-choose.

  1. If \(\kappa \le 2\), there exists a constant \(c>0\) and an integer \(N_0\) such that \({{\mathop {\Pr }\nolimits _{{\textit{overall}}}}}>c\) for all \(N>N_0\).

  2. For every statistical security parameter s and computational security parameter k, there exists an integer \(N_0\) such that the protocol of Sect. 5.1 with a \(\kappa >2\) securely computes all circuits of size \(N>N_0\).

Due to the page limit, the proof of Theorem 6.2 is deferred to Appendix B of the full paper [48].

7 Evaluation

Measurement Methodology. We ran experiments on Amazon EC2 (instance type: c4.2xlarge) running Ubuntu Linux in both the LAN (2.5 Gbps, \({<1}\) ms latency) and WAN (200 Mbps, 20 ms latency) network settings. Our IHash parameters are listed in Fig. 7. As our comparison baselines, we chose WMK [42] and TinyLEGO [12, 37], two implementations of single-execution setting protocols representing the state of the art. All comparisons are aligned on the same hardware and network environment, based on single-threaded executions. We include results for both 88- and 127-bit computational security for our protocols, but anticipate that the running times for 127-bit computational security would drop significantly if processors with AVX512 instructions become available.

Fig. 7. IHash parameters.

Fig. 8. Microbenchmarks. (The units are either microseconds or bytes. CPU timings do not include network time. Timings are averaged over \(10^6\) executions.)

Microbenchmarks. We first measured the performance of our protocol over seven basic tasks (see Fig. 8):

  1. Garble: This generates three wires (including the wire permutation messages and corresponding i-hashes) and a garbled truth table for AND.

  2. Check: This includes evaluating a garbled gate and verifying the results with the three revealed wire permutation bits.

  3. Evaluate: This includes evaluating a garbled truth table.

  4. Solder: This includes soldering three wires of a garbled AND gate.

  5. \(P_1\)’s Input: This includes generating a fresh wire and sending the wire-label associated with an input bit of \(P_1\).

  6. \(P_2\)’s Input: This includes generating 40 fresh wires without the wire permutation messages and running 40 extended OTs.

  7. Output: This includes revealing the wire permutation messages of an output wire if \(P_2\) should learn the output (or sending the output wire-label obtained from evaluation if \(P_1\) should learn the output).

We have also compared the performance of the basic procedures between our approach and WMK’s [42] (Fig. 9). While our speed of processing logical AND gates is about 2.5–5x slower, we outperform WMK’s highly optimized circuit input/output handling mechanism by 2–200x in the LAN setting and 3–75x in the WAN setting, which demonstrates a clear advantage of the LEGO approach over traditional cut-and-choose protocols. Note that as the communication cost becomes the bottleneck in the WAN setting, the performance gap of a task will approach the bandwidth requirement ratio of the task between the two protocols.

Fig. 9. Microbenchmark comparisons with WMK [42]. The units are either microsecond/wire or microsecond/gate. Timings are wall-clock time. Security parameters are aligned at \(s=40, k=127\), and assuming \(B=5\) in our protocol.

Applications. Figure 10 shows how our protocol compares to the baselines over several end-to-end oblivious applications running with \(2^{-40}\) statistical security. These include

  1. AES. It encrypts one block using AES. The circuit takes a 128-bit input from each party and computes \(N=6800\) AND gates. We set \(B=5\), \(T=39535\).

  2. DES. It encrypts one block using DES. The circuit takes a 768-bit key from \(P_1\) and a 64-bit message from \(P_2\). Since \(N=18175\), we set \(B=5\), \(T=97593\).

  3. Comparison. It compares two 10K-bit integers so it takes 10K bits from \(P_1\) and 10K bits from \(P_2\) and outputs 1 bit to indicate which number is larger. It involves \(N=10\)K AND gates, so we set \(B=5\) and \(T=55973\).

  4. Hamming Distance. This computes the Hamming distance between two 2048-bit strings. The circuit takes a 2048-bit input from each party and computes \(N=4087\) AND gates, so we set \(B=5\), \(T=25432\).
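As a quick sanity check on these parameters, the short sketch below derives, for each application, the number of garbled gates left for checking (\(T-BN\), as in Sect. 6.3) and the replication factor \(\kappa =T/N\) of Sect. 6.4. The dictionary and function names are illustrative, and \(N=10\)K is read as 10000, which is an assumption.

```python
# (N, B, T) as listed above; "10K" for Comparison is interpreted as 10000 (assumption).
APPLICATIONS = {
    "AES": (6800, 5, 39535),
    "DES": (18175, 5, 97593),
    "Comparison": (10000, 5, 55973),
    "Hamming Distance": (4087, 5, 25432),
}


def cut_and_choose_split(n_gates, bucket_size, total_garbled):
    """Return (gates left for checking, kappa = T / N) for one parameter choice."""
    evaluated = bucket_size * n_gates      # gates placed into the N buckets
    checked = total_garbled - evaluated    # the remaining T - B*N gates are checked
    kappa = total_garbled / n_gates
    return checked, kappa


summary = {name: cut_and_choose_split(*params) for name, params in APPLICATIONS.items()}

for name, (checked, kappa) in summary.items():
    print(f"{name}: checked={checked}, kappa={kappa:.3f}")
```

With \(B=5\) every \(\kappa \) is necessarily above 5, so these circuit sizes are still far from the asymptotic regime in which \(\kappa \) approaches 2.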

Our approach is more than 2x faster than TinyLEGO [37] in the LAN setting. Our measurements of their AES and DES protocols are in line with the numbers reported in their paper, though we noticed that theirs run much slower on applications with longer inputs and outputs such as Compare and Hamming (whose performance numbers were not included in their paper). We suspect (and one of the TinyLEGO implementers confirmed) that this is probably due to some implementation issues and the use of the input authenticator mechanism required in TinyLEGO. On the flip side, we note that TinyLEGO uses 20–50% less bandwidth.

In comparison with WMK, our protocol is about 4–5x slower when running AES and DES over the LAN. However, for input/output intensive applications like Compare and Hamming, the overall performance of our protocol is very close, and can even be 30–80% faster than WMK, especially in the WAN setting.

Fig. 10. Applications performance. (Measurements are averaged over 10 executions. The numbers don’t include the setup cost, i.e., for the base OTs and ZK proof of Step 5b.)

Cut-and-Choose Larger Circuit Components. We used AES as an example application to study the potential benefit of JIMU’s support of cut-and-choosing larger circuit components. We adopted the strategy of Huang et al. [19] with which the only components using non-XOR gates are SubBytes (each has 34 ANDs if Boyar and Peralta’s SubByte circuit is used [6]). We compared the approach where SubBytes are the basic unit of cut-and-choose with that of cut-and-choosing individual ANDs. Figure 11 shows the detailed comparison on how the protocol parameters and performance are affected by increasing the size of basic cut-and-choose units. For running the same application (a column of Fig. 11), cut-and-choosing SubBytes may require using larger B than cut-and-choosing ANDs because N will be 34x smaller. However, much overhead for enabling wire-soldering can be saved for all internal wire connections. We observe 45%–60% savings in time and 50%–60% savings in bandwidth when SubBytes are treated as the basic cut-and-choose units while the savings increase as the number of AES circuits involved in the application increases.

Fig. 11. Benefits of cut-and-choosing larger components.