
1 Introduction

Secure Multiparty Computation (MPC) enables untrusting parties to compute a function of their private inputs while revealing only the function output. In this work, we consider semi-honest MPC protocols that use the classic trick of Beaver to evaluate Boolean circuits by preprocessing ‘multiplication triples’ [Bea92].

In such protocols, \(\mathtt {XOR}\) gates are ‘free’ (i.e., require no interaction), but each \(\mathtt {AND}\) gate consumes a distinct multiplication triple. To generate these triples, the parties use a more expensive MPC protocol that can be run ahead of time in a preprocessing phase. The communication in this phase is proportional to the number of triples and is usually the performance bottleneck. Hence, if we reduce the number of required triples, then we significantly improve performance.

Because one triple is needed per \(\mathtt {AND}\) gate, protocols waste significant work if the computed function has conditional behavior, e.g. as the result of an IF program statement. Each gate requires a distinct triple, even if the output of the gate is entirely discarded by the function’s conditional behavior.

Our protocol re-uses multiplication triples across conditional branches. A single triple can support any number of \(\mathtt {AND}\) gates, so long as the gates occur in mutually exclusive program branches. This re-use does require that the parties hold additional correlated randomness, but the parties can generate this randomness efficiently. Our approach greatly decreases total communication and hence improves performance.

1.1 High Level Intuition

Multiplication triples typically cannot be re-used (Sect. 2.5 reviews multiplication triples in detail). A triple essentially acts as a one-time-pad on cleartext values in the circuit: since no strict subset of parties knows the values in the triple, it is secure to use the triple to mask cleartext values. However, if we use the same triple for two different gates, then, as with any one-time-pad, we violate security. We show how to work around this limitation.

Consider two conditionally composed circuits \(C^0\) and \(C^1\), both with n \(\mathtt {AND}\) gates. For sake of example, suppose \(C^0\) is the active branch, but suppose the parties do not know and should not learn this fact. We re-use the same triples to evaluate gates in both \(C^0\) and \(C^1\) by carefully applying secret shared masks to the triples. For the inactive branch \(C^1\), the parties mask the shares with \(\mathtt {XOR}\) shares of uniform masks, randomizing the triples and preventing us from breaking the security of one-time-pad. By randomizing the triples, we violate the correctness of \(\mathtt {AND}\) gates on the inactive branch, but this is of no concern: the output of each inactive \(\mathtt {AND}\) gate is ultimately discarded. For the active branch \(C^0\), the parties use the triples ‘as is’, meaning the active branch is evaluated normally. Of course, the parties should not know which branch is inactive, so from the perspective of the parties it should appear plausible that either branch could have used randomized triples. To achieve this, for the active branch the parties also \(\mathtt {XOR}\) masks onto the triples, but in this case each mask is a sharing of zero: hence the \(\mathtt {XOR}\)ing is a no-op.

The problem of amortizing triples across branches thus reduces to the problem of generating secret shared masks, both uniform and ‘all-zero’. We present techniques for efficiently generating these masks, the most general of which is based on oblivious vector-scalar multiplication, achieved by a small number of 1-out-of-2 oblivious transfers. The crucial point is that the protocols for generating masks require far less communication than protocols for generating triples. Thus, we decrease communication and improve performance.

1.2 Advantage Over [HKP20]

Recent work showed an improvement similar to ours: [HKP20] showed that oblivious transfers can be re-used across conditional branches in the classic GMW protocol (see Sect. 2.4 for a review).

However, [HKP20] has one significant disadvantage: their performance improvement depends on circuit topology. Efficient GMW implementations minimize latency by organizing circuits into layers of gates. The input wires into each layer are the outputs of previous layers only, and hence all gates in a particular layer can be executed simultaneously. This strategy yields latency proportional to the circuit’s multiplicative depth instead of to the number of gates.

Due to this important optimization, [HKP20]’s performance improvement is limited by the ‘relative alignment’ of the layers across branches: two branches are highly aligned if each of their respective layers has a similar number of \(\mathtt {AND}\) gates. Their protocol issues oblivious transfers that simultaneously run one gate per branch and hence cannot optimize gates that occur in different layers. The relative alignment of branches is dependent on the target application. [HKP20] suggests resorting to compiler technologies to extract more performance.

Our approach does not depend on topology. Instead, we depend only on the number of \(\mathtt {AND}\) gates in each branch. The parties require enough triples to handle the maximum number of \(\mathtt {AND}\) gates across the branches. It is difficult to analytically quantify our improvement over [HKP20] without a specific application in mind, but our experiments show that the improvement is significant. We ran both approaches across a variety of topologies, and on average we improved communication by \(2.6\times \) (see Sect. 8). In addition to concretely outperforming [HKP20], our approach also demonstrates that conditional improvement is possible for a different class of protocols (i.e. those based on triples), and hence is of independent interest.

1.3 Our Contributions

  • Efficient Re-use of Beaver Triples. Our MPC protocol is secure against the semi-honest corruption of up to \(p-1\) parties. The protocol re-uses triples across branches and requires a number of triples proportional only to the size of the longest execution path rather than to the size of the entire circuit.

  • Topology-Independent Improvement. Unlike [HKP20], our improvement is independent of the topology of the conditional branches.

  • Implementation and Evaluation. We implemented our approach in C++ and report performance (see Sect. 8). For 2PC and a circuit with 16 branches, we improve communication over state-of-the-art [HKP20] on average by \(2.6\times \) and over a standard triple-based protocol (i.e., without our conditional improvement) by \(12\times \).

2 Preliminaries

2.1 Notation

  • p denotes the number of parties.

  • b denotes the number of branches.

  • Subscript notation associates variables with parties. E.g., \(a_i\) is a variable held by party \(P_i\).

  • G denotes a pseudo-random generator (PRG).

  • \(\kappa \) is the computational security parameter (e.g. 128).

  • t denotes the ‘taken’ branch in a conditional, i.e. the branch that is active during the oblivious execution. \(\bar{t}\) denotes the inactive branch.

  • Superscript notation associates variables with a particular branch. E.g., \(x^0\) is associated with branch 0 while \(x^1\) is associated with branch 1.

  • \(\in _\$\) denotes that the left hand side is uniformly drawn from the right hand set. E.g., \(x \in _\$\{0, 1\}\) denotes that x is a uniform bit.

  • \(\triangleq \) denotes that the left hand side is defined to be the right hand side.

  • We manipulate vectors and bitstrings (i.e., vectors of bits):

    • Variables denoting vectors are indicated with bold notation, e.g. \(\varvec{a}\). If we wish to explicitly write out a vector, we use parenthesized, comma-separated values, e.g. \((a, b, ..., y, z)\).

    • We index vectors with brackets and use 1-based indexing, e.g. \(\varvec{a}[1]\).

    • When clear from context, n denotes vector length.

    • When two bitstrings are known to have the same length, we use \(\oplus \) to denote the bitwise \(\mathtt {XOR}\) sum:

      $$ \varvec{a} \oplus \varvec{b} = (\varvec{a}[1], ..., \varvec{a}[n]) \oplus (\varvec{b}[1], ..., \varvec{b}[n]) \triangleq (\varvec{a}[1]\oplus \varvec{b}[1],..., \varvec{a}[n]\oplus \varvec{b}[n]) $$
    • We indicate a bitwise vector scalar product by writing the scalar to the left of the vector:

      $$ a\varvec{b} = a(\varvec{b}[1], ..., \varvec{b}[n]) \triangleq (a(\varvec{b}[1]),..., a(\varvec{b}[n])) $$
  • We manipulate \(\mathtt {XOR}\) secret shares. Section 2.2 presents our secret share notation and reviews basic properties.

2.2 \(\mathtt {XOR}\) Secret Shares

Our main contribution is a Beaver-triple based construction for efficient conditional branching. Additionally, we review prior work that is based on the classic GMW protocol. Both techniques are based on \(\mathtt {XOR}\) secret shares. Thus, we briefly establish notation for \(\mathtt {XOR}\) shares and review their properties.

An \(\mathtt {XOR}\) secret sharing held amongst p parties is a vector of bits \((x_1, ..., x_p)\) where each party \(P_i\) holds \(x_i\). We refer to the full vector as a sharing and to the individual bits held by parties as shares. The semantic value of a sharing (i.e., the cleartext value that the sharing represents) is the \(\mathtt {XOR}\) sum of its shares. If the semantic value of a sharing \((x_1, ..., x_p)\) is a bit x, i.e. \(x_1 \oplus ... \oplus x_p = x\), then we use the shorthand \([\![x ]\!]\) to denote the sharing:

$$\begin{aligned}{}[\![x ]\!]&\triangleq (x_1, ..., x_p)&\text {such that}~x_1 \oplus ... \oplus x_p = x \end{aligned}$$

Typically, sharings are used in a context where no strict subset of parties knows the semantic value of the sharing. Nevertheless, parties can easily perform homomorphic linear operations over \(\mathtt {XOR}\) sharings.

  • Parties \(\mathtt {XOR}\) two sharings by locally \(\mathtt {XOR}\)ing their respective shares:

    $$\begin{aligned}{}[\![x ]\!] \oplus [\![y ]\!]&= (x_1, ..., x_p) \oplus (y_1, ..., y_p)&\text { Defn. sharing}\\&= (x_1 \oplus y_1, ..., x_p \oplus y_p)&\text { Defn. vector } \mathtt {XOR}\\&= [\![x \oplus y ]\!]&\mathtt {XOR}\ \text { commutes, assoc., defn. sharing} \end{aligned}$$
  • Parties \(\mathtt {AND}\) a sharing with a public constant by locally scaling each share.

    $$\begin{aligned} c[\![x ]\!]&= c(x_1, ..., x_p)&\text {Defn. sharing}\\&= (cx_1, ..., cx_p)&\text {Defn. vector scalar product}\\&= [\![cx ]\!]&\mathtt {AND}\ \text { distributes over } \mathtt {XOR}\text { , defn. sharing} \end{aligned}$$
  • Parties encode public constants as sharings by letting \(P_1\) take the constant as his/her share and letting all other parties take 0 as their shares.

    $$\begin{aligned}{}[\![c ]\!]&= (c, 0, ..., 0)&0 \text { identity, defn. sharing} \end{aligned}$$

    This allows the parties to \(\mathtt {XOR}\) sharings with public constants.

A party can easily share her private bit x with the parties. She uniformly draws p bits, with the constraint that the p bits \(\mathtt {XOR}\) to x. She then distributes \([\![x ]\!]\) amongst the parties.

Parties can compute sharings of uniform values. To draw a uniform sharing, each party locally draws a uniform share. In our protocols, we overload \(\in _\$\) notation to draw sharings: for example, \([\![x ]\!] \in _\$\{0, 1\}\) indicates that each party \(P_i\) draws a uniform share \(x_i\).

Finally, parties can reconstruct the semantic value of a sharing. To do so, each party broadcasts his/her share (or sends it to a specified output party). Upon receiving all shares, each party locally \(\mathtt {XOR}\)s the shares.

For security, we often require that each party’s share be uniformly chosen. We point out where shares are uniform when relevant.
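As a concrete illustration of the sharing, linear-operation, and reconstruction procedures above, consider the following Python sketch. It is an insecure local simulation (all parties' shares live in one list), and the function names are ours, not part of any protocol specification:

```python
import secrets

def share(x, p):
    """XOR-share bit x among p parties: p-1 uniform bits plus a
    correcting share so that all shares XOR to x."""
    sh = [secrets.randbelow(2) for _ in range(p - 1)]
    acc = x
    for s in sh:
        acc ^= s
    return sh + [acc]

def reconstruct(sh):
    """Each party broadcasts its share; XORing recovers the semantic value."""
    acc = 0
    for s in sh:
        acc ^= s
    return acc

def xor_sharings(xs, ys):
    """Local XOR of respective shares yields a sharing of x XOR y."""
    return [xi ^ yi for xi, yi in zip(xs, ys)]

def scale_sharing(c, xs):
    """AND with a public constant: each party scales its own share."""
    return [c & xi for xi in xs]

def constant_sharing(c, p):
    """Encode a public constant: P_1 takes c, all other parties take 0."""
    return [c] + [0] * (p - 1)
```
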

Fig. 1. The functionality defining \(\mathtt {VS}\) gate semantics. \(\mathtt {VS}\) gates allow parties to multiply a sharing of a bitstring by a sharing of a scalar.

2.3 Vector Scalar Multiplication

Next, we review how parties operate non-linearly over sharings. In contrast to typical approaches that consider \(\mathtt {AND}\) gates, we instead consider more general vector scalar multiplication gates, which we call \(\mathtt {VS}\) gates. We consider these more expressive gates because they are needed to review prior work [HKP20] and because we use \(\mathtt {VS}\) gates in our constructions.

To begin, we extend the notion of sharings to vectors. Specifically, we define the sharing of a vector to be a vector of sharings:

$$ [\![\varvec{x} ]\!] = [\![(\varvec{x}[1], ..., \varvec{x}[n]) ]\!] \triangleq ([\![\varvec{x}[1] ]\!], ..., [\![\varvec{x}[n] ]\!]) $$

Suppose we wish to scale a shared vector \(\varvec{x}\) by a shared bit s. That is, we wish to compute the scalar product \([\![s\varvec{x} ]\!]\). Unlike linear operations, this vector scalar multiplication requires the parties to communicate.

[HKP20] showed that parties can use \(p(p-1)\) oblivious transfers (OTs) to implement a \(\mathtt {VS}\) gate. We review their \(\mathtt {VS}\) protocol at a high level; Fig. 1 specifies the protocol functionality. For simplicity, we focus on \(p=2\) parties and length-2 vectors, but the approach generalizes to arbitrary p and n.

Suppose two parties \(P_1, P_2\), holding sharings \([\![s ]\!], [\![(a, b) ]\!]\), wish to compute \([\![(sa, sb) ]\!]\): semantically, they wish to scale the vector \((a, b)\) by the bit s.

Observe the following equality over the desired semantic value:

$$\begin{aligned} (sa, sb)&= s(a, b)&\text {Distribute}\\&= (s_1 \oplus s_2)(a_1 \oplus a_2, b_1 \oplus b_2)&\text {Defn. sharing}\\&= (s_1a_1 \oplus s_1a_2 \oplus s_2a_1 \oplus s_2a_2, s_1b_1 \oplus s_1b_2 \oplus s_2b_1 \oplus s_2b_2)&\text {Distribute}\\&= s_1 (a_1, b_1) \oplus s_1 (a_2, b_2) \oplus s_2 (a_1, b_1) \oplus s_2 (a_2, b_2)&\text {Group} \end{aligned}$$

The first and fourth summands can be computed locally by the respective parties. Thus, we need only show how to compute \(s_1(a_2, b_2)\) (the remaining third summand is computed symmetrically). To compute this vector \(\mathtt {AND}\), the parties perform a single 1-out-of-2 OT of length-2 secrets. Here, \(P_2\) plays the OT sender and \(P_1\) the receiver. \(P_2\) draws two uniform bits \(x, y \in _\$\{0, 1\}\) and allows \(P_1\) to choose between the following two secrets:

$$\begin{aligned} (x, y)\qquad (x \oplus a_2, y \oplus b_2) \end{aligned}$$

\(P_1\) chooses based on \(s_1\) and hence receives \((x \oplus s_1a_2, y \oplus s_1b_2)\). \(P_2\) uses the vector (xy) as her share of this summand. Thus, the parties hold \([\![s_1(a_2, b_2) ]\!]\).

Put together, the full vector multiplication s(ab) uses only two 1-out-of-2 OTs of length-2 secrets. \(\mathtt {VS}\) gates generalize to arbitrary numbers of parties and vector lengths: a vector scaling of n elements between p parties requires \(p(p-1)\) 1-out-of-2 OTs of length-n secrets.
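The two-party \(\mathtt {VS}\) protocol above can be sketched as follows, with the 1-out-of-2 OT replaced by an idealized in-the-clear stub (a real instantiation would use a cryptographic OT protocol; all names are ours):

```python
import secrets

def ot(secret0, secret1, choice):
    """Idealized 1-out-of-2 OT: the receiver learns exactly one secret.
    This stub stands in for a real cryptographic OT protocol."""
    return secret1 if choice else secret0

def vs_gate(s_shares, vec_shares):
    """Two-party VS gate: from shares (s1, s2) of a bit s and per-element
    shares of a vector x, compute shares of s*x using one OT per cross
    term (two OTs in total, each of length-n secrets)."""
    s1, s2 = s_shares
    v1 = [x1 for x1, _ in vec_shares]   # P1's vector shares
    v2 = [x2 for _, x2 in vec_shares]   # P2's vector shares
    n = len(vec_shares)

    # Locally computable summands: s1*v1 (held by P1) and s2*v2 (by P2).
    p1 = [s1 & x for x in v1]
    p2 = [s2 & x for x in v2]

    # Cross term s1*v2: P2 offers (pads, pads XOR v2); P1 chooses by s1.
    pads = [secrets.randbelow(2) for _ in range(n)]
    recv = [ot(pads[i], pads[i] ^ v2[i], s1) for i in range(n)]
    p1 = [x ^ y for x, y in zip(p1, recv)]
    p2 = [x ^ y for x, y in zip(p2, pads)]

    # Cross term s2*v1: roles reversed.
    pads = [secrets.randbelow(2) for _ in range(n)]
    recv = [ot(pads[i], pads[i] ^ v1[i], s2) for i in range(n)]
    p2 = [x ^ y for x, y in zip(p2, recv)]
    p1 = [x ^ y for x, y in zip(p1, pads)]

    return list(zip(p1, p2))            # per-element shares of s*x
```
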

\(\mathtt {VS}\) gates are important for our constructions. We present a modification to the above protocol, used once per conditional branch, that is optimized for scalar multiplication of long vectors (see Sect. 6.2). This modification is similar to techniques in [KK13, ALSZ13] and reduces communication by up to half.

2.4 Efficient Conditionals from \(\mathtt {VS}\) Gates: [HKP20] Review

[HKP20] was the first work to significantly reduce the cost of branching in the multi-party setting. Their \(\mathtt {MOTIF}\) protocol extends the classic GMW protocol with \(\mathtt {VS}\) gates in order to amortize oblivious transfers across conditional branches. We review how \(\mathtt {VS}\) gates enable this amortization.

For simplicity, consider two branches computed by two parties. Since the two branches are conditionally composed, one branch is active and one is inactive.

\(\mathtt {MOTIF}\) ’s key invariant, set up by the protocol’s circuit gadgets, is that on each wire of the inactive branch the parties hold a sharing \([\![0 ]\!]\), whereas on the active branch they hold valid sharings. \(\mathtt {XOR}\) gates immediately propagate this invariant: on the inactive branch, \(\mathtt {XOR}\) gates output \([\![0 ]\!]\), while on the active branch \(\mathtt {XOR}\) gates output valid sharings.

Next, we review how \(\mathtt {VS}\) gates make use of and propagate the invariant. Let \([\![a^0 ]\!], [\![b^0 ]\!]\) be sharings held on wires in branch 0 and \([\![a^1 ]\!], [\![b^1 ]\!]\) be sharings held on wires in branch 1. Suppose the parties wish to compute both \([\![a^0b^0 ]\!]\) and \([\![a^1b^1 ]\!]\). Despite the fact that the parties compute two \(\mathtt {AND}\) gates, they need only two 1-out-of-2 OTs. Let t denote the active branch. Hence, \(a^{\bar{t}}\) and \(b^{\bar{t}}\) are both 0.

Observe the following equalities:

$$\begin{aligned} (a^t \oplus a^{\bar{t}})b^t&= (a^t \oplus 0)b^t = a^tb^t\\ (a^t \oplus a^{\bar{t}})b^{\bar{t}}&= (a^t\oplus 0)0 = 0 \end{aligned}$$

Thus, computing both \([\![(a^t \oplus a^{\bar{t}})b^t ]\!]\) and \([\![(a^t \oplus a^{\bar{t}})b^{\bar{t}} ]\!]\) propagates the invariant: the active branch receives the correct sharing while the inactive branch receives \([\![0 ]\!]\). These two products reduce to a single vector-scalar product computable by a \(\mathtt {VS}\) gate (see Fig. 1):

$$ [\![(a^t \oplus a^{\bar{t}})(b^t, b^{\bar{t}}) ]\!] $$

Thus, \(\mathtt {MOTIF}\) computes two \(\mathtt {AND}\) gates for the price of one. This improvement generalizes to an arbitrary number of branches.
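The two equalities underlying this amortization can be sanity-checked in cleartext. The following sketch (names ours) enforces the all-zeros invariant on the inactive branch and reproduces both gate outputs from one vector-scalar product:

```python
def motif_and_pair(a, b, t):
    """Cleartext check of MOTIF's amortized AND: a[i], b[i] are the wire
    values on branch i, and the invariant forces the inactive branch's
    wires to 0. One vector-scalar product (a^0 XOR a^1) * (b^0, b^1)
    yields the correct product on the active branch and 0 on the other."""
    assert a[1 - t] == 0 and b[1 - t] == 0   # the MOTIF invariant
    scalar = a[0] ^ a[1]                     # equals a^t under the invariant
    return [scalar & b[0], scalar & b[1]]
```
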

Branch Layer Alignment. As discussed in Sect. 1.2, the \(\mathtt {MOTIF}\) protocol is dependent on circuit topology. The less aligned the layers of the branches are (branches that are highly aligned have similar numbers of \(\mathtt {AND}\) gates in each layer), the less the circuit benefits from \(\mathtt {MOTIF}\).

In the above example, the parties issued two OTs to implement the two \(\mathtt {AND}\) gates simultaneously. The parties can only perform this optimization if inputs for both gates are available. If not, the parties cannot amortize the OTs. Hence, gates in different layers cannot share OTs in layer-by-layer evaluation.

In the p-party protocol, in each layer \(\mathtt {MOTIF}\) eliminates all OTs except for a total of \(p(p-1)\cdot \max _i(w^i)\) OTs, where \(w^i\) is the number of \(\mathtt {AND}\) gates in the current layer of branch i. In contrast, our technique does not depend on the circuit’s topology: its cost is proportional only to the circuit’s longest execution path.

Fig. 2. The Beaver Triple preprocessing functionality, \(\mathtt {TripleGen}\).

2.5 Semi-honest Triple-Based Protocol Review

In this work, we amortize Beaver triples across conditional branches. We thus review how triples enable non-linear operations over \(\mathtt {XOR}\) sharings.

Suppose the parties hold sharings \([\![x ]\!]\) and \([\![y ]\!]\) and wish to compute a uniform sharing \([\![xy ]\!]\). Suppose further that the parties have a Beaver triple: they have three uniform sharings \([\![a ]\!], [\![b ]\!], [\![ab ]\!]\) where \(a, b \in _\$\{0, 1\}\) are uniform bits unknown to any strict subset of parties. First, the parties locally compute \([\![a \oplus x ]\!]\) and \([\![b \oplus y ]\!]\), then reconstruct the semantic values \(a \oplus x\) and \(b \oplus y\) by broadcasting shares. This is secure: \(a \oplus x\) leaks nothing about x because a is uniform and secret (and similarly for y). The parties can now compute \([\![xy ]\!]\) as follows:

$$\begin{aligned} (a \oplus x)(b \oplus y) \oplus (a \oplus x)[\![b ]\!] \oplus (b \oplus y)[\![a ]\!] \oplus [\![ab ]\!] = [\![xy ]\!] \end{aligned}$$

This protocol is simple and efficient: the parties broadcast only two bits per \(\mathtt {AND}\) gate. However, because the triple values a and b are used as one-time-pads on semantic values, one triple is typically needed per gate. Thus, the parties must preprocess many triples according to the functionality in Fig. 2. Computing this functionality is often the most expensive step in triple-based protocols. For example, triples might be generated via the classic GMW protocol, requiring \(p(p-1)\) OTs per triple. In this work, we show a technique that re-uses triples across conditional branches and hence decreases overall cost.
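The triple-based \(\mathtt {AND}\) gate can be sketched as an insecure local simulation (sharings are Python lists holding all parties' shares; helper names are ours):

```python
import secrets

def share(x, p):
    """XOR-share bit x among p parties: p-1 uniform shares plus a correction."""
    sh = [secrets.randbelow(2) for _ in range(p - 1)]
    acc = x
    for s in sh:
        acc ^= s
    return sh + [acc]

def reconstruct(sh):
    """Broadcast-and-XOR: recover the semantic value of a sharing."""
    acc = 0
    for s in sh:
        acc ^= s
    return acc

def beaver_and(x_sh, y_sh, triple):
    """Compute a sharing of x AND y from sharings of x, y and a Beaver
    triple ([a], [b], [ab]); only d = a^x and e = b^y are made public."""
    a_sh, b_sh, ab_sh = triple
    d = reconstruct([ai ^ xi for ai, xi in zip(a_sh, x_sh)])  # a XOR x
    e = reconstruct([bi ^ yi for bi, yi in zip(b_sh, y_sh)])  # b XOR y
    # Locally: d*[b] XOR e*[a] XOR [ab]; the public constant d*e is
    # folded into P_1's share.
    out = [(d & bi) ^ (e & ai) ^ abi
           for ai, bi, abi in zip(a_sh, b_sh, ab_sh)]
    out[0] ^= d & e
    return out
```
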

3 Related Work

We review related work, focusing on works that optimize secure evaluation of conditional branches or that use multiplication triples in the malicious model.

\(\mathtt {MOTIF}\). The most closely related work is \(\mathtt {MOTIF}\)  [HKP20]. \(\mathtt {MOTIF}\) amortizes oblivious transfers across conditional branches in the classic semi-honest GMW protocol [GMW87]. We reviewed this approach in detail in Sect. 2.4, explained why our approach outperforms \(\mathtt {MOTIF}\) in Sect. 1.2, and present experimental comparisons between the two approaches in Sect. 8.

Stacked Garbling. Recent works demonstrated similar conditional improvements for the garbled circuit (GC) technique [Kol18, HK20b, HK20a]. [Kol18, HK20b] reduced communication in settings where one party knows which branch is active. [Kol18] is motivated by the use case where the GC generator knows the active branch, such as when evaluating one of several database queries. [HK20b] is motivated by zero knowledge proofs. [HK20a] superseded these prior works and for the first time showed that communication can be greatly improved even if no party knows which branch is active.

These works’ “stacking” technique does not have an obvious analog for interactive multiparty protocols, so different techniques are needed, such as those explored in [HKP20] and in this work. However, our approach follows the basic idea of material re-use introduced by Stacked Garbling: the expensive material is safely re-used in the (possibly incorrect) evaluation of inactive branches, whose output is obliviously discarded.

Universal Circuits. We improve branching via cryptographic techniques. Another approach instead recompiles conditionals into a new form. Universal circuits (UCs) are programmable constructions that can evaluate any circuit up to a given size n. Branches can be compiled to one UC, potentially amortizing cost. At runtime, the UC can be programmed to compute the active branch.

Decades after Valiant’s original work [Val76], UC enjoyed renewed interest due to its relevance to MPC, and UC constructions have steadily improved [KS08, LMS16, GKS17, AGKS19, KS16, ZYZL18]. Even with these improvements, representing conditional branches with UCs is often impractical. The state-of-the-art UC construction applied to a circuit with n gates still has factor \(3 \log n\) overhead [LYZ+20]. Thus, UC-based conditional evaluation is often more expensive than simply evaluating the conditional naïvely. UC-based branching is superseded by cryptographic techniques such as Stacked Garbling, \(\mathtt {MOTIF}\), and this work.

Maliciously Secure Triple-Based Protocols. We present an improved triple-based semi-honest protocol. Two exciting and related lines of work explore triple-based protocols in the malicious model. These two lines differ primarily in how they preprocess triples. One line generates triples using homomorphic encryption [BDOZ11, DPSZ12, DKL+13, KPR18] while another generates them using oblivious transfer [NNOB12, LOS14, FKOS15, KOS16, CDE+18]. To achieve malicious security, these methods rely on expensive primitives such as zero knowledge proofs and cut-and-choose. As a result, preprocessing is expensive.

Amortizing triples in these protocols would be an important improvement. While we make no claims in the malicious model, malicious improvements have historically been preceded by similar improvements in the semi-honest model. We leave investigating triple amortization in the malicious model as future work.

4 Technical Overview

As reviewed in Sect. 2.5, Beaver triples can efficiently and securely implement \(\mathtt {AND}\) gates. In general, triples cannot be re-used, and hence a circuit with b branches each with n \(\mathtt {AND}\) gates typically requires \(n\cdot b\) triples.

As discussed in Sect. 1.1, our key observation is that triples can be re-used across conditional branches, as long as uniform \(\mathtt {XOR}\) masks are additionally applied. These masks allow us to re-use the same triple to compute b gates across b branches. Thus b branches each with n gates require only n triples, improving the number of needed triples by factor b. Our technique does require the parties to hold additional shared per-branch masks, but these masks are computed cheaply.

This section presents our protocol, \(\varPi _{\mathsf {MT}}\) (the ‘masked triple protocol’), with detail sufficient to understand our contribution. \(\varPi _{\mathsf {MT}}\) securely computes Boolean circuits among p parties, re-uses triples across conditional branches, and is secure against the semi-honest corruption of up to \(p-1\) parties. Full formal algorithms, with accompanying proofs of correctness and security, are in Sect. 5.

4.1 Re-Using Beaver Triples

For simplicity, consider only two branches, \(C^0\) and \(C^1\), and, without loss of generality, let \(C^0\) be the active branch. The parties re-use the same set of triples for both branches. For the inactive branch, the parties will mask the triples with sharings of uniform bits; on the active branch the parties will mask the triples with sharings of zeros.

Suppose the parties hold sharings \([\![x^0 ]\!], [\![y^0 ]\!]\) on branch 0 and \([\![x^1 ]\!], [\![y^1 ]\!]\) on branch 1. Suppose further that they wish to obliviously compute one of \([\![x^0y^0 ]\!]\) or \([\![x^1y^1 ]\!]\), depending on which branch is active. Let \([\![a ]\!], [\![b ]\!], [\![ab ]\!]\) be a uniform preprocessed triple. On the active branch, the parties mask \([\![a ]\!]\) and \([\![b ]\!]\) with uniform sharings of 0:

$$ [\![a ]\!] \oplus [\![0 ]\!] = [\![a ]\!] \qquad [\![b ]\!] \oplus [\![0 ]\!] = [\![b ]\!] $$

The parties use this masked triple to compute branch 0’s \(\mathtt {AND}\) gate normally: the parties compute and reconstruct \(a \oplus x^0\) and \(b \oplus y^0\), and then locally compute the correct product:

$$ (a\oplus x^0)(b\oplus y^0) \oplus (a\oplus x^0)[\![b ]\!] \oplus (b\oplus y^0)[\![a ]\!] \oplus [\![ab ]\!] = [\![x^0y^0 ]\!] $$

In contrast, on the inactive branch the parties mask their shares with uniform bits. Let \(r, s \in _\$\{0, 1\}\) be two such bits and let the parties hold uniform sharings \([\![r ]\!]\), \([\![s ]\!]\). The parties compute:

$$ [\![a ]\!] \oplus [\![r ]\!] = [\![a \oplus r ]\!] \qquad [\![b ]\!] \oplus [\![s ]\!] = [\![b \oplus s ]\!] $$

When the parties use this masked triple, they compute and reconstruct \(a \oplus r \oplus x^1\) and \(b \oplus s \oplus y^1\), and then locally compute the following expression:

$$ (a\oplus r \oplus x^1)(b\oplus s \oplus y^1) \oplus (a \oplus r\oplus x^1)[\![b \oplus s ]\!] \oplus (b\oplus s\oplus y^1)[\![a \oplus r ]\!] \oplus [\![ab ]\!] $$

The above expression does not correctly compute \([\![x^1y^1 ]\!]\), but this is irrelevant since all computations performed in the inactive branch are ultimately discarded by the circuit’s conditional behavior.

Now, consider the security of the above re-use. As discussed above, each party views the following reconstructed semantic values:

$$ a \oplus x^0\quad b \oplus y^0 \quad a \oplus r \oplus x^1 \quad b \oplus s \oplus y^1 $$

Because \(a, b, r, s\) are all uniform, this view is simulated by four uniform bits. Thus, our approach is secure. See Sect. 5.3 for a formal proof.
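The cleartext semantics of evaluating one \(\mathtt {AND}\) gate with a masked triple \([\![a \oplus r ]\!], [\![b \oplus s ]\!], [\![ab ]\!]\) can be sketched as follows (names ours). With zero masks the gate is correct; with uniform masks the output is randomized, but the broadcast values remain uniformly masked either way:

```python
def masked_triple_and(x, y, a, b, ab, r, s):
    """Cleartext semantics of one AND gate evaluated with the masked
    triple ([a^r], [b^s], [ab]). Returns (gate output, broadcast values).
    With r = s = 0 (the active branch) the output equals x*y; with
    uniform r, s (an inactive branch) the output is garbled by the
    term (a^r)(b^s) XOR ab, which is harmless since it is discarded."""
    d = a ^ r ^ x          # reconstructed first masked value
    e = b ^ s ^ y          # reconstructed second masked value
    out = (d & e) ^ (d & (b ^ s)) ^ (e & (a ^ r)) ^ ab
    return out, (d, e)
```
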

Fig. 3. The \(\mathtt {MaskGen}\) functionality provides parties with the pairs of masks needed to implement our optimization. The functionality computes two shared bitstrings: one is the all-zeros bitstring while the other is uniform. The two strings are swapped according to s, and the parties are given \([\![s ]\!]\).

Although we have shown that mask sharings allow triple amortization, we have not discussed how these sharings are computed. Figure 3 formalizes \(\mathtt {MaskGen}\), a preprocessing functionality that computes strings of masks \(M^0\) and \(M^1\) such that (1) the parties receive a uniform sharing \([\![s ]\!]\) where \(s \in _\$\{0, 1\}\), (2) \(M^s\) is a uniform sharing of all zeros, and (3) \(M^{\bar{s}}\) is a uniform sharing of random bits. During the preprocessing phase, the parties use \(\mathtt {MaskGen}\) to preprocess strings with size sufficient to mask each triple. We formalize and prove secure \(\varPi _{\mathsf {MT}}\) in the \(\mathtt {MaskGen}\)-hybrid (and \(\mathtt {TripleGen}\)-hybrid) model. Instantiations of \(\mathtt {MaskGen}\) are provided and proved secure in Sect. 6.

Entering a Conditional. \(\mathtt {MaskGen}\) constructs two bitstrings that are ordered according to a uniform bit s (the parties hold a uniform sharing \([\![s ]\!]\)). To use our approach, the parties need to appropriately ‘line up’ the masks with the branches: the active branch should use the all zeros mask and the inactive branch should use the uniform mask. We assume the parties have explicit access to a sharing of the branch condition: the parties hold \([\![t ]\!]\). Upon entering the conditional, the parties compute \([\![s \oplus t ]\!]\) and then broadcast their shares to reconstruct \(s \oplus t\). If \(s \oplus t\) is 0, the parties do nothing. Otherwise, they locally swap their respective shares of the strings \(M^0\) and \(M^1\). After performing this conditional swap, the parties are assured that \(M^t\) is the all zeros mask and \(M^{\bar{t}}\) is uniform. Note, \(s \oplus t\) does not reveal the active branch t because s is uniform.

Exiting a Conditional. Exiting conditionals is performed using ordinary Boolean logic. Let \([\![x^0 ]\!]\), \([\![x^1 ]\!]\) be corresponding output sharings from the two branches. We leave the branch by multiplexing each such pair of outputs: we compute \([\![x^0 \oplus t(x^0 \oplus x^1) ]\!] = [\![x^t ]\!]\). Thus, multiplexing requires one \(\mathtt {AND}\) gate per conditional output.
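These two steps can be sketched minimally (names ours): aligning the mask strings by the reconstructed public bit \(s \oplus t\), and the exit multiplexer in cleartext semantics:

```python
def align_masks(m0, m1, s_xor_t):
    """Entering a conditional: after reconstructing s XOR t, each party
    swaps its shares of the two mask strings iff s XOR t = 1, so that
    the active branch t ends up paired with the all-zeros mask."""
    return (m1, m0) if s_xor_t else (m0, m1)

def exit_mux(x0, x1, t):
    """Exiting a conditional (cleartext semantics): x^0 XOR t*(x^0 XOR x^1)
    selects x^t. Over sharings, t is itself shared, so this costs one
    AND gate per conditional output."""
    return x0 ^ (t & (x0 ^ x1))
```
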

4.2 Nested Branches

We have presented a technique for handling conditionals with only two branches. To generalize to higher branching factors, we nest conditionals. At each conditional, we use \(\mathtt {MaskGen}\) to generate fresh masks and then apply these masks to the (possibly already masked) triples. This trivially and securely allows us to handle arbitrary branching control flow.

As a brief argument of security, consider that each branch uses a distinct mask string from each of its parent conditionals. Further, if the branch is inactive (1) at least one mask string will be uniform and (2) the \(\mathtt {XOR}\) sum of all uniform mask strings for the branch is unique. Thus, all \(\mathtt {AND}\) gate broadcasts can be simulated by uniform bits. We argue this more formally in Sect. 5.3.

We note that instead of nesting, it is possible to generalize our approach to directly handle vectors of conditionals, e.g., corresponding to program switch statements. This direction is not necessarily preferable: for a circuit with b branches, both techniques amortize a triple across up to b gates, and the work required to generate masks is very similar. We present the nested formalization due to its generality and relative simplicity.

5 \(\varPi _{\mathsf {MT}}\): Formalization and Proofs

We now present \(\varPi _{\mathsf {MT}}\) formally. Section 5.1 begins by defining circuit syntax, including circuits with explicit conditional branching. We then specify our protocol in Sect. 5.2 and prove it correct and secure in Sect. 5.3.

5.1 Circuit Formal Syntax

Conditional branching is central to our approach. Thus, traditional circuits that include only low-level gates are insufficient for our formalization. We instead use the syntax of [HK20a], which makes conditional branching explicit. We review and formally present their syntax.

Conventionally, a circuit is a list of Boolean gates together with specified input and output wires. We refer to this representation as a netlist. We do not modify the semantics of netlists and evaluate them using the standard triple-based technique (see Sect. 2.5).

We extend the space of circuits with the notion of a conditional. A conditional is parameterized over two circuits, \(C^0\) and \(C^1\). By convention, the first bit of input to the conditional is the branch condition t. The semantics of a conditional is that branch \(C^t\) is given the remaining input to the overall conditional, and \(C^t\)’s output is returned.

Finally, we require an extra notion that allows us to place conditionals ‘in the middle’ of the overall circuit. A sequence is parameterized over two circuits \(C'\) and \(C''\). When executed, the sequence passes its input to \(C'\), feeds the output of \(C'\) as input to \(C''\), then returns the output of \(C''\).

More formally, the space of circuits \(\mathcal {C}\) is defined inductively. Let \(C^0, C^1, C', C''\) be arbitrary circuits. The space of circuits is defined as follows:

$$ \mathcal {C}\triangleq Netlist(\cdot ) \mid Cond(C^0, C^1) \mid Seq(C', C'') $$

That is, a circuit is either a (1) netlist, a (2) conditional, or a (3) sequence. By arbitrarily nesting conditionals and sequences, we may achieve complex branching control structure.
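To make the grammar concrete, the three circuit forms and their cleartext semantics can be sketched in Python. This is our illustrative rendering, not part of the formalization; in particular, the `fn` field abstracts a netlist's gate-level behavior as a plain function on bit vectors.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Netlist:
    fn: Callable[[List[int]], List[int]]  # cleartext semantics of the gate list

@dataclass
class Cond:
    c0: object  # branch evaluated when the condition bit is 0
    c1: object  # branch evaluated when the condition bit is 1

@dataclass
class Seq:
    first: object
    second: object

def evaluate(c, inp):
    """Cleartext semantics of C ::= Netlist | Cond | Seq."""
    if isinstance(c, Netlist):
        return c.fn(inp)
    if isinstance(c, Cond):
        t, rest = inp[0], inp[1:]  # first input bit is the branch condition
        return evaluate(c.c1 if t else c.c0, rest)
    if isinstance(c, Seq):  # pipe first's output into second
        return evaluate(c.second, evaluate(c.first, inp))
    raise TypeError("not a circuit")
```

For example, evaluating `Cond(Netlist(xor), Netlist(and))` on input `[1, 1, 1]` runs the second netlist on the remaining input `[1, 1]`.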

Fig. 4. \(\varPi _{\mathsf {MT}}\) allows p parties to securely compute a circuit \(C\in \mathcal {C}\). \(\varPi _{\mathsf {MT}}\) delegates to a recursive sub-procedure \(\mathtt {eval}\).

5.2 \(\varPi _{\mathsf {MT}}\) Formalization

Figure 4 presents our protocol for handling circuits with conditional branching. \(\varPi _{\mathsf {MT}}\) first delegates to \(\mathtt {TripleGen}\), generating sufficient multiplication triples to handle the circuit, and then delegates to the sub-protocol \(\mathtt {eval}\). \(\mathtt {eval}\) recursively walks the structure of the circuit and securely achieves circuit semantics.

\(\varPi _{\mathsf {MT}}\) formalizes the ideas stated in Sect. 4 in a natural manner. The most interesting case in \(\mathtt {eval}\) is the handling of conditionals, where we (1) invoke the \(\mathtt {MaskGen}\) oracle, (2) mask the available triples, and (3) recursively evaluate both branches. Although, for clarity, we write \(\mathtt {MaskGen}\) inline, the actual \(\mathtt {MaskGen}\) protocol does not depend on any circuit values and thus can be moved to a preprocessing phase. After evaluating both branches, we discard the inactive branch outputs and propagate the active branch outputs via a multiplexer. The multiplexer is implemented simply as a netlist, and computes the following function for each corresponding pair of branch outputs \([\![x^0 ]\!], [\![x^1 ]\!]\):

$$\begin{aligned}{}[\![x^0 \oplus t(x^0 \oplus x^1) ]\!] = [\![x^t ]\!] \end{aligned}$$
(1)
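The selection behavior of Eq. (1) can be checked exhaustively on cleartext bits; the following toy function is our illustration of the per-output computation, not the protocol step itself (which operates on sharings):

```python
def mux(t, x0, x1):
    # Eq. (1): x0 XOR t*(x0 XOR x1) selects x^t.
    return x0 ^ (t & (x0 ^ x1))

# Exhaustive check over all cleartext bit combinations.
for t in (0, 1):
    for x0 in (0, 1):
        for x1 in (0, 1):
            assert mux(t, x0, x1) == (x1 if t else x0)
```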

For simplicity, we abstract some algorithms and briefly describe them below. Other than \(\varPi _{\mathtt {Base}}\), which is discussed in Sect. 2.5, we do not write these algorithms in full, as they are simple.

  • \(\varPi _{\mathtt {Base}}\) is the standard triple-based protocol as specified in Sect. 2.5. \(\varPi _{\mathtt {Base}}\) takes as input (1) a vector of gates \((g_1,...,g_k)\), (2) a vector of (possibly masked) triples \([\![\varvec{triples} ]\!]\) and (3) the netlist input \([\![\varvec{inp} ]\!]\). \(\varPi _{\mathtt {Base}}\) returns a sharing of outputs \([\![\varvec{out} ]\!]\). We emphasize that while we do not, for simplicity, explicitly list \(\varPi _{\mathtt {Base}}\), the protocol is not a black-box functionality (see Footnote 3).

  • \(\mathtt {neededtriples}\) computes the number of needed triples for the circuit \(C\). This computed number is equal to the number of \(\mathtt {AND}\) gates on the circuit’s longest execution path.

  • \(\mathtt {shareinput}\) allows the parties to construct and distribute sharings of their respective private inputs.

  • \(\mathtt {reconstruct}\) allows parties to reconstruct a sharing via broadcast.

  • \(\mathtt {mux}\) computes the per-output multiplexer function (Eq. 1).

  • \(\mathtt {applymask}\) specifies how mask sharings are \(\mathtt {XOR}\)ed onto triples. Specifically, for each uniformly shared triple \([\![a ]\!], [\![b ]\!], [\![c ]\!]\), we draw two bits from the mask sharing and \(\mathtt {XOR}\) one bit onto both \([\![a ]\!]\) and \([\![b ]\!]\).
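As a minimal sketch of \(\mathtt {applymask}\), consider the following Python rendering, in which sharings are represented as vectors of per-party XOR shares; all function names here are ours, chosen for illustration:

```python
import secrets
from functools import reduce
from operator import xor

def share(bit, p=3):
    """XOR-share a bit among p parties."""
    s = [secrets.randbits(1) for _ in range(p - 1)]
    return s + [reduce(xor, s, bit)]  # shares XOR to the cleartext bit

def reconstruct(shares):
    return reduce(xor, shares)

def apply_mask(triple, mask_bits):
    """XOR one mask-share bit onto [a] and one onto [b]; [c] is untouched."""
    (a, b, c), (ma, mb) = triple, mask_bits
    a2 = [ai ^ mi for ai, mi in zip(a, ma)]
    b2 = [bi ^ mi for bi, mi in zip(b, mb)]
    return a2, b2, c
```

Masking with sharings of zero leaves a valid triple valid, while masking with sharings of uniform bits re-randomizes a and b and generally invalidates the triple, matching the behavior used in our conditionals.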

5.3 \(\varPi _{\mathsf {MT}}\) Proofs

Now that we have introduced \(\varPi _{\mathsf {MT}}\), we prove it both correct and secure in the \(\mathtt {MaskGen}\)- and \(\mathtt {TripleGen}\)-hybrid model. We instantiate \(\mathtt {MaskGen}\) in Sect. 6.

In both proofs, we refer to the notion of a valid triple. A triple is valid if it is uniformly shared and of the following form: \([\![a ]\!], [\![b ]\!], [\![ab ]\!]\). That is, the third term is a share of the product of the first two terms. An invalid triple is a triple \([\![a ]\!], [\![b ]\!], [\![c ]\!]\) such that \(c \ne ab\). Invalid triples arise in our protocol due to the application of extra masks to the first two entries in triples.

Theorem 1

( \(\varPi _{\mathsf {MT}}\) correctness). For all circuits \(C\in \mathcal {C}\) and all private inputs \(\varvec{inp}_1,...,\varvec{inp}_p \in \{0, 1\}^*\):

$$ \varPi _{\mathsf {MT}}(C, \varvec{inp}_1, ..., \varvec{inp}_p) = C(\varvec{inp}_1, ..., \varvec{inp}_p) $$

Proof

By induction on the structure of \(C\). The inductive invariant is as follows:

If the triples passed to \(\mathtt {eval}\) are valid, then \(\mathtt {eval}\) correctly implements the semantics of \(C\).

We focus on conditionals; the correctness of netlists follows trivially from the standard triple-based protocol. The correctness of sequences is immediate.

Suppose \(C\) is a conditional \(Cond(C^0, C^1)\). Further, suppose \([\![t ]\!]\) is the branch condition. If the triples passed to the conditional are invalid, then the inductive invariant vacuously holds. Thus, we need only consider evaluation of a conditional on valid triples. The oracle call to \(\mathtt {MaskGen}\) constructs two mask strings \(M^0, M^1\) such that \(M^s\) is all-zero and \(M^{\bar{s}}\) is uniform. By reconstructing \(s \oplus t\) and accordingly locally swapping the two mask strings, the parties ensure \(M^t\) is the all-zero string. Thus, \(C^t\) is given valid triples (the valid triples are masked by sharings of zeros and hence remain valid) and, by induction, returns the correct semantic outputs. \(C^{\bar{t}}\) will not return correct values, but these outputs are discarded by the multiplexer. Thus, conditionals support the inductive invariant.

The top level circuit is given valid triples via the oracle call to \(\mathtt {TripleGen}\). This fact, combined with the inductive invariant, implies that \(\varPi _{\mathsf {MT}}\) is correct.    \(\square \)

Theorem 2

( \(\varPi _{\mathsf {MT}}\) security). \(\varPi _{\mathsf {MT}}\) is secure against semi-honest corruption of up to \(p - 1\) parties in the \(\mathtt {TripleGen}\)-hybrid and \(\mathtt {MaskGen}\)-hybrid model.

Proof

By construction of a simulator for one party, which we later generalize to simulate up to \(p-1\) parties. Each broadcast received by a party can be simulated by a uniform bit (see Footnote 4). We prove this simulation secure by induction on the structure of the circuit \(C\). The inductive invariant is as follows:

Let \([\![a ]\!], [\![b ]\!], [\![c ]\!]\) be a (possibly invalid) triple. For each triple, we refer to the semantic values a and b as the one-time-pad parts. \(\mathtt {eval}\) uses both one-time-pad parts of each triple to mask at most one cleartext value.

For netlists, this is trivial: we use a distinct triple for each \(\mathtt {AND}\) gate, and each one-time-pad part is used only to mask one of the gate inputs. Similarly, sequences satisfy the inductive invariant trivially: we provide different triples to both parts of the sequence.

Therefore we focus on conditionals. Consider a conditional \(Cond(C^0, C^1)\). As a brief aside: although the parties reconstruct \(s \oplus t\), s is uniform, and hence this reconstruction leaks nothing about t (i.e., \(s \oplus t\) can be simulated by a uniform bit). Returning to the invariant: we first split the triples into sufficient numbers for the conditional body and for the multiplexer. The multiplexer is implemented by a netlist, and hence trivially satisfies our invariant. The conditional body is more complicated. Indeed, we use the same triples to evaluate both branches. However, our call to \(\mathtt {MaskGen}\) together with the conditional swap ensures that \(M^{\bar{t}}\) is a sharing of uniform bits. When we apply \(M^{\bar{t}}\) to the triples, we re-randomize the one-time-pad parts of the triples. (Applying \(M^{t}\), the all-zeros mask, has no effect on the one-time-pad parts.) Thus, we provide independent one-time-pad parts to both \(C^0\) and \(C^1\), satisfying the inductive invariant.

Because each one-time-pad part is (1) uniform and (2) used to mask at most one cleartext value, and because each broadcast is masked by a one-time-pad part, each broadcast can be simulated by a uniform bit. Thus, we can simulate a single party’s view.

The generalization from simulating one party to simulating up to \(p-1\) is based on a simple observation about \(\mathtt {XOR}\) secret shares: the view of \(p-1\) parties holds no more information than the share of 1 party. The remaining broadcasts from the remaining, unsimulated parties can be simulated by uniform bits.

\(\varPi _{\mathsf {MT}}\) is secure against semi-honest corruption of up to \(p-1\) parties.    \(\square \)

Some MPC techniques, e.g., computing multiplicative inverse [BIB89], rely on opening (randomized) intermediate values. This may not always be compatible with our optimization, since our randomization of the inactive branch may cause an invalid opened value, thereby revealing that it was in fact inactive.

6 Semi-Honest \(\mathtt {MaskGen}\) Instantiations

In this section, we instantiate \(\mathtt {MaskGen}\) (Fig. 3). We present three protocols, two formally and one informally, that follow two general approaches:

  1. The first approach is generic in that it works for an arbitrary number of parties and is based on vector scalar multiplication (Sect. 2.3). Since our approach often uses long masks, we introduce a useful trick that improves vector scalar multiplication for long vectors.

  2. In special cases, masks can be more efficiently derived starting from short seeds. We present two- and three-party protocols which require communication proportional only to \(\kappa \) rather than to the mask length n.

6.1 p-Party Mask Generation

Our general mask generation technique, \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} \) (Fig. 5), allows p parties to preprocess length-n masks using only a single \(\mathtt {VS}\) gate.

In this protocol, parties jointly sample a uniform sharing of a uniform bit \([\![s ]\!]\) and a uniform bitstring \([\![\varvec{r} ]\!]\). The parties compute \([\![s\varvec{r} ]\!]\) via a \(\mathtt {VS}\) gate, set the first mask to \([\![M^0 ]\!] = [\![s\varvec{r} ]\!]\), and set the second mask to \([\![M^1 ]\!] = [\![s \varvec{r} ]\!] \oplus [\![\varvec{r} ]\!]\).
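On cleartext values, this mask construction can be sketched as follows. In the actual protocol, s, \(\varvec{r}\), and \(s\varvec{r}\) exist only as XOR sharings, with \(s\varvec{r}\) computed by one \(\mathtt {VS}\) gate; this plaintext sketch only illustrates the algebra:

```python
import secrets

def maskgen_vs(n):
    """Cleartext sketch of the Pi-MaskGen-VS algebra (not the protocol)."""
    s = secrets.randbits(1)
    r = [secrets.randbits(1) for _ in range(n)]
    sr = [s & ri for ri in r]              # the VS gate output s*r
    m0 = sr                                # M^0 = s*r
    m1 = [x ^ ri for x, ri in zip(sr, r)]  # M^1 = s*r XOR r
    return s, r, m0, m1
```

Whichever value s takes, \(M^s\) is the all-zeros string and \(M^{\bar{s}}\) equals the uniform \(\varvec{r}\).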

\(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} \) is correct and secure.

Theorem 3

\(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} \) correctly implements \(\mathtt {MaskGen}\).

Proof

s and \(\varvec{r}\) are uniform. Depending on s, the product \(s\varvec{r}\) is, of course, either all zeros or \(\varvec{r}\). Thus, setting \([\![M^0 ]\!] = [\![s\varvec{r} ]\!]\) and \([\![M^1 ]\!] = [\![s \varvec{r} ]\!] \oplus [\![\varvec{r} ]\!]\) places the all zeros mask in \(M^s\).    \(\square \)

Fig. 5. Protocol \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} \) is our default method for generating masks, and is secure for an arbitrary number of parties.

Theorem 4

\(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} \) is secure against semi-honest corruption of up to \(p - 1\) parties in the \(\mathtt {VS}\)-hybrid model.

Proof

We communicate only once: when evaluating a single \(\mathtt {VS}\) gate. Hence, the simulator is trivially constructed from the \(\mathtt {VS}\) gate simulator.    \(\square \)

Fig. 6. \(\mathtt {\Pi }{\text {-}}\mathtt {Half}{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \) can be used to evaluate the interactive subterms that emerge from computing a \(\mathtt {VS}\) gate.

Vector Scalar Multiplication for Long Vectors. We have shown that \(\mathtt {VS}\) gates can efficiently compute pairs of masks. However, this requires us to evaluate \(\mathtt {VS}\) gates over potentially long vectors: we compute \(\mathtt {VS}\) gates over vectors with length proportional to the number of \(\mathtt {AND}\) gates, which can be arbitrarily high.

As discussed in Sect. 2.3, we decompose vector scalar products into summands, some that are computed locally and others that are computed interactively. For each interactive summand, one party holds a bit a, the other a vector \(\varvec{b}\), and the two must jointly compute \([\![a\varvec{b} ]\!]\). Let n be the length of \(\varvec{b}\). To compute this product, the protocol presented by [HKP20] requires two messages of length n. In this section, we introduce a natural trick, \(\mathtt {\Pi }{\text {-}}\mathtt {Half}{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \), that halves this communication cost: only one message of length n need be sent, and the other can be derived from a pseudo-random seed. Both the functionality and the protocol are listed in Fig. 6. We explain the \(\mathtt {\Pi }{\text {-}}\mathtt {Half}{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \) trick in more detail in our proof of correctness. Our trick is similar to techniques in [KK13, ALSZ13]. Recall (from Sect. 2.3) that \(p(p-1)\) interactive summands emerge from a single vector scalar multiplication. Thus, we can compose a full vector scalar multiplication protocol from \(p(p-1)\) calls to \(\mathtt {\Pi }{\text {-}}\mathtt {Half}{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \). We refer to this full protocol as \(\mathtt {\Pi }{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \).

Theorem 5

\(\mathtt {\Pi }{\text {-}}\mathtt {Half}{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \), and hence \(\mathtt {\Pi }{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \), is correct.

Proof

The key observation is that \(P_1\)’s input bit a determines one of two possible outcomes for the vector scalar multiplication. If \(a=0\), the output is a sharing of all zeros. In this case, \(P_1\) and \(P_2\)’s output shares must \(\mathtt {XOR}\) to zeros. If \(a=1\), the output is a sharing of \(P_2\)’s input vector \(\varvec{b}\).

We achieve this functionality with a single 1-out-of-2 OT of length-\(\kappa \) strings. \(P_1\) acts as the OT receiver and uses a as her choice bit. If \(a=0\), \(P_1\) receives the seed that \(P_2\) used to generate her share. \(P_1\) expands this seed and obtains the same share as \(P_2\). If \(a=1\), \(P_1\) receives a key that allows her to decrypt a ciphertext sent separately by \(P_2\). The ciphertext holds a valid share of \(\varvec{b}\).

The correctness of \(\mathtt {\Pi }{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \) is immediate from the correctness of \(\mathtt {\Pi }{\text {-}}\mathtt {Half}{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \) and [HKP20]’s \(\mathtt {VS}\) instantiation.    \(\square \)
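The message structure of this trick can be sketched as follows, with the OT oracle inlined and a toy hash-based PRG standing in for G; all of this is illustrative, and the real protocol uses an actual OT protocol and PRG:

```python
import hashlib
import secrets

def prg(seed: bytes, n: int) -> bytes:
    """Toy PRG: SHA-256 in counter mode (for illustration only)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

def half_vs_long(a: int, b: bytes):
    """P1 holds bit a, P2 holds vector b; output is an XOR sharing of a*b."""
    n = len(b)
    S = secrets.token_bytes(16)              # seed for P2's output share
    k = secrets.token_bytes(16)              # key protecting the share of b
    ct = xor(xor(prg(k, n), prg(S, n)), b)   # ciphertext sent in the clear
    sender_share = prg(S, n)
    received = S if a == 0 else k            # 1-out-of-2 OT on (S, k)
    if a == 0:
        receiver_share = prg(received, n)    # same share as P2: XOR is zeros
    else:
        receiver_share = xor(ct, prg(received, n))  # decrypts to prg(S) XOR b
    return receiver_share, sender_share
```

Only the length-n ciphertext is long; the OT transfers κ-bit seeds, halving the long communication relative to sending both shares explicitly.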

Next, we prove this faster vector scalar multiplication procedure secure. Ideally, we would modularly prove \(\mathtt {\Pi }{\text {-}}\mathtt {Half}{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \) and \(\mathtt {\Pi }{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \) secure by simulation. Unfortunately, this is not possible. Specifically, suppose that in \(\mathtt {\Pi }{\text {-}}\mathtt {Half}{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \) \(P_1\) provides input \(a = 0\). In this case, \(P_1\) outputs the expansion of the pseudorandom seed \(\varvec{w}\) received from the OT oracle. Now we need to simulate \(\varvec{w}\) such that it matches the expansion \(G(\varvec{w})\) output by the protocol. Since G is assumed secure, this simulation is infeasible. Therefore, we forgo modularity and instead prove the security of our top-level circuit protocol, \(\varPi _{\mathsf {MT}}\), where we instantiate the \(\mathtt {MaskGen}\) functionality based on \(\mathtt {\Pi }{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \). With our PRG-utilizing subprocedures ‘inlined’, we can prove the top-level protocol secure, since the expansions of PRG seeds no longer appear as protocol outputs and we can simulate the seeds simply by random strings.

Theorem 6

Let \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} '\) be the protocol \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} \) (Fig. 5), with \(\mathtt {VS}\) instantiated by \(\mathtt {\Pi }{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \). Let \(\varPi _{\mathsf {MT}}'\) be the protocol \(\varPi _{\mathsf {MT}}\) (Fig. 4), with \(\mathtt {MaskGen} \) instantiated by \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} '\). \(\varPi _{\mathsf {MT}}'\) is secure against semi-honest corruption of up to \(p-1\) parties in the \(\mathtt {TripleGen}\)-hybrid and OT-hybrid model.

Proof

By construction of a simulator.

The proof is similar to that of Theorem 2, so we elide most details. Because we explicitly instantiate \(\mathtt {MaskGen} \) with \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} '\), we focus on the corresponding difference in the proof and explain how we simulate \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} '\) messages. Namely, we argue that all messages of \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} '\) are simulated by uniform bits.

\(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} '\) invokes \(\mathtt {\Pi }{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \), which in turn makes \(2(p-1)\) calls to \(\mathtt {\Pi }{\text {-}}\mathtt {Half}{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \) per party. Each pair of parties \(P_i, P_j\) jointly call \(\mathtt {\Pi }{\text {-}}\mathtt {Half}{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \) twice, once where \(P_i\) is the receiver and once where \(P_i\) is the sender.

When \(P_i\) is the receiver, he receives two messages:

  • First, \(P_i\) receives from \(P_j\) an encrypted share of \(\varvec{b}\): \(P_j\) chooses a PRG seed \(\varvec{k}\), expands \(G(\varvec{k})\), and then sends \(G(\varvec{k}) \oplus G(\varvec{S}) \oplus \varvec{b}\) to \(P_i\). The simulator can simulate this received message by uniform bits because \(\varvec{k}\) and \(\varvec{S}\) are both uniform and because G is a secure PRG.

  • Second, \(P_i\) receives a message from the OT oracle. Depending on its input bit \(a_i\), \(P_i\) receives either the seed \(\varvec{k}\) or the seed \(\varvec{S}\) (which was used to generate \((a\varvec{b})_2\)). In either case, \(\varvec{k}\) and \(\varvec{S}\) are simulated by a uniform string. This does not conflict with the previously simulated message \(G(\varvec{k}) \oplus G(\varvec{S}) \oplus \varvec{b}\), since one of the seeds \(\varvec{k}\) or \(\varvec{S}\) remains hidden from \(P_i\).

If \(P_i\) is the sender, no message is received; \(P_i\)’s view is trivially simulated.

It is easy to see that the masks produced by \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} '\) are used exactly once in \(\varPi _{\mathsf {MT}}'\), and hence the inductive invariant of Theorem 2 is maintained.

\(\varPi _{\mathsf {MT}}'\) is secure.    \(\square \)

Fig. 7. \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {2P} \) is an efficient two-party protocol for generating masks.

6.2 Efficient 2PC and 3PC Mask Generation

In this section, we present two efficient implementations of \(\mathtt {MaskGen}\), one for two parties, and one for three. At a high level, these methods are based on (1) distributing pseudo-random seeds and (2) expanding the seeds with a PRG into n-bit masks. The advantage of these seed-based methods is that they use communication proportional only to \(\kappa \). This is a significant improvement over \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} \), which uses communication proportional to the mask length n.

Two Party Improved Protocol. Figure 7 presents our two-party protocol, \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {2P} \). Here, the parties use vector scalar multiplication to distribute XOR sharings of two length-\(\kappa \) strings; one sharing encodes a uniform string and one encodes the all zeros string. The parties then interpret their respective shares as PRG seeds and apply G. Because of the nature of XOR sharings, the parties hold identical shares of the all zeros string, so they generate the same pseudorandom expansion, and the resultant expansions form a sharing of all zeros. In contrast, the expansion of the random sharing yields a longer pseudorandom sharing.

By using this protocol, the two parties can share arbitrarily long masks at the cost of only \(O(\kappa )\) bits of communication.
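The seed-expansion idea can be sketched as follows, again with a toy hash-based PRG in place of G; this is our cleartext illustration of the share structure, not the protocol itself:

```python
import hashlib
import secrets

def prg(seed: bytes, n: int) -> bytes:
    """Toy PRG: SHA-256 in counter mode (illustration only)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

def maskgen_2p(n, kappa=16):
    """The parties end up with XOR shares of two kappa-bit strings:
    one shares the all-zeros string, the other a uniform string."""
    s = secrets.randbits(1)
    strings = [bytes(kappa), bytes(kappa)]
    strings[1 - s] = secrets.token_bytes(kappa)  # string^s stays all zeros
    masks = []
    for t in strings:
        x = secrets.token_bytes(kappa)           # P1's share of the string
        y = xor(x, t)                            # P2's share (equal iff t = 0)
        masks.append(xor(prg(x, n), prg(y, n)))  # mask = XOR of expansions
    return s, masks
```

Equal shares expand identically and cancel, giving the all-zeros mask; differing shares expand to independent-looking strings, giving a pseudorandom mask, all with only \(O(\kappa )\) communication.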

Theorem 7

\(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {2P} \) correctly implements \(\mathtt {MaskGen}\).

Proof

By the correctness of \(\mathtt {VS}\) gates and properties of XOR shares.

One of \([\![\varvec{S}^0 ]\!]\) and \([\![\varvec{S}^1 ]\!]\) is a sharing of zeros while the other is a sharing of a random bitstring. The position of the all-zeros sharing is determined by a uniform bit s. Consider one such sharing, and interpret both shares as PRG seeds. If the parties’ two seeds are the same, then the expanded masks will also be the same, and will therefore \(\mathtt {XOR}\) to zeros. If the parties’ two seeds differ, then, by the properties of the PRG, the expanded masks will \(\mathtt {XOR}\) to a uniform value.

\(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {2P} \) is correct.    \(\square \)

We next prove \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {2P} \) secure. As with \(\mathtt {\Pi }{\text {-}}\mathtt {Half}{\text {-}}\mathtt {VS}{\text {-}}\mathtt {Long} \), we unfortunately cannot modularly prove this protocol secure by simulation: each party outputs the expansion of a PRG seed that appears in the party’s view. We therefore instead prove that \(\varPi _{\mathsf {MT}}\) is secure in the case where we instantiate \(\mathtt {MaskGen} \) with \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {2P} \). This higher-level approach works because the output of the PRG does not appear as final output, so the PRG seeds can be simulated.

Theorem 8

( \(\varPi _{\mathsf {MT}}{\text {-}}\mathtt {2P}\) Security). Let \(\varPi _{\mathsf {MT}}{\text {-}}\mathtt {2P}\) be \(\varPi _{\mathsf {MT}}\) (Fig. 4), where we instantiate \(\mathtt {MaskGen} \) with \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {2P} \). \(\varPi _{\mathsf {MT}}{\text {-}}\mathtt {2P}\) is secure against semi-honest corruption of a single party in the \(\mathtt {TripleGen}\)-hybrid and \(\mathtt {VS}\)-hybrid model.

Proof

The proof is nearly identical to that of \(\varPi _{\mathsf {MT}}\) (Theorem 2); we therefore focus our discussion on the call to \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {2P} \).

In \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {2P} \), the parties jointly sample uniform sharings \([\![s ]\!]\) and \([\![\varvec{S} ]\!]\), and then compute \([\![s\varvec{S} ]\!]\) via \(\mathtt {VS}\). \(\mathtt {VS}\) outputs uniform sharings, and so the message each party receives from the \(\mathtt {VS}\) oracle is simulated by uniform bits. The parties locally expand their shares to obtain masks \([\![M^0 ]\!]\) and \([\![M^1 ]\!]\). Because G is a secure PRG, \([\![M^{\bar{s}} ]\!]\) is a sharing of a uniform string.

Now, recall Theorem 2’s inductive invariant: we must ensure that the top-level protocol uses the one-time-pad part of each multiplication triple at most once. \(\varPi _{\mathsf {MT}}{\text {-}}\mathtt {2P}\) XORs the current triples with both \([\![M^0 ]\!]\) and \([\![M^1 ]\!]\). Because \([\![M^{\bar{s}} ]\!]\) is a sharing of a uniform string, this appropriately rerandomizes the triples into the inactive branch, and hence we support the inductive invariant.

\(\varPi _{\mathsf {MT}}{\text {-}}\mathtt {2P}\) is secure against semi-honest corruption of a single party.    \(\square \)

Three Party Informal \(\mathtt {MaskGen}\) Protocol. The three party efficient \(\mathtt {MaskGen}\) protocol is a relatively straightforward generalization of the two party protocol. However, the mask generation is notationally complex, so for simplicity we present it informally. A similar technique was used in [BKKO20] to help construct a two-private three-server distributed point function.

Unlike in the two-party protocol, \(P_1\), \(P_2\), and \(P_3\) each obtain two pairs of seeds. Each pair is used to generate one mask by (1) expanding both seeds with a PRG into an n-bit string and (2) \(\mathtt {XOR}\)ing the two expanded outputs together. At a high level, we ensure that:

  1. For the all zeros mask, each party holds the same seed as one other party. Thus, their PRG expansions \(\mathtt {XOR}\) to zeros.

  2. For the uniform mask, each party holds a seed distinct from all other parties. Thus, their PRG expansions \(\mathtt {XOR}\) to a uniform mask.

The key difficulty is in making the two above scenarios indistinguishable from the perspective of any strict subset of parties. We contrast these two scenarios, showing that they appear indistinguishable.

In the first case, the parties are given seeds as follows:

$$\begin{aligned} P_1: \varvec{S}_1, \varvec{S}_2\qquad P_2: \varvec{S}_1, \varvec{S}_3\qquad P_3: \varvec{S}_2, \varvec{S}_3 \end{aligned}$$

If we consider an adversary who corrupts any two parties, he will see that one seed is shared between them and the others appear uniform.

In the second case the parties are given seeds as follows:

$$\begin{aligned} P_1: \varvec{S}_4, \varvec{S}_5\qquad P_2: \varvec{S}_4, \varvec{S}_6\qquad P_3: \varvec{S}_4, \varvec{S}_7 \end{aligned}$$

As in the first case, an adversary that corrupts two parties sees one seed in common; the others are uniform. Hence, the cases are indistinguishable for any one or two parties.

Thus, \(\varPi _{\mathsf {MT}}\) instantiated with this three party \(\mathtt {MaskGen} \) trick results in a secure and correct protocol. Seed distribution can easily be implemented by GMW extended with \(\mathtt {VS}\) gates: the parties sample seven uniform seeds \(\varvec{S}_1,...,\varvec{S}_7 \in _\$\{0, 1\}^\kappa \), swap them using a \(\mathtt {VS}\) gate, and output each of them to the appropriate party.
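Both scenarios can be checked directly with a toy PRG (SHA-256 based, for illustration only): paired seeds cancel, the second case yields a pseudorandom mask, and any two parties share exactly one seed in either case.

```python
import hashlib
import secrets

def prg(seed: bytes, n: int) -> bytes:
    """Toy PRG: SHA-256 in counter mode (illustration only)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

n = 64
S = [secrets.token_bytes(16) for _ in range(8)]  # S[1..7], as in the text

# Case 1 (all-zeros mask): every seed is held by exactly two parties.
case1 = [(S[1], S[2]), (S[1], S[3]), (S[2], S[3])]
# Case 2 (uniform mask): S4 is common to all; S5, S6, S7 are unique.
case2 = [(S[4], S[5]), (S[4], S[6]), (S[4], S[7])]

def mask(case):
    """XOR together the PRG expansions of all six held seeds."""
    out = bytes(n)
    for pair in case:
        for sd in pair:
            out = bytes(a ^ b for a, b in zip(out, prg(sd, n)))
    return out
```

In case 1 every expansion appears twice and cancels; in case 2 the seeds with odd multiplicity leave a pseudorandom residue.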

7 Implementation

We implemented our approach in C++. Specifically, we implemented \(\varPi _{\mathsf {MT}}\), instantiating \(\mathtt {MaskGen}\) with both \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} \) and \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {2P} \) (we did not implement the three-party variant). We instantiated \(\mathtt {TripleGen}\) with the natural approach based on random OT. For comparison, we also (1) implemented a standard triple-based protocol and (2) incorporated \(\mathtt {MOTIF}\) ’s implementation into our repository. We discuss key aspects of our implementation in Sect. 7.1.

To the best of our knowledge, there is no comprehensive suite of MPC benchmark circuits, particularly for circuits that include conditional branches. Thus, we implemented a random circuit generator to produce benchmarks. In designing the circuit generator, our key goal was to capture the impact of branch alignment on \(\mathtt {MOTIF}\) ’s performance such that we can highlight our improvement. The circuit generator samples circuits with a variety of branch alignments. We describe details of circuit generation in Sect. 7.2.

7.1 Key Implementation Aspects

Our implementation of \(\varPi _{\mathsf {MT}}\) is straightforward, but we note some of its interesting aspects. We use the 1-out-of-2 OT protocol of [IKNP03] as implemented by EMP [WMK16] to generate both triples and masks. In \(\varPi _{\mathsf {MT}}\) and the standard triple-based protocol, we list \(\mathtt {AND}\) gates in layers so that we can parallelize broadcasts for \(\mathtt {AND}\)s in the same circuit layer. \(\mathtt {MOTIF}\) similarly parallelizes OTs for \(\mathtt {VS}\) gates in the same layer. Thus, all three protocols use communication rounds proportional to the circuit’s multiplicative depth.

7.2 Random Circuit Generation

Circuit generation consists of three main steps:

  1. We parameterize circuits on the number of conditional branches, the number of circuit layers, the number of \(\mathtt {XOR}\) and \(\mathtt {AND}\) gates per branch, and the number of input/output wires to each branch. Each branch uses the same parameters.

  2. We uniformly assign a number of gates to each branch’s layers. We implement this functionality with a RANCOM algorithm [NW78], which is based on the balls-in-cells problem and is called separately for each branch.

  3. We connect the gates layer by layer. Specifically, we maintain a pool of wires whose value has already been assigned (i.e., each is a branch input or the output of a gate). For each gate, we uniformly sample two inputs from the pool and choose a fresh output wire. Once a layer has been entirely connected, we add all of that layer’s gate outputs to the pool.
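The wire-pool step above can be sketched as follows; this is our simplified rendering, in which a gate is just an (input, input, output) wire triple and gate types are omitted:

```python
import random

def connect_layers(num_inputs, layer_sizes, seed=0):
    """Wire gates layer by layer, drawing each gate's two inputs from the
    pool of already-assigned wires."""
    rng = random.Random(seed)
    pool = list(range(num_inputs))   # branch inputs are assigned first
    next_wire = num_inputs
    gates = []
    for size in layer_sizes:
        outputs = []
        for _ in range(size):
            i1, i2 = rng.choice(pool), rng.choice(pool)
            gates.append((i1, i2, next_wire))
            outputs.append(next_wire)
            next_wire += 1
        pool.extend(outputs)         # outputs join the pool only after the layer
    return gates
```

Deferring the pool update until a layer is fully wired ensures no gate reads a wire assigned in its own layer, which preserves the layered structure the protocols exploit for round parallelism.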

The above strategy is relatively ad hoc and may not be representative of all applications. Again, we adopt the above approach (1) to show the impact of circuit alignment on our relative performance over \(\mathtt {MOTIF}\) and (2) because no standard benchmark suite exists.

8 Performance Evaluation

We compare \(\varPi _{\mathsf {MT}}\) to \(\mathtt {MOTIF}\) and the standard triple-based protocol. We compare these protocols for various numbers of parties. All experiments were run on a commodity laptop running Ubuntu 20.04 with an Intel(R) Core(TM) i5-8350U CPU @ 1.70 GHz and 16 GB RAM. All parties were run on the same machine and network settings were configured with the tc command. We averaged each data point over 100 runs.

In each experiment, we generated random circuits as described in Sect. 7.2. We fixed the circuit parameters to 10 layers, 30,000 \(\mathtt {AND}\) gates per branch, and 30,000 \(\mathtt {XOR}\) gates per branch. We set the number of branch input and output wires to 128. We generated a new circuit with these same parameters for each run of each experiment. We performed and report on three experiments:

  1. We fixed the number of branches to two, fixed the number of parties to two, and explored variation in performance based on branch alignment (Sect. 8.1).

  2. We fixed the number of parties to two, varied the number of branches, and explored the corresponding communication and wall-clock runtime (Sect. 8.2 and Sect. 8.3).

  3. We fixed the branching factor to 16, varied the number of parties, and explored the corresponding communication.

Each experiment shows that our approach is preferred in almost every setting.

Fig. 8. Random OTs required to evaluate a circuit with two branches.

8.1 Branch Alignment

We first demonstrate \(\mathtt {MOTIF}\) ’s dependence on circuit topology in the case of two branches. Figure 8 plots the distribution of the number of random OTs needed for two parties to evaluate each protocol. Across all 100 runs, \(\varPi _{\mathsf {MT}}\) and the standard triple-based protocol always need the same number of OTs. On the other hand, \(\mathtt {MOTIF}\) ’s performance differs depending on branch alignment. Because we sample alignments uniformly, this results in an increased number of consumed OTs.

Discussion. For two branches, our approach required on average \(1.5\times \) fewer OTs than \(\mathtt {MOTIF}\) and consistently required \(2\times \) fewer OTs than the standard triple-based protocol. Given that random OTs are the main communication bottleneck, \(\mathtt {MOTIF}\) falls well short of the optimal \(2\times \) reduction. \(\varPi _{\mathsf {MT}}\) never used more OTs than \(\mathtt {MOTIF}\): \(\mathtt {MOTIF}\)'s best run required \(1.12\times \) more OTs than \(\varPi _{\mathsf {MT}}\), and its worst run required \(1.71\times \) more.
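The gap between the three protocols can be illustrated with a simple triple-counting model. We assume here, following the layering issue of Sect. 2.4, that \(\mathtt {MOTIF}\) reuses a triple only across gates occupying the same layer in different branches, while \(\varPi _{\mathsf {MT}}\) reuses a triple across branches regardless of layering; the model is illustrative rather than an exact accounting of either protocol:

```python
def triples_standard(branches):
    """One triple per AND gate, summed over all branches."""
    return sum(sum(layers) for layers in branches)

def triples_pi_mt(branches):
    """Triples reused across mutually exclusive branches:
    pay only for the largest branch."""
    return max(sum(layers) for layers in branches)

def triples_motif(branches):
    """Reuse only among gates aligned in the same layer:
    pay the per-layer maximum, summed over layers."""
    return sum(max(layer) for layer in zip(*branches))

# Two branches; entries are AND-gate counts per layer.
branches = [[3, 1], [2, 2]]
assert triples_standard(branches) == 8
assert triples_pi_mt(branches) == 4   # best possible for two equal-size branches
assert triples_motif(branches) == 5   # falls between the other two
```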

8.2 Communication

We next report our 2PC communication improvement over both \(\mathtt {MOTIF}\) and the standard triple-based protocol as a function of branching factor. We instantiated \(\mathtt {MaskGen}\) with \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {2P} \).

Figure 9 plots both preprocessing communication and total communication. For further reference, Fig. 10 tabulates our communication improvement.

In our measurements, preprocessing comprises both triple generation and mask generation. Each data point is averaged over 100 runs; the amount of communication may differ from run to run because each circuit has a randomly generated topology. In \(\varPi _{\mathsf {MT}}\), the total communication is constant across runs. In contrast, \(\mathtt {MOTIF}\)'s communication differs significantly across runs due to the layering issue explained in Sect. 2.4.

Fig. 9. 2PC comparison of \(\varPi _{\mathsf {MT}}\) against \(\mathtt {MOTIF}\) and a standard triple-based protocol. We plot the following metrics as functions of the branching factor: per-party preprocessing communication (top) and total per-party communication (bottom).

Fig. 10. Per-party communication improvement for our 2PC random-circuit experiment as a function of the branching factor.

Discussion. In this metric, \(\varPi _{\mathsf {MT}}\) is preferred:

  • Preprocessing Communication. On 16 branches, we improve communication by \(2.96\times \) over \(\mathtt {MOTIF}\) and by \(14.4\times \) over the standard triple-based protocol. There are three reasons we did not achieve a \(16\times \) improvement over the standard triple-based protocol. First, both the standard approach and ours must perform the same number of base OTs to set up an OT extension matrix [IKNP03]. This adds a small amount of communication (around 20KB) common to both approaches, which cuts slightly into our advantage. Second, we need one OT for each of the \(b - 1\) mask pairs. Third, the approaches differ slightly in the small overhead of entering and exiting conditionals.
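A back-of-the-envelope model makes the shortfall concrete. All constants below (per-triple and per-mask OT costs) are illustrative assumptions, not measured values; only the roughly 20KB base-OT cost is taken from the discussion above:

```python
# Back-of-the-envelope preprocessing model; all byte costs are
# illustrative assumptions, not measured values.
BASE_OT_BYTES = 20_000  # base OTs for OT extension, common to both (~20KB)
TRIPLE_BYTES = 64       # assumed per-triple OT-extension cost
MASK_OT_BYTES = 64      # assumed cost of the one OT per mask pair

b, ands_per_branch = 16, 30_000

# Standard: one triple per AND gate in every branch.
standard = BASE_OT_BYTES + b * ands_per_branch * TRIPLE_BYTES
# Ours: triples for one branch, plus one OT per each of the b - 1 mask pairs.
ours = BASE_OT_BYTES + ands_per_branch * TRIPLE_BYTES + (b - 1) * MASK_OT_BYTES

improvement = standard / ours
assert 15 < improvement < 16  # close to, but strictly below, the ideal 16x
```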

  • Total Communication. On 16 branches, our approach improves total communication by \(2.6\times \) over \(\mathtt {MOTIF}\) and by \(12\times \) over the standard protocol. Our total communication improvement is lower than our preprocessing improvement because our evaluation phase communication is not improved. While improvement over the standard protocol is almost constant across runs, the improvement over \(\mathtt {MOTIF}\) differs due to varying circuit topology: our improvement ranges from \(2.16\times \) to \(2.93\times \).

8.3 Wall-Clock Time

Fig. 11. 2PC comparison of \(\varPi _{\mathsf {MT}}\) against \(\mathtt {MOTIF}\) and a standard triple-based protocol. We plot the following metrics as functions of the branching factor: wall-clock time on a LAN (left), on a LAN where other processes share bandwidth (center), and on a WAN (right).

We next present the wall-clock time improvements over \(\mathtt {MOTIF}\) and the standard triple-based protocol. We consider three simulated network settings:

  1. LAN: A simulated gigabit ethernet connection with 1 Gbps bandwidth and 2 ms round-trip latency.

  2. Shared LAN: A simulated local area network connection where the protocol shares network bandwidth with a number of other processes. The connection features 50 Mbps bandwidth and 2 ms round-trip latency.

  3. WAN: A simulated wide area network connection with 100 Mbps bandwidth and 20 ms round-trip latency.
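For reference, settings of this kind can be approximated with tc's netem discipline roughly as follows. The device name and exact parameters are illustrative, not our exact configuration; netem's rate option requires a reasonably recent kernel, and the one-way delay is set to half the target round-trip latency:

```shell
# Approximate tc/netem configuration (illustrative; run as root).
# One-way delay is half the target round-trip latency.

# LAN: 1 Gbps bandwidth, 2 ms round trip
tc qdisc add dev lo root netem delay 1ms rate 1gbit

# Shared LAN: 50 Mbps bandwidth, 2 ms round trip
# tc qdisc change dev lo root netem delay 1ms rate 50mbit

# WAN: 100 Mbps bandwidth, 20 ms round trip
# tc qdisc change dev lo root netem delay 10ms rate 100mbit

# Remove the shaping between experiments
# tc qdisc del dev lo root
```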

Figure 11 plots total wall-clock time for each network setting.

Discussion. In these metrics, \(\varPi _{\mathsf {MT}}\) is preferred:

  • LAN wall-clock time. On a fast LAN, our approach’s improvement is diminished compared to our communication improvement. On average and for 16 branches, we improve by \(1.52\times \) over \(\mathtt {MOTIF}\) and by \(1.81\times \) over the standard protocol. A 1Gbps network is very fast, and our modest hardware struggles to keep up with available bandwidth.

  • Shared LAN wall-clock time. On the more constrained shared LAN, our hardware easily keeps up with the communication channel, and we see corresponding improvement. On average and for 16 branches, we achieve \(2.26\times \) speedup over \(\mathtt {MOTIF}\) and \(7.43\times \) speedup over the standard protocol.

  • WAN wall-clock time. On this high-latency network our advantage is less pronounced. On average and for 16 branches, we achieve \(1.14\times \) speedup over \(\mathtt {MOTIF}\) and \(2.04\times \) speedup over the standard protocol. This high-latency network highlights the weakness of multi-round protocols in such settings.
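This weakness can be seen in a crude cost model in which wall-clock time is at least the transmission time plus the accumulated round-trip latency; reducing communication shrinks only the first term. All numbers below are illustrative assumptions:

```python
def runtime_lower_bound(bytes_sent, rounds, bandwidth_bps, rtt_s):
    """Crude lower bound: transmission time plus accumulated round trips."""
    return bytes_sent * 8 / bandwidth_bps + rounds * rtt_s

# Illustrative WAN numbers: halving communication barely helps once
# the latency term dominates.
wan = dict(bandwidth_bps=100e6, rtt_s=0.020)
slow = runtime_lower_bound(bytes_sent=10_000_000, rounds=100, **wan)
fast = runtime_lower_bound(bytes_sent=5_000_000, rounds=100, **wan)
assert fast < slow < 1.2 * fast  # far from a 2x speedup
```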

Fig. 12. Protocol per-party communication usage as a function of the number of parties. Like \(\mathtt {MOTIF}\) and the standard protocol, we consume per-party communication linear in the number of parties.

8.4 Scaling to MPC

Our last experiment examines our approach's scaling to the multiparty setting. This experiment uses the same circuit parameters as the previous experiments, but we fix the number of branches to 16. We implemented \(\mathtt {\Pi }{\text {-}}\mathtt {MaskGen}{\text {-}}\mathtt {VS} \) and ran the circuit among 4 to 8 parties. Figure 12 plots per-party communication as a function of the number of parties.

Discussion. \(\varPi _{\mathsf {MT}}\) works well in the multiparty setting. Our optimization adds no costs relative to \(\mathtt {MOTIF}\) and the standard triple-based protocol. Each technique consumes per-party communication linear in the number of parties, and hence total communication quadratic in the number of parties.
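A simple pairwise-channel model, used here only as an illustration, shows why per-party communication linear in the number of parties yields quadratic total communication:

```python
def communication(n_parties, per_pair_bytes):
    """Pairwise-channel model: each party talks to every other party."""
    per_party = (n_parties - 1) * per_pair_bytes
    total = n_parties * per_party
    return per_party, total

pp4, tot4 = communication(4, per_pair_bytes=1_000)
pp8, tot8 = communication(8, per_pair_bytes=1_000)
assert pp8 / pp4 == 7 / 3                 # per-party: linear in n - 1
assert tot8 / tot4 == (8 * 7) / (4 * 3)   # total: quadratic in n
```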