

# Masked Triples

## Amortizing Multiplication Triples Across Conditionals

David Heath<sup>(⊠)</sup>, Vladimir Kolesnikov, and Stanislav Peceny

Georgia Institute of Technology, Atlanta, GA, USA {heath.davidanthony,kolesnikov,stan.peceny}@gatech.edu

Abstract. A classic approach to MPC uses preprocessed *multiplication triples* to evaluate arbitrary Boolean circuits. If the target circuit features conditional branching, e.g. as the result of an IF program statement, then triples are wasted: one triple is consumed per AND gate, even if the output of the gate is entirely discarded by the circuit's conditional behavior.

In this work, we show that multiplication triples can be re-used across conditional branches. For a circuit with b branches, each having n AND gates, we need only a total of n triples, rather than the typically required  $b \cdot n$ . Because preprocessing triples is often the most expensive step in protocols that use them, this significantly improves performance.

Prior work similarly amortized *oblivious transfers* across branches in the classic GMW protocol (Heath et al., Asiacrypt 2020, [HKP20]). In addition to demonstrating conditional improvements are possible for a different class of protocols, we also concretely improve over [HKP20]: their maximum improvement is bounded by the *topology* of the circuit. Our protocol yields improvement independent of topology: we need triples proportional to the size of the program's longest execution path, regardless of the structure of the program branches.

We implemented our approach in C++. Our experiments show that we significantly improve over a "naïve" protocol and over prior work: for a circuit with 16 branches and in terms of total communication, we improved over naïve by  $12 \times$  and over [HKP20] by an average of  $2.6 \times$ .

Our protocol is secure against the semi-honest corruption of p-1 parties.

**Keywords:** MPC  $\cdot$  Conditional branching  $\cdot$  Beaver triples

## 1 Introduction

Secure Multiparty Computation (MPC) enables untrusting parties to compute a function of their private inputs while revealing only the function output. In this work, we consider semi-honest MPC protocols that use the classic trick of Beaver to evaluate Boolean circuits by preprocessing 'multiplication triples' [Bea92].

In such protocols, XOR gates are 'free' (i.e., require no interaction), but each AND gate consumes a distinct multiplication triple. To generate these triples, the parties use a more expensive MPC protocol that can be run ahead of time in a preprocessing phase. The communication in this phase is proportional to the number of triples and is usually the performance bottleneck. Hence, if we reduce the number of required triples, then we significantly improve performance.

Because one triple is needed per AND gate, protocols waste significant work if the computed function has conditional behavior, e.g. as the result of an IF program statement. Each gate requires a distinct triple, even if the output of the gate is entirely discarded by the function's conditional behavior.

Our protocol re-uses multiplication triples across conditional branches. A single triple can support any number of AND gates, so long as the gates occur in mutually exclusive program branches. This re-use does require that the parties hold additional correlated randomness, but the parties can generate this randomness efficiently. Our approach greatly decreases total communication and hence improves performance.

#### 1.1 High Level Intuition

Multiplication triples typically cannot be re-used (Sect. 2.5 reviews multiplication triples in detail). Triples are essentially one-time pads on cleartext values in the circuit: since no strict subset of parties knows the values in the triple, it is secure to use the triple to mask cleartext values. However, if we use the same triple for two different gates, then we violate security. We can work around this.

Consider two conditionally composed circuits $C^0$ and $C^1$, both with n AND gates. For the sake of example, suppose $C^0$ is the active branch, but suppose the parties do not know and should not learn this fact. We re-use the same triples to evaluate gates in both $C^0$ and $C^1$ by carefully applying secret shared masks to the triples. For the inactive branch $C^1$, the parties mask the shares with XOR shares of uniform masks, randomizing the triples and preventing us from breaking the security of the one-time pad. By randomizing the triples, we violate the correctness of AND gates on the inactive branch, but this is of no concern: the output of each inactive AND gate is ultimately discarded. For the active branch $C^0$, the parties use the triples 'as is', meaning the active branch is evaluated normally. Of course, the parties should not know which branch is inactive, so from the perspective of the parties it should appear plausible that either branch could have used randomized triples. To achieve this, for the active branch the parties also XOR masks onto the triples, but in this case each mask is a sharing of zero: hence the XORing is a no-op.

The problem of amortizing triples across branches thus reduces to the problem of generating secret shared masks, both uniform and 'all-zero'. We present techniques for efficiently generating these masks, the most general of which is based on oblivious vector-scalar multiplication, achieved by a small number of 1-out-of-2 oblivious transfers. The crucial point is that the protocols for generating masks require far less communication than protocols for generating triples.<sup>1</sup> Thus, we decrease communication and improve performance.

<sup>1</sup> We emphasize communication improvement because multiplication triples are often constructed from communication-expensive *oblivious transfers* (OTs). Silent OT [BCG+19] is an exciting new primitive that largely removes the communication overhead of OT. The trade-off is increased computation: the classic OT extension of [IKNP03], which uses relatively little computation, is still preferable to Silent OT in most settings. This said, even if we were to use Silent OT, our improvement would be beneficial: we greatly reduce the needed number of OTs and hence would significantly reduce Silent OT computation overhead.

#### 1.2 Advantage Over [HKP20]

Recent work showed an improvement similar to ours: [HKP20] showed that oblivious transfers can be re-used across conditional branches in the classic GMW protocol (see Sect. 2.4 for a review).

However, [HKP20] has one significant disadvantage: their performance improvement depends on circuit topology. Efficient GMW implementations minimize latency by organizing circuits into *layers* of gates. The input wires into each layer are the outputs of previous layers only, and hence all gates in a particular layer can be executed simultaneously. This strategy yields latency proportional to the circuit's *multiplicative depth* instead of to the number of gates.

Due to this important optimization, [HKP20]'s performance improvement is limited by the 'relative alignment' of the layers across branches: two branches are highly aligned if each of their respective layers has a similar number of AND gates. Their protocol issues oblivious transfers that *simultaneously* run one gate per branch and hence cannot optimize gates that occur in different layers. The relative alignment of branches is dependent on the target application. [HKP20] suggests resorting to compiler technologies to extract more performance.

Our approach does not depend on topology. Instead, we depend only on the number of AND gates in each branch. The parties require enough triples to handle the maximum number of AND gates across the branches. It is difficult to analytically quantify our improvement over [HKP20] without a specific application in mind, but our experiments show that the improvement is significant. We ran both approaches across a variety of topologies, and on average we improved communication by $2.6 \times$ (see Sect. 8). In addition to concretely outperforming [HKP20], our approach also demonstrates that conditional improvement is possible for a different class of protocols (i.e. those based on triples), and hence is of independent interest.

#### 1.3 Our Contributions

- Efficient Re-use of Beaver Triples. Our MPC protocol is secure against the semi-honest corruption of up to p-1 parties. The protocol re-uses triples across branches and requires a number of triples proportional only to the size of the longest execution path rather than to the size of the entire circuit.
- Topology-Independent Improvement. Unlike [HKP20], our improvement is independent of the topology of the conditional branches.
- Implementation and evaluation. We implemented our approach in C++ and report performance (see Sect. 8). For 2PC and a circuit with 16 branches, we improve communication over state-of-the-art [HKP20] on average by $2.6\times$ and over a standard triple-based protocol (i.e., without our conditional improvement) by $12\times$.

## 2 Preliminaries

#### 2.1 Notation

- p denotes the number of parties.
- b denotes the number of branches.
- Subscript notation associates variables with parties. E.g.,  $a_i$  is a variable held by party  $P_i$ .
- G denotes a pseudo-random generator (PRG).
- $\kappa$  is the computational security parameter (e.g. 128).
- t denotes the 'taken' branch in a conditional, i.e. the branch that is active during the oblivious execution. $\bar{t}$ denotes an inactive branch.
- Superscript notation associates variables with a particular branch. E.g.,  $x^0$  is associated with branch 0 while  $x^1$  is associated with branch 1.
- ∈<sub>\$</sub> denotes that the left hand side is uniformly drawn from the right hand set. E.g.,  $x \in_{\$} \{0, 1\}$  denotes that x is a uniform bit.
- $\triangleq$  denotes that the left hand side is defined to be the right hand side.
- We manipulate vectors and bitstrings (i.e., vectors of bits):
  - Variables denoting vectors are indicated with bold notation, e.g. a. If we wish to explicitly write out a vector, we use parenthesized, comma-separated values, e.g. (a, b, ..., y, z).
  - We index vectors with brackets and use 1-based indexing, e.g. a[1].
  - When clear from context, n denotes vector length.
  - When two bitstrings are known to have the same length, we use $\oplus$ to denote the bitwise XOR sum:

$$\boldsymbol{a} \oplus \boldsymbol{b} = (\boldsymbol{a}[1],...,\boldsymbol{a}[n]) \oplus (\boldsymbol{b}[1],...,\boldsymbol{b}[n]) \triangleq (\boldsymbol{a}[1] \oplus \boldsymbol{b}[1],...,\boldsymbol{a}[n] \oplus \boldsymbol{b}[n])$$

  - We indicate a bitwise vector scalar product by writing the scalar to the left of the vector:

$$a\boldsymbol{b} = a(\boldsymbol{b}[1],...,\boldsymbol{b}[n]) \triangleq (a(\boldsymbol{b}[1]),...,a(\boldsymbol{b}[n]))$$

- We manipulate XOR secret shares. Section 2.2 presents our secret share notation and reviews basic properties.

#### 2.2 XOR Secret Shares

Our main contribution is a Beaver-triple based construction for efficient conditional branching. Additionally, we review prior work that is based on the classic GMW protocol. Both techniques are based on XOR secret shares. Thus, we briefly establish notation for XOR shares and review their properties. An XOR secret sharing held amongst p parties is a vector of bits $(x_1, ..., x_p)$ where each party $P_i$ holds $x_i$. We refer to the full vector as a *sharing* and to the individual bits held by parties as *shares*. The semantic value of a sharing (i.e., the cleartext value that the sharing represents) is the XOR sum of its shares. If the semantic value of a sharing $(x_1, ..., x_p)$ is a bit x, i.e. $x_1 \oplus ... \oplus x_p = x$, then we use the shorthand $\llbracket x \rrbracket$ to denote the sharing:

$$\llbracket x \rrbracket \triangleq (x_1, ..., x_p) \qquad \text{such that } x_1 \oplus ... \oplus x_p = x$$

Typically, sharings are used in a context where no strict subset of parties knows the semantic value of the sharing. Nevertheless, parties can easily perform homomorphic linear operations over XOR sharings.

- Parties XOR two sharings by locally XORing their respective shares:

$$\begin{split} \llbracket x \rrbracket \oplus \llbracket y \rrbracket &= (x_1, ..., x_p) \oplus (y_1, ..., y_p) & \text{Defn. sharing} \\ &= (x_1 \oplus y_1, ..., x_p \oplus y_p) & \text{Defn. vector XOR} \\ &= \llbracket x \oplus y \rrbracket & \text{XOR commutes, assoc., defn. sharing} \end{split}$$

- Parties AND a sharing with a public constant by locally scaling each share.

$$\begin{split} c\llbracket x \rrbracket &= c(x_1,...,x_p) & \text{Defn. sharing} \\ &= (cx_1,...,cx_p) & \text{Defn. vector scalar product} \\ &= \llbracket cx \rrbracket & \text{AND distributes over XOR, defn. sharing} \end{split}$$

- Parties encode public constants as sharings by letting  $P_1$  take the constant as his/her share and letting all other parties take 0 as his/her share.

$$\llbracket c \rrbracket = (c, 0, ..., 0) \qquad \text{0 identity, defn. sharing}$$

This allows the parties to XOR sharings with public constants.

A party can easily share her private bit x with the parties. She uniformly draws p bits, with the constraint that the p bits XOR to x. She then distributes  $[\![x]\!]$  amongst the parties.

Parties can compute sharings of *uniform* values. To draw a uniform sharing, each party locally draws a uniform share. In our protocols, we overload  $\in_{\$}$  notation to draw sharings: for example,  $[\![x]\!] \in_{\$} \{0,1\}$  indicates that each party  $P_i$  draws a uniform share  $x_i$ .

Finally, parties can *reconstruct* the semantic value of a sharing. To do so, each party broadcasts his/her share (or sends it to a specified output party). Upon receiving all shares, each party locally XORs the shares.

For security, we often require that each party's share be uniformly chosen. We point out where shares are uniform when relevant.
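The local operations reviewed in this section can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions: all p shares live in one list in a single process, and the helper names (`share`, `reconstruct`, `xor_sharings`, `scale`, `encode_constant`) are hypothetical and do not come from the paper's implementation.

```python
import secrets
from functools import reduce
from operator import xor


def share(x, p):
    """Split bit x into p XOR shares: p-1 uniform bits plus one correction bit."""
    shares = [secrets.randbits(1) for _ in range(p - 1)]
    shares.append(reduce(xor, shares, x))  # forces the XOR of all shares to equal x
    return shares


def reconstruct(shares):
    """Recover the semantic value by XORing all (broadcast) shares."""
    return reduce(xor, shares, 0)


def xor_sharings(xs, ys):
    """Each party locally XORs its two shares."""
    return [x_i ^ y_i for x_i, y_i in zip(xs, ys)]


def scale(c, xs):
    """Each party locally ANDs its share with the public constant c."""
    return [c & x_i for x_i in xs]


def encode_constant(c, p):
    """P_1 takes c as its share; every other party takes 0."""
    return [c] + [0] * (p - 1)
```

For example, `reconstruct(xor_sharings(share(1, 3), encode_constant(1, 3)))` XORs a sharing of 1 with the public constant 1 and reconstructs the result.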

**VS** Functionality:

- INPUT: Parties $P_1, ..., P_p$ input a scalar $\llbracket s \rrbracket$ and a bitstring $\llbracket \boldsymbol{x} \rrbracket$.
- OUTPUT: Parties output a uniform sharing $\llbracket s\boldsymbol{x} \rrbracket$.

Fig. 1. The functionality defining VS gate semantics. VS gates allow parties to multiply a sharing of a bitstring by a sharing of a scalar.

#### 2.3 Vector Scalar Multiplication

Next, we review how parties operate non-linearly over sharings. In contrast to typical approaches that consider AND gates, we instead consider more general *vector scalar multiplication* gates, which we call VS gates. We consider these more expressive gates because they are needed to review prior work [HKP20] and because we use VS gates in our constructions.

To begin, we extend the notion of sharings to vectors. Specifically, we define the sharing of a vector to be a vector of sharings:

$$\llbracket \boldsymbol{x} \rrbracket = \llbracket (\boldsymbol{x}[1], ..., \boldsymbol{x}[n]) \rrbracket \triangleq (\llbracket \boldsymbol{x}[1] \rrbracket, ..., \llbracket \boldsymbol{x}[n] \rrbracket)$$

Suppose we wish to scale a shared vector  $\boldsymbol{x}$  by a shared bit s. That is, we wish to compute the scalar product  $[\![s\boldsymbol{x}]\!]$ . Unlike linear operations, this vector scalar multiplication requires the parties to communicate.

[HKP20] showed that parties can use p(p-1) oblivious transfers (OTs) to implement a VS gate. We review their VS protocol at a high level; Fig. 1 specifies the protocol functionality. For simplicity, we focus on p = 2 parties and length-2 vectors, but the approach generalizes to arbitrary p and n.

Suppose two parties  $P_1, P_2$ , holding sharings  $[\![s]\!], [\![(a, b)]\!]$ , wish to compute  $[\![(sa, sb)]\!]$ : semantically, they wish to scale the vector (a, b) by the bit s.

Observe the following equality over the desired semantic value:

$$\begin{aligned} (sa, sb) &= s(a, b) & \text{Distribute} \\ &= (s_1 \oplus s_2)(a_1 \oplus a_2, b_1 \oplus b_2) & \text{Defn. sharing} \\ &= (s_1a_1 \oplus s_1a_2 \oplus s_2a_1 \oplus s_2a_2, s_1b_1 \oplus s_1b_2 \oplus s_2b_1 \oplus s_2b_2) & \text{Distribute} \\ &= s_1(a_1, b_1) \oplus s_1(a_2, b_2) \oplus s_2(a_1, b_1) \oplus s_2(a_2, b_2) & \text{Group} \end{aligned}$$

The first and fourth summands can be computed locally by the respective parties. Thus, we need only show how to compute  $s_1(a_2, b_2)$  (the remaining third summand is computed symmetrically). To compute this vector AND, the parties perform a single 1-out-of-2 OT of length-2 secrets. Here,  $P_2$  plays the OT sender and  $P_1$  the receiver.  $P_2$  draws two uniform bits  $x, y \in \{0, 1\}$  and allows  $P_1$  to choose between the following two secrets:

$$(x, y) \qquad (x \oplus a_2, y \oplus b_2)$$

$P_1$ chooses based on $s_1$ and hence receives $(x \oplus s_1 a_2, y \oplus s_1 b_2)$. $P_2$ uses the vector (x, y) as her share of this summand. Thus, the parties hold $\llbracket s_1(a_2, b_2) \rrbracket$.

Put together, the full vector multiplication s(a, b) uses only two 1-out-of-2 OTs of length-2 secrets. VS gates generalize to arbitrary numbers of parties and vector lengths: a vector scaling of n elements between p parties requires p(p-1) 1-out-of-2 OTs of length-n secrets.
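The two-party VS gate reviewed above can be simulated as follows. This is a sketch under loud assumptions: the function `ot` is an idealized stand-in for a real 1-out-of-2 oblivious transfer (it simply indexes the sender's pair), both parties run in one process, and all function names are hypothetical.

```python
import secrets


def ot(secret_pair, choice):
    """Idealized 1-out-of-2 OT (assumption): the receiver learns only secret_pair[choice]."""
    return secret_pair[choice]


def xor_vec(u, v):
    return [a ^ b for a, b in zip(u, v)]


def scale_vec(s, v):
    return [s & a for a in v]


def cross_term(s_recv, v_send):
    """Share s_recv * v_send between a receiver holding s_recv and a sender holding v_send.

    Returns (receiver_share, sender_share).
    """
    pad = [secrets.randbits(1) for _ in v_send]          # sender's uniform mask
    received = ot((pad, xor_vec(pad, v_send)), s_recv)   # pad XOR s_recv * v_send
    return received, pad


def vs_gate(s_shares, x_shares):
    """Two-party VS gate: returns the two parties' shares of s * x using two OTs."""
    (s1, s2), (x1, x2) = s_shares, x_shares
    recv1, pad2 = cross_term(s1, x2)  # P1 receives, P2 sends: shares of s1 * x2
    recv2, pad1 = cross_term(s2, x1)  # P2 receives, P1 sends: shares of s2 * x1
    out1 = xor_vec(xor_vec(scale_vec(s1, x1), recv1), pad1)  # includes local term s1 * x1
    out2 = xor_vec(xor_vec(scale_vec(s2, x2), recv2), pad2)  # includes local term s2 * x2
    return out1, out2
```

XORing `out1` and `out2` cancels both pads and leaves the four summands of the grouped equality, i.e. the semantic value s(a, b).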

VS gates are important for our constructions. We present a modification to the above protocol, used once per conditional branch, that is optimized for scalar multiplication of *long* vectors (see Sect. 6.2). This modification is similar to techniques in [KK13, ALSZ13] and reduces communication by up to half.

#### 2.4 Efficient Conditionals from VS Gates: [HKP20] Review

[HKP20] was the first work to significantly reduce the cost of branching in the multi-party setting. Their MOTIF protocol extends the classic GMW protocol with VS gates in order to amortize oblivious transfers across conditional branches. We review how VS gates enable this amortization.

For simplicity, consider two branches computed by two parties. Since the two branches are conditionally composed, one branch is active and one is inactive.

MOTIF's key invariant, set up by the protocol's circuit gadgets, is that on each wire of the inactive branch the parties hold a sharing $\llbracket 0 \rrbracket$, whereas on the active branch they hold valid sharings. XOR gates immediately propagate this invariant: on the inactive branch, XOR gates output $\llbracket 0 \rrbracket$, while on the active branch XOR gates output valid sharings.<sup>2</sup>

Next, we review how VS gates make use of and propagate the invariant. Let  $\llbracket a^0 \rrbracket, \llbracket b^0 \rrbracket$  be sharings held on wires in branch 0 and  $\llbracket a^1 \rrbracket, \llbracket b^1 \rrbracket$  be sharings held on wires in branch 1. Suppose the parties wish to compute both  $\llbracket a^0 b^0 \rrbracket$  and  $\llbracket a^1 b^1 \rrbracket$ . Despite the fact that the parties compute two AND gates, they need only two 1-out-of-2 OTs. Let t denote the active branch. Hence,  $a^{\bar{t}}$  and  $b^{\bar{t}}$  are both 0.

Observe the following equalities:

$$(a^t \oplus a^{\overline{t}})b^t = (a^t \oplus 0)b^t = a^t b^t$$
$$(a^t \oplus a^{\overline{t}})b^{\overline{t}} = (a^t \oplus 0)0 = 0$$

Thus, computing both $\llbracket (a^t \oplus a^{\bar{t}})b^t \rrbracket$ and $\llbracket (a^t \oplus a^{\bar{t}})b^{\bar{t}} \rrbracket$ propagates the invariant: the active branch receives the correct sharing while the inactive branch receives $\llbracket 0 \rrbracket$. These products reduce to a vector-scalar product computable by a VS gate (see Fig. 1).

$$\llbracket (a^t \oplus a^{\bar{t}})(b^t, b^{\bar{t}}) \rrbracket$$

Thus, MOTIF computes two AND gates for the price of one. This improvement generalizes to an arbitrary number of branches.
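The invariant argument can be checked exhaustively on cleartext values. The snippet below is a minimal sketch that works on semantic values only (no shares, no OTs); the function name `motif_products` is hypothetical.

```python
def motif_products(a, b):
    """Cleartext view of MOTIF's trick for two branches.

    a and b hold the per-branch semantic values (a[0], a[1]) and (b[0], b[1]).
    The scalar a[0] XOR a[1] is multiplied into the vector (b[0], b[1]),
    which is the single vector-scalar product a VS gate would compute.
    """
    scalar = a[0] ^ a[1]
    return (scalar & b[0], scalar & b[1])
```

When the inactive branch's wires are 0, the output at the active position is the correct AND and the output at the inactive position is again 0, so the invariant carries forward.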

<sup>2</sup> MOTIF does not natively support NOT gates because they would break the invariant: NOT maps $\llbracket 0 \rrbracket$ to $\llbracket 1 \rrbracket$. NOT gates can be implemented by XOR gates together with a distinguished wire that holds $\llbracket 1 \rrbracket$ on the active branch and $\llbracket 0 \rrbracket$ on the inactive branch.

TripleGen Functionality:

- INPUT: Parties  $P_1, ..., P_p$  provide no input.
- OUTPUT: Let $a, b \in \{0, 1\}$ be uniform bits. Parties output uniform sharings $\llbracket a \rrbracket$, $\llbracket b \rrbracket$, and $\llbracket ab \rrbracket$.

Fig. 2. The Beaver Triple preprocessing functionality, TripleGen.

**Branch Layer Alignment.** As discussed in Sect. 1.2, the MOTIF protocol is dependent on circuit topology. The less *aligned* the layers of the branches are (branches that are highly aligned have similar numbers of AND gates in each layer), the less the circuit benefits from MOTIF.

In the above example, the parties issued two OTs to implement the two AND gates simultaneously. The parties can only perform this optimization if *inputs* for both gates are available. If not, the parties cannot amortize the OTs. Hence, gates in different layers cannot share OTs in layer-by-layer evaluation.

In the *p*-party protocol, in each layer MOTIF eliminates all OTs except for the total of  $p(p-1) \cdot \max(w^i)$  OTs, where  $w^i$  is the number of AND gates in the current layer of branch *i*. In contrast, our technique does not depend on the circuit's topology and is always proportional only to the circuit's longest execution path.
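To make the effect of alignment concrete, the following sketch counts resources for a hypothetical layered circuit description, where `widths[i][l]` is the number of AND gates in layer `l` of branch `i`. The formulas follow the counts stated in the text (p(p-1) OTs per AND gate for plain GMW, p(p-1) times the per-layer maximum for MOTIF). The last function counts only the triples needed when they are shared across branches, and deliberately ignores the cost of mask generation and of the multiplexers at the conditional's exit. All function names are assumptions made for illustration.

```python
from itertools import zip_longest


def naive_gmw_ots(p, widths):
    """Plain GMW: p(p-1) OTs for every AND gate in every branch."""
    return p * (p - 1) * sum(sum(branch) for branch in widths)


def motif_ots(p, widths):
    """MOTIF: per layer, p(p-1) OTs times the widest branch in that layer."""
    layers = zip_longest(*widths, fillvalue=0)  # pad shorter branches with empty layers
    return p * (p - 1) * sum(max(layer) for layer in layers)


def shared_triples_needed(widths):
    """Triples shared across branches: enough for the branch with the most AND gates."""
    return max(sum(branch) for branch in widths)


aligned = [[4, 4], [4, 4]]   # both branches have 4 AND gates in each layer
skewed = [[8, 0], [0, 8]]    # same gate counts, but in different layers
```

With p = 2, `motif_ots` gives 16 for `aligned` but 32 for `skewed` (no better than `naive_gmw_ots`), whereas `shared_triples_needed` gives 8 in both cases.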

#### 2.5 Semi-honest Triple-Based Protocol Review

In this work, we amortize Beaver triples across conditional branches. We thus review how triples enable non-linear operations over XOR sharings.

Suppose the parties hold sharings  $[\![x]\!]$  and  $[\![y]\!]$  and wish to compute a uniform sharing  $[\![xy]\!]$ . Suppose further that the parties have a Beaver triple: they have three uniform sharings  $[\![a]\!], [\![b]\!], [\![ab]\!]$  where  $a, b \in_{\$} \{0, 1\}$  are uniform bits unknown to any strict subset of parties. First, the parties locally compute  $[\![a \oplus x]\!]$  and  $[\![b \oplus y]\!]$ , then reconstruct the semantic values  $a \oplus x$  and  $b \oplus y$  by broadcasting shares. This is secure:  $a \oplus x$  leaks nothing about x because a is uniform and secret (and similarly for y). The parties can now compute  $[\![xy]\!]$  as follows:

$$(a \oplus x)(b \oplus y) \oplus (a \oplus x)\llbracket b \rrbracket \oplus (b \oplus y)\llbracket a \rrbracket \oplus \llbracket ab \rrbracket = \llbracket xy \rrbracket$$

This protocol is simple and efficient: the parties broadcast only two bits per AND gate. However, because the triple values a and b are used as one-time-pads on semantic values, one triple is typically needed per gate. Thus, the parties must preprocess many triples according to the functionality in Fig. 2. Computing this functionality is often the most expensive step in triple-based protocols. For example, triples might be achieved via the classic GMW protocol, requiring p(p-1) OTs per triple. In this work, we show a technique that re-uses triples across conditional branches and hence decreases overall cost.
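The AND-gate computation above can be simulated as follows. This is a sketch under stated assumptions: `triple_gen` is a trusted-dealer stand-in for the TripleGen functionality (a real protocol would compute it interactively), all parties are simulated in one process, and the function names are hypothetical.

```python
import secrets
from functools import reduce
from operator import xor


def share(x, p):
    """Split bit x into p XOR shares."""
    shares = [secrets.randbits(1) for _ in range(p - 1)]
    shares.append(reduce(xor, shares, x))
    return shares


def reconstruct(shares):
    return reduce(xor, shares, 0)


def triple_gen(p):
    """Trusted-dealer stand-in (assumption) for TripleGen: sharings of a, b, and ab."""
    a, b = secrets.randbits(1), secrets.randbits(1)
    return share(a, p), share(b, p), share(a & b, p)


def beaver_and(xs, ys, triple):
    """Compute a sharing of xy from sharings of x and y, consuming one triple."""
    a_sh, b_sh, ab_sh = triple
    # Each party broadcasts its share of a XOR x and of b XOR y; both values become public.
    d = reconstruct([a_i ^ x_i for a_i, x_i in zip(a_sh, xs)])
    e = reconstruct([b_i ^ y_i for b_i, y_i in zip(b_sh, ys)])
    # Local linear combination: d*[b] XOR e*[a] XOR [ab].
    out = [(d & b_i) ^ (e & a_i) ^ ab_i for a_i, b_i, ab_i in zip(a_sh, b_sh, ab_sh)]
    out[0] ^= d & e  # P_1 adds the public constant d*e to its share
    return out
```

For example, `reconstruct(beaver_and(share(1, 3), share(1, 3), triple_gen(3)))` evaluates one three-party AND gate on inputs 1 and 1.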

## 3 Related Work

We review related work, focusing on works that optimize secure evaluation of conditional branches or that use multiplication triples in the malicious model.

**MOTIF.** The most closely related work is MOTIF [HKP20]. MOTIF amortizes oblivious transfers across conditional branches in the classic semi-honest GMW protocol [GMW87]. We reviewed this approach in detail in Sect. 2.4, explained why our approach outperforms MOTIF in Sect. 1.2, and present experimental comparisons between the two approaches in Sect. 8.

**Stacked Garbling.** Recent works demonstrated similar conditional improvements for the garbled circuit (GC) technique [Kol18,HK20b,HK20a]. [Kol18,HK20b] reduced communication in settings where one party knows which branch is active. [Kol18] is motivated by the use case where the GC generator knows the active branch, such as when evaluating one of several database queries. [HK20b] is motivated by zero knowledge proofs. [HK20a] superseded these prior works and for the first time showed that communication can be greatly improved even if no party knows which branch is active.

These works' "stacking" technique does not have an obvious analog for interactive multiparty protocols, so different techniques are needed, such as explored in [HKP20] and in this work. However, our approach follows the basic idea of *material re-use* introduced by Stacked Garbling: the expensive material is safely re-used in the (possibly incorrect) evaluation of inactive branches, whose output is obliviously discarded.

**Universal Circuits.** We improve branching via cryptographic techniques. Another approach instead recompiles conditionals into a new form. Universal circuits (UCs) are programmable constructions that can evaluate any circuit up to a given size n. Branches can be compiled to one UC, potentially amortizing cost. At runtime, the UC can be programmed to compute the active branch.

Decades after Valiant's original work [Val76], UC enjoyed renewed interest due to its relevance to MPC, and UC constructions have steadily improved [KS08,LMS16,GKS17,AGKS19,KS16,ZYZL18]. Even with these improvements, representing conditional branches with UCs is often impractical. The state-of-the-art UC construction applied to a circuit with n gates still has factor $3 \log n$ overhead [LYZ+20]. Thus, UC-based conditional evaluation is often more expensive than simply evaluating the conditional naïvely. UC-based branching is superseded by cryptographic techniques such as Stacked Garbling, MOTIF, and this work.

**Maliciously Secure Triple-Based Protocols.** We present an improved triple-based semi-honest protocol. Two exciting and related lines of work explore triple-based protocols in the malicious model. These two lines differ primarily in how they preprocess triples. One line generates triples using homomorphic encryption [BDOZ11, DPSZ12, DKL+13, KPR18] while another generates them using oblivious transfer [NNOB12, LOS14, FKOS15, KOS16, CDE+18]. To achieve malicious security, these methods rely on expensive primitives such as zero knowledge proofs and cut-and-choose. As a result, preprocessing is expensive.

Amortizing triples in these protocols would be an important improvement. While we make no claims in the malicious model, malicious improvements have historically been preceded by similar improvements in the semi-honest model. We leave investigating triple amortization in the malicious model as future work.

## 4 Technical Overview

As reviewed in Sect. 2.5, Beaver triples can efficiently and securely implement AND gates. In general, triples cannot be re-used, and hence a circuit with b branches each with n AND gates typically requires  $n \cdot b$  triples.

As discussed in Sect. 1.1, our key observation is that triples *can* be re-used across conditional branches, as long as uniform XOR masks are *additionally* applied. These masks allow us to re-use the same triple to compute b gates across b branches. Thus b branches each with n gates require only n triples, improving the number of needed triples by a factor of b. Our technique does require the parties to hold additional shared per-branch masks, but these masks are computed cheaply.

This section presents our protocol,  $\Pi_{MT}$  (the 'masked triple protocol'), with detail sufficient to understand our contribution.  $\Pi_{MT}$  securely computes Boolean circuits among p parties, re-uses triples across conditional branches, and is secure against the semi-honest corruption of up to p-1 parties. Full formal algorithms, with accompanying proofs of correctness and security, are in Sect. 5.

#### 4.1 Re-Using Beaver Triples

For simplicity, consider only two branches,  $C^0$  and  $C^1$  and, without loss of generality, let  $C^0$  be the active branch. The parties re-use the same set of triples for both branches. For the inactive branch, the parties will mask the triples with sharings of uniform bits; on the active branch the parties will mask the triples with sharings of zeros.

Suppose the parties hold sharings  $[\![x^0]\!], [\![y^0]\!]$  on branch 0 and  $[\![x^1]\!], [\![y^1]\!]$  on branch 1. Suppose further that they wish to obliviously compute one of  $[\![x^0y^0]\!]$  or  $[\![x^1y^1]\!]$ , depending on which branch is active. Let  $[\![a]\!], [\![b]\!], [\![ab]\!]$  be a uniform preprocessed triple. On the active branch, the parties mask  $[\![a]\!]$  and  $[\![b]\!]$  with uniform sharings of 0:

$$\llbracket a \rrbracket \oplus \llbracket 0 \rrbracket = \llbracket a \rrbracket \qquad \llbracket b \rrbracket \oplus \llbracket 0 \rrbracket = \llbracket b \rrbracket$$

The parties use this masked triple to compute branch 0's AND gate normally: the parties compute and reconstruct  $a \oplus x^0$  and  $b \oplus y^0$ , and then locally compute the correct product:

$$(a \oplus x^0)(b \oplus y^0) \oplus (a \oplus x^0)\llbracket b \rrbracket \oplus (b \oplus y^0)\llbracket a \rrbracket \oplus \llbracket ab \rrbracket = \llbracket x^0 y^0 \rrbracket$$

MaskGen Functionality:

- PARAMETERS: The size of output bitstrings n.
- INPUT: Parties  $P_1, ..., P_p$  provide no private input.
- OUTPUT: Let  $s \in \{0,1\}$  be a uniform bit. Let  $r \in \{0,1\}^n$  be a uniform bitstring. Parties output  $[\![s]\!]$  as well as the pair:

$$M^{0}, M^{1} = \begin{cases} \llbracket 0^{n} \rrbracket, \llbracket \boldsymbol{r} \rrbracket & \text{if } s = 0 \\ \llbracket \boldsymbol{r} \rrbracket, \llbracket 0^{n} \rrbracket & \text{otherwise} \end{cases}$$

All output sharings are uniform.

**Fig. 3.** The MaskGen functionality provides parties with the pairs of masks needed to implement our optimization. The functionality computes two shared bitstrings. One bitstring is the all zero bitstring while the other is uniform. The two strings are swapped according to s, and the parties are given $\llbracket s \rrbracket$.

In contrast, on the inactive branch the parties mask their shares with uniform bits. Let  $r, s \in_{\$} \{0, 1\}$  be two such bits and let the parties hold uniform sharings  $[\![r]\!], [\![s]\!]$ . The parties compute:

$$\llbracket a \rrbracket \oplus \llbracket r \rrbracket = \llbracket a \oplus r \rrbracket \qquad \llbracket b \rrbracket \oplus \llbracket s \rrbracket = \llbracket b \oplus s \rrbracket$$

When the parties use this masked triple, they compute and reconstruct $a \oplus r \oplus x^1$ and $b \oplus s \oplus y^1$, and then locally compute the following expression:

$$(a \oplus r \oplus x^1)(b \oplus s \oplus y^1) \oplus (a \oplus r \oplus x^1)\llbracket b \oplus s \rrbracket \oplus (b \oplus s \oplus y^1)\llbracket a \oplus r \rrbracket \oplus \llbracket ab\rrbracket$$

The above expression does not correctly compute $\llbracket x^1y^1 \rrbracket$, but this is irrelevant since all computations performed in the inactive branch are ultimately discarded by the circuit's conditional behavior.

Now, consider the security of the above re-use. As discussed above, each party views the following reconstructed semantic values:

$$a \oplus x^0 \quad b \oplus y^0 \quad a \oplus r \oplus x^1 \quad b \oplus s \oplus y^1$$

Because a, b, r, s are all uniform, this view is simulated by four uniform bits. Thus, our approach is secure. See Sect. 5.3 for a formal proof.
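The re-use argument can be exercised with a small simulation. The sketch below is illustrative only: `masked_and` applies mask sharings to a triple before running the usual AND computation, and `views` enumerates the four broadcast values over every choice of a, b, r, s for fixed wire values. Function names are hypothetical, and the sharings are built by a trusted dealer in one process rather than by real preprocessing.

```python
import secrets
from functools import reduce
from itertools import product
from operator import xor


def share(x, p):
    shares = [secrets.randbits(1) for _ in range(p - 1)]
    shares.append(reduce(xor, shares, x))
    return shares


def reconstruct(shares):
    return reduce(xor, shares, 0)


def masked_and(xs, ys, triple, mask_a, mask_b):
    """Run the triple-based AND after XORing mask sharings onto [a] and [b].

    Returns the output sharing and the two publicly reconstructed values.
    """
    a_sh, b_sh, ab_sh = triple
    ma = [a_i ^ m_i for a_i, m_i in zip(a_sh, mask_a)]  # sharing of a XOR mask
    mb = [b_i ^ m_i for b_i, m_i in zip(b_sh, mask_b)]  # sharing of b XOR mask
    d = reconstruct([a_i ^ x_i for a_i, x_i in zip(ma, xs)])
    e = reconstruct([b_i ^ y_i for b_i, y_i in zip(mb, ys)])
    out = [(d & b_i) ^ (e & a_i) ^ ab_i for a_i, b_i, ab_i in zip(ma, mb, ab_sh)]
    out[0] ^= d & e
    return out, (d, e)


def views(x0, y0, x1, y1):
    """All broadcast 4-tuples as a, b, r, s range over every combination of bits."""
    return {
        (a ^ x0, b ^ y0, a ^ r ^ x1, b ^ s ^ y1)
        for a, b, r, s in product((0, 1), repeat=4)
    }
```

For any fixed wire values, `views` contains all 16 possible 4-tuples. Since the 16 equally likely choices of (a, b, r, s) map to 16 distinct tuples, the broadcast values are uniform, which matches the simulation claim. With masks that are sharings of 0, `masked_and` returns a correct sharing of the product.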

Although we have shown that mask sharings allow triple amortization, we have not discussed how these sharings are computed. Figure 3 formalizes MaskGen, a preprocessing functionality that computes strings of masks $M^0$ and $M^1$ such that (1) the parties receive a uniform sharing $\llbracket s \rrbracket$ where $s \in_{\$} \{0, 1\}$, (2) $M^s$ is a uniform sharing of all zeros, and (3) $M^{\bar{s}}$ is a uniform sharing of random bits. During the preprocessing phase, the parties use MaskGen to preprocess strings with size sufficient to mask each triple. We formalize and prove secure $\Pi_{MT}$ in the MaskGen-hybrid (and TripleGen-hybrid) model. Instantiations of MaskGen are provided and proved secure in Sect. 6.

**Entering a Conditional.** MaskGen constructs two bitstrings that are ordered according to a uniform bit s (the parties hold a uniform sharing $\llbracket s \rrbracket$). To use our approach, the parties need to appropriately 'line up' the masks with the branches: the active branch should use the all zeros mask and the inactive branch should use the uniform mask. We assume the parties have explicit access to a sharing of the *branch condition*: the parties hold $\llbracket t \rrbracket$. Upon entering the conditional, the parties compute $\llbracket s \oplus t \rrbracket$ and then broadcast their shares to reconstruct $s \oplus t$. If $s \oplus t$ is 0, the parties do nothing. Otherwise, they locally swap their respective shares of the strings $M^0$ and $M^1$. After performing this conditional swap, the parties are assured that $M^t$ is the all zeros mask and $M^{\bar{t}}$ is uniform. Note, $s \oplus t$ does not reveal the active branch t because s is uniform.
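The mask alignment step on entering a conditional can be sketched as follows. Note the assumptions: `mask_gen` is a trusted-dealer stand-in for the MaskGen functionality of Fig. 3 (the paper's actual instantiations appear in Sect. 6 and are not reproduced here), a shared bitstring is represented as a list of n sharings, and all names are hypothetical.

```python
import secrets
from functools import reduce
from operator import xor


def share(x, p):
    shares = [secrets.randbits(1) for _ in range(p - 1)]
    shares.append(reduce(xor, shares, x))
    return shares


def reconstruct(shares):
    return reduce(xor, shares, 0)


def mask_gen(n, p):
    """Trusted-dealer stand-in (assumption) for MaskGen.

    Returns a sharing of a uniform bit s and two shared length-n strings M0, M1,
    where M^s is all zeros and the other string is uniform.
    """
    s = secrets.randbits(1)
    zeros = [share(0, p) for _ in range(n)]
    uniform = [share(secrets.randbits(1), p) for _ in range(n)]
    m0, m1 = (zeros, uniform) if s == 0 else (uniform, zeros)
    return share(s, p), m0, m1


def enter_conditional(s_sh, t_sh, m0, m1):
    """Reconstruct s XOR t and swap the mask strings if it equals 1.

    In a real execution each party swaps only its own shares; here the whole
    sharings are swapped because all parties are simulated together.
    """
    d = reconstruct([s_i ^ t_i for s_i, t_i in zip(s_sh, t_sh)])
    return (m0, m1) if d == 0 else (m1, m0)
```

After `enter_conditional`, the string at index t reconstructs to all zeros for either value of the branch condition t, while the only value made public is s XOR t.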

**Exiting a Conditional.** Exiting conditionals is performed using ordinary Boolean logic. Let  $[\![x^0]\!]$ ,  $[\![x^1]\!]$  be corresponding output sharings from the two branches. We leave the branch by multiplexing each such pair of outputs: we compute  $[\![x^0 \oplus t(x^0 \oplus x^1)]\!] = [\![x^t]\!]$ . Thus, multiplexing requires one AND gate per conditional output.
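As a quick sanity check of the multiplexer identity, the sketch below evaluates it on cleartext bits (in the protocol, the single AND per output would consume a triple). The helper names `mux_bit` and `mux_outputs` are ours.

```python
def mux_bit(t, x0, x1):
    """Select x1 if t == 1, else x0, using one AND and two XORs."""
    return x0 ^ (t & (x0 ^ x1))

def mux_outputs(t, out0, out1):
    """Multiplex corresponding branch outputs; uses len(out0) AND gates."""
    return [mux_bit(t, a, b) for a, b in zip(out0, out1)]
```

For example, `mux_outputs(1, [0, 0, 1], [1, 0, 0])` selects the second list.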

#### 4.2 Nested Branches

We have presented a technique for handling conditionals with only two branches. To generalize to higher branching factors, we *nest* conditionals. At each conditional, we use MaskGen to generate fresh masks and then apply these masks to the (possibly already masked) triples. This trivially and securely allows us to handle arbitrary branching control flow.

As a brief argument of security, consider that each branch uses a distinct mask string from each of its parent conditionals. Further, if the branch is inactive (1) at least one mask string will be uniform and (2) the XOR sum of all uniform mask strings for the branch is *unique*. Thus, all AND gate broadcasts can be simulated by uniform bits. We argue this more formally in Sect. 5.3.

We note that instead of nesting, it is possible to generalize our approach to directly handle vectors of conditionals, e.g., corresponding to program switch statements. This direction is not necessarily preferable: for a circuit with b branches, both techniques amortize a triple across up to b gates, and the work required to generate masks is very similar. We present the nested formalization due to its generality and relative simplicity.

# 5 $\Pi_{MT}$ : Formalization and Proofs

We now present  $\Pi_{MT}$  formally. Section 5.1 begins by defining circuit syntax, including circuits with explicit conditional branching. We then specify our protocol in Sect. 5.2 and prove it correct and secure in Sect. 5.3.

#### 5.1 Circuit Formal Syntax

Conditional branching is central to our approach. Thus, traditional circuits that include only low-level gates are insufficient for our formalization. We instead use the syntax of [HK20a] which makes explicit conditional branching. We review and formally present their syntax.

Conventionally, a circuit is a list of Boolean gates together with specified input and output wires. We refer to this representation as a *netlist*. We do not modify the semantics of netlists and evaluate them using the standard triple-based technique (see Sect. 2.5).

We extend the space of circuits with the notion of a *conditional*. A conditional is parameterized over two circuits,  $C^0$  and  $C^1$ . By convention, the first bit of input to the conditional is the branch condition t. The semantics of a conditional is that branch  $C^t$  is given the remaining input to the overall conditional, and  $C^t$ 's output is returned.

Finally, we require an extra notion that allows us to place conditionals 'in the middle' of the overall circuit. A *sequence* is parameterized over two circuits C' and C''. When executed, the sequence passes its input to C', feeds the output of C' as input to C'', then returns the output of C''.

More formally, the space of circuits C is defined inductively. Let  $C^0, C^1, C', C''$  be arbitrary circuits. The space of circuits is defined as follows:

$$\mathcal{C} \triangleq Netlist(\cdot) \mid Cond(C^0, C^1) \mid Seq(C', C'')$$

That is, a circuit is either (1) a netlist, (2) a conditional, or (3) a sequence. By arbitrarily nesting conditionals and sequences, we may achieve complex branching control structure.
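The inductive definition maps directly onto an algebraic data type. The sketch below gives one possible encoding together with the cleartext semantics described above. It is illustrative only: a netlist is abstracted as a Python function over bit lists, and the class and field names are ours.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass
class Netlist:
    fn: Callable[[List[int]], List[int]]  # cleartext stand-in for a gate list

@dataclass
class Cond:
    c0: "Circuit"
    c1: "Circuit"

@dataclass
class Seq:
    first: "Circuit"
    second: "Circuit"

Circuit = Union[Netlist, Cond, Seq]

def run(c, inp):
    """Cleartext semantics of a circuit in the space defined above."""
    if isinstance(c, Netlist):
        return c.fn(inp)
    if isinstance(c, Cond):
        # By convention, the first input bit is the branch condition t.
        t, rest = inp[0], inp[1:]
        return run(c.c1 if t else c.c0, rest)
    if isinstance(c, Seq):
        return run(c.second, run(c.first, inp))
    raise TypeError("not a circuit")

and_gate = Netlist(lambda w: [w[0] & w[1]])
xor_gate = Netlist(lambda w: [w[0] ^ w[1]])
branch = Cond(and_gate, xor_gate)
```

Here `run(branch, [0, 1, 1])` evaluates the AND branch and `run(branch, [1, 1, 1])` evaluates the XOR branch.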

## 5.2 $\Pi_{MT}$ Formalization

Figure 4 presents our protocol for handling circuits with conditional branching.  $\Pi_{MT}$  first delegates to TripleGen, generating sufficient multiplication triples to handle the circuit, and then delegates to the sub-protocol eval. eval recursively walks the structure of the circuit and securely achieves circuit semantics.

 $\Pi_{MT}$  formalizes the ideas stated in Sect. 4 in a natural manner. The most interesting case in eval is the handling of conditionals, where we (1) invoke the MaskGen oracle, (2) mask the available triples, and (3) recursively evaluate both branches. Although, for clarity, we write MaskGen inline, the actual MaskGen protocol does not depend on any circuit values, and thus can be moved to a pre-processing phase. After evaluating both branches, we discard the inactive branch outputs and propagate the active branch outputs via a *multiplexer*. The multiplexer is implemented simply as a netlist, and computes the following function for each corresponding pair of branch outputs  $\llbracket x^0 \rrbracket, \llbracket x^1 \rrbracket$ :

$$\llbracket x^0 \oplus t(x^0 \oplus x^1) \rrbracket = \llbracket x^t \rrbracket \tag{1}$$

For simplicity, we abstract some algorithms and briefly describe them below. Other than  $\Pi_{\text{Base}}$ , which is discussed in Sect. 2.5, we do not write these algorithms in full, as they are simple.

```
Functionality:
  - PARAMETERS: A circuit C \in \mathcal{C}.
  - INPUT: Each party P_i inputs a private bitstring inp_i \in \{0, 1\}^*.
  - OUTPUT: Each party P_i outputs C(inp_1, ..., inp_p).
Semi-Honest Protocol:
   \Pi_{\mathsf{MT}}(C, inp_1, ..., inp_p):
      \llbracket inp \rrbracket \leftarrow \texttt{shareinput}(inp_1, ..., inp_p)
      \llbracket triples \rrbracket \leftarrow \texttt{TripleGen}(\texttt{neededtriples}(C))
      \llbracket out \rrbracket \leftarrow \texttt{eval}(C, \llbracket triples \rrbracket, \llbracket inp \rrbracket)
      return \texttt{reconstruct}(\llbracket out \rrbracket)
   \texttt{eval}(C, \llbracket triples \rrbracket, \llbracket inp \rrbracket):
      switch C:
        case Seq(C', C''):
            \triangleright Parse \llbracket triples \rrbracket into parts of sufficient length for each circuit.
            \llbracket triples' \rrbracket, \llbracket triples'' \rrbracket \leftarrow \llbracket triples \rrbracket
            return \texttt{eval}(C'', \llbracket triples'' \rrbracket, \texttt{eval}(C', \llbracket triples' \rrbracket, \llbracket inp \rrbracket))
        case Cond(C^0, C^1):
            \triangleright Parse \llbracket triples \rrbracket into triples for the conditional and for the multiplexer.
            \llbracket triples_{cond} \rrbracket, \llbracket triples_{mux} \rrbracket \leftarrow \llbracket triples \rrbracket
            \triangleright Split the branch condition t from the other branch inputs.
            \triangleright By convention, the first input wire is the condition bit.
            \llbracket t \rrbracket, \llbracket inp_{cond} \rrbracket \leftarrow \llbracket inp \rrbracket
            \llbracket M^0 \rrbracket, \llbracket M^1 \rrbracket, \llbracket s \rrbracket \leftarrow \texttt{MaskGen}(2 \cdot |\llbracket triples_{cond} \rrbracket|)
            (s \oplus t) \leftarrow \texttt{reconstruct}(\llbracket s \rrbracket \oplus \llbracket t \rrbracket)
            \triangleright Each party locally swaps its mask shares if s \oplus t = 1.
            (\llbracket M^0 \rrbracket, \llbracket M^1 \rrbracket) \leftarrow \text{if } (s \oplus t) \text{ then } (\llbracket M^1 \rrbracket, \llbracket M^0 \rrbracket) \text{ else } (\llbracket M^0 \rrbracket, \llbracket M^1 \rrbracket)
            \llbracket out^0 \rrbracket \leftarrow \texttt{eval}(C^0, \texttt{applymask}(\llbracket M^0 \rrbracket, \llbracket triples_{cond} \rrbracket), \llbracket inp_{cond} \rrbracket)
            \llbracket out^1 \rrbracket \leftarrow \texttt{eval}(C^1, \texttt{applymask}(\llbracket M^1 \rrbracket, \llbracket triples_{cond} \rrbracket), \llbracket inp_{cond} \rrbracket)
            return \texttt{mux}(\llbracket t \rrbracket, \llbracket out^0 \rrbracket, \llbracket out^1 \rrbracket, \llbracket triples_{mux} \rrbracket)
        case Netlist(g_1, ..., g_k):
            \triangleright Evaluate with a standard triple-based protocol.
            return \Pi_{\text{Base}}((g_1, ..., g_k), \llbracket triples \rrbracket, \llbracket inp \rrbracket)
```

Fig. 4.  $\Pi_{MT}$  allows p parties to securely compute a circuit  $C \in \mathcal{C}$ .  $\Pi_{MT}$  delegates to a recursive sub-procedure eval.

-  $\Pi_{\text{Base}}$  is the standard triple-based protocol as specified in Sect. 2.5.  $\Pi_{\text{Base}}$  takes as input (1) a vector of gates  $(g_1, ..., g_k)$ , (2) a vector of (possibly masked) triples $\llbracket triples \rrbracket$, and (3) the netlist input $\llbracket inp \rrbracket$.  $\Pi_{\text{Base}}$  returns a sharing of outputs $\llbracket out \rrbracket$.

We emphasize that while we do not, for simplicity, explicitly list  $\Pi_{\text{Base}}$ , the protocol *is not* a black-box functionality.<sup>3</sup>

- neededtriples computes the number of needed triples for the circuit C. This computed number is equal to the number of AND gates on the circuit's longest execution path.
- shareinput allows the parties to construct and distribute sharings of their respective private inputs.
- reconstruct allows parties to reconstruct a sharing via broadcast.
- mux computes the per-output multiplexer function (Eq. 1).
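- applymask specifies how mask sharings are XORed onto triples. Specifically, for each uniformly shared triple $\llbracket a \rrbracket, \llbracket b \rrbracket, \llbracket c \rrbracket$, we draw two bits from the mask sharing and XOR one bit onto $\llbracket a \rrbracket$ and the other onto $\llbracket b \rrbracket$. The third entry $\llbracket c \rrbracket$ is left unchanged.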
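To make the triple count concrete, the following sketch computes it over a simplified circuit description. It rests on assumptions that are ours: a netlist is summarized only by its AND-gate and output counts, both branches of a conditional have the same number of outputs, and the multiplexer costs one AND per conditional output (Sect. 4). The helper `naive_triples` is included only for comparison with a protocol that does not amortize across branches.

```python
from dataclasses import dataclass

@dataclass
class Netlist:
    and_gates: int
    outputs: int

@dataclass
class Cond:
    c0: object
    c1: object

@dataclass
class Seq:
    first: object
    second: object

def num_outputs(c):
    if isinstance(c, Netlist):
        return c.outputs
    if isinstance(c, Cond):
        return num_outputs(c.c0)  # assumption: both branches agree
    return num_outputs(c.second)

def needed_triples(c):
    """AND gates on the longest execution path, including multiplexers."""
    if isinstance(c, Netlist):
        return c.and_gates
    if isinstance(c, Seq):
        return needed_triples(c.first) + needed_triples(c.second)
    # Both branches share one pool of triples, so only the larger one counts.
    return max(needed_triples(c.c0), needed_triples(c.c1)) + num_outputs(c)

def naive_triples(c):
    """One triple per AND gate, with no sharing across branches."""
    if isinstance(c, Netlist):
        return c.and_gates
    if isinstance(c, Seq):
        return naive_triples(c.first) + naive_triples(c.second)
    return naive_triples(c.c0) + naive_triples(c.c1) + num_outputs(c)
```

For a conditional whose branches have 10 and 4 AND gates and 2 outputs, the sketch yields 12 triples, against 16 for the naive count.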

## 5.3 $\Pi_{MT}$ Proofs

Now that we have introduced  $\Pi_{MT}$ , we prove it both correct and secure in the MaskGen- and TripleGen-hybrid model. We instantiate MaskGen in Sect. 6.

In both proofs, we refer to the notion of a *valid* triple. A triple is valid if it is uniformly shared and of the following form:  $[\![a]\!], [\![b]\!], [\![ab]\!]$ . That is, the third term is a sharing of the product of the first two terms. An *invalid* triple is a triple  $[\![a]\!], [\![b]\!], [\![c]\!]$  such that  $c \neq ab$ . Invalid triples arise in our protocol due to the application of extra masks to the first two entries in triples.

**Theorem 1** ( $\Pi_{MT}$  correctness). For all circuits  $C \in \mathcal{C}$  and all private inputs  $inp_1, ..., inp_p \in \{0, 1\}^*$ :

$$\Pi_{\mathsf{MT}}(C, inp_1, ..., inp_p) = C(inp_1, ..., inp_p)$$

*Proof.* By induction on the structure of C. The inductive invariant is as follows:

If the triples passed to eval are valid, then eval correctly implements the semantics of C.

We focus on conditionals; the correctness of netlists follows trivially from the standard triple-based protocol. The correctness of sequences is immediate.

Suppose C is a conditional  $Cond(C^0, C^1)$ . Further, suppose  $\llbracket t \rrbracket$  is the branch condition. If the triples passed to the conditional are invalid, then the inductive invariant vacuously holds. Thus, we need only consider evaluation of a conditional on valid triples. The oracle call to MaskGen constructs two mask strings $M^0, M^1$  such that  $M^s$  is all-zero and  $M^{\bar{s}}$  is uniform. By reconstructing  $s \oplus t$  and accordingly locally swapping the two mask strings, the parties ensure  $M^t$  is the all-zero string. Thus,  $C^t$  is given valid triples (the valid triples are masked by sharings of zeros and hence remain valid) and, by induction, returns the correct semantic outputs.  $C^{\bar{t}}$  will not return correct values, but these outputs are discarded by the multiplexer. Thus, conditionals support the inductive invariant.

<sup>3</sup> We cannot support  $\Pi_{\text{Base}}$  as a black-box because it is possible to implement a protocol that securely handles netlists, but that is insecure when given *masked* triples. For example, the parties could out-of-band check the validity of triples and reveal all inputs if the triples are found invalid. Therefore,  $\Pi_{\text{Base}}$  is a white-box protocol.

The top level circuit is given valid triples via the oracle call to TripleGen. This fact, combined with the inductive invariant, implies that  $\Pi_{MT}$  is correct.  $\Box$ 
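The role of the all-zero mask in this argument can be seen on a single AND gate. The sketch below works on semantic (reconstructed) values and uses the textbook Beaver-triple formula. We assume, without having Sect. 2.5 in front of us, that this matches the AND step of $\Pi_{\text{Base}}$; the function names are ours.

```python
from itertools import product

def apply_mask(triple, mask_a, mask_b):
    """XOR one mask bit onto a and one onto b; c is left unchanged."""
    a, b, c = triple
    return (a ^ mask_a, b ^ mask_b, c)

def beaver_and(x, y, triple):
    """Textbook Beaver AND on semantic values: d and e are the opened bits."""
    a, b, c = triple
    d = x ^ a
    e = y ^ b
    return c ^ (d & b) ^ (e & a) ^ (d & e)

def always_correct(mask_a, mask_b):
    """Does a valid triple, once masked, still give x AND y on all inputs?"""
    return all(
        beaver_and(x, y, apply_mask((a, b, a & b), mask_a, mask_b)) == (x & y)
        for x, y, a, b in product((0, 1), repeat=4)
    )
```

With the zero mask the gate is always correct. With any nonzero mask the result is off by $c \oplus a'b'$ for the masked entries $a', b'$, which is why the inactive branch's outputs may be wrong and must be discarded by the multiplexer.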

**Theorem 2** ( $\Pi_{MT}$  security).  $\Pi_{MT}$  is secure against semi-honest corruption of up to p-1 parties in the TripleGen-hybrid and MaskGen-hybrid model.

*Proof.* By construction of a simulator for one party, which we later generalize to simulate up to p-1 parties. Each broadcast received by a party can be simulated by a uniform bit.<sup>4</sup> We prove this simulation secure by induction on the structure of the circuit C. The inductive invariant is as follows:

Let $\llbracket a \rrbracket, \llbracket b \rrbracket, \llbracket c \rrbracket$ be a (possibly invalid) triple. For each triple, we refer to the semantic values a and b as the *one-time-pad* parts. eval uses both one-time-pad parts of each triple to mask at most one cleartext value.

For netlists, this is trivial: we use a distinct triple for each AND gate, and each one-time-pad part is used only to mask one of the gate inputs. Similarly, sequences satisfy the inductive invariant trivially: we provide different triples to both parts of the sequence.

Therefore we focus on conditionals. Consider a conditional  $Cond(C^0, C^1)$ . As a brief aside from proving that the inductive invariant holds, while the parties reconstruct  $s \oplus t$ , s is uniform, and hence this leaks nothing about t (i.e.,  $s \oplus t$  can be simulated by a uniform bit). Now, returning to the invariant: We first split the triples into sufficient numbers for the conditional body and for the multiplexer. The multiplexer is implemented by a netlist, and hence trivially satisfies our invariant. The conditional body is more complicated. Indeed, we use the same triples to evaluate both branches. However, our call to MaskGen together with the conditional swap ensures that  $M^{\bar{t}}$  is a sharing of uniform bits. When we apply  $M^{\bar{t}}$  to the triples, we re-randomize the one-time-pad parts of the triples. (Note, applying  $M^t$  (the all zeros mask) has no effect on the one-time-pad parts.) Thus, we provide independent one-time-pad parts to both  $C^0$  and  $C^1$ , satisfying the inductive invariant.

Because each one-time-pad part is (1) uniform and (2) used to mask at most one cleartext value, and because each broadcast is masked by a one-time-pad part, each broadcast can be simulated by a uniform bit. Thus, we can simulate a single party's view.

<sup>&</sup>lt;sup>4</sup> One caveat is that broadcasts used to reconstruct the circuit's outputs must XOR to the correct output value. The simulator must arrange the simulated output broadcasts such that they appropriately add up. This is typical in MPC proofs and is easy to set up.

The generalization from simulating one party to simulating up to p-1 is based on a simple observation about XOR secret shares: the view of p-1 parties holds no more information than the share of 1 party. The remaining broadcasts from the remaining, unsimulated parties can be simulated by uniform bits.

 $\Pi_{MT}$  is secure against semi-honest corruption of up to p-1 parties.

Some MPC techniques, e.g., computing multiplicative inverse [BIB89], rely on opening (randomized) intermediate values. This may not always be compatible with our optimization, since our randomization of the inactive branch may cause an invalid opened value, thereby revealing that it was in fact inactive.

## 6 Semi-Honest MaskGen Instantiations

In this section, we instantiate MaskGen (Fig. 3). We present three protocols, two formally and one informally, that follow two general approaches:

- 1. The first approach is generic in that it works for an arbitrary number of parties and is based on vector scalar multiplication (Sect. 2.3). Since our approach often uses long masks, we introduce a useful trick that improves vector scalar multiplication for long vectors.
- 2. In special cases, masks can be more efficiently derived starting from short seeds. We present two- and three-party protocols which require communication proportional only to  $\kappa$  rather than to the mask length n.

## 6.1 *p*-Party Mask Generation

Our general mask generation technique,  $\Pi$ -MaskGen-VS (Fig. 5), allows p parties to preprocess length-n masks using only a single VS gate.

In this protocol, parties jointly sample a uniform sharing of a uniform bit  $[\![s]\!]$ and a uniform bitstring  $[\![r]\!]$ . The parties compute  $[\![sr]\!]$  via a VS gate, set the first mask to  $[\![M^0]\!] = [\![sr]\!]$ , and set the second mask to  $[\![M^1]\!] = [\![sr]\!] \oplus [\![r]\!]$ .

 $\Pi\text{-}\mathsf{MaskGen}\text{-}\mathsf{VS}$  is correct and secure.

**Theorem 3.** $\Pi$-MaskGen-VS correctly implements MaskGen.

*Proof.* s and r are uniform. Depending on s, the product sr is, of course, either all zeros or r. Thus, setting  $\llbracket M^0 \rrbracket = \llbracket sr \rrbracket$  and  $\llbracket M^1 \rrbracket = \llbracket sr \rrbracket \oplus \llbracket r \rrbracket$  places the all zeros mask in  $M^s$ .

**Theorem 4.**  $\Pi$ -MaskGen-VS is secure against semi-honest corruption of up to p-1 parties in the VS-hybrid model.

*Proof.* We communicate only once: when evaluating a single VS gate. Hence, the simulator is trivially constructed from the VS gate simulator.  $\Box$ 

Functionality:

- Parties $P_1, \ldots, P_p$ compute MaskGen.

Semi-Honest Protocol:

$$\begin{array}{l}
\Pi\text{-}\mathsf{MaskGen}\text{-}\mathsf{VS}(n): \quad \triangleright\ n \text{ is the output mask length.} \\
\quad \triangleright\ \text{Parties jointly sample } \boldsymbol{r} \text{ and } s. \\
\quad \llbracket s \rrbracket \in_{\$} \{0,1\}; \ \llbracket \boldsymbol{r} \rrbracket \in_{\$} \{0,1\}^{n} \\
\quad \llbracket s \boldsymbol{r} \rrbracket \leftarrow \llbracket s \rrbracket \llbracket \boldsymbol{r} \rrbracket \quad \triangleright\ \text{Computed via VS gate.} \\
\quad \llbracket M^{0} \rrbracket \leftarrow \llbracket s \boldsymbol{r} \rrbracket \\
\quad \llbracket M^{1} \rrbracket \leftarrow \llbracket s \boldsymbol{r} \rrbracket \oplus \llbracket \boldsymbol{r} \rrbracket \\
\quad \triangleright\ M^{s} \text{ is all zeros, } M^{\overline{s}} \text{ is uniform.} \\
\quad \text{return } \llbracket s \rrbracket, \llbracket M^{0} \rrbracket, \llbracket M^{1} \rrbracket
\end{array}$$

Fig. 5. Protocol $\Pi$-MaskGen-VS is our default method for generating masks, and is secure for an arbitrary number of parties.
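The mask construction of $\Pi$-MaskGen-VS can be simulated in a few lines. This is a sketch only: `vs_gate` is an idealized stand-in for the VS oracle (it reconstructs its inputs and re-shares the product, which a real protocol would never do), sharings are lists of per-party bit lists, and all names are ours.

```python
import secrets

def share_bits(bits, p):
    """XOR-share a list of bits among p parties."""
    shares = [[secrets.randbelow(2) for _ in bits] for _ in range(p - 1)]
    last = list(bits)
    for share in shares:
        last = [x ^ y for x, y in zip(last, share)]
    return shares + [last]

def reconstruct(shares):
    out = [0] * len(shares[0])
    for share in shares:
        out = [x ^ y for x, y in zip(out, share)]
    return out

def vs_gate(s_sh, r_sh):
    """Idealized VS oracle: returns a fresh sharing of the product s * r."""
    s = reconstruct(s_sh)[0]
    r = reconstruct(r_sh)
    return share_bits([s & bit for bit in r], len(s_sh))

def mask_gen_vs(n, p):
    s_sh = share_bits([secrets.randbelow(2)], p)
    r_sh = share_bits([secrets.randbelow(2) for _ in range(n)], p)
    sr_sh = vs_gate(s_sh, r_sh)
    m0 = sr_sh
    # Each party XORs its share of sr with its share of r, locally.
    m1 = [[x ^ y for x, y in zip(a, b)] for a, b in zip(sr_sh, r_sh)]
    return s_sh, m0, m1
```

If $s = 0$ then $M^0 = 0$ and $M^1 = \boldsymbol{r}$; if $s = 1$ then $M^0 = \boldsymbol{r}$ and $M^1 = \boldsymbol{r} \oplus \boldsymbol{r} = 0$. In both cases $M^s$ is the all zeros mask.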

**Vector Scalar Multiplication for Long Vectors.** We have shown that VS gates can efficiently compute pairs of masks. However, this requires us to evaluate VS gates over potentially long vectors: we compute VS gates over vectors with length proportional to the number of AND gates, which can be arbitrarily large.

As discussed in Sect. 2.3, we decompose vector scalar products into summands, some that are computed locally and others that are computed interactively. For each interactive summand, one party holds a bit a, one a vector b, and the two must jointly compute  $[\![ab]\!]$ . Let n be the length of b. To compute this product, the protocol presented by [HKP20] requires two messages of length n. In this section, we introduce a natural trick,  $\Pi$ -Half-VS-Long, that reduces this communication cost by half: only one message of length n need be sent, and the other can be derived from a pseudo-random seed. Both the functionality and the protocol are listed in Fig. 6. We explain the  $\Pi$ -Half-VS-Long trick in more detail in our proof of correctness. Our trick is similar to techniques in [KK13, ALSZ13]. Recall (from Sect. 2.3) that p(p-1) interactive summands emerge from a single vector scalar multiplication. Thus, we can compose a full vector scalar multiplication protocol from p(p-1) calls to  $\Pi$ -Half-VS-Long. We refer to this full protocol as  $\Pi$ -VS-Long.

**Theorem 5.**  $\Pi$ -Half-VS-Long, and hence  $\Pi$ -VS-Long, is correct.

*Proof.* The key observation is that  $P_1$ 's input bit a determines one of two possible outcomes for the vector scalar multiplication. If a = 0, the output is a sharing of all zeros. In this case,  $P_1$  and  $P_2$ 's output shares must XOR to zeros. If a = 1, the output is a sharing of  $P_2$ 's input vector  $\boldsymbol{b}$ .

```
Functionality:
   - INPUT: Party P_1 inputs a private bit a, P_2 inputs a private vector \boldsymbol{b}.
   - OUTPUT: Parties output a uniform sharing \llbracket a\boldsymbol{b} \rrbracket.
Semi-Honest Protocol:
 \Pi-Half-VS-Long(a, \boldsymbol{b}):
 P_1 :                                                    P_2 :
                                                          \boldsymbol{S} \in_{\$} \{0,1\}^{\kappa}
                                                          \triangleright P_2 expands \boldsymbol{S} to get his share of a\boldsymbol{b}.
                                                          (a\boldsymbol{b})_2 \leftarrow G(\boldsymbol{S})
                                                          \boldsymbol{k} \in_{\$} \{0,1\}^{\kappa}
                                                          \triangleright P_2 sends G(\boldsymbol{k}) \oplus ((a\boldsymbol{b})_2 \oplus \boldsymbol{b}) to P_1.
                                                          \boldsymbol{v} \leftarrow G(\boldsymbol{k}) \oplus ((a\boldsymbol{b})_2 \oplus \boldsymbol{b})
     \boldsymbol{v} \leftarrow \texttt{recv}(P_2)                        \texttt{send}(\boldsymbol{v}, P_1)
     \triangleright P_1 and P_2 run OT.
     \triangleright P_1 is the OT receiver.                   \triangleright P_2 is the OT sender.
     \boldsymbol{w} \leftarrow \mathsf{OT}_{\mathsf{recv}}(a)                           \mathsf{OT}_{\mathsf{send}}(\boldsymbol{S}, \boldsymbol{k})
     \triangleright \boldsymbol{w} = a(\boldsymbol{S} \oplus \boldsymbol{k}) \oplus \boldsymbol{S}.
     \triangleright If a = 0, G(\boldsymbol{w}) = G(\boldsymbol{S}) = (a\boldsymbol{b})_2.
     \triangleright If a = 1, G(\boldsymbol{w}) \oplus \boldsymbol{v} = G(\boldsymbol{k}) \oplus \boldsymbol{v} = (a\boldsymbol{b})_2 \oplus \boldsymbol{b}.
     (a\boldsymbol{b})_1 \leftarrow G(\boldsymbol{w}) \oplus a\boldsymbol{v}
     return (a\boldsymbol{b})_1                                    return (a\boldsymbol{b})_2
```

Fig. 6.  $\Pi$ -Half-VS-Long can be used to evaluate the interactive subterms that emerge from computing a VS gate.

We achieve this functionality with a single 1-out-of-2 OT of length-$\kappa$ strings.  $P_1$  acts as the OT receiver and uses a as her choice bit. If a = 0,  $P_1$  receives the seed that  $P_2$  used to generate his share.  $P_1$  expands this seed and obtains the same share as  $P_2$, so the two shares XOR to zeros. If a = 1,  $P_1$  receives a key that lets her decrypt a ciphertext sent separately by  $P_2$ . The ciphertext holds a valid share of **b**.

The correctness of  $\Pi$ -VS-Long is immediate from the correctness of  $\Pi$ -Half-VS-Long and [HKP20]'s VS instantiation.
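The two cases of the correctness argument can be exercised in a short simulation of $\Pi$-Half-VS-Long. This is a sketch under several assumptions of ours: SHAKE-128 from Python's `hashlib` stands in for the PRG $G$, the OT oracle is modeled by a plain function, vectors are byte strings, and $\kappa$ is fixed to 128 bits.

```python
import hashlib
import secrets

KAPPA_BYTES = 16  # assumed security parameter of 128 bits

def G(seed, n_bytes):
    """Stand-in PRG (an assumption): SHAKE-128 expansion of the seed."""
    return hashlib.shake_128(seed).digest(n_bytes)

def xor_bytes(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def ideal_ot(choice, m0, m1):
    """Idealized 1-out-of-2 OT: the receiver learns only the chosen message."""
    return m1 if choice else m0

def half_vs_long(a, b):
    """Returns (P_1's share, P_2's share) of the product a * b."""
    n = len(b)
    # P_2: derive own share from seed S, encrypt (share XOR b) under key k.
    S = secrets.token_bytes(KAPPA_BYTES)
    share2 = G(S, n)
    k = secrets.token_bytes(KAPPA_BYTES)
    v = xor_bytes(G(k, n), xor_bytes(share2, b))  # the single length-n message
    # P_1: obtain S (if a = 0) or k (if a = 1) via OT.
    w = ideal_ot(a, S, k)
    share1 = G(w, n)
    if a:
        share1 = xor_bytes(share1, v)
    return share1, share2
```

Only `v` has length $n$; the OT transfers length-$\kappa$ strings. XORing the two returned shares gives all zeros when $a = 0$ and $\boldsymbol{b}$ when $a = 1$.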

Next, we prove this faster vector scalar multiplication procedure secure. Ideally, we would modularly prove  $\Pi$ -Half-VS-Long and  $\Pi$ -VS-Long secure by simulation. Unfortunately, this is not possible. Specifically, suppose that in  $\Pi$ -Half-VS-Long  $P_1$  provides input a = 0. In this case,  $P_1$  outputs the expansion of the pseudorandom seed w received from the OT oracle. Now we need to simulate w that matches the expansion G(w) output by the protocol. Since G is assumed secure, this simulation is infeasible. Therefore, we forego modularity, and instead prove the security of our top level circuit protocol,  $\Pi_{MT}$, but where we instantiate the MaskGen functionality based on  $\Pi$ -VS-Long. With our PRG-utilizing subprocedures 'inlined', we can prove the top-level protocol secure, since the expansions of PRG seeds no longer appear as protocol outputs and we can simulate the seeds simply by random strings.

**Theorem 6.** Let  $\Pi$ -MaskGen-VS' be the protocol  $\Pi$ -MaskGen-VS (Fig. 5), with VS instantiated by  $\Pi$ -VS-Long. Let  $\Pi'_{MT}$  be the protocol  $\Pi_{MT}$  (Fig. 4), with MaskGen instantiated by  $\Pi$ -MaskGen-VS'.  $\Pi'_{MT}$  is secure against semi-honest corruption of up to p-1 parties in the TripleGen-hybrid and OT-hybrid model.

*Proof.* By construction of a simulator.

The proof is similar to that of Theorem 2, so we elide most details. Because we explicitly instantiate MaskGen with  $\Pi$ -MaskGen-VS', we focus on the corresponding difference in the proof and explain how we simulate  $\Pi$ -MaskGen-VS' messages. Namely, we argue that all messages of  $\Pi$ -MaskGen-VS' are simulated by uniform bits.

 $\Pi$ -MaskGen-VS' invokes  $\Pi$ -VS-Long, which in turn makes 2(p-1) perparty calls to  $\Pi$ -Half-VS-Long. Each pair of parties  $P_i, P_j$  jointly call  $\Pi$ -Half-VS-Long twice, once where  $P_i$  is the receiver and once where  $P_i$  is the sender.

When  $P_i$  is the receiver, he receives two messages:

- First,  $P_i$  receives from  $P_j$  an encrypted share of **b**:  $P_j$  chooses a PRG seed  $\mathbf{k}$ , expands  $G(\mathbf{k})$ , and then sends  $G(\mathbf{k}) \oplus G(\mathbf{S}) \oplus \mathbf{b}$  to  $P_i$ . The simulator can simulate this received message by uniform bits because  $\mathbf{k}$  and  $\mathbf{S}$  are both uniform and because G is a secure PRG.
- Second,  $P_i$  receives a message from the OT oracle. Depending on its input bit  $a_i$ ,  $P_i$  receives either the seed $\mathbf{k}$ or the seed $\mathbf{S}$ (which was used to generate  $(a\mathbf{b})_2$ ). In either case, the received seed is simulated by a uniform string. This does not conflict with the previously simulated message  $G(\mathbf{k}) \oplus G(\mathbf{S}) \oplus \mathbf{b}$ , since one of the seeds $\mathbf{k}$ or $\mathbf{S}$ remains hidden from  $P_i$ .

If  $P_i$  is the sender, no message is received;  $P_i$ 's view is trivially simulated.

It is easy to see that the masks produced by  $\Pi$ -MaskGen-VS' are used exactly once in  $\Pi'_{MT}$ , and hence the inductive invariant of Theorem 2 is maintained.

 $\Pi'_{\mathsf{MT}}$  is secure.


#### 6.2 Efficient 2PC and 3PC Mask Generation

In this section, we present two efficient implementations of MaskGen, one for two parties, and one for three. At a high level, these methods are based on (1) distributing pseudo-random seeds and (2) expanding the seeds with a PRG into *n*-bit masks. The advantage of these seed-based methods is that they use communication proportional only to  $\kappa$ . This is a significant improvement over $\Pi$-MaskGen-VS, which uses communication proportional to the mask length *n*.

```
Functionality:
   - Parties P_1 and P_2 compute MaskGen.
Semi-Honest Protocol:
   \Pi-MaskGen-2P(n):
      \llbracket s \rrbracket \in_{\$} \{0,1\}; \ \llbracket \boldsymbol{S} \rrbracket \in_{\$} \{0,1\}^{\kappa}
      \llbracket s\boldsymbol{S} \rrbracket \leftarrow \llbracket s \rrbracket \llbracket \boldsymbol{S} \rrbracket   \triangleright Computed via VS gate.
      \triangleright P_1, P_2 compute uniform shares of two seeds \boldsymbol{S}^0, \boldsymbol{S}^1.
      \llbracket \boldsymbol{S}^0 \rrbracket \leftarrow \llbracket s\boldsymbol{S} \rrbracket; \ \llbracket \boldsymbol{S}^1 \rrbracket \leftarrow \llbracket s\boldsymbol{S} \rrbracket \oplus \llbracket \boldsymbol{S} \rrbracket
      \triangleright P_1, P_2 expand their PRG seeds into n-bit masks.
      \llbracket M^0 \rrbracket \leftarrow \Pi-Expand-2P(\llbracket \boldsymbol{S}^0 \rrbracket, n)
      \llbracket M^1 \rrbracket \leftarrow \Pi-Expand-2P(\llbracket \boldsymbol{S}^1 \rrbracket, n)
      \triangleright M^s is all zeros, M^{\overline{s}} is uniform.
      return \llbracket s \rrbracket, \llbracket M^0 \rrbracket, \llbracket M^1 \rrbracket
   \triangleright P_1 and P_2 expand their respective seeds into n-bit strings.
   \Pi-Expand-2P((\boldsymbol{S}_1, \boldsymbol{S}_2), n):
      P_1: return G(\boldsymbol{S}_1, n)                    P_2: return G(\boldsymbol{S}_2, n)
```

Fig. 7. $\Pi$-MaskGen-2P is an efficient two-party protocol for generating masks.

**Two Party Improved Protocol.** Figure 7 presents our mask generation protocol for two parties,  $\Pi$-MaskGen-2P. Here, the parties use vector scalar multiplication to distribute XOR sharings of two length- $\kappa$  strings; one sharing encodes a uniform string and one encodes the all zeros string. The parties then interpret their respective shares as PRG seeds and apply G. Because of the nature of XOR sharings, the two parties' shares of the all zeros string are equal. Hence, for the all zeros sharing, the parties generate the same pseudorandom expansion, so the resultant expansions are a sharing of all zeros. In contrast, the expansion of the random sharing leads to a longer pseudorandom sharing.

By using this protocol, the two parties can share arbitrarily long masks at the cost of only  $O(\kappa)$  bits of communication.

**Theorem 7.** $\Pi$-MaskGen-2P correctly implements MaskGen.

*Proof.* By the correctness of VS gates and properties of XOR shares.

One of  $[S^0]$  and  $[S^1]$  is a sharing of zeros while the other is a sharing of a random bitstring. The position of the all-zeros sharing is determined by a uniform bit s. Consider one such sharing, and interpret both shares as PRG seeds. If the parties' two seeds are the same, then the expanded masks will also be the same, and will therefore XOR to zeros. If the parties' two seeds differ, then, by the properties of the PRG, the expanded masks will XOR to a uniform value.

$\Pi$-MaskGen-2P is correct.
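The seed-expansion argument can be traced with a small simulation. This is a sketch, not the protocol itself: SHAKE-128 stands in for $G$, the VS gate is replaced by an idealized dealer that re-shares $s\boldsymbol{S}$, and $s$ is returned in the clear purely so the result can be inspected (in the protocol it remains shared). All names are ours.

```python
import hashlib
import secrets

KAPPA_BYTES = 16  # assumed security parameter of 128 bits

def G(seed, n_bytes):
    """Stand-in PRG (an assumption): SHAKE-128 expansion of the seed."""
    return hashlib.shake_128(seed).digest(n_bytes)

def xor_bytes(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def mask_gen_2p(n_bytes):
    s = secrets.randbelow(2)
    # XOR sharing of a uniform seed S = S_1 xor S_2.
    S1 = secrets.token_bytes(KAPPA_BYTES)
    S2 = secrets.token_bytes(KAPPA_BYTES)
    # Idealized VS gate: a fresh sharing of s * S.
    product = xor_bytes(S1, S2) if s else bytes(KAPPA_BYTES)
    sS1 = secrets.token_bytes(KAPPA_BYTES)
    sS2 = xor_bytes(product, sS1)
    # Seed sharings: S^0 = sS and S^1 = sS xor S.
    seeds0 = (sS1, sS2)
    seeds1 = (xor_bytes(sS1, S1), xor_bytes(sS2, S2))
    # Each party expands its own share locally; no further communication.
    m0 = tuple(G(seed, n_bytes) for seed in seeds0)
    m1 = tuple(G(seed, n_bytes) for seed in seeds1)
    return s, m0, m1
```

The sharing $\llbracket \boldsymbol{S}^s \rrbracket$ encodes zeros, so both parties hold the same seed and their expansions cancel. Communication is confined to the length-$\kappa$ VS gate, independent of the mask length.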

We next prove  $\Pi$ -MaskGen-2P secure. Like  $\Pi$ -Half-VS-Long, we unfortunately cannot modularly prove this protocol secure by simulation: each party outputs the expansion of a PRG seed that appears in the party's view. We therefore instead prove that  $\Pi_{MT}$  is secure in the case where we instantiate MaskGen with  $\Pi$ -MaskGen-2P. This higher level approach works because the output of a PRG does not appear as final output, so the PRG seeds can be simulated.

**Theorem 8** ( $\Pi_{MT}$-2P Security). Let  $\Pi_{MT}$-2P be  $\Pi_{MT}$  (Fig. 4), where we instantiate MaskGen with  $\Pi$-MaskGen-2P.  $\Pi_{MT}$-2P is secure against semi-honest corruption of a single party in the TripleGen-hybrid and VS-hybrid model.

*Proof.* The proof is nearly identical to that of  $\Pi_{MT}$  (Theorem 2); we therefore focus our discussion on the call to  $\Pi$ -MaskGen-2P.

In  $\Pi$ -MaskGen-2P, the parties jointly sample uniform sharings  $[\![s]\!]$  and  $[\![S]\!]$ , and then compute  $[\![sS]\!]$  via VS. VS outputs uniform sharings, and so the message each party receives from the VS oracle is simulated by uniform bits. The parties locally expand their shares to obtain masks  $[\![M^0]\!]$  and  $[\![M^1]\!]$ . Because G is a secure PRG,  $[\![M^{\bar{s}}]\!]$  is a sharing of a *uniform string*.

Now, recall Theorem 2's inductive invariant: we must ensure that the top-level protocol uses the one-time-pad part of each multiplication triple at most once.  $\Pi_{MT}$-2P XORs the current triples with both  $[\![M^0]\!]$  and  $[\![M^1]\!]$ . Because  $[\![M^{\bar{s}}]\!]$  is a sharing of a uniform string, this appropriately rerandomizes the triples for the inactive branch, and hence we support the inductive invariant.

 $\Pi_{MT}$-2P is secure against semi-honest corruption of a single party.

**Three Party Informal MaskGen Protocol.** The three party efficient MaskGen protocol is a relatively straightforward generalization of the two party protocol. However, the mask generation is notationally complex, so for simplicity we present it informally. A similar technique was used in [BKKO20] to help construct a two-private three-server distributed point function.

Unlike the two-party protocol,  $P_1$ ,  $P_2$ , and  $P_3$  each obtain two *pairs* of seeds. Each pair is used to generate one mask by (1) expanding both seeds with a PRG into an *n*-bit string and (2) XORing the two expanded outputs together. At a high level, we ensure that:

- 1. For the all zeros mask, each party holds the same seed as one other party. Thus, their PRG expansions XOR to zeros.
- 2. For the uniform mask, each party holds a seed distinct from all other parties. Thus, their PRG expansions XOR to a uniform mask.

The key difficulty is in making the two above scenarios indistinguishable from the perspective of any strict subset of parties. We contrast these two scenarios, showing that they appear indistinguishable.

In the first case, the parties are given seeds as follows:

 $P_1: S_1, S_2$   $P_2: S_1, S_3$   $P_3: S_2, S_3$ 

If we consider an adversary who corrupts any two parties, he will see that one seed is shared between them and the others appear uniform. In the second case the parties are given seeds as follows:

 $P_1: S_4, S_5$   $P_2: S_4, S_6$   $P_3: S_4, S_7$ 

As in the first case, an adversary that corrupts two parties sees one seed in common; the others are uniform. Hence, the cases are indistinguishable for any one or two parties.

Thus,  $\Pi_{MT}$  instantiated with this three party MaskGen trick results in a secure and correct protocol. Seed distribution can easily be implemented by GMW extended with VS gates: the parties sample seven uniform seeds  $S_1, ..., S_7 \in_{\$} \{0, 1\}^{\kappa}$ , swap them using a VS gate, and output each of them to the appropriate party.
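The two seed layouts above can be checked with a short sketch. It assumes SHAKE-256 as a stand-in for the PRG and uses hypothetical helper names; the seeds are handed out in the clear here, whereas the protocol distributes them via GMW with VS gates.

```python
import hashlib


def prg(seed: bytes, n_bytes: int) -> bytes:
    """Stand-in PRG: expand a seed into n_bytes with SHAKE-256 (an assumption)."""
    return hashlib.shake_256(seed).digest(n_bytes)


def xor_all(blocks):
    """XOR a list of equal-length byte strings."""
    out = bytes(len(blocks[0]))
    for block in blocks:
        out = bytes(x ^ y for x, y in zip(out, block))
    return out


def party_share(seed_pair, n_bytes):
    """A party's mask share: XOR of the expansions of its two seeds."""
    return xor_all([prg(seed, n_bytes) for seed in seed_pair])


def reconstruct_mask(layout, n_bytes):
    """XOR of all three parties' mask shares."""
    return xor_all([party_share(pair, n_bytes) for pair in layout])


def seed_layouts(seeds):
    """Build the zero-mask and uniform-mask layouts from seeds S1..S7."""
    S1, S2, S3, S4, S5, S6, S7 = seeds
    zero_case = [(S1, S2), (S1, S3), (S2, S3)]      # every seed is held twice
    uniform_case = [(S4, S5), (S4, S6), (S4, S7)]   # S5, S6, S7 are held once
    return zero_case, uniform_case


def common_seeds(pair_a, pair_b):
    """Number of seeds two parties have in common."""
    return len(set(pair_a) & set(pair_b))
```

In the first layout every expansion appears exactly twice and cancels. In the second, $G(S_4)$ appears three times, so the mask is $G(S_4) \oplus G(S_5) \oplus G(S_6) \oplus G(S_7)$. In both layouts any two parties share exactly one seed, which is the view the indistinguishability argument relies on.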

# 7 Implementation

We implemented our approach in C++. Specifically, we implemented  $\Pi_{MT}$ , instantiating MaskGen with both  $\Pi$ -MaskGen-VS and  $\Pi$ -MaskGen-2P (we did not implement the three-party variant). We instantiated TripleGen with the natural approach based on random OT. For comparison, we also (1) implemented a standard triple-based protocol and (2) incorporated MOTIF's implementation into our repository. We discuss key aspects of our implementation in Sect. 7.1.

To the best of our knowledge, there is no comprehensive suite of MPC benchmark circuits, particularly for circuits that include conditional branches. Thus, we implemented a random circuit generator to produce benchmarks. In designing the circuit generator, our key goal was to capture the impact of branch alignment on MOTIF's performance such that we can highlight our improvement. The circuit generator samples circuits with a variety of branch alignments. We describe details of circuit generation in Sect. 7.2.

## 7.1 Key Implementation Aspects

Our implementation of $\Pi_{MT}$ is straightforward, but we note some of its interesting aspects. We use the 1-out-of-2 OT protocol of [IKNP03] as implemented by EMP [WMK16] to generate both triples and masks. In $\Pi_{MT}$ and the standard triple-based protocol, we list AND gates in layers so that we can parallelize broadcasts for ANDs in the same circuit layer. MOTIF similarly parallelizes OTs for VS gates in the same layer. Thus, all three protocols use communication rounds proportional to the circuit's multiplicative depth.
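To illustrate why the round count tracks multiplicative depth, the sketch below groups AND gates by depth so that each group could be broadcast in one round. It is a minimal illustration, not the authors' C++ code; the gate encoding `(kind, in_a, in_b, out)` and the function name are assumptions.

```python
from collections import defaultdict


def and_layers(n_inputs, gates):
    """Group AND gate outputs by multiplicative depth.

    `gates` is a topologically ordered list of (kind, in_a, in_b, out)
    tuples with kind in {"AND", "XOR"}; wires 0..n_inputs-1 are inputs.
    """
    depth = {wire: 0 for wire in range(n_inputs)}
    layers = defaultdict(list)
    for kind, a, b, out in gates:
        d = max(depth[a], depth[b])
        if kind == "AND":
            d += 1                  # only AND gates add a communication round
            layers[d].append(out)
        depth[out] = d              # XOR gates are local and keep the depth
    return dict(layers)
```

All AND gates in `layers[d]` depend only on wires of depth less than `d`, so their openings can be sent together. The number of keys in the result is the number of rounds spent on AND gates.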

## 7.2 Random Circuit Generation

Circuit generation consists of three main steps:

- 1. We parameterize circuits on the number of conditional branches, the number of circuit layers, the number of XOR and AND gates per branch, and the number of input/output wires to each branch. Each branch uses the same parameters.
- 2. We uniformly assign a number of gates to each branch's layers. We implement this functionality with the RANCOM algorithm [NW78], which is based on the balls-in-cells problem and is called separately for each branch.
- 3. We connect the gates layer by layer. Specifically, we maintain a pool of wires whose value has already been assigned (i.e., it is a branch input or the output of a gate). For each gate, we uniformly sample two inputs from the pool and choose a fresh output wire. Once a layer has been entirely connected, we add all of that layer's gate outputs to the pool.
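The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' generator: a stars-and-bars sampler stands in for RANCOM, gate kinds are assigned by shuffling, the two inputs of a gate are sampled independently (so they may coincide), and all names are hypothetical.

```python
import random


def random_composition(total, parts, rng):
    """Uniformly sample `parts` nonnegative integers that sum to `total`."""
    # Stars and bars: pick parts-1 bar positions among total+parts-1 slots.
    slots = total + parts - 1
    bars = sorted(rng.sample(range(slots), parts - 1))
    sizes, prev = [], -1
    for bar in bars + [slots]:
        sizes.append(bar - prev - 1)
        prev = bar
    return sizes


def generate_branch(n_inputs, n_layers, n_and, n_xor, rng):
    """Return (layer_sizes, gates); each gate is (layer, kind, in_a, in_b, out)."""
    layer_sizes = random_composition(n_and + n_xor, n_layers, rng)
    kinds = ["AND"] * n_and + ["XOR"] * n_xor
    rng.shuffle(kinds)                     # assumption: kinds placed at random
    pool = list(range(n_inputs))           # wires whose value is already assigned
    next_wire = n_inputs
    gates = []
    for layer, size in enumerate(layer_sizes):
        fresh = []
        for _ in range(size):
            a, b = rng.choice(pool), rng.choice(pool)
            gates.append((layer, kinds.pop(), a, b, next_wire))
            fresh.append(next_wire)
            next_wire += 1
        pool.extend(fresh)                 # outputs join the pool after the layer
    return layer_sizes, gates
```

Because a layer's outputs are added to the pool only after the whole layer is connected, no gate reads a wire produced in its own layer.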

The above strategy is relatively ad hoc and may not be representative of all applications. We adopt it (1) to show the impact of circuit alignment on our relative performance over MOTIF and (2) because no standard benchmark suite exists.

# 8 Performance Evaluation

We compare  $\Pi_{MT}$  to MOTIF and the standard triple-based protocol. We compare these protocols for various numbers of parties. All experiments were run on a commodity laptop running Ubuntu 20.04 with an Intel(R) Core(TM) i5-8350U CPU @ 1.70 GHz and 16 GB RAM. All parties were run on the same machine and network settings were configured with the tc command. We averaged each data point over 100 runs.

In each experiment, we generated random circuits as described in Sect. 7.2. We fixed the circuit parameters to 10 layers, 30,000 AND gates per branch and 30,000 XOR gates per branch. We set the number of branch input and output wires to 128. We generated a new circuit with these same parameters for each run of each experiment. We performed and report on three experiments:

- 1. We fixed the number of branches to two, fixed the number of parties to two, and explored variation in performance based on branch alignment (Sect. 8.1).
- 2. We fixed the number of parties to two, varied the number of branches, and explored the corresponding communication and wall-clock runtime (Sect. 8.2 and Sect. 8.3).
- 3. We fixed the branching factor to 16, varied the number of parties, and explored the corresponding communication (Sect. 8.4).

Each experiment shows that our approach is preferred in almost every setting.

## 8.1 Branch Alignment

We first demonstrate MOTIF's dependence on circuit topology in the case of two branches. Figure 8 plots the distribution of the number of random OTs needed for two parties to evaluate each protocol. Across all 100 runs, $\Pi_{MT}$ and the standard triple-based protocol each need the same number of OTs on every run. On the other hand, MOTIF's performance differs depending on branch alignment. Because we sample alignments uniformly, this results in an increased number of consumed OTs.

Fig. 8. Random OTs required to evaluate a circuit with two branches.

Discussion. For two branches and on average, our approach required 1.5× fewer OTs than MOTIF and consistently required 2× fewer OTs than the standard triple-based protocol. Given that random OTs are the main communication bandwidth bottleneck, MOTIF is far from reducing communication by the optimal factor of 2×. $\Pi_{MT}$ never used more OTs than MOTIF: MOTIF required 1.12× more OTs than $\Pi_{MT}$ in its best run and 1.71× more in its worst run.

## 8.2 Communication

We next report our 2PC communication improvement over both MOTIF and the standard triple-based protocol as a function of branching factor. We instantiated MaskGen with  $\Pi$ -MaskGen-2P.

Figure 9 plots both preprocessing communication and total communication. For further reference, Fig. 10 tabulates our communication improvement.

In our measurements, preprocessing constitutes both triple generation and mask generation. Each data point is averaged over 100 runs; the amount of communication may differ from run to run because each circuit has a randomly generated topology. In  $\Pi_{MT}$  the total communication is constant. In contrast, MOTIF communication differs significantly across runs due to the layering issue explained in Sect. 2.4.

Discussion. In this metric,  $\Pi_{MT}$  is preferred:

- **Preprocessing Communication.** On 16 branches, we improve communication by $2.96 \times$ over MOTIF and by $14.4 \times$ over the standard triple-based protocol. There are three reasons we did not achieve $16 \times$ improvement over the standard triple-based protocol. First, both the standard triple-based approach and ours must perform the same number of *base OTs* to set up an OT extension matrix [IKNP03]. This adds a small amount of communication (around 20 KB) common to both approaches, which cuts slightly into our advantage. Second, we need one OT for each of the $b - 1$ mask pairs. Third, entering and exiting conditionals adds a very small overhead.
- **Total Communication.** On 16 branches, our approach improves total communication by $2.6 \times$ over MOTIF and by $12 \times$ over the standard protocol. Our total communication improvement is lower than our preprocessing improvement because our evaluation phase communication is not improved. While improvement over the standard protocol is almost constant across runs, the improvement over MOTIF differs due to varying circuit topology: our improvement ranges from $2.16 \times$ to $2.93 \times$.

**Fig. 9.** 2PC comparison of $\Pi_{MT}$ against MOTIF and a standard triple-based protocol. We plot the following metrics as functions of the *branching factor*: the preprocessing per-party communication (top), the total per-party communication (bottom).

| # Branches | $\Pi_{MT}$ (MB) | MOTIF (MB) | Broadcast (MB) |
|------------|-----------------|------------|----------------|
| 2          | 0.57            | 0.86       | 1.11           |
| 4          | 0.58            | 1.16       | 2.19           |
| 8          | 0.61            | 1.46       | 4.10           |
| 16         | 0.67            | 1.79       | 8.19           |
| 32         | 0.80            | 2.14       | 16.09          |

**Fig. 10.** Per-party communication improvement for our 2PC random circuit experiment as a function of the branching factor.
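As a quick cross-check, the improvement factors can be recomputed from the rows of Fig. 10. The sketch assumes that the third data column (labelled "Broadcast") reports the standard triple-based protocol; this is an inference from the fact that its 16-branch ratio agrees with the total-communication figure quoted above, not something the table states. The names are hypothetical.

```python
# Rows of Fig. 10: branches -> (Pi_MT, MOTIF, third column), all in MB.
FIG10_MB = {
    2: (0.57, 0.86, 1.11),
    4: (0.58, 1.16, 2.19),
    8: (0.61, 1.46, 4.10),
    16: (0.67, 1.79, 8.19),
    32: (0.80, 2.14, 16.09),
}


def improvement(branches):
    """Return (factor over MOTIF, factor over the third column)."""
    ours, motif, third = FIG10_MB[branches]
    return motif / ours, third / ours


for b in sorted(FIG10_MB):
    over_motif, over_third = improvement(b)
    print(f"{b:2d} branches: {over_motif:.2f}x over MOTIF, {over_third:.2f}x over column 3")
```

For 16 branches this gives roughly 2.67 and 12.2, in line with the reported $2.6 \times$ and $12 \times$ total-communication improvements.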

## 8.3 Wall-Clock Time



**Fig. 11.** 2PC comparison of  $\Pi_{MT}$  against MOTIF and a standard triple-based protocol. We plot the following metrics as functions of branching factor: wall-clock time on a LAN (left), the wall-clock time on a LAN where other processes share bandwidth (center), and the wall-clock time on a WAN (right).

We next present the wall-clock time improvements over MOTIF and the standard triple-based protocol. We consider three simulated network settings:

- 1. LAN: A simulated gigabit Ethernet connection with 1 Gbps bandwidth and 2 ms round-trip latency.
- 2. Shared LAN: A simulated local area network connection where the protocol shares network bandwidth with a number of other processes. The connection features 50 Mbps bandwidth and 2 ms round-trip latency.
- 3. WAN: A simulated wide area network connection with 100 Mbps bandwidth and 20 ms round-trip latency.

Figure 11 plots total wall-clock time for each network setting.

Discussion. In these metrics,  $\Pi_{MT}$  is preferred:

- LAN wall-clock time. On a fast LAN, our approach's improvement is diminished compared to our communication improvement. On average and for 16 branches, we improve by $1.52 \times$ over MOTIF and by $1.81 \times$ over the standard protocol. A 1 Gbps network is very fast, and our modest hardware struggles to keep up with available bandwidth.
- Shared LAN wall-clock time. On the more constrained shared LAN, our hardware easily keeps up with the communication channel, and we see corresponding improvement. On average and for 16 branches, we achieve 2.26× speedup over MOTIF and 7.43× speedup over the standard protocol.
- WAN wall-clock time. On this high-latency network our advantage is less pronounced. On average and for 16 branches, we achieve 1.14× speedup over MOTIF and 2.04× speedup over the standard protocol. This high-latency network highlights the weakness of multi-round protocols in such settings.



Fig. 12. Protocol per-party communication usage as a function of the number of parties. Like MOTIF and the standard protocol, we consume per-party communication linear in the number of parties.

## 8.4 Scaling to MPC

Our last experiment emphasizes our approach's scaling to the multiparty setting. This experiment uses the same circuit parameters as the former experiments, but we fix the number of branches to 16. We instantiated MaskGen with $\Pi$-MaskGen-VS and ran the circuit among 4-8 parties. Figure 12 plots per-party communication as a function of the number of parties.

Discussion. $\Pi_{MT}$ works well in the multiparty setting. Our optimization adds no cost compared to MOTIF and the standard triple-based protocol. Each technique consumes total communication quadratic in the number of parties, i.e., per-party communication linear in the number of parties.

Acknowledgements. This work was supported in part by NSF award #1909769, by a Facebook research award, and by Georgia Tech's IISP cybersecurity seed funding (CSF) award.

## References

- [AGKS19] Alhassan, M.Y., Günther, D., Kiss, Á., Schneider, T.: Efficient and scalable universal circuits. Cryptology ePrint Archive, Report 2019/348 (2019). https://eprint.iacr.org/2019/348
- [ALSZ13] Asharov, G., Lindell, Y., Schneider, T., Zohner, M.: More efficient oblivious transfer and extensions for faster secure computation. In: Sadeghi, A.-R., Gligor, V.D., Yung, M. (eds.) ACM CCS 2013, pp. 535–548. ACM Press, November 2013
- [BCG+19] Boyle, E., Couteau, G., Gilboa, N., Ishai, Y., Kohl, L., Scholl, P.: Efficient pseudorandom correlation generators: silent OT extension and more. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019. LNCS, vol. 11694, pp. 489–518. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26954-8\_16
- [BDOZ11] Bendlin, R., Damgård, I., Orlandi, C., Zakarias, S.: Semi-homomorphic encryption and multiparty computation. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 169–188. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20465-4\_11
- [Bea92] Beaver, D.: Efficient multiparty protocols using circuit randomization. In: Feigenbaum, J. (ed.) CRYPTO 1991. LNCS, vol. 576, pp. 420–432. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-46766-1\_34
- [BIB89] Bar-Ilan, J., Beaver, D.: Non-cryptographic fault-tolerant computing in constant number of rounds of interaction. In: Rudnicki, P. (ed.) 8th ACM PODC, pp. 201–209. ACM, August 1989
- [BKKO20] Bunn, P., Katz, J., Kushilevitz, E., Ostrovsky, R.: Efficient 3-party distributed ORAM. In: Galdi, C., Kolesnikov, V. (eds.) SCN 2020. LNCS, vol. 12238, pp. 215–232. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-57990-6\_11
- [CDE+18] Cramer, R., Damgård, I., Escudero, D., Scholl, P., Xing, C.: SPD$\mathbb{Z}_{2^k}$: efficient MPC mod $2^k$ for dishonest majority. In: Shacham, H., Boldyreva, A. (eds.) CRYPTO 2018. LNCS, vol. 10992, pp. 769–798. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96881-0\_26
- [DKL+13] Damgård, I., Keller, M., Larraia, E., Pastro, V., Scholl, P., Smart, N.P.: Practical covertly secure MPC for dishonest majority – Or: breaking the SPDZ limits. In: Crampton, J., Jajodia, S., Mayes, K. (eds.) ESORICS 2013. LNCS, vol. 8134, pp. 1–18. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40203-6\_1
- [DPSZ12] Damgård, I., Pastro, V., Smart, N., Zakarias, S.: Multiparty computation from somewhat homomorphic encryption. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 643–662. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5\_38
- [FKOS15] Frederiksen, T.K., Keller, M., Orsini, E., Scholl, P.: A unified approach to MPC with preprocessing using OT. In: Iwata, T., Cheon, J.H. (eds.) ASIACRYPT 2015. LNCS, vol. 9452, pp. 711–735. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48797-6\_29
- [GKS17] Günther, D., Kiss, Á., Schneider, T.: More efficient universal circuit constructions. Cryptology ePrint Archive, Report 2017/798 (2017). http://eprint.iacr.org/2017/798
- [GMW87] Goldreich, O., Micali, S., Wigderson, A.: How to play any mental game or a completeness theorem for protocols with honest majority. In: Aho, A. (ed.) 19th ACM STOC, pp. 218–229. ACM Press, May 1987
- [HK20a] Heath, D., Kolesnikov, V.: Stacked garbling. In: Micciancio, D., Ristenpart, T. (eds.) CRYPTO 2020. LNCS, vol. 12171, pp. 763–792. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-56880-1\_27
- [HK20b] Heath, D., Kolesnikov, V.: Stacked garbling for disjunctive zero-knowledge proofs. In: Canteaut, A., Ishai, Y. (eds.) EUROCRYPT 2020. LNCS, vol. 12107, pp. 569–598. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45727-3\_19
- [HKP20] Heath, D., Kolesnikov, V., Peceny, S.: MOTIF: (almost) free branching in GMW - via vector-scalar multiplication. In: Moriai, S., Wang, H. (eds.) ASIACRYPT 2020, Part III. LNCS, vol. 12493, pp. 3–30. Springer, Heidelberg (2020)
- [IKNP03] Ishai, Y., Kilian, J., Nissim, K., Petrank, E.: Extending oblivious transfers efficiently. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 145–161. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45146-4\_9
- [KK13] Kolesnikov, V., Kumaresan, R.: Improved OT extension for transferring short secrets. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8043, pp. 54–70. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40084-1\_4
- [Kol18] Kolesnikov, V.: Free IF: how to omit inactive branches and implement S-universal garbled circuit (almost) for free. In: Peyrin, T., Galbraith, S. (eds.) ASIACRYPT 2018. LNCS, vol. 11274, pp. 34–58. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03332-3\_2
- [KOS16] Keller, M., Orsini, E., Scholl, P.: MASCOT: faster malicious arithmetic secure computation with oblivious transfer. In: Weippl, E.R., Katzenbeisser, S., Kruegel, C., Myers, A.C., Halevi, S. (eds.) ACM CCS 2016, pp. 830–842. ACM Press, October 2016
- [KPR18] Keller, M., Pastro, V., Rotaru, D.: Overdrive: making SPDZ great again. In: Nielsen, J.B., Rijmen, V. (eds.) EUROCRYPT 2018. LNCS, vol. 10822, pp. 158–189. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78372-7\_6
- [KS08] Kolesnikov, V., Schneider, T.: A practical universal circuit construction and secure evaluation of private functions. In: Tsudik, G. (ed.) FC 2008. LNCS, vol. 5143, pp. 83–97. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85230-8\_7
- [KS16] Kiss, Á., Schneider, T.: Valiant's universal circuit is practical. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 699–728. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49890-3\_27
- [LMS16] Lipmaa, H., Mohassel, P., Sadeghian, S.: Valiant's universal circuit: Improvements, implementation, and applications. Cryptology ePrint Archive, Report 2016/017 (2016). http://eprint.iacr.org/2016/017
- [LOS14] Larraia, E., Orsini, E., Smart, N.P.: Dishonest majority multi-party computation for binary circuits. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014. LNCS, vol. 8617, pp. 495–512. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44381-1\_28
- [LYZ+20] Liu, H., Yu, Y., Zhao, S., Zhang, J., Liu, W.: Pushing the limits of Valiant's universal circuits: simpler, tighter and more compact. IACR Cryptology ePrint Archive 2020, 161 (2020)
- [NNOB12] Nielsen, J.B., Nordholt, P.S., Orlandi, C., Burra, S.S.: A new approach to practical active-secure two-party computation. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 681–700. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5\_40
- [NW78] Nijenhuis, A., Wilf, H.S.: Combinatorial Algorithms for Computers and Calculators. Academic Press, New York (1978)
- [Val76] Valiant, L.G.: Universal circuits (preliminary report). In: STOC, pp. 196–203, New York, NY, USA, 1976. ACM Press
- [WMK16] Wang, X., Malozemoff, A.J., Katz, J.: EMP-toolkit: Efficient MultiParty computation toolkit (2016). https://github.com/emp-toolkit
- [ZYZL18] Zhao, S., Yu, Y., Zhang, J., Liu, H.: Valiant's universal circuits revisited: an overall improvement and a lower bound. Cryptology ePrint Archive, Report 2018/943 (2018). https://eprint.iacr.org/2018/943