The TinyTable Protocol for 2-Party Secure Computation, or: Gate-Scrambling Revisited
Abstract
We propose a new protocol, nicknamed TinyTable, for maliciously secure 2-party computation in the preprocessing model. One version of the protocol is useful in practice and allows, for instance, secure AES encryption with latency about 1 ms and amortized time about 0.5 \(\upmu \)s per AES block on a fast cloud setup. Another version is interesting from a theoretical point of view: we achieve a maliciously and unconditionally secure 2-party protocol in the preprocessing model for computing a Boolean circuit, where both the communication complexity and the preprocessed data size are O(s), where s is the circuit size, while the computational complexity is \(O(k^\epsilon s)\), where k is the statistical security parameter and \(\epsilon <1\) is a constant. For general circuits with no assumption on their structure, this is the best asymptotic performance achieved so far in this model.
1 Introduction
In 2-party secure computation, two parties A and B want to compute an agreed function securely on privately held inputs, and we want to construct protocols ensuring that the only new information a party learns is the intended output.
In this paper we will focus on malicious security: one of the parties is under control of an adversary and may behave arbitrarily. As is well known, this means that we cannot guarantee that the protocol always gives output to the honest party, but we can make sure that the output, if delivered, is correct. It is also well known that we cannot accomplish this task without using a computational assumption, and in fact heavy public-key machinery must be used to some extent.
However, as observed in several works [BDOZ11, DPSZ12, NNOB12, DZ13], one can confine the use of cryptography to a preprocessing phase where the inputs need not be known and can therefore be done at any time prior to the actual computation. The preprocessing produces “raw material” for the online phase which is executed once the inputs are known, and this phase can be information theoretically secure, and usually has very small computational and communication complexity, but round complexity proportional to the depth of the computation in question. An alternative (which is not our focus here) is to use Yao-garbled circuits. This approach is incomparable even in the case we consider here where the function to compute is known in advance. This is because malicious security requires many garbled circuits to be stored and evaluated in the online phase. Hence, the round and communication complexity is smaller than for information theoretic protocols, but the storage and computational complexity is larger.
We will focus on the case where the desired computation is specified as a Boolean circuit. The case of arithmetic circuits over large fields was handled in [DPSZ12] which gave a solution where communication and computational complexities as well as the size of the preprocessed data (called data complexity in the following) are proportional to the circuit size. The requirement is that the field has \(2^{\varOmega (k)}\) elements where k is the security parameter and the allowed error probability is \(2^{-\varOmega (k)}\).
On the other hand, for Boolean circuits, state of the art is the protocol from [DZ13], nicknamed MiniMac, which achieves data and communication complexity O(s) where s is the circuit size, and computational complexity \(O(k^\epsilon s)\), where \(\epsilon <1\) is a constant. For an alternative variant of the protocol, all complexities are \(O({\text {polylog}}(k) s)\). However, the construction only works for circuits with a sufficiently nice structure, called “well-formed” circuits in [DZ13]. Informally, a well-formed circuit allows a modest amount of parallelization throughout the computation – for instance, very tall and skinny circuits are not allowed. If well-formedness is not assumed, both complexities would be \(\varOmega (ks)\) using known protocols.
On the practical side, many of the protocols in the preprocessing model are potentially practical and several of them have been implemented. In particular, implementations of the MiniMac protocol were reported in [DLT14, DZ16]. In [DLT14], MiniMac was optimised and used for computing many instances of the same Boolean circuit in parallel, while in [DZ16] the protocol was adapted specifically for computing the AES circuit, which resulted in an implementation with latency about 6 ms and an amortised time of 0.4 ms per AES block.
Our Contribution. In this paper, we introduce a new protocol for the preprocessing model, nicknamed TinyTable. The idea is to implement each (non-linear) gate by a scrambled version of its truth-table. Players will do lookups in the tables using bits that are masked by uniformly random bits chosen in the preprocessing phase, together with the tables.
The idea of gate-scrambling goes back at least to [CDvdG87] where a (much less efficient) approach based on the quadratic residuosity problem was proposed. Scrambled truth tables also appear more recently in [IKM+13], but there the truth-table for the entire function is scrambled, leading to a protocol with complexity exponential in the length of the inputs (but very small communication). Even more recently, in [DZ16], a (different form of) table lookup was used to implement the AES S-boxes.
What we do here is to observe that the idea of scrambled truth tables makes especially good sense in the preprocessing model and, more importantly, if we combine this with the “right” authentication scheme, we get an extremely practical and maliciously secure protocol. This first version of our protocol has communication complexity O(s) and data and computational complexity O(ks). Although the computational complexity is asymptotically inferior to previous protocols when counting elementary bit operations, it works much better in practice: XOR and NOT gates require no communication, and for each non-linear gate, each player sends and receives 1 bit and XORs 1 word from memory into a register. This means that TinyTable saves a factor of at least 2 in communication in comparison to standard passively secure protocols in the preprocessing model, such as GMW with precomputed OTs or the protocol using circuit randomization via Beaver triples.
We implemented a version of this protocol that was optimised for AES computation, by using tables for each S-box. This is more costly in preprocessing but, compared to a Boolean circuit implementation, it reduces the round complexity of the online phase significantly (to about 10). On a fast cloud setup, we obtain online latency of 1 ms and amortized time about 0.5 \(\upmu \)s per AES block for an error probability of \(2^{-64}\). To the best of our knowledge, this is the fastest online time obtained for secure two-party AES with malicious security. To illustrate what we gain from the AES-specific approach, we also implemented the version that works for any Boolean circuit and applied it to an AES circuit computing the same function as the optimized protocol does. On the same cloud setup and same security level, we obtained a latency of 2.4 ms and amortized time about 10 \(\upmu \)s per AES block.
We describe how one can do the preprocessing we require based on the preprocessing phase of the TinyOT protocol from [NNOB12]. This protocol is basically maliciously secure OT extension with some extra processing on top. In the case of Boolean circuits, the work needed to preprocess an AND gate roughly equals the work we would need per AND gate in TinyOT. For the case of AES S-box tables, we show how to use a method from [DK10] to preprocess such tables. This requires 7 binary multiplications per table entry and local computation.
As for the speeds of preprocessing one can obtain, the best approaches and implementations of this type of protocol are from [FKOS15]. They do not have an implementation of the exact case we need here (2-party TinyOT) but they estimate that one can obtain certainly more than 10,000 AND gates per second and most likely up to 100,000 per second [Kel]. This would mean that we could preprocess a binary AES circuit in about 40 ms.
Our final contribution is a version of our protocol that has better asymptotic performance. We get data and communication complexity O(s), and computational complexity \(O(k^\epsilon s)\), where \(\epsilon <1\) is a constant. Alternatively we can also have all complexities be \(O({\text {polylog}}(k) s)\). While this is the same result that was obtained for the MiniMac protocol, note that we get this for any circuit, not just for well-formed ones. Roughly speaking, the idea is to use the MiniMac protocol to authenticate the bits that players send during the online phase. This task is very simple: it parallelizes easily and can be done in constant depth. Therefore we get a better result than if we had used MiniMac directly on the circuit in question.
2 Construction
We show a 2PC protocol for securely computing a Boolean circuit C for two players A and B. The circuit may contain arbitrary gates taking two inputs and giving one output. We let \(G_1, \ldots , G_N\) be the gates of the circuit, and let \(w_1, \ldots , w_W\) be the wires. We note that arbitrary fanout is allowed, and we do not assume special fanout gates for this, we simply allow that an arbitrary number of copies of a wire leave a gate, and all these copies are assigned the same wire index. We assume for simplicity that both parties are to learn the output.
We will fix some arbitrary order in which the circuit can be evaluated gate by gate, such that the output gates come last, and such that when we are about to evaluate gate i, its inputs have already been computed. We assume throughout that the indexing of the gates satisfies this constraint. We call the wires coming out of the output gates the output wires.
Proposition 1
\(F^{pre}_{sem}\) composed with \(\pi _{sem}\) implements (with perfect semi-honest security) the ideal functionality \(F_{\mathsf {SFE}}\) for secure function evaluation.
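The semi-honest protocol underlying Proposition 1 can be illustrated for a single AND gate. The following is a minimal Python sketch (the function names are ours, not the paper's): the preprocessing chooses wire masks and two table shares whose XOR is the scrambled truth-table, and online each party contributes its one-bit entry at the publicly known masked index.

```python
import random

def preprocess_and_gate():
    """Choose wire masks r_u, r_v, r_o and table shares A, B with
    A[c][d] ^ B[c][d] = ((c ^ r_u) & (d ^ r_v)) ^ r_o for all c, d."""
    r_u, r_v, r_o = (random.randint(0, 1) for _ in range(3))
    A = [[random.randint(0, 1) for _ in range(2)] for _ in range(2)]
    B = [[A[c][d] ^ ((c ^ r_u) & (d ^ r_v)) ^ r_o for d in range(2)]
         for c in range(2)]
    return (r_u, r_v, r_o), A, B

def evaluate_and_gate(e_u, e_v, A, B):
    """Online phase: each party sends its 1-bit entry at the masked index
    (e_u, e_v); the XOR is the masked output e_o = (b_u & b_v) ^ r_o."""
    return A[e_u][e_v] ^ B[e_u][e_v]

# Sanity check against all four plaintext inputs
(r_u, r_v, r_o), A, B = preprocess_and_gate()
for b_u in (0, 1):
    for b_v in (0, 1):
        e_o = evaluate_and_gate(b_u ^ r_u, b_v ^ r_v, A, B)
        assert e_o ^ r_o == (b_u & b_v)
```

Note that each party's entry is a uniformly random bit in the other party's view, which is why the online messages leak nothing about the plaintext wire values.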
In Fig. 3, we show a functionality that does preprocessing as in the semi-honest case, but in addition commits players bit by bit to the content of the tables. The idea is that, for entry \(A_i[c,d]\) in a table, A is also given a random string \(a_{i,c,d}^{A_i[c,d]}\) which serves as an authentication code that A can use to show that he sends the correct value of \(A_i[c,d]\), while B is given the pair \(a_{i,c,d}^0, a_{i,c,d}^1\) serving as a key that B can use to verify that he gets the correct value. Of course, using the authentication codes directly in this way, we would have to send k bits to open each bit. However, in our application, we can bring down the communication needed to (essentially) \(\ell +k\) bits, because we can delay verification of most of the bits opened. The idea is that, instead of sending the authentication codes, players will accumulate the XOR of all of them and check for equality at the end. The protocol shown in Fig. 5 uses this idea to implement maliciously secure computation.
Theorem 1
\(F^{pre}_{mal}\) composed with \(\pi _{mal}\) implements \(F_{\mathsf {SFE}}\) with statistical security against malicious and static corruption.
Proof
We will show security in the UC model (in the variant where the environment also plays the role of the adversary), which means we need to exhibit a simulator S such that no (unbounded) environment Z can distinguish \(F^{pre}_{mal}\) composed with \(\pi _{mal}\) from S composed with \(F_{\mathsf {SFE}}\). The case where no player is corrupt is trivial to simulate since messages sent contain uniformly random bits, with the only exception of the output stage where the shares sent determine the outputs, which the simulator knows, and hence these shares are also easy to simulate. We now describe the simulation for the case where A is corrupt (the other case where B is corrupt is similar).

Setup. S runs internally copies of \(F^{pre}_{mal}\) and \(\pi _{mal}\) where it corrupts A, and gives B a default input, say all zeros. It will let Z play the role of the corrupt A. We assume that both players send the same circuit C to be computed (otherwise there is nothing to simulate) and S will send C to \(F_{\mathsf {SFE}}\).

Input. In the input stage of the protocol, when Z (corrupt A) sends \(e_u\) for an input wire, S computes \(b_u = e_u\oplus r_u\) and sends it to \(F_{\mathsf {SFE}}\). This is the extracted input of A for this wire; note that S knows \(r_u\) from (its copy of) \(F^{pre}_{mal}\).

Computing stage. For the first M gates, S will simply let its copy of B run the protocol with Z acting as corrupt A. If B aborts, S sends “abort” to \(F_{\mathsf SFE}\) and stops, otherwise it sends “compute result”.

Output stage. S gets the outputs from \(F_{\mathsf SFE}\), i.e., \(b_o\) for each output wire \(w_o\). S modifies the table inside its copy of B such that for each output gate \(G_i\) with input wires \(w_u, w_v\) and output \(w_o\), \(B_i[e_u,e_v]\) satisfies \(b_o= r_o\oplus A_i[e_u,e_v] \oplus B_i[e_u,e_v]\). Note that S knows \(r_o\) and \(A_i[e_u,e_v] \) from its copy of \(F^{pre}_{mal}\). S now runs the output stage according to the protocol. If Z lets the protocol finish normally, S sends “deliver output” to \(F_{\mathsf {SFE}}\), otherwise it sends “abort”.
To show that this simulation is statistically good, observe first that the simulation up until the point where the outputs are opened is perfect: this follows since in the computation phase of the protocol, the honest player sends only uniformly random bits that are independent of anything else in the environment’s view. Therefore this also holds for the simulated honest player, even if he runs with a default input. The reason why only random bits are sent is as follows: whenever the honest player sends a bit, it can be assigned to a particular wire, say \(w_v\), that has not been handled before. Therefore no information about the wire mask \(r_v\) was released before. The bit sent by the honest player can be written as \(r_v\) XORed with other bits and since \(r_v\) was chosen independently of anything else, the bit sent is also random in the adversary’s view.
In the verification step (last part of Item 4 of the protocol), the honest player sends the correct verification value \(m_B\) that can be computed from what the environment has already seen. The correct verification value \(m_A= t_B\) to be sent by A is well defined (it can be computed easily from the view of the environment), and if the value actually sent is incorrect, the protocol will abort in both the real and the simulated process.
Now, if the protocol proceeds to the output step, the only new information the environment sees is, for each output gate \(G_i\) with input wires \(w_u, w_v\) and output \(w_o\): the output \(b_o\) as well as B’s share \(e_B= B_i[e_u,e_v]\) and the verification value \(b^{e_B}_{i,e_u,e_v}\). Note the latter two values are determined from \(b_o\) and the environment’s view so far: \(e_B\) is determined by \(b_o= e_A\oplus e_B\) and then \(b^{e_B}_{i,e_u,e_v}\) is determined by what A received from the preprocessing initially. It follows that the entire simulation is perfect given the event that the output generated is the same in the real as in the simulated process.
Now, in the simulation the output is always the correct output based on the inputs determined in the input stage. Therefore, to argue statistically close simulation, it is sufficient to show that the real protocol completes with an incorrect output with probability at most \(2^{-k}\). This in turn follows easily from the fact that if A always sends correct shares from the tables, the output is always correct. And if he does not, the verification value corresponding to the incorrect message is completely unknown to A and can be guessed with probability at best \(2^{-k}\). Hence B will abort in any such case, except with probability \(2^{-k}\).
This theorem can quite easily be extended to adaptive corruption (of one player). For this case, the simulator would start running internally a copy of the protocol where A, B are honest and use default inputs. When A (or B) is corrupted, one observes that the internal view of A can easily be modified so it is consistent with the real input of A (which the simulator is now given). The simulator gives this modified view of A to the environment and continues as described in the proof of the above theorem.
2.1 Free XOR
It is easy to modify our construction to allow non-interactive processing of XOR gates. For simplicity, we only show how this works for the case of semi-honest security; malicious security is obtained in exactly the same way as in the previous section (Fig. 6).
2.2 Removing NOT Gates
A slight change in the preprocessing allows us to completely remove the online operations associated with NOT gates. Namely, when preprocessing a NOT gate \(G_i\), we will set \(r_o= 1\oplus r_u\), where \(w_u, w_o\) are the input and output wires; everything else remains unchanged. Then, in the online phase, we can simply ignore the NOT gates, or in other words, by convention we will set \(e_o= e_u\).
2.3 Generalisation to Bigger Tables
If the circuit contains a part that evaluates a non-linear function f on a small input, it is natural to implement computation of this function as a table. If the input is small, such a table is not prohibitively large. Suppose, for instance, that the input and output are 8 bits, as is the case for the AES S-box. Then we will store tables \(A_f, B_f\) each indexed by an 8-bit value M such that \(A_f[M] \oplus B_f[M] = f(x \oplus M)\oplus O\), where O is an 8-bit output mask chosen in the preprocessing. We make sure in the preprocessing that the i’th bit of x, denoted x[i], equals the wire mask for the i’th wire going into the evaluation of f, whereas O[i] equals the wire mask for the i’th wire receiving output from f. We can then use the table for f exactly as we use the tables for the AND gates.
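A minimal sketch of this table construction, using a toy byte function as a stand-in for the real AES S-box (the helper name and the stand-in f are ours, purely for illustration):

```python
import random

def preprocess_table(f, n_bits=8):
    """Hypothetical helper: build table shares A_f, B_f with
    A_f[M] ^ B_f[M] = f(x ^ M) ^ O for random input/output masks x, O."""
    size = 1 << n_bits
    x = random.randrange(size)  # packs the 8 input wire masks into one byte
    O = random.randrange(size)  # packs the 8 output wire masks into one byte
    A = [random.randrange(size) for _ in range(size)]
    B = [A[M] ^ f(x ^ M) ^ O for M in range(size)]
    return x, O, A, B

f = lambda v: (v * 31 + 7) & 0xFF  # toy stand-in for the AES S-box
x, O, A, B = preprocess_table(f)

b = 0xA5    # the secret 8-bit input to f
e = b ^ x   # the publicly known masked input
# each party reveals its byte at index e; the XOR is the masked output
assert A[e] ^ B[e] == f(b) ^ O
```

The online cost is one table lookup and one 8-bit opening per party, exactly as for the AND-gate tables, but the table now has 256 entries instead of 4.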
To get malicious security we must note that the simple authentication scheme we used in the binary case will be less practical here: as we have 256 possible output values, each party would need to store 256 strings per table entry. It turns out this can be optimized considerably using a different MAC scheme, as described in the following section.
3 The Linear MAC Scheme
In this section, we describe some variations over a well-known information theoretically secure MAC scheme that can be found, e.g., in [NNOB12]. We optimise it to be efficient on modern Intel processors; this requires some changes in the construction and hence we need to specify and reprove the scheme. It is intended to be used in conjunction with the generalisation to bigger tables described in the previous section.
There is a committer \({\mathsf {C}}\), a verifier \({\mathsf {V}}\) and a preprocessor \({\mathsf {P}}\). There is a security parameter k. Some of the computations are done over the finite field \(\mathbb {F}= {\text {GF}}(2^k)\). Let \(p(\texttt {X})\) be the irreducible polynomial of degree k used to do the computation in \(\mathbb {F}= {\text {GF}}(2^k)\), i.e., elements \(x, y \in \mathbb {F}\) are polynomials of degree at most \(k-1\) and multiplication is computed as \(z = x y \bmod p\). We will also be doing computations in the finite field \(\mathbb {G}= {\text {GF}}(2^{2k-1})\). Let \(q(\texttt {X})\) be the irreducible polynomial of degree \(2k-1\) used to do the computation in \(\mathbb {G}\). Notice that elements \(x, y \in \mathbb {F}\) are polynomials of degree at most \(k-1\), so xy is a polynomial of degree at most \(2k-2\). We can therefore think of xy as an element of \(\mathbb {G}\). Note in particular that \(x y \bmod q = x y\) when \(x, y \in \mathbb {F}\).
3.1 Basic Version
The MAC scheme has message space \(\mathbb {F}\). We denote a generic message by \(x \in \mathbb {F}\). The MAC scheme has key space \(\mathbb {F}\times \mathbb {G}\). We denote a generic key by \(K = (\alpha , \beta ) \in \mathbb {F}\times \mathbb {G}\). The tag space is \(\mathbb {G}\). We denote a generic tag by \(y \in \mathbb {G}\). The tag is computed as \(y = {\text {mac}}_K(x) = \alpha x + \beta \). Note that \(\alpha \in \mathbb {F}\) and \(x \in \mathbb {F}\), so \(\alpha x \in \mathbb {G}\) as described above and hence can be added to \(\beta \) in \(\mathbb {G}\). We use this particular scheme because it can be computed very efficiently using the PCLMULQDQ instruction on modern Intel processors. With one PCLMULQDQ instruction we can compute \(\alpha x\) from which we can compute \(\alpha x + \beta \) using one additional XOR.
The intended use of the MAC scheme is as follows. The preprocessor samples a message x and a uniformly random key K and computes \(y = {\text {mac}}_K(x)\). It gives K to \({\mathsf {V}}\) and gives (x, y) to \({\mathsf {C}}\). To reveal the message \({\mathsf {C}}\) sends (x, y) to \({\mathsf {V}}\) who accepts if and only if \(y = {\text {mac}}_K(x)\). Since K is sampled independently of x, the scheme is clearly hiding in the sense that \({\mathsf {V}}\) gets no information on x before receiving the opening information. We now show that the scheme is binding.
Let \(\mathcal {A}\) be an unbounded adversary. Run \(\mathcal {A}\) to get a message \(x \in \mathbb {F}\). Sample a uniformly random key \((\alpha , \beta ) \in \mathbb {F}\times \mathbb {G}\) and compute \(y = \alpha x + \beta \). Give y to \(\mathcal {A}\). Run \(\mathcal {A}\) to get an output \((y', x') \in \mathbb {G}\times \mathbb {F}\). We say that \(\mathcal {A}\) wins if \(x' \ne x\) and \(y' = {\text {mac}}_K(x')\). We can show that no \(\mathcal {A}\) wins with probability better than \(2^{-k}\). To see this, notice that if \(\mathcal {A}\) wins then he knows \(x,y,x',y'\) such that \(y = \alpha x + \beta \) and \(y' = \alpha x' + \beta \). This implies that \(y'-y = \alpha (x'-x)\), from which it follows that \(\alpha = (y'-y)(x'-x)^{-1}\). This means that if \(\mathcal {A}\) can win with probability r then \(\mathcal {A}\) can guess \(\alpha \) with probability at least r. It is then sufficient to prove that no adversary can guess \(\alpha \) with probability better than \(2^{-k}\). This follows from the fact that \(\alpha \) is uniformly random given \(\alpha x + \beta \), because \(\alpha x\) is some element of \(\mathbb {G}\) and \(\beta \) is a uniformly random element of \(\mathbb {G}\) independent of \(\alpha x\).
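A minimal Python sketch of this basic MAC (the names are ours; a bit-by-bit shift-and-XOR loop stands in for the PCLMULQDQ instruction). The key point the code mirrors is that \(\alpha x\) has degree at most \(2k-2\), so the tag lives in \(\mathbb {G}\) without any polynomial reduction:

```python
import secrets

K = 64  # statistical security parameter

def clmul(a, b):
    """Carry-less multiplication over GF(2)[X]; integers encode polynomials.
    Software stand-in for the PCLMULQDQ instruction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def keygen():
    alpha = secrets.randbits(K)         # alpha in F = GF(2^k), degree < k
    beta = secrets.randbits(2 * K - 1)  # beta in G = GF(2^(2k-1))
    return alpha, beta

def mac(key, x):
    alpha, beta = key
    # alpha*x has degree at most 2k-2, so no reduction modulo q is needed
    return clmul(alpha, x) ^ beta

key = keygen()
x = secrets.randbits(K)
y = mac(key, x)          # preprocessor gives (x, y) to C and key to V
assert y == mac(key, x)  # V's check on an honest opening
```

On real hardware one PCLMULQDQ computes the 128-bit carry-less product directly, and the tag computation then costs a single extra XOR, as noted above.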
3.2 The Homomorphic Vector Version
We now describe a vector version of the scheme which allows to commit to multiple messages using a single key.
The MAC scheme has message space \(\mathbb {F}^\ell \). We denote a generic message by \(\varvec{x} \in \mathbb {F}^\ell \). The MAC scheme has key space \(\mathbb {F}\times \mathbb {G}^\ell \). We denote a generic key by \(K = (\alpha , \varvec{\beta }) \in \mathbb {F}\times \mathbb {G}^\ell \). The tag space is \(\mathbb {G}^\ell \). We denote a generic tag by \(\varvec{y} \in \mathbb {G}^{\ell }\). The tag is computed as \(\varvec{y} = {\text {mac}}_K(\varvec{x}) = \alpha \varvec{x} + \varvec{\beta }\), i.e., \(y_i = \alpha x_i + \beta _i\). Note that \(\alpha \in \mathbb {F}\) and \(x_i \in \mathbb {F}\), so \(\alpha x_i \in \mathbb {G}\).
The intended use of the MAC scheme is as follows. The preprocessor samples a message \(\varvec{x}\) and a uniformly random key K and computes \(\varvec{y} = {\text {mac}}_K(\varvec{x})\). It gives K to \({\mathsf {V}}\) and gives \((\varvec{x}, \varvec{y})\) to \({\mathsf {C}}\). To reveal the message \(x_i\) the committer \({\mathsf {C}}\) sends \((x_i, y_i)\) to \({\mathsf {V}}\), who accepts if and only if \(y_i = \alpha x_i + \beta _i\).
The preprocessed information also allows to open any sum of a subset of the \(x_i\)’s. Let \(\varvec{\lambda } \in \mathbb {F}^\ell \) with \(\lambda _i \in \{0,1\}\). Let \(x_\lambda = \sum _i \lambda _i x_i\) in \(\mathbb {F}\), let \(y_\lambda = \sum _i \lambda _i y_i\) in \(\mathbb {G}\), and let \(\beta _\lambda = \sum _i \lambda _i \beta _i\) in \(\mathbb {G}\). To reveal \(x_\lambda \) the committer \({\mathsf {C}}\) sends \((x_\lambda , y_\lambda )\) and the verifier \({\mathsf {V}}\) checks that \(y_\lambda = \alpha x_\lambda + \beta _\lambda \). If both players are honest, this is clearly the case. The only non-trivial thing to notice is that since the \(\lambda _i\) are bits, the sum \(\sum _i \lambda _i x_i\) involves no reduction modulo p, so computing it in \(\mathbb {F}\) gives the same result as computing it in \(\mathbb {G}\).
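This subset-sum opening can be sketched as follows (a continuation of the earlier MAC sketch, with our own names; all sums are plain XORs precisely because \(\lambda _i \in \{0,1\}\)):

```python
import secrets

K = 64   # statistical security parameter
ELL = 8  # number of committed messages

def clmul(a, b):
    """Carry-less multiplication over GF(2)[X] (stand-in for PCLMULQDQ)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

alpha = secrets.randbits(K)  # one alpha shared by all tags
beta = [secrets.randbits(2 * K - 1) for _ in range(ELL)]
x = [secrets.randbits(K) for _ in range(ELL)]
y = [clmul(alpha, xi) ^ bi for xi, bi in zip(x, beta)]

# open the sum of a random subset: lambda_i in {0,1}, so sums are XORs
lam = [secrets.randbits(1) for _ in range(ELL)]
x_lam = y_lam = beta_lam = 0
for li, xi, yi, bi in zip(lam, x, y, beta):
    if li:
        x_lam ^= xi
        y_lam ^= yi
        beta_lam ^= bi

# the verifier's check passes with no polynomial reductions anywhere
assert y_lam == clmul(alpha, x_lam) ^ beta_lam
```

The check goes through because carry-less multiplication is GF(2)-linear in its second argument, so XORing the tags yields the tag of the XOR of the messages under the key \((\alpha , \beta _\lambda )\).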
The scheme is hiding in the sense that after a number of openings to elements \(x_\lambda \) the verifier learns nothing more than what can be computed from the received values \(x_\lambda \). To see this notice that K is independent of \(\varvec{x}\) and hence could be simulated by \({\mathsf {V}}\). Also the openings can be simulated. Namely, whenever \({\mathsf {V}}\) received an opening \((x_\lambda , y_\lambda )\) from an honest \({\mathsf {C}}\), we know that \(y_\lambda = \alpha x_\lambda + \beta _\lambda \), so \({\mathsf {V}}\) could have computed \(y_\lambda \) itself from \(x_\lambda \) and K. Hence no information extra to \(x_\lambda \) is transmitted by transmitting \((x_\lambda , y_\lambda )\).
We then prove that the scheme is binding. Let \(\mathcal {A}\) be an unbounded adversary. Run \(\mathcal {A}\) to get a message \(\varvec{x} \in \mathbb {F}^\ell \). Sample a uniformly random key \((\alpha , \varvec{\beta }) \in \mathbb {F}\times \mathbb {G}^\ell \) and compute \(y_i = \alpha x_i + \beta _i\) for \(i = 1, \ldots , \ell \). Give \(\varvec{y}\) to \(\mathcal {A}\). Run \(\mathcal {A}\) to get an output \((y', x', \varvec{\lambda }) \in \mathbb {G}\times \mathbb {F}\times \{0,1\}^\ell \). We say that \(\mathcal {A}\) wins if \(x' \ne x_\lambda \) and \(y' = \alpha x' + \beta _{\lambda }\). We can show that no \(\mathcal {A}\) wins with probability better than \(2^{-k}\). To see this, notice that if \(\mathcal {A}\) wins then he knows \(x'\) and \(y'\) such that \(y' = \alpha x' + \beta _{\lambda }\). He also knows \(x_{\lambda }\) and \(y_{\lambda }\) as these can be computed from \(\varvec{x}\), \(\varvec{y}\) and \(\varvec{\lambda }\), which he knows already. And, it holds that \(y_{\lambda } = \alpha x_{\lambda } + \beta _{\lambda }\). Therefore \(y' - y_{\lambda } = \alpha (x' - x_{\lambda })\), and it follows as above that \(\mathcal {A}\) can compute \(\alpha \). Since no information is leaked on \(\alpha \) by the value \(\alpha \varvec{x} + \varvec{\beta }\) given to \(\mathcal {A}\) it follows that it is uniform in \(\mathbb {F}\) in the view of \(\mathcal {A}\). Therefore \(\mathcal {A}\) can compute \(\alpha \) with probability at most \(2^{-k}\).
We note for completeness that the scheme could be extended to arbitrary linear combinations. In that case one would however have to send \(x_\lambda = \sum _i \lambda _i x_i \bmod p\) and \(y_\lambda = \sum _i \lambda _i y_i \bmod q\), which would involve doing polynomial reductions. The advantage of the above scheme, where \(\lambda _i \in \{0,1\}\), is that no polynomial reductions are needed, allowing full use of the efficiency of the PCLMULQDQ instruction.
3.3 Batched Opening
We now present a method to open a large number of commitments in an amortised efficient way, by sending only k bits.
For notational simplicity we assume that \({\mathsf {C}}\) wants to reveal all the values \((x_1, \ldots , x_n)\), but the scheme trivially extends to opening arbitrary subsets and linear combinations. To reveal \((x_1, \ldots , x_n)\) as described above, \({\mathsf {C}}\) would send \(Y^{\mathsf {C}}=(y_1, \ldots , y_n)\) and \({\mathsf {V}}\) would compute \(y_i^{\mathsf {V}}= \alpha x_i + \beta _i\) for \(i = 1, \ldots , n\) and \(Y^{\mathsf {V}}=(y_1^{\mathsf {V}}, \ldots , y_n^{\mathsf {V}})\) and check that \(Y^{\mathsf {V}}= Y^{\mathsf {C}}\). Consider now the following optimization where \({\mathsf {C}}\) and \({\mathsf {V}}\) are given a function H that outputs (at least) k bits. They could then compare \(Y^{\mathsf {C}}\) and \(Y^{\mathsf {V}}\) by sending \(h^{\mathsf {C}}= H(Y^{\mathsf {C}})\) and checking that \(h^{\mathsf {C}}= H(Y^{\mathsf {V}})\).
This is clearly secure in the random oracle model: by what we saw above, if \(x_i' \ne x_i\) then with overwhelming probability, \({\mathsf {C}}\) has not been able to call the oracle on input \(y_i^{\mathsf {V}}\). Assuming he has not, he has no information on the value of \(F(y_i^{\mathsf {V}})\), and hence the probability that \(h^{\mathsf {C}}\) happens to be equal to \(H(Y^{\mathsf {V}})\) is \(2^{-k}\).
Recall that we are going to use the MAC scheme outlined here for the case where we use bigger tables than for Boolean gates, such as the AES S-box, and that in such a case the preprocessing will produce this type of linear MACs. This means that if we use the batch opening method, we will need to compute H in the online phase. Our definition of H is well suited for this on modern Intel processors: MACs will typically be of size at most 128 bits, so as F we can use AES encryption under a fixed key that is chosen for each protocol execution. We will make the heuristic assumption that we can model this as a random permutation. Then, assuming we will be calling the function much less than \(\sqrt{2^{128}} = 2^{64}\) times (which should certainly be true in practice) such a permutation is well known to be statistically indistinguishable from a random function.
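The data flow of a batched opening can be sketched as follows. This is a sketch under our own assumptions: we instantiate H with truncated SHA-256 over the concatenated tags purely for illustration (the paper instead builds H from fixed-key AES), and the names are ours.

```python
import hashlib
import secrets

K = 64  # statistical security parameter

def clmul(a, b):
    """Carry-less multiplication over GF(2)[X] (stand-in for PCLMULQDQ)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def H(tags):
    """Stand-in for the paper's H (there built from fixed-key AES): hash
    the concatenated tags and keep k bits."""
    h = hashlib.sha256()
    for t in tags:
        h.update(t.to_bytes(16, "little"))  # each tag fits in 2k-1 = 127 bits
    return h.digest()[:K // 8]

n = 100
alpha = secrets.randbits(K)
beta = [secrets.randbits(2 * K - 1) for _ in range(n)]
x = [secrets.randbits(K) for _ in range(n)]
y = [clmul(alpha, xi) ^ bi for xi, bi in zip(x, beta)]  # committer's tags

# C sends (x_1, ..., x_n) plus the single k-bit digest h_C instead of n tags
h_C = H(y)
y_V = [clmul(alpha, xi) ^ bi for xi, bi in zip(x, beta)]  # V recomputes tags
assert h_C == H(y_V)
```

The communication saving is the point: n openings cost n message values plus one k-bit digest, rather than n full tags.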
4 Preprocessing
In this section we show how to securely implement the preprocessing for Boolean circuits. The idea is to generate the tables by a computation on linearly secret shared values, which in the case of malicious security also include MACs. We will consider an additive secret sharing of x where A holds \(x_A\in \{0,1\}\) and B holds \(x_B\in \{0,1\}\) such that \(x=x_A+x_B\).
In the case of malicious security the MACs are elements in a finite field \(\mathbb {F}\) of characteristic 2 and size at least \(2^k\) where k is the security parameter. Here A holds a key \(\alpha _A\in \mathbb {F}\) and B a key \(\alpha _B\in \mathbb {F}\). We denote a secret shared value with MACs \([\![x]\!]\) where A holds \((x_A, y_A, \beta _A)\) and B holds \((x_B, y_B, \beta _B)\) such that \(y_A=\alpha _B x_A + \beta _B\) and \(y_B=\alpha _A x_B + \beta _A\). If a value is to be opened the MAC is checked, e.g. if x is opened to A she receives \(x_B\) and \(y_B\) and checks that indeed \(y_B = \alpha _A x_B + \beta _A\), and aborts otherwise. This is also the format used in the TinyOT protocol [NNOB12], so we can use the preprocessing protocol from there to produce single values \([\![a]\!]\) for random a and triples of form \([\![x]\!], [\![y]\!], [\![z]\!]\) where x, y are random bits and \(z=xy\). Any other preprocessing producing the same data format will of course also work; for instance, the protocols presented in [FKOS15] will give better speeds than original TinyOT.
Note that, by a standard protocol [Bea91], we can use one triple to produce from any \([\![a]\!], [\![b]\!]\) the product \([\![ab]\!]\); this just requires opening \(a+x, b+y\) and local computation. Also, we can compute the sum \([\![a+b]\!]\) by only local computation.
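The triple-based multiplication from [Bea91] can be sketched in a few lines for the bit case (this shows only the data flow on plain additive shares, without the MACs; the function names are ours):

```python
import random

def share(bit):
    """Additively share a bit between A and B."""
    s = random.randint(0, 1)
    return s, bit ^ s

def beaver_mul(u_sh, v_sh, triple):
    """Multiply shared bits [u], [v] using one triple ([a], [b], [c]), c = a & b.
    The parties open d = u ^ a and e = v ^ b, then compute shares of uv locally."""
    (uA, uB), (vA, vB) = u_sh, v_sh
    (aA, aB), (bA, bB), (cA, cB) = triple
    d = uA ^ aA ^ uB ^ aB  # opened value u ^ a
    e = vA ^ bA ^ vB ^ bB  # opened value v ^ b
    zA = cA ^ (d & bA) ^ (e & aA) ^ (d & e)  # public term d & e added once
    zB = cB ^ (d & bB) ^ (e & aB)
    return zA, zB

# Exhaustive check over all plaintext inputs
for u in (0, 1):
    for v in (0, 1):
        a, b = random.randint(0, 1), random.randint(0, 1)
        triple = (share(a), share(b), share(a & b))
        zA, zB = beaver_mul(share(u), share(v), triple)
        assert zA ^ zB == (u & v)
```

Correctness follows by expanding \(uv = c \oplus db \oplus ea \oplus de\) with \(d = u \oplus a\) and \(e = v \oplus b\); the opened values d, e leak nothing because a and b are uniformly random.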
In Fig. 8, we describe a protocol that implements the preprocessing functionality \(F^{pre}_{mal}\) assuming a secure source of triples and single random values as described here, and also assuming that the circuit contains only AND, XOR and NOT gates. We use a function F that we model as a random oracle.
Preprocessing for AES. To preprocess an AES S-box table, we can again make use of the TinyOT preprocessing. This can be combined with a method from [DK10]. Here, it is shown how to compute the S-box function securely using 7 binary multiplications and local operations. We can then make the table by simply computing each entry (in parallel). It is also possible to compute the S-box using arithmetic in \(\mathbb {F}_{256}\), but if we have to build such multiplications from binary ones, as we would if using TinyOT, this most likely does not pay off.
5 Implementation
We implement two clients, Alice and Bob, securely evaluating the AES-128 encryption function. Alice inputs the message, Bob inputs the expanded key, and both parties learn the ciphertext. All operations in this function except SubBytes are linear. We first implement an optimized version where all linear operations are computed locally using the AES-NI/SSE instruction set, and every non-linear S-box lookup is replaced with a TinyTable lookup and opening. We then implement a binary version with free XOR gates and no NOT gates, using the AES-expanded circuit from [TS]. For both implementations we benchmark a passively secure version and maliciously secure versions, all providing statistical security \(2^{-k}\). Here we test the linear MAC scheme with \(k=64\) and the lookup table MAC scheme with \(k=64\) and \(k=32\).
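For intuition, the scrambled-table lookup that replaces each S-box can be sketched as follows. This is a passively secure toy version with a trusted dealer and a single 4-bit lookup; in the real protocol the masks and table shares come from the preprocessing, MACs are attached, and the input x itself is only ever held in masked form.

```python
# Passively secure sketch of a scrambled-table lookup for a public
# function S (toy version: trusted dealer, one lookup, no MACs).
import secrets

def preprocess(S, n_bits):
    """Dealer: pick input mask r and output mask s, build the scrambled
    table T[u] = S(u XOR r) XOR s, and XOR-share T between the parties."""
    size = 1 << n_bits
    r = secrets.randbelow(size)
    s = secrets.randbelow(size)
    T = [S(u ^ r) ^ s for u in range(size)]
    T_A = [secrets.randbelow(size) for _ in range(size)]
    T_B = [tA ^ t for tA, t in zip(T_A, T)]
    return r, s, T_A, T_B

def lookup(x, r, s, T_A, T_B):
    """Online phase: open the masked index e = x XOR r (safe, since r is a
    uniform one-time pad), then each party reveals its share of T[e].
    In the real protocol neither party knows x; here we take x directly
    for brevity."""
    e = x ^ r
    masked_out = T_A[e] ^ T_B[e]   # = S(e XOR r) XOR s = S(x) XOR s
    return masked_out, s
```

Only the single entry T[e] is ever opened, so nothing about the other table positions, and hence nothing about x, is revealed.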
In the protocol the parties receive preprocessed data generated by a trusted party. We assume this data is present in memory and reuse the same instance for our benchmark. The parties then compute the encryption function on a set of test vectors and verify correctness. We test the implementation on two setups: a basic LAN setup and a cloud setup. The LAN setup consists of two PCs connected via 1 GbE, each with an i7-3770K CPU at 3.5 GHz and 32 GB RAM. For the cloud setup we use Amazon EC2 with two c4.8xlarge instances, each with 36 vCPUs (hyperthreads), connected locally via 10 GbE. The parties communicate over TCP.
Size of preprocessed data

             Optimized   Binary
Linear-64    760.0 KiB   342.7 KiB
Lookup-64    80.4 MiB    512.7 KiB
Lookup-32    40.2 MiB    257.7 KiB
Passive      40.0 KiB    2.7 KiB
Execution times for optimized version

             LAN                              Cloud
             Sequential  Parallel             Sequential  Parallel
Linear-64    1.03 ms     3.15 \(\upmu \)s    1.09 ms     0.47 \(\upmu \)s
Lookup-64    1.03 ms     3.01 \(\upmu \)s    1.05 ms     0.45 \(\upmu \)s
Lookup-32    1.02 ms     2.95 \(\upmu \)s    1.05 ms     0.32 \(\upmu \)s
Passive      0.88 ms     2.89 \(\upmu \)s    0.97 ms     0.29 \(\upmu \)s
Execution times for binary version

             LAN                              Cloud
             Sequential  Parallel             Sequential  Parallel
Linear-64    4.92 ms     75.36 \(\upmu \)s   2.37 ms     19.19 \(\upmu \)s
Lookup-64    4.38 ms     54.72 \(\upmu \)s   2.22 ms     11.90 \(\upmu \)s
Lookup-32    4.18 ms     40.81 \(\upmu \)s   2.18 ms     9.98 \(\upmu \)s
Passive      3.94 ms     25.50 \(\upmu \)s   1.84 ms     6.73 \(\upmu \)s
6 An Asymptotically Better Solution
Recall that the main problem in obtaining malicious security is that we must ensure that players reveal correct bits from the tables they are given, while at the same time only the relevant bits are revealed.
In this section we show an asymptotically better technique for committing players to their tables such that we can open only the relevant bits.
The idea is as follows: if player A is to commit to a string \(\varvec{s}\) that is known at preprocessing time, then the preprocessing protocol will establish a (verifiable) secret sharing of \(\varvec{s}\) among the players. Concretely, we will use the representation introduced for the so-called MiniMac protocol in [DZ13]: we choose an appropriate linear error-correcting (binary) code C. This code should be able to encode strings of length k bits, and have length and minimum distance linear in k.^{2}
Based on this preprocessing, we can design a protocol that allows A to open any desired substring of \(\varvec{s}\), as follows: let I denote the characteristic vector of the substring, i.e., the \(\ell \)-bit string whose i'th bit is 1 if A is to reveal the i'th bit of \(\varvec{s}\) and 0 otherwise.
We then compute a representation [I] (which is trivial using the additive shares of \(\varvec{a}\)), use the MiniMac protocol to compute \([I*\varvec{s}]\), and open this representation to reveal \(I*\varvec{s}\), which gives B the string he needs to know and nothing more. This is possible if we let the preprocessing supply appropriate correlated randomness for the multiplication of I and \(\varvec{s}\). The protocol just sketched will be called \(\pi _{MiniMac}\) in the following.
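The effect of the bitwise product can be checked in the clear (toy code; in \(\pi _{MiniMac}\) the product is computed on MiniMac representations and \(\varvec{s}\) itself is never exposed):

```python
# The bitwise product I * s reveals exactly the bits of s selected by
# the characteristic vector I; every unselected position opens as 0,
# independently of the value of s there.

def select(s_bits, I_bits):
    """Bitwise product I * s over GF(2)."""
    return [i & b for i, b in zip(I_bits, s_bits)]

# Two strings differing only at unselected positions open identically:
s1 = [1, 0, 1, 1, 0, 1]
s2 = [1, 0, 0, 0, 1, 1]   # differs from s1 only where I is 0
I  = [1, 1, 0, 0, 0, 1]   # reveal bits 0, 1 and 5
assert select(s1, I) == select(s2, I) == [1, 0, 0, 0, 0, 1]
```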
In Fig. 9 we specify the preprocessing functionality \(F^{pre}_{MiniMac}\) we assumed in \(\pi _{MiniMac}\), i.e., it outputs the tables as well as MiniMac representations of them. Now consider the functionality \(F_{table}\) from Fig. 10 that simply stores the tables and outputs bits from them on request. By trivial modification of the security proof for MiniMac, we have
Lemma 1
The protocol \(\pi _{MiniMac}\) composed with \(F^{pre}_{MiniMac}\) implements \(F_{table}\) with statistical security against a malicious adversary.
As for the online efficiency of \(\pi _{MiniMac}\), note that in [DZ13] the MiniMac protocol is claimed to be efficient only for so-called well-formed circuits, but this is not a problem here since the circuit we need to compute is a completely regular depth-1 circuit. Indeed, bitwise multiplication of strings is exactly the operation MiniMac was designed to do efficiently. Therefore, simple inspection of [DZ13] shows that the preprocessing data we need will be of size \(O(\ell ) = O(s)\), where s is the circuit size, and this is also the communication complexity. The computational complexity is dominated by the time spent on encoding in C. Unfortunately, we do not know codes with the right algebraic properties that also have fast encoding algorithms, so the only approach known is to simply multiply by the generator matrix. We can optimize by noting that if \(\ell > k^2\) we will always be doing many encodings in parallel, so we can collect all vectors to encode in a matrix and use fast matrix multiplication. With the current state of the art, this leads to computational complexity \(O(s k^\epsilon )\) where \(\epsilon \approx 0.3727\).
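The batching idea can be sketched as follows. This is toy code: a random generator matrix stands in for a code C with the required algebraic properties, and schoolbook row-XOR encoding stands in for the fast matrix product that yields the \(k^\epsilon\) factor.

```python
# Batched encoding by the generator matrix over GF(2): encoding m
# messages at once is one (m x k) * (k x n) matrix product, which is
# where fast matrix multiplication helps. Generator rows are packed
# into n-bit ints so adding a row is a single XOR.
import secrets

def batch_encode(msgs, G_rows):
    """msgs: list of k-bit ints; G_rows: k generator rows as n-bit ints.
    Returns the m codewords as n-bit ints (schoolbook, for clarity)."""
    out = []
    for msg in msgs:
        cw = 0
        for j, row in enumerate(G_rows):
            if (msg >> j) & 1:
                cw ^= row          # codeword = XOR of rows selected by msg bits
        out.append(cw)
    return out
```

Since encoding is linear over GF(2), `batch_encode([a ^ b], G)` equals the XOR of the encodings of a and b, which is easy to sanity-check.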
Alternatively, we can let C be a Reed-Solomon code over an extension field with \(\varOmega (k)\) elements. We can then use FFT algorithms for encoding, and all complexities become \(O({\text {polylog}}(k) s)\).
As a final step, consider the protocol \(\pi _{mal}^{table}\) from Fig. 11. By trivial adaptation of the proof for semi-honest security, we get that
Lemma 2
The protocol \(\pi _{mal}^{table}\) composed with \(F_{table}\) implements \(F_{\mathsf {SFE}}\) with malicious and statistical security.
We can then combine Lemmas 1 and 2 to get a protocol for \(F_{\mathsf {SFE}}\) in the preprocessing model, which together with the efficiency consideration above gives us:
Theorem 2
There exists a 2-party protocol in the preprocessing model (using \(F^{pre}_{MiniMac}\)) for computing any Boolean circuit of size s with malicious and statistical security, where the preprocessed data size and communication complexity are O(s) and the computational complexity is \(O(k^\epsilon s)\), where k is the security parameter and \(\epsilon < 1\). There also exists a protocol for which all complexities are \(O({\text {polylog}}(k) s)\).
Footnotes
 1.
For this functionality as well as for the other preprocessing functionalities we define, whenever players are to receive shares of a secret value, the functionality lets the adversary choose shares for the corrupt player. This is a standard trick to make sure that the functionality is easier to implement: the simulator can simply run a fake instance of the protocol with the adversary and give to the functionality the shares that the corrupt player gets out of this. If we had let the functionality make all the choices, the simulator would have to force the protocol into producing the shares that the functionality wants. This weaker functionality is still useful: as long as the shared secret is safe, we don’t care which shares the corrupt player gets.
 2.
Furthermore, its Schur transform should also have minimum distance linear in k. The Schur transform is the code obtained as the linear span of all vectors in the set \(\{ \varvec{c}*\varvec{d} \mid \varvec{c}, \varvec{d}\in C\}\). See [DZ13] for further details on the existence of such codes.
Acknowledgements
The first and third authors were supported by advanced ERC grant MPCPRO.
References
[BDOZ11] Bendlin, R., Damgård, I., Orlandi, C., Zakarias, S.: Semi-homomorphic encryption and multiparty computation. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 169–188. Springer, Heidelberg (2011). doi:10.1007/978-3-642-20465-4_11
[Bea91] Beaver, D.: Efficient multiparty protocols using circuit randomization. In: Feigenbaum, J. (ed.) CRYPTO 1991. LNCS, vol. 576, pp. 420–432. Springer, Heidelberg (1992). doi:10.1007/3-540-46766-1_34
[CDvdG87] Chaum, D., Damgård, I.B., van de Graaf, J.: Multiparty computations ensuring privacy of each party's input and correctness of the result. In: Pomerance, C. (ed.) CRYPTO 1987. LNCS, vol. 293, pp. 87–119. Springer, Heidelberg (1988). doi:10.1007/3-540-48184-2_7
[DK10] Damgård, I., Keller, M.: Secure multiparty AES. In: Sion, R. (ed.) FC 2010. LNCS, vol. 6052, pp. 367–374. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14577-3_31
[DLT14] Damgård, I., Lauritsen, R., Toft, T.: An empirical study and some improvements of the MiniMac protocol for secure computation. In: Abdalla, M., De Prisco, R. (eds.) SCN 2014. LNCS, vol. 8642, pp. 398–415. Springer, Cham (2014). doi:10.1007/978-3-319-10879-7_23
[DPSZ12] Damgård, I., Pastro, V., Smart, N., Zakarias, S.: Multiparty computation from somewhat homomorphic encryption. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 643–662. Springer, Heidelberg (2012). doi:10.1007/978-3-642-32009-5_38
[DZ13] Damgård, I., Zakarias, S.: Constant-overhead secure computation of Boolean circuits using preprocessing. In: Sahai, A. (ed.) TCC 2013. LNCS, vol. 7785, pp. 621–641. Springer, Heidelberg (2013). doi:10.1007/978-3-642-36594-2_35
[DZ16] Damgård, I., Zakarias, R.W.: Fast oblivious AES – a dedicated application of the MiniMac protocol. In: Pointcheval, D., Nitaj, A., Rachidi, T. (eds.) AFRICACRYPT 2016. LNCS, vol. 9646, pp. 245–264. Springer, Cham (2016). doi:10.1007/978-3-319-31517-1_13
[FKOS15] Frederiksen, T.K., Keller, M., Orsini, E., Scholl, P.: A unified approach to MPC with preprocessing using OT. In: Iwata, T., Cheon, J.H. (eds.) ASIACRYPT 2015. LNCS, vol. 9452, pp. 711–735. Springer, Heidelberg (2015). doi:10.1007/978-3-662-48797-6_29
[IKM+13] Ishai, Y., Kushilevitz, E., Meldgaard, S., Orlandi, C., Paskin-Cherniavsky, A.: On the power of correlated randomness in secure computation. In: Sahai, A. (ed.) TCC 2013. LNCS, vol. 7785, pp. 600–620. Springer, Heidelberg (2013). doi:10.1007/978-3-642-36594-2_34
[Kel] Keller, M.: Private communication
[NNOB12] Nielsen, J.B., Nordholt, P.S., Orlandi, C., Burra, S.S.: A new approach to practical active-secure two-party computation. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 681–700. Springer, Heidelberg (2012). doi:10.1007/978-3-642-32009-5_40
[TS] Tillich, S., Smart, N.: Circuits of basic functions suitable for MPC and FHE. https://www.cs.bris.ac.uk/Research/CryptographySecurity/MPC/