
1 Introduction

Secure two-party computation [17, 39] allows two mutually distrusting parties to perform secure computation using their private inputs without revealing any extra information to each other. It is known that even against semi-honest adversaries, i.e. adversaries who follow the prescribed protocol but are curious to find additional information, achieving information theoretic security in the plain model is impossible for most tasks [2, 3, 27, 29, 30]. For example, even the seemingly simple task of securely computing the AND of two bits is not possible. On the other hand, if suitable correlated randomness is provided as setup to the parties, then general secure two-party computation becomes possible [8, 26, 28]. A particularly useful type of correlated randomness is the random oblivious transfer (OT) correlation, where the sender gets two random bits \((s_0,s_1)\) and the receiver gets \((c,s_c)\), where c is a random bit.

One reason for the usefulness of OT correlations is the existence of highly efficient OT-based secure computation protocols both in theory and in practice. Indeed, protocols such as TinyOT [35] have popularized the approach of starting with random bit OT correlations for obtaining practically efficient secure computation protocols. Random OT correlations can be distributed or securely generated in an offline phase, long before the inputs are known, and later used in an online phase to perform a desired secure computation. But what if some information about the correlated randomness is leaked to an adversary? Can we still extract “fresh” correlated randomness after such leakage has occurred?

This question is a direct analog of the classical question of privacy amplification [4, 5] that arose in the context of secure communication. Privacy amplification asks the following question: given shared secret randomness which has been partially leaked to an eavesdropper, can parties agree upon a common key which remains hidden from the eavesdropper? In our setting, we ask the same question for correlated randomness, which is useful for secure computation. Note, however, that participants in a privacy amplification protocol protect their secret only from an outsider. Instead, in our setting, each party must protect its secrets against the other party. For example, a fresh oblivious transfer correlation ensures that the bit c is hidden from the sender and the bit \(s_{1-c}\) is hidden from the receiver.

Quite surprisingly, very little is known about our question. This is in sharp contrast to the problem of privacy amplification, and despite decades of study of OT-based secure computation. In particular, the question of how much leakage can be tolerated when recovering OT correlations has remained wide open. In our work, we resolve this question.

Prior to our work, Ishai, Kushilevitz, Ostrovsky and Sahai [24] studied this question, introducing the notion of correlation extractors. Concretely, they consider the setting of extracting fresh OT correlations from n independent copies of random OT correlations that have been subject to leakage. (One can either consider a deterministic leakage, captured by an arbitrary function f with t bits of output, or general probabilistic leakage, subject to the constraint that the secret has expected min-entropy of t bits conditioned on the leakage.) The main result of [24] is an interactive protocol for extracting OT correlations that remains secure even when some constant fraction of the 2n secret bits of each party can be leaked to the other party. Unfortunately, the concrete fractional leakage tolerated by this protocol is extremely small, approximately \(10^{-7}\). So, at best, this result serves as a proof of concept.

Since their work in 2009, there has not been any progress on this problem. In our work, we show that given n OT correlations as setup, one can tolerate \((1-\varepsilon )n/2\) bits of leakage, for an arbitrarily small constant \(\varepsilon >0\), with negligible error. This leakage rate is near-optimal [25]. Moreover, in contrast to the previous protocol of [24], our protocol uses a minimal amount of interaction, requiring only two messages as opposed to the 4 messages that are inherently required by the technique from [24]. Finally, our protocol is conceptually simpler, completely avoiding the use of Algebraic-Geometric codes [16, 19] needed in [24] and replacing them with simple families of binary linear codes.

Having settled the question of leakage-resilience for OT correlations, we then step back and consider the question more broadly. While OT correlations are extremely useful and have a long history of applicability, perhaps there are other correlations that are better with respect to leakage-resilience and still allow for secure computation. More precisely, we ask if there are correlations (X, Y) such that both parties receive 2n bits but where, even after more than n/2 bits of leakage, it is still possible to produce fresh secure OT correlations. We answer this question in the affirmative. We show that the so-called inner product correlation, where parties receive random binary vectors and additive shares of their inner product, can tolerate a significantly higher fractional leakage. Concretely, we show how to extract a fresh OT from such an inner product correlation while tolerating up to \(1/2-\epsilon \) fractional leakage, for any \(\epsilon >0\) (compared to the optimal leakage rate of 1/4 for OT correlations). This opens up a new set of questions to explore in future work (for more discussion refer to the full version of the paper [20]).

Finally, we note that while the primary focus of this work is on the information-theoretic setting for secure computation, the problem we consider is well motivated even in the setting of computational security. The reason is twofold. First, the fresh OTs produced by our extraction procedures can be used by computationally secure OT-based protocols such as those based on garbled circuits [39]. Second, these extraction procedures can be applied even when a computationally secure protocol is used for realizing the offline generation of correlated randomness. Suppose that a computationally secure two-party protocol \(\varPi \) is used for this purpose. If the leakage occurs after the execution of \(\varPi \) terminates (and the two parties erase everything but the output), then our protocols are guaranteed to produce clean OTs that can be consumed by subsequent (computational or information-theoretic) protocols. Moreover, if \(\varPi \) is so-called “leakage-tolerant” [6, 7, 21], then the same holds even if leakage can occur during the execution of \(\varPi \). Such leakage-tolerant protocols can be constructed under standard intractability assumptions.

1.1 Our Contribution

In this section we give a more detailed overview of our main results.

Oblivious Transfer Correlation Extractor. We present our results in the terminology of “random oblivious transfer extractors.” A random oblivious transfer (ROT) is a two-party primitive where client S receives random bits \((s_0,s_1)\) and client R receives a random bit c together with \(s_c\). Random oblivious transfer correlations can be easily converted into standard oblivious transfers, where a receiver R selects one of two bits held by a sender S (a sketch of this standard conversion appears below). The latter can serve as a basis for general secure multiparty computation.
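Since this conversion is folklore and is used only implicitly in the paper, the following is a minimal Python sketch of the standard derandomization, assuming a trusted sampling of the ROT correlation; all function names and the message layout are illustrative, not taken from the paper.

```python
import secrets

def sample_rot():
    """Sample one random OT correlation: sender gets (s0, s1), receiver gets (c, s_c)."""
    s0, s1, c = secrets.randbelow(2), secrets.randbelow(2), secrets.randbelow(2)
    return (s0, s1), (c, s1 if c else s0)

def ot_from_rot(sender_msgs, receiver_choice):
    """Run a chosen-input OT using one ROT correlation.

    sender_msgs = (m0, m1) are the sender's real input bits,
    receiver_choice = b is the receiver's real choice bit.
    Returns the receiver's output, which should equal m_b.
    """
    (s0, s1), (c, sc) = sample_rot()
    m0, m1 = sender_msgs
    b = receiver_choice

    # Receiver -> Sender: re-randomized choice bit (hides b because c is uniform).
    e = b ^ c
    # Sender -> Receiver: both inputs masked with the ROT pads, permuted by e.
    y0 = m0 ^ (s1 if e else s0)      # y0 = m0 + s_e
    y1 = m1 ^ (s0 if e else s1)      # y1 = m1 + s_{1-e}
    # Receiver unmasks the message it is entitled to, using s_c.
    return (y1 if b else y0) ^ sc

# Sanity check: the receiver always obtains m_b; the other message stays padded by s_{1-c}.
for m0 in (0, 1):
    for m1 in (0, 1):
        for b in (0, 1):
            assert ot_from_rot((m0, m1), b) == (m1 if b else m0)
```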

More concretely, Oblivious Transfer (OT) is a two-party functionality where client S (sender) has inputs \((s_0,s_1)\), client R (receiver) has input c, and client R obtains output \(s_c\). A Random OT correlation, referred to as ROT, provides \((s_0,s_1)\) to one party and \((c,s_c)\) to the other party, where \(s_0,s_1,c\) are uniform random bits. We work in the ROT\(^n\)-hybrid model, that is, there are n copies of ROT correlation provided to the two parties. A semi-honest client S can leak \(t_S\) bits from the correlation and a semi-honest client R can leak \(t_R\) bits from the correlation, where by default we define t bits of leakage as the output of an adversarially chosen function with t output bits. (However, all of our results extend to leakage measured in terms of average conditional min-entropy.) An \((n,t_S,t_R,\varepsilon )\) OT extractor is a two-party protocol between client S and client R such that it produces a \((1-\varepsilon )\)-secure copy of oblivious transfer despite prior leakage obtained by the clients.

Our first result shows the following feasibility result:

Theorem 1

(OT Extractor). For any \(n,t_S,t_R\in {\mathbb N} \), there exists a 2-message \((n,t_S,t_R,\varepsilon )\) OT Extractor protocol which produces a secure OT, such that \(\varepsilon \le 2^{-(g/4) +1}\), where \(g = n-(t_S+t_R)\).

Note that our result shows that if there is a sufficient gap between n and the total leakage \((t_S+t_R)\), then we can securely extract one oblivious transfer, and the simulation error decreases exponentially in the gap. For example, our result implies extractors that tolerate \(t_S=t_R = 0.49n\) bits of leakage. Contrast this with the result of [24], which can tolerate only cn bits of leakage for a minuscule constant c. Thus, ours is the first feasibility result in the regime of high leakage tolerance. Moreover, our leakage resilience is (near) optimal due to the negative result of [25], which states that there does not exist any OT combiner (let alone an extractor) that can tolerate up to \(n/2 -O(1)\) bits of leakage. Our protocol also improves upon the round complexity of [24], from 4 messages to 2 messages, which is clearly optimal.

We show that if the gap \(g = n - (t_S+t_R)\) is at least cn, for some constant \(c\in (0,1)\), then we can trade off simulation error to increase the production rate of our extractor. That is, in the leaky ROT\(^n\) hybrid, we can produce a large number of secure independent copies of oblivious transfer. Our result is summarized in the following theorem:

Theorem 2

(High Production). For every \(n,t_S,t_R\in {\mathbb N} \), such that \(g=n-(t_S+t_R)=\varTheta (n)\), and \(\rho =\omega (\log n)\), there exists a 2-message \((n,t_S,t_R)\) OT Extractor with production \(p=n/\rho \) and \(\varepsilon \le \mathsf {negl} (n)\).

Intuitively, this theorem states that if the gap is linear in n, then we can obtain a slightly sub-linear number of secure oblivious transfers while incurring negligible security error. Although our production rate is sub-constant, we show that it is possible to extract a large number of secure oblivious transfers even if the parties are permitted to perform \(t_S=t_R=0.49n\) bits of leakage. Contrasting this with the result of [24]: for practical and typical values of n, the number of oblivious transfers produced by our scheme surpasses the number produced by their protocol, because their production rate, although linear, is a very small constant; even a generous estimate puts it below \(1.2\cdot 10^{-7}\). The hidden constant in our asymptotic production rate is small, say upper bounded by \(10^{-1}\). So, in concrete terms, our production rate is \(\sim (g/n)/10\log ^2 n\), which is higher than the rate achieved by [24] for a practical range of parameters (we use \(\rho =\log ^2 n\) to derive this bound). An obvious open problem is to explore whether our approach can be extended to achieve the ideal goal of producing a linear number of secure OTs even if the gap is an arbitrarily small linear function of n.

Overall, our construction significantly simplifies the prior construction of [24] at a conceptual level by forgoing usage of Algebraic Geometric [16, 19] codes and instead relying on binary linear codes generated by generator matrices whose parity check matrices are random Toeplitz matrices.

Unlike [24], we do not achieve constant (multiplicative) communication overhead per instance of oblivious transfer produced; our communication overhead per oblivious transfer produced is linear in n. We also do not consider the problem of error tolerance, another important direction for future work.

Restriction to Combiners. Combiners are special types of extractors in which the parties' leakage functions are restricted: each party may only specify a set \(T\subseteq [n]\) as its leakage function, with \(\left| {T}\right| \le t_S\) for client S and \(\left| {T}\right| \le t_R\) for client R. The leakage provided is \((s_0,s_1,c,s_c)\) for all ROT correlations indexed by T. Note that the actual information learned by a client is one bit per index (because all the remaining bits can be reconstructed from its own input and one bit of leakage). We show that our construction yields slightly better simulation error than the general analysis of Theorem 1.

Theorem 3

(OT Combiner). For any \(n,t_S,t_R\in {\mathbb N} \), there exists a 2-message \((n,t_S,t_R,\varepsilon )\) OT-Combiner which produces one secure OT using O(n) bits of communication, where \(\varepsilon \le 2^{-g/2 }\) and \(g = n-(t_S+t_R)\).

Note that the construction presented in [25] achieves similar bounds, but the communication complexity of their construction is quadratic in n, while ours is linear in n. We emphasize that the higher production result of Theorem 2 also applies to the setting of combiners.

Large Correlations. We show that, in fact, there are correlations that can tolerate a fractional leakage close to 1/2.

Theorem 4

(High Tolerance). For any \(s,t\in {\mathbb N} \), there exists a correlation (X, Y) over a pair of \((s+1)\)-bit strings such that, even after any party leaks t bits on the correlation (X, Y), the parties can securely realize OT using 2-message communication with simulation error \(\varepsilon \le 2^{-(g/2 +1)}\), where \(g = s/2 - t\).

The correlation (X, Y) used to prove the above theorem is the so-called inner-product correlation, where each party receives a random s-bit vector and the mod-2 inner product of the two vectors is secret-shared between the parties. Moreover, it is not hard to show that our protocol cannot tolerate a leakage rate larger than 1/2. We leave open the question of whether the 1/2 leakage rate is optimal for arbitrary correlations.

1.2 Prior Related Works

The work most relevant to ours is that of Ishai, Kushilevitz, Ostrovsky, and Sahai [24], where the notion of correlation extractors was proposed. They showed that if the parties are allowed a small linear amount of leakage, then a small linear number of fresh correlations can be extracted. Both the leakage and production rates are a minuscule fraction of the initial number of correlations.

A closely related concept is the notion of OT combiners, which are a restricted variant of OT extractors in which leakage is limited to local information about individual OT correlations and there is no global leakage. The study of OT combiners was initiated by Harnik et al.  [23]. Since then, there has been work on several variants and extensions of OT combiners [22, 26, 32, 33, 37].

Recently, [25] constructed OT combiners with nearly optimal leakage parameters. Our protocols were inspired by the OT combiners from [25], but the results we achieve are stronger in several important ways. First, whereas [25] only considers t physical bits of leakage, we tolerate an arbitrary t bits of leakage (similarly to [24], though with a much better leakage rate). Second, even in the case of physical leakage, our solutions improve over [25] by reducing the communication and randomness complexity from quadratic to linear. Finally, our protocols can be used to produce a near-linear number of OTs without significantly compromising the leakage rate, whereas [25] only considers the case of producing a single OT.

Another related work is that of Dziembowski and Faust [14] which (similarly to our Theorem 4) obtains some form of leakage-resilient secure computation from the inner product correlation. However, the construction from [14] requires multiple independent instances of an inner product correlation even for producing just a single OT, and moreover the model considered in [14] assumes that the leakage applies individually to each instance. Even if the analysis of [14] could somehow be strengthened to tolerate some amount of global leakage, the tolerable leakage rate must inevitably be small (since even with a leakage rate of 1/4, one can entirely compromise one of the inner product instances in [14]). Thus, the approach of [14] does not seem relevant to our goal of maximizing the leakage rate.

1.3 Technical Overview

We provide a short overview of our construction which proves Theorem 1. Our construction is inspired by the Massey secret sharing scheme [31] and is closely related to the constructions of [24, 25]. The central novelty of our approach is that we choose a different class of matrices (thus reducing the communication complexity of our algorithm), but the primary technical contribution of our work is the new analysis in the context of leakage. We consider general leakage (unlike the setting of [25], which considers physical bits of leakage) and, hence, lose a small quadratic factor in the simulation error. The same construction, when used in the setting of combiners, yields simulation error identical to that of [25].

For \(i\in [n]\), suppose the client S receives a random pair of bits \((a_i,b_i)\) and the client R receives \((x_i,z_i)\) from the setup, where \(x_i\) is a random bit and \(z_i= a_ix_i\oplus b_i\). Client S picks a random codeword \((u_0,u_1,\cdots ,u_n)\) in a binary linear code \({\mathcal C}\) of length \((n+1)\). Client R picks a random codeword \((r_0,r_1,\cdots ,r_n)\) in the dual code \({\mathcal C} ^\perp \) of length \((n+1)\). Note that the code formed by the component-wise products of such codewords has non-trivial distance, and hence can correct one erasure; in particular, \(u_0r_0 = \sum _{i\in [n]} u_ir_i\). Hence, the clients need not explicitly compute \(u_0r_0\); instead, it suffices to compute \(u_ir_i\) for all \(i\in [n]\) and to recover the one erasure thereafter.

For this section, we shall only consider privacy of client R against a semi-honest client S. Consider the following protocol: For each \(i\in [n]\),

  1.

    Client R sends \(m_i = x_i\oplus r_i\).

  2.

    Client S sends \(\alpha _i = a_i\oplus u_i\). Client S sends \(\beta _i = a_im_i \oplus b_i\).

Note that client R can compute \(\beta _i \oplus \alpha _ir_i \oplus z_i = u_ir_i\). To argue the privacy of client R, we need to show that \(r_0\) remains hidden from the view of client S. Let H be the generator matrix of \({\mathcal C} ^\perp \), interpreted as \([H_0 \mid H']\), where \(H_0\) is the first column of H and \(H'\) consists of the remaining n columns. The ability of client S to predict \(r_0\) can be abstracted as follows: for a uniformly random vector \(\lambda \), given \((\lambda H'\oplus x_{[n]},H)\), client S needs to predict \(\lambda H_0\).
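To make the per-index computation concrete, here is a minimal Python sketch of a single coordinate of the exchange above; the sampling of the full codewords u and r is omitted, and the helper names are illustrative.

```python
import secrets

def role_setup():
    """One ROLE correlation: S holds (a, b); R holds (x, z) with z = a*x + b over GF(2)."""
    a, b, x = (secrets.randbelow(2) for _ in range(3))
    return (a, b), (x, a * x ^ b)

def one_index(u_i, r_i):
    """One coordinate of the exchange from the overview: R ends up with u_i * r_i."""
    (a_i, b_i), (x_i, z_i) = role_setup()
    m_i = x_i ^ r_i                        # R -> S
    alpha_i = a_i ^ u_i                    # S -> R
    beta_i = a_i * m_i ^ b_i               # S -> R
    return beta_i ^ alpha_i * r_i ^ z_i    # computed locally by R

# Sanity check: for every choice of the codeword bits, R recovers u_i * r_i.
for u_i in (0, 1):
    for r_i in (0, 1):
        assert one_index(u_i, r_i) == u_i * r_i
```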

Note that since client S is permitted to perform \(t_S\) bits of leakage on \(x_{[n]}\), we have the guarantee that \(x_{[n]}\) has high min-entropy on average. Now, the experiment is reminiscent of min-entropy extraction from high min-entropy sources via masking with small-bias distributions. But the uniform distribution over the codewords of a fixed binary linear code \({\mathcal C} ^\perp \) is not a small-bias source (the projection on every dual codeword has full bias). So, we consider a family of codes \(({\mathcal C} _I,{\mathcal C} ^\perp _I)\), indexed by I, such that on average over I these codeword distributions have small bias. Such a distribution suffices in our setting, because the leakage is performed in an offline phase and the random linear code \({\mathcal C} _I\) is chosen only in the online phase. The matrices we choose are binary matrices in systematic form whose parity check matrices are uniformly chosen Toeplitz matrices. This, intuitively, is the basic argument to which all our proofs reduce.

Theorem 2 is obtained by sampling pairwise disjoint sets \(\{S_1,\cdots ,S_m\}\), where each \(S_i\) indexes a set of servers; one OT is then extracted by applying the protocol of Theorem 1 to each index set \(S_i\).

2 Preliminaries

Notations. We represent random variables by capital letters, for example X, and the values they take by small letters, for example \(\mathop {\mathrm {Pr}} [X=x]\). The set \(\{1,\cdots ,n\}\) is represented by [n], for \(n\in {\mathbb N} \). Given a vector \(v=(v_1,\cdots ,v_n)\) and \(T = \{i_1,\cdots ,i_{\left| {T}\right| }\} \subseteq [n]\), we represent \(\left( v_{i_1},\cdots ,v_{i_{\left| {T}\right| }}\right) \) by \(v_T\). Similarly, given a \(k\times n\) matrix G, we represent by \(G_T\) the sub-matrix of G formed by columns indexed by T. For brevity, we use \(G_i\) instead of \(G_{\{i\}}\), where \(i\in [n]\).

Probability Basics. The support of a probability distribution X, represented as \(\mathsf {Supp} (X)\), is the set of elements in the sample space which are assigned non-zero probability by X. A uniform distribution over a set S is represented by \(\mathbf {U} _S\). A probability distribution X over a universe U is a flat source if there exists a constant \(c\in (0,1]\) such that \(\mathop {\mathrm {Pr}} [X=x]\) is either 0 or c, for all \(x\in U\); further, we say that X is a flat source of size 1/c. Given a joint distribution (X, Y) over sample space \(U\times V\), the conditional distribution (X|y) represents the distribution over sample space U such that the probability of \(x\in U\) is \(\mathop {\mathrm {Pr}} [X=x|Y=y]\).

The statistical distance between two distributions X and Y over a finite sample space U is defined to be: \(\frac{1}{2}\sum _{u\in U}\left| {\mathop {\mathrm {Pr}} [X=u]-\mathop {\mathrm {Pr}} [Y=u]}\right| \).

Entropy Definitions. For a probability distribution X over a sample space U, we define the entropy of x as \(H_X(x) := \lg \frac{1}{\mathop {\mathrm {Pr}} [X=x]}\), for every \(x\in U\). The entropy of X, represented by \({\mathbf H} (X)\), is defined to be \({\mathbb {E}}[H_X(x)]\). The min-entropy of X, represented by \({\mathbf H} _{\infty }(X)\), is defined to be \(\min _{x\in \mathsf {Supp} (X)} H_X(x)\). If \({\mathbf H} _{\infty }(X)\ge n\), then X can be written as a convex linear combination of distributions, each of which is a flat source of size \(\ge 2^n\). The average min-entropy [10], represented by \(\widetilde{{\mathbf H}}_\infty (X|Y)\), is defined to be \(-\lg {\mathbb {E}}_{y\sim Y}\left[ 2^{-{\mathbf H} _{\infty }(X|y)}\right] \). The following lemma is useful for lower bounding the average min-entropy after leakage on a high min-entropy source.

Lemma 1

(Chain Rule [10]). If \({\mathbf H} _{\infty }(X)\ge n\) and L is an arbitrary \(\ell \)-bit leakage on X, then \(\widetilde{{\mathbf H}}_\infty (X|L)\ge n-\ell \).
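As a small sanity check of Lemma 1, the following Python snippet computes \(\widetilde{{\mathbf H}}_\infty (X|L)\) exactly for a uniform 4-bit source X and the 1-bit leakage L(X) = parity of X; the result is 3, matching the bound \(n-\ell \). The helper is illustrative and not part of the paper.

```python
from collections import defaultdict
from math import log2

def avg_min_entropy(joint):
    """Average min-entropy of X given Y, for a joint pmf given as {(x, y): prob}."""
    best_guess = defaultdict(float)   # for each y: max_x Pr[x, y] = Pr[y] * max_x Pr[x|y]
    for (x, y), p in joint.items():
        best_guess[y] = max(best_guess[y], p)
    return -log2(sum(best_guess.values()))

# X uniform over {0,1}^4, leakage L(X) = parity of X (a 1-bit leakage function).
n = 4
joint = {(x, bin(x).count("1") % 2): 1 / 2**n for x in range(2**n)}
print(avg_min_entropy(joint))   # 3.0, matching the chain rule bound n - 1
```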

2.1 Elementary Fourier Analysis

We define the character \(\chi _S(x) = (-1)^{\sum _{i\in S}x_i}\), where \(S\subseteq [n]\) and \(x\in {\{0,1\}} ^n\). The inner product of two functions \(f:{\{0,1\}} ^n\rightarrow {\mathbb R} \) and \(g:{\{0,1\}} ^n\rightarrow {\mathbb R} \) is defined by \({\left\langle {f,g}\right\rangle } := \sum _{x\in {\{0,1\}} ^n} f(x)g(x)\). Given a probability distribution M over the sample space \({\{0,1\}} ^n\), the function \(f=M\) represents the function \(f(x)=\mathop {\mathrm {Pr}} [M=x]\).

Definition 1

(Bias of a Distribution). Let \(f:{\{0,1\}} ^n \rightarrow {\mathbb R} \) be a probability function. The bias of f with respect to a subset \(S\subseteq [n]\) is defined to be:

$$\mathsf {Bias} _S(f) := \left| {\left\langle {f,\chi _S}\right\rangle }\right| = \left| \sum _{x\in {\{0,1\}} ^n} f(x)\chi _S(x)\right| .$$

Definition 2

(Small-bias Distribution Family [11]). Let \({\mathcal F} =\{F_1,\cdots , F_k\}\) be a family of distributions over sample space \({\{0,1\}} ^n\) such that for every \(\emptyset \ne S\subseteq [n]\), we have:

$$\mathop {{\mathbb {E}}}_{j\sim [k]}\left[ \mathsf {Bias} _S(F_j)^2\right] \le \delta ^2.$$

Then the distribution family \({\mathcal F}\) is called a \(\delta ^2\)-biased family.
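Assuming the reconstruction of Definitions 1 and 2 above (i.e., \(\mathsf {Bias} _S(f)=\left| \sum _x f(x)\chi _S(x)\right| \)), the following Python sketch evaluates the bias directly; it illustrates the point used later in Sect. 3 that for the uniform distribution over a single linear code the squared bias is always 0 or 1, which is why a family of codes is needed to obtain a small-bias family.

```python
from itertools import product

def chi(S, x):
    """Character chi_S(x) = (-1)^{sum_{i in S} x_i}; S is a set of 0-based indices."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def bias(dist, S):
    """Bias of a distribution dist = {x: prob} with respect to S (Definition 1)."""
    return abs(sum(p * chi(S, x) for x, p in dist.items()))

# Uniform distribution over the length-3 even-weight code {x : x1 + x2 + x3 = 0 mod 2}.
code = [x for x in product((0, 1), repeat=3) if sum(x) % 2 == 0]
uniform_code = {x: 1 / len(code) for x in code}

for S in [{0}, {0, 1}, {0, 1, 2}]:
    print(S, bias(uniform_code, S) ** 2)   # squared bias is 0, 0, and 1 respectively
```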

Lemma 2

(Min-entropy Extraction [1, 11, 18, 34]). Let \({\mathcal F} =\{F_1,\cdots ,F_\mu \}\) be a \(\delta ^2\)-biased family of distributions over the sample space \({\{0,1\}} ^n\). Let (M, L) be a joint distribution such that the marginal distribution M is over \({\{0,1\}} ^n\) and \(\widetilde{{\mathbf H}}_\infty (M|L)\ge m\). Then, the following holds:

$$\mathrm {SD}\left( {\left( F_I\oplus M,L,I\right) },{\left( U_{{\{0,1\}} ^n},L,I\right) }\right) \le \frac{\delta }{2}\left( \frac{2^n}{2^m}\right) ^{1/2},$$

where I is a uniform distribution over \([\mu ]\).
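For intuition on how this lemma is applied later, instantiating it with the \(2^{-k}\)-biased family of codeword distributions \(\{F_G\}_{G}\) from Sect. 2.4 (so \(\delta =2^{-k/2}\)), taking the sample space to be \({\{0,1\}} ^{n+1}\), and assuming a source with \(\widetilde{{\mathbf H}}_\infty (M|L)\ge n-t\) gives

$$\mathrm {SD}\left( {\left( F_I\oplus M,L,I\right) },{\left( U_{{\{0,1\}} ^{n+1}},L,I\right) }\right) \le \frac{2^{-k/2}}{2}\left( \frac{2^{n+1}}{2^{n-t}}\right) ^{1/2} = \frac{1}{2}\sqrt{\frac{2}{2^{k-t}}},$$

which is exactly the bound used in the proof of Lemma 4 below.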

2.2 Functionalities

We introduce some useful functionalities in this section.

Oblivious Transfer. A 2-choose-1 bit Oblivious Transfer (referred to as OT) is a two party functionality which takes input \((s_0,s_1)\in {\{0,1\}} ^2\) from the sender and input \(c\in {\{0,1\}} \) from the receiver and outputs \(s_c\) to the receiver.

Random Oblivious Transfer. A random 2-choose-1 bit Oblivious Transfer (referred to as ROT) is an input-less two party functionality which samples uniformly random bits \(s_0,s_1,c\) and outputs \((s_0,s_1)\) to the sender and \((c,s_c)\) to the receiver. The joint distribution of sender-receiver outputs is called an ROT-correlation.

Oblivious Linear-Function Evaluation. Let \(({\mathbb F},+,\cdot )\) be an arbitrary field. An Oblivious Linear-function Evaluation over \({\mathbb F}\) is a two party functionality which takes inputs \((u,v)\in {\mathbb F} ^2\) from the sender and \(x\in {\mathbb F} \) from the receiver and outputs \(u\cdot x+v\) to the receiver. This functionality is referred to as OLE\(({\mathbb F})\). A random oblivious linear-function evaluation (ROLE) can be defined analogously to ROT.

The special case when \({\mathbb F} ={\mathbb G} {\mathbb F} (2)\) is simply referred to as OLE and is equivalent to OT.

Random Inner Product Correlation. This is an input-less two party functionality which samples uniformly random \(x_{[n]},y_{[n]}\in {\{0,1\}} ^n\) and \(a\in {\{0,1\}} \), and sets \( b = a + {\left\langle {x_{[n]},y_{[n]}}\right\rangle } \). It outputs \((x_{[n]},a)\) to party A and \((y_{[n]},b)\) to party B. Note that for \(n=1\), this is equivalent to the random oblivious transfer correlation and to random oblivious linear function evaluation.
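A minimal Python sketch of sampling this correlation (the names are illustrative):

```python
import secrets

def random_inner_product_correlation(n):
    """Sample the random inner product correlation over n-bit vectors.

    Party A gets (x, a); party B gets (y, b) with b = a + <x, y> over GF(2).
    """
    x = [secrets.randbelow(2) for _ in range(n)]
    y = [secrets.randbelow(2) for _ in range(n)]
    a = secrets.randbelow(2)
    b = a ^ (sum(xi & yi for xi, yi in zip(x, y)) % 2)
    return (x, a), (y, b)

# For n = 1 this is a random OLE over GF(2) (equivalently, a random OT):
# B's output b = x*y + a is a linear function of B's input y with coefficients held by A.
(x, a), (y, b) = random_inner_product_correlation(1)
assert b == (x[0] & y[0]) ^ a
```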

2.3 Combiners and Extractors

In this section, we define oblivious transfer combiners and extractors.

Definition 3

( \((n,p,t_S,t_R,\varepsilon )\) (Single Use) OT-Combiner). An \((n,p,t_S,t_R,\varepsilon )\) (single use) OT-Combiner is an interactive protocol in the clients-servers setting. There are two clients S and R; and n servers. Each server implements one instance of oblivious transfer on inputs from S and R. We consider a semi-honest adversary who can either corrupt the client S and \(t_S\) servers or client R and \(t_R\) servers. The protocol implements p independent copies of secure oblivious transfer instances with correctness and simulation error at most \(\varepsilon \).

The correctness condition for the protocol says that the receiver's output is correct in all p instances of OT with probability at least \((1-\varepsilon )\).

The privacy requirement says that the adversary should not learn more than it should. Let \((s^{{\left( i\right) }} _0, s^{{\left( i\right) }} _1)\) and \(c^{{\left( i\right) }} \) be the inputs of the sender and the receiver, respectively, in the \(i^{th}\) copy of OT produced. Then a corrupt sender (resp., corrupt receiver) cannot output \(c^{{\left( i\right) }} \) (resp., \(s^{{\left( i\right) }} _{1-c^{{\left( i\right) }}}\)) with probability more than \(\frac{1}{2}+\varepsilon \) for any instance of OT produced.

Leakage Model and Correlation Extractors. Here we begin by formally describing our leakage model for ROLE correlations, followed by defining correlation extractors for OLE. Recall that OT and OLE are just local renamings of each other. Our leakage model is as follows:

  1.

    n-Random OLE Correlation Generation phase: For \(i\in [n]\), the sender S gets random \((a_i, b_i) \in {\{0,1\}} ^2\) and receiver R gets \((x_i, z_i)\), where \(x_i \in {\{0,1\}} \) is chosen uniformly at random and \(z_i = a_ix_i + b_i\).

  2.

    Corruption and leakage phase. A semi-honest adversary either corrupts the sender, sending a leakage function \(L : {\{0,1\}} ^n \rightarrow {\{0,1\}} ^{t_S}\) and receiving \(L(\{x_i\}_{i\in [n]})\), or corrupts the receiver, sending a leakage function \(L : {\{0,1\}} ^n \rightarrow {\{0,1\}} ^{t_R}\) and receiving \(L(\{a_i\}_{i\in [n]})\).

    Note that, without loss of generality, any leakage on the sender's data (resp., the receiver's data) can be seen as leakage on \(\{a_i\}\) (resp., \(\{x_i\}\)).

Let (X, Y) be the random OT correlation. We denote the \((t_S, t_R)\)-leaky version of \((X,Y)^n\) described above by \(((X,Y)^n)^{[t_S,t_R]}\).

Definition 4

( \((n,p, t_S,t_R,\varepsilon )\) OT-Extractor). An \((n,p,t_S,t_R,\varepsilon )\) OT-Extractor is an interactive protocol between two parties S and R in the \(((X,Y)^n)^{[t_S,t_R]}\) hybrid described above. The protocol implements p independent copies of secure oblivious transfer instances with simulation error \(\varepsilon \).

The correctness and privacy requirements are same as those defined above for \((n,p,t_S,t_R,\varepsilon )\) (Single Use) OT-Combiner.

Note that in our setting, in the \((X,Y)^n\) hybrid, the parties only get one sample from this correlation, unlike the typical setting where parties can invoke the trusted functionality of the hybrid multiple times. The maximum fractional leakage resilience is defined by the ordered tuple \((t_S/n,t_R/n)\), and the production rate is defined by \(p/n\).

Remark: An \((n,p,t_S,t_R,\varepsilon )\) OT extractor is also an \((n,p,t_S,t_R,\varepsilon )\) OT combiner.

Noisy Leakage Model. The leakage model described above is referred to as the “bounded leakage” model since we restrict the number of bits output by the leakage function. But this model is sometimes too restrictive and does not capture many side-channel attacks, which are the main cause of leakage in real-world applications. A more realistic assumption is that the leakage is sufficiently noisy; it has been observed experimentally that real-world physical leakage is inherently noisy. There have been many works trying to model noisy leakage and present solutions in this setting [9, 12, 13, 15, 36]. At a high level, the noisy nature of a leakage function f is captured by assuming that an observation of f(x) only induces a bounded bias in the probability distribution of x. More formally, f is said to be \(\delta \)-noisy if

$$\delta = \mathrm {SD}\left( {(X)},{(X|f(X))}\right) .$$

Note that if \({\mathbf H} _{\infty }(X) \ge n\) then for any \(k < n\), we can choose appropriate \(\delta \), such that \(\widetilde{{\mathbf H}}_\infty (X | f(X)) \ge k\), where f is a \(\delta \)-noisy channel.

We emphasize that all our protocols only rely on the fact that the initial correlation given to any party has high average min-entropy (\(\widetilde{{\mathbf H}}_\infty \)) after the leakage. Hence, all our protocols directly work even in the general setting of noisy leakage.

2.4 Distribution over Matrices

A \(k\times n\) matrix M with \({\{0,1\}}\) entries is in systematic form if \(M = [ I_{k\times k} \Vert P]\), where \(I_{k\times k}\) is the identity matrix of dimension k and P is the parity check matrix of dimension \(k\times (n-k)\). The matrix P is a Toeplitz matrix if \(P_{i,j} = P_{i-1,j-1}\), for all \(i\in (1,k]\) and \(j\in (1,n-k]\); so, a Toeplitz matrix is uniquely defined by its first row and first column. We shall consider the uniform distribution over \(k\times n\) binary matrices in systematic form such that their parity check matrices are uniformly chosen Toeplitz matrices. A salient feature of the family of such matrices is proved in Lemma 3.

Let \({\mathbb T} _{(k,n)} \) be the uniform distribution over matrices M of the form \(M \equiv \left[ I_{k\times k} \big \vert P_{k\times (n-k)}\right] \), where P is a binary Toeplitz matrix of dimension \(k\times (n-k)\).

Similarly, define \({\mathbb T} _{\bot ,(k,n)} \) to be the uniform distribution over matrices M of the form \(M \equiv \left[ P_{k\times (n-k)} \big \vert I_{k\times k}\right] \), where P is a binary Toeplitz matrix of dimension \(k\times (n-k)\).

Note that there exists a bijection between the matrices in \({\mathbb T} _{(k,n)}\) and those in \({\mathbb T} _{\bot ,(n-k,n)}\), given by the map which sends each matrix to its dual.

For a given \(G \in {\mathbb T} _{(k,n)} \), the distribution \(F_G\) corresponds to a uniform distribution over the codewords generated by G. We have the following lemma, which will be used to prove the main unpredictability lemma in the next section.
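The following Python sketch samples a matrix from \({\mathbb T} _{(k,n)}\) and evaluates the squared bias of \(F_G\) for a fixed index set T exactly (it is 1 precisely when the columns indexed by T sum to zero); the Monte Carlo estimate of its expectation illustrates the bound stated in Lemma 3 below. The code is a direct transcription of the definitions and is only illustrative.

```python
import secrets

def sample_toeplitz_systematic(k, n):
    """Sample G = [I_k | P] where P is a uniformly random k x (n-k) Toeplitz matrix."""
    w = n - k
    # A k x w Toeplitz matrix has one independent entry per diagonal: k + w - 1 bits.
    diag = [secrets.randbelow(2) for _ in range(k + w - 1)]
    P = [[diag[i - j + w - 1] for j in range(w)] for i in range(k)]
    identity = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
    return [identity[i] + P[i] for i in range(k)]

def squared_bias(G, T):
    """Bias_T(F_G)^2 is 1 iff the columns of G indexed by T sum to zero over GF(2)."""
    k = len(G)
    return 1 if all(sum(G[i][j] for j in T) % 2 == 0 for i in range(k)) else 0

# Monte Carlo estimate of E_G[Bias_T(F_G)^2] for a fixed T; Lemma 3 bounds it by 2^-k.
k, n, T, trials = 4, 10, [2, 5, 7], 2000
estimate = sum(squared_bias(sample_toeplitz_systematic(k, n), T) for _ in range(trials)) / trials
print(estimate, "vs bound", 2 ** -k)
```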

Lemma 3

For the distribution of matrices \({\mathbb T} _{(k,n)} \), the following holds: for any \(\emptyset \ne T \subseteq [n]\),

$$\mathop {{\mathbb {E}}}_{G\sim {\mathbb T} _{(k,n)}}\left[ \mathsf {Bias} _T(F_G)^2\right] \le 2^{-k}.$$

Proof

Since \(F_G\) is the uniform distribution over the codewords of a linear code, for any G, \(\mathsf {Bias} _T(F_G)^2\) is either 0 or 1. Moreover, \(\mathsf {Bias} _T(F_G)^2 = 1\) if and only if \(\sum _{i \in T} G_i = 0^k\). If \(T\subseteq [k]\), then \(\sum _{i\in T} G_i\) is a fixed non-zero vector (a sum of distinct identity columns), so \(\mathsf {Bias} _T(F_G)^2=0\). Hence, it suffices to show the following: for any fixed column \(c \in {\{0,1\}} ^k\) and any set \(T \subseteq [n]\) containing at least one index \(i>k\), \(\mathop {\mathrm {Pr}} [\sum _{i\in T} G_i=c] \le 2^{-k}\). We prove this using a sequence of observations.

First, note that for a single index \(i>k\), the event \(G_i=c\) happens with probability \(2^{-k}\).

Next, we claim that \(G_i+G_j=c\), for \(i>j>k\), happens with probability \(2^{-k}\). This is so because \(G_{i,k}+G_{j,k}=c_k\) happens with probability 1/2. Fixing the values of \(G_{i,k}\) and \(G_{j,k}\), the probability that \(G_{i,k-1}+G_{j,k-1}=c_{k-1}\) is 1/2, because the random variable \(G_{j,k-1}\) is not yet fixed (columns \(\{k+1, \cdots , n\}\) form a Toeplitz matrix). Extending this argument, we get that for any non-empty \(T' \subseteq \{k+1, \cdots , n\}\), \(\mathop {\mathrm {Pr}} [\sum _{i\in T'} G_i=c] \le 2^{-k}\).

To prove the full claim, note that \(\mathop {\mathrm {Pr}} [\sum _{i\in T} G_i=c] = \mathop {\mathrm {Pr}} [\sum _{i\in T:i>k}G_i = c + \sum _{i\in T:i\le k}G_i] \le 2^{-k}\), using the above conclusion (the columns \(G_i\) with \(i\le k\) are fixed identity columns).

3 Unpredictability Lemma

In this section we present the main unpredictability lemma.

Lemma 4

(Unpredictability Lemma). Let \({\mathbb G} \in \left\{ {\mathbb T} _{(k,n+1)}, {\mathbb T} _{\bot ,(k,n+1)} \right\} \). Consider the following game between an honest challenger \({\mathcal H}\) and an adversary \({\mathcal A}\):

Figure: the security game between the challenger \({\mathcal H}\) and the adversary \({\mathcal A}\).

The adversary \({\mathcal A}\) wins the game if \(\tilde{y}=y_0\). For any \({\mathcal A}\), the advantage of the adversary, \(\mathsf {Adv} ({\mathcal A}) = \mathop {\mathrm {Pr}} (y_0 = \tilde{y}) - 1/2\), satisfies \(\mathsf {Adv} ({\mathcal A}) \le \frac{1}{2}\sqrt{\frac{2}{2^{k-t}}}\).
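Since the figure describing the game is not reproduced above, the following Python sketch records our reading of it, inferred from the proof below and from the reductions in Sects. 4 and 5: the challenger samples an n-bit source, answers a single t-bit leakage query, samples \(G\leftarrow {\mathbb G} \) together with a random codeword \(y=\lambda G\), reveals the last n coordinates of y masked by the source, and the adversary must guess \(y_0\). The exact message order and interfaces are assumptions.

```python
import secrets

def sample_toeplitz_systematic(k, n):
    """G = [I_k | P], with P a uniformly random k x (n-k) Toeplitz matrix (Sect. 2.4)."""
    w = n - k
    diag = [secrets.randbelow(2) for _ in range(k + w - 1)]
    P = [[diag[i - j + w - 1] for j in range(w)] for i in range(k)]
    return [[1 if i == j else 0 for j in range(k)] + P[i] for i in range(k)]

def unpredictability_game(leak, guess, k, n, t):
    """One run of (our reading of) the Lemma 4 game; returns True iff the adversary wins.

    leak(x)                   -- the adversary's leakage on the n-bit source x (<= t bits)
    guess(G, leakage, masked) -- the adversary's guess for y_0
    """
    x = [secrets.randbelow(2) for _ in range(n)]                 # the min-entropy source
    leakage = leak(x)
    assert len(leakage) <= t
    G = sample_toeplitz_systematic(k, n + 1)                     # G drawn from T_(k, n+1)
    lam = [secrets.randbelow(2) for _ in range(k)]
    y = [sum(lam[i] * G[i][j] for i in range(k)) % 2 for j in range(n + 1)]   # y = lam*G
    masked = [y[j + 1] ^ x[j] for j in range(n)]                 # coordinates 1..n, masked
    return guess(G, leakage, masked) == y[0]

# Example: leak the first t bits of x and then guess at random; this wins with
# probability ~1/2, and Lemma 4 bounds every strategy by 1/2 + (1/2)*sqrt(2/2^(k-t)).
k, n, t, runs = 8, 16, 4, 2000
wins = sum(unpredictability_game(lambda x: tuple(x[:t]),
                                 lambda G, leakage, masked: secrets.randbelow(2),
                                 k, n, t)
           for _ in range(runs))
print(wins / runs)
```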

Proof

Let \({\mathbb G} \) be the distribution \({\mathbb T} _{(k,n+1)} \). The proof for the other case will work similarly.

Given a \(G \in {\mathbb G} \), the distribution \(F_G\) corresponds to the uniform distribution over the codewords generated by G. Note that, over the choice of G, these distributions form a \(\delta ^2=2^{-k}\)-biased family (by Lemma 3).

By Lemma 1, \(\widetilde{{\mathbf H}}_\infty (M_{[n]} | {\mathcal L} (M_{[n]}) ) \ge n-t\). Let \(M = (0, M_{[n]})\); plugging these into Lemma 2, we get

$$ \mathrm {SD}\left( {\left( F_G\oplus M,L,G\right) },{\left( U_{{\{0,1\}} ^{n+1}},L,G\right) }\right) \le \frac{1}{2}\sqrt{\frac{2^{n+1}}{2^{k+n-t}}} $$

The lemma follows by noting that \(\mathsf {Adv} ({\mathcal A}) \le \mathrm {SD}\left( {\left( F_G\oplus M,L,G\right) },{\left( U_{{\{0,1\}} ^{n+1}},L,G\right) }\right) \).

All our security proofs will directly reduce to this unpredictability lemma, i.e. Lemma 4.

4 Oblivious Transfer Extractor

4.1 Extracting One Oblivious Transfer

In this section, we shall prove Theorem 1 by presenting our \((n,t_S,t_R,\varepsilon )\) OT extractor which extracts one copy of secure OT. For ease of presentation, we provide our construction in the random oblivious linear evaluation (ROLE) correlation hybrid, and also produce one secure copy of oblivious linear evaluation. Recall that a ROLE correlation provides \((a,b)\) to the sender and \((x,z=ax\oplus b)\) to the receiver, where a, b and x are uniformly random bits. The security requirement insists that the sender cannot predict x and the receiver cannot predict a. Note that \(s_c = (s_0\oplus s_1)c\oplus s_0\), so an oblivious transfer with sender inputs \((s_0,s_1)\) and receiver choice bit c is identical to an OLE with \(a=s_0\oplus s_1\), \(b=s_0\) and \(x=c\); consequently, oblivious transfer and OLE are equivalent, and it suffices to construct an OLE extractor in the ROLE\(^n\) hybrid.

Fig. 1. Round optimal correlation extractor protocol which extracts one copy of Oblivious Linear Function Evaluation from n copies of Random Oblivious Linear Function Evaluation.

The construction provided here is similar to the construction provided in [25], but we deal with general leakage, instead of the restricted leakage of physical bits in the combiner setting, using more sophisticated analysis tools. We also achieve lower communication complexity: in particular, we improve the communication complexity from \(\varTheta (n^2)\) in [25] to \(\varTheta (n)\) in the current work. When analyzed appropriately for the combiner setting, our current protocol achieves simulation error identical to that paper (while reducing the communication complexity from quadratic to linear).

Note that after the correlation generation step, the protocol consists of only two messages, i.e., client R sends one message (by combining steps 1 and 2.c) and client S replies with one message (step 2.d).

No Corruption Case. We will first prove the correctness of the protocol presented in Fig. 1 for the case when all clients and servers are honest and there is no leakage.

The construction does not output \(\mathsf {abort} \) with probability \(1-2^{-(n+1-k)}\), because the algorithm aborts if and only if the first row of the parity check matrix of G is all 0s. Conditioned on not aborting, we show that the protocol is perfectly correct; the following lemma proves correctness.

Lemma 5

In the protocol in Fig. 1 the client R outputs \(y = s_0c \oplus s_1\).

Proof

We first show that \(t_i = u_i r_i \oplus v_i\).

$$\begin{aligned} t_i&=\beta _i \oplus \alpha _ir_i \oplus z_i = (a_im_i \oplus b_i \oplus v_i) \oplus (a_i r_i \oplus u_i r_i) \oplus z_i \\&= a_i x_i \oplus a_i r_i \oplus b_i \oplus v_i \oplus a_i r_i \oplus u_i r_i \oplus a_ix_i \oplus b_i = u_ir_i \oplus v_i \end{aligned}$$

This shows that \(z = \oplus _{i\in [n]} t_i = u_0 r_0 \oplus v_0\). This follows from \(\mathop {\oplus }_{i=0}^n u_i \cdot r_i = 0\) and \(\oplus _{i=0}^n v_i = 0\). Now for y we have the following:

$$\begin{aligned} y&= \beta \oplus \alpha c \oplus z = (u_0m \oplus v_0 \oplus s_1) \oplus (u_0c \oplus s_0c) \oplus z \\&= u_0r_0 \oplus u_0c \oplus v_0 \oplus s_1 \oplus u_0c \oplus s_0c \oplus u_0 r_0 \oplus v_0 = s_0c \oplus s_1. \end{aligned}$$

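Since Fig. 1 itself is not reproduced above, the following Python simulation reconstructs the extraction phase from the equations in this proof; the exact message format (who samples G, how \(v_0,\cdots ,v_n\) are chosen, and the final messages \(m,\alpha ,\beta \)) is our assumption, chosen only to be consistent with Lemma 5, and the code checks correctness, not security.

```python
import secrets

def rand_bits(m):
    return [secrets.randbelow(2) for _ in range(m)]

def toeplitz_systematic(k, length):
    """G = [I_k | P] with P a uniformly random k x (length-k) Toeplitz matrix (Sect. 2.4)."""
    w = length - k
    diag = rand_bits(k + w - 1)
    return [[1 if i == j else 0 for j in range(k)] +
            [diag[i - j + w - 1] for j in range(w)] for i in range(k)]

def dual_of_systematic(G, length):
    """For G = [I_k | P], the dual code is generated by H = [P^T | I_{length-k}]."""
    k = len(G)
    w = length - k
    P_T = [[G[i][k + j] for i in range(k)] for j in range(w)]
    return [P_T[j] + [1 if j == jj else 0 for jj in range(w)] for j in range(w)]

def random_codeword(M, length):
    lam = rand_bits(len(M))
    return [sum(lam[i] * M[i][j] for i in range(len(M))) % 2 for j in range(length)]

def extract_one_ole(n, k, s0, s1, c):
    """One run of the (assumed) extraction phase; returns R's output y = s0*c + s1."""
    # Setup: n ROLE correlations.
    a, b, x = rand_bits(n), rand_bits(n), rand_bits(n)
    z = [a[i] * x[i] ^ b[i] for i in range(n)]
    # Codewords of length n+1: u in C (held by S), r in the dual code (held by R).
    G = toeplitz_systematic(k, n + 1)
    u = random_codeword(G, n + 1)
    r = random_codeword(dual_of_systematic(G, n + 1), n + 1)
    # S's additive sharing of 0.
    v = rand_bits(n + 1)
    v[0] = sum(v[1:]) % 2
    # R -> S: masked codeword coordinates, plus m = r_0 + c.
    m_vec = [x[i] ^ r[i + 1] for i in range(n)]
    m = r[0] ^ c
    # S -> R: per-index replies, plus alpha = u_0 + s0 and beta = u_0*m + v_0 + s1.
    alpha_vec = [a[i] ^ u[i + 1] for i in range(n)]
    beta_vec = [a[i] * m_vec[i] ^ b[i] ^ v[i + 1] for i in range(n)]
    alpha, beta = u[0] ^ s0, u[0] * m ^ v[0] ^ s1
    # R's local computation.
    t = [beta_vec[i] ^ alpha_vec[i] * r[i + 1] ^ z[i] for i in range(n)]
    z_share = sum(t) % 2                       # equals u_0*r_0 + v_0
    return beta ^ alpha * c ^ z_share

for s0 in (0, 1):
    for s1 in (0, 1):
        for c in (0, 1):
            assert extract_one_ole(n=12, k=7, s0=s0, s1=s1, c=c) == s0 * c ^ s1
```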
Sender Privacy and Receiver Privacy. In order to give a modular analysis, we consider a simpler 4-round protocol which is equivalent to the protocol presented in Fig. 1. In the simpler protocol, the first two rounds correspond to ROLE extraction: the receiver sends the messages \(\{m_i\}_{i\in [n]}\), receives \(\{(\alpha _i, \beta _i)\}_{i\in [n]}\), and computes the ROLE share \(z = u_0r_0 \oplus v_0\). In the following, we will refer to this as the ROLE extraction phase. In the next two rounds, this ROLE is used to compute the OLE on inputs \(s_0, s_1, c\) as follows: the receiver sends the message m, gets back \(\alpha , \beta \), and computes y. Note that since we only consider semi-honest adversaries and leakage only occurs before the start of the protocol, these two protocols are equivalent in correctness and security guarantees.

Below, in order to prove sender and receiver privacy, we analyze this simpler protocol. For the security of both sides, it is sufficient to prove that the extracted ROLE is secure at the end of the first phase.

Receiver Privacy. In order to prove receiver privacy, we need to show that the choice bit c is hidden from the semi-honest sender who can obtain \(t_S\) bits of leakage. We note that it suffices to show that at the end of the ROLE extraction phase (described above), the choice bit \(r_0\) is hidden.

Let L denote the random variable for leakage obtained by the semi-honest sender. We will denote the random variable for the choice bit vector \(x_{[n]}\) for the receiver in the correlation generation phase by \(X_{[n]}\). Note that \(X_{[n]}\) is identical to uniform distribution over \({\{0,1\}} ^n\). Note that L has at most \(t_S\) bits of leakage on \(X_{[n]}\).

The view of client S at the end of the random correlation extraction phase is:

$$\vartheta =(a_{[n]}, b_{[n]},G, (u_0, \cdots , u_n), (v_0, \cdots , v_n), m_{[n]},L= \ell )$$

Below we show that for any semi-honest client S, we have \(\mathop {\mathrm {Pr}} (S(\vartheta ) = r_0)\) is at most \(1/2 + 2^{-g/4-1}\).

Note that \(\mathop {\mathrm {Pr}} (S(\vartheta ) = r_0) = \mathop {\mathrm {Pr}} (S(H, m_{[n]}, L) = r_0)\), where H is the generator matrix for \({\mathcal C} ^\perp \). Recall that \(H \in {\mathbb T} _{\bot ,(n+1-k,n+1)} \), where \(k = t_R + g/2\). In Fig. 1, the client R picks a random codeword \((r_0, \cdots , r_n) \in {\mathcal C} ^\perp \). Alternatively, this can be done by picking a uniformly random \(w\in {\{0,1\}} ^{n+1-k}\) and setting \((r_0, \cdots , r_n) = w\cdot H\). Note that \(m_{[n]} = (w\cdot H)_{[n]} \oplus x_{[n]}\) and \(r_0 = \langle H_0, w \rangle \).

Since the sender can leak \(t_S\) bits on \(x_{[n]}\), we have \(\widetilde{{\mathbf H}}_\infty (X_{[n]}|L)\ge m=(n-t_S)\). By Lemma 4, the advantage in predicting \(\langle H_0, w \rangle \) is at most \(2^{-g/4-1}\).

Sender Privacy. In order to prove sender privacy for Fig. 1, we need to show that the bit \(s_0\) is hidden from the receiver after the protocol. Note that it suffices to show that at the end of the ROLE extraction phase (for the simpler protocol described above) bit \(u_0\) is hidden.

Let L denote the random variable for leakage on vector \(a_{[n]}\) obtained by the semi-honest adversary who corrupts the receiver after the random correlation generation phase. We will denote the random variable for the bit vector \(a_{[n]}\) for the sender in the correlation generation phase by \(A_{[n]}\). Note that \(A_{[n]}\) is identical to uniform distribution over \({\{0,1\}} ^n\) and L has at most \(t_R\) bits of leakage on \(A_{[n]}\). So, we get \(\widetilde{{\mathbf H}}_\infty (A_{[n]}|L)\ge m=n-t_R\).

The view of client R at the end of the random correlation extraction phase is:

$$\vartheta =(x_{[n]}, z_{[n]},G, (r_0, \cdots , r_n), m_{[n]}, \alpha _{[n]}, \beta _{[n]}, L= \ell )$$
Fig. 2. Simulator for Sender Privacy. The distribution \({\mathbb G}\) is the uniform distribution over \(k\times (n+1)\) binary matrices in systematic form whose parity check matrices are uniform Toeplitz matrices.

Let \(U_0\) denote the random variable for \(u_0\). We are interested in the conditional distribution \((U_0|\vartheta )\). Below we will show that for any semi-honest client R, \(\mathop {\mathrm {Pr}} (R(\vartheta ) = u_0)\) is at most \(1/2 + 2^{-(g/4)}\).

We show this via a reduction to Lemma 4, given in Fig. 2. Given any adversary \({\mathcal A}\) who can predict \(u_0\), we convert it into an adversary \({\mathcal A} '\) against the honest challenger \({\mathcal H}\) of Lemma 4 with identical advantage. It is easy to see that this reduction is perfect. Note that the only difference between the simulator and the actual protocol is that the generator matrix G is generated by the honest party \({\mathcal H}\) instead of being obtained from \({\mathcal A}\). This does not cause any issues, because we are only dealing with semi-honest adversaries. At the end of the random correlation extraction phase, the advantage in predicting \(U_0\) is at most \(2^{-(g/4)}\).

Note that our simulation works even for arbitrary choice of \(x_{[n]}\) and \(m_{[n]}\). In particular, it works when these vectors are chosen uniformly at random.

4.2 Trading Off Simulation Error with Production Rate

In this section we use sub-sampling techniques to trade off simulation error for an improved production rate. The main idea is to sample small disjoint subsets of the correlations and, subsequently, run the protocol of Fig. 1 on those subsets independently. This increases the simulation error (due to the smaller number of correlations used to produce each fresh OT, i.e., a smaller effective value of n), but yields higher production rates.

In our case, we use the trivial sub-sampling technique of picking indices at random with suitable probability; if a sample repeats, we discard it and re-sample. This technique yields distinct samples and has properties identical to those of the naïve subsampling technique (see [38]). The more sophisticated techniques of [38] are also relevant to our setting, but they do not yield any reduction in the increase of the simulation error; they are useful only for reducing the communication complexity of the protocols.
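A minimal Python sketch of this trivial sub-sampling step (the block sizes and constants are illustrative, not the ones guaranteed by [38]):

```python
import secrets

def disjoint_random_subsets(n, m, size):
    """Pick m pairwise-disjoint subsets of [n], each of the given size, by rejection.

    Indices are drawn uniformly; a repeated index (within or across subsets) is
    discarded and re-sampled, as described in the text.
    """
    used = set()
    subsets = []
    for _ in range(m):
        S = set()
        while len(S) < size:
            i = secrets.randbelow(n)
            if i not in used:
                used.add(i)
                S.add(i)
        subsets.append(sorted(S))
    return subsets

# Example parameters in the spirit of Theorem 2: rho ~ (log2 n)^2 and m = n / rho blocks.
n = 4096
rho = 144            # ~ (log2 4096)^2
m = n // rho
blocks = disjoint_random_subsets(n, m, size=rho // 2)   # block size lambda*rho, lambda = 1/2
assert len(set().union(*blocks)) == m * (rho // 2)      # the blocks are pairwise disjoint
```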

We only work in the setting where \(g = n- (t_S+t_R)\) is at least cn, for some constant \(c\le 1\). In general, c could be a function of n, but we forgo those cases. The main technical lemma is the following:

Lemma 6

(Sub-sampling [38]). Let \((A_{[n]},L)\) be a joint distribution such that \(\widetilde{{\mathbf H}}_\infty (A_{[n]}|L)\ge \mu n\) for some constant \(\mu \in (0,1)\). For every constant \(\varepsilon \in (0,\mu )\) and \(\rho =\omega (\log n)\), there exists an efficient algorithm which outputs \((S_1,\cdots ,S_m)\in \left( 2^{[n]}\right) ^m\) such that \(m=n/\rho \) and, with probability \(1-\mathsf {negl} (n)\), the following holds:

  1.

    Large and Distinct: There exists a constant \(\lambda \in (0,1)\) such that \(\left| {S_i}\right| =\lambda \rho \). We have \(S_i\cap S_j =\emptyset \), for all \(i,j\in [m]\) and \(i\ne j\).

  2.

    High Entropy: \(\widetilde{{\mathbf H}}_\infty (A_{S_{i+1}}|A_{S_{[i]}},L) \ge (\mu -\varepsilon )\left| {S_{i+1}}\right| \), where \(A_{S_{[i]}}\) denotes the bits of \(A_{[n]}\) indexed by \(S_1\cup \cdots \cup S_i\).

Obtaining the Result of Theorem 2. We obtain this theorem as a direct application of Lemma 6. Recall that we work in the setting where \(g = n - (t_S + t_R) \ge cn\) for some constant \(c \in (0,1]\). We apply Lemma 6 to obtain the disjoint sets \(S_1, \cdots , S_m\), for \(m = n/\rho \) and \(\rho = \omega (\log n)\). Next, we apply the protocol in Fig. 1 to each of the sets independently with the following choice of parameters: \(n' = \left| {S_i}\right| \), \(t'_S = (\frac{t_S}{n} + \varepsilon )\left| {S_i}\right| \), and \(t'_R = (\frac{t_R}{n} + \varepsilon )\left| {S_i}\right| \). Note that the new gap is \(g' = (\frac{g}{n} - 2\varepsilon )\left| {S_i}\right| \). The simulation error of any OT produced is therefore bounded by \(2^{-\varTheta (g')} = \mathsf {negl} (n)\).

We observe that the approach of subsampling to obtain “disjoint subsets” while preserving min-entropy is unlikely to yield constant production rate extractors.

5 Inner Product Correlation

In this section we prove Theorem 4. Our protocol is provided in Fig. 3.

Fig. 3. Random oblivious function evaluation extractor from one inner product correlation over n bits.

When both parties are honest, we only need to prove the correctness of the protocol, which follows trivially.

Sender Corrupt. Suppose a semi-honest client A can leak t bits of information about \((y_{[n]},b)\). In this case, we have \(\widetilde{{\mathbf H}}_\infty (Y_{[n]}|L)\ge m = n-t\). For security, we need to prove the hiding of the bit \(r_0\) given \(r_{[n]}\oplus y_{[n]}\), where \(r_{[n]}\) is a uniformly chosen codeword from the image of “H with its first column punctured.” Now, we can directly invoke Lemma 4 and get that the distribution \((R_0|\vartheta )\) is \(2^{-(g/2+1)}\)-close to the uniform distribution over \({\{0,1\}}\), where \(\vartheta \) is the view of client A at the end of the protocol and \(g=n/2-t\).

Receiver Corrupt. For this case, we construct a reduction similar to the one provided in Fig. 2. Again, in this case we assume that client A sends the matrix G instead of client B (which is acceptable because the adversaries are semi-honest). Suppose there exists an adversary \({\mathcal A}\) which can distinguish \(U_0\) from a uniformly random bit with a certain advantage. We construct an adversary \({\mathcal A} '\) which uses \({\mathcal A}\) to break the unpredictability experiment of Lemma 4 with identical advantage, using a simulation similar to Fig. 2.

Note that, as before, this will be a perfect simulation of the view of \({\mathcal A}\) because the bit \(v_0\) is uniformly random in the actual protocol. Thus, if \({\mathcal A}\) can predict \(u_0 = \lambda G_0\), then the adversary \({\mathcal A} '\) can also predict \(\lambda G_0\) with identical advantage. By Lemma 4, the distribution \((U_0|\vartheta )\) is at most \(2^{-(g/2+1)}\)-far from the uniform distribution over \({\{0,1\}}\).