1 Introduction

Bellare and Rogaway [5] designed the RSA-OAEP encryption scheme as a drop-in replacement for RSA PKCS #1 v1.5 [55] with provable security. In particular, it follows the same paradigm as RSA PKCS #1 v1.5 in that it encrypts a message of less than k bits to a k-bit ciphertext (where k is the modulus length) by first applying a fast, randomized, and invertible “padding transform” to the message before applying RSA. In the case of RSA-OAEP, the underlying padding transform (which is itself called ‘OAEP’) embeds a message m and random coins r as \(s\Vert (H(s) {\,\oplus \,}r)\) where ‘\(\Vert \)’ denotes concatenation, \(s = (m \Vert 0^{k_1}) {\,\oplus \,}G(r)\) for some parameter \(k_1\), and G and H are hash functions (see Fig. 2 on p. 12). In contrast, PKCS #1 v1.5 essentially just concatenates m with r.
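For concreteness, the embedding just described can be sketched in a few lines of Python. The SHA-256-based expansions standing in for G and H and the byte-level parameters below are purely illustrative assumptions, not the functions analyzed in this paper (which treats G and H abstractly):

```python
import hashlib

K1 = 16   # redundancy length k_1 (in bytes here; bits in the paper)
RHO = 16  # length of the random coins r
MU = 32   # message length; s has MU + K1 bytes

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _expand(tag, data, outlen):
    # Illustrative stand-in for G and H: expand SHA-256 to outlen bytes.
    out, ctr = b"", 0
    while len(out) < outlen:
        out += hashlib.sha256(tag + ctr.to_bytes(4, "big") + data).digest()
        ctr += 1
    return out[:outlen]

def G(r):  # G : {0,1}^rho -> {0,1}^(mu + k1)
    return _expand(b"G", r, MU + K1)

def H(s):  # H : {0,1}^(mu + k1) -> {0,1}^rho
    return _expand(b"H", s, RHO)

def oaep_pad(m, r):
    # s = (m || 0^{k1}) xor G(r); output is s || (H(s) xor r)
    s = _xor(m + b"\x00" * K1, G(r))
    return s + _xor(H(s), r)

def oaep_unpad(x):
    s, t = x[:MU + K1], x[MU + K1:]
    r = _xor(H(s), t)          # recover the coins from the second part
    padded = _xor(s, G(r))     # undo the first mask
    m, z = padded[:MU], padded[MU:]
    return m if z == b"\x00" * K1 else None
```

The transform is invertible by construction: given \(s\Vert (H(s)\oplus r)\), one recomputes H(s) to recover r and then G(r) to recover \(m \Vert 0^{k_1}\).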

RSA-OAEP was designed using the random oracle (RO) methodology [6]. This means that the hash functions are modeled as independent truly random functions, available to all parties via oracle access. When the scheme is implemented in practice, these oracles are heuristically “instantiated” in certain ways using a cryptographic hash function. In particular, this means that any oracle call by the scheme’s algorithms is replaced by the computation of a concrete function. In terms of security, a cryptographic hash function (or a function built from one) is of course neither random nor computable only via an oracle (it has a short, public description), but schemes designed using this methodology are hoped to be secure. Unfortunately, a series of works, starting with the seminal paper of Canetti et al. [20], showed that there are schemes secure in the RO model that are insecure under every instantiation of the oracles; such RO model schemes are called uninstantiable. Thus, to gain confidence in an RO model scheme, we should show that it is instantiable, i.e., that the oracles admit a secure instantiation by efficiently computable functions under well-defined assumptions. Then, when we instantiate the scheme, we know that our goal is at least plausible. We feel this is especially important for a scheme such as RSA-OAEP, which is by now widely standardized and implemented (e.g., in SSH [32]).

Yet, while RO model schemes continue to be proposed, relatively few have been shown to be instantiable. In particular, we are not aware of any result showing instantiability of RSA-OAEP, even under a relatively modest security model. In fact, the scheme has come under criticism lately due to several works (discussed in Sect. 1.2) showing the impossibility of certain types of instantiations under chosen-ciphertext attack (IND-CCA) [52]. Fortunately, we bring some good news: We give reasonable assumptions under which RSA-OAEP is secure against chosen-plaintext attack (IND-CPA) [31]. We believe this is an important step toward a better understanding of the scheme’s security.

1.1 Our Contributions

Our result on the instantiability of RSA-OAEP is obtained in three steps, via intermediate results that may also be of independent interest. First, we show a general result on the instantiability of “padding-based encryption,” of which f-OAEP is a special case, under the assumption that the underlying padding transform is what we call a fooling extractor and the trapdoor permutation is lossy [49]. We then show (as the second and third steps, respectively) that OAEP and RSA satisfy the respective conditions under suitable assumptions.

Padding-based encryption without ROs. Our first result is a general theorem about padding-based encryption (PBE), a notion formalized recently by Kiltz and Pietrzak [38]. PBE generalizes the design methodology of PKCS #1 and RSA-OAEP we already mentioned. Namely, we start with a k-bit to k-bit trapdoor permutation (TDP) that satisfies a weak security notion like one-wayness. To “upgrade” the TDP to an encryption scheme satisfying a strong security notion like IND-CPA, we design an invertible “padding transform” which embeds a plaintext and random coins into a k-bit string, to which we then apply the TDP. This methodology is quite natural and has long been prevalent in practice, motivating the design of OAEP and later schemes such as SAEP [13] and PSS-E [23]. The latter were all designed and analyzed in the RO model.

We show that the RO model is unnecessary in the design and analysis of IND-CPA secure PBE. To do so, we formulate a connection between PBE, a new notion we call “fooling extractors for small-range distinguishers,” or just “fooling extractors,” and lossy trapdoor functions as defined by Peikert and Waters [49]. Lossiness means that there is an alternative, “lossy” key generation algorithm that outputs a public key indistinguishable from a normal one, but which induces a small-range (“lossy”) function. This is powerful because it allows one to prove security with respect to the lossy key generation algorithm, where information-theoretic arguments apply. A fooling extractor is a kind of randomness extractor (a concept introduced in [46]) whose output on a high-entropy source looks random to any function (or distinguisher) with a small range. Our result says that if the padding transform of a PBE scheme is an “adaptive” fooling extractor for sources of the form \((m, R)\)—where m is a plaintext and R is the random coins (which we call “encryption sources”)—and its TDP is sufficiently lossy (the logarithm of its lossy range size should be slightly less than the length of R), then the PBE scheme is IND-CPA. Here “adaptive” means that m may depend on the choice of the extractor seed. We call such padding transforms “encryption-compatible.”

OAEP fools small-range distinguishers. Our second result says that the OAEP padding transform is encryption-compatible if the hash function G is t-wise independent for appropriate t (roughly, proportional to the allowed message length). Note that no restriction is put on the hash function H; in particular, neither hash function is modeled as an RO. The inspiration for our proof comes from the “Crooked” Leftover Hash Lemma (LHL) of Dodis and Smith [26], especially its application to deterministic encryption by Boldyreva et al. [10] (who also gave a simpler proof). Qualitatively, the Crooked LHL says that \((K,f(\Pi (K,X)))\) looks like \((K, f(U))\) for any small-range function f, pairwise-independent function \(\Pi \) keyed by K, and high-entropy source X; in our terminology, this says that a pairwise-independent function is a fooling extractor for such X. In our application, we might naïvely view \(\Pi \) as the OAEP. There are two problems with this. First, OAEP is not pairwise independent, even in the RO model. Second, showing that OAEP is encryption-compatible entails showing adaptivity (as defined above), whereas in the lemma K is independent of X.

To solve the first problem, we show that the Crooked LHL can be strengthened to say that \((K,f(X,\Pi (K,X)))\) looks like \((K, f(X,U))\); i.e., that \(\Pi (K,X)\) looks random to f even given X. The proof is a careful extension of the proof of the Crooked LHL in [10]. Then, by viewing X as the random coins in OAEP and \(\Pi \) as the hash function G, we can conclude that OAEP is a fooling extractor for any fixed encryption source \((m, R)\), where m is independent of K (note that our analysis does not use any properties of H—the only fact we use about the second Feistel round is that it is invertible).

To solve the second problem, we extend an idea of Trevisan and Vadhan [61] to our setting and show that if G is t-wise independent for large enough t, the probability that the chosen seed (or key) is “bad” for a particular encryption source is so small that we can take a union bound over all possible m and conclude that OAEP is in fact adaptive, meaning it is indeed encryption-compatible. Interestingly, we obtain better parameters in the case that f is regular, meaning every preimage set has the same size. However, our analysis still goes through assuming that every preimage set is sufficiently large, which we show can always be assumed with some loss in parameters.

Lossiness of RSA. To instantiate RSA-OAEP, it remains to show lossiness of RSA. Our final result is that RSA is indeed lossy under reasonable assumptions. We first show lossiness of RSA under the \(\Phi \)-Hiding Assumption (\(\Phi \)A) of Cachin, Micali, and Stadler [16]. \(\Phi \)A has been used as the basis for a number of efficient protocols, e.g., [15, 16, 29, 33]. \(\Phi \)A states roughly that given an RSA modulus \(N = pq\), it is hard to distinguish primes e that divide \(\phi (N) = (p-1)(q-1)\) from those that do not. Normal RSA parameters (N, e) are such that \(\gcd (e,\phi (N)) = 1\). Under \(\Phi \)A, we may alternatively choose (N, e) such that e divides \(p-1\). The range of the RSA function is then reduced by a factor of 1/e. To resist known attacks, we can take the bit-length of e up to almost 1/4 that of N, giving RSA lossiness of almost k/4 bits, where k is the modulus length. We also stress that even though the only currently known algorithm to break \(\Phi \)A with such parameters is to factor the modulus N, the assumption is considerably stronger than the standard factoring/RSA assumptions.
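The effect of lossy parameter selection can be checked exactly on toy numbers (real moduli are of course thousands of bits long). The following sketch picks a toy e dividing \(p-1\) and counts the image of the RSA map \(x \mapsto x^e \bmod N\) on the units modulo N; in the injective mode (\(\gcd (e,\phi (N)) = 1\)) the image would have the full size \(\phi (N)\):

```python
from math import gcd

p, q, e = 11, 7, 5          # toy primes; e = 5 divides p - 1 = 10
N = p * q                   # 77
phi = (p - 1) * (q - 1)     # phi(N) = 60
assert (p - 1) % e == 0 and gcd(e, q - 1) == 1

units = [x for x in range(1, N) if gcd(x, N) == 1]
image = {pow(x, e, N) for x in units}

# In the lossy mode the range shrinks by a factor of 1/e:
print(len(units), len(image))   # 60 12
```

Here the image has size \(\phi (N)/e = 12\): by CRT, exponentiation by e is e-to-1 on the order-\((p-1)\) group mod p and a bijection on the group mod q.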

In practice, e is usually chosen to be small for efficiency reasons. We observe that in this case more lossiness can be achieved by considering multi-prime RSA where \(N = p_1 \cdots p_m\) for \(m \ge 2\) (for a fixed modulus length). In the lossy case, we choose (N, e) such that e divides \(p_i - 1\) for all \(1 \le i \le m-1\); the range of the RSA function is then reduced by a factor of \(1/e^{m-1}\). In a preliminary version of this paper [37], we showed that the maximum bit-length of e in this case to avoid our best attack was roughly \(k(1/m - 2/m^2)\) where k is the modulus length. By devising better attacks, this value was subsequently reduced to \(k \cdot 2/(3m^{2/3})\) by Herrmann [35] and to \(k(1/m - 2/(em \log (m+1)))\), where e is the base of the natural logarithm, by Tosu and Kunihiro [60]. So, for a fixed modulus size we gain in lossiness only for small e. If we assume such multi-prime RSA moduli are indistinguishable from two-prime ones, we can achieve such a gain in lossiness in the case of standard (two-prime) RSA as well.
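The multi-prime gain can be checked the same way. In this hypothetical toy example with \(m = 3\), e divides both \(p_1 - 1\) and \(p_2 - 1\), so the image of the RSA map shrinks by a factor of \(1/e^{m-1} = 1/e^2\):

```python
from math import gcd

p1, p2, p3, e = 11, 31, 7, 5          # e = 5 divides p1 - 1 and p2 - 1
N = p1 * p2 * p3                       # 2387
phi = (p1 - 1) * (p2 - 1) * (p3 - 1)   # 1800

# Image of x -> x^e mod N on the units: e-to-1 mod p1 and mod p2,
# a bijection mod p3 (gcd(e, p3 - 1) = 1).
image = {pow(x, e, N) for x in range(1, N) if gcd(x, N) == 1}
assert len(image) == phi // e ** 2     # range reduced by 1/e^(m-1)
```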

Implications for RSA-OAEP. Combining the results above gives that RSA-OAEP is IND-CPA in the standard model under (rather surprisingly, at least to us) simple, non-interactive, and non-interdependent assumptions on RSA and the hash functions. The parameters for RSA-OAEP supported by our proofs are discussed in Sect. 6. While they are considerably worse than what is expected in practice, we view the upshot of our results not as the concrete parameters they support, but rather that they increase the theoretical backing for the scheme’s security at a more qualitative level, showing it can be instantiated at least for larger parameters. In particular, our results give us greater confidence that chosen-plaintext attacks are unlikely to be found against the scheme; such attacks are known against the predecessor of RSA-OAEP in PKCS #1 v1.5 [22]. That said, we strongly encourage further research to try to improve the concrete parameters. Indeed, initial steps in this direction have already been taken; see Sect. 1.3 below.

Moreover, our analysis brings to light some simple modifications that may increase the scheme’s security. The first is to key the hash function G. Although our results have some interpretation in the case that G is a fixed function (see below), it may be preferable for G to have an explicit, randomly selected key. It is an interesting open question whether our proof can be extended to function families that use shorter keys. The second possible modification is to increase the length of the randomness versus that of the redundancy in the message when encrypting short messages under RSA-OAEP. Of course, we suggest these modifications only in cases where they do not impact efficiency too severely.

Using unkeyed hash functions. Formally, our results assume G is randomly chosen from a large family (i.e., it is a keyed hash function). However, our analysis actually shows that almost every function (i.e., all but a very small fraction) from the family yields a secure instantiation; we just do not know an explicit member that works. In other words, it is not strictly necessary that G be randomly chosen. When G is instantiated in practice using a fixed cryptographic hash function like MD5 or SHA-1, it is plausible that the resulting instantiation is secure. One can also assume the fixed cryptographic hash function to be implicitly keyed, where the key (in this context called the initialization vector) is chosen and fixed by its designer, and hard-coded into its implementation.

On chosen-ciphertext security. Any extension of our results to security under chosen-ciphertext attack (IND-CCA) must get around the negative results of Kiltz and Pietrzak [38] (which we discuss in more detail in Sect. 1.2). One possible approach to this is based on the fact that, by the results of Bellare and Palacio [4], the notion of plaintext awareness (PA) + IND-CPA implies IND-CCA. Thus, in order to show IND-CCA security of RSA-OAEP in the standard model it suffices, by our results, to show PA (which is an orthogonal property to privacy). To show the latter one could try to use non-black-box assumptions on H along the lines of [18]. We leave a detailed investigation to future work.

1.2 Related Work

Security of OAEP in the RO model. In their original paper [5], Bellare and Rogaway showed that OAEP is IND-CPA assuming the TDP is one-way. They further showed it achieves a notion they called “plaintext awareness.” Subsequently, Shoup [58] observed that the latter notion is too weak to imply security against chosen-ciphertext attacks, and in fact there is no black-box proof of IND-CCA security of OAEP based on one-wayness of the TDP. Fortunately, Fujisaki et al. [28] proved that OAEP is nevertheless IND-CCA assuming so-called “partial-domain” one-wayness, and that partial-domain one-wayness and (standard) one-wayness of RSA are equivalent.

Security of OAEP without ROs. Results on instantiability of OAEP have so far mainly been negative. Boldyreva and Fischlin [11] showed that (contrary to a conjecture of Canetti [17]) one cannot securely instantiate even one of the two hash functions (while still modeling the other as an RO) of OAEP under IND-CCA by a “perfectly one-way” hash function [17, 19] if one assumes only that f is partial-domain one-way. Brown [14] and Paillier and Villar [47] later showed that there are no “key-preserving” black-box proofs of IND-CCA security of RSA-OAEP based on one-wayness of RSA. Recently, Kiltz and Pietrzak [38] (building on the earlier work of Dodis et al. [24] in the signature context) generalized these results and showed that there is no black-box proof of IND-CCA (or even NM-CPA) security of OAEP based on any property of the TDP satisfied by an ideal (truly random) permutation. In fact, their result can be extended to rule out a black-box proof of NM-CPA security of OAEP assuming the TDP is lossy [39], so our results are in some sense optimal given our assumptions.

Instantiations of related schemes. A positive instantiation result about a variant of OAEP called OAEP++ [40] (where part of the transform is output in the clear) was obtained by Boldyreva and Fischlin in [12]. They showed an instantiation that achieves (some weak form of) non-malleability under chosen-plaintext attacks (NM-CPA) for random messages, assuming the existence of non-malleable pseudorandom generators (NM-PRGs). We note that the approach of trying to obtain positive results for instantiations under security notions weaker than IND-CCA originates from their work, and the authors explicitly ask whether OAEP can be shown IND-CPA in the standard model based on reasonable assumptions on the TDP and hash functions.

Another line of work has looked at instantiating other RO model schemes related at least in spirit to OAEP. Canetti [17] showed that the IND-CPA scheme in [6] can be instantiated using (a strong form of) perfectly one-way probabilistic hash functions. More recently, the works of Canetti and Dakdouk [18], Pandey et al. [48], and Boldyreva et al. [9] obtained (partial) instantiations of the earlier IND-CCA scheme of [6]. Hofheinz and Kiltz [36] recently showed an IND-CCA secure instantiation of a variant of the DHIES scheme of [51].

1.3 Subsequent Work

Subsequent to the preliminary version of this paper [37], our results have been improved in several ways. First, as mentioned above, Herrmann [35] and Tosu and Kunihiro [60] gave better cryptanalyses of our extension of \(\Phi \)A to the case of multiple primes. Furthermore, Lewko et al. [42] resolved an open problem raised by our work and proved “approximate regularity” of lossy RSA on arithmetic progressions of sufficient length, leading to improved security bounds for RSA-OAEP; see Sect. 6. They also showed that this result gives a proof of IND-CPA security of RSA PKCS #1 v1.5. Subsequently, Smith and Zhang [59] proved a stronger result on approximate regularity of lossy RSA under a stronger assumption on RSA, leading to better parameters. They also fixed an erroneous claim of [42] about an “average-case” version of approximate regularity of lossy RSA, which can be used to prove large consecutive runs of input bits simultaneously hardcore without the stronger assumption on RSA.

Seurin [57] (building additionally on Freeman et al. [27]) showed how to extend our results to the case of the Rabin trapdoor function [50] instead of RSA. Hemenway et al. [34] showed how to use our result on the lossiness of RSA under \(\Phi \)A to obtain new constructions of non-committing encryption under this assumption. Bellare et al. [3] proved IND-CPA security of RSA-OAEP under standard one-wayness of RSA, but making a much stronger assumption on the hash functions than we do.

2 Preliminaries

Notation and conventions. For a probabilistic algorithm A, by \(y {\,\leftarrow {\scriptscriptstyle \$}\,}A(x)\) we mean that A is executed on input x and the output is assigned to y, whereas if S is a finite set then by \(s {\,\leftarrow {\scriptscriptstyle \$}\,}S\) we mean that s is assigned a uniform element of S. We sometimes write \(y \leftarrow A(x; r)\) to make A’s random coins r explicit. We denote by \(\Pr \bigl [\, A(x) \,{\Rightarrow }\,y \,:\,\ldots \,\bigr ]\) the probability that A outputs y on input x when x is sampled according to the elided experiment. Unless otherwise specified, an algorithm may be probabilistic and its running-time includes that of any overlying experiment. We denote by \(1^k\) the unary encoding of the security parameter k. We sometimes suppress dependence on k for readability. For \(i \in {\mathbb {N}}\) we denote by \(\{0,1\}^i\) the set of all binary strings of length i. If s is a string, then |s| denotes its length in bits, whereas if S is a set then |S| denotes its cardinality. By ‘\(\Vert \)’ we denote string concatenation. All logarithms are base 2.

Basic Definitions. Writing \(P_X(x)\) for the probability that a random variable X puts on x, the statistical distance between random variables X and Y with the same range is given by \(\Delta (X,Y) = \frac{1}{2} \sum _x |P_X(x) - P_Y(x)|\). If \(\Delta (X,Y)\) is at most \(\varepsilon \) then we say X, Y are \(\varepsilon \)-close and write \(X \approx _\varepsilon Y\). We say that X is independent if it is independent of all other random variables under consideration. The min-entropy of X is \(\mathrm {H}_\infty (X) = -\log (\max _x P_X(x))\). A random variable X over \(\{0,1\}^n\) is called an \((n,\ell )\)-source if \(\mathrm {H}_\infty (X) \ge \ell \). If \(\ell = n\) then X is said to be uniform. Let \(f : A \rightarrow B\) be a function. We denote by R(f) the range of f, i.e., \(\{b \in B~|~\exists a \in A, f(a) = b\}\). We call |R(f)| the range size of f. We call f regular if each pre-image set has the same size, i.e., \(|\{x \in A~|~f(x) = y \}|\) is the same for all \(y \in R(f)\).
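Both quantities are straightforward to compute exactly for finite distributions; the following small sketch (distributions represented as dictionaries mapping outcomes to probabilities, an illustrative convention) mirrors the two definitions above:

```python
from math import log2

def stat_dist(P, Q):
    # Delta(X, Y) = (1/2) * sum_x |P_X(x) - P_Y(x)|
    support = set(P) | set(Q)
    return sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in support) / 2

def min_entropy(P):
    # H_infty(X) = -log2(max_x P_X(x))
    return -log2(max(P.values()))

# X is a (2, 1)-source over {0,1}^2; U is uniform, i.e., a (2, 2)-source.
X = {"00": 0.5, "01": 0.25, "10": 0.25}
U = {s: 0.25 for s in ("00", "01", "10", "11")}
print(min_entropy(X), stat_dist(X, U))   # 1.0 0.25
```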

Public-key encryption and its security. A public-key encryption scheme with message-space \(\mathrm {MsgSp}\) is a triple of algorithms \({\mathcal {AE}}= ({\mathcal {K}}, {\mathcal {E}}, {\mathcal {D}})\). The key generation algorithm \({\mathcal {K}}\) returns a public key \( pk \) and matching secret key \( sk \). The encryption algorithm \({\mathcal {E}}\) takes \( pk \) and a plaintext m to return a ciphertext. The deterministic decryption algorithm \({\mathcal {D}}\) takes \( sk \) and a ciphertext c to return a plaintext. We require that for all messages \(m \in \mathrm {MsgSp}\)

$$\begin{aligned} \Pr \left[ \, {\mathcal {D}}( sk , {\mathcal {E}}( pk , m)) \,{\Rightarrow }\,m \,:\, ( pk , sk ) {\,\leftarrow {\scriptscriptstyle \$}\,}{\mathcal {K}}(1^k) \,\right] \end{aligned}$$

is (very close to) 1.

To an encryption scheme \(\Pi = ({\mathcal {K}}, {\mathcal {E}},{\mathcal {D}})\) and an adversary \(A = (A_1, A_2)\), we associate a chosen-plaintext attack experiment,

$$\begin{aligned} \mathbf {Exp}^{\mathrm {ind\text{- }cpa}}_{\Pi ,A}(k):\quad&( pk , sk ) {\,\leftarrow {\scriptscriptstyle \$}\,}{\mathcal {K}}(1^k) ;\ b {\,\leftarrow {\scriptscriptstyle \$}\,}\{0,1\} ;\ (m_0, m_1, state ) {\,\leftarrow {\scriptscriptstyle \$}\,}A_1( pk ) ;\\&c {\,\leftarrow {\scriptscriptstyle \$}\,}{\mathcal {E}}( pk , m_b) ;\ d {\,\leftarrow {\scriptscriptstyle \$}\,}A_2( pk , c, state ) ;\ \text {return } 1 \text { if } d = b \text { and } 0 \text { otherwise,} \end{aligned}$$

where we require A’s output to satisfy \(|m_0| = |m_1|\). Define the ind-cpa advantage of A against \(\Pi \) as

$$\begin{aligned} \mathbf {Adv}^{\mathrm {ind\text{- }cpa}}_{\Pi ,A}(k) = 2\cdot \Pr \left[ \, \mathbf {Exp}^{\mathrm {ind\text{- }cpa}}_{\Pi ,A}(k) \,{\Rightarrow }\,1 \,\right] - 1. \end{aligned}$$

Lossy trapdoor permutations. A lossy trapdoor permutation (LTDP) generator [49] is a pair \(\mathsf {LTDP}= ({\mathcal {F}}, {\mathcal {F}}')\) of algorithms. Algorithm \({\mathcal {F}}\) is a usual trapdoor permutation (TDP) generator, namely it outputs a pair \((f, f^{-1})\) where f is a (description of a) permutation on \(\{0,1\}^k\) and \(f^{-1}\) its inverse. Algorithm \({\mathcal {F}}'\) outputs a (description of a) function \(f'\) on \(\{0,1\}^k\). We call \({\mathcal {F}}\) the “injective mode” and \({\mathcal {F}}'\) the “lossy mode” of \(\mathsf {LTDP}\), respectively, and we call \({\mathcal {F}}\) “lossy” if it is the first component of some lossy TDP. For a distinguisher D, define its ltdp-advantage against \(\mathsf {LTDP}\) as

$$\begin{aligned} \mathbf {Adv}^{\mathrm {ltdp}}_{\mathsf {LTDP},D}(k) = \Pr \left[ \, D(f) \,{\Rightarrow }\,1 \,:\, (f, f^{-1}) {\,\leftarrow {\scriptscriptstyle \$}\,}{\mathcal {F}}\,\right] - \Pr \left[ \, D(f') \,{\Rightarrow }\,1 \,:\, f' {\,\leftarrow {\scriptscriptstyle \$}\,}{\mathcal {F}}' \,\right] . \end{aligned}$$

We say \(\mathsf {LTDP}\) has residual leakage \(s\) if for all \(f'\) output by \({\mathcal {F}}'\) we have \(|R(f')| \le 2^s\). The lossiness of \(\mathsf {LTDP}\) is \(\ell =k - s\).

t-wise independent hashing. Let \(H :{\mathcal {K}}\times D \rightarrow R\) be a (keyed) hash function. We say that H is t-wise independent [62] if for all distinct \(x_1,\ldots , x_t \in D\) and all \(y_1, \ldots , y_t \in R\)

$$\begin{aligned} \Pr \left[ \, H(K,x_1) = y_1 \wedge \cdots \wedge H(K,x_t) = y_t \,\right] \,=\, \frac{1}{|R|^t} \,, \end{aligned}$$

where the probability is over \(K {\,\leftarrow {\scriptscriptstyle \$}\,}{\mathcal {K}}\).

In other words, \(H(K,x_1),\ldots ,H(K,x_t)\) are all uniform and independent.
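A standard construction achieving t-wise independence (the usual textbook one, not a construction from this paper) evaluates a random polynomial of degree at most \(t-1\) over a finite field. For small parameters the definition can be verified exhaustively:

```python
from itertools import product

p, t = 5, 2   # pairwise independence (t = 2) over GF(5)

def H(K, x):
    # H(K, x) = k_0 + k_1*x + ... + k_{t-1}*x^{t-1} mod p
    return sum(k * pow(x, i, p) for i, k in enumerate(K)) % p

x1, x2 = 1, 3   # any two distinct points of the domain
counts = {}
for K in product(range(p), repeat=t):   # enumerate all p^t keys
    pair = (H(K, x1), H(K, x2))
    counts[pair] = counts.get(pair, 0) + 1

# (H(K, x1), H(K, x2)) is exactly uniform over GF(5)^2:
# every output pair arises from exactly one of the 25 keys.
assert len(counts) == p ** 2 and all(c == 1 for c in counts.values())
```

The exhaustive count works because, for distinct evaluation points, the key-to-outputs map is an invertible (Vandermonde) linear map over the field.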

3 Padding-Based Encryption from Lossy TDP + Fooling Extractor

In this section, we show a general result on how to build IND-CPA secure padding-based encryption (PBE) without using random oracles, by combining a lossy TDP with a “fooling extractor” for small-range distinguishers.

3.1 Background and Tools

We first provide the definitions relevant to our result.

Padding-based encryption. The idea behind padding-based encryption (PBE) is as follows: We start with a k-bit to k-bit trapdoor permutation (e.g., RSA) and wish to build a secure encryption scheme. As in [5], we are interested in encrypting messages of less than k bits to ciphertexts of length k. It is well-known that we cannot simply encrypt messages under the TDP directly to achieve strong security. So, in a PBE scheme we “upgrade” the TDP by first applying a randomized and invertible “padding transform” to a message prior to encryption.

Our definition of PBE largely follows the recent formalization in [38]. Let \(k,\mu ,\rho \) be three integers such that \(\mu +\rho \le k\). A padding transform \((\pi ,{\hat{\pi }})\) consists of two mappings \(\pi : \{0,1\}^{\mu + \rho } \rightarrow \{0,1\}^k\) and \({\hat{\pi }}: \{0,1\}^k \rightarrow \{0,1\}^\mu \cup \{\bot \}\) such that \(\pi \) is injective and the following consistency requirement is fulfilled:

$$\begin{aligned} \forall m \in \{0,1\}^\mu ,r \in \{0,1\}^\rho \,:\quad {\hat{\pi }}(\pi (m \, \Vert \,r)) = m. \end{aligned}$$

A padding transform generator is an algorithm \(\Pi \) that on input \(1^k\) outputs a (description of a) padding transform \((\pi ,{\hat{\pi }})\). Let \({\mathcal {F}}\) be a k-bit trapdoor permutation generator and \(\Pi \) be a padding transform generator. Define the associated padding-based encryption scheme \({\mathcal {AE}}_\Pi [{\mathcal {F}}] = ({\mathcal {K}}, {\mathcal {E}}, {\mathcal {D}})\) with message-space \(\{0,1\}^\mu \) by

figure d

Padding-based encryption schemes have long been prevalent in practice, for example PKCS #1 [55]. While OAEP [5] is the best-known, the notion also captures later schemes such as SAEP [13] and PSS-E [23].

Fooling extractors. We define a new notion that we call “fooling extractor for small-range distinguishers” or just “fooling extractor.” Intuitively, fooling extractors are a type of randomness extractor [46] that “fools” distinguishers with small-range output. We give some more intuition after the formal definition.

Definition 3.1

Let \(\mathsf {FExt}:\{0,1\}^c \times \{0,1\}^n \rightarrow \{0,1\}^k\) be a function and let \({\mathcal {X}}= \{X_1, \ldots , X_q\}\) be a class of \((n,\ell )\)-sources (as defined in Sect. 2). We say that \(\mathsf {FExt}\) fools range-\(2^s\) distinguishers on \({\mathcal {X}}\) with probability \(1-\varepsilon \) (or is an \((s,\varepsilon )\)-fooling extractor for \({\mathcal {X}}\)) if for all functions \(f'\) on \(\{0,1\}^k\) with range size at most \(2^s\) and all \(1 \le i \le q\):

$$\begin{aligned} (K,f'(\mathsf {FExt}(K,X_i))) \approx _{\varepsilon } (K,f'(U)) \,, \end{aligned}$$

where K is uniform on \(\{0,1\}^c\) and U is uniform and independent on \(\{0,1\}^k\). We call K the key or seed of \(\mathsf {FExt}\). Note that K is independent of i above.

We say that \(\mathsf {FExt}\) adaptively fools range-\(2^s\) distinguishers on \({\mathcal {X}}\) with probability \(1-\varepsilon \) (or is an adaptive \((s,\varepsilon )\)-fooling extractor for \({\mathcal {X}}\)) if for all functions \(f'\) on \(\{0,1\}^k\) with range size at most \(2^s\):

$$\begin{aligned} {\mathbb {E}}_{K}\left[ \, \max _{1 \le i \le q} \Delta \bigl (f'(\mathsf {FExt}(K,X_i)),\, f'(U)\bigr ) \,\right] \,\le \,\varepsilon \,. \end{aligned}$$

Since \(\Delta ((K,A),(K,B)) = {\mathbb {E}}_{K}[\Delta (A,B)]\), the above implies that \((K,f'(\mathsf {FExt}(K,X_i))) \approx _{\varepsilon } (K,f'(U))\) for i depending on K (or, put differently, \((K,f'(\mathsf {FExt}(K,X_i))) \approx _{\varepsilon } (K,f'(U))\) holds for every i over the same choice of K).

As a useful special case, we say that \(\mathsf {FExt}\) fools range-\(2^s\) regular distinguishers on \({\mathcal {X}}\) with probability \(1-\varepsilon \) (or is a regular \((s,\varepsilon )\)-fooling extractor for \({\mathcal {X}}\)) if we quantify only over regular f in the definition. An adaptive regular \((s,\varepsilon )\)-fooling extractor for \({\mathcal {X}}\) is defined analogously.

We note that while the intuition given prior to the definition describes fooling the function \(f'\), the definition actually requires fooling an “implicit” or “external” distinguisher that sees both the output \(f'(\mathsf {FExt}(K,X_i))\) of \(f'\) and the extractor seed K. This is crucial for the definition to be meaningful. Indeed, just asking that \(f'(\mathsf {FExt}(K,X_i))\) be indistinguishable from \(f'(U)\) for all small-range functions \(f'\) is equivalent to asking only that \(\mathsf {FExt}(K,X_i)\) be indistinguishable from U. This latter requirement is trivial to achieve (if one is not concerned with key length), for example by using K as a one-time pad.
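The one-time-pad degeneracy mentioned above can be made concrete. In the following sketch (all parameters chosen for illustration), \(\mathsf {FExt}(K, X) = K \oplus X\) over 3-bit strings and the range-2 distinguisher \(f'\) reads the top bit: the extractor output alone is perfectly uniform, yet the joint distribution with the seed is at distance 1/2 from ideal, because \(f'(K \oplus X)\) is fully determined by K:

```python
from itertools import product

X_support = [0b000, 0b001]          # X uniform over these two values: a (3,1)-source
fprime = lambda y: y >> 2           # 2-value distinguisher: top bit of y

def dist(P, Q):
    return sum(abs(P.get(k, 0.0) - Q.get(k, 0.0)) for k in set(P) | set(Q)) / 2

# Marginal view: FExt(K, X) = K xor X is perfectly uniform on {0,...,7}.
P_out = {}
for K, x in product(range(8), X_support):
    y = K ^ x
    P_out[y] = P_out.get(y, 0.0) + 1 / 16
assert dist(P_out, {y: 1 / 8 for y in range(8)}) == 0.0

# Joint view with the seed: x only flips the low bit, so f'(K ^ x)
# equals the top bit of K -- deterministic given K, unlike f'(U).
P_joint, Q_joint = {}, {}
for K, x in product(range(8), X_support):
    kv = (K, fprime(K ^ x))
    P_joint[kv] = P_joint.get(kv, 0.0) + 1 / 16
for K, b in product(range(8), (0, 1)):
    Q_joint[(K, b)] = 1 / 16
print(dist(P_joint, Q_joint))   # 0.5
```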

We also note that the concept of fooling extractors was implicit in the work of Dodis and Smith [26] on error-correction without leaking partial information, whose “Crooked” Leftover Hash Lemma establishes, in our language, that a pairwise-independent function is an \((s,\varepsilon )\)-fooling extractor for every singleton \((n,\ell )\)-source X where \(s \le \ell - 2 \log (1/\varepsilon ) + 2\). This lemma was later applied in the context of deterministic public-key encryption by Boldyreva et al. [10], who also gave a simpler proof.

3.2 The Result

To state our result, we first formalize the concept of encryption-compatible padding transforms.

Definition 3.2

Let \(\Pi \) be a padding transform generator whose coins are drawn from \(\mathsf {Coins}\). Define the associated function \(h_\Pi : \mathsf {Coins}\times \{0,1\}^{\mu + \rho } \rightarrow \{0,1\}^k\) by \(h_\Pi (cc,m \Vert r) = \pi (m \Vert r)\) for all \(cc \in \mathsf {Coins}, m \in \{0,1\}^\mu , r \in \{0,1\}^\rho \), where \((\pi ,{\hat{\pi }}) \leftarrow \Pi (1^k; cc)\). Define the class \({\mathcal {X}}_\Pi \) of encryption sources associated to \(\Pi \) as containing all sources of the form \((m, R)\), where \(m \in \{0,1\}^\mu \) is fixed and \(R \in \{0,1\}^\rho \) is uniform. (Note that the class \({\mathcal {X}}_\Pi \) therefore contains \(2^\mu \) distinct \((\mu +\rho ,\rho )\)-sources.) We say that \(\Pi \) is \((s,\varepsilon )\)-encryption-compatible if \(h_\Pi \) as above is an adaptive \((s,\varepsilon )\)-fooling extractor for \({\mathcal {X}}_\Pi \). (Here \(\mathsf {Coins}\) plays the role of \(\{0,1\}^c\) in Definition 3.1.) A regular \((s,\varepsilon )\)-encryption-compatible padding transform generator is defined analogously.

Theorem 3.3

Let \(\mathsf {LTDP}= ({\mathcal {F}},{\mathcal {F}}')\) be an LTDP with residual leakage \(s\), and let \(\Pi \) be an \((s,\varepsilon )\)-encryption-compatible padding transform generator. Then, for any IND-CPA adversary A against \({\mathcal {AE}}_\Pi [{\mathcal {F}}]\), there is an adversary D against \(\mathsf {LTDP}\) such that for all \(k \in {\mathbb {N}}\)

$$\begin{aligned} \mathbf {Adv}^{\mathrm {ind}\text{- }\mathrm {cpa}}_{{\mathcal {AE}},A}(k) \,\le \,\mathbf {Adv}^{\mathrm {ltdp}}_{\mathsf {LTDP},D}(k) + \varepsilon . \end{aligned}$$

Furthermore, the running-time of D is the time to run A.

Proof

Given \(A=(A_1, A_2)\), we define three games, called \(G_0,G_1,G_2\), in Fig. 1. Note that game \(G_0\) is the experiment \(\mathbf {Exp}^{\mathrm {ind\text{- }cpa}}_{\Pi ,A}(k)\) defining IND-CPA security. We claim that for a distinguisher D against \(\mathsf {LTDP}\) that is simple to construct, we have

$$\begin{aligned} \frac{1}{2} + \mathbf {Adv}^{\mathrm {ind\text{- }cpa}}_{{\mathcal {AE}}_\Pi [{\mathcal {F}}],A}(k)&\,=\,&\Pr \left[ \, G_0 \,{\Rightarrow }\,1 \,\right] \end{aligned}$$
(1)
$$\begin{aligned}&\,\le \,&\Pr \left[ \, G_1 \,{\Rightarrow }\,1 \,\right] + \mathbf {Adv}^{\mathrm {ltdp}}_{\mathsf {LTDP},D}(k) \end{aligned}$$
(2)
$$\begin{aligned}&\,\le \,&\Pr \left[ \, G_2 \,{\Rightarrow }\,1 \,\right] + \mathbf {Adv}^{\mathrm {ltdp}}_{\mathsf {LTDP},D}(k) + \varepsilon \end{aligned}$$
(3)
$$\begin{aligned}&\,=\,&\frac{1}{2} + \mathbf {Adv}^{\mathrm {ltdp}}_{\mathsf {LTDP},D}(k) + \varepsilon \,, \end{aligned}$$
(4)

from which the theorem follows by re-arranging terms. So let us justify the above.

Equation (1) is true by the definition of IND-CPA security.

For (2) we can construct a distinguisher D as required since \(G_0, G_1\) do not use \(f^{-1}\) in any way.

Equation (3) is true by the definition of encryption compatibility. Namely, since \(h_\Pi \) in the definition is an adaptive \((s,\varepsilon )\)-fooling extractor for \({\mathcal {X}}_\Pi \), we know that the expectation over the coins cc of \(\Delta (f'(\pi (m \Vert R)), f'(U))\), where \((\pi ,{\hat{\pi }}) \leftarrow \Pi (1^k; cc)\), is at most \(\varepsilon \) even for m depending on cc (and hence on \(\pi \)); in particular, this holds for \(m = m_b\) in game \(G_1\).

Finally, (4) uses the fact that in \(G_2\) no information about b is given to A. Note that the final two steps in the proof are information-theoretic, meaning they do not use any assumption about A’s running-time. \(\square \)

Fig. 1

Games for the proof of Theorem 3.3. Shaded areas indicate the differences between games

Remark 3.4

The analogous result holds for regular LTDPs and regular encryption-compatible padding transforms. That is, if the LTDP is regular, then it suffices to use a regular encryption-compatible padding transform to obtain the same conclusion. The latter may be easier to design or more efficient than in the general case; indeed, we get better parameters for OAEP in the regular case in Sect. 4. Furthermore, known examples of LTDPs (including RSA, as shown in Sect. 5) are regular, although a technical issue about the domain of RSA versus the output range of OAEP makes it challenging to exploit this for RSA-OAEP; see Sect. 6.

4 OAEP as a Fooling Extractor

In this section, we show that the OAEP padding transform of Bellare and Rogaway [5] is encryption-compatible as defined in Sect. 3 if its initial hash function is t-wise independent for t depending on the message length and lossiness of the TDP.

4.1 OAEP

We recall the OAEP padding transform of Bellare and Rogaway [5], lifted to the “instantiated” setting, i.e., where its hash functions may be keyed. (The original scheme was defined for unkeyed hash functions.) Let \(G :{\mathcal {K}}_G \times \{0,1\}^\rho \rightarrow \{0,1\}^\mu \) and \(H :{\mathcal {K}}_H \times \{0,1\}^\mu \rightarrow \{0,1\}^\rho \) be hash functions. The associated padding transform generator \(\mathsf {OAEP}[G,H]\) on input \(1^k\) returns \((\pi _{K_G,K_H},{\hat{\pi }}_{K_G,K_H})\), where

$$\begin{aligned} \pi _{K_G,K_H} :\{0,1\}^\mu \times \{0,1\}^\rho \rightarrow \{0,1\}^{\mu +\rho } \end{aligned}$$

and

$$\begin{aligned} {\hat{\pi }}_{K_G,K_H} :\{0,1\}^\mu \times \{0,1\}^\rho \rightarrow \{0,1\}^\mu \,, \end{aligned}$$

defined via

$$\begin{aligned} \pi _{K_G,K_H}(m,r)&:\; s \leftarrow m {\,\oplus \,}G(K_G,r) \,;\; t \leftarrow r {\,\oplus \,}H(K_H,s) \,;\; \text {return}\ s \Vert t \\ {\hat{\pi }}_{K_G,K_H}(s,t)&:\; r \leftarrow t {\,\oplus \,}H(K_H,s) \,;\; m \leftarrow s {\,\oplus \,}G(K_G,r) \,;\; \text {return}\ m \,. \end{aligned}$$

See Fig. 2 for a graphical illustration.

Fig. 2

Algorithms \(\pi _{K_G,K_H}(m,r)\) and \({\hat{\pi }}_{K_G,K_H}(s,t)\) for \(\mathsf {OAEP}[G,H]\)

Remark 4.1

Since we mainly study IND-CPA security, for simplicity we define above the “no-redundancy” version of OAEP, i.e., corresponding to the “basic scheme” in [5]. However, all our results also hold for the redundant version. Additionally, as is typical in the literature, we have defined OAEP to apply the G-function to the least-significant bits of the input; in standards and implementations, it is typically applied to the most significant bits (i.e., the order of m and r is switched). Again, we stress that our results hold in either case.
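To make the transform concrete, here is a minimal Python sketch of the no-redundancy version above (toy byte-level parameters of our choosing; keyed BLAKE2b is an illustrative stand-in for the keyed hash functions G and H, and is emphatically not the t-wise independent G that the analysis below requires):

```python
import hashlib

MU, RHO = 16, 16  # toy message and coin lengths, in bytes

def G(key: bytes, r: bytes) -> bytes:
    # stand-in for G : K_G x {0,1}^rho -> {0,1}^mu
    return hashlib.blake2b(r, key=key, digest_size=MU).digest()

def H(key: bytes, s: bytes) -> bytes:
    # stand-in for H : K_H x {0,1}^mu -> {0,1}^rho
    return hashlib.blake2b(s, key=key, digest_size=RHO).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pad(kg: bytes, kh: bytes, m: bytes, r: bytes) -> bytes:
    # pi_{K_G,K_H}(m, r): s = m xor G(r); t = r xor H(s); output s || t
    s = xor(m, G(kg, r))
    t = xor(r, H(kh, s))
    return s + t

def unpad(kg: bytes, kh: bytes, st: bytes):
    # hat-pi_{K_G,K_H}(s, t): recover r first, then m
    s, t = st[:MU], st[MU:]
    r = xor(t, H(kh, s))
    return xor(s, G(kg, r)), r

kg, kh, m, r = b"key-G", b"key-H", b"A" * MU, b"B" * RHO
assert unpad(kg, kh, pad(kg, kh, m, r)) == (m, r)
```

The round-trip assertion checks invertibility; note that the transform adds no redundancy, matching the “basic scheme.”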

4.2 Analysis

The following establishes that OAEP is encryption-compatible if the hash function G is t-wise independent for appropriate t. No restriction is put on the other hash function H. Indeed, our result also applies to SAEP [13] (although the latter is neither standardized nor known to provide CCA security in the RO model, except in certain cases).

Theorem 4.2

Let \(G :{\mathcal {K}}_G \times \{0,1\}^\rho \rightarrow \{0,1\}^\mu \) and \(H :{\mathcal {K}}_H \times \{0,1\}^\mu \rightarrow \{0,1\}^\rho \) be hash functions, and suppose G is t-wise independent. Let \(\mathsf {OAEP}= \mathsf {OAEP}[G,H]\). Then

(1) \(\mathsf {OAEP}\) is \((s,\varepsilon )\)-encryption-compatible where \(\varepsilon = 2^{-u}\) for \(u = \frac{t}{3t+2}(\rho - s - \log t + 2) - \frac{2(\mu + s)}{3t+2} - 1\).

(2) \(\mathsf {OAEP}\) is regular \((s,\varepsilon )\)-encryption-compatible where \(\varepsilon = 2^{-u}\) for \(u = \frac{t}{2t+2}(\rho - s - \log t + 2) - \frac{\mu + s + 2}{t+1}-1\).

(3) When \(t = 2\), \(\mathsf {OAEP}\) is \((s,\varepsilon )\)-encryption-compatible where \(\varepsilon = 2^{-u}\) for \(u = (\rho - s - 2\mu )/4 - 1\).

Note that parts (2) and (3) capture special cases of (1) in which we get better bounds. The techniques used in the proof were first developed in the context of the classical LHL by Trevisan and Vadhan [61] and Dodis, Sahai, and Smith [25], though the style of presentation of our theorem statement and proof is inspired by Barak et al. [1, Lemma 1]. We mention that due to our use of (variants of) the Crooked LHL rather than the classical one and the structure of OAEP, some of the technical details differ in our case and require new ideas.

Corollary 4.3

Let \(G :{\mathcal {K}}_G \times \{0,1\}^\rho \rightarrow \{0,1\}^\mu \) and \(H :{\mathcal {K}}_H \times \{0,1\}^\mu \rightarrow \{0,1\}^\rho \) be hash functions and suppose that G is t-wise independent for \(t\ge 3 \frac{\mu +s}{\rho -s}\). Then \(\mathsf {OAEP}[G,H]\) is \((s,\varepsilon )\)-encryption-compatible where \(\varepsilon =\exp (-c(\rho -s - \log t))\) for a constant \(c>0\).

In particular, \(c\approx 1/2\) for regular functions. For such a function, if \(\rho - s\) is at least 180, then \(\varepsilon \) is roughly \(2^{-80}\) for \(t=10\) and message lengths \(\mu \le 2^{15}\) (which for practical purposes does not restrict the message-space). Applying Theorem 3.3, we see that if G is 10-wise independent and the number of random bits used in OAEP is at least 180 bits larger than the residual lossiness of the TDP, then the security of OAEP is tightly related to that of the lossy TDP.
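Reading exp as base 2 (consistent with the \(2^{-80}\) claim above), the arithmetic in this example is easy to confirm; the constant \(c = 1/2\) is the approximate value for regular functions stated above:

```python
import math

rho_minus_s, t = 180, 10
c = 0.5  # approximate constant for regular functions, per the corollary
# -log2(epsilon) = c * (rho - s - log t), reading "exp" as base 2
security_bits = c * (rho_minus_s - math.log2(t))
assert security_bits > 80  # epsilon is below 2^{-80} (about 2^{-88})
```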

Remark 4.4

To show security of OAEP against what we call key-independent chosen-plaintext attack, it suffices to argue that \(\mathsf {OAEP}[G,H]\) is a fooling extractor for any fixed encryption source \(X = (m,R)\) where \(m \in \{0,1\}^\mu \). The latter holds for any \(\varepsilon > 0\) and \(s \le \rho - 2 \log (1/\varepsilon ) + 2\) assuming G is only pairwise-independent (i.e., \(t = 2\)). See Appendix 8 for details.

Proof

(of Theorem 4.2) We now prove the above theorem.

Overview. We write \(\mathsf {OAEP}\) for \(\mathsf {OAEP}[G,H]\). The high-level idea for all three parts of the theorem is the same. Fix a lossy function \(f'\) with range size at most \(2^s\). We first show that for every fixed message \(m\in \{0,1\}^\mu \), with high probability (say \(1-\delta \)) over the choice of \(K_G\), the statistical distance between \(f'(\mathsf {OAEP}(m,R))\) and \(f'(U)\) is small (say \({\hat{\varepsilon }}\)). This aspect of the proof changes from part to part. We then take a union bound to show that the above holds for all messages over the same choice of \(K_G\) with probability at least \(1-2^\mu \delta \). This means that the statistical distance between \((K_G,f'(\mathsf {OAEP}(m,R)))\) and \((K_G, f'(U))\) is at most \(\varepsilon ={\hat{\varepsilon }}+2^{\mu }\delta \) for every message m. Finally, we express \(\delta \) as a function of \({\hat{\varepsilon }}\), and select \({\hat{\varepsilon }}\) to minimize this sum. Note that the entire argument works for any choice of H.

We first prove part (3) of the theorem, then part (2), and finally part (1).

Proof of part (3). To prove part (3) of the theorem, we strengthen the Crooked LHL of [26] to give the distinguisher access to the input of the fooling function as well as its output.

Lemma 4.5

(Augmented Crooked LHL.) Let \(h :{\mathcal {K}}\times A \rightarrow B\) be a pairwise-independent function and let \(g :A \times B \rightarrow S\) be a function. Let X be a random variable on A such that \(\mathrm {H}_\infty (X) \ge \lg |S| + 2\lg (1/{\hat{\varepsilon }}) - 2\) for some \({\hat{\varepsilon }} > 0\). Then

$$\begin{aligned} \Delta ((K, g(X,h(K,X))), (K, g(X,U))) \,\le \,{\hat{\varepsilon }} \,, \end{aligned}$$

where K is uniform on \({\mathcal {K}}\) and U is uniform on B, independent of K and X.

The proof, which extends the proof of the Crooked LHL given in [10], is in Appendix 1.
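Lemma 4.5 can also be checked exhaustively at toy scale. The sketch below uses illustrative choices of our own (not from the paper): \(A = B = {\mathbb {Z}}_P\) with the pairwise-independent family \(h_{a,b}(x) = ax + b \bmod P\), an arbitrary fixed g with \(|S| = 2\), and X uniform on A, so the min-entropy condition holds with equality at \({\hat{\varepsilon }} = \frac{1}{2}\sqrt{|S|/P}\):

```python
from itertools import product

P = 61        # A = B = Z_P; h_{a,b}(x) = a*x + b mod P is pairwise independent
S = 2         # size of g's range

def h(a, b, x):
    return (a * x + b) % P

def g(x, y):
    # an arbitrary fixed function g : A x B -> {0, ..., S-1}
    return (x * y + 3 * x + y) % P % S

# distribution of g(X, U) for X, U independent and uniform
q = [0.0] * S
for x, u in product(range(P), repeat=2):
    q[g(x, u)] += 1 / P**2

# Delta((K, g(X, h(K, X))), (K, g(X, U))), computed exactly over K = (a, b)
delta = 0.0
for a, b in product(range(P), repeat=2):
    p = [0.0] * S
    for x in range(P):
        p[g(x, h(a, b, x))] += 1 / P
    delta += sum(abs(pi - qi) for pi, qi in zip(p, q)) / (2 * P**2)

# smallest eps-hat permitted by H_inf(X) = log P in the lemma's condition
eps_hat = (S / P) ** 0.5 / 2
assert delta <= eps_hat
```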

Now we let G play the role of h in Lemma 4.5 and let \(\{0,1\}^\rho \) and \(\{0,1\}^\mu \) play the roles of A and B, respectively. Let g in the lemma be defined by \(g(a,b) = f'((m {\,\oplus \,}b) \Vert (a {\,\oplus \,}H(K_H, m {\,\oplus \,}b)))\) for arbitrary but fixed \(m \in \{0,1\}^\mu , K_H \in {\mathcal {K}}_H\). It follows that OAEP is an \((s,{\hat{\varepsilon }})\)-fooling extractor for every fixed encryption source X of the form (m, R). Part (3) of the theorem now follows by applying Markov’s inequality and taking a union bound over all such sources.

In more detail, let \(f'\) be any function from \(\{0,1\}^k\) to a set \({\mathcal {Y}}\) of size at most \(2^s\), and let \(X = (m,R)\) be any \((\mu + \rho ,\rho )\)-source, where \(m \in \{0,1\}^\mu \) is fixed and R is uniform over \(\{0,1\}^\rho \). Define the random variable \(Z_{K_G,K_H}\) to take value \(\Delta (f'(\pi _{k_G,k_H}(m \Vert R)), f'(U))\) for U uniform on \(\{0,1\}^k\), if \(K_G = k_G\) and \(K_H = k_H\), where here and in what follows the probability is over the random choices of \(K_G\) and \(K_H\) (although the distribution on \(K_H\) does not matter – we use only the fact that it is independent of \(m,R,K_G\)). Then applying Lemma 4.5 as explained above, we have \({\mathbf{E}}\left[ \, Z_{K_G,K_H} \,\right] \le \frac{1}{2} \sqrt{|{\mathcal {Y}}|\cdot 2^{-\rho }}\). Thus by Markov’s inequality

$$\begin{aligned} \Pr \left[ \, Z_{K_G,K_H} \ge {\hat{\varepsilon }} \,\right] \le \frac{\sqrt{2^{s-\rho }}}{2{\hat{\varepsilon }}} \end{aligned}$$

for any \({\hat{\varepsilon }} > 0\). By a union bound, the probability that \(Z_{K_G,K_H} < {\hat{\varepsilon }}\) holds simultaneously for all \(2^\mu \) possible \((\mu + \rho ,\rho )\)-sources \(X = (m,R)\) is at least \(1-\delta _{{\hat{\varepsilon }}}\), where

$$\begin{aligned} \delta _{{\hat{\varepsilon }}} = \frac{ 2^{\mu } \cdot \sqrt{2^{s-\rho }}}{2{\hat{\varepsilon }}}. \end{aligned}$$

It now follows (by a conditioning argument) that \(\mathsf {OAEP}\) is \((s,\varepsilon )\)-encryption-compatible with \(\varepsilon ={\hat{\varepsilon }}+\delta _{{\hat{\varepsilon }}}\). Note that \(\delta _{{\hat{\varepsilon }}}\) can be written in the form \( \gamma \cdot {\hat{\varepsilon }}^{-1}\) (where \(\gamma \) depends on \(\rho ,s,\mu \) but not \({\hat{\varepsilon }}\)). Setting \({\hat{\varepsilon }}=\gamma ^{1/2}\) yields \(\varepsilon \le 2 \gamma ^{1/2}\) and part (3) of the Theorem follows by observing that

$$\begin{aligned} u = -\log \varepsilon&\ge -\frac{1}{2} \cdot \log \gamma -1 \\&\ge -\frac{1}{2} \cdot \left( \mu + \frac{1}{2}(s-\rho ) \right) -1 \\&= (\rho - s - 2\mu )/4 - 1. \end{aligned}$$
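The choice \({\hat{\varepsilon }}=\gamma ^{1/2}\) is in fact exactly optimal: by the AM-GM inequality, \(x + \gamma /x \ge 2\sqrt{\gamma }\) with equality at \(x=\sqrt{\gamma }\). A quick numeric sanity check (the value of \(\gamma \) below is arbitrary, chosen for illustration):

```python
import math

gamma = 2.0 ** -40  # illustrative; gamma depends on rho, s, mu but not on eps-hat

def eps(e_hat):
    # epsilon = eps-hat + delta_{eps-hat}, with delta_{eps-hat} = gamma / eps-hat
    return e_hat + gamma / e_hat

best = math.sqrt(gamma)  # the choice made in the proof
assert eps(best) == 2 * math.sqrt(gamma)

# no nearby choice does better (AM-GM: x + gamma/x >= 2*sqrt(gamma))
for f in (0.25, 0.5, 0.9, 1.1, 2.0, 4.0):
    assert eps(best * f) >= eps(best)
```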

Proof of part (2). Instead of Markov’s inequality, the proof of part (2) of the theorem uses a stronger tail inequality for t-wise independent random variables, due to Bellare and Rompel [7] (our application was inspired by the use of t-wise independence by Trevisan and Vadhan [61] and Dodis, Sahai, and Smith [25]).

Let \(f'\) be any function on \(\{0,1\}^k \) to a set \({\mathcal {Y}}\) of size at most \(2^s\). For this part of the theorem, assume that \(f'\) is regular, that is, that each preimage set has size exactly \(2^{k-s}\). Let \(X = (m,R)\) be any \((\mu + \rho ,\rho )\)-source, where \(m \in \{0,1\}^\mu \) is fixed and R is uniform over \(\{0,1\}^\rho \). For each \(r \in \{0,1\}^\rho \) and \(y \in {\mathcal {Y}}\), define the random variable

$$\begin{aligned} Z_{r,y}= {\left\{ \begin{array}{ll} 2^{-\rho } &{} \text {if}\ f'(\pi _{K_G,K_H}(m \Vert r)) = y\, , \\ 0 &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$

where as before the probability is over the random choices of \(K_G\) and \(K_H\) (although, as before, the distribution on \(K_H\) does not matter – we use only the fact that it is independent of \(m,R,K_G\)). Let \(Z_y = \sum _r Z_{r,y}\). We claim that \({\mathbf{E}}\left[ \, Z_y \,\right] = 2^{-s}\). To see this, note that

$$\begin{aligned} {\mathbf{E}}\left[ \, Z_y \,\right] = \sum _r 2^{-\rho } \cdot \Pr \left[ \, f'(U \Vert r) = y \,\right] = \Pr \left[ \, f'(U \Vert R) = y \,\right] = 2^{-s} \end{aligned}$$

where we use the fact that R is uniform and \(f'\) is regular.

To bound the deviation of \(Z_y\) from its mean, note that for a fixed y, the variables \(\{Z_{r,y}\}_{r\in \{0,1\}^{\rho }}\) are t-wise independent (by the t-wise independence of G) and take values in \([0,2^{-\rho }]\). We can apply the following tail bound (modified from the original to apply to random variables in \([0,M]\) rather than \([0,1]\)).

Lemma 4.6

(Bellare and Rompel [7]) Let \(A_1, \ldots , A_n\) be t-wise independent random variables taking values in \([0,M]\). Let \(A = \sum _i A_i\) and \(\delta \le 1\). Then

$$\begin{aligned} \Pr \left[ \, |A - {\mathbf{E}}\left[ \, A \,\right] | \ge \delta \cdot {\mathbf{E}}\left[ \, A \,\right] \,\right] \le c_t \left( \frac{t\cdot M}{\delta ^2 \cdot {\mathbf{E}}\left[ \, A \,\right] }\right) ^{t/2} \end{aligned}$$

where \(c_t < 3\) and \(c_t < 1\) when \(t \ge 8\).

Setting \(\delta = 2 {\hat{\varepsilon }}\), we get that for every \(y\in {\mathcal {Y}}\),

$$\begin{aligned} \Pr \left[ \, |Z_y - 2^{-s}| \ge 2 {\hat{\varepsilon }} \cdot 2^{-s} \,\right] \le c_t \left( \frac{t}{4{\hat{\varepsilon }}^2 \cdot 2^{-s + \rho }}\right) ^{t/2} . \end{aligned}$$
(5)

By a union bound, the probability that there exists a \(y\in {\mathcal {Y}}\) such that \(|Z_y - 2^{-s}| \ge 2 {\hat{\varepsilon }} \cdot 2^{-s}\) is at most

$$\begin{aligned} 2^s c_t \left( \frac{t}{4{\hat{\varepsilon }}^2 \cdot 2^{-s+\rho }}\right) ^{t/2}. \end{aligned}$$

Observe that if \(|Z_y - 2^{-s}| < 2 {\hat{\varepsilon }} \cdot 2^{-s}\) for all \(y \in {\mathcal {Y}}\) then, letting Y denote the random variable \(f'(\pi _{K_G,K_H}(m, R))\), we have

$$\begin{aligned} \Delta ((K_G,K_H,Y), (K_G,K_H,f'(U))) \,\le \,\frac{1}{2} \sum _{y \in {\mathcal {Y}}} |Z_y - 2^{-s}| \,\le \,\sum _{y \in {\mathcal {Y}}} {\hat{\varepsilon }} \cdot 2^{-s} \,=\,{\hat{\varepsilon }} . \end{aligned}$$

By another union bound, the probability that the above holds simultaneously for all \(2^\mu \) possible \((\mu + \rho ,\rho )\)-sources \(X = (m,R)\) is at least \(1-\delta _{{\hat{\varepsilon }}}\), where

$$\begin{aligned} \delta _{{\hat{\varepsilon }}} =2^{\mu + s} c_t \left( \frac{t}{4{\hat{\varepsilon }}^2 \cdot 2^{-s +\rho }}\right) ^{t/2} . \end{aligned}$$
(6)

It now follows (by a conditioning argument) that \(\mathsf {OAEP}\) is \((s,\varepsilon )\)-encryption-compatible with \(\varepsilon ={\hat{\varepsilon }}+\delta _{{\hat{\varepsilon }}}\). Note that \(\delta _{{\hat{\varepsilon }}}\) can be written in the form \( \gamma \cdot {\hat{\varepsilon }}^{-t}\) (where \(\gamma \) depends on \(t,\rho ,s,\mu \) but not \({\hat{\varepsilon }}\)). Setting \({\hat{\varepsilon }}=\gamma ^{1/(t+1)}\) yields \(\varepsilon \le 2 \gamma ^{1/(t+1)}\) and part (2) of the Theorem follows by observing that

$$\begin{aligned} u = -\log \varepsilon&\ge -\frac{1}{t+1} \cdot \log \gamma -1 \\&= \frac{1}{t+1} \cdot \left( \frac{t}{2} (\rho -s-\log t +2) - \mu - s - \log c_t\right) -1 \\&\ge \frac{t}{2t+2} \cdot (\rho -s-\log t+2)-\frac{\mu +s+2}{t+1} -1. \end{aligned}$$

Proof of part (1). We now turn to proving the theorem for general (not necessarily regular) functions \(f'\). We first give a proof for approximately regular functions, in which no pre-image set is too small; we then show that this implies a bound for arbitrary functions.

Assume for now that \(\min _{y \in {\mathcal {Y}}} |\mathsf {preimg}_{f'}(y)| \ge \lambda \cdot 2^{k-s}\) for some real number \(0<\lambda \le 1\) (note that regularity corresponds to \(\lambda = 1\)), where \(\mathsf {preimg}_{f'}(y) = \{x \in \{0,1\}^k~|~f'(x) =y\}\). We sketch how to modify the proof of part (2) under this assumption; essentially, we end up with an extra factor of \(\lambda \) in the denominator of Eq. (6). We use the same definition of \(Z_y\) as in part (2). Instead of \({\mathbf{E}}\left[ \, Z_y \,\right] = 2^{-s}\), we now have \({\mathbf{E}}\left[ \, Z_y \,\right] = \Pr \left[ \, f'(U \Vert R) = y \,\right] = |\mathsf {preimg}_{f'}(y)|/2^k\). Thus, instead of Eq. (5), we have

$$\begin{aligned}&\Pr \left[ \, |Z_y - |\mathsf {preimg}_{f'}(y)|/2^k| \ge 2 {\hat{\varepsilon }} \cdot |\mathsf {preimg}_{f'}(y)|/2^k \,\right] \\&\quad \,\le \,c_t \left( \frac{t}{4{\hat{\varepsilon }}^2 \cdot |\mathsf {preimg}_{f'}(y)|/2^k \cdot 2^{\rho }}\right) ^{t/2}. \end{aligned}$$

Using \(\min _{y \in {\mathcal {Y}}} |\mathsf {preimg}_{f'}(y)| \ge \lambda \cdot 2^{k-s}\) and taking a union bound, we get that the probability that there exists \(y\in {\mathcal {Y}}\) such that

$$\begin{aligned} |Z_y - |\mathsf {preimg}_{f'}(y)|/2^k| \ge 2 {\hat{\varepsilon }} \cdot |\mathsf {preimg}_{f'}(y)|/2^k \end{aligned}$$
(7)

is at most

$$\begin{aligned} 2^s c_t \left( \frac{t}{4{\hat{\varepsilon }}^2 \cdot \lambda \cdot 2^{-s} \cdot 2^{\rho }}\right) ^{t/2}. \end{aligned}$$
(8)

We can obtain a bound for arbitrary functions \(f'\) by noting that every function \(f'\) is “close” to a function with no small pre-images. Specifically:

Claim 4.7

Let \(f' :\{0,1\}^k \rightarrow {\mathcal {Y}}\), where \(|{\mathcal {Y}}| \le 2^s\), be a function. For any real number \(\lambda >0\), there exists a function \(g' :\{0,1\}^k \rightarrow {\mathcal {Y}}\) such that (i) \(\min _{y \in {\mathcal {Y}}} |\mathsf {preimg}_{g'}(y)| \ge \lambda \cdot 2^{k-s}\); and (ii) the function \(g'\) agrees with \(f'\) on at least a \(1-\lambda \) fraction of its domain. In particular, \(\Delta (f'(U),g'(U)) \le \lambda \).

We can now prove part (1) of the Theorem from Eq. (8) by choosing \(\lambda = {\hat{\varepsilon }}\) in the claim and then completing the analysis as in part (2). It remains to prove the claim.

Proof (of Claim 4.7): The idea is that we take all the small pre-image sets of \(f'\) and merge them into some large pre-image set (e.g., if 0 has a large pre-image set, then for all elements x such that \(\mathsf {preimg}_{f'}(f'(x))\) is small, we set \(g'(x)=0\)). How many elements can belong to small pre-image sets? There are at most \(2^s\) pre-image sets, each small one containing fewer than \(\lambda \cdot {2^{k-s}}\) elements. So there are at most \(\lambda \cdot 2^k\) elements of the domain on which \(f'\) has to be changed.\(\square \)
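The merging step can be sketched directly (toy parameters of our choosing; note that merged classes leave the image of \(g'\), so the check of property (i) below is over nonempty pre-image sets, which is what the union bound over the \(Z_y\) uses):

```python
import random

random.seed(1)
K_BITS, S_BITS, lam = 10, 3, 0.25        # toy domain {0,1}^10, |Y| = 2^3, lambda = 1/4
DOM, RNG = 1 << K_BITS, 1 << S_BITS
threshold = lam * DOM / RNG              # lambda * 2^{k-s}

# a skewed (far from regular) function f' : {0,1}^k -> Y
f = [min(RNG - 1, int(random.expovariate(1.0) * 2)) for _ in range(DOM)]

def preimg(func, y):
    return [x for x in range(DOM) if func[x] == y]

# merge every small pre-image set into one large class, as in the proof
big_y = max(range(RNG), key=lambda y: len(preimg(f, y)))
small = {y for y in range(RNG) if len(preimg(f, y)) < threshold}
g = [big_y if fx in small else fx for fx in f]

# (i) every nonempty pre-image set of g' is large ...
assert all(len(preimg(g, y)) >= threshold for y in set(g))
# (ii) ... and g' agrees with f' on at least a 1 - lambda fraction of the domain
agree = sum(fx == gx for fx, gx in zip(f, g)) / DOM
assert agree >= 1 - lam
```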

This concludes the proof of the Theorem.

5 Lossiness of RSA

In this section, we show that the RSA trapdoor permutation is lossy under reasonable assumptions. In particular, we show that, for a large enough encryption exponent e, RSA is considerably lossy under the \(\Phi \)-Hiding Assumption of [16]. We then show that by generalizing this assumption to multi-prime RSA we can get even more lossiness. Finally, we propose a “2-vs-m Primes” Assumption that, when combined with the former, amplifies the lossiness of standard (two-prime) RSA for small e.

5.1 Background on RSA and Notation

We denote by \({\mathcal {RSA}}_k\) the set of all tuples (N, p, q) such that \(N=pq\) is the product of two distinct k/2-bit primes. Such an N is called an RSA modulus. By \((N,p,q) \leftarrow {\mathcal {RSA}}_k\) we mean that (N, p, q) is sampled according to the uniform distribution on \({\mathcal {RSA}}_k\). An RSA TDP generator [53] is an algorithm \({\mathcal {F}}\) that returns (N, e), (N, d), where N is an RSA modulus and \(ed \equiv 1 \pmod {\phi (N)}\). (Here \(\phi (\cdot )\) denotes Euler’s totient function, so in particular \(\phi (N) = (p-1)(q-1)\).) The tuple (N, e) defines the permutation on \({{\mathbb {Z}}}_N^*\) given by \(f(x)=x^e \bmod N\), and similarly (N, d) defines its inverse. We say that a lossy TDP generator \(\mathsf {LTDP}= ({\mathcal {F}}, {\mathcal {F}}')\) is an RSA LTDP if \({\mathcal {F}}\) is an RSA TDP generator.
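As a toy illustration of this notation (small primes standing in for k/2-bit ones):

```python
from math import gcd

# toy RSA TDP generator: N = p*q, e*d = 1 mod phi(N)
p, q = 1009, 1013            # toy stand-ins for two distinct k/2-bit primes
N, phi = p * q, (p - 1) * (q - 1)
e = 17
assert gcd(e, phi) == 1      # e must be invertible mod phi(N)
d = pow(e, -1, phi)          # modular inverse (Python 3.8+)

def f(x):                    # the permutation on Z_N^* defined by (N, e)
    return pow(x, e, N)

def f_inv(y):                # its inverse, defined by (N, d)
    return pow(y, d, N)

for x in (2, 3, 123456, N - 2):
    assert gcd(x, N) == 1 and f_inv(f(x)) == x
```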

To define the \(\Phi \)-Hiding Assumption and later some extensions of it, the following notation is also useful. For \(i \in {\mathbb {N}}\) we denote by \({\mathcal {P}}_i\) the set of all i-bit primes. Let R be a relation on p and q. By \({\mathcal {RSA}}_k[R]\) we denote the subset of \({\mathcal {RSA}}_k\) for which the relation R holds on p and q. For example, let e be a prime. Then \({\mathcal {RSA}}_k[p=1 \bmod e]\) is the set of all (N, p, q), where \(N=pq\) is the product of two distinct k/2-bit primes p, q and \(p=1 \bmod e\). That is, the relation R(p, q) is true if \(p=1 \bmod e\) and q is arbitrary. By \((N,p,q) \leftarrow {\mathcal {RSA}}_k[R]\) we mean that (N, p, q) is sampled according to the uniform distribution on \({\mathcal {RSA}}_k[R]\).

5.2 RSA Lossy TDP from \(\Phi \)-Hiding

\(\Phi \)-Hiding Assumption (\(\Phi \)A). We recall the \(\Phi \)-Hiding Assumption of [16]. For an RSA modulus N, we say that N \(\phi \)-hides a prime e if \(e~|~\phi (N)\). Intuitively, the assumption is that, given an RSA modulus N, it is hard to distinguish primes which are \(\phi \)-hidden by N from those that are not. Formally, let \(0<c < 1/2\) be a (public) constant determined later. Consider the following two distributions:

To a distinguisher D, we associate its \(\Phi \)A advantage defined as

$$\begin{aligned} \mathbf {Adv}^{\Phi \mathrm {A}}_{c,D}(k) \,=\,\Pr \left[ \, D({\mathcal {R}}_1) \,{\Rightarrow }\,1 \,\right] - \Pr \left[ \, D({\mathcal {L}}_1) \,{\Rightarrow }\,1 \,\right] . \end{aligned}$$

As shown in [16], distributions \({\mathcal {R}}_1, {\mathcal {L}}_1\) can be sampled efficiently assuming the widely accepted Extended Riemann Hypothesis (as we need a density estimate on the number of primes of a particular form).

RSA LTDP from \(\Phi \)A. We construct an RSA LTDP based on \(\Phi \)A. In injective mode the public key is (N, e) where e is not \(\phi \)-hidden by N, whereas in lossy mode it is. Namely, define \(\mathsf {LTDP}_1 = ({\mathcal {F}}_1, {\mathcal {F}}'_1)\) as follows:

(Pseudocode for \({\mathcal {F}}_1\) and \({\mathcal {F}}'_1\) appears in the corresponding figure.)

The fact that algorithm \({\mathcal {F}}_1\) has only a very small probability of failure (returning \(\bot \)) follows from Bertrand’s Postulate and the fact that \(\phi (N)\) can have only a constant number of prime factors of length ck.
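The lossiness exploited here is elementary to observe at toy scale: when N \(\phi \)-hides e, the e-th power map on \({{\mathbb {Z}}}_N^*\) is e-to-1. A sketch with illustrative small primes (an arithmetic fact only, not the actual generator \({\mathcal {F}}_1\)):

```python
from math import gcd

e = 5                        # the phi-hidden prime (toy values throughout)
p, q = 11, 7                 # p = 1 mod e, so N phi-hides e; gcd(e, q - 1) = 1
N, phi = p * q, (p - 1) * (q - 1)
assert (p - 1) % e == 0 and gcd(e, q - 1) == 1

domain = [x for x in range(1, N) if gcd(x, N) == 1]
image = {pow(x, e, N) for x in domain}

# in lossy mode, x -> x^e mod N is e-to-1 on Z_N^*: the image shrinks by a factor of e
assert len(domain) == phi
assert len(image) == phi // e    # log2(e) bits of lossiness (ck bits in general)
```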

Proposition 5.1

Suppose there is a distinguisher D against \(\mathsf {LTDP}_1\). Then there is a distinguisher \(D'\) such that for all \(k\in {\mathbb {N}}\)

$$\begin{aligned} \mathbf {Adv}^{\mathrm {ltdp}}_{\mathsf {LTDP}_1,D}(k) \,\le \,\mathbf {Adv}^{\Phi \mathrm {A}}_{c,D'}(k). \end{aligned}$$

Furthermore, the running-time of \(D'\) is that of D. \(\mathsf {LTDP}_1\) has lossiness ck.

The proof is straightforward.

From a practical perspective, a drawback of \(\mathsf {LTDP}_1\) is that \({\mathcal {F}}_1\) chooses \(N = pq\) in a non-standard way, so that it hides a prime of the same length as e. Moreover, for small values of e it returns \(\bot \) with high probability. This is done for consistency with how [16] formulated \(\Phi \)A. To address this, we also propose what we call the Enhanced \(\Phi \)A (E\(\Phi \)A), which says that an N generated in this non-standard way (i.e., by \({\mathcal {F}}_1\)) is indistinguishable from one chosen at random subject to \(\gcd (e,\phi (N)) = 1\). We conjecture that E\(\Phi \)A holds for all values of c for which \(\Phi \)A does. Details follow.

Enhanced \(\Phi \)-Hiding Assumption. We say that the Enhanced \(\Phi \)-Hiding Assumption (E\(\Phi \)A) holds for c if the following two distributions \({\mathcal {R}}_{1^*}\) and \({\mathcal {L}}_{1^*}\) are computationally indistinguishable:

To a distinguisher D, we associate its E\(\Phi \)A advantage defined as

$$\begin{aligned} \mathbf {Adv}^{\mathrm {E}\Phi \mathrm {A}}_{c,D}(k) \,=\,\Pr \left[ \, D({\mathcal {R}}_{1^*}) \,{\Rightarrow }\,1 \,\right] - \Pr \left[ \, D({\mathcal {L}}_{1^*}) \,{\Rightarrow }\,1 \,\right] . \end{aligned}$$

As before, distributions \({\mathcal {R}}_{1^*}, {\mathcal {L}}_{1^*}\) can be sampled efficiently assuming the widely accepted Extended Riemann Hypothesis. We conjecture that E\(\Phi \)A holds for all values of c for which \(\Phi \)A does.

RSA LTDP from E\(\Phi \)A. Now define \(\mathsf {LTDP}_{1^*} = ({\mathcal {F}}_{1^*}, {\mathcal {F}}'_{1^*})\) where

(Pseudocode for \({\mathcal {F}}_{1^*}\) appears in the corresponding figure.)

and \({\mathcal {F}}'_{1^*} = {\mathcal {F}}'_1\) as in Sect. 5.2. Again, the probability that \({\mathcal {F}}_{1^*}\) returns \(\bot \) is very small. We stress that \({\mathcal {F}}_{1^*}\), unlike \({\mathcal {F}}_1\), chooses p, q at random as is typical in practice. We have the following proposition.

Proposition 5.2

If the Enhanced \(\Phi \)-Hiding Assumption holds for c, then \(\mathsf {LTDP}_{1^*}=({\mathcal {F}}_{1^*}, {\mathcal {F}}'_{1^*})\) is an RSA LTDP with lossiness ck. In particular, suppose there is a distinguisher D against \(\mathsf {LTDP}_{1^*}\). Then there is a distinguisher \(D'\) such that

$$\begin{aligned} \mathbf {Adv}^{\mathrm {ltdp}}_{\mathsf {LTDP}_{1^*},D}(k) \,\le \,\mathbf {Adv}^{\mathrm {E}\Phi \mathrm {A}}_{c,D'}(k). \end{aligned}$$

Furthermore, the running-time of \(D'\) is that of D.

Again, the proof is straightforward.

Parameters for \(\mathsf {LTDP}_1\). When e is too large, \(\Phi \)A can be broken by using Coppersmith’s method for finding small roots of a univariate polynomial modulo an unknown divisor of N [21, 43]. Namely, consider the polynomial \(r(x) = e x + 1 \bmod p\). Coppersmith’s method allows us to find all roots of r smaller than \(N^{1/4}\), and thus factor N, in lossy mode in polynomial time if \(c \ge 1/4\). (This is essentially the “factoring with high bits known” attack.) More specifically, applying [43, Theorem 1], N can be factored in time \(O(N^\varepsilon ) \cdot \mathrm{poly}(\log N)\) if \(c = 1/4 - \varepsilon \) (i.e., \(\log e \ge \log N(1/4-\varepsilon )\)). For example, with modulus size \(k = 2048\), we can set \(\varepsilon = .04\) for 80-bit security (to enforce \(k \varepsilon \ge 80\)) and obtain \(2048 \cdot (1/4-0.04)=430\) bits of lossiness.

5.3 RSA Lossy TDP from Multi-prime \(\Phi \)-Hiding

Multi-prime RSA (according to [41] the earliest reference is [54]) is a generalization of RSA to moduli \(N = p_1 \cdots p_m\) of length k with \(m \ge 2\) prime factors of equal bit-length. Multi-prime RSA is of interest to practitioners since it allows one to speed up decryption and is included in RSA PKCS #1 v2.1. We are interested in it here because we can show greater lossiness for it, in particular with a smaller encryption exponent e.

Notation and terminology. Let \(m \ge 2\) be fixed. We denote by \({\mathcal {MRSA}}_k\) the set of all tuples \((N,p_1,\ldots ,p_m)\), where \(N=p_1\cdots p_m\) is the product of distinct k/m-bit primes. Such an N is called an m-prime RSA modulus. By \((N,p_1,\ldots ,p_m) \leftarrow {\mathcal {MRSA}}_k\) we mean that \((N,p_1,\ldots ,p_m)\) is sampled according to the uniform distribution on \({\mathcal {MRSA}}_k\). The rest of the notation and terminology of Sect. 5 is extended to the multi-prime setting in the obvious way.

Multi \(\Phi \)-Hiding Assumption. For an m-prime RSA modulus N, we say that N \(m\phi \)-hides a prime e if \(e~|~p_i-1\) for all \(1 \le i \le m-1\). Intuitively, the assumption is that, given such an N, it is hard to distinguish primes which are \(m\phi \)-hidden by N from those that do not divide \(p_i-1\) for any \(1 \le i \le m\). Formally, let \(m = m(k) \ge 2\) be a polynomial and let \(c = c(k)\) be an inverse polynomial determined later. Consider the following two distributions:

Above and in what follows, by \(p_{i \le m-1} = 1 \bmod e\) we mean that \(p_i = 1 \bmod e\) for all \(1 \le i \le m-1\). To a distinguisher D, we associate its M\(\Phi \)A advantage defined as

$$\begin{aligned} \mathbf {Adv}^{\mathrm {M}\Phi \mathrm {A}}_{m,c,D}(k) \,=\,\Pr \left[ \, D({\mathcal {R}}_2) \,{\Rightarrow }\,1 \,\right] - \Pr \left[ \, D({\mathcal {L}}_2) \,{\Rightarrow }\,1 \,\right] . \end{aligned}$$

As before, distributions \({\mathcal {R}}_2, {\mathcal {L}}_2\) can be sampled efficiently assuming the widely accepted Extended Riemann Hypothesis.

Note that if we had required that in the lossy case \(N = p_1 \cdots p_m\) is such that \(e~|~p_i-1\) for all \(1 \le i \le m\), then we would always have \(N = 1 \bmod e\). But in the injective case \(N \bmod e\) is random, which would lead to a trivial distinguishing algorithm. This explains why we do not impose \(e~|~p_m-1\) in the lossy case above.
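The same toy-scale check as before, now for the multi-prime case with \(m = 3\): e divides \(p_1-1\) and \(p_2-1\) but not \(p_3-1\), so the e-th power map is \(e^{m-1}\)-to-1 while \(N \bmod e\) stays away from 1 (illustrative small primes of our choosing):

```python
from math import gcd

e, m = 5, 3                        # prime e, m-prime toy modulus
p1, p2, p3 = 11, 31, 7             # e | p1 - 1 and e | p2 - 1, but e does not divide p3 - 1
N = p1 * p2 * p3
phi = (p1 - 1) * (p2 - 1) * (p3 - 1)

domain = [x for x in range(1, N) if gcd(x, N) == 1]
image = {pow(x, e, N) for x in domain}

# x -> x^e mod N is e^{m-1}-to-1, i.e., (m-1)*log2(e) bits of lossiness
assert len(image) == phi // e ** (m - 1)
# leaving p3 unconstrained keeps N mod e from always equalling 1
assert N % e != 1
```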

Multi-prime RSA LTDP from M\(\Phi \)A. We construct a multi-prime RSA LTDP based on M\(\Phi \)A having lossiness \((m-1) \log e\), where in lossy mode N \(m\phi \)-hides e. Namely, define \(\mathsf {LTDP}_2 = ({\mathcal {F}}_2, {\mathcal {F}}'_2)\) as follows:

(Pseudocode for \({\mathcal {F}}_2\) and \({\mathcal {F}}'_2\) appears in the corresponding figure.)

Proposition 5.3

Suppose there is a distinguisher D against \(\mathsf {LTDP}_2\). Then there is a distinguisher \(D'\) such that for all \(k \in {\mathbb {N}}\)

$$\begin{aligned} \mathbf {Adv}^{\mathrm {ltdp}}_{\mathsf {LTDP}_2,D}(k) \,\le \,\mathbf {Adv}^{\mathrm {M}\Phi \mathrm {A}}_{m,c,D'}(k). \end{aligned}$$

Furthermore, the running-time of \(D'\) is that of D. \(\mathsf {LTDP}_2\) has lossiness \((m-1)ck\).

The proof is straightforward.

Parameters for \(\mathsf {LTDP}_2\). Using [35, Section 3] we can break M\(\Phi \)A in time \(O(N^{\varepsilon }) \cdot \mathrm{poly}(\log N)\) if

$$\begin{aligned} c \ge 1/m - \frac{2}{3 \sqrt{m^3}} -\varepsilon . \end{aligned}$$

For \(m \ge 3\) this improves on the bound \(c \ge 1/m-1/m^2-\varepsilon \) obtained from “factoring with high bits known”; for \(m\ge 4\) it improves on the bound \(c \ge 1/m- 2\frac{ (1/m)^{1/(m-1)} - (1/m)^{m/(m-1)}}{m(m-1)}-\varepsilon \) from the preliminary version [37]. We also note that Tosu and Kunihiro [60] showed a bound \(c \ge 1/m - \frac{2}{em \log (m+1)}\), where e is the base of the natural logarithm, which is better than [35] for \(m \ge 6\) (see [60, Section 4.4] for a comparison).

For example, with modulus size \(k=2048\) and \(m=3\) (\(m=4,5\)) we set \(\varepsilon = .04\) (for about 80-bit security) and obtain 676 (778, 822) bits of lossiness for \(\mathsf {LTDP}_2\), according to Proposition 5.3.
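A quick arithmetic check of these figures against the lossiness \((m-1)ck\) of Proposition 5.3, taking c at the bound just stated with \(\varepsilon = .04\):

```python
# lossiness (m-1)*c*k from Proposition 5.3, with c = 1/m - 2/(3*sqrt(m^3)) - eps
k, eps = 2048, 0.04
for m, expected in [(3, 676), (4, 778), (5, 822)]:
    c = 1 / m - 2 / (3 * m ** 1.5) - eps
    assert round((m - 1) * c * k) == expected
```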

5.4 Small-Exponent RSA LTDP from 2-vs-m Primes

For efficiency reasons, the public RSA exponent e is typically not chosen to be too large in practice. (For example, researchers at UC San Diego [63] found that 99.5% of the certificates in the campus’s TLS corpus had \(e = 2^{16} +1\).) Therefore, we investigate the possibility of using an additional assumption to “amplify” the lossiness of RSA for small e.

Our high-level idea is to assume that it is hard to distinguish \(N = pq\), where p, q are primes of length k/2, from \(N = p_1 \cdots p_m\) for \(m > 2\), where \(p_1, \ldots , p_m\) are primes of length k/m (which we call the “2-vs-m Primes” Assumption). This assumption is a generalization of the “2-vs-3 Primes” Assumption introduced in [8] and used independently to construct a “slightly lossy” TDF based on modular squaring [45]. Combined with the M\(\Phi \)A Assumption of Sect. 5.3, we obtain \((m-1) \log e\) bits of lossiness from standard (two-prime) RSA. Let us state our assumption and construction formally.

2-vs-m Primes Assumption. We say that the 2-vs-m Primes Assumption holds for m if the following two distributions \({\mathcal {N}}_2\) and \({\mathcal {N}}_m\) are computationally indistinguishable:

To a distinguisher D, we associate its 2-vs-m-primes advantage defined as

$$\begin{aligned} \mathbf {Adv}^{\mathrm {2vmp}}_m(D) \,=\,\Pr \left[ \, D({\mathcal {N}}_2) \,{\Rightarrow }\,1 \,\right] - \Pr \left[ \, D({\mathcal {N}}_m) \,{\Rightarrow }\,1 \,\right] . \end{aligned}$$

RSA LTDP from 2-vs-m Primes + M\(\Phi \)A. Define \(\mathsf {LTDP}_3 = ({\mathcal {F}}_3, {\mathcal {F}}'_3)\) as follows:

(Pseudocode for \({\mathcal {F}}_3\) and \({\mathcal {F}}'_3\) appears in the corresponding figure.)

Proposition 5.4

If the 2-vs-m Primes Assumption holds for m and the Multi-Prime \(\Phi \)-Hiding Assumption holds for m, e, then \(\mathsf {LTDP}_3=({\mathcal {F}}_3, {\mathcal {F}}'_3)\) is an RSA LTDP with lossiness \((m-1)ck\). In particular, suppose there is a distinguisher D against \(\mathsf {LTDP}_3\). Then there are distinguishers \(D_1, D_2\) such that

$$\begin{aligned} \mathbf {Adv}^{\mathrm {ltdp}}_{\mathsf {LTDP}_3}(D) \,\le \,\mathbf {Adv}^{\mathrm {2vmp}}_m(D_1) + \mathbf {Adv}^{\mathrm {M}\Phi \mathrm {A}}_{m,c}(D_2). \end{aligned}$$

Furthermore, the running-time of \(D_1,D_2\) is that of D.

Again, the proof is straightforward.

Parameters for \(\mathsf {LTDP}_3\). We note that m in the construction cannot be too large; otherwise, a small factor of N in the lossy case can be recovered by the elliptic curve factoring method due to Lenstra [41], whose running-time depends on the size of the smallest factor of N. The largest factor recovered by the method so far is 223 bits in length [64]. Thus, for example using 2048-bit RSA with \(e = 2^{16}+1\), if we assume it is hard to recover factors larger than that, we can get \(8 \cdot 16 = 128\) bits of lossiness under the 2-vs-m Primes plus M\(\Phi \)A Assumptions with \(m = 9\).

Enhanced 2-vs-m Primes Assumption. As in the previous cases, to address the fact that in practice \(N = pq\) is chosen at random and not subject to p hiding a prime of the same bit-length as e, we may define an enhanced version of the 2-vs-m Primes Assumption. Then, under the enhanced 2-vs-m Primes + enhanced M\(\Phi \)A assumptions, we obtain the same amount of lossiness for standard 2-prime RSA.

6 Instantiating RSA-OAEP

By combining the results of Sects. 34, and 5, we obtain standard model instantiations of RSA-OAEP under chosen-plaintext attack.

Regularity. In particular, we would like to apply part (2) of Theorem 4.2 in this case, as it is not hard to see that under all of the assumptions discussed in Sect. 5, RSA is a regular lossy TDP on the domain \({{\mathbb {Z}}}_N^*\). Unfortunately, this domain is different from \(\{0,1\}^{\rho + \mu }\) (identified with a set of integers), the range of OAEP. In RSA PKCS #1 v2.1, the mismatch is handled by selecting \(\rho +\mu = \lfloor \log N\rfloor - 16\), and viewing OAEP’s output as an integer less than \(2^{\rho +\mu }<N/2^{16}\) (i.e., the most significant two bytes of the output are zeroed out). The problem is that in the lossy case RSA may not be regular on the subdomain \(\{0,\ldots ,2^{\rho +\mu }-1 \}\) (although this has been proven in subsequent work; see below). So, we just detail the weaker parameters given by part (1) of Theorem 4.2 here.

Concrete parameters. Since the results in Sect. 5 have several cases and the parameter settings are rather involved, we avoid stating an explicit theorem about RSA-OAEP. If we use part (1) of Theorem 4.2, one can see that for \(u = 80\) bits of security, messages of roughly \(\mu \approx k-s -3\cdot 80\) bits can be encrypted (for sufficiently large t). For concreteness, we give two example parameter settings. Using the Multi \(\Phi \)-Hiding Assumption with \(k=1024\) bits and 3 primes, we obtain \(\ell =k-s=291\) bits of lossiness and hence can encrypt messages of length \(\mu = 40\) bits (for \(t \approx 400\)). Using the \(\Phi \)-Hiding Assumption with \(k=2048\), we obtain \(\ell =k-s=430\) bits of lossiness and hence can encrypt messages of length \(\mu = 160\) bits (for \(t \approx 150\)).
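As a sanity check on these two examples, the bound of part (1) of Theorem 4.2 can be evaluated directly, with \(\rho = k - \mu \) and \(s = k - \ell \); both settings come out at roughly 79 bits, i.e., at about the claimed 80-bit level:

```python
import math

def u_part1(rho, s, mu, t):
    # part (1) of Theorem 4.2: u = t/(3t+2)*(rho - s - log t + 2) - 2(mu + s)/(3t+2) - 1
    return t / (3 * t + 2) * (rho - s - math.log2(t) + 2) - 2 * (mu + s) / (3 * t + 2) - 1

# Multi Phi-Hiding, k = 1024, 3 primes: lossiness l = 291, mu = 40, t ~ 400
assert u_part1(1024 - 40, 1024 - 291, 40, 400) > 78
# Phi-Hiding, k = 2048: lossiness l = 430, mu = 160, t ~ 150
assert u_part1(2048 - 160, 2048 - 430, 160, 150) > 78
```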

Subsequent improvements. The approximate regularity of RSA on the above subdomain (and, more generally, on arithmetic progressions of sufficient length) has subsequently been shown by Lewko et al. [42]. This allows us to obtain essentially the better parameters given by part (2) of Theorem 4.2. For example, using the \(\Phi \)-Hiding Assumption with \(k=2048\), we can encrypt messages of length 274 bits (see [42, Section 5.3]).