
1 Introduction

Motivated by the design of the DES block cipher, Luby and Rackoff [LR88] formally analyzed a paradigm for constructing a pseudorandom permutation (PRP) from a pseudorandom function (PRF). However, the opposite direction is more popular in practice because block ciphers (modeled as pseudorandom permutations) are widely available, so pseudorandom functions are traditionally built from block ciphers. A straightforward application of the classical PRP-PRF switch [Sho04] gives security up to the birthday bound. However, for lightweight block ciphers [BPP+17, BKL+07] this bound may not be adequate. For example, a birthday bound secure PRF construction based on DES (a 64-bit block cipher) may be broken with approximately \(2^{32}\) blocks of data. In fact, Bhargavan and Leurent [BL16] performed practical attacks on TLS and OpenVPN when a 64-bit block cipher is used. To resist such attacks, several beyond birthday bound secure constructions have been proposed. These include popular constructions such as sum of permutations (SoP for short) [HWKS98, Pat08, DHT17, BN18b], truncation of permutation [HWKS98, BN18a], EDM type constructions [CS16, CS18], Sum-ECBC [Yas10], Pmac_Plus [Yas11], 3Kf9 [ZWSW12], DbHtS [DDNP18] and 1kPmac_Plus [DDN+17a].

Apart from block ciphers, the recent trend of using an ideal (unkeyed) permutation has also motivated several pseudorandom functions built from ideal permutations. Sponge-based PRFs [BDPVA11b, CDH+12, BDPVA11a, ADMVA15] and Farfalle [BDH+17] are two such examples. Recently, Chen et al. at CRYPTO 2019 [CLM19] considered permutation-based versions of SoP and EDM-dual. Depending on the choice of the keys and the permutations, some of these constructions provide birthday bound security, while others achieve security beyond the birthday bound. The authors also claimed tight security by showing matching attacks.

1.1 Some Beyond Birthday Bound Constructions

Most of the constructions mentioned above are sequential in nature. Some of them can be viewed as a composition of two simpler constructions. For a permutation \(\pi \), we denote \(\pi (x) \oplus x\) as \(\pi ^{\oplus }(x)\) (this is known as the Davies-Meyer function, which has been used to define hash functions in the public permutation setting). Let \(\pi _1\) and \(\pi _2\) be two independent keyed random permutations over \(\{0,1\}^n\).

EDM and Its Dual. For a message \(m \in \{0,1\}^n\), we define

$$\begin{aligned} \mathsf {EDM}(m)&= \pi _2(\pi _1^{\oplus }(m)) \end{aligned}$$
(1)

In other words, EDM (encrypted Davies-Meyer) is the composition \(\pi _2 \circ \pi _1^{\oplus }\). Here \(\pi _1\) and \(\pi _2\) are two independently keyed block ciphers (or random permutations). The dual version of EDM (denoted as EDMD) is defined as the composition in the other direction:

$$\begin{aligned} \mathsf {EDMD}(m)&=\pi _1^{\oplus } \big ( \pi _2 (m) \big ). \end{aligned}$$

In [CS16, CS18] it has been proved that EDM is PRF secure up to \(2^{2n/3}\) queries (i.e. 2n/3-bit secure). Later, at CRYPTO 2017 [DHT17], the security of EDM was shown to be at least 3n/4 bits using the \(\chi ^2\)-method. Independently, Mennink and Neves [MN17] proved that EDM and EDMD have n-bit PRF security using a generalized version of Patarin’s mirror theory [Pat08]. However, the proofs of mirror theory are extremely sketchy and contain several unverified gaps.

EWCDM and Its Dual. The previous constructions can only process n-bit messages. With the help of a universal hash \(\mathcal {H}\), one can extend the message space using the Wegman-Carter paradigm [WC81]. We now recall the construction EWCDM [CS16] and its dual version EWCDMD [MN17] (see Fig. 1). For a nonce \(\nu \in \{0,1\}^n\) (which should be fresh for every execution of the MAC) and a message \(m \in \mathcal {M}\), we define

$$\begin{aligned} \mathsf {EWCDM}(\nu , m)&= \pi _2(\pi _1^{\oplus }(\nu ) \oplus \mathcal {H}(m)) \end{aligned}$$
(2)
$$\begin{aligned} \mathsf {EWCDMD}(\nu , m)&= \pi _2^{\oplus }(\pi _1(\nu ) \oplus \mathcal {H}(m)) \end{aligned}$$
(3)
Fig. 1. EWCDMD: Wegman-Carter followed by Davies-Meyer.

In [CS16], Cogliati and Seurin proved 2n/3-bit PRF (pseudorandom function) and MAC (message authentication code) security for EWCDM in the nonce respecting model.

SoKAC21. So far we have considered constructions based on secret keyed primitives. Very recently, Chen et al. at CRYPTO 2019 [CLM19] proposed a pseudorandom function, called SoKAC21 (see Fig. 2), based on ideal public permutations. It is designed for a small message space and is claimed to achieve beyond birthday bound security. For an n-bit message m, two ideal permutations \(\pi ^{\textsf {pub}}_1, \pi ^{\textsf {pub}}_2\), and an n-bit secret key K, we define

$$\begin{aligned} \mathsf {SoKAC21}(K, m)&= \pi ^{\textsf {pub}}_2\big (\pi ^{\textsf {pub}}_1(m \oplus K) \oplus K \big ) \oplus \pi ^{\textsf {pub}}_1(m \oplus K) \oplus K \end{aligned}$$
(4)
Fig. 2. SoKAC21: Sum of Key Alternating Cipher with a single key.

This construction can be viewed as a composition of Even-Mansour followed by Davies-Meyer. We note that an equivalent view of the above construction (which explains the name sum of key alternating ciphers) is \(\pi ^{\textsf {pub}}_2(v \oplus K) \oplus \pi ^{\textsf {pub}}_1(m \oplus K) \oplus K\) where \(v =\pi ^{\textsf {pub}}_1(m \oplus K)\).
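To make the two views of Eq. 4 concrete, the following is a minimal sketch on a toy block size. The helper names (`random_permutation`, `sokac21`, `sokac21_sum_view`), the 8-bit block size and the fixed seed are illustrative assumptions and are not taken from [CLM19]; the permutations are stored as explicit lookup tables, which is only feasible for very small n.

```python
import random

def random_permutation(n_bits, rng):
    """Return a uniformly random permutation of {0, ..., 2^n - 1} as a table."""
    table = list(range(1 << n_bits))
    rng.shuffle(table)
    return table

def sokac21(pi1, pi2, key, m):
    """Eq. 4: Even-Mansour on pi1 followed by Davies-Meyer on pi2."""
    x = pi1[m ^ key] ^ key      # Even-Mansour layer with the single key K
    return pi2[x] ^ x           # Davies-Meyer layer

def sokac21_sum_view(pi1, pi2, key, m):
    """Equivalent view: pi2(v ^ K) ^ pi1(m ^ K) ^ K with v = pi1(m ^ K)."""
    v = pi1[m ^ key]
    return pi2[v ^ key] ^ v ^ key

rng = random.Random(2020)       # fixed seed, for illustration only
n_bits = 8                      # toy block size (assumption)
pi1 = random_permutation(n_bits, rng)
pi2 = random_permutation(n_bits, rng)
key = rng.randrange(1 << n_bits)

# The two descriptions agree on every n-bit message.
views_agree = all(
    sokac21(pi1, pi2, key, m) == sokac21_sum_view(pi1, pi2, key, m)
    for m in range(1 << n_bits)
)
```

The check is purely syntactic: both functions compute \(\pi_2(x) \oplus x\) for the same \(x\), so they agree for any choice of permutations and key.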

1.2 Composition Constructions and Our Contribution

All the constructions mentioned in the previous subsection can be viewed as composition of ideal primitives or some functions derived from ideal primitives.

Public and Secret Ideal Primitives. Let \(\gamma \) and \(\pi \) denote an n-bit random function and an n-bit random permutation respectively. A random function or permutation is called public if the adversary has direct access to the primitive (and to its inverse, whenever it exists) in addition to the constructions based on it. In this case we call the adversarial model the ideal function or ideal permutation model. We denote the public random function and permutation as \(\gamma ^{\textsf {pub}}\) and \(\pi ^{\textsf {pub}}\) respectively.

When the ideal primitives are secret (i.e. cannot be accessed directly by an adversary), we denote them as \(\gamma ^{\textsf {sec}}\) and \(\pi ^{\textsf {sec}}\). Note that secret primitives appear when a keyed function (e.g. a keyed compression function) or a keyed permutation (e.g. a block cipher) is replaced by its ideal counterpart through a hybrid argument.

We use subscript notation to denote independent copies of the primitives. For example, \(\pi _1, \pi _2\) are two independent random permutations (either secret or public which would be understood from the superscript notation).

Our Contribution. In this paper, we first analyze the PRF or PRP constructions \(g \circ f\) where

$$ f,\ g \in \{\gamma ^{\textsf {pub}}, \gamma ^{\textsf {sec}}, \pi ^{\textsf {sec}}\}.$$

For a trivial reason (see Footnote 1) we exclude \(\pi ^{\textsf {pub}}\). Moreover, we must assume that at least one of the functions is secret. In this paper, we show birthday bound PRF attacks on (1) \(\gamma ^{\textsf {sec}}_2 \circ \gamma ^{\textsf {sec}}_1\) and (2) \(\gamma ^{\textsf {pub}}\circ \pi ^{\textsf {sec}}\). The idea behind the attacks on these constructions is simple. For \(\gamma ^{\textsf {sec}}_2 \circ \gamma ^{\textsf {sec}}_1\) we expect more collisions than for a perfect random function. In other words, a collision is more likely on \(\gamma ^{\textsf {sec}}_2 \circ \gamma ^{\textsf {sec}}_1\) than on \(\gamma ^{\textsf {sec}}\). For the second construction, we observe the outputs of the public function \(\gamma ^{\textsf {pub}}\) and the outputs of \(\gamma ^{\textsf {pub}}\circ \pi ^{\textsf {sec}}\) (or of \(\gamma ^{\textsf {sec}}\) in the case of the ideal oracle). We show that a collision between these two lists is more likely in the real world than in the ideal world. In the real construction, a collision can happen in two ways: (1) an output of \(\pi ^{\textsf {sec}}\) collides with an input of a public function call to \(\gamma ^{\textsf {pub}}\), or (2) an accidental collision occurs (which happens in the final outputs without a collision among the inputs).

Birthday Attack on EWCDMD. We exploit the attack idea for \(\gamma ^{\textsf {sec}}_2 \circ \gamma ^{\textsf {sec}}_1\) to describe a PRF attack against EWCDMD with query complexity \(2^{n/2}\). In an early version of their CRYPTO 2017 paper (see Footnote 2), Mennink and Neves [MN17] claimed almost n-bit PRF security for EWCDMD. So our result invalidates the initial claim for the construction.

The main idea of the attack is simple. EWCDMD can be viewed as a composition of two keyed non-injective functions (so each of them is subject to the birthday paradox), namely \(\pi ^{\oplus }_2\) and a function f mapping \((\nu , m)\) to \(\pi _1(\nu ) \oplus \mathcal {H}(m)\). Thus, we expect that the collision probability of the composition \(\pi _2^{\oplus } \circ f\) is almost double the collision probability of a random function. So, by observing a collision we can distinguish EWCDMD from a random function. Note that EWCDM is a composition of a permutation and a non-injective keyed function. Hence our observation does not apply to it.

Birthday Attack on SoKAC21. Similarly, we exploit the attack idea for \(\gamma ^{\textsf {pub}}\circ \pi ^{\textsf {sec}}\) to obtain a birthday bound PRF attack on SoKAC21. In this construction we have \(\pi _2^{\oplus }\) instead of a public random function. However, with a careful analysis (and using the recent result on the sum of permutations) we can mount a birthday attack on SoKAC21. This again contradicts the beyond birthday bound security claimed in [CLM19].

2 Preliminaries

Notation. For \( n \in \mathbb {N}\), [n] denotes the set \( \{1,2,\ldots ,n\} \). For \( n,k \in \mathbb {N}\), such that \( n \ge k \), we define the falling factorial \( (n)_k := n!/(n-k)! = n(n-1)\cdots (n-k+1)\). For \( a \in \mathbb {N}\), an a-tuple \( (x_1,x_2,\ldots ,x_a) \) and also a multi-set \(\{x_1, \ldots , x_a\} \) is simply denoted as \(x^a\) (this should be clear from the context). For any set \( \mathcal {X}\), \( (\mathcal {X})_a \) denotes the set of all \(x^a\) so that \(x_1, \ldots , x_a\) are distinct. We call all those \(x^a\) element-wise distinct. Note, \(|(\mathcal {X})_q| =(|\mathcal {X}|)_q\).

The set of all functions from \(\mathcal {X}\) to \(\mathcal {Y}\) is denoted as \(\textsf {Func}(\mathcal {X}, \mathcal {Y})\) and the set of all permutations over \(\mathcal {X}\) is denoted as \(\textsf {Perm}(\mathcal {X})\). We use shorthand notations \( \mathsf {Perm}(n) \) (or \(\mathsf {Func}(n)\)) to denote the set of all permutations (or functions respectively) from \( \{0,1\}^n \) to itself.

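For a finite set \( \mathcal {X}\), \(\textsf {X}\xleftarrow {\$} \mathcal {X}\) denotes the uniform random sampling of \( \textsf {X}\) from \( \mathcal {X}\). We write \(\textsf {X}_1, \ldots , \textsf {X}_a \xleftarrow {\$} \mathcal {D}\) (or simply \(\textsf {X}^a \xleftarrow {\$} \mathcal {D}\)) when the \(\textsf {X}_i\)’s are chosen uniformly and independently from the set \(\mathcal {D}\). In other words, \(\textsf {X}_1, \ldots , \textsf {X}_a\) is a random with replacement sample. We write \(\textsf {X}_1, \ldots , \textsf {X}_a \xleftarrow {\textsf {wor}} \mathcal {D}\) (or simply \(\textsf {X}^a \xleftarrow {\textsf {wor}} \mathcal {D}\)) when the \(\textsf {X}_i\)’s are chosen randomly from \(\mathcal {D}\) without replacement. More precisely, for all element-wise distinct \(x^a \in (\mathcal {D})_a\),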

$$\textsf {Pr}(\textsf {X}_1 =x_1, \ldots , \textsf {X}_a =x_a) =\frac{1}{(|\mathcal {D}|)_a}.$$

2.1 Statistical Distance

Let \(\textsf {X}, \textsf {Y}\) be two random variables over a sample space \(\mathcal {S}\). Then the statistical distance between \(\textsf {X}\) and \(\textsf {Y}\) is defined as

$$\mathsf {D}(\textsf {X}, \textsf {Y}) :=\frac{1}{2} \sum _{a \in \mathcal {S}} |\textsf {Pr}(\textsf {X}=a) - \textsf {Pr}(\textsf {Y}=a)|.$$

An equivalent definition of statistical distance is the following:

$$\mathsf {D}(\textsf {X}, \textsf {Y}) =\max _{E \subseteq \mathcal {S}} |\textsf {Pr}(\textsf {X}\in E) - \textsf {Pr}(\textsf {Y}\in E)|.$$

To see why the two definitions are equivalent, we first note that the maximum is attained at \(E_1 =\{a \in \mathcal {S}: \textsf {Pr}(\textsf {X}=a) > \textsf {Pr}(\textsf {Y}=a) \}\). From the definition of \(E_1\), we can write the sum \(\sum _{a \in \mathcal {S}} |\textsf {Pr}(\textsf {X}=a) - \textsf {Pr}(\textsf {Y}=a)|\) (after splitting over \(E_1\) and \(E^c_1\)) as

$$\begin{aligned}&\sum _{a \in E_1} \big (\textsf {Pr}(\textsf {X}=a) - \textsf {Pr}(\textsf {Y}=a)\big ) +\sum _{a \in E_1^c} \big (\textsf {Pr}(\textsf {Y}=a) - \textsf {Pr}(\textsf {X}=a)\big ) \\&=\textsf {Pr}(\textsf {X}\in E_1) - \textsf {Pr}(\textsf {Y}\in E_1) +\textsf {Pr}(\textsf {Y}\in E_1^c) - \textsf {Pr}(\textsf {X}\in E_1^c) \\&=2\big (\textsf {Pr}(\textsf {X}\in E_1) - \textsf {Pr}(\textsf {Y}\in E_1)\big ). \end{aligned}$$

Thus we have established the equivalence.
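The equivalence can also be checked numerically on a small example. The sketch below computes the statistical distance once as half the \(L_1\) distance and once by brute force over all events \(E\); the two toy distributions `px` and `py` and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def distance_by_sum(px, py):
    """Half of the sum of |Pr(X = a) - Pr(Y = a)| over the sample space."""
    support = set(px) | set(py)
    return Fraction(1, 2) * sum(abs(px.get(a, 0) - py.get(a, 0)) for a in support)

def distance_by_max_event(px, py):
    """Maximum of |Pr(X in E) - Pr(Y in E)| over all subsets E (exponential time)."""
    support = sorted(set(px) | set(py))
    best = Fraction(0)
    for size in range(len(support) + 1):
        for event in combinations(support, size):
            gap = abs(sum(px.get(a, 0) for a in event) - sum(py.get(a, 0) for a in event))
            best = max(best, gap)
    return best

# Two toy distributions on {0, 1, 2, 3} (hypothetical values).
px = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
py = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4)}

d_sum = distance_by_sum(px, py)
d_max = distance_by_max_event(px, py)
```

For these values the maximizing event is \(E_1 = \{0\}\), the only point where \(\textsf{X}\) is more likely than \(\textsf{Y}\).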

Lemma 1

(replacement lemma). Let \(\textsf {X}, \textsf {Y}\) be two random variables over a sample space \(\mathcal {S}\) and let \(\textsf {Z}\) be a random variable over \(\mathcal {T}\) that is independent of \(\textsf {X}\) and \(\textsf {Y}\). Let \(E \subseteq \mathcal {S}\times \mathcal {T}\). Then

$$\begin{aligned} |\textsf {Pr}((\textsf {X}, \textsf {Z}) \in E) - \textsf {Pr}((\textsf {Y}, \textsf {Z}) \in E)| \le \mathsf {D}(\textsf {X}, \textsf {Y}). \end{aligned}$$
(5)

Proof

For every z, let \(E_z =\{ s \in \mathcal {S}: (s,z) \in E \}\). Then by independence, we have

  1.

    \(p_1 :=\textsf {Pr}((\textsf {X}, \textsf {Z}) \in E) =\sum _z \textsf {Pr}(\textsf {Z}=z) \cdot \textsf {Pr}(\textsf {X}\in E_z)\) and similarly,

  2.

    \(p_2 :=\textsf {Pr}((\textsf {Y}, \textsf {Z}) \in E) =\sum _z \textsf {Pr}(\textsf {Z}=z) \cdot \textsf {Pr}(\textsf {Y}\in E_z)\).

Hence,

$$\begin{aligned} |p_1 -p_2|&=|\sum _z \textsf {Pr}(\textsf {Z}=z) \cdot \textsf {Pr}(\textsf {X}\in E_z) - \sum _z \textsf {Pr}(\textsf {Z}=z) \cdot \textsf {Pr}(\textsf {Y}\in E_z)|\\&\le \sum _z \textsf {Pr}(\textsf {Z}=z) \cdot |\textsf {Pr}(\textsf {X}\in E_z) - \textsf {Pr}(\textsf {Y}\in E_z)| \\&\le \sum _z \textsf {Pr}(\textsf {Z}=z) \cdot \mathsf {D}(\textsf {X}, \textsf {Y}) \\&=\mathsf {D}(\textsf {X}, \textsf {Y}) \end{aligned}$$

2.2 Sum of Without Replacement Samples

Let \(\mathcal {D}\) be a set of size N. In [DHT17] it has been proved that the sum of two independent without replacement samples behaves almost like a with replacement sample. More precisely, let \(\textsf {Z}^a \xleftarrow {\$} \mathcal {D}\) and \(\textsf {X}^a, \textsf {Y}^a \xleftarrow {\textsf {wor}} \mathcal {D}\), where \(\textsf {X}^a\) and \(\textsf {Y}^a\) are independent. Define \(\textsf {W}_i =\textsf {X}_i \oplus \textsf {Y}_i\) for all \(i \in [a]\). Then, in [DHT17] it is shown (see Footnote 3) that

$$\begin{aligned} \mathsf {D}(\textsf {Z}^a, \textsf {W}^a ) \le \frac{4a}{N}. \end{aligned}$$
(6)

Due to Lemma 1, we can replace the sum of two without replacement samples appearing in an event by a with replacement sample, at a cost of at most 4a/N in probability. We use this replacement idea when we analyze the SoKAC21 construction.
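For very small parameters, the distance in Eq. 6 can be computed exactly by exhaustive enumeration. The sketch below assumes \(\mathcal{D} = \{0,1\}^n\) with bitwise XOR as the sum; the function name `xor_sum_distance` and the toy parameters \(n = 4\), \(a = 2\) are illustrative choices, and the bound 4a/N is far from tight at this size.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

def xor_sum_distance(n_bits, a):
    """Exact D(Z^a, W^a) for D = {0,1}^n, where W_i = X_i xor Y_i and
    X^a, Y^a are independent without replacement samples."""
    size = 1 << n_bits
    wor_samples = list(permutations(range(size), a))   # all distinct a-tuples
    counts = Counter()
    for xs in wor_samples:
        for ys in wor_samples:
            counts[tuple(x ^ y for x, y in zip(xs, ys))] += 1
    total = len(wor_samples) ** 2
    uniform = Fraction(1, size ** a)                   # Pr(Z^a = w) for every w
    return Fraction(1, 2) * sum(
        abs(Fraction(counts.get(w, 0), total) - uniform)
        for w in product(range(size), repeat=a)
    )

distance = xor_sum_distance(n_bits=4, a=2)
bound = Fraction(4 * 2, 1 << 4)    # 4a/N from Eq. 6
```

The enumeration grows as \(((N)_a)^2\), so it is only meant as a sanity check for toy sizes.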

2.3 Security Definitions

Random Function and Random Permutation. \(\gamma \xleftarrow {\$} \textsf {Func}(\mathcal {X}, \mathcal {Y})\) is said to be the random function from the set \(\mathcal {X}\) to \(\mathcal {Y}\). Similarly, \(\pi \xleftarrow {\$} \textsf {Perm}(\mathcal {Y})\) is said to be the random permutation over the set \(\mathcal {Y}\). In this paper we mostly use the set \(\mathcal {X}=\mathcal {Y}=\{0,1\}^n\).

Keyed Function and Permutation. A keyed function with key space \(\mathcal {K}\), domain \(\mathcal {X}\) and range \(\mathcal {Y}\) is a function \(\textsf {F} : \mathcal {K} \times \mathcal {X} \rightarrow \mathcal {Y}\) and we denote \(\textsf {F}(K, X)\) by \(\textsf {F}_{K}(X)\). Similarly, a keyed permutation with key space \(\mathcal {K}\) and domain \(\mathcal {X}\) is a mapping \(\textsf {E} : \mathcal {K} \times \mathcal {X} \rightarrow \mathcal {X}\) such that for all key \(K \in \mathcal {K}\), \(X \mapsto \textsf {E}(K, X)\) is a permutation over \(\mathcal {X}\) and we denote \(\textsf {E}_{K}(X)\) for \(\textsf {E}(K, X)\).

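PRF. Given an oracle algorithm \(\mathsf {A}\) with oracle access to a function from \(\mathcal {X}\) to \(\mathcal {Y}\), making at most q queries, running in time at most t and outputting a single bit, we define the prf-advantage of \(\mathsf {A}\) against the family of keyed functions \(\textsf {F}\) as

$$\mathbf {Adv}^{\mathrm {PRF}}_{\textsf {F}}(\mathsf {A}) := |\textsf {Pr}(\mathsf {A}^{\textsf {F}_K} = 1) - \textsf {Pr}(\mathsf {A}^{\gamma } = 1)|,$$

where K is chosen uniformly from \(\mathcal {K}\) and \(\gamma \) is the random function from \(\mathcal {X}\) to \(\mathcal {Y}\).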

PRP. Given an oracle algorithm \(\mathsf {A}\) with oracle access to a permutation of \(\mathcal {X}\), making at most q queries, running in time at most t and outputting a single bit, we define the prp-advantage of \(\mathsf {A}\) against the family of keyed permutations \(\textsf {E}\) as

$$\mathbf {Adv}^{\mathrm {PRP}}_{\textsf {E}}(\mathsf {A}) := |\textsf {Pr}(\mathsf {A}^{\textsf {E}_K} = 1) - \textsf {Pr}(\mathsf {A}^{\pi } = 1)|,$$

where K is chosen uniformly from \(\mathcal {K}\) and \(\pi \) is the random permutation over \(\mathcal {X}\).

PRF and PRP in Ideal Model. Some keyed constructions use ideal public primitives such as a random function or a random permutation. Let \(P_1, \ldots , P_r\) be all such primitives used for a keyed construction \(\textsf {F}_K :=\textsf {F}^{P_1,\ldots , P_r}_K\). Let \(P_i^{\pm }\) denote both \(P_i\) and its inverse \(P_i^{-1}\). We define the PRF and PRP-advantage in the public primitive model as follows.

$$\mathbf {Adv}^{\mathrm {PRF}}_{\textsf {F}}(\mathsf {A}) := |\textsf {Pr}(\mathsf {A}^{\textsf {F}_K, P_1^{\pm }, \ldots , P_r^{\pm }} = 1) - \textsf {Pr}(\mathsf {A}^{\gamma , P_1^{\pm }, \ldots , P_r^{\pm }} = 1)|.$$

In the above two probabilities, \(K, \gamma , P_1, \ldots , P_r\) are all independently drawn. Similarly, we define PRP-advantage in public model as

$$\mathbf {Adv}^{\mathrm {PRP}}_{\textsf {F}}(\mathsf {A}) := |\textsf {Pr}(\mathsf {A}^{\textsf {F}_K, P_1^{\pm }, \ldots , P_r^{\pm }} = 1) - \textsf {Pr}(\mathsf {A}^{\pi , P_1^{\pm }, \ldots , P_r^{\pm }} = 1)|.$$

Almost XOR Universal Hash Function. A keyed hash function \(\mathcal {H}_K: \mathcal {D}\rightarrow \mathcal {R}\) is called \(\epsilon \)-AXU (almost xor universal) if \(\textsf {Pr}(\mathcal {H}_K(m) \oplus \mathcal {H}_K(m') =\delta ) \le \epsilon \) for all \(m \ne m'\) and for all \(\delta \). Here the probability is computed under randomness of the key chosen uniformly from the key space.

3 Collision Probability

Let \(\mathcal {D}\) be a set of size N. We quickly recall the collision probability for a uniform random sample \(\textsf {X}_1, \ldots , \textsf {X}_a \xleftarrow {\$} \mathcal {D}\). For any positive integer \(a \le N\), we write \( \textsf {dp}_N(a) :=\frac{(N)_a}{N^a}\) and \(\textsf {cp}_N(a) :=1 - \textsf {dp}_N(a)\). When N is understood from the context, we drop the subscript N. If a is very small compared to N (i.e. \(a/N \approx 0\)), a good estimate of \(\textsf {dp}_N(a)\) is \(e^{-a(a-1)/2N}\). This follows from the approximation \(1- \epsilon \approx e^{-\epsilon }\) for very small positive \(\epsilon \). In fact the error term \(|e^{-\epsilon } - (1 - \epsilon )|\) is of order \(O(\epsilon ^2)\).
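As a quick numerical illustration of the estimate above, the following sketch compares the exact product \((N)_a/N^a\) with \(e^{-a(a-1)/2N}\). The function names `dp` and `cp` mirror the notation of the text; the concrete values \(N = 2^{20}\) and \(a = 100\) are arbitrary illustrative choices.

```python
import math

def dp(N, a):
    """Probability that a uniform with replacement samples from N items are distinct."""
    prob = 1.0
    for i in range(a):
        prob *= 1 - i / N       # i-th draw must avoid the i previous values
    return prob

def cp(N, a):
    """Collision probability, the complement of dp."""
    return 1.0 - dp(N, a)

N, a = 2 ** 20, 100
exact = dp(N, a)
approx = math.exp(-a * (a - 1) / (2 * N))
error = abs(exact - approx)
```

For these parameters the approximation error is several orders of magnitude smaller than the collision probability itself.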

Given a list \(\mathcal {L}\) of elements \(x_1, \ldots , x_a\), we write \(\textsf {Dist}(\mathcal {L})\) if \(x_i\)’s are distinct. Otherwise, we write \(\textsf {Coll}(\mathcal {L})\).

Lemma 2

(collision probability). Let \(\mathcal {D}\) be a set of size N. Let \(\textsf {X}_1, \ldots , \textsf {X}_a \xleftarrow {\$} \mathcal {D}\) and let \(\mathcal {L}\) denote the list containing the \(\textsf {X}_i\)’s, \(1 \le i \le a\). Then,

  1.

    \(\textsf {Pr}(\textsf {Dist}(\mathcal {L})) =\textsf {dp}_N(a)\).

  2.

    \(\textsf {Pr}(\textsf {Coll}(\mathcal {L})) =\textsf {cp}_N(a) \le a^2/2N\).

We skip the proof as it follows directly from the definitions. The inequality in the second statement follows from the union bound.

Now we compute probability for having a collision between two lists. We say that there is a collision between two lists, denoted as \(\textsf {LColl}(\mathcal {L}_1, \mathcal {L}_2)\) if the lists are not disjoint.

Lemma 3

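(list-collision probability for without replacement samples). Let \(\mathcal {D}\) be a set of size N, and let \(\textsf {X}^p \xleftarrow {\textsf {wor}} \mathcal {D}\) and \(\textsf {Y}^q \xleftarrow {\textsf {wor}} \mathcal {D}\) be such that \(\textsf {X}^p\) and \(\textsf {Y}^q\) are independent. Then,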

$$\textsf {Pr}(\textsf {LColl}(\textsf {X}^p, \textsf {Y}^q)) =1 - \frac{(N-p)_q}{(N)_q}$$

Proof

We compute the complement event, i.e., \(\textsf {X}^p\) and \(\textsf {Y}^q\) are disjoint. The conditional probability of the complement event conditioning on \(\textsf {X}^p =x^p\) is \(\frac{(N-p)_q}{(N)_q}\). This can be easily seen as the number of choices of \(\textsf {Y}^q\) is exactly \((N-p)_q\). As the conditional probability is independent of choice of \(x^p\), the unconditional probability is also same as \(\frac{(N-p)_q}{(N)_q}\). This completes the proof.    \(\square \)

We denote the probability \(1 - \frac{(N-p)_q}{(N)_q}\) as \(\textsf {lcp}_N^{wor}(p, q)\) (or simply \(\textsf {lcp}^{wor}(p,q)\) whenever N is understood from the context).
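Lemma 3 can be confirmed by brute force for tiny parameters. The sketch below enumerates all pairs of without replacement samples and compares the resulting list-collision probability with the closed formula; the helper names and the toy parameters \(N = 6\), \(p = 2\), \(q = 3\) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def falling(n, k):
    """Falling factorial (n)_k = n (n - 1) ... (n - k + 1)."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

def lcp_wor_formula(N, p, q):
    """Closed formula of Lemma 3."""
    return 1 - Fraction(falling(N - p, q), falling(N, q))

def lcp_wor_bruteforce(N, p, q):
    """Enumerate every pair of without replacement samples and count collisions."""
    hits = total = 0
    for xs in permutations(range(N), p):
        for ys in permutations(range(N), q):
            total += 1
            hits += bool(set(xs) & set(ys))   # lists are not disjoint
    return Fraction(hits, total)

formula_value = lcp_wor_formula(6, 2, 3)
bruteforce_value = lcp_wor_bruteforce(6, 2, 3)
```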

When \(\mathcal {L}_1 :=\textsf {X}^p\) and \(\mathcal {L}_2 :=\textsf {Y}^q\), where \(\textsf {X}^p \xleftarrow {\$} \mathcal {D}\) and \(\textsf {Y}^q \xleftarrow {\$} \mathcal {D}\) are independent, we denote the list-collision probability \(\textsf {Pr}(\textsf {LColl}(\mathcal {L}_1, \mathcal {L}_2))\) as \(\textsf {lcp}_N^{\$}(p, q)\) (or simply \(\textsf {lcp}^{\$}(p,q)\) whenever N is understood from the context). Here \(\mathcal {D}\) is a set of size N.

Lemma 4

(list-collision probability for random samples). For all positive integers p, q, we have

$$\begin{aligned} |\textsf {lcp}^{\$}_N(p, q) - 1 +\big (1- \frac{p}{N}\big )^q| \le 2\textsf {cp}_N(p). \end{aligned}$$
(7)

(When p is small compared to \(\sqrt{N}\), the collision probability \(\textsf {cp}_N(p)\) is almost zero and in that case, the above result says that \(1 - \big (1- \frac{p}{N}\big )^q\) is a very good approximation of \(\textsf {lcp}^{\$}_N(p, q)\).)

Proof

Let \(\textsf {X}^p \xleftarrow {\$} \mathcal {D}\) and \(\textsf {Y}^q \xleftarrow {\$} \mathcal {D}\) be independent, and let E denote the event \(\textsf {Dist}(\textsf {X}^p)\). So \(\textsf {Pr}(E) =\textsf {dp}_N(p)\). Fix any distinct \(x^p\). Then, the list collision \(\textsf {LColl}(x^p, \textsf {Y}^q)\) holds with probability \(1 - (1 - \frac{p}{N})^q\). Now,

$$\begin{aligned} \textsf {Pr}(\textsf {LColl}(\textsf {X}^p, \textsf {Y}^q))&=\textsf {Pr}(\textsf {LColl}(\textsf {X}^p, \textsf {Y}^q) \wedge E) +\textsf {Pr}(\textsf {LColl}(\textsf {X}^p, \textsf {Y}^q) \wedge E^c) \\&=\sum _{x^p \in (\mathcal {D})_p} \textsf {Pr}(\textsf {LColl}(x^p, \textsf {Y}^q) \wedge \textsf {X}^p =x^p) +\textsf {Pr}(\textsf {LColl}(\textsf {X}^p, \textsf {Y}^q) \wedge E^c) \\&=( 1 - (1 - \frac{p}{N})^q) \times \sum _{x^p \in (\mathcal {D})_p} \textsf {Pr}(\textsf {X}^p =x^p) +\textsf {Pr}(\textsf {LColl}(\textsf {X}^p, \textsf {Y}^q) \wedge E^c) \\&=( 1 - (1 - \frac{p}{N})^q) \times \textsf {Pr}(E) +\textsf {Pr}(\textsf {LColl}(\textsf {X}^p, \textsf {Y}^q) \wedge E^c) \\&=( 1 - (1 - \frac{p}{N})^q) \times (1 - \textsf {Pr}(E^c) ) +\textsf {Pr}(\textsf {LColl}(\textsf {X}^p, \textsf {Y}^q) \wedge E^c) \end{aligned}$$

Note that in our notation, \(\textsf {Pr}(\textsf {LColl}(\textsf {X}^p, \textsf {Y}^q)) =\textsf {lcp}^{\$}_N(p, q)\). Hence,

$$\begin{aligned} |\textsf {lcp}^{\$}_N(p, q) - 1 +\big (1- \frac{p}{N}\big )^q|&=| \textsf {Pr}(\textsf {LColl}(\textsf {X}^p, \textsf {Y}^q) \wedge E^c) - ( 1 - (1 - \frac{p}{N})^q) \times \textsf {Pr}(E^c)|\\&\le 2 \cdot \textsf {Pr}(E^c). \end{aligned}$$

The lemma follows from the definition that \(\textsf {Pr}(E^c) =\textsf {cp}_N(p)\).    \(\square \)
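The approximation \(1 - (1 - \frac{p}{N})^q\) derived in the proof can be compared with the exact list-collision probability for tiny parameters. The sketch below enumerates all pairs of with replacement samples; the function names and the toy values \(N = 5\), \(p = 2\), \(q = 3\) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def lcp_random_bruteforce(N, p, q):
    """Exact Pr(LColl(X^p, Y^q)) for independent with replacement samples."""
    hits = total = 0
    for xs in product(range(N), repeat=p):
        for ys in product(range(N), repeat=q):
            total += 1
            hits += bool(set(xs) & set(ys))
    return Fraction(hits, total)

def cp(N, a):
    """Exact collision probability of a with replacement sample of size a."""
    distinct = Fraction(1)
    for i in range(a):
        distinct *= Fraction(N - i, N)
    return 1 - distinct

N, p, q = 5, 2, 3
exact = lcp_random_bruteforce(N, p, q)
approx = 1 - (1 - Fraction(p, N)) ** q    # value conditioned on X^p being distinct
gap = abs(exact - approx)
```

At this toy size the gap is well inside the \(2\,\textsf{cp}_N(p)\) margin; the bound becomes informative only when p is small compared to \(\sqrt{N}\).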

4 Birthday Attack on Composition of Ideal Primitives

In this section, we analyze compositions of ideal primitives. We recall that \(\gamma \) and \(\pi \) denote an n-bit random function and an n-bit random permutation respectively. We follow the notation described in Sect. 1.2. Here \(\equiv \) means that two systems are equivalent (i.e. the probabilistic behavior of the interaction with any adversary is the same for both).

  1.

    It is easy to verify that \(\pi ^{\textsf {sec}}\circ \gamma ^{\textsf {sec}}\equiv \gamma ^{\textsf {sec}}\circ \pi ^{\textsf {sec}}\equiv \gamma \) and \(\pi ^{\textsf {sec}}_1 \circ \pi ^{\textsf {sec}}_2 \equiv \pi \). In [MS15], \(\pi ^{\textsf {sec}}\circ \pi ^{\textsf {sec}}\) (iterated random permutation) has been analyzed, and it behaves almost like \(\pi ^{\textsf {sec}}\), with a maximum distinguishing advantage of \(O(q/2^n)\) where q is the number of queries. The authors of [MS15, Nan15] actually analyzed the more general construction \(\pi ^{\textsf {sec}}\circ \cdots \circ \pi ^{\textsf {sec}}\) (applied r times).

  2.

    In [BDD+17], \(\gamma ^{\textsf {sec}}\circ \gamma ^{\textsf {sec}}\) (iterated random function) has also been analyzed. This is equivalent to \(\gamma ^{\textsf {sec}}\) with a maximum distinguishing advantage of \(O(q^2/2^n)\). The authors of [BDD+17] actually analyzed the more general construction \(\gamma ^{\textsf {sec}}\circ \cdots \circ \gamma ^{\textsf {sec}}\) (applied r times). The main idea behind the distinguishing attack is that a collision is more probable for an iterated random function than for a random function.

    Using a similar argument, we can show that \(\gamma ^{\textsf {sec}}_2 \circ \gamma ^{\textsf {sec}}_1\) can be distinguished from \(\gamma ^{\textsf {sec}}\) by making \(2^{n/2}\) queries. Let \(x_1, \ldots , x_q\) be q queries and let \(y_1, \ldots , y_q\) be the responses. In case of the real world, \(y_i =\gamma ^{\textsf {sec}}_2(z_i)\) where \(z_i =\gamma ^{\textsf {sec}}_1(x_i)\). Let \(\mu :=\textsf {cp}_{2^n}(q)\). Now,

    $$\begin{aligned} \textsf {Pr}(\textsf {Coll}(y^q))&=\textsf {Pr}(\textsf {Coll}(z^q)) +\textsf {Pr}(\textsf {Coll}(y^q)\ |\ \textsf {Dist}(z^q)) \times \textsf {Pr}(\textsf {Dist}(z^q))\\&=\mu +\mu (1-\mu ) \end{aligned}$$

    Let \(\mathcal {A}\) return 1 if it observes a collision among the outputs. Thus, the distinguishing advantage of the adversary is at least \(\mu (1-\mu )\). When \(q =2^{n/2}\), \(\textsf {cp}(q) \approx 1- \frac{1}{\sqrt{e}}\) and hence the advantage is \(\frac{1}{\sqrt{e}} \times (1- \frac{1}{\sqrt{e}})\), which is at least 0.2. One can also choose q (which is again \(O(2^{n/2})\)) such that \(\mu \approx 1/2\), and then the advantage is about 0.25.

    The same attack can be applied to \(\gamma ^{\textsf {sec}}\circ \gamma ^{\textsf {pub}}\) and \(\gamma ^{\textsf {pub}}\circ \gamma ^{\textsf {sec}}\), even if the adversary does not take advantage of its access to the public random function \(\gamma ^{\textsf {pub}}\).

  3.

    Let us consider the construction \(\pi ^{\textsf {sec}}\circ \gamma ^{\textsf {pub}}\). An adversary \(\mathcal {A}\) first finds a collision pair \((m, m')\) of \(\gamma ^{\textsf {pub}}\) by making \(2^{n/2}\) queries to it. Then, \(\pi ^{\textsf {sec}}\circ \gamma ^{\textsf {pub}}(m) =\pi ^{\textsf {sec}}\circ \gamma ^{\textsf {pub}}(m')\). Clearly, in the ideal world, \(\gamma (m) =\gamma (m')\) holds with probability \(2^{-n}\). So \(\mathcal {A}\) is a PRF-distinguisher against \(\pi ^{\textsf {sec}}\circ \gamma ^{\textsf {pub}}\) making about \(2^{n/2}\) queries to the public random function. The same attack also applies to \(\gamma ^{\textsf {sec}}\circ \gamma ^{\textsf {pub}}\).

  4.

    Although \(\gamma ^{\textsf {sec}}\circ \pi ^{\textsf {sec}}\) is equivalent to a random function, we have the following birthday bound PRF attack on \(\gamma ^{\textsf {pub}}\circ \pi ^{\textsf {sec}}\) (replacing the outer secret random function by a public random function). Here we exploit the public access to \(\gamma ^{\textsf {pub}}\), since otherwise the construction is equivalent to a random function (see Fig. 3).

    Let E denote the event that there are i, j such that \(y_i =c_j\).

    Ideal World: In the ideal world we have \(c^q, y^p \xleftarrow {\$} \{0,1\}^n\). So

    $$\textsf {Pr}(E) =\textsf {lcp}^{\$}(p,q) =\mu \text{(say) }.$$

    Real World: In the real world, let \(z_i =\pi ^{\textsf {sec}}(i)\). So \(c_i =\gamma ^{\textsf {pub}}(z_i)\). Thus \(z^q \xleftarrow {\textsf {wor}} \{0,1\}^n\), independent of \(x^p\). Now, we write the event E as the disjoint union (denoted as \(\sqcup \))

    $$\textsf {LColl}(z^q, x^p) \ \sqcup \ \big (\lnot \textsf {LColl}(z^q, x^p) \wedge \textsf {LColl}(c^q, y^p)\big ).$$

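    Given that \(z^q\) is disjoint from \(x^p\), we have \(c^q, y^p \xleftarrow {\$} \{0,1\}^n\), so the conditional probability of \(\textsf {LColl}(c^q, y^p)\) is \(\mu \). Now, \(\textsf {Pr}(\textsf {LColl}(z^q, x^p) ) =\textsf {lcp}^{wor}(p,q) :=\mu _1\) (say). Then,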

    $$\begin{aligned} \textsf {Pr}(E)&=\mu _1 +(1 - \mu _1) \mu . \end{aligned}$$

    So, the distinguishing advantage of our adversary is \(\mu _1(1 - \mu )\). By Lemma 3 and Lemma 4, the distinguishing advantage is at least

    $$\begin{aligned} (1 - \frac{(2^n-p)_q}{(2^n)_q}) \times \big ( (1 - \frac{p}{2^n})^q - 2\textsf {cp}_{2^n}(q) \big ). \end{aligned}$$
    (8)

    Further, we have

    $$\begin{aligned} \frac{(2^n-p)_q}{(2^n)_q}&=\prod _{i =0}^{q-1} (1 - \frac{p}{2^n-i}) \\&\le (1 - \frac{p}{2^n})^q \\&\le 1 - \frac{pq}{2^n} +\frac{pq^2}{2^{2n +1}}. \end{aligned}$$

    The last inequality follows from the following fact:

    $$(1 - x)^q \le 1 - {q \atopwithdelims ()1}x +{q \atopwithdelims ()2} x^2,\ \ 0 \le x \le 1.$$

    We also have \((1 - \frac{p}{2^n})^q \ge 1 - \frac{pq}{2^n}\). By substituting the above inequalities in Eq. 8, the distinguishing advantage is at least

    $$(1 - \frac{pq}{2^n} - \frac{q^2}{2^n}) \times \frac{pq}{2^n}\times (1 - \frac{q}{2^{n +1}}).$$

    Now if we choose \(p =q =\sqrt{2^n/3}\) then the advantage is at least \(\frac{1}{9}(1 - \frac{1}{3 \times 2^{n/2}})\). This value is almost 1/9 for a reasonably large n.
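The quantities appearing in this section are easy to evaluate numerically. The sketch below computes \(\mu(1-\mu)\) for \(\gamma^{\textsf{sec}}_2 \circ \gamma^{\textsf{sec}}_1\) with \(q = 2^{n/2}\), and the lower bound of Eq. 8 for \(\gamma^{\textsf{pub}} \circ \pi^{\textsf{sec}}\) with \(p = q = \lfloor \sqrt{2^n/3} \rfloor\). The choice \(n = 20\), the rounding of p and q to integers, and the function names are illustrative assumptions.

```python
import math

def dp(N, a):
    """(N)_a / N^a: probability that a with replacement samples are distinct."""
    prob = 1.0
    for i in range(a):
        prob *= 1 - i / N
    return prob

def lcp_wor(N, p, q):
    """1 - (N - p)_q / (N)_q, the list-collision probability of Lemma 3."""
    disjoint = 1.0
    for i in range(q):
        disjoint *= 1 - p / (N - i)
    return 1 - disjoint

n = 20
N = 2 ** n

# Composition of two secret random functions, q = 2^(n/2) queries.
mu = 1 - dp(N, 2 ** (n // 2))
adv_two_functions = mu * (1 - mu)

# Public random function after a secret permutation, p = q = floor(sqrt(N / 3)).
p = q = math.isqrt(N // 3)
mu1 = lcp_wor(N, p, q)
eq8_bound = mu1 * ((1 - p / N) ** q - 2 * (1 - dp(N, q)))
```

Both values are bounded away from zero at birthday-bound query complexity, in line with the constants 0.2 and roughly 1/9 quoted in the text.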

Fig. 3. Distinguisher for composition construction \(\gamma ^{\textsf {pub}}\circ \pi ^{\textsf {sec}}\).
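The attack on \(\gamma^{\textsf{pub}} \circ \pi^{\textsf{sec}}\) can be simulated on a toy domain. The figure itself is not reproduced here, so the sketch below is an assumption about the distinguisher inferred from the analysis: it queries \(\gamma^{\textsf{pub}}\) on p distinct random points \(x_i\) to get \(y_i\), queries the oracle on q distinct inputs to get \(c_j\), and outputs 1 if some \(y_i = c_j\). The function names, the toy size \(N = 2^{12}\), the number of trials and the seeds are illustrative choices.

```python
import random

def run_trial(N, p, q, real_world, rng):
    """One run of the assumed distinguisher; returns True iff some y_i equals some c_j."""
    gamma_pub = {}                               # lazily sampled public random function

    def gamma(v):
        if v not in gamma_pub:
            gamma_pub[v] = rng.randrange(N)
        return gamma_pub[v]

    xs = rng.sample(range(N), p)                 # distinct public queries x_1..x_p
    ys = {gamma(x) for x in xs}
    if real_world:
        zs = rng.sample(range(N), q)             # pi_sec on q distinct inputs (WOR sample)
        cs = {gamma(z) for z in zs}              # oracle answers gamma_pub(pi_sec(i))
    else:
        cs = {rng.randrange(N) for _ in range(q)}    # independent random function
    return bool(ys & cs)

def collision_rate(N, p, q, real_world, trials, seed):
    rng = random.Random(seed)
    return sum(run_trial(N, p, q, real_world, rng) for _ in range(trials)) / trials

N = 2 ** 12
p = q = 37                                       # about sqrt(N / 3)
real_rate = collision_rate(N, p, q, True, 2000, seed=1)
ideal_rate = collision_rate(N, p, q, False, 2000, seed=2)
empirical_advantage = real_rate - ideal_rate
```

With these parameters the empirical gap should be close to \(\mu_1(1-\mu)\), which is about 0.2, up to sampling noise.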

5 Birthday Attack on SoKAC21

In the previous section we have shown the basic attacks on compositions of ideal primitives. A similar idea can be used for compositions of constructions which are not ideal. However, a more dedicated analysis of the advantage is required. In this section we show a birthday attack on the recent proposal SoKAC21. In the following section we show a birthday attack on Dual-EWCDM (EWCDMD).

We first recall the definition of SoKAC21 (see Fig. 2 and Eq. 4 for details). It uses two public n-bit random permutations \(\pi ^{\textsf {pub}}_1\) and \(\pi ^{\textsf {pub}}_2\). Given an n-bit key K, an n-bit input m, we define SoKAC21 output as

$$F_K(m) :=\pi ^{\textsf {pub}}_2(x) \oplus x, \text{ where } x =\pi ^{\textsf {pub}}_1(m \oplus K) \oplus K.$$

Our attack does not exploit public queries to \(\pi ^{\textsf {pub}}_1\) and hence \(\pi ^{\textsf {pub}}_1(m \oplus K) \oplus K\) behaves identically to a secret random permutation \(\pi ^{\textsf {sec}}(m)\). Let \(\textsf {DM}(x) :=\pi ^{\textsf {pub}}_2(x) \oplus x\) (the Davies-Meyer construction based on a public random permutation). So SoKAC21 is actually the composition \(\textsf {DM}\circ \pi ^{\textsf {sec}}\). However, \(\textsf {DM}\) is not a perfect random function. But if we choose the inputs of \(\textsf {DM}\) without replacement, the outputs of \(\textsf {DM}\) can be viewed as the sum of two WOR samples and hence are very close to the uniform distribution. We use this principle along with the attack strategy described in the previous section for the composition construction \(\gamma ^{\textsf {pub}}\circ \pi ^{\textsf {sec}}\). We simply write \(\pi ^{\textsf {pub}}\) instead of \(\pi ^{\textsf {pub}}_2\) and \(\pi ^{\textsf {sec}}\) instead of the Even-Mansour construction on \(\pi ^{\textsf {pub}}_1\) (see Fig. 4).

Fig. 4. Distinguisher for SoKAC21 which can be viewed as the composition construction \(\textsf {DM}\circ \pi ^{\textsf {sec}}\).

We define the event \(E :=\textsf {LColl}(c^q, y^p)\) (i.e. there exist i, j such that \(y_i =c_j\)).

Ideal World: In the ideal world \(c^q \xleftarrow {\$} \{0,1\}^n\). Moreover, each \(y_i\) is defined as the sum of two without replacement samples. By Eq. 6, the \(y_i\)’s are close to a with replacement sample \(y'_1, \ldots , y'_p\), with statistical distance at most \(4p/2^n\). Moreover, the \(y'_i\)’s are independent of \(c^q\). Let \(\mu :=\textsf {Pr}(\textsf {LColl}(c^q, (y')^p)) =\textsf {lcp}^{\$}(p,q)\). So by using Lemma 1,

$$\textsf {Pr}(E) =\textsf {Pr}(\textsf {LColl}(c^q, y^p)) \le \textsf {lcp}^{\$}(p,q) +4p/2^n.$$

Real World: In the real world, let \(z_i =\pi ^{\textsf {sec}}(i)\). So \(c_i =\pi ^{\textsf {pub}}(z_i) \oplus z_i\) for all i, and \(z^q \xleftarrow {\textsf {wor}} \{0,1\}^n\) is independent of \(x^p\). Now, the event E can be written as a disjoint union \(E_1 \sqcup E_2\) where

  1.

    \(E_1\) is \(\textsf {LColl}(z^q, x^p)\) and

  2.

    \(E_2\) is \(\lnot \textsf {LColl}(z^q, x^p) \wedge \textsf {LColl}(c^q, y^p)\).

Let \(\textsf {Pr}(E_1) =\textsf {lcp}^{wor}(p,q) =\mu _1\) (say).

Now, we compute the probability of the event \(E_2\), which is the same as \(E_1^c \wedge \textsf {LColl}(c^q, y^p)\). Given that \(z^q\) is disjoint from \(x^p\) (i.e. \(E_1^c\) holds), we have \((z^q, x^p) \xleftarrow {\textsf {wor}} \{0,1\}^n\).

As \(c_i =\textsf {DM}(z_i)\) and \(y_i =\textsf {DM}(x_i)\), the \(c_i\)’s and \(y_i\)’s are almost uniformly distributed. More precisely, for \((c')^q, (y')^p \xleftarrow {\$} \{0,1\}^n\),

$$\mathsf {D}((c^q, y^p) ; ((c')^q, (y')^p)) \le 4(p +q)/2^n.$$

So by Lemma 1, \(\textsf {Pr}(E_2) \ge (1 - \mu _1)\times (\mu - 4(p +q)/2^n)\) where \(\mu =\textsf {lcp}^{\$}(p,q)\). Now,

$$\begin{aligned} \textsf {Pr}(E)&=\textsf {Pr}(E_1) +\textsf {Pr}(E_2) \\&\ge \mu _1 +(1 - \mu _1) (\mu - \frac{4(p +q)}{2^n}). \end{aligned}$$

So, subtracting the upper bound on \(\textsf {Pr}(E)\) in the ideal world from the lower bound on \(\textsf {Pr}(E)\) in the real world, the distinguishing advantage is at least

$$\mu _1(1 - \mu ) - \frac{8p +4q}{2^n}.$$
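For completeness, the subtraction can be written out as follows. This is only a sketch that combines the ideal world upper bound and the real world lower bound stated above; the subscripts "real" and "ideal" are notation introduced here for illustration, and the last step uses \(1 - \mu _1 \le 1\).

$$\begin{aligned} \textsf {Pr}_{\text {real}}(E) - \textsf {Pr}_{\text {ideal}}(E)&\ge \mu _1 +(1 - \mu _1) \Big (\mu - \frac{4(p +q)}{2^n}\Big ) - \mu - \frac{4p}{2^n} \\&=\mu _1(1 - \mu ) - (1 - \mu _1)\frac{4(p +q)}{2^n} - \frac{4p}{2^n} \\&\ge \mu _1(1 - \mu ) - \frac{8p +4q}{2^n}. \end{aligned}$$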

We have already shown that \(\mu _1(1- \mu )\) is at least \(\frac{1}{9} - \frac{1}{27 \cdot 2^{n/2}}\) when \(p =q =\sqrt{2^n/3}\) (see the last paragraph of our analysis on \(\gamma ^{\textsf {pub}}\circ \pi ^{\textsf {sec}}\)). For this choice of p and q, \(\frac{8p +4q}{2^n} =\frac{4\sqrt{3}}{2^{n/2}} < \frac{6.93}{2^{n/2}}\). Hence the advantage is at least \(\frac{1}{9} - \frac{7}{2^{n/2}}\).
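The gap between the two worlds can also be observed empirically on a toy block size. The following Python sketch is a Monte Carlo illustration, not the distinguisher of Fig. 4 verbatim: it assumes n = 8, picks the public queries \(x^p\) as a random set of distinct points, uses the messages 0, ..., q-1 as construction queries, and models SoKAC21 as \(\textsf {DM}\circ \pi ^{\textsf {sec}}\) with \(\pi ^{\textsf {sec}}(m) =\pi ^{\textsf {pub}}_1(m \oplus K) \oplus K\) as described above. The function name `lcoll_rates` and all parameters are hypothetical.

```python
import random


def lcoll_rates(n_bits=8, trials=2000, seed=1):
    """Estimate Pr[LColl(c^q, y^p)] in the real and ideal worlds (toy sizes)."""
    rng = random.Random(seed)
    size = 1 << n_bits
    p = q = round((size / 3) ** 0.5)  # p = q = sqrt(2^n / 3), rounded
    real_hits = ideal_hits = 0
    for _ in range(trials):
        # Two independent public random permutations and a random key.
        pi1 = list(range(size))
        rng.shuffle(pi1)
        pi2 = list(range(size))
        rng.shuffle(pi2)
        key = rng.randrange(size)

        # Public queries: y_i = DM(x_i) = pi2(x_i) xor x_i for distinct x_i.
        xs = rng.sample(range(size), p)
        ys = {pi2[x] ^ x for x in xs}

        # Real world: c_m = DM(z_m) with z_m = pi1(m xor K) xor K (Even-Mansour).
        real = []
        for m in range(q):
            z = pi1[m ^ key] ^ key
            real.append(pi2[z] ^ z)

        # Ideal world: a random function answers the q distinct queries.
        ideal = [rng.randrange(size) for _ in range(q)]

        real_hits += any(c in ys for c in real)
        ideal_hits += any(c in ys for c in ideal)
    return real_hits / trials, ideal_hits / trials


if __name__ == "__main__":
    real_rate, ideal_rate = lcoll_rates()
    print(f"real: {real_rate:.3f}  ideal: {ideal_rate:.3f}")
```

At such a small n the error terms of the analysis are not negligible, so the estimates only indicate the direction and rough size of the gap.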

6 Birthday Attack on Dual-EWCDM

In this section we provide details of a nonce respecting distinguishing attack on EWCDMD. For better understanding we consider a specific hash function \(\mathcal {H}(m) = K \cdot m\) where K is a nonzero random key chosen uniformly from \(\{0,1\}^n \setminus \{0\}\) and \(m \in \mathcal {M}:= \{0,1\}^n\). Here \(K \cdot m\) denotes field multiplication with respect to a fixed primitive polynomial. Clearly, \(\mathcal {H}\) is a \(\frac{1}{2^n-1}\)-AXU hash. Moreover, it is an injective hash: for distinct messages \(m_1, \ldots , m_q\), the values \(\mathcal {H}(m_1), \ldots , \mathcal {H}(m_q)\) are distinct.
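Both properties of \(\mathcal {H}\) can be checked exhaustively for a small field. The sketch below assumes n = 8 and the primitive polynomial \(x^8 +x^4 +x^3 +x^2 +1\) (0x11D); the paper does not fix a particular polynomial, so this choice and the helper names `gf_mul`, `hash_is_injective` and `max_axu_count` are illustrative assumptions.

```python
from collections import Counter

N_BITS = 8
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed choice of primitive polynomial)


def gf_mul(a, b):
    """Multiply a and b in GF(2^N_BITS), reducing modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N_BITS:  # degree reached N_BITS, so reduce
            a ^= POLY
    return result


def hash_is_injective():
    """Check that m -> K*m is injective for every nonzero key K."""
    size = 1 << N_BITS
    return all(
        len({gf_mul(k, m) for m in range(size)}) == size
        for k in range(1, size)
    )


def max_axu_count():
    """Largest number of nonzero keys K with K*(m1 xor m2) = delta.

    H(m1) xor H(m2) = K*(m1 xor m2), so it suffices to range over the
    nonzero difference d = m1 xor m2. A maximum of 1 means the hash is
    1/(2^n - 1)-AXU.
    """
    size = 1 << N_BITS
    worst = 0
    for d in range(1, size):
        counts = Counter(gf_mul(k, d) for k in range(1, size))
        worst = max(worst, max(counts.values()))
    return worst


if __name__ == "__main__":
    print(hash_is_injective(), max_axu_count())
```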

Distinguishing Attack. \(\mathcal {A}\) chooses \((\nu _1, m_1), \ldots , (\nu _q, m_q) \in \{0,1\}^ n \times \mathcal {M}\) where all \(\nu _i\)’s are distinct and all \(m_i\)’s are distinct. Suppose \(T_1, \ldots , T_q\) are the responses. \(\mathcal {A}\) returns 1 if there is a collision among the \(T_i\) values; otherwise it returns 0.

When \(\mathcal {A}\) is interacting with a random function, \(\textsf {Pr}[\mathcal {A}\rightarrow 1] \le q(q-1)/2^{n+1}\) (by the union bound). Now we provide a lower bound on \(\textsf {Pr}[\mathcal {A}\rightarrow 1]\) when \(\mathcal {A}\) is interacting with EWCDMD in which \(\pi _1, \pi _2\) are two independent random permutations and \(\mathcal {H}\) is the above hash function whose key is chosen independently. To obtain this lower bound we first prove the following lemma. Let \(N= 2^n\).

Lemma 5

Let \(x_1, \ldots , x_q \in \{0,1\}^n\) be q distinct values, let \(\nu _1, \ldots , \nu _q \in \{0,1\}^n\) be q distinct values, and let \(\pi \) be a random permutation over \(\{0,1\}^n\). Let C denote the event that there is a collision among the values \(\pi (\nu _i) \oplus x_i\), \(1 \le i \le q\). Then,

$$\begin{aligned} \alpha (1- \beta ) \le \textsf {Pr}[C] \le \alpha \end{aligned}$$

where \(\alpha = \frac{q(q-1)}{2(N-1)}\) and \(\beta = \frac{(q-2)(q+1)}{4(N-3)}\). In particular, for distinct \(x_i\)’s, the probability that there is a collision among the \(\pi (x_i) \oplus x_i\) values is at least \(\alpha (1 - \beta )\).

\(\mathbf{Proof. }\) Let \(E_{i,j}\) denote the event that \(\pi (\nu _i) \oplus \pi (\nu _j) = x_i \oplus x_j\). As \(x_i \oplus x_j \ne 0\) for all \(i \ne j\), we have \(\textsf {Pr}[E_{i,j}] = 1/(N-1)\). Let \(C = \cup _{i \ne j} E_{i,j}\) denote the collision event. By the union bound we can easily obtain the upper bound

$$\begin{aligned} \textsf {Pr}[C] \le \alpha := \frac{q(q-1)}{2(N-1)}. \end{aligned}$$

Now, we show the lower bound. For this, we apply Bonferroni’s inequality and obtain the following lower bound on the collision probability:

$$\begin{aligned} \textsf {Pr}[C] \ge \alpha - \sum \textsf {Pr}[E_{i,j} \cap E_{k,l}] \end{aligned}$$

where the sum is taken over all possible choices of \(\{\{i,j \}, \{k, l\}\}\), and the number of such choices is at most \(\binom{q(q-1)/2}{2} =q(q-1)(q+1)(q-2)/8 \). Note that for each such choice of i, j, k, l,

$$\textsf {Pr}[E_{i,j} \cap E_{k,l}] \le \frac{1}{(N-1)(N-3)}.$$

Hence,

$$\begin{aligned} \textsf {Pr}[C]&\ge \alpha - \frac{q(q-1)(q+1)(q-2)}{8(N-1)(N-3)} \end{aligned}$$
(9)
$$\begin{aligned}&= \alpha (1 - \frac{(q-2)(q+1)}{4(N-3)}) = \alpha (1 - \beta ). \end{aligned}$$
(10)

This completes the proof.    \(\square \)
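The two bounds of Lemma 5 can be compared with the exact collision probability for very small parameters. The sketch below relies on the fact that, for distinct \(\nu _i\), the tuple \((\pi (\nu _1), \ldots , \pi (\nu _q))\) is uniform over all tuples of q distinct values, so it simply enumerates those tuples. The function names and the toy sizes (n = 3) are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations


def exact_collision_probability(xs, n_bits):
    """Exact Pr[C]: a collision among pi(nu_i) xor x_i for distinct nu_i."""
    size = 1 << n_bits
    q = len(xs)
    total = hits = 0
    # Enumerate every possible tuple (pi(nu_1), ..., pi(nu_q)) of distinct values.
    for images in permutations(range(size), q):
        total += 1
        masked = {img ^ x for img, x in zip(images, xs)}
        hits += len(masked) < q
    return Fraction(hits, total)


def lemma5_bounds(q, n_bits):
    """Return (alpha * (1 - beta), alpha) as defined in Lemma 5."""
    size = 1 << n_bits
    alpha = Fraction(q * (q - 1), 2 * (size - 1))
    beta = Fraction((q - 2) * (q + 1), 4 * (size - 3))
    return alpha * (1 - beta), alpha


if __name__ == "__main__":
    prob = exact_collision_probability([0, 1, 2], n_bits=3)
    lower, upper = lemma5_bounds(q=3, n_bits=3)
    print(lower, prob, upper)
```

For N = 8 and \(x = (0, 1, 2)\) a hand count gives 128 colliding tuples out of 336, i.e. probability 8/21, which lies between \(\alpha (1 - \beta ) = 12/35\) and \(\alpha = 3/7\).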

Advantage Computation. Using the above Lemma we now show that the probability that \(\mathcal {A}\) returns 1 while interacting with EWCDMD is significant when \(q = O(2^{n/2})\).

Let \(C_1\) denote the event that there is a collision among the values \(z_i := \pi _1(\nu _i) \oplus \mathcal {H}(m_i)\). We can apply our lemma as the \(\mathcal {H}(m_i)\)’s are distinct due to our choice of the hash function. Thus, \(\textsf {Pr}[C_1] \ge \alpha (1 - \beta )\). Moreover, \(\textsf {Pr}[\lnot C_1] \ge (1 - \alpha )\). Given \(\lnot C_1\), the T values are outputs of the Davies-Meyer function based on the random permutation \(\pi _2\) for distinct inputs. So by using the previous lemma,

$$\textsf {Pr}(\text{ collision } \text{ in } T \text{ values } ~|~\lnot C_1) \ge \alpha (1 - \beta ).$$

Hence,

$$\begin{aligned} \textsf {Pr}(\mathcal {A}\rightarrow 1)&\ge \textsf {Pr}(C_1) + \textsf {Pr}(\mathrm {collision\,in\,} T \mathrm {\,values\,}~|~\lnot C_1) \times \textsf {Pr}[\lnot C_1]\\&\ge \alpha (1- \beta ) +(1 - \alpha ) \times \textsf {Pr}(\mathrm {collision\,in\,} T \mathrm {\,values\,}~|~\lnot C_1) \\&\ge \alpha (1- \beta ) +\alpha (1 - \alpha )(1 - \beta ) \\&=(2 \alpha - \alpha ^2)(1 - \beta ) \ge 2\alpha - 2\alpha \beta - \alpha ^2. \end{aligned}$$

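Thus, the advantage of the adversary is at least \(\alpha - 2\alpha \beta - \alpha ^2 =\alpha (1 - 2\beta - \alpha )\), since the probability of returning 1 for a random function is at most \(q(q-1)/2^{n+1} \le \alpha \). It is easy to see that when \(2q^2 \le N\), we have \(1 - 2 \beta - \alpha \ge 1/2\) and hence the advantage is at least \(\alpha /2 =\frac{q(q-1)}{4(N-1)}\).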
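The collision attack can be simulated for a toy block size to see the gap between the two worlds. The sketch below assumes n = 8, the polynomial \(x^8 +x^4 +x^3 +x^2 +1\) for the field multiplication in \(\mathcal {H}\), and q = 11 (so that \(2q^2 \le N\)); it computes each tag as \(T_i = \pi _2(z_i) \oplus z_i\) with \(z_i = \pi _1(\nu _i) \oplus \mathcal {H}(m_i)\), as in the analysis above. The names `gf_mul` and `collision_rates` and all parameter values are illustrative assumptions.

```python
import random

N_BITS = 8
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed choice of primitive polynomial)


def gf_mul(a, b):
    """Multiply a and b in GF(2^N_BITS), reducing modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N_BITS:
            a ^= POLY
    return result


def collision_rates(q=11, trials=3000, seed=7):
    """Estimate Pr[A -> 1] against EWCDMD and against a random function."""
    rng = random.Random(seed)
    size = 1 << N_BITS
    real_hits = ideal_hits = 0
    for _ in range(trials):
        pi1 = list(range(size))
        rng.shuffle(pi1)
        pi2 = list(range(size))
        rng.shuffle(pi2)
        key = rng.randrange(1, size)  # nonzero hash key

        nonces = rng.sample(range(size), q)  # distinct nonces
        msgs = rng.sample(range(size), q)    # distinct messages

        tags = []
        for nu, m in zip(nonces, msgs):
            z = pi1[nu] ^ gf_mul(key, m)  # z_i = pi_1(nu_i) xor H(m_i)
            tags.append(pi2[z] ^ z)       # T_i = DM(z_i)
        real_hits += len(set(tags)) < q

        # Random function: independent uniform tags for distinct queries.
        ideal = [rng.randrange(size) for _ in range(q)]
        ideal_hits += len(set(ideal)) < q
    return real_hits / trials, ideal_hits / trials


if __name__ == "__main__":
    real_rate, ideal_rate = collision_rates()
    print(f"EWCDMD: {real_rate:.3f}  random function: {ideal_rate:.3f}")
```

The real world collision rate should be roughly twice the ideal one, in line with the \((2\alpha - \alpha ^2)(1 - \beta )\) lower bound.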

Remark 1

We would like to note that the distinguishing advantages of both constructions can be made closer to one if we repeat the whole process independently O(n) times.

6.1 Issues in the Previous Proofs

Now we briefly describe the issues in the proofs of [CLM19, MN17]. Both proofs use the H-technique and hence are broadly divided into two parts: bounding the probability of bad events and finding a good lower bound on the probability of realizing any fixed good transcript in the real world. The flaws in these proofs lie in the good transcript analysis.

Suppose \(\pi _1\) and \(\pi _2\) are two random permutations. In both proofs, the good transcript analysis amounts to computing the probability distribution of the sum of the two random permutations. More precisely, for fixed \(x_1, y_1, \lambda _1, \ldots , x_q, y_q, \lambda _q \in \{0,1\}^n\), we want to provide a lower bound on the probability of the event that \(\pi _1(x_i) \oplus \pi _2(y_i) =\lambda _i\) for all i. This is also known as mirror theory and has been studied in several papers [Pat10, Pat13, DDN+17a, DDNY19, DDNY18]. The desired lower bounds are known if the equality patterns of the \(x_i\)’s and \(y_i\)’s satisfy certain conditions. In the proofs of [CLM19, MN17], the equality pattern of the \(y_i\)’s depends on the values of \(\pi _1(x_i)\) for all i. So, clearly, we cannot use the mirror theory based lower bound. This is the main flaw of the proofs.

7 Concluding Discussion

We have demonstrated a distinguishing attack on EWCDMD. We note that this attack does not work for EDM, EWCDM and EDMD, as we cannot write them as a composition of two non-injective functions. We have also demonstrated a birthday attack on SoKAC21. This attack does not work if we mask the final output by a key (which is another variant of sum of key alternating ciphers). However, at the same time, we do not know how to prove beyond birthday security of that variant.

7.1 Some Open Problems

The following are some open problems that may be of interest to the cryptography community.

  1. We would like to note that our attack against EWCDMD is a PRF attack and it is not easy to extend it to a forging attack in a nonce respecting situation. Thus, proving MAC security would be an interesting research problem.

  2. One can consider the following dual variant:

    $$\begin{aligned} \pi _2(\pi _1(\nu ) \oplus \mathcal {H}(m)) \oplus \pi _1(\nu ). \end{aligned}$$
    (11)

    This is very close to the sum of permutations. However, the presence of \(\mathcal {H}(m)\) makes it very difficult to prove its security (without using Patarin’s claim or conjecture on the interpolation probability of the sum of random permutations). Moreover, it cannot be expressed as a composition of functions with n-bit outputs. Hence it is a potential dual candidate of EWCDM.

  3. Another possibility is to use three independent random permutations. As mentioned in [CS16], we can consider

    $$\begin{aligned} \pi _3\big ( \pi _1(\nu ) \oplus \pi _2(\nu ) \oplus \mathcal {H}(m) \big ). \end{aligned}$$

    This will give \(2^n\) security in the nonce respecting model, assuming that the sum of permutations gives n-bit PRF security. However, we do not know the trade-off between the number of allowed repetitions of a nonce and the security bound.

  4. Proving beyond birthday security (or demonstrating birthday attacks) of some other variants of SoKAC21 would be an interesting open problem.