1 Introduction

Collision-resistant hash functions (\(\textsf{CRH}\)) are a fundamental primitive used throughout cryptography. These are functions that shrink their input but for which it is computationally infeasible to find two inputs (called “colliding” inputs) that map to the same output, even though many such pairs exist.

Recently, natural relaxations of such hash functions, called Multi-Collision-Resistant Hash Functions (\(t\text {-}\textsf{MCRH}\) for some integer t), have been studied [2, 4, 17,18,19]. These are functions where it is computationally infeasible to find a set of t distinct inputs that are all mapped to the same output, even though many such collisions exist and moreover, it might even be possible to find sets of \((t-1)\) colliding inputs efficiently. Clearly, a \(\textsf{CRH}\) is a \(t\text {-}\textsf{MCRH}\) for any value of \(t \ge 2\). In this paper, we address the question of whether the existence of a \(t\text {-}\textsf{MCRH}\) for some \(t > 2\) implies the existence of a \(\textsf{CRH}\).

The existing evidence in this regard is ambiguous. In some important applications like constant-round statistically hiding commitments, \(\textsf{CRH}\) may be replaced by \(\textsf{MCRH}\) [2, 18]. Further, \(\textsf{MCRH}\) imply a different relaxation of \(\textsf{CRH}\) called distributional \(\textsf{CRH}\) [19]. Similar to \(\textsf{CRH}\), there is also a blackbox separation between \(\textsf{MCRH}\) and one-way permutations [2, 18]. These suggest that \(\textsf{MCRH}\) might be as powerful as \(\textsf{CRH}\).

On the other hand, \(\textsf{CRH}\) have properties that \(\textsf{MCRH}\) are not known to possess. For instance, it is well known that shrinkage of even a single bit suffices to construct a \(\textsf{CRH}\) of essentially any desired shrinkage (see [10, Section 6.2.3] for details). No analogous transformation for \(t\text {-}\textsf{MCRH}\) that preserves the number t of collisions resisted is known. A non-trivial transformation that somewhat increases t is known [4], but it only applies when starting from a \(t\text {-}\textsf{MCRH}\) that already has substantial shrinkage.

1.1 Our Results

Loosely speaking, we show that the existence of \(t\text {-}\textsf{MCRH}\) for \(t = 3\) or 4 that are sufficiently shrinking implies the existence of \(\textsf{CRH}\). Our proof of this is non-constructive and non-blackbox. It is non-constructive because, even when given an explicit \(t\text {-}\textsf{MCRH}\), we can only prove that a \(\textsf{CRH}\) exists but cannot explicitly point out a specific construction. It is non-blackbox because we make non-blackbox use of a potential \(\textsf{CRH}\) adversary.

Before stating our results formally, we define these primitives. Throughout this work, for a function \(h: \{0,1\}^n \rightarrow \{0,1\}^*\), integer \(t \in \mathbb {N}\) and set \(X\subseteq \{0,1\}^n\), we denote by \(t\text {-}\textrm{coll}_h(X)\) the event that (1) \(|X| = t\) and (2) \(h(x)=h(x')\) for every \(x,x' \in X\).

Definition 1

For functions \(t=t(n)\) and \(\ell =\ell (n)\), a \((t,\ell )\) -multi-collision-resistant hash function (\((t,\ell )\)-\(\textsf{MCRH}\)) consists of a probabilistic polynomial-time algorithm \(\textsf{Gen}\) that on input \(1^n\) outputs a circuit \(h: \{0,1\}^n \rightarrow \{0,1\}^{n-\ell (n)}\) such that the following holds. For every family of polynomial-size circuits \(A=(A_n)_{n \in \mathbb {N}}\), every polynomial p and all sufficiently large \(n \in \mathbb {N}\), it holds that:

$$\begin{aligned} \Pr _{\begin{array}{c} h \leftarrow \textsf{Gen}(1^n)\\ X \leftarrow A_n(h) \end{array}} \big [ t\text {-}\textrm{coll}_h(X) \big ] < 1/p(n). \end{aligned} \tag{1}$$

Observe that for a \((t,\ell )\)-\(\textsf{MCRH}\) to be non-trivial, we need \(\ell (n) \ge \log {t(n)}\). The standard definition of \(\textsf{CRH}\) is equivalent to (2, 1)-\(\textsf{MCRH}\). As noted earlier, while a (2, 1)-\(\textsf{MCRH}\) can be used to construct a (2, cn)-\(\textsf{MCRH}\) for any \(c<1\), this is not known to be true for a \((t,\log {t})\)-\(\textsf{MCRH}\) for \(t > 2\). This potentially qualitative difference between \(t\text {-}\textsf{MCRH}\) with different levels of shrinkage also shows up in the theorems we are able to prove in this paper. Thus, it is important to be explicit about the shrinkage \(\ell \) of an \(\textsf{MCRH}\) in our terminology. (Nevertheless, in some informal discussions we may use the terminology t-\(\textsf{MCRH}\) without explicitly stating the shrinkage.)
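The non-triviality condition comes from a pigeonhole count. Every \(h: \{0,1\}^n \rightarrow \{0,1\}^{n-\ell }\) is guaranteed to have a t-way collision as soon as the \(2^n\) inputs cannot be spread over the \(2^{n-\ell }\) outputs with at most \(t-1\) inputs per output:

$$\begin{aligned} 2^{n} > (t-1)\cdot 2^{n-\ell } \iff 2^{\ell } > t-1, \end{aligned}$$

and the right-hand side holds whenever \(\ell \ge \log {t}\), since then \(2^{\ell } \ge t\).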

Variants of \(\textsf{MCRH}\). We also consider certain variants of the definition of \(\textsf{MCRH}\). In an infinitely often \(\textsf{MCRH}\), we only require every adversary to fail on infinitely many n’s (rather than on all sufficiently large n’s). More precisely, we say that \(\textsf{Gen}\) is a \((t,\ell )\)-\(\textsf{ioMCRH}\) if, for every family of polynomial-size circuits A and every polynomial p, Eq. (1) holds for infinitely many n’s. Every \(\textsf{MCRH}\) is also an \(\textsf{ioMCRH}\), but the converse is not necessarily true.

We say that a hash function family is non-uniform if the sampling algorithm is non-uniform. That is, instead of an algorithm \(\textsf{Gen}\) that samples the hash functions h when run as \(\textsf{Gen}(1^n)\), there is a family of probabilistic circuits \(\left( \textsf{Gen}_n\right) _{n\in {\mathbb {N}}}\) such that \(\textsf{Gen}_n\) has size \(\textsf{poly}(n)\) and outputs h. In this work, we follow the standard practice of modeling adversaries as non-uniform circuits. Jumping ahead, as some of our constructions make use of a potential non-uniform adversary, the hash functions we construct will also be non-uniform.

Remark 2

Hsiao and Reyzin [14] consider a variant of \(\textsf{CRH}\) in which the adversary is also given the coins used by the generator to sample the hash function h. An analogous variant may be considered for \(\textsf{MCRH}\) (with or without the infinitely often and non-uniform qualifiers). We remark that all of our results can be easily adapted to the [14] setting as well.

Main Results. With the above definitions in hand, we are ready to state our main results. The first result, which is easiest to state, is the construction of an (infinitely often and non-uniform) \(\textsf{CRH}\) from a sufficiently shrinking 3-\(\textsf{MCRH}\). Similar to standard \(\textsf{CRH}\), the shrinkage of non-uniform infinitely often secure \(\textsf{CRH}\) can also be generically increased from a single bit to cn for any \(c<1\), so we often do not specify it. The parameters stated in the theorems below result in \(\textsf{CRH}\) with shrinkage \(\Omega (\log {n})\).

Theorem 3

Suppose there exists a \(\big (3,n/2+\omega (\log {n})\big )\)-\(\textsf{MCRH}\). Then there exists a non-uniform \(\textsf{ioCRH}\).

The same conclusion also holds under the weaker assumption that the 3-\(\textsf{MCRH}\) in the hypothesis above is non-uniform and/or only infinitely often secure (this is true for the remaining theorems as well). Given Theorem 3, it is natural to wonder for what values of t we can construct \(\textsf{ioCRH}\) from t-\(\textsf{MCRH}\). Curiously, while we are able to show such an implication from a sufficiently shrinking 4-\(\textsf{MCRH}\), our techniques stop working when \(t \ge 5\).

Theorem 4

Suppose there exists a \(\big ( 4,\frac{5}{6}n+\omega (\log {n}) \big )\)-\(\textsf{MCRH}\). Then there exists a non-uniform \(\textsf{ioCRH}\).

We discuss the limitation of our techniques to \(t \le 4\) in Sect. 4. Getting around this and constructing \(\textsf{ioCRH}\) from \(t\text {-}\textsf{MCRH}\) for larger constants t (let alone all constants or even super-constant values of t) is an interesting open problem. Despite this restriction, for large enough constants t, we are able to show that t-\(\textsf{MCRH}\) generically implies \(t'\)-\(\textsf{ioMCRH}\) for many values of \(t' < t\).

Theorem 5

Consider any constants t, k, \(t_f\ge \max \left[ (2t\sqrt{k-1})^{2/3},24\right] \), and function \(\ell =\ell (n)\). If there exists a \((t,\ell )\)-\(\textsf{MCRH}\) then there exists a \((t_f,\ell _f)\)-\(\textsf{ioMCRH}\), for \(\ell _f(n) = \min \big [(\ell (n)-n/k),(\ell (kn)-n(k-1)-O(\log {n}))\big ]\).

Theorem 5 is only meaningful if t is larger than \(t_f\) (and thus larger than 24). These bounds are not optimized, and our construction works for some smaller values of t and \(t_f\) as well. Starting with any \((t,\ell )\), the parameter k controls a trade-off between the best values of \(t_f\) and \(\ell _f\) we can obtain from the above theorem. It may be verified that, for the \(\ell _f(n)\) above to be positive for some value of k, we need to start with an \(\ell \) such that \(\ell (n) > n/2\). With appropriate choices of the parameters, the theorem can be applied multiple times in sequence to get a \((t_f',\ell _f')\)-\(\textsf{ioMCRH}\) from the \((t_f,\ell _f)\)-\(\textsf{ioMCRH}\) for some \(t_f'<t_f\), etc. For \(t > 4\), however, there is no sequence of parameters that can be used to get all the way down to a (2, 1)-\(\textsf{ioMCRH}\).

For an example of an instantiation, consider a 100-\(\textsf{MCRH}\) that has output of length n/10—that is, a (100, 9n/10)-\(\textsf{MCRH}\). With \(k = 2\), noting that \((2\cdot 100 \cdot 1)^{2/3} \approx 34\), the above theorem gives us a (35, 4n/10)-\(\textsf{ioMCRH}\) (ignoring additive \(O(\log {n})\) terms). Similarly, with \(k = 4\) and \(k = 9\), we can get a (50, 6n/10)-\(\textsf{ioMCRH}\) and a (69, n/10)-\(\textsf{ioMCRH}\), respectively. Values of k outside the range [2, 9] lead to negative values for \(\ell _f(n)\) and thus do not result in shrinking hash functions.
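The arithmetic in this example can be reproduced with a short Python sketch. For illustration only, it assumes that the shrinkage is linear, \(\ell (n) = c\cdot n\) for a constant c (so \(\ell (kn) = kc\cdot n\)). It also drops the additive \(O(\log {n})\) term and takes \(t_f\) to be the smallest integer meeting the bound of Theorem 5. The function name `theorem5_params` is hypothetical.

```python
import math
from fractions import Fraction


def theorem5_params(t: int, c: Fraction, k: int) -> tuple:
    """(t_f, ell_f / n) from Theorem 5 for a (t, c*n)-MCRH.

    Assumes ell(n) = c*n and ignores the additive O(log n) term.
    """
    # Smallest integer t_f with t_f >= max[(2 t sqrt(k-1))^(2/3), 24].
    t_f = max(math.ceil((2 * t * math.sqrt(k - 1)) ** (2 / 3)), 24)
    # ell_f(n) = min[ell(n) - n/k, ell(kn) - n(k-1)], with ell(kn) = k*c*n.
    ell_f = min(c - Fraction(1, k), k * c - (k - 1))
    return t_f, ell_f


# The (100, 9n/10)-MCRH example from the text.
for k in (2, 4, 9, 10):
    t_f, ell_f = theorem5_params(100, Fraction(9, 10), k)
    print(f"k={k}: ({t_f}, {ell_f} n)")
# k = 2, 4, 9 reproduce (35, 4n/10), (50, 6n/10) and (69, n/10);
# k = 10 gives ell_f = 0, which turns negative once O(log n) is subtracted.
```

Filtering for \(\ell _f > 0\) over all \(k \ge 2\) recovers the range [2, 9] stated above.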

1.2 Our Techniques

In this overview we focus on Theorem 3, that is, our approach for constructing an (infinitely often and non-uniform) \(\textsf{CRH}\) from a 3-\(\textsf{MCRH}\). Suppose we have a \((3,\ell )\)-\(\textsf{MCRH}\), for a shrinkage parameter \(\ell =\ell (n)\) to be determined below. At a high level, we will construct two families of functions such that if neither of them is a \(\textsf{CRH}\), then 3-way collisions can be found in the original hash function family.

Our approach is inspired by a recent construction of Komargodski and Yogev [19] of distributional \(\textsf{CRH}\) from \(\textsf{MCRH}\). Distributional \(\textsf{CRH}\) (or \(\textsf{DCRH}\)), introduced by Dubrov and Ishai [7], are a different relaxation of \(\textsf{CRH}\) in which it should be hard to find random collisions (although it may be easy to find some specific collisions). In contrast to [19] who construct (infinitely often) \(\textsf{DCRH}\) from \(\textsf{MCRH}\), we show that \(\textsf{MCRH}\) imply worst-case collision resistance. We defer a thorough comparison of our techniques and results with those of [19] to Sect. 1.3.1.

The candidate \(\textsf{CRH}\)s. Fix some input length n. Let \(\mathcal {H}= \left\{ h:\{0,1\}^n\rightarrow \{0,1\}^{n-\ell } \right\} \) be a \((3,\ell )\)-\(\textsf{MCRH}\) (for simplicity we assume that the hash functions are sampled uniformly at random from this family). Since it may be possible to find 2-way collisions for functions in \(\mathcal {H}\), we will have to modify \(\mathcal {H}\). Toward this end we introduce an additional non-cryptographic function family \(\mathcal {G}= \left\{ g:\{0,1\}^n\rightarrow \{0,1\}^{m} \right\} \), with \(m=m(n) < \ell (n)\). The exact properties that we need from \(\mathcal {G}\), as well as the setting of the parameter \(m=m(n)\), will be specified below.

Thus, our first family of hash functions is \(\mathcal {F}= \left\{ f_{h,g}:\{0,1\}^n\rightarrow \{0,1\}^{n-\ell +m} \right\} \), where \(h\in \mathcal {H}\), \(g\in \mathcal {G}\). The evaluation \(f_{h,g}(x)\) is simply the concatenation \(f_{h,g}(x) = (h(x),g(x))\). There are two possibilities: either \(\mathcal {F}\) is a \(\textsf{CRH}\) or it is not. If the former is true, then we are done and so we might as well assume the latter. Namely, assume that there exists an efficient (non-uniform) adversary A that, given \(f_{h,g}\in \mathcal {F}\) as input, outputs \((x_0,x_1)\) such that \(x_0\ne x_1\) but \(f_{h,g}(x_0) = f_{h,g}(x_1)\). For simplicity, let us assume that A is perfect—that is, that A finds a valid collision for any \(f_{h,g}\in \mathcal {F}\).

We will use A—an adversary for \(\mathcal {F}\)—to construct a second family of hash functions. We denote the family by \(\mathcal {F}_A = \left\{ f_{h,A}:\mathcal {G}\rightarrow \{0,1\}^{n-\ell } \right\} \). Each function \(f_{h,A}\) takes as its input the description of a function g from \(\mathcal {G}\), runs \(A(f_{h,g})\) to get \((x_0,x_1)\)—a collision for \(f_{h,g}\)—and outputs \(h(x_0)\). The fact that \(\mathcal {F}_A\) depends on an adversary A is what makes our construction non-blackbox, non-uniform, and non-constructive. In particular, as the description of the family \(\mathcal {F}_A\) involves the description of a purported adversary A for \(\mathcal {F}\), unless this adversary was explicitly given, we would be unable to point out an explicit construction of \(\mathcal {F}_A\) (even given \(\mathcal {H}\)).

What makes \(\mathcal {F}_A\) interesting for our purposes is that, intuitively, a pairwise collision \(g_0,g_1 \in \mathcal {G}\) for \(\mathcal {F}_A\) actually specifies four inputs (namely, \((x_{00},x_{01}) \leftarrow A(f_{h,g_0})\) and \((x_{10},x_{11}) \leftarrow A(f_{h,g_1})\)) that all collide under h. We will attempt to leverage this fact to argue that \(\mathcal {F}_A\) must be collision-resistant.

Thus, assume toward a contradiction that \(\mathcal {F}_A\) is not a \(\textsf{CRH}\), that is, that there exists an efficient adversary \(A'\) that finds collisions for \(\mathcal {F}_A\). We again assume that \(A'\) is perfect in the same sense as A, and show how to use \(A'\) to find a 3-way collision for \(\mathcal {H}\).

Finding 3-way collisions. For any \(h\in \mathcal {H}\), given A and \(A'\) as above, we can find a 3-way collision for h as follows:

1. Run \(A'(f_{h,A})\) to get \((g_0,g_1)\).
2. Run \(A(f_{h,g_0})\) to get \((x_{00},x_{01})\).
3. Run \(A(f_{h,g_1})\) to get \((x_{10},x_{11})\).
4. Identify three distinct elements among \(\left\{ x_{00},x_{01},x_{10},x_{11} \right\} \) and output them if they exist.

We make the following observations about this procedure:

1. The fact that A finds valid collisions implies that \(x_{00}\ne x_{01}\) and \(x_{10}\ne x_{11}\).
2. The fact that \(g_0\) and \(g_1\) are a collision for \(f_{h,A}\) implies that, whether given \(f_{h,g_0}\) or \(f_{h,g_1}\) as input, A will find collisions that have the same output under h—that is, \(h(x_{00}) = h(x_{01}) = h(x_{10}) = h(x_{11})\).
3. The definition of \(f_{h,g_0}\) and \(f_{h,g_1}\) implies that \(g_0(x_{00}) = g_0(x_{01})\) and \(g_1(x_{10}) = g_1(x_{11})\). Further, the fact that \(A'\) finds valid collisions implies that \(g_0 \ne g_1\).

Property 2 above implies that the set \(X = \{ x_{00},x_{01},x_{10},x_{11}\}\) forms a collision under h, while Property 1 implies that X contains at least 2 distinct elements. Unfortunately though, nothing so far guarantees that this set contains more than 2 elements. A particularly alarming, but so far possible, scenario is that \(x_{00}=x_{10}\) and \(x_{01}=x_{11}\). Thus, it is not at all immediate that the set X contains a 3-way collision. This is the point where we will need to use special properties of the family of functions \(\mathcal {G}\). In particular, we will choose \(\mathcal {G}\) in such a way that Property 3 above will ensure that X does indeed contain a 3-way collision for h.

The family \(\mathcal {G}\). Let \({\mathbb {F}}\) denote the finite field of size \(2^{n/2}\). Functions in \(\mathcal {G}\) correspond to elements of \({\mathbb {F}}\). Thus, for each \(\alpha \in {\mathbb {F}}\), there is a function \(g_\alpha \in \mathcal {G}\), which is computed as follows. Given input \(x\in \{0,1\}^n\), divide x into two halves \(x_L,x_R\in \{0,1\}^{n/2}\) and interpret them as elements of \({\mathbb {F}}\) in the natural way. The evaluation of \(g_\alpha (x)\) is simply the value of the line specified by \((x_L,x_R)\) at the point \(\alpha \)—that is, \(g_\alpha (x) = x_L + \alpha \cdot x_R\) (computations performed over \({\mathbb {F}}\)).

If for some \(x_0,x_1\in \{0,1\}^n\) and some \(g_\alpha \in \mathcal {G}\) we have \(g_\alpha (x_0) = g_\alpha (x_1)\), then the lines specified by \(x_0\) and \(x_1\) intersect at \((\alpha ,g_\alpha (x_0))\). Since two distinct lines intersect in at most one point, \(\mathcal {G}\) has the following property: for any two distinct \(x_0,x_1\in \{0,1\}^n\), there is at most one function \(g\in \mathcal {G}\) such that \(g(x_0) = g(x_1)\).
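This property can be checked exhaustively on a toy instance. The sketch below assumes n = 8, so that \({\mathbb {F}}\) is \(\mathrm{GF}(2^4)\). It represents the field using the irreducible polynomial \(x^4+x+1\), which is a choice made here and not taken from the text. For every pair of distinct inputs, it counts the values \(\alpha \) on which the pair collides, and it confirms that the maximum is 1.

```python
from collections import Counter
from itertools import combinations

HALF = 4              # toy choice: n = 8, so F = GF(2^(n/2)) = GF(16)
MODULUS = 0b10011     # x^4 + x + 1, irreducible over GF(2) (assumed representation)


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^4), each given as a 4-bit integer."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # field addition is XOR
        b >>= 1
        a <<= 1
        if a >> HALF:            # degree reached 4: reduce modulo x^4 + x + 1
            a ^= MODULUS
    return result


def g(alpha: int, x: int) -> int:
    """g_alpha(x) = x_L + alpha * x_R, with x split into two 4-bit halves."""
    x_left, x_right = x >> HALF, x & ((1 << HALF) - 1)
    return x_left ^ gf_mul(alpha, x_right)


def max_shared_alphas() -> int:
    """Largest number of alphas on which some pair of distinct 8-bit inputs collides."""
    counts = Counter()
    for alpha in range(1 << HALF):
        buckets = {}
        for x in range(1 << (2 * HALF)):
            buckets.setdefault(g(alpha, x), []).append(x)
        for bucket in buckets.values():
            for pair in combinations(bucket, 2):   # pairs colliding under g_alpha
                counts[pair] += 1
    return max(counts.values())


print(max_shared_alphas())  # → 1
```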

Consider now the two pairwise collisions that we have: both \(\left\{ x_{00},x_{01} \right\} \) and \(\left\{ x_{10},x_{11} \right\} \) are pairs of distinct inputs such that \(g_0(x_{00}) = g_0(x_{01})\) and \(g_1(x_{10}) = g_1(x_{11})\). Suppose that these two sets are identical, for example that \(x_{00}=x_{10}\) and \(x_{01}=x_{11}\). Since \(g_0\ne g_1\), this implies that there are two distinct functions in \(\mathcal {G}\) on which \(x_{00}\) and \(x_{01}\) collide, contradicting the above property of \(\mathcal {G}\).

Thus, these two sets cannot be identical, implying that the set of collisions X above contains at least 3 distinct elements. This gives us a 3-way collision for h. We conclude that if \(\mathcal {H}\) is a 3-\(\textsf{MCRH}\), then either A or \(A'\) cannot exist. That is, either \(\mathcal {F}\) is collision-resistant, or \(\mathcal {F}_A\), constructed using the corresponding adversary A, is collision-resistant.
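The whole argument can be run end to end on a toy instance. In the sketch below, which is illustrative only, the 3-MCRH \(\mathcal {H}\) is replaced by a hypothetical fixed function h that keeps the top 2 bits of SHA-256. This gives n = 8 and \(\ell = 6 > n/2\), with \({\mathbb {F}} = \mathrm{GF}(2^4)\). The "perfect" adversaries A and \(A'\) are implemented as brute-force search. Whatever h is, the four steps above must output three distinct inputs that share the same h-value.

```python
import hashlib

HALF = 4                  # toy choice: n = 8 and F = GF(2^4)
N = 2 * HALF
MODULUS = 0b10011         # x^4 + x + 1 (assumed field representation)
OUT_BITS = 2              # toy h maps 8 bits to 2 bits, i.e. ell = 6 > n/2


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); elements are 4-bit integers, addition is XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> HALF:
            a ^= MODULUS
    return result


def g(alpha: int, x: int) -> int:
    """The line-evaluation function g_alpha(x) = x_L + alpha * x_R."""
    return (x >> HALF) ^ gf_mul(alpha, x & ((1 << HALF) - 1))


def h(x: int) -> int:
    """Hypothetical stand-in for h in H: the top OUT_BITS bits of SHA-256."""
    return hashlib.sha256(bytes([x])).digest()[0] >> (8 - OUT_BITS)


def A(alpha: int) -> tuple:
    """Brute-force 'perfect' adversary: a collision for f_{h,g_alpha}(x) = (h(x), g_alpha(x))."""
    seen = {}
    for x in range(1 << N):
        value = (h(x), g(alpha, x))
        if value in seen:
            return seen[value], x
        seen[value] = x
    raise AssertionError("unreachable: 2^8 inputs but only 2^6 outputs")


def f_hA(alpha: int) -> int:
    """f_{h,A}: run A on f_{h,g_alpha} to get (x_0, x_1) and output h(x_0)."""
    x0, _ = A(alpha)
    return h(x0)


def A_prime() -> tuple:
    """Brute-force 'perfect' adversary for f_{h,A} over the 16 descriptions alpha."""
    seen = {}
    for alpha in range(1 << HALF):
        value = f_hA(alpha)
        if value in seen:
            return seen[value], alpha
        seen[value] = alpha
    raise AssertionError("unreachable: 16 inputs but only 4 outputs")


def find_3way():
    """Steps 1-4 of the procedure: returns three distinct inputs colliding under h."""
    g0, g1 = A_prime()                     # step 1
    x00, x01 = A(g0)                       # step 2
    x10, x11 = A(g1)                       # step 3
    xs = sorted({x00, x01, x10, x11})      # step 4
    return xs[:3] if len(xs) >= 3 else None


print(find_3way())
```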

Shrinkage. It remains to argue that both \(\mathcal {F}\) and \(\mathcal {F}_A\) are in fact shrinking. As noted earlier, a \(\textsf{CRH}\) with one bit of shrinkage is sufficient to construct a \(\textsf{CRH}\) with essentially any desired shrinkage (and the same holds for non-uniform \(\textsf{ioCRH}\)). So it would be sufficient for \(\mathcal {F}\) and \(\mathcal {F}_A\) to shrink by even one bit.
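One standard way to see why one bit of shrinkage suffices is to compose one-bit-shrinking functions. Any collision for the composition then yields a collision at some level. The sketch below illustrates only this reduction and may differ from the construction in [10, Section 6.2.3]. The keyed, truncated SHA-256 map `shrink` is a hypothetical stand-in for a \(\textsf{CRH}\) from m bits to m-1 bits.

```python
import hashlib


def shrink(x: int, m: int, key: bytes) -> int:
    """Hypothetical toy stand-in for a one-bit-shrinking hash {0,1}^m -> {0,1}^(m-1)."""
    digest = hashlib.sha256(key + bytes([m]) + x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - (m - 1))


def chain(x: int, n: int, s: int, key: bytes) -> list:
    """Intermediate values of the s-fold composition {0,1}^n -> {0,1}^(n-s)."""
    values = [x]
    for i in range(s):
        values.append(shrink(values[-1], n - i, key))
    return values


def level_collision(x: int, y: int, n: int, s: int, key: bytes):
    """Given x != y with equal composed outputs, return (i, u, v) such that
    u != v are (n - i)-bit strings colliding under the level-i map."""
    cx, cy = chain(x, n, s, key), chain(y, n, s, key)
    for i in range(s):
        if cx[i] != cy[i] and cx[i + 1] == cy[i + 1]:
            return i, cx[i], cy[i]
    return None  # reached only if (x, y) is not a collision of the composition


def find_collision(n: int, s: int, key: bytes) -> tuple:
    """Brute-force collision for the composition (toy sizes only)."""
    seen = {}
    for x in range(1 << n):
        out = chain(x, n, s, key)[-1]
        if out in seen:
            return seen[out], x
        seen[out] = x
    raise AssertionError("unreachable for s >= 1 by pigeonhole")


n, s, key = 16, 8, b"toy-key"          # 16-bit inputs, 8-bit outputs
x, y = find_collision(n, s, key)
i, u, v = level_collision(x, y, n, s, key)
print(f"collision at level {i}: {u} != {v}")
```

The reduction `level_collision` is what turns an adversary for the composed function into an adversary for a single level. This is why the composed family is collision-resistant whenever each level is.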

By construction, functions in \(\mathcal {G}\) map n-bit inputs to n/2-bit outputs. This means that \(\mathcal {F}\) maps n bits to \((\frac{3}{2}n - \ell )\) bits and is shrinking as long as \(\ell > n/2\). As noted above, each member of \(\mathcal {G}\) is described by an element of \({\mathbb {F}}\), in other words a string of length n/2. Thus, functions in \(\mathcal {F}_A\) map n/2 bits to \((n-\ell )\) bits. So again, if \(\ell > n/2\), this is shrinking.

Coping with imperfect adversaries. Above, we assumed that the adversaries A and \(A'\) work perfectly—given a hash function, they always find a collision for it. This was done for simplicity of presentation here. In the actual construction, there are several difficulties that arise from dealing with imperfect adversaries. First, if A and \(A'\) are standard \(\textsf{CRH}\) adversaries, this would only imply that they find collisions for an infinite set of input lengths n, rather than all large enough n. We can only make the above arguments for the set of n’s for which both of them work, and this set could well be empty. This is the reason that we can only argue that \(\mathcal {F}\) or \(\mathcal {F}_A\) is an infinitely often \(\textsf{CRH}\) rather than a standard \(\textsf{CRH}\).

In addition, in the actual construction we only know that A succeeds with non-negligible probability, rather than with probability 1 as assumed above. This means that \(\mathcal {F}_A\) might only be defined for a relatively small (but non-negligible) fraction of its domain. We resolve this second difficulty by showing how, in general, to transform collision-resistant hash functions that only work on a small subset of their domain, to full-fledged \(\textsf{CRH}\). This transformation, which we find to be of independent interest, is based on the so-called reverse randomization technique, introduced by Lautemann [20] and used in several works in cryptography since [5, 8, 9, 21]. We defer the details to Sect. 2. We remark that this transformation introduces a small overhead and in particular leads to our hypothesis being that \(\ell \) is larger than \(n/2+\omega (\log {n})\) rather than just n/2 as above.

Improving Collision Resistance in General \(t\text {-}\textsf{MCRH}\). A simple way to generalize the above approach, so as to obtain a \(t_f\)-\(\textsf{MCRH}\) from a \(t\text {-}\textsf{MCRH}\) for some \(t_f < t\), is to keep the construction as is and only change the arguments in the proof. Let \(t_f\) be such that \(2t_f-1 \ge t\) (for instance, \(t_f = \lceil (t+1)/2\rceil \)), and let the families \(\mathcal {F}\), \(\mathcal {G}\), and \(\mathcal {F}_A\) be just as defined above. Suppose \(\mathcal {F}\) were not a \(t_f\)-\(\textsf{MCRH}\) and \(\mathcal {F}_A\) were not a \(\textsf{CRH}\). Then we could find a t-wise collision for functions in \(\mathcal {H}\) in the same manner we found 3-wise collisions above. Given \(h\in \mathcal {H}\), we find a pairwise collision \((g_0,g_1)\) for \(f_{h,A}\in \mathcal {F}_A\), and then, for each \(g_b\), we find a \(t_f\)-wise collision \((x_{b1},\dots ,x_{bt_f})\) for \(f_{h,g_b}\). By the same argument as above, the sets \(\left\{ x_{0i} \right\} \) and \(\left\{ x_{1i} \right\} \) can have at most one element in common, and all of their elements have the same value \(h(x_{bi})\). This gives a \((2t_f-1)\)-wise collision for h, and hence a t-wise collision since \(2t_f-1 \ge t\), which is a contradiction. Thus, either \(\mathcal {F}\) is a \(t_f\)-\(\textsf{MCRH}\) or \(\mathcal {F}_A\) is a \(\textsf{CRH}\). However, we only know that one of these two statements holds, so the conclusion we can draw is the weaker of the two: a \(t_f\)-\(\textsf{MCRH}\) exists.
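The counting step is inclusion-exclusion. For \(b \in \{0,1\}\), write \(X_b = \{x_{b1},\dots ,x_{bt_f}\}\). Then

$$\begin{aligned} |X_0 \cup X_1| = |X_0| + |X_1| - |X_0 \cap X_1| \ge t_f + t_f - 1 = 2t_f - 1, \end{aligned}$$

so the union is a t-wise collision for h whenever \(2t_f - 1 \ge t\).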

The price of this transformation is that the shrinkage of the resulting hash functions decreases by at least n/2 relative to that of \(\mathcal {H}\), since n/2 is the output length of functions in \(\mathcal {G}\). Among other things, this prevents the transformation from being applied twice in order to get a \(t_f'\)-\(\textsf{MCRH}\) for some \(t_f' < t_f\). We therefore generalize our construction, both to obtain better shrinkage and to allow \(t_f\) to be much smaller than t. For any \(k \ge 2\), denote by \({\mathbb {F}}_k\) the finite field of size \(2^{n/k}\) (assume that k divides n), so that \({\mathbb {F}}_2\) is the field \({\mathbb {F}}\) used above. Previously, \(\mathcal {G}\) consisted of the functions that evaluate lines over \({\mathbb {F}}_2\). We now let it consist of the functions that evaluate polynomials of degree at most \((k-1)\) over \({\mathbb {F}}_k\). That is, each function \(g\in \mathcal {G}\) corresponds to an element \(\lambda \in {\mathbb {F}}_k\). Given an input \(x\in \{0,1\}^n\), the function interprets x as a list of elements \(x_0, \dots , x_{k-1} \in {\mathbb {F}}_k\) and outputs \(\sum _{i=0}^{k-1} x_i \lambda ^i\).
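The property that makes the generalized \(\mathcal {G}\) useful is that two distinct inputs collide on at most \(k-1\) functions. This is because their difference is a nonzero polynomial of degree at most \(k-1\). The sketch below checks this exhaustively for a toy choice that is hypothetical and not from the text: \({\mathbb {F}}_k = \mathrm{GF}(2^4)\) with modulus \(x^4+x+1\) and k = 3, so n = 12. Because addition is XOR, \(g(x) = g(x')\) holds if and only if \(g(x \oplus x') = 0\). It therefore suffices to count the roots of every nonzero input.

```python
M = 4                 # bits per field element: F_k = GF(2^4) (toy choice)
MODULUS = 0b10011     # x^4 + x + 1, irreducible over GF(2) (assumed representation)
K = 3                 # number of coefficients, so n = K * M = 12


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); elements are 4-bit integers, addition is XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MODULUS
    return result


def g(lam: int, x: int, k: int = K) -> int:
    """Evaluate sum_i x_i * lam^i, where x_i is the i-th M-bit chunk of x (x_0 lowest)."""
    result = 0
    for i in reversed(range(k)):           # Horner's rule
        chunk = (x >> (M * i)) & ((1 << M) - 1)
        result = gf_mul(result, lam) ^ chunk
    return result


def max_agreements(k: int = K) -> int:
    """Largest number of lam on which two distinct inputs x, x' agree.

    g is linear over GF(2), so g(lam, x) == g(lam, x') iff g(lam, x ^ x') == 0;
    it therefore suffices to count the roots of every nonzero d = x ^ x'.
    """
    best = 0
    for d in range(1, 1 << (M * k)):
        roots = sum(1 for lam in range(1 << M) if g(lam, d, k) == 0)
        best = max(best, roots)
    return best


print(max_agreements())  # → 2, i.e. K - 1
```

With k = 2 the same check returns 1, which recovers the line-based property used earlier.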

Notice that the shrinkage of \(\mathcal {F}\) is now \((\ell (n) - n/k)\), as opposed to the \((\ell (n) - n/2)\) earlier. The shrinkage of \(\mathcal {F}_A\) can be computed to be \((\ell (kn)-n(k-1))\), which can be made better than \((\ell (n)-n/2)\) by an appropriate choice of k. We claim now that, for certain values of \(t_f\), either \(\mathcal {F}\) is a \(t_f\)-\(\textsf{MCRH}\), or the \(\mathcal {F}_A\) constructed using the corresponding adversary A is a \(t_f\)-\(\textsf{MCRH}\). If they were not, given an \(h\in \mathcal {H}\), we can proceed along the same lines as earlier to first get a set of functions \(g_1, \dots , g_{t_f}\in \mathcal {G}\) that collide under \(\mathcal {F}_A\). Then, we can use A on each \(f_{h,g_i}\) to get \(t_f\) sets \(X_i = \left\{ x_{i1},\dots ,x_{it_f} \right\} \), each of size \(t_f\), such that all the \(x_{ij}\)’s have the same value under h and all the elements of each \(X_i\) have the same value under \(g_i\).

If we can also prove that there are at least t distinct \(x_{ij}\)’s in the union of these sets, we would have a t-wise collision for h and thus a contradiction. Notice that each set \(X_i\) corresponds to a set of \(t_f\) polynomials (given by \(x_{i1}, \dots , x_{it_f}\)) that all have the same evaluation at the field element, say \(\lambda _i\), corresponding to \(g_i\). Thus we end up with the following question: given \(t_f\) sets \(X_i\) of \(t_f\) polynomials each and \(t_f\) pairs \((\lambda _i,y_i)\) with the guarantee that for each \(x\in X_i\) we have \(x(\lambda _i) = y_i\), what is the smallest possible number of distinct polynomials in the union \(\cup _{i=1}^{t_f} X_i\)?

This is closely related to bounds on the list-decodability of Reed–Solomon codes, which we use to show that as long as \(t_f\) is at least roughly \((2t\sqrt{k-1})^{2/3}\), there have to be at least t distinct elements among the above sets. This gives us a transformation from \(t\text {-}\textsf{MCRH}\) to \(t_f\)-\(\textsf{MCRH}\) for such values of \(t_f\). For large t, this is much better than the transformation to \(\lceil (t+1)/2\rceil \)-\(\textsf{MCRH}\) that followed from our original construction. We elaborate on this in Sect. 3.3. By paying closer attention to details, we show that this transformation can be used to go from a 4-\(\textsf{MCRH}\) to a 3-\(\textsf{MCRH}\) with a loss of n/3 in shrinkage, and then on to a \(\textsf{CRH}\) with an additional loss of n/2. This approach, however, cannot be used to get a \(\textsf{CRH}\) starting from a 5-\(\textsf{MCRH}\). We discuss this barrier in Sect. 4.
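As a sanity check on the shrinkage losses quoted above, one can plug k = 3 and \(\ell (n) = \frac{5}{6}n\) into the shrinkage expressions for \(\mathcal {F}\) and \(\mathcal {F}_A\) given earlier. Here \(\ell (n) = \frac{5}{6}n\) is the hypothesis of Theorem 4, with the \(\omega (\log {n})\) term ignored:

$$\begin{aligned} \ell (n) - \frac{n}{3} = \frac{5}{6}n - \frac{2}{6}n = \frac{n}{2}, \qquad \ell (3n) - 2n = \frac{5}{2}n - 2n = \frac{n}{2}. \end{aligned}$$

Both expressions equal n/2. Up to lower-order terms, this matches the \(\big (3,n/2+\omega (\log {n})\big )\) hypothesis of Theorem 3 used in the second step.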

1.3 Related Work

Multi-collision resistance was first studied by Joux [16], who showed that for a certain class of hash functions called iterated hash functions, certain collision-finding attacks can be augmented to find multi-collisions without much overhead. Subsequent work has studied similar attacks on some other specific classes of hash functions [22, 27, ...]. The formal theoretical study of \(\textsf{MCRH}\) began with the work of Komargodski et al. [17], who defined \(\textsf{MCRH}\) and showed connections to problems arising from Ramsey theory.

A more detailed study of \(\textsf{MCRH}\) was done later in three concurrent and independent works [2, 4, 18]. Berman et al. [2] showed that \((n^2,\sqrt{n})\)-\(\textsf{MCRH}\) can be constructed from the hardness of a variant of the Entropy Approximation problem [6]. Both Berman et al. and Komargodski et al. [2, 18] showed that constant-round statistically hiding commitment schemes can be constructed from \(\textsf{MCRH}\) with various parameters, which implies a blackbox separation between such \(\textsf{MCRH}\) and one-way permutations [12]. This separation extends the well-known separation between \(\textsf{CRH}\) and one-way permutations [25]. The latter separation was also extended in other directions by Bitansky and Degwekar [1].

Komargodski et al. also showed how to use \(\textsf{MCRH}\) to construct succinct argument systems. Additionally, they claimed to show a blackbox separation between \(\textsf{CRH}\) and (3, n/2)-\(\textsf{MCRH}\), but there is a gap in the proof [1, 23], and for the time being such a separation is not known.

Bitansky et al. [4] studied \(\textsf{MCRH}\) and also considered a keyless version of \(\textsf{MCRH}\). They used both variants to construct round-efficient succinct zero-knowledge arguments. Notably, they use the keyless version of \(\textsf{MCRH}\) to construct 3-message zero-knowledge arguments. Holmgren and Lombardi [13] showed how to construct \(\textsf{MCRH}\) (and even \(\textsf{CRH}\)) from exponentially secure one-way functions with certain direct product properties.

The paper closest to ours is that of Komargodski and Yogev [19] on distributional \(\textsf{CRH}\) (\(\textsf{DCRH}\)). \(\textsf{DCRH}\), first defined by Dubrov and Ishai [7], is a relaxation of \(\textsf{CRH}\) where the adversary’s task is to sample a random collision—given a function h, to sample \((x,x')\) where x is a uniformly random input and \(x'\) is uniformly random conditioned on \(h(x) = h(x')\). Whereas with some primitives like one-way functions the distributional version implies the full-fledged one [15], this is not known to be the case with \(\textsf{CRH}\). See also Bitansky et al. [3] for more recent work on \(\textsf{DCRH}\).

1.3.1 Detailed Comparison with [19]

Komargodski and Yogev show that the existence of a \((t,\Omega (n))\)-\(\textsf{MCRH}\) for any constant t implies the existence of an infinitely often \(\textsf{DCRH}\). Their construction is also non-explicit and non-blackbox, and their approach is quite similar to ours. Our results are technically incomparable—they obtain a weaker primitive (\(\textsf{DCRH}\) as opposed to our \(\textsf{CRH}\)), but they can work with any \(t\text {-}\textsf{MCRH}\), whereas we are limited to 4-\(\textsf{MCRH}\). We describe their approach at a high level here and discuss the salient differences.

Let \(\mathcal {H}= \left\{ h:\{0,1\}^n\rightarrow \{0,1\}^{n/2} \right\} \) be a (3, n/2)-\(\textsf{MCRH}\). They also construct two families of hash functions such that at least one of them has to be a \(\textsf{DCRH}\). The first family is \(\mathcal {H}\) itself. Suppose \(\mathcal {H}\) is not a \(\textsf{DCRH}\) and there is an adversary A that samples uniformly random collisions for \(h\in \mathcal {H}\). Note that A is necessarily randomized. Without loss of generality (by padding), we can assume that the number \(\rho \) of random bits that A uses is larger than n. The second family of hash functions is then defined as \(\mathcal {H}_A = \left\{ f_{h,A}:\{0,1\}^\rho \rightarrow \{0,1\}^{n/2} \right\} \), where \(h\in \mathcal {H}\). The function \(f_{h,A}(r)\) is computed by first running \(A(h;r)\) to get a collision \((x_0,x_1)\) and then outputting \(h(x_0)\).

If \(\mathcal {H}_A\) is also not a \(\textsf{DCRH}\), then there is another adversary \(A'\) that finds random collisions for \(f_{h,A}\in \mathcal {H}_A\). This \(A'\) can be used to find a pair of uniformly random \((r_0,r_1)\) such that \(A(h;r_0)\) and \(A(h;r_1)\) both find collisions that have the same output under h. That is, if \((x_{00},x_{01})\leftarrow A(h;r_0)\) and \((x_{10},x_{11})\leftarrow A(h;r_1)\), then \(h(x_{00}) = h(x_{01}) = h(x_{10}) = h(x_{11})\). Further, as \(r_0\) and \(r_1\) are uniformly random up to this condition, and A also samples uniformly random collisions, this set of x’s is also random conditioned on colliding under h. Thus, with very high probability, they will all be distinct, giving a 3-way collision for h.

Essentially, the work of our family of functions \(\mathcal {G}\) is here performed by the randomness of the distributional collision-finding adversary A. Such a distributional adversary is much more powerful than the normal collision-finding adversary that we have access to. The distinctness of the collisions found comes for free with a distributional adversary, whereas we have to use \(\mathcal {G}\) to get it. It also enables the constructed \(\textsf{DCRH}\) above to not lose any shrinkage compared to the original 3-\(\textsf{MCRH}\). This allows them to start from \((t,\Omega (n))\)-\(\textsf{MCRH}\) for any constant t and iteratively perform the above process to eventually get a \(\textsf{DCRH}\), while the best we can do is start from a (4, 5n/6)-\(\textsf{MCRH}\).

1.4 Open Questions

We show using non-blackbox techniques that \(\textsf{CRH}\) exist assuming the existence of sufficiently shrinking 3-\(\textsf{MCRH}\) (or 4-\(\textsf{MCRH}\)). This indicates that blackbox separations are not necessarily the last word in classifying the power of cryptographic primitives. Still, our proof is non-constructive. The question that follows immediately from this observation is whether an explicit construction of \(\textsf{CRH}\) from \(\textsf{MCRH}\) is possible.

Question 1

Can explicit \(\textsf{CRH}\) (or even \(\textsf{ioCRH}\)) be constructed from 3-\(\textsf{MCRH}\)?

The answer to this question is unclear to us. If it were positive, such a construction, apart from being useful in obtaining explicit and usable \(\textsf{CRH}\), would likely require novel and interesting techniques.

The other direction in which our results could be improved is constructing primitives that are secure in the standard cryptographic sense rather than only infinitely often. Infinitely-often security (or hardness) arises regularly in cryptography and complexity theory, and we are not aware of any technique for converting such security to standard security without additional assumptions. Constructing such primitives is also likely to require new and interesting techniques.

Question 2

Can a standard (as opposed to i.o.) \(\textsf{CRH}\) be constructed from a 3-\(\textsf{MCRH}\)?

The third obvious question arising from our work is to construct a \(\textsf{CRH}\) from \(t\text {-}\textsf{MCRH}\) for \(t > 4\), even assuming the best possible shrinkage. As discussed in Sect. 4, our approach itself is not sufficient for this purpose and new techniques, or at least non-trivial modifications to ours, will be needed here.

Question 3

Can \(\textsf{CRH}\) be constructed from \((t,n-\textsf{polylog}(n))\)-\(\textsf{MCRH}\) for all constant t?

Apart from these, there are several adjacent questions about the primitives we deal with here. As noted above, Berman et al. [2] construct \(n^2\)-\(\textsf{MCRH}\) from assumptions about problems related to the complexity class \(\textsf{SZK}\). Their construction does not extend to \(t\text {-}\textsf{MCRH}\) for constant t, and it would be interesting to see whether such a construction is possible.

Question 4

Can \(t\text {-}\textsf{MCRH}\) for some constant t be constructed based on the average-case hardness of the Entropy Approximation problem (or the variant used by [2])?

Perhaps the most intriguing question is whether the classic separation of \(\textsf{CRH}\) from one-way permutations [25] can be side-stepped using non-blackbox techniques such as those in this paper. Even a non-constructive answer to this question would be pivotal to our understanding of the relative power of these key cryptographic primitives.

Question 5

Can non-blackbox techniques be used to construct \(\textsf{CRH}\) (or even \(\textsf{MCRH}\)) from one-way permutations?

Unfortunately, while our techniques are non-blackbox, they still relativize: they work in the presence of any oracle that the construction and adversaries may have access to. The existing separation [25] essentially demonstrates an oracle relative to which one-way permutations exist but \(\textsf{CRH}\) do not. Thus, our approach cannot be used as is to get around it. (Footnote 2)

An interesting approach toward answering this question was formulated by Holmgren and Lombardi [13], who showed that exponentially secure one-way functions with strong enough direct product properties can be used to construct \(\textsf{CRH}\) (or \(\textsf{MCRH}\) if starting from a weaker security property). They point out that proving that one-way permutations have such properties would then answer the above question.

1.5 Organization

In Sect. 2 we define partial domain \(\textsf{MCRH}\) (resp., \(\textsf{CRH}\)) and show how to transform such hash functions into standard, full domain \(\textsf{MCRH}\) (resp., \(\textsf{CRH}\)). This notion and the transformation are central to our main results, namely the transformations from t-\(\textsf{MCRH}\) to \(t_f\)-\(\textsf{MCRH}\) for suitable \(t_f<t\), which are presented in Sect. 3. Finally, in Sect. 4 we show some inherent barriers to our approach.

2 Partial Domain \(\textsf{MCRH}\)

In this section we introduce and study partial domain \(\textsf{MCRH}\). Loosely speaking, these are \(\textsf{MCRH}\) defined over only a (potentially small) part of their domain. The main result shown in this section is a transformation from such partial domain \(\textsf{MCRH}\) to full-fledged \(\textsf{MCRH}\)—a transformation that will be used to establish our main theorems in Sect. 3. We remark that an impatient reader can skip directly to Sect. 3 after reviewing only the definition of partial domain \(\textsf{MCRH}\).

A partial domain \(\textsf{MCRH}\) \(\mathcal {H}= (\mathcal {H}_n)_{n \in \mathbb {N}}\) is defined similarly to an \(\textsf{MCRH}\) except that for every \(h \leftarrow \textsf{Gen}(1^n)\), some of the inputs in the domain of h may be defined as “invalid.” On such invalid inputs the hash function outputs \(h(x)=\bot \). A collision-finding adversary for such a partial domain \(\textsf{MCRH}\) needs to find a tuple of valid colliding inputs. We require that the number of valid inputs is a noticeable fraction of the domain. We proceed to the formal definition.

Definition 6

A partial domain \((t,\ell )\)-\(\textsf{MCRH}\) consists of a probabilistic polynomial-time algorithm \(\textsf{Gen}\) that on input \(1^n\) outputs a circuit \(h: \{0,1\}^n \rightarrow (\{0,1\}^{n-\ell } \cup \{\bot \})\) such that the following holds.

  1.

    For every family of polynomial-size circuits \(A=(A_n)_{n \in \mathbb {N}}\), every polynomial p and all sufficiently large \(n \in \mathbb {N}\) it holds that:

    $$\begin{aligned} \Pr _{\begin{array}{c} h \leftarrow \textsf{Gen}(1^n)\\ X \leftarrow A_n(h) \end{array}} \Big [ \big ( t\text {-}\textrm{coll}_h(X) \big ) \text { and } \big ( \forall i \in [t],\; h(x_i) \ne \bot \big ) \Big ] < 1/p(n). \end{aligned}$$
    (2)
  2.

    There exists a polynomial q such that with all but negligible probability over \(h \leftarrow \textsf{Gen}(1^n)\) it holds that \(\big | \{ x \in \{0,1\}^n: h(x) \ne \bot \}\big | \ge \frac{1}{q(n)} \cdot 2^n\).

To highlight the distinction from partial domain \(\textsf{MCRH}\), we will sometimes refer to a standard \(\textsf{MCRH}\) as a full domain \(\textsf{MCRH}\). We also extend the definition of partial domain \(\textsf{MCRH}\) to infinitely often \(\textsf{MCRH}\) and to non-uniform \(\textsf{MCRH}\) in the natural way. We emphasize that the extension of Definition 6 to the infinitely often case requires Condition 1 to hold only infinitely often, whereas Condition 2 remains unchanged, that is, it should hold for all sufficiently large n.

The following lemma shows how to transform a partial domain \(\textsf{MCRH}\) to a full domain \(\textsf{MCRH}\). The proof technique is based on Lautemann’s [20] proof that \(\textsf {BPP}\) is contained in the polynomial hierarchy (this technique has been used in several works in cryptography since then [5, 8, 9, 21]).

Lemma 7

If there exists a partial domain \((t,\ell )\)-\(\textsf{MCRH}\), then there exists a full domain \((t,\ell -O(\log (n)))\)-\(\textsf{MCRH}\). The same is true if both the initial and resulting \(\textsf{MCRH}\) are non-uniform and/or merely \(\textsf{ioMCRH}\).

Proof of Lemma 7

We prove the lemma with respect to standard \(\textsf{MCRH}\). The proof extends readily also to non-uniform and/or \(\textsf{ioMCRH}\).

Let \(\textsf{Gen}\) be the sampling algorithm for a partial domain \((t,\ell )\)-\(\textsf{MCRH}\) and let \(q=q(n)\) be the polynomial guaranteed in the definition (i.e., for all but a negligible fraction of hash functions at least \(2^n/q(n)\) of the inputs are valid). We construct a new full domain hash function family using a sampling algorithm \(\textsf{Gen}'\) as follows.

On input \(1^n\), the algorithm \(\textsf{Gen}'\) first invokes \(\textsf{Gen}(1^n)\) to obtain a hash function \(h: \{0,1\}^n \rightarrow \{0,1\}^{n-\ell } \cup \{\bot \}\). The algorithm further samples \(z_1,\dots ,z_k \in \{0,1\}^n\) independently and uniformly at random, where \(k=2n \cdot q(n)\). The algorithm constructs a hash function \(h'\) that on input x outputs \(h'(x) = \big ( h(x \oplus z_i),i \big ) \in \{0,1\}^{n-\ell } \times \{0,\dots ,k\}\), where i is the minimal index such that \(h(x \oplus z_i) \ne \bot \); if no such i exists, it outputs a default value (0, 0). We will sometimes denote the hash function by \(h'=(h,z_1,\dots ,z_k)\) and note that \(h': \{0,1\}^n \rightarrow \{0,1\}^{n-\ell +O(\log {n})}\).

Denote by H the subset of hash functions in the support of \(\textsf{Gen}(1^n)\) for which at least a 1/q(n) fraction of the inputs are valid. By definition of partial domain \(\textsf{MCRH}\) we have that:

Claim 7.1.  \(\Pr _{h \leftarrow \textsf{Gen}(1^n)} [ h \not \in H ] = {{\,\textrm{negl}\,}}(n)\).

Next we argue that for \(h \in H\), with overwhelming probability over the \(z_i\)’s, no input for the hash function \(h'=(h,z_1,\dots ,z_k)\) is mapped to the default value.

Claim 7.2.  For every \(h \in H\), with all but \(2^{-n}\) probability over \(z_1,\dots ,z_k\), no input for the hash function \(h' = (h,z_1,\dots ,z_k)\) is mapped to the default value.

Proof.  For every fixed \(x \in \{0,1\}^n\) and every \(i \in [k]\), the probability over \(z_i\) that \(h(x \oplus z_i) = \bot \) is at most \(1-1/q(n)\). Therefore, the probability that \(h(x \oplus z_i) = \bot \) for all \(i \in [k]\) is at most \((1-1/q(n))^{2n \cdot q(n)} \le 2^{-2n}\). The claim follows by taking a union bound over all \(x \in \{0,1\}^n\). \(\square \)
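For concreteness, the calculation in the proof of Claim 7.2 can be written out in one line; it uses only the independence of the \(z_i\)’s, the inequality \(1-u \le e^{-u}\), and \(e^{-2n} \le 2^{-2n}\):

$$\begin{aligned} \Pr _{z_1,\dots ,z_k} \big [ \exists x \in \{0,1\}^n\ \forall i \in [k]:\ h(x \oplus z_i) = \bot \big ] \le 2^n \cdot \Big ( 1-\frac{1}{q(n)} \Big )^{2n \cdot q(n)} \le 2^n \cdot e^{-2n} \le 2^n \cdot 2^{-2n} = 2^{-n}. \end{aligned}$$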

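Consider \(h' = (h,z_1,\dots ,z_k)\) with \(h \in H\) such that no input is mapped to the default value. In such a case, all elements of a t-way collision \(\{ x_1,\dots ,x_t \}\) for \(h'\) share the same index \(i \in [k]\) in their output (the index is part of the output), and therefore \(h(x_1 \oplus z_{i}) = h(x_2 \oplus z_i) = \dots = h(x_t \oplus z_i) \ne \bot \). Since \(x \mapsto x \oplus z_i\) is a bijection, the inputs \(x_1 \oplus z_i,\dots , x_t \oplus z_i\) are distinct, so \(\{ x_1 \oplus z_i,\dots , x_t \oplus z_i\}\) is a t-way collision of valid inputs for h.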

Applying Claims 7.1 and 7.2, we conclude that a collision-finding algorithm with respect to \(\textsf{Gen}'\) that succeeds with probability \(\epsilon =\epsilon (n)\) yields a collision-finding algorithm for \(\textsf{Gen}\) that succeeds with probability \(\epsilon (n)-{{\,\textrm{negl}\,}}(n)-2^{-n}\), and the lemma follows. \(\square \)
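The construction of \(\textsf{Gen}'\), and the way a collision for \(h'\) is pulled back to a collision for h, can be illustrated with a small Python sketch. It is a toy under stated assumptions, not an implementation from the paper: the partial hash `partial_h` is hypothetical (valid exactly on the inputs whose top two bits are 0, so q = 4, and it compresses valid inputs to 3 bits via truncated SHA-256, used only as a deterministic stand-in), and the names `gen_full_domain` and `pull_back` are ours.

```python
import hashlib
import random

N = 8        # toy input length n, in bits
Q = 4        # toy value of q(n): at least a 1/Q fraction of inputs are valid
BOT = None   # plays the role of the symbol ⊥


def partial_h(x: int):
    """Hypothetical partial domain hash on N-bit inputs.

    Valid exactly on inputs whose top two bits are 0 (a 1/4 fraction);
    valid inputs are compressed to 3 bits using truncated SHA-256."""
    if x >> (N - 2) != 0:
        return BOT
    return hashlib.sha256(bytes([x])).digest()[0] >> 5


def gen_full_domain(rng: random.Random):
    """Sketch of Gen': sample k = 2*N*Q uniform shifts z_1..z_k and return h'."""
    k = 2 * N * Q
    zs = [rng.randrange(1 << N) for _ in range(k)]

    def h_prime(x: int):
        for i, z in enumerate(zs, start=1):
            y = partial_h(x ^ z)
            if y is not BOT:
                return (y, i)  # minimal index i with h(x XOR z_i) != ⊥
        return (0, 0)          # default value (unlikely, by Claim 7.2)

    return h_prime, zs


def pull_back(collision, zs, h_prime):
    """Map a collision for h' to a collision of valid inputs for partial_h.

    All colliding inputs share the index i in their output, so XOR-ing with z_i
    (a bijection) yields distinct valid inputs with equal partial_h values."""
    _, i = h_prime(collision[0])
    if i == 0:
        raise ValueError("a collision on the default value cannot be pulled back")
    return [x ^ zs[i - 1] for x in collision]
```

To use it, group all \(2^8\) inputs by their \(h'\) value, take any bucket with at least two inputs, and apply `pull_back`; the resulting inputs are distinct, valid, and collide under `partial_h`.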

3 Improving Collision Resistance in \(\textsf{MCRH}\)

In this section, we prove Theorems 3 to 5 (which were stated in Sect. 1.1). We start by setting up a common framework for the proofs of all of the theorems. The proofs of Theorems 3 to 5 will be completed in Sects. 3.1 to 3.3, respectively.

Setup. Consider a constant t and a (shrinkage) function \(\ell :{\mathbb {N}}\rightarrow {\mathbb {N}}\). Let \(t_f\) and k be parameters, to be determined later, such that \(k \le t_f < t\). Define the function \(\ell _f(n) = \min \left[ \ell (n)-n/k,\ell (kn)-n(k-1)\right] \). Let \(\textsf{Gen}\) be (a sampler for) a \((t,\ell )\)-\(\textsf{ioMCRH}\). We will use \(\textsf{Gen}\) to construct a \((t_f,\ell _f)\)-\(\textsf{ioMCRH}\) (Footnote 3). Below, when it is clear from the context, we sometimes use \(\ell \) as a shorthand for \(\ell (n)\). For simplicity, we will assume that k, whatever it is set to, divides n; our proof can easily be extended to the case where it does not.

Let \({\mathbb {F}}\) be the finite field of size \(2^{n/k}\).Footnote 4 We view an input \(x \in \{0,1\}^n\) for a hash function \(h \leftarrow \textsf{Gen}(1^n)\) as representing a degree \((k-1)\) univariate polynomial over \({\mathbb {F}}\) as follows: x is interpreted as a vector \((x_0,\dots ,x_{k-1}) \in {\mathbb {F}}^k\), and the polynomial is defined as \(P_x(\xi ) = \sum _{i=0}^{k-1} x_i \cdot \xi ^i\) (where the arithmetic is over the field). For ease of notation, for \(\lambda \in {\mathbb {F}}\), we use \(x(\lambda )\) to denote the evaluation of the polynomial \(P_x\) at the point \(\lambda \).
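The encoding of an input as a polynomial can be made concrete with a toy Python sketch. It assumes \({\mathbb {F}} = GF(2^4)\), represented modulo the irreducible polynomial \(\xi ^4+\xi +1\) (so n/k = 4), and it reads the "natural" splitting of x into \((x_0,\dots ,x_{k-1})\) as taking \(x_0\) from the lowest-order bits; both choices are illustrative assumptions, and the function names are hypothetical.

```python
M = 4              # bits per field element, i.e. n/k (toy value)
MODULUS = 0b10011  # xi^4 + xi + 1, irreducible over GF(2)


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^4): carry-less product reduced modulo MODULUS."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MODULUS
    return r


def to_coeffs(x: int, k: int) -> list:
    """Split an (M*k)-bit input x into field elements x_0, ..., x_{k-1} (x_0 = low bits)."""
    mask = (1 << M) - 1
    return [(x >> (M * i)) & mask for i in range(k)]


def evaluate(x: int, k: int, lam: int) -> int:
    """Compute x(lam) = sum_i x_i * lam^i over GF(2^4) using Horner's rule."""
    acc = 0
    for c in reversed(to_coeffs(x, k)):
        acc = gf_mul(acc, lam) ^ c  # field addition is XOR
    return acc
```

For example, with k = 2 the 8-bit input `0x53` encodes \(x_0 = 3\) and \(x_1 = 5\), so \(x(\lambda ) = 3 + 5\lambda \); `evaluate(0x53, 2, 2)` returns 9, since \(5 \cdot 2 = 10\) in this field and \(3 \oplus 10 = 9\).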

The First Hash Family. We construct a new hash function family defined by the sampler \(\textsf{Gen}'\) that, on input \(1^n\), works as follows:

  1.

    Invoke \(\textsf{Gen}(1^n)\) to obtain a hash function \(h:\{0,1\}^n \rightarrow \{0,1\}^{n-\ell }\).

  2.

    Sample a random \(\lambda \in {\mathbb {F}}\).

  3.

    Output the hash function (Footnote 5) \(h': \{0,1\}^n \rightarrow \{0,1\}^{n-\ell +n/k}\) defined as \(h'(x) = \big ( h(x),x(\lambda ) \big )\).
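A toy Python sketch of the first hash family follows. It assumes a 4-bit field \(GF(2^4)\) with k = 2 (so n = 8), and the underlying h is a hypothetical stand-in (truncated SHA-256 with 3 output bits), not a hash function from the paper. The point it illustrates is that \(h'\) merely appends the evaluation \(x(\lambda )\) to \(h(x)\), so any collision for \(h'\) is in particular a collision for h whose members also agree at \(\lambda \).

```python
import hashlib
import random

M, K = 4, 2          # toy field GF(2^4); inputs have n = M*K = 8 bits
MODULUS = 0b10011    # xi^4 + xi + 1, irreducible over GF(2)


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4): carry-less product reduced modulo MODULUS."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MODULUS
    return r


def evaluate(x: int, lam: int) -> int:
    """x(lam) = sum_i x_i * lam^i, with x_0 taken from the lowest M bits (Horner's rule)."""
    acc = 0
    for i in reversed(range(K)):
        acc = gf_mul(acc, lam) ^ ((x >> (M * i)) & ((1 << M) - 1))
    return acc


def toy_h(x: int, out_bits: int = 3) -> int:
    """Hypothetical stand-in for h <- Gen(1^n): truncated SHA-256 of the input byte."""
    return hashlib.sha256(bytes([x])).digest()[0] >> (8 - out_bits)


def gen_prime(rng: random.Random):
    """Sketch of Gen': keep h, sample lambda uniformly from the field, output h'."""
    lam = rng.randrange(1 << M)

    def h_prime(x: int):
        return (toy_h(x), evaluate(x, lam))  # (n - l) + n/k output bits

    return h_prime, lam
```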

If \(\textsf{Gen}'\) is a \((t_f,\ell ')\)-\(\textsf{ioMCRH}\), where \(\ell '(n) = (\ell (n) - n/k)\), then we are done. Thus, we may assume that it is not—namely, that there exists a polynomial-size circuit family \(A'=(A'_n)_{n \in \mathbb {N}}\) and a polynomial \(p'\) such that for all sufficiently large \(n \in \mathbb {N}\) it holds that:

$$\begin{aligned} \Pr _{\begin{array}{c} h' \leftarrow \textsf{Gen}'(1^n)\\ X \leftarrow A'_n(h') \end{array}} \big [ t_f\text {-}\textrm{coll}_{h'}(X) \big ] \ge \frac{1}{p'(n)}. \end{aligned}$$
(3)

Using the definition of \(h'\), Eq. (3) can be rewritten as:

$$\begin{aligned} \Pr _{\begin{array}{c} h \leftarrow \textsf{Gen}(1^n)\\ \lambda \leftarrow {\mathbb {F}}\\ X \leftarrow A'_n(h,\lambda ) \end{array}} \Big [ \big ( t_f\text {-}\textrm{coll}_{h}(X) \big ) \text { and } \big ( \forall x_1,x_2 \in X,\; x_1(\lambda ) = x_2(\lambda ) \big ) \Big ] \ge \frac{1}{p'(n)}. \end{aligned}$$
(4)

For every h in the support of \(\textsf{Gen}(1^n)\), define:

$$\begin{aligned} \delta _h = \Pr _{\begin{array}{c} \lambda \leftarrow {\mathbb {F}}\\ X \leftarrow A'_n(h,\lambda ) \end{array}} \Big [ \big ( t_f\text {-}\textrm{coll}_{h}(X) \big ) \text { and } \big ( \forall x_1,x_2 \in X,\; x_1(\lambda ) = x_2(\lambda ) \big ) \Big ]. \end{aligned}$$

Thus, Eq. (4) implies that \(\textsf {E}_{h \leftarrow \textsf{Gen}(1^n)}[\delta _h] \ge \frac{1}{p'(n)}\). We would like to restrict our attention to hash functions h for which \(\delta _h\) is relatively large (i.e., within a constant factor of this lower bound on the expectation). The following lemma describes a sampling algorithm for such hash functions.

Lemma 8

There exists a probabilistic polynomial time algorithm \({\widetilde{\textsf{Gen}}}\) that on input \(1^n\) outputs a hash function \(h: \{0,1\}^n \rightarrow \{0,1\}^{n-\ell }\) in the support of \(\textsf{Gen}(1^n)\) such that the following holds for all sufficiently large n:

  • \(\Pr _{h \leftarrow {\widetilde{\textsf{Gen}}}(1^n)} \left[ \delta _h > \frac{1}{4p'(n)} \right] = 1-2^{-\Omega (n)}\).

  • For every event E:

    $$\begin{aligned} \Pr _{h \leftarrow \textsf{Gen}(1^n)} \big [ h \in E \big ] \ge \frac{1}{3p'(n)} \cdot \Pr _{h \leftarrow {\widetilde{\textsf{Gen}}}(1^n)} \big [ h \in E \big ] - 2^{-\Omega (n)}. \end{aligned}$$

The first item in Lemma 8 states that with very high probability, a hash function h sampled by \({\widetilde{\textsf{Gen}}}\) has relatively large \(\delta _h\). The second item relates the distributions \(\textsf{Gen}\) and \({\widetilde{\textsf{Gen}}}\) and in particular implies that events that happen with non-negligible probability over the latter also happen with non-negligible probability over the former. The proof of Lemma 8 is deferred to Sect. 3.4, but on first reading, the reader may find it convenient to think of the simpler case in which all h have \(\delta _h \ge \frac{1}{4p'(n)}\), in which case we can simply take \({\widetilde{\textsf{Gen}}}=\textsf{Gen}\).

The Second Hash Family. We now use the adversary \(A'\) to construct a new partial domain non-uniform hash function family defined by a sampler \(\textsf{Gen}'' = (\textsf{Gen}''_n)_{n \in \mathbb {N}}\). The sampler (Footnote 6) \(\textsf{Gen}''_{n/k}\) works as follows:

  1.

    Invoke \({\widetilde{\textsf{Gen}}}(1^n)\) to obtain a hash function \(h: \{0,1\}^n \rightarrow \{0,1\}^{n-\ell }\).

  2.

    Output a hash function (Footnote 7) \(h'': \{0,1\}^{n/k} \rightarrow \{0,1\}^{n-\ell } \cup \{\bot \}\) that is computed as follows:

    • The input to \(h''\), which is a vector in \(\{0,1\}^{n/k}\), is interpreted as a field element \(\lambda \in {\mathbb {F}}\) in the natural way (recall that \(|{\mathbb {F}}|=2^{n/k}\)).

    • To hash \(\lambda \), first invoke (Footnote 8) \(A'_n(h,\lambda )\) and then consider two cases:

      (a)

        Case 1: \(A'_n(h,\lambda )\) outputs \(X \subseteq \{0,1\}^n\) such that \(t_f\text {-}\textrm{coll}_h(X)\) and \(\forall x_1,x_2 \in X,\; x_{1}(\lambda ) = x_{2}(\lambda )\). In this case, \(h''(\lambda )\) outputs h(x) for an arbitrary \(x \in X\) (the specific choice does not matter since all elements in X collide under h).

      (b)

        Case 2: \(A'_n(h,\lambda )\) does not produce an output as above (which can easily be tested in polynomial time). In this case, \(h''(\lambda )\) outputs \(\bot \).
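The second hash family can be sketched in Python in a toy setting: \(GF(2^4)\) with k = t_f = 2 (the setting of Sect. 3.1) and a hypothetical 3-bit stand-in for h. The collision finder \(A'\) is replaced here by a hypothetical brute-force search, which is feasible only because the toy domain has 256 points; in the text, \(A'\) is the assumed efficient adversary. The validity check in `h_double_prime` mirrors Cases 1 and 2.

```python
import hashlib

M, K, T_F = 4, 2, 2     # toy field GF(2^4), k = 2, t_f = 2
MODULUS = 0b10011       # xi^4 + xi + 1
BOT = None              # plays the role of ⊥


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4): carry-less product reduced modulo MODULUS."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MODULUS
    return r


def evaluate(x: int, lam: int) -> int:
    """x(lam) for the degree-(K-1) polynomial encoded by x (x_0 = lowest M bits)."""
    acc = 0
    for i in reversed(range(K)):
        acc = gf_mul(acc, lam) ^ ((x >> (M * i)) & ((1 << M) - 1))
    return acc


def toy_h(x: int) -> int:
    """Hypothetical stand-in for h: 3 bits of SHA-256 of the input byte."""
    return hashlib.sha256(bytes([x])).digest()[0] >> 5


def brute_force_adversary(lam: int):
    """Hypothetical A'(h, lam): exhaustively find T_F distinct inputs that collide
    under toy_h and agree at lam."""
    buckets = {}
    for x in range(1 << (M * K)):
        bucket = buckets.setdefault((toy_h(x), evaluate(x, lam)), [])
        bucket.append(x)
        if len(bucket) == T_F:
            return bucket
    return None


def h_double_prime(lam: int, adversary=brute_force_adversary):
    """Sketch of h''(lam): run A'(h, lam) and check its output."""
    X = adversary(lam)
    if (X is not None and len(X) == T_F and len(set(X)) == T_F
            and len({toy_h(x) for x in X}) == 1
            and len({evaluate(x, lam) for x in X}) == 1):
        return toy_h(X[0])   # Case 1: any element of X gives the same h-value
    return BOT               # Case 2: lam is an invalid input
```

With the brute-force adversary every \(\lambda \) is valid (256 inputs fall into at most \(8 \cdot 16 = 128\) buckets), whereas an adversary that returns, say, a repeated input makes \(h''\) output \(\bot \).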

Recall that we currently have two assumptions in place—\(\textsf{Gen}\) is a \((t,\ell )\)-\(\textsf{ioMCRH}\) and \(\textsf{Gen}'\) is not a \((t_f,\ell ')\)-\(\textsf{ioMCRH}\), with the above \(A'\) being the corresponding adversary. Under these assumptions we will prove the following lemma.

Lemma 9

\(\textsf{Gen}''\) is a partial domain non-uniform \((t_f,\ell '')\)-\(\textsf{ioMCRH}\), where \(\ell ''(n) = \ell (kn) - n(k-1)\).

Lemma 9, for various values of t and \(t_f\), together with the transformation of partial domain \(\textsf{MCRH}\) into full domain \(\textsf{MCRH}\) (Lemma 7), implies Theorems 3 to 5. To prove it, we need to show that \(\textsf{Gen}''\) satisfies the two conditions of Definition 6 and that it has shrinkage \(\ell ''\). The latter follows by construction. We show in Proposition 10 that \(\textsf{Gen}''\) satisfies Condition 2 of Definition 6 irrespective of the choice of \(t_f\) and k. The proof that \(\textsf{Gen}''\) satisfies Condition 1 is where the proofs of the three theorems diverge: it is given for the respective settings of t, \(t_f\) and k in Sects. 3.1 to 3.3, leading to Theorems 3 to 5.

Proposition 10

There exists a polynomial q such that, for all sufficiently large n, with all but negligible probability over \(h'' \leftarrow \textsf{Gen}_n''\), it holds that \(\big | \{ x \in \{0,1\}^n: h''(x) \ne \bot \}\big | \ge \frac{1}{q(n)} \cdot 2^n\).

Proof

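By the first item in Lemma 8, with all but \(2^{-\Omega (n)}\) probability over \(h \leftarrow {\widetilde{\textsf{Gen}}}(1^n)\) it holds that \(\delta _h \ge 1/(4p'(n))\). Let \(h''\) be the function output by \(\textsf{Gen}_{n/k}''\) when it samples h from \({\widetilde{\textsf{Gen}}}(1^n)\). Viewing \(A'_n\) as a deterministic circuit (as required for \(h''\) to be a well-defined function), \(\delta _h\) is exactly the fraction of \(\lambda \in {\mathbb {F}}\) that fall into Case 1 of the definition of \(h''\), i.e., for which \(h''(\lambda ) \ne \bot \). Hence, if \(\delta _h \ge 1/(4p'(n))\), then \(h''\) does not output \(\bot \) on at least a \(1/(4p'(n))\) fraction of its domain, which is inverse polynomial in its input length n/k. Thus, \(\textsf{Gen}''\) satisfies the requirements of the proposition. \(\square \)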

3.1 From 3-\(\textsf{MCRH}\) to \(\textsf{CRH}\) (\(t=3,t_f=2\))

In this subsection, we prove that \(\textsf{Gen}''\) satisfies Condition 1 of Definition 6 under the parameter setting \(t=3\), \(t_f=2\), and \(k=2\). This is stated in the following proposition. This proves Lemma 9 under this setting, which, together with Lemma 7, completes the proof of Theorem 3.

Proposition 11

Let \(t=3\) and \(k=2\). For every family of polynomial-size circuits \(A''=(A_n'')_{n \in \mathbb {N}}\), every polynomial \(p''\) and infinitely many \(n \in \mathbb {N}\) it holds that:

$$\begin{aligned} \Pr _{\begin{array}{c} h'' \leftarrow \textsf{Gen}_n''\\ (\lambda _1,\lambda _2) \leftarrow A_n''(h'') \end{array}} \Big [ \big ( \lambda _1\ne \lambda _2 \big ) \text { and } \big ( h''(\lambda _1) = h''(\lambda _2) \ne \bot \big ) \Big ] < 1/p''(n). \end{aligned}$$

Proof

Fix a hash function \(h'' \leftarrow \textsf{Gen}_{n/k}''\) and let \(h \leftarrow {\widetilde{\textsf{Gen}}}(1^n)\) denote the underlying hash function sampled in the first step of \(\textsf{Gen}_{n/k}''\). Consider a pair \(\lambda _1,\lambda _2 \in {\mathbb {F}}\) such that \(\lambda _1 \ne \lambda _2\) and \(h''(\lambda _1) = h''(\lambda _2) \ne \bot \). Let \(\{x_{1,1},x_{1,2}\} = A'_n(h,\lambda _1)\) and \(\{x_{2,1},x_{2,2}\} = A'_n(h,\lambda _2)\).

Claim 11.1. The set \(\{ x_{i,j} \}_{i,j \in \{1,2\}}\) contains a 3-way collision for h.

Proof. Since \(h''(\lambda _1) \ne \bot \), we have that \(x_{1,1} \ne x_{1,2}\), \(h(x_{1,1})=h(x_{1,2})\), and \(x_{1,1}(\lambda _1) = x_{1,2}(\lambda _1)\). Similarly, since \(h''(\lambda _2) \ne \bot \), we have that \(x_{2,1} \ne x_{2,2}\), \(h(x_{2,1})=h(x_{2,2})\), and \(x_{2,1}(\lambda _2) = x_{2,2}(\lambda _2)\). In addition, since \(h''(\lambda _1)=h''(\lambda _2)\), we have that \(h(x_{1,1})=h(x_{2,1})\). Overall, this means that \(h(x_{1,1})=h(x_{1,2})=h(x_{2,1})=h(x_{2,2})\), so all of the elements do indeed collide.

Thus we only need to show that the set \(\{x_{1,1},x_{1,2},x_{2,1},x_{2,2}\}\) contains at least 3 distinct elements. Suppose it does not. Since \(x_{1,1} \ne x_{1,2}\) and \(x_{2,1} \ne x_{2,2}\), this means that \(\{x_{2,1},x_{2,2}\} = \{x_{1,1},x_{1,2}\}\); assume that \(x_{1,1} = x_{2,1}\) and \(x_{1,2} = x_{2,2}\) (the other case is handled similarly). Then the lines \(x_{1,1}\) and \(x_{1,2}\), which are distinct, agree on \(\lambda _1\) (since \(x_{1,1}(\lambda _1) = x_{1,2}(\lambda _1)\)) and on \(\lambda _2\) (since \(x_{2,1}(\lambda _2) = x_{2,2}(\lambda _2)\)). This is a contradiction, since two distinct lines (i.e., polynomials of degree at most 1) agree on at most one point. \(\square \)
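Claim 11.1 can be checked exhaustively in a toy Python setting: \(GF(2^4)\) with k = t_f = 2, a hypothetical 3-bit stand-in for h, and a hypothetical brute-force stand-in for \(A'\). Since \(h''\) maps 16 values of \(\lambda \) into at most 8 outputs, pairs \(\lambda _1 \ne \lambda _2\) with equal \(h''\) values are guaranteed to exist. This is a sanity check of the combinatorial step, not a proof.

```python
import hashlib

M, K, T_F = 4, 2, 2     # toy field GF(2^4), k = 2, t_f = 2
MODULUS = 0b10011       # xi^4 + xi + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4): carry-less product reduced modulo MODULUS."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MODULUS
    return r


def evaluate(x: int, lam: int) -> int:
    """x(lam) for the line encoded by x (x_0 = lowest M bits)."""
    acc = 0
    for i in reversed(range(K)):
        acc = gf_mul(acc, lam) ^ ((x >> (M * i)) & ((1 << M) - 1))
    return acc


def toy_h(x: int) -> int:
    """Hypothetical stand-in for h: 3 bits of SHA-256 of the input byte."""
    return hashlib.sha256(bytes([x])).digest()[0] >> 5


def brute_force_adversary(lam: int):
    """Hypothetical A'(h, lam): two distinct inputs colliding under toy_h and agreeing at lam."""
    buckets = {}
    for x in range(1 << (M * K)):
        bucket = buckets.setdefault((toy_h(x), evaluate(x, lam)), [])
        bucket.append(x)
        if len(bucket) == T_F:
            return bucket
    return None


def check_claim_11_1() -> int:
    """For every pair lam1 < lam2 with h''(lam1) == h''(lam2), check that the four
    returned inputs contain >= 3 distinct inputs that all collide under toy_h.
    Returns the number of pairs checked."""
    field = range(1 << M)
    outputs = {lam: brute_force_adversary(lam) for lam in field}
    # Every lam is valid here, so h''(lam) = toy_h(x) for any x in A'(h, lam).
    h2 = {lam: toy_h(outputs[lam][0]) for lam in field}
    checked = 0
    for lam1 in field:
        for lam2 in field:
            if lam1 < lam2 and h2[lam1] == h2[lam2]:
                xs = set(outputs[lam1]) | set(outputs[lam2])
                assert len(xs) >= 3, "fewer than 3 distinct inputs"
                assert len({toy_h(x) for x in xs}) == 1, "inputs do not all collide"
                checked += 1
    return checked
```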

Thus, the existence of an adversary \(A''\) contradicting the proposition’s hypothesis immediately yields a method for finding a 3-way collision for a random \(h \leftarrow {\widetilde{\textsf{Gen}}}(1^n)\), with probability at least \(1/p''(n)\), for all sufficiently large n. By the second item of Lemma 8, this method also works for \(h \leftarrow \textsf{Gen}(1^n)\) with probability at least \(\frac{1}{3 p'(n) \cdot p''(n)} - 2^{-\Omega (n)}\) (again, for all sufficiently large n)—a contradiction. \(\square \)

3.2 From 4-\(\textsf{MCRH}\) to 3-\(\textsf{MCRH}\) (\(t=4,t_f=3\))

Having handled the case of \(t=3\), we proceed to the special case of \(t=4\). We show how to transform a sufficiently shrinking 4-\(\textsf{MCRH}\) into a 3-\(\textsf{ioMCRH}\). If the latter is sufficiently shrinking, we can then apply Theorem 3 to obtain an \(\textsf{ioCRH}\).

Thus, we need to show that \(\textsf{Gen}''\) satisfies Condition 1 of Definition 6 under the parameter setting \(t=4\), \(t_f=3\), and \(k=3\). This is stated in the following proposition. This proves Lemma 9 under this setting, which, together with Lemma 7 and Theorem 3, completes the proof of Theorem 4.

Proposition 12

Let \(t=4\) and \(k=3\). For every family of polynomial-size circuits \(A''=(A_n'')_{n \in \mathbb {N}}\), every polynomial \(p''\) and infinitely many \(n \in \mathbb {N}\) it holds that:

$$\begin{aligned} \Pr _{\begin{array}{c} h'' \leftarrow \textsf{Gen}_n''\\ (\lambda _1,\lambda _2,\lambda _3) \leftarrow A_n''(h'') \end{array}} \Big [ \big ( \lambda _1,\lambda _2,\lambda _3 \text { are distinct} \big ) \text { and } \big ( h''(\lambda _1) = h''(\lambda _2) = h''(\lambda _3) \ne \bot \big ) \Big ] < 1/p''(n). \end{aligned}$$

As the proof mirrors that of Proposition 11, we provide only a sketch.

Proof Sketch

Similarly to Proposition 11, each \(\lambda _i\) yields a 3-way collision \(x_{i,1},x_{i,2},x_{i,3}\), and all elements of the set \(\{x_{i,j}\}_{i,j \in \{1,2,3\}}\) collide under h. What remains to be shown is that this set contains 4 distinct elements.

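Suppose not. Since each triple \(\{x_{i,1},x_{i,2},x_{i,3}\}\) consists of 3 distinct elements, the three triples must coincide as sets, so, after relabeling within each triple, we may assume that \(x_{1,1}=x_{2,1}=x_{3,1}\), \(x_{1,2}=x_{2,2}=x_{3,2}\), and \(x_{1,3}=x_{2,3}=x_{3,3}\). Each of \(x_{1,1},x_{1,2},x_{1,3}\) specifies a polynomial of degree \(k-1 = 2\), that is, a quadratic polynomial, and these 3 distinct polynomials agree on each of the 3 points \(\lambda _1,\lambda _2,\lambda _3\). This is a contradiction, since two distinct polynomials of degree at most 2 agree on at most 2 points.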
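The algebraic fact used in Claim 11.1 and in Proposition 12 is that two distinct polynomials \(p \ne q\) of degree at most d agree exactly on the roots of \(p-q\), a nonzero polynomial of degree at most d, and hence agree on at most d points. The Python sketch below checks this exhaustively over the toy field \(GF(2^4)\) (an illustrative choice; in characteristic 2, \(p-q\) is the coefficient-wise XOR) for d = 1 (lines) and d = 2 (quadratics).

```python
M = 4              # toy field GF(2^4)
MODULUS = 0b10011  # xi^4 + xi + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4): carry-less product reduced modulo MODULUS."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MODULUS
    return r


def poly_eval(coeffs, lam: int) -> int:
    """Evaluate sum_i coeffs[i] * lam^i over GF(2^4) (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, lam) ^ c
    return acc


def max_roots(d: int) -> int:
    """Largest number of roots of a nonzero polynomial of degree <= d over GF(2^4)."""
    best = 0
    for code in range(1, 1 << (M * (d + 1))):  # every nonzero coefficient vector
        coeffs = [(code >> (M * i)) & ((1 << M) - 1) for i in range(d + 1)]
        roots = sum(1 for lam in range(1 << M) if poly_eval(coeffs, lam) == 0)
        best = max(best, roots)
    return best
```

The bound is attained: for instance, \(\xi (\xi +1)\) has the two roots 0 and 1, so `max_roots(2)` equals 2.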

Overall, we get that a 3-way collision finder for \(\textsf{Gen}''\) yields a 4-way collision finder for \({\widetilde{\textsf{Gen}}}\), and therefore, as in the proof of Proposition 11, also for \(\textsf{Gen}\). \(\square \)

Overall, this yields a \((3,\ell _f-O(\log {n}))\)-\(\textsf{ioMCRH}\) from a \((4,\ell )\)-\(\textsf{MCRH}\), where \(\ell _f = \min [ \ell (n)-n/3, \ell (3n)-2n]\). In particular, if \(\ell (n) > \frac{5}{6} \cdot n + \omega (\log {n})\), we get that \(\ell _f > \frac{1}{2} n + \omega (\log {n})\). At this point we can apply Theorem 3 to derive a (non-uniform) \(\textsf{ioCRH}\), thereby establishing Theorem 4.
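As a sanity check of the shrinkage claim, write \(\ell (n) = \frac{5}{6} n + s(n)\) with \(s(n) = \omega (\log n)\) (this parametrization is introduced only for the calculation). Then

$$\begin{aligned} \ell (n)-\frac{n}{3} = \frac{1}{2} n + s(n) \qquad \text {and} \qquad \ell (3n)-2n = \frac{5}{2} n + s(3n) - 2n = \frac{1}{2} n + s(3n), \end{aligned}$$

and both \(s(n)\) and \(s(3n)\) are \(\omega (\log n)\), so subtracting the \(O(\log n)\) loss of Lemma 7 still leaves shrinkage \(\frac{1}{2} n + \omega (\log n)\).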

3.3 From General \(t\text {-}\textsf{MCRH}\) to \(t_f\)-\(\textsf{MCRH}\)

In this subsection, we consider a generic constant t and show that \(\textsf{Gen}''\) satisfies Condition 1 of Definition 6 under certain settings of \(t_f\) and k. This is captured by the following lemma.

Lemma 13

Consider any t, k, and \(t_f\ge \max \left[ (2t\sqrt{k-1})^{2/3},24\right] \). For every family of polynomial-size circuits \(A''=(A_n'')_{n \in \mathbb {N}}\), every polynomial p, and infinitely many \(n \in \mathbb {N}\), it holds that:

$$\begin{aligned} \Pr _{\begin{array}{c} h'' \leftarrow \textsf{Gen}''(1^n)\\ X \leftarrow A_n''(h'') \end{array}} \Big [ \big ( t_f\text {-}\textrm{coll}_{h''}(X) \big ) \text { and } \big ( \forall i \in [t_f],\; h''(x_i) \ne \bot \big ) \Big ] < 1/p(n). \end{aligned}$$
(5)

Under the above setting of parameters, Lemma 9 follows from Lemma 13. Combined with Lemma 7 (the partial to full domain transformation), this completes the proof of Theorem 5. The proof of Lemma 13 makes use of list-decoding bounds for Reed–Solomon codes.

Proof

Assume toward a contradiction that there exists a polynomial-size circuit family \(A'' = (A''_n)_{n \in \mathbb {N}}\) and a polynomial \(p''\) such that for all sufficiently large \(n \in \mathbb {N}\) it holds that:

$$\begin{aligned} \Pr _{\begin{array}{c} h'' \leftarrow \textsf{Gen}''_{n/k}\\ \Lambda \leftarrow A''_{n/k}(h'') \end{array}} \big [ (t_f\text {-}\textrm{coll}_{h''}(\Lambda )) \text { and } (\forall \lambda \in \Lambda : h''(\lambda ) \ne \bot ) \big ] \ge 1/p''(n). \end{aligned}$$

Fix a large enough n such that both \(A_{n/k}''\) and \(A_n'\) have such non-negligible success probability. Fix also an h in the support of \({\widetilde{\textsf{Gen}}}(1^n)\) and the corresponding \(h''\) (that is output by \(\textsf{Gen}_{n/k}''\) when it samples h from \({\widetilde{\textsf{Gen}}}(1^n)\)) such that for the \(\Lambda = \{\lambda _1,\dots ,\lambda _{t_f}\}\) output by \(A''_{n/k}(h'')\), the conditions in the above probability statement hold. Denote \(X_i = A_n'(h,\lambda _i)\).

Claim 13.1.  It holds that:

  1.

    For every \(i \in [t_f]\), the set \(X_i\) contains \(t_f\) distinct elements and for every \(x_1,x_2 \in X_i\) it holds that \(x_1(\lambda _i) = x_2(\lambda _{i})\).

  2.

    For every \(i,j \in [t_f]\) and \(x_1 \in X_i\), \(x_2 \in X_j\) it holds that \(h(x_1)=h(x_2)\).

Proof. The fact that the event \(t_f\text {-}\textrm{coll}_{h''}(\Lambda )\) holds implies that all of the \(\lambda _i\)’s are distinct but \(h''(\lambda _{1})=\dots =h''(\lambda _{t_f}) \ne \bot \). By the definition of \(\textsf{Gen}''\), this means that for every \(i \in [t_f]\), it holds that \(A'(h,\lambda _i)\) outputs a set \(X_i = \{x_{i,1},\dots ,x_{i,t_f}\}\) such that \(t_f\text {-}\textrm{coll}_{(h,\lambda _i)}(X_i)\). This implies Item 1 in the claim as well as the fact that \(h(x_{i,j})=h(x_{i,j'})\) for every \(i,j,j' \in [t_f]\).

On the other hand, the fact that \(h''(\lambda _{1})=\dots =h''(\lambda _{t_f}) \ne \bot \) means, by the definition of \(h''\), that \(h(x_{1,1})=\dots =h(x_{t_f,1})\). Overall, we conclude that all of the \(x_{i,j}\)’s collide under h. This establishes Item 2. \(\square \)

Let \(X \subseteq \{0,1\}^n\) be the multi-set \(X = \cup _{i \in [t_f]} X_i\). We emphasize that X is a multi-set, where the multiplicity of an element \(x \in X\) is equal to the number of \(i \in [t_f]\) such that \(x \in X_i\). The following proposition shows that X contains a t-way collision for h.

Proposition 14

t-\(\textrm{coll}_h(X)\) holds.

Proof

By Item 2 in Claim 13.1, all elements in the set X indeed collide under h, and so we only need to show that the set contains at least t distinct elements. Define a function \(f: \Lambda \rightarrow {\mathbb {F}}\) as \(f(\lambda _i)=x_{i}(\lambda _i)\), where \(x_i\) is an arbitrary element in \(X_i\) (by Item 1 in Claim 13.1, the specific choice does not matter). Let \(d=k-1\). Let \(X_{close} \subseteq X\) denote the set of elements \(x \in X\) such that x, viewed as a polynomial of degree at most d over \({\mathbb {F}}\), agrees with f on at least \(\sqrt{2t_fd}\) points in \(\Lambda \). All \(x \in X \backslash X_{close}\) have multiplicity at most \(\sqrt{2t_fd}\): if \(x \in X_i\), then \(x(\lambda _i) = f(\lambda _i)\) by Item 1 in Claim 13.1, so the multiplicity of x is at most the number of points in \(\Lambda \) on which x agrees with f.

Claim 14.1. The number of distinct elements in \(X_{close}\) is at most \(\sqrt{2t_f/d}\).

This claim follows immediately from the following lemma of Sudan [26], which is a special case of an earlier lemma of Goldreich et al. [11] (Footnote 9). \(\square \)

Lemma 15

([11, 26]) Let \({\mathbb {F}}\) be a finite field and let \(\{ (x_i,y_i) \}_{i=1}^N \in ({\mathbb {F}}\times {\mathbb {F}})^N\) be a sequence of N pairs. The number of polynomials f of degree at most d such that \(|\{i: f(x_i)=y_i\}| \ge \sqrt{2d N}\) is at most \(\sqrt{2N/d}\).

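Thus, the multi-set X contains \((t_f)^2\) elements overall (counting multiplicities). By Claim 14.1, at most \(\sqrt{2t_f/d}\) distinct elements lie in \(X_{close}\), and each of them has multiplicity at most \(t_f\), so together they account for at most \(\sqrt{2t_f/d} \cdot t_f\) of these elements. Every other element has multiplicity at most \(\sqrt{2t_fd}\). This means that the number of distinct elements in X is at least: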

$$\begin{aligned} \frac{(t_f)^2 - \sqrt{2t_f/d} \cdot t_f}{\sqrt{2t_fd}} \ge \frac{(t_f)^{3/2}}{2 \sqrt{d}} \ge t \end{aligned}$$

where the first inequality holds for any \(t_f \ge 24\) and \(d\ge 1\), and the second inequality follows from the condition in the hypothesis that \(t_f \ge (2t\sqrt{k-1})^{2/3}\). \(\square \)
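Both inequalities in the last display are elementary and can be checked numerically. The Python sketch below (the function names are ours) evaluates the left-hand side for \(d \ge 1\), confirms that the first inequality holds from \(t_f = 24\) on, and fails at \(t_f = 23\) when d = 1, so 24 is the smallest integer for which this particular inequality holds when d = 1. It also computes the smallest admissible \(t_f\) in Lemma 13 using the equivalent exact integer condition \((t_f)^3 \ge 4t^2(k-1)\).

```python
import math


def distinct_lower_bound(t_f: int, d: int) -> float:
    """((t_f)^2 - sqrt(2 t_f / d) * t_f) / sqrt(2 t_f d): lower bound on distinct elements of X."""
    return (t_f ** 2 - math.sqrt(2 * t_f / d) * t_f) / math.sqrt(2 * t_f * d)


def simplified_bound(t_f: int, d: int) -> float:
    """(t_f)^(3/2) / (2 sqrt(d)): the middle term of the display."""
    return t_f ** 1.5 / (2 * math.sqrt(d))


def first_inequality_holds(t_f: int, d: int) -> bool:
    return distinct_lower_bound(t_f, d) >= simplified_bound(t_f, d)


def min_t_f(t: int, k: int) -> int:
    """Smallest integer t_f >= 24 with t_f >= (2 t sqrt(k-1))^(2/3).

    Uses the equivalent exact test t_f^3 >= 4 t^2 (k-1) to avoid rounding issues."""
    t_f = 24
    while t_f ** 3 < 4 * t * t * (k - 1):
        t_f += 1
    return t_f
```

For instance, `min_t_f(100, 2)` returns 35, since \(34^3 = 39304 < 40000 \le 42875 = 35^3\).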

Thus, under the assumption that such an \(A''\) exists, we are able to find a t-way collision for a random \(h \leftarrow {\widetilde{\textsf{Gen}}}(1^n)\) with probability at least \(1/p''(n)\) for all large enough n. By the second item of Lemma 8, this method also works for \(h \leftarrow \textsf{Gen}(1^n)\) with probability at least \(\left( \frac{1}{3 p'(n) \cdot p''(n)} - 2^{-\Omega (n)}\right) \) for all large enough n—a contradiction to our assumption that \(\textsf{Gen}\) is a \((t,\ell )\)-\(\textsf{ioMCRH}\). So such an \(A''\) cannot exist, which proves Lemma 13. \(\square \)

3.4 Proof of Lemma 8

Consider the following basic process \(\textsf{Gen}_0(1^n)\). It is not yet the sampler \({\widetilde{\textsf{Gen}}}\) required by Lemma 8, but it is the main building block of that sampler.

\(\underline{\textsf{Gen}_0(1^n):}\)

  1.

    Sample \(h \leftarrow \textsf{Gen}(1^n)\).

  2.

    Sample \(\lambda _1,\dots ,\lambda _m \leftarrow {\mathbb {F}}\), where \(m=\Theta ( (p'(n))^2 \cdot n \cdot r(n) )\) and r is a polynomial bounding the number of random coins that \(\textsf{Gen}(1^n)\) uses. Use \(\lambda _1,\dots ,\lambda _m\) to compute an approximation \({\hat{\delta }}_h\) of \(\delta _h\) by setting

    $$\begin{aligned} {\hat{\delta }}_h = \frac{1}{m} \cdot \Big | \Big \{ i \in [m]: \big ( t_f\text {-}\textrm{coll}_{h}(X_i) \big ) \text { and } \big ( \forall x_1,x_2 \in X_i,\; x_1(\lambda _i) = x_2(\lambda _i) \big ), \text { where } X_i \leftarrow A'_n(h,\lambda _i) \Big \} \Big |. \end{aligned}$$

  3.

    If \({\hat{\delta }}_h > 1/(3p'(n))\), output h; otherwise output \(\bot \).

Denote by \(p_\bot = \Pr [\textsf{Gen}_0(1^n) = \bot ]\). Let \(\mu \) denote the distribution obtained by sampling from \(\textsf{Gen}_0(1^n)\) conditioned on not getting \(\bot \).

Proposition 16

\(p_\bot \le 1-1/(3p'(n))\).

Proof

Since \(E_{h \leftarrow \textsf{Gen}(1^n)}[\delta _h] \ge 1/p'(n)\) (see Eq. (4)) and \(\delta _h \le 1\), an averaging (reverse Markov) argument shows that with probability at least \(1/(2p'(n))\) over \(h \leftarrow \textsf{Gen}(1^n)\) it holds that \(\delta _h \ge 1/(2p'(n))\).

Assume that such an h is sampled in Step 1 of \(\textsf{Gen}_0(1^n)\). By the Chernoff bound, the probability that it passes the check in Step 3 is at least 0.99 (for all sufficiently large n). In case these two events occur, the process outputs \(h \ne \bot \), and so \(1-p_\bot \ge 0.99 \cdot \frac{1}{2p'(n)} \ge \frac{1}{3p'(n)}\), that is, \(p_\bot \le 1-1/(3p'(n))\). \(\square \)
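Spelled out, the averaging step in the proof of Proposition 16 uses only \(0 \le \delta _h \le 1\). Writing \(a = \frac{1}{2p'(n)}\),

$$\begin{aligned} \frac{1}{p'(n)} \le \textsf {E}[\delta _h] \le \Pr [\delta _h \ge a] \cdot 1 + \Pr [\delta _h < a] \cdot a \le \Pr [\delta _h \ge a] + \frac{1}{2p'(n)}, \quad \text {hence} \quad \Pr [\delta _h \ge a] \ge \frac{1}{2p'(n)}. \end{aligned}$$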

Proposition 17

For every event E it holds that:

$$\begin{aligned} \Pr _{h \leftarrow \textsf{Gen}(1^n)}[ h \in E ] \ge (1-p_\bot ) \cdot \Pr _{h \leftarrow \mu }[ h \in E]. \end{aligned}$$

Proof

By linearity, it suffices to prove the claim for the case that \(E=\{h\}\) is a singleton. Furthermore, we can view the distribution \(\mu \) as sampling from \(\textsf{Gen}_0(1^n)\) repeatedly until a function \(h \ne \bot \) is obtained. With that in mind we have that

$$\begin{aligned} \Pr [ \mu = h ]&= \sum _{i=0}^\infty \Pr [ \mu \text { outputs }h\text { in iteration }i+1\text { and }\bot \text { in all previous iterations}] \\&= \sum _{i=0}^\infty \Pr [\textsf{Gen}_0(1^n) = h] \cdot (p_\bot )^i \\&\le \Pr [ \textsf{Gen}(1^n) = h] \cdot \frac{1}{1-p_\bot }, \end{aligned}$$

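where the final inequality follows from the fact that \(\Pr [\textsf{Gen}_0(1^n)=h] \le \Pr [\textsf{Gen}(1^n) = h]\) (since \(\textsf{Gen}_0\) can output h only if it samples h in Step 1) and from the geometric series identity \(\sum _{i=0}^\infty (p_\bot )^i = 1/(1-p_\bot )\). Rearranging gives the claim. \(\square \)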
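As a sanity check of the identity used in the proof, \(\mu \) can be computed exactly for a toy distribution. In the Python sketch below, the numbers are made up: `gen` is a hypothetical finite distribution standing in for \(\textsf{Gen}(1^n)\), and `keep[h]` is the (hypothetical) probability that \(\textsf{Gen}_0\) outputs h rather than \(\bot \) after sampling it. The sketch compares the closed form with a truncated geometric series and checks the inequality of Proposition 17 on singleton events.

```python
from fractions import Fraction
from typing import Dict, Hashable, Tuple

Dist = Dict[Hashable, Fraction]


def condition_on_not_bot(gen: Dist, keep: Dist) -> Tuple[Fraction, Dist]:
    """Return (p_bot, mu) for a toy Gen_0 that samples h from gen and keeps it w.p. keep[h]."""
    gen0 = {h: gen[h] * keep[h] for h in gen}  # Pr[Gen_0 = h] <= Pr[Gen = h]
    p_bot = 1 - sum(gen0.values())
    # Closed form of sum_{i >= 0} Pr[Gen_0 = h] * p_bot**i.
    mu = {h: gen0[h] / (1 - p_bot) for h in gen}
    return p_bot, mu


gen = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
keep = {"a": Fraction(1), "b": Fraction(1, 2), "c": Fraction(0)}
p_bot, mu = condition_on_not_bot(gen, keep)

# Truncating the geometric series after 60 terms is already extremely close to mu["a"].
truncated = sum(gen["a"] * keep["a"] * p_bot**i for i in range(60))
assert mu["a"] - truncated < Fraction(1, 10**20)
# Proposition 17 on singleton events: Pr[Gen = h] >= (1 - p_bot) * Pr[mu = h].
assert all(gen[h] >= (1 - p_bot) * mu[h] for h in gen)
```

Here \(p_\bot = 3/8\), and the inequality is tight for the function that is always kept.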

Consider the “rejection sampling with cutoff” sampler \({\widetilde{\textsf{Gen}}}(1^n)\) defined as follows:

  1.

    Repeat \(\Theta ( p'(n) \cdot n)\) times:

    (a)

      Sample \(h \leftarrow \textsf{Gen}_0(1^n)\).

    (b)

      If \(h \ne \bot \), output h and halt. Otherwise, continue to the next iteration.

  2.

    If this step is reached, output some default hash function in the support of \(\textsf{Gen}(1^n)\).

Note that \({\widetilde{\textsf{Gen}}}\) can indeed be implemented in probabilistic polynomial time.
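A minimal Python sketch of \({\widetilde{\textsf{Gen}}}\), assuming a sampler `gen0` that returns `None` in place of \(\bot \). The constant `c` standing in for the one hidden in \(\Theta (p'(n) \cdot n)\), and the `default_h` argument, are illustrative choices, not taken from the text.

```python
import math
import random
from typing import Callable, Optional, TypeVar

H = TypeVar("H")


def gen_tilde(
    gen0: Callable[[random.Random], Optional[H]],
    default_h: H,
    p_prime: float,
    n: int,
    rng: random.Random,
    c: float = 3.0,
) -> H:
    """Rejection sampling with cutoff around a sampler gen0 that returns None for ⊥."""
    # Step 1: at most ceil(c * p'(n) * n) attempts (c is an arbitrary constant).
    for _ in range(math.ceil(c * p_prime * n)):
        h = gen0(rng)       # Step 1(a)
        if h is not None:   # Step 1(b): the first non-⊥ sample is the output
            return h
    # Step 2: every attempt returned ⊥, so fall back to the default function.
    return default_h
```

Returning the first non-\(\bot \) sample is what makes the output, conditioned on not reaching Step 2, distributed exactly as \(\mu \). Each attempt returns \(\bot \) with probability \(p_\bot \), so the fallback is reached with probability \(p_\bot ^{\lceil c \cdot p'(n) \cdot n \rceil }\).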

Proposition 18

The statistical distance between \(\mu \) and \({\widetilde{\textsf{Gen}}}(1^n)\) is at most \(2^{-\Omega (n)}\).

Proof

Conditioned on not reaching Step 2, the output of \({\widetilde{\textsf{Gen}}}(1^n)\) is distributed exactly as \(\mu \), so the statistical distance between the two distributions is at most the probability that \({\widetilde{\textsf{Gen}}}\) reaches Step 2. By Proposition 16, the latter probability is at most \((1-1/(3p'(n)))^{\Omega (p'(n) \cdot n)} \le 2^{-\Omega (n)}\). \(\square \)
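The final inequality in this proof can be spelled out as follows. Writing the number of iterations as \(c \cdot p'(n) \cdot n\) for some constant \(c > 0\) (our name for the constant hidden in the \(\Theta \)) and using \(1-x \le e^{-x}\), we get

$$\begin{aligned} \Big (1-\frac{1}{3p'(n)}\Big )^{c \cdot p'(n) \cdot n} \le \exp \Big (-\frac{c \cdot p'(n) \cdot n}{3p'(n)}\Big ) = e^{-cn/3} = 2^{-\frac{c}{3\ln 2} \cdot n} = 2^{-\Omega (n)}, \end{aligned}$$

so the factor \(p'(n)\) in the number of iterations exactly compensates for the per-iteration success probability of at least \(1/(3p'(n))\).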

Combining Propositions 16 to 18, we have that for every event E,

$$\begin{aligned} \Pr _{h \leftarrow \textsf{Gen}(1^n)}[ h \in E ]&\ge (1-p_\bot ) \cdot \Pr _{h \leftarrow \mu }[ h \in E] \\&\ge \frac{1}{3p'(n)} \cdot \Pr _{h \leftarrow \mu }[ h \in E] \\&\ge \frac{1}{3p'(n)} \cdot \Pr _{h \leftarrow {\widetilde{\textsf{Gen}}}(1^n)}[ h \in E] - 2^{-\Omega (n)}. \end{aligned} \tag{6}$$

This establishes the second part of Lemma 8. The following proposition establishes the first part.

Proposition 19

\(\Pr _{h \leftarrow {\widetilde{\textsf{Gen}}}(1^n)} \left[ \delta _h < \frac{1}{4p'(n)} \right] = 2^{-\Omega (n)}\).

Proof

Fix h with \(\delta _h < \frac{1}{4p'(n)}\). For \({\textsf{Gen}_0}(1^n)\) to output h, the approximation \({\hat{\delta }}_h\) must exceed \(\delta _h\) by more than \(\frac{1}{3p'(n)} - \frac{1}{4p'(n)} = \frac{1}{12p'(n)}\), which, by the Chernoff bound, happens with probability at most \(2^{-(2n+p'(n)+r(n))}\).

By taking a union bound over the \(O(p'(n) \cdot n)\) iterations in \({\widetilde{\textsf{Gen}}}(1^n)\), the probability that a fixed h as above is sampled by the rejection sampling process is at most \(\frac{O( p'(n) \cdot n)}{2^{2n+p'(n)+r(n)}} \le 2^{-(n+r(n))}\). Since \(\textsf{Gen}(1^n)\) uses at most r(n) random coins, there are at most \(2^{r(n)}\) such functions h, so another application of the union bound gives:

$$\begin{aligned} \Pr _{h \leftarrow {\widetilde{\textsf{Gen}}}(1^n)} \left[ \delta _h< \frac{1}{4p'(n)} \right] = \sum _{h :\; \delta _h < \frac{1}{4p'(n)}} \Pr [ {\widetilde{\textsf{Gen}}}(1^n) = h ] \le 2^{r(n)} \cdot 2^{-(n+r(n))} = 2^{-n}. \end{aligned}$$

\(\square \)

Lemma 8 follows from Eq. (6) and Proposition 19.

4 Limitations of Our Approach

In this section, we discuss why our approach to constructing a \(\textsf{CRH}\) (more precisely, a non-uniform \(\textsf{ioCRH}\)) cannot work when starting from a \(t\text {-}\textsf{MCRH}\) for \(t > 4\). Our discussion will not be completely formal, but should convince the reader of this claim. In fact, we consider a generalization of the construction from the previous sections that uses an arbitrary (list-decodable) code in place of the Reed–Solomon code. For simplicity, we return to some of the assumptions made in the presentation in Sect. 1.2: we start with a \((t,\ell )\)-\(\textsf{MCRH}\) that simply samples uniformly random functions from a set \(\mathcal {H}= \left\{ h:\{0,1\}^n\rightarrow \{0,1\}^{n-\ell } \right\} \), and all collision-finding adversaries below are perfect. Suppose we wish to construct from it a \((t_f,\ell _f)\)-\(\textsf{ioMCRH}\) for some \(t_f \le t\).

Formalizing our approach. The generalized version of our construction may be described as follows. Let C be a code with messages of length n bits and codewords of length N over an alphabet \(\Sigma \). In particular, C is a subset of \(\Sigma ^N\) of size \(2^n\). (The constructions in Sect. 3 correspond to taking C to be the Reed–Solomon code of various degrees over fields of characteristic 2.) For \(x\in \{0,1\}^n\), we write C(x) for the codeword to which x is mapped by the code. Our construction defines the following families of functions:

  • \(\mathcal {G}= \left\{ g_\lambda : \{0,1\}^n\rightarrow \Sigma \right\} _{\lambda \in [N]}\): for any \(x\in \{0,1\}^n\) and \(\lambda \in [N]\), \(g_\lambda (x)\) is the \(\lambda ^{\text {th}}\) symbol of C(x).

  • \(\mathcal {F}= \left\{ f_{h,g}:\{0,1\}^n \rightarrow \{0,1\}^{n-\ell }\times \Sigma \right\} _{h\in \mathcal {H},g\in \mathcal {G}}\): \(f_{h,g}(x)\) is simply the concatenation (h(x), g(x)). Suppose \(\mathcal {F}\) is not a \(t_f\)-\(\textsf{ioMCRH}\), and let A be the corresponding collision-finding adversary.

  • \(\mathcal {F}_A = \left\{ f_{h,A}: [N] \rightarrow \{0,1\}^{n-\ell } \right\} _{h\in \mathcal {H}}\): given input \(\lambda \in [N]\), the function \(f_{h,A}\) first runs \(A(h,g_\lambda )\) to get \(x_1,\dots ,x_{t_f}\in \{0,1\}^n\), and outputs \(h(x_1)\). (Here \(g_\lambda \) is the function corresponding to \(\lambda \) in \(\mathcal {G}\).)
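To make the three families concrete, here is a toy Python sketch that instantiates C with a Reed–Solomon code over a small prime field. The field order 17, the encoding of an input x as an integer whose base-17 digits are the polynomial coefficients, and the callables `h` and `A` are all illustrative assumptions: Sect. 3 uses fields of characteristic 2, and A is the hypothesized adversary, not something we can implement.

```python
from typing import Callable, Sequence, Tuple

FIELD_ORDER = 17  # toy prime field F_17 (an assumption; Sect. 3 uses characteristic 2)


def rs_symbol(x: int, lam: int, degree: int) -> int:
    """Symbol lam of the Reed-Solomon codeword C(x): the base-17 digits of x are the
    coefficients (lowest degree first) of a polynomial of the given degree, evaluated at lam."""
    coeffs = [(x // FIELD_ORDER**k) % FIELD_ORDER for k in range(degree + 1)]
    value = 0
    for a in reversed(coeffs):  # Horner's rule over F_17
        value = (value * lam + a) % FIELD_ORDER
    return value


def make_g(lam: int, degree: int) -> Callable[[int], int]:
    """g_lambda in G: x -> C(x)[lambda]."""
    return lambda x: rs_symbol(x, lam, degree)


def make_f(h: Callable[[int], int], g: Callable[[int], int]) -> Callable[[int], Tuple[int, int]]:
    """f_{h,g} in F: x -> (h(x), g(x))."""
    return lambda x: (h(x), g(x))


def make_f_A(
    h: Callable[[int], int],
    A: Callable[[Callable[[int], int], Callable[[int], int]], Sequence[int]],
    degree: int,
) -> Callable[[int], int]:
    """f_{h,A} in F_A: lambda -> h(x_1), where (x_1, ..., x_{t_f}) = A(h, g_lambda)."""
    return lambda lam: h(A(h, make_g(lam, degree))[0])
```

For instance, with degree 1 the input \(x = 3 + 5 \cdot 17\) encodes the polynomial \(3 + 5\lambda \), so `make_g(2, 1)(x)` evaluates to 13 in \({\mathbb {F}}_{17}\).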

We would then like to show that if \(\mathcal {F}\) is not a \(t_f\)-\(\textsf{ioMCRH}\), and the family \(\mathcal {F}_A\) constructed using the adversary A is also not a \(t_f\)-\(\textsf{ioMCRH}\), then we can find t-wise collisions for functions in \(\mathcal {H}\), which is a contradiction. To do this, we use the collision-finding adversary \(A'\) for \(\mathcal {F}_A\). The process proceeds as follows:

  1.

    Given an \(h\in \mathcal {H}\), first run \(A'(f_{h,A})\) to get functions \(g_1, \dots , g_{t_f} \in \mathcal {G}\) that collide under \(f_{h,A}\).

  2.

    Then, for each \(g_i\), run \(A(f_{h,g_i})\) to get a set \(X_i = \left\{ x_{i1},\dots ,x_{it_f} \right\} \) whose elements collide under \(f_{h,g_i}\).

  3.

    If there are t distinct elements in the union \(\cup _{i=1}^{t_f} X_i\), output them.
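The three steps can be sketched in Python as follows. Everything here is hypothetical: `A` and `A_prime` stand for the perfect adversaries assumed above (with `A_prime` receiving \(f_{h,A}\) as a callable and returning colliding indices \(\lambda \)), `make_g` maps \(\lambda \) to \(g_\lambda \), and inputs are assumed to be integers so that they can be sorted. The sketch does not re-check that all \(x_{ij}\) collide under h; that part of the argument is discussed in Sect. 1.2 and Sect. 3.

```python
from typing import Callable, List, Optional, Sequence, Set


def find_t_collision(
    h: Callable[[int], int],
    A: Callable[[Callable[[int], int], Callable[[int], int]], Sequence[int]],
    A_prime: Callable[[Callable[[int], int]], Sequence[int]],
    make_g: Callable[[int], Callable[[int], int]],
    t: int,
) -> Optional[List[int]]:
    """Combine collisions found by A' (for F_A) and A (for F) into t inputs for h."""
    def f_hA(lam: int) -> int:
        # f_{h,A}(lambda) = h(x_1), where (x_1, ..., x_{t_f}) = A(h, g_lambda)
        return h(A(h, make_g(lam))[0])

    lams = A_prime(f_hA)              # Step 1: indices colliding under f_{h,A}
    union: Set[int] = set()
    for lam in lams:                  # Step 2: X_i = A(h, g_{lambda_i})
        union.update(A(h, make_g(lam)))
    if len(union) >= t:               # Step 3: output t distinct elements of the union
        return sorted(union)[:t]
    return None
```

With the toy stubs `h = lambda x: 0`, `make_g = lambda lam: (lambda x: lam)`, `A = lambda h_, g: [2 * g(0), 2 * g(0) + 1]` and `A_prime = lambda f: [0, 1]`, the union is \(\{0,1,2,3\}\), so the call succeeds for \(t = 3\) and fails for \(t = 5\).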

Arguments outlined in Sect. 1.2 and Sect. 3 explain why all the \(x_{ij}\)’s have the same output under h, and only the following question remains: can we ensure that there are indeed t distinct elements among the \(X_i\)’s, while \(\mathcal {F}\) and \(\mathcal {F}_A\) are both shrinking? Note that the shrinkage of \(\mathcal {F}\) is \((\ell -\log {\left| \Sigma \right| })\), and that of \(\mathcal {F}_A\) is \((\log {N} - (n-\ell ))\).

The question of the existence of t distinct \(x_{ij}\)’s may be recast as follows. We are given \(t_f\) sets of codewords \(C_i = \left\{ c_{i1},\dots ,c_{it_f} \right\} \), where the \(t_f\) codewords in each \(C_i\) are distinct. Each \(C_i\) corresponds to a statement that, for some \(\lambda _i\in [N]\) (where the \(\lambda _i\)’s are distinct), all the codewords in \(C_i\) agree on the \(\lambda _i^{\text {th}}\) coordinate. In other words, there are \(t_f\) tuples \((\lambda _i,y_i)\in [N]\times \Sigma \) such that \(c_{ij}[\lambda _i] = y_i\) for all \(c_{ij}\in C_i\). We would then like to claim that there is no set of codewords \(T \subseteq C\) with \(\left| T\right| < t\) such that \(C_i \subseteq T\) for each i and \(c_{ij}[\lambda _i] = y_i\) for all \(i,j\in [t_f]\). At the very least, this requires that no set of \((t-1)\) codewords agree on \(t_f\) coordinates.

Optimality of current choices. It turns out, however, that (an extension of) the Singleton bound implies that for this to happen with \(t_f < t\), the alphabet \(\Sigma \) has to be quite large, which in turn implies an upper bound on the shrinkage of the resulting family \(\mathcal {F}\). Let us start with the simple case of \(t = 3\) and \(t_f = 2\). Here, the condition stated above becomes the following: any 2 codewords agree on at most 1 coordinate. In other words, the distance of the code has to be at least \((N-1)\).

Proposition 20

In any code \(C \subseteq \Sigma ^N\) where \(\left| C\right| = 2^n\) and any 2 codewords agree on at most 1 coordinate, it has to be that \(\left| \Sigma \right| \ge 2^{n/2}\).

Proof

This is simply the Singleton bound. Consider truncating all the codewords in C to the first two coordinates. As no two codewords agree on more than one coordinate, the truncated codewords are still pairwise distinct, so there are \(2^n\) of them. Since they all lie in \(\Sigma ^2\), this implies that \(\left| \Sigma \right| ^2 \ge 2^n\), that is, \(\left| \Sigma \right| \ge 2^{n/2}\). \(\square \)
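The proof only uses \(\left| C\right| \), so it shows \(\left| \Sigma \right| \ge \left| C\right| ^{1/2}\) in general, and a degree-1 Reed–Solomon code meets this with equality: over a prime field with q elements it has \(q^2\) codewords, alphabet size q, and any two codewords agree on at most one coordinate. The brute-force Python check below illustrates this for q = 5, a prime rather than a power of 2, chosen only to keep the example small. It also confirms the key step of the proof, that the truncations to the first two coordinates are pairwise distinct.

```python
from itertools import combinations, product
from typing import List, Tuple


def reed_solomon_code(field_order: int, degree: int) -> List[Tuple[int, ...]]:
    """All codewords of the Reed-Solomon code of the given degree over the prime field
    F_{field_order}, using every field element as an evaluation point (N = field_order)."""
    code = []
    for coeffs in product(range(field_order), repeat=degree + 1):
        code.append(tuple(
            sum(a * pow(lam, k, field_order) for k, a in enumerate(coeffs)) % field_order
            for lam in range(field_order)
        ))
    return code


def max_agreement(code: List[Tuple[int, ...]]) -> int:
    """Largest number of coordinates on which two distinct codewords agree."""
    return max(sum(u == v for u, v in zip(c1, c2)) for c1, c2 in combinations(code, 2))


code = reed_solomon_code(5, 1)
assert len(set(code)) == len(code) == 25        # |C| = 25 = |Sigma|^2
assert max_agreement(code) == 1                 # any 2 codewords agree on at most 1 coordinate
assert len({c[:2] for c in code}) == len(code)  # truncations to 2 coordinates are distinct
```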

Proposition 20 implies that the shrinkage of \(\mathcal {F}\) is \((\ell -\log {\left| \Sigma \right| }) \le (\ell -n/2)\), so \(\mathcal {F}\) can be shrinking only if \(\ell > n/2\). In particular, using a different code in place of the Reed–Solomon code (of degree 1 in this case) in our transformation from a \((3,\ell )\)-\(\textsf{MCRH}\) to a \(\textsf{CRH}\) cannot weaken the requirement on the shrinkage \(\ell \) of the function we start with.

We can similarly show that our choices in the transformation from a 4-\(\textsf{MCRH}\) to a \(\textsf{CRH}\) were also close to optimal. To start with, note that we cannot use our approach to go directly from a 4-\(\textsf{MCRH}\) to a \(\textsf{CRH}\). This would require showing that 2 sets \(C_i\) of size 2 each are disjoint, which implies that for any codeword \(c \in C\), there exists at most one \(\lambda \) for which there is some \(c'\) such that \(c[\lambda ] = c'[\lambda ]\). A simple counting argument shows that this cannot happen unless \(\left| \Sigma \right| \ge 2^n\), at which point all shrinkage is lost.

So, to get a \(\textsf{CRH}\) from a 4-\(\textsf{MCRH}\), we first have to construct a 3-\(\textsf{MCRH}\). The following proposition implies that the loss in shrinkage in going from a 4-\(\textsf{MCRH}\) to a 3-\(\textsf{MCRH}\) is at least n/3, irrespective of the choice of the code C. Hence, to go from a \((4,\ell )\)-\(\textsf{MCRH}\) to a \(\textsf{CRH}\), \(\ell \) would have to be at least \((n/3+n/2) = 5n/6\), which matches what we obtained.

Proposition 21

In any code \(C \subseteq \Sigma ^N\) where \(\left| C\right| = 2^n\) and any 3 codewords all agree on at most 2 coordinates, it has to be that \(\left| \Sigma \right| \ge \Omega (2^{n/3})\).

Proof

Again, truncate the codewords in C to the first 3 coordinates. Each truncated word is the truncation of at most 2 codewords, since otherwise some 3 codewords in C would agree on the first 3 coordinates, which the hypothesis precludes. Hence there are at least \(2^n/2\) distinct truncated codewords, and since they lie in \(\Sigma ^3\), we have \(\left| \Sigma \right| ^3 \ge 2^n/2\), which implies that \(\left| \Sigma \right| \ge (2^n/2)^{1/3}\). \(\square \)

Obstructions to improvement. More generally, the above techniques can be used to prove the following general bound.

Proposition 22

In any code \(C \subseteq \Sigma ^N\) where \(\left| C\right| = 2^n\) and any p codewords all agree on at most q coordinates, it has to be that \(\left| \Sigma \right| \ge (2^{n}/(p-1))^{1/(q+1)}\).
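Proposition 22 is stated without proof. One way to obtain it, extending the truncation argument in the proofs of Propositions 20 and 21 (our reconstruction, assuming \(N \ge q+1\)), is to truncate every codeword to its first \(q+1\) coordinates. No truncated word can come from p distinct codewords, since those p codewords would then all agree on \(q+1\) coordinates. Hence each of the at most \(\left| \Sigma \right| ^{q+1}\) truncated words has at most \(p-1\) preimages, and

$$\begin{aligned} 2^n = \left| C\right| \le (p-1) \cdot \left| \Sigma \right| ^{q+1} \quad \Longrightarrow \quad \left| \Sigma \right| \ge \left( \frac{2^n}{p-1}\right) ^{1/(q+1)}. \end{aligned}$$

Setting \(p=2, q=1\) recovers Proposition 20, and \(p=3, q=2\) recovers Proposition 21.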

Proposition 22 implies, for instance, that going from a 5-\(\textsf{MCRH}\) to a 4-\(\textsf{MCRH}\) (resp. 3-\(\textsf{MCRH}\)) using our approach would incur a loss of at least n/4 (resp. n/3) in shrinkage. Further, we can show that going from a 5-\(\textsf{MCRH}\) to a 3-\(\textsf{MCRH}\) in fact incurs a loss of at least n/2. To do this, we show that if the alphabet \(\Sigma \) has size at most \(2^{n/2}/2\), then there actually does exist a \(T\subseteq C\) of size 4 such that the sets \(C_1, C_2, C_3\) with their requisite properties are subsets of T. This follows immediately from the following proposition.

Proposition 23

For any code \(C\subseteq \Sigma ^N\) such that \(\left| C\right| = 2^n\) and \(\left| \Sigma \right| \le 2^{n/2}/2\), there exist codewords \(c,c_1,c_2,c_3 \in C\) such that on each of the first three coordinates, at least two of the \(c_i\)’s agree with c.

Proof

Consider just the first three coordinates of codewords in C. Let \(S_1\) be the set of all codewords c such that there exists another codeword \(c'\) with \(c[1] = c'[1]\) and \(c[2] = c'[2]\). Let \(S_2\) and \(S_3\) denote similar sets of codewords that instead look at the first and third, and second and third coordinates, respectively. If we can prove that some codeword c is contained in all of the \(S_i\)’s, then we are done.

We do this by showing that each \(S_i\) has to be large. Take \(S_1\), for instance. By definition, \(S_1\) is the set of all codewords that have some “collision” in the first two coordinates. Since the first two coordinates are supported on \(\Sigma ^2\), the number of codewords that do not have any collisions in these coordinates can be at most \(\left| \Sigma \right| ^2\). Thus, since \(\left| \Sigma \right| ^2 \le 2^n/4\), \(S_1\) (and similarly \(S_2\) and \(S_3\)) is of size at least \((2^n-\left| \Sigma \right| ^2) \ge (3/4)\cdot 2^n\). The complements of the three \(S_i\)’s therefore contain at most \(3 \cdot 2^n/4 < 2^n\) codewords in total, so at least one codeword lies in the intersection of all three \(S_i\)’s. Take this codeword to be c, and its colliding codeword in each \(S_i\) to be the respective \(c_i\). This proves the proposition. \(\square \)
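The proof is constructive enough to turn into a search. The Python sketch below builds \(S_1, S_2, S_3\) by bucketing codewords on pairs of coordinates and returns c together with one colliding partner from each set. The example code is a hypothetical random code of \(2^8\) distinct words of length 3 over an alphabet of size 8, which satisfies \(\left| \Sigma \right| \le 2^{n/2}/2\) with n = 8, so by the proposition the search must succeed.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Word = Tuple[int, ...]


def partners(code: Sequence[Word], i: int, j: int) -> Dict[Word, Word]:
    """Map each codeword c that has another codeword c' with c[i] == c'[i] and
    c[j] == c'[j] to one such c'. The keys form the set S for coordinates (i, j)."""
    buckets: Dict[Tuple[int, int], List[Word]] = defaultdict(list)
    for c in code:
        buckets[(c[i], c[j])].append(c)
    result: Dict[Word, Word] = {}
    for group in buckets.values():
        if len(group) < 2:
            continue  # no collision on coordinates (i, j), so this codeword is not in S
        for k, c in enumerate(group):
            result[c] = group[(k + 1) % len(group)]  # a different member of the same bucket
    return result


def find_witness(code: Sequence[Word]) -> Optional[Tuple[Word, Word, Word, Word]]:
    """Return (c, c1, c2, c3) as in Proposition 23, or None if no codeword lies in all S_i."""
    s1 = partners(code, 0, 1)  # first and second coordinates
    s2 = partners(code, 0, 2)  # first and third coordinates
    s3 = partners(code, 1, 2)  # second and third coordinates
    for c in code:
        if c in s1 and c in s2 and c in s3:
            return c, s1[c], s2[c], s3[c]
    return None


sigma, n = 8, 8
rng = random.Random(0)
code = [tuple((v // sigma**k) % sigma for k in range(3))
        for v in rng.sample(range(sigma**3), 2**n)]
c, c1, c2, c3 = find_witness(code)
# On each of the first three coordinates, two of c1, c2, c3 agree with c.
assert (c1[0], c1[1]) == (c[0], c[1])
assert (c2[0], c2[2]) == (c[0], c[2])
assert (c3[1], c3[2]) == (c[1], c[2])
```

Here each \(S_i\) misses at most \(\left| \Sigma \right| ^2 = 64\) of the 256 codewords, so at least 64 codewords qualify as c.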

To go from a 5-\(\textsf{MCRH}\) to a \(\textsf{CRH}\), we would first have to go to a 4-\(\textsf{MCRH}\) or a 3-\(\textsf{MCRH}\), and then to a \(\textsf{CRH}\) from there. As noted above, going from a 4-\(\textsf{MCRH}\) (resp. 3-\(\textsf{MCRH}\)) to a \(\textsf{CRH}\) already incurs a loss of at least 5n/6 (resp. n/2) in shrinkage. Combined with the above bounds on constructing a 4- or 3-\(\textsf{MCRH}\) from a 5-\(\textsf{MCRH}\), the total loss is at least \(n/4+5n/6 > n\) along the first route and at least \(n/2+n/2 = n\) along the second, whereas the shrinkage \(\ell \) of the starting function is at most n. Hence neither route is viable, and our approach as is cannot be used to construct a \(\textsf{CRH}\) from a 5-\(\textsf{MCRH}\) (and thus also from a \(t\text {-}\textsf{MCRH}\) for \(t > 5\)).

Potential workarounds. One possible route to a \(\textsf{CRH}\) from even a 5-\(\textsf{MCRH}\) is to use the hash function h itself to split up codewords that may otherwise appear together in the sets \(C_i\). The codewords in any given \(C_i\) correspond to a set of inputs that collide under both h and g, but so far we have only used the fact that they collide under g. Could their collision under h be used meaningfully to improve this approach? Of course, there might also be approaches significantly different from ours that construct \(\textsf{CRH}\) from such \(\textsf{MCRH}\).