1 Introduction

In recent years, there has been interest in the study of cryptographic primitives that are implemented by local functions, that is, functions in which each output bit depends on a constant number of input bits. This study has been in large part spurred by the discovery that, under widely accepted cryptographic assumptions, local functions can achieve rich forms of cryptographic functionality, ranging from one-wayness and pseudorandom generation to semantic security and existential unforgeability [6].

Local functions have simple structure: They can be described by a sparse input–output dependency graph and a sequence of small predicates applied at each output. Besides allowing efficient parallel evaluation, this simple structure makes local functions amenable to analysis and gives hope for proving highly non-trivial statements about them. Given that the cryptographic functionalities that local functions can achieve are quite complex, it is very interesting and appealing to try to understand which properties of local functions (namely graphs and predicates) are necessary and sufficient for them to implement such functionalities.

In this work, we focus on the study of local pseudorandom generators with large stretch. We give evidence that for most graphs, all but a handful of “degenerate” predicates yield pseudorandom generators with output length \(m = n^{1 + \varepsilon }\) for some constant \(\varepsilon > 0\). Conversely, we show that for almost all graphs, degenerate predicates are not secure even against linear distinguishers. Taken together, these results expose a dichotomy: Every predicate is either very hard or very easy, in the sense that it either yields a generator for almost all graphs or fails to do so for almost all graphs.

1.1 Easy, Sometimes Hard, and Almost Always Hard Predicates

Recall that a pseudorandom generator is a length increasing function \(f:\{0,1\}^n \rightarrow \{0,1\}^m\) such that no efficiently computable test can distinguish with noticeable advantage between the value \(f(x)\) and a randomly chosen \(y\in \{0,1\}^m\), when \(x\in \{0,1\}^n\) is chosen at random. The additive stretch of \(f\) is defined to be the difference between its output length \(m\) and its input length \(n\).

In the context of constructing local pseudorandom generators of superlinear stretch, we may assume without loss of generality that all outputs apply the same predicate \(P:\{0,1\}^d \rightarrow \{0,1\}\).Footnote 1 We are interested in understanding which \(d\)-local functions \(f_{G, P}:\{0,1\}^n \rightarrow \{0,1\}^m\), described by a graph \(G\) and a predicate \(P\), are pseudorandom generators. For a predicate \(P\), we will say

  • \(P\) is easy if \(f_{G, P}\) is not pseudorandom (for a given class of adversaries) for every \(G\),

  • \(P\) is sometimes hard if \(f_{G, P}\) is pseudorandom for some \(G\), and

  • \(P\) is almost always hard if \(f_{G, P}\) is pseudorandom for a \(1-o(1)\) fraction of \(G\).Footnote 2

Cryan and Miltersen [17] and Mossel et al. [28] identified several classes of predicates that are easy for polynomial-time algorithms when the stretch is a sufficiently large linear function. These include (1) unbalanced predicates, (2) linear predicates, (3) predicates that are biased toward one input (i.e., \(\Pr _w[P(w) = w_i] \ne \frac{1}{2}\)), and (4) predicates that are biased toward a pair of inputs (i.e., \(\Pr _w[P(w)=w_i\oplus w_j]\ne \frac{1}{2}\)). We call such predicates degenerate. By case analysis, it can be shown that degenerate predicates include all predicates of locality at most four [17, 28].
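The case analysis for locality at most four can also be verified by exhaustive search. Below is a minimal sketch (in Python, with illustrative names) of the equivalent formulation used later in this paper: no 4-bit predicate is simultaneously 2-resilient (uncorrelated with every parity of at most two inputs) and of algebraic degree at least 2.

```python
# Sketch: exhaustively verify that every predicate of locality d = 4 is degenerate,
# i.e., no 4-bit predicate is both 2-resilient and of algebraic degree >= 2.
d = 4
N = 1 << d  # 16 truth-table entries

def parity(x):
    return bin(x).count("1") & 1

def is_2_resilient(tt):
    # All Fourier coefficients alpha_T with |T| <= 2 must vanish:
    # sum_x (-1)^(P(x) + <x, T>) == 0 (T = 0 covers balancedness).
    return all(
        sum((-1) ** (tt[x] ^ parity(x & T)) for x in range(N)) == 0
        for T in range(N) if bin(T).count("1") <= 2
    )

def degree(tt):
    # Algebraic normal form via the Moebius transform over F_2.
    anf = list(tt)
    for i in range(d):
        for m in range(N):
            if m & (1 << i):
                anf[m] ^= anf[m ^ (1 << i)]
    return max((bin(m).count("1") for m in range(N) if anf[m]), default=0)

non_degenerate = []
for code in range(1 << N):  # all 2^16 predicates on 4 bits
    tt = [(code >> x) & 1 for x in range(N)]
    if is_2_resilient(tt) and degree(tt) >= 2:
        non_degenerate.append(tt)
print(len(non_degenerate))  # 0
```

The empty outcome matches Siegenthaler's trade-off between resiliency and algebraic degree.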

On the positive side, Mossel et al. [28] also gave examples of five-bit predicates that are sometimes (exponentially) hard against linear distinguishers. Applebaum et al. [5] show that when the locality is sufficiently large, almost always hard predicates against linear distinguishers exist.
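To make "sometimes hard" concrete, the following sketch checks that the 5-bit predicate \(w_1\oplus w_2\oplus w_3\oplus (w_4\wedge w_5)\) (our illustrative choice; [28] discuss predicates of this flavor) escapes all four degenerate classes, i.e., is 2-resilient and has algebraic degree 2.

```python
# Sketch: the 5-bit predicate P(w) = w1 + w2 + w3 + w4*w5 (over F_2) is
# non-degenerate: 2-resilient and of algebraic degree 2.
d = 5
N = 1 << d

def parity(x):
    return bin(x).count("1") & 1

def P(x):  # bit i of x encodes w_{i+1}
    return ((x >> 0) ^ (x >> 1) ^ (x >> 2) ^ ((x >> 3) & (x >> 4))) & 1

# 2-resiliency: zero correlation with every parity of at most two inputs.
for T in range(N):
    if bin(T).count("1") <= 2:
        assert sum((-1) ** (P(x) ^ parity(x & T)) for x in range(N)) == 0

# Algebraic degree via the Moebius (ANF) transform: the monomial w4*w5 has degree 2.
anf = [P(x) for x in range(N)]
for i in range(d):
    for m in range(N):
        if m & (1 << i):
            anf[m] ^= anf[m ^ (1 << i)]
deg = max(bin(m).count("1") for m in range(N) if anf[m])
print(deg)  # 2
```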

Pseudorandomness against linear distinguishers means that there is no subset of output bits whose XOR has noticeable bias. This notion, due to Naor and Naor [29], was advocated in the context of local pseudorandom generators by Cryan and Miltersen [17]. A bit more formally, for a function \(f:\{0,1\}^n\rightarrow \{0,1\}^{m}\), we let

$$\begin{aligned} \mathsf {bias}(f)=\max _{L}\left| \Pr [L(f(\mathcal {U}_n))=1]-\Pr [L(\mathcal {U}_m)=1] \right| , \end{aligned}$$

where the maximum is taken over all affine functions \(L:\mathbb {F}_2^m\rightarrow \mathbb {F}_2\). A small-bias generator is a function \(f\) for which \(\mathsf {bias}(f)\) is small (preferably negligible) as a function of \(n\).
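On toy instances, \(\mathsf {bias}(f)\) can be computed exhaustively. A minimal sketch (function names and parameters are illustrative; shifting a test by a constant does not change its advantage, so it suffices to scan parity tests):

```python
# Exhaustive bias computation for a toy f: {0,1}^n -> {0,1}^m, with inputs and
# outputs packed into integers. For a non-trivial parity L, Pr[L(U_m) = 1] = 1/2.
def bias(f, n, m):
    outs = [f(x) for x in range(1 << n)]
    best = 0.0
    for mask in range(1, 1 << m):  # non-trivial linear tests L(y) = <mask, y>
        p1 = sum(bin(y & mask).count("1") & 1 for y in outs) / (1 << n)
        best = max(best, abs(p1 - 0.5))
    return best

# A toy "generator" that copies x and appends the parity of x: the linear test
# that XORs all m output bits is constant, so the bias is maximal (1/2).
def copy_plus_parity(x):
    return (x << 1) | (bin(x).count("1") & 1)

print(bias(copy_plus_parity, 3, 4))  # 0.5
```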

1.2 Our Results

We fully classify predicates by showing that all predicates that are not known to be easy are almost always hard.

Theorem 1.1

(Non-degenerate predicates are hard) Let \(P:\{0,1\}^d\rightarrow \{0,1\}\) be any non-degenerate predicate. Then, for every \(\varepsilon <1/4\) and \(m=n^{1+\varepsilon }\):

$$\begin{aligned} \Pr _{G}\left[ \mathsf {bias}(f_{G,P})>\delta (n)\right] <\delta (n), \end{aligned}$$

where \(\delta (n)=\exp (-\Omega (n^{1/4-\varepsilon }))\) and \(G\) is randomly chosen from all \(d\)-regular hypergraphs with \(n\) nodes (representing the inputs) and \(m\) hyperedges (representing the outputs).

The theorem shows that, even when locality is large, the only easy predicates are degenerate ones, and there are no other “sources of easiness” other than ones that already appear in predicates of locality \(4\) or less.

Conversely, we show that degenerate predicates are easy for linear distinguishers (as opposed to general polynomial-time distinguishers).

Theorem 1.2

(Linear tests break degenerate predicates) For every \(m=n+\Omega (n)\) and every degenerate predicate \(P:\{0,1\}^d\rightarrow \{0,1\}\),

$$\begin{aligned} \Pr _{G}\left[ \mathsf {bias}(f_{G,P})\ge n^{-c}\right] \ge 1-o(1), \end{aligned}$$

where \(c>0\) is a constant (depending on \(d\)) and \(G\) is randomly chosen from all \(d\)-regular hypergraphs with \(n\) nodes and \(m\) hyperedges.

The proof of Theorem 1.2 mainly deals with degenerate predicates that are correlated with a pair of their inputs. In this case, we show that the nonlinear distinguisher which was previously used in [28] and was based on a semi-definite program for MAX-2-LIN [21] can be replaced with a simple linear distinguisher. (The proof for other degenerate predicates follows from previous works).

Taken together, Theorems 1.1 and 1.2 expose a dichotomy: A predicate is either easy (fails for almost all graphs) or hard (succeeds for almost all graphs). One possible interpretation of our results is that, from a designer’s point of view, a strong emphasis should be put on the choice of the predicate, while the choice of the input–output dependency graph may be less crucial (since if the predicate is appropriately chosen then most graphs yield a small-bias generator). In some sense, this means that constructions of local pseudorandom generators with large stretch are robust: As long as the graph \(G\) is “typical,” any non-degenerate predicate can be used (our proof classifies explicitly what a typical family of graphs is and in addition shows that even a mixture of different non-degenerate predicates would work).

1.3 Why Polynomial Stretch?

While Applebaum et al. [6] give strong evidence that local pseudorandom generators exist, the stretch their construction achieves is only sublinear (\(m=n+n^{1-\varepsilon }\)). In contrast, the regime of large (polynomial or even linear) stretch is not as well understood, and the only known constructions are based on nonstandard assumptions. (See Sect. 1.5.)

Large-stretch local generators are known to have several applications in cryptography and complexity, such as secure computation with constant overhead [25] and strong (average-case) inapproximability results for constraint-satisfaction problems [7]. These results are not known to follow from other (natural) assumptions. It should be mentioned that it is possible to convert small polynomial stretch of \(m=n^{1+\varepsilon }\) into arbitrary (fixed) polynomial stretch of \(m=n^c\) at the expense of constant blow-up in the locality. (This follows from standard techniques, see [4] for details). Hence, it suffices to focus on the case of \(m=n^{1+\varepsilon }\) for some fixed \(\varepsilon \).
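The amplification step mentioned above can be sketched as a simple self-composition (a hedged outline of the standard technique; the actual transformation and its security proof appear in [4]). Given a \(d\)-local generator family \(f_n:\{0,1\}^n\rightarrow \{0,1\}^{n^{1+\varepsilon }}\), define

$$\begin{aligned} f^{(k)}=f_{n^{(1+\varepsilon )^{k-1}}}\circ \cdots \circ f_{n^{1+\varepsilon }}\circ f_{n}:\{0,1\}^n\rightarrow \{0,1\}^{n^{(1+\varepsilon )^k}}. \end{aligned}$$

Each output of \(f^{(k)}\) depends on at most \(d^k\) inputs, and pseudorandomness is preserved by a standard hybrid argument; taking \(k=\lceil \log c/\log (1+\varepsilon )\rceil \) gives output length at least \(n^{c}\) with locality \(d^k=O(1)\).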

The proof of Theorem 1.1 yields exponentially small bias when \(m=O(n)\), and sub-exponential bias for \(m=n^{1+\varepsilon }\) where \(\varepsilon <1/4\). We do not know whether this is tight, but it can be shown that some non-degenerate predicates become easy (to break on a random graph) when the output length is \(m=n^2\) or even \(m=n^{3/2}\). In general, it seems that when \(m\) grows, the number of hard predicates of locality \(d\) decreases, till the point \(m^{\star }\) where all predicates become easy. (By [28], \(m^{\star }\le n^{d/2}\).) It will be interesting to obtain a classification for larger output lengths, and to find out whether a similar dichotomy happens there as well.

1.4 Why Small-Bias?

Small-bias generators are a strict relaxation of cryptographic pseudorandom generators in that the tests \(L:\mathbb {F}_2^m\rightarrow \mathbb {F}_2\) are restricted to be affine (as opposed to arbitrary efficiently computable functions). Even though affine functions are, in general, fairly weak distinguishers, handling them is a necessary first step toward achieving cryptographic pseudorandomness. In particular, affine functions are used extensively in cryptanalysis and security against them already rules out an extensive class of attacks.

For local pseudorandom generators with linear stretch, Cryan and Miltersen conjectured that affine distinguishers are as powerful as polynomial-time distinguishers [17]. In Sect. 5, we attempt to support this view by showing that resilience against small-bias, by itself, leads to robustness against other classes of attacks.

Small-bias generators are also of interest in their own right, being used as building blocks in constructions that achieve stronger forms of pseudorandomness. These include constructions of local cryptographic pseudorandom generators [4, 7], as well as pseudorandom generators that fool low-degree polynomials [14], small-space computations [24], and read-once formulas [11].

1.5 Related Work

The function \(f_{G,P}\) was introduced by Goldreich [22] who conjectured that when \(m=n\), one-wayness should hold for a random graph and a random predicate. This view is supported by the results of [3, 16, 20, 22, 26, 27, 30] who show that a large class of algorithms (including ones that capture DPLL-based heuristics) fail to invert \(f_{G,P}\) in polynomial time.

In the linear regime, i.e., when \(m=n+\Omega (n)\), it is shown in [12] that if the predicate is degenerate, the function \(f_{G,P}\) can be inverted in polynomial time. (This strengthens the results of [17, 28], which only give distinguishers.) Recently, a strong self-amplification theorem was proved in [13], showing that for \(m=n+\Omega _d(n)\), if \(f_{G,P}\) is hard to invert over a tiny (sub-exponentially small) fraction of the inputs with respect to sub-exponential time algorithms, then the same function is actually hard to invert over almost all inputs (with respect to sub-exponential time algorithms).

Pseudorandom generators with sub-linear stretch can be implemented by \(4\)-local functions based on standard intractability assumptions (e.g., hardness of factoring, discrete-log, or lattice problems) [6], or even \(3\)-local functions based on the intractability of decoding random linear codes [8]. However, it is unknown how to extend this result to polynomial or even linear stretch since all known stretch amplification procedures introduce a large (polynomial) overhead in the locality. In fact, for the special case of \(4\)-local functions (in which each output depends on at most 4 input bits), there is a provable separation: Although such functions can compute sub-linear pseudorandom generators [6], they cannot achieve polynomial stretch [17, 28].

Alekhnovich [1] conjectured that for \(m=n+\Theta (n)\), the function \(f_{G,P}\) is pseudorandom for a random graph and when \(P\) is a randomized predicate which computes \(z_1\oplus z_2\oplus z_3\) and with some small probability \(p<\frac{1}{2}\) flips the result. Although this construction does not lead directly to a local function (due to the use of noise), it was shown in [7] that it can be derandomized and transformed into a local construction with linear stretch. (The restriction to linear stretch holds even if one strengthens Alekhnovich’s assumption to \(m=\mathrm{poly}(n)\).)

More recently, [4] showed that the pseudorandomness of \(f_{G,P}\) with respect to a random graph and output length \(m\) can be reduced to the one-wayness of \(f_{H,P}\) with respect to a random graph \(H\) and related output length \(m'\). The current paper complements this result as it provides a criterion for choosing the predicate \(P\).Footnote 3

2 Techniques and Ideas

In this section, we give an overview of the proof of our Theorem 1.1. Let \(f:\{0,1\}^n\rightarrow \{0,1\}^{m}\) be a \(d\)-local function where each output bit is computed by applying some \(d\)-local predicate \(P:\{0,1\}^d \rightarrow \{0,1\}\) to an (ordered) subset of the inputs \(S\subseteq [n]\).Footnote 4 Any such function can be described by a list of \(m\) \(d\)-tuples \(G=(S_1,\ldots ,S_m)\) and the predicate \(P\). Under this convention, we let \(f_{G,P}:\{0,1\}^n\rightarrow \{0,1\}^{m}\) denote the corresponding \(d\)-local function.

We view \(G\) as a \(d\)-regular hypergraph with \(n\) nodes (representing inputs) and \(m\) hyperedges (representing outputs) each of size \(d\). (We refer to such a graph as an \((m,n,d)\)-graph.) Since we are mostly interested in polynomial stretch, we think of \(m\) as \(n^{1+\varepsilon }\) for some fixed \(\varepsilon >0\), e.g., \(\varepsilon =0.1\).
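For concreteness, sampling an \((m,n,d)\)-graph and evaluating \(f_{G,P}\) can be sketched as follows (all names and parameters are illustrative):

```python
import random

# Toy sketch: sample a random (m, n, d)-graph as a list of m ordered d-tuples
# of distinct input nodes, and evaluate f_{G,P} on an input x in {0,1}^n.
def random_graph(m, n, d, rng):
    return [tuple(rng.sample(range(n), d)) for _ in range(m)]

def f(G, P, x):
    # Each output bit applies P to the projection of x onto one hyperedge.
    return [P(*(x[i] for i in S)) for S in G]

rng = random.Random(0)
n, d = 12, 5
m = 20  # think m = n^{1+eps}; kept tiny here
G = random_graph(m, n, d, rng)
P = lambda w1, w2, w3, w4, w5: w1 ^ w2 ^ w3 ^ (w4 & w5)  # a non-degenerate predicate
x = [rng.randrange(2) for _ in range(n)]
y = f(G, P, x)
assert len(y) == m and set(y) <= {0, 1}
```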

We would like to show that for almost all \((m,n,d)\)-graphs \(G\), the function \(f_{G,P}\) fools all linear tests, where \(P\) is non-degenerate. Following [28], we distinguish between light linear tests, which depend on fewer than \(k=\Omega (n^{1-2\varepsilon })\) outputs, and heavy tests, which depend on at least \(k\) outputs.

Recall that a non-degenerate predicate satisfies two forms of “nonlinearity”: (1) (2-resilient) \(P\) is uncorrelated with any linear function that involves fewer than 3 variables; and (2) (degree 2) the algebraic degree of \(P\) as a polynomial over \(\mathbb {F}_2\) is at least 2. Both properties are classical design criteria which are widely used in practical cryptanalysis (cf. [31]). It turns out that the first property fools light tests and the second property fools heavy tests.

2.1 Fooling Light Tests

Our starting point is a result of [28] which shows that if the predicate is the parity predicate \(\oplus \) and the graph is a good expander, the output of \(f_{G,\oplus }(\mathcal {U}_n)\) perfectly fools all light linear tests. In terms of expectation, this can be written as

$$\begin{aligned} {{\mathrm{\mathsf {E}}}}_x[L(f_{G,\oplus }(x))]=0, \end{aligned}$$

where we think of \(\{0,1\}\) as \(\left\{ \pm 1\right\} \), and let \(L:\left\{ \pm 1\right\} ^m\rightarrow \left\{ \pm 1\right\} \) be a light linear test. Our key insight is that the case of a general predicate \(P\) can be reduced to the case of linear predicates.

More precisely, let \(\xi \) denote the outcome of the test \(L(f_{G,P}(x))\). Then, by looking at the Fourier expansion of the predicate \(P\), we can write \(\xi \) as a convex combination over the reals of exponentially many summands of the form \(\xi _i=L(f_{G_{i},\oplus }(x))\) where the graphs \(G_{i}\) are subgraphs of \(G\). (The exact structure of \(G_i\) is determined by the Fourier representation of \(P\).) When \(x\) is uniformly chosen, the random variable \(\xi \) is a weighted sum (over the reals) of many dependent random variables \(\xi _i\)’s. We show that if \(G\) has sufficiently high vertex expansion (every not too large set of hyperedges covers many vertices) then the expectation of each summand \(\xi _i\) is zero, and so, by the linearity of expectation, the expectation of \(\xi \) is also zero.

When the predicate is 2-resilient, the size of each hyperedge of \(G_{i}\) is at least 3, and therefore, if every \(3\)-uniform subgraph of \(G\) is a good expander, \(f_{G,P}\) (perfectly) passes all light linear tests. Most graphs \(G\) satisfy this property. We emphasize that the argument crucially relies on the perfect bias of XOR predicates, as there are exponentially many summands. (See Sect. 3.1 for full details.)

2.2 Fooling Heavy Tests

Consider a heavy test which involves \(t\ge k\) outputs. Switching back to zero-one notation, assume that the test outputs the value \(\xi =P(x_{S_1})+ \cdots + P(x_{S_t}) \pmod 2\) where \(x\mathop {\leftarrow }\limits ^{R}\mathcal {U}_n\). Our goal is to show that \(\xi \) is close to a fair coin. For this, it suffices to show that the sum \(\xi \) can be rewritten as the sum (over \(\mathbb {F}_2\)) of \(\ell \) random variables

$$\begin{aligned} \xi =\xi _1+ \cdots + \xi _{\ell } \pmod 2, \end{aligned}$$
(1)

where each random variable \(\xi _i\) is an independent non-constant coin, i.e., \(\Pr [\xi _i=1]\in [2^{-d},1-2^{-d}]\). In this case, the statistical distance between \(\xi \) and a fair coin is exponentially small (in \(\ell \)), and we are done as long as \(\ell \) is large enough.
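The quantitative fact used here is the standard piling-up computation: the XOR of independent bits with \(\Pr [\xi _i=1]=p_i\) has bias \(\frac{1}{2}\prod _i|1-2p_i|\). A quick numeric sketch (parameters illustrative):

```python
from itertools import product

# Piling-up check: the bias of the XOR of independent biased bits equals
# (1/2) * prod |1 - 2 p_i|, hence decays exponentially in the number of bits.
def xor_bias(ps):
    # Exact Pr[XOR of the bits = 1] by enumerating all outcomes.
    p1 = 0.0
    for bits in product([0, 1], repeat=len(ps)):
        pr = 1.0
        for b, p in zip(bits, ps):
            pr *= p if b else 1 - p
        if sum(bits) % 2 == 1:
            p1 += pr
    return abs(p1 - 0.5)

ps = [1 / 32] * 8  # eight coins, each with Pr[1] = 2^{-d} for d = 5
closed_form = 0.5 * (1 - 2 / 32) ** 8
assert abs(xor_bias(ps) - closed_form) < 1e-12
print(closed_form)  # ~0.298
```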

In order to partition \(\xi \), let us look at the hyperedges \(S_1,\ldots ,S_t\) which are involved in the test. As a first attempt, let us collect \(\ell \) distinct “independent” hyperedges that do not share a single common variable. Renaming the edges, we can write \(\xi \) as

$$\begin{aligned} \left( P(x_{T_1})+ \cdots + P(x_{T_{\ell }})\right) + \left( P(x_{S_{\ell +1}})+\cdots + P(x_{S_t})\right) \pmod 2, \end{aligned}$$

where the first \(\ell \) random variables are indeed statistically independent. However, the last \(t-\ell \) hyperedges violate statistical independence as they may be correlated with more than one of the first \(\ell \) hyperedges. This is the case, for example, if \(S_{j}\) has a non-empty intersection with both \(T_i\) and \(T_r\). This problem is fixed by collecting \(\ell \) “strongly-independent” hyperedges \(T_1,\ldots , T_{\ell }\) for which every \(S_j\) intersects at most a single \(T_i\). (Such a big set is likely to exist since \(t\) is sufficiently large.) In this case, for any fixing of the variables outside the \(T_i\)’s, the random variable \(\xi \) can be partitioned into \(\ell \) independent random variables of the form \(\xi _i=P(x_{T_i})+\sum P(x_{S_j})\), where the sum ranges over the \(S_j\)’s that intersect \(T_i\). This property (which is a relaxation of Eq. 1) still suffices to achieve our goal, as long as the \(\xi _i\)’s are non-constant.

To prove the latter, we rely on the fact that \(P\) has algebraic degree at least 2. Specifically, let us assume that each \(S_j\) and each \(T_i\) have no more than a single common input node. (This condition can typically be met at the expense of throwing away a small number of the \(T_i\)’s.) In this case, the random variable \(\xi _i=P(x_{T_i})+\sum P(x_{S_j})\) cannot be constant, as the first summand is a degree-2 polynomial in \(x_{T_i}\) and each of the last summands contains at most a single variable from \(T_i\). Hence, \(\xi _i\) is a non-trivial polynomial whose degree is lower-bounded by 2. This completes the argument. Interestingly, nonlinearity is used only to prove that the \(\xi _i\)’s are non-constant. Indeed, linear predicates fail exactly for large tests for which the \(\xi _i\)’s become fixed due to local cancellations. (See Sect. 3.2 for details.)

2.3 Proving Theorem 1.2

When \(P\) is a degenerate predicate and \(G\) is random, the existence of a linear distinguisher follows by standard arguments. The cases of linear or biased \(P\) are trivial, and the case of bias toward one input was analyzed by Cryan and Miltersen. When \(P\) is biased toward a pair of inputs, say the first two, we think of \(P\) as an “approximation” of the parity \(x_1 \oplus x_2\) of its first two inputs. If \(P\) happened to be the predicate \(x_1 \oplus x_2\), one could find a short “cycle” of output bits that, when XORed together, causes the corresponding input bits to cancel out. In general, as long as the outputs along the cycle do not share any additional input bits, the output of the test will be biased, with bias exponential in the length of the cycle. In Sect. 4, we show that a random \(G\) is likely to have such short cycles, and so the corresponding linear test will be biased.
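A toy instance of the cycle distinguisher (the predicate and cycle are our illustrative choices): with \(P(w)=w_1\oplus w_2\oplus (w_3\wedge w_4)\), which agrees with \(w_1\oplus w_2\) with probability \(3/4\), XORing three outputs whose designated pairs form the cycle \((x_1,x_2),(x_2,x_3),(x_3,x_1)\) cancels the \(x_i\)’s and leaves three independent AND terms, giving bias \(\frac{1}{2}(1/2)^3=1/16\):

```python
from itertools import product

# Cycle distinguisher sketch for a pair-biased predicate: P agrees with
# w1 xor w2 with probability 3/4, and the extra inputs of each output are fresh.
def P(w1, w2, w3, w4):
    return w1 ^ w2 ^ (w3 & w4)

# Outputs along the cycle (x0,x1), (x1,x2), (x2,x0), with fresh inputs x3..x8.
edges = [(0, 1, 3, 4), (1, 2, 5, 6), (2, 0, 7, 8)]
n = 9
count = 0
for x in product([0, 1], repeat=n):
    test = 0
    for S in edges:
        test ^= P(*(x[i] for i in S))
    count += test
bias = abs(count / 2 ** n - 0.5)
print(bias)  # 0.0625 = 1/16: the XOR of the outputs along the cycle is biased
```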

3 Non-degenerate Predicates are Hard

In this section, we prove Theorem 1.1. We follow the outline described in Sect. 2 and handle light linear tests and heavy linear tests separately.

3.1 Fooling Light Tests

In this section, we show that if the predicate \(P\) is \(2\)-resilient (see definition below) and the graph \(G\) is a good expander, the function \(f_{G,P}\) is \(k\)-wise independent, and in particular fools linear tests of weight smaller than \(k\). We will need the following definitions.

Lossless expansion. Let \(G\) be an \((m,n,d)\)-graph. We will say \(G\) is \((k, t)\)-expanding (\(1 \le k \le m, 1 \le t \le d\)) if for every \(\ell \le k\), every collection of \(\ell \) distinct hyperedges of \(G\) covers more than \(t\ell \) distinct vertices. We say \(G\) is \((k, a)\)-linear (\(1 \le a \le d\)) if for every collection of \(k\) distinct hyperedges \(S_1, \dots , S_k\) and every collection of subsets \(T_1 \subseteq S_1, \dots , T_k \subseteq S_k\) where \(\left| T_1 \right| , \dots , \left| T_k \right| \ge a\), the indicator vectors of \(T_1, \dots , T_k\) are linearly independent over \(\mathbb {F}_2\) (as vectors in \(\mathbb {F}_2^n\)).
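Both properties can be checked by brute force on tiny instances; the sketch below (illustrative parameters only, exponential-time checkers) also exercises, on one toy graph, the implication established in Lemma 3.3 that expansion implies linearity:

```python
from itertools import combinations, product

# Brute-force checkers for the two graph properties (tiny instances only).
def is_expanding(G, k, t):
    # every collection of l <= k distinct hyperedges covers more than t*l vertices
    return all(
        len(set().union(*map(set, C))) > t * l
        for l in range(1, k + 1)
        for C in combinations(G, l)
    )

def is_linear(G, k, a):
    # Equivalent "no zero-sum" form: no nonempty collection of <= k edges admits
    # subsets T_i of size >= a whose indicator vectors sum to zero over F_2.
    for l in range(1, k + 1):
        for C in combinations(G, l):
            choices = [
                [T for sz in range(a, len(S) + 1) for T in combinations(S, sz)]
                for S in C
            ]
            for Ts in product(*choices):
                acc = 0
                for T in Ts:
                    for v in T:
                        acc ^= 1 << v  # XOR of indicator vectors
                if acc == 0:
                    return False
    return True

# A toy (4, 6, 3)-graph in which every pair of hyperedges shares at most one vertex.
G = [(0, 1, 2), (2, 3, 4), (4, 5, 0), (1, 3, 5)]
d, a, k = 3, 2, 2
assert is_expanding(G, k, d - a / 2)   # (2, 2)-expanding
assert is_linear(G, k, a)              # hence (2, 2)-linear, as Lemma 3.3 predicts
```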

Fourier coefficients. The Fourier expansion of a predicate \(P:\{0,1\}^d\rightarrow \left\{ \pm 1\right\} \) is given by \(\sum _{T\subseteq [d]}\alpha _T \chi _T\) where \(\chi _T(x_1,\ldots ,x_d)=(-1)^{\sum _{i\in T} x_i}\) is the parity of the coordinates in the set \(T\). The predicate is \(a\)-resilient if \(\alpha _T\) is zero for every \(T\) of size at most \(a\).

The following lemma shows that resiliency combined with \((k,a)\)-linearity leads to \(k\)-wise independence.

Lemma 3.1

If \(P\) is \((a-1)\)-resilient and the \((m,n,d)\)-graph \(G\) is \((k,a)\)-linear, then \(f_{G,P}\) is a \(k\)-wise independent generator, i.e., the \(m\) r.v.’s \((y_1,\ldots ,y_m)=f_{G,P}(\mathcal {U}_n)\) are \(k\)-wise independent.

To prove the lemma, we will employ the following fact which follows from Vazirani’s XOR lemma (cf. [23]).

Fact 3.2

A sequence of \(\pm 1\) random variables \((y_1,\ldots ,y_t)\) is \(k\)-wise independent if for every \(\ell \le k\) and every \(i_1 < i_2 < \cdots < i_{\ell }\) it holds that \({{\mathrm{\mathsf {E}}}}[y_{i_1}\cdot \cdots \cdot y_{i_{\ell }}]=0.\)

We can now prove Lemma 3.1.

Proof of Lemma 3.1

We will use the following notation: For a hyperedge \(S=(i_1,\ldots ,i_d)\) and a set \(T\subseteq [d]\), we define the \(T\)-projection of \(S\), denoted by \(S_{T}\), to be the set \(\left\{ i_j: j\in T\right\} \).

Fix \(\ell \le k\) outputs of \(f_{G,P}\), and let \(S_1,\ldots ,S_{\ell }\) be the corresponding hyperedges. By Fact 3.2, we should show that \({{\mathrm{\mathsf {E}}}}_x [\prod _i P(x_{S_i})]=0\). For every \(x\in \{0,1\}^n\) we have:

$$\begin{aligned} \prod _{i=1}^{\ell } P(x_{S_i})= \prod _{i=1}^{\ell } \sum _{T\subseteq [d], |T|\ge a}\alpha _{T}\chi _T(x_{S_i})=\sum _{\mathbf {T}=(T_1,\ldots ,T_{\ell }),\left| T_i \right| \ge a} \prod _i \alpha _{T_i}\chi _{S_{i,T_i}}(x). \end{aligned}$$

Hence, by the linearity of expectation, it suffices to show that

$$\begin{aligned} {{\mathrm{\mathsf {E}}}}_x\left[ \prod _i \chi _{S_{i,T_i}}(x)\right] =0, \end{aligned}$$

for every \((T_1,\ldots ,T_{\ell })\) where \(T_i\subseteq [d],\left| T_i \right| \ge a\). (Recall that the \(\alpha _{T_i}\)’s are constants.) Observe that \(\prod _i \chi _{S_{i,T_i}}(x)\) is just a parity function which, by \((k,a)\)-linearity, is non-trivial. Since every non-trivial parity function has expectation zero, the claim follows. \(\square \)

Next, we show that \((k,a)\)-linearity is implied by expansion, and a random graph is likely to be expanding.

Lemma 3.3

Let \(d\ge 3\) be a constant. Let \(\Delta \le \sqrt{n}/\log n\) and \(3\le a\le d\).

  1. Every \((m, n, d)\)-graph which is \((k, d - a/2)\)-expanding is also \((k,a)\)-linear.

  2. A random \((\Delta n, n, d)\)-graph is \((\alpha n/\Delta ^2, d - a/2)\)-expanding whp, where \(\alpha \) is a constant that depends on \(a\) and \(d\).Footnote 5

Proof

If \(G\) is not \((k, a)\)-linear, then there exists a nonempty collection of \(\ell \le k\) hyperedges \(S_1, \dots , S_\ell \) of \(G\) and subsets \(T_1 \subseteq S_1, \dots , T_\ell \subseteq S_\ell \), \(\left| T_i \right| \ge a\), such that the indicator vectors of \(T_1, \dots , T_\ell \) sum up to zero over \(\mathbb {F}_2^n\). Therefore, every vertex covered by one of \(T_1, \dots , T_\ell \) must be covered at least twice, and so \(T_1, \dots , T_\ell \) can cover at most \(\frac{1}{2}(\left| T_1 \right| + \dots + \left| T_\ell \right| )\) vertices. On the other hand, the total number of vertices covered by \(S_1 - T_1, \dots , S_\ell - T_\ell \) can be at most \(\left| S_1 - T_1 \right| + \dots + \left| S_\ell - T_\ell \right| \). Therefore, the collection \(S_1, \dots , S_\ell \) covers at most

$$\begin{aligned} \frac{1}{2} (\left| T_1 \right| + \dots + \left| T_\ell \right| ) + (\left| S_1 - T_1 \right| + \dots + \left| S_\ell - T_\ell \right| )&= d\ell - \frac{1}{2} (\left| T_1 \right| + \dots + \left| T_\ell \right| )\\&\le (d - a/2) \ell \end{aligned}$$

vertices of \(G\). Thus, \(G\) is not \((k, d - a/2)\)-expanding.

The second item follows by a standard probabilistic calculation. Set \(t = d - a/2\), so that a collection of \(\ell \) hyperedges violating \((k,t)\)-expansion covers at most \(t\ell \) vertices. For \(\ell \le k\), we upper bound the probability that there exists a non-expanding collection of size \(\ell \), i.e., the probability that there exist a set of hyperedges \(A\) of size \(\ell \) and a set of vertices \(B\) of size \(\ell t\) such that all the vertices covered by \(A\) belong to \(B\), by a union bound:

$$\begin{aligned} \left( {\begin{array}{c}\Delta n\\ \ell \end{array}}\right) \cdot \left( {\begin{array}{c}n\\ t \ell \end{array}}\right) \cdot \Bigl (\frac{\ell t}{n}\Bigr )^{d \ell }&\le \Bigl (\frac{e\Delta n}{\ell } \Bigr )^{\ell } \cdot \Bigl (\frac{en}{t \ell } \Bigr )^{t \ell } \cdot \Bigl (\frac{\ell t}{n}\Bigr )^{d \ell } = \Bigl (\frac{e^{t+1}\Delta n}{\ell }\Bigr )^\ell \cdot \Bigl (\frac{\ell t}{n}\Bigr )^{(a/2)\ell }\\&= \biggl (\frac{e^{t+1}t^{a/2}\Delta }{(n/\ell )^{a/2-1}}\biggr )^{\ell }. \end{aligned}$$

where \(e\) denotes the base of the natural logarithm and the inequality follows by the well-known upper-bound \(\left( {\begin{array}{c}n\\ k\end{array}}\right) \le \left( \frac{e n}{k}\right) ^k\). Using the assumption \(a \ge 3\), we can upper bound the last expression by \(p_\ell = (c_{d,a} \Delta \sqrt{\ell /n})^\ell \), where \(c_{d, a}\) is a constant that depends on \(d\) and \(a\) only. Now observe that

  • For \(\ell = 1, 2, 3\) we have \(p_\ell = O(1/\log n)\),

  • For \(4\le \ell \le 10 \log n\) using \(\Delta \le \sqrt{n}/\log n\) we obtain \(p_\ell \le (c_{d, a}\sqrt{\ell }/\log n)^\ell = O(1/(\log n)^2)\), and

  • For \(10 \log n \le \ell \le \alpha n/\Delta ^2\), we have \(p_\ell \le (c_{d,a} \sqrt{\alpha })^\ell = O(1/n^{10})\) for \(\alpha = 1/(2c_{d,a})^2\).

Summing up the contributions of the \(p_\ell \) to the failure probability, we conclude that the probability that \(G\) is not \((\alpha n/\Delta ^2, d - a/2)\)-expanding is \(o(1)\). \(\square \)
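The displayed chain of estimates can be sanity-checked numerically for one concrete (arbitrary) parameter setting with \(t=d-a/2\):

```python
from math import comb, e

# Numeric sanity check of the union-bound estimate for one parameter setting
# with t = d - a/2, l*t an integer, and l*t <= n. The values are illustrative.
n, d, a = 1000, 5, 3
t = d - a / 2                # 3.5
Delta, l = 4, 10             # Delta <= sqrt(n)/log(n) holds for this n
lt = int(t * l)              # 35

lhs = comb(Delta * n, l) * comb(n, lt) * (lt / n) ** (d * l)
rhs = (e ** (t + 1) * t ** (a / 2) * Delta / (n / l) ** (a / 2 - 1)) ** l
assert lhs <= rhs
```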

By combining the lemmas, we obtain the following corollary.

Corollary 3.4

If \(P\) is 2-resilient and \(m=\Delta n\) for constant \(\Delta \), then whp over the choice of an \((m,n,d)\)-graph \(G\), the function \(f_{G,P}\) is \(k\)-wise independent for \(k=\Omega (n)\). If \(\Delta =n^{\varepsilon }\), the above holds with \(k=\Omega (n^{1-2\varepsilon } )\).

By taking \(\varepsilon <1/4\), we obtain that 2-resiliency suffices for \(\omega (\sqrt{n})\)-wise independence with high probability.

3.2 Fooling Heavy Tests

In this section, we show that if the predicate \(P\) is nonlinear and the graph \(G\) has large sets of “independent” hyperedges, the function \(f_{G,P}\) fools linear tests of weight larger than \(k\). Formally, we will need the following notion of independence.

\((k,\ell ,b)\)-independence. Let \(\mathcal{S}\) be a collection of \(k\) distinct hyperedges. A subset \(\mathcal{T}\subseteq \mathcal{S}\) of \(\ell \) distinct hyperedges is an \((\ell ,b)\)-independent set of \(\mathcal{S}\) if the following two properties hold: (1) Every pair of distinct hyperedges in \(\mathcal{T}\) is at distance at least \(2\), namely for every pair \(T_i\ne T_j \in \mathcal{T}\) and every \(S\in \mathcal{S}\),

$$\begin{aligned} T_i\cap S=\emptyset \text { or } T_j\cap S=\emptyset ; \end{aligned}$$

and (2) For every \(T_i\in \mathcal{T}\) and \(S\ne T_i\) in \(\mathcal{S}\) we have

$$\begin{aligned} |T_i\cap S|< b. \end{aligned}$$

A graph is \((k,\ell ,b)\)-independent if every set of hyperedges of size larger than \(k\) has an \((\ell ,b)\)-independent set.

Our key lemma shows that good independence and large algebraic degree guarantee resistance against heavy linear tests.

Lemma 3.5

If an \((m,n,d)\)-graph \(G\) is \((k,\ell ,b)\)-independent and \(P\) has an algebraic degree of at least \(b\), then every linear test of size at least \(k\) has bias of at most \(\frac{1}{2}e^{-2\ell /2^d}\).

Proof

Fix some test \(\mathcal{S}=(S_1,\ldots ,S_k)\) of size \(k\), and let \(\mathcal{T}=(T_{1},\ldots ,T_{\ell })\) be an \((\ell ,b)\)-independent set of \(\mathcal{S}\). Fix an arbitrary assignment \(\sigma \) for all the input variables which do not participate in any of the \(T_i\)’s and choose the other variables uniformly at random. In this case, we can partition the output of the test \(y\) into \(\ell \) summands over \(\ell \) disjoint blocks of variables, namely

$$\begin{aligned} y=\sum _{i\in [k]} P(x_{S_i})=\sum _{i\in [\ell ]} z_i(x_{T_i}), \end{aligned}$$

where

$$\begin{aligned} z_i(x_{T_i})=P(x_{T_i})+\sum _{S\ne T_i:\, S\cap T_i\ne \emptyset } P(x_{S\cap T_i},\sigma _{S\setminus T_i}), \end{aligned}$$

and the sums are over \(\mathbb {F}_2\). We need two observations: (1) The random variables \(z_i\)’s are statistically independent (as each of them depends on a disjoint block of inputs); and (2) the r.v. \(z_i\) is non-constant and, in fact, it takes each of the two possible values with probability at least \(2^{-d}\). To prove the latter fact, it suffices to show that \(z_i(x)\) is a non-constant polynomial (over \(\mathbb {F}_2\)) of degree at most \(d\). Indeed, recall that \(z_i\) is the sum of the polynomial \(P(x_{T_i})\) whose degree is in \([b,d]\), and polynomials of the form \(P(x_{S\cap T_i},\sigma _{S\setminus T_i})\) whose degree is smaller than \(b\) (as \(|S\cap T_i|<b\)). Therefore, the degree of \(z_i\) is in \([1,d]\).

To conclude the proof, we note that the parity of \(\ell \) independent coins, each with expectation in \((\delta ,1-\delta )\), has bias of at most \(\frac{1}{2}(1-2\delta )^{\ell }\). (See, e.g., [28]). \(\square \)

We want to show that a random graph is likely to be \((k,\ell ,2)\)-independent.

Lemma 3.6

For every positive \(\varepsilon \) and \(\delta \), a random \((n^{1+\varepsilon },n,d)\)-graph is, whp, \((n^{2\varepsilon +\delta },n^{\delta /2},2)\)-independent.

Proof

We will need the following claim. Call a hyperedge \(S\) \(b\)-intersecting if there exists another hyperedge \(S'\) in the graph for which \(|S'\cap S|\ge b\). We first bound the number of \(b\)-intersecting hyperedges.

Claim 3.7

Let \(b\) be a constant. Then, in a random \((m=n^{1+\varepsilon },n,d)\)-graph, whp, the number of \(b\)-intersecting hyperedges is at most \(n^{2(1+\varepsilon )-b}\log n\).

Hence, whp, at most \(O(n^{2\varepsilon }\log n)\) of the hyperedges are 2-intersecting, and for \(\varepsilon <1/4\) there are \(o(\sqrt{n})\) such hyperedges.

Proof (of Claim 3.7)

Let \(X\) be the random variable that counts the number of \(b\)-intersecting hyperedges. First, we bound the expectation of \(X\) by \(m^2 d^{2b}/n^b=d^{2b}\cdot n^{2(1+\varepsilon )-b}\). To prove this, it suffices to bound the expected number of pairs \(S_i,S_j\) which \(b\)-intersect. Each such pair \(b\)-intersects with probability at most \(d^{2b}/n^b\), and so, by linearity of expectation, the expected number of intersecting pairs is at most \(m^2 d^{2b}/n^b\). Now, by applying Markov’s inequality, we have that \(\Pr [X>\frac{\log n}{d^{2b}} {{\mathrm{\mathsf {E}}}}[X]]<d^{2b}/\log n=o(1)\), and the claim follows. (A stronger concentration can be obtained via a martingale argument.) \(\square \)

We can now prove Lemma 3.6. Assume, without loss of generality, that \(\varepsilon <1\). First observe that, whp, all the input nodes in \(G\) have degree at most \(2dn^{\varepsilon }\); indeed, by a multiplicative Chernoff bound, the probability that a single node has larger degree is exponentially small in \(n^{\varepsilon }\). We condition on this event and on the event that there are no more than \(r=n^{2\varepsilon }\log n\) \(2\)-intersecting hyperedges. Fix a set of \(k=n^{2\varepsilon +\delta }\) hyperedges. We extract an \((\ell ,2)\)-independent set by throwing away the \(2\)-intersecting hyperedges, and then by iteratively inserting a hyperedge \(T\) into the independent set and removing all the hyperedges \(S\) that share a node with \(T\), as well as the hyperedges that share a node with such an \(S\). At the beginning, we remove at most \(r\) hyperedges, and in each iteration, we remove at most \((2d^2n^{\varepsilon })^2\) hyperedges; hence, there are at least \(\ell \ge \frac{k-r}{4d^4n^{2\varepsilon }}>n^{\delta /2}\) hyperedges in the independent set. \(\square \)
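The extraction procedure in the proof amounts to a simple greedy algorithm, sketched below; the hyperedges in the usage example are illustrative and not drawn from a random graph:

```python
def extract_independent(hyperedges):
    """Greedy extraction of an (l,2)-independent subset, as in the proof sketch.

    Step 1: discard every 2-intersecting hyperedge (one sharing >= 2 nodes
    with another hyperedge).  Step 2: repeatedly keep a hyperedge T and
    discard all hyperedges within distance 2 of T (those sharing a node
    with T, and those sharing a node with such a neighbor)."""
    edges = [frozenset(S) for S in hyperedges]
    survivors = [S for i, S in enumerate(edges)
                 if not any(len(S & T) >= 2 for j, T in enumerate(edges) if i != j)]
    pool, independent = list(survivors), []
    while pool:
        T = pool.pop(0)
        independent.append(T)
        neighbors = [S for S in pool if S & T]
        pool = [S for S in pool
                if not (S & T) and not any(S & N for N in neighbors)]
    return independent

# Toy instance: {0,1,2} and {0,1,13} 2-intersect and are discarded up front;
# {0,3,9} is removed when {3,4,5} is inserted, since they share node 3.
picked = extract_independent([{0, 1, 2}, {3, 4, 5}, {6, 7, 8},
                              {0, 3, 9}, {10, 11, 12}, {0, 1, 13}])
assert [sorted(S) for S in picked] == [[3, 4, 5], [6, 7, 8], [10, 11, 12]]
```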

Combining the lemmas together we get:

Corollary 3.8

Fix some positive \(\varepsilon \) and \(\delta \). If \(P\) has algebraic degree at least \(2\) and \(m=n^{1+\varepsilon }\), then, whp over the choice of a random \((m,n,d)\)-graph, the function \(f_{G,P}\) has sub-exponential bias (i.e., at most \(\exp (-\Omega (n^{\delta }))\)) against linear tests of size at least \(n^{2\varepsilon +2\delta }\).

By combining Corollaries 3.4 and 3.8, we obtain Theorem 1.1.

4 Linear Tests Break Degenerate Predicates

In this section, we prove Theorem 1.2; that is, we show that the assumptions that \(P\) is nonlinear and 2-resilient are necessary for \(P\) to be a hard predicate. Clearly, the assumption that \(P\) is nonlinear is necessary even when \(m = n + 1\).

When \(m \ge Kn\) for a sufficiently large constant \(K\) (depending on \(d\)), it follows from work of Cryan and Miltersen [17] that if \(P\) is not 1-resilient, then for any \(f:\{\pm 1\}^n \rightarrow \{\pm 1\}^m\), the output of \(f\) is distinguishable from uniform with constant advantage by some linear test. When \(P\) is 1-resilient but not 2-resilient, Mossel, Shpilka, and Trevisan show that \(f\) is distinguishable from uniform by a polynomial-time algorithm, but not by one that implements a linear test.

Here, we show that if \(P\) is not 2-resilient, then the output of \(f_{G,P}\) is distinguishable by linear tests with non-negligible advantage with high probability over the choice of \(G\).

Claim 4.1

Let \(K>4\) and \(d\in \mathbb {N}\) be constants. Assume that the predicate \(P:\{0,1\}^d\rightarrow \{0,1\}\) is unbiased and 1-resilient but not 2-resilient, i.e., \(\left| {{\mathrm{\mathsf {E}}}}[P(z)z_1z_2] \right| = \alpha > 0\). Then for every \(\ell = o(\log n)\), with probability \(1 - (2^{-\Omega (\ell )} + d\ell /n)\) over the choice of a random \((Kn,n,d)\)-graph \(G\), there exists a linear test that distinguishes the output of \(f_{G,P}\) from random with advantage \(\alpha ^\ell \).

Proof

Let \(H\) be the directed graph with vertices \(\{1, \dots , n\}\) where every hyperedge \((i_1, i_2, \dots , i_d)\) in \(G\) induces the edge \((i_1, i_2)\) in \(H\).

Let \(\ell \) be the length of the shortest directed cycle in \(H\), and without loss of generality assume that this cycle consists of the inputs \(1, 2, \dots , \ell \) in that order. Let \(z_i\) denote the output that involves inputs \(i\) and \(i + 1\), for \(i\) ranging from \(1\) to \(\ell \) (where \(i + 1\) is taken modulo \(\ell \)), and let \(S_i\) be the corresponding hyperedge. With probability at least \(1 - d\ell /n\), each input \(i \in \{1, \dots , \ell \}\) does not participate in any hyperedge besides \(S_{i-1}\) and \(S_{i}\), and all other inputs participate in at most one of the hyperedges \(S_1, \dots , S_\ell \).

We now calculate the bias of the linear test that computes \(z_1 \oplus \dots \oplus z_\ell \). For simplicity, we will assume that \(d = 3\); larger values of \(d\) can be handled analogously but the notation is more cumbersome. We will denote the entries in \(S_i\) by \(i\), \(i+1\) and \(i'\). Then, the Fourier expansion of \(z_i(x_{S_i})\) has the form

$$\begin{aligned} z_i(x_{S_i}) = \alpha x_ix_{i+1} + \beta x_ix_{i'} + \gamma x_{i+1}x_{i'} + \delta x_ix_{i+1}x_{i'}. \end{aligned}$$

The expectation \({{\mathrm{\mathsf {E}}}}[z_1(x_{S_1})\dots z_\ell (x_{S_\ell })]\) can be written as a sum of \(4^\ell \) products of monomials drawn from the above expansions. The only product that does not vanish is the one containing all the \(\alpha \)-terms, namely

$$\begin{aligned} {{\mathrm{\mathsf {E}}}}\Bigl [\prod \nolimits _{i=1}^\ell \alpha x_ix_{i+1}\Bigr ] = \alpha ^\ell . \end{aligned}$$

All the other products of monomials contain at least one unique term of the form \(x_{i'}\), and this causes the expectation to vanish.

It remains to argue that with high probability \(\ell \) is not too large. We show that with probability \(1 - O((4/K)^{\ell })\), \(H\) has a directed cycle of length \(\ell \), as long as \(\ell < \log _{2K}(n/4)\). Let \(X\) denote the number of directed cycles of length \(\ell \) in \(H\). The number of potential directed cycles of length \(\ell \) in \(H\) is \(n(n-1)\cdots (n-\ell + 1) \ge (n - \ell )^{\ell }\). Each of these occurs in \(H\) with probability at least

$$\begin{aligned} (Kn)(Kn - 1)\dots (Kn - \ell + 1) \Bigl (\frac{1}{n(n-1)}\Bigr )^\ell \Bigl (1 - \frac{1}{n(n-1)}\Bigr )^{Kn - \ell } \ge \Bigl (\frac{Kn - \ell }{n^2}\Bigr )^{\ell }. \end{aligned}$$

Therefore, \({{\mathrm{\mathsf {E}}}}[X] \ge (K/4)^{\ell }\). The variance can be upper bounded as follows. The number of pairs of cycles of length \(\ell \) that intersect in \(i\) edges is at most \(\left( {\begin{array}{c}\ell \\ i\end{array}}\right) n^{2\ell - i - 1}\), and the covariance of the indicators for these cycles is at most \((K/n)^{2\ell - i}\). Adding all the covariances up as \(i\) ranges from \(1\) to \(\ell \), it follows that

$$\begin{aligned} {{\mathrm{\mathsf {Var}}}}[X] \le {{\mathrm{\mathsf {E}}}}[X] + \sum _{i=1}^\ell \left( {\begin{array}{c}\ell \\ i\end{array}}\right) n^{2\ell - i - 1} \Bigl (\frac{K}{n}\Bigr )^{2\ell - i} \le {{\mathrm{\mathsf {E}}}}[X] + \frac{2^{\ell }K^{2\ell }}{n}. \end{aligned}$$

By Chebyshev’s inequality,

$$\begin{aligned} \Pr [X = 0] \le \frac{{{\mathrm{\mathsf {Var}}}}[X]}{{{\mathrm{\mathsf {E}}}}[X]^2} < \frac{2}{{{\mathrm{\mathsf {E}}}}[X]} \end{aligned}$$

as long as \(\ell < \log _{2K}(n/4)\). \(\square \)
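The advantage \(\alpha ^\ell \) computed in the proof can be checked by exhaustive enumeration on a toy instance. The predicate below, \(P(x_1,x_2,x_3,x_4)=x_1\oplus x_2\oplus (x_3\wedge x_4)\), is our own illustrative choice (not taken from the text): it is unbiased and 1-resilient but not 2-resilient, with \(\alpha = 1/2\). Placing it on a cycle of \(\ell \) outputs, with a fresh pair of auxiliary inputs per output, the test \(z_1 \oplus \dots \oplus z_\ell \) has bias exactly \(\alpha ^\ell \):

```python
from itertools import product

def cycle_test_bias(l):
    """Exact bias of the linear test z_1 xor ... xor z_l, where output
    z_i = x_i xor x_{(i+1) mod l} xor (a_i and b_i) with fresh inputs
    a_i, b_i per output (a predicate of locality d = 4)."""
    n_bits = 3 * l  # l cycle inputs x_i plus l fresh pairs (a_i, b_i)
    even = odd = 0
    for bits in product((0, 1), repeat=n_bits):
        x, rest = bits[:l], bits[l:]
        t = 0
        for i in range(l):
            a, b = rest[2 * i], rest[2 * i + 1]
            t ^= x[i] ^ x[(i + 1) % l] ^ (a & b)
        if t == 0:
            even += 1
        else:
            odd += 1
    return abs(even - odd) / (even + odd)

# The cycle terms x_i xor x_{i+1} telescope to zero, leaving the xor of the
# l independent AND gates, each with bias alpha = 1/2; the test bias is alpha^l.
assert cycle_test_bias(3) == 0.5 ** 3
```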

5 Implications of Small-Bias

For local functions with large stretch, small bias seems like a good approximation for cryptographic pseudorandomness. Specifically, we are not aware of any local function \(f_{G,P}\) with linear stretch that fools linear distinguishers but can be distinguished by some polynomial-time adversary. One may conjecture that if \(f_{G,P}\) fools linear adversaries for most graphs, then it also fools polynomial-time adversaries. In other words, local functions are too simple to “separate” the two notions. We attempt to support this view by showing that the small-bias property, by itself, leads to robustness against other classes of attacks.

First, we observe that, for local functions, \(k\)-wise independence follows directly from \(\varepsilon \)-bias. (This is not the case for non-local functions.)

Lemma 5.1

Let \(f:\{0,1\}^n\rightarrow \{0,1\}^{m}\) be a \(d\)-local function which is \(2^{-kd}\)-biased. Then, it is also \(k\)-wise independent.

Proof

Assume toward a contradiction that \(f\) is not \(k\)-wise independent. Then, there exists a set of \(k\) outputs \(T\) and a linear distinguisher \(L\) for which

$$\begin{aligned} \varepsilon =\left| \Pr _{y\mathop {\leftarrow }\limits ^{R}f(\mathcal {U}_n)}[L(y_{T})=1]-\Pr [L(\mathcal {U}_k)=1] \right| >0, \end{aligned}$$

where \(y_T\) denotes the restriction of the string \(y\) to the indices in \(T\). Since \(f\) is \(d\)-local, \(y_{T}\) is determined by at most \(kd\) input bits, and therefore \(\varepsilon \) is a positive integer multiple of \(2^{-kd}\); in particular, \(\varepsilon \ge 2^{-kd}\). \(\square \)
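The granularity argument can be illustrated on a toy 2-local function (our own hypothetical example, not one from the text): the restriction of \(f\) to \(k\) outputs is determined by at most \(kd\) input bits, so every event probability is an integer multiple of \(2^{-kd}\), and hence any nonzero bias is at least \(2^{-kd}\).

```python
from itertools import product

D = 2  # locality: each output below reads at most D input bits

def f(x):
    # Hypothetical 2-local function f: {0,1}^4 -> {0,1}^3.
    return (x[0] ^ x[1], x[1] & x[2], x[2] | x[3])

def restriction_probs(T):
    """Exact distribution of y_T = f(x)_T for uniform x in {0,1}^4."""
    counts = {}
    for x in product((0, 1), repeat=4):
        y = f(x)
        key = tuple(y[i] for i in T)
        counts[key] = counts.get(key, 0) + 1
    return {key: c / 16 for key, c in counts.items()}

# k outputs of a D-local function read at most k*D input bits, so every
# probability is an integer multiple of 2^-(k*D); any nonzero deviation
# from uniform is therefore at least 2^-(k*D).
k = 2
probs = restriction_probs(T=(0, 1))
assert all((p * 2 ** (k * D)).is_integer() for p in probs.values())
```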

Note that the proof of our main theorem establishes \(k\)-wise independence as an intermediate step (Sect. 3.1). However, the above lemma is stronger in the sense that it holds for every fixed graph and every output length, including ones that are not covered by the main theorem.

By plugging in known results about \(k\)-wise independent distributions, it immediately follows that if a local function is sufficiently small-biased, then it is pseudorandom against \(\mathbf {AC^0}\) circuits [15], linear threshold functions over the reals [18], and degree-2 threshold functions over the reals [19].

Moreover, attacks on local functions, which are actively studied in the context of algorithms for constraint-satisfaction problems, appear to be based mainly on “local” heuristics (DPLL, message-passing algorithms, random-walk-based algorithms) or on linearization [9]. Hence, it appears that in the context of local functions, the small-bias property already covers all “standard” attacks. We support this intuition by showing that small-biased local functions (on a random-looking input–output graph) are not merely \(k\)-wise independent, but have a stronger property: Even after reading an arbitrary set of \(t\) outputs, the posterior distribution on every set of \(\ell \) inputs, while not uniform, still has \(h\) bits of min-entropy. We refer to this property as \((t,\ell ,h)\)-robustness.

Lemma 5.2

Suppose that \(P\) is a predicate for which \(f_{G,P}:\{0,1\}^n\rightarrow \{0,1\}^{m}\) is \(k\)-wise independent, whp over the choice of a random \((m,n,d)\) graph \(G\). Then, whp over the choice of a random \((m'=\Omega (m),n,d)\) graph \(H\), the function \(f_{H,P}:\{0,1\}^n\rightarrow \{0,1\}^{m'}\) is \((t=\Omega (k),\ell ,h)\)-robust, where \(h=\min \left( \ell , \Omega (m\cdot (\ell /n)^d), \Omega (k)\right) \).

(See Sect. 6 for more details and proof.) Robustness holds with polynomial parameters \((t=n^{\alpha },\ell =n^{\beta },h=n^{\gamma })\) when \(m=n^{1+\varepsilon }\), and with linear parameters when \(m=O(n)\). The notion of robustness is the main technical tool used by Cook et al. [16] to prove that myopic backtracking algorithms cannot invert \(f_{G,P}\) in polynomial time (for the case \(m=n\)). By Lemma 5.2, robustness follows directly “for free” from small-bias, and thus, we can derive a similar lower bound for larger output lengths (but for a smaller class of predicates). (See Sect. 6 for details.)