Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time

We derandomize Valiant’s (J ACM 62, Article 13, 2015) subquadratic-time algorithm for finding outlier correlations in binary data. This demonstrates that it is possible to perform a deterministic subquadratic-time similarity join of high dimensionality. Our derandomized algorithm gives deterministic subquadratic scaling essentially for the same parameter range as Valiant’s randomized algorithm, but the precise constants we save over quadratic scaling are more modest. Our main technical tool for derandomization is an explicit family of correlation amplifiers built via a family of zigzag-product expanders by Reingold et al. (Ann Math 155(1):157–187, 2002). We say that a function f : {−1,1}^d → {−1,1}^D is a correlation amplifier with threshold 0 ≤ τ ≤ 1, error γ ≥ 1, and strength p an even positive integer if for all pairs of vectors x, y ∈ {−1,1}^d it holds that (i) |⟨x, y⟩| < τd implies |⟨f(x), f(y)⟩| ≤ (τγ)^p D; and (ii) |⟨x, y⟩| ≥ τd implies (⟨x, y⟩∕(γd))^p D ≤ ⟨f(x), f(y)⟩ ≤ (γ⟨x, y⟩∕d)^p D.

A naïve way to solve Problem 1 is to compute the n² inner products ⟨x, y⟩ for (x, y) ∈ X × Y and filter out everything but the outliers. Our interest is in algorithms that scale subquadratically in n when both d and q are bounded from above by slowly growing functions of n. That is, we seek running times of the form O(n^{2−ε}) for a constant ε > 0. Furthermore, we seek to do this without a priori knowledge of q.
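The quadratic baseline can be sketched as follows (an illustrative Python sketch; the function name and the toy data are ours, not from the paper):

```python
import itertools

def naive_outliers(X, Y, rho):
    """Quadratic-time baseline: return all pairs (x, y) in X x Y whose
    inner product is at least rho * d in absolute value."""
    d = len(X[0])
    outliers = []
    for x, y in itertools.product(X, Y):
        ip = sum(xu * yu for xu, yu in zip(x, y))  # <x, y>
        if abs(ip) >= rho * d:
            outliers.append((x, y))
    return outliers

# One strongly anticorrelated pair stands out against the background.
X = [(1, 1, 1, 1), (1, -1, 1, -1)]
Y = [(1, 1, 1, -1), (-1, 1, -1, 1)]
print(len(naive_outliers(X, Y, 0.9)))  # one outlier pair found
```

The point of the paper is precisely to beat the n² pair enumeration performed here.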
Running times of the form O(n^{2−cρ}) for a constant c > 0 are immediately obtainable using techniques such as the seminal locality-sensitive hashing of Indyk and Motwani [20] and its variants (see Sect. 1.5). However, such algorithms converge to quadratic running time in n unless ρ is bounded from below by a positive constant. Our interest is in algorithms that avoid such a "curse of weak outliers" and run in subquadratic time essentially independently of the magnitude of ρ, provided that ρ is sufficiently separated from τ. Such ability to identify weak outliers from large amounts of data is useful, among others, in machine learning from noisy data. Our task can also be seen as a high-dimensional inner-product similarity join on a large number of weakly similar attributes.
One strategy to circumvent the curse of weak outliers is to pursue the following intuition: (1) partition the input vectors into buckets of at most s vectors each, (2) aggregate each bucket into a single vector by taking the vector sum, and (3) compute the inner products between the ⌈n∕s⌉ × ⌈n∕s⌉ pairs of aggregate vectors. With sufficient separation between ρ and τ, at most q of these inner products between aggregates will be large, and every outlier pair is discoverable among the at most s × s input pairs that correspond to each large inner product of aggregates. Furthermore, a strategy of this form is oblivious to q until we actually start searching inside the buckets, which enables adjusting ρ and τ based on the number of large aggregate inner products.
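The three-step intuition above can be sketched as follows (an illustrative Python sketch with hypothetical names; the flagging threshold would in practice be calibrated via amplification, as described below):

```python
def bucketed_candidates(X, Y, s, threshold):
    """Aggregate buckets of at most s vectors by vector sums, then flag
    bucket pairs whose aggregate inner product is large in absolute
    value; each flagged pair would then be searched by brute force."""
    def aggregate(V):
        buckets = [V[i:i + s] for i in range(0, len(V), s)]
        return [[sum(col) for col in zip(*b)] for b in buckets]
    AX, AY = aggregate(X), aggregate(Y)
    flagged = []
    for i, ax in enumerate(AX):
        for j, ay in enumerate(AY):
            ip = sum(a * b for a, b in zip(ax, ay))
            if abs(ip) >= threshold:
                flagged.append((i, j))  # search X-bucket i vs Y-bucket j
    return flagged

X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
print(bucketed_candidates(X, X, 2, 4))  # all four bucket pairs are flagged
```

With only ⌈n∕s⌉² aggregate inner products to compute, the bulk of the work moves from the n² input pairs to the much smaller aggregate matrix.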

Randomized Amplification
Such bucketing strategies have been studied before with the help of randomization. In 2012, Valiant [36] presented a breakthrough algorithm that, before bucketing, replaces each input vector with a randomly subsampled version of its pth Kronecker power. Because of the tensor-power identity ⟨x^{⊗p}, y^{⊗p}⟩ = ⟨x, y⟩^p, the ratio between outlier and background correlations gets amplified to essentially its pth power, assuming that the sample is large enough so that sufficient concentration bounds hold with high probability. This amplification makes the outliers stand out from the background even after bucketing, which enables detection in subquadratic time using fast matrix multiplication.
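A minimal sketch of such random subsampling, assuming a coordinate sample shared across all input vectors (this is an illustration, not Valiant's exact sampling scheme):

```python
import random

def sampled_power(x, p, D, seed=0):
    """Evaluate D uniformly sampled coordinates of the p-th Kronecker
    power x^{tensor p}; coordinate (i1, ..., ip) equals x[i1]*...*x[ip].
    The same seed (hence the same coordinate sample) must be used for
    every input vector so that inner products remain comparable."""
    rng = random.Random(seed)
    idx = [tuple(rng.randrange(len(x)) for _ in range(p)) for _ in range(D)]
    out = []
    for ind in idx:
        v = 1
        for i in ind:
            v *= x[i]
        out.append(v)
    return out
```

For a pair x, y the expected inner product of the samples is (⟨x, y⟩∕d)^p · D, so outlier ratios are raised to roughly the pth power while the output dimension stays at D rather than d^p.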
A subset of the present authors [23] further improved on Valiant's algorithm by a modified sampling scheme that simultaneously amplifies and aggregates the input by further use of fast matrix multiplication. With this improvement, Problem 1 can be solved in subquadratic time if the logarithmic ratio log_τ ρ = (log ρ)∕(log τ) is bounded from above by a constant less than 1. This improved algorithm, too, relies on randomization.

Explicit Amplification
In this paper we seek deterministic subquadratic algorithms. As with the earlier randomized algorithms, we seek to map the d-dimensional input vectors to a higher dimension D so that inner products are sufficiently amplified in the process. Towards this end, we are interested in explicit functions f : {−1,1}^d → {−1,1}^D that approximate the tensor-power identity

(1) ⟨x^{⊗p}, y^{⊗p}⟩ = ⟨x, y⟩^p.

Definition 1 (Correlation amplifier)
A function f : {−1,1}^d → {−1,1}^D is a correlation amplifier with threshold 0 ≤ τ ≤ 1, error γ ≥ 1, and strength p an even positive integer if for all pairs of vectors x, y ∈ {−1,1}^d it holds that

(2) if |⟨x, y⟩| < τd, then |⟨f(x), f(y)⟩| ≤ (τγ)^p D; and

(3) if |⟨x, y⟩| ≥ τd, then (⟨x, y⟩∕(γd))^p D ≤ ⟨f(x), f(y)⟩ ≤ (γ⟨x, y⟩∕d)^p D.

Remark A correlation amplifier f guarantees by (2) that correlations below τ in absolute value stay bounded, and by (3) that correlations at least τ in absolute value become positive and are governed by the two-sided approximation with multiplicative error γ ≥ 1. In particular, (3) implies that correlations at least τ cannot mask outliers under bucketing, because all such correlations get a positive sign under amplification.
It is immediate that correlation amplifiers exist. For example, take f(x) = x^{⊗p} with p even to obtain a correlation amplifier with D = d^p, τ = 0, and γ = 1 by (1). For our present purposes, however, we seek correlation amplifiers with D substantially smaller than d^p. Furthermore, we seek constructions that are explicit in the strong form that there exists a deterministic algorithm that computes any individual coordinate of f(x) in time poly(log D, p) by accessing poly(p) coordinates of a given x. In what follows, explicitness always refers to this strong form.
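The trivial amplifier and the identity (1) can be checked directly (an illustrative sketch; names are ours):

```python
from itertools import product

def kron_power(x, p):
    """Full p-th Kronecker power of x: one coordinate per index tuple."""
    out = []
    for ind in product(range(len(x)), repeat=p):
        v = 1
        for i in ind:
            v *= x[i]
        out.append(v)
    return out

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

x = [1, 1, -1, 1]
y = [1, 1, 1, 1]
# identity (1): <x^{tensor p}, y^{tensor p}> = <x, y>^p, here with p = 2
assert inner(kron_power(x, 2), kron_power(y, 2)) == inner(x, y) ** 2
# even strength makes amplified inner products nonnegative
z = [-1, -1, -1, -1]
assert inner(kron_power(x, 2), kron_power(z, 2)) == 4
```

Note the output dimension here is the full D = d^p = 16; the whole difficulty addressed by Theorem 1 is achieving comparable amplification with D much smaller than d^p.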

Our Results
The main result of this paper is that sufficiently powerful explicit amplifiers exist to find outlier correlations in deterministic subquadratic time.

Theorem 1 (Explicit amplifier family) There exists an explicit correlation amplifier f : {−1,1}^d → {−1,1}^D with threshold τ, error γ, and strength p whenever the output dimension D satisfies the lower bound (4).
As a corollary we obtain a deterministic algorithm for finding outlier correlations in subquadratic time using bucketing and fast matrix multiplication. Let us write α for the limiting exponent of rectangular integer matrix multiplication. That is, for all constants η > 0 there exists an algorithm that multiplies an m × ⌊m^α⌋ integer matrix with an ⌊m^α⌋ × m integer matrix in O(m^{2+η}) arithmetic operations. In particular, it is known that 0.3 < α ≤ 1 [25].
Theorem 2 (Deterministic subquadratic algorithm for outlier correlations) For any constants 0 < ε < 1, 0 < τ_max < 1, 0 < α′ < α, and C > 60, there exists a deterministic algorithm that solves a given instance of Problem 1 in time (5), assuming that the parameters n, d, ρ, τ satisfy the following three constraints.

Footnote 2: In comparison, a weaker form of explicitness could require, for example, that there exists a deterministic algorithm that computes the entire vector f(x) from a given x in time D · poly(log D, p).
Remark In contrast to the LSH-based techniques discussed previously, the running time (5) remains subquadratic regardless of the magnitude of ρ, provided that the conditions of Theorem 2 are satisfied, particularly the separation of ρ and τ in constraint 3. The constants in (4) and (5) have not been optimized beyond our desired goal of obtaining deterministic subquadratic running time when d and q are bounded by slowly growing functions of n. In particular, (5) gives substantially worse subquadratic running times compared with the existing randomized strategies [23, 36]. The algorithm in Theorem 2 needs no a priori knowledge of q and is oblivious to q until it starts searching inside the buckets.

Overview and Discussion of Techniques
A straightforward application of the probabilistic method (Lemma 12) establishes that low-dimensional correlation amplifiers can be obtained by subsampling uniformly at random the dimensions of the tensor power x^{⊗p}, as long as the sample size D is large enough. Thus, in essence our Theorem 1 amounts to derandomizing such a subsampling strategy by presenting an explicit sample that is, up to the error bounds (2) and (3), indistinguishable from the "perfect" amplifier x ↦ x^{⊗p} under taking of inner products.
The construction underlying Theorem 1 amounts to an ℓ-fold composition of explicit squaring amplifiers (p = 2) with increasingly strong control on the error (γ) and the interval of amplification ([τ, 1]) at each successive composition. Towards this end, we require a flexible explicit construction of squaring amplifiers with strong control on the error and the interval. We obtain such a construction from an explicit family of expander graphs (Lemma 2) obtainable from the explicit zig-zag product constructions of Reingold et al. [34]. In particular, the key to controlling the error and the interval is that the expander family gives Ramanujan-like concentration λ∕Δ ≤ 16Δ^{−1∕4} of the normalized second eigenvalue λ∕Δ by increasing the degree Δ. In essence, since we are working with {−1,1}-valued vectors, by increasing the degree we can use the Expander Mixing Lemma (Lemma 1) and the Ramanujan-like concentration to control (Lemma 4) how well the restriction x_G to the edges of an expander graph G approximates the full tensor square x^{⊗2} under taking of inner products.
Our construction has been motivated by the paradigm of gradually increasing independence [9, 14, 15, 21] in the design of pseudorandom generators. Indeed, we obtain the final amplifier gradually by ℓ successive squarings, taking care that the degree Δ_i of the expander that we apply in each squaring i = 0, 1, …, ℓ − 1 increases with a similar squaring schedule given by (11) and (15), to simultaneously control the error and the interval, and to bound the output dimension roughly by the square of the degree of the last expander in the sequence. Here the term "gradual" is not particularly descriptive, since growth under successive squaring amounts to doubly exponential growth in the number of squarings. Yet such growth can be seen as gradual and controlled in the following sense: we obtain strong amplification compared with the final output dimension precisely because the first ℓ − 1 squarings "come for free", as Δ_0 Δ_1 ⋯ Δ_{ℓ−2} is (up to low-order multiplicative terms) no more than Δ_{ℓ−1}, essentially because we are taking the sum of powers of 2 in the exponent. The analogy with pseudorandom generators can in fact be pushed somewhat further. Namely, a correlation amplifier can be roughly seen as a pseudorandom generator that by (3) seeks to fool a "truncated family of uniform combinatorial rectangles", with further control requested by (2) below the truncation threshold τ. To see the rough analogy, let z ∈ {−1,1}^d be the Hadamard product of the vectors x, y ∈ {−1,1}^d and observe that (3) seeks to approximate (with multiplicative error) the expectation of a uniform random entry in the d^p-length Kronecker power z^{⊗p} by instead taking the expectation over an explicit D-dimensional sample given by f.
The Kronecker power z^{⊗p} is a uniform special case (with z = z_1 = z_2 = ⋯ = z_p) of a "combinatorial rectangle" formed by a Kronecker product z_1 ⊗ z_2 ⊗ ⋯ ⊗ z_p, and truncation means that we only seek approximation in cases where |∑_{u=1}^d z(u)| ≥ τd. Accordingly, we want constructions that take this truncation into account; that is, we do not seek to fool all combinatorial rectangles, and accordingly we want stronger control on the dimension D (that is, on the "seed length" log D).
For a review of the state of the art in pseudorandom generators we refer to Gopalan et al. [14] and Kothari and Meka [24]. Our goal to obtain a small output dimension D roughly corresponds to optimizing the seed length of a pseudorandom generator.
While our explicit construction (4) does not reach the exact output dimension obtainable by Lemma 12, it should be observed that in our parameter range of interest (with γ > 1 a constant and 0 < τ ≤ τ_max for a constant 0 < τ_max < 1), both (4) and (32) are of the form D ≥ d τ^{−Θ(p)}; only the constants hidden by the asymptotic notation differ between the explicit and nonconstructive bounds. Moreover, using results of Alon [4] we show a lower bound (Lemma 17) on the output dimension D of any correlation amplifier: namely, that D ≥ (1∕5)(1∕(τγ²))^p when p is in the range governed by (τγ²)^p ≤ 1∕100 and p ≤ d. Thus, viewed as a pseudorandom generator with "seed length" log D, Theorem 1 essentially does not admit improvement except possibly at the multiplicative constants.

Related Work and Applications
Problem 1 is a basic problem in data analysis and machine learning admitting many extensions, restrictions, and variants. A large body of work exists studying approximate near neighbour search via techniques such as locality-sensitive hashing (e.g. [5-7, 13, 20, 29, 30]), with recent work aimed at derandomization (see Pagh [31] and Pham and Pagh [33]) and resource tradeoffs (see Kapralov [22]) in particular. However, these techniques enable subquadratic scaling in n only when ρ is bounded from below by a positive constant, whereas the algorithm in Theorem 2 remains subquadratic even in the case of weak outliers when ρ tends to zero with increasing n, as long as ρ and τ are separated. Ahle et al. [1] show that subquadratic scaling in n is not possible for log_τ ρ = 1 − o(1∕√(log n)) unless both the Orthogonal Vectors Conjecture and the Strong Exponential Time Hypothesis [19] fail.
In the context of databases and information retrieval, inner product is one of the widely used metrics for similarity join [1]. Apart from low-dimensional approaches such as trees that partition the space one dimension at a time, existing scalable methods for high-dimensional data employ randomization, leading to the possibility of false negatives (missed pairs) and false positives (spurious pairs). Recently, Pagh [31] and Pham and Pagh [33] eliminated false negatives completely; however, their methods still have randomized running times. We present here a completely deterministic solution.
In small dimensions, Alman and Williams [3] present a randomized algorithm that finds exact Hamming-near neighbours in a batch-query setting analogous to Problem 1 in subquadratic time in n when the dimension is constrained to d = O(log n) . Recently, Chan and Williams [10] show how to derandomize related algorithm designs; also, Alman et al. [2] derandomize the probabilistic polynomials for symmetric Boolean functions used in [3], achieving deterministic subquadratic batch queries in small dimensions.
One special case of Problem 1 is the problem of learning a weight 2 parity function in the presence of noise, or the light bulb problem.

Problem 2 (Light bulb problem, Valiant [37]) Suppose we are given as input a parameter 0 < ρ < 1 and a set of n vectors in {−1,1}^d such that one planted pair of vectors has inner product at least ρd in absolute value, and the other n − 2 vectors are chosen independently and uniformly at random. Our task is to find the planted pair among the n vectors.
Remark From e.g. the Hoeffding bound (7) it follows that there exists a constant c such that when d ≥ cρ^{−2} log n, the planted pair is with high probability (as n increases) the unique pair in the input with the maximum absolute correlation.
For a problem whose instances are drawn from a random ensemble, we say that an algorithm solves almost all instances of the problem if the probability of drawing an instance where the algorithm fails tends to zero as n increases.
Paturi et al. [32], Dubiner [11], and May and Ozerov [27] present randomized algorithms that can be used to solve almost all instances of the light bulb problem in subquadratic time if we assume that ρ is bounded from below by a positive constant; if ρ tends to zero, these algorithms converge to quadratic running time in n.
Valiant [36] showed that a randomized algorithm can identify the planted correlation in subquadratic time on almost all inputs even when ρ tends to zero as n increases. As a corollary of Theorem 2, we can derandomize Valiant's design and still retain subquadratic running time (but with a worse constant) for almost all inputs, except for extremely weak planted correlations with ρ ≤ n^{−Ω(1)}, which our amplifier is in general unable to amplify with sufficiently low output dimension to enable an overall subquadratic running time.
Corollary 1 (Deterministic subquadratic algorithm for the light bulb problem) For any constants 0 < α′ < α, C > 60, 0 < ρ_max < 1, and γ > 1, there exists a deterministic algorithm that solves almost all instances of Problem 2 in time, assuming the parameters n, d, ρ satisfy the two constraints.

Corollary 1 extends to parity functions of larger (constant) weight in the presence of noise (cf. [16, 23, 36]). This generalized version of the problem is as follows.

Problem 3 (Learning parity with noise) Let S ⊆ [v] with |S| = k be the support of a parity function and let 0 < η < 1 be the noise level. Our task is to determine the set S by drawing independent random examples (x, y) such that x ∈ {−1,1}^v is chosen uniformly at random, and the label y ∈ {−1,1} agrees with the parity ∏_{i∈S} x(i) with probability 1 − η, independently for each example.

With no information on k, the trivial solution is to enumerate all 2^v subsets of [v] to locate the support S. Blum et al. [8] provide a non-trivial solution which runs in time and sample complexity poly(|1 − 2η|^{−2^a}, 2^b) for any positive integers a, b with ab ≥ v; this is 2^{O(v∕log v)} when η ≠ 1∕2 is a constant independent of v. If we assert that k is a constant independent of v, the trivial complexity drops from exponential to v^k, and non-trivial speed-ups seek to lower the coefficient of k in the exponent. Randomized solutions for constant k include Valiant's breakthrough algorithm [36] and our subsequent randomized improvement [23].

Our present contribution is a deterministic algorithm for learning constant-weight parity functions with noise. Our interest is in the case where the noise level η approaches 1/2, and accordingly we assume that |1 − 2η| is bounded from above by a constant less than 1. We say that a deterministic algorithm solves almost all instances of Problem 3 if the probability of drawing an instance on which the algorithm fails tends to zero as v increases. (Footnote 5: Observe that from an information-theoretic perspective it is a positive-but-negligible-probability event that the drawn examples do not uniquely identify S.)

Corollary 2 (Deterministic algorithm for learning parity with noise) For all constants 0 < α′ < α, C > 60, γ > 1, and 0 < θ < 1, there exists a constant k_0 and a deterministic algorithm that, for all constants k ≥ k_0, draws d examples and finds the support of almost all instances of Problem 3, assuming the parameters v, d, η satisfy the stated constraints. Algorithms for learning parity functions enable extensions to further classes of Boolean functions such as sparse juntas and DNFs (cf. [12, 28, 36]).
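An illustrative sketch of drawing examples for Problem 3 (the exact label model below, which flips the parity with probability η, is our reading of the standard setting; names are ours):

```python
import random

def draw_example(S, v, eta, rng):
    """One noisy parity example: x is uniform in {-1,1}^v and the label y
    equals the parity prod_{i in S} x(i), flipped with probability eta."""
    x = [rng.choice((-1, 1)) for _ in range(v)]
    parity = 1
    for i in S:
        parity *= x[i]
    y = -parity if rng.random() < eta else parity
    return x, y

rng = random.Random(42)
x, y = draw_example({0, 3}, 5, 0.1, rng)
```

Note the weight-2 case with S a single planted pair of coordinates is essentially the light bulb problem: the labels correlate with the product of the two support coordinates.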

Preliminaries
All vectors in this paper are integer-valued. For a vector x ∈ ℤ^d we write x(u) for the entry at position u ∈ [d], and ⟨x, y⟩ = ∑_{u=1}^d x(u) y(u) for the inner product of x and y. We write log for the logarithm with base 2 and ln for the logarithm with base exp(1).
In our proofs, we need the following bound due to Hoeffding [17, Theorem 2], which provides an exponentially small upper bound on the deviation of a sum of bounded independent random variables from its expectation.

Lemma (Hoeffding) Let Z = Z_1 + Z_2 + ⋯ + Z_m be a sum of independent random variables such that a_j ≤ Z_j ≤ b_j for all j = 1, 2, …, m. Then, for all c > 0, the following holds:

(7) Pr(|Z − E[Z]| ≥ c) ≤ 2 exp(−2c²∕∑_{j=1}^m (b_j − a_j)²).

Explicit Amplifiers by Approximate Squaring
This section proves Theorem 1. We start with preliminaries on expanders, show an approximate squaring identity using expander mixing, and then rely on repeated approximate squaring for our main construction. The proof is completed by some routine preprocessing.

Preliminaries on Expansion and Mixing
We work with undirected graphs, possibly with self-loops and multiple edges. A graph G is Δ-regular if every vertex is incident to exactly Δ edges, with each self-loop (if present) counting as one edge. Suppose that G is Δ-regular with vertex set V, and let L be a set of Δ labels such that the edge-ends incident to each vertex have been labeled with unique labels from L. The rotation map Rot_G : V × L → V × L is defined by setting Rot_G(u, i) = (v, j) whenever the edge-end labeled i at vertex u and the edge-end labeled j at vertex v belong to the same edge. For an excellent survey on expansion and expander graphs, we refer to Hoory et al. [18].

Lemma 1 (Expander mixing lemma, [18, Lemma 2.5]) Let G be a Δ-regular graph with second eigenvalue λ. For all S, T ⊆ V(G) we have

|e(S, T) − Δ|S||T|∕|V(G)|| ≤ λ√(|S||T|),

where e(S, T) denotes the number of edge-ends (u, v) with u ∈ S and v ∈ T.
We work with the following family of graphs obtained from the zig-zag product of Reingold et al. [34]: for all positive integers b and t, Lemma 2 yields an explicit (2^{16bt}, 2^{4b}, 16 · 2^{3b})-graph. In particular, Lemma 2 gives us λ∕Δ ≤ 16Δ^{−1∕4}, which will enable us to control relative inner products by increasing Δ.

Main Construction
The main objective of this section is to prove the following lemma, which we will then augment to Theorem 1 by routine preprocessing of the input dimension.

Lemma 3 (Repeated approximate squaring) For all τ_0 > 0, γ_0 > 1, and positive integers k and ℓ, there exists an explicit correlation amplifier with parameters (2^k, 2^K, 2^ℓ, τ_0, γ_0) for all sufficiently large output dimensions 2^K.
Approximate squaring via expanders For a vector x ∈ {−1,1}^D, let us write x^{⊗2} ∈ {−1,1}^{D²} for the Kronecker product of x with itself. Our construction for correlation amplifiers will rely on approximating the squaring identity ⟨x^{⊗2}, y^{⊗2}⟩ = ⟨x, y⟩² for vectors in {−1,1}^D. In more precise terms, let G be a (D, Δ, λ)-graph, and define the vector x_G ∈ {−1,1}^{DΔ} by setting x_G(u, i) = x(u)x(v) for each vertex u ∈ [D] and label i ∈ [Δ], where (v, j) = Rot_G(u, i). In particular, x_G has exactly DΔ coordinates. The amplifier function We now construct an amplifier function f that uses ℓ approximate squarings, ℓ ≥ 1, with the graphs drawn from the graph family in Lemma 2. Accordingly, we assume that all vectors have lengths that are positive integer powers of 2.
The input x = x_0 ∈ {−1,1}^{d_0} to the amplifier has dimension d_0 = 2^k for a positive integer k. For i = 0, 1, …, ℓ − 1, suppose we have the vector x_i ∈ {−1,1}^{d_i}. Let b_i be a positive integer whose value will be fixed later, and let t_i be the unique positive integer with 2^{16b_i(t_i−1)} < d_i ≤ 2^{16b_i t_i}; set D_i = 2^{16b_i t_i}. Note in particular that d_i divides D_i, since d_i is a power of 2. Let G_i be a (2^{16b_i t_i}, 2^{4b_i}, 16 · 2^{3b_i})-graph from Lemma 2. Take D_i∕d_i copies of x_i to obtain the vector x̄_i ∈ {−1,1}^{D_i}, and set x_{i+1} = (x̄_i)_{G_i}. Since the graph family in Lemma 2 admits rotation maps that can be computed in time poly(b, t), we observe that f is explicit. Indeed, from the construction it is immediate that to compute any single coordinate of f(x) it suffices to (i) perform in total 2^{ℓ−1−i} evaluations of the rotation map of the graph G_i for each i = 0, 1, …, ℓ − 1, and (ii) access at most 2^ℓ coordinates of x. Since b_i t_i = O(log d_ℓ) for all i = 0, 1, …, ℓ − 1, we can compute any coordinate of f(x) in time poly(log d_ℓ, 2^ℓ), accessing at most 2^ℓ coordinates of x.
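A single approximate-squaring step can be sketched as follows; for illustration we use the complete graph with self-loops as a stand-in for G (so the squaring identity is exact), whereas a sparse (D, Δ, λ)-graph from Lemma 2 would only approximate it, per Lemma 4:

```python
def restrict_to_graph(x, edge_ends):
    """Approximate tensor square x_G: keep only the coordinates (u, v) of
    x^{tensor 2} indexed by the edge-ends of a regular graph G, giving
    D*Delta coordinates instead of D^2."""
    return [x[u] * x[v] for (u, v) in edge_ends]

def inner(a, b):
    return sum(p * q for p, q in zip(a, b))

# Stand-in graph: complete graph with self-loops on D = 4 vertices
# (Delta = D), for which x_G coincides with the full tensor square, so
# <x_G, y_G> = <x, y>^2 holds exactly.
edge_ends = [(u, v) for u in range(4) for v in range(4)]
x, y = [1, -1, 1, 1], [1, 1, 1, 1]
assert inner(restrict_to_graph(x, edge_ends),
             restrict_to_graph(y, edge_ends)) == inner(x, y) ** 2
```

The point of using an expander instead of the complete graph is that Δ can be far smaller than D while the Expander Mixing Lemma keeps ⟨x_G, y_G⟩ close to ⟨x, y⟩²∕D times the output dimension.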
Parameterization and analysis Fix τ_0 > 0 and γ_0 > 1. To parameterize the amplifier (that is, it remains to fix the values b_i), let us track a pair of vectors as it proceeds through the ℓ approximate squarings for i = 0, 1, …, ℓ − 1.
We start by observing that copying preserves relative inner products; that is, (8) holds for any pair of vectors. An easy manipulation of Lemma 4 using the parameters in Lemma 2 gives us additive control over an approximate squaring via (9). For all inner products that are in absolute value above a threshold, we want to turn this additive control into multiplicative control via (10). Let us insist that this multiplicative control holds whenever |α_i| ≥ τ_i for the threshold parameter τ_i defined for all i = 0, 1, …, ℓ − 1 by (11). Enforcing (10) via (9) at the threshold, let us assume that (12) holds. The next lemma confirms that assuming (12) gives two-sided control of inner products which is retained to the next approximate squaring. The lemma after that shows that small inner products remain small.
Proof From (9) and (12), we have (13). Observe that 1 − γ_0^{−1} ≤ γ_0 − 1. Thus, from (13) we conclude the lower bound, and in the converse direction, from (13) and (11) we conclude the upper bound. ◻

Proof From (9) and (12), we have (14), from which the claim follows. ◻

Let us now make sure that (12) holds. Solving for Δ_i in (12), we obtain (15). In particular, we can make sure that (15), and hence (12), holds by choosing each Δ_i large enough. We proceed by induction on i. The base case i = 0 is immediate. For i ≥ 1, there are two cases to consider. First suppose that |α_i| < τ_i; then Lemma 6 applies. Since γ_0 > 1, from Lemmas 7 and 8 it now follows that f meets the required amplification constraints (2) and (3) with p = 2^ℓ, τ = τ_0, and γ = γ_0.
Let us now complete the parameterization and derive an upper bound for d_ℓ. For each i = 0, 1, …, ℓ − 1, take b_i to be the smallest nonnegative integer such that b_i ≥ 10 and Δ_i = 2^{4b_i} satisfies (15).
Recall that d_0 = 2^k. From (15) and (11) we obtain an upper bound on d_ℓ. Repeatedly taking two copies of the output as necessary, for all 2^K with 2^K ≥ d_ℓ we obtain a correlation amplifier with parameters (2^k, 2^K, 2^ℓ, τ_0, γ_0). This completes the proof of Lemma 3. ◻

Copy-and-Truncate Preprocessing of the Input Dimension
We still want to remove the assumption from Lemma 3 that the input dimension is a positive integer power of 2. The following copy-and-truncate preprocessing will be sufficient towards this end. Let x ∈ {−1,1}^d and let k be a positive integer. Define the vector x̂ ∈ {−1,1}^{2^k} by concatenating ⌈2^k∕d⌉ copies of x one after another, and truncating the result to the first 2^k coordinates.
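A minimal sketch of copy-and-truncate (names are ours):

```python
def copy_and_truncate(x, k):
    """Concatenate ceil(2^k / d) copies of x, then keep the first 2^k
    coordinates, so the output dimension is a power of 2."""
    d = len(x)
    copies = -(-(2 ** k) // d)  # ceiling division
    return (x * copies)[:2 ** k]

print(copy_and_truncate([1, -1, 1], 3))  # [1, -1, 1, 1, -1, 1, 1, -1]
```

The truncation discards fewer than d coordinates, which is what makes the relative inner products before and after preprocessing close to each other (Lemma 9).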
Let us study how the map x ↦ x̂ operates on a pair of vectors x, y ∈ {−1,1}^d. For notational compactness, let us work with the relative inner products α, α̂ defined by ⟨x, y⟩ = αd and ⟨x̂, ŷ⟩ = α̂2^k.

Lemma 9 For any
Proof Let r and t be the unique integers such that 2^k + r = td with 0 ≤ r < d. Since we are leaving out r of the coordinates of the t concatenated copies, we obtain the claimed bounds. ◻

Completing the Proof of Theorem 1
Let d, K, τ, γ, p be parameters meeting the constraints in Theorem 1, in particular the constraint (4). To construct a required amplifier f, we preprocess each input vector x with copy-and-truncate, obtaining a vector x̂ of length 2^k. We then apply an amplifier g : {−1,1}^{2^k} → {−1,1}^{2^K} given by Lemma 3. In symbols, we define f(x) = g(x̂). It is immediate from Lemmas 3 and 9 that the resulting composition is explicit.
We begin by relating the given parameters of Theorem 1 to those of Lemma 3. Take γ_0 = γ^{1∕2} and τ_0 = τγ^{−1}, and select the minimal value of k so that the constraint in Lemma 9 is satisfied; this constrains 2^k from above. Substituting this upper bound into the bound of Lemma 3, we get a lower bound (16) for 2^K. Observe that an integer 2^K satisfying (4) also satisfies (16). We have not attempted to optimise our construction, and prefer the statement of Theorem 1 as it is reasonably clean and sufficient to prove Theorem 2.
Let us study how the map x ↦ f(x) operates on a pair of vectors x, y ∈ {−1,1}^d. For notational compactness, again we work with the relative inner products α, α̂, β defined by ⟨x, y⟩ = αd, ⟨x̂, ŷ⟩ = α̂2^k, and ⟨f(x), f(y)⟩ = β2^K. Observe that in the notation of the proof of Lemma 3, we have α̂ = α_0 and β = α_ℓ.

Proof First we show that |α̂| ≤ τ_0γ_0, dividing into cases as in Lemma 9. If |α| < τ_0, then |α̂| < τ_0γ_0. To complete the proof, we condition on |α̂|. If |α̂| ≤ τ_0, then Lemma 8 applies; otherwise τ_0 ≤ |α̂| < τ_0γ_0, and Lemma 7 applies.

Proof It will be convenient to split the analysis according to whether α is positive or negative. Suppose first that α ≥ τ.

A Deterministic Algorithm for Outlier Correlations
This section proves Theorem 2. We start by describing the algorithm, then parameterize it and establish its correctness, and finally proceed to analyze the running time.

The Algorithm
Fix the constants ε, τ_max, α′, C as in Theorem 2. Based on these constants, fix the constants 0 < δ < 1 and γ > 1. (We fix the precise values of δ and γ later during the analysis of the algorithm, and stress that δ, γ do not depend on the given input.) Suppose we are given as input the parameters 0 < τ < ρ < 1 and X, Y ⊆ {−1,1}^d with |X| = |Y| = n so that the requirements in Theorem 2 hold. We work with a correlation amplifier f with parameters (d, D, p, τ, γ). (We fix the precise values of the parameters p and D later during the analysis of the algorithm so that f originates from Theorem 1.)

The algorithm proceeds as follows. First, apply f to each vector in X and Y to obtain the sets X_f and Y_f. Let s = ⌊n^δ⌋. Second, partition the n vectors in both X_f and Y_f into ⌈n∕s⌉ buckets of size at most s each, and take the vector sum of the vectors in each bucket to obtain the sets X̃_f and Ỹ_f. Third, using fast rectangular matrix multiplication on X̃_f and Ỹ_f, compute the matrix Z whose entries are the inner products ⟨x̃, ỹ⟩ for all x̃ ∈ X̃_f and all ỹ ∈ Ỹ_f. Fourth, iterate over the entries of Z, and whenever the detection inequality (18) holds, brute-force search for outliers among the at most s² inner products in the corresponding pair of buckets. Output any outliers found.
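The four steps can be sketched end-to-end as follows (an illustrative Python sketch with an identity amplifier and hypothetical toy parameters; the real algorithm uses the explicit amplifier of Theorem 1, and step 3 uses fast rectangular matrix multiplication instead of the plain double loop below):

```python
def find_outliers(X, Y, rho, f, s, detect):
    """Amplify with f, aggregate buckets of size s, flag bucket pairs whose
    aggregate inner product reaches the detection threshold `detect`, and
    brute-force search the flagged pairs for outliers."""
    d = len(X[0])

    def buckets(V):
        return [[sum(c) for c in zip(*V[i:i + s])] for i in range(0, len(V), s)]

    BX = buckets([f(x) for x in X])  # steps 1 and 2
    BY = buckets([f(y) for y in Y])
    found = set()
    for i, bx in enumerate(BX):      # step 3: aggregate inner products
        for j, by in enumerate(BY):
            if abs(sum(a * b for a, b in zip(bx, by))) >= detect:
                # step 4: brute-force search inside the flagged bucket pair
                for a in range(i * s, min((i + 1) * s, len(X))):
                    for b in range(j * s, min((j + 1) * s, len(Y))):
                        if abs(sum(u * v for u, v in zip(X[a], Y[b]))) >= rho * d:
                            found.add((a, b))
    return found

X = [(1, 1, 1, 1), (1, -1, -1, 1), (-1, 1, -1, -1), (1, 1, -1, 1)]
Y = [(-1, 1, -1, 1), (1, -1, 1, -1), (1, 1, 1, 1), (-1, -1, 1, 1)]
print(find_outliers(X, Y, 1.0, lambda v: v, 2, 1))  # {(0, 2)}
```

Only the flagged bucket pairs are searched, which is what keeps the work subquadratic when the detection inequality is calibrated so that pure-background bucket pairs never trigger it.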

Parameterization and Correctness
Let us now parameterize the algorithm and establish its correctness. Since γ > 1 is a constant and assuming that p is large enough, by Theorem 1 we can select D to be the integer power of 2 satisfying (19). Recall that we write α for the exponent of rectangular matrix multiplication. To apply fast rectangular matrix multiplication in the third step of the algorithm, we want the common dimension D to be at most (n∕s)^{α′}; recalling that d ≤ n and n^δ − 1 < s, the bound (19) suffices for this. Let us assume for the time being that (1 − δ)α′ − ε > 0. (We will justify this assumption later when we choose a value for δ.) Let p be the unique positive-integer power of 2 such that (20) holds. We will later, when fixing δ and γ, make sure that the right-hand side in (20) is at least 1, so that p exists and is positive.
Let us now consider a single entry ⟨x̃, ỹ⟩ in Z, and analyze how the corresponding (at most s²) inner products ⟨x, y⟩ between the two buckets of input vectors relate to the detection inequality (18). We make two claims.

Claim 1 (background case). If all of the inner products have |⟨x, y⟩| ≤ τd, then (18) does not hold, so the algorithm will not search inside this pair of buckets. This claim will be used to control the running time. The claim follows directly from (2) and (3), since there are at most s² ≤ n^{2δ} inner products, each having |⟨f(x), f(y)⟩| ≤ (τγ)^p D.
Claim 2 (outlier case). If at least one of the inner products has |⟨x, y⟩| ≥ ρd, then (18) holds, so the algorithm searches inside this pair of buckets. This guarantees that the outliers are detected.
Note that in the third case, namely if some inner products have |⟨x, y⟩| > τd but none has |⟨x, y⟩| ≥ ρd, we make no claim on whether (18) holds or not. The algorithm is not required to search inside such pairs of buckets (since there are no outliers there), but may do so without hindering our overall running time bound.
We proceed to parameterize the algorithm so that Claim 2 holds. In the outlier case, by (2) and (3), there is at least one inner product with ⟨f(x), f(y)⟩ ≥ (γ^{−1}ρ)^p D, and the remaining at most n^{2δ} inner products each have ⟨f(x), f(y)⟩ ≥ −(γτ)^p D. Thus in the outlier case (21) holds. For Claim 2 we need the detection inequality (18) to hold whenever (21) holds. Towards this end, it suffices to require (22). Rearranging and solving for p yields a sufficient lower bound on p. From (20) and (22) we thus see that it suffices to have the resulting inequality, or equivalently, (23). Let us derive a lower bound for the left-hand side of (23). Fix the constant γ > 1 so that log γ = −(log ρ_max)/100,000. By our assumptions we have ρ ≤ ρ_max and the quantity 1 − (log ρ)/(log τ) is bounded from below, so we obtain a lower bound for the left-hand side of (23). Thus, (23) holds for all large enough n under this requirement. Since ε < 1, we have that (23) holds for our choice of δ. We also observe that (1 − δ)α − ε > 0, or equivalently, δ < (α − ε)/α, holds for our choice of δ.
Having now fixed δ and γ, we observe that in terms of assumption 2 of the statement of Theorem 2, the displayed quantities equal c₁ and c₂. Thus the assumption ρ ≥ c₁n^{−c₂} guarantees that the right-hand side of (20) is at least 1, which was required for the existence of p. This completes the parameterization of the algorithm.

Running Time
Let us now analyze the running time of the algorithm. The first and second steps run in time Õ(nD), since p = O(log n) by (20) and f originates from Theorem 1 and hence is explicit. From (19) and n^δ − 1 < s, we have nD ≤ 4n^{1+(1−δ)α} ≤ 4n^{2−ε}. Since (19) holds, the third step of the algorithm runs in time O((n/s)^{2+η}) for any constant η > 0 that we are free to choose. Since n/s ≤ 2n^{1−δ} for all large enough n, we can choose η > 0 so that (2 + η)(1 − δ) ≤ 2 − ε. Thus, the first, second, and third steps together run in time O(n^{2−ε}). The fourth step runs in time O(n^{2−ε} + qs²d). Indeed, observe from Claim 1 in § 4.2 that the detection inequality (18) holds for at most q entries of Z. We have qs²d ≤ qn^{2δ+1}, which completes the running time analysis and the proof of Theorem 2. ◻

Applications
This section proves Corollaries 1 and 2.

The Light Bulb Problem
A useful variant of Problem 1 asks for all outlier pairs of distinct vectors drawn from a single set S ⊆ {−1, 1}^d rather than from two sets X, Y. We observe that the single-set variant reduces to ⌈log |S|⌉ instances of the two-set variant by numbering the vectors in S with binary numbers from 0 to |S| − 1 and splitting S into two sets X_i, Y_i based on the value of the i-th bit, for each i = 0, 1, …, ⌈log |S|⌉ − 1.
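The reduction just described admits a direct sketch. This is illustrative Python; the function name `split_instances` and the use of list positions as the vector numbering are our own choices.

```python
import math

def split_instances(S):
    """Reduce the single-set problem on S to ceil(log2 |S|) two-set
    instances: instance i splits S by the i-th bit of each vector's
    index, so every pair of distinct indices lands on opposite sides
    of at least one instance."""
    m = max(1, math.ceil(math.log2(len(S))))
    instances = []
    for i in range(m):
        X = [v for idx, v in enumerate(S) if not (idx >> i) & 1]
        Y = [v for idx, v in enumerate(S) if (idx >> i) & 1]
        instances.append((X, Y))
    return instances
```

Since any two distinct indices differ in at least one bit, every outlier pair of distinct vectors is separated by some instance, so running the two-set algorithm on each instance finds all outlier pairs.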

Proof of Corollary 1
We reduce to (the single-set version of) Problem 1 and apply Theorem 2. Towards this end, take the constants ε and ρ_max in Theorem 2 as dictated by the constants in the statement of Corollary 1. Suppose we are given an instance of Problem 2 whose parameters n, d, ρ satisfy the constraints. Set τ = ρ^C. We observe that the constraints in Theorem 2 are satisfied since (1) d ≤ n holds by assumption, (2) ρ ≤ ρ_max holds by assumption, (3) the constants c₁ and c₂ here match those in Theorem 2, and the assumed constraint implies c₁n^{−c₂} ≤ ρ, and (4) log τ/log ρ = C holds by the choice of τ. We claim that q = 1 in almost all instances of Problem 2 whose parameters satisfy the constraints in Corollary 1. Indeed, by the Hoeffding bound (7) and the union bound, the probability that some pair other than the planted pair has inner product exceeding τd in absolute value tends to zero, so q = 1 with high probability as n increases. The claimed running time follows by substituting the chosen constants and q = 1 into (5). ◻

Learning Parities with Noise
We now generalize the result to parity functions of larger constant weight and prove Corollary 2.

Proof of Corollary 2
Fix the constants 0 < ε < 1, C > 60, γ > 1, and 0 < δ < 1. We will fix the value of the constant k₀ later. Let k ≥ k₀ be a constant. The algorithm first draws d examples from a given instance of Problem 3, then transforms these into two collections of vectors that we feed to the algorithm of Theorem 2, and then proceeds to mimic the proof of Corollary 1. Let us first set up some notation.
Suppose we are now given as input an instance of Problem 3 with noise level η satisfying |1 − 2η| < 1. Furthermore, we assume that η is part of the input. (If this is not the case, at the cost of increasing the time complexity, we can search for η using a geometric progression with limit 1/2.) With the objective of eventually applying Theorem 2, set ρ and τ as in (24) and (25). In particular, we have τ < ρ since 0 < |1 − 2η| < 1 and C > 1. Let d be the least positive integer that satisfies (26), where 0 < ζ < 1/2 is a constant whose value we will fix later. Draw d examples from the given instance and form from them the two collections X and Y described below. In particular, we can assume that |X|, |Y| ≤ n for n = ⌊v^{k(1/2+ζ)}⌋.
The set X consists of all vectors obtained by taking, for each subset of variables of the appropriate size, the coordinatewise product of the drawn examples over that subset; the set Y is formed analogously, with each vector additionally multiplied coordinatewise by the labels of the examples.
Let us now study the distribution of inner products between vectors in X and Y. We write Bin±1(d, η) for a random variable that is the sum of d independent random variables, each of which takes the value −1 with probability η and the value 1 otherwise. Observe that the expectation of Bin±1(d, η) is (1 − 2η)d.
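The distribution Bin±1(d, η) is easy to simulate; the helper below is our own illustration, not part of the paper's algorithm, and serves only to make the mean (1 − 2η)d concrete.

```python
import random

def bin_pm1(d, eta, rng):
    """One draw of Bin_{+-1}(d, eta): the sum of d independent
    variables, each -1 with probability eta and +1 otherwise."""
    return sum(-1 if rng.random() < eta else 1 for _ in range(d))
```

For example, averaging many draws with d = 100 and η = 0.3 gives a value close to (1 − 2 · 0.3) · 100 = 40.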
Thus, (b) holds with high probability as v increases. It remains to verify the constraints on the parameters n, d, ρ, τ in Theorem 2. Suppressing the constants, our choice of d in (26) is bounded as stated. For Theorem 2 to apply, this must be bounded from above by n = v^{Θ(k)}, which holds if |1 − 2η| ≥ v^{−Θ(k)}. This holds by assumption for all sufficiently large k. Select k₀ so that this constraint holds and k₀ ≥ ⌈1/(2ζ)⌉. We can choose ρ_max and ε = 1 − 1/C in Theorem 2 accordingly. We then have ρ = |1 − 2η|² ≤ ρ_max < 1 by assumption, as required. Since n ≥ v^{k/2}, the remaining constraint also holds by assumption. The claimed running time follows by substituting (24) and (25) into (26) and approximating upwards. ◻

Bounds for Correlation Amplifiers

This section shows that nontrivial correlation amplifiers exist and establishes a lower bound on the output dimension D of any correlation amplifier. The former is done by a routine application of the Hoeffding bound, and the latter by applying results of Alon [4].

Low-Dimensional Amplifiers Exist
By combining the Hoeffding bound with the union bound, we observe that low-dimensional amplifiers exist.

Lemma 12 (Existence)
There exists a correlation amplifier f : {−1, 1}^d → {−1, 1}^D with threshold τ, error γ, and strength p whenever 0 < τ ≤ 1, γ > 1, and d, p, D are positive integers satisfying the displayed lower bound on D.
We also obtain a lower bound from (33) when |⟨x, y⟩| ≥ τd. In fact, (33) implies conditions (2) and (3) in Definition 1, so if the function f satisfies (33) for all x, y ∈ {−1, 1}^d, then f is a correlation amplifier. We use Theorem 3 to bound the probability that (33) fails, and take a union bound over the domain of f to establish a non-constructive existence result for sufficiently large D.
Define the random variable Z_f = ⟨f(x), f(y)⟩. Since f(x) is a restriction of x^{⊗p} onto D entries chosen uniformly at random, we have E[Z_f] = (⟨x, y⟩/d)^p D. Summing over the Z_{f,i} in (7), the probability that (33) fails to hold is bounded from above by the resulting Hoeffding tail. Taking a union bound over all x, y ∈ {−1, 1}^d, there exists a correlation amplifier with parameters (d, D, p, τ, γ) whenever the union bound is less than one. Solving for D, simplifying the resulting expression, and approximating ln 16 from above by 3 completes the proof. ◻
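The random construction in the proof can be simulated directly. In the sketch below (our own code, not the paper's), each of the D output coordinates is a product of p input coordinates drawn uniformly at random, which is the same as restricting x^{⊗p} to D random entries; the empirical correlation of the images then concentrates around (⟨x, y⟩/d)^p.

```python
import random

def random_tensor_restriction(d, D, p, rng):
    """Sample the index tuples defining f: each of the D output
    coordinates is a product of p input coordinates chosen u.a.r."""
    return [[rng.randrange(d) for _ in range(p)] for _ in range(D)]

def apply_amplifier(x, coords):
    """Evaluate f(x) coordinatewise for the sampled index tuples."""
    out = []
    for idx in coords:
        v = 1
        for i in idx:
            v *= x[i]
        out.append(v)
    return out
```

With d = 4, p = 2, x = (1, 1, 1, 1), and y = (1, 1, 1, −1), we have ⟨x, y⟩/d = 1/2, so ⟨f(x), f(y)⟩/D concentrates near 1/4 as D grows.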

Lower Bound on Output Dimension
We next show a lower bound on the output dimension D of any correlation amplifier when the other parameters d, p, and τ are given. The proof is based on taking a collection of N vectors x_i ∈ {−1, 1}^d, with all pairwise inner products below the background threshold τd, and then bounding the number of their images f(x_i) ∈ {−1, 1}^D, whose absolute pairwise correlations are required to be below (γτ)^p by Definition 1.

Lemma 13
There is a collection of N = ⌊exp(τ²d/4)⌋ vectors x₁, x₂, …, x_N ∈ {−1, 1}^d such that |⟨x_i, x_j⟩| < τd for all i ≠ j.

Proof We show this by a probabilistic argument. We call a pair of vectors bad if |⟨x_i, x_j⟩| ≥ τd. Let a collection of vectors X₁, X₂, …, X_N be chosen uniformly at random from {−1, 1}^d. Consider a pair X_i, X_j with i ≠ j, and let Z_ij = ⟨X_i, X_j⟩. Now Z_ij is a sum of d independent random variables in [−1, 1], with E[Z_ij] = 0. Applying the two-sided Hoeffding bound with c = τd, we observe that the pair X_i, X_j is bad with probability at most 2 exp(−τ²d/2). Since there are fewer than N²/2 = (1/2) exp(τ²d/2) pairs of vectors, the expected number of bad pairs is less than 1. Thus in at least one collection there are no bad pairs. ◻

To bound the number of image vectors, we use a combinatorial result of Alon [4] to bound the rank of their correlation matrix. We will require the following lemmas. The next lemma is in essence Alon's Theorem 9.3 [4], modified to avoid asymptotic notation. All logarithms here are in base 2.

Proof Choose r as stated. Note that by the assumed range of ε, we have r ≥ 1. Let further k = ⌈r⌉, so in particular 1 ≤ r ≤ k < r + 1.
Let A = (a_ij) = (b_ij^k). Since the off-diagonal elements of B satisfy |b_ij| < ε, it follows from the choice of k that the off-diagonal elements of A satisfy |a_ij| ≤ ε^k ≤ ε^r = 1/√N. Combining Lemmas 14 and 15, we obtain an inequality relating D′ and N. Taking logarithms and rearranging the inequality, and observing that log N = r log(1/ε²), we get a bound which, since ε ≤ 1/100 and r ≥ 1, implies the bound as stated. ◻

Remark
The parameter r measures, in a sense, the distance from the case of an extremely low correlation requirement ε = 1/√N. If r tends to infinity, the exponent 2r/(r + 1) approaches 2, matching the asymptotic form given by Alon [4]. However, for small r the exponent diminishes, reaching 1 in the limiting case r = 1, that is, when ε = 1/√N. In the limiting case a direct application of Lemma 14 would give the better linear bound D′ ≥ N/2.
We can now combine Lemmas 13 and 16 to get a lower bound on the output dimension; the resulting bound is exponential, of the order exp(τ²d/8), up to a limiting value of the strength p. For p greater than this limit, one can essentially map all of the N = exp(τ²d/4) input vectors to orthogonal output vectors of dimension D ≤ 2N using a Hadamard matrix, in which case (2) holds for arbitrary p > 1.
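The Hadamard mapping mentioned above is concrete: Sylvester's construction yields a 2^m × 2^m matrix with {−1, 1} entries and pairwise orthogonal rows, so up to 2^m vectors can be assigned mutually orthogonal rows with output dimension 2^m ≤ 2N. A small sketch (our own code, for illustration):

```python
def sylvester_hadamard(m):
    """Sylvester's construction: H[i][j] = (-1)^popcount(i AND j)
    yields a 2^m x 2^m {-1, 1} matrix with pairwise orthogonal rows."""
    n = 1 << m
    return [[-1 if bin(i & j).count("1") & 1 else 1 for j in range(n)]
            for i in range(n)]
```

Orthogonality follows since the inner product of rows i and j is a character sum over the bitmask i XOR j, which vanishes unless i = j.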
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix: An Expander Family
This section proves Lemma 2 following Reingold et al. [34]; we present the proof for completeness of exposition only, with no claim of originality. Following Reingold et al. [34], we work with normalized eigenvalues. To avoid confusion with the unnormalized treatment in the manuscript proper, we say that a graph is a [D, Δ, λ]-graph if the graph has D vertices, is Δ-regular, and satisfies |λ₂|/Δ ≤ λ. (Here |λ₂| is the unnormalized second eigenvalue as defined in the manuscript proper.) Finally, we construct the expanders that we require in the manuscript proper.
Proof Take q = 2^b and d = 15 in Proposition 5.3 of Reingold et al. [34] to obtain a [2^{16b}, 2^{2b}, 15 · 2^{−b}]-graph H whose rotation map can be computed in time poly(b). (Indeed, observe that an irreducible polynomial for performing the required arithmetic in the finite field of order 2^b can be constructed in deterministic time poly(b) by an algorithm of Shoup [35].) Let us study the sequence G_t given by (36). The time complexity of the rotation map follows immediately from Lemma 19. Since b ≥ 10, Lemma 20 gives λ_t ≤ λ + 4λ² for all t ≥ 1. Take λ = 15 · 2^{−b} and observe that since b ≥ 10 we have 2^{−b} < 1/900. Thus, the required eigenvalue bound follows, completing the proof. ◻