The Hardest Halfspace

We study the approximation of halfspaces $h:\{0,1\}^n\to\{0,1\}$ in the infinity norm by polynomials and rational functions of any given degree. Our main result is an explicit construction of the "hardest" halfspace, for which we prove polynomial and rational approximation lower bounds that match the trivial upper bounds achievable for all halfspaces. This completes a lengthy line of work started by Myhill and Kautz (1961). As an application, we construct a communication problem that achieves essentially the largest possible separation, of $O(n)$ versus $2^{-\Omega(n)}$, between the sign-rank and discrepancy. Equivalently, our problem exhibits a gap of $\log n$ versus $\Omega(n)$ between the communication complexity with unbounded versus weakly unbounded error, improving quadratically on previous constructions and completing a line of work started by Babai, Frankl, and Simon (FOCS 1986). Our results further generalize to the $k$-party number-on-the-forehead model, where we obtain an explicit separation of $\log n$ versus $\Omega(n/4^{k})$ for communication with unbounded versus weakly unbounded error. This gap is a quadratic improvement on previous work and matches the state of the art for number-on-the-forehead lower bounds.

Muroga's classical bound (1.1) states that $E(h,1) \leq 1 - 2^{-O(n \log n)}$ for every halfspace $h$ in $n$ variables. In words, every halfspace can be approximated pointwise by a linear polynomial to error just barely smaller than the trivial bound of 1. Many authors pursued matching lower bounds on $E(h,1)$ for specific halfspaces $h$, culminating in an explicit construction by Håstad [33] that matches Muroga's bound (1.1). The study of $E(h,d)$ for $d \geq 2$ proved to be challenging. For a long time, essentially the only result was the lower bound $E(h,d) \geq 1 - 2^{-\Theta(n/d^2)+1}$ due to Beigel [16], where $h$ is the so-called odd-max-bit halfspace. Paturi [62] proved the incomparable lower bound $E(h,\Theta(n)) \geq 1/3$, where $h$ is the majority function on $n$ bits. Much later, the bound $E(h,\Theta(\sqrt{n})) \geq 1 - 2^{-\Theta(\sqrt{n})}$ was obtained in [76] for an explicit halfspace. This fragmented state of affairs persisted until the question was resolved completely in [77], with an existence proof of a halfspace $h$ such that $E(h,d) \geq 1 - 2^{-\Theta(n)}$ for $d = 1, 2, \ldots, \Theta(n)$. This result is clearly as strong as one could hope for, since it essentially matches Muroga's upper bound for approximation by linear polynomials. The work in [77] further determined the minimum error, denoted $R(h,d)$, to which this $h$ can be approximated by a degree-$d$ rational function, showing that this quantity too is as large for $h$ as it can be for any halfspace. Explicitly constructing a halfspace with these properties is our main technical contribution, stated below as Theorem 1.1, in which $c > 0$ is an absolute constant.
Classic bounds for the approximation of the sign function imply that for any d, the lower bounds in Theorem 1.1 are essentially the best possible for any halfspace on n variables (see Sections 5.1 and 5.2 for details). Thus, the construction of Theorem 1.1 is the "hardest" halfspace from the point of view of approximation by polynomials and rational functions. Theorem 1.1 is not a de-randomization of the existence proof in [77], which incidentally we are still unable to de-randomize. Rather, it is based on a new and simpler approach, presented in detail at the end of this section. Given the role that halfspaces play in theoretical computer science, we see Theorem 1.1 as answering a basic question of independent interest. In addition, Theorem 1.1 has applications to communication complexity and computational learning, which we now discuss.
1.2. Discrepancy vs. sign-rank. Consider the standard model of randomized communication [50], which features players Alice and Bob and a Boolean function $F : X \times Y \to \{-1,+1\}$. On input $(x, y) \in X \times Y$, Alice and Bob receive the arguments $x$ and $y$, respectively. Their objective is to compute $F$ on any given input with minimal communication. To this end, each player privately holds an unlimited supply of uniformly random bits which he or she can use in deciding what message to send at any given point in the protocol. The cost of a protocol is the total number of bits exchanged by Alice and Bob in a worst-case execution. The $\epsilon$-error randomized communication complexity of $F$, denoted $R_\epsilon(F)$, is the least cost of a protocol that computes $F$ with probability of error at most $\epsilon$ on every input.
Our interest in this paper is in communication protocols with error probability close to that of random guessing, 1/2. There are two standard ways to define the complexity of a function $F$ in this setting, both inspired by probabilistic polynomial time for Turing machines [31]: the communication complexity of $F$ with unbounded error, $UPP(F) = \inf_{0 < \epsilon < 1/2} R_\epsilon(F)$, and the communication complexity of $F$ with weakly unbounded error, $PP(F) = \inf_{0 < \epsilon < 1/2} \{ R_\epsilon(F) + \log \frac{1}{1/2 - \epsilon} \}$. Our main result in this setting is as follows.

Theorem 1.2. There is a communication problem $F_n : \{0,1\}^n \times \{0,1\}^n \to \{-1,+1\}$, given by
$$F_n(x, y) = \operatorname{sgn}\Bigl(w_0 + \sum_{i=1}^{n} w_i x_i y_i\Bigr) \tag{1.2}$$
for some explicitly given reals $w_0, w_1, \ldots, w_n$, such that $UPP(F_n) \leq \log n + O(1)$, $PP(F_n) = \Omega(n)$, $\mathrm{rk}_\pm(F_n) \leq n + 1$, and $\mathrm{disc}(F_n) = 2^{-\Omega(n)}$.
Theorem 1.2 gives essentially the strongest possible separation of the communication classes PP and UPP, improving quadratically on previous constructions and matching the previous nonconstructive separation. Another compelling aspect of the theorem is the simple form (1.2) of the communication problem in question. The last two bounds in Theorem 1.2 state that $F_n$ has sign-rank at most $n+1$ and discrepancy $2^{-\Omega(n)}$, which is essentially the strongest possible separation. The best previous construction [71] achieved sign-rank $O(n)$ and discrepancy $2^{-\Omega(\sqrt{n})}$. We further generalize Theorem 1.2 to the number-on-the-forehead $k$-party model, the standard formalism of multiparty communication. Analogous to two-party communication, the $k$-party model has its own classes $UPP_k$ and $PP_k$ of problems solvable efficiently by protocols with unbounded error and weakly unbounded error, respectively. Their formal definitions can be found in Section 2.8. In this setting, we prove:

Theorem 1.3. There is a $k$-party communication problem $F_n : (\{0,1\}^n)^k \to \{-1,+1\}$, defined by
$$F_n(x_1, x_2, \ldots, x_k) = \operatorname{sgn}\Bigl(w_0 + \sum_{i=1}^{n} w_i x_{1,i} x_{2,i} \cdots x_{k,i}\Bigr)$$
for some explicitly given reals $w_0, w_1, \ldots, w_n$, such that $UPP(F_n) \leq \log n + O(1)$, $PP(F_n) = \Omega(n/4^k)$, and $\mathrm{disc}(F_n) = \exp(-\Omega(n/4^k))$.

Theorem 1.3 gives essentially the strongest possible explicit separation of the $k$-party communication complexity classes $UPP_k$ and $PP_k$ for up to $k \leq (0.5 - \epsilon) \log n$ parties, where $\epsilon > 0$ is an arbitrary constant. The previous best explicit separation [27,80] of these classes was quadratically weaker, with communication complexity $O(\log n)$ for unbounded error and $\Omega(\sqrt{n}/4^k)$ for weakly unbounded error. The communication lower bound in Theorem 1.3 reflects the state of the art in the area, in that the strongest lower bound for any explicit communication problem $F : (\{0,1\}^n)^k \to \{-1,+1\}$ to date is $\Omega(n/2^k)$, due to Babai et al. [12].

1.3. Computational learning.
A sign-representing polynomial for a given function $f : \{0,1\}^n \to \{-1,+1\}$ is any real polynomial $p$ such that $f(x) = \operatorname{sgn} p(x)$ for all $x$. The minimum degree of a sign-representing polynomial for $f$ is called the threshold degree of $f$, denoted $\deg_\pm(f)$. Clearly $0 \leq \deg_\pm(f) \leq n$ for every Boolean function $f$ on $n$ variables. The reader can further verify that sign-representation is equivalent to pointwise approximation with error strictly less than, but arbitrarily close to, the trivial error of 1. Sign-representing polynomials are appealing from a learning standpoint because they immediately lead to efficient learning algorithms. Indeed, any function of threshold degree $d$ is by definition a linear combination of $N = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{d}$ monomials and can thus be viewed as a halfspace in $N$ dimensions. As a result, $f$ can be PAC learned [86] under arbitrary distributions in time polynomial in $N$, using a variety of halfspace learning algorithms.
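To make the reduction above concrete, here is a minimal sketch (our own illustration, not from the paper): two-bit parity in the $\pm 1$ encoding, which has threshold degree 2, is expanded into its $N = 4$ monomial features and then learned by the classic perceptron algorithm, one simple stand-in for the "variety of halfspace learning algorithms" mentioned above. The perceptron converges because the lifted data is linearly separable.

```python
from itertools import combinations

# All inputs in the +/-1 encoding of two Boolean variables; the target is
# two-bit parity, which has threshold degree 2: no linear polynomial
# sign-represents it, but the degree-2 monomial x1*x2 does.
points = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
target = {x: x[0] * x[1] for x in points}

def features(x, d=2):
    """All monomials of degree <= d: the lift to N dimensions."""
    feats = []
    for k in range(d + 1):
        for S in combinations(range(len(x)), k):
            prod = 1
            for i in S:
                prod *= x[i]
            feats.append(prod)
    return feats  # here N = C(2,0) + C(2,1) + C(2,2) = 4

# Perceptron over the lifted features; it converges because the lifted
# data is linearly separable (a sign-representing polynomial exists).
w = [0.0] * len(features(points[0]))
for _ in range(100):
    mistakes = 0
    for x in points:
        phi, y = features(x), target[x]
        if y * sum(wi * fi for wi, fi in zip(w, phi)) <= 0:
            w = [wi + y * fi for wi, fi in zip(w, phi)]
            mistakes += 1
    if mistakes == 0:
        break

def predict(x):
    return 1 if sum(wi * fi for wi, fi in zip(w, features(x))) > 0 else -1
```

The same recipe applies verbatim in $N = \sum_{i \leq d} \binom{n}{i}$ dimensions for any function of threshold degree $d$.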
The study of sign-representing polynomials started fifty years ago with the seminal monograph of Minsky and Papert [57], who examined the threshold degree of several common functions. Since then, the threshold degree approach has yielded the fastest known PAC learning algorithms for notoriously hard concept classes, including DNF formulas [45] and AND-OR trees [8]. Conspicuously absent from this list of success stories is the concept class of intersections of halfspaces. While solutions are known to several restrictions of this learning problem [18,51,87,9,44,46,43], no algorithm has been discovered for PAC learning the intersection of even two halfspaces in time faster than $2^{\Theta(n)}$. Known hardness results, on the other hand, only apply to polynomially many halfspaces or to proper learning, e.g., [19,3,47,39].
This state of affairs has motivated a quest to determine the threshold degree of the intersection of two halfspaces [57,61,42,76,77]. Prior to our work, the best lower bound was $\Omega(\sqrt{n})$ for an explicit intersection of two halfspaces [76], complemented by a tight but highly nonconstructive $\Omega(n)$ lower bound [77]. Using Theorem 1.1, we prove:

Theorem 1.4. Let $h_n$ denote the halfspace constructed in Theorem 1.1. Then $\deg_\pm(h_n \wedge h_n) = \Omega(n)$.

The symbol $h_n \wedge h_n$ above stands for the intersection of two copies of $h_n$ on disjoint sets of variables. In other words, Theorem 1.4 constructs an explicit intersection of two halfspaces whose threshold degree is asymptotically maximal, $\Omega(n)$. While the nonconstructive $\Omega(n)$ lower bound of [77] already ruled out the threshold degree approach as a way to learn intersections of halfspaces, we see Theorem 1.4 as contributing a key qualitative piece of the puzzle. Specifically, it constructs a small and simple family of intersections of two halfspaces that are off-limits to all known algorithmic approaches (namely, the family obtained by applying $h_n \wedge h_n$ to different subsets of the variables $x_1, x_2, \ldots, x_{4n}$).
1.4. Proof overview. Our solution has two main components: the construction of a sparse set of integers that appear random modulo m, and the univariatization of a multivariate Boolean function. We describe each of these components in detail.
Discrepancy of integer sets. Let $m \geq 2$ be a given integer. Key to our work is the notion of $m$-discrepancy, which quantifies the pseudorandomness or aperiodicity modulo $m$ of any given multiset of integers. It is largely unrelated to the notion of discrepancy in communication complexity (Section 1.2). Formally, the $m$-discrepancy of a nonempty multiset $Z = \{z_1, z_2, \ldots, z_n\}$ is defined as
$$\operatorname{disc}(Z, m) = \max_{k = 1, 2, \ldots, m-1} \frac{1}{n} \Bigl|\sum_{j=1}^{n} \omega^{k z_j}\Bigr|,$$
where $\omega$ is a primitive $m$-th root of unity. This fundamental quantity arises in combinatorics and theoretical computer science, e.g., [30,69,2,38,64,5]. The identity $1 + \omega + \omega^2 + \cdots + \omega^{m-1} = 0$ for any $m$-th root of unity $\omega \neq 1$ implies that the set $Z = \{0, 1, 2, \ldots, m-1\}$ achieves the smallest possible $m$-discrepancy: $\operatorname{disc}(Z, m) = 0$. Much sparser sets with small $m$-discrepancy can be shown to exist using the probabilistic method (Fact 3.3 and Corollary 3.4). Specifically, one easily verifies for any constant $\epsilon > 0$ the existence of a set $Z \subseteq \{0, 1, 2, \ldots, m-1\}$ with $m$-discrepancy at most $\epsilon$ and cardinality $O(\log m)$, an exponential improvement in sparsity compared to the trivial set $\{0, 1, 2, \ldots, m-1\}$. We are aware of two efficient constructions of sparse sets with small $m$-discrepancy, due to Ajtai et al. [2] and Katz [38]. The approach of Ajtai et al. is elementary except for an appeal to the prime number theorem, whereas Katz's construction relies on deep results in number theory. Neither work appears to directly imply the kind of optimal de-randomization that we require, namely, an algorithm that runs in time polynomial in $\log m$ and produces a multiset of cardinality $O(\log m)$ with $m$-discrepancy bounded away from 1. We obtain such an algorithm by adapting the approach of Ajtai et al. [2]. The centerpiece of the construction of Ajtai et al. [2] is what the authors call the iteration lemma, stated in this paper as Theorem 3.6.
Its role is to reduce the construction of a sparse set with small $m$-discrepancy to the construction of sparse sets with small $p$-discrepancy, for primes $p \ll m$. Ajtai et al. [2] proved their iteration lemma for $m$ prime, but we show that their argument readily generalizes to arbitrary moduli $m$. By applying the iteration lemma in a recursive manner, one reaches smaller and smaller primes. The authors of [2] continue this recursive process until they reach primes $p$ so small that the trivial construction $\{0, 1, 2, \ldots, p-1\}$ can be considered sparse. We proceed differently and terminate the recursion after just two stages, at which point the input size is small enough for brute-force search based on the probabilistic method. The final set that we construct has size logarithmic in $m$ and $m$-discrepancy a small constant, as opposed to the superlogarithmic size and $o(1)$ discrepancy in the work of Ajtai et al. [2].
We note that this modified approach additionally gives the first explicit circulant expander on $n$ vertices of degree $O(\log n)$, which is optimal and improves on the previous best degree bound of $(\log^* n)^{O(\log^* n)} \cdot O(\log n)$ due to Ajtai et al. [2]. Background on circulant expanders, and the details of our expander construction, can be found in Section 5.6.
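For small parameters, the $m$-discrepancy of a multiset can be computed directly as a maximum of normalized character sums (our reading of the definition above). A minimal pure-Python sketch; the quadratic-residue set below is our own illustration of a sparse set with small discrepancy (via the Gauss-sum bound), not the construction of this paper:

```python
import cmath

def m_discrepancy(Z, m):
    """disc(Z, m): the maximum over k = 1, ..., m-1 of
    |sum_j omega^(k * z_j)| / |Z|, where omega = exp(2*pi*i/m)."""
    n = len(Z)
    return max(
        abs(sum(cmath.exp(2j * cmath.pi * k * z / m) for z in Z)) / n
        for k in range(1, m)
    )

# The trivial set {0, 1, ..., m-1} has m-discrepancy 0.
full = list(range(101))

# A much sparser multiset with small m-discrepancy: the quadratic residues
# modulo a prime p.  The Gauss-sum bound gives discrepancy at most about
# (sqrt(p) + 1) / |Z|, roughly 2/sqrt(p) here.
p = 101
qr = sorted({z * z % p for z in range(1, p)})  # (p - 1)/2 = 50 residues
```

Here `m_discrepancy(full, 101)` vanishes up to floating-point error, while the 50-element residue set already has discrepancy close to $2/\sqrt{p} \approx 0.2$.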
Univariatization. We now describe the second major component of our proof. Consider a halfspace $h_n(x) = \operatorname{sgn}(\sum_{i=1}^{n} z_i x_i - \theta)$ in Boolean variables $x_1, x_2, \ldots, x_n$, where the coefficients can be assumed without loss of generality to be integers. Then the linear form $\sum_{i=1}^{n} z_i x_i - \theta$ ranges in the discrete set $\{\pm 1, \pm 2, \ldots, \pm N\}$, for some integer $N$ proportionate to the magnitude of the coefficients. As a result, one can approximate $h_n$ to any given error by approximating the sign function to that error on $\{\pm 1, \pm 2, \ldots, \pm N\}$. This approach works for both rational approximation and polynomial approximation. We think of it as the black-box approach to the approximation of $h_n$ because it uses the linear form $\sum_{i=1}^{n} z_i x_i - \theta$ rather than the individual bits. There is no reason to expect that the black-box construction is anywhere close to optimal. Indeed, there are halfspaces [76, Section 1.3] that can be approximated to arbitrarily small error by a rational function of degree 1 but require a black-box approximant of degree $\Omega(n)$. Surprisingly, we are able to construct a halfspace $h_n$ with exponentially large coefficients for which the black-box approximant is essentially optimal. As a result, tight lower bounds for the rational and polynomial approximation of $h_n$ follow immediately from the univariate lower bounds for approximating the sign function on $\{\pm 1, \pm 2, \pm 3, \ldots, \pm 2^{\Theta(n)}\}$. The role of $h_n$ is to reduce the multivariate problem taken up in this work to a well-understood univariate question, hence the term univariatization.
The construction of $h_n$ involves several steps. First, we study the probability distribution of the weighted sum $z_1 X_1 + z_2 X_2 + \cdots + z_n X_n$ modulo $m$, where $z_1, z_2, \ldots, z_n$ are given integers and the bits $X_1, X_2, \ldots, X_n \in \{0,1\}$ are chosen uniformly at random. We show that the distribution is exponentially close to uniform whenever the multiset $\{z_1, z_2, \ldots, z_n\}$ has $m$-discrepancy bounded away from 1. For the next step, fix any multiset $\{z_1, z_2, \ldots, z_n\}$ with small $m$-discrepancy and consider the linear map $L : \{0,1\}^n \to \mathbb{Z}_m$ given by $L(x) = \sum_{i=1}^{n} z_i x_i$. At this point in the proof, we know that for uniformly random $X \in \{0,1\}^n$, the probability distribution of $L(X)$ is exponentially close to uniform. This implies that the characteristic functions of $L^{-1}(0), L^{-1}(1), \ldots, L^{-1}(m-1)$ have approximately the same Fourier spectrum up to degree $cn$, for some constant $c > 0$. We substantially strengthen this conclusion by proving that there are probability distributions $\mu_0, \mu_1, \ldots, \mu_{m-1}$, supported on $L^{-1}(0), L^{-1}(1), \ldots, L^{-1}(m-1)$, respectively, such that the Fourier spectra of $\mu_0, \mu_1, \ldots, \mu_{m-1}$ are exactly the same up to degree $cn$. Our proof relies on a general tool from [77, Theorem 4.1], proved there using the Gershgorin circle theorem.
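The first step, closeness to uniformity of the weighted sum modulo $m$, can be checked by brute force for small parameters. A sketch with illustrative values of $n$, $m$, and the multiset (our own choices, not the paper's):

```python
from itertools import product

def mod_m_distribution(z, m):
    """Exact distribution of (z_1*X_1 + ... + z_n*X_n) mod m
    over uniformly random bits X_1, ..., X_n."""
    counts = [0] * m
    for x in product((0, 1), repeat=len(z)):
        counts[sum(zi * xi for zi, xi in zip(z, x)) % m] += 1
    total = 2 ** len(z)
    return [c / total for c in counts]

# Illustrative multiset: two copies of {1, ..., 6}, whose 7-discrepancy is
# small; the induced distribution modulo 7 is then very close to uniform.
m = 7
z = [1, 2, 3, 4, 5, 6] * 2
dist = mod_m_distribution(z, m)
tv = 0.5 * sum(abs(q - 1 / m) for q in dist)  # total variation distance
```

Here `tv` is on the order of $10^{-4}$; by contrast, a multiset concentrated on a single residue class (say, all $z_i$ divisible by $m$) makes the distribution maximally nonuniform.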
As our final step, we use $\mu_0, \mu_1, \ldots, \mu_{m-1}$ to construct a halfspace in terms of $z_1, z_2, \ldots, z_n$ whose approximation by rational functions and polynomials gives corresponding approximants for the sign function on the discrete set $\{\pm 1, \pm 2, \ldots, \pm m\}$. More generally, for any tuple $z_1, z_2, \ldots, z_n$, we define an associated halfspace and prove a lower bound on its approximation in terms of the $m$-discrepancy of the multiset $\{z_1, z_2, \ldots, z_n\}$. Combining this result with the efficient construction of an integer set with small $m$-discrepancy for $m = 2^{\Theta(n)}$, we obtain an explicit halfspace $h_n : \{0,1\}^n \to \{-1,+1\}$ whose approximation by polynomials and rational functions is equivalent to the univariate approximation of the sign function on $\{\pm 1, \pm 2, \pm 3, \ldots, \pm 2^{\Theta(n)}\}$. Theorem 1.1 now follows by appealing to known lower bounds for the polynomial and rational approximation of the sign function. To obtain the exponential separation of communication complexity with unbounded versus weakly unbounded error (Theorem 1.2), we use the pattern matrix method [73,75] to "lift" the lower bound of Theorem 1.1 to a discrepancy bound. Finally, our result on the threshold degree of the intersection of two halfspaces (Theorem 1.4) works by combining the rational approximation lower bound of Theorem 1.1 with a structural result from [76] on the sign-representation of arbitrary functions of the form $f \wedge f$.
A key technical contribution of this paper is the identification of m-discrepancy as a pseudorandom property that is weak enough to admit efficient de-randomization and strong enough to allow the univariatization of the corresponding halfspace. The previous, existential result in [77] used a completely different and more complicated pseudorandom property based on affine shifts of the Fourier transform on {0, 1} n , which we have not been able to de-randomize. Apart from the construction of a low-discrepancy set, our proof is simpler and more intuitive than the existential proof in [77].

2. Preliminaries
We start with a review of the technical preliminaries. The purpose of this section is to make the paper as self-contained as possible, and comfortably readable by a broad audience. The expert reader should therefore skim this section for notation or skip it altogether.
Throughout, we compose Boolean functions coordinatewise: for $f : \{0,1\}^n \to \{-1,+1\}$ and $g : X \to \{-1,+1\}$, the composition $f \circ g$ is the Boolean function given by
$$(f \circ g)(x_1, \ldots, x_n) = f\Bigl(\frac{1 - g(x_1)}{2}, \ldots, \frac{1 - g(x_n)}{2}\Bigr), \tag{2.1}$$
where the linear map on the right-hand side serves the purpose of switching between the distinct arithmetizations for the domain versus range. A partial function $f$ on a set $X$ is a function whose domain of definition, denoted $\operatorname{dom} f$, is a nonempty proper subset of $X$. We generalize coordinatewise composition $f \circ g$ to partial Boolean functions $f$ and $g$ in the natural way. Specifically, $f \circ g$ is the Boolean function given by (2.1), with domain the set of all inputs $(x_1, \ldots, x_n) \in (\operatorname{dom} g)^n$ for which $((1 - g(x_1))/2, \ldots, (1 - g(x_n))/2) \in \operatorname{dom} f$. We use the following two versions of the sign function. For a subset $X \subseteq \mathbb{R}$, we let $\operatorname{sgn}|_X$ denote the restriction of the sign function to $X$. A halfspace for us is any Boolean function $h : \{0,1\}^n \to \{-1,+1\}$ given by $h(x) = \operatorname{sgn}(w_1 x_1 + w_2 x_2 + \cdots + w_n x_n - \theta)$ for some reals $w_1, w_2, \ldots, w_n, \theta$. The majority function $\mathrm{MAJ}_n : \{0,1\}^n \to \{-1,+1\}$ is the halfspace defined by $\mathrm{MAJ}_n(x) = \operatorname{sgn}(\sum_{i=1}^{n} x_i - \frac{n}{2} + \frac{1}{4})$. Some authors define $\mathrm{MAJ}_n$ only for $n$ odd, in which case the tiebreaker term $1/4$ can be omitted. The complement and the power set of a set $S$ are denoted as usual by $\overline{S}$ and $\mathcal{P}(S)$, respectively. The symmetric difference of sets $S$ and $T$ is $S \oplus T = (S \cap \overline{T}) \cup (\overline{S} \cap T)$. Throughout this manuscript, we use brace notation as in $\{z_1, z_2, \ldots, z_n\}$ to specify multisets rather than sets. The cardinality $|Z|$ of a finite multiset $Z$ is defined as the total number of element occurrences in $Z$, with each element counted as many times as it occurs. The equality and subset relations on multisets are defined analogously, with the number of element occurrences taken into account. The infinity norm of a function $f : X \to \mathbb{R}$ is denoted $\|f\|_\infty = \sup_{x \in X} |f(x)|$. For real-valued functions $f$ and $g$ and a nonempty finite subset $X$ of their domain, we write $\|f - g\|_X = \max_{x \in X} |f(x) - g(x)|$. We will often use this notation with $X$ a nonempty proper subset of the domain of $f$ and $g$.
We let $\ln x$ and $\log x$ stand for the natural logarithm of $x$ and the logarithm of $x$ to base 2, respectively. For a complex number $x$, we denote the real part, imaginary part, and complex conjugate of $x$ as usual by $\operatorname{Re}(x)$, $\operatorname{Im}(x)$, and $\overline{x}$, respectively. We typeset the imaginary unit $\mathbf{i}$ in boldface to distinguish it from the index variable $i$.
For an arbitrary integer $a$ and a positive integer $m$, recall that $a \bmod m$ denotes the unique element of $\{0, 1, 2, \ldots, m-1\}$ that is congruent to $a$ modulo $m$. For an integer $m \geq 2$, the symbols $\mathbb{Z}_m$ and $\mathbb{Z}_m^*$ refer to the ring of integers modulo $m$ and the multiplicative group of integers modulo $m$, respectively. For a multiset $Z = \{z_1, z_2, \ldots, z_n\}$ of integers, we adopt the standard shorthands $aZ = \{az_1, az_2, \ldots, az_n\}$, $Z + b = \{z_1 + b, z_2 + b, \ldots, z_n + b\}$, and $Z \bmod m = \{z_1 \bmod m, z_2 \bmod m, \ldots, z_n \bmod m\}$. Note that these multisets each have cardinality $n$, the same as the original multiset $Z$. We often use these shorthands in combination, as in $(aZ + b) \bmod m$. For a logical condition $C$, we use the Iverson bracket $[C]$, equal to 1 if $C$ holds and 0 otherwise. The following concentration inequality, due to Hoeffding [34], is well known.

Fact 2.1 (Hoeffding's inequality). Let $X_1, X_2, \ldots, X_n$ be independent random variables with $a_i \leq X_i \leq b_i$ for each $i$. Then for every $t > 0$,
$$\Pr\Bigl[\sum_{i=1}^{n} (X_i - \mathbf{E}\, X_i) \geq t\Bigr] \leq \exp\Bigl(-\frac{2t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\Bigr).$$

In Fact 2.1 and throughout this paper, we typeset random variables using capital letters.
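As a quick numerical illustration of Fact 2.1 (our own example, not from the paper), one can compare the exact binomial tail against the Hoeffding bound $\exp(-2t^2/n)$ for $n$ independent fair bits:

```python
from math import comb, exp

def binomial_upper_tail(n, k):
    """Exact P[X_1 + ... + X_n >= k] for independent uniform bits X_i."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

# Deviation t above the mean n/2; Hoeffding gives
# P[S >= n/2 + t] <= exp(-2 t^2 / n) since each b_i - a_i = 1.
n, t = 100, 10
exact = binomial_upper_tail(n, n // 2 + t)   # about 0.028
bound = exp(-2 * t ** 2 / n)                 # exp(-2), about 0.135
```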
is divisible by both $a$ and $b$. Since $a$ and $b$ are relatively prime, we conclude that $a(a^{-1})_b + b(b^{-1})_a - 1$ is divisible by $ab$, which is equivalent to (2.7).
Recall that the prime counting function $\pi(x)$ for a real argument $x \geq 0$ evaluates to the number of prime numbers less than or equal to $x$. In what follows, it will be clear from the context whether $\pi$ refers to $3.14159\ldots$ or the prime counting function. The asymptotic growth of the latter is given by the prime number theorem, which states that $\pi(n) \sim n/\ln n$. Many explicit bounds on $\pi(n)$ are known, such as the following theorem of Rosser [68].

Theorem (Rosser). For $n \geq 55$,
$$\frac{n}{\ln n + 2} < \pi(n) < \frac{n}{\ln n - 4}.$$
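Rosser's bounds are easy to verify numerically; a sketch with an arbitrarily chosen value of $n$:

```python
from math import log

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for q in range(p * p, n + 1, p):
                sieve[q] = False
    return [p for p in range(2, n + 1) if sieve[p]]

# Check n/(ln n + 2) < pi(n) < n/(ln n - 4) at n = 1000, where pi(1000) = 168.
n = 1000
pi_n = len(primes_up_to(n))
```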
The number of distinct prime divisors of a natural number $n$ is denoted $\nu(n)$. We will need the following first-principles bound on $\nu(n)$, which is asymptotically tight for infinitely many $n$.

Proof. An integer $n \geq 1$ has by definition $\nu(n)$ distinct prime divisors. Letting $p_k$ denote the $k$-th prime, we have
$$n \geq \prod_{k=1}^{\nu(n)} p_k \geq \prod_{k=1}^{\nu(n)} (k+1) = (\nu(n)+1)!,$$
where the second step uses the trivial estimate $p_k \geq k + 1$. The second step in this derivation settles (2.8), whereas the last step settles (2.9).
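The derivation in the proof can be illustrated numerically (our own example): the chain of inequalities gives $n \geq (\nu(n)+1)!$, which is easy to check for a primorial, where $\nu(n)$ is as large as possible relative to $n$.

```python
from math import factorial

def distinct_prime_divisors(n):
    """nu(n): the number of distinct prime divisors of n."""
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            while n % d == 0:
                n //= d
        d += 1
    return count + (1 if n > 1 else 0)

# Primorial n = 2*3*5*7*11*13: every prime factor is as small as possible,
# so nu(n) is maximal among numbers of comparable size.
n = 2 * 3 * 5 * 7 * 11 * 13          # 30030
nu = distinct_prime_divisors(n)      # 6
```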

Matrix analysis.
For an arbitrary set $X$ such as $X = \mathbb{C}$ or $X = \{-1, 1\}$, the symbol $X^{n \times m}$ denotes the family of $n \times m$ matrices with entries in $X$. The symbols $I_n$ and $J_{n,m}$ stand for the order-$n$ identity matrix and the $n \times m$ matrix of all ones, respectively. When the dimensions of the matrix are clear from the context, we omit the subscripts and write simply $I$ or $J$. The shorthand $\operatorname{diag}(d_1, d_2, \ldots, d_n)$ refers to the diagonal matrix with entries $d_1, d_2, \ldots, d_n$ on the diagonal. The transpose and conjugate transpose of $M$ are denoted $M^T$ and $M^* = \overline{M}^T$, respectively. The conjugation, transpose, and conjugate transpose operations apply as a special case to vectors, which we view as matrices with a single column. We use the familiar matrix norms $\|M\|_\infty = \max_{i,j} |M_{ij}|$ and $\|M\|_1 = \sum_{i,j} |M_{ij}|$. Again, these definitions carry over to vectors as a special case.
A circulant matrix of order $m$ is any matrix $C \in \mathbb{C}^{m \times m}$ of the form
$$C = \begin{pmatrix} c_0 & c_1 & \cdots & c_{m-1} \\ c_{m-1} & c_0 & \cdots & c_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ c_1 & c_2 & \cdots & c_0 \end{pmatrix} \tag{2.10}$$
for some $c_0, c_1, \ldots, c_{m-1} \in \mathbb{C}$. Thus, every row of $C$ is obtained by a circular shift of the previous row one entry to the right. We let $\operatorname{circ}(c_0, c_1, \ldots, c_{m-1})$ denote the right-hand side of (2.10). In this notation, $\operatorname{circ}(1, 0, \ldots, 0) = I$ and $\operatorname{circ}(1, 1, \ldots, 1) = J$. The eigenvalues and eigenvectors of a circulant matrix are well-known and straightforward to determine. For the reader's convenience, we include the short derivation below in Fact 2.5 and Corollary 2.6.
Fact 2.5. Let $C = \operatorname{circ}(c_0, c_1, \ldots, c_{m-1})$ be given, and let $\omega$ be any $m$-th root of unity. Then the vector
$$v = (1, \omega, \omega^2, \ldots, \omega^{m-1})^T \tag{2.11}$$
is an eigenvector of $C$ with eigenvalue $\sum_{j=0}^{m-1} c_j \omega^j$.

Proof. Let $v$ denote the vector in (2.11). Then for $k = 1, 2, 3, \ldots, m$,
$$(Cv)_k = \sum_{j=0}^{m-1} c_{(j-k+1) \bmod m}\, \omega^{j} = \sum_{j=0}^{m-1} c_j\, \omega^{(j+k-1) \bmod m} = \omega^{k-1} \sum_{j=0}^{m-1} c_j \omega^{j} = \Bigl(\sum_{j=0}^{m-1} c_j \omega^{j}\Bigr) v_k,$$
where the third step uses $\omega^m = 1$.
As a corollary to Fact 2.5, one recovers the full complement of eigenvalues for any circulant matrix $C$ and furthermore learns that $C$ is unitarily similar to a diagonal matrix. In the statement below, recall that a primitive $m$-th root of unity is any generator, such as $\exp(2\pi\mathbf{i}/m)$, of the multiplicative group of $m$-th roots of unity.

Corollary 2.6. Let $C = \operatorname{circ}(c_0, c_1, \ldots, c_{m-1})$ be a circulant matrix. Let $\omega$ be a primitive $m$-th root of unity. Then the matrix
$$Q = \frac{1}{\sqrt{m}}\,\bigl[\omega^{jk}\bigr]_{j,k=0,1,\ldots,m-1}$$
is unitary and satisfies
$$Q^* C Q = \operatorname{diag}\Bigl(\sum_{j=0}^{m-1} c_j \omega^{0j},\; \sum_{j=0}^{m-1} c_j \omega^{1j},\; \ldots,\; \sum_{j=0}^{m-1} c_j \omega^{(m-1)j}\Bigr). \tag{2.12}$$
In particular, the eigenvalues of $C$, counting multiplicities, are $\sum_{j=0}^{m-1} c_j \omega^{kj}$ for $k = 0, 1, \ldots, m-1$.

Proof. For $k, k' = 0, 1, \ldots, m-1$, we have
$$(Q^* Q)_{k,k'} = \frac{1}{m} \sum_{j=0}^{m-1} \overline{\omega^{jk}}\, \omega^{jk'} = [k = k'],$$
where the second step is valid because $\omega$ is primitive and in particular $\omega^{k} \neq \omega^{k'}$ for $k \neq k'$. In other words,
$$Q^* Q = I. \tag{2.13}$$
By Fact 2.5, the $k$-th column of $Q$ is an eigenvector of $C$ with eigenvalue $\sum_{j=0}^{m-1} c_j \omega^{kj}$. We conclude that
$$C Q = Q \operatorname{diag}\Bigl(\sum_{j=0}^{m-1} c_j \omega^{0j},\; \ldots,\; \sum_{j=0}^{m-1} c_j \omega^{(m-1)j}\Bigr),$$
which in light of (2.13) is equivalent to (2.12).
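Fact 2.5 and Corollary 2.6 are easy to verify numerically. The following pure-Python sketch (with arbitrary illustrative entries $c_j$) checks that each vector $(1, \omega^k, \ldots, \omega^{(m-1)k})$ is an eigenvector of $\operatorname{circ}(c_0, \ldots, c_{m-1})$ with eigenvalue $\sum_j c_j \omega^{jk}$:

```python
import cmath

def circ(c):
    """circ(c_0, ..., c_{m-1}): each row is the previous row shifted right."""
    m = len(c)
    return [[c[(j - i) % m] for j in range(m)] for i in range(m)]

m = 5
c = [complex(j, -j * j) for j in range(m)]   # arbitrary complex entries
C = circ(c)
w = cmath.exp(2j * cmath.pi / m)             # primitive m-th root of unity

# For each k, v_k = (1, w^k, w^{2k}, ..., w^{(m-1)k}) should satisfy
# C v_k = lambda_k v_k with lambda_k = sum_j c_j w^{jk}.
err = 0.0
for k in range(m):
    v = [w ** (j * k) for j in range(m)]
    lam = sum(c[j] * w ** (j * k) for j in range(m))
    Cv = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
    err = max(err, max(abs(Cv[i] - lam * v[i]) for i in range(m)))
```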

Polynomial approximation.
Recall that the total degree of a multivariate real polynomial $p : \mathbb{R}^n \to \mathbb{R}$, denoted $\deg p$, is the largest degree of any monomial of $p$. We use the terms "degree" and "total degree" interchangeably in this paper. Let $f : X \to \mathbb{R}$ be a given function with domain $X \subseteq \mathbb{R}^n$. For any $d \geq 0$, define
$$E(f, d) = \inf_{p} \|f - p\|_\infty,$$
where the infimum is over real polynomials $p$ of degree at most $d$. In words, $E(f, d)$ is the least error in a pointwise approximation of $f$ by a polynomial of degree no greater than $d$. The $\epsilon$-approximate degree of $f$ is the minimum degree of a polynomial $p$ that approximates $f$ pointwise within $\epsilon$:
$$\deg_\epsilon(f) = \min\{d \geq 0 : E(f, d) \leq \epsilon\}.$$
In this overview, we focus on the polynomial approximation of the sign function.
We start with an elementary construction of an approximant due to Buhrman et al. [21].

Fact 2.7 (Buhrman et al.). For every integer $N \geq 1$ and every $\epsilon \in (0, 1)$, there is a polynomial $p$ of degree $O(N^2 \log(2/\epsilon))$ such that $\bigl\|\operatorname{sgn}|_{\{\pm 1, \pm 2, \ldots, \pm N\}} - p\bigr\|_\infty \leq \epsilon$.

The degree upper bound in Fact 2.7 is not tight. Indeed, a quadratically stronger bound of $O(N \log(2/\epsilon))$ follows in a straightforward manner from Jackson's theorem in approximation theory [67, Theorem 1.4]. Our applications do not benefit from this improvement, however, and we opt for the construction of Buhrman et al. [21] because of its striking simplicity. For the reader's convenience, we provide their short proof below.
Proof (adapted from Buhrman et al.). For a positive integer $d$, consider the degree-$d$ univariate polynomial
$$B_d(t) = \sum_{i = \lceil d/2 \rceil}^{d} \binom{d}{i}\, t^i (1-t)^{d-i}.$$
In words, $B_d(t)$ is the probability of observing at least as many heads as tails in a sequence of $d$ independent coin flips, each coming up heads with probability $t$. By Hoeffding's inequality (Fact 2.1),
$$B_d(t) \geq 1 - \frac{\epsilon}{2} \text{ for } t \geq \frac{1}{2} + \frac{1}{2N}, \qquad\qquad B_d(t) \leq \frac{\epsilon}{2} \text{ for } t \leq \frac{1}{2} - \frac{1}{2N},$$
for sufficiently large $d = O(N^2 \log(2/\epsilon))$. As a result, the shifted and scaled polynomial $2B_d\bigl(\frac{N+x}{2N}\bigr) - 1$ approximates $\operatorname{sgn} x$ on $\{\pm 1, \pm 2, \ldots, \pm N\}$ pointwise within $\epsilon$.

On the lower bounds side, Paturi proved that low-degree polynomials cannot approximate the majority function well. He in fact obtained analogous results for all symmetric functions, but the special case of majority will be sufficient for our purposes.
Theorem 2.8 (Paturi). For some constant $c > 0$ and all integers $n \geq 1$,
$$E(\mathrm{MAJ}_n, cn) \geq \frac{1}{3}.$$
The constant $1/3$ in Paturi's theorem can be replaced by any other constant in $(0, 1)$. His result is of interest to us because, along with Fact 2.7, it implies a lower bound for the approximation of the sign function on the discrete set of points $\{\pm 1, \pm 2, \ldots, \pm N\}$ for any $N$.
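The approximant from the proof of Fact 2.7 can be evaluated directly. A sketch with illustrative parameters $N$ and $d$ of our own choosing: $B_d(t)$ is the probability of at least as many heads as tails in $d$ flips with heads-probability $t$, and $2B_d((N+x)/(2N)) - 1$ approximates $\operatorname{sgn} x$ on $\{\pm 1, \ldots, \pm N\}$.

```python
from math import comb

def B(d, t):
    """P[at least as many heads as tails in d flips with heads-probability t]."""
    return sum(comb(d, i) * t ** i * (1 - t) ** (d - i)
               for i in range((d + 1) // 2, d + 1))

def sign_approximant(x, N, d):
    """Degree-d polynomial approximant of sgn x on {+-1, ..., +-N}."""
    return 2 * B(d, (N + x) / (2 * N)) - 1

# For x in {+-1, ..., +-N}, the heads-probability is bounded away from 1/2
# by 1/(2N); by Hoeffding, d on the order of N^2 log(2/eps) flips suffice.
N, d = 5, 401                                # d odd, so ties cannot occur
error = max(abs(sign_approximant(x, N, d) - (1 if x > 0 else -1))
            for x in range(-N, N + 1) if x != 0)
```

With these parameters the maximum error is well below $10^{-2}$, consistent with the $O(N^2 \log(2/\epsilon))$ degree bound.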
Proposition 2.9. For all positive integers $N$ and $d$,
$$E\bigl(\operatorname{sgn}|_{\{\pm 1, \pm 2, \ldots, \pm N\}},\, d\bigr) \geq \frac{1}{3} \qquad \text{whenever } d \leq cN,$$
where $c > 0$ is an absolute constant. To see this, let $p$ be a degree-$d$ polynomial that approximates the sign function on $\{\pm 1, \pm 2, \ldots, \pm N\}$ pointwise within $1/3$. Then the composition of this approximant with the linear form $2(x_1 + x_2 + \cdots + x_n) - n$ obeys the same degree and error bounds. This in turn gives an approximant for the majority function on $n = (N-1)/2$ bits. In view of Paturi's lower bound for the majority function (Theorem 2.8), the approximant $p$ must have degree $\Omega(N)$.

We now turn to rational approximation. A rational function on $\mathbb{R}^n$ is a quotient $r(x) = p(x)/q(x)$, where $p$ and $q$ are polynomials on $\mathbb{R}^n$. We refer to the degrees of $p$ and $q$ as the numerator degree and denominator degree, respectively, of $r$. The degree of $r$ is, then, the maximum of the numerator and denominator degrees. For a function $f : X \to \mathbb{R}$ with domain $X \subseteq \mathbb{R}^n$, we define
$$R(f, d_0, d_1) = \inf_{p, q} \sup_{x \in X} \Bigl| f(x) - \frac{p(x)}{q(x)} \Bigr|,$$
where the infimum is over multivariate polynomials $p$ and $q$ of degree at most $d_0$ and $d_1$, respectively, such that $q$ does not vanish on $X$. In words, $R(f, d_0, d_1)$ is the least error in an approximation of $f$ by a multivariate rational function with numerator degree and denominator degree at most $d_0$ and $d_1$, respectively. We will be mostly working with $R(f, d_0, d_1)$ in the regimes $d_0 = d_1$ and $d_0 \gg d_1$. In the former regime, we use the shorthand $R(f, d) = R(f, d, d)$. As a limiting case of the latter regime, we have $R(f, \infty, d_1) = \lim_{d_0 \to \infty} R(f, d_0, d_1)$.

The study of the rational approximation of the sign function dates back to the seminal work by Zolotarev [89] in the 1870s. The problem was revisited almost a century later by Newman [60], who proved the following result. The tight bounds for our discrete setting are as follows.

Theorem 2.11 (Sherstov). For any positive integers $N$ and $d$,
$$R\bigl(\operatorname{sgn}|_{\{\pm 1, \pm 2, \ldots, \pm N\}},\, d\bigr) \geq 1 - 2^{-\Theta(\log(2N)/d)}.$$
Among other things, Theorem 2.11 implies the following result on the rational approximation of the majority function [76, Eq. (2.2) and Theorems 5.1, 5.9].
Theorem 2.12 (Sherstov). For any positive integers $n$ and $d$,
$$R(\mathrm{MAJ}_n, d) \geq 1 - 2^{-\Theta(\log(2n)/d)}.$$
For Boolean functions $f : X \to \{-1,+1\}$ and $g : Y \to \{-1,+1\}$, we let $f \wedge g$ denote the function on $X \times Y$ given by $(f \wedge g)(x, y) = f(x) \wedge g(y)$. Note that in this notation, $f$ and $f \wedge f$ are completely different functions, the former having domain $X$ and the latter $X \times X$. The following ingenious observation, due to Beigel et al. [17], relates the notions of sign-representation and rational approximation for conjunctions of Boolean functions.

Theorem 2.13 (Beigel et al.). Let $f : X \to \{-1,+1\}$ and $g : Y \to \{-1,+1\}$ be given functions, where $X, Y \subseteq \mathbb{R}^n$. If $R(f, d) + R(g, d) < 1$, then $\deg_\pm(f \wedge g) = O(d)$.

Proof (adapted from Beigel et al.). Fix arbitrary rational functions $p_1(x)/q_1(x)$ and $p_2(y)/q_2(y)$ of degree at most $d$ whose errors in approximating $f$ and $g$, respectively, sum to less than 1. Then
$$f(x) \wedge g(y) = \operatorname{sgn}\Bigl(\frac{p_1(x)}{q_1(x)} + \frac{p_2(y)}{q_2(y)} + 1\Bigr).$$
Multiplying through by the positive quantity $q_1(x)^2 q_2(y)^2$ gives the desired sign-representing polynomial, of degree $O(d)$.

The construction of Theorem 2.13 is somewhat ad hoc, and there is no particular reason to believe that it gives a sign-representing polynomial of asymptotically optimal degree. Remarkably, it does. The following converse to the theorem of Beigel et al. was proved in [76].

Theorem 2.14 (Sherstov). Let $f : X \to \{-1,+1\}$ and $g : Y \to \{-1,+1\}$ be given functions, where $X, Y \subset \mathbb{R}^n$ are arbitrary finite sets. Assume that $f$ and $g$ are not identically false. Let $d = \deg_\pm(f \wedge g)$. Then $R(f, O(d)) + R(g, O(d)) < 1$.

Proposition 2.16 (Razborov and Sherstov). Let $n_1, \ldots, n_k$ be positive integers.
Proposition 2.16 follows in a straightforward manner from Minsky and Papert's Proposition 2.15 by induction on the number of blocks k.

2.8. Communication complexity.
An excellent reference on communication complexity is the monograph by Kushilevitz and Nisan [50]. In this overview, we will limit ourselves to key definitions and notation. We adopt the randomized number-on-the-forehead model, due to Chandra et al. [24]. The model features $k$ communicating players, tasked with computing a (possibly partial) Boolean function $F$ on the Cartesian product $X_1 \times X_2 \times \cdots \times X_k$ of some finite sets $X_1, X_2, \ldots, X_k$. A given input $(x_1, x_2, \ldots, x_k) \in X_1 \times X_2 \times \cdots \times X_k$ is distributed among the players by placing $x_i$, figuratively speaking, on the forehead of the $i$-th player (for $i = 1, 2, \ldots, k$). In other words, the $i$-th player knows the arguments $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k$ but not $x_i$. The players communicate by sending broadcast messages, taking turns according to a protocol agreed upon in advance. Each of them privately holds an unlimited supply of uniformly random bits, which he or she can use along with the available arguments when deciding what message to send at any given point in the protocol. The protocol's purpose is to allow accurate computation of $F$ everywhere on the domain of $F$. An $\epsilon$-error protocol for $F$ is one which, on every input $(x_1, x_2, \ldots, x_k) \in \operatorname{dom} F$, produces the correct answer $F(x_1, x_2, \ldots, x_k)$ with probability at least $1 - \epsilon$. The cost of a protocol is the total bit length of the messages broadcast by all the players in the worst case. The $\epsilon$-error randomized communication complexity of $F$, denoted $R_\epsilon(F)$, is the least cost of an $\epsilon$-error randomized protocol for $F$. As a special case of this model for $k = 2$, one recovers the original two-party model of Yao [88] reviewed in the introduction.
We focus on randomized protocols with probability of error close to that of random guessing, 1/2. There are two natural ways to define the communication complexity of a multiparty problem $F$ in this setting. The communication complexity of $F$ with unbounded error, introduced by Paturi and Simon [63], is the quantity
$$UPP(F) = \inf_{0 < \epsilon < 1/2} R_\epsilon(F).$$
The error probability in this formalism is "unbounded" in the sense that it can be arbitrarily close to 1/2. Babai et al. [11] proposed an alternate quantity, which includes an additive penalty term that depends on the error probability:
$$PP(F) = \inf_{0 < \epsilon < 1/2} \Bigl\{ R_\epsilon(F) + \log \frac{1}{\frac{1}{2} - \epsilon} \Bigr\}.$$
(In the number-on-the-forehead model, the contribution of a $b$-bit broadcast to the protocol cost is $b$ rather than $k \cdot b$.)
We refer to $PP(F)$ as the communication complexity of $F$ with weakly unbounded error. These two complexity measures naturally give rise to corresponding complexity classes $UPP_k$ and $PP_k$ in multiparty communication complexity [11], both inspired by Gill's probabilistic polynomial time for Turing machines [31]. Formally, $UPP_k$ (respectively, $PP_k$) is the class of $k$-party communication problems $\{F_n\}_{n=1}^{\infty}$ with $UPP(F_n)$ (respectively, $PP(F_n)$) bounded by a polylogarithmic function of $n$. It is standard practice to abbreviate $PP = PP_2$ and $UPP = UPP_2$. The following well-known fact, whose proof in the stated generality is available in [80, Fact 2.4], gives a large class of communication problems that are efficiently computable with unbounded error.
In the setting of k = 2 parties, Paturi and Simon [63] showed that unbounded-error communication complexity has a natural matrix-analytic characterization. For a matrix M without zero entries, the sign-rank of M is denoted rk_±(M) and defined as the minimum rank of a real matrix R such that sgn R_{i,j} = sgn M_{i,j} for all i, j. In words, the sign-rank of M is the minimum rank of a real matrix that has the same sign pattern as M. We extend the notion of sign-rank to communication problems F : X × Y → {−1, +1} by defining rk_±(F) = rk_±(M_F), where M_F = [F(x, y)]_{x∈X, y∈Y} is the characteristic matrix of F. The following classic result due to Paturi and Simon [63, Theorem 3] relates two-party unbounded-error communication complexity to sign-rank.

2.9. Discrepancy. A k-dimensional cylinder intersection is a function χ : X_1 × X_2 × ··· × X_k → {0, 1} of the form

χ(x_1, x_2, …, x_k) = ∏_{i=1}^{k} χ_i(x_1, …, x_{i−1}, x_{i+1}, …, x_k),

where each χ_i has range {0, 1}. In other words, a k-dimensional cylinder intersection is the product of k functions with range {0, 1}, where the i-th function does not depend on the i-th coordinate but may depend arbitrarily on the other k − 1 coordinates. Introduced by Babai et al. [12], cylinder intersections are the fundamental building blocks of communication protocols and for that reason play a central role in the theory. For a (possibly partial) Boolean function F on X_1 × X_2 × ··· × X_k and a probability distribution P on X_1 × X_2 × ··· × X_k, the discrepancy of F with respect to P is given by

disc_P(F) = max_χ | Σ_{(x_1, …, x_k) ∈ dom F} P(x_1, …, x_k) F(x_1, …, x_k) χ(x_1, …, x_k) |,

where the maximum is over cylinder intersections χ. The minimum discrepancy over all distributions is denoted disc(F) = min_P disc_P(F). Upper bounds on a function's discrepancy give lower bounds on its randomized communication complexity, a classic technique known as the discrepancy method [28, 12, 50].
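For k = 2 parties, cylinder intersections are exactly combinatorial rectangles A × B, so the definition can be checked by brute force on tiny instances. The sketch below is our own illustration (not a construction from this paper), with the distribution P fixed to uniform:

```python
from itertools import product

def disc_uniform(F):
    """Brute-force discrepancy of a two-party +-1 function under the
    uniform distribution. For k = 2, cylinder intersections are exactly
    rectangles A x B, so we maximize |sum over A x B of P(x,y)F(x,y)|
    over all subsets A of X and B of Y. Exponential time; tiny inputs only."""
    rows, cols = len(F), len(F[0])
    p = 1.0 / (rows * cols)  # uniform distribution P
    best = 0.0
    for A in product([0, 1], repeat=rows):      # indicator vector of A
        for B in product([0, 1], repeat=cols):  # indicator vector of B
            s = sum(p * F[x][y]
                    for x in range(rows) for y in range(cols)
                    if A[x] and B[y])
            best = max(best, abs(s))
    return best
```

For the 2 × 2 XOR matrix [[1, −1], [−1, 1]], the maximum is attained by a single-cell rectangle, giving discrepancy 1/4 under the uniform distribution, whereas a constant matrix has discrepancy 1.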
Theorem 2.19. Let F be a (possibly partial) Boolean function on X_1 × X_2 × ··· × X_k. Then for 0 ≤ ε ≤ 1/2,

R_ε(F) ≥ log_2 ( (1 − 2ε) / disc(F) ).

As discussed in [50], discrepancy is a challenging quantity to analyze. The pattern matrix method is a technique that gives tight bounds on the discrepancy and communication complexity for a large class of communication problems. The technique was developed in [73, 75] for two-party communication complexity and has since been generalized by several authors to the multiparty setting. We now review the strongest form [79, 78] of the pattern matrix method, focusing our discussion on discrepancy bounds. Set disjointness is the k-party communication problem of determining whether k given subsets of the universe {1, 2, …, n} have empty intersection, where, as usual, the i-th party knows all the sets except for the i-th. Identifying the sets with their characteristic vectors, set disjointness corresponds to the Boolean function DISJ_{n,k} : ({0, 1}^n)^k → {−1, +1} given by

DISJ_{n,k}(x_1, x_2, …, x_k) = −1 ⟺ x_{1,i} ∧ x_{2,i} ∧ ··· ∧ x_{k,i} = 1 for some coordinate i.

The partial function UDISJ_{n,k} on ({0, 1}^n)^k, called unique set disjointness, is defined as DISJ_{n,k} with domain restricted to inputs x ∈ ({0, 1}^n)^k such that x_{1,i} ∧ x_{2,i} ∧ ··· ∧ x_{k,i} = 1 for at most one coordinate i. In set-theoretic terms, this restriction corresponds to requiring that the k sets either have empty intersection or intersect in a unique element. The pattern matrix method pertains to the communication complexity of composed communication problems. Specifically, let G be a (possibly partial) Boolean function on X_1 × X_2 × ··· × X_k, representing a k-party communication problem, and let f : {0, 1}^n → {−1, +1} be given. The coordinatewise composition f ∘ G is then a k-party communication problem on X_1^n × X_2^n × ··· × X_k^n. We are now in a position to state the pattern matrix method for discrepancy bounds [79, Theorem 5.7].
Theorem 2.21 (Sherstov). For every Boolean function f : {0, 1} n → {−1, +1}, all positive integers m and k, and all reals 0 < γ < 1, This theorem makes it possible to prove communication lower bounds by leveraging the existing literature on polynomial approximation. In follow-up work, the author improved Theorem 2.21 to an essentially tight upper bound [78,Theorem 5.7]. However, we will not need this sharper version.

Discrepancy of integer sets
Let m ≥ 2 be an integer modulus. Key to our work is the notion of m-discrepancy, which quantifies the pseudorandomness or aperiodicity of any given multiset of integers modulo m. The m-discrepancy of a nonempty multiset Z = {z_1, z_2, …, z_n} of arbitrary integers is defined as

disc(Z, m) = max_{k=1,2,…,m−1} | (1/n) Σ_{j=1}^{n} ω^{k z_j} |,

where ω is a primitive m-th root of unity; the right-hand side is obviously the same for any such ω. By way of terminology, we emphasize that the notion of m-discrepancy just defined is unrelated to the notion of discrepancy from Section 2.9. As a matter of convenience, we define disc(∅, m) = 1. The notion of m-discrepancy has a long history in combinatorics and theoretical computer science, e.g., [30, 69, 2, 38, 64, 5]. The m-discrepancy of an integer multiset Z has a natural interpretation in terms of the discrete Fourier transform on Z_m. Specifically, consider the frequency vector (f_0, f_1, …, f_{m−1}) of Z, where f_j is the total number of element occurrences in Z that are congruent to j modulo m. Applying the discrete Fourier transform to (f_j)_{j=0}^{m−1} produces the sequence (Σ_{j=0}^{m−1} f_j ω^{jk})_{k=0}^{m−1}, whose component for k = 0 equals |Z| and whose remaining components have absolute value at most |Z| · disc(Z, m), with equality attained for some k ≠ 0.

3.1. Basic properties. We collect a few elementary properties of m-discrepancy. To start with, we quantify the "continuity" of disc(Z, m) in the first argument. By way of notation, we remind the reader that the cardinality |Z| of a multiset Z is found by summing, for each distinct element z ∈ Z, the number of times z occurs in Z. The m-discrepancy of Z is invariant under a variety of operations on Z, such as shifting the elements of Z by any given integer or multiplying the elements of Z by an integer relatively prime to m. For our purposes, the following observation will be sufficient: disc(−Z, m) = disc(Z, m), where −Z = {−z : z ∈ Z}. Proof. The claim is immediate from the definition of m-discrepancy because ω is a primitive m-th root of unity if and only if ω^{−1} is.
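The definition transcribes directly into code. The sketch below is our own illustration, assuming the max-over-k form of the definition written above:

```python
import cmath

def m_discrepancy(Z, m):
    """m-discrepancy of a nonempty integer multiset Z, straight from the
    definition: max over k = 1, ..., m-1 of |(1/n) * sum_j omega^(k*z_j)|,
    where omega = exp(2*pi*i/m) is a primitive m-th root of unity."""
    n = len(Z)
    omega = cmath.exp(2j * cmath.pi / m)
    # reduce the exponent mod m for numerical stability
    return max(abs(sum(omega ** ((k * z) % m) for z in Z)) / n
               for k in range(1, m))
```

For example, the full residue system {0, 1, …, m − 1} has m-discrepancy 0, a single element has m-discrepancy 1, and the nonzero residues {1, …, 6} modulo 7 have 7-discrepancy 1/6.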
For the reader's convenience, we include a short proof below. Let Z_1, Z_2, …, Z_n be independent and uniformly distributed on {0, 1, …, m − 1}, and fix an m-th root of unity ω ≠ 1, so that E ω^{Z_j} = 0. Then

P[ | (1/n) Σ_{j=1}^{n} ω^{Z_j} | > ε ] ≤ P[ | (1/n) Σ_j Re ω^{Z_j} | > ε/2 ] + P[ | (1/n) Σ_j Im ω^{Z_j} | > ε/2 ] ≤ 4 exp(−nε²/8),

where the second step uses Hoeffding's inequality (Fact 2.1). Applying the union bound across all m-th roots of unity ω ≠ 1, we conclude that the probability that disc({Z_1, Z_2, …, Z_n}, m) > ε is at most 4(m − 1) exp(−nε²/8).
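The probabilistic argument is easy to reproduce numerically. The sketch below is our own experiment with illustrative parameters: it draws n uniform residues and checks that the resulting m-discrepancy is small, in line with the failure bound 4(m − 1) exp(−nε²/8):

```python
import cmath
import math
import random

def m_discrepancy(Z, m):
    # max over k = 1, ..., m-1 of |(1/n) * sum_j omega^(k*z_j)|
    n = len(Z)
    w = cmath.exp(2j * cmath.pi / m)
    return max(abs(sum(w ** ((k * z) % m) for z in Z)) / n
               for k in range(1, m))

random.seed(0)  # fixed seed: the checks below are then deterministic
m, n, eps = 101, 2000, 0.3
Z = [random.randrange(m) for _ in range(n)]
d = m_discrepancy(Z, m)
# probability of failure, per the union-bound calculation above
failure_bound = 4 * (m - 1) * math.exp(-n * eps ** 2 / 8)
```

With these parameters the failure bound is astronomically small, so a random multiset of size O(log m) essentially always has small m-discrepancy.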
In some applications, one is restricted to working with subsets of {0, 1, 2, …, m − 1} as opposed to arbitrary multisets with possibly repeated elements. We record a version of Fact 3.3 for this setting. Proof. The probability that Z does not contain 0 or repeated elements is easily seen to be bounded away from zero for the parameter settings in question. In all of our applications, the error parameter ε > 0 will be a small constant. In this regime, Corollary 3.4 guarantees the existence of a set Z ⊆ {1, 2, …, m − 1} with m-discrepancy at most ε and cardinality O(log m), an exponential improvement in sparsity compared to the trivial set {0, 1, 2, …, m − 1}. No further improvement is possible: it is well known that any nonempty multiset with m-discrepancy bounded away from 1 has cardinality Ω(log m). This classical lower bound has a remarkable variety of proofs, e.g., using random walks [5], sphere-packing arguments [29], and Diophantine approximation [53]. We include here a particularly simple and self-contained proof, adapted from Leung et al. [53]. Unlike all other technical statements in this paper, Fact 3.5 is not used in the proof of our main result and is provided solely for completeness. Proof (adapted from [53]). The proof is based on a classic technique from simultaneous Diophantine approximation. For a nonnegative real number x, let frac(x) denote the fractional part of x. Abbreviate q = (m − 1)^{1/n} and consider the q intervals that partition [0, 1) into consecutive subintervals of length 1/q. By the pigeonhole principle, there is an integer k ∈ {1, 2, …, m − 1} such that each kz_i/m differs from an integer u_i by less than 1/q, for some u_1, u_2, …, u_n ∈ Z. Now the claimed bound follows, where the first step uses the definition of m-discrepancy; the second step applies the triangle inequality; the third step is valid by periodicity; the fourth step uses the bound |1 − exp(2πxi)| = √(2 − 2 cos(2πx)) ≤ 2π|x| for all real x; and the final step is immediate from (3.6).
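In the regime of Corollary 3.4, the guaranteed sets can also be found by exhaustive search, which is the brute-force step invoked later in the explicit construction. A minimal sketch of our own, feasible only for very small m:

```python
import cmath
from itertools import combinations

def m_discrepancy(Z, m):
    w = cmath.exp(2j * cmath.pi / m)
    return max(abs(sum(w ** ((k * z) % m) for z in Z)) / len(Z)
               for k in range(1, m))

def smallest_good_set(m, eps, max_size):
    """Exhaustive search for a smallest subset of {1, ..., m-1} with
    m-discrepancy at most eps. Runs in time m^O(max_size), so this is
    only usable as a base case for tiny moduli."""
    for t in range(1, max_size + 1):
        for Z in combinations(range(1, m), t):
            if m_discrepancy(Z, m) <= eps:
                return list(Z)
    return None
```

For instance, with m = 11 and ε = 0.7, the search succeeds at cardinality at most 5 (the quadratic residues modulo 11 already achieve discrepancy √11/5 ≈ 0.66).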

An explicit construction.
We now turn to the problem of efficiently constructing sparse sets with small m-discrepancy. Two such constructions are known to date, due to Ajtai et al. [2] and Katz [38]. The approach of Ajtai et al. is elementary except for an appeal to the prime number theorem. Katz's construction, on the other hand, relies on deep results in number theory. Neither work appears to directly imply the kind of optimal derandomization that we require, namely, an algorithm that runs in time polynomial in log m and produces a multiset of cardinality O(log m) with m-discrepancy bounded away from 1. We obtain such an algorithm by adapting the approach of Ajtai et al. [2]. The following technical result plays a central role. Ajtai et al. [2] proved a special case of Theorem 3.6 for m prime, but their argument readily generalizes to arbitrary moduli m as just stated. For the reader's convenience, we provide a complete proof of Theorem 3.6 in Appendix A. The theorem's purpose is to reduce the construction of a sparse set with small m-discrepancy to the construction of sparse sets with small p-discrepancy, for primes p much smaller than m. By applying Theorem 3.6 in a recursive manner, one reaches smaller and smaller primes. The authors of [2] continue this recursive process until they reach primes p so small that the trivial construction {1, 2, 3, …, p − 1} can be considered sparse. We proceed differently and terminate the recursion after just two stages, at which point the input size is small enough for brute-force search based on Corollary 3.4. The final set that we construct has size logarithmic in m and m-discrepancy a small constant, as opposed to the superlogarithmic size and o(1) discrepancy in the work of Ajtai et al. [2]. A detailed exposition of our algorithm follows. We may assume that conditions (3.7)–(3.12) hold, where π is the prime counting function and ν is the number-of-distinct-prime-divisors function. Indeed, if any of (3.7)–(3.12) fails, the theorem follows for trivial reasons. Assuming (3.7)–(3.12), our construction of Z has three stages.
In the first and second stages, we construct sparse sets S_{p″} ⊆ {1, 2, …, p″ − 1} with small p″-discrepancy for all primes p″ ∈ (P″/2, P″], and sparse sets S_{p′} ⊆ {1, 2, …, p′ − 1} with small p′-discrepancy for all primes p′ ∈ (P′/2, P′], respectively. In the final stage, we construct the set Z in the theorem statement. We ensure that each stage runs in time polynomial in ln m.

Stage 1.

For each prime p″ ∈ (P″/2, P″], we construct a set S_{p″} ⊆ {1, 2, …, p″ − 1} of cardinality independent of p″ (3.13) and with

disc(S_{p″}, p″) ≤ δ,  prime p″ ∈ (P″/2, P″].  (3.14)

The primes in (P″/2, P″] can be identified by the trivial algorithm in time polynomial in P″ = O(ln ln m). For each such prime p″, we can find a set S_{p″} with the above properties in time P″^{O(|S_{p″}|)} = o(ln m) by trying out all candidate sets.

Stage 2.
Apply the construction of Theorem 3.6 with parameters P = P″ and R = 1/δ² to the sets constructed in Stage 1 to obtain a set S_{p′} ⊆ {1, 2, …, p′ − 1} for each prime p′ ∈ (P′/2, P′]. This choice of parameters is legitimate by (3.9). By (3.13), the new sets have the same cardinality. The prime number theorem (Fact 2.3) implies that |S_{p′}| = O(P″) = O(ln ln m). In view of (3.7), (3.14), and P′ = exp(δP″), the new sets have

disc(S_{p′}, p′) ≤ 6cδ,  prime p′ ∈ (P′/2, P′].

Stage 3.

Apply the construction of Theorem 3.6 once more, now with P = P′, to the sets constructed in Stage 2 to obtain a set Z = S_m ⊆ {1, 2, …, m − 1}. This choice of parameters is legitimate by (3.10). This new set has cardinality determined by R, P′, and the primes p′ ∈ (P′/2, P′] with p′ ∤ m, which in view of (3.11) and (3.12) guarantees that S_m is nonempty. Simplifying, |S_m| = O(ln m), where the second step applies the prime number theorem (Fact 2.3). The multiplicative constant in this asymptotic bound on |S_m| can be easily recovered from the explicit bounds in Fact 2.3. Using (3.9), (3.15), and m = exp(δP′), we further obtain disc(S_m, m) ≤ 11cδ.

Univariatization
Consider a halfspace h_n(x) = sgn(Σ z_i x_i − θ) in Boolean variables x_1, x_2, …, x_n ∈ {0, 1}, where the coefficients can be assumed without loss of generality to be integers. Then the linear form Σ z_i x_i − θ ranges in the discrete set {±1, ±2, …, ±N}, for some integer N proportional to the magnitude of the coefficients. As a result, one can approximate h_n to any given error by approximating the sign function on {±1, ±2, …, ±N}. This approach works for both rational approximation and polynomial approximation. Needless to say, there is no reason to expect that the degree of the approximant in this naïve construction is anywhere close to optimal. Perhaps the most dramatic example is the odd-max-bit function, defined by OMB_n(x) = sgn(1 + Σ_{i=1}^{n} (−2)^i x_i). A moment's thought reveals that OMB_n can be approximated to any given error ε > 0 by a rational function of degree 1, whereas the naïve construction produces an approximant of degree Ω(n).
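To make the degree-1 claim for OMB_n concrete, here is one standard style of approximant (our own illustration; the text does not commit to a specific formula): take linear forms p and q in which a large base K makes the highest set bit dominate both sums, so that p/q is within O(1/K) of OMB_n.

```python
from fractions import Fraction
from itertools import product

def omb(x):
    # OMB_n(x) = sgn(1 + sum_i (-2)^i x_i): the sign is decided by the
    # parity of the highest set bit (or +1 if no bit is set).
    return 1 if 1 + sum((-2) ** i * b for i, b in enumerate(x, 1)) > 0 else -1

def degree1_approximant(x, K):
    # p and q are degree-1 (linear) forms in x. The term of the highest
    # set bit dominates both sums, so p/q is within O(1/K) of OMB_n(x).
    p = Fraction(1) + sum(Fraction((-K) ** i) * b for i, b in enumerate(x, 1))
    q = Fraction(1) + sum(Fraction(K ** i) * b for i, b in enumerate(x, 1))
    return p / q  # q >= 1 > 0 on every Boolean input

n, K = 8, 1000
err = max(abs(degree1_approximant(x, K) - omb(x))
          for x in product([0, 1], repeat=n))
```

Exhausting all 2^8 inputs shows a pointwise error of roughly 2/K, so the error can be driven below any ε > 0 by choosing K large, with the degree staying at 1.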
Surprisingly, we are able to construct a halfspace h_n(x) = sgn(Σ z_i x_i − θ) with exponentially large coefficients for which the naïve construction is essentially optimal. Specifically, we show that a rational approximant for h_n with given error and given numerator and denominator degrees implies an analogous univariate rational approximant for the sign function on {±1, ±2, ±3, …, ±2^{Θ(n)}}. As a result, tight lower bounds for the rational and polynomial approximation of h_n follow immediately from the univariate lower bounds for the sign function. The construction of h_n, carried out in this section, is the centerpiece of our paper. The role of h_n is to reduce the multivariate problem taken up in this work to a well-understood univariate question, whence the title of this section. We have broken down the proof into four steps, corresponding to Sections 4.1–4.4 below.
4.1. Distribution of a linear form modulo m. We start by studying the probability distribution of the weighted sum z_1 X_1 + z_2 X_2 + ··· + z_n X_n modulo m, where z_1, z_2, …, z_n are given integers and X_1, X_2, …, X_n ∈ {0, 1} are chosen uniformly at random. We will show that the distribution is close to uniform whenever the multiset {z_1, z_2, …, z_n} has small m-discrepancy. This result uses the following classical fact on linear forms modulo m. This implies (4.1) because |ω^{−ks}| = 1 for all k, s ∈ Z.
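The near-uniformity phenomenon is easy to observe directly. The sketch below is our own illustration with toy parameters: it computes the exact distribution of the weighted sum modulo m by dynamic programming and measures its deviation from uniform.

```python
def sum_distribution_mod_m(zs, m):
    """Exact distribution of z_1*X_1 + ... + z_n*X_n (mod m) for
    independent uniform X_i in {0, 1}, by dynamic programming over
    residues (each step convolves with the two-point law of z_i*X_i)."""
    dist = [0.0] * m
    dist[0] = 1.0
    for z in zs:
        new = [0.0] * m
        for r, pr in enumerate(dist):
            new[r] += pr / 2            # X_i = 0
            new[(r + z) % m] += pr / 2  # X_i = 1
        dist = new
    return dist

# Our toy choice: three copies of the nonzero residues modulo 7, a multiset
# with 7-discrepancy 1/6. Fourier analysis predicts a deviation from uniform
# of at most (6/7) * (1/64)^3, i.e., a few parts per million.
m = 7
zs = [z for _ in range(3) for z in range(1, m)]
dist = sum_distribution_mod_m(zs, m)
dev = max(abs(p - 1.0 / m) for p in dist)
```

The deviation here is on the order of 10^−6, illustrating how a multiset with small m-discrepancy drives the distribution of the linear form toward uniform.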
In the original version of this manuscript, we proved (4.1) using a different, matrixanalytic argument, which we include as Appendix B. The short and elegant proof above was pointed out to us by T. S. Jayram, who kindly allowed us to include it. We now simplify the right-hand side of (4.1) and relate it to m-discrepancy. In view of Fact 4.1, the proof is complete.
By way of notation, we remind the reader that ⟨f, g⟩_X = (1/|X|) Σ_{x∈X} f(x) g(x) for any real-valued functions f and g and a nonempty subset X of their domain. In words, Theorem 4.3 states that if χ_1, χ_2, …, χ_k each have small correlation with f and, in addition, have small pairwise correlations, then a distribution exists with respect to which f is completely uncorrelated with χ_1, χ_2, …, χ_k. We are now in a position to prove the existence of the promised fooling distributions. In the statement that follows, recall that H(p) = −p log p − (1 − p) log(1 − p) is the binary entropy function. Then each X_s is nonempty. Moreover, there is a probability distribution µ_s on X_s (for each s) such that

E_{x∼µ_s} p(x) = E_{x∼µ_{s′}} p(x)  (4.6)

for all s, s′ ∈ Z and all real polynomials p : {0, 1}^n → R of degree at most δn.
Proof. For a subset A ⊆ {1, 2, …, n}, define the character χ_A : {0, 1}^n → {−1, +1} by χ_A(x) = (−1)^{Σ_{i∈A} x_i}. The centerpiece of the proof is the following claim. (4.8) We will proceed with the main proof and settle the claim after we are finished. Fix s ∈ Z arbitrarily. Let A denote the family of nonempty subsets of {1, 2, …, n} of cardinality at most δn. Recall from (2.2) the Fourier expansion with respect to the characters χ_A. As a result, the desired bound follows, where the second step uses (4.9) and Claim 4.5; the third step is valid because 1 + disc(Z, m) < 2(1 − δ) by (4.4); and the final step is immediate from (4.4). An analogous calculation shows that for every A ∈ A, the corresponding bound holds, where the second step follows from (4.9) and Claim 4.5, and the last step uses (4.4).
Recall from Claim 4.5 that each X_s is nonempty. Applying Theorem 4.3 with (4.10) and (4.11) to the functions χ_A (A ∈ A) and f = 1, we infer the existence of a probability distribution µ_s on X_s such that (4.12) holds. Now that the probability distributions µ_s have been constructed for each s ∈ Z, consider an arbitrary polynomial p : {0, 1}^n → R of degree at most δn. Then p = Σ_{|A| ≤ δn} p_A χ_A for some reals p_A. As a result, (4.12) implies that E_{µ_s} p = p_∅ for all s ∈ Z, thereby settling (4.6).
4.3. The univariate reduction. At last, we present a generic construction of a halfspace whose approximation by rational functions and polynomials gives corresponding approximants for the sign function on the discrete set {±1, ±2, . . . , ±m}.
In more detail, let z_1, z_2, …, z_n be given integers. For any such n-tuple, we define an associated halfspace and prove approximation lower bounds for it in terms of the m-discrepancy of the multiset {z_1, z_2, …, z_n}. The following first-principles calculation will be helpful.
Proposition 4.6. Let a_1, a_2, …, a_k ∈ R and b_1, b_2, …, b_k > 0. Then

min_i (a_i / b_i) ≤ (E a_i) / (E b_i) ≤ max_i (a_i / b_i),  (4.15)

where the expectations are taken with respect to an arbitrary probability distribution on the index set {1, 2, …, k}. Proof. Abbreviate m = min a_i/b_i and M = max a_i/b_i. Since each b_i is positive, we obtain m b_i ≤ a_i ≤ M b_i. Taking a weighted sum of these inequalities, we arrive at m E b_i ≤ E a_i ≤ M E b_i, which is equivalent to (4.15).
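Proposition 4.6 is elementary, and a numerical spot-check may still be reassuring. The sketch below is our own verification of the inequality for random data and arbitrary nonnegative weights (the normalization by the weight total cancels in the ratio):

```python
import random

def mediant_bounds(pairs, weights):
    """Proposition 4.6 in code: for positive b_i and any nonnegative
    weights (not all zero), the ratio (E a_i)/(E b_i) of weighted
    averages lies between min and max of a_i/b_i."""
    ea = sum(w * a for (a, b), w in zip(pairs, weights))
    eb = sum(w * b for (a, b), w in zip(pairs, weights))
    ratios = [a / b for a, b in pairs]
    return min(ratios), ea / eb, max(ratios)

random.seed(1)
pairs = [(random.uniform(-5, 5), random.uniform(0.1, 5)) for _ in range(10)]
weights = [random.random() for _ in range(10)]
lo, mid, hi = mediant_bounds(pairs, weights)
```

This is exactly the mechanism used in Step 3 below: passing from pointwise bounds on a ratio to bounds on the ratio of expectations.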
Proof. Fix 0 < ε < 1 arbitrarily for the remainder of the proof, and suppose that R(f, d_0, d_1) < ε for some d_0, d_1 ≤ δn/2. Our goal is to derive the conclusion (4.17). The proof is algorithmic and involves three steps. Given any approximant for f, we will first manipulate it to control the sign behavior in the numerator and denominator, then symmetrize it with respect to y, and finally, in the arduous part of the proof, symmetrize it with respect to x. The result of these manipulations will be a univariate approximant for the sign function.
Step 3: Symmetrization on x. We have reached the most demanding part of the proof, where we symmetrize the approximants obtained so far with respect to x. For s ∈ Z, let X_s ⊆ {0, 1}^n be given by (4.5). Then Lemma 4.4 guarantees that each X_s is nonempty, and additionally provides a probability distribution µ_s on X_s (for each s ∈ Z) such that (4.6) holds for every polynomial P : {0, 1}^n → R of degree at most δn. Now (4.23) and (4.24) imply pointwise bounds for all s and all x in the support of µ_s. Since the numerators and denominators of these fractions are positive, Proposition 4.6 allows us to pass to expectations with respect to x ∼ µ_s to obtain (4.26), or equivalently (4.27). Equations (4.26) and (4.27) show that r**(s − 1)/q**(s − 1) and p**(s − 1)/r**(s − 1) approximate sgn s pointwise on {±1, ±2, …, ±m} to error less than ε. Moreover, (4.25) ensures that the degrees of p**, q**, r** are at most the degrees of p*, q*, r*, respectively. We conclude that the corresponding univariate approximation bounds hold. These complementary bounds force (4.17) and thereby complete the proof. Let c′ = 1/(10C), where C ≥ 1 is the constant defined in Theorem 3.7. On input n, the construction of h_n is as follows. For n < 1/c′, the sought property (4.28) amounts to R(h_n, 0, 0) ≥ R(sgn |_{{−1,1}}, 0, 0), which is in turn equivalent to R(h_n, 0, 0) ≥ 1 and holds trivially for the halfspace h_n(x) = (−1)^{x_1}.
We now turn to the nontrivial case, n ≥ 1/c′. Abbreviate m = 2^{c′n}. Then the algorithm of Theorem 3.7 constructs, in time polynomial in n, a nonempty multiset Z with m-discrepancy

disc(Z, m) ≤ 1/10  (4.30)

and cardinality |Z| ≤ n/2. Observe that for any integer k ≥ 1, the union of k copies of Z is a multiset with m-discrepancy disc(Z, m) and cardinality k|Z|. Therefore, we may assume without loss of generality that

n/4 < |Z| ≤ n/2.  (4.31)

We let h_n be the halfspace associated with the multiset Z in the manner of Section 4.3, where z_1, z_2, z_3, …, z_{|Z|} denote the elements of the multiset Z.

Main results
Using the halfspace h_n constructed in our master theorem, we will now establish the main results of this paper.

Polynomial approximation.
Prior to our work, the strongest lower bound for the approximation of an explicit halfspace f_n : {0, 1}^n → {−1, +1} by polynomials was E(f_n, c√n) ≥ 1 − 2^{−c√n} for an absolute constant c > 0, proved in [76, 77]. The result that we are about to prove is a quadratic improvement on previous work, with respect to both degree and error. As we will discuss shortly, this new result is essentially the best possible. Theorem 5.1 is essentially as strong as one could hope for. First of all, any function in n Boolean variables can be approximated to zero error by a polynomial of degree at most n, i.e., at most a constant factor larger than what is assumed in (5.1). Moreover, a classic result due to Muroga [58] implies that for every halfspace, the error bound in (5.1) is almost achieved by polynomials of degree 1: there is an absolute constant c > 0 such that for every n and every halfspace h : {0, 1}^n → {−1, +1},

E(h, 1) ≤ 1 − n^{−cn}.

Proof. Muroga [58] showed that every halfspace h : {0, 1}^n → {−1, +1} can be represented as h(x) = sgn(Σ_{j=1}^{n} z_j x_j − θ) for some integers z_1, z_2, …, z_n, θ whose absolute values sum to n^{O(n)}. It follows that the normalized linear polynomial (Σ_{j=1}^{n} z_j x_j − θ)/(Σ_{j=1}^{n} |z_j| + |θ|) agrees in sign with h and approximates it pointwise to error at most 1 − n^{−O(n)}.
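The degree-1 upper bound can be checked mechanically. The sketch below is our own completion of the argument on a hypothetical small halfspace: it computes the pointwise error of the normalized linear approximant and confirms that the error never exceeds 1 − 1/W, where W is the sum of the absolute values of the integer coefficients.

```python
from fractions import Fraction
from itertools import product

def degree1_error(zs, theta):
    """Pointwise error of the normalized linear approximant
    p(x) = (sum_j z_j x_j - theta) / W, with W = sum |z_j| + |theta|,
    against h(x) = sgn(sum_j z_j x_j - theta). As in Muroga's
    representation, the linear form must be nonzero on {0,1}^n."""
    W = sum(abs(z) for z in zs) + abs(theta)
    worst = Fraction(0)
    for x in product([0, 1], repeat=len(zs)):
        form = sum(z * b for z, b in zip(zs, x)) - theta
        assert form != 0  # sgn is well defined on every input
        h = 1 if form > 0 else -1
        worst = max(worst, abs(h - Fraction(form, W)))
    return worst  # at most 1 - 1/W, since |form| is an integer >= 1

# hypothetical example halfspace with integer weights; its linear form
# is nonzero on all 16 inputs, so the assertion above never fires
zs, theta = [3, -5, 2, 7], 1
err = degree1_error(zs, theta)
```

Here W = 18 and the minimum of |Σ z_j x_j − θ| over the cube is 1, so the error is exactly 1 − 1/18 = 17/18, barely below the trivial bound of 1 and in line with the n^{−O(n)} margin for general halfspaces.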

Rational approximation.
We now show that the halfspace h_n constructed in our master theorem cannot be approximated pointwise to any small constant error except by rational functions of degree Ω(n). This degree lower bound matches the trivial upper bound and is a quadratic improvement on the previous best construction [76, 77]. More generally, we derive a lower bound on the approximation of h_n by rational functions of any given degree d, and this lower bound too is essentially the best possible for any halfspace. Details follow.
where the first step corresponds to taking d_0 = d_1 = d in Theorem 4.8, and the second step is immediate from Theorem 2.11. This implies (5.2) for c > 0 small enough.
We now show that the lower bounds on the approximation error in Theorem 5.3 are essentially the best possible for any halfspace. Proof. As already mentioned, Muroga [58] showed that h(x) = sgn(Σ_{j=1}^{n} z_j x_j − θ) for some integers z_1, z_2, …, z_n, θ whose absolute values sum to n^{O(n)}. The claimed upper bounds now follow, where the second step in the corresponding derivation uses Newman's rational approximation (Fact 2.10).

Threshold degree.
Here, we use the halfspace h_n constructed in our master theorem to study the degree required to sign-represent intersections of halfspaces. Our result is a lower bound of Ω(n) for the intersection h_n ∧ h_n of two independent copies of h_n. This result improves quadratically on the previous best construction [76, 77] and matches the trivial upper bound of O(n) for sign-representing any Boolean function in n variables. Proof. Abbreviate D_n = deg_±(h_n ∧ h_n). Taking f = g = h_n in Theorem 2.14 shows that R(h_n, 4D_n) < 1/2, which by Theorem 5.3 forces D_n = Ω(n).
Theorem 5.5 should be contrasted with the result of Beigel et al. [17] that the conjunction of any constant number of majority functions on {0, 1}^n has threshold degree O(log n). We now derive a lower bound of Ω(√n log n) on the threshold degree of the intersection of an explicitly given halfspace and a majority function, improving quadratically on the previous best construction [76, 77]. As we discuss shortly, the new construction is optimal up to a logarithmic factor. Proof. Abbreviate D_n = deg_±(h_n ∧ MAJ_n). Then R(h_n, 4D_n) + R(MAJ_n, 2D_n) < 1 by Theorem 2.14. The lower bounds for the rational approximation of h_n and MAJ_n in Theorems 2.12 and 5.3 now imply that D_n = Ω(√n log n). We will now construct a pair of halfspaces whose intersection has threshold density 2^{Θ(n)}. Prior to our work, the best construction [76] had threshold density 2^{Θ(√n)}. To proceed, we recall a technique due to Krause and Pudlák [48]. We are now in a position to obtain the claimed density results.
where a ⊕ b ∈ {0, 1} denotes as usual the XOR of a and b. Similarly, one has (5.8). It follows from (5.7) and (5.8) that h_n^{KP} ∧ h_n^{KP} is the result of starting with the intersection H_{4n} ∧ H_{4n} of two explicitly given halfspaces in 4n variables each, and replacing their input variables with appropriately chosen parity functions. This replacement cannot increase the threshold density because the parity of several parity functions is another parity function. We conclude that dns(H_{4n} ∧ H_{4n}) = 2^{Ω(n)}. This completes the proof of (5.5).
The proof of (5.6) is closely analogous. Specifically, recall from Theorem 5.6 that h_n ∧ MAJ_n has threshold degree Ω(√n log n). By Theorem 5.8, the function (h_n ∧ MAJ_n)^{KP} = h_n^{KP} ∧ MAJ_n^{KP} has threshold density exp(Ω(√n log n)). It follows from (5.7) and (5.8) that h_n^{KP} ∧ MAJ_n^{KP} is the result of starting with the intersection H_{4n} ∧ MAJ_{4n} for an explicit halfspace H_{4n} in 4n variables, and replacing the input variables with appropriately chosen parity functions or their negations. This replacement cannot increase the threshold density because the parity of several parity functions is another parity function. We conclude that dns(H_{4n} ∧ MAJ_{4n}) = exp(Ω(√n log n)). This completes the proof of (5.6).
Both lower bounds in Theorem 5.9 are essentially the best possible for any halfspace H_n : {0, 1}^n → {−1, +1}. Indeed, the first lower bound is tight by definition, while the second lower bound nearly matches the upper bound of exp(O(√n log² n)) that follows from Remark 5.7.

Communication complexity.
Using the pattern matrix method, we will now "lift" the approximation lower bound of Theorem 5.1 to communication complexity. As a result, we will obtain an explicit separation of k-party communication complexity with unbounded and weakly unbounded error (which for k = 2 is equivalent to a separation of sign-rank and discrepancy). Our application of the pattern matrix method is based on the fact that the unique set disjointness function UDISJ_{m,k} has an exact representation on its domain as a polynomial with a small number of monomials; the representation involves explicitly given reals w_0, w_1, …, w_n.
Proof. Let h_n : {0, 1}^n → {−1, +1} be the halfspace constructed in Theorem 4.8. Then by definition, h_n(x) = sgn p_n(x) for a linear polynomial p_n : R^n → R. Moreover, Theorem 5.1 ensures that (5.14) holds, where the right-hand side features the coordinatewise composition of the polynomial p_n with n independent copies of the polynomial (1 − UDISJ*_{m,k})/2. The identity (5.9) implies that F_{n,k} coincides with h_n ∘ UDISJ_{m,k} on the domain of the latter. Therefore, disc(F_{n,k}) is exponentially small, where the second step in the corresponding calculation uses (5.14) and the pattern matrix method (Theorem 2.21). Applying the discrepancy method (Corollary 2.20), we obtain

PP(F_{n,k}) ≥ log_2 (1 / disc(F_{n,k})) ≥ cn.  (5.17)

To complete the proof, define the functions F_{n,k} for any positive integers n and k by the analogous composition.

5.6. A circulant expander. Consider a d-regular undirected graph G on n vertices, with adjacency matrix A. Since A is symmetric, it has n real eigenvalues (counting multiplicities). We denote these eigenvalues by λ_1(G) ≥ λ_2(G) ≥ ··· ≥ λ_n(G) and define λ(G) = max{|λ_2(G)|, |λ_3(G)|, …, |λ_n(G)|}. It is well known and straightforward to verify that λ_1(G) = d and |λ_i(G)| ≤ d for i = 2, 3, …, n. We say that G is an ε-expander if λ(G) ≤ εd. This spectral notion is intimately related to key graph-theoretic and stochastic properties of G, such as vertex expansion and the convergence rate of a random walk on G to the uniform distribution. One is typically interested in ε-expanders that are d-regular for d as small as possible, where 0 < ε < 1 is a constant. The existence of expanders with strong parameters can be verified using the probabilistic method [6], and explicit constructions are known as well.
In this section, we study the problem of constructing circulant expanders. Formally, a graph is circulant if its adjacency matrix is circulant. It is clear that a circulant graph is d-regular for some d, meaning that every vertex has out-degree d and in-degree d. We focus on circulant graphs that are undirected and have no self-loops, which corresponds to adjacency matrices that are symmetric and have zeroes on the diagonal. It is well known [5] that for any 0 < ε < 1 and all large enough n, there exists a circulant ε-expander on n vertices of degree O(log n). This degree bound is asymptotically optimal [5, 29, 53], and the problem of constructing such circulant expanders explicitly has been studied by several authors [4, 2, 5]. The best construction prior to our work, due to Ajtai et al. [2], achieves degree (log* n)^{O(log* n)} log n. In this section, we construct a circulant ε-expander of optimal degree, O(log n), for any constant 0 < ε < 1. By way of terminology, recall that the adjacency matrix of a circulant graph on n vertices is circ(1_S) for some subset S ⊆ {0, 1, 2, …, n − 1}. With this in mind, we say that an algorithm constructs a circulant graph on n vertices in time T(n) if the algorithm outputs in time T(n) the elements of the associated subset S. The formal statement of our result follows. Proof. Let C be the constant from Theorem 3.7. We first consider the trivial case when 2(C log n)² ≥ n, which means that n is bounded by an explicit constant. In this case, we take G_n to be the complete graph on n vertices. It is clear that G_n is a d-regular circulant graph for d = n − 1. The adjacency matrix of G_n is circ(0, 1, 1, …, 1), whose eigenvalues by Corollary 2.6 are n − 1, −1, −1, …, −1. In particular, λ(G_n) = 1 = d/(n − 1). This settles (5.24), whereas (5.23) holds trivially because d and n are bounded by a constant.
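Because the eigenvalues of a circulant matrix are exponential sums over the connection set, the expansion of a circulant graph can be read off from quantities of exactly the m-discrepancy type. The sketch below is our own illustration; the connection set is a classical example (quadratic residues modulo a prime), not the construction of this section.

```python
import cmath

def circulant_eigenvalues(S, n):
    """Eigenvalues of the circulant adjacency matrix circ(1_S): for each
    k = 0, ..., n-1 the eigenvalue is sum over s in S of omega^(k*s),
    where omega = exp(2*pi*i/n)."""
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(w ** ((k * s) % n) for s in S) for k in range(n)]

# Paley-style example: quadratic residues modulo the prime n = 101.
# Since 101 = 1 (mod 4), -1 is a residue, so S = -S and the graph is
# undirected; Gauss-sum estimates bound every nontrivial eigenvalue in
# absolute value by (1 + sqrt(101))/2 < 6, against degree d = 50.
n = 101
S = sorted({(x * x) % n for x in range(1, n)})
eig = circulant_eigenvalues(S, n)
d = len(S)                          # degree = |S|
lam = max(abs(e) for e in eig[1:])  # largest nontrivial eigenvalue
```

This set gives an ε-expander with ε ≈ 0.11, but its degree is Θ(n); the point of this section is to drive the degree down to the optimal O(log n) while keeping λ(G) ≤ εd.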
This result is a slight generalization of the iteration lemma of Ajtai et al. [2], which corresponds to the special case for m prime. We closely follow their proof but provide ample detail to make it more accessible. We have structured the presentation around five key milestones, corresponding to Sections A.1-A.5 below. Before proceeding, the reader may wish to review the number-theoretic preliminaries in Section 2.2.
A.1. Shorthand notation. In the remainder of this manuscript, we adopt the shorthand e(x) = exp(2πxi), where i is the imaginary unit. We will need the following bounds, illustrated in Figure . To verify these bounds, write |1 − e(x)| = |1 − exp(2πxi)| = √(2 − 2 cos(2πx)) and apply elementary calculus. We let P denote the set of prime numbers p ∈ (P/2, P] with p ∤ m. In this notation, the multiset S is given by

S = {(r + s · (p⁻¹)_m) mod m : p ∈ P, s ∈ S_p, r = 1, 2, …, R},

where (p⁻¹)_m denotes the multiplicative inverse of p modulo m.

(A.4)
A.2. Elements of S are nonzero and distinct. As our first step, we verify that the elements of S are nonzero modulo m. Consider any r ∈ {1, 2, …, R}, any prime p ∈ (P/2, P] with p ∤ m, and any s ∈ S_p. Then pr + s ∈ [1, PR + P − 1] ⊆ [1, m). This means that pr + s ≢ 0 (mod m), which in turn implies that r + s · (p⁻¹)_m ≢ 0 (mod m). We now show that the multiset S contains no repeated elements. For this, consider any r, r′ ∈ {1, 2, …, R}, any primes p, p′ ∈ P, and any s ∈ S_p and s′ ∈ S_{p′} such that r + s · (p⁻¹)_m ≡ r′ + s′ · (p′⁻¹)_m (mod m).
(A.5) Our goal is to show that p = p′, r = r′, s = s′. To this end, multiply (A.5) through by pp′ to obtain r · pp′ + s · p′ ≡ r′ · pp′ + s′ · p (mod m). (A.6) The left-hand side and right-hand side of (A.6) are integers in [1, RP² + (P − 1)P] ⊆ [1, m), whence r · pp′ + s · p′ = r′ · pp′ + s′ · p. (A.7) This implies that p | s · p′, which in view of s < p and the primality of p and p′ forces p = p′. Now (A.7) simplifies to r · p + s = r′ · p + s′, (A.8) which in turn yields s ≡ s′ (mod p). Recalling that s, s′ ∈ {1, 2, …, p − 1}, we arrive at s = s′. Finally, substituting s = s′ in (A.8) gives r = r′.
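The distinctness argument is easy to sanity-check for small parameters. The following sketch uses our own toy parameters, chosen so that RP² + (P − 1)P < m and every prime in (P/2, P] is coprime to m; it builds the multiset S and confirms that its elements are nonzero and pairwise distinct:

```python
# Toy instantiation of S = {(r + s * p^{-1}) mod m}. The parameters are
# hypothetical: m = 211 (prime), P = 7, R = 3, so that
# R*P^2 + (P-1)*P = 147 + 42 = 189 < 211, as the argument above requires.
m, P, R = 211, 7, 3
primes = [5, 7]  # the primes in (P/2, P]; both are coprime to m
S_p = {5: [1, 2, 3, 4], 7: [1, 2, 3, 4, 5, 6]}  # sets S_p, s in {1,...,p-1}
S = [(r + s * pow(p, -1, m)) % m      # pow(p, -1, m): inverse of p mod m
     for p in primes for s in S_p[p] for r in range(1, R + 1)]
```

All (4 + 6) · 3 = 30 elements come out nonzero and distinct, matching the conclusion of Section A.2. (The three-argument `pow` with a negative exponent requires Python 3.8+.)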
A.3. Correlation for k small. So far, we have shown that the elements of S are distinct and nonzero. Recall that our objective is to bound the m-discrepancy of this set. Put another way, we must bound the exponential sum

Σ_{s∈S} e(ks/m)  (A.9)

for all k = 1, 2, …, m − 1. This subsection and the next provide two complementary bounds on (A.9). The first bound, presented below, is preferable when k is close to zero modulo m. We proceed to bound the two summations in (A.11). Bounding the second summation is straightforward: where the first step is valid because all sets S_p have the same cardinality, and the last step uses (A.10).
The other summation in (A.11) requires more work. For p ∈ P and K ∈ {k, k − m}, we have the corresponding estimate, where C ≥ 1 is a constant independent of R, P, m. Moreover, C can be easily calculated from the explicit bounds in Facts 2.3 and 2.4. We will show that the theorem conclusion (A.1) holds with c = 4C². We may assume that the stated inequalities on the parameters hold, since otherwise (A.1) is trivial. This conclusion is equivalent to (A.1). The proof of Theorem 3.6 is complete.