Simulation Theorems via Pseudorandom Properties

We generalize the deterministic simulation theorem of Raz and McKenzie [RM99] to any gadget which satisfies a certain hitting property. We prove that inner-product and gap-Hamming satisfy this property, and as a corollary we obtain a deterministic simulation theorem for these gadgets, where the gadget's input size is logarithmic in the input size of the outer function. This answers an open question posed by Göös, Pitassi and Watson [GPW15]. Our result also implies the previous results for the Indexing gadget, with better parameters than were previously known. A preliminary version of the results obtained in this work appeared in [CKL+17].


Introduction
A very basic problem in computational complexity is to understand the complexity of a composed function f • g in terms of the complexities of the two simpler functions f and g used for the composition. For concreteness, we consider f : {0,1}^p → Z and g : {0,1}^m → {0,1}, and denote the composed function as f • g^p : {0,1}^{mp} → Z; then f is called the outer function and g is called the inner function. The special case of Z being {0,1} and f the XOR function has been the focus of several works [Yao82, Lev87, Imp95, Sha03, LSS08, VW08, She12b], commonly known as XOR lemmas. Another special case is when f is the trivial function that maps each point to itself. This case has also been widely studied in various parts of complexity theory under the names of 'direct sum' and 'direct product' problems, depending on the quality of the desired solution [JRS03, BPSW05, HJMR07, JKN08, Dru12, Pan12, JPY12, JY12, BBCR13, BRWY13a, BRWY13b, BBK+13, BR14, KLL+15, Jai15]. Making progress on even these special cases of the general problem in various models of computation is an outstanding open problem. While no such general theorems are known, there has been some progress in the setting of communication complexity. In this setting the input for g is split between two parties, Alice and Bob. A particular instance of progress from a few years ago is the development of the pattern matrix method by Sherstov [She11] and the closely related block-composition method of Shi and Zhu [SZ09], which led to a series of interesting developments [Cha07, LSS08, CA08, She12a, She13, RY15], resolving several open problems along the way. In both these methods, the relevant analytic property of the outer function is approximate degree. While the pattern-matrix method entailed the use of a special inner function, the block-composition method, further developed by Chattopadhyay [Cha09], Lee and Zhang [LZ10] and Sherstov [She12a, She13], prescribed the inner function to have small discrepancy.
These methods are able to lower bound the randomized communication complexity of f • g p essentially by the product of the approximate degree of f and the logarithm of the inverse of discrepancy of g.
The following simple protocol is suggestive: Alice and Bob try to solve f using a decision-tree (randomized/deterministic) algorithm. Such an algorithm queries the input bits of f frugally. Whenever there is a query, Alice and Bob solve the relevant instance of g by using the best protocol for g. This allows them to progress with the decision-tree computation of f, yielding (informally) an upper bound of M^cc(f • g^p) = O(M^dt(f) · M^cc(g)), where M could be the deterministic or randomized model and M^dt denotes the decision-tree complexity. A natural question is whether the above upper bound is essentially optimal. The case when both f and g are XOR clearly shows that this is not always the case. However, this may just be a pathological case. Indeed, it is natural to study for which models M and which inner functions g the above naive algorithm is optimal.
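The naive simulation can be sketched in a few lines of code. The sketch below (Python; the toy choices of f = majority and g = two-bit XOR are hypothetical, purely for illustration) evaluates f • g^p by answering each decision-tree query with one gadget evaluation, so the total cost is the number of queries times the cost of g.

```python
# Naive simulation: compute (f . g^p)(x, y) by running a decision tree for f
# and answering each query with a protocol for g on the corresponding block.
# Toy (hypothetical) choices: f = majority on p bits, g = XOR on m bits.

def f_majority(bits):
    return int(sum(bits) > len(bits) // 2)

def g_xor(x_block, y_block):
    # "Protocol" for g: in reality Alice and Bob would exchange bits;
    # here we just evaluate g and account one gadget call for it.
    return (sum(x_block) % 2) ^ (sum(y_block) % 2)

def naive_compose(x, y, p, m):
    """Evaluate f . g^p on x, y in {0,1}^{mp}, counting gadget calls.

    A decision tree for majority may need to query every coordinate, so
    the cost is at most (decision-tree depth of f) x (cost of g),
    matching the informal bound M^cc(f . g^p) = O(M^dt(f) * M^cc(g)).
    """
    z, gadget_calls = [], 0
    for i in range(p):  # each "query" of the decision tree for f
        z.append(g_xor(x[i * m:(i + 1) * m], y[i * m:(i + 1) * m]))
        gadget_calls += 1
    return f_majority(z), gadget_calls

x = [1, 0, 1, 1, 0, 0]   # p = 3 blocks of m = 2 bits each
y = [0, 0, 1, 0, 1, 1]
value, calls = naive_compose(x, y, p=3, m=2)
```

Here the tree for majority queries every block, so 3 gadget calls are made and the composed value is the majority of the three XOR answers.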
In a remarkable and celebrated work, Raz and McKenzie [RM99] showed that this naïve upper bound is always optimal for deterministic protocols, when g is the Indexing function (IND), provided the gadget size is polynomially large in p. This theorem was the main technical workhorse of Raz and McKenzie to famously separate the monotone NC hierarchy. The work of Raz and McKenzie was recently simplified and built upon by Göös, Pitassi and Watson [GPW15] to solve a longstanding open problem in communication complexity. In line with [GPW15], we call such theorems simulation theorems, because they explicitly construct a decision-tree for f by simulating a given protocol for f • g p . More recently, de Rezende, Nordström and Vinyals [dRNV16] port the above deterministic simulation theorem to the model of real communication, yielding new trade-offs for the measures of size and space in the cutting planes proof system.
In this work, our main result is the following:

Theorem 1.1. Let Z be any domain, f : {0,1}^p → Z be any function, and let g : {0,1}^n × {0,1}^n → {0,1} be the inner-product function, or any function from the gap-Hamming class of promise-problems, where n = Ω(log p). Then D^cc(f • g^p) = Θ(D^dt(f) · n).

The inner-product function IP_n : {0,1}^n × {0,1}^n → {0,1} is defined as IP_n(x,y) = Σ_{i∈[n]} x_i · y_i, where the summation is taken over the field F_2. Problems in the class of gap-Hamming promise-problems, parameterized by γ and denoted by GH_{n,γ} : {0,1}^n × {0,1}^n → {0,1}, distinguish the case of (x,y) having Hamming distance at least (1/2 + γ)·n from the case of (x,y) having Hamming distance at most (1/2 − γ)·n, for 0 ≤ γ ≤ 1/4. Note that this is the first deterministic simulation theorem with logarithmic gadget size, whereas the Raz-McKenzie simulation theorem requires a polynomial-size gadget. This answers a problem raised by both Göös-Pitassi-Watson [GPW15] and Göös et al.
[GLM + 15] of proving a Raz-McKenzie style deterministic simulation theorem for a different inner function than Indexing with a better gadget size. Moreover, it is not hard to verify that an instance of the function g easily embeds in Indexing by exponentially blowing up the size. This enables us to also re-derive the original Raz-McKenzie simulation theorem for the Indexing function, even attaining significantly better parameters. This improvement in parameters answers a question posed to us recently by Jakob Nordström [Nor16].
The techniques required to prove the deterministic simulation theorem are based on those which appear in [RM99,GPW15]. Our contribution in this part is two-fold. On one hand, we generalize the proof considerably, by singling out a new pseudo-random property of a function g : {0,1}^n × {0,1}^n → {0,1}, which we call "having (δ,h)-hitting rectangle-distributions", and then showing that a simulation theorem holds (D^cc(f • g^p) = Θ(D^dt(f) · h)) for any g with this property. We then show that the inner-product function and the gap-Hamming problem have the above property. This results in a simulation theorem for IP and GH with exponentially smaller gadget size than was previously known. We discuss the pseudo-random property and its connection to gadget size in the next subsection.
It is well known that inner-product has strong pseudo-random properties. In particular it has vanishing discrepancy under the uniform distribution which makes it a good 2-source extractor. In fact, such strong properties of inner-product were recently used to prove simulation theorems for more exotic models of communication by Göös et al. [GLM + 15] and also by the authors and Dvořák [CDK + 17] to resolve a problem with a direct-sum flavor. By comparison, the pseudo-random property we abstract for proving our simulation theorem seems milder. This intuition is corroborated by the fact that we can show that gap-Hamming problems also possess our property, even though we know that these problems have large Ω(1) discrepancy under all distributions. Interestingly, any technique that relies on the inner-function having small discrepancy, such as the block-composition method, will not succeed in proving simulation theorems for such inner gadgets.
We would, at this point, like to point out to the readers that a preliminary version of the results obtained in this paper appeared in [CKLM17].
We remark here that Wu, Yao and Yuen [WYY17] have independently reported a proof of the simulation theorem for the inner-product function, while a draft of this manuscript was already in circulation. Implicit in their proof is the construction of hitting rectangle-distributions for IP, and their construction of these distributions is similar to our own. This suggests that our pseudo-random property is essential to how simulation theorems are currently proven.

Our techniques
The main tool for proving a tight deterministic simulation theorem is the general framework of the Raz-McKenzie theorem as expounded by Göös-Pitassi-Watson [GPW15]. Given an input z ∈ {0,1}^p for f, and wishing to compute f(z), we query the bits of z while simulating (in our head) the communication protocol for f • g^p, on inputs that are consistent with the queries to z we have made thus far. Namely, we maintain a rectangle A × B ⊆ {0,1}^np × {0,1}^np so that for any (x,y) ∈ A × B, g^p(x,y) is consistent with z on all the coordinates that were queried. We progress through the protocol with our rectangle A × B from the root to a leaf. As the protocol progresses, A × B shrinks according to the protocol, and our goal is to maintain the consistency requirement. For that we need that inputs in A × B allow for all possible answers of g on those coordinates which we did not yet query. Hence A × B needs to be rich enough, and we choose a path through the protocol that affects this richness the least. If the protocol forces us to shrink the rectangle A × B so that we may not be able to maintain the richness condition, we query another coordinate of z to restore the richness. Once we reach a leaf of the protocol, we learn a correct answer for f(z), because there is an input (x,y) ∈ A × B on which g^p(x,y) = z (since we preserved consistency) and all inputs in A × B give the same answer for f • g^p. The technical property of A × B that we will maintain, and which guarantees the necessary richness, is called thickness. A × B is thick on the i-th coordinate if for each input pair (x,y) ∈ A × B, even after one gets to see all the coordinates of x and y except for x_i and y_i, the uncertainty about what appears in the i-th coordinate remains large enough so that g(x_i, y_i) can be arbitrary. Let us denote by Ext^i_A(x_1, …, x_{i−1}, x_{i+1}, …, x_p) the set of possible extensions x_i such that (x_1, …, x_p) ∈ A.
We define Ext^i_B(y_1, …, y_{i−1}, y_{i+1}, …, y_p) similarly. If for given x_1, …, x_{i−1}, x_{i+1}, …, x_p and y_1, …, y_{i−1}, y_{i+1}, …, y_p we know that both Ext^i_A(x_1, …, x_{i−1}, x_{i+1}, …, x_p) and Ext^i_B(y_1, …, y_{i−1}, y_{i+1}, …, y_p) are of size at least 2^{(1/2+ε)n}, then for g = IP_n there are extensions x_i ∈ Ext^i_A(x_1, …, x_{i−1}, x_{i+1}, …, x_p) and y_i ∈ Ext^i_B(y_1, …, y_{i−1}, y_{i+1}, …, y_p) such that IP_n(x_i, y_i) = z_i. Hence, we say that A × B is τ-thick if both Ext^i_A(x_1, …, x_{i−1}, x_{i+1}, …, x_p) and Ext^i_B(y_1, …, y_{i−1}, y_{i+1}, …, y_p) are of size at least τ·2^n, for every choice of i and (x_1, …, x_p) ∈ A, (y_1, …, y_p) ∈ B.
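The extension sets and thickness can be computed by brute force for tiny parameters; the following sketch (hypothetical helper names, feasible only for very small n and p) makes the definition concrete.

```python
from itertools import product

def ext(A, i):
    """Map each (p-1)-tuple a_{=i} to its extension set Ext^i_A(a_{=i}):
    the strings x_i that complete a_{=i} to a member of A."""
    exts = {}
    for a in A:
        exts.setdefault(a[:i] + a[i + 1:], set()).add(a[i])
    return exts

def is_thick(A, n, p, tau):
    """A is tau-thick iff every partial tuple that extends at all has
    at least tau * 2^n extensions, at every coordinate i."""
    return all(len(E) >= tau * 2 ** n
               for i in range(p) for E in ext(A, i).values())

# Tiny example with n = 2, p = 2.
strings = [''.join(s) for s in product('01', repeat=2)]
A_full = {(u, v) for u in strings for v in strings}  # 1-thick
A_diag = {(u, u) for u in strings}                   # only (1/4)-thick
```

The full product set is maximally thick (every extension set is all of {0,1}^n), while the diagonal set has singleton extension sets, so it is only 2^{-n}-thick.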
So if we can maintain the thickness of A × B, we maintain the necessary richness of A × B. It turns out that this is indeed possible using the technique of Raz-McKenzie and Göös-Pitassi-Watson. Hence, as we progress through the protocol, we maintain A × B to be τ-thick and dense. Once the density of either A or B drops below a certain level, we are forced to query another coordinate of z. Magically, that restores the density (and thus thickness) of A × B on the coordinates not queried. (An intuitive reason is that if the density of extensions in some coordinate is low, then the density in the remaining coordinates must be large.) We capture the property of the inner function g that allows this type of argument to work, as follows. For δ ∈ (0,1) and an integer h ≥ 1, we say that g has (δ,h)-hitting monochromatic rectangle-distributions if there are two distributions σ_0 and σ_1 where for each c ∈ {0,1}, σ_c is a distribution over c-monochromatic rectangles of g, such that for any rectangle X × Y ⊆ {0,1}^n × {0,1}^n of sufficient size, a rectangle randomly chosen according to σ_c will intersect X × Y with large probability. More precisely, for any c ∈ {0,1} and for any X × Y with |X|, |Y| ≥ 2^{n−h}, a rectangle drawn from σ_c intersects X × Y with probability at least 1 − δ. If such distributions σ_0 and σ_1 exist, we say that g has (δ,h)-hitting monochromatic rectangle-distributions. We then prove the following:

Theorem 1.2. If g has (δ,h)-hitting monochromatic rectangle-distributions, δ < 1/6, and p ≤ 2^{h/2}, then D^cc(f • g^p) = Θ(D^dt(f) · h).

We prove this general theorem and then establish that GH and IP over n bits have (o(1), Ω(n))-hitting rectangle-distributions. This immediately yields Theorem 1.1.
The distribution σ_0 for GH_{n,1/4} is sampled as follows: we first sample a random string x of Hamming weight n/2, and we look at the set of all strings of Hamming weight n/2 which are at Hamming distance at most n/8 from x. Let us call this set U_x. The output of σ_0 will be the rectangle U_x × U_x. The output of σ_1 is U_x × U_x̄, where x̄ is the bit-wise complement of x. For any such x, U_x × U_x will be a 0-monochromatic rectangle and U_x × U_x̄ will be a 1-monochromatic rectangle. Note that if U_x does not hit a subset A of {0,1}^n, then it means that x is at least n/8 Hamming distance away from the set A. By an application of Harper's theorem, we can show that for a sufficiently large set A, the number of strings which are at least n/8 Hamming distance away from A is exponentially small. This implies that both σ_0 and σ_1 will hit a sufficiently large rectangle with probability exponentially close to 1, which is our required hitting property.
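The sampling procedure is straightforward to realize; the sketch below (brute-force enumeration, so only for small n; with n = 16 the radius n/8 = 2 allows one swap of a 1 and a 0) samples U_x and U_x̄ and checks both monochromaticity claims for GH_{n,1/4}.

```python
import random
from itertools import combinations

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def ball_in_slice(x, radius):
    """All strings of the same Hamming weight as x within the given
    Hamming distance of x (swap k ones with k zeros, distance 2k)."""
    n = len(x)
    ones = [i for i in range(n) if x[i] == 1]
    zeros = [i for i in range(n) if x[i] == 0]
    out = set()
    for k in range(radius // 2 + 1):
        for drop in combinations(ones, k):
            for add in combinations(zeros, k):
                y = list(x)
                for i in drop: y[i] = 0
                for i in add: y[i] = 1
                out.add(tuple(y))
    return out

n = 16
x = [1] * (n // 2) + [0] * (n // 2)
random.shuffle(x)                           # random weight-n/2 string
x_bar = [1 - b for b in x]
U = ball_in_slice(tuple(x), n // 8)         # sigma_0 outputs U x U
V = ball_in_slice(tuple(x_bar), n // 8)     # sigma_1 outputs U x V

# Monochromaticity for GH_{n,1/4}: distances <= n/4 inside U x U, and
# >= 3n/4 across U x V, exactly as the triangle inequality predicts.
assert all(hamming(u, w) <= n // 4 for u in U for w in U)
assert all(hamming(u, v) >= 3 * n // 4 for u in U for v in V)
```

For n = 16 the set U_x contains x itself plus the 8 × 8 single-swap neighbours, 65 strings in all.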
The σ_0 distribution for IP_n is picked as follows: to produce a rectangle U × V, we sample uniformly at random a linear subspace V ⊆ F_2^n of dimension n/2, and we set U = V^⊥ to be the orthogonal complement of V. Since a random vector space of size 2^{n/2} hits a fixed subset of {0,1}^n of size 2^{(1/2+ε)n} with probability 1 − O(2^{−εn}), and both U and V are random vector spaces of that size, U × V intersects a given rectangle X × Y with probability 1 − O(2^{−εn}). Hence, we obtain an (O(2^{−εn}), (1/2 − ε)·n)-hitting distribution for IP. For the 1-monochromatic case, we first pick a random a ∈ F_2^n of odd Hamming weight and then pick a random V and U = V^⊥ inside the orthogonal complement of a. The distribution σ_1 outputs the 1-monochromatic rectangle (a + V) × (a + U), and it has the required hitting property.
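A sketch of the σ_0 sampler (Python over F_2, vectors encoded as bitmasks; the orthogonal complement is found by brute force, which is fine for small n) confirms that V × V^⊥ is a 0-monochromatic rectangle for IP_n.

```python
import random

def ip(u, v):
    """Inner product over F_2; vectors encoded as n-bit integers."""
    return bin(u & v).count('1') % 2

def span(basis):
    S = {0}
    for b in basis:
        S |= {s ^ b for s in S}
    return S

def random_subspace(n, d):
    """Random d-dimensional subspace of F_2^n: keep adding random
    vectors that strictly enlarge the span."""
    basis = []
    while len(basis) < d:
        v = random.randrange(1, 1 << n)
        if v not in span(basis):
            basis.append(v)
    return basis

n = 8
basis_V = random_subspace(n, n // 2)
V = span(basis_V)
# Orthogonal complement by brute force (fine for small n):
V_perp = {u for u in range(1 << n) if all(ip(u, b) == 0 for b in basis_V)}

# sigma_0 outputs V x V_perp, a 0-monochromatic rectangle for IP_n,
# and dim(V) + dim(V_perp) = n, so both sides have size 2^{n/2}.
assert all(ip(x, y) == 0 for x in V for y in V_perp)
assert len(V) == len(V_perp) == 2 ** (n // 2)
```

The σ_1 sampler is analogous: pick a of odd weight, build W and its complement inside a^⊥, and shift both by a.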

Organization
Section 2 consists of basic definitions and preliminaries. In Section 3 we prove a deterministic simulation theorem for any gadget admitting a (δ,h)-hitting monochromatic rectangle-distribution: Subsection 3.1 provides some supporting lemmas for the proof, and Subsection 3.2 holds the proof itself. In Section 4 we show that GH_{n,1/4} on n bits has an (o(1), n/100)-hitting rectangle-distribution, and in Section 5 we show that IP on n bits has an (o(1), n/5)-hitting rectangle-distribution.

Basic definitions and preliminaries
A combinatorial rectangle, or just a rectangle for short, is any product A × B, where both A and B are finite sets.

Consider a product set A^p, where A = {0,1}^n for integers n, p ≥ 1. For a set of coordinates I = {i_1 < i_2 < ⋯ < i_k} ⊆ [p] and any a ∈ ({0,1}^n)^p, we let a_I = (a_{i_1}, a_{i_2}, …, a_{i_k}) be the projection of a onto the coordinates in I, and for a set A ⊆ A^p we let A_I = {a_I | a ∈ A}.

Given a′ ∈ A_I and a″ ∈ A^{p−k}, we write a′ ×_I a″ for the p-tuple which agrees with a′ on the coordinates in I and with a″ (in order) on the coordinates in [p] ∖ I; when I = [k] for some k ≤ p, we may omit the set I and write only a′ × a″. For i ∈ [p] and a p-tuple a, a_{=i} denotes a_{[p]∖{i}}, and likewise A_{=i} denotes A_{[p]∖{i}}. For a′ ∈ A_{[p]∖I}, we let Ext^I_A(a′) = {a″ ∈ A^{|I|} | a′ ×_{[p]∖I} a″ ∈ A}, and we call those a″ extensions of a′. Again, if A and I are clear from the context, we may omit them and write only Ext(a′).

Suppose n ≥ 1 is an integer and A = {0,1}^n. For an integer p, a set A ⊆ A^p and a subset S ⊆ A, the restriction of A to S at coordinate i is the set A^{i,S} = {a ∈ A | a_i ∈ S}, and A^{i,S}_I denotes (A^{i,S})_I (we first restrict the i-th coordinate, then project onto the coordinates in I). Clearly A^{i,S}_{=i} is non-empty if and only if S and A_i intersect.

The density of a set A ⊆ A^p will be denoted by α = |A|/|A|^p, and α^{i,S}_I = |A^{i,S}_I|/|A|^{|I|}.

Communication complexity
See [KN97] for an excellent exposition on this topic, which we cover here only very briefly. In the two-party communication model introduced by Yao [Yao79], two computationally unbounded players, Alice and Bob, are required to jointly compute a function F : A × B → Z where Alice is given a ∈ A and Bob is given b ∈ B. To compute F, Alice and Bob communicate messages to each other, and they are charged for the total number of bits exchanged. Formally, a deterministic protocol π : A × B → Z is a binary tree where each internal node v is associated with one of the players; Alice's nodes are labeled by a function a_v : A → {0,1}, and Bob's nodes by b_v : B → {0,1}. Each leaf node is labeled by an element of Z. For each internal node v, the two outgoing edges are labeled by 0 and 1 respectively. The execution of π on the input (a,b) ∈ A × B follows a path in this tree: starting from the root, in each internal node v belonging to Alice, she communicates a_v(a), which advances the execution to the corresponding child of v; Bob does likewise on his nodes, and once the path reaches a leaf node, this node's label is the output of the execution. We say that π correctly computes F on (a,b) if this label equals F(a,b).
To each node v of a deterministic protocol π we associate a set R_v ⊆ A × B comprising those inputs (a,b) which cause π to reach node v. It is easy to see that this set R_v is a combinatorial rectangle, i.e. R_v = A_v × B_v for some A_v ⊆ A and B_v ⊆ B.
The communication complexity of π is the height of the tree. The deterministic communication complexity of F , denoted D cc (F ), is defined as the smallest communication complexity of any deterministic protocol which correctly computes F on every input.
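The rectangle property can be observed directly on a toy protocol (hypothetical one-bit AND example): run the tree on every input pair, collect for each node v the set R_v of inputs that reach it, and check that each R_v equals the product of its projections.

```python
# A deterministic protocol is a binary tree; the set of inputs reaching
# any node is always a combinatorial rectangle. Toy (hypothetical)
# protocol for AND(a, b): Alice sends a; if a = 1, Bob sends b.

class Node:
    def __init__(self, owner=None, children=None, label=None):
        self.owner, self.children, self.label = owner, children, label

leaf_a0 = Node(label=0)                             # Alice sent 0: AND = 0
leaf_b0, leaf_b1 = Node(label=0), Node(label=1)
bob = Node(owner='B', children=[leaf_b0, leaf_b1])  # Bob sends b
root = Node(owner='A', children=[leaf_a0, bob])     # Alice sends a

reached = {}  # node id -> set of inputs (a, b) whose execution visits it
for a in (0, 1):
    for b in (0, 1):
        v = root
        while True:
            reached.setdefault(id(v), set()).add((a, b))
            if v.label is not None:
                assert v.label == (a & b)   # the protocol computes AND
                break
            v = v.children[a if v.owner == 'A' else b]

# Every reached set R_v equals the product of its projections: a rectangle.
for R in reached.values():
    A = {x for x, _ in R}
    B = {y for _, y in R}
    assert R == {(x, y) for x in A for y in B}
```

For instance, the node `bob` is reached exactly on {1} × {0,1}, and the leaf `leaf_a0` on {0} × {0,1}.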

Decision tree complexity
In the (Boolean) decision-tree model, we wish to compute a function f : {0, 1} p → Z when given query access to the input, and are charged for the total number of queries we make.
Formally, a deterministic decision-tree T : {0,1}^p → Z is a rooted binary tree where each internal node v is labeled with a variable number i ∈ [p], each edge is labeled 0 or 1, and each leaf is labeled with an element of Z. The execution of T on an input z ∈ {0,1}^p traces a path in this tree: at each internal node v it queries the corresponding coordinate z_i, and follows the edge labeled z_i. Whenever the algorithm reaches a leaf, it outputs the associated label and terminates. We say that T correctly computes f on z if this label equals f(z).
The query complexity of T is the height of the tree. The deterministic query complexity of f, denoted D^dt(f), is defined as the smallest query complexity of any deterministic decision-tree which correctly computes f on every input.

Functions of interest
The Inner-product function on n bits, denoted IP_n, is defined on {0,1}^n × {0,1}^n by IP_n(x,y) = Σ_{i∈[n]} x_i · y_i mod 2.
For N = 2^n, the Indexing function on N bits, IND_N, is defined on {0,1}^{log N} × {0,1}^N by IND_N(x,y) = y_x, the bit of y at the position indexed by the binary string x. Let n be a natural number and γ = k/n ∈ (0,1/2). For two n-bit strings x and y, let d_H(x,y) = Σ_i x_i ⊕ y_i be their Hamming distance. The gap-Hamming problem, denoted GH_{n,γ}, is the promise-problem defined on {0,1}^n × {0,1}^n by the condition GH_{n,γ}(x,y) = 1 if d_H(x,y) ≥ (1/2 + γ)·n, and GH_{n,γ}(x,y) = 0 if d_H(x,y) ≤ (1/2 − γ)·n (inputs satisfying neither condition are outside the promise).
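For concreteness, the three gadgets can be implemented directly (a small sketch; GH returns None on inputs outside the promise):

```python
def ip(x, y):
    """Inner-product IP_n: <x, y> over F_2."""
    return sum(a & b for a, b in zip(x, y)) % 2

def ind(x, y):
    """Indexing IND_N: x is an index of log N bits, y an N-bit table."""
    return y[int(''.join(map(str, x)), 2)]

def gh(x, y, gamma):
    """Gap-Hamming GH_{n,gamma}: 1 if d_H(x,y) >= (1/2+gamma)n,
    0 if d_H(x,y) <= (1/2-gamma)n, undefined (None) otherwise."""
    n = len(x)
    d = sum(a != b for a, b in zip(x, y))
    if d >= (0.5 + gamma) * n:
        return 1
    if d <= (0.5 - gamma) * n:
        return 0
    return None  # outside the promise

assert ip([1, 0, 1], [1, 1, 1]) == 0
assert ind([1, 0], [0, 1, 1, 0]) == 1
assert gh([0] * 8, [1] * 8, 0.25) == 1
```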

Deterministic simulation theorem
A simulation theorem shows how to construct a decision tree for a function f from a communication protocol for a composition problem f • g^p. Such a theorem can also be called a lifting theorem, if one wishes to emphasize that lower bounds for the decision-tree complexity of f can be lifted to lower bounds for the communication complexity of f • g^p. As mentioned in Section 1, the deterministic lifting theorem proved in [RM99], and subsequently simplified in [GPW15], uses IND_N as the inner function g, with N polynomially larger than p. In this section we will show a deterministic simulation theorem for any function which possesses a certain pseudo-random property, which we will now define. Later we will show that the Inner-product function has this property. We say that a distribution σ over rectangles within A × B is (δ,h)-hitting if for every rectangle X × Y ⊆ A × B with |X| ≥ 2^{−h}·|A| and |Y| ≥ 2^{−h}·|B|, a rectangle drawn from σ intersects X × Y with probability at least 1 − δ.

Definition 3.2. For a real δ ≥ 0 and an integer h ≥ 1, we say that a (possibly partial) function g : A × B → {0,1} has (δ,h)-hitting monochromatic rectangle-distributions if there are two (δ,h)-hitting rectangle-distributions σ_0 and σ_1, where each σ_c is a distribution over rectangles within A × B that are c-monochromatic with respect to g.
The theorem we will prove in Section 3.2 is the following:

Theorem 3.3. Let ε ∈ (0,1) and δ ∈ (0,1/6) be real numbers, and let h ≥ 6/ε and 1 ≤ p ≤ 2^{h(1−ε)} be integers. Let f : {0,1}^p → Z be a function and g : A × B → {0,1} be a (possibly partial) function. If g has (δ,h)-hitting monochromatic rectangle-distributions, then D^dt(f) = O(D^cc(f • g^p)/(ε·h)).

In Section 4 we will show that GH_{n,1/4} has (o(1), n/100)-hitting monochromatic rectangle-distributions. From this we obtain a simulation theorem for GH_{n,1/4}:

Corollary 3.4. Let n be a large enough even integer, and p ≤ 2^{n/200} be an integer. For any function f : {0,1}^p → Z, D^dt(f) = O(D^cc(f • GH^p_{n,1/4})/n).

In Section 5 we will show that IP_n has (o(1), n(1/2 − ε))-hitting monochromatic rectangle-distributions, for any constant ε ∈ (0,1/2). This allows us to derive:

Corollary 3.5. Let n be a large enough integer, ε ∈ (0,1/2) be a constant real, and p ≤ 2^{(1/2−ε)n} be an integer. For any function f : {0,1}^p → Z, D^dt(f) = O(D^cc(f • IP^p_n)/(ε·n)).

This allows us to significantly improve the gadget size known for the simulation theorem of [RM99,GPW15], which uses the Indexing function instead of Inner-product. Indeed, Jakob Nordström [Nor16] recently posed to us the challenge of proving a simulation theorem for f • IND^p_N with a gadget size N smaller than p^3 (p^3 is already a significant improvement over [RM99,GPW15]).
This follows from the above corollary, because of the following reduction: given an instance (a,b) ∈ ({0,1}^np)^2 of f • IP^p_n where p ≤ 2^{n(1/2−ε)}, Alice and Bob can construct an instance of f • IND^p_N where N = 2^n. Bob converts his input b ∈ {0,1}^np to b′ ∈ {0,1}^Np, so that each b′_i = [IP_n(x_1, b_i), …, IP_n(x_N, b_i)], where {x_1, …, x_N} = {0,1}^n is an ordering of all n-bit strings. It is easy to see that IP_n(a_i, b_i) = IND_N(a_i, b′_i). Hence it follows as a corollary to our result for IP:

Corollary 3.6. Let ε ∈ (0,1/2) be a constant real number, and N and p be sufficiently large natural numbers such that p^{2+ε} ≤ N. Then D^dt(f) = O(1/(ε·log N) · D^cc(f • IND^p_N)).

Also, it is worth noting that the proof of Lemma 7 (projection lemma) in [GPW15] implicitly proves that IND_N has an (o(1), (3/20)·log N)-hitting rectangle-distribution. Hence we can also apply Theorem 3.3 directly to obtain a corollary similar to Corollary 3.6 (albeit with a much larger gadget size N).
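The embedding in this reduction is a one-line transformation on Bob's side; the sketch below (hypothetical small n = 4, so N = 16) verifies the identity IP_n(a_i, b_i) = IND_N(a_i, b′_i) on a single block over all input pairs.

```python
from itertools import product

def ip(x, y):
    """Inner-product over F_2 on tuples of bits."""
    return sum(u & v for u, v in zip(x, y)) % 2

n = 4
N = 2 ** n
all_strings = list(product((0, 1), repeat=n))  # an ordering x_1, ..., x_N

def embed(b_block):
    """Bob's conversion: b'_i = [IP_n(x_1, b_i), ..., IP_n(x_N, b_i)]."""
    return [ip(x, b_block) for x in all_strings]

# IP on block (a_i, b_i) equals IND on (index of a_i, embedded table):
for a_block in all_strings:
    for b_block in all_strings:
        table = embed(b_block)
        idx = all_strings.index(a_block)
        assert ip(a_block, b_block) == table[idx]
```

The table b′_i lists the inner product of b_i with every possible Alice block, so indexing by a_i recovers exactly IP_n(a_i, b_i), at the cost of blowing Bob's input up from n to N = 2^n bits.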

Thickness and its properties
For i ∈ [p], let G(A, i) denote the bipartite graph with left vertex set A = {0,1}^n and right vertex set A_{=i}, where x is adjacent to a′ iff x ∈ Ext^i_A(a′); the degree of a right node a′ is thus |Ext^i_A(a′)|. We write d_avg(A, i) = |A|/|A_{=i}| for the average degree of the right nodes, and we say that A is ϕ-average-thick if d_avg(A, i) ≥ ϕ·2^n for every i ∈ [p]. The following property is from [GPW15, Lemma 6].

Lemma 3.9 (Average-thickness implies thickness). For any p ≥ 2, if A ⊆ A^p is ϕ-average-thick, then for every δ ∈ (0,1) there is a (δϕ/p)-thick subset A′ ⊆ A with |A′| ≥ (1 − δ)·|A|.

Proof. The set A′ is obtained by running Algorithm 1.

Algorithm 1
1: Set A^1 = A and j = 1.
2: while there are i ∈ [p] and a right node a′ of G(A^j, i) with non-zero degree less than (δϕ/p)·2^n do
3:   Set A^{j+1} = A^j ∖ {a ∈ A^j | a_{=i} = a′}.
4:   Set j = j + 1.
5: Set A′ = A^j.

The total number of iterations of the algorithm is at most Σ_{i∈[p]} |A_{=i}|. (We remove at least one node in some G(A^j, i) in each iteration which was a node also in the original G(A, i).) By ϕ-average-thickness, this is at most p·|A|/(ϕ·2^n). As the algorithm removes at most (δϕ/p)·2^n elements of A in each iteration, the total number of elements removed from A is at most δ·|A|, so |A′| ≥ (1 − δ)·|A|. Hence, the algorithm always terminates with a non-empty set A′ that must be (δϕ/p)-thick.
Lemma 3.10. Let p ≥ 2 be an integer, i ∈ [p], A ⊆ A^p be a τ-thick set, and S ⊆ A. Then the set A^{i,S}_{=i} is either empty or τ-thick.

Proof. Notice that A^{i,S}_{=i} is non-empty iff S ∩ A_i is non-empty. Consider the case of p ≥ 3. Let a ∈ A, where a_i ∈ S, and set a′ = a_{=i}. For j′ ∈ [p−1], let j = j′ + 1 if j′ ≥ i, and j = j′ otherwise (so that coordinate j′ of a′ is coordinate j of a). Every extension x ∈ Ext^j_A(a_{=j}) of a at coordinate j keeps the i-th coordinate a_i ∈ S intact, and hence gives an extension of a′_{=j′} within A^{i,S}_{=i}; so |Ext^{j′}_{A^{i,S}_{=i}}(a′_{=j′})| ≥ |Ext^j_A(a_{=j})| ≥ τ·2^n, and A^{i,S}_{=i} is τ-thick. To see the case p = 2, assume there is some string a′ ∈ A_{=i} which has some extension a″ ∈ S; but A itself is τ-thick, so there have to be at least τ·2^n many such a′, which will then all be in A^{i,S}_{=i}.
Lemma 3.11. Let h ≥ 1, p ≥ 2 and i ∈ [p] be integers and δ, τ, ϕ ∈ (0,1) be reals, where τ ≥ 2^{−h}. Consider a function g : A × B → {0,1} which has (δ,h)-hitting monochromatic rectangle-distributions. Suppose A × B ⊆ A^p × B^p is a non-empty rectangle which is τ-thick, and suppose also that d_avg(A, i) ≤ ϕ·2^n. Then for any c ∈ {0,1}, there is a c-monochromatic rectangle U × V of g such that |A^{i,U}_{=i}| ≥ (1−3δ)·|A_{=i}| ≥ (1−3δ)·|A|/(ϕ·2^n), |B^{i,V}_{=i}| ≥ (1−3δ)·|B_{=i}|, and both A^{i,U}_{=i} and B^{i,V}_{=i} are τ-thick. The constant 3 in the statement may be replaced by any value greater than 2, so the lemma is still meaningful for δ arbitrarily close to 1/2.
Proof. Fix c ∈ {0,1}. Consider a matrix M whose rows correspond to strings a ∈ A_{=i}, and whose columns correspond to rectangles R = U × V in the support of σ_c. Set each entry M(a,R) to 1 if U ∩ Ext^i_A(a) ≠ ∅, and set it to 0 otherwise.

For each a ∈ A_{=i}, |Ext^i_A(a)| ≥ τ·|A| by thickness, and because σ_c is a (δ,h)-hitting rectangle-distribution and τ ≥ 2^{−h}, we know that if we pick a column R according to σ_c, then M(a,R) = 1 with probability ≥ 1 − δ. So the probability that M(a,R) = 1, over a uniform row a and a σ_c-chosen column R, is ≥ 1 − δ.
Call a column of M A-good if M(a,R) = 1 for at least a (1−3δ)-fraction of the rows a. Now it must be the case that the A-good columns have strictly more than 1/2 of the σ_c-mass: otherwise the columns that are not A-good would carry mass at least 1/2 and each contributes more than 3δ zero-entries, so the probability that M(a,R) = 1 would be < 1 − δ.
A similar argument also holds for Bob's set B =i . Hence, there is a c-monochromatic rectangle R = U × V whose column is both A-good and B-good in their respective matrices. This is our desired rectangle R.
We know that at least a (1−3δ)-fraction of the rows a ∈ A_{=i} satisfy U ∩ Ext^i_A(a) ≠ ∅, and every such row belongs to A^{i,U}_{=i}; hence |A^{i,U}_{=i}| ≥ (1−3δ)·|A_{=i}|, and similarly |B^{i,V}_{=i}| ≥ (1−3δ)·|B_{=i}|. Combined with the lower bound |A_{=i}| ≥ |A|/(ϕ·2^n), which follows from d_avg(A, i) ≤ ϕ·2^n, this gives the claimed sizes. The τ-thickness of A^{i,U}_{=i} and B^{i,V}_{=i} follows from Lemma 3.10.
Lemma 3.12. Let h ≥ 1 and p ≥ 1 be integers, and let δ ∈ (0,1) and τ ≥ 2^{−h} be reals. Suppose g : A × B → {0,1} has (δ,h)-hitting monochromatic rectangle-distributions, and A × B ⊆ A^p × B^p is a non-empty τ-thick rectangle. Then for every z ∈ {0,1}^p there is a pair (a,b) ∈ A × B with g^p(a,b) = z.

Proof. This follows from repeated use of Lemma 3.10. Fix an arbitrary z ∈ {0,1}^p. Set A^(1) = A and B^(1) = B. We proceed in rounds i = 1, …, p−1, maintaining a non-empty τ-thick rectangle A^(i) × B^(i). If we pick U_i × V_i from σ_{z_i}, then the rectangle ((A^(i))_{{i}} ∩ U_i) × ((B^(i))_{{i}} ∩ V_i) will be non-empty with probability ≥ 1 − δ > 0 (because σ_{z_i} is a (δ,h)-hitting rectangle-distribution and τ ≥ 2^{−h}). Fix such U_i and V_i. Set a_i to an arbitrary string in (A^(i))_{{i}} ∩ U_i, and b_i to an arbitrary string in (B^(i))_{{i}} ∩ V_i; note that g(a_i, b_i) = z_i, since U_i × V_i is z_i-monochromatic. Set A^(i+1) = (A^(i))^{i,{a_i}}_{=i} and B^(i+1) = (B^(i))^{i,{b_i}}_{=i}, and proceed to the next round. By Lemma 3.10, A^(i+1) × B^(i+1) is again τ-thick (and non-empty). Eventually, we are left with a rectangle A^(p) × B^(p) where both A^(p) and B^(p) are τ-thick (and non-empty). Again with probability 1 − δ > 0, the z_p-monochromatic rectangle U_p × V_p chosen from σ_{z_p} will intersect A^(p) × B^(p). We again set a_p and b_p to come from the intersection, and set a = (a_1, a_2, …, a_p) and b = (b_1, b_2, …, b_p); then (a,b) ∈ A × B and g^p(a,b) = z.

Proof of the simulation theorem
Now we are ready to prove the simulation theorem (Theorem 3.3). Let ε ∈ (0,1) and δ ∈ (0,1/6) be real numbers, and h ≥ 6/ε and 1 ≤ p ≤ 2^{h(1−ε)} be integers. Let f : {0,1}^p → Z be a function and g : A × B → {0,1} be a (possibly partial) function. Assume that g has (δ,h)-hitting monochromatic rectangle-distributions. We assume we have a communication protocol Π for solving f • g^p, and we will use Π to construct a decision tree (procedure) for f. Let C be the communication cost of the protocol Π. If p ≤ 4C/(εh), the theorem is true trivially; so assume p > 4C/(εh). Set ϕ = 4·2^{−εh} and τ = 2^{−h}. The decision-tree procedure is presented in Algorithm 2. On an input z ∈ {0,1}^p, it uses the protocol Π to decide which bits of z to query.
The algorithm maintains a rectangle A × B ⊆ A^p × B^p and a set I ⊆ [p] of indices; I corresponds to the coordinates of the input z that were not queried yet.

Algorithm 2
1: Set A × B = A^p × B^p, I = [p], and v = the root of Π.
2: while v is not a leaf of Π do
3:   if A_I and B_I are both ϕ-average-thick then
4:     Let v_0, v_1 be the children of v.
5:     Pick c ∈ {0,1} and sets A′ ⊆ A, B′ ⊆ B consistent with the bit c being communicated at v, such that:
6:       A′_I × B′_I is τ-thick, and
7:       |A′_I| · |B′_I| ≥ |A_I| · |B_I| / 4.
8:     Set A × B = A′ × B′ and v = v_c.
9:   else if A_I is not ϕ-average-thick then
10:      Let i ∈ I be such that d_avg(A_I, i) ≤ ϕ·2^n.
11:      Query z_i.
12:      Let U × V be a z_i-monochromatic rectangle of g such that
13:        (A^{i,U})_{I∖{i}} and (B^{i,V})_{I∖{i}} are τ-thick, and    (1)
14:        |(A^{i,U})_{I∖{i}}| ≥ (1−3δ)·|A_I|/(ϕ·2^n) and |(B^{i,V})_{I∖{i}}| ≥ (1−3δ)·|B_I|/2^n.    (2)
15:      Set A × B = A^{i,U} × B^{i,V} and I = I ∖ {i}.
16:   else
17:      Proceed as in lines 10-15 with the roles of A and B exchanged.
18: Output the label of the leaf v.
Correctness. The algorithm maintains the invariant that A_I × B_I is τ-thick. This invariant is trivially true at the beginning. If both A_I and B_I are ϕ-average-thick, the algorithm finds sets A′ and B′ on lines 5-7 as follows. Consider the case that Alice communicates at node v. She is sending one bit. Let A_0 be the inputs from A on which Alice sends 0 at node v, and A_1 = A ∖ A_0. We can pick c ∈ {0,1} such that |(A_c)_I| ≥ |A_I|/2. Set A″ = A_c. Since A_I is ϕ-average-thick, A″_I is ϕ/2-average-thick. So using Lemma 3.9 on A″_I with δ set to 1/2, we can find a subset A′ of A″ such that A′_I is (ϕ/(4|I|))-thick and |A′_I| ≥ |A″_I|/2. (A′ ⊆ A″ will be the pre-image of A′_I obtained from the lemma.) Since ϕ = 4·2^{−εh} and |I| ≤ p ≤ 2^{h(1−ε)}, the set A′_I will be 2^{−h}-thick, i.e. τ-thick. Setting B′ = B, the rectangle A′ × B′ satisfies the properties of lines 6-7. A similar argument holds when Bob communicates at node v.
If A_I is not ϕ-average-thick, the existence of U × V at line 12 is guaranteed by Lemma 3.11; similarly in the case when B_I is not ϕ-average-thick.
Next we argue that the number of queries made by Algorithm 2 is at most 4C/(εh). In the first part of the while loop (lines 3-8), the density of the current A_I × B_I drops by a factor of at most 4 in each iteration. There are at most C such iterations, hence this density can drop by a total factor of at most 4^{−C} = 2^{−2C}. For each query that the algorithm makes, the density of the current A_I × B_I increases by a factor of at least (1−3δ)/ϕ ≥ 1/(2ϕ) ≥ 2^{εh−3} (here we use the fact that δ ≤ 1/6). Since the density can be at most one, the number of queries is upper bounded by 2C/(εh − 3) ≤ 4C/(εh), when h ≥ 6/ε.
Finally, we argue that the label output at the termination of Algorithm 2 is the correct answer f(z). Given an input z ∈ {0,1}^p, whenever the algorithm queries some z_i, it ensures that all input pairs (x,y) in the rectangle A × B satisfy g(x_i, y_i) = z_i, because U × V is always a z_i-monochromatic rectangle of g. At termination, I is the set of indices i such that z_i was not queried by the algorithm. As p > 4C/(εh), I is non-empty. Since A_I × B_I is τ-thick, it follows from Lemma 3.12 that A × B contains some input pair (x,y) such that g^{|I|}(x_I, y_I) = z_I, and so g^p(x,y) = z. Since Π is correct and all inputs in A × B reach the same leaf of Π, the label of that leaf equals f • g^p(x,y) = f(z). This concludes the proof of correctness.
With greater care, the same argument allows δ to be close to 1/2. This would also require tightening the 1 − 3δ factors appearing in Lemma 3.11 to something close to 1 − 2δ. The details are left to the interested reader.

Hitting rectangle-distributions for GH
We construct hitting rectangle-distributions for GH_{n,1/4}. Specifically, we will show (δ,h)-hitting monochromatic rectangle-distributions where h = n/100 and δ = 2^{1−n/100} = o(1). Let ε = 1/8 and let H be the set of all strings in {0,1}^n with Hamming weight n/2. Now consider the rectangle-distributions σ_0 and σ_1 obtained from the following sampling procedure:

• Choose a random string x ∈ H, and let x̄ ∈ H be its bit-wise complement.
• Now let U_x = B_{εn}(x) and V_x = B_{εn}(x̄), where B_r(w) denotes the Hamming ball of radius r around w.
• The output of σ_1 is the rectangle U_x × V_x, and the output of σ_0 is the rectangle U_x × U_x.

For the chosen value of ε, U_x × V_x is a 1-monochromatic rectangle, since for any u ∈ U_x and v ∈ V_x, d_H(u,v) ≥ d_H(x,x̄) − d_H(x,u) − d_H(x̄,v) ≥ n − n/8 − n/8 = (1/2 + 1/4)·n. On the other hand, U_x × U_x is 0-monochromatic, since for any u, u′ ∈ U_x, d_H(u,u′) ≤ d_H(u,x) + d_H(x,u′) ≤ n/4 = (1/2 − 1/4)·n. Both inequalities are obtained by a straightforward application of the triangle inequality.

Lemma 4.1. For large enough n, the distributions σ_0 and σ_1 are (2^{1−n/100}, n/100)-hitting.
To prove Lemma 4.1, we need the following theorem due to Harper. We will call S ⊂ {0,1}^n a Hamming ball with center c ∈ {0,1}^n if B_r(c) ⊆ S ⊂ B_{r+1}(c) for some non-negative integer r. For sets S, T ⊂ {0,1}^n, we define the distance between S and T as d(S,T) = min{d_H(s,t) | s ∈ S, t ∈ T}.

Claim 4.2 (Harper's theorem). For any sets S, T ⊂ {0,1}^n, there exist a Hamming ball S_0 centered at 1̄ and a Hamming ball T_0 centered at 0̄ such that |S_0| = |S|, |T_0| = |T| and d(S_0, T_0) ≥ d(S, T).

Note that Claim 4.2 also tells us when B_r(S) is smallest for a set S ⊂ {0,1}^n. This can be argued in the following way: given a set S ⊆ {0,1}^n, let T_S = {0,1}^n ∖ B_r(S). It is immediate that d(S, T_S) = r + 1. Now let us suppose that S is such that it achieves the smallest |B_r(S′)| among all S′ ⊆ {0,1}^n with |S′| = |S|; this also means that T_S is the biggest such set. Using Harper's theorem, we can find sets S_0 and T_0 such that d(S_0, T_0) ≥ r + 1, where S_0 is a Hamming ball centered around 1̄ and T_0 is a Hamming ball centered around 0̄, with |S_0| = |S| and |T_0| = |T_S|. Now it is easy to see that T_0 ⊆ {0,1}^n ∖ B_r(S_0), i.e., |T_S| = |T_0| ≤ |T_{S_0}|. This means that |B_r(S)| will be the smallest if S is a Hamming ball centered around 1̄. This gives us the following corollary.

Corollary 4.3. For every r and every S ⊆ {0,1}^n, |B_r(S)| ≥ |B_r(S_0)|, where S_0 is the Hamming ball centered at 1̄ with |S_0| = |S|.

Proof of Lemma 4.1. We will show that any set A ⊂ {0,1}^n of size |A| ≥ 2^{(99/100)n} will be hit by U_x with probability ≥ 1 − 2^{−n/100}. The lemma then follows by a union bound over the two sides, since U_x and V_x have the same marginal distribution.
The event U_x ∩ A = ∅ happens exactly when x ∉ B_{εn}(A). By Corollary 4.3, |B_{εn}(A)| is smallest when A is a Hamming ball centered at 1̄; a standard volume estimate for Hamming balls then shows that all but a 2^{−n/100}-fraction of the strings in H lie in B_{εn}(A). Hence Pr_{x∈H}[U_x ∩ A = ∅] ≤ 2^{−n/100}, as required.

Hitting rectangle-distributions for IP
In this section, we will show that IP_n has (4·2^{−n/20}, n/5)-hitting monochromatic rectangle-distributions. This yields the deterministic simulation result for the inner function IP_n stated in Corollary 3.5. We will use the following well-known variant of Chebyshev's inequality:

Proposition 5.1 (Second moment method). Suppose that X_i ∈ [0,1] and X = Σ_i X_i are random variables. Suppose also that for all i ≠ j, X_i and X_j are anti-correlated, in the sense that E[X_i X_j] ≤ E[X_i]·E[X_j]. Then X is well-concentrated around its mean, namely, for every ε > 0: Pr[|X − E[X]| ≥ ε·E[X]] ≤ 1/(ε²·E[X]).

All of the rectangle-distributions rely on the following fundamental anti-correlation property:

Lemma 5.2 (Hitting probabilities of random subspaces). Let 0 ≤ d ≤ n be natural numbers. Fix any v ≠ w in F_2^n, and pick a random subspace V of dimension d. Then the probability p_v that v ∈ V is exactly (2^d − 1)/(2^n − 1) for v ≠ 0 (and p_0 = 1), and the probability p_{v,w} that both v, w ∈ V is exactly (2^d − 1)(2^d − 2)/((2^n − 1)(2^n − 2)) if v, w ≠ 0, p_v if w = 0, and p_w if v = 0. Hence it always holds that p_{v,w} ≤ p_v·p_w.

Proof. The case when v or w is 0 is trivial. The value p_v = Pr[v ∈ V] for a random subspace V of dimension d equals Pr[Mv = 0] for a random full-rank (n−d) × n matrix M, letting V = ker M. For any v ≠ 0, v′ ≠ 0, M will have the same distribution as MN, where N is some fixed linear bijection of F_2^n mapping v to v′; it then follows that p_v = p_{v′} always. But then Σ_{v≠0} p_v = E[|V| − 1] = 2^d − 1, and since all the p_v's are equal, p_v = (2^d − 1)/(2^n − 1). Now let p_{v,w} = Pr[v ∈ V, w ∈ V]. In the same way we can show that p_{v,w} = p_{v′,w′} for all such pairs, since a linear bijection will exist mapping v to v′ and w to w′ (because every pair of distinct non-zero vectors of F_2^n is linearly independent). And now Σ_{v≠w, v,w≠0} p_{v,w} = E[(|V| − 1)(|V| − 2)] = (2^d − 1)(2^d − 2), over (2^n − 1)(2^n − 2) pairs, so the value of p_{v,w} is as claimed. We conclude by estimating (2^d − 2)/(2^n − 2) ≤ (2^d − 1)/(2^n − 1), which gives p_{v,w} ≤ p_v·p_w.

It can now be shown that a random subspace of high dimension will hit a large set with high probability. Let B ⊆ F_2^n be a set of density β = |B|/2^n, let V be a random subspace of dimension d, and for each b ∈ B let X_b be the indicator of b ∈ V, so that |B ∩ V| = Σ_{b∈B} X_b has mean μ. When 0̄ ∉ B, we have μ = |B|·(2^d − 1)/(2^n − 1) ∈ (1 ± 2^{−(ε/2)n})·β·|V|. When 0̄ ∈ B we still have μ ∈ (1 ± 2^{−(ε/2)n})·β·|V|, because the contribution of 0̄ changes μ by at most 1 − (2^d − 1)/(2^n − 1) ≤ 1 ≪ (1/3)·2^{−(ε/2)n}·β·|V|. So this holds in both cases. Since the X_b are anti-correlated by Lemma 5.2, Proposition 5.1 gives Pr[B ∩ V = ∅] ≤ Pr[|Σ_b X_b − μ| ≥ μ] ≤ 1/μ.
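The exact probability p_v in Lemma 5.2 can be checked by exhaustive enumeration for tiny parameters (a sketch with n = 4, d = 2; there are 35 two-dimensional subspaces of F_2^4, and each non-zero vector should lie in a (2^d − 1)/(2^n − 1) = 1/5 fraction of them, i.e. exactly 7).

```python
from itertools import combinations

n, d = 4, 2
vectors = list(range(1, 1 << n))  # non-zero vectors of F_2^n as bitmasks

def span(basis):
    S = {0}
    for b in basis:
        S |= {s ^ b for s in S}
    return S

# Over F_2, any two distinct non-zero vectors are linearly independent,
# so every pair spans a 2-dimensional subspace; dedupe with frozenset.
subspaces = {frozenset(span(pair)) for pair in combinations(vectors, 2)}

# Lemma 5.2 predicts p_v = (2^d - 1)/(2^n - 1) = 3/15 for every v != 0;
# with 35 subspaces in total, each v should appear in exactly 7 of them.
counts = {v: sum(v in S for S in subspaces) for v in vectors}
```

The symmetry argument in the proof (all p_v are equal, and they sum to 2^d − 1) is exactly what this count reflects.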
Proof. We define the distributions σ_0 and σ_1 by the following sampling methods. Sampling from σ_0: we choose a uniformly random (n/2)-dimensional subspace V of F_2^n, and let V^⊥ be its orthogonal complement; output V × V^⊥.
Sampling from σ_1: first we pick a ∈ {0,1}^n uniformly at random, conditioned on a having odd Hamming weight; then we pick a random subspace W of dimension (n−1)/2 from a^⊥, and let W^⊥ be the orthogonal complement of W inside a^⊥. We output U × V, where U = a + W and V = a + W^⊥.
Hence the same probability lower-bounds the probability that R ∩ (A × B) ≠ ∅.