1 Introduction

Information cost was introduced by a series of papers [1, 6, 8, 9, 13] as a complexity measure for two-player communication protocols. Internal information cost measures the amount of information that each player learns about the input of the other player while executing a given protocol. In the usual setting of communication complexity we have two players, Alice and Bob, holding inputs x and y, respectively. Their goal is to determine the value f(x, y) for some predetermined function f. They achieve the goal by communicating to each other some amount of information about their inputs according to some protocol.

The usual measure considered in this setting is the number of bits exchanged by Alice and Bob, whereas the internal information cost measures the amount of information transferred between the players during the communication. Clearly, the amount of information is upper bounded by the number of bits exchanged but not vice versa. There might be a lengthy protocol (say even of exponential size) that reveals very little information about the players’ inputs.

In recent years, a substantial research effort was devoted to proving the converse relationship between the information cost and the length of protocols, i.e., to proving that a protocol which reveals only I bits of information can be simulated by a different protocol which communicates only (roughly) I bits. Such results are known as compression theorems. Barak et al. [1] prove that a protocol that communicates C bits and has internal information cost I can be replaced by another protocol that communicates \(O(\sqrt{I \cdot C} \log C)\) bits. For the case when the inputs of Alice and Bob are sampled from independent distributions they also obtain a protocol that communicates \(O(I \cdot \mathop { polylog} C)\) bits. These conversions do not preserve the number of rounds. A follow-up paper [6] considers a bounded-round setting and gives a technique that converts the original q-round protocol into a protocol with \(O(q \cdot \log I)\) rounds that communicates \(O(I + q\log \frac{q}{\varepsilon })\) bits with additional error \(\varepsilon \).

All known compression theorems are in the randomized setting. We distinguish two types of randomness—public and private. Public random bits are seen by both communicating players, and both players can take actions based on these bits. Private random bits are seen only by one of the parties, either Alice or Bob. We use public-coin (private-coin) to denote protocols that use only public (private) randomness. If a protocol uses both public and private randomness, we call it a mixed-coin protocol.

Simulating a private-coin protocol using public randomness is straightforward: Alice views a part of the public random bits as her private random bits, Bob does the same using some other portion of the public bits, and they communicate according to the original private-coin protocol. This new protocol communicates the same number of bits as the original protocol and computes the same function. In the other direction, an efficient simulation of a public-coin protocol using private randomness is provided by Newman’s Theorem [16]. Sending over Alice’s private random bits to make them public could in general be costly as they may need, e.g., polynomially many public random bits, but Newman showed that it suffices for Alice to transfer only \(O(\log n + \log \frac{1}{\delta })\) random bits to be able to simulate the original public-coin protocol, up to an additional error of \(\delta \).
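For intuition, the quantitative content of Newman's theorem can be recovered from a standard sampling argument; the following calculation is a sketch we add for illustration and is not the proof given in [16]. Sample t seeds \(r_1,\dots ,r_t\) for the public randomness independently. By the Chernoff–Hoeffding bound and a union bound over the at most \(2^{2n}\) input pairs,

$$\begin{aligned} \Pr \left[ \exists (x,y):\ \Bigl |\frac{1}{t}\sum _{i=1}^{t}\mathbf {1}[\pi \text { errs on }(x,y)\text { with seed }r_i]-\Pr _R[\pi \text { errs on }(x,y)]\Bigr |>\delta \right] \le 2^{2n}\cdot 2e^{-2\delta ^2 t}, \end{aligned}$$

which is below 1 already for \(t=O(n/\delta ^2)\); Alice can then point Bob to a good seed using \(\log t = O(\log n + \log \frac{1}{\delta })\) bits.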

In the setting of information cost the situation is quite the opposite. Simulating public randomness by private randomness is straightforward: one of the players sends a part of his private random bits to the other player and then they run the original protocol using these bits as the public randomness. Since the random bits contain no information about either input, this simulation reveals no additional information about the inputs; thus the information cost of the protocol stays the same. This is despite the fact that the new protocol may communicate many more bits than the original one.

However, the conversion of a private-randomness protocol into a public-randomness protocol seems significantly harder. For instance, consider a protocol in which in the first round Alice sends to Bob her input x bit-wise XOR-ed with her private randomness. Such a message does not reveal any information to Bob about Alice’s input—as from Bob’s perspective he observes a random string—but were Alice to reveal her private randomness to Bob, he would learn her complete input x. This illustrates the difficulty in converting private randomness into public.
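The following toy computation (a sketch we add for illustration; the helper names are ours) exhibits both halves of the phenomenon: the XOR-ed message is uniformly distributed whatever Alice's input is, yet the message together with the randomness determines the input.

```python
# Illustration (not from the paper): Alice's message is m = x XOR r,
# where r is a uniformly random private string of the same length as x.
import itertools
from collections import Counter

def message_distribution(x, n):
    """Distribution of m = x XOR r when r is uniform over {0,1}^n."""
    dist = Counter()
    for r in itertools.product([0, 1], repeat=n):
        m = tuple(xi ^ ri for xi, ri in zip(x, r))
        dist[m] += 1
    return {m: c / 2 ** n for m, c in dist.items()}

n = 3
for x in [(0, 0, 0), (1, 0, 1)]:
    d = message_distribution(x, n)
    # Every message has probability exactly 2^-n, regardless of x,
    # so Bob learns nothing from m while r stays private.
    assert all(abs(p - 2 ** -n) < 1e-12 for p in d.values())

# But once the randomness r is made public, m and r together reveal x.
x, r = (1, 0, 1), (0, 1, 1)
m = tuple(xi ^ ri for xi, ri in zip(x, r))
assert tuple(mi ^ ri for mi, ri in zip(m, r)) == x
```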

We will generally call “Reverse Newman’s Theorem” (RNT) a result that makes randomness public in an interactive protocol without revealing more information. This paper is devoted to attacking the following:

RNT  Question Can we take a private-coin protocol with information cost I and convert it into a public-coin protocol with the same behavior and information cost \(\tilde{O}(I)\)?

Interestingly, the known compression theorems [1, 6, 12] give compressed protocols that use only public randomness, and hence as a by-product they give a conversion of private-randomness protocols into public-randomness equivalents. However, the parameters of this conversion are far from the desired ones.Footnote 1 In Sect. 4 we show that the RNT  question represents the core difficulty in proving full compression theorems; namely, we will prove that any public-coin protocol that reveals I bits of information can already be compressed to a protocol that uses \(\tilde{O}(I)\) bits of communication, and hence a fully general RNT  would result in fully general compression results, together with the direct-sum results that would follow as a consequence. This was discovered independently by Denis Pankratov, who in his MSc thesis [17] extended the analysis of the [1] compression schemes to show that they achieve full compression in the case when only public randomness is used. Our compression scheme is similar but slightly different: we discovered it originally while studying the compression problem in a Kolmogorov complexity setting (as in [4]), and our proof for the Shannon setting arises from the proper “translation” of this proof; we include it for completeness and because we think it makes for a more elementary proof.

Main contributions Our main contribution is a Reverse Newman’s Theorem in the bounded-round scenario. We will show that any q-round private-coin protocol can be converted to an O(q)-round public-coin protocol that reveals only additional \(\tilde{O}(q)\) bits of information (Theorem 1). Our techniques are new and interesting. Our main technical tool is a conversion of one round private-randomness protocols into one round public-randomness protocols. This conversion proceeds in two main steps. After discretizing the protocol so that the private randomness is sampled uniformly from some finite domain, we convert the protocol into what we call a 1–1 protocol, which is a protocol having the property that for each input and each message there is at most one choice of private random bits that will lead the players to send that message. We show that such a conversion can be done without revealing too much extra information. In the second step we take any 1–1 protocol and convert it into a public-coin protocol while leaking only a small additional amount of information about the input. This part relies on constructing special bipartite graphs that contain a large matching between the right partition and any large subset of left vertices.

Furthermore, we will prove two compression results for public-randomness protocols: a round-preserving compression scheme to be used in the bounded-round case, and a general (not round-preserving) compression scheme which can be used with a fully general RNT. Either of these protocols achieves much better parameters than those currently available for general protocols (that make use of private randomness as well as public). The round-preserving compression scheme is essentially a constant-round average-case one-shot version of the Slepian–Wolf coding theorem [19], and is interesting in its own right.

As a result of our RNT and our round-preserving compression scheme, we will get a new compression result for general (mixed-coin) bounded-round protocols. Whereas previous results for the bounded-round scenario [6] gave compression schemes with communication complexity similar to our own result, their protocols were not round-preserving. We prove that a q-round protocol that reveals I bits of information can be compressed to an O(q)-round protocol that communicates \(O(I + 1)+q \log (\frac{q n}{\delta })\) bits, with additional error \(\delta \). As a consequence we will also improve the bounded-round direct-sum theorem of [6].

Subsequent work Since the publication of the conference version of the paper [2], the following papers have extended or made use of our results:

  • Braverman et al. [7] have shown direct-product theorems for constant-round randomized communication complexity, which is an improvement of our direct-sum results.

  • Braverman and Garg [3] have devised a shorter proof of a Reverse Newman’s Theorem for constant-round protocols, and with tighter bounds. They show that a private-coin single-round protocol revealing I bits of information can be made public-coin by revealing only \(\log I\) additional bits (a better bound than our \(O(\log 2 n \ell )\) of Theorem 2).

  • Kozachinsky [14] has shown a general Reverse Newman’s Theorem, proving that a private-coin protocol revealing I bits of information and using C bits of communication can be converted into a public-coin protocol revealing \(O(\sqrt{I C})\) bits of information. Together with our and (independently) Pankratov’s compression result for general protocols (Theorem 3), this gives the best-known direct-sum result for general protocols, due to Braverman et al.

  • Bauer et al. [5] show how to compress a protocol with internal entropy \(H^{int}\) and worst-case communication C into a protocol with communication \((\frac{H^{int}}{\varepsilon })^2 \log \log C\) incurring extra error \(\varepsilon \); in the case of public-coin protocols, \(H^{int}\) is exactly the information cost, and hence this gives an exponential improvement for the dependence on C, compared to any of our schemes.

  • Kozachinsky [15] has also provided a simpler proof of the one-shot Slepian–Wolf theorem, with smaller constants.

Differences from the conference version The paper has been substantially altered since its conference version. We provide a new lower bound on the degree of matching graphs, and a lower bound against any improvement to our strategy for proving a single-round Reverse Newman’s Theorem. Furthermore, besides improvements in overall readability, the paper includes new proofs for:

  • Theorem 7 (Constant-round average-case one-shot Slepian–Wolf): the proof in the conference submission was incorrect.

  • Lemma 2 (Making protocols 1–1 without losing information): the new proof is one-third the size and much simpler.

  • Lemma 1 (Existence of matching graphs): we have a shorter, more elegant proof with slightly worse bounds, which are nonetheless good enough for our applications.

Organization of the paper In Sect. 3 we discuss our Reverse Newman’s Theorem. In Sect. 4 we will prove compression results. Section 5 will give applications to direct-sum theorems. Finally, Sect. 6 is dedicated to showing alternatives to the constructions we have presented, as well as bounds that prevent further improvement to our techniques.

2 Preliminaries

We use capital letters to denote random variables, calligraphic letters to denote sets, and lower-case letters to denote elements in the corresponding sets. So typically A is a random variable distributed over the set \({\mathcal {A}}\), and a is an element of \({\mathcal {A}}\). We will also use capital and lower-case letters to denote integers numbering or indexing certain sequences. We use \(\,\varDelta \!\left( A,A'\right) \) to denote the statistical distance between the probability distributions of two random variables A and \(A'\):

$$\begin{aligned} \,\varDelta \!\left( A,A'\right) = \frac{1}{2} \sum _{a \in \mathcal {A}} \left| \Pr [A=a] - \Pr [A' = a]\right| . \end{aligned}$$
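For concreteness, the following small sketch (our own illustration, not part of the original text) computes this quantity for distributions given as dictionaries.

```python
def statistical_distance(p, q):
    """Statistical (total variation) distance between two distributions,
    each given as a dict mapping outcomes to probabilities."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)

# Example: a fair coin and a 3/4-biased coin are at distance 1/4.
assert abs(statistical_distance({0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}) - 0.25) < 1e-12
```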

2.1 Information Theory

For a random variable A distributed over the support \(\mathcal {A}\), its entropy is

$$\begin{aligned} H(A) = \sum _{a\in \mathcal {A}} p_a \log \frac{1}{p_a}, \end{aligned}$$

where \(p_a = \Pr [A = a]\). Given a second random variable B that has a joint distribution with A, the conditional entropy H(A | B) equals

$$\begin{aligned} {{\mathrm{\mathbb {E}}}}_{b \sim B}[ H(A | B = b) ]. \end{aligned}$$

In this paper, and when clear from the context, we denote a conditional distribution \(A | B = b\) more succinctly by A|b.

Fact 1

If A has n possible outcomes then

$$\begin{aligned} H(A)\le \log n. \end{aligned}$$

Fact 2

$$\begin{aligned} H(A|B)\le H(A)\le H(A,B),\quad H(A|B,C)\le H(A|C)\le H(A,B|C). \end{aligned}$$

Fact 3

$$\begin{aligned} H(A,B) =H(A)+H(B|A),\quad H(A,B|C) =H(A|C)+H(B|A,C). \end{aligned}$$

We let \(I(A : B)\) [\(I(A : B|C)\)] denote the Shannon mutual information between A and B (conditional to C):

$$\begin{aligned} I(A:B)&= H(A) - H(A | B) = H(B) - H(B | A),\\ I(A:B|C)&= H(A|C) - H(A | B,C) = H(B|C) - H(B | A,C). \end{aligned}$$

Notice that the analogue of the first inequality in Fact 2 does not hold for Shannon mutual information: \(I(A:B|C)\) may be larger than \(I(A:B)\) (for instance when \(C=A+B\) for independent A and B).
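This remark is easy to check numerically; the sketch below (an illustration we add, with helper names of our own) computes entropies from a joint table and verifies that for independent uniform bits A, B and \(C = A \oplus B\) one has \(I(A:B)=0\) while \(I(A:B|C)=1\).

```python
from math import log2
from collections import defaultdict

def H(joint, coords):
    """Entropy of the marginal of `joint` on the given coordinates."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[i] for i in coords)] += p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

# Joint distribution of (A, B, C): A, B independent uniform bits, C = A xor B.
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

I_AB = H(joint, [0]) + H(joint, [1]) - H(joint, [0, 1])
# I(A:B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C).
I_AB_given_C = H(joint, [0, 2]) + H(joint, [1, 2]) - H(joint, [0, 1, 2]) - H(joint, [2])
assert abs(I_AB) < 1e-12 and abs(I_AB_given_C - 1.0) < 1e-12
```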

Fact 4

(Chain rule)

$$\begin{aligned} I(A_1, \ldots , A_k : B | C) = I(A_1 : B | C) + \sum _{i = 2}^{k} I(A_i : B | C, A_1, \ldots , A_{i-1}) \end{aligned}$$

Here \(A_1,\dots ,A_k\) stands for a random variable in the set of k-tuples and \(A_i\) stands for its \(i\hbox {th}\) projection.

Fact 5

A and B are independent conditional to C (which means that whatever outcome c of C we fix, A and B become independent conditional to the event \(C=c\)) if and only if \(I(A:B|C) = 0\).

Fact 6

If A and B are independent conditional to D then

$$\begin{aligned} I(A:C|B,D) = I(A:B,C|D) \ge I(A:C|D). \end{aligned}$$

If A and B are independent conditional to the pair C, D then

$$\begin{aligned} I(A:C|B,D) \le I(A:B,C|D) = I(A:C|D). \end{aligned}$$

Fact 7

If A and C are independent conditional to the pair BD then

$$\begin{aligned} I(A:B,C|D) =I(A:B|D). \end{aligned}$$

From Fano’s inequality the following easily follows:

Fact 8

For any two random variables A and B over the same universe \({\mathcal {U}}\), it holds that

$$\begin{aligned} |H(A) - H(B)| \le \log (|{\mathcal {U}}|) \,\varDelta \!\left( A,B\right) + 1. \end{aligned}$$

2.2 Two-Player Protocols

We will be dealing with protocols that have both public and private randomness; this is not very common, so we will give the full definitions, which are essentially those of [1, 6]. We will be working exclusively in the distributional setting. From here onwards, we will assume that the input is given to two players, Alice and Bob, by way of two random variables XY sampled from a possibly correlated distribution \(\mu \) over the support \(\mathcal {X}\times {\mathcal {Y}}\).

A private-coin protocol \(\pi \) with output set \(\mathcal {Z}\) is defined as a rooted tree, called the protocol tree, in the following way:

  1. Each non-leaf node is owned by either Alice or Bob.

  2. If v is a non-leaf node belonging to Alice, then:

     (a) The children of v are owned by Bob; each child is labeled with a binary string, and the set \({\mathcal {C}}(v)\) of labels of v’s children is prefix-free.

     (b) Associated with v is a set \(\mathcal {R}_v\), and a function \(M_v: \mathcal {X}\times \mathcal {R}_v \rightarrow {\mathcal {C}}(v)\).

  3. The situation is analogous for Bob’s nodes.

  4. With each leaf we associate an output value in \(\mathcal {Z}\).

On input xy the protocol is executed as follows:

  1. Set v to be the root of the protocol tree.

  2. If v is a leaf, the protocol ends and outputs the value associated with v.

  3. If v is owned by Alice, she picks a string r uniformly at random from \(\mathcal {R}_v\) and sends the label \(M_v(x, r)\) to Bob; they both set v to the child of v labeled \(M_v(x, r)\), and return to the previous step. Bob proceeds analogously on the nodes he owns.

A general, or mixed-coin, protocol is given by a distribution over private-coin protocols. The players run such a protocol by using shared randomness to pick an index r (independently of X and Y) and then executing the corresponding private-coin protocol \(\pi _r\). A protocol is called public-coin if every \(\mathcal {R}_v\) has size 1, i.e., no private randomness is used.

We let \(\pi (x, y, r, r_A, r_B)\) denote the messages exchanged during the execution of \(\pi \), for given inputs xy, and random choices \(r, r_A\) and \(r_B\), and \(\textsc {Out}_\pi (x, y,r, r_A, r_B)\) be the output of \(\pi \) for said execution. The random variable R is the public randomness, \(R_A\) is Alice’s private randomness, and \(R_B\) is Bob’s private randomness; we use \(\varPi \) to denote the random variable \(\pi (X, Y, R, R_A, R_B)\). We assume without loss of generality that \(R, R_A,\) and \(R_B\) are uniformly distributed.

Definition 1

The worst-case communication complexity of a protocol \(\pi \), \(\mathsf {CC}(\pi )\), is the maximum number of bits that can be transmitted in a run of \(\pi \) on any given input and choice of random strings. The average communication complexity of a protocol \(\pi \), with respect to the input distribution \(\mu \), denoted \(\mathsf {ACC}_\mu (\pi )\), is the average number of bits that are transmitted in an execution of \(\pi \), for inputs drawn from \(\mu \). The worst-case number of rounds of \(\pi \), \(\mathsf {RC}(\pi )\), is the maximum depth reached in the protocol tree by a run of \(\pi \) on any given input. The average number of rounds of \(\pi \), w.r.t. \(\mu \), denoted \(\mathsf {ARC}_\mu (\pi )\), is the average depth reached in the protocol tree by an execution of \(\pi \) on input distribution \(\mu \).

Definition 2

The (internal) information cost of protocol \(\pi \) with respect to \(\mu \) is:

$$\begin{aligned} \mathsf {IC}_\mu (\pi ) = I(Y : \varPi , R, R_A| X) + I(X : \varPi ,R, R_B| Y) \end{aligned}$$

Here the term \(I(Y : R, \varPi , R_A| X)\) stands for the amount of information Alice learns about Bob’s input after the execution of the protocol (and the meaning of the second term is similar). This term can be re-written in several different ways:

$$\begin{aligned} I(Y : \varPi , R, R_A| X)&= I(Y : \varPi | X, R, R_A)= I(Y : \varPi ,R | X, R_A),\\ I(Y : \varPi , R, R_A| X)&= I(Y : \varPi ,R | X)=I(Y : \varPi | X,R). \end{aligned}$$

Here the first equality holds, as Bob’s input Y is independent from randomness \(R,R_A\) conditional to X, which is obvious (see Fact 6 from the preliminaries). The second equality holds, since Y is independent from randomness R conditional to \(X,R_A\), which is also obvious.

The third equality holds, as Y is independent from \(R_A\) conditional to \(\varPi ,X,R\) (Fact 7). This independence follows from the rectangle property of protocols: for every fixed \(\varPi ,X,R\) the set of all pairs \(((Y,R_B),R_A)\) producing the transcript \(\varPi \) is a rectangle and thus the pair \((Y,R_B)\) (and hence Y) is independent from \(R_A\) conditional to \(\varPi ,X,R\). The fourth equality is proven similarly to the first and the second ones.

The expressions \(I(Y : \varPi ,R | X)\) and \(I(Y : \varPi | X,R)\) for the information revealed to Alice are the most convenient ones and we will use them throughout the paper. Similar transformations can be applied to the second term in Definition 2.

Definition 3

A protocol \(\pi \) is said to compute function \(f:\mathcal {X}\times \mathcal {Y}\rightarrow \mathcal {Z}\) with error probability \(\varepsilon \) over distribution \(\mu \) if

$$\begin{aligned} {\mathop {\Pr }\limits _{(X,Y)\sim \mu ,\ R, R_A, R_B}} \left[ \textsc {Out}_\pi (X, Y, R, R_A, R_B) = f(X, Y) \right] \ge 1-\varepsilon . \end{aligned}$$

Many of our technical results require that the protocol uses a limited amount of randomness at each step. This should not be surprising—this is also a requirement of Newman’s theorem. This motivates the following definition.

Definition 4

A protocol \(\pi \) is an \(\ell \)-discrete protocol Footnote 2 if \(|\mathcal {R}_v|=2^\ell \) at every node of the protocol tree.

When a protocol is \(\ell \)-discrete, we say that it uses \(\ell \) bits of randomness for each message; when \(\ell \) is clear from context, we omit it. While the standard communication model allows players to use an infinite amount of randomness at each step, this is almost never an issue, since one may always “round the message probabilities” to a finite precision. This intuition is captured in the following observation.

Observation 1

Suppose \(\pi \) is a private-coin protocol. Then, there exists an \(\ell \)-discrete protocol \(\pi '\) with \(\ell = O(\log (|\mathcal {X}|) + \log (|\mathcal {Y}|) + \mathsf {CC}(\pi ))\) such that (i) \(\mathsf {CC}(\pi ') \le \mathsf {CC}(\pi )\), (ii) \(\mathsf {RC}(\pi ') \le \mathsf {RC}(\pi )\), and (iii) for all x, y we have

$$\begin{aligned} \,\varDelta \!\left( \varPi '(x,y,R_A,R_B),\varPi (x,y,R_A,R_B)\right) \le 2^{-\varOmega (\ell )}. \end{aligned}$$

Furthermore, for any input distribution \(\mu \), the error of \(\pi '\) is at most the error of \(\pi \) plus \(2^{-\ell }\). Equally small differences hold between \(\mathsf {ACC}_\mu (\pi ')\), \(\mathsf {ARC}_\mu (\pi ')\), and their \(\pi \) equivalents, and \(\mathsf {IC}_\mu (\pi ')\) is within an additive constant of \(\mathsf {IC}_\mu (\pi )\).

Hence, while working exclusively with discretized protocols, our theorems will also hold for non-discretized protocols, except with an additional exponentially small error term. We consider this error negligible, and hence avoid discussing it beyond this point; the reader should bear in mind, though, that when we say that we are able to simulate a discretized protocol exactly, this will imply that we can simulate any protocol with sub-inverse-exponential \(2^{-\varOmega (\ell )}\) error.
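The rounding behind Observation 1 can be pictured as follows; the sketch below is our own illustration of the idea (rounding each node's message distribution to multiples of \(2^{-\ell }\)) and not the construction used in the formal proof.

```python
def discretize_node(message_probs, ell):
    """Round a node's message distribution to multiples of 2^-ell and return
    a table r -> message realized by 2^ell equally likely random strings.
    Each message probability moves by at most (#messages) * 2^-ell."""
    total = 2 ** ell
    counts = {m: int(p * total) for m, p in message_probs.items()}
    leftover = total - sum(counts.values())     # at most #messages strings
    counts[next(iter(message_probs))] += leftover
    table = []
    for m, c in counts.items():
        table.extend([m] * c)
    return table  # the string r in {0, ..., 2^ell - 1} is mapped to table[r]

table = discretize_node({"a": 0.7, "b": 0.3}, ell=10)
assert len(table) == 2 ** 10
assert abs(table.count("a") / 2 ** 10 - 0.7) <= 2 / 2 ** 10
```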

We are particularly interested in the case of one-way protocols, where Alice sends a single message to Bob. A one-way protocol \(\pi \) is given by a function \(M_\pi : \mathcal {X}\times \mathcal {R}\mapsto \mathcal {M}\); on input x Alice randomly generates r and sends \(M_\pi (x,r)\). Note that if \(\pi \) is private-coin, then \(\mathsf {IC}_\mu (\pi ) = I(X:M(X,R_A)|Y)\), and similarly, if \(\pi \) is public-coin, then \(\mathsf {IC}_\mu (\pi ) = I(X:R, M(X,R)|Y)\).

Finally, we close this section with a further restriction on protocols, which we call 1–1. Proving an RNT  result for 1–1  protocols will be a useful intermediate step in the general RNT  proof.

Definition 5

A one-way protocol \(\pi \) is a 1–1  protocol if \(M_\pi (x,\cdot )\) is 1–1  for all x.

3 Towards a Reverse Newman’s Theorem

Our main result is the following:

Theorem 1

(Reverse Newman’s Theorem, bounded-round version) Let \(\pi \) be an arbitrary, \(\ell \)-discrete, mixed-coin, q-round protocol, and let \(C = \mathsf {CC}(\pi )\), \(n = \max \{\log |\mathcal {X}|,\log |\mathcal {Y}|\}\). Suppose that \(\pi \)’s public randomness R is chosen from the uniform distribution over the set \(\mathcal {R}\), and \(\pi \)’s private randomness \(R_A\) and \(R_B\) is chosen from uniform distributions over the sets \(\mathcal {R}_A\) and \(\mathcal {R}_B\), respectively.

Then there exists a public-coin, q-round protocol \(\tilde{\pi }\), whose public randomness \(R'\) is drawn uniformly from \(\mathcal {R}\times \mathcal {R}_A\times \mathcal {R}_B\), and that has the exact same transcript distribution, i.e., for any input pair xy and any message transcript t,

$$\begin{aligned} \Pr [ \pi (x, y, R, R_A, R_B) = t ] = \Pr [ {\tilde{\pi }}(x, y, R') = t ], \end{aligned}$$

and for any distribution \(\mu \) giving the input (XY),

$$\begin{aligned} \mathsf {IC}_\mu (\tilde{\pi }) \le \mathsf {IC}_\mu (\pi ) + O\left( q \log \left( 2 n \ell \right) \right) . \end{aligned}$$
(1)

We conjecture, furthermore, that a fully general RNT holds:

Conjecture 1

Theorem 1 holds with (1) replaced by

$$\begin{aligned} \mathsf {IC}_\mu (\tilde{\pi }) \le \tilde{O}(\mathsf {IC}_\mu (\pi )), \end{aligned}$$

where \(\tilde{O}(\cdot )\) suppresses terms and factors logarithmic in \(\mathsf {IC}_\mu (\pi )\) and \(\mathsf {CC}(\pi )\).

In Sects. 4 and 5, we show that RNTs  imply fully general compression of interactive communication, and hence the resulting direct-sum theorems in information complexity. This results in new compression and direct-sum theorems for the bounded-round case. We believe that attacking Conjecture 1, perhaps with an improvement of our techniques, is a sound and new approach to proving these theorems.

Before proving Theorem 1 let us first remark that it suffices to show it only for protocols \(\pi \) without public randomness (with an absolute constant in the O-notation). To see this, fix any outcome r of the random variable R, and look at the protocol \(\pi \) conditioned on \(R = r\). This is a protocol without public randomness, let us denote it by \(\pi _r\). Using the expression

$$\begin{aligned} I(X:\varPi |Y,R)+I(Y:\varPi |X,R) \end{aligned}$$

for the information cost of \(\pi \), we see that it equals the average, over r, of the information cost of the protocol \(\pi _r\). Therefore, assuming that we are able to convert each \(\pi _r\) into a public-coin protocol \(\tilde{\pi }_r\), as in Theorem 1, we can let the protocol \(\tilde{\pi }\) pick a random r and then run \(\tilde{\pi }_r\). As the information cost of the resulting protocol \(\tilde{\pi }\) again equals the average, over r, of the information cost of \(\tilde{\pi }_r\), the inequality (1) follows from the corresponding inequalities for \(\pi _r\) and \(\tilde{\pi }_r\). For this reason, the theorems below will be proven for private-coin—rather than mixed-coin—protocols.

As suggested by the \(O(q \log (2 n \ell ))\)-term of (1), Theorem 1 will be derived from its one-way version.

3.1 RNT  for One-Way Protocols

Theorem 2

(RNT  for one-way protocols) For any one-way private-coin \(\ell \)-discrete protocol \(\pi \) there exists a one-way public-coin \(\ell \)-discrete protocol \(\pi '\) such that \(\pi \) and \(\pi '\) generate the same message distributions, and for any input distribution \((X,Y) \sim \mu \), we have

$$\begin{aligned} \mathsf {IC}_\mu (\pi ') \le \mathsf {IC}_\mu (\pi ) + O(\log (2n\ell )), \end{aligned}$$

where \(n=\log |\mathcal {X}|\).

Proof

We first sketch the proof. The public randomness \(R'\) used by the new protocol \(\pi '\) will be the very same randomness R used by \(\pi \). So we seem to have very little room for changing \(\pi \), but actually there is one change that we are allowed to make. Let \(M_\pi : \mathcal {X}\times \mathcal {R}\mapsto \mathcal {M}\) be the function Alice uses to generate her message. It will be helpful to think of \(M_\pi \) as a table, with rows corresponding to possible inputs x, columns corresponding to possible choices of the private random string r, and the (xr) entry being the message \(M_\pi (x, r)\). Noticing that r is picked uniformly, Alice might instead send message \(M(x, \phi _x(r))\), where \(\phi _x\) is some permutation of \(\mathcal {R}\). In other words, she may permute each row in the table using a permutation \(\phi _x\) for the row x. The permutation \(\phi _x\) will “scramble” the formerly-private now-public randomness R into some new string \(\tilde{r} = \phi _x(r)\) about which Bob hopefully knows nothing. This “scrambling” keeps the message distribution exactly as it was, changing only which R results in which message. We will see that this can be done in such a way that, in spite of knowing r, Bob has no hope of knowing \(\tilde{r}= \phi _x(r)\), unless he already knows x to begin with.

To understand what permutation \(\phi _x\) we need, we first note the following. Let \(M'=M_{\pi '}(X,R)\) denote the message that the protocol \(\pi '\) we have to design sends for input X and public randomness R. Then the information cost of \(\pi '\) is

$$\begin{aligned} I(M',R:X|Y). \end{aligned}$$

The information cost of the original protocol \(\pi \) is

$$\begin{aligned} I(M:X|Y)=I(M':X|Y), \end{aligned}$$

where the equality holds as the distributions of the triples (MXY) and \((M',X,Y)\) are identical (regardless of the chosen permutations \(\phi _x\)). Thus the difference between information costs of \(\pi '\) and \(\pi \) equals

$$\begin{aligned} I(M',R:X|Y)-I(M':X|Y)=I(R:X|M',Y), \end{aligned}$$

which is at most \(H(R|M',Y)\). If we permute each row of the table in such a way that every message m appears in at most \(d=(n\cdot \ell )^{O(1)}\) columns, then

$$\begin{aligned} H(R|M',Y)=O(\log n\ell ), \end{aligned}$$

as the entropy of any random variable with at most d outcomes does not exceed \(\log d\). Unfortunately, it may happen that there are no such permutations. For instance, this is the case when a row has the same message m in every column.

We will show that if this is not the case, and, moreover, each row has pairwise different messages, then we can “almost” achieve the goal: one can permute each row in such a way that with probability at least \(1-1/n^2\) the message \(M'=M_{\pi '}(X,R)\) appears in at most \(d=(n\cdot \ell )^{O(1)}\) columns. Thus we first prove Theorem 2 for the special case of 1–1 protocols, i.e. for protocols where each row has pairwise different messages.

The proof of Theorem 2 for 1–1 protocols. We first will construct a special bipartite graph G, which we call a matching graph. Its left nodes will be all possible messages m and its right nodes will be all random strings r. Our strategy will be to find a way of permuting each row of our table so that for every row x and most columns r (in row x) the message \(M_{\pi '}(x,r)\) in the cell (xr) of the table is connected by an edge to r in the graph G.

Definition 6

An \((m,\ell ,d,\delta )\)-matching graph is a bipartite graph \(G=(\mathcal {M}\cup \mathcal {R}, \mathcal {E})\) such that \(|\mathcal {M}|=2^m\), \(|\mathcal {R}| = 2^\ell \), \(\deg (u) = d\) for each \(u \in \mathcal {M}\), and such that for all \(\mathcal {M}' \subseteq \mathcal {M}\) with \(|\mathcal {M}'|=2^\ell \), \(G_{\mathcal {M}'\cup \mathcal {R}}\) has a matching of size at least \(2^\ell (1-\delta )\).

To gain some intuition about what is happening, suppose we had the following fictional object: an \((m, \ell ,n,0)\)-matching graph—i.e., we have a degree-n graph with the property that any left-set of size \(|\mathcal {R}|\) will have a perfect matching with \(\mathcal {R}\) that uses only edges in the graph. Now let \(\mathcal {M}_x = M_\pi (x, \mathcal {R})\) be the set of messages that \(\pi \) can send on input x; then in the new protocol \(\pi '\), \(M_{\pi '}(x,r)\) is the message that is matched with r in the perfect matching between \(\mathcal {M}_x\) and \(\mathcal {R}\) (see Fig. 1). It should be clear that \(\pi '\) gives each message exactly the same probability mass.

Fig. 1  An ideal ‘matching graph’

To see that, in this new protocol \(\pi '\), R reveals little information about X when \(M'\) is known, notice that if we know the message \(m'=M_{\pi '}(x,r)\), then in order to specify r we only need to say which edge in the graph must be followed; this is specified with \(\log n\) bits because our graph has degree n. Hence \(I(X : R | M') \le H(R | M') \le \log n\).

In truth, matching graphs with such good parameters do not exist. But we can have good-enough approximations, and we can show that this is enough for our purposes. These graphs are obtained through the Probabilistic Method.

Lemma 1

For all integers \(\ell \le m\) and every positive \(\delta \) there is an \((m,\ell , d, \delta )\)-matching graph with \(d=O(m/\delta )\).

In Sect. 6.1 we will show that the lemma also holds with \(d = O((m-\ell )/\delta ^2) + \ln (1/\delta )/\delta \) (Lemma 10). That bound has better dependence on \(m,\ell \) (especially when \(m-\ell \ll m\)). However, it has worse dependence on \(\delta \). In Sect. 6.2 we show a lower bound of \(d = \varOmega ((m-\ell )/\delta )\), which almost matches our upper bounds.

Proof

Hall’s theorem [11] states that if in a bipartite graph every left subset of cardinality \(i\le L\) has at least i neighbors then every left subset of cardinality \(i\le L\) has a matching in the graph.

Thus it suffices to construct a bipartite graph having this property for \(L=(1-\delta )2^\ell \). By the union bound, a random graphFootnote 3 of degree d fails to have this property with probability at most

$$\begin{aligned} \sum _{i=1}^L 2^{mi} 2^{\ell i} \left( i/2^\ell \right) ^{di}. \end{aligned}$$

Here \(2^{mi}\) is an upper bound for the number of i-element left subsets \(\mathcal {M}'\), \(2^{\ell i}\) is an upper bound for the number of \((i-1)\)-element right subsets \(\mathcal {R}'\), and \((i/2^\ell )^{di}\) is an upper bound for the probability that all neighbors of \(\mathcal {M}'\) fall into \(\mathcal {R}'\). For \(L=(1-\delta )2^\ell \) this sum is upper bounded by a geometric series

$$\begin{aligned} \sum _{i=1}^{L} \left[ 2^m2^{\ell }(1-\delta )^{d}\right] ^i. \end{aligned}$$

Thus we are done, if the base of this series \(2^m2^{\ell }(1-\delta )^{d}\) is less than 1/2, say, which happens for \(d=O(m/\delta )\). \(\square \)
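The calculation at the end of the proof is easy to sanity-check numerically; the following sketch (ours) just verifies that a degree of order \(m/\delta \) makes the base of the geometric series smaller than 1/2.

```python
from math import ceil, log2

def base_below_half(m, ell, delta, d):
    """Check that 2^m * 2^ell * (1 - delta)^d < 1/2, the condition used at
    the end of the proof of Lemma 1 (a numerical sanity check, not a proof)."""
    return m + ell + d * log2(1 - delta) < -1

m, ell, delta = 20, 10, 0.1
d = ceil(2 * (m + ell + 1) / -log2(1 - delta))   # d = O(m / delta) suffices
assert base_below_half(m, ell, delta, d)
```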

Now the proof of Theorem 2 for 1–1 protocols proceeds as follows. Let \(n = \log |{\mathcal {X}}|\) and \(\ell = \log |{\mathcal {R}}|\). Assume without loss of generality that \({\mathcal {M}} = M({\mathcal {X}}, {\mathcal {R}})\); then \(|{\mathcal {M}}| \le 2^{n+\ell }\). Now let G be an \((n+\ell ,\ell , d, \delta )\)-matching graph having \({\mathcal {M}}\) as a subset of its left set and \({\mathcal {R}}\) as its right set, for \(\delta = \frac{1}{n^2}\). For these parameters, we are assured by Lemma 1 that such a matching graph exists having left-degree \(d = O((n+\ell )n^2)\).

We construct the new protocol \(\pi '\) as follows. For each \(x \in {\mathcal {X}}\) let \({\mathcal {M}}_x = M(x, {\mathcal {R}})\) be the set of messages that might be sent on input x. Noticing that \(|{\mathcal {M}}_x| = 2^\ell \), consider a partial G-matching between \({\mathcal {M}}_x\) and \({\mathcal {R}}\) pairing all but a \(\delta \)-fraction of \({\mathcal {M}}_x\); then define a bijection \(M'_x : {\mathcal {R}} \rightarrow {\mathcal {M}}_x\) by setting \(M'_x(r) = m\) if (mr) is an edge in the matching, and pairing the unmatched m and r’s arbitrarily (possibly using edges not in G). Finally, set \(M'(x, r) = M'_x(r)\).

Since \(M'(x, r) = M'_x(r)\) for some bijection \(M'_x\) between \(\mathcal {R}\) and \({\mathcal {M}}_x\), it is clear that M and \(M'\) generate the same transcript distribution for any input x.
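To make the construction concrete, here is a minimal sketch (our own; the function names are hypothetical) of how one row of the new table can be computed from the matching graph, using a simple augmenting-path algorithm for bipartite matching.

```python
def max_bipartite_matching(left, right, edges):
    """Kuhn's augmenting-path algorithm. `edges` maps each left vertex
    (a message) to the right vertices (random strings) adjacent to it in G."""
    match_right = {}                      # right vertex -> matched left vertex

    def try_augment(u, seen):
        for v in edges.get(u, ()):
            if v in seen:
                continue
            seen.add(v)
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in left:
        try_augment(u, set())
    return {u: v for v, u in match_right.items()}   # left -> right

def scrambled_row(M, x, R, edges):
    """Bijection M'_x : R -> M_x of the 1-1 case: matched pairs follow the
    matching graph G; the few unmatched messages are paired arbitrarily."""
    Mx = sorted({M(x, r) for r in R})     # for a 1-1 protocol, |Mx| = |R|
    matching = max_bipartite_matching(Mx, R, edges)
    new_row = {matching[m]: m for m in Mx if m in matching}
    free_r = [r for r in R if r not in new_row]
    free_m = [m for m in Mx if m not in matching]
    new_row.update(zip(free_r, free_m))
    return new_row                        # maps r to M'(x, r)
```

By the defining property of the matching graph, for each x all but a \(\delta \)-fraction of the messages in \({\mathcal {M}}_x\) are matched along edges of G, which is what the information-cost analysis below relies on.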

Now we prove that \(M'\) does not reveal much more information than M. We have seen that the difference between the information costs of \(\pi '\) and \(\pi \) is at most \(H(R|M',Y)\). Thus it suffices to show that \(H(R|M',Y)\) is at most the logarithm of the left degree of the matching graph plus a constant. As \(H(R|M',Y)\) is the average of \(H(R|M',Y=y)\) over all choices of y, it suffices to show that

$$\begin{aligned} H(R|M',Y=y)\le \log d+3 \end{aligned}$$

for every y. While proving this inequality, we will drop the condition \(Y=y\) to simplify notation.

Let us introduce a new random variable K, which is a function of \(X,R,M'\) and takes the value 1 if \((M',R)\) is an edge of the matching graph and is equal to 0 otherwise. Recall that for every x the pair \((M'(x,R),R)\) is an edge of the matching graph with probability at least \(1-1/n^2\). Therefore, \(K=0\) with probability at most \(1/n^2\). Call a message m bad if the probability that \(K=0\) conditional to \(M'=m\) (that is, the fraction of rows x, among all rows containing m, such that m was not matched within the graph in the row x) is more than 1 / n. Then \(M'\) is bad with probability less than 1 / n, otherwise \(K=0\) would happen with probability greater than \(1/n^2\).

The conditional entropy \(H(R|M')\) is the average of

$$\begin{aligned} H(R|M'=m) \end{aligned}$$

over a randomly chosen m. Notice that \(H(R|M'=m)\) is at most the log-cardinality of \(\mathcal {X}\), because in 1–1 protocols R is a function of the pair \((M',X)\). Thus \(H(R|M'=m)\le n\) for all m, and hence the total contribution of all bad m’s in \(H(R|M')\) is at most 1. Thus it suffices to show that for all good m,

$$\begin{aligned} H(R|M'=m)\le \log d+2. \end{aligned}$$

To this end notice that

$$\begin{aligned} H(R|M'=m)\le H(K|M'=m)+H(R|K,M'=m)\le 1+H(R|K,M'=m). \end{aligned}$$

Thus it is enough to prove that \(H(R|K,M'=m)\le \log d+1\) for all good m. Again, \(H(R|K,M'=m)\) can be represented as the weighted sum of two terms,

$$\begin{aligned} H(R|K=1,M'=m) \quad {\text {and}}\quad H(R|K=0,M'=m). \end{aligned}$$

The former term is at most \(\log d\), because when \(K=1\) and \(M'=m\) we can specify R by the number of the edge (mR) in the matching graph. The latter term is at most n, however its weight is at most 1 / n, since m is good. This completes the proof of Theorem 2 for 1–1 protocols.

The proof of Theorem 2 in the general case. The general case follows naturally from the 1–1 case and the following lemma, which makes a protocol 1–1 by adding a small amount of communication.

Lemma 2

(A 1–1 conversion which reveals little information) Given a one-round \(\ell \)-discrete private-coin protocol \(\pi \), there is a one-round 1–1  \(\ell \)-discrete private-coin protocol \(\pi '\) whose message is of the formFootnote 4

$$\begin{aligned} M_{\pi '}(x, r) = (M_\pi (x, r), J(x, r)), \end{aligned}$$

and such that, for any input distribution \(\mu \),

$$\begin{aligned} \mathsf {IC}_\mu (\pi ') \le \mathsf {IC}_\mu (\pi ) + \log \ell + 1. \end{aligned}$$

Proof

We think of \(M(\cdot , \cdot )\) as a table, where the inputs \(x \in \mathcal {X}\) are the rows and the random choices \(r\in \mathcal {R}\) are the columns, and fix some ordering \(r_1 < r_2 < \ldots \) of \(\mathcal {R}\). The second part \(J(x, r)\) of \(M_{\pi '}\) will be the ordinal number of the message \(M(x, r)\) inside the row x, i.e.,

$$\begin{aligned} J(x, r) = | \{ r' \le r | M(x, r') = M(x, r) \} |. \end{aligned}$$

This ensures that \(M_{\pi '}\) is 1–1.
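Concretely, the ordinal number can be computed as in the following short sketch (ours):

```python
def J(M, x, r, R_sorted):
    """Ordinal number of the message M(x, r) within row x, as in Lemma 2."""
    return sum(1 for rp in R_sorted if rp <= r and M(x, rp) == M(x, r))

# A row with repeated messages: the pair (M, J) is injective as a function of r.
M = lambda x, r: "ab"[r % 2]                       # row x reads a, b, a, b
assert [(M(0, r), J(M, 0, r, range(4))) for r in range(4)] == \
       [("a", 1), ("b", 1), ("a", 2), ("b", 2)]
```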

The difference between the information costs of \(\pi '\) and \(\pi \) is

$$\begin{aligned} I(M,J:X|Y)-I(M:X|Y)=I(J:X|Y,M). \end{aligned}$$

Thus, it suffices to show that for every particular ym we haveFootnote 5

$$\begin{aligned} I(J:X|Y=y,M=m)\le \log \ell + 1. \end{aligned}$$
(2)

Fix any y and m, and drop the conditions \(Y=y,M=m\) to simplify the notation. Obviously, \(I(J:X)=H(J)-H(J|X)\). For any fixed x the random variable J has the uniform distribution over the set \(\{1,2,\dots ,W_x\}\), where \(W_x\) stands for the number of occurrences of the message m in row x of the table.

Let us partition the x’s into \(\ell \) classes: for \(i<\ell \), x is in the \(i\hbox {th}\) class if \(2^{i-1}\le W_x < 2^i\), and x is in the \(\ell \hbox {th}\) class if \(2^{\ell -1}\le W_x \le 2^{\ell }\) (so that in every class \(W_x\le 2^i\)). Let \(Z = Z_{y,m}\) be the class to which X belongs. Its entropy is at most \(\log \ell \) and hence we have

$$\begin{aligned} I(J:X)\le I(J:X|Z)+H(Z)\le I(J:X|Z)+\log \ell . \end{aligned}$$

Thus it suffices to show that for every i we have

$$\begin{aligned} I(J:X|Z=i)\le 1. \end{aligned}$$

Notice that

$$\begin{aligned} H(J|Z=i)\le i, \end{aligned}$$

as for all x in \(i\hbox {th}\) class we have \(W_x\le 2^i\). On the other hand,

$$\begin{aligned} H(J|X,Z=i)\ge i-1, \end{aligned}$$

as for every x in \(i\hbox {th}\) class we have \(W_x\ge 2^{i-1}\) and the distribution of J conditional to \(X=x,Y=y,M=m,Z=i\) is uniform. Thus

$$\begin{aligned} I(J:X|Z=i)=H(J|Z=i)-H(J|X,Z=i)\le i-(i-1)=1. \end{aligned}$$

Now we are able to finish the proof of Theorem 2 in the general case. Suppose \(\pi \) is a given one-way private-coin \(\ell \)-discrete protocol. Let \(\pi _2\) be the 1–1  protocol guaranteed by Lemma 2, and let \(\pi _3\) be the protocol constructed from \(\pi _2\) in the proof of Theorem 2 for the 1–1 case. Since \(\pi _3\) is obtained from \(\pi _2\) by permuting each row, \(\pi _3\)’s message is of the form \(M_{\pi _3}(x,r) = (M_\pi (x,\phi _x(r)), J(x,\phi _x(r)))\) for some permutation \(\phi _x\) of \(\mathcal {R}\), and it is equidistributed with \(M_{\pi _2}\). Furthermore, we have

$$\begin{aligned} \mathsf {IC}_\mu (\pi _3) \le \mathsf {IC}_\mu (\pi ) + O(\log 2n\ell ). \end{aligned}$$

Now, create a protocol \(\pi _4\), which is identical to \(\pi _3\), except that Alice omits the second component \(J(x,\phi _x(r))\) of the message. Since for each x the message \(M_{\pi _4}(x,r)\) sent by \(\pi _4\) equals \(M_\pi (x, \phi _x(r))\) for the permutation \(\phi _x\) of \(\mathcal {R}\), it is clear that \(\pi \) and \(\pi _4\) generate the same message distribution for any input x. And, by the information-processing inequality,

$$\begin{aligned} \mathsf {IC}_\mu (\pi _4) \le \mathsf {IC}_\mu (\pi _3). \end{aligned}$$

This completes the proof of Theorem 2. \(\square \)

3.2 RNT  for Many-Round Protocols

Let us prove Theorem 1 as a consequence of Theorem 2.

Proof

(Proof of Theorem 1) Let c be the constant hidden in the O-notation in Theorem 2, so that every one-round private-coin \(\ell \)-discrete protocol \(\pi \) with \(|\mathcal {X}|,|\mathcal {Y}|\le 2^n\) can be converted into a one-round public-coin protocol \(\pi '\) generating the same distribution on transcripts with

$$\begin{aligned} \mathsf {IC}(\pi ')\le \mathsf {IC}(\pi )+c\log 2n\ell . \end{aligned}$$

We are given a q-round private-coin protocol \(\rho \) and will simulate it by a public-coin protocol \(\rho '\) with

$$\begin{aligned} \mathsf {IC}(\rho ')\le \mathsf {IC}(\rho )+ 2 q c \log 2n\ell . \end{aligned}$$

The transformation of \(\rho \) into \(\rho '\) is as one would expect: at each node v of the protocol tree of \(\rho \) we use a permutation of messages that depends on the input of the player communicating at that node. More specifically, let \(m_{<j}\) denote the concatenation of messages sent by \(\rho '\) up to round j. In the \(j\hbox {th}\) round of \(\rho '\) we apply the protocol \(\rho '_{m_{<j}}\), which is obtained by the transformation of Theorem 2 from the one-round sub-protocol \(\rho _{m_{<j}}\) of \(\rho \) rooted at the node \({m_{<j}}\) of the protocol tree of \(\rho \). This change does not affect the probability distribution over messages sent at each node and hence the resulting protocol \(\rho '\) generates exactly the same distribution on transcripts. The protocol \(\rho '\) uses the same randomness as \(\rho \); however, unlike \(\rho \) it uses public and not private randomness.

We have to relate now the information cost of \(\rho '\) to that of \(\rho \). To this end we split the information cost of \(\rho '\) into the sum of information costs of each round of \(\rho '\). Specifically, by the Chain rule (Fact 4) the amount of information revealed by \(\rho '\) to Bob (say) equals

$$\begin{aligned} I(X:M_1,R_1,\dots ,M_q,R_q|Y)=\sum _jI(X:M_j,R_j|Y,M_{<j},R_{<j}), \end{aligned}$$

where \(R_j\) denotes randomness used in the \(j\hbox {th}\) round of \(\rho '\) and \(M_j=\rho '_{M_{<j}}(X,R_j)\) denotes the message sent in the \(j\hbox {th}\) round of \(\rho '\).

Note that \(I(R_{<j} : M_j, R_j | X, Y, M_{<j}) = 0\): given \(X, Y, M_{<j}\), the fresh randomness \(R_j\) is independent of \(R_{<j}\), and \(M_j\) is a function of \(X\), \(M_{<j}\) and \(R_j\). Using Facts 5 and 6 from the preliminaries together with Theorem 2, we conclude that

$$\begin{aligned} I(X:M_j,R_j|Y,M_{<j},R_{<j})\le I(X:M_j, R_j|Y,M_{<j}) \le I(X:M_j|Y,M_{<j})+c\log 2n\ell , \end{aligned}$$

where \(I(X:M_j|Y,M_{<j})\) on the right-hand side is the amount of information revealed to Bob in the \(j\hbox {th}\) round of the original protocol \(\rho \). Summing this inequality over all \(j=1,\dots ,q\) and applying the Chain rule to \(\rho \), we see that

$$\begin{aligned} I(X:M_1,R_1,\dots ,M_q,R_q|Y)\le I(X:M_1,\dots ,M_q|Y) + q c\log 2n\ell . \end{aligned}$$

The analogous inequality for the amount of information revealed by \(\rho \) and \(\rho '\) to Alice is proved in the same way. \(\square \)

4 Compression for Public-Coin Protocols

We present in this section two results of the following general form: we will take a public-coin protocol \(\pi \) that reveals little information, and “compress” it into a protocol \(\rho \) that uses little communication to perform the same task with about the same error probability. It turns out that the results in this setting are simpler and give stronger compression than in the case where Alice and Bob have private randomness (such as in [1, 6]). We present two bounds, one that is dependent on the number of rounds of \(\pi \), but which is also round-efficient, in the sense that \(\rho \) will not use many more rounds than \(\pi \); and one that is independent of the number of rounds of \(\pi \), but where the compression is not as good when the number of rounds of \(\pi \) is small. We begin with the latter.

Theorem 3

Suppose there exists a public-coin protocol \(\pi \) to compute \(f:\{0,1\}^n\times \{0,1\}^n\rightarrow \mathcal {Z}\) over the distribution \(\mu \) with error probability \(\delta '\), and let \(C = \mathsf {CC}(\pi )\), \(I = \mathsf {IC}_\mu (\pi )\). Then for any positive \(\delta \) there is a public-coin protocol \(\rho \) computing f over \(\mu \) with error \(\delta ' +\delta \), and with \(\mathsf {ACC}_\mu (\rho ) = O( I\cdot \log (2Cn/\delta ))\).

Proof

Our compression scheme is similar, but not identical, to that of [1]—the absence of private randomness allows for a more elementary proof.

It suffices to prove the theorem only for deterministic protocols—the case of public-coin protocols can then be handled as follows. By fixing any outcome r of the public randomness R of a public-coin protocol \(\pi \), we obtain a protocol \(\pi _r\) without public randomness, to which we can apply Theorem 3. The average communication length of the resulting protocol \(\rho _r\) is at most \(O( \mathsf {IC}_\mu (\pi _r)\cdot \log (2Cn/\delta ))\). Thus the average communication of the public-coin protocol \(\rho \) that chooses a random r and runs \(\rho _r\) will be at most \(O( I\cdot \log (2Cn/\delta ))\).

Thus we have to show that any deterministic protocol \(\pi \) can be simulated with communication roughly:

$$\begin{aligned} I(Y : \varPi | X) + I(X : \varPi | Y) = H(\varPi |X) + H(\varPi |Y) \end{aligned}$$

(the equality follows because \(H(\varPi | X, Y) = 0\), since the transcript \(\varPi \) is a function of X and Y). As we do not relate the round complexity of \(\rho \) to that of \(\pi \) in this theorem, we may assume that in the protocol \(\pi \) every message is just a bit (and the turn to communicate does not necessarily alternate). In other words, the protocol tree has binary branching.

Given her input x, Alice knows the distribution of \(\varPi |x\), and she can hence compute the conditional probability \(\Pr [\pi (X, Y) = t | X = x]\) for each leaf t of the protocol tree. We will use the notation \(w_a(t|x)\) for this conditional probability. Likewise Bob computes \(w_b(t|y) = \Pr [\pi (X, Y) = t | Y = y]\). Now it must hold that \(\pi (x, y)\) is the unique leaf t such that both \(w_a(t|x)\) and \(w_b(t|y)\) are positive. Alice and Bob then proceed in stages to find that leaf: at a given stage they have agreed that a certain partial transcript, which is a node in the protocol tree of \(\pi \), is a prefix of \(\pi (x, y)\). Then each of them chooses a candidate transcript, which is a leaf extending their partial transcript (the candidate transcripts of Alice and Bob may be different). Then they find the largest common prefix (lcp) of their two candidate transcripts, i.e., find the first bit at which their candidate transcripts disagree. Now, because one of the players actually knows what that bit should be (that bit depends either on x or on y), the player who got it wrong can change her/his bit to its correct value, and this will give the new partial transcript they agree upon. They proceed this way until they both know \(\pi (x, y)\).

It will be seen that the candidate leaf can be chosen in such a way that the total probability mass under the nodes they have agreed upon halves at every correction, and this will be enough to show that Alice will only need to correct her candidate transcript \(H(\varPi |X)\) times (and Bob \(H(\varPi |Y)\) times) on average. Efficient protocols for finding the lcp of two strings will then give us the required bounds.

We first construct an interactive protocol that makes use of a special device, which we call lcp box. This is a conceptual interactive device with the following behavior: Alice takes a string u and puts it in the lcp box, Bob takes a string v and puts it in the lcp box, then a button is pressed, and Alice and Bob both learn the largest common prefix of u and v. Using an lcp box will allow us to ignore error events until the very end of the proof, avoiding an annoying technicality that offers no additional insight.

Lemma 3

For any given probability distribution \(\mu \) over input pairs and for every deterministic protocol \(\pi \) with information cost I (w.r.t. \(\mu \)) and worst-case communication C there is a deterministic protocol \(\tilde{\rho }\) with zero communication computing the same function with the same error probability (w.r.t. \(\mu \)) as \(\pi \), and using the lcp box for C-bit strings at most I times on average (w.r.t. \(\mu \)).

Proof

On inputs x and y, in the new protocol \(\tilde{\rho }\) Alice and Bob compute weights \(w_a(t|x),w_b(t|y)\) of every leaf of the protocol tree of \(\pi \), as explained above. Furthermore, for every binary string s let \(w_a(s|x)\) denote the sum of weights \(w_a(t|x)\) over all leaves t under s. Define \(w_b(s|y)\) in a similar way.

The protocol \(\tilde{\rho }\) runs in stages: before each stage i Alice and Bob have agreed on a binary string \(s=s_{i-1}\), which is a prefix of \(\pi (x,y)\). Initially \(s=s_0\) is empty.

In stage i Alice defines the candidate transcript \(t_a\) as follows: she appends 0 to \(s=s_{i-1}\) if \(w_a(s0|x)>w_a(s1|x)\) and she appends 1 to s otherwise. Let \(s'\) denote the resulting string. Again, she appends 0 to \(s'\) if \(w_a(s'0|x)>w_a(s'1|x)\) and she appends 1 to \(s'\) otherwise. She proceeds in this way until she gets a leaf of the tree (by construction its weight is positive). Bob defines his candidate transcript \(t_b\) in a similar way. Then they put \(t_a\) and \(t_b\) in the lcp box and they learn the largest common prefix \(s^{*}\) of \(t_a\) and \(t_b\). By construction both \(w_a(s^{*}|x)\) and \(w_b(s^{*}|y)\) are positive and hence \(s^{*}\) is a prefix of \(\pi (x,y)\).Footnote 6 Recall that no leaf of the protocol tree is a prefix of another leaf. Therefore either \(s^{*}=t_a=t_b\), in which case they stop the protocol, as they both know \(\pi (x,y)\); or \(s^{*}\) is a proper prefix of both \(t_a\) and \(t_b\). If the node \(s^{*}\) of the protocol tree belongs to Alice, then Bob’s next bit is incorrect, and otherwise Alice’s next bit is incorrect. They both add the correct bit to \(s^{*}\) and let \(s_i\) be the resulting string.

Each time Alice’s bit is incorrect \(w_a(s|x)\) decreases at least by a factor of 2, and similarly each time Bob’s bit is incorrect \(w_b(s|y)\) decreases at least by a factor of 2. At the start we have \(w_a(s|x)=w_b(s|y)=1\) and at the end we have \(w_a(s|x)=w_a(\pi (x,y)|x)\) and \(w_b(s|y)=w_b(\pi (x,y)|y)\). Hence they use the lcp box at most

$$\begin{aligned} \log 1/ w_a(\pi (x,y)|x)+\log 1/ w_b(\pi (x,y)|y) \end{aligned}$$

times. By definition of the conditional entropy the average of \(\log 1/ w_a(\pi (X,Y)|X)\) is equal to \(H(\varPi |X)\) and the average of \(\log 1/ w_b(\pi (X,Y)|Y)\) equals \(H(\varPi |Y)\). Thus Alice and Bob use lcp box at most I times on average. \(\square \)
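For concreteness, the stage-by-stage correction procedure of Lemma 3 can be sketched as follows (our illustration, with an idealized lcp box; in the actual protocol the box is replaced by the hashing protocol of Lemma 4 below).

```python
def simulate_with_lcp_box(pi_tree, w_a, w_b, x, y):
    """Sketch of the protocol of Lemma 3. pi_tree(s) returns 'A' or 'B' for an
    internal node s and None for a leaf; w_a(s, x) and w_b(s, y) are the
    conditional weights of the leaves below s."""
    def candidate(s, w, z):               # greedily follow the heavier child
        while pi_tree(s) is not None:
            s += "0" if w(s + "0", z) > w(s + "1", z) else "1"
        return s

    def lcp(u, v):                        # idealized lcp box
        i = 0
        while i < min(len(u), len(v)) and u[i] == v[i]:
            i += 1
        return u[:i]

    s, uses = "", 0
    while True:
        t_a, t_b = candidate(s, w_a, x), candidate(s, w_b, y)
        s_star = lcp(t_a, t_b)
        uses += 1
        if s_star == t_a == t_b:
            return s_star, uses           # both players now know pi(x, y)
        # No leaf is a prefix of another leaf, so s_star is a proper prefix of
        # both candidates; the owner of node s_star knows the correct next bit.
        owner = pi_tree(s_star)
        correct_bit = t_a[len(s_star)] if owner == "A" else t_b[len(s_star)]
        s = s_star + correct_bit
```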

Now we have to transform the protocol of Lemma 3 to a randomized public-coin protocol computing f that does not use an lcp box, with additional error \(\delta \). The use of an lcp box can be simulated with an error-prone implementation:

Lemma 4

([10]) For every positive \(\varepsilon \) and every natural C there is a randomized public-coin protocol such that on input two C-bit strings x, y, it outputs the largest common prefix of x and y with probability at least \(1 - \varepsilon \); its worst-case communication complexity is \(O(\log (C/\varepsilon ))\).

The lemma is proven by hashing (as in the randomized protocol for equality) and binary search. From this lemma we obtain the following corollary.
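As an illustration of the hashing-plus-binary-search idea (our sketch, not the protocol of [10]): one can binary-search for the first disagreement, deciding whether two prefixes agree by comparing a short shared-randomness hash of them. This simple variant exchanges one \(O(\log (C/\varepsilon ))\)-bit hash per step, i.e. \(O(\log C\cdot \log (C/\varepsilon ))\) bits in total, which is weaker than the bound of Lemma 4.

```python
import random

def lcp_via_hashing(u, v, bits=32, seed=0):
    """Binary search for the longest common prefix of u and v, comparing
    prefixes through a random linear hash drawn from shared randomness."""
    rng = random.Random(seed)                         # shared public coins
    coeffs = [rng.getrandbits(bits) for _ in range(max(len(u), len(v)))]

    def hash_prefix(s, i):                            # one hash exchanged per step
        return sum(c * ord(ch) for c, ch in zip(coeffs, s[:i])) % (2 ** bits)

    lo, hi = 0, min(len(u), len(v))                   # prefixes of length lo agree (whp)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if hash_prefix(u, mid) == hash_prefix(v, mid):
            lo = mid
        else:
            hi = mid - 1
    return u[:lo]

assert lcp_via_hashing("010110", "010011") == "010"
```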

Lemma 5

For every positive \(\delta \) any protocol \(\tilde{\rho }\) to compute \(f:\{0,1\}^n\times \{0,1\}^n\rightarrow \mathcal {Z}\) that uses an lcp box \(\ell \le 2n\) times on average for strings of length at most C can be simulated with error \(\delta \) by a protocol \(\rho \) that does not use an lcp box, and communicates \(O(\ell \log (\frac{2Cn}{\delta }))\) bits more on average.Footnote 7

Proof

The protocol \(\rho \) simulates \(\tilde{\rho }\) by replacing each use of the lcp box with the protocol given by Lemma 4 with some error parameter \(\varepsilon \) (to be specified later). The simulation continues while the total communication is less than n. Once it becomes n, we stop the simulation and both players exchange their inputs.

Notice that the additional error probability introduced by the failure of the protocol of Lemma 4 is at most \(\varepsilon \ell \): for each input pair (xy) the error probability is at most \(\varepsilon i(x,y)\), where i(xy) stands for the number of times we invoke lcp box for that particular pair, and the average of \(\varepsilon i(x,y)\) over (xy) equals \(\varepsilon \ell \). Thus if we take \(\varepsilon \le \delta /\ell \), the error probability introduced by failures of the lcp box is at most \(\delta \).

Each call of the lcp box costs \(O(\log (C/\varepsilon ))\) bits. Thus the communication of \(\rho \) is at most

$$\begin{aligned} O(\ell \log (C/\varepsilon ))+(\ell \varepsilon )(3n) \end{aligned}$$

more on average than that of \(\tilde{\rho }\). Here the first term is an upper bound for the average communication over all triples (xy,  randomness for lcp box) such that no lcp failure occurs and the second term accounts for the communication over all remaining triples.

Let \(\varepsilon =\delta /2n\) (which is less than \(\delta /\ell \), as we assume that \(\ell \le 2n\)) so that the average communication is at most \(O(\ell \log (\frac{2Cn}{\delta })+\ell \delta )=O(\ell \log (\frac{2Cn}{\delta }))\).

We are now able to finish the proof of Theorem 3. Notice that the information cost of the initial protocol is at most 2n. Hence we can apply Lemma 5 for \(\ell =I\) to the protocol of Lemma 3. The average communication of the resulting protocol \(\rho \) is at most \(O( I\cdot \log (2Cn/\delta ))\). \(\square \)

The proof of Theorem 3 offers no guarantee on the number of rounds of the compressed protocol \(\rho \). It is possible to compress a public-coin protocol on a round-by-round basis while preserving, up to a multiplicative constant, the total number of rounds used.

Theorem 4

Suppose there exists a public-coin protocol \(\pi \) to compute \(f:\{0,1\}^n\times \{0,1\}^n\rightarrow \mathcal {Z}\) over input distribution \(\mu \) with error probability \(\delta '\), and let \(I = \mathsf {IC}_\mu (\pi )\) and \(q = \mathsf {RC}(\pi ).\) Then for every positive \(\delta \) there exists a public-coin protocol \(\rho \) that computes f over \(\mu \) with error \(\delta ' + \delta \), and with \(\mathsf {ACC}_\mu (\rho ) = O(I + 1)+q\log (nq/\delta )\) and \(\mathsf {ARC}_\mu (\rho ) = O(q)\).

Proof

Again it suffices to prove the theorem for deterministic protocols \(\pi \). The idea of the proof is to show the result one round at a time. In round i, Alice, say, must send a certain message \(m_i\) to Bob. From Bob’s point of view, this message is drawn according to the random variable \(M_i = M_i(\tilde{X}, y, m_1, \ldots , m_{i-1})\) where \(\tilde{X}\) is Alice’s input conditioned on Bob’s input being y and on the messages \(m_1, \ldots , m_{i-1}\) that were previously exchanged. We will show that there is a sub-protocol \(\sigma _i\) that can simulate round i with small error by using constantly-many rounds and with

$$\begin{aligned} O(H(M_i|y, m_1, \ldots , m_{i-1})) = O(I(X : M_i | y, m_1, \ldots , m_{i-1})) \end{aligned}$$

bits of communication on average. Then putting these sub-protocols together, and truncating the resulting protocol whenever the communication is excessive, we obtain the protocol \(\rho \) which simulates \(\pi \). \(\square \)

The procedure to compress each round is achieved through an interactive variant of the Slepian–Wolf theorem [4, 18, 19]. We could not apply the known theorems directly, however, since they were made to work in different settings.

In a similar fashion to the proof of Theorem 3, we will make use of a special interactive device, which we call a transmission \(\mu \)-box, where \(\mu \) is a probability distribution over input pairs (X, Y). Its behavior is as follows: one player puts a string x into the box, the other player puts a string y into it, a button is pressed, and the second player then knows x. The use of a transmission \(\mu \)-box is charged in such a way that, when the input pair (X, Y) is drawn at random with respect to \(\mu \), the average cost is \(O(H(X|Y) +1)\) bits of communication and O(1) rounds.
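For concreteness, the following is a minimal sketch (ours, not from the paper; the names TransmissionBox and cond_entropy are hypothetical) of how such a box and its charging rule can be modeled: the box simply hands x to the second player, and every use is charged the average cost \(H(X|Y)+1\) bits and one round.

```python
import math
from collections import defaultdict

def cond_entropy(mu):
    """H(X|Y) in bits for a joint distribution mu: dict mapping (x, y) -> probability."""
    p_y = defaultdict(float)
    for (x, y), p in mu.items():
        p_y[y] += p
    return sum(p * math.log2(p_y[y] / p) for (x, y), p in mu.items() if p > 0)

class TransmissionBox:
    """Idealized transmission mu-box: it reveals x to the player holding y.

    Each use is charged the *average* cost H(X|Y) + 1 bits and one round,
    which is how the box is accounted for before Lemma 7 replaces it by a
    real protocol."""
    def __init__(self, mu):
        self.mu = mu
        self.bits_charged = 0.0
        self.rounds_charged = 0

    def transmit(self, x, y):
        self.bits_charged += cond_entropy(self.mu) + 1
        self.rounds_charged += 1
        return x  # the second player now knows x

# Toy usage: X is strongly correlated with Y, so H(X|Y) is small.
mu = {(0, 0): 0.45, (1, 0): 0.05, (1, 1): 0.45, (0, 1): 0.05}
box = TransmissionBox(mu)
assert box.transmit(0, 0) == 0
print(round(box.bits_charged, 3))  # approximately H(X|Y) + 1 = 1.469
```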

Lemma 6

Let \(\pi \) be any deterministic q-round protocol, and let \(\mu \) be the distribution of the inputs (X, Y). Then there exists a deterministic protocol \(\tilde{\rho }\) that makes use of the transmission box (each time for a different distribution) to achieve the following properties.

  1. The average communication of \(\tilde{\rho }\) is \(\mathsf {ACC}_\mu (\tilde{\rho }) = O(\mathsf {IC}_\mu (\pi ) + q)\);

  2. The average number of rounds of \(\tilde{\rho }\) is \(\mathsf {ARC}_\mu (\tilde{\rho }) = O(q)\);

  3. \(\tilde{\rho }\) uses a transmission box q times; and

  4. After \(\tilde{\rho }\) is run on the inputs x, y, both players know \(\pi (x, y)\).

Proof

Let \(\pi _{<j}(x, y)\) denote the sequence of messages sent by \(\pi \) in the first \(j-1\) rounds on inputs x, y. The protocol \(\tilde{\rho }\) simulates \(\pi \) on a round-by-round basis.

Assume that in the new protocol the first \(j-1\) rounds have been played. Let \(m_{<j}\) denote the sequence of \(j-1\) messages sent so far and let x, y stand for the inputs. Assume further that in the \(j\)th round of \(\pi \) it is Alice's turn to communicate. Her message is a function M of the sequence \(m_{<j}\) and her input x. Let \(\nu \) denote the probability distribution on pairs (m, y) where

$$\begin{aligned} \nu (m,y)=\Pr [M(X,m_{<j})=m,\ Y=y | \pi _{<j}(X, Y)=m_{<j}]. \end{aligned}$$

In round j of protocol \(\tilde{\rho }\), Alice puts the string \(M(x,m_{<j})\) into the transmission \(\nu \)-box and Bob puts his input y there and they press the button. If it is Bob’s turn to communicate, then they reverse their positions.

Items 2, 3 and 4 from the statement of the Lemma follow from the construction of \(\tilde{\rho }\) and from the description of the transmission box. It remains to bound the average communication length of \(\tilde{\rho }\). By the assumption on the transmission box, the average communication in round j is at most \(O(I_j+1)\), where

$$\begin{aligned} I_j=H(M(X,\pi _{<j}(X, Y))|Y, \pi _{<j}(X, Y)), \end{aligned}$$

if it is Alice’s turn to communicate and

$$\begin{aligned} I_j=H(M(Y,\pi _{<j}(X, Y))|X, \pi _{<j}(X, Y)), \end{aligned}$$

otherwise. From the chain rule (Fact 4) it follows that the sum of \(I_j\) over all j of the first type is equal to \(I(\varPi :X|Y)\), while the sum of \(I_j\) over all j of the second type is equal to \(I(\varPi :Y|X)\). Hence the average communication of \(\tilde{\rho }\) is at most \(O\bigl (\sum _j (I_j+1)\bigr ) = O(\mathsf {IC}_\mu (\pi ) + q)\). \(\square \)

To proceed we need a protocol simulating the transmission box.

Lemma 7

(Constant-round average-case one-shot Slepian–Wolf) Let \(\mu \) be the distribution of the inputs (X, Y). For every positive \(\varepsilon \) there is a public-coin communication protocol with the following properties:

  1. For all fixed x, y, after execution of the protocol Bob learns x with probability at least \(1-\varepsilon \).

  2. When (X, Y) is drawn according to \(\mu \), the protocol communicates

     $$\begin{aligned} O(H(X|Y)+ 1)+\log (1/\varepsilon ) \end{aligned}$$

     bits in O(1) rounds on average.

Contrast this with the classical Slepian–Wolf theorem, where Alice and Bob are given a stream of i.i.d. pairs \((X_1, Y_1), \ldots , (X_n, Y_n)\), and Alice transmits \(X_1, \ldots , X_n\) using only one-way communication, with an amortized communication of H(X|Y) bits per pair.

Proof

Let y be Bob’s given input. For a given x in the support of X, let \(p(x) = \Pr [X = x | Y = y]\), and for a given subset \(\mathcal {X}\) of the same support, let \(p(\mathcal {X}) = \Pr [X \in \mathcal {X}| Y = y]\). Then Bob begins by arranging the x’s in the support of X in decreasing order of probability p(x). He then defines the two sets

$$\begin{aligned} \mathcal {X}_1 = \{ x_1, \ldots , x_{i(1)} \}, \qquad \mathcal {Z}_1 = \mathcal {X}_1, \end{aligned}$$

where i(1) is the minimal index which makes \(p(\mathcal {X}_1) \ge 1/2\). Inductively, while \(\mathcal {Z}_k\) does not contain the entire support of X, he then defines:

$$\begin{aligned} \mathcal {X}_{k+1} = \{ x_{i(k) + 1}, \cdots , x_{i(k+1)} \}, \qquad \mathcal {Z}_{k+1} = \mathcal {Z}_k \cup \mathcal {X}_{k+1}, \end{aligned}$$

where \(i(k+1) > i(k)\) is the minimal index which makes \(p(\mathcal {X}_{k+1}) \ge \frac{1 - p(\mathcal {Z}_k)}{2}\). That is, \(\mathcal {X}_{k+1}\) takes the highest-probability remaining x’s, just enough of them so that they total at least half of the remaining probability mass.

Because at least one new \(x_i\) is added at every step, this inductive procedure gives Bob a finite chain of sets \(\mathcal {Z}_1 \subseteq \cdots \subseteq \mathcal {Z}_K\), where \(\mathcal {Z}_K\) is the entire support of X. The protocol then consists of applying the protocol of the following lemma, which will be proved later.
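The construction of the layers is easy to make algorithmic; the following Python sketch (ours, with the hypothetical name build_layers, and with the conditional probabilities \(p(x)=\Pr [X=x|Y=y]\) given as a dictionary) mirrors the greedy definition above.

```python
def build_layers(p):
    """Greedy construction of the layers X_1, ..., X_K and Z_1 ⊆ ... ⊆ Z_K.

    p: dict mapping each x in the support to Pr[X = x | Y = y].
    Each new layer takes the highest-probability remaining x's, just enough of
    them to cover at least half of the remaining probability mass."""
    order = sorted(p, key=p.get, reverse=True)  # x's in decreasing order of probability
    layers, Z = [], []
    remaining, i = 1.0, 0
    while i < len(order):
        mass, start = 0.0, i
        while i < len(order) and mass < remaining / 2:
            mass += p[order[i]]
            i += 1
        layers.append(order[start:i])   # X_{k+1}
        Z.append(order[:i])             # Z_{k+1} = X_1 ∪ ... ∪ X_{k+1}
        remaining -= mass
    return layers, Z

# Toy usage with a geometric-looking conditional distribution.
p = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
layers, Z = build_layers(p)
print(layers)               # [['a'], ['b'], ['c'], ['d']]
print([len(z) for z in Z])  # the sizes |Z_1|, ..., |Z_K|
```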

Lemma 8

For every natural m and every positive \(\varepsilon \) there exists a randomized public-coin protocol with the following behavior. Suppose that Bob is given a family of finite sets \(\mathcal {Z}_1 \subseteq \cdots \subseteq \mathcal {Z}_K\subset \{0,1\}^m\) and Alice is given a string \(z\in \mathcal {Z}_K\). Then the protocol transmits z to Bob, except with a failure probability of at most \(\varepsilon \). For k the smallest index for which \(z\in \mathcal {Z}_k\), the run of this protocol uses at most \(2k+1\) rounds and \(2 \log |\mathcal {Z}_k|+\log \frac{1}{\varepsilon } + 4k\) bits of communication.

Now let us bound the average number of rounds and communication complexity. First notice that \(p(\mathcal {X}_k) \le 2^{1-k}\), and hence, taking the average over Alice’s inputs, we find that

$$\begin{aligned} \sum _{k = 1}^K p(\mathcal {X}_k) 4 k = O(1) \end{aligned}$$

must upper bound the average number of rounds, as well as the contribution of the 4k term to the average communication. To upper-bound the contribution of the \(2 \log |\mathcal {Z}_k|\) term, we first establish that:

  (i) \(p(\mathcal {X}_k) \le 2 p(\mathcal {X}_{k+1}) + 2 p(x_{i(k)})\), which can be seen by summing the following two inequalities, the first given by the minimality of i(k) in the definition of \(\mathcal {X}_k\) and the second by the definition of \(\mathcal {X}_{k+1}\):

    $$\begin{aligned} p(\mathcal {X}_k) - p(x_{i(k)}) \le \frac{1 - p(\mathcal {Z}_{k-1})}{2}, \qquad \frac{1 - p(\mathcal {Z}_k)}{2} \le p(\mathcal {X}_{k+1}), \end{aligned}$$

    after which we get

    $$\begin{aligned} \frac{p(\mathcal {X}_k)}{2} - p(x_{i(k)}) \le p(\mathcal {X}_{k+1}). \end{aligned}$$

  (ii) \(|\mathcal {Z}_k| \le \frac{1}{p(x)}\) for any \(x \in \mathcal {X}_{k+1} \cup \{ x_{i(k)} \}\), which follows since every \(x' \in \mathcal {Z}_k\) has probability at least as large as any x in \(\mathcal {X}_{k+1} \cup \{ x_{i(k)} \}\), while the sum of all the \(p(x')\) is at most 1.

Now we are ready to bound the remaining term in the average communication:

$$\begin{aligned}&\sum _{k=1}^K p(\mathcal {X}_k) \log |\mathcal {Z}_k| \le 2 \sum _{k=1}^{K-1} p(\mathcal {X}_{k+1}) \log |\mathcal {Z}_k| + p(\mathcal {X}_K) \log |\mathcal {Z}_K|\\&\quad + \, 2 \sum _{k=1}^K p(x_{i(k)}) \log |\mathcal {Z}_k| \le 5 \sum _{x} p(x) \log \frac{1}{p(x)} = O(H(X | Y = y)); \end{aligned}$$

above, the first inequality follows from (i), and the second from (ii). \(\square \)

Proof of Lemma 8

The protocol is divided into stages and works as follows. On the first stage, Bob begins by sending the number \(\ell _1 = \log |\mathcal {Z}_1|\) in unary to Alice, and Alice responds by picking \(L_1 = \ell _1 + \log \frac{1}{\varepsilon } + 1\) random linear functions \(f_1^{(1)}, \ldots , f_{L_1}^{(1)}: {\mathbb {Z}}_2^m\rightarrow {\mathbb {Z}}_2\) using public randomness, and sending Bob the hash values \(f_1^{(1)}(z), \ldots , f_{L_1}^{(1)}(z)\). Bob then looks for a string \(z' \in \mathcal {Z}_1\) that has the same hash values he just received; if there is such a string, then Bob says so, and the protocol is finished with Bob assuming that \(z' = z\).

Otherwise, the protocol continues. At stage k, Bob computes the number \(\ell _k = \log |\mathcal {Z}_k|\), and sends the number \(\ell _k - \ell _{k-1}\) in unary to Alice; Alice responds by picking \(L_k = \ell _k - \ell _{k-1} + 1\) random linear functions \(f_{1}^{(k)}, \ldots , f_{L_k}^{(k)}\), whose evaluation on z she sends over to Bob. Bob then looks for a string \(z' \in \mathcal {Z}_k\) that has the same hash values for all the hash functions which were picked in this and previous stages; if there is such a string, then Bob says so, and the protocol is finished with Bob assuming that \(z' = z\). If the protocol has not halted after K stages, Alice just sends her input to Bob.
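The following Python sketch (ours; it only illustrates the mechanics, and the shared random functions are modeled by a common random generator) implements the stages just described, with random linear functions over \(\mathbb {Z}_2\) realized as parities of randomly masked bits and with \(\ell _k\) rounded up to an integer.

```python
import math, random

def hash_bit(mask, z):
    """A random linear function over GF(2): the parity of the bits of z selected by mask."""
    return bin(mask & z).count("1") % 2

def transmit(z, Zs, m, eps, rng=random):
    """Sketch of the protocol of Lemma 8. Alice holds z; Bob holds nested sets
    Zs[0] ⊆ Zs[1] ⊆ ... of m-bit strings (encoded as integers). Shared public
    randomness is modeled by both parties reading the same generator rng.
    Returns Bob's guess for z."""
    masks, hashes = [], []
    prev_len = 0
    for k, Zk in enumerate(Zs):
        lk = math.ceil(math.log2(max(len(Zk), 2)))
        # Bob announces (in unary, in the real protocol) how many new hashes he needs.
        new = lk + math.ceil(math.log2(1 / eps)) + 1 if k == 0 else max(lk - prev_len, 0) + 1
        prev_len = lk
        for _ in range(new):
            mask = rng.getrandbits(m)         # a fresh random linear function
            masks.append(mask)
            hashes.append(hash_bit(mask, z))  # Alice sends its value on z
        # Bob looks for a member of Z_k consistent with every hash value so far.
        for cand in Zk:
            if all(hash_bit(msk, cand) == h for msk, h in zip(masks, hashes)):
                return cand
    return z  # after the last stage Alice would simply send z in the clear

# Toy usage: z = 17 sits in the second layer.
Zs = [{3, 5}, {3, 5, 9, 12, 17, 21}]
print(transmit(17, Zs, m=8, eps=0.05))  # prints 17 with probability at least 0.95
```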

An error will occur whenever some \(z' \not = z\) is found that has the same fingerprint as z. The probability that this happens at stage k for a specific \(z' \in \mathcal {Z}_k\) is \(2^{-L}\), where \(L = \ell _k + k + \log \frac{1}{\varepsilon }\) is the total number of hash functions picked up to this stage. By a union bound, the probability that such a \(z'\) exists is at most \(|\mathcal {Z}_k| 2^{-\ell _k} \frac{\varepsilon }{2^k} \le \frac{\varepsilon }{2^k}\). Again by a union bound, summing over all stages k we get a total error probability of at most \(\varepsilon \).
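For completeness, the count of hash functions used through stage k telescopes:

$$\begin{aligned} L_1+\dots +L_k = \Bigl (\ell _1+\log \tfrac{1}{\varepsilon }+1\Bigr )+\sum _{j=2}^{k}\bigl (\ell _j-\ell _{j-1}+1\bigr ) = \ell _k+k+\log \tfrac{1}{\varepsilon }. \end{aligned}$$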

To bound the communication for \(z \in \mathcal {Z}_k\), notice that sending all of \(\ell _1,\dots ,\ell _k\) costs Bob at most \(\log |\mathcal {Z}_k|+k\) bits of total communication, that the total number of hash values sent by Alice is at most \(\log |\mathcal {Z}_k| + 2k + \log \frac{1}{\varepsilon }\), and that Bob’s replies (saying whether the protocol should continue) cost him k bits. \(\square \)

From Lemma 7 we get an analogue of Lemma 5.

Lemma 9

For every positive \(\delta \le 1/3\) any protocol \(\tilde{\rho }\) to compute \(f:\{0,1\}^n\times \{0,1\}^n\rightarrow \mathcal {Z}\) that uses transmission boxes q times can be simulated with error \(\delta \) by a protocol \(\rho \) that does not use transmission boxes, and communicates \(q\log (\frac{qn}{\delta })+1\) bits more.

Proof

The protocol \(\rho \) simulates \(\tilde{\rho }\) by replacing each use of a transmission box with the protocol given by Lemma 7 with some error parameter \(\varepsilon \) (to be specified later). The simulation continues while the total communication is less than n. Once it becomes n, we stop the simulation and the players exchange their inputs.

The additional error probability introduced by the failure of the protocol of Lemma 7 is at most \(q\varepsilon \). Assuming that \(\varepsilon \le \delta /q\), the error probability introduced by a transmission box failure is at most \(\delta \).

Each call of a transmission box costs \(\log (1/\varepsilon )\) bits of communication more than we have charged the protocol \(\tilde{\rho }\). Thus the communication of \(\rho \) is at most

$$\begin{aligned} q\log (1/\varepsilon )+(q\varepsilon )(2n) \end{aligned}$$

longer than that of \(\tilde{\rho }\). Set \(\varepsilon =\delta /qn\), so that the communication of \(\rho \) is at most

$$\begin{aligned} q\log (qn/\delta ) + 2 \delta \le q\log (qn/\delta ) + 1 \end{aligned}$$

more than that of \(\tilde{\rho }\).

The desired protocol that establishes Theorem 4 is obtained by applying Lemma 9 to the protocol of Lemma 6.\(\square \)

5 Applications

From the combination of Theorems 1 and 4, and Observation 1, we can obtain a new compression result for general protocols.

Corollary 1

Suppose there exists a mixed-coin, q-round protocol \(\pi \) to compute f over the input distribution \(\mu \) with error probability \(\varepsilon \), and let \(C = \mathsf {CC}(\pi )\), \(I = \mathsf {IC}_\mu (\pi )\), \(n = \log |\mathcal {X}| + \log |\mathcal {Y}|\). Then there exists a public-coin, O(q)-average-round protocol \(\rho \) that computes f over \(\mu \) with error \(\varepsilon + \delta \), and with

$$\begin{aligned} \mathsf {ACC}_\mu (\rho ) \le O\left( I + q\log \left( \frac{q n C}{\delta }\right) \right) . \end{aligned}$$
(3)

As we will see in the following sub-section, this will result in a new direct-sum theorem for bounded-round protocols. In general, given that we have already proven Theorem 3, and given that this approach shows promise in the bounded-round case, it becomes worthwhile to investigate whether we can prove Conjecture 1 with similar techniques.

5.1 Direct-Sum Theorems for the Bounded-Round Case

The following theorem was proven in [1]:

Theorem 5

([1], Theorem 12) Suppose that there is a q-round protocol \(\pi ^k\) that computes k copies of f with communication complexity C and error \(\varepsilon \), over the k-fold distribution \(\mu ^k\). Then there exists a q-round mixed-coin protocol \(\pi \) that computes a single copy of f with communication complexity C and the same error probability \(\varepsilon \), but with information cost \(\mathsf {IC}_\mu (\pi ) \le \frac{2 C}{k}\) for any input distribution \(\mu \).

As a consequence of this theorem and of Corollary 1, we obtain the following direct-sum theorem; its proof is a straightforward combination of the two.

Theorem 6

(Direct-sum theorem for the bounded-round case) There is some constant d such that, for any input distribution \(\mu \) and any \(0 < \varepsilon < \delta < 1\), if f requires, on average,

$$\begin{aligned} C + q \log \left( \frac{q n C}{\delta - \varepsilon } \right) \end{aligned}$$

bits of communication to be computed over \(\mu \) with error \(\delta \) in dq (average) rounds, then \(f^{\otimes k}\) requires kC bits of communication, in the worst case, to be computed over \(\mu ^{\otimes k}\) with error \(\varepsilon \) in q rounds.

5.2 Comparison with Previous Results

We may compare Corollary 1 with the results of [6]. In that paper, the nC factor is missing inside the \(\log \) of Eq. (3), but the number of rounds of the compressed protocol is \(O(q \log I)\) instead of O(q). A similar difference appears in the resulting direct-sum theorems.

We remark that the compression of Jain et al. [12] is also achieved with a round-by-round proof. Our direct-sum theorem is incomparable with their more ambitious direct-product result. It is no surprise, then, that the communication complexity of their compression scheme is \(O(\frac{q I}{\delta })\), i.e., it incurs a factor of q, whereas we pay only an additive term of \(\tilde{O}(q)\). However, their direct-product result also preserves the number of rounds in the protocol, whereas in our result the number of rounds is only preserved within a constant factor.

6 Alternative Constructions and Matching Lower Bounds

6.1 A Different Upper Bound on the Degree of Matching Graphs

Lemma 10

For all integers \(\ell \le m\) and positive \(\delta \) there is an \((m,\ell , d, \delta )\)-matching graph with \(d = (2+(m-\ell )\ln 2)/\delta ^2 + \ln (1/\delta )/\delta \).

Proof

We show the existence of such a graph using a probabilistic argument. Let A and B be any sets of \(M=2^m\) left and \(L=2^\ell \) right nodes, respectively. Construct a random graph G by choosing d random neighbors independently for each \(u\in A\). Different neighbors of the same node u are also chosen independently, thus they might coincide. For any \(A'\subseteq A\) of size L, let \(E_{A'}\) be the event that \(G_{A'\cup B}\) does not have a matching of size \(L(1-\delta )\), and let \(E=\bigcup _{A'}E_{A'}\). Note that the lemma holds if \(\Pr [E]<1\).

Next, we bound \(\Pr [E_{A'}]\). Let \(A'=\{u_1,\dots ,u_L\}\) be any set of L left nodes. Let \(\mathcal {N}(u)\) denote the neighborhood of a vertex u. Consider the following procedure for generating a matching for \(G_{A' \cup B}\):

[Procedure Find-Matching, given in the original as a figure.]
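The figure is not reproduced here. The following Python sketch is our reconstruction of a greedy procedure consistent with the analysis below; the check of whether \(u_i\) still has an unmatched neighbor corresponds to what the analysis refers to as the condition in the 4th line of Find-Matching.

```python
import random

def find_matching(neighbors):
    """Our reconstruction of Find-Matching: greedily match u_1, ..., u_L in order.

    neighbors[i] lists the d right-neighbors of u_{i+1} (right nodes are labelled
    0, ..., L-1; repetitions are allowed).  Returns the matching together with the
    indicators X_1, ..., X_L used in the analysis."""
    matched_right = set()
    matching, X = {}, []
    for i, nbrs in enumerate(neighbors):
        free = [v for v in nbrs if v not in matched_right]
        if free:                      # u_{i+1} still has an unmatched neighbor
            matching[i] = free[0]
            matched_right.add(free[0])
            X.append(1)
        else:
            X.append(0)
    return matching, X

# Toy usage with d = 2 uniformly random neighbors per left node.
L, d = 8, 2
neighbors = [[random.randrange(L) for _ in range(d)] for _ in range(L)]
matching, X = find_matching(neighbors)
print(sum(X), "of", L, "left nodes matched")
```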

Define the indicator variables \(X_1,\dots ,X_L\) as follows: \(X_i=1\) if the condition in the 4th line of Find-Matching is true and 0 otherwise. From the definition of these variables it follows that for all i and all \(b=(b_1,\dots ,b_i)\in \{0,1\}^i\) the conditional probability of \(X_{i+1}=0\) given \(X_1=b_1,\dots ,X_i=b_i\) is equal to

$$\begin{aligned} (|b|/L)^d, \end{aligned}$$

where |b| stands for the Hamming weight of the vector b, i.e. the number of 1s in \(b=(b_1,\dots ,b_i)\). Consider also similar random variables \(Y_1,\dots ,Y_L\), whose distribution is defined by the formula

$$\begin{aligned} \Pr [Y_{i+1}=0| Y_1=b_1,\dots ,Y_i=b_i]= {\left\{ \begin{array}{ll} (|b|/L)^d,&{} {\text {if }}\; |b|<(1-\delta )L,\\ 0,&{} {\text {if }}\; |b|\ge (1-\delta )L. \end{array}\right. } \end{aligned}$$

In terms of \(X_1,\dots ,X_L\) the event \(E_{A'}\) happens if and only if \(X_1+\dots +X_L<(1-\delta )L\). For every string b of Hamming weight less than \((1-\delta )L\) the probabilities \(\Pr [X=b]\) and \(\Pr [Y=b]\) coincide. Thus it suffices to upper bound the probability \(\Pr [Y_1+\dots +Y_L<(1-\delta )L]\). To this end consider independent random variables \(Z_1,\dots ,Z_L\in \{0,1\}\), where the probability of \(Z_i=0\) is \((1-\delta )^d\).

Claim

\(\Pr [|Y|<(1-\delta )L]\le \Pr [|Z|<(1-\delta )L]\).

Proof

We prove this using the coupling method. We claim that there is a joint distribution of Y and Z such that the marginal distributions are as defined above, and with probability 1 it holds that \(Z_i\le Y_i\) for all i. This joint distribution is defined by the following process: we pick L independent reals \(r_1,\dots ,r_L\) uniformly from [0, 1] and let

$$\begin{aligned} Z_i&={\left\{ \begin{array}{ll} 0, &{}{\text {if }}\; r_i< (1-\delta )^d;\\ 1, &{}{\text {otherwise.}}\end{array}\right. }\\ Y_i&={\left\{ \begin{array}{ll} 0, &{}{\text {if }}\; r_i< \Bigl (\frac{Y_1+\dots +Y_{i-1}}{L}\Bigr )^d {\text { and }}\frac{Y_1+\dots +Y_{i-1}}{L}<1-\delta ;\\ 1, &{}{\text {otherwise.}}\end{array}\right. } \end{aligned}$$

We claim that the inequality \(Z_i\le Y_i\) (holding with probability 1) implies that for every downward closed set \(E\subset \{0,1\}^L\) it holds that \(\Pr [Y\in E]\le \Pr [Z\in E]\) (we call a set E downward closed if \(b\in E\) and \(b'\le b\), component-wise, implies \(b'\in E\)). Indeed,

$$\begin{aligned} \Pr [Y\in E]\le \Pr [Y\in E,Z\in E]\le \Pr [Z\in E], \end{aligned}$$

where the first inequality holds since E is downward closed and thus \(Y\in E\) implies \(Z\in E\). The set of Boolean vectors \(b\in \{0,1\}^L\) of Hamming weight less than \((1-\delta )L\) is downward closed, hence the claim follows. \(\square \)
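To see the coupling in action, one may sample the pair (Y, Z) exactly as in the display above and check the pointwise domination together with the resulting comparison of Hamming weights; the following Python sketch (ours, purely illustrative) does this.

```python
import random

def sample_coupled(L, d, delta, rng=random):
    """Sample (Y, Z) from the joint distribution used in the coupling argument."""
    Y, Z = [], []
    for _ in range(L):
        r = rng.random()                       # r_i uniform in [0, 1]
        Z.append(0 if r < (1 - delta) ** d else 1)
        s = sum(Y)                             # Y_1 + ... + Y_{i-1}
        y_is_zero = s / L < 1 - delta and r < (s / L) ** d
        Y.append(0 if y_is_zero else 1)
    return Y, Z

L, d, delta = 64, 10, 0.1
for _ in range(1000):
    Y, Z = sample_coupled(L, d, delta)
    assert all(z <= y for y, z in zip(Y, Z))   # pointwise domination Z_i <= Y_i
    assert sum(Z) <= sum(Y)                    # hence |Z| <= |Y| on every run
print("domination holds on all sampled runs")
```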

By this claim it suffices to upper bound the probability

$$\begin{aligned} \Pr [Z_1+\dots +Z_L<(1-\delta )L], \end{aligned}$$

which we do via a Chernoff bound.

Let \(S :=\sum (1-Z_i)\), and let \(\mu :={{\mathrm{\mathbb {E}}}}[S]\), \(p=(1-\delta )^d\). Note that \(\mu = pL\) and that \(Z_1+\dots +Z_L<(1-\delta )L\) if and only if \(S>\delta L\). Also, let \(\psi :=\delta /p - 1\). Using the multiplicative version of the Chernoff bound, we have

$$\begin{aligned} \Pr [S > \delta L]= & {} \Pr [S > pL\cdot (\delta /p)] \\= & {} \Pr [S > \mu (1+\psi )] \\< & {} \left( \frac{e^\psi }{(1+\psi )^{(1+\psi )}}\right) ^\mu \\= & {} \exp \left( \mu \left( \frac{\delta }{p} - 1 - \frac{\delta }{p} \ln \left( \frac{\delta }{p}\right) \right) \right) \\< & {} \exp \left( \mu \left( \frac{\delta }{p} - \frac{\delta }{p} \ln \left( \frac{\delta }{p}\right) \right) \right) \\= & {} \exp \left( pL\frac{\delta }{p}\left( 1 - \ln \delta + \ln p\right) \right) \\= & {} \exp \left( \delta L + \delta L \ln (1/\delta ) + \delta L \ln p\right) \\= & {} \exp \left( \delta L \left( 1+\ln (1/\delta ) + \ln p\right) \right) . \end{aligned}$$

Thus for every set \(A'\) of L left nodes we have \(\Pr [E_{A'}]<e^{\delta L \left( 1+\ln (1/\delta ) + \ln p\right) }\). There are \(\left( {\begin{array}{c}M\\ L\end{array}}\right) \) subsets of A of size L. By Stirling’s formula, we have

$$\begin{aligned} \left( {\begin{array}{c}M\\ L\end{array}}\right) \le \frac{(M)^L}{L!} \le \left( \frac{Me}{L}\right) ^L = \exp (L(1+\ln M/L)) . \end{aligned}$$

By the union bound we have

$$\begin{aligned} \Pr [E]\le \left( {\begin{array}{c}M\\ L\end{array}}\right) \max _{A'}\Pr [E_{A'}] < \exp \bigl (L(1+\ln (M/L))+\delta L\left( 1+\ln (1/\delta ) + \ln p\right) \bigr ) < 1, \end{aligned}$$

where the final inequality uses \(d = (2+\ln (M/L))/\delta ^2 + \ln (1/\delta )/\delta \). \(\square \)

6.2 A Lower Bound on the Degree of Matching Graphs

Lemma 11

An \((m,\ell , d, \delta )\)-matching graph must have

$$\begin{aligned} d = \varOmega \left( \min \left( \frac{m-\ell }{\delta }, \delta 2^\ell \right) \right) . \end{aligned}$$

Proof

We will prove that in such a bipartite graph there must exist a left-set A of size \(2^m (1 - 4 \delta )^d\) whose neighbors are contained in a right-set B of size \((1-2\delta )2^\ell \). If the graph is a matching graph with said parameters, it must then follow that \(|A| \le 2^\ell \), hence \(d \ge (m-\ell ) / \log \frac{1}{1 - 4 \delta } = \varOmega ((m - \ell )/\delta )\).

We show this through the probabilistic method. Let us pick a random right-set B of size \((1-2\delta )2^\ell \). For a given left-node a, the probability that all its neighbors fall into B is at least

$$\begin{aligned} \left( {\begin{array}{c}2^\ell - d\\ (1-2\delta )2^\ell - d\end{array}}\right) \big / \left( {\begin{array}{c}2^\ell \\ (1-2\delta )2^\ell \end{array}}\right) \ge (1-2\delta )^d \left( 1 - \frac{2d}{2^\ell }\right) ^d. \end{aligned}$$

Under the assumption that \(d \le \delta 2^\ell \), the right-hand side is at least \((1-2\delta )^{2d} \ge (1-4\delta )^d\). (If \(d > \delta 2^\ell \), the lemma already holds via the second term in the minimum, so we may assume \(d \le \delta 2^\ell \).)

It must then hold that for such a random B, the expected number of left-nodes all of whose neighbors fall into B is at least \(2^m(1-4\delta )^d\). Hence, for some choice of B, there will exist a left-set A of at least this size whose neighbors are all in B. \(\square \)

6.3 A Lower Bound for Eq. (2) of the Proof of Lemma 2

Lemma 12

There is an \(\ell \)-discrete private-coin one-way protocol \(\pi \), and a message m sent by \(\pi \), such that for J defined as in Lemma 2, it holds that

$$\begin{aligned} I(J:X|M_\pi = m) = \varOmega (\log \ell ). \end{aligned}$$

Proof

Suppose Alice is given an input X uniformly distributed over \(\{x_1, \ldots , x_N\}\), and private randomness uniformly distributed over \(\{r_1, \ldots , r_N\}\), so that \(\ell = \log N\). Let \(\pi \) be a one-way protocol given by

$$\begin{aligned} M_\pi (x_j, r_k) = {\left\{ \begin{array}{ll} 0 &{} {\text {if }}\; k \le \left\lfloor \frac{N}{j+1} \right\rfloor ,\\ 1 &{} {\text {otherwise.}} \end{array}\right. } \end{aligned}$$

Then conditioned on \(M_\pi = 0\), we will have \(J(x_j, r_k) = k\). Let \(M = \sum _{i=1}^N \lfloor \frac{N}{i+1}\rfloor \) be the size of \(M_\pi ^{-1}(0)\). Finally, let m denote the event \(M_\pi = 0\). Then

$$\begin{aligned} I(X : J | m)= & {} H(X | m) - H(X | m, J) \\= & {} \sum _{j = 1}^N \frac{1}{M} \cdot \left\lfloor \frac{N}{j+1} \right\rfloor \log \frac{M}{\left\lfloor \frac{N}{j+1}\right\rfloor } - \sum _{k=1}^N \frac{1}{M} \cdot \left\lfloor \frac{N}{k+1} \right\rfloor \log \left\lfloor \frac{N}{k+1}\right\rfloor \\= & {} \log M - \frac{2}{M} \sum _{i=1}^N \left\lfloor \frac{N}{i+1} \right\rfloor \log \left\lfloor \frac{N}{i+1}\right\rfloor , \end{aligned}$$

which is \(\ge U\) if and only if:

$$\begin{aligned} 2 \sum _{i=1}^N \left\lfloor \frac{N}{i+1} \right\rfloor \log \left\lfloor \frac{N}{i+1}\right\rfloor \le M ( \log M - U ). \end{aligned}$$
(4)

Let us denote the left-hand side by A and the right-hand side by B. Because \(\frac{N}{x} \ln \frac{N}{x}\) is monotonically decreasing for \(1 \le x \le N+1\), we have:

$$\begin{aligned} A \le \frac{2}{\ln 2} \int _1^{N+1} \frac{N}{x} \ln \frac{N}{x} d x. \end{aligned}$$

The relevant primitive is \(\int \frac{N}{x} \ln \frac{N}{x} d x = -\frac{1}{2} N (\ln \frac{N}{x})^2\) and hence

$$\begin{aligned} A\le & {} \frac{2}{\ln 2} \left( -\frac{1}{2} N \left( \ln \frac{N}{N+1}\right) ^2 + \frac{1}{2} N (\ln N)^2 \right) \\= & {} \frac{2}{\ln 2} \left( N \ln N \ln (N + 1) - \frac{1}{2} N (\ln (N+1))^2\right) . \end{aligned}$$

We denote this last quantity by \(A'\). Good bounds for M are:

$$\begin{aligned} N \ln N - 3 N \le M = \sum _{i=1}^N \left\lfloor \frac{N}{i+1}\right\rfloor \le N \ln N + N . \end{aligned}$$

Let \(B' := N \ln N - 3 N\), so that \(B \ge B'(\log B' - U)\). Then we will show that for an appropriate choice of U,

$$\begin{aligned} A' \le B' ( \log B' - U ) \end{aligned}$$

and hence \(A \le B\) and also \(I(X : J | m) \ge U\). Equivalently,

$$\begin{aligned} A'- B' \log B' + B' U \le 0 \end{aligned}$$
(5)

For convenience, let \(\alpha = \frac{\ln (N+1)}{\ln N}\) (which goes to 1 as N goes to \(\infty \)). Then \(A' = \frac{1}{\ln 2} N (\ln N)^2 (2 \alpha - \alpha ^2)\) and \(B' \log B' = \frac{1}{\ln 2} N (\ln N)^2 + \frac{1}{\ln 2} N \ln N \ln \ln N + O(N \ln N)\). Now the proof follows from the following:

Claim

\(N (\ln N)^2(2 \alpha - \alpha ^2 - 1) = -\frac{1}{N} \pm O\bigl (\frac{1}{N^{2}}\bigr )\); in particular, it tends to 0 as \(N \rightarrow \infty \).

Indeed, under this claim the term \(\frac{1}{\ln 2} N (\ln N)^2 (2 \alpha - \alpha ^2 - 1)\) is negligible and the dominant negative term in (5) is \(-\frac{1}{\ln 2} N \ln N \ln \ln N\), so all we need to do is set U to be \(c \ln \ln N\) for some \(c < \frac{1}{\ln 2}\); this ensures that (5) holds. For such a choice of U, it will hold that

$$\begin{aligned} I(X:J|m) \ge U = c \ln \ln N = \varOmega (\log \ell ). \end{aligned}$$

It remains to prove the claim. Unfortunately, l’Hôpital’s rule does not seem to help us, as the terms become too complicated. Instead we estimate how fast \((2 \alpha - \alpha ^2 - 1)\) approaches 0 as N goes to infinity. For this, let \(\beta = \frac{\ln (\frac{1}{x} + 1)}{\ln \frac{1}{x}}\) (so that \(\beta = \alpha \) when \(x = 1/N\)) and let us estimate \(\beta \) as x approaches 0. For x close to, but different from, 0, we have:

$$\begin{aligned} \beta = 1 - \frac{1}{\ln x} \ln (x + 1) = 1 - \frac{x}{\ln x} + \frac{x^2}{2\ln x} \pm O\left( \frac{x^3}{\ln x} \right) \end{aligned}$$

(the last equality is by the Taylor expansion of \(\ln (x + 1)\) around 0). We also have

$$\begin{aligned} \beta ^2= & {} \left( 1 - \frac{x}{\ln x} + \frac{x^2}{2 \ln x} - O\left( \frac{x^3}{\ln x} \right) \right) ^2 \\= & {} \beta - \frac{x}{\ln x} + \frac{x^2}{(\ln x)^2} + \frac{x^2}{2\ln x} \pm O\left( \frac{x^3}{(\ln x)^2} \right) . \end{aligned}$$

Hence,

$$\begin{aligned} 2 \beta - \beta ^2 = 1 - \frac{x^2}{(\ln x)^2} \pm O\left( \frac{x^3}{(\ln x)^2} \right) . \end{aligned}$$

From this we can conclude that for \(x = 1/N\), we have

$$\begin{aligned} 2 \alpha - \alpha ^2 - 1 = - \frac{1}{N^2 (\ln N)^2} \pm O\left( \frac{1}{N^3 (\ln N)^2} \right) , \end{aligned}$$

and our claim follows. \(\square \)
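As a numerical sanity check (ours, not part of the proof), one can evaluate the exact expression for \(I(X:J|m)\) derived above and observe that it indeed grows with \(\log \log N\).

```python
import math

def info_X_J_given_m(N):
    """I(X : J | M_pi = 0) in bits, following the displayed computation in the proof."""
    sizes = [N // (j + 1) for j in range(1, N + 1)]   # |{k : M_pi(x_j, r_k) = 0}| for each j
    M = sum(sizes)
    h_X = sum(s / M * math.log2(M / s) for s in sizes if s > 0)        # H(X | m)
    h_X_given_J = sum(s / M * math.log2(s) for s in sizes if s > 0)    # second sum in the display
    return h_X - h_X_given_J

# The quantity should grow roughly like log log N = log(ell).
for N in (2**8, 2**12, 2**16, 2**20):
    print(N, round(info_X_J_given_m(N), 3), round(math.log2(math.log2(N)), 3))
```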