Derandomization for Sliding Window Algorithms with Strict Correctness

In the sliding window streaming model the goal is to compute an output value that only depends on the last n symbols of the data stream. In doing so, the algorithm should use space sublinear in the window size n. Quite often, randomization is used to achieve this goal. In the literature, one finds two different correctness criteria for randomized sliding window algorithms: (i) one can require that for every data stream and every time instant t, the algorithm computes a correct output value with high probability, or (ii) one can require that for every data stream, with high probability the algorithm computes a correct output value at every time instant. Condition (ii) is stronger than (i) and is called "strict correctness" in this paper. The main result of this paper states that every strictly correct randomized sliding window algorithm can be derandomized without increasing the worst-case space consumption.


Introduction
Sliding window streaming algorithms process an input sequence a_1 a_2 ··· a_m from left to right and receive at time t the symbol a_t as input. Such algorithms are required to compute at each time instant t a value f(a_{t−n+1} ··· a_t) that depends on the last n symbols (assume t ≥ n here). The value n is called the window size and the sequence a_{t−n+1} ··· a_t is called the window content at time t. In many applications, data items in a stream become outdated after a certain time, and the sliding window model is a simple way to model this. A typical application is the analysis of a time series as it may arise in network monitoring, healthcare and patient monitoring, and transportation grid monitoring [3].
A general goal in the area of sliding window algorithms is to avoid the explicit storage of the window content, and, instead, to work in considerably smaller space, e.g. space polylogarithmic in the window size. In the seminal paper of Datar, Gionis, Indyk and Motwani [12], where the sliding window model was introduced, the authors prove that the number of 1's in a 0/1-sliding window of size n can be maintained in space O((1/ϵ) · log^2 n) if one allows a multiplicative error of 1 ± ϵ. A matching lower bound is also shown. Other algorithmic problems that were addressed in the extensive literature on sliding window streams include the computation of statistical data (e.g. computation of the variance and k-median [5], and quantiles [4]), optimal sampling from sliding windows [9], membership problems for formal languages [13][14][15][16], computation of edit distances [10], database querying (e.g. processing of join queries over sliding windows [18]) and graph problems (e.g. checking for connectivity and computation of matchings, spanners, and minimum spanning trees [11]). The reader can find further references in [1, Chapter 8] and [8].
Many of the above mentioned papers deal with sliding window algorithms that only compute a good enough approximation of the exact value of interest. In fact, even for very simple sliding window problems it is unavoidable to store the whole window content. Examples are the exact computation of the number of 1's [12] or the computation of the first symbol of the sliding window for a 0/1-data stream [14]. In this paper, we consider a general model for sliding window approximation problems, where a (possibly infinite) set of admissible output values is fixed for each word. To be more accurate, a specific approximation problem is described by a relation Φ ⊆ Σ* × Y which associates with words over a finite alphabet Σ (the set of data values in the stream) admissible output values from a possibly infinite set Y. A sliding window algorithm for such a problem is then required to compute at each time instant an admissible output value for the current window content. This model covers exact algorithms (where Φ is a function Φ : Σ* → Y) as well as a wide range of approximation algorithms. For example, the computation of the number of 1's in a 0/1-sliding window with an allowed multiplicative error of 1 ± ϵ is covered by our model, since for a word with k occurrences of 1, the admissible output values are the integers between (1 − ϵ)k and (1 + ϵ)k.
A second ingredient of many sliding window algorithms is randomization. Following our recent work [13][14][15] we model a randomized streaming algorithm for a given approximation problem as a probabilistic automaton over a finite alphabet. Probabilistic automata were introduced by Rabin [23] and can be seen as a common generalization of deterministic finite automata and Markov chains. The basic idea is that for every state q and every input symbol a, the next state is chosen according to some probability distribution. As an extension of the classical model of Rabin, states in a probabilistic automaton are not accepting or rejecting but are associated with output values from the set Y. This allows us to associate with every input word w ∈ Σ* and every output value y ∈ Y the probability that the automaton outputs y on input w. In order to solve a specific approximation problem Φ ⊆ Σ* × Y in the sliding window model, one should require that for a given window size n, a probabilistic automaton P_n has a small error probability λ (say λ = 1/3) on every input stream. But what exactly does the latter mean? Two different definitions can be found in the literature:
- For every input stream w = a_1 ··· a_m, the probability that P_n outputs on input w a value y ∈ Y with (a_{m−n+1} ··· a_m, y) ∉ Φ is at most λ. Clearly an equivalent formulation is that for all input streams w = a_1 ··· a_m and all 0 ≤ t ≤ m, the probability that P_n outputs on input a_1 ··· a_t a value y ∈ Y with (a_{t−n+1} ··· a_t, y) ∉ Φ is at most λ. In this case, we say that P_n is λ-correct for Φ and window size n.
- For every input stream w = a_1 ··· a_m, the probability that P_n outputs at some time instant 0 ≤ t ≤ m a value y ∈ Y with (a_{t−n+1} ··· a_t, y) ∉ Φ is at most λ. In this case, we say that P_n is strictly λ-correct for Φ and window size n.
One can rephrase the difference between strict λ-correctness and λ-correctness as follows: λ-correctness means that while the randomized sliding window algorithm runs on an input stream, it returns at each time instant an admissible output value with probability at least 1 − λ. In contrast, strict λ-correctness means that while the randomized sliding window algorithm reads an input stream, the probability that the algorithm returns an admissible output value at every time instant is at least 1 − λ. Obviously this makes a difference: imagine that the set of output values is {1, 2, 3, 4, 5, 6} and that for every input word w ∈ Σ* the admissible output values are 2, 3, 4, 5, 6. Then the algorithm that returns at every time instant the outcome of a fair die roll is 1/6-correct. But the probability that this algorithm returns an admissible output value at every time instant is only (5/6)^m for an input stream of length m and hence converges to 0 for m → ∞. Of course, in general, the situation is more complex, since successive output values of a randomized sliding window algorithm are not independent.
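The die example can be checked empirically. The following small simulation (entirely our own illustration, not part of the paper) estimates both error notions:

```python
import random

def dice_outputs(m):
    """One run of the 'fair die' algorithm: m independent outputs in 1..6."""
    return [random.randint(1, 6) for _ in range(m)]

def estimate(m, trials=50_000):
    per_instant_errors = 0   # errors counted over single time instants
    strict_failures = 0      # runs with at least one inadmissible output
    for _ in range(trials):
        outs = dice_outputs(m)
        bad = sum(1 for y in outs if y == 1)  # 1 is the only inadmissible value
        per_instant_errors += bad
        strict_failures += (bad > 0)
    # per-instant error rate should be close to 1/6,
    # strict failure rate close to 1 - (5/6)^m
    return per_instant_errors / (trials * m), strict_failures / trials

per_instant, strict = estimate(m=20)
```

For m = 20 the per-instant error rate stays near 1/6 ≈ 0.167 while the strict failure rate is already near 1 − (5/6)^20 ≈ 0.97, matching the calculation above.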
In the following discussion, let us fix the error probability λ = 1/3 (using probability amplification, one can reduce λ to any constant greater than 0). In our recent paper [15] we studied the space complexity of the membership problem for regular languages with respect to λ-correct randomized sliding window algorithms. It turned out that in this setting one can gain from randomization. Consider for instance the regular language ab* over the alphabet {a, b}. Thus, the sliding window algorithm for window size n should output "yes" if the current window content is ab^{n−1} and "no" otherwise. From our results in [13,14] it follows that the optimal space complexity of a deterministic sliding window algorithm for the membership problem for ab* is Θ(log n). On the other hand, it is shown in [15] that there is a λ-correct randomized sliding window algorithm for ab* with (worst-case) space complexity O(log log n) (this is also optimal). In fact, we proved in [15] that for every regular language L, the space-optimal λ-correct randomized sliding window algorithm for L has either constant, doubly logarithmic, logarithmic, or linear space complexity, and that the corresponding four space classes can be characterized in terms of simple syntactic properties.
Strict λ-correctness is used (without explicit mention) for instance in [7,12]. In these papers, the lower bounds shown for deterministic sliding window algorithms are extended with the help of Yao's minimax principle [24] to strictly λ-correct randomized sliding window algorithms. The main result of the first part of this paper states that this is a general phenomenon: we show that every strictly λ-correct sliding window algorithm for an approximation problem Φ can be derandomized without increasing the worst-case space complexity (Theorem 1). To the best of our knowledge, this is the first investigation of the general power of randomization with respect to the space consumption of sliding window algorithms. We emphasize that our proof does not utilize Yao's minimax principle, which would require the choice of a "hard" distribution of input streams specific to the problem. It remains open whether such a hard distribution exists for every approximation problem.
We remark that the proof of Theorem 1 needs the fact that the sliding window algorithm is strictly correct on doubly exponentially long streams with high probability in order to derandomize it. In fact, we show that for a certain problem, a restriction to polynomially long input streams yields an advantage of strictly correct randomized algorithms over deterministic ones; see Propositions 1 and 2. Whether such an advantage can also be obtained for input streams of length singly exponential in the window size remains open.
In the second part of the paper we come back to the problem of counting the number of 1's in a sliding window [7,12]. Datar et al. [12] proved a space lower bound of Ω((1/ϵ) · log^2 n) for approximating the number of 1's in a sliding window of size n with a multiplicative error of 1 ± ϵ. This lower bound is first shown for deterministic algorithms and then, using Yao's minimax principle [24], extended to strictly λ-correct randomized sliding window algorithms. We show that the same lower bound also holds for the wider class of λ-correct randomized sliding window algorithms (Theorem 2). For the proof of this result we first show a lower bound for the one-way randomized communication complexity of the following problem: Alice holds m many ℓ-bit numbers a_1, ..., a_m, and Bob holds an index 1 ≤ i ≤ m and an ℓ-bit number b. The goal of Bob is to find out whether a_i > b holds. We show that Alice has to transfer at least mℓ/3 bits to Bob if the protocol has an error probability of at most 1/200 (Theorem 4). From this result we can derive Theorem 2 using ideas from the lower bound proof in [12].
Let us add further remarks on our sliding window model. First of all, it is crucial for our proofs that the input alphabet (i.e., the set of data values in the input stream) is finite. This is for instance the case when counting the number of 1's in a 0/1-sliding window. On the other hand, the problem of computing the sum of all data values in a sliding window of arbitrary numbers (a problem that is considered in [12] as well) is not covered by our setting, unless one puts a bound on the size of the numbers in the input stream.
As a second remark, note that our sliding window model is non-uniform in the sense that for every window size we may have a different streaming algorithm. In other words, it is not required that there exists a single streaming algorithm that receives the window size as a parameter. Clearly, lower bounds get stronger when shown for the non-uniform model. Moreover, all lower bound proofs in the sliding window setting that we are aware of hold for the non-uniform model.

Preliminaries
With [0, 1] we denote the real interval {p ∈ R : 0 ≤ p ≤ 1} of all probabilities. With log we always mean the logarithm to the base two.
The set of all words over a finite alphabet Σ is denoted by Σ*. The empty word is denoted by ε. The length of a word w ∈ Σ* is denoted by |w|. The sets of words over Σ of length exactly n, at most n and at least n are denoted by Σ^n, Σ^{≤n} and Σ^{≥n}, respectively. In the context of streaming algorithms, we also use the term "stream" for words.

Approximation Problems
An approximation problem is a relation Φ ⊆ Σ* × Y, where Σ is a finite alphabet and Y is a (possibly infinite) set of output values. The relation Φ associates with each word w ∈ Σ* a set of admissible or correct output values in Y. Typical examples include:
- exact computation problems φ : Σ* → Y, where we identify φ with its graph {(w, φ(w)) : w ∈ Σ*}. Another exact problem is given by the characteristic function χ_L : Σ* → {0, 1} of a language L ⊆ Σ* (χ_L(w) = 1 if and only if w ∈ L).
- approximation of some numerical value for the data stream, which can be modeled by a relation Φ ⊆ Σ* × N. A typical example is the approximation of the number of 1's in a 0/1-stream with multiplicative error 1 ± ϵ, as discussed in the introduction.

For a window length n ≥ 0 and a stream w ∈ Σ* we define last_n(w) to be the suffix of □^n w of length n, where □ ∈ Σ is a fixed alphabet symbol. The word last_n(ε) = □^n is also called the initial window. To every approximation problem Φ ⊆ Σ* × Y we associate the sliding window problem SW_n(Φ) = {(w, y) ∈ Σ* × Y : (last_n(w), y) ∈ Φ} for window length n.

Probabilistic Automata with Output
In the following we introduce probabilistic automata [22,23] as a model of randomized streaming algorithms which produce an output after each input symbol. A randomized streaming algorithm or probabilistic automaton P = (Q, Σ, ι, ρ, o) consists of a (possibly infinite) set of states Q, a finite alphabet Σ, an initial state distribution ι : Q → [0, 1], a transition probability function ρ : Q × Σ × Q → [0, 1], and an output function o : Q → Y, where ∑_{q∈Q} ι(q) = 1 and ∑_{p∈Q} ρ(q, a, p) = 1 for all q ∈ Q and a ∈ Σ. The space of the randomized streaming algorithm P (or the number of bits used by P) is given by s(P) = log |Q| ∈ R_{≥0} ∪ {∞}.
If ι and ρ map into {0, 1}, then P is a deterministic automaton; in this case we write P as P = (Q, Σ, q_0, δ, o), where q_0 ∈ Q is the initial state and δ : Q × Σ → Q is the transition function. A run on a word a_1 ··· a_m ∈ Σ* in P is a sequence π = (q_0, a_1, q_1, a_2, ..., a_m, q_m) where q_0, ..., q_m ∈ Q and ρ(q_{i−1}, a_i, q_i) > 0 for all 1 ≤ i ≤ m. If m = 0 we obtain the empty run (q_0) starting and ending in q_0. We write runs in the usual way as π : q_0 −a_1→ q_1 −a_2→ ··· −a_m→ q_m, or briefly π : q_0 −w→ q_m for w = a_1 ··· a_m. We denote by Runs(P, w) the set of all runs on w in P and by Runs(P, q, w) those runs on w that start in q ∈ Q. If P is clear from the context, we simply write Runs(w) and Runs(q, w). For a run π as above let ρ_ι(π) = ι(q_0) · ∏_{i=1}^m ρ(q_{i−1}, a_i, q_i) and ρ(π) = ∏_{i=1}^m ρ(q_{i−1}, a_i, q_i). Notice that for each w ∈ Σ* the function ρ_ι is a probability distribution on Runs(P, w), and for each q ∈ Q the restriction of ρ to Runs(P, q, w) is a probability distribution on Runs(P, q, w). If Π is a set of runs (which will often be defined by a certain property of runs), then Pr_{π∈Runs(w)}[π ∈ Π] denotes the probability ∑_{π∈Runs(w)∩Π} ρ_ι(π) and Pr_{π∈Runs(q,w)}[π ∈ Π] denotes ∑_{π∈Runs(q,w)∩Π} ρ(π).
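As a concrete toy illustration of this definition, the following sketch implements a probabilistic automaton with output and samples a run according to the distribution ρ_ι (the class design and the two-state example automaton are our own, not from the paper):

```python
import random

class ProbabilisticAutomaton:
    """A probabilistic automaton with output values attached to states."""

    def __init__(self, init_dist, trans, output):
        self.init_dist = init_dist  # state -> probability (sums to 1)
        self.trans = trans          # (state, symbol) -> {successor: probability}
        self.output = output        # state -> output value

    def sample_run(self, word):
        """Sample a run on `word`; return the visited states and their outputs."""
        states = list(self.init_dist)
        q = random.choices(states, weights=[self.init_dist[s] for s in states])[0]
        run = [q]
        for a in word:
            dist = self.trans[(q, a)]
            succ = list(dist)
            q = random.choices(succ, weights=[dist[s] for s in succ])[0]
            run.append(q)
        return run, [self.output[q] for q in run]

# Toy two-state automaton over the alphabet {a, b}
P = ProbabilisticAutomaton(
    init_dist={0: 1.0},
    trans={(0, 'a'): {0: 0.5, 1: 0.5}, (0, 'b'): {0: 1.0},
           (1, 'a'): {1: 1.0}, (1, 'b'): {0: 0.5, 1: 0.5}},
    output={0: 'no', 1: 'yes'},
)
run, outs = P.sample_run('abab')
```

Averaging the sampled outputs over many runs approximates, for each input word w and value y, the probability that the automaton outputs y on input w.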

Correctness definitions
Let P = (Q, Σ, ι, ρ, o) be a randomized streaming algorithm with output function o : Q → Y, let Φ ⊆ Σ* × Y be an approximation problem and let w = a_1 a_2 ··· a_m ∈ Σ* be an input stream. Furthermore, let 0 ≤ λ ≤ 1 be an error probability. We say that P is λ-correct for Φ if for every input stream w = a_1 ··· a_m and every time instant 0 ≤ t ≤ m, the probability that the output value of P after reading a_1 ··· a_t is not admissible for a_1 ··· a_t with respect to Φ is at most λ. We say that P is strictly λ-correct for Φ if for every input stream w = a_1 ··· a_m, the probability that the output value of P is admissible at every time instant 0 ≤ t ≤ m is at least 1 − λ.
A (strictly) λ-correct randomized streaming algorithm P_n for SW_n(Φ) is also called a (strictly) λ-correct randomized sliding window algorithm for Φ and window size n. If P_n is deterministic and (strictly) 0-correct, we speak of a deterministic sliding window algorithm for Φ and window size n. The reader might think of having for every window size n a sliding window algorithm P_n. We do not assume any uniformity here in the sense that the sliding window algorithms for different window sizes do not have to follow a common pattern. This is the same situation as in non-uniform circuit complexity, where one has for every input length n a circuit C_n and it is not required that the circuit C_n can be computed from n.

Remark 1
The trivial sliding window algorithm stores for window size n the window content explicitly, using log |Σ| · n bits. Hence every approximation problem has a deterministic sliding window algorithm D_n with s(D_n) ≤ log |Σ| · n. In particular, for every (strictly) λ-correct randomized sliding window algorithm P_n for Φ and window size n, there exists a (strictly) λ-correct randomized sliding window algorithm P′_n for Φ and window size n such that s(P′_n) ≤ min{s(P_n), log |Σ| · n}.
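The trivial algorithm from Remark 1 can be sketched as follows (function and parameter names are ours; the `pad` symbol plays the role of the fixed symbol used for the initial window):

```python
from collections import deque

def trivial_sliding_window(stream, n, phi, pad='#'):
    """Trivial deterministic sliding window algorithm: store the window
    content explicitly (log|Sigma| * n bits) and apply `phi` to it after
    every input symbol. Initially the window is the padding word pad^n."""
    window = deque(pad * n, maxlen=n)  # deque drops the oldest symbol itself
    outputs = []
    for a in stream:
        window.append(a)
        outputs.append(phi(''.join(window)))
    return outputs

# Example: exact count of 1's in a 0/1-window of size 4
outs = trivial_sliding_window('1101011', 4, lambda w: w.count('1'), pad='0')
# outs == [1, 2, 2, 3, 2, 2, 3]
```

Every sliding window problem over a finite alphabet is solvable this way; the point of the paper is when one can do better than these log |Σ| · n bits.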

Derandomization of Strictly Correct Algorithms
In this section we prove the main result of this paper, which states that strictly correct randomized sliding window algorithms can be completely derandomized:

Theorem 1 Let Φ ⊆ Σ* × Y be an approximation problem, n ∈ N be a window size and 0 ≤ λ < 1 be an error probability. For every randomized sliding window algorithm P_n which is strictly λ-correct for Φ and window size n there exists a deterministic sliding window algorithm D_n for Φ and window size n such that s(D_n) ≤ s(P_n).
The proof idea is to successively construct a (doubly exponentially long) strictly correct run which reads all possible windows of length n from a certain subset of memory states. This strictly correct run then defines a deterministic algorithm which is always correct.
Let Φ ⊆ Σ* × Y, let n ∈ N be a window size and let 0 ≤ λ < 1 be as in Theorem 1. Let P_n be a strictly λ-correct sliding window algorithm for Φ and window size n. By Remark 1, we can assume that P_n has a finite state set Q. Consider a run π : q_0 −a_1→ q_1 −a_2→ ··· −a_m→ q_m in P_n; we call π strictly correct for SW_n(Φ) if (last_n(a_1 ··· a_t), o(q_t)) ∈ Φ for every 0 ≤ t ≤ m. Consider a nonempty subset S ⊆ Q and a function δ : Q × Σ → Q such that S is closed under δ, i.e., δ(S × Σ) ⊆ S. We say that the run π is δ-conform if δ(q_{i−1}, a_i) = q_i for all 1 ≤ i ≤ m. We say that π is (S, δ)-universal if for all q ∈ S and x ∈ Σ^n there exists a δ-conform subrun π′ : q −x→ q′ of π. Finally, π is δ-universal if it is (S, δ)-universal for some nonempty subset S ⊆ Q which is closed under δ.

Lemma 1 Let π be a strictly correct run in P_n for Φ, let S ⊆ Q be a nonempty subset and let δ : Q × Σ → Q be a function such that S is closed under δ. If π is (S, δ)-universal, then there exists q_0 ∈ S such that D_n = (Q, Σ, q_0, δ, o) is a deterministic sliding window algorithm for Φ and window size n.
Proof Let q_0 = δ(p, □^n) ∈ S for an arbitrary state p ∈ S (where δ is extended to words in the natural way) and define D_n = (Q, Σ, q_0, δ, o). Let w ∈ Σ* and consider the run σ : p −□^n→ q_0 −w→ q in D_n, which has length ≥ n. We have to show that (last_n(w), o(q)) ∈ Φ. We can write □^n w = x last_n(w) for some x ∈ Σ*. Hence, we can rewrite the run σ as σ : p −x→ q′ −last_n(w)→ q, where q′ ∈ S since p ∈ S and S is closed under δ. Since π is (S, δ)-universal, it contains a δ-conform subrun starting in q′ and reading last_n(w); as δ is deterministic, this subrun is exactly q′ −last_n(w)→ q. At the end of this subrun the last n symbols read in π are last_n(w), so strict correctness of π yields (last_n(w), o(q)) ∈ Φ.

For the rest of this section we fix an arbitrary function δ : Q × Σ → Q such that for all q ∈ Q, a ∈ Σ,

ρ(q, a, δ(q, a)) = max{ρ(q, a, p) : p ∈ Q}.
Thus, we choose δ(q, a) as a most likely a-successor of q. Note that ρ(q, a, δ(q, a)) ≥ 1/|Q| for all q ∈ Q, a ∈ Σ. Furthermore, let D_n = (Q, Σ, q_0, δ, o) where the initial state q_0 will be defined later. We inductively define for each i ≥ 1 a state p_i, a run π*_i in D_n on some word w_i ∈ Σ*, and a non-empty set S_i ⊆ Q which is closed under δ. For m ≥ 0, we abbreviate Runs(P_n, w_1 ··· w_m) by R_m. Note that R_0 = Runs(P_n, ε). For 1 ≤ i ≤ m let H_i denote the event that for a random run π = π_1 ··· π_m ∈ R_m, where each π_j is a run on w_j, the subrun π_i is (S_i, δ)-universal. Notice that H_i is independent of m ≥ i.
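The choice of δ as a most likely successor function can be sketched as follows (the dictionary representation of ρ is our own assumption for illustration):

```python
def most_likely_transition(states, alphabet, rho):
    """Given transition probabilities rho[(q, a, p)], return a deterministic
    transition function delta with delta[(q, a)] = argmax_p rho[(q, a, p)].
    Since the probabilities over p sum to 1, the chosen transition always
    satisfies rho[(q, a, delta[(q, a)])] >= 1/|Q|."""
    delta = {}
    for q in states:
        for a in alphabet:
            delta[(q, a)] = max(states, key=lambda p: rho.get((q, a, p), 0.0))
    return delta

# Toy automaton with two states over {a, b}
rho = {(0, 'a', 0): 0.3, (0, 'a', 1): 0.7, (1, 'a', 1): 1.0,
       (0, 'b', 0): 1.0, (1, 'b', 0): 0.6, (1, 'b', 1): 0.4}
delta = most_likely_transition([0, 1], 'ab', rho)
# delta[(0, 'a')] == 1 and delta[(1, 'b')] == 0
```

This deterministic function δ is the entire "derandomized" transition structure; the rest of the proof only has to find a good initial state for it.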
First, we choose for p_i (i ≥ 1) a state that maximizes the probability Pr_{π∈R_{i−1}}[π ends in p_i], which is at least 1/|Q|. Note that p_1 is a state such that ι(p_1) is maximal, since R_0 only consists of empty runs (q). For S_i we take any maximal strongly connected component of D_n (viewed as a directed graph) which is reachable from p_i. As usual, strongly connected component means that for all p, q ∈ S_i the state p is reachable from q and vice versa. Maximality means that for every q ∈ S_i and every a ∈ Σ, also δ(q, a) belongs to S_i, i.e., S_i is closed under δ. Note that such a δ-closed strongly connected component must exist since Q is finite. Finally, we define the run π*_i and the word w_i. The run π*_i starts in p_i. Then, for each pair (q, x) ∈ S_i × Σ^n the run π*_i leads from the current state to state q via a simple run and then reads the word x from q. The order in which we go over all pairs (q, x) ∈ S_i × Σ^n is not important. Since S_i is a maximal strongly connected component of D_n, such a run π*_i exists. Hence, π*_i is a run on a word w_i which is the concatenation of the factors y_{q,x} x over all pairs (q, x) ∈ S_i × Σ^n, where y_{q,x} is the word that leads from the current state via a simple run to state q.
Since we choose the runs on the words y_{q,x} to be simple, we have |y_{q,x}| ≤ |Q| and thus |w_i| ≤ |Q| · |Σ|^n · (|Q| + n). Let us define

μ = |Q|^{−(|Q| · |Σ|^n · (|Q| + n) + 1)};   (2)

note that 1/μ is doubly exponential in the window size n. By construction, the run π*_i is (S_i, δ)-universal. Since π*_i is δ-conform, every transition of π*_i has probability at least 1/|Q| in P_n; hence a random run on w_i starting in p_i equals π*_i with probability at least (1/|Q|)^{|w_i|}. Since the event [π_m = π*_m] implies the event [π_{m−1} ends in p_m] (recall that π*_m starts in p_m) and π*_m is (S_m, δ)-universal, we obtain

Pr_{π∈R_m}[H_m] ≥ Pr_{π∈R_m}[π_{m−1} ends in p_m] · (1/|Q|)^{|w_m|} ≥ (1/|Q|)^{|w_m|+1} ≥ μ,

where the last inequality follows from the definition (2) of μ and the bound on |w_m|. This proves the lemma.
Thus, r_m := Pr_{π∈R_m}[∃i ≤ m : π_i is (S_i, δ)-universal] ≥ 1 − (1 − μ)^m. We can now show our main theorem:

Proof (Theorem 1) We use the probabilistic method in order to show that there exists q_0 ∈ Q such that D_n = (Q, Σ, q_0, δ, o) is a deterministic sliding window algorithm for Φ. With Lemma 3 we get

Pr_{π∈R_m}[π is strictly correct for SW_n(Φ) and δ-universal]
≥ 1 − Pr_{π∈R_m}[π is not strictly correct for SW_n(Φ)] − Pr_{π∈R_m}[π is not δ-universal]
≥ 1 − λ − (1 − μ)^m > 0

for m > log(1 − λ)/log(1 − μ) (note that λ < 1 and 0 < μ < 1 since we can assume that |Q| ≥ 2). Hence there are m ≥ 0 and a strictly correct δ-universal run π ∈ R_m. We can conclude with Lemma 1.

Polynomially Long Streams
The word w_1 w_2 ··· w_m (with m > log(1 − λ)/log(1 − μ)) from the previous section, for which there exists a strictly correct and δ-universal run, has a length that is doubly exponential in the window size n. To see this, note that log(1 − λ)/log(1 − μ) = ln(1 − λ)/ln(1 − μ), where ln(1 − λ) is a negative constant and, since μ is very close to 0, ln(1 − μ) is close to −μ; hence it suffices to take m of order |ln(1 − λ)|/μ. Moreover, 1/μ grows doubly exponentially in n by (2). In other words: we need the fact that the sliding window algorithm is strictly correct on doubly exponentially long streams with high probability in order to derandomize the algorithm. In this section we show that the length cannot be reduced to poly(n): if we restrict to inputs of length poly(n), then strictly λ-correct sliding window algorithms can yield a proper space improvement over deterministic sliding window algorithms.
For a word w = a_1 ··· a_n let w^R = a_n ··· a_1 denote the reversed word. Take the language K_pal = {w w^R : w ∈ {a, b}*} of all palindromes of even length, which belongs to the class DLIN of deterministic linear context-free languages [6], and let L = $K_pal. As explained in Section 2.1, we identify L with the (exact) approximation problem χ_L : {a, b, $}* → {0, 1} where χ_L(w) = 1 if and only if w ∈ L. We write L_n for SW_n(χ_L). Note that the following proposition holds for arbitrarily long input streams.
Proposition 1 Any deterministic sliding window algorithm for L and window size 2n + 1 uses Ω(n) space.
Proof Let D_{2n+1} be a deterministic sliding window algorithm for L and window size 2n + 1, and take two distinct words $x and $y where x, y ∈ {a, b}^n. Since D_{2n+1} accepts $xx^R and rejects $yx^R, the algorithm D_{2n+1} reaches two different states on the inputs $x and $y. Therefore, D_{2n+1} must have at least |{a, b}^n| = 2^n states and hence uses Ω(n) space.

Proposition 2 Fix a polynomial p(n) and let n ∈ N be a window size. If n is large enough, there is a randomized streaming algorithm P_n with s(P_n) ≤ O(log n) such that Pr_{π∈Runs(P_n,w)}[π is strictly correct for L_n] ≥ 1 − 1/n for all input streams w ∈ {a, b, $}* with |w| ≤ p(n).
Proof Babu et al. [6] have shown that for every language K ∈ DLIN there exists a randomized streaming algorithm using space O(log n) which, given an input v of length n,
- accepts with probability 1 if v ∈ K,
- and rejects with probability at least 1 − 1/n if v ∉ K.
We use this statement for the language K_pal ∈ DLIN. We remark that the algorithm needs to know the length of v in advance. To stay consistent with our definition, we view the above algorithm as a family (S_n)_{n≥0} of randomized streaming algorithms S_n. Furthermore, the error probability 1/n can be reduced to 1/(n + 1)^d, where d is chosen such that p(n) ≤ n^d for sufficiently large n (by picking random primes of size Θ(n^{d+1}) in the proof from [6]). Now we prove our claim for L = $K_pal. The streaming algorithm P_n for window size n works as follows: After reading a $-symbol, the algorithm S_{n−1} from above is simulated on the longest factor from {a, b}* that follows (i.e., S_{n−1} is simulated until the next $ arrives). Simultaneously we maintain the length ℓ of the maximal suffix over {a, b}, up to n, using O(log n) bits. If ℓ reaches n − 1, then P_n accepts if and only if S_{n−1} accepts. Notice that P_n can only err at time instants where the stored length ℓ is exactly n − 1, which happens at most once in every n steps. Therefore the number of time instants where P_n can err on an input stream w of length |w| ≤ p(n) ≤ n^d is at most |w|/n ≤ n^d/n = n^{d−1} (if n is large enough). Moreover, at each of these time instants the error probability is at most 1/n^d. By the union bound we have for every stream w ∈ {$, a, b}^{≤p(n)}:

Pr_{π∈Runs(P_n,w)}[π is not strictly correct for L_n] ≤ n^{d−1} · (1/n^d) = 1/n.

This concludes the proof.
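To give a flavor of how a small-space randomized palindrome test can work, here is a fingerprinting sketch in the spirit of the O(log n)-space algorithms above (the concrete hash construction and parameters are our own choices, not the actual algorithm of Babu et al.): it accepts every palindrome with probability 1 and rejects a non-palindrome except with probability at most n/p.

```python
import random

def palindrome_stream_check(stream, n, p=(1 << 61) - 1):
    """One-pass randomized palindrome test for a string of length n
    (known in advance), using O(log n + log p) bits of state.
    Compares the fingerprints sum_i x_i r^i and sum_i x_i r^(n+1-i) mod p;
    they agree for every palindrome, and for a non-palindrome they differ
    unless r is a root of a nonzero degree-<=n polynomial (prob. <= n/p)."""
    r = random.randrange(2, p)
    r_inv = pow(r, -1, p)                # modular inverse; p is prime
    h_fwd = h_rev = 0
    pow_fwd, pow_rev = r, pow(r, n, p)   # r^i and r^(n+1-i) for i = 1
    for c in stream:
        x = ord(c)
        h_fwd = (h_fwd + x * pow_fwd) % p
        h_rev = (h_rev + x * pow_rev) % p
        pow_fwd = pow_fwd * r % p
        pow_rev = pow_rev * r_inv % p
    return h_fwd == h_rev

assert palindrome_stream_check('abba', 4)
```

The one-sided error (false accepts only) matches the shape of the guarantee quoted from [6]: palindromes are accepted with probability 1 and non-palindromes are rejected with high probability.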

Lower Bound for Basic Counting
For an approximation error ϵ > 0 let us define the basic counting problem

C_{1,ϵ} = {(w, y) ∈ {0, 1}* × N : (1 − ϵ) · c_1(w) ≤ y ≤ (1 + ϵ) · c_1(w)},

where c_1(w) denotes the number of 1's in w. In [12] Datar, Gionis, Indyk and Motwani prove that any strictly λ-correct randomized sliding window algorithm for C_{1,ϵ} and window size n must use (k/64) · log^2(n/k) − log(1 − λ) bits, where k = 1/ϵ. We adapt their proof to show that the lower bound also holds for the weaker notion of λ-correct randomized sliding window algorithms.
Theorem 2 Let ϵ > 0 and k = 1/ϵ. Every 1/200-correct randomized sliding window algorithm for C_{1,ϵ} and window size n ≥ 4k must use (k/48) · log^2(n/k) many bits.
In the statement above we can assume any algorithm with error probability λ < 1/2 by applying the median trick, see e.g. [2]: we run m copies of the algorithm in parallel and output the median of their outputs. Using the Chernoff bound we can choose a constant m such that the median is a correct ϵ-approximation with error probability at most 1/200. This weakens the space lower bound only by a constant factor.
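The median trick mentioned above can be sketched as follows (the noisy estimator is a toy stand-in for a randomized sliding window algorithm; all names are ours):

```python
import random
import statistics

def median_amplify(estimator, m):
    """Run m independent copies of a randomized estimator and output the
    median. If each copy is correct with probability > 1/2, the median is
    wrong only if at least half the copies fail simultaneously, which by a
    Chernoff bound happens with probability exponentially small in m."""
    return statistics.median(estimator() for _ in range(m))

# Toy estimator: returns the true value 100 with probability 0.7,
# and garbage (far too small or far too large) otherwise.
def noisy():
    return 100 if random.random() < 0.7 else random.choice([0, 10**6])

trials = [median_amplify(noisy, 31) for _ in range(1000)]
ok_rate = sum(t == 100 for t in trials) / len(trials)
```

With per-copy success 0.7 and m = 31, the amplified failure rate drops below 1%, so a constant m already suffices to reach error probability 1/200.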
For the rest of the section let us fix n ∈ N and 0 < ϵ < 1. Furthermore, set k = 1/ϵ. For the proof we use a reduction from the following communication problem GT_{ℓ,m}: Alice holds m many ℓ-bit numbers a^{(1)}, ..., a^{(m)}, Bob holds an index 1 ≤ i ≤ m and an ℓ-bit number b, and Bob has to determine whether a^{(i)} > b holds. If m = 1, we write GT_ℓ = GT_{ℓ,1}.

Proposition 3 Let B = √(nk), assume n ≥ 4k, and let j = log(n/B). If P_n is a λ-correct sliding window algorithm for C_{1,ϵ} and window size n, then there exists a one-round protocol for GT_{log(4B/k), jk/4} with cost s(P_n) and error probability λ.
Proof In the following we ignore rounding issues. The idea is that Alice encodes her jk/4 many numbers by a bit stream consisting of jk/4 groups and feeds it into the sliding window algorithm. Then Bob can compare his number b with any of Alice's numbers a^{(i)} with high probability by sliding the window to the appropriate position.
As in [12] we partition the window of length n into j = log(n/B) blocks of sizes B, 2B, 4B, ..., 2^{j−1}B from right to left. Notice that j ≥ 1 by our assumption that n ≥ 4k. The blocks are numbered 0 to j − 1 from right to left. The i-th block of length 2^i B is divided into B many subblocks of length 2^i. Each block is divided into k/4 groups consisting of 4B/k contiguous subblocks. In the following we choose from every group exactly one subblock which is filled with 1's; the remaining subblocks in the group are filled with 0's. An example is shown in Figure 1. Let M denote the set of log(4B/k)-bit numbers, which index the 4B/k subblocks of a group. Given a tuple a = (a^{(1)}, ..., a^{(jk/4)}) of numbers from M, we obtain the arrangement w(a) by filling, in the r-th group of the i-th block, exactly the a^{(ik/4+r)}-th subblock with 1's, where both concatenations (of the blocks and of the groups within a block) are interpreted from right to left. Datar et al. [12] argue that for any two distinct tuples a and b, the arrangements w(a) and w(b) must be distinguished by a deterministic sliding window algorithm for C_{1,ϵ} and window length n.
We will present a communication protocol for GT_{log(4B/k), jk/4} based on a λ-correct sliding window algorithm P_n for C_{1,ϵ}. Notice that log(4B/k) ≥ 1 by the assumption that n ≥ 4k. Suppose that Alice holds the tuple a = (a^{(1)}, ..., a^{(jk/4)}) of numbers from M, and Bob holds b ∈ M and an index 1 ≤ p ≤ jk/4. Their goal is to determine whether a^{(p)} > b. The protocol is defined as follows: Alice simulates P_n on w(a) and sends the reached state to Bob, using s(P_n) bits. Suppose that p = ik/4 + r for some 0 ≤ i ≤ j − 1 and 1 ≤ r ≤ k/4. Bob then inserts a suitable number of 0's into the stream such that the length-n window starts with the b-th subblock from the r-th group in the i-th block of w(a). Notice that this is possible without knowing the tuple a because of the regular structure of arrangements, which is known to Bob. The number of 1-bits in the obtained window is determined by the arrangement, and it differs between the cases a^{(p)} > b and a^{(p)} ≤ b by more than twice the absolute approximation error allowed for C_{1,ϵ}; hence the two cases can be distinguished by P_n with probability 1 − λ.
It remains to prove a lower bound for the one-round communication complexity of GT_{ℓ,m}. We start by showing that the one-round communication complexity of GT_n is Ω(n). This was already proven by Yao [25, Theorem 5]. More generally, Miltersen et al. showed that any r-round protocol for GT_n requires Ω(n^{1/r}) bits, using the round elimination technique [21]. We will first reprove the Ω(n) lower bound for GT_n by directly plugging r = 1 and GT_n into the proof of [21, Lemma 11]. Afterwards we adapt the proof to show the Ω(ℓm) lower bound for GT_{ℓ,m}.
Theorem 3 Every one-round randomized protocol for GT_n with error probability 1/200 has cost at least n/3 bits.
Proof We follow the proof of [21, Lemma 11]. Consider a randomized one-round protocol for GT_n with error probability 1/200. The goal is to prove that the protocol must use n/3 bits. By Yao's minimax principle [24] it suffices to exhibit a "hard" input distribution D on the set of inputs {0, 1}^n × {0, 1}^n and to prove that every deterministic protocol P with Pr_D[P(x, y) ≠ GT_n(x, y)] ≤ 1/200 must have cost Ω(n). See [20] for similar applications of Yao's minimax principle in the area of communication complexity.
For a bit string x = x_1 ··· x_n and an index 1 ≤ i ≤ n we define the bit string τ_i(x) = x_1 ··· x_{i−1} 0 1^{n−i} of length n. Interpreted as binary numbers, we have the property

GT_n(x, τ_i(x)) = x_i.   (4)

The "hard" input distribution D is the uniform distribution on {(x, τ_i(x)) : x ∈ {0, 1}^n, 1 ≤ i ≤ n}. In other words, Alice holds a uniformly random string x ∈ {0, 1}^n and Bob holds τ_i(x) where the index 1 ≤ i ≤ n is also chosen uniformly at random and independently from x. By property (4), Bob needs to determine the value of x_i. Intuitively, the prefix x_1 ··· x_{i−1} of τ_i(x) does not help Bob, so this is basically the "index" function, for which every one-round randomized protocol with error probability 1/3 has cost Ω(n) [19, Theorem 3.7]. Consider any deterministic protocol P with communication cost c such that Pr_D[P(x, τ_i(x)) ≠ GT_n(x, τ_i(x))] ≤ 1/200. Call an index 1 ≤ i ≤ n bad for x ∈ {0, 1}^n if P errs on the input (x, τ_i(x)). Then the expected number of bad indices of a uniformly random x ∈ {0, 1}^n is at most n/200, so by Markov's inequality at most half of all x have more than 0.01n bad indices. Hence the set R = {x ∈ {0, 1}^n | x has at most 0.01n bad indices} must contain at least 2^{n−1} bit strings. Since Alice sends at most c bits, she partitions R into at most 2^c subsets according to the bit string sent to Bob. Let T be one of these subsets of maximum cardinality. We have |T| ≥ |R|/2^c = 2^{n−1−c}, i.e., c ≥ n − 1 − log |T|. A counting argument now shows that |T| ≤ 1.0917^n, which yields c ≥ n − 1 − n · log(1.0917) ≥ n/3 for n ≥ 2 (for n = 1, Alice must clearly send at least one bit).
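Property (4) — that x > τ_i(x), interpreted as binary numbers, holds if and only if x_i = 1 — can be sanity-checked exhaustively for small n (our own verification script, not part of the proof):

```python
import itertools

def tau(x, i):
    """tau_i(x) = x_1 ... x_{i-1} 0 1^(n-i) for a bit string x (i is 1-indexed)."""
    return x[:i - 1] + '0' + '1' * (len(x) - i)

def gt(x, y):
    """GT(x, y): compare bit strings as binary numbers."""
    return int(x, 2) > int(y, 2)

# Exhaustive check of property (4) for n = 5:
# x > tau_i(x) holds exactly when the i-th bit of x is 1.
for bits in itertools.product('01', repeat=5):
    x = ''.join(bits)
    for i in range(1, 6):
        assert gt(x, tau(x, i)) == (x[i - 1] == '1')
```

The intuition is visible in the construction: x and τ_i(x) share the prefix x_1 ··· x_{i−1}, so the comparison is decided at position i, where τ_i(x) carries a 0 followed by the largest possible suffix 1^{n−i}.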
Theorem 4 Every one-round randomized protocol for GT_{ℓ,m} with error probability 1/200 has cost at least mℓ/3 bits.

Proof We adapt the proof of Theorem 3, viewing Alice's input as a bit string x ∈ {0, 1}^{mℓ}. We call an index 1 ≤ i ≤ mℓ bad if P errs on (x, τ_i(x), p(i)), where τ_i(x) and the block index p(i) are the natural adaptations of the construction above to the block structure of x. Again we can find a set T ⊆ {0, 1}^{mℓ} such that
- |T| ≥ 2^{mℓ−1−c},
- all x ∈ T have at most 0.01 mℓ bad indices,
- and Alice sends the same message on all x ∈ T.
Using precisely the same arguments as in the previous proof we obtain |T| ≤ 1.0917^{mℓ} and thus c ≥ mℓ/3 whenever mℓ ≥ 2. If mℓ = 1 then Alice must also send at least one bit in any communication protocol for GT_{ℓ,m} with error probability 1/200.
Theorem 2 now follows from Proposition 3 and Theorem 4: Let B = √(nk) and j = log(n/B) ≥ 1. Then any randomized 1/200-correct sliding window algorithm for C_{1,ϵ} and window size n ≥ 4k must use at least (jk/12) · log(4B/k) ≥ (k/48) · log^2(n/k) many bits, since j = (1/2) · log(n/k) and log(4B/k) ≥ (1/2) · log(n/k).

Open Problems
In the proof of Theorem 1 we need the fact that the sliding window algorithm is strictly correct on doubly exponentially long streams with high probability. We pose the question whether this can be reduced to exponentially long streams. Another open problem is whether one can extend Theorem 4 to arbitrary communication problems.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.