1 Introduction

Sliding window streaming algorithms process an input sequence \(a_{1} a_{2} {\cdots } a_{m}\) from left to right and receive at time t the symbol \(a_{t}\) as input. Such algorithms are required to compute at each time instant t a value \(f(a_{t-n+1} {\cdots } a_{t})\) that depends on the n last symbols (we assume t ≥ n here). The value n is called the window size and the sequence \(a_{t-n+1} {\cdots } a_{t}\) is called the window content at time t. In many applications, data items in a stream are outdated after a certain time, and the sliding window model is a simple way to model this. A typical application is the analysis of a time series as it may arise in network monitoring, healthcare and patient monitoring, and transportation grid monitoring [3].

A general goal in the area of sliding window algorithms is to avoid the explicit storage of the window content, and, instead, to work in considerably smaller space, e.g. space polylogarithmic in the window size. In the seminal paper of Datar, Gionis, Indyk and Motwani [12], where the sliding window model was introduced, the authors prove that the number of 1’s in a 0/1-sliding window of size n can be maintained in space \(O(\frac {1}{\epsilon } \cdot \log ^{2} n)\) if one allows a multiplicative error of 1 ± 𝜖. Also a matching lower bound is shown. Other algorithmic problems that were addressed in the extensive literature on sliding window streams include the computation of statistical data (e.g. computation of the variance and k-median [5], and quantiles [4]), optimal sampling from sliding windows [9], membership problems for formal languages [13,14,15,16], computation of edit distances [10], database querying (e.g. processing of join queries over sliding windows [18]) and graph problems (e.g. checking for connectivity and computation of matchings, spanners, and minimum spanning trees [11]). The reader can find further references in [1, Chapter 8] and [8].

Many of the above mentioned papers deal with sliding window algorithms that only compute a good enough approximation of the exact value of interest. In fact, even for very simple sliding window problems it is unavoidable to store the whole window content. Examples are the exact computation of the number of 1’s [12] or the computation of the first symbol of the sliding window for a 0/1-data stream [14]. In this paper, we consider a general model for sliding window approximation problems, where a (possibly infinite) set of admissible output values is fixed for each word. To be more precise, a specific approximation problem is described by a relation \({\varPhi } \subseteq {\Sigma }^{*} \times Y\) which associates to words over a finite alphabet Σ (the set of data values in the stream) admissible output values from a possibly infinite set Y. A sliding window algorithm for such a problem is then required to compute at each time instant an admissible output value for the current window content. This model covers exact algorithms (where Φ is a function \(\varPhi \colon {\Sigma }^{*} \to Y\)) as well as a wide range of approximation algorithms. For example, the computation of the number of 1’s in a 0/1-sliding window with an allowed multiplicative error of 1 ± 𝜖 is covered by our model, since for a word with k occurrences of 1, the admissible output values are the integers between (1 − 𝜖)k and (1 + 𝜖)k.
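As a concrete illustration of this definition, the following minimal sketch (our own, not part of the formal model; the function name admissible and the parameter eps are illustrative) expresses the basic counting relation as a predicate on pairs (w, y):

```python
def admissible(w: str, y: int, eps: float) -> bool:
    """Return True iff (w, y) lies in the approximation relation Phi for
    counting 1's with multiplicative error 1 +/- eps."""
    k = w.count("1")  # exact number of 1's in the word w
    return (1 - eps) * k <= y <= (1 + eps) * k
```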

A second ingredient of many sliding window algorithms is randomization. Following our recent work [13,14,15] we model a randomized streaming algorithm for a given approximation problem as a probabilistic automaton over a finite alphabet. Probabilistic automata were introduced by Rabin [23] and can be seen as a common generalization of deterministic finite automata and Markov chains. The basic idea is that for every state q and every input symbol a, the next state is chosen according to some probability distribution. As an extension to the classical model of Rabin, states in a probabilistic automaton are not accepting or rejecting but are associated with output values from the set Y. This allows us to associate with every input word \(w \in {\Sigma }^{*}\) and every output value y ∈ Y the probability that the automaton outputs y on input w. In order to solve a specific approximation problem \(\varPhi \subseteq {\Sigma }^{*} \times Y\) in the sliding window model, one requires that for a given window size n, a probabilistic automaton \(\mathcal {P}_{n}\) has a small error probability λ (say λ = 1/3) on every input stream. But what exactly does the latter mean? Two different definitions can be found in the literature:

  • For every input stream \(w = a_{1} {\cdots } a_{m}\), the probability that \(\mathcal {P}_{n}\) outputs on input w a value y ∈ Y with \((a_{m-n+1} {\cdots } a_{m},y) \notin {\varPhi }\) is at most λ. Clearly an equivalent formulation is that, for all input streams \(w = a_{1} {\cdots } a_{m}\) and all 0 ≤ t ≤ m, the probability that \(\mathcal {P}_{n}\) outputs on input \(a_{1} {\dots } a_{t}\) a value y ∈ Y with \((a_{t-n+1} {\cdots } a_{t},y) \notin {\varPhi }\) is at most λ. In this case, we say that \(\mathcal {P}_{n}\) is λ-correct for Φ and window size n.

  • For every input stream \(w = a_{1} {\cdots } a_{m}\), the probability that \(\mathcal {P}_{n}\) outputs at some time instant t (n ≤ t ≤ m) a value y ∈ Y with \((a_{t-n+1} {\cdots } a_{t},y) \notin {\varPhi }\) is at most λ. In this case, we say that \(\mathcal {P}_{n}\) is strictly λ-correct for Φ and window size n.

One can rephrase the difference between strict λ-correctness and λ-correctness as follows: λ-correctness means that while the randomized sliding window algorithm runs on an input stream it returns at each time instant an admissible output value with probability at least 1 − λ. In contrast, strict λ-correctness means that while the randomized sliding window algorithm reads an input stream, the probability that the algorithm returns an admissible output value at every time instant is at least 1 − λ. Obviously this makes a difference: imagine that Y = {1,2,3,4,5,6} and that for every input word \(w \in {\Sigma }^{*}\) the admissible output values are 2,3,4,5,6. Then the algorithm that returns at every time instant the output of a fair die throw is 1/6-correct. But the probability that this algorithm returns an admissible output value at every time instant is only (5/6)^m for an input stream of length m and hence converges to 0 for \(m \to \infty \). Of course, in general, the situation is more complex since successive output values of a randomized sliding window algorithm are not independent.
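For instance, already for a stream of length m = 100 the probability that the die-throwing algorithm is correct at every time instant is negligible:

$$ (5/6)^{100} = e^{100 \cdot \ln (5/6)} \approx e^{-18.2} \approx 1.2 \cdot 10^{-8} . $$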

In the following discussion, let us fix the error probability λ = 1/3 (using probability amplification, one can reduce λ to any constant > 0). In our recent paper [15] we studied the space complexity of the membership problem for regular languages with respect to λ-correct randomized sliding window algorithms. It turned out that in this setting, one can gain from randomization. Consider for instance the regular language \(a b^{*}\) over the alphabet {a,b}. Thus, the sliding window algorithm for window size n should output “yes” if the current window content is \(a b^{n-1}\) and “no” otherwise. From our results in [13, 14], it follows that the optimal space complexity of a deterministic sliding window algorithm for the membership problem for \(a b^{*}\) is \(\varTheta (\log n)\). On the other hand, it is shown in [15] that there is a λ-correct randomized sliding window algorithm for \(a b^{*}\) with (worst-case) space complexity \(O(\log \log n)\) (this is also optimal). In fact, we proved in [15] that for every regular language L, the space optimal λ-correct randomized sliding window algorithm for L has either constant, doubly logarithmic, logarithmic, or linear space complexity, and the corresponding four space classes can be characterized in terms of simple syntactic properties.
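For intuition, here is a minimal sketch (our own, in Python; the class name LastA is an illustrative choice) of the logarithmic-space deterministic sliding window algorithm for ab*: it suffices to count, capped at n, how many symbols were read since the last a.

```python
class LastA:
    """Deterministic sliding window algorithm for membership in a b^* with
    window size n: after each symbol it reports whether the current window
    content equals a b^(n-1).  The state is a counter bounded by n, i.e.
    O(log n) bits."""

    def __init__(self, n: int):
        self.n = n
        self.d = n  # symbols read since the last 'a', capped at n

    def step(self, symbol: str) -> bool:
        self.d = 0 if symbol == "a" else min(self.d + 1, self.n)
        # the window is a b^(n-1) iff the last 'a' is exactly its oldest symbol
        return self.d == self.n - 1
```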

Strict λ-correctness is used (without explicit mention) for instance in [7, 12]. In these papers, the lower bounds shown for deterministic sliding-window algorithms are extended with the help of Yao’s minimax principle [24] to strictly λ-correct randomized sliding-window algorithms. The main result from the first part of the paper states that this is a general phenomenon: we show that every strictly λ-correct sliding window algorithm for an approximation problem Φ can be derandomized without increasing the worst-case space complexity (Theorem 1). To the best of our knowledge, this is the first investigation of the general power of randomization with respect to the space consumption of sliding window algorithms. We emphasize that our proof does not utilize Yao’s minimax principle, which would require the choice of a “hard” distribution of input streams specific to the problem. It remains open whether such a hard distribution exists for every approximation problem.

We remark that the proof of Theorem 1 needs the fact that the sliding window algorithm is strictly correct on doubly exponentially long streams with high probability in order to derandomize it. In fact, we show that for a certain problem a restriction to polynomially long input streams yields an advantage of strictly correct randomized algorithms over deterministic ones, see Propositions 1 and 2. Whether such an advantage can also be obtained for input streams of length singly exponential in the window size remains open.

In the second part of the paper we come back to the problem of counting the number of 1’s in a sliding window [7, 12]. Datar et al. [12] proved a space lower bound of \(\varOmega (\frac {1}{\epsilon } \cdot \log ^{2} n)\) for approximating the number of 1’s in a sliding window of size n with a multiplicative error of 1 ± 𝜖. This lower bound is first shown for deterministic algorithms and then, using Yao’s minimax principle [24], extended to strictly λ-correct randomized sliding-window algorithms. We show that the same lower bound also holds for the wider class of λ-correct randomized sliding-window algorithms (Theorem 2). For the proof of this result we first show a lower bound for the one-way randomized communication complexity of the following problem: Alice holds m many ℓ-bit numbers \(a_{1}, \dots , a_{m}\), and Bob holds an index 1 ≤ i ≤ m and an ℓ-bit number b. The goal of Bob is to find out whether \(a_{i} > b\) holds. We show that Alice has to transfer at least ℓm/3 bits to Bob if the protocol has an error probability of at most 1/200 (Theorem 4). From this result we can derive Theorem 2 using ideas from the lower bound proof in [12].

Let us add further remarks on our sliding window model. First of all, it is crucial for our proofs that the input alphabet (i.e., the set of data values in the input stream) is finite. This is for instance the case when counting the number of 1’s in a 0/1-sliding window. On the other hand, the problem of computing the sum of all data values in a sliding window of arbitrary numbers (a problem that is considered in [12] as well) is not covered by our setting, unless one puts a bound on the size of the numbers in the input stream.

As a second remark, note that our sliding window model is non-uniform in the sense that for every window size we may have a different streaming algorithm. In other words, it is not required that there exists a single streaming algorithm that gets the window size as a parameter. Clearly, lower bounds get stronger when shown for the non-uniform model. Moreover, all proofs of lower bounds in the sliding window setting that we are aware of hold for the non-uniform model.

2 Preliminaries

With [0,1] we denote the real interval \(\{ p \in \mathbb {R} : 0 \leq p \leq 1\}\) of all probabilities. With \(\log \) we always mean the logarithm to the base two.

The set of all words over a finite alphabet Σ is denoted by \({\Sigma }^{*}\). The empty word is denoted by ε. The length of a word \(w \in {\Sigma }^{*}\) is denoted by |w|. The sets of words over Σ of length exactly, at most and at least n are denoted by \({\Sigma }^{n}\), \({\Sigma }^{\leq n}\) and \({\Sigma }^{\geq n}\), respectively. In the context of streaming algorithms, we also use the term “stream” for words.

2.1 Approximation Problems

An approximation problem is a relation \(\varPhi \subseteq {\Sigma }^{*} \times Y\) where Σ is a finite alphabet and Y is a (possibly infinite) set of output values. The relation Φ associates with each word \(w \in {\Sigma }^{*}\) a set of admissible or correct output values in Y. Typical examples include:

  • exact computation problems \(\varphi \colon {\Sigma }^{*} \to Y\) where we identify φ with its graph \({\varPhi } = \{(w,\varphi (w)) : w \in {\Sigma }^{*}\}\). A typical example is the mapping \(c_{1} \colon \{0,1\}^{*} \to \mathbb {N}\) where c1(w) is the number of 1’s in w. Another exact problem is given by the characteristic function \(\chi _{L} \colon {\Sigma }^{*} \to \{0,1\}\) of a language \(L \subseteq {\Sigma }^{*}\) (χL(w) = 1 if and only if w ∈ L).

  • approximation of some numerical value for the data stream, which can be modeled by a relation \(\varPhi \subseteq {\Sigma }^{*} \times \mathbb {N}\). A typical example would be the problem {(w,k) : (1 − 𝜖) ⋅ c1(w) ≤ k ≤ (1 + 𝜖) ⋅ c1(w)} for some 0 < 𝜖 < 1.

For a window length n ≥ 0 and a stream w ∈Σ we define lastn(w) to be the suffix of \(\square ^{n} w\) of length n where \(\square \in {\Sigma }\) is a fixed alphabet symbol. The word \(\text {last}_{n}(\varepsilon ) = \square ^{n}\) is also called the initial window. To every approximation problem \(\varPhi \subseteq {\Sigma }^{*} \times Y\) we associate the sliding window problem

$$ \text{SW}_{n}{(\varPhi)} = \{ (x,y) \in {\Sigma}^{*} \times Y : (\text{last}_{n}(x),y) \in {\varPhi} \} $$

for window length n.
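In code, lastn(w) is simply the length-n suffix of the padded stream; the following sketch (a hypothetical helper of our own, with '#' playing the role of □) makes this explicit.

```python
def last_n(w: str, n: int, pad: str = "#") -> str:
    """Window content after reading the stream w: the suffix of pad^n + w
    of length n."""
    return (pad * n + w)[len(w):]
```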

2.2 Probabilistic Automata with Output

In the following we will introduce probabilistic automata [22, 23] as a model of randomized streaming algorithms which produce an output after each input symbol. A randomized streaming algorithm or a probabilistic automaton \(\mathcal {P} = (Q,{\Sigma },\iota ,\rho ,o)\) consists of a (possibly infinite) set of states Q, a finite alphabet Σ, an initial state distribution ι: Q → [0,1], a transition probability function ρ: Q ×Σ× Q → [0,1] and an output function \(o \colon Q \to Y\) such that

  • \({\sum }_{q \in Q} \iota (q) = 1\),

  • \({\sum }_{q \in Q} \rho (p,a,q) = 1\) for all pQ, a ∈Σ.

The space of the randomized streaming algorithm \(\mathcal {P}\) (or the number of bits used by \(\mathcal {P}\)) is given by \(s(\mathcal {P}) = \log |Q| \in \mathbb {R}_{\ge 0} \cup \{\infty \}\).

If ι and ρ map into {0,1}, then \(\mathcal {P}\) is a deterministic automaton; in this case we write \(\mathcal {P}\) as \(\mathcal {P} = (Q,{\Sigma },q_{0},\delta ,o)\), where \(q_{0} \in Q\) is the initial state and δ: Q ×Σ→ Q is the transition function. A run on a word \(a_{1} {\cdots } a_{m} \in {\Sigma }^{*}\) in \(\mathcal {P}\) is a sequence \(\pi = (q_{0},a_{1},q_{1},a_{2},\dots ,a_{m},q_{m})\) where \(q_{0}, \dots , q_{m} \in Q\) and \(\rho (q_{i-1},a_{i},q_{i}) > 0\) for all 1 ≤ i ≤ m. If m = 0 we obtain the empty run (q0) starting and ending in q0. We write runs in the usual way

$$ \pi \colon q_{0} \xrightarrow{a_{1}} q_{1} \xrightarrow{a_{2}} {\cdots} \xrightarrow{a_{m}} q_{m} $$

or also omit the intermediate states: \(\pi \colon q_{0} \xrightarrow {a_{1} {\cdots } a_{m}} q_{m}\). We extend ρ to runs in the natural way: if \(\pi \colon q_{0} \xrightarrow {a_{1}} q_{1} \xrightarrow {a_{2}} {\cdots } \xrightarrow {a_{m}} q_{m}\) is a run in \(\mathcal {P}\) then \(\rho (\pi ) = {\prod }_{i=1}^{m} \rho (q_{i-1},a_{i},q_{i})\). Furthermore we define ρι(π) = ι(q0) ⋅ ρ(π). We denote by \(\text {Runs}(\mathcal {P},w)\) the set of all runs on w in \(\mathcal {P}\) and denote by \(\text {Runs}(\mathcal {P},q,w)\) those runs on w that start in q ∈ Q. If \(\mathcal {P}\) is clear from the context, we simply write Runs(w) and Runs(q,w). Notice that for each \(w \in {\Sigma }^{*}\) the function ρι is a probability distribution on \(\text {Runs}(\mathcal {P},w)\) and for each q ∈ Q the restriction of ρ to \(\text {Runs}(\mathcal {P},q,w)\) is a probability distribution on \(\text {Runs}(\mathcal {P},q,w)\). If Π is a set of runs (which will often be defined by a certain property of runs), then \(\Pr _{\pi \in \text {Runs}(w)}[\pi \in {\Pi }]\) denotes the probability \({\sum }_{\pi \in \text {Runs}(w) \cap {\Pi }} \rho _{\iota }(\pi )\) and \(\Pr _{\pi \in \text {Runs}(q,w)}[\pi \in {\Pi }]\) denotes \({\sum }_{\pi \in \text {Runs}(q,w) \cap {\Pi }} \rho (\pi )\).
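The following sketch (our own, with Python dictionaries standing in for ι, ρ and o) shows how a run of a probabilistic automaton with output can be sampled according to ρι; it is meant only to make the definitions concrete.

```python
import random

class ProbabilisticAutomaton:
    """iota: dict state -> probability; rho: dict (state, symbol) -> dict of
    successor probabilities; out: dict state -> output value."""

    def __init__(self, iota, rho, out):
        self.iota, self.rho, self.out = iota, rho, out

    def sample_run(self, w):
        """Sample a run on the word w and return the outputs o(q_0), ..., o(q_m)."""
        states = list(self.iota)
        q = random.choices(states, weights=[self.iota[s] for s in states])[0]
        outputs = [self.out[q]]
        for a in w:
            dist = self.rho[(q, a)]
            succ = list(dist)
            q = random.choices(succ, weights=[dist[s] for s in succ])[0]
            outputs.append(self.out[q])
        return outputs
```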

2.3 Correctness Definitions

Let \(\mathcal {P} = (Q,{\Sigma },\iota ,\rho ,o)\) be a randomized streaming algorithm with output function \(o \colon Q \to Y\), let \(\varPhi \subseteq {\Sigma }^{*} \times Y\) be an approximation problem and let \(w = a_{1}a_{2} {\cdots } a_{m} \in {\Sigma }^{*}\) be an input stream. Furthermore let 0 ≤ λ ≤ 1 be an error probability.

  • A run \(\pi \colon q_{0} \xrightarrow {w} q_{m}\) in \(\mathcal {P}\) is correct for Φ if (w,o(qm)) ∈Φ. We say that \(\mathcal {P}\) is λ-correct for Φ if for all \(w \in {\Sigma }^{*}\) we have

    $$ \underset{\pi \in \text{Runs}(w)}{\Pr} [\pi \text{ is correct for } {\varPhi}] \ge 1 - \lambda. $$
  • A run \(\pi \colon q_{0} \xrightarrow {a_{1}} q_{1} \xrightarrow {a_{2}} {\cdots } q_{m-1} \xrightarrow {a_{m}} q_{m}\) in \(\mathcal {P}\) on w is strictly correct for Φ if \((a_{1} {\cdots } a_{t},o(q_{t})) \in {\varPhi }\) for all 0 ≤ t ≤ m. We say that \(\mathcal {P}\) is strictly λ-correct for Φ if for all \(w \in {\Sigma }^{*}\) we have

    $$ \underset{\pi \in \text{Runs}(w)}{\Pr} [\pi \text{ is strictly correct for } {\varPhi}] \ge 1 - \lambda. $$

A (strictly) λ-correct randomized streaming algorithm \(\mathcal {P}_{n}\) for SWn(Φ) is also called a (strictly) λ-correct randomized sliding window algorithm for Φ and window size n. If \(\mathcal {P}_{n}\) is deterministic and (strictly) 0-correct, we speak of a deterministic sliding window algorithm for Φ and window size n. The reader might think of having for every window size n a sliding window algorithm \(\mathcal {P}_{n}\). We do not assume any uniformity here in the sense that the sliding window algorithms for different window sizes do not have to follow a common pattern. This is the same situation as in non-uniform circuit complexity, where one has for every input length n a circuit Cn and it is not required that the circuit Cn can be computed from n.

Remark 1

The trivial sliding window algorithm stores for window size n the window content with \(\lceil \log |{\Sigma }| \rceil \cdot n\) bits. Hence every approximation problem has a deterministic sliding window algorithm \(\mathcal {D}_{n}\) with \(s(\mathcal {D}_{n}) \le \lceil \log |{\Sigma }| \rceil \cdot n\). In particular, for every (strictly) λ-correct randomized sliding window algorithm \(\mathcal {P}_{n}\) for Φ and window size n, there exists a (strictly) λ-correct randomized sliding window algorithm \(\mathcal {P}_{n}^{\prime }\) for Φ and window size n such that

$$ s(\mathcal{P}_{n}^{\prime}) \le \min\{ s(\mathcal{P}_{n}), \lceil \log |{\Sigma}| \rceil \cdot n \}. $$
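A minimal sketch of the trivial algorithm from Remark 1 (our own illustration): the state is the window content itself, kept in a buffer of n symbols.

```python
from collections import deque

def trivial_sliding_window(stream, n: int, pad: str = "#"):
    """Trivial deterministic sliding window algorithm: stores the full
    window content (about ceil(log|Sigma|) * n bits) and yields it after
    every symbol; any approximation problem can then be answered exactly."""
    window = deque(pad * n, maxlen=n)  # initial window content
    for a in stream:
        window.append(a)  # the oldest symbol falls out automatically
        yield "".join(window)
```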

3 Derandomization of Strictly Correct Algorithms

In this section we prove the main result of this paper, which states that strictly correct randomized sliding window algorithms can be completely derandomized:

Theorem 1

Let \(\varPhi \subseteq {\Sigma }^{*} \times Y\) be an approximation problem, \(n \in \mathbb {N}\) be a window size and 0 ≤ λ < 1 be an error probability. For every randomized sliding window algorithm \(\mathcal {P}_{n}\) which is strictly λ-correct for Φ and window size n there exists a deterministic sliding window algorithm \(\mathcal {D}_{n}\) for Φ and window size n such that \(s(\mathcal {D}_{n}) \le s(\mathcal {P}_{n})\).

The proof idea is to successively construct a (doubly exponentially long) strictly correct run which reads all possible windows of length n from a certain subset of memory states. This strictly correct run then defines a deterministic algorithm which is always correct.

Let \({\varPhi } \subseteq {\Sigma }^{*} \times Y\), \(n \in \mathbb {N}\) be a window size and 0 ≤ λ < 1 as in Theorem 1. Let \(\mathcal {P}_{n}\) be a strictly λ-correct sliding window algorithm for Φ and window size n. By Remark 1, we can assume that \(\mathcal {P}_{n}\) has a finite state set. Consider a run

$$ \pi \colon q_{0} \xrightarrow{a_{1}} q_{1} \xrightarrow{a_{2}} {\cdots} \xrightarrow{a_{m}} q_{m} $$

in \(\mathcal {P}_{n}\). The run π is simple if \(q_{i} \neq q_{j}\) for 0 ≤ i < j ≤ m. A subrun of π is a run

$$ q_{i} \xrightarrow{a_{i+1}} q_{i+1} \xrightarrow{a_{i+2}} {\cdots} q_{j-1} \xrightarrow{a_{j}} q_{j} $$

for some 0 ≤ i ≤ j ≤ m. Consider a nonempty subset \(S \subseteq Q\) and a function δ: Q ×Σ→ Q such that S is closed under δ, i.e., \(\delta (S \times {\Sigma }) \subseteq S\). We say that the run π is δ-conform if \(\delta (q_{i-1},a_{i}) = q_{i}\) for all 1 ≤ i ≤ m. We say that π is (S,δ)-universal if for all q ∈ S and \(x \in {\Sigma }^{n}\) there exists a δ-conform subrun \(\pi ^{\prime } \colon q \xrightarrow {x} q^{\prime }\) of π. Finally, π is δ-universal if it is (S,δ)-universal for some nonempty subset \(S \subseteq Q\) which is closed under δ.

Lemma 1

Let π be a strictly correct run in \(\mathcal {P}_{n}\) for Φ, let \(S \subseteq Q\) be a nonempty subset and let δ: Q ×Σ→ Q be a function such that S is closed under δ. If π is (S,δ)-universal, then there exists q0S such that \(\mathcal {D}_{n} = (Q,{\Sigma },q_{0},\delta ,o)\) is a deterministic sliding window algorithm for Φ and window size n.

Proof

Let \(q_{0} = \delta (p,\square ^{n}) \in S\) for some arbitrary state p ∈ S and define \(\mathcal {D}_{n} = (Q,{\Sigma },q_{0},\delta ,o)\). Let \(w \in {\Sigma }^{*}\) and consider the run \(\sigma \colon p \xrightarrow {\square ^{n}} q_{0} \xrightarrow {w} q\) in \(\mathcal {D}_{n}\) of length ≥ n. We have to show that (lastn(w),o(q)) ∈Φ. We can write \(\square ^{n} w = x \text {last}_{n}(w)\) for some \(x \in {\Sigma }^{*}\). Hence, we can rewrite the run σ as \(\sigma \colon p \xrightarrow {x} q^{\prime } \xrightarrow {\text {last}_{n}(w)} q\). We know that \(q^{\prime } \in S\) because S is closed under δ. Since π is (S,δ)-universal, it contains a δ-conform subrun \(q^{\prime } \xrightarrow {\text {last}_{n}(w)} q\) (δ-conform runs from a fixed state on a fixed word are unique, so this subrun indeed ends in q). Strict correctness of π implies (lastn(w),o(q)) ∈Φ. □

For the rest of this section we fix an arbitrary function δ: Q ×Σ→ Q such that for all q ∈ Q, a ∈Σ,

$$ \rho(q,a,\delta(q,a)) = \max \{ \rho(q,a,p) \colon p \in Q \} . $$

Thus, we choose δ(q,a) as a most likely a-successor of q. Note that

$$ \rho(q,a,\delta(q,a)) \geq \frac{1}{|Q|} $$
(1)

for all q ∈ Q, a ∈Σ. Furthermore, let \(\mathcal {D}_{n} = (Q,{\Sigma },q_{0},\delta ,o)\) where the initial state q0 will be defined later. We inductively define for each i ≥ 1 a state pi, a run \(\pi _{i}^{*}\) in \(\mathcal {D}_{n}\) on some word \(w_{i} \in {\Sigma }^{*}\), and a non-empty set \(S_{i} \subseteq Q\), which is closed under δ. For m ≥ 0, we abbreviate \(\text {Runs}(\mathcal {P}_{n},w_{1} {\cdots } w_{m})\) by Rm. Note that \(R_{0} = \text {Runs}(\mathcal {P}_{n}, \varepsilon )\). For 1 ≤ i ≤ m let Hi denote the event that for a random run \(\pi = \pi _{1} {\cdots } \pi _{m} \in R_{m}\), where each πj is a run on wj, the subrun πi is (Si,δ)-universal. Notice that Hi does not depend on m ≥ i.
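The choice of δ above can be made concrete as follows (a sketch under the assumption that ρ is given as a finite table; the function name is ours):

```python
def most_likely_successor(rho, states, alphabet):
    """For every state q and symbol a pick a successor of maximal transition
    probability, so that rho(q, a, delta(q, a)) >= 1/|Q|, as in (1).
    rho is a dict mapping (q, a, p) to a probability."""
    return {(q, a): max(states, key=lambda p: rho.get((q, a, p), 0.0))
            for q in states for a in alphabet}
```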

First, we choose for pi (i ≥ 1) a state that maximizes the probability

$$ \underset{\pi \in R_{i-1}}{\Pr}[\pi \text{ ends in } p_{i} \mid \forall 1 \leq j \leq i-1 : \overline{H_{j}} ], $$

which is at least 1/|Q|. Note that p1 is a state such that ι(p1) is maximal, since R0 only consists of empty runs (q). For Si we take any maximal strongly connected component of \(\mathcal {D}_{n}\) (viewed as a directed graph) which is reachable from pi. As usual, strongly connected component means that for all p,qSi the state p is reachable from q and vice versa. Maximality means that for every qSi and every a ∈Σ, also δ(q,a) belongs to Si, i.e., Si is closed under δ. Note that such a δ-closed strongly connected component must exist since Q is finite. Finally, we define the run \(\pi ^{*}_{i}\) and the word wi. The run \(\pi ^{*}_{i}\) starts in pi. Then, for each pair \((q,x) \in S_{i} \times {\Sigma }^{n}\) the run \(\pi ^{*}_{i}\) leads from the current state to state q via a simple run and then reads the word x from q. The order in which we go over all pairs \((q,x) \in S_{i} \times {\Sigma }^{n}\) is not important. Since Si is a maximal strongly connected component of \(\mathcal {D}_{n}\) such a run \(\pi ^{*}_{i}\) exists. Hence, \(\pi ^{*}_{i}\) is a run on a word

$$ w_{i} = \underset{q \in S_{i}}{\prod} \underset{x \in {\Sigma}^{n}}{\prod} y_{q,x} x, $$

where yq,x is the word that leads from the current state via a simple run to state q. Since we choose the runs on the words yq,x to be simple, we have \(|y_{q,x}| \le |Q|\) and thus \(|w_{i}| \le |Q| \cdot |{\Sigma }|^{n} \cdot (|Q| + n)\). Let us define

$$ \mu = \frac{1}{|Q|^{|Q| \cdot |{\Sigma}|^{n} \cdot (|Q|+n) + 1}} . $$
(2)

Note that by construction, the run \(\pi ^{*}_{i}\) is (Si,δ)-universal. Inequality (1) yields

$$ \underset{\pi \in \text{Runs}(p_{i},w_{i})}{\Pr} [\pi = \pi^{*}_{i}] \ge \frac{1}{|Q|^{|w_{i}|}} \ge \mu \cdot |Q|. $$
(3)

Lemma 2

For all m ≥ 0 we have \(\Pr _{\pi \in R_{m}}[H_{m} \mid \forall i\leq m-1: \overline {H_{i}}] \geq \mu \).

Proof

In the following, let π be a random run from Rm and let πi be the subrun on wi. Under the assumption that the event [πm− 1 ends in pm] holds, the events \([\pi _{m} = \pi ^{*}_{m}]\) and \([\forall i \leq m-1: \overline {H_{i}}]\) are conditionally independent. Thus, we have

$$ \begin{array}{@{}rcl@{}} && \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi^{*}_{m} \mid \pi_{m-1} \text{ ends in } p_{m} \wedge \forall i \leq m-1: \overline{H_{i}}] \\ &=& \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi^{*}_{m} \mid \pi_{m-1} \text{ ends in } p_{m}] . \end{array} $$

Since the event \([\pi _{m} = \pi ^{*}_{m}]\) implies the event [πm− 1 ends in pm] (recall that \(\pi ^{*}_{m}\) starts in pm) and \(\pi ^{*}_{m}\) is (Sm,δ)-universal, we obtain:

$$ \begin{array}{@{}rcl@{}} && \underset{\pi \in R_{m}}{\Pr}[H_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \\ &\ge& \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi_{m}^{*} \mid \forall i \leq m-1: \overline{H_{i}}] \\ &=& \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi_{m}^{*} \wedge \pi_{m-1} \text{ ends in } p_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \\ &=& \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi_{m}^{*} \mid \pi_{m-1} \text{ ends in } p_{m} \wedge \forall i \leq m-1: \overline{H_{i}}] \cdot \\ && \underset{\pi \in R_{m}}{\Pr}[\pi_{m-1} \text{ ends in } p_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \\ &=& \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi^{*}_{m} \mid \pi_{m-1} \text{ ends in } p_{m}] \cdot \\ && \underset{\pi \in R_{m}}{\Pr}[\pi_{m-1} \text{ ends in } p_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \\ &\ge & \underset{\pi_{m} \in \text{Runs}(p_{m},w_{m})}{\Pr} [\pi_{m} = \pi^{*}_{m}] \cdot \frac{1}{|Q|}\\ & \geq & \mu, \end{array} $$

where the last inequality follows from (3). This proves the lemma. □

Lemma 3

\(\Pr _{\pi \in R_{m}}[\pi \text { is } \delta \text {-universal}] \ge \Pr _{\pi \in R_{m}}[\exists i \leq m: H_{i}] \ge 1 - (1-\mu )^{m}\).

Proof

The first inequality follows from the definition of the event Hi. Moreover, with Lemma 2 we get

$$ \begin{array}{@{}rcl@{}} \underset{\pi \in R_{m}}{\Pr}[\exists i \leq m : H_{i}] &=& \underset{\pi \in R_{m}}{\Pr}[\exists i \leq m-1: H_{i}] + \\ &&\underset{\pi \in R_{m}}{\Pr}[H_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \cdot \underset{\pi \in R_{m}}{\Pr}[\forall i \leq m-1: \overline{H_{i}}] \\ &=& \underset{\pi \in R_{m-1}}{\Pr}[\exists i \leq m-1: H_{i}] + \\ &&\underset{\pi \in R_{m}}{\Pr}[H_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \cdot \Pr_{\pi \in R_{m-1}}[\forall i \leq m-1: \overline{H_{i}}] \\ &\ge& \underset{\pi \in R_{m-1}}{\Pr}[\exists i \leq m-1: H_{i}] + \mu \cdot \Pr_{\pi \in R_{m-1}}[\forall i \leq m-1: \overline{H_{i}}]. \end{array} $$

Thus, \(r_{m} := \Pr _{\pi \in R_{m}}[\exists i \leq m : H_{i}]\) satisfies rmrm− 1 + μ ⋅ (1 − rm− 1) = (1 − μ) ⋅ rm− 1 + μ. Since r0 = 0, we get rm ≥ 1 − (1 − μ)m by induction. □

We can now show our main theorem:

Proof (of Theorem 1)

We use the probabilistic method in order to show that there exists q0Q such that \(\mathcal {D}_{n} = (Q,{\Sigma },q_{0},\delta ,o)\) is a deterministic sliding window algorithm for Φ. With Lemma 3 we get

$$ \begin{array}{@{}rcl@{}} && \underset{\pi \in R_{m}}{\Pr}[\pi \text{ is strictly correct for $\text{SW}_{n}(\varPhi)$ and $\delta$-universal}] \\ & = & 1 - \underset{\pi \in R_{m}}{\Pr}[\pi \text{ is not strictly correct for $\text{SW}_{n}(\varPhi)$ or is not $\delta$-universal}] \\ & \ge & 1 - \underset{\pi \in R_{m}}{\Pr}[\pi \text{ is not strictly correct for $\text{SW}_{n}(\varPhi)$}] - \Pr_{\pi \in R_{m}}[\pi \text{ is not $\delta$-universal}] \\ &\ge & \underset{\pi \in R_{m}}{\Pr}[\pi \text{ is $\delta$-universal}] - \lambda \\ &\ge & 1 - (1-\mu)^{m} - \lambda . \end{array} $$

We have \(1 - (1-\mu )^{m} - \lambda > 0\) for \(m > \log (1-\lambda ) / \log (1-\mu )\) (note that λ < 1 and 0 < μ < 1 since we can assume that |Q|≥ 2). Hence there are m ≥ 0 and a strictly correct δ-universal run π ∈ Rm. We can conclude with Lemma 1. □

4 Polynomially Long Streams

The word \(w_{1} w_{2} {\cdots } w_{m}\) (with \(m > \log (1-\lambda ) / \log (1-\mu )\)) from the previous section, for which there exists a strictly correct and δ-universal run, has a length that is doubly exponential in the window size n. To see this, note that \(0 > \ln (1-x) \ge x/(x-1)\) for 0 < x < 1, which implies

$$ m > \frac{\log(1-\lambda)}{\log(1-\mu)} = \frac{\ln(1-\lambda)}{\ln(1-\mu)} \ge \ln(1-\lambda) (\mu-1) \cdot \frac{1}{\mu} . $$

Here, \(\ln (1-\lambda )\) is a negative constant and μ − 1 is very close to − 1. Moreover, 1/μ grows doubly exponentially in n by (2).
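Explicitly, since we may assume |Q| ≥ 2 and |Σ| ≥ 2, definition (2) yields

$$ \frac{1}{\mu} = |Q|^{|Q| \cdot |{\Sigma}|^{n} \cdot (|Q|+n) + 1} \ge 2^{2 \cdot |{\Sigma}|^{n}} \ge 2^{2^{n}}, $$

so m, and hence the length of \(w_{1} {\cdots } w_{m}\), is indeed doubly exponential in n.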

In other words: We need the fact that the sliding window algorithm is strictly correct on doubly exponentially long streams with high probability in order to derandomize the algorithm. In this section we show that at least we cannot reduce the length to poly(n): if we restrict to inputs of length poly(n) then strictly λ-correct sliding window algorithms can yield a proper space improvement over deterministic sliding window algorithms.

For a word \(w = a_{1} {\cdots } a_{n}\) let \(w^{\mathsf {R}} = a_{n} {\cdots } a_{1}\) denote the reversed word. Take the language \(K_{\text {pal}} = \{ ww^{\mathsf {R}} : w \in \{a,b\}^{*} \}\) of all palindromes of even length, which belongs to the class DLIN of deterministic linear context-free languages [6], and let L = $Kpal. As explained in Section 2.1 we identify L with the (exact) approximation problem \(\chi _{L} \colon \{a,b,\$\}^{*} \to \{0,1\}\) where χL(w) = 1 if and only if w ∈ L. We write Ln for SWn(χL). Note that the following proposition holds for arbitrarily long input streams.

Proposition 1

Any deterministic sliding window algorithm for L and window size 2n + 1 uses Ω(n) space.

Proof

Let \(\mathcal {D}_{2n+1}\) be a deterministic sliding window algorithm for L and window size 2n + 1, and take two distinct words $x and $y where \(x,y \in \{a,b\}^{n}\). Since \(\mathcal {D}_{2n+1}\) accepts \(\$ x x^{\mathsf {R}}\) and rejects \(\$ y x^{\mathsf {R}}\), the algorithm \(\mathcal {D}_{2n+1}\) reaches two different states on the inputs $x and $y. Therefore, \(\mathcal {D}_{2n+1}\) must have at least \(|\{a,b\}^{n}| = 2^{n}\) states and hence uses Ω(n) space. □

Proposition 2

Fix a polynomial p(n) and let \(n \in \mathbb {N}\) be a window size. If n is large enough, there is a randomized streaming algorithm \(\mathcal {P}_{n}\) with \(s(\mathcal {P}_{n}) \leq \mathcal {O}(\log n)\) such that

$$ \underset{\pi \in \text{Runs}(\mathcal{P}_{n},w)}{\Pr} [\pi \text{ is strictly correct for } L_{n} ] \ge 1 - 1/n $$

for all input words \(w \in {\Sigma }^{*}\) with |w| ≤ p(n).

Proof

Babu et al. [6] have shown that for every language K ∈ DLIN there exists a randomized streaming algorithm using space \(\mathcal {O}(\log n)\) which, given an input v of length n,

  • accepts with probability 1 if vK,

  • and rejects with probability at least 1 − 1/n if vK.

We use this statement for the language Kpal ∈ DLIN. We remark that the algorithm needs to know the length of v in advance. To stay consistent with our definition, we view the above algorithm as a family \((\mathcal {S}_{n})_{n \ge 0}\) of randomized streaming algorithms \(\mathcal {S}_{n}\). Furthermore, the error probability 1/n can be further reduced to \(1/(n+1)^{d}\) where d is chosen such that \(p(n) \le n^{d}\) for sufficiently large n (by picking random primes of size \(\varTheta (n^{d+1})\) in the proof from [6]).
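To give a flavour of the fingerprinting technique behind such algorithms, here is a hedged sketch (our own simplification, not the construction of [6]): a one-pass polynomial-fingerprint test for palindromes with one-sided error, where the length of the input is known in advance and p is a prime larger than that length.

```python
import random

def palindrome_fingerprint_test(v: str, p: int) -> bool:
    """One-pass randomized palindrome test over {a, b} using O(log p) bits
    of state.  If v is a palindrome it always accepts; otherwise it accepts
    with probability at most len(v)/p, since two distinct polynomials of
    degree < len(v) agree on at most len(v) - 1 of the evaluation points."""
    n = len(v)
    x = random.randrange(1, p)  # random evaluation point
    fwd = bwd = 0
    for i, c in enumerate(v):   # single left-to-right pass over v
        digit = 1 if c == "b" else 0
        fwd = (fwd + digit * pow(x, i, p)) % p          # fingerprint of v
        bwd = (bwd + digit * pow(x, n - 1 - i, p)) % p  # fingerprint of reversed v
    return fwd == bwd
```

Membership in Kpal additionally requires the length to be even, which can be checked with one extra bit of state.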

Now we prove our claim for L = $Kpal. The streaming algorithm \(\mathcal {P}_{n}\) for window size n works as follows: After reading a $-symbol, the algorithm \(\mathcal {S}_{n-1}\) from above is simulated on the longest factor from \(\{a,b\}^{*}\) that follows (i.e. \(\mathcal {S}_{n-1}\) is simulated until the next $ arrives). Simultaneously we maintain the length ℓ of the maximal suffix over {a,b}, up to n, using \(\mathcal {O}(\log n)\) bits. If ℓ reaches n − 1, then \(\mathcal {P}_{n}\) accepts if and only if \(\mathcal {S}_{n-1}\) accepts. Notice that \(\mathcal {P}_{n}\) only errs if the stored length ℓ is n − 1, which happens at most once in every n steps. Therefore the number of time instants where \(\mathcal {P}_{n}\) errs on an input stream w of length \(|w| \le p(n) \le n^{d}\) is at most \(|w|/n \le n^{d}/n = n^{d-1}\) (if n is large enough). Moreover, at each of these time instants the error probability is at most \(1/n^{d}\). By the union bound we have for every stream \(w \in \{\$,a,b\}^{\le p(n)}\):

$$ \underset{\pi \in \text{Runs}(\mathcal{P}_{n},w)}{\Pr} {[} \pi \text{ is not strictly correct for } L_{n} {]} \le n^{d-1} \cdot \frac{1}{n^{d}} = 1/n. $$

This concludes the proof. □

5 Lower Bound for Basic Counting

For an approximation error 𝜖 > 0 let us define the basic counting problem

$$ C_{1,\epsilon} = \{ (w,m) \in \{0,1\}^{*} \times \mathbb{N} : (1-\epsilon) \cdot c_{1}(w) \leq m \leq (1+\epsilon) \cdot c_{1}(w)\} $$

where c1(w) denotes the number of 1’s in w. In [12] Datar, Gionis, Indyk and Motwani prove that any strictly λ-correct randomized sliding window algorithm for C1,𝜖 and window size n must use \(\frac {k}{64} \log ^{2} \frac {n}{k} - \log (1-\lambda )\) bits where k = ⌊1/𝜖⌋. We adapt their proof to show that the lower bound also holds for the weaker notion of λ-correct randomized sliding window algorithms.

Theorem 2

Let 𝜖 > 0 and k = ⌊1/𝜖⌋. Every 1/200-correct randomized sliding window algorithm for C1,𝜖 and window size n ≥ 4k must use \(\frac {k}{48} \log ^{2}(\frac {n}{k})\) many bits.

The statement above extends to any error probability λ < 1/2 using the median trick, see e.g. [2]: we run m copies of the algorithm in parallel and output the median of their outputs. Using the Chernoff bound we can choose a constant m such that the median is a correct 𝜖-approximation with error probability 1/200. This reduces the space lower bound only by a constant factor.
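A minimal sketch of the median trick (our own illustration; `estimators` is a list of independent copies, each exposing a step method that returns its current estimate):

```python
import statistics

def median_trick(estimators, stream):
    """Run several independent copies of a randomized estimator in parallel
    and output, after every symbol, the median of their current estimates."""
    for symbol in stream:
        yield statistics.median(e.step(symbol) for e in estimators)
```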

For the rest of the section let us fix \(n \in \mathbb {N}\) and 0 < 𝜖 < 1. Furthermore set k = ⌊1/𝜖⌋. For the proof we use a reduction from a suitable communication problem. Let f : A × B →{0,1} be a function. A (one-round public-coin communication) protocol P = (mA,mB) with cost c consists of two functions \(m_{A} \colon A \times R \to \{0,1\}^{c}\) and \(m_{B} \colon \{0,1\}^{c} \times B \times R \to \{0,1\}\). Here R is a finite set of random choices equipped with a probability distribution. Given inputs a ∈ A and b ∈ B the protocol computes the random output P(a,b) = mB(mA(a,r),b,r) where r ∈ R is chosen at random according to the distribution on R. It computes f with error probability λ < 1/2 if

$$ \underset{r \in R}{\Pr} [P(a,b) \neq f(a,b)] \le \lambda $$

for all a ∈ A, b ∈ B. If |R| = 1 then P is deterministic.

We define the communication problem \(\text {GT}_{\ell ,m}\) where Alice is given m many ℓ-bit numbers \(a^{(1)}, \dots , a^{(m)}\), Bob is given a single ℓ-bit number b and an index 1 ≤ p ≤ m, and the goal is to decide whether a(p) > b. Formally, we view \(\text {GT}_{\ell ,m}\) as a function

$$ \text{GT}_{\ell,m} \colon (\{0,1\}^{\ell})^{m} \times (\{0,1\}^{\ell} \times \{1, \dots, m\}) \to \{0,1\}. $$

If m = 1 we write \(\text {GT}_{\ell } = \text {GT}_{\ell ,1}\).

Proposition 3

Let \(B = \sqrt {nk}\) such that n ≥ 4k, and \(j = \lfloor \log \frac {n}{B}\rfloor \). If \(\mathcal {P}_{n}\) is a λ-correct sliding window algorithm for C1,𝜖 and window size n then there exists a one-round protocol for \(\text {GT}_{\log \frac {4B}{k}, \frac {jk}{4}}\) with cost \(s(\mathcal {P}_{n})\) and error probability λ.

Proof

In the following we ignore rounding issues. The idea is that Alice encodes her jk/4 many numbers as a bit stream consisting of jk/4 groups and feeds it into the sliding window algorithm. Then Bob can compare his number b with any of Alice’s numbers a(i) with high probability, by sliding the window to the appropriate position.

As in [12] we partition the window of length n into j blocks of size

$$B, 2B, 4B, \dots, 2^{j-1} B$$

from right to left where \(j = \lfloor \log \frac {n}{B}\rfloor \). Notice that j ≥ 1 by our assumption that n ≥ 4k. The blocks are numbered 0 to j − 1 from right to left. The i-th block of length \(2^{i} B\) is divided into B many subblocks of length \(2^{i}\). Each block is divided into k/4 groups consisting of 4B/k contiguous subblocks. In the following we choose from every group exactly one subblock which is filled with 1’s; the remaining subblocks in the group are filled with 0’s. An example is shown in Figure 1.

Fig. 1: A single block consisting of B = 12 subblocks divided into k/4 = 3 groups. The groups encode the numbers 2, 1, and 3 (from left to right)

Let \(M = \{1, \dots , 4B/k\}\). We will encode a tuple \(\mathbf {a} = (a^{(1)}, \dots , a^{(jk/4)}) \in M^{jk/4}\) as a bit string of length n in unary fashion as follows: For a ∈ M and 0 ≤ i ≤ j − 1 define the bit string

$$ u_{i}(a) = (0^{2^{i}})^{4B/k -a} 1^{2^{i}} (0^{2^{i}})^{a-1} $$

of length \(2^{i} \cdot 4B/k\). For a tuple \(\mathbf {a} = (a^{(1)}, \dots , a^{(jk/4)})\) over M of length jk/4 we define the arrangement \(w(\mathbf {a}) \in \{0,1\}^{n}\) by

$$ w(\mathbf{a}) = \prod\limits_{i=0}^{j-1} \prod\limits_{r = 1}^{k/4} u_{i}(a^{(ik/4 + r)}) $$

where both concatenations are interpreted from right to left.
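The following sketch (our own, with hypothetical function names; padding the result to length exactly n is omitted) builds u_i(a) and the arrangement w(a) literally from these definitions:

```python
def u(i: int, a: int, B: int, k: int) -> str:
    """Group u_i(a): 4B/k subblocks of length 2^i; the a-th subblock from
    the right is filled with 1's, all others with 0's."""
    sub, groups = 2 ** i, 4 * B // k
    return "0" * (sub * (groups - a)) + "1" * sub + "0" * (sub * (a - 1))

def arrangement(a: list, B: int, k: int, j: int) -> str:
    """Arrangement w(a) for a tuple a of length j*k/4 with entries in
    {1, ..., 4B/k}; both products are read from right to left, so block
    j-1 ends up leftmost and block 0 rightmost."""
    groups = [u(i, a[i * (k // 4) + r - 1], B, k)
              for i in range(j) for r in range(1, k // 4 + 1)]
    return "".join(reversed(groups))
```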

Datar et al. [12] argue that for any two distinct tuples a and b, the arrangements w(a) and w(b) must be distinguished by a deterministic sliding window algorithm for C1,𝜖 and window length n.

We will present a communication protocol for \(\text {GT}_{\log \frac {4B}{k}, \frac {jk}{4}}\) based on a λ-correct sliding window algorithm \(\mathcal {P}_{n}\) for C1,𝜖. Notice that \(\log \frac {4B}{k} \ge 1\) by the assumption that n ≥ 4k. Suppose that Alice holds the tuple \(\mathbf {a} = (a^{(1)}, \dots , a^{(jk/4)})\) of numbers from M, and Bob holds b ∈ M and an index 1 ≤ p ≤ jk/4. Their goal is to determine whether a(p) > b. The protocol is defined as follows: Alice simulates \(\mathcal {P}_{n}\) on w(a) and sends the reached state to Bob, using \(s(\mathcal {P}_{n})\) bits. Suppose that p = ik/4 + r for some 0 ≤ i ≤ j − 1 and 1 ≤ r ≤ k/4. Bob then inserts a suitable number of 0’s into the stream such that the length-n window starts with the b-th subblock from the r-th group in the i-th block of w(a). Notice that this is possible without knowing the tuple a because of the regular structure of arrangements, which is known to Bob. The number of 1-bits in the obtained window is precisely

  • \(r \cdot 2^{i} + \frac {k}{4} (2^{i-1} + {\dots } + 2^{1} + 2^{0})\) if a(p) ≤ b, and

  • \((r-1) \cdot 2^{i} + \frac {k}{4} (2^{i-1} + {\dots } + 2^{1} + 2^{0})\) if a(p) > b.

Since the absolute approximation error is bounded by

$$ \epsilon \frac{k}{4} (2^{i} + {\dots} + 2^{1} + 2^{0}) < \frac{2^{i+1}}{4} = 2^{i-1}, $$

the two cases above can be distinguished by \(\mathcal {P}_{n}\) with probability 1 − λ. □

It remains to prove a lower bound for the one-round communication complexity of \(\text {GT}_{\ell ,m}\). We start by showing that the one-round communication complexity of GTn is Ω(n). This was already proven by Yao [25, Theorem 5]. More generally, Miltersen et al. showed that any r-round protocol for GTn requires \(\varOmega (n^{1/r})\) bits using the round elimination technique [21]. We will first reprove the Ω(n) lower bound for GTn, by directly plugging r = 1 and GTn into the proof of [21, Lemma 11]. Afterwards we adapt the proof to show the Ω(ℓm) lower bound for \(\text {GT}_{\ell ,m}\).

Theorem 3

Every one-round randomized protocol for GTn with error probability 1/200 has cost at least n/3 bits.

Proof

We follow the proof of [21, Lemma 11]. Consider a randomized one-round protocol for GTn with error probability 1/200. The goal is to prove that the protocol must use at least n/3 bits.

By Yao’s minimax principle [24] it suffices to exhibit a “hard” input distribution D on the set of inputs {0,1}n ×{0,1}n and to prove that every deterministic protocol P with \(\Pr _{D} [P(x,y) \neq \text {GT}_{n}(x,y)] \le 1/200\) must have cost Ω(n). See [20] for similar applications of Yao’s minimax principle in the area of communication complexity.

For a bit string \(x = x_{1} {\cdots } x_{n}\) and an index 1 ≤ i ≤ n we define the bit string \(\tau _{i}(x) = x_{1} {\cdots } x_{i-1} 0 1^{n-i}\) of length n. Interpreted as binary numbers, we have the property

$$ x > \tau_{i}(x) \iff x_{i} = 1. $$
(4)
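Property (4) is easily verified by brute force; a small sketch with a hypothetical helper tau:

```python
def tau(x: str, i: int) -> str:
    """tau_i(x) = x_1 ... x_{i-1} 0 1^{n-i} for a bit string x (i is 1-based)."""
    return x[:i - 1] + "0" + "1" * (len(x) - i)

# exhaustive check of property (4) for all 4-bit strings
n = 4
for v in range(2 ** n):
    x = format(v, "04b")
    for i in range(1, n + 1):
        assert (int(x, 2) > int(tau(x, i), 2)) == (x[i - 1] == "1")
```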

The “hard” input distribution D is the uniform distribution on

$$ \{ (x,\tau_{i}(x)) \mid x \in \{0,1\}^{n}, 1 \le i \le n \}. $$

In other words, Alice holds a uniformly random string x ∈{0,1}n and Bob holds τi(x) where the index 1 ≤ i ≤ n is also chosen uniformly at random and independently of x. By property (4) Bob needs to determine the value of xi. Intuitively, the prefix \(x_{1} {\cdots } x_{i-1}\) of τi(x) does not help Bob, so this is basically the “index” function, for which every one-round randomized protocol with error probability 1/3 has cost Ω(n) [19, Theorem 3.7].

Consider any deterministic protocol P with communication cost c such that

$$ \underset{D}{\Pr}[P \text{ errs on } (x,\tau_{i}(x))] \le 0.005. $$

Call an index i in x bad if P errs on (x,τi(x)), and good otherwise. A uniformly random string \(x \in \{0,1\}^{n}\) has at most 0.005n bad indices in expectation. By the Markov inequality we know

$$ \underset{x \in \{0,1\}^{n}}{\Pr}[\text{number of bad indices of } x \ge 0.01n] \le \frac{0.005n}{0.01n} = \frac{1}{2}. $$

Hence the set

$$ R = \{ x \in \{0,1\}^{n} \mid x \text{ has at most } 0.01n \text{ bad indices} \} $$

must contain at least \(2^{n-1}\) bit strings. Since Alice sends at most c bits, she partitions R into at most \(2^{c}\) subsets according to the bit string sent to Bob. Let T be one of these subsets of maximum cardinality. We have \(|T| \ge |R|/2^{c} \ge 2^{n-1-c}\), i.e.,

$$ c \ge n - 1 - \log |T|. $$
(5)

To prove c ≥ n/3 we derive an upper bound on |T|.

In the following we successively construct a sequence of bits \(a_{1}, \dots , a_{n}\) and a sequence of nonempty sets \(T = T_{1} \supseteq T_{2} \supseteq {\cdots } \supseteq T_{n+1}\) of n-bit strings such that all strings in Ti have the prefix \(a_{1} {\cdots } a_{i-1}\) (in particular, \(|T_{n+1}| = 1\)).

  1. Set T1 := T and repeat the following for \(i = 1, \dots , n\):

  2. Set \(T_{i}^{-} := \{ x \in T_{i} \mid i \text { is bad in } x \}\) and \(T_{i}^{+} := \{ x \in T_{i} \mid i \text { is good in } x \}\).

  3. If \(|T^{-}_{i}| \ge 0.05 \cdot |T_{i}|\) choose ai ∈{0,1} such that \(|\{ x \in T_{i}^{-} \mid x_{i} = a_{i} \}|\) is maximal. Then set \(T_{i+1} := \{ x \in T_{i}^{-} \mid x_{i} = a_{i} \}\).

  4. Otherwise we have \(|T_{i}^{+}| \ge 0.95 \cdot |T_{i}|\). All strings \(x \in T_{i}^{+}\) must have the same i-th bit, say xi = ai, since for a string \(x \in T_{i}^{+}\) Bob outputs correctly the bit xi on input (x,τi(x)). But since all strings in \(T_{i}^{+}\) have the same prefix of length i − 1 (and hence yield the same value under τi) and Alice communicates by definition of T the same message to Bob, Bob outputs the same bit for all \(x \in T_{i}^{+}\). Set \(T_{i+1} := T_{i}^{+}\).

Observe that all subsets \(T_{1}, \dots , T_{n+1}\) are nonempty. If point 3 is satisfied then \(|T_{i+1}| \ge 0.5 \cdot 0.05 \cdot |T_{i}| = 0.025 \cdot |T_{i}|\), and the index i is bad in all strings of \(T_{i+1}\). Hence, point 3 can be satisfied at most 0.01n times by the definition of R. If point 4 is satisfied then \(|T_{i+1}| \ge 0.95 \cdot |T_{i}|\). We can therefore bound

$$ 1 = |T_{n+1}| \ge 0.025^{0.01n} \cdot 0.95^{0.99n} \cdot |T| > 0.916^{n} \cdot |T|, $$

and thus \(|T| \le 1.0917^{n}\). By (5) we have established

$$ c \ge (1 - \log 1.0917) n - 1 \ge 0.8735 n - 1 \ge n/3. $$

for all n ≥ 2. Clearly in the case n = 1 there is no zero-message randomized protocol for GT1. □

Theorem 4

Every one-round protocol for \(\text {GT}_{\ell ,m}\) with error probability 1/200 has cost at least ℓm/3 bits.

Proof

We adapt the proof above to \(\text {GT}_{\ell ,m}\). To keep the notation consistent we view Alice’s input as a single bit string \(x = x^{(1)} {\cdots } x^{(m)} \in \{0,1\}^{\ell m}\) where \(x^{(1)}, \dots , x^{(m)} \in \{0,1\}^{\ell }\). We can write an index 1 ≤ i ≤ ℓm uniquely as i = (p(i) − 1)ℓ + r(i) where 1 ≤ p(i) ≤ m and 1 ≤ r(i) ≤ ℓ. For 1 ≤ i ≤ ℓm we define the bit string

$$ \tau_{i}(x) = x^{(p(i))}_{1} {\dots} x^{(p(i))}_{r(i)-1} 0 1^{\ell-r(i)} $$

of length ℓ. We have the property that

$$ \text{GT}_{\ell,m}(x, \tau_{i}(x), p(i)) = 1 \iff x^{(p(i))}_{r(i)} = 1. $$

The hard input distribution is the uniform distribution D over

$$ \{ (x, \tau_{i}(x),p(i)) : x \in \{0,1\}^{\ell m}, 1 \le i \le \ell m \}. $$

Let P be a deterministic protocol with communication cost c such that

$$ \underset{D}{\Pr}[P \text{ errs on } (x,\tau_{i}(x),p(i))] \le 0.005. $$

We call an index 1 ≤ i ≤ ℓm bad if P errs on (x,τi(x),p(i)). Again we can find a set \(T \subseteq \{0,1\}^{\ell m}\) such that

  • \(|T| \ge 2^{\ell m - 1 - c}\),

  • all x ∈ T have at most 0.01ℓm bad indices,

  • and Alice sends the same message on all x ∈ T.

Using precisely the same arguments as in the previous proof we obtain \(|T| \le 1.0917^{\ell m}\) and thus c ≥ ℓm/3 whenever ℓm ≥ 2. If ℓm = 1 then Alice must also send at least one bit in any communication protocol for \(\text {GT}_{\ell ,m}\) with error probability 1/200. □

Theorem 2 now follows from Proposition 3 and Theorem 4: Let \(B = \sqrt {nk}\) and \(j = \lfloor \log \frac {n}{B}\rfloor \ge 1\). Then, any randomized 1/200-correct sliding window algorithm for C1,𝜖 and window size n must use at least

$$ \frac{1}{3} \log\bigg(\frac{4B}{k}\bigg) \cdot \frac{jk}{4} = \frac{1}{3} \log\bigg(4\sqrt{\frac{n}{k}}\bigg) \cdot \bigg\lfloor\log\bigg(\sqrt{\frac{n}{k}}\bigg)\bigg\rfloor \cdot \frac{k}{4} \ge \frac{k}{48} \cdot \log^{2} \bigg(\frac{n}{k}\bigg) $$

many bits.

6 Open Problems

In the proof of Theorem 1 we need the fact that the sliding window algorithm is strictly correct on doubly exponentially long streams with high probability. We pose the question whether this can be reduced to exponentially long streams.

Another open problem is whether one can extend Theorem 4 to arbitrary communication problems. For any function f : A × B →{0,1} one can define an “indexed” version \(f^{(m)} \colon A^{m} \times B \times \{1, \dots , m\} \to \{0,1\}\) where Alice holds a tuple \((a_{1}, \dots , a_{m}) \in A^{m}\), Bob holds b ∈ B and 1 ≤ i ≤ m, and their goal is to compute f(ai,b). The question is whether the one-round communication complexity of \(f^{(m)}\) must be m times as large as the complexity of f, as is the case for GTn.