1 Introduction

Sliding window streaming algorithms process an input sequence \(a_{1} a_{2} {\cdots } a_{m}\) from left to right and receive at time t the symbol \(a_{t}\) as input. Such algorithms are required to compute at each time instant t a value \(f(a_{t-n+1} {\cdots } a_{t})\) that depends on the n last symbols (we assume t ≥ n here). The value n is called the window size and the sequence \(a_{t-n+1} {\cdots } a_{t}\) is called the window content at time t. In many applications, data items in a stream are outdated after a certain time, and the sliding window model is a simple way to model this. A typical application is the analysis of a time series as it may arise in network monitoring, healthcare and patient monitoring, and transportation grid monitoring [3].

A general goal in the area of sliding window algorithms is to avoid the explicit storage of the window content, and, instead, to work in considerably smaller space, e.g. space polylogarithmic in the window size. In the seminal paper of Datar, Gionis, Indyk and Motwani [12], where the sliding window model was introduced, the authors prove that the number of 1’s in a 0/1-sliding window of size n can be maintained in space \(O(\frac {1}{\epsilon } \cdot \log ^{2} n)\) if one allows a multiplicative error of 1 ± 𝜖. Also a matching lower bound is shown. Other algorithmic problems that were addressed in the extensive literature on sliding window streams include the computation of statistical data (e.g. computation of the variance and k-median [5], and quantiles [4]), optimal sampling from sliding windows [9], membership problems for formal languages [13,14,15,16], computation of edit distances [10], database querying (e.g. processing of join queries over sliding windows [18]) and graph problems (e.g. checking for connectivity and computation of matchings, spanners, and minimum spanning trees [11]). The reader can find further references in [1, Chapter 8] and [8].

Many of the above mentioned papers deal with sliding window algorithms that only compute a good enough approximation of the exact value of interest. In fact, even for very simple sliding window problems it is unavoidable to store the whole window content. Examples are the exact computation of the number of 1’s [12] or the computation of the first symbol of the sliding window for a 0/1-data stream [14]. In this paper, we consider a general model for sliding window approximation problems, where a (possibly infinite) set of admissible output values is fixed for each word. To be more precise, a specific approximation problem is described by a relation \({\varPhi } \subseteq {\Sigma }^{*} \times Y\) which associates to words over a finite alphabet Σ (the set of data values in the stream) admissible output values from a possibly infinite set Y. A sliding window algorithm for such a problem is then required to compute at each time instant an admissible output value for the current window content. This model covers exact algorithms (where Φ is a function \(\varPhi \colon {\Sigma }^{*} \to Y\)) as well as a wide range of approximation algorithms. For example, the computation of the number of 1’s in a 0/1-sliding window with an allowed multiplicative error of 1 ± 𝜖 is covered by our model, since for a word with k occurrences of 1, the admissible output values are the integers between (1 − 𝜖)k and (1 + 𝜖)k.
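As a concrete illustration of this definition, the following minimal sketch (our own, not part of the formal model; the function name admissible and the parameter eps are illustrative) expresses the basic counting relation as a predicate on pairs (w, y):

```python
def admissible(w: str, y: int, eps: float) -> bool:
    """Return True iff (w, y) lies in the approximation relation Phi for
    counting 1's with multiplicative error 1 +/- eps."""
    k = w.count("1")  # exact number of 1's in the word w
    return (1 - eps) * k <= y <= (1 + eps) * k
```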

A second ingredient of many sliding window algorithms is randomization. Following our recent work [13,14,15] we model a randomized streaming algorithm for a given approximation problem as a probabilistic automaton over a finite alphabet. Probabilistic automata were introduced by Rabin [23] and can be seen as a common generalization of deterministic finite automata and Markov chains. The basic idea is that for every state q and every input symbol a, the next state is chosen according to some probability distribution. As an extension to the classical model of Rabin, states in a probabilistic automaton are not accepting or rejecting but are associated with output values from the set Y. This allows us to associate with every input word \(w \in {\Sigma }^{*}\) and every output value y ∈ Y the probability that the automaton outputs y on input w. In order to solve a specific approximation problem \(\varPhi \subseteq {\Sigma }^{*} \times Y\) in the sliding window model, one requires that for a given window size n, a probabilistic automaton \(\mathcal {P}_{n}\) has a small error probability λ (say λ = 1/3) on every input stream. But what exactly does the latter mean? Two different definitions can be found in the literature:

  • For every input stream \(w = a_{1} {\cdots } a_{m}\), the probability that \(\mathcal {P}_{n}\) outputs on input w a value y ∈ Y with \((a_{m-n+1} {\cdots } a_{m},y) \notin {\varPhi }\) is at most λ. Clearly an equivalent formulation is that, for all input streams \(w = a_{1} {\cdots } a_{m}\) and all 0 ≤ t ≤ m, the probability that \(\mathcal {P}_{n}\) outputs on input \(a_{1} {\dots } a_{t}\) a value y ∈ Y with \((a_{t-n+1} {\cdots } a_{t},y) \notin {\varPhi }\) is at most λ. In this case, we say that \(\mathcal {P}_{n}\) is λ-correct for Φ and window size n.

  • For every input stream \(w = a_{1} {\cdots } a_{m}\), the probability that \(\mathcal {P}_{n}\) outputs at some time instant t (n ≤ t ≤ m) a value y ∈ Y with \((a_{t-n+1} {\cdots } a_{t},y) \notin {\varPhi }\) is at most λ. In this case, we say that \(\mathcal {P}_{n}\) is strictly λ-correct for Φ and window size n.

One can rephrase the difference between strict λ-correctness and λ-correctness as follows: λ-correctness means that while the randomized sliding window algorithm runs on an input stream it returns at each time instant an admissible output value with probability at least 1 − λ. In contrast, strict λ-correctness means that while the randomized sliding window algorithm reads an input stream, the probability that the algorithm returns an admissible output value at every time instant is at least 1 − λ. Obviously this makes a difference: imagine that Y = {1,2,3,4,5,6} and that for every input word \(w \in {\Sigma }^{*}\) the admissible output values are 2,3,4,5,6. Then the algorithm that returns at every time instant the output of a fair die throw is 1/6-correct. But the probability that this algorithm returns an admissible output value at every time instant is only (5/6)^m for an input stream of length m and hence converges to 0 for \(m \to \infty \). Of course, in general, the situation is more complex since successive output values of a randomized sliding window algorithm are not independent.
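For instance, already for a stream of length m = 100 the probability that the die-throwing algorithm is correct at every time instant is negligible:

$$ (5/6)^{100} = e^{100 \cdot \ln (5/6)} \approx e^{-18.2} \approx 1.2 \cdot 10^{-8} . $$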

In the following discussion, let us fix the error probability λ = 1/3 (using probability amplification, one can reduce λ to any constant > 0). In our recent paper [15] we studied the space complexity of the membership problem for regular languages with respect to λ-correct randomized sliding window algorithms. It turned out that in this setting, one can gain from randomization. Consider for instance the regular language \(a b^{*}\) over the alphabet {a,b}. Thus, the sliding window algorithm for window size n should output “yes” if the current window content is \(a b^{n-1}\) and “no” otherwise. From our results in [13, 14], it follows that the optimal space complexity of a deterministic sliding window algorithm for the membership problem for \(a b^{*}\) is \(\varTheta (\log n)\). On the other hand, it is shown in [15] that there is a λ-correct randomized sliding window algorithm for \(a b^{*}\) with (worst-case) space complexity \(O(\log \log n)\) (this is also optimal). In fact, we proved in [15] that for every regular language L, the space optimal λ-correct randomized sliding window algorithm for L has either constant, doubly logarithmic, logarithmic, or linear space complexity, and the corresponding four space classes can be characterized in terms of simple syntactic properties.
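For intuition, here is a minimal sketch (our own, in Python; the class name LastA is an illustrative choice) of the logarithmic-space deterministic sliding window algorithm for ab*: it suffices to count, capped at n, how many symbols were read since the last a.

```python
class LastA:
    """Deterministic sliding window algorithm for membership in a b^* with
    window size n: after each symbol it reports whether the current window
    content equals a b^(n-1).  The state is a counter bounded by n, i.e.
    O(log n) bits."""

    def __init__(self, n: int):
        self.n = n
        self.d = n  # symbols read since the last 'a', capped at n

    def step(self, symbol: str) -> bool:
        self.d = 0 if symbol == "a" else min(self.d + 1, self.n)
        # the window is a b^(n-1) iff the last 'a' is exactly its oldest symbol
        return self.d == self.n - 1
```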

Strict λ-correctness is used (without explicit mention) for instance in [7, 12]. In these papers, the lower bounds shown for deterministic sliding-window algorithms are extended with the help of Yao’s minimax principle [24] to strictly λ-correct randomized sliding-window algorithms. The main result from the first part of the paper states that this is a general phenomenon: we show that every strictly λ-correct sliding window algorithm for an approximation problem Φ can be derandomized without increasing the worst-case space complexity (Theorem 1). To the best of our knowledge, this is the first investigation of the general power of randomization with respect to the space consumption of sliding window algorithms. We emphasize that our proof does not utilize Yao’s minimax principle, which would require the choice of a “hard” distribution of input streams specific to the problem. It remains open whether such a hard distribution exists for every approximation problem.

We remark that the proof of Theorem 1 needs the fact that the sliding window algorithm is strictly correct on doubly exponentially long streams with high probability in order to derandomize it. In fact, we show that for a certain problem a restriction to polynomially long input streams yields an advantage of strictly correct randomized algorithms over deterministic ones, see Propositions 1 and 2. Whether such an advantage can also be obtained for input streams of length singly exponential in the window size remains open.

In the second part of the paper we come back to the problem of counting the number of 1’s in a sliding window [7, 12]. Datar et al. [12] proved a space lower bound of \(\varOmega (\frac {1}{\epsilon } \cdot \log ^{2} n)\) for approximating the number of 1’s in a sliding window of size n with a multiplicative error of 1 ± 𝜖. This lower bound is first shown for deterministic algorithms and then, using Yao’s minimax principle [24], extended to strictly λ-correct randomized sliding-window algorithms. We show that the same lower bound also holds for the wider class of λ-correct randomized sliding-window algorithms (Theorem 2). For the proof of this result we first show a lower bound for the one-way randomized communication complexity of the following problem: Alice holds m many ℓ-bit numbers \(a_{1}, \dots , a_{m}\), and Bob holds an index 1 ≤ i ≤ m and an ℓ-bit number b. The goal of Bob is to find out whether \(a_{i} > b\) holds. We show that Alice has to transfer at least ℓm/3 bits to Bob if the protocol has an error probability of at most 1/200 (Theorem 4). From this result we can derive Theorem 2 using ideas from the lower bound proof in [12].

Let us add further remarks on our sliding window model. First of all, it is crucial for our proofs that the input alphabet (i.e., the set of data values in the input stream) is finite. This is for instance the case when counting the number of 1’s in a 0/1-sliding window. On the other hand, the problem of computing the sum of all data values in a sliding window of arbitrary numbers (a problem that is considered in [12] as well) is not covered by our setting, unless one puts a bound on the size of the numbers in the input stream.

As a second remark, note that our sliding window model is non-uniform in the sense that for every window size we may have a different streaming algorithm. In other words, it is not required that there exists a single streaming algorithm that gets the window size as a parameter. Clearly, lower bounds get stronger when shown for the non-uniform model. Moreover, all proofs of lower bounds in the sliding window setting that we are aware of hold for the non-uniform model.

2 Preliminaries

With [0,1] we denote the real interval \(\{ p \in \mathbb {R} : 0 \leq p \leq 1\}\) of all probabilities. With \(\log \) we always mean the logarithm to the base two.

The set of all words over a finite alphabet Σ is denoted by \({\Sigma }^{*}\). The empty word is denoted by ε. The length of a word \(w \in {\Sigma }^{*}\) is denoted by |w|. The sets of words over Σ of length exactly, at most and at least n are denoted by \({\Sigma }^{n}\), \({\Sigma }^{\leq n}\) and \({\Sigma }^{\geq n}\), respectively. In the context of streaming algorithms, we also use the term “stream” for words.

2.1 Approximation Problems

An approximation problem is a relation \(\varPhi \subseteq {\Sigma }^{*} \times Y\) where Σ is a finite alphabet and Y is a (possibly infinite) set of output values. The relation Φ associates with each word \(w \in {\Sigma }^{*}\) a set of admissible or correct output values in Y. Typical examples include:

  • exact computation problems \(\varphi \colon {\Sigma }^{*} \to Y\) where we identify φ with its graph \({\varPhi } = \{(w,\varphi (w)) : w \in {\Sigma }^{*}\}\). A typical example is the mapping \(c_{1} \colon \{0,1\}^{*} \to \mathbb {N}\) where c1(w) is the number of 1’s in w. Another exact problem is given by the characteristic function \(\chi _{L} \colon {\Sigma }^{*} \to \{0,1\}\) of a language \(L \subseteq {\Sigma }^{*}\) (χL(w) = 1 if and only if w ∈ L).

  • approximation of some numerical value for the data stream, which can be modeled by a relation \(\varPhi \subseteq {\Sigma }^{*} \times \mathbb {N}\). A typical example would be the problem {(w,k) : (1 − 𝜖) ⋅ c1(w) ≤ k ≤ (1 + 𝜖) ⋅ c1(w)} for some 0 < 𝜖 < 1.

For a window length n ≥ 0 and a stream w ∈Σ we define lastn(w) to be the suffix of \(\square ^{n} w\) of length n where \(\square \in {\Sigma }\) is a fixed alphabet symbol. The word \(\text {last}_{n}(\varepsilon ) = \square ^{n}\) is also called the initial window. To every approximation problem \(\varPhi \subseteq {\Sigma }^{*} \times Y\) we associate the sliding window problem

$$ \text{SW}_{n}{(\varPhi)} = \{ (x,y) \in {\Sigma}^{*} \times Y : (\text{last}_{n}(x),y) \in {\varPhi} \} $$

for window length n.
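In code, lastn(w) is simply the length-n suffix of the padded stream; the following sketch (a hypothetical helper of our own, with '#' playing the role of □) makes this explicit.

```python
def last_n(w: str, n: int, pad: str = "#") -> str:
    """Window content after reading the stream w: the suffix of pad^n + w
    of length n."""
    return (pad * n + w)[len(w):]
```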

2.2 Probabilistic Automata with Output

In the following we will introduce probabilistic automata [22, 23] as a model of randomized streaming algorithms which produce an output after each input symbol. A randomized streaming algorithm or a probabilistic automaton \(\mathcal {P} = (Q,{\Sigma },\iota ,\rho ,o)\) consists of a (possibly infinite) set of states Q, a finite alphabet Σ, an initial state distribution ι: Q → [0,1], a transition probability function ρ: Q ×Σ× Q → [0,1] and an output function \(o \colon Q \to Y\) such that

  • \({\sum }_{q \in Q} \iota (q) = 1\),

  • \({\sum }_{q \in Q} \rho (p,a,q) = 1\) for all pQ, a ∈Σ.

The space of the randomized streaming algorithm \(\mathcal {P}\) (or the number of bits used by \(\mathcal {P}\)) is given by \(s(\mathcal {P}) = \log |Q| \in \mathbb {R}_{\ge 0} \cup \{\infty \}\).

If ι and ρ map into {0,1}, then \(\mathcal {P}\) is a deterministic automaton; in this case we write \(\mathcal {P}\) as \(\mathcal {P} = (Q,{\Sigma },q_{0},\delta ,o)\), where \(q_{0} \in Q\) is the initial state and δ: Q ×Σ→ Q is the transition function. A run on a word \(a_{1} {\cdots } a_{m} \in {\Sigma }^{*}\) in \(\mathcal {P}\) is a sequence \(\pi = (q_{0},a_{1},q_{1},a_{2},\dots ,a_{m},q_{m})\) where \(q_{0}, \dots , q_{m} \in Q\) and \(\rho (q_{i-1},a_{i},q_{i}) > 0\) for all 1 ≤ i ≤ m. If m = 0 we obtain the empty run (q0) starting and ending in q0. We write runs in the usual way

$$ \pi \colon q_{0} \xrightarrow{a_{1}} q_{1} \xrightarrow{a_{2}} {\cdots} \xrightarrow{a_{m}} q_{m} $$

or also omit the intermediate states: \(\pi \colon q_{0} \xrightarrow {a_{1} {\cdots } a_{m}} q_{m}\). We extend ρ to runs in the natural way: if \(\pi \colon q_{0} \xrightarrow {a_{1}} q_{1} \xrightarrow {a_{2}} {\cdots } \xrightarrow {a_{m}} q_{m}\) is a run in \(\mathcal {P}\) then \(\rho (\pi ) = {\prod }_{i=1}^{m} \rho (q_{i-1},a_{i},q_{i})\). Furthermore we define ρι(π) = ι(q0) ⋅ ρ(π). We denote by \(\text {Runs}(\mathcal {P},w)\) the set of all runs on w in \(\mathcal {P}\) and denote by \(\text {Runs}(\mathcal {P},q,w)\) those runs on w that start in q ∈ Q. If \(\mathcal {P}\) is clear from the context, we simply write Runs(w) and Runs(q,w). Notice that for each \(w \in {\Sigma }^{*}\) the function ρι is a probability distribution on \(\text {Runs}(\mathcal {P},w)\) and for each q ∈ Q the restriction of ρ to \(\text {Runs}(\mathcal {P},q,w)\) is a probability distribution on \(\text {Runs}(\mathcal {P},q,w)\). If Π is a set of runs (which will often be defined by a certain property of runs), then \(\Pr _{\pi \in \text {Runs}(w)}[\pi \in {\Pi }]\) denotes the probability \({\sum }_{\pi \in \text {Runs}(w) \cap {\Pi }} \rho _{\iota }(\pi )\) and \(\Pr _{\pi \in \text {Runs}(q,w)}[\pi \in {\Pi }]\) denotes \({\sum }_{\pi \in \text {Runs}(q,w) \cap {\Pi }} \rho (\pi )\).
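The following sketch (our own, with Python dictionaries standing in for ι, ρ and o) shows how a run of a probabilistic automaton with output can be sampled according to ρι; it is meant only to make the definitions concrete.

```python
import random

class ProbabilisticAutomaton:
    """iota: dict state -> probability; rho: dict (state, symbol) -> dict of
    successor probabilities; out: dict state -> output value."""

    def __init__(self, iota, rho, out):
        self.iota, self.rho, self.out = iota, rho, out

    def sample_run(self, w):
        """Sample a run on the word w and return the outputs o(q_0), ..., o(q_m)."""
        states = list(self.iota)
        q = random.choices(states, weights=[self.iota[s] for s in states])[0]
        outputs = [self.out[q]]
        for a in w:
            dist = self.rho[(q, a)]
            succ = list(dist)
            q = random.choices(succ, weights=[dist[s] for s in succ])[0]
            outputs.append(self.out[q])
        return outputs
```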

2.3 Correctness Definitions

Let \(\mathcal {P} = (Q,{\Sigma },\iota ,\rho ,o)\) be a randomized streaming algorithm with output function \(o \colon Q \to Y\), let \(\varPhi \subseteq {\Sigma }^{*} \times Y\) be an approximation problem and let \(w = a_{1}a_{2} {\cdots } a_{m} \in {\Sigma }^{*}\) be an input stream. Furthermore let 0 ≤ λ ≤ 1 be an error probability.

  • A run \(\pi \colon q_{0} \xrightarrow {w} q_{m}\) in \(\mathcal {P}\) is correct for Φ if (w,o(qm)) ∈Φ. We say that \(\mathcal {P}\) is λ-correct for Φ if for all \(w \in {\Sigma }^{*}\) we have

    $$ \underset{\pi \in \text{Runs}(w)}{\Pr} [\pi \text{ is correct for } {\varPhi}] \ge 1 - \lambda. $$
  • A run \(\pi \colon q_{0} \xrightarrow {a_{1}} q_{1} \xrightarrow {a_{2}} {\cdots } q_{m-1} \xrightarrow {a_{m}} q_{m}\) in \(\mathcal {P}\) on w is strictly correct for Φ if \((a_{1} {\cdots } a_{t},o(q_{t})) \in {\varPhi }\) for all 0 ≤ t ≤ m. We say that \(\mathcal {P}\) is strictly λ-correct for Φ if for all \(w \in {\Sigma }^{*}\) we have

    $$ \underset{\pi \in \text{Runs}(w)}{\Pr} [\pi \text{ is strictly correct for } {\varPhi}] \ge 1 - \lambda. $$

A (strictly) λ-correct randomized streaming algorithm \(\mathcal {P}_{n}\) for SWn(Φ) is also called a (strictly) λ-correct randomized sliding window algorithm for Φ and window size n. If \(\mathcal {P}_{n}\) is deterministic and (strictly) 0-correct, we speak of a deterministic sliding window algorithm for Φ and window size n. The reader might think of having for every window size n a sliding window algorithm \(\mathcal {P}_{n}\). We do not assume any uniformity here in the sense that the sliding window algorithms for different window sizes do not have to follow a common pattern. This is the same situation as in non-uniform circuit complexity, where one has for every input length n a circuit Cn and it is not required that the circuit Cn can be computed from n.

Remark 1

The trivial sliding window algorithm stores for window size n the window content with \(\lceil \log |{\Sigma }| \rceil \cdot n\) bits. Hence every approximation problem has a deterministic sliding window algorithm \(\mathcal {D}_{n}\) with \(s(\mathcal {D}_{n}) \le \lceil \log |{\Sigma }| \rceil \cdot n\). In particular, for every (strictly) λ-correct randomized sliding window algorithm \(\mathcal {P}_{n}\) for Φ and window size n, there exists a (strictly) λ-correct randomized sliding window algorithm \(\mathcal {P}_{n}^{\prime }\) for Φ and window size n such that

$$ s(\mathcal{P}_{n}^{\prime}) \le \min\{ s(\mathcal{P}_{n}), \lceil \log |{\Sigma}| \rceil \cdot n \}. $$
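A minimal sketch of the trivial algorithm from Remark 1 (our own illustration): the state is the window content itself, kept in a buffer of n symbols.

```python
from collections import deque

def trivial_sliding_window(stream, n: int, pad: str = "#"):
    """Trivial deterministic sliding window algorithm: stores the full
    window content (about ceil(log|Sigma|) * n bits) and yields it after
    every symbol; any approximation problem can then be answered exactly."""
    window = deque(pad * n, maxlen=n)  # initial window content
    for a in stream:
        window.append(a)  # the oldest symbol falls out automatically
        yield "".join(window)
```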

3 Derandomization of Strictly Correct Algorithms

In this section we prove the main result of this paper, which states that strictly correct randomized sliding window algorithms can be completely derandomized:

Theorem 1

Let \(\varPhi \subseteq {\Sigma }^{*} \times Y\) be an approximation problem, \(n \in \mathbb {N}\) be a window size and 0 ≤ λ < 1 be an error probability. For every randomized sliding window algorithm \(\mathcal {P}_{n}\) which is strictly λ-correct for Φ and window size n there exists a deterministic sliding window algorithm \(\mathcal {D}_{n}\) for Φ and window size n such that \(s(\mathcal {D}_{n}) \le s(\mathcal {P}_{n})\).

The proof idea is to successively construct a (doubly exponentially long) strictly correct run which reads all possible windows of length n from a certain subset of memory states. This strictly correct run then defines a deterministic algorithm which is always correct.

Let \({\varPhi } \subseteq {\Sigma }^{*} \times Y\), \(n \in \mathbb {N}\) be a window size and 0 ≤ λ < 1 as in Theorem 1. Let \(\mathcal {P}_{n}\) be a strictly λ-correct sliding window algorithm for Φ and window size n. By Remark 1, we can assume that \(\mathcal {P}_{n}\) has a finite state set. Consider a run

$$ \pi \colon q_{0} \xrightarrow{a_{1}} q_{1} \xrightarrow{a_{2}} {\cdots} \xrightarrow{a_{m}} q_{m} $$

in \(\mathcal {P}_{n}\). The run π is simple if \(q_{i} \neq q_{j}\) for 0 ≤ i < j ≤ m. A subrun of π is a run

$$ q_{i} \xrightarrow{a_{i+1}} q_{i+1} \xrightarrow{a_{i+2}} {\cdots} q_{j-1} \xrightarrow{a_{j}} q_{j} $$

for some 0 ≤ i ≤ j ≤ m. Consider a nonempty subset \(S \subseteq Q\) and a function δ: Q ×Σ→ Q such that S is closed under δ, i.e., \(\delta (S \times {\Sigma }) \subseteq S\). We say that the run π is δ-conform if \(\delta (q_{i-1},a_{i}) = q_{i}\) for all 1 ≤ i ≤ m. We say that π is (S,δ)-universal if for all q ∈ S and \(x \in {\Sigma }^{n}\) there exists a δ-conform subrun \(\pi ^{\prime } \colon q \xrightarrow {x} q^{\prime }\) of π. Finally, π is δ-universal if it is (S,δ)-universal for some nonempty subset \(S \subseteq Q\) which is closed under δ.

Lemma 1

Let π be a strictly correct run in \(\mathcal {P}_{n}\) for Φ, let \(S \subseteq Q\) be a nonempty subset and let δ: Q ×Σ→ Q be a function such that S is closed under δ. If π is (S,δ)-universal, then there exists q0S such that \(\mathcal {D}_{n} = (Q,{\Sigma },q_{0},\delta ,o)\) is a deterministic sliding window algorithm for Φ and window size n.

Proof

Let \(q_{0} = \delta (p,\square ^{n}) \in S\) for some arbitrary state p ∈ S and define \(\mathcal {D}_{n} = (Q,{\Sigma },q_{0},\delta ,o)\). Let \(w \in {\Sigma }^{*}\) and consider the run \(\sigma \colon p \xrightarrow {\square ^{n}} q_{0} \xrightarrow {w} q\) in \(\mathcal {D}_{n}\) of length ≥ n. We have to show that (lastn(w),o(q)) ∈Φ. We can write \(\square ^{n} w = x \text {last}_{n}(w)\) for some \(x \in {\Sigma }^{*}\). Hence, we can rewrite the run σ as \(\sigma \colon p \xrightarrow {x} q^{\prime } \xrightarrow {\text {last}_{n}(w)} q\). We know that \(q^{\prime } \in S\) because S is closed under δ. Since π is (S,δ)-universal, it contains a δ-conform subrun \(q^{\prime } \xrightarrow {\text {last}_{n}(w)} q\) (δ-conform runs from a fixed state on a fixed word are unique, so this subrun indeed ends in q). Strict correctness of π implies (lastn(w),o(q)) ∈Φ. □

For the rest of this section we fix an arbitrary function δ: Q ×Σ→ Q such that for all q ∈ Q, a ∈Σ,

$$ \rho(q,a,\delta(q,a)) = \max \{ \rho(q,a,p) \colon p \in Q \} . $$

Thus, we choose δ(q,a) as a most likely a-successor of q. Note that

$$ \rho(q,a,\delta(q,a)) \geq \frac{1}{|Q|} $$
(1)

for all q ∈ Q, a ∈Σ. Furthermore, let \(\mathcal {D}_{n} = (Q,{\Sigma },q_{0},\delta ,o)\) where the initial state q0 will be defined later. We inductively define for each i ≥ 1 a state pi, a run \(\pi _{i}^{*}\) in \(\mathcal {D}_{n}\) on some word \(w_{i} \in {\Sigma }^{*}\), and a non-empty set \(S_{i} \subseteq Q\), which is closed under δ. For m ≥ 0, we abbreviate \(\text {Runs}(\mathcal {P}_{n},w_{1} {\cdots } w_{m})\) by Rm. Note that \(R_{0} = \text {Runs}(\mathcal {P}_{n}, \varepsilon )\). For 1 ≤ i ≤ m let Hi denote the event that for a random run \(\pi = \pi _{1} {\cdots } \pi _{m} \in R_{m}\), where each πj is a run on wj, the subrun πi is (Si,δ)-universal. Notice that Hi does not depend on m ≥ i.
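The choice of δ above can be made concrete as follows (a sketch under the assumption that ρ is given as a finite table; the function name is ours):

```python
def most_likely_successor(rho, states, alphabet):
    """For every state q and symbol a pick a successor of maximal transition
    probability, so that rho(q, a, delta(q, a)) >= 1/|Q|, as in (1).
    rho is a dict mapping (q, a, p) to a probability."""
    return {(q, a): max(states, key=lambda p: rho.get((q, a, p), 0.0))
            for q in states for a in alphabet}
```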

First, we choose for pi (i ≥ 1) a state that maximizes the probability

$$ \underset{\pi \in R_{i-1}}{\Pr}[\pi \text{ ends in } p_{i} \mid \forall 1 \leq j \leq i-1 : \overline{H_{j}} ], $$

which is at least 1/|Q|. Note that p1 is a state such that ι(p1) is maximal, since R0 only consists of empty runs (q). For Si we take any maximal strongly connected component of \(\mathcal {D}_{n}\) (viewed as a directed graph) which is reachable from pi. As usual, strongly connected component means that for all p,qSi the state p is reachable from q and vice versa. Maximality means that for every qSi and every a ∈Σ, also δ(q,a) belongs to Si, i.e., Si is closed under δ. Note that such a δ-closed strongly connected component must exist since Q is finite. Finally, we define the run \(\pi ^{*}_{i}\) and the word wi. The run \(\pi ^{*}_{i}\) starts in pi. Then, for each pair \((q,x) \in S_{i} \times {\Sigma }^{n}\) the run \(\pi ^{*}_{i}\) leads from the current state to state q via a simple run and then reads the word x from q. The order in which we go over all pairs \((q,x) \in S_{i} \times {\Sigma }^{n}\) is not important. Since Si is a maximal strongly connected component of \(\mathcal {D}_{n}\) such a run \(\pi ^{*}_{i}\) exists. Hence, \(\pi ^{*}_{i}\) is a run on a word

$$ w_{i} = \underset{q \in S_{i}}{\prod} \underset{x \in {\Sigma}^{n}}{\prod} y_{q,x} x, $$

where yq,x is the word that leads from the current state via a simple run to state q. Since we choose the runs on the words yq,x to be simple, we have \(|y_{q,x}| \le |Q|\) and thus \(|w_{i}| \le |Q| \cdot |{\Sigma }|^{n} \cdot (|Q| + n)\). Let us define

$$ \mu = \frac{1}{|Q|^{|Q| \cdot |{\Sigma}|^{n} \cdot (|Q|+n) + 1}} . $$
(2)

Note that by construction, the run \(\pi ^{*}_{i}\) is (Si,δ)-universal. Inequality (1) yields

$$ \underset{\pi \in \text{Runs}(p_{i},w_{i})}{\Pr} [\pi = \pi^{*}_{i}] \ge \frac{1}{|Q|^{|w_{i}|}} \ge \mu \cdot |Q|. $$
(3)

Lemma 2

For all m ≥ 0 we have \(\Pr _{\pi \in R_{m}}[H_{m} \mid \forall i\leq m-1: \overline {H_{i}}] \geq \mu \).

Proof

In the following, let π be a random run from Rm and let πi be the subrun on wi. Under the assumption that the event [πm− 1 ends in pm] holds, the events \([\pi _{m} = \pi ^{*}_{m}]\) and \([\forall i \leq m-1: \overline {H_{i}}]\) are conditionally independent. Thus, we have

$$ \begin{array}{@{}rcl@{}} && \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi^{*}_{m} \mid \pi_{m-1} \text{ ends in } p_{m} \wedge \forall i \leq m-1: \overline{H_{i}}] \\ &=& \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi^{*}_{m} \mid \pi_{m-1} \text{ ends in } p_{m}] . \end{array} $$

Since the event \([\pi _{m} = \pi ^{*}_{m}]\) implies the event [πm− 1 ends in pm] (recall that \(\pi ^{*}_{m}\) starts in pm) and \(\pi ^{*}_{m}\) is (Sm,δ)-universal, we obtain:

$$ \begin{array}{@{}rcl@{}} && \underset{\pi \in R_{m}}{\Pr}[H_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \\ &\ge& \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi_{m}^{*} \mid \forall i \leq m-1: \overline{H_{i}}] \\ &=& \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi_{m}^{*} \wedge \pi_{m-1} \text{ ends in } p_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \\ &=& \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi_{m}^{*} \mid \pi_{m-1} \text{ ends in } p_{m} \wedge \forall i \leq m-1: \overline{H_{i}}] \cdot \\ && \underset{\pi \in R_{m}}{\Pr}[\pi_{m-1} \text{ ends in } p_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \\ &=& \underset{\pi \in R_{m}}{\Pr}[\pi_{m} = \pi^{*}_{m} \mid \pi_{m-1} \text{ ends in } p_{m}] \cdot \\ && \underset{\pi \in R_{m}}{\Pr}[\pi_{m-1} \text{ ends in } p_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \\ &\ge & \underset{\pi_{m} \in \text{Runs}(p_{m},w_{m})}{\Pr} [\pi_{m} = \pi^{*}_{m}] \cdot \frac{1}{|Q|}\\ & \geq & \mu, \end{array} $$

where the last inequality follows from (3). This proves the lemma. □

Lemma 3

\(\Pr _{\pi \in R_{m}}[\pi \text { is } \delta \text {-universal}] \ge \Pr _{\pi \in R_{m}}[\exists i \leq m: H_{i}] \ge 1 - (1-\mu )^{m}\).

Proof

The first inequality follows from the definition of the event Hi. Moreover, with Lemma 2 we get

$$ \begin{array}{@{}rcl@{}} \underset{\pi \in R_{m}}{\Pr}[\exists i \leq m : H_{i}] &=& \underset{\pi \in R_{m}}{\Pr}[\exists i \leq m-1: H_{i}] + \\ &&\underset{\pi \in R_{m}}{\Pr}[H_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \cdot \underset{\pi \in R_{m}}{\Pr}[\forall i \leq m-1: \overline{H_{i}}] \\ &=& \underset{\pi \in R_{m-1}}{\Pr}[\exists i \leq m-1: H_{i}] + \\ &&\underset{\pi \in R_{m}}{\Pr}[H_{m} \mid \forall i \leq m-1: \overline{H_{i}}] \cdot \Pr_{\pi \in R_{m-1}}[\forall i \leq m-1: \overline{H_{i}}] \\ &\ge& \underset{\pi \in R_{m-1}}{\Pr}[\exists i \leq m-1: H_{i}] + \mu \cdot \Pr_{\pi \in R_{m-1}}[\forall i \leq m-1: \overline{H_{i}}]. \end{array} $$

Thus, \(r_{m} := \Pr _{\pi \in R_{m}}[\exists i \leq m : H_{i}]\) satisfies rmrm− 1 + μ ⋅ (1 − rm− 1) = (1 − μ) ⋅ rm− 1 + μ. Since r0 = 0, we get rm ≥ 1 − (1 − μ)m by induction. □

We can now show our main theorem:

Proof (of Theorem 1)

We use the probabilistic method in order to show that there exists q0Q such that \(\mathcal {D}_{n} = (Q,{\Sigma },q_{0},\delta ,o)\) is a deterministic sliding window algorithm for Φ. With Lemma 3 we get

$$ \begin{array}{@{}rcl@{}} && \underset{\pi \in R_{m}}{\Pr}[\pi \text{ is strictly correct for $\text{SW}_{n}(\varPhi)$ and $\delta$-universal}] \\ & = & 1 - \underset{\pi \in R_{m}}{\Pr}[\pi \text{ is not strictly correct for $\text{SW}_{n}(\varPhi)$ or is not $\delta$-universal}] \\ & \ge & 1 - \underset{\pi \in R_{m}}{\Pr}[\pi \text{ is not strictly correct for $\text{SW}_{n}(\varPhi)$}] - \Pr_{\pi \in R_{m}}[\pi \text{ is not $\delta$-universal}] \\ &\ge & \underset{\pi \in R_{m}}{\Pr}[\pi \text{ is $\delta$-universal}] - \lambda \\ &\ge & 1 - (1-\mu)^{m} - \lambda . \end{array} $$

We have \(1 - (1-\mu )^{m} - \lambda > 0\) for \(m > \log (1-\lambda ) / \log (1-\mu )\) (note that λ < 1 and 0 < μ < 1 since we can assume that |Q|≥ 2). Hence there are m ≥ 0 and a strictly correct δ-universal run π ∈ Rm. We can conclude with Lemma 1. □

4 Polynomially Long Streams

The word \(w_{1} w_{2} {\cdots } w_{m}\) (with \(m > \log (1-\lambda ) / \log (1-\mu )\)) from the previous section, for which there exists a strictly correct and δ-universal run, has a length that is doubly exponential in the window size n. To see this, note that \(0 > \ln (1-x) \ge x/(x-1)\) for 0 < x < 1, which implies

$$ m > \frac{\log(1-\lambda)}{\log(1-\mu)} = \frac{\ln(1-\lambda)}{\ln(1-\mu)} \ge \ln(1-\lambda) (\mu-1) \cdot \frac{1}{\mu} . $$

Here, \(\ln (1-\lambda )\) is a negative constant and μ − 1 is very close to − 1. Moreover, 1/μ grows doubly exponentially in n by (2).
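Explicitly, since we may assume |Q| ≥ 2 and |Σ| ≥ 2, definition (2) yields

$$ \frac{1}{\mu} = |Q|^{|Q| \cdot |{\Sigma}|^{n} \cdot (|Q|+n) + 1} \ge 2^{2 \cdot |{\Sigma}|^{n}} \ge 2^{2^{n}}, $$

so m, and hence the length of \(w_{1} {\cdots } w_{m}\), is indeed doubly exponential in n.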

In other words: We need the fact that the sliding window algorithm is strictly correct on doubly exponentially long streams with high probability in order to derandomize the algorithm. In this section we show that at least we cannot reduce the length to poly(n): if we restrict to inputs of length poly(n) then strictly λ-correct sliding window algorithms can yield a proper space improvement over deterministic sliding window algorithms.

For a word \(w = a_{1} {\cdots } a_{n}\) let \(w^{\mathsf {R}} = a_{n} {\cdots } a_{1}\) denote the reversed word. Take the language \(K_{\text {pal}} = \{ ww^{\mathsf {R}} : w \in \{a,b\}^{*} \}\) of all palindromes of even length, which belongs to the class DLIN of deterministic linear context-free languages [6], and let L = $Kpal. As explained in Section 2.1 we identify L with the (exact) approximation problem \(\chi _{L} \colon \{a,b,\$\}^{*} \to \{0,1\}\) where χL(w) = 1 if and only if w ∈ L. We write Ln for SWn(χL). Note that the following proposition holds for arbitrarily long input streams.

Proposition 1

Any deterministic sliding window algorithm for L and window size 2n + 1 uses Ω(n) space.

Proof

Let \(\mathcal {D}_{2n+1}\) be a deterministic sliding window algorithm for L and window size 2n + 1, and take two distinct words $x and $y where \(x,y \in \{a,b\}^{n}\). Since \(\mathcal {D}_{2n+1}\) accepts \(\$ x x^{\mathsf {R}}\) and rejects \(\$ y x^{\mathsf {R}}\), the algorithm \(\mathcal {D}_{2n+1}\) reaches two different states on the inputs $x and $y. Therefore, \(\mathcal {D}_{2n+1}\) must have at least \(|\{a,b\}^{n}| = 2^{n}\) states and hence uses Ω(n) space. □

Proposition 2

Fix a polynomial p(n) and let \(n \in \mathbb {N}\) be a window size. If n is large enough, there is a randomized streaming algorithm \(\mathcal {P}_{n}\) with \(s(\mathcal {P}_{n}) \leq \mathcal {O}(\log n)\) such that

$$ \underset{\pi \in \text{Runs}(\mathcal{P}_{n},w)}{\Pr} [\pi \text{ is strictly correct for } L_{n} ] \ge 1 - 1/n $$

for all input words \(w \in {\Sigma }^{*}\) with |w| ≤ p(n).

Proof

Babu et al. [6] have shown that for every language K ∈ DLIN there exists a randomized streaming algorithm using space \(\mathcal {O}(\log n)\) which, given an input v of length n,

  • accepts with probability 1 if vK,

  • and rejects with probability at least 1 − 1/n if vK.

We use this statement for the language Kpal ∈ DLIN. We remark that the algorithm needs to know the length of v in advance. To stay consistent with our definition, we view the above algorithm as a family \((\mathcal {S}_{n})_{n \ge 0}\) of randomized streaming algorithms \(\mathcal {S}_{n}\). Furthermore, the error probability 1/n can be further reduced to \(1/(n+1)^{d}\) where d is chosen such that \(p(n) \le n^{d}\) for sufficiently large n (by picking random primes of size \(\varTheta (n^{d+1})\) in the proof from [6]).
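To give a flavour of the fingerprinting technique behind such algorithms, here is a hedged sketch (our own simplification, not the construction of [6]): a one-pass polynomial-fingerprint test for palindromes with one-sided error, where the length of the input is known in advance and p is a prime larger than that length.

```python
import random

def palindrome_fingerprint_test(v: str, p: int) -> bool:
    """One-pass randomized palindrome test over {a, b} using O(log p) bits
    of state.  If v is a palindrome it always accepts; otherwise it accepts
    with probability at most len(v)/p, since two distinct polynomials of
    degree < len(v) agree on at most len(v) - 1 of the evaluation points."""
    n = len(v)
    x = random.randrange(1, p)  # random evaluation point
    fwd = bwd = 0
    for i, c in enumerate(v):   # single left-to-right pass over v
        digit = 1 if c == "b" else 0
        fwd = (fwd + digit * pow(x, i, p)) % p          # fingerprint of v
        bwd = (bwd + digit * pow(x, n - 1 - i, p)) % p  # fingerprint of reversed v
    return fwd == bwd
```

Membership in Kpal additionally requires the length to be even, which can be checked with one extra bit of state.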

Now we prove our claim for L = $Kpal. The streaming algorithm \(\mathcal {P}_{n}\) for window size n works as follows: After reading a $-symbol, the algorithm \(\mathcal {S}_{n-1}\) from above is simulated on the longest factor from \(\{a,b\}^{*}\) that follows (i.e. \(\mathcal {S}_{n-1}\) is simulated until the next $ arrives). Simultaneously we maintain the length ℓ of the maximal suffix over {a,b}, up to n, using \(\mathcal {O}(\log n)\) bits. If ℓ reaches n − 1, then \(\mathcal {P}_{n}\) accepts if and only if \(\mathcal {S}_{n-1}\) accepts. Notice that \(\mathcal {P}_{n}\) only errs if the stored length ℓ is n − 1, which happens at most once in every n steps. Therefore the number of time instants where \(\mathcal {P}_{n}\) errs on an input stream w of length \(|w| \le p(n) \le n^{d}\) is at most \(|w|/n \le n^{d}/n = n^{d-1}\) (if n is large enough). Moreover, at each of these time instants the error probability is at most \(1/n^{d}\). By the union bound we have for every stream \(w \in \{\$,a,b\}^{\le p(n)}\):

$$ \underset{\pi \in \text{Runs}(\mathcal{P}_{n},w)}{\Pr} {[} \pi \text{ is not strictly correct for } L_{n} {]} \le n^{d-1} \cdot \frac{1}{n^{d}} = 1/n. $$

This concludes the proof. □

5 Lower Bound for Basic Counting

For an approximation error 𝜖 > 0 let us define the basic counting problem

$$ C_{1,\epsilon} = \{ (w,m) \in \{0,1\}^{*} \times \mathbb{N} : (1-\epsilon) \cdot c_{1}(w) \leq m \leq (1+\epsilon) \cdot c_{1}(w)\} $$

where c1(w) denotes the number of 1’s in w. In [12] Datar, Gionis, Indyk and Motwani prove that any strictly λ-correct randomized sliding window algorithm for C1,𝜖 and window size n must use \(\frac {k}{64} \log ^{2} \frac {n}{k} - \log (1-\lambda )\) bits where k = ⌊1/𝜖⌋. We adapt their proof to show that the lower bound also holds for the weaker notion of λ-correct randomized sliding window algorithms.

Theorem 2

Let 𝜖 > 0 and k = ⌊1/𝜖⌋. Every 1/200-correct randomized sliding window algorithm for C1,𝜖 and window size n ≥ 4k must use \(\frac {k}{48} \log ^{2}(\frac {n}{k})\) many bits.

The statement above extends to any error probability λ < 1/2 using the median trick, see e.g. [2]: we run m copies of the algorithm in parallel and output the median of their outputs. Using the Chernoff bound we can choose a constant m such that the median is a correct 𝜖-approximation with error probability 1/200. This reduces the space lower bound only by a constant factor.
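A minimal sketch of the median trick (our own illustration; `estimators` is a list of independent copies, each exposing a step method that returns its current estimate):

```python
import statistics

def median_trick(estimators, stream):
    """Run several independent copies of a randomized estimator in parallel
    and output, after every symbol, the median of their current estimates."""
    for symbol in stream:
        yield statistics.median(e.step(symbol) for e in estimators)
```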

For the rest of the section let us fix \(n \in \mathbb {N}\) and 0 < 𝜖 < 1. Furthermore set k = ⌊1/𝜖⌋. For the proof we use a reduction from a suitable communication problem. Let f : A × B →{0,1} be a function. A (one-round public-coin communication) protocol P = (mA,mB) with cost c consists of two functions \(m_{A} \colon A \times R \to \{0,1\}^{c}\) and \(m_{B} \colon \{0,1\}^{c} \times B \times R \to \{0,1\}\). Here R is a finite set of random choices equipped with a probability distribution. Given inputs a ∈ A and b ∈ B the protocol computes the random output P(a,b) = mB(mA(a,r),b,r) where r ∈ R is chosen at random according to the distribution on R. It computes f with error probability λ < 1/2 if

$$ \underset{r \in R}{\Pr} [P(a,b) \neq f(a,b)] \le \lambda $$

for all a ∈ A, b ∈ B. If |R| = 1 then P is deterministic.

We define the communication problem \(\text {GT}_{\ell ,m}\) where Alice is given m many ℓ-bit numbers \(a^{(1)}, \dots , a^{(m)}\), Bob is given a single ℓ-bit number b and an index 1 ≤ p ≤ m, and the goal is to decide whether a(p) > b. Formally, we view \(\text {GT}_{\ell ,m}\) as a function

$$ \text{GT}_{\ell,m} \colon (\{0,1\}^{\ell})^{m} \times (\{0,1\}^{\ell} \times \{1, \dots, m\}) \to \{0,1\}. $$

If m = 1 we write \(\text {GT}_{\ell } = \text {GT}_{\ell ,1}\).

Proposition 3

Let \(B = \sqrt {nk}\) such that n ≥ 4k, and \(j = \lfloor \log \frac {n}{B}\rfloor \). If \(\mathcal {P}_{n}\) is a λ-correct sliding window algorithm for C1,𝜖 and window size n then there exists a one-round protocol for \(\text {GT}_{\log \frac {4B}{k}, \frac {jk}{4}}\) with cost \(s(\mathcal {P}_{n})\) and error probability λ.

Proof

In the following we ignore rounding issues. The idea is that Alice encodes her jk/4 many numbers as a bit stream consisting of jk/4 groups and feeds it into the sliding window algorithm. Then Bob can compare his number b with any of Alice’s numbers a(i) with high probability, by sliding the window to the appropriate position.

As in [12] we partition the window of length n into j blocks of size

$$B, 2B, 4B, \dots, 2^{j-1} B$$

from right to left where \(j = \lfloor \log \frac {n}{B}\rfloor \). Notice that j ≥ 1 by our assumption that n ≥ 4k. The blocks are numbered 0 to j − 1 from right to left. The i-th block of length \(2^{i} B\) is divided into B many subblocks of length \(2^{i}\). Each block is divided into k/4 groups consisting of 4B/k contiguous subblocks. In the following we choose from every group exactly one subblock which is filled with 1’s; the remaining subblocks in the group are filled with 0’s. An example is shown in Figure 1.

Fig. 1: A single block consisting of B = 12 subblocks divided into k/4 = 3 groups. The groups encode the numbers 2, 1, and 3 (from left to right)

Let \(M = \{1, \dots , 4B/k\}\). We will encode a tuple \(\mathbf {a} = (a^{(1)}, \dots , a^{(jk/4)}) \in M^{jk/4}\) as a bit string of length n in unary fashion as follows: For a ∈ M and 0 ≤ i ≤ j − 1 define the bit string

$$ u_{i}(a) = (0^{2^{i}})^{4B/k -a} 1^{2^{i}} (0^{2^{i}})^{a-1} $$

of length \(2^{i} \cdot 4B/k\). For a tuple \(\mathbf {a} = (a^{(1)}, \dots , a^{(jk/4)})\) over M of length jk/4 we define the arrangement \(w(\mathbf {a}) \in \{0,1\}^{n}\) by

$$ w(\mathbf{a}) = \prod\limits_{i=0}^{j-1} \prod\limits_{r = 1}^{k/4} u_{i}(a^{(ik/4 + r)}) $$

where both concatenations are interpreted from right to left.
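The following sketch (our own, with hypothetical function names; padding the result to length exactly n is omitted) builds u_i(a) and the arrangement w(a) literally from these definitions:

```python
def u(i: int, a: int, B: int, k: int) -> str:
    """Group u_i(a): 4B/k subblocks of length 2^i; the a-th subblock from
    the right is filled with 1's, all others with 0's."""
    sub, groups = 2 ** i, 4 * B // k
    return "0" * (sub * (groups - a)) + "1" * sub + "0" * (sub * (a - 1))

def arrangement(a: list, B: int, k: int, j: int) -> str:
    """Arrangement w(a) for a tuple a of length j*k/4 with entries in
    {1, ..., 4B/k}; both products are read from right to left, so block
    j-1 ends up leftmost and block 0 rightmost."""
    groups = [u(i, a[i * (k // 4) + r - 1], B, k)
              for i in range(j) for r in range(1, k // 4 + 1)]
    return "".join(reversed(groups))
```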

Datar et al. [12] argue that for any two distinct tuples a and b, the arrangements w(a) and w(b) must be distinguished by a deterministic sliding window algorithm for C1,𝜖 and window length n.

We will present a communication protocol for \(\text {GT}_{\log \frac {4B}{k}, \frac {jk}{4}}\) based on a λ-correct sliding window algorithm \(\mathcal {P}_{n}\) for C1,𝜖. Notice that \(\log \frac {4B}{k} \ge 1\) by the assumption that n ≥ 4k. Suppose that Alice holds the tuple \(\mathbf {a} = (a^{(1)}, \dots , a^{(jk/4)})\) of numbers from M, and Bob holds b ∈ M and an index 1 ≤ p ≤ jk/4. Their goal is to determine whether a(p) > b. The protocol is defined as follows: Alice simulates \(\mathcal {P}_{n}\) on w(a) and sends the reached state to Bob, using \(s(\mathcal {P}_{n})\) bits. Suppose that p = ik/4 + r for some 0 ≤ i ≤ j − 1 and 1 ≤ r ≤ k/4. Bob then inserts a suitable number of 0’s into the stream such that the length-n window starts with the b-th subblock from the r-th group in the i-th block of w(a). Notice that this is possible without knowing the tuple a because of the regular structure of arrangements, which is known to Bob. The number of 1-bits in the obtained window is precisely

  • \(r \cdot 2^{i} + \frac {k}{4} (2^{i-1} + {\dots } + 2^{1} + 2^{0})\) if a(p) ≤ b, and

  • \((r-1) \cdot 2^{i} + \frac {k}{4} (2^{i-1} + {\dots } + 2^{1} + 2^{0})\) if a(p) > b.

Since the absolute approximation error is bounded by

$$ \epsilon \frac{k}{4} (2^{i} + {\dots} + 2^{1} + 2^{0}) < \frac{2^{i+1}}{4} = 2^{i-1}, $$

the two cases above can be distinguished by \(\mathcal {P}_{n}\) with probability 1 − λ. □

It remains to prove a lower bound for the one-round communication complexity of \(\text {GT}_{\ell ,m}\). We start by showing that the one-round communication complexity of GTn is Ω(n). This was already proven by Yao [25, Theorem 5]. More generally, Miltersen et al. showed that any r-round protocol for GTn requires \(\varOmega (n^{1/r})\) bits using the round elimination technique [21]. We will first reprove the Ω(n) lower bound for GTn, by directly plugging r = 1 and GTn into the proof of [21, Lemma 11]. Afterwards we adapt the proof to show the Ω(ℓm) lower bound for \(\text {GT}_{\ell ,m}\).

Theorem 3

Every one-round randomized protocol for GTn with error probability 1/200 has cost at least n/3 bits.

Proof

We follow the proof of [21, Lemma 11]. Consider a randomized one-round protocol for GTn with error probability 1/200. The goal is to prove that the protocol must use at least n/3 bits.

By Yao’s minimax principle [24] it suffices to exhibit a “hard” input distribution D on the set of inputs {0,1}n ×{0,1}n and to prove that every deterministic protocol P with \(\Pr _{D} [P(x,y) \neq \text {GT}_{n}(x,y)] \le 1/200\) must have cost Ω(n). See [20] for similar applications of Yao’s minimax principle in the area of communication complexity.

For a bit string \(x = x_{1} {\cdots } x_{n}\) and an index 1 ≤ i ≤ n we define the bit string \(\tau _{i}(x) = x_{1} {\cdots } x_{i-1} 0 1^{n-i}\) of length n. Interpreted as binary numbers, we have the property

$$ x > \tau_{i}(x) \iff x_{i} = 1. $$
(4)
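Property (4) is easily verified by brute force; a small sketch with a hypothetical helper tau:

```python
def tau(x: str, i: int) -> str:
    """tau_i(x) = x_1 ... x_{i-1} 0 1^{n-i} for a bit string x (i is 1-based)."""
    return x[:i - 1] + "0" + "1" * (len(x) - i)

# exhaustive check of property (4) for all 4-bit strings
n = 4
for v in range(2 ** n):
    x = format(v, "04b")
    for i in range(1, n + 1):
        assert (int(x, 2) > int(tau(x, i), 2)) == (x[i - 1] == "1")
```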

The “hard” input distribution D is the uniform distribution on

$$ \{ (x,\tau_{i}(x)) \mid x \in \{0,1\}^{n}, 1 \le i \le n \}. $$

In other words, Alice holds a uniformly random string x ∈{0,1}n and Bob holds τi(x) where the index 1 ≤ i ≤ n is also chosen uniformly at random and independently of x. By property (4) Bob needs to determine the value of xi. Intuitively, the prefix \(x_{1} {\cdots } x_{i-1}\) of τi(x) does not help Bob, so this is basically the “index” function, for which every one-round randomized protocol with error probability 1/3 has cost Ω(n) [19, Theorem 3.7].

Consider any deterministic protocol P with communication cost c such that

$$ \underset{D}{\Pr}[P \text{ errs on } (x,\tau_{i}(x))] \le 0.005. $$

Call an index i in x bad if P errs on (x,τi(x)), and good otherwise. A uniformly random string \(x \in \{0,1\}^{n}\) has at most 0.005n bad indices in expectation. By the Markov inequality we know

$$ \underset{x \in \{0,1\}^{n}}{\Pr}[\text{number of bad indices of } x \ge 0.01n] \le \frac{0.005n}{0.01n} = \frac{1}{2}. $$

Hence the set

$$ R = \{ x \in \{0,1\}^{n} \mid x \text{ has at most } 0.01n \text{ bad indices} \} $$

must contain at least \(2^{n-1}\) bit strings. Since Alice sends at most c bits, she partitions R into at most \(2^{c}\) subsets according to the bit string sent to Bob. Let T be one of these subsets of maximum cardinality. We have \(|T| \ge |R|/2^{c} \ge 2^{n-1-c}\), i.e.,

$$ c \ge n - 1 - \log |T|. $$
(5)

To prove c ≥ n/3 we derive an upper bound on |T|.

In the following we successively construct a sequence of bits \(a_{1}, \dots , a_{n}\) and a sequence of nonempty sets \(T = T_{1} \supseteq T_{2} \supseteq {\cdots } \supseteq T_{n+1}\) of n-bit strings such that all strings in Ti have the prefix \(a_{1} {\cdots } a_{i-1}\) (in particular, \(|T_{n+1}| = 1\)).

  1. Set T1 := T and repeat the following for \(i = 1, \dots , n\):

  2. Set \(T_{i}^{-} := \{ x \in T_{i} \mid i \text { is bad in } x \}\) and \(T_{i}^{+} := \{ x \in T_{i} \mid i \text { is good in } x \}\).

  3. If \(|T^{-}_{i}| \ge 0.05 \cdot |T_{i}|\) choose ai ∈{0,1} such that \(|\{ x \in T_{i}^{-} \mid x_{i} = a_{i} \}|\) is maximal. Then set \(T_{i+1} := \{ x \in T_{i}^{-} \mid x_{i} = a_{i} \}\).

  4. Otherwise we have \(|T_{i}^{+}| \ge 0.95 \cdot |T_{i}|\). All strings \(x \in T_{i}^{+}\) must have the same i-th bit, say xi = ai, since for a string \(x \in T_{i}^{+}\) Bob outputs correctly the bit xi on input (x,τi(x)). But since all strings in \(T_{i}^{+}\) have the same prefix of length i − 1 (and hence yield the same value under τi) and Alice communicates by definition of T the same message to Bob, Bob outputs the same bit for all \(x \in T_{i}^{+}\). Set \(T_{i+1} := T_{i}^{+}\).

Observe that all subsets \(T_{1}, \dots , T_{n+1}\) are nonempty. If point 3 is satisfied then \(|T_{i+1}| \ge 0.5 \cdot 0.05 \cdot |T_{i}| = 0.025 \cdot |T_{i}|\), and the index i is bad in all strings of \(T_{i+1}\). Hence, point 3 can be satisfied at most 0.01n times by the definition of R. If point 4 is satisfied then \(|T_{i+1}| \ge 0.95 \cdot |T_{i}|\). We can therefore bound

$$ 1 = |T_{n+1}| \ge 0.025^{0.01n} \cdot 0.95^{0.99n} \cdot |T| > 0.916^{n} \cdot |T|, $$

and thus \(|T| \le 1.0917^{n}\). By (5) we have established

$$ c \ge (1 - \log 1.0917) n - 1 \ge 0.8735 n - 1 \ge n/3. $$

for all n ≥ 2. Clearly in the case n = 1 there is no zero-message randomized protocol for GT1. □

Theorem 4

Every one-round protocol for \(\text {GT}_{\ell ,m}\) with error probability 1/200 has cost at least ℓm/3 bits.

Proof

We adapt the proof above to \(\text {GT}_{\ell ,m}\). To keep the notation consistent we view Alice’s input as a single bit string \(x = x^{(1)} {\cdots } x^{(m)} \in \{0,1\}^{\ell m}\) where \(x^{(1)}, \dots , x^{(m)} \in \{0,1\}^{\ell }\). We can write an index 1 ≤ i ≤ ℓm uniquely as i = (p(i) − 1)ℓ + r(i) where 1 ≤ p(i) ≤ m and 1 ≤ r(i) ≤ ℓ. For 1 ≤ i ≤ ℓm we define the bit string

$$ \tau_{i}(x) = x^{(p(i))}_{1} {\dots} x^{(p(i))}_{r(i)-1} 0 1^{\ell-r(i)} $$

of length ℓ. We have the property that

$$ \text{GT}_{\ell,m}(x, \tau_{i}(x), p(i)) = 1 \iff x^{(p(i))}_{r(i)} = 1. $$

The hard input distribution is the uniform distribution D over

$$ \{ (x, \tau_{i}(x),p(i)) : x \in \{0,1\}^{\ell m}, 1 \le i \le \ell m \}. $$

Let P be a deterministic protocol with communication cost c such that

$$ \underset{D}{\Pr}[P \text{ errs on } (x,\tau_{i}(x),p(i))] \le 0.005. $$

We call an index 1 ≤ i ≤ ℓm bad if P errs on (x,τi(x),p(i)). Again we can find a set \(T \subseteq \{0,1\}^{\ell m}\) such that

  • \(|T| \ge 2^{\ell m - 1 - c}\),

  • all x ∈ T have at most 0.01ℓm bad indices,

  • and Alice sends the same message on all x ∈ T.

Using precisely the same arguments as in the previous proof we obtain \(|T| \le 1.0917^{\ell m}\) and thus c ≥ ℓm/3 whenever ℓm ≥ 2. If ℓm = 1 then Alice must also send at least one bit in any communication protocol for \(\text {GT}_{\ell ,m}\) with error probability 1/200. □

Theorem 2 now follows from Proposition 3 and Theorem 4: Let \(B = \sqrt {nk}\) and \(j = \lfloor \log \frac {n}{B}\rfloor \ge 1\). Then, any randomized 1/200-correct sliding window algorithm for C1,𝜖 and window size n must use at least

$$ \frac{1}{3} \log\bigg(\frac{4B}{k}\bigg) \cdot \frac{jk}{4} = \frac{1}{3} \log\bigg(4\sqrt{\frac{n}{k}}\bigg) \cdot \bigg\lfloor\log\bigg(\sqrt{\frac{n}{k}}\bigg)\bigg\rfloor \cdot \frac{k}{4} \ge \frac{k}{48} \cdot \log^{2} \bigg(\frac{n}{k}\bigg) $$

many bits.

6 Open Problems

In the proof of Theorem 1 we need the fact that the sliding window algorithm is strictly correct on doubly exponentially long streams with high probability. We pose the question whether this can be reduced to exponentially long streams.

Another open problem is whether one can extend Theorem 4 to arbitrary communication problems. For any function f : A × B →{0,1} one can define an “indexed” version \(f^{(m)} \colon A^{m} \times B \times \{1, \dots , m\} \to \{0,1\}\) where Alice holds a tuple \((a_{1}, \dots , a_{m}) \in A^{m}\), Bob holds b ∈ B and 1 ≤ i ≤ m, and their goal is to compute f(ai,b). The question is whether the one-round communication complexity of \(f^{(m)}\) must be m times as large as the complexity of f, as is the case for GTn.