## 1 Introduction

Streaming algorithms are algorithms for processing large data streams while using only a limited amount of memory, significantly smaller than what is needed to store the entire data stream. Data streams occur in many applications including computer networking, databases, and natural language processing. The seminal work of Alon, Matias, and Szegedy [1] initiated an extensive theoretical study and further applications of streaming algorithms.

In this work we focus on streaming algorithms that aim to maintain, at any point in time, an approximation for the value of some (predefined) real-valued function of the input stream. Such streaming algorithms are sometimes referred to as strong trackers. For example, this predefined function might count the number of distinct elements in the stream. Formally,

### Definition 1.1

Let $${\mathcal {A}}$$ be an algorithm that, for m rounds, obtains an element from a domain X and outputs a real number. Algorithm $${\mathcal {A}}$$ is said to be a strong tracker for a function $${\mathcal {F}}:X^*\rightarrow {\mathbb {R}}$$ with accuracy $$\alpha$$, failure probability $$\delta$$, and stream length m if the following holds for every sequence $$\textbf{u}=(u_1,\dots ,u_m)\in X^m$$. Consider an execution of $${\mathcal {A}}$$ on the input stream $$\textbf{u}$$, and denote the answers given by $${\mathcal {A}}$$ as $$\textbf{z}=(z_1,\dots ,z_m)$$. Then,

$$\Pr \left[ \forall i\in [m]:\; z_i \in (1\pm \alpha )\cdot {\mathcal {F}}(u_1,\dots ,u_i) \right] \ge 1-\delta ,$$

where the probability is taken over the coins of algorithm $${\mathcal {A}}$$.

While Definition 1.1 is certainly not the only possible definition of streaming algorithms, it is rather standard. Note that in this definition we assume that the input stream $$\textbf{u}$$ is fixed in advance. In particular, we assume that the choice for the elements in the stream is independent from the internal randomness of $${\mathcal {A}}$$. This assumption is crucial for the analysis (and correctness) of many of the existing streaming algorithms. We refer to algorithms that utilize this assumption as oblivious streaming algorithms. In this work we are interested in the setting where this assumption does not hold, often called the adversarial setting.

### 1.1 The Adversarial Streaming Model

The adversarial streaming model, in various forms, was considered by [2,3,4,5,6,7,8,9,10,11,12,13,14]. We give here the formulation presented by Ben-Eliezer et al. [9]. The adversarial setting is modeled by a two-player game between a (randomized) StreamingAlgorithm and an Adversary. At the beginning, we fix a function $${\mathcal {F}}:X^*\rightarrow {\mathbb {R}}$$. Then the game proceeds in rounds, where in the ith round:

1. The Adversary chooses an update $$u_i\in X$$ for the stream, which can depend, in particular, on all previous stream updates and outputs of StreamingAlgorithm.

2. The StreamingAlgorithm processes the new update $$u_i$$ and outputs its current response $$z_i\in {\mathbb {R}}$$.

The goal of the Adversary is to make the StreamingAlgorithm output an incorrect response $$z_i$$ at some point i in the stream. For example, in the distinct elements problem, the adversary’s goal is that at some step i, the estimate $$z_i$$ will fail to be a $$(1+\alpha )$$-approximation of the true current number of distinct elements.

In this work we present a new framework for transforming an oblivious streaming algorithm into an adversarially-robust streaming algorithm. Before presenting our framework, we first elaborate on the existing literature and the currently available frameworks.

### 1.2 Existing Framework: Ben-Eliezer et al. [9]

To illustrate the results of [9], let us consider the distinct elements problem, in which the function $${\mathcal {F}}$$ counts the number of distinct elements in the stream. Observe that, assuming that there are no deletions in the stream, this quantity is monotonically increasing. Furthermore, since we are aiming for a multiplicative error, the number of times we need to modify the estimate we release is quite small (it depends logarithmically on the stream length m). Informally, the idea of [9] is to run several independent copies of an oblivious algorithm (in parallel), and to use each copy to release answers over a part of the stream during which the estimate remains constant. In more detail, the generic transformation of [9] (applicable not only to the distinct elements problem) is based on the following definition.

### Definition 1.2

(Flip number [9]) Given a function $${\mathcal {F}}: {\mathcal {X}}^* \rightarrow {\mathbb {R}}$$ and a stream of length m, denote the values of $${\mathcal {F}}$$ over the stream's prefixes as $$(y_1, \dots , y_m)$$. The $$(\alpha ,m)$$-flip number of $${\mathcal {F}}$$, denoted as $$\lambda _{\alpha ,m}({\mathcal {F}})$$, or simply $$\lambda$$ in short, is the maximal number $$k\in [m]$$ s.t. there exist a stream of length m and indices $$0\le i_1< \dots < i_k\le m$$ such that for every $$j\in \{2,\dots ,k\}$$, it holds that $$y_{i_{j-1}} \notin (1\pm \alpha )\cdot y_{i_{j}}$$.

That is, the above definition captures the maximal number of times the value of the function can change by a multiplicative factor of $$(1\pm \alpha )$$ over any stream of length m.

### Remark 1.3

In the technical sections of this work, we sometimes refer to the flip number of the given stream (w.r.t. the target function), which is a more fine-tuned quantity. See Definition 3.2.

### Example 1.4

Assuming that there are no deletions in the stream (a.k.a. the insertion only model), the $$(\alpha ,m)$$-flip number of the distinct elements problem is at most $$O\left( \frac{1}{\alpha } \log m \right)$$. However, if deletions are allowed (a.k.a. the turnstile model), then the flip number of this problem could be as big as $$\Omega (m)$$.
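To make the flip number concrete, the following sketch (names ours, not from the paper) greedily counts the $$(1\pm \alpha )$$-flips along a given sequence of nonnegative function values, and reproduces the contrast in Example 1.4 between insertion-only and turnstile distinct-elements counts:

```python
def flip_number(values, alpha):
    """Greedy count of (1 +- alpha)-flips along a sequence of
    nonnegative function values y_1, ..., y_m (cf. Definition 1.2)."""
    count, last = 1, values[0]
    for y in values[1:]:
        # a flip occurs when the last recorded value leaves (1 +- alpha) * y
        if not ((1 - alpha) * y <= last <= (1 + alpha) * y):
            count += 1
            last = y
    return count

# Insertion-only distinct elements: the value only grows, so flips are few.
monotone = list(range(1, 1001))
# Turnstile: one element repeatedly inserted and deleted; the count oscillates.
oscillating = [1, 0] * 500

print(flip_number(monotone, alpha=0.5))     # → 9, i.e., O((1/alpha) log m)
print(flip_number(oscillating, alpha=0.5))  # → 1000, i.e., Omega(m)
```

The greedy scan starts a new "flip" whenever the previously recorded value falls outside $$(1\pm \alpha )$$ times the current value, matching the chain of indices in Definition 1.2.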

The generic construction of [9] for a function $${\mathcal {F}}$$ is as follows.

1. Instantiate $$\lambda =\lambda _{O(\alpha ),m}({\mathcal {F}})$$ independent copies of an oblivious streaming algorithm for the function $${\mathcal {F}}$$, and set $$j=1$$.

2. When the next update $$u_i$$ arrives:

   (a) Feed $$u_i$$ to all of the $$\lambda$$ copies.

   (b) Release an estimate using the jth copy (rounded to the nearest power of $$(1+\alpha )$$). If this estimate is different from the previous estimate, then set $$j\leftarrow j+1$$.

Ben-Eliezer et al. [9] showed that this can be used to transform an oblivious streaming algorithm for $${\mathcal {F}}$$ into an adversarially robust streaming algorithm for $${\mathcal {F}}$$. In addition, the overhead in terms of memory is only $$\lambda$$, which is small in many interesting settings.

The simple, but powerful, observation of Ben-Eliezer et al. [9], is that by “using every copy at most once” we can break the dependencies between the internal randomness of our algorithm and the choice for the elements in the stream. Intuitively, this holds because the answer is always computed using a “fresh copy” whose randomness is independent from the choice of stream items.
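The switching construction above can be sketched in a few lines. This is a toy illustration under our own naming, with an exact distinct-elements counter standing in for an oblivious strong tracker; it is not the paper's implementation:

```python
import math

class ExactDistinct:
    """Stands in for an oblivious strong tracker (here: exact counting)."""
    def __init__(self):
        self.seen = set()
    def process(self, u):
        self.seen.add(u)
    def estimate(self):
        return len(self.seen)

class RobustBySwitching:
    """Sketch of the [9] transformation: run lam independent copies and
    advance to a fresh copy whenever the rounded estimate changes."""
    def __init__(self, make_copy, lam, alpha):
        self.copies = [make_copy() for _ in range(lam)]
        self.alpha = alpha
        self.j = 0          # index of the copy currently answering
        self.prev = None    # last released (rounded) estimate

    def _round(self, z):
        # round to the nearest power of (1 + alpha)
        if z <= 0:
            return 0.0
        k = round(math.log(z, 1 + self.alpha))
        return (1 + self.alpha) ** k

    def process(self, u):
        for c in self.copies:       # feed the update to all copies
            c.process(u)
        z = self._round(self.copies[self.j].estimate())
        if self.prev is not None and z != self.prev:
            # the released value changed: retire the current copy
            self.j = min(self.j + 1, len(self.copies) - 1)
        self.prev = z
        return z

r = RobustBySwitching(ExactDistinct, lam=8, alpha=0.5)
print([r.process(u) for u in [1, 2, 3, 1, 2, 3]])
# → [1.0, 2.25, 3.375, 3.375, 3.375, 3.375]
```

Note how a copy is consumed only when the rounded output changes, so the number of copies needed is exactly the flip number of the stream.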

### 1.3 Existing Framework: Hassidim et al. [10]

Hassidim et al. [10] showed that, in fact, we can use every copy of the oblivious algorithm much more than once. In more detail, the idea of Hassidim et al. is to protect the internal randomness of each of the copies of the oblivious streaming algorithm using differential privacy [15]. Hassidim et al. showed that this still suffices in order to break the dependencies between the internal randomness of our algorithm and the choice for the elements in the stream. This resulted in an improved framework where the space blowup is only $$\approx \sqrt{\lambda }$$ (instead of $$\lambda$$). Informally, the framework of [10] is as follows.

1. Instantiate $${\hat{\lambda }} = {\tilde{O}}(\sqrt{\lambda })$$ independent copies of an oblivious streaming algorithm for the function $${\mathcal {F}}$$.

2. When the next update $$u_i$$ arrives:

   (a) Feed $$u_i$$ to all of the $${\hat{\lambda }}$$ copies.

   (b) Aggregate all of the estimates given by the $${\hat{\lambda }}$$ copies, and compare the aggregated estimate to the previous estimate. If the estimate has changed “significantly”, output the new estimate. Otherwise, output the previous output.

In order to efficiently aggregate the estimates in Step 2b, this framework crucially relied on the fact that all of the copies of the oblivious algorithm are “the same” in the sense that they compute (or estimate) exactly the same function of the stream. This allowed Hassidim et al. to efficiently aggregate the returned estimates using standard tools from the literature on differential privacy. The intuition is that differential privacy allows us to identify global properties of the data, and hence, aggregating several numbers (the outcomes of the different oblivious algorithms) is easy if they are very similar.
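The aggregation step can be caricatured as follows. This is a deliberately simplified sketch (names ours): the actual framework of [10] uses a differentially private median and the sparse vector technique with calibrated privacy budgets, whereas here Laplace noise on the exact median stands in for both:

```python
import random
import statistics

def robust_aggregate(estimates_per_step, alpha, eps=1.0):
    """Toy sketch of the aggregation idea of [10]: at every step, privately
    aggregate the near-identical estimates of the copies, and change the
    published output only when the aggregate has moved significantly."""
    out, prev = [], None
    for ests in estimates_per_step:
        med = statistics.median(ests)
        # Laplace(0, 1/eps) noise, as the difference of two exponentials
        noisy = med + random.expovariate(eps) - random.expovariate(eps)
        if prev is None or abs(noisy - prev) > alpha * max(abs(prev), 1.0):
            prev = noisy  # significant change: publish a fresh (noisy) value
        out.append(prev)
    return out

random.seed(1)
# 5 copies whose estimates agree and grow from 1 to 99
outputs = robust_aggregate([[i] * 5 for i in range(1, 100)], alpha=0.05)
```

Because every copy estimates the same quantity, the median is insensitive to any single copy, which is what makes the differentially private aggregation cheap.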

### 1.4 Existing Framework: Woodruff and Zhou [16]

Woodruff and Zhou [16] presented an adversarial streaming framework that builds on the framework of Ben-Eliezer et al. [9]. The new idea of [16] is that, in many interesting cases, the oblivious algorithms we execute can be modified to track different (but related) functions, that require less space while still allowing us to use (or combine) several of them at any point in time in order to estimate $${\mathcal {F}}$$.

To illustrate this, consider a part of the input stream, say from time $$t_1$$ to time $$t_2$$, during which the target function $${\mathcal {F}}$$ doubles its value and is monotonically increasing. More specifically, suppose that we already know (or have a good estimation for) the value of $${\mathcal {F}}$$ at time $$t_1$$, and we want to track the value of $${\mathcal {F}}$$ from time $$t_1$$ till $$t_2$$. Recall that in the framework of [9] we only modify our output once the value of the function has changed by more than a $$(1+\alpha )$$ factor. As $${\mathcal {F}}(t_2)\le 2\cdot {\mathcal {F}}(t_1)$$, we get that between time $$t_1$$ and $$t_2$$ there are roughly $$1/\alpha$$ time points at which we need to modify our output. In the framework of [9], we need a fresh copy of the oblivious algorithm for each of these $$1/\alpha$$ time points. For concreteness, let us assume that every copy uses space $$1/\alpha ^2$$ (which is the case if, e.g., $${\mathcal {F}}=F_2$$Footnote 1), and hence the framework of [9] requires space $$1/\alpha ^3$$ to track the value of the function $${\mathcal {F}}$$ from $$t_1$$ till $$t_2$$.

In the framework of [16], on the other hand, this will cost only $$1/\alpha ^2$$. We now elaborate on this improvement. As we said, from time $$t_1$$ till $$t_2$$ there are $$1/\alpha$$ time points on which we need to modify our output. Let us denote these time points as $$t_1 = w_0< w_1<w_2<\dots <w_{1/\alpha } = t_2$$.Footnote 2 In the framework of [16], the oblivious algorithms we execute are tracking differences between the values of $${\mathcal {F}}$$ at specific times, rather than tracking the value of $${\mathcal {F}}$$ directly. (These algorithms are called difference estimators, or DE in short.) In more detail, suppose that for every $$j\in \{0,1,2,3,\dots ,\log \frac{1}{\alpha }\}$$ and every $$i_j\in \{2^j, 2{\cdot }2^j, 3{\cdot }2^j, 4{\cdot }2^j,\dots , \frac{1}{\alpha }\}$$ we have an oblivious algorithm (a difference estimator) for estimating the value of $$[{\mathcal {F}}(w_{i_j}) - {\mathcal {F}}(w_{i_j-2^j})]$$. We refer to the index j as the level of the oblivious algorithm. So there are $$\log \frac{1}{\alpha }$$ different levels, where we have a different number of oblivious algorithms for each level. (For level $$j=0$$ we have $$1/\alpha$$ oblivious algorithms and for level $$j=\log \frac{1}{\alpha }$$ we have only a single oblivious algorithm.)

Note that given all of these oblivious algorithms, we could compute an estimation for the value of the target function $${\mathcal {F}}$$ at each of the time points $$w_1,\dots ,w_{1/\alpha }$$ (and hence for every time $$t_1\le t\le t_2$$) by summing the estimations of (at most) one oblivious algorithm from each level.Footnote 3 For example, an estimation for the value of $${\mathcal {F}}\left( w_{\frac{3}{4\alpha }+1}\right)$$ can be obtained by combining estimations as follows (see also Fig. 1):

$${\mathcal {F}}\left( w_{\frac{3}{4\alpha }+1}\right) \approx {\mathcal {F}}\left( w_{0}\right) + \left[ {\mathcal {F}}\left( w_{\frac{1}{2\alpha }}\right) - {\mathcal {F}}\left( w_{0}\right) \right] + \left[ {\mathcal {F}}\left( w_{\frac{3}{4\alpha }}\right) - {\mathcal {F}}\left( w_{\frac{1}{2\alpha }}\right) \right] + \left[ {\mathcal {F}}\left( w_{\frac{3}{4\alpha }+1}\right) - {\mathcal {F}}\left( w_{\frac{3}{4\alpha }}\right) \right] .$$

As we sum at most $$\log \frac{1}{\alpha }$$ estimations, this decomposition increases our estimation error only by a factor of $$\log \frac{1}{\alpha }$$, which is acceptable. The key observation of [16] is that the space complexity needed for an oblivious algorithm at level j decreases when j decreases (intuitively, because in lower levels we need to track smaller differences, which is easier). So, even though in level $$j{=}10$$ we have more oblivious algorithms than in level $$j{=}20$$, these oblivious algorithms are cheaper, so that the overall space requirements of level $$j{=}10$$ and level $$j{=}20$$ (or any other level) are the same. Specifically, [16] showed that (for many problems of interest, e.g., for $$F_2$$) the space requirement of a difference estimator at level j is $$O( 2^{j} /\alpha )$$. We run $$O(2^{-j} / \alpha )$$ oblivious algorithms for level j, and hence, the space needed for level j is $$O(2^{-j}/\alpha \cdot 2^{j}/\alpha )=O(1/\alpha ^{2})$$. As we have $$\log (1/\alpha )$$ levels, the overall space we need to track the value of $${\mathcal {F}}$$ from time $$t_1$$ till $$t_2$$ is $${\tilde{O}}(1/\alpha ^{2})$$. This should be contrasted with the space required by [9] for this time segment, which is $$O(1/\alpha ^3)$$.
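The choice of which difference estimator to use at each level is just the binary decomposition of the index. The following sketch (function name ours; it only illustrates the indexing, not the estimators themselves) lists, for a given index i, the level-j estimators whose differences telescope from $$w_0$$ to $$w_i$$:

```python
def decomposition(i):
    """Which difference estimators to sum to reconstruct F(w_i): write i in
    binary; each set bit 2^j contributes the level-j estimator for
    [F(w_k) - F(w_{k - 2^j})], where k is the running prefix of bits."""
    parts, k = [], 0
    for j in reversed(range(i.bit_length())):
        if (i >> j) & 1:
            k += 1 << j
            parts.append((j, k))  # level j, difference F(w_k) - F(w_{k - 2^j})
    return parts

# With alpha = 1/16, the example index 3/(4*alpha) + 1 equals 13 = 0b1101:
print(decomposition(13))  # → [(3, 8), (2, 12), (0, 13)]
```

Summing $${\mathcal {F}}(w_0)$$ with the three listed differences indeed telescopes to $${\mathcal {F}}(w_{13})$$, using at most one estimator per level, exactly as in the decomposition above.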

### Remark 1.5

The informal takeaway from this example is that if we can track differences of the target function “efficiently” then this could be leveraged to design a robust algorithm with improved space complexity. In the example outlined above, the space needed to track differences decreases linearly with the bound on the maximal difference they aim to track. We extend this takeaway also to turnstile streams.

### 1.5 Our Results

The framework of [16] is very effective for the insertion-only model. However, there are two challenges that need to be addressed in the turnstile setting: (1) We are not aware of non-trivial constructions for difference estimators in the turnstile setting, and hence, the framework of [16] is not directly applicable to the turnstile setting.Footnote 4 (2) Even assuming the existence of a non-trivial difference estimator, the framework of [16] obtains sub-optimal results in the turnstile setting.

To overcome the first challenge, we introduce a new monitoring technique that aims to identify time steps at which we cannot guarantee correctness of our difference estimators (in the turnstile setting), and reset the system at these time steps. This will depend on the specific application at hand (the target function) and hence, we defer the discussion on our monitoring technique to Sect. 6 where we discuss applications of our framework.

We now focus on the second challenge (after assuming the existence of non-trivial difference estimators). To illustrate the sub-optimality of the framework of [16], let us consider a simplified turnstile setting in which the input stream can be partitioned into k time segments during each of which the target function is monotonic, and increases (or decreases) by at most a factor of 2 (or 1/2). Note that k can be very large in the turnstile model (up to O(m)). With the framework of [16], we would need space $${\tilde{O}}\left( \frac{k}{\alpha ^2} \right)$$ to track the value of $$F_2$$ throughout such an input stream. The reason is that, like in the framework of [9], the robustness guarantees are achieved by making sure that every oblivious algorithm is “used only once”. This means that we cannot reuse the oblivious algorithms across the different segments, and hence, the space complexity of [16] scales linearly with the number of segments k.

To mitigate this issue, we propose a new construction that combines the framework of [16] with the framework of [10]. Intuitively, in our simplified example with the k segments, we want to reuse the oblivious algorithms across different segments, and protect their internal randomness with differential privacy to ensure robustness. However, there is an issue here. Recall that the framework of [10] crucially relied on the fact that all of the copies of the oblivious algorithm are “the same” in the sense that they compute the same function exactly. This allowed [10] to efficiently aggregate the estimates in a differentially private manner. However, in the framework of [16], the oblivious algorithms we maintain are fundamentally different from each other, tracking different functions. Specifically, every difference estimator is tracking the value of $$[{\mathcal {F}}(t)-{\mathcal {F}}(e)]$$ for a unique enabling time $$e<t$$ (where t denotes the current time). That is, every difference estimator necessarily has a different enabling time, and hence, they are not tracking the same function, and it is not clear how to aggregate their outcomes with differential privacy.

Toggle Difference Estimator (TDE). To overcome the above challenge, we present an extension to the notion of a difference estimator, which we call a Toggle Difference Estimator (see Definition 3.3). Informally, a toggle difference estimator is a difference estimator that allows us to modify its enabling time on the go. This means that a TDE can track, e.g., the value of $$[{\mathcal {F}}(t)-{\mathcal {F}}(e_1)]$$ for some (previously given) enabling time $$e_1$$, and then, at some later point in time, we can instruct the same TDE to track instead the value of $$[{\mathcal {F}}(t)-{\mathcal {F}}(e_2)]$$ for some other enabling time $$e_2$$. We show that this extra requirement from the difference estimator comes at a very low cost in terms of memory and runtime. Specifically, in Sect. 5 we present a generic (efficiency preserving) method for generating a TDE from a DE.
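The TDE interface can be illustrated with a toy, exact (and hence not sublinear-space) $$F_2$$ tracker. This sketch is ours and only conveys the semantics of toggling the enabling time to the current moment; the actual construction in Sect. 5 achieves this with sketch-sized memory:

```python
class ExactToggleDE:
    """Toy, non-sublinear TDE for F_2, illustrating only the interface:
    it tracks F2(t) - F2(e), where e is the most recent enabling time."""
    def __init__(self, n):
        self.freq = [0] * n       # exact frequency vector (for illustration)
        self.ref_f2 = 0           # F_2 value at the current enabling time

    def _f2(self):
        return sum(x * x for x in self.freq)

    def process(self, s, delta):
        self.freq[s] += delta

    def toggle(self):
        # move the enabling time e to "now"
        self.ref_f2 = self._f2()

    def estimate(self):
        return self._f2() - self.ref_f2

tde = ExactToggleDE(4)
tde.process(0, 2)
tde.process(1, 3)
print(tde.estimate())  # → 13, the difference since the initial enabling time
tde.toggle()           # re-enable: now track the difference from this point
tde.process(0, 1)
print(tde.estimate())  # → 5 (F2 went from 13 to 18)
```

The point of the abstraction is that the same object can serve many segments: instead of retiring an estimator after one use, we toggle its enabling time and reuse it.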

Let us return to our example with the k segments. Instead of using every oblivious algorithm only once, we reuse them across the different segments, where during any single segment all the TDE’s are instructed to track the appropriate differences that are needed for the current segment. This means that during every segment we have many copies of the “same” oblivious algorithm. More specifically, for every different level (as we explained above) we have many copies of an oblivious algorithm for that level, which is (currently) tracking the difference that we need. This allows our space complexity to scale with $$\sqrt{k}$$ instead of linearly with k, as in the framework of [16]. To summarize this discussion, our new notion of TDE allows us to gain both the space saving achieved by differential privacy (as in the framework of [10]) and the space saving achieved by tracking the target function via differences (as in the framework of [16]).

### Remark 1.6

The presentation given above (w.r.t. our example with the k segments) is oversimplified. Clearly, in general, we have no guarantees that an input (turnstile) stream can be partitioned into k such segments. This means that in the actual construction we need to calibrate our TDE’s across time segments in which the value of the target function is not monotone. See Sect. 4.1 for a more detailed overview of our construction and the additional modifications we had to introduce.

We are now ready to state our main result, extending the framework of Woodruff and Zhou [16] to the turnstile setting. As in the work of [16], in order to benefit from tracking differences, we need the property that tracking smaller differences is cheaper.

### Theorem 1.7

(informal version of Theorem B.1) Let $${\mathcal {F}}$$ be a function for which the following algorithms exist:

1. An $$\alpha$$-accurate oblivious streaming algorithm $${\textsf{E}}$$ with space complexity $$\textrm{Space}({\textsf{E}})$$.

2. An $$\alpha$$-accurate oblivious TDE streaming algorithm $${\textsf{E}}_{\textrm{TDE}}$$ that for every $$\gamma$$ can track differences of (relative) size at most $$\gamma$$ using space $$\gamma \cdot \textrm{Space}({\textsf{E}}_{\textrm{TDE}})$$.

Then there is an $$O(\alpha )$$-accurate adversarially-robust algorithm for turnstile streams with bounded flip number $$\lambda$$ with spaceFootnote 5 $${\tilde{O}}\left( \sqrt{\alpha \lambda }\cdot \left( \textrm{Space}({\textsf{E}}) + \textrm{Space}({\textsf{E}}_{\textrm{TDE}}) \right) \right)$$.

In contrast, under the same conditions, the framework of [16] requires space $${\tilde{O}}\left( \alpha \lambda \cdot \left( \textrm{Space}({\textsf{E}}) + \textrm{Space}({\textsf{E}}_{\textrm{TDE}}) \right) \right)$$. As we mentioned, we are not aware of non-trivial constructions for difference estimators that work in the turnstile setting. Hence, Theorem 1.7, as well as the results of [16], are not currently applicable (as is) to the turnstile setting. Nevertheless, in Sect. 6 we augment our framework using a new “monitoring technique” and show that it can be applied to the turnstile setting for estimating $$F_2$$ (the second moment of the stream). To this end, we introduce the following notion that allows us to control the number of times we need to reset our system (which happens when we cannot guarantee correctness of our difference estimators).

### Definition 1.8

(Twist number) The $$(\alpha ,m)$$-twist number of a stream $${\mathcal {S}}$$ w.r.t. a functionality $${\mathcal {F}}$$, denoted as $$\mu _{\alpha ,m}({\mathcal {S}})$$, is the maximal $$\mu \in [m]$$ such that $${\mathcal {S}}$$ can be partitioned into $$2\mu$$ disjoint segments $${\mathcal {S}}= {\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_{\mu -1} \circ {\mathcal {V}}_{\mu -1}$$ (where the $$\{{\mathcal {P}}_i\}_{i\in [\mu ]}$$ may be empty) s.t. for every $$i\in [\mu ]$$:

1. $${\mathcal {F}}({\mathcal {V}}_i) > \alpha \cdot {\mathcal {F}}({\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {V}}_{i-1} \circ {\mathcal {P}}_i)$$

2. $$|{\mathcal {F}}({\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_i \circ {\mathcal {V}}_i) - {\mathcal {F}}({\mathcal {P}}_0\circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_i)| \le \alpha \cdot {\mathcal {F}}({\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_i)$$

In words, a stream has twist number $$\mu$$ if there are $$\mu$$ disjoint segments $${\mathcal {V}}_0,\dots ,{\mathcal {V}}_{\mu -1}\subseteq {\mathcal {S}}$$ such that the value of the function on each of these segments is large (Condition 1), but still these segments do not change the value of the function on the entire stream by too much (Condition 2). Intuitively, the twist number bounds the number of regions in which a local view of the stream would suggest a large multiplicative change, but a global view would not.

### Example 1.9

For $$F_2$$ estimation in insertion-only streams, it holds that $$\mu =0$$ even though $$\lambda$$ can be large. This is the case because, in insertion-only streams, Conditions 1 and 2 from Definition 1.8 cannot hold simultaneously. Specifically, for a stream $${\mathcal {S}}$$ partitioned as $${\mathcal {S}}= {\mathcal {P}}_0\circ \dots \circ {\mathcal {P}}_i \circ {\mathcal {V}}_i$$, let p denote the frequency vector of $${\mathcal {P}}_0\circ \dots \circ {\mathcal {P}}_i$$ and v the frequency vector of $${\mathcal {V}}_i$$, and suppose that Condition 2 holds, i.e., $$\Vert p + v\Vert ^2-\Vert p\Vert ^2\le \alpha \cdot \Vert p\Vert ^2$$. Hence, in order to show that Condition 1 does not hold, it suffices to show that $$\Vert v\Vert ^2\le \Vert p + v\Vert ^2-\Vert p\Vert ^2$$, i.e., to show that $$\Vert v\Vert ^2+\Vert p\Vert ^2\le \Vert p + v\Vert ^2$$, i.e., to show that $$(v_1^2+p_1^2)+\dots +(v_n^2+p_n^2)\le (v_1+p_1)^2+\dots +(v_n+p_n)^2$$, which trivially holds whenever $$p_i,v_i\ge 0$$.
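The inequality at the heart of Example 1.9, and its failure under deletions, can be checked numerically (the concrete vectors below are our own illustration):

```python
def sq(x):
    return sum(t * t for t in x)

# Insertion-only: all coordinates of p and v are nonnegative, so the cross
# terms 2 * p_i * v_i are nonnegative and ||v||^2 + ||p||^2 <= ||p + v||^2,
# meaning Conditions 1 and 2 of Definition 1.8 cannot hold together.
p, v = [3, 0, 7], [1, 4, 2]
assert sq(v) + sq(p) <= sq([a + b for a, b in zip(p, v)])

# With deletions the argument breaks: a suffix can cancel the prefix,
# making ||v||^2 large while ||p + v||^2 collapses.
p, v = [10, 10], [-10, -10]
assert sq(v) + sq(p) > sq([a + b for a, b in zip(p, v)])
```

The second pair is exactly the kind of cancellation that makes the twist number nonzero in turnstile streams.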

In Sect. 6 we leverage this notion and present the following result for $$F_2$$ estimation (see Definition 3.4 for $$F_2$$).

### Theorem 1.10

($$F_2$$ Robust estimation, informal) There exists an adversarially robust $$F_2$$ estimation algorithm for turnstile streams of length m with a bounded $$(O(\alpha ), m)$$-flip number $$\lambda$$ and a bounded $$(O(\alpha ), m)$$-twist number $$\mu$$ that guarantees $$\alpha$$-accuracy (w.h.p.) using space complexity $${\tilde{O}}\left( \frac{\sqrt{\alpha \lambda +\mu }}{\alpha ^{2}}\right)$$.

This should be contrasted with the result of [10], who obtain space complexity $${\tilde{O}}\left( \frac{\sqrt{\lambda }}{\alpha ^2}\right)$$ for robust $$F_2$$ estimation in the turnstile setting. Hence, our new result is better whenever $$\mu \ll \lambda$$. See a comparison of the space complexity of all aforementioned frameworks in Table 1. We leave open the possibility of pushing down the space complexity to $$\approx \frac{\sqrt{\lambda }}{\alpha ^{1.5}}$$ or to $$\approx \frac{\sqrt{\lambda +\mu }}{\alpha ^{1.5}}$$. Informally, the bottleneck in our construction is that we need to “reset” our system whenever a “twist” event occurs, which translates to the cost of an additional oblivious strong tracker. See Sect. 2 for a detailed discussion of “twist” events in $$F_2$$ estimation over turnstile streams, and for an example of a turnstile stream with large flip number and small twist number in which our framework performs better than both [10, 16].

## 2 Discussion and Open Questions

In this section we discuss the role of our new notion, the twist number ($$\mu$$), in the space blowup of our framework’s extension. For that, we discuss the events this notion captures, illustrate such an event in a drawing (see Fig. 2 below), and explain how it affects the space complexity. We also elaborate on the regime where our framework outperforms the other existing frameworks, and finally pose an open question.

Twist number and violation events. The notion of twist-number aims to tackle the following issue: Our generic framework relies on the existence of a DE for the target functionality. By definition, a $$\gamma$$-DE activated on a stream prefix $${\mathcal {P}}$$ must maintain accuracy for every stream suffix $${\mathcal {S}}$$ that satisfies the condition:

$$|{\mathcal {F}}({\mathcal {P}}\circ {\mathcal {S}}) - {\mathcal {F}}({\mathcal {P}})| \le \gamma \cdot {\mathcal {F}}({\mathcal {P}});$$ (1)

however, we are not aware of constructions of non-trivial DE’s in the turnstile setting. To overcome this, we instantiate our framework with existing DE’s that operate only on insertion-only streams, and augment our framework with a “monitoring technique” that identifies “violations”, which are time points at which our underlying difference estimators might lose their utility guarantees (which may happen since they are not designed to operate outside of insertion-only streams). More specifically, in the context of $$F_2$$, existing constructions for a DE with parameter $$\gamma$$, which are activated on a stream prefix $${\mathcal {P}}$$ and provide estimates during a suffix $${\mathcal {S}}$$, operate under the following assumption:

$${\mathcal {F}}({\mathcal {S}}) \le \gamma \cdot {\mathcal {F}}({\mathcal {P}}).$$ (2)

In insertion-only streams, the DE requirement (Condition 1) implies its utility assumption (2). However, this implication does not hold in turnstile streams. We refer to events where assumption (2) fails but condition (1) still holds as violation events (see suffix violation events in Sect. C, Definition C.2). Essentially, these violations occur when the value of the suffix is large ($${\mathcal {F}}({\mathcal {S}})\ge \gamma \cdot {\mathcal {F}}({\mathcal {P}})$$ for some $$\gamma \in (0,1)$$), but the value of the entire stream does not change significantly ($${\mathcal {F}}({\mathcal {P}}\circ {\mathcal {S}})\approx {\mathcal {F}}({\mathcal {P}})$$). In such cases, the utility guarantee of the DE construction does not hold.
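A violation event is easy to exhibit concretely for $$F_2$$ (the numbers below are our own toy instance): flipping the sign of a single coordinate leaves the stream's $$F_2$$ value unchanged, while the suffix alone has large $$F_2$$.

```python
def f2(x):
    return sum(t * t for t in x)

n = 25
p = [10] * n          # frequency vector of the prefix P: F2(P) = 2500
v = [0] * n
v[0] = -20            # suffix S: the single update <0, -20> flips one sign
total = [a + b for a, b in zip(p, v)]

gamma = 0.1
# Condition (1) holds: the value of the whole stream did not change at all.
assert abs(f2(total) - f2(p)) <= gamma * f2(p)
# Assumption (2) fails: the suffix alone is large relative to the prefix.
assert f2(v) > gamma * f2(p)
```

This is exactly a “twist”: the frequency vector moves a lot while its $$L_2$$-norm stays put, which is the event the twist number counts.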

A violation event—an illustration. We give an illustrative example in Fig. 2. The drawing shows the value of $$F_2$$ over a prefix (black arrow) and two corresponding trajectories of suffixes (blue dashed line, red dashed line). For the illustration, assume that at the point in time following the prefix $${\mathcal {P}}$$, we output its estimation. We then initialize a DE to estimate the contribution of the suffix $${\mathcal {S}}$$ to the value of the function on $${\mathcal {P}}\circ {\mathcal {S}}$$. According to the DE construction, whenever the suffix norm becomes too large, its estimation is no longer valid. We thus mark with a green circle a possible bound on this DE. To mark the boundaries of the next required output modification, we draw two dashed orange circles at a multiplicative distance from the current value of $${\mathcal {F}}({\mathcal {P}})$$. These circles are referred to as the reporting boundaries. Whenever a trajectory crosses one of these reporting boundaries, we must update our output. The red trajectory crosses the reporting boundary after the DE guarantee no longer holds. In this scenario, we cannot rely on the DE estimation and must use an alternative estimator for the required output modification. On the other hand, the blue trajectory crosses the reporting boundary while still meeting the DE guarantee requirements, making its estimation valid.

The regime where the framework is effective. For $$F_2$$ estimation under turnstile streams that have at least one $$O(\alpha )$$-suffix violation (i.e., $$\mu _{\alpha ,m}>0$$), the framework of [16] is not applicable, leaving only the framework of [10] and ours as viable options. Our framework achieves better performance whenever $$\mu \ll \lambda$$ or $$\mu = O((1-\alpha ) \lambda )$$. As illustrated above, an $$O(\alpha )$$-suffix violation event occurs when the frequency vector “twists” without significantly changing its $$L_2$$-norm. Since we may reset our framework during these events, we achieve better performance than [10] for streams that exhibit fewer “twists” compared to magnitude changes (as captured by the flip number) in their frequency vector. However, for streams that exhibit a larger number of “twists”, our framework has a space blowup similar to that of [10], because it essentially loses the optimization of the DE’s and relies only on strong-tracker (ST) estimators.

Open question. Prior work has shown a strong connection between the flip number of the target function, i.e., the number of output modifications, and the resulting space blowup for estimating the target function under adversarial inputs compared to oblivious inputs. In this work, we introduce an additional property of the input stream, the twist number, which also governs the space blowup. As discussed earlier, this new notion captures events corresponding to scenarios where insertion-only DE constructions fail. Designing DE’s for turnstile inputs would remove the dependence on the twist number in the resulting space blowup.

### Question 2.1

Is it possible to design an algorithm for robust $$F_2$$ estimation in the turnstile setting that achieves a space blowup of $$o(\sqrt{\alpha \lambda + \mu })$$?

That is, it would be preferable to have a framework whose blowup is independent of the twist number of the stream. In the context of $$F_2$$ estimation, constructing an oblivious DE for turnstile streams with space requirements similar to those of existing insertion-only DE’s would result in a space blowup of $${\tilde{O}}(\sqrt{\alpha \lambda })$$ using the proposed framework.

### 2.1 Other Related Works

Related to our work is the line of work on adaptive data analysis, aimed at designing tools for guaranteeing statistical validity in cases where the data is being accessed adaptively [17,18,19,20,21,22,23,24,25,26]. Recall that the difficulty in the adversarial streaming model arises from potential dependencies between the inputs of the algorithm and its internal randomness. As we mentioned, our construction builds on a technique introduced by [10] for using differential privacy to protect not the input data, but rather the internal randomness of the algorithm. Following [10], this technique was also used by [27, 28] for designing robust algorithms in other settings.

## 3 Preliminaries

In this work we consider input streams which are represented as a sequence of updates, where every update is a tuple containing an element (from a finite domain) and its (integer) weight. Formally,

### Definition 3.1

(Turnstile stream) A stream of length m over a domain [n] consists of a sequence of updates $$\langle s_0, \Delta _0 \rangle ,\dots , \langle s_{m-1}, \Delta _{m-1} \rangle$$ where $$s_i\in [n]$$ and $$\Delta _i \in {\mathbb {Z}}$$. Given a stream $${\mathcal {S}}\in ([n]\times {\mathbb {Z}})^m$$ and integers $$0\le t_1\le t_2\le m-1$$, we write $${\mathcal {S}}^{t_1}_{t_2} = (\langle s_{t_1}, \Delta _{t_1} \rangle ,\dots , \langle s_{t_2}, \Delta _{t_2} \rangle )$$ to denote the sequence of updates from time $$t_1$$ till $$t_2$$. We also use the abbreviation $${\mathcal {S}}_{t}={\mathcal {S}}^{0}_{t}$$ to denote the prefix of the stream up to and including time t.

Let $${\mathcal {F}}:([n]\times {\mathbb {Z}})^{*}\rightarrow {\mathbb {R}}$$ be a function (for example $${\mathcal {F}}$$ might count the number of distinct elements in the stream). At every time step t, after obtaining the next element in the stream $$\langle s_t, \Delta _t \rangle$$, our goal is to output an approximation for $${\mathcal {F}}({\mathcal {S}}_t)$$. To simplify presentation we also denote $${\mathcal {F}}(t)={\mathcal {F}}({\mathcal {S}}_t)$$ for $$t\in [m]$$. We assume throughout the paper that $$\log (m)=\Theta (\log (n))$$ and that $${\mathcal {F}}$$ is bounded polynomially in n.

In Sect. 1, for the purpose of presentation, it was useful to refer to the flip number of a function. Our results are stated w.r.t. a more refined quantity: the flip number of a stream.

### Definition 3.2

(Flip number of a stream [9]) Given a function $${\mathcal {F}}$$ and a stream $${\mathcal {S}}\in {\mathcal {X}}^m$$ of length m, denote the $${\mathcal {F}}$$ values over $${\mathcal {S}}$$ as $$(y_1, \dots , y_m)$$. The $$(\alpha ,m)$$-flip number of $${\mathcal {S}}$$, denoted $$\lambda _{\alpha }({\mathcal {S}})$$, is the maximal number $$k\in [m]$$ such that there exist indices $$0\le i_1< \dots < i_k\le m$$ where for every $$j\in \{2,\dots ,k\}$$ it holds that $$y_{i_{j-1}} \notin (1\pm \alpha )\cdot y_{i_{j}}$$.
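As a concrete illustration, the flip number of a stream can be computed by a greedy scan over the sequence of $${\mathcal {F}}$$ values. This is a sketch under the assumption that the values are nonnegative (as is the case for frequency moments) and that the greedy left-to-right chain is maximal:

```python
def flip_number(ys, alpha):
    """Greedy sketch of the (alpha, m)-flip number of a stream, given the
    sequence ys of (nonnegative) F values y_1, ..., y_m.  We keep an anchor
    value and advance it (counting one more flip) whenever the anchor falls
    outside the (1 +/- alpha) window around the current value."""
    if not ys:
        return 0
    count = 1          # the first index i_1 always contributes
    anchor = ys[0]
    for y in ys[1:]:
        if not ((1 - alpha) * y <= anchor <= (1 + alpha) * y):
            count += 1
            anchor = y
    return count
```

For example, a stream whose value repeatedly doubles has a flip number linear in the number of doublings, while a nearly constant stream has flip number 1.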

Toggle Difference Estimator. For the purpose of our framework, we present an extension of the notion of a difference estimator (DE) from [16], which we call a toggle difference estimator (TDE). A difference estimator for a function $${\mathcal {F}}$$ is an oblivious streaming algorithm, defined informally as follows. The difference estimator is initiated at time $$t = 1$$ and has a dynamically defined enabling time $$1\le e\le m$$. Once that enabling time is set, the difference estimator outputs an estimation of $$\left( {\mathcal {F}}({\mathcal {S}}_t) - {\mathcal {F}}({\mathcal {S}}_e)\right)$$ for all times $$t>e$$ (provided some conditions on that difference hold). That is, once the difference estimator’s enabling time is set, it cannot be changed, and so, if an estimation is needed for some other enabling time $$e^{\prime } \ne e$$, then an additional instance of a difference estimator is needed. Our framework requires such an estimator to be able to provide estimations for multiple enabling times, as long as the estimation periods do not overlap. This is captured in the following definition.

### Definition 3.3

(Toggle Difference Estimator) Let $${\mathcal {F}}:([n]\times {\mathbb {Z}})^{*}\rightarrow {\mathbb {R}}$$ be a function, and let $$m,p\in {\mathbb {N}}$$ and $$\gamma ,\alpha , \delta \in (0,1)$$ be parameters. Let $${\textsf{E}}$$ be an algorithm with the following syntax. In every time step $$t \in [m]$$, algorithm $${\textsf{E}}$$ obtains an update $$\langle s_t, \Delta _t , b_t \rangle \in ([n]\times {\mathbb {Z}}\times \{0,1\})$$ and outputs a number $$z_t$$. Here $$\langle s_t, \Delta _t \rangle$$ denotes the current update, and $$b_t$$ is an indicator for when the current time t should be considered as the new enabling time. We consider input streams $${\mathcal {S}}\in ([n]\times {\mathbb {Z}}\times \{0,1\})^m$$ such that there are at most p time steps t for which $$b_t=1$$, and denote these time steps as $$1\le e^1< e^2< \dots< e^{p}<m$$. Also, for a time step $$t\in [m]$$ we denote $$e(t)=\max \{ e^i : e^i\le t \}$$.

Algorithm $${\textsf{E}}$$ is a $$(\gamma ,\alpha , p, \delta )$$-toggle difference estimator for $${\mathcal {F}}$$ if the following holds for every such input stream $${\mathcal {S}}$$. With probability at least $$1-\delta$$, for every $$t\in [m]$$ such that

$$\left| {\mathcal {F}}({\mathcal {S}}_{t}) - {\mathcal {F}}({\mathcal {S}}_{e(t)}) \right| \le \gamma \cdot {\mathcal {F}}({\mathcal {S}}_{e(t)}), \qquad (3)$$

the algorithm outputs a value $$z_t$$ such that $$z_t\in \left( {\mathcal {F}}({\mathcal {S}}_{t}) - {\mathcal {F}}({\mathcal {S}}_{e(t)})\right) \pm \alpha \cdot {\mathcal {F}}({\mathcal {S}}_{e(t)})$$.

This definition generalizes the notion of a difference estimator (DE) from [16], in which $$p=1$$. In Sect. 5 we show that this extension comes at a very low cost in terms of space complexity. Note that at times t for which the requirement specified w.r.t. $$\gamma$$ does not hold, the TDE algorithm provides no accuracy guarantee.

Frequency moments of a stream. Useful statistics of a stream are its frequency moments. These statistics are referred to in the introduction and in our application. We give here the formal definition.

### Definition 3.4

(Frequency moments) Let $${\mathcal {S}}= \langle s_0, \Delta _0 \rangle ,\dots , \langle s_{m-1}, \Delta _{m-1} \rangle \in ([n]\times {\mathbb {Z}})^{m}$$ be a stream of length m. Its corresponding frequency vector $$v_{{\mathcal {S}}} \in {\mathbb {R}}^{n}$$ is defined as $$v_{{\mathcal {S}}}[i] = \sum _{j\in [m]:s_j=i} \Delta _j$$ for all $$i\in [n]$$. For $$p\ge 0$$, the p-th frequency moment of $${\mathcal {S}}$$ is defined as $$\Vert v_{{\mathcal {S}}} \Vert ^p_p$$, where $$\Vert \cdot \Vert _p$$ is the $$\ell _p$$ norm.

The p-th frequency moment is also denoted $$F_p$$. In the introduction we took as an example for $${\mathcal {F}}$$, the tracked function of a stream, the number of its distinct elements, which corresponds to $$F_0$$. Our presented application is for the function $$F_2$$.
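In code, the frequency vector and moments of a small turnstile stream can be computed directly. This is a toy illustration only; streaming algorithms of course avoid materializing $$v_{{\mathcal {S}}}$$ explicitly:

```python
def frequency_vector(stream, n):
    """Frequency vector of a turnstile stream given as (item, weight) updates."""
    v = [0] * n
    for s, delta in stream:
        v[s] += delta
    return v

def frequency_moment(v, p):
    """p-th frequency moment F_p = ||v||_p^p; F_0 counts nonzero coordinates."""
    if p == 0:
        return sum(1 for x in v if x != 0)
    return sum(abs(x) ** p for x in v)
```

For instance, the stream $$\langle 0,+1\rangle ,\langle 1,+2\rangle ,\langle 0,-1\rangle ,\langle 2,+3\rangle$$ over $$[3]$$ has frequency vector $$(0,2,3)$$, so $$F_0=2$$ and $$F_2=13$$.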

### 3.1 Preliminaries from Differential Privacy

Differential privacy [15] is a mathematical definition for privacy that aims to enable statistical analyses of databases while providing strong guarantees that individual-level information does not leak. Consider an algorithm $${\mathcal {A}}$$ that operates on a database in which every row represents the data of one individual. Algorithm $${\mathcal {A}}$$ is said to be differentially private if its outcome distribution is insensitive to arbitrary changes in the data of any single individual. Intuitively, this means that algorithm $${\mathcal {A}}$$ leaks very little information about the data of any single individual, because its outcome would have been distributed roughly the same even without the data of that individual. Formally,

### Definition 3.5

([15]) Let $${\mathcal {A}}$$ be a randomized algorithm that operates on databases. Algorithm $${\mathcal {A}}$$ is $$(\varepsilon ,\delta )$$-differentially private if for any two databases $$S,S^{\prime }$$ that differ on one row, and any event T, we have

$$\Pr \left[ {\mathcal {A}}(S)\in T\right] \le e^{\varepsilon }\cdot \Pr \left[ {\mathcal {A}}(S^{\prime })\in T\right] + \delta .$$

See Appendix A for additional preliminaries on differential privacy.
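For intuition, the canonical way to satisfy Definition 3.5 for a counting query is the Laplace mechanism: add Laplace noise with scale equal to the query's sensitivity divided by $$\varepsilon$$. This is a standard textbook sketch, not part of our construction:

```python
import math
import random

def laplace_noise(rng, scale):
    # Sample Lap(0, scale) via the inverse-CDF transform.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon, rng):
    # A counting query has sensitivity 1 (one row changes the count by at
    # most 1), so adding Lap(1/epsilon) noise is (epsilon, 0)-differentially
    # private.
    return true_count + laplace_noise(rng, 1.0 / epsilon)
```

The noise has standard deviation $$\sqrt{2}/\varepsilon$$, so the answer remains accurate while masking the contribution of any single row.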

## 4 A Framework for Adversarial Streaming

Our transformation from an oblivious streaming algorithm $${{\textsf{E}}}_{\textrm{ST}}$$ for a function $${\mathcal {F}}$$ into an adversarially robust algorithm requires the following two conditions.

1. 1.

The existence of a toggle difference estimator $${{\textsf{E}}}_{\textrm{TDE}}$$ for $${\mathcal {F}}$$, see Definition 3.3.

2. 2.

Every single update can change the value of $${\mathcal {F}}$$ by at most a factor of $$(1\pm \alpha ^{\prime })$$ for some $$\alpha ^{\prime }=O(\alpha )$$. Formally, throughout the analysis we assume that for every stream $${\mathcal {S}}$$ and every update $$u=\langle s, \Delta \rangle$$ it holds that

$${\mathcal {F}}({\mathcal {S}}\circ u) \in (1\pm \alpha ^{\prime })\cdot {\mathcal {F}}({\mathcal {S}}).$$

### Remark 4.1

These conditions are identical to the conditions required by [16]. Formally, they require only a difference estimator instead of a toggle difference estimator, but we show that these two objects are equivalent. See Sect. 5.

### Remark 4.2

Condition 2 can be met for many functions of interest, by applying our framework on portions of the stream during which the value of the function is large enough. For example, when estimating $$F_2$$ with update weights $$\pm 1$$, whenever the value of the function is at least $$\Omega (1/\alpha )$$, a single update can increase the value of the function by at most a $$(1+\alpha )$$ factor. Estimating $$F_2$$ whenever the value of the function is smaller than $$O(1/\alpha )$$ can be done using an existing (oblivious) streaming algorithm with error $$\rho =O(\alpha )$$. To see that we can use an oblivious algorithm in this setting, note that the additive error of the oblivious streaming algorithm is at most $$O(\frac{\rho }{\alpha })\ll 1$$. Hence, by rounding the answers of the oblivious algorithm we ensure that its answers are exactly accurate (rather than approximate). As the oblivious algorithm returns exact answers in this setting, it must also be adversarially robust.
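The rounding argument in Remark 4.2 is a short calculation. With hypothetical numbers: say $$\alpha = 0.1$$, the small regime is $$F_2 \le 100$$, and the oblivious estimator has relative error $$\rho = 10^{-3}$$; then the additive error is at most $$0.1 < 1/2$$, and since $$F_2$$ is an integer, rounding recovers it exactly:

```python
def exact_from_oblivious(estimate):
    # F_2 is an integer and the oblivious estimate has additive error
    # below 1/2 in the small regime, so rounding recovers the exact value.
    return round(estimate)
```

For example, an estimate of $$37\cdot (1\pm 10^{-3})$$ rounds back to exactly 37.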

### 4.1 Construction Overview

Our construction builds on the constructions of [10, 16]. At a high level, the structure of our construction is similar to that of [16], but our robustness guarantees are achieved using differential privacy, similarly to [10], and using our new concept of TDE.

Our algorithm can be thought of as operating in phases. In the beginning of every phase, we aggregate the estimates given by our strong trackers with differential privacy, and “freeze” this aggregated estimate as the base value for the rest of the phase. Inside every phase, we privately aggregate (and “freeze”) estimates given by our TDE’s. More specifically, throughout the execution we aggregate TDE’s of different types/levels (we refer to the level that is currently being aggregated as the active level). At any point in time we estimate the (current) value of the target function by summing specific “frozen” differences together with the base value.
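The "aggregate with differential privacy" step can be sketched with the standard exponential-mechanism median over a finite candidate grid. This is a hedged illustration of private aggregation in general; the actual aggregation used by Algorithm RobustDE is specified in Appendix B:

```python
import math
import random

def private_median(values, candidates, epsilon, rng):
    """Exponential-mechanism median.  The rank-based utility has
    sensitivity 1, so sampling candidate c with probability proportional
    to exp(epsilon * u(c) / 2) is (epsilon, 0)-differentially private."""
    def utility(c):
        below = sum(1 for v in values if v < c)
        above = sum(1 for v in values if v > c)
        return -abs(below - above)   # 0 exactly at a median

    weights = [math.exp(0.5 * epsilon * utility(c)) for c in candidates]
    r = rng.random() * sum(weights)
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]
```

A useful feature for our setting is robustness: a minority of wildly wrong estimator outputs barely shifts the aggregated value.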

We remark that, in addition to introducing the notion of TDE’s, we had to incorporate several modifications to the framework of [16] in order to make it compatible with our TDE’s and with differential privacy. In particular, [16] manages phases by placing fixed thresholds (powers of 2) on the value of the target function; starting a new phase whenever the value of the target function crosses the next power of 2. If, at some point in time, the value of the target function drops below the power of 2 that started this phase, then this phase ends, and they go back to the previous phase. This is possible in their framework because the DE’s of the previous phase still exist in memory and are ready to be used. In our framework, on the other hand, we need to share all of the TDE’s across the different phases, and we cannot go back to “TDE’s of the previous phase” because these TDE’s are now tracking other differences. We overcome this issue by modifying the way in which differences are combined inside each phase.

In Algorithm 1 we present a simplified version of our main construction, including inline comments to improve readability. The complete construction is given in Algorithm RobustDE.

### 4.2 Analysis Overview

At a high level, the analysis can be partitioned into five components (with one component being significantly more complex than the others). We now elaborate on each of these components. The complete analysis is given in Appendix B.

#### 4.2.1 First Component: Privacy Analysis

In Sect. B.1 we show that our construction satisfies differential privacy w.r.t. the collection of random strings on which the oblivious algorithms operate. Recall that throughout the execution we aggregate (with differential privacy) the outcome of our estimators from the different levels. Thus, in order to show that the whole construction satisfies privacy (using composition theorems) we need to bound the maximal number of times we aggregate the estimates from the different levels. However, we can only bound this number under the assumption that the framework is accurate (in the adaptive setting), and for that we need to rely on the privacy properties of the framework. So there is a bit of circularity here. To simplify the analysis, we add to the algorithm hardcoded caps on the maximal number of times we can aggregate estimates at the different levels. This makes the privacy analysis straightforward. However, we will later need to show that this hardcoded capping “never” happens, as otherwise the algorithm fails. These hardcoded caps are specified by the parameters $$P_j$$ (both in the simplified algorithm and in the complete construction), using which we make sure that the estimators at level j are never aggregated (with differential privacy) more than $$P_j$$ times.

#### 4.2.2 Second Component: Conditional Accuracy

In Sect. B.2 we show that if the following two conditions hold, then the framework is accurate:

Condition (1)::

At any time step throughout the execution, at least 80% of the estimators in every level are accurate (w.r.t. the differences that they are estimating).

Condition (2)::

The hardcoded capping never happens.

This is the main technical part of our analysis; here we provide an oversimplified overview, hiding many of the technicalities. We show, via a sequence of lemmas that hold (w.h.p.) whenever Conditions (1) and (2) hold, that the framework is accurate. We now elaborate on some of these lemmas. Recall that throughout the execution we “freeze” aggregated estimates given by the different levels. The following lemma shows that these “frozen” aggregations are accurate (at the moments at which we “freeze” them). This lemma follows almost immediately from Condition (1): if the vast majority of our estimators are accurate, then so is their private aggregation.

### Lemma 4.3

(informal version of Lemma B.4) In every time step $$t\in [m]$$ in which we compute a value $${\textsf{Z}}_j$$ (in Step 7a of Algorithm RobustDE, or Step 4a of the simplified algorithm) it holds that $${\textsf{Z}}_j$$ is accurate. Informally, if the current level j is that of the strong trackers, then $$|{\textsf{Z}}_j - {\mathcal {F}}(t)| < \alpha \cdot {\mathcal {F}}(t)$$, and otherwise $$|{\textsf{Z}}_j - ({\mathcal {F}}(t) - {\mathcal {F}}(e_j))| < \alpha \cdot {\mathcal {F}}(e_j)$$, where $$e_j$$ is the last enabling time of level j.

During every time step $$t\in [m]$$, we test whether the previous output is still accurate (and modify it if it is not). This test is done by comparing the previous output with (many) suggestions we get for the current value of the target function. These suggestions are obtained by summing the outputs of the estimators at the currently active level j together with a (partial) sum of the previously frozen estimates (denoted as $${\textsf{Z}}$$). This is done in Step 7 of Algorithm RobustDE, or in Step 4 of the simplified algorithm. The following lemma, which we prove using Lemma 4.3, states that the majority of these suggestions are accurate (and hence our test is valid).

### Lemma 4.4

(informal version of Lemma B.7) Fix a time step $$t\in [m]$$, and let j denote the level of active estimators. Then, for at least $$80\%$$ of the estimators in level j, summing their output z with $${\textsf{Z}}$$ is an accurate estimation for the current value of the target function, i.e., $$\left| {\mathcal {F}}(t) - ({\textsf{Z}} + z) \right| \le \alpha \cdot {\mathcal {F}}(t).$$

So, in every iteration we test whether our previous output is still accurate, and our test is valid. Furthermore, when the previous output is not accurate, we modify it to be $$({\textsf{Z}} + {\textsf{Z}}_j)$$, where $${\textsf{Z}}_j$$ is the new aggregation (the new “freeze”) of the estimators at level j. So this modified output is accurate (assuming that the hardcoded capping did not happen, i.e., Condition (2), as otherwise the output is not modified). We hence get the following lemma.

### Lemma 4.5

(informal version of Lemma B.9) In every time step $$t\in [m]$$, denoting the output of the framework at time t as $$z_t$$, we have

$$\left| z_t - {\mathcal {F}}(t)\right| \le \alpha \cdot {\mathcal {F}}(t).$$

That is, the above lemma shows that our output is “always” accurate. Recall, however, that this holds only assuming that Conditions (1) and (2) hold.

#### 4.2.3 Third Component: Calibrating to Avoid Capping

In Sect. B.3 we derive a high probability bound on the maximal number of times we will aggregate estimates at the different levels. In other words, we show that, with the right setting of parameters, we can make sure that Condition (2) holds. The analysis of this component still assumes that Condition (1) holds.

We first show that between every two consecutive times in which we modify our output, the value of the target function must change noticeably. Formally,

### Lemma 4.6

(Informal version of Lemma B.12) Let $$t_1< t_2\in [m]$$ be consecutive times in which the output is modified (i.e., the output is modified in each of these two iterations, and is not modified between them). Then, $$|{\mathcal {F}}(t_2) - {\mathcal {F}}(t_1)| =\Omega \left( \alpha \cdot {\mathcal {F}}(t_1) \right)$$.

We leverage this lemma in order to show that there cannot be too many time steps during which we modify our output. We then partition these time steps and “charge” different levels j for different times during which the output is modified. This allows us to prove a probabilistic bound on the maximal number of times we aggregate the estimates from the different levels (each level has a different bound). See Lemma B.14 for the formal details.

#### 4.2.4 Fourth Component: The Framework is Robust

In Sect. B.4 we prove that Condition (1) holds (w.h.p.). That is, we show that at any time step throughout the execution, at least 80% of the estimators in every level are accurate.

This includes two parts. First, in Lemma B.16, we show that throughout the execution, the condition required by our TDE’s holds (specifically, see Eq. (3) in Definition 3.3). This means that, had the stream been fixed in advance, then (w.h.p.) all of the estimators would be accurate throughout the execution. In other words, this shows that if there were no adversary then (a stronger variant of) Condition (1) holds.

Second, in Lemma B.17 we leverage the generalization properties of differential privacy to show that Condition (1) must also hold in the adversarial setting. This lemma is similar to the analysis of [10].

#### 4.2.5 Fifth Component: Calculating the Space Complexity

In the final part of the analysis, in Sect. B.5, we calculate the total space needed by the framework by accounting for the number of estimators in each level (which is a function of the high probability bound we derived on the number of aggregations done in each level), and the space they require. We refer the reader to Appendix B for the formal analysis.

## 5 Toggle Difference Estimator from a Difference Estimator

We present a simple method that transforms any difference estimator into a toggle difference estimator. The method works as follows. Let $$\textrm{DE}$$ be a difference estimator (given as a subroutine). We construct a $$\textrm{TDE}$$ that instantiates two copies of the given difference estimator: $$\textrm{DE}_{\textrm{enable}}$$ and $$\textrm{DE}_{\textrm{fresh}}$$. It passes its parameters, apart from the enabling times, verbatim to both copies. As $$\textrm{DE}$$ is set to output estimations only after receiving an (online) enabling time e, the $$\textrm{TDE}$$ never enables the copy $$\textrm{DE}_{\textrm{fresh}}$$. Instead, $$\textrm{DE}_{\textrm{fresh}}$$ serves as a fresh copy that has received the needed parameters and the stream $${\mathcal {S}}$$, and is therefore always ready to be enabled. Whenever a time t is equal to some enabling time (i.e., $$t=e^i$$ for some $$i\in [p]$$), the $$\textrm{TDE}$$ copies the state of $$\textrm{DE}_{\textrm{fresh}}$$ to $$\textrm{DE}_{\textrm{enable}}$$ (running over the same space), and then enables $$\textrm{DE}_{\textrm{enable}}$$ to output estimations.
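The transformation can be sketched as follows. Here ExactDE is a hypothetical toy stand-in for a difference estimator that tracks $${\mathcal {F}}$$ exactly (with $${\mathcal {F}}$$ taken, for simplicity, to be the sum of the update weights); a real instantiation would plug in a space-efficient DE, and the deepcopy stands in for "copying the state, running over the same space":

```python
import copy

class ExactDE:
    """Toy difference estimator: tracks F exactly, where F is the sum of
    update weights.  After enable(), estimate() returns F(S_t) - F(S_e)."""
    def __init__(self):
        self.value = 0
        self.base = None
    def process(self, s, delta):
        self.value += delta
    def enable(self):
        self.base = self.value          # fix the enabling time e
    def estimate(self):
        return self.value - self.base   # F(S_t) - F(S_e)

class ToggleDE:
    """TDE from two DE copies: a never-enabled fresh copy, and an enabled
    copy that is overwritten with the fresh state at each enabling time."""
    def __init__(self, de_factory):
        self.fresh = de_factory()   # DE_fresh: always ready to be enabled
        self.active = None          # DE_enable
    def process(self, s, delta, b):
        self.fresh.process(s, delta)
        if b:  # time t is a new enabling time: clone fresh state, then enable
            self.active = copy.deepcopy(self.fresh)
            self.active.enable()
        elif self.active is not None:
            self.active.process(s, delta)
        return self.active.estimate() if self.active is not None else None
```

Feeding updates with occasional $$b_t=1$$ flags shows the estimate resetting to 0 at each enabling time and then tracking the difference from that point.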

### Corollary 5.1

For any function $${\mathcal {F}}$$, if there exists a $$(\gamma ,\alpha ,\delta )$$-difference estimator for $${\mathcal {F}}$$ with space $$S_{\textrm{DE}}(\gamma ,\alpha ,\delta ,n,m)$$, then there exists a $$(\gamma ,\alpha , p, \delta )$$-toggle difference estimator for $${\mathcal {F}}$$ with space $$S_{\textrm{TDE}}(\gamma ,\alpha ,\delta ,p,n,m) = 2\cdot S_{\textrm{DE}}(\gamma ,\alpha ,\delta /p,n,m)$$.

Note that for a $$\textrm{DE}$$ whose space dependence on the failure parameter $$\delta$$ is logarithmic, the above construction yields a $$\textrm{TDE}$$ with at most a logarithmic blowup in space, resulting from the p enabling times.

## 6 Applications

Our framework is applicable to functionalities that admit a strong tracker and a difference estimator. As [16] showed, difference estimators exist for many functionalities of interest in the insertion-only model, including estimating frequency moments of a stream, estimating the number of distinct elements in a stream, identifying heavy hitters in a stream, and estimating entropy. However, as we mentioned, we are not aware of non-trivial DE constructions in the turnstile model. In more detail, [16] presented DEs for the turnstile setting, but these DEs require additional assumptions and do not exactly fit our framework (nor the framework of [16]).

To overcome this challenge we introduce a new monitoring technique which we use as a wrapper around our framework. This wrapper allows us to check whether the additional assumptions required by the DE hold, and reset our system when they do not. As a concrete application, we present the resulting bounds for $$F_2$$ estimation.

### Definition 6.1

(Frequency vector) The frequency vector of a stream $$S = (\langle s_1, \Delta _1 \rangle ,\dots , \langle s_m, \Delta _m \rangle )\in ([n]{\times }\{\pm 1\})^{m}$$ is the vector $$u\in {\mathbb {Z}}^{n}$$ whose ith coordinate is $$u[i] = \sum _{j\in [m], s_j = i}{\Delta _j}.$$ We write $$u^{(t)}$$ to denote the frequency vector of the stream $$S_t$$, i.e., restricted to the first t updates. Given two time points $$t_1\le t_2\in [m]$$ we write $$u^{(t_1,t_2)}$$ to denote the frequency vector of the stream $$S_{t_1}^{t_2}$$, i.e., restricted to the updates between time $$t_1$$ and $$t_2$$.

In this section we focus on estimating $$F_2$$, the second moment of the frequency vector. That is, at every time step t, after obtaining the next update $$\langle s_t, \Delta _t \rangle \in ([n]{\times }\{\pm 1\})$$, we want to output an estimation for

$$F_2(t) = \Vert u^{(t)} \Vert _2^2 = \sum _{i\in [n]} \left( u^{(t)}[i] \right) ^2.$$
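For concreteness, a minimal oblivious sketch for $$F_2$$ in the turnstile model is the classical AMS sketch of [1]: maintain k counters, each a random $$\pm 1$$ signed sum of the updates, and average their squares. The toy below samples independent signs lazily for readability; the actual construction uses 4-wise independent hash families to get the same guarantee in small space:

```python
import random

class AMSF2:
    """Toy AMS sketch: k signed counters; F_2 is estimated as the mean of
    their squares (each squared counter has expectation F_2)."""
    def __init__(self, k, seed=0):
        self.k = k
        self.rng = random.Random(seed)
        self.counters = [0] * k
        self.signs = {}   # lazily sampled +/-1 sign per (counter, item)
    def _sign(self, j, item):
        key = (j, item)
        if key not in self.signs:
            self.signs[key] = self.rng.choice((-1, 1))
        return self.signs[key]
    def update(self, item, delta):
        for j in range(self.k):
            self.counters[j] += self._sign(j, item) * delta
    def estimate(self):
        return sum(c * c for c in self.counters) / self.k
```

On a stream touching a single element the estimate is exact (each counter squares to the same value); on general streams averaging over k counters drives the relative error down as $$O(1/\sqrt{k})$$.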

Woodruff and Zhou [16] presented a $$(\gamma , \alpha , \delta )$$-difference estimator for $$F_2$$ that works in the turnstile model, under the additional assumption that for any time point t and enabling time $$e\le t$$ it holds that

$$\Vert u^{(e,t)} \Vert _2^2 \le \gamma \cdot \Vert u^{(e)} \Vert _2^2. \qquad (4)$$

In general, we cannot guarantee that this condition holds in a turnstile stream (see the discussion in Sect. 2). To bridge this gap, we introduce the notion of twist number (see Definition 1.8) in order to control the number of times at which this condition does not hold (when this condition does not hold we say that a violation has occurred). Armed with this notion, our approach is to run our framework (algorithm RobustDE) alongside a validation algorithm (algorithm Guardian) that identifies time steps at which algorithm RobustDE loses accuracy, meaning that a violation has occurred. We then restart algorithm RobustDE in order to maintain accuracy. As we show, our notion of twist number allows us to bound the total number of possible violations, and hence the number of possible resets. This in turn allows us to bound the necessary space for our complete construction. The details are given in Appendix C; here we only state the result.
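The reset wrapper can be sketched generically as follows. Here make_estimator and is_violation are hypothetical stand-ins for Algorithm RobustDE and for Guardian's violation test; the point is only the control flow of restarting the inner algorithm when a violation is flagged:

```python
class GuardedEstimator:
    """Restart the inner streaming algorithm whenever the validator
    flags a violation, replaying the current update on the fresh copy."""
    def __init__(self, make_estimator, is_violation):
        self.make_estimator = make_estimator
        self.is_violation = is_violation
        self.inner = make_estimator()
        self.resets = 0
    def process(self, update):
        out = self.inner.process(update)
        if self.is_violation(update, out):
            self.inner = self.make_estimator()   # reset the inner algorithm
            self.resets += 1
            out = self.inner.process(update)
        return out
```

Since the twist number bounds the total number of violations, it also bounds the number of resets, which is what allows the space accounting in Appendix C to go through.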

### Theorem 6.2

There exists an adversarially robust $$F_2$$ estimation algorithm for turnstile streams of length m with bounded $$(O(\alpha ), m)$$-flip number $$\lambda$$ and $$(O(\alpha ), m)$$-twist number $$\mu$$, that guarantees $$\alpha$$-accuracy with probability at least $$1-1/m$$ at all times $$t\in [m]$$, using space complexity

$$\tilde{{\mathcal {O}}}\left( \frac{\sqrt{\alpha \lambda + \mu }}{\alpha ^{2}}\right) .$$

As we mentioned, this should be contrasted with the result of [10], who obtain space complexity $$\tilde{{\mathcal {O}}}\left( \frac{\sqrt{\lambda }}{\alpha ^2}\right)$$ for robust $$F_2$$ estimation in the turnstile setting. Hence, our new result is better whenever $$\mu \ll \lambda$$.