Abstract
Classical streaming algorithms operate under the (not always reasonable) assumption that the input stream is fixed in advance. Recently, there has been growing interest in designing robust streaming algorithms that provide provable guarantees even when the input stream is chosen adaptively as the execution progresses. We propose a new framework for robust streaming that combines techniques from two recently suggested frameworks by Hassidim et al. (NeurIPS 2020) and by Woodruff and Zhou (FOCS 2021). These two frameworks rely on very different ideas, each with its own strengths and weaknesses. We combine them into a single hybrid framework that obtains the “best of both worlds”, thereby solving a question left open by Woodruff and Zhou.
1 Introduction
Streaming algorithms are algorithms for processing large data streams while using only a limited amount of memory, significantly smaller than what is needed to store the entire data stream. Data streams occur in many applications including computer networking, databases, and natural language processing. The seminal work of Alon, Matias, and Szegedy [1] initiated an extensive theoretical study and further applications of streaming algorithms.
In this work we focus on streaming algorithms that aim to maintain, at any point in time, an approximation for the value of some (predefined) real-valued function of the input stream. Such streaming algorithms are sometimes referred to as strong trackers. For example, this predefined function might count the number of distinct elements in the stream. Formally,
Definition 1.1
Let \({\mathcal {A}}\) be an algorithm that, for m rounds, obtains an element from a domain X and outputs a real number. Algorithm \({\mathcal {A}}\) is said to be a strong tracker for a function \({\mathcal {F}}:X^*\rightarrow {\mathbb {R}}\) with accuracy \(\alpha \), failure probability \(\delta \), and stream length m if the following holds for every sequence \(\textbf{u}=(u_1,\dots ,u_m)\in X^m\). Consider an execution of \({\mathcal {A}}\) on the input stream \(\textbf{u}\), and denote the answers given by \({\mathcal {A}}\) as \(\textbf{z}=(z_1,\dots ,z_m)\). Then,
$$\begin{aligned} \Pr \left[ \forall i\in [m]:\; z_i\in (1\pm \alpha )\cdot {\mathcal {F}}(u_1,\dots ,u_i) \right] \ge 1-\delta , \end{aligned}$$

where the probability is taken over the coins of algorithm \({\mathcal {A}}\).
While Definition 1.1 is certainly not the only possible definition of streaming algorithms, it is rather standard. Note that in this definition we assume that the input stream \(\textbf{u}\) is fixed in advance. In particular, we assume that the choice for the elements in the stream is independent from the internal randomness of \({\mathcal {A}}\). This assumption is crucial for the analysis (and correctness) of many of the existing streaming algorithms. We refer to algorithms that utilize this assumption as oblivious streaming algorithms. In this work we are interested in the setting where this assumption does not hold, often called the adversarial setting.
1.1 The Adversarial Streaming Model
The adversarial streaming model, in various forms, was considered by [2,3,4,5,6,7,8,9,10,11,12,13,14]. We give here the formulation presented by Ben-Eliezer et al. [9]. The adversarial setting is modeled by a two-player game between a (randomized) StreamingAlgorithm and an Adversary. At the beginning, we fix a function \({\mathcal {F}}:X^*\rightarrow {\mathbb {R}}\). Then the game proceeds in rounds, where in the ith round:

1.
The Adversary chooses an update \(u_i\in X\) for the stream, which can depend, in particular, on all previous stream updates and outputs of StreamingAlgorithm.

2.
The StreamingAlgorithm processes the new update \(u_i\) and outputs its current response \(z_i\in {\mathbb {R}}\).
The goal of the Adversary is to make the StreamingAlgorithm output an incorrect response \(z_i\) at some point i in the stream. For example, in the distinct elements problem, the adversary’s goal is that at some step i, the estimate \(z_i\) will fail to be a \((1+\alpha )\)-approximation of the true current number of distinct elements.
In this work we present a new framework for transforming an oblivious streaming algorithm into an adversarially robust streaming algorithm. Before presenting our framework, we first elaborate on the existing literature and the currently available frameworks.
1.2 Existing Framework: Ben-Eliezer et al. [9]
To illustrate the results of [9], let us consider the distinct elements problem, in which the function \({\mathcal {F}}\) counts the number of distinct elements in the stream. Observe that, assuming that there are no deletions in the stream, this quantity is monotonically increasing. Furthermore, since we are aiming for a multiplicative error, the number of times we need to modify the estimate we release is quite small (it depends logarithmically on the stream length m). Informally, the idea of [9] is to run several independent copies of an oblivious algorithm (in parallel), and to use each copy to release answers over a part of the stream during which the estimate remains constant. In more detail, the generic transformation of [9] (applicable not only to the distinct elements problem) is based on the following definition.
Definition 1.2
(Flip number [9]) Given a function \({\mathcal {F}}: {\mathcal {X}}^* \rightarrow {\mathbb {R}}\), denote its values over a stream of length m as \((y_1, \dots , y_m)\). The \((\alpha ,m)\)-flip number of \({\mathcal {F}}\), denoted as \(\lambda _{\alpha ,m}({\mathcal {F}})\), or simply \(\lambda \) in short, is the maximal number \(k\in [m]\) s.t. there exist a stream of length m and indices \(0\le i_1< \dots < i_k\le m\) where for every \(j\in \{2,\dots ,k\}\) it holds that \(y_{i_{j-1}} \notin (1\pm \alpha )\cdot y_{i_{j}}\).
That is, the above definition captures the maximal number of times the value of a function can change by a multiplicative factor of \((1\pm \alpha )\) over a stream of length m.
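As a hypothetical illustration of this quantity, a greedy pass over a fixed sequence of \({\mathcal {F}}\)-values counts the flips for that one stream (the flip number itself is then the maximum of this count over all streams; the function name is ours):

```python
def flip_count(values, alpha):
    """Greedy count of 'flips' in a fixed sequence of F-values.

    A new flip is recorded whenever the current anchor y_{i_{j-1}} falls
    outside the interval (1 - alpha) * y <= anchor <= (1 + alpha) * y for
    the current value y. This is a sketch of Definition 1.2 restricted to
    a single stream, not the maximum over all streams.
    """
    if not values:
        return 0
    flips, anchor = 1, values[0]
    for y in values[1:]:
        # y is 'far' from the anchor if the anchor leaves (1 ± alpha) * y
        if not ((1 - alpha) * y <= anchor <= (1 + alpha) * y):
            flips += 1
            anchor = y
    return flips
```

On a doubling (insertion-only) sequence the count grows logarithmically in the stream length, matching the intuition of Example 1.4 below.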
Remark 1.3
In the technical sections of this work, we sometimes refer to the flip number of the given stream (w.r.t. the target function), which is a more fine-tuned quantity. See Definition 3.2.
Example 1.4
Assuming that there are no deletions in the stream (a.k.a. the insertion-only model), the \((\alpha ,m)\)-flip number of the distinct elements problem is at most \(O\left( \frac{1}{\alpha } \log m \right) \). However, if deletions are allowed (a.k.a. the turnstile model), then the flip number of this problem can be as large as \(\Omega (m)\).
The generic construction of [9] for a function \({\mathcal {F}}\) is as follows.

1.
Instantiate \(\lambda =\lambda _{O(\alpha ),m}({\mathcal {F}})\) independent copies of an oblivious streaming algorithm for the function \({\mathcal {F}}\), and set \(j=1\).

2.
When the next update \(u_i\) arrives:

(a)
Feed \(u_i\) to all of the \(\lambda \) copies.

(b)
Release an estimate using the jth copy (rounded to the nearest power of \((1+\alpha )\)). If this estimate differs from the previous estimate, then set \(j\leftarrow j+1\).

Ben-Eliezer et al. [9] showed that this can be used to transform an oblivious streaming algorithm for \({\mathcal {F}}\) into an adversarially robust streaming algorithm for \({\mathcal {F}}\). In addition, the overhead in terms of memory is only \(\lambda \), which is small in many interesting settings.
The simple, but powerful, observation of Ben-Eliezer et al. [9] is that by “using every copy at most once” we can break the dependencies between the internal randomness of our algorithm and the choice of the elements in the stream. Intuitively, this holds because the answer is always computed using a “fresh copy” whose randomness is independent of the choice of stream items.
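The generic transformation above can be sketched in a few lines (a toy illustration; the names `SketchSwitcher`, `make_copy`, and `ExactDistinct` are ours, with an exact distinct-elements counter standing in for an oblivious sketch):

```python
import math

def round_to_power(x, base):
    """Round a positive x to the nearest power of `base` (in log-space)."""
    return base ** round(math.log(x, base)) if x > 0 else 0.0

class SketchSwitcher:
    """Toy sketch of the generic transformation of [9]: all copies are fed
    every update, but answers come only from copy j; j advances whenever
    the released (rounded) estimate changes."""
    def __init__(self, make_copy, lam, alpha):
        self.copies = [make_copy() for _ in range(lam)]
        self.alpha, self.j, self.current = alpha, 0, None
    def process(self, u):
        for c in self.copies:
            c.update(u)
        est = round_to_power(self.copies[self.j].estimate(), 1 + self.alpha)
        if est != self.current:
            self.current = est
            self.j = min(self.j + 1, len(self.copies) - 1)
        return self.current

class ExactDistinct:
    """Stand-in 'oblivious algorithm': exact distinct-elements count."""
    def __init__(self):
        self.seen = set()
    def update(self, u):
        self.seen.add(u)
    def estimate(self):
        return len(self.seen)
```

With \(\alpha = 1\) (rounding to powers of 2), each released answer stays within a factor \(1+\alpha \) of the truth while the active copy index advances only when the rounded estimate changes.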
1.3 Existing Framework: Hassidim et al. [10]
Hassidim et al. [10] showed that, in fact, we can use every copy of the oblivious algorithm much more than once. In more detail, the idea of Hassidim et al. is to protect the internal randomness of each of the copies of the oblivious streaming algorithm using differential privacy [15]. Hassidim et al. showed that this still suffices in order to break the dependencies between the internal randomness of our algorithm and the choice for the elements in the stream. This resulted in an improved framework where the space blowup is only \(\approx \sqrt{\lambda }\) (instead of \(\lambda \)). Informally, the framework of [10] is as follows.

1.
Instantiate \({\hat{\lambda }} = {\tilde{O}}(\sqrt{\lambda })\) independent copies of an oblivious streaming algorithm for the function \({\mathcal {F}}\).

2.
When the next update \(u_i\) arrives:

(a)
Feed \(u_i\) to all of the \({\hat{\lambda }}\) copies.

(b)
Aggregate all of the estimates given by the \({\hat{\lambda }}\) copies, and compare the aggregated estimate to the previous estimate. If the estimate has changed “significantly”, output the new estimate. Otherwise, output the previous output.

In order to efficiently aggregate the estimates in Step 2b, this framework crucially relied on the fact that all of the copies of the oblivious algorithm are “the same” in the sense that they compute (or estimate) exactly the same function of the stream. This allowed Hassidim et al. to efficiently aggregate the returned estimates using standard tools from the literature on differential privacy. The intuition is that differential privacy allows us to identify global properties of the data, and hence, aggregating several numbers (the outcomes of the different oblivious algorithms) is easy if they are very similar.
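The aggregation step can be caricatured as follows. This is our own toy stand-in, not the actual construction of [10] (which uses proper differential-privacy machinery such as private medians and the sparse vector technique): here we merely add Laplace noise to the median of the copies' estimates and release a new output only when it drifts significantly.

```python
import random
import statistics

def laplace(scale):
    """Sample Laplace(0, scale): the difference of two Exp(1) variables
    is a standard Laplace random variable."""
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def private_aggregate(estimates, prev_output, alpha, epsilon, sensitivity=1.0):
    """Toy version of step 2(b): a noisy median of the copies' estimates,
    released only if it moved 'significantly' away from the previous
    output; otherwise the previous output is repeated. `epsilon` and
    `sensitivity` are illustrative parameters."""
    noisy = statistics.median(estimates) + laplace(sensitivity / epsilon)
    if prev_output is None or abs(noisy - prev_output) > alpha * abs(prev_output):
        return noisy
    return prev_output
```

The point of the median is exactly the one made above: since all copies estimate the same quantity, their estimates are tightly clustered, so a differentially private aggregate of them is both cheap and accurate.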
1.4 Existing Framework: Woodruff and Zhou [16]
Woodruff and Zhou [16] presented an adversarial streaming framework that builds on the framework of Ben-Eliezer et al. [9]. The new idea of [16] is that, in many interesting cases, the oblivious algorithms we execute can be modified to track different (but related) functions that require less space, while still allowing us to use (or combine) several of them at any point in time in order to estimate \({\mathcal {F}}\).
To illustrate this, consider a part of the input stream, say from time \(t_1\) to time \(t_2\), during which the target function \({\mathcal {F}}\) doubles its value and is monotonically increasing. More specifically, suppose that we already know (or have a good estimation for) the value of \({\mathcal {F}}\) at time \(t_1\), and we want to track the value of \({\mathcal {F}}\) from time \(t_1\) till \(t_2\). Recall that in the framework of [9] we only modify our output once the value of the function has changed by more than a \((1+\alpha )\) factor. As \({\mathcal {F}}(t_2)\le 2\cdot {\mathcal {F}}(t_1)\), we get that between time \(t_1\) and \(t_2\) there are roughly \(1/\alpha \) time points at which we need to modify our output. In the framework of [9], we need a fresh copy of the oblivious algorithm for each of these \(1/\alpha \) time points. For concreteness, let us assume that every copy uses space \(1/\alpha ^2\) (which is the case if, e.g., \({\mathcal {F}}=F_2\)^{Footnote 1}), and hence the framework of [9] requires space \(1/\alpha ^3\) to track the value of the function \({\mathcal {F}}\) from \(t_1\) till \(t_2\).
In the framework of [16], on the other hand, this will cost only \(1/\alpha ^2\). We now elaborate on this improvement. As we said, from time \(t_1\) till \(t_2\) there are \(1/\alpha \) time points at which we need to modify our output. Let us denote these time points as \(t_1 = w_0< w_1<w_2<\dots <w_{1/\alpha } = t_2\).^{Footnote 2} In the framework of [16], the oblivious algorithms we execute track differences between the values of \({\mathcal {F}}\) at specific times, rather than tracking the value of \({\mathcal {F}}\) directly. (These algorithms are called difference estimators, or DE in short.) In more detail, suppose that for every \(j\in \{0,1,2,3,\dots ,\log \frac{1}{\alpha }\}\) and every \(i_j\in \{2^j, 2{\cdot }2^j, 3{\cdot }2^j, 4{\cdot }2^j,\dots , \frac{1}{\alpha }\}\) we have an oblivious algorithm (a difference estimator) for estimating the value of \([{\mathcal {F}}(w_{i_j}) - {\mathcal {F}}(w_{i_j-2^j})]\). We refer to the index j as the level of the oblivious algorithm. So there are \(\log \frac{1}{\alpha }\) different levels, where we have a different number of oblivious algorithms for each level. (For level \(j=0\) we have \(1/\alpha \) oblivious algorithms, and for level \(j=\log \frac{1}{\alpha }\) we have only a single oblivious algorithm.)
Note that given all of these oblivious algorithms, we could compute an estimation for the value of the target function \({\mathcal {F}}\) at each of the time points \(w_1,\dots ,w_{1/\alpha }\) (and hence for every time \(t_1\le t\le t_2\)) by summing the estimations of (at most) one oblivious algorithm from each level.^{Footnote 3} For example, an estimation for the value of \({\mathcal {F}}\left( w_{\frac{3}{4\alpha }+1}\right) \) can be obtained by combining estimations as follows (see also Fig. 1):

$$\begin{aligned} {\mathcal {F}}\left( w_{\frac{3}{4\alpha }+1}\right) \approx {\mathcal {F}}\left( w_{0}\right) + \left[ {\mathcal {F}}\left( w_{\frac{1}{2\alpha }}\right) - {\mathcal {F}}\left( w_{0}\right) \right] + \left[ {\mathcal {F}}\left( w_{\frac{3}{4\alpha }}\right) - {\mathcal {F}}\left( w_{\frac{1}{2\alpha }}\right) \right] + \left[ {\mathcal {F}}\left( w_{\frac{3}{4\alpha }+1}\right) - {\mathcal {F}}\left( w_{\frac{3}{4\alpha }}\right) \right] . \end{aligned}$$
As we sum at most \(\log \frac{1}{\alpha }\) estimations, this decomposition increases our estimation error only by a factor of \(\log \frac{1}{\alpha }\), which is acceptable. The key observation of [16] is that the space complexity needed for an oblivious algorithm at level j decreases when j decreases (intuitively because in lower levels we need to track smaller differences, which is easier). So, even though in level \(j=10\) we have more oblivious algorithms than in level \(j=20\), these oblivious algorithms are cheaper than those in level \(j=20\), such that the overall space requirements for level \(j=10\) and level \(j=20\) (or any other level) are the same. Specifically, [16] showed that (for many problems of interest, e.g., for \(F_2\)) the space requirement of a difference estimator at level j is \(O( 2^{j} /\alpha )\). We run \(O(2^{-j} / \alpha )\) oblivious algorithms for level j, and hence, the space needed for level j is \(O(2^{-j}/\alpha \cdot 2^{j}/\alpha )=O(1/\alpha ^{2})\). As we have \(\log (1/\alpha )\) levels, the overall space we need to track the value of \({\mathcal {F}}\) from time \(t_1\) till \(t_2\) is \({\tilde{O}}(1/\alpha ^{2})\). This should be contrasted with the space required by [9] for this time segment, which is \(O(1/\alpha ^3)\).
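The “at most one estimator per level” decomposition is just the binary representation of the target index; a minimal sketch (our own helper, not code from [16]):

```python
def dyadic_blocks(i):
    """Split (0, i] into dyadic blocks (start, end) with end - start = 2^j
    and end a multiple of 2^j: one block per set bit of i, hence at most
    one difference estimator per level j."""
    blocks, end = [], i
    while end > 0:
        size = end & -end          # lowest set bit of `end`
        blocks.append((end - size, end))
        end -= size
    return blocks[::-1]            # in increasing order of start
```

For instance, `dyadic_blocks(7)` returns `[(0, 4), (4, 6), (6, 7)]`, so an estimate of \({\mathcal {F}}(w_7)\) is recovered from three difference estimators, one each at levels 2, 1, and 0.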
Remark 1.5
The informal takeaway from this example is that if we can track differences of the target function “efficiently” then this could be leveraged to design a robust algorithm with improved space complexity. In the example outlined above, the space needed to track differences decreases linearly with the bound on the maximal difference they aim to track. We extend this takeaway also to turnstile streams.
1.5 Our Results
The framework of [16] is very effective for the insertion-only model. However, there are two challenges that need to be addressed in the turnstile setting: (1) We are not aware of nontrivial constructions for difference estimators in the turnstile setting, and hence, the framework of [16] is not directly applicable to the turnstile setting.^{Footnote 4} (2) Even assuming the existence of a nontrivial difference estimator, the framework of [16] obtains suboptimal results in the turnstile setting.
To overcome the first challenge, we introduce a new monitoring technique, that aims to identify time steps at which we cannot guarantee correctness of our difference estimators (in the turnstile setting), and reset the system at these time steps. This will depend on the specific application at hand (the target function) and hence, we defer the discussion on our monitoring technique to Sect. 6 where we discuss applications of our framework.
We now focus on the second challenge (after assuming the existence of nontrivial difference estimators). To illustrate the suboptimality of the framework of [16], let us consider a simplified turnstile setting in which the input stream can be partitioned into k time segments during each of which the target function is monotonic, and increases (or decreases) by at most a factor of 2 (or 1/2). Note that k can be very large in the turnstile model (up to O(m)). With the framework of [16], we would need space \({\tilde{O}}\left( \frac{k}{\alpha ^2} \right) \) to track the value of \(F_2\) throughout such an input stream. The reason is that, like in the framework of [9], the robustness guarantees are achieved by making sure that every oblivious algorithm is “used only once”. This means that we cannot reuse the oblivious algorithms across the different segments, and hence, the space complexity of [16] scales linearly with the number of segments k.
To mitigate this issue, we propose a new construction that combines the framework of [16] with the framework of [10]. Intuitively, in our simplified example with the k segments, we want to reuse the oblivious algorithms across different segments, and protect their internal randomness with differential privacy to ensure robustness. However, there is an issue here. Recall that the framework of [10] crucially relied on the fact that all of the copies of the oblivious algorithm are “the same” in the sense that they compute exactly the same function. This allowed [10] to efficiently aggregate the estimates in a differentially private manner. However, in the framework of [16], the oblivious algorithms we maintain are fundamentally different from each other, tracking different functions. Specifically, every difference estimator tracks the value of \([{\mathcal {F}}(t)-{\mathcal {F}}(e)]\) for a unique enabling time \(e<t\) (where t denotes the current time). That is, every difference estimator necessarily has a different enabling time, and hence, they are not tracking the same function, and it is not clear how to aggregate their outcomes with differential privacy.
Toggle Difference Estimator (TDE). To overcome the above challenge, we present an extension of the notion of a difference estimator, which we call a Toggle Difference Estimator (see Definition 3.3). Informally, a toggle difference estimator is a difference estimator that allows us to modify its enabling time on the go. This means that a TDE can track, e.g., the value of \([{\mathcal {F}}(t)-{\mathcal {F}}(e_1)]\) for some (previously given) enabling time \(e_1\), and then, at some later point in time, we can instruct the same TDE to track instead the value of \([{\mathcal {F}}(t)-{\mathcal {F}}(e_2)]\) for some other enabling time \(e_2\). We show that this extra requirement from the difference estimator comes at a very low cost in terms of memory and runtime. Specifically, in Sect. 5 we present a generic (efficiency preserving) method for generating a TDE from a DE.
Let us return to our example with the k segments. Instead of using every oblivious algorithm only once, we reuse them across the different segments, where during any single segment all the TDEs are instructed to track the appropriate differences that are needed for the current segment. This means that during every segment we have many copies of the “same” oblivious algorithm. More specifically, for every level (as explained above) we have many copies of an oblivious algorithm for that level, each (currently) tracking the difference that we need. This allows our space complexity to scale with \(\sqrt{k}\) instead of linearly with k as in the framework of [16]. To summarize this discussion, our new notion of TDE allows us to gain both the space saving achieved by differential privacy (as in the framework of [10]) and the space saving achieved by tracking the target function via differences (as in the framework of [16]).
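To illustrate the interface only (the class name and the exact-tracker stand-in are ours; a real TDE achieves this within the DE's smaller space budget, as described in Sect. 5), a toy toggle difference estimator might look like:

```python
class ToyToggleDE:
    """Toy toggle difference estimator: keeps a snapshot of the tracker's
    value at the current enabling time e, reports F(t) - F(e), and can be
    re-toggled so that e moves to the current time. An exact F1 tracker
    (sum of updates) stands in for the oblivious estimator."""
    def __init__(self):
        self.value = 0      # exact running value of the stand-in tracker
        self.snapshot = 0   # tracker value at the current enabling time e
    def update(self, delta):
        self.value += delta
    def difference(self):
        # estimate of F(t) - F(e) for the current enabling time e
        return self.value - self.snapshot
    def toggle(self):
        # move the enabling time e to the current position in the stream
        self.snapshot = self.value
```

Re-toggling is what lets the same object serve consecutive segments: after `toggle()`, the estimator tracks the difference relative to the new enabling time without being re-initialized.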
Remark 1.6
The presentation given above (w.r.t. our example with the k segments) is oversimplified. Clearly, in general, we have no guarantee that an input (turnstile) stream can be partitioned into k such segments. This means that in the actual construction we need to calibrate our TDEs across time segments in which the value of the target function is not monotone. See Sect. 4.1 for a more detailed overview of our construction and the additional modifications we had to introduce.
We are now ready to state our main result, extending the framework of Woodruff and Zhou [16] to the turnstile setting. As in the work of [16], in order to benefit from tracking differences, we need the property that tracking smaller differences is cheaper.
Theorem 1.7
(informal version of Theorem B.1) Let \({\mathcal {F}}\) be a function for which the following algorithms exist:

1.
An \(\alpha \)accurate oblivious streaming algorithm \({\textsf{E}}\) with space complexity \(\textrm{Space}({\textsf{E}})\).

2.
An \(\alpha \)accurate oblivious TDE streaming algorithm \({\textsf{E}}_{\textrm{TDE}}\) that for every \(\gamma \) can track differences of (relative) size at most \(\gamma \) using space \(\gamma \cdot \textrm{Space}({\textsf{E}}_{\textrm{TDE}})\).
Then there is an \(O(\alpha )\)accurate adversariallyrobust algorithm for turnstile streams with bounded flip number \(\lambda \) with space^{Footnote 5}\({\tilde{O}}\left( \sqrt{\alpha \lambda }\cdot \left( \textrm{Space}({\textsf{E}}) + \textrm{Space}({\textsf{E}}_{\textrm{TDE}}) \right) \right) \).
In contrast, under the same conditions, the framework of [16] requires space \({\tilde{O}}\left( \alpha \lambda \cdot \left( \textrm{Space}({\textsf{E}}) + \textrm{Space}({\textsf{E}}_{\textrm{TDE}}) \right) \right) \). As we mentioned, we are not aware of nontrivial constructions for difference estimators that work in the turnstile setting. Hence, Theorem 1.7, as well as the results of [16], are not currently applicable (as is) to the turnstile setting. Nevertheless, in Sect. 6 we augment our framework using a new “monitoring technique” and show that it can be applied to the turnstile setting for estimating \(F_2\) (the second moment of the stream). To this end, we introduce the following notion that allows us to control the number of times we need to reset our system (which happens when we cannot guarantee correctness of our difference estimators).
Definition 1.8
(Twist number) The \((\alpha ,m)\)-twist number of a stream \({\mathcal {S}}\) w.r.t. a functionality \({\mathcal {F}}\), denoted as \(\mu _{\alpha ,m}({\mathcal {S}})\), is the maximal \(\mu \in [m]\) such that \({\mathcal {S}}\) can be partitioned into \(2\mu \) disjoint segments \({\mathcal {S}}= {\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_{\mu -1} \circ {\mathcal {V}}_{\mu -1}\) (where \(\{{\mathcal {P}}_i\}_{i\in [\mu ]}\) may be empty) s.t. for every \(i\in [\mu ]\):

1.
\({\mathcal {F}}({\mathcal {V}}_i) > \alpha \cdot {\mathcal {F}}({\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {V}}_{i-1} \circ {\mathcal {P}}_i)\)

2.
\({\mathcal {F}}({\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_i \circ {\mathcal {V}}_i) - {\mathcal {F}}({\mathcal {P}}_0\circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_i) \le \alpha \cdot {\mathcal {F}}({\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_i)\)
In words, a stream has twist number \(\mu \) if there are \(\mu \) disjoint segments \({\mathcal {V}}_0,\dots ,{\mathcal {V}}_{\mu 1}\subseteq {\mathcal {S}}\) such that the value of the function on each of these segments is large (Condition 1), but still these segments do not change the value of the function on the entire stream by too much (Condition 2). Intuitively, the twist number bounds the number of regions in which a local view of the stream would suggest a large multiplicative change, but a global view would not.
Example 1.9
For \(F_2\) estimation in insertion-only streams, it holds that \(\mu =0\) even though \(\lambda \) can be large. This is the case because, in insertion-only streams, Conditions 1 and 2 from Definition 1.8 cannot hold simultaneously. Specifically, for a stream \({\mathcal {S}}\) partitioned as \({\mathcal {S}}= {\mathcal {P}}_0\circ \dots \circ {\mathcal {P}}_i \circ {\mathcal {V}}_i \), denote by p the frequency vector of \({\mathcal {P}}_0\circ \dots \circ {\mathcal {P}}_i\) and by v the frequency vector of \({\mathcal {V}}_i\), and suppose that Condition 2 holds, i.e., \(\Vert p + v\Vert ^2-\Vert p\Vert ^2\le \alpha \cdot \Vert p\Vert ^2\). Hence, in order to show that Condition 1 does not hold, it suffices to show that \(\Vert v\Vert ^2\le \Vert p + v\Vert ^2-\Vert p\Vert ^2\), i.e., that \(\Vert v\Vert ^2+\Vert p\Vert ^2\le \Vert p + v\Vert ^2\), i.e., that \((v_1^2+p_1^2)+\dots +(v_n^2+p_n^2)\le (v_1+p_1)^2+\dots +(v_n+p_n)^2\), which trivially holds whenever \(p_i,v_i\ge 0\).
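The superadditivity at the heart of this example is easy to check numerically (a self-contained sketch with our own helper names):

```python
def f2(vec):
    """Second moment of a frequency vector: sum of squared entries."""
    return sum(x * x for x in vec)

def superadditive(p, v):
    """For entrywise non-negative p, v: ||p||^2 + ||v||^2 <= ||p + v||^2,
    which is why Conditions 1 and 2 of Definition 1.8 cannot both hold in
    insertion-only streams."""
    s = [a + b for a, b in zip(p, v)]
    return f2(p) + f2(v) <= f2(s)
```

With deletions the inequality fails, e.g. for p = [2] and v = [-2], which is exactly how a turnstile stream can acquire a nonzero twist number.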
In Sect. 6 we leverage this notion and present the following result for \(F_2\) estimation (see Definition 3.4 for \(F_2\)).
Theorem 1.10
(\(F_2\) Robust estimation, informal) There exists an adversarially robust \(F_2\) estimation algorithm for turnstile streams of length m with a bounded \((O(\alpha ), m)\)flip number \(\lambda \) and a bounded \((O(\alpha ), m)\)twist number \(\mu \) that guarantees \(\alpha \)accuracy (w.h.p.) using space complexity \({\tilde{O}}\left( \frac{\sqrt{\alpha \lambda +\mu }}{\alpha ^{2}}\right) \).
This should be contrasted with the result of [10], who obtain space complexity \({\tilde{O}}\left( \frac{\sqrt{\lambda }}{\alpha ^2}\right) \) for robust \(F_2\) estimation in the turnstile setting. Hence, our new result is better whenever \(\mu \ll \lambda \). See a comparison of the space complexity of all aforementioned frameworks in Table 1. We leave open the possibility of pushing down the space complexity to \(\approx \frac{\sqrt{\lambda }}{\alpha ^{1.5}}\) or to \(\approx \frac{\sqrt{\lambda +\mu }}{\alpha ^{1.5}}\). Informally, the bottleneck in our construction is that we need to “reset” our system whenever a “twist” event occurs, which translates to the cost of an additional oblivious strong tracker. See Sect. 2 for an elaborated discussion of “twist” events in \(F_2\) estimation over turnstile streams, and for an example of a turnstile stream with a large flip number and a small twist number on which our framework performs better than both [10, 16].
2 Discussion and Open Questions
In this section we discuss the role of our new notion, the twist number (\(\mu \)), in the space blowup of our framework's extension. To that end, we discuss the events this notion captures, illustrate such an event in a drawing (see Fig. 2 below), and explain how it affects the space complexity. We also elaborate on the regime in which our framework outperforms the other existing frameworks, and finally pose an open question.
Twist number and violation events. The notion of twist number aims to tackle the following issue: our generic framework relies on the existence of a DE for the target functionality. By definition, a \(\gamma \)-DE activated on a stream prefix \({\mathcal {P}}\) must maintain accuracy for every stream suffix \({\mathcal {S}}\) that satisfies the condition:

$$\begin{aligned} {\mathcal {F}}({\mathcal {P}}\circ {\mathcal {S}}) - {\mathcal {F}}({\mathcal {P}}) \le \gamma \cdot {\mathcal {F}}({\mathcal {P}}); \qquad \mathrm{(1)} \end{aligned}$$
however, we are not aware of constructions of nontrivial DEs in the turnstile setting. To overcome this, we instantiate our framework with existing DEs that operate only on insertion-only streams, and augment our framework with a “monitoring technique” that identifies “violations”: time points at which our underlying difference estimators might lose their utility guarantees (which may happen since they are not designed to operate outside of insertion-only streams). More specifically, in the context of \(F_2\), existing constructions for a DE with parameter \(\gamma \), which are activated on a stream prefix \({\mathcal {P}}\) and provide estimates during a suffix \({\mathcal {S}}\), operate under the following assumption:

$$\begin{aligned} {\mathcal {F}}({\mathcal {S}}) \le \gamma \cdot {\mathcal {F}}({\mathcal {P}}). \qquad \mathrm{(2)} \end{aligned}$$
In insertion-only streams, the DE's requirement, Condition 1, implies its utility guarantee, Assumption 2. However, this implication does not hold in turnstile streams. We refer to events where Assumption 2 fails but Condition 1 still holds as violation events (see suffix violation events in Sect. C, Definition C.2). Essentially, these violations occur when the value of the suffix is large (\({\mathcal {F}}({\mathcal {S}})\ge \gamma \cdot {\mathcal {F}}({\mathcal {P}})\) for some \(\gamma \in (0,1)\)), while the value of the entire stream does not change significantly (\({\mathcal {F}}({\mathcal {P}}\circ {\mathcal {S}})\approx {\mathcal {F}}({\mathcal {P}})\)). In such cases, the utility guarantee of the DE construction does not hold.
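For intuition, the violation regime for \(F_2\) can be tested directly on frequency vectors (an illustrative check with our own helper names, not the framework's actual monitoring routine):

```python
def f2(vec):
    """Second moment of a frequency vector: sum of squared entries."""
    return sum(x * x for x in vec)

def is_suffix_violation(p, v, gamma):
    """True when the DE's defining condition holds (the overall value barely
    moves) yet its utility assumption fails (the suffix is heavy): this is
    the regime where insertion-only DE constructions break on turnstile
    streams. p, v are the frequency vectors of the prefix and the suffix."""
    s = [a + b for a, b in zip(p, v)]
    condition_holds = f2(s) - f2(p) <= gamma * f2(p)   # Condition 1
    assumption_holds = f2(v) <= gamma * f2(p)          # Assumption 2
    return condition_holds and not assumption_holds
```

For example, the suffix v = [-10, 10] against the prefix p = [10, 0] leaves the overall \(F_2\) value unchanged while the suffix itself is heavy: a "twist".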
A violation event: a drawing illustration. We give an illustrative example in Fig. 2. The drawing shows the value of \(F_2\) over a prefix (black arrow) and two corresponding trajectories of suffixes (blue dashed line, red dashed line). For the illustration, assume that at the point in time following the prefix \({\mathcal {P}}\) we output its estimation. We then initialize a DE to estimate the contribution of the suffix \({\mathcal {S}}\) to the value of the function on \({\mathcal {P}}\circ {\mathcal {S}}\). According to the DE construction, whenever the suffix norm becomes too large, its estimation is no longer valid; we thus mark with a green circle a possible bound on this DE. To mark the boundaries of the next required output modification, the two dashed orange circles lie at a multiplicative distance from the current value of \({\mathcal {F}}({\mathcal {P}})\). These circles are referred to as the reporting boundaries. Whenever a trajectory crosses one of these reporting boundaries, we must update our output. The red trajectory crosses the reporting boundary after the DE guarantee no longer holds; in this scenario, we cannot rely on the DE estimation and must use an alternative estimator for the required output modification. The blue trajectory, on the other hand, crosses the reporting boundary while still meeting the DE's requirements, making its estimation valid.
The regime where the framework is effective. For \(F_2\) estimation under turnstile streams that have at least one \(O(\alpha )\)-suffix violation (i.e., \(\mu _{\alpha ,m}>0\)), the framework of [16] is not applicable, leaving only the framework of [10] and ours as viable options. Our framework achieves better performance whenever \(\mu \ll \lambda \) or \(\mu = O((1-\alpha ) \lambda )\). As illustrated above, an \(O(\alpha )\)-suffix violation event occurs when the frequency vector “twists” without significantly changing its \(L_2\) norm. Since we may reset our framework during these events, we achieve better performance than [10] for streams that exhibit fewer “twists” compared to magnitude changes (as captured by the flip number) in their frequency vector. However, for streams that exhibit a larger number of “twists”, our framework has a space blowup similar to that of [10], because it essentially loses the DE optimization and relies only on strong tracker (ST) estimators.
Open question. Prior work has shown a strong connection between the flip number of the target function, i.e., the number of output modifications, and the resulting space blowup for estimating the target function under adversarial inputs compared to oblivious inputs. In this work, we introduce an additional notion of the input stream, the twist number, which also governs the space blowup. As discussed earlier, this new notion captures events corresponding to scenarios where insertion-only DE constructions fail. Designing DEs for turnstile inputs would remove the dependence on the twist number in the resulting space blowup.
Question 2.1
Is it possible to design an algorithm for robust \(F_2\) estimation in the turnstile setting that achieves a space blowup of \(o(\sqrt{\alpha \lambda + \mu })\)?
That is, it is preferable to have a framework whose blowup is independent of the twist number of the stream. In the context of \(F_2\) estimation, constructing an oblivious DE for turnstile streams with space requirements similar to those of existing insertion-only DEs would result in a space blowup of \({\tilde{O}}(\sqrt{\alpha \lambda })\) using the proposed framework.
2.1 Other Related Works
Related to our work is the line of work on adaptive data analysis, aimed at designing tools for guaranteeing statistical validity in cases where the data is being accessed adaptively [17,18,19,20,21,22,23,24,25,26]. Recall that the difficulty in the adversarial streaming model arises from potential dependencies between the inputs of the algorithm and its internal randomness. As we mentioned, our construction builds on a technique introduced by [10] for using differential privacy to protect not the input data, but rather the internal randomness of the algorithm. Following [10], this technique was also used by [27, 28] for designing robust algorithms in other settings.
3 Preliminaries
In this work we consider input streams which are represented as a sequence of updates, where every update is a tuple containing an element (from a finite domain) and its (integer) weight. Formally,
Definition 3.1
(Turnstile stream) A stream of length m over a domain [n],^{Footnote 6} consists of a sequence of updates \(\langle s_0, \Delta _0 \rangle ,\dots , \langle s_{m-1}, \Delta _{m-1} \rangle \) where \(s_i\in [n]\) and \(\Delta _i \in {\mathbb {Z}}\). Given a stream \({\mathcal {S}}\in ([n]\times {\mathbb {Z}})^m\) and integers \(0\le t_1\le t_2\le m-1\), we write \({\mathcal {S}}^{t_1}_{t_2} = (\langle s_{t_1}, \Delta _{t_1} \rangle ,\dots , \langle s_{t_2}, \Delta _{t_2} \rangle )\) to denote the sequence of updates from time \(t_1\) until \(t_2\). We also use the abbreviation \({\mathcal {S}}_{t}={\mathcal {S}}^{1}_{t}\) to denote the first t updates.
Let \({\mathcal {F}}:([n]\times {\mathbb {Z}})^{*}\rightarrow {\mathbb {R}}\) be a function (for example \({\mathcal {F}}\) might count the number of distinct elements in the stream). At every time step t, after obtaining the next element in the stream \(\langle s_t, \Delta _t \rangle \), our goal is to output an approximation for \({\mathcal {F}}({\mathcal {S}}_t)\). To simplify presentation we also denote \({\mathcal {F}}(t)={\mathcal {F}}({\mathcal {S}}_t)\) for \(t\in [m]\). We assume throughout the paper that \(\log (m)=\Theta (\log (n))\) and that \({\mathcal {F}}\) is bounded polynomially in n.
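To make this bookkeeping concrete, here is a minimal Python sketch (function names are ours, not the paper's) that evaluates a function on every prefix \({\mathcal {S}}_t\) of a turnstile stream, using the distinct-elements count as the example function:

```python
from collections import defaultdict

def prefix_values(stream, func):
    """Evaluate func on every prefix S_t of a turnstile stream.

    stream: sequence of (s_i, delta_i) updates; func maps the current
    frequency table to a real number.  (Illustrative only.)
    """
    freq = defaultdict(int)
    out = []
    for s, delta in stream:
        freq[s] += delta
        out.append(func(freq))
    return out

def distinct_elements(freq):
    # F_0: the number of elements with non-zero frequency.
    return sum(1 for v in freq.values() if v != 0)

stream = [(0, 1), (1, 1), (0, 1), (1, -1), (2, 1)]
print(prefix_values(stream, distinct_elements))  # [1, 2, 2, 1, 2]
```

Note how the update \(\langle 1, -1 \rangle\) cancels element 1 entirely, temporarily decreasing the distinct count — the kind of behavior that distinguishes turnstile streams from insertion-only ones.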
In Sect. 1, for the purpose of presentation, it was useful to refer to the flip number of a function. Our results are stated w.r.t. a more refined quantity: the flip number of a stream.
Definition 3.2
(Flip number of a stream [9]) Given a function \({\mathcal {F}}\) and a stream \({\mathcal {S}}\in {\mathcal {X}}^m\) of length m, denote the \({\mathcal {F}}\) values over \({\mathcal {S}}\) as \((y_1, \dots , y_m)\). The \((\alpha ,m)\)-flip number of \({\mathcal {S}}\), denoted as \(\lambda _{\alpha }({\mathcal {S}})\), is the maximal number \(k\in [m]\) s.t. there exist indices \(0\le i_1< \dots < i_k\le m\) such that for every \(j\in \{2,\dots ,k\}\) it holds that \(y_{i_{j-1}} \notin (1\pm \alpha )\cdot y_{i_{j}}\).
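For illustration, the flip number of a given value sequence can be computed exactly by a quadratic-time dynamic program over index chains (a sketch of Definition 3.2 under the assumption of positive values; the function name is ours):

```python
def flip_number(values, alpha):
    """(alpha, m)-flip number of a stream whose F-values are `values`
    (a direct O(m^2) sketch of Definition 3.2; assumes values > 0)."""
    m = len(values)
    best = [1] * m  # best[j]: longest flip chain ending at index j
    for j in range(m):
        for i in range(j):
            # y_i outside (1 +- alpha) * y_j counts as a flip i -> j.
            inside = (1 - alpha) * values[j] <= values[i] <= (1 + alpha) * values[j]
            if not inside:
                best[j] = max(best[j], best[i] + 1)
    return max(best, default=0)

print(flip_number([1, 1.05, 2, 4, 4.1, 1], 0.1))  # 4
```

In the example, the value doubles twice and then drops back, giving four indices whose consecutive values differ by more than a \((1\pm 0.1)\) factor; the small moves \(1\rightarrow 1.05\) and \(4\rightarrow 4.1\) do not count as flips.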
Toggle Difference Estimator. For the purpose of our framework, we present an extension of the notion of a difference estimator (DE) from [16], which we call a toggle difference estimator (TDE). A difference estimator for a function \({\mathcal {F}}\) is an oblivious streaming algorithm, defined informally as follows: the difference estimator is initiated at time \(t = 1\) and has a dynamically defined enabling time \(1\le e\le m\). Once that enabling time is set, the difference estimator outputs an estimation of \(\left( {\mathcal {F}}({\mathcal {S}}_t) - {\mathcal {F}}({\mathcal {S}}_e)\right) \) for all times \(t>e\) (provided certain conditions on that difference hold). That is, once the difference estimator’s enabling time is set, it cannot be changed. And so, if an estimation is needed for some other enabling time, say \(e^{\prime } \ne e\), then an additional instance of a difference estimator is needed. Our framework requires such an estimator to be able to provide estimations for multiple enabling times, as long as the estimation periods do not overlap. This is captured in the following definition.
Definition 3.3
(Toggle Difference Estimator) Let \({\mathcal {F}}:([n]\times {\mathbb {Z}})^{*}\rightarrow {\mathbb {R}}\) be a function, and let \(m,p\in {\mathbb {N}}\) and \(\gamma ,\alpha , \delta \in (0,1)\) be parameters. Let \({\textsf{E}}\) be an algorithm with the following syntax. In every time step \(t \in [m]\), algorithm \({\textsf{E}}\) obtains an update \(\langle s_t, \Delta _t , b_t \rangle \in ([n]\times {\mathbb {Z}}\times \{0,1\})\) and outputs a number \(z_t\). Here \(\langle s_t, \Delta _t \rangle \) denotes the current update, and \(b_t\) is an indicator for when the current time t should be considered as the new enabling time. We consider input streams \({\mathcal {S}}\in ([n]\times {\mathbb {Z}}\times \{0,1\})^m\) such that there are at most p time steps t for which \(b_t=1\), and denote these time steps as \(1\le e^1< e^2< \dots< e^{p}<m\). Also, for a time step \(t\in [m]\) we denote \(e(t)=\max \{ e^i : e^i\le t \}\).
Algorithm \({\textsf{E}}\) is a \((\gamma ,\alpha , p, \delta )\)toggle difference estimator for \({\mathcal {F}}\) if the following holds for every such input stream \({\mathcal {S}}\). With probability at least \(1\delta \), for every \(t\in [m]\) such that
the algorithm outputs a value \(z_t\) such that \(z_t\in \left( {\mathcal {F}}({\mathcal {S}}_{t}) - {\mathcal {F}}({\mathcal {S}}_{e(t)})\right) \pm \alpha \cdot {\mathcal {F}}({\mathcal {S}}_{e(t)})\).
This definition generalizes the notion of a difference estimator (DE) from [16], in which \(p=1\). In Sect. 5 we show that this extension comes at a very low cost in terms of space complexity. Note that at times t for which the requirements specified w.r.t. \(\gamma \) do not hold, the TDE algorithm provides no accuracy guarantee.
Frequency moments of a stream. Useful statistics of a stream are its frequency moments. These statistics are referred to in the introduction and in our application; we give here the formal definition.
Definition 3.4
(Frequency moments) Let \({\mathcal {S}}= \langle s_0, \Delta _0 \rangle ,\dots , \langle s_{m-1}, \Delta _{m-1} \rangle \in ([n]\times {\mathbb {Z}})^{m}\) be a stream of length m. Its corresponding frequency vector \(v_{{\mathcal {S}}} \in {\mathbb {R}}^{n} \) is defined as \(v_{{\mathcal {S}}}[i] = \sum _{j\in [m]:s_j=i} \Delta _j\) for all \(i\in [n]\). For \(p\ge 0\), the pth frequency moment of \({\mathcal {S}}\) is defined as \(\Vert v_{{\mathcal {S}}} \Vert ^p_p\), where \(\Vert \cdot \Vert _p\) is the pth norm.
The pth frequency moment is also denoted \(F_p\). In the introduction we took as an example for \({\mathcal {F}}\), the tracked function of a stream, the number of its distinct elements, which corresponds to \(F_0\). Our presented application is for the function \(F_2\).
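As a reference point, the frequency vector and \(F_p\) of a short stream can be computed directly (a non-streaming Python sketch; a streaming algorithm must of course approximate this without storing the vector):

```python
from collections import defaultdict

def frequency_moment(stream, p):
    """Compute F_p = ||v_S||_p^p of a turnstile stream exactly."""
    freq = defaultdict(int)
    for s, delta in stream:
        freq[s] += delta
    # Skip zero coordinates so that F_0 counts live distinct elements.
    return sum(abs(v) ** p for v in freq.values() if v != 0)

stream = [(0, 1), (1, 1), (0, 1), (2, -1)]  # frequency vector (2, 1, -1)
print(frequency_moment(stream, 2))  # 4 + 1 + 1 = 6
print(frequency_moment(stream, 0))  # 3 distinct elements
```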
3.1 Preliminaries from Differential Privacy
Differential privacy [15] is a mathematical definition for privacy that aims to enable statistical analyses of databases while providing strong guarantees that individual-level information does not leak. Consider an algorithm \({\mathcal {A}}\) that operates on a database in which every row represents the data of one individual. Algorithm \({\mathcal {A}}\) is said to be differentially private if its outcome distribution is insensitive to arbitrary changes in the data of any single individual. Intuitively, this means that algorithm \({\mathcal {A}}\) leaks very little information about the data of any single individual, because its outcome would have been distributed roughly the same even without the data of that individual. Formally,
Definition 3.5
([15]) Let \({\mathcal {A}}\) be a randomized algorithm that operates on databases. Algorithm \({\mathcal {A}}\) is \((\varepsilon ,\delta )\)-differentially private if for any two databases \(S,S^{\prime }\) that differ on one row, and any event T, we have \(\Pr \left[ {\mathcal {A}}(S)\in T\right] \le e^{\varepsilon }\cdot \Pr \left[ {\mathcal {A}}(S^{\prime })\in T\right] + \delta .\)
See Appendix A for additional preliminaries on differential privacy.
4 A Framework for Adversarial Streaming
Our transformation from an oblivious streaming algorithm \({{\textsf{E}}}_{\textrm{ST}}\) for a function \({\mathcal {F}}\) into an adversarially robust algorithm requires the following two conditions.

1.
The existence of a toggle difference estimator \({{\textsf{E}}}_{\textrm{TDE}}\) for \({\mathcal {F}}\), see Definition 3.3.

2.
Every single update can change the value of \({\mathcal {F}}\) up to a factor of \((1\pm \alpha ^{\prime })\) for some \(\alpha ^{\prime }=O(\alpha )\). Formally, throughout the analysis we assume that for every stream \({\mathcal {S}}\) and for every update \(u=\langle s, \Delta \rangle \) it holds that \({\mathcal {F}}({\mathcal {S}}\circ u) \in (1\pm \alpha ^{\prime })\cdot {\mathcal {F}}({\mathcal {S}})\), where \({\mathcal {S}}\circ u\) denotes the stream \({\mathcal {S}}\) followed by the update u.
Remark 4.1
These conditions are identical to the conditions required by [16]. Formally, they require only a difference estimator instead of a toggle difference estimator, but we show that these two objects are equivalent. See Sect. 5.
Remark 4.2
Condition 2 can be met for many functions of interest, by applying our framework on portions of the stream during which the value of the function is large enough. For example, when estimating \(F_2\) with update weights \(\pm 1\), whenever the value of the function is at least \(\Omega (1/\alpha )\), a single update can increase the value of the function by at most a \((1+\alpha )\) factor. Estimating \(F_2\) whenever the value of the function is smaller than \(O(1/\alpha )\) can be done using an existing (oblivious) streaming algorithm with error \(\rho =O(\alpha )\). To see that we can use an oblivious algorithm in this setting, note that the additive error of the oblivious streaming algorithm is at most \(O(\frac{\rho }{\alpha })\ll 1\). Hence, by rounding the answers of the oblivious algorithm we ensure that its answers are exactly accurate (rather than approximate). As the oblivious algorithm returns exact answers in this setting, it must also be adversarially robust.^{Footnote 7}
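The alternation rule described in Footnote 7 is a simple hysteresis: switch up when the estimate exceeds the high threshold, switch back down only below the low threshold. A sketch (names and interface are ours), assuming two thresholds with \(T_h = c\cdot T_l\) for some \(c\ge 2\):

```python
def choose_algorithm(prev_mode, estimate, t_low, t_high):
    """Hysteresis rule for alternating between an oblivious algorithm
    ('low') and the robust framework ('high').  Sketch of Footnote 7;
    assumes t_high = c * t_low for some constant c >= 2."""
    if prev_mode == "low" and estimate > t_high:
        return "high"  # value grew: hand over to the robust framework
    if prev_mode == "high" and estimate < t_low:
        return "low"   # value shrank: fall back to the oblivious algorithm
    return prev_mode   # inside the hysteresis band: no transition

print(choose_algorithm("low", 50, 10, 40))   # high
```

The gap between the two thresholds is what bounds the number of transitions: each round trip requires the function value to change by a constant factor, so there can be only \(O(\alpha \cdot \lambda )\) resets.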
4.1 Construction Overview
Our construction builds on the constructions of [10, 16]. At a high level, the structure of our construction is similar to that of [16], but our robustness guarantees are achieved using differential privacy, similarly to [10], and using our new concept of TDE.
Our algorithm can be thought of as operating in phases. At the beginning of every phase, we aggregate the estimates given by our strong trackers with differential privacy, and “freeze” this aggregated estimate as the base value for the rest of the phase. Inside every phase, we privately aggregate (and “freeze”) estimates given by our TDE’s. More specifically, throughout the execution we aggregate TDE’s of different types/levels (we refer to the level that is currently being aggregated as the active level). At any point in time we estimate the (current) value of the target function by summing specific “frozen” differences together with the base value.
We remark that, in addition to introducing the notion of TDE’s, we had to incorporate several modifications to the framework of [16] in order to make it compatible with our TDE’s and with differential privacy. In particular, [16] manages phases by placing fixed thresholds (powers of 2) on the value of the target function; starting a new phase whenever the value of the target function crosses the next power of 2. If, at some point in time, the value of the target function drops below the power of 2 that started this phase, then this phase ends, and they go back to the previous phase. This is possible in their framework because the DE’s of the previous phase still exist in memory and are ready to be used. In our framework, on the other hand, we need to share all of the TDE’s across the different phases, and we cannot go back to “TDE’s of the previous phase” because these TDE’s are now tracking other differences. We overcome this issue by modifying the way in which differences are combined inside each phase.
In Algorithm 1 we present a simplified version of our main construction, including inline comments to improve readability. The complete construction is given in Algorithm RobustDE.
4.2 Analysis Overview
At a high level, the analysis can be partitioned into five components (with one component being significantly more complex than the others). We now elaborate on each of these components. The complete analysis is given in Appendix B.
4.2.1 First Component: Privacy Analysis
In Sect. B.1 we show that our construction satisfies differential privacy w.r.t. the collection of random strings on which the oblivious algorithms operate. Recall that throughout the execution we aggregate (with differential privacy) the outcome of our estimators from the different levels. Thus, in order to show that the whole construction satisfies privacy (using composition theorems) we need to bound the maximal number of times we aggregate the estimates from the different levels. However, we can only bound this number under the assumption that the framework is accurate (in the adaptive setting), and for that we need to rely on the privacy properties of the framework. So there is a bit of circularity here. To simplify the analysis, we add to the algorithm hardcoded caps on the maximal number of times we can aggregate estimates at the different levels. This makes the privacy analysis straightforward. However, we will later need to show that this hardcoded capping “never” happens, as otherwise the algorithm fails.^{Footnote 8} These hardcoded caps are specified by the parameters \(P_j\) (both in the simplified algorithm and in the complete construction), using which we make sure that the estimators at level j are never aggregated (with differential privacy) more than \(P_j\) times.
4.2.2 Second Component: Conditional Accuracy
In Sect. B.2 we show that if the following two conditions hold, then the framework is accurate:
 Condition (1)::

At any time step throughout the execution, at least 80% of the estimators in every level are accurate (w.r.t. the differences that they are estimating).
 Condition (2)::

The hardcoded capping never happens.
This is the main technical part in our analysis; here we provide an oversimplified overview, hiding many of the technicalities. We first show that if Conditions (1) and (2) hold then the framework is accurate. We show this by proving a sequence of lemmas that hold (w.h.p.) whenever Conditions (1) and (2) hold. We now elaborate on some of these lemmas. Recall that throughout the execution we “freeze” aggregated estimates given by the different levels. The following lemma shows that these “frozen” aggregations are accurate (at the moments at which we “freeze” them). This lemma follows almost immediately from Condition (1): if the vast majority of our estimators are accurate, then so is their private aggregation.
Lemma 4.3
(informal version of Lemma B.4) In every time step \(t\in [m]\) in which we compute a value \({\textsf{Z}}_j\) (in Step 7a of Algorithm RobustDE, or Step 4a of the simplified algorithm) it holds that \({\textsf{Z}}_j\) is accurate. Informally, if the current level j is that of the strong trackers, then \(\left| {\textsf{Z}}_j - {\mathcal {F}}(t)\right| < \alpha \cdot {\mathcal {F}}(t)\), and otherwise \(\left| {\textsf{Z}}_j - ({\mathcal {F}}(t) - {\mathcal {F}}(e_j))\right| < \alpha \cdot {\mathcal {F}}(e_j)\), where \(e_j\) is the last enabling time of level j.
During every time step \(t\in [m]\), we test whether the previous output is still accurate (and modify it if it is not). This test is done by comparing the previous output with (many) suggestions we get for the current value of the target function. These suggestions are obtained by summing the outputs of the estimators at the currently active level j together with a (partial) sum of the previously frozen estimates (denoted as \({\textsf{Z}}\)). This is done in Step 7 of Algorithm RobustDE, or in Step 4 of the simplified algorithm. The following lemma, which we prove using Lemma 4.3, states that the majority of these suggestions are accurate (and hence our test is valid).
Lemma 4.4
(informal version of Lemma B.7) Fix a time step \(t\in [m]\), and let j denote the level of active estimators. Then, for at least \(80\%\) of the estimators in level j, summing their output z with \({\textsf{Z}}\) is an accurate estimation of the current value of the target function, i.e., \( \left| {\mathcal {F}}(t) - ({\textsf{Z}} + z) \right| \le \alpha \cdot {\mathcal {F}}(t). \)
So, in every iteration we test whether our previous output is still accurate, and our test is valid. Furthermore, when the previous output is not accurate, we modify it to be \(({\textsf{Z}} + {\textsf{Z}}_j)\), where \({\textsf{Z}}_j\) is the new aggregation (the new “freeze”) of the estimators at level j. So this modified output is accurate (assuming that the hardcoded capping did not happen, i.e., Condition (2), as otherwise the output is not modified). We hence get the following lemma.
Lemma 4.5
(informal version of Lemma B.9) In every time step \(t\in [m]\), the output \(z_t\) of the algorithm satisfies \(\left| z_t - {\mathcal {F}}(t)\right| = O(\alpha )\cdot {\mathcal {F}}(t)\).
That is, the above lemma shows that our output is “always” accurate. Recall, however, that this holds only assuming that Conditions (1) and (2) hold.
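The resulting output rule — keep the previous answer while it passes the test, and re-release only when it drifts — can be illustrated by a toy sketch, where exact values stand in for the private aggregations of Algorithm RobustDE:

```python
def track(values, alpha):
    """Toy freeze-and-test loop: re-release the output only when the
    previously released answer drifts by more than an alpha fraction.
    (Exact values stand in for the private aggregations; this is an
    illustration of the output rule, not Algorithm RobustDE itself.)"""
    released = None
    out = []
    for v in values:
        if released is None or abs(v - released) > alpha * abs(released):
            released = v  # "freeze" a new aggregated estimate
        out.append(released)
    return out

print(track([10, 10.2, 10.5, 12, 12.3, 9], 0.1))  # [10, 10, 10, 12, 12, 9]
```

Note that the output changes only three times even though the input value changes in every step; bounding the number of such output modifications is exactly what the flip number captures.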
4.2.3 Third Component: Calibrating to Avoid Capping
In Sect. B.3 we derive a high probability bound on the maximal number of times we will aggregate estimates at the different levels. In other words, we show that, with the right setting of parameters, we can make sure that Condition (2) holds. The analysis of this component still assumes that Condition (1) holds.
We first show that between every two consecutive times in which we modify our output, the value of the target function must change noticeably. Formally,
Lemma 4.6
(Informal version of Lemma B.12) Let \(t_1< t_2\in [m]\) be consecutive times in which the output is modified (i.e., the output is modified in each of these two iterations, and is not modified between them). Then, \(\left| {\mathcal {F}}(t_2) - {\mathcal {F}}(t_1)\right| =\Omega \left( \alpha \cdot {\mathcal {F}}(t_1) \right) \).
We leverage this lemma in order to show that there cannot be too many time steps during which we modify our output. We then partition these time steps and “charge” different levels j for different times during which the output is modified. This allows us to prove a probabilistic bound on the maximal number of times we aggregate the estimates from the different levels (each level has a different bound). See Lemma B.14 for the formal details.
4.2.4 Fourth Component: The Framework is Robust
In Sect. B.4 we prove that Condition (1) holds (w.h.p.). That is, we show that at any time step throughout the execution, at least 80% of the estimators in every level are accurate.
This includes two parts. First, in Lemma B.16, we show that throughout the execution, the conditions required by our TDE’s hold (specifically, the requirement w.r.t. \(\gamma \) in Definition 3.3). This means that, had the stream been fixed in advance, then (w.h.p.) all of the estimators would be accurate throughout the execution. In other words, this shows that if there were no adversary then (a stronger variant of) Condition (1) holds.
Second, in Lemma B.17 we leverage the generalization properties of differential privacy to show that Condition (1) must also hold in the adversarial setting. This lemma is similar to the analysis of [10].
4.2.5 Fifth Component: Calculating the Space Complexity
In the final part of the analysis, in Sect. B.5, we calculate the total space needed by the framework by accounting for the number of estimators in each level (which is a function of the high probability bound we derived on the number of aggregations done in each level), and the space they require. We refer the reader to Appendix B for the formal analysis.
5 Toggle Difference Estimator from a Difference Estimator
We present a simple method that transforms any difference estimator into a toggle difference estimator. The method works as follows. Let \(\textrm{DE}\) be a difference estimator (given as a subroutine). We construct a \(\textrm{TDE}\) that instantiates two copies of the given difference estimator: \(\textrm{DE}_{\textrm{enable}}\) and \(\textrm{DE}_{\textrm{fresh}}\). It also passes its parameters, apart from the enabling times, verbatim to both copies. As \(\textrm{DE}\) is set to output estimations only after receiving an (online) enabling time e, the \(\textrm{TDE}\) never enables the copy \(\textrm{DE}_{\textrm{fresh}}\). Instead, \(\textrm{DE}_{\textrm{fresh}}\) serves as a fresh copy that has received the needed parameters and the stream \({\mathcal {S}}\), and is therefore always ready to be enabled. Whenever a time t is equal to some enabling time (i.e., \(t=e^i\) for some \(i\in [p]\)), the \(\textrm{TDE}\) copies the state of \(\textrm{DE}_{\textrm{fresh}}\) to \(\textrm{DE}_{\textrm{enable}}\) (running over the same space), and then enables \(\textrm{DE}_{\textrm{enable}}\) to output estimations.
Corollary 5.1
For any function \({\mathcal {F}}\), provided that there exists a \((\gamma ,\alpha ,\delta )\)-Difference Estimator for \({\mathcal {F}}\) with space \(S_{\textrm{DE}}(\gamma ,\alpha ,\delta ,n,m)\), there exists a \((\gamma ,\alpha ,\delta ,p)\)-Toggle Difference Estimator for \({\mathcal {F}}\) with space \( S_{\textrm{TDE}}(\gamma ,\alpha ,\delta ,p,n,m) = 2\cdot S_{\textrm{DE}}(\gamma ,\alpha ,\delta /p,n,m).\)
Note that for a \(\textrm{DE}\) whose space dependency on the failure parameter \(\delta \) is logarithmic, the above construction yields a \(\textrm{TDE}\) with at most a logarithmic blowup in space, resulting from the p enabling times.
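A minimal sketch of this transformation in Python. The `process`/`enable`/`estimate` interface and the exact-sum estimator are hypothetical stand-ins for a real sketch-based difference estimator; they serve only to show the two-copy bookkeeping:

```python
import copy

class ExactSumDE:
    """Toy stand-in for a difference estimator: tracks a running sum
    exactly and, once enabled at time e, reports total - total_at_e."""
    def __init__(self):
        self.total = 0
        self.base = None
    def process(self, delta):
        self.total += delta
    def enable(self):
        self.base = self.total
    def estimate(self):
        return self.total - self.base

class ToggleDE:
    """TDE from two DE copies (sketch of the Sect. 5 transformation)."""
    def __init__(self, make_de):
        self.enabled = make_de()  # the copy that answers queries
        self.fresh = make_de()    # never enabled; cloned on each toggle
    def process(self, delta, toggle=False):
        if toggle:
            # New enabling time: overwrite the enabled copy with the
            # fresh one (reusing its space) and enable it from here on.
            self.enabled = copy.deepcopy(self.fresh)
            self.enabled.enable()
        self.enabled.process(delta)
        self.fresh.process(delta)
    def estimate(self):
        return self.enabled.estimate()

tde = ToggleDE(ExactSumDE)
tde.process(3, toggle=True)  # enabling time e^1
tde.process(2)
print(tde.estimate())        # 5: change since e^1
tde.process(4, toggle=True)  # enabling time e^2
print(tde.estimate())        # 4: change since e^2
```

The fresh copy processes the whole stream but is never enabled, so it is always ready to be cloned; this is why two copies suffice for any number p of (non-overlapping) enabling periods, at the cost of sharpening the failure probability to \(\delta /p\).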
6 Applications
Our framework is applicable to functionalities that admit a strong tracker and a difference estimator. As [16] showed, difference estimators exist for many functionalities of interest in the insertion-only model, including estimating frequency moments of a stream, estimating the number of distinct elements in a stream, identifying heavy hitters in a stream, and entropy estimation. However, as we mentioned, we are not aware of nontrivial DE constructions in the turnstile model. In more detail, [16] presented DEs for the turnstile setting, but these DEs require additional assumptions and do not exactly fit our framework (nor the framework of [16]).
To overcome this challenge we introduce a new monitoring technique which we use as a wrapper around our framework. This wrapper allows us to check whether the additional assumptions required by the DE hold, and reset our system when they do not. As a concrete application, we present the resulting bounds for \(F_2\) estimation.
Definition 6.1
(Frequency vector) The frequency vector of a stream \(S = (\langle s_1, \Delta _1 \rangle ,\dots , \langle s_m, \Delta _m \rangle )\in ([n]{\times }\{\pm 1\})^{m}\) is the vector \(u\in {\mathbb {Z}}^{n}\) whose ith coordinate is \(u[i] = \sum _{j\in [m], s_j = i}{\Delta _j}.\) We write \(u^{(t)}\) to denote the frequency vector of the stream \(S_t\), i.e., restricted to the first t updates. Given two time points \(t_1\le t_2\in [m]\) we write \(u^{(t_1,t_2)}\) to denote the frequency vector of the stream \(S_{t_1}^{t_2}\), i.e., restricted to the updates between time \(t_1\) and \(t_2\).
In this section we focus on estimating \(F_2\), the second moment of the frequency vector. That is, at every time step t, after obtaining the next update \(\langle s_t, \Delta _t \rangle \in ([n]{\times }\{\pm 1\})\), we want to output an estimation for \(F_2(t) = \Vert u^{(t)}\Vert _2^2 = \sum _{i\in [n]}\left( u^{(t)}[i]\right) ^2.\)
Woodruff and Zhou [16] presented a \((\gamma , \alpha , \delta )\)-difference estimator for \(F_2\) that works in the turnstile model, under the additional assumption that for any time point t and enabling time \(e\le t\) it holds that \(\Vert u^{(e,t)}\Vert _2^2 \le \gamma \cdot \Vert u^{(e)}\Vert _2^2.\)
In general, we cannot guarantee that this condition holds in a turnstile stream (see the discussion in Sect. 2). To bridge this gap, we introduce the notion of twist number (see Definition 1.8) in order to control the number of times at which this condition fails to hold (when this condition does not hold we say that a violation has occurred). Armed with this notion, our approach is to run our framework (algorithm RobustDE) alongside a validation algorithm (algorithm Guardian) that identifies time steps at which algorithm RobustDE loses accuracy, meaning that a violation has occurred. We then restart algorithm RobustDE in order to maintain accuracy. As we show, our notion of twist number allows us to bound the total number of possible violations, and hence the number of possible resets. This in turn allows us to bound the space necessary for our complete construction. The details are given in Appendix C; here we only state the result.
Theorem 6.2
There exists an adversarially robust \(F_2\) estimation algorithm for turnstile streams of length m with bounded \((O(\alpha ), m)\)-flip number and \((O(\alpha ), m)\)-twist number, with parameters \(\lambda \) and \(\mu \) respectively, that guarantees \(\alpha \)-accuracy with probability at least \(1-1/m\) at all times \(t\in [m]\), using space complexity of
As we mentioned, this should be contrasted with the result of [10], who obtain space complexity \(\tilde{{\mathcal {O}}}\left( \frac{\sqrt{\lambda }}{\alpha ^2}\right) \) for robust \(F_2\) estimation in the turnstile setting. Hence, our new result is better whenever \(\mu \ll \lambda \).
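The monitoring wrapper can be sketched as follows. This is a toy Python illustration: the interface names and the dummy counter are ours, and the real violation criterion is Guardian's, given in Appendix C:

```python
class Counter:
    """Dummy stand-in for the robust algorithm: counts processed updates."""
    def __init__(self):
        self.n = 0
    def process(self, update):
        self.n += 1
        return self.n

def run_with_guardian(stream, make_robust, is_violation):
    """Restart the robust algorithm whenever a violation is flagged
    (a sketch of the RobustDE + Guardian pairing; hypothetical names)."""
    alg = make_robust()
    resets = 0
    outputs = []
    for update in stream:
        if is_violation(update):
            alg = make_robust()  # reset: the DE's assumption broke
            resets += 1
        outputs.append(alg.process(update))
    return outputs, resets

outs, resets = run_with_guardian([1, 2, 3, 4], Counter, lambda u: u == 3)
print(outs, resets)  # [1, 2, 1, 2] 1
```

Since the twist number \(\mu \) bounds the number of violations, it bounds the number of resets, which is what drives the \(\mu \) term in the space bound of Theorem 6.2.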
Notes
\(F_2\) of a stream is its frequency moment of degree 2, see Definition 3.4.
Note that these time points are not known to the algorithm in advance. Rather, the algorithm needs to discover them “on the fly”. To simplify the presentation, in Sect. 1.4 we assume that these time points are known in advance.
Specifically, in order to reach the estimated value of \({\mathcal {F}}\) at time \(w_{{\tilde{t}}}\) one can add the estimations of difference estimators of levels corresponds to the binary representation of \({\tilde{t}}\). That is, at most one of each level j.
Moshe Shechner and Samson Zhou. Personal communication, 2022.
Here \({\tilde{O}}\) stands for omitting polylogarithmic factors of \(\lambda , \alpha ^{-1}, \delta ^{-1}, n, m\).
For an integer \(n\in {\mathbb {N}}\) we denote \([n]=\{0,1,\dots ,n-1\}\) (that is, \(|[n]| = n\)).
Note that this entails maintaining two algorithms: (1) an oblivious algorithm \({\mathcal {A}}\), to be used when the value of the function is “low”; and (2) our robust algorithm \({\mathcal {B}}\), to be used when the value of the function is “high”. Alternating between these two algorithms can be done as follows: let \(T_l, T_h\in {\mathbb {R}}\) be two thresholds where \(T_h = c\cdot T_l = \Omega (1/\alpha )\) for some constant \(c\ge 2\). Then, the transition from \({\mathcal {A}}\) to \({\mathcal {B}}\) is done when \({\mathcal {A}}\)’s output exceeds \(T_h\), and the transition from \({\mathcal {B}}\) to \({\mathcal {A}}\) is done when \({\mathcal {B}}\)’s output drops below \(T_l\). This simple alternation management ensures that there can be only \(O(\alpha \cdot \lambda )\) framework resets due to transitions, and thus the space complexity of the framework remains as stated even with the alternation cost.
We remark that as the hardcoded capping “never” happens, we can in fact remove it from the algorithm. One way or another, however, we must derive a high probability bound on the number of times we can aggregate estimates at the different levels.
Note that this sequence contains time points at which we modify our output using different levels.
We assume there exists some constant c for which all estimates returned by the oblivious estimators of type \({\textsf{E}}_j\) are within the range \([-n^{c},-1/n^{c}]\cup \{0\} \cup [1/n^{c}, n^{c}]\). Rounding these estimates to their nearest values of \((1\pm \textrm{MuSize}(\alpha ))\) has only a small effect on the error. Such rounding yields, for each range, at most \(X=O(\alpha ^{-1}\log (n))\) possible values.
Calibrating \(\textrm{StepSize}(\alpha ) > 2\cdot \textrm{MuSize}(\alpha )\) ensures that between any two consecutive times at which the output is modified there must be at least one time step.
To see this, consider a stream with prefix frequency vector u and suffix frequency vector w such that the norm of every frequency vector between these two is roughly the norm of u (and so is the norm of \(u+w\)), while the supports of u and \(u+w\) are disjoint.
References
Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58(1), 137–147 (1999)
Mironov, I., Naor, M., Segev, G.: Sketching in adversarial environments. SIAM J. Comput. 40(6), 1845–1870 (2011). https://doi.org/10.1137/080733772
Gilbert, A.C., Hemenway, B., Rudra, A., Strauss, M.J., Wootters, M.: Recovering simple signals. In: 2012 Information theory and applications workshop, pp. 382–391 (2012)
Gilbert, A.C., Hemenway, B., Strauss, M.J., Woodruff, D.P., Wootters, M.: Reusable lowerror compressive sampling schemes through privacy. In: 2012 IEEE statistical signal processing workshop (SSP), pp. 536–539 (2012)
Ahn, K.J., Guha, S., McGregor, A.: Analyzing graph structure via linear measurements. In: SODA, pp. 459–467 (2012). https://doi.org/10.1137/1.9781611973099.40
Ahn, K.J., Guha, S., McGregor, A.: Graph sketches: sparsification, spanners, and subgraphs. In: PODS, pp. 5–14 (2012). https://doi.org/10.1145/2213556.2213560
Hardt, M., Woodruff, D.P.: How robust are linear sketches to adaptive inputs? In: STOC, pp. 121–130 (2013)
Ben-Eliezer, O., Yogev, E.: The adversarial robustness of sampling. In: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI symposium on principles of database systems, pp. 49–62 (2020)
Ben-Eliezer, O., Jayaram, R., Woodruff, D.P., Yogev, E.: A framework for adversarially robust streaming algorithms. ACM J. ACM (JACM) 69(2), 1–33 (2022)
Hassidim, A., Kaplan, H., Mansour, Y., Matias, Y., Stemmer, U.: Adversarially robust streaming algorithms via differential privacy. In: NeurIPS (2020)
Kaplan, H., Mansour, Y., Nissim, K., Stemmer, U.: Separating adaptive streaming from oblivious streaming using the bounded storage model. In: CRYPTO 2021 (2021). arxiv: 2101.10836
Braverman, V., Hassidim, A., Matias, Y., Schain, M., Silwal, S., Zhou, S.: Adversarial robustness of streaming algorithms through importance sampling. Adv. Neural. Inf. Process. Syst. 34, 3544–3557 (2021)
Cohen, E., Lyu, X., Nelson, J., Sarlós, T., Shechner, M., Stemmer, U.: On the robustness of countsketch to adaptive inputs. In: International conference on machine learning, pp. 4112–4140 (2022). PMLR
Cohen, E., Nelson, J., Sarlós, T., Stemmer, U.: Tricking the hashing trick: A tight lower bound on the robustness of countsketch to adaptive inputs. In: Proceedings of the AAAI conference on artificial intelligence, 37, 7235–7243 (2023)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography Conference, pp. 265–284 (2006). Springer
Woodruff, D.P., Zhou, S.: Tight bounds for adversarially robust streams and sliding windows via difference estimators. In: FOCS (2021)
Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A.L.: Preserving statistical validity in adaptive data analysis. In: Proceedings of the fortyseventh annual ACM symposium on theory of computing, pp. 117–126 (2015)
Bassily, R., Nissim, K., Smith, A.D., Steinke, T., Stemmer, U., Ullman, J.R.: Algorithmic stability for adaptive data analysis. SIAM J. Comput. 50(3) (2021)
Jung, C., Ligett, K., Neel, S., Roth, A., SharifiMalvajerdi, S., Shenfeld, M.: A new analysis of differential privacy’s generalization guarantees. In: 11th innovations in theoretical computer science conference (ITCS 2020) (2020). Schloss DagstuhlLeibnizZentrum für Informatik
Hardt, M., Ullman, J.: Preventing false discovery in interactive data analysis is hard. In: 2014 IEEE 55th annual symposium on foundations of computer science, pp. 454–463 (2014). IEEE
Steinke, T., Ullman, J.: Interactive fingerprinting codes and the hardness of preventing false discovery. In: Conference on learning theory, pp. 1588–1628 (2015). PMLR
Nissim, K., Smith, A.D., Steinke, T., Stemmer, U., Ullman, J.: The limits of postselection generalization. In: NeurIPS, pp. 6402–6411 (2018)
Nissim, K., Stemmer, U.: Concentration bounds for high sensitivity functions through differential privacy. J. Priv. Confidentiality 9(1) (2019)
Shenfeld, M., Ligett, K.: A necessary and sufficient stability notion for adaptive generalization. In: NeurIPS, pp. 11481–11490 (2019)
Shenfeld, M., Ligett, K.: Generalization in the face of adaptivity: a Bayesian perspective. CoRR arXiv: 2106.10761 (2021)
Kontorovich, A., Sadigurschi, M., Stemmer, U.: Adaptive data analysis with correlated observations. In: International conference on machine learning, pp. 11483–11498 (2022). PMLR
Gupta, V., Jung, C., Neel, S., Roth, A., SharifiMalvajerdi, S., Waites, C.: Adaptive machine unlearning. arXiv preprint arXiv:2106.04378 (2021)
Beimel, A., Kaplan, H., Mansour, Y., Nissim, K., Saranurak, T., Stemmer, U.: Dynamic algorithms against an adaptive adversary: generic constructions and lower bounds. CoRR arXiv: 2111.03980 (2021)
Dwork, C., Naor, M., Reingold, O., Rothblum, G.N., Vadhan, S.: On the complexity of differentially private data release: efficient algorithms and hardness results. In: Proceedings of the forty-first annual ACM symposium on theory of computing, pp. 381–390 (2009)
Hardt, M., Rothblum, G.N.: A multiplicative weights mechanism for privacypreserving data analysis. In: 2010 IEEE 51st annual symposium on foundations of computer science, pp. 61–70 (2010). IEEE
Beimel, A., Nissim, K., Stemmer, U.: Private learning and sanitization: Pure vs. approximate differential privacy. Theory Comput. 12(1), 1–61 (2016). https://doi.org/10.4086/toc.2016.v012a001
Bun, M., Nissim, K., Stemmer, U., Vadhan, S.: Differentially private release and learning of threshold functions. In: 2015 IEEE 56th annual symposium on foundations of computer science, pp. 634–649 (2015). IEEE
Bun, M., Dwork, C., Rothblum, G.N., Steinke, T.: Composable and versatile privacy via truncated CDP. In: Proceedings of the 50th Annual ACM SIGACT symposium on theory of computing, pp. 74–86 (2018)
Kaplan, H., Ligett, K., Mansour, Y., Naor, M., Stemmer, U.: Privately learning thresholds: Closing the exponential gap. In: Conference on learning theory, pp. 2263–2285 (2020). PMLR
Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: Privacy via distributed noise generation. In: Advances in Cryptology - EUROCRYPT 2006: 24th annual international conference on the theory and applications of cryptographic techniques, St. Petersburg, Russia, May 28 - June 1, 2006. Proceedings 25, pp. 486–503 (2006). Springer
Dwork, C., Lei, J.: Differential privacy and robust statistics. In: Proceedings of the Forty-first Annual ACM symposium on theory of computing, pp. 371–380 (2009)
Dwork, C., Rothblum, G.N., Vadhan, S.: Boosting and differential privacy. In: 2010 IEEE 51st annual symposium on foundations of computer science, pp. 51–60 (2010). IEEE
Thorup, M., Zhang, Y.: Tabulation based 4-universal hashing with applications to second moment estimation. SODA 4, 615–624 (2004)
Author information
Contributions
All authors contributed equally to this manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
This work was originally presented at the ITCS’23 conference.
Appendices
Additional Preliminaries from Differential Privacy
The Laplace Mechanism. The most basic constructions of differentially private algorithms are via the Laplace mechanism as follows.
Definition A.1
(The Laplace distribution) A random variable has probability distribution \(\textrm{Lap}(b)\) if its probability density function is \(f(x)=\frac{1}{2b}\exp \left( -\frac{|x|}{b} \right) \), where \(x\in {\mathbb {R}}\).
Definition A.2
(Sensitivity) A function \(f:X^*\rightarrow {\mathbb {R}}\) has sensitivity \(\ell \) if for every two databases \(S,S^\prime \in X^*\) that differ in one row it holds that \(|f(S)-f(S^\prime )|\le \ell \).
Theorem A.3
(Laplace mechanism [15]) Let \(f:X^*\rightarrow {\mathbb {R}}\) be a sensitivity-\(\ell \) function. The mechanism that on input \(S\in X^*\) returns \(f(S)+\textrm{Lap}(\frac{\ell }{\varepsilon })\) preserves \((\varepsilon ,0)\)-differential privacy.
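As a concrete illustration (a minimal sketch, not the paper's implementation), the mechanism of Theorem A.3 can be written in a few lines of Python; the function `f` and its sensitivity bound are supplied by the caller:

```python
import random

def laplace_noise(scale: float) -> float:
    # Lap(scale) sampled as the difference of two Exp(1/scale) variables
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def laplace_mechanism(data, f, sensitivity: float, eps: float) -> float:
    """Release f(data) + Lap(sensitivity/eps); for a function f of the
    given sensitivity this is (eps, 0)-differentially private (Theorem A.3)."""
    return f(data) + laplace_noise(sensitivity / eps)
```

For example, releasing a counting query (`f = len`, sensitivity 1) adds noise of magnitude roughly \(1/\varepsilon \).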
The sparse vector technique. Consider a large number of lowsensitivity functions \(f_1,f_2,\dots \) which are given (one by one) to a data curator (holding a database S). Dwork et al. [29] presented a simple tool, called AboveThreshold (see Algorithm 2), for privately identifying the first index i such that the value of \(f_i(S)\) is “large”.
Theorem A.4
([29, 30]) Algorithm AboveThreshold is \((\varepsilon ,0)\)-differentially private.
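A minimal Python sketch of AboveThreshold for sensitivity-1 queries may help fix ideas; the noise scales below follow the standard textbook calibration and are illustrative assumptions, not necessarily the exact constants of Algorithm 2:

```python
import random

def lap(scale: float) -> float:
    # sample from Lap(scale)
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def above_threshold(database, queries, threshold: float, eps: float):
    """Report the index of the first query f_i whose (noisy) value
    f_i(database) exceeds the (noisy) threshold, or None if no query
    triggers. Each f_i is assumed to have sensitivity 1."""
    noisy_threshold = threshold + lap(2.0 / eps)
    for i, f in enumerate(queries):
        if f(database) + lap(4.0 / eps) >= noisy_threshold:
            return i  # halt after the first "large" query
    return None
```

The key point, used repeatedly in the analysis, is that the privacy cost is paid only once per reported index, no matter how many "below threshold" queries precede it.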
Privately approximating the median of the data. Given a database \(S\in X^{*}\), consider the task of privately identifying an approximate median of S. Specifically, for an error parameter \(\Gamma \), we want to identify an element \(x\in X\) such that there are at least \(|S|/2-\Gamma \) elements in S that are bigger than or equal to x, and there are at least \(|S|/2-\Gamma \) elements in S that are smaller than or equal to x. The goal is to keep \(\Gamma \) as small as possible, as a function of the privacy parameters \(\varepsilon , \delta \), the database size \(|S|\), and the domain size \(|X|\).
There are several advanced constructions in the literature whose error grows very slowly as a function of the domain size (only polynomially with \(\log ^{*}|X|\)) [31,32,33,34]. In our application, however, the domain size is already small, and hence we can use simpler constructions (where the error grows logarithmically with the domain size).
Theorem A.5
(folklore) There exists an \((\varepsilon ,0)\)-differentially private algorithm that, given a database \(S\in X^{*}\), outputs an element \(x\in X\) such that with probability at least \(1-\delta \) there are at least \(|S|/2-\Gamma \) elements in S that are bigger than or equal to x, and at least \(|S|/2-\Gamma \) elements in S that are smaller than or equal to x, where \(\Gamma = O\left( \frac{1}{\varepsilon }\log \left( \frac{|X|}{\delta } \right) \right) \).
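One standard way to realize the folklore guarantee of Theorem A.5 is the exponential mechanism with a rank-based score; the sketch below (our illustration, with one specific choice of score) attains error \(O(\frac{1}{\varepsilon }\log \frac{|X|}{\delta })\):

```python
import math
import random

def private_median(sample, domain, eps: float):
    """Exponential mechanism for an approximate median over a finite,
    ordered domain. score(x) = -(distance of x from being a median of
    `sample`); this score has sensitivity 1, so sampling x with
    probability proportional to exp(eps * score(x) / 2) is (eps, 0)-DP."""
    def score(x):
        below = sum(1 for s in sample if s < x)
        above = sum(1 for s in sample if s > x)
        return -max(0, max(below, above) - len(sample) // 2)

    weights = [math.exp(eps * score(x) / 2.0) for x in domain]
    r = random.random() * sum(weights)
    for x, w in zip(domain, weights):
        r -= w
        if r <= 0:
            return x
    return domain[-1]
```

Elements far from the median have exponentially small weight, which is where the \(\log (|X|/\delta )\) error term comes from after a union bound over the domain.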
Composition of differential privacy. The following theorems argue about the privacy guarantees of an algorithm that accesses its input database using several differentially private mechanisms.
Theorem A.6
(Simple composition [35, 36]) Let \(0<\varepsilon \le 1\), and let \(\delta \in [0,1]\). A mechanism that permits k adaptive interactions with mechanisms that preserve \((\varepsilon ,\delta )\)-differential privacy (and does not access the database otherwise) ensures \((k\varepsilon , k\delta )\)-differential privacy.
Theorem A.7
(Advanced composition [37]) Let \(0<\varepsilon , \delta ^{\prime } \le 1\), and let \(\delta \in [0,1]\). A mechanism that permits k adaptive interactions with mechanisms that preserve \((\varepsilon ,\delta )\)-differential privacy (and does not access the database otherwise) ensures \((\varepsilon ^{\prime },k\delta +\delta ^{\prime })\)-differential privacy, for \(\varepsilon ^{\prime }=\sqrt{2k\ln (1/\delta ^{\prime })}\cdot \varepsilon + 2k\varepsilon ^2\).
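For intuition, the two composition bounds can be compared numerically; the helper below implements the formulas of Theorems A.6 and A.7 verbatim:

```python
import math

def simple_composition(k: int, eps: float, delta: float):
    """Theorem A.6: k-fold composition of (eps, delta)-DP mechanisms."""
    return k * eps, k * delta

def advanced_composition(k: int, eps: float, delta: float, delta_prime: float):
    """Theorem A.7: eps' = sqrt(2k ln(1/delta')) * eps + 2k * eps^2,
    with failure parameter k * delta + delta'."""
    eps_prime = math.sqrt(2 * k * math.log(1 / delta_prime)) * eps + 2 * k * eps**2
    return eps_prime, k * delta + delta_prime
```

For example, with \(k=10{,}000\) interactions at \(\varepsilon =0.01\), simple composition gives \(\varepsilon ^{\prime }=100\), while advanced composition with \(\delta ^{\prime }=10^{-6}\) gives \(\varepsilon ^{\prime }\approx 7.26\); this \(\sqrt{k}\)-type saving is exactly what the calibration of \(\varepsilon _j\) in the analysis below exploits.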
Generalization properties of differential privacy. Dwork et al. [17] and Bassily et al. [18] showed that if a predicate h is the result of a differentially private computation on a random sample, then the empirical average of h and its expectation over the underlying distribution are guaranteed to be close.
Theorem A.8
([17, 18]) Let \(\varepsilon \in (0,1/3)\), \(\delta \in (0,\varepsilon /4)\) and \(n \ge \frac{1}{\varepsilon ^2}\log (\frac{2\varepsilon }{\delta })\). Let \({\mathcal {A}}: X^{n}\rightarrow 2^{X}\) be an \((\varepsilon ,\delta )\)-differentially private algorithm that operates on a database of size n and outputs a predicate \(h:X\rightarrow \{0,1\}\). Let \({\mathcal {D}}\) be a distribution over X, let S be a database containing n i.i.d. elements from \({\mathcal {D}}\), and let \(h\leftarrow {\mathcal {A}}(S)\). Then
The Formal Analysis
In this section, we present our framework for adversarial streaming, including the full construction (Algorithm RobustDE) and its formal analysis. The following theorem states the requirements of the algorithm and its space complexity for any stream with a \(\lambda \)-upper-bounded flip number:
Theorem B.1
(Framework for Adversarial Streaming - Space) Provided that there exist:

1.
An oblivious streaming algorithm \({\textsf{E}}_{\textrm{ST}}\) for functionality \({\mathcal {F}}\) that guarantees, with probability at least 9/10, that all of its estimates are accurate to within a multiplicative error of \((1\pm \alpha _{\textrm{ST}})\), with space complexity \(S_{\textrm{ST}}(\alpha _{\textrm{ST}}, n,m)\).

2.
For every \(\gamma ,p\) there is a \((\gamma ,\alpha _{\textrm{TDE}},p,\frac{1}{10})\)-\(\textrm{TDE}\) for \({\mathcal {F}}\) using space \(\gamma \cdot S_{\textrm{TDE}}(\alpha _{\textrm{TDE}},p,n,m)\).
Then there exists an adversarially robust streaming algorithm for functionality \({\mathcal {F}}\) such that, for any stream with a bounded flip number \(\lambda _{\alpha /8,m}< \lambda \), with probability at least \(1-\delta \) its output is accurate to within a multiplicative error of \((1\pm \alpha )\) at all times \(t\in [m]\), and its space complexity is
Notations for the Algorithm RobustDE and a High-Level Description. RobustDE utilizes two types of oblivious algorithms for the function \({\mathcal {F}}\): an oblivious strong tracker for \({\mathcal {F}}\) and an oblivious \(\textrm{TDE}\) for \({\mathcal {F}}\), referred to as estimators and denoted by \({\textsf{E}}\). The algorithm instantiates several sets of these two types of estimators, denoted \({\textsf{E}}_{\textrm{ST}}\) for the oblivious strong tracker and \({\textsf{E}}_{\textrm{TDE}}\) for the oblivious \(\textrm{TDE}\). Each such set is instantiated with corresponding parameters and is used for a different purpose, as elaborated next.
At a high level, RobustDE operates in phases. During each phase, it modifies its output ‘\(\text {Output}(t)\)’ \(O(1/\alpha )\) times, with the output remaining unchanged between these modifications. Each such modification is referred to as an output modification, and each is computed by (only) one of the sets of estimators. After each output modification, the next set of estimators is selected; this selection is performed in subroutine ActiveLVL(\(\tau \)). The sets are enumerated by \(j \in [\beta + 1]\), and this enumeration is also referred to as the level of the set. Levels \(0, \dots , \beta -1\) hold sets of type \(\textrm{TDE}\), and level \(\beta \), which we denote \(\textrm{ST}\), holds the set of type \(\textrm{ST}\). Each estimator set j has a corresponding size \({\textsf{K}}_j\), a bound \(P_j\) on the number of output modifications it is associated with, and a privacy parameter \(\varepsilon _j\). The \(\textrm{TDE}\)-level estimators, that is, levels \(j\in [\beta ]\), have an additional parameter \(\gamma _j\) that configures the maximal relative difference the estimators of this level are capable of estimating. As the phase progresses, \(\text {Output}(t)\) is computed by summing estimations saved from previous steps within the current phase, each from a different level of the estimator sets. This summation is performed in subroutine StitchFrozenVals(\(\tau \)). These saved estimations are also referred to as frozen values and denoted \({\textsf{Z}}_j\) for \(j \in [\beta +1]\).
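The bit-indexed summation performed by StitchFrozenVals can be sketched as follows (a minimal illustration of the stitching described above; the names and the representation of the frozen values are our assumptions, not the paper's pseudocode):

```python
def stitch_frozen_vals(tau: int, frozen_vals) -> float:
    """Sum the frozen values Z_j over every level j whose bit is set
    in tau (frozen_vals[j] plays the role of Z_j)."""
    total = 0.0
    for j, z in enumerate(frozen_vals):
        if (tau >> j) & 1:
            total += z
    return total
```

So, for example, an argument with bits 0 and 2 set sums the frozen values of levels 0 and 2 only.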
Concurrently, during each phase, RobustDE monitors an estimate of the difference between the value of \({\mathcal {F}}\) at the beginning of the phase and its value after every update. This monitoring happens in Step 2, using an additional set of estimators of type \(\textrm{ST}\), which we denote by W. This additional set is identical in type and configuration to the set \(\textrm{ST}\) and is used concurrently for monitoring purposes.
Proof structure. Theorem B.1 has five components proved in the following sections.

1.
Privacy analysis. In Sect. B.1, we prove the robustness of each of the estimator sets. Since robustness is achieved via differential privacy, the first component is reasoning about the privacy of the random strings corresponding to each of these sets.

2.
Conditional accuracy. In Sect. B.2, we present our accuracy lemmas, which are proven under the assumption that \(80\%\) of the estimators in each set are accurate at all times \(t \in [m]\).

3.
Output modification bounds. In Sect. B.3, we calculate an upper bound on the number of times an output modification is computed by each set of estimators.

4.
Robustness against adaptive inputs. Using these output modification bounds, in Sect. B.4 we prove that the framework is robust against adaptively chosen inputs and prove the assumed condition (Assumption B.3). This proof of robustness determines the sufficient size of each set, as stated in Theorem B.19.

5.
Space complexity. In Sect. B.5, we calculate the total space complexity of the framework, which is determined by the sizes of the sets calculated above and by the space of the estimators in these sets.
1.1 Privacy Analysis
The following lemma shows that algorithm RobustDE is private w.r.t. the random bitstrings of each of the estimator levels.
Lemma B.2
For a level \(j\in [\beta +1]\cup \{\textrm{W}\}\), let \({\mathcal {R}}_j\) be its corresponding random-bitstrings dataset. Algorithm RobustDE satisfies \((\varepsilon ,\delta ^{\prime })\)-DP w.r.t. the dataset \({\mathcal {R}}_j\) by configuring \(\varepsilon _{j}=O\left( \varepsilon /\sqrt{P_j\log (1/\delta ^{\prime })}\right) \).
Proof sketch
Let us focus on some level \(j\in [\beta +1]\cup \{\textrm{W}\}\). We analyze the privacy guarantees w.r.t. \({\mathcal {R}}_j\) by arguing separately for every sequence of time steps during which we do not modify our output using level j (the sequence ends at a time point at which we do modify the output using level j).^{Footnote 9} Let us denote by \(P_j\) the number of such sequences we allow for each level j, after which algorithm RobustDE stops generating output due to the capping counters. Throughout every such time sequence, we access the dataset \({\mathcal {R}}_j\) via the sparse vector technique and once (at the end of the sequence) using the private median algorithm. We calibrate the privacy parameters of these algorithms to be \(\varepsilon _{j}=O\left( \varepsilon /\sqrt{P_j\log (1/\delta ^{\prime })}\right) \) such that, by using composition theorems across all of the \(P_j\) sequences, our algorithm satisfies \((\varepsilon ,\delta ^{\prime })\)-differential privacy w.r.t. \({\mathcal {R}}_j\). \(\square \)
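The calibration in the proof sketch can be sanity-checked numerically. Fixing one concrete constant (ours, for illustration only) in \(\varepsilon _j = O(\varepsilon /\sqrt{P_j\log (1/\delta ^{\prime })})\), namely \(\varepsilon _j = \varepsilon /\sqrt{8P_j\ln (1/\delta ^{\prime })}\), and plugging it into advanced composition (Theorem A.7) over \(P_j\) sequences indeed keeps the total budget below \(\varepsilon \):

```python
import math

def per_sequence_budget(eps: float, p_j: int, delta_prime: float) -> float:
    # one concrete constant realizing eps_j = O(eps / sqrt(P_j log(1/delta')))
    return eps / math.sqrt(8 * p_j * math.log(1 / delta_prime))

def composed_eps(k: int, eps_j: float, delta_prime: float) -> float:
    # advanced composition (Theorem A.7)
    return math.sqrt(2 * k * math.log(1 / delta_prime)) * eps_j + 2 * k * eps_j**2

eps, p_j, delta_prime = 0.5, 1000, 1e-9
eps_j = per_sequence_budget(eps, p_j, delta_prime)
total = composed_eps(p_j, eps_j, delta_prime)
assert total < eps  # the first term alone is exactly eps / 2
```

With this choice, the \(\sqrt{2k\ln (1/\delta ^{\prime })}\cdot \varepsilon _j\) term equals \(\varepsilon /2\) exactly, and the quadratic term is lower order for small \(\varepsilon _j\).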
1.2 Conditional Accuracy
We first prove the framework's accuracy assuming that throughout the run of algorithm RobustDE, for all \(t\in [m]\), \(80\%\) of the estimations given by the estimators are accurate (each level w.r.t. its accuracy parameter \(\alpha _j\)). This assumption is proved independently in B.18. The following is its formal definition:
Assumption B.3
(Accurate estimations) Fix a time step \(t\in [m]\). Let \(j\in [\beta +1]\cup \{\textrm{W}\}\) be a level of estimators. Recall that \({\textsf{K}}_j\) denotes the number of the estimators in level j, and let \(z^{1}_{j}, \dots , z^{{\textsf{K}}_j}_{j}\) denote the estimations given by these estimators. Then:

1.
For \(j \in \{\beta , \textrm{W}\}\): \(|\{k\in [{\textsf{K}}_j] : |z^{k}_{j}-{\mathcal {F}}(t)|< \alpha _{\textrm{ST}}\cdot {\mathcal {F}}(t) \}| \ge (8/10) {\textsf{K}}_j\)

2.
For \(j < \beta \): \(|\{k\in [{\textsf{K}}_j] : |z^{k}_{j}-({\mathcal {F}}(t) - {\mathcal {F}}(e_j))|< \alpha _{\textrm{TDE}}\cdot {\mathcal {F}}(e_j) \}| \ge (8/10) {\textsf{K}}_j\)
The framework's accuracy is proved in three steps. The first step argues that, given the assumption on the accuracy of the estimations, every frozen value is accurate w.r.t. the function it estimated (Lemma B.4). The second step follows up on the first: focusing on the active level of estimators, call it j, and the value of the function \({\mathcal {F}}\) at the time these estimators were enabled (\({\mathcal {F}}(e_j)\)), combining the frozen values of the relevant levels results in an accurate estimation of \({\mathcal {F}}(e_j)\) (denoted in Step 6 as \({\textsf{Z}}\)). This is achieved by applying Lemma B.4 to each of the frozen values of the relevant levels and accounting for the accumulated error; thus \({\textsf{Z}}\approx {\mathcal {F}}(e_j)\). In addition, the accuracy assumption promises that \(80\%\) of (in particular) the level-j estimators are accurate, so their estimations satisfy \(z_j \approx {\mathcal {F}}(t) - {\mathcal {F}}(e_j)\). Combining these, we get \({\textsf{Z}} + z_j \approx {\mathcal {F}}(t)\), which is established in Lemma B.7. Applying Lemma B.4 once again, for times t at which the estimators of level j were aggregated into the frozen value \({\textsf{Z}}_j\) (Step 7a), yields \({\textsf{Z}}+{\textsf{Z}}_j \approx {\mathcal {F}}(t)\). Observing further that at these steps the output is modified to \({\textsf{Z}}+{\textsf{Z}}_j\), we get that at output-modification steps we guarantee an accuracy more refined than \(\alpha \); see Corollary B.8, in which this observation is elaborated. Finally, using Lemmas B.4 and B.7 and Corollary B.8, we prove that if the estimators are accurate (Assumption B.3), then at all times \(t\in [m]\) before the capping stage the output is accurate (Lemma B.9). We prove an additional lemma in this section, Lemma B.6, which states that during any phase the value of the function \({\mathcal {F}}\) changes (increases or decreases) by at most a constant factor; it is used in the proofs of B.7, B.8, and B.9.
Lemma B.4
(Accuracy of frozen values) Let \(t\in [m]\) be a time step such that

1.
Assumption B.3 holds for every \(t'\le t\).

2.
\(\texttt {NoCapping}{=} \text {True}\) during time t.

3.
Algorithm PrivateMed was activated during time t (on Step 7a).
Let \(j\in [\beta +1]\) be the level of estimators used at time t. Let \({\textsf{Z}}_j\) be the value returned by PrivateMed, and suppose that \( {\textsf{K}}_{j} = \Omega \left( \frac{1}{\varepsilon } \sqrt{P_{j} \cdot \log \left( \frac{1}{\delta ^{\prime }} \right) }\log \left( \frac{P_{j}}{\delta ^{M}\alpha } \log (n)\right) \right) \). Then, with probability at least \(1-\delta ^{M}/P_j\) we have that:

1.
For \(j = \beta \), \(|{\textsf{Z}}_j - {\mathcal {F}}(t)| < \alpha _{\textrm{ST}}\cdot {\mathcal {F}}(t)\text {.}\)

2.
For \(j < \beta \), \(|{\textsf{Z}}_j - ({\mathcal {F}}(t) - {\mathcal {F}}(e_j))| < \alpha _{\textrm{TDE}}\cdot {\mathcal {F}}(e_j)\text {.}\)
Proof
In the case that Step 7a was executed, mechanism PrivateMed was activated on the estimations \(z_j^{1},\dots ,z_j^{{\textsf{K}}_{j}}\) of the level-j estimators to get a new value for \({\textsf{Z}}_{j}\). By Theorem A.5, assuming that^{Footnote 10}
then with probability at least \(1-\delta ^{M}/P_{j}\), Algorithm PrivateMed returns an approximate median \({\textsf{Z}}_{j}\) of the estimations \(z_j^{1},\dots ,z_j^{{\textsf{K}}_{j}}\), satisfying
Since, by Assumption B.3, \((8/10)\cdot {\textsf{K}}_{j}\) of the estimations \(z^{k}\) satisfy the condition \(|z^k - {\mathcal {F}}(t)| < \alpha _{\textrm{ST}} \cdot {\mathcal {F}}(t)\) (or \(|z^k -({\mathcal {F}}(t) - {\mathcal {F}}(e_j))| <\alpha _{\textrm{TDE}}\cdot {\mathcal {F}}(e_j)\), respectively), the approximate median \({\textsf{Z}}_j\) must also satisfy this condition. \(\square \)
Definition B.5
(Good execution) Throughout the execution of algorithm RobustDE, for each \(j\in [\beta +1]\cup \{\textrm{W}\}\) the algorithm draws at most 4m noises from the Laplace distribution with parameter \(\varepsilon _j\). In addition, denote by \(P_j\), for \(j\in [\beta +1]\), the number of times that algorithm RobustDE activates \(\texttt {PrivateMed}\) on estimations of level j. Denote \(\delta ^{N}=\delta /(4\cdot (\beta +2))\) and \(\delta ^M = \delta /(4\cdot (\beta +1))\). Set the algorithm's parameters as follows: \( {\textsf{K}}_{j} = \Omega \left( \frac{1}{\varepsilon _j} \log \left( \frac{P_{j}}{\delta ^{M}\alpha } \log (n)\right) \right) \) and \(\varepsilon _j = O\left( \varepsilon /\sqrt{P_{j} \cdot \log \left( \frac{1}{\delta ^{\prime }} \right) } \right) \). We define a good execution as follows:

1.
All noises for all types \(j\in [\beta +1]\cup \{\textrm{W}\}\) are at most \(O\left( \frac{1}{\varepsilon _j}\log \left( \frac{ m}{\delta ^{N}}\right) \right) \) in absolute value.

2.
For all \(j\in [\beta +1]\), all first \(P_j\) frozen values of level j are accurate. That is, if t is the time of the frozen value computation then:

For \(j = \beta \), \(|{\textsf{Z}}_j - {\mathcal {F}}(t)| < \alpha _{\textrm{ST}}\cdot {\mathcal {F}}(t)\text {.}\)

For \(j < \beta \), \(|{\textsf{Z}}_j - ({\mathcal {F}}(t) - {\mathcal {F}}(e_j))| < \alpha _{\textrm{TDE}}\cdot {\mathcal {F}}(e_j)\text {.}\)

We configure the level-j Laplace noise with parameter \(\varepsilon _j\). By the properties of the Laplace distribution, with probability at least \(1-\delta /4\), all noises for all types \(j\in [\beta +1]\cup \{\textrm{W}\}\) are at most \(\frac{4}{\varepsilon _j}\log \left( \frac{4m}{\delta ^{N}}\right) \) in absolute value. By Lemma B.4, the second requirement of Definition B.5 occurs with probability at least \(1-\delta /4\). Together, these imply a good execution with probability at least \(1-\delta /2\). We continue the analysis assuming a good execution (Definition B.5).
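The first condition of Definition B.5 is a union bound over Laplace tails: since \(\Pr [|\textrm{Lap}(b)|>t]=e^{-t/b}\), a noise of scale \(1/\varepsilon _j\) exceeds \(\frac{1}{\varepsilon _j}\ln (\frac{4m}{\delta ^{N}})\) with probability exactly \(\delta ^{N}/(4m)\), so all 4m draws are bounded except with probability \(\delta ^{N}\) (the factor-4 slack in the threshold above only helps). A quick numerical check:

```python
import math

def laplace_tail(scale: float, t: float) -> float:
    """Pr[|Lap(scale)| > t] = exp(-t / scale), for t >= 0."""
    return math.exp(-t / scale)

def union_bound_failure(eps_j: float, m: int, delta_n: float) -> float:
    """Probability bound that any of 4m draws of Lap(1/eps_j) exceeds
    the threshold (1/eps_j) * ln(4m / delta_n), by a union bound."""
    threshold = (1.0 / eps_j) * math.log(4 * m / delta_n)
    return 4 * m * laplace_tail(1.0 / eps_j, threshold)
```

Plugging in any \(\varepsilon _j\) and m, the bound collapses to \(\delta ^{N}\), which is what the union-bound step in the text uses.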
Max phase progress. Algorithm RobustDE includes a mechanism that guarantees a maximal progress of a phase (Step 2). A phase starts whenever \(\tau [\beta ]\) is set to 0 (in either Step 2 or Step 7e). Denoting by \(t_p\) the time a phase started, this mechanism guarantees that at any time t throughout the phase, the ratio between the values of the function at times \(t_p\) and t is roughly bounded from above by \(\Gamma \) and from below by \(\Gamma ^{-1}\). This gives a bound of \(\Theta (\Gamma ^2)\) on the ratio between the values of the function at any two times within the same phase. The following lemma states this bound formally.
Lemma B.6
(Max phase progress) Let \(t_1< t_2\in [m]\) be time steps such that

1.
Assumption B.3 holds for every \(t'\le t_2\).

2.
\(\tau \ne 0\) for every \(t_1<t'\le t_2\).
Then, for any such \(t_1, t_2 \in [m]\), assuming \({\textsf{K}}_{\textrm{W}} = \Omega \left( \sqrt{P_{\textrm{W}}\cdot \log \left( \frac{1}{\delta ^{\prime }}\right) }\cdot \log \left( \frac{m}{\delta ^{N}}\right) \right) \), \(\delta ^{N}=O\left( \frac{\delta }{\log (\alpha ^{-1})} \right) \), and \(P_{\textrm{W}} = O(\alpha \cdot \lambda )\), and assuming a good execution (see Definition B.5), we have
where \(\alpha _{\textrm{ST}}\le 1\) is the accuracy parameter of the estimators and \(\Gamma \) is some constant.
Proof
Denote by \(t_p\) the time the phase started (that is, the time at which \(\tau \) was set to 0). We first bound the ratio between \({\mathcal {F}}(t_p)\) and \({\mathcal {F}}(t)\) for a time t in the same phase s.t. \({\mathcal {F}}(t_p)\le {\mathcal {F}}(t)\) (the case \({\mathcal {F}}(t_p)\ge {\mathcal {F}}(t)\) is similar). Since \(\tau \) was not set to 0, the condition in Step 2 was not triggered. That is:
where the first inequality holds in a good execution (Definition B.5) and the last inequality follows by asserting that
where \(P_{\textrm{W}}\) is the number of times the condition in Step 2 may trigger during the run. Note that for a stream \({\mathcal {S}}\) with flip number \(\lambda _{\alpha ^{\prime }}({\mathcal {S}})\), there are at most \(P_{\textrm{W}} = O(\alpha \cdot \lambda _{\alpha ^{\prime }}({\mathcal {S}}) )\) times at which the function value changes by a constant factor. So, for at least \(4{\textsf{K}}_{\textrm{W}}/10\) of the estimations \(z_{k}^{\textrm{W}}\) we have that \(z_{k}^{\textrm{W}} < \Gamma \cdot {\textsf{Z}}_{\textrm{ST}}\), and we have:
where (*) follows from Assumption B.3. Similarly, for times t s.t. \({\mathcal {F}}(t_p)\ge {\mathcal {F}}(t)\), we get:
Overall, for any times \(t_1, t_2\) that belong to the same phase, Equations 5 and 6 give:
\(\square \)
Lemma B.7
(Estimation error) Let \(t\in [m]\) be a time step such that

1.
Assumption B.3 holds for every \(t'\le t\).

2.
\(\texttt {NoCapping}{=} \text {True}\) during time t.
Let \(j\in [\beta +1]\) be the level of estimators used in time step t, and let \({\textsf{Z}}\) be the value computed in Step 6. Let \(z^{1}_{j}, \dots , z^{{\textsf{K}}_j}_{j}\) denote the estimations given by the estimators in level j. Then assuming a good execution (see Definition B.5), for at least \(80\%\) of the indices \(k\in [{\textsf{K}}_j]\) we have
where \(\alpha _{\textrm{Stitch}} = \Gamma \cdot (\alpha _{\textrm{ST}} + \beta \cdot \alpha _{\textrm{TDE}})\).
Proof
The estimation offset \({\textsf{Z}}\) is computed differently in the cases \(j = \beta \) and \(j < \beta \): the first is an offset of the strong tracker, and the second is an offset of a TDE of some level. We treat the two cases separately:
Case \(j<\beta \). In Step 6, \({\textsf{Z}}\) is computed using the subroutine StitchFrozenVals(\(\tau \)), with the parameter \(\tau - 2^{j-1}+1\). Since \(\texttt {StitchFrozenVals}\) sums the frozen values whose indices correspond to the bits of the parameter that are set to 1, this parameter results in summing the frozen values corresponding to levels \(j^{\prime }>j\), where j is the active level. So \({\textsf{Z}}\) consists of the frozen values of levels \(j^{\prime }>j\). These are the levels that composed the output at the time level j was enabled, that is, at time \(e_j\). Thus, adding an estimation from level j (one of the estimations \(\{z_j^{k}\}_{k\in [{\textsf{K}}_j]}\)) to the value \({\textsf{Z}}\) yields the current internal estimation of the value \({\mathcal {F}}(t)\). In order to bound \(|{\mathcal {F}}(t) - ({\textsf{Z}} + z_j^{k})|\), we break the value \({\mathcal {F}}(t)\) into a telescoping series of differences, where each difference corresponds to the freezing and enabling times of a certain level \(j^{\prime }\) among the frozen levels that compose \({\textsf{Z}}\). Let \(J_{{\textsf{Z}}}\) be the set of indices of these levels, and denote their order by \(j_1> j_2 > \dots \) (therefore, if \(J_{{\textsf{Z}}}\ne \emptyset \), then \(j_1 = \beta \), which is the level of the strong tracker).
where (1) holds by noting that for levels \(j_1 > j_2\), at the time that \(j_1\) was frozen, \(j_2\) was enabled (see Step 7c), thus \(f_{j_1} = e_{j_2}\); (2) is by reordering the terms; and (3) is by renaming index \(j_1\) as ST.
We now plug this alternative formulation of \({\mathcal {F}}(t)\) into the following:
where inequality (1) uses Assumption B.3 directly on the rightmost difference term, while the other differences are due to the accuracy of the frozen values promised in a good execution (Definition B.5); (2) is due to the ratio bound \(\Gamma \), promised by Lemma B.6, between the function values at any two times from the same phase; (3) is due to the fact that \(|J_{{\textsf{Z}}}\setminus \{\textrm{ST}\}| \le \beta -1\), since \(j \notin J_{{\textsf{Z}}}\). The last equality is by denoting \(\alpha _{\textrm{Stitch}} = \Gamma \cdot (\alpha _{\textrm{ST}} + \beta \cdot \alpha _{\textrm{TDE}})\).
Case \(j=\beta \). In Step 5, \({\textsf{Z}}\) is set to 0, so directly from Assumption B.3 we have:
\(\square \)
So far we have shown that, given that Assumption B.3 holds, in Step 6 we have a bound on the estimation error of at least 8/10 of the level-j estimations of \({\mathcal {F}}(t)\) (Lemma B.7), and that whenever mechanism \(\texttt {PrivateMed}\) is activated (in Step 7a), its output for level j, \({\textsf{Z}}_j\), is accurate (Lemma B.4). Combining these lemmas results in the following corollary:
Corollary B.8
(Accuracy on output modification) Let \(t\in [m]\) be a time step such that

1.
Assumption B.3 holds for every \(t'\le t\).

2.
\(\texttt {NoCapping}{=} \text {True}\) during time t.

3.
Algorithm PrivateMed was activated during time t (on Step 7a).
Then assuming a good execution (see Definition B.5) we have
where \(\alpha _{\textrm{Stitch}} = \Gamma \cdot (\alpha _{\textrm{ST}} + \beta \cdot \alpha _{\textrm{TDE}})\).
Proof
During an output-modification step, we update the value of \(\tau \). Denote by \(\tau ^{\text {pre}}, \tau ^{\text {post}}\) the values of \(\tau \) before and after this update.
Case \(j < \beta \). For the case that the active level that was frozen is a \(\textrm{TDE}\) level, we look at the frozen values after mechanism \(\texttt {PrivateMed}\) was activated in Step 7a. Let \(J_{{\textsf{Z}}}\) be the set of indices of the levels of the frozen values that compose \({\textsf{Z}}\) (computed by StitchFrozenVals(\(\tau \)) in Step 6). Then:
where (1) is true for modifications of levels smaller than \(\textrm{ST}\), since the values composing \({\textsf{Z}}\) are of levels that did not change after updating \(\tau \); (2) holds in a good execution (Definition B.5); (3) is by Lemma B.6; (4) is due to the fact that \(|J_{{\textsf{Z}}}\setminus \{\textrm{ST}\}| \le \beta -1\), since \(j\notin J_{{\textsf{Z}}}\).
Case \(j = \beta \). For the case that the active level that was frozen is the \(\textrm{ST}\) level, the output is \({\textsf{Z}}_{\textrm{ST}}\), which is accurate in a good execution (Definition B.5). We have
\(\square \)
The following lemma argues about the output accuracy at all times \(t\in [m]\) (not only at output-modification steps).
Lemma B.9
(Output accuracy) Let \(t\in [m]\) be a time step such that

1.
Assumption B.3 holds for every \(t'\le t\).

2.
\(\texttt {NoCapping}{=} \text {True}\) during time t.
Then assuming a good execution (see Definition B.5) we have
provided that \(\alpha _{\textrm{ST}} = O(\alpha )\), \(\alpha _{\textrm{TDE}} = O(\alpha /\log (\alpha ^{-1}))\) and for all \(j\in [\beta +1]\cup \{\textrm{W}\}\), \({\textsf{K}}_{j} = \Omega \left( \frac{1}{\varepsilon _j}\log \left( \frac{m}{\delta ^{N}}\right) \right) \).
Proof
We consider two types of execution steps: a step without an output modification, and a step that generates an output modification.
Case 1 (no output modification): If at time t we do not modify the output (the condition in Step 7 was not satisfied), then a good execution implies a bounded noise magnitude, and we have that:
where the last inequality follows by asserting that
So, for at least \(4{\textsf{K}}_{j}/10\) of the estimations \(z_{j}^k\) we have that \(|({\textsf{Z}}+z_{j}^{k}) - \textrm{Output}(t-1)| \le {\textsf{Z}}_{\textrm{ST}}\cdot \textrm{StepSize}(\alpha )\). On the other hand, by the assumption on the accuracy of the estimators (Assumption B.3), the requirement of Lemma B.7 is met; therefore, for at least \(8{\textsf{K}}_{j}/10\) of the estimations \(z_{j}^k\) we have that \(|{\mathcal {F}}(t) - ({\textsf{Z}} + z^{k}_j)| \le \alpha _{\textrm{Stitch}}\cdot {\mathcal {F}}(t)\). Therefore, there must exist an index k that satisfies both conditions, and so:
where (1) is due to Lemma B.6, and (2) holds for \(\alpha _{\textrm{Stitch}}\le \frac{1}{10}\textrm{StepSize}(\alpha )\) and \(\textrm{StepSize}(\alpha )\le (2\Gamma )^{-1}\alpha \), which imply \(\alpha _{\textrm{Stitch}} \le \alpha /(20\Gamma )\). Since \(\alpha _{\textrm{Stitch}} = \Gamma \cdot (\alpha _{\textrm{ST}} + \beta \alpha _{\textrm{TDE}})\), it is sufficient to set \(\alpha _{\textrm{ST}} = O(\alpha )\) and \(\alpha _{\textrm{TDE}} = O(\alpha /\beta ) = O(\alpha /\log (\alpha ^{-1}))\) to get \(|{\mathcal {F}}(t) - \textrm{Output}(t-1)|\le \frac{3}{4}\alpha \). Therefore, the output remains accurate when it is not updated.
Case 2 (an output modification): If at time t we do modify the output (the condition in Step 7 was satisfied), then Algorithm PrivateMed was activated during time t (in Step 7a). In that case, the requirements of Corollary B.8 are met and we have (for \(\alpha _{\textrm{Stitch}} \le \alpha \)):
\(\square \)
1.3 Calibrating to Avoid Capping
In this section we calculate the calibration of the parameters \(P_j\) of RobustDE needed in order to avoid capping before the input stream ends. To avoid capping, we need to calibrate a sufficient privacy budget for each of the estimator levels. That budget is derived from the number of output modifications associated with each of these levels. At a high level, the calculation of these numbers per level is done as follows: recall that our framework operates in phases. In each phase, we bound the number of output modifications for each of the estimator levels \(j\in [\beta +1]\cup \{\textrm{W}\}\). In addition, we also bound the total number of phases. The total number of output modifications associated with each level is then obtained by multiplying these bounds. This calculation is analyzed w.r.t. the framework's level-selection management (subroutine ActiveLVL(\(\tau \)) and the state of \(\tau \)). The following definition captures the number of output modifications we wish to bound w.r.t. an input stream for algorithm RobustDE:
Definition B.10
For every level \(j\in [\beta +1]\cup \{\textrm{W}\}\) and every time step \(t\in [m]\), let \(C_j(t)\) denote the number of time steps \(t'\le t\) during which

1.
Level j was selected.

2.
The output was modified.
The lemma that bounds these quantities is Lemma B.14, and it is the main lemma of this section. A central part of that lemma is upper bounding the number of output modifications made by algorithm RobustDE on a given stream segment, for which Lemma B.12 is useful. An additional lemma is needed: a phase can also be terminated before its predefined length (i.e., PhaseSize) by a phase reset, and so in the analysis we focus on the stream segments between such resets. The analysis of these segments requires a bound on their flip number, which is provided by Lemma B.11:
Lemma B.11
(flip number of a substream) Let \({\mathcal {S}}\) be a stream with \((\alpha ,m)\)-flip number \(\lambda _{{\mathcal {S}}}\). Let \({\mathcal {T}}= \{t_i\}_{i\in [\lambda _{{\mathcal {S}}}]}\), \(t_i < t_{i+1}\), \(t_i \in [m]\) be a set of time steps s.t. for all \(i\in [\lambda _{{\mathcal {S}}}]\), \(\vert {\mathcal {F}}(t_{i}) - {\mathcal {F}}(t_{i+1})\vert \ge \alpha \cdot {\mathcal {F}}(t_{i})\). Let \({\mathcal {P}}\) be the substream of \({\mathcal {S}}\) from time \(r_1\) to time \(r_2>r_1\), \(r_1, r_2 \in [m]\). Let \(\lambda ^{\prime } = \vert \{j\in [\lambda _{{\mathcal {S}}}] : r_1\le t_j<r_2\}\vert \) and let \(\lambda _{{\mathcal {P}}}\) be the \((\alpha , m)\)-flip number of \({\mathcal {P}}\). Then:
Proof
Fix a stream \({\mathcal {S}}\) with \((\alpha , m)\)-flip number \(\lambda _{{\mathcal {S}}}\), and fix some set of time steps \({\mathcal {T}}= \{t_i\}_{i\in [\lambda _{{\mathcal {S}}}]}\), \(t_i < t_{i+1}\), \(t_i \in [m]\), s.t. for all \(i\in [\lambda _{{\mathcal {S}}}]\), \(\vert {\mathcal {F}}(t_{i}) - {\mathcal {F}}(t_{i+1})\vert \ge \alpha \cdot {\mathcal {F}}(t_{i})\) with respect to \({\mathcal {S}}\). We consider the time steps from \({\mathcal {T}}\) that reside in \([r_1, r_2)\), that is \({\mathcal {T}}\cap [r_1, r_2)\). Any set of times \(t_j\in [r_1,r_2)\), \(t_j < t_{j+1}\), with \(\vert {\mathcal {F}}(t_{j}) - {\mathcal {F}}(t_{j+1})\vert \ge \alpha \cdot {\mathcal {F}}(t_{j})\) has size at most \(\vert {\mathcal {T}}\cap [r_1, r_2)\vert \). Otherwise, it could be combined with \({\mathcal {T}}\setminus [r_1, r_2)\) to construct a set of \(\alpha \)-jump times in \({\mathcal {S}}\) larger than \(\lambda _{{\mathcal {S}}}\), contradicting the maximality of the flip number of \({\mathcal {S}}\). That is:
Now, denote the first such time by \(t_f = \min \{t_j \in {\mathcal {T}}\cap [r_1, r_2)\}\) and the last by \(t_l = \max \{t_j \in {\mathcal {T}}\cap [r_1, r_2)\}\). Then we can have at most two additional \(\alpha \)-jumps: one from time \(r_1\) to time \(t_f\), and the second from time \(t_l\) to time \(r_2-1\) (regardless of the choice of \({\mathcal {T}}\)). That is:
\(\square \)
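The greedy \(\alpha \)-jump counting used in this proof can be illustrated with a short sketch. The helper below is hypothetical (not part of RobustDE): it counts a greedy chain of \(\alpha \)-jumps over a concrete sequence of function values, moving an anchor forward each time the value changes by an \(\alpha \) fraction of the anchor.

```python
def flip_number(values, alpha):
    """Greedy count of alpha-jumps: advance the anchor each time the
    current value differs from the anchor by at least alpha * anchor."""
    if not values:
        return 0
    jumps = 0
    anchor = values[0]
    for v in values[1:]:
        if abs(v - anchor) >= alpha * anchor:
            jumps += 1
            anchor = v
    return jumps
```

For example, on the doubling sequence `[1, 2, 4, 8]` with \(\alpha = 1/2\) every step is a jump, giving flip number 3; consistent with the lemma, the substream `[2, 4]` has flip number 1, which is at most the number of jump times falling inside the window plus two.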
Lemma B.12
(Function value progress between output modifications) Let \(t_1< t_2\in [m]\) be consecutive times at which the output is modified (i.e., the output is modified in each of these two iterations, and is not modified between them), where

1.
Assumption B.3 holds for every \(t'\le t_2\).

2.
\(\texttt {NoCapping}{=} \text {True}\) during time \(t_2\).

3.
\(\tau \ne 0\) during time \(t_2\).
Then, assuming a good execution (see Definition B.5) we have:

\(\left| {\mathcal {F}}(t_2) - {\mathcal {F}}(t_1)\right| \ge \textrm{StepSize}(\alpha )\cdot {\textsf{Z}}_{\textrm{ST}} - 2\cdot \alpha _{\textrm{Stitch}}\cdot \max \{{\mathcal {F}}(t_1), {\mathcal {F}}(t_2)\}\)

\(\left| {\mathcal {F}}(t_2) - {\mathcal {F}}(t_1)\right| \le \textrm{StepSize}(\alpha )\cdot {\textsf{Z}}_{\textrm{ST}} + (2\cdot \alpha _{\textrm{Stitch}} + \textrm{MuSize}(\alpha ))\cdot \max \{{\mathcal {F}}(t_1), {\mathcal {F}}(t_2)\}\)
Proof
The upper and lower bounds on the change in the function value between such times are analysed separately: first the lower bound on the progress, and then the upper bound.
Minimum progress. We consider time \(t_2\). Let j be the level of estimators used at time \(t_2\). At that time we modify the output, which means that the condition in Step 7 was satisfied. In this case, assuming a good execution, we have bounded noise. That means that by asserting that \( {\textsf{K}}_{j} = \Omega \left( \frac{1}{\varepsilon } \sqrt{P_{j} \cdot \log \left( \frac{1}{\delta ^{\prime }} \right) }\log \left( \frac{m}{\delta ^{N}}\right) \right) \text {,} \) at least \(40\%\) of the estimations of level j admit \(\vert ({\textsf{Z}}+z_{j}^{k}) - \textrm{Output}(t_2-1)\vert \ge {\textsf{Z}}_{\textrm{ST}}\cdot \textrm{StepSize}(\alpha )\). Since Assumption B.3 holds, the requirements of Lemma B.7 are met and we have for at least \(80\%\) of these estimations: \(\vert {\mathcal {F}}(t_2) - ({\textsf{Z}} + z^{k}_j)\vert \le \alpha _{\textrm{Stitch}}\cdot {\mathcal {F}}(t_2)\). Thus at least one index k admits both inequalities, and therefore:
Now, observe that since there was no output modification between times \(t_1, t_2\), we have \(\textrm{Output}(t_2 - 1) = \textrm{Output}(t_1)\). Applying Lemma B.8 at time \(t_1\) (and setting \(\textrm{Output}(t_1)\leftarrow \textrm{Output}(t_2-1)\)) we get:
Combining Equations 7 and 8 we get:
Maximum progress. We now focus on times \(t_2\) and \(t_2-1\). Let j be the level of estimators used at time \(t_2-1\). Since at time \(t_2-1\) we did not modify the output, the condition in Step 7 did not trigger. That means that for at least \(40\%\) of the estimations \(z^{k}_j\) of level j the following holds: \( \vert ({\textsf{Z}}+z^{k}_j) - \textrm{Output}(t_2-1)\vert \le {\textsf{Z}}_{\textrm{ST}} \cdot \textrm{StepSize}(\alpha ) \). Since in addition the output did not change between \(t_1\) and \(t_2-1\) (thus \(\textrm{Output}(t_1) = \textrm{Output}(t_2-1)\)), by applying Corollary B.8 at time \(t_1\) we have that the output at time \(t_1\) was \(\alpha _{\textrm{Stitch}}\)-accurate. And so we get that for at least the same \(40\%\) of the estimations used at time \(t_2-1\) the following holds:
Now, applying Lemma B.7 at time \(t_2-1\) (since Assumption B.3 holds) we get that for at least \(80\%\) of the estimations \(z^{k}_j\) of level j the following holds: \( \vert ({\textsf{Z}} + z^{k}_j) - {\mathcal {F}}(t_2-1)\vert \le \alpha _{\textrm{Stitch}} \cdot {\mathcal {F}}(t_2-1)\). Recalling the maximum update size assumption (see Condition 2), we get a bound on the change of the value of the function \({\mathcal {F}}\) between the adjacent times \(t_2\) and \(t_2-1\): \( \vert {\mathcal {F}}(t_2) - {\mathcal {F}}(t_2-1)\vert \le \textrm{MuSize}(\alpha ) \cdot {\mathcal {F}}(t_2)\). And so, for at least \(80\%\) of the estimations \(z^{k}_j\) used at time \(t_2-1\) the following holds:
Equations 9 and 10 hold for \(40\%\) and \(80\%\) of the estimations \(z^{k}_j\) at time \(t_2-1\), respectively. And so, for at least one of these estimations both equations hold and we get:
\(\square \)
The following lemma uses Lemmas B.11 and B.12 to bound the total number of output modifications \(C_j\) for each estimator level \(j\in [\beta +1]\cup \{\textrm{W}\}\).
Remark B.13
Recall that once \(\texttt {NoCapping}{=} \text {False}\), then the output never changes. Therefore, if during some time \({\hat{t}}\) we have that \(\texttt {NoCapping}{=} \text {False}\), then \(C_j({\hat{t}})=C_j({\hat{t}}+1)\).
Lemma B.14
(Output modifications of each level) Let \({\mathcal {S}}\) be the input stream of length m for algorithm RobustDE with a flip number \(\lambda _{\alpha ^{\prime }}({\mathcal {S}})\) and let \(t\in [m]\) be a time step such that Assumption B.3 holds for every \(t'\le t\). Then, assuming a good execution (see Definition B.5), for every level \(j\in [\beta +1]\cup \{\textrm{W}\}\) we have
where \(\alpha ^{\prime } = (1/2)\cdot \textrm{StepSize}(\alpha ) = O(\alpha )\).
Proof
We bound the number of output modifications for each level by bounding the number of phases and then multiplying it by the number of output modifications of each level within a phase. The latter is done in the last part of the proof, while bounding the number of phases is the main part of the proof.
Bounding the number of phases. Whenever a phase starts, the previous phase is terminated. We elaborate on the two cases of phase termination and count them separately. A phase starts whenever the ST-level estimators (i.e., level \(\beta \)) are selected in ActiveLVL(\(\tau \)). That happens whenever \(\tau [\beta ] = 0\), which occurs in two cases:

(C1):
A phase end: \(\tau \ne 0, \tau [\beta ]=0\). Step 7g was executed on the previous time step.

(C2):
A phase reset: \(\tau =0\). The condition in Step 2 is True.
And so, in (C1) the previous phase reached its end, while in (C2) the previous phase was terminated before its end due to a phase reset.
Number of phase resets. When the condition in Step 2 is True, the value of the target function \({\mathcal {F}}\) has changed by a constant multiplicative factor \(\Gamma \) (\(\Gamma \ge 2\)) compared to its value at the beginning of the terminated phase. By the assumption on the flip number of \({\mathcal {S}}\), this can happen at most \(O(\alpha \cdot \lambda _{\alpha ', m}({\mathcal {S}}) )\) times. That is, the number of phase resets is bounded by:
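A heuristic derivation of this bound (a sketch relying on the bounded-update assumption, Condition 2, rather than a verbatim step of the proof): each reset requires \(\mathcal {F}\) to cross a multiplicative factor of \(\Gamma \ge 2\), and under bounded per-step updates such a crossing contains at least \(\log _{1+\alpha '}(\Gamma ) = \Omega (1/\alpha ')\) distinct \(\alpha '\)-jumps, hence

```latex
\#\{\text{phase resets}\}
  \;\le\; \frac{\lambda_{\alpha',m}(\mathcal{S})}{\log_{1+\alpha'}(\Gamma)}
  \;=\; O\!\left(\alpha' \cdot \lambda_{\alpha',m}(\mathcal{S})\right)
  \;=\; O\!\left(\alpha \cdot \lambda_{\alpha',m}(\mathcal{S})\right),
```

using \(\log _{1+\alpha '}(\Gamma ) = \ln \Gamma /\ln (1+\alpha ') = \Omega (1/\alpha ')\) for \(\Gamma \ge 2\), and \(\alpha ' = (1/2)\cdot \textrm{StepSize}(\alpha ) = O(\alpha )\).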
Number of output modifications between resets. We bound the number of output modifications between two consecutive times at which a phase reset was executed (C2). Denote two such consecutive times where \(\tau =0\) by \(r_{i}<r_{i+1}\), and let \({\mathcal {S}}_i\) be the segment of \({\mathcal {S}}\) for the times \([r_{i}, r_{i+1})\), with \(\alpha ^{\prime }\)-flip number \(\lambda _{\alpha ^{\prime }}({\mathcal {S}}_i)\). We bound the number of output modifications in \([r_{i}, r_{i+1})\) by looking at two consecutive time steps at which the output is modified, \(t_1 < t_2\), s.t. \(r_{i}\le t_1< t_2 < r_{i+1}\). That is, the output is modified at times \(t_1,t_2\) and is not modified between them. Then we have
(1) is by Lemma B.12, (2) holds due to the ratio checked in Step 2 (thus the ratio holds at any time of that phase), and (3) holds whenever \(\alpha _{\textrm{ST}} \le 1/3\) and \(\alpha _{\textrm{Stitch}}\le (1/12\Gamma )\cdot \textrm{StepSize}(\alpha )\). That is, at every such output modification the true value of the target function changes by a multiplicative factor of at least \((1\pm \alpha ')\). Thus, for every segment \({\mathcal {S}}_i\), algorithm RobustDE can have at most \(\lambda _{\alpha ^{\prime }}({\mathcal {S}}_{i})\) such output modifications. Since in every such segment we have a single phase reset, and a single additional output modification that results from it, we have:
Output modifications in a phase. We now show that, by the way algorithm RobustDE manages phase start/end times, a phase that ends without a reset termination (that is, case (C1)) has \(O(\textrm{PhaseSize})\) output modifications (phases of case (C2) are shorter). This management is done in Steps 7e, 7f, 7g according to \(\tau \), which tracks the number of output modifications in a phase. We now elaborate on this management for case (C1):

1.
Step 7e (Starting a new phase) Set the ST bit of \(\tau \) to 1 to indicate a new value for \({\textsf{Z}}_{\textrm{ST}}\). This also sets all lower bits of \(\tau \) to 0, which indicates that there are no frozen values for levels \(j<\beta \).

2.
Step 7f (Inner phase step) Increment the value of \(\tau \) by 1 to indicate an additional step of the current phase.

3.
Step 7g (Ending a phase) Set the ST bit of \(\tau \) to 0 to indicate that the phase has ended and the next estimator level used will be ST.
That is, the counting cycle of a phase is managed on the lower bits (\([0,\beta )\)) of \(\tau \): these bits are set to zero at the beginning of the phase, and on each output modification \(\tau \) is incremented by 1. The cycle ends when the value of these bits equals PhaseSize. Accounting for the output modification made at a phase start, the number of output modifications in a phase that ends in case (C1) is \(\textrm{PhaseSize}+ 1\).
Total number of phases. For \(\kappa \) phase resets executed at times \(r_0<r_1\dots <r_{\kappa }\) we have \(\kappa + 1\) substreams of \({\mathcal {S}}\) corresponding to the times \([r_i,r_{i+1})\), denoted by \({\mathcal {S}}_i\) for \(i\in [\kappa +1]\). Denote by \(\phi , \phi _i\) the number of phases in \({\mathcal {S}}, {\mathcal {S}}_i\) respectively. The following holds:
where (1) is true since in each segment \({\mathcal {S}}_i\) there is no phase reset and we start a new phase every \(\textrm{PhaseSize}+1\) steps, (2) holds by Equation 12, (3) is true by Lemma B.11, and (4) is true since by Equation 11 we have \(\kappa = O\left( \alpha \lambda _{\alpha ^{\prime }}({\mathcal {S}}) \right) \) and \(\textrm{PhaseSize}= O(\alpha ^{-1})\).
Number of output modifications for level j. The levels \(j\in [\beta + 1]\) are selected in ActiveLVL(\(\tau \)) according to \(\tau \), s.t. j is the LSB of \(\tau + 1\). Since on every output modification we increment the value of \(\tau \) by 1, level \(j=0\) is selected every second time, level \(j=1\) every fourth time, level \(j=2\) every eighth time, and so on. That is a total of \(O(\textrm{PhaseSize}/2^{j})\) selections of level j. Multiplying the established bound on \(\phi \) (the total number of phases) by this bound on the number of output modifications of level j we get:
\(\square \)
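The LSB-based counting cycle described in this proof can be sketched as follows. This is an illustrative rendering only (`active_level` is a hypothetical name, not the ActiveLVL subroutine itself, which also handles the ST and W levels):

```python
def active_level(tau):
    """Index of the least significant set bit of tau + 1: the level
    selected for the (tau + 1)-th output modification of a phase."""
    t = tau + 1
    return (t & -t).bit_length() - 1

# Over one phase, level j is selected once every 2**(j+1) modifications,
# giving O(PhaseSize / 2**j) selections of level j per phase.
phase = [active_level(tau) for tau in range(15)]
# ruler sequence: 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
```

The resulting "ruler sequence" makes the per-level counts explicit: over 15 modifications, level 0 is selected 8 times, level 1 four times, level 2 twice, and level 3 once.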
Corollary B.15
Provided that \(\lambda > \lambda _{\alpha ^{\prime }}({\mathcal {S}})\), algorithm RobustDE will not reach the capping state if we calibrate:
1.4 The Framework is Robust
We move on to show that the framework RobustDE is robust to adaptive inputs. Lemma B.17 (an adaptation of Lemma 3.2 of [10]) uses tools from differential privacy to show that if the framework preserves privacy with respect to the random strings of the estimators, then the estimators yield accurate estimations. Yet in our case the estimators of levels \(j<\beta \) (i.e., TDE levels) have an additional accuracy requirement: they must estimate differences that are within their range (see Requirement 3). We show in Lemma B.16 that, indeed, whenever an estimator of level j is used by the framework, it estimates a difference that is within its accuracy range.
Lemma B.16
(bounded estimation ranges) Let \(t\in [m]\) be a time step such that

1.
Level \(j \in [\beta ]\) was selected (a TDE).

2.
Assumption B.3 holds for every \(t'< t\).
Denote by \(e_j\) the last time step during which level j estimators were enabled. Then, assuming a good execution (see Definition B.5), the following holds:
where \(\gamma _j = \frac{1+\alpha _{\textrm{ST}}}{1-\alpha _{\textrm{ST}}} \Gamma ^2 \cdot 2^{j+1} \cdot \alpha = O(2^j \cdot \alpha )\).
Proof
Let \(j\in [\beta ]\) be some \(\textrm{TDE}\) level and let t be a time s.t. level j is selected. Then the number of output modifications between the time \(e_j\) (the enabling time of level j estimators) and the time t is \(2^j-1\). Denote the times at which the output was modified between time \(e_j\) and time t by \(\{t_l\}_{l\in [2^j-1]}\), where \(t_{l=0} = e_j\). We first bound the difference between the current value of the function \({\mathcal {F}}\) and its value at the last output modification:
where the last inequality is due to B.9 and B.8, and the assumption of bounded update size (Condition 2). The following holds:
where (1) is by decomposing \({\mathcal {F}}(t) - {\mathcal {F}}(e_j)\) according to \(\{t_l\}_{l\in [2^j-1]}\), (2) holds by plugging in Equation 13 and by applying Lemma B.12 to each of the differences in the term, (3) is due to Lemma B.6, (4) is by setting \(\alpha _{\textrm{Stitch}} = \textrm{MuSize}(\alpha )\), (5) is by setting \(\textrm{StepSize}(\alpha ) \ge 4\cdot \textrm{MuSize}(\alpha )\), and (6) is by denoting \(\gamma _j = \frac{1+\alpha _{\textrm{ST}}}{1-\alpha _{\textrm{ST}}} \Gamma ^2 \cdot 2^{j+1} \cdot \alpha \). \(\square \)
Lemma B.17
(Accurate Estimations (Lemma 3.2 [10])) The following holds for a good execution (see Definition B.5). Let \(t\in [m]\) be a time step such that:

1.
Level j was selected.

2.
Assumption B.3 holds for every \(t'< t\).
Let \({\textsf{E}}({\mathcal {S}},\pi )\) be the estimator of level j that was selected at time step t, and let \(\pi \) be its (possibly dynamic) parameters. Suppose \({\textsf{E}}({\mathcal {S}},\pi )\) has an (oblivious) guarantee that all of its estimates are accurate with accuracy parameter \(\alpha _{{\textsf{E}}}\) with probability at least \(\frac{9}{10}\). Then for sufficiently small \(\varepsilon \), if algorithm RobustDE is \((\varepsilon ,\delta ^{\prime })\)-DP w.r.t. the random bits of the estimators \(\{{\textsf{E}}^k\}_{k\in [{\textsf{K}}]}\), then with probability at least \(1-\frac{\delta ^{\prime }}{\varepsilon }\), for time t we have:

1.
For \(j \in \{\beta , \textrm{W}\}\), \(\left| \{k\in [{\textsf{K}}] : \vert z^{k}-{\mathcal {F}}(t)\vert < \alpha _{{\textsf{E}}}\cdot {\mathcal {F}}(t) \}\right| \ge (8/10) {\textsf{K}}\)

2.
For \(j < \beta \), \(\left| \{k\in [{\textsf{K}}] : \vert z^{k}-({\mathcal {F}}(t) - {\mathcal {F}}(e))\vert < \alpha _{{\textsf{E}}}\cdot {\mathcal {F}}(e) \}\right| \ge (8/10) {\textsf{K}}\)
where \(z^k \leftarrow {\textsf{E}}^{k}({\mathcal {S}},\pi )\) for a set of \({\textsf{K}} \ge \frac{1}{\varepsilon ^2}\log \left( \frac{2\varepsilon }{\delta ^{\prime }} \right) \) copies of the oblivious estimator \({\textsf{E}}({\mathcal {S}},\pi )\).
Proof
Since the requirements of Lemma B.16 hold, we have that at time t, whenever the selected level j is of type \(\textrm{TDE}\) (i.e., \(j < \beta \)), the accuracy requirement of these estimators holds. That is,
Now, for time t let \({\mathcal {S}}_t=\left( \langle s_1, \Delta _1 \rangle , \dots , \langle s_t, \Delta _t \rangle \right) \) be the prefix of the input stream \({\mathcal {S}}\) up to that time, and let \(\pi (t)\) be the parameters configured for \({\textsf{E}}\) at that time. Let \(z_t\leftarrow {\textsf{E}}(r,\langle {\mathcal {S}}_t, \pi (t) \rangle )\) be the estimation returned by the oblivious streaming algorithm \({\textsf{E}}\) after the t-th stream update, when it is executed with random string r on the input stream \({\mathcal {S}}_t\) with parameters \(\pi (t)\). Consider the following function (defined differently depending on the estimator type):

1.
for \(j \in \{\beta ,\textrm{W}\}\), define \(f_{\langle {\mathcal {S}}_t, \pi (t) \rangle }(r) = \mathbb {1}\left\{ z_t \in \left( 1 \pm \alpha _{{\textsf{E}}} \right) \cdot {\mathcal {F}}({\mathcal {S}}_t) \right\} \)

2.
for \(j < \beta \), define \(f_{\langle {\mathcal {S}}_t, \pi (t) \rangle }(r) = \mathbb {1}\left\{ z_t\in \left( {\mathcal {F}}({\mathcal {S}}_{t}) - {\mathcal {F}}({\mathcal {S}}_{e(t)})\right) \pm \alpha _{{\textsf{E}}} \cdot {\mathcal {F}}({\mathcal {S}}_{e(t)}) \right\} \)
Since Lemma B.2 holds, by the generalization properties of differential privacy (see Theorem A.8), assuming that \({\textsf{K}} \ge \frac{1}{\varepsilon ^2}\log \left( \frac{2\varepsilon }{\delta ^{\prime }} \right) \), with probability at least \(1-\frac{\delta ^{\prime }}{\varepsilon }\) the following holds for time t:
We continue with the analysis assuming that this is the case. Now observe that \({\mathbb {E}}_{r} \left[ f_{\langle {\mathcal {S}}_t, \pi (t) \rangle }(r) \right] \ge 9/10\) by the utility guarantees of \({\textsf{E}}\) (because when the stream is fixed and its accuracy requirement is met, its answers are accurate to within a multiplicative error of \((1\pm \alpha _{{\textsf{E}}})\) with probability at least 9/10). Thus for \(\varepsilon \le \frac{1}{100}\), for at least 8/10 of the executions of \({\textsf{E}}\) we have \(f_{\langle {\mathcal {S}}_t, \pi (t) \rangle }(r_k)=1\), which means that the estimations \(z_t\) returned by these executions are accurate. That is, at least \(8{\textsf{K}}/10\) of the estimations \(\left\{ z^k_t \right\} _{k\in [{\textsf{K}}]}\) satisfy the accuracy requirement of the estimators of level j. \(\square \)
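The 8/10 guarantee is what makes a quantile-based aggregation of the \({\textsf{K}}\) estimates safe against a minority of corrupted copies. A minimal illustration (this is not the PrivateMed mechanism used by the framework, which additionally adds noise for privacy):

```python
import statistics

def aggregate(estimates):
    """If at least 8/10 of the estimates lie in (1 +/- alpha) * F(t),
    then any quantile strictly between 2/10 and 8/10 -- in particular
    the median -- also lies in that interval."""
    return statistics.median(estimates)

# 8 accurate estimates and 2 adversarial outliers: the median is accurate.
vals = [99, 100, 101, 100, 99, 101, 100, 100, 0, 10**6]
```

Here `aggregate(vals)` returns 100 even though two of the ten copies are arbitrarily wrong, mirroring why a constant fraction of accurate estimators per level suffices.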
Lemmas B.16 and B.17 state that, for a time \(t\in [m]\), given that Assumption B.3 holds for all times \(t^{\prime }< t\), it also holds at time t (w.p. \(1-\delta ^{\prime }/\varepsilon \)). As a corollary we get the following:
Lemma B.18
(Accuracy assumption holds) Fix a time step \(t\in [m]\). Let \(j\in [\beta +1]\cup \{\textrm{W}\}\) be the level of estimators used at time t. Recall that \({\textsf{K}}_j\) denotes the number of estimators in level j, and let \(z^{1}_{j}, \dots , z^{{\textsf{K}}_j}_{j}\) denote the estimations given by these estimators. Then with probability at least \(1-\delta /2\), the following holds for all times \(t\in [m]\):

1.
For \(j \in \{\beta , \textrm{W}\}\), \(\left| \{k\in [{\textsf{K}}_j] : \vert z^{k}_{j}-{\mathcal {F}}(t)\vert < \alpha _{\textrm{ST}}\cdot {\mathcal {F}}(t) \}\right| \ge (8/10) {\textsf{K}}_j\)

2.
For \(j < \beta \), \(\left| \{k\in [{\textsf{K}}_j] : \vert z^{k}_{j}-({\mathcal {F}}(t) - {\mathcal {F}}(e_j))\vert < \alpha _{\textrm{TDE}}\cdot {\mathcal {F}}(e_j) \}\right| \ge (8/10) {\textsf{K}}_j\)
provided that \(\varepsilon _j = O\left( \varepsilon \cdot \frac{1}{\sqrt{P_j\cdot \log (1/\delta ^{\prime })}}\right) \) for \(\delta ^{\prime } = O(\varepsilon \delta /m\beta )\).
Proof
Fix some time \(t\in [m]\) and let \(j\in [\beta +1]\cup \{\textrm{W}\}\) be the level of estimators used at that time. We set \(\delta ^{\prime }=\varepsilon \cdot \delta /(2m(\beta +2))\); then by a union bound over the m possible times, with probability \(1-\delta /(2(\beta +2))\), \(8/10\cdot {\textsf{K}}_j\) of the estimators of level j are accurate, by Lemma B.17. Taking a union bound over all \(\beta +2\) estimator levels, we get that with probability at least \(1-\delta /2\), 8/10 of the estimations are accurate at all times \(t\in [m]\) for all levels \(j\in [\beta ]\cup \{\textrm{ST}\}\cup \{\textrm{W}\}\). \(\square \)
By Corollary B.15 and Lemma B.18 we have that algorithm RobustDE does not reach the capping state and that Assumption B.3 holds. That is, the conditions for Lemma B.9 are met and we have the following:
Theorem B.19
(Algorithm RobustDE correctness) Denote \(\delta ^{*} = \delta /\beta = O(\delta /\log (\alpha ^{-1}))\). Provided that for all \(j\in [\beta +1]\):

1.
\(\gamma _j = \Omega (2^j\cdot \alpha )\)

2.
\(\varepsilon _{j}=O\left( 1/\sqrt{P_j\log (m/\delta ^{*})}\right) \)

3.
\(P_j = \Omega \left( \frac{\lambda }{2^j}\right) \)

4.
\({\textsf{K}}_{j} = \Omega \left( \sqrt{P_j\log \left( \frac{m}{\delta ^{*}} \right) } \left[ \log \left( \frac{P_{j}}{\delta ^{*}\alpha } \log (n)\right) + \log \left( \frac{m}{\delta ^{*}}\right) \right] \right) \)
and \(\varepsilon _{\textrm{W}} = \varepsilon _{\beta }\), \(P_{\textrm{W}} = P_{\beta }\), \({\textsf{K}}_{\textrm{W}} = {\textsf{K}}_{\beta }\), then for all times \(t\in [m]\), with probability at least \(1-\delta \) we have
Proof
The proof has two parts: showing that the estimators of the framework remain accurate under adaptive inputs, and showing that the framework computes an accurate output from their estimations.
Estimators are accurate w.h.p. Lemma B.18 holds due to the privacy of the databases \(\{{\mathcal {R}}_j\}_{j\in [\beta +1]}\cup \{{\mathcal {R}}_{\textrm{W}}\}\) and by ensuring that the estimations of levels \([\beta ]\) are made within the estimators' accuracy range. The latter holds by Lemma B.16, configuring \(\gamma _j = \Omega (2^j\cdot \alpha )\). The privacy of the databases \({\mathcal {R}}_j\), \(j\in [\beta +1]\cup \{\textrm{W}\}\), in Lemma B.2 is due to calibrating the noise parameters \(\varepsilon _{j}=O\left( \varepsilon /\sqrt{P_j\log (1/\delta ^{\prime })}\right) \). By Corollary B.15, it is sufficient to set \(P_j = \Omega \left( \frac{\lambda }{2^j}\right) \) (and \(P_{\textrm{W}} = P_{\beta }\)) to have a sufficient privacy budget for all databases \({\mathcal {R}}_j\). And so, by Lemma B.18, setting \(\delta ^{\prime }=\varepsilon \cdot \delta /(2m(\beta +2)) = O(\varepsilon \cdot \delta /m\beta )\) yields that at least \(80\%\) of the estimators of each level j are accurate at all times \(t\in [m]\) (Assumption B.3), w.p. at least \(1-\delta /2\).
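A back-of-the-envelope rendering of this noise calibration (the exact constant and form below are illustrative assumptions; the text only fixes \(\varepsilon _j = O\big (\varepsilon /\sqrt{P_j\log (1/\delta ')}\big )\)):

```python
import math

def per_level_epsilon(eps, p_j, delta_prime):
    """Advanced-composition style budget: p_j adaptively composed
    eps_j-DP steps yield roughly eps-DP overall when
    eps_j ~ eps / sqrt(8 * p_j * ln(1/delta'))."""
    return eps / math.sqrt(8.0 * p_j * math.log(1.0 / delta_prime))
```

Note the interaction with the level structure: halving the budget \(P_j\) (as in \(P_j = \Theta (\lambda /2^j)\)) lets the per-step privacy parameter \(\varepsilon _j\) grow by a factor of \(\sqrt{2}\), which is what makes the higher (coarser) levels cheaper.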
Output accuracy. It remains to show that the requirements of the above lemmas are met: that we have a good run (Definition B.5) w.h.p., and that the number of estimators \({\textsf{K}}_j\) in each level is calibrated according to the constraints of the above lemmas. We begin by calculating a sufficient number of estimators \({\textsf{K}}_j\): Lemma B.17 requires all levels to have \({\textsf{K}} \ge \frac{1}{\varepsilon ^2}\log \left( \frac{2\varepsilon }{\delta ^{\prime }} \right) \); Lemma B.4 requires \({\textsf{K}}_{j} = \Omega \left( \frac{1}{\varepsilon _j} \log \left( \frac{P_{j}}{\delta ^{M}\alpha } \log (n)\right) \right) \); Lemma B.6 requires \({\textsf{K}}_{\textrm{W}} = \Omega \left( \frac{1}{\varepsilon _{\textrm{W}}} \log \left( \frac{m}{\delta ^{N}}\right) \right) \); Lemma B.9 requires \({\textsf{K}}_{j} = \Omega \left( \frac{1}{\varepsilon _j}\log \left( \frac{m}{\delta ^{N}}\right) \right) \); and Lemma B.12 requires \({\textsf{K}}_{j} = \Omega \left( \frac{1}{\varepsilon _j} \log \left( \frac{m}{\delta ^{N}}\right) \right) \). Overall we have the requirement:
Now, recall that \(\varepsilon =10^{-2}\) (a constant). Setting \(\delta ^{N}=\delta /(4\cdot (\beta +2))\), \(\delta ^M = \delta /(4\cdot (\beta +1))\), and denoting \(\delta ^{*} = \delta /\beta \), we get \(\delta ^{\prime } = O(\delta ^{*}/m)\) and \(\delta ^{M}, \delta ^{N} = O(\delta ^{*})\). And so Equation 14 simplifies to:
The setting \(\delta ^{N}=\delta /(4\cdot (\beta +2))\), \(\delta ^M = \delta /(4\cdot (\beta +1))\) also yields that we have a good run w.p. at least \(1-\delta /2\). And so, all the requirements of Lemma B.9 are met, and with probability at least \(1-\delta \) the output is accurate at all times \(t\in [m]\). \(\square \)
1.5 Calculating the Space Complexity
Algorithm RobustDE's space complexity is determined by its input parameters: the accuracy parameter \(\alpha >0\), the flip number bound \(\lambda \) of the input stream \({\mathcal {S}}\) for the functionality \({\mathcal {F}}\), the failure probability \(\delta \in (0,1]\), and the space complexity of the given subroutines \({\textsf{E}}_{\textrm{ST}}\) and \({\textsf{E}}_{\textrm{TDE}}\) (denoted \(S_{\textrm{ST}}(\alpha _{\textrm{ST}},\delta _{\textrm{ST}},n,m)\) and \(S_{\textrm{TDE}}(\gamma ,\alpha _{\textrm{TDE}},\delta _{\textrm{TDE}},n,m)\) respectively). Observe that the number of pointers to the subroutines is on the order of the total number of estimators (i.e., \(\sum _{j \in [\beta +1]} {\textsf{K}}_{j}\)). Therefore, the dominating parameter of the space complexity is the number of estimators, which, multiplied by the space complexity of a single estimator, gives the dominating term in the total space complexity. The set sizes \({\textsf{K}}_{j}\) are determined by the number of times the corresponding estimator type caused an output modification, which is upper bounded by \(P_j\) and calculated in Lemma B.14. Theorem B.19 gives sufficient set sizes, from which we can calculate the space complexity of algorithm RobustDE. That is, we are now ready to prove Theorem B.1.
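To make the accounting concrete, here is a hypothetical sketch of the per-level parameter table described above. Constants are elided; only the asymptotic shapes \(\beta =\lceil \log \alpha ^{-1}\rceil \), \(\gamma _j=O(\alpha \cdot 2^j)\), and \(P_j=O(\lambda /2^j)\) are taken from the text:

```python
import math

def level_table(alpha, lam):
    """Per-level parameters of the hybrid framework: beta TDE levels with
    geometrically growing ranges gamma_j and shrinking budgets P_j."""
    beta = math.ceil(math.log2(1.0 / alpha))
    return [{"j": j, "gamma_j": alpha * 2 ** j, "P_j": math.ceil(lam / 2 ** j)}
            for j in range(beta)]
```

For instance, with \(\alpha = 1/8\) there are \(\beta = 3\) TDE levels with ranges \(\alpha , 2\alpha , 4\alpha \), and the finest level (j = 0) carries the largest output-modification budget.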
Proof of Theorem B.1
Setting the estimator set sizes \({\textsf{K}}_j\) for \(j\in [\beta +1]\) according to Theorem B.19 ensures that, with probability at least \(1-\delta \), algorithm RobustDE produces an accurate output at all times \(t\in [m]\) in the presence of an adaptive adversary controlling the stream. And so, we calculate the space needed under this setting. We separate the calculation according to the type of estimators used in each level.
\(\textrm{TDE}\) level j space: Consider an oblivious toggle difference estimator for the function \({\mathcal {F}}\), \({\textsf{E}}_{\textrm{TDE}}\), with space complexity \(\textrm{Space}({\textsf{E}}_{\textrm{TDE}}) = \gamma \cdot S_{\textrm{TDE}}(\alpha , p, n, m)\). Recall (see Lemma B.16, Lemma B.9 and Corollary B.15) that for level j estimators we have \(\gamma _j = O(\alpha \cdot 2^{j})\), \(\alpha _{\textrm{TDE}} = O(\alpha /\beta ) = O(\alpha /\log (\alpha ^{-1}))\), and \(P_j = O(\lambda /2^j)\). Plugging in the parameters of granularity level j yields an oblivious TDE with space complexity of:
Accounting for the sufficient number of estimators \({\textsf{K}}_{\textrm{TDE},j}\):
where \((*)\) holds since \(S_{\textrm{TDE}}\) is monotonically increasing in its p parameter and for all j, \(P_j = O(\lambda /2^j) \le O(\lambda )\); in the last equality we denoted .
TDEs total space: We sum the calculated \(\textrm{Space}(\mathrm{TDE\text {-}j})\) over \(j\in [\beta ]\) for \(\beta =\lceil \log (\alpha ^{-1})\rceil \) to get the total space of the TDE estimators:
where equality \((*)\) holds since for \(\beta =\lceil \log (\alpha ^{-1})\rceil \) the following holds:
ST space: Here we account for the space complexity of both the \(\textrm{ST}\) and \(\textrm{W}\) estimators. Since all their parameters are identical, we only calculate the space for the \(\textrm{ST}\) estimators. Consider an oblivious strong tracker for the function \({\mathcal {F}}\), \({\textsf{E}}_{\textrm{ST}}\), with space complexity \(\textrm{Space}({\textsf{E}}_{\textrm{ST}}) = S_{\textrm{ST}}(\alpha , n, m)\). Recall (see Lemma B.9 and Corollary B.15) that for level \(j\in \{ \textrm{ST}, \textrm{W}\}\) estimators we have \(\alpha _{\textrm{ST}} = O(\alpha )\) and \(P_j = O(\alpha \cdot \lambda )\). Plugging in the parameters for the \(\textrm{ST}\), \(\textrm{W}\) type estimator yields an oblivious estimator with space complexity of:
Accounting for the sufficient number of estimators \({\textsf{K}}_{\textrm{ST}}\):
where in the last equality we denoted .
Algorithm space complexity: We sum the contribution of all estimators, strong trackers and TDEs, to get:
where

1.
\(S_{\textrm{ST}} = S_{\textrm{ST}}(\alpha _{\textrm{ST}},n,m) = S_{\textrm{ST}}(O(\alpha ),n,m)\).

2.
\(S_{\textrm{TDE}} = S_{\textrm{TDE}}(\alpha _{\textrm{TDE}},\lambda ,n,m) = S_{\textrm{TDE}}(O(\alpha /\log (\alpha ^{-1})),\lambda ,n,m)\).

3.
\(\text {polylog}_{\textrm{ALG}} = \left[ \log \left( \frac{m}{\delta ^{*}}\right) + \log \left( \frac{\lambda }{\alpha \delta ^{*}} \log (n)\right) \right] \sqrt{\log \left( \frac{m}{\delta ^{*}}\right) } \), for \(\delta ^{*} = \delta /\log (\alpha ^{-1})\).
\(\square \)
Corollary B.20
Provided that there exist:

1.
An oblivious streaming algorithm \({\textsf{E}}_{\textrm{ST}}\) for the functionality \({\mathcal {F}}\) that guarantees, with probability at least 9/10, that all of its estimates are accurate to within a multiplicative error of \((1\pm \alpha _{\textrm{ST}})\), with space complexity \( O\left( \frac{1}{\alpha _{\textrm{ST}}^{2}}\cdot f_{\textrm{ST}}\right) \) for \(f_{\textrm{ST}}=\text {polylog}(\alpha _{\textrm{ST}},n,m)\).

2.
For every \(\gamma ,p\) there is a \((\gamma ,\alpha _{\textrm{TDE}},\frac{1}{10},p)\)\(\textrm{TDE}\) for \({\mathcal {F}}\) using space \(\gamma \cdot O\left( \frac{1}{\alpha _{\textrm{TDE}}^{2}}\cdot f_{\textrm{TDE}} \right) \) for \(f_{\textrm{TDE}}=\text {polylog}(\alpha _{\textrm{TDE}},p,n,m)\).
Then there exists an adversarially robust streaming algorithm for the functionality \({\mathcal {F}}\) that, for any stream with bounded flip number \(\lambda _{\frac{1}{8}\alpha ,m}< \lambda \), guarantees that with probability at least \(1-\delta \) its output is accurate to within a multiplicative error of \((1\pm \alpha )\) at all times \(t\in [m]\), and has a space complexity of
Formal Details for Applications (Section 6)
In this section we give the formal details of the resulting space bounds for \(F_2\). As these bounds are a function of a characterization of the stream, we begin with that characterization.
Characterising the input streams for \(F_2\). The \(F_2\) \(\textrm{DE}\) construction presented in [16] has an additional requirement for turnstile streams, which we now present:
Lemma C.1
(Difference estimator for \(F_2\) (Lemma 3.2, [16])) There exists a \((\gamma , \alpha , \delta )\)-difference estimator for \(F_2\) that uses space \(O(\gamma \varepsilon ^{-1}\log n (\log \alpha ^{-1} + \log \delta ^{-1}))\) for streams \({\mathcal {S}}\) that, for any time \(t>e\), where \(e\in [m]\) is the enabling time, admit:
For \(F_2\) estimation of a turnstile stream, it may be the case that requirement (16) does not hold while the DE accuracy guarantee does (see Item 3 in Definition 3.3). The problem is that in such a scenario the \(\textrm{DE}\) estimators used by the framework are not accurate, while the framework may still try to use their estimations. In order to capture such a scenario in a stream, we define the following:
Definition C.2
(Suffix violation of \({\mathcal {F}}\)) Let \(\gamma \in (0,1)\). For \({\mathcal {F}}\) and some time \(e \in [t]\) at which the stream \({\mathcal {S}}_{t}\) of length t is partitioned, denote by \({\mathcal {S}}_{e}\) the prefix of that partition and by \({\mathcal {S}}_{t}^{e}\) its suffix. Then time e is a \(\gamma \)-suffix violation for \({\mathcal {F}}\) if the following holds:

1.
\(\left| {\mathcal {F}}({\mathcal {S}}_{t}) - {\mathcal {F}}({\mathcal {S}}_{e})\right| \le \gamma \cdot {\mathcal {F}}({\mathcal {S}}_{e})\), and

2.
\({\mathcal {F}}({\mathcal {S}}_{t}^{e}) > \gamma \cdot {\mathcal {F}}({\mathcal {S}}_{e})\)
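To make the cancellation scenario concrete, here is a minimal Python sketch (the toy stream and the helper `f2` are ours, not from the paper) of a turnstile stream for which the split point is a \(\gamma \)-suffix violation for \(F_2\): the overall \(F_2\) value barely moves across the suffix, yet the suffix alone carries large \(F_2\) mass.

```python
from collections import defaultdict

def f2(updates):
    """F_2 = sum of squared net frequencies of a turnstile stream."""
    freq = defaultdict(int)
    for item, delta in updates:
        freq[item] += delta
    return sum(v * v for v in freq.values())

gamma = 0.5
prefix = [("a", 10)]               # S_e:   F_2 = 10^2 = 100
suffix = [("a", -6), ("b", 6)]     # S_t^e: deletions partly cancel the prefix

# Condition 1: the overall value barely moves across the suffix.
assert abs(f2(prefix + suffix) - f2(prefix)) <= gamma * f2(prefix)  # |52-100| <= 50
# Condition 2: the suffix alone carries large F_2 mass.
assert f2(suffix) > gamma * f2(prefix)                              # 72 > 50
```

Note that in an insertion-only stream this cannot happen: there, a suffix with large \(F_2\) mass forces the overall value to grow as well.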
Note that at times \(t\in [m]\) that are \(F_2\) suffix violations, the \(\textrm{DE}\) construction from [16] has no accuracy guarantee, and so our framework cannot use it for its estimation. We wish to characterize the input stream w.r.t. the number of such violations. For that, we present the notion of the twist number of a stream (also defined in Sect. 1, Definition 1.8):
Definition C.3
(Twist number) The \((\alpha ,m)\)-twist number of a stream \({\mathcal {S}}\) w.r.t. a functionality \({\mathcal {F}}\), denoted \(\mu _{\alpha ,m}({\mathcal {S}})\), is the maximal \(\mu \in [m]\) such that \({\mathcal {S}}\) can be partitioned into \(2\mu \) disjoint segments \({\mathcal {S}}= {\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_{\mu -1} \circ {\mathcal {V}}_{\mu -1}\) (where the \(\{{\mathcal {P}}_i\}_{i\in [\mu ]}\) may be empty) s.t. for every \(i\in [\mu ]\):

1.
\({\mathcal {F}}({\mathcal {V}}_i) > \alpha \cdot {\mathcal {F}}({\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {V}}_{i-1} \circ {\mathcal {P}}_i)\)

2.
\(\left| {\mathcal {F}}({\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_i \circ {\mathcal {V}}_i) - {\mathcal {F}}({\mathcal {P}}_0\circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_i)\right| \le \alpha \cdot {\mathcal {F}}({\mathcal {P}}_0 \circ {\mathcal {V}}_0 \circ \dots \circ {\mathcal {P}}_i)\)
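The two conditions of the definition can be checked mechanically. The following Python sketch (ours, not from the paper; it reads condition 2 as a bound on the absolute change, and instantiates \({\mathcal {F}}\) as exact \(F_2\)) verifies that a proposed partition into segments \({\mathcal {P}}_0, {\mathcal {V}}_0, \dots \) witnesses a twist number of at least \(\mu \):

```python
from collections import defaultdict

def f2(updates):
    """F_2 of a turnstile stream (sum of squared net frequencies)."""
    freq = defaultdict(int)
    for item, delta in updates:
        freq[item] += delta
    return sum(v * v for v in freq.values())

def is_twist_witness(F, segments, alpha):
    """Check that segments = [P_0, V_0, ..., P_{mu-1}, V_{mu-1}] satisfy
    both twist-number conditions for functionality F."""
    assert len(segments) % 2 == 0
    mu = len(segments) // 2
    for i in range(mu):
        upto_p = sum(segments[: 2 * i + 1], [])   # P_0 . V_0 . ... . P_i
        upto_v = sum(segments[: 2 * i + 2], [])   # P_0 . V_0 . ... . P_i . V_i
        v_i = segments[2 * i + 1]
        if not F(v_i) > alpha * F(upto_p):                       # condition 1
            return False
        if not abs(F(upto_v) - F(upto_p)) <= alpha * F(upto_p):  # condition 2
            return False
    return True

# One violation segment: the twist number of this stream is at least 1.
segments = [[("a", 10)], [("a", -6), ("b", 6)]]   # P_0, V_0
assert is_twist_witness(f2, segments, alpha=0.5)
```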
An extension for the turnstile model: Algorithm description. At a high level, the extension wraps our framework. It monitors the output of the framework and checks whether it is accurate, in which case it forwards it as the output. If the framework is not accurate, then it must be due to a previous input that did not satisfy one of the framework's \(\textrm{DE}\) input requirements. In such a case, the monitor sends a phase reset command to the framework, and outputs the same as the framework after its reset. This accuracy assertion is done by running additional estimators (strong trackers) that are used to validate the framework's output. These monitor estimators are correct on all turnstile inputs. The extension is presented in algorithm Guardian. We also describe the exact modification needed in algorithm RobustDE in order to receive external phase reset commands in PhaseResetCommand.
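The monitoring loop described above can be sketched as follows. This is a toy Python rendering (ours, not the paper's pseudocode): `ExactF2Tracker` stands in for a monitor estimator (the paper uses space-efficient oblivious strong trackers), `validate` omits the Laplace noise used for privacy, and the interface names `process`, `phase_reset`, and `current_estimate` are illustrative.

```python
from collections import defaultdict

class ExactF2Tracker:
    """Stand-in for one monitor estimator E_M; exact rather than sketched."""
    def __init__(self):
        self.freq = defaultdict(int)
    def process(self, update):
        item, delta = update
        self.freq[item] += delta
    def estimate(self):
        return sum(v * v for v in self.freq.values())

def validate(z, monitors, alpha):
    """Accept z if most monitor estimates are (3/4)*alpha-close to it
    (the Laplace noise used for privacy is omitted in this sketch)."""
    close = sum(1 for m in monitors
                if abs(m.estimate() - z) <= 0.75 * alpha * max(abs(z), 1))
    return close >= 0.6 * len(monitors)

def guardian_step(update, robust_de, monitors, alpha):
    """One step of the Guardian wrapper around the RobustDE framework."""
    for m in monitors:                   # keep the monitor estimators in sync
        m.process(update)
    z = robust_de.process(update)        # the framework's candidate output
    if validate(z, monitors, alpha):
        return z                         # accurate: forward as-is
    robust_de.phase_reset()              # otherwise send a phase reset command
    return robust_de.current_estimate()  # and forward the post-reset output
```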
An extension for the turnstile model: Analysis structure. The analysis of algorithm Guardian is composed of two components. In the first (Sect. C.1) we show that on a stream with twist number bounded by \(\mu \) there can be at most \(\mu \) phase reset commands sent to algorithm RobustDE (Lemma C.6). In addition, in that first component we also prove that the output of algorithm Guardian is always accurate (Lemma C.7). The second component (Sect. C.2) consists of calculating the resulting space bounds of the extended framework due to receiving \(\mu \) phase reset commands and the additional space of the monitor (Theorem C.10). By instantiating known constructions of a strong tracker and a difference estimator for \(F_2\) in Theorem C.10, we establish the resulting space bounds for \(F_2\) in the turnstile model in Theorem C.13.
C.1 Bounding the Number of Phase Reset Commands
Next we show that for a stream with \((\gamma , m)\)-twist number bounded by \(\mu \), algorithm Guardian captures at most \(\mu \) \(\gamma \)-suffix violations (and so it issues that many phase reset commands to RobustDE). That is established in Lemma C.6. Since algorithm Guardian uses oblivious estimators, we also prove that its output validation is correct in the adaptive input setting. That is done using a technique from [10]: first we prove that algorithm Guardian is DP (Lemma C.4); then we use tools from DP to argue that the validation estimators are accurate (Lemma C.5); and finally we show that this accuracy is leveraged for correct validation (Lemma C.7).
As mentioned, we achieve robustness for the estimators \(\bar{{\textsf{E}}}_{\textrm{M}}\) via DP. The following lemma states that algorithm Guardian preserves privacy w.r.t. the random strings of the estimators \(\bar{{\textsf{E}}}_{\textrm{M}}\).
Lemma C.4
Let \({\mathcal {R}}_{\textrm{M}}\) be the dataset of random bitstrings of \(\bar{{\textsf{E}}}_{\textrm{M}}\). Then algorithm Guardian satisfies \((\varepsilon ,\delta ^{\prime })\)-DP w.r.t. the dataset \({\mathcal {R}}_{\textrm{M}}\) by configuring \(\varepsilon _{\textrm{M}}=O\left( \varepsilon /\sqrt{P_{\textrm{M}}\log (1/\delta ^{\prime })}\right) \).
Proof sketch
We focus on the time sequences that begin after a time at which condition 3 is True, and end at the next time condition 3 is True. Denote by \(P_{\textrm{M}}\) the number of such time sequences. Throughout every such time sequence, we access the dataset \({\mathcal {R}}_{\textrm{M}}\) via the sparse vector technique (see Algorithm A.4). We calibrate the privacy parameters of this algorithm to be \(\varepsilon _{\textrm{M}}=O\left( \varepsilon /\sqrt{P_{\textrm{M}}\log (1/\delta ^{\prime })}\right) \) such that, by using composition theorems across all of the \(P_{\textrm{M}}\) sequences, our algorithm satisfies \((\varepsilon ,\delta ^{\prime })\)-differential privacy w.r.t. \({\mathcal {R}}_{\textrm{M}}\). \(\square \)
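The calibration in the proof sketch can be written out numerically. The following Python sketch (ours; the constant in `calibrate_eps_m` is illustrative, matching \(\varepsilon _{\textrm{M}}=O(\varepsilon /\sqrt{P_{\textrm{M}}\log (1/\delta ^{\prime })})\)) splits a total \((\varepsilon ,\delta ^{\prime })\) budget across \(P_{\textrm{M}}\) sparse-vector sequences and sanity-checks it against the advanced composition bound:

```python
import math

def calibrate_eps_m(eps, delta_prime, p_m):
    """Per-sequence privacy budget so that advanced composition over p_m
    sequences stays within (eps, delta')-DP overall (illustrative constant)."""
    return eps / (2.0 * math.sqrt(2.0 * p_m * math.log(1.0 / delta_prime)))

def composed_eps(eps_m, delta_prime, p_m):
    """Advanced composition cost of p_m mechanisms, each eps_m-DP:
    sqrt(2 p_m ln(1/delta')) * eps_m + p_m * eps_m * (e^eps_m - 1)."""
    return (math.sqrt(2.0 * p_m * math.log(1.0 / delta_prime)) * eps_m
            + p_m * eps_m * (math.exp(eps_m) - 1.0))
```

For example, with `eps = 0.1`, `delta_prime = 1e-6`, and `p_m = 10`, the composed cost stays below the total budget.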
In the following lemmas we assume that all the noises (up to 2m draws of \(\textrm{Lap}(O(\varepsilon _{\textrm{M}}^{-1}))\) noise) are smaller in absolute value than \(\frac{4}{\varepsilon _{\textrm{M}}}\log \left( \frac{m}{\delta ^{\textrm{M}}} \right) \), which is the case with probability at least \(1-\delta ^{\textrm{M}}\). The first lemma adapts a technique from Lemma 3.2 of [10] that uses differential privacy to assert that the estimations of \(\bar{{\textsf{E}}}_{\textrm{M}}\) are accurate.
Lemma C.5
(Accurate estimations (Lemma 3.2 [10])) Let \({\textsf{E}}({\mathcal {S}})\) have an (oblivious) guarantee that all of its estimates are accurate with accuracy parameter \(\alpha _{{\textsf{E}}}\) with probability at least \(\frac{9}{10}\). Then for sufficiently small \(\varepsilon \), if algorithm Guardian is \((\varepsilon ,\delta ^{\prime })\)-DP w.r.t. the random bits of the estimators \(\{{\textsf{E}}^k\}_{k\in {\textsf{K}}}\), then with probability at least \(1-\frac{\delta ^{\prime }}{\varepsilon }\), for time t we have:
where \(z^k \leftarrow {\textsf{E}}^{k}({\mathcal {S}})\) for a set of size \({\textsf{K}} \ge \frac{1}{\varepsilon ^2}\log \left( \frac{2\varepsilon }{\delta ^{\prime }} \right) \) of copies of the oblivious estimator \({\textsf{E}}({\mathcal {S}})\).
Proof (a simplified version of Lemma B.17)
For time \(t \in [m]\) let \({\mathcal {S}}_t=\left( \langle s_1, \Delta _1 \rangle , \dots , \langle s_t, \Delta _t \rangle \right) \) be the prefix of the input stream \({\mathcal {S}}\) up to that time. Let \(z_t\leftarrow {\textsf{E}}(r,{\mathcal {S}}_t)\) be the estimation returned by the oblivious streaming algorithm \({\textsf{E}}\) after the t-th stream update, when it is executed with random string r on the input stream \({\mathcal {S}}_t\). Consider the following function: \(f_{\langle {\mathcal {S}}_t, \pi (t) \rangle }(r) = \mathbb {1}\left\{ z_t \in \left( 1 \pm \alpha _{{\textsf{E}}} \right) \cdot {\mathcal {F}}({\mathcal {S}}_t) \right\} \). Since algorithm Guardian is \((\varepsilon , \delta ^{\prime })\)-DP, then by the generalization properties of differential privacy (see Theorem A.8), assuming that \({\textsf{K}} \ge \frac{1}{\varepsilon ^2}\log \left( \frac{2\varepsilon }{\delta ^{\prime }} \right) \), with probability at least \(1-\frac{\delta ^{\prime }}{\varepsilon }\), the following holds for time t:
We continue with the analysis assuming that this is the case. Now observe that \({\mathbb {E}}_{r} \left[ f_{\langle {\mathcal {S}}_t, \pi (t) \rangle }(r) \right] \ge 9/10\) by the utility guarantees of \({\textsf{E}}\) (because when the stream is fixed its answers are accurate to within a multiplicative error of \((1\pm \alpha _{{\textsf{E}}})\) with probability at least 9/10). Thus, for \(\varepsilon \le \frac{1}{100}\), for at least 8/10 of the executions of \({\textsf{E}}\) we have \(f_{\langle {\mathcal {S}}_t, \pi (t) \rangle }(r_k)=1\), which means the estimations \(z_t\) returned from these executions are accurate. That is, at least \(8{\textsf{K}}/10\) of the estimations \(\left\{ z^k_t \right\} _{k\in [{\textsf{K}}]}\) satisfy the accuracy guarantee of the estimators. \(\square \)
Lemma C.6
(Maximal number of monitor triggers) For an input stream with \((\gamma _0, m)\)-twist number \(\mu \), algorithm Guardian sends at most \(\mu \) reset commands to algorithm RobustDE.
Proof
We show that for an execution with \(\mu \) phase reset commands, the input stream \({\mathcal {S}}\) has a \((\gamma _0,m)\)-twist number of at least \(\mu \). That implies the statement.
Let \(r_0< r_1< \dots < r_{\mu -1}\) be the times at which algorithm Guardian issued a phase reset command. We focus on the time segment \((r_{i-1}, r_{i}]\) for some \(i\in [\mu ]\) (and on the time segment \([0,r_0]\) in the case \(i=0\)). That is, at time \(r_i\) we have:
where the first inequality holds in the event of the bounded noises and the second inequality holds by asserting that \( {\textsf{K}}_{\textrm{M}} = \Omega \big ( \frac{1}{\varepsilon _{\textrm{M}}}\log \big ( \frac{m}{\delta ^{N}}\big ) \big ) = \Omega \big ( \frac{1}{\varepsilon } \sqrt{P_{\textrm{M}} \cdot \log \big (\frac{1}{\delta ^{\prime }} \big ) }\log \big ( \frac{m}{\delta ^{N}}\big ) \big ) \). So, for at least \(40\%\) of the estimations \(z_{\textrm{M}}^{k}\) of the estimators \(\bar{{\textsf{E}}}_{\textrm{M}}\) it holds that \(\left| z_{\textrm{M}}^{k} - \textrm{Next}\right| \ge (3/4) \alpha \cdot \textrm{Next}\), and at the same time, by Lemma C.5, at least \(80\%\) of the estimators are accurate. That is, for at least one estimation \(z_{\textrm{M}}^{k}\) both of the above statements hold and we have (for any \(\alpha \in (0,14/15)\)):
That is, the accuracy guarantee of algorithm RobustDE does not hold at time \(r_i\), as its output is not \((1/2)\alpha \)-accurate. In addition, at time \(r_{i-1}\) there was also a phase reset (or, in case \(i=0\), \(r_{-1}=0\)). And so, any suffix violation has no effect after the phase reset of time \(r_{i-1}\). Therefore it follows that (w.h.p.) there must be some time segment [e, t] s.t. \(r_{i-1}\le e < t\le r_{i}\), and in addition some level of estimators that were not accurate, causing the accuracy guarantee of algorithm RobustDE to break. Denoting by \({\mathcal {V}}_i\) the input stream \({\mathcal {S}}\) on times [e, t], the previous conclusion is that \({\mathcal {V}}_i\) is a \(\gamma ^{\prime }\)-suffix violation for some \(\gamma ^{\prime }\ge \gamma _0\). That is, for each issued phase reset command we have in \({\mathcal {S}}\) at least one \(\gamma _0\)-suffix violation, which implies that in such a scenario the input stream must have a \((\gamma _0,m)\)-twist number of at least \(\mu \). \(\square \)
The extension's output is accurate. We now show that the output of algorithm Guardian is accurate at all times \(t\in [m]\).
Lemma C.7
If \(P_\textrm{M}> \mu \), then with probability at least \(1-\delta \) the output of algorithm Guardian admits, for all times \(t\in [m]\):
Proof
We consider two cases w.r.t. condition 3. If the condition is True, then the output is given after a phase reset command. In that case it was computed by the ST-level estimators, which are used in the new phase and are not affected by \(\gamma \)-suffix violations. And so, the output is \((1/2)\alpha \)-accurate according to the configured accuracy of algorithm RobustDE. In the complementary case where the condition is False, we have the following:
where the first inequality holds in the event of the bounded noises and the second inequality holds by asserting that \( {\textsf{K}}_{\textrm{M}} = \Omega \left( \frac{1}{\varepsilon _{\textrm{M}}}\log \left( \frac{m}{\delta ^{N}}\right) \right) = \Omega \left( \frac{1}{\varepsilon } \sqrt{P_{\textrm{M}} \cdot \log \left( \frac{1}{\delta ^{\prime }} \right) }\log \left( \frac{m}{\delta ^{N}}\right) \right) \). That is, at least \(40\%\) of the estimations \(z_{\textrm{M}}^{k}\) are \((3/4)\alpha \)-close to \(\textrm{Next}\). At the same time, by Lemma C.5, at least \(80\%\) of the estimators are \(\alpha _{\textrm{M}}\)-accurate, thus there exists an estimator that admits both. And so, by setting \(\alpha _{\textrm{M}} = (1/10)\alpha \), we have that (for any \(\alpha \in (0,1)\)):
We now address the failure probability \(\delta \). Recall that all noises in algorithm Guardian (we have at most m draws of \(\textrm{Lap}(2/\varepsilon _{\textrm{M}})\) and m draws of \(\textrm{Lap}(4/\varepsilon _{\textrm{M}})\) noises) are bounded by \(\frac{4}{\varepsilon _{\textrm{M}}}\log \left( \frac{2m}{\delta ^{N}} \right) \) w.p. at least \(1-\delta ^{N}\). Then by setting \(\delta ^{N} = \delta /4\), we have that the noises in algorithm Guardian are bounded as required w.p. at least \(1-\delta /4\). In addition, the statement of Lemma C.5 holds w.p. at least \(1-\delta ^{\prime }/100\). Configuring \(\delta ^{\prime } = \delta /(400m)\) yields that this lemma's statement holds for all \(t\in [m]\) w.p. at least \(1-\delta /4\). In addition, we configure the failure probability of RobustDE to \(\delta /2\). That is, we have that w.p. at least \(1-\delta \) all of algorithm Guardian's outputs are accurate at all times \(t\in [m]\). \(\square \)
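The union-bound arithmetic in the proof can be spelled out numerically. This small Python sketch (ours) adds up the three parts of the budget, taking the per-time failure probability of Lemma C.5 as \(\delta ^{\prime }/100\) with \(\delta ^{\prime } = \delta /(400m)\), as stated above:

```python
def failure_budget(delta, m):
    """Total failure probability of Guardian, split as in the proof above."""
    delta_noise = delta / 4             # all Laplace draws bounded (delta^N)
    delta_prime = delta / (400 * m)     # Lemma C.5's delta'
    lemma_c5 = m * (delta_prime / 100)  # union bound over all t in [m]
    delta_robust = delta / 2            # RobustDE's configured failure prob.
    return delta_noise + lemma_c5 + delta_robust

# The combined budget stays below delta for any stream length m >= 1.
assert failure_budget(0.01, 10**6) <= 0.01
```

Note that the split is loose: the Lemma C.5 term contributes far less than its \(\delta /4\) allowance.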
C.2 Space Complexity of the Framework Extension
It remains to account for the space complexity of RobustDE with at most \(\mu \) additional phase reset commands received externally from the Guardian algorithm. The adaptation needed in RobustDE in order to facilitate external phase reset commands is presented in PhaseResetCommand (we present only the relevant lines).
External phase reset command in algorithm RobustDE. In order for Guardian to be able to trigger a phase reset in algorithm RobustDE, we add an input to the stream, namely \(b_t\), that signals an external phase reset command. This input \(b_t\) affects the functionality of line 2 and can initiate a phase reset. That is, in line 2, the condition triggers initiation of a new phase (regardless of the state of \(\tau \)), and in the extended version this initiation can also be triggered externally by the received input \(b_t=1\).
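The modification amounts to a single disjunction in the reset condition. As a hedged one-line sketch (ours; the names are illustrative, not the paper's pseudocode):

```python
def extended_phase_reset_condition(internal_trigger: bool, b_t: int) -> bool:
    """Line 2 of the extended RobustDE: start a new phase either internally
    (as in the original framework) or on an external command b_t = 1 from
    Guardian."""
    return internal_trigger or b_t == 1
```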
Each external reset command comes with a cost in terms of additional output modifications. As these additional output modifications require additional estimators in the framework to support them, we calculate a new sufficient value for the input parameter \(\lambda \) of Algorithm RobustDE. In the non-extended framework this parameter bounds the flip number of the input stream. We calculate a new value for that parameter, denoted by \({\hat{\lambda }}\), that is sufficient for the extended framework to support a stream with a flip number of \(\lambda \) and, in addition, \(\mu \) external reset commands.
Calibrating \({\hat{\lambda }}\). Recall that in the analysis of RobustDE we calculate bounds \(C_j\) on the number of output modifications associated with each estimator level (see Lemma B.14). It then follows from that analysis that configuring the capping parameter of each level, \(P_j\), to be larger than \(C_j\) (Corollary B.15) ensures no capping. These bounds are stated w.r.t. a bound \(\lambda \) on the \((O(\alpha ),m)\)-flip number that is given as input to the algorithm, and they hold for the framework without external phase reset commands. Since the extension introduces such external phase reset commands, the previous analysis needs to be adapted. That is, we need to show new bounds on the number of output modifications per estimator level, \(C_j\), for the extended framework w.r.t. a stream that has a bounded \((\alpha ^{\prime },m)\)-flip number and \((\gamma _0, m)\)-twist number. We do that as follows: we calculate a new input for the framework, \({\hat{\lambda }}=f(\lambda , \mu )\), s.t. the computed parameters of the framework \(P_j({\hat{\lambda }})\) suffice for a no-capping state on streams with \((\alpha ^{\prime },m)\)-flip number bounded by \(\lambda \) and \((\gamma _0, m)\)-twist number bounded by \(\mu \). The following lemma calculates such a calibration of \({\hat{\lambda }}\):
Lemma C.8
(Calibration of \({\hat{\lambda }}\)) Let \({\mathcal {S}}\) be a stream with \((\alpha ^{\prime },m)\)-flip number and \((\gamma _0, m)\)-twist number bounded by \(\lambda \) and \(\mu \) respectively. Then,
where \({\hat{\lambda }} = O(\lambda + \mu \cdot \alpha ^{-1})\), \(\alpha ^{\prime } = (1/2)\cdot \textrm{StepSize}(\alpha ) = O(\alpha )\), and \(\gamma _0 = \frac{1+\alpha _{\textrm{ST}}}{1-\alpha _{\textrm{ST}}} \Gamma ^2 \cdot 2 \cdot \alpha = O(\alpha )\).
Proof
First we look at a segment of the stream \({\mathcal {S}}\) corresponding to the times between two consecutive phase resets (either an internal phase reset or a phase reset command received from Guardian). On each such segment we bound its \((\alpha ^{\prime },m)\)-flip number and calculate the resulting number of phases in that segment. Then we sum the total number of phases over all these segments. Finally, we bound the number of output modifications associated with each level using the bound on the number of phases.
Total number of phases. By Lemma C.6 we have that at most \(\mu \) reset commands are issued from Guardian for \({\mathcal {S}}\). In addition, there are at most \(\kappa = O(\alpha \lambda _{\alpha ^{\prime }}({\mathcal {S}}))\) internal resets (see Lemma B.14). Denote by \({\hat{\mu }} = \mu + \kappa \) the number of phase resets in algorithm RobustDE (internal and external). Let \(\{r_i\}_{i \in [{\hat{\mu }}]}\), \(r_i\in [m]\), be the set of times at which the \({\hat{\mu }}\) phase resets were executed. For \(i\in [{\hat{\mu }}]\), let \({\mathcal {S}}_i\) be the substream of \({\mathcal {S}}\) on times \([r_i, r_{i+1})\) (where \({\mathcal {S}}_{{\hat{\mu }}-1}\) is on times \([r_{{\hat{\mu }}-1},m-1]\)). Also denote by \(\phi , \phi _i\) the number of phases in \({\mathcal {S}}, {\mathcal {S}}_i\) respectively. The following holds:
where (1) is true since on each segment \({\mathcal {S}}_i\) there is no phase reset and we start a new phase every \(\textrm{PhaseSize}+1\) steps (see the proof of Lemma B.14), (2) holds since for every output modification the value of \({\mathcal {F}}\) progresses by at least a factor of \((1/2)\cdot \textrm{StepSize}\ge \alpha ^{\prime }\) (see the proof of Lemma B.14), and (3) is true by Lemma B.11.
Output modifications per level. In every phase there are at most \(\textrm{PhaseSize}\) output modifications. And so (see the proof of Lemma B.14), for \(j\in [\beta ]\), the number of output modifications associated with level-j estimators is \(O(\textrm{PhaseSize}/2^{j})\). That is:
\(\square \)
An immediate corollary is that, by calibrating the input \({\hat{\lambda }} = \Omega (\mu \alpha ^{-1} + \lambda )\), algorithm RobustDE will not reach a capping state. That is since algorithm RobustDE sets the parameters \(P_j = \Omega ({\hat{\lambda }}/2^{j})\) for an input \({\hat{\lambda }}\), resulting in \(P_j > C_j({\mathcal {S}})\) as required.
Corollary C.9
(No capping in extended RobustDE) Let \({\mathcal {S}}\) be a stream with \((\alpha ^{\prime },m)\)-flip number and \((\gamma _0, m)\)-twist number bounded by \(\lambda \) and \(\mu \) respectively. Calibrating \({\hat{\lambda }}\), the input of RobustDE, to \({\hat{\lambda }} = \Omega (\mu \cdot \alpha ^{-1} + \lambda )\) is sufficient to ensure that RobustDE will not get into a capping state.
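The calibration of Corollary C.9 translates directly into a small routine. The following Python sketch (ours; the constant `c` is illustrative, since the analysis only requires \(P_j = \Omega ({\hat{\lambda }}/2^{j})\)) computes the per-level capping parameters from \(\lambda \), \(\mu \), and \(\alpha \):

```python
import math

def capping_parameters(lmbda, mu, alpha, beta, c=4):
    """Per-level capping parameters P_j = c * lambda_hat / 2^j for the
    extended framework, with lambda_hat = lambda + mu / alpha."""
    lambda_hat = lmbda + mu / alpha   # hat-lambda = O(lambda + mu / alpha)
    return [math.ceil(c * lambda_hat / 2 ** j) for j in range(beta)]
```

For instance, `capping_parameters(100, 10, 0.1, 4)` yields caps that halve from level to level, with the \(\mu /\alpha \) term dominating when resets are frequent.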
We now present the resulting space bounds of the extended framework.
Theorem C.10
(Extended framework for adversarial streaming: space) Provided that there exist:

1.
An oblivious streaming algorithm \({\textsf{E}}_{\textrm{ST}}\) for functionality \({\mathcal {F}}\) that guarantees that with probability at least 9/10 all of its estimates are accurate to within a multiplicative error of \((1\pm \alpha _{\textrm{ST}})\), with space complexity \(S_{\textrm{ST}}(\alpha _{\textrm{ST}}, \frac{1}{10}, n,m)\).

2.
For every \(\gamma \) there is a \((\gamma ,\alpha _{\textrm{DE}},\frac{1}{10})\)-\(\textrm{DE}\) for \({\mathcal {F}}\) using space \(\gamma \cdot S_{\textrm{DE}}(\alpha _{\textrm{DE}},\frac{1}{10},n,m)\).
Then there exists an adversarially robust streaming algorithm for functionality \({\mathcal {F}}\) that, for any stream \({\mathcal {S}}\) with a bounded flip number \(\lambda _{\alpha ^{\prime },m}({\mathcal {S}})< \lambda \) and a bounded twist number \(\mu _{\gamma _0, m}({\mathcal {S}}) < \mu \) (where \(\alpha ^{\prime }, \gamma _0 = O(\alpha )\)), guarantees that with probability at least \(1-\delta \) its output is accurate to within a multiplicative error of \((1\pm \alpha )\) at all times \(t\in [m]\), and has a space complexity of
where:

1.
\(S_{\textrm{ST}} = S_{\textrm{ST}}(O(\alpha ),\frac{1}{10},n,m)\).

2.
\(S_{\textrm{DE}} = S_{\textrm{DE}}(O(\alpha /\log (\alpha ^{-1})),\frac{1}{10{\hat{\lambda }}},n,m)\), for \({\hat{\lambda }} = O(\lambda + \mu \cdot \alpha ^{-1})\)

3.
\(\text {polylog}_{\textrm{ALG}} = \text {polylog}(\lambda , \mu , \alpha ^{1}, \delta ^{1}, m, n)\).
Proof
In order to use the framework of Algorithm \(\texttt {RobustDE}\) for functionality \({\mathcal {F}}\) it is necessary (by Theorem B.1) to have, for every \(\gamma ,p\), a \((\gamma ,\alpha _{\textrm{TDE}},p,\frac{1}{10})\)-\(\textrm{TDE}\) for \({\mathcal {F}}\) using space \(\gamma \cdot S_{\textrm{TDE}}(\alpha _{\textrm{TDE}},p,n,m)\). By Corollary 5.1, it is possible to construct a \(\textrm{TDE}\) from a \(\textrm{DE}\) (with the same accuracy guarantee) with space \( S_{\textrm{TDE}}(\gamma ,\alpha ,\delta ,p,n,m) = 2\cdot S_{\textrm{DE}}(\gamma ,\alpha ,\delta /p,n,m)\). Thus having a \((\gamma ,\alpha _{\textrm{DE}},1/10)\)-\(\textrm{DE}\) with space \(\gamma \cdot S_{\textrm{DE}}(\alpha _{\textrm{DE}},1/10,n,m)\) implies a \((\gamma ,\alpha _{\textrm{TDE}}, p, 1/10)\)-\(\textrm{TDE}\) with space \(S_{\textrm{TDE}}=2\cdot S_{\textrm{DE}}(\gamma , \alpha _{\textrm{TDE}}, 1/(10\cdot p),n,m)\) and the same accuracy guarantee.
Sufficient parameter calibration. By Lemma C.8, calibrating \({\hat{\lambda }} = \Omega (\lambda + \mu \cdot \alpha ^{-1})\) is sufficient to ensure that algorithm RobustDE will not reach a capping state. (In addition, in Lemma C.8 the accuracy constant of the flip number is required to be \(\alpha ^{\prime } = (1/2)\cdot \textrm{StepSize}(\alpha ) \le (1/2)\cdot \alpha /(2\Gamma ) \), that is, \(\alpha ^{\prime } = O(\alpha )\).) If in addition we configure \(P_{\textrm{M}} > \mu \), then by Lemmas C.4 and C.7 we have that the output of Guardian is \(\alpha \)-accurate at all times \(t\in [m]\).
Space of Guardian. The space of Guardian alone is accounted for by the space of \({\textsf{K}}_{\textrm{M}}\) copies of the estimator \({\textsf{E}}_{\textrm{M}}\). These are strong trackers with accuracy \(\alpha _{\textrm{M}} = (1/10)\cdot \alpha = O(\alpha )\), where \(\delta ^{\prime } = O(\delta /m)\), \(\delta ^{N} = O(\delta )\), and \(P_{\textrm{M}} = O(\mu )\).
Space of RobustDE. We have that \(S_{\textrm{TDE}}=2\cdot S_{\textrm{DE}}(\gamma , \alpha _{\textrm{TDE}}, 1/(10\cdot p),n,m)\) and \({\hat{\lambda }} = O(\lambda + \mu \cdot \alpha ^{-1})\). And so, by plugging \({\hat{\lambda }}\) and \(S_{\textrm{TDE}}(\alpha ,\delta ,p,n,m) = 2\cdot S_{\textrm{DE}}(\alpha ,\delta /p,n,m)\) into Theorem B.1 we get the required bounds:
where

1.
\(S_{\textrm{ST}} = S_{\textrm{ST}}(\alpha _{\textrm{ST}},1/10,n,m) = S_{\textrm{ST}}(O(\alpha ),1/10,n,m)\).

2.
\(S_{\textrm{DE}} = S_{\textrm{DE}}(\alpha _{\textrm{TDE}},1/(10{\hat{\lambda }}),n,m) = S_{\textrm{DE}}(O(\alpha /\log (\alpha ^{-1})),1/(10{\hat{\lambda }}),n,m)\).

3.
\(\text {polylog}_{\textrm{ALG}} = \left[ \log \left( \frac{m}{\delta ^{*}}\right) + \log \left( \frac{{\hat{\lambda }}}{\alpha \delta ^{*}} \log (n)\right) \right] \sqrt{\log \left( \frac{m}{\delta ^{*}}\right) } = \text {polylog}(\lambda + \mu \cdot \alpha ^{-1}, \alpha ^{-1}, \delta ^{-1}, m, n)\).

4.
\({\hat{\lambda }} = O(\lambda + \mu \cdot \alpha ^{-1})\), \(\delta ^{*} = \delta /\log (\alpha ^{-1})\).
Total space of the extension. It remains to combine the resulting bounds of algorithms RobustDE and Guardian. Since \(\log ^{1.5}(m/\delta ) = O(\text {polylog}_{\textrm{ALG}})\), the space of Guardian is subsumed in the space of RobustDE. \(\square \)
To apply our extended framework to \(F_2\), we first cite constructions of a strong tracker and of a difference estimator for \(F_2\), and then calculate the overall space complexity that results from our framework.
Theorem C.11
(Oblivious strong tracker for \(F_2\) [1, 38]) There exists a strong tracker for the \(F_2\) functionality that, for every stream S of length m, outputs at every time step \(t\in [m]\) an \(\alpha \)-accurate estimation \(z_t\in (1\pm \alpha )\cdot F_2(S)\) with probability at least 9/10, and has space complexity \( O \left( \frac{1}{\alpha ^{2}}\log m \left( \log n + \log m \right) \right) \).
Theorem C.12
(Oblivious DE for \(F_2\) [16]) There exists a \((\gamma , \alpha , \delta )\)-difference estimator for \(F_2\) that uses space \(O\left( \gamma \cdot \frac{\log n}{\alpha ^{2}} \left( \log \frac{1}{\alpha } + \log \frac{1}{\delta } \right) \right) \).
Theorem C.13
(\(F_2\) robust estimation) There exists an adversarially robust \(F_2\) estimation algorithm for turnstile streams of length m with \((\alpha ^{\prime }, m)\)-flip number and \((\gamma _0, m)\)-twist number bounded by \(\lambda \) and \(\mu \) respectively (where \(\alpha ^{\prime }, \gamma _0 = O(\alpha )\)), that guarantees \(\alpha \)-accuracy with probability at least \(1-1/m\) at all times \(t\in [m]\), with space complexity
where \({\tilde{O}}\) stands for omitting \(\text {polylog}(\alpha ^{-1})\) factors.
Proof
By Theorem C.11, there exists an \((\alpha _{\textrm{ST}}, \frac{1}{10})\)-strong tracker for the functionality \(F_2\) with space complexity \(S_{\textrm{ST}}(\alpha _{\textrm{ST}}, \frac{1}{10}, n,m) = O\left( \alpha _{\textrm{ST}}^{-2} \log m \left( \log n + \log m \right) \right) \). For \(m=\text {poly}(n)\) we get \(S_{\textrm{ST}} = O(\alpha ^{-2} \log ^2(m))\). By Theorem C.12, there exists a \((\gamma , \alpha _{\textrm{DE}}, \frac{1}{10})\)-difference estimator for the functionality \(F_2\) with space complexity \(\gamma \cdot S_{\textrm{DE}}(\alpha _{\textrm{DE}}, \delta , n,m)\) where \(S_{\textrm{DE}} = O\left( \alpha _{\textrm{DE}}^{-2} \log n \left( \log \alpha _{\textrm{DE}}^{-1} + \log \delta ^{-1} \right) \right) \). Then by Theorem C.10 we have:
where (1) is obtained by plugging in \(\text {polylog}_{\textrm{ALG}}\) and omitting factors of \(\text {polylog}(\alpha ^{-1})\), and (2) by again omitting factors of \(\text {polylog}(\alpha ^{-1})\), noting that \(\lambda , \mu < m\), and assuming that \(n = \text {poly}(m)\). Now, setting \(\delta = 1/m\) we get:
\(\square \)
Attias, I., Cohen, E., Shechner, M. et al. A Framework for Adversarial Streaming Via Differential Privacy and Difference Estimators. Algorithmica (2024). https://doi.org/10.1007/s00453024012598