1 Introduction

Traitor tracing systems, introduced by Chor et al. [11], are broadcast encryption schemes that are capable of tracing malicious “traitor” coalitions aiming to build pirate decryption devices. Such schemes are widely applicable to the distribution of commercial digital content (e.g., Pay-TV, news website subscriptions, online stock quote broadcasts) for fighting copyright infringement. In particular, consider a scenario where a distributor would like to send digital content to n authorized users via a broadcast channel, while the users possess different secret keys that allow them to decrypt the broadcasts unambiguously. Clearly, a pirate decoder, built upon a set of leaked secret keys, could also extract the cleartext content illegally. To discourage such piracy, in a traitor tracing system, once a pirate decoder is found, the distributor can run a tracing algorithm to recover the identity of at least one user who collaborated in the pirate construction.

As a cryptographic primitive, the traitor tracing system, together with its various generalizations, has been studied extensively in the literature (e.g., [6, 15, 24, 28, 31]). Considerable effort has been devoted to constructing more efficient traitor tracing schemes from other primitives, in terms of improving two decisive performance parameters – the length of the user key and the length of the ciphertext. To illustrate, Table 1 exhibits the relation between cryptographic assumptions and the performance of fully collusion resistant traitor tracing systems, in which tracing succeeds no matter how many user keys the pirate has at its disposal.

Table 1. Some previous results on fully collusion resistant traitor tracing systems.

Obviously, as Table 1 illustrates, more efficient traitor tracing schemes can be constructed from stronger assumptions. Nonetheless, it is natural to ask whether the known constructions can be made more efficient if we only rely on the most fundamental cryptographic assumption – the existence of one-way functions. Impagliazzo and Rudich [21] first studied this type of question in the context of key agreement. They observed that in most constructions in cryptography, the starting primitive is treated as an oracle, or a “black box”, and the security of the constructed scheme is derived from the security of the primitive in a black-box sense. Based on this observation, they showed that a black-box construction of key agreement from one-way functions would imply a proof that \(P\ne NP\). This approach has subsequently been adopted in investigating the reducibility between other cryptographic primitives, such as one-way permutations [23], public-key encryption [18, 19], and universal one-way hash functions [25]. In particular, in the context of traitor tracing, the question is whether there exists a more efficient traitor tracing scheme via black-box constructions based on one-way functions. In this paper, we focus on this problem and provide a partial answer to it.

1.1 Our Results

We consider traitor tracing systems in the random oracle model [4], which is an idealized model that uses one-way functions in the strongest sense. In this model, the constructed cryptographic scheme can access a random oracle O, which can be viewed as a fully random function. In spite of the criticism of its unjustified idealization of practical implementations [10], the random oracle model seems to be an appropriate model and a clean way to establish lower bounds in cryptography (e.g., [1, 21]). As there is no security measure defined on the oracle, one common way to prove security for oracle-based constructions is to rely on the full randomness of the oracle and a restriction on the number of queries the (even computationally unbounded) adversary can ask.

Our main result is a lower bound on the performance of traitor tracing systems satisfying a property we call IndKeys. Roughly speaking, a cryptographic scheme is IndKeys if it does not use the black-box hardness of the starting primitive to generate private keys. Here we give an informal definition of the IndKeys property for general cryptographic systems and defer the formal definition tailored to traitor tracing systems to Sect. 2.

Definition 1 (informal)

Let \(\varPi ^{(\cdot )}\) be a cryptographic scheme that takes other cryptographic primitives or ideal random functions as oracles. We say that \(\varPi ^{(\cdot )}\) is IndKeys if \(\varPi ^{(\cdot )}\) does not access the oracles while generating private keys.

Remark 1

Considering all cryptographic primitives (not restricted to the private-key traitor tracing systems we study here), it should be mentioned that the IndKeys property does not require any independence between the public keys and the oracles. Indeed, some known black-box constructions of cryptographic primitives use the black-box hardness to generate public keys (e.g., one-time signatures [26]), but the private keys are still generated independently of the oracles, as required by IndKeys. To the best of our knowledge, all existing cryptographic schemes obtained via black-box reductions from one-way functions are IndKeys. Thus, our negative result for IndKeys systems shows that most of the standard black-box reductions in cryptography cannot help to construct a more efficient traitor tracing system. Finally, as the IndKeys property is defined for all cryptographic schemes, it might be helpful for investigating the technical limitations of known black-box reductions and deriving more lower bounds for other primitives.

In this paper, we show a lower bound on the performance (or efficiency) of the IndKeys traitor tracing systems in terms of the lengths of user keys and ciphertexts. We summarize the main theorem informally as follows and defer the rigorous statement to Sect. 2.

Theorem 1 (informal)

Let \(\varPi _{\mathtt {TT}}^{(\cdot )}\) be a secure traitor tracing system that is IndKeys. Then

$$\begin{aligned} {\ell _k}\cdot {\ell _c}^2\ge \widetilde{\varOmega }(n) \end{aligned}$$

where \({\ell _k}\) is the length of user key, \({\ell _c}\) is the length of ciphertext and n is the number of users.

1.2 Our Approach

We prove our results by building on the connection between traitor tracing systems and differentially private sanitizers for counting queries discovered by Dwork et al. [13]. Informally, a database sanitizer is differentially private if its outputs on any two databases that differ in only one row are almost the same. They showed that any differentially private and accurate sanitizer (with carefully calibrated parameters) can be used as a valid pirate decoder to break the security of traitor tracing systems. Intuitively, a pirate decoder can be viewed as a sanitizer of databases consisting of leaked user keys.

Building upon this connection, we show the lower bound on traitor tracing systems by constructing a sanitizer in the random oracle model. We first give a natural extension of sanitizers and differential privacy in the presence of random oracles in Sect. 3. The main difference from the standard definitions is that we relax the accuracy requirement by asking the sanitizer to be accurate with high probability with respect to the random oracle. That is, an accurate sanitizer under our definition might be (probabilistically) inaccurate for some oracles but must be accurate for most oracles. This relaxation allows us to derive a query-efficient sanitizer.

Our sanitizer builds upon the median mechanism designed by Roth and Roughgarden [30], which maintains a set \(\mathcal {D}\) of databases and, for each counting query: (1) computes the results of the query for all databases in \(\mathcal {D}\); (2) uses the median \( med \) of these results to answer the query if \( med \) is close to the answer \(a^*\) on the true database; (3) if not, outputs \(a^*\) with added Laplace noise and removes from \(\mathcal {D}\) the databases whose result on the query is not close to \(a^*\). Note that when computing \( med \), the median mechanism needs to query the oracle for all databases in \(\mathcal {D}\), whose size might be exponential in \({\ell _k}\). Thus, it will make exponentially many queries to the oracle.
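For intuition, the following is a minimal Python sketch of this three-step loop; the names `candidates`, `true_db` and `evaluate`, as well as the threshold `T` and noise scale `sigma`, are illustrative placeholders rather than part of the mechanism of [30].

```python
import random
from statistics import median

def laplace(scale):
    """Sample Laplace(0, scale) noise as a difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def median_mechanism(candidates, true_db, queries, evaluate, T=0.1, sigma=0.05):
    """Sketch of the median mechanism loop described above.

    candidates: the maintained set D of candidate databases
    evaluate(q, db): answer of counting query q on database db, in [0, 1]
    """
    D = list(candidates)
    answers = []
    for q in queries:
        # (1) compute the query results for all candidate databases in D
        results = [evaluate(q, db) for db in D]
        med = median(results)
        a_star = evaluate(q, true_db)
        if abs(med - a_star) <= T:
            # (2) the median is close to the true answer, so use it
            answers.append(med)
        else:
            # (3) output the true answer with Laplace noise and remove the
            # candidates whose result is not close to the true answer
            answers.append(a_star + laplace(sigma))
            D = [db for r, db in zip(results, D) if abs(r - a_star) <= T]
    return answers
```

In the oracle setting of this paper, every call to `evaluate` would require querying the oracle; avoiding exactly this cost is the point of the query-efficient variant described next.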

We design a query-efficient implementation of the median mechanism by using the expectations of the query results (taken over all oracles) to compute \( med \) without querying the real oracle. Our mechanism is accurate as long as the answers concentrate around their expectations taken over all random oracles. Unfortunately, such a concentration property does not hold for arbitrary queries and databases. Fortunately, we can show that it holds if there are no “significant” variables in the decryption (or query answering). More specifically, we generalize the deviation bound proved in [2], where the size of the database (decision forest) is required to be relatively large compared to the “significance” of the variables (see the formal definitions in Sect. 6). Our bound does not make this requirement and is more broadly applicable. We prove this bound by generalizing two previous deviation bounds proved by Beck et al. [2] and Gavinsky et al. [17]. Note that the IndKeys property is essential in our proof since the deviation bound only holds for uniformly distributed oracles.

To put it together, our mechanism maintains a set of databases \(\mathcal {D}\) and, for each counting query: (a) removes the variables which are significant for most databases in \(\mathcal {D}\); (b) privately checks whether the decryption process corresponding to the true database has a significant variable; (c) if there is a significant variable \(x^*\), outputs the noisy true answer and removes the databases that do not view \(x^*\) as a significant variable; (d) otherwise, computes the median \( med \) among the expected answers of all databases in \(\mathcal {D}\); (e) if \( med \) is close to the true answer, uses it to answer the query; (f) otherwise, outputs the noisy answer and removes the databases in \(\mathcal {D}\) whose expected answer is not close to the true answer.

1.3 Related Work

Starting with the seminal paper of Impagliazzo and Rudich [21], black-box reducibility between primitives has attracted a lot of attention in modern cryptography. Reingold et al. [29] revisited existing negative results and gave a more formal treatment of the notions of black-box reductions. In their terminology, our results can be viewed as a refutation of fully black-box reductions of IndKeys traitor tracing systems to one-way functions. Our usage of the random oracle model also follows the work of Barak and Mahmoody-Ghidary [1], who proved lower bounds on the query complexity of every black-box construction of one-time signature schemes from symmetric cryptographic primitives as modeled by random oracles. To the best of our knowledge, there were no lower bound results on the performance of traitor tracing systems prior to our work.

Differential privacy, a well studied notion of privacy tailored to private data analysis, was first formalized by Dwork et al. [12]. They also gave an efficient sanitizer, called the Laplace mechanism, that is able to answer \(n^2\) counting queries. A remarkable follow-up result of Blum et al. [5] shows that the number of counting queries can be increased to sub-exponential in n by using the exponential mechanism of McSherry and Talwar [27]. Subsequently, interactive mechanisms with running time polynomial in n and the universe size were developed to answer sub-exponentially many queries adaptively by Roth and Roughgarden [30] (median mechanism) and Hardt and Rothblum [20] (multiplicative weights mechanism). On the other hand, based on the connection between traitor tracing systems and sanitizers, Ullman [32] proved that, assuming one-way functions exist, no differentially private sanitizer with running time polynomial in n and the logarithm of the universe size can answer \(\widetilde{\varTheta }(n^2)\) queries accurately. Our sanitizer constructions are inspired by the above mechanisms and also rely on the composition theorem for differentially private mechanisms by Dwork et al. [14]. Thus, our results can be viewed as an application of advanced techniques for designing differentially private sanitizers to proving cryptographic lower bounds.

This paper is also technically related to previous deviation bounds on Boolean decision forests. Gavinsky et al. [17] showed that for any decision forest in which every input variable appears in few trees, the average of the decision trees’ outputs concentrates around its expectation when the input variables are distributed independently and uniformly. Similar bounds were also proved by Beck et al. [2] for low-depth decision trees, but with a weaker “average” condition (see Sect. 6). As an application, they used this deviation bound to show that \(AC^{0}\) circuits cannot sample good codes uniformly. By a finer treatment of the conditions stated in the above two works, we are able to prove a more general deviation bound for decision forests, which we believe should have other applications.

1.4 Organization

The rest of the paper is organized as follows. In Sect. 2, we give the formal definition of traitor tracing systems in the random oracle model and state our main theorem. Then we review the connection between traitor tracing systems and differentially private sanitizers in Sect. 3. In Sect. 4, we prove a weaker lower bound of \(\widetilde{\varOmega }(n^{1/3})\) to illustrate the main ideas, using a general large deviation bound for decision forests. We then improve the bound to \(\widetilde{\varOmega }(n)\), as stated in our main theorem, in Sect. 5 by more elaborate arguments. In Sect. 6, we give the proof of the large deviation bound for decision forests that is used but not proved in Sect. 4. Due to space limits, some proofs are deferred to Appendix A. Furthermore, in Appendix B, we show an oracle separation result between one-way functions and secure traitor tracing systems as a straightforward implication of our main theorem.

2 Traitor Tracing Systems

In this section, we give a formal definition of traitor tracing systems in the random oracle model and state our main theorem. For any security parameter \(\kappa \in \mathbb {N}\), an oracle can be viewed as a Boolean function \(O:\{0,1\}^{{\ell _o}(\kappa )}\rightarrow \{0,1\}\), where \({\ell _o}\) is a function from \(\mathbb {N}\) to \(\mathbb {N}\).

Definition 2

Let n, m, \(\ell _k\), \(\ell _c\), and \({\ell _o}\) be functions from \(\mathbb {N}\) to \(\mathbb {N}\). A traitor tracing system in the random oracle model, denoted by \(\varPi _{\mathtt {TT}}\), with n users, user-key length \({\ell _k}\), ciphertext length \({\ell _c}\), m tracing rounds and access to an oracle with input length \({\ell _o}\), also contains the following four algorithms. We allow all the algorithms to be randomized except \(\mathtt {Dec}\).

  • \(\mathtt {Gen}^O(1^{\kappa })\), the setup algorithm, takes a security parameter \(\kappa \) as input and a Boolean function \(O:\{0,1\}^{{\ell _o}(\kappa )}\rightarrow \{0,1\}\) as an oracle, and outputs \(n=n(\kappa )\) user-keys \(k_1,\dots ,k_n\in \{0,1\}^{{\ell _k}(\kappa )}\). Formally, \(\mathbf {k}=(k_1,\dots ,k_n)\leftarrow _R\mathtt {Gen}^O(1^\kappa )\).

  • \(\mathtt {Enc}^O(\mathbf {k},b)\), the encrypt algorithm, takes n user-keys \(\mathbf {k}\) and a message \(b\in \{0,1\}\) as input, and outputs a ciphertext \(c\in \{0,1\}^{{\ell _c}(\kappa )}\) via querying an oracle O. Formally, \(c\leftarrow _R\mathtt {Enc}^O(\mathbf {k},b)\).

  • \(\mathtt {Dec}^O(k_i,c)\), the decrypt algorithm, takes a user-key \(k_i\) and a ciphertext c as input, and outputs a message \(b\in \{0,1\}\) via querying an oracle O. Formally, \(b=\mathtt {Dec}^O(k_i,c)\).

  • \(\mathtt {Trace}^{O,\mathcal {P}^O}(\mathbf {k})\), the tracing algorithm, takes n user-keys \(\mathbf {k}\) as input, an oracle O and a pirate decoder \(\mathcal {P}^O\) as oracles, and makes \(m(\kappa )\) queries to \(\mathcal {P}^O\), and outputs the name of a user \(i\in [n]\). Formally, \(i\leftarrow _R\mathtt {Trace}^{O,\mathcal {P}^O}(\mathbf {k})\).

Formally, \(\varPi _{\mathtt {TT}}=(n,m,{\ell _k},{\ell _c},{\ell _o},\mathtt {Gen}^{(\cdot )},\mathtt {Enc}^{(\cdot )},\mathtt {Dec}^{(\cdot )}, \mathtt {Trace}^{(\cdot ,\cdot )})\).
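For concreteness, the syntax of Definition 2 can be summarized by the following minimal Python sketch, in which an oracle is modeled as a callable on \({\ell _o}\)-bit strings; all names here are illustrative and not part of the formal definition.

```python
from typing import Callable, List

Oracle = Callable[[str], int]   # O : {0,1}^{l_o} -> {0,1}, inputs given as bit-strings
Key = str                       # a user key in {0,1}^{l_k}
Ciphertext = str                # a ciphertext in {0,1}^{l_c}

class TraitorTracingSystem:
    """Syntactic sketch of Definition 2 (an interface, not a construction)."""

    def gen(self, oracle: Oracle, kappa: int) -> List[Key]:
        """Output n(kappa) user keys; may query the oracle (unless IndKeys)."""
        raise NotImplementedError

    def enc(self, oracle: Oracle, keys: List[Key], b: int) -> Ciphertext:
        """Encrypt a one-bit message given all user keys, querying the oracle."""
        raise NotImplementedError

    def dec(self, oracle: Oracle, key: Key, c: Ciphertext) -> int:
        """Deterministically decrypt a ciphertext with one user key."""
        raise NotImplementedError

    def trace(self, oracle: Oracle, pirate, keys: List[Key]) -> int:
        """Interact with a pirate decoder for m(kappa) rounds; output a user index."""
        raise NotImplementedError
```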

For simplicity, when we use the notation \(\varPi _{\mathtt {TT}}\) without any specification, we mean that all these functions and algorithms are defined correspondingly. We also abuse notation and write a function of \(\kappa \) for its value when \(\kappa \) is clear from the context (e.g., n denotes \(n(\kappa )\)).

Intuitively, the pirate decoder \(\mathcal {P}\) can be viewed as a randomized algorithm that holds a set of user-keys \(\mathbf {k}_S=(k_i)_{i\in S}\) with \(S\subseteq [n]\). The tracing algorithm \(\mathtt {Trace}\) attempts to identify a user \(i\in S\) by making queries to \(\mathcal {P}\) interactively. In particular, in each round \(j\in [m]\), \(\mathtt {Trace}\) submits a ciphertext \(c_j\) to \(\mathcal {P}\), and \(\mathcal {P}\) answers with a message \(\widehat{b}_j\in \{0,1\}\) based on all the previous ciphertexts \(c_1,\dots ,c_j\). Formally, \(\widehat{b}_j\leftarrow _R\mathcal {P}^O(\mathbf {k}_S,c_1,\dots ,c_j)\). Note that we allow the tracing algorithm to be stateful; that our lower bounds apply to stateful traitor tracing systems makes our results stronger. Given a function \({\ell _o}\) and a security parameter \(\kappa \), let \(\mathcal {O}_{\mathrm {unif}}\) denote the uniform distribution over all oracles with input length \({\ell _o}(\kappa )\), i.e. the uniform distribution over all Boolean functions with domain \(\{0,1\}^{{\ell _o}(\kappa )}\). We also abuse \(\mathcal {O}_{\mathrm {unif}}\) to denote the support of this distribution. As a pirate decoder, \(\mathcal {P}\) should be capable of decrypting ciphertexts with high probability, as formalized below.
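The interaction between \(\mathtt {Trace}\) and a pirate decoder can be pictured as the following loop; `choose_ciphertext` and `accuse` are hypothetical stand-ins for the internals of a concrete tracing algorithm, and `pirate` internally holds the leaked keys \(\mathbf {k}_S\).

```python
def run_tracing(oracle, pirate, keys, m, choose_ciphertext, accuse):
    """Sketch of the m-round interaction between Trace and a pirate decoder.

    pirate(oracle, c_1..c_j) -> a bit b_hat_j
    choose_ciphertext(oracle, keys, transcript) -> the next ciphertext c_j
    accuse(oracle, keys, transcript) -> an accused user index in [n]
    """
    transcript = []  # list of (c_j, b_hat_j) pairs observed so far
    for _ in range(m):
        c = choose_ciphertext(oracle, keys, transcript)
        history = [cj for cj, _ in transcript] + [c]
        b_hat = pirate(oracle, history)
        transcript.append((c, b_hat))
    return accuse(oracle, keys, transcript)
```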

Definition 3

Let \(\varPi _{\mathtt {TT}}\) be a traitor tracing system and \(\mathcal {P}^{(\cdot )}\) be a pirate decoder, we say that \(\mathcal {P}\) is m-available if for every \(S\subseteq [n]\) s.t. \(|S|\ge n-1\),

$$\mathop {\Pr }\limits _{\begin{array}{c} O\sim \mathcal {O}_{\mathrm {unif}},\,\mathbf {k}\leftarrow _R\mathtt {Gen}^O(1^\kappa )\\ c_j\leftarrow _R\mathtt {Trace}^{O,\mathcal {P}}(\mathbf {k},\widehat{b}_1,\dots ,\widehat{b}_{j-1})\\ \widehat{b}_j\leftarrow _R\mathcal {P}^O(\mathbf {k}_S,c_1,\dots ,c_j) \end{array}} \left[ \exists j\in [m],\, b\in \{0,1\}:\ \left( \forall i\in S,\ \mathtt {Dec}^O(k_i,c_j)=b\right) \wedge (\widehat{b}_j\ne b) \right] \le neg (n(\kappa )) $$

Similarly, the traitor tracing system itself should decrypt ciphertexts correctly.

Definition 4

A traitor tracing system \(\varPi _{\mathtt {TT}}\) is said to be correct if for every oracle O, user \(i\in [n]\) and message \(b\in \{0,1\}\),

$$\begin{aligned} \mathop {\Pr }\limits _{\begin{array}{c} \mathbf {k}\leftarrow _R\mathtt {Gen}^O(1^\kappa )\\ c\leftarrow _R\mathtt {Enc}^O(\mathbf {k},b) \end{array}}[\mathtt {Dec}^O(k_i,c)=b]=1 \end{aligned}$$

In addition, we require the traitor tracing system to be efficient in terms of the number of queries it makes. In particular, we use \(\mathrm {QC}(\mathcal {A}^O)\) to denote the query complexity of \(\mathcal {A}^O\), i.e. the number of queries \(\mathcal {A}^O\) makes to O.

Definition 5

A traitor tracing system \(\varPi _{\mathtt {TT}}\) is said to be efficient if for any oracle O with input size \({\ell _o}(\kappa )\) and any pirate decoder \(\mathcal {P}\), the query complexities of \(\mathtt {Gen}^O,\mathtt {Enc}^O,\mathtt {Dec}^O,\mathtt {Trace}^O\) are polynomial in their respective input sizes. Formally, \(\mathrm {QC}(\mathtt {Gen}^O)=\mathtt {poly}(\kappa )\), \(\mathrm {QC}(\mathtt {Enc}^O)=\mathtt {poly}(n,{\ell _k})\), \(\mathrm {QC}(\mathtt {Dec}^O)=\mathtt {poly}({\ell _k},{\ell _c})\) and \(\mathrm {QC}(\mathtt {Trace}^{O,\mathcal {P}})=\mathtt {poly}(n,m,{\ell _k})\).

Note that we do not place any restriction on the computational power of the traitor tracing systems. Obviously, any computationally efficient \(\varPi _{\mathtt {TT}}\) is also query efficient, but the converse does not hold. That our lower bounds apply to \(\varPi _{\mathtt {TT}}\) that are efficient in the above sense means that they apply directly to computationally efficient \(\varPi _{\mathtt {TT}}\). Similarly, we say a pirate decoder \(\mathcal {P}\) is efficient if \(\mathrm {QC}(\mathcal {P}^O)=\mathtt {poly}(n,{\ell _k},{\ell _c})\) in each round of its interaction with \(\mathtt {Trace}\).

Definition 6

A traitor tracing system \(\varPi _{\mathtt {TT}}\) is said to be secure if for any efficient \(m(\kappa )\)-available pirate decoder \(\mathcal {P}\) and \(S\subseteq [n(\kappa )]\),

$$\mathop {\Pr }\limits _{\begin{array}{c} O\sim \mathcal {O}_{\mathrm {unif}}\\ \mathbf {k}\leftarrow _R\mathtt {Gen}^O \end{array}}[\mathtt {Trace}^{O,\mathcal {P}^O(\mathbf {k}_S)}(\mathbf {k})\not \in S]\le o\left( \frac{1}{n(\kappa )}\right) $$

Definition 7 (IndKeys)

A traitor tracing system \(\varPi _{\mathtt {TT}}\) is said to be IndKeys if for every security parameter \(\kappa \in \mathbb {N}\) and any two oracles O and \(O'\), the distributions of \(\mathbf {k}\) generated by \(\mathtt {Gen}^O(1^\kappa )\) and \(\mathtt {Gen}^{O'}(1^\kappa )\) are identical. Equivalently, conditioned on any particular user-keys \(\mathbf {k}\), the oracle O can still be viewed as a random variable drawn from \(\mathcal {O}_{\mathrm {unif}}\).

Remark 2

Note that all known traitor tracing systems based on black-box hardness are IndKeys. The scheme with \({\ell _k}=O(n^2\kappa )\) and \({\ell _c}=O(\kappa )\) does not require oracles at all, and the one designed by Chor et al. [11] and modified by Ullman [32] with \({\ell _k}=O(\kappa )\) and \({\ell _c}=O(n\kappa )\) does not need the oracle to generate private keys.

The following theorem is our main theorem whose proof is deferred to Sects. 4 and 5.

Theorem 2

In the random oracle model, for any \(\theta >0\), there is no query-efficient, correct and secure traitor tracing system \(\varPi _{\mathtt {TT}}^{(\cdot )}\) which is IndKeys, such that for any security parameter \(\kappa \in \mathbb {N}\),

$$\begin{aligned} {\ell _k}(\kappa )\cdot {\ell _c}(\kappa )^2\le n(\kappa )^{1-\theta }. \end{aligned}$$

3 Differentially Private Sanitizers in Random Oracle Model

In this section, we formally define differentially private sanitizers for counting queries in the random oracle model by extending the standard definitions. We then show their connection with traitor tracing systems by slightly modifying the proofs in [13, 32]. For ease of presentation, we reuse the notation of Sect. 2 (e.g., \(n,m,{\ell _k},{\ell _c},{\ell _o}\)) to denote the corresponding quantities in the context of private data analysis.

A counting query on \(\{0,1\}^{\ell _k}\) is defined by a deterministic algorithm \(q^{(\cdot )}\) such that, given any oracle \(O:\{0,1\}^{\ell _o}\rightarrow \{0,1\}\), \(q^O\) is a Boolean function \(\{0,1\}^{\ell _k}\rightarrow \{0,1\}\). Abusing notation, we define the evaluation of the query \(q^{(\cdot )}\) on a database \(D=(x_1,\dots ,x_n)\in (\{0,1\}^{\ell _k})^n\) with access to O as \(q^O(D)=\frac{1}{n}\sum _{i\in [n]}q^O(x_i)\). Let \(\mathcal {Q}\) be a set of counting queries. A sanitizer \(\mathcal {M}^{(\cdot )}\) for \(\mathcal {Q}\) can be viewed as a randomized algorithm that takes a database \(D\in (\{0,1\}^{\ell _k})^n\) and a sequence of counting queries \(\mathbf {q}^{(\cdot )}=(q^{(\cdot )}_1,\dots ,q^{(\cdot )}_m)\in \mathcal {Q}^m\) as input and outputs a sequence of answers \((a_1,\dots , a_m)\in \mathbb {R}^m\) by accessing an oracle O. We consider interactive mechanisms, meaning that \(\mathcal {M}^{(\cdot )}\) should answer each query without knowing the subsequent queries. More specifically, the computation of \(a_i\) can only depend on the first i queries, i.e. \((q^{(\cdot )}_1,\dots , q^{(\cdot )}_i)\). One might note that our definition differs from the traditional definition of sanitizers by allowing both the sanitizer and the queries to access oracles. Sanitizers are defined in this specific way precisely because it makes them useful for proving the hardness of the traitor tracing systems defined in Sect. 2; it is not clear to us whether this notion has any real application in the context of private data analysis. Here we use the term “query” in two ways, one referring to the counting queries answered by the sanitizer and the other to the queries sent by algorithms to oracles. Unless specified otherwise, only when we say “query complexity” or “query efficient” are we referring to oracle queries.
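In code, evaluating a counting query on a database with oracle access is simply an average of per-row Boolean evaluations; a minimal sketch (with an illustrative query signature) is the following.

```python
from typing import Callable, Sequence

Oracle = Callable[[str], int]           # O : {0,1}^{l_o} -> {0,1}
Query = Callable[[Oracle, str], int]    # q^O : {0,1}^{l_k} -> {0,1}

def evaluate_counting_query(query: Query, oracle: Oracle, database: Sequence[str]) -> float:
    """Return q^O(D) = (1/n) * sum_i q^O(x_i) for D = (x_1, ..., x_n)."""
    n = len(database)
    return sum(query(oracle, row) for row in database) / n
```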

We say that two databases \(D,D'\in (\{0,1\}^{\ell _k})^n\) are adjacent if they differ only on a single row. We use \(\mathbf {q}^{(\cdot )}=(q^{(\cdot )}_1,\dots ,q^{(\cdot )}_m)\) to denote a sequence of m queries. Next, we give a natural extension of differential privacy to the setting with oracle access.

Definition 8

A sanitizer \(\mathcal {M}^{(\cdot )}\) for a set of counting queries \(\mathcal {Q}\) is said to be \((\varepsilon ,\delta )\)-differentially private if for any two adjacent databases D and \(D'\), oracle O, query sequence \(\mathbf {q}^{(\cdot )}\in \mathcal {Q}^m\) and any subset \(S\subseteq \mathbb {R}^m\),

$$\begin{aligned} \Pr [\mathcal {M}^O(D,\mathbf {q}^O)\in S]\le e^{\varepsilon }\Pr [\mathcal {M}^O(D',\mathbf {q}^O)\in S]+\delta \end{aligned}$$

If \(\mathcal {M}^{(\cdot )}\) is \((\varepsilon ,\delta )\)-differentially private for some constant \(\varepsilon =O(1)\) and \(\delta =o(1/n)\), we will drop the parameters \(\varepsilon \) and \(\delta \) and just say that \(\mathcal {M}^{(\cdot )}\) is differentially private.

Proposition 1

(Lemma 3.7 from [20]). The following condition implies \((\varepsilon ,\delta )\)-differential privacy. For any two adjacent databases D and \(D'\), oracle O and any query sequence \(\mathbf {q}^{(\cdot )}\in \mathcal {Q}^m\),

$$\mathop {\Pr }\limits _{a\leftarrow _R\mathcal {M}^O(D,\mathbf {q}^O)}\left[ \left| \log {\left( \frac{\Pr [\mathcal {M}^O(D,\mathbf {q}^O)=a]}{\Pr [\mathcal {M}^O(D',\mathbf {q}^O)=a]}\right) }\right| >\varepsilon \right] \le \delta $$

Moreover, a sanitizer should answer any sequence of queries accurately with high probability.

Definition 9

A sanitizer \(\mathcal {M}^{(\cdot )}\) is said to be \((\alpha ,\beta )\)-accurate for a set of counting queries \(\mathcal {Q}\) if for any database D

$$\mathop {\Pr }\limits _{O\sim \mathcal {O}_{\mathrm {unif}}}\left[ \forall \mathbf {q}^{(\cdot )}\in \mathcal {Q}^m, \left\| \mathcal {M}^O(D,\mathbf {q}^O)-\mathbf {q}^O(D)\right\| _\infty \le \alpha \right] \ge 1-\beta $$

If \(\mathcal {M}^{(\cdot )}\) is \((\alpha ,\beta )\)-accurate for some constant \(\alpha <1/2\) and \(\beta =o(1/n^{10})\), we will drop parameters \(\alpha \) and \(\beta \) and just say that \(\mathcal {M}^{(\cdot )}\) is accurate.

Finally, we consider the query complexity of sanitizers. Clearly, a sanitizer cannot be query efficient if the evaluation of some counting query \(q^{(\cdot )}\) is not query efficient. Let \(\mathcal {Q}_\mathtt {Enf}\) be the set of all efficient queries, i.e. queries \(q^{(\cdot )}\) such that for any database \(D\in (\{0,1\}^{\ell _k})^n\) and any oracle O, \(q^O(D)\) can be evaluated with \(\mathtt {poly}(n,{\ell _k},{\ell _c})\) queries to O. A sanitizer is said to be efficient if for any oracle O, database D and query sequence \(\mathbf {q}^{(\cdot )}\in \mathcal {Q}_\mathtt {Enf}^m\), \(\mathcal {M}^O(D,\mathbf {q}^O)\) can be computed with \(\mathtt {poly}(n,m,{\ell _k})\) queries to O.

Theorem 3

Given functions \(n,m,{\ell _k},{\ell _c}\) and \({\ell _o}:\mathbb {N}\rightarrow \mathbb {N}\), if for any query set \(\mathcal {Q}\subseteq \mathcal {Q}_\mathtt {Enf}\) with size \(|\mathcal {Q}|\le 2^{{\ell _c}(\kappa )}\), there exists an efficient, differentially private and accurate sanitizer for any database \(D\in (\{0,1\}^{\ell _k(\kappa )})^{n(\kappa )}\) and any m-query sequence in \(\mathcal {Q}^m\), then there exists no efficient, correct and secure traitor tracing system \(\varPi _{\mathtt {TT}}=(n,m,{\ell _k},{\ell _c},{\ell _o},\mathtt {Gen},\mathtt {Enc},\mathtt {Dec},\mathtt {Trace})\).

Remark 3

The proof idea is similar to [13, 32]: if there exist such a sanitizer and a traitor tracing system, we can slightly modify the sanitizer into an available pirate decoder for the traitor tracing system. The only technical difference is that the traitor tracing system and the sanitizer defined here have access to a random oracle O, so we need to modify the proof in [32] to accommodate these oracle accesses and the definitions in Sects. 2 and 3.

4 Lower Bounds on Traitor Tracing Systems

In this section, we exhibit the proof of a weaker version of Theorem 2. That is, for any \(\theta >0\), there is no efficient, correct and secure traitor tracing system such that \({\ell _k}(\kappa )\cdot {\ell _c}(\kappa )^2\le n(\kappa )^{\frac{1}{3}-\theta }\). Assume towards a contradiction that there exists such a system \(\varPi _{\mathtt {TT}}\), and let \(q_\pi \) denote the maximum query complexity of \(\mathtt {Dec}^O(\mathbf {k},c)\) over all databases \(\mathbf {k}\), ciphertexts c and oracles O. We will construct an efficient, differentially private and accurate sanitizer \(\mathcal {M}\) for any m queries from the query set \(\{\mathtt {Dec}^{(\cdot )}(\cdot ,c)\,|\,c\in \{0,1\}^{\ell _c}\}\) and any database \(D\in (\{0,1\}^{\ell _k})^n\) (inspired by [20, 30]). In this section, we abuse the notation \(\mathtt {Dec}^{(\cdot )}(\mathbf {k},c)\) to denote the function \(\frac{1}{n}\cdot \sum _{i\in [n]}\mathtt {Dec}^{(\cdot )}(k_i,c)\). Before describing the sanitizer, we first define significant variables for decryption.

Definition 10

Given a database \(\mathbf {k}\in (\{0,1\}^{\ell _k})^n\), a decrypt algorithm \(\mathtt {Dec}^{(\cdot )}\) and a ciphertext c, we say a variable \(x\in \{0,1\}^{\ell _o}\) is \(\beta \)-significant for \(\mathtt {Dec}^{(\cdot )}(k_i,c)\) if

$$\mathop {\Pr }\limits _{O\sim \mathcal {O}_{\mathrm {unif}}}\left[ \mathtt {Dec}^O(k_i,c) \, queries \, x\right] \ge \beta $$

We say x is \(\beta \)-significant for \(\mathtt {Dec}^{(\cdot )}(\mathbf {k},c)\), if x is \(\beta \)-significant for at least one \(k_i\in \mathbf {k}\). We say x is \((\alpha ,\beta )\)-significant for \(\mathtt {Dec}^{(\cdot )}(\mathbf {k},c)\), if x is \(\beta \)-significant for at least \(\alpha n\) entries of \(\mathbf {k}\).
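The probabilities in Definition 10 are taken over a uniformly random oracle; the following hedged sketch estimates them by Monte Carlo sampling, instrumenting the decryption with a query-recording wrapper. The function `dec(oracle, key, ciphertext)`, the input length `ell_o` and the sampling itself are assumptions made for illustration; the analysis in this paper works with the exact probabilities.

```python
import random

def sample_random_oracle(ell_o):
    """Lazily sample a uniformly random Boolean function on {0,1}^{ell_o}."""
    table = {}
    def oracle(x):
        assert len(x) == ell_o
        if x not in table:
            table[x] = random.getrandbits(1)
        return table[x]
    return oracle

def estimate_query_probability(dec, key, ciphertext, x, ell_o, trials=1000):
    """Estimate Pr over random oracles that Dec^O(key, ciphertext) queries x."""
    hits = 0
    for _ in range(trials):
        oracle = sample_random_oracle(ell_o)
        queried = set()
        def recording_oracle(point):
            queried.add(point)
            return oracle(point)
        dec(recording_oracle, key, ciphertext)
        hits += x in queried
    return hits / trials
```

In these terms, x is \(\beta \)-significant for \(\mathtt {Dec}^{(\cdot )}(k_i,c)\) exactly when this probability is at least \(\beta \).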

Our sanitizer is described as Algorithm 1, with the parameters \(\sigma , \alpha , \beta \) set to

$$\sigma =n^{\theta /3}\sqrt{\frac{{\ell _k}}{n}},\qquad \alpha =\frac{1}{{\ell _c}n^\theta },\qquad \beta =\frac{1}{54n^4q^3_\pi }$$

The intuition behind this calibration of parameters is that we need \(\alpha \) to dominate \(\sigma {\ell _k}\), which will be used in the later analysis. Since \({\ell _k}\cdot {\ell _c}^2\le n^{\frac{1}{3}-\theta }\), a simple calculation gives \(\alpha /(\sigma {\ell _k})\ge n^{\theta /6}\).
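In more detail, under these parameter settings and the assumption \({\ell _k}\cdot {\ell _c}^2\le n^{1/3-\theta }\) (which in particular gives \({\ell _k}\le n^{1/3-\theta }\)), the calculation is

$$\frac{\alpha }{\sigma {\ell _k}}=\frac{1}{{\ell _c}n^{\theta }}\cdot \frac{1}{n^{\theta /3}\sqrt{{\ell _k}/n}}\cdot \frac{1}{{\ell _k}}=\frac{n^{1/2-4\theta /3}}{{\ell _c}\,{\ell _k}^{3/2}}=\frac{n^{1/2-4\theta /3}}{({\ell _k}{\ell _c}^2)^{1/2}\,{\ell _k}}\ge \frac{n^{1/2-4\theta /3}}{n^{(1/3-\theta )/2}\cdot n^{1/3-\theta }}=n^{\theta /6}.$$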

The main idea is to maintain a set of potential databases, denoted by \(\mathcal {D}_j\), for each round j. Note that the IndKeys property of the system guarantees that, conditioned on any particular database, the oracle is still distributed uniformly. This allows us to focus on the candidate databases rather than pairs of databases and oracles. For each ciphertext \(c_j\), the sanitizer consists of three phases. In phase 1, we examine all \(x\in \{0,1\}^{\ell _o}\) and determine a set (denoted by \(W_j\)) of significant variables, namely those queried with probability at least \(\beta /2\) over the randomness of \(O\sim \mathcal {O}_{\mathrm {unif}}\) and a uniform \(\mathbf {k}\in \mathcal {D}_{j-1}\). Roughly speaking, we pick all variables which are significant for most databases. It should be emphasized that even though some variables are not picked in this phase, they might still be significant for some databases. Then, for each variable in \(W_j\), we query \(O^*\) on it and simplify the decrypt algorithm by fixing the value of that variable. Note that this phase does not depend on the true database \(\mathbf {k}^*\), so it is clear that there is no privacy loss here. On the other hand, as we will show in Lemma 1, the total number of queries we ask the oracle \(O^*\) in this phase is polynomial in n.

In phase 2, we check whether \(\mathtt {Dec}^{(\cdot )}(\mathbf {k}^*,c_j)\) has an \((\alpha ,\beta )\)-significant variable by using a variant of the exponential mechanism. If there is a significant variable, the sanitizer outputs the noisy true answer \(\widehat{a}_j\) and modifies \(\mathcal {D}_j\). If there are no \((\alpha ,\beta )\)-significant variables, the sanitizer runs phase 3, where it “guesses” the answer by using the median over the database set \(\mathcal {D}'_{j-1}\), which is the set of all databases in \(\mathcal {D}_{j-1}\) that have no \((\alpha ,\beta )\)-significant variables. The sanitizer outputs the guess \( med _j\) if it is close to the true answer; otherwise, it outputs \(\widehat{a}_j\) and modifies \(\mathcal {D}_j\).

Algorithm 1 (the query-efficient sanitizer described above).
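Since Algorithm 1 is given as a figure, the following Python sketch reconstructs its control flow from the prose description above; the helper object `H`, the default threshold `T`, and the omission of Phase 1 are illustrative assumptions rather than the authors' exact pseudocode.

```python
import random
from statistics import median

def laplace(scale):
    """Sample Laplace(0, scale) noise as a difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def sanitize(true_db, candidates, ciphertexts, H, alpha, beta, sigma, T=0.2):
    """Hedged reconstruction of Algorithm 1 from the prose of this section.

    H is a placeholder helper object assumed to provide:
      H.query_prob(row, c, x)   -- Pr over random oracles that Dec(row, c) queries x
      H.true_answer(db, c)      -- Dec^{O*}(db, c), computed with the real oracle
      H.expected_answer(db, c)  -- E_O[Dec^O(db, c)] over uniformly random oracles
      H.candidate_vars(D, c)    -- oracle positions that databases in D may query
    Phase 1 (fixing, via the real oracle, every variable queried with probability
    at least beta/2 on average over D) is omitted here: it only simplifies the
    decryption circuits and does not change the control flow below.
    """
    D = list(candidates)
    answers = []
    for c in ciphertexts:
        def frac_sig(db, x):
            # fraction of rows of db for which x is beta-significant
            return sum(H.query_prob(row, c, x) >= beta for row in db) / len(db)

        # Phase 2: privately test for an (alpha, beta)-significant variable of true_db.
        U = [x for x in H.candidate_vars(D, c)
             if any(H.query_prob(row, c, x) >= beta for db in D for row in db)]
        noisy_sig = {x: frac_sig(true_db, x) + laplace(sigma) for x in U}
        a_hat = H.true_answer(true_db, c) + laplace(sigma)  # noisy true answer
        if U and max(noisy_sig.values()) >= alpha / 2:
            # Type 1 round: keep only databases that also treat x_star as significant.
            x_star = max(noisy_sig, key=noisy_sig.get)
            D = [db for db in D
                 if any(H.query_prob(row, c, x_star) >= beta for row in db)]
            answers.append(a_hat)
            continue

        # Phase 3: guess with the median of expected answers over databases having
        # no (alpha, beta)-significant variable; no real oracle queries are needed.
        D_prime = [db for db in D
                   if not any(frac_sig(db, x) >= alpha for x in H.candidate_vars([db], c))]
        med = median(H.expected_answer(db, c) for db in D_prime) if D_prime else a_hat
        if abs(med - a_hat) <= T:
            answers.append(med)                             # Type 3 round
        else:
            # Type 2 round: answer noisily and prune databases far from the answer.
            D = [db for db in D_prime
                 if abs(H.expected_answer(db, c) - a_hat) <= T]
            answers.append(a_hat)
    return answers
```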

4.1 Efficiency Analysis

Lemma 1

The query complexity of Algorithm 1 is \(O(n{\ell _k}q_\pi /\beta )\), which is polynomial in n.

Proof

Let \(\mathbf {x}=(x_1,\dots ,x_{q_\pi })\) be a sequence of \(q_\pi \) oracle variables where \(x_i\in \{0,1\}^{\ell _o}\) and \(\mathbf {b}=(b_1,\dots ,b_{q_\pi })\) be a sequence of \(q_\pi \) bits where \(b_i\in \{0,1\}\). We define an indicator function of \(\mathbf {x}, \mathbf {b}, O\) and \(\mathbf {k}\) as follows.

$$\mathbf {1}_{\mathbf {x},\mathbf {b}}(O,\mathbf {k})=\left\{ \begin{aligned} 1&\quad \text {if } \mathtt {Dec}^O(\mathbf {k},c_j)\text { queries } x_1,\dots ,x_{q_\pi } \text { sequentially and } \mathbf {b}=O(\mathbf {x})\\ 0&\quad \text {otherwise.} \end{aligned} \right. $$

Then we define a potential function \(\varPhi =\sum _{\mathbf {x},\mathbf {b}}\sum _{O\in \mathcal {O}_{\mathrm {unif}},k\in \mathcal {D}_{j-1}}\mathbf {1}_{\mathbf {x},\mathbf {b}}(O,\mathbf {k})\). Clearly, the value of \(\varPhi \) at the beginning of Phase 1 is at most \(2^{n{\ell _k}q_{\pi }}\), since \(|\mathcal {D}_{j-1}|\le 2^{n{\ell _k}}\) and, for any particular \(\mathbf {k}\) and \(c_j\), the number of possible query histories of \(\mathtt {Dec}^{(\cdot )}(\mathbf {k},c_j)\) is at most \(2^{q_\pi }\).

We will show that when fixing a variable \(x\in W_j\) such that

$$\mathop {\Pr }\limits _{\mathbf {k}\sim \mathrm {Unif}(\mathcal {D}_{j-1}),O\sim \mathcal {O}_{\mathrm {unif}}}[\mathtt {Dec}^O(k_i,c_j) \text { queries }x \text { for some }k_i\in \mathbf {k}]\ge \beta /2$$

the value of \(\varPhi \) will decrease by a factor of \(1-\beta /4\). This is because fixing the value of x kills every pair of O and \(\mathbf {k}\) such that \(\mathtt {Dec}^O(\mathbf {k},c_j)\) queries x but O is not consistent with \(O^*\) on x. Since \(\varPhi \) cannot drop below 1 and starts from a value of at most \(2^{n{\ell _k}q_\pi }\), there are at most \(n{\ell _k}q_\pi /\log _2\frac{1}{1-\beta /4}=O(n{\ell _k}q_\pi /\beta )\) elements in \(W_{j}\).    \(\square \)

4.2 Utility Analysis

In this section, we show that the sanitizer is \((1/3, neg (n))\)-accurate. We use \(\mathbf {c}=(c_1,\dots ,c_m)\) to denote a sequence of m ciphertexts. Let \(\mathcal {M}^O(\mathbf {k},\mathbf {c})\) be the sanitizer described as Algorithm 1 running on database \(\mathbf {k}\) and ciphertext sequence \(\mathbf {c}\). We first show that with high probability, \(\widehat{a}_j\) is close to \(a_j\) in every round j.

Lemma 2

For any \(O^*\in \mathcal {O}_{\mathrm {unif}}\), any database \(\mathbf {k}^*\in (\{0,1\}^{\ell _k})^n\) and any sequence of m ciphertexts \(\mathbf {c}\in (\{0,1\}^{\ell _c})^m\),

$$\mathop {\Pr }\limits _{\mathbf {\widehat{a}}\leftarrow _R\mathcal {M}^{O*}(\mathbf {k}^*,\mathbf {c})}\left[ \exists j\in [m],\, \left| \widehat{a}_j-a_j\right| >0.1\right] \le neg (n)$$

Proof

Since \({\varDelta a}_j\) is drawn from \(\mathtt {Lap}(\sigma )\), \(\Pr [|{\varDelta a}_j|>0.1]\le e^{-0.1/\sigma }= neg (n)\). The lemma follows by a union bound over all \(j\in [m]\).    \(\square \)

Then we show that with high probability, phase 2 successfully detects a significant variable of \(\mathtt {Dec}^{(\cdot )}(\mathbf {k}^*,c_j)\) in every round j.

Lemma 3

In the execution of Algorithm 1, for any round j in which \(\mathtt {Dec}^{(\cdot )}(\mathbf {k}^*,c_j)\) has an \((\alpha ,\beta )\)-significant variable after Phase 1,

$$\begin{aligned} \Pr \left[ {\widehat{I}}_j(x^*_j)<\alpha /2\right] < neg (n) \end{aligned}$$

Proof

Let \(\tau =\max _x\{I_j(x)\}\). Note that \(\tau \ge \alpha \) since \(\mathtt {Dec}^{(\cdot )}(\mathbf {k}^*,c_j)\) has an \((\alpha ,\beta )\)-significant variable. So we have

$$\Pr [\tau +\mathtt {Lap}(\sigma )<\alpha /2]<\frac{1}{2}\cdot e^{-\frac{\alpha }{2\sigma }}= neg (n)$$

The lemma follows from the fact that \({\widehat{I}}_j(x^*_j)<\alpha /2\) implies \(\tau +\mathtt {Lap}(\sigma )<\alpha /2\).    \(\square \)

Before bounding the failure probability of the sanitizer, we first exhibit a large deviation bound for decision forest whose proof is deferred to Sect. 6.

Proposition 2

For any \(c_j\in \{0,1\}^{\ell _c}\) and \(\mathbf {k}\in (\{0,1\}^{\ell _k})^n\), if there is no \((\alpha ,\beta )\)-significant variable in \(\mathtt {Dec}^{(\cdot )}(\mathbf {k},c_j)\) then for any \(\delta _1>0\) and \(\delta _2>0\),

$$\mathop {\Pr }\limits _{O^*\sim \mathcal {O}_{\mathrm {unif}}}\left[ \left| \mathtt {Dec}^{O^*}(\mathbf {k},c_j)-\mathop {\mathbb {E}}\limits _{O\sim \mathcal {O}_{\mathrm {unif}}}\left[ \mathtt {Dec}^{O}(\mathbf {k},c_j)\right] \right| >\delta _1+h\delta _2+n^2h\sqrt{\beta }\right] \le e^{-2\delta _1^2/\alpha }+h^8e^{-\delta _2^2/\beta }$$

where h is the query complexity of \(\mathtt {Dec}^{(\cdot )}(\mathbf {k},c_j)\).

Lemma 4

For any database \(\mathbf {k}^*\in (\{0,1\}^{\ell _k})^n\), if there are no \((\alpha ,\beta )\)-significant variables in \(\mathtt {Dec}^{(\cdot )}(\mathbf {k}^*,c)\) for any \(c\in \{0,1\}^{\ell _c}\), then

$$\mathop {\Pr }\limits _{O^*\sim \mathcal {O}_{\mathrm {unif}}}\left[ \exists c\in \{0,1\}^{{\ell _c}},\, \left| \mathtt {Dec}^{O^*}(\mathbf {k}^*,c)- \mathop {\mathbb {E}}\limits _{O\sim \mathcal {O}_{\mathrm {unif}}}\left[ \mathtt {Dec}^O(\mathbf {k}^*,c)\right] \right| >0.1\right] \le neg (n)$$

Proof

Let \(T=0.1\), by Proposition 2 (setting \(\delta _1=T/3\), \(\delta _2=T/(3q_\pi )\), \(h=q_\pi \)) and noting that \(\beta =T/(3n^4q^3_\pi )\),

$$\mathop {\Pr }\limits _{O^*\sim \mathcal {O}_{\mathrm {unif}}}\left[ \left| \mathtt {Dec}^{O^*}(\mathbf {k}^*,c)- \mathop {\mathbb {E}}\limits _{O\sim \mathcal {O}_{\mathrm {unif}}}\left[ \mathtt {Dec}^O(\mathbf {k}^*,c)\right] \right| >T\right] \le 2e^{-T^2/(9\alpha )}+2q^8_\pi e^{-2Tn^4q_{\pi }/3}$$

By taking a union bound over all \(c\in \{0,1\}^{\ell _c}\), the lemma follows since \(\alpha =1/({\ell _c}n^\theta )\).    \(\square \)

Remark 4

Note that the statement of Lemma 4 requires that, with high probability, \(\mathtt {Dec}^{O^*}(\mathbf {k}^*,c)\) concentrates around its expectation for all ciphertexts \(c\in \{0,1\}^{\ell _c}\). One might wonder whether this requirement is too stringent, as the sanitizer only answers m (which may be far less than \(2^{\ell _c}\)) queries. Unfortunately, it seems that this condition cannot be relaxed because the m queries asked by the adversary might depend on the oracle \(O^*\). So when considering all \(O^*\), the number of possible queries can be much greater than m.

In order to bound the failure probability of the sanitizer, we divide all the query rounds \(1,\dots ,m\) into three types.

  • Type 1: \(\mathtt {Dec}^{(\cdot )}(\mathbf {k}^*,c_j)\) has an \((\alpha ,\beta )\)-significant variable, so \(\widehat{a}_j\) is used to answer the query.

  • Type 2: The median \( med _j\) is not close to \(\widehat{a}_j\), so \(\widehat{a}_j\) is used to answer the query.

  • Type 3: The mechanism uses \( med _j\) to answer the query.

We say a round is bad if it is of Type 1 or 2; otherwise it is said to be good.

Lemma 5

For any database \(\mathbf {k}\in (\{0,1\}^{\ell _k})^n\),

$$\mathop {\Pr }\limits _{O^*\sim \mathcal {O}_{\mathrm {unif}}}\left[ \forall \mathbf {c}\in (\{0,1\}^{\ell _c})^m,\ \text {the number of bad rounds in } \mathcal {M}^{O^*}(\mathbf {k},\mathbf {c})>n{\ell _k}\right] \le neg (n)$$

Proof

We first show that in any bad round j, the size of \(\mathcal {D}_j\) shrinks by at least a factor of 2, i.e. \(|\mathcal {D}_j|\le |\mathcal {D}_{j-1}|/2\). Consider any Type 1 round j. Let \(x^*_j\) be the significant variable picked in this round. Since \(x^*_j\not \in W_j\),

$$\sum _{O\in \mathcal {O}_{\mathrm {unif}},k\in \mathcal {D}_{j-1}}\mathbf {1}_{\mathtt {Dec}^O(\mathbf {k},c_j) \text { queries }x^*_j} \le |\mathcal {D}_{j-1}|\cdot |\mathcal {O}_{\mathrm {unif}}|\cdot \beta /2$$

On the other hand, since \(\mathcal {D}_j\) is obtained by removing all databases \(\mathbf {k}\) for which \(x^*_j\) is not \(\beta \)-significant for \(\mathtt {Dec}^{(\cdot )}(\mathbf {k},c_j)\), we have

$$\sum _{O\in \mathcal {O}_{\mathrm {unif}},k\in \mathcal {D}_{j-1}}\mathbf {1}_{\mathtt {Dec}^O(\mathbf {k},c_j) \text { queries }x^*_j} \ge |\mathcal {D}_j|\cdot |\mathcal {O}_{\mathrm {unif}}|\cdot \beta $$

Combining the above two inequalities, we have \(|\mathcal {D}_j|\le |\mathcal {D}_{j-1}|/2\). Consider any Type 2 round j and suppose \(|\mathcal {D}_j|>|\mathcal {D}_{j-1}|/2\ge |\mathcal {D}'_{j-1}|/2\). By the definition of \(\mathcal {D}_j\) and \( med _j\), we have \(| med _j-\widehat{a}_j|\le T\), which contradicts the fact that j is a Type 2 round.

Next we show that \(\mathbf {k}^*\in \mathcal {D}_m\) with probability \(1- neg (n)\), by induction on j. Clearly, \(\mathbf {k}^*\in \mathcal {D}_0\). If j is Type 1, in order to show \(\mathbf {k}^*\notin \mathcal {D}_{j-1}\setminus \mathcal {D}_j\), it suffices to show that \(x^*_j\) is \(\beta \)-significant for \(\mathtt {Dec}^{(\cdot )}(\mathbf {k}^*,c_j)\) with probability \(1- neg (n)\). For any x which is not \(\beta \)-significant for \(\mathtt {Dec}^{(\cdot )}(\mathbf {k}^*,c_j)\), we have \(I_j(x)=0\). Thus,

$$\begin{aligned} \Pr [{\widehat{I}}_j(x)\ge \alpha /2]\le \frac{1}{2} e^{-\alpha /2\sigma } \end{aligned}$$

On the other hand, \(|\mathcal {U}_j|\) is at most \(2^{\ell _k}\beta /q_\pi \) since

$$|\mathcal {U}_j|\cdot |\mathcal {O}_{\mathrm {unif}}|\cdot \beta \le \sum _{O\in \mathcal {O}_{\mathrm {unif}},k\in \mathcal {D}_{j-1},x\not \in W_j} \mathbf {1}_{\mathtt {Dec}^O(\mathbf {k},c_j) \text { queries }x}\le |\mathcal {D}_{j-1}|\cdot |\mathcal {O}_{\mathrm {unif}}|\cdot q_\pi $$

By taking a union bound over all \(x\in \mathcal {U}_j\), the probability that \(x^*_j\) is not \(\beta \)-significant for \(\mathtt {Dec}^{(\cdot )}(\mathbf {k}^*,c_j)\) is at most \(|\mathcal {U}_j|\cdot e^{-\alpha /2\sigma }\le 2^{\ell _k}\beta /q_\pi \cdot e^{-\alpha /2\sigma }\). Since \(\alpha /(\sigma {\ell _k})\ge n^{\theta /6}\), this probability is negligible.

If j is Type 2, then by Lemma 3, \(\mathbf {k}^*\in \mathcal {D}'_{j-1}\) with probability at least \(1- neg (n)\). Then, by Lemmas 2 and 4, with probability at least \(1- neg (n)\), \(|\widehat{a}_j-a_j|\le 0.1\) and \(\left| a_j- \mathop {\mathbb {E}}\nolimits _{O\sim \mathcal {O}_{\mathrm {unif}}}\left[ \mathtt {Dec}^O(\mathbf {k}^*,c_j)\right] \right| \le 0.1\). Thus, \(\mathbf {k}^*\notin \mathcal {D}'_{j-1}\setminus \mathcal {D}_j\) by the triangle inequality. If j is Type 3, the claim is obvious since \(\mathcal {D}_{j}=\mathcal {D}_{j-1}\).

Putting it all together, the lemma follows from the facts that \(|\mathcal {D}_0|=2^{n{\ell _k}}\), that \(|\mathcal {D}_m|\ge 1\) with probability \(1- neg (n)\), and that \(|\mathcal {D}_j|\le |\mathcal {D}_{j-1}|/2\) in all bad rounds.    \(\square \)

Lemma 6

(Utility). Algorithm 1 is \((0.3, neg (n))\)-accurate, i.e., for any database \(\mathbf {k}^*\in (\{0,1\}^{\ell _k})^n\),

$$\mathop {\Pr }\limits _{O^*\sim \mathcal {O}_{\mathrm {unif}}}\left[ \forall \mathbf {c}\in (\{0,1\}^{\ell _c})^m,\forall j\in [m], \left| ans_j-a_j\right| <0.3\right] \ge 1- neg (n)$$

where \(ans_j\) is the answer output by \(\mathcal {M}^{O^*}(\mathbf {k}^*,\mathbf {c})\) at round j and \(a_j\) is the true answer \(\mathtt {Dec}^{O^*}(\mathbf {k}^*,c_j)\).

Remark 5

Strictly speaking, the outermost probability should also be taken over the random coins of \(\mathcal {M}\), i.e. the randomness of the Laplace noises. We omit this for ease of presentation since these random coins are independent of the choice of \(O^*\) and \(\mathbf {c}\).

Proof

By the description of Algorithm 1, if the sanitizer succeeds, then \(| ans _j-\widehat{a}_j|\le 0.2\) for all rounds j. Thus the lemma follows from Lemmas 2 and 5.    \(\square \)

4.3 Privacy Analysis

Our goal in this section is to demonstrate that Algorithm 1 is \((\varepsilon , neg (n))\)-differentially private. We first summarize the output of our sanitizer as a vector \(\mathbf {v}\), which will be shown to determine the output transcript of the sanitizer.

$$v_j=\left\{ \begin{array}{ll} (\widehat{a}_j,x^*_j) &{} \text {if round }j \text { is }\textsf {Type 1} \\ (\widehat{a}_j,\bot ) &{} \text {if round }j\text { is }\textsf {Type 2} \\ (\bot ,\bot ) &{} \text {if round }j\text { is }\textsf {Type 3} \\ \end{array}\right. $$

Lemma 7

Given the oracle \(O^*\) and \(\mathbf {v}\), the output of Algorithm 1 can be determined.

Fix an oracle \(O^*\) and two adjacent databases \(\mathbf {k},\mathbf {k}'\in (\{0,1\}^{\ell _k})^n\). Let A and B denote the output distributions of our sanitizer when run on the input databases \(\mathbf {k}\) and \(\mathbf {k}'\) respectively. We also use A and B to denote their probability density functions dA and dB. The support of both distributions is \(\mathcal {V}=\left( ({\{\bot \}}\cup \mathbb {R})\times ({\{\bot \}}\cup \{0,1\}^{\ell _o})\right) ^m\). For any \(\mathbf {v}\in \mathcal {V}\), we define the loss function \(L:\mathcal {V}\rightarrow \mathbb {R}\) as

$$\begin{aligned} L(\mathbf {v})=\log \left( \frac{A(\mathbf {v})}{B(\mathbf {v})}\right) \end{aligned}$$

By Proposition 1, it suffices to show that

$$\begin{aligned} \mathop {\Pr }\limits _{\mathbf {v}\sim A}[L(\mathbf {v})>\varepsilon ]< neg (n) \end{aligned}$$

Given a transcript \(\mathbf {v}\), by chain rule,

$$L(\mathbf {v})=\log \left( \frac{A(\mathbf {v})}{B(\mathbf {v})}\right) =\sum _{j\in [m]} \log \left( \frac{A_j(v_j\mid \mathbf {v}_{<j})}{B_j(v_j\mid \mathbf {v}_{<j})}\right) $$

where \(A_j(v_j\mid \mathbf {v}_{<j})\) is the probability density function of the conditional distribution of Algorithm 1 outputting \(v_j\), conditioned on \(\mathbf {v}_{<j}=(v_1,\dots ,v_{j-1})\).

Now fix a round \(j\in [m]\) and \(\mathbf {v}_{<j}\). We define two borderline events on the noise values \({\varDelta I}_j(x)\) and \({\varDelta a}_j\). Let \(\mathcal {E}_1\) be the event that \({\widehat{I}}_j(x^*_j)>\alpha /2-\sigma \) and \(\mathcal {E}_2\) be the event that \(|\widehat{a}_j- med _j|>T-\sigma \). It should be emphasized that, given \(\mathbf {v}_{<j}\), both \(\mathcal {E}_1\) and \(\mathcal {E}_2\) are events depending only on the Laplacian noises \(\{{\varDelta I}_j(x)\}_{x\in \mathcal {U}_j}\) and \({\varDelta a}_j\). Equivalently, \(\mathcal {E}_1\) is the event that \(\{{\varDelta I}_j(x)\}_{x\in \mathcal {U}_j}\) lies in the set of noises such that \({\widehat{I}}_j(x^*_j)>\alpha /2-\sigma \), and \(\mathcal {E}_2\) is the event that \({\varDelta a}_j> T-\sigma + med _j-a_j\) or \({\varDelta a}_j< med _j-a_j-T+\sigma \). In the following lemma, we show that conditioned on \(\mathcal {E}_1\vee \mathcal {E}_2\), round j is bad with probability at least 1/e.

Lemma 8

\(\Pr \left[ j\text { is of Type 1}\mid \mathcal {E}_1\right] \ge 1/e\) and \(\Pr \left[ j\text { is of Type 2}\mid \overline{\mathcal {E}}_1,\mathcal {E}_2\right] \ge 1/e\).

Then we show upper bounds on the privacy loss in the three cases \(\overline{\mathcal {E}}_1\wedge \overline{\mathcal {E}}_2\), \(\overline{\mathcal {E}}_1\wedge \mathcal {E}_2\) and \(\mathcal {E}_1\). By combining these three cases, we are able to show the following lemma. Due to space limits, we defer all the proofs to Appendix A.

Lemma 9

Algorithm 1 is \((\varepsilon , neg (n))\)-differentially private.

5 Improved Lower Bound

In this section, we show how to improve the bound proved in Sect. 4 to \(\widetilde{\varOmega }(n)\) by slightly modifying the sanitizer and the proof. Suppose \({\ell _k}\cdot {\ell _c}^2\le n^{1-\theta }\). Set the parameters \(\sigma , \alpha , \beta \) to be

$$\sigma =n^{\theta /3}\sqrt{\frac{{\ell _k}}{n}},\qquad \alpha =\frac{1}{{\ell _c}n^\theta },\qquad \beta =\frac{1}{54n^4q^3_\pi }$$

Since \({\ell _k}\cdot {\ell _c}^2\le n^{1-\theta }\), by simple calculation, we have \(\alpha /\sigma \ge n^{\theta /6}\).

We modify the definition of \(\mathcal {U}_j\) in line 10 of Algorithm 1 as follows.

$$\begin{aligned} \text {Algorithm 1}:&\quad \mathcal {U}_j\leftarrow {\{x\notin W_j \mid \exists \mathbf {k}\in \mathcal {D}_{j-1} \text { s.t. }x\text { is } \beta \text {-significant for } \mathtt {Dec}^{(\cdot )}(\mathbf {k},c_j)\}}\\ \text {New Algorithm}:&\quad \mathcal {U}_j\leftarrow {\{x\notin W_j \mid x\text { is } \beta \text {-significant for } \mathtt {Dec}^{(\cdot )}(\mathbf {k}^*,c_j)\}} \end{aligned}$$

The efficiency of the new sanitizer follows from Lemma 1. The only difference in the utility analysis is in the proof of Lemma 5, where we show \(\mathbf {k}^*\in \mathcal {D}_m\) if j is Type 1. In the new algorithm, this is straightforward since \(x^*_j\in \mathcal {U}_j\) must be a \(\beta \)-significant variable for \(\mathtt {Dec}^{(\cdot )}(\mathbf {k}^*,c_j)\).

In the privacy analysis, the only difference is that the new definition of \(\mathcal {U}_j\) does depend on the true database. Given any adjacent databases \(\mathbf {k},\mathbf {k}'\), we fix a round j and \(\mathbf {v}_{<j}\). Let \(\mathcal {U}\) and \(\mathcal {U}'\) denote the set \(\mathcal {U}_j\) when the sanitizer runs on \(\mathbf {k}\) and \(\mathbf {k}'\) respectively. We also use \(x^*\) and \({x^*}'\) to denote the variable \(x^*_j=\arg \!\max _x\{{\widehat{I}}_j(x)\}\) for \(\mathbf {k}\) and \(\mathbf {k}'\) respectively. Let \(\mathcal {H}_j\) be the event that there exists \(x\in \mathcal {U}\setminus \mathcal {U}'\) such that \({\varDelta I}_j(x)\ge \alpha /2-\sigma -1/n\) or there exists \(x\in \mathcal {U}'\setminus \mathcal {U}\) such that \({\varDelta I}'_j(x)\ge \alpha /2-\sigma -1/n\).

Lemma 10

\(\Pr [\mathcal {H}_j|\mathbf {v}_{<j}]\le neg (n)\).

Proof

First, note that \(|\mathcal {U}|\le q_\pi /\beta \) since

$$|\mathcal {U}|\cdot |\mathcal {O}_{\mathrm {unif}}|\cdot \beta \le \sum _{O\in \mathcal {O}_{\mathrm {unif}},x\not \in W_j} \mathbf {1}_{\mathtt {Dec}^O(\mathbf {k},c_j)\text { queries }x}\le |\mathcal {O}_{\mathrm {unif}}|\cdot q_\pi $$

On the other hand, since \({\varDelta I}_j(x)\) is drawn from \(\mathtt {Lap}(\sigma )\) and \(\alpha /\sigma \ge n^{\theta /6}\),

$$\Pr [{\varDelta I}_j(x)\ge \alpha /2-\sigma -1/n] \le \frac{1}{2}\cdot e^{-(\alpha /2-\sigma )/\sigma }= neg (n)$$

The lemma follows by taking a union bound over all \(x\in \mathcal {U}\setminus \mathcal {U}'\) and applying a similar argument for \(x\in \mathcal {U}'\setminus \mathcal {U}\).    \(\square \)

We define another random variable \(A_j'\) such that \(d_{tv}(A_j,A'_j)\le neg (n)\) and \(\mathcal {H}_j\) never occurs with respect to \(A'_j\) (similar ideas have also been used in proving Theorem 3.5 of [14]). Observe that, conditioned on \(\overline{\mathcal {H}}_j\), \(\mathcal {E}_1\) implies \(x^*,{x^*}'\in \mathcal {U}\cap \mathcal {U}'\), and \(\overline{\mathcal {E}}_1\) implies that round j is not Type 1 for both \(\mathbf {k}\) and \(\mathbf {k}'\). Let \(L'(\mathbf {v})\) be the analogue of \(L(\mathbf {v})\) obtained by replacing \(A_j\) with \(A'_j\) for all \(j\in [m]\). Clearly \(d_{tv}(L,L')\le m\cdot neg (n)= neg (n)\). Following the proof of Lemma 9, we can show \(\Pr [L'(\mathbf {v})\ge \varepsilon ]\le neg (n)\) for any \(\varepsilon =\varOmega (1)\). Thus \(\Pr [L(\mathbf {v})\ge \varepsilon ]\le neg (n)\) follows.

6 Large Deviation Bound for Decision Forests

In this section, we prove the large deviation bound for \(\mathtt {Dec}^{(\cdot )}(\mathbf {k},c_j)\) for any given \(\mathbf {k}\in (\{0,1\}^{\ell _k})^n\) and \(c_j\in \{0,1\}^{\ell _c}\). Intuitively, a decrypt algorithm \(\mathtt {Dec}^{(\cdot )}(k_i,c_j)\) can be viewed as a decision tree and, similarly, \(\mathtt {Dec}^{(\cdot )}(\mathbf {k},c_j)\) represents a decision forest (see the formal definitions below). Throughout this section, we will therefore use terms like decision tree/forest instead of decrypt algorithms to present our large deviation bound for decision forests.

A decision tree D is a binary tree whose internal nodes are labeled with Boolean variables and whose leaves are labeled with 0 or 1. Given an input assignment \(\mathbf {a}=(a_1,\dots ,a_m)\in \{0,1\}^m\) to the variables \(x_1,\dots ,x_m\), the value computed by D on this input \(\mathbf {a}\) is denoted by \(D(\mathbf {a})\). This value \(D(\mathbf {a})\) is the value of the leaf reached by a path in D determined in the following way. The path starts from the root of D and then moves to the left child if the current internal node's variable is assigned 0, and to the right child otherwise. A variable \(x_i\) is said to be queried by \(D(\mathbf {a})\) if the corresponding path passes through a node labeled \(x_{i}\). Clearly, every \(x_i\) is queried by \(D(\mathbf {a})\) at most once.

A decision forest \(\mathcal {F}\) is a collection of \(|\mathcal {F}|\) decision trees. For any assignment \(\mathbf {a}\) of \(\mathbf {x}\), \(\mathcal {F}(\mathbf {a})\) denotes the \(|\mathcal {F}|\)-dimensional vector computed by \(\mathcal {F}\) on \(\mathbf {a}\), whose ith component is the value computed by the ith tree. We use \(w(\mathcal {F}(\mathbf {a}))\) to denote the fractional hamming weight of \(\mathcal {F}(\mathbf {a})\), i.e.,

$$\begin{aligned} w(\mathcal {F}(\mathbf {a}))=\frac{\sum _{D_{j}\in \mathcal {F}}D_{j}(\mathbf {a})}{|\mathcal {F}|}. \end{aligned}$$

In most cases, we assume the assignment \(\mathbf {a}\) is drawn from the uniform distribution on \(\{0,1\}^m\). We also use the shorthand notations \(\Pr _{\mathbf {a}}\) and \(\mathbb {E}_{\mathbf {a}}\) to denote probability and expectation when \(\mathbf {a}\) is uniformly distributed, when this is clear from the context. We may also abuse \(\Pr _{\mathbf {a}}\) or \(\mathbb {E}_{\mathbf {a}}\) inside another \(\Pr _{\mathbf {a}}\) or \(\mathbb {E}_{\mathbf {a}}\) to denote the probability or expectation corresponding to another random variable when this is not ambiguous, e.g. \(\Pr _\mathbf {a}\big [w(\mathcal {F}(\mathbf {a}))>\mathbb {E}_\mathbf {a}[w(\mathcal {F}(\mathbf {a}))]\big ]\).
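For concreteness, here is a minimal Python sketch of decision-tree and forest evaluation matching the definitions above; the tuple-based tree encoding is an illustrative choice, not part of the formal definitions.

```python
import itertools

# A decision tree is either a leaf ("leaf", 0/1) or a node ("node", i, left, right),
# where i indexes the queried variable, left is taken when a_i = 0, right when a_i = 1.

def evaluate(tree, a):
    """Value computed by a decision tree on the assignment a (a tuple of 0/1)."""
    if tree[0] == "leaf":
        return tree[1]
    _, i, left, right = tree
    return evaluate(right if a[i] == 1 else left, a)

def forest_weight(forest, a):
    """Fractional Hamming weight w(F(a)) of the forest's output vector."""
    return sum(evaluate(tree, a) for tree in forest) / len(forest)

def query_probability(tree, i, num_vars):
    """Pr over uniform assignments that the tree queries variable x_i (brute force)."""
    count = 0
    assignments = list(itertools.product((0, 1), repeat=num_vars))
    for a in assignments:
        node, queried = tree, False
        while node[0] == "node":
            if node[1] == i:
                queried = True
            node = node[3] if a[node[1]] == 1 else node[2]
        count += queried
    return count / len(assignments)
```

For instance, for the depth-2 tree `("node", 0, ("leaf", 0), ("node", 1, ("leaf", 0), ("leaf", 1)))`, `query_probability(tree, 1, 2)` returns 0.5, matching the fact that the second variable is queried exactly when the first one is assigned 1.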

Definition 11

( \((\alpha ,\beta )\) -significant). For a decision forest \(\mathcal {F}\) and an input \(\mathbf {x}\), a Boolean variable \(x_{i}\) is said to be \((\alpha ,\beta )\)-significant if at least \(\alpha \) fraction of trees D in \(\mathcal {F}\) satisfy \(\Pr _\mathbf {a}\big [D(\mathbf {a})\text { queries } x_i\big ]\ge \beta \).

For comparison, we discuss the difference between the above definition and the notion called “average significance” used in [2]. Recall that the average significance of \(x_i\) on \(\mathcal {F}\) is defined as

$$\frac{1}{|\mathcal {F}|}\cdot \sum _{D\in \mathcal {F}}\mathop {\Pr }\limits _\mathbf {a}\big [D(\mathbf {a})\text { queries } x_i\big ].$$

Obviously, if \(x_i\) is \((\alpha ,\beta )\)-significant, the average significance of \(x_i\) is at least \(\alpha \cdot \beta \). On the other hand, if \(x_i\) is not \((\alpha ,\beta )\)-significant, it can be shown that the average significance of \(x_i\) is at most \(\alpha +\beta \). To see this, let \(\mathcal {F}_1\subseteq \mathcal {F}\) be the set of trees D such that \(\Pr _{\mathbf {a}}[D(\mathbf {a})\text { queries } x_i]\ge \beta \).

$$\begin{aligned}&\frac{1}{|\mathcal {F}|}\cdot \sum _{D\in \mathcal {F}}\mathop {\Pr }\limits _{\mathbf {a}}[D(\mathbf {a})\text { queries } x_i]\\ \le&\frac{1}{|\mathcal {F}|}\left( \sum _{D\in \mathcal {F}_1}\mathop {\Pr }\limits _{\mathbf {a}}[D(\mathbf {a})\text { queries } x_i]+\sum _{D\in \mathcal {F}\setminus \mathcal {F}_1}\mathop {\Pr }\limits _{\mathbf {a}}[D(\mathbf {a})\text { queries } x_i]\right) \\ \le&\frac{1}{|\mathcal {F}|}\left( |\mathcal {F}_1|+\sum _{D\in \mathcal {F}\setminus \mathcal {F}_1}\beta \right) \le \alpha +\beta \end{aligned}$$

We restate two theorems from [2, 17] in our terms.

Theorem 4

(Theorem 1.1 in [17]). Let \(\mathcal {F}\) be a decision forest that has no \((\alpha ,0)\)-significant variable, and let \(n=|\mathcal {F}|\). Then for any \(\delta >0\),

$$\mathop {\Pr }\limits _{\mathbf {a}}\left[ \left| w(\mathcal {F}(\mathbf {a}))-\mathop {\mathbb {E}}_{\mathbf {a}}[w(\mathcal {F}(\mathbf {a}))]\right| \ge \delta \right] \le e^{-2\delta ^2/\alpha }$$

Theorem 5

([2]). Let \(\mathcal {F}\) be a decision forest of height at most h that has no \((\beta ,\beta )\)-significant variable. Then for any \(\delta >0\),

$$\mathop {\Pr }\limits _{\mathbf {a}}\left[ \left| w(\mathcal {F}(\mathbf {a}))- \mathop {\mathbb {E}}\limits _{\mathbf {a}}[w(\mathcal {F}(\mathbf {a}))]\right| \ge h\delta \right] \le h^8e^{-\delta ^2/\beta }$$

We state the main theorem that we will prove in this section.

Theorem 6

Let \(\mathcal {F}\) be a decision forest of size n and height at most h that has no \((\alpha ,\beta )\)-significant variable. Then for any \(\delta _1>0\) and \(\delta _2>0\),

$$\mathop {\Pr }\limits _{\mathbf {a}}\left[ \left| w(\mathcal {F}(\mathbf {a}))-\mathop {\mathbb {E}}\limits _{\mathbf {a}}[w(\mathcal {F}(\mathbf {a}))]\right| >\delta _1+h\delta _2+n^2h\sqrt{\beta }\right] \le e^{-2\delta _1^2/\alpha }+h^8e^{-\delta _2^2/\beta }$$

For the rest of this section, we fix \(\mathcal {F}\) to be a decision forest of size n and height at most h that has no \((\alpha ,\beta )\)-significant variable. Let S denote the set of all variables \(x_i\) such that \(\Pr _{\mathbf {a}}[D(\mathbf {a})\text { queries } x_i]\ge \sqrt{\beta }\) for some \(D\in \mathcal {F}\). Since every root-to-leaf path of a tree queries at most h variables, we have \(\sum _{i}\Pr _{\mathbf {a}}[D(\mathbf {a})\text { queries } x_i]\le h\) for every \(D\in \mathcal {F}\); summing over all trees gives \(|S|\cdot \sqrt{\beta }\le nh\), i.e., \(|S|\le nh/\sqrt{\beta }\). We use \(\bar{S}\) to denote the complement of S and \(\mathbf {a}_S\) to denote the partial assignment restricted to S.

Definition 12 (pruning)

Let \(\mathcal {F}_{\mathcal {P}}\) be the pruned forest of \(\mathcal {F}\) defined as follows. For each variable \(x_i\in S\) and each tree \(D\in \mathcal {F}\), if \(\Pr _{\mathbf {a}}[D(\mathbf {a})\text { queries } x_i]\le \beta \), we delete every node labeled \(x_i\) from the corresponding tree of \(\mathcal {F}_{\mathcal {P}}\) and replace its subtree with a leaf of value 0.
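The following sketch mirrors Definition 12 in the same illustrative representation (reusing `query_probability` from the earlier sketch): a subtree rooted at a node labeled \(x_i\) is replaced by a 0-leaf whenever the original tree queries \(x_i\) with probability at most \(\beta \). Replacing subtrees by 0-leaves can only decrease a tree's output, which is why \(w(\mathcal {F}(\mathbf {a}))\ge w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))\) below.

```python
def prune(tree, i):
    """Replace every subtree rooted at a node labeled x_i with a 0-leaf."""
    if isinstance(tree, int):
        return tree
    j, left, right = tree
    if j == i:
        return 0
    return (j, prune(left, i), prune(right, i))

def pruned_forest(forest, S, m, beta):
    """Sketch of Definition 12 (query probabilities estimated on the original trees)."""
    out = []
    for D in forest:
        to_cut = [i for i in S if query_probability(D, i, m) <= beta]
        pruned = D
        for i in to_cut:
            pruned = prune(pruned, i)
        out.append(pruned)
    return out
```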

We only prove one side of Theorem 6, i.e.

$$\mathop {\Pr }\limits _{\mathbf {a}}\left[ w(\mathcal {F}(\mathbf {a}))<\mathop {\mathbb {E}}\limits _{\mathbf {a}}[w(\mathcal {F}(\mathbf {a}))]-\delta _1-h\delta _2-n^2h\sqrt{\beta }\right] \le e^{-2\delta _1^2/\alpha }+h^8e^{-\delta _2^2/\beta }$$

The proof of the other side is symmetric, with the pruned subtrees replaced by leaves of value 1 instead of 0.

The proof of Theorem 6 can be sketched as follows. Note that for any assignment \(\mathbf {a}\), \(w(\mathcal {F}(\mathbf {a}))\ge w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))\), since pruning replaces subtrees with 0-leaves. On the other hand, \(\mathbb {E}_{\mathbf {a}}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))]\ge \mathbb {E}_{\mathbf {a}}[w(\mathcal {F}(\mathbf {a}))]-n\beta \cdot nh/\sqrt{\beta }\), since pruning each variable in S decreases the expectation by at most \(n\beta \) and \(|S|\le nh/\sqrt{\beta }\). Hence, to prove Theorem 6, it suffices to show that \(w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))\) is close to \(\mathbb {E}_{\mathbf {a}}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))]\) with high probability, which we establish in two steps. First, Lemma 11 shows that for any partial assignment \(\mathbf {a}_{\bar{S}}\), \(w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}_S,\mathbf {a}_{\bar{S}}))\) is close to \(\mathbb {E}_{\mathbf {a}_S}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}_S,\mathbf {a}_{\bar{S}}))]\) with high probability (over the randomness of \(\mathbf {a}_S\)). Then, Lemma 12 shows that, over the randomness of \(\mathbf {a}_{\bar{S}}\), \(\mathbb {E}_{\mathbf {a}_S}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}_S,\mathbf {a}_{\bar{S}}))]\) is close to \(\mathbb {E}_{\mathbf {a}}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))]\) with high probability. Theorem 6 then follows by a union bound.
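Written out, the expectation accounting used in this sketch (charging, as above, at most \(n\beta \) per pruned variable in S and using \(|S|\le nh/\sqrt{\beta }\)) is

$$\mathbb {E}_{\mathbf {a}}[w(\mathcal {F}(\mathbf {a}))]-\mathbb {E}_{\mathbf {a}}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))]\;\le \;|S|\cdot n\beta \;\le \;\frac{nh}{\sqrt{\beta }}\cdot n\beta \;=\;n^2h\sqrt{\beta }.$$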

Lemma 11

For any partial assignment \(\mathbf {a}_{\bar{S}}\) and \(\delta >0\),

$$\mathop {\Pr }\limits _{\mathbf {a}_{S}}\left[ \left| w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}_S,\mathbf {a}_{\bar{S}}))-\mathop {\mathbb {E}}\limits _{\mathbf {a}_{S}}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}_S,\mathbf {a}_{\bar{S}}))]\right| \ge \delta \right] \le e^{-2\delta ^2/\alpha }$$

Proof

Fix an assignment \(\mathbf {a}_{\bar{S}}\). It is not hard to see that the decision forest \(\mathcal {F}_{\mathcal {P}}(\mathbf {x}_S,\mathbf {a}_{\bar{S}})\), which takes only \(\mathbf {x}_S\) as input, has no \((\alpha ,0)\)-significant variable: a variable \(x_i\in S\) survives pruning in a tree only if the original tree queries it with probability greater than \(\beta \), so such a variable would be \((\alpha ,\beta )\)-significant in \(\mathcal {F}\). Hence the lemma follows from Theorem 4.    \(\square \)

Lemma 12

For any \(\delta >0\),

$$\mathop {\Pr }\limits _{\mathbf {a}_{\bar{S}}}\left[ \left| \mathop {\mathbb {E}}\limits _{\mathbf {a}_{S}}\big [w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}_S,\mathbf {a}_{\bar{S}}))\big ] -\mathop {\mathbb {E}}\limits _{\mathbf {a}}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))]\right| \ge h\delta \right] \le h^8e^{-\delta ^2/\beta }$$

Before proving Lemma 12, we define an operation on \(\mathcal {F}_{\mathcal {P}}\).

Definition 13 (truncating)

Let \(\mathcal {F}_{\mathcal {T}}\) be the truncated forest of \(\mathcal {F}_{\mathcal {P}}\), of size \(2^{|S|}\cdot |\mathcal {F}_{\mathcal {P}}|\), defined as follows. For each tree \(D\in \mathcal {F}_{\mathcal {P}}\), the forest \(\mathcal {F}_{\mathcal {T}}\) contains \(2^{|S|}\) trees, one for each possible assignment of \(\mathbf {x}_S\), obtained from D by hard-wiring that assignment; each such tree takes only \(\mathbf {x}_{\bar{S}}\) as input.
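In the illustrative representation used earlier, truncation amounts to hard-wiring every possible assignment of \(\mathbf {x}_S\) into every pruned tree; a sketch follows. Note that, by construction, \(w(\mathcal {F}_{\mathcal {T}}(\mathbf {a}_{\bar{S}}))=\mathbb {E}_{\mathbf {a}_S}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}_S,\mathbf {a}_{\bar{S}}))]\), the fact used at the end of the proof below.

```python
from itertools import product

def restrict(tree, fixed):
    """Hard-wire the variables in `fixed` (a dict i -> bit) into the tree,
    producing a tree over the remaining variables only."""
    if isinstance(tree, int):
        return tree
    i, left, right = tree
    if i in fixed:
        return restrict(right if fixed[i] else left, fixed)
    return (i, restrict(left, fixed), restrict(right, fixed))

def truncated_forest(pruned, S):
    """Sketch of Definition 13: one copy of every pruned tree for each
    of the 2^|S| assignments of the variables in S."""
    out = []
    for bits in product((0, 1), repeat=len(S)):
        fixed = dict(zip(sorted(S), bits))
        out.extend(restrict(D, fixed) for D in pruned)
    return out
```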

Proof

We first show that there is no \((\sqrt{\beta },\sqrt{\beta })\)-significant variable in \(\mathcal {F}_{\mathcal {T}}\). Note that all the variables appearing in \(\mathcal {F}_{\mathcal {T}}\) belong to \(\bar{S}\). Assume towards a contradiction that some \(x_i\in \bar{S}\) is \((\sqrt{\beta },\sqrt{\beta })\)-significant. Then

$$\sum _{D\in \mathcal {F}_{\mathcal {P}}}\mathop {\Pr }\limits _{\mathbf {a}}[D(\mathbf {a})\text { queries } x_i]/n= \sum _{D_{\mathcal {T}}\in \mathcal {F}_{\mathcal {T}}}\mathop {\Pr }\limits _{\mathbf {a}_{\bar{S}}}[D_{\mathcal {T}}(\mathbf {a}_{\bar{S}})\text { queries } x_i]/(n\cdot 2^{|S|})\ge \sqrt{\beta }\cdot \sqrt{\beta }=\beta $$

which implies that there is a \(D\in \mathcal {F}_{\mathcal {P}}\) such that \(\Pr _\mathbf {a}[D(\mathbf {a})\text { queries }x_i]\ge \beta \). This contradicts the definition of \(\bar{S}\).

Thus, by Theorem 5,

$$\mathop {\Pr }\limits _{\mathbf {a}_{\bar{S}}}\left[ \left| w(\mathcal {F}_{\mathcal {T}}(\mathbf {a}_{\bar{S}}))-\mathop {\mathbb {E}}\limits _{\mathbf {a}_{\bar{S}}}[w(\mathcal {F}_{\mathcal {T}}(\mathbf {a}_{\bar{S}}))]\right| \ge h\delta \right] \le h^8e^{-\delta ^2/\beta }$$

Therefore, the lemma follows from the fact that \(w(\mathcal {F}_{\mathcal {T}}(\mathbf {a}_{\bar{S}}))=\mathbb {E}_{\mathbf {a}_S}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}_S,\mathbf {a}_{\bar{S}}))]\) and, consequently, \(\mathbb {E}_{\mathbf {a}_{\bar{S}}}[w(\mathcal {F}_{\mathcal {T}}(\mathbf {a}_{\bar{S}}))]=\mathbb {E}_{\mathbf {a}}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))]\).    \(\square \)

Proof

(Proof of Theorem 6). Combining Lemmas 11 and 12 via a union bound, with probability at least \(1-e^{-2\delta _1^2/\alpha }-h^8e^{-\delta _2^2/\beta }\) we have \(w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))\ge \mathbb {E}_\mathbf {a}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))]-\delta _1-h\delta _2\). The theorem then follows from the facts that \(w(\mathcal {F}(\mathbf {a}))\ge w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))\) and \(\mathbb {E}_{\mathbf {a}}[w(\mathcal {F}_{\mathcal {P}}(\mathbf {a}))]\ge \mathbb {E}_{\mathbf {a}}[w(\mathcal {F}(\mathbf {a}))]-n^2h\sqrt{\beta }\).    \(\square \)