1 Introduction

Differential privacy [DMNS06] is a formal mathematical standard for protecting individual-level privacy in statistical data analysis. In its simplest form, (pure) differential privacy is parameterized by a real number \(\varepsilon > 0\), which controls how much “privacy loss”Footnote 1 an individual can suffer when a computation (i.e., a statistical data analysis task) is performed involving his or her data.

One particular hallmark of differential privacy is that it degrades smoothly and predictably under the composition of multiple computations. In particular, if one performs k computational tasks that are each \(\varepsilon \)-differentially private and combines the results of those tasks, then the computation as a whole is \(k\varepsilon \)-differentially private. This property makes differential privacy amenable to the type of modular reasoning used in the design and analysis of algorithms: when a sophisticated algorithm is composed of a sequence of differentially private steps, one can establish that the algorithm as a whole remains differentially private.

A widely-used relaxation of pure differential privacy is approximate or \((\varepsilon , \delta )\)-differential privacy [DKM+06], which essentially guarantees that the probability that any individual suffers privacy loss exceeding \(\varepsilon \) is bounded by \(\delta \). For sufficiently small \(\delta \), approximate \((\varepsilon ,\delta )\)-differential privacy provides a comparable standard of privacy protection as pure \(\varepsilon \)-differential privacy, while often permitting substantially more useful analyses to be performed.

Unfortunately, there are situations where, unlike pure differential privacy, approximate differential privacy is not a very elegant abstraction for mathematical analysis, particularly the analysis of composition. The “advanced composition theorem” of Dwork, Rothblum, and Vadhan [DRV10] (subsequently improved by [KOV15, MV16]) shows that the composition of k tasks that are each \((\varepsilon , \delta )\)-differentially private is \( (\approx \!\!\sqrt{k}\varepsilon , \approx \!\!k\delta )\)-differentially private. However, these bounds can be unwieldy; computing the tightest possible privacy guarantee for the composition of k arbitrary mechanisms with differing \((\varepsilon _i, \delta _i)\)-differential privacy guarantees is \(\#\mathsf {P}\)-hard [MV16]! Moreover, these bounds are not tight even for simple privacy-preserving computations. For instance, consider the mechanism that approximately answers k statistical queries on a given database by adding independent Gaussian noise to each answer. Even for this basic computation, the advanced composition theorem does not yield a tight analysis.Footnote 2
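To make the \(\sqrt{k}\) scaling concrete, the following sketch (not from the paper; the function names and parameter choices are illustrative) evaluates the standard form of the advanced composition bound from [DRV10] against basic composition:

```python
import math

def basic_composition(k, eps):
    # Basic composition: k eps-DP mechanisms compose to (k * eps)-DP.
    return k * eps

def advanced_composition(k, eps, delta_prime):
    # Advanced composition [DRV10]: k eps-DP mechanisms are (eps', delta')-DP with
    # eps' = sqrt(2 k ln(1/delta')) * eps + k * eps * (e^eps - 1).
    return (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
            + k * eps * math.expm1(eps))

k, eps, delta_prime = 1000, 0.01, 1e-6
# For many compositions of a small eps, the sqrt(k) term dominates and the
# advanced bound is far smaller than the basic k * eps bound.
print(basic_composition(k, eps), advanced_composition(k, eps, delta_prime))
```

For these illustrative parameters the advanced bound is roughly a factor of five smaller, at the price of a nonzero \(\delta '\).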

Dwork and Rothblum [DR16] recently put forth a different relaxation of differential privacy called concentrated differential privacy. Roughly, a randomized mechanism satisfies concentrated differential privacy if the privacy loss has small mean and is subgaussian. Concentrated differential privacy behaves in a qualitatively similar way to approximate \((\varepsilon , \delta )\)-differential privacy under composition. However, it permits sharper analyses of basic computational tasks, including a tight analysis of the aforementioned Gaussian mechanism.

Using the work of Dwork and Rothblum [DR16] as a starting point, we introduce an alternative formulation of the concept of concentrated differential privacy that we call “zero-concentrated differential privacy” (zCDP for short). To distinguish our definition from that of Dwork and Rothblum, we refer to their definition as “mean-concentrated differential privacy” (mCDP for short). Our definition uses the Rényi divergence between probability distributions as a different method of capturing the requirement that the privacy loss random variable is subgaussian.

1.1 Our Reformulation: Zero-Concentrated Differential Privacy

As is typical in the literature, we model a dataset as a multiset or tuple of n elements (or “rows”) in \(\mathcal {X}^n\), for some “data universe” \(\mathcal {X}\), where each element represents one individual’s information. A (privacy-preserving) computation is a randomized algorithm \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\), where \(\mathcal {Y}\) represents the space of all possible outcomes of the computation.

Definition 1

(Zero-Concentrated Differential Privacy (zCDP)). A randomized mechanism \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) is \((\xi ,\rho )\)-zero-concentrated differentially private (henceforth \((\xi ,\rho )\)-zCDP) if, for all \(x,x' \in \mathcal {X}^n\) differing on a single entry and all \(\alpha \in (1,\infty )\),

$$\begin{aligned} \mathrm {D}_{\alpha }\left( M(x)\Vert M(x')\right) \le \xi + \rho \alpha , \end{aligned}$$
(1)

where \(\mathrm {D}_{\alpha }\left( M(x)\Vert M(x')\right) \) is the \(\alpha \)-Rényi divergenceFootnote 3 between the distribution of M(x) and the distribution of \(M(x')\).

We define \(\rho \)-zCDP to be \((0,\rho )\)-zCDP.Footnote 4

Equivalently, we can replace (1) with

$$\begin{aligned} {\mathop {\mathbb {E}}\limits _{{}}}\left[ e^{(\alpha -1)Z}\right] \le e^{(\alpha -1)(\xi + \rho \alpha )}, \end{aligned}$$
(2)

where \(Z = \mathsf {PrivLoss}\left( M(x)\Vert M(x')\right) \) is the privacy loss random variable:

Definition 2

(Privacy Loss Random Variable). Let Y and \(Y'\) be random variables on \(\varOmega \). We define the privacy loss random variable between Y and \(Y'\) – denoted \(Z=\mathsf {PrivLoss}\left( Y\Vert Y'\right) \) – as follows. Define a function \(f : \varOmega \rightarrow \mathbb {R}\) by \(f(y) = \log ({\mathop {\mathbb {P}}\limits _{}}\left[ Y=y\right] /{\mathop {\mathbb {P}}\limits _{}}\left[ Y'=y\right] )\). Then Z is distributed according to f(Y).

Intuitively, the value of the privacy loss \(Z = \mathsf {PrivLoss}\left( M(x)\Vert M(x')\right) \) represents how well we can distinguish x from \(x'\) given only the output M(x) or \(M(x')\). If \(Z>0\), then the observed output of M is more likely to have occurred if the input was x than if \(x'\) was the input. Moreover, the larger Z is, the bigger this likelihood ratio is. Likewise, \(Z<0\) indicates that the output is more likely if \(x'\) is the input. If \(Z=0\), both x and \(x'\) “explain” the output of M equally well.
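To make this concrete, the privacy loss of the Gaussian mechanism (Proposition 6 below) can be simulated directly. This is a sanity-check sketch, not part of the paper; the values \(q(x)=0\), \(q(x')=\varDelta =1\), and \(\sigma =2\) are illustrative:

```python
import math
import random

def privloss_samples(delta_q=1.0, sigma=2.0, n=200_000, seed=0):
    """Monte Carlo samples of Z = PrivLoss(M(x) || M(x')) for the Gaussian
    mechanism, assuming q(x) = 0 and q(x') = delta_q (illustrative values)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        y = rng.gauss(0.0, sigma)  # a draw from M(x) = N(0, sigma^2)
        # f(y) = log p_{N(0,sigma^2)}(y) - log p_{N(delta_q,sigma^2)}(y)
        out.append((delta_q ** 2 - 2 * delta_q * y) / (2 * sigma ** 2))
    return out

rho = 1.0 ** 2 / (2 * 2.0 ** 2)   # Delta^2 / (2 sigma^2) = 0.125
zs = privloss_samples()
mean = sum(zs) / len(zs)
var = sum((z - mean) ** 2 for z in zs) / len(zs)
# Empirically mean ~ rho and var ~ 2 * rho: Z looks like N(rho, 2 * rho),
# matching the Gaussian shape of the privacy loss discussed in the text.
```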

A mechanism \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) is \(\varepsilon \)-differentially private if and only if \({\mathop {\mathbb {P}}\limits _{}}\left[ Z>\varepsilon \right] =0\), where \(Z = \mathsf {PrivLoss}\left( M(x)\Vert M(x')\right) \) is the privacy loss of M on arbitrary inputs \(x,x' \in \mathcal {X}^n\) differing in one entry. On the other hand, M being \((\varepsilon ,\delta )\)-differentially private is equivalent, up to a small loss in parameters, to the requirement that \({\mathop {\mathbb {P}}\limits _{}}\left[ Z>\varepsilon \right] \le \delta \).

In contrast, zCDP entails a bound on the moment generating function of the privacy loss Z — that is, \({\mathop {\mathbb {E}}\limits _{{}}}\left[ e^{(\alpha -1)Z}\right] \) as a function of \(\alpha -1\). The bound (2) implies that Z is a subgaussian random variable with small mean. Intuitively, this means that Z resembles a Gaussian distribution with mean \(\xi +\rho \) and variance \(2\rho \). In particular, we obtain strong tail bounds on Z. Namely (2) implies that

$$\begin{aligned} {\mathop {\mathbb {P}}\limits _{}}\left[ Z>\lambda +\xi +\rho \right] \le e^{-\lambda ^2/4\rho } \end{aligned}$$

for all \(\lambda >0\).Footnote 5
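This tail bound can be checked against the exact tail of the Gaussian mechanism's privacy loss, which is distributed as \(\mathcal {N}(\rho , 2\rho )\) when \(\xi =0\). The following sketch (not from the paper) compares the two using the standard bound \(\mathrm {erfc}(t) \le e^{-t^2}\):

```python
import math

def gaussian_privloss_tail(lam, rho):
    # Exact tail for Z ~ N(rho, 2*rho): P[Z > lam + rho].
    return 0.5 * math.erfc(lam / (2 * math.sqrt(rho)))

def zcdp_tail_bound(lam, rho):
    # The subgaussian tail bound implied by (2) with xi = 0: exp(-lam^2 / (4*rho)).
    return math.exp(-lam ** 2 / (4 * rho))

rho = 0.125
for lam in (0.5, 1.0, 2.0, 4.0):
    # The exact Gaussian tail always sits below the zCDP tail bound.
    assert gaussian_privloss_tail(lam, rho) <= zcdp_tail_bound(lam, rho)
```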

Thus zCDP requires that the privacy loss random variable is concentrated around zero (hence the name). That is, Z is “small” with high probability, with larger deviations from zero becoming increasingly unlikely. Hence we are unlikely to be able to distinguish x from \(x'\) given the output of M(x) or \(M(x')\). Note that the randomness of the privacy loss random variable is taken only over the randomness of the mechanism M.

Comparison to the Definition of Dwork and Rothblum. For comparison, Dwork and Rothblum [DR16] define \((\mu ,\tau )\)-concentrated differential privacy for a randomized mechanism \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) as the requirement that, if \(Z=\mathsf {PrivLoss}\left( M(x)\Vert M(x')\right) \) is the privacy loss for \(x,x'\in \mathcal {X}^n\) differing on one entry, then

$$\begin{aligned} {\mathop {\mathbb {E}}\limits _{{}}}\left[ Z\right] \le \mu \qquad \text {and}\qquad {\mathop {\mathbb {E}}\limits _{{}}}\left[ e^{(\alpha -1)(Z-{\mathop {\mathbb {E}}\limits _{{}}}\left[ Z\right] )}\right] \le e^{(\alpha -1)^2\frac{1}{2}\tau ^2} \end{aligned}$$

for all \(\alpha \in \mathbb {R}\). That is, they require both a bound on the mean of the privacy loss and that the privacy loss is tightly concentrated around its mean. To distinguish our definitions, we refer to their definition as mean-concentrated differential privacy (or mCDP).

Our definition, zCDP, is a relaxation of mCDP. In particular, a \((\mu ,\tau )\)-mCDP mechanism is also \((\mu -\tau ^2/2,\tau ^2/2)\)-zCDP (which is tight for the Gaussian mechanism example), whereas the converse is not true. (However, a partial converse holds; see Lemma 24.)

1.2 Results

Relationship Between zCDP and Differential Privacy. Like Dwork and Rothblum’s formulation of concentrated differential privacy, zCDP can be thought of as providing guarantees of \((\varepsilon , \delta )\)-differential privacy for all values of \(\delta > 0\):

Proposition 3

If M provides \(\rho \)-zCDP, then M is \((\rho +2\sqrt{\rho \log (1/\delta )},\delta )\)-differentially private for any \(\delta >0\).
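A single \(\rho \) thus yields a whole curve of \((\varepsilon ,\delta )\) guarantees. The following sketch (not from the paper; names and values are illustrative) evaluates the conversion of Proposition 3:

```python
import math

def eps_for_delta(rho, delta):
    # Proposition 3: rho-zCDP implies (rho + 2*sqrt(rho*log(1/delta)), delta)-DP.
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))

# One rho value gives an (eps, delta)-DP guarantee for every delta > 0;
# smaller delta costs a larger eps, growing like sqrt(log(1/delta)).
rho = 0.1
curve = {delta: eps_for_delta(rho, delta) for delta in (1e-3, 1e-6, 1e-9)}
```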

There is also a partial converse, which shows that, up to a loss in parameters, zCDP is equivalent to differential privacy with this \(\forall \delta >0\) quantification (see Lemma 22).

There is also a direct link from pure differential privacy to zCDP:

Proposition 4

If M satisfies \(\varepsilon \)-differential privacy, then M satisfies \((\frac{1}{2} \varepsilon ^2)\)-zCDP.

Dwork and Rothblum [DR16, Theorem 3.5] give a slightly weaker version of Proposition 4, which implies that \(\varepsilon \)-differential privacy yields \((\frac{1}{2} \varepsilon (e^\varepsilon -1))\)-zCDP; this improves on an earlier bound [DRV10] by the factor \(\frac{1}{2}\).
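A quick numerical comparison (not from the paper; function names are illustrative) confirms that the bound of Proposition 4 is never worse than the weaker one, and that the two agree as \(\varepsilon \rightarrow 0\):

```python
import math

def zcdp_from_pure_dp(eps):
    # Proposition 4: eps-DP implies (eps^2 / 2)-zCDP.
    return 0.5 * eps ** 2

def zcdp_from_pure_dp_weaker(eps):
    # The weaker bound via [DR16, Theorem 3.5]: (eps * (e^eps - 1) / 2)-zCDP.
    return 0.5 * eps * math.expm1(eps)

# Since e^eps - 1 >= eps, the new bound is always at least as strong,
# and the ratio of the two tends to 1 as eps -> 0.
for e in (0.01, 0.1, 1.0):
    assert zcdp_from_pure_dp(e) <= zcdp_from_pure_dp_weaker(e)
```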

Propositions 3 and 4 show that zCDP is an intermediate notion between pure differential privacy and approximate differential privacy. Indeed, many algorithms satisfying approximate differential privacy do in fact also satisfy zCDP.

Gaussian Mechanism. Just as with mCDP, the prototypical example of a mechanism satisfying zCDP is the Gaussian mechanism, which answers a real-valued query on a database by perturbing the true answer with Gaussian noise.

Definition 5

(Sensitivity). A function \(q : \mathcal {X}^n \rightarrow \mathbb {R}\) has sensitivity \(\varDelta \) if for all \(x, x' \in \mathcal {X}^n\) differing in a single entry, we have \(|q(x) - q(x')| \le \varDelta \).

Proposition 6

(Gaussian Mechanism). Let \(q : \mathcal {X}^n \rightarrow \mathbb {R}\) be a sensitivity-\(\varDelta \) query. Consider the mechanism \(M : \mathcal {X}^n \rightarrow \mathbb {R}\) that on input x, releases a sample from \(\mathcal {N}(q(x), \sigma ^2)\). Then M satisfies \((\varDelta ^2/2\sigma ^2)\)-zCDP.
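Proposition 6 can be read as a noise-calibration recipe: to achieve \(\rho \)-zCDP for a sensitivity-\(\varDelta \) query, set \(\sigma = \varDelta /\sqrt{2\rho }\). A minimal sketch of this recipe (not from the paper; the function name and example values are illustrative):

```python
import math
import random

def gaussian_mechanism(true_answer, sensitivity, rho, rng=random):
    """Release true_answer + N(0, sigma^2), with sigma calibrated so that the
    release satisfies rho-zCDP (Proposition 6: rho = Delta^2 / (2 sigma^2))."""
    sigma = sensitivity / math.sqrt(2 * rho)
    return true_answer + rng.gauss(0.0, sigma)

# Example: a counting query (sensitivity 1) under 0.125-zCDP uses sigma = 2.
rng = random.Random(42)
release = gaussian_mechanism(100.0, 1.0, 0.125, rng)
```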

We remark that either inequality defining zCDP — (1) or (2) — is exactly tight for the Gaussian mechanism for all values of \(\alpha \). Thus the definition of zCDP seems tailored to the Gaussian mechanism.

Basic Properties of zCDP. Our definition of zCDP satisfies the key basic properties of differential privacy. Foremost, these properties include smooth degradation under composition, and invariance under postprocessing:

Lemma 7

(Composition). Let \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) and \(M' : \mathcal {X}^n \rightarrow \mathcal {Z}\) be randomized algorithms. Suppose M satisfies \(\rho \)-zCDP and \(M'\) satisfies \(\rho '\)-zCDP. Define \(M'' : \mathcal {X}^n \rightarrow \mathcal {Y} \times \mathcal {Z}\) by \(M''(x) = (M(x),M'(x))\). Then \(M''\) satisfies \((\rho +\rho ')\)-zCDP.

Lemma 8

(Postprocessing). Let \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) and \(f : \mathcal {Y} \rightarrow \mathcal {Z}\) be randomized algorithms. Suppose M satisfies \(\rho \)-zCDP. Define \(M' : \mathcal {X}^n \rightarrow \mathcal {Z}\) by \(M'(x) = f(M(x))\). Then \(M'\) satisfies \(\rho \)-zCDP.

These properties follow immediately from corresponding properties of the Rényi divergence outlined in Lemma 15.
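Lemmas 7 and 8 together justify simple privacy bookkeeping: \(\rho \) values add under composition, and postprocessing is free. The accountant below is not a construct from the paper, just a sketch of the bookkeeping these lemmas imply (class and method names are illustrative):

```python
import math

class ZCDPAccountant:
    """Tracks a zCDP budget: composition adds rho values (Lemma 7);
    postprocessing the outputs costs nothing (Lemma 8)."""

    def __init__(self):
        self.rho = 0.0

    def spend(self, rho):
        self.rho += rho          # Lemma 7: rho values add under composition
        return self.rho

    def to_dp(self, delta):
        # Convert the running budget via Proposition 3.
        return self.rho + 2 * math.sqrt(self.rho * math.log(1 / delta))

acct = ZCDPAccountant()
for _ in range(10):              # ten 0.01-zCDP releases
    acct.spend(0.01)
# The combined release is 0.1-zCDP, regardless of any postprocessing.
```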

We remark that Dwork and Rothblum’s definition of mCDP is not closed under postprocessing; we provide a counterexample in the full version of this work. (However, an arbitrary amount of postprocessing can worsen the guarantees of mCDP by at most constant factors.)

Group Privacy. A mechanism M guarantees group privacy if no small group of individuals has a significant effect on the outcome of a computation (whereas the definition of zCDP only refers to individuals, which are groups of size 1). That is, group privacy for groups of size k guarantees that, if x and \(x'\) are inputs differing on k entries (rather than a single entry), then the outputs M(x) and \(M(x')\) are close.

Dwork and Rothblum [DR16, Theorem 4.1] gave nearly tight bounds on the group privacy guarantees of concentrated differential privacy, showing that a \((\mu =\tau ^2/2, \tau )\)-concentrated differentially private mechanism affords \((k^2\mu \cdot (1 + o(1)), k\tau \cdot (1 + o(1)))\)-concentrated differential privacy for groups of size \(k = o(1/\tau )\). We are able to show a group privacy guarantee for zCDP that is exactly tight and works for a wider range of parameters:

Proposition 9

Let \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfy \(\rho \)-zCDP. Then M guarantees \((k^2\rho )\)-zCDP for groups of size k — i.e. for every \(x, x' \in \mathcal {X}^n\) differing in up to k entries and every \(\alpha \in (1, \infty )\), we have

$$\begin{aligned} \mathrm {D}_{\alpha }\left( M(x)\Vert M(x')\right) \le (k^2\rho ) \cdot \alpha . \end{aligned}$$

In particular, this bound is achieved (simultaneously for all values \(\alpha \)) by the Gaussian mechanism. Our proof is also simpler than that of Dwork and Rothblum; see Sect. 5.
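For the Gaussian mechanism this tightness is a one-line calculation: inputs differing in k entries can shift a sensitivity-\(\varDelta \) query by up to \(k\varDelta \), and the resulting divergence is exactly \((k^2\rho )\alpha \). A sketch (not from the paper; values are illustrative), using the closed form for Gaussians stated later as Lemma 16:

```python
import math

def renyi_gauss(alpha, mu_diff, sigma):
    # Renyi divergence between N(mu, sigma^2) and N(nu, sigma^2), with
    # mu_diff = mu - nu (closed form stated later as Lemma 16).
    return alpha * mu_diff ** 2 / (2 * sigma ** 2)

delta_q, sigma, k, alpha = 1.0, 2.0, 3, 5.0
rho = delta_q ** 2 / (2 * sigma ** 2)   # per-individual zCDP parameter
# A group of k individuals shifts the mean by k * delta_q, and the divergence
# is then exactly (k^2 * rho) * alpha -- matching Proposition 9 with equality.
assert math.isclose(renyi_gauss(alpha, k * delta_q, sigma), (k ** 2 * rho) * alpha)
```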

Lower Bounds. The strong group privacy guarantees of zCDP yield, as an unfortunate consequence, strong lower bounds as well. We show that, as with pure differential privacy, zCDP is susceptible to information-based lower bounds, as well as to so-called packing arguments [HT10, MMP+10, De12]:

Theorem 10

Let \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfy \(\rho \)-zCDP. Let X be a random variable on \(\mathcal {X}^n\). Then

$$\begin{aligned} I\left( X;M(X)\right) \le \rho \cdot n^2, \end{aligned}$$

where \(I(\cdot ;\cdot )\) denotes the mutual information between the random variables (in nats, rather than bits). Furthermore, if the entries of X are independent, then \(I(X;M(X)) \le \rho \cdot n\).

Theorem 10 yields strong lower bounds for zCDP mechanisms, as we can construct distributions X such that M(X) reveals a lot of information about X (i.e. \(I(X;M(X))\) is large) for any accurate M.

In particular, we obtain a strong separation between approximate differential privacy and zCDP. For example, we can show that releasing an accurate approximate histogram (or, equivalently, accurately answering all point queries) on a data domain of size k requires an input with at least \(n=\varTheta (\sqrt{\log k})\) entries to satisfy zCDP. In contrast, under approximate differential privacy, n can be independent of the domain size k [BNS13]! In particular, our lower bounds show that “stability-based” techniques (such as those in the propose-test-release framework [DL09]) are not compatible with zCDP.

Our lower bound exploits the strong group privacy guarantee afforded by zCDP. Group privacy has been used to prove tight lower bounds for pure differential privacy [HT10, De12] and approximate differential privacy [SU15a]. These results highlight the fact that group privacy is often the limiting factor for private data analysis. For \((\varepsilon ,\delta )\)-differential privacy, group privacy becomes vacuous for groups of size \(k=\varTheta (\log (1/\delta )/\varepsilon )\). Indeed, stability-based techniques exploit precisely this breakdown in group privacy.

As a result of this strong lower bound, we show that any mechanism for answering statistical queries that satisfies zCDP can be converted into a mechanism satisfying pure differential privacy with only a quadratic blowup in its sample complexity. More precisely, the following theorem illustrates a more general result we prove in Sect. 7.

Theorem 11

Let \(n \in \mathbb {N}\) and \(\alpha \ge 1/n\) be arbitrary. Set \(\varepsilon =\alpha \) and \(\rho =\alpha ^2\). Let \(q : \mathcal {X} \rightarrow [0,1]^k\) be an arbitrary family of statistical queries. Suppose \(M : \mathcal {X}^n \rightarrow [0,1]^k\) satisfies \(\rho \)-zCDP and

$$\begin{aligned} {\mathop {\mathbb {E}}\limits _{{M}}}\left[ \Vert M(x)-q(x)\Vert _\infty \right] \le \alpha \end{aligned}$$

for all \(x \in \mathcal {X}^n\). Then there exists \(M' : \mathcal {X}^{n'} \rightarrow [0,1]^k\) for \(n'=5n^2\) satisfying \(\varepsilon \)-differential privacy and

$$\begin{aligned} {\mathop {\mathbb {E}}\limits _{{M'}}}\left[ \Vert M'(x)-q(x)\Vert _\infty \right] \le 10\alpha \end{aligned}$$

for all \(x \in \mathcal {X}^{n'}\).

For some classes of queries, this reduction is essentially tight. For example, for k one-way marginals, the Gaussian mechanism achieves sample complexity \(n=\varTheta (\sqrt{k})\) subject to zCDP, whereas the Laplace mechanism achieves sample complexity \(n=\varTheta (k)\) subject to pure differential privacy, which is known to be optimal. For more details, see Sects. 6 and 7.

Approximate zCDP. To circumvent these strong lower bounds for zCDP, we consider a relaxation of zCDP in the spirit of approximate differential privacy that permits a small probability \(\delta \) of (catastrophic) failure:

Definition 12 (Approximate zCDP)

A randomized mechanism \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) is \(\delta \)-approximately \((\xi ,\rho )\)-zCDP if, for all \(x,x' \in \mathcal {X}^n\) differing on a single entry, there exist events E (depending on M(x)) and \(E'\) (depending on \(M(x')\)) such that \({\mathop {\mathbb {P}}\limits _{}}\left[ E\right] \ge 1-\delta \), \({\mathop {\mathbb {P}}\limits _{}}\left[ E'\right] \ge 1-\delta \), and

$$\begin{aligned} \forall \alpha \in (1,\infty ) \qquad \mathrm {D}_{\alpha }\left( M(x)|_{E}\Vert M(x')|_{E'}\right) \le \xi + \rho \cdot \alpha \quad \text {and} \quad \mathrm {D}_{\alpha }\left( M(x')|_{E'}\Vert M(x)|_{E}\right) \le \xi + \rho \cdot \alpha , \end{aligned}$$

where \(M(x)|_{E}\) denotes the distribution of M(x) conditioned on the event E. We further define \(\delta \)-approximate \(\rho \)-zCDP to be \(\delta \)-approximate \((0,\rho )\)-zCDP.

In particular, setting \(\delta =0\) gives the original definition of zCDP. However, this definition unifies zCDP with approximate differential privacy:

Proposition 13

If M satisfies \((\varepsilon ,\delta )\)-differential privacy, then M satisfies \(\delta \)-approximate \(\frac{1}{2} \varepsilon ^2\)-zCDP.

Approximate zCDP retains most of the desirable properties of zCDP, but allows us to incorporate stability-based techniques and bypass the above lower bounds. It also provides a unified tool for analysing the composition of zCDP mechanisms with approximately differentially private mechanisms; see Sect. 8.

Related Work. Our work builds on the aforementioned prior work of Dwork and Rothblum [DR16].Footnote 6 We view our definition of concentrated differential privacy as being “morally equivalent” to their definition of concentrated differential privacy, in the sense that both definitions formalize the same concept.Footnote 7 (The formal relationship between the two definitions is discussed in Sect. 4.) However, the definition of zCDP generally seems easier to work with than mCDP. In particular, our formulation in terms of Rényi divergence simplifies many analyses.

Dwork and Rothblum prove several results about concentrated differential privacy that are similar to ours. Namely, they prove analogous properties of mCDP as we prove for zCDP. However, as noted, some of their bounds are weaker than ours; also, they do not explore lower bounds.

Several of the ideas underlying concentrated differential privacy are implicit in earlier works. In particular, the proof of the advanced composition theorem of Dwork, Rothblum, and Vadhan [DRV10] essentially uses the ideas of concentrated differential privacy.

We also remark that Tardos [Tar08] used Rényi divergence to prove lower bounds for cryptographic objects called fingerprinting codes. Fingerprinting codes turn out to be closely related to differential privacy [Ull13, BUV14, SU15b], and Tardos’ lower bound can be (loosely) viewed as a kind of privacy-preserving algorithm.

Further Work. We believe that concentrated differential privacy is a useful tool for analysing private computations, as it provides both simpler and tighter bounds. We hope that CDP will prove useful in both the theory and practice of differential privacy.

Furthermore, our lower bounds show that CDP can really be a much more stringent condition than approximate differential privacy. Thus CDP defines a “subclass” of all \((\varepsilon , \delta )\)-differentially private algorithms. This subclass includes most differentially private algorithms in the literature, but not all — the most notable exceptions being algorithms that use the propose-test-release approach [DL09] to exploit low local sensitivity.

This “CDP subclass” warrants further exploration. In particular, is there a “complete” mechanism for this class of algorithms, in the same sense that the exponential mechanism [MT07, BLR13] is complete for pure differential privacy? Can we obtain a simple characterization of the sample complexity needed to satisfy CDP? The ability to prove stronger and simpler lower bounds for CDP than for approximate DP may be useful for showing the limitations of certain algorithmic paradigms. For example, any differentially private algorithm that only uses the Laplace mechanism, the exponential mechanism, the Gaussian mechanism, and the “sparse vector” technique, along with composition and postprocessing, will be subject to the lower bounds for CDP.

There is also room to examine how to interpret the zCDP privacy guarantee. In particular, we leave it as an open question to understand the extent to which \(\rho \)-zCDP provides a stronger privacy guarantee than the implied \((\varepsilon ,\delta )\)-DP guarantees (cf. Proposition 3).

In general, much of the literature on differential privacy can be re-examined through the lens of CDP, which may yield new insights and results.

2 Rényi Divergence

Recall the definition of Rényi divergence:

Definition 14

(Rényi Divergence [Ren61, Eq. (3.3)]). Let P and Q be probability distributions on \(\varOmega \). For \(\alpha \in (1,\infty )\), we define the Rényi divergence of order \(\alpha \) between P and Q as

$$\begin{aligned} \mathrm {D}_{\alpha }\left( P\Vert Q\right) =&\frac{1}{\alpha -1} \log \left( \int _\varOmega P(x)^\alpha Q(x)^{1-\alpha } \mathrm {d} x \right) \\ =&\frac{1}{\alpha -1} \log \left( {\mathop {\mathbb {E}}\limits _{{x \sim P}}}\left[ \left( \frac{P(x)}{Q(x)}\right) ^{\alpha -1}\right] \right) , \end{aligned}$$

where \(P(\cdot )\) and \(Q(\cdot )\) are the probability mass/density functions of P and Q respectively or, more generally, \(P(\cdot )/Q(\cdot )\) is the Radon-Nikodym derivative of P with respect to Q.

We also define the KL-divergence

$$\begin{aligned} \mathrm {D}_{1}\left( P\Vert Q\right) = \lim _{\alpha \rightarrow 1} \mathrm {D}_{\alpha }\left( P\Vert Q\right) = \int _\varOmega P(x) \log \left( \frac{P(x)}{Q(x)}\right) \mathrm {d}x \end{aligned}$$

and the max-divergence

$$\begin{aligned} \mathrm {D}_{\infty }\left( P\Vert Q\right) = \lim _{\alpha \rightarrow \infty } \mathrm {D}_{\alpha }\left( P\Vert Q\right) = \sup _{x \in \varOmega } \log \left( \frac{P(x)}{Q(x)}\right) . \end{aligned}$$

Alternatively, Rényi divergence can be defined in terms of the privacy loss (Definition 2) between P and Q:

$$\begin{aligned} e^{(\alpha -1)\mathrm {D}_{\alpha }\left( P\Vert Q\right) } = {\mathop {\mathbb {E}}\limits _{{Z\sim \mathsf {PrivLoss}\left( P\Vert Q\right) }}}\left[ e^{(\alpha -1)Z}\right] \end{aligned}$$

for all \(\alpha \in (1,\infty )\). Moreover, \(\mathrm {D}_{1}\left( P\Vert Q\right) = {\mathop {\mathbb {E}}\limits _{{Z\sim \mathsf {PrivLoss}\left( P\Vert Q\right) }}}\left[ Z\right] \).
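This identity between the Rényi divergence and the moment generating function of the privacy loss can be verified directly for discrete distributions. A minimal sketch (not from the paper; the distributions and function names are illustrative):

```python
import math

def renyi_divergence(alpha, P, Q):
    """D_alpha(P || Q) for discrete distributions given as dicts, alpha in (1, inf)."""
    return math.log(sum(P[x] ** alpha * Q[x] ** (1 - alpha) for x in P)) / (alpha - 1)

def privloss_mgf(alpha, P, Q):
    # E_{Z ~ PrivLoss(P || Q)}[e^{(alpha - 1) Z}] with Z = log(P(x)/Q(x)), x ~ P.
    return sum(P[x] * (P[x] / Q[x]) ** (alpha - 1) for x in P)

P = {'a': 0.5, 'b': 0.3, 'c': 0.2}
Q = {'a': 0.4, 'b': 0.4, 'c': 0.2}
alpha = 2.5
# e^{(alpha - 1) D_alpha(P || Q)} equals the MGF of the privacy loss at alpha - 1.
lhs = math.exp((alpha - 1) * renyi_divergence(alpha, P, Q))
rhs = privloss_mgf(alpha, P, Q)
assert math.isclose(lhs, rhs)
```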

We record several useful and well-known properties of Rényi divergence. We refer the reader to [vEH14] for proofs and discussion of these (and many other) properties.

Lemma 15

Let P and Q be probability distributions and \(\alpha \in [1,\infty ]\).

  • Non-negativity: \(\mathrm {D}_{\alpha }\left( P\Vert Q\right) \ge 0\) with equality if and only if \(P=Q\).

  • Composition: Suppose P and Q are distributions on \(\varOmega \times \varTheta \). Let \(P'\) and \(Q'\) denote the marginal distributions on \(\varOmega \) induced by P and Q respectively. For \(x \in \varOmega \), let \(P'_x\) and \(Q'_x\) denote the conditional distributions on \(\varTheta \) induced by P and Q respectively, where x specifies the first coordinate. Then

    $$\begin{aligned} \mathrm {D}_{\alpha }\left( P'\Vert Q'\right) + \min _{x \in \varOmega } \mathrm {D}_{\alpha }\left( P'_x\Vert Q'_x\right) \le \mathrm {D}_{\alpha }\left( P\Vert Q\right) \le \mathrm {D}_{\alpha }\left( P'\Vert Q'\right) + \max _{x \in \varOmega } \mathrm {D}_{\alpha }\left( P'_x\Vert Q'_x\right) . \end{aligned}$$

    In particular if P and Q are product distributions, then the Rényi divergence between P and Q is just the sum of the Rényi divergences of the marginals.

  • Quasi-Convexity: Let \(P_0, P_1\) and \(Q_0, Q_1\) be distributions on \(\varOmega \), and let \(P = tP_0 + (1-t)P_1\) and \(Q = tQ_0 + (1-t)Q_1\) for \(t \in [0, 1]\). Then \(\mathrm {D}_{\alpha }\left( P\Vert Q\right) \le \max \{\mathrm {D}_{\alpha }\left( P_0\Vert Q_0\right) , \mathrm {D}_{\alpha }\left( P_1\Vert Q_1\right) \}\). Moreover, KL divergence is convex:

    $$\begin{aligned} \mathrm {D}_{1}\left( P\Vert Q\right) \le t \mathrm {D}_{1}\left( P_0\Vert Q_0\right) +(1-t)\mathrm {D}_{1}\left( P_1\Vert Q_1\right) . \end{aligned}$$
  • Postprocessing: Let P and Q be distributions on \(\varOmega \) and let \(f : \varOmega \rightarrow \varTheta \) be a function. Let f(P) and f(Q) denote the distributions on \(\varTheta \) induced by applying f to P or Q respectively. Then \(\mathrm {D}_{\alpha }\left( f(P)\Vert f(Q)\right) \le \mathrm {D}_{\alpha }\left( P\Vert Q\right) \).

    Note that quasi-convexity allows us to extend this guarantee to the case where f is a randomized mapping.

  • Monotonicity: For \(1 \le \alpha \le \alpha ' \le \infty \), \(\mathrm {D}_{\alpha }\left( P\Vert Q\right) \le \mathrm {D}_{\alpha '}\left( P\Vert Q\right) \).
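The postprocessing and monotonicity properties are easy to check numerically on small discrete distributions. This is a sanity-check sketch, not from [vEH14]; the distributions and the merging map f are illustrative:

```python
import math

def renyi(alpha, P, Q):
    # D_alpha for discrete distributions (dicts over the same support).
    return math.log(sum(P[x] ** alpha * Q[x] ** (1 - alpha) for x in P)) / (alpha - 1)

P = {0: 0.6, 1: 0.3, 2: 0.1}
Q = {0: 0.3, 1: 0.3, 2: 0.4}

# Monotonicity: D_alpha is nondecreasing in alpha.
divs = [renyi(a, P, Q) for a in (1.5, 2.0, 4.0, 8.0)]
assert all(d1 <= d2 + 1e-12 for d1, d2 in zip(divs, divs[1:]))

# Postprocessing: merging outcomes 0 and 1 (a deterministic f with
# f(0) = f(1) = 0, f(2) = 1) cannot increase D_alpha.
fP = {0: P[0] + P[1], 1: P[2]}
fQ = {0: Q[0] + Q[1], 1: Q[2]}
assert renyi(2.0, fP, fQ) <= renyi(2.0, P, Q)
```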

2.1 Gaussian Mechanism

The following lemma gives the Rényi divergence between two Gaussian distributions with the same variance.

Lemma 16

Let \(\mu , \nu , \sigma \in \mathbb {R}\) and \(\alpha \in [1,\infty )\). Then

$$\begin{aligned} \mathrm {D}_{\alpha }\left( \mathcal {N}(\mu ,\sigma ^2)\Vert \mathcal {N}(\nu ,\sigma ^2)\right) = \frac{\alpha (\mu - \nu )^2}{2\sigma ^2} \end{aligned}$$

Consequently, the Gaussian mechanism, which answers a sensitivity-\(\varDelta \) query by adding noise drawn from \(\mathcal {N}(0,\sigma ^2)\), satisfies \(\left( \frac{\varDelta ^2}{2\sigma ^2}\right) \)-zCDP (Proposition 6).
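The closed form of Lemma 16 can be cross-checked against direct numerical integration of Definition 14. A sketch (not from the paper; grid and parameter values are illustrative):

```python
import math

def renyi_gauss_closed(alpha, mu, nu, sigma):
    # Closed form from Lemma 16: alpha * (mu - nu)^2 / (2 * sigma^2).
    return alpha * (mu - nu) ** 2 / (2 * sigma ** 2)

def renyi_gauss_numeric(alpha, mu, nu, sigma, lo=-30.0, hi=30.0, n=100_000):
    # Midpoint-rule integration of int P(x)^alpha * Q(x)^(1 - alpha) dx.
    def pdf(x, m):
        return math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    h = (hi - lo) / n
    s = sum(pdf(lo + (i + 0.5) * h, mu) ** alpha * pdf(lo + (i + 0.5) * h, nu) ** (1 - alpha)
            for i in range(n))
    return math.log(s * h) / (alpha - 1)

closed = renyi_gauss_closed(2.0, 0.0, 1.0, 2.0)   # 2 * 1 / 8 = 0.25
numeric = renyi_gauss_numeric(2.0, 0.0, 1.0, 2.0)
assert abs(closed - numeric) < 1e-6
```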

For the multivariate Gaussian mechanism, Lemma 16 generalises to the following.

Lemma 17

Let \(\mu , \nu \in \mathbb {R}^d\), \(\sigma \in \mathbb {R}\), and \(\alpha \in [1,\infty )\). Then

$$\begin{aligned} \mathrm {D}_{\alpha }\left( \mathcal {N}(\mu ,\sigma ^2 I_d)\Vert \mathcal {N}(\nu ,\sigma ^2 I_d)\right) = \frac{\alpha \Vert \mu - \nu \Vert _2^2}{2\sigma ^2} \end{aligned}$$

Thus, if \(M : \mathcal {X}^n \rightarrow \mathbb {R}^d\) is the mechanism that, on input x, releases a sample from \(\mathcal {N}(q(x), \sigma ^2 I_d)\) for some function \(q : \mathcal {X}^n \rightarrow \mathbb {R}^d\), then M satisfies \(\rho \)-zCDP for

$$\begin{aligned} \rho = \frac{1}{2\sigma ^2} \sup _{x,x' \in \mathcal {X}^n \atop \text {differing in one entry}} \Vert q(x)-q(x')\Vert _2^2 .\end{aligned}$$
(3)

3 Relation to Differential Privacy

We now discuss the relationship between zCDP and the traditional definitions of pure and approximate differential privacy. There is a close relationship between the notions, but not an exact characterization.

Definition 18

(Differential Privacy (DP) [DMNS06, DKM+06]). A randomized mechanism \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfies \((\varepsilon ,\delta )\)-differential privacy if, for all \(x,x' \in \mathcal {X}^n\) differing in a single entry, we have

$$\begin{aligned} {\mathop {\mathbb {P}}\limits _{}}\left[ M(x) \in S\right] \le e^\varepsilon {\mathop {\mathbb {P}}\limits _{}}\left[ M(x') \in S\right] + \delta \end{aligned}$$

for all (measurable) \(S \subset \mathcal {Y}\). Further define \(\varepsilon \)-differential privacy to be \((\varepsilon ,0)\)-differential privacy.

3.1 Pure DP versus zCDP

We now show that \(\varepsilon \)-differential privacy implies \((\frac{1}{2}\varepsilon ^2)\)-zCDP (Proposition 4).

Proposition 19

Let P and Q be probability distributions on \(\varOmega \) satisfying \(\mathrm {D}_{\infty }\left( P\Vert Q\right) \le \varepsilon \) and \(\mathrm {D}_{\infty }\left( Q\Vert P\right) \le \varepsilon \). Then \(\mathrm {D}_{\alpha }\left( P\Vert Q\right) \le \frac{1}{2} \varepsilon ^2 \alpha \) for all \(\alpha >1\).

Remark 20

In particular, Proposition 19 shows that the KL-divergence \(\mathrm {D}_{1}\left( P\Vert Q\right) \le \frac{1}{2}\varepsilon ^2\). A bound on the KL-divergence between random variables in terms of their max-divergence is an important ingredient in the analysis of the advanced composition theorem [DRV10]. Our bound sharpens (up to lower order terms) and, in our opinion, simplifies the previous bound of \(\mathrm {D}_{1}\left( P\Vert Q\right) \le \frac{1}{2}\varepsilon (e^{\varepsilon }-1)\) proved by Dwork and Rothblum [DR16].

Proof

(Proof of Proposition 19). We may assume \(\frac{1}{2} \varepsilon \alpha \le 1\), as otherwise \(\frac{1}{2} \varepsilon ^2 \alpha > \varepsilon \), whence the result follows from monotonicity. We must show that

$$\begin{aligned} e^{(\alpha -1)\mathrm {D}_{\alpha }\left( P\Vert Q\right) } = {\mathop {\mathbb {E}}\limits _{{x \sim Q}}}\left[ \left( \frac{P(x)}{Q(x)}\right) ^\alpha \right] \le e^{\frac{1}{2} \alpha (\alpha -1)\varepsilon ^2}. \end{aligned}$$

We know that \(e^{-\varepsilon } \le \frac{P(x)}{Q(x)} \le e^{\varepsilon }\) for all x. Define a random function \(A : \varOmega \rightarrow \{e^{-\varepsilon },e^\varepsilon \}\) by \({\mathop {\mathbb {E}}\limits _{{A}}}\left[ A(x)\right] ={\frac{P(x)}{Q(x)}}\) for all x. By Jensen’s inequality,

$$\begin{aligned} {\mathop {\mathbb {E}}\limits _{{x \sim Q}}}\left[ \left( \frac{P(x)}{Q(x)}\right) ^{\alpha }\right] = {\mathop {\mathbb {E}}\limits _{{x \sim Q}}}\left[ \left( {\mathop {\mathbb {E}}\limits _{{A}}}\left[ A(x)\right] \right) ^{\alpha }\right] \le {\mathop {\mathbb {E}}\limits _{{x \sim Q}}}\left[ {\mathop {\mathbb {E}}\limits _{{A}}}\left[ A(x)^\alpha \right] \right] = {\mathop {\mathbb {E}}\limits _{{A}}}\left[ A^\alpha \right] , \end{aligned}$$

where A denotes A(x) for a random \(x \sim Q\). We also have \({\mathop {\mathbb {E}}\limits _{{A}}}\left[ A\right] = {\mathop {\mathbb {E}}\limits _{{x \sim Q}}}\left[ \frac{P(x)}{Q(x)}\right] =1\). From this equation, we can conclude that

$$\begin{aligned} {\mathop {\mathbb {P}}\limits _{A}}\left[ A=e^{-\varepsilon }\right] = \frac{e^\varepsilon -1}{e^\varepsilon -e^{-\varepsilon }} \qquad \text {and} \qquad {\mathop {\mathbb {P}}\limits _{A}}\left[ A=e^{\varepsilon }\right] = \frac{1-e^{-\varepsilon }}{e^\varepsilon -e^{-\varepsilon }}. \end{aligned}$$

Thus

$$\begin{aligned} e^{(\alpha -1)\mathrm {D}_{\alpha }\left( P\Vert Q\right) } \le \,&{\mathop {\mathbb {E}}\limits _{{A}}}\left[ A^\alpha \right] \\ =&\frac{e^\varepsilon -1}{e^\varepsilon -e^{-\varepsilon }} \cdot e^{-\alpha \varepsilon } + \frac{1-e^{-\varepsilon }}{e^\varepsilon -e^{-\varepsilon }} \cdot e^{\alpha \varepsilon }\\ =&\frac{\sinh (\alpha \varepsilon )-\sinh ((\alpha -1)\varepsilon )}{\sinh (\varepsilon )}. \end{aligned}$$

The result now follows from the following inequality, which is proved in the full version of this work.

$$\begin{aligned} 0 \le y < x \le 2 \implies \frac{\sinh (x)-\sinh (y)}{\sinh (x-y)} \le e^{\frac{1}{2} xy}. \end{aligned}$$
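For concreteness, the bound of Proposition 19 can be checked numerically on randomized response, which realises the two-point distribution \(A\) from the proof exactly (the ratio \(P(x)/Q(x)\) takes only the values \(e^{\varepsilon }\) and \(e^{-\varepsilon }\)). The following Python sketch is purely illustrative; the helper names are ours:

```python
import math

def renyi_divergence(p, q, alpha):
    """D_alpha(P||Q) for finite distributions given as probability lists."""
    s = sum(pi**alpha * qi**(1 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1)

def randomized_response(eps):
    """A canonical eps-DP pair: P(x)/Q(x) lies in {e^eps, e^-eps} everywhere."""
    p1 = math.exp(eps) / (1 + math.exp(eps))
    return [p1, 1 - p1], [1 - p1, p1]

# Check D_alpha(P||Q) <= (1/2) * alpha * eps^2 in the regime alpha * eps <= 2.
for eps in [0.1, 0.5, 1.0]:
    for alpha in [1.1, 1.5, 2.0]:
        if alpha * eps <= 2:
            P, Q = randomized_response(eps)
            assert renyi_divergence(P, Q, alpha) <= 0.5 * alpha * eps**2 + 1e-12
```

At \(\alpha \varepsilon \) close to 2 the bound is nearly attained, consistent with the sinh inequality above being tight up to lower order terms.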

3.2 Approximate DP versus zCDP

The statements in this section show that, up to some loss in parameters, zCDP is equivalent to a family of \((\varepsilon , \delta )\)-DP guarantees for all \(\delta > 0\).

Lemma 21

Let \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfy \((\xi ,\rho )\)-zCDP. Then M satisfies \((\varepsilon ,\delta )\)-DP for all \(\delta >0\) and

$$\begin{aligned} \varepsilon = \xi + \rho + \sqrt{4\rho \log (1/\delta )}. \end{aligned}$$

Thus to achieve a given \((\varepsilon ,\delta )\)-DP guarantee it suffices to satisfy \((\xi ,\rho )\)-zCDP with

$$\begin{aligned} \rho = \left( \sqrt{\varepsilon -\xi +\log (1/\delta )}-\sqrt{\log (1/\delta )}\right) ^2 \approx \frac{(\varepsilon -\xi )^2}{4\log (1/\delta )}. \end{aligned}$$

Proof

Let \(x,x' \in \mathcal {X}^n\) be neighbouring. Define \(f(y) = \log ({\mathop {\mathbb {P}}\limits _{}}\left[ M(x)=y\right] /{\mathop {\mathbb {P}}\limits _{}}\left[ M(x')=y\right] )\). Let \(Y\sim M(x)\) and \(Z=f(Y)\). That is, \(Z=\mathsf {PrivLoss}\left( M(x)\Vert M(x')\right) \) is the privacy loss random variable. Fix \(\alpha \in (1,\infty )\) to be chosen later. Then

$$\begin{aligned} {\mathop {\mathbb {E}}\limits _{{}}}\left[ e^{(\alpha -1)Z}\right] ={\mathop {\mathbb {E}}\limits _{{Y \sim M(x)}}}\left[ \left( \frac{{\mathop {\mathbb {P}}\limits _{}}\left[ M(x)=Y\right] }{{\mathop {\mathbb {P}}\limits _{}}\left[ M(x')=Y\right] }\right) ^{\alpha -1}\right] =e^{(\alpha -1)\mathrm {D}_{\alpha }\left( M(x)\Vert M(x')\right) } \le e^{(\alpha -1)(\xi +\rho \alpha )}. \end{aligned}$$

By Markov’s inequality

$$\begin{aligned} {\mathop {\mathbb {P}}\limits _{}}\left[ Z>\varepsilon \right] = {\mathop {\mathbb {P}}\limits _{}}\left[ e^{(\alpha -1) Z} > e^{(\alpha -1)\varepsilon }\right] \le \frac{{\mathop {\mathbb {E}}\limits _{{}}}\left[ e^{(\alpha -1) Z}\right] }{e^{(\alpha -1)\varepsilon }} \le e^{(\alpha -1)(\xi +\rho \alpha -\varepsilon )}. \end{aligned}$$

Choosing \(\alpha = (\varepsilon - \xi + \rho )/(2\rho ) > 1\) gives

$$\begin{aligned} {\mathop {\mathbb {P}}\limits _{}}\left[ Z>\varepsilon \right] \le e^{-(\varepsilon - \xi - \rho )^2/4\rho } \le \delta . \end{aligned}$$

Since the privacy loss exceeds \(\varepsilon \) with probability at most \(\delta \), for any measurable \(S \subset \mathcal {Y}\),

$$\begin{aligned} {\mathop {\mathbb {P}}\limits _{}}\left[ M(x) \in S\right] \le e^\varepsilon {\mathop {\mathbb {P}}\limits _{}}\left[ M(x') \in S\right] + \delta . \end{aligned}$$
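The parameter conversion of Lemma 21 is easy to evaluate; the following Python helpers (names ours, implementing only the formulas above) round-trip exactly between \(\rho \) and \(\varepsilon \) when \(\xi =0\):

```python
import math

def zcdp_to_dp_eps(xi, rho, delta):
    """Lemma 21: (xi, rho)-zCDP implies (eps, delta)-DP for this eps."""
    return xi + rho + math.sqrt(4 * rho * math.log(1 / delta))

def rho_for_target(eps, delta, xi=0.0):
    """Inverse direction: a rho sufficient for a target (eps, delta)-DP
    guarantee; approximately (eps - xi)^2 / (4 log(1/delta))."""
    L = math.log(1 / delta)
    return (math.sqrt(eps - xi + L) - math.sqrt(L))**2

# Round trip: the rho returned indeed yields exactly the target eps.
eps, delta = 1.0, 1e-6
rho = rho_for_target(eps, delta)
assert abs(zcdp_to_dp_eps(0.0, rho, delta) - eps) < 1e-9
```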

Lemma 21 is not tight, and we give a quantitative refinement in Lemma 38 (setting \(\delta =0\) there). There, we also show a partial converse to Lemma 21:

Lemma 22

Let \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfy \((\varepsilon ,\delta )\)-DP for all \(\delta >0\) and

$$\begin{aligned} \varepsilon = \hat{\xi }+\sqrt{\hat{\rho }\log (1/\delta )} \end{aligned}$$
(4)

for some constants \(\hat{\xi }, \hat{\rho }\in [0,1]\). Then M is \(\left( \hat{\xi }-\frac{1}{4} \hat{\rho }+5\root 4 \of {\hat{\rho }}, \frac{1}{4} \hat{\rho }\right) \)-zCDP.

Thus zCDP and DP are equivalent up to a (potentially substantial) loss in parameters and the quantification over all \(\delta \).

4 Zero- versus Mean-Concentrated Differential Privacy

We begin by recalling the definition of mean-concentrated differential privacy:

Definition 23

(Mean-Concentrated Differential Privacy [DR16]). A randomized mechanism \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfies \((\mu ,\tau )\)-mCDP if, for all \(x,x' \in \mathcal {X}^n\) differing in one entry, and letting \(Z = \mathsf {PrivLoss}\left( M(x)\Vert M(x')\right) \), we have

$$\begin{aligned} {\mathop {\mathbb {E}}\limits _{{}}}\left[ Z\right] \le \mu \quad \text {and} \quad {\mathop {\mathbb {E}}\limits _{{}}}\left[ e^{\lambda \left( Z-{\mathop {\mathbb {E}}\limits _{{}}}\left[ Z\right] \right) }\right] \le e^{\lambda ^2 \cdot \tau ^2 / 2} \end{aligned}$$

for all \(\lambda \in \mathbb {R}\).

In contrast \((\xi ,\rho )\)-zCDP requires that, for all \(\alpha \in (1,\infty )\), \({\mathop {\mathbb {E}}\limits _{{}}}\left[ e^{(\alpha -1)Z}\right] \le e^{(\alpha -1)(\xi +\rho \alpha )}\), where \(Z\sim \mathsf {PrivLoss}\left( M(x)\Vert M(x')\right) \) is the privacy loss random variable. In the full version of this work, we show that these definitions are equivalent up to a (potentially significant) loss in parameters.

Lemma 24

If \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfies \((\mu ,\tau )\)-mCDP, then M satisfies \((\mu -\tau ^2/2,\tau ^2/2)\)-zCDP. Conversely, if \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfies \((\xi ,\rho )\)-zCDP, then M satisfies \((\xi +\rho ,O(\sqrt{\xi +2\rho }))\)-mCDP.

Thus we can convert \((\mu ,\tau )\)-mCDP into \((\mu -\tau ^2/2,\tau ^2/2)\)-zCDP and then back to \((\mu , O(\sqrt{\mu +\tau ^2/2}))\)-mCDP. This may result in a large loss in parameters, which is why, for example, pure DP can be characterised in terms of zCDP, but not in terms of mCDP.

We view zCDP as a relaxation of mCDP; mCDP requires the privacy loss to be “tightly” concentrated about its mean and that the mean is close to the origin. The triangle inequality then implies that the privacy loss is “weakly” concentrated about the origin. (The difference between “tightly” and “weakly” accounts for the use of the triangle inequality.) On the other hand, zCDP directly requires that the privacy loss is weakly concentrated about the origin. That is, zCDP gives a subgaussian bound on the privacy loss that is centered at zero, whereas mCDP gives a subgaussian bound that is centered at the mean and separately bounds the mean.

There may be some advantage to the stronger requirement of mCDP, either in terms of what kind of privacy guarantee it affords, or how it can be used as an analytic tool. However, it seems that for most applications, we only need what zCDP provides.

5 Group Privacy

In this section we show that zCDP provides privacy protections to small groups of individuals.

Definition 25

(zCDP for Groups). We say that a mechanism \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) provides \((\xi ,\rho )\)-zCDP for groups of size k if, for every \(x,x' \in \mathcal {X}^n\) differing in at most k entries, we have

$$\begin{aligned} \forall \alpha \in (1,\infty ) \qquad \mathrm {D}_{\alpha }\left( M(x)\Vert M(x')\right) \le \xi + \rho \cdot \alpha . \end{aligned}$$

The usual definition of zCDP only applies to groups of size 1. Here we show that it implies bounds for all group sizes. We begin with a technical lemma.

Lemma 26

(Triangle-like Inequality for Rényi Divergence). Let P, Q, and R be probability distributions. Then

$$\begin{aligned} \mathrm {D}_{\alpha }\left( P\Vert Q\right) \le \frac{k\alpha }{k\alpha -1}\mathrm {D}_{ \frac{k\alpha -1}{k-1}}\left( P\Vert R\right) + \mathrm {D}_{k\alpha }\left( R\Vert Q\right) \end{aligned}$$
(5)

for all \(k,\alpha \in (1,\infty )\).

Proof

Let \(p=\frac{k\alpha -1}{\alpha (k-1)}\) and \(q=\frac{k\alpha -1}{\alpha -1}\). Then \(\frac{1}{p}+\frac{1}{q}=\frac{\alpha (k-1) + (\alpha -1)}{k\alpha -1}=1\). By Hölder’s inequality,

$$\begin{aligned} e^{(\alpha -1)\mathrm {D}_{\alpha }\left( P\Vert Q\right) }=&\int _\varOmega P(x)^\alpha Q(x)^{1-\alpha } \mathrm {d}x\\ =&\int _\varOmega P(x)^\alpha R(x)^{-\alpha } \cdot R(x)^{\alpha -1} Q(x)^{1-\alpha } \cdot R(x) \mathrm {d}x\\ =&{\mathop {\mathbb {E}}\limits _{{x \sim R}}}\left[ \left( \frac{P(x)}{R(x)}\right) ^\alpha \cdot \left( \frac{R(x)}{Q(x)}\right) ^{\alpha -1}\right] \\ \le&{\mathop {\mathbb {E}}\limits _{{x \sim R}}}\left[ \left( \frac{P(x)}{R(x)}\right) ^{p\alpha }\right] ^{1/p} \cdot {\mathop {\mathbb {E}}\limits _{{x \sim R}}}\left[ \left( \frac{R(x)}{Q(x)}\right) ^{q(\alpha -1)}\right] ^{1/q}\\ =&e^{(p\alpha -1)\mathrm {D}_{p\alpha }\left( P\Vert R\right) /p} \cdot e^{q(\alpha -1)\mathrm {D}_{q(\alpha -1)+1}\left( R\Vert Q\right) /q}. \end{aligned}$$

Taking logarithms and rearranging gives

$$\begin{aligned} \mathrm {D}_{\alpha }\left( P\Vert Q\right) \le \frac{p\alpha -1}{p(\alpha -1)}\mathrm {D}_{p\alpha }\left( P\Vert R\right) + \mathrm {D}_{q(\alpha -1)+1}\left( R\Vert Q\right) . \end{aligned}$$

Now \(p\alpha =\frac{k\alpha -1}{k-1}\), \(q(\alpha -1)+1=k\alpha \), and \(\frac{p\alpha -1}{p(\alpha -1)} = \frac{k\alpha }{k\alpha -1}\).

Proposition 27

If \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfies \((\xi ,\rho )\)-zCDP, then M gives \((\xi \cdot k \sum _{i=1}^k \frac{1}{i}, \rho \cdot k^2)\)-zCDP for groups of size k.

In particular, \((\xi ,\rho )\)-zCDP implies \((\xi \cdot O(k \log k), \rho \cdot k^2)\)-zCDP for groups of size k. The Gaussian mechanism shows that \(k^2\rho \) is the optimal dependence on \(\rho \). However, \(O(k \log k)\xi \) is not the optimal dependence on \(\xi \): \((\xi ,0)\)-zCDP implies \((k\xi ,0)\)-zCDP for groups of size k.

Proof

We show this by induction on k. The statement is clearly true for groups of size 1. We now assume the statement holds for groups of size \(k-1\) and will verify it for groups of size k.

Let \(x,x' \in \mathcal {X}^n\) differ in k entries. Let \(\hat{x} \in \mathcal {X}^n\) be such that x and \(\hat{x}\) differ in \(k-1\) entries and \(x'\) and \(\hat{x}\) differ in one entry.

Then, by the induction hypothesis,

$$\begin{aligned} \mathrm {D}_{\alpha }\left( M(x)\Vert M(\hat{x})\right) \le \xi \cdot (k-1) \sum _{i=1}^{k-1} \frac{1}{i} + \rho \cdot (k-1)^2 \cdot \alpha \end{aligned}$$

and, by zCDP,

$$\begin{aligned} \mathrm {D}_{\alpha }\left( M(\hat{x})\Vert M(x')\right) \le \xi + \rho \cdot \alpha \end{aligned}$$

for all \(\alpha \in (1,\infty )\). By (5), for any \(\alpha \in (1,\infty )\),

$$\begin{aligned}&\mathrm {D}_{\alpha }\left( M(x)\Vert M(x')\right) \\&\le \frac{k\alpha }{k\alpha -1}\mathrm {D}_{ \frac{k\alpha -1}{k-1}}\left( M(x)\Vert M(\hat{x})\right) + \mathrm {D}_{k\alpha }\left( M(\hat{x})\Vert M(x')\right) \\&\le \frac{k\alpha }{k\alpha -1}\left( \xi \cdot (k-1) \sum _{i=1}^{k-1} \frac{1}{i} + \rho \cdot (k-1)^2 \cdot \frac{k\alpha -1}{k-1} \right) + \xi + \rho \cdot k\alpha \\&= \xi \cdot \left( 1 + \frac{k\alpha }{k\alpha -1} (k-1) \sum _{i=1}^{k-1} \frac{1}{i} \right) + \rho \cdot \left( \frac{k\alpha }{k\alpha -1} (k-1)^2 \frac{k\alpha -1}{k-1} + k\alpha \right) \\&\le \xi \cdot k \sum _{i=1}^{k} \frac{1}{i} + \rho \cdot k^2 \cdot \alpha , \end{aligned}$$

where the last inequality follows from the fact that \(\frac{k\alpha }{k\alpha -1}\) is a decreasing function of \(\alpha \) for \(\alpha >1\).
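The claim that \(k^2\rho \) is the optimal dependence on \(\rho \) can be seen concretely for the Gaussian mechanism, using the standard closed form \(\mathrm {D}_{\alpha }\left( \mathcal {N}(\mu ,\sigma ^2)\Vert \mathcal {N}(0,\sigma ^2)\right) = \alpha \mu ^2/2\sigma ^2\): for a sensitivity-1 query, a group of size k shifts the mean by k, and the group bound holds with equality. A sketch (helper name ours):

```python
def renyi_gauss(mu, sigma, alpha):
    """Closed form for D_alpha(N(mu, sigma^2) || N(0, sigma^2))."""
    return alpha * mu**2 / (2 * sigma**2)

sigma, k, alpha = 2.0, 5, 3.0
rho = renyi_gauss(1.0, sigma, alpha) / alpha    # rho = 1/(2 sigma^2) for groups of size 1
group = renyi_gauss(float(k), sigma, alpha)     # the shift grows to k for a group of size k
assert abs(group - rho * k**2 * alpha) < 1e-12  # exactly (0, k^2 rho)-zCDP for groups of size k
```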

6 Lower Bounds

In this section we develop tools to prove lower bounds for zCDP. We will use group privacy to bound the mutual information between the input and the output of a mechanism satisfying zCDP. Thus, if we are able to construct a distribution on inputs such that any accurate mechanism must reveal a large amount of information about its input, we obtain a lower bound showing that no mechanism satisfying zCDP can be accurate for this data distribution.

We begin with the simplest form of our mutual information bound, which is an analogue of the bound of [MMP+10] for pure differential privacy:

Proposition 28

Let \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfy \((\xi ,\rho )\)-zCDP. Let X be a random variable in \(\mathcal {X}^n\). Then

$$\begin{aligned} I( X ; M(X) ) \le \xi \cdot n (1+\log n) + \rho \cdot n^2, \end{aligned}$$

where I denotes mutual information (measured in nats, rather than bits).

Proof

By Proposition 27, M is \((\xi \cdot n \sum _{i=1}^n \frac{1}{i}, \rho \cdot n^2)\)-zCDP for groups of size n. Thus

$$\begin{aligned} \mathrm {D}_{1}\left( M(x)\Vert M(x')\right) \le \xi \cdot n \sum _{i=1}^n \frac{1}{i} + \rho \cdot n^2 \le \xi \cdot n (1+\log n) + \rho \cdot n^2 \end{aligned}$$

for all \(x,x' \in \mathcal {X}^n\). Since KL-divergence is convex,

$$\begin{aligned} I( X ; M(X) ) =&{\mathop {\mathbb {E}}\limits _{{x \leftarrow X}}}\left[ \mathrm {D}_{1}\left( M(x)\Vert M(X)\right) \right] \\ \le&{\mathop {\mathbb {E}}\limits _{{x \leftarrow X}}}\left[ {\mathop {\mathbb {E}}\limits _{{x' \leftarrow X}}}\left[ \mathrm {D}_{1}\left( M(x)\Vert M(x')\right) \right] \right] \\ \le&{\mathop {\mathbb {E}}\limits _{{x \leftarrow X}}}\left[ {\mathop {\mathbb {E}}\limits _{{x' \leftarrow X}}}\left[ \xi \cdot n (1+\log n) + \rho \cdot n^2\right] \right] \\ =&\xi \cdot n (1+\log n) + \rho \cdot n^2. \end{aligned}$$

The reason this lower bound works is the strong group privacy guarantee — even for groups of size n, we obtain nontrivial privacy guarantees. While this is good for privacy, it is bad for usefulness, as it implies that even information that is “global” (rather than specific to an individual or a small group) is protected. These lower bounds reinforce the connection between group privacy and lower bounds [HT10, De12, SU15a].

In contrast, \((\varepsilon ,\delta )\)-DP is not susceptible to such a lower bound because it gives a vacuous privacy guarantee for groups of size \(k=O(\log (1/\delta )/\varepsilon )\). This helps explain the power of the propose-test-release paradigm.

Furthermore, we obtain even stronger mutual information bounds when the entries of the distribution are independent:

Lemma 29

Let \(M : \mathcal {X}^{m} \rightarrow \mathcal {Y}\) satisfy \((\xi ,\rho )\)-zCDP. Let X be a random variable in \(\mathcal {X}^m\) with independent entries. Then

$$\begin{aligned} I \left( X ; M(X) \right) \le (\xi +\rho ) \cdot m , \end{aligned}$$

where I denotes mutual information (measured in nats, rather than bits).

Proof

First, by the chain rule for mutual information,

$$\begin{aligned} I(X; M(X)) = \sum _{i \in [m]} I(X_i ; M(X) | X_{1 \cdots i-1}), \end{aligned}$$

where

$$\begin{aligned} I(X_i ; M(X) | X_{1 \cdots i-1}) =&{\mathop {\mathbb {E}}\limits _{{x \leftarrow X_{1 \cdots i-1}}}}\left[ I(X_i | X_{1 \cdots i-1} = x; M(X) | X_{1 \cdots i-1} = x)\right] \\ =&{\mathop {\mathbb {E}}\limits _{{x \leftarrow X_{1 \cdots i-1}}}}\left[ I(X_i; M(x,X_{i \cdots m}))\right] , \end{aligned}$$

by independence of the \(X_i\)s.

We can define mutual information in terms of KL-divergence:

$$\begin{aligned} I(X_i; M(x,X_{i \cdots m})) =&{\mathop {\mathbb {E}}\limits _{{y \leftarrow X_i}}}\left[ \mathrm {D}_{1}\left( M(x,X_{i \cdots m}) | X_i=y\Vert M(x,X_{i \cdots m})\right) \right] \\ =&{\mathop {\mathbb {E}}\limits _{{y \leftarrow X_i}}}\left[ \mathrm {D}_{1}\left( M(x,y,X_{i+1 \cdots m})\Vert M(x,X_{i \cdots m})\right) \right] . \end{aligned}$$

By zCDP, we know that for all \(x \in \mathcal {X}^{i-1}\), \(y,y' \in \mathcal {X}\), and \(z \in \mathcal {X}^{m-i}\), we have

$$\begin{aligned} \mathrm {D}_{1}\left( M(x,y,z)\Vert M(x,y',z)\right) \le \xi + \rho . \end{aligned}$$

Thus, by the convexity of KL-divergence,

$$\begin{aligned} \mathrm {D}_{1}\left( M(x,y,X_{i+1 \cdots m})\Vert M(x,X_{i \cdots m})\right) \le \xi +\rho \end{aligned}$$

for all x and y. The result follows.

More generally, we can combine dependent and independent rows as follows.

Theorem 30

Let \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfy \((\xi ,\rho )\)-zCDP. Take \(n=m \cdot \ell \). Let \(X^1, \cdots , X^m\) be independent random variables on \(\mathcal {X}^\ell \). Denote \(X=(X^1, \cdots , X^{m}) \in \mathcal {X}^n\). Then

$$\begin{aligned} I\left( X;M(X)\right) \le m \cdot \left( \xi \cdot \ell (1+\log \ell ) + \rho \cdot \ell ^2 \right) , \end{aligned}$$

where I denotes the mutual information (measured in nats, rather than bits).

6.1 Example Applications of the Lower Bound

We informally discuss a few applications of our information-based lower bounds to some simple and well-studied problems in differential privacy.

One-Way Marginals. Consider \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) where \(\mathcal {X}=\{0,1\}^d\) and \(\mathcal {Y}=[0,1]^d\). The goal of M is to estimate the attribute means, or one-way marginals, of its input database x:

$$\begin{aligned} M(x) \approx \overline{x} = \frac{1}{n} \sum _{i \in [n]} x_i. \end{aligned}$$

It is known that this is possible subject to \(\varepsilon \)-DP if and only if \(n=\varTheta (d/\varepsilon )\) [HT10, SU15a]. This is possible subject to \((\varepsilon ,\delta )\)-DP if and only if \(n=\tilde{\varTheta }(\sqrt{d\log (1/\delta )}/\varepsilon )\), assuming \(\delta \ll 1/n\) [BUV14, SU15a].

We now analyze what can be accomplished with zCDP. Adding independent noise drawn from \(\mathcal {N}(0,d/(2n^2\rho ))\) to each of the d coordinates of \(\overline{x}\) satisfies \(\rho \)-zCDP. This gives accurate answers as long as \(n \gg \sqrt{d/\rho }\).
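This mechanism can be sketched in Python (illustrative only; the function name is ours). Each coordinate of the mean changes by at most 1/n between neighbouring databases, so the d-dimensional query has \(\ell _2\)-sensitivity \(\sqrt{d}/n\), which is where the noise scale comes from:

```python
import math
import random

def one_way_marginals(x, rho):
    """rho-zCDP estimate of the attribute means of a binary matrix x
    (n rows, d columns).

    The query has L2-sensitivity sqrt(d)/n, so adding N(0, d/(2 n^2 rho))
    noise per coordinate gives rho-zCDP via the Gaussian mechanism.
    """
    n, d = len(x), len(x[0])
    sigma = math.sqrt(d / (2 * n**2 * rho))
    means = [sum(row[j] for row in x) / n for j in range(d)]
    return [m + random.gauss(0.0, sigma) for m in means]
```

For \(n \gg \sqrt{d/\rho }\) the per-coordinate noise \(\sqrt{d/(2n^2\rho )}\) is small, matching the accuracy claim above.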

For a lower bound, consider sampling \(X_1 \in \{0,1\}^d\) uniformly at random. Set \(X_i=X_1\) for all \(i \in [n]\). By Proposition 28,

$$\begin{aligned} I(X ; M(X) ) \le n^2 \rho \end{aligned}$$

for any \(\rho \)-zCDP \(M : (\{0,1\}^d)^n \rightarrow [0,1]^d\). However, if M is accurate, we can recover (most of) \(X_1\) from M(X), whence \(I(X ; M(X) ) \ge \varOmega (d)\). This yields a lower bound of \(n \ge \varOmega (\sqrt{d/\rho })\), which is tight up to constant factors.

Histograms (a.k.a. Point Queries). Consider \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\), where \(\mathcal {X}=[T]\) and \(\mathcal {Y}=\mathbb {R}^T\). The goal of M is to estimate the histogram of its input:

$$\begin{aligned} M(x)_t \approx h_t(x) = |\{i \in [n] : x_i = t\}| \end{aligned}$$

For \(\varepsilon \)-DP it is possible to do this if and only if \(n = \varTheta (\log (T)/\varepsilon )\); the optimal algorithm is to independently sample

$$\begin{aligned} M(x)_t \sim h_t(x) + \mathsf {Laplace}(2/\varepsilon ). \end{aligned}$$

However, for \((\varepsilon ,\delta )\)-DP, it is possible to attain sample complexity \(n = O(\log (1/\delta )/\varepsilon )\) [BNS16, Theorem 3.13]. Interestingly, for zCDP we can show that \(n=\varTheta (\sqrt{\log (T)/\rho })\) is sufficient and necessary:

Sampling

$$\begin{aligned} M(x)_t \sim h_t(x) + \mathcal {N}(0,1/\rho ) \end{aligned}$$

independently for \(t \in [T]\) satisfies \(\rho \)-zCDP. Moreover,

$$\begin{aligned} {\mathop {\mathbb {P}}\limits _{}}\left[ \max _{t \in [T]} \left| M(x)_t - h_t(x) \right| \ge \lambda \right] \le T \cdot {\mathop {\mathbb {P}}\limits _{}}\left[ |\mathcal {N}(0,1/\rho )| > \lambda \right] \le T \cdot e^{-\lambda ^2\rho /2}. \end{aligned}$$

In particular \({\mathop {\mathbb {P}}\limits _{}}\left[ \max _{t \in [T]} \left| M(x)_t - h_t(x) \right| \ge \sqrt{\log (T/\beta )/\rho }\right] \le \beta \) for all \(\beta >0\). Thus this algorithm is accurate if \(n \gg \sqrt{\log (T)/\rho }\).
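The sampling step above admits a short sketch (illustrative; the function name is ours). Changing one entry moves two counts by 1 each, so the counts have \(\ell _2\)-sensitivity \(\sqrt{2}\), and \(\mathcal {N}(0,1/\rho )\) noise per bucket gives \(\rho \)-zCDP:

```python
import math
import random

def private_histogram(x, T, rho):
    """rho-zCDP histogram release over the domain [T] = {0, ..., T-1}."""
    h = [0] * T
    for xi in x:
        h[xi] += 1
    # Independent Gaussian noise of variance 1/rho per bucket.
    return [c + random.gauss(0.0, math.sqrt(1.0 / rho)) for c in h]
```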

On the other hand, if we sample \(X_1 \in [T]\) uniformly at random and set \(X_i=X_1\) for all \(i \in [n]\), then \(I(X;M(X)) \ge \varOmega (\log T)\) for any accurate M, as we can recover \(X_1\) from M(X) if M is accurate. Proposition 28 thus implies that \(n \ge \varOmega (\sqrt{\log (T)/\rho })\) is necessary to obtain accuracy. This gives a strong separation between approximate DP and zCDP.

Lower Bounds with Accuracy. The above examples can be easily discussed in terms of a more formal and quantitative definition of accuracy. For instance, in the full version of this work, we revisit the histogram example:

Proposition 31

If \(M : [T]^n \rightarrow \mathbb {R}^T\) satisfies \(\rho \)-zCDP and

$$\begin{aligned} \forall x \in [T]^n \qquad {\mathop {\mathbb {E}}\limits _{{M}}}\left[ \max _{t \in [T]} \big | M(x)_t - h_t(x) \big |\right] \le \alpha n, \end{aligned}$$

then \(n \ge \varOmega (\sqrt{\log (\alpha ^2 T)/\rho \alpha ^2})\).

We remark that our lower bounds for zCDP can be converted to lower bounds for mCDP using Lemma 24.

7 Obtaining Pure DP Mechanisms from zCDP

We now establish limits on what more can be achieved with zCDP over pure differential privacy. In particular, we prove that any mechanism satisfying zCDP can be converted into a mechanism satisfying pure DP with at most a quadratic blowup in sample complexity. Formally, we show the following theorem.

Theorem 32

Fix \(n \in \mathbb {N}\), \(n' \in \mathbb {N}\), \(k \in \mathbb {N}\), \(\alpha >0\), and \(\varepsilon >0\). Let \(q : \mathcal {X} \rightarrow \mathbb {R}^k\) and let \(\Vert \cdot \Vert \) be a norm on \(\mathbb {R}^k\). Assume \(\max _{x \in \mathcal {X}} \Vert q(x)\Vert \le 1\). Suppose there exists a \((\xi ,\rho )\)-zCDP mechanism \(M : \mathcal {X}^n \rightarrow \mathbb {R}^k\) such that for all \(x \in \mathcal {X}^n\),

$$\begin{aligned} {\mathop {\mathbb {E}}\limits _{{M}}}\left[ \Vert M(x)-q(x)\Vert \right] \le {\alpha }. \end{aligned}$$

Assume \(\xi \le \alpha ^2\), \(\rho \le \alpha ^2\), and

$$\begin{aligned} n' \ge \frac{4}{\varepsilon \alpha } \left( \rho \cdot n^2 + \xi \cdot n \cdot (1 + \log n) + 1 \right) . \end{aligned}$$

Then there exists an \((\varepsilon ,0)\)-differentially private \(M' : \mathcal {X}^{n'} \rightarrow \mathbb {R}^k\) satisfying

$$\begin{aligned} {\mathop {\mathbb {E}}\limits _{{M'}}}\left[ \Vert M'(x)-q(x)\Vert \right] \le 10\alpha \end{aligned}$$

and

$$\begin{aligned} {\mathop {\mathbb {P}}\limits _{M'}}\left[ \Vert M'(x)-q(x)\Vert > 10\alpha + \frac{4}{\varepsilon n'} \log \left( \frac{1}{\beta }\right) \right] \le \beta \end{aligned}$$

for all \(x \in \mathcal {X}^{n'}\) and \(\beta >0\).

Before discussing the proof of Theorem 32, we make some remarks about its statement:

  • Unfortunately, the theorem only works for families of statistical queries \(q : \mathcal {X} \rightarrow \mathbb {R}^k\). However, it works equally well for \(\Vert \cdot \Vert _\infty \) and \(\Vert \cdot \Vert _1\) error bounds.

  • If \(\xi =0\), we have \(n' = O(n^2 \rho / \varepsilon \alpha )\). So, if \(\rho \), \(\varepsilon \), and \(\alpha \) are all constants, we have \(n'=O(n^2)\). This justifies our informal statement that we can convert any mechanism satisfying zCDP into one satisfying pure DP with a quadratic blowup in sample complexity.

  • The requirement that \(\xi , \rho \le \alpha ^2\) is only used to show that

    $$\begin{aligned} \max _{x \in \mathcal {X}^{n'}} \min _{\hat{x} \in \mathcal {X}^n} \Vert q(x)-q(\hat{x})\Vert \le 2\alpha . \end{aligned}$$
    (6)

    However, in many situations (6) holds even when \(\xi ,\rho \gg \alpha ^2\). For example, if \(n \ge O(\log (k)/\alpha ^2)\) or even \(n \ge O(VC(q)/\alpha ^2)\) then (6) is automatically satisfied. The technical condition (6) is needed to relate the part of the proof with inputs of size n to the part with inputs of size \(n'\).

The proof of Theorem 32 is not constructive. Rather than directly constructing a mechanism satisfying pure DP from any mechanism satisfying zCDP, we show the contrapositive statement: any lower bound for pure DP can be converted into a lower bound for zCDP. Pure DP is characterized by so-called packing lower bounds and the exponential mechanism.

In the full version of this work, we use a greedy argument to show that for any output space and any desired accuracy, there is a set T that is simultaneously a “packing” and a “net”:

Lemma 33

Let \((\mathcal {Y},d)\) be a metric space. Fix \(\alpha >0\). Then there exists a countable \(T \subset \mathcal {Y}\) such that both of the following hold.

  • (Net:) Either T is infinite or for all \(y' \in \mathcal {Y}\) there exists \(y \in T\) with \(d(y,y') \le \alpha \).

  • (Packing:) For all \(y,y' \in T\), if \(y \ne y'\), then \(d(y,y')>\alpha \).

It is well-known that a net yields a pure DP algorithm:

Lemma 34

(Exponential Mechanism [MT07, BLR13]). Let \(\ell : \mathcal {X}^n \times T \rightarrow \mathbb {R}\) satisfy \(|\ell (x,y) - \ell (x',y)| \le \varDelta \) for all \(x,x' \in \mathcal {X}^n\) differing in one entry and all \(y \in T\). Then, for all \(\varepsilon >0\), there exists an \(\varepsilon \)-differentially private \(M : \mathcal {X}^n \rightarrow T\) such that

$$\begin{aligned} {\mathop {\mathbb {P}}\limits _{M}}\left[ \ell (x,M(x)) \le \min _{y \in T} \ell (x,y) + \frac{2\varDelta }{\varepsilon }\log \left( \frac{|T|}{\beta }\right) \right] \ge 1-\beta \end{aligned}$$

and

$$\begin{aligned} {\mathop {\mathbb {E}}\limits _{{M}}}\left[ \ell (x,M(x))\right] \le \min _{y \in T} \ell (x,y) + \frac{2\varDelta }{\varepsilon } \log |T| \end{aligned}$$

for all \(x \in \mathcal {X}^n\) and \(\beta >0\).
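A minimal sketch of the mechanism of Lemma 34 over a finite candidate set (illustrative; the function name and the numerical-stability shift are ours, and the shift does not change the sampled distribution):

```python
import math
import random

def exponential_mechanism(x, T, loss, delta_sens, eps):
    """eps-DP selection from the finite set T, favouring low loss(x, y)."""
    scores = [loss(x, y) for y in T]
    base = min(scores)  # shift before exponentiating, for numerical stability
    weights = [math.exp(-eps * (s - base) / (2 * delta_sens)) for s in scores]
    # Sample y with probability proportional to exp(-eps * loss / (2 * Delta)).
    r = random.random() * sum(weights)
    for y, w in zip(T, weights):
        r -= w
        if r <= 0:
            return y
    return T[-1]
```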

On the other hand, in the full version of this work we use Proposition 28 to show that a packing yields a lower bound for zCDP:

Lemma 35

Let \((\mathcal {Y},d)\) be a metric space and \(q : \mathcal {X}^n \rightarrow \mathcal {Y}\) a function. Let \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) be a \((\xi ,\rho )\)-zCDP mechanism satisfying

$$\begin{aligned} {\mathop {\mathbb {P}}\limits _{M}}\left[ d(M(x),q(x)) > \alpha /2\right] \le \beta \end{aligned}$$

for all \(x \in \mathcal {X}^n\). Let \(T \subset \mathcal {Y}\) be such that \(d(y,y')>\alpha \), for all \(y, y' \in T\) with \(y \ne y'\). Assume that for all \(y \in T\) there exists \(x \in \mathcal {X}^n\) with \(q(x)=y\). Then

$$\begin{aligned} (1-\beta )\log |T| - \log 2 \le \xi \cdot n (1+\log n) + \rho \cdot n^2. \end{aligned}$$

In particular, if \(\xi =0\), we have

$$\begin{aligned} n \ge \sqrt{\frac{(1-\beta )\log |T|-\log 2}{\rho }}= \varOmega (\sqrt{\log |T|/\rho }). \end{aligned}$$

In the full version of this work, we combine these lemmas to prove Theorem 32.

8 Approximate zCDP

Recall our definition of approximate zCDP:

Definition 36

(Approximate zCDP). A randomised mechanism \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) is \(\delta \)-approximately \((\xi ,\rho )\)-zCDP if, for all \(x,x' \in \mathcal {X}^n\) differing on a single entry, there exist events \(E=E(M(x))\) and \(E'=E'(M(x'))\) such that, for all \(\alpha \in (1,\infty )\),

$$\begin{aligned} \mathrm {D}_{\alpha }\left( M(x)|_E\Vert M(x')|_{E'}\right) \le \xi + \rho \cdot \alpha ~~~~~ \text {and} ~~~~~ \mathrm {D}_{\alpha }\left( M(x')|_{E'}\Vert M(x)|_{E}\right) \le \xi + \rho \cdot \alpha \end{aligned}$$

and \({\mathop {\mathbb {P}}\limits _{M(x)}}\left[ E\right] \ge 1- \delta \) and \({\mathop {\mathbb {P}}\limits _{M(x')}}\left[ E'\right] \ge 1-\delta \).

Clearly 0-approximate zCDP is simply zCDP. Hence we have a generalization of zCDP. As we will show later in this section, \(\delta \)-approximate \((\varepsilon ,0)\)-zCDP is equivalent to \((\varepsilon ,\delta )\)-DP. Thus we have also generalized approximate DP. Hence, this definition unifies both relaxations of pure DP.

Approximate zCDP is a three-parameter definition which allows us to capture many different aspects of differential privacy. However, three parameters can be unwieldy. We believe that the one-parameter \(\rho \)-zCDP (or the two-parameter \(\delta \)-approximate \(\rho \)-zCDP, if necessary) suffices for most purposes.

It is easy to verify that the definition of approximate zCDP satisfies the usual composition and post-processing properties. However, the strong group privacy guarantees of Sect. 5 no longer apply to approximate zCDP and, hence, the strong lower bounds of Sect. 6 also no longer hold. Circumventing these lower bounds is part of the motivation for considering approximate zCDP.

In the full version of this work, we use techniques developed in [KOV15, MV16] to show that approximate DP can be converted to approximate zCDP.

Lemma 37

If \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfies \((\varepsilon ,\delta )\)-DP, then M satisfies \(\delta \)-approximate \((\varepsilon ,0)\)-zCDP, which, in turn, implies \(\delta \)-approximate \((0,\frac{1}{2} \varepsilon ^2)\)-zCDP.

Conversely, approximate zCDP also implies approximate DP. The following result sharpens Lemma 21.

Lemma 38

Suppose \(M : \mathcal {X}^n \rightarrow \mathcal {Y}\) satisfies \(\delta \)-approximate \((\xi ,\rho )\)-zCDP. If \(\rho =0\), then M satisfies \((\xi ,\delta )\)-DP. In general, M satisfies \((\varepsilon ,\delta +(1-\delta )\delta ')\)-DP for all \(\varepsilon \ge \xi +\rho \), where

$$\begin{aligned} \delta ' = e^{-(\varepsilon -\xi -\rho )^2/4\rho } \cdot \min \left\{ \begin{array}{l} 1 \\ \sqrt{\pi \cdot \rho } \\ \frac{1}{1+(\varepsilon -\xi -\rho )/2\rho } \\ \frac{2}{1+\frac{\varepsilon -\xi -\rho }{2\rho } + \sqrt{\left( 1+\frac{\varepsilon -\xi -\rho }{2\rho }\right) ^2+\frac{4}{\pi \rho }}} \end{array} \right. . \end{aligned}$$
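The minimum of the four expressions in Lemma 38 is straightforward to evaluate numerically; a sketch (helper name ours), which also confirms that every branch is at most the simpler Lemma 21-style bound \(e^{-(\varepsilon -\xi -\rho )^2/4\rho }\):

```python
import math

def delta_prime(eps, xi, rho):
    """The delta' of Lemma 38, valid for eps >= xi + rho and rho > 0."""
    assert eps >= xi + rho and rho > 0
    t = (eps - xi - rho) / (2 * rho)
    prefactor = math.exp(-(eps - xi - rho)**2 / (4 * rho))
    candidates = [
        1.0,
        math.sqrt(math.pi * rho),
        1.0 / (1.0 + t),
        2.0 / (1.0 + t + math.sqrt((1.0 + t)**2 + 4.0 / (math.pi * rho))),
    ]
    return prefactor * min(candidates)

# Lemma 38 sharpens Lemma 21: delta' never exceeds the bare exponential bound.
assert delta_prime(3.0, 0.0, 0.5) <= math.exp(-(3.0 - 0.5)**2 / (4 * 0.5))
```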

As a consequence, we can give a sharper version of the so-called advanced composition theorem [DRV10]. Note that the following results are subsumed by the bounds of Kairouz, Oh, and Viswanath [KOV15] and Murtagh and Vadhan [MV16]. However, our bounds extend to analyse the composition of mechanisms satisfying CDP with mechanisms satisfying approximate DP. We believe that such a “unified” analysis of composition will be useful.

Applying Lemmas 37 and 38 yields the following result.

Corollary 39

Let \(M_1, \cdots , M_k : \mathcal {X}^n \rightarrow \mathcal {Y}\) and let \(M : \mathcal {X}^n \rightarrow \mathcal {Y}^k\) be their composition. Suppose each \(M_i\) satisfies \((\varepsilon _i,\delta _i)\)-DP. Then M satisfies

$$\begin{aligned} \left( \frac{1}{2}\Vert \varepsilon \Vert _2^2+\sqrt{2}\lambda \Vert \varepsilon \Vert _2, \sqrt{\frac{\pi }{2}} \cdot \Vert \varepsilon \Vert _2 \cdot e^{-\lambda ^2} + \Vert \delta \Vert _1 \right) \text {-DP} \end{aligned}$$

for all \(\lambda \ge 0\). Alternatively M satisfies

$$\begin{aligned} \left( \frac{1}{2}\Vert \varepsilon \Vert _2^2+\sqrt{2\log (\sqrt{\pi /2} \cdot \Vert \varepsilon \Vert _2/\delta ')}\cdot \Vert \varepsilon \Vert _2,\delta ' + \Vert \delta \Vert _1 \right) \text {-DP} \end{aligned}$$

for all \(\delta ' \ge 0\).

In comparison to the composition theorem of [DRV10], we save modestly by a constant factor in the first term. Moreover, in most cases \(\sqrt{\pi /2} \cdot \Vert \varepsilon \Vert _2 < 1\), in which case the logarithmic term is also an improvement over the usual advanced composition theorem.
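The second form of Corollary 39 can be evaluated directly; the following Python sketch (function name ours) compares it against basic composition for k identical pure-DP mechanisms:

```python
import math

def compose_dp(eps_list, delta_list, delta_prime):
    """Composed (eps, delta) guarantee from the second form of Corollary 39.

    Assumes sqrt(pi/2) * ||eps||_2 / delta_prime > 1 so the log is positive.
    """
    e2 = math.sqrt(sum(e**2 for e in eps_list))  # ||eps||_2
    eps = (0.5 * e2**2
           + math.sqrt(2 * math.log(math.sqrt(math.pi / 2) * e2 / delta_prime)) * e2)
    delta = delta_prime + sum(delta_list)
    return eps, delta

# 100 mechanisms, each (0.1, 0)-DP: eps grows roughly like sqrt(k) * 0.1.
eps, delta = compose_dp([0.1] * 100, [0.0] * 100, 1e-6)
assert delta == 1e-6
assert eps < 100 * 0.1  # well below the basic-composition bound of 10
```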