1 Introduction

The Statistical Difference Problem, introduced by Sahai and Vadhan [SV03], is a central computational (promise) problem in complexity theory and cryptography, which is also intimately related to the study of statistical zero-knowledge (\({\mathsf {SZK}}\)). The input to this problem is a pair of circuits \(C_0\) and \(C_1\), specifying probability distributions (namely, the distributions induced by feeding each circuit a uniformly random input string). YES instances are those in which the statistical distanceFootnote 1 between the two distributions is at least 2/3, and NO instances are those in which the distance is at most 1/3. Input circuits that do not fall in one of these two cases are considered to be outside the promise (and so their value is left unspecified).

The choice of the constants 1/3 and 2/3 in the above definition is somewhat arbitrary (although not entirely arbitrary as will soon be discussed in detail). A more general family of problems can be obtained by considering a suitable parameterization. More specifically, let \(0 \le \beta < \alpha \le 1\). The \((\alpha ,\beta )\) parameterized version of the Statistical Difference Problem, denoted \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\), has as its YES inputs pairs of circuits that induce distributions that have distance at least \(\alpha \) whereas the NO inputs correspond to circuits that induce distributions that have distance at most \(\beta \).

Definition 1.1

Let \(\alpha ,\beta :\mathbb {N}\rightarrow [0,1]\) with \(\alpha (n)>\beta (n)\) for every n. The Statistical Difference Problem with promise \((\alpha ,\beta )\), denoted \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\), is given by the sets

$$\begin{aligned}&{{\,\mathrm{SDP}\,}}^{\alpha ,\beta }_{Y} = \big \{(C_0,C_1) \mid {\text {SD}}(C_0,C_1) \ge \alpha (n) \big \} \text { and}\\&{{\,\mathrm{SDP}\,}}^{\alpha ,\beta }_{N} = \big \{(C_0,C_1) \mid {\text {SD}}(C_0,C_1) \le \beta (n)\big \}, \end{aligned}$$

where n is the output length of the circuits \(C_0\) and \(C_1\).Footnote 2

(Here and below we abuse notation and use \(C_0\) and \(C_1\) to denote both the circuits and the respective distributions that they generate.)

The elegant polarization lemma of [SV03] shows how to polarize the statistical distance between two distributions. In more detail, for any constants \(\alpha \) and \(\beta \) such that \(\beta < \alpha ^2\), the lemma gives a transformation that makes distributions that are at least \(\alpha \)-far be extremely far and distributions that are \(\beta \)-close be extremely close. Beyond being of intrinsic interest, the polarization lemma is used to establish the \({\mathsf {SZK}}\) completeness of \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\), when \(\alpha ^2 > \beta \), and has other important applications in cryptography such as the amplification of weak public key encryption schemes to full fledged ones [DNR04, HR05].

Sahai and Vadhan left the question of polarization for parameters \(\alpha \) and \(\beta \) that do not meet the requirements of their polarization lemma as an open question. We refer to this setting of \(\alpha \) and \(\beta \) as the non-polarizing regime. We emphasize that by non-polarizing we merely mean that in this regime polarization is not currently known and not that it is impossible to achieve (although some barriers are known and will be discussed further below). The focus of this work is studying the \( \textsc {Statistical\,Difference\,Problem} \) in the non-polarizing regime.

1.1 Our Results

We proceed to describe our results.

1.1.1 Polarization and \({\mathsf {SZK}}\) Completeness for Other Notions of Distance

The statistical distance metric is one of the central information theoretic tools used in cryptography as it is very useful for capturing similarity between distributions. However, information theory offers other central notions that measure similarity between distributions, such as mutual information and KL divergence.

Loosely speaking, our first main result shows that polarization is possible even in some cases in which \(\beta \ge \alpha ^2\). However, this result actually stems from a more general study showing that polarization is possible for other notions of distance between distributions from information theory, which we find to be of independent interest.

When distributions are extremely similar or extremely dissimilar, these different notions of distance are often (but not always) closely related and hence interchangeable. This equivalence is particularly beneficial when considering applications of \({\mathsf {SZK}}\)—for some applications one distance measure may be easier to use than others. For example, showing that the average-case hardness of \({\mathsf {SZK}}\) implies one-way functions can be analyzed using statistical distance (e.g., [Vad99, Section 4.8]), but showing that every language in \({\mathsf {SZK}}\) has instance-dependent commitments is naturally analyzed using entropy (e.g., [OV08]).

However, as the gaps in the relevant distances get smaller (i.e., the distributions are only somewhat similar or dissimilar), the relation between different statistical properties becomes less clear (for example, the reduction from \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) to the Entropy Difference Problem of [GV99] only works when roughly \(\alpha ^2>\beta \)). This motivates studying the computational complexity of problems defined using different notions of distance in this small gap regime. Studying this question can be (and, as we shall soon see, indeed is) beneficial in two ways. First, it provides a wider bag of statistical properties related to \({\mathsf {SZK}}\), which can make certain applications easier to analyze. Second, the computational complexity of these distance notions might shed light on the computational complexity of problems involving existing distance notions (e.g., \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) when \(\alpha ^2<\beta \)).

We focus here on two specific distance notions—the triangular discrimination and the Jensen-Shannon divergence, defined next.

Definition 1.2

(Triangular Discrimination). The Triangular Discrimination (a.k.a. Le Cam divergence) between two distributions P and Q is defined as

$$\begin{aligned} {{\,\mathrm{TD}\,}}(P,Q) = \frac{1}{2}\sum _{y\in \mathcal {Y}}\frac{(P_y-Q_y)^2}{P_y+Q_y}, \end{aligned}$$

where \(\mathcal {Y}\) is the union of the supports of P and Q.

The Triangular Discrimination Problem with promise \((\alpha ,\beta )\), denoted \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\), is defined analogously to \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\), but with respect to \({{\,\mathrm{TD}\,}}\) rather than \({\text {SD}}\).

The triangular discrimination is commonly used, among many other applications, in statistical learning theory for parameter estimation with quadratic loss, see [Cam86, P. 48] (in a similar manner to how statistical distance characterizes the 0–1 loss function in hypothesis testing). Jumping ahead, while the definition of triangular discrimination seems somewhat arbitrary at first glance, in Sect. 2 we will show that this distance notion characterizes some basic phenomena in the study of statistical zero-knowledge. Triangular discrimination has recently found usage in theoretical computer science, and even specifically in problems related to \({\mathsf {SZK}}\). Yehudayoff [Yeh16] showed that using TD yields a tighter analysis of the pointer chasing problem in communication complexity. The work of Komargodski and Yogev [KY18] uses triangular discrimination to show that the average-case hardness of \({\mathsf {SZK}}\) implies the existence of distributional collision resistant hash functions.

Next, we define the Jensen-Shannon Divergence. To start with, recall that the KL-divergence between two distributions P and Q is definedFootnote 3 as \({{\,\mathrm{KL}\,}}(P||Q) = \sum _{y \in \mathcal {Y}} P_y \log (P_y/Q_y)\). Also, given distributions \(P_0\) and \(P_1\) we define the distribution \(\frac{1}{2} P_0+ \frac{1}{2} P_1\) as the distribution obtained by sampling a random coin \(b \in \{0,1\}\) and outputting a sample y from \(P_b\) (indeed, this notation corresponds to arithmetic operations on the probability mass functions). The Jensen-Shannon divergence measures the mutual information between b and y.

Definition 1.3

(Jensen-Shannon Divergence). The Jensen-Shannon divergence between two distributions P and Q is defined as

$$\begin{aligned} {{\,\mathrm{JS}\,}}(P,Q) = \frac{1}{2}{{\,\mathrm{KL}\,}}\Big (P\,\Big \Vert \,\frac{1}{2}P+\frac{1}{2}Q\Big )+\frac{1}{2}{{\,\mathrm{KL}\,}}\Big (Q\,\Big \Vert \,\frac{1}{2}P+\frac{1}{2}Q\Big ). \end{aligned}$$
The Jensen-Shannon Divergence Problem with promise \((\alpha ,\beta )\), denoted \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\), is defined analogously to \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\), but with respect to \({{\,\mathrm{JS}\,}}\) rather than \({\text {SD}}\).

The Jensen-Shannon divergence enjoys a couple of important properties (in our context) that the KL-divergence lacks: it is symmetric and bounded. Both triangular discrimination and Jensen-Shannon divergence (as well as statistical distance and KL-divergence) are types of f-divergences, a central concept in information theory (see [PW17, Section 6] and references therein). They are both non-negative and bounded by one.Footnote 4 Finally, the square root of the Jensen-Shannon divergence is a metric, and the triangular discrimination is likewise the square of a metric.
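
To make these notions concrete, the following self-contained Python sketch (ours; the distributions and values are purely illustrative) computes the statistical distance, triangular discrimination and Jensen-Shannon divergence of two explicit toy distributions, and checks the sandwich \({\text {SD}}^2\le {{\,\mathrm{TD}\,}}\le {\text {SD}}\) stated in Eq. (1.1) below.

```python
import math

def sd(p, q):
    """Statistical (total variation) distance: (1/2) * sum_y |P_y - Q_y|."""
    ys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(y, 0.0) - q.get(y, 0.0)) for y in ys)

def td(p, q):
    """Triangular discrimination: (1/2) * sum_y (P_y - Q_y)^2 / (P_y + Q_y)."""
    ys = set(p) | set(q)
    return 0.5 * sum((p.get(y, 0.0) - q.get(y, 0.0)) ** 2 / (p.get(y, 0.0) + q.get(y, 0.0))
                     for y in ys)

def kl(p, q):
    """KL divergence (base 2); assumes supp(p) is contained in supp(q)."""
    return sum(pv * math.log2(pv / q[y]) for y, pv in p.items() if pv > 0)

def js(p, q):
    """Jensen-Shannon divergence: average KL of P and Q to the mixture (P + Q)/2."""
    ys = set(p) | set(q)
    mix = {y: 0.5 * p.get(y, 0.0) + 0.5 * q.get(y, 0.0) for y in ys}
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

if __name__ == "__main__":
    P = {"a": 0.5, "b": 0.3, "c": 0.2}   # toy distributions
    Q = {"a": 0.2, "b": 0.3, "d": 0.5}
    print("SD =", sd(P, Q), " TD =", td(P, Q), " JS =", js(P, Q))
    assert sd(P, Q) ** 2 <= td(P, Q) <= sd(P, Q)   # the sandwich of Eq. (1.1) below
```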

With these notions of distance and corresponding computational problems in hand, we are almost ready to state our first set of results. Before doing so, we introduce an additional useful technical definition.

Definition 1.4

(Separated functions). Let \(g:\mathbb {N}\rightarrow [0,1]\). A pair of \({\mathsf {poly}}(n)\)-time computable functions \((\alpha ,\beta )\), where \(\alpha =\alpha (n) \in [0,1]\) and \(\beta = \beta (n) \in [0,1]\), is g-separated if \(\alpha (n)\ge \beta (n)+g(n)\) for every \(n\in \mathbb {N}\).

We denote by (1/poly)-separated the set of all pairs of functions that are (1/p)-separated for some polynomial p. Similarly, we denote by (1/log)-separated the set of all pairs of functions that are \((1/(c\log ))\)-separated for some constant \(c>0\).

We can now state our first set of results: that both \({{\,\mathrm{TDP}\,}}\) and \({{\,\mathrm{JSP}\,}}\), with a noticeable gap, are \({\mathsf {SZK}}\) complete.

Theorem 1.5

Let \((\alpha ,\beta )\) be \((1/{\mathsf {poly}})\)-separated functions for which there exists a constant \(\varepsilon \in (0,1/2)\) such that \(2^{-n^{1/2-\varepsilon }} \le \beta (n)\) and \(\alpha (n) \le 1-2^{-n^{1/2-\varepsilon }}\), for every \(n\in \mathbb {N}\). Then, \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\) is \({\mathsf {SZK}}\) complete.

Theorem 1.6

For \((\alpha ,\beta )\) as in Theorem 1.5, the problem \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\) is \({\mathsf {SZK}}\) complete.

The restriction on \(2^{-n^{1/2-\varepsilon }} \le \beta (n)\) and \(\alpha (n) \le 1-2^{-n^{1/2-\varepsilon }}\) should be interpreted as a non-degeneracy requirement (which we did not attempt to optimize), where we note that some restriction seems inherent. Moreover, we can actually decouple the assumptions in Theorems 1.5 and 1.6 as follows. To show that \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\) and \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\) are \({\mathsf {SZK}}\)-hard, only the non-degeneracy assumption (i.e., \(2^{-n^{1/2-\varepsilon }} \le \beta (n)\) and \(\alpha (n) \le 1-2^{-n^{1/2-\varepsilon }}\)) is needed. On the other hand, to show that these problems are in \({\mathsf {SZK}}\) we only require that \((\alpha ,\beta )\) are \((1/{\mathsf {poly}})\)-separated.

Note that in particular, Theorems 1.5 and 1.6 imply polarization lemmas for both \({{\,\mathrm{TD}\,}}\) and \({{\,\mathrm{JS}\,}}\). For example, for triangular discrimination, since \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta } \in {\mathsf {SZK}}\) and \({{\,\mathrm{TDP}\,}}^{1-2^{-k},2^{-k}}\) is \({\mathsf {SZK}}\)-hard, one can reduce the former to the latter.

Beyond showing polarization for triangular discrimination, Theorem 1.5 has implications regarding the question of polarizing statistical distance, which was our original motivation. It is known that the triangular discrimination is sandwiched between the statistical distance and its square; namely, for every two distributions P and Q it holds that (see [Top00, Eq. (11)]):

$$\begin{aligned} {\text {SD}}(P,Q)^2\le {{\,\mathrm{TD}\,}}(P,Q)\le {\text {SD}}(P,Q) \end{aligned}$$
(1.1)

Thus, the problem \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) is immediately reducible to \({{\,\mathrm{TDP}\,}}^{\alpha ^2,\beta }\), which Theorem 1.5 shows to be \({\mathsf {SZK}}\)-complete, as long as the gap between \(\alpha ^2\) and \(\beta \) is noticeable. Specifically, we have the following corollary.

Corollary 1.7

Let \((\alpha ,\beta )\) be as in Theorem 1.5, with the exception that \((\alpha ^2,\beta )\) are \((1/{\mathsf {poly}})\)-separated (note that here \(\alpha \) is squared). Then, the promise problem \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) is \({\mathsf {SZK}}\) complete.

We highlight two implications of Theorem 1.5 and Corollary 1.7 (which were also briefly mentioned above).

Polarization with Inverse Polynomial Gap. Observe that Corollary 1.7 implies polarization of statistical distance in a regime in which \(\alpha \) and \(\beta \) are functions of n, the output length of the two circuits, and \(\alpha ^2\) and \(\beta \) are only separated by an inverse polynomial. This is in contrast to most prior works which focus on \(\alpha \) and \(\beta \) that are constants. In particular, Sahai and Vadhan’s [SV03] proof of the polarization lemma focuses on constant \(\alpha \) and \(\beta \) and can be extended to handle an inverse logarithmic gap, but does not seem to extend to an inverse polynomial gap.Footnote 5 Corollary 1.7 does yield such a result, by relying on a somewhat different approach.

Polarization Beyond \(\alpha ^2 > \beta \). Theorem 1.5 can sometimes go beyond the requirement that \(\alpha ^2>\beta \) for polarizing statistical distance. Specifically, it shows that any problem with a noticeable gap in the triangular discrimination can be polarized. Indeed, there are distributions \((P,Q)\) and \((P',Q')\) with \({\text {SD}}(P,Q)> {\text {SD}}(P',Q') > {\text {SD}}(P,Q)^2\) but still \({{\,\mathrm{TD}\,}}(P,Q) > {{\,\mathrm{TD}\,}}(P',Q')\).Footnote 6 Circuits generating such distributions were until now not known to be in the polarizing regime, but can now be polarized by combining Theorem 1.5 and Eq. (1.1).

1.1.2 From Statistical Difference to One-Way Functions

We continue our study of the Statistical Difference Problem, focusing on the regime where \(\beta < \alpha \) (and in particular even when \(\beta \ge \alpha ^2\)). We show that in this regime the \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) problem shares many important properties of \({\mathsf {SZK}}\) (although we fall short of actually showing that it lies in \({\mathsf {SZK}}\)—which is equivalent to polarization for any \(\beta <\alpha \)).

First, we show that similarly to \({\mathsf {SZK}}\), the average-case hardness of \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) implies the existence of one-way functions. The fact that average-case hardness of \({\mathsf {SZK}}\) (or equivalently \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) for \(\beta < \alpha ^2\)) implies the existence of one-way functions was shown by Ostrovsky [Ost91]. Indeed, our contribution is in showing that the weaker condition of \(\beta <\alpha \) (rather than \(\beta < \alpha ^2\)) suffices for this result.

Theorem 1.8

Let \((\alpha ,\beta )\) be \((1/{\mathsf {poly}})\)-separated functions. Suppose that \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) is average-case hard. Then, there exists a one-way function.

The question of constructing one-way functions from the (average-case) hardness of \({{\,\mathrm{SDP}\,}}\) is closely related to a result of Goldreich [Gol90] showing that the existence of efficiently sampleable distributions that are statistically far but computationally indistinguishable implies the existence of one-way functions. Our proof of Theorem 1.8 allows us to re-derive the following strengthening of [Gol90], due to Naor and Rothblum [NR06, Theorem 4.1]: for any \((1/{\mathsf {poly}})\)-separated \((\alpha ,\beta )\), the existence of efficiently sampleable distributions whose statistical distance is \(\alpha \), but which no efficient algorithm can distinguish with advantage more than \(\beta \), implies the existence of one-way functions. See further discussion in Remark 2.1.

1.1.3 Interactive Proof for Statistical Distance Approximation

As our last main result, we construct a new interactive protocol that lets a verifier estimate the statistical distance between two given circuits up to any noticeable precision.

Theorem 1.9

There exists a constant-round public-coin interactive protocol between a prover and a verifier that, given as input a pair of circuits \((C_0,C_1)\), a claim \(\varDelta \in [0,1]\) for their statistical distance, and a tolerance parameter \(\delta \in [0,1]\), satisfies the following properties:

  • Completeness: If \({\text {SD}}(C_0,C_1) = \varDelta \), then the verifier accepts with probability at least 2/3 when interacting with the honest prover.

  • Soundness: If \(\left| {\text {SD}}(C_0,C_1) - \varDelta \right| \ge \delta \), then when interacting with any (possibly cheating) prover, the verifier accepts with probability at most 1/3.

  • Efficiency: The verifier runs in time \({\mathsf {poly}}(|C_0|,|C_1|,1/\delta )\).

(As usual the completeness and soundness errors can be reduced by applying parallel repetition. We can also achieve perfect completeness using a result from [FGM+89].)

Theorem 1.9 is actually equivalent to the following statement.

Theorem 1.10

([BL13, Theorem 6], [BBF16, Theorem 2]). For any \((\alpha ,\beta )\) that are \((1/{\mathsf {poly}})\)-separated, it holds that \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta } \in {\mathsf {AM}}\cap {\mathsf {coAM}}\).Footnote 7

It is believed that \({\mathsf {AM}}\cap {\mathsf {coAM}}\) lies just above \({\mathsf {SZK}}\), and if we could show that \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) is in \({\mathsf {SZK}}\), that would imply \({\text {SD}}\) polarization for such \(\alpha \) and \(\beta \).

Since Theorem 1.9 can be derived from existing results in the literature, we view our main contribution to be the proof itself, which is via a single protocol that we find cleaner and more direct than the alternative approaches.

Going into a bit more detail, [BL13, BBF16]’s proofs are in fact a combination of two separate constant-round protocols. The first protocol is meant to show that \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta } \in {\mathsf {AM}}\) and follows directly by taking the interactive proof for \({{\,\mathrm{SDP}\,}}\) presented by Sahai and Vadhan (which has completeness error \((1-\alpha )/2\) and soundness error \((1+\beta )/2\)), and applying parallel repetition (and the private-coin to public-coin transformation of [GS89]).

The second protocol is meant to show that \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta } \in {\mathsf {coAM}}\), and is based on a protocol by Bhatnagar, Bogdanov, and Mossel [BBM11]. Another approach for proving that \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta } \in {\mathsf {coAM}}\) is by combining results of [GVW02] and [SV03]. Goldreich, Vadhan and Wigderson [GVW02] showed that problems with laconic interactive proofs, that is proofs where the communication from the prover to the verifier is small, have \( {\mathsf {co}}{\mathsf {AM}}\) proofs. Sahai and Vadhan [SV03], as described earlier, showed that \( {{\,\mathrm{SDP}\,}}^{\alpha , \beta } \), and \({\mathsf {SZK}}\) in general, has an interactive proof where the prover communicates a single bit. Combining these results immediately gives a \( {\mathsf {co}}{\mathsf {AM}}\) protocol for \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) when \((\alpha ,\beta )\) are \(\varOmega (1)\)-separated. As for \((\alpha ,\beta )\) that are only \((1/{\mathsf {poly}})\)-separated, while the [GVW02] result as-stated does not suffice, it seems that their protocol can be adapted to handle this case as well.Footnote 8

As mentioned above, we give a different, and direct, proof of Theorem 1.9 that we find to be simpler and more natural than the above approach. In particular, our proof utilizes the techniques developed for our other results, which enable us to give a single and more general protocol—one that approximates the statistical difference (as in Theorem 1.9), rather than just deciding if that distance is large or small.

At a very high level, our protocol may be viewed as an application of the set-lower-bound-based techniques of Akavia et al. [AGGM06] or Bogdanov and Brzuska [BB15] to our construction of a one-way function from the average-case hardness of \({{\,\mathrm{SDP}\,}}\) (i.e., Theorem 1.8), though there are technical differences in our setting. Both these papers show how to construct a \({\mathsf {coAM}}\) protocol for any language that can be reduced to inverting a size-verifiable one-way function.Footnote 9 While we do not know how to reduce solving \({{\,\mathrm{SDP}\,}}\) in the worst case to inverting any specific function, we make use of the fact that associated with each instance of \({{\,\mathrm{SDP}\,}}\) there is an instance-dependent function [OW93] that is size-verifiable on average.

1.2 Additional Related Works

Barriers to Improved Polarization. Holenstein and Renner [HR05] show that in a limited model dubbed “oblivious polarization”, the condition \( \alpha ^2 > \beta \) on the statistical distance is necessary for polarizing statistical distance.Footnote 10 All past polarization reductions, including ours, fit within this framework. Specifically, Holenstein and Renner exhibit distributions with \( \alpha ^2 < \beta \) that cannot be polarized in this model. We show a condition that suffices for polarization, even for distributions where \( \alpha ^2 \le \beta \). This does not contradict the [HR05] result because their distributions do not satisfy this condition.

In a more general model, [LZ17, CGVZ18] showed lower bounds for \({\mathsf {SZK}}\)-related distribution manipulation tasks. The model they consider allows the reduction arbitrary oracle access to the circuits that sample the distributions, as opposed to the more restricted model of oblivious polarization. In this model, Lovett and Zhang [LZ17] show that efficient entropy reversal is impossibleFootnote 11, and Chen, Göös, Vadhan and Zhang [CGVZ18] showed that entropy flattening requires \( \varOmega (n^2) \) invocations to the underlying circuit. Showing lower bounds for polarization in this more general model remains an interesting open question.

Polarization for Other Notions of Distance. In the process of characterizing zero-knowledge in the help model, Ben-Or and Gutfreund [BG03] and Chailloux et al. [CCKV08] gave a polarization procedure that considers two different distances for every \((1/{\mathsf {log}})\)-separated \(\alpha >\beta \): if the statistical distance is at most \(\beta \), then it decreases to \(2^{-k}\); and if the mutual disjointnessFootnote 12 is at least \(\alpha \), then it increases to \(1-2^{-k}\). Fehr and Vaudenay [FV17] raise the question of polarization for the fidelity measureFootnote 13 but leave resolving it as an open problem (see Sect. 2.3.3 for details).

\({{\,\mathrm{SDP}\,}}\) and Cryptography. We show that average-case hardness of \( {{\,\mathrm{SDP}\,}}^{\alpha , \beta } \) implies one-way functions. In the reverse direction, Bitansky et al. [BDV17] show that one-way functions do not imply even worst-case hardness of \( {{\,\mathrm{SDP}\,}}^{\alpha , \beta } \) in a black-box manner for any \( (1/{\mathsf {poly}})\)-separated \( \alpha , \beta \).Footnote 14

2 Techniques

We begin in Sect. 2.1 by describing how to construct a one-way function from the average-case hardness of \({\text {SD}}\) with any noticeable gap (Theorem 1.8). The techniques used there are also central in our interactive protocol for \({\text {SD}}\) estimation (Theorem 1.9), which is described in Sect. 2.2, as well as in our proof that triangular discrimination and Jensen-Shannon divergence are \({\mathsf {SZK}}\) complete (Theorems 1.5 and 1.6), which are outlined in Sect. 2.3 below.

2.1 One-Way Function from Statistical Difference with Any Noticeable Gap

We first show the existence of a distributionally one-way function: an efficiently computable function f for which it is hard to sample a uniformly random pre-image of a random output y (rather than an arbitrary pre-image, as for a standard one-way function). This suffices since Impagliazzo and Luby [IL89] showed how to convert a distributionally one-way function into a standard one.

Assume that we are given a distribution over a pair of circuits \((C_0,C_1)\) such that it is hard to distinguish between the cases \({\text {SD}}(C_0,C_1)\ge \alpha \) and \({\text {SD}}(C_0,C_1)\le \beta \), for some \(\alpha >\beta +1/{\mathsf {poly}}\). A natural candidate for a one-way function is the (efficiently computable) function

$$\begin{aligned} f_{C_0,C_1}(b,x) = C_b(x). \end{aligned}$$
(2.1)

Namely, f is parameterized by the circuits \((C_0,C_1)\) (which are to be sampled according to the hard distribution), and the bit b chooses which of the two circuits would be evaluated on the string x. This function appears throughout the \({\mathsf {SZK}}\) literature (e.g., it corresponds to the verifier’s message in the \({{\,\mathrm{SDP}\,}}\) protocol of [SV03]).

Assume that f is not distributionally one-way, and let \(\mathsf {A}\) be an algorithm that given \((C_0,C_1)\) and a random input y—sampled by first drawing a uniformly random bit b and a string x and then computing \(y=C_b(x)\)—outputs a uniformly random element \((b',x')\) from the set \(f^{-1}_{C_0,C_1}(y)\). For simplicity, we assume that \(\mathsf {A}\) is a perfect distributional inverter, that is, for every fixed \((C_0, C_1, y)\) it outputs uniformly random elements of \( f^{-1}_{C_0, C_1}(y) \).

Arguably, the most natural approach for distinguishing between the cases of high or low statistical distance given the two circuits and the inverter, is to choose x and b at random, invoke the inverter on \(y=C_b(x)\) to obtain \((b',x')\), and check whether \(b=b'\). Indeed, if \({\text {SD}}(C_0,C_1)=1\), then \(\Pr \left[ b'=b\right] =1\), and if \({\text {SD}}(C_0,C_1)=0\), then \(\Pr \left[ b'=b\right] =1/2\). Thus, we can distinguish between the cases with constant advantage.

But what happens when the gap in the statistical distance is smaller? To analyze this case we want to better understand the quantity \(\Pr \left[ b'=b\right] \). It turns out that this quantity is characterized by the triangular discrimination between the circuits. Let \(P_b\) denote the output distribution of \(C_b\). Using elementary manipulations (and the fact that \(\frac{1}{2}(P_0+P_1)\) is a distribution), it holds thatFootnote 15

$$\begin{aligned} \Pr \left[ b'=b\right] = \frac{1}{2}+\frac{{{\,\mathrm{TD}\,}}(C_0,C_1)}{2}. \end{aligned}$$
(2.2)

Based on the general bounds between triangular discrimination and statistical distance (Eq. (1.1)), which are known to be tight, all we are guaranteed is

$$\begin{aligned} \frac{1}{2}+\frac{{\text {SD}}(C_0,C_1)^2}{2}\le \Pr \left[ b'=b\right] \le \frac{1}{2}+\frac{{\text {SD}}(C_0,C_1)}{2}. \end{aligned}$$

So, this approach is limited to settings in which \(\alpha ^2>\beta \).

To overcome this limitation we want to find a quantity that is more tightly characterized by the statistical distance of the circuits. This quantity, which we call imbalance, will be central in all of the proofs in this work. The imbalance measures how likely it is that an output string y was generated from \(C_1\) versus \(C_0\). Formally,

$$\begin{aligned} \theta _y = \Pr \left[ b=1\mid y\right] -\Pr \left[ b=0\mid y\right] = \frac{P_1(y)-P_0(y)}{P_1(y)+P_0(y)}. \end{aligned}$$
(2.3)

Elementary manipulations yield that

$$\begin{aligned} {\text {SD}}(C_0,C_1) = \mathop {\mathbb {E}}_{y\sim \frac{1}{2}P_0+\frac{1}{2}P_1}\left[ \left| \theta _y\right| \right] . \end{aligned}$$
(2.4)

(Recall that y is sampled by first drawing a uniformly random bit b and a string x, and setting \(y=C_b(x)\). Hence, using the notation that \(P_b\) denotes the output distribution of the circuit \(C_b\), the marginal distribution of y is \(\frac{1}{2}P_0+\frac{1}{2}P_1\).)
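
As a sanity check of the two displays above, the following toy Python snippet (ours; the distributions are illustrative) computes \(\theta _y\) for every output y of two explicit distributions and verifies that the expectation of \(|\theta _y|\) under the marginal \(\frac{1}{2}P_0+\frac{1}{2}P_1\) equals the statistical distance.

```python
P0 = {"a": 0.50, "b": 0.30, "c": 0.20}   # output distribution of C_0 (illustrative)
P1 = {"a": 0.20, "b": 0.30, "d": 0.50}   # output distribution of C_1 (illustrative)
ys = set(P0) | set(P1)

sd = 0.5 * sum(abs(P0.get(y, 0.0) - P1.get(y, 0.0)) for y in ys)

expected_abs_theta = 0.0
for y in ys:
    p0, p1 = P0.get(y, 0.0), P1.get(y, 0.0)
    theta_y = (p1 - p0) / (p1 + p0)                       # the imbalance, Eq. (2.3)
    expected_abs_theta += 0.5 * (p0 + p1) * abs(theta_y)  # y is drawn from (P_0 + P_1)/2

print(sd, expected_abs_theta)   # the two values agree, as in Eq. (2.4)
```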

Equation (2.4) naturally gives rise to the following algorithm for approximating \({\text {SD}}(C_0,C_1)\):

[Algorithm (informal summary): draw \(y_1,\dots ,y_t\), each by sampling a random bit b and a random string x and setting \(y_i=C_b(x)\); for each \(y_i\), invoke \(\mathsf {A}(y_i)\) k times, let \(p_1\) and \(p_0\) denote the fractions of ones and zeros among the returned bits \(b'\), and set \(\widehat{\theta }_i=p_1-p_0\); output \(\frac{1}{t}\sum _{i=1}^t|{\widehat{\theta }_i}|\) as the estimate of \({\text {SD}}(C_0,C_1)\).]

The quantities \(p_1\) and \(p_0\) are in fact the empirical distribution of b conditioned on y, computed using k samples. By choosing a large enough k, we get that \(p_1\approx \Pr [b=1\mid y_i]\) and \(p_0\approx \Pr [b=0\mid y_i]\), and so \(\widehat{\theta }_i\approx \theta _{y_i}\). By then choosing a large enough t, we get that \(\frac{1}{t}\sum _{i=1}^t|{\widehat{\theta }_i}|\approx {\text {SD}}(C_0,C_1)\). Hence, we can distinguish between the cases \({\text {SD}}(C_0,C_1)\ge \alpha \) and \({\text {SD}}(C_0,C_1)\le \beta \), for any \(\alpha >\beta +1/{\mathsf {poly}}\).
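
The following toy Python sketch (ours) mimics this algorithm: the circuits are replaced by explicit output distributions, and the perfect distributional inverter is simulated directly by sampling the bit \(b'\) with probability proportional to \(P_{b'}(y)\); all names and parameter values are illustrative.

```python
import random

P0 = {"a": 0.50, "b": 0.30, "c": 0.20}   # output distribution of C_0 (illustrative)
P1 = {"a": 0.20, "b": 0.30, "d": 0.50}   # output distribution of C_1 (illustrative)

def sample(dist):
    """Draws one element according to a distribution given as a dict of probabilities."""
    r, acc = random.random(), 0.0
    for y, p in dist.items():
        acc += p
        if r <= acc:
            return y
    return y  # guard against floating-point rounding

def inverter_bit(y):
    """Simulates the bit b' returned by a perfect distributional inverter on input y."""
    p0, p1 = P0.get(y, 0.0), P1.get(y, 0.0)
    return 1 if random.random() < p1 / (p0 + p1) else 0

def estimate_sd(t=2000, k=200):
    """Estimates theta_{y_i} empirically for t random y's and averages the absolute values."""
    total = 0.0
    for _ in range(t):
        b = random.randint(0, 1)            # y_i is sampled as C_b(x) for random b and x
        y = sample(P1 if b else P0)
        ones = sum(inverter_bit(y) for _ in range(k))
        theta_hat = (2 * ones - k) / k      # empirical p_1 - p_0
        total += abs(theta_hat)
    return total / t                        # for finite k this slightly overestimates near theta_y = 0

if __name__ == "__main__":
    exact = 0.5 * sum(abs(P0.get(y, 0.0) - P1.get(y, 0.0)) for y in set(P0) | set(P1))
    print("estimate:", estimate_sd(), " exact SD:", exact)
```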

Essentially the same proof continues to work if \(\mathsf {A}\) is not a perfect distributional inverter, but is close enough to being so—that is, on input y its output distribution is close to being uniform over \(f^{-1}(y)\) for most (but not all) tuples \(C_0,C_1,y\).

The above proof strategy also yields a new proof for the strengthening of [Gol90] by Naor and Rothblum [NR06].Footnote 16 See Remark 2.1 below for a discussion of the differences between our techniques and those of [NR06].

Distributional Collision Resistant Hash Function. As a matter of fact, the above proof also shows that the average-case hardness of \( {{\,\mathrm{SDP}\,}}^{\alpha , \beta } \) implies that the function \( f_{C_0, C_1}(b,x) = C_b(x) \) is a distributional k-multi-collisionFootnote 17 resistant hash function, for \( k = O \left( \frac{\log n}{(\alpha - \beta )^2} \right) \). That is, for a random output y of f, it is difficult to find k random preimages of y. This is because access to such a set of k random pre-images of random \(y_i\)’s is all we use the inverter \(\mathsf {A}\) for in the above reduction, and it could handily be replaced with a distributional k-multi-collision finder.

Remark 2.1

(Comparison to [NR06]). Naor and Rothblum’s proof implicitly attempts to approximate the maximal likelihood bit of y; that is, the bit \(b_{ml}=b_{ml}(y)\) such that \(\Pr \left[ b=b_{ml}\mid y\right] \ge \Pr \left[ b=1-b_{ml}\mid y\right] \) (breaking ties arbitrarily). Indeed, the maximal likelihood bit, as shown by [SV03], is closely related to the statistical distance:

$$\begin{aligned} \Pr _{(b,y)}\left[ b=b_{ml}(y)\right] = \frac{1}{2}+\frac{{\text {SD}}(C_0,C_1)}{2}. \end{aligned}$$
(2.5)

To approximate \(b_{ml}\), [NR06] make, like us, many calls to \(\mathsf {A}(y)\), and take the majority of the answered bits. The idea is that when the statistical distance is large, the majority is likely to be \(b_{ml}\), and when the statistical distance is small, the majority is equally likely to be \(b_{ml}\) or \(1-b_{ml}\).

To formally prove this intuition, it must hold that if \({\text {SD}}(C_0,C_1)\) is large, then the probability that \(\mathsf {A}(y)\) answers with \(b_{ml}(y)\) is sufficiently large; putting this in our terminology and using Eq. (2.4), if \({\text {SD}}(C_0,C_1)\) is sufficiently large, then \(\left| \theta _y\right| \) should be large for a random y (and the opposite should hold if \({\text {SD}}(C_0,C_1)\) is small). While these statements are true, proving them requires some work, which makes [NR06]’s analysis more complicated.

We manage to avoid such complications by using the imbalance \(\theta _y\) and its characterization of statistical distance (Eq. 2.4). Furthermore, [NR06]’s approach only attempts to distinguish between the cases when \({\text {SD}}(C_0,C_1)\) is high or low, while our approach generalizes to approximate \({\text {SD}}(C_0,C_1)\). Lastly, Naor and Rothblum do not construct one-way functions based on the average-case hardness of \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) with any noticeable gap as we do. Using their technique to do so seems to require additional work—work that our analysis significantly simplifies.

2.2 Interactive Proof for Statistical Distance Approximation

We proceed to describe a constant-round public-coin protocol in which a computationally unbounded prover convinces a computationally bounded verifier that the statistical difference of a given pair of circuits is what the prover claims it to be, up to any inverse polynomial (additive) error. Such a protocol simultaneously establishes the inclusion of \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) in both \({\mathsf {AM}}\) and \({\mathsf {coAM}}\) for any \(\alpha >\beta +1/{\mathsf {poly}}\).

Our starting point is the algorithm we described above that used a one-way function inverter to estimate the statistical distance. Specifically, that algorithm used the inverter to estimate \(\theta _y\) for random y’s, and then applied Eq. (2.4). We would like to use the prover, instead of the inverter, to achieve the same task.

In our protocol, the verifier draws polynomially many y’s and sends them to the prover. The prover responds with values \(\widehat{\theta }_i\)’s, which it claims are the genuine \(\theta _{y_i}\)’s. But how can the verifier trust that the prover sent the correct values? In the reduction in Sect. 2.1, we used k samples of b conditioned on y to estimate b’s true distribution. A standard concentration bound shows that as k grows, the number of ones among \(b_1,\dots ,b_k\), all sampled from (b|y), is very close to \(k\cdot \Pr [b=1\mid y]\). Similarly, the number of zeros is very close to \(k\cdot \Pr [b=0\mid y]\). Consider the following typical set for any fixed y and arbitrary value \(\theta \):

$$\begin{aligned} \mathcal {T}^{k,\theta }_y=\left\{ \begin{array}{c|c}(b_1,x_1,b_2,x_2,\dots ,b_k,x_k)&{} \begin{array}{c} C_{b_i}(x_i)=y \text { for all { i},} \\ \text {and }\frac{\sum _{i=1}^kb_i -\sum _{i=1}^k(1-b_i)}{k}\approx \theta \end{array}\end{array}\right\} . \end{aligned}$$

Namely, \(\mathcal {T}^{k,\theta }_y\) contains every k-tuple of \((b_i,x_i)\) such that all map to y, and each tuple can be used to estimate \(\theta \) well—the difference between the number of ones and the number of zeros, normalized by k, is close to \(\theta \). Also consider the pre-image set of y: \(\mathcal {I}_y=\left\{ (b,x) \mid C_b(x)=y\right\} \). Since the estimation of \(\theta _y\) improves as k grows, we expect \(\mathcal {T}^{k,\theta _y}_y\)—the typical set of y with the value \(\theta _y\)—to contain almost all tuples. Indeed, standard concentration bounds show that

$$\begin{aligned} \frac{\left| \mathcal {T}^{k,\theta _y}_y\right| }{\left| \mathcal {I}_y\right| ^k}\ge 1-e^{-\varOmega (k)}. \end{aligned}$$
(2.6)

On the other hand, the sets \(\mathcal {T}^{k,\theta '}_y\), corresponding to values \(\theta '\) that are far from \(\theta _y\), should be almost empty. Indeed, if \(\left| \theta '-\theta _y\right| \ge \varOmega (1)\), then,

$$\begin{aligned} \frac{\left| \mathcal {T}^{k,\theta '}_y\right| }{\left| \mathcal {I}_y\right| ^k}\le e^{-\varOmega (k)}. \end{aligned}$$
(2.7)
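
To get a feel for Eqs. (2.6) and (2.7), the following small Python computation (ours; the parameters are illustrative) evaluates, for a fixed y, the exact fraction of k-tuples in \(\mathcal {I}_y^k\) whose empirical imbalance lands near a given value \(\theta '\): since each coordinate has \(b_i=1\) with probability \(\frac{1+\theta _y}{2}\), this fraction is a binomial tail.

```python
from math import comb

def typical_fraction(theta_y, theta_prime, k, eps=0.1):
    """Exact fraction of k-tuples in I_y^k whose normalized ones-minus-zeros count
    lies within eps of theta_prime (each coordinate has b_i = 1 w.p. (1 + theta_y)/2)."""
    p = (1 + theta_y) / 2
    return sum(comb(k, ones) * p ** ones * (1 - p) ** (k - ones)
               for ones in range(k + 1)
               if abs((2 * ones - k) / k - theta_prime) <= eps)

k, theta_y = 500, 0.4                                  # illustrative parameters
print(typical_fraction(theta_y, theta_y, k))           # close to 1, as in Eq. (2.6)
print(typical_fraction(theta_y, theta_y - 0.5, k))     # exponentially small, as in Eq. (2.7)
```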

So, for the verifier to be convinced that the value \(\widehat{\theta }\) sent by the prover is close to \(\theta _y\), the prover can prove that the typical set \(\mathcal {T}^{k,\widehat{\theta }}_y\) is large. To do so, the parties will use the public-coin constant round protocol for set lower-bound of [GS89], which enables the prover to assert statements of the form “the size of the set \(\mathcal {S}\) is at least s”.

However, there is still one hurdle to overcome. The typical set \(\mathcal {T}^{k,\theta _y}_y\) is only large relative to \(\left| \mathcal {I}_y\right| ^k\). Since we do not know how to compute \(|\mathcal {I}_y|\), it is unclear what the size s that we run the set lower-bound protocol with should be. Our approach for bypassing this issue is as follows. First observe that the expected value, over a random y, of the logarithm of the size of \(\mathcal {I}_y\) is the entropyFootnote 18 of (b, x) given y. Namely,

$$\begin{aligned} \mathop {\mathbb {E}}_{y\sim Y}\left[ \log \left| \mathcal {I}_y\right| \right] = {{\,\mathrm{H}\,}}(B,X|Y), \end{aligned}$$
(2.8)

where the jointly distributed random variables (B, X, Y) take the values of a randomly drawn (b, x, y). Thus, if we draw t independent elements \(y_1,\dots ,y_t\), the sum \(\sum _{i=1}^t\log \left| \mathcal {I}_{y_i}\right| \) gets closer to \(t\cdot {{\,\mathrm{H}\,}}(B,X|Y)\) as t grows. Specifically,

(2.9)

where n denotes the output length of the given circuits. For large enough t, we can thus assume that the size of the product set \(\mathcal {I}_{y_1}\times \cdots \times \mathcal {I}_{y_t}\) is approximately \(2^{t\cdot {{\,\mathrm{H}\,}}(B,X|Y)}\), and run the set lower bound protocol for all the \(y_i\)’s together. That is, we ask the prover to send t estimates \((\widehat{\theta }_1, \dots , \widehat{\theta }_t)\) for the values \((\theta _{y_1},\dots ,\theta _{y_t})\), and prove that the size of the product set \(\mathcal {T}^{k,\widehat{\theta }_1}_{y_1} \times \cdots \times \mathcal {T}^{k,\widehat{\theta }_t}_{y_t}\) is almost \(2^{k\cdot t\cdot {{\,\mathrm{H}\,}}(B,X|Y)}\).

So far we have reduced knowing the size of \(\mathcal {I}_y\) to knowing \({{\,\mathrm{H}\,}}(B,X|Y)\), but again it seems difficult for the verifier to compute this quantity on its own. Actually, standard entropy manipulations show that

$$\begin{aligned} {{\,\mathrm{H}\,}}(B,X|Y) = (m+1)-{{\,\mathrm{H}\,}}(Y), \end{aligned}$$

where m denotes the input length of the given circuits. It thus suffices to approximate \({{\,\mathrm{H}\,}}(Y)\). Recall that y is the output of the circuit that maps (b, x) to \(C_b(x)\), so Y is drawn according to an output distribution of a known circuit. Luckily, Goldreich, Sahai and Vadhan [GSV99] showed that approximating the output entropy of a given circuit is in \({\mathsf {NISZK}}\), and thus has a constant-round public-coin protocol (since \({\mathsf {NISZK}}\subseteq {\mathsf {AM}}\cap {\mathsf {coAM}}\)).
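
The entropy identity above is easy to verify on a toy example; the following Python snippet (ours; the two "circuits" are arbitrary illustrative functions on 3-bit inputs) computes \({{\,\mathrm{H}\,}}(B,X|Y)\) directly from the fibers \(\mathcal {I}_y\) and compares it to \((m+1)-{{\,\mathrm{H}\,}}(Y)\).

```python
import math
from collections import Counter
from itertools import product

m = 3
def C0(x):  # hypothetical toy "circuit": parity of the input and its first bit
    return (x[0] ^ x[1] ^ x[2], x[0])
def C1(x):  # another hypothetical toy "circuit"
    return (x[0] & x[1], x[2])

triples = [(b, x, (C0 if b == 0 else C1)(x))
           for b in (0, 1) for x in product((0, 1), repeat=m)]
N = len(triples)                               # 2^(m+1) equally likely (b, x) pairs

counts = Counter(y for _, _, y in triples)     # |I_y| for every attainable output y
H_Y = -sum((c / N) * math.log2(c / N) for c in counts.values())
# Given Y = y, the pair (B, X) is uniform over I_y, so H(B, X | Y) = E_y[log |I_y|].
H_BX_given_Y = sum((c / N) * math.log2(c) for c in counts.values())
print(H_BX_given_Y, (m + 1) - H_Y)             # the two values coincide
```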

To conclude, we describe the entirety of our protocol, which proves Theorem 1.9.

[Protocol sketch (summarizing the steps described above): (1) the parties run the [GSV99]-based protocol to approximate \({{\,\mathrm{H}\,}}(Y)\), and hence \({{\,\mathrm{H}\,}}(B,X|Y)=(m+1)-{{\,\mathrm{H}\,}}(Y)\); (2) the verifier samples \(y_1,\dots ,y_t\), each by drawing a random (b, x) and setting \(y_i=C_b(x)\), and sends them to the prover; (3) the prover replies with estimates \(\widehat{\theta }_1,\dots ,\widehat{\theta }_t\); (4) the parties run the [GS89] set lower-bound protocol to verify that the product set \(\mathcal {T}^{k,\widehat{\theta }_1}_{y_1}\times \cdots \times \mathcal {T}^{k,\widehat{\theta }_t}_{y_t}\) is of the expected size; (5) the verifier accepts if and only if \(\frac{1}{t}\sum _{i=1}^t|\widehat{\theta }_i|\) is sufficiently close to the claimed value \(\varDelta \).]

2.3 \({{\,\mathrm{TDP}\,}}\) and \({{\,\mathrm{JSP}\,}}\) Are \({\mathsf {SZK}}\)-Complete

We show that both \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\) and \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\) with \(\alpha >\beta +1/{\mathsf {poly}}\) are \({\mathsf {SZK}}\)-complete. Since the proof of the former uses that of the latter, we start by outlining the proof that \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\) is \({\mathsf {SZK}}\)-complete.

2.3.1 \( \textsc {Jensen}\text {-}\textsc {Shannon}\,\textsc {Divergence}\,\textsc {Problem} \) Is \({\mathsf {SZK}}\)-Complete

We need to show that \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\) with \(\alpha >\beta +1/{\mathsf {poly}}\) is both in \({\mathsf {SZK}}\) and \({\mathsf {SZK}}\)-hard. In both parts we use the following characterization of the Jensen-Shannon divergence, which follows from its definition. Given a pair of circuits \(C_0\) and \(C_1\), consider the jointly distributed random variables (B, X, Y), where B is a uniformly random bit, X is a uniformly random string and \(Y=C_B(X)\). Then, it follows from some elementary manipulations that:

$$\begin{aligned} {{\,\mathrm{JS}\,}}(C_0,C_1) = 1-{{\,\mathrm{H}\,}}(B|Y). \end{aligned}$$
(2.10)
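
The following short Python check (ours; the distributions are toy examples) verifies Eq. (2.10) numerically, using that \(\Pr [B=1\mid Y=y]=\frac{P_1(y)}{P_0(y)+P_1(y)}\).

```python
import math

P0 = {"a": 0.50, "b": 0.30, "c": 0.20}   # output distribution of C_0 (illustrative)
P1 = {"a": 0.20, "b": 0.30, "d": 0.50}   # output distribution of C_1 (illustrative)
ys = set(P0) | set(P1)
mix = {y: 0.5 * P0.get(y, 0.0) + 0.5 * P1.get(y, 0.0) for y in ys}

def h(p):
    """Binary entropy."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def kl(p, q):
    """KL divergence (base 2); assumes supp(p) is contained in supp(q)."""
    return sum(pv * math.log2(pv / q[y]) for y, pv in p.items() if pv > 0)

js = 0.5 * kl(P0, mix) + 0.5 * kl(P1, mix)
# H(B | Y) = E_{y ~ mix}[ h(Pr[B = 1 | Y = y]) ]
H_B_given_Y = sum(mix[y] * h(P1.get(y, 0.0) / (2 * mix[y])) for y in ys)
print(js, 1 - H_B_given_Y)   # the two values agree, as in Eq. (2.10)
```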

We use this characterization to tie the Jensen-Shannon Divergence Problem to another \({\mathsf {SZK}}\)-complete problem—the Entropy Difference Problem (\({{\,\mathrm{EDP}\,}}\)) with a gap function g. The input to \({{\,\mathrm{EDP}\,}}^g\) is also a pair of circuits \(C_0\) and \(C_1\). YES instances are those in which the entropy gap \({{\,\mathrm{H}\,}}(C_0)-{{\,\mathrm{H}\,}}(C_1)\) is at least g(n) (where n is the output length of the circuits) and NO instances are those in which the gap is at most \(-g(n)\). Goldreich and Vadhan [GV99] showed that \({{\,\mathrm{EDP}\,}}^{g}\) is \({\mathsf {SZK}}\)-complete for any noticeable function g. Our proof that \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\) is \({\mathsf {SZK}}\)-complete closely follows the reduction from the reverse problem of \({{\,\mathrm{SDP}\,}}{}\) (i.e., in which YES instances are distributions that are statistically close) to \({{\,\mathrm{EDP}\,}}{}\) [Vad99, Section 4.4].

  • \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\) is in \({\mathsf {SZK}}\): We reduce \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\) to \({{\,\mathrm{EDP}\,}}^{(\alpha -\beta )/2}\). Given \(C_0\) and \(C_1\), the reduction outputs a pair of circuits \(D_0\) and \(D_1\) such that \(D_1\) outputs a sample from \((B,Y)\) and \(D_0\) outputs a sample from \((B',Y)\), where \(B'\) is an independent random bit with \({{\,\mathrm{H}\,}}(B')=1-\frac{\alpha +\beta }{2}\) (a small numeric sketch of this reduction appears after this list). The chain rule for entropyFootnote 19 implies that

    $$\begin{aligned} {{\,\mathrm{H}\,}}(D_0)-{{\,\mathrm{H}\,}}(D_1) = 1-\frac{\alpha +\beta }{2}-{{\,\mathrm{H}\,}}(B|Y) = {{\,\mathrm{JS}\,}}(C_0,C_1)-\frac{\alpha +\beta }{2}, \end{aligned}$$

    where the second equality follows from Eq. (2.10). Thus, if \({{\,\mathrm{JS}\,}}(C_0,C_1)\ge \alpha \), then \({{\,\mathrm{H}\,}}(D_0)-{{\,\mathrm{H}\,}}(D_1)\ge \frac{\alpha -\beta }{2}\); and if \({{\,\mathrm{JS}\,}}(C_0,C_1)\le \beta \), then \({{\,\mathrm{H}\,}}(D_0)-{{\,\mathrm{H}\,}}(D_1)\le -\frac{\alpha -\beta }{2}\). Since \({{\,\mathrm{EDP}\,}}^{(\alpha -\beta )/2} \in {\mathsf {SZK}}\), we get that \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta } \in {\mathsf {SZK}}\).

  • \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\) is \({\mathsf {SZK}}\)-hard: We reduce \({{\,\mathrm{SDP}\,}}^{1-2^{-k},2^{-k}}\) to the problem \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\), for some large enough k. This is sufficient since \({{\,\mathrm{SDP}\,}}^{1-2^{-k},2^{-k}}\) is known to be \({\mathsf {SZK}}\)-hard [SV03].Footnote 20 In the presentation of related results in his thesis, Vadhan relates the statistical distance of the circuits to the entropy of B given Y [Vad99, Claim 4.4.2]. For example, if \({\text {SD}}(C_0,C_1)=0\) (i.e., the distributions are identical), then B|Y is a uniformly random bit, and so \({{\,\mathrm{H}\,}}(B|Y)=1\); and if \({\text {SD}}(C_0,C_1)=1\) (i.e., the distributions are disjoint), then B is completely determined by Y, and so \({{\,\mathrm{H}\,}}(B|Y)=0\). More generally, Vadhan showed that if \({\text {SD}}(C_0,C_1)=\delta \), thenFootnote 21

    $$\begin{aligned} 1-\delta \le {{\,\mathrm{H}\,}}(B|Y) \le h\left( \frac{1+\delta }{2}\right) . \end{aligned}$$
    (2.11)

    By taking k to be large enough (as a function of \(\alpha \) and \(\beta \)), and applying Eqs. (2.10) and (2.11), we have that if \({\text {SD}}(C_0,C_1)\ge 1-2^{-k}\), then \({{\,\mathrm{JS}\,}}(C_0,C_1)\ge \alpha \); and if \({\text {SD}}(C_0,C_1)\le 2^{-k}\), then \({{\,\mathrm{JS}\,}}(C_0,C_1)\le \beta \). Thus, the desired reduction is simply the identity function that outputs the input circuits.
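
The following toy Python sketch (ours; the distributions and promise parameters are illustrative) instantiates the reduction from the first bullet above and checks that \({{\,\mathrm{H}\,}}(D_0)-{{\,\mathrm{H}\,}}(D_1)={{\,\mathrm{JS}\,}}(C_0,C_1)-\frac{\alpha +\beta }{2}\).

```python
import math

P0 = {"a": 0.50, "b": 0.30, "c": 0.20}   # output distribution of C_0 (illustrative)
P1 = {"a": 0.20, "b": 0.30, "d": 0.50}   # output distribution of C_1 (illustrative)
alpha, beta = 0.6, 0.2                   # illustrative promise parameters
ys = set(P0) | set(P1)
mix = {y: 0.5 * P0.get(y, 0.0) + 0.5 * P1.get(y, 0.0) for y in ys}

def H(dist):
    """Shannon entropy (base 2) of a distribution given as a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def h_inv(v):
    """Bias p <= 1/2 whose binary entropy is v, found by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if H({0: mid, 1: 1 - mid}) < v:
            lo = mid
        else:
            hi = mid
    return lo

# D_1 samples the joint (B, Y); D_0 samples an independent pair (B', Y).
D1 = {(b, y): 0.5 * (P1 if b else P0).get(y, 0.0) for b in (0, 1) for y in ys}
p = h_inv(1 - (alpha + beta) / 2)
D0 = {(b, y): (p if b else 1 - p) * mix[y] for b in (0, 1) for y in ys}

js = H(mix) - 0.5 * (H(P0) + H(P1))             # Jensen-Shannon divergence of (P_0, P_1)
print(H(D0) - H(D1), js - (alpha + beta) / 2)   # the two values agree
```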

2.3.2 Triangular Discrimination Problem is \({\mathsf {SZK}}\)-Complete

We need to show that \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\) with \(\alpha >\beta +1/{\mathsf {poly}}\) is both in \({\mathsf {SZK}}\) and \({\mathsf {SZK}}\)-hard. Showing the latter is very similar to showing that \({{\,\mathrm{JSP}\,}}^{\alpha ,\beta }\) is \({\mathsf {SZK}}\)-hard, but using Eq. (1.1) to relate the triangular discrimination to statistical distance (instead of Eq. (2.11) that relates the Jensen-Shannon divergence to statistical distance). We leave the formal details to the body of this paper and focus here on showing that \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\) is in \({\mathsf {SZK}}\).

A natural approach to show that \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\) is in \({\mathsf {SZK}}\) is to follow Sahai and Vadhan’s proof that \({{\,\mathrm{SDP}\,}}^{2/3,1/3}\) is in \({\mathsf {SZK}}\). Specifically, a main ingredient in that proof is to polarize the statistical distance of the circuits (to reduce the simulation error). Indeed, if we can reduce \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\) to, say, \({{\,\mathrm{TDP}\,}}^{0.9,0.1}\) by polarizing the triangular discrimination, then Eq. (1.1) would imply that we also reduce \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\) to \({{\,\mathrm{SDP}\,}}^{2/3,1/3}\), which we know is in \({\mathsf {SZK}}\).

We are indeed able to show such a polarization lemma for triangular discrimination (using similar techniques to [SV03]’s polarization lemma). However, this lemma only works when the gap between \(\alpha \) and \(\beta \) is roughly \(1/\log \). Actually, the polarization lemma of [SV03] also suffers the same limitation with respect to the gap between \(\alpha ^2\) and \(\beta \).

Still, we would like to handle also the case that the gap between \(\alpha \) and \(\beta \) is only \(1/{\mathsf {poly}}\). To do so we take a slightly different approach. Specifically, we reduce \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\) to \({{\,\mathrm{JSP}\,}}^{\alpha ', \beta '}\), where \( \alpha '\) and \(\beta '\) are also noticeably separated.

An important step toward showing this reduction is to characterize the triangular discrimination and the Jensen-Shannon divergence via the imbalance \(\theta _y\) (see Eq. (2.3)), as we already did for statistical distance. Recall that given \(Y=y\), the random variable B takes the value 1 with probability \(\frac{1+\theta _y}{2}\), and 0 otherwise. Hence, Eq. (2.10) can also be written as

$$\begin{aligned} {{\,\mathrm{JS}\,}}(C_0,C_1) = \mathop {\mathbb {E}}_{y\sim Y}\left[ 1-h\left( \frac{1+\theta _y}{2}\right) \right] . \end{aligned}$$
(2.12)

As for the triangular discrimination, it follows from the definition that

$$\begin{aligned} {{\,\mathrm{TD}\,}}(C_0,C_1) = \mathop {\mathbb {E}}_{y\sim Y}\left[ \theta _y^2\right] . \end{aligned}$$
(2.13)

Furthermore, by Taylor approximation, for small values of \(\theta \), it holds that

$$\begin{aligned} h\left( \frac{1+\theta }{2}\right) \approx 1-\theta ^2. \end{aligned}$$
(2.14)

As we can see, the above equations imply that if all the \(\theta _y\)’s were small, a gap in the triangular discrimination would also imply a gap in the Jensen-Shannon divergence. Thus, we would like an operation that reduces all the \(\theta _y\).

The main technical tool we use to reduce \(\theta _y\) is to consider the convex combination of the two input circuits. Given a pair of circuits \(C_0\) and \(C_1\), consider the pair of circuits \(D_0\) and \(D_1\) such that \(D_b=\lambda \cdot C_b+(1-\lambda ) \cdot \frac{C_0+C_1}{2}\).Footnote 22 Let \(Q_b\) denote the output distribution of \(D_b\), and recall that \(P_b\) denotes the output distribution of \(C_b\). We also let \(\theta '_y\) be defined similarly to \(\theta _y\), but with respect to \(D_0\) and \(D_1\) (rather than \(C_0\) and \(C_1\)). Using this notation, we have that \(\theta _y=\frac{P_1(y)-P_0(y)}{P_1(y)+P_0(y)}\), and it may be seen that

$$\begin{aligned} \theta '_y = \frac{Q_1(y)-Q_0(y)}{Q_1(y)+Q_0(y)} = \lambda \cdot \theta _y. \end{aligned}$$
(2.15)

So, our reduction chooses a sufficiently small \(\lambda \), and outputs the circuits \(D_0\) and \(D_1\). Some care is needed when choosing \(\lambda \). Equations (2.13) and (2.15) yield that \({{\,\mathrm{TD}\,}}(D_0,D_1)=\lambda ^2 \cdot {{\,\mathrm{TD}\,}}(C_0,C_1)\). Hence, the convex combination also shrinks the gap in triangular discrimination. We show that by choosing \(\lambda \approx \sqrt{\alpha -\beta }\), the approximation error in Eq. (2.14) is smaller than the aforementioned shrinkage, and the reduction goes through. The resulting gap in the Jensen-Shannon divergence is roughly \((\alpha -\beta )^2\), which is noticeable by the assumption that \(\alpha >\beta +1/{\mathsf {poly}}\).
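
The following small Python check (ours; the distributions and the value of \(\lambda \) are illustrative) verifies the two facts used above: every \(\theta '_y\) equals \(\lambda \cdot \theta _y\), and consequently \({{\,\mathrm{TD}\,}}(D_0,D_1)=\lambda ^2\cdot {{\,\mathrm{TD}\,}}(C_0,C_1)\).

```python
P0 = {"a": 0.50, "b": 0.30, "c": 0.20}   # output distribution of C_0 (illustrative)
P1 = {"a": 0.20, "b": 0.30, "d": 0.50}   # output distribution of C_1 (illustrative)
lam = 0.25                               # illustrative choice of lambda
ys = set(P0) | set(P1)
mix = {y: 0.5 * P0.get(y, 0.0) + 0.5 * P1.get(y, 0.0) for y in ys}
Q0 = {y: lam * P0.get(y, 0.0) + (1 - lam) * mix[y] for y in ys}   # D_0's output distribution
Q1 = {y: lam * P1.get(y, 0.0) + (1 - lam) * mix[y] for y in ys}   # D_1's output distribution

def theta(p1, p0, y):
    """Imbalance of y with respect to the pair (p0, p1)."""
    return (p1.get(y, 0.0) - p0.get(y, 0.0)) / (p1.get(y, 0.0) + p0.get(y, 0.0))

def td(p, q):
    """Triangular discrimination."""
    return 0.5 * sum((p.get(y, 0.0) - q.get(y, 0.0)) ** 2 / (p.get(y, 0.0) + q.get(y, 0.0))
                     for y in ys)

for y in ys:                              # theta'_y = lam * theta_y, as in Eq. (2.15)
    assert abs(theta(Q1, Q0, y) - lam * theta(P1, P0, y)) < 1e-12

print(td(Q0, Q1), lam ** 2 * td(P0, P1))  # the two values agree
```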

This shows that \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\) is in \({\mathsf {SZK}}\) if \(\alpha > \beta + 1/{\mathsf {poly}}\). By the relationship between \({{\,\mathrm{TD}\,}}\) and \({\text {SD}}\) (Eq. (1.1)), this implies that \({{\,\mathrm{SDP}\,}}^{\alpha ,\beta }\) is in \({\mathsf {SZK}}\) if \(\alpha ^2 > \beta + 1/{\mathsf {poly}}\). This, in turn, combined with the \({\mathsf {SZK}}\)-hardness of \({{\,\mathrm{SDP}\,}}^{2/3,1/3}\) and the known polarization lemma (which applies to it, since \((2/3)^2>1/3\)), implies polarization for statistical distance for any \((\alpha ,\beta )\) such that \(\alpha ^2 > \beta + 1/{\mathsf {poly}}\).

2.3.3 Reflections and an Open Problem

Many f-divergences of interest can be expressed as an expectation, over \(y \sim Y\), of a simple function of \(\theta _y\). That is, an expression of the form \(\mathop {\mathbb {E}}_{y\sim Y}\left[ g(\theta _y)\right] \), for some function \(g : [-1,1] \rightarrow [0,1]\). For example:

  • \({\text {SD}}(C_0,C_1)=\mathop {\mathbb {E}}_{y\sim Y}\left[ \left| \theta _y\right| \right] \) (i.e., \(g(z)=|z|\), see Eq. (2.4));

  • \({{\,\mathrm{TD}\,}}(C_0,C_1)=\mathop {\mathbb {E}}_{y\sim Y}\left[ \theta _y^2\right] \) (i.e., \(g(z)=z^2\), see Eq. (2.13)); and

  • \({{\,\mathrm{JS}\,}}(C_0,C_1)=\mathop {\mathbb {E}}_{y\sim Y}\left[ 1-h\left( \frac{1+\theta _y}{2}\right) \right] \) (i.e., \(g(z) = 1-h\left( \frac{1+z}{2}\right) \), see Eq. (2.12)).

To reduce \({{\,\mathrm{TDP}\,}}\) to \({{\,\mathrm{JSP}\,}}\), we took a convex combination of the two circuits and used the fact that \(1-h\left( \frac{1+\theta _y}{2}\right) \approx O(\theta _y^2)\) for small values of \(\theta _y\). While this worked for polarization of \({{\,\mathrm{TD}\,}}\) (which corresponds to \(g(z)=z^2)\), it seems unlikely to yield a polarization lemma for \({\text {SD}}\) for an arbitrarily small (but noticeable) gap. The reason is that the function \(g(z) = \left| z\right| \)—the g-function corresponding to \({\text {SD}}\)—is not differentiable at 0 and in particular does not act like \(z^2\) for small values of z. As we find this similarity between the different notions of distance striking, and indeed our proofs leverage the relations between them, we provide in Fig. 1 a plot comparing the different choices for the function g.

Fig. 1.

Comparison between the different choices of the function g that were discussed. Since all functions are symmetric around 0, we restrict to the domain [0, 1]. Recall that \(g_1(\theta )=|\theta |\) corresponds to \({\text {SD}}\), \(g_2(\theta )=\theta ^2\) to \({{\,\mathrm{TD}\,}}\), \(g_3(\theta )=1-h\left( \frac{1+\theta }{2}\right) \) to \({{\,\mathrm{JS}\,}}\) and \(g_4(\theta )=1-\sqrt{1-\theta ^2}\) to \({{\,\mathrm{H}\,}}^2\).

Another popular f-divergence that we have not discussed thus farFootnote 23 is the squared Hellinger distance, defined as \({{\,\mathrm{H}\,}}^2(P,Q) = \frac{1}{2}\sum _{y}\left( \sqrt{P_y}-\sqrt{Q_y}\right) ^2\). It can be shown that \({{\,\mathrm{H}\,}}^2(C_0,C_1) = \mathop {\mathbb {E}}_{y\sim Y}\left[ 1-\sqrt{1-\theta _y^2}\right] \), and so this distance also falls within the above framework (i.e., by considering \(g(z) = 1-\sqrt{1-z^2}\)).
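
The following short Python check (ours; toy distributions) verifies this identity numerically.

```python
import math

P0 = {"a": 0.50, "b": 0.30, "c": 0.20}   # toy distributions (illustrative)
P1 = {"a": 0.20, "b": 0.30, "d": 0.50}
ys = set(P0) | set(P1)

hellinger_sq = 0.5 * sum((math.sqrt(P0.get(y, 0.0)) - math.sqrt(P1.get(y, 0.0))) ** 2 for y in ys)

expectation = 0.0
for y in ys:
    p0, p1 = P0.get(y, 0.0), P1.get(y, 0.0)
    theta_y = (p1 - p0) / (p1 + p0)
    expectation += 0.5 * (p0 + p1) * (1 - math.sqrt(1 - theta_y ** 2))   # E[g(theta_y)]

print(hellinger_sq, expectation)   # the two values agree
```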

Notably, the squared Hellinger distance also acts like \({{\,\mathrm{JS}\,}}\) (and \({{\,\mathrm{TD}\,}}\)) around 0; namely, \(1-\sqrt{1-\theta _y^2}\approx O(\theta _y^2)\) for small values of \(\theta _y\). However, unlike \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\), we do not know how to show that the Hellinger Difference Problem, denoted \({{\,\mathrm{HDP}\,}}^{\alpha ,\beta }\) and defined analogously to \({{\,\mathrm{TDP}\,}}^{\alpha ,\beta }\) (while replacing the distance \({{\,\mathrm{TD}\,}}\) with \({{\,\mathrm{H}\,}}^2\)), is in \({\mathsf {SZK}}\) for all \((1/{\mathsf {poly}})\)-separated \((\alpha ,\beta )\). We do mention that \({{\,\mathrm{H}\,}}^2(P,Q)\le {{\,\mathrm{TD}\,}}(P,Q)\le 2{{\,\mathrm{H}\,}}^2(P,Q)\), and thus \({{\,\mathrm{HDP}\,}}^{\alpha ,\beta }\) is in \({\mathsf {SZK}}\) if \(\alpha \) and \(2\beta \) are \((1/{\mathsf {poly}})\)-separated. However, the proof described above does not go through if we try to apply it to the Hellinger distance—we cannot guarantee that the gap in the Hellinger distance after taking the convex combination is larger than the error in the Taylor approximation. Indeed, the question whether \({{\,\mathrm{HDP}\,}}^{\alpha ,\beta }\) is in \({\mathsf {SZK}}\) for any \((1/{\mathsf {poly}})\)-separated \((\alpha ,\beta )\), first raised by Fehr and Vaudenay [FV17], remains open.