1 Introduction

Witness indistinguishability (WI) is one of the most widely used notions of privacy for proof systems. Informally, WI protocols [13] allow a prover to convince a verifier that some statement X belongs to an NP language L, with the following privacy guarantee: if there are two witnesses \(w_0, w_1\) that both attest to the fact that \(X \in L\), then a verifier should not be able to distinguish an honest prover using witness \(w_0\) from an honest prover using witness \(w_1\). WI is a relaxation of zero-knowledge and has proven to be surprisingly useful. Since WI is a relaxation, unlike zero-knowledge, there are no known lower bounds on the rounds of interaction needed to build WI protocols in the plain model.

Indeed, Dwork and Naor [10, 12] introduced the notion of two-message public-coin witness indistinguishable proofs (ZAPs) without any setup assumptions, and also constructed them assuming trapdoor permutations. We observe that the public-coin feature of ZAPs yields public verifiability of the resulting proof system, since a third party can use the public coins of the verifier to determine whether or not the prover’s response constitutes a valid proof. Subsequently, Groth et al. [15] constructed ZAPs assuming the decisional linear assumption, and Bitansky and Paneth [2] constructed ZAPs from indistinguishability obfuscation and one-way functions.

Our Goal: ZAPs with Statistical Privacy. As originally introduced, ZAPs satisfied soundness against unbounded provers (i.e. were proofs), and witness indistinguishability against computationally bounded verifiers. In this work, we examine whether these requirements can be reversed: can we achieve witness indistinguishability against computationally unbounded verifiers, while achieving soundness against computationally bounded cheating provers? We call such objects statistical ZAP arguments.

An analogue of this question has a long history of study in the context of zero-knowledge protocols. Indeed, zero-knowledge protocols for NP were originally achieved with privacy guaranteed only against computationally bounded verifiers [14]. Statistical zero-knowledge arguments were achieved soon after [6, 8]; these strengthen the privacy requirement to hold against computationally unbounded verifiers, while requiring soundness to hold only against computationally bounded provers.

Because ZAPs require a single message each from the verifier and the prover, a better comparison would perhaps be to non-interactive zero-knowledge (NIZK) [4]. Even in the case of NIZKs, we have had arguments for NP satisfying statistical zero-knowledge since 2006 [15]. And yet, the following natural question has remained open since the introduction of ZAPs nearly two decades ago.

Do there exist statistical ZAP arguments for NP in the plain model?

Statistical witness indistinguishability, just like its zero-knowledge counterpart, guarantees everlasting privacy against malicious verifiers, long after protocols have completed execution. Of course, to achieve statistical privacy, we must necessarily sacrifice soundness against unbounded provers. But such a tradeoff could often be desirable, since soundness is usually necessary only in an online setting: in order to convince a verifier of a false statement, a cheating prover must find a way to cheat during the execution of the protocol.

The Main Challenge: Achieving a Public-coin Protocol. The recent work of Kalai et al. [20] constructed the first two message statistically witness indistinguishable arguments in the plain model under standard sub-exponential assumptions. However, their arguments are only privately verifiable.

The blueprint of [20], which builds on other similar approaches in the computational witness indistinguishability setting [1, 18], uses oblivious transfer (OT) to reduce interaction in a \(\varSigma \)-protocol. In all these approaches, the verifier obtains the third message of the \(\varSigma \)-protocol via the output of the OT, and therefore these approaches fundamentally require the use of private coins for verification. It is also worth noting that these protocols are not sound against provers that have access to the private coins of the verifier, which restricts their applicability. Additionally, the verifier’s message is not reusable, which means that soundness is not guaranteed if the same verifier message is reused across multiple executions.

On the other hand, a public coin argument, which is the focus of this work, does not suffer from any of these limitations. In fact, the verifier’s message only needs to be a uniformly random string. Such a string can easily be generated, for example, via an MPC protocol, and can then be reused across multiple executions with no loss in soundness.

We stress that prior to our work, even two message statistically witness indistinguishable arguments that were only publicly verifiable (and not necessarily public coin) were not known.

1.1 Our Results

In this paper, we construct the first two message public coin statistically witness indistinguishable arguments for NP in the plain model. Our constructions assume quasi-polynomial hardness of the learning with errors (LWE) problem. In fact, these are the first known two-message public coin (or even publicly verifiable) arguments based on lattice assumptions, satisfying any notion of witness indistinguishability (computational/statistical). We provide an informal theorem below.

Informal Theorem 1

Assuming quasi-polynomial hardness of the learning with errors (LWE) assumption, there exist two message public-coin statistically witness indistinguishable arguments for NP in the plain model.

Our results are obtained by combining two recent results in a new way: recent constructions of correlation-intractable hash functions based on LWE [7] and the statistically hiding extractable commitments of [20] (which are built upon [21]). This yields a new method of using correlation intractable hash functions to instantiate the Fiat-Shamir transform, by extracting messages from statistically hiding commitments, instead of from statistically binding trapdoor commitments – that we believe may be of independent interest.

Additionally, we observe that the same protocol has a super-polynomial zero knowledge simulator assuming subexponential LWE, giving the following theorem.

Informal Theorem 2

Assuming subexponential hardness of the learning with errors (LWE) assumption, there exist two message public-coin super-polynomial simulation statistical zero knowledge arguments for NP in the plain model.

2 Overview of Techniques

In this section, we provide a brief overview of the techniques we use to build a two message public coin statistical WI argument (henceforth referred to as a \(\mathsf {ZAP}\)).

Our starting point is the popular technique to construct \(\mathsf {ZAP}\)s for \(\mathrm {NP}\), due to Dwork and Naor [11]. Their construction makes use of a statistically sound \(\mathsf {NIZK}\) in the common random string model, and can be described as follows.

  • In the first round, the verifier picks uniformly random strings \(\mathsf {crs}_{1},\ldots ,\mathsf {crs}_{{\lambda }}\), where \({\lambda } \) denotes the security parameter, and sends them to the prover.

  • In the second round, the prover samples a uniformly random string \(\mathsf {crs}'\). It computes proofs \((\pi _1,\ldots ,\pi _{{\lambda }})\) where \(\pi _i\) is a \(\mathsf {NIZK}\) proof for the instance x that verifies under \(\mathsf {crs}'_i= \mathsf {crs}' \oplus \mathsf {crs}_i\). The prover sends \(\mathsf {crs}'\) along with the proof strings \((\pi _1,\ldots ,\pi _{{\lambda }})\) to the verifier.

The soundness of this protocol can be proven based on the statistical soundness of \(\mathsf {NIZK}\), in the following way. Fix an instance \(x\notin L\). Statistical soundness of the \(\mathsf {NIZK}\) implies that with probability at least 1/2 over the choice of \(\mathsf {crs}\) from the domain of the common random string of \(\mathsf {NIZK}\), there does not exist a proof \(\pi \) that verifies for instance x with respect to \(\mathsf {crs}\). Put another way, for fixed x, for at least 1/2 of the strings in the domain of the common random string of the \(\mathsf {NIZK}\), there does not exist a proof for x. One can use this fact to argue combinatorially that over the choice of random \(\mathsf {crs}_1,\ldots ,\mathsf {crs}_{{\lambda }}\), the probability that there exists \(\mathsf {crs}'\) for which there exist proofs with respect to every member of the set \(\{\mathsf {crs}'_i=\mathsf {crs}'\oplus \mathsf {crs}_i\}_{i \in [{\lambda } ]}\) is negligible.
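The combinatorial step can be checked on a toy scale. The sketch below is only an illustration under stated assumptions: the CRS is modeled as an n-bit integer, the set `bad` (strings under which a proof of the false statement exists) is taken to be exactly half of the domain, and the names `exists_bad_offset` and `estimate_cheating_chance` are hypothetical. For a fixed \(\mathsf {crs}'\), all m shifted strings land in `bad` with probability \(2^{-m}\), so a union bound over the \(2^{n}\) choices of \(\mathsf {crs}'\) gives at most \(2^{n-m}\); this is only meaningful when the number of repetitions m exceeds the CRS length n, which the sketch treats as free parameters.

```python
import random

def exists_bad_offset(bad, crs_list, n):
    """True if some n-bit crs' makes every crs' XOR crs_i land in `bad`."""
    return any(all((s ^ c) in bad for c in crs_list) for s in range(2 ** n))

def estimate_cheating_chance(n, m, trials, seed=0):
    """Empirical chance, over m random verifier strings, that a cheating
    prover can find an offset crs' under which all m proofs could exist."""
    rng = random.Random(seed)
    # Toy assumption: exactly half of all n-bit strings admit a false proof.
    bad = set(rng.sample(range(2 ** n), 2 ** (n - 1)))
    hits = 0
    for _ in range(trials):
        crs_list = [rng.randrange(2 ** n) for _ in range(m)]
        hits += exists_bad_offset(bad, crs_list, n)
    return hits / trials

if __name__ == "__main__":
    n, m = 4, 12
    print("empirical:", estimate_cheating_chance(n, m, trials=2000))
    print("union bound 2^(n-m):", 2.0 ** (n - m))
```

Increasing m while holding n fixed should drive the empirical rate toward zero, in line with the union bound.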

The proof of witness indistinguishability follows quite simply, by switching the witness in each of the proofs one by one.

But when applied to our context, this approach immediately encounters the following problems.

  1. The soundness argument outlined above crucially requires that with high probability over the CRS of the \(\mathsf {NIZK}\), there just should not exist a proof for any fixed false instance. This translates to requiring statistical soundness of the underlying \(\mathsf {NIZK}\).

  2. One cannot hope to get a WI argument secure against unbounded verifiers via this transform, unless the underlying \(\mathsf {NIZK}\) also satisfies privacy against unbounded verifiers, i.e. satisfies statistical zero-knowledge.

  3. It is believed that statistically sound and statistical zero-knowledge \(\mathsf {NIZK}\)s for all of \(\mathrm {NP}\) cannot exist.

  4. Even if we only desired computational witness indistinguishability based on lattice assumptions, no statistically sound NIZKs in the common random string model are known from lattice assumptions.

As an intermediate objective, we will first try to tackle problem #4 and build a publicly verifiable computational WI argument based on LWE.

2.1 A Simple Two-Message Public-Coin Computational WI Argument

We make a few modifications to the template above so as to obtain a publicly verifiable computational WI argument based on LWE.

Before we describe these modifications, we list a few ingredients. We will assume that there exists a dense public key encryption scheme \(\mathsf {PKE}\), that is, a scheme for which every string in \(\{0,1\}^{|{pk}|}\) corresponds to a valid public key (and therefore every string has a valid secret key). We will further assume the existence of a correlation intractable hash function family. Informally, a hash function family \(\mathcal {H}\) is correlation-intractable for a function family \(\mathcal {F}\) if:

  • Given a fixed function \(f\in \mathcal {F}\), and a randomly generated key K (that can depend on f), the probability that an adversary outputs x such that \((x,\mathcal {H}(K,x))=(x,f(x))\) is at most \(\epsilon \).

  • The hash key K statistically hides the function f, such that adversaries cannot distinguish a random key from a key for f with advantage better than \(\epsilon \).

We will set \(\epsilon = 2^{-2|pk|}\). We will use \(\varPi \) to denote a parallel repetition of Blum’s \(\varSigma \)-protocol for Graph Hamiltonicity, represented as \(\{a_i = \mathsf {com}(\hat{a}_i)\}_{i \in [{\lambda } ]},\{e_i\}_{i \in [{\lambda } ]}, \{z_i\}_{i \in [{\lambda } ]}\), where \(\{a_i\}_{i \in [{\lambda } ]}\) represents the first-message commitments sent by the prover, \(\{e_i\}_{i \in [{\lambda } ]}\) is a challenge string sent by the verifier and \(\{z_i\}_{i \in [{\lambda } ]}\) represents the corresponding third message by the prover. Let the instance be x and its witness be w. Then, the protocol is described as follows.

  1. In the first round, the verifier randomly samples a key K for the correlation intractable hash function \(\mathcal {H}\) for bounded size \(\mathsf {NC}_1\) functions.

  2. In the second round, the prover picks a key pair \((pk,sk)\) for the scheme \(\mathsf {PKE}\). Then the prover uses \(\mathsf {PKE}.\mathsf {Enc}(pk,\cdot )\) as a commitment scheme to compute the commitments \(\{a_i\}_{i \in [{\lambda } ]}\). Next, the prover computes \(e = \mathcal {H}(K,x,\{a_i\}_{i \in [{\lambda } ]}) \in \{0,1\}^{{\lambda }}\), and uses \((x, w, a, e)\) to compute \(z = (z_1,\ldots ,z_{{\lambda }})\) according to the protocol \(\varPi \). It outputs \((pk, \{a_i = \mathsf {PKE}.\mathsf {Enc}(pk,\hat{a}_i)\}_{i \in [{\lambda } ]} ,e,z)\).
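The message flow of this two-round protocol can be sketched in code. The following is a toy illustration, not the construction itself: SHA-256 stands in for the correlation intractable hash \(\mathcal {H}\), a salted hash stands in for the commitment \(\mathsf {PKE}.\mathsf {Enc}(pk,\cdot )\), graphs are 0/1 adjacency matrices, and the function names `commit`, `challenge`, `prove` and `verify` are hypothetical. Neither stand-in carries the security properties the argument relies on; the point is only to show that the challenge e is derived from (K, x, a) and that anyone holding K can verify.

```python
import hashlib
import random
import secrets

def commit(bit, rand):
    # Salted-hash stand-in for PKE.Enc(pk, bit); NOT the paper's commitment.
    return hashlib.sha256(rand + bytes([bit])).digest()

def challenge(K, adj, coms, reps):
    # Stand-in for e = H(K, x, a) using SHA-256 (supports reps <= 256).
    x = bytes(b for row in adj for b in row)
    h = hashlib.sha256(K + x + b"".join(coms)).digest()
    return [(h[i // 8] >> (i % 8)) & 1 for i in range(reps)]

def prove(K, adj, cycle, reps):
    n, coms, state = len(adj), [], []
    for _ in range(reps):
        perm = list(range(n))
        random.SystemRandom().shuffle(perm)
        rand = [[secrets.token_bytes(16) for _ in range(n)] for _ in range(n)]
        mat = [[0] * n for _ in range(n)]
        for u in range(n):
            for v in range(n):
                mat[perm[u]][perm[v]] = adj[u][v]  # permuted adjacency matrix
        coms += [commit(mat[i][j], rand[i][j]) for i in range(n) for j in range(n)]
        state.append((perm, rand))
    e = challenge(K, adj, coms, reps)
    z = []
    for (perm, rand), bit in zip(state, e):
        if bit == 0:  # reveal the permutation and all commitment randomness
            z.append((perm, rand))
        else:         # open only the n entries along the permuted cycle
            edges = [(perm[cycle[k]], perm[cycle[(k + 1) % n]]) for k in range(n)]
            z.append([(i, j, rand[i][j]) for i, j in edges])
    return coms, e, z

def verify(K, adj, proof, reps):
    coms, e, z = proof
    n = len(adj)
    if len(coms) != reps * n * n or len(z) != reps:
        return False
    if e != challenge(K, adj, coms, reps):  # public recomputation of e
        return False
    for t, (bit, resp) in enumerate(zip(e, z)):
        block = coms[t * n * n:(t + 1) * n * n]
        if bit == 0:
            perm, rand = resp
            if sorted(perm) != list(range(n)):
                return False
            for u in range(n):
                for v in range(n):
                    i, j = perm[u], perm[v]
                    if block[i * n + j] != commit(adj[u][v], rand[i][j]):
                        return False
        else:
            if len(resp) != n:
                return False
            nxt = {}
            for i, j, r in resp:
                if not (0 <= i < n and 0 <= j < n) or block[i * n + j] != commit(1, r):
                    return False
                nxt[i] = j
            start = cur = resp[0][0]
            seen = set()
            for _ in range(n):  # opened edges must form one cycle on n vertices
                if cur in seen or cur not in nxt:
                    return False
                seen.add(cur)
                cur = nxt[cur]
            if cur != start:
                return False
    return True
```

A usage example under the same assumptions: for the 4-cycle with `adj = [[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]` and witness `cycle = [0,1,2,3]`, sampling `K = secrets.token_bytes(32)` and calling `verify(K, adj, prove(K, adj, cycle, 16), 16)` should accept. Because `verify` needs only K and the proof, any third party can run it, which is the public verifiability property discussed in the introduction.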

While witness indistinguishability of this protocol is easy to see, arguing soundness is trickier. In order to argue soundness, the reduction will simply try to guess the public key \(pk^*\) that the prover will use, and will abort if this guess is not correct. Note that such a guess is correct with probability at least \(2^{- \vert pk^* \vert }\).

Suppose a cheating prover convinces a verifier to accept false statements with probability \(\frac{1}{p({\lambda })}\) for some polynomial \(p(\cdot )\). Then, with probability at least \(\frac{1}{p({\lambda })} \cdot 2^{-\vert pk^* \vert }\), the reduction guesses \(pk^*\) correctly, and the prover provides a convincing proof of a false statement using \(pk^*\).

In the next hybrid, the challenger guesses \(pk^*\) together with the corresponding secret key \(sk^*\), and then samples a correlation intractable hash key for a specific function \(f_{sk^*}(\cdot )\). The function \(f_{sk^*}(\cdot )\) on input \(x\), along with \(a\) (the messages committed in the \(\varSigma \)-protocol), outputs the only possible string \(e_{bad}\) for which there exists a string z such that \((a,e_{bad},z)\) verifies for \(x\notin L\) (see Footnote 1). Note that this function is in \(\mathsf {NC}_1\). By \(\epsilon \)-security of the correlation intractable hash family (where \(\epsilon = 2^{-2\vert pk\vert } \)), with probability at least \(\Big (\frac{1}{p({\lambda })} - 2^{-\vert pk\vert } \Big ) \cdot 2^{-\vert pk\vert }\), the reduction guesses \(pk^*\) correctly, and the prover provides a convincing proof of a false statement using \(pk^*\).

Finally, since the correlation intractable hash function is \(\epsilon \)-secure, in the final hybrid the adversary cannot produce a proof for x with probability greater than \(\epsilon \), as this would mean that it outputs \(a^*,e^*,z^*\) such that \(e^*=f_{sk^*}(x,a^*)\).
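To see why these two bounds are in tension, it may help to write out the comparison explicitly. The following is a short worked inequality, under the assumption (not stated explicitly in this overview) that \(\vert pk\vert \) grows faster than \(\log {\lambda }\), so that \(2^{-\vert pk\vert }\) is eventually smaller than any inverse polynomial. The success probability inherited from the cheating prover exceeds the correlation intractability bound \(\epsilon = 2^{-2\vert pk\vert }\) exactly when

$$\Big (\frac{1}{p({\lambda })} - 2^{-\vert pk\vert } \Big ) \cdot 2^{-\vert pk\vert } > 2^{-2\vert pk\vert } \iff \frac{1}{p({\lambda })} - 2^{-\vert pk\vert } > 2^{-\vert pk\vert } \iff \frac{1}{p({\lambda })} > 2 \cdot 2^{-\vert pk\vert }.$$

Under the stated assumption the right-hand condition holds for all sufficiently large \({\lambda }\), so a prover that cheats with inverse-polynomial probability would contradict \(\epsilon \)-security. This is why \(\epsilon \) is chosen to absorb the \(2^{-\vert pk\vert }\) loss from guessing \(pk^*\).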

The protocol sketched above is public-coin, because when we instantiate the correlation-intractable hash family with the LWE-based one by [24], the hash keys are statistically close to uniform.

In the description above, we also relied on a dense public key encryption scheme, which is unfortunately not known to exist based on LWE. However, we note that we can instead use a scheme with the property that at least 1/2 of the strings in \(\{0,1\}^{\ell _{\mathsf {PKE}}}\) correspond to correct encryption keys with a valid secret key, and the property that public keys are pseudorandom. Then, the verifier sends \({\lambda } \) public keys \(pk_1, \ldots , pk_\lambda \), and the prover outputs \(pk'\), and then uses the public keys \(\{(pk' \oplus pk_i)\}_{i \in [{\lambda } ]}\) to compute \({\lambda } \) proofs. Soundness can be obtained by arguing that with overwhelming probability, there will exist an index \(i \in [{\lambda } ]\) such that \((pk'\oplus pk_i)\) has a secret key, just like the [11] technique described at the beginning of this overview.

However, the construction above falls short of achieving statistical witness indistinguishability against malicious verifiers. The reason is the following: arguing that the construction described above satisfies soundness requires relying on correlation intractability of the hash function. In order to invoke the correlation intractable hash function, it is crucial that the prover be “committed” to a well-defined, unique message \(\{a_i\}_{i \in [{\lambda } ]}\), that can be extracted using the secret key \(sk^*\) of the public key encryption scheme. At first, statistical hiding, together with such extraction, may appear to be contradictory objectives.

Indeed, we will try to obtain a weaker version of these contradictory objectives, and specifically, we will rely on a two-message statistically hiding extractable commitment scheme [20].

2.2 Using Correlation-Intractable Hashing with Statistically Hiding Extractable Commitments

In the recent exciting work on using LWE-based correlation-intractable hashing [7, 24] for achieving soundness, as well as in the “warm up” ZAP protocol described above, the correlation-intractable hash function is used as follows. Because the LWE-based CI-hash function is designed to avoid an efficiently computable function f of the prover’s first message, it is used together with a public-key encryption scheme: the prover’s first message is encrypted using the public key, and the function f is built to contain the secret key of the encryption scheme, so that it can decrypt the prover’s first message in order to calculate the challenge that must be avoided.

Our work imagines a simple modification of this strategy of using correlation-intractable hashing for arguing soundness. The main idea is that we want to replace the encryption scheme (which necessarily can only at most provide computational hiding) with an extractable statistically hiding commitment scheme. We will describe what this object entails in more detail very shortly, but the main observation is that such an extractable commitment in fact reveals the value being committed to with a tiny (but tunable) probability – crucially in a way that prevents a malicious prover from learning whether the commitment will reveal the committed value or not. With such a commitment scheme, the efficient function f underlying the correlation-intractable hash function will only “work” in the rare case that the commitment reveals the value being committed. But since a cheating prover can’t tell whether its committed values will be revealed or not, soundness will still hold overall, even though the actual guarantee of the correlation-intractable hash function is only invoked with a tiny probability in the proof of soundness. We now elaborate.

2.3 Statistically Hiding Extractable Commitments

Any statistically hiding commitment must lose all information about the committed message, except with negligible probability. This makes it challenging to define notions of extraction for statistically hiding commitments. In 4 rounds or more, this notion is easier to define, as extraction is possible even from statistically hiding commitments, simply by rewinding the adversary. However, traditional rewinding techniques break down completely when considering two-message commitments.

Nevertheless, the recent work of [20], building on [21], defined and constructed two-message statistically hiding extractable commitments, which they used to construct two-message statistical WI arguments that were privately verifiable. In what follows, we abstract out the properties of a statistically hiding extractable commitment. A more formal description can be found in Sect. 5. We point out that we only need to rely on significantly simpler definitions than the ones in [20], and we give much simpler proofs that the constructions in [20] are secure according to our new definitions. This may be of independent interest.

Defining Statistically Hiding Extractable Commitments. We start with an important observation about statistically hiding commitments, which gives a hint about how one can possibly define (and construct) two-message statistically hiding extractable commitments. Namely, any statistically hiding commitment must lose all information about the committed message, but may retain this information with some small negligible probability. Specifically,

  • A commitment that leaks the committed message with probability \(\epsilon \) (where \(\epsilon \) is a fixed negligible function in the security parameter) and statistically hides the message otherwise, will continue to be statistically hiding.

  • At the same time, one could ensure that no matter the behavior of the committer, the message being committed does get leaked to the honest receiver with probability at least \(\epsilon \).

  • Moreover, the committer does not know whether or not the committed message was leaked to the receiver. This property is important and will be crucially used in our proofs.

In spirit, this corresponds to establishing an erasure channel over which the committer transmits his message to the receiver. This channel almost always erases the committed message, but is guaranteed to transmit the committed message with a very small probability (\(\epsilon \)). Moreover, just like cryptographic erasure channels, the committer does not know whether or not his message was transmitted. Additionally, because this is a commitment, we require computational binding: once the committer transmits his message (that is, commits), he should not be able to change his mind about the message, even if the message did not get transmitted. Finally, we say that “extraction occurs” whenever the message does get transmitted, and we require that extraction occur with probability at least \(\epsilon \), even against a malicious committer.
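The erasure-channel view can be modeled in a few lines. The sketch below is an idealized abstraction only, not a commitment scheme: it has no binding or hiding mechanism and simply records who learns what. The names `erasure_channel_commit` and `extraction_rate` are hypothetical, and a realistic \(\epsilon \) would be far too small to observe empirically, so the demo uses a large value.

```python
import random

def erasure_channel_commit(message, eps, rng):
    """Idealized model of one commitment; returns (committer_view, receiver_view).

    With probability eps the receiver learns the message ("extraction occurs");
    otherwise it learns nothing (None). The committer's view is the same in
    both cases, so it cannot tell whether extraction occurred.
    """
    transmitted = rng.random() < eps
    committer_view = message
    receiver_view = message if transmitted else None
    return committer_view, receiver_view

def extraction_rate(eps, trials, seed=0):
    """Fraction of trials in which the receiver obtained the message."""
    rng = random.Random(seed)
    hits = sum(
        erasure_channel_commit(b"m", eps, rng)[1] is not None
        for _ in range(trials)
    )
    return hits / trials

if __name__ == "__main__":
    print(extraction_rate(eps=0.01, trials=100_000))
```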

Next, we describe how we interface these commitments with correlation intractable hash functions to obtain two-message statistical ZAP arguments.

2.4 Statistical ZAP Arguments

With this tool in mind, we make the following observations:

  1. We would like to replace the encryption scheme used for generating the first message a for the sigma protocol, sent by the prover in the second round, with a statistically hiding commitment.

  2. The first message of this commitment will be generated by the verifier. Furthermore, because we want a public coin protocol, we require this message to be pseudorandom.

  3. We will require that with some small probability (say \(\lambda ^{-\omega (\log \lambda )}\)), all messages committed by the prover get transmitted to the verifier, that is with probability \(\lambda ^{-\omega (\log \lambda )}\), the verifier can recover all the messages committed by the prover in polynomial time given his secret state. Next, using an insight from the simple protocol in Sect. 2.1, we will set the security of the correlation intractable hash function, so that it is infeasible for any polynomially sized adversary to break correlation intractability with probability \(\lambda ^{-\omega (\log \lambda )}\).

The protocol is then as follows:

  • In the first round, the verifier samples a hash key K for the correlation intractable hash function \(\mathcal {H}\), for the same function family \(\mathcal {F}\) as Sect. 2.1. The verifier also samples strings \(q=\{c_{1,j}\}_{j\in [\mathsf {poly} (\lambda )]}\) uniformly at random, where \(\mathsf {poly} \) is a polynomial denoting the number of commitments made by the prover. The verifier sends q and K over to the prover.

  • In the second round, the prover computes the first message of the sigma protocol a (where the number of parallel repetitions equals the output length of the correlation intractable hash function). This message a is generated using the statistically hiding extractable commitment scheme \(\mathsf {com}\) with q as the first message. The prover computes \(e = \mathcal {H}(K,x,q,a)\) and uses e to compute the third message z of the sigma protocol, by opening some subset of the commitments made by the prover. The prover outputs \((a, e, z)\).

We now provide some intuition for the security of this protocol.

  • Soundness: To argue soundness, we follow an approach that is similar to the soundness proof for the computational ZAP argument described in Sect. 2.1 (although with some additional technical subtleties). We discuss one such subtlety here:

    Let \(\ell = |e|\). Then, the correlation-intractable hash function can be at most \(2^{-\ell ^{\delta }}\)-secure (see Footnote 2). For this reason, we require the commitments to be jointly extractable in polynomial time with probability at least \(2^{-\ell ^\delta }\). Note that the total number of commitments is \(N = \ell \cdot \mathsf {poly} ({\lambda })\).

    However, statistically hiding commitments, as originally constructed in [20], are such that if a single commitment can be extracted with probability \(\epsilon \), then N commitments can be extracted with probability roughly \(\epsilon ^N\). Setting \(N = \ell \cdot \mathsf {poly} ({\lambda })\) as above implies that trivially, the probability of extraction will be roughly \(O(2^{-\ell \cdot \mathsf {poly} ({\lambda })})\), which is smaller than the required probability \(2^{-\ell ^\delta }\).

    However, we observe that the commitments constructed in [20] can be modified very slightly so that the probability of extraction can be \(2^{-g({\lambda })}\) for any efficiently computable function g that is bounded by any polynomial in \({\lambda } \). Thus, for example, the probability of extraction can be made to be \({\lambda } ^{-\log ({\lambda })}\). In other words, this extraction probability can be made to be independent of the total number of commitments, N. We describe this modification in additional detail in Sect. 4.2.

    Using commitments that satisfy the property stated above, we observe that we can switch to a hybrid where the challenger samples the commitment messages on behalf of the verifier, and hardwires the secret state used for extraction inside the hash key. The function is defined such that in the event that extraction occurs (given the secret state), the verifier can use the extracted values to compute the bad challenge \(e_{bad}\) (just as in Sect. 2.1), by evaluating a depth bounded function \(f_{bad}\) on the extracted values, and otherwise \(e_{bad}\) is set to 0. If the adversary breaks soundness with noticeable probability \(\epsilon \), then with probability roughly at least \(2^{-g({\lambda })} \cdot \epsilon \), the outputs of the adversary satisfy \(\mathcal {H}(K,x,q,a)=e_{bad}\). As already alluded to previously, we set the function g and the (quasi-polynomial) security of the hash function such that the event above suffices to contradict correlation intractability.

  • Statistical Witness Indistinguishability: Statistical witness indistinguishability composes under parallel repetition, and therefore can be proven index-by-index based on the statistical hiding property of the commitment. Additional details about the construction and the proof can be found in Sect. 5.
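The gap between independent and joint extraction discussed in the soundness argument is easiest to see in log scale. The sketch below is plain arithmetic under stated assumptions: logarithms are taken base 2, the joint extraction probability is taken to be \({\lambda } ^{-\log ({\lambda })} = 2^{-(\log {\lambda })^2}\) as in the example above, and the function names and the sample values of \(\epsilon \) and N are hypothetical.

```python
import math

def log2_independent_extraction(log2_eps, num_commitments):
    """log2 of eps^N: every one of N commitments must leak independently."""
    return num_commitments * log2_eps

def log2_joint_extraction(lam):
    """log2 of lam^(-log2 lam) = 2^(-(log2 lam)^2); does not depend on N."""
    return -(math.log2(lam) ** 2)

def log2_reduction_success(lam, log2_cheat_prob):
    """log2 of 2^(-g(lam)) * eps_cheat, the chance that extraction occurs
    AND the cheating prover succeeds, with g(lam) = (log2 lam)^2."""
    return log2_joint_extraction(lam) + log2_cheat_prob

if __name__ == "__main__":
    lam = 1024
    # Hypothetical numbers: single-commitment leak 2^-10, N = 1000 commitments.
    print("independent:", log2_independent_extraction(-10.0, 1000))
    print("joint:      ", log2_joint_extraction(lam))
    print("reduction:  ", log2_reduction_success(lam, log2_cheat_prob=-20.0))
```

The independent bound decays linearly in N on the log scale, while the joint bound stays fixed as N grows; this is the property the modified commitments are meant to provide.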

Super-Polynomial Simulation (SPS) Zero Knowledge. We show that the protocol above has a super-polynomial simulator which provides statistical zero knowledge. At a very high level, we do this by showing that the extractable commitment scheme can be equivocated in exponential time, and then by using complexity leveraging. We refer to the full version of the paper for details.

Concurrent and Independent Works. In a concurrent and independent work, [17] also constructed a 2-message public-coin statistically witness indistinguishable argument from quasipolynomial LWE. Another concurrent and independent work is that of [22], who construct a 2-message computationally witness indistinguishable public-coin argument from subexponential LWE.

2.5 Organization

The rest of this paper is organized as follows. In Sect. 3, we describe some of the preliminaries such as correlation intractability, oblivious transfer and proof systems. In Sect. 4, we define a simplified variant and present a slightly modified construction of extractable statistically hiding commitments, first proposed by [20]. Finally, in Sect. 5, we construct and prove the security of our statistical ZAP argument.

3 Preliminaries

Notation. Throughout this paper, we will use \({\lambda } \) to denote the security parameter, and \(\mathsf {negl} ({\lambda })\) to denote any function that is asymptotically smaller than \(\frac{1}{\mathsf {poly} ({\lambda })}\) for any polynomial \(\mathsf {poly} (\cdot )\).

The statistical distance between two distributions \(D_1, D_2\) is denoted by \(\varDelta (D_1, D_2)\) and defined as:

$$\varDelta (D_1, D_2) = \frac{1}{2} \sum _{v \in V} |\mathrm {Pr} _{x \leftarrow D_1}[x = v] - \mathrm {Pr} _{x \leftarrow D_2} [x = v] |.$$
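For finite distributions this quantity is straightforward to compute. The following minimal sketch assumes each distribution is given as a dictionary mapping outcomes to probabilities, and takes V to be the union of the two supports; the function name `statistical_distance` is illustrative.

```python
def statistical_distance(d1, d2):
    """Half the L1 distance between two finite distributions.

    d1, d2: dicts mapping outcomes to probabilities (each summing to 1).
    Outcomes missing from a dict are treated as having probability 0.
    """
    support = set(d1) | set(d2)
    return 0.5 * sum(abs(d1.get(v, 0.0) - d2.get(v, 0.0)) for v in support)

if __name__ == "__main__":
    fair = {0: 0.5, 1: 0.5}
    biased = {0: 0.75, 1: 0.25}
    print(statistical_distance(fair, biased))
```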

We say that two families of distributions \(D_1=\{D_{1,{\lambda }}\}, D_2=\{D_{2,{\lambda }}\}\) are statistically indistinguishable if \(\varDelta (D_{1,{\lambda }}, D_{2,{\lambda }}) =\mathsf {negl} ({\lambda })\). We say that two families of distributions \(D_1=\{D_{1,{\lambda }}\}, D_2=\{D_{2,{\lambda }}\}\) are computationally indistinguishable if for all non-uniform probabilistic polynomial time distinguishers \({\mathcal D} \),

$$\big | \mathrm {Pr} _{r \leftarrow D_{1,{\lambda }}}[{\mathcal D} (r) = 1] - \mathrm {Pr} _{r \leftarrow D_{2,{\lambda }}}[{\mathcal D} (r) = 1] \big | = \mathsf {negl} ({\lambda }).$$

Let \(\varPi \) denote an execution of a protocol. We use \(\mathsf {View}_A (\varPi )\) to denote the view, including the randomness and state, of party A in an execution \(\varPi \). We also use \(\mathsf {Output}_A (\varPi )\) to denote the output of party A in an execution of \(\varPi \).

Remark 1

In what follows we define several 2-party protocols. We note that in all these protocols both parties take as input the security parameter \(1^{\lambda } \). We omit this from the notation for the sake of brevity.

Definition 1

(\(\varSigma \)-protocols). Let \(L \in \mathsf {NP}\) with corresponding witness relation \(R_L\). A protocol \(\varPi = \langle P, V \rangle \) is a \(\varSigma \)-protocol for relation \(R_L\) if it is a three-round public-coin protocol which satisfies:

  • Completeness: For all \((x, w) \in R_L\), \(\mathrm {Pr} [\mathsf {Output}_V \langle P(x,w), V(x) \rangle = 1] = 1 - \mathsf {negl} ({\lambda })\), assuming P and V follow the protocol honestly.

  • Special Soundness: There exists a polynomial-time algorithm A that given any x and a pair of accepting transcripts \((a, e, z), (a, e', z')\) for x with the same first prover message, where \(e \ne e'\), outputs w such that \((x, w) \in R_L\).

  • Honest verifier zero-knowledge: There exists a probabilistic polynomial time simulator \({\mathcal S} _\varSigma \) such that for all \({(x, w) \in R_L}\), the distributions \( \left\{ {\mathcal S} _\varSigma (x, e) \right\} \) and \( \left\{ \mathsf {View}_{V} \langle P(x,w(x)), V(x, e) \rangle \right\} \) are statistically indistinguishable. Here \({\mathcal S} _\varSigma (x, e)\) denotes the output of the simulator \({\mathcal S} _\varSigma \) upon input x and e, such that V’s random tape (determining its query) is e.
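Special soundness is easiest to see on a small concrete \(\varSigma \)-protocol. The sketch below uses Schnorr's protocol for knowledge of a discrete logarithm rather than Blum's Hamiltonicity protocol used in this paper, purely because its extractor fits in one line; the toy group parameters (modulus 23, subgroup order 11, generator 2) and all function names are illustrative assumptions with no security value.

```python
# Toy group: G generates a subgroup of prime order Q in the integers mod P.
P, Q, G = 23, 11, 2

def first_message(r):
    """Prover's first message a = G^r mod P."""
    return pow(G, r, P)

def respond(w, r, e):
    """Prover's third message z = r + e*w mod Q for witness w."""
    return (r + e * w) % Q

def accepts(h, a, e, z):
    """Verifier's check G^z == a * h^e mod P, where h = G^w is the statement."""
    return pow(G, z, P) == (a * pow(h, e, P)) % P

def extract(e1, z1, e2, z2):
    """Special-soundness extractor: two accepting transcripts with the same
    first message and distinct challenges yield w = (z1-z2)/(e1-e2) mod Q."""
    return ((z1 - z2) * pow((e1 - e2) % Q, -1, Q)) % Q

if __name__ == "__main__":
    w, r = 7, 3
    h, a = pow(G, w, P), first_message(r)
    z1, z2 = respond(w, r, 4), respond(w, r, 9)
    assert accepts(h, a, 4, z1) and accepts(h, a, 9, z2)
    print("extracted witness:", extract(4, z1, 9, z2))
```

The modular inverse via `pow(x, -1, Q)` requires Python 3.8 or later.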

3.1 Correlation Intractable Hash Functions

We adapt definitions of a correlation intractable hash function family from [7, 24].

Definition 2

For any polynomials \(k(\cdot ), s(\cdot ) = \omega (k(\cdot ))\) and any \(\lambda \in {\mathbb N} \), let \(\mathcal {F}_{\lambda ,s(\lambda )}\) denote the class of \(\mathsf {NC}^1\) circuits of size \(s(\lambda )\) that on input \(k(\lambda )\) bits output \(\lambda \) bits. Namely, \(f: \{0,1\}^{k(\lambda )} \rightarrow \{0,1\}^{\lambda }\) is in \(\mathcal {F}_{\lambda ,s}\) if it has size \(s(\lambda )\) and depth bounded by \(O(\log \lambda )\).

Definition 3

[Quasi-polynomially Correlation Intractable Hash Function Family] A hash function family \(\mathcal {H}= (\mathsf {Setup}, \mathsf {Eval})\) is quasi-polynomially correlation intractable (CI) with respect to \({\mathcal F} = \{{\mathcal F} _{\lambda , s(\lambda )} \}_{\lambda \in {\mathbb N}}\) as defined in Definition 2, if the following two properties hold:

  • Correlation Intractability: For every \({f\in \mathcal {F}_{\lambda ,s}}\), every non-uniform polynomial-size adversary \(\mathcal {A}\), every polynomial s, and every large enough \(\lambda \in {\mathbb N} \),

    $$\mathrm {Pr} _{K\leftarrow \mathcal {H}.\mathsf {Setup}(1^{{\lambda }}, f) } \Big [\mathcal {A}(K)\rightarrow x \text { such that } (x,\mathcal {H}.\mathsf {Eval}(K,x)) =(x,f(x)) \Big ]\le \frac{1}{{\lambda } ^{\log {\lambda }}}.$$
  • Statistical Indistinguishability of Hash Keys: Moreover, for every \(f \in \mathcal {F}_{\lambda ,s}\), for every unbounded adversary \(\mathcal {A}\), and every large enough \(\lambda \in {\mathbb N} \),

    $$\Big | \mathrm {Pr} _{K\leftarrow \mathcal {H}.\mathsf {Setup}(1^{{\lambda }},f)}[\mathcal {A}(K)=1]- \mathrm {Pr} _{K\leftarrow \{0,1\}^{\ell }}[\mathcal {A}(K)=1]\Big | \le 2^{-\lambda ^{\varOmega (1)}},$$

    where \(\ell \) denotes the size of the output of \(\mathcal {H}.\mathsf {Setup}(1^{{\lambda }},f)\).
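The correlation intractability experiment of Definition 3 can be read as a simple game: the adversary receives the key K and wins if it outputs an x with \(\mathcal {H}.\mathsf {Eval}(K,x) = f(x)\). The sketch below writes that game down in Python. The functions `setup` and `eval_hash` are stand-ins built from truncated SHA-256 (an assumption made only so the code runs); they are not the LWE-based construction of [24], and unlike \(\mathcal {H}.\mathsf {Setup}\) the stand-in `setup` ignores f.

```python
import hashlib
import os


def setup(key_len=16):
    """Stand-in for H.Setup: a uniformly random key (ignores f, unlike Definition 3)."""
    return os.urandom(key_len)


def eval_hash(key, x, out_len):
    """Stand-in for H.Eval: truncated SHA-256, NOT a correlation intractable hash."""
    return hashlib.sha256(key + x).digest()[:out_len]


def ci_game(adversary, f, out_len, key=None):
    """Return True iff the adversary outputs x with Eval(K, x) == f(x)."""
    key = setup() if key is None else key
    x = adversary(key)
    return eval_hash(key, x, out_len) == f(x)


def brute_force(out_len, f, tries=1 << 16):
    """Adversary that searches small inputs; only feasible for tiny outputs."""
    def adversary(key):
        for i in range(tries):
            x = i.to_bytes(4, "big")
            if eval_hash(key, x, out_len) == f(x):
                return x
        return b"\x00" * 4  # give up with an arbitrary input
    return adversary


def f(x):
    """Constant target function with a 1-byte output (illustrative choice)."""
    return b"\x00"


print(ci_game(brute_force(1, f), f, out_len=1))
```

With a one-byte output the brute-force adversary wins almost always, which illustrates why the definition only makes sense when the output length \(\lambda \) is large enough that such a search is infeasible.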

The work of [24] gives a construction of correlation intractable hash functions with respect to \({\mathcal F} = \{{\mathcal F} _{\lambda , s(\lambda )} \}_{\lambda \in {\mathbb N}}\), based on polynomial LWE with polynomial approximation factors. We observe that their construction also satisfies Definition 3, assuming quasi-polynomial LWE with polynomial approximation factors.

3.2 Oblivious Transfer

Definition 4 (Oblivious Transfer)

Oblivious transfer is a protocol between two parties, a sender S with input messages \((m_0, m_1)\) and receiver R with input a choice bit b. The correctness requirement is that R obtains output \(m_b\) at the end of the protocol (with probability 1). We let \(\langle S(m_0, m_1), R(b) \rangle \) denote an execution of the OT protocol with sender input \((m_0, m_1)\) and receiver input bit b. We require OT that satisfies the following properties:

  • Computational Receiver Security. For any non-uniform PPT sender \(S^*\) and any \(b, b' \in {\{0,1\}} \), the views \(\mathsf {View} _{S^*}(\langle S^*, R(b) \rangle )\) and \(\mathsf {View} _{S^*}(\langle S^*, R(b') \rangle )\) are computationally indistinguishable.

    We say that the OT scheme is T-secure if all PPT malicious senders have distinguishing advantage less than \(\frac{1}{T}\).

  • Statistical Sender Security. This is defined using the real-ideal paradigm, and requires that for any distribution on the inputs \((m_0, m_1)\) and any unbounded adversarial receiver \(R^*\), there exists a (possibly unbounded) simulator \(\mathsf {Sim} _{R^*}\) that interacts with an ideal functionality \({\mathcal F} _{\mathsf {ot}} \) on behalf of \(R^*\). Here \({\mathcal F} _{\mathsf {ot}} \) is an oracle that obtains the inputs \((m_0, m_1)\) from S and b from \(\mathsf {Sim} _{R^*}\) (simulating the malicious receiver), and outputs \(m_b\) to \(\mathsf {Sim} _{R^*}\). Then \(\mathsf {Sim} ^{{\mathcal F} _{\mathsf {ot}}}_{R^*}\) outputs a receiver view that is statistically indistinguishable from the real view of the malicious receiver \(\mathsf {View}_{R^*}(\langle S(m_0, m_1), R^* \rangle )\). We say that the OT protocol satisfies \((1 - \delta )\) statistical sender security if the statistical distance between the real and ideal distributions is at most \(\delta \).

We use the following sender security property in our protocols (which follows from the definition of sender security in Definition 4 above).

Claim

For any two-message OT protocol satisfying Definition 4, for every malicious receiver \(R^*\) and every first message \(m_{R^*}\) generated by \(R^*\), there exists an unbounded machine E that extracts a bit b such that one of the following statements is true:

  • For all \(m_0, m_1, m_2\), \(\mathsf {View} _{R^*}\langle S(m_0, m_1), R^* \rangle \) and \(\mathsf {View} _{R^*}\langle S(m_0, m_2), R^* \rangle \) are statistically indistinguishable and \(b = 0\), or,

  • For all \(m_0, m_1, m_2\), \(\mathsf {View} _{R^*}\langle S(m_0, m_1), R^* \rangle \) and \(\mathsf {View} _{R^*}\langle S(m_2, m_1), R^* \rangle \) are statistically indistinguishable and \(b = 1\).

Proof

From the (unbounded) simulation property of the two-message OT, there exists a simulator that extracts a receiver input bit b from the first message of \(R^*\), sends it to the ideal functionality, obtains \(m_b\) and generates an indistinguishable receiver view. Then, by the definition of sender security, when \(b = 0\), the simulated view must be close to both \(\mathsf {View} _{R^*}\langle {S(m_0, m_1), R^*\rangle }\), and \(\mathsf {View} _{R^*}\langle {S(m_0, m_2), R^*\rangle }\). Similarly, when \(b = 1\), the simulated view must be statistically close to both \(\mathsf {View} _{R^*}\langle {S(m_0, m_1), R^*\rangle }\), and \(\mathsf {View} _{R^*}\langle {S(m_2, m_1), R^*\rangle }\).

Throughout the paper, we focus on two-message oblivious transfer. We now discuss an additional specific property of two-message OT protocols.

Property 1

The message sent by the receiver is pseudorandom - in particular, this means that the receiver can just sample and send a uniformly random string as a valid message to the sender.

Such two-message OT protocols with this additional property have been constructed based on the DDH assumption [23], LWE assumption [5], and a stronger variant of smooth-projective hashing, which can be realized from DDH as well as the \(N^{th}\)-residuosity and Quadratic Residuosity assumptions [16, 19]. Such two-message protocols can also be based on witness encryption or indistinguishability obfuscation (iO) together with one-way permutations [25].

3.3 Proof Systems

An n-message interactive protocol for deciding a language L with associated relation \(R_L\) proceeds in the following manner:

  • At the beginning of the protocol, P and V receive the size of the instance and security parameter, and execute the first \(n-1\) messages.

  • At some point during the protocol, P receives input \((x,w)\in R_L\). P sends x to V together with the last message of the protocol. Upon receiving the last message from P, V outputs 1 or 0.

An execution of this protocol with instance x and witness w is denoted by \(\langle P (x, w), V (x)\rangle \). One can consider both proofs – with soundness against unbounded provers, and arguments – with soundness against computationally bounded provers.

Definition 5 (Two-Message Interactive Arguments)

A two-message delayed-input interactive protocol (P, V) for deciding a language L is an interactive argument for L if it satisfies the following properties:

  • Completeness: For every \((x,w)\in R_L\), \(\mathrm {Pr} \big [\mathsf {Output}_V\langle P(x,w),V(x) \rangle = 1\big ] = 1-\mathsf {negl} ({\lambda }),\) where the probability is over the random coins of P and V, and where in the protocol P receives (x, w) right before computing the last message of the protocol, and V receives x together with the last message of the protocol.

  • Non-adaptive Soundness: For every (non-uniform) PPT prover \(P^*\) that on input \(1^{\lambda } \) (and without access to the verifier’s message) outputs a length \(1^{p(\lambda )}\) and \(x\in {\{0,1\}} ^{p(\lambda )}\setminus L\), \(\mathrm {Pr} \big [\mathsf {Output}_V\langle P^*,V \rangle (x) = 1\big ] = \mathsf {negl} ({\lambda }),\) where the probability is over the random coins of V.

Witness Indistinguishability. A proof system is witness indistinguishable if for any statement with at least two witnesses, proofs computed using different witnesses are indistinguishable. In this paper, we only consider statistical witness indistinguishability, which we formally define below.

Definition 6 (Statistical Witness Indistinguishability)

A delayed-input interactive argument (P, V) for a language L is said to be statistical witness-indistinguishable if for every unbounded verifier \(V^*\), every polynomially bounded function \(n=n({\lambda })\le \mathsf {poly} ({\lambda })\), and every \((x_n, w_{1,n}, w_{2,n})\) such that \((x_n, w_{1,n}) \in R_L\) and \((x_n, w_{2,n}) \in R_L\) and \(|x_n|=n\), the following two ensembles are statistically indistinguishable:

$$\big \{ \mathsf {View}_{V^*} \langle P(x_n,w_{1,n}),V^*(x_n) \rangle \big \} \text { and } \big \{ \mathsf {View}_{V^*} \langle P(x_n,w_{2,n}),V^*(x_n) \rangle \big \}$$

Definition 7

(\(T_\mathsf {Sim}\)-Statistical Zero Knowledge). A delayed-input interactive argument (P, V) for a language L is said to be a \(T_\mathsf {Sim}\)-super-polynomial simulation (SPS) statistical zero-knowledge argument for L if there exists a (uniform) simulator \(\mathsf {Sim}\) that runs in time \(T_\mathsf {Sim}\), such that for every x, every unbounded verifier \(V^*\), the two distributions \(\mathsf {View} _{V^*} \left[ \langle P, V^* \rangle (x,w) \right] \) and \(\mathsf {Sim}^{V^*}(x,z)\) are statistically close.

4 Extractable Commitments

4.1 Definitions

We take the following definition of statistically hiding extractable commitments from [20]. As before, we use \({\lambda } \) to denote the security parameter, and we let \(p = \mathsf {poly} ({\lambda })\) be an arbitrary fixed polynomial such that the message space is \({\{0,1\}} ^{p({\lambda })}\).

We restrict ourselves to commitments with non-interactive decommitment, and where the (honest) receiver is not required to maintain any state at the end of the commit phase in order to execute the decommit phase. Our construction will satisfy this property and this will be useful in our applications to constructing statistically private arguments.

Definition 8 (Statistically Hiding Commitment Scheme)

A commitment \(\langle {\mathcal C}, {\mathcal R} \rangle \) is a two-phase protocol between a committer \({\mathcal C} \) and receiver \({\mathcal R} \), consisting of algorithms \(\mathsf {Commit}, \mathsf {Decommit}\) and \(\mathsf {Verify}\). At the beginning of the protocol, \({\mathcal C} \) obtains as input a message \(M \in {\{0,1\}} ^p\). Next, \({\mathcal C} \) and \({\mathcal R} \) execute the commit phase, and obtain a commitment transcript, denoted by \(\tau \), together with private states for \({\mathcal C} \) and \({\mathcal R} \), denoted by \(\mathsf {state}_{{\mathcal C}, \tau }\) and \(\mathsf {state}_{{\mathcal R}, \tau }\) respectively. We use the notation

$$(\tau ,\mathsf {state}_{{\mathcal C},\tau }, \mathsf {state}_{{\mathcal R}, \tau }) \leftarrow \mathsf {Commit}\langle {\mathcal C} (M), {\mathcal R} \rangle .$$

Later, \({\mathcal C} \) and \({\mathcal R} \) possibly engage in a decommit phase, where the committer \({\mathcal C} \) computes and sends message \(y = \mathsf {Decommit} (\tau , \mathsf {state}_{{\mathcal C}, \tau })\) to \({\mathcal R} \). At the end, \({\mathcal R} \) computes \(\mathsf {Verify}(\tau , y)\) to output \(\bot \) or a message \(\widetilde{M} \in {\{0,1\}} ^p\).

A statistically hiding commitment scheme is required to satisfy three properties: perfect completeness, statistical hiding and computational binding. We formally define these in the full version of the paper.

We also define an extractor \({\mathcal E} \) that given black-box access to \({\mathcal C} ^*\), and then without executing any decommitment phase with \({\mathcal C} ^*\), outputs message \(\widetilde{M}\) committed by \({\mathcal C} ^*\) with probability at least \(\epsilon \): we require “correctness” of this extracted message \(\widetilde{M}\). We also require that no PPT adversary can distinguish transcripts where extraction is successful from those where it is unsuccessful. This is formally described in Definition 9.

Definition 9

(\(\epsilon \)-Extractable Statistically Hiding Commitment). We say that a statistically hiding commitment scheme is \(\epsilon \)-extractable if the following holds: Denote \((\tau , \mathsf {state}_{{\mathcal C}, \tau }, \mathsf {state}_{{\mathcal R}, \tau }) \leftarrow \mathsf {Commit} \langle {\mathcal C} ^*, {\mathcal R} \rangle \). We require that there exists a deterministic polynomial time extractor \({\mathcal E} \) that on input \((\tau , \mathsf {state}_{{\mathcal R},\tau })\) outputs \(\widetilde{M}\) such that the following properties hold.

  • Frequency of Extraction. For every PPT committer \({\mathcal C} ^*\),

    $$\mathrm {Pr} [{\mathcal E} (\tau , \mathsf {state}_{{\mathcal R},\tau }) \ne \bot ] = \epsilon $$

    where the probability is over \((\tau , \mathsf {state}_{{\mathcal C}, \tau }, \mathsf {state}_{{\mathcal R}, \tau }) \leftarrow \mathsf {Commit} \langle {\mathcal C} ^*, {\mathcal R} \rangle \).

  • Correctness of Extraction. For every PPT committer \({\mathcal C} ^*\), every execution \((\tau , \mathsf {state}_{{\mathcal C}, \tau }, \mathsf {state}_{{\mathcal R}, \tau }) \in \mathsf {Supp}(\mathsf {Commit} \langle {\mathcal C} ^*, {\mathcal R} \rangle )\), and every y, denoting \(\widetilde{M} = {\mathcal E} (\tau , \mathsf {state}_{{\mathcal R}, \tau })\) and \(M = \mathsf {Verify}(\tau , y)\), if \(\widetilde{M} \ne \bot \) and \(M \ne \bot \), then \(\widetilde{M} = M\).

  • Indistinguishability of Extractable Transcripts. For every \({\mathcal C} ^*\),

    $$\big | \mathrm {Pr} [{\mathcal C} ^*(\tau ) = 1 \mid {\mathcal E} (\tau ,\mathsf {state}_{{\mathcal R}, \tau }) \ne \bot ] - \mathrm {Pr} [{\mathcal C} ^*(\tau ) = 1 \mid {\mathcal E} (\tau ,\mathsf {state}_{{\mathcal R},\tau }) = \bot ] \big | = \mathsf {negl} ({\lambda })$$

    where the probability is over \((\tau , \mathsf {state}_{{\mathcal R}, \tau }) \leftarrow \mathsf {Commit} \langle {\mathcal C} ^*, {\mathcal R} \rangle \).

We also consider a stronger definition, of \(\epsilon \)-extractable statistically hiding \(\ell \) multi-commitments, where we require that an entire sequence of \(\ell \) commitments can be extracted with probability \(\epsilon \), that is independent of \(\ell \). We will also modify the \(\mathsf {Verify}\) algorithm so that it obtains as input the transcript \(\tau : = (\tau _1, \tau _2, \ldots \tau _\ell )\) of all \(\ell \) commitments, together with an index \(i \in [\ell ]\) and the decommitment \(\mathsf {state}_{C, \tau ,i}\) to a single commitment. We defer their formal description to the full version of the paper.

4.2 Protocol

In this section, we construct two-message statistically hiding, extractable commitments according to Definition 9 assuming the existence of two message oblivious transfer (OT). Our construction is described in Fig. 1.

Primitives Used. Let \(\mathsf {OT} = (\mathsf {OT} _1, \mathsf {OT} _2)\) denote a two-message string oblivious transfer protocol according to Definition 4, also satisfying Property 1. Let \(\mathsf {OT} _1(b;r_1)\) denote the first message of the \(\mathsf {OT}\) protocol with receiver input b and randomness \(r_1\), and let \(\mathsf {OT} _2(M_0, M_1;r_2)\) denote the second message of the OT protocol with sender input strings \(M_0, M_1\) and randomness \(r_2\).

Fig. 1. Extractable commitments

Observe that the protocol satisfies the property mentioned in the definition that the verify algorithm in the decommitment phase does not require the private randomness used by the receiver in the commit phase. Further, observe that if the oblivious transfer protocol satisfies Property 1, the receiver’s message can alternately be generated by just sampling a uniformly random string. Thus, this would give an extractable commitment protocol where the receiver’s algorithms are public coin.

We will now prove the following main theorem.

Theorem 1

Assuming that the underlying OT protocol is \({\lambda } ^{-\log {\lambda }}\)-secure against malicious senders, \((1 - \delta _{\mathsf {OT}})\) secure against malicious receivers according to Definition 4, and satisfies Property 1, there exists a setting of \(m=O(\log ^2 {\lambda })\) for which the scheme in Fig. 1 is a \((1 - 2^{-m} - \delta _{\mathsf {OT}})\) statistically hiding, \(\lambda ^{-\log ^{1/2} \lambda }\)-extractable commitment scheme according to Definition 9. Further, the receiver’s algorithms are public coin.

We relegate the proof of Theorem 1 to the full version of the paper.

5 Our Statistical WI Protocol

5.1 Modified Blum Protocol

We begin by describing a very simple modification to the Blum \(\varSigma \)-protocol for Graph Hamiltonicity. The protocol we describe will have soundness error \(\frac{1}{2} + \mathsf {negl} ({\lambda })\) against adaptive PPT provers, and will satisfy statistical zero-knowledge. Since Graph Hamiltonicity is NP-complete, this protocol can also be used to prove any statement in NP via a Karp reduction. This protocol is described in Fig. 2.

We give an overview of the protocol here. Note that the only modification to the original protocol of Blum [3] is that we use two message statistically hiding, extractable commitments instead of non-interactive statistically binding commitments. The proofs of soundness and statistical honest-verifier zero-knowledge are fairly straightforward. They roughly follow the same structure as [3], replacing statistically binding commitments with statistically hiding commitments.

Fig. 2. Modified Blum SZK argument

Lemma 1

Assuming that \(\mathsf {extcom}\) is computationally binding, the protocol in Fig. 2 satisfies soundness against PPT provers that may choose x adaptively in the second round of the protocol.

Proof

The proof of soundness follows by the computational binding property of \(\mathsf {extcom} \) and the soundness of the (original) Blum protocol.

Let L denote the language consisting of all graphs that have a Hamiltonian cycle. Consider a cheating prover \(P^*\) that convinces the verifier about a statement \(x \not \in L\) with probability \(\frac{1}{2} + h(n)\), where \(h(\cdot ) > \frac{1}{\mathsf {poly} (\cdot )}\) for some polynomial \(\mathsf {poly} (\cdot )\). By an averaging argument, this means that there exists at least one transcript prefix \(\tau \) consisting of the first two messages of the protocol, where for \(G \not \in L\) sent by the prover in the third message, \(\mathrm {Pr} [V \text { accepts}|\tau , G \not \in L] > \frac{1}{2}\). This implies that there exists a cheating prover that generates a transcript prefix \(\tau \), for which it provides an accepting opening corresponding to both \(b = 0\) and \(b = 1\), with probability at least h(n). Next, we argue that such a cheating prover must break the (computational) binding of \(\mathsf {extcom}\).

Since \(G \not \in L\), it is information theoretically impossible for any cheating prover to generate a commitment to a unique string \(\pi , \pi (G)\) such that there exists a Hamiltonian cycle in \(\pi (G)\). Therefore, any prover that opens a transcript prefix \(\tau , G\) corresponding to both \(b = 0\) and \(b = 1\) for \(G \not \in L\), must open at least one commitment in the set \(\{\mathsf {extcom} _{P}, \{\mathsf {extcom} _{i,j}\}_{(i, j) \in [p] \times [p]}\}\) to two different values, thereby giving a contradiction to the binding of the commitment scheme.    \(\square \)

Lemma 2

Assuming that \(\mathsf {extcom}\) is statistically hiding, the protocol in Fig. 2 satisfies honest-verifier statistical zero-knowledge.

Proof

The simulation strategy is identical to that of [3]. The simulator \(\mathsf {Sim}\) first guesses the challenge bit \(c'\). It begins an interaction with the malicious verifier. On obtaining the first message from the verifier, if \(c' = 0\), it samples \(\pi \) uniformly at random and generates a commitment to \(\pi , \pi (G)\) following honest prover strategy to generate the commitment. If \(c' = 1\), it samples \(\pi , H'\) uniformly at random where \(H'\) is an arbitrary Hamiltonian cycle, and generates a commitment to \(\pi , \pi (H')\) following honest prover strategy to generate the commitment. Next, it waits for the verifier to send c, and if \(c \ne c'\), it aborts and repeats the experiment. If \(c = c'\), then it decommits to the commitments according to honest prover strategy.

Note that when \(c = c' = 0\), the resulting simulation is perfect zero-knowledge since the simulated view of the verifier is identical to the view generated by an honest prover. On the other hand, when \(c = c' = 1\), it follows from the statistical hiding property of the commitment \(\mathsf {extcom} \) that the verifier cannot distinguish the case where \(\mathsf {extcom} \) is a commitment to \(\pi , \pi (G)\) and a Hamiltonian cycle is opened in \(\pi (G)\), from the case where \(\mathsf {extcom} \) is not a commitment to \(\pi (G)\), but instead to some \(\pi (H')\) for a Hamiltonian cycle \(H'\).    \(\square \)
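The guess-and-retry structure of the simulator can be sketched as follows. This is only an outline under simplifying assumptions: the commitments are omitted entirely (the variable `committed` stands for the value that would sit inside \(\mathsf {extcom}\)), graphs are edge sets, and all function names are hypothetical.

```python
import random


def permute(edges, pi):
    """Apply the vertex permutation pi to an undirected edge set."""
    return frozenset(frozenset((pi[u], pi[v])) for u, v in edges)


def random_cycle(n, rng):
    """Edge set of a random Hamiltonian cycle on n >= 3 vertices."""
    order = list(range(n))
    rng.shuffle(order)
    return frozenset(frozenset((order[i], order[(i + 1) % n])) for i in range(n))


def simulate(graph, n, verifier_challenge, rng):
    """Guess c', prepare the matching first message, retry on a wrong guess."""
    while True:
        guess = rng.randrange(2)
        pi = list(range(n))
        rng.shuffle(pi)
        if guess == 0:
            committed = permute(graph, pi)    # same value as the honest prover
        else:
            committed = random_cycle(n, rng)  # a bare cycle, independent of the graph
        c = verifier_challenge()              # honest verifier: a fresh random bit
        if c != guess:
            continue                          # abort this attempt and repeat
        if c == 0:
            return (c, pi, committed)         # open everything
        return (c, committed)                 # open only the cycle edges


rng = random.Random(0)
square = {(0, 1), (1, 2), (2, 3), (3, 0)}
print(simulate(square, 4, lambda: rng.randrange(2), rng)[0])
```

Because the honest verifier's bit is independent of the simulator's guess, each attempt succeeds with probability one half, so the expected number of attempts is two.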

Since honest-verifier zero-knowledge composes under parallel repetition, we can repeat the protocol several times in parallel to get negligible soundness error. Formally, we have the following lemma:

Lemma 3

Assuming that \(\mathsf {extcom}\) is statistically hiding, the protocol in Fig. 2 satisfies honest verifier statistical zero-knowledge under parallel repetition.

Finally, Cramer et al. [9] showed that honest verifier zero knowledge where the receiver’s algorithms are public coin implies witness indistinguishability even against malicious verifiers. As a result, we get the following lemma:

Lemma 4

Assuming that \(\mathsf {extcom}\) is statistically hiding, the protocol in Fig. 2 satisfies statistical witness indistinguishability under parallel repetition.

5.2 Statistical ZAPs

In this section, we prove the following theorem:

Theorem 2

There exists a two message public-coin statistical witness indistinguishable argument system for NP in the plain model assuming that the following primitives exist:

  • Two-message oblivious transfer (OT) that is quasi-polynomially secure against malicious senders, satisfying Definition 4 and Property 1, and,

  • Quasi-polynomially correlation intractable hash functions.

Recall from previous sections that we can use the above OT to build the extractable commitment which is then used to build a four message \(\varSigma \)-protocol that is a modification to Blum’s protocol. As mentioned before, we can instantiate both the OT and the correlation intractable hash function assuming the learning with errors (LWE) assumption. Therefore, instantiating both the primitives in the above theorem gives us the following:

Theorem 3

Assuming quasi-polynomially secure LWE, there exists a two message public-coin statistical witness indistinguishable argument system for NP in the plain model.

Notations and Primitives Used

  • Let \(\lambda \) be the security parameter.

  • Let \(\varSigma := (\varSigma _1, \ldots , \varSigma _\lambda )\) denote \(\lambda \) parallel repetitions of the modified Blum Sigma protocol constructed in Sect. 5.1, where for \(i \in [\lambda ], \varSigma _i = (q_i, a_i, e_i, z_i)\). Let the underlying commitment scheme be instantiated with extraction success probability \(\epsilon = {\lambda } ^{-\log ^{1/2} {\lambda }}\).

  • Let \(\mathcal {H}\) be a correlation intractable hash function with respect to \(\{ \mathcal {F}_{\lambda ,s(\lambda )}\}_{\lambda \in \mathbb {N}}\) according to Definition 3 that outputs strings of length \(\lambda \), where \(s(\lambda ) = 2 s_1(\lambda )\) where \(s_1\) is the size of the extractor \({\mathcal E} \) used in the commitment scheme and \({\mathcal F} \) denotes the class of all \(\mathsf {NC}^1\) circuits of size \(s(\lambda )\) as defined in Definition 2. Recall the correlation-intractability advantage is assumed to be at most \(\frac{1}{{\lambda } ^{\log {\lambda }}}\).
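The two quantitative parameters above, the extraction probability \(\epsilon = \lambda ^{-\log ^{1/2} \lambda }\) and the correlation-intractability bound \(\lambda ^{-\log \lambda }\), differ by a super-polynomial factor, and the soundness argument appears to rely on \(\epsilon \) divided by any polynomial still exceeding the second bound. The short script below tabulates both for a few values of \(\lambda \); taking logarithms base 2 is an assumption made for the illustration.

```python
import math


def extraction_prob(lam):
    """epsilon = lam^(-sqrt(log lam)), with log taken base 2 (an assumption)."""
    return lam ** (-math.sqrt(math.log2(lam)))


def ci_bound(lam):
    """Correlation-intractability advantage bound lam^(-log lam)."""
    return lam ** (-math.log2(lam))


for lam in (64, 128, 256, 1024):
    eps, ci = extraction_prob(lam), ci_bound(lam)
    print(f"lambda={lam:5d}  log2(eps)={math.log2(eps):8.2f}  log2(ci)={math.log2(ci):8.2f}")
```

For example, at \(\lambda = 128\) the second bound is \(128^{-7} = 2^{-49}\), while \(\epsilon \) is roughly \(2^{-18.5}\).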

Construction. Let x be any instance in \(\{0,1\}^{{\lambda }}\) and let w be the corresponding witness for the statement \(x \in L\).

  1. Verifier’s message to the Prover:

    • Sample \(q:= \{q_i\}_{i \in [{\lambda } ]}\).

    • Sample \(K\leftarrow \mathcal {H}.\mathsf {Setup}(1^{{\lambda }},0^{\ell })\).

    • Output (q, K).

  2. Prover’s message to the Verifier:

    • Compute \(\{a_i\}_{i \in [{\lambda } ]}\) as a response to \(\{q_i\}_{i \in [{\lambda } ]}\).

    • Compute \(e\leftarrow \mathcal {H}.\mathsf {Eval}(K,x,(q,a))\).

    • Compute \(\{z_i\}_{i \in [{\lambda } ]}\) with respect to the challenge string e.

    • Output (x, a, e, z).

  3. Verification: The verifier does the following:

    • If \(\mathcal {H}.\mathsf {Eval}(K,x,(q,a)) \ne e\), output reject.

    • Else if (x, q, a, e, z) does not verify according to the \(\varSigma \) protocol, output reject.

    • Else output accept.
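The message flow of the construction can be sketched in a few lines. The sketch below is a structural illustration only, under several loudly stated assumptions: the \(\varSigma \)-protocol is replaced by a toy Schnorr protocol (so q is empty, whereas in the construction q carries the receiver messages of the extractable commitments), and `hash_eval` is built from SHA-256, which is not known to be correlation intractable. All names and parameters are hypothetical.

```python
import hashlib
import os
import secrets

P, Q, G = 23, 11, 4  # toy Schnorr group (hypothetical parameters, insecure)


def hash_eval(key, x, q, a, out_len=16):
    """Stand-in for H.Eval(K, x, (q, a)); SHA-256 is NOT known to be CI."""
    h = hashlib.sha256()
    for part in (key, x, q, a):
        h.update(len(part).to_bytes(4, "big") + part)  # length-prefix each field
    return h.digest()[:out_len]


def verifier_message():
    """Round 1: Sigma-protocol message q (empty in this toy) and a random key K."""
    return b"", os.urandom(32)


def prover_message(x, w, q, key):
    """Round 2: compute a, derive e by hashing, then answer e with z."""
    r = secrets.randbelow(Q)
    a = pow(G, r, P).to_bytes(1, "big")
    e = hash_eval(key, x, q, a)
    z = (r + (int.from_bytes(e, "big") % Q) * w) % Q
    return x, a, e, z


def verify(q, key, proof):
    """Recompute the challenge from (K, x, q, a), then run the Sigma check."""
    x, a, e, z = proof
    if hash_eval(key, x, q, a) != e:
        return False
    e_int = int.from_bytes(e, "big") % Q
    return pow(G, z, P) == (a[0] * pow(x[0], e_int, P)) % P


w = 7
x = pow(G, w, P).to_bytes(1, "big")
q, key = verifier_message()
proof = prover_message(x, w, q, key)
print(verify(q, key, proof))
```

Note that the verifier's first message consists only of random strings, and anyone holding (q, K) can rerun `verify`, which matches the public-coin and public-verifiability discussion in the text.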

Completeness. Completeness of the protocol can be easily observed from the correctness of the underlying primitives: the protocol \(\varSigma \) and the hash function \(\mathcal {H}\).

Public Coin. Recall from the statistical indistinguishability of hash keys property that an honest verifier can just sample a uniformly random string as the hash key K. This, along with the fact that the underlying protocol \(\varSigma \) is public coin results in the above protocol also being public coin.

Soundness. We now prove computational soundness of the protocol above. Towards a contradiction, fix any adversary \({\mathcal A} \) that breaks soundness of the protocol with probability \(\frac{1}{p(\lambda )}\) for some polynomial \(p(\cdot )\).

We consider a sequence of hybrids where the first hybrid corresponds to the real soundness experiment.

  • \(\mathsf {Hybrid} _0:\) This hybrid corresponds to the experiment where the challenger behaves identically to the verifier in the actual protocol.

  • \(\mathsf {Hybrid} _1\): In this hybrid, instead of generating the verifier’s first message as uniformly random string, the challenger \(\mathsf {Ch}\) now computes the first message of the extractable commitment scheme used in the underlying protocol \(\varSigma \) as done in the protocol description in Fig. 1. In particular, the underlying OT receiver messages are not sampled as uniformly random strings but instead are computed by running the OT receiver algorithm. As a result, \(\mathsf {Ch}\) now has some internal state \(r_\mathsf {state}\) as part of the extractable commitment scheme that is not public.

  • \(\mathsf {Hybrid} _2\): This hybrid is the same as the previous hybrid except that the hash key K is generated as follows. \(K\leftarrow \mathcal {H}.\mathsf {Setup}(1^{{\lambda }},R)\) where the relation R consists of tuples of the form ((x, q, a), y) where y is computed by an efficient function \(f_{bad}\) described below. \(f_{bad}\) has the verifier’s secret state \(r_\mathsf {state}\) hardwired, takes as input the statement x, the verifier’s message q, the prover’s message a and does the following.

    1. Run the extractor algorithm \({\mathcal E} \) on input \((r_\mathsf {state}, \tau = (q,a))\) to compute m. Note that \({\mathcal E} \) can be represented by an \(\mathsf {NC}^1\) circuit of size \(s_1(\lambda )\) for some polynomial \(s_1\).

    2. If \(m \ne \bot \), this means that m is the tuple of messages committed to in the set of \(\lambda \) commitment tuples \((c_P,\{c_{i,j}\})\). For each \(k \in [\lambda ]\), check whether the message committed to by the tuple \(\{c_{i,j}\}\) is indeed equal to \(\pi (G)\) where \(\pi \) is the permutation committed to in \(c_P\). If so, then set \(e_k = 0\) and else set \(e_k=1\). Set \(y = (e_1,\ldots ,e_\lambda )\).

    3. If \(m = \bot \), set \(y = 0^\lambda \).
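The three steps of \(f_{bad}\) can be sketched as below. The sketch assumes the extractor has already been run, so it takes the extracted messages as input rather than \(r_\mathsf {state}\) and (q, a); graphs are adjacency matrices, `None` plays the role of \(\bot \), and every name is hypothetical.

```python
def permute(adj, pi):
    """Adjacency matrix of pi(G): entry (pi[i], pi[j]) equals adj[i][j]."""
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[pi[i]][pi[j]] = adj[i][j]
    return out


def f_bad(graph_adj, extracted, reps):
    """Compute y = (e_1, ..., e_reps) from the extracted commitments.

    extracted is None when extraction failed, and otherwise a list with one
    (pi, committed_adj) pair per parallel repetition.
    """
    if extracted is None:
        return [0] * reps  # step 3: y = 0^lambda
    # Step 2: e_k = 0 if the k-th committed graph equals pi(G), and 1 otherwise.
    return [0 if committed == permute(graph_adj, pi) else 1
            for pi, committed in extracted]


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2
pi = [1, 2, 0]
extracted = [(pi, permute(path, pi)), (pi, path)]
print(f_bad(path, extracted, reps=2))
```

In each repetition the output bit is the one challenge that a prover who committed in this way could answer by opening every commitment, which is why the function is used to define the relation in \(\mathsf {Hybrid} _2\).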

Before proving the soundness of the protocol using the hybrids, we define an event that helps us in the proof.

Event E: Let \(\tau \) denote the transcript of an execution of the above protocol and let \(\tau _{\mathcal C} \) denote the transcript of the commitment scheme in the execution. Let \(\mathsf {state}_{\mathcal R} \) denote the state of the verifier when it runs the receiver algorithm of the commitment scheme. We will say that the event \({\mathbf{E}} \) occurs if for any honest verifier V:

$$[V(\tau )=1 \wedge {\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ].$$

We now continue the proof of soundness with the following claims.

Lemma 5

Assuming the pseudorandomness of receiver messages of the OT protocol used in the underlying extractable commitment scheme (Property 1), \(|\mathrm {Pr} [V(\tau )=1|\mathsf {Hybrid} _1] - \mathrm {Pr} [V(\tau )=1|\mathsf {Hybrid} _0]| = \mathsf {negl} ({\lambda })\).

Proof

The only difference between the two hybrids is that in \(\mathsf {Hybrid} _0\), the OT receiver messages in the extractable commitment scheme used in the underlying protocol \(\varSigma \) are generated as uniformly random strings while in \(\mathsf {Hybrid} _1\), they are generated by running the algorithm \(\mathsf {OT} _1\) on behalf of the OT receiver. It is easy to see that if the difference in the adversary’s success probability in breaking soundness between these two hybrids is non-negligible, we can break the pseudorandomness of receiver messages property (Property 1) of the underlying two message OT protocol, which is a contradiction.    \(\square \)

Lemma 6

Assuming the frequency of extraction property and the indistinguishability of extractable transcripts property of the extractable commitment scheme, there exists a polynomial \(p(\cdot )\) such that \(\mathrm {Pr} [{\mathbf{E}} \text { occurs in }\mathsf {Hybrid} _1] \ge \epsilon \cdot \frac{1}{p(\lambda )},\) where the probability is over the randomness of V, and where \(\epsilon = \lambda ^{-\log ^{1/2} \lambda }\) is the extraction probability of the underlying commitment scheme.

Proof

Fix \(x \not \in L\). We will consider a reduction \({\mathcal B} \) that interacts with the adversary and relies on the frequency of extraction property and the indistinguishability of extractable transcripts property of the extractable commitment scheme to prove the lemma.

\({\mathcal B} \) interacts with a challenger \(\mathsf {Ch}\) for the commitment scheme and receives a first round message \(\mathsf {com}_1\) for the \(\ell \)-extractable commitment scheme. It then interacts with the adversary \({\mathcal A} \) as the verifier in the ZAP protocol, setting \(\mathsf {com}_1\) as its message on behalf of the receiver in the underlying commitment scheme, and sampling the hash key \(K \leftarrow {\mathcal H}.\mathsf {Setup}(1^\lambda ,0^\ell )\). After completing the protocol execution with \({\mathcal A} \), \({\mathcal B} \) forwards the commitments sent by \({\mathcal A} \) as its message \(\mathsf {com}_2\) of the commitment scheme to the challenger \(\mathsf {Ch}\). Further, \({\mathcal B} \) outputs 1 in its interaction with \(\mathsf {Ch}\) if the proof provided by \({\mathcal A} \) verifies, and 0 otherwise.

Let \(\tau \) denote the transcript of the ZAP protocol and \(\tau _{\mathcal C} \) the transcript of the underlying commitment scheme. Let \(\mathsf {state}_{{\mathcal R}}\) be the state of the receiver in the commitment scheme as sampled by the challenger \(\mathsf {Ch}\).

First, we observe that by Lemma 5, there exists a polynomial \(p(\cdot )\) such that adversary \({\mathcal A} \) breaks the soundness property in \(\mathsf {Hybrid} _1\) with non-negligible probability \(\frac{1}{p(\lambda )}\). This implies that \(\mathrm {Pr} [{\mathcal B} (\tau _{\mathcal C})=1] \ge \frac{1}{p(\lambda )}\) over the random coins of \({\mathcal B},\mathsf {Ch}\). This gives us the following equation.

$$\begin{aligned} \mathrm {Pr} [{\mathcal B} (\tau _{\mathcal C})=1] ={}&\mathrm {Pr} [{\mathcal B} (\tau _{\mathcal C})=1 \mid {\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ]\cdot \mathrm {Pr} [{\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ] \\&+ \mathrm {Pr} [{\mathcal B} (\tau _{\mathcal C})=1 \mid {\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) = \bot ]\cdot \mathrm {Pr} [{\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) = \bot ] \ge \frac{1}{p(\lambda )} \end{aligned} \tag{1}$$

From the indistinguishability of extractable transcripts property, we have that:

$$\begin{aligned} \big |\mathrm {Pr} [{\mathcal B} (\tau _{\mathcal C})=1 \mid {\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ] - \mathrm {Pr} [{\mathcal B} (\tau _{\mathcal C})=1 \mid {\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) = \bot ] \big |= \mathsf {negl} (\lambda ) \end{aligned} \tag{2}$$

From the frequency of extraction property, we have that :

$$\begin{aligned} \mathrm {Pr} [ {\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ] \ge \epsilon \end{aligned} \tag{3}$$

where all equations are over the random coins of the challenger \(\mathsf {Ch}\) and reduction \({\mathcal B} \). Combining Eqs. (1) and (2) implies that there exists a polynomial \(q(\cdot )\) such that \(\mathrm {Pr} [{\mathcal B} (\tau _{\mathcal C})=1 \mid {\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ] \ge \frac{1}{q(\lambda )},\) which, by Eq. (3), implies that

$$\begin{aligned} \mathrm {Pr} [{\mathcal B} (\tau _{\mathcal C})=1 \mathrel {\wedge }&{\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ] \\&= \mathrm {Pr} [{\mathcal B} (\tau _{\mathcal C})=1 \mid {\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ] \mathrel {\cdot } \mathrm {Pr} [{\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ] \\&\ge \frac{1}{q(\lambda )} \cdot \epsilon . \end{aligned}$$

Thus we have \(\mathrm {Pr} [{\mathbf{E}} \text { occurs in } \mathsf {Hybrid} _1] \ge \epsilon \cdot \frac{1}{q(\lambda )}. \) This completes the proof of the Lemma.    \(\square \)
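The step that combines Eqs. (1) and (2) can be spelled out as follows. This is a sketch of one way to fill it in: the shorthand \(\alpha \), \(\beta \) and \(\pi \) is introduced here only for readability, and \(q(\lambda ) = 2p(\lambda )\) is one admissible choice rather than the only one. Write \(\alpha = \mathrm {Pr} [{\mathcal B} (\tau _{\mathcal C})=1 \mid {\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ]\), \(\beta = \mathrm {Pr} [{\mathcal B} (\tau _{\mathcal C})=1 \mid {\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) = \bot ]\) and \(\pi = \mathrm {Pr} [{\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ]\). Then

$$\begin{aligned} \frac{1}{p(\lambda )}&\le \alpha \cdot \pi + \beta \cdot (1-\pi ) \quad \text {(by Eq. (1))} \\&\le \alpha \cdot \pi + \big (\alpha + \mathsf {negl} (\lambda )\big ) \cdot (1-\pi ) \quad \text {(by Eq. (2))} \\&\le \alpha + \mathsf {negl} (\lambda ). \end{aligned}$$

Hence \(\alpha \ge \frac{1}{p(\lambda )} - \mathsf {negl} (\lambda ) \ge \frac{1}{2p(\lambda )}\) for all sufficiently large \(\lambda \).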

Lemma 7

Assuming the statistical indistinguishability of hash keys of the correlation intractable hash function, there exists a polynomial \(p(\cdot )\) such that

$$\mathrm {Pr} [{\mathbf{E}} \text { occurs in }\mathsf {Hybrid} _2] \ge \epsilon \cdot \frac{1}{p(\lambda )},$$

where the probability is over the randomness of V, and where \(\epsilon = \lambda ^{- \log ^{1/2} \lambda }\) is the extraction probability of the underlying commitment.

Proof

Assume for the sake of contradiction that the lemma is not true. We will show that we can break the statistical indistinguishability of hash keys property of the correlation intractable hash function.

We will design a reduction \({\mathcal B} \) that interacts with \({\mathcal A} \), where \({\mathcal B} \) acts as verifier in the above ZAP protocol. \({\mathcal B} \) interacts with a challenger \(\mathsf {Ch}\) for the correlation intractable hash function. Initially, \({\mathcal B} \) samples the first round message q for the underlying Sigma protocol just as in \(\mathsf {Hybrid} _1\), along with associated receiver state \(\mathsf {state}_{\mathcal R} \) for the commitment scheme, and sends both to \(\mathsf {Ch}\). \({\mathcal B} \) obtains a hash key K sampled either uniformly at random (as in \(\mathsf {Hybrid} _1\)) or by running the setup algorithm of the hash function as described in \(\mathsf {Hybrid} _2\). \({\mathcal B} \) uses this key K in its interaction with the adversary \({\mathcal A} \) and completes executing the ZAP protocol. Observe that if \(\mathsf {Ch}\) sampled a hash key uniformly at random, the interaction between \({\mathcal A} \) and \({\mathcal B} \) is identical to \(\mathsf {Hybrid} _1\), and if \(\mathsf {Ch}\) sampled a hash key as described in \(\mathsf {Hybrid} _2\), the interaction between \({\mathcal A} \) and \({\mathcal B} \) is identical to \(\mathsf {Hybrid} _2\).

Now, \({\mathcal B} \) tests if event \({\mathbf{E}} \) occurs. That is, it checks if the ZAP protocol verifies and if so, runs the extractor \({\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}})\) using the transcript \(\tau _{\mathcal C} \) for the commitment scheme. If the extractor \({\mathcal E} \) does not output \(\bot \), then event \({\mathbf{E}} \) occurs and \({\mathcal B} \) guesses that the hash key was uniformly sampled in its interaction with the challenger \(\mathsf {Ch}\). Otherwise, it guesses that the hash key was not uniformly sampled. Thus, if the event \({\mathbf{E}} \) occurs with probability \(\ge \epsilon \cdot \frac{1}{p(\lambda )}\) in \(\mathsf {Hybrid} _1\), and occurs with probability at most \(\epsilon \cdot \mathsf {negl} (\lambda )\) in \(\mathsf {Hybrid} _2\), \({\mathcal B} \) can distinguish between the hash keys with advantage \(\frac{\epsilon }{q(\lambda )}\) for some polynomial q. This is a contradiction, and this completes the proof of the lemma.    \(\square \)
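The advantage claimed in the last step can be made explicit. The following is a sketch under the two bounds used in the proof (the lower bound on \({\mathbf{E}} \) in \(\mathsf {Hybrid} _1\) and the assumed upper bound in \(\mathsf {Hybrid} _2\)); taking \(q(\lambda ) = 2p(\lambda )\) is one admissible choice. Since \({\mathcal B} \) guesses "uniform" exactly when \({\mathbf{E}} \) occurs, its advantage is

$$\begin{aligned} \mathrm {Pr} [{\mathbf{E}} \text { occurs in } \mathsf {Hybrid} _1] - \mathrm {Pr} [{\mathbf{E}} \text { occurs in } \mathsf {Hybrid} _2]&\ge \epsilon \cdot \frac{1}{p(\lambda )} - \epsilon \cdot \mathsf {negl} (\lambda ) \\&= \epsilon \cdot \Big (\frac{1}{p(\lambda )} - \mathsf {negl} (\lambda )\Big ) \\&\ge \frac{\epsilon }{2p(\lambda )} \end{aligned}$$

for all sufficiently large \(\lambda \).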

Lemma 8

Assuming the quasi-polynomial correlation intractable property of the hash function, the soundness of the underlying protocol \(\varSigma \) and the correctness of extraction of the extractable commitment scheme,

$$\mathrm {Pr} [{\mathbf{E}} \text { occurs in }\mathsf {Hybrid} _2] \le \epsilon \cdot \mathsf {negl} (\lambda ).$$

Proof

Suppose the claim is not true. This implies that \(\mathrm {Pr} [V(\tau )=1 \wedge {\mathcal E} (\tau _{\mathcal C}, \mathsf {state}_{{\mathcal R}}) \ne \bot ] \ge \epsilon \cdot \frac{1}{p(\lambda )} \) for some polynomial p. Let us consider any transcript on which event \({\mathbf{E}} \) occurs. Let \((q, K)\) denote the verifier’s message and \((x, a, e, z)\) denote the prover’s message. Then, from the correctness of the ZAP protocol, it must be the case that \((q, a, e, z)\) verifies according to protocol \(\varSigma \) and \(e = H(K,q,x,a)\). Further, since the extractor \({\mathcal E} \) succeeds on this transcript, the commitment scheme is statistically binding. Therefore, we can invoke the special soundness of the underlying modified Blum \(\varSigma \) protocol (as in the case of the regular Blum protocol) to state that for the statement \(x \notin L\) and prefix \((q, a)\) there can exist at most one pair \((e^*,z^*)\) such that \((q,a,e^*,z^*)\) verifies successfully. Therefore, the adversary’s message e must be equal to this value \(e^*\).

Now, from the description of the relation R used in defining the hash key K in \(\mathsf {Hybrid} _2\), we observe that, by the correctness of extraction, \(f_{bad}(q,x,a) = e^* = H(K,q,x,a)\). Thus, for any transcript that satisfies the conditions in event \({\mathbf{E}} \), \(f_{bad}(q,x,a) = e^* = H(K,q,x,a)\).

Thus, we can build a reduction \({\mathcal B} \) that, using the adversary \({\mathcal A} \), produces \((x, q, a)\) such that \(f_{bad}(q,x,a) = e^* = H(K,q,x,a)\) with probability at least \(\epsilon \cdot \frac{1}{p({\lambda })} = \frac{1}{\lambda ^{\log ^{1/2} \lambda } \cdot p({\lambda })}\). Since by Definition 3 the advantage of any polynomial-time adversary in this game must be at most \(\frac{1}{{\lambda } ^{\log {\lambda }}}\), this yields a contradiction.

   \(\square \)

Note that Lemmas 7 and 8 contradict each other, and therefore the adversary does not break soundness in the real experiment. This completes the proof of soundness.    \(\square \)
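The quantitative gap behind the contradiction in Lemma 8 can be checked numerically. The sketch below compares, in log scale, the reduction's success probability \(\epsilon \cdot \frac{1}{p(\lambda )}\) with the bound \(\frac{1}{\lambda ^{\log \lambda }}\) from Definition 3. It assumes base-2 logarithms and a hypothetical polynomial \(p(\lambda ) = \lambda ^d\); the function names and the parameter `degree` are illustrative and are not part of the construction.

```python
import math


def log2_epsilon(lam: int) -> float:
    """log2 of the extraction probability eps = lam^(-sqrt(log lam)).

    Assumption: logarithms are taken base 2.
    """
    log_lam = math.log2(lam)
    return -log_lam * math.sqrt(log_lam)


def log2_ci_bound(lam: int) -> float:
    """log2 of the correlation-intractability bound lam^(-log lam)."""
    log_lam = math.log2(lam)
    return -log_lam * log_lam


def log2_reduction_success(lam: int, degree: int) -> float:
    """log2 of eps / p(lam) for the hypothetical polynomial p(lam) = lam^degree."""
    return log2_epsilon(lam) - degree * math.log2(lam)


def exceeds_ci_bound(lam: int, degree: int) -> bool:
    """True when eps / p(lam) is strictly larger than lam^(-log lam)."""
    return log2_reduction_success(lam, degree) > log2_ci_bound(lam)


if __name__ == "__main__":
    # Print both quantities (in log2 scale) for lam = 2^k and p(lam) = lam^3.
    for k in (4, 16, 64, 256):
        lam = 2 ** k
        print(k, log2_reduction_success(lam, degree=3), log2_ci_bound(lam))
```

With \(\lambda = 2^k\) the comparison reduces to \(\sqrt{k} + d < k\), so the reduction's success probability exceeds the bound once \(\lambda \) is large enough relative to the degree d, which is all the asymptotic argument requires.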

Statistical Witness Indistinguishability. Let \({\mathcal A} \) denote the unbounded time adversarial verifier and \(\mathsf {Ch}\) denote the challenger. Let x be the challenge instance of length \({\lambda } \) and \(w_0\) and \(w_1\) be a pair of witnesses for \(x \in L\). Consider a pair of hybrids where the first hybrid \(\mathsf {Hybrid} _0\) corresponds to \(\mathsf {Ch}\) running the honest prover algorithm with witness \(w_0\) being used and the second hybrid \(\mathsf {Hybrid} _1\) corresponds to \(\mathsf {Ch}\) running the honest prover algorithm with witness \(w_1\) being used. We now show that these two hybrids are statistically indistinguishable to complete the proof.

Claim

Assuming the \(\varSigma \)-protocol is statistically witness indistinguishable, \(\mathsf {Hybrid} _0\) is statistically indistinguishable from \(\mathsf {Hybrid} _1\).

Proof

We now show that if there exists an unbounded time adversary \({\mathcal A} \) for which the two hybrids are not statistically indistinguishable, we can build a reduction \({\mathcal B} \) that breaks the witness indistinguishability of the underlying modified Blum Sigma protocol, contradicting Lemma 4. \({\mathcal B} \) acts as the challenger in its interaction with the adversary \({\mathcal A} \) that is trying to distinguish between these two hybrids. Further, \({\mathcal B} \) acts as the adversary in its interaction with a challenger \({\mathcal C} \) in trying to break the WI property of the modified Blum Sigma protocol. Initially, \({\mathcal A} \) sends a statement x, a pair of witnesses \((w_0,w_1)\) and a first round message \((q, K)\) for the above ZAP construction. \({\mathcal B} \) forwards \((x,w_0,w_1)\) to the challenger \({\mathcal C} \) and sends q as its first message of the underlying protocol \(\varSigma \). \({\mathcal C} \) responds with its round two message a on behalf of the prover. \({\mathcal B} \) computes \(e\leftarrow \mathcal {H}.\mathsf {Eval}(K,x,(q,a))\) and sends it to \({\mathcal C} \). Finally, \({\mathcal C} \) responds with the last round message z on behalf of the prover. Now, \({\mathcal B} \) sends the tuple \((x, a, e, z)\) to \({\mathcal A} \) as the prover message for the above ZAP protocol. Observe that if the challenger \({\mathcal C} \) interacted using witness \(w_0\), then the interaction between the reduction \({\mathcal B} \) and the adversary \({\mathcal A} \) is identical to \(\mathsf {Hybrid} _0\), and if the challenger \({\mathcal C} \) interacted using witness \(w_1\), then the interaction between the reduction \({\mathcal B} \) and the adversary \({\mathcal A} \) is identical to \(\mathsf {Hybrid} _1\). Thus, if these two hybrids are not statistically indistinguishable to \({\mathcal A} \), \({\mathcal B} \) can output the same guess as \({\mathcal A} \) and thereby break the statistical witness indistinguishability property of the protocol \(\varSigma \), which is a contradiction.    \(\square \)
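The final step is an instance of the fact that applying a fixed function cannot increase statistical distance. As a sketch, fix the choices \((x,w_0,w_1)\) and \((q, K)\) made by \({\mathcal A} \), write \(T_b\) for the transcript \((q,a,e,z)\) of \(\varSigma \) when \({\mathcal C} \) uses witness \(w_b\), and write \(g\) for the map that \({\mathcal B} \) applies to obtain the ZAP prover message \((x,a,e,z)\). The symbols \(T_b\), \(g\) and \(\Delta \) (statistical distance) are shorthand introduced here for illustration. Then

$$\begin{aligned} \Delta \big (\mathsf {Hybrid} _0, \mathsf {Hybrid} _1\big ) = \Delta \big (g(T_0), g(T_1)\big ) \le \Delta (T_0, T_1) \le \mathsf {negl} (\lambda ), \end{aligned}$$

where the last inequality is the statistical witness indistinguishability of \(\varSigma \).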

5.3 Statistical SPS Zero Knowledge

We achieve the following theorem:

Theorem 4

For any \(c > 0\), there exists a two-message public-coin \(T_\mathsf {Sim}\)-SPS statistical zero knowledge argument system for NP in the plain model, where \(T_\mathsf {Sim}= 2^{\lambda ^{c}}\), assuming two-message oblivious transfer (OT) that is subexponentially secure against malicious senders, and quasi-polynomially correlation intractable hash functions.

Note that we can instantiate the CI hash function and the OT protocol assuming subexponential LWE. We refer to the full version of the paper for the proof of Theorem 4.