1 Introduction

The problem of proving set membership—that a given element x belongs to some set S—arises in many applications, including governmental white-lists to prevent terrorism or money-laundering, voting and anonymous credentials, among others. More recently, this problem also appears at the heart of currency transfer and identity systems over blockchains. In this setting, parties can first publicly commit to sets of data (through the blockchain itself) and then, by proving set membership, can claim ownership of assets or existence of identity attributes, while ensuring privacy.

A naive approach to check if an element is in a set is to go through all its entries. The complexity of this approach, however, is unacceptable in many scenarios. This is especially true for blockchains, where most of the parties (the verifiers) should run quickly.

How, then, can set membership be verified efficiently? Cryptographic accumulators [6] provide a nice solution to this problem. They allow a set of elements to be compressed into a short value (the accumulator) and to generate membership proofs that are short and fast to verify. As a security guarantee, it should be computationally infeasible to generate a false membership proof.
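To make the accumulator interface concrete, the following is a minimal Python sketch of one classical instantiation, the RSA accumulator (one of the families discussed below). It is an illustration under stated assumptions, not the implementation evaluated in this paper: the modulus is a toy value that offers no security, the set elements are small primes, and the function names `accumulate`, `membership_witness` and `verify_membership` are hypothetical.

```python
from math import prod

# Toy strong RSA modulus N = 23 * 47 (23 = 2*11 + 1, 47 = 2*23 + 1).
# Far too small to be secure; chosen only so the arithmetic is easy to follow.
N = 23 * 47
G = 4  # a quadratic residue modulo N, used as the accumulator base


def accumulate(elements):
    """Compress a set of primes into one group element G^(product) mod N."""
    return pow(G, prod(elements), N)


def membership_witness(elements, x):
    """Witness for x: the accumulator of all elements except x."""
    if x not in elements:
        raise ValueError("x is not in the set")
    return pow(G, prod(e for e in elements if e != x), N)


def verify_membership(acc, x, witness):
    """Accept iff witness^x equals the accumulator value."""
    return pow(witness, x, N) == acc


S = [3, 5, 7]
acc = accumulate(S)
w = membership_witness(S, 5)
print(verify_membership(acc, 5, w))   # honest witness for a member
print(verify_membership(acc, 13, w))  # same witness, element not in S
```

Both the accumulator and the witness are single integers modulo N, regardless of how many elements are accumulated, and verification is a single modular exponentiation.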

As of today, we can divide constructions for accumulators into three main categories: Merkle Trees [55]; RSA-based [2, 11, 16, 50]; pairing-based [17, 32, 57, 78]. Approaches based on Merkle Trees (Footnote 1) allow for short (i.e., O(1)) public parameters and accumulator values, whereas the witness for membership proofs is of size \(\log (n)\), where n is the size of the set. In RSA-based constructions (which can actually be generalized to any group of unknown order [48], including class groups) both the accumulator and the witness are each a single element in a relatively large hidden-order group \(\mathbb {G}\) (Footnote 2), and thus of constant size. Schemes that use pairings in elliptic curves such as [17, 57] offer small accumulators and small witnesses (which can each be a single element of a prime order bilinear group, e.g., 256 bits) but require large parameters (approximately O(n)) and a trusted setup.
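The \(\log (n)\) witness size of Merkle Trees can be seen in the following Python sketch, which builds a binary hash tree, opens an authentication path and verifies it. This is a simplified illustration: it assumes the number of leaves is a power of two, uses SHA-256 from the standard library, and the helper names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from the hashed leaves up to the root.

    Assumes len(leaves) is a power of two.
    """
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def open_path(levels, index):
    """Collect the sibling hash at every level below the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling of the current node
        index //= 2
    return path


def verify_path(root, leaf, index, path):
    """Recompute the root from a leaf and its authentication path."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


leaves = [bytes([i]) for i in range(8)]
levels = build_levels(leaves)
root = levels[-1][0]
path = open_path(levels, 5)
print(len(path), verify_path(root, leaves[5], 5, path))
```

The accumulator value is the single root hash, while the witness contains one hash per tree level, i.e., \(\log _2 n\) hashes for n leaves.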

In anonymous cryptocurrencies, e.g. Zerocash [5] (but also in other applications such as Anonymous Credentials [22] and whitelists), we also require privacy. That is, parties in the system would not want to disclose which element in the set is being used to prove membership. Phrased differently, one desires to prove that \(u \in S\) without revealing u, or: the proof should be zero-knowledge [45] for u. As an example, in Zerocash users want to prove that a coin exists (i.e. belongs to the set of previously sent coins) without revealing which coin it is that they are spending.

In practice it is common that this privacy requirement goes beyond proving membership. In fact, these applications often require proving further properties about the accumulated elements, e.g., that for some element u in the set, property P(u) holds. This should be done without leaking any information about u beyond what is entailed by P. In other words, we desire zero-knowledge for the statement \(R^*(S, u) :=``u \in S \text { and } P(u)"\).

One way to solve the problem, as done in Zerocash, is to directly apply general-purpose zero-knowledge proofs for \(R^*\), e.g., [46, 61]. This approach, however, tends to be expensive and ad-hoc. One of the questions we aim to tackle is that of providing more efficient proof systems for set membership relations that can also be modular.

Specifically, as observed in [18], the design of practical proof systems can benefit from a more modular vision. A modular framework such as the one in [18] not only allows for separation of concerns, but also increases reusability and compatibility in a plug-and-play fashion: the same proof system is designed once and can be reused for the same sub-problem regardless of the context (Footnote 3); it can be replaced with a component for the same sub-problem at any time. Also, as [18] shows, this can have a positive impact on efficiency since designing a special-purpose proof system for a specific relation can lead to significant optimizations. Finally, this compositional approach can also be leveraged to build general-purpose proof systems.

In this work we focus on applying this modular vision to designing succinct zero-knowledge proofs for set membership. Following the abstract framework in [18] we investigate how to apply commit-and-prove techniques [20] to our setting. Our approach uses commitments for composability as follows. Consider an efficient zero-knowledge proof system \(\Pi \) for property P(u). Let us also assume it is commit-and-prove, i.e. the verifier can test P(u) by simply holding a commitment \(c(u)\) to u. Such \(\Pi \) could be for example a commit-and-prove NIZK such as Bulletproofs [13] or a commit-and-prove zkSNARK such as LegoGroth16 from [18] that are able to operate on Pedersen commitments \(c(\cdot )\) over elliptic curves. In order to obtain a proof gadget for set membership, all one needs to design is a commit-and-prove scheme for the relation “\(u \in S\)” where both u and S are committed: u through \(c(u)\) and S through some other commitment for sets, such as an accumulator.

Our main contribution is to propose a formalization of this approach and new constructions of succinct zero-knowledge commit-and-prove systems for set membership. In addition, as we detail later, we also extend our results to capture proofs of non-membership, i.e., to show that \(u \notin S\). For our constructions we focus on designing schemes where \(c(u)\) is a Pedersen commitment in a prime order group \(\mathbb {G}_{q}\). We focus on linking through Pedersen commitments as these can be (re)used in some of the best state-of-the-art zero-knowledge proof systems for general-purpose relations that offer for example the shortest proofs and verification time (see, e.g., [46] and its efficient commit-and-prove variant [18]), or transparent setup and logarithmic-size proofs [13].
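Since Pedersen commitments are the linking element throughout, a minimal Python sketch of such a commitment may help. It is only an illustration: instead of an elliptic curve it uses the order-q subgroup of squares modulo a small safe prime, the generators are fixed constants (in a real instantiation the discrete logarithm of one generator with respect to the other must be unknown), and all names are hypothetical.

```python
import secrets

# Toy prime order group: the squares modulo the safe prime P = 2*Q + 1.
# These parameters are far too small to be secure.
P = 2039
Q = 1019
G, H = 4, 9  # two squares modulo P, hence elements of the order-Q subgroup


def commit(u, r=None):
    """Return (c, r) where c = G^u * H^r mod P and r is the opening."""
    if r is None:
        r = secrets.randbelow(Q)
    return (pow(G, u, P) * pow(H, r, P)) % P, r


def verify_opening(c, u, r):
    """Check that (u, r) is a valid opening of the commitment c."""
    return c == (pow(G, u, P) * pow(H, r, P)) % P


c1, r1 = commit(42)
c2, r2 = commit(100)
print(verify_opening(c1, 42, r1))
# Commitments are additively homomorphic in both the value and the opening.
print(verify_opening((c1 * c2) % P, (42 + 100) % Q, (r1 + r2) % Q))
```

The commitment is a single group element, so a proof system that operates on \(c(u)\) never needs to see u itself; this is what allows two different proof systems to be linked through the same commitment.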

Before describing our results in more detail, we review existing solutions and approaches to realize commit-and-prove zkSNARKs for set membership.

1.1 Existing approaches for proving set membership for Pedersen commitments

The accumulator of Nguyen [57], by the simple fact of having a succinct pairing-based verification equation, can be combined with standard zero-knowledge proof techniques (e.g., Sigma protocols or the celebrated Groth–Sahai proofs [47]) to achieve a succinct system with reasonable proving and verification time. The main drawbacks of using [57], however, are the large public parameters (i.e. requiring as many prime order group elements as there are elements in the set) and a high cost for updating the accumulator when elements are added to or removed from the set (essentially requiring recomputing the accumulator from scratch).

By using general-purpose zkSNARKs one can obtain a solution with constant-size proofs based on Merkle Trees: prove that there exists a valid path which connects a given leaf to the root; this requires proving correctness of about \(\log n\) hash function computations (e.g., SHA256). This solution yields a constant-size proof and requires \(\log n\)-size public parameters if one uses preprocessing zkSNARKs such as [46, 61]. On the other hand, often when proving a relation such as \(R^*(S, u) :=``u \in S \text { and } P(u)''\) the bulk of the work stems from the set membership proof. This is the case in Zcash or Filecoin (Footnote 4) where the predicate \(P(\cdot )\) is sufficiently small.

Finally, another solution that admits constant-size public parameters and proofs is the protocol of [16]. Specifically, Camenisch and Lysyanskaya showed how to prove in zero-knowledge that an element \(u\) committed in a Pedersen commitment over a prime order group \(\mathbb {G}_{q}\) is a member of an RSA accumulator. In principle this solution would fit the criteria of the gadget we are looking for. Nonetheless, its concrete instantiations show a few limitations in terms of efficiency and flexibility. The main problem is that, for its security to hold, we need a prime order group (the commitment space) and the primes (the message space) to be quite large, for example \(q > 2^{519}\) (Footnote 5). But having such a large prime order group may be undesirable in practice for efficiency reasons. In fact, \(\mathbb {G}_{q}\) is also the group used to instantiate the other proof systems that need to interact and be linked with the Pedersen commitment.

1.2 Our contributions

We investigate the problem of designing commit-and-prove zero-knowledge systems for set membership and non-membership that can be used in a modular way and efficiently composed with other zero-knowledge proof systems for potentially arbitrary relations. Our main results are the following.

First, building upon the view of recent works on composable proofs [1, 18], we define a formal framework for commit-and-prove zkSNARKs (CP-SNARKs) for set (non-)membership. The main application of this framework is a compiler that, given a CP-SNARK \(\textsf {CP}_{\textsf{mem}}\) for set membership and any other CP-SNARK \(\textsf {CP}_{R}\) for a relation R, yields a CP-SNARK \(\textsf {CP}\) for the composed relation “\(u\in S \wedge \exists \omega : R(u, \omega )\)”. As a further technical contribution, our framework extends the one in [18] in order to work with commitments from multiple schemes (including set commitments, e.g., accumulators).

Second, we propose new efficient constructions of CP-SNARKs for set membership and non-membership, in which elements of the accumulated set can be committed with a Pedersen commitment in a prime order group \(\mathbb {G}_{q}\)—a setting that, as argued before, is of practical relevance due to the widespread use of these commitments and of proof systems that operate on them. In more detail, we propose: four schemes (two for set membership and two for non-membership) that enjoy constant-size public parameters and are based on RSA accumulators for committing to sets, and a scheme over pairings that has public parameters linear in the size of the set, but where the set can remain hidden.

Finally, we implement our solutions in a software library and experimentally evaluate their performance.

Like the recent works [1, 18], our work can be seen as showing yet another setting—set membership—where the efficiency of SNARKs can benefit from a modular design.

1.3 RSA-based constructions

Our first scheme, a CP-SNARK for set membership based on RSA accumulators, supports a large domain for the set of accumulated elements, represented by binary strings of a given length \(\eta \). Our second scheme, also based on RSA accumulators, supports elements that are prime numbers of exactly \(\mu \) bits (for a given \(\mu \)). Neither scheme requires an a-priori bound on the cardinality of the set. Both schemes improve the proof-of-knowledge protocol by Camenisch and Lysyanskaya [16]: (i) we can work with a prime order group \(\mathbb {G}_{q}\) of “standard” size, e.g., 256 bits, whereas [16] needs a much larger \(\mathbb {G}_{q}\) (see above). We note that the size of \(\mathbb {G}_{q}\) affects not only the efficiency of the set membership protocol but also the efficiency of any other protocol that needs to interact with commitments to alleged set members; (ii) we can support flexible choices for the size of set elements. For instance, in the second scheme, we could work with primes of about 50 or 80 bits (Footnote 6), which in practice captures virtually unbounded sets and can make the accumulator operations 4–\(5\times \) faster compared to using \(\approx 256\)-bit primes as in [16].

Our main technical contribution here involves a new way to link a proof of membership for RSA accumulators to a Pedersen commitment in a prime order group, together with a careful analysis showing this can be secure under parameters not requiring a larger prime order group (as in [16]). See Sect. 4 for further details.

1.4 Pairing-based construction

Our pairing-based scheme for set membership supports set elements in \(\mathbb {Z}_q\), where \(q\) is the order of bilinear groups, while the sets are arbitrary subsets of \(\mathbb {Z}_q\) of cardinality less than a fixed a-priori bound n. This scheme has the disadvantage of having public parameters linear in n, but has other advantages in comparison to previous schemes with a similar limitation (and also in comparison to the RSA-based schemes above). First, the commitment to the set can be hiding and untrusted for the verifier, i.e., the set can be kept hidden and there is no need to check the opening of the commitment to the set; this makes it composable with proof systems that could for example prove global properties on the set, i.e., that P(S) holds. Second, the scheme works entirely in bilinear groups, i.e., there is no need to operate over RSA groups. The main technical contribution here is a technique to turn the EDRAX vector commitment [23] into an accumulator admitting efficient zero-knowledge membership proofs.

1.5 Extensions to set non-membership

We propose extensions of both our CP-SNARK framework and RSA constructions to deal with proving set non-membership, namely proving in zero-knowledge that \(u \notin S\) with respect to a commitment \(c(u)\) and a committed set S. Our two RSA-based schemes for non-membership have the same features as the analogous membership schemes mentioned above: the first scheme supports sets whose elements are strings of length \(\eta \), the second one supports elements that are prime numbers of \(\mu \) bits, and both work with elements committed using Pedersen in a prime order group and sets committed with RSA accumulators. A byproduct of sharing the same parameters is that we can easily compose the set-membership and non-membership schemes, via our framework, in order to prove statements like \(u \in S_1 \wedge u \notin S_2\). Our technical contribution in the design of these schemes is a zero-knowledge protocol for non-membership witnesses of RSA accumulators that is linked to Pedersen commitments in prime order groups.
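For readers unfamiliar with non-membership witnesses of RSA accumulators, the following Python sketch shows the standard Bézout-coefficient approach in the clear (without any zero-knowledge layer and without the link to a Pedersen commitment, which are the actual contributions here). The modulus is a toy value with no security, and the function names are hypothetical. Negative exponents in `pow` require Python 3.8 or later.

```python
from math import prod

N = 23 * 47  # toy strong RSA modulus, insecure
G = 4        # accumulator base, a quadratic residue modulo N


def egcd(a, b):
    """Return (d, s, t) with s*a + t*b == d == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    d, s, t = egcd(b, a % b)
    return d, t, s - (a // b) * t


def accumulate(elements):
    return pow(G, prod(elements), N)


def nonmembership_witness(elements, u):
    """For a prime u not in the set, find a, b with a*prod + b*u = 1
    and output (a, G^b mod N)."""
    d, a, b = egcd(prod(elements), u)
    if d != 1:
        raise ValueError("u shares a factor with the accumulated product")
    return a, pow(G, b, N)  # b may be negative (modular inverse is used)


def verify_nonmembership(acc, u, witness):
    """Accept iff acc^a * B^u == G mod N."""
    a, big_b = witness
    return (pow(acc, a, N) * pow(big_b, u, N)) % N == G


S = [3, 5, 7]
acc = accumulate(S)
wit = nonmembership_witness(S, 13)
print(verify_nonmembership(acc, 13, wit))
```

The check succeeds because \(\textsf{acc}^a \cdot B^u = G^{a \cdot \prod S + b \cdot u} = G\); for an element of the set no such coefficients exist, since u then divides the accumulated product.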

1.6 Implementation and experiments

We have implemented our RSA-based (Footnote 7) schemes for membership and non-membership as a Rust library which is publicly available [28]. Our library is implemented in a modular fashion to work with any elliptic curve from libzexe [67] and Ristretto from curve25519-dalek [54]. This choice enables everyone to easily and efficiently combine our CP-SNARKs in a modular way with other CP-SNARKs implemented over these elliptic curves, such as Bulletproofs [13] and LegoGroth16 [18].

We evaluated our RSA-based constructions and compared them against highly optimized solutions based on Merkle Trees (Footnote 8). Our schemes achieve significantly better performance in proving time while slightly compromising on proof size and verification time. Our implementation is fast, yet we have not heavily optimized it and thus expect the results can be further improved.

Our solutions supporting sets of arbitrary elements achieve a proving time that is up to \(3.7\times \) faster (Footnote 9) for set membership (309 ms vs. 1.14 s) and up to \(7\times \) faster for set non-membership (325 ms vs. 2.28 s) (Footnote 10).

Our solutions where elements of the set are large prime numbers (i.e., of 252-bit size) offer even better results: our proving time is \(4.5\times \) to \(23.5\times \) faster for membership and \(6.8\times \) to \(36\times \) faster for non-membership (depending on the depth of the Merkle tree used in the comparison). We also show an optimization that, at the price of achieving computational (instead of statistical) zero-knowledge, is twice as fast (see Sect. 7.4). This scenario can for example capture the case of sets made of hiding commitments that are prime numbers. In Sect. 8 we discuss how this can be relevant for a slight variant of the Zerocash protocol where commitments can be made prime numbers.

More details on the implementation and the benchmarks are available in Sect. 7.

1.7 Transparent instantiations

We generalize our building blocks for RSA groups to any hidden-order group (Appendix 4). By instantiating the latter with class groups and by using a transparent CP-NIZK such as Bulletproofs, we obtain variants of our RSA-based schemes with transparent setup. Class groups are more expensive than traditional RSA groups; in this setting we still obtain performance (proving time 12 s; \(|\Pi | = 6.4\) kB) outperforming other transparent solutions for large Merkle trees, roughly \(2^{64}\) leaves (see [79, Fig. 5] which summarizes performances of transparent SNARKs used to prove Merkle tree computations using SHA256 as hash). These potential gains come at the price of a relatively longer verification (compared to other solutions): 6.4 s.

1.8 Other related work

Ozdemir et al. [58] recently proposed a solution to scale operations on RSA accumulators inside a SNARK. In particular, their approach scales when these operations are batched (i.e., when proving membership of many elements at the same time); for example, they surpass a \(2^{20}\)-large Merkle tree when proving batches of at least 600 elements. This approach is attractive in settings where we can delegate a large quantity of these checks to an untrusted server as there is a high constant proving cost. In contrast, our approach can achieve faster proving time than Merkle trees already for a single membership check. It is an interesting open problem to adapt our techniques for modular set (non-)membership for the case of batched membership while keeping the tested elements hidden.

1.9 Organization

We give basic definitions in Sect. 2. In Sect. 3 we formalize commit-and-prove zkSNARKs for set (non-)membership. We describe our main constructions based on RSA accumulators for set membership and non-membership respectively in Sects. 4 and 5. We describe our construction for set membership based on bilinear pairings in Sect. 6. Finally, in Sects. 7 and 8 we discuss our implementation, experiments and applications.

1.10 Recent developments

Here we mention recent developments in the area of zero-knowledge proofs for set (non-)membership, following the conference version of this paper published in 2021 [8].

A closely related work is that of Campanelli et al. [19] who present zero-knowledge protocols for RSA Accumulators with which one can prove membership for any number of Pedersen-committed elements (a so-called ‘batch proof’). That is, the size of the proofs of [19] is independent both of the size of the set and of the number of elements for which membership is proven.

In the bilinear groups setting, Srinivasan et al. [70], among other improvements on the functionalities and security properties of the actual pairing-based accumulator, provide zero-knowledge (batch) proofs for membership and non-membership over the Nguyen accumulator [57].

Another relevant, rapidly developing, line of work has to do with succinct zero-knowledge lookup arguments. That is, given a committed vector of n elements, one proves that a number m of committed elements are all values of the vector in some hidden position, while keeping the elements secret. The proofs are succinct in both n and m. This line of work was initiated by the seminal work of Zapico et al. [74] followed by a number of works improving the prover’s complexity [35, 42, 62, 75]. All these constructions work over bilinear groups.

Finally, Lipmaa and Parisella [53] (building on [24, 26]) construct succinct set (non-)membership NIZKs from falsifiable assumptions. That is, the objective of their work is constructing efficient NIZKs for set (non-)membership that can be proven secure in the standard model and assuming only falsifiable assumptions.

1.11 Publication note

This article is the long version of the homonymous paper that appeared in the proceedings of Financial Cryptography and Data Security 2021 [8]. This version additionally contains:

  • Sect. 1.10 on recent developments in the area (works subsequent to [8]).

  • The full definitional framework of CP-SNARKs for set (non-)membership (Sect. 3).

  • The pairing-based construction of Sect. 6.

  • Full security proofs of the RSA-based constructions (Sects. 4, 5).

  • An experimental evaluation of our RSA-based protocols (Sect. 7).

  • A (slightly) different variant of our non-membership protocol (Appendix 2).

  • A discussion on how to extend our RSA-based protocols to work with any Hidden Order Group (Appendix 4).

2 Preliminaries

2.1 Notation

We denote the security parameter with \(\lambda \in \mathbb {N}\) and its unary representation with \(1^\lambda \). Throughout the paper we assume that all the algorithms of the cryptographic schemes take as input \(1^\lambda \), which is thus omitted from the list of inputs. If D is a distribution, we denote by \(x \leftarrow D\) the process of sampling x according to D. An ensemble \(\mathcal {X} = \{X_{\lambda }\}_{\lambda \in \mathbb {N}}\) is a family of probability distributions over a family of domains \(\mathcal {D}=\{D_{\lambda }\}_{\lambda \in \mathbb {N}}\), and we say that two ensembles \(\mathcal {D} = \{D_{\lambda }\}_{\lambda \in \mathbb {N}}\) and \(\mathcal {D}' = \{D'_{\lambda }\}_{\lambda \in \mathbb {N}}\) are statistically indistinguishable (denoted by \(\mathcal {D} \approx _s\mathcal {D'}\)) if \(\frac{1}{2}\sum _x |D_{\lambda }(x)-D_{\lambda }'(x)| < \textsf{negl}(\lambda )\). If \(\mathcal {A}= \{ \mathcal {A}_{\lambda } \}\) is a (possibly non-uniform) family of circuits and \(\mathcal {D} = \{D_{\lambda }\}_{\lambda \in \mathbb {N}}\) is an ensemble, then we denote by \(\mathcal {A}(\mathcal {D})\) the ensemble of the outputs of \(\mathcal {A}_{\lambda }(x)\) when \(x \leftarrow D_{\lambda }\). We say two ensembles \(\mathcal {D} = \{D_{\lambda }\}_{\lambda \in \mathbb {N}}\) and \(\mathcal {D}' = \{D'_{\lambda }\}_{\lambda \in \mathbb {N}}\) are computationally indistinguishable (denoted by \(\mathcal {D} \approx _c\mathcal {D'}\)) if for every non-uniform polynomial time distinguisher \(\mathcal {A}\) we have \(\mathcal {A}(\mathcal {D}) \approx _s\mathcal {A}(\mathcal {D'})\).
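As a small worked example of the statistical distance used in the definition above, the following Python sketch computes \(\frac{1}{2}\sum _x |D(x)-D'(x)|\) exactly for two finite distributions. The representation of a distribution as a dictionary and the function name are assumptions made for illustration only.

```python
from fractions import Fraction


def statistical_distance(d0, d1):
    """Return (1/2) * sum_x |d0(x) - d1(x)| for two finite distributions,
    each given as a dict mapping outcomes to probabilities."""
    support = set(d0) | set(d1)
    total = sum(
        abs(Fraction(d0.get(x, 0)) - Fraction(d1.get(x, 0))) for x in support
    )
    return total / 2


uniform = {0: Fraction(1, 2), 1: Fraction(1, 2)}
biased = {0: Fraction(3, 4), 1: Fraction(1, 4)}
print(statistical_distance(uniform, biased))  # prints 1/4
```

Two ensembles are statistically indistinguishable when this quantity, viewed as a function of \(\lambda \), is negligible.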

We use [n] to denote the set of integers \(\{1, \dots , n\}\), and [0, n] for \(\{0, 1, \dots , n\}\). We denote by \((u_j)_{j \in [\ell ]}\) the tuple of elements \((u_1, \ldots , u_{\ell })\).

We denote by \(\textsf{Primes}:=\{e\in \mathbb {N}: e\text { is prime}\}\) the set of all primes, i.e., the positive integers \(e>1\) that have no non-trivial factors (i.e., no factors other than \(e\) and 1). More specifically, given two positive integers \(A, B > 0\) such that \(A<B\), we denote by \(\textsf{Primes}(A,B)\) the subset of \(\textsf{Primes}\) lying in the interval \((A, B)\), i.e., \(\textsf{Primes}(A,B) :=\{e\in \mathbb {Z}: e\text { is prime} \; \wedge \; A<e<B\}\). By the well-known prime number theorem, \(\left| \textsf{Primes}(1,B) \right| = O\big (\frac{B}{\log B}\big )\), which gives \(\left| \textsf{Primes}(A,B) \right| =O\big (\frac{B}{\log B}\big )-O\big (\frac{A}{\log A}\big )\).
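To give a feeling for the size of \(\textsf{Primes}(A,B)\), the following Python sketch enumerates the primes in an open interval with a sieve and prints the count next to the estimate \(\frac{B}{\ln B} - \frac{A}{\ln A}\). The function names and the chosen interval are illustrative assumptions; the estimate is only asymptotic and is not expected to match exactly.

```python
from math import log


def primes_in_open_interval(a, b):
    """Return the primes e with a < e < b, via a sieve of Eratosthenes."""
    if b <= 2:
        return []
    sieve = [True] * b  # sieve[i] tells whether i is prime, for 0 <= i < b
    sieve[0:2] = [False, False]
    for i in range(2, int(b ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [e for e in range(max(a + 1, 2), b) if sieve[e]]


def pnt_estimate(a, b):
    """Prime number theorem estimate B/ln B - A/ln A (requires a > 1)."""
    return b / log(b) - a / log(a)


# Primes of exactly mu bits lie strictly between 2^(mu-1) and 2^mu.
mu = 16
A, B = 2 ** (mu - 1), 2 ** mu
print(len(primes_in_open_interval(A, B)), round(pnt_estimate(A, B)))
```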

2.2 RSA groups

We say that \(N=pq\) is an RSA modulus if p, q are primes such that \(|p|=|q|\). We further say that N is a strong RSA modulus if there are primes \(p',q'\) such that \(p=2p'+1,q = 2q'+1\). We call \(\mathbb {Z}_N^*\), for an RSA modulus N, an RSA group. With \(\phi :\mathbb {N} \rightarrow \mathbb {N}\) we denote Euler’s totient function, \(\phi (N) :=|\mathbb {Z}_N^*|\). In particular, for an RSA modulus, \(\phi (N) = (p-1)(q-1)\). An RSA group generator \(N {\leftarrow }{\$}\,\textsf{GenSRSAmod}(1^{\lambda })\) is a probabilistic algorithm that outputs a strong RSA modulus N of bit-length \(\ell (\lambda )\) for an appropriate polynomial \(\ell (\cdot )\).

For any N we denote by \(\textsf{QR}_N :=\{Y: \exists X \in \mathbb {Z}_N^*\text { such that } Y=X^2 \pmod {N}\}\), the set of all the quadratic residues modulo N. \(\textsf{QR}_N\) is a subgroup (and thus closed under multiplication) of \(\mathbb {Z}_N^*\). For an RSA modulus \(N=pq\) with odd primes p, q, its order is \(|\textsf{QR}_N| = |\mathbb {Z}_N^*|/4\), since squaring is a 4-to-1 map on \(\mathbb {Z}_N^*\). In particular for a strong RSA modulus \(|\textsf{QR}_N| = \frac{4p'q'}{4} = p'q'\).
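The definitions of \(\mathbb {Z}_N^*\), \(\phi (N)\) and \(\textsf{QR}_N\) can be checked by brute force on a toy modulus. The following Python sketch does so for \(N = 7 \cdot 11\), which is a strong RSA modulus with \(p'=3\) and \(q'=5\); the modulus and the function names are illustrative assumptions, and enumeration is of course only feasible for such tiny values.

```python
from math import gcd


def units(n):
    """Elements of Z_n^*: residues in [1, n) coprime to n."""
    return [x for x in range(1, n) if gcd(x, n) == 1]


def quadratic_residues(n):
    """Elements of QR_n: squares of units modulo n."""
    return sorted({(x * x) % n for x in units(n)})


p, q = 7, 11  # p = 2*3 + 1 and q = 2*5 + 1
N = p * q
qr = quadratic_residues(N)

print(len(units(N)), (p - 1) * (q - 1))  # both values are phi(N)
print(len(qr))                           # order of QR_N
# Closure under multiplication, as expected of a subgroup.
print(all((a * b) % N in set(qr) for a in qr for b in qr))
```

Comparing the printed order of \(\textsf{QR}_N\) with \(p'\), \(q'\) and \(\phi (N)\) gives a direct sanity check of the subgroup order on this example.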

2.2.1 Computational assumptions in RSA groups

The most fundamental assumption for RSA groups is the factoring assumption which states that given an RSA modulus \(N \leftarrow \textsf{GenSRSAmod}(1^{\lambda })\) it is hard to compute its factors p and q. We further recall the Discrete Logarithm and strong RSA [2] assumptions:

Definition 2.1

(DLOG assumption for RSA groups) We say that the Discrete Logarithm (DLOG) assumption holds for \(\textsf{GenSRSAmod}\) if for any PPT adversary \(\mathcal {A}\):

$$\begin{aligned} \Pr \left[ \begin{aligned} G^{x'}=Y \pmod {N} \end{aligned} \;:\; \begin{aligned}&N \leftarrow \textsf{GenSRSAmod}(1^\lambda )\\&G {\leftarrow }{\$}\,\mathbb {Z}_N^*; x {\leftarrow }{\$}\,\mathbb {Z}\\&Y \leftarrow G^x \pmod {N}\\&x' \leftarrow \mathcal {A}(\mathbb {Z}_N^*,G,Y)\end{aligned}\right] = \textsf{negl}(\lambda ). \end{aligned}$$

Definition 2.2

(Strong-RSA assumption [2]) We say that the strong RSA assumption holds for \(\textsf{GenSRSAmod}\) if for any PPT adversary \(\mathcal {A}\):

$$\begin{aligned} \Pr \left[ \begin{aligned} U^{e}=G\\ \wedge e \ne 1,-1 \end{aligned} \;:\; \begin{aligned}&N \leftarrow \textsf{GenSRSAmod}(1^\lambda )\\&G {\leftarrow }{\$}\,\mathbb {Z}_N^*\\&(U,e) \leftarrow \mathcal {A}(\mathbb {Z}_N^*,G)\end{aligned}\right] = \textsf{negl}(\lambda ). \end{aligned}$$

2.3 Non-interactive zero-knowledge (NIZK)

We recall the definition of zero-knowledge non-interactive arguments of knowledge (NIZKs, for short).

Definition 2.3

(NIZK) A NIZK for \(\{\mathcal {R}_{\lambda }\}_{\lambda \in \mathbb {N}}\) is a tuple of three algorithms \(\Pi = (\textsf{KeyGen}, \textsf{Prove}, \textsf{VerProof})\) that work as follows and satisfy the notions of completeness, knowledge soundness and (composable) zero-knowledge defined below.

  • \(\textsf{KeyGen}(R) \rightarrow (\textsf{ek}, \textsf{vk})\) takes the security parameter \(\lambda \) and a relation \(R\in \mathcal {R}_{\lambda }\), and outputs a common reference string consisting of an evaluation and a verification key.

  • \(\textsf{Prove}(\textsf{ek}, x, w) \rightarrow \pi \) takes an evaluation key for a relation \(R\), a statement \(x\), and a witness \(w\) such that \(R(x, w)\) holds, and returns a proof \(\pi \).

  • \(\textsf{VerProof}(\textsf{vk}, x, \pi ) \rightarrow b\) takes a verification key, a statement \(x\) and a proof \(\pi \), and either accepts (\(b=1\)) or rejects (\(b=0\)) the proof.

Completeness For any \(\lambda \in \mathbb {N}\), \(R\in \mathcal {R}_{\lambda }\) and \((x, w)\) such that \(R(x, w)\), it holds \(\Pr [(\textsf{ek}, \textsf{vk}) \leftarrow \textsf{KeyGen}(R), \pi \leftarrow \textsf{Prove}(\textsf{ek}, x, w): \textsf{VerProof}(\textsf{vk}, x, \pi )=1 ]=1\).

Knowledge soundness Let \(\mathcal{R}\mathcal{G}\) be a relation generator such that \(\mathcal{R}\mathcal{G}_{\lambda } \subseteq \mathcal {R}_{\lambda }\). \(\Pi \) has computational knowledge soundness for \(\mathcal{R}\mathcal{G}\) and auxiliary input distribution \(\mathcal {Z}\), denoted \(\textsf{KSND}(\mathcal{R}\mathcal{G}, \mathcal {Z})\) for brevity, if for every (non-uniform) efficient adversary \(\mathcal {A}\) there exists a (non-uniform) efficient extractor \(\mathcal {E}\) such that \(\Pr [{\textsf{Game}}^{\textsf{KSND}}_{\mathcal{R}\mathcal{G},\mathcal {Z},\mathcal {A},\mathcal {E}} = 1] = \textsf{negl}\). We say that \(\Pi \) is knowledge sound if there exists benign \(\mathcal{R}\mathcal{G}\) and \(\mathcal {Z}\) such that \(\Pi \) is \(\textsf{KSND}(\mathcal{R}\mathcal{G}, \mathcal {Z})\).

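[Figure a: definition of the knowledge-soundness game \({\textsf{Game}}^{\textsf{KSND}}_{\mathcal{R}\mathcal{G},\mathcal {Z},\mathcal {A},\mathcal {E}}\)]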

Composable zero-knowledge A scheme \(\Pi \) satisfies composable zero-knowledge for a relation generator \(\mathcal{R}\mathcal{G}\) if there exists a simulator \(\mathcal {S}= (\mathcal {\mathcal {S}_{\textsf{kg}}}, \mathcal {\mathcal {S}_{\textsf{prv}}})\) such that both following conditions hold.

Keys indistinguishability For all adversaries \(\mathcal {A}\)

$$\begin{aligned} \Pr \left[ \;\,\begin{aligned}&(R, \textsf{aux}_{R}) \leftarrow \mathcal{R}\mathcal{G}(1^\lambda ) \\ {}&\textsf{crs}\leftarrow \textsf{KeyGen}(R) \\ {}&\mathcal {A}(\textsf{crs}, \textsf{aux}_{R}) = 1 \end{aligned}\;\, \right] \approx \Pr \left[ \;\,\begin{aligned}&(R, \textsf{aux}_{R}) \leftarrow \mathcal{R}\mathcal{G}(1^\lambda ) \\&({\textsf{crs}},\textsf{td}_{\textsf{k}}) \leftarrow \mathcal {\mathcal {S}_{\textsf{kg}}}(R) \\ {}&\mathcal {A}({\textsf{crs}}, \textsf{aux}_{R}) = 1\end{aligned}\;\, \right] {}. \end{aligned}$$

Proof indistinguishability For all adversaries \(\mathcal {A}= (\mathcal {A}_1, \mathcal {A}_2)\)

$$\begin{aligned}&\Pr \left[ \begin{aligned}&(R, \textsf{aux}_{R}) \leftarrow \mathcal{R}\mathcal{G}(1^\lambda ) \\ {}&({\textsf{crs}}, \textsf{td}_{\textsf{k}}) \leftarrow \mathcal {\mathcal {S}_{\textsf{kg}}}(R)\\ {}&(x, w, \textsf{st}) \leftarrow \mathcal {A}_1({\textsf{crs}}, \textsf{aux}_{R}) \\ {}&\pi \leftarrow \textsf{Prove}(\textsf{ek}, x, w) \\ {}&\mathcal {A}_2(\textsf{st}, \pi ) = 1\end{aligned} \;:\; \begin{aligned}R(x, w)\end{aligned}\right] \\&\quad \approx \Pr \left[ \begin{aligned}&(R, \textsf{aux}_{R}) \leftarrow \mathcal{R}\mathcal{G}(1^\lambda ) \\ {}&({\textsf{crs}}, \textsf{td}_{\textsf{k}}) \leftarrow \mathcal {\mathcal {S}_{\textsf{kg}}}(R)\\ {}&(x, w, \textsf{st}) \leftarrow \mathcal {A}_1({\textsf{crs}}, \textsf{aux}_{R}) \\ {}&{\pi }\leftarrow \mathcal {\mathcal {S}_{\textsf{prv}}}(\textsf{crs}, \textsf{td}_{\textsf{k}}, x) \\ {}&\mathcal {A}_2(\textsf{st}, {\pi }) = 1\end{aligned} \;:\; \begin{aligned}R(x, w)\end{aligned}\right] . \end{aligned}$$

Definition 2.4

(zkSNARKs) A NIZK \(\Pi \) is called zero-knowledge succinct non-interactive argument of knowledge (zkSNARK) if \(\Pi \) is a NIZK as per Definition 2.3 enjoying an additional property, succinctness, i.e., if the running time of \(\textsf{VerProof}\) is \(\textsf{poly}(\lambda + |x| + \log |w|)\) and the proof size is \(\textsf{poly}(\lambda + \log |w|)\).

Remark 2.1

(On knowledge-soundness) In the NIZK definition above we use a non black-box notion of extractability. Although this is virtually necessary in the case of zkSNARKs [44], NIZKs can also satisfy stronger (black-box) notions of knowledge-soundness.
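To make the syntax \((\textsf{KeyGen}, \textsf{Prove}, \textsf{VerProof})\) of Definition 2.3 concrete, the following Python sketch instantiates it with a textbook example that is not one of the schemes of this paper: a Schnorr proof of knowledge of a discrete logarithm, made non-interactive with the Fiat–Shamir transform. The relation is \(R(x, w) \Leftrightarrow g^w = x\); the group is a toy one with no security, the "CRS" is just the group description, security of this transform is argued in the random oracle model, and the proof is not succinct in the sense of Definition 2.4. All names are hypothetical.

```python
import hashlib
import secrets

# Toy prime order group: squares modulo the safe prime 2039 (order 1019).
P, Q, G = 2039, 1019, 4


def key_gen():
    """KeyGen: returns (ek, vk); here both are the public group description."""
    crs = (P, Q, G)
    return crs, crs


def _challenge(q, *values):
    """Fiat-Shamir challenge: hash the transcript into Z_q."""
    data = "|".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q


def prove(ek, x, w):
    """Prove knowledge of w such that g^w = x."""
    p, q, g = ek
    k = secrets.randbelow(q)
    t = pow(g, k, p)              # prover's commitment
    c = _challenge(q, g, x, t)    # non-interactive challenge
    s = (k + c * w) % q           # response
    return t, s


def ver_proof(vk, x, proof):
    """Accept iff g^s == t * x^c."""
    p, q, g = vk
    t, s = proof
    c = _challenge(q, g, x, t)
    return pow(g, s, p) == (t * pow(x, c, p)) % p


ek, vk = key_gen()
w = 123
x = pow(G, w, P)
pi = prove(ek, x, w)
print(ver_proof(vk, x, pi))
```

Completeness follows from \(g^s = g^{k + c w} = t \cdot x^c\), mirroring the completeness condition above.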

2.4 Type-based commitments

We recall the notion of Type-Based Commitment schemes introduced by Escala and Groth [36]. In brief, a Type-Based Commitment scheme is a normal commitment scheme with the difference that it allows one to commit to values from different domains. More specifically, the \(\textsf{Commit}\) algorithm (therefore the \(\textsf{VerCommit}\) algorithm also) depends on the domain of the input, while the commitment key remains the same. For example, as in the original motivation of [36], the committer can use the same scheme and key to commit to elements that may belong to two different groups \(\mathbb {G}_1,\mathbb {G}_2\) or a field \(\mathbb {Z}_p\). In our work we use type-based commitments. The main benefit of this formalization is that it can unify many commitment algorithms into one scheme. In our case this is useful to formalize the notion of commit-and-prove NIZKs that work with commitments from different groups and schemes.

More formally, a Type-Based Commitment is a tuple of algorithms \(\textsf{Com}= (\textsf{Setup}, \textsf{Commit}, \textsf{VerCommit})\) that works as a standard commitment scheme, with the difference that the \(\textsf{Commit}\) and \(\textsf{VerCommit}\) algorithms take an extra input \(\textsf{t}\) that represents the type of \(u\). All the possible types are included in the type space \(\mathcal {T}\) (Footnote 11).

Definition 2.5

A type-based commitment scheme for a set of types \(\mathcal {T}\) is a tuple of algorithms \(\textsf{Com}= (\textsf{Setup}, \textsf{Commit}, \textsf{VerCommit})\) that work as follows:

  • \(\textsf{Setup}(1^\lambda ) \rightarrow \textsf{ck}\) takes the security parameter and outputs a commitment key \(\textsf{ck}\). This key includes \(\forall \textsf{t}\in \mathcal {T}\) descriptions of the input space \(\mathcal {D}_\textsf{t}\), commitment space \(\mathcal {C}_\textsf{t}\) and opening space \(\mathcal {O}_\textsf{t}\).

  • \(\textsf{Commit}(\textsf{ck}, \textsf{t}, u) \rightarrow (c, o)\) takes the commitment key \(\textsf{ck}\), the type \(\textsf{t}\) of the input and a value \(u\in \mathcal {D}_\textsf{t}\), and outputs a commitment \(c\) and an opening \(o\).

  • \(\textsf{VerCommit}(\textsf{ck}, \textsf{t}, c, u, o) \rightarrow b\) takes the commitment key \(\textsf{ck}\), a type \(\textsf{t}\), a commitment \(c\), a value \(u\) and an opening \(o\), and accepts (\(b=1\)) or rejects (\(b=0\)).

Furthermore, the security properties depend on the type, in the sense that binding and hiding should hold with respect to a certain type.
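To make the syntax of Definition 2.5 concrete, the following is a minimal Python sketch under loudly stated assumptions: the group parameters are toy values that are far too small to be secure, the type names `t_Zq` and `t_bit` are hypothetical, and both types reuse the same Pedersen-style formula and differ only in their input space. It is meant to illustrate how the type enters \(\textsf{Commit}\) and \(\textsf{VerCommit}\), not to reproduce any scheme from [36].

```python
import secrets

# Toy parameters (assumptions, NOT secure): P = 2*Q + 1 with P, Q prime.
# G and H are squares modulo P, so both have order Q. In a real scheme the
# discrete logarithm of H in base G must be unknown to the committer.
P, Q = 2039, 1019
G, H = 4, 9

# Input space D_t for each (hypothetical) type t.
DOMAINS = {
    "t_Zq": lambda u: isinstance(u, int) and 0 <= u < Q,
    "t_bit": lambda u: u in (0, 1),
}

def setup():
    """Setup: output a commitment key describing the spaces of every type."""
    return {"p": P, "q": Q, "g": G, "h": H, "domains": DOMAINS}

def commit(ck, t, u):
    """Commit(ck, t, u) -> (c, o) for a value u in D_t."""
    if not ck["domains"][t](u):
        raise ValueError("value outside the input space of type " + t)
    o = secrets.randbelow(ck["q"])  # opening (commitment randomness)
    c = pow(ck["g"], u, ck["p"]) * pow(ck["h"], o, ck["p"]) % ck["p"]
    return c, o

def ver_commit(ck, t, c, u, o):
    """VerCommit(ck, t, c, u, o) -> bool; rejects values outside D_t."""
    if t not in ck["domains"] or not ck["domains"][t](u):
        return False
    return c == pow(ck["g"], u, ck["p"]) * pow(ck["h"], o, ck["p"]) % ck["p"]
```

Note that the same commitment can verify under one type and be rejected under another, which is why the security properties below are stated per type.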

Definition 2.6

Let \(\mathcal {T}\) be a set of types, and \(\textsf{Com}\) be a type-based commitment scheme for \(\mathcal {T}\). Correctness, \(\textsf{t}\)-Type Binding and \(\textsf{t}\)-Type Hiding are defined as follows:

Correctness For all \(\lambda \in \mathbb {N}\) and any input \((\textsf{t}, u) \in (\mathcal {T}, \mathcal {D}_\textsf{t})\) we have:

$$\begin{aligned} \Pr [\textsf{ck}\leftarrow \textsf{Setup}(1^\lambda ), (c, o) \leftarrow \textsf{Commit}(\textsf{ck}, \textsf{t}, u): \textsf{VerCommit}(\textsf{ck}, \textsf{t}, c, u, o)=1]=1. \end{aligned}$$

\(\textsf{t}\)-Type binding Given \(\textsf{t}\in \mathcal {T}\), for every polynomial-time adversary \(\mathcal {A}\):

$$\begin{aligned} \Pr \left[ \begin{aligned}&\textsf{ck}\leftarrow \textsf{Setup}(1^\lambda ) \\ {}&(c, u, o, u', o') \leftarrow \mathcal {A}(\textsf{ck},\textsf{t})\end{aligned} \;:\; \begin{aligned} u\ne u' \wedge \ \textsf{VerCommit}(\textsf{ck},\textsf{t}, c, u, o)=1 \\ \wedge \ \textsf{VerCommit}(\textsf{ck}, \textsf{t}, c, u', o')=1\end{aligned}\right] =\textsf{negl}. \end{aligned}$$

In case \(\textsf{Com}\) is \(\textsf{t}\)-Type Binding for all \( \textsf{t}\in \mathcal {T}\) we say that it is Binding.

\(\textsf{t}\) -Type hiding Given a \(\textsf{t}\in \mathcal {T}\), for \(\textsf{ck}\leftarrow \textsf{Setup}(1^\lambda )\) and every pair of values \(u, u' \in \mathcal {D}_\textsf{t}\), the following two distributions are statistically close: \(\textsf{Commit}(\textsf{ck}, \textsf{t}, u) \approx \textsf{Commit}(\textsf{ck},\textsf{t}, u')\).

In case \(\textsf{Com}\) is \(\textsf{t}\)-Type Hiding for all \(\textsf{t}\in \mathcal {T}\) we say it is Hiding.

Composing type-based commitments For simplicity we now define an operator that allows one to compose type-based commitment schemes in a natural way.

Definition 2.7

Let \(\textsf{C}\) and \(\textsf{C}'\) be two commitment schemes for (disjoint) sets of types \(\mathcal {T}\) and \(\mathcal {T}'\), respectively. Then we denote by \(\textsf{C}\bullet \textsf{C}'\) the commitment scheme \(\bar{\textsf{C}}\) for \(\mathcal {T}\cup \mathcal {T}'\) such that:

  • \(\bar{\textsf{C}}.\textsf{Setup}(\textsf{secpar}, \textsf{secpar}') \rightarrow \overline{\textsf{ck}}:\) compute \(\textsf{ck}\leftarrow \textsf{C}.\textsf{Setup}(\textsf{secpar}) \text { and } \textsf{ck}' \leftarrow \textsf{C}'.\textsf{Setup}(\textsf{secpar}'); \overline{\textsf{ck}} :=(\textsf{ck}, \textsf{ck}')\).

  • \(\bar{\textsf{C}}.\textsf{Commit}(\overline{\textsf{ck}} :=(\textsf{ck}, \textsf{ck}'), \textsf{t}, u):\) If \(\textsf{t}\in \mathcal {T}\) then output \(\textsf{C}.\textsf{Commit}(\textsf{ck}, \textsf{t}, u)\); otherwise return \(\textsf{C}'.\textsf{Commit}(\textsf{ck}', \textsf{t}, u)\).

  • \(\bar{\textsf{C}}.\textsf{VerCommit}(\overline{\textsf{ck}} :=(\textsf{ck}, \textsf{ck}'), \textsf{t}, c, u, o):\) If \(\textsf{t}\in \mathcal {T}\) then return \(\textsf{C}.\textsf{VerCommit}(\textsf{ck}, \textsf{t}, c, u, o)\); otherwise return \(\textsf{C}'.\textsf{VerCommit}(\textsf{ck}', \textsf{t}, c, u, o)\).
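The dispatch performed by \(\bullet \) can be sketched in a few lines of Python. Everything here is an illustrative assumption: schemes are represented as dictionaries of functions, and `make_hash_scheme` is a hypothetical hash-based stand-in used only to have two single-type schemes to compose.

```python
import hashlib
import secrets

def make_hash_scheme(type_name, encode):
    """Toy single-type commitment scheme (hypothetical stand-in)."""
    def setup():
        return secrets.token_bytes(16)  # public key material playing the role of ck

    def digest(ck, t, u, o):
        return hashlib.sha256(ck + t.encode() + encode(u) + o).digest()

    def commit(ck, t, u):
        o = secrets.token_bytes(16)
        return digest(ck, t, u, o), o

    def ver_commit(ck, t, c, u, o):
        return c == digest(ck, t, u, o)

    return {"types": {type_name}, "setup": setup,
            "commit": commit, "ver_commit": ver_commit}

def compose(scheme_a, scheme_b):
    """Return the composed scheme for two schemes with disjoint type sets."""
    if scheme_a["types"] & scheme_b["types"]:
        raise ValueError("type sets must be disjoint")

    def setup():
        # The composed key is simply the pair (ck, ck').
        return (scheme_a["setup"](), scheme_b["setup"]())

    def commit(ck_bar, t, u):
        ck, ck_prime = ck_bar
        if t in scheme_a["types"]:
            return scheme_a["commit"](ck, t, u)
        return scheme_b["commit"](ck_prime, t, u)

    def ver_commit(ck_bar, t, c, u, o):
        ck, ck_prime = ck_bar
        if t in scheme_a["types"]:
            return scheme_a["ver_commit"](ck, t, c, u, o)
        return scheme_b["ver_commit"](ck_prime, t, c, u, o)

    return {"types": scheme_a["types"] | scheme_b["types"], "setup": setup,
            "commit": commit, "ver_commit": ver_commit}

# Usage: one scheme for integers, one for strings.
C_bar = compose(make_hash_scheme("t_int", lambda u: str(u).encode()),
                make_hash_scheme("t_str", lambda u: u.encode()))
```

Since each call is forwarded unchanged to exactly one of the two underlying schemes, per-type properties carry over directly to the composed scheme.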

The following property of \(\bullet \) follows immediately from its definition.

Lemma 2.1

Let \(\textsf{C}\) and \(\textsf{C}'\) be two commitment schemes with disjoint sets of types. For all types t if \(\textsf{C}\) or \(\textsf{C}'\) is t-hiding (resp. t-binding) then \(\textsf{C}\bullet \textsf{C}'\) is t-hiding (resp. t-binding).

Remark 2.2

We observe that a standard non type-based commitment scheme with input space \(\mathcal {D}\) induces directly a type-based commitment scheme with the same input space and a type we denote by \(\mathbb {T}[\mathcal {D}]\).

2.5 Commit-and-prove NIZKs

We give the definition of commit-and-prove NIZKs (CP-NIZKs). We start from the definition given in [7, 18] and we extend it to type-based commitments. The main benefit of such an extension is that we can formalize CP-NIZKs working with commitments over different domains. In a nutshell, a CP-NIZK is a NIZK that can prove knowledge of \((x, w)\) such that \(R(x, w)\) holds with respect to a witness \(w=( u, \omega )\) such that \(u\) opens a commitment \(c_u\). As done in [18], we explicitly consider the input domain \(\mathcal {D}_{u}\) at a more fine-grained level, splitting it over \(\ell \) subdomains. We call them commitment slots as each of the \(\mathcal {D}_i\)-s intuitively corresponds to a committed element.Footnote 12 The description of the splitting is assumed to be part of \(R\)’s description.

In the remainder of this work we use the following shortcut definition. If \(\textsf{C}\) is a type-based commitment scheme over set of types \(\mathcal {T}\), we say that a relation \(R\) over \((\mathcal {D}_1 \times \cdots \times \mathcal {D}_{\ell })\) is \(\mathcal {T}\)-compatible if for all \(j\in [\ell ]\) it holds that \(\mathbb {T}[\mathcal {D}_j] \in \mathcal {T}\). We say a relation family \(\mathcal {R}\) is \(\mathcal {T}\)-compatible if every \(R\) in \(\mathcal {R}\) is \(\mathcal {T}\)-compatible; a relation generator \(\mathcal{R}\mathcal{G}\) is \(\mathcal {T}\)-compatible if \(\textsf{Range}(\mathcal{R}\mathcal{G})\) is \(\mathcal {T}\)-compatible.

Definition 2.8

(CP-NIZKs [18]) Let \(\{\mathcal {R}_{\lambda }\}_{\lambda \in \mathbb {N}}\) be a family of relations \(R\) over \(\mathcal {D}_{x} \times \mathcal {D}_{u} \times \mathcal {D}_{\omega }\) such that \(\mathcal {D}_{u}\) splits over \(\ell \) arbitrary domains \((\mathcal {D}_1 \times \cdots \times \mathcal {D}_{\ell })\) for some arity parameter \(\ell \ge 1\). Let \(\textsf{C}= (\textsf{Setup}, \textsf{Commit}, \textsf{VerCommit})\) be a commitment scheme (as per Definition 2.5) over set of types \(\mathcal {T}\) such that \(\{\mathcal {R}_{\lambda }\}_{\lambda \in \mathbb {N}}\) is \(\mathcal {T}\)-compatible.

A commit and prove NIZK for \(\textsf{C}\) and \(\{\mathcal {R}_{\lambda }\}_{\lambda \in \mathbb {N}}\) is a NIZK for a family of relations \(\{\mathcal {R}^{\textsf{C}}_{\lambda }\}_{\lambda \in \mathbb {N}}\) such that:

  • every \(\smash {\varvec{\mathsf R}\in \mathcal {R}^{\textsf{C}}}\) is represented by a pair \((\textsf{ck}, R)\) where \(\textsf{ck}\in \) \(\smash {\textsf{C}.\textsf{Setup}(1^\lambda )}\) and \(R\in \mathcal {R}_{\lambda }\);

  • \(\varvec{\mathsf R}\) is over pairs \((\varvec{\mathsf x}, \varvec{\mathsf w})\) where the statement is \(\varvec{\mathsf x}:=(x, (c_j)_{j \in [\ell ]}) \in \mathcal {D}_{x} \times \mathcal {C}^{\ell }\), the witness is \(\varvec{\mathsf w}:=((u_j)_{j \in [\ell ]}, (o_j)_{j \in [\ell ]}, \omega ) \in \) \( \mathcal {D}_1 \times \cdots \times \mathcal {D}_{\ell }\times \mathcal {O}^{\ell } \times \mathcal {D}_{\omega }\), and the relation \(\varvec{\mathsf R}\) holds iff

    $$\begin{aligned} \bigwedge \nolimits _{j \in [\ell ]} \ \textsf{VerCommit}(\textsf{ck}, \mathbb {T}[\mathcal {D}_j], c_j, u_j, o_j)=1 \wedge R(x, (u_j)_{j \in [\ell ]}, \omega )=1. \end{aligned}$$
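As a sanity check of the relation above, the following Python sketch evaluates it in the clear: every \(c_j\) must open to \(u_j\) under its type, and \(R\) must hold on the opened values. The hash-based `commit`/`ver_commit` pair, the type name `t_int` and the example relation \(u_1 \cdot u_2 = x\) are assumptions made for illustration. A CP-NIZK proves this predicate without revealing the witness, which the sketch does not attempt to model.

```python
import hashlib
import secrets

def commit(ck, t, u):
    """Toy hash-based commitment standing in for C.Commit(ck, t, u)."""
    o = secrets.token_bytes(16)
    return hashlib.sha256(ck + t.encode() + repr(u).encode() + o).digest(), o

def ver_commit(ck, t, c, u, o):
    """Toy stand-in for C.VerCommit(ck, t, c, u, o)."""
    return c == hashlib.sha256(ck + t.encode() + repr(u).encode() + o).digest()

def cp_relation(ck, types, R, x, cs, us, os_, omega):
    """Return True iff all commitments open correctly and R(x, us, omega) = 1."""
    if not (len(types) == len(cs) == len(us) == len(os_)):
        return False
    openings_ok = all(ver_commit(ck, t, c, u, o)
                      for t, c, u, o in zip(types, cs, us, os_))
    return openings_ok and R(x, us, omega) == 1

# Example with two commitment slots and R(x, (u1, u2), omega) = 1 iff u1 * u2 = x.
ck = secrets.token_bytes(16)
types = ["t_int", "t_int"]
us = [3, 5]
pairs = [commit(ck, t, u) for t, u in zip(types, us)]
cs = [c for c, _ in pairs]
os_ = [o for _, o in pairs]
R = lambda x, u, omega: int(u[0] * u[1] == x)
```

In this example the statement is \((x, c_1, c_2)\) with \(x = 15\), and the witness is \(((3, 5), (o_1, o_2), \omega )\) with an empty \(\omega \).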

We denote knowledge soundness of a CP-NIZK for commitment scheme \(\textsf{C}\) and relation and auxiliary input generators \(\mathcal{R}\mathcal{G}\) and \(\mathcal {Z}\) as \(\textsf{CP}\text{- }\textsf{KSND}(\textsf{C}, \mathcal{R}\mathcal{G}, \mathcal {Z})\).

We denote a CP-NIZK as a tuple of algorithms \(\textsf {CP}= (\textsf{KeyGen}, \textsf{Prove}, \textsf{VerProof})\). For ease of exposition, in our constructions we adopt the following explicit syntax for \(\textsf {CP}\)’s algorithms.

  • \(\textsf{KeyGen}(\textsf{ck},R) \rightarrow \textsf{crs}:=(\textsf{ek}, \textsf{vk})\)

  • \(\textsf{Prove}(\textsf{ek}, x, (c_j)_{j \in [\ell ]}, (u_j)_{j \in [\ell ]}, (o_j)_{j \in [\ell ]}, \omega ) \rightarrow \pi \)

  • \(\textsf{VerProof}(\textsf{vk}, x, (c_j)_{j \in [\ell ]}, \pi ) \rightarrow b \in \{0, 1\}\)

2.6 Commit-and-prove NIZKs with partial opening

We now define a variant of commit-and-prove NIZKs with a weaker notion of knowledge-soundness. In particular we consider the case where part of the committed input is not assumed to be extractable (or hidden),Footnote 13 i.e., such input is assumed to be opened by the adversary. This models scenarios where we do not require this element to be an input of the verification algorithm (the verifier can directly use a digest of it).

The motivation to define and use this notion is twofold. First, in some constructions commitments to sets are compressing but not knowledge-extractable. Second, in many applications this definition is sufficient since the set is public (e.g., the set contains the valid coins).

The definition below is limited to a setting where the adversary opens only one input in this fashion.Footnote 14 We will assume, as a convention, that in a scheme with partial opening this special input is always the first committed input of the relation, i.e. the one denoted by \(u_1\) and corresponding to \(\mathcal {D}_1\). We note that the commitment to \(u_1\) does not require hiding for zero-knowledge to hold.

Definition 2.9

(CP-NIZK with partial opening) A commit and prove NIZK with partial opening for \(\textsf{C}\) and \(\{\mathcal {R}_{\lambda }\}_{\lambda \in \mathbb {N}}\) is a NIZK for a family of relations \(\{\mathcal {R}^{\textsf{C}}_{\lambda }\}_{\lambda \in \mathbb {N}}\) (defined as in Definition 2.8) such that the property of knowledge soundness is replaced by knowledge soundness with partial opening below.

Knowledge soundness with partial opening Let \(\mathcal{R}\mathcal{G}\) be a relation generator such that \(\mathcal{R}\mathcal{G}_{\lambda } \subseteq \mathcal {R}_{\lambda }\). \(\Pi \) has knowledge soundness with partial opening for \(\textsf{C}\), \(\mathcal{R}\mathcal{G}\) and auxiliary input distribution \(\mathcal {Z}\), denoted \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C}, \mathcal{R}\mathcal{G}, \mathcal {Z})\) for brevity, if for every (non-uniform) efficient adversary \(\mathcal {A}\) there exists a (non-uniform) efficient extractor \(\mathcal {E}\) such that \(\Pr [{\textsf{Game}}^{\textsf{CP}\text{- }\textsf{poKSND}}_{\textsf{C},\mathcal{R}\mathcal{G},\mathcal {Z},\mathcal {A},\mathcal {E}} = 1] = \textsf{negl}\). We say that \(\Pi \) is knowledge sound for \(\textsf{C}\) if there exist benign \(\mathcal{R}\mathcal{G}\) and \(\mathcal {Z}\) such that \(\Pi \) is \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C}, \mathcal{R}\mathcal{G}, \mathcal {Z})\).Footnote 15

figure b

Remark 2.3

(On weaker ZK in the context of partial opening) The notion of zero-knowledge for CP-NIZKs with partial opening implied by our definition above is one where the simulator does not have access to the opening of the first input (as is the case in zero-knowledge for CP-NIZKs in general). Since this first commitment is opened, in principle one could also consider and define a weaker notion of zero-knowledge where the simulator has access to the first opened input. We leave it as an open problem to investigate whether such a notion is of any interest.

Remark 2.4

(Full extractability) If a CP-NIZK has an empty input \(u_1\) opened by the adversary in the game above, then we say that it is fully extractable. This roughly corresponds to the notion of knowledge soundness in Definition 2.3.

2.6.1 Composition properties of commit-and-prove schemes

In [18], Campanelli et al. show a compiler for composing commit-and-prove schemes that work for the same commitment scheme in order to obtain \(\textsf {CP}\) systems for conjunction of relations. In this section we generalize their results to the case of typed relations and type-based commitments. This generalization in particular can model the composition of CP-NIZKs that work with different commitments, as is the case in our constructions for set membership in which one has a commitment to a set and a commitment to an element.

We begin by introducing the following compact notation for an augmented relation generator.

Definition 2.10

(Augmented relation generator) Let \(\mathcal{R}\mathcal{G}\) be a relation generator and \(\mathcal {F}(1^\lambda )\) an algorithm taking as input a security parameter. Then we denote by \(\mathcal{R}\mathcal{G}[\mathcal {F}]\) the relation generator returning \((R, (\textsf{aux}_{R}, \textsf{out}_{\mathcal {F}}))\) where \(\textsf{out}_{\mathcal {F}} \leftarrow \mathcal {F}(1^\lambda )\) and \((R, \textsf{aux}_{R}) \leftarrow \mathcal{R}\mathcal{G}(1^{\lambda })\).

The next lemma states that we can (with certain restrictions) trivially extend a CP-NIZK for commitment scheme \(\textsf{C}\) to an extended commitment scheme \(\textsf{C}\bullet \textsf{C}'\).

Lemma 2.2

(Extending to commitment composition) Let \(\textsf{C}, \textsf{C}'\) be commitment schemes defined over disjoint type sets \(\mathcal {T}\) and \(\mathcal {T}'\). If \(\textsf {CP}\) is \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C}, \mathcal{R}\mathcal{G}[\textsf{C}.\textsf{Setup}], \mathcal {Z})\) for some relation and auxiliary input generators \(\mathcal{R}\mathcal{G}, \mathcal {Z}\), and \(\mathcal{R}\mathcal{G}\) is \(\mathcal {T}\)-compatible, then \(\textsf {CP}\) is \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C}\bullet \textsf{C}', \mathcal{R}\mathcal{G}[\textsf{C}.\textsf{Setup}], \mathcal {Z})\).

We now define relation generators and auxiliary input generators for our composition constructions.

Fig. 1
figure 1

Relation and auxiliary input generators for AND composition construction

The following lemma shows how we can compose CP-NIZKs even when one of them is fully extractable but the other is not. We are interested in the conjunction \(R^{\wedge }_{asym}\) of relations of type \(R_1(x_1, (u_0, u_1, u_3), \omega _1)\) and \(R_2(x_2, (u_2, u_3), \omega _2)\) where

$$\begin{aligned} R^{\wedge }_{asym}(x_1, x_2, (u_0, u_1,u_2,u_3), \omega _1,\omega _2):= R_1(x_1, (u_0, u_1, u_3), \omega _1) \wedge R_2(x_2, (u_2, u_3), \omega _2). \end{aligned}$$

Lemma 2.3

(Composing conjunctions (with asymmetric extractability)) Let \(\textsf{C}\) be a computationally binding commitment scheme. If \(\textsf {CP}_1\) is \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C}, \overline{\mathcal{R}\mathcal{G}}_1, \overline{\mathcal {Z}}_1)\) and \(\textsf {CP}_2\) is \(\textsf{KSND}(\textsf{C}, \overline{\mathcal{R}\mathcal{G}}_2, \overline{\mathcal {Z}}_2)\) (where \(\overline{\mathcal{R}\mathcal{G}}_{b}, \overline{\mathcal {Z}}_{b}\) are defined in terms of \(\mathcal{R}\mathcal{G}_{b}, \mathcal {Z}_{b}\) in Fig. 1 for \(b \in \{1,2\}\)), then the scheme \(\textsf {CP}^{\wedge }_{asym}\) in Fig. 2 is \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C}, \mathcal{R}\mathcal{G}^{*}, \mathcal {Z}^{*})\) where \(\mathcal{R}\mathcal{G}^{*}, \mathcal {Z}^{*}\) are as defined in Fig. 1.

The following lemma is a symmetric variant of Lemma 2.3, i.e., the CP-NIZKs we are composing are both secure over the same commitment scheme and support partial opening, that is, they both handle relations with an adversarially opened input \(u_0\). This time we are interested in the conjunction \(R^{\wedge }_{sym}\) of relations of type \(R_1(x_1, (u_0, u_1, u_3), \omega _1)\) and \(R_2(x_2, (u_0, u_2, u_3), \omega _2)\) where

$$\begin{aligned} R^{\wedge }_{sym}(x_1, x_2, (u_0, u_1, u_2,u_3), \omega _1,\omega _2):= & {} R_1(x_1, (u_0, u_1, u_3), \omega _1) \\ {}{} & {} \wedge R_2(x_2, (u_0, u_2, u_3), \omega _2). \end{aligned}$$

Lemma 2.4

(Composing conjunctions (symmetric case)) Let \(\textsf{C}\) be a (type-based) computationally binding commitment scheme. If \(\textsf {CP}_{b}\) is \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C}, \overline{\mathcal{R}\mathcal{G}}_{b}, \overline{\mathcal {Z}}_{b})\) (where \(\overline{\mathcal{R}\mathcal{G}}_{b}, \overline{\mathcal {Z}}_{b}\) are defined in terms of \(\mathcal{R}\mathcal{G}_{b}, \mathcal {Z}_{b}\) in Fig. 1) for \(b \in \{1,2\}\), then the scheme \(\textsf {CP}^{\wedge }_{sym}\) in Fig. 3 is \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C},\mathcal{R}\mathcal{G}^{*}, \mathcal {Z}^{*})\) where \(\mathcal{R}\mathcal{G}^{*}, \mathcal {Z}^{*}\) are as defined in Fig. 1.

Fig. 2
figure 2

CP-NIZK construction for AND composition (asymmetric case)

Fig. 3
figure 3

CP-NIZK construction for AND composition (symmetric case)

3 CP-SNARKs for set membership (and non-membership)

In this section we discuss a specialization of CP-SNARKs for the specific NP relation that models membership (resp. non-membership) of an element in a set, formally defined below.

3.1 Set membership relations

Let \(\mathcal {D}_{\textsf{elm}}\) be some domain for set elements, and let \(\mathcal {D}_{\textsf{set}} \subseteq 2^{\mathcal {D}_{\textsf{elm}}}\) be a set of possible sets over \(\mathcal {D}_{\textsf{elm}}\). We define the set membership relation \(R_{\textsf{mem}}: \mathcal {D}_{\textsf{elm}} \times \mathcal {D}_{\textsf{set}}\) as

$$\begin{aligned} R_{\textsf{mem}}(U, u) =1 \iff u\in U. \end{aligned}$$

This is the fundamental relation that we deal with in the rest of this work.

The non-membership relation \(R_{\textsf{nmem}}: \mathcal {D}_{\textsf{elm}} \times \mathcal {D}_{\textsf{set}}\) can be defined analogously as

$$\begin{aligned} R_{\textsf{nmem}}(U, u) =1 \iff u\not \in U. \end{aligned}$$

3.2 CP-SNARKs for set membership

Intuitively, a commit-and-prove SNARK for set membership allows one to commit to a set \(U\) and to an element \(u\), and then to prove in zero-knowledge that \(R_{\textsf{mem}}(U,u) = 1\). More formally, let \(R_{\textsf{mem}}: \mathcal {D}_{\textsf{elm}} \times \mathcal {D}_{\textsf{set}}\) be a set membership relation as defined above where \(\mathbb {T}[\mathcal {D}_{\textsf{elm}}] = \textsf{t}_{\textsf{elm}}\) and \(\mathbb {T}[\mathcal {D}_{\textsf{set}}] = \textsf{t}_{\textsf{set}}\), and let \(\textsf{Com}_{\mathsf {S\cup elm}}\) be a type-based commitment scheme for \(\mathcal {T}\) such that \(\textsf{t}_{\textsf{set}}, \textsf{t}_{\textsf{elm}} \in \mathcal {T}\). Basically, \(\textsf{Com}_{\mathsf {S\cup elm}}\) allows one to commit either to an element of \(\mathcal {D}_\textsf{elm}\) or to a set of values of \(\mathcal {D}_\textsf{elm}\). Then a CP-SNARK for set membership is a CP-SNARK for the family of relations \(\{\mathcal {R}^{\textsf{mem}}_\lambda \}\) and a type-based commitment scheme \(\textsf{Com}_{\mathsf {S\cup elm}}\). It follows from Definition 2.8 that this is a zkSNARK for the relation:

\(\varvec{\mathsf R}=(ck,R_{\textsf{mem}})\) over \((\varvec{x},\varvec{w})=((x,c),(u,o,\omega )) :=\left( \, (\, \varnothing \, ,\, (c_U,c_u)\, )\, ,\, (\, (U,u)\, ,\, (o_U,o_u)\, ,\, \varnothing \, ) \, \right) \),

such that \(\varvec{\mathsf R}\) holds iff:

$$\begin{aligned} R_{\textsf{mem}}(U,u)= & {} 1 \wedge \textsf{VerCommit}(\textsf{ck},\textsf{t}_\textsf{set}, c_U, U, o_U)=1 \\ {}{} & {} \wedge \textsf{VerCommit}(\textsf{ck},\textsf{t}_{\textsf{elm}}, c_u, u, o_u)=1. \end{aligned}$$

A commit-and-prove version of \(R_{\textsf{nmem}}\) can be defined as a natural variant of the relation above.

Notice that for the relation \(R_{\textsf{mem}}\) it is relevant for the proof system to be succinct so that proofs can be at most polylogarithmic (or constant) in the size of the set (that is part of the witness). This is why for set membership we are mostly interested in designing CP-SNARKs.

3.3 Proving arbitrary relations involving set (non-)membership

As discussed in the introduction, a primary motivation of proving set membership in zero-knowledge is to prove additional properties about an alleged set member. In order to make our CP-SNARK for set membership a reusable gadget, we discuss a generic and simple method for composing CP-SNARKs for set membership (with partial opening) with other CP-SNARKs (with full extractability) for arbitrary relations. More formally, let \(R_{\textsf{mem}}\) be the set membership relation over pairs \((U, u) \in \mathbb {X}\times \mathcal {D}_{u}\) and let \(R\) be an arbitrary relation over pairs \((u, \omega )\); then we define the relation \(R^*\) as:

$$\begin{aligned} R^*(U, u, \omega ) :=R_{\textsf{mem}}(U, u) \wedge R(u, \omega ). \end{aligned}$$
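For intuition, the conjunction \(R^*\) can be evaluated in the clear as in the short Python sketch below. The example relation \(R\) ("\(u\) is a perfect square with root \(\omega \)") and the function names are hypothetical choices made only for illustration; the composed CP-SNARK proves the same predicate over committed \(U\) and \(u\).

```python
def r_mem(U, u):
    """Set membership relation: 1 iff u is in U."""
    return int(u in U)

def r_square(u, omega):
    """Hypothetical example of an arbitrary relation R(u, omega): omega^2 = u."""
    return int(omega * omega == u)

def r_star(U, u, omega, R=r_square):
    """Conjunction of membership of u in U and R(u, omega)."""
    return int(r_mem(U, u) == 1 and R(u, omega) == 1)

U = {4, 9, 10}
```

Here \(R^*(U, 9, 3) = 1\), while \(R^*(U, 10, 3) = 0\) (member, but \(R\) fails) and \(R^*(U, 16, 4) = 0\) (\(R\) holds, but not a member).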

The next corollary (a direct consequence of Lemmas 2.2 and 2.3) states that we can straightforwardly compose a CP-SNARK for set membership with a CP-SNARK for an arbitrary relation on elements of the set.

Corollary 3.1

(Extending relations with set membership) Let \(\textsf{C}_{\textsf{S}}, \textsf{C}_{u}\) be two computationally binding commitment schemes defined over disjoint type sets \(\mathcal {T}_{\textsf{S}}\) and \(\mathcal {T}_u\). Let \(\textsf {CP}_{\textsf{mem}}, \textsf {CP}_u\) be two CP-SNARKs and \(R_{\textsf{mem}}, \mathcal{R}\mathcal{G}_u\) (resp. \(\mathcal {Z}_{\textsf{mem}}, \mathcal {Z}_u\)) be two relation (resp. auxiliary input) generators. If \(\textsf {CP}_{\textsf{mem}}\) is \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C}_{\textsf{S}}\bullet \textsf{C}_{u}, R_{\textsf{mem}}, \mathcal {Z}_{\textsf{mem}})\) and \(\textsf {CP}_u\) is \(\textsf{KSND}(\textsf{C}_{u}, \mathcal{R}\mathcal{G}_u, \mathcal {Z}_u)\) then there exists a \(\textsf {CP}^*\) that is \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C}_{\textsf{S}}\bullet \textsf{C}_{u}, \mathcal{R}\mathcal{G}^{*}, \mathcal {Z}^{*})\) where \(\mathcal{R}\mathcal{G}^{*}, \mathcal {Z}^{*}\) are as defined in Fig. 1.

In a similar fashion, we can combine an arbitrary relation \(R\) with the relation for non-membership obtaining relation \(\bar{R}^*\) defined as:

$$\begin{aligned} \bar{R}^*(U, u, \omega ) :=R_{\textsf{nmem}}(U, u) \wedge R(u, \omega ). \end{aligned}$$

The next corollary states we can straightforwardly compose a CP-SNARK for set non-membership with a CP-SNARK for an arbitrary relation on elements in the universe of the set.

Corollary 3.2

(Extending relations with set non-membership) Let \(\textsf{C}_{\textsf{S}}, \textsf{C}_{u}\) be two computationally binding commitment schemes defined over disjoint type sets \(\mathcal {T}_{\textsf{S}}\) and \(\mathcal {T}_u\). Let \(\textsf {CP}_{\textsf{nmem}}, \textsf {CP}_u\) be two CP-SNARKs and \(R_{\textsf{nmem}}, \mathcal{R}\mathcal{G}_u\) (resp. \(\mathcal {Z}_{\textsf{nmem}}, \mathcal {Z}_u\)) be two relation (resp. auxiliary input) generators. If \(\textsf {CP}_{\textsf{nmem}}\) is \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C}_{\textsf{S}}\bullet \textsf{C}_{u}, R_{\textsf{nmem}}, \mathcal {Z}_{\textsf{nmem}})\) and \(\textsf {CP}_u\) is \(\textsf{KSND}(\textsf{C}_{u}, \mathcal{R}\mathcal{G}_u, \mathcal {Z}_u)\) then there exists a \(\textsf {CP}^*\) that is \(\textsf{CP}\text{- }\textsf{poKSND}(\textsf{C}_{\textsf{S}}\bullet \textsf{C}_{u}, \mathcal{R}\mathcal{G}^{*}, \mathcal {Z}^{*})\) where \(\mathcal{R}\mathcal{G}^{*}, \mathcal {Z}^{*}\) are as defined in Fig. 1.

3.3.1 CP-SNARKs for set membership from accumulators with proofs of knowledge

As discussed in the introduction, CP-SNARKs for set membership are simply a different lens through which we can approach accumulators that have a protocol for proving in zero-knowledge that a committed value is in the accumulator (i.e., it is in the set succinctly represented by the accumulator). To strengthen this intuition in Appendix 2 we formally show that a CP-SNARK for set membership can be constructed from an accumulator scheme that has a zero-knowledge proof for committed values. This allows us to capture existing schemes such as [16, 57].

4 A CP-SNARK for set membership with short parameters

In this section we describe CP-SNARKs for set membership in which the elements of the sets can be committed using a Pedersen commitment scheme defined in a prime order group, and the sets are committed using an RSA accumulator. The advantage of having elements committed with Pedersen in a prime order group is that our CP-SNARKs can be composed with any other CP-SNARK for Pedersen commitments and relations \(R\) that take set elements as inputs. The advantage of committing to sets using RSA accumulators is instead that the public parameters (i.e., the CRS) of the CP-SNARKs presented in this section are short, virtually independent of the size of the sets. Since RSA accumulators are not extractable commitments, the CP-SNARKs presented here are secure in a model where the commitment to the set is assumed to be checked at least once, namely they are knowledge-sound with partial opening of the set commitment.

A bit more in detail, we propose two CP-SNARKs. Our first scheme, called \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\), works for set elements that are arbitrary strings of length \(\eta \), i.e., \(\mathcal {D}_{\textsf{elm}} = \{0, 1\}^{\eta }\), and for sets that are any subset of \(\mathcal {D}_{\textsf{elm}}\), i.e., \(\mathcal {D}_{\textsf{set}} = 2^{\mathcal {D}_{\textsf{elm}}}\). Our second scheme, \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\), instead works for set elements that are prime numbers of exactly \(\mu \) bits, and for sets that are any subset of such prime numbers. This second scheme is a simplified variant of the first one that requires more structure on the set elements (they must be prime numbers) but in exchange offers better efficiency. It is therefore preferable in applications that can work with prime representatives.

4.1 A high-level overview of our constructions

We provide the main idea behind our scheme, and to this end we use the simpler scheme \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) in which set elements are prime numbers in \(\left( 2^{\mu -1},2^{\mu } \right) \). The commitment to the set \(P= \{e_1, \ldots , e_{n}\}\) is an RSA accumulator [2, 6] that is defined as \(\textsf{Acc}= G^{\prod _{e_i \in P} e_i}\) for a random quadratic residue \(G \in \textsf{QR}_N\). The commitment to a set element \(e\) is instead a Pedersen commitment \(c_{e} = g^{e} h^{r_{q}}\) in a group \(\mathbb {G}_{q}\) of prime order \(q\), where \(q\) is of \(\nu \) bits and \(\mu < \nu \). For public commitments \(\textsf{Acc}\) and \(c_{e}\), our scheme allows one to prove in zero-knowledge the knowledge of \(e\) committed in \(c_{e}\) such that \(e\in P\) and \(\textsf{Acc}= G^{\prod _{e_i \in P} e_i}\). A public coin protocol for this problem was proposed by Camenisch and Lysyanskaya [16]. Their protocol, however, imposes various restrictions. For instance, the accumulator must work with at least \(2\lambda \)-bit long primes, which slows down accumulation time, and the prime order group must be of more than \(4 \lambda \) bits (e.g., of 512 bits), which is undesirable for efficiency reasons, especially if this prime order group is used to instantiate more proof systems to create other proofs about the committed element. In our scheme the goal is instead to keep the prime order group of “normal” size (say, \(2\lambda \) bits), so that it can be for example a prime order group in which we can efficiently instantiate another CP-SNARK that could be composed with our \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\). We can also allow flexible choices of the prime size, tuned to the application, so that applications that work with moderately large sets can benefit in efficiency. In order to achieve these goals, our idea to create a membership proof is to compute the following:

  • An accumulator membership witness \(W = G^{\prod _{e_i \in P{\setminus } \{e\}} e_i}\), and an integer commitment to \(e\) in the RSA group, \(C_{e} = G^{e} H^{r}\), where \(H \in \textsf{QR}_N\).

  • A ZK proof of knowledge \(\textsf {CP}_{\textsf{Root}}\) of a committed root for \(\textsf{Acc}\), i.e. a proof of knowledge of \(e\) and W such that \(W^{e} = \textsf{Acc}\) and \(C_{e} = G^{e} H^{r}\). Intuitively, this gives that \(C_{e}\) commits to an integer that is accumulated in \(\textsf{Acc}\) (at this point, however, the integer may be a trivial root, i.e., 1).

  • A ZK proof \(\textsf {CP}_{\textsf{modEq}}\) that \(C_{e}\) and \(c_{e}\) commit to the same value modulo \(q\).

  • A ZK proof \(\textsf {CP}_{\textsf{Range}}\) that \(c_{e}\) commits to an integer in the range \(\left( 2^{\mu -1},2^{\mu } \right) \).

From the combination of the above proofs we would like to conclude that the integer committed in \(c_{e}\) is in \(P\). Without further restrictions, however, this may not be the case; in particular, since for the value committed in \(C_{e}\) we do not have a strict bound, it may be that the integer committed in \(c_{e}\) is another \(e_{q}\) such that \(e=e_{q}\pmod q\) but \(e\ne e_{q}\) over the integers. In fact, the proof \(\textsf {CP}_{\textsf{Root}}\) does not guarantee us that \(C_{e}\) commits to a single prime number \(e\), but only that \(e\) divides \(\prod _{e_i \in P} e_i\), namely \(e\) might be a product of a few primes in \(P\) or the corresponding negative value, while its residue modulo \(q\) may be some value that is not in the set (what we call a “collision”). We solve this problem by taking into consideration that \(e_{q}\) is guaranteed by \(\textsf {CP}_{\textsf{Range}}\) to be in \(\left( 2^{\mu -1},2^{\mu } \right) \) and by enhancing \(\textsf {CP}_{\textsf{Root}}\) to also prove a bound on \(e\): roughly speaking \(|e| < 2^{2\lambda _{s}+ \mu }\) for a statistical security parameter \(\lambda _{s}\). Using this information we develop a careful analysis that bounds the probability that such collisions can happen for a malicious \(e\) (see Sect. 4.3 for more intuition).
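The plain (non-zero-knowledge) statements behind these proofs, and the collision issue just described, can be illustrated with the following Python sketch. All parameters are toy assumptions chosen for readability: a tiny modulus with known factorization, 5-bit set elements, and a toy prime \(q = 37\) that is only used to reduce integers modulo \(q\). The sketch checks the relations directly on the witnesses and does not model the proofs themselves or the additional bound on \(|e|\).

```python
from math import prod

# Toy parameters (assumptions, NOT secure; a real N has unknown factorization).
N = 2039 * 2063   # product of two safe primes
G = 4             # a quadratic residue modulo N
MU = 5            # set elements are primes in (2^(MU-1), 2^MU) = (16, 32)
Q_ORDER = 37      # toy prime order q of the Pedersen group

def accumulate(primes):
    """Acc = G^(product of all elements) mod N."""
    return pow(G, prod(primes), N)

def witness(primes, e):
    """W = G^(product of all elements except e) mod N."""
    return pow(G, prod(x for x in primes if x != e), N)

def is_root(W, e, acc):
    """Statement of CP_Root (without zero-knowledge): W^e = Acc."""
    return pow(W, e, N) == acc

def in_range(v):
    """Statement of CP_Range: v lies in (2^(MU-1), 2^MU)."""
    return 2 ** (MU - 1) < v < 2 ** MU

P_SET = {17, 19, 23}
acc = accumulate(P_SET)

# Honest case: e = 19 is in the set, and its residue mod q is 19 itself.
W = witness(P_SET, 19)
honest_ok = is_root(W, 19, acc) and in_range(19 % Q_ORDER)

# Collision: e_bad = 17 * 19 also has a root in Acc, and its residue
# modulo q falls in the allowed range although it is not a set element.
e_bad = 17 * 19
W_bad = pow(G, 23, N)
e_q = e_bad % Q_ORDER
collision = is_root(W_bad, e_bad, acc) and in_range(e_q) and e_q not in P_SET
```

With these toy values the residue is \(e_q = 323 \bmod 37 = 27\), which passes the root, equality-modulo-\(q\) and range checks without being in the set. This is the kind of event the bound on \(|e|\) and the accompanying analysis are designed to rule out.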

In the following section we formally describe the type-based commitment scheme supported by our CP-SNARK, and a collection of building blocks. Then we present the \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) and \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) CP-SNARKs in Sects. 4.3 and 4.4 respectively, and finally we give instantiations for some of our building blocks in Sect. 4.5.

Remark 4.1

Although we specifically describe our protocols for RSA groups, they generalize to work over any Hidden Order Group with slight modifications. See Appendix 4 for details.

4.2 Preliminaries and building blocks

4.2.1 Notation

Given a set \(U= \{u_1,\dots ,u_n\} \subset \mathbb {Z}\) of cardinality n we denote compactly with \(\textsf{prod}_{U} :=\prod _{i=1}^{n} u_i\) the product of all its elements. We use capital letters for elements in an RSA group \(\mathbb {Z}_N^*\), e.g., \(G,H \in \mathbb {Z}_N^*\). Conversely, we use small letters for elements in a prime order group \(\mathbb {G}_q\), e.g., \(g,h \in \mathbb {G}_q\). Following this notation, we denote a commitment in a prime order group as \(c \in \mathbb {G}_q\), while a commitment in an RSA group as \(C \in \mathbb {Z}_N^*\).

4.2.2 Commitment schemes

Our first CP-SNARK, called \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\), is for a family of relations \(R_{\textsf{mem}}: \mathcal {D}_{\textsf{elm}} \times \mathcal {D}_{\textsf{set}}\) such that \(\mathcal {D}_{\textsf{elm}} = \{0, 1\}^{\eta }\), \(\mathcal {D}_{\textsf{set}} = 2^{\mathcal {D}_{\textsf{elm}}}\), and for a type-based commitment scheme that is the canonical composition \(\textsf{SetCom}_{\textsf{RSA}}\bullet \textsf{PedCom}\) of the two commitment schemes given in Fig. 4. \(\textsf{PedCom}\) is essentially a classical Pedersen commitment scheme in a group \(\mathbb {G}_{q}\) of prime order \(q\) such that \(q\in (2^{{\nu }-1},2^{\nu })\) and \(\eta < \nu \). \(\textsf{PedCom}\) is used to commit to set elements and its type is \(\textsf{t}_{q}\). \(\textsf{SetCom}_{\textsf{RSA}}\) is a (non-hiding) commitment scheme for sets of \(\eta \)-bit strings, that is built as an RSA accumulator [2, 6] to a set of \(\mu \)-bit primes, each derived from an \(\eta \)-bit string by a deterministic hash function \(\textsf{H}_{\textsf{prime}}: \{0, 1\}^{\eta } \rightarrow \textsf{Primes}\left( 2^{\mu -1},2^{\mu } \right) \). \(\textsf{SetCom}_{\textsf{RSA}}\) is computationally binding under the factoring assumptionFootnote 16 and the collision resistance of \(\textsf{H}_{\textsf{prime}}\). Its type for sets is \(\textsf{t}_{U}\).

Fig. 4 RSA accumulator and Pedersen commitment schemes for \(\textsf{RSAHashmem}\)

4.2.3 Hashing to primes

The problem of mapping arbitrary values to primes in a collision-resistant manner has been studied in the past, see e.g., [14, 29, 43], and in [40] a method to generate random primes is presented. Although the main idea of our scheme would work with any instantiation of \(\textsf{H}_{\textsf{prime}}\), for the goal of significantly improving efficiency, our construction considers a specific class of \(\textsf{H}_{\textsf{prime}}\) functions that work as follows. Let \(\textsf{H}: \{0, 1\}^{\eta } \times \{0, 1\}^{\iota } \rightarrow \{0, 1\}^{\mu -1}\) be a collision-resistant function, and define \(\textsf{H}_{\textsf{prime}}(u)\) as the function that, starting with \(j=0\), looks for the first \(j \in [0,2^{\iota }-1]\) such that the integer represented by the binary string \(1 | \textsf{H}(u,j)\) is prime. If no \(j\) in this range yields a prime, the function outputs \(\perp \) (Footnote 17). We consider two main candidates for such a function \(\textsf{H}\) (and thus \(\textsf{H}_{\textsf{prime}}\)):

  • Pseudorandom function. Namely, \(\textsf{H}(u,j) :=\textsf{F}_{\kappa }(u,j)\) where \(\textsf{F}_{\kappa }:\{0, 1\}^{\eta + \iota } \rightarrow \{0, 1\}^{\mu -1}\) is a PRF with public seed \(\kappa \) and \(\iota = \lceil \log \mu \lambda \rceil \). Due to the density of primes, the corresponding \(\textsf{H}_{\textsf{prime}}\) has expected running time \(O(\mu )\) and \(\perp \) is returned with probability \(\le \textsf{exp}(-\lambda ) =\textsf{negl}(\lambda )\) (Footnote 18). Under the random oracle heuristic, \(\textsf{F}\) can be instantiated with a hash function like SHA256.

  • Deterministic map. \(\textsf{H}(u,j) :=f(u) + j\) with \(u>2^{\eta -1}\) and \(j \in (f(u), f(u+1))\), where \(f(u) :=2(u+ 2) \log _2(u+1)^{2}\). The corresponding function \(\textsf{H}_{\textsf{prime}}(u)\) is essentially the function that maps to the next prime after \(f(u)\). This function is collision-free (indeed, it requires taking \(\mu > \eta \)) and generates primes that can be smaller (in expectation) than those of the function above. Cramér’s conjecture implies that the interval \((f(u), f(u+1))\) contains a prime when \(u\) is sufficiently large.
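The search performed by \(\textsf{H}_{\textsf{prime}}\) in the pseudorandom-function variant can be sketched as follows. This is a minimal illustration, not the paper's instantiation: the names `H`, `h_prime` and `is_probable_prime`, the use of truncated SHA-256 as a stand-in for \(\textsf{F}_{\kappa }\), the 4-byte encoding of \(j\), and the toy sizes \(\mu = 64\), \(\iota = 16\) are all assumptions made for the example.

```python
import hashlib

# Fixed Miller-Rabin bases: deterministic for n < 3.3 * 10^24,
# only a heuristic primality test for larger n.
_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n: int) -> bool:
    if n < 2:
        return False
    for p in _BASES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in _BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def H(u: bytes, j: int, mu: int) -> int:
    """Hypothetical stand-in for H: SHA-256 of (u, j) truncated to mu-1 bits.
    Assumes fixed-length u so that the concatenation is unambiguous."""
    if not 2 <= mu <= 257:
        raise ValueError("this toy H supports 2 <= mu <= 257 only")
    digest = hashlib.sha256(u + j.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - (mu - 1))

def h_prime(u: bytes, mu: int = 64, iota: int = 16):
    """Return (e, j) for the first j in [0, 2^iota - 1] such that
    e = 1 | H(u, j) is prime, or None (playing the role of 'bottom')."""
    for j in range(2 ** iota):
        candidate = (1 << (mu - 1)) | H(u, j, mu)  # prepend the leading 1 bit
        if is_probable_prime(candidate):
            return candidate, j
    return None
```

Returning the index \(j\) alongside the prime mirrors the role of the witness \(\omega \) in \(R_\textsf{HashEq}\): a prover only needs to show the single evaluation \(e = (1 | \textsf{H}(u, j))\), not the failed iterations before it.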

4.2.4 CP-NIZK for \(\textsf{H}\) computation and \(\textsf{PedCom}\)

We use a CP-NIZK \(\textsf {CP}_{\textsf{HashEq}}\) for the relation \(R_\textsf{HashEq}: \{0, 1\}^{\mu } \times \{0, 1\}^{\eta } \times \{0, 1\}^{\iota }\) defined as

$$\begin{aligned} R_\textsf{HashEq}(u_1, u_2, \omega )= 1 \iff u_1 = (1 | \textsf{H}(u_2, \omega )), \end{aligned}$$

and for the commitment scheme \(\textsf{PedCom}\). Essentially, with this scheme one can prove that two commitments \(c_{e}\) and \(c_{u}\) in \(\mathbb {G}_{q}\) are such that \(c_{e}=g^{e} h^{r_{q}}\), \(c_{u}=g^{u}h^{r_u}\) and there exists j such that \(e=(1 | \textsf{H}(u, j))\). As it shall become clear in our security proof, we do not have to prove all the iterations of \(\textsf{H}\) until finding j such that \((1 | \textsf{H}(u, j)) = \textsf{H}_{\textsf{prime}}(u)\) is prime, which saves significantly on the complexity of this CP-NIZK.

4.2.5 Integer commitments

We use a scheme for committing to arbitrarily large integer values in RSA groups introduced by Fujisaki and Okamoto [41] and later improved in [31]. We briefly recall the commitment scheme. Let \(\mathbb {Z}_N^*\) be an RSA group. The commitment key consists of two randomly chosen generators \(G,H \in \mathbb {Z}_N^*\); to commit to any \(x \in \mathbb {Z}\) one chooses randomly an \(r {\leftarrow }{\$}\,[1,N/2]\) and computes \(C \leftarrow G^x H^r\); the verifier checks whether or not \(C = \pm G^x H^r\). This commitment scheme is statistically hiding, as long as G and H lie in the same subgroup of \(\mathbb {Z}_N^*\). This can be achieved by setting \(G \leftarrow F^2, H \leftarrow J^2 \in \textsf{QR}(N)\), where \(F, J\) are randomly sampled from \(\mathbb {Z}_N^*\). Moreover, it is computationally binding under the assumption that factoring is hard in \(\mathbb {Z}_N^*\). Furthermore, a proof of knowledge of an opening was presented in [31]; its knowledge soundness was based on the strong RSA assumption and was later shown to be reducible to the plain RSA assumption in [25]. We denote this commitment scheme as \(\textsf{IntCom}\).
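As a rough illustration of \(\textsf{IntCom}\), the following sketch implements key generation, commitment and the \(\pm \) verification check. The function names are hypothetical and any modulus small enough to run instantly (such as \(N = 1019 \cdot 1187\) used in the usage note) offers no security; the sketch only shows the arithmetic.

```python
import secrets
from math import gcd

def keygen(N: int):
    """Sample G = F^2 and H = J^2 mod N for random units F, J of Z_N^*."""
    def random_unit() -> int:
        while True:
            x = secrets.randbelow(N - 2) + 2  # x in [2, N-1]
            if gcd(x, N) == 1:
                return x
    return pow(random_unit(), 2, N), pow(random_unit(), 2, N)

def commit(G: int, H: int, N: int, x: int):
    """Commit to an arbitrary integer x; returns (C, r) with r in [1, N/2].
    Negative x relies on modular inverses via pow (Python 3.8+)."""
    r = secrets.randbelow(N // 2) + 1
    return (pow(G, x, N) * pow(H, r, N)) % N, r

def verify(G: int, H: int, N: int, C: int, x: int, r: int) -> bool:
    """Accept if C = +G^x H^r or C = -G^x H^r modulo N."""
    expected = (pow(G, x, N) * pow(H, r, N)) % N
    return C == expected or C == (N - expected) % N
```

For example, with `N = 1019 * 1187` and `G, H = keygen(N)`, calling `C, r = commit(G, H, N, -7)` followed by `verify(G, H, N, C, -7, r)` accepts. The committed value is an integer, not a residue, which is why the later analysis has to bound \(|e|\) explicitly.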

4.2.6 Strong-RSA accumulators

As observed earlier, our commitment scheme for sets is an RSA accumulator \(\textsf{Acc}\) computed on the set of primes \(P\) derived from \(U\) through the map to primes, i.e., \(P:=\{\textsf{H}_{\textsf{prime}}(s) | s \in U\}\). In our construction we use the accumulator’s feature for computing succinct membership witnesses, which we recall works as follows. Given \(\textsf{Acc}= G^{\prod _{e_i \in P}e_i} :=G^{\textsf{prod}_P}\), the membership witness for \(e_{k}\) is \(W_k = G^{\prod _{e_i \in P{\setminus } \{e_k\}}e_i}\), which can be verified by checking if \(W_k^{e_k} = \textsf{Acc}\).
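The accumulator and witness computations recalled above can be sketched in a few lines. The helper names are hypothetical and the modulus in the usage note is a toy value chosen only so that the example runs instantly.

```python
from math import prod

def accumulate(G: int, primes: list, N: int) -> int:
    """Acc = G^(product of all primes) mod N."""
    return pow(G, prod(primes), N)

def witness(G: int, primes: list, e_k: int, N: int) -> int:
    """W_k = G^(product of all primes except e_k) mod N."""
    rest = [e for e in primes if e != e_k]
    return pow(G, prod(rest), N)

def verify_membership(W: int, e: int, acc: int, N: int) -> bool:
    """Check W^e = Acc mod N."""
    return pow(W, e, N) == acc
```

With `N = 1019 * 1187`, `G = 4` and `primes = [101, 103, 107]`, the witness `witness(G, primes, 103, N)` verifies against `accumulate(G, primes, N)` for `e = 103`. Both the accumulator and the witness are single elements of \(\mathbb {Z}_N^*\), regardless of how many primes are accumulated.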

4.2.7 Argument of knowledge of a root

We make use of a zero-knowledge non-interactive argument of knowledge of a root of a public RSA group element \(\textsf{Acc}\in \textsf{QR}_N\). This NIZK argument is called \(\textsf {CP}_\textsf{Root}\). More precisely, it takes in an integer commitment to an \(e\in \mathbb {Z}\) and then proves knowledge of an \(e\)-th root of \(\textsf{Acc}\), i.e., of \(W=\textsf{Acc}^{\frac{1}{e}}\). More formally, \(\textsf {CP}_\textsf{Root}\) is a NIZK for the relation \(R_\textsf{Root}: (\mathbb {Z}_N^*\times \textsf{QR}_N \times \mathbb {N}) \times (\mathbb {Z}\times \mathbb {Z}\times \mathbb {Z}_N^*)\) defined as

\(R_\textsf{Root}\left( (C_e, \textsf{Acc}, \mu ),(e, r, W) \right) = 1\) iff,

$$\begin{aligned} C_e= \pm G^eH^r \bmod N \; \wedge \; W^e= \textsf{Acc}\bmod N \; \wedge \; |e| < 2^{\lambda _{z}+ \lambda _{s}+ \mu +2}, \end{aligned}$$

where \(\lambda _{z}\) and \(\lambda _{s}\) are the statistical zero-knowledge and soundness security parameters respectively of the protocol \(\textsf {CP}_\textsf{Root}\). \(\textsf {CP}_\textsf{Root}\) is obtained by applying the Fiat–Shamir transform to a public-coin protocol that we propose based on ideas from the protocol of Camenisch and Lysyanskaya for proving knowledge of an accumulated value [16]. In [16], the protocol ensures that the committed integer \(e\) is in a specific range, different from 1 and positive. In our \(\textsf {CP}_\textsf{Root}\) protocol we instead removed these constraints and isolated the portion of the protocol that only proves knowledge of a root. We present the \(\textsf {CP}_\textsf{Root}\) protocol in Sect. 4.5; its interactive public-coin version is knowledge sound under the RSA assumption and statistical zero-knowledge. Finally, we notice that the relation \(R_{\textsf{Root}}\) is defined for statements where \(\textsf{Acc}\in \textsf{QR}_N\), which may not be efficiently checkable given only N if \(\textsf{Acc}\) is adversarially chosen. Nevertheless \(\textsf {CP}_\textsf{Root}\) can be used in larger cryptographic constructions that guarantee \(\textsf{Acc}\in \textsf{QR}_N\) through some extra information, as is the case in our scheme.

4.2.8 Proof of equality of commitments in \(\mathbb {Z}_N^*\) and \(\mathbb {G}_{q}\)

Our last building block, called \(\textsf {CP}_\textsf{modEq}\), proves in zero-knowledge that two commitments, a Pedersen commitment in a prime order group and an integer commitment in an RSA group, open to the same value modulo the prime order \(q= \textsf{ord}(\mathbb {G})\). This is a conjunction of a classic Pedersen \(\varSigma \)-protocol and a proof of knowledge of opening of an integer commitment [31], i.e. for the relation

$$\begin{aligned} R_\textsf{modEq}\left( (C_e,c_{e}),(e, e_{q}, r, r_{q}) \right) = 1 \text { iff }&\; e= e_{q}\bmod q\wedge C_e=\pm G^eH^r \bmod N \\&\wedge c_{e} = g^{e_{q}\bmod q}h^{r_{q}\bmod q}. \end{aligned}$$

We present \(\textsf {CP}_\textsf{modEq}\) in Sect. 4.5.
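To make the three conjuncts of \(R_\textsf{modEq}\) concrete, here is a plain (non-zero-knowledge) check of the relation on explicit openings. It is only a sketch: the function name is hypothetical, and the Pedersen group is assumed, for illustration, to be the order-\(q\) subgroup of \(\mathbb {Z}_p^*\) for a prime \(p\).

```python
def r_modeq(C_e, c_e, e, e_q, r, r_q, *, N, G, H, p, q, g, h) -> bool:
    """Evaluate R_modEq on the statement (C_e, c_e) and witness (e, e_q, r, r_q)."""
    # Integer commitment opening, up to sign: C_e = +/- G^e H^r mod N.
    X = (pow(G, e, N) * pow(H, r, N)) % N
    int_opening = C_e in (X, (N - X) % N)
    # Pedersen opening in the order-q subgroup of Z_p^*.
    ped_opening = c_e == (pow(g, e_q % q, p) * pow(h, r_q % q, p)) % p
    # Equality of the two committed values modulo q only.
    return (e - e_q) % q == 0 and int_opening and ped_opening
```

Note that the relation is satisfied by any integer \(e\) congruent to \(e_{q}\) modulo \(q\), for instance \(e = e_{q}+ q\); the relation alone does not force \(e = e_{q}\) over the integers.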

4.3 Our CP-SNARK \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\)

We are now ready to present our CP-SNARK \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) for set membership. The scheme is fully described in Fig. 5 and makes use of the building blocks presented in the previous section.

The \(\textsf{KeyGen}\) algorithm takes as input the commitment key of \(\textsf{Com}_{1}\) and a description of \(R_{\textsf{mem}}\) and does the following: it samples a random generator \(H {\leftarrow }{\$}\,\textsf{QR}_N\) so that \((G, H)\) define a key for the integer commitment, and generates a CRS \(\textsf{crs}_{\textsf{HashEq}}\) of the \(\textsf {CP}_\textsf{HashEq}\) CP-NIZK.

For generating a proof, the ideas are similar to the ones informally described at the beginning of Sect. 4 for the case when set elements are prime numbers. In order to support sets \(U\) of arbitrary strings the main differences are the following: (i) we use \(\textsf{H}_{\textsf{prime}}\) in order to derive a set of primes \(P\) from \(U\); (ii) given a commitment \(c_{u}\) to an element \(u\in \{0, 1\}^{\eta }\), we commit to \(e= \textsf{H}_{\textsf{prime}}(u)\) in \(c_{e}\); (iii) we use the previously mentioned ideas to prove that \(c_{e}\) commits to an element in \(P\) (that is correctly accumulated), except that we replace the range proof \(\pi _{\textsf{Range}}\) with a proof \(\pi _\textsf{HashEq}\) that \(c_{u}\) and \(c_{e}\) commit to \(u\) and \(e\) respectively, such that \(\exists j: e= (1 | \textsf{H}(u,j))\).

Remark 4.2

(On the support of larger \(\eta \)) In order to commit to a set element \(u\in \{0, 1\}^{\eta }\) with the \(\textsf{PedCom}\) scheme we require \(\eta < \nu \). This condition is actually used for ease of presentation. It is straightforward to extend our construction to the case \(\eta \ge \nu \), in which case every \(u\) should be split in blocks of less than \(\nu \) bits that can be committed using the vector Pedersen commitment (Fig. 4).

Fig. 5 \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) CP-SNARK for set membership

The correctness of \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) can be checked by inspection: essentially, it follows from the correctness of all the building blocks and the condition that \(\eta , \mu < \nu \). For succinctness, we observe that the commitments \(C_{U}, c_{u}\) and all the three proofs have size that does not depend on the cardinality of the set \(U\), which is the only portion of the witness whose size is not a-priori fixed.

4.3.1 Proof of security

Recall that the goal is to prove in ZK that \(c_{u}\) is a commitment to an element \(u\in \{0, 1\}^{\eta }\) that is in a set \(U\) committed in \(C_{U}\). Intuitively, we obtain the security of our scheme from the conjunction of proofs for relations \(R_\textsf{Root}, R_\textsf{modEq}\) and \(R_\textsf{HashEq}\): (i) \(\pi _{\textsf{HashEq}}\) gives us that \(c_{e}\) commits to \(e_{q}= (1|\textsf{H}(u,j))\) for some j and for \(u\) committed in \(c_{u}\). (ii) \(\pi _{\textsf{modEq}}\) gives that \(C_{e}\) commits to an integer \(e\) such that \(e\mod q= e_{q}\) is committed in \(c_{e}\). (iii) \(\pi _{\textsf{Root}}\) gives us that the integer \(e\) committed in \(C_{e}\) divides \(\textsf{prod}_{P}\), where \(C_{U} = G^{\textsf{prod}_{P}}\) with \(P= \{ \textsf{H}_{\textsf{prime}}(u_i): u_{i} \in U\}\).

By combining these three facts we would like to conclude that \(e_{q}\in P\) that, together with \(\pi _{\textsf{HashEq}}\), should also guarantee \(u\in U\). A first problem to analyze, however, is that for \(e\) we do not have guarantees of a strict bound in \(\left( 2^{\mu -1},2^{\mu } \right) \); so it may in principle occur that \(e=e_{q}\pmod q\) but \(e\ne e_{q}\) over the integers. Indeed, the relation \(R_{\textsf{Root}}\) does not guarantee us that \(e\) is a single prime number, but only that \(e\) divides the product of primes accumulated in \(C_{U}\). Assuming the hardness of Strong RSA we may still have that \(e\) is the product of a few primes in \(P\) or even is a negative integer. We expose a simple attack that could arise from this: an adversary can find a product of primes from the set \(P\), call it \(e\), such that \(e=e_{q}\pmod q\) but \(e\ne e_{q}\) over the integers. Since \(e\) is a legitimate product of members of \(P\), the adversary can efficiently compute the \(e\)-th root of \(C_{U}\) and provide a valid \(\pi _\textsf{Root}\) proof. This is what we informally call a “collision”. Another simple attack would be that an adversary takes a single prime \(e\) and then commits to its opposite \(e_{q}\leftarrow -e\mod q\) in the prime order group. Again, since \(e\in P\) the adversary can efficiently compute the \(e\)-th root of \(C_{U}\), \(W^{e}= C_{U}\), and then the corresponding \(-e\)-th root of \(C_{U}\), \(\left( W^{-1} \right) ^{-e} = C_{U}\). This is a second type of attack to achieve what we called “collision”. With a careful analysis we show that with appropriate parameters the probability that such collisions occur can be either 0 or negligible.
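Both attack shapes described above are easy to reproduce on toy numbers. The sketch below is illustrative only: the helper names are hypothetical, and the parameters in the usage note are tiny values chosen for readability. The first helper computes a root of the accumulator for any exponent that is plus or minus a product of accumulated primes; the second enumerates such exponents whose residue modulo \(q\) falls in \(\left( 2^{\mu -1},2^{\mu } \right) \), which is what a "collision" requires.

```python
from itertools import combinations
from math import prod

def root_of_accumulator(G: int, primes: list, e: int, N: int) -> int:
    """Return W with W^e = G^prod(primes) mod N, for e = +/- a product
    of accumulated primes (requires Python 3.8+ for modular inverses)."""
    cofactor, rem = divmod(prod(primes), abs(e))
    if rem != 0:
        raise ValueError("e does not divide the accumulated product")
    W = pow(G, cofactor, N)
    # For negative e use the inverse: (W^{-1})^{-|e|} = W^{|e|} = Acc.
    return W if e > 0 else pow(W, -1, N)

def colliding_exponents(primes: list, q: int, mu: int, d: int) -> list:
    """List every e = +/- product of at most d accumulated primes, other than
    a single accumulated prime itself, with e mod q in (2^(mu-1), 2^mu)."""
    lo, hi = 1 << (mu - 1), 1 << mu
    hits = []
    for size in range(1, d + 1):
        for combo in combinations(primes, size):
            for e in (prod(combo), -prod(combo)):
                if size == 1 and e > 0:
                    continue  # honest case: one accumulated prime
                if lo < e % q < hi:
                    hits.append(e)
    return hits
```

For instance, with `N = 1019 * 1187`, `G = 4` and `primes = [101, 103, 107]`, both `e = 101 * 103` and `e = -107` yield a `W` for which `pow(W, e, N)` equals the accumulator, so a root proof alone cannot rule these exponents out. Whether any of them is also a collision depends on `q` and `mu`, which is what `colliding_exponents` checks by brute force.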

One key observation is that \(R_\textsf{Root}\) does guarantee a lower and an upper bound, \(-2^{\lambda _{z}+ \lambda _{s}+ \mu + 2}\) and \(2^{\lambda _{z}+ \lambda _{s}+ \mu + 2}\) respectively, for \(e\) committed in \(C_{e}\). From these bounds (and that \(e\mid \textsf{prod}_{P}\)) we get that an adversarial \(e\) can be the product of at most \(d = 1 + \lfloor \frac{\lambda _{z}+ \lambda _{s}+ 2}{\mu }\rfloor \) primes in \(P\) (or their corresponding negative product). Then, if \(2^{d\mu } \le 2^{\nu -2} < q\), or \(d \mu + 2 \le \nu \), we get that \(e< 2^{d\mu } < q\). In case \(e>0\) and since \(q\) is prime, \(e= e_{q}\bmod q\wedge e< q\) implies that \(e= e_{q}\) over \(\mathbb {Z}\), namely no collision can occur at all. In the other case \(e<0\) we have \(e> -2^{d \mu }\) and \(e= e_{q}\pmod q\) implies \(e\le -q+e_{q}< -2^{\nu -1}+2^{\mu } < -2^{\nu -1}+2^{\nu -2}=-2^{\nu -2}\). Therefore, \(-2^{d \mu }<-2^{\nu -2}\), which is a contradiction since we assumed \(d \mu +2 \le \nu \). So this type of collision cannot happen.
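The parameter condition above is simple integer arithmetic and can be checked mechanically. The following sketch uses hypothetical function names; the concrete values in the usage note are illustrative and are not claimed to be the parameters used in the paper.

```python
def max_prime_factors(lambda_z: int, lambda_s: int, mu: int) -> int:
    """d = 1 + floor((lambda_z + lambda_s + 2) / mu): the largest number of
    mu-bit primes whose product can stay below the bound enforced on |e|."""
    return 1 + (lambda_z + lambda_s + 2) // mu

def collision_free(lambda_z: int, lambda_s: int, mu: int, nu: int) -> bool:
    """True when d * mu + 2 <= nu, i.e. the setting in which collisions
    are impossible rather than merely improbable."""
    return max_prime_factors(lambda_z, lambda_s, mu) * mu + 2 <= nu
```

As an example, `max_prime_factors(128, 128, 256)` gives `d = 2`, so the collision-free condition would need `nu >= 514`; with a prime order group of roughly 255 bits the condition fails and one falls into the second, probabilistic setting discussed next.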

If on the other hand we are in a parameters setting where \(d \mu > \nu -2\), we give a concrete bound on the probability that such collisions occur. More precisely, for this case we need to assume that the integers returned by \(\textsf{H}\) are random, i.e., \(\textsf{H}\) is a random oracle, and we also use the implicit fact that \(R_\textsf{HashEq}\) guarantees that \(e_{q}\in \left( 2^{\mu -1},2^{\mu } \right) \). Then we give a concrete bound on the probability that the product of d out of \(\textsf{poly}(\lambda )\) random primes lies in a specific range \(\left( 2^{\mu -1},2^{\mu } \right) \), which turns out to be negligible when d is constant and \(2^{\mu - \nu }\) is negligible.

Since the requirements of security are slightly different according to the setting of parameters mentioned above, we state two separate theorems, one for each case.

Theorem 4.1

Let \(\textsf{PedCom}\), \(\textsf{SetCom}_{\textsf{RSA}}\) and \(\textsf{IntCom}\) be computationally binding commitments, \(\textsf {CP}_\textsf{Root}\), \(\textsf {CP}_\textsf{modEq}\) and \(\textsf {CP}_\textsf{HashEq}\) be knowledge-sound NIZK arguments, and assume that the Strong RSA assumption holds, and that \(\textsf{H}\) is collision resistant. If \(d \mu + 2 \le \nu \), then \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) is knowledge-sound with partial opening of the set commitments \(C_{U}\).

Theorem 4.2

Let \(\textsf{PedCom}\), \(\textsf{SetCom}_{\textsf{RSA}}\) and \(\textsf{IntCom}\) be computationally binding commitments, \(\textsf {CP}_\textsf{Root}\), \(\textsf {CP}_\textsf{modEq}\) and \(\textsf {CP}_\textsf{HashEq}\) be knowledge-sound NIZK arguments, and assume that the Strong RSA assumption holds, and that \(\textsf{H}\) is collision resistant. If \(d \mu + 2 > \nu \), \(d = O(1)\) is a small constant, \(2^{\mu - \nu } \in \textsf{negl}(\lambda )\) and \(\textsf{H}\) is modeled as a random oracle, then \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) is knowledge-sound with partial opening of the set commitments \(C_{U}\).

Remark 4.3

It is worth noting that Theorem 4.2, where we assume \(\textsf{H}\) to be a random oracle, requires a random oracle assumption stronger than usual; this has to do with the fact that while we assume \(\textsf{H}\) to be a random oracle we also assume that \(\textsf {CP}_\textsf{HashEq}\) can create proofs about correct computations of \(\textsf{H}\). Similar assumptions have been considered in previous works, see, e.g., [71, Remark 2].

Finally, we state the theorem about the zero-knowledge of \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\).

Theorem 4.3

Let \(\textsf{PedCom}\), \(\textsf{SetCom}_{\textsf{RSA}}\) and \(\textsf{IntCom}\) be statistically hiding commitments, \(\textsf {CP}_\textsf{Root}\), \(\textsf {CP}_\textsf{modEq}\) and \(\textsf {CP}_\textsf{HashEq}\) be zero-knowledge arguments. Then \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) is zero-knowledge.

Proof

(Sketch) The proof is rather straightforward, so we only provide a sketch. We define the simulator \(\mathcal {S}\) that takes as input \((\textsf{crs},C_{U},c_{u})\) and does the following:

  • Parses \(\textsf{crs}:=(N,G,H, \textsf{H}_{\textsf{prime}}, \mathbb {G}_q,g,h, \textsf{crs}_{\textsf{HashEq}})\), from which it computes the corresponding \(\textsf{crs}_{\textsf{Root}} :=(N,G,H)\) and \(\textsf{crs}_{\textsf{modEq}} :=(N,G,H,\mathbb {G}_q,g,h) \).

  • Samples at random \(C_{e}^* {\leftarrow }{\$}\,\mathbb {Z}_N^*\) and \(c_{e}^* {\leftarrow }{\$}\,\mathbb {G}_q\).

  • Invokes \(\mathcal {S}_{\textsf{Root}}(\textsf{crs}_{\textsf{Root}},C_e^*,C_{U})\), \(\mathcal {S}_{\textsf{modEq}}(\textsf{crs}_\textsf{modEq},C_{e}^*,c_{e}^*)\) and \(\mathcal {S}_{\textsf{HashEq}}(\textsf{crs}_{\textsf{HashEq}},c_{e}^*,c_{u})\), the corresponding simulators of \(\textsf {CP}_{\textsf{Root}}\), \(\textsf {CP}_{\textsf{modEq}}\) and \(\textsf {CP}_{\textsf{HashEq}}\) respectively. They output simulated proofs \(\pi _{\textsf{Root}}^*\), \(\pi _{\textsf{modEq}}^*\) and \(\pi _{\textsf{HashEq}}^*\) respectively.

  • \(\mathcal {S}\) outputs \((C_{e}^*,c_{e}^*,\pi _{\textsf{Root}}^*,\pi _{\textsf{modEq}}^*,\pi _{\textsf{HashEq}}^*)\).

Let \(\pi :=(C_e,c_{e}, \pi _{\textsf{Root}},\pi _{\textsf{modEq}},\pi _{\textsf{HashEq}}) \leftarrow \textsf{Prove}(\textsf{crs}, (C_{U}, c_{u}), (U, u),(\varnothing , r_{u}))\) be the output of a real proof. Since \(\textsf{IntCom}\) and \(\textsf{PedCom}\) are statistically hiding \(C_{e}^*\) and \(c_{e}^*\) are indistinguishable from \(C_e\) and \(c_{e}\) resp. Finally, since \(\textsf {CP}_{\textsf{Root}}\), \(\textsf {CP}_{\textsf{modEq}}\) and \(\textsf {CP}_{\textsf{HashEq}}\) are zero knowledge arguments \(\pi _{\textsf{Root}}^*\), \(\pi _{\textsf{modEq}}^*\) and \(\pi _{\textsf{HashEq}}^*\) are indistinguishable from \(\pi _{\textsf{Root}}\), \(\pi _{\textsf{modEq}}\) and \(\pi _{\textsf{HashEq}}\) resp. \(\square \)

4.3.2 Notation

We introduce some notation that eases the exposition of our proofs. Let \(U= \{u_1,\dots ,u_n\} \subset \mathbb {Z}\) be a set of cardinality n. We denote as \(\textsf{prod}\) a product of (an arbitrary number of) elements of \(U\), \(\textsf{prod}= \prod _{i \in I}u_i\), for some \(I \subseteq [n]\). Furthermore, \(\Pi _U= \{\textsf{prod}_1,\dots , \textsf{prod}_{2^n-1}\}\) is the set of all possible products and more specifically \(\Pi _{U,d} \subseteq \Pi _U\) denotes the set of possible products of exactly d elements of \(U\), \(|I|=d\), while for the degenerate case of \(d > n\) we define \(\Pi _{U,d}= \emptyset \). We note that \(|\Pi _{U,d}| = \left( {\begin{array}{c}n\\ d\end{array}}\right) \) (except for the degenerate case where \(|\Pi _{U,d}| =0\)). For convenience, in the special case of \(\textsf{prod}\in \Pi _{U,|U|}\), i.e. the (unique) product of all elements of \(U\), we will simply write \(\textsf{prod}_U\). Finally, for a \(J \subseteq [n]\) we let \(\Pi _{U,J} = \cup _{j \in J}\Pi _{U,j}\); for example \(\Pi _{U,[1,\dots ,d]} = \cup _{j=1}^d\Pi _{U,j}\) is the set of all possible products of up to d elements of \(U\). For all of the above we also denote with "−" the corresponding set of opposite elements, e.g. \(-\Pi _U= \{-\textsf{prod}_1,\dots , -\textsf{prod}_{2^n-1}\}\).

Proof of Theorem 4.1

Let \(\mathcal {P}^*\) be a malicious prover, i.e., a PPT adversary against Knowledge Soundness with Partial Opening (see the definition in Sect. 2.6), that on input \((\textsf{ck}, R_{\textsf{mem}}, \textsf{crs}, \textsf{aux}_{R}, \textsf{aux}_{Z})\) outputs \(\left( C_{U}, c_{u}, U, \pi \right) \) such that the verifier \(\mathcal {V}\) accepts, i.e. \( \textsf{VerProof}(\textsf{crs}, (C_{U}, c_{u}), \pi )= 1\) and \(\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{U}, C_{U}, U, \varnothing )=1\), with non-negligible probability \(\epsilon \). We will construct a PPT extractor \(\mathcal {E}\) that on the same input outputs a partial witness \((u,r_{u})\) such that \(R_{\textsf{mem}}(U,u)=1 \wedge \textsf{VerCommit}(\textsf{ck}, \textsf{t}_{q}, c_{u}, u, r_{u})=1\).

For this we rely on the Knowledge Soundness of the \(\textsf {CP}_\textsf{Root}, \textsf {CP}_\textsf{modEq}\) and \(\textsf {CP}_\textsf{HashEq}\) protocols. \(\mathcal {E}\) parses \(\pi :=(C_e,c_{e}, \pi _{\textsf{Root}},\pi _{\textsf{modEq}},\pi _{\textsf{HashEq}})\) and \(\textsf{crs}:=(N,G,H, \textsf{H}_{\textsf{prime}}, \mathbb {G}_q,g,h, \textsf{crs}_{\textsf{HashEq}})\), from which it computes the corresponding \(\textsf{crs}_{\textsf{Root}} :=(N,G,H)\) and \(\textsf{crs}_{\textsf{modEq}} :=(N,G,H,\mathbb {G}_q,g,h) \). Then it constructs an adversary \(\mathcal {A}_{\textsf{Root}}\) for \(\textsf {CP}_\textsf{Root}\) Knowledge Soundness that outputs \((C_e, C_{U}, \mu ,\pi _{\textsf{Root}})\). Since \(\mathcal {V}\) accepts \(\pi \), it also accepts \(\pi _{\textsf{Root}}\), i.e., \(\textsf {CP}_{\textsf{Root}}.\textsf{VerProof}(\textsf{crs}_{\textsf{Root}},(C_{e}, C_{U}, \mu ), \pi _{\textsf{Root}})=1\). From Knowledge Soundness of \(\textsf {CP}_{\textsf{Root}}\) we know that there is an extractor \(\mathcal {E}_{\textsf{Root}}\) that outputs \((e,r,W)\) such that \(C_{e}=\pm G^{e}H^r \pmod N \wedge W^{e}= C_{U} \pmod N \wedge |e| < 2^{\lambda _{z}+ \lambda _{s}+\mu + 2}\). Similarly, \(\mathcal {E}\) constructs adversaries \(\mathcal {A}_{\textsf{modEq}}\) and \(\mathcal {A}_{\textsf{HashEq}}\) of protocols \(\textsf {CP}_{\textsf{modEq}}\) and \(\textsf {CP}_{\textsf{HashEq}}\) respectively. And similarly there are extractors \(\mathcal {E}_{\textsf{modEq}}\) and \(\mathcal {E}_{\textsf{HashEq}}\) that output \((e',e_{q},r',r_{q})\) such that \(e' = e_{q}\pmod q\wedge C_{e}=\pm G^{e'}H^{r'} \pmod N \wedge c_{e} = g^{e_{q}\bmod q}h^{r_{q}\bmod q} \) and \((e_{q}',u,r_{q}', r_{u},j)\) such that \(c_{e}=g^{e_{q}'}h^{r_{q}'} \wedge e_{q}' = (1 | \textsf{H}(u, j))\) respectively.

From the Binding property of the integer commitment scheme we get that \(e=e'\) and \(r = r'\) (over the integers), except with negligible probability. Similarly, from the Binding property of the Pedersen commitment scheme we get that \(e_{q}=e_{q}' \pmod q\) and \(r_{q}= r_{q}' \pmod q\), except with negligible probability. So if we put everything together the extracted values are \((e,r,W,e_{q},r_{q},u,r_{u},j)\) such that:

$$\begin{aligned} W^{e}= C_{U} \pmod N \wedge |e| < 2^{\lambda _{z}+ \lambda _{s}+\mu + 2} \wedge e= e_{q}\pmod q\wedge e_{q}= (1 | \textsf{H}(u, j)), \end{aligned}$$

and additionally

$$\begin{aligned} C_{e}= \pm G^{e}H^r \wedge c_{e} = g^{e_{q}\bmod q}h^{r_{q}\bmod q} \wedge \textsf{VerCommit}(\textsf{ck},\textsf{t}_{q},c_{u},u,r_{u})=1. \end{aligned}$$

From \(\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{U}, C_{U}, U, \varnothing )=1\) we infer that \(C_{U} = G^{\textsf{prod}_{P}}\), where \(P:=\{ \textsf{H}_{\textsf{prime}}(u) \mid u\in U\} \). From the strong RSA assumption, since \(W^{e}= C_{U} = G^{\textsf{prod}_{P}} \pmod N\), we get \(e\in \Pi _{P}\) or \(e\in -\Pi _{P}\), except with negligible probability (see Appendix 2).

Since all the elements of \(P\) are outputs of \(\textsf{H}_{\textsf{prime}}\), they have bit length exactly \(\mu \), that is \(2^{\mu -1}< e_i < 2^{\mu }\) for each \(e_i \in P\). This means that \(e\) is a (±) product of \(\mu \)-bit primes. Let \(|e|\) be a product of \(\ell \) primes, meaning that \(2^{\ell (\mu -1)}< |e| < 2^{\ell \mu }\), and \(d :=\lfloor \frac{ \lambda _{z}+ \lambda _{s}+\mu + 2}{\mu } \rfloor \). From \(|e| < 2^{\lambda _{z}+ \lambda _{s}+\mu + 2}\) we get that \(2^{\ell \mu }< 2^{\lambda _{z}+ \lambda _{s}+\mu + 2} \Rightarrow \ell \le d\), which means that \( e\in \Pi _{P,[1,\dots ,d]}\) or \( e\in -\Pi _{P,[1,\dots ,d]}\) (i.e. \(e\) is a (±) product of at most d primes).

First we show that \(e\in \Pi _{P}\), i.e., that \(e\) cannot be negative. Suppose, towards a contradiction, that \(e\in -\Pi _{P,[1,\dots ,d]}\). We use the fact that \(e= e_{q}\pmod q\), so \(e\le -q+e_{q}< -2^{\nu -1}+2^{\mu } < -2^{\nu -1}+2^{\nu -2}=-2^{\nu -2} \). Since \(-2^{d \mu }<e\) this leads to \(-2^{d \mu } < -2^{\nu -2}\), which contradicts the assumption \(d \mu +2 \le \nu \) (we used the fact that \(e_{q}= (1 | \textsf{H}(u, j))\) to conclude that \(2^{\mu -1}< e_{q}< 2^{\mu }\), which comes from the definition of \(\textsf{H}\)). So \(e> 0\), i.e., \( e\in \Pi _{P,[1,\dots ,d]}\).

Recall that \(e< 2^{d \mu }\). The assumption \(d \mu + 2 \le \nu \) then gives \(e< 2^{d \mu }\le 2^{\nu -2}<q\Rightarrow e< q\). Since \(e= e_{q}\pmod q\) and \(0< e< q\), this means that \(e= e_{q}\) over the integers. Again we are using the fact that \(e_{q}= (1 | \textsf{H}(u, j))\) to conclude that \(2^{\mu -1}< e_{q}< 2^{\mu }\), which comes from the definition of \(\textsf{H}\), and combined with \(e= e_{q}\) we get that \(2^{\mu -1}< e< 2^{\mu }\). The last fact means that \(e\in \Pi _{P,\{1\}}\) (i.e. \(e\) is exactly one prime from \(P\)), since otherwise it would exceed \(2^{\mu }\); so \(e\in P\).

Finally, \(e= e_{q}= (1 | \textsf{H}(u, j)) = \textsf{H}_{\textsf{prime}}(u) \in P= \{\textsf{H}_{\textsf{prime}}(u_1),\dots ,\textsf{H}_{\textsf{prime}}(u_n)\}\), where \(U:=\{u_1,\dots ,u_n\}\). This means that there is an i such that \(\textsf{H}_{\textsf{prime}}(u) = \textsf{H}_{\textsf{prime}}(u_i)\). From collision resistance of \(\textsf{H}_{\textsf{prime}}\) we infer that \(u= u_i\). So we conclude that \(u\in U\), i.e., \(R_{\textsf{mem}}(U,u) = 1\), and as shown above \(\textsf{VerCommit}(\textsf{ck},\textsf{t}_{q},c_{u},u,r_{u})=1\). \(\square \)

4.3.3 Collision finding analysis

For the second theorem we cannot count on the formula \(d \mu + 2 \le \nu \) that ensures that the extracted integer \(e\) lies inside \([0,q-1]\). As explained above, we can only rely on the randomness of each prime to avoid the described “collisions”. First, we formally define what a “collision” is through a probabilistic experiment, \(\textsf{CollisionFinding}\), and then we compute a concrete bound for the probability that this event happens, i.e. the experiment outputs 1. Finally, we state a theorem that shows this probability is asymptotically negligible under the assumption that \(2^{\mu -\nu }\) is a negligible value (and d is a constant).

[Figure: the \(\textsf{CollisionFinding}\) experiment]

Lemma 4.1

Let \(\mathbb {G}_{q}\) be a prime order group of order \(q\in \left( 2^{\nu -1}, 2^{\nu } \right) \) and \(\mu \) such that \(\mu < \nu \). Then \(Pr[\textsf{CollisionFinding}(\mu ,d,\mathbb {G}_{q},n)=1] \le 2 \cdot \sum _{j=2}^d \frac{\left( {\begin{array}{c}n\\ j\end{array}}\right) 2^{(j+1)\mu - j- \nu } (2^j-1)}{\frac{2^{j\mu -j}}{(\mu -1)^j} -\left( {\begin{array}{c}n\\ j\end{array}}\right) }\).

Proof

First we will prove it for positive products, that is we bound the probability

\(Pr[\textsf{CollisionFinding}(\mu ,d,\mathbb {G}_{q},n)=1 | \textsf{prod}\in \Pi _{P,[2,d]}]\). Let \(\textsf{prod}= q_1...q_j\) be a product of exactly j primes for a \(2 \le j \le d\). Since \(q_i \in \left( 2^{\mu -1},2^{\mu } \right) \) we get \(\textsf{prod}=q_1...q_j \in \left( 2^{j\mu -j},2^{j \mu } \right) \). Also \(\mathbb {Z}_q^*\) is cyclic so we know that at most

$$\begin{aligned} \Bigg \lceil \frac{\left| \left( 2^{j\mu -j},2^{j \mu } \right) \right| }{q} \Bigg \rceil = \Bigg \lceil \frac{2^{j\mu } - 2^{j\mu -j}}{q} \Bigg \rceil = \Bigg \lceil \frac{2^{j\mu - j} \cdot (2^j-1)}{q} \Bigg \rceil \le 2^{j\mu - j- \nu +1} \cdot (2^j-1), \end{aligned}$$

integers in \(\left( 2^{j\mu -j},2^{j \mu } \right) \) are equal to c modulo \(q\), for any \(c \in \{0,1,...,q-1\}\).

We are interested in the interval \(\left( 2^{\mu -1},2^{\mu } \right) \) modulo \(q\). From the previous we get that at most \(2^{j\mu - j- \nu +1} \cdot (2^j-1) \cdot \left| \left( 2^{\mu -1}, 2^{\mu } \right) \right| = 2^{j\mu - j- \nu +1} \cdot (2^j-1) \cdot 2^{\mu -1}= 2^{(j+1)\mu - j- \nu } (2^j-1)\) integers in the range of \(\left( 2^{j\mu -j},2^{j \mu } \right) \) are “winning” integers for the adversary, meaning that after modulo \(q\) they are mapped to the winning interval \(\left( 2^{\mu -1},2^{\mu } \right) \).

From the distribution of primes we know that the number of primes in \(\left( 2^{\mu -1},2^{\mu } \right) \) is approximately \(\frac{2^{\mu -1}}{\mu -1}\). So there are (approximately) \( \left( \frac{2^{\mu -1}}{\mu -1} \right) ^j =\frac{2^{j\mu -j}}{(\mu -1)^j}\) different products of j primes from \(\textsf{Primes}\left( 2^{\mu -1},2^{\mu } \right) \) in \(\left( 2^{j\mu -j},2^{j \mu } \right) \).

This leads us to the combinatorial experiment of choice of \(B=\frac{2^{j \mu -j}}{(\mu -1)^j}\) “balls”, with \(T=2^{(j+1)\mu - j- \nu } (2^j-1)\) “targets” and \(X = \left( {\begin{array}{c}n\\ j\end{array}}\right) \) “tries” without replacement, where “balls” are all possible products, “targets” are the ones that go to \(\left( 2^{\mu -1},2^{\mu } \right) \) modulo \(q\) (the winning ones) and tries are the number of products (for a constant j) that the adversary can try. The “without replacement” comes from the fact that all products are different. The final winning probability is:

$$\begin{aligned} \begin{aligned} Pr[\textsf{prod}\mod q\in \left( 2^{\mu -1},2^{\mu } \right) \wedge \textsf{prod}\in \Pi _{P,j}]&\le \frac{T}{B}+ \frac{T}{B-1} + \frac{T}{B-2} + \ldots + \frac{T}{B-X}\\&\le X \cdot \frac{T}{B-X}\\&= \frac{\left( {\begin{array}{c}n\\ j\end{array}}\right) 2^{(j+1)\mu - j- \nu } (2^j-1)}{\frac{2^{j\mu -j}}{(\mu -1)^j} -\left( {\begin{array}{c}n\\ j\end{array}}\right) }. \end{aligned} \end{aligned}$$

By applying the union bound for all j’s we get:

$$\begin{aligned} \begin{aligned} Pr[\textsf{prod}\mod q\in \left( 2^{\mu -1},2^{\mu } \right) \wedge \textsf{prod}\in \Pi _{P,[2,d]}]\le \sum _{j=2}^d \frac{\left( {\begin{array}{c}n\\ j\end{array}}\right) 2^{(j+1)\mu - j- \nu } (2^j-1)}{\frac{2^{j\mu -j}}{(\mu -1)^j} -\left( {\begin{array}{c}n\\ j\end{array}}\right) }. \end{aligned} \end{aligned}$$

By using the same arguments for negative products we would conclude that

$$\begin{aligned} \begin{aligned} Pr[\textsf{prod}\mod q\in \left( 2^{\mu -1},2^{\mu } \right) \wedge \textsf{prod}\in -\Pi _{P,[2,d]}]\le \sum _{j=2}^d \frac{\left( {\begin{array}{c}n\\ j\end{array}}\right) 2^{(j+1)\mu - j- \nu } (2^j-1)}{\frac{2^{j\mu -j}}{(\mu -1)^j} -\left( {\begin{array}{c}n\\ j\end{array}}\right) }. \end{aligned} \end{aligned}$$

Therefore

$$\begin{aligned} \begin{aligned}&Pr[\textsf{CollisionFinding}(\mu ,d,\mathbb {G}_{q},n)=1]\\&\quad = Pr[\textsf{CollisionFinding}(\mu ,d,\mathbb {G}_{q},n)=1 \wedge \textsf{prod}\in \Pi _{P,[2,d]}] \\&\qquad + Pr[\textsf{CollisionFinding}(\mu ,d,\mathbb {G}_{q},n)=1 \wedge \textsf{prod}\in -\Pi _{P,[2,d]}]\\&\quad \le 2 \cdot \sum _{j=2}^d \frac{\left( {\begin{array}{c}n\\ j\end{array}}\right) 2^{(j+1)\mu - j- \nu } (2^j-1)}{\frac{2^{j\mu -j}}{(\mu -1)^j} -\left( {\begin{array}{c}n\\ j\end{array}}\right) }. \end{aligned} \end{aligned}$$

\(\square \)
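To get a feeling for the size of the bound in Lemma 4.1, one can evaluate it exactly with rational arithmetic. The sketch below is only a numerical aid under stated assumptions: the function name is hypothetical, and the prime count \(\frac{2^{\mu -1}}{\mu -1}\) is the same approximation used in the proof, so the output inherits that approximation.

```python
from fractions import Fraction
from math import comb

def collision_bound(mu: int, nu: int, d: int, n: int) -> Fraction:
    """Evaluate 2 * sum_{j=2}^{d} X*T / (B - X) with
    X = C(n, j) tries, T = 2^((j+1)mu - j - nu) * (2^j - 1) targets,
    B = 2^(j*mu - j) / (mu - 1)^j balls, as in Lemma 4.1."""
    total = Fraction(0)
    for j in range(2, d + 1):
        tries = comb(n, j)
        targets = Fraction(2) ** ((j + 1) * mu - j - nu) * (2 ** j - 1)
        balls = Fraction(2 ** (j * mu - j), (mu - 1) ** j)
        if balls <= tries:
            raise ValueError("bound is vacuous: fewer products than tries")
        total += tries * targets / (balls - tries)
    return 2 * total
```

Calling `float(collision_bound(mu, nu, d, n))` gives a quick estimate for candidate parameters. Since \(\nu \) only enters through the factor \(2^{-\nu }\) in the targets, each extra bit in the group order halves the bound.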

Theorem 4.4

Let \(\mathbb {G}_q\) be a prime order group of order \(q\in \left( 2^{\nu -1}, 2^{\nu } \right) \), \(\mu \) such that \(2^{\mu -\nu } \in \textsf{negl}(\lambda )\), d constant and \(n = \textsf{poly}(\lambda )\). Then \(Pr[\textsf{CollisionFinding}(\mu ,d,\mathbb {G}_q,n)=1] \in \textsf{negl}(\lambda )\).

Proof

Since \(n = \textsf{poly}(\lambda )\), the set \(P\) is polynomially bounded. By Lemma 4.1, \(Pr[\textsf{CollisionFinding}(\mu ,d,\mathbb {G}_q,n)=1] \le 2 \cdot \sum _{j=2}^d \frac{\left( {\begin{array}{c}n\\ j\end{array}}\right) 2^{(j+1)\mu - j- \nu } (2^j-1)}{\frac{2^{j\mu -j}}{(\mu -1)^j} -\left( {\begin{array}{c}n\\ j\end{array}}\right) }\). Since d is constant, for any \(j \in [2,d]\) we have \(\left( {\begin{array}{c}n\\ j\end{array}}\right) = O(n^j)\), and the j-th term becomes:

$$\begin{aligned} \begin{aligned} 2 \cdot \frac{\left( {\begin{array}{c}n\\ j\end{array}}\right) 2^{(j+1)\mu - j- \nu } (2^j-1)}{\frac{2^{j\mu -j}}{(\mu -1)^j} -\left( {\begin{array}{c}n\\ j\end{array}}\right) }&= 2 \cdot \frac{O(n^j) 2^{(j+1)\mu - j- \nu } (2^j-1)}{\frac{2^{j\mu -j}}{(\mu -1)^j} -O(n^j)}\\&= 2 \cdot \frac{O(n^j) (2^j-1)(\mu -1)^j}{\frac{2^{j\mu -j}}{2^{(j+1)\mu - j- \nu }} -\frac{O(n^j)(\mu -1)^j}{2^{(j+1)\mu - j- \nu }}}. \end{aligned} \end{aligned}$$

Here \(O(n^j) (2^j-1)(\mu -1)^j = \textsf{poly}(\lambda )\) and \(\frac{O(n^j)(\mu -1)^j}{2^{(j+1)\mu - j- \nu }} = \textsf{negl}(\lambda )\). Also \(\frac{2^{j\mu -j}}{2^{(j+1)\mu - j- \nu }} = 2^{\nu -\mu }\). Therefore, for each j we get a probability bounded by \(\frac{\textsf{poly}(\lambda )2^{\mu - \nu }}{1 - \textsf{negl}(\lambda )2^{\mu -\nu }} = \textsf{negl}(\lambda )\) by assumption.

Finally, \(Pr[\textsf{CollisionFinding}(\mu ,d,\mathbb {G}_q,n)=1] \le (d-1) \cdot \textsf{negl}(\lambda ) = \textsf{negl}(\lambda )\). \(\square \)
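
To get a feeling for the magnitude of this bound, the following Python sketch evaluates the expression \(2 \cdot \sum _{j=2}^d (\cdot )\) exactly with rational arithmetic. It is illustrative only: the function name `collision_bound` and the sample parameters are assumptions made for this example and are not taken from the scheme.

```python
from fractions import Fraction
from math import comb, log2


def collision_bound(mu: int, d: int, nu: int, n: int) -> Fraction:
    """Evaluate 2 * sum_{j=2}^{d} of the per-j bound, using exact rationals."""
    total = Fraction(0)
    for j in range(2, d + 1):
        binom = comb(n, j)
        # Numerator: C(n, j) * 2^{(j+1)mu - j - nu} * (2^j - 1).
        numerator = binom * Fraction(2) ** ((j + 1) * mu - j - nu) * (2**j - 1)
        # Denominator: 2^{j*mu - j} / (mu - 1)^j - C(n, j).
        denominator = Fraction(2 ** (j * mu - j), (mu - 1) ** j) - binom
        if denominator <= 0:
            raise ValueError("bound is vacuous for these parameters")
        total += numerator / denominator
    return 2 * total


def log2_fraction(x: Fraction) -> float:
    """log2 of a positive rational, avoiding float underflow."""
    return log2(x.numerator) - log2(x.denominator)


# Hypothetical parameters, chosen only for illustration.
example = collision_bound(mu=60, d=5, nu=256, n=2**20)
example_log2 = log2_fraction(example)
```

Increasing \(\nu \) while keeping \(\mu \), d and n fixed shrinks the result, which matches the requirement \(2^{\mu -\nu } \in \textsf{negl}(\lambda )\).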

Remark 4.4

For the sake of generality, in \(\textsf{CollisionFinding}\) we do not specify how the random primes are generated. In practice in our scheme they are outputs of the hash function \(\textsf{H}_{\textsf{prime}}\) that we model as a random oracle.

Now we are ready to give the proof of Theorem 4.2:

Proof of Theorem 4.2

The proof is almost the same as the one of Theorem 4.1 except for the next-to-last paragraph, i.e. the justification of \(e\in \Pi _{P,\{1\}} \). Since \(d \mu + 2 > \nu \), we cannot use the same arguments to reach this conclusion. However, it still holds that \(e\in \left( \Pi _{P,[1,\dots , d]} \cup -\Pi _{P,[1,\dots , d]} \right) \).

Let \(e\in \left( \Pi _{P,[1,\dots , d]} \cup -\Pi _{P,[1,\dots , d]} \right) \); it is straightforward to reduce this case to the collision finding problem. Assume that the adversary \(\mathcal {P}^*\) made \(q_{\textsf{H}}\) random oracle queries to \(\textsf{H}\) and let \(Q_{\textsf{H}}\) be the set of answers she received. Further assume that exactly \(q_{\textsf{H}_{\textsf{prime}}}\) of them are primes and let \(Q_{\textsf{H}_{\textsf{prime}}}\) be the set of these primes. We note that \(P\subseteq Q_{\textsf{H}_{\textsf{prime}}}\), unless a collision happened in \(\textsf{H}\).

Now let \(Q_{\textsf{H}_{\textsf{prime}}}\) be the set of the \(\textsf{CollisionFinding}(\mu ,d,\mathbb {G}_q,|Q_{\textsf{H}_{\textsf{prime}}}|)\) experiment. It satisfies all three conditions since each \(e_i \in Q_{\textsf{H}_{\textsf{prime}}}\) is an output of \(\textsf{H}_{\textsf{prime}}\). Therefore \(e_i\) is prime, \(2^{\mu -1}< e_i < 2^{\mu }\) and, since \(\textsf{H}\) is modeled as a random oracle, the outputs of \(\textsf{H}_{\textsf{prime}}\) are uniformly distributed in \(\textsf{Primes}\left( 2^{\mu -1},2^{\mu } \right) \). Then for the extracted \(e\), we know that \(e= e_{q}\pmod q\in \left( 2^{\mu -1},2^{\mu } \right) \) and from the assumption \(e\in \left( \Pi _{P,[1,\dots , d]} \cup -\Pi _{P,[1,\dots , d]} \right) \), which (as noted above) means that \(e\in \left( \Pi _{Q_{\textsf{H}_{\textsf{prime}}},[2,\dots ,d]} \cup -\Pi _{Q_{\textsf{H}_{\textsf{prime}}},[2,\dots ,d]} \right) \). So \(\textsf{CollisionFinding}(\mu ,d,\mathbb {G}_{q},|Q_{\textsf{H}_{\textsf{prime}}}|) = 1\). Since the adversary is PPT, \(|Q_{\textsf{H}_{\textsf{prime}}}| = \textsf{poly}(\lambda )\). Also, \(d = O(1)\) and \(2^{\mu - \nu } \in \textsf{negl}(\lambda )\) (from the assumptions of the theorem), so this event happens with negligible probability according to Theorem 4.4. So we conclude that, except with negligible probability, \(e\in \Pi _{P,\{1\}} \). \(\square \)

4.4 Our CP-SNARK for set membership for primes sets

In this section we show a CP-SNARK for set membership \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) that supports set elements that are prime numbers of exactly \(\mu \) bits, i.e., \(\mathcal {D}_{\textsf{elm}} = \textsf{Primes}(2^{\mu -1}, 2^{\mu })\), and \(\mathcal {D}_{\textsf{set}} = 2^{\mathcal {D}_\textsf{elm}}\). \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) works for a type-based commitment scheme \(\textsf{Com}_{2}\) that is the canonical composition \(\textsf{SetCom}_{\mathsf {RSA'}}\bullet \textsf{PedCom}\) where \(\textsf{SetCom}_{\mathsf {RSA'}}\) is in Fig. 6 (it is essentially a simplification of \(\textsf{SetCom}_{\textsf{RSA}}\) since elements are already primes).

Fig. 6 \(\textsf{SetCom}_{\mathsf {RSA'}}\) commitment to sets

The scheme \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) is described in Fig. 7. Its building blocks are the same as the ones for \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) except that instead of a CP-NIZK for proving correctness of a map-to-prime computation, we use a CP-NIZK for range proofs. Namely, we let \(\textsf {CP}_\textsf{Range}\) be a NIZK for the following relation on \(\textsf{PedCom}\) commitments \(c_{e}\) and two given integers \(A<B\):

$$R_\textsf{Range}\left( (c_{e},A,B),(e_{q},r_{q}) \right) = 1 \; \text { iff } \; c_{e}=g^{e_{q}} h^{r_{q}} \; \wedge \; A< e_{q}< B. $$
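
As an illustration of what the range relation checks, the following Python sketch evaluates it in a toy prime-order group. All concrete values (the modulus 227, the subgroup order 113, the bases 4 and 9, and the helper names `ped_commit` and `r_range`) are assumptions chosen for readability; they are far too small to be secure and the discrete logarithm between the two bases is not hidden.

```python
# Toy prime-order group: the subgroup of order Q in Z_P^*, with P = 2Q + 1.
P, Q = 227, 113
g, h = 4, 9  # squares mod P different from 1, hence elements of order Q


def ped_commit(e_q: int, r_q: int) -> int:
    """Pedersen commitment g^{e_q} h^{r_q} in the toy group."""
    return pow(g, e_q % Q, P) * pow(h, r_q % Q, P) % P


def r_range(c_e: int, A: int, B: int, e_q: int, r_q: int) -> bool:
    """Relation check: c_e opens to (e_q, r_q) and A < e_q < B over the integers."""
    return c_e == ped_commit(e_q, r_q) and A < e_q < B


mu = 6                      # toy element size: primes in (2^5, 2^6)
c_e = ped_commit(37, 58)    # 37 is a 6-bit prime, 58 an arbitrary opening
holds = r_range(c_e, 2 ** (mu - 1), 2**mu, 37, 58)
```

Note that an opening such as 37 + Q yields the same commitment but fails the range condition, which is why the range is checked on the integer value of the opening.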

Fig. 7 \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) CP-SNARK for set membership

The idea behind the security of the scheme is similar to the one of the \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) scheme. The main difference is that here we rely on the range proof \(\pi _{\textsf{Range}}\) in order to “connect” the Pedersen commitment \(c_{e}\) to the accumulator. In particular, in order to argue the absence of possible collisions here we assume that \(d \mu + 2 \le \nu \) holds, namely we argue security only for this setting of parameters. It is worth noting that in applications where \(\mathcal {D}_{\textsf{elm}}\) is a randomly chosen subset of \(\textsf{Primes}\left( 2^{\mu -1},2^{\mu } \right) \), we could argue security even when \(d \mu + 2 > \nu \), in a way similar to Theorem 4.2. We omit the analysis of this case from the paper.

Theorem 4.5

Let \(\textsf{PedCom}\), \(\textsf{SetCom}_{\mathsf {RSA'}}\) and \(\textsf{IntCom}\) be computationally binding commitments, let \(\textsf {CP}_\textsf{Root}\), \(\textsf {CP}_\textsf{modEq}\) and \(\textsf {CP}_\textsf{Range}\) be knowledge-sound NIZK arguments, and assume that the strong RSA assumption holds. If \(d \mu + 2 \le \nu \), then \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) is knowledge-sound with partial opening of the set commitments \(C_{P}\). Furthermore, if \(\textsf{PedCom}\), \(\textsf{SetCom}_{\mathsf {RSA'}}\) and \(\textsf{IntCom}\) are statistically hiding commitments, and \(\textsf {CP}_\textsf{Root}\), \(\textsf {CP}_\textsf{modEq}\) and \(\textsf {CP}_\textsf{Range}\) are zero-knowledge, then \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) is zero-knowledge.

Proof of Theorem 4.5

Knowledge soundness with partial opening of \(C_{P}\): the proof is similar to the one of Theorem 4.1 except for some minor parts.

Let \(\mathcal {P}^*\) be a malicious prover, i.e., a PPT adversary against Knowledge Soundness with Partial Opening (see the definition in Sect. 2.6), that on input \((\textsf{ck}, R_{\textsf{mem}}, \textsf{crs}, \textsf{aux}_{R}, \textsf{aux}_{Z})\) outputs \(\left( C_{P}, c_{e}, P, \pi \right) \) such that the verifier \(\mathcal {V}\) accepts, i.e. \( \textsf{VerProof}(\textsf{crs}, (C_{P}, c_{e}), \pi )= 1\) and \(\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{U}, C_{P}, P, \varnothing )=1\), with non-negligible probability \(\epsilon \). We will construct a PPT extractor \(\mathcal {E}\) that on the same input outputs a partial witness \((e,r)\) such that \(R_{\textsf{mem}}(P,e)=1 \wedge \textsf{VerCommit}(\textsf{ck}, \textsf{t}_{q}, c_{e}, e, r)=1\).

For this we rely on the Knowledge Soundness of the \(\textsf {CP}_\textsf{Root}, \textsf {CP}_\textsf{modEq}\) and \(\textsf {CP}_\textsf{Range}\) protocols. \(\mathcal {E}\) parses \(\pi :=(C_e, \pi _{\textsf{Root}},\pi _{\textsf{modEq}},\pi _{\textsf{Range}})\) and \(\textsf{crs}:=(N,G,H, \textsf{H}_{\textsf{prime}}, \mathbb {G}_q,g,h, \textsf{crs}_{\textsf{Range}})\), from which it computes the corresponding \(\textsf{crs}_{\textsf{Root}} :=(N,G,H)\) and \(\textsf{crs}_{\textsf{modEq}} :=(N,G,H,\mathbb {G}_q,g,h) \). Then it constructs an adversary \(\mathcal {A}_{\textsf{Root}}\) for \(\textsf {CP}_\textsf{Root}\) Knowledge Soundness that outputs \((C_e, C_{P}, \mu ,\pi _{\textsf{Root}})\). Since \(\mathcal {V}\) accepts \(\pi \), it also accepts \(\pi _{\textsf{Root}}\), i.e., \(\textsf {CP}_{\textsf{Root}}.\textsf{VerProof}(\textsf{crs}_{\textsf{Root}},(C_{e},C_{P},\mu ),\pi _{\textsf{Root}})=1\). From Knowledge Soundness of \(\textsf {CP}_{\textsf{Root}}\) we know that there is an extractor \(\mathcal {E}_{\textsf{Root}}\) that outputs \((e,r,W)\) such that \(C_{e}=\pm G^{e}H^r \pmod N \wedge W^{e}= C_{P} \pmod N \wedge e< 2^{\lambda _{z}+ \lambda _{s}+\mu + 2}\). Similarly, \(\mathcal {E}\) constructs adversaries \(\mathcal {A}_{\textsf{modEq}}\) and \(\mathcal {A}_{\textsf{Range}}\) for the protocols \(\textsf {CP}_{\textsf{modEq}}\) and \(\textsf {CP}_{\textsf{Range}}\) respectively. As before, there are extractors \(\mathcal {E}_{\textsf{modEq}}\) and \(\mathcal {E}_{\textsf{Range}}\) that output \((e',e_{q},r',r_{q})\) such that \(e' = e_{q}\pmod q\wedge C_{e}=\pm G^{e'}H^{r'} \pmod N \wedge c_{e} = g^{e_{q}\bmod q}h^{r_{q}\bmod q} \) and \((e_{q}',r_{q}')\) such that \(c_{e}=g^{e_{q}'}h^{r_{q}'} \wedge 2^{\mu -1}< e_{q}' < 2^{\mu }\) respectively.

From the Binding property of the integer commitment scheme we get that \(e=e'\) and \(r = r'\) (over the integers), unless with a negligible probability. Similarly, from the Binding property of the Pedersen commitment scheme we get that \(e_{q}=e_{q}' \pmod q\) and \(r_{q}= r_{q}' \pmod q\), unless with a negligible probability. So if we put everything together the extracted values are \((e,r,W,e_{q},r_{q})\) such that:

$$\begin{aligned} W^{e}= C_{P} \pmod N \wedge e< 2^{\lambda _{z}+ \lambda _{s}+\mu + 2} \wedge e= e_{q}\pmod q\wedge 2^{\mu -1}< e_{q}< 2^{\mu }, \end{aligned}$$

and additionally

$$\begin{aligned} C_{e}= \pm G^{e}H^r \wedge c_{e} = g^{e_{q}\bmod q}h^{r_{q}\bmod q}. \end{aligned}$$

From \(\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{U}, C_{P}, P, \varnothing )=1\) we infer that \(C_{P} = G^{\textsf{prod}_{P}}\), where for each \(e_i \in P\) it holds that \(e_i\in \textsf{Primes}\left( 2^{\mu -1},2^{\mu } \right) \). From the strong RSA assumption, since \(W^{e}= C_{P} = G^{\textsf{prod}_{P}} \pmod N\) we get \(e\in \Pi _{P}\), except with negligible probability (see Appendix 2).

The rest of the analysis that justifies \(e\in P\) is identical to the one of the proof of Theorem 4.1. So \(e\in P\) and as shown above \(\textsf{VerCommit}(\textsf{ck},\textsf{t}_{q},c_{e},e_{q},r_{q})=1\).

Zero knowledge: for the zero-knowledge property we rely on techniques similar to those of the proof of Theorem 4.3, except for the use of \(\mathcal {S}_{\textsf{HashEq}}\). Here we use instead the simulator of the \(\textsf {CP}_\textsf{Range}\) protocol, \(\mathcal {S}_{\textsf{Range}}\). \(\square \)

4.5 Proposed instantiations of protocols for \(R_{\textsf{Root}}\) and \(R_{\textsf{modEq}}\)

4.5.1 Protocol \(\textsf {CP}_\textsf{Root}\)

We first give a protocol \(\textsf {CP}_\mathsf {Root'}\) for a simpler version of the \(\textsf{Root}\) relation in which the upper bound on e is removed; let us call \(R_{\mathsf {Root'}}\) this relation.

Below is an interactive ZK protocol for \(R_\mathsf {Root'}\):

  1.

    Prover computes a W such that \(W^e=Acc\) and \(C_W=WH^{r_2},C_r=G^{r_2}H^{r_3}\) and sends to the verifier:

    \(\underline{\mathcal {P} \rightarrow \mathcal {V}}: C_W,C_r\)

  2.

    Prover and Verifier perform a protocol for the relation:

    \(R((C_e,C_r,C_W,Acc),(e,r,r_2,r_3,\beta ,\delta ))=1 \) iff

    $$\begin{aligned} C_e = G^eH^r \wedge C_r=G^{r_2}H^{r_3} \wedge Acc=C_W^e \left( \frac{1}{H} \right) ^\beta \wedge 1= C_r^e \left( \frac{1}{H} \right) ^\delta \left( \frac{1}{G} \right) ^\beta \end{aligned}$$

    Let \(\lambda _{s}\) be the size of the challenge space, \(\lambda _{z}\) be the statistical security parameter and \(\mu \) the size of e.

    • Prover samples:

      $$\begin{aligned} \begin{aligned}&r_e {\leftarrow }{\$}\,\left( -2^{\lambda _{z}+ \lambda _{s}+ \mu },2^{\lambda _{z}+ \lambda _{s}+ \mu } \right) \\&r_r,r_{r_2},r_{r_3} {\leftarrow }{\$}\,\left( -\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+\lambda _{s}},\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+\lambda _{s}} \right) \\&r_\beta ,r_\delta {\leftarrow }{\$}\,\left( -\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+ \lambda _{s}+ \mu },\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+ \lambda _{s}+ \mu } \right) \end{aligned} \end{aligned}$$

      and computes:

      $$\begin{aligned} \alpha _1 = G^{r_e}H^{r_r}, \quad \alpha _2 = G^{r_{r_2}}H^{r_{r_3}},\quad \alpha _3 = C_W^{r_e} \left( \frac{1}{H} \right) ^{r_\beta } , \quad \alpha _4 = C_{r}^{r_e}(\frac{1}{H})^{r_\delta } \left( \frac{1}{G} \right) ^{r_{\beta }} \end{aligned}$$

      \(\underline{\mathcal {P} \rightarrow \mathcal {V}}: (\alpha _1,\alpha _2,\alpha _3,\alpha _4)\)

    • Verifier samples the challenge \(c \leftarrow \{0,1\}^{\lambda _{s}}\) \(\underline{\mathcal {V} \rightarrow \mathcal {P}}: c\)

    • Prover computes the response:

      $$\begin{aligned} \begin{aligned}&s_e = r_e - c e\\&s_r = r_r - c r, \quad s_{r_2} = r_{r_2} - c r_2, \quad s_{r_3} = r_{r_3} - c r_3\\&s_\beta = r_\beta - c e r_2, \quad s_\delta =r_\delta - c e r_3 \end{aligned} \end{aligned}$$

      \(\underline{\mathcal {P} \rightarrow \mathcal {V}}: (s_e,s_r,s_{r_2},s_{r_3},s_\beta ,s_\delta )\)

    • Verifier checks if:

      $$\begin{aligned}{} & {} \alpha _1 {\mathop {=}\limits ^{?}} C_e^c G^{s_e} H^{s_r}, \quad \alpha _2 {\mathop {=}\limits ^{?}} C_r^c G^{s_{r_2}}H^{s_{r_3}},\quad \alpha _3 {\mathop {=}\limits ^{?}} Acc^c C_W^{s_e}\left( \frac{1}{H} \right) ^{s_\beta }, \\ {}{} & {} \quad \alpha _4 {\mathop {=}\limits ^{?}} C_{r}^{s_e}\left( \frac{1}{H}\right) ^{s_\delta } \left( \frac{1}{G} \right) ^{s_{\beta }} \end{aligned}$$

Fig. 8 Description of the Root protocol
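
The message flow of the interactive protocol for \(R_{\mathsf {Root'}}\) can be sketched in Python as follows. This is a toy walk-through under stated assumptions, not an implementation of the scheme: the modulus built from the safe primes 59 and 83, the bases 4 and 9, the tiny statistical parameters, the sampling range used for the commitment randomness r, and the names `sym` and `run_root_protocol` are all hypothetical choices made for illustration and offer no security. It relies on Python 3.8+ for `math.prod` and for `pow` with negative exponents.

```python
import random
from math import prod

# Toy RSA modulus from safe primes 59 = 2*29 + 1 and 83 = 2*41 + 1 (insecure).
N = 59 * 83
G, H = 4, 9                 # squares mod N, i.e. elements of QR_N
LAM_Z, LAM_S, MU = 8, 8, 4  # toy statistical parameters; e has MU bits


def sym(bound: int) -> int:
    """Sample uniformly from the open integer interval (-bound, bound)."""
    return random.randrange(-bound + 1, bound)


def run_root_protocol(primes, e, tamper=False) -> bool:
    # Statement: Acc accumulates `primes`, C_e is an integer commitment to e.
    acc = pow(G, prod(primes), N)
    r = sym(N // 4)  # assumed range for the commitment randomness
    C_e = pow(G, e, N) * pow(H, r, N) % N

    # Step 1: witness W with W^e = Acc, then C_W = W H^{r_2}, C_r = G^{r_2} H^{r_3}.
    W = pow(G, prod(primes) // e, N)
    r2, r3 = sym(N // 4), sym(N // 4)
    C_W = W * pow(H, r2, N) % N
    C_r = pow(G, r2, N) * pow(H, r3, N) % N

    # Step 2: prover's masks and first message (alpha_1, ..., alpha_4).
    r_e = sym(2 ** (LAM_Z + LAM_S + MU))
    mid = (N // 4) * 2 ** (LAM_Z + LAM_S)
    big = (N // 4) * 2 ** (LAM_Z + LAM_S + MU)
    r_r, r_r2, r_r3 = sym(mid), sym(mid), sym(mid)
    r_b, r_d = sym(big), sym(big)
    a1 = pow(G, r_e, N) * pow(H, r_r, N) % N
    a2 = pow(G, r_r2, N) * pow(H, r_r3, N) % N
    a3 = pow(C_W, r_e, N) * pow(H, -r_b, N) % N
    a4 = pow(C_r, r_e, N) * pow(H, -r_d, N) * pow(G, -r_b, N) % N

    # Verifier's challenge.
    c = random.randrange(2**LAM_S)

    # Prover's responses, computed over the integers (beta = e r_2, delta = e r_3).
    s_e = r_e - c * e
    s_r, s_r2, s_r3 = r_r - c * r, r_r2 - c * r2, r_r3 - c * r3
    s_b, s_d = r_b - c * e * r2, r_d - c * e * r3
    if tamper:
        s_e += 1  # simulate a dishonest response

    # Verifier's four checks.
    return (
        a1 == pow(C_e, c, N) * pow(G, s_e, N) * pow(H, s_r, N) % N
        and a2 == pow(C_r, c, N) * pow(G, s_r2, N) * pow(H, s_r3, N) % N
        and a3 == pow(acc, c, N) * pow(C_W, s_e, N) * pow(H, -s_b, N) % N
        and a4 == pow(C_r, s_e, N) * pow(H, -s_d, N) * pow(G, -s_b, N) % N
    )


accepted = run_root_protocol([11, 13], 13)
```

An honest run is accepted for every choice of randomness, because the four verification equations hold as identities over the integers; a response shifted by one fails the first check since \(G \ne 1\).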

Theorem 4.6

Let \(\mathbb {Z}_N^*\) be an RSA group where the strong RSA assumption holds. Then the above protocol is a correct, knowledge-sound and honest-verifier zero-knowledge protocol for \(R_{\mathsf {Root'}}\) (Fig. 8).

The proof of the above is similar to the one of [16], where a more specific protocol was introduced that implicitly included a protocol for \(R_{\mathsf {Root'}}\). Before proceeding to the proof we recall some properties related to RSA groups, starting with two standard arguments. The first is that obtaining a multiple of \(\phi (N)\) is equivalent to factoring N. This directly allows us to argue that for any \(G \in \mathbb {Z}_N^*\), if one is able to find an \(x \in \mathbb {Z}\) such that \(G^x = 1 \pmod {N}\), then under the factoring assumption \(x= 0\), since otherwise x is a multiple of \(\phi (N)\). The second is that finding any non-trivial solution of the equation \(\mu ^2=1 \pmod N\) in \(\mathbb {Z}_N^*\) (non-trivial means \(\mu \ne \pm 1\)) is equivalent to factoring N.
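
One direction of the second fact, namely that a non-trivial square root of 1 yields a factor of N, can be checked with a few lines of Python. The sketch below uses a toy modulus and a brute-force search that is only feasible because N is tiny; the helper name `factor_from_root_of_unity` is an assumption made for this example.

```python
from math import gcd


def factor_from_root_of_unity(x: int, N: int) -> int:
    """Given x with x^2 = 1 (mod N) and x != +-1 (mod N), return a proper factor of N."""
    if pow(x, 2, N) != 1 or x % N in (1, N - 1):
        raise ValueError("x is not a non-trivial square root of 1 modulo N")
    # N divides (x - 1)(x + 1) but neither factor alone, so the gcd is proper.
    return gcd(x - 1, N)


N = 59 * 83  # toy modulus (insecure)
# Exhaustive search for a non-trivial root; 1 and N - 1 are excluded by the range.
x = next(t for t in range(2, N - 1) if pow(t, 2, N) == 1)
p = factor_from_root_of_unity(x, N)
```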

Remark 4.5

In 2017 Couteau et al. proved that knowledge soundness for the protocol of opening an integer commitment can in fact be reduced to the (plain) RSA problem [25]. Our protocol could inherit this result too. However, the relation itself assumes the hardness of strong RSA, otherwise finding a root would be computable in polynomial time. Additionally, in the reduction to (plain) RSA the extractor’s probability of success is cubic in the adversary’s probability of success, while in the reduction to strong RSA it is linear.

Proposition 4.1

Let \(\mathbb {Z}_N^*\) be an RSA group with a modulus N and \(\textsf{QR}_N\) the corresponding group of quadratic residues modulo N.

  1.

    Let \(G,H {\leftarrow }{\$}\,\textsf{QR}_N\) be two random generators of \(\textsf{QR}_N\) and let \(\mathcal {A}\) be a PPT adversary outputting \(\alpha , \beta \in \mathbb {Z}_N^*\) such that \(G^\alpha H^\beta =1\). Then, under the assumption that the DLOG problem is hard in \(\textsf{QR}_N\), it holds that \(\alpha =\beta =0\).

  2.

    Let \(A,B \in \mathbb {Z}_N^*\) and let \(\mathcal {A}\) be a PPT adversary outputting \(x,y \in \mathbb {Z}_N^*\) such that \(A^y = B^x\) and \(y \mid x\). Then, under the assumption that factoring N is hard, it holds that \(A = \pm B^{\frac{x}{y}}\).

Proof

  1.

    Since \(G,H \in \textsf{QR}_N\) there is an \(x \in \mathbb {Z}_N^*\) such that \(G = H^x \pmod N\), which leads to \(H^{x \alpha + \beta } = 1\). As discussed above, under the assumption that factoring N is hard, \(x \alpha + \beta = 0\). If \(\alpha \ne 0\) then \(x \leftarrow -\frac{\beta }{\alpha }\) is the discrete logarithm of G to the base H, so assuming that DLOG is hard \(\alpha = 0\). Similarly, there is a \(y \in \mathbb {Z}_N^*\) such that \(G^y = H \pmod N\) and with a similar argument we can conclude that \(\beta = 0\).

  2.

    We discern two cases, \(y = \rho \) is odd or \(y = 2^v \rho \) is even (for an odd \(\rho \)). In case y is odd then it is co-prime with \(\phi (N) = p'q'\) (otherwise if \(y = p'\) or \(y = q'\) we would be able to factor N), so \(y^{-1} \pmod {\phi (N)}\) exists and \(A = B^{\frac{x}{y}}\). If \(y=2^v \rho \) then \(\left( A^{-1}B^{\frac{x}{y}} \right) ^{y} = 1 \Rightarrow \left( A^{-1}B^{\frac{x}{y}} \right) ^{2^v\rho }=1 \Rightarrow \left( A^{-1}B^{\frac{x}{y}} \right) ^{2^v}=1\). From the second fact that we discussed above under the factoring assumption \(\left( A^{-1}B^{\frac{x}{y}} \right) ^{2^{v-1}} = \pm 1\). However for \(v >1\) the left part of the equation is a quadratic residue so it cannot be \(-1\), therefore \(\left( A^{-1}B^{\frac{x}{y}} \right) ^{2^{v-1}} = 1\). Using the same facts repeatedly we will eventually conclude that \(\left( A^{-1}B^{\frac{x}{y}} \right) ^{2} = 1\), hence \( A^{-1}B^{\frac{x}{y}} = \pm 1 \Rightarrow A = \pm B^{\frac{x}{y}}\).

\(\square \)

Proof of Theorem 4.6

Correctness is straightforward. Honest-verifier zero knowledge can be shown with standard arguments used in \(\varSigma \)-protocols and the fact that the commitments \(C_e,C_W,C_r\) are statistically hiding. That is, the simulator \(\mathcal {S}\) on input \((C_e,\textsf{Acc})\) samples \(C_W^* {\leftarrow }{\$}\,\mathbb {Z}_N^*\) and \(C_r^* {\leftarrow }{\$}\,\mathbb {Z}_N^*\). Then it samples

$$\begin{aligned}{} & {} s_e^* {\leftarrow }{\$}\,\left( -2^{\lambda _{z}+ \lambda _{s}+ \mu }-2^{\lambda _{s}+\mu },2^{\lambda _{z}+ \lambda _{s}+ \mu }+2^{\lambda _{s}+\mu } \right) ,\\{} & {} s_r^*,s_{r_2}^*,s_{r_3}^* {\leftarrow }{\$}\,\left( -\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+\lambda _{s}} -\left\lfloor N/4\right\rfloor 2^{\lambda _{s}},\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+\lambda _{s}} + \left\lfloor N/4\right\rfloor 2^{\lambda _{s}}\right) ,\\{} & {} s_\beta ^*,s_\delta ^* {\leftarrow }{\$}\,\left( -\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+ \lambda _{s}+ \mu } - \left\lfloor N/4\right\rfloor 2^{\lambda _{s}+\mu },\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+ \lambda _{s}+ \mu } + \left\lfloor N/4\right\rfloor 2^{\lambda _{s}+\mu } \right) . \end{aligned}$$

Finally it samples \(c^* {\leftarrow }{\$}\,\{0,1\}^{\lambda _{s}}\). Then it sets \(\alpha _1^* \leftarrow C_e^{c^*} G^{s_e^*} H^{s_r^*}\), \(\alpha _2^* \leftarrow (C_r^*)^{c^*} G^{s_{r_2}^*}H^{s_{r_3}^*}\), \(\alpha _3^* \leftarrow \textsf{Acc}^{c^*} (C_W^*)^{s_e^*}\left( \frac{1}{H} \right) ^{s_\beta ^*}\) and \(\alpha _4^* \leftarrow (C_{r}^*)^{s_e^*}\left( \frac{1}{H}\right) ^{s_\delta ^*} \left( \frac{1}{G} \right) ^{s_{\beta }^*}\). \(\mathcal {S}\) outputs \(\pi ^* \leftarrow (C_W^*,C_r^*,\alpha _1^*,\alpha _2^*,\alpha _3^*,\alpha _4^*,c^*,s_e^*,s_r^*,s_{r_2}^*,s_{r_3}^*,s_\beta ^*,s_\delta ^*)\). The distribution of \(\pi ^*\) is identical to the one of a real proof \(\pi \).

For knowledge soundness, let \(\mathcal {A}\) be an adversary against knowledge soundness that is able to convince the verifier \(\mathcal {V}\) with probability at least \(\epsilon \). We will construct an extractor \(\mathcal {E}\) that extracts the witness \((e,r,r_2,r_3,\beta ,\delta )\). Using rewinding, \(\mathcal {E}\) gets two accepting transcripts

$$\begin{aligned}{} & {} (C_W, C_r, \alpha _1, \alpha _2, \alpha _3, \alpha _4, c, s_e, s_r, s_{r_2}, s_{r_3}, s_\beta ,s_\delta ) \\ {}{} & {} \quad \text { and } (C_W, C_r, \alpha _1, \alpha _2, \alpha _3, \alpha _4, c', s_e', s_r', s_{r_2}', s_{r_3}', s_\beta ', s_\delta '), \end{aligned}$$

on two different challenges c and \(c'\). \(\mathcal {E}\) aborts if it cannot get two such transcripts (\(\textsf {abort}1\)).

We denote \(\varDelta c :=c'-c, \varDelta s_e :=s_e - s_e', \varDelta s_r :=s_r - s_r', \varDelta s_{r_2} :=s_{r_2} - s_{r_2}', \varDelta s_{r_3} :=s_{r_3} - s_{r_3}', \varDelta s_{\beta } :=s_{\beta } - s_{\beta }', \varDelta s_{\delta } :=s_{\delta } - s_{\delta }'\) then

$$\begin{aligned} C_e^{\varDelta c}= & {} G^{\varDelta s_e}H^{\varDelta s_r},C_r^{\varDelta c} = G^{\varDelta s_{r_2}}H^{\varDelta s_{r_3}}, \textsf{Acc}^{\varDelta c} = C_W^{\varDelta s_e}\left( \frac{1}{H} \right) ^{\varDelta s_\beta }, \\ 1= & {} C_r^{\varDelta s_e} \left( \frac{1}{H} \right) ^{\varDelta s_\delta } \left( \frac{1}{G} \right) ^{\varDelta s_\beta }. \end{aligned}$$

Define the (possibly rational) numbers \(\hat{e} :=\frac{\varDelta s_e}{\varDelta c}, \hat{r} :=\frac{\varDelta s_r}{\varDelta c}, \hat{r_2} :=\frac{\varDelta s_{r_2}}{\varDelta c}, \hat{r_3} :=\frac{\varDelta s_{r_3}}{\varDelta c}\). If \(\varDelta c\) does not divide both \(\varDelta s_e\) and \(\varDelta s_r\), \(\mathcal {E}\) aborts (\(\textsf {abort}\, 2a\)). Similarly, if \(\varDelta c\) does not divide both \(\varDelta s_{r_2}\) and \(\varDelta s_{r_3}\), \(\mathcal {E}\) aborts (\(\textsf {abort}\, 2b\)). If neither abort happens then, according to the second point of Proposition 4.1, \(C_e = \pm G^{\hat{e}}H^{\hat{r}}\) and \(C_{r} = \pm G^{\hat{r_2}}H^{\hat{r_3}}\).

Now if we replace \(C_r\) in the fourth equation we get \(1 = (\pm 1)^{\varDelta s_e} G^{\hat{r_2} \varDelta s_e} H^{\hat{r_3} \varDelta s_e} \left( \frac{1}{H} \right) ^{\varDelta s_\delta } \left( \frac{1}{G} \right) ^{\varDelta s_\beta }\) or \((\pm 1)^{\varDelta s_e} G^{\hat{r_2} \varDelta s_e - \varDelta s_\beta } H^{\hat{r_3} \varDelta s_e - \varDelta s_\delta } = 1\). However, \((\pm 1)^{\varDelta s_e} = 1\): otherwise, if \((\pm 1)^{\varDelta s_e}=-1\), then \(-G^{\hat{r_2} \varDelta s_e - \varDelta s_\beta } H^{\hat{r_3} \varDelta s_e - \varDelta s_\delta }\) would be a non-quadratic residue (since G and H are both in \(\textsf{QR}_N\) and \(\textsf{QR}_N\) is closed under multiplication) equal to 1, which is a quadratic residue, a contradiction. Hence \(G^{\hat{r_2} \varDelta s_e - \varDelta s_\beta } H^{\hat{r_3} \varDelta s_e - \varDelta s_\delta } = 1\). According to the first point of Proposition 4.1, under the factoring assumption \(\hat{r_2} \varDelta s_e - \varDelta s_\beta = \hat{r_3} \varDelta s_e - \varDelta s_\delta = 0\), so \(\hat{r_2} \varDelta s_e = \varDelta s_\beta \).

Finally we replace \(\varDelta s_\beta \) in the third equation and we get \(\textsf{Acc}^{\varDelta c} = C_W^{\varDelta s_e}\left( \frac{1}{H} \right) ^{\hat{r_2} \varDelta s_e} \Rightarrow \textsf{Acc}^{\varDelta c} = \left( \frac{C_W}{H^{\hat{r_2}}} \right) ^{\varDelta s_e}\). As stated above, \(\varDelta c\) divides \(\varDelta s_e\), so according to the second point of Proposition 4.1 \(\textsf{Acc}= \pm \left( \frac{C_W}{H^{\hat{r_2}}} \right) ^{\frac{\varDelta s_e}{\varDelta c}} = \pm \left( \frac{C_W}{H^{\hat{r_2}}} \right) ^{\hat{e}}\). We discern three cases:

  • \(\underline{\textsf{Acc}= + \left( \frac{C_W}{H^{\hat{r_2}}} \right) ^{\frac{\varDelta s_e}{\varDelta c}}}\): Then \(\mathcal {E}\) sets \(\tilde{W} \leftarrow \frac{C_W}{H^{\hat{r_2}}}\), \(\tilde{e} \leftarrow \hat{e} :=\frac{\varDelta s_e}{\varDelta c}\) and \(\tilde{r} \leftarrow \hat{r} :=\frac{\varDelta s_r}{\varDelta c}\) as above. It is clear that \(\textsf{Acc}= \tilde{W}^{\tilde{e}}\) and as stated above \(C_e = G^{\tilde{e}}H^{\tilde{r}}\).

  • \(\underline{\textsf{Acc}= - \left( \frac{C_W}{H^{\hat{r_2}}} \right) ^{\frac{\varDelta s_e}{\varDelta c}} \text { and } \frac{\varDelta s_e}{\varDelta c} \text { odd}}\): Then \(\mathcal {E}\) sets \(\tilde{W} \leftarrow -\frac{C_W}{H^{\hat{r_2}}}\), \(\tilde{e} \leftarrow \hat{e} :=\frac{\varDelta s_e}{\varDelta c}\) and \(\tilde{r} \leftarrow \hat{r} :=\frac{\varDelta s_r}{\varDelta c}\) as above. It is clear that \(\textsf{Acc}= \tilde{W}^{\tilde{e}}\) and as stated above \(C_e = G^{\tilde{e}}H^{\tilde{r}}\).

  • \(\underline{\textsf{Acc}= - \left( \frac{C_W}{H^{\hat{r_2}}} \right) ^{\frac{\varDelta s_e}{\varDelta c}} \text { and } \frac{\varDelta s_e}{\varDelta c} \text { even}}\): this means that \(\textsf{Acc}\) is a non-quadratic residue, which is a contradiction since in the \(R_{\mathsf {Root'}}\) relation we assume that \(\textsf{Acc}\in \textsf{QR}_N\).

Finally, \(\mathcal {E}\) outputs \((\tilde{e},\tilde{r}, \tilde{W})\).

Now we show that the probability that the extractor terminates and outputs a valid witness is \(O(\epsilon )\). If the extractor does not abort then it clearly outputs a valid witness (under the factoring assumption). For the first abort, with a standard argument it can be shown that the extractor is able to extract two accepting transcripts with probability \(O(\epsilon )\) (for the probabilistic analysis we refer to [31]). Thus \(Pr[\textsf {abort}1] = 1 - O(\epsilon )\). The second type of aborts (\(\textsf {abort}\, 2a\) and \(\textsf {abort}\, 2b\)) happens with negligible probability under the strong RSA assumption. For the details see Lemma 4.2 below, which was proven in [31]. Putting these together, the probability of success of \(\mathcal {E}\) is at least \(O(\epsilon ) - \textsf{negl}(\lambda _{s})\). \(\square \)

Lemma 4.2

([31]) Given that \(\textsf {abort}\, 2a\) occurs a PPT adversary \(\mathcal {B}\) can solve the strong RSA problem with probability at least \(\frac{1}{2}-2^{-\lambda _{s}}\).

From the above we get \(Pr[\mathcal {B} \text { solves } sRSA] \ge \left( \frac{1}{2}-2^{-\lambda _{s}} \right) Pr[\textsf {abort}\, 2a]\), so we conclude that \( Pr[\textsf {abort}\, 2a] \le \frac{1}{\frac{1}{2}- 2^{-\lambda _{s}}} Pr[\mathcal {B} \text { solves } sRSA] = \textsf{negl}(\lambda _{s})\). The same lemma holds for \(\textsf {abort}\, 2b\).

Notice in the above protocol that

$$\begin{aligned}{} & {} -2^{\lambda _{z}+ \lambda _{s}+ \mu }-2^{\lambda _{s}+\mu } \le s_e \le 2^{\lambda _{z}+ \lambda _{s}+ \mu }+2^{\lambda _{s}+\mu } \Rightarrow \\{} & {} -2^{\lambda _{z}+ \lambda _{s}+ \mu +1} \le s_e \le 2^{\lambda _{z}+ \lambda _{s}+ \mu +1} \Rightarrow \\{} & {} -2^{\lambda _{z}+ \lambda _{s}+ \mu +2} \le \varDelta s_e \le 2^{\lambda _{z}+ \lambda _{s}+ \mu +2} \Rightarrow \\{} & {} -2^{\lambda _{z}+ \lambda _{s}+ \mu +2} \le \hat{e} \le 2^{\lambda _{z}+ \lambda _{s}+ \mu +2}, \end{aligned}$$

so if we impose an additional verification check on the size of an honest \(s_e\), i.e., \(s_e \in \left[ -2^{\lambda _{z}+ \lambda _{s}+ \mu +1}, 2^{\lambda _{z}+ \lambda _{s}+ \mu +1} \right] \), we get that \( |\hat{e}| \le 2^{\lambda _{z}+ \lambda _{s}+ \mu +2}\). The verifier performs the extra range check \(s_e {\mathop {\in }\limits ^{?}} \left[ -2^{\lambda _{z}+ \lambda _{s}+ \mu +1}, 2^{\lambda _{z}+ \lambda _{s}+ \mu +1} \right] \) and the resulting protocol is \(\textsf {CP}_\textsf{Root}\), which besides proving knowledge of an e-th root also provides a bound on the size of |e|:

$$\begin{aligned} R_{\textsf{Root}} \left( (C_e,Acc,\mu ),(e,r,W) \right) = 1 \text { iff } C_e=\pm G^eH^r \pmod N \wedge W^e=Acc \pmod N \wedge |e|< 2^{\lambda _{z}+ \lambda _{s}+ \mu +2}. \end{aligned}$$
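
The arithmetic behind the extra range check can be illustrated with a short Python sketch: two responses for the same mask \(r_e\) and different challenges determine \(\hat{e}\), and the verifier's interval for \(s_e\) bounds its size. The helper names `respond`, `s_e_in_range` and `extract_e` and all numeric values are assumptions made for this example only.

```python
from fractions import Fraction


def respond(r_e: int, c: int, e: int) -> int:
    """Honest response s_e = r_e - c*e, computed over the integers."""
    return r_e - c * e


def s_e_in_range(s_e: int, lam_z: int, lam_s: int, mu: int) -> bool:
    """Verifier's extra check: |s_e| <= 2^{lam_z + lam_s + mu + 1}."""
    return abs(s_e) <= 2 ** (lam_z + lam_s + mu + 1)


def extract_e(c: int, s_e: int, c2: int, s_e2: int) -> Fraction:
    """e_hat = (s_e - s_e') / (c' - c); may be rational for dishonest transcripts."""
    return Fraction(s_e - s_e2, c2 - c)


lam_z, lam_s, mu = 8, 8, 4          # toy parameters
r_e, e = 123456, 13                 # |r_e| < 2^{lam_z + lam_s + mu}, e has mu bits
c, c2 = 5, 200                      # two distinct challenges below 2^{lam_s}
s_e, s_e2 = respond(r_e, c, e), respond(r_e, c2, e)
e_hat = extract_e(c, s_e, c2, s_e2)
```

For honest responses the extracted value coincides with e, and whenever both responses pass the range check the bound \(|\hat{e}| \le 2^{\lambda _{z}+ \lambda _{s}+ \mu +2}\) follows because \(|\varDelta c| \ge 1\).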

4.5.2 Protocol \(\textsf {CP}_\textsf{modEq}\)

Below we describe the public-coin ZK protocol for \(R_\textsf{modEq}\). In Fig.  9 we summarize the corresponding NIZK obtained after applying the Fiat–Shamir transform to it.

  1.

    Prover samples:

    $$\begin{aligned} \begin{aligned}&r_e \leftarrow \left( -2^{\lambda _{z}+ \lambda _{s}+ \mu },2^{\lambda _{z}+ \lambda _{s}+ \mu } \right) \\&r_r\leftarrow \left( -\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+ \lambda _{s}},\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+ \lambda _{s}} \right) \\&r_{r_{q}} \leftarrow \mathbb {Z}_{q}, \end{aligned} \end{aligned}$$

    and computes:

    $$\begin{aligned} \alpha _1 = G^{r_e}H^{r_r}, \quad \alpha _2 = g^{r_e \pmod q}h^{r_{r_{q}}}. \end{aligned}$$

    \(\underline{\mathcal {P} \rightarrow \mathcal {V}}: (\alpha _1,\alpha _2)\).

  2.

    Verifier samples the challenge \(c \leftarrow \{0,1\}^{\lambda _{s}}\). \(\underline{\mathcal {V} \rightarrow \mathcal {P}}: c\).

  3.

    Prover computes the response:

    $$\begin{aligned} \begin{aligned}&s_e = r_e - c e\\&s_r = r_r - c r\\&s_{r_{q}} = r_{r_{q}} - c r_{q}\pmod q. \end{aligned} \end{aligned}$$

    \(\underline{\mathcal {P} \rightarrow \mathcal {V}}: (s_e,s_r,s_{r_{q}})\).

  4.

    Verifier checks if:

    $$\begin{aligned} \alpha _1 {\mathop {=}\limits ^{?}} \pm C_{e}^{c} G^{s_e} H^{s_{r}} \pmod N, \alpha _2 {\mathop {=}\limits ^{?}} c_{e_{q}}^{c} g^{s_e \pmod q} h^{s_{r_{q}}}. \end{aligned}$$
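
The four steps above can be sketched in Python as a toy run. As before, this is only an illustration under stated assumptions: the RSA modulus 59 · 83, the prime-order subgroup of order 113 in \(\mathbb {Z}_{227}^*\), the bases 4 and 9 in both groups, the sampling range of the commitment randomness r, and the names `sym` and `run_mod_eq` are hypothetical and insecure choices. Negative exponents in `pow` require Python 3.8+.

```python
import random

N = 59 * 83                 # toy RSA modulus (insecure)
G, H = 4, 9                 # squares mod N, i.e. elements of QR_N
P, Q = 227, 113             # toy prime-order group: order-Q subgroup of Z_P^*
g, h = 4, 9                 # squares mod P, hence elements of order Q
LAM_Z, LAM_S, MU = 8, 8, 6  # toy statistical parameters; e has MU bits


def sym(bound: int) -> int:
    """Sample uniformly from the open integer interval (-bound, bound)."""
    return random.randrange(-bound + 1, bound)


def run_mod_eq(e: int, tamper: bool = False) -> bool:
    # Statement: integer commitment C_e to e and Pedersen commitment c_e to e mod Q.
    r, r_q = sym(N // 4), random.randrange(Q)
    C_e = pow(G, e, N) * pow(H, r, N) % N
    c_e = pow(g, e % Q, P) * pow(h, r_q, P) % P

    # 1. Prover's masks and first message.
    r_e = sym(2 ** (LAM_Z + LAM_S + MU))
    r_r = sym((N // 4) * 2 ** (LAM_Z + LAM_S))
    r_rq = random.randrange(Q)
    a1 = pow(G, r_e, N) * pow(H, r_r, N) % N
    a2 = pow(g, r_e % Q, P) * pow(h, r_rq, P) % P

    # 2. Verifier's challenge.
    c = random.randrange(2**LAM_S)

    # 3. Prover's responses: s_e and s_r over the integers, s_rq modulo Q.
    s_e = r_e - c * e
    s_r = r_r - c * r
    s_rq = (r_rq - c * r_q) % Q
    if tamper:
        s_e += 1  # simulate a dishonest response

    # 4. Verifier's checks (the first one holds up to sign).
    rhs1 = pow(C_e, c, N) * pow(G, s_e, N) * pow(H, s_r, N) % N
    rhs2 = pow(c_e, c, P) * pow(g, s_e % Q, P) * pow(h, s_rq, P) % P
    return a1 in (rhs1, (-rhs1) % N) and a2 == rhs2


accepted = run_mod_eq(37)
```

The same integer response \(s_e\) is used in both groups, reduced modulo q only in the exponent of g; this is what ties the integer committed in \(C_e\) to the value committed in \(c_{e_{q}}\).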

Theorem 4.7

Let \(\mathbb {Z}_N^*\) be an RSA group where the strong RSA assumption holds and let \(\mathbb {G}\) be a prime order group where the DLOG assumption holds. Then the above protocol is a correct, knowledge-sound and honest-verifier zero-knowledge protocol for \(R_{\textsf{modEq}}\).

The proof is quite simple and is omitted.

Fig. 9 Description of the modEq protocol

4.6 Instantiations

We discuss the possible instantiations of our schemes \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) and \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) that can be obtained by looking at applications’ constraints and security parameters constraints.

Parameters for \(d\mu + 2 \le \nu \) and \(\mu \le \nu - 2\). First we analyze possible parameters that satisfy the conditions \(d\mu + 2 \le \nu \wedge \mu \le \nu - 2\) that are used in Theorems 4.1 and 4.2; we recall \(d = 1 + \lfloor \frac{\lambda _{z}+ \lambda _{s}+ 2}{\mu }\rfloor \), where \(\lambda _{z}\) and \(\lambda _{s}\) are statistical security parameters for zero-knowledge and soundness respectively of \(\textsf {CP}_{\textsf{Root}}\).

If the prime order group \(\mathbb {G}_{q}\) is instantiated with (pairing-friendly) elliptic curves, then the bitsize \(\nu \) of its order must be at least \(2\lambda \). And recall that for correctness we need \(\mu < \nu \).

Considering these constraints, one way to satisfy \(d\mu + 2 \le \nu \) is to choose \(\mu \) such that \(\nu -1> \mu > \lambda _{z}+ \lambda _{s}+ 2\). More specifically, a choice that maximizes security is \(\nu = 2\lambda \), \(\mu = 2\lambda -2\) and \(\lambda _{z}=\lambda -3, \lambda _{s}= \lambda - 2\). For the case of the \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) scheme, this choice yields an instantiation with nearly \(\lambda \) bits of security and where the function \(\textsf{H}\) does not necessarily need to be a random oracle (yet it must be collision resistant).

Because of the constraint \(\mu > \lambda _{z}+ \lambda _{s}+ 2\), the choice above implies the use of large primes. This would be the case anyway if one instantiates the scheme with a collision-resistant hash function \(\textsf{H}\) (e.g., SHA256 or SHA3), for instance because set elements are quite arbitrary. If, on the other hand, one could support more specific set elements, one could instead use a deterministic map-to-primes or even use our scheme \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) in which set elements themselves are primes. In this case one may wonder if it is possible to choose values of \(\mu \) smaller than \(2\lambda \); for example \(\mu \approx 30, 60, 80\). The answer is positive, although the characterization of such \(\mu \)’s requires an involved analysis.

Let us fix \(\nu = 2\lambda \), and say that the statistical security parameters \(\lambda _{z}, \lambda _{s}\) are such that \(\lambda _{z}+ \lambda _{s}+ 2 = 2\lambda - 2 - c\) for some constant c (for example \(c=4\) if \(\lambda _{z}= \lambda _{s}= \lambda - 4\)). We are essentially looking for \(\mu \) such that

$$\begin{aligned}{} & {} \mu \le 2\lambda -2 -c \text { and } \mu + \mu \left\lfloor \frac{2\lambda -2}{\mu } - \frac{c}{\mu }\right\rfloor \le 2\lambda -2\\{} & {} \iff \mu \le 2\lambda -2 -c \text { and } \left\lfloor \frac{2\lambda -2}{\mu } - \frac{c}{\mu }\right\rfloor \le \frac{2\lambda -2}{\mu } - 1. \end{aligned}$$

From the fact \(x \mod y = x - y \lfloor \frac{x}{y}\rfloor \), we can reduce the above inequality into

$$\begin{aligned} \mu \le 2\lambda -2 -c \text { and } (2\lambda -2 -c) \bmod \mu \ge \mu - c, \end{aligned}$$

that can admit solutions for \(c \ge 2\).

For instance, if \(\lambda = 128\) and \(c = 4\), then we get several options for \(\mu \), e.g., \(\mu = 42, 63, 84, 126, 127\).
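The characterization above can be checked mechanically. The following Python sketch (the function name `valid_mu` is ours, not part of the scheme) enumerates, for \(\nu = 2\lambda \) and a given c, the values of \(\mu \) that satisfy \(d\mu + 2 \le \nu \), and cross-checks each one against the reduced modular condition. It applies no other constraint, so it also returns very small values of \(\mu \) that the soundness theorems would exclude.

```python
def valid_mu(lam, c):
    """Values of mu with d*mu + 2 <= nu, assuming nu = 2*lam and
    lambda_z + lambda_s + 2 = 2*lam - 2 - c, as fixed in the text."""
    nu = 2 * lam
    a = nu - 2 - c                      # a = lambda_z + lambda_s + 2
    found = []
    for mu in range(1, a + 1):          # enforces mu <= 2*lam - 2 - c
        d = 1 + a // mu                 # d = 1 + floor(a / mu)
        direct = d * mu + 2 <= nu       # original condition
        modular = a % mu >= mu - c      # reduced condition
        assert direct == modular        # the two forms agree
        if direct:
            found.append(mu)
    return found


if __name__ == "__main__":
    print(valid_mu(128, 4))
```

For example, `valid_mu(128, 4)` contains 42, 63, 84, 126 and 127.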

Parameters for \(d\mu + 2 > \nu \). This case concerns only \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) and Theorem 4.2 in particular. In this case, if one aims at maximizing security, say to get a scheme with \(\lambda \) bits of security, then one would have to set \(\mu \approx 2\lambda \) for collision resistance, and consequently select the prime order group so that \(\nu \ge 3\lambda \). This choice, however, is costly in terms of performance since the efficiency of all protocols that work in the prime order group degrades.

5 A CP-SNARK for set non-membership with short parameters

Here we describe two CP-SNARKs for set non-membership that work in a setting identical to the one of Sect. 4. Namely, the set is committed using an RSA accumulator, and the element (that one wants to prove not to belong to the set) is committed using a Pedersen commitment scheme. As in the previous section, we propose two protocols for non-membership, called \(\textsf{NonMem}\textsf {CP}_{\textsf{RSA}}\) and \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\), in complete analogy to \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) and \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\). In the former, the elements of the set are arbitrary bit-strings of length \(\eta \), \(\mathcal {D}_{\textsf{elm}} = \{0, 1\}^{\eta }\), while in the latter the elements are primes of length \(\mu \). The schemes are fully described in Figs. 10 and 11.

5.1 A high-level overview of the constructions

The main idea of \(\textsf{NonMem}\textsf {CP}_{\textsf{RSA}}\) is similar to the one of the corresponding membership protocol, \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\). It uses in the same modular way the \(\textsf{modEq}\) and \(\textsf{HashEq}\) protocols. The only difference lies in the third protocol: instead of using \(\textsf{Root}\) it uses a new protocol \(\textsf{Coprime}\). In a similar manner, \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\) uses \(\textsf{modEq}\), \(\textsf{Range}\) and \(\textsf{Coprime}\).

Let us explain the need for the \(\textsf{Coprime}\) protocol and what it does. First, recall how a non-membership proof is computed in RSA Accumulators [50]. Let \(P\) be a set of primes to be accumulated and \(\textsf{prod}\) the corresponding product. For any prime element \(e \notin P\) it holds that \(\gcd (e,\textsf{prod}) = 1\), while for any member \(e \in P\) it is \(\gcd (e,\textsf{prod}) = e \ne 1\). Thus, proving that \(\gcd (e,\textsf{prod})=1\) would exhibit non-membership of e in \(P\). Recall, also, that using the extended Euclidean algorithm one can efficiently compute coefficients \((a, b)\) such that \(a \cdot e + b \cdot \textsf{prod}= \gcd (e,\textsf{prod})\). A non-membership proof for an element e w.r.t. an accumulator \(\textsf{Acc}= G^{\textsf{prod}}\) consists of a pair \((D=G^a, b)\), where \(a, b\) are such that \(a \cdot e + b \cdot \textsf{prod}= 1\). The verification is \(D^e \textsf{Acc}^b = G\), which ensures that e and \(\textsf{prod}\) are coprime, i.e. \(\gcd (e,\textsf{prod}) = 1\). Therefore, the goal of the \(\textsf{Coprime}\) protocol is to prove knowledge of an element \(e\) committed in an integer commitment \(C_{e}\) that satisfies this relation. A more formal definition of \(\textsf{Coprime}\) is given below and an instantiation of this protocol is in Sect. 5.4.
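To make the plain (non-zero-knowledge) non-membership check concrete, here is a minimal Python sketch. The modulus, the base and all function names are illustrative assumptions: the modulus is a toy product of two small safe primes and offers no security. Negative exponents in `pow` require Python 3.8 or later.

```python
N = 23 * 47            # toy modulus (assumption, NOT secure)
G = pow(3, 2, N)       # a quadratic residue used as accumulator base


def egcd(x, y):
    """Extended Euclid: returns (g, a, b) with a*x + b*y == g == gcd(x, y)."""
    if y == 0:
        return x, 1, 0
    g, a, b = egcd(y, x % y)
    return g, b, a - (x // y) * b


def accumulate(primes):
    """Return (Acc, prod) with Acc = G^prod mod N."""
    prod = 1
    for p in primes:
        prod *= p
    return pow(G, prod, N), prod


def nonmem_witness(e, prod):
    """Return (D, b) with D = G^a and a*e + b*prod == 1."""
    g, a, b = egcd(e, prod)
    if g != 1:
        raise ValueError("e shares a factor with prod, so no witness exists")
    return pow(G, a, N), b


def nonmem_verify(acc, e, D, b):
    """Check D^e * Acc^b == G (mod N)."""
    return pow(D, e, N) * pow(acc, b, N) % N == G


if __name__ == "__main__":
    acc, prod = accumulate([5, 7, 13])
    D, b = nonmem_witness(11, prod)
    print(nonmem_verify(acc, 11, D, b))
```

The witness exists exactly when the extended Euclidean algorithm returns gcd 1; for an accumulated prime such as 7 the sketch raises an error instead.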

5.2 Argument of knowledge for a coprime element

We make use of a non-interactive argument of knowledge of a non-membership witness of an element such that the verification equation explained above holds. More formally, \(\textsf {CP}_\textsf{Coprime}\) is a NIZK for the relation \(R_\textsf{Coprime}: (\mathbb {Z}_N^*\times \textsf{QR}_N) \times (\mathbb {Z}\times \mathbb {Z}\times \textsf{QR}_N \times \mathbb {Z})\) defined as

\(R_\textsf{Coprime}\left( (C_e, \textsf{Acc}),(e, r, D, b) \right) = 1\) iff

$$\begin{aligned} C_e= \pm G^eH^r \bmod N \; \wedge \; D^{e} \textsf{Acc}^b = G \wedge |e| < 2^{\lambda _{z}+ \lambda _{s}+ \mu +2}. \end{aligned}$$

We propose an instantiation of a protocol for the above relation in Sect. 5.4.

5.3 Our constructions of \(\textsf{NonMem}\textsf {CP}_{\textsf{RSA}}\) and \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\)

In Figs. 10 and 11 we give a full description of the schemes.

Fig. 10: \(\textsf{NonMem}\textsf {CP}_{\textsf{RSA}}\) CP-SNARK for set non-membership

Fig. 11: \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\) CP-SNARK for set non-membership

The security of these schemes follows very closely that of the corresponding membership schemes given in Sect. 4. Below we give the theorems that state their security. The proofs are omitted since they are almost identical to the corresponding proofs for the membership schemes.

Theorem 5.1

Let \(\textsf{PedCom}\), \(\textsf{SetCom}_{\textsf{RSA}}\) and \(\textsf{IntCom}\) be computationally binding commitments, \(\textsf {CP}_\textsf{Coprime}\), \(\textsf {CP}_\textsf{modEq}\) and \(\textsf {CP}_\textsf{HashEq}\) be knowledge-sound NIZK arguments, and assume that the Strong RSA assumption holds, and that \(\textsf{H}\) is collision resistant.

If \(d \mu + 2 \le \nu \), \(\lambda _{s}+ 1 < \mu \) and \(\lambda _{s}< \log (N)/2\) then \(\textsf{NonMem}\textsf {CP}_{\textsf{RSA}}\) is knowledge-sound with partial opening of the set commitments \(C_{U}\).

Theorem 5.2

Let \(\textsf{PedCom}\), \(\textsf{SetCom}_{\textsf{RSA}}\) and \(\textsf{IntCom}\) be computationally binding commitments, \(\textsf {CP}_\textsf{Coprime}\), \(\textsf {CP}_\textsf{modEq}\) and \(\textsf {CP}_\textsf{HashEq}\) be knowledge-sound NIZK arguments, and assume that the Strong RSA assumption holds, and that \(\textsf{H}\) is collision resistant.

If \(d \mu + 2 > \nu \), \(\lambda _{s}+ 1 < \mu \), \(\lambda _{s}< \log (N)/2\), \(d = O(1)\) is a small constant, \(2^{\mu - \nu } \in \textsf{negl}(\lambda )\) and \(\textsf{H}\) is modeled as a random oracle, then \(\textsf{NonMem}\textsf {CP}_{\textsf{RSA}}\) is knowledge-sound with partial opening of the set commitments \(C_{U}\).

Theorem 5.3

Let \(\textsf{PedCom}\), \(\textsf{SetCom}_{\mathsf {RSA'}}\) and \(\textsf{IntCom}\) be computationally binding commitments, \(\textsf {CP}_\textsf{Coprime}\), \(\textsf {CP}_\textsf{modEq}\) and \(\textsf {CP}_\textsf{Range}\) be knowledge-sound NIZK arguments, and assume that the Strong RSA assumption holds. If \(d \mu + 2 \le \nu \), \(\lambda _{s}+ 1 < \mu \) and \(\lambda _{s}< \log (N)/2\) then \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\) is knowledge-sound with partial opening of the set commitments \(c_{P}\). Furthermore, if \(\textsf{PedCom}\), \(\textsf{SetCom}_{\mathsf {RSA'}}\) and \(\textsf{IntCom}\) are statistically hiding commitments, and \(\textsf {CP}_\textsf{Coprime}\), \(\textsf {CP}_\textsf{modEq}\) and \(\textsf {CP}_\textsf{Range}\) are zero-knowledge, then \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\) is zero-knowledge.

5.4 Proposed instantiations of protocol for \(R_{\textsf{Coprime}}\)

Below we propose an interactive ZK protocol for \(R_\textsf{Coprime}\). As the relation indicates, we need to prove knowledge of \((D, b)\) such that \(D^e\textsf{Acc}^b=G\), for a committed e. Proving opening of \(C_e\) to e is straightforward, so the main challenge is to prove the non-membership equation. For this the prover should send D and \(\textsf{Acc}^b\) to the verifier so that she can check that \(D^e\textsf{Acc}^b=G\) herself. Of course, there are two caveats. The first one is that D and \(\textsf{Acc}^b\) cannot be sent in the clear as we require zero-knowledge; we solve this by sending them in a hiding manner, i.e., \(C_a = D H^{r_a}\) and \(C_B = \textsf{Acc}^b H^{\rho _{B}}\) for random values \(r_a,\rho _B\). Consequently, the verification now has to work with the hiding elements. Secondly, the verifier should be ensured that \(\textsf{Acc}^b\) is indeed an exponentiation of \(\textsf{Acc}\) with a value b known to the prover, otherwise soundness can be broken. More specifically we require extraction of \(b,\rho _B\) such that \(C_B = \textsf{Acc}^b H^{\rho _B}\). This is done using the partial opening of \(\textsf{Acc}\) to the set represented by \(\textsf{prod}\), i.e., the protocol assumes that \(\textsf{Acc}= G^{\textsf{prod}}\) is common knowledge.

Below we present our protocol in full detail.

  1. Prover computes \(C_a=D H^{r_a}, C_{r_a}=G^{r_a} H^{r'_a}, C_B=\textsf{Acc}^{b}H^{\rho _B}, C_{\rho _{B}} = G^{\rho _{B}} H^{\rho '_{B}}\) and sends to the verifier:

    \(\underline{\mathcal {P} \rightarrow \mathcal {V}}:C_a, C_{r_a}, C_B, C_{\rho _B}\).

  2. Prover and Verifier perform a protocol for the relation: \(R((\textsf{Acc}, C_e, C_a, C_{r_a}, C_B, C_{\rho _B}), (e,b,r, r_a, r'_a, \rho _{B}, \rho '_B, \beta ,\delta ))=1 \) iff

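    $$\begin{aligned}&C_e = G^eH^r \wedge C_{r_a}=G^{r_a}H^{r'_a} \wedge C_B = \textsf{Acc}^{b}H^{\rho _B} \wedge C_{\rho _B} = G^{\rho _B}H^{\rho '_B} \\&\wedge \; G=C_a^e \, C_B \left( \frac{1}{H} \right) ^\beta \wedge 1= C_{r_a}^e \, C_{\rho _B} \left( \frac{1}{H} \right) ^\delta \left( \frac{1}{G} \right) ^\beta . \end{aligned}$$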

    Let \(\lambda _{s}\) be the size of the challenge space, \(\lambda _{z}\) be the statistical security parameter and \(\mu \) the size of e.

    • Prover samples:

      $$\begin{aligned} \begin{aligned}&r_b, r_e {\leftarrow }{\$}\,\left( -2^{\lambda _{z}+ \lambda _{s}+ \mu },2^{\lambda _{z}+ \lambda _{s}+ \mu } \right) \\&r_{\rho _B}, r_r, r_{r_a},r_{r'_a}, r_{\rho _{B}'} {\leftarrow }{\$}\,\left( -\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+\lambda _{s}},\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+\lambda _{s}} \right) \\&r_\beta ,r_\delta {\leftarrow }{\$}\,\left( -\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+ \lambda _{s}+ \mu },\left\lfloor N/4\right\rfloor 2^{\lambda _{z}+ \lambda _{s}+ \mu } \right) , \end{aligned} \end{aligned}$$

      and computes:

      $$\begin{aligned}{} & {} \alpha _2 = \textsf{Acc}^{r_b}H^{r_{\rho _B}}, \quad \alpha _3 = G^{r_e}H^{r_r}, \quad \alpha _4 = G^{r_{r_a}}H^{r_{r_a'}},\\{} & {} \alpha _5 = C_a^{r_e} H^{r_\beta }, \quad \alpha _6 = C_{r_a}^{r_e} G^{r_{\beta }} H^{r_\delta }, \quad \alpha _7 = G^{r_{\rho _B}}H^{r_{\rho _B'}}. \end{aligned}$$

      \(\underline{\mathcal {P} \rightarrow \mathcal {V}}:(\alpha _2,\alpha _3,\alpha _4, \alpha _5, \alpha _6, \alpha _7)\)

    • Verifier samples the challenge \(c \leftarrow \{0,1\}^{\lambda _{s}}\) and sends it to the prover: \(\underline{\mathcal {V} \rightarrow \mathcal {P}}: c\).

    • Prover computes the response:

      $$\begin{aligned} \begin{aligned}&s_b = r_b - c b, s_e = r_e - c e\\&s_{\rho _B} = r_{\rho _B} - c \rho _B, s_r = r_r - c r, s_{r_a} = r_{r_a} - c r_a, s_{r'_a} = r_{r'_a} - c r_a', s_{\rho _{B}'} = r_{\rho _B'} - c \rho _B'\\&s_\beta = r_\beta + c (e r_a + \rho _{B}), \quad s_\delta =r_\delta + c ( e r_a' +\rho '_{B}). \end{aligned} \end{aligned}$$

      \(\underline{\mathcal {P} \rightarrow \mathcal {V}}: (s_b, s_e, s_{\rho _B}, s_r, s_{r_a}, s_{r'_a}, s_{\rho _{B}'}, s_\beta , s_\delta )\)

    • Verifier checks if:

      $$\begin{aligned}{} & {} \alpha _2 {\mathop {=}\limits ^{?}} C_B^c \textsf{Acc}^{s_b}H^{s_{\rho _B}}, \quad \alpha _3 {\mathop {=}\limits ^{?}} C_e^c G^{s_e} H^{s_r}, \quad \alpha _4 {\mathop {=}\limits ^{?}} C_{r_a}^c G^{s_{r_a}}H^{s_{r_a'}}, \quad \\{} & {} \alpha _5 {\mathop {=}\limits ^{?}} C_a^{s_e} H^{s_\beta } G^{c} C_{B}^{-c}, \quad \alpha _6 {\mathop {=}\limits ^{?}} C_{r_a}^{s_e} H^{s_\delta } G^{s_{\beta }} C_{\rho _B}^{-c}, \quad \alpha _7 {\mathop {=}\limits ^{?}} C_{\rho _B}^c G^{s_{\rho _B}}H^{s_{\rho _B'}},\\{} & {} s_e {\mathop {\in }\limits ^{?}} \left[ -2^{\lambda _{z}+ \lambda _{s}+ \mu +1}, 2^{\lambda _{z}+ \lambda _{s}+ \mu +1} \right] . \end{aligned}$$

Fig. 12: Description of the Coprime protocol

5.4.1 Correctness

Here we show the correctness of the protocol (Fig. 12).

$$\begin{aligned} \alpha _2&= \textsf{Acc}^{r_b} H^{r_{\rho _B}} = \textsf{Acc}^{s_b + c b} H^{s_{\rho _B} + c \rho _B} = \textsf{Acc}^{s_b} H^{s_{\rho _B}} (\textsf{Acc}^{b} H^{\rho _B})^c = \textsf{Acc}^{s_b} H^{s_{\rho _B}} C_B^c \\ \alpha _3&= G^{r_e} H^{r_r} = G^{s_e + c e} H^{s_r + c r} = G^{s_e} H^{s_r} (G^{e} H^{r})^c = G^{s_e} H^{s_r} C_e^c \\ \alpha _4&= G^{r_{r_a}} H^{r_{r'_a}} = G^{s_{r_a} + c r_a} H^{s_{r'_a} + c r'_a} = G^{s_{r_a}}H^{s_{r_a'}} (G^{ r_a} H^{ r'_a})^{c} = G^{s_{r_a}}H^{s_{r_a'}} C_{r_a}^c \\ \alpha _5&= C_a^{r_e} H^{r_\beta } = C_a^{s_e + c e} H^{s_\beta - c ( e r_a + \rho _B)} = C_a^{s_e} H^{s_\beta } (D^{e} H^{e r_a})^c H^{ - c ( e r_a + \rho _B)} \\&= C_a^{s_e} H^{s_\beta } (D^{e} H^{ - \rho _B})^{c} = C_a^{s_e} H^{s_\beta } (G \, \textsf{Acc}^{-b} H^{ - \rho _B})^{c} = C_a^{s_e} H^{s_\beta } G^{c} C_{B}^{-c} \\ \alpha _6&= C_{r_a}^{r_e} G^{r_{\beta }} H^{r_\delta } = C_{r_a}^{s_e + c e} G^{s_\beta - c ( e r_a + \rho _B)} H^{s_\delta - c ( e r'_a + \rho '_B)} \\&= C_{r_a}^{s_e} G^{s_\beta } H^{s_\delta } (G^{r_a} H^{r'_a})^{c e} G^{ - c ( e r_a + \rho _B)} H^{ - c ( e r'_a + \rho '_B)} = C_{r_a}^{s_e} G^{s_\beta } H^{s_\delta } G^{ - c \rho _B} H^{ - c \rho '_B} \\&= C_{r_a}^{s_e} G^{s_\beta } H^{s_\delta } C_{\rho _B}^{ - c} \\ \alpha _7&= G^{r_{\rho _B}} H^{r_{\rho _B'}} = G^{s_{\rho _B} + c \rho _B} H^{s_{\rho _B'} + c \rho _B'} = G^{s_{\rho _B}} H^{s_{\rho _B'}} (G^{\rho _B} H^{\rho _B'})^c = G^{s_{\rho _B}} H^{s_{\rho _B'}} C_{\rho _B}^c. \end{aligned}$$
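The correctness equations can also be checked numerically. The following Python sketch runs one honest execution of the protocol and evaluates the six verification equations and the range check. Everything concrete in it is an assumption made for illustration: the toy modulus, the bases, the tiny values chosen for \(\lambda _{z}, \lambda _{s}, \mu \), and all identifiers (masks are named `m_*` to avoid clashing with the commitment randomness). It offers no security and only exercises the algebra; negative exponents in `pow` require Python 3.8 or later.

```python
import random

N = 23 * 47                          # toy RSA modulus (assumption, NOT secure)
G, H = pow(3, 2, N), pow(5, 2, N)    # two quadratic residues
LZ, LS, MU = 8, 4, 6                 # toy lambda_z, lambda_s, mu


def egcd(x, y):
    """Returns (g, a, b) with a*x + b*y == g == gcd(x, y)."""
    if y == 0:
        return x, 1, 0
    g, a, b = egcd(y, x % y)
    return g, b, a - (x // y) * b


def p(base, exp):
    return pow(base, exp, N)


def mul(*factors):
    out = 1
    for f in factors:
        out = out * f % N
    return out


def sym(bound):
    """Uniform integer in the open interval (-bound, bound)."""
    return random.randint(-bound + 1, bound - 1)


def coprime_round(e, prod):
    """One honest run; returns True iff all verifier checks pass."""
    acc = p(G, prod)
    g, a, b = egcd(e, prod)
    assert g == 1, "e must be coprime to prod"
    D = p(G, a)
    # Step 1: hiding commitments
    r, ra, ra2, rho, rho2 = (random.randrange(N // 4) for _ in range(5))
    C_e, C_a = mul(p(G, e), p(H, r)), mul(D, p(H, ra))
    C_ra, C_B = mul(p(G, ra), p(H, ra2)), mul(p(acc, b), p(H, rho))
    C_rho = mul(p(G, rho), p(H, rho2))
    # Step 2: masks and first message
    small = 2 ** (LZ + LS + MU)
    mid = (N // 4) * 2 ** (LZ + LS)
    big = (N // 4) * 2 ** (LZ + LS + MU)
    m_b, m_e = sym(small), sym(small)
    m_rho, m_r, m_ra, m_ra2, m_rho2 = (sym(mid) for _ in range(5))
    m_beta, m_delta = sym(big), sym(big)
    a2 = mul(p(acc, m_b), p(H, m_rho))
    a3 = mul(p(G, m_e), p(H, m_r))
    a4 = mul(p(G, m_ra), p(H, m_ra2))
    a5 = mul(p(C_a, m_e), p(H, m_beta))
    a6 = mul(p(C_ra, m_e), p(G, m_beta), p(H, m_delta))
    a7 = mul(p(G, m_rho), p(H, m_rho2))
    c = random.randrange(2 ** LS)    # verifier's challenge
    # Responses
    s_b, s_e = m_b - c * b, m_e - c * e
    s_rho, s_r = m_rho - c * rho, m_r - c * r
    s_ra, s_ra2, s_rho2 = m_ra - c * ra, m_ra2 - c * ra2, m_rho2 - c * rho2
    s_beta = m_beta + c * (e * ra + rho)
    s_delta = m_delta + c * (e * ra2 + rho2)
    # Verifier checks
    return all([
        a2 == mul(p(C_B, c), p(acc, s_b), p(H, s_rho)),
        a3 == mul(p(C_e, c), p(G, s_e), p(H, s_r)),
        a4 == mul(p(C_ra, c), p(G, s_ra), p(H, s_ra2)),
        a5 == mul(p(C_a, s_e), p(H, s_beta), p(G, c), p(C_B, -c)),
        a6 == mul(p(C_ra, s_e), p(H, s_delta), p(G, s_beta), p(C_rho, -c)),
        a7 == mul(p(C_rho, c), p(G, s_rho), p(H, s_rho2)),
        abs(s_e) <= 2 ** (LZ + LS + MU + 1),
    ])


if __name__ == "__main__":
    print(coprime_round(11, 5 * 7 * 13))
```

The checks for \(\alpha _5\) and \(\alpha _6\) pass because \(C_a^e C_B = G H^{e r_a + \rho _B}\) and \(C_{r_a}^e C_{\rho _B} = G^{e r_a + \rho _B} H^{e r'_a + \rho '_B}\), which is exactly what the last two lines of the derivation above use.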

5.4.2 Security

Security of our scheme holds with the partial opening of \(\textsf{Acc}\), i.e., when it is ensured outside the protocol that \(\textsf{Acc}\) is a valid commitment of the set. The proof is similar to the one of Theorem 4.6. The main technical difference is in the extraction of the opening of \(C_B\), because \(\textsf{Acc}\) is not a random generator sampled at the setup phase. However, from partial opening we know that it is \(\textsf{Acc}= G^{\textsf{prod}}\) for a random generator G. This will allow us to state an alternative to Lemma 4.2 to justify the extraction of the opening of \(C_B\).

Theorem 5.4

Let \(\mathbb {Z}_N^*\) be an RSA group where the strong RSA assumption holds. Then the above protocol is an honest-verifier zero-knowledge protocol and, if \(\lambda _{s}+ 1 < \mu \) and \(\lambda _{s}< \log (N)/2\), it is also knowledge sound with partial opening of \(\textsf{Acc}\) for \(R_{\textsf{Coprime}}\).

Proof

Zero-Knowledge can be proven with standard techniques, similar to the ones in the proof of Theorem 4.6 and is therefore omitted.

For knowledge soundness, let \(\mathcal {A}\) be an adversary against knowledge soundness that is able to convince the verifier \(\mathcal {V}\) with probability at least \(\epsilon \). We will construct an extractor \(\mathcal {E}\) that extracts a witness \((e,r,D,b)\). Using rewinding, \(\mathcal {E}\) gets two accepting transcripts

$$\begin{aligned}{} & {} (C_a, C_{r_a}, C_B, C_{\rho _B}, \alpha _2, \alpha _3, \alpha _4, \alpha _5, \alpha _6, \alpha _7, c, s_b, s_e, s_{\rho _B}, s_{r}, s_{r_a}, s_{r_a'}, s_{\rho _B'}, s_\beta ,s_\delta )\\{} & {} (C_a, C_{r_a}, C_B, C_{\rho _B}, \alpha _2, \alpha _3, \alpha _4, \alpha _5, \alpha _6, \alpha _7, c', s_b', s_e', s_{\rho _B}', s_{r}', s_{r_a}', s_{r_a'}', s_{\rho _B'}', s_\beta ',s_\delta '), \end{aligned}$$

on two different challenges c and \(c'\). \(\mathcal {E}\) aborts if it cannot get two such transcripts (\(\textsf {abort}1\)).

We denote \(\varDelta c :=c'-c, \varDelta s_b :=s_b - s_b', \varDelta s_e :=s_e - s_e', \varDelta s_{\rho _B} :=s_{\rho _B} - s_{\rho _B}', \varDelta s_r :=s_r - s_r', \varDelta s_{r_a} :=s_{r_a} - s_{r_a}', \varDelta s_{r_a'} :=s_{r_a'} - s_{r_a'}', \varDelta s_{\rho _B'} :=s_{\rho _B'} - s_{\rho _B'}', \varDelta s_\beta :=s_\beta - s_\beta ', \varDelta s_\delta :=s_\delta - s_\delta '\) then

$$\begin{aligned} C_B^{\varDelta c} = \textsf{Acc}^{\varDelta s_b}H^{\varDelta s_{\rho _B}} \, \Rightarrow \, C_B = \pm \textsf{Acc}^{\hat{b}}H^{\hat{\rho _B}}, \end{aligned} \tag{1}$$
$$\begin{aligned} C_e^{\varDelta c} = G^{\varDelta s_e} H^{\varDelta s_r} \, \Rightarrow \, C_e = \pm G^{\hat{e}} H^{\hat{r}}, \end{aligned} \tag{2}$$
$$\begin{aligned} C_{r_a}^{\varDelta c} = G^{\varDelta s_{r_a}}H^{\varDelta s_{r_a'}} \, \Rightarrow \, C_{r_a} = \pm G^{\hat{r_a}}H^{\hat{r'_a}}, \end{aligned} \tag{3}$$
$$\begin{aligned} 1 = C_a^{\varDelta s_e} H^{\varDelta s_\beta } G^{-\varDelta c} C_{B}^{\varDelta c}, \end{aligned} \tag{4}$$
$$\begin{aligned} 1 = C_{r_a}^{\varDelta s_e} H^{\varDelta s_\delta } G^{\varDelta s_{\beta }} C_{\rho _B}^{\varDelta c}, \end{aligned} \tag{5}$$
$$\begin{aligned} C_{\rho _B}^{\varDelta c} = G^{\varDelta s_{\rho _B}}H^{\varDelta s_{\rho _B'}} \, \Rightarrow \, C_{\rho _B} = \pm G^{\hat{\rho _B}}H^{\hat{\rho '_B}}, \end{aligned} \tag{6}$$

where we define the (possibly rational) numbers \(\hat{b} :=\frac{\varDelta s_b}{\varDelta c}\), \(\hat{e} :=\frac{\varDelta s_e}{\varDelta c}\), \(\hat{r} :=\frac{\varDelta s_r}{\varDelta c}\), \(\hat{r_a} :=\frac{\varDelta s_{r_a}}{\varDelta c}\), \(\hat{r_a'} :=\frac{\varDelta s_{r_a'}}{\varDelta c}\), \(\hat{\rho _B} :=\frac{\varDelta s_{\rho _B}}{\varDelta c}\), \(\hat{\rho _B'} :=\frac{\varDelta s_{\rho _B'}}{\varDelta c}\).

\(\mathcal {E}\) aborts in case \(\varDelta c\) doesn’t divide: \(\varDelta s_e\) and \(\varDelta s_r\) (\(\textsf {abort}\, 2a\)), \(\varDelta s_{r_a}\) and \(\varDelta s_{r_a'}\) (\(\textsf {abort}\, 2b\)), \(\varDelta s_{\rho _B}\) and \(\varDelta s_{\rho _B'}\) (\(\textsf {abort}\, 2c\)). And finally, \(\mathcal {E}\) aborts if \(\varDelta c\) doesn’t divide \(\varDelta s_b\) and \(\varDelta s_{\rho _B}\) (\(\textsf {abort}\, 2d\)). Therefore, if none of these aborts happens, we can infer the equivalent equalities on the right of Eqs. 2, 3, 6 and 1.
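The quotients \(\hat{e}, \hat{b}, \dots \) are the usual two-transcript extraction of a sigma protocol. A minimal Python sketch, assuming an honest prover that answers two challenges with the same mask (the helper names `respond` and `extract` are ours): the quotient of the response difference by the challenge difference recovers the secret, and the extractor gives up when the division is not exact. For a dishonest prover, exact divisibility is precisely what the strong RSA argument has to establish.

```python
def respond(mask, secret, c):
    """Prover's response s = mask - c * secret, as in the protocol."""
    return mask - c * secret


def extract(c, s, c2, s2):
    """Return (s - s2) / (c2 - c) if it is an integer, else None (abort)."""
    dc, ds = c2 - c, s - s2
    if dc == 0 or ds % dc != 0:
        return None
    return ds // dc


if __name__ == "__main__":
    e, mask = 11, 12345
    s, s2 = respond(mask, e, 3), respond(mask, e, 10)
    print(extract(3, s, 10, s2))
```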

If we replace Eqs. 3 and 6 in Eq. 5 we get \(1 = \left( \pm G^{\hat{r_a}}H^{\hat{r'_a}} \right) ^{\varDelta s _e} H^{\varDelta s_\delta } G^{\varDelta s_\beta } \left( \pm G^{\hat{\rho _B}}H^{\hat{\rho '_B}} \right) ^{\varDelta c}\) or \(1 = (\pm 1)^{\varDelta s_e} (\pm 1)^{\varDelta c} G^{\hat{r_a} \varDelta s_e + \hat{\rho _B} \varDelta c + \varDelta s_\beta } H^{\hat{r_a'} \varDelta s_e + \hat{\rho _B'} \varDelta c + \varDelta s_\delta }\). Since \(G, H, 1\) are quadratic residues, \((\pm 1)^{\varDelta s_e} (\pm 1)^{\varDelta c} = 1\), hence \(1 = G^{\hat{r_a} \varDelta s_e + \hat{\rho _B} \varDelta c + \varDelta s_\beta } H^{\hat{r_a'} \varDelta s_e + \hat{\rho _B'} \varDelta c + \varDelta s_\delta }\). Then under the DLOG assumption \(\hat{r_a} \varDelta s_e + \hat{\rho _B} \varDelta c + \varDelta s_\beta = 0 = \hat{r_a'} \varDelta s_e + \hat{\rho _B'} \varDelta c + \varDelta s_\delta \), which gives us that

$$\begin{aligned} \varDelta s_\beta = -\hat{r_a} \varDelta s_e - \hat{\rho _B} \varDelta c. \end{aligned} \tag{7}$$

Finally, replacing Eqs. 1 and 7 in Eq. 4 we get \(1 = C_a^{\varDelta s_e} H^{-\hat{r_a} \varDelta s_e - \hat{\rho _B} \varDelta c} G^{-\varDelta c} \left( \pm \textsf{Acc}^{\hat{b}}H^{\hat{\rho _B}} \right) ^{\varDelta c}\) or \(1 = (\pm 1)^{\varDelta c} C_a^{\varDelta s_e} \textsf{Acc}^{\hat{b} \varDelta c} G^{- \varDelta c} H^{- \hat{r_a} \varDelta s_e}\) or \(\left( \pm \textsf{Acc}^{\hat{b}}G^{-1} \right) ^{\varDelta c} = \left( C_a^{-1} H^{\hat{r_a}} \right) ^{\varDelta s_e}\). But as noted above \(\varDelta c\) divides \(\varDelta s_e\) so \(\pm \textsf{Acc}^{\hat{b}}G^{-1} = \pm \left( C_a^{-1} H^{\hat{r_a}} \right) ^{\hat{e}} \Rightarrow \textsf{Acc}^{\hat{b}}G^{-1} = \pm \left( C_a^{-1} H^{\hat{r_a}} \right) ^{\hat{e}} \Rightarrow \left( \frac{C_a}{H^{\hat{r_a}}} \right) ^{\hat{e}}\textsf{Acc}^{\hat{b}} = \pm G\). We discern two cases:

  • \(\underline{\left( \frac{C_a}{H^{\hat{r_a}}} \right) ^{\hat{e}}\textsf{Acc}^{\hat{b}} = + G}\): Then \(\mathcal {E}\) sets \(\tilde{D} \leftarrow \frac{C_a}{H^{\hat{r_a}}}\), \(\tilde{e} \leftarrow \hat{e} :=\frac{\varDelta s_e}{\varDelta c}\), \(\tilde{r} \leftarrow \hat{r} :=\frac{\varDelta s_r}{\varDelta c}\) and \(\tilde{b} \leftarrow \hat{b} :=\frac{\varDelta s_b}{\varDelta c}\).

  • \(\underline{\left( \frac{C_a}{H^{\hat{r_a}}} \right) ^{\hat{e}}\textsf{Acc}^{\hat{b}} = - G}\): Then \(\hat{e}\) should be odd otherwise if \(\hat{e} = 2 \rho \) then \(G = -\left( \frac{C_a}{H^{\hat{r_a}}} \right) ^{2\rho }\textsf{Acc}^{\hat{b}}\) would be a non-quadratic residue. So \(\mathcal {E}\) sets \(\tilde{D} \leftarrow -\frac{C_a}{H^{\hat{r_a}}}\), \(\tilde{e} \leftarrow \hat{e} :=\frac{\varDelta s_e}{\varDelta c}\), \(\tilde{r} \leftarrow \hat{r} :=\frac{\varDelta s_r}{\varDelta c}\) and \(\tilde{b} \leftarrow \hat{b} :=\frac{\varDelta s_b}{\varDelta c}\). It is clear that \(\tilde{D}^{\tilde{e}} \textsf{Acc}^{\tilde{b}} = G\).

Finally, \(\mathcal {E}\) outputs \((\tilde{e},\tilde{r}, \tilde{D}, \tilde{b})\).

Now we show that the probability the extractor terminates with outputting a valid witness is \(O(\epsilon )\). If the extractor does not abort then it clearly outputs a valid witness (under the factoring assumption). For the first abort, with a standard argument it can be shown that the extractor is able to extract two accepting transcripts with probability \(O(\epsilon )\) (for the probabilistic analysis we refer to [31]). Thus \(Pr[\textsf {abort}1] = 1 - O(\epsilon )\). For the aborts \(\textsf {abort}\, 2a\), \(\textsf {abort}\, 2b\) and \(\textsf {abort}\, 2c\) they happen with negligible probability (\( \le \frac{2}{1- 2^{-\lambda _{s}+1}} Pr[\mathcal {B} \text { solves } sRSA]\) each, for any PPT adversary \(\mathcal {B}\)) under the strong RSA assumption according to Lemma 4.2. For \(\textsf {abort}\, 2d\) we cannot directly use the same lemma as \(\textsf{Acc}\) is not a random generator that is part of the \(\textsf{crs}\). However, with a similar argument and using partial extractability we show below that the probability for this abort is the same. Putting them together the probability of success of \(\mathcal {E}\) is at least \(O(\epsilon ) - \frac{8}{1- 2^{-\lambda _{s}+1}} Pr[\mathcal {B} \text { solves } sRSA] = O(\epsilon ) - \textsf{negl}(\lambda _{s})\).

For Eq. 1, we get from partial opening that \(\textsf{Acc}= G^{\textsf{prod}_{P}}\), where \(P:=\{ \textsf{H}_{\textsf{prime}}(u) \mid u\in U\}\), so

$$\begin{aligned} C_B^{\varDelta c} = G^{\prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b} H^{\varDelta s_{\rho _B}}. \end{aligned}$$

We use an argument similar to that of [31] to prove that \(\varDelta c\) divides \(\varDelta s_b\) and \(\varDelta s_{\rho _B}\) under the strong RSA assumption, given that \(\lambda _{s}+ 1 < \mu \). Then

$$\begin{aligned} C_B = \pm \textsf{Acc}^{\hat{b}}H^{\hat{\rho _B}}. \end{aligned} \tag{8}$$

Lemma 5.1

Let \(\lambda _{s}+ 1 < \mu \) and \(\lambda _{s}< \log (N)/2\) then \(\varDelta c\) divides \(\varDelta s_b\) and \(\varDelta s_{\rho _B}\) under the strong RSA assumption.

Proof

An adversary against the strong RSA assumption receives \(H \in \textsf{QR}_N\) and does the following: it sets \(G = H^\tau \) for \(\tau {\leftarrow }{\$}\,[0,2^{\lambda _{s}} N^2]\) and sends \((G, H)\) to the adversary \(\mathcal {A}\), which outputs a proof \(\pi _{\textsf{Coprime}}\). Then we rewind to get another successful proof \(\pi _{\textsf{Coprime}}'\) and we use the extractor as above to get \(C_B^{\varDelta c} = G^{\prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b} H^{\varDelta s_{\rho _B}}\) or

$$\begin{aligned} C_B^{\varDelta c} = H^{\tau \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b + \varDelta s_{\rho _B}}. \end{aligned}$$

We can exclude the case that \(\varDelta c\) divides \(\prod _{u\in U} \textsf{H}_{\textsf{prime}}(u)\), since \(\varDelta c\) is smaller than any value in the range of the hash function \(\textsf{H}_{\textsf{prime}}\), i.e. \(\varDelta c < \textsf{H}_{\textsf{prime}}(u)\) for each \(u\in U\), which comes from \(\lambda _{s}+ 1 < \mu \). Assume that \(\varDelta c \not \mid \varDelta s_b \vee \varDelta c \not \mid \varDelta s_{\rho _B}\). We discern two cases:

  • \(\varDelta c\) doesn’t divide \(\tau \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b + \varDelta s_{\rho _B}\): then \(\gcd (\varDelta c, \tau \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b + \varDelta s_{\rho _B} ) {=} g\) and there are \(\chi , \psi \) such that \(\chi \cdot \varDelta c {+} \psi \cdot \left( \tau \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b + \varDelta s_{\rho _B} \right) = g\). Thus

    $$\begin{aligned} H^g = H^{\chi \cdot \varDelta c + \psi \cdot \left( \tau \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b + \varDelta s_{\rho _B} \right) } = H^{\chi \varDelta c} \cdot C_B^{\psi \varDelta c} = \left( H^{\chi } \cdot C_B^{\psi }\right) ^{\varDelta c}. \end{aligned}$$

    Since g divides \(\varDelta c\) we get \(H = \pm \left( H^{\chi } \cdot C_B^{\psi }\right) ^{\frac{\varDelta c}{g}}\). However H is a quadratic residue (thus \(C_B\) is so), meaning that \(H = \left( H^{\chi } \cdot C_B^{\psi }\right) ^{\frac{\varDelta c}{g}}\), thus \((H^{\chi } \cdot C_B^{\psi },\frac{\varDelta c}{g})\) is a solution to the strong RSA problem.

  • \(\varDelta c\) divides \(\tau \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b + \varDelta s_{\rho _B}\): let \(q^\ell \) be the maximal q-power that divides \(\varDelta c\) (i.e. \(q^\ell \) is a factor of \(\varDelta c\)) and doesn’t divide at least one of \(\varDelta s_b\) and \(\varDelta s_{\rho _B}\), where q is prime. Such a \(q^\ell \) should exist, otherwise \(\varDelta c\) would divide both \(\varDelta s_b\) and \(\varDelta s_{\rho _B}\), which we assumed it doesn’t. Notice that if \(q^\ell \) divided \(\varDelta s_b\) then it would also divide \(\varDelta s_{\rho _B}\), as \(q^\ell \) divides \(\tau \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b + \varDelta s_{\rho _B}\) (from assumption), so \(q^\ell \not \mid \varDelta s_b\).

    $$\begin{aligned} q^\ell \mid \left( \tau \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b + \varDelta s_{\rho _B} \right) \Rightarrow \tau \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b + \varDelta s_{\rho _B} = 0 \pmod {q^\ell }. \end{aligned}$$

    We can write \(\tau :=\tau _1 + \tau _2 \, \textsf{ord}(H)\). Notice that \(\tau _2\) is information theoretically hidden to the adversary and thus is uniformly random in \([0,2^{\lambda _{s}} N^2/\textsf{ord}(H)] \supset [0,2^{\lambda _{s}} N]\) in its view.

    $$\begin{aligned}{} & {} \Rightarrow \tau _1 \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b + \tau _2 \textsf{ord}(H) \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b + \varDelta s_{\rho _B} = 0 \pmod {q^\ell }\\{} & {} \Rightarrow \tau _2 \cdot \varDelta s_b = \left( -\tau _1 \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \cdot \varDelta s_b - \varDelta s_{\rho _B}\right) \\ {}{} & {} \quad \cdot \left( \prod _{u\in U} \textsf{H}_{\textsf{prime}}(u) \right) ^{-1} \cdot \left( \textsf{ord}(H) \right) ^{-1} \pmod {q^\ell }. \end{aligned}$$

    To see that \(\prod _{u\in U} \textsf{H}_{\textsf{prime}}(u)\) has an inverse modulo \(q^\ell \), note that \(\varDelta c < \textsf{H}_{\textsf{prime}}(u)\) implies \(q^{\ell } < \textsf{H}_{\textsf{prime}}(u)\), so \(\gcd (\prod _{u\in U} \textsf{H}_{\textsf{prime}}(u), q^\ell ) = 1\). For the inverse of \(\textsf{ord}(H)\), note that \(H \in \textsf{QR}_N\) so \(\textsf{ord}(H) \in \{q_1,q_2,q_1q_2\}\), where \(N = (2q_1+1)(2q_2+1)\) is the RSA modulus. Then from \(\lambda _{s}< \log (N)/2\) we get \(\varDelta c < q_1,q_2\) and thus \(\gcd (\textsf{ord}(H), q^\ell ) = 1\).

    As noted above, \(\tau _2\) is uniformly random in a superset of \([0,2^{\lambda _{s}} N]\). But \(q^\ell< \varDelta c < N\), so \(2^{\lambda _{s}} N\) is at least \(2^{\lambda _{s}}\) times larger than \(q^\ell \). Thus \(\tau _2\) is statistically close to uniform in \(\{0, 1, \dots , q^\ell -1\}\) (with \(2^{-\lambda _{s}}\) error), \(Pr_{\tau _2} [\tau _2 = C \pmod {q^\ell }] \approx \frac{1}{q^\ell }\). Furthermore, for any \(\varDelta s_b\), \(Pr_{\tau _2} [\tau _2 \cdot \varDelta s_b = C \pmod {q^\ell }] \approx \frac{1}{q^\ell } \cdot \gcd (q^\ell ,\varDelta s_b) \le \frac{1}{q^\ell } \cdot q^{\ell -1}\) (since \(q^\ell \) doesn’t divide \(\varDelta s_b\)). This is because, for variable \(\tau _2\), the equation \(\tau _2 \varDelta s_b = C \pmod {q^\ell }\) has \(\gcd (q^\ell , \varDelta s_b )\) solutions.

    In conclusion, the probability that the above equation holds is at most \(\frac{1}{q} + 2^{-\lambda _{s}} \le \frac{1}{2} + 2^{-\lambda _{s}}\).

To summarize, we showed that the probability to fall in the second case is at most \(\frac{1}{2} + 2^{-\lambda _{s}}\). So the probability to fall in the first case, and thus solve the strong RSA problem, is at least \(\frac{1}{2} - 2^{-\lambda _{s}}\). \(\square \)
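The first case of the proof turns a relation \(Y^{\varDelta c} = H^{E}\) with \(\varDelta c \not \mid E\) into a non-trivial root of H. The following Python sketch illustrates that step on a toy group. All concrete values and names are assumptions for illustration; in particular, the group order is used only to manufacture a valid Y for the demonstration, which a real reduction could not do.

```python
N = 23 * 47             # toy modulus (assumption, NOT secure)
H = pow(5, 2, N)        # a quadratic residue
ORDER = 11 * 23         # order of QR_N, known here only because N is a toy


def egcd(x, y):
    """Returns (g, a, b) with a*x + b*y == g == gcd(x, y)."""
    if y == 0:
        return x, 1, 0
    g, a, b = egcd(y, x % y)
    return g, b, a - (x // y) * b


def root_from_relation(Y, dc, E):
    """Given Y^dc == H^E (mod N) with dc not dividing E, return (R, k)
    with k = dc / gcd(dc, E) > 1 and R^dc == H^gcd(dc, E) (mod N)."""
    g, chi, psi = egcd(dc, E)
    R = pow(H, chi, N) * pow(Y, psi, N) % N   # R = H^chi * Y^psi
    return R, dc // g


if __name__ == "__main__":
    dc, E = 6, 4
    # Manufacture Y with Y^dc == H^E using the (toy) group order.
    Y = pow(H, E * pow(dc, -1, ORDER) % ORDER, N)
    R, k = root_from_relation(Y, dc, E)
    print(k, pow(R, k, N) == H)
```

In this toy run \(\gcd (6, 4) = 2\), so the returned exponent is 3 and R is a cube root of H, because squaring is injective on quadratic residues of odd order.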

By a simple argument identical to the one of Sect. 4.5, we can also conclude about the range of the extracted \(\tilde{e}\): \(s_e {\mathop {\in }\limits ^{?}} \left[ -2^{\lambda _{z}+ \lambda _{s}+ \mu +1}, 2^{\lambda _{z}+ \lambda _{s}+ \mu +1} \right] \) implies \(-2^{\lambda _{z}+ \lambda _{s}+ \mu +2} \le \hat{e} \le 2^{\lambda _{z}+ \lambda _{s}+ \mu +2}\). \(\square \)

6 A CP-SNARK for set membership in bilinear groups

In this section we propose another CP-SNARK, called \(\textsf{Mem}\textsf {CP}_{\textsf{VC}}\), for the set membership relation that works in bilinear groups. Unlike the schemes of Sect. 4, the CP-SNARK given in this section does not have short parameters; specifically it has a CRS linear in the size of the sets to be committed. On the other hand, it enjoys other features that are not satisfied by our previous schemes (nor by other schemes in the literature): first, it works solely in bilinear groups without having to deal with RSA groups; second, it allows committing to the set in a hiding manner and, for the sake of soundness, the set does not need to be opened by the adversary. This is possible thanks to the fact that the set is committed in a way that (under a knowledge assumption) guarantees that the prover knows the set.

More in detail, \(\textsf{Mem}\textsf {CP}_{\textsf{VC}}\) is a CP-SNARK for set membership where set elements are elements of the large field \(\mathbb {F}= \mathbb {Z}_q\), where \(q\) is the order of the bilinear groups. So \(\mathcal {D}_{\textsf{elm}} = \mathbb {F}\). In terms of sets, it supports all the subsets of \(\mathcal {D}_{\textsf{elm}}\) of cardinality bounded by n, \(\mathcal {D}_{\textsf{set}} = \{U\in 2^{\mathcal {D}_{\textsf{elm}}}: \#U\le n\} \), which we denote by \(\mathcal {S}_n\), where the symbol \(\#\) denotes the cardinality of a set. So \(U\) has elements in \(\mathbb {F}\) and is an element of \(\mathcal {S}_n\).

6.1 Preliminaries and building blocks

6.1.1 Bilinear groups

A bilinear group generator \(\mathcal{B}\mathcal{G}(1^\lambda )\) outputs \((q, \mathbb {G}_1, \mathbb {G}_2, \mathbb {G}_T, e)\), where \(\mathbb {G}_1\), \(\mathbb {G}_2\), \(\mathbb {G}_T\) are additive groups of prime order q, and \(e: \mathbb {G}_1 \times \mathbb {G}_2 \rightarrow \mathbb {G}_T\) is an efficiently computable, non-degenerate, bilinear map. For ease of exposition we present our results with Type-1 groups where we assume that \(\mathbb {G}_1 = \mathbb {G}_2\). Our results are under the \((\ell + 1)d\)-Strong Diffie Hellman and the \((d, \ell )\)-Extended Power Knowledge of Exponent assumptions, for which we refer the reader to [77].

6.1.2 A polynomial-pedersen type-based commitment scheme

First we present \(\textsf{PolyCom}\), a type-based commitment scheme that was introduced in [18] and extracted from the verifiable polynomial delegation scheme of [77]. The scheme has two types: one for \(\ell \)-variate polynomials \(f:\mathbb {F}^\ell \rightarrow \mathbb {F}\) over \(\mathbb {F}\) of variable degree at most d, and one which is a standard Pedersen commitment for field elements. Let \(\mathcal {W}_{\ell ,d}\) be the set of all multisets of \(\{1, \dots , \ell \}\) where the multiplicity of each element is at most d. The scheme is described in Fig. 13.

Fig. 13 \(\textsf{PolyCom}\) commitment scheme

Theorem 6.1

Under the \((\ell + 1)d\)-Strong Diffie-Hellman and the \((d, \ell )\)-Extended Power Knowledge of Exponent assumptions, \(\textsf{PolyCom}\) is an extractable trapdoor commitment scheme.

For the proof we refer to [18, 77].

6.1.3 Input-hiding CP-SNARK for polynomial evaluation

The main building block of our protocol is a CP-SNARK \(\textsf {CP}_{\textsf{PolyEval}}\) for the type-based commitment \(\textsf{PolyCom}\). Loosely speaking, the idea is to commit to the input \(\varvec{t}\) and the output y of a polynomial (with Pedersen commitments), further commit to the polynomial f itself (with a polynomial commitment), and then prove that the opening of the committed polynomial evaluated on the opening of the committed input gives the committed output. The relation of the protocol is \(R_{\textsf{PolyEval}}((t_k)_{k \in [\ell ]},f,y)=1 \) iff \(f(t_1,\dots ,t_\ell )=y\):

\(\varvec{\mathsf R}=(\textsf{ck},R_{\textsf{PolyEval}})\) where \(\varvec{\mathsf R}\) is over

$$\begin{aligned} (\varvec{x},\varvec{w})&= ((x,c),(u,o,\omega ))=\big ( \, (\varnothing ,(c_y,(c_{t_k})_{k \in [\ell ]},c_f)) \,, \, ((y,(t_k)_{k \in [\ell ]},f), \\&\quad (r_y,(r_{t_{k}})_{k \in [\ell ]},r_f),\varnothing ) \, \big ). \end{aligned}$$

We will present a CP-SNARK for this relation, \(\textsf {CP}_\textsf{PolyEval}\), in Sect. 6.3. \(\textsf {CP}_\textsf{PolyEval}\) is based on a similar protocol for polynomial evaluation given in [18], which was in turn based on the verifiable polynomial delegation scheme of zk-vSQL [77]. In those protocols, however, the input \(\varvec{t}\) is public, whereas in ours it can be kept private and committed.

6.1.4 Range proof CP-NIZK

We make use of \(\textsf {CP}_\textsf{Range}\), a CP-NIZK for the following relation on \(\textsf{PedCom}\) commitments \(c\) and two given integers \(A<B\):

$$\begin{aligned} R_\textsf{Range}\left( (c_{e},A,B),(e,r_{q}) \right) = 1 \; \text { iff } \; c_{e}=g^{e} h^{r_{q}} \; \wedge \; A< e< B. \end{aligned}$$

\(\textsf {CP}_{\textsf{Range}}\) can have various instantiations such as Bulletproofs [13].

6.1.5 Multilinear extensions of vectors

Let \(\mathbb {F}\) be a field and \(n=2^\ell \). The multilinear extension of a vector \(\varvec{a}=(a_0,\dots ,a_{n-1})\) in \(\mathbb {F}\) is a polynomial \(f_{\varvec{a}}:\mathbb {F}^\ell \rightarrow \mathbb {F}\) with variables \(x_1,\dots ,x_\ell \) defined as

$$\begin{aligned} f_{\varvec{a}}(x_1,\dots ,x_\ell )= \sum _{i=0}^{n-1}a_i \cdot \prod _{k=1}^{\ell }\textsf{select}_{i_k}(x_k), \end{aligned}$$

where \(i_\ell i_{\ell -1} \dots i_{2} i_{1}\) is the bit representation of i and \( \textsf{select}_{i_k}(x_k)= {\left\{ \begin{array}{ll} x_k, &{} \text {if } i_k=1\\ 1-x_k, &{} \text {if } i_k=0. \end{array}\right. }\)

A property of the multilinear extension of \(\varvec{a}\) is that \(f_{\varvec{a}}(i_1,\dots ,i_\ell )=a_i\) for each \(i \in \{0,\dots ,n-1\}\).
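As a minimal sketch of the definition above, the following Python code evaluates the multilinear extension of a vector over a toy prime field and lets one check the interpolation property on the Boolean hypercube. The modulus `P` and the helper names `select`, `mle_eval` and `bits` are assumptions made for illustration; they are not taken from any implementation of the scheme.

```python
# Toy prime modulus standing in for the field F (assumption for illustration).
P = 2**61 - 1


def select(bit, x):
    """select_{i_k}(x_k): returns x_k if the bit is 1, and 1 - x_k if it is 0."""
    return x % P if bit == 1 else (1 - x) % P


def bits(i, ell):
    """Bit representation (i_1, ..., i_ell) of i, with i_1 the least significant bit."""
    return [(i >> (k - 1)) & 1 for k in range(1, ell + 1)]


def mle_eval(a, xs):
    """Evaluate f_a(x_1, ..., x_ell) = sum_i a_i * prod_k select_{i_k}(x_k) mod P."""
    ell = len(xs)
    assert len(a) == 1 << ell, "the vector must have length n = 2^ell"
    total = 0
    for i, a_i in enumerate(a):
        term = a_i % P
        for k, i_k in enumerate(bits(i, ell)):
            term = term * select(i_k, xs[k]) % P
        total = (total + term) % P
    return total
```

Evaluating at the bit representation of an index returns the corresponding entry, while evaluating elsewhere gives the unique extension that is linear in each variable.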

6.1.6 The type-based commitment scheme of \(\textsf{Mem}\textsf {CP}_{\textsf{VC}}\)

We define the type-based commitment \(\textsf{C}_{EdraxPed}\) for our CP-SNARK \(\textsf{Mem}\textsf {CP}_{\textsf{VC}}\). Recall that we need a commitment that allows one to commit to both elements and sets. We build it from a hiding variant of the EDRAX Vector Commitment [23], which in turn relies on a polynomial commitment. Therefore, we use a special case of \(\textsf{PolyCom}\) for polynomials of maximum variable degree \(d = 1\). Let \(\ell :=\lceil \log (n) \rceil \) and let \(2^{[\ell ]}\) be the powerset of \([\ell ]=\{1,...,\ell \}\); then \(\mathcal {W}_{\ell ,1} = 2^{[\ell ]}\). Furthermore, for any \(n' \le n\) let \(L:\mathcal {S}_{n'} \rightarrow \mathbb {F}^{n'}\) be a function that maps a set of cardinality \(n'\) to its corresponding vector according to an ordering. The description of the scheme can be found in Fig. 14. Essentially, the idea is to take the set, fix some ordering so that we can encode it as a vector, and then commit to this vector using the vector commitment of [23], which in turn commits to a vector by committing to its multilinear extension polynomial.

Fig. 14 \(\textsf{C}_{EdraxPed}\)

6.2 CP-SNARK for set membership using EDRAX vector commitment

Here we present a CP-SNARK for set membership that uses a Vector Commitment (an EDRAX [23] variant) to commit to a set. The idea is to transform a set into a vector (using, for example, lexicographical order) and then commit to the vector with a vector commitment. Set membership is then proven with a zero-knowledge proof of opening of the corresponding position of the vector. However, to preserve zero knowledge we additionally need to hide the position of the element. For this we construct a zero-knowledge proof of knowledge of an opening of a position that does not reveal the position. Finally, since the position is hidden, we need to ensure that the prover is not cheating by providing a proof for a position that exceeds the length of the vector. For this we also need a range proof for the position, i.e., that \(i < n\).

In this section the domain of the elements is a field, \(\mathcal {D}_{\textsf{elm}} :=\mathbb {F}\), and the domain of the sets consists of all subsets of \(\mathcal {D}_{\textsf{elm}}\) of cardinality bounded by n, \(\mathcal {D}_{\textsf{set}} = \{U\in 2^{\mathcal {D}_{\textsf{elm}}}: \#U\le n\} \), which we denote by \(\mathcal {S}_n\) (the \(\#\) symbol denotes the cardinality of a set). So \(U\) has elements in \(\mathbb {F}\) and is an element of \(\mathcal {S}_n\).

The type-based commitment of our scheme is \(\textsf{C}_{EdraxPed}\) (Fig. 14), presented in Sect. 6.1, and the relation is

\(\varvec{\mathsf R}=(\textsf{ck},R_{\textsf{VCmem}})\) where \(\varvec{\mathsf R}\) is over

$$\begin{aligned} (\varvec{x},\varvec{w})&= \left( (x,c),(u,o,\omega ) \right) =\left( \left( \#U,(c_y,(c_{i_k})_{k \in [\ell ]},c_{U})\right) \right. ,\\&\quad \left. \left( (y,(i_k)_{k \in [\ell ]},U),(r_y,(r_{i_k})_{k \in [\ell ]},r_{U}),\varnothing \right) \right) . \end{aligned}$$

\( R_{\textsf{VCmem}}(\#U,\left( y,(i_k)_{k \in [\ell ]},U\right) )= 1\) iff \(y =L(U)[i] \wedge i < \#U\wedge i=\sum _{k=1}^{\ell } i_k 2^{k-1}\).
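To make the relation concrete, here is a small Python sketch that checks \(R_{\textsf{VCmem}}\) in the clear, without any commitments or proofs. The ordering function \(L\) is instantiated with ascending order, and the check that each \(i_k\) is a bit is an extra assumption of this sketch; the function name `r_vcmem` is hypothetical.

```python
def r_vcmem(card_u, y, i_bits, U):
    """Return True iff y = L(U)[i], i < #U and i = sum_k i_k * 2^(k-1).

    card_u : the public cardinality #U
    y      : the claimed element
    i_bits : (i_1, ..., i_ell), least significant bit first
    U      : the set, given in the clear (only for this illustration)
    """
    vec = sorted(U)  # L: fix an ordering of the set (assumed: ascending order)
    # Recompose the position from its bits: i = sum_k i_k * 2^(k-1).
    i = sum(b << k for k, b in enumerate(i_bits))
    return (
        all(b in (0, 1) for b in i_bits)  # assumption: each i_k is a bit
        and card_u == len(vec)
        and i < card_u                     # position must not exceed the vector
        and vec[i] == y
    )
```

For example, with \(U = \{10, 20, 30\}\) the element 20 sits at position \(i = 1\), so the bits \((1, 0)\) satisfy the relation, while the bits \((1, 1)\) encode \(i = 3\) and are rejected by the range condition.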

Note that in the above the prover should normally give exactly \(\ell = \lceil \log (\#U) \rceil \) commitments. In case \(\ell <\lceil \log (\#U) \rceil \) the position is not fully hidden, since it is implicit that \(i < 2^{\ell }\), so the verifier gets partial information about the position.

For this we will compose a CP-SNARK \(\textsf {CP}_{\textsf{PolyEval}}\) and a CP-NIZK \(\textsf {CP}_{\textsf{Range}}\) for the relations \(R_{\textsf{PolyEval}}((i_k)_{k \in [\ell ]},f,y)=1\) iff \(f(i_1,\dots ,i_\ell )=y\) and \(R_{\textsf{Range}}(T,(i_k)_{k \in [\ell ]}) = 1\) iff \(i<T\), respectively, together with the commitment scheme \(\textsf{C}_{EdraxPed}\). So \(\textsf {CP}_{\textsf{VCmem}}\) is a conjunction of the two, where the common commitments are \((c_{i_k})_{k \in [\ell ]}\) (Fig. 15).

Fig. 15 \(\textsf{Mem}\textsf {CP}_{\textsf{VC}}\)

Theorem 6.2

Let \(\textsf {CP}_{\textsf{PolyEval}}\) and \(\textsf {CP}_{\textsf{Range}}\) be zero-knowledge CP-SNARKs for the relations \(R_{\textsf{PolyEval}}\) and \(R_{\textsf{Range}}\), respectively, under the commitment scheme \(\textsf{PolyCom}\). Then the above scheme is a zero-knowledge CP-SNARK for the relation \(R_{\textsf{VCmem}}\) and the commitment scheme \(\textsf{C}_{EdraxPed}\). Further, it is a CP-SNARK for \(R_{\textsf{mem}}\) under the same commitment scheme.

Proof

Zero knowledge follows directly from the zero knowledge of \(\textsf {CP}_{\textsf{PolyEval}}\) and \(\textsf {CP}_{\textsf{Range}}\).

For Knowledge Soundness, let \(\mathcal {A}(\varvec{\mathsf R},\textsf{crs}, \textsf{aux}_{R}, \textsf{aux}_{Z})\) be an adversary that outputs \((x,c) :=\big (\#U,(c_y,(c_{i_k})_{k \in [\ell ]},c_{U}) \big )\) and \(\pi \) such that \(\textsf{VerProof}(\textsf{vk},\#U,(c_y,(c_{i_k})_{k \in [\ell ]},c_{U}),\pi ) = 1\). We will construct an extractor \(\mathcal {E}\) that on input \((\varvec{\mathsf R},\textsf{crs}, \textsf{aux}_{R}, \textsf{aux}_{Z})\) outputs a valid witness \(w :=\big ((y,(i_k)_{k \in [\ell ]},U),(r_y,(r_{i_k})_{k \in [\ell ]},r_{U}),\varnothing \big )\).

\(\mathcal {E}\) uses the extractors \(\mathcal {E}_{\textsf{PolyEval}}\) and \(\mathcal {E}_{\textsf{Range}}\) of \(\textsf {CP}_{\textsf{PolyEval}}\) and \(\textsf {CP}_{\textsf{Range}}\). \(\mathcal {E}_{\textsf{PolyEval}}\) outputs \((y,(i_k)_{k \in [\ell ]},f)\), \((r_y,(r_{i_k})_{k \in [\ell ]},r_f)\) such that \(f(i_1,\dots ,i_{\ell }) = y \wedge \textsf{PolyCom}.\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{\mathbb {F}[\varvec{s}]}, c_{U}, f, r_f)=1 \wedge \textsf{PolyCom}.\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{q}, c_y, y, r_y)=1 \bigwedge _{k=1}^{\ell } \textsf{PolyCom}.\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{q}, c_{i_k}, i_k, r_{i_k}) = 1\). Further, from the Extended Power Knowledge of Exponent assumption we know that f is an \(\ell \)-variate polynomial of maximum variable degree 1. Therefore it corresponds to the multilinear extension of a unique vector \(\vec {U}\), which is efficiently computable. The extractor computes the vector \(\vec {U}\) from f, and from it the corresponding set \(U\). Since f is the multilinear extension of \(\vec {U}\) and \(\textsf{PolyCom}.\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{\mathbb {F}[\varvec{s}]}, c_{U}, f, r_f)=1\), it is clear that \(\textsf{C}_{EdraxPed}.\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{U}, c_{U}, U, r_f) = 1\). That \(\textsf{C}_{EdraxPed}.\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{q}, c_y, y, r_y)=1 \bigwedge _{k=1}^{\ell } \textsf{C}_{EdraxPed}.\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{q}, c_{i_k}, i_k, r_{i_k}) = 1\) follows directly from the definition of the \(\textsf{C}_{EdraxPed}\) commitment scheme for the field-element type.

\(\mathcal {E}\) also uses the extractor \(\mathcal {E}_{\textsf{PolyCom}}\) of the commitment scheme \(\textsf{PolyCom}\), which outputs, for each \(k = 1,\dots , \ell \), values \(i_k, r_{i_k}\) such that \(c_{i_k,1} = g^{i_k}h^{r_{i_k}} \wedge e(c_{i_k,1},g^\beta ) = e(c_{i_k,2},g)\), that is, \(\textsf{C}_{EdraxPed}.\textsf{VerCommit}(\textsf{ck},\textsf{t}_{q}, c_{i_k},i_k,r_{i_k})=1\). \(\mathcal {E}_{\textsf{Range}}\) outputs \((i,r_i)\) such that \(i < \#U\wedge \textsf{PolyCom}.\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{q}, c_i, i, r_i)=1\), which means that \(c_{i,1} = g^{i}h^{r_i}\). Since the proof \(\pi \) verifies, \(c_{i,1} = \prod _{k=1}^{\ell }(c_{i_k,1})^{2^{k-1}}\), that is, \(g^{i}h^{r_i} = g^{\sum _{k=1}^{\ell } i_k 2^{k-1}}h^{\sum _{k=1}^{\ell } r_{i_k} 2^{k-1}}\). From the binding property of the Pedersen commitment we get that \(i = \sum _{k=1}^{\ell } i_k 2^{k-1}\) and \(r_i = \sum _{k=1}^{\ell } r_{i_k} 2^{k-1}\).

Putting these together, the extractor outputs \(\left( (y,(i_k)_{k \in [\ell ]},U),(r_y,(r_{i_k})_{k \in [\ell ]},r_{f}),\varnothing \right) \) such that \(\textsf{C}_{EdraxPed}.\textsf{VerCommit}(\textsf{ck},\textsf{t}_{q}, c_{y},y,r_{y})=1 \bigwedge _{k=1}^{\ell } \textsf{C}_{EdraxPed}.\textsf{VerCommit}(\textsf{ck},\textsf{t}_{q}, c_{i_k},i_k,r_{i_k})=1 \wedge \) \(\textsf{C}_{EdraxPed}.\textsf{VerCommit}(\textsf{ck}, \textsf{t}_{U}, c_{U}, U, r_f) = 1\) and further \(y =L(U)[i] \wedge i < \#U\wedge i=\sum _{k=1}^{\ell } i_k 2^{k-1}\). It is straightforward that \(y =L(U)[i] \wedge i < \#U\) means that \(y \in U\), which leads to \(R_{\textsf{mem}}(y,U) =1\). \(\square \)
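The recombination step used in the proof, where the commitments to the bits \(i_k\) are combined into a commitment to the position \(i\), relies only on the homomorphic property of Pedersen commitments. The Python sketch below illustrates it in a tiny subgroup of \(\mathbb {Z}_p^*\); the parameters (`P_MOD = 2039`, `Q = 1019`, generators 4 and 9) are toy values chosen for illustration only, offer no security, and do not correspond to the bilinear groups of the scheme.

```python
import secrets

# Toy parameters (assumption): P_MOD = 2*Q + 1 is a safe prime, and G, H are
# squares mod P_MOD, hence generators of the subgroup of prime order Q.
P_MOD, Q = 2039, 1019
G, H = 4, 9


def commit(m, r):
    """Pedersen commitment g^m * h^r in the order-Q subgroup."""
    return pow(G, m % Q, P_MOD) * pow(H, r % Q, P_MOD) % P_MOD


def recombine(bit_commitments):
    """Compute prod_k c_{i_k}^(2^(k-1)); list index 0 holds c_{i_1}."""
    acc = 1
    for k, c in enumerate(bit_commitments):
        acc = acc * pow(c, 1 << k, P_MOD) % P_MOD
    return acc


def demo(i, ell):
    """Commit to the bits of i, recombine, and also commit to i directly."""
    bits = [(i >> k) & 1 for k in range(ell)]
    rs = [secrets.randbelow(Q) for _ in range(ell)]
    cs = [commit(b, r) for b, r in zip(bits, rs)]
    r_i = sum(r << k for k, r in enumerate(rs)) % Q  # r_i = sum_k r_{i_k} 2^(k-1)
    return recombine(cs), commit(i, r_i)
```

Both values returned by `demo` coincide, which is the equality \(g^{i}h^{r_i} = \prod _{k}(g^{i_k}h^{r_{i_k}})^{2^{k-1}}\) that the verifier relies on before the range proof is applied to the recombined commitment.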

6.3 Input-hiding CP-SNARKs for polynomial evaluation

Here, we present an instantiation of a zero-knowledge CP-SNARK for the relation \(R_\textsf{PolyEval}\) presented in Sect. 6.1.

To give an intuition of the protocol, we note that zk-vSQL proves the correct evaluation of the polynomial using Lemma 6.1, which we recall below.

Lemma 6.1

([59]) Let \(f:\mathbb {F}^\ell \rightarrow \mathbb {F}\) be a polynomial of variable degree d. For all \(\varvec{t} :=(t_1, \dots , t_\ell ) \in \mathbb {F}^\ell \) there exist efficiently computable polynomials \(q_1, \dots , q_\ell \) such that: \(f(\varvec{z}) - f(\varvec{t}) = \sum _{i=1}^{\ell }(z_i - t_i)q_i(\varvec{z})\).
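For the multilinear case (\(d = 1\)) used in this section, the quotient polynomials of Lemma 6.1 can be computed by peeling off one variable at a time: writing \(f = z_k \cdot A + B\) with \(A, B\) free of \(z_k\), one takes \(q_k = A\) and continues with \(t_k \cdot A + B\). The Python sketch below follows this idea over a toy prime field. The sparse dictionary representation and the names `evaluate`, `decompose` and `check_identity` are assumptions for illustration, not the procedure of [59] or [77].

```python
P = 2**61 - 1  # toy prime modulus standing in for the field F (assumption)


def evaluate(poly, point):
    """Evaluate a multilinear polynomial {frozenset(vars): coeff} at point {var: value}."""
    total = 0
    for mono, coeff in poly.items():
        term = coeff
        for v in mono:
            term = term * point[v] % P
        total = (total + term) % P
    return total


def decompose(poly, t):
    """Return [q_1, ..., q_ell] with f(z) - f(t) = sum_k (z_k - t_k) * q_k(z)."""
    qs, rest = [], dict(poly)
    for k in range(1, len(t) + 1):
        q_k, nxt = {}, {}
        for mono, coeff in rest.items():
            if k in mono:
                reduced = mono - {k}  # monomial with z_k removed
                q_k[reduced] = (q_k.get(reduced, 0) + coeff) % P
                # Substitute z_k := t_k in the part that continues to the next round.
                nxt[reduced] = (nxt.get(reduced, 0) + coeff * t[k - 1]) % P
            else:
                nxt[mono] = (nxt.get(mono, 0) + coeff) % P
        qs.append(q_k)
        rest = nxt
    return qs


def check_identity(poly, t, z):
    """Check the identity of Lemma 6.1 at a concrete point z."""
    qs = decompose(poly, t)
    pt_z = {k + 1: z[k] for k in range(len(z))}
    pt_t = {k + 1: t[k] for k in range(len(t))}
    lhs = (evaluate(poly, pt_z) - evaluate(poly, pt_t)) % P
    rhs = sum((z[k] - t[k]) * evaluate(qs[k], pt_z) for k in range(len(t))) % P
    return lhs == rhs
```

For instance, for \(f(z_1, z_2) = z_1 z_2 + 3 z_1 + 2\) and \(\varvec{t} = (4, 5)\) this yields \(q_1 = z_2 + 3\) and \(q_2 = 4\).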

With this one can verify in time linear in the number of variables that \(f(\varvec{t}) = y\) by checking whether \(e(g^{f(\varvec{s})}g^{-y},g) = \prod _{i=1}^{\ell } e(g^{s_i-t_i},w_i)\), given the values \(g^{f(\varvec{s})},\{g^{s_i}\}_{i=1}^{\ell }, \{w_i = g^{q_i(\varvec{s})}\}_{i=1}^{\ell }\). We are interested in commitments \(c_f,c_y,c_t\) to \(f\), \(y = f(\varvec{t})\) and \(\varvec{t}\), respectively, that hide these values. For this we will instead use the equation below for verification:

$$\begin{aligned}&\left( f(\varvec{z})+r_f z_{\ell + 1} \right) -\left( f(\varvec{t})+ r_y z_{\ell +1} \right) \\&\quad =\sum _{k=1}^{\ell }(z_k-t_k)q_k(\varvec{z}) + z_{\ell + 1}(r_f-r_y)\\&\quad =\sum _{k=1}^{\ell }(z_k-t_k)(q_k(\varvec{z})+r_k z_{\ell + 1})+z_{\ell + 1}\left( r_f-r_y-\sum _{k=1}^{\ell }r_k(z_k-t_k) \right) \\&\quad =\sum _{k=1}^{\ell } \left[ z_k-(t_k+r_{t_k}z_{\ell + 1}) \right] \cdot [q_k(\varvec{z})+r_k z_{\ell + 1}]+\\&\qquad +z_{\ell + 1}\left( r_f-r_y-\sum _{k=1}^{\ell }r_k(z_k-t_k) + \sum _{k=1}^{\ell } r_{t_k} \left[ q_k(\varvec{z})+r_k z_{\ell +1} \right] \right) . \end{aligned}$$

The equation indicates how to construct the protocol, which we present in Fig. 16.
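As a sanity check of the chain of equalities above, the following Python sketch samples random field values and confirms that its first and last expressions agree. It treats the values \(q_k(\varvec{z})\) as free variables and defines \(f(\varvec{z}) - f(\varvec{t})\) through Lemma 6.1; the modulus and the function name `blinded_identity_holds` are assumptions made for illustration.

```python
import random

P = 2**61 - 1  # toy prime modulus standing in for the field F (assumption)


def blinded_identity_holds(ell, rng):
    """Compare the first and last expressions of the derivation at a random point."""
    z = [rng.randrange(P) for _ in range(ell + 1)]  # z[ell] plays the role of z_{ell+1}
    t = [rng.randrange(P) for _ in range(ell)]
    q = [rng.randrange(P) for _ in range(ell)]      # the values q_k(z)
    r = [rng.randrange(P) for _ in range(ell)]      # blinding factors r_k
    r_t = [rng.randrange(P) for _ in range(ell)]    # commitment randomness r_{t_k}
    r_f, r_y = rng.randrange(P), rng.randrange(P)
    zb = z[ell]

    # f(z) - f(t), expressed through Lemma 6.1.
    diff = sum((z[k] - t[k]) * q[k] for k in range(ell))
    lhs = (diff + zb * (r_f - r_y)) % P

    # Last line of the derivation: blinded inputs times blinded quotients, plus a tail.
    blinded_q = [(q[k] + r[k] * zb) % P for k in range(ell)]
    main = sum((z[k] - (t[k] + r_t[k] * zb)) * blinded_q[k] for k in range(ell))
    tail = zb * (
        r_f - r_y
        - sum(r[k] * (z[k] - t[k]) for k in range(ell))
        + sum(r_t[k] * blinded_q[k] for k in range(ell))
    )
    return lhs == (main + tail) % P
```

Since the identity is purely algebraic, the check succeeds for every choice of the sampled values, not just with high probability.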

Fig. 16 Description of the CP-SNARK for polynomial evaluation

Theorem 6.3

Under the \((\ell +1)d\)-Strong Diffie-Hellman and the \((d,\ell )\)-Extended Power Knowledge of Exponent assumptions, \(\textsf {CP}_{\textsf{PolyEval}}\) is a Knowledge Extractable CP-SNARK for the relation \(R_{\textsf{PolyEval}}\) and the commitment scheme \(\textsf{PolyCom}\).

Proof

Below is a proof sketch; it is quite similar to the proof of \(\textsf {CP}_{\textsf{poly}}\) in [18].

Knowledge soundness. The proof follows directly from the Evaluation Extractability of vSQL (see [77]), with the difference that here \(t_k\) should also be extracted for each \(k \in [\ell ]\). However, its extraction is straightforward from the extractability of the commitment scheme.

Zero-knowledge. Consider the following proof simulator algorithm:

\(\mathcal {S}_{\textsf{prv}}(\textsf{td},c_{f}, (c_{t_k})_{k \in [\ell ]}, c_{y})\):

  • Use \(\textsf{td}\) to get \(\alpha \) and \(s_{\ell +1}\).

  • For \(k=1\) to \(\ell \), sample \(\xi _{k} {\leftarrow }{\$}\,\mathbb {Z}_{q}\) and set \(w_k \leftarrow g^{\xi _k}\).

  • Compute \(w_{\ell +1}\) such that \(e\left( c_{f,1} \cdot c_{y,1}^{-1},g \right) =\prod _{k=1}^{\ell } e \left( g^{s_k}{c_{t_{k},1}}^{-1},w_k \right) \cdot e \left( g^{s_{\ell +1}},w_{\ell +1} \right) \) holds. That is: \(w_{\ell +1} \leftarrow \left( c_{f,1} \cdot c_{y,1}^{-1} \cdot \prod _{k=1}^{\ell } \left( g^{-s_k}c_{t_k,1} \right) ^{\xi _k} \right) ^{s_{\ell +1}^{-1}}\).

  • Use \(\alpha \) to compute \(w_k'=w_k^\alpha \) for all \(k \in [\ell +1]\).

  • Return \(\{w_1,...,w_\ell ,w_{\ell + 1},w_1', \dots ,w_\ell ',w_{\ell + 1}'\}\)
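The algebra behind the simulator steps listed above can be checked in a toy model that works "in the exponent": every group element \(g^a\) is represented by its discrete logarithm \(a\), and the pairing \(e(g^a, g^b)\) by the product \(ab\). This is only a sketch for checking the equations, under the assumption that such a model is acceptable for illustration; it has no cryptographic meaning, omits the \(w_k' = w_k^\alpha \) components, and all names are hypothetical.

```python
import secrets

Q = 2**61 - 1  # toy prime standing in for the group order q (assumption)


def pairing(a, b):
    """Toy pairing on discrete logs: e(g^a, g^b) is represented by a*b mod Q."""
    return a * b % Q


def simulate(s, c_f, c_y, c_t):
    """Return the discrete logs of (w_1, ..., w_{ell+1}).

    s        : trapdoor values (s_1, ..., s_{ell+1})
    c_f, c_y : discrete logs of the first components of the commitments
    c_t      : discrete logs of the first components of (c_{t_k})_k
    """
    ell = len(c_t)
    xi = [secrets.randbelow(Q) for _ in range(ell)]  # w_k = g^{xi_k}
    acc = (c_f - c_y) % Q
    for k in range(ell):
        acc = (acc + (c_t[k] - s[k]) * xi[k]) % Q   # factor (g^{-s_k} c_{t_k,1})^{xi_k}
    w_last = acc * pow(s[ell], -1, Q) % Q           # raise to 1 / s_{ell+1}
    return xi + [w_last]


def verifies(s, c_f, c_y, c_t, w):
    """Check the pairing equation; products in G_T become sums of exponents."""
    ell = len(c_t)
    lhs = pairing((c_f - c_y) % Q, 1)
    rhs = sum(pairing((s[k] - c_t[k]) % Q, w[k]) for k in range(ell))
    rhs = (rhs + pairing(s[ell], w[ell])) % Q
    return lhs == rhs
```

Any output of `simulate` passes `verifies`, regardless of which values the commitments hide, which mirrors the fact that the simulator never uses the openings.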

It is straightforward to check that proofs created by \(\mathcal {S}_{\textsf{prv}}\) are distributed identically to the ones returned by \(\textsf {CP}_{\textsf{PolyEval}}.\textsf{Prove}\). The \((w_k)_{k \in [\ell ]}\) are uniformly distributed in both cases. For \(w_{\ell +1}\) there is a function W such that \(w_{\ell +1} = W(c_{f,1},c_{y,1},\textsf{vk},(c_{t_k,1})_{k \in [\ell ]},(w_k)_{k \in [\ell ]})\) in both cases. Since the inputs are either identical or identically distributed, the outputs \(w_{\ell + 1}\) are also identically distributed in the case of \(\mathcal {S}_{\textsf{prv}}\) and \(\textsf {CP}_{\textsf{PolyEval}}.\textsf{Prove}\). \(\square \)

7 Experimental evaluation

We implemented all our RSA-based CP-SNARKs for set-membership and non-membership as a Rust library cpsnarks-set [28]. Our library is implemented in a modular fashion such that any elliptic curve from libzexe [67] and Ristretto from curve25519-dalek [54] can be used. In particular, this means that our CP-SNARKs can be easily (and efficiently) used in combination with other CP-SNARKs implemented over these elliptic curves, such as Bulletproofs [13] and LegoGroth16 (Footnote 19) [18].

In this section, we provide details on the implementation, we present experimental results to validate the concrete efficiency of our solutions and we compare with existing approaches.

7.1 Implementation of cpsnarks-set

Our cpsnarks-set library includes implementations of the schemes \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\), \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\), \(\textsf{NonMem}\textsf {CP}_{\textsf{RSA}}\), and \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\). In all the schemes, the RSA accumulator implementation is a modification of accumulator [15], and the internal protocols are implemented as interactive and are made non-interactive with the use of Merlin [33]. For \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) and \(\textsf{NonMem}\textsf {CP}_{\textsf{RSA}}\) (where, we recall, set elements can be binary strings and the protocol encodes them into primes) we used our implementation of LegoGroth16 [66] on top of \(\textit{libzexe}\) to provide efficient instantiations of \(\textsf {CP}_{\textsf{HashEq}}\). For \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) and \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\) (where set elements are already primes and one needs to verify a claim about ranges) we implemented two instantiations of \(\textsf {CP}_{\textsf{Range}}\): one based on LegoGroth16 and one based on Bulletproofs.

Each of the protocols \(\textsf{Root}, \textsf{Coprime}, \textsf{modEq}, \textsf{HashEq}\) and the different instantiations of \(\textsf{Range}\) are implemented individually and are further composed into the higher level membership and non-membership protocols. The higher level protocols are modular: they can use any hash-to-prime proof—or range proof in the prime elements case—as long as it implements the appropriate interface.

We benchmarked the implementation on a desktop machine with a 3.8 GHz 6-core Intel Core i7 processor and 32 GB of RAM. The benchmark code is available at [27, 28].

7.2 CP-SNARKs for set membership

For the problem of set membership, we tested the following instantiations of our solutions using the RSA-2048 [65] modulus: 1. \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) with LegoGroth16 for \(\textsf {CP}_{\textsf{HashEq}}\) and a Blake2s-based hash-to-prime mapping to 252-bit primes (\(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}^\textsf{LG}\)); 2. \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) with LegoGroth16 on the BLS12-381 curve for \(\textsf {CP}_{\textsf{Range}}\) (\(\textsf{MemCP}_{\textsf{RSAPrm}}^{\textsf{LG}}\)), and: (a) 252-bit primes, (b) 63-bit primes; 3. \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) with Bulletproofs on the Ristretto curve for \(\textsf {CP}_{\textsf{Range}}\) (\(\textsf{MemCP}_{\textsf{RSAPrm}}^{\textsf{BP}}\)), and: (a) 250-bit primes; (b) 62-bit primes.

The results of our experiments are summarized in Table 1.

Table 1 Set membership asymptotic complexity and benchmarks—our RSA schemes (|x|: size of set elements)
Table 2 Set membership asymptotic complexity and benchmarks—Merkle trees through [46] zkSNARK (n: number of elements in the set)

7.2.1 Comparison with Merkle-tree approach

We compare our solutions against one based on proving a valid opening of a Merkle Tree in a SNARK. Specifically, we ran experiments for Merkle trees with maximum capacities of \(\{2^{8}, 2^{16}, 2^{32}, 2^{64}\}\) elements, using the Groth16 SNARK [46] over the BLS12-381 curve, with the following hash functions: 1. Pedersen Hash over the Jubjub curve, a curve defined over the scalar field of the BLS12-381 \(\mathbb {G}_1\) group (Footnote 20); 2. SHA256. The Merkle tree benchmark code is based on the production Zcash code from [76]. The results of the experiments are in Table 2. We recall that proofs in this solution are 192 bytes long.

As one can see from the results, our solutions are highly attractive in terms of proving time and CRS size. For instance, compared to an optimized solution based on a Pedersen-Hash-based Merkle tree containing up to \(2^{32}\) elements, our \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\) scheme for arbitrary elements enjoys sub-second proof generation on a commodity laptop, is more than twice as fast, and requires a shorter CRS. The price to pay in our solution is a larger proof size (4.4 kilobytes vs. 192 bytes) and higher verification time (31 ms vs. 2.8 ms). Nevertheless, these values stay within practical reach. When comparing to less optimized solutions based on Merkle trees (e.g., using SHA256, which is common when specialized elliptic curves are unavailable), we achieve up to \(32\times \) faster proving time and a \(48\times \) shorter CRS.

In addition to the aforementioned gains in prover efficiency, our solutions can benefit from the use of RSA accumulators to succinctly represent sets in comparison to using Merkle trees. In particular, the algebraic properties of RSA accumulators yield simple and efficient methods to add (resp. delete) elements to (resp. from) the set.

For instance, we can insert an element in an RSA accumulator in O(1) time and space, and with the same complexity we can update each existing membership and non-membership witness. This means that, once having an updated witness, our zero-knowledge proofs can also be recomputed in O(1) time and space. With respect to deleting elements, this can also be done in constant time and space by a party who holds a valid membership witness.
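The constant-cost insertion and witness update described above can be illustrated with a textbook RSA accumulator, where the accumulator is \(g\) raised to the product of the accumulated primes. The Python sketch below uses a toy modulus (`N = 3233`) and base (`G = 2`), both assumptions made for illustration; real deployments use a modulus such as RSA-2048, and this is not the code of the cpsnarks-set library.

```python
N = 3233  # toy modulus 61 * 53 (assumption; completely insecure)
G = 2     # base element of the accumulator (assumption)


def accumulate(primes):
    """Accumulator value G^(product of primes) mod N."""
    acc = G
    for p in primes:
        acc = pow(acc, p, N)
    return acc


def add_element(acc, new_prime):
    """Insert a prime with a single exponentiation."""
    return pow(acc, new_prime, N)


def update_witness(witness, new_prime):
    """Update an existing membership witness after an insertion."""
    return pow(witness, new_prime, N)


def verify(acc, witness, prime):
    """Membership check: witness^prime == acc."""
    return pow(witness, prime, N) == acc
```

A witness for the prime 5 in the set \(\{3, 5, 7\}\) is the accumulator of \(\{3, 7\}\); after inserting 11, raising both the accumulator and the witness to the power 11 keeps the witness valid, while the stale witness no longer verifies against the new accumulator. Deletion, which the text also mentions, is not covered by this sketch.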

Insertion and deletion in ordinary Merkle Trees may require O(n) time by rebuilding the tree from scratch from the whole set (thus also requiring O(n) storage). A more efficient method for insertion requires clients to store a “frontier” of size \(\varTheta (\textsf{log}(n))\) of internal hashes, which lowers the time complexity to \(O(\textsf{log}(n))\). One can also lower deletion times to \(O(\textsf{log}(n))\) by using other techniques, e.g., [63], but at the expense of keeping O(n) storage. Updating a Sparse Merkle Tree requires O(n) time and space. Inserting and deleting elements in Interval Merkle trees requires keeping the elements contiguous and sorted. This brings the time/storage complexity to O(n) for insertion and deletion, since we may need to rebuild substantial portions of the tree from scratch.

7.3 CP-SNARKs for set non-membership

For set non-membership, we tested the following instantiations of our solutions using the RSA-2048 [65] modulus: 1. \(\textsf{NonMem}\textsf {CP}_{\textsf{RSA}}\) with LegoGroth16 for \(\textsf {CP}_{\textsf{HashEq}}\) and a Blake2s-based hash-to-prime mapping yielding primes of 252 bits; 2. \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\) with LegoGroth16 on the BLS12-381 curve for \(\textsf {CP}_{\textsf{Range}}\), and 252-bit primes; 3. \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\) with Bulletproofs on the Ristretto curve for \(\textsf {CP}_{\textsf{Range}}\), and 250-bit primes.

The results of our experiments are summarized in Table 3.

Table 3 Set non-membership benchmarks—our RSA schemes (|x|: size of set elements)

7.3.1 Comparison to other approaches for non-membership

Non-membership proofs are usually a more computationally intensive task in SNARKs. There are two common approaches to deal with this problem using Merkle trees: sparse Merkle trees and interval Merkle trees. We did not test these solutions experimentally. However, as we detail below, creating a zero-knowledge proof for one of these solutions would not be more efficient than proving one Merkle tree path. Therefore, our solutions for non-membership achieve at least the same improvement as in the previous section.

Sparse Merkle trees for a set S are built through an ordinary Merkle Tree T on the universe \(\mathbb {U}\) of elements (we assume there is some conventional way to index the elements). For each element x not in the set S we store a dummy element in T at the index of x. For each element in S we store that particular element at the corresponding index. In order to prove that \(x \not \in S\) we provide an opening path of the Merkle tree whose leaf is a dummy value at the right index. Although there are efficient techniques to build or update a sparse Merkle Tree [4, 30], the main drawback of this technique is the opening size, which is \(\varTheta (\textsf{log}(|\mathbb {U}|))\) instead of \(\varTheta (\textsf{log}(|S|))\). If we perform the opening inside a SNARK, we have to pay a higher proving time. For example, suppose we use SHA256 to index elements in a set with roughly 32-bit representations. This would require a tree of size \(2^{256}\), which typically implies at least a \(256/32 = 8\times \) slowdown.

Interval Merkle trees work by sorting the leaves on each insertion and storing a pair of adjacent elements in each leaf, signifying intervals that don’t contain elements in the set. The depth of an Interval Merkle Tree is the same as in an ordinary Merkle Tree. Nonetheless it has the following performance overheads: (i) opening requires two opening paths instead of only one (typically doubling the proving time); (ii) insertion requires sorting all leaves, which may be computationally demanding if the set is large.
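The leaf layout of an interval Merkle tree can be sketched without any hashing: the sorted elements are paired with their successors, and non-membership of \(x\) is witnessed by the leaf \((a, b)\) with \(a< x < b\). The Python sketch below shows only this leaf logic; the sentinel bounds `lo` and `hi` and the function names are assumptions for illustration, and the Merkle tree that would authenticate the leaves is omitted.

```python
def interval_leaves(elements, lo, hi):
    """Leaves of an interval tree: pairs of adjacent elements after sorting.

    lo and hi are sentinel bounds (assumption) strictly below and above all elements.
    """
    pts = [lo] + sorted(elements) + [hi]
    return list(zip(pts, pts[1:]))


def non_membership_leaf(leaves, x):
    """Return the leaf (a, b) with a < x < b, or None if x is in the set."""
    for a, b in leaves:
        if a < x < b:
            return (a, b)
    return None
```

Inserting a new element splits one interval into two, which is why the leaves must stay sorted and why, as noted above, insertions may force large parts of the tree to be rebuilt.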

Unlike either of the approaches above, the size of the set does not impact proving time in our constructions. Moreover, both insertions and non-membership witness updates are efficient to compute.

7.4 Improving running times: from statistical ZK to computational ZK

The schemes described in this section use statistically hiding commitments to achieve statistical zero-knowledge. We can improve our running times by switching to computationally hiding commitments, and thus to computational zero-knowledge. This optimization has concrete benefits, as it can cut running times by approximately half. Specifically, it reduces by 50%:

  • verification time in constructions \(\textsf{Mem}\textsf {CP}_{\textsf{RSA}}\), \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\), \(\textsf{NonMem}\textsf {CP}_{\textsf{RSA}}\) and \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\);

  • proving time in constructions \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) and \(\textsf{NonMem}\textsf {CP}_{\textsf{RSAPrm}}\).

The results of our experiments for membership and non-membership are summarized in Tables 4 and 5, respectively.

We now give more details about the optimization. Our protocols, as originally described, make use of the integer commitment of Damgård and Fujisaki [31] as described in Sect. 4.2. In this scheme we hide the value by uniformly sampling an integer r from a large set. Its size should be at least around the order of the group; for RSA groups, for example, this is equivalent to sampling \(r {\leftarrow }{\$}\,[1,N/2]\). Performing exponentiations with such a large integer (on average N/4 in the RSA case) is expensive.

To overcome this problem, we propose a computationally hiding variant of the above integer commitment, in which r is picked from a smaller set; we sample it as \(r {\leftarrow }{\$}\,[1,2^{2\lambda }]\). The scheme is hiding under the assumption that \(\{G^{r_1}:r_1 {\leftarrow }{\$}\,[1,N/2]\}\) and \(\{G^{r_2}:r_2 {\leftarrow }{\$}\,[1,2^{2\lambda }]\}\) are computationally indistinguishable (Footnote 21). This assumption can be justified in the generic group model (GGM). Similar assumptions related to non-uniform distributions over \([1,\textsf{ord}(\mathbb {G})]\) have been proven secure in the GGM by Bartusek et al. [3]. This approach makes exponentiations by r faster on average since \(N>2^{2\lambda }\).
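The difference between the two sampling strategies can be sketched as follows. The Python code below contrasts sampling the randomness from \([1, N/2]\) with sampling it from \([1, 2^{2\lambda }]\), and computes a commitment of the form \(G^x H^r \bmod N\), which is the shape we assume for the integer commitment here. The function names are hypothetical, and the sketch says nothing about security; it only shows that the exponent becomes much shorter.

```python
import secrets


def sample_statistical(n_modulus):
    """Sample r uniformly from [1, N/2] (statistically hiding variant)."""
    return secrets.randbelow(n_modulus // 2) + 1


def sample_computational(lam):
    """Sample r uniformly from [1, 2^(2*lam)] (computationally hiding variant)."""
    return secrets.randbelow(1 << (2 * lam)) + 1


def commit(g, h, x, r, n_modulus):
    """Integer commitment of the assumed form g^x * h^r mod N."""
    return pow(g, x, n_modulus) * pow(h, r, n_modulus) % n_modulus
```

With a 2048-bit modulus and \(\lambda = 128\), the first sampler returns exponents of up to about 2047 bits, whereas the second returns exponents of at most 257 bits, so the exponentiation \(H^r\) needs far fewer squarings.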

Table 4 Set membership benchmarks—our RSA schemes with the computational ZK optimization (|x|: size of set elements)
Table 5 Set non-membership benchmarks—our RSA schemes with the computational ZK optimization (|x|: size of set elements)

8 Applications

In this section, we discuss applications of our solutions for proving set (non-)membership in a succinct and modular way.

Note that in our solutions the set of committed elements is public and not hidden from the verifier. Nevertheless, our solutions can still capture some applications in which the “actual” data in the set is kept private. This is, for example, the case of anonymous cryptocurrencies like Zerocash. In this scenario, the public set of elements to be accumulated, U, is derived by creating a commitment to the underlying data, X, e.g., \(u = COMM(x)\). To support this setting, we can use our solutions for arbitrary elements (thus supporting virtually any commitment scheme). Interestingly, though, we can also use our (more efficient) solution for sets of primes if commitments are prime numbers. This can be done by using, for example, the hash-to-prime method described in Sect. 4.2 or another method for Pedersen commitments that we explain below in the context of Zerocash.

We now discuss concrete applications for which our constructions are suitable, both for set membership and set non-membership. In particular, these are applications in which: (1) the prover time must be small; (2) the size of the state (i.e., the accumulator value and commitments) must be small (potentially constant); (3) the verifier time should be small; and (4) updating the accumulator (adding or deleting an element) should be fast. As we discuss below, our RSA-based constructions are suitable candidates for settings with these constraints.

8.1 Zerocash

Zerocash [5] is a UTXO-type (Unspent Transaction Output) cryptocurrency protocol which extends Bitcoin with privacy-preserving (shielded) transactions. When performing a shielded transaction users need to prove they are spending an output note from a token they had previously received. Users concerned with privacy should not reveal which note they are spending, else their new transaction could be linked to the original note that contained the note commitment. This would reveal information both to the public and the sender of the initial transaction, and hence partially reveal the transaction graph. In order to keep transactions unlinkable, the protocol uses zkSNARKs to prove a set membership relation, namely that a note commitment is in a publicly known set of “usable” note commitments.

Zcash is a full-fledged digital currency using Zerocash as the underlying protocol. In its current deployment, Sapling [49], it employs Pedersen commitments of the notes and makes a zero-knowledge set membership proof of these commitments using a Pedersen-Hash-based Merkle tree approach. This is the part of the protocol that can be replaced by one of our RSA-based solutions in order to obtain a speedup in proving time. In particular, we could slightly modify the note commitments in order to enable the use of our scheme \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) for sets of prime numbers, which gives the best efficiency. We can proceed as follows. Recall that the note commitments are represented by their x coordinates in the underlying elliptic curve group. We can then modify them so that the sender chooses a blinding factor such that the commitment representation of a note is a prime number, and we can add a consensus rule that enforces this check. With this change, we can achieve a solution that is significantly more efficient than the one currently used in Zcash. Currently Zcash uses a Merkle Tree whose depth is 32. In this setting, we would be able to reduce the proving time of set membership from 1.12 s to 54.51 ms, trading it for larger proof sizes. We note that in this application, the set-membership proof about \(u \in S\) is accompanied by another predicate P(u). In the proof statement of the Zcash protocol, proving that P(u) is satisfied takes considerably less time than the membership proof, which is why our solution would improve the overall proving time considerably, even though the proof has more components. Another interesting point is that our solution significantly reduces the size of the circuit; hence the need for a succinct proof system is reduced, and one may even consider instantiations with other proof systems, such as Bulletproofs, that would offer transparency at the price of larger proofs and verification time.

8.2 Asset governance

In the context of blockchain-based asset transfer protocols, a governance system must be established to determine who can create new assets. In many cases these assets must be publicly traceable (i.e., their total supply must be public), yet in others, where the assets can be issued privately, validators still need to verify that the assets were issued by an authorized issuer. Specifically, there may be a public set of rules, X (where a \(\textsf{rule} = (pk, [a,b])\)), defining which entities (public keys) are allowed to issue which assets (defined by a range of asset types), forming an “issuance whitelist”. When one of those issuers wants to issue a new asset, they need to prove (in zero knowledge) that their public key belongs to the issuance whitelist, which entails set membership, and that the asset type they issued is within the allowed range of asset types (as defined in the original rule). In this case, the accumulated set of rules is public to all, and this public information may also include a mapping between rules and prime numbers. Our RSA-based scheme for sets of primes (Sect. 4.4) suits this scenario.
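To make the statement being proven concrete, the following sketch checks the issuance relation in the clear: membership of a rule's prime in an RSA accumulator, plus the range check on the asset type. It is only an illustration under stated assumptions. The modulus `N` is a toy product of two known Mersenne primes (a real instantiation needs a large modulus of unknown factorization), and the generator `G`, the table `RULES`, and all function names are hypothetical. In the actual application the same relation would be proven in zero knowledge rather than checked directly.

```python
from math import prod

# ASSUMPTION: toy, insecure parameters chosen for readability only.
N = (2**31 - 1) * (2**61 - 1)
G = 3

# Public mapping from primes to rules (pk, [a, b]); all values are hypothetical.
RULES = {
    101: ("pk_alice", (0, 99)),
    103: ("pk_bob", (100, 199)),
    107: ("pk_carol", (200, 299)),
}


def accumulate(primes):
    """Accumulator value G^(product of all primes) mod N."""
    return pow(G, prod(primes), N)


def witness(primes, e):
    """Membership witness for e: G^(product of all other primes) mod N."""
    return pow(G, prod(p for p in primes if p != e), N)


def verify_membership(acc, e, w):
    """Accept iff w^e = acc mod N."""
    return pow(w, e, N) == acc


def check_issuance(acc, e, w, pk, asset_type):
    """Relation checked in the clear: the rule mapped to prime e is
    accumulated, belongs to pk, and covers asset_type."""
    rule_pk, (lo, hi) = RULES[e]
    return verify_membership(acc, e, w) and rule_pk == pk and lo <= asset_type <= hi


acc = accumulate(list(RULES))
w_bob = witness(list(RULES), 103)
print(check_issuance(acc, 103, w_bob, "pk_bob", 150))  # asset type inside Bob's range
print(check_issuance(acc, 103, w_bob, "pk_bob", 250))  # asset type outside Bob's range
```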

8.3 Anonymous broadcast

In a peer-to-peer setting, anonymous broadcast allows users in a group to broadcast a message without revealing their identity. They can only broadcast once on each topic. One approach described in [64] works by asking users to put down a deposit which they will lose if they try to broadcast multiple messages on the same topic. In this approach users joining a group deposit their collateral in a smart contract. Whoever has the private key used by the client for the deposit can claim the sum. The approach in [64] makes sure that the key is leaked if one broadcasts more than one message. To enforce this leakage we require that at broadcast time users (i) derive an encryption key K that depends on their private key and the topic, and (ii) compute an encryption of the private key under the newly derived K. The users then publish both the ciphertext and a secret share of the encryption key K, and prove (in zero knowledge) that their public key is part of the group and that (i) and (ii) were performed correctly. Which specific share needs to be revealed depends on the broadcast message, so two different messages are likely to leak two different shares.

This way, broadcasting multiple messages on the same topic reveals the user’s private key, allowing other users to remove them from the group by calling a function in the smart contract and receive part of the deposit.
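The leakage mechanism can be sketched as follows. The source does not fix the secret-sharing scheme, so this sketch assumes a 2-out-of-n Shamir sharing over a prime field: K is the constant term of a line determined by the private key and the topic, and the revealed share is the point of that line at an abscissa derived from the message. The field modulus `P`, the hash-to-field function `h`, the XOR-based toy encryption, and all function names are assumptions made for illustration; the zero-knowledge proofs of correct derivation and group membership are omitted.

```python
import hashlib

P = 2**127 - 1  # ASSUMPTION: toy prime field; private keys are taken below P


def h(*parts: bytes) -> int:
    """Hash byte strings to a field element (length-prefixed inputs)."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(len(part).to_bytes(4, "big") + part)
    return int.from_bytes(hasher.digest(), "big") % P


def to_bytes(x: int) -> bytes:
    return x.to_bytes(16, "big")


def broadcast(sk: int, topic: bytes, message: bytes):
    """Return the public data attached to one broadcast."""
    key = h(b"key", to_bytes(sk), topic)      # (i) K depends on sk and the topic
    slope = h(b"slope", to_bytes(sk), topic)  # fixes the line f(x) = K + slope * x
    ciphertext = sk ^ h(b"pad", to_bytes(key))  # (ii) toy encryption of sk under K
    x = h(b"msg", message)                    # the message selects the share
    share = (x, (key + slope * x) % P)
    return ciphertext, share


def recover_sk(ciphertext: int, share1, share2) -> int:
    """Two distinct shares on the same topic reveal K, hence sk."""
    (x1, y1), (x2, y2) = share1, share2
    slope = (y1 - y2) * pow((x1 - x2) % P, -1, P) % P
    key = (y1 - slope * x1) % P
    return ciphertext ^ h(b"pad", to_bytes(key))


sk = 123456789
ct1, s1 = broadcast(sk, b"topic-1", b"first message")
ct2, s2 = broadcast(sk, b"topic-1", b"second message")
print(recover_sk(ct1, s1, s2) == sk)  # the double broadcast leaks the private key
```

A single share reveals nothing about K in this sharing, which is why an honest user who broadcasts once per topic keeps the deposit.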

A particularly interesting use case for anonymous broadcast is that in which the group is composed of validators participating in a consensus algorithm, who would like to broadcast messages without exposing their node’s identity and thus prevent targeted DoS attacks. This setting requires proofs to be computed extremely fast, while verification performance requirements are less strict. Our \(\textsf{Mem}\textsf {CP}_{\textsf{RSAPrm}}\) can satisfy these performance requirements at the cost of a modest increase in proof size.

8.4 Financial identities

In the financial world, regulations establish that financial organizations must know who their customers are [38]. This is called a KYC check and helps reduce the risk of fraud. Common KYC practices often undermine user privacy, as they involve collecting a lot of personal information about users. Zero-knowledge proofs allow for an alternative approach. In modern systems, one can expect that individuals or companies will be able to prove that they belong to a set of accepted or legitimate identities. A privacy-preserving KYC check would then be reduced to generating a set-membership proof in zero knowledge. Often some further information is required, e.g., the credit score of the individual. In such cases our CP-SNARK for set membership can be combined with one proving an additional predicate \(P(\textsf{id})\) on the identity in a modular fashion.

Regarding applications of non-membership proofs, we expand on the well-known concept of “blacklists”, where identities (or credentials) must be shown not to belong to a certain set of identities (or credentials). As an example, in the context of financial identities, anti-money laundering regulations (AML) [68] require customers not to be in a list of fraudulent identities. Here one can use our non-membership construction to generate a proof that the customer does not belong to the set of money launderers (or those thought to be). Because, as in the set-membership case, a user may have to prove additional information about their identity, here we can also benefit from a modular framework. Furthermore, modularity allows us to cheaply prove both membership and non-membership (at the same time) for the same identity \(\textsf{id}\) together with some additional information \(P(\textsf{id})\): holding a commitment to \(\textsf{id}\), one can produce the following tuple of proofs: (1) a membership proof (\(\textsf{id} \in S\)); (2) a non-membership proof (\(\textsf{id} \not \in S'\)); (3) a CP-SNARK proof that includes the statement to be proven on that identity (\(P(\textsf{id})\)).

We note that in some cases, a central authority, who controls the white and black lists, is trusted to ensure the integrity of the lists. Identities can then be added to or removed from the lists over time, which makes our RSA-based construction ideal given the comparatively low cost of updating the dynamic accumulator.

8.5 Zerocoin vulnerability

Another specific application of our RSA-based constructions is that of solving the security vulnerability of the implementation of the Zerocoin protocol [56] used in the Zcoin cryptocurrency [73]. In a nutshell, the vulnerability is the following: when proving equality of values committed under the RSA commitment and the prime-order group commitment, the equality may not hold over the integers, and hence one could easily produce collisions in the prime-order group. Our work can provide different ways to solve this problem by generating a proof of equality over the integers.
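The gap between equality modulo the group order and equality over the integers can be seen in a small numeric sketch. All parameters are toy assumptions: the prime-order group is the order-11 subgroup of \(\mathbb{Z}_{23}^*\) with hypothetical generators 2 and 3, and the RSA-style integer commitment uses an insecure toy modulus. The function names are ours and do not reflect the Zerocoin implementation.

```python
# ASSUMPTION: toy prime-order group, the subgroup of order Q = 11 in Z_23^*.
P_MOD, Q = 23, 11
G1, H1 = 2, 3  # both have order 11 modulo 23

# ASSUMPTION: toy, insecure RSA-style modulus and bases.
N = (2**31 - 1) * (2**61 - 1)
G2, H2 = 3, 5


def pedersen(value: int, rand: int) -> int:
    """Pedersen commitment in the prime-order group: only value mod Q matters."""
    return pow(G1, value, P_MOD) * pow(H1, rand, P_MOD) % P_MOD


def rsa_commit(value: int, rand: int) -> int:
    """Integer commitment in the (toy) hidden-order group."""
    return pow(G2, value, N) * pow(H2, rand, N) % N


e, r = 7, 5
e_shifted = e + Q  # a different integer, equal to e modulo Q

# The prime-order commitments collide ...
print(pedersen(e, r) == pedersen(e_shifted, r))
# ... while the integer commitments do not.
print(rsa_commit(e, r) == rsa_commit(e_shifted, r))
```

An equality proof that only holds modulo Q would therefore accept `e` and `e_shifted` as the same value, whereas a proof of equality over the integers rules this out.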