1 Introduction

Following the seminal works of Dolev, Dwork and Naor [8] and Feige and Shamir [11] from the early 1990s, concurrent security of cryptographic protocols has been an active area of research. Yet, it is still not well-understood when and where concurrent security can be achieved. One potential reason for this might be the complexity of traditional analyses. In this work we focus on generalizing and (in our eyes) simplifying analyses of concurrent security in one of the most basic settings, namely that of zero-knowledge proofs.

Zero-knowledge (ZK) interactive proofs [15] are paradoxical constructs that allow one player (called the prover) to convince another player (called the verifier) of the validity of a mathematical statement x ∈ L, while providing zero additional knowledge to the verifier. Beyond being fascinating in their own right, ZK proofs have numerous cryptographic applications and are one of the most fundamental cryptographic building blocks. As such, techniques developed in the context of ZK often extend to more general types of interactions.

The notion of concurrent zero knowledge, first introduced and achieved in the paper by Dwork, Naor and Sahai [10], considers the execution of zero-knowledge proofs in an asynchronous and concurrent setting. More precisely, we consider a single adversary mounting a coordinated attack by acting as a verifier in many concurrent executions (called sessions). Concurrent ZK proofs are significantly harder to construct and analyze. Since the original protocols by Dwork, Naor and Sahai (which relied on so-called “timing assumptions”), various other concurrent ZK protocols have been obtained based on different set-up assumptions (e.g., [4, 7, 9]). In the standard model without set-up assumptions (the focus of our work), Canetti, Kilian, Petrank and Rosen [5] (building on earlier works by Kilian, Petrank and Rackoff [20] and Rosen [29]) show that concurrent \(\mathcal{ZK}\) proofs for non-trivial languages, with so-called “black-box” simulators, require at least \(\tilde{\varOmega}(\log n)\) communication rounds. On the other hand, Richardson and Kilian [28] constructed the first concurrent ZK argument in the standard model without any extra set-up assumptions. Their protocol, which uses a black-box simulator, requires \(O(n^{\epsilon})\) rounds. (See also the work of Canetti, Goldreich, Goldwasser and Micali [4] for a somewhat different and more detailed analysis of this protocol.) Kilian and Petrank [19] then introduced a new oblivious zero-knowledge simulator. Using this simulation technique they obtained a simpler and cleaner analysis, and additionally improved the round complexity to \(\tilde{O}(\log^{2} n)\). Finally, the work of Prabhakaran, Rosen and Sahai [27] further simplifies and improves the analysis of the oblivious simulator, obtaining an essentially optimal round complexity of \(\tilde{O} (\log n)\).

Despite these simplifications and improvements, the analysis of concurrent zero-knowledge protocols remains quite complex. Furthermore, the different analyses are tailored to different types of protocols. In particular, the most refined analysis from [27] considers committed-verifier protocols, where the verifier commits to its messages in advance; more specifically, as far as we know, the analysis has only been applied to generalizations of the Goldreich–Kahan ZK protocol [13]. For instance, no generalizations of the Feige–Shamir ZK protocol [11] have been analyzed using it; apart from theoretical interests, the Feige–Shamir ZK protocol is noteworthy due to its efficient instantiations via “sigma protocols” [6].

In this work, we focus on simplifying and generalizing current analysis techniques for concurrent ZK. More precisely, we provide a variant of Prabhakaran, Rosen and Sahai’s (PRS) analysis [27] of the Kilian–Petrank (KP) zero-knowledge simulator [19]. Our contribution is twofold:

  • In our eyes, this analysis is simpler and more flexible than the original PRS analysis. In particular, the analysis also directly applies to more efficient variants of the KP-simulator, resulting in concurrent ZK protocols with “tight” [12, 14], and even “precise” [22] simulations (i.e., simulations where the running-time of the simulator is close to the running-time of the malicious verifier, in an execution-by-execution manner). Such results were already established in [24], but required a more elaborate analysis (building upon [27]).

  • Our analysis applies to a broad range of protocols, and in particular to Feige–Shamir-type protocols. As a consequence, we establish a simple ω(log n)-round concurrent zero-knowledge argument of knowledge for NP based on one-way functions. The same protocol construction also yields a poly(n)-round concurrent statistical ZK argument of knowledge for NP, based on one-way functions (concurrent statistical ZK arguments were first constructed in [16] using a more complex protocol). Furthermore, in a subsequent work, Lin et al. [21] rely on our analysis to construct concurrent non-malleable zero-knowledge proofs for NP; our analysis is helpful in this context since their protocol is not of the committed-verifier type.

Previous Techniques

Kilian and Petrank’s (KP) ingenious simulation technique relies on a static—and oblivious—rewinding schedule; namely, the simulator rewinds the adversarial verifier after some fixed number of messages, independent of the content of the messages and the interleaving schedule of the sessions. The crux of their analysis is to show that using this rewinding schedule, every session is “successfully rewound” at least once with high probability; in a successful rewind, the simulator can extract a “trapdoor” that will allow it to complete the simulation. To bound the failure probability, they rely on a subtle computation of conditional probabilities.

The elegant work of Prabhakaran, Rosen and Sahai (PRS) [26, 27, 30], on the other hand, directly analyzes the probability space of the simulator, i.e., counts the random tapes of the simulator; this makes the analysis both simpler and sharper. The idea is to show that each “bad” random tape (that produces a failed simulation) can be mapped into super-polynomially many distinct “good” tapes. This is done by identifying random tape segments, called rewinding intervals, that can be “swapped” among each other in order to turn a bad tape into a good one (Footnote 1). The crux of their proof is then to count how many such “swappings” actually generate new and distinct random tapes. However, complications arise since swappings performed on different rewinding intervals may overlap and even remove other possible rewinding intervals. A bit more precisely, the PRS analysis focuses only on “disjoint” rewinding intervals, but performs a computation based on the “multiplicity” of those intervals. A count with multiplicity is needed because the number of disjoint rewinding intervals in general cannot be guaranteed to be sufficiently large, at least in the case of ω(log n)-round protocols. (As we shall see, in our analysis, we are able to swap also non-disjoint rewinding intervals; as a result, we can avoid the count with multiplicities.)

Additionally, to enable this counting argument, the PRS analysis bounds the failure probability of a “hybrid” simulator (which has access to the witnesses of input statements). To show that the real simulator is indistinguishable from the hybrid simulator, committed-verifier protocols are used; this is required to ensure that when changing the hybrid simulator (which uses the actual witness) to the real simulator (which does not know the actual witness), indistinguishability holds despite the rewinds performed by the simulator. Intuitively, the committed-verifier property ensures that the rewinds are “harmless.”

Our Techniques

We show how to directly analyze the failure probability of the actual simulator (as opposed to a hybrid one), while (in our eyes) simplifying the counting argument. Our key step is to identify a stronger notion of rewinding intervals, which we call composable blocks. Just like rewinding intervals, properties of composable blocks guarantee that a “swap” will generate a new good random tape; moreover, these same properties are closed under composition, namely, the swapping of one such block leaves other composable blocks intact, even if these composable blocks are not disjoint. By this new composition property, it is enough to identify K composable blocks to conclude that the simulation fails with probability less than \(2^{-K}\).

In essence, our proof will consist of two simple steps: First, we establish local properties of a composable block (namely, that a swap generates one new good random tape, and that swappings are composable); then, we count the number of composable blocks on a bad random tape; as we shall see, each round in the protocol gives rise to a new composable block. As such, our analysis conveys a strong intuition of how “each additional round of the protocol halves the simulator’s failing probability.” However, we emphasize that our techniques do not improve the “quantity” of the count (e.g., they do not improve upon the round-complexity of the PRS protocol).

To employ this new notion of composable blocks, we consider and analyze a “lazy” variant of the KP simulator. Intuitively, the lazy KP simulator is identical to the KP simulator but only makes use of information gathered in its rewinds after some delay. The lazy KP simulator can only fail more often than the original KP simulator (and thus our analysis indirectly also applies to the KP simulator); yet, considering this “weaker” simulator enables our way of directly analyzing the failure probability of the simulation. In a sense, much like making a stronger inductive hypothesis can simplify the inductive step, our stronger notion of composable blocks and our weaker lazy KP simulator enable and simplify the analysis. We note that the PRS analysis also seems to apply to the lazy KP simulator, although this was not made use of in [27].

After directly bounding the failure probability of the real simulator, we provide a simple hybrid argument to show that the output of the simulator is indistinguishable from the view of the verifier. The base case of this hybrid argument considers only a “straight-line” (i.e., a non-rewinding) execution, and as such the analysis directly applies also to committed-verifier protocols.

Overview

We define concurrent ZK and give some preliminaries in Sect. 2. We construct and analyze computational and statistical concurrent black-box ZK arguments of knowledge in Sect. 3 (our main theorems). For completeness, we also provide a brief overview of the PRS analysis in Appendix A.

2 Preliminaries

We assume familiarity with indistinguishability and interactive proofs. [n] denotes the set {1,…,n}.

2.1 Black-Box Concurrent Zero-Knowledge

Let (P,V) be an interactive proof for a language L. An m-session concurrent adversarial verifier V* is a probabilistic polynomial-time machine that, on common input x and auxiliary input z, interacts with m(|x|) independent copies of P concurrently (called sessions). There are no restrictions on how V* schedules the messages among the different sessions, and V* may choose to abort some sessions but not others. Let \(\operatorname {View}_{V^{*}}^{P}(x,z)\) be the random variable that denotes the view of V*(x,z) in an interaction with P (this includes the random coins of V* and the messages received by V*). A black-box simulator S is a probabilistic polynomial-time machine that is given black-box access to V* (written as \(S^{V^{*}}\)). Roughly speaking, we require that for every instance x ∈ L, and every auxiliary input z, \(S^{V^{*}(x,z)}(x)\) can generate the view of V*(x,z) in an interaction with P. Since we provide V* with an auxiliary input, we can without loss of generality restrict our attention to deterministic V* (as V* can always receive its random coins as auxiliary advice).

Definition 1

(Black-box concurrent zero-knowledge [10])

Let Π = (P,V) be an interactive protocol for a language L. Π is black-box concurrent zero-knowledge if for all polynomials m, there exists a black-box simulator \(S_{m}\) such that for every common input x and auxiliary input z, and every deterministic m-session concurrent adversary V*, \(S_{m}^{V^{\!*}(x, z)\!}(x)\) runs in time polynomial in |x|. Furthermore, the ensembles \(\{ \operatorname {View}_{V^{*}}^{P}(x,z)\}_{x \in L, z\in \{0, 1\}^{*}}\) and \(\{ S_{m}^{V^{\!*}(x, z)\!}(x)\}_{x \in L, z \in \{0, 1\}^{*}}\) are computationally indistinguishable (as a function of |x|).

2.2 Other Primitives

Witness-Indistinguishable (WI) Proofs [11]

Roughly speaking, an interactive proof is witness indistinguishable if the verifier’s view is “independent” of the witness used by the prover for proving the statement.

Definition 2

(Witness-indistinguishability)

Let (P,V) be an interactive proof system for a language L ∈ NP with witness relation \(R_{L}\). We say that (P,V) is witness-indistinguishable for \(R_{L}\) if for every probabilistic polynomial-time adversarial V* and for every two sequences of witnesses \(\{w_{x}^{1}\}_{x\in L}\) and \(\{w_{x}^{2}\}_{x\in L}\) satisfying \(w_{x}^{1},w_{x}^{2} \in R_{L}(x)\), the following two probability ensembles are computationally indistinguishable as a function of n = |x|: the view of V* on common input x when interacting with P using witness \(w_{x}^{1}\), and the view of V* on common input x when interacting with P using witness \(w_{x}^{2}\).

Proofs and Arguments of Knowledge (POK, AOK) [2, 11]

An interactive proof (resp. argument) is a proof (resp. argument) of knowledge if the prover convinces the verifier that it possesses, or can feasibly compute, a witness for the statement proved.

Definition 3

(Proofs and arguments of knowledge [2])

An interactive protocol Π=〈P,V〉 is a proof of knowledge (resp. argument of knowledge) of language L with respect to witness relation \(R_{L}\) if Π is indeed an interactive proof (resp. argument) for L. Additionally, there exists a polynomial q, a negligible function ν, and a probabilistic oracle machine E, such that for every interactive machine P* (resp. for every polynomially-sized machine P*) and every x ∈ L, the following holds:

If Pr[〈P*,V〉(x)=1]>ν(|x|), then on input x and oracle access to P*(x), machine E outputs a string from \(R_{L}(x)\) within an expected number of steps bounded by

$$\frac{q(|x|)}{\Pr[\langle P^*, V\rangle (x) = 1] - \nu(|x|)}\,. $$

The machine E is called the knowledge extractor.

Special-Sound (SS) Proofs [6]

Special-sound proofs are proofs of knowledge with a very rigid and useful structure.

Definition 4

(Special soundness)

A 4-round interactive proof (P,V) for language L ∈ NP with witness relation \(R_{L}\) is special sound with respect to \(R_{L}\) if (P,V) is public-coin (i.e., verifier messages are segments of its random tape), and on input x, all verifier messages have length g(|x|)≥|x|.

Moreover, there exists a deterministic polynomial-time extraction procedure X such that on input x, with all but negligible probability in |x| over the choice of a uniform \(\rho \in \{0,1\}^{g(|x|)}\), for all α, β, β′, γ, γ′ such that β ≠ β′, and (ρ,α,β,γ) and (ρ,α,β′,γ′) are both accepting transcripts of (P,V) on input x, X(x,(ρ,α,β,γ),(ρ,α,β′,γ′)) outputs a witness \(w \in R_{L}(x)\).
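To make the extraction procedure concrete, the following minimal sketch shows special-soundness extraction for Schnorr's three-message identification protocol, a standard sigma protocol. It is not the 4-round Blum-based proof used in this paper: there is no initial verifier message ρ, and the group parameters, the helper names `respond`, `accepts` and `extract`, and the chosen numbers are illustrative assumptions only.

```python
# Toy Schnorr parameters (illustrative only, far too small for security):
# g = 2 has prime order q = 11 in the multiplicative group modulo p = 23.
p, q, g = 23, 11, 2

def respond(w, r, e):
    """Prover's last message z for witness w, nonce r and challenge e."""
    return (r + e * w) % q

def accepts(h, a, e, z):
    """Verifier's check for the statement h = g^w mod p."""
    return pow(g, z, p) == (a * pow(h, e, p)) % p

def extract(t1, t2):
    """Special-soundness extractor: two accepting transcripts that share
    the first message a but use distinct challenges reveal the witness."""
    (a1, e1, z1), (a2, e2, z2) = t1, t2
    assert a1 == a2 and e1 != e2
    inv = pow((e1 - e2) % q, -1, q)   # modular inverse (Python 3.8+)
    return ((z1 - z2) * inv) % q

w, r = 7, 5                           # secret witness and prover nonce
h, a = pow(g, w, p), pow(g, r, p)     # statement and first prover message
t1 = (a, 3, respond(w, r, 3))         # transcript with challenge 3
t2 = (a, 8, respond(w, r, 8))         # "rewound" transcript, challenge 8
```

Here `t1` and `t2` play the role of (α,β,γ) and (α,β′,γ′): both are accepting, and `extract(t1, t2)` recovers `w`.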

2.3 Known Protocols

In our construction of concurrent zero-knowledge arguments, we use:

  • 4-round computational WI and SS proofs based on one-way functions. This can be instantiated with a parallel repetition of the Blum Hamiltonicity protocol [3] with 2-round statistically binding commitments constructed from one-way functions [18, 23].

  • 4-round computational WI-AOK or poly(n)-round statistical WI-AOK based on one-way functions. Again, this can be instantiated with the Blum Hamiltonicity protocol with the help of 2-round statistically binding commitments ([18, 23], this actually gives a POK) or statistically hiding commitments [17] from one-way functions.

3 Black-Box Concurrent Zero-Knowledge Arguments of Knowledge

In this section we re-prove the following theorem.

Theorem 1

Assume the existence of one-way functions. Then every language in NP has an ω(log n)-round concurrent black-box ZK argument of knowledge, and a poly(n)-round concurrent black-box statistical-ZK argument of knowledge.

3.1 The Protocol

Our concurrent ZK protocol ConcZKArg (also used in [25]) is a slight variant of the precise ZK protocol of [22], which in turn is a generalization of the Feige–Shamir protocol [11]. The protocol for language L proceeds in three stages, given a security parameter n, a common input statement \(x \in \{0,1\}^{n}\), and a “round-parameter” \(k \in \omega(\log n)\):

Stage Init:

The verifier picks two random strings \(r_{1}, r_{2} \in \{0,1\}^{n}\) and sends their images \(c_{1}=f(r_{1})\), \(c_{2}=f(r_{2})\) through a one-way function f to the prover. The verifier then provides, in parallel, k instances of a 4-round computationally-WI and SS proof of knowledge of the NP statement “\(c_{1}\) or \(c_{2}\) is in the image set of f” (a witness here would be a pre-image of \(c_{1}\) or \(c_{2}\)). The first two (out of four) messages of each SS-POK are exchanged in this stage. The end of Stage Init is called the start of the protocol.

Stage 1:

k message exchanges occur in Stage 1. In the jth iteration, the prover sends \(\beta_{j} \in \{0,1\}^{n}\), a random second last message of the jth SS-POK, and the verifier replies with the last message \(\gamma_{j}\) of the SS-POK. These k iterations are called slots. A slot is convincing if the verifier produces an accepting proof. If there is ever an unconvincing slot, the prover aborts the whole session. The end of Stage 1 (after k convincing slots) is called the end of the protocol.

Stage 2:

The prover provides a 4-round computational-WI (resp. poly(n)-round statistical-WI) argument of knowledge of the statement “x ∈ L, or one of \(c_{1}\) or \(c_{2}\) is in the image set of f.”

Completeness and soundness/proof of knowledge follow directly from the proof of Feige and Shamir [11]; in fact, the protocol is an instantiation of theirs. Intuitively, to cheat in the protocol a prover must “know” an inverse to \(c_{1}\) or \(c_{2}\) (since Stage 2 is an argument of knowledge), which requires inverting the one-way function f. A formal description of protocol ConcZKArg is shown in Fig. 1.

Fig. 1. Concurrent ZK argument of knowledge for NP with round parameter k.

3.2 The “Lazy KP” ZK Simulator

We show that whenever k is super-logarithmic (i.e., k = ω(log n)), our protocol is concurrent ZK. This requires us to construct a simulator \({\textsf {Sim}}= {\textsf {Sim}}^{V^{*}(x,z)}(x)\) that, given input instance x ∈ L and black-box access to V*(x,z), outputs a view that is indistinguishable from the real view of V*(x,z). On a very high level, the simulation follows that of Richardson and Kilian [28]. The simulator simulates Stage Init and Stage 1 of the protocol by following the honest prover strategy, and attempts to rewind one of the slots (i.e., the last two messages of the special-sound proofs provided by V*). If the simulator obtains two matching convincing slots, i.e., the slots are from the same round of the protocol and share the same initial transcript, the special-soundness property allows the simulator to compute a fake witness r such that \(f(r)=c_{1}\) or \(c_{2}\). This fake witness can then be used to simulate Stage 2 of the protocol. Towards this goal, we let Sim be an oblivious black-box simulator similar to [19].

Description of Sim

Let n be the security parameter, m be a bound on the number of concurrent sessions invoked by V* and T be the total number of messages exchanged, bounded by O(mk), a polynomial in n. Keep in mind that during black-box simulation, we assume without loss of generality that V* is deterministic; therefore the view of V* is just the transcript of its interaction with the honest prover.

In order to extract a fake witness from V*, Sim follows an oblivious rewinding schedule based only on the number of messages exchanged so far, just like in [19] and [27]. During the oblivious simulation, Sim keeps a repository of all messages generated by V* among all rewinds; whenever Sim encounters Stage 2 of the protocol, Sim looks for matching convincing slots in this repository to compute the required fake witness. More precisely, Sim uses the recursive procedure lazy-rewind described below.

At a high level, \(\textsf {lazy-rewind}(t, \mathcal {V}, \mathcal {T}) \to (\mathcal {V}', \mathcal {T}')\) attempts to recursively simulate V*(x,z) for t messages starting from a partial view \(\mathcal {V}\) of V*, with the help of a repository \(\mathcal {T}\) of messages generated by V* during rewinds (formally just a set of all simulator query and verifier message pairs). If lazy-rewind is successful, it outputs a longer view \(\mathcal {V}'\) of V* (that contains exactly t more verifier messages than \(\mathcal {V}\)), and an updated repository \(\mathcal {T}'\) including verifier messages that lazy-rewind gathered from various rewinds (and most likely contains more verifier messages than what is recorded in \(\mathcal {V}'\)). Sim simply outputs the view produced by \(\textsf {lazy-rewind}(T, \mathcal {V}=\emptyset, \mathcal {T}=\emptyset)\), i.e., lazy-rewind starting from the empty view and an empty repository.

Description of \(\textsf {lazy-rewind}(t, \mathcal {V}, \mathcal {T})\)

At the base case of the recursion (t=1), lazy-rewind receives a message from V* and produces a prover response; lazy-rewind behaves identically to an honest prover to generate Stage Init and Stage 1 messages. Whenever a session reaches end, lazy-rewind will attempt to compute a fake witness r for the session (\(f(r)=c_{1}\) or \(c_{2}\)) by searching \(\mathcal {T}\) for matching convincing slots. If this is successful, the fake witness r is used to generate prover messages in Stage 2 of this session (i.e., the WI-POK). Otherwise, lazy-rewind outputs ⊥, which in turn causes Sim to output ⊥ as well (Footnote 2). In the end, lazy-rewind outputs the updated view \(\mathcal {V}'\) of V* (the input view appended with the newly exchanged pair of messages), and the updated repository \(\mathcal {T}'\) (the input repository inserted with the newly exchanged pair of messages).

When t>1, \(\textsf {lazy-rewind}(t, \mathcal {V}, \mathcal {T})\) proceeds roughly as follows: It first recursively simulates V* for t/2 messages twice starting from the partial view \(\mathcal {V}\). Then, continuing from one of those simulations, lazy-rewind recursively simulates V* for another t/2 messages, twice. More formally, \(\textsf {lazy-rewind}(t, \mathcal {V}, \mathcal {T})\) calls itself four times as follows:

  1. \((\mathcal {V}_{1}, \mathcal {T}_{1}) \leftarrow \textsf {lazy-rewind}(t/2, \mathcal {V}, \mathcal {T})\).

  2. \((\mathcal {V}_{2}, \mathcal {T}_{2}) \leftarrow \textsf {lazy-rewind}(t/2, \mathcal {V}, \mathcal {T})\). Merge \(\mathcal {T}_{1}\) and \(\mathcal {T}_{2}\) into a larger repository of messages \(\mathcal {T}'\).

  3. \((\mathcal {V}_{3}, \mathcal {T}_{3}) \leftarrow \textsf {lazy-rewind}(t/2, \mathcal {V}_{1}, \mathcal {T}')\).

  4. \((\mathcal {V}_{4}, \mathcal {T}_{4}) \leftarrow \textsf {lazy-rewind}(t/2, \mathcal {V}_{1}, \mathcal {T}')\). Merge \(\mathcal {T}_{3}\) and \(\mathcal {T}_{4}\) into a larger repository of messages \(\mathcal {T}''\).

  5. Output \((\mathcal {V}_{3}, \mathcal {T}'')\).

Because the first two recursive calls to lazy-rewind (resp. the last two calls) have identical inputs (they differ only because they use different segments of Sim’s random tape), they are called sibling calls. See Fig. 2 for an illustration of the rewinding schedule, and Fig. 3 for a pseudo-code description.
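The four-call recursion described above can be sketched in Python as follows. This is a toy model under our own assumptions rather than the procedure of Fig. 3: `step` is a hypothetical callback standing in for the base case (one exchanged message), views are modelled as tuples, the repository as a frozenset of (view, message) pairs, and fake-witness extraction and the ⊥ output are omitted entirely.

```python
import random

def lazy_rewind(t, view, repo, rng, step):
    """Toy model of the rewinding schedule for t messages (t a power of 2).

    Returns the extended view and the updated repository.
    """
    if t == 1:
        msg = step(view, repo, rng)            # one message exchange
        return view + (msg,), repo | {(view, msg)}
    half = t // 2
    # First pair of sibling calls: identical inputs, fresh randomness.
    v1, r1 = lazy_rewind(half, view, repo, rng, step)
    v2, r2 = lazy_rewind(half, view, repo, rng, step)
    merged = r1 | r2                           # merged only after both siblings ran
    # Second pair of sibling calls: both continue from the view v1.
    v3, r3 = lazy_rewind(half, v1, merged, rng, step)
    v4, r4 = lazy_rewind(half, v1, merged, rng, step)
    return v3, r3 | r4

def random_message(view, repo, rng):
    """Hypothetical stand-in for a prover/verifier exchange."""
    return rng.getrandbits(8)

# Top-level call on T = 8 messages, empty view and empty repository.
out_view, out_repo = lazy_rewind(8, (), frozenset(), random.Random(0), random_message)
```

Because the shared `rng` advances between calls, sibling calls receive identical inputs but consume different segments of the random tape, which is the only way in which they differ.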

Fig. 2. A pictorial representation of the rewinding schedule of lazy-rewind. The boxes represent blocks, and the lines represent threads. If this is the top level call (i.e., lazy-rewind(T,∅,∅)), then the thicker thread is the output thread, whose view is the output of Sim.

Fig. 3. The recursive procedure used by Sim: the “lazy” KP simulator.

Let us describe some terminology that is useful for the analysis of Sim and lazy-rewind. Because Sim follows an oblivious rewinding schedule, it always makes a fixed set of calls to lazy-rewind at fixed moments in the simulation, and it always “connects” these calls of lazy-rewind in a fixed way to generate partial views of V*. Intuitively, a thread is one of these fixed connections.

Definition 5

(Threads)

A thread is a sequence of 0’s and 1’s; from the beginning of the simulation, this sequence specifies, whenever a pair of sibling calls are encountered, whether to follow the first or second sibling call of lazy-rewind, respectively. (A sequence may terminate prematurely to specify a “partial” thread.) The thread 00…0 (of sufficient length) is the thread that follows the first sibling calls to the end of the simulation, and is called the output thread because the view of V* generated on this thread is the output of Sim.

Given an execution of Sim (on an input x ∈ L and a random tape), a block intuitively refers to the “location” (in the static rewinding schedule) of a call to lazy-rewind, as well as the actual simulation performed by the call.

Definition 6

(Blocks)

Given an execution of Sim, a block B is a pair \(B=(B_{\mathsf{loc}}, B_{\mathsf{content}})\), where \(B_{\mathsf{loc}}\) specifies the location of a call of lazy-rewind and \(B_{\mathsf{content}}\) specifies the inputs and randomness of the same call. Formally \(B_{\mathsf{loc}}\) is a partial thread (that leads to and includes the lazy-rewind call), and \(B_{\mathsf{content}}\) is just the inputs and random tape used by the lazy-rewind call, i.e., \((t, \mathcal {V}, \mathcal {T}, r)\). We say a block C is contained in block B if the recursive call of lazy-rewind corresponding to block C is nested inside the recursive call of lazy-rewind corresponding to block B.

Due to the recursive nature of lazy-rewind, every block would contain four “smaller” blocks; of these four blocks, we call the first pair (resp. the second pair) sibling blocks, as they correspond to sibling calls of lazy-rewind. Finally, we say a block contains a thread if the thread “passes through” the block.

Definition 7

(Threads in a block)

Given an execution of Sim, we say a block B contains a thread h if \(B_{\mathsf{loc}}\) is a prefix of h.
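The thread and block-location definitions lend themselves to a small sketch. The encoding below is our own assumption: threads and block locations are Python strings over "0" and "1", and the helper names are hypothetical.

```python
def contains_thread(block_loc: str, thread: str) -> bool:
    """A block contains a thread iff the block's location is a prefix of it."""
    return thread.startswith(block_loc)

def sibling_loc(block_loc: str) -> str:
    """Location of the sibling call: same path, last choice flipped.

    Assumes a non-empty location, i.e., not the top-level call.
    """
    flipped = "1" if block_loc[-1] == "0" else "0"
    return block_loc[:-1] + flipped

def on_output_thread(block_loc: str) -> bool:
    """The output thread 00...0 always follows the first sibling call."""
    return "1" not in block_loc
```

For instance, under this encoding the block at location "01" contains the thread "0100" but not the thread "0010", and its sibling sits at location "00".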

Since lazy-rewind does not update the message repository \(\mathcal {T}\) between sibling recursive calls (sibling blocks), we call it lazy. This departs from previous works such as [19] and [27], and is crucial for our analysis. We have also changed how blocks are threaded together from [19] and [27]. In lazy-rewind, the second pair of recursive calls are continued from the first recursive call of the first pair (i.e. continued from view \(\mathcal {V}_{1}\)). This is similar to the precise simulation of [22] and [24]. This choice is inconsequential for our analysis, but will be useful later when we discuss precision in Sect. 3.6; [19] and [27], in contrast, continue the recursive calls from the view \(\mathcal {V}_{2}\). See Fig. 2 for an illustration of blocks, threads and siblings in an execution of lazy-rewind.

3.3 Proof Overview

In order to prove the correctness of the simulation, we need to show that for every adversarial verifier V*, the simulator runs in polynomial time and the output distribution is “correct.” The running time of Sim can be bounded just as in [19] and [27]. Sim spends a maximum of poly(n) time on responding to each verifier message. It follows from the recursive structure of the simulator that the number of messages exchanged is doubled for each level of the recursion; since we have a recursive depth of \(\log_{2} T\), the running time of the simulator is bounded by \(\operatorname {poly}(n) \cdot T \cdot 2^{\log_{2} T} = \operatorname {poly}(n) \cdot T^{2} = \operatorname {poly}(n)\).
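As a sanity check on this bound, the number of single-message (base-case) calls can be counted with a short recurrence. Here C(t) is our own notation for the number of base-case calls made by lazy-rewind on t messages, and T is assumed to be a power of two; each call on t messages makes four recursive calls on t/2 messages.

$$C(1) = 1, \qquad C(t) = 4\,C(t/2) \quad\Longrightarrow\quad C(T) = 4^{\log_{2} T} = T^{2} = T \cdot 2^{\log_{2} T}. $$

Multiplying by the poly(n) cost per message recovers the stated bound.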

Intuitively, the correctness of the output view follows from the fact that Sim chooses Stage Init and Stage 1 messages honestly, and that the protocol used in Stage 2 is witness indistinguishable (this requires a proof later since Sim performs rewinds). Therefore, as long as Sim gets stuck (outputs ⊥) with negligible probability, taken over the random tapes of Sim (the random tape of V* is fixed during black-box simulation), the output distribution is correct. Towards this goal we will show that the probability of getting stuck at any point in the simulation is negligible.

Recall that Sim can only get stuck on a particular thread when the simulation reaches the end of some session and could not extract the fake witness. Following the approach of [27], we show that the probability of getting stuck on any session and any thread is negligible. Since there are only polynomially many sessions and threads, the main theorem follows by the union bound.

Fix any thread h and session i; from now on we refer to them as the “main” thread and the “main” session, and call all other threads and sessions “auxiliary.” We say a random tape of Sim is bad if Sim gets stuck at the end of main session i on the main thread h; all other random tapes are called good (including those that got stuck on an auxiliary session or thread). The high-level idea, just like in [27], is to show that for every bad random tape, there exist super-polynomially many good random tapes. Furthermore, the good tapes corresponding to any two bad tapes are disjoint. Hence the probability of a tape being bad is negligible. From here on, start and end refer to those on the main session and thread unless otherwise noted.

Here is how we generate good random tapes from bad ones. Recall that on a bad tape, the simulator reaches end without extracting a “fake witness.” Hence, all slots on the main thread are convincing (or else we would never reach end), but no corresponding convincing slots are on an auxiliary thread prior to end (since otherwise Sim would have extracted a witness). Intuitively, to generate a good tape from a bad one we just need to “swap” a convincing slot from the main thread into an auxiliary thread. After the swapping, should the simulation reach end of the main session on the main thread, the newly formed convincing slot on the auxiliary thread, together with the corresponding convincing slot on the main thread, will allow Sim to compute the fake witness. Hence the simulation may continue on without getting stuck. So far we have not deviated from the analysis of [27].

To actually “swap” convincing slots, we modify the random tape of Sim. The basic operation that we perform on the random tape is to exchange the randomness used by sibling blocks (i.e., the segments of the random tape used to simulate these blocks). Since sibling blocks are identical modulo randomness, swapping the random tape between siblings swaps the simulation result in the two blocks exactly. (In the rest of the paper, we use the convention that after swapping a block B with its sibling B′, the “new block B” refers to the block in the old location of B′ with the same content as the “old block B,” i.e., \((B'_{\mathsf{loc}}, B_{\mathsf{content}})\).) Note that this “exact swap” property is made possible by the lazy nature of Sim; the same property does not hold for the KP simulator where the second sibling benefits from fake witnesses extracted during the execution of the first sibling.
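The “exact swap” property can be illustrated with a toy model. Everything below is an assumption made for illustration: a block is reduced to a single function of its inputs and a seed (standing in for its segment of the random tape), and the dependence on `len(repo)` is an artificial stand-in for a block benefiting from information in the repository.

```python
import random

def simulate_block(view, repo, seed):
    """One toy block: the output depends only on the inputs and on the
    block's own randomness (modelled here by `seed`)."""
    rng = random.Random(seed)
    return view + (rng.getrandbits(16) ^ len(repo),)

def lazy_siblings(view, repo, seeds):
    # Both siblings see the same repository (no update in between).
    return (simulate_block(view, repo, seeds[0]),
            simulate_block(view, repo, seeds[1]))

def eager_siblings(view, repo, seeds):
    # Contrast case: the second sibling sees what the first one produced.
    first = simulate_block(view, repo, seeds[0])
    second = simulate_block(view, repo | {first}, seeds[1])
    return first, second

a, b = lazy_siblings((), frozenset(), (1, 2))
c, d = eager_siblings((), frozenset(), (1, 2))
```

In the lazy variant, exchanging the two seeds exchanges the two results exactly, so `lazy_siblings((), frozenset(), (2, 1))` equals `(b, a)`. In the eager variant the second block's inputs depend on the first block's output, so the same exchange does not simply yield `(d, c)`.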

Intuitively, we call a block on the main thread composable if it satisfies the following properties:

Goodness:

Swapping a composable block with its sibling produces a good random tape.

Composability:

The above swap leaves other composable blocks on the main thread composable.

Reversibility:

Given the random tape obtained after swapping a composable block, there is a procedure undo that reverses the swap. This ensures that the resulting good tape is unique.

Consider K composable blocks with an ordering such that each swap will leave the successive composable blocks still composable. Then, we can generate \(2^{K}-1\) good random tapes by choosing to swap each block or not in the ordering. By a simple counting argument, we will show that for any bad tape, there are \(k-2\log_{2} T\) composable blocks with an ordering, therefore generating \(2^{k-2\log_{2} T}\) distinct good tapes. We then use the undo procedure to show that different bad tapes generate different good tapes. Thus, if \(k \in \omega(\log_{2} T)=\omega(\log n)\), the probability of having a bad tape is negligible.
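The final step can be spelled out as a short calculation. The set names Bad and Good are our own notation for the bad and good random tapes (for the fixed main session and thread). We use the count \(2^{k-2\log_{2} T}\) of good tapes per bad tape as stated above, together with the fact that distinct bad tapes yield disjoint sets of good tapes, so that \(|\mathrm{Good}| \ge 2^{k-2\log_{2} T} \cdot |\mathrm{Bad}|\).

$$\Pr[\text{tape is bad}] = \frac{|\mathrm{Bad}|}{|\mathrm{Bad}|+|\mathrm{Good}|} \le \frac{|\mathrm{Bad}|}{|\mathrm{Good}|} \le 2^{-(k-2\log_{2} T)} = \frac{T^{2}}{2^{k}}. $$

Since T is polynomial in n and \(k \in \omega(\log n)\), the quantity \(T^{2}/2^{k}\) is negligible in n.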

3.4 The Actual Proof

Formally, Sim may output ⊥ for two reasons. Firstly, it may reach end without encountering two matching convincing slots after the start of the session; we call this a rewinding failure. Secondly, Sim may not be able to compute a fake witness even though it has access to matching special-sound transcripts; we call this a special-sound failure. Special-sound failures are easy to upper bound; see Claim 8. As mentioned, the main part of the proof is bounding the probability of rewinding failures.

3.4.1 Composable Blocks

We first define the notion of composable blocks and show that they satisfy the three properties of goodness, composability and reversibility. Let us fix a particular main session and main thread, and formally define a random tape to be bad if Sim encounters a rewinding failure in the main session on the main thread; otherwise a random tape is good. From here on, start and end refer to those of the main session and main thread, unless otherwise noted.

Definition 8

(Composable block)

Consider an execution of Sim with any random tape (not necessarily bad). A block B, with sibling B′, is called a composable block (with respect to the main thread and session), if it satisfies the following conditions:

Main block condition:

B contains the main thread h, a convincing slot of the main session (not necessarily on the main thread) and does not contain start (of the main session on the main thread). The last condition is equivalent to saying that the prefix of B contains start.

Sibling condition:

B′ does not contain any end (of the main session on the main thread).

Tracing condition:

The simulation after start but before B contains only convincing slots on the main thread h, and contains no convincing slots on the auxiliary threads.

As we will soon see, the Main block condition and the Sibling condition imply goodness and composability, while the Tracing condition enables the undo procedure, which implies reversibility. We also define an ordering relation > on composable blocks.

Definition 9

Let C and B be two blocks on a common thread. We write C>B iff

  • C and B are disjoint, and C occurs before B (Case 1 in Fig. 4), or

  • C and B are not disjoint, and C is a larger block that contains B (Case 2 in Fig. 4).

Fig. 4. Two possible block diagrams after the swapping procedure in Claim 2 (B and B′ are swapped). The main thread is shown in a thick line, and a composable block C>B, drawn with dashed lines, is shown in two possible configurations.

Note that given two blocks on the same thread, if they are not disjoint, then one must contain the other. Thus > is a total order on any set of blocks that share a common thread.
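The ordering of Definition 9 can be checked on a small example. The sketch below assumes, as a simplification, that the blocks sharing a thread are the half-open intervals obtained by recursively halving the message range [0, 8); the helper names `precedes` and `dyadic_blocks` are hypothetical.

```python
from functools import cmp_to_key

def precedes(c, b):
    """Return True iff C > B in the sense of Definition 9."""
    (c_lo, c_hi), (b_lo, b_hi) = c, b
    disjoint_and_earlier = c_hi <= b_lo
    strictly_contains = c_lo <= b_lo and b_hi <= c_hi and c != b
    return disjoint_and_earlier or strictly_contains

def dyadic_blocks(lo, hi):
    """All blocks obtained by recursively halving [lo, hi), outermost first."""
    blocks = [(lo, hi)]
    if hi - lo > 1:
        mid = (lo + hi) // 2
        blocks += dyadic_blocks(lo, mid) + dyadic_blocks(mid, hi)
    return blocks

def compare(x, y):
    """Comparator placing x first whenever x > y under Definition 9."""
    if precedes(x, y):
        return -1
    if precedes(y, x):
        return 1
    return 0

blocks = dyadic_blocks(0, 8)
ordered = sorted(reversed(blocks), key=cmp_to_key(compare))
```

Under this model every pair of distinct blocks is comparable in exactly one direction, and sorting by > lists each block before everything it contains and before every later disjoint block.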

Finally, we define a deterministic undo function on random tapes in order to achieve reversibility:

  • Given a random tape τ′, execute lazy-rewind with the tape τ′. Call a block that does not contain the main thread special if it contains a convincing slot of the main session.

  • Let D be the first special block after start; that is, any other special block E after start satisfies D>E. Swap the parts of τ′ used by D and its sibling, and output the new random tape.

Claim 2

Let τ be a random tape (not necessarily bad). Let B be a composable block with sibling B′ when lazy-rewind is executed with random tape τ, and let \(\mathcal {V}\) be the common prefix of B and B′. Furthermore, let τ′ be the random tape obtained after swapping the blocks B and B′. Then:

  1. [Goodness]: τ′ is a good random tape.

  2. [Composability]: Any composable block C on τ with C>B is still composable on τ′.

  3. [Reversibility]: undo(τ′)=τ.

Proof

Recall that after the swapping, blocks B and B′ are exchanged in the simulation.

Goodness. When lazy-rewind is executed with τ′, B′ will now be on the main thread (see Fig. 4). Recall that B′ does not contain any end of the main session (sibling condition). Thus, if the end of the main session ever occurs on the main thread, it will occur after both B and B′ are executed. In that case, the convincing slot in B (which is now in an auxiliary thread) and the corresponding convincing slot on the main thread (which must be there before end occurs) together form a matching pair of convincing slots that occurs after start. Moreover, this pair of convincing slots occurs before end. Thus τ′ is a good tape.

Composability. Given a composable block C>B with sibling C′ on τ, we have two cases as shown in Fig. 4. In case 1, when C is disjoint from B, the swapping of B and B′ does not change the simulation inside C, C′, and between start and C. Respectively, this leaves the main block condition, sibling condition, and tracing condition of C intact on τ′. On the other hand, in case 2 where C contains B, the swapping of B and B′ again leaves the simulation inside C′ and between start and C unchanged, keeping the sibling condition and tracing condition intact. In addition, since C still contains B under τ′, and B in turn contains a convincing slot, the main block condition still holds as well (other parts of C may have changed). In both cases, C continues to be a composable block on τ′.

Reversibility. Finally, we need to show that undo(τ′)=τ. After the swap (executing with random tape τ′), block B no longer contains the main thread and contains a convincing slot; it is therefore a special block. Next we show that any block C>B is not special. Either C occurs strictly before B, or it contains B (in this case C also contains B′). In the first case, block C is unchanged during the swap, and therefore is not special because it does not contain a convincing slot (tracing condition). In the second case, since C contains B′ and therefore the main thread, it is not special. Thus, undo will always locate B as the first special block and perform the correct inverse swapping to recover τ (see Footnote 3). □

The next claim demonstrates how to compose multiple composable blocks.

Claim 3

Let τ be a bad random tape, \(\mathcal{B} = \{B_{1},\ldots,B_{p}\}\) be a set of composable blocks for τ. Then, we can generate a set of good random tapes, \(S(\tau,\mathcal{B})\), by swapping the various composable blocks in  \(\mathcal{B}\), so that the following holds:

  1. \(|S(\tau,\mathcal{B})| \geq 2^{p} - 1\).

  2. For any bad tape τ′≠τ and any set of composable blocks \(\mathcal{B}'\) for τ′, \(S(\tau,\mathcal{B}) \cap S(\tau',\mathcal{B}') = \emptyset\).

Proof

Since all composable blocks lie on the main thread, there is a total ordering of the blocks. Without loss of generality, let \(B_{1} > B_{2} > \cdots > B_{p}\). Consider any non-empty subsequence of 1,…,p, say \(u_{1},\ldots,u_{q}\). There are \(2^{p}-1\) such subsequences. Let \(\tau_{u_{1} \ldots u_{q}}\) be the random tape obtained from τ by swapping each block \(B_{u_{i}}\) with its sibling, in the order i=q,q−1,…,1.

From Claim 2 it follows that \(\tau_{u_{1} \ldots u_{q}}\) is a good random tape. We further note that given \(\tau_{u_{1} \ldots u_{q}}\), we can recover the blocks \(B_{u_{1}}, \ldots, B_{u_{q}}\) by repeatedly applying undo until we reach a bad tape (it will always be τ). Therefore, given two different subsequences \(u_{1},\ldots,u_{q}\) and \(v_{1},\ldots,v_{q'}\), we must have \(\tau_{u_{1} \ldots u_{q}} \ne \tau_{v_{1} \ldots v_{q'}}\) in order for undo to recover a different set of swapped blocks. Thus, we obtain \(2^{p}-1\) distinct good random tapes.

Similarly, take any \(\alpha \in S(\tau, \mathcal{B})\) and \(\beta \in S(\tau', \mathcal{B}')\) (good tapes produced by swapping from τ and τ′, respectively). Applying undo repeatedly on α until the result is a bad tape will result in τ, while applying the same procedure on β will give τ′. If τ≠τ′, then we must have α≠β. □
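The enumeration in the proof can be illustrated with a deliberately simplified model. The sketch below assumes that the tape is a tuple of pairwise distinct segments and that the p composable blocks and their siblings are disjoint index pairs; it ignores nesting of blocks, and distinctness here follows from the segments being distinct rather than from the undo argument. The names `swap` and `generated_tapes` are hypothetical.

```python
from itertools import combinations

def swap(tape, pair):
    """Exchange the tape segments used by a block and its sibling."""
    i, j = pair
    out = list(tape)
    out[i], out[j] = out[j], out[i]
    return tuple(out)

def generated_tapes(tape, pairs):
    """Tapes obtained by swapping every non-empty subset of sibling pairs."""
    results = set()
    for size in range(1, len(pairs) + 1):
        for subset in combinations(pairs, size):
            current = tape
            # Swap in the order i = q, q-1, ..., 1, as in the proof.
            for pair in reversed(subset):
                current = swap(current, pair)
            results.add(current)
    return results

tape = ("r0", "r1", "r2", "r3", "r4", "r5")
pairs = [(0, 1), (2, 3), (4, 5)]  # p = 3 sibling pairs
tapes = generated_tapes(tape, pairs)
```

With p = 3 the enumeration yields 2^3 − 1 = 7 tapes, none of which equals the starting tape.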

Corollary 4

Suppose every bad random tape has p composable blocks. Then, the probability of a random tape being bad is at most \(1/2^{p}\).
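The counting behind the corollary can be spelled out as follows; this is a sketch in which \(\mathcal{R}\) denotes the (finite) set of all random tapes of the length used by Sim and \(\mathsf{Bad} \subseteq \mathcal{R}\) the set of bad tapes, notation introduced here only for illustration. By Claim 3, each bad tape τ yields at least \(2^{p}-1\) good tapes, and the sets obtained from different bad tapes are disjoint, so

$$|\mathcal{R}| \;\geq\; |\mathsf{Bad}| + \sum_{\tau \in \mathsf{Bad}} |S(\tau,\mathcal{B}_{\tau})| \;\geq\; |\mathsf{Bad}| + |\mathsf{Bad}| \cdot \bigl(2^{p}-1\bigr) \;=\; 2^{p} \cdot |\mathsf{Bad}|, \qquad\text{hence}\qquad \Pr[\tau \in \mathsf{Bad}] = \frac{|\mathsf{Bad}|}{|\mathcal{R}|} \leq \frac{1}{2^{p}}. $$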

3.4.2 Number of Composable Blocks

We now proceed to count the number of composable blocks. First we introduce the notion of minimal containing blocks (this is identical to minimal rewinding intervals as defined by [27]). For each slot, its minimal containing block is the minimal block on the main thread that contains the slot. Claims 5 and 6 below together show that there are at least \(k-2\log T\) composable blocks when we run Sim with a bad tape. Claim 5, which counts the number of minimal containing blocks, is identical to that of [27]; we include it here for completeness.

Claim 5

In an execution of Sim with a bad random tape, there are at least k minimal containing blocks.

Proof

As observed earlier, on a bad tape there will be k convincing slots of the main session on the main thread (in order to reach end). We merely need to show that for each slot, its respective minimal containing blocks are distinct. Suppose that two slots share the same minimal containing block of length t. Since slots on the same thread are disjoint, we reach a contradiction as one of the slots must be properly contained in one of the two smaller blocks of size t/2. □
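The distinctness argument can be checked on a small instance. The sketch below assumes a simplified model in which the main thread consists of T = 16 message positions, blocks are the half-open intervals produced by recursive halving, and a slot is given by the positions of its first and last message with first < last; the function name `minimal_containing_block` and the sample slots are hypothetical.

```python
def minimal_containing_block(slot, lo, hi):
    """Smallest block of the halving recursion on [lo, hi) containing slot.

    slot is a pair (first, last) of message positions with first < last.
    """
    first, last = slot
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if last < mid:
            hi = mid          # slot lies entirely in the first half
        elif first >= mid:
            lo = mid          # slot lies entirely in the second half
        else:
            break             # slot straddles the midpoint
    return (lo, hi)

T = 16
slots = [(1, 2), (3, 5), (6, 7), (9, 12), (13, 14)]  # pairwise disjoint
minimal_blocks = [minimal_containing_block(s, 0, T) for s in slots]
```

A slot stops descending exactly when it straddles the midpoint of its block, and two disjoint slots cannot both straddle the same midpoint, which is why the resulting blocks are pairwise distinct.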

Claim 6

Consider an execution of Sim with a bad random tape τ. If there are k′ minimal containing blocks, then there are at least \(k'-2\log T\) composable blocks.

Proof

Let B be a minimal containing block that does not contain start or end. Since start (or end) can only be in at most \(\log T\) different blocks on the main thread (since that is the recursion depth), we conclude that there are at least \(k'-2\log T\) such blocks (see Footnote 4). It remains to show that B is a composable block. Let B′ be the sibling of B.

The main block condition of composable blocks follows directly, while the tracing condition on the main thread actually holds for the whole simulation from start to end, since τ is a bad random tape. Thus, we only need to show that the sibling condition is satisfied, i.e. B′ does not contain end. Assume to the contrary that B′ does contain end. Since B and B′ are siblings with a common starting point and B contains a slot of the main session, B′ must contain that same slot in a convincing manner in order to reach end. On the other hand, B does not contain end. Thus B′ will be executed before the main thread reaches end (if at all), and this convincing slot will allow Sim to compute the witness of the main session by the same argument in Claim 2. This contradicts the fact that τ is a bad tape. □

3.4.3 Concluding the Proof

We first show that Sim gets stuck with negligible probability, and then use it in Claim 9 to conclude that the output distribution of \({\textsf {Sim}}^{V^{*}}\) is computationally (resp. statistically) indistinguishable from the real view of \(V^{*}\).

Claim 7

Sim encounters rewinding failures with negligible probability.

Proof

As mentioned before, since there are only polynomially many sessions and threads, it suffices to show that the probability of the simulator getting stuck on any fixed thread and session is negligible. The union bound then shows that Sim overall gets stuck with negligible probability.

For any fixed thread and session, combining Claims 3, 5 and 6 shows that a random tape is bad with probability at most

$$\frac{1}{2^{k - 2 \log T}} $$

This is negligible in n since T is polynomial in n and k=ω(logn). □

Claim 8

Sim encounters special-sound failures with negligible probability.

Proof

Suppose for the sake of contradiction that Sim encounters special-sound failures with non-negligible probability. Consider an unbounded adversarial prover P that forwards the prover messages of the prefix of the SS-POK (Stage Init), in a random session and random thread from an execution of \({\textsf {Sim}}^{V^{*}}\), to an outside honest verifier \(V_{SS}\) of the SS-POK (essentially we are forwarding messages between \(V^{*}\), who acts as the prover of the SS-POK in Stage Init, and the outside honest verifier \(V_{SS}\)). Since an execution of \({\textsf {Sim}}^{V^{*}}\) only has polynomially many instances of Stage Init, P would contradict the special-soundness property with non-negligible probability (i.e., produce a prefix of the SS-POK where it is possible for the witness-computing procedure to fail, even when supplied with two different completions of the prefix).

The actual “forwarding” procedure of P has a subtlety due to the rewinding nature of Sim. In the middle of forwarding the prefix of an SS-POK from \(V^{*}\) to \(V_{SS}\), Sim may decide to rewind \(V^{*}\) partially to an earlier point in the SS-POK proof. In a naive forwarding scheme, this would require \(V_{SS}\) to be rewound as well to generate fresh verifier messages (which cannot be done). Fortunately, since the prefix of our SS protocol has only two messages (non-interactive), such a rewinding cannot occur. □

Claim 9

If the argument of knowledge in Stage 2 is WI (resp. statistical WI), then the ensembles \(\{ \operatorname {View}_{V^{*}}^{P}(x,z)\}_{x \in L, z \in \{0, 1\}^{*}}\) and \(\{{\textsf {Sim}}(x,z)\}_{x \in L, z \in \{0, 1\}^{*}}\) are computationally (resp. statistically) indistinguishable (as a function of |x|).

Proof

We consider polynomially many intermediate hybrids \({\textsf {Sim}}_{i}\), \(0 \leq i \leq m+1\), that receive the real witnesses to the statements \(x_{1},\ldots,x_{m}\). \({\textsf {Sim}}_{i}\) proceeds as Sim until the ith Stage 2 proof on the output thread, after which \({\textsf {Sim}}_{i}\) continues in a straight-line simulation with \(V^{*}\) using the real witnesses for Stage 2 proofs. (We note that a similar type of “cut-off” simulator was used in [1].) \({\textsf {Sim}}_{i}\) will output ⊥, however, should Sim encounter a rewinding or special-sound failure during the ith proof. Clearly, \({\textsf {Sim}}_{0}\) generates \(\operatorname {View}_{V^{*}}^{P}(x,z)\) and \({\textsf {Sim}}_{m+1}\) generates Sim(x,z). Thus, it is enough to show that for all i, the outputs of \({\textsf {Sim}}_{i}\) and \({\textsf {Sim}}_{i+1}\) are computationally (resp. statistically) indistinguishable.

We introduce yet another hybrid \({\textsf {Sim}}'_{i}\) that proceeds as \({\textsf {Sim}}_{i}\) except that it utilizes the extracted fake witness for the ith proof. \({\textsf {Sim}}'_{i}\) and \({\textsf {Sim}}_{i+1}\) differ only in that \({\textsf {Sim}}_{i+1}\) has more chances to output ⊥. But by Claims 7 and 8, both output ⊥ with negligible probability, and therefore they are statistically close. On the other hand, \({\textsf {Sim}}_{i}\) and \({\textsf {Sim}}'_{i}\) differ only in the ith proof, which both simulators run in a straight-line fashion, without rewinds. Therefore they are computationally (resp. statistically) indistinguishable by the WI property of the Stage 2 proof. Thus, the outputs of \({\textsf {Sim}}_{i}\) and \({\textsf {Sim}}_{i+1}\) are indeed computationally (resp. statistically) indistinguishable. □

Claim 9 completes the proof of Theorem 1.

Remark

Since we have shown that our lazy simulator is a concurrent zero-knowledge simulator, it follows directly that the KP simulator is also a concurrent zero-knowledge simulator: because the KP simulator receives more information than the lazy simulator at any point during the simulation (i.e. a bigger history repository h), the probability that the KP simulator outputs ⊥ is no more than the probability that the lazy simulator outputs ⊥. Thus, the same argument presented in Claim 9 can be applied also to the KP simulator.

3.5 Improving the Running Time of the Simulator

A faster simulator gives tighter “knowledge security” [12, 14]. In this section, we bound the running time of Sim by bounding the number of queries that Sim makes to \(V^{*}\). Recall that T is the maximum number of queries needed in a straight-line execution of \(V^{*}\). This means the recursive depth of lazy-rewind(T,⋅,⋅) invoked by Sim is at most \(\log_{2} T\). Since lazy-rewind doubles the number of queries per recursive depth, Sim makes at most \(T 2^{\log_{2} T} = T^{2}\) queries.

We can reduce the number of queries by following the approach of [24]. Currently, when lazy-rewind is asked to simulate t messages (a block of size t), it divides the t messages into two halves and recursively calls itself on each half (creating blocks of size t/2). One approach would be for lazy-rewind to divide each block into smaller sub-blocks, thus reducing the recursive depth. For example, an illustration of dividing a block into 3 equal parts is shown in Fig. 5 (this means lazy-rewind(t,⋅,⋅) would make 6 recursive calls of the form lazy-rewind(t/3,⋅,⋅)). Suppose we divide each block into g equal-sized sub-blocks; we call g the splitting factor. Then it immediately follows that the recursive depth of lazy-rewind(T,⋅,⋅) becomes \(\log_{g} T\), and the number of queries made by Sim is reduced to at most \(T2^{\log_{g} T}\).

Fig. 5. A pictorial representation of a rewinding schedule with splitting factor 3.
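The query bound can be checked by unrolling the recursion. The sketch below assumes that T is an exact power of the splitting factor g, that a block of size 1 costs one query, and that a block of size t > 1 triggers 2g recursive calls on blocks of size t/g; the function names `count_queries` and `recursion_depth` are hypothetical and only mirror the counting argument, not the simulator itself.

```python
def count_queries(t, g):
    """Queries made by a lazy-rewind-style recursion on a block of size t.

    Assumes t is a power of g. A block of size 1 costs one query; a larger
    block is split into g sub-blocks and each sub-block is simulated twice.
    """
    if t == 1:
        return 1
    return 2 * g * count_queries(t // g, g)

def recursion_depth(t, g):
    """Number of times t can be divided by g before reaching 1 (log_g t)."""
    depth = 0
    while t > 1:
        t //= g
        depth += 1
    return depth

T = 4096  # a power of 2, 4, 8, 16 and 64
table = {g: (recursion_depth(T, g), count_queries(T, g)) for g in (2, 4, 8, 16, 64)}
```

For T = 4096, splitting factor 2 gives depth 12 and T² = 16777216 queries, while splitting factor 64 gives depth 2 and only 4T = 16384 queries, matching the closed form T·2^(log_g T).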

Now that Sim makes fewer queries overall, can it still successfully generate a view of \(V^{*}\)? It is easy to see that the combinatorial properties of composable blocks do not change with this generalized rewinding scheme. Therefore, we only need to count the number of composable blocks in this new rewinding schedule to bound Sim’s failure probability. As in Sect. 3.4.2, we start by counting the number of minimal containing blocks. The following two claims mirror Claims 5 and 6:

Claim 10

In an execution of Sim with splitting factor g on a bad random tape, there are at least k/(g−1) minimal containing blocks.

Proof

Recall that in an execution with a bad tape, there are k convincing slots of the main session on the main thread (in order to reach end). Since each slot has its respective minimal containing block, and any block can be the minimal containing block for at most g−1 slots (by the pigeonhole principle), there are at least k/(g−1) minimal containing blocks. □

Claim 11

Consider an execution of Sim with splitting factor g on a bad random tape τ. If there are k′ minimal containing blocks, then there are at least \(k'-2\log_{g} T\) composable blocks.

Proof

As shown in Claim 6, it still holds that any minimal containing block that does not contain start or end is a composable block. Since start (or end) can only be in at most \(\log_{g} T\) different blocks on the main thread (since that is the recursion depth), we conclude that there are at least \(k'-2\log_{g} T\) composable blocks. □

Following the claims in Sect. 3.4.3, we conclude that Sim with splitting factor g is still a valid zero-knowledge simulator as long as

$$\frac{1}{2^{k/(g-1) - 2\log_g T}} $$

is negligible in the security parameter n; this holds whenever \(k/g \in \omega(\log n)\). In particular, for any ε>0, if we set \(g=2^{1/\varepsilon}\) and k=ω(log n), then protocol ConcZKArg remains secure with ω(log n) rounds, and Sim makes at most \(T^{1+\varepsilon}\) queries to \(V^{*}\), where T is the maximum number of queries needed in a straight-line interaction with \(V^{*}\).
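For completeness, the query count for this choice of splitting factor follows from a change of base in the logarithm. With \(g = 2^{1/\varepsilon}\) we have \(\log_{2} g = 1/\varepsilon\), so

$$\log_{g} T = \frac{\log_{2} T}{\log_{2} g} = \varepsilon \log_{2} T, \qquad T \cdot 2^{\log_{g} T} = T \cdot 2^{\varepsilon \log_{2} T} = T \cdot T^{\varepsilon} = T^{1+\varepsilon}. $$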

3.6 Achieving a Precise Simulation

Precise zero-knowledge, introduced by Micali and Pass [22], considers an “execution-by-execution” notion of knowledge-tightness; it requires that the view of any adversarial verifier \(V^{*}\) be simulated in time closely related to the actual running time (as opposed to the worst-case running time) spent by \(V^{*}\) in the view generated. More formally, a zero-knowledge simulator has precision p(t) if for all views \(\mathcal {V}\) of \(V^{*}\) generated by the simulator, if \(V^{*}\) takes time t in the view \(\mathcal {V}\), then the simulator must have generated the view in time p(t). See [22] for more detailed discussions and definitions.

Pandey et al. [24] recently extended precise ZK to the concurrent setting. The crux of [24] is a slightly modified KP simulator:

  1. The KP simulator is modified to obliviously rewind based on time, i.e., the number of Turing machine steps taken by \(V^{*}\), instead of the number of messages sent by \(V^{*}\).

  2. The KP simulator is modified to output the view of \(V^{*}\) on the “first” thread that it simulates (i.e., outputs the view in the first sibling block instead of the second).

In fact, the second modification could not be directly analyzed with the techniques of [24] (based on the PRS analysis). Instead, [24] asks that sibling blocks be simulated in parallel (instead of one after another); this requires subtle modifications to the PRS analysis, and the addition of a doubling trick to guess the running time of \(V^{*}\) so that the simulator knows how many recursive levels to simulate in parallel.

Looking at the lazy KP simulator, it already outputs the view of \(V^{*}\) on the “first” simulation thread. Therefore, to make the lazy KP simulator precise, we only need to modify it to rewind based on time. In other words, simply let lazy-rewind(t,⋅,⋅) simulate \(V^{*}\) for t Turing machine steps instead of t messages. The observations in Sect. 3.5 then allow us to obtain (and expand to arguments of knowledge) the main theorems of [24], namely (see Footnote 5):

Theorem 12

For any integer functions k and g satisfying k(n)/g(n)∈ω(log n), there exists an O(k(n))-round concurrent zero-knowledge argument of knowledge for all of NP, based on one-way functions, with precision \(p(t) \in O(t 2^{\log_{g} t})\). In particular, for any ε>0, there exists an ω(log n)-round instantiation of the protocol with precision \(p(t) \in O(t^{1+\varepsilon})\).