1 Introduction

Computational Integrity. An unobserved party is often required to execute a program \({\mathbb {P}} \) on data x, using auxiliary data w. Yet, that party might benefit from misreporting the output y. For example:

  1. Individuals and companies may benefit financially from reporting lower tax payments; in this case \({\mathbb {P}} \) is the program that computes tax, x is the tax-relevant data (w is the empty string) and y is the resulting tax.

  2. Criminals may benefit if an innocent individual (or no individual) is prosecuted based on faulty crime-scene data analysis, and may corrupt law enforcement officials to reach this outcome. In this case \({\mathbb {P}} \) is the program that analyzes crime-scene data, x may contain the cryptographic hashes of (i) a criminal DNA database and (ii) DNA fingerprints taken from the crime scene, w is the preimage of (i) and (ii), and y would be the name of a suspect.

  3. Health-care and other insurance companies may benefit from mis-computing policy rates. In this case \({\mathbb {P}} \) may be a government-approved program that computes policy rates, x is the identifying number of a patient, w would be her medical history (including, perhaps, her DNA sequence) and y is the policy rate.

Naturally, correctness and integrity of the input data (x, w) are preliminary requirements for obtaining a correct output y; these inputs often arrive from third parties and can be digitally signed by them, hence changing (x, w) maliciously to \((x',w')\) would require their collusion. Instead, the main focus of this work is on ensuring the integrity of the computation \({\mathbb {P}} \) itself, e.g., ensuring that the reported tax y is correct with respect to the explicit input x, program \({\mathbb {P}} \) and some auxiliary input w. In spite of incentives to cheat, we often assume that unobserved parties operate with computational integrity (CI), meaning that CI statements like

$$\begin{aligned} \tau _{({\mathbb {P}},x,y,T)}:=``\exists w \text{ such that } y = \text{ output of } {\mathbb {P}} \text{ on inputs } x, w \text{ after } T \text{ steps}'' \end{aligned}$$
(*)

are considered true, even when the party making the statement could benefit from replacing y with \(y'\ne y\). The assumption that parties operate with computational integrity is backed by (i) legislation and (ii) regulation, and also relies on (iii) the economic value of “integrity” to individuals, businesses and government. Manual enforcement of CI via audits and reports by trusted third parties is labor-intensive, and yet leaves the door open to corruption of those third parties. Automated CI based on cryptography (also called delegation of computation [43], certified computation [32] and verifiable computation [40]) could potentially replace this manual labor and, more importantly, introduce integrity to settings in which it is currently too costly to achieve.
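
To make the statement (*) concrete, the following toy sketch shows the naïve way to verify such a claim: re-execute \({\mathbb {P}} \) on (x, w) for T steps and compare the result with y. The program interface and the tax example are hypothetical illustrations, not part of any system described here; the proof systems discussed next aim to replace this T-step re-execution with verification that is polylogarithmic in T.

```python
def naive_verify(P, x, w, y, T):
    """Accept tau_(P,x,y,T) by brute force: re-run P, costing the verifier T steps."""
    return P(x, w, max_steps=T) == y

def tax_program(x, w, max_steps):  # hypothetical stand-in for the tax example above
    income, rate_percent = x
    return income * rate_percent // 100

assert naive_verify(tax_program, (50_000, 20), "", 10_000, T=1)
```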

Interactive Proof (IP) Systems. [5, 44] revolutionized cryptographic CI by initiating an approach that led (see below) to a viable theoretical solution to the problem of discovering false CI statements. In such systems the party that makes the CI statement (*) is represented by a prover, which is a (randomized) algorithm. The prover tries to convince a verifier—an efficient randomized algorithm—that (*) is true via a court-of-law-style interactive protocol in which the verifier “interrogates” the prover over several rounds of communication. The protocol ends with the verifier announcing its verdict, which is either to “accept” \(\tau _{({\mathbb {P}},x,y,T)}\) as true, or to “reject” it. The systems we focus on have only one-sided error: all true statements can be supported by a prover that causes the verifier to accept them, but the verifier may err and accept falsities; the probability of such an error is known as the soundness-error.

Probabilistically Checkable Proof (PCP) Systems. [1,2,3,4] are a particularly efficient kind of multi-prover interactive proof (MIP) system [8] in terms of the amount of communication between prover and verifier, verification time, the number of rounds of interaction and soundness-error. Assuming T is given in binary, the set of true CI statements (*) is an \(\mathbf{{NEXP}} \)-complete language and PCPs are powerful enough to prove membership in this language. Here, the prover writes once a string of bits \(\pi _{({\mathbb {P}},x,y,T)}\) known as a PCP; its length is polynomial in the execution time T. Total verifier running time is \(\mathrm{{poly}}\log T\), which is (i) negligible compared to the naïve solution of re-executing \({\mathbb {P}} \) at a cost of T steps and (ii) nearly optimal, because every proof system for general CI statements must have verifier running time at least \(\varOmega (\log T)\). Using a single round, the verifier asks to read a small (randomly selected) number of bits of \(\pi _{({\mathbb {P}},x,y,T)}\); clearly the verifier cannot read more bits than its running time (\(\mathrm{{poly}}\log T\)) allows, and this amount can be further reduced to a small constant that is independent of T (cf. [34, 49, 63, 66]). Initial constructions required proofs of length \(\mathrm{{poly}}(T)\) but length has been reduced since then [21, 24, 42, 48]; state-of-the-art proofs are of quasi-linear length in T, i.e., length \(T\cdot \mathrm{{poly}}\log T\) [20, 23, 34, 62], and can be computed in quasi-linear time as well [13]. The system reported here — called Scalable Computational Integrity (SCI) — implements the quasi-linear PCP system [13, 23] with certain improvements (described later).

In many cases the prover needs to preserve the privacy of the auxiliary input w (as is the case with examples 2, 3 above) while at the same time proving that it “knows” w, as opposed to merely proving that w exists. Privacy-preserving, or zero knowledge (ZK) proofs [44] and ZK proofs of knowledge [7] can be constructed from any PCP system in polynomial time [36, 55, 56] (cf. [52,53,54, 60]). Certain “algebraic” PCP systems, including SCI, can be converted to ZK proofs of knowledge with only a quasilinear increase in running time [11]; implementing this enhancement is left to future work.

A PCP verifier requires random access to bits of \(\pi _{({\mathbb {P}},x,y,T)}\); a naïve implementation in which the prover sends the whole proof to the verifier would cost \(\mathrm{{poly}}(T)\) communication (and verification time), but a collision-resistant hash function can be used to reduce communication and verifier running time to \(\mathrm{{poly}}\log T\) [55]. The three messages transmitted between prover and verifier ((1) prover sends proof; (2) verifier sends queries; (3) prover answers queries) can be reduced to a single message from the prover, if both parties have access to the same random function [61]; this can be realized using a standard cryptographic hash function such as SHA-3, via the Fiat-Shamir heuristic [38] (or via an extractable collision-resistant hash function [26]). The single message (published by the prover) is known as a succinct computationally sound (CS) proof \(\hat{\pi }\); its length is \(\mathrm{{poly}}\log T\) and it can be appended to \(\tau _{({\mathbb {P}},x,y,T)}\) and then publicly verified in time \(\mathrm{{poly}}\log T\) with no further interaction with the prover. We refer to \(\hat{\pi }\) as a hash-based (CI) proof to emphasize that the only cryptographic primitive needed to implement it is a hash function.
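
The following sketch illustrates the two ingredients just described: a Merkle-tree commitment that lets the verifier check individual proof positions against a short root, and Fiat-Shamir derivation of the query positions from that root so the prover can publish a single non-interactive message. The hash choice (SHA3-256 via hashlib), the toy proof contents and the query count are assumptions made for illustration; this is not SCI's code or parameters.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha3_256(b"".join(parts)).digest()

def merkle_tree(leaves):
    """Return the list of tree levels, from hashed leaves up to the root level."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def open_path(levels, idx):
    """Authentication path: the sibling hash at every level below the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[idx ^ 1])
        idx //= 2
    return path

def verify_path(root, leaf, idx, path):
    node = h(leaf)
    for sibling in path:
        node = h(node, sibling) if idx % 2 == 0 else h(sibling, node)
        idx //= 2
    return node == root

def fiat_shamir_queries(root, n_leaves, n_queries=3):
    """Derive query positions deterministically from the commitment (Fiat-Shamir)."""
    return [int.from_bytes(h(root, bytes([i])), "big") % n_leaves for i in range(n_queries)]

# Prover side: commit to a toy "PCP" of 8 symbols and answer the derived queries.
pcp = [bytes([b]) for b in b"PCPPROOF"]
levels = merkle_tree(pcp)
root = levels[-1][0]
queries = fiat_shamir_queries(root, len(pcp))
answers = [(q, pcp[q], open_path(levels, q)) for q in queries]

# Verifier side: re-derive the queries from the root and check each opening.
assert fiat_shamir_queries(root, len(pcp)) == queries
assert all(verify_path(root, leaf, q, path) for q, leaf, path in answers)
```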

Prior CI Solutions. In spite of the asymptotic efficiency of PCPs, prior CI approaches (recounted below) did not implement a PCP system. To quote from the recent survey [77], the reason for this was that “the proofs arising from the PCP theorem (despite asymptotic improvements) were so long and complicated that it would have taken thousands of years to generate and check them, and would have needed more storage bits than there are atoms in the universe”. Due to this view (which this work challenges), five main alternatives have been explored recently, described below. Like SCI, all rely on arithmetization [59], the reduction of computational integrity statements (*) to systems of low-degree polynomials over finite fields. But in contrast to SCI, all previous solutions circumvent the use of core PCP techniques like proof composition [2], low-degree testing and the use of PCPs of proximity (PCPP) [20, 35]; these techniques are crucial for obtaining succinctly verifiable proofs with a public setup process, which SCI is the first to implement.

  • IP-based: The proofs for muggles approach [43] scales down Interactive Proofs (IP) from \(\mathbf{{PSPACE}} \) to \(\mathbf{{P}} \) and leads to excellent solutions for a limited yet interesting class of programs: those with high parallelism and small memory consumption; prover time for IP-based systems was reduced to quasi-linear [33] and implemented in a number of works [32, 73, 75].

  • LPCP-based: [51] proposed using additively homomorphic encryption (AHE) and linear PCPs (LPCP) to build CI proof systems that are interactive, and where the verifier’s work is amortized over multiple statements; cf. [69, 71, 72] for implementations of LPCP-based systems.

  • KOE-based: A sequence of works [28, 40, 41, 46, 58] improved on [51] by relying on Knowledge Of Exponent (KOE) assumptions and bilinear pairings over elliptic curves. KOE-based systems were implemented in [15, 19, 65, 70, 76], and further optimizations of this latter system for specific applications related to Bitcoin [64] such as smart contracts [57] and anonymous payment systems [12] are already being evaluated by commercial entities [45].

  • IVC-based: KOE-based systems require a proving key \(\mathsf {k}_\mathsf{P}\) (discussed below) that is longer than T, the number of computation cycles. Incrementally verifiable computation (IVC) [74] and bootstrapping [27] shorten the length of \(\mathsf {k}_\mathsf{P}\) to \(\mathrm{{poly}}\log T\) and an IVC-based system has been implemented recently [18].

  • DLP-based: KOE/IVC-based systems require a private setup phase that is discussed below. [47] (cf. [68]) assumes hardness of the Discrete Logarithm Problem (DLP) to build a system that requires only a public setup, like SCI. Proof length in the initial works above was \(\varTheta \left( \sqrt{T}\right) \) and this was reduced to \(\mathrm{{poly}}\log T\) in [29], which also implemented both versions; verifier running time in both variants is \(\varOmega (T)\).

Comparing SCI to Prior CI Solutions. SCI is the first CI solution that achieves both (1) a short public randomness setup phase and (2) universal scalability for one-shot computation. We discuss the significance of these properties after explaining them. (A quantitative comparison of the running time, memory consumption and communication complexity of SCI to prior systems appears in Sect. 2 and Table 1.)

One-shot Universal Scalability (OSUS). A CI system is universally scalable if for any fixed program \({\mathbb {P}} \), prover running time is bounded by \(T\mathrm{{poly}}\log T\) and verification time is at most \(\mathrm{{poly}}\log T\), where T is the number of machine cycles. If the same asymptotic running times hold even for a single execution of \({\mathbb {P}} \), where the setup (“preprocessing”) is carried out by the verifier (and hence setup cost is part of the total verification cost), we shall say that the CI solution is one-shot universally scalable (OSUS). DLP-based systems have super-linear verification time, hence are not scalable for any program. IP-based systems are efficient only for highly parallel computations, thus are not universally scalable. LPCP- and KOE-based systems are universally scalable but not OSUS because they require a proving key \(\mathsf {k}_\mathsf{P}\) that is longer than T and which must be generated by the verifier (in the one-shot setting). Of all prior solutions, only the IVC-based one is OSUS, like SCI.

Public Setup. All implemented solutions except the DLP-based one and SCI, if instantiated as publicly verifiable CI systems, require a setup phase (“preprocessing”), the output of which is a pair of keys (\(\mathsf {k}_\mathsf{P},\mathsf {k}_\mathsf{V}\)), one needed for proving statements, the other for verifying them. A “trapdoor key” \(\mathsf {k}_\mathsf{tpdr}\) is associated with \((\mathsf {k}_\mathsf{P},\mathsf {k}_\mathsf{V})\) and can be used to forge pseudo-proofs of false statements. Furthermore, \(\mathsf {k}_\mathsf{tpdr}\) can be recovered by the parties that run the preprocessing phase. Secure multi-party computation can boost security by “distributing knowledge” of the trapdoor among several parties [17] so that all of them have to be compromised to recover \(\mathsf {k}_\mathsf{tpdr}\); but this does not remove the concern that \(\mathsf {k}_\mathsf{tpdr}\) has been recovered by collusion of all parties, or retrieved by a central party eavesdropping on all of them. Even if \(\mathsf {k}_\mathsf{tpdr}\) has not been recovered by anyone, its mere existence may erode trust in such systems. (Cf. [6] for a recent discussion of setup attacks and their implications and mitigations.) In contrast, SCI and DLP-based systems require only a short public random string when instantiated as publicly verifiable non-interactive CI systems.

Discussion. The combination of OSUS and public setup, which is unique to SCI, has three implications: (i) the cost of setting up and modifying CI systems based on it is relatively small, (ii) the trust assumptions made by parties using it are comparatively minor, and hence (iii) it seems more suitable than existing solutions for use in decentralized and public settings, like Bitcoin. We repeat and stress that many such applications require zero-knowledge proofs, a property achieved by prior solutions but not by SCI; augmenting SCI to obtain zero knowledge seems within reach [11] but is outside the scope of our work.

SCI: Main Technical Contributions. We faced three major challenges when attempting to construct PCP systems that scale well and apply to general programs, and SCI is the first implementation to contain scalable solutions to each of them, reported here for the first time: (i) implementing the recursive proof composition [2] technique applied to PCPs of proximity (PCPPs) [20, 35]; (ii) constructing quasi-linear PCPP systems for Reed-Solomon (RS) error correcting codes [67] of huge message length [23] that require, in particular, quasi-linear time algorithms for interpolation and multi-point evaluation of large-degree polynomials over finite fields of characteristic 2; and (iii) reducing general programs that include jumps, loops, and random access memory (RAM) instructions to succinct Algebraic Constraint Satisfaction Problem (sACSP) instances that “capture” the corresponding CI statement (*); prior arithmetization solutions require the verifier, or a party trusted by it, to “unroll” a T-cycle computation to obtain an arithmetic circuit of size \(\varOmega (T)\), whereas SCI’s verifier is succinct and does not perform this unrolling. (All prior solutions arithmetize over large prime fields; SCI is also novel in being the first to arithmetize over large binary fields, which poses new challenges, especially for integer operations like addition and multiplication, cf. Section B.1.)

To overcome the blowup (i) that is due to recursive PCPP composition, we replace PCPPs with interactive oracle proofs of proximity (IOPPs) [9, 10, 37], implemented here for the first time, and increase the number of rounds of interaction between prover and verifier; the extra rounds can be removed in the random oracle model [37]. To address (ii) we built a dedicated library that implements finite field arithmetic efficiently (reported in [22]) and used it to further implement additive Fast Fourier Transforms (aFFT) [39] that perform interpolation and multi-point evaluation in quasi-linear time and in parallel (via multi-threading); the large-scale additive FFTs are reported here for the first time. To solve (iii) and reduce general programs to PCP systems efficiently, we devise a novel reduction from general programs for random access machines to sACSP instances. We describe these three contributions in more detail in Sect. 3 and the appendix.

2 Measurements

SCI can be applied to any language in \(\mathbf{{NEXP}} \); for concreteness we picked two programs computing the NP-complete subset-sum problem (cf. Appendix C); we explain this choice after introducing the two programs. The input to the subset-sum problem is an integer array A of size n and a target integer t; the problem is to decide whether there exists a subset \(A'\subset A\) that sums to t. The CI statement addressed here is the co-NP version of the problem, stating “no subset of A sums to t” and denoted by \(\tau _{(A,n,t)}\). The two programs differ in their time and space consumption. The first one exhaustively tries all possible subsets, requiring \(2^n\) cycles but only O(1) memory, hence can be executed using only the local registers of the machine and with no random access to memory. The second program uses sorting and runs in time \(O(2^{n/2})\), a quadratic improvement over the exhaustive solution but it also requires \(\varTheta (2^{n/2})\) memory and hence uses the random access memory. We denote the two programs by \({\mathbb {P}} _\mathrm{exh}\) and \({\mathbb {P}} _\mathrm{sort}\), respectively.
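
The following Python sketches are illustrative stand-ins for the TinyRAM programs actually proven by SCI: \({\mathbb {P}} _\mathrm{exh}\) enumerates all \(2^n\) subsets with constant extra memory, while \({\mathbb {P}} _\mathrm{sort}\) is the classic meet-in-the-middle variant that builds and sorts a table of \(2^{n/2}\) half-sums, trading memory (and random access to it) for a quadratic speedup.

```python
from bisect import bisect_left

def subset_sums_exhaustive(A, t):
    """P_exh: True iff some subset of A sums to t; 2^n iterations, constant extra memory."""
    n = len(A)
    for mask in range(1 << n):
        s = 0
        for i in range(n):
            if mask >> i & 1:
                s += A[i]
        if s == t:
            return True
    return False

def subset_sums_sorting(A, t):
    """P_sort: meet in the middle; sorts a 2^{n/2}-entry table of half-sums."""
    half = len(A) // 2
    left, right = A[:half], A[half:]
    left_sums = sorted(sum(x for i, x in enumerate(left) if mask >> i & 1)
                       for mask in range(1 << len(left)))
    for mask in range(1 << len(right)):
        s = sum(x for i, x in enumerate(right) if mask >> i & 1)
        j = bisect_left(left_sums, t - s)
        if j < len(left_sums) and left_sums[j] == t - s:
            return True
    return False

A, t = [3, 34, 4, 12, 5, 2], 9
assert subset_sums_exhaustive(A, t) == subset_sums_sorting(A, t) == True  # 4 + 5 = 9
```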

On Choice of Programs. We would like to run SCI on “real-world” applications like the examples given in the introduction but our current scalability is not up to par. This situation is similar to that of the very first works on other CI solutions (cf. [15, 33, 65, 69]): initial reports discussed only small word-size machines, restricted functionality and simple programs. Like some of those works (most notably, [19]) we use the 16-bit version of the TinyRAM architecture as our model of computation, and support all of its assembly code even though these two programs use only a subset of it. We focus on subset-sum for two reasons: (i) it is a natural NP-complete problem that is often used in cryptographic applications but more importantly (ii) it allows us to display the effect of time–space tradeoffs on our CI solution (cf. Figure 2). Since SCI supports non-determinism, we could have used the non-deterministic version of the subset-sum statement. In fact, this would have reduced prover and verifier complexity because fewer boundary constraints are imposed on the input. However, the resulting statement seems less interesting, saying “there exists A such that no subset of it sums to t”.

Measurement Range. Input array size n ranged between \(3\) and \(16\). Prover data was measured on a “large” server with 32 AMD Opteron cores at clock rate 3.2 GHz and 512 Gigabytes of RAM, running with two threads per core (a total of 64 threads); to bound the single-core/single-thread prover time one may multiply the stated times by 32 or 64, respectively. Verifier data was measured on a “standard” laptop, a Lenovo T440s with an Intel Core i7-4600 at clock rate 2.1 GHz and 12 Gigabytes of RAM. We stress that verifier succinctness for one-shot programs allows us to measure verifier running time independently of prover running time, all the way up to \(2^{47}\) machine cycles. Both prover and verifier were measured at 1-bit and 80-bit security using state-of-the-art PCPP and IOPP security estimates [9].

Prover Time and Memory. The left column of Fig. 1 presents the running time (top) and memory consumption (bottom) of the prover for both \({\mathbb {P}} _\mathrm{exh}\) and \({\mathbb {P}} _\mathrm{sort}\) as a function of the number of machine cycles of the simulated machine, at both the 1-bit and the 80-bit security level. The two main observations from these figures are that (i) resources scale quasi-linearly with the number of cycles and (ii) \({\mathbb {P}} _\mathrm{sort}\) is more costly than \({\mathbb {P}} _\mathrm{exh}\) due to its random access memory usage, which increases proof length by a \(\log ^{O(1)} T\) factor for a T-cycle execution (cf. Section 3). Figure 2 compares time and memory as a function of the size of the input array n and shows that for \(n\ge 8\) the quadratic running-time improvement of \({\mathbb {P}} _\mathrm{sort}\) over \({\mathbb {P}} _\mathrm{exh}\) outweighs the \(O(\log T)\) factor required by random access to memory, at both the 1-bit and the 80-bit security level.

Verifier Time and Query Complexity. The right column of Fig. 1 shows verifier running time (top) and query complexity (bottom) for both programs at both the 1-bit and the 80-bit security level. Notice the \(\approx 2^{13}\)–\(2^{23}\times \) improvement of the verifier over the prover in both parameters (recall \(1MB=2^{10}KB\)) and the increase in running time as a function of security, due to repetition. For small n the verifier running time is greater than that of the naïve verifier which re-runs the program. However, since naïve verification grows like \(2^n\) for \({\mathbb {P}} _\mathrm{exh}\) and like \(2^{n/2}\) for \({\mathbb {P}} _\mathrm{sort}\), for \(n\ge 22\) (at 80-bit security) our verifier is more efficient than the naïve one for \({\mathbb {P}} _\mathrm{exh}\), and for \(n\ge 48\) the verifier for \({\mathbb {P}} _\mathrm{sort}\) is more efficient than the naïve one (cf. Figure 3).

Table 1. Quantitative comparison of SCI with the KOE-based [15], IVC-based [18] and DLP-based [47] solutions. Data measured on executions of \(2^{16}\) cycles of \({\mathbb {P}} _\mathrm{exh}\) at an 80-bit security level on the same machine with 32 AMD Opteron cores at clock rate 3.2 GHz and 512 Gigabytes of RAM. The DLP-based column is extrapolated from [47, Table 2], accounting for (i) the larger circuit size of our computation (which has \(\sim 132\)M gates, compared with a maximal size of 1.4M gates there) and (ii) the different compute architectures (single-threaded Intel 4690K core vs. 64-threaded AMD Opteron). Notice that the proving time of SCI is \(\sim 2\)–\(4\times \) slower than the KOE- and DLP-based solutions and \(\sim 150\times \) faster than the IVC-based one. Regarding total communication complexity, SCI is more efficient than prior solutions, but less efficient when measuring only post-processing communication.

Quantitative Comparison with other CI Implementations. Table 1 compares SCI to three recent CI systems, the KOE-based [15], the IVC-based [18], and the DLP-based [47], using the version of the latter with \(\mathrm{{poly}}\log (T)\) communication complexity. One sees that SCI has the shortest and fastest setup but larger post-setup communication complexity; post-setup verification is faster than the DLP-based system but slower than the KOE/IVC-based ones, as predicted by theory. Two other important points are: (i) proofs in SCI are not zero-knowledge whereas those of the other solutions are, and (ii) the setup of the last two columns (DLP-based and SCI) consists only of a public random string, whereas KOE/IVC-based solutions require a private setup involving a trapdoor that can be used to forge proofs of false statements.

Fig. 1. Comparison of prover (left) and verifier (right) running time (top) and memory consumption (bottom). The sharp drop in query complexity is due to the transition from 2 to 3 levels of recursion in the RS-PCPP; as seen in the top-right plot, this has little effect on overall verifier running time, which is significantly smaller than prover running time and also grows at a considerably slower rate as a function of the number of cycles. Answers to verifier queries were provided by random strings, which accurately simulates actual proofs because the verifier is non-adaptive, i.e., its running time is independent of the proof content.

Fig. 2. Prover running time (left) and memory consumption (right) as a function of input array size n. For \(n \ge 8\) the quadratic running-time improvement of \({\mathbb {P}} _\mathrm{sort}\) over \({\mathbb {P}} _\mathrm{exh}\) overcomes the \(\mathrm{{poly}}\log T\) factor overhead incurred by \({\mathbb {P}} _\mathrm{sort}\) due to random memory access; this holds at both the 1-bit and the 80-bit security level.

Fig. 3. Computation of the break-even point [71, 72], the minimal input size n for which naïve verification via re-execution becomes more costly than PCP-based verification. For \({\mathbb {P}} _\mathrm{exh}\) at 80-bit security this threshold is at \(n=22\); for \({\mathbb {P}} _\mathrm{sort}\) it is significantly higher, estimated around \(n=48\), due to the quadratic improvement in running time of the latter program.

3 Overview of Construction

The construction of the PCP \(\pi _{({\mathbb {P}},x,y,T)}\) for the computational statement \(\tau _{({\mathbb {P}},x,y,T)}\) follows the rather complex process detailed in [13, 14, 21, 23], which we summarize next (see Appendix A). The statement \(\tau _{({\mathbb {P}},x,y,T)}\) is converted into an instance \(\psi _{({\mathbb {P}},x,y,T)}\) of an algebraic constraint satisfaction problem (ACSP) over a finite field \(\mathbb {F}\) of characteristic 2, and the resulting instance is used by prover and verifier as described next.

Prover. To construct the PCP, the prover executes \({\mathbb {P}} \) on input x and encodes the execution trace by a Reed-Solomon [67] codeword \(\mathsf {a}_{({\mathbb {P}},x,y,T)}\) evaluated over an additive sub-group of \(\mathbb {F}\). The ACSP instance \(\psi _{({\mathbb {P}},x,y,T)}\) is applied to \(\mathsf {a}_{({\mathbb {P}},x,y,T)}\) as described in [23, Equation (3.2)] to obtain an additional RS-codeword, denoted \(\mathsf {b}_{({\mathbb {P}},x,y,T)}=\psi _{({\mathbb {P}},x,y,T)}(\mathsf {a}_{({\mathbb {P}},x,y,T)})\), that “attests” to the fact that \(\mathsf {a}_{({\mathbb {P}},x,y,T)}\) encodes a valid execution trace, and hence, in particular, its output is correct. Each of the two codewords is appended with a PCP of proximity (PCPP) for the RS-code [23], denoted \(\pi _\mathsf{{a}},\pi _\mathsf{{b}}\), respectively. The PCP \(\pi _{({\mathbb {P}},x,y,T)}\) is defined to be the concatenation of \(\mathsf {a}_{({\mathbb {P}},x,y,T)},\mathsf {b}_{({\mathbb {P}},x,y,T)}, \pi _\mathsf{{a}}\) and \(\pi _\mathsf{{b}}\).
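
As a rough illustration of the encoding step, the sketch below Reed-Solomon-encodes one column of an execution trace by interpreting its T values as evaluations of a degree \(< T\) polynomial and re-evaluating that polynomial on a four times larger domain. It is only a sketch under simplifying assumptions: it uses a small prime field and naïve \(O(T^2)\)-per-point Lagrange interpolation, whereas SCI works over large binary fields with additive subgroups as evaluation domains and additive FFTs.

```python
P = 2**61 - 1  # a Mersenne prime; an assumption for this sketch (SCI uses a binary field)

def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique degree < len(xs) polynomial through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def rs_encode(trace_column, blowup=4):
    """Encode T values as evaluations of their interpolant on a blowup*T-point domain."""
    T = len(trace_column)
    xs = list(range(T))                  # interpolation domain
    domain = list(range(blowup * T))     # evaluation domain (contains xs)
    return [lagrange_interpolate(xs, trace_column, x) for x in domain]

codeword = rs_encode([3, 1, 4, 1, 5, 9, 2, 6])
assert codeword[:8] == [3, 1, 4, 1, 5, 9, 2, 6]  # systematic on the first T points
```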

Verifier. The verifier queries the four parts of the PCP in the following manner: First it invokes an RS-PCPP verifier that queries \(\mathsf {a}_{({\mathbb {P}},x,y,T)}\) and \(\pi _\mathsf{{a}}\) to “check” that \(\mathsf {a}_{({\mathbb {P}},x,y,T)}\) is close in Hamming distance to a codeword of the RS-code; it repeats this process with respect to \(\mathsf {b}_{({\mathbb {P}},x,y,T)}\) and \(\pi _\mathsf{{b}}\). Second and last, the verifier queries \(\mathsf {a}_{({\mathbb {P}},x,y,T)}\) and \(\mathsf {b}_{({\mathbb {P}},x,y,T)}\) and uses \(\psi _{({\mathbb {P}},x,y,T)}\) to check that the two codewords encode a valid computation of \({\mathbb {P}} \) that starts with x and reaches y within T cycles. In this process we rely on the “locality” of the mapping \(\psi _{({\mathbb {P}},x,y,T)}:\mathsf {a}_{({\mathbb {P}},x,y,T)}\rightarrow \mathsf {b}_{({\mathbb {P}},x,y,T)}\) which means that each entry of \(\mathsf {b}_{({\mathbb {P}},x,y,T)}\) depends on a small number of entries of \(\mathsf {a}_{({\mathbb {P}},x,y,T)}\). In what follows we elaborate on the novel aspects of this reduction as implemented in SCI.
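
The locality exploited in the second phase can be illustrated by the following toy spot-check. The local rule psi_local, the tiny field and the dependency pattern (position i of b depending only on positions i and i+1 of a) are hypothetical simplifications chosen for brevity; the actual dependency pattern is fixed by the ACSP instance, and the proximity checks of the first phase are not modeled here.

```python
import random

def psi_local(a_i, a_next):
    return (a_next - a_i) % 97         # hypothetical local rule over a toy field F_97

def spot_check(a_cw, b_cw, n_queries=5):
    for _ in range(n_queries):
        i = random.randrange(len(a_cw) - 1)
        if b_cw[i] != psi_local(a_cw[i], a_cw[i + 1]):
            return False               # reject: the pair of oracles is inconsistent
    return True

a_cw = [pow(3, i, 97) for i in range(16)]                    # stand-in for a_(P,x,y,T)
b_cw = [psi_local(a_cw[i], a_cw[i + 1]) for i in range(15)]  # honest b = psi(a)
assert spot_check(a_cw, b_cw)
```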

From Assembly Code to Succinct ACSP. The efficiency of the ACSP instance \(\psi _{({\mathbb {P}},x,y,T)}\) is measured by three parameters that we seek to minimize: circuit size, degree, and query complexity, denoted \(C_{({\mathbb {P}},x,y,T)}, D_{({\mathbb {P}},x,y,T)},Q_{({\mathbb {P}},x,y,T)}\) respectively. Circuit size affects both proving and verification time; degree affects PCP length and reducing it decreases running time and memory consumption on the prover side; query complexity affects the length of communication between prover and verifier (and the length of computationally sound (CS) proofs \(\hat{\pi }\)) as well as verifier running time. Each parameter can be optimized at the expense of the other two, and the challenge is to reach an efficient balance between all three.

Our starting point is a program \({\mathbb {P}} \), i.e., a sequence of instructions for a random access machine (RAM). For simplicity we first focus on instructions that access only (local) registers; random access memory instructions are discussed below. Each instruction specifies the input and output register locations and an operation applied to the inputs, called the opcode. We build \(\psi _{({\mathbb {P}},x,y,T)}\) bottom-up (cf. Appendix B for a detailed example). Each opcode \(\mathsf{{op}}\) appearing in \({\mathbb {P}} \) (like xor, add, jump, etc.) is specified by an algebraic definition over \(\mathbb {F}\); in other words, we specify a set of multivariate polynomials \(\mathcal{{P}}_\mathsf{{op}}\subseteq \mathbb {F}[X_1,X_2,\ldots ,X_m]\) such that the set of common zeros of \(\mathcal{{P}}_\mathsf{{op}}\) corresponds to correct input-output tuples for \(\mathsf{{op}}\). Program flow is controlled by multiplying each polynomial in \(\mathcal{{P}}_\mathsf{{op}}\) by a multivariate Lagrange “selector” polynomial that, based on the value v of the program counter (PC), annihilates all constraints that are irrelevant for enforcing the vth instruction of \({\mathbb {P}} \). For a program with \(\ell \) lines these selector polynomials have degree \(\lceil \log \ell \rceil \). The resulting ACSP has circuit size \(O(\ell )\), and its degree and query complexity are \(\log \ell +O(1)\); the constants hidden by the asymptotic notation depend on the machine specification.
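
A minimal sketch of the selector mechanism is given below, over single bits (GF(2)) and with hypothetical two-line constraints; SCI itself works over a large binary field. The selector for line v is a product of \(\lceil \log \ell \rceil \) degree-1 factors in the bits of the program counter, so it evaluates to 1 exactly when the PC equals v and multiplies away every other line's constraints.

```python
from math import ceil, log2

def selector(v, pc_bits):
    """Product over bits of (pc_i + v_i + 1) in GF(2): equals 1 iff the PC equals v."""
    acc = 1
    for i, pc_i in enumerate(pc_bits):
        v_i = v >> i & 1
        acc &= (pc_i ^ v_i ^ 1)   # in GF(2): pc_i + v_i + 1
    return acc

def combined_constraint(pc, state, line_constraints):
    """Sum_v selector_v(pc) * C_v(state); only the active line's constraints survive."""
    l = len(line_constraints)
    bits = [pc >> i & 1 for i in range(ceil(log2(l)))]
    return [selector(v, bits) & c(state) for v, C in enumerate(line_constraints) for c in C]

# Two-line toy program over bits: line 0 requires r0 XOR r1 == r2, line 1 requires r2 == 1.
C0 = [lambda s: s["r0"] ^ s["r1"] ^ s["r2"]]   # evaluates to 0 iff satisfied
C1 = [lambda s: s["r2"] ^ 1]
state = {"r0": 1, "r1": 0, "r2": 1}
assert combined_constraint(0, state, [C0, C1]) == [0, 0]   # line 0 enforced, line 1 muted
assert combined_constraint(1, state, [C0, C1]) == [0, 0]   # line 1 enforced (r2 == 1 holds)
bad = {"r0": 1, "r1": 1, "r2": 0}
assert combined_constraint(1, bad, [C0, C1]) == [0, 1]     # line 1's check fires (r2 != 1)
```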

Random Access Memory Instructions. The execution trace of \({\mathbb {P}} \) is the length–T sequence of machine states that describes the computation. To verify the integrity of random access memory instructions (such as load and store) we follow [13, 14] and use a pair of execution traces. The first trace, \(\mathsf{{trace}}^\mathsf{{time}}\), is sorted increasingly by time, and the second, \(\mathsf{{trace}}^\mathsf{{mem}}\), is sorted lexicographically first by memory location, then by time. RAM-related execution validity is verified “locally” by inspecting pairs of consecutive elements in \(\mathsf{{trace}}^\mathsf{{mem}}\), just like non-RAM related instructions are verified “locally” by inspecting pairs of consecutive elements in \(\mathsf{{trace}}^\mathsf{{time}}\). To further reduce proof length and query complexity, each state of \(\mathsf{{trace}}^\mathsf{{mem}}\) contains only the information needed to check memory consistency — an address, its content and the type of memory access (load/store); let s denote the number of field elements in a single line of \(\mathsf{{trace}}^\mathsf{{mem}}\).
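
The following sketch shows why sorting by memory location makes the check local: after re-sorting, every load only has to agree with the immediately preceding access to the same address. The row format (time, op, addr, value) and the convention that a load from a never-written address returns 0 are illustrative assumptions, not SCI's exact encoding.

```python
def memory_trace(trace_time):
    """trace_mem: the memory rows of trace_time re-sorted by (address, time)."""
    mem_rows = [r for r in trace_time if r[1] in ("store", "load")]
    return sorted(mem_rows, key=lambda r: (r[2], r[0]))

def check_memory_consistency(trace_mem):
    prev = None
    for row in trace_mem:
        t, op, addr, val = row
        first_access = prev is None or prev[2] != addr
        if op == "load":
            expected = 0 if first_access else prev[3]   # value of the previous access
            if val != expected:
                return False
        prev = row
    return True

trace_time = [(0, "store", 7, 42), (1, "load", 7, 42), (2, "load", 3, 0)]
assert check_memory_consistency(memory_trace(trace_time))
assert not check_memory_consistency(memory_trace(trace_time + [(3, "load", 7, 99)]))
```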

To prove that \(\mathsf{{trace}}^\mathsf{{mem}}\) and \(\mathsf{{trace}}^\mathsf{{time}}\) refer to the same execution, the prover must describe a permutation between the two, and the verifier must check its validity. To achieve this SCI uses a non-blocking Beneš switching network [25, 31] embedded in an affine graph over \(\mathbb {F}\) (cf. [14, 23] for definitions). Using this method, adding RAM-related instructions to a program adds only \(O(T\cdot \log T)\) field elements to the PCP and increases query complexity by a small constant.

Reducing Proof Construction Time via Interactive Oracle Proofs of Proximity (IOPP). A significant portion of the prover's running time and memory consumption is dedicated to the construction of the PCPs of proximity (PCPPs) for \(\mathsf {a}_{({\mathbb {P}},x,y,T)}\) and \(\mathsf {b}_{({\mathbb {P}},x,y,T)}\). The full PCPP for an RS-codeword of degree N is of length \(O(N\log ^{2.6} N)\), which is quite large in our applications. Observing that (i) these PCPPs are built using recursive PCPP composition [21], and (ii) only a small fraction of the recursive branches are explored by the verifier, we increase the number of rounds of interaction and use a notarized interactive proof of proximity (NIPP) [9], a special case of interactive oracle proofs of proximity (IOPP) [10, 37], to reduce proof length to \(4N +O(\sqrt{N})\). The added rounds of interaction can be removed in the random oracle model to obtain computationally sound proofs [37].

Parallel Implementation of PCPPs for RS Codes. To reduce the time required to encode the execution trace into a pair of RS-codewords, SCI uses parallel algorithms for finite field operations and for dealing with polynomials over finite fields of characteristic 2. To speed up basic field operations (most notably, multiplication), a dedicated algebraic library was built that utilizes parallel hardware on multi-core CPUs. Interpolation and evaluation of polynomials over affine spaces of size N are computed in quasi-linear time using the so-called additive Fast Fourier Transform (aFFT) [39].
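
As a small illustration of the kind of binary-field arithmetic such a library provides, the sketch below multiplies elements of GF(2^8) by carry-less (XOR-based) polynomial multiplication followed by reduction modulo an irreducible polynomial. GF(2^8) with the AES polynomial \(x^8+x^4+x^3+x+1\) is chosen purely for brevity; SCI uses much larger binary fields and hardware carry-less multiplication rather than this bit loop.

```python
IRRED = 0b1_0001_1011          # x^8 + x^4 + x^3 + x + 1 (the AES reduction polynomial)
DEG = 8

def gf_mul(a: int, b: int) -> int:
    # Carry-less multiply: XOR shifted copies of a, one per set bit of b.
    prod = 0
    for i in range(b.bit_length()):
        if b >> i & 1:
            prod ^= a << i
    # Reduce modulo the irreducible polynomial, clearing bits from the top down.
    for i in range(prod.bit_length() - 1, DEG - 1, -1):
        if prod >> i & 1:
            prod ^= IRRED << (i - DEG)
    return prod

def gf_pow(a: int, e: int) -> int:
    """Square-and-multiply exponentiation; a^(2^8 - 2) is the inverse of nonzero a."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

a = 0x57
assert gf_mul(a, gf_pow(a, 2**DEG - 2)) == 1   # a * a^{-1} == 1 (addition is just XOR)
assert gf_mul(0x57, 0x83) == 0xC1              # standard AES field test vector
```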

4 Concluding Remarks

SCI is the first implementation of a computational integrity system that achieves asymptotic one-shot universal scalability (OSUS) with a setup key that is merely a public random string. Prior solutions either required super-linear verification time, or used a setup procedure that involves keys which could be used to forge proofs of falsities. While the computer programs on which SCI was tested are of limited applicability, the simpler setup assumptions of SCI make it a natural starting point for building further applications — most notably zero-knowledge proofs — for use in decentralized networks.