
1 Introduction

Cloud computing provides an attractive solution for computationally weak clients that need to outsource data and perform large-scale computations on the outsourced data. This, however, raises an important security requirement: the client must be able to verify the correctness of the outsourced computation. A cloud server may return an incorrect result, accidentally or intentionally, so the ability to verify the result is essential. This requirement has motivated research on the verifiability of outsourced computation in two directions: exploring the theoretical foundations of what computations can be securely outsourced, and proposing secure solutions for specific problems with an emphasis on practicality. Our work follows the latter direction.

Verifiable Computation. Several models have been proposed for secure outsourcing of computation. In the verifiable computation (VC) model of Gennaro, Gentry and Parno [14], the client’s data defines a computationally expensive function, and a computation amounts to evaluating this function. To outsource this computation, the client computes a one-time encoding of the function and stores it at the server. This enables the server not only to evaluate the function on any input, but also to provide a proof that the evaluation has been done correctly. The client’s verification must be substantially less time-consuming than evaluating the original function. The effort of generating the one-time encoding is amortized over multiple evaluations of the function and so is considered acceptable.

Following [14], a number of VC schemes [2, 10, 11, 14, 21] for delegating generic functions have been proposed. These schemes are based on fully homomorphic encryption (FHE) and so, with today’s constructions of FHE, cannot be considered practical. Benabbas et al. [5] initiated a line of research [5, 9, 13, 20] on practical VC for specific functions such as polynomials, which does not require heavy cryptographic computations such as FHE. In a VC scheme for polynomials, the client’s data consists of the coefficients of a polynomial. The client stores an encoding of the coefficients on a cloud server; this encoding allows the server to evaluate the polynomial at any requested point; and the client can efficiently verify the server’s computation. These schemes are secure against a malicious server that is allowed to make a polynomial (in the security parameter) number of attempts to deceive the client into accepting a wrong computation result, with each attempt being told successful or not.

Practical VC schemes, however, are limited to the computation of linear functions on the outsourced data (e.g., evaluating a polynomial at a point x is equivalent to computing the inner product of the coefficient vector with a vector defined by x, which is linear in the coefficients). This means that even simple statistical functions, such as variance, cannot be computed. Moreover, the encoding of the function doubles the cloud storage needed by the function itself. Polynomial evaluation arises in applications such as proofs of retrievability and verifiable keyword search [13], where the number of polynomial coefficients is roughly equal to the number of data elements in a file or database. In those scenarios doubling the cloud storage results in a substantial increase of the client’s expense, which becomes increasingly problematic as more and more data is outsourced.

Homomorphic MAC. Homomorphic MAC (HomMAC) [16] allows a client to store a dataset (a set of data elements) on a cloud server, and later request the computation of some specified functions, referred to as the admissible function family, on the dataset. The dataset may consist of employee records of an institution and a possible computation could be evaluating a function of the records. One can add elements to, or remove elements from, the dataset as needed. The encoding of the dataset consists of all data elements and a special MAC tag for each data element. The tags allow the server to produce a MAC tag for the computation of any admissible function.

HomMACs for admissible linear functions [1] and admissible nonlinear functions [4, 8, 16] have been proposed. Some of these schemes require heavy cryptographic computations, such as FHE [16]. Catalano and Fiore [8] proposed an elegant HomMAC for high-degree polynomials with efficient server computations (including PRF computations and polynomial evaluations over relatively small finite fields). The client’s verification cost, however, is effectively the same as performing the outsourced computation. Backes, Fiore and Reischuk [4] removed this drawback by restricting the class of admissible functions to polynomials of degree 2. They considered the computation of the same function on multiple datasets. The verification of the computations requires an expensive preprocessing which is done only once and amortized over all verifications. The restriction on the degree of the polynomials, however, limits their applicability. For example, an important task in data analysis is to determine whether a dataset is normally distributed. Two commonly used statistical measures for the symmetry and flatness of a distribution relative to the normal distribution are skewness and kurtosis, which require the computation of degree 3 and degree 4 polynomials of the data elements, respectively.

Compared to the VC model of [5, 14], the security model of HomMAC is more demanding. Here the server is allowed to learn the MAC tags of arbitrary data elements of its choice and also make a polynomial (in the security parameter) number of attempts to deceive the client into accepting a wrong computation result, with each attempt being told successful or not. This stronger security property means that the HomMACs can be straightforwardly translated into VC schemes but the converse may not be true in general. In a HomMAC based VC scheme the server has to store both the data elements and the MAC tags. This usually doubles the cloud storage consumed by the data elements.

An additional desirable property of HomMACs is that they allow composition. That is, given multiple computation results and their MAC tags, one can perform a high level computation on these results and also generate a corresponding MAC tag for this high level computation.

Motivation. The existing VC schemes satisfy a subset of the following desirable properties: (p1) Large admissible function family: enabling the computation of high-degree polynomials (not limited to linear and quadratic ones) on the outsourced data; (p2) Efficient client verification: the client’s verification is substantially less expensive than performing the delegated computation; (p3) Efficient server computation: the server does not need to do heavy cryptographic computations (such as FHE); (p4) Efficient server storage: the server stores an encoding of the client’s data, and the encoding consumes almost no more storage than the data itself; (p5) Unbounded data: the client can freely update the outsourced data by adding new elements to every dataset and also adding new datasets. Our goal is to construct schemes that provide all of the above properties.

1.1 Our Contributions

We introduce batch verifiable computation (BVC), and construct two BVC schemes that satisfy properties (p1)–(p5). Similar to Backes et al. [4], we consider outsourcing of multiple datasets, with each data element carrying a pair of labels. The outsourced data m defines an \(N\times s\) matrix \((m_{i,j})_{N\times s}\), where each column is called a dataset, and each entry \(m_{i,j}\) is labeled by a pair \((i,j)\in [N]\times [s]\). However, the similarity ends here: Backes et al. allow computation of different functions on each dataset with the restriction that the polynomials are of degree at most two. Our main observation is that by batching the computation of the same function on all datasets, an impressive set of properties can be achieved. In particular one can save storage at the server, and this saving becomes significant as computations on more datasets are outsourced. In BVC the client computes a tag \(t_i\) for the ith row of m for every \(i\in [N]\), and stores \(\mathbf{t}= (t_1,\ldots , t_N)\) as an extra column at the cloud server. A computation is specified by a program \(\mathcal{P}= (f, I)\), where \(f(x_1,\ldots ,x_n)\) is a function and \(I=\{i_1,\ldots ,i_n\}\subseteq [N]\) specifies the subset of elements of each dataset which will be used in the computation of f. Given the program \(\mathcal{P}\), the server returns s results \(\rho _1=f(m_{i_1,1},\ldots , m_{i_n,1}),\ldots ,\rho _s=f(m_{i_1,s},\ldots ,m_{i_n,s})\) and a single batch proof \(\pi \); the client accepts the s results only if they pass its verification. A BVC scheme is secure if no malicious cloud server can deceive the client into accepting wrong results. We consider the computation of any polynomial function (i.e., arithmetic circuit) on the outsourced data, and construct two BVC schemes with the following properties.

Large Admissible Function Family. The first scheme admits polynomials of degree as high as any polynomial in the security parameter. The second scheme admits polynomials of any constant degree. The only other known practical scheme that can compute the same class of functions is that of [8], in which the client’s verification is effectively as heavy as the outsourced computation.

Efficient Client Verification. In our BVC schemes the client can verify the computation results on the s datasets using a single batch proof that is computed from the tag column. In both schemes, verifying the computation result of each dataset amounts to evaluating the batch proof (which is a polynomial) at a specific point. The batch proof in the first scheme is a univariate polynomial of bounded degree, and in the second scheme a multivariate polynomial of bounded degree. Compared with the naive construction where the scheme in [8] is used on each dataset, the client’s average verification cost in our schemes is substantially less than what is required by the original computation, as long as s is large enough.

Efficient Server Computation. The server computation in our schemes consists of PRF computations and polynomial evaluations over relatively small finite fields (such as \(\mathbb {Z}_p\) for \(p\approx 2^{128}\) when the security parameter \(\lambda =128\)). This is similar to [8] and more efficient than [4] where the server must compute a large number of exponentiations and pairings over significantly larger groups.

Efficient Server Storage. In a VC (or BVC) scheme the client stores an encoding of its data on the cloud server. We define the storage overhead of a VC (or BVC) scheme as the ratio of the size of the encoding to the size of data. It is easy to see that the storage overhead is lower bounded by 1. In both schemes a tag has size equal to an element of m, resulting in a storage overhead of \(1+1/s\) which approaches 1 as s increases. In all existing practical VC schemes [4, 5, 8, 9, 13] the storage overhead is \(\ge 2\).

Unbounded Data. In our BVC schemes the outsourced data m consists of s datasets, each consisting of N elements. Our schemes allow the client to add an arbitrary number of new rows and/or columns to m, and efficiently update the tag column without downloading m. While adding new rows to m is straightforward, adding new datasets to m without performing the tag computation from scratch (and so downloading the already outsourced data) is highly non-trivial. This is because in our schemes each row of m is authenticated using a single tag, and so adding a new dataset (a new data element to each row) could destroy the well-formed tag of the row, requiring the tag of the updated row to be computed from scratch. We show that our second scheme allows the client to add new datasets and efficiently update the tag column, without downloading m.

In summary, our BVC schemes provide all the desirable properties of a practical VC scheme, together with the unique property that the storage overhead decreases with the number of datasets. This storage efficiency, however, comes at a somewhat higher cost for the server to compute the proofs. In Sect. 4 we compare our schemes with [8], which supports the same functionality when applied to the s datasets individually.

Composition. Our BVC schemes support composition. Let \(m=(m_{i,j})_{N\times s}\) be the client’s outsourced data, and \(\mathcal{P}_1=(f_1,I_1), \ldots , \mathcal{P}_n=(f_n, I_n)\) be n programs, where \(f_i\) is a function and \(I_i \subseteq [N]\) for every \(i\in [n]\). Computing the n programs on the datasets gives a matrix \(\rho =(\rho _{i,j})_{n\times s}\) of computation results and n proofs \(\pi _1,\ldots ,\pi _n\), where \(\rho _{i,j}\) is the result of computing \(f_i\) on the jth dataset and \(\pi _i\) is a proof of the correctness of the ith row of \(\rho \). Our schemes allow composition in the sense that there is a polynomial time algorithm \(\mathsf{Comb}\) that takes \((\rho , (\pi _1,\ldots ,\pi _n))\) and any program \(\mathcal{P}=(f(x_1,\ldots ,x_n), I=[n])\) as input, and outputs \(\xi _1=f(\rho _{1,1},\ldots ,\rho _{n,1}), \ldots , \xi _s=f(\rho _{1,s},\ldots , \rho _{n,s})\) along with a batch proof \(\pi \). Moreover, the client’s cost to verify \(\xi _1,\ldots ,\xi _s\) is substantially less than what is required by computing \(\xi _1,\ldots ,\xi _s\).

1.2 Overview of the Constructions

We use a novel interpretation of the technique in [8], applied to multiple datasets, to design schemes that satisfy properties (p1)–(p5). Let \(m=(m_{i,j})_{N\times s}\) be a collection of s datasets that are to be outsourced. We authenticate the s elements in each row of m using a single authentication tag that has size equal to an entry of m. This immediately results in a storage overhead of \(1+1/s\). The N tags are generated such that the cloud server can compute any program \(\mathcal{P}=(f,I)\) on the s datasets, and also produce a single proof that verifies the correctness of all s computation results. The main idea is a generalization of the technique of [8] to s elements. We pick a curve (or a plane) \(\sigma _i\) that passes through the s points determined by the s elements in the ith row of m and also a point determined by a pseudorandom value \(F_k(i)\), where F is a pseudorandom function; the stored tag is a single field element that can be used by the server to determine \(\sigma _i\); the computation of any program \(\mathcal{P}=(f,I)\) on all s outsourced datasets can then be efficiently verified using a single computation of f on the pseudorandom values \(\{F_k(i): i\in I\}\).

In the first scheme, the client picks a secret key \(sk=(k,a)\leftarrow \mathcal{K}\times (\mathbb {Z}_p\setminus \{0,1,\ldots ,s\})\) and determines a univariate polynomial \(\sigma _i(x)\) of degree \(\le s\) that passes through the \(s+1\) points \((1,m_{i,1}), \ldots , (s, m_{i,s})\) and \((a, F_k(i))\), for every \(i\in [N]\). The client takes the coefficient of \(x^s\) in \(\sigma _i(x)\) as the tag \(t_i\) that authenticates all data elements in the ith row of m, i.e., \(m_{i,1},\ldots , m_{i,s}\). The client stores \(pk=(m, \mathbf{t}=(t_1,\ldots ,t_N))\) on the cloud server. Let \(\mathcal{P}=(f,I)\) be a program where \(f(x_1,\ldots ,x_n)\) is a polynomial, and \(I=\{i_1,\ldots ,i_n\}\subseteq [N]\) specifies the elements of each dataset that are used in the computation of f. Given the program \(\mathcal{P}\), the server returns both the s computation results \(\rho _1=f(m_{i_1,1},\ldots , m_{i_n,1}), \ldots , \rho _s=f(m_{i_1,s},\ldots ,m_{i_n,s})\) and a proof \(\pi =f(\sigma _{i_1}(x),\ldots ,\sigma _{i_n}(x))\). The client accepts all s results only if \(\pi (j)=\rho _j\) for every \(j\in [s]\) and \(\pi (a)= f(F_k(i_1),\ldots ,F_k(i_n))\). In the second scheme, the client picks a secret key \(sk=(k,\mathbf{a}=(a_0,a_1,\ldots ,a_s))\leftarrow \mathcal{K}\times (\mathbb {Z}_p^*)^{s+1}\) and determines an \((s+1)\)-variate polynomial \(\sigma _i(\mathbf{y})=\sigma _i(y_0,y_1,\ldots ,y_s)= t_i \cdot y_0+m_{i,1}\cdot y_1+\cdots + m_{i,s}\cdot y_s\) that passes through the \(s+1\) points \((\mathbf{e}_2,m_{i,1}), \ldots , (\mathbf{e}_{s+1}, m_{i,s})\) and \((\mathbf{a}, F_k(i))\) for every \(i\in [N]\), where \(\mathbf{e}_j\in \mathbb {Z}_p^{s+1}\) is a 0–1 vector whose jth entry is equal to 1 and all other entries are equal to 0. The client stores \(pk=(m, \mathbf{t}=(t_1,\ldots ,t_N))\) on the cloud server. 
Given the program \(\mathcal{P}=(f,I)\), the server returns both the s computation results \(\rho _1, \ldots , \rho _s\) and a proof \(\pi =f(\sigma _{i_1}(\mathbf{y}),\ldots ,\sigma _{i_n}(\mathbf{y}))\). The client accepts all s results only if \(\pi (\mathbf{e}_{j+1})=\rho _j\) for every \(j\in [s]\) and \(\pi (\mathbf{a})= f(F_k(i_1),\ldots ,F_k(i_n))\).
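As a concrete illustration of the first scheme, the following toy sketch uses a small prime and SHA-256 as a stand-in PRF (the names `prf`, `tag_row`, and `recover_sigma` are ours, not the paper’s, and real parameters would use \(p\approx 2^{128}\)). One natural way for the server to determine \(\sigma _i\) from the tag is \(\sigma _i(x)=t_i\cdot \prod _{j=1}^{s}(x-j)+L_i(x)\), where \(L_i\) interpolates the s data points; the sketch checks the two verification conditions for \(f(x_1,x_2)=x_1x_2+x_1\):

```python
# Toy sketch of the first BVC scheme (univariate-polynomial tags).
import hashlib

p = 2**61 - 1  # toy prime; the paper suggests p ~ 2^128

def prf(k: bytes, i: int) -> int:            # PRF stand-in F_k(i)
    return int.from_bytes(hashlib.sha256(k + i.to_bytes(8, 'big')).digest(), 'big') % p

# -- polynomial arithmetic over Z_p (coefficient lists, low degree first) --
def padd(f, g):
    n = max(len(f), len(g))
    f = f + [0]*(n - len(f)); g = g + [0]*(n - len(g))
    return [(a + b) % p for a, b in zip(f, g)]

def pmul(f, g):
    out = [0]*(len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i+j] = (out[i+j] + a*b) % p
    return out

def peval(f, x):                             # Horner evaluation mod p
    acc = 0
    for c in reversed(f):
        acc = (acc*x + c) % p
    return acc

def interpolate(pts):                        # Lagrange interpolation over Z_p
    f = [0]
    for xi, yi in pts:
        num, den = [1], 1
        for xj, _ in pts:
            if xj != xi:
                num = pmul(num, [(-xj) % p, 1])
                den = den * (xi - xj) % p
        f = padd(f, [yi * pow(den, -1, p) % p * c % p for c in num])
    return f

# -- client: one tag authenticates the whole ith row --
def tag_row(k, a, i, row):                   # row = (m_{i,1},...,m_{i,s})
    s = len(row)
    pts = [(j+1, row[j]) for j in range(s)] + [(a, prf(k, i))]
    sigma = interpolate(pts)                 # degree <= s
    return sigma[s] if len(sigma) > s else 0 # t_i = coefficient of x^s

# -- server: rebuild sigma_i(x) = t_i * prod_{j=1}^{s}(x - j) + L_i(x) --
def recover_sigma(t, row):
    s = len(row)
    Z = [1]
    for j in range(1, s+1):
        Z = pmul(Z, [(-j) % p, 1])
    L = interpolate([(j+1, row[j]) for j in range(s)])
    return padd([t*c % p for c in Z], L)

# Example: s = 3 datasets, f(x1, x2) = x1*x2 + x1 on rows i1 = 0, i2 = 1.
k, a = b'secret-prf-key', 123456789          # a must avoid {0, 1, ..., s}
m = [[3, 1, 4], [1, 5, 9]]                   # 2 rows, 3 datasets
tags = [tag_row(k, a, i, m[i]) for i in range(2)]

sig0, sig1 = recover_sigma(tags[0], m[0]), recover_sigma(tags[1], m[1])
pi = padd(pmul(sig0, sig1), sig0)            # proof pi = f(sigma_0, sigma_1)
rho = [m[0][j]*m[1][j] + m[0][j] for j in range(3)]  # honest results

# client-side checks: pi(j) = rho_j for j in [s], and pi(a) = f(F_k(0), F_k(1))
assert all(peval(pi, j+1) == rho[j] % p for j in range(3))
assert peval(pi, a) == (prf(k, 0)*prf(k, 1) + prf(k, 0)) % p
```

Note that the client’s checks only require s polynomial evaluations and one evaluation of f on the PRF values, independent of the row data itself.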

In both schemes the server’s computation consists of PRF computations and polynomial evaluations over a relatively small finite field \(\mathbb {Z}_p\). In Sect. 4 we will show that the first scheme admits the computation of polynomials of degree as high as any polynomial in the security parameter \(\lambda \), and the second scheme admits the computation of polynomials of any constant degree, where the constant can be much larger than two. In both schemes, the client’s complexity of verifying all s computation results is dominated by a single computation of f on the n pseudorandom values \(F_k(i_1),\ldots ,F_k(i_n)\). In particular, this complexity becomes substantially less than the complexity incurred by the s outsourced computations when s is large enough. In both of our schemes the s datasets of size N contained in m are authenticated using a single vector \(\mathbf t\) of N tags, where each tag is a single field element. As a consequence, the storage overhead of both schemes is equal to \((|m|+|\mathbf{t}|)/|m|=(Ns+N)/(Ns)=1+1/s\), which can be arbitrarily close to the lower bound 1 as long as s is large enough. Hence, our schemes achieve properties (p1)–(p4).

In our schemes, a malicious cloud server may want to deceive the client into accepting some wrong results \((\bar{\rho }_1,\ldots ,\bar{\rho }_s)\ne (\rho _1,\ldots ,\rho _s)\) with a forged proof \(\bar{\pi }\). In the first scheme, the forged proof \(\bar{\pi }\), like the correct proof \(\pi \), is a univariate polynomial of degree \(\le d_1=s\cdot \deg (f)\). The malicious server succeeds only if \((\bar{\pi }(1),\ldots ,\bar{\pi }(s))=(\bar{\rho }_1,\ldots ,\bar{\rho }_s)\ne (\rho _1,\ldots ,\rho _s)=(\pi (1),\ldots ,\pi (s))\) and \(\bar{\pi }(a)= f(F_k(i_1),\ldots ,F_k(i_n))=\pi (a)\). Let \(\bar{\pi }-\pi =u_0+u_1x+\cdots +u_{d_1}x^{d_1}\) and \(\mathbf{a}=(1,a,\ldots ,a^{d_1})\). Then breaking the security of our first scheme is equivalent to finding a non-zero vector \(\mathbf{u}=(u_0,\ldots ,u_{d_1})\) such that the inner product \(\mathbf{u}\cdot \mathbf{a}=0\). In the second scheme, the forged proof \(\bar{\pi }\), like the correct proof \(\pi \), is an \((s+1)\)-variate polynomial of degree \(\le d_2=\deg (f)\). The malicious server succeeds only if \((\bar{\pi }(\mathbf{e}_2),\ldots ,\bar{\pi }(\mathbf{e}_{s+1}))=(\bar{\rho }_1,\ldots ,\bar{\rho }_s)\ne (\rho _1,\ldots ,\rho _s)=(\pi (\mathbf{e}_2),\ldots ,\pi (\mathbf{e}_{s+1}))\) and \(\bar{\pi }(\mathbf{a})= f(F_k(i_1),\ldots ,F_k(i_n))=\pi (\mathbf{a})\), where \(\mathbf{a}=(a_0,\ldots ,a_s)\). Let \(\bar{\pi }-\pi \) have coefficient vector \(\mathbf{u}\in \mathbb {Z}_p^h\) and let \({\varvec{\alpha }}=\langle \mathbf{a}^\mathbf{i}: \mathrm{wt}(\mathbf{i})\le d_2 \rangle \in \mathbb {Z}_p^h\), where \(h={s+1+d_2\atopwithdelims ()d_2}\) and \(\mathbf{a}^\mathbf{i}=a_0^{i_0}a_1^{i_1}\cdots a_s^{i_s}\) for every \(\mathbf{i}=(i_0,i_1,\ldots ,i_s)\). Then breaking the security of our second scheme is equivalent to finding a non-zero vector \(\mathbf{u}\) such that \(\mathbf{u}\cdot {\varvec{\alpha }}=0\). In Sect. 2, we provide a technical lemma showing that the probability that any adversary finds such a vector \(\mathbf u\) in either scheme is negligible in \(\lambda \), from which the security proofs follow.
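The intuition for the first scheme can be illustrated with a toy experiment: a fixed non-zero univariate polynomial of degree \(d_1\) over \(\mathbb {Z}_p\) has at most \(d_1\) roots, so a hidden uniformly random a satisfies \(\mathbf{u}\cdot (1,a,\ldots ,a^{d_1})=0\) with probability at most \(d_1/p\). A minimal check with illustrative toy parameters (small p so we can enumerate all of \(\mathbb {Z}_p\)):

```python
# A fixed non-zero degree-d polynomial over Z_p has at most d roots,
# so a uniformly random secret a is a root with probability <= d/p.
import random

p, d = 10007, 5
random.seed(1)
# random coefficients u_0..u_{d-1}, with a non-zero leading coefficient u_d
u = [random.randrange(p) for _ in range(d)] + [random.randrange(1, p)]

def ev(f, x):                     # Horner evaluation mod p
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % p
    return acc

roots = sum(1 for a in range(p) if ev(u, a) == 0)
assert roots <= d                 # hence Pr_a[u . (1, a, ..., a^d) = 0] <= d/p
```

With \(p\approx 2^{128}\) and \(d_1=s\cdot \deg (f)\) polynomial in \(\lambda \), the bound \(d_1/p\) is negligible, which is the shape of the lemma’s guarantee.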

In both schemes, the client can easily authenticate an arbitrary number of new rows using the same secret key and thus extend the size of all datasets. The second scheme also allows the number of datasets to be increased. To add a new dataset \((m_{1,s+1},\ldots ,m_{N,s+1})\), the client picks \((k^\prime ,a_{s+1})\leftarrow \mathcal{K}\times \mathbb {Z}_p^*\), and sends both \((m_{1,s+1},\ldots , m_{N,s+1})\) and \((\varDelta _1,\ldots , \varDelta _N)\) to the cloud server, where \(\varDelta _i=a_0^{-1}(F_k(i)-F_{k^\prime }(i)+a_{s+1}\cdot m_{i,s+1})\) for every \(i\in [N]\). The cloud server will update m to \((m_{i,j})_{N\times (s+1)}\) and update \(\mathbf{t}\) to \(\mathbf{t}^\prime =(t^\prime _1,\ldots ,t^\prime _N)\), where \(t^\prime _i=t_i-\varDelta _i\) for every \(i\in [N]\). Intuitively, doing so reveals no information about \(\mathbf{a}^\prime =(a_0,\ldots ,a_s,a_{s+1})\) to the cloud server. Each \(t^\prime _i\) is computed such that \(\sigma _i^\prime (y_0,\ldots ,y_{s+1})=t_i^\prime \cdot y_0+ m_{i,1}\cdot y_1+\cdots +m_{i,s+1}\cdot y_{s+1}\) passes through \((\mathbf{a}^\prime , F_{k^\prime }(i)), (\mathbf{e}_2, m_{i,1}),\ldots , (\mathbf{e}_{s+2}, m_{i,s+1})\). Thus, all the algorithms of the second scheme will work well with the new secret key \(sk^\prime =(k^\prime ,\mathbf{a}^\prime )\). We show that breaking the security of this extended scheme is equivalent to finding a non-zero vector \(\mathbf u\) such that \(\mathbf{u}\cdot {\varvec{\alpha }}^\prime =0\), where \({\varvec{\alpha }}^\prime =\langle (\mathbf{a}^\prime )^\mathbf{i}: \mathrm{wt}(\mathbf{i})\le d_2 \rangle \). We show that this cannot be done except with negligible probability. Thus the second scheme also satisfies (p5).
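The update rule above can be checked numerically. The following toy sketch (small prime, SHA-256 as a PRF stand-in; all variable names are ours) verifies for a single row that publishing \(\varDelta _i\) lets the server turn the old tag \(t_i\) into a tag valid under the new key \((k^\prime ,\mathbf{a}^\prime )\):

```python
# Sketch of the second scheme's dataset addition (toy parameters).
import hashlib
import random

p = 2**61 - 1

def prf(k: bytes, i: int) -> int:            # PRF stand-in F_k(i)
    return int.from_bytes(hashlib.sha256(k + i.to_bytes(8, 'big')).digest(), 'big') % p

random.seed(7)
s, row = 3, [3, 1, 4]                        # one row, s = 3 datasets
k, k2 = b'old-key', b'new-key'               # k2 plays the role of k'
a = [random.randrange(1, p) for _ in range(s + 2)]   # a_0, ..., a_{s+1}

# old tag: t_i = a_0^{-1} (F_k(i) - sum_j m_{i,j} a_j), so sigma_i(a) = F_k(i)
inv_a0 = pow(a[0], -1, p)
t = inv_a0 * (prf(k, 0) - sum(row[j]*a[j+1] for j in range(s))) % p

m_new = 7                                    # new element m_{i,s+1}
# published correction Delta_i = a_0^{-1}(F_k(i) - F_{k'}(i) + a_{s+1} m_{i,s+1})
delta = inv_a0 * (prf(k, 0) - prf(k2, 0) + a[s+1]*m_new) % p
t_new = (t - delta) % p                      # server-side update t'_i = t_i - Delta_i

# new tag must satisfy sigma'_i(a') = F_{k'}(i) over the extended row
row2 = row + [m_new]
lhs = (t_new*a[0] + sum(row2[j]*a[j+1] for j in range(s+1))) % p
assert lhs == prf(k2, 0)
```

The check confirms that the server can extend the tag column without ever seeing \(\mathbf{a}^\prime \) or recomputing tags from scratch.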

In both schemes, the composition property follows from the intrinsic structure of the constructions. Let \(\mathcal{P}_1=(f_1,I_1), \ldots , \mathcal{P}_n=(f_n,I_n)\) be n programs. In the first scheme the cloud server would compute these programs on \(pk=(m,\mathbf{t})\) and then obtain a matrix \((\rho _{i,j})_{n\times s}\) of results and n proofs \((\pi _1,\ldots ,\pi _n)\). Given any high level program \(\mathcal{P}=(f(x_1,\ldots ,x_n),I=[n])\), the cloud server would be able to compute \(\mathcal{P}\) on \((\rho _{i,j})_{n\times s}\) to obtain s results \(\xi _1,\ldots ,\xi _s\) and also compute \(\mathcal P\) on \((\pi _1,\ldots ,\pi _n)\) to obtain a proof \(\pi =f(\pi _1,\ldots ,\pi _n)\).

1.3 Related Work

The problem of securely outsourcing computation has a long history. We refer the reader to [5, 14] for solutions that require strong assumptions on the adversary, and to [19] for theoretical solutions that require interaction. We are only interested in non-interactive solutions in the standard model.

Verifiable Computation. The verifiable computation model of Gennaro et al. [14] gave a non-interactive solution for securely outsourcing computation in the standard model. The VC schemes of [2, 11, 14] can delegate generic functions but have limited practical relevance due to their use of fully homomorphic encryption (FHE). Memory delegation [10] can delegate computations on an arbitrary portion of the outsourced data; however, the client must be stateful, and the scheme suffers from the impracticality of PCP techniques. Benabbas et al. [5] initiated the study of practical (private) VC schemes for delegating specific functions such as polynomials. Parno et al. [21] initiated the study of public VC schemes. Fiore et al. [13] generalized the constructions of [5] and obtained public VC schemes for delegating polynomials and matrices. Papamanthou et al. [20] constructed a public VC scheme for delegating polynomials that allows efficient updates. The storage overhead of all these schemes is \(\ge 2\). Furthermore, they only admit linear computations on the outsourced data. In particular, the multi-function VC of [13] has a setting similar to ours but only admits linear computations and has storage overhead \(\ge 2\).

Homomorphic MACs and Signatures. A homomorphic MAC or signature scheme [7, 16] allows one to freely authenticate data and then verify computations on the authenticated data. Such schemes yield VC schemes: the client stores data elements and their MAC tags (or signatures) with a server such that the server can compute some admissible functions on an arbitrary subset of the data elements; the server provides both the answer and a MAC tag (or signature) vouching for the correctness of its answer. The storage overhead of the resulting VC scheme is \(\ge 2\). Catalano and Fiore [8] proposed a practical HomMAC that admits polynomials of degree as high as a polynomial in the security parameter. However, the client’s verification requires as much time as the delegated computation. Backes, Fiore and Reischuk [4] proposed a HomMAC with amortized efficient verification, but it only admits polynomials of degree \(\le 2\).

Non-interactive Proofs. Goldwasser et al. [18] gave a non-interactive scheme for delegating NC computations. However, for a circuit of size n, the server’s running time may be a high-degree polynomial in n, which is not practical. The SNARGs/SNARKs of [3, 6, 15] give non-interactive schemes for delegating computations. However, they must rely on non-falsifiable assumptions [17], which are not standard and are much stronger than common assumptions such as the existence of secure PRFs, which we use in this paper.

1.4 Organization

In Sect. 2 we provide a formal definition of batch verifiable computation and its security, and develop a lemma which will be used in our security proofs. In Sect. 3 we present our BVC schemes. In Sect. 4 we give a detailed analysis of the proposed schemes, compare them with the solutions based on [4, 8], and discuss extra properties of our schemes such as composition. Sect. 5 contains some concluding remarks.

2 Preliminaries

Let \(\lambda \) be a security parameter. We say that a function \(q(\lambda )\) is a polynomial function of \(\lambda \), denoted as \(q(\lambda )=\mathsf{poly}(\lambda )\), if there is a real number \(c>0\) such that \(q(\lambda )=O(\lambda ^c)\); we say that a function \(\epsilon (\lambda )\) is a negligible function of \(\lambda \), denoted as \(\epsilon (\lambda )= \mathsf{neg}(\lambda )\), if \(\epsilon (\lambda )=o(\lambda ^{-c})\) for every real number \(c>0\). Let \(\mathcal{A}(\cdot )\) be a probabilistic polynomial time (PPT) algorithm. The symbol “\(y\leftarrow \mathcal{A}(x)\)” means that y is sampled from the output distribution of running algorithm \(\mathcal A\) on input x. We denote by \(\mathbf{u}=\langle u_x: x\in X\rangle \) any vector whose entries are labeled by elements of the finite set X.

2.1 Batch Verifiable Computation on Outsourced Data

In this section we formally define the notion of batch verifiable computation on outsourced data. In our model, the client has a set of data elements and stores them on the cloud server. The set is organized as a matrix \(m=(m_{i,j})_{N\times s}\), where each element \(m_{i,j}\) is labeled with a pair \((i,j)\in [N]\times [s]\). Each column of m is called a dataset. Let \(\mathcal{F}\) be any admissible function family. The client is interested in delegating the computation of some function \(f(x_1,\ldots ,x_n)\in \mathcal{F}\) on the n elements labeled by \(I=\{i_1,\ldots ,i_n\} \subseteq [N]\) of every dataset. In other words, the client is interested in learning \(\rho _1=f(m_{i_1,1},\ldots ,m_{i_n,1}), \rho _2=f(m_{i_1,2},\ldots ,m_{i_n,2}), \ldots , \rho _s=f(m_{i_1,s},\ldots , m_{i_n,s})\). We say that such a batch of computations is defined by a program \(\mathcal{P}=(f,I) \in \mathcal{F}\times 2^{[N]}\).

Definition 1

(Batch Verifiable Computation). A BVC scheme for \(\mathcal F\) is a tuple \(\varPi =(\mathsf{KeyGen, ProbGen,}\) \(\mathsf{Compute, Verify})\) of four polynomial-time algorithms, where

  • \((sk,pk)\leftarrow \mathsf{KeyGen}(1^\lambda ,m)\) is a key generation algorithm that takes as input the security parameter \(\lambda \) and a set \(m=(m_{i,j})_{N\times s}\) of data elements and outputs a secret key sk and a public key pk;

  • \(vk\leftarrow \mathsf{ProbGen}(sk,\mathcal{P})\) is a problem generation algorithm that takes as input sk, a program \(\mathcal{P}=(f,I)\in \mathcal{F}\times 2^{[N]}\) and outputs a verification key vk;

  • \((\rho ,\pi )\leftarrow \mathsf{Compute}(pk, \mathcal{P})\) is a computation algorithm that takes as input pk and a program \(\mathcal{P}=(f,I) \in \mathcal{F}\times 2^{[N]}\) and outputs an answer \(\rho =(\rho _1,\ldots ,\rho _s)\) and a proof \(\pi \); and

  • \(\{0,1\}\leftarrow \mathsf{Verify}(sk, vk,(\rho ,\pi ))\) is a verification algorithm that verifies \(\rho \) with \((sk,vk,\pi )\); it outputs 1 (to indicate acceptance) or 0 (to indicate rejection).

In our BVC model, the client generates \((sk,pk)\leftarrow \mathsf{KeyGen}(1^\lambda ,m)\) and gives pk to the server. To compute some program \(\mathcal{P}=(f,I)\) on the outsourced data, the client generates \(vk\leftarrow \mathsf{ProbGen}(sk,\mathcal{P})\) and gives \(\mathcal{P}\) to the server. Given \((pk,\mathcal{P})\), the server computes and replies with \((\rho ,\pi )\leftarrow \mathsf{Compute}(pk,\mathcal{P})\). Finally, the client accepts \(\rho \) only if \(\mathsf{Verify}(sk,vk,(\rho ,\pi ))=1\).

Correctness. This property requires that the client always accepts the results computed by an honest server (using the algorithm \(\mathsf{Compute}\)). Formally, the scheme \(\varPi \) is correct if for any data \(m=(m_{i,j})\), any \((sk,pk)\leftarrow \mathsf{KeyGen}(1^\lambda ,m)\), any program \(\mathcal{P}\), any \(vk\leftarrow \mathsf{ProbGen}(sk,\mathcal{P})\) and any \((\rho ,\pi )\leftarrow \mathsf{Compute}(pk,\mathcal{P})\), it holds that \(\mathsf{Verify}(sk,vk,(\rho ,\pi ))=1\).

Security. This property requires that no malicious server can deceive the client into accepting any incorrect results. Formally, the scheme \(\varPi \) is said to be secure if any PPT adversary \(\mathcal A\) wins with probability \(<\mathsf{neg}(\lambda )\) in the security game of Fig. 1.

Fig. 1. Security game

Remarks: (1) In the Forgery phase the adversary \(\mathcal A\) behaves just as it did in one of the q queries. Without loss of generality, we can suppose \((\mathcal{P}^*,\bar{\rho }^*, \bar{\pi }^*)=(\mathcal{P}_{\ell ^*},\bar{\rho }_{\ell ^*}, \bar{\pi }_{\ell ^*})\) for some \(\ell ^*\in [q]\), i.e., \(\mathcal A\) picks one of its q queries as the final forgery. (2) In the literature, many VC schemes such as [2, 11, 14] are not immune to the “rejection problem”: if the malicious server knows whether the client has accepted or rejected its answer, then the algorithm \(\mathsf{KeyGen}\) (requiring heavy computation effort) must be run again to refresh both sk and pk; otherwise, the VC scheme is no longer secure. In our security definition, the adversary \(\mathcal A\) is allowed to make a polynomial number of queries and learns whether some adaptively chosen answers in each query will be accepted by the client. Therefore, the BVC schemes secure under our definition will be immune to the “rejection problem”. (3) Our definition of BVC is different from the VC model of [5] in the sense that we neither consider the outsourced data as a function nor consider the client’s input to \(\mathsf{ProbGen}\) as an input from that function’s domain. In our definition, the client’s input to \(\mathsf{ProbGen}\) is a program \(\mathcal{P}=(f,I) \in \mathcal{F}\times 2^{[N]}\) that specifies the computations of an admissible function f on the portion labeled by I of every dataset. Clearly our definition captures more general scenarios than [5]. In particular, the VC model of [5] can be captured by our BVC as follows.
Let m(x) be the client’s function which will be delegated to the cloud server (e.g., m(x) may be a polynomial \(m_1+m_2x+\cdots +m_Nx^{N-1}\) in [5]); from our point of view, the coefficients \((m_1,\ldots ,m_N)\) of the polynomial m(x) is a dataset; and furthermore, any input \(\alpha \) to the polynomial m(x) specifies a program \(\mathcal{P}=(f_\alpha , [N])\), where \(f_\alpha (m_1,\ldots ,m_N)=m(\alpha )\). Therefore, the polynomial evaluations considered in [5] can be captured by some specific linear computations in our BVC model. (4) In our BVC, the client’s verification requires the secret key sk. Thus, our BVC schemes are privately verifiable. (5) A critical efficiency measure of the BVC scheme in Definition 1 is to what extent the client’s verification requires less computing time (resources) than the delegated computations. The client’s verification in [5, 9, 13, 14, 20, 21] is efficient in the sense that it requires substantially less time than performing the delegated computation. In our BVC, the client performs verification by generating a verification key \(vk\leftarrow \mathsf{ProbGen}(sk,\mathcal{P})\) and then running the verification algorithm \(\mathsf{Verify}(sk,vk,(\rho ,\pi ))\). The client’s verification time is equal to the total time required for running both algorithms. Let \(t_\mathcal{P}\) be the time required for computing the program \(\mathcal P\) on the outsourced data. We say that a BVC scheme is outsourceable if the client’s verification time is of the order \(o(t_\mathcal{P})\). In this paper, we shall construct BVC schemes that are outsourceable.
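To make the reduction of remark (3) concrete, the following minimal Python sketch (toy prime and coefficients, all illustrative) checks that the linear program \(f_\alpha\) with public coefficients \((1,\alpha,\ldots,\alpha^{N-1})\) applied to the dataset \((m_1,\ldots,m_N)\) reproduces the evaluation \(m(\alpha)\):

```python
# Sketch: the VC-for-polynomials setting of [5] viewed as a BVC program.
# The coefficients (m_1,...,m_N) form one dataset; an evaluation point
# alpha induces the linear function f_alpha(m_1,...,m_N) = m(alpha).
p = 2**61 - 1                            # toy lambda-bit prime
m = [3, 1, 4, 1, 5]                      # coefficients of m(x), N = 5
alpha = 7
# f_alpha is linear with public coefficients (1, alpha, ..., alpha^{N-1})
coeffs = [pow(alpha, j, p) for j in range(len(m))]
f_alpha = sum(c * mj for c, mj in zip(coeffs, m)) % p
# direct Horner evaluation of m(alpha) agrees
direct = 0
for mj in reversed(m):
    direct = (direct * alpha + mj) % p
assert f_alpha == direct
```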

2.2 A Lemma

In this section we present a lemma (Lemma 1) that underlies the security proofs of our BVC schemes. Let \(\lambda \) be a security parameter. Let p be a \(\lambda \)-bit prime and let \(\mathbb {Z}_p\) be the finite field of p elements. Let \(h\ge 0\) be an integer. We define an equivalence relation \(\sim \) over \(\mathbb {Z}_p^{h+1}\setminus \{\mathbf{0}\}\) as below: two vectors \({\mathbf{u}},{\mathbf{v}}\in \mathbb {Z}_p^{h+1}\setminus \{\mathbf{0}\}\) are said to be equivalent if there exists \(\xi \in \mathbb {Z}_p\setminus \{0\}\) such that \({\mathbf{u}}=\xi \cdot {\mathbf{v}}\). Let \(\varOmega _{p,h}=(\mathbb {Z}_p^{h+1}\setminus \{\mathbf{0}\})/\sim \) be the set of all equivalence classes. We represent each equivalence class with a vector in that class. Without loss of generality, we agree that the representative of each class in \(\varOmega _{p,h}\) is chosen such that its first non-zero element is 1. For any \({\mathbf{u}},{\mathbf{v}}\in \varOmega _{p,h}\), we define \({\mathbf{u}}\odot {\mathbf{v}}=0\) if the inner product of \(\mathbf{u}\) and \(\mathbf{v}\) is equal to 0 modulo p and define \({\mathbf{u}}\odot {\mathbf{v}}=1\) otherwise. The following game models the malicious server’s attack in our BVC schemes.

\(\mathbf{Game}_\mathcal{V}\mathbf{.}\) Let \(\mathcal A\) be any algorithm, let \(\mathcal{V}\subseteq \varOmega _{p,h}\) and let \(q=\mathsf{poly}(\lambda )\). In this game, a vector \({\mathbf{v}}^*\leftarrow \mathcal{V}\) is chosen and hidden from \(\mathcal A\); for \(i=1\) to q, \(\mathcal A\) adaptively picks a query \({\mathbf{u}}_i\in \varOmega _{p,h}\) and learns \(b_i={\mathbf{u}}_i\odot {\mathbf{v}}^*\in \{0,1\}\). \(\mathcal A\) wins the game if there exists an index \(i^*\in [q]\) such that \(b_{i^*}=0\).

In Appendix A, we show the following technical lemma:

Lemma 1

Let p be a prime and let \(d,h, s> 0\) be integers.

  1. Let \(A\subseteq \mathbb {Z}_p\) be a non-empty subset of \(\mathbb {Z}_p\). Let \(\mathcal{V}_\mathrm{up}=\{(1,a,\ldots , a^h): a\in A\}\). Then any adversary \(\mathcal A\) wins in \(\mathbf{Game}_{\mathcal{V}_\mathrm{up}}\) with probability \(\le hq/|A|\).

  2. Let \(\mathcal{V}_\mathrm{mp}=\{\langle {\mathbf{a}}^{\mathbf{i}}: \mathrm{wt}({\mathbf{i}})\le d\rangle : \mathbf{a}\in A^{s+1}\}\), where \(h=\binom{s+1+d}{d}-1\). Then any adversary \(\mathcal A\) wins in \(\mathbf{Game}_{\mathcal{V}_\mathrm{mp}}\) with probability \(\le dq/|A|\).
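Part (1) of the lemma reflects simple root counting: a single query \(\mathbf{u}\) against \(\mathcal{V}_\mathrm{up}\) defines a polynomial of degree \(\le h\) in a, which vanishes on at most h choices of a. A small Python sketch (toy prime p = 101, an illustrative query; scaling \(\mathbf{u}\) does not change \(\mathbf{u}\odot\mathbf{v}\), so the class-representative convention is immaterial here):

```python
p = 101
h = 2
A = list(range(p))                 # A = Z_p for this toy run
# adversary's query u encodes the polynomial (x-3)(x-5) = 15 - 8x + x^2
u = [15, (-8) % p, 1]
# u . (1, a, ..., a^h) = 0  iff  a is a root of that polynomial mod p
hits = [a for a in A
        if sum(ui * pow(a, j, p) for j, ui in enumerate(u)) % p == 0]
assert hits == [3, 5]              # at most h = 2 roots, matching the hq/|A| bound
```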

3 Constructions

In this section we propose two BVC schemes for delegating polynomial computations on outsourced data. Our schemes use curves and planes, respectively, to authenticate the outsourced data.

3.1 The First Construction

Let p be a \(\lambda \)-bit prime and let \({F}: \mathcal{K}\times \{0,1\}^*\rightarrow \mathbb {Z}_p\) be a PRF with key space \(\mathcal K\), domain \(\{0,1\}^*\) and range \(\mathbb {Z}_p\). Let \(s>0\) be an integer. Let \(m=(m_{i,j})\in \mathbb {Z}_p^{N\times s}\) be a matrix that models the client’s data. We consider \(1,2,\ldots ,s\) as elements of \(\mathbb {Z}_p\). Below is our first construction \(\varPi _1\).

  • \((sk,pk)\leftarrow \mathsf{KeyGen}(1^\lambda , m)\): Pick \(k\leftarrow \mathcal{K}\) and \(a\leftarrow \mathbb {Z}_p\setminus \{0,1,2,\ldots ,s\}\). For every \(i\in [N]\), compute the coefficients of a polynomial \(\sigma _i(x)=\sigma _{i,1}+\sigma _{i,2}\cdot x+\cdots +\sigma _{i,s} \cdot x^{s-1}+t_i\cdot x^s\) such that \(\sigma _i(j)=m_{i,j}\) for every \(j\in [s]\) and \(\sigma _i(a)={F}_k(i)\). This can be done by solving the following equation system

    $$\begin{aligned} \begin{pmatrix} 1 &{} 1 &{} 1 &{} \cdots &{} 1\\ 1 &{} 2 &{} 2^2 &{} \cdots &{} 2^{s}\\ \vdots &{} \vdots &{} \vdots &{} \cdots &{} \vdots \\ 1 &{} s &{} s^2 &{} \cdots &{} s^s\\ 1 &{} a &{} a^2 &{} \cdots &{} a^s \end{pmatrix} \begin{pmatrix} \sigma _{i,1}\\ \sigma _{i,2}\\ \vdots \\ \sigma _{i,s}\\ t_i\\ \end{pmatrix} = \begin{pmatrix} m_{i,1}\\ m_{i,2}\\ \vdots \\ m_{i,s}\\ {F}_k(i) \end{pmatrix} \end{aligned}$$
    (1)

    for every \(i\in [N]\). The algorithm outputs \(pk=(m,\mathbf{t})\) and \(sk=(k,a)\), where \(\mathbf{t}=(t_1,\ldots ,t_N)\).

  • \(vk \leftarrow \mathsf{ProbGen}(sk,\mathcal{P})\): Let \(\mathcal{P}=(f,I)\) be a program, where \(f(x_1,\ldots ,x_n)\) is a polynomial of degree d over \(\mathbb {Z}_p\) and \(I=\{i_1,\ldots , i_n\}\subseteq [N]\) specifies the data elements on which f should be computed. This algorithm computes and outputs a verification key \(vk=f({F}_k(i_1),\ldots , {F}_k(i_n))\).

  • \((\rho ,\pi )\leftarrow \mathsf{Compute}(pk, \mathcal{P})\): Let \(\mathcal{P}=(f,I)\) be a program, where \(f(x_1,\ldots ,x_n)\) is a polynomial of degree d over \(\mathbb {Z}_p\) and \(I=\{i_1,\ldots , i_n\}\subseteq [N]\) specifies the data elements on which f should be computed. This algorithm computes \(\rho _j=f(m_{i_1,j},\ldots , m_{i_n,j})\) for every \(j\in [s]\). It solves the following equation system

    $$\begin{aligned} \begin{pmatrix} 1 &{} 1 &{} 1 &{} \cdots &{} 1\\ 1 &{} 2 &{} 2^2 &{} \cdots &{} 2^{s-1}\\ \vdots &{} \vdots &{} \vdots &{} \cdots &{}\vdots \\ 1 &{} s &{} s^2 &{} \cdots &{} s^{s-1} \end{pmatrix} \begin{pmatrix} \sigma _{i,1}\\ \sigma _{i,2}\\ \vdots \\ \sigma _{i,s} \end{pmatrix} = \begin{pmatrix} m_{i,1}-t_i\\ m_{i,2}-2^st_i\\ \vdots \\ m_{i,s}-s^s t_i \end{pmatrix} \end{aligned}$$
    (2)

    to determine s coefficients \(\sigma _{i,1},\ldots ,\sigma _{i,s}\) for every \(i\in I\). Let \(\sigma _i(x)=\sigma _{i,1}+\sigma _{i,2}\cdot x+\cdots +\sigma _{i,s}\cdot x^{s-1}+t_i\cdot x^s\). This algorithm outputs \(\rho =(\rho _1,\ldots , \rho _{s})\) and \(\pi =f(\sigma _{i_1}(x),\ldots , \sigma _{i_n}(x))\).

  • \(\{0,1\}\leftarrow \mathsf{Verify}(sk,vk,(\rho ,\pi ))\): This algorithm outputs 1 (accepting \(\rho \)) if and only if \(\pi (a)=vk\) and \(\pi (j)=\rho _j\) for every \(j\in [s]\).
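As a sanity check on \(\varPi_1\), the following minimal Python sketch runs the four algorithms end to end on toy parameters. SHA-256 stands in for the PRF F, and the key, the secret point a, the data, and the program \(f(x_1,x_2)=x_1x_2\) are all illustrative choices; the tag \(t_i\) is obtained here by Lagrange interpolation rather than by solving the system (1) directly, which yields the same polynomial \(\sigma_i\):

```python
import hashlib

p = 2**61 - 1                       # toy lambda-bit prime
s, N = 3, 2                         # s datasets, N elements each

def F(k, i):                        # PRF stand-in: SHA-256 (heuristic assumption)
    return int.from_bytes(hashlib.sha256(k + i.to_bytes(4, 'big')).digest(), 'big') % p

def polymul(u, v):                  # polynomial multiplication over Z_p
    r = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            r[i + j] = (r[i + j] + ui * vj) % p
    return r

def polyeval(c, x):                 # Horner evaluation over Z_p
    y = 0
    for ci in reversed(c):
        y = (y * x + ci) % p
    return y

def interpolate(pts):               # Lagrange interpolation -> coefficient list
    res = [0] * len(pts)
    for i, (xi, yi) in enumerate(pts):
        num, den = [1], 1
        for j, (xj, _) in enumerate(pts):
            if j != i:
                num = polymul(num, [(-xj) % p, 1])
                den = den * (xi - xj) % p
        c = yi * pow(den, -1, p) % p
        for idx, nt in enumerate(num):
            res[idx] = (res[idx] + c * nt) % p
    return res

# KeyGen: sigma_i passes through (j, m_{i,j}) for j in [s] and (a, F_k(i));
# the tag t_i is the coefficient of x^s.
k, a = b'secret-key', 123456789     # sk = (k, a), a outside {0,1,...,s}
m = [[2, 3, 5], [7, 11, 13]]        # m_{i,j}, i in [N], j in [s]
sigma = [interpolate([(j, m[i][j - 1]) for j in range(1, s + 1)]
                     + [(a, F(k, i + 1))]) for i in range(N)]
t = [sigma[i][s] for i in range(N)]            # tags published in pk

# Compute for the program P = (f, I) with f(x1, x2) = x1 * x2 and I = {1, 2}
rho = [m[0][j] * m[1][j] % p for j in range(s)]
pi = polymul(sigma[0], sigma[1])               # proof: degree <= 2s

# ProbGen + Verify: vk = f(F_k(1), F_k(2)); check pi(a) = vk and pi(j) = rho_j
vk = F(k, 1) * F(k, 2) % p
assert polyeval(pi, a) == vk
assert all(polyeval(pi, j) == rho[j - 1] for j in range(1, s + 1))
```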

It is easy to see that \(\varPi _1\) is correct. In the full version we show that no PPT adversary can win the standard security game (Fig. 1) for \(\varPi _1\) except with negligible probability. Thus we have:

Theorem 1

If F is a secure PRF, then \(\varPi _1\) is a secure BVC scheme.

3.2 The Second Construction

Let p be a \(\lambda \)-bit prime and let \({F}: \mathcal{K}\times \{0,1\}^*\rightarrow \mathbb {Z}_p\) be a PRF with key space \(\mathcal K\), domain \(\{0,1\}^*\) and range \(\mathbb {Z}_p\). Let \(s>0\) be an integer. Let \(m=(m_{i,j})\in \mathbb {Z}_p^{N\times s}\) be a matrix that models the client’s data. We consider \(1,2,\ldots ,s\) as elements of \(\mathbb {Z}_p\). Below is our second construction \(\varPi _2\).

  • \((sk,pk)\leftarrow \mathsf{KeyGen}(1^\lambda , m)\): Pick \(k\leftarrow \mathcal{K}\) and \(a_0,a_1, \ldots ,a_{s}\leftarrow \mathbb {Z}_p^*\); for every \(i\in [N]\), compute

    $$\begin{aligned} t_i=a_{0}^{-1}({F}_k(i)-a_1\cdot m_{i,1}-\cdots -a_{s}\cdot m_{i,s}). \end{aligned}$$
    (3)

    This algorithm outputs \(pk=(m,\mathbf{t})\) and \(sk=(k,\mathbf{a})\), where \(\mathbf{t}=(t_1,\ldots ,t_N)\) and \(\mathbf{a}=(a_0,a_1,\ldots ,a_s)\).

  • \(vk \leftarrow \mathsf{ProbGen}(sk,\mathcal{P})\): Let \(\mathcal{P}=(f,I)\) be a program, where \(f(x_1,\ldots ,x_n)\) is a polynomial of degree d over \(\mathbb {Z}_p\) and \(I=\{i_1,\ldots , i_n\}\subseteq [N]\) specifies the data elements on which f should be computed. This algorithm computes and outputs a verification key \(vk=f({F}_k(i_1),\ldots , {F}_k(i_n))\).

  • \((\rho ,\pi )\leftarrow \mathsf{Compute}(pk, \mathcal{P})\): Let \(\mathcal{P}=(f,I)\) be a program, where \(f(x_1,\ldots ,x_n)\) is a polynomial of degree d over \(\mathbb {Z}_p\) and \(I=\{i_1,\ldots , i_n\}\subseteq [N]\) specifies the data elements on which f should be computed. This algorithm computes \(\rho _j=f(m_{i_1,j},\ldots , m_{i_n,j})\) for every \(j\in [s]\). Let \(\sigma _i(\mathbf{y})=t_i\cdot y_0+m_{i,1}\cdot y_1+\cdots +m_{i,s}\cdot y_{s} \) for every \(i\in I\), where \(\mathbf{y}=(y_0, y_1,\ldots ,y_s)\). This algorithm outputs s results \(\rho =(\rho _1,\ldots , \rho _{s})\) and a proof \(\pi =f(\sigma _{i_1}(\mathbf{y}),\ldots , \sigma _{i_n}(\mathbf{y}))\).

  • \(\mathsf{Verify}(sk,vk,(\rho ,\pi ))\): This algorithm outputs 1 (accepting \(\rho \)) if and only if \(\pi (\mathbf{a})=vk\) and \(\pi (\mathbf{e}_{j+1})=\rho _j\) for every \(j\in [s]\), where \(\mathbf{e}_{j+1}\in \mathbb {Z}_p^{s+1}\) is the 0–1 vector whose \((j+1)\)-st component is 1 and all other components are 0.
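\(\varPi_2\) can likewise be exercised end to end on toy parameters. In the Python sketch below, SHA-256 stands in for the PRF F, the vector \(\mathbf{a}\) is fixed rather than sampled, and the program is a degree-1 polynomial \(f(x_1,x_2)=3x_1+2x_2\) so that \(\pi\) remains a linear form in \(\mathbf{y}\) (all of these are illustrative assumptions, not part of the scheme):

```python
import hashlib

p = 2**61 - 1                      # toy lambda-bit prime
s, N = 3, 2

def F(k, i):                       # PRF stand-in: SHA-256 (heuristic assumption)
    return int.from_bytes(hashlib.sha256(k + i.to_bytes(4, 'big')).digest(), 'big') % p

# KeyGen: tags t_i = a_0^{-1}(F_k(i) - a_1 m_{i,1} - ... - a_s m_{i,s})
k = b'secret-key'
a = [5, 9, 4, 8]                   # a_0,...,a_s in Z_p^* (fixed here for brevity)
m = [[2, 3, 5], [7, 11, 13]]       # m_{i,j}
inv_a0 = pow(a[0], -1, p)
t = [inv_a0 * (F(k, i + 1) - sum(a[j + 1] * m[i][j] for j in range(s))) % p
     for i in range(N)]

# sigma_i(y) = t_i y_0 + m_{i,1} y_1 + ... + m_{i,s} y_s, stored as a vector
sigma = [[t[i]] + m[i] for i in range(N)]

# Compute for the degree-1 program f(x1, x2) = 3 x1 + 2 x2
c1, c2 = 3, 2
rho = [(c1 * m[0][j] + c2 * m[1][j]) % p for j in range(s)]
pi = [(c1 * sigma[0][j] + c2 * sigma[1][j]) % p for j in range(s + 1)]

# ProbGen + Verify: pi(a) = vk and pi(e_{j+1}) = rho_j
vk = (c1 * F(k, 1) + c2 * F(k, 2)) % p
assert sum(pi[j] * a[j] for j in range(s + 1)) % p == vk
assert all(pi[j + 1] == rho[j] for j in range(s))
```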

It is easy to see that \(\varPi _2\) is correct. In the full version we show that no PPT adversary can win the standard security game (Fig. 1) for \(\varPi _2\) except with negligible probability. Thus we have:

Theorem 2

If F is a secure PRF, then \(\varPi _2\) is a secure BVC scheme.

4 Analysis

In this section we analyze our BVC schemes and compare them with several (naive) solutions based on the existing works [4, 8].

Admissible Function Family. In both of our schemes the integer s may be as large as \(O(\lambda )\), capturing the scenario where a large number of datasets are outsourced.

In \(\varPi _1\) the cloud server’s computation consists of computing f on s points, solving n equation systems of the form (2), and computing a proof \(\pi =f(\sigma _{i_1}(x),\ldots , \sigma _{i_n}(x))\). The first two computations are light for the powerful server. Computing the proof \(\pi \) involves symbolic computation and seems heavy; however, \(\pi \) is a univariate polynomial of degree \(\le sd\), so \(\pi \) can be interpolated from \(D=sd+1\) evaluations, which requires computing f on \(O(D)=O(ds)\) points. This work is acceptable for the cloud server even if \(d=\mathsf{poly}(\lambda )\). Therefore, \(\varPi _1\) allows the computation of polynomials of degree d as high as a polynomial in the security parameter.

In \(\varPi _2\) the cloud server’s computation consists of computing f on s points and computing a proof \(\pi =f(\sigma _{i_1}(\mathbf{y}), \ldots , \sigma _{i_n}(\mathbf{y}))\). The first computation is light for the powerful server. Computing the proof \(\pi \) involves symbolic computation: \(f(x_1,\ldots ,x_n)\) is of degree d and each of the \((s+1)\)-variate polynomials \(\sigma _{i_1}(\mathbf{y}), \ldots , \sigma _{i_n}(\mathbf{y})\) is of degree 1, so the cost of computing \(\pi \) is roughly that of computing f on \((s+1)^d\) points. Furthermore, the server must send a representation of \(\pi \) consisting of \(\binom{s+1+d}{d}\) field elements. If we allow \(s=O(\lambda )\), then the degree d must be restricted to O(1) so that the server’s computation and communication are not too costly. Thus \(\varPi _2\) allows the computation of any O(1)-degree polynomial. This admissible family of O(1)-degree polynomials can be significantly larger than the admissible family of quadratic polynomials in [4].

Efficient Client Verification. Let \(\mathcal{P}=(f,I)\) be a program, where \(f(x_1,\ldots ,x_n)\) is a polynomial function and \(I=\{i_1,\ldots ,i_n\}\subseteq [N]\). Let \((\rho ,\pi )\) be the results and proof generated by \(\mathsf{Compute}\). The verification complexity is measured by the time complexity of running two algorithms: \(\mathsf{ProbGen}(sk,\mathcal{P})\) and \(\mathsf{Verify}(sk,vk,(\rho ,\pi ))\). In our schemes, the time complexity of running \(\mathsf{Verify}\) is independent of n. As we always consider large enough n, the verification complexity in both of our schemes is dominated by the time complexity of running \(\mathsf{ProbGen}(sk,\mathcal{P})\), i.e., the complexity of computing f once on the n pseudorandom values \(F_k(i_1),\ldots ,F_k(i_n)\). This computation requires roughly 1/s of the time required by the s delegated computations of f on the outsourced data. Whenever s is large enough, the client’s verification per dataset uses substantially less time than computing f on that dataset. Hence, our schemes are outsourceable.

Efficient Server Computation. In our schemes, the cloud server’s computation involves only PRF computations and polynomial evaluations over the finite field \(\mathbb {Z}_p\). Note that we require no number-theoretic assumptions. As a result, the finite field \(\mathbb {Z}_p\) can be chosen as small as \(p\approx 2^{128}\) when the security parameter is \(\lambda =128\). In particular, the PRF F in both of our constructions can be instantiated with a heuristic PRF such as the AES block cipher in practical implementations. In Sect. 4.3 we shall see that our server’s computation is significantly more efficient than that of [4].

Efficient Server Storage. The storage overheads of our schemes are equal to |pk| / |m|, where |pk| and |m| denote the numbers of field elements contained in pk and m respectively. Recall that the number |pk| / |m| is always \(\ge 1\) and our objective is making it as close to 1 as possible. It is trivial to see that \(|pk|/|m|=(|m|+|\mathbf{t}|)/|m|=(Ns+N)/Ns=1+1/s\) in our schemes. Therefore, the storage overheads of our schemes can be made arbitrarily close to 1 as long as s is large enough.

Extending the Size of Datasets. In our schemes the client’s outsourced data is a collection \(m=(m_{i,j})_{N\times s}\) of s datasets, each containing N elements. In practice, the client may add new data elements to the outsourced datasets. Let \(\varPi =\varPi _1\) or \(\varPi _2\), and let (pk, sk) be any public key and secret key generated by \(\varPi .\mathsf{KeyGen}(1^\lambda ,m)\). Note that pk takes the form \((m, \mathbf{t}=(t_1,\ldots ,t_N))\), where \(t_i\) is a tag authenticating the elements \((m_{i,1},\ldots ,m_{i,s})\) for every \(i\in [N]\); the tag \(t_i\) is computed using (1) when \(\varPi =\varPi _1\) and using (3) when \(\varPi =\varPi _2\). Let \(N^\prime =N+1\). To add s new elements \((m_{N^\prime ,1},\ldots ,m_{N^\prime ,s})\) to the s datasets, the client simply computes a tag \(t_{N^\prime }\) authenticating these elements and instructs the cloud server to change \(pk=(m, \mathbf{t})\) to \(pk^\prime =(m^\prime , \mathbf{t}^\prime )\), where \(m^\prime =(m_{i,j})_{N^\prime \times s}\) and \( \mathbf{t}^\prime =(t_1,\ldots ,t_{N^\prime })\). In particular, when \(\varPi =\varPi _1\) the tag \(t_{N^\prime }\) is computed by solving the equation system (1) for \(i=N^\prime \), and when \(\varPi =\varPi _2\) it is computed using equation (3) for \(i=N^\prime \). Extending the size of all datasets in this way never compromises the security of the underlying schemes.

Extending the Number of Datasets in \(\varPi _2\). In practice, the client may also want to extend the number of datasets. Let \(s^\prime =s+1\). We consider the scenario where the client updates m to \(m^\prime =(m_{i,j})_{N\times s^\prime }\), where \((m_{1,s^\prime },\ldots , m_{N,s^\prime })\) is a new dataset. The general case of adding more than one new dataset can be handled by adding them one after another. In a naive way of updating m to \(m^\prime \), the client may simply download \(pk=(m,\mathbf{t})\), verify the integrity of m and then run our schemes on \(m^\prime \). However, this method is quite inefficient when m is large. Here we show how the client in \(\varPi _2\) can authenticate \(m^\prime \) without downloading m.

Let \({F}: \mathcal{K}\times \{0,1\}^*\rightarrow \mathbb {Z}_p\) be the PRF and let \(sk=(k,\mathbf{a})\leftarrow \mathcal{K}\times (\mathbb {Z}_p^*)^{s+1}\) be the secret key used to outsource \(m=(m_{i,j})_{N\times s}\) in \(\varPi _2\). Let \(pk=(m,\mathbf{t})\), where \( t_i=a_{0}^{-1}({F}_k(i)-a_1\cdot m_{i,1}-\cdots -a_{s}\cdot m_{i,s}) \) for every \(i\in [N]\). Let \((m_{1,s+1},\ldots ,m_{N,s+1})\) be a new dataset. To authenticate \(m^\prime =(m_{i,j})_{N\times s^\prime }\), the client picks \((k^\prime , a_{s+1})\leftarrow \mathcal{K}\times \mathbb {Z}_p^*\), updates sk to \(sk^\prime =(k^\prime ,\mathbf{a}^\prime =(a_0,\ldots ,a_s,a_{s+1}))\) and instructs the server to change pk to \(pk^\prime =(m^\prime , \mathbf{t}^\prime =(t^\prime _1,\ldots ,t^\prime _N))\), where \( t^\prime _i=a_0^{-1}(F_{k^\prime }(i)-a_1\cdot m_{i,1}- \cdots -a_{s+1}\cdot m_{i,s+1})=t_i- a_0^{-1}\cdot (F_k(i)- F_{k^\prime }(i)+a_{s+1}\cdot m_{i,s+1}). \) To do so, the client only needs to send the new dataset \((m_{1,s+1},\ldots ,m_{N,s+1})\) together with \( \varDelta _i=a_0^{-1}(F_k(i)-F_{k^\prime }(i)+a_{s+1}\cdot m_{i,s+1}), 1\le i\le N, \) to the cloud server such that the server can update \(t_i\) to \(t_i^\prime \) by computing \(t_i^\prime =t_i-\varDelta _i\) for every \(i\in [N]\). All the other algorithms will be changed as below to work with \((sk^\prime , pk^\prime )\):

  • \(vk \leftarrow \mathsf{ProbGen}(sk^\prime ,\mathcal{P})\): Let \(\mathcal{P}=(f,I)\) be a program, where \(f(x_1,\ldots ,x_n)\) is a polynomial of degree d over \(\mathbb {Z}_p\) and \(I=\{i_1,\ldots , i_n\}\subseteq [N]\) specifies on which elements of each dataset f should be computed. This algorithm computes and outputs a verification key \(vk=f({F}_{k^\prime }(i_1),\ldots , {F}_{k^\prime }(i_n))\).

  • \((\rho ,\pi )\leftarrow \mathsf{Compute}(pk^\prime , \mathcal{P})\): Let \(\mathcal{P}=(f,I)\) be a program, where \(f(x_1,\ldots ,x_n)\) is a polynomial of degree d over \(\mathbb {Z}_p\) and \(I=\{i_1,\ldots , i_n\}\subseteq [N]\) specifies on which elements of each dataset f should be computed. This algorithm computes \(\rho _j=f(m_{i_1,j},\ldots , m_{i_n,j})\) for every \(j\in [s+1]\). Let \( \sigma _i(\mathbf{y})=t_i^\prime \cdot y_0+m_{i,1}\cdot y_1+\cdots +m_{i,s}\cdot y_{s}+m_{i,s+1} \cdot y_{s+1} \) for every \(i\in I\), where \(\mathbf{y}=(y_0, y_1,\ldots ,y_s,y_{s+1})\). This algorithm outputs \(\rho =(\rho _1,\ldots , \rho _{s+1})\) and a proof \(\pi =f(\sigma _{i_1}(\mathbf{y}),\ldots , \sigma _{i_n}(\mathbf{y}))\).

  • \(\mathsf{Verify}(sk^\prime ,vk,(\rho ,\pi ))\): This algorithm accepts \(\rho \) and outputs 1 only if \(\pi (\mathbf{a}^\prime )=vk\) and \(\pi (\mathbf{e}_{j+1})=\rho _j\) for every \(j\in [s+1]\).

These modifications result in an extended scheme \(\varPi _2^\prime \). It is trivial to verify the correctness of \(\varPi _2^\prime \). In the full version we show that no PPT adversary can win a slight modification of the standard security game (Fig. 1) for \(\varPi ^\prime _2\) except with negligible probability, where the modification is that the adversary is allowed to learn two tag vectors \(\mathbf{t}\) and \(\mathbf{t}^\prime \) instead of one.
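The tag-update identity \(t_i^\prime = t_i - \varDelta_i\) behind \(\varPi_2^\prime\) can be checked numerically. The Python sketch below (SHA-256 as a stand-in PRF, fixed toy constants for the keys and \(\mathbf{a}^\prime\), all illustrative) compares the server-side update against recomputing the tag from scratch under \(sk^\prime\):

```python
import hashlib

p = 2**61 - 1                      # toy lambda-bit prime
s = 3

def F(k, i):                       # PRF stand-in: SHA-256 (heuristic assumption)
    return int.from_bytes(hashlib.sha256(k + i.to_bytes(4, 'big')).digest(), 'big') % p

a = [5, 9, 4, 8, 6]                # a_0,...,a_{s+1} in Z_p^* (fixed for brevity)
k_old, k_new = b'old-key', b'new-key'
row = [2, 3, 5]                    # (m_{i,1},...,m_{i,s}) for one index i
new_elem = 17                      # the new element m_{i,s+1}
i = 1
inv_a0 = pow(a[0], -1, p)

t_i = inv_a0 * (F(k_old, i) - sum(a[j + 1] * row[j] for j in range(s))) % p
delta = inv_a0 * (F(k_old, i) - F(k_new, i) + a[s + 1] * new_elem) % p

# server-side update t_i' = t_i - Delta_i ...
t_update = (t_i - delta) % p
# ... agrees with recomputing the tag from scratch under sk' = (k', a')
ext = row + [new_elem]
t_direct = inv_a0 * (F(k_new, i) - sum(a[j + 1] * ext[j] for j in range(s + 1))) % p
assert t_update == t_direct
```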

Theorem 3

If F is a secure PRF, then \(\varPi ^\prime _2\) is a secure BVC scheme.

Composition. We now show that our BVC schemes allow composition and the composed computations can be efficiently verified as well. Let \(\varPi =\varPi _1\) or \(\varPi _2\). Let \(m=(m_{i,j})_{N\times s}\in \mathbb {Z}_p^{N\times s}\) be a collection of s datasets. Let pk and sk be any public key and secret key generated by \(\varPi .\mathsf{KeyGen}(1^\lambda ,m)\). Let \(\mathcal{P}_1=(f_1,I_1),\ldots ,\mathcal{P}_n=(f_n,I_n)\) be n programs, where \(f_i\in \mathcal{F}\) and \(I_i\subseteq [N]\). Let \(vk_i=f_i(\langle F_k(j): j\in I_i\rangle )\) be generated by \(\varPi .\mathsf{ProbGen}(sk,\mathcal{P}_i)\) for every \(i\in [n]\). Let \(((\rho _{i,1},\ldots , \rho _{i,s}),\pi _{i})\leftarrow \varPi .\mathsf{Compute}(pk,\mathcal{P}_i)\) be the results and proof generated by the computing algorithm. We can consider \(\rho =(\rho _{i,\ell })_{n\times s}\) as a collection of s new datasets and consider \((\rho , \{\pi _i\}_{i=1}^n)\) as an encoding of \(\rho \). Let \(\mathcal{P}=(f(x_1,\ldots , x_n), I=[n])\) be a program that defines a computation on \(\rho \).

If \(\varPi =\varPi _1\), we have \(sk=(k,a)\in \mathcal{K}\times (\mathbb {Z}_p\setminus \{0,1,\ldots ,s\})\) and \(pk=(m,\mathbf{t})\). By the correctness of \(\varPi _1\), we have \(\mathsf{Verify}(sk,vk_i,\{\rho _{i,\ell }\}_{\ell \in [s]}, \pi _i)=1\) for every \(i\in [n]\), that is, \( \pi _i(1)=\rho _{i,1}, \pi _i(2)=\rho _{i,2}, \ldots , \pi _i(s)=\rho _{i,s}\) and \(\pi _i(a)=vk_i\). Below is the combining algorithm:

  • \(((\xi _1,\ldots ,\xi _s), \pi )\leftarrow \mathsf{Comb}(f,(\rho _{i,\ell })_{n\times s}, \{\pi _i\}_{i\in [n]})\): computes \(\xi _\ell =f(\rho _{1,\ell },\ldots , \rho _{n,\ell })\) for every \(\ell \in [s]\) and \(\pi =f(\pi _1(x), \ldots , \pi _n(x))\). Outputs \(\xi _1,\ldots ,\xi _s\) and \(\pi \).

If \(\varPi =\varPi _2\), we have \(sk=(k,\mathbf{a})\in \mathcal{K}\times (\mathbb {Z}_p^*)^{s+1}\) and \(pk=(m,\mathbf{t})\). By the correctness of \(\varPi _2\), we have \(\mathsf{Verify}(sk,vk_i,\{\rho _{i,\ell }\}_{\ell \in [s]}, \pi _i)=1\) for every \(i\in [n]\), that is, \( \pi _i(\mathbf{e}_2)=\rho _{i,1}, \pi _i(\mathbf{e}_3)=\rho _{i,2}, \ldots , \pi _i(\mathbf{e}_{s+1})=\rho _{i,s} \) and \(\pi _i(\mathbf{a})=vk_i\). Below is the combining algorithm:

  • \(((\xi _1,\ldots ,\xi _s), \pi )\leftarrow \mathsf{Comb}(f,(\rho _{i,\ell })_{n\times s}, \{\pi _i\}_{i\in [n]})\): computes \(\xi _\ell =f(\rho _{1,\ell },\ldots , \rho _{n,\ell })\) for every \(\ell \in [s]\) and \(\pi =f(\pi _1(\mathbf{y}), \ldots , \pi _n(\mathbf{y}))\). Outputs \(\xi _1,\ldots ,\xi _s\) and \(\pi \).
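The combining step can be exercised for \(\varPi_2\) on toy parameters. In the Python sketch below (SHA-256 as a stand-in PRF, fixed toy constants, and degree-1 inner programs \(f_1, f_2\) and combining function f for brevity; these choices are illustrative assumptions), the combined proof \(\pi=f(\pi_1(\mathbf{y}),\pi_2(\mathbf{y}))\) verifies against \(vk=f(vk_1,vk_2)\):

```python
import hashlib

p = 2**61 - 1                      # toy lambda-bit prime
s, N = 3, 2

def F(k, i):                       # PRF stand-in: SHA-256 (heuristic assumption)
    return int.from_bytes(hashlib.sha256(k + i.to_bytes(4, 'big')).digest(), 'big') % p

k = b'secret-key'
a = [5, 9, 4, 8]                   # a_0,...,a_s (fixed for brevity)
m = [[2, 3, 5], [7, 11, 13]]
inv_a0 = pow(a[0], -1, p)
t = [inv_a0 * (F(k, i + 1) - sum(a[j + 1] * m[i][j] for j in range(s))) % p
     for i in range(N)]
sigma = [[t[i]] + m[i] for i in range(N)]

def apply_lin(c, vecs):            # linear combination of (s+1)-vectors over Z_p
    return [sum(ci * v[j] for ci, v in zip(c, vecs)) % p
            for j in range(len(vecs[0]))]

# Two linear programs on I = {1, 2}: f_1 = x1 + x2 and f_2 = 2 x1 + 5 x2
rho1 = [(m[0][j] + m[1][j]) % p for j in range(s)]
pi1 = apply_lin([1, 1], sigma)
vk1 = (F(k, 1) + F(k, 2)) % p
rho2 = [(2 * m[0][j] + 5 * m[1][j]) % p for j in range(s)]
pi2 = apply_lin([2, 5], sigma)
vk2 = (2 * F(k, 1) + 5 * F(k, 2)) % p

# Comb with f(x1, x2) = 3 x1 + 4 x2 on the new datasets rho = (rho_{i,l})
xi = [(3 * rho1[j] + 4 * rho2[j]) % p for j in range(s)]
pi = apply_lin([3, 4], [pi1, pi2])

# The combined proof verifies against vk = f(vk_1, vk_2)
vk = (3 * vk1 + 4 * vk2) % p
assert sum(pi[j] * a[j] for j in range(s + 1)) % p == vk
assert all(pi[j + 1] == xi[j] for j in range(s))
```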

5 Concluding Remarks

We introduced a model for batch verifiable computation and constructed two BVC schemes with attractive properties. Extending the first scheme to support efficient outsourcing of new datasets, expanding the admissible function family of the second scheme, and constructing publicly verifiable batch computation schemes are interesting open problems that follow from this work.