
1 Introduction

1.1 Background

In the Internet age, accessing remote databases is common, and information is the most sought-after and costly commodity. In such a setting it is important to protect not only the information itself but also the identity of the information that a user is interested in [11]. Private Information Retrieval (PIR) schemes are cryptographic schemes that enable users to retrieve records from public databases while keeping the identity of the retrieved records private [2]. In a PIR scheme, a client retrieves an entry from a server that holds a database without revealing which entry is retrieved. The concept of PIR was first proposed in 1995 by Chor et al. [4]. A trivial solution is for the server to send the entire database regardless of the query. This solution has a high communication complexity, on the order of the database size tb (at least \(\log tb \) bits). Later, schemes [3, 16, 18] that send less data were proposed. In particular, Fully Homomorphic Encryption (FHE), and even the SomeWhat Homomorphic Encryption (SWHE), proposed by Gentry are known to imply PIR [3].

Moreover, in some practical scenarios the server may return incorrect answers because of malicious behavior or accidental failures. Such scenarios are captured by the malicious server model. Under this model, a PIR scheme works effectively only if the client can identify incorrect answers with overwhelming probability, so verifying the returned answers is a significant problem for a PIR scheme. In contrast, the honest-but-curious server model used in previous work assumes that the server is honest, that is, it follows the predefined scheme. In this respect, that model is less practical than the malicious server model. Constructing a PIR scheme that is secure in the malicious server model is therefore well motivated, and the problem was put forth by Beimel [2].

1.2 Related Work

In [18], Zhang and Safavi-Naini gave a verifiable multi-server PIR scheme in which the servers may be malicious and provide fraudulent answers. This scheme is an unconditionally t-private and computationally secure k-server verifiable PIR scheme in the honest-but-curious server setting. Its drawback is that it is too complicated to implement in practice. Moreover, it does not work when all the servers hosting the database collude, which can be seen as the single malicious server setting.

In Sect. 5 of [3], the SWHE scheme is used to construct an asymptotically efficient Single-server PIR (SPIR) scheme based on the Learning With Errors (LWE) assumption. Specifically, this scheme employs a symmetric encryption scheme in the retrieval procedure. Using the symmetric scheme that is most efficient with respect to communication, the communication complexity of this scheme is \(\mathcal {O}((\log n)+\kappa \textsf {poly}\log (\kappa ))\), where n is the database size and \(\kappa \) is the security parameter.

In [16], Vannet and Kunihiro proposed an SPIR scheme under the honest-but-curious server model that relies on the Approximate GCD (AGCD) assumption. Assume that the database has size tb and can be split into nb blocks of mw words of bb bits each, such that \(nb \cdot mw \cdot bb = tb\). When tb cannot be decomposed in this way, the database is padded with several bits. The database is viewed as a 2-dimensional array of words in which each word is identified by two coordinates. We write \(\{ b_{i,j} | 1 \le i\le nb,1 \le j \le mw\}\) for the database and \(\{ b_{u,j} | 1 \le j \le mw \}\) for block u. The security of this scheme is based on the AGCD assumption introduced in [6]. The assumption states that, given a set of samples of the form \( pq + \epsilon \), where \(\epsilon \ll p\) and q has \(\varphi _{q}\) bits, it is hard to output p. In the single bit scheme, assume that the client wants to retrieve block u, consisting of \(\{ b_{u,j }\} _{1\le j \le mw}\). The client samples a large random odd number p and saves it as the secret key. He picks nb random numbers \(q_{i}\) and \(\epsilon _{i}\), and computes \(Q_{i} = pq_{i} + 2\epsilon _{i} + {\delta }_{i,u}\), where \({\delta }_{i,u}\) is the index vector with \({\delta }_{i,u}=1\) if \(i=u\) and 0 otherwise. On receiving the \(Q_{i}\), the server computes \(R_{j}=\sum _{i=1}^{nb}b_{i,j}Q_{i}\) for each j and sends it back to the client. On receiving \(R_{j}\), the client decodes \((R_{j} \mod p )\mod 2 = (\sum _{i=1}^{nb}b_{i,j}(pq_{i} + 2\epsilon _{i} + {\delta }_{i,u})\mod p )\mod 2 = \sum _{i=1}^{nb}b_{i,j}(2\epsilon _{i} + {\delta }_{i,u})\mod 2 = b_{u,j}\). In this scheme, p and the \(q_{i}\) should be large integers, which guarantees the security of the scheme. However, this scheme works under the honest-but-curious server model but not under the malicious server model.
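
The single bit AGCD-based retrieval described above can be illustrated with a minimal Python sketch. The function names, the bit lengths (`p_bits`, `q_bits`, `eps_bits`) and the 0-based block indices are assumptions made for illustration only; they are not the parameters of [16] and are not chosen for security.

```python
import secrets

def agcd_keygen(p_bits=128):
    """Client secret: a large random odd integer p (bit length is an assumption)."""
    return secrets.randbits(p_bits) | (1 << (p_bits - 1)) | 1

def agcd_query(nb, u, p, q_bits=64, eps_bits=8):
    """Client: Q_i = p*q_i + 2*eps_i + delta_{i,u} for blocks i = 0..nb-1."""
    query = []
    for i in range(nb):
        q_i = secrets.randbits(q_bits)
        eps_i = secrets.randbits(eps_bits)
        query.append(p * q_i + 2 * eps_i + (1 if i == u else 0))
    return query

def agcd_response(db, query):
    """Server: R_j = sum_i b[i][j] * Q_i for every word position j."""
    mw = len(db[0])
    return [sum(db[i][j] * query[i] for i in range(len(db))) for j in range(mw)]

def agcd_decode(response, p):
    """Client: b_{u,j} = (R_j mod p) mod 2, valid while the noise stays below p."""
    return [(r % p) % 2 for r in response]

# Toy database: nb = 4 blocks of mw = 3 single-bit words.
db = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
p = agcd_keygen()
recovered = agcd_decode(agcd_response(db, agcd_query(len(db), 2, p)), p)
```

Decoding is correct here because the accumulated term \(\sum _{i}b_{i,j}(2\epsilon _{i} + {\delta }_{i,u})\) is far smaller than p for these toy sizes.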

1.3 Open Problem

The previous work [9, 16] on SPIR is generally set in the honest-but-curious server model. This model, however, is not suitable for many real-world scenarios, such as those involving an untrusted cloud server. Although Zhang and Safavi-Naini [18] presented a verifiable multi-server PIR scheme, constructing a Verifiable SPIR (VSPIR) scheme under the malicious server model seems to be a difficult task. This is because the protection of the input index depends on a heavy FHE scheme, which implies a very high computational complexity. To the best of our knowledge, there is no practical VSPIR scheme. How to construct a simple and practical VSPIR scheme is therefore still an open problem.

1.4 Our Contributions

In this work, we present two main contributions. The first, as a warm-up, is an SPIR scheme under the honest-but-curious server model based on the decision-LWE with binary error assumption. Building on this scheme, we then construct a VSPIR scheme under the malicious server model.

  • The SPIR scheme based on the decision-LWE assumption. In our proposed SPIR, we use the database layout defined in [16]. We assume that a client wants to obtain block u without revealing any information about u. The client uses a special variant of the additively homomorphic encryption scheme in [7] to encrypt the query vector whose u-th element is 1 and whose other elements are 0, and thereby computes the query messages \(\{Q_{i}\}\). The server computes \(R _j = \sum _{i=1}^{nb} b_{i,j}Q_{i}\), where \(R_{j}\) is an encryption of \( b_{u,j}\), and sends \(R_j\) to the client. For each \(R _j\), the client runs the decryption algorithm to recover block u. The privacy of our SPIR scheme is based on the hardness of the LWE with binary error problem.

  • The VSPIR scheme using a probabilistic verification process (see Fig. 1). Based on our proposed SPIR scheme, we use a probabilistic verification process [5] to construct our VSPIR scheme. The main idea of probabilistic verification is simple: a client samples a random input r and precomputes F(r) for a specific function F. He sends the input pair (x, r) to a server in random order and expects to receive both F(x) and F(r). On receiving the answers, the client checks the response value for r; if it equals the precomputed F(r), the client accepts the response F(x), and rejects otherwise. Because x and r are independent and identically distributed, no malicious adversary can distinguish the real input x from the random input r, so it cannot deceive the client with probability greater than 1/2. Our proposed VSPIR scheme follows a similar process: the client generates a random vector \(\mathbf{r }\in \{0,1\}^{m}\) and uses it in place of the u-th row of a matrix, encrypts this matrix, sends the query message to the server, and decodes the responses \(R_{j}\) from the server. If the decoded elements corresponding to the random vector are consistent, then the elements of the u-th row are consistent as well, and the client accepts the received responses.

Fig. 1. Verifiable single-server PIR

To show the merits of our proposal, we list the differences between our scheme and some related schemes in Table 1. Specifically, our construction is essentially different from the SPIR construction proposed by Brakerski and Vaikuntanathan [3]. In our proposal, the client uses an additively homomorphic encryption scheme to encrypt the index directly, and the server computes the answer by homomorphically evaluating the database access function using additions only. In [3], by contrast, the client encrypts a symmetric key using the FHE or SWHE scheme and encrypts the index under the symmetric key, so that the symmetric ciphertexts can be converted into homomorphic ciphertexts. The server then uses the homomorphic ciphertexts to homomorphically evaluate the database access function and retrieve an encryption of the answer. Moreover, since our SPIR construction uses an encryption scheme based on the LWE with binary error assumption, the matrix multiplication in the encryption scheme reduces to matrix additions. The computational complexity can thus be reduced to \({\mathcal {O}}(\sqrt{tb})\).

1.5 Outline of Our Paper

The rest of this paper is organized as follows. In Sect. 2, after fixing the notation used in this paper, we introduce the LWE problem and some definitions related to VSPIR. In Sect. 3, we detail our proposed constructions for the SPIR scheme and the VSPIR scheme, and in Sect. 4 we analyze their performance. In Sect. 5, we present some computer simulations for our proposals. Finally, in Sect. 6, we make some concluding remarks.

Table 1. Comparisons between the proposed scheme and some other related schemes.

2 Preliminaries

2.1 Notations

Before we present our scheme, we give some notations used in this paper. We denote vectors by bold lower-case letters \((\mathbf{x },\mathbf{y },\cdots )\) and matrices by bold upper-case letters \((\mathbf{X },\mathbf{Y },\cdots )\). We denote a security parameter by \(\kappa \in \mathbb {N}_{+}\). We denote the class of polynomial functions in \(\kappa \) by \(\textsf {poly}(\kappa )\), some fixed polynomial function q in \(\kappa \) by \(q=q(\kappa )\), and some unspecified negligible function in \(\kappa \) by \(\textsf {negl}(\kappa )\). We denote the transpose of \(\mathbf{x }\) by \(\mathbf{x }^{T}\). We write \(x\xleftarrow {\$}\varPsi \) for choosing x uniformly at random from a set \(\varPsi \). We use \(\mathcal {D}\) to indicate a distribution over some finite set \(\mathcal {S}\), and write \(x \xleftarrow {\$} \mathcal {D}\) to mean that x is sampled at random from this distribution.

2.2 Learning with Errors

The LWE problem was first introduced by Regev [15]. The formal definition is as follows:

Definition 1

(LWE Problem [3]). For a security parameter \(\kappa \), let \(n = n(\kappa )\), let \(q=q(n)\) be an integer, and let \(\chi = \chi (n)\) be an error distribution over \(\mathbb {Z}_{q}\). Let \(A_{\mathbf{s }, \chi }\) be the distribution obtained by choosing a vector \(\mathbf{a }\) uniformly at random from \(\mathbb {Z}_{q}^{n}\) and an error term e from \(\chi \), and outputting \((\mathbf{a }, \left\langle \mathbf{a }, \mathbf{s } \right\rangle +e) \in (\mathbb {Z}_{q}^{n} \times \mathbb {Z}_{q})\). The learning with errors problem \({\textsf {LWE}}_{n,m,q,\chi }\) is defined as follows: given m independent samples from \(A_{\mathbf{s },\chi }\), output \(\mathbf{s }\) with non-negligible probability.

The decision variant of the LWE problem, denoted \({\textsf {decision-LWE}}_{n,m,q,\chi }\), is to distinguish the following two distributions: in the first, m samples \((\mathbf{a }_{i},b_{i})\) are drawn uniformly from \(\mathbb {Z}_{q}^{n+1}\); in the second, m samples are drawn according to \( A_{\mathbf{s },\chi }\). The \({\textsf {decision-LWE}}_{n,m,q,\chi }\) assumption is that the \({\textsf {decision-LWE}}_{n,m,q,\chi }\) problem is computationally infeasible.

Regev proved in [15] that, for a certain modulus q and Gaussian error distribution \(\chi \), the \({\textsf {LWE}}_{n,m,q,\chi }\) assumption holds as long as certain worst-case lattice problems are hard to solve using a quantum algorithm. These reductions take \(\chi \) to be a discretized version of the Gaussian distribution, which is B-bounded for an appropriate value B.

Definition 2

(B-Bounded Distributions [3]). A distribution ensemble \(\{\chi _{n} \}_{n\in N}\), supported over the integers, is called B-bounded if

$$\begin{aligned} \underset{e \leftarrow \chi _{n}}{\Pr }[|e|>B] \le \textsf {negl}(n). \end{aligned}$$

The following theorem is Regev’s worst-case to average-case reduction for LWE:

Theorem 1

([15]). Let \(q=q(n) \in \mathbb {N}\) be a product \(q = \prod q_{i}\) such that for all i, \(q_{i} = {\textsf {poly}}(n)\), and let \(B\ge n\). There exists an efficiently sampleable B-bounded distribution \(\chi \) such that if there is an efficient algorithm that solves the \({\textsf {decision-LWE}}_{n,q,\chi }\) problem, then there is an efficient quantum algorithm for solving \(\tilde{O}(qn^{1.5}/B)\)-approximate worst-case SIVP and gapSVP.

We refer the readers to [14, 15] for the detailed and formal definitions of these lattice problems.

Definition 3

( LWE with Binary Error Problem [10]). Let n, q be positive integers, \(\chi \) be a uniform distribution on \(\{0,1\}\) and \(\mathbf{s }\xleftarrow {\$} \chi ^{n}\) be a secret vector in \(\{0,1\}^{n}\). Let \(A^{'}_{\mathbf{s },\chi }\) be the distribution obtained by choosing a vector \(\mathbf a \in \mathbb {Z}^{n}_{q}\) uniformly at random and a noise term \(e \xleftarrow {\$} \chi \), and outputting \((\mathbf{a },\langle \mathbf{a }, \mathbf{s } \rangle + e) \in \mathbb {Z}_{q}^{n} \times \mathbb {Z}_{q}\).

The \({\textsf {LWE}}\) with binary error problem is to recover \(\mathbf{s }\) from m samples \((\mathbf{a }_{i},\langle \mathbf{a }_{i}, \mathbf{s } \rangle + e_{i}) \in \mathbb {Z}_{q}^{n} \times \mathbb {Z}_{q}\).

The decision variant of the \({\textsf {LWE}}\) with binary error problem is to distinguish with non-negligible advantage m samples chosen according to \(A^{'}_{\mathbf{s },\chi }\), from m samples chosen according to the uniform distribution over \(\mathbb {Z}_{q}^{n} \times \mathbb {Z}_{q}\).
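
As a concrete illustration of Definition 3, the following Python sketch generates samples from \(A^{'}_{\mathbf{s },\chi }\) and from the uniform distribution. The function names and the toy values of n, m and q are assumptions made for illustration; they are not secure parameter choices.

```python
import secrets

def binary_lwe_samples(n, m, q):
    """Return (s, samples), where each sample is (a, <a, s> + e mod q)
    with a binary secret s and a binary error e."""
    s = [secrets.randbelow(2) for _ in range(n)]
    samples = []
    for _ in range(m):
        a = [secrets.randbelow(q) for _ in range(n)]
        e = secrets.randbelow(2)
        b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
        samples.append((a, b))
    return s, samples

def uniform_samples(n, m, q):
    """Return m samples drawn uniformly from Z_q^n x Z_q."""
    return [([secrets.randbelow(q) for _ in range(n)], secrets.randbelow(q))
            for _ in range(m)]

# Toy sizes only.
secret, lwe = binary_lwe_samples(n=8, m=12, q=521)
rand = uniform_samples(n=8, m=12, q=521)
```

The decision problem asks an adversary to tell `lwe` from `rand` without knowing `secret`; anyone who holds `secret` can check that every residual \(b - \langle \mathbf{a }, \mathbf{s } \rangle \bmod q\) lies in {0, 1}.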

Theorem 2

([13]). For any integer n and \(m = n\cdot (1+\Omega (1/ \log n))\), and all sufficiently large polynomially bounded prime moduli \(q \ge n^{\mathcal {O}(1)}\), solving \(\textsf {LWE}_{n,m,q}\) with uniformly random binary errors (i.e., in {0,1}) is at least as hard as approximating lattice problems in the worst case on \(\varTheta (n/ \log n)\)-dimensional lattices within a factor \(\gamma = \tilde{\mathcal {O}}(\sqrt{n} \cdot q)\).

Theorem 2 shows that the \(\textsf {LWE}\) problem remains hard even when the errors are small (e.g., uniformly random from {0, 1}). Most cryptographic constructions are based on the LWE problem where the secret and the error are identically distributed [10]. Using the search-to-decision reduction of [13], Peikert et al. proved that \(\textsf {decision-LWE}_{n,m,q}\) with binary error has hardness similar to that of \(\textsf {LWE}_{n,m,q}\) with binary error.

2.3 The GHV-Type Encryption Scheme

The basis of the GHV scheme [7] is a trapdoor sampling algorithm [8]. The trapdoor sampling procedure generates a matrix \(\mathbf A \in \mathbb {Z}_{q}^{m\times n}\) (that is within negligible statistical distance of uniform), together with an invertible matrix \(\mathbf T \in \mathbb {Z}^{m\times m}\) with small entries such that \(\mathbf T \cdot \mathbf A = 0(\text {mod }{q})\).

The trapdoor can be used to solve the \(\textsf {LWE}\) problem relative to \(\mathbf A \). Given \(\mathbf{y } = \mathbf A \mathbf{s }+\mathbf{x }\), compute \(\mathbf T \mathbf{y } = \mathbf T (\mathbf A \mathbf{s }+\mathbf{x }) = \mathbf T \mathbf{x }(\text {mod }q).\) Since \(\mathbf T \) and \(\mathbf{x }\) are small, this equality also holds over the integers, and multiplying by \(\mathbf T ^{-1}\) gives \(\mathbf{x }\). There is a probabilistic polynomial-time (PPT) algorithm TrapDoor that, on input \(1^{\kappa }\), a positive integer \(q\ge 2\), and a poly(n)-bounded positive integer \(m\ge 5n \log q\), outputs matrices \(\mathbf A \in \mathbb {Z}^{m\times n}_{q}\) and \(\mathbf T \in \mathbb {Z}^{m\times m}\), where the Euclidean norm of each row of \(\mathbf T \) is at most \(20n\log q\) [1].

The GHV-type encryption scheme [7] is defined by a triple of PPT algorithms GHV = (GHV.KeyGen, GHV.Enc, GHV.Dec):

  • GHV.KeyGen\((1^{\kappa })\rightarrow (pk,sk)\): On input \(1^{\kappa }\), let \(n=\kappa \) and run the trapdoor sampling algorithm to obtain a matrix \(\mathbf{A }\in \mathbb {Z}^{m\times n}_{q}\) together with a trapdoor matrix \(\mathbf{T }\in \mathbb {Z}^{m\times m }\), i.e., \((\mathbf{A },\mathbf{T })\leftarrow \) TrapDoor(\(1^{n},q,m\)). The public key pk is \(\mathbf{A }\) and the secret key sk is \(\mathbf{T }\).

  • GHV.Enc\(_{pk}\)(\(\mathbf{M }\))\(\rightarrow \mathbf{C }\): To encrypt a binary message \(\mathbf M \in \{0,1\}^{m\times m}\), choose a uniformly random matrix \(\mathbf{S }\xleftarrow {\$} \mathbb {Z}^{n \times m}_{q}\) and an “error matrix” \(\mathbf{X }\xleftarrow {\$}\chi ^{m\times m} \). Output the ciphertext \(\mathbf{C }\leftarrow \mathbf{A }\mathbf{S } +2\mathbf X +\mathbf M (\text {mod }{q})\), where \(2\mathbf{X }\) means multiplying each entry of the matrix \(\mathbf{X }\) by 2.

  • \({\textsf {GHV.Dec}}_{sk}(\mathbf{C })\rightarrow \mathbf{M }\): Set \(\mathbf{E }\leftarrow \mathbf{T }\mathbf{C }\mathbf{T }^{T}(\text {mod }{q})\), and then output \(\mathbf{M }\leftarrow \mathbf{T }^{-1}\mathbf{E }(\mathbf{T }^{T})^{-1} \mod 2\).
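
To see why decryption recovers the message, the following short derivation may help. It uses only \(\mathbf T \cdot \mathbf A = 0(\text {mod }{q})\) and assumes, as a condition on the parameters, that every entry of \(\mathbf{T }(2\mathbf{X }+\mathbf{M })\mathbf{T }^{T}\) is smaller than q / 2 in absolute value, so that the first line also holds over the integers:

$$\begin{aligned} \mathbf{E }&= \mathbf{T }\mathbf{C }\mathbf{T }^{T} = \mathbf{T }(\mathbf{A }\mathbf{S }+2\mathbf{X }+\mathbf{M })\mathbf{T }^{T} = (\mathbf{T }\mathbf{A })\mathbf{S }\mathbf{T }^{T}+\mathbf{T }(2\mathbf{X }+\mathbf{M })\mathbf{T }^{T} = \mathbf{T }(2\mathbf{X }+\mathbf{M })\mathbf{T }^{T} \pmod {q}, \\ \mathbf{T }^{-1}\mathbf{E }(\mathbf{T }^{T})^{-1}&= 2\mathbf{X }+\mathbf{M } = \mathbf{M } \pmod {2}. \end{aligned}$$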

2.4 Formal Definitions About VSPIR

Definition 4

(VSPIR). The VSPIR scheme involves a server S that holds the database and a client C. S has the database \(db=(db_{1},\cdots ,db_{tb})\). C owns an index \(i\in [tb]\) and wants to recover \(db_i\) from the cloud while keeping i secret. The VSPIR scheme is defined by five PPT algorithms VSPIR = (VSP.Setup, VSP.Query, VSP.Challenge, VSP.Response, VSP.Verify):

  1. \({\textsf {VSP.Setup}}(1^{\kappa })\rightarrow (pk,sk)\): On input \(1^{\kappa }\), output the public key pk and the secret key sk.

  2. \({\textsf {VSP.Query}}_{sk}(i)\rightarrow (Q,aux)\): On input a private key sk and an index \(i\in [tb]\), output a query Q along with auxiliary information aux.

  3. \({\textsf {VSP.Challenge}}_{sk}(i, \kappa )\rightarrow L\): On input \(\kappa \), the index i and sk, output the challenge message L.

  4. \({\textsf {VSP.Response}}_{pk}(Q, db, L)\rightarrow R\): On input a public key pk, the query Q, the database db and the challenge L, output the response message R.

  5. \({\textsf {VSP.Verify}}_{sk}(Q,R,aux)\rightarrow \{ db_{i},\bot \}\): On input sk, Q, the response message R, and the auxiliary information aux, output \(db_{i}\) or \(\bot \).

The server S who owns the database is responsible for setting up the system. To set up the system, S runs VSP.Setup to obtain (pk, sk) in the off-line stage. Then pk is published or sent to the server, the database db is given to the cloud, and sk is kept private by the client. To retrieve \(db_{i}\), C runs VSP.Query to compute (Q, aux) and sends the query message Q to the cloud S. Upon receiving Q, S runs VSP.Response and replies with the response message R. To verify the response, C runs VSP.Challenge to generate a challenge message L and runs VSP.Verify to verify the response and compute \(db_i\) if VSP.Verify does not output the failure message.

We now present some formal properties of VSPIR. These definitions are based on the previous work [3, 16, 18].

Definition 5

(Correctness). The VSPIR scheme is said to be correct if the verify algorithm always computes the correct value of \(db_i\) when the server gives the correct response. Formally, for \(\kappa \) and database db, let \({\textsf {VSP.Setup}}(1^{\kappa })\rightarrow (pk,sk)\); for any query index \(i \in [tb]\), let \({\textsf {VSP.Query}}_{sk}(i) \rightarrow (Q,aux)\) and \({\textsf {VSP.Response}}_{pk}(Q, db, L) \rightarrow R\). Then it holds that

$$\begin{aligned} {\Pr }[{\textsf {VSP.Verify}}_{sk}(Q,R,aux)\ne db_{i}] \le {\textsf {negl}}(\kappa ) \end{aligned}$$

if the verify algorithm does not output the failure message.

Definition 6

(Privacy). The VSPIR scheme is said to be private if the adversary cannot learn any information about i. Namely, for two indices \(i_{1}\), \(i_{2} \in [tb]\), the adversary can computationally distinguish \({\textsf {VSP.Query}}_{sk}(i_{1}) \) from \({\textsf {VSP.Query}}_{sk}(i_{2})\) only with negligible advantage. Formally, let \(\kappa \) be a security parameter; for any adversary \(\mathcal {A}\) running in polynomial time and asking polynomially many queries, it holds that

$$\begin{aligned} \left| {\Pr }[\mathcal {A}(\mathrm{\textsf {VSP.Query}}_{sk}(i_{1}))=1]-{\Pr }[\mathcal {A}(\mathrm{\textsf {VSP.Query}}_{sk}(i_{2}))=1] \right| \le \textsf {negl}(\kappa ). \end{aligned}$$

Definition 7

(Security). The VSPIR scheme is said to be secure if a PPT adversary \(\mathcal {A}\) can deceive the client into obtaining an incorrect value of \(db_i\) only with negligible probability. We consider the behavior of \(\mathcal {A}\) in the games \(Game_{0}\), \(Game_{1}\), \(Game_{2}\) defined below:

  1. \(Game_{0}\). The challenger generates \({\textsf {VSP.Setup}}(1^\kappa )\rightarrow (pk,sk)\) and then publishes pk to \(\mathcal {A}\). \(\mathcal {A}\) owns the database db, and every time \(\mathcal {A}\) chooses an index i the challenger replies with the corresponding query message Q.

  2. \(Game_{1}\). \(\mathcal {A}\) picks a specific index i and sends it to the challenger, and the challenger responds with \({\textsf {VSP.Query}}_{sk}(i)\rightarrow (Q,aux)\). To verify \(db_{i}\), the challenger runs \({\textsf {VSP.Challenge}}_{sk}(i,\kappa ) \rightarrow L\). Then \(\mathcal {A}\) runs \({\textsf {VSP.Response}}_{pk}(Q, db, L) \rightarrow R\) to respond to the challenger.

  3. \(Game_{2}\). \(\mathcal {A}\) wins if \({\textsf {VSP.Verify}}_{sk}(Q,R,aux) \notin \{db_{i},\perp \}\).

In the security game \(Game_{2}\), \(\mathcal {A}\) can deceive the client into reconstructing an incorrect value of \(db_i\) only with negligible probability, even if it can choose the database index freely. Thus, the security of a VSPIR scheme as defined above ensures that the client recovers the correct block that he wants to obtain from the database.

Definition 8

(Communication Complexity). The communication complexity of a scheme is defined as the number of bits being exchanged to transfer a single database element excluding the setup phase.

Definition 9

(Index Mapping Function). We define an index mapping function E that maps an index u to an index vector. It takes as input an index u in some range and outputs an index vector \( \varvec{\delta }_{i,u} \leftarrow E(u)\), whose u-th element is 1 and whose other elements are 0.
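
A minimal Python sketch of the index mapping function follows, together with the plaintext selection it enables. The names `index_vector` and `select_block` and the 1-based indexing are illustrative assumptions; no encryption is involved at this point.

```python
def index_vector(u, nb):
    """E(u): the length-nb vector whose u-th element (1-based) is 1, others 0."""
    if not 1 <= u <= nb:
        raise ValueError("index out of range")
    return [1 if i == u else 0 for i in range(1, nb + 1)]

def select_block(db, delta):
    """Plaintext selection: sum_i db[i][j] * delta[i] picks out block u."""
    mw = len(db[0])
    return [sum(db[i][j] * delta[i] for i in range(len(db))) for j in range(mw)]

# Toy database with nb = 3 blocks of mw = 2 single-bit words.
db = [[1, 0], [0, 1], [1, 1]]
delta = index_vector(2, len(db))
block = select_block(db, delta)
```

A PIR scheme evaluates the same sum on an encrypted index vector, so that the server never sees which position holds the 1.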

3 Our Constructions

In this section, we demonstrate our scheme in a gradual manner. We first present our variant of the GHV-type encryption scheme and an SPIR using this variant. After that, based on the proposed SPIR, we give a VSPIR construction under the malicious server model.

3.1 A Variant of the GHV-Type Encryption Scheme

The GHV scheme [7] can encrypt a matrix of \(m^{2}\) elements in time \(\tilde{\mathcal {O}}(m^{3})\). To reduce the computational complexity, we consider the LWE with binary error assumption. Fortunately, previous work [10, 13] has established the hardness of the LWE with binary error problem.

For ease of presentation, we focus below on encrypting binary vectors, which suits our SPIR scheme. The extension to encrypting matrices, with lower computational complexity than the GHV scheme, is straightforward. Our variant of the GHV-type encryption scheme VGHV = (VG.KeyGen, VG.Enc, VG.Dec) is a triple of PPT algorithms as follows:

  • \({\textsf {VG.KeyGen}}(1^{\kappa })\rightarrow (pk,sk)\): The algorithm is the same as the algorithm in GHV scheme. The public key \(pk={ \mathbf A }\in \mathbb {Z}^{m \times n}_{q}\) and the secret key sk =T\(\in \mathbb {Z}^{m\times m}\).

  • \({\textsf {VG.Enc}}_{pk}(\mathbf{m })\rightarrow \) c: To encrypt \(\mathbf m \in \{0,1\}^{m} \), choose a uniformly random vector \(\mathbf{s }\in \{0,1\}^{n}\) and a uniformly random error vector \(\mathbf{x }\in \{0,1\}^{m}\). Output the ciphertext \(\mathbf{c }\leftarrow \mathbf{As }+2\mathbf{x }+\mathbf{m } (\text {mod }{q})\) where 2x means multiplying each entry of the vector x by 2.

  • \({\textsf {VG.Dec}}_{sk}(\mathbf{c })\rightarrow \mathbf{m }\): Set \(\mathbf{e }\leftarrow \mathbf{T }\mathbf{c } \mod q\), and then output \(\mathbf{m }\leftarrow \mathbf{T }^{-1}\mathbf{e } \mod 2\).

For the decryption algorithm, recall that \(\mathbf T \cdot \mathbf A = 0 (\text {mod } q)\) and therefore \(\mathbf{T }\mathbf{c } = \mathbf{T }(2\mathbf{x }+\mathbf{m }) (\text {mod } q)\). If in addition all the entries of \(\mathbf{T }(2\mathbf{x }+\mathbf{m })\) are smaller than q / 2 in absolute value, then we also have the equality over the integers \(\mathbf{e } = (\mathbf{T }\mathbf{c } (\text {mod }{q})) = \mathbf{T }(2\mathbf{x }+\mathbf{m })\), and hence \(\mathbf{T }^{-1}\mathbf{e }= 2\mathbf{x }+\mathbf{m } = \mathbf m (\text {mod }{2})\). Decryption is therefore correct whenever all the entries of \(\mathbf{T }(2\mathbf{x }+\mathbf{m })\) are smaller than q / 2.

Additive Homomorphic Operation. Let \(\mathbf{c }_{1}\), \(\mathbf{c }_{2}\) be two ciphertexts that decrypt to \(\mathbf{m }_{1}\), \(\mathbf{m }_{2}\), and let \(\mathbf c =\mathbf{c }_{1}+\mathbf{c }_{2}\). Then \(\mathbf c =\mathbf{A }(\mathbf{s }_{1}+\mathbf{s }_{2})+2(\mathbf{x }_{1}+\mathbf{x }_{2})+\mathbf{m }_{1}+\mathbf{m }_{2}\), which decrypts to \(\mathbf m _{1}+\mathbf{m }_{2} (\text {mod }{2})\) when all entries of \(\mathbf{T }(2(\mathbf{x }_{1}+\mathbf{x }_{2})+\mathbf{m }_{1}+\mathbf{m }_{2})\) are smaller than q / 2.
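
The mechanics of encryption, decryption and homomorphic addition can be tried out with the toy Python sketch below. It is not the TrapDoor algorithm: as a loudly stated assumption, the matrix A here is a fixed, structured matrix of powers of two with a power-of-two modulus, chosen only because a small T with \(\mathbf T \cdot \mathbf A = 0 (\text {mod } q)\) is then easy to write down. It offers no security, and all names and sizes are hypothetical.

```python
import secrets

K = 5          # rows per block (toy choice)
Q = 2 ** K     # toy modulus q = 32
N = 2          # dimension n
M = N * K      # ciphertext length m

def toy_keygen():
    """Toy (A, T) with T*A = 0 mod Q. A is structured, NOT uniform: insecure."""
    A = [[0] * N for _ in range(M)]
    T = [[0] * M for _ in range(M)]
    for blk in range(N):
        for j in range(K):
            row = blk * K + j
            A[row][blk] = 2 ** j
            T[row][row] = 2
            if j + 1 < K:
                T[row][row + 1] = -1
    return A, T

def mat_vec(mat, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]

def encrypt(A, msg):
    """c = A*s + 2*x + msg mod Q with binary s and binary error x."""
    s = [secrets.randbelow(2) for _ in range(N)]
    x = [secrets.randbelow(2) for _ in range(M)]
    As = mat_vec(A, s)
    return [(As[i] + 2 * x[i] + msg[i]) % Q for i in range(M)]

def add(c1, c2):
    return [(a + b) % Q for a, b in zip(c1, c2)]

def centered(v):
    """Representative of v mod Q in (-Q/2, Q/2]."""
    v %= Q
    return v - Q if v > Q // 2 else v

def decrypt(T, c):
    """e = T*c mod Q (centered), then solve T*y = e and output y mod 2."""
    e = [centered(v) for v in mat_vec(T, c)]
    y = [0] * M
    for row in reversed(range(M)):   # T is upper triangular
        acc = e[row] - sum(T[row][col] * y[col] for col in range(row + 1, M))
        y[row] = acc // T[row][row]  # exact while the noise bound holds
    return [v % 2 for v in y]

A, T = toy_keygen()
m1 = [secrets.randbelow(2) for _ in range(M)]
m2 = [secrets.randbelow(2) for _ in range(M)]
c1, c2 = encrypt(A, m1), encrypt(A, m2)
```

With these toy sizes the entries of \(\mathbf{T }(2(\mathbf{x }_{1}+\mathbf{x }_{2})+\mathbf{m }_{1}+\mathbf{m }_{2})\) lie between -6 and 12, below q / 2 = 16, so one addition decrypts correctly to the sum modulo 2.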

Theorem 3

Any distinguishing algorithm with advantage \(\epsilon \) against the CPA privacy of the scheme can be converted to a distinguisher against \(\textsf {decision-LWE}_{n,m,q}\) with binary error with advantage at least \({\epsilon }/2m\).

Proof

See Appendix D for the proof.

We can use the above variant of the GHV scheme to encrypt binary matrices by choosing a uniformly random \(\mathbf{S }\in \{0,1\}^{n\times m}\) and a uniformly random “error matrix” \(\mathbf{X }\in \{0,1\}^{m \times m} \). We call this variant of the GHV scheme MVGHV. Note that MVGHV is more efficient than the original GHV scheme: MVGHV takes time \(\mathcal {O}(m\cdot n)\) to encrypt a matrix of \(m^{2}\) elements, compared with \(\tilde{\mathcal {O}}(m^{3})\) in the GHV scheme. The CPA privacy of the MVGHV scheme is based on the LWE with binary error assumption, following the proof in [7].

Theorem 4

For the parameter \(n=n(\kappa )\) and \(c = c(n) > 0\), let \(q > 8n^{3c}\), \(m=\lfloor 5n\log q \rfloor \), then the encryption scheme from above with parameters n,m,q supports \(n^{c}\) additions.

Proof

See Appendix A for the proof.

Theorem 4 shows that the number m of \(\textsf {LWE}\) with binary error samples is linear in n, i.e., \(m=\mathcal {O}(n)\). This requirement on m can be satisfied by taking \(m=5n \log q\) for fixed q.
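
As a quick numerical illustration of the parameter conditions in Theorem 4, the sketch below picks q and m for given n and c. Two choices here are assumptions rather than part of the theorem: the logarithm is taken to base 2, and q is taken to be the smallest prime above \(8n^{3c}\) (Theorem 4 only asks for \(q > 8n^{3c}\); a prime modulus is what Theorem 2 refers to).

```python
import math

def next_prime(x):
    """Smallest prime strictly greater than x (trial division, toy sizes only)."""
    candidate = max(2, x + 1)
    while True:
        if all(candidate % d for d in range(2, math.isqrt(candidate) + 1)):
            return candidate
        candidate += 1

def vghv_parameters(n, c):
    """Return (q, m) with q > 8 * n^(3c) and m = floor(5 * n * log2(q))."""
    bound = math.floor(8 * n ** (3 * c))
    q = next_prime(bound)
    m = math.floor(5 * n * math.log2(q))
    return q, m

q, m = vghv_parameters(n=4, c=1)
```

For n = 4 and c = 1 the bound is 512, so this sketch returns q = 521 and m = 180.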

3.2 Our SPIR Scheme

Now we introduce our SPIR scheme. We redefine the database \((db_{1}, \cdots , db_{tb})\), \(db_{i}\in \{0, 1\}\), as follows: assume the database can be split into nb blocks of mw words of bb bits each, such that \(nb \cdot mw \cdot bb = tb\) (tb is the size of the database). That is, nb is the number of blocks, mw the number of words per block, and bb the number of bits per word. If tb cannot be decomposed in this way, pad the database with several extra bits so that it can. We view the database as a 2-dimensional array of words in which each word is identified by two coordinates. We thus obtain the database \(b=\{b_{i,j} | 1 \le i \le nb, 1 \le j \le mw \}\), whose total bit size is tb. We first assume that every word is a single bit, that is, \( bb = 1 \).
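
The database layout just described can be sketched in Python as follows. The function name `layout_database` and the choice to pad with zero bits at the end are assumptions made for illustration.

```python
def layout_database(bits, mw, bb):
    """Pad `bits` with zeros and arrange them as nb blocks of mw words of bb bits."""
    block_bits = mw * bb
    padded = list(bits) + [0] * (-len(bits) % block_bits)
    nb = len(padded) // block_bits
    db = []
    for i in range(nb):
        block = padded[i * block_bits:(i + 1) * block_bits]
        db.append([block[j * bb:(j + 1) * bb] for j in range(mw)])
    return db

# 7 bits with mw = 2 words of bb = 2 bits: one padding bit, nb = 2 blocks.
db = layout_database([1, 0, 1, 1, 0, 1, 1], mw=2, bb=2)
```

After padding, the number of stored bits equals \(nb \cdot mw \cdot bb\), and `db[i][j]` is the word \(b_{i+1,j+1}\) in the 1-based notation of the text.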

Assume that the client C wants to recover block u, which consists of \(\{ b_{u,j}\}\). We present the SPIR scheme as four PPT algorithms SPIR = (SP.Setup, SP.Query, SP.Response, SP.Dec):

  • \({\textsf {SP.Setup}}(1^{\kappa }) \rightarrow (pk, sk)\): This algorithm sets up the system and generates the public key pk and the secret key sk. On input \(\kappa \), run VG.KeyGen to obtain the public key \(\textit{pk}={\mathbf{A }}\in \mathbb {Z}^{m\times n}_{q}\) and the secret key \(\textit{sk}={\mathbf{T }}\in \mathbb {Z}^{m\times m}\). The public key pk is published to S, and the secret key sk is kept secret by C.

  • \({\textsf {SP.Query}}_{pk}\)(\(u)\rightarrow Q\): This algorithm lets C compute the query string. On input the public key pk and index u, compute E(u) to obtain the index vector \(\varvec{\delta }_{i, u}\in \{0,1\}^{nb}\), and then split it into \(m_{c}=\lceil nb/m \rceil \) vectors of length m; if \(\varvec{\delta }\) cannot be decomposed in this way, pad the last vector with 0 elements. Encrypt these vectors in order by running \({\textsf {VG.Enc}}_{pk}(\mathbf m _{c})\rightarrow \mathbf c _{c}\), \(c\in [m_{c}]\). The query message Q consists of these ordered vectors \(\mathbf c _{c}\).

  • \({\textsf {SP.Response}}(b,Q)\rightarrow R \): This algorithm lets S compute the responses. On input the query message Q and the database b, compute, for every j from 1 to mw, \( \mathbf{r }_{j} = \sum _{c=1}^{m_{c}}b_{m(c-1)+i,j}\mathbf{c }_{c} \) for i from 1 to m. By the homomorphism, multiplying \(\mathbf{c }\) by \(b_{i,j}\) corresponds to multiplying \(\varvec{\delta }\) by \(b_{i,j}\). Thus \(\mathbf{r }_{j}\) is the homomorphic sum of \(b_{i,j}{} \mathbf m _{c}\). Since the element of the vector \(\varvec{\delta }_{i,u}\) is 1 where \(i=u\) and 0 otherwise, \(\mathbf{r }_{j}\) is a ciphertext of block \(b_{u}\). The response message R consists of these vectors \(\mathbf{r }_{j}\).

  • \({\textsf {SP.Dec}}_{sk}(R)\rightarrow b_{u,j}\): This algorithm lets C recover the block that it wants. On input the secret key sk and the responses R, run \({\textsf {VG.Dec}}_{sk}(\mathbf r _{j})\) to obtain block u.
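
The query preparation step of SP.Query, before any encryption is applied, can be sketched as follows. The function names are hypothetical, indices are 1-based as in the text, and the resulting chunks are the plaintext vectors \(\mathbf m _{c}\) that would then be encrypted one by one.

```python
import math

def index_vector(u, nb):
    """E(u): the length-nb vector whose u-th element (1-based) is 1, others 0."""
    return [1 if i == u else 0 for i in range(1, nb + 1)]

def split_query_vector(delta, m):
    """Split delta into m_c = ceil(nb / m) vectors of length m, zero-padding the last."""
    m_c = math.ceil(len(delta) / m)
    padded = delta + [0] * (m_c * m - len(delta))
    return [padded[k * m:(k + 1) * m] for k in range(m_c)]

# nb = 7 blocks, ciphertext length m = 3, wanted block u = 5.
chunks = split_query_vector(index_vector(5, 7), 3)
```

Here `chunks` holds three vectors of length 3, and the single 1 sits in the second position of the second chunk.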

Multi-Words SPIR Scheme. In the above scheme, we assume that each word is a single bit. We can easily modify the scheme to handle multi-bit words. Instead of computing \(\mathbf{A }{\mathbf{s }} + 2\mathbf{e }\), we compute \(\mathbf{A }\mathbf{s }+ 2^{mw}\mathbf{e }\), so that \(b_{u,j}\mathbf{e }_{j}\leftarrow b_{u,j}\mathbf{T }_{j}(2\mathbf{x }_{j}+\varvec{\delta }_{j})\mod q \), and hence \(b_{u,j}\mathbf{T }^{-1}_{j}\mathbf{e } \mod 2^{mw} =b_{u,j}{\varvec{\delta }}_{j}\). Since q has to be large for security reasons and the noise grows only linearly when processing the database, we can afford to start with a fairly large noise. The same trick yields the “multi-words matrix retrieval”.

3.3 Our VSPIR Scheme

Before introducing our VSPIR scheme, we describe the probabilistic verification process used in [5]. C delegates some computation F to an untrusted server S and wants to verify the response from S. Assuming that C can precompute F on a random input, the process consists of three procedures:

  • Setup. Input \(\kappa \), the delegated computation function \(F:\{ 0,1 \}^{n} \rightarrow \{ 0,1 \}^{m}\), and the value \(x \in \{ 0,1 \}^{n}\).

  • Precomputation. C samples a random input r, computes \(w=F(r)\), and stores (r, w) as secret state.

  • Delegation and Verification. C has an input \(x \in \{ 0,1 \}^{n}\). C sets \(r_{0} = r\) and \(r_{1} = x\), then samples a random bit \(b \in \{ 0,1 \}\), and sends the pair \((r_{b},r_{1-b})\) to S. S applies F to the first and second elements of the received pair and sends the results \((z_{0},z_{1})\) to C. Then C accepts and recovers the response \(z_{1-b}\) if \(w=z_{b}\).
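
The three procedures above can be sketched in Python as follows. The function F used here (reversing a bit tuple) and the server callbacks are hypothetical stand-ins chosen only to make the sketch runnable, and `None` plays the role of the rejection symbol.

```python
import secrets

def F(bits):
    """Hypothetical delegated function: reverse a tuple of bits."""
    return tuple(reversed(bits))

def precompute(n):
    """Client: sample a random r and store (r, w = F(r)) as secret state."""
    r = tuple(secrets.randbelow(2) for _ in range(n))
    return r, F(r)

def delegate(x, r, w, server):
    """Client: send (r_b, r_{1-b}) with r_0 = r, r_1 = x; accept z_{1-b} if z_b == w."""
    inputs = (r, x)
    b = secrets.randbelow(2)
    z = server((inputs[b], inputs[1 - b]))
    if z[b] != w:
        return None          # reject
    return z[1 - b]          # accepted answer for x

def honest_server(pair):
    return F(pair[0]), F(pair[1])

def cheating_server(pair):
    # Flips the first bit of both answers, so the check on r always fails.
    return tuple((1 - F(v)[0],) + F(v)[1:] for v in pair)

x = (1, 0, 0, 1, 1)
r, w = precompute(len(x))
answer = delegate(x, r, w, honest_server)
```

A server that corrupts only one of the two answers has to guess which position holds x, which is the source of the 1/2 bound on the cheating probability in a single run.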

From the work in [5], since x and r are independent and identically distributed, no malicious adversary can cheat C successfully with non-negligible probability. Our VSPIR scheme employs probabilistic verification procedures similar to those above. The main difference between the probabilistic verification procedure of our VSPIR and the proposal in [5] is that we use a random vector to mark the result that we want instead of precomputing. We now describe our VSPIR scheme VSPIR = (VSP.Setup, VSP.Challenge, VSP.Response, VSP.Verify) in detail:

  • VSP.Setup(\(1^{\kappa }) \rightarrow (sk,pk)\). On input \(1^{\kappa }\), run SP.Setup and output \((sk, pk)\).

  • VSP.Challenge\(_{sk}\)(\( u)\rightarrow Q\). On input the index u and sk, pick a random vector \(\mathbf{v} \in \{0,1\}^{m}\). For each position d, if the d-th element \(v_{d}\) of \(\mathbf{v}\) is 1, run the index mapping function \(\mathbf{v}_{i,u} \leftarrow E(u)\); otherwise, generate a zero vector of the same dimension. Then combine these vectors into a matrix \(\mathbf{I}\). Run the encryption algorithm of MVGHV to encrypt the matrix in the same way as SP.Query, and obtain the query Q.

  • VSP.Response(b, Q)\(\rightarrow R_{j}\). On input the database b and the query message Q, compute SP.Response(b, Q) to obtain the responses \(R_{j}\).

  • VSP.Verify\(_{sk}\)(Q, \(R_{j}\))\(\rightarrow \{ b_{u}, \bot \}\). On input Q and the response messages \(R_{j}\), run the decryption algorithm of MVGHV and check whether the decryption result matches the expectation. Accept and output the recovered value if, at the positions marked by the random vector \(\mathbf{v}\), the elements of the u-th row of the matrices \(b_{u,j}\mathbf{I}\) are the same as \(b_{u,j}\). Otherwise, reject and output \(\bot \).
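The marking idea behind VSP.Challenge and VSP.Verify can be illustrated on plaintext data. The sketch below is one reading of the description above and not the scheme itself: MVGHV encryption is omitted entirely (so the server here could see \(\mathbf{v}\), which the real scheme prevents), the database is flattened to a single list of bits, the index mapping E(u) is assumed to be the u-th unit vector, resampling an all-zero \(\mathbf{v}\) is an added assumption, and the function names are hypothetical.

```python
import secrets


def challenge(u: int, nb: int, m: int):
    """Build the marking vector v and the nb-by-m selection matrix (unencrypted)."""
    v = [0] * m
    while not any(v):  # assumption: resample so at least one position is marked
        v = [secrets.randbits(1) for _ in range(m)]
    # Column d is the unit vector e_u when v[d] = 1 and the zero vector otherwise.
    sel = [[v[d] if row == u else 0 for d in range(m)] for row in range(nb)]
    return v, sel


def response(bits, sel):
    """Server side: multiply the database (as a row vector) by the matrix."""
    m = len(sel[0])
    return [sum(bits[row] * sel[row][d] for row in range(len(bits)))
            for d in range(m)]


def verify(v, R):
    """Accept iff unmarked positions are 0 and all marked positions agree."""
    marked = {R[d] for d in range(len(v)) if v[d] == 1}
    unmarked_ok = all(R[d] == 0 for d in range(len(v)) if v[d] == 0)
    if not unmarked_ok or len(marked) != 1 or not marked <= {0, 1}:
        return None  # reject
    return marked.pop()


database = [1, 0, 1, 1, 0, 0, 1, 0]
v, sel = challenge(u=3, nb=len(database), m=6)
print(verify(v, response(database, sel)))
```

An honest response equals \(b_{u}\) at the marked positions and 0 elsewhere, so any answer that disagrees with this pattern is rejected.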

4 Performance Analysis

In this section, we first analyze the correctness, privacy and security of the proposed SPIR and VSPIR schemes. After that, we present the communication complexity and computational complexity of our proposals.

4.1 Correctness, Privacy and Security

Theorem 5

(Correctness). If the proposed MVGHV encryption scheme is homomorphic with respect to polynomially many additions, then our VSPIR scheme computes the correct retrieved information whenever the server gives the correct response.

Proof

See Appendix B for the proof.

Theorem 6

(Privacy). If any distinguishing algorithm can distinguish two queries for distinct bits of the database with probability at least \(1/2 + \epsilon /2\), then the distinguisher can break the privacy of our VSPIR scheme with probability \((1 + \epsilon )/2\).

Proof

See Appendix C for the proof.

Theorem 7

(Security). Our VSPIR scheme is considered secure if no PPT adversary can deceive the client into accepting an incorrect value returned by the server with non-negligible probability.

Proof

See Appendix D for the proof.

4.2 Complexity

In this section, we analyze the communication complexity and computational complexity of our protocols. Recall that in our schemes the public key is sent only once, is independent of the database and the query, and can be used for many queries. Therefore, it is customary to analyze such schemes in the public key model, where sending the public key does not count towards the communication complexity.

For the SPIR scheme, the encrypted query consists of \(\lceil nb/m \rceil \) ciphertext vectors with a total size of \( \lceil nb/m \rceil m\log q \) bits, and the response size is \(m\log q\). When \(nb=mw=\sqrt{tb}\), for a fixed security parameter, the communication complexity is \(\mathcal {O}(c^{'}\sqrt{tb})\). For the VSPIR scheme, the total size of the messages transmitted in one round is \((m^{2}+\lceil nb/m \rceil m^{2})\log q\) and the communication complexity is \(\mathcal {O}(c\sqrt{tb})\), where c and \(c^{'}\) are constants.

The communication complexity does not change when multiple bits are recovered, but we can now retrieve a block of mw bb bits. Furthermore, when \(nb=mw=\sqrt{tb/bb}\), the communication complexity is \( \mathcal {O}(c\sqrt{tb/bb})\) per recovered index.

We now turn to the computational complexity. To set up the system, the key generation algorithm is executed once and takes time \(\mathcal {O}(1)\). Without any optimization, encrypting the vector index takes about \(\lceil nb/m\rceil m\) operations and encrypting the matrix index takes about \(\lceil nb/m\rceil m\cdot n\) operations. When \(nb = mw = \sqrt{tb}\), for a fixed security parameter, the computational complexity is \(\mathcal {O}(\sqrt{tb})\) and \(\mathcal {O}(c\sqrt{tb})\), respectively.

Fig. 2. The trends of the client’s costs in our VSPIR scheme

Table 2. The client’s costs in our VSPIR scheme (second).

5 Computer Implementations

In this section, we describe a straightforward implementation of our VSPIR scheme that does not aim for a high level of optimization. The timings were measured on a 2013 ASUS (Intel(R) Core(TM) i5-3230M, 2 hyperthreaded cores at 2.60 GHz, 8 GB RAM at 1.600 GHz) running Windows 10 Home (x64). Our implementation is single-threaded. We used NTL for operations over \(\mathbb {Z}_{q}\), matrix operations, and big number operations. We implemented the algorithm VSP.Query and the decryption part of VSP.Verify. To simplify the operation, we focus on a matrix. To show the cost of the proposed VSPIR scheme, Table 2 lists the time costs for different parameter choices and Fig. 2 plots the corresponding trends. All times are given in seconds (s).

Now we consider some real scenarios. As shown in [17], the maximum bandwidth of common Internet access technologies is 54 Mbit/s for Wireless 802.11g, 100 Mbit/s for Fast Ethernet, and 622 Mbit/s for OC12. In these scenarios, the communication complexity of the server and the client can be asymmetrically negligible. Let the database contain 1 Gbit; we set the bit length of \(q\) to 30, \(n= 269\), and \(nb=\sqrt{tb}\). Assuming that the upload speed and the download speed are both 20 Mbit/s, we can roughly estimate that one round of query and response takes about 31 s. The probabilistic verification process increases the time cost only slightly.
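Such estimates can be reproduced from the message sizes given in Section 4.2. The following sketch evaluates those formulas and converts a message size into a transfer time. The function names are hypothetical, 1 Gbit is taken to be \(2^{30}\) bits, and the value of m used in the example is purely illustrative (it is not stated in the text above), so the printed figure is not expected to match the 31 s estimate.

```python
import math


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)


def spir_round_bits(nb: int, m: int, log_q: int) -> int:
    """Query of ceil(nb/m) * m * log q bits plus a response of m * log q bits."""
    return ceil_div(nb, m) * m * log_q + m * log_q


def vspir_round_bits(nb: int, m: int, log_q: int) -> int:
    """One VSPIR round: (m^2 + ceil(nb/m) * m^2) * log q bits."""
    return (m * m + ceil_div(nb, m) * m * m) * log_q


def transfer_seconds(bits: int, mbit_per_s: float) -> float:
    """Time to send `bits` over a link of `mbit_per_s` Mbit/s."""
    return bits / (mbit_per_s * 1_000_000)


tb = 2 ** 30            # assumption: 1 Gbit read as 2^30 bits
nb = math.isqrt(tb)     # nb = sqrt(tb)
log_q = 30              # bit length of q, as in the text
m = 512                 # hypothetical value, for illustration only

bits = vspir_round_bits(nb, m, log_q)
print(f"{bits} bits, about {transfer_seconds(bits, 20):.1f} s at 20 Mbit/s")
```

Changing m or the link speed shows how strongly the round time depends on the \(m^{2}\) term in the VSPIR message size.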

6 Conclusions

In this paper, we have proposed an efficient SPIR scheme based on the LWE assumption with binary errors, and a VSPIR scheme that uses probabilistic verification and works under the malicious server model. To the best of our knowledge, our scheme is the first practical VSPIR scheme under the malicious server model. Specifically, our VSPIR scheme has communication complexity \(\mathcal {O}(c\sqrt{tb})\), which is smaller than the communication complexity of the proposal in [18].