Abstract
In this paper, we present a new perspective of single server private information retrieval (PIR) schemes by using the notion of linear errorcorrecting codes. Many of the known single server schemes are based on taking linear combinations between database elements and the query elements. Using the theory of linear codes, we develop a generic framework that formalizes all such PIR schemes. This generic framework provides an appropriate setup to analyze the security of such PIR schemes. In fact, we describe some known PIR schemes with respect to this codebased framework, and present the weaknesses of the broken PIR schemes in a unified point of view.
1 Introduction
Private information retrieval (PIR) was first introduced in [1] to cope with the following problem: retrieving an element from a database, without revealing to the untrusted source managing the database any information about that element. Since its introduction, it has attracted many researchers and several works have addressed their focus on it. There have been proposed two solutions to this problem, namely, the information theoretical one and the computational one. The first one aims to guarantee that the server gets no information about the file that the user wants to retrieve. Solutions for multiple servers were presented in [2,3,4,5,6,7]. In the case of a single server, the trivial solution, i.e., downloading the whole database, is the only possibility to ensure information theoretical privacy. However, the trivial solution is not satisfactory as it comes with a very large communication cost, which basically implies that it is impractical. On the contrary, in computational PIR, the privacy is guaranteed assuming that the server has limited computational power. Hence, the computational PIR (cPIR) is the only practical solution in case of a single server.
Most of the early cPIR schemes are based on the difficulty of numbertheoretical problems, such as integer factorization (see for example [8,9,10,11]). The known (nontrivial) single server cPIR constructions require to perform some cryptographic operations on each database element, which increase the computational cost of these schemes in comparison to the information theoretical ones. In [12], Sion and Carbunar showed that the numbertheoretical PIR schemes are not practical, and computing a PIR reply is always less efficient than sending the whole database. Moreover, such schemes, based on factoring an integer, will be insecure in the era of quantum computers [13].
Some recent constructions of PIR schemes use a fully homomorphic encryption (FHE) scheme. Yi et al. presented in [14] a generic way to construct a PIR from an FHE. Following this construction many PIR protocols have been proposed using FHE schemes based on problems in lattices and learning with error (LWE) problems [15,16,17,18]. Recently, AguilarMelchor et al. presented in [16] XPIR, a PIR construction using a RingLWE based FHE scheme, that is computationally efficient but comes with a large communication cost. Following [16], Angel et al. in [17] were able to significantly improve its communication cost with only slightly more computations compared to XPIR. Along with the scheme of Angel et al., the recent work of Ali et al. [18] represent the stateoftheart efficiency for PIR schemes.
Recently, Holzbaur, Hollanti and WachterZeh have proposed in [19] the first single server PIR based on coding theory. However, their proposal was attacked in [20]. The primary idea in [19] is to generate the query by hiding carefully chosen error vectors using codewords from a random linear code. The linear code is kept secret by the user in order to obtain privacy. The same idea was previously used by AguilarMelchor and Gaborit in a latticebased PIR scheme [21], without using the notion of linear codes. The scheme was later attacked by Liu and Bi [22] using lattice reduction algorithms.
Interestingly, the idea of hiding query information using linear codes can be observed, directly or indirectly, in several other PIR schemes. In this paper, we develop a unified framework that describes all such PIR schemes. In particular, this framework characterizes all the single server PIR schemes that generate replies by contracting the database elements and the query elements using linear combinations. The main aim of this paper is to provide a survey on several existing single server PIR schemes in a unified coding theoretic perspective. This allows a deeper theoretical insight on the security of these PIR schemes.
The framework is based on two key elements: a linear code that hides the query information, and a retrieval function that allows the user to retrieve the desired file from a linearly entangled reply. On one hand, the notion of linear codes describes the common features of several existing PIR schemes, and on the other hand, the retrieval function describes the key differences between the schemes. In terms of the framework, the privacy of a PIR scheme heavily relies on the retrieval function. We observe that several choices of retrieval functions are not safe to use, for example, finite field homomorphisms and vector space homomorphisms. Moreover, we discuss the weaknesses of many broken PIR schemes with respect to this codebased framework.
The paper is organized as follows: in Sect. 2, we introduce the notation that will be used throughout the paper and give the background on single server private information retrieval, and linear codes over finite fields and over rings. In Sect. 3, we present the codebased framework and discuss the security in a general point of view. In Sect. 4, we provide a survey on four different PIR schemes, described in terms of the codebased framework. The first one is a basic scheme that uses a finite field homomorphism as the retrieval function. Whereas the other three are based on the existing PIR schemes [19, 21] and [16], respectively. The latter is the only example of a scheme that we present which is still unbroken. For the former two we will also describe the existing attacks with respect to the proposed codebased framework. Finally, in Sect. 5, we draw some theoretical remarks on the generality of the framework, and on the security of single server PIR schemes.
2 Preliminaries
In this section we introduce the notation that we use in the paper and we recall some background on the theory of single server PIR. Moreover, we introduce the basic notions of errorcorrecting linear codes.
2.1 Notation
In this paper, we denote by \(\mathcal{R}\) a ring and by \(\mathcal {R}^\times \) the set of invertible elements in the ring \(\mathcal {R}\). Moreover, let q be a prime power, then we denote by \(\mathbb {F}_q\) the finite field of size q.
We use bold lower case, respectively bold upper case letters to denote row vectors, respectively matrices. When we consider column vectors, we use the transpose symbol. The identity matrix of size k is denoted by \(\mathbf {I}_k\). Given a vector \(\mathbf {x}\) of length n and a set \(S \subset \{1, \ldots , n\}\), we denote by \(\mathbf {x}_S\) the projection of \(\mathbf {x}\) on the coordinates indexed by S. In the same way, \(\mathbf {M}_{S}\) denotes the projection of the \(k\times n\) matrix \(\mathbf {M}\) to the columns indexed by S.
For a set S we denote by \(S^C\) its complement. The support of a vector \(\mathbf {x}\in \mathbb {F}_q^n\) is denoted by \(\text {Supp}(\mathbf {x})=\{ 1 \le i \le n \mid x_i \ne 0\}.\)
The ith entry of a vector \(\mathbf {x} \in \mathbb {F}_q^n\) is denoted by \(\mathbf {x}[i]\), for \(i \in \{1, \ldots , n\}\).
Given a set S and a distribution \(\chi \) on S, \(x \leftarrow \chi \) represents a sample x from S following the distribution \(\chi \).
2.2 Single server private information retrieval
A single server PIR is a scheme involving two parties, the user and the server. The server manages a database containing some public information, and the user is interested in retrieving some entries of the database, without revealing which item was queried.
2.2.1 Basic description
A basic description of a single server PIR scheme is as follows. Let the database be denoted by \(\mathcal {DB}=\{db_1,\ldots ,db_N\}\), containing N files, and suppose the user wishes to retrieve the ith file \(db_i\). The user first constructs a query \(Q = \{q_1,\ldots ,q_N\}\), which hides the information about the index i, and sends it to the server. The server computes a response by performing certain operations between \(q_j\) and \(db_j\) for each j, and returns it to the user. The scheme is said to be correct if the user can retrieve the desired file \(db_i\) from the response.
2.2.2 Communication and computational cost
A simple solution to preserve the privacy is downloading the whole database. However, the communication cost of this operation, measured as the total number of bits exchanged by user and server, in the trivial case is too high, namely \(\mathcal {O}(N)\) where N is the size of the database. Modern PIR protocols allow the user to retrieve data from the database, with a communication complexity much smaller than \(\mathcal {O}(N)\). Some common methods can be used to improve the communication cost of any PIR scheme. In Sect. 3.2, we discuss such techniques in detail.
Another important aspect of a single server PIR scheme is the computational cost. Since the database has to process each entry of the query, the schemes are computationally expensive.
2.3 Linear codes
2.3.1 Over finite fields
Let \(\mathbf {x}\) be a vector in \(\mathbb {F}_q^n\). The Hamming weight of \(\mathbf {x}\) is denoted by \(\mathrm {wt}(\mathbf {x})\) and it is defined as the number of its nonzero entries, i.e., it is the size of its support. The Hamming distance between two vectors \(\mathbf {x},\mathbf {y}\in \mathbb {F}_q^n\) is defined as the number of components in which the two vectors differ, i.e., \(d(\mathbf {x},\mathbf {y}) = \{i \ \mid \ x_i\ne y_i\}\).
An \([n,k]_q\) linear code \(\mathcal {C}\) is a kdimensional subspace of \(\mathbb {F}_q^n\) endowed with the Hamming distance and the elements of \(\mathcal {C}\) are called codewords.
The minimum distance d of \(\mathcal {C}\) is the quantity
When the minimum distance d of a linear code \(\mathcal {C}\) is known, then \(\mathcal {C}\) is denoted by \([n,k,d]_q\).
A matrix \(\mathbf {G}\in \mathbb {F}_q^{k\times n}\) whose rows form a basis for \(\mathcal {C}\) is called generator matrix of \(\mathcal {C}\). Hence, we can define the code \(\mathcal {C}\) as \(\{\mathbf {v}\in \mathbb {F}_q^n\mid \mathbf {v}=\mathbf {u}\mathbf {G}^\top , \mathbf {u}\in \mathbb {F}_q^k\}\). Similarly, we can define the code \(\mathcal {C}\) as the kernel of a matrix \(\mathbf {H}\in \mathbb {F}_q^{(nk)\times n}\), i.e. \(\mathcal {C}:=\ker (\mathbf {H}) = \{\mathbf {v}\in \mathbb {F}_q^n\mid \mathbf {H}\mathbf {v}^\top = \mathbf {0}\}\). Such a matrix is called paritycheck matrix for the code \(\mathcal {C}\). An information set of an \([n,k,d]_q\) code \(\mathcal {C}\) is a set \(I \subset \{1, \ldots , n\}\) of size k, such that \(\mid \mathcal {C} \mid = \mid \mathcal {C}_I \mid ,\) where \(\mathcal {C}_I\) denotes the restriction of all codewords to the entries indexed by I.
2.3.2 Over rings
Let \(\mathcal {R}\) be a commutative ring with identity. A linear code \(\mathcal {C}\) of length n over \(\mathcal {R}\) is an \(\mathcal {R}\)module in the space \(\mathcal {R}^n\).
A linear code \(\mathcal {C}\) of length n over \(\mathcal {R}\) is called cyclic if \(\mathbf {c}= (c_1,\ldots ,c_n) \in \mathcal {C}\) implies \((c_{n},c_1,\ldots ,c_{n1}) \in \mathcal {C}\). Equivalently, \(\mathcal {C}\) is an ideal of the ring \(\mathcal {R}[x]/(x^n1)\).
A linear code \(\mathcal {C}\) of length n over \(\mathcal {R}\) is called negacyclic if \(\mathcal {C}\) is an ideal of the ring \(\mathcal {R}[x]/(x^n+1)\).
3 Codebased framework
In this section, we present a generic framework for single server PIR schemes by using the notion of errorcorrecting codes. For simplicity, we present the framework using a simple database setup, later we discuss different kinds of database setups that can be used to improve the communication complexity.
3.1 Codebased framework
Before we describe the framework in detail, we highlight some elements that are used in the framework:

We describe the generic framework over a finite commutative ring \(\mathcal {R}\) using a retrieval function \(f:\mathcal {R}\rightarrow \mathcal {R}\) and three subsets X, Y, Z of \(\mathcal {R}\).

The database files belong to the set X.

In order to generate queries, we fix a randomly chosen linear code \(\mathcal {C}\) over \(\mathcal {R}\). Each element of the query is the sum of a randomly chosen codeword in \(\mathcal {C}\) and an error vector over \(\mathcal {R}\).

To generate the error vectors corresponding to the nondesired files we use the set Y, whereas for the desired file we use the set Z.
Setup:
We define a retrieval function \(f: \mathcal {R}\rightarrow \mathcal {R}\), and subsets \(X,Y,Z \subseteq \mathcal {R}\) satisfying:

1.
f is a nonzero map.

2.
\(Y \subseteq \ker (f):= \{x \in \mathcal {R}: f(x)=0\}\) such that any linear combination of elements in Y with scalars in X belongs to \(\ker (f)\), i.e., \(x_1y_1+x_2y_2+ \cdots + x_jy_j \in \ker (f)\) whenever \(x_1,\ldots ,x_j \in X\) and \(y_1,\ldots ,y_j \in Y\).

3.
\(Z \subseteq f^{1}(\mathcal {R}^\times )\) such that \(f(y+xz) = xf(z)\) for all \(y \in \ker (f), x \in X\) and \(z \in Z\).
Note that f does not need to be a ring homomorphism, it can be any kind of function from \(\mathcal {R}\) to \(\mathcal {R}\) satisfying the above three conditions.
Let \(\mathbf {M} = (m_{i}) \in X^{N}\) represent the database, i.e., there are N files in the database. Suppose that the user wants to retrieve the bth file from the database.
Let \(\mathcal {C}\) be a random linear code over \(\mathcal {R}\) of length n, i.e., \(\mathcal {C}\) is an \(\mathcal {R}\)submodule of \(\mathcal {R}^n\).
Query generation:
Let \(\mathbf {g}_1,\ldots ,\mathbf {g}_m\) be generators of \(\mathcal {C}\) as an \(\mathcal {R}\)module, and let \(\mathrm {Enc}: \mathcal {R}^m \rightarrow \mathcal {R}^n\) be an encoding map of \(\mathcal {C}\). Note that \(\mathrm {Enc}\) is an \(\mathcal {R}\)linear map given by \((a_1,\ldots ,a_m) \mapsto a_1 \mathbf {g}_1 + \cdots + a_m \mathbf {g}_m\).
Let \(\mathbf {a}_1,\mathbf {a}_2,\ldots ,\mathbf {a}_N\) be randomly chosen elements in \(\mathcal {R}^m\), and define \(\mathbf {c}_i = \mathrm {Enc}(\mathbf {a}_i)\) for all \(i \in \{1,\ldots ,N\}\).
Now, let v be a randomly chosen fixed element in \(\{1,\ldots ,n\}\) and we randomly choose error vectors \(\mathbf {e}_1,\mathbf {e}_2,\ldots ,\mathbf {e}_N\) in \(\mathcal {R}^n\), such that they satisfy the following conditions that allow the reply extraction:
Let \(\mathbf {q}_i:=(\mathbf {a}_i,\mathbf {c}_i+\mathbf {e}_i)\) for all \(i\in \{1,\dots , N\}\). The query is then given by
Reply generation: The response is generated by computing
Reply extraction: First we perform the decoding by applying the encoding map on \(\mathbf {r}_1\), and obtain:
After that we can use the retrieval function f on the vth coordinate,
The above equalities follow from the conditions of the retrieval function. Now, since we know \(f(\mathbf {e}_b[v])\) and we have that \(f(\mathbf {e}_b[v]) \in f(Z) \subseteq \mathcal {R}^\times \), we can retrieve the desired file \(m_b\).
3.2 Communication complexity and different database setups
With respect to the basic description of the codebased framework, the communication cost is more than the size of the whole database. Indeed, for each file which is an element in \(\mathcal {R}\), we are sending a query element in \(\mathcal {R}^{n+m}\). Thus the total communication cost is \((N+1)\) times the size of an element in \(\mathcal {R}^{m+n}\). We can improve the communication complexity by using a matrix database setup [1] or iterative response techniques:

Matrix setup of database: In order to reduce the communication complexity, one can see the database as an \(s \times t\) matrix, where each element of the matrix is a database file. Now, the user generates a query \(Q = \{\mathbf {q}_1,\dots , \mathbf {q}_t\}\) containing t elements. For each query, the server replies by sending back the response \(R=\{\mathbf {r}_1,\dots , \mathbf {r}_s\}\), which contains s responses corresponding to the s rows of the database matrix. This technique was introduced in [1]. Using this approach and assuming \(s=t = \sqrt{N}\), the communication complexity is \(2\sqrt{N}\) times the size of an element in \(\mathcal {R}^{m+n}\).

Iterative reply generation: In this technique, one splits each file into L parts and repeats the query to retrieve each part of the file. Since the query is generated in order to retrieve only small portions of the desired file, the size of the ambient space reduces accordingly. Hence, relative to the size of the database, the query size reduces by a factor of L, and the response size increases by the same factor.
3.3 Security
The security of a single server computational PIR scheme is based on the difficulty of identifying the index of the desired file by looking at the query. With respect to the codebased framework, we can describe the security using the following distinguishability problem.
Problem 1
(Distinguishability Problem) Consider the notations of the setup and the query generation process of the codebased framework. Given the query vectors \(\mathbf {q}_1,\mathbf {q}_2,\ldots ,\) \(\mathbf {q}_N\), determine the index b of the desired file.
The difficulty of solving the distinguishability problem depends highly on the choice of the retrieval function f. In the following, we present two generic strategies that can be used to solve this problem. However, the computational cost of these strategies directly relies on the choice of the retrieval function and the error vectors \(\mathbf {e}_1,\ldots ,\mathbf {e}_N\).

1.
Consider the following matrix consisting of the query vectors
$$\begin{aligned} \mathbf {A} = \begin{pmatrix} \mathbf {q}_1 \\ \mathbf {q}_2 \\ \vdots \\ \mathbf {q}_N \end{pmatrix} = \begin{pmatrix} \mathbf {a}_1 &{} \mathbf {c}_1+\mathbf {e}_1 \\ \mathbf {a}_2 &{} \mathbf {c}_2 + \mathbf {e}_2 \\ \vdots \\ \mathbf {a}_N &{} \mathbf {c}_N + \mathbf {e}_N \end{pmatrix} \in \mathcal {R}^{N \times (m+n)}. \end{aligned}$$Observe that the vectors \(\left( \mathbf {e}_1[j],\mathbf {e}_2[j],\ldots ,\mathbf {e}_N[j]\right) \) for all \(j \in \{1,\ldots ,n\}\) belong to the column span of \(\mathbf {A}\). We recall that the vth coordinate of the error vectors are chosen in a special way, i.e., \(\mathbf {e}_b[v] \in Z \subseteq f^{1}(\mathcal {R}^\times )\) and \(\mathbf {e}_i[v] \in Y \subseteq \ker (f)\) for all \(i \ne b\). Hence, one could solve Problem 1 by finding the vector \(\left( \mathbf {e}_1[v],\mathbf {e}_2[v],\ldots ,\mathbf {e}_N[v]\right) \) in the column span of \(\mathbf {A}\).

2.
Let \(\mathbf {A}\) be the query matrix as defined above. For each \(i \in \{1,\ldots ,N\}\), let \(\mathbf {A}_i\) be the submatrix of \(\mathbf {A}\) obtained by deleting the ith row. Clearly, by construction, \(\mathbf {A}_b\) has distinct properties compared to \(\mathbf {A}_i\) for any \(i\ne b\). Thus, if there exists an (algebraic or nonalgebraic) invariant that can distinguish \(\mathbf {A}_b\) from \(\mathbf {A}_i\) for any \(i \ne b\), then Problem 1 can be solved by computing this invariant for each \(\mathbf {A}_1,\ldots ,\mathbf {A}_N\).
4 Examples of different PIR’s in our framework
In this section, we discuss several examples of single server PIR schemes that are based on different kinds of retrieval function. In each case, we analyze the security with respect to the distinguishability problem. In Table 1, we summarize all the differences among the schemes.
4.1 Basic PIR scheme using finite field isomorphism
In the following we describe the simplest case of the codebased framework, i.e., by considering linear codes over an arbitrary finite field and a field homomorphism for the retrieval function.
4.1.1 Scheme
Setup: Since the identity map is the only nonzero field endomorphism, the retrieval function \(f: \mathbb {F}_q \rightarrow \mathbb {F}_q\) has to be the identity map. We consider the sets \(X = \mathbb {F}_q\), \(Y = \ker (f) = \{ 0\}\) and \(Z= f^{1}(\mathbb {F}_q^\times ) = \mathbb {F}_q^\times \). It is easy to see that f satisfies all the conditions of a retrieval function.
Let \(\mathbf {M} = (m_i) \in \mathbb {F}_{q}^{N}\) represent the database, i.e., there are N files in the database, each file is of size q. Let \(\mathcal {C}\) be a random linear [n, k] code over \(\mathbb {F}_q\). The code \(\mathcal {C}\) is kept secret by the user.
Query generation: Let \(\mathbf {G}\) be a generator matrix of \(\mathcal {C}\), and let \(I \subseteq \{1,\ldots ,n\}\) be an information set. We use \(\mathbf {G}\) to perform the encoding, i.e., the encoding map \(\mathrm {Enc}: \mathbb {F}_q^k \rightarrow \mathbb {F}_q^n\) is given by \(\mathbf {a}\mapsto \mathbf {a}\mathbf {G}\).
Let \(\mathbf {a}_1,\ldots ,\mathbf {a}_N\) be randomly chosen vectors in \(\mathbb {F}_q^k\), and define the corresponding codewords \(\mathbf {c}_i:= \mathrm {Enc}(\mathbf {a}_i) = \mathbf {a}_i \mathbf {G}\) for all \(i \in \{1, \ldots , N\}\).
Note that since I is an information set, we have \((\mathbf {c}_i)_I = \mathbf {a}_i \mathbf {G}_I\) for all \(i \in \{1,\ldots ,N\}\). Recall that in the codebased framework we send \(\mathbf {a}_i\)’s in the query to facilitate the decoding in the reply extraction process. However, in this case, this can equivalently be achieved by adding no errors at the coordinates that are indexed by I. In particular, let v be a random element in \(I^C\), and we randomly choose error vectors \(\mathbf {e}_1,\mathbf {e}_2,\ldots ,\mathbf {e}_N\) in \(\mathbb {F}_q^n\) such that

1.
\(\mathrm {Supp}(\mathbf {e}_i) \subseteq I^C\) for all \(i \in \{1, \ldots , N\}\),

2.
\(\mathbf {e}_i[v] = 0\) for all \(i \ne b\), and \(\mathbf {e}_b[v] \ne 0\).
Let \(\mathbf {q}_i:=\mathbf {c}_i+\mathbf {e}_i\) for all \(i \in \{1,\dots , N\}\). The query is then given by
Reply generation:
The database computes
Reply extraction: Write \(\mathbf {r}= \mathbf {c}+ \mathbf {e}\), where \(\mathbf {c}:= \sum _{i=1}^N m_i \mathbf {c}_i\) and \(\mathbf {e}:= \sum _{i=1}^N m_i \mathbf {e}_i\).
Since I is an information set and \(\mathrm {Supp}(\mathbf {e}) \subseteq I^C\), we can perform decoding on \(\mathbf {r}\) by computing
We now only consider the vth coordinate of \(\mathbf {e}\) and apply the identity retrieval function, which gives \(m_b\mathbf {e}_b[v]\), as for all \(i \ne b\) we have that \(\mathbf {e}_i[v] =0.\) Since \(\mathbf {e}_b[v] \ne 0\), we can retrieve \(m_b\).
4.1.2 Security
As we discussed in Sect. 3.3, the security of the presented PIR scheme relies on the hardness of solving the distinguishability problem (see Problem 1). In this case, the distinguishability problem can be solved in polynomial time using the first strategy mentioned in Sect. 3.3.
Let \(\mathbf {A}\) be the matrix containing all the query vectors as rows, i.e.,
with \(\mathbf {C}= \begin{pmatrix} \mathbf {c}_1 \\ \vdots \\ \mathbf {c}_N \end{pmatrix}\) and \(\mathbf {E}= \begin{pmatrix} \mathbf {e}_1 \\ \vdots \\ \mathbf {e}_N \end{pmatrix}\). Since I is an information set, we have
This implies that
and hence the vector \(\left( \mathbf {e}_1[v],\mathbf {e}_2[v],\ldots ,\mathbf {e}_N[v]\right) \) belongs to the column span of \(\mathbf {A}\). We recall that \(\mathbf {e}_b[v] \ne 0\) and \(\mathbf {e}_i[v] = 0\) for all \(i \ne b\). This means that the bth unitary vector, i.e., the all zero vector having the entry 1 at the bth position, is in the column span of \(\mathbf {A}\).
An attacker can easily find such a vector by simply going through all N unitary vectors and checking their existence in the column span of \(\mathbf {A}\). Moreover, existence of another vector of Hamming weight one in the column span of \(\mathbf {A}\) is very unlikely. More precisely, given an \(N \times (n1)\) random matrix \(\mathbf {A}\) where \(N>n\), the probability of having a weight one vector in the column span of \(\mathbf {A}\) is \((n1)q^{(nN)}\), which is negligible. Despite having a small probability, there exist at most n unit vectors in the column span of \(\mathbf {A}\), which leaks information about the index b, since \(n <N\).
4.2 HHWZ PIR scheme
Recently, Holzbaur, Hollanti and WachterZeh have proposed the first single server PIR scheme based on coding theory in [19]. In this PIR scheme the authors consider the field extension \(\mathbb {F}_{q^m}\) and secretly choose a partition of the basis over \(\mathbb {F}_q\). Shortly after, this proposal has been attacked in [20], using that the removal of one row within the query matrix and checking for the dimension of the rest reveals the position of the desired file.
In the following, we describe this PIR scheme presented in [19] with respect to our codebased framework. Later, we also present the attack [20] in terms of solving the distinguishability problem.
Note that the original PIR scheme differs from our description in the database and query setup. In [19], the authors consider the database elements to be \(L \times \delta \) matrices over the base field \(\mathbb {F}_q\), and the query elements are also \(\delta \times n\) matrices over the base field \(\mathbb {F}_{q^m}\). Note that the authors have used the technique of iterative reply generation, i.e., by using the same query to retrieve each of the L rows of the database file. In the following description, we consider \(L=1\) and use an equivalent setup where the database files are single elements in \(\mathbb {F}_{q}\) and the query elements are vectors over \(\mathbb {F}_{q^m}\).
4.2.1 Scheme
In this case, we work over an extension of the finite field \(\mathbb {F}_q\) and the retrieval function is an \(\mathbb {F}_q\)linear map.
Setup: Let \(\{\beta _1,\ldots ,\beta _m\}\) be a basis of \(\mathbb {F}_{q^m}\) as an \(\mathbb {F}_q\)vector space. Further, let \(\mathcal {V}\) be the subspace \({{\,\mathrm{Span}\,}}_{\mathbb {F}_q}(\beta _1,\ldots ,\beta _s)\) and \(\mathcal {W}\) be \({{\,\mathrm{Span}\,}}_{\mathbb {F}_q}(\beta _{s+1},\ldots ,\beta _m)\), where s is some integer in \(\{1,\ldots ,m\}\). The retrieval function is given as
Let X be the set \(\mathbb {F}_q\), \(Y=\ker (f) = \mathcal {W}\) and \(Z= f^{1}(\mathbb {F}_{q^m}^\times ) = \mathcal {V} \setminus \{0\}\). It is easy to check that \({\mathrm{Proj}}_{\mathcal{V}}\) satisfies all the conditions of the retrieval function.
Let \(\mathbf {M} = (m_{i}) \in \mathbb {F}_{q}^{N}\) be the database, i.e., there are N files in the database, each file is of size q. Suppose the user wants to retrieve the bth file from the database. Let \(\mathcal {C}\) be a random [n, k] linear code over \(\mathbb {F}_{q^m}\).
Query generation: For the encoding and decoding, we follow the same procedure as in Sect. 4.1.
Let \(\mathbf {G}\) be a generator matrix of \(\mathcal {C}\), and let \(I \subseteq \{1,\ldots ,n\}\) be an information set. We use \(\mathbf {G}\) to perform the encoding, i.e., the encoding map is \(\mathrm {Enc}: \mathbb {F}_q^k \rightarrow \mathbb {F}_q^n\) given by \(\mathbf {a}\mapsto \mathbf {a}\mathbf {G}\).
Let \(\mathbf {a}_1,\ldots ,\mathbf {a}_N\) be randomly chosen vectors in \(\mathbb {F}_{q^m}^k\), and define the corresponding codewords \(\mathbf {c}_i:= \mathrm {Enc}(\mathbf {a}_i) = \mathbf {a}_i \mathbf {G}\) for all \(i \in \{1, \ldots , N\}.\)
As in Sect. 4.1, we perform the decoding by adding no errors at the coordinates that are indexed by I.
Let v be a fixed element in \(I^C\). Now, we choose error vectors \(\mathbf {e}_1,\mathbf {e}_2,\ldots ,\mathbf {e}_N\) randomly in \(\mathbb {F}_{q^m}^n\) such that

1.
\(\mathrm {Supp}(\mathbf {e}_i) \subseteq I^C\) for all \(i \in \{1, \ldots , N\} \),

2.
\(\mathbf {e}_i[v] \in \mathcal {W}\) for all \(i \ne b\), and \(\mathbf {e}_b[v] \in \mathcal {V} \setminus \{0\}\).
Let \(\mathbf {q}_i:=\mathbf {c}_i+\mathbf {e}_i\) for \(i \in \{1,\dots , N\}\). The query is then given by
Reply generation: The response is generated by computing
Reply extraction:
Write \(\mathbf {r}= \mathbf {c}+ \mathbf {e}\), where \(\mathbf {c}= \sum _{i=1}^N m_i \mathbf {c}_i\) and \(\mathbf {e}= \sum _{i=1}^N m_i \mathbf {e}_i\).
Since I is an information set and \(\mathrm {Supp}(\mathbf {e}) \subseteq I^C\), we can perform the decoding on \(\mathbf {r}\) by computing
Now we consider the vth coordinate of \(\mathbf {e}\) and apply the retrieval function, which gives
This works because \(\mathbf {e}_i[v] \in \mathcal {W}\) for all \(i \ne b\), and \(\mathbf {e}_b[v] \in \mathcal {V} \setminus \{0\}\). Moreover, since we know \(\mathbf {e}_b[v]\), we can retrieve \(m_b\).
4.2.2 Security
The original PIR scheme [19] has been attacked in [20], by solving the distinguishability problem. The attack follows the second strategy mentioned in Sect. 3.3.
Let \(\mathbf {A}\) be the matrix containing all the query vectors as rows, i.e.,
with \(\mathbf {C}= \begin{pmatrix} \mathbf {c}_1 \\ \vdots \\ \mathbf {c}_N \end{pmatrix}\) and \(\mathbf {E}= \begin{pmatrix} \mathbf {e}_1 \\ \vdots \\ \mathbf {e}_N \end{pmatrix}\).
For each \(i \in \{1,\ldots ,N\}\), let \(\mathbf {A}_i\) be the submatrix of \(\mathbf {A}\) obtained by deleting the ith row. Then the \(\mathbb {F}_q\)rank of these matrices satisfy the following proposition.
Proposition 1
[20, Proposition 3.1] Let \(\mathbf {A}\) be given as above. Then
Moreover, for all \(i \in \{1,\ldots ,N\}\)
In the case when \(N<mn\), the query size becomes bigger than the size of the database, i.e., the scheme is no better than the trivial PIR protocol of downloading entire database. Hence, we assume \(N \ge mn\) and we use the following corollary to distinguish the index b in polynomial time.
Corollary 1
[20, Corollary 3.2, Proposition 3.3] Let \(\mathbf {A}_i\) be given as above. Then, with high probability,

1.
\(\mathrm {rank}_{\mathbb {F}_q}(\mathbf {A}_b) = mns\),

2.
for \(i \ne b\), we have that \(\mathrm {rank}_{\mathbb {F}_q}(\mathbf {A}_i) = mn\).
Proof
From Proposition 1, we have that \(\mathrm {rank}_{\mathbb {F}_q}(\mathbf {A}_i) = \mathrm {rank}_{\mathbb {F}_q}(\mathbf {C}_i) + \mathrm {rank}_{\mathbb {F}_q}(\mathbf {E}_i)\) for all \(1 \le i \le N\).
In the first case, we have that \(\mathrm {rank}_{\mathbb {F}_q}(\mathbf {C}_b) = mk\) and \(\mathrm {rank}_{\mathbb {F}_q}(\mathbf {E}_b) = (nk1)m +(ms)\) (with high probability), where the first part comes from the columns not indexed by v, which live in the full space \(\mathbb {F}_{q^m} = \mathcal {W} + \mathcal {V}\) and the second part comes from the column indexed by v, which lives in the subspace \(\mathcal {W}\). Note that the equation \(\mathrm {rank}_{\mathbb {F}_q}(\mathbf {E}_b) = m(nk)s\) holds true with high probability due to the randomness of the matrix entries.
In the case of \(i \ne b\), we still have that \(\mathrm {rank}_{\mathbb {F}_q}(\mathbf {C}_i) = mk\), but now \(\mathrm {rank}_{\mathbb {F}_q}(\mathbf {E}_i) = (nk1)m +m\) (with high probability), where the first part comes from the columns not indexed by v and the second part comes from the column v (observe that in this case all columns are in the full space \(\mathbb {F}_{q^m} = \mathcal {W} + \mathcal {V}\)). Note that the equation \(\mathrm {rank}_{\mathbb {F}_q}(\mathbf {E}_i) = m(nk)\) holds true with high probability due to the randomness of the matrix entries. \(\square \)
4.3 AMG PIR scheme
In the following, we describe the PIR scheme presented in [21] with respect to our codebased framework. Later, we also present the latticebased attack [22] in terms of solving the distinguishability problem. Note that the original PIR scheme differs from our description in the following way:

Database setup: in [21], the authors consider the database elements to be vectors over the base field \(\mathbb {F}_p\). Moreover, each query element is a matrix over \(\mathbb {F}_p\). In the following description, we use an equivalent setup where the database files are single elements in \(\mathbb {F}_{p}\) and query elements are vectors over \(\mathbb {F}_p\).

Noisescrambling matrix \(\varDelta \): the authors introduce an invertible diagonal matrix \(\varDelta \) in order to disguise the softnoise error vectors from the hardnoise error vectors. In our description, we ignore this scrambling matrix \(\varDelta \), as we will see in the security discussion that \(\varDelta \) has no effect on the column space of the query matrix.

In [21], the rate k/n of the underlying linear code is fixed \(k/n=0.5\). In our description we use an arbitrary rate.
4.3.1 Scheme
In this scheme, we work over a finite field \(\mathbb {F}_p\), where p is a prime number. We will see \(\mathbb {F}_p\) as \(\{\lfloor \frac{p}{2} \rfloor , \ldots , \lfloor \frac{p}{2} \rfloor \}\).
Setup: Assume that the database is of the form \(\mathbf {M} = (m_{i}) \in \{0,1,\ldots ,2^\ell  1 \}^{N}\) with \(\ell = \lceil \log _2(N)\rceil +1\), i.e., there are N files in the database each of size \(\ell \) bits. Note that if the file size is bigger than \(\ell \) bits, then we split the files in chunks of \(\ell \) bits. Suppose the user wants to retrieve the bth file from the database.
Let p be a prime number greater than \(2^{3 \ell }\) and \(t = 2^{2\ell }\). The retrieval function is given by the remainder of the Lee weight corresponding to modulo t, i.e.,
where \(\mathrm {wt}_{L_t}\) denotes the Lee weight on \(\mathbb {Z}/t\mathbb {Z}= \{0,1,\ldots ,t1 \}\), which is defined as
The set \(X = \{0,1,\ldots ,2^\ell 1\}\), \(Y= \{1,1\} \subseteq \ker (f)\) and \(Z= \{t\} \subseteq f^{1}(\mathbb {F}_p^\times )\).
Now observe that a linear combination of elements in Y with scalars from X having arbitrary number of terms does not necessarily belongs to \(\ker (f)\). However, the condition is satisfied when we have at most N number of terms in the linear combination: for \(x_1,\ldots ,x_N \in X\) and \(y_1,\ldots ,y_N \in Y\) we have that
and hence
Further we have that for \(y \in Y, x \in X\) and \(z \in Z\)
since \(f(z) = f(t)= t\mathrm {wt}_{L_t}\left( t \mod t\right) = t.\)
Let \(\mathcal {C}\) be a random linear [n, k] code over \(\mathbb {F}_{p}\), which is kept secret by the user.
Query generation: For the encoding and decoding, we follow the same procedure as in Sects. 4.1 and 4.2.
Let \(\mathbf {G}\) be a generator matrix of \(\mathcal {C}\), and let \(I \subseteq \{1,\ldots ,n\}\) be an information set. We use \(\mathbf {G}\) to perform the encoding, i.e., the encoding map is \(\mathrm {Enc}: \mathbb {F}_q^k \rightarrow \mathbb {F}_q^n\) given by \(\mathbf {a}\mapsto \mathbf {a}\mathbf {G}\).
Let \(\mathbf {a}_1,\ldots ,\mathbf {a}_N\) be randomly chosen vectors in \(\mathbb {F}_q^k\), and define the corresponding codewords \(\mathbf {c}_i:= \mathrm {Enc}(\mathbf {a}_i) = \mathbf {a}_i \mathbf {G}\) for all \(i \in \{1, \ldots , N\}\).
As in Sects 4.1 and 4.2, we perform the decoding by adding no errors at the coordinates that are indexed by I.
Let v be a fixed element in \(I^C\). Now, we choose error vectors \(\mathbf {e}_1,\mathbf {e}_2,\ldots ,\mathbf {e}_N\) randomly in \(\mathbb {F}_{q^m}^n\) such that

1.
\(\mathrm {Supp}(\mathbf {e}_i) \subseteq I^C\) for all \(i \in \{1, \ldots , N \}\),

2.
\( \mathbf {e}_i[v] \in \{\pm 1\}\) for all \(i \ne b\), and \(\mathbf {e}_b[v] = t\).
Let \(\mathbf {q}_i:=\mathbf {c}_i+\mathbf {e}_i\) for all \(i\in \{1,\dots , N\}\). The query is then given by
Reply generation: The response is generated by computing
Reply extraction:
Write \(\mathbf {r}= \mathbf {c}+ \mathbf {e}\), where \(\mathbf {c}= \sum _{i=1}^N m_i \mathbf {c}_i\) and \(\mathbf {e}= \sum _{i=1}^N m_i \mathbf {e}_i\).
Since I is an information set and \(\mathrm {Supp}(\mathbf {e}) \subseteq I^C\), we can perform the decoding on \(\mathbf {r}\) by computing
We will only focus on the vth coordinate of \(\mathbf {e}\) and apply the retrieval function to obtain
This works since
and hence
Now since \(\gcd (t,p)=1\), we can retrieve \(m_b\).
4.3.2 Security
In [22], Liu et al. presented a latticebased attack on the AMG PIR scheme. The method used in the attack can be described as per the first strategy, mentioned in Sect. 3.3, to solve the distinguishability problem.
Let \(\mathbf {A}\) be the matrix containing all the query vectors as rows, i.e.,
As discussed in Sect. 4.1, the vector \(\left( \mathbf {e}_1[v],\mathbf {e}_2[v],\ldots ,\mathbf {e}_N[v]\right) \) belongs to the column span of \(\mathbf {A}\).
Recall that by construction, the vector \(\left( \mathbf {e}_1[v],\mathbf {e}_2[v],\ldots ,\mathbf {e}_N[v]\right) \) has \(N1\) entries from \(\{1,+1\}\) and one entry with value equal to t. If we delete the bth row of \(\mathbf {A}\), call it the matrix \(\mathbf {A}_b\), then the vector
will be, with a very high probability, the shortest vector in the pary lattice generated by the columns of \(\mathbf {A}_b\). More precisely, the lattice is generated by the n columns of \([\mathbf {A}_bp\mathbf {Id}_{N1}]\). However, it is still infeasible to find this vector due to the large dimension of the lattice.
In [22], the authors construct multiple small dimensional lattices. Let \(k \le s \le N\), and let \(\mathbf {A}^{(1)},\ldots ,\mathbf {A}^{(\lceil N/s \rceil )}\) be a rowwise partitioning of the matrix \(\mathbf {A}\), i.e., \(\mathbf {A}^{(i)}\) is the \(s \times n\) matrix given by s rows of \(\mathbf {A}\) indexed by \(\{(i1)s+1,\ldots ,is\}\). Now, let \(\mathcal {L}_i\) be the pary lattice generated by the columns of \(\mathbf {A}^{(i)}\). Note that the dimension of the lattices \(\mathcal {L}_i\) is s, hence the attacker chooses s such that implementing basis reduction algorithms for \(\mathcal {L}_i\) is feasible. In order to find the index b, the attacker goes through each of these lattices.
Note that the index b of the desired file corresponds to the lattice \(\mathcal {L}_{\lfloor b/s \rfloor }\), which the attacker is able to find, and then the attacker finds the index b by solving the closest vector problem for \(\mathcal {L}_{\lfloor b/s \rfloor }\).
More in detail, in the case of \(i \ne \lfloor b/s\rfloor \), we observe that the shortest vector in \(\mathcal {L}_i\) corresponds to the vector \((\mathbf {e}_{(i1)s+1}[v],\ldots ,\mathbf {e}_{is}[v])\) having entries in \(\{1,+1\}\). This observation does not hold in the case of \(i=\lfloor b/s \rfloor \) due to the existence of large t. The attacker uses the lattice reduction algorithms to find the shortest vector in each \(\mathcal {L}_i\), and consequently finds the corresponding lattice \(\mathcal {L}_{\lfloor b/s \rfloor }\).
Now, the index b can be located using solving the closest vector problem. Let \(j = \lfloor b/s \rfloor \). Then observe that \((\mathbf {e}_{(j1)s+1}[v],\ldots ,\mathbf {e}_{js}[v]) \in \mathcal {L}_j\) is the closest lattice vector to \((0,\ldots ,0,t,0,\ldots ,0)\) (with t at the bth position). To find the index b, we can use Kannan’s embedding technique [23] to solve (at most) s instances of the closest vector problem with inputs vector of the form \((0,\ldots ,0,t,0,\ldots ,0)\).
4.4 RingLWE based PIR schemes
In the section, we describe the PIR schemes constructed using the RingLWE (RLWE) based homomorphic encryption schemes. In particular, we consider the construction of XPIR scheme [16] that uses the RingLWE based homomorphic encryption scheme presented in [24].
The original PIR scheme differs from our description in the error distribution as follows. In [16], the authors use two different distributions \(\chi \) and \(\chi ^\prime \) to sample errors. The distribution \(\chi \) is used to generate the public key and the distribution \(\chi ^\prime \), having larger variance, is used for encryption. In the following description, we consider only one distribution, mimicking \(\chi ^\prime \), to sample error vectors in the query generation process.
We would like to remark that in the following description, the database elements and the query elements are polynomials of degree smaller than n with coefficients in \(\mathcal {R}\), which can also be represented by vectors in \(\mathcal {R}^n\).
4.4.1 Scheme
In this scheme, we work over a finite ring \(\mathbb {Z}/q\mathbb {Z}\), where q is a positive integer. Instead of a random linear code over \(\mathbb {Z}/q\mathbb {Z}\), we consider a random negacyclic code over \(\mathbb {Z}/q\mathbb {Z}\).
Setup: Let q, t be positive integers with \(t<q\) and \(\gcd (t,q)=1\). The retrieval function is given by
Let \(\chi \) be a discrete Gaussian distribution with standard deviation \(\sigma \). The parameters \(q,n,t,\sigma \) are chosen such that they satisfy \(Nt^2\sigma \sqrt{n} < q/2\), where n is the length of the linear code that will be used in query generation.
Now, we define the subsets
Observe that for \(x_1,x_2,\ldots ,x_N \in X \) and \(ty_1,ty_2,\ldots ,ty_N \in Y\) we have that
This works since the choice of parameters \(q,n,t,\sigma \) implies that \(\sum _{i=1}^N x_i t y_i < q/2\) with very high probability. And for \(x \in X, ty \in Y\) and \(tz+1 \in Z\) we have that
since \(ty+x(tz+1) < q/2\).
Let n be a power of 2, and let \(R_q := (\mathbb {Z}/q\mathbb {Z})[x]/(x^n+1)\). Let \(\mathbf {M} = (m_{i}) \in \left( X[x]/(x^n+1)\right) ^N\), i.e., there are N files in the database and each file is an element in \(R_q\) with coefficients in X. In particular, each file is of size \(\log _2(tn)\) bits. Suppose the user wants to retrieve the bth file from the database.
Let \(\mathcal {C}\) be a negacyclic code of length n over \(\mathbb {Z}/q\mathbb {Z}\) generated by some randomly chosen \(s \in R_q\), i.e., \(\mathcal {C}\) is a ideal in \(R_{q}\) generated by s. The code is kept secret by the user.
Query generation: We use the generating polynomial s to define the encoding map, i.e., \(Enc : R_q \rightarrow R_q\) is given by \(a \mapsto as\).
Let \(a_1,a_2,\ldots ,a_N\) be randomly chosen elements in \(R_q\), and define N codewords \(c_i := a_is\) for all \(i \in \{1,\ldots ,N\}\).
Now, we choose the errors \(e_1,e_2,\ldots ,e_N\) in \(R_{q}\) such that they satisfy the following two conditions that allow the reply extraction:

1.
\(e_i = t y_i\), with \(y_i\) sampled from the distribution \(\chi \), for all \(i \ne b\),

2.
\(e_b = t y_b + 1\) with \(y_b\) sampled from \(\chi \).
Let \(\mathbf {q}_i:= (a_i,c_i + e_i)\) for all \(i\in \{1,\dots , N\}\). The query is then given by
Reply generation: The response is generated by computing
Reply extraction:
By applying the encoding map Enc on \(r_1\), we first decode \(r_2\) to obtain the error part, i.e.,
After that we can use the retrieval function f,
Note that here we apply f on an element of \(R_q\), which is done by applying f on each coefficient.
The last equality follows from the conditions on the parameters \(n,q,t,\sigma \), since the maximal coefficient of \(\sum _{i=1}^N m_i e_i\) is, with high probability, upper bounded by \(Nt^2\sigma \sqrt{n}\) (see [24, Lemma 1]), which is less than q/2.
4.4.2 Security
As mentioned above, the XPIR scheme [16] uses the fully homomorphic encryption scheme presented in [24], whose security is based on the hardness of solving the polynomial learning with error (PLWE) problem, which is a simplified version of the ring LWE problem.
Let \(R_q = \mathbb {Z}/q\mathbb {Z}[x]/(x^n+1)\), and let \(\chi \) be a narrow discrete Gaussian distribution on \(R_q\). Then the PLWE assumption states that it is computationally hard to distinguish a polynomial number of samples of the form \((a_i,a_is+e_i)\) and the same number of samples of the form \((a_i,u_i)\), where \(s, a_i\)’s and \(u_i\)’s are sampled uniformly from \(R_q\) and the \(e_i\)’s are sampled from \(\chi \).
Moreover, [24, Proposition 1] states that if the samples are of the form \((a_i,a_is+te_i)\), where \(a_i,s,e_i\) are as above and \(t \in (\mathbb {Z}/q\mathbb {Z})^\times \), then distinguishing such samples from the uniform samples is equivalent to the PLWE assumption.
Let \(\mathbf {A}\) be the query matrix and \(\mathbf {A}_i\) be the submatrix of \(\mathbf {A}\) obtained by deleting the ith row. Translating the above mentioned approach [24, Proposition 1] to our generic framework means that distinguishing \(\mathbf {A}_b\) from an uniformly sampled matrix is equivalent to the PLWE problem.
However, the second strategy mentioned in Sect. 3.3 aims in a different direction: that is to distinguish between \(A_i\) for \(i \ne b\) and \(A_b\). Thus, this might lead to new security analyses of such PIR schemes.
5 Theoretical remarks
5.1 Generic PIR scheme vs codebased framework
A natural question would be to ask whether any single server PIR scheme can be described in terms of the codebased framework. The answer is no, as the number theoretic PIR scheme by Kushilevitz and Ostrovsky [25] does not fit the framework. However, if we restrict to the class of PIR schemes that generates replies by contracting the database elements and the query elements using linear combinations (which will be denoted from now on as additive PIR schemes), then the answer is yes. In the following, we discuss the requirements of an arbitrary additive PIR scheme and argue the necessity of the elements in the codebased framework to fulfil those requirements:

1.
Ambient space: An additive PIR scheme needs two operations: multiplication \((*)\) between database and query elements, and addition \((+)\) of those products. Hence, the canonical choice of the ambient space is rings. For practical reasons, the rings should be finite.

2.
Retrieval: Let the database be denoted by \(\mathcal {DB}=\{db_1,\ldots ,db_N\}\), and the corresponding query be given by \(Q = \{q_1,\ldots ,q_N\}\). Suppose that the user wants to retrieve the bth file. In an additive PIR scheme, the reply is \(\sum _{i=1}^N db_i *q_i\) and user wants to retrieve \(db_b\) from the reply. The operation \(\sum _{i=1}^N db_i *q_i \mapsto db_b\), denoted by g, is an analogue to the retrieval function used in the codebased framework. First we note that g annihilates \(\sum _{i\ne b} db_i *q_i\) in such a way that we are only left with \(g(db_b *q_b)\). And then \(db_b\) is recovered from \(g(db_b *q_b)\). These two properties imply that \(db_i\)’s and \(q_i\)’s live in special subsets of the ambient space R. Let X denote the space of database elements, Y denote the space of query elements that are not associated with the desired file and Z denote the space of query element associated with the desired file. The requirements on g imply that: (1) a linear combination of elements in Y with scalars in X belongs to the kernel of g, and (2) \(g(x *z) = x *g(z)\) and g(z) is an invertible element, for any \(x \in X\) and \(z \in Z\). These two conditions are the basis of the conditions of the retrieval function used in the codebased framework.

3.
Privacy: Another important aspect of a PIR scheme is privacy, i.e., given a query Q, it should be computationally infeasible to determine the index b of the desired file. Let us look at the scenario where we directly use elements in Y and Z to generate query elements. Then the privacy relies on the hardness of the following decisional problem: given \(q \in Y \cup Z\), decide whether \(q \in Y\) or \(q \in Z\). In general this may not be a hard problem, as one can apply the retrieval function to distinguish the elements between Y and Z. Therefore, to ensure privacy we must add some randomness to the query elements. Moreover, the user should be able to remove this randomness even after receiving the reply that contains their linear combinations. This is exactly the rationale of linear errorcorrecting codes. We treat the elements of Y and Z as errors, and the added randomness belongs to a random linear code.
5.2 On security of PIR schemes
In terms of the codebased framework, the security of a PIR scheme relies on the type of the underlying retrieval function. As we have noticed from the examples in Sect. 4, the following type of retrieval functions are not safe to use.

1.
Field homomorphisms: In the case where the retrieval function is a nontrivial field homomorphism, the PIR scheme is then equivalent to the one described in Sect. 4.1. The kernel of the retrieval function must be \(\{0\}\), as \(\{0\}\) is the only proper ideal in any field. As a consequence, determining the index of the desired file becomes an easy task of finding a unitary vector in the column space of the query matrix, thus it suffers from the first attack strategy discussed in Sect. 3.3.

2.
Vector space homomorphisms: In this case, the resulting PIR scheme is equivalent to HHWZ PIR scheme [19], described in Sect. 4.2. The kernel of a nontrivial linear map is a proper subspace of the parent vector space. This results in an exceptionally low rank of the matrix that is obtained from the query matrix by deleting the row that corresponds to the desired file, thus it suffers from the second attack strategy discussed in Sect. 3.3.
We can generalize these two cases to more types of retrieval functions. Clearly, the weakness of vector space homomorphisms can also be observed in the case of free module homomorphims, because of the existence of the notion of rank and dimension for free modules. On the other hand, the weakness of field homomorphisms can be seen in the case of local ring homomorphims. Let R be a finite local ring with maximal ideal M, then the kernel of the retrieval function is a subideal of M. Note that there exists an integer \(\ell \) such that \(M^\ell = \{0\}\) and \(M^{\ell 1} \ne \{0\}\). Let \(a \in M^{\ell 1} \setminus \{0\}\), then note that \(ar = 0\) for all \(r \in M\). This implies that the special column vector \((\mathbf {e}_1[v],\ldots ,\mathbf {e}_N[v])\), when multiplied by a, results in a unit vector. Hence, similar to the field homomorphism case, we observe the existence of a unit vector in the column space of the query matrix.
The other two schemes, presented in Sects. 4.3 and 4.4 respectively, do not use additive retrieval functions. Both the schemes work on the idea of using small modulus errors in a large modulus ambient space. Due to which the security eventually relies on finding short vectors in a high dimensional lattice, which is a computationally hard problem. However, in the case of AMG PIR scheme, the problem breaks down over multiple small dimensional lattices and hence the attack becomes feasible. Whereas in the case of LWEbased PIR schemes, this new perspective may have a potential in introducing new approaches for their security analysis.
In order to construct an additive PIR scheme, one may investigate the cases of structured morphisms like ring homomorphisms and module homomorphisms, or the cases of unstructured morphisms like the functions used in AMG scheme and LWEbased schemes.
Furthermore, if one constructs an additive PIR scheme independently, then it would be worth translating the scheme in terms of the codebased framework to check for possible security issues.
References
Chor, B., Goldreich, O., Kushilevitz, E., Sudan, M.: Private information retrieval. In Proceedings of IEEE 36th annual foundations of computer science, pp. 41–50. IEEE (1995)
Dvir, Z., Gopi, S.: 2server PIR with subpolynomial communication. J. ACM (JACM) 63(4), 1–15 (2016)
Beimel, A., Ishai, Y., Kushilevitz, E., Raymond, JF.: Breaking the \({O}(n^{1/(2k1)})\) barrier for informationtheoretic private information retrieval. In Proceedings of the 43rd annual IEEE symposium on foundations of computer science, 2002, pp. 261–270. IEEE (2002)
Sun, H., Jafar, S.A.: The capacity of symmetric private information retrieval. IEEE Trans. Inform. Theory 65(1), 322–329 (2018)
Sun, H., Jafar, S.A.: The capacity of robust private information retrieval with colluding databases. IEEE Trans. Inform. Theory 64(4), 2361–2370 (2017)
Banawan, K., Ulukus, S.: The capacity of private information retrieval from coded databases. IEEE Trans. Inform. Theory 64(3), 1945–1956 (2018)
FreijHollanti, R., Gnilke, O.W., Hollanti, C., Karpuk, D.A.: Private information retrieval from coded databases with colluding servers. SIAM J. Appl. Algebra Geom 1(1), 647–664 (2017)
Dong, C., Chen, L.: A fast single server private information retrieval protocol with low communication cost. In European symposium on research in computer security, pp. 380–399. Springer (2014)
Kushilevitz, E., Ostrovsky, R.: Replication is not needed: single database, computationally—private information retrieval. In Proceedings 38th annual symposium on foundations of computer science, pp. 364–373. IEEE (1997)
Lipmaa, H., Pavlyk, K.: A simpler rateoptimal CPIR protocol. In International conference on financial cryptography and data security, pp. 621–638. Springer (2017)
Stern, JP.: A new and efficient allornothing disclosure of secrets protocol. In International conference on the theory and application of cryptology and information security, pp. 357–371. Springer (1998)
Sion, R., Carbunar, B.: On the computational practicality of private information retrieval. In Proceedings of the network and distributed systems security symposium, pp. 2006–06. Internet Society (2007)
Shor, P.W.: Polynomialtime algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev 41(2), 303–332 (1999)
Yi, X., Kaosar, M.G., Paulet, R., Bertino, E.: Singledatabase private information retrieval from fully homomorphic encryption. IEEE Trans. Knowl. Data Eng. 25(5), 1125–1134 (2012)
Brakerski, Z., Vaikuntanathan, V.: Efficient fully homomorphic encryption from (standard) LWE. SIAM J. Comput. 43(2), 831–871 (2014)
AguilarMelchor, C., Barrier, J., Fousse, L., Killijian, M.O.: XPIR: private information retrieval for everyone. Proc. Priv. Enhanc. Technol. 2016(2), 155–174 (2016)
Angel, S., Chen, H., Laine, K., Setty, S.: PIR with compressed queries and amortized query processing. In 2018 IEEE symposium on security and privacy (SP). pp. 962–979 (2018)
Ali, A., Lepoint, T., Patel, S., Raykova, M., Schoppmann, P., Seth, K., Yeo, K.: Communication–computation tradeoffs in PIR. IACR Cryptol. ePrint Arch. (2019)
Holzbaur, L., Hollanti, C., WachterZeh, A.: Computational codebased singleserver private information retrieval. In 2020 IEEE international symposium on information theory (ISIT), pp. 1065–1070. IEEE (2020)
Bordage, S., Lavauzelle, J.: On the privacy of a codebased singleserver computational PIR scheme. Cryptogr, Commun (2021)
Melchor, CA, Gaborit, .: A fast private information retrieval protocol. In 2008 IEEE international symposium on information theory, pp. 1848–1852 (2008)
Liu, J., Bi, J.: Cryptanalysis of a fast private information retrieval protocol. In Proceedings of the 3rd ACM international workshop on ASIA publickey cryptography, pp. 56–60 (2016)
Kannan, R.: Minkowski’s convex body theorem and integer programming. Math. Oper. Res. 12(3), 415–440 (1987)
Brakerski, Z., Vaikuntanathan, V.: Fully homomorphic encryption from ringLWE and security for key dependent messages. In Annual cryptology conference, pp. 505–524. Springer (2011)
Kushilevitz, E., Ostrovsky, R.: Replication is not needed: single database, computationallyprivate information retrieval. In Proceedings 38th annual symposium on foundations of computer science, pp. 364–373. IEEE (1997)
Acknowledgements
The authors would like to thank Lukas Holzbaur, Antonia WachterZeh and Camilla Hollanti for useful discussions and Razane Tajeddine for bringing this interesting topic to their knowledge. This work was partially supported by Swiss National Science Foundation grant no. 188430, Grant No. 195290 and Forschungskredit of the University of Zurich grant no. FK19080.
Funding
Open Access funding provided by Universität Zürich.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Alfarano, G.N., Khathuria, K. & Weger, V. A survey on single server private information retrieval in a coding theory perspective. AAECC (2021). https://doi.org/10.1007/s00200021005085
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00200021005085
Keywords
 Computational PIR
 Coding theory
 Single server PIR
Mathematics Subject Classification
 68P20
 94B99