Improved user-private information retrieval via finite geometry

In a user-private information retrieval (UPIR) scheme, a set of users collaborate to retrieve files from a database without revealing to observers which participant in the scheme requested the file. To achieve privacy, users retrieve files from the database in response to anonymous requests posted to message spaces, where each message space can be accessed by a subset of the participants in the scheme. Privacy with respect to the database is easily achieved, but privacy with respect to coalitions of other users within the scheme is sensitive to the choice of incidence structure determining which users can access each message space. Earlier schemes were based on pairwise balanced designs and symmetric designs, and involved at most one step of message passing to retrieve a file. We propose a new class of UPIR schemes based on generalised quadrangles (GQs), which need up to two steps of message passing in each file retrieval. We introduce a new message passing protocol in which messages are encrypted. Even using this protocol, previously proposed schemes are compromised by finite coalitions of users. We construct a family of GQ-UPIR schemes which maintain privacy with high probability even when $O(n^{1/2-\epsilon})$ users collude, where n is the total number of users in the scheme. We also show that a UPIR scheme based on any family of generalised quadrangles is secure against coalitions of $O(n^{1/4-\epsilon})$ users.


Introduction
Private Information Retrieval (PIR) allows a user to retrieve information from a database without revealing which information was requested. A trivial solution is for the user to download all of the information on the database, but when the information is replicated in multiple locations, more efficient schemes are known [1,3,5].
A slightly different approach to the problem of private information retrieval attempts to hide the identity of the user downloading a file. One approach to this problem is Onion Routing [11]. An onion is a recursively encrypted data packet, which encodes a path through a network of cooperating users. Each user in the path removes the outermost layer of encryption, and forwards the onion to the next user on the path. The onion carries no identifying features that would allow an observer to recognise it under different outer layers. Anonymity is achieved in this system by choosing sufficiently long paths at random for the onions.
Figure 1: A visualisation of a UPIR system.
One disadvantage of onion routing is that the number of times a message is passed between users can be large. This results in a low throughput of data when bandwidth in the network is limited. User-Private Information Retrieval (UPIR) is an approach to private information retrieval in which the identities of users are hidden, but the number of times a message is forwarded through the network is tightly controlled. To achieve privacy in a UPIR scheme, it is usual to place strong restrictions on which users can communicate with one another within the scheme.
Definition 1 (cf. Section 2, [9]). A UPIR system consists of a bipartite graph (U ∪ M, E), where U is the set of users and M is the set of message spaces. A user u ∈ U has access to a message space M ∈ M if (u, M) is in the set of edges E. Furthermore, it is assumed that all users have access to a database that evaluates queries.
Example 1. Figure 1 shows a UPIR system with 5 users and 3 message spaces and the incidence matrix of the corresponding bipartite graph.
A common requirement in earlier work is that every pair of users share access to a common message space. Here, we require only that the bipartite graph underlying the UPIR scheme is connected; we say that such a UPIR system is connected. If u is incident with M, then user u can both write messages to M and read any messages written to M; if u is not incident with M then u has no access to M. Users communicate within the scheme only by writing messages to one another in the message spaces. Any user may send queries to the database for evaluation. Users preserve their privacy by passing requests via the message spaces to other users to act as proxies for them.
In order for users to communicate using a UPIR system, a protocol is required; one example is described explicitly below. We refer to the combination of a UPIR system and a protocol as a UPIR scheme. This distinction is helpful in illustrating the interactions between the combinatorics of the bipartite graph and the privacy properties of the protocol.
Protocol 1. Let (U ∪ M, E) be a connected UPIR system. Suppose that user u wishes to retrieve the response to the query Q from the database.
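The connectivity requirement can be made concrete with a small sketch. The incidence data below is hypothetical (it is not the system of Figure 1), and the names `ACCESS`, `bipartite_adjacency` and `is_connected` are introduced here purely for illustration.

```python
from collections import deque

# Hypothetical incidence data: each message space maps to the users with access.
ACCESS = {
    "M1": {"u1", "u2"},
    "M2": {"u2", "u3", "u4"},
    "M3": {"u4", "u5"},
}

def bipartite_adjacency(access):
    """Build adjacency sets for the bipartite graph on users and message spaces."""
    adj = {}
    for space, users in access.items():
        space_node = ("space", space)
        adj.setdefault(space_node, set())
        for user in users:
            user_node = ("user", user)
            adj[space_node].add(user_node)
            adj.setdefault(user_node, set()).add(space_node)
    return adj

def is_connected(access):
    """Breadth-first search: True if every node is reachable from an arbitrary start."""
    adj = bipartite_adjacency(access)
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(adj)
```

Users with no message space at all do not appear in this representation, so the check only covers users listed in at least one space.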
1. User u chooses a user v uniformly at random from the set of all users.
2. If u = v, u requests Q directly from the server, receiving response R.
3. Otherwise, u chooses a shortest path (u, M1, u1, . . ., Mn, v) from u to v in the bipartite graph.
Remark 1. Many variations of Protocol 1 are possible, including randomising the path, and alterations to reduce memory usage. Such changes do not alter the results in the following sections. Any user with access to a message space Mi on the path can observe the request, the identity of the proxy and the response, but gains limited information about the source of the request.
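The first three steps of Protocol 1 (uniform choice of proxy, then a shortest path in the bipartite graph) can be sketched as follows. This is a minimal illustration only: the incidence data and the function names `shortest_path` and `plan_request` are assumptions made here, and the writing, forwarding and response steps of the protocol are not modelled.

```python
import random
from collections import deque

# Hypothetical incidence data: message space -> users with access.
ACCESS = {
    "M1": {"u1", "u2"},
    "M2": {"u2", "u3", "u4"},
    "M3": {"u4", "u5"},
}

def shortest_path(access, u, v):
    """Return a shortest alternating path [u, M1, u1, ..., Mn, v], or None."""
    spaces_of = {}
    for space, users in access.items():
        for w in users:
            spaces_of.setdefault(w, set()).add(space)
    parent = {("user", u): None}
    queue = deque([("user", u)])
    while queue:
        node = queue.popleft()
        if node == ("user", v):
            path = []
            while node is not None:      # walk back to the source
                path.append(node[1])
                node = parent[node]
            return path[::-1]
        kind, name = node
        if kind == "user":
            nbrs = [("space", s) for s in sorted(spaces_of.get(name, ()))]
        else:
            nbrs = [("user", w) for w in sorted(access[name])]
        for nxt in nbrs:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def plan_request(access, u, rng=random):
    """Steps 1-3: pick a proxy uniformly at random, then a shortest path to it."""
    users = sorted(set().union(*access.values()))
    v = rng.choice(users)
    if v == u:
        return v, [u]                    # u queries the database directly
    return v, shortest_path(access, u, v)
```

Breadth-first search returns one shortest path; randomising among shortest paths, as mentioned in Remark 1, would need a small extension.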
Domingo-Ferrer, Stokes, Bras-Amorós, and co-authors introduced UPIR systems and analysed a protocol where users write queries to message spaces without specifying a proxy [4,8], while a special case of the above protocol was developed by Swanson and Stinson [9,10].Both groups of authors worked on UPIR systems derived from highly structured set systems.In particular, they required that every pair of users share a common message space, in which case Protocol 1 can be implemented so that every path has length at most 2: any user can write requests directly to his chosen proxy.
Stokes and Bras-Amorós considered the problem of constructing a UPIR system under the restriction that deg(M) is constant for all message spaces M [8]. This requirement can be interpreted as balancing the load amongst message spaces. They also require that every pair of users share precisely one message space. After rejecting some degenerate solutions in which message spaces have size 1, 2, n − 1 or n, the authors are left with precisely the class of finite projective planes. We recall that a projective plane is a combinatorial structure consisting of points and blocks in which
1. Every pair of points is contained in a unique block.
2. Every pair of blocks intersect in a unique point.
3. There exist 4 points, no three contained in any block.
Projective planes have a rich theory playing an important role in combinatorics, geometry and algebra. Inspired by Stokes and Bras-Amorós, Swanson and Stinson analysed attacks on projective plane UPIR systems, and proposed UPIR systems constructed from a broader class of block designs. In particular, they considered balanced incomplete block designs and pairwise balanced designs (PBD). The monograph by Beth, Jungnickel and Lenz is a standard reference for design theory [2].
In the next section, we will show that observers in a UPIR system have an advantage in gathering information about users with whom they share a message space. Motivated by this result, we consider the next obvious class of UPIR systems: those in which users are separated by a path of length at most 2. A natural class of examples is furnished by finite generalised quadrangles. Furthermore, we consider a different protocol, based on onion routing, and prove that it aids privacy.
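The three projective plane axioms recalled above can be checked mechanically on a small example. The sketch below uses the Fano plane, built from the difference set {0, 1, 3} modulo 7; the choice of example and the function name `is_projective_plane` are illustrative assumptions, not part of the cited constructions.

```python
from itertools import combinations

# Fano plane: 7 points, blocks are the translates of {0, 1, 3} modulo 7.
FANO_POINTS = range(7)
FANO_BLOCKS = [frozenset((i + d) % 7 for d in (0, 1, 3)) for i in range(7)]

def is_projective_plane(points, blocks):
    """Brute-force check of the three axioms for a small incidence structure."""
    pts = list(points)
    # Axiom 1: every pair of points lies in exactly one block.
    pairs_ok = all(
        sum(1 for b in blocks if x in b and y in b) == 1
        for x, y in combinations(pts, 2)
    )
    # Axiom 2: every pair of blocks meets in exactly one point.
    blocks_ok = all(len(b & c) == 1 for b, c in combinations(blocks, 2))

    # Axiom 3: some 4 points have no three of them in a common block.
    def no_three_in_a_block(quad):
        return all(
            not any(set(triple) <= b for b in blocks)
            for triple in combinations(quad, 3)
        )

    quad_ok = any(no_three_in_a_block(q) for q in combinations(pts, 4))
    return pairs_ok and blocks_ok and quad_ok
```

The third axiom is what excludes degenerate structures such as a single block containing every point.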

Privacy in a UPIR system
It is assumed throughout that the content of any message space is only available to the users who have access to the given message space, as in Protocol 1. An external eavesdropper, i.e., someone who is not a user in the system, can observe the requests made to the database, since these are not encrypted, but cannot read messages sent between users. Security in this setting has been studied previously in [9], and the result obtained there forms the basis of any UPIR scheme.
Definition 2. A UPIR system is private with respect to external observers if, for any request Q forwarded to the database by user v, we have that
Swanson and Stinson have proved that the obvious strategy in which users select proxies uniformly at random is sufficient for privacy against external observers.
Theorem 1 (Theorems 6.1, 6.2 [9]). A connected UPIR scheme is private against external observers if each user chooses proxies uniformly at random, and the proxies for distinct queries are chosen independently.
Protocol 1 can be implemented on any connected UPIR system, and Theorem 1 shows that the resulting scheme is private against external observers.
More recent research on UPIR has aimed at preserving privacy with respect to other users in the UPIR scheme, under the assumption that users are honest but curious. That is, they act according to Protocol 1, but they may attempt to determine the source of any queries that they observe.
Since the message space into which a request is written already reveals nontrivial information about the source of the request, perfect privacy with respect to other users is, in general, impossible.For example: in a PBD-UPIR scheme, it can be inferred that the source of a request written to the message space M is a user with access to M .
First, we develop a criterion for judging whether a UPIR system is secure in terms of maintaining users' privacy. Our analysis will be based on linked queries, which are a series of queries which are identifiable as coming from a single source. These were first introduced by Swanson and Stinson, who provided the example of a series of requests for information about a fixed, obscure topic [9].
Definition 3. Let C be a coalition of users, collaborating to identify the source of a series of linked queries. Users u and v are pseudonymous with respect to C if for any message space M to which C has access, and for which
We allow the possibility that a coalition has non-trivial prior information on the probability that user u wishes to evaluate query Q. In our analysis we focus on the case where this information is limited, and users cannot be identified by their queries alone. The next result follows directly from the definition of pseudonymity, but we record it since we will have use for it in later sections.
Lemma 1. Pseudonymity with respect to the coalition C is an equivalence relation on the users of a UPIR scheme.
The coalition C can resolve the identity of a user u if and only if u belongs to a pseudonymity class of size 1 with respect to C. Users u and v maintain pseudonymity with respect to C after arbitrarily many requests have been observed if and only if they lie in the same pseudonymity class. We propose the following definition for security.
Definition 4. Let $(V_i)$ be a family of UPIR schemes indexed by $i \in \mathbb{N}$, where the number of users in $V_i$ is $n_i$. We say that $V_i$ is secure against coalitions of size t if the pseudonymity relation of any coalition of size t contains a giant component, i.e., the union of all other components has size $O(n_i^{1-\epsilon})$ for some $\epsilon > 0$. The family $(V_i)$ is secure if for every t there exists $N \in \mathbb{N}$ such that $V_i$ is secure against coalitions of size t for all $i \ge N$.
Informally, we consider a UPIR scheme to be secure if any coalition of size at most t can observe only a negligible portion of the system. Equivalently, for any fixed coalition C of bounded size a randomly chosen subset of users, of limited size, will be mutually pseudonymous with respect to C with high probability. Our first result is that families of PBD-UPIR schemes are never secure.
Theorem 2. In a PBD-UPIR scheme, a single eavesdropper can resolve the identity of any user who makes sufficiently many linked queries. Equivalently, for any coalition of eavesdroppers, every pseudonymity class has size 1.
Proof. Suppose that u makes a series of linked queries. An eavesdropper c will observe a subset of these queries in the unique message space M shared by c and u, and will never observe linked queries in any other message space to which he has access. Since users do not write queries addressed to themselves, c will be able to identify u as soon as he has observed a linked query addressed to every other possible user in M. Provided that u follows the requirements of Theorem 1, c will observe the required queries with probability 1.
In fact, a pair of collaborating users $c_1$ and $c_2$ can identify u far more quickly. If $c_1$ observes a query in the message space $M_1$ and $c_2$ observes a linked query in $M_2$, then the collaborators can conclude that the source of the requests is a user in $M_1 \cap M_2$. But in a PBD-UPIR scheme, such a user is unique. This is called an intersection attack in [9]. The argument of Theorem 2 can be easily modified to identify all users in any UPIR scheme in which every pair of users share at least one message space. In particular, all of the UPIR schemes proposed by Swanson and Stinson to circumvent the intersection attack are still vulnerable to the attack of Theorem 2, although it will take more linked queries to identify the source.
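The intersection attack described above amounts to intersecting the message spaces in which linked queries were seen. A minimal sketch, using the Fano plane as a toy PBD-UPIR system (the example, `SPACES`, `shared_space` and `intersection_attack` are all illustrative assumptions):

```python
# Toy PBD-UPIR system: users 0..6, message spaces are the 7 blocks of the
# Fano plane (translates of {0, 1, 3} modulo 7), so any two users share
# exactly one message space.
SPACES = [frozenset((i + d) % 7 for d in (0, 1, 3)) for i in range(7)]

def shared_space(u, c):
    """The unique message space containing both users u and c."""
    (space,) = [m for m in SPACES if u in m and c in m]
    return space

def intersection_attack(observed_spaces):
    """Intersect the message spaces in which linked queries were observed."""
    candidates = None
    for space in observed_spaces:
        candidates = set(space) if candidates is None else candidates & space
    return candidates

# Source 0 writes linked queries that coalition members 1 and 2 observe
# in the spaces they each share with the source.
suspects = intersection_attack([shared_space(0, 1), shared_space(0, 2)])
```

Here the two observed spaces are distinct, so their intersection contains only the source.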
To protect against the attack outlined in Theorem 2 we suggest using a different incidence structure that we will introduce in the following section.

Generalised Quadrangles
In this section we introduce generalised quadrangles (GQ). For the sake of completeness, we include proofs of some well-known results; for further reading see [7]. We will show that the bipartite incidence graphs of generalised quadrangles have diameter 4. So in a UPIR scheme derived from a GQ (GQ-UPIR scheme in short), a pair of users either shares a message space, or there exists a third user sharing message spaces with each of the first two. As a result, when users communicate along a shortest path, a message is written to at most 2 message spaces. In this section, we use the usual language of incidence geometry; in a GQ-UPIR scheme, users are labelled by points and message spaces by blocks.
Definition 5. A generalised quadrangle is an incidence structure containing points and blocks in which two blocks can intersect in at most one point, and which satisfies the GQ Axiom: given any point x and block L that does not contain x, there is a unique point $x'$ in L that shares a block with x.
Even though we are dealing with an abstract incidence structure, there is a natural representation of this structure as a geometry. It is traditional for the blocks in a generalised quadrangle to be referred to as lines. Indeed, a generalised quadrangle is so-named because there are no triangles (three lines intersecting pairwise in three distinct points) but numerous quadrangles in such a geometry.
Lemma 2. There are no non-degenerate triangles in a generalised quadrangle, but any two non-collinear points are contained in a quadrangle.
Proof. Recall that a triangle is a triple of distinct lines $L_1, L_2, L_3$ with pairwise non-empty intersections, say $x_{ij} \in L_i \cap L_j$. Note that $x_{12}$ is collinear with both $x_{13}$ and $x_{23}$. By the GQ Axiom, if $x_{12} \notin L_3$, then there exists a unique point on $L_3$ collinear with $x_{12}$: in other words, $x_{13}$ and $x_{23}$ cannot both be collinear with $x_{12}$, and there are no triangles.
On the other hand, consider two non-collinear points x and y, and consider two lines $L_1$ and $L_2$ incident with y. The point x is not incident with either $L_1$ or $L_2$, and, by the GQ Axiom, there is a point w on $L_1$ and a point z on $L_2$ such that x is collinear with both w and z. If the line incident with both x and z is $L_3$ and the line incident with both x and w is $L_4$, then the quadruple of distinct lines $L_1, L_2, L_3, L_4$ is the desired quadrangle.
For a point x in a generalised quadrangle Q we write $B_1(x)$ for the set of points collinear with x. By convention, $x \notin B_1(x)$. Suppose that $y \notin \{x\} \cup B_1(x)$, and let L be any line through x. By the GQ Axiom, y is collinear with a unique point on L, so y is at distance 2 from x, which we denote by $y \in B_2(x)$. In fact, since the choice of L was arbitrary, we obtain a bijection: every line through x intersects a unique line through y. Hence every point lies on a fixed number t + 1 of lines. A similar argument shows that all lines have the same size, which we denote s + 1. The standard definition in the literature is to say that a finite generalised quadrangle has order (s, t) if there are s + 1 points incident with a given line and t + 1 lines incident with a given point. Routine counting arguments can be used to establish the following well-known result.
Lemma 3. The number of points in a finite generalised quadrangle of order (s, t) is (s + 1)(st + 1). For any point x in the GQ, there are s(t + 1) points in $B_1(x)$ and $s^2 t$ points in $B_2(x)$.
Proof. There are t + 1 lines through x, each containing s points distinct from x. Since a GQ contains no non-trivial triangles, these lines are disjoint (outside of x). So there are s(t + 1) points collinear with x, and $|B_1(x)| = s(t + 1)$.
Consider now a point y in $B_2(x)$. Since $y \notin B_1(x)$, y is not incident with any line through x; choose such a line L. By the GQ Axiom, y is collinear with a unique point on L. Since there are s points on L other than x, and each of these points is collinear with s(t + 1) − s = st points not on L, there are exactly $s \cdot st = s^2 t$ points in $B_2(x)$.
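The GQ Axiom and the absence of triangles can be verified by brute force on the smallest thick example. The sketch below assumes the well-known model of the generalised quadrangle of order (2, 2) in which points are the 15 two-element subsets of {1, ..., 6} and lines are the 15 partitions of {1, ..., 6} into three such subsets; the function names are illustrative.

```python
from itertools import combinations

# Points: the 15 two-element subsets ("duads") of {1, ..., 6}.
POINTS = [frozenset(p) for p in combinations(range(1, 7), 2)]
# Lines: the 15 partitions of {1, ..., 6} into three duads ("synthemes").
LINES = [frozenset(t) for t in combinations(POINTS, 3)
         if len(frozenset().union(*t)) == 6]

def collinear(x, y):
    """Two distinct points are collinear if some line contains both."""
    return x != y and any(x in L and y in L for L in LINES)

def gq_axiom_holds():
    """For each point x and line L not through x, exactly one point of L
    is collinear with x."""
    for x in POINTS:
        for L in LINES:
            if x in L:
                continue
            if sum(collinear(x, y) for y in L) != 1:
                return False
    return True

def has_triangle():
    """Search for three pairwise collinear points not on a common line."""
    for x, y, z in combinations(POINTS, 3):
        if collinear(x, y) and collinear(y, z) and collinear(x, z):
            if not any({x, y, z} <= L for L in LINES):
                return True
    return False
```

In this model two points are collinear exactly when the corresponding subsets are disjoint, which makes the axiom easy to see by hand as well.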
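As a consistency check on Lemma 3, the point x together with the sets of points at distance 1 and distance 2 from x should account for every point. The arithmetic can be written out as follows, with the order (2, 2) chosen here purely as an illustrative example.

```latex
\begin{align*}
|\{x\}| + |B_1(x)| + |B_2(x)| &= 1 + s(t+1) + s^2 t \\
  &= s^2 t + st + s + 1 \\
  &= (s+1)(st+1), \\
\text{and for } (s,t) = (2,2):\quad 1 + 6 + 8 &= 15 = 3 \cdot 5 .
\end{align*}
```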
The following result, due to D.G. Higman, shows that the parameters s and t cannot differ by too much, in general.
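Higman's inequality is usually stated as below for a finite generalised quadrangle of order (s, t) with s, t > 1. The statement is supplied here from the standard literature rather than from the text above, and the exponent $\alpha$ is notation assumed for illustration.

```latex
\[
  s \le t^2 \quad\text{and}\quad t \le s^2 \qquad (s, t > 1),
\]
\[
  \text{so that, writing } t = s^{\alpha}, \qquad \tfrac{1}{2} \le \alpha \le 2 .
\]
```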
Our analysis of pseudonymity relations in a GQ-UPIR scheme will require the concept of a hyperbolic line in a GQ, which we introduce now. In a finite generalised quadrangle Q of order (s, t), given any two non-collinear points x and y, by the GQ Axiom, there is a collection C of exactly t + 1 points collinear with both x and y. Thus there are at least two points, x and y, that are collinear with all the points in C, but there could be more.
Definition 6. Given a set of pairwise non-collinear points X in a finite generalised quadrangle, we define $B_1(X)$ to be the set of points collinear with each point in X, that is, $B_1(X) = \bigcap_{x \in X} B_1(x)$. We define the span of X to be the set of points collinear with every point of $B_1(X)$, that is, $\mathrm{sp}(X) = \bigcap_{z \in B_1(X)} B_1(z)$.
When $X = \{x_1, \ldots, x_m\}$, we often write $B_1(x_1, \ldots, x_m)$ to denote $B_1(X)$ and $\mathrm{sp}(x_1, \ldots, x_m)$ to denote $\mathrm{sp}(X)$. Note that, for non-collinear points x and y in a generalised quadrangle of order (s, t) we have $\{x, y\} \subseteq \mathrm{sp}(x, y)$ and, by the GQ Axiom, $|B_1(x, y)| = t + 1$. The set $\mathrm{sp}(x, y)$ is often referred to as the hyperbolic line defined by x and y. The following results show that hyperbolic lines have incidence properties similar to those of ordinary lines.
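Hyperbolic lines can be computed directly in a small example. The sketch below reads Definition 6 as follows (an assumption about the intended formulas): $B_1(X)$ is the set of points collinear with every point of X, and sp(X) is the set of points collinear with every point of $B_1(X)$. It uses the duad and syntheme model of the generalised quadrangle of order (2, 2); all function names are illustrative.

```python
from itertools import combinations

# Generalised quadrangle of order (2, 2): points are 2-subsets of {1..6},
# lines are partitions of {1..6} into three 2-subsets.
POINTS = [frozenset(p) for p in combinations(range(1, 7), 2)]
LINES = [frozenset(t) for t in combinations(POINTS, 3)
         if len(frozenset().union(*t)) == 6]

def b1(x):
    """B_1(x): points collinear with x, excluding x itself."""
    return {y for L in LINES if x in L for y in L if y != x}

def b1_of_set(X):
    """B_1(X): points collinear with every point of the non-empty set X."""
    return set.intersection(*(b1(x) for x in X))

def span(X):
    """sp(X): points collinear with every point of B_1(X)."""
    return b1_of_set(b1_of_set(X))

x, y = frozenset({1, 2}), frozenset({1, 3})   # two non-collinear points
common = b1_of_set([x, y])                    # t + 1 = 3 common neighbours
hyperbolic_line = span([x, y])                # contains x and y
```

In this example the hyperbolic line through {1, 2} and {1, 3} also contains {2, 3}, so it has 3 = q + 1 points with q = 2.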
We end this section by collecting relevant information about the families of classical generalised quadrangles. These families are related to certain classical groups, and are thus highly symmetric. In each case, the size of spans of sets of a given type is constant, regardless of which points within the generalised quadrangle are chosen. In the following table, q is a prime power, and x, y, and z are three mutually non-collinear points.

Q                  Order     Span size
W(3, q), q odd     (q, q)    |sp(x, y)| = q + 1
Q(4, q), q even    (q, q)    |sp(x, y)| = q + 1
Q(4, q), q odd     (q, q)    |sp(x, …

Secure GQ-UPIR systems
We begin this section by describing in detail the pseudonymity relation on a GQ-UPIR scheme for a single eavesdropper.
Proposition 1. In a GQ-UPIR scheme using Protocol 1, the pseudonymity classes with respect to a single eavesdropper c are singleton classes for users at distance 1 from c, and are of the form $\mathrm{sp}(c, u) \setminus \{c\}$ for any user u at distance 2 from c.
Proof. Suppose that c observes a series of linked queries; if c shares a message space with u, then the queries always appear in this message space. Otherwise, by the GQ Axiom, for every line through c, there is a unique user on that line collinear with u. This implies that c observes linked queries from u distributed uniformly across all message spaces to which c has access. Hence c can decide whether u is at distance 1 or distance 2. If d(c, u) = 1, then an argument exactly analogous to that of Theorem 2 shows that c can resolve the identity of u. So suppose that d(c, u) = 2. For a fixed line M containing c, there is a unique user $u_1$ on M sharing a message space with u. Over sufficiently many linked queries, c will observe queries addressed to every user in M except for $u_1$. As a result, c learns $X = B_1(c) \cap B_1(u)$. Now recalling Definition 6, suppose that $v \in \mathrm{sp}(u, c)$. Then by Lemma 4, $B_1(c) \cap B_1(v) = X$. It follows that u and v are pseudonymous. Likewise, any other user in $\mathrm{sp}(u, c) \setminus \{c\}$ falls into the same pseudonymity class.
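The classes described in Proposition 1 can be listed explicitly in a small case. The sketch below again assumes the duad and syntheme model of the generalised quadrangle of order (2, 2), and it simply evaluates the formula of the proposition rather than simulating Protocol 1; `pseudonymity_classes` is a name introduced here.

```python
from itertools import combinations

POINTS = [frozenset(p) for p in combinations(range(1, 7), 2)]
LINES = [frozenset(t) for t in combinations(POINTS, 3)
         if len(frozenset().union(*t)) == 6]

def b1(x):
    """Points collinear with x, excluding x itself."""
    return {y for L in LINES if x in L for y in L if y != x}

def span(X):
    """Points collinear with every common neighbour of the points in X."""
    common = set.intersection(*(b1(x) for x in X))
    return set.intersection(*(b1(z) for z in common))

def pseudonymity_classes(c):
    """Classes predicted by Proposition 1 for a single eavesdropper c."""
    near = b1(c)
    far = [u for u in POINTS if u != c and u not in near]
    classes = {frozenset({u}) for u in near}                  # distance 1
    classes |= {frozenset(span([c, u]) - {c}) for u in far}   # distance 2
    return classes

classes = pseudonymity_classes(frozenset({1, 2}))
```

With hyperbolic lines of size 3, the eight users at distance 2 from c fall into four classes of size 2, so no giant component appears under Protocol 1 in this example.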
An easy corollary of Proposition 1 is that a single user in a GQ-UPIR scheme can resolve the identity of every other user in the scheme if and only if every hyperbolic line in the GQ has size 2. There are two such families known: Q(4, q) where q is an odd prime power, and $H(4, q^2)^D$. The data given in Table 3 shows that the pseudonymity relation on a GQ-UPIR scheme will never give a giant component when using Protocol 1. We introduce a new protocol, inspired by onion routing, which will be secure against large coalitions of users.
Protocol 2. Let (U ∪ M, E) be a GQ-UPIR system. Suppose furthermore that every user has been assigned a public key, and that user u wishes to retrieve the response to the query Q from the database.
1. u chooses a user v uniformly at random from the set of all users, and generates a private key ψ.
2. If u = v, u requests Q directly from the server, receiving response R.
3. If d(u, v) = 1, then user u encrypts both the query Q and the private key ψ using v's public key $\varphi_v$, and writes the request $[v, \varphi_v(Q), \varphi_v(\psi)]$ to the unique message space that they share.
5. When v receives the request, he forwards Q to the database, receives response R and writes the response $[(v), \varphi_v(Q), \psi(R)]$ to the message space in which the query was observed. The response is returned to user u as in Protocol 1.
Remark 2. In Protocol 2, the only user who learns the query Q is the proxy v; this means that users do not observe linked queries addressed to other users.The use of a private key is necessary since revealing u's public key to v would compromise u's privacy.
Proposition 2. In a GQ-UPIR scheme using Protocol 2, all users at distance two from a single eavesdropper c are pseudonymous.
Proof. As in Proposition 1, c can identify whether the source u of a series of linked queries is at distance 1 or distance 2. Suppose that u is at distance 2. Then c observes linked queries in every message space to which he has access with equal probability. Without observing queries addressed to other users he gains no information about members of $B_1(u)$, and so any pair of users at distance 2 are pseudonymous.
In contrast to GQ-UPIR schemes, it can be shown that encryption of messages offers limited benefits in PBD-UPIR schemes. In particular, a single eavesdropper learns which message space he shares with another user: this means that there can be no giant component in the pseudonymity relation. It can be shown that a coalition of three users, not all sharing a single message space, suffices to identify any other user in a projective plane UPIR scheme. In essence, intersection attacks do not require coalition members to observe queries addressed to other users.
Theorem 4. A GQ-UPIR scheme using Protocol 2 is secure against coalitions of users of size $O(n^{1/4-\epsilon})$. Hence any family of GQ-UPIR schemes is secure in the sense of Definition 4.
Proof. Let $c_1$ and $c_2$ be members of a coalition, and suppose that $d(c_1, u) = d(c_2, u) = 1$. Write $M_i$ for the (unique) message space shared by $c_i$ and u. An intersection attack identifies u as the unique user with access to both spaces. We conclude that the identity of any user at distance 1 from more than 1 coalition member can be resolved.
Suppose now that $c_1$ is the unique coalition member at distance 1 from u. Then $c_1$ learns the message space M shared with u. By the GQ Axiom, every other user is at distance 1 from a unique user in $M \setminus \{u\}$. So the coalition gains information about the identity of u proportional to the number of neighbours that they have in M. In particular, the identity of u is completely resolved if and only if every user in $M \setminus \{u, c_1\}$ is a neighbour of some coalition member. In this last case, the size of the coalition is at least proportional to the number s + 1 of points on a line in the GQ. By Proposition 2, the users at distance 2 from every member of the coalition C form a single pseudonymity class with respect to C. We estimate the size of this pseudonymity class. By Lemma 3, the number of users at distance 1 from a single user is s(t + 1). To apply Definition 4, we require $|C|(st + s) \le n^{1-\epsilon}$, where $n = s^2 t + st + s + 1$ is the number of users in the scheme.
Writing $t = s^{\alpha}$, so that $n = \Theta(s^{2+\alpha})$, we obtain $st + s = O(n^{\frac{1+\alpha}{2+\alpha}}) = O(n^{3/4})$, where the second equality comes from $\alpha \le 2$. It follows that when $|C| = O(n^{1/4-\epsilon})$, the set of users at distance 2 from C forms a giant component in the pseudonymity relation of C.
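The counting in this proof can be explored numerically. The sketch below uses only the formulas of Lemma 3; the function names are introduced here for illustration. The lower bound is the crude estimate obtained by removing each coalition member together with all of its neighbours, ignoring overlaps.

```python
import math

def gq_counts(s, t):
    """Number of users n, neighbours of a user, and users at distance 2 (Lemma 3)."""
    n = (s + 1) * (s * t + 1)
    near = s * (t + 1)
    far = s * s * t
    return n, near, far

def far_from_coalition_lower_bound(s, t, coalition_size):
    """Lower bound on the number of users at distance 2 from every coalition
    member: remove each member and all of its neighbours."""
    n, near, _ = gq_counts(s, t)
    return max(0, n - coalition_size * (near + 1))

def neighbourhood_exponent(s, t):
    """The exponent e with s(t + 1) = n**e, to compare against 3/4."""
    n, near, _ = gq_counts(s, t)
    return math.log(near) / math.log(n)
```

For example, in order (9, 9) there are 820 users, and a coalition of 3 users leaves at least 547 users at distance 2 from all of its members.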