1 Introduction

Oblivious RAM (ORAM) has been a vigorous area of study for the last three decades since it was introduced by Goldreich and Ostrovsky [17]. ORAM focuses on a client-server model where the server stores an outsourced database upon which the client wishes to execute a series of reads and writes. ORAM provides privacy, hiding the contents of the database, as well as obliviousness, hiding the client’s access patterns. In the traditional client-server model the client is assumed to be trusted. Recent efforts in the field have focused on lower bounds [35], optimal bandwidth [2, 31], and a variety of other settings [15, 32].

Distributed ORAM (DORAM) is a variant of the basic client-server ORAM model in which there are multiple non-colluding servers. Data is duplicated across the servers, and the client interacts with each of them as part of an access. The client again remains the only trusted party. The model was first introduced by Ostrovsky and Shoup [30], and later formally defined by Lu and Ostrovsky [26]. Lu and Ostrovsky were motivated by the desire to circumvent existing lower bounds on bandwidth overhead in the single-server setting, and their construction achieved \(O(\log N)\) overhead by leveraging two non-communicating servers. Following their seminal paper there have been a number of works in the DORAM model that further reduce bandwidth [1, 7], reduce blocksize [25], or achieve practical efficiency [37].

Another advantage of the multi-server model of DORAM is its natural application to secure computation over databases in the RAM model. Traditional secure computation relies on a circuit representation that is at least linear in the size of the data over which it computes. This is prohibitive for any sublinear computation run on a database, such as binary search. Lu and Ostrovsky observe in [26] that the application of DORAM in this case is highly advantageous: the parties in the secure computation can simply emulate the DORAM client for any database access. In particular, they present a generic transformation from a 2-server DORAM scheme to a 2-party secure computation. It should be noted that works applying ORAM to secure computation are not limited to the DORAM setting, but also include adaptations of single-server schemes. For example, there has been significant work on adapting tree-based ORAM schemes [33, 34] for secure computation. All of these ORAM constructions can be used in general-purpose secure computation like garbled RAM schemes [13, 14, 16, 27], or in special-purpose protocols like dynamic searchable encryption schemes [21].

There are two main approaches to constructing ORAM for secure computation: the first is to apply a generic MPC compiler, such as Garbled Circuits, to an ORAM or DORAM client [18, 19, 33, 34], and the second is to design a client specifically implemented by the two servers [5, 11, 23, 36]. Even if we start with an ORAM with our desired asymptotics (e.g., square-root ORAM [17]), applying a generic MPC compiler results in a server computation cost at least linear in the database size if we are to maintain constant rounds. There are a number of works that focus directly on the second model, which offers greater flexibility since the servers are typically afforded much more storage space than the client.

However, in both approaches, the multi-server setting introduces a new set of challenges apart from those found in the single-server ORAM setting. Wang et al. [33] observe that the traditional efforts to optimize bandwidth overhead are ineffective in a setting where there are other controlling factors, such as the size of the circuit representation of the ORAM client. This is the motivation behind their Circuit ORAM construction, which focuses on optimizing circuit size. Doerner and shelat [11] also show that in many cases, bandwidth is not the limiting factor but rather the latency between the two servers. This encouraged them to build a constant-round DORAM for secure computation. Previous constructions had relied on recursive structures, which incur \(O(\log N)\) rounds for each access, a prohibitive cost in latency-dominated secure computation settings. Subsequent works in the constant-round setting improved on the \(O(\sqrt{N})\) overhead of Floram, achieving \(O(\log ^3 N)\) overhead [23], or \(O(\sqrt{N})\) in a black-box setting [5]. As with the original construction, these subsequent works require linear local computation for each server. While for small \(N \), latency costs may still dominate, for sufficiently large \(N \) this linear work is prohibitive. This leaves us with the following question:

Can we construct a 2-server Distributed ORAM for secure computation that achieves both constant round and sublinear server work?

In this work, we answer the above question in the affirmative.

1.1 Our Contributions

We present the first DORAM constructions in the 2-server, semi-honest secure computation setting to achieve constant rounds and sublinear local computation on the servers.

  • Our first sublinear DORAM construction achieves constant rounds and an amortized local computation and bandwidth cost of \(O(\sqrt{N} \log N)\) per access. It is based on square-root ORAM and has a modular design, allowing subsequent improvements to the functionalities we rely on to be substituted in easily.

  • Our second sublinear DORAM construction is based on a secure computation of Doubly Efficient Private Information Retrieval (DEPIR) where the distinction between reads and writes is no longer hidden. In this setting, we achieve constant rounds with local computation and bandwidth of \(O(N ^\epsilon \cdot \mathrm{poly}(\lambda ))\) for any \(\epsilon > 0\).

As a crucial building block toward the second construction, we present a secure two-party computation protocol for the Fast Fourier Transform (FFT) for multivariate polynomial evaluation and interpolation that runs in quasilinear time using only local computation; this may be of independent interest.

1.2 Technical Overview

In this section, we describe both of our DORAM constructions in more detail.

Sublinear DORAM. We start by describing the original square-root ORAM (introduced by Goldreich and Ostrovsky [17]) on which our construction is based. There is a single read-only array of size \(N \), which we call the store, and a writable stash of size \(\sqrt{N}\). Elements in the store are (address, value) pairs; at initialization, the elements are permuted with a permutation known only to the client, and all elements are encrypted. To perform a read at a particular address, the client checks the stash using a linear scan; if the element is not present, it reads the permuted element from the read-only store, and if it is present, it is retrieved from the stash and a random ‘dummy’ element is read from the store instead. The newly-read element is placed in the stash, in order to maintain the invariant that each element is read only once from the store. In the case of a write, a dummy is read from the store and the element is written to the stash. After enough queries have been made to fill the stash, a duration that we call an epoch, the elements from the stash are reshuffled back into the main store, with only the newest write at each location being kept.
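For concreteness, the following is a minimal plaintext sketch of this access logic for a single trusted client; there is no encryption or secret sharing, the dummy reads and re-permutation are elided, and all names are our own illustrative choices rather than part of any protocol in this paper.

```python
import random

class SqrtORAM:
    """Plaintext sketch of the square-root ORAM access logic (illustrative only)."""

    def __init__(self, db):                      # db: dict mapping addr -> value
        self.db = dict(db)
        self.t = max(1, int(len(db) ** 0.5))     # stash capacity = epoch length
        self._new_epoch()

    def _new_epoch(self):
        self.stash = {}                          # addr -> value, scanned linearly
        self.count = 0                           # accesses performed this epoch
        self.store = dict(self.db)               # conceptually permuted + encrypted

    def access(self, op, addr, value=None):
        if addr in self.stash:
            result = self.stash[addr]            # already touched: a dummy read of
                                                 # the store would happen here (elided)
        else:
            result = self.store[addr]            # each element read at most once per epoch
        self.stash[addr] = value if op == "write" else result   # touched => in stash
        self.count += 1
        if self.count == self.t:                 # end of epoch: fold stash back in,
            self.db.update(self.stash)           # newest write at each address wins,
            self._new_epoch()                    # then re-permute (elided) and reset
        return result

ram = SqrtORAM({i: 10 * i for i in range(9)})
ram.access("write", 4, 999)
assert ram.access("read", 4) == 999
```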

While the basic square-root ORAM construction achieves constant rounds with sublinear communication and server computation, it is non-trivial to convert it to a two-party DORAM. There are two major issues incurred by shifting this to the two party case: (1) representing the permutation over the elements of the store and (2) merging the elements from the stash back into the store.

We first discuss how to represent the permutation that maps addresses to physical locations in the store. In [36], which is also based on square-root ORAM, the permutation is represented as a shared array in recursive ORAMs. This improves computation complexity but leads to \(O(\log N)\) rounds of communication. To maintain constant rounds, we must instead find a compact representation of the permutation. We take inspiration from the original square-root scheme. There, a random ‘tag’ is generated for each element in the store using a random oracle, and the elements are then sorted according to the tag. A lookup then involves only a random oracle evaluation and a binary search across the sorted elements. However, because it is a single-server scheme, an oblivious sorting network must be used in order to break the correlation between items in different epochs, and such networks do not run efficiently in constant rounds. We leverage the fact that we have two servers to break up the oblivious sort into its two components, ‘oblivious’ + ‘sort’. To prevent the servers from mapping items between epochs, we use a simple constant-round functionality that lets each server obliviously permute the elements in turn. As long as one server is honest, the data is permuted obliviously. This allows us to generate the tags using an oblivious pseudorandom function (OPRF), rather than a random oracle, on the newly permuted elements, and then sort the tags locally. Lookup again is just an OPRF evaluation on the address shares followed by a local binary search on the store.

The second challenge arises during the reshuffling phase of the protocol. In the original square-root ORAM, elements are simply moved back into their original locations (updated elements into the store, dummies back into the stash) by executing another oblivious shuffle. To solve this in constant rounds, we again exploit our two-server architecture to obliviously permute elements. In order to do that, though, we must ensure that the elements we are permuting do not contain any duplicates. For example, if a read was executed on index \(\mathsf {i}\), there would be two copies of element \(\mathsf {i}\), one in the stash and one in the store. To solve this issue, we note that which elements have been read from the store is public knowledge to both servers. As long as we maintain the invariant that if an element has been read (or written to), it is in the stash, and that each element occurs in the stash only once, we can simply concatenate the elements in the stash with the unread elements in the store at the end of an epoch. Once we have concatenated the elements, we can obliviously permute them to get our new store. The stash can then simply be refilled with new dummy elements.

A more detailed discussion of our construction can be found in Sect. 3.

DORAM with Unlimited Reads. Thanks to the modularity of our base scheme, the components are easily extensible. In the second half of this work, we improve the performance of the read-only data store while keeping the rest of the construction (the stash, our periodic shuffling technique at the end of each epoch, etc.) mostly intact.

The separation of our read-only store from a read-and-writable stash suggests an intriguing tradeoff: if we are willing to leak whether each operation is a read or a write operation, then it is beneficial to design an efficient read-only store that supports unlimited reads, and only pay for accessing the stash on (hopefully infrequent) write operations. This optimization allows us to increase the duration of each epoch, or in other words to amortize the cost of each shuffle over more reads. Concretely, in a scenario where the ratio of reads to writes is about \(N \)-to-1, for any constant \(\epsilon > 0\) we can construct a read-only store whose amortized cost per query is just \(O_{\lambda } (N ^\epsilon )\). Here, the notation \(O_{\lambda } \) means that we suppress \(\mathrm{poly}(\lambda )\) terms in order to focus on the dependency on the database size. By reducing the size of the stash to \(O_{\lambda } (N ^\epsilon )\), we can support write operations with this performance as well.

Our strategy for constructing an unlimited-reads store might seem counterintuitive at first: we start from a doubly efficient PIR [4, 6] that supports unlimited reads and convert it into a two-server distributed data store. A doubly efficient private information retrieval (DEPIR) scheme is a client-server protocol for oblivious access to a public dataset that requires only sublinear computation for both the client and server operations and constant rounds of communication between the two. At first glance, it may seem that a 1-server DEPIR is a strictly stronger primitive than a 2-server DORAM, so we might expect to construct the latter generically as a secure computation of the former. However, this intuition does not hold, because there are three properties that we aim to satisfy with DORAM but that even a doubly efficient PIR does not provide:

  • Support for writes,

  • Hiding the contents of the database, in addition to access patterns, and

  • Ensuring that the secure computation is constant rounds when the two parties collectively emulate the (sublinear but not constant time) client, in addition to the client-server communication.

The main observation underlying this approach is that the SK-DEPIR protocol of Canetti et al. [6] is highly amenable to secure computation, as its operations mostly involve linear algebra in a finite field that can be done purely locally, plus bitstring and set operations that are easy to handle in constant rounds. SK-DEPIR constructions are based on a locally decodable code (LDC) in the style of a Reed-Muller code, which encodes a dataset as a multivariate polynomial. As a result, the most challenging part of our multiparty computation protocol involves securely emulating the client’s procedures to evaluate or interpolate a multivariate polynomial at \(O(N )\) points. The naive methods for polynomial evaluation (via application of the Vandermonde matrix) or polynomial interpolation (via the Lagrange interpolation polynomial) involve multiplication of a public matrix by a secret-shared vector; this can be done non-interactively but requires \(O(N ^2)\) computation, which is too slow for our purposes.

Given a binary field \(\mathbb {F} = \mathrm {GF}(2^{\ell })\) and a subspace \(H ^m \subset \mathbb {F} ^m \), we construct secure computation protocols for evaluating or interpolating an \(m \)-variate polynomial \(p \in \mathbb {F} [x_1, \ldots , x_m ]\) on all points in \(H ^m \) in time that is quasilinear (rather than quadratic) in \(\vert H ^m \vert \). This protocol may be of independent interest, and in our protocol it is needed to achieve our goal of sublinear computation for the overall DORAM scheme. We construct this secure computation scheme in two stages: first we construct a secure computation protocol for the Additive Fast Fourier Transform protocol of Gao and Mateer [12] for univariate polynomials over a binary field, and then we bootstrap this protocol to handle multivariate polynomials by using recursion on the number of variables in the polynomial as previously shown by Kedlaya and Umans [24]. All operations in this protocol reduce to linear combinations of secret variables, so the entire computation can be done locally by each party on their own boolean secret shares without the need for any communication.

1.3 Related Work

We focus on schemes that are directly designed for secure computation. A direct comparison of their local computation, bandwidth, and number of rounds can be seen in Table 1. The construction of Zahur et al. [36] is very similar to our basic construction, but instead of implementing the permutation by OPRF evaluation, they use Waksman networks and a recursive position map. This allows for sublinear server work, but at the cost of non-constant rounds. Doerner and shelat [11] use function secret sharing to obtain a scheme with very good practical efficiency, but their need for linear server work limits scalability to large database sizes \(N \). Gordon et al. [19] work in the more general DORAM model but use PIR over tree-based ORAM. They obtain \(O(\log N)\) bandwidth but, as with Doerner and shelat, require linear local computation. Jarecki et al. focus on decreasing the round complexity and bandwidth of DORAMs for secure computation while still maintaining sublinear server computation. They obtain the best combined set of parameters, but are still unable to achieve constant rounds of communication. Finally, Bunn et al. [5] give a 3-server DORAM scheme that achieves constant rounds and sublinear bandwidth, while providing a black-box construction. However, as with [11, 19], they require linear server work.

Table 1. Comparison of access costs in DORAM schemes for secure computation. Asterisks indicate schemes where the stash size is assumed to be \(O(\log N)\) and \(O(N ^\epsilon )\), respectively, and where the distinction between \(\textsf {read} \) and \(\textsf {write} \) is not hidden.

2 Preliminaries

In this section, we provide several definitions and constructions of existing cryptographic primitives that we leverage in this work. We begin with a brief summary of our notation.

Given a bitstring \(x \in \{0,1\}^\ell \), a 2-of-2 boolean secret sharing \(\langle x \rangle \) denotes the uniform selection of two bitstrings \(x_1\) for party 1 and \(x_2\) for party 2 subject to the constraint that their boolean-xor satisfies \(x_1 \oplus x_2 = x\). A binary field \(\mathbb {F} = \mathrm {GF}(2^\ell )\) is a finite field of characteristic 2; there is a canonical bijection \(\mathbb {F} \leftrightarrow \{0,1\}^\ell \) such that field addition corresponds to boolean-xor. Hence, we overload the notation \(\langle f \rangle \) so that it applies to field elements \(f \in \mathbb {F} \). This secret sharing scheme commutes with linear algebra in the field, i.e., \(\langle c f + c' f' \rangle = c \langle f \rangle + c' \langle f' \rangle \) can be computed locally by each server from public constants \(c, c' \in \mathbb {F} \) and secret shares \(\langle f \rangle , \langle f' \rangle \).
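As a concrete illustration of this notation, here is a small sketch of 2-of-2 boolean sharing over a toy field GF(2^8) (the field size and irreducible polynomial are our own illustrative choices), showing that a public linear combination is computed locally on each party's shares with no interaction.

```python
import secrets

ELL = 8                               # toy field GF(2^8); illustrative size only
RHO = 0x11B                           # rho(z) = z^8 + z^4 + z^3 + z + 1, irreducible

def gf_mul(a, b):
    """Field multiplication in GF(2^ELL): carry-less multiply, reduced mod rho(z)."""
    acc = 0
    while b:
        if b & 1:
            acc ^= a
        a <<= 1
        if a >> ELL:
            a ^= RHO
        b >>= 1
    return acc

def share(x):
    """2-of-2 boolean sharing: x = x1 XOR x2, with x1 chosen uniformly."""
    x1 = secrets.randbelow(1 << ELL)
    return x1, x1 ^ x

def local_lincomb(c, f_share, cp, fp_share):
    """Run by each server on its own shares: yields a share of c*f + c'*f'."""
    return gf_mul(c, f_share) ^ gf_mul(cp, fp_share)

# Reconstructing the two locally computed shares yields c*f + c'*f'.
f, fp, c, cp = 0x53, 0xCA, 0x02, 0x03
f1, f2 = share(f)
g1, g2 = share(fp)
assert (local_lincomb(c, f1, cp, g1) ^ local_lincomb(c, f2, cp, g2)) == (gf_mul(c, f) ^ gf_mul(cp, fp))
```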

We use the convention of 0-indexing, with \([N ] = \{0, 1, \ldots , N-1\}\) containing all nonnegative integers less than \(N \). Additionally, \(S \times S'\) denotes the Cartesian product of two sets. Bold letters \(\varvec{v}\) denote vectors, subscripts \(\varvec{v}_i\) indicate the \(i^\text {th} \) element of a vector, and \(( w_i )_{i \in [N ]}\) constructs a vector from an ordered list of items \(w_0, w_1, \ldots , w_{N- 1}\). The notation \(\parallel \) denotes concatenation of bitstrings, sets, or vectors into a single object of longer length containing the (ordered) union of all elements.

The notation \(x \leftarrow \mathcal {D} \) indicates taking a sample from a probability distribution \(\mathcal {D} \). By abuse of notation, \(x \leftarrow S\) indicates sampling from the uniform distribution over set S; we sometimes use \(x \overset{\$}{\leftarrow } S\) for emphasis. We use \(\approx \) to indicate computational indistinguishability of two distributions; that is, \(\mathcal {D} \approx \mathcal {D} '\) if no probabilistic polynomial time adversary \(\mathcal {A}\) has a noticeable difference in output when given a sample from \(\mathcal {D} \) or \(\mathcal {D} '\).

2.1 Distributed Memory

First introduced by Bunn et al. [5], the ideal functionality \(\mathcal{F}_{mem}\) in Fig. 1 captures the behavior achieved by a DORAM. The functionality is initialized with secret shares of the database, and subsequent accesses take secret-shared inputs and return secret-shared outputs. This version of the definition deviates from the original in that the \(\mathsf {Init}\) functionality returns shares of the database, and the access protocol takes in those same shares. This syntactic difference is included only to make our own proofs cleaner and does not fundamentally change the definition. While Bunn et al. provide a viable 3-server construction that meets this ideal functionality and provides the necessary performance, we leverage the construction of Doerner and shelat [11], which requires only two servers. From their construction, we obtain Lemma 1.

Fig. 1. Functionality \(\mathcal{F}_{mem}\)

Lemma 1

There exists a protocol \(\varPi _{DORAM}\) that implements the functionality \(\mathcal{F}_{mem}\) with the following complexity:

  • Access of \(\mathsf {op}= \mathsf {read}\) or \(\mathsf {op}=\mathsf {write}\) results in O(1) rounds of communication, O(n) local server computation, and \(O(\sqrt{n})\) communication bandwidth.

  • Initializing the functionality results in O(1) rounds of communication, O(n) local server computation, and O(n) communication bandwidth.

2.2 Distributed Oblivious Pseudo-random Function

A Distributed Oblivious Pseudo-random Function (DOPRF) enables the distributed evaluation of a PRF between two parties. Typically, one party holds the key and the other the input, and only the second party learns the output. We require a variation of this ideal functionality, presented in Fig. 2, in which both the key and the input are additively secret-shared between the two parties and both parties receive the output of the evaluation.

We introduce a construction in Fig. 3 that meets our new ideal functionality; it is effectively the semi-honest version of the DOPRF of Miao et al. [28], which itself is based on the work of Jarecki and Liu [22], with only a small modification that allows the input and key to be secret-shared between the two servers. The construction leverages the Dodis-Yampolskiy pseudorandom function \(F(k, x) = g^{1/(k+x)}\) [10], and it is secure under the q-Diffie-Hellman inversion (q-DHI) assumption using a similar argument as in [28].
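For intuition only, the toy sketch below evaluates the underlying Dodis-Yampolskiy PRF in the clear with deliberately tiny, insecure group parameters of our own choosing; the protocol in Fig. 3 instead evaluates this function on secret-shared k and x under ElGamal encryption.

```python
# Toy, insecure parameters (our choice): a prime-order subgroup of Z_p^* with
# prime order q and generator g; 1019 = 2 * 509 + 1, so g = 4 generates the
# order-509 subgroup of quadratic residues.
p, q, g = 1019, 509, 4

def dy_prf(k, x):
    """Dodis-Yampolskiy PRF F(k, x) = g^(1/(k+x)); the exponent is inverted mod q.
    Requires gcd(k + x, q) = 1."""
    e = pow((k + x) % q, -1, q)       # modular inverse (Python 3.8+)
    return pow(g, e, p)

# In Fig. 3, k and x are additively shared as k1 + k2 and x1 + x2, and the two
# servers jointly compute g^(1/(k1 + k2 + x1 + x2)) without reconstructing k or x;
# both then learn the PRF output.
print(dy_prf(7, 42))
```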

Fig. 2. Functionality \(\mathcal{F}_{DOPRF}\)

Fig. 3. DOPRF Protocol, using ElGamal encryption \((\mathsf {ENC},\mathsf {DEC})\)

2.3 Constant-Round Equality Check

The functionality introduced in Fig. 4 allows two parties to check whether an element for which they both hold shares is present in a database for which they also hold shares. In particular, it returns shares of a boolean \(\mathsf {b}\) indicating the presence of a match and, if so, shares of the matching address. The database follows the invariant that there is at most one match within the database for the element. Damgård et al. and Nishide et al. [9, 29] both construct solutions that achieve this computation in constant rounds.

2.4 Doubly Efficient Private Information Retrieval

First introduced by Canetti et al. and Boyle et al. [4, 6], Doubly Efficient Private Information Retrieval (DEPIR) is a variant of PIR that achieves sublinear server work by allowing pre-processing of the database. The major building block of DEPIR is locally decodable codes (LDCs); specifically, an application of Reed-Muller codes, which yields smooth LDCs.

Definition 1 (Smooth LDC)

An s-smooth, k-query locally decodable code with message length N, codeword size M, and alphabet \(\varSigma \) is denoted by \((s,k,N,M)_{\varSigma }\)-smooth LDC and consists of a tuple of PPT algorithms \((\mathsf {Enc}, \mathsf {Query}, \mathsf {Dec})\) with the following syntax:

  • \(\mathsf {Enc}\) takes a message \(m \in \varSigma ^N\) and outputs a codeword \(c \in \varSigma ^M\)

  • \(\mathsf {Query}\) takes an index \(i \in [N]\) and outputs a vector \(\mathbf{x} = (x_1,\dots , x_k) \in [M]^k\)

  • \(\mathsf {Dec}\) takes a vector of codeword symbols \(\mathbf{c} = (c_{x_1},\dots , c_{x_k}) \in \varSigma ^k\) and outputs a symbol \(y \in \varSigma \)

And has the following properties:

  • Local Decodability: For all messages \(m \in \varSigma ^N\) and every index \(i \in [N]\):

    $$\Pr [ \mathsf {Dec}(\mathsf {Enc}(m)_\mathbf{x }) = m_i ~:~ \mathbf{x} \leftarrow \mathsf {Query}(i)] = 1$$
  • Smoothness: An LDC is s-smooth if, for all indices \(i \in [N]\), when sampling \((x_1,\dots , x_k) \leftarrow \mathsf {Query}(i)\), every subset of s distinct entries of \((x_1,\dots , x_k)\) is uniformly distributed on \([M]^s\).

Fig. 4. Functionality \(\mathcal{F}_{EQ-DB}\)

We now formally introduce DEPIR, in particular the secret-key variant, called SK-DEPIR. Constructions rely on the hidden permutation with noise (HPN) assumption introduced by [6]. While it is a new assumption, the validity of the class of permuted puzzles assumptions has been explored by Boyle et al. [3].

Assumption 1

(Hidden permutation with noise). Let \(m< t< r< u < \vert \mathbb {F} \vert \) be functions of \(\lambda \) and \(N \) such that \(\vert \mathbb {F} \vert ^m = \mathrm{poly}(\lambda )\) and \(\vert \mathbb {F} \vert ^t = \lambda ^{\omega (1)}\). Define the distribution \(\mathcal {D} (\pi , \mathsf {addr}, T)\) that, given a randomly-chosen permutation \(\pi : \mathbb {F} ^m \twoheadrightarrow \mathbb {F} ^m \), an address \(\mathsf {addr}\in [N ]\), and a set \(T \subset [u ]\) as input, executes the \(\mathsf {Query}\) protocol of \(\tilde{\varPi }_{\textit{store}} \) in the clear (without secret shares) to retrieve a set of vectors \(\tilde{Y} = (\varvec{\tilde{y}_i})_{i \in [u ]}\) and then outputs \(Z = (\pi (\varvec{\tilde{y}_i}))_{i \in [u ]}\). The hidden permutation with noise assumption states that the distribution \(\mathcal {D} (\pi , \mathsf {addr}, T)\) is computationally indistinguishable from the uniform distribution over \((\mathbb {F} ^m )^u \).

Definition 2 (Doubly Efficient PIR)

A Doubly Efficient PIR (DEPIR) for alphabet \(\varSigma \) consists of a tuple of PPT algorithms \((\mathsf {KeyGen},\mathsf {Process}, \mathsf {Query},\mathsf {Resp},\mathsf {Dec})\) with the following syntax:

  • \(\mathsf {KeyGen}\) takes the security parameter \(1^\lambda \) and outputs the key k

  • \(\mathsf {Process}\) takes a key k, database \(\mathsf {DB}\in \varSigma ^N\) and outputs processed database \(\tilde{\mathsf {DB}}\)

  • \(\mathsf {Query}\) takes a key k, database index \(i \in [N]\) and outputs a query q and temporary state \(\mathsf {State}\)

  • \(\mathsf {Resp}\) takes a query q, processed database \(\tilde{\mathsf {DB}}\) and outputs a server response c

  • \(\mathsf {Dec}\) takes a key k, server response c, temporary state \(\mathsf {State}\) and outputs a database symbol \(y \in \varSigma \)

And has the following properties:

  • Correctness: For all \(\mathsf {DB}\in \varSigma ^N\) and \(i \in [N]\):

    $$\begin{aligned} \Pr \left[ \mathsf {Dec}(k, \mathsf {State}, c) = \mathsf {DB}_i ~ : ~ \begin{array}{rl} k \leftarrow \mathsf {KeyGen}\left( 1^\lambda \right) \\ \tilde{\mathsf {DB}} \leftarrow \mathsf {Process}\left( k, \mathsf {DB}\right) \\ (q, \mathsf {State}) \leftarrow \mathsf {Query}\left( k, i \right) \\ c \leftarrow \mathsf {Resp}\left( \tilde{\mathsf {DB}}, q\right) \\ \end{array} \right] = 1 \end{aligned}$$
  • Double Efficiency: The runtime of \(\mathsf {KeyGen}\) is \(\mathrm{poly}\left( \lambda \right) \), the runtime of \(\mathsf {Process}\) is \(\mathrm{poly}\left( N,\lambda \right) \), and the runtime of \(\mathsf {Query},\mathsf {Dec}\) is \(o\left( N\right) \cdot \mathrm{poly}\left( \lambda \right) \), where N is the database size.

  • Security: Any non-uniform PPT adversary \(\mathcal {A}\) has only \(\mathrm{negl}\left( \lambda \right) \) advantage in the following security game with a challenger \(\mathcal{C}\):

    1. \(\mathcal {A}\) sends to \(\mathcal{C}\) a database \(\mathsf {DB}\in \varSigma ^N\).

    2. \(\mathcal{C}\) picks a random bit \(b \leftarrow \{0,1\}\), runs \(k\leftarrow \mathsf {KeyGen}\left( 1^\lambda \right) \) to obtain a key k, and then runs \(\tilde{\mathsf {DB}}\leftarrow \mathsf {Process}\left( k, \mathsf {DB}\right) \) to obtain a processed database \(\tilde{\mathsf {DB}}\), which it sends to \(\mathcal {A}\).

    3. \(\mathcal {A}\) selects two addresses \(i^0, i^1 \in [N]\), and sends \((i^0,i^1)\) to \(\mathcal{C}\).

    4. \(\mathcal{C}\) samples \((q, \mathsf {State}) \leftarrow \mathsf {Query}(k, i^b)\), and sends q to \(\mathcal {A}\).

    5. Steps 3 and 4 are repeated an arbitrary (polynomial) number of times.

    6. \(\mathcal {A}\) outputs a bit \(b'\), and its advantage in the game is defined to be \(\Pr [ b = b'] - \frac{1}{2}\).

As shown in [4, 6] we can achieve SK-DEPIR with sublinear or poly-log parameters. We will describe one such construction in Sect. 4.

Lemma 2

There exist SK-DEPIR schemes with the following parameters, where N is the database size and \(\lambda \) is the security parameter:

  • Sublinear SK-DEPIR: For any \(\epsilon > 0\), the running time of \(\mathsf {Process}\) can be \(N^{1+\epsilon } \cdot \mathrm{poly}(\lambda )\), and the running time of \(\mathsf {Query}\) and \(\mathsf {Dec}\) can be \(N^\epsilon \cdot \mathrm{poly}(\lambda )\).

  • Polylog SK-DEPIR: The running time of \(\mathsf {Process}\) can be \(\mathrm{poly}(\lambda , N)\), and the running time of \(\mathsf {Query}\) and \(\mathsf {Dec}\) can be \(\mathrm{poly}(\lambda , \log N)\).

3 DORAM with Sublinear Computation

In this section we present our construction of \(\mathcal{F}_{mem}\) that achieves sublinear server work and communication with constant rounds.

3.1 Construction

In this section, we describe how we bootstrap from a linear-work \(\mathcal{F}_{mem}\) to a new protocol \(\varPi _{DORAM}\) that also instantiates \(\mathcal{F}_{mem}\) but with sublinear work and constant rounds, as desired. The overall architecture of the scheme can be found in Fig. 5. We describe below our implementations of the store and stash.

We implement the stash as another two-party DORAM (matching the \(\mathcal{F}_{mem}\) functionality). We require a 2-party scheme with constant rounds; this can be instantiated with Floram [11] or the scheme of Gordon et al. [19]. While these have linear server work for each access, because our stash holds only \(t= \sqrt{N}\) records, this still results in sublinear server work within our protocol.

We implement the store in Fig. 9 as a permuted array of elements sorted by PRF evaluation on the address of the element. Neither server knows the underlying permutation because it is created using our oblivious permutation protocol shown in Fig. 8. We perform an \(\mathcal{F}_{DOPRF}\) evaluation across the shares of the addresses, which allows us to look up records in constant rounds by computing the OPRF based on the address of the element being searched for, and then each party performs a local binary search on their own store to find the shares of the element.
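The following plaintext sketch illustrates this store layout and lookup, with a keyed hash standing in for the OPRF (the protocol itself evaluates \(\mathcal{F}_{DOPRF}\) on shared keys and address shares); all names are illustrative.

```python
import bisect, hashlib, secrets

def tag(key, addr):
    """Stand-in for the OPRF: a keyed hash of the address (illustrative only)."""
    return hashlib.sha256(key + addr.to_bytes(8, "big")).hexdigest()

def build_store(key, elements):
    """elements: list of (addr, value) pairs. Returns the store sorted by tag."""
    return sorted((tag(key, a), (a, v)) for a, v in elements)

def lookup(store, key, addr):
    """One tag computation plus a local binary search over the sorted tags."""
    t = tag(key, addr)
    i = bisect.bisect_left(store, (t,))
    if i < len(store) and store[i][0] == t:
        return store[i][1]
    return None

key = secrets.token_bytes(16)
store = build_store(key, [(10, "x"), (11, "y"), (12, "z")])
assert lookup(store, key, 11) == (11, "y")
```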

At the start of an epoch, the stash contains all the dummy elements and the store contains all the elements of the database concatenated with \(t\) dummy elements. The elements in the store are all permuted and indexed as above. Note that dummies are addressed from 1 to \(t\), so valid elements are indexed starting at \(t\). We also keep (in the clear) a counter, starting at 1. The access logic is encapsulated within our access protocol in Fig. 7 and proceeds as follows. When we want to do a \(\mathsf {read}\), we check whether the element is in the stash by calling \(\mathcal{F}_{EQ-DB}\), which returns a secret-shared boolean \(\mathsf {b}\) indicating whether the element is present, as well as shares of its stash address if it is. We then use \(\mathsf {b}\) as a selector bit in a shared conditional computation to read either the element (if it is present in the stash) or the next dummy element (addressed at the counter) from the stash. Then we read an element from the store, using \(\mathcal{F}_{store}\), again based on the selector bit: if the element is in the stash, we read the next dummy element at address counter; if it is not in the stash, we read the element itself. Finally, we write an element back to the stash, using \(\mathcal{F}_{mem}\): either the dummy element we read (which just overwrites the same element) if the element was in the stash, or the element read from the store. The element is written back at the ‘counter’ location in the stash. The protocol then returns random additive shares of the element being read. If the operation was instead a \(\mathsf {write}\), the only variation in the above process is in the final step of writing elements back to the stash: rather than writing to the ‘counter’ location automatically, if the element was previously in the stash, it is overwritten at that location.
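A compressed plaintext sketch of this access flow is below; the stash is simplified to a dictionary, whereas in the protocol each branch is computed on secret shares via the selector bit \(\mathsf {b}\) and the stash is itself a small DORAM addressed by the public counter.

```python
def access(op, addr, value, stash, store, ctr):
    """Plaintext sketch of one access (Fig. 7); every value below is secret-shared
    between the two servers in the actual protocol."""
    b = addr in stash                                  # F_EQ-DB: already stashed?
    store_val = store[("dummy", ctr) if b else addr]   # F_store: dummy read iff stashed
    result = stash[addr] if b else store_val
    stash[addr] = value if op == "write" else result   # write-back keeps the invariant
    return result, ctr + 1                             # caller shuffles once ctr reaches t
```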

At the end of an epoch (when the counter reaches \(t\)), the overarching \(\varPi _{DORAM}\) invokes \(\varPi _{shuffle}\) in Fig. 10, which resets the state as mentioned above. In the original square-root ORAM scheme, removing duplicates required a costly oblivious sort operation, which is not constant round. By contrast, we achieve a constant-round reshuffling algorithm by leveraging the following invariant of \(\mathcal{F}_{access}\): if an element has been read (or written to), it is in the stash, and each element occurs in the stash only once. This invariant allows us to simply note which elements in the store have been read during the epoch and eliminate them, knowing that their most recent copy is represented in the stash. This claim applies to dummy elements as well: during the shuffle operation, we only need to insert new dummy elements to replace those that have been overwritten in the stash by real writes. Once the unread elements and the current stash have been permuted obliviously by \(\mathcal{F}_{permute}\), the stash is reinitialized with the dummy values and the counter is reset to 1. We also note that we leverage \(\mathcal{F}_{shuffle}\) when we first initialize the DORAM: we call \(\mathcal{F}_{shuffle}\) on the original shares of the secret-shared database, concatenated with the necessary dummy elements. The set of read elements is empty, as is the stash, resulting in a permutation of the original database and dummies after \(\mathcal{F}_{permute}\) is called.
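The end-of-epoch reset can be sketched as follows (plaintext, with an ordinary shuffle standing in for \(\mathcal{F}_{permute}\)); the data layout is simplified for illustration.

```python
import random

def end_of_epoch_shuffle(store, stash, read_addrs, t):
    """Sketch of the end-of-epoch reset (Fig. 10). store: list of (addr, value)
    pairs; stash: dict addr -> value; read_addrs: the public set of store
    addresses read this epoch."""
    unread = [(a, v) for a, v in store if a not in read_addrs]  # read items live on in the stash
    merged = unread + list(stash.items())                       # duplicate-free, by the access invariant
    random.shuffle(merged)                                      # stands in for F_permute
    new_stash = {("dummy", i): None for i in range(1, t + 1)}   # reinitialize dummies; counter -> 1
    return merged, new_stash
```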

Our oblivious permutation protocol in Fig. 8 does two things: it rerandomizes the shares held by each server and applies the same random permutation to each server’s shares. Beginning with a vector of secret shares \(\langle \mathsf {M} \rangle \), server 2 begins \(\varPi _{\textit{permute}}\) by encrypting her own shares \(\mathsf {M}^2\) using an additively homomorphic encryption scheme and sending the result to server 1. Next, server 1 applies the same randomly-chosen permutation to her own shares \(\mathsf {M}^1\) as well as the ciphertexts from server 2, and she then rerandomizes each pair of shares by adding a random value to her own share and subtracting the same value (homomorphically) from server 2’s share. She sends encrypted versions of both shares to server 2, who performs the same permute-and-rerandomize operation and sends the result to server 1 to complete the constant-round oblivious permutation.
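The data flow of one permute-and-rerandomize pass can be sketched as below; the additively homomorphic cipher is replaced by an insecure identity stub purely to show the message pattern, and the share modulus is an arbitrary illustrative choice.

```python
import secrets

MOD = 1 << 64                # modulus for additive shares; an illustrative choice

class StubAHE:
    """Insecure identity stand-in for an additively homomorphic cipher (e.g.,
    Paillier); it serves only to show the message pattern of Fig. 8."""
    def enc(self, x): return x
    def dec(self, c): return c
    def add_const(self, c, x): return (c + x) % MOD   # homomorphic add of a plaintext

def permute_pass(my_shares, their_ciphertexts, ahe):
    """One server's permute-and-rerandomize pass over both share vectors."""
    n = len(my_shares)
    perm = list(range(n))
    secrets.SystemRandom().shuffle(perm)              # permutation known only to this server
    out_mine, out_theirs = [], []
    for i in perm:
        r = secrets.randbelow(MOD)
        out_mine.append((my_shares[i] + r) % MOD)                        # add r to own share
        out_theirs.append(ahe.add_const(their_ciphertexts[i], MOD - r))  # subtract r under encryption
    return out_mine, out_theirs

# Server 2 encrypts its shares and sends them; server 1 runs permute_pass and
# returns both (encrypted) vectors; server 2 runs permute_pass once more. Neither
# server learns the composed permutation, and the whole exchange is constant round.
```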

Fig. 5. The overall architecture of the ideal functionalities used in the \(\varPi _{DORAM}\) construction.

Fig. 6. \(\varPi _{DORAM}\) Protocol

Fig. 7. \(\varPi _{access}\) Protocol

Fig. 8. \(\varPi _{permute}\) Protocol

Fig. 9. \(\varPi _{store}\) Protocol

Fig. 10. \(\varPi _{shuffle}\) Protocol

3.2 Complexity Analysis

Now consider the asymptotic complexity of our scheme. We first evaluate the complexity of the underlying protocols, and then compute the amortized complexity of the overall \(\varPi _{DORAM}\) protocol. The overall complexity when \(t= \sqrt{N}\) is shown in Table 2.

  • \(\varPi _{permute}\): Each server must perform \(O(N +t)\) encryptions, decryptions, and other local operations. The entire encrypted store is sent, again resulting in \(O(N +t)\) bandwidth. The protocol runs in 3 rounds, i.e., O(1).

  • \(\varPi _{store}\): Here we consider two separate costs, one for initialization, and one for performing an access. During initialization, local computation is dominated by the sorting across the OPRF outputs, \(O\left( (N +t) \log (N +t)\right) \), and bandwidth by the OPRF computation itself, \(O(N +t)\). We obtain constant rounds in initialization by executing all of the OPRF evaluations in parallel. On access, local computation is dominated by searching for the tag, \(O(\log (N +t))\), and the only round of interaction and bandwidth is the OPRF evaluation.

  • \(\varPi _{access}\): Finding the element in the stash takes only local computation and bandwidth linear in the stash size, and constant rounds. The two other operations of note are accessing the stash and the store, which take \(O(t)\) and \(O(\log (N +t))\) local computation and \(O(\sqrt{t})\) and O(1) bandwidth, respectively. This leaves access dominated by finding the element in the stash: \(O(t)\) local computation and bandwidth.

  • \(\varPi _{shuffle}\): Shuffle is dominated by the initialization of the store, inheriting the performance and bandwidth complexity directly from \(\varPi _{store}\).

We now consider the amortized complexity of the overall local computation of \(\varPi _{DORAM}\) during access. We consider the cost of shuffling averaged over an epoch of \(t\) accesses. The cost of accessing a single block, represented by \(\varPi _{access}\), is \(O(t)\). The cost of shuffle is \(O((N +t)\log (N +t))\). We can consider the total cost of local computation during an epoch as:

$$D_{LC}(N,t) = t \cdot t + (N +t)\log (N +t)$$

Averaging over \(t\) accesses, we get:

$$D_{LC}(N,t) = t+ \frac{N}{t}\log (N +t) +\log (N +t) $$

If we set \(t=\sqrt{N}\), we get \(D_{LC}(N, \sqrt{N}) = O(\sqrt{N}\log N)\). For bandwidth, we perform a similar computation and get \(D_{B}(N, \sqrt{N}) = O(\sqrt{N}\log N)\).

Table 2. An evaluation of each protocol’s server computation, bandwidth, and rounds of communication, where \(t= \sqrt{N}\). Note that the numbers for \(\varPi _{mem}\) are taken from Lemma 1 and \(\varPi _{DORAM}\) has been amortized where appropriate.

3.3 Security

In this section we provide the overall security statement and ideal functionalities. We refer the reader to the full paper [20] for the proof.

Notation and Valid Inputs. We define a set of notations and valid inputs for our various protocols used in the following proofs. We assume \(\mathsf {op}\in \{0,1\}\), where \(\mathsf {op}= 0\) represents a \(\mathsf {read}\) operation and \(\mathsf {op}= 1\) a \(\mathsf {write}\). Valid elements are tuples of an address and a value, where \(\mathsf {addr}\in [N ]\) and \(\mathsf {val}\in \{0,1\}^B \). The input database \(\mathsf {DB}\) is made up of \(N \) valid elements. The store \(\mathsf {M}\) is represented as a key-value store, where the keys are outputs of the PRF f with key \(\mathsf {k}\) and the values are valid elements. The stash \(\mathsf {S}\) is an array of \(t\) valid elements. We define the set of valid inputs for a \(\mathsf {DORAM}\) of \(N \) elements of block size \(B \) and \(t\) dummies as \(\mathsf {Dom}_{N, B, t}\).

Theorem 1

\(\varPi _{DORAM}\) (Fig. 6) implements functionality \(\mathcal{F}_{mem}\); that is, for each server \(s \in \{1, 2\}\) there exists a PPT simulator \(\mathsf {Sim}_{D.}^s\) such that:

$$\begin{aligned}&\left\langle \mathsf {input}_\mathcal {A}, \mathsf {output}^{\varPi _{D.}},\mathsf {view}^{\varPi _{D.}}_\mathcal {A}\right\rangle _{\mathsf {input}\in \mathsf {Dom}_{N, B, t}} \approx \\&\qquad \quad \left\langle \mathsf {input}_\mathcal {A}, \mathcal{F}_{mem}\left( \mathsf {input}\right) ,\mathsf {Sim}_{D.}^s\left( \mathsf {input}_\mathcal {A}, \mathcal{F}_{mem}\left( \mathsf {input}\right) _\mathcal {A}\right) \right\rangle _{\mathsf {input}\in \mathsf {Dom}_{N, B, t}} \end{aligned}$$

where \(\mathsf {input}= \{(\mathsf {Init}, \mathsf {DB}),\) \( (\mathsf {op}_i, \mathsf {elem}_i, \mathsf {ctr}, \mathsf {M}, \mathsf {S}, \mathsf {k})\}\), and \(\mathsf {output}=\{ (\mathsf {ctr}, \mathsf {M}, \mathsf {S}, \mathsf {k}),\) \( (\mathsf {elem},\mathsf {ctr}, \mathsf {M}, \mathsf {S}, \mathsf {k})\}\).

Proof (Theorem 1)

See full paper [20] for proof.

Fig. 11. Functionality \(\mathcal{F}_{access}\)

Fig. 12. Functionality \(\mathcal{F}_{permute}\)

Fig. 13. Functionality \(\mathcal{F}_{store}\)

Fig. 14. Functionality \(\mathcal{F}_{shuffle}\)

4 Sublinear DORAM with Unlimited Reads

In this section, we introduce an alternative DORAM construction \(\tilde{\varPi }_{\textit{DORAM}}\) that also implements the \(\mathcal{F}_{ mem }\) functionality with constant rounds, sublinear server work, and sublinear communication. This protocol differs from the construction in Sect. 3 in that it does not attempt to hide whether a query is a read or write operation, and in exchange it achieves better performance.

The construction in this section only needs to invoke the shuffle functionality \(\mathcal{F}_{ shuffle }\) after \(t \) write operations, independent of the number of read operations. In scenarios where writes are infrequent, the amortized cost per read can have a small \(O_{\lambda } (N ^\epsilon )\) dependency on the database size \(N \) for any constant \(\epsilon > 0\). To build the new DORAM protocol \(\tilde{\varPi }_{\textit{DORAM}}\), we start from a SK-DEPIR that supports unlimited reads while hiding access patterns, and we emulate the server using secure 2-party computation (which hides the database contents as well).

We first describe how we instantiate the new version of the store that relies on SK-DEPIR in Sect. 4.1, then show how to use the new \(\mathcal{F}_{ store } \) in the larger \(\tilde{\varPi }_{\textit{DORAM}} \) protocol in Sect. 4.2. Finally, we show how to construct a secure computation of multivariate polynomial evaluation and interpolation using FFT in Sect. 4.3.

4.1 Instantiating \(\mathcal{F}_{ store }\) Using Secure Computation of SK-DEPIR

In this section, we show that a secure 2-party computation (2PC) of the Canetti et al. construction leads to an instantiation \(\tilde{\varPi }_{\textit{store}}\) of the \(\mathcal{F}_{ store }\) functionality. The construction we present in this section achieves sublinear communication with a constant number of rounds and quasilinear server work. To do so, we first construct a 2PC protocol for a locally decodable code, and then we construct \(\tilde{\varPi }_{\textit{store}}\) as a 2PC of a secret key doubly efficient private information retrieval (SK-DEPIR) protocol based on an LDC.

We focus on a block size \(B = \ell \), so that each block can canonically be encoded as a field element in \(\mathbb {F} = \mathrm {GF}(2^\ell )\). Put another way, all references to the database size \(N \) are enumerated in terms of the number of blocks, but if one desires a lower block length like \(B = 1\) then \(N \) should instead be interpreted in terms of the number of bits of the database.

Fig. 15. \(\tilde{\varPi }_{\textit{ldc}}\) protocol for secure 2-party computation of a Reed-Muller-style LDC

2PC for a Locally Decodable Code. First, we construct a secure 2-party computation protocol \(\tilde{\varPi }_{\textit{ldc}}\) of the locally decodable code used by Canetti et al. [6], which is a Reed-Muller-based polynomial code. We depict our construction in Fig. 15, in which the two parties maintain boolean secret shares of all input, intermediate, and output data from the LDC of Canetti et al.

Our 2PC protocol \(\tilde{\varPi }_{\textit{ldc}}\) operates over a binary field \(\mathbb {F} = \mathbb {F} _2 [z]/(\rho (z))\) of size \(\vert \mathbb {F} \vert = 2^\ell \) defined using an irreducible polynomial \(\rho \) of degree \(\ell \). Elements of \(\mathbb {F}\) can be represented using bitstrings of length \(\ell \) in the canonical way, such that the addition of two elements corresponds to the boolean-xor of their bitstring values. Furthermore, we consider \(H \subset \mathbb {F} \) to be the subspace of \(\mathbb {F} \) of size \(\vert H \vert = 2^h \) containing the span of basis elements \(\mathcal {H} = \{z^{h- 1}, z^{h- 2}, \ldots , z, 1\}\); this corresponds to bitstrings that have \(\ell - h \) leading 0s. Also, protocol \(\tilde{\varPi }_{\textit{ldc}}\) performs operations in the vector spaces \(H ^m \) and \(\mathbb {F} ^m \) of sizes \(N = \vert H \vert ^m \) and \(M = \vert \mathbb {F} \vert ^m \), respectively.
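A small sketch of this representation follows, with toy sizes of our own choosing: field elements as \(\ell \)-bit integers, addition as xor, and \(H \) as the elements whose \(\ell - h \) leading bits are zero.

```python
ELL, H_DIM = 8, 3            # toy sizes: |F| = 2^8, |H| = 2^3 (illustrative only)
RHO = 0x11B                  # rho(z) = z^8 + z^4 + z^3 + z + 1, irreducible over GF(2)

def gf_add(a, b):
    return a ^ b             # field addition = boolean-xor of the bitstrings

def gf_mul(a, b):
    """Multiplication in F = GF(2)[z]/(rho(z)) on ELL-bit integer representatives."""
    acc = 0
    while b:
        if b & 1:
            acc ^= a
        a <<= 1
        if a >> ELL:
            a ^= RHO
        b >>= 1
    return acc

# H = span{z^(h-1), ..., z, 1}: exactly the elements whose ELL - H_DIM leading bits are 0.
H = list(range(1 << H_DIM))
assert all(gf_add(a, b) in H for a in H for b in H)   # H is an additive subspace
```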

We claim that this protocol can be securely evaluated efficiently and non-interactively. Throughout this section, we only consider boolean secret shares \(\langle \cdot \rangle \), so that field addition and scalar multiplication can be performed locally by each server, without interaction. Hence, our claim amounts to the statement that all operations in \(\tilde{\varPi }_{\textit{ldc}}\) involve only linear algebra in the field along with concatenation/truncation of bitstrings, because all of these operations commute with boolean-xor.

Theorem 2

Let \(m< t< r< N < M \) be parameters of a Reed-Muller locally decodable code such that \(N \) and \(M \) are powers of 2. Then, protocol \(\tilde{\varPi }_{\textit{ldc}}\) in Fig. 15 is a secure two-party computation of an LDC with local decodability and smoothness. Furthermore, \(\tilde{\varPi }_{\textit{ldc}}\) requires no interaction between parties, and its computation cost is \(O(M \log ^2(M))\) for \(\mathsf {Enc}\) and \(O(r ^2)\) for \(\mathsf {Query}\) and \(\mathsf {Dec}\).

Proof

Our 2PC protocol \(\tilde{\varPi }_{\textit{ldc}}\) contains methods for the servers to securely compute each of the 3 methods of an LDC on boolean secret-shared data. Ergo, the local decodability and smoothness of \(\tilde{\varPi }_{\textit{ldc}}\) follow immediately from the same properties of its non-secure-computation counterpart [6].

There are four types of operations used throughout \(\tilde{\varPi }_{\textit{ldc}}\), and we show below how to compute all of them non-interactively. The first two operations are used in \(\mathsf {Enc}\), and the last two in \(\mathsf {Query}\) and \(\mathsf {Dec}\).

  • Computing the lexicographic maps \(\delta \) and \(\iota \): \(\delta \) is the identity operation on bitstrings, and thanks to the specific basis we chose for \(H\), computing \(\iota (i)\) merely involves partitioning the bits of \(i \in [N ]\) into \(m \) strings of length \(h \), padding with 0s in the \(\ell - h \) leftmost bits (see the sketch after this list). These string operations can be performed independently in \(O(N )\) time on each boolean secret share of i.

  • Interpolation and evaluation of multivariate polynomial \(\psi \): this task is challenging; we show in Sect. 4.3 a non-interactive secure 2-party protocol that performs these operations across all of \(\mathbb {F} ^m \) in time \(O(M \log ^2(M))\).

  • Random sampling of multivariate polynomial \(\phi \): the parties already hold shares of the constant term \(\phi _0 = \phi (0)\), and they can randomly choose all other \(t \) coefficients in \(O(t)\) time.

  • Evaluation of \(\phi \) at \(r \) points and interpolation of \(\tilde{\phi }\) from \(r \) points: since the evaluation points \((x_i)_{i \in [r ]}\) are publicly known, the coefficients for polynomial evaluation and Lagrange interpolation can also be publicly (pre-)computed. Ergo, evaluating or interpolating a polynomial of degree \(\le r \) only involves linear algebra and takes \(O(r ^2)\) time.
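As promised above, a sketch of the lexicographic map \(\iota \) on plaintext indices; the same bit manipulations apply unchanged to each boolean share.

```python
def iota(i, m, h):
    """Sketch of the lexicographic map [N] -> H^m: split the bits of i into m
    chunks of h bits; each chunk is a field element whose ell - h leading bits
    are implicitly zero."""
    mask = (1 << h) - 1
    return [(i >> (j * h)) & mask for j in reversed(range(m))]

assert iota(0b101011110, m=3, h=3) == [0b101, 0b011, 0b110]
```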

Fig. 16. \(\tilde{\varPi }_{\textit{store}} \) protocol, based on secure 2PC of the SK-DEPIR scheme of Canetti et al., given any integers \(m< t< r< u< N < M \) satisfying the HPN assumption.

Constructing \(\varvec{\tilde{\varPi }_{\textit{store}}}\) as a Secure Computation of SK-DEPIR. Next, we construct a new protocol \(\tilde{\varPi }_{\textit{store}}\) that also instantiates \(\mathcal{F}_{ store }\). It is a secure two-party computation of the client-server SK-DEPIR protocol of Canetti et al. [6] in which the two parties jointly emulate the server. In Fig. 16, we show simultaneously a secure computation of the SK-DEPIR protocol and how its methods (along with \(\tilde{\varPi }_{\textit{ldc}} \), \(\varPi _{\textit{permute}}\), and \(\varPi _{\textit{store}}\)) combine to instantiate a new read-only storage protocol \(\tilde{\varPi }_{\textit{store}} \).

At a high level, the protocol \(\tilde{\varPi }_{\textit{store}}\) operates as follows. During initialization, the parties collectively construct the LDC encoding of the database, permute it \(u \) times, and store the concatenation of these \(u \) encoded databases \(\mathsf {E} ^0, \mathsf {E} ^1, \ldots , \mathsf {E} ^{u-1}\) in an instance of \(\varPi _{\textit{store}}\); the address corresponding to each element \(\mathsf {E} ^i[j]\) is the concatenation of the instance number i and the location j within this instance. During a read operation, the parties look up \(\varPi _{\textit{store}} \) at one location within each permuted database \(\mathsf {E} ^i\); \(r \) of these lookup operations retrieve data that can collectively be used to decode the desired value, and the remaining \(u- r \) lookups are “decoy” lookups that provide security under the HPN assumption.
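The lookup pattern of a single read can be sketched as follows (addresses only, in the clear; in the protocol both the set \(T \) and the positions are secret-shared).

```python
import secrets

def read_lookup_addresses(query_positions, u, M):
    """Sketch of the lookup pattern of one read. query_positions maps each 'real'
    instance number (the secret set T, |T| = r) to the position output by the LDC
    Query for that instance; every other instance receives a random decoy position."""
    lookups = []
    for i in range(u):                                      # one lookup per permuted encoding E^i
        j = query_positions.get(i, secrets.randbelow(M))    # real position, or a decoy
        lookups.append((i, j))                              # store address = (instance, location)
    return lookups
```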

Lemma 2 shows two settings of parameters that satisfy the HPN assumption. Using these parameters, we show that \(\tilde{\varPi }_{\textit{store}}\) is an efficient and secure read-only data store.

Theorem 3

Under the HPN assumption, the protocol \(\tilde{\varPi }_{\textit{store}}\) in Fig. 16 securely implements functionality \(\mathcal{F}_{ store }\) with constant rounds of communication. Furthermore, given any constant \(\epsilon > 0\), there exist parameters \(m \), \(t \), \(u \), and \(M \) such that the computation and communication cost of \(\tilde{\varPi }_{\textit{store}} \) is:

  • \(O_{\lambda } (N ^{1 + \epsilon })\) for \(\mathsf {Init}\) and \(O_{\lambda } (N ^\epsilon )\) for \(\textsf {read} \), or

  • \(O_{\lambda } (\mathrm{poly}(N ))\) for \(\mathsf {Init}\) and \(O_{\lambda } (\log (N ))\) for \(\textsf {read} \).

Proof

Our 2PC protocol \(\tilde{\varPi }_{\textit{store}}\) computes all methods of the Canetti et al. SK-DEPIR protocol over boolean secret-shared data. In particular, there exist constant-round secure computation protocols for all set operations in \(\tilde{\varPi }_{\textit{store}}\).

  • Within \(\mathsf {KeyGen}\): to choose a subset \(T \subset [u ]\), form a set of \(r \) 1s and \((u-r)\) 0s, then permute this set using \(\varPi _{\textit{permute}}\). The result is a secret-shared indicator vector \(\langle \varvec{T} \rangle \) of length \(u \) indicating which elements are in \(T \).

  • Within \(\mathsf {Query}\): form the set \(\tilde{Y}\) by oversampling. Run the LDC \(\mathsf {Query}\) operation on \(u \) values rather than \(r \) values (using an LDC protocol with \(u \) constants \(x_i\)) to compute \(Y = (\langle \varvec{y}_i \rangle )_{i \in [u ]}\), and let \(\bar{Y} = (\langle \varvec{\bar{y}_i} \rangle )_{i \in [u ]}\) be a secret-shared set of \(u \) random values. Compute \(\tilde{Y}\) by multiplexing: in parallel, set each element \(\langle \varvec{\tilde{y}_i} \rangle = \langle \varvec{T}[i] \wedge \varvec{y_i} \oplus (\varvec{T}[i] \oplus 1) \wedge \varvec{\bar{y}_i} \rangle \). (Since all values are boolean secret-shared, the bitwise-AND should be performed in 1 round between \(\varvec{T}[i]\) and each bit of \(\varvec{y_i}\) in turn, and similarly for the second term.)

  • To truncate the list \(\langle L \rangle |_{T}\) within \(\mathsf {Dec}\): first form a secret sharing of the index vector \(\langle \varvec{I} \rangle \) that equals 0 at decoy values and where \(\varvec{I} |_{T} = \{ 1, 2, \ldots , r \} \) at real values by bit composing \(\langle \varvec{T} \rangle \) to an additive secret sharing \([[ \varvec{T} ]] \) [9, 29], computing \([[ \varvec{I} _i ]] = [[ \varvec{T}_i \cdot \sum _{j = 0}^{i} \varvec{T}_j ]] \;\forall i \in [u ]\), and bit decomposing \([[ \varvec{I} ]] \) into a boolean secret sharing \(\langle \varvec{I} \rangle \). Then, concatenate componentwise the elements of \(\langle \varvec{I} \rangle \) and \(\langle L \rangle \), permute this set using \(\varPi _{\textit{permute}}\), open all shares of indices \(\varvec{I} \) in parallel, and locally sort the values of \(\langle L \rangle \) using the indices.

The computational cost of \(\mathsf {Query}\) is \(O(u ^2)\) due to oversampling, and the cost of the set truncation within \(\mathsf {Dec}\) is \(O(u \log (u))\) as shown by Damgard et al. [9].

As a secure two-party computation of an existing SK-DEPIR scheme, \(\tilde{\varPi }_{\textit{store}}\) inherits the correctness property from Definition 2, which states that the \(\textsf {read} \) operation always returns the correct decoded database entry. Additionally, the use of \(\varPi _{\textit{permute}}\) within the protocol provides the random permutation \(\pi \) as required for use of the HPN assumption, so we also inherit the indistinguishability-style security property from Definition 2. Using these properties, it is straightforward to prove that \(\tilde{\varPi }_{\textit{store}}\) instantiates \(\mathcal{F}_{ store }\) using a similar sequence of hybrids as in the original proof; we omit the details for brevity.

The claims about computational costs follow from Lemma 2 plus the following two observations. First, the cost of \(\mathsf {Init}\) is dominated by the cost of the SK-DEPIR \(\mathsf {Process}\) method, since the \(O(M)\) cost of \(\varPi _{\textit{permute}}\) and the \(O(u \log (u))\) cost of the oblivious search in \(\mathsf {KeyGen}\) are smaller than the parameters in Lemma 2. Second, the cost of \(\textsf {read} \) is dominated by the cost of the LDC \(\mathsf {Query}\) and \(\mathsf {Dec}\) since the call to \(\varPi _{\textit{store}}\) within \(\mathsf {Resp}\) costs \(O(\log (u \cdot M))\) as per Table 2, which is \(O_{\lambda } (\log (N ))\) since \(u < N \) and \(M = O_{\lambda } (N )\).

4.2 The New DORAM Construction \(\tilde{\varPi }_{\textit{DORAM}}\)

In this section, we show how to construct the new DORAM construction \(\tilde{\varPi }_{\textit{DORAM}}\) using this new instantiation \(\tilde{\varPi }_{\textit{store}}\) of the \(\mathcal{F}_{ store }\) functionality, which only needs to be shuffled and reconstructed after a specified bound \(t \) of write operations have been performed, irrespective of the number of read operations.

The updated \(\tilde{\varPi }_{\textit{DORAM}} \) protocol is shown in the full version. The protocol now initializes two versions of the store: one that keeps track of which elements have been written and leaks access patterns, and the other, \(\tilde{\varPi }_{\textit{store}} \), which supports unlimited reads and does not leak access patterns. The first store is critical for maintaining the invariant used for reshuffling: in order to know which written elements are found in the stash, we use this store to keep track of the items ‘written’ into the store. With the distinction between reads and writes no longer hidden, \(\tilde{\varPi }_{\textit{DORAM}} \) only increments the epoch counter when a write is performed; reads do not count towards the contents of the stash.

The most significant change is within the access protocol, shown in the full version [20]. It now differentiates between \(\mathsf {read}\) and \(\mathsf {write}\) operations. For a \(\textsf {read} \) operation, it calls \(\tilde{\varPi }_{\textit{store}} \) and does not write anything back to the stash. The write operation remains unchanged from the original protocol.

Reshuffling is shown in the full version [20]. When it comes time to reshuffle after \(t\) writes, the protocol is similar except for one key difference. Though we now maintain two different stores, we perform the concatenation with the elements in the stash only on the original store, not the augmented SK-DEPIR store. The latter does not keep track of the elements read (the indices \(\mathsf {i}_\mathsf {M}\) returned by \(\tilde{\varPi }_{\textit{store}} \) are simply random values and do not allow for the recovery of the elements). Instead we have to rely on the unread elements in the original store, namely the elements that were not written to as part of the store. The unread elements are identical to the elements found in the augmented store, so concatenation will result in the correct operation of \(\tilde{\varPi }_{\textit{shuffle}} \). The only other change is to instantiate these two stores, rather than the one store used within the original protocol.

We also consider the complexity of these new schemes in Table 3. Recall that we use \(O_{\lambda } \) to indicate complexities that only depend on \(N \), ignoring any \(\mathrm{poly}(\lambda )\) terms. The main difference between the complexity of our two schemes is the blowup incurred by the new implementation of \(\tilde{\varPi }_{\textit{store}} \): the LDC encoding incurs an \(O_{\lambda } (N ^{1+\epsilon })\) overhead for any choice of \(\epsilon > 0\). This means any protocols that were originally dominated by the computation or bandwidth of \(\tilde{\varPi }_{\textit{store}} \) initialization inherit this new cost. Recall that in the original scheme the dominant cost of \(\varPi _{access}\) was the linear scan of the stash. In this setting, with the disparity between reads and writes, we consider a smaller stash size. If we assume one write for every \(N \) reads and our smaller stash size, \(\tilde{\varPi }_{\textit{DORAM}} \) accesses amortize to \(O_{\lambda } (N ^{\epsilon })\).

Table 3. An evaluation of each protocol’s server computation, bandwidth, and rounds of communication, with stash size \(t = N ^\epsilon \). Note that we assume \(N \) \(\textsf {read} \)s for every 1 \(\textsf {write} \), and that \(\tilde{\varPi }_{\textit{DORAM}} \) has been amortized where appropriate.

4.3 2PC for Multivariate FFT over Binary Fields

The one remaining task in the specification of protocol \(\tilde{\varPi }_{\textit{ldc}}\) is to construct a secure computation of multivariate polynomial evaluation and interpolation. One effective, but slow, technique is to use Lagrange interpolation. For a univariate polynomial \(p = \sum _i p _i \cdot x^i\), we can transform secret shares of the vector \(\varvec{p} = ( p _i )_i \) of coefficients into shares of the vector \(\varvec{\hat{p}} = ( p (i) )_i \) of its evaluations at all points (or vice-versa) via multiplication by the Vandermonde matrix, \(\varvec{\hat{p}} = A \cdot \varvec{p} \), or its inverse, \(\varvec{p} = A^{-1} \cdot \varvec{\hat{p}} \); shares of this matrix-vector product can be computed locally by each party since the Vandermonde matrix \(A\) is public. However, the computational cost of the matrix-vector multiplication is \(\mathrm {\Omega }(N ^2)\).
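Concretely (a sketch of the local computation, writing \(\alpha _0, \ldots , \alpha _{N -1}\) for the evaluation points and \(\langle \cdot \rangle _b\) for party \(b\)'s additive share):

\[ A = \begin{pmatrix} 1 & \alpha _0 & \alpha _0^2 & \cdots & \alpha _0^{N -1} \\ 1 & \alpha _1 & \alpha _1^2 & \cdots & \alpha _1^{N -1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \alpha _{N -1} & \alpha _{N -1}^2 & \cdots & \alpha _{N -1}^{N -1} \end{pmatrix}, \qquad \langle \varvec{\hat{p}} \rangle _b = A \cdot \langle \varvec{p} \rangle _b \quad \text{for } b \in \{0, 1\}, \]

so that \(\langle \varvec{\hat{p}} \rangle _0 + \langle \varvec{\hat{p}} \rangle _1 = A \cdot (\langle \varvec{p} \rangle _0 + \langle \varvec{p} \rangle _1) = A \cdot \varvec{p} \) by linearity; no interaction is required, but each party performs \(\mathrm {\Omega }(N ^2)\) field operations.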

Fig. 17. \(\tilde{\varPi }_{\textit{1FFT}}\) protocol for secure 2-party computation of the Additive Fast Fourier Transform of a univariate polynomial \(p\) in a binary field \(\mathbb {F} \).

The Fast Fourier Transform (FFT) [8] is a well-known algorithm for computing polynomial evaluation in quasilinear time, and the Inverse FFT similarly computes polynomial interpolation efficiently. The fastest known FFT for binary fields is the additive FFT algorithm of Gao and Mateer [12]. As its name suggests, this algorithm involves only linear operations. In this section, we design a secure computation protocol \(\tilde{\varPi }_{\textit{mFFT}}\) of FFT for multivariate polynomials over binary fields that can be performed locally (i.e., without interaction) with quasilinear computational cost. While this contribution may be of independent interest, in this work it completes the task from Sect. 4.1 of constructing a non-interactive \(\tilde{\varPi }_{\textit{ldc}}\) protocol with quasilinear (rather than quadratic) computation cost. For example, in the \(\mathsf {Enc}\) protocol within \(\tilde{\varPi }_{\textit{ldc}}\), it allows for securely computing the coefficients of the polynomial \(\psi : H ^m \rightarrow \mathbb {F} \) in time \(O(N \log ^2 N )\) and securely evaluating \(\psi \) at all locations in \(\mathbb {F} ^m \) in time \(O(M \log ^2 M)\).

We describe this protocol in two steps. First, we show how to securely evaluate the FFT for univariate polynomials (using a secure computation of Taylor series expansion as a building block). Second, we bootstrap this into a secure evaluation of the FFT for multivariate polynomials. For brevity, we show these FFT protocols only in the forward (polynomial evaluation) direction. It is straightforward to verify that the same techniques yield a secure computation protocol for the inverse FFT (i.e., polynomial interpolation) in quasilinear time.

2PC Protocol \(\varvec{\tilde{\varPi }_{\textit{1FFT}}}\) for univariate FFT. In this section, we present a secure two-party computation protocol \(\tilde{\varPi }_{\textit{1FFT}}\). Let \(H \subseteq \mathbb {F} \) be a subspace (possibly the entire field) of size \(\vert H \vert = 2^h \) defined by a basis \(\mathcal {H}\), and let \(p = \sum _{i=0}^{2^h-1} p_i \cdot x^i\) be a univariate polynomial of degree less than \(2^h \). This protocol begins with shares of the \(2^h \) coefficients \(\langle \varvec{p} \rangle = ( \langle p_i \rangle )_{i \in [2^h ]} \) of the polynomial, and it returns shares \(\langle \varvec{\hat{p}} \rangle = ( \langle p(i) \rangle )_{i \in [2^h ]} \) of its evaluation at all \(2^h \) points in \(H\).

Fig. 18. Protocol for Taylor expansion of a polynomial \(q (x) \in \mathbb {F} [x]\) at \(x^2 - x\).

The protocol \(\tilde{\varPi }_{\textit{1FFT}}\) is shown in Fig. 17, and it uses the Taylor series expansion algorithm in Fig. 18 as a building block. Each step of these algorithms only involves addition and scalar multiplication of secret-shared values, so the secure computation \(\tilde{\varPi }_{\textit{1FFT}}\) can be performed locally. These algorithms are precisely the secret-shared versions of their counterparts in Gao and Mateer [12].
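To see why locality suffices here: additive secret shares commute with any public linear map. As a small illustrative identity (with a public constant \(c \in \mathbb {F} \) and shares satisfying \(x = \langle x \rangle _0 + \langle x \rangle _1\) and \(y = \langle y \rangle _0 + \langle y \rangle _1\)):

\[ c \cdot x + y = \bigl ( c \cdot \langle x \rangle _0 + \langle y \rangle _0 \bigr ) + \bigl ( c \cdot \langle x \rangle _1 + \langle y \rangle _1 \bigr ), \]

so each party applies the same additions and public scalar multiplications to its own shares, with no communication.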

We provide a high-level intuition of \(\tilde{\varPi }_{\textit{1FFT}}\) when considering the basis \(\mathcal {H} = \{z^{h- 1}, z^{h- 2}, \ldots , z, 1\}\), in which case \(q = p \); full details are given in [12]. The core idea of the Fast Fourier Transform is to reduce the evaluation of one polynomial \(q \) into the evaluation of two polynomials \(f \) and \(g \) of half the degree, plus quasilinear work to “stitch” the results together into an evaluation of \(q \). Gao and Mateer [12] show how this can be done over binary fields, based on these observations:

  • The Taylor expansion \(q (x) = \sum _{i = 0}^{2^{h-1}-1} (f _i + g _i x) \cdot (x^2 - x)^i\) leads to the equation \(q (x) = f (x^2 - x) + x \cdot g (x^2 - x)\), where \(f (z) \triangleq \sum _{i=0}^{2^{h-1}-1} f _i \cdot z^i\) and \(g (z) \triangleq \sum _{i=0}^{2^{h-1}-1} g _i \cdot z^i\) are polynomials of degree at most \(2^{h-1}-1\).

  • The function \(x \mapsto x^2-x\) is 2-to-1, and specifically it maps the \(2^h \)-sized space spanned by \(\mathcal {H} \) onto a smaller space of size \(2^{h-1}\) spanned by a basis \(\tilde{\mathcal {H}}\).

Ergo, in order to evaluate the polynomial \(q \) at all points spanned by \(\mathcal {H} \), it suffices to evaluate polynomials \(f \) and \(g \) at all points spanned by the smaller basis \(\tilde{\mathcal {H}}\) and combine the results using the Taylor expansion \(q (x) = f (x^2 - x) + x \cdot g (x^2 - x)\).
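To make the identity concrete, the following is a small numerical sketch (not part of the protocol, operating on plaintext values rather than shares; \(\textsf{GF}(2^8)\) with the AES reduction polynomial is used purely as an illustrative stand-in for \(\mathbb {F} \), and the helper names are our own) that checks \(q (x) = f (x^2 - x) + x \cdot g (x^2 - x)\), recalling that in characteristic 2 we have \(x^2 - x = x^2 + x\):

```python
# Illustrative sketch: verify q(a) = f(a^2 + a) + a * g(a^2 + a) over GF(2^8).
# This operates on plaintext values (no secret sharing) and uses a naive,
# quadratic-time Taylor expansion; it is not the protocol of Fig. 17/18.
import random

MOD = 0x11B  # reduction polynomial x^8 + x^4 + x^3 + x + 1 for GF(2^8)

def gf_mul(a, b):
    """Multiply two GF(2^8) elements (carry-less product, reduced mod MOD)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= MOD
    return r

def poly_eval(coeffs, x):
    """Evaluate sum_i coeffs[i] * x^i via Horner's rule (addition is XOR)."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc

def taylor_expand(q):
    """Taylor expansion of q(x) at x^2 + x by repeated polynomial division.

    Returns coefficient lists f, g with q(x) = sum_i (f[i] + g[i]*x) * (x^2 + x)^i.
    """
    f, g = [], []
    q = list(q)
    while len(q) > 2:
        rem = list(q)
        quot = [0] * (len(q) - 2)
        for i in range(len(q) - 1, 1, -1):   # long division by the monic x^2 + x
            c = rem[i]
            if c:
                quot[i - 2] ^= c             # quotient gains c * x^(i-2)
                rem[i] = 0                   # subtract c * x^(i-2) * (x^2 + x)
                rem[i - 1] ^= c
        f.append(rem[0])
        g.append(rem[1])
        q = quot
    f.append(q[0] if q else 0)
    g.append(q[1] if len(q) > 1 else 0)
    return f, g

random.seed(1)
q = [random.randrange(256) for _ in range(16)]   # random polynomial of degree < 16
f, g = taylor_expand(q)
for _ in range(5):
    a = random.randrange(256)
    s = gf_mul(a, a) ^ a                         # s = a^2 + a, a 2-to-1 map
    assert poly_eval(q, a) == poly_eval(f, s) ^ gf_mul(a, poly_eval(g, s))
print("identity q(a) = f(a^2 + a) + a * g(a^2 + a) verified at 5 random points")
```

The repeated-division Taylor expansion above is quadratic and serves only to make the identity checkable; Gao and Mateer [12] compute the same expansion in quasilinear time, and Fig. 18 is the secret-shared analogue of their procedure.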

We provide a secure 2-party computation of Gao and Mateer’s method for computing the Taylor expansion of \(q \) in Fig. 18, and a 2PC of polynomial evaluation in Fig. 17. The only operations that involve secret-shared data are linear combinations and splitting/joining vectors, all of which can be performed locally. Note that step 2 of Fig. 17 involves more complicated algebra, but it operates only on public (non-secret-shared) values, so it can be performed locally and even pre-computed before the parties receive their input shares.

Fig. 19. \(\tilde{\varPi }_{\textit{mFFT}}\) protocol for secure 2-party evaluation of a multivariate polynomial \(p (x_0, \ldots , x_{m- 1})\) at all points in a subspace \(H ^m \subseteq \mathbb {F} ^m \).

2PC Protocol \(\varvec{\tilde{\varPi }_{\textit{mFFT}}}\) for Multivariate FFT. Recall that the locally decodable code used in \(\tilde{\varPi }_{\textit{ldc}}\) is based on Reed-Muller codes, and as a result it uses multivariate polynomials. Here, we show how to bootstrap from an FFT for univariate polynomials into one for multivariate polynomials. The full protocol is shown in Fig. 19, and it is based on a technique used by Kedlaya and Umans [24].

Protocol \(\tilde{\varPi }_{\textit{mFFT}}\) operates via recursion over many evaluations of univariate polynomials. Given an \(m \)-variate polynomial \(p \) of total degree \(<2^h- 1\) for which the parties have shares of all coefficients, we rewrite the polynomial by conditioning on the power of the first variable: \(p (x_0, \ldots , x_{m-1}) = \sum _{i = 0}^{2^h- 1} x_0^i \cdot p _i(x_1, x_2, \ldots , x_{m-1})\). We evaluate the \((m-1)\)-variate polynomials \(p _i\) recursively at all points of \(H ^{m-1}\), and then, for each such point, use the resulting values as the coefficients of a univariate polynomial in \(x_0\), which we evaluate at all of \(H \) with the univariate FFT. Since each univariate polynomial evaluation takes time quasilinear in \(\vert H \vert \), a simple recurrence relation (sketched below) shows that the entire evaluation is quasilinear in \(\vert H \vert ^m = N \). This completes the construction, and it is the necessary building block to complete the proof of Theorem 2 and achieve quasilinear server computation for our LDC protocol.
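For concreteness, one way to write this recurrence (a sketch only, writing \(n = \vert H \vert \) and treating the cost of one univariate FFT as \(O(n \log ^2 n)\)) is

\[ T(m) \le n \cdot T(m-1) + n^{m-1} \cdot O(n \log ^2 n), \qquad T(1) = O(n \log ^2 n), \]

which unrolls to \(T(m) = O(m \cdot n^m \log ^2 n) = O(N \log N \log n)\), i.e., quasilinear in \(N = n^m\).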