
1 Introduction

This paper considers the problem of hiding access patterns when reading from and writing to an untrusted memory or storage server. This is a fundamental problem both in the context of software protection and in that of secure outsourcing to a third-party storage provider.

The basic cryptographic method to hide access patterns is Oblivious RAM (ORAM) [8, 9]. It compiles logical access sequences (from a client) into garbled ones (to a storage space, or server) so that a curious observer seeing only the latter (as well as the server contents) cannot infer anything other than the overall number of logical accesses—we say that such garbled access sequences are oblivious. Since its proposal, ORAM and its applications have been extensively studied (cf., e.g., [1, 3–7, 9, 11–21, 24–28, 30]). The state-of-the-art constructions [16, 26] have a \(\tilde{O}(\log ^2 N)\) computation (and communication) overhead (per logical access),Footnote 1 where N is the size of the storage, i.e., the number of data blocks (of a certain bit size) it can store.

Parallel Oblivious Accesses. Existing ORAM schemes only support a single client, and in particular do not deal with parallel accesses from multiple clients. However, enabling such parallelism is important, e.g., to achieve scalable cloud storage services for multiple users, or to secure multi-processor architectures. To overcome this barrier, a few systems-oriented works [15, 24, 30] suggested to either use a trusted proxy shared by multiple clients to act as the “sole client” of ORAM, or to adapt known ORAM schemes (such as [7, 9, 29]) to support a limited, \(O(\log N)\), number of parallel accesses.

Recently, Boyle, Chung, and Pass (BCP) [2] proposed the notion of Oblivious Parallel RAM (OPRAM), which compiles synchronous parallel logical access sequences by m clients into parallel garbled sequences and inter-client messages, which together still reveal no information other than the total number of logical accesses. They also provided the first – and so far, the only – OPRAM scheme. Their construction is simple and elegant, but has a server-client communication overhead of \(\omega (\log ^3 N)\)—a factor of \(\tilde{\varOmega }(\log N)\) higher than state-of-the-art ORAM schemes [16, 26]. Moreover, their approach does not seem to extend directly to leverage the techniques behind existing communication-efficient ORAM schemes.

Hence, the natural question that arises is: “Can we design an OPRAM scheme with the same per-client efficiency as the state-of-the-art ORAM schemes?”

Our Contributions, in a Nutshell. Our first contribution answers this question affirmatively. In particular, we prove:

Theorem 1 (Informal): There is an OPRAM scheme with \(O(\log ^2 N)\) (amortized) server-client communication overhead, and constant storage overhead.

Going beyond, an even more fundamental question concerns the basic relation between ORAM and OPRAM. We show that the two problems are related at a far more generic level:

Theorem 2 (Informal): There is a generic transformation that turns any ORAM scheme into an OPRAM scheme, with additional \(O(\log N)\) (amortized) server-client communication overhead with respect to the original ORAM scheme.

While the above results are stated for the amortized case, we note that in the worst case the above complexity statements hold with O replaced by \(\omega \). Moreover, our OPRAM schemes all require client-to-client communication. Their inter-client communication is \(\omega (\log N)\log m (\log m + \log N) B\) bits. We note that this, too, is an improvement by a factor of \(O(\log N)\) over BCP.

We stress that our approach is substantially different from that of BCP: One key idea is the use of partitioning, i.e., the fact that each client is responsible for a designated portion of the server storage. This eliminates much of the coordination necessary in BCP. Next, we move to explaining the high-level ideas behind our constructions in greater detail.

1.1 Subtree-OPRAM

We provide an overview of our scheme Subtree-OPRAM. Our construction of an m-client OPRAM scheme can be seen as consisting of two steps.

  • (1) First, we construct an ORAM scheme, called Subtree-ORAM, that enables a single client to batch-process m logical accesses at a time in parallel. Our Subtree-ORAM scheme is a generalization of Path-ORAM [26] to the setting with large client memory and parallel processing. We believe that this generalization is of independent interest.

  • (2) In a second step, we exploit the batch-processing structure of Subtree-ORAM to adapt it to the multiple-client setting, and derive our Subtree-OPRAM scheme by distributing its computation across multiple clients.

In the following, we explain all of this in more detail.

Review of Path-ORAM. Let us first give an overview of the tree-based ORAM approach by Shi et al. [23]. In particular, we review Path-ORAM [26], as it will serve as our starting point. (A more detailed review is given in Appendix B.)

To implement a storage space for N data blocks, basic (i.e., non-recursive) Path-ORAM organizes the storage space (virtually) as a complete binary tree with depth \(O(\log N)\), where each node is a “bucket” that contains a fixed number \(Z= O(1)\) of encrypted blocks (some of which may be dummies). To hide access patterns, each data block is assigned to a random path \(\ell \) (from a leaf \(\ell \) to the root, and we use \(\ell \) to identify both the leaf and the associated path interchangeably) and stored in some bucket on path \(\ell \); after each access, the assignment is “refreshed” to a new random path \(\ell '\). The client keeps track of the current path assigned to each block using a position map. The client also keeps an additional (small) memory for overflowing blocks, called the stash. For each logical access to a certain block with address \(\mathsf {a}\in [N]\), Path-ORAM takes the two following steps:

  • (1) Fetching a path. Retrieve the path \(\ell \) currently associated with block \(\mathsf {a}\) in the position map, and find block \(\mathsf {a}\) on the path or in the local stash. Then, assign the block \(\mathsf {a}\) to a new random path \(\ell '\) and update the position map accordingly.

  • (2) Flushing along a path. Iterate over every block \(\mathsf {a}'\) in the fetched path \(\ell \) and in the stash (this includes the block \(\mathsf {a}\) we just retrieved and possibly updated, and which was assigned to the new path \(\ell '\)), and re-insert each block \(\mathsf {a}'\) into the lowest possible bucket on \(\ell \) that is also on the path assigned to \(\mathsf {a}'\) according to the position map. If no suitable place is found (as each bucket can only contain at most Z blocks), the block is placed into the stash. The contents of the path are re-encrypted when being written back to the server (including dummy blocks).
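The two steps above can be made concrete with a minimal, non-recursive Path-ORAM sketch. All names here, the dictionary-based tree layout, and the in-memory position map are our own illustrative choices, not the authors' implementation:

```python
import random

class PathORAM:
    """Minimal non-recursive Path-ORAM sketch (bucket size Z, depth D).

    The server tree is modeled as a dict from (level, index) to a bucket
    (a list of at most Z (addr, value) pairs); level 0 is the root.
    """

    def __init__(self, N, Z=4):
        self.N, self.Z = N, Z
        self.D = N.bit_length() - 1        # assume N is a power of two
        self.tree = {}                      # (level, index) -> bucket
        self.pos_map = {}                   # addr -> leaf in [0, N)
        self.stash = {}                     # addr -> value (overflow blocks)

    def _nodes_on_path(self, leaf):
        # Nodes from the root (level 0) down to the leaf (level D).
        return [(lvl, leaf >> (self.D - lvl)) for lvl in range(self.D + 1)]

    def access(self, op, addr, value=None):
        leaf = self.pos_map.setdefault(addr, random.randrange(self.N))
        # Step 1: fetch the path, moving all its blocks into the stash.
        for node in self._nodes_on_path(leaf):
            for a, v in self.tree.pop(node, []):
                self.stash[a] = v
        old = self.stash.get(addr)
        if op == "write":
            self.stash[addr] = value
        # Re-assign the accessed block to a fresh random path.
        self.pos_map[addr] = random.randrange(self.N)
        # Step 2: flush, greedily placing stash blocks as low as possible
        # on the fetched path while staying on each block's assigned path.
        for lvl, idx in reversed(self._nodes_on_path(leaf)):  # leaf upward
            bucket = []
            for a in list(self.stash):
                on_path = (self.pos_map[a] >> (self.D - lvl)) == idx
                if on_path and len(bucket) < self.Z:
                    bucket.append((a, self.stash.pop(a)))
            self.tree[(lvl, idx)] = bucket
        return old
```

Each access reads and rewrites one path of \(O(\log N)\) buckets; the recursive position map multiplies this by another \(O(\log N)\) factor, giving the \(O(\log^2 N)\) overhead mentioned above.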

The analysis of Path-ORAM [26] shows that the stash size is bounded by \(\omega (\log N)\) with probability roughly \({poly}(\lambda )2^{-\omega (\log N)}\). To avoid keeping a large position map, Path-ORAM recursively stores the position map at the server. The final scheme has a recursion depth of \(O(\log N)\)—each logical access is translated to \(O(\log N)\) actual accesses, each consisting of retrieving a path. Overall, the communication overhead is \(O(\log ^2 N)\). Also, the overall storage complexity at the server can be kept to O(N) despite the recursion.

Subtree-ORAM. As our first contribution, we generalize Path-ORAM to process \(m \ge 1\) logical accesses at a time. As the recursion step in Path-ORAM is rather generic, we focus on the non-recursive scheme, ignoring the costs of storing the position map.

The natural approach to achieve this is to retrieve a subtree of m paths, i.e., for every m logical accesses to blocks \(\mathsf {a}_1, \ldots , \mathsf {a}_m\), we can do the following:

  • (1) Fetching subtree. Retrieve the subtree \(\mathrm {ST}\) composed of the paths \(\ell _1, \ldots , \ell _m\) assigned to the m blocks and find the blocks of interest in the subtree or in the stash, and possibly update their values.

  • (2) Path-by-path flushing. Execute the flushing procedure from Path-ORAM on the m paths in \(\mathrm {ST}\) sequentially as in Path-ORAM, with each \(\mathsf {a}_i\) assigned to a new random path \(\ell '_i\).

Unfortunately, there are two problems with this approach. First, if \(\mathsf {a}_1, \ldots , \mathsf {a}_m\) are not all distinct, the accesses are not oblivious, as the same path would be retrieved multiple times. To avoid this, the final Subtree-ORAM scheme performs some pre-processing: For accesses to the same block, replace all but the first one with \(\bot \) in the logical sequence to obtain \(\mathsf {a}'_1, \ldots , \mathsf {a}'_m\), and for each repetition \(\mathsf {a}'_i = \bot \), assign a random path to be retrieved from the server — this is called a fake read.Footnote 2
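The pre-processing can be sketched as follows (a plain, non-oblivious reference implementation; the function name and data layout are ours):

```python
import random

def preprocess(addrs, pos_map, N):
    """For each requested address, keep only the first occurrence as a real
    read of its assigned path; every repetition becomes a fake read of a
    freshly sampled random path. Returns a list of (addr_or_None, path)."""
    seen = set()
    out = []
    for a in addrs:
        if a in seen:
            out.append((None, random.randrange(N)))   # fake read
        else:
            seen.add(a)
            out.append((a, pos_map[a]))               # real read
    return out
```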

The second drawback is that repeating the flushing procedure of Path-ORAM m times in Step 2 is inherently sequential. To use Subtree-ORAM within Subtree-OPRAM below, we instead target a parallelizable flushing procedure. To this end, we introduce the following new flushing procedure, which we refer to as subtree flushing:

  • (2) Subtree flushing: Iterate over every block in \(\mathrm {ST}\) and in the stash and place each block into the lowest node in the entire subtree \(\mathrm {ST}\) that is still on its assigned path, and not yet full. The order in which blocks are processed can be arbitrary, and the process can be parallelized (subject to maintaining the size constraint of each node).

Security and correctness of Subtree-ORAM follow similar arguments as Path-ORAM. Furthermore, we bound the stash size of Subtree-ORAM by generalizing aspects of the analysis of Path-ORAM – we believe this to be of independent interest.

Subtree-OPRAM. Our end goal is to design an interactive protocol that enables m clients to access (and possibly alter) blocks \(\mathsf {a}_1, \ldots , \mathsf {a}_m\) in parallel, where client \(C_i\) is requesting in particular block \(\mathsf {a}_i\); both the access patterns to the server, as well as inter-client communication, must be oblivious.

We can think of our Subtree-OPRAM protocol as having the m clients collectively emulate the single Subtree-ORAM client. To this end, we use inter-client oblivious communication protocols based on tools developed in [2] to let clients interact with each other. Here, we focus our description on how to “distribute” Step 1 and Step 2 of Subtree-ORAM for the special case where the requested blocks \(\mathsf {a}_1, \ldots , \mathsf {a}_m\) are distinct. (Handling colliding requests in an oblivious way will require extra work.) For simplicity, we assume that all clients have access to the position map and all messages are implicitly encrypted (with a key shared by all clients). In particular, everything is re-encrypted before being written to the server.

Assume for simplicity that \(m = 2^l\). We can think of the server storage in Subtree-OPRAM in terms of a tree of buckets, as in Path-ORAM and Subtree-ORAM. However, we remove the top l levels, effectively turning the tree into a forest of m trees \(T_1, \ldots , T_m\); client \(C_i\) manages all read/write from/to \(T_i\), and all blocks assigned to (a path in) \(T_i\) that do not fit in one of the buckets on the server remain in a local stash managed locally by \(C_i\). More precisely:

  • (1) In parallel, each \(C_i\) finds the path \(\ell _i\) assigned to \(\mathsf {a}_i\) (using the position map) and delegates the job of reading path \(\ell _i\) to the client \(C_j\) responsible for the tree \(T_j\) containing \(\ell _i\), to which it sends a request. Each \(C_j\) retrieves all paths for which it has received a request (again in parallel), which form a subtree \(\mathrm {ST}_j\) of \(T_j\); it then finds the blocks of interest in \(\mathrm {ST}_j\) and its local stash, and sends them back to the respective clients who requested them.

  • (2) Each \(C_i\) assigns \(\mathsf {a}_i\) a new path \(\ell '_i\), and delegates the job of writing back \((B_i, \ell '_i)\) to the client \(C_j\) responsible for the tree \(T_j\) containing \(\ell '_i\). To ensure obliviousness, the clients achieve this by running collectively the oblivious routing protocol of [2], which hides the destination of messages. Next, each \(C_i\) runs the subtree-flushing procedure locally on the retrieved subtree \(\mathrm {ST}_i\) and its own stash, and finally writes the entire subtree \(\mathrm {ST}_i\) back.

We will show that the m clients indeed collectively emulate the execution of the single client of Subtree-ORAM. In particular, parallel flushing on the individual subtrees emulates the effect of a global flushing over the union of these subtrees, except that the top l levels of the tree are kept locally at the clients; also, the union of the stashes of all clients contains exactly the contents of the stash of the Subtree-ORAM client, together with the contents of the top of the tree. This gives a bound on the overall sizes of the stashes.

In expectation, each client reads and writes one path per round, and thus the amortized client-server communication overhead is \(O(\log N)\), and the final recursive Subtree-OPRAM has amortized overhead of \(O(\log ^2 N)\), with overwhelming probability. In fact, we prove that the worst-case overhead is not much higher, and is of the order of \(\omega (\log ^2 N)\), e.g., \(O((\log ^2 N) \cdot \log \log N)\), much smaller than BCP’s \(\omega (\log ^3 N)\). We improve over BCP also in terms of inter-client communication complexity by a factor of \(\log N\).

1.2 The Generic Transformation

Subtree-OPRAM is tailored to achieving the same overhead as Path-ORAM, and not surprisingly, the former heavily relies on the latter. Our second contribution is a generic transformation that converts any ORAM scheme into an OPRAM protocol. When applied to Path-ORAM, the resulting scheme is less efficient than Subtree-OPRAM – still, the main benefit here is generality.

Our approach generalizes ideas from partition-based ORAM [25]. Specifically, we split the server storage into m partitions, each storing (roughly) N / m blocks, and let each of the m clients run a copy of the basic ORAM algorithm (call them \(\mathcal {O}_1, \ldots , \mathcal {O}_m\)). Each client \(C_i\) thus manages the i-th partition independently using \(\mathcal {O}_i\). Every block \(\mathsf {a}\) is randomly assigned to one of the m partitions \(P \in [m]\), and it is re-assigned to a new random partition after each access. The current assignment of blocks to the m partitions is recorded in a partition map, which we assume (for now) to be accessible by all clients. (In the end, it will be shared using recursion techniques.) Then, when the m clients request the m blocks \(\mathsf {a}_1,\ldots , \mathsf {a}_m\) in parallel, the clients simply find the respective partitions \(P_1, \ldots , P_m\) containing these blocks, and let the corresponding clients retrieve the desired blocks and delete them from their partitions (if a block is accessed multiple times, “fake reads” of a random partition are performed). The actual access pattern so far is oblivious since all \(P_i\)’s are random, and the basic ORAM scheme ensures that retrieving blocks from each partition is done obliviously.

However, writing these blocks back to new random partitions without revealing their destinations turns out to be non-trivial, even if we can deliver the blocks obliviously to the clients responsible for the new partitions. Indeed, naively invoking the corresponding ORAM copies to insert would reveal how many blocks are assigned to each partition. To hide this information, in our protocol each client inserts the same number \(\kappa \) of blocks to its partition, and keeps a queue of blocks to be inserted. We use a stochastic analysis to show that for any \(R = \omega (\log \lambda )\), it is sufficient to insert only \(\kappa = 2\) blocks to each partition each time (and in particular, perform fake “insertions” if less than 2 blocks need to be inserted), and at most R “overflowing” blocks ever remain in the queue (except with negligible probability).
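The write-back rule can be illustrated with a small simulation of the queueing process (with \(\kappa = 2\); the code and parameter names are our own sketch, not the paper's stochastic analysis):

```python
import random
from collections import deque

KAPPA = 2  # number of (real or fake) insertions per partition per round

def simulate(m, rounds, seed=0):
    """Simulate the insertion queues: each round, m re-assigned blocks land
    in the m partitions uniformly at random; every partition then performs
    exactly KAPPA insertions, serving queued real blocks first and padding
    with fake insertions. Returns the worst queue backlog observed."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(m)]
    worst = 0
    for _ in range(rounds):
        for block in range(m):
            queues[rng.randrange(m)].append(block)   # random new partition
        for q in queues:
            for _ in range(KAPPA):
                if q:
                    q.popleft()   # real insertion via the partition's ORAM
                # else: a fake insertion would be performed here
        worst = max(worst, max(len(q) for q in queues))
    return worst
```

Since each partition receives one block per round in expectation but serves two, the backlog has negative drift and stays small, which is the intuition behind the \(R = \omega(\log \lambda)\) bound.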

A challenge we have not addressed is how to use an ORAM for a partition of size O(N / m) to store the blocks associated with it in an efficient way, i.e., without using the whole space of [N] addresses. We will solve this by using an appropriate ORAM-based oblivious dictionary data structure.

As the expected number of read and write operations each client performs is 3 (one read and two writes), the non-recursive version has the same (amortized) computation and communication overhead as the underlying ORAM scheme. To obtain the final OPRAM scheme, we apply recursive techniques to outsource the partition map to the server.

Notation. Throughout this paper, we let [n] denote the set \(\left\{ 1,2,..., n \right\} \). We denote by \(\varDelta (X, Y)\) the statistical distance between distributions (or random variables) X and Y, i.e. \(\varDelta (X, Y) = \sum _{x} \left| \Pr [X = x] - \Pr [Y = x] \right| \). Also, we say that a function \(\mu \) is negligible if for every polynomial p there exists a sufficiently large integer \(n_0\), such that \(\mu (n) \le 1/p(n)\) for all \(n > n_0\).
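For concreteness, the statistical distance over a finite support, exactly as defined above (note: a plain sum of absolute differences, without the conventional 1/2 factor), can be computed as:

```python
def stat_dist(px, py):
    """Statistical distance between two finite distributions given as
    dicts mapping outcomes to probabilities, following the definition in
    the text: sum over x of |Pr[X = x] - Pr[Y = x]|."""
    support = set(px) | set(py)
    return sum(abs(px.get(x, 0.0) - py.get(x, 0.0)) for x in support)
```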

2 Oblivious (Parallel) RAM

We start by reviewing the notion of Oblivious RAM and its parallel extensions. We present definitions different from (yet essentially equivalent to) the ones by Goldreich and Ostrovsky [8, 9] and BCP [2], considering clients and servers instead of RAM compilers; we believe this leads to more compact and natural descriptions, and is more in line with the applied ORAM literature.

Basic ORAM Setting. The basic ORAM setting considers two parties, a client and a server: The server \(\mathcal {S}(\overline{M}, \overline{B})\) has a large storage space consisting of \(\overline{M}\) cells, each of size \(\overline{B}\) bits, whereas the client has a much smaller memory. The client can access the storage space at the server using read and write commands, denoted as \(\mathsf{Acc}(\mathsf{read}, \overline{a}, \bot )\) and \(\mathsf{Acc}(\mathsf{write}, \overline{a}, \overline{v})\), where \(\overline{a}\in [\overline{M}]\) and \(\overline{v} \in \{0,1\}^{\overline{B}}\). (We assume that all cells on the server are initialized to some fixed string, i.e., \(0^{\overline{B}}\).) Both operations return the current value stored in cell \(\overline{a}\); in particular, for the latter operation this is the value before the cell is overwritten with \(\overline{v}\).

An oblivious RAM (ORAM) scheme consists of an ORAM client \(\mathcal {O}\) (or simply, an ORAM \(\mathcal {O}\)), which is a stateful interactive PPT machine which, on initial input the security parameter \(\lambda \), block size \(B\), and storage size N, processes logical commands \(\mathsf{Acc}({\text {op}}_i, \mathsf {a}_i, v_i)\), where \({\text {op}}_i \in \{\mathsf{read}, \mathsf{write}\}\), \(\mathsf {a}_i \in [N]\), and \(v_i \in \{0,1\}^{B} \cup \{\bot \}\), by interacting with a server \(\mathcal {S}(\overline{M}, \overline{B})\) (for values \(\overline{M}= \overline{M}(N)\) and \(\overline{B}= \overline{B}(B, \lambda )\) explicitly defined by the scheme), via a sequence of actual (read/write) accesses \(\mathsf{Acc}(\overline{{\text {op}}}_{i, 1}, \overline{a}_{i,1},\overline{v}_{i,1}), \ldots , \mathsf{Acc}(\overline{{\text {op}}}_{i, q_{i}}, \overline{a}_{i, q_i}, \overline{v}_{i, q_i})\); it finally outputs a value \(val_i\) and updates its local state depending on the answers to these accesses.

An ORAM scheme hides the sequence of logical commands from an untrusted (honest-but-curious) server, who observes the actual sequence of accesses. The actual values written to the server can be hidden using semantically secure encryption. Indeed, all known ORAM solutions have each server cell hold the encryption of a block, i.e., in general one has \(\overline{B}= B+ O(\lambda )\). For this reason, we abstract away the usage of encryption by dealing only with access-pattern security, tacitly assuming that in the final scheme all cells are stored encrypted under a semantically secure encryption scheme, and that every write access to the server is a fresh re-encryption of the value. In this case, it makes sense to think of \(\overline{B}= B\), and of an adversary who cannot see the values written to/read from the server.

We defer a definition of security and correctness for single-client ORAM to Appendix A, and here rather focus on generalizing the above to the multi-client setting.

Multi-Client Setting. We now consider the setting of oblivious parallel RAM (or OPRAM for short) with m clients. An m-client OPRAM is a setFootnote 3 of stateful interactive PPT machines \(\mathcal {PO}=\left\{ \mathcal {O}_i \right\} _{i \in [m]}\) which, on initial input the security parameter \(\lambda \), the storage size parameter N, and the block size \(B\), all proceed in rounds, interacting with the server \(\mathcal {S}(\overline{M}(N), \overline{B})\) (where \(\overline{M}\) is a parameter of the schemeFootnote 4) and with each other through point-to-point connections. In each round r the following steps happen: First, every client \(\mathcal {O}_i\) receives as input a logical operation \(\mathsf{Acc}({\text {op}}_{i,r}, \mathsf {a}_{i, r}, v_{i, r})\), where \({\text {op}}_{i, r} \in \{\mathsf{read}, \mathsf{write}\}\), \(\mathsf {a}_{i, r} \in [N]\), and \(v_{i, r} \in \{0,1\}^{B} \cup \{\bot \}\). Then, the clients engage in an interactive protocol where at any time each client \(\mathcal {O}_i\) can (1) send messages to other clients, and (2) perform one or more accesses to the server \(\mathcal {S}(\overline{M}, \overline{B})\). Finally, every \(\mathcal {O}_i\) outputs some value \(\mathrm {val}_{i, r}\).

Correctness and Obliviousness. We assume without loss of generality that the honest-but-curious adversary learns only the access and communication patterns. To this end, let us fix a sequence of logical access operations that are issued to the m clients in T successive rounds. First off, for all \(i \in [m]\), we denote by \({\varvec{y}}_i = \left( \mathsf{Acc}({\text {op}}_{i,r}, \mathsf {a}_{i,r}, v_{i,r})\right) _{r \in [T]}\) the sequence of logical operations issued to \(\mathcal {O}_i\) in the T rounds, and let \({\varvec{y}} = ({\varvec{y}}_1, \ldots , {\varvec{y}}_m)\;.\)

Now, for an execution of an OPRAM scheme \(\mathcal {PO}\) for logical sequence of accesses \({\varvec{y}}\) as above, we let \(\mathsf{ACP}_i\) be the round-i communication pattern, i.e., the transcript of the communication among clients and between each client and the server in round \(i \in [T]\), except that actual contents of the messages sent among clients, as well as the values \(v_i\) in server accesses by the clients, are removed. We define

$$\begin{aligned} \mathsf{ACP}_{\mathcal {PO}}(\lambda , N, B, {\varvec{y}}) = (\mathsf{ACP}_1, \ldots , \mathsf{ACP}_T). \end{aligned}$$

Finally, we also denote the outputs of client i as \({\varvec{val}}_i = (val_{i, 1}, \ldots , val_{i, T})\) and

$$\begin{aligned} \mathsf {Out}_\mathcal {PO}(\lambda , N, B, {\varvec{y}}) = ({\varvec{val}}_1, \ldots , {\varvec{val}}_m). \end{aligned}$$

The outputs \({\varvec{z}} = \mathsf {Out}_\mathcal {PO}(\lambda , N, B, {\varvec{y}}) \) of \(\mathcal {PO}\) are correct w.r.t. the parallel access sequence \({\varvec{y}}\) if, for each command \(\mathsf{Acc}({\text {op}}_{i,t}, \mathsf {a}_{i,t}, v_{i,t})\) in \({\varvec{y}}\), the corresponding output \(val_{i,t}\) in \({\varvec{z}}\) is either the most recently written value at address \(\mathsf {a}_{i,t}\), or \(\bot \) if \(\mathsf {a}_{i,t}\) has not yet been written. Moreover, we assume that if two write operations occur in the same round for the same address, issued by clients \(\mathcal {O}_i\) and \(\mathcal {O}_j\), for \(i < j\), then the value written by \(\mathcal {O}_i\) is the one that takes effect. Let \(\mathsf{Correct}\) be the predicate that on input \(({\varvec{y}}, {\varvec{z}})\) returns whether \({\varvec{z}}\) is correct w.r.t. \({\varvec{y}}\).
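The predicate \(\mathsf{Correct}\) can be made explicit as follows; this is a reference implementation under our own data layout (\(y[i][t]\) holds client i's round-t command, \(z[i][t]\) its output), adopting the convention that round-t outputs reflect the memory contents before round t's writes:

```python
def correct(y, z, m, T):
    """Reference implementation of the Correct predicate: y[i][t] is client
    i's round-t command (op, addr, v), z[i][t] its output. Outputs must
    equal the most recently written value at the address (None if the
    address is still unwritten); among same-round writes to one address,
    the smallest client index wins."""
    mem = {}
    for t in range(T):
        # Round-t outputs reflect memory before this round's writes.
        for i in range(m):
            _, addr, _ = y[i][t]
            if z[i][t] != mem.get(addr):
                return False
        # Apply writes in decreasing client order, so that the smallest
        # index overwrites last and takes effect.
        for i in reversed(range(m)):
            op, addr, v = y[i][t]
            if op == "write":
                mem[addr] = v
    return True
```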

Definition 1

(Correctness and Security). An OPRAM scheme \(\mathcal {PO}\) achieves correctness and obliviousness if for all \(N, B, T = {poly}(\lambda )\), there exists a negligible function \(\mu \) such that, for every \(\lambda \) and every two parallel sequences \({\varvec{y}}\) and \({\varvec{y'}}\) of the same length \(T(\lambda )\), the following are satisfied:

  • (i) Correctness. \(\Pr \left[ \mathsf{Correct}\left( {\varvec{y}}, \mathsf {Out}_\mathcal {PO}(\lambda , N, B,{\varvec{y}}) \right) = 1\right] \ge 1-\mu (\lambda )\).

  • (ii) Obliviousness. \(\varDelta \left( \mathsf{ACP}_\mathcal {PO}(\lambda , N, B,{\varvec{y}}), \mathsf{ACP}_\mathcal {PO}(\lambda , N, B,{\varvec{y'}}) \right) \le \mu (\lambda )\).

Usually, the values \(\lambda \), N, B are understood from the context, and we thus often use \(\mathsf{ACP}({\varvec{y}}) = \mathsf{ACP}_\mathcal {PO}(\lambda , N, B,{\varvec{y}})\) for notational simplicity.

OPRAM Complexity. The server-communication overhead and inter-client communication overhead of an OPRAM scheme \(\mathcal {PO}\) are respectively the number of bits sent/received per client to/from the server, and to/from other clients, per logical access command, divided by the block size B. Finally, the server storage overhead of \(\mathcal {PO}\) is the number of blocks stored at the server divided by N, and client storage overhead is the number of blocks stored at each client after each parallel access.

3 OPRAM with \(O(\log ^2 N)\) Server Communication Overhead

In this section, we present our first OPRAM scheme, called Subtree-OPRAM.

Theorem 1

(Subtree-OPRAM). For every m, there is an m-client OPRAM scheme with the following properties, where \(\lambda \), N, and B denote the security parameter, the size of the logical space, and the block size, respectively, with \(B \ge 2 \log N\):

  • Client Storage Overhead. Every client keeps a local stash consisting of \(R = (\omega (\log \lambda ) + O(\log m))\log N\) blocks.

  • Server Storage Overhead. O(1).

  • Server Communication Overhead. The amortized overhead is \(O(\log ^2N)\) and the worst case overhead is \(\omega (\log \lambda \log N) + O(\log ^2N)\) with overwhelming probability.

  • Inter-Client Communication Overhead. The amortized and worst-case overheads are both \(\omega (\log \lambda )\log m (\log m + \log N)\) with overwhelming probability.

In particular, when the security parameter \(\lambda \) is set to N, the server communication complexity is \(\omega (\log ^2 N)\) in the worst case, and \(O(\log ^2 N)\) amortized.

To prove the theorem, as discussed in the introduction, we first present a single-client ORAM scheme, Subtree-ORAM, that supports parallel accesses in Sect. 3.1, and then adapt it to the multiple-client setting to obtain Subtree-OPRAM in Sect. 3.3. We analyze these two schemes in Appendices C and D. Additional helper protocols needed by Subtree-OPRAM are given in Sect. 3.2.

3.1 Subtree-ORAM

In this section, we describe the non-recursive version of Subtree-ORAM, where the client keeps a large position map of size \(O(N\log N)\); the same recursive technique as in Path-ORAM can be applied to reduce the client memory size.

The Subtree-ORAM client, \(\mathsf {ST}\text{- }\mathcal {O}\), keeps a logical space of N blocks of size B using \(M(N) = O(N)\) blocks on the server. The server storage space is organized (virtually) as a complete binary tree \(\mathcal{T}\) of depth \(D=\log N\) (we assume for simplicity that N is a power of two), where each node is a bucket capable of storing Z blocks. In particular, we associate leaves (and paths leading to them from the root) with elements of \([2^D] = [N]\). Additionally, \(\mathsf {ST}\text{- }\mathcal {O}\) locally maintains a position map \(\mathsf{pos.map}\) and a stash \(\mathsf{stash}\) of size respectively \(O(N\log N)\) bits and \(R(\lambda ) \in \omega (\log \lambda )\) blocks.
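The bucket addressing implicit in this layout can be made concrete with a small helper (the indexing conventions are ours: leaves are \(0, \ldots, N-1\) and buckets are pairs (level, index)):

```python
def path_buckets(leaf, D):
    """Buckets on the path from the root (level 0) down to leaf `leaf`
    (level D) in a complete binary tree with 2**D leaves: at level lvl,
    the path visits node index leaf >> (D - lvl)."""
    return [(lvl, leaf >> (D - lvl)) for lvl in range(D + 1)]

def on_path(leaf, lvl, idx, D):
    """Whether bucket (lvl, idx) lies on the path assigned to `leaf`."""
    return (leaf >> (D - lvl)) == idx
```

Note that any two paths share the buckets corresponding to the common prefix of their leaves, which is what makes the union of m fetched paths a subtree.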

In each iteration r, the Subtree-ORAM client \(\mathsf {ST}\text{- }\mathcal {O}\) processes a batch of m logical access operations \(\left\{ \mathsf{Acc}({\text {op}}_{i}, \mathsf {a}_{i}, v_{i}) \right\} _{i \in [m]} \) as follows:

  1. 1.

    Pre-process. Remove repeated block accesses by producing a new m-component vector Q as follows: The i-th entry is set to \(Q_i = ({\text {op}}_i, \mathsf {a}_i)\) if one of the following conditions holds; otherwise \(Q_i = \bot \).

    • Either, there are (one or many) write requests to block \(\mathsf {a}_i\), and the i-th operation \(\mathsf{Acc}({\text {op}}_i, \mathsf {a}_i, v_i)\) is the one with the minimal index among them.

    • Or, there are only read requests to block \(\mathsf {a}_i\), and the i-th operation \(\mathsf{Acc}({\text {op}}_i, \mathsf {a}_i, v_i)\) is the one with the minimal index among them.

  2. 2.

    Read paths in parallel. Determine a set \(S = \{\ell _1, \ldots , \ell _m\}\) of m paths to read, where each path is of one of the following two types:

    • Real-read. For each \(Q_i = ({\text {op}}_i, \mathsf {a}_i) \ne \bot \), set \(\ell _i= \mathsf{pos.map}(\mathsf {a}_i)\) and immediately refresh \(\mathsf{pos.map}(\mathsf {a}_i)\) to \(\ell '_i \displaystyle \mathop {\leftarrow }^{\$}[N]\).

    • Fake-read. For each entry \(Q_i =\bot \), sample a random path \(\ell _i \displaystyle \mathop {\leftarrow }^{\$}[N]\).

    Then, retrieve all paths in S from the server, forming a subtree \(\mathcal{T}_S\) of buckets with (at most) Z decrypted blocks in them.

  3. 3.

    Post-process. Answer each logical access \(\mathsf{Acc}({\text {op}}_i, \mathsf {a}_i, v_i)\) as follows: Find block \(\mathsf {a}_i\) in the subtree \(\mathcal{T}_S\) or the stash, and return the value of the block. Next, for each \(Q_i \ne \bot \), if the corresponding logical access is a write operation \(\mathsf{Acc}(\mathsf{write}, \mathsf {a}_i, v_i \ne \bot )\), update block \(\mathsf {a}_i\) to value \(v_i\).

  4. 4.

    Flush subtree and write-back. Let \(\mathcal{T}_{\mathrm {real}}\) be the subtree consisting of only real-read paths in \(\mathcal{T}_S\). Before (re-encrypting and) writing \(\mathcal{T}_S\) back to the server, re-arrange the contents of \(\mathcal{T}_{\mathrm {real}}\) and \(\mathsf{stash}\) to fit as many blocks from \(\mathsf{stash}\) into the subtree as possible, as follows: Subtree-flushing. Move all blocks in \(\mathcal{T}_{\mathrm {real}}\) and \(\mathsf{stash}\) to a temporary set \(\Lambda \). Traverse all blocks in \(\Lambda \) in an arbitrary order: Insert each block with address \(\mathsf {a}\) from \(\Lambda \) either into the lowest non-full bucket in \(\mathcal{T}_{\mathrm {real}}\) that lies on the path \(\mathsf{pos.map}(\mathsf {a})\) (if such a bucket exists), or into \(\mathsf{stash}\). If at any point the stash contains more than R blocks, output \(\mathsf{overflow}\) and abort.
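The subtree-flushing step (Step 4) can be sketched as follows; this is a sequential reference version with our own bucket layout (a dict from (level, index) to a list of at most Z (addr, value) pairs), not the parallelized protocol:

```python
def subtree_flush(tree_real, stash, pos_map, D, Z, R):
    """Re-pack all blocks from the real-read subtree and the stash,
    placing each block into the lowest non-full bucket of the subtree
    that lies on its assigned path; leftovers go back to the stash.

    tree_real: dict (lvl, idx) -> bucket (list of (addr, value) pairs).
    Raises OverflowError if the stash would exceed R blocks."""
    pool = dict(stash)
    for bucket in tree_real.values():
        pool.update(bucket)
        bucket.clear()
    stash.clear()
    for addr, val in pool.items():
        leaf = pos_map[addr]
        placed = False
        for lvl in range(D, -1, -1):              # deepest level first
            node = (lvl, leaf >> (D - lvl))       # bucket on addr's path
            bucket = tree_real.get(node)
            if bucket is not None and len(bucket) < Z:
                bucket.append((addr, val))
                placed = True
                break
        if not placed:
            stash[addr] = val
            if len(stash) > R:
                raise OverflowError("stash overflow")
    return tree_real, stash
```

Since each block's final position depends only on bucket occupancy along its own path, the traversal order is arbitrary, which is what makes this step parallelizable.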

In Appendix C, we briefly discuss the analysis of Subtree-ORAM, noting the bulk of it (proving that the overflow probability is small) is deferred to the full version for lack of space.

3.2 Oblivious Inter-client Communication Protocols

Subtree-OPRAM, which we introduce in the next section, will use as components a few oblivious inter-client communication sub-protocols which allow the clients to emulate Subtree-ORAM in a distributed fashion. These are variants of similar protocols proposed in [2]. Their communication patterns are statically fixed, independent of inputs (and thus oblivious in a very strong sense), and the communication and computation complexities of each protocol participant are small, i.e., roughly \({{\mathrm{polylog}}}(m)\), where m is the number of participants. We only describe the interfaces of these protocols; their implementations are based on \(\log (m)\)-depth sorting networks, and we refer the reader to [2] for further low-level details.

Oblivious Aggregation. Our first component protocol is used to aggregate data held by multiple users, and is parameterized by an aggregation function \(\mathsf {agg}\) which can combine an arbitrary number of data items \(\mathsf {d}_1, \mathsf {d}_2, \ldots \) (from a given data set) into an element \(\mathsf {agg}(\mathsf {d}_1, \mathsf {d}_2, \ldots )\). The function \(\mathsf {agg}\) is associative, i.e., \(\mathsf {agg}(\mathsf {agg}(\mathsf {d}_1, \mathsf {d}_2, \ldots , \mathsf {d}_k), \mathsf {d}_{k+1}, \ldots , \mathsf {d}_{k+r})\) and \(\mathsf {agg}(\mathsf {d}_1, \mathsf {d}_2, \ldots , \mathsf {d}_k, \mathsf {agg}(\mathsf {d}_{k+1}, \ldots , \mathsf {d}_{k+r}))\) both give the same value as \(\mathsf {agg}(\mathsf {d}_1, \ldots , \mathsf {d}_{k+r})\). Each party \(i \in [m]\) starts the protocol with an input pair \((\mathsf {key}_i, \mathsf {d}_i)\). At the end of the execution, each party i obtains an output of one of two forms: (1) \((\mathsf{rep}, \mathsf {d}^*)\), where \(\mathsf {d}^*\) is the output of the aggregation function applied to \(\{\mathsf {d}_j \;:\; \mathsf {key}_j = \mathsf {key}_i\}\), or (2) \((\bot , \bot )\). Moreover, for every \(\mathsf {key}\) which appears among the \(\{\mathsf {key}_i\}_{i \in [m]}\), there exists exactly one party i with \(\mathsf {key}_i = \mathsf {key}\) receiving an output of type (1). We refer to each such party as the representative for \(\mathsf {key}\).
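Functionally, and ignoring obliviousness entirely, \(\mathsf {OblivAgg}\) computes the following (a plain reference implementation; choosing the smallest-index party as representative is our own instantiation, since the protocol only guarantees uniqueness):

```python
from functools import reduce

def agg_semantics(inputs, agg):
    """inputs: a list of (key, data) pairs, one per party. Returns the
    per-party outputs: for each key, the smallest-index party holding it
    becomes the representative and receives ("rep", aggregate of all data
    items with that key); every other party receives (None, None)."""
    groups = {}
    for i, (key, d) in enumerate(inputs):
        groups.setdefault(key, []).append((i, d))
    out = [(None, None)] * len(inputs)
    for key, members in groups.items():
        rep = min(i for i, _ in members)
        out[rep] = ("rep", reduce(agg, (d for _, d in members)))
    return out
```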

An aggregation protocol with fixed communication patterns, called \(\mathsf {OblivAgg}\), is given in [2]. When the bit length of the data items and of the key values is at most \(\ell \) bits, the protocol from [2] proceeds in \(O(\log m)\) rounds, and in each round, every client sends O(1) messages of size \(O(\log m + \ell )\) bits.

Oblivious Routing. Another protocol we will use is the Oblivious Routing protocol \(\mathsf {OblivRoute}\) from [2]. This m-party sub-protocol allows each party to send a message to another party; since the communication patterns are fixed, the recipients of the messages are hidden from an observer.

[Figure a: interface of the \(\mathsf {OblivRoute}\) sub-protocol]

We note that the implementation of \(\mathsf {OblivRoute}\) is tailored to the case where each \(\mathrm {id}_i\) is drawn independently and uniformly at random from [m]. (This will be the case in our application below.) For a parameter \(K \ge 0\), their protocol proceeds in \(O(\log m)\) rounds, and in every round, a client sends a message of size \(O(K \cdot (\ell + \log m))\) bits to another client, where \(\ell \) is the size of the inputs. The probability that the protocol aborts is roughly \(O(m \log m \cdot 2^{-K})\), and thus one can set \(K = \omega (\log \lambda )\) for this probability to be negligible in \(\lambda \), or \(K = \omega (\log N)\) in our ORAM applications, where N becomes the security parameter.
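To illustrate where the capacity parameter K and the abort event come from, here is a hypothetical sketch of fixed-pattern routing through a \(\log (m)\)-round butterfly network; the function name, the 0-indexed party labels, and the placement of the capacity check are our own simplifications, not the exact protocol of [2]:

```python
def butterfly_route(msgs, m, K):
    """Route one (dest, payload) pair per party through log2(m) rounds.
    In round r, a message held at position j moves to the party whose
    r-th address bit matches the r-th bit of its destination; the
    communication pattern (each party talks to a fixed neighbor pair
    per round) is independent of the destinations.  A party buffering
    more than K messages in some round causes an abort (None)."""
    rounds = m.bit_length() - 1          # log2(m), m a power of two
    held = [[msgs[j]] for j in range(m)]
    for r in range(rounds):
        new = [[] for _ in range(m)]
        for j in range(m):
            for dest, payload in held[j]:
                # overwrite bit r of the current position with bit r of dest
                nxt = (j & ~(1 << r)) | (dest & (1 << r))
                new[nxt].append((dest, payload))
        if any(len(bucket) > K for bucket in new):
            return None                  # capacity exceeded -> abort
        held = new
    return held                          # held[d] = messages for party d
```

With uniformly random destinations, the per-party load in each round stays below K except with probability roughly \(m \log m \cdot 2^{-K}\), matching the abort bound quoted above.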

Oblivious Election. We will need a variant of the above \(\mathsf {OblivAgg}\) protocol with stronger guarantees. In particular, we need a protocol \(\mathsf {OblivElect}\) that allows m parties with requests \(\left\{ ({\text {op}}_i, \mathsf {a}_i) \right\} _{i \in [m]}\) to elect a unique representative party for each unique address that appears among the m requests. This representative will be the party with the smallest identity \(i \in [m]\) wanting to write to that address (if it exists), or otherwise the one with the smallest identity wanting to read from it. Formally, the protocol provides the following interface.

[Figure b: interface of the \(\mathsf {OblivElect}\) sub-protocol]

\(\mathsf {OblivElect}\) can be implemented by modifying \(\mathsf {OblivAgg}\). At a high level, \(\mathsf {OblivAgg}\) proceeds as follows (we refer to [2] for further details):

  • Initially, every client i inputs a pair \((\mathsf {key}_i, \mathsf {d}_i)\), and these inputs are re-shuffled across clients and sorted according to the first component. That is, at the end of the first phase, any two clients \(j < j'\) hold triples \((i(j), \mathsf {key}_{i(j)}, \mathsf {d}_{i(j)})\) and \((i(j'), \mathsf {key}_{i(j')}, \mathsf {d}_{i(j')})\), respectively, such that \(\mathsf {key}_{i(j)} \le \mathsf {key}_{i(j')}\) and \(i(j) \ne i(j')\). This is achieved via a sorting network, where each client i initially holds \((i, \mathsf {key}_i, \mathsf {d}_i)\), and then such triples are swapped between pairs of clients (defined by the sorting network), according to the \(\mathsf {key}\) values.

  • This guarantees that for every \(\mathsf {key}\) which was initially input by \(m' \ge 1\) clients, at the end of the first phase there exist \(m'\) consecutive clients \(j, j +1, \ldots , j + m' - 1\) (for some j) holding triples with \(\mathsf {key}_{i(j)} = \cdots = \mathsf {key}_{i(j + m' - 1)} = \mathsf {key}\). Then, client j is going to aggregate \(\mathsf {d}_{i(j)}, \ldots , \mathsf {d}_{i(j + m' - 1)}\), and the final representative for \(\mathsf {key}\) is client i(j). The aggregate information is sent back to the representatives by using once again a sorting network, sorting with respect to the i(j)’s.

We can easily modify \(\mathsf {OblivAgg}\) to achieve \(\mathsf {OblivElect}\) as follows. We run \(\mathsf {OblivAgg}\) with client i inputting \(\mathsf {key}_i = \mathsf {a}_i\) and \(\mathsf {d}_i = ({\text {op}}_i, i)\). However, the sorting network is not going to sort solely according to the \(\mathsf {key}\) value, but also according to the associated \(\mathsf {d}\) entry. In particular, we say that \((\mathsf {a}, {\text {op}}, i) < (\mathsf {a}', {\text {op}}', i')\) iff (1) \(\mathsf {a}< \mathsf {a}'\), or (2) \(\mathsf {a}= \mathsf {a}'\), \({\text {op}}= \mathsf{write}\) and \({\text {op}}' = \mathsf{read}\), or (3) \(\mathsf {a}= \mathsf {a}'\), \({\text {op}}= {\text {op}}'\), and \(i < i'\). The sorting will now ensure that the left-most client j holding a value for some \(\mathsf {key}= \mathsf {a}\) will be such that i(j) is our intended representative.
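The total order on triples can be realized with a simple sort key; after sorting, the first holder of each address is the elected representative. A sketch (the function names are ours, and the sequential `sorted` call stands in for the distributed sorting network):

```python
def elect_key(a, op, i):
    # Order from the text: smaller address first; for equal addresses,
    # writes precede reads; remaining ties broken by party id.
    return (a, 0 if op == 'write' else 1, i)

def representatives(reqs):
    """reqs: list of (a, op, i) triples, one per party.
    Returns, for each address, the id of the elected representative:
    the smallest-id writer if one exists, else the smallest-id reader."""
    reps = {}
    for a, op, i in sorted(reqs, key=lambda t: elect_key(*t)):
        reps.setdefault(a, i)  # first holder per address wins
    return reps
```

For example, if parties 1 and 2 want to write address 5 and party 0 wants to read it, party 1 (the smallest-id writer) is elected.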

The complexity of \(\mathsf {OblivElect}\) is the same as that of \(\mathsf {OblivAgg}\), setting \(\ell = O(\log m + \log N)\). Thus we have \(O(\log m)\) rounds, where each client sends O(1) messages of size \(O(\log m + \log N)\) bits.

Oblivious Multicasting. The oblivious multicast protocol \(\mathsf {OblivMCast}\) is an m-party sub-protocol that allows a subset of the parties, called the senders, to multicast values to the others, called the receivers. More precisely:

[Figure c: interface of the \(\mathsf {OblivMCast}\) sub-protocol]

The protocol is in essence the reversal of our \(\mathsf {OblivElect}\) protocol above. It can be built using similar techniques, achieving round complexity \(O(\log m)\), where in each round every client sends O(1) messages of size \(O(B + \log N + \log m)\) bits, and B is the bit size of the values \(v_i\).
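As with aggregation, the intended input/output behavior of \(\mathsf {OblivMCast}\) can be stated as a plain (non-oblivious) reference function, with `None` standing in for \(\bot \); this encodes the interface as used in Step 4 of Subtree-OPRAM below:

```python
def mcast_spec(inputs):
    """Reference semantics of OblivMCast.

    inputs: one (a_i, v_i) pair per party, where v_i is not None only
    for the unique sender for address a_i.  Every party receives the
    value multicast by the sender for its own address.
    """
    value_of = {a: v for (a, v) in inputs if v is not None}
    return [value_of[a] for (a, _) in inputs]
```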

3.3 Subtree-OPRAM

Non-Recursive Subtree-OPRAM. We first describe the non-recursive version of Subtree-OPRAM, where multiple clients share access to a global position map, which can be eliminated using recursive techniques as we explain further below. (Due to the constraints of coordinating access to the same items in OPRAM, our recursive techniques are somewhat more involved than in the basic ORAM case.)

Let m be the number of clients; assume for simplicity that it is a power of 2, i.e., \(\log (m)\) is an integer. The Subtree-OPRAM protocol \(\mathsf{ST\text{- }\mathcal {PO}}= \left\{ \mathcal {O}_i \right\} _{i \in [m]}\), on common input \((\lambda ,N,B,m)\), organizes the server storage as a forest of m complete binary trees \(\mathcal{T}_1, \ldots , \mathcal{T}_m\), each of depth \({\log N -\log (m)}\), where every node in each tree is a bucket of \(Z = O(1)\) blocks of B bits. In other words, the union of the trees \(\mathcal{T}_1, \ldots , \mathcal{T}_m\) is the complete tree \(\mathcal{T}\) of Subtree-ORAM, but with the top \(\log (m)\) levels removed. Again, we identify paths with leaves in the tree, and we say that a path \(\ell \) “belongs to” \(\mathcal{T}_i\) if the leaf \(\ell \) is in \(\mathcal{T}_i\). Each client \(\mathcal {O}_i\) is responsible for managing the portion of the storage space corresponding to \(\mathcal{T}_i\), meaning that it reads/writes all paths belonging to \(\mathcal{T}_i\), and maintains a local stash \(\mathsf{stash}_i\) for storing all “overflowing” blocks whose assigned path belongs to \(\mathcal{T}_i\). The Subtree-ORAM analysis carries over, and implies that the size of each local stash is bounded by any function \(R(\lambda , m) \in \omega (\log \lambda ) + O(\log m)\), where the extra \(O(\log m)\) is to store blocks that in the original Subtree-ORAM scheme would have belonged to the upper \(\log (m)\) levels. The clients also share a global size-N position map \(\mathsf{pos.map}\). (Recall that we are looking at the non-recursive version here.)
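Since both N and m are powers of two, deciding which tree a path belongs to is a matter of reading the top \(\log (m)\) bits of the leaf label. A small sketch, under the assumption (ours, for concreteness) that leaves are labeled \(0, \ldots , N-1\) and split evenly across trees in order:

```python
def tree_of_path(leaf, N, m):
    """Index in [0, m) of the tree a path belongs to, assuming leaves
    0..N-1 are split evenly and in order across the m trees, so the
    top log2(m) bits of the leaf label select the tree."""
    assert 0 <= leaf < N and N % m == 0
    return leaf // (N // m)
```

For example, with \(N = 16\) leaves and \(m = 4\) trees, each tree owns 4 consecutive leaves.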

Recall that the m clients share a secret key for a semantically secure encryption scheme. In each iteration, each client i processes a logical access request \(\mathsf{Acc}({\text {op}}_i, \mathsf {a}_i, v_i)\). The m clients then proceed in parallel to jointly process the m logical requests from this iteration:

  1. Select block representatives. The m clients run sub-protocol \(\mathsf {OblivElect}\), where client i uses input \(({\text {op}}_i, \mathsf {a}_i)\) and receives either output \(\mathsf{rep}\) or \(\bot \); in the former case, client i knows it is the representative for accessing block \(\mathsf {a}_i\).Footnote 5

  2. Forward read-path requests. Each client i determines the path \(\ell _i\) it wants to fetch, and there are two possibilities:

    • Real read. If it is a representative, set path \(\ell _i = \mathsf{pos.map}(\mathsf {a}_i)\) and \(\mathsf {a}'_i = \mathsf {a}_i\), and immediately refresh \(\mathsf{pos.map}(\mathsf {a}_i)\) to \(\ell '_i \displaystyle \mathop {\leftarrow }^{\$}[N]\);

    • Fake read. If it is not a representative for \(\mathsf {a}_i\), choose a random path \(\ell _i \displaystyle \mathop {\leftarrow }^{\$}[N]\) and set \(\mathsf {a}'_i = \bot \).

    If path \(\ell _i\) belongs to tree \(\mathcal{T}_j\), client i sends an encrypted message \((i, \mathsf {a}'_i, \ell _i)\) to client j.

  3. Read paths. Each client \(j \in [m]\) collects a set \(S_j\) of all paths contained in the messages \(\left\{ (i, \mathsf {a}'_i, \ell _i) \right\} \) received in the previous step, and then proceeds as follows:

    (1) Retrieve all paths in \(S_j\), which form a subtree denoted \(\mathcal{T}_{S_j}\).

    (2) For each \(i \in [m]\) such that a request \((i, \perp , \ell _i)\) was received, send the encryption of a dummy block \(\bot \) to client i.

    (3) For each \(i \in [m]\) such that a request \((i, \mathsf {a}'_i \ne \perp , \ell _i)\) was received, find block \(\mathsf {a}'_i\) in \(\mathcal{T}_{S_j}\) or in \(\mathsf{stash}_j\), delete it, and send the encryption of the value \(\overline{v}_i\) of the block to client i.

  4. Answer client requests and update. At the end of the previous step, each client holds a value \(\overline{v}_i\) which is \(\ne \bot \) if and only if i is the representative for \(\mathsf {a}_i\). Next, the m clients run sub-protocol \(\mathsf {OblivMCast}\) to allow each representative to multicast the value it holds to other clients requesting the same block: Concretely, each client i uses input \((\mathsf {a}_i, \overline{v}_i)\) (recall a non-representative has \(\overline{v}_i =\bot \)) and receives output \(\overline{v}'_i\), which is guaranteed to be the value of block \(\mathsf {a}_i\) it requests. Each client i answers its logical request with \(\overline{v}'_i\).

    Next, each representative i that has a write operation \(\mathsf{Acc}(\mathsf{write}, \mathsf {a}_i, v_i)\) locally updates the value of block \(\mathsf {a}_i\) to \(\overline{v}_i = v_i\).

  5. Re-route blocks with newly assigned paths. Each representative i sends its block \((\mathsf {a}_i, \overline{v}_i)\) to the appropriate client for insertion, according to the newly assigned path \(\ell '_i\) (Step 2), as follows: Let \(j_i\) be the index of the tree that path \(\ell '_i\) belongs to; the m clients run sub-protocol \(\mathsf {OblivRoute}\), where each representative i uses input \((j_i, (\ell '_i, \mathsf {a}_i, \overline{v}_i))\), and every other client uses input \((j_i, \bot )\) for a randomly drawn \(j_i \displaystyle \mathop {\leftarrow }^{\$}[m]\).Footnote 6

    As the output of \(\mathsf {OblivRoute}\), each client j receives a set of blocks \(\left\{ (\ell '_i, \mathsf {a}_i, \overline{v}_i) \right\} \) whose paths \(\ell '_i\) belong to \(\mathcal{T}_j\); it stores each \((\mathsf {a}_i, \overline{v}_i)\) in its local stash \(\mathsf{stash}_j\).

  6. Flush subtree and write-back. For each client j, let \(\mathcal{T}_{\mathrm {real}_j}\) be the subtree consisting of only the real-read paths in \(\mathcal{T}_{S_j}\). Before writing subtree \(\mathcal{T}_{S_j}\) back to the server (re-encrypting all of its contents), client j runs the Subtree Flushing Procedure on \(\mathcal{T}_{\mathrm {real}_j}\) and \(\mathsf{stash}_j\) (recall that if at any point \(\mathsf{stash}_j\) contains more than R blocks, the procedure outputs overflow).

Recursive Version. We can apply recursion to eliminate the use of the shared global position map in the above scheme. Observe that in each iteration, each client reads/writes the position map at most once, in Step 2. In other words, in order to answer a batch of m accesses (one per client) to a logical space of size \(N\times B\) bits, the m clients first need to make a batch of at most m accesses (one per client) to the position map of size \(N\times \log N\) bits. Since \(B \ge \alpha \log N\) for some constant \(\alpha > 1\) (for simplicity, N is a power of two), by recursively storing the shared position map on the server in \(O(\log N)\) trees, the clients no longer need to share any position map. At the end of the recursion, the size of the position map decreases to O(1), and it can be stored in the local memory of, say, the first client. Other clients can access and update this position map using the oblivious sub-protocols \(\mathsf {OblivAgg}\) and \(\mathsf {OblivMCast}\).

This high-level strategy goes through almost identically as in Path-ORAM, except for the following caveat. Recall that in Step 2 of Subtree-OPRAM, if a client i is a representative, then it reads entry \(\ell _i = \mathsf{pos.map}(\mathsf {a}_i)\) of the position map and updates it to a new random path \(\ell _i'\); otherwise, it does not access the position map. Since \(B \ge \alpha \log N\), the entire position map fits into a logical space of \(N/\alpha \) blocks, where the block with address \({\widetilde{\mathsf {a}}}\) contains \(\alpha \) position map entries, \(\mathsf{pos.map}(\alpha {\widetilde{\mathsf {a}}}+1)|| \cdots || \mathsf{pos.map}(\alpha ({\widetilde{\mathsf {a}}}+1))\). This means that, when applying recursion and storing the position map at the server, client i needs to make the following logical access:

$$\begin{aligned} \mathsf{Acc}({\widetilde{{\text {op}}}}_i, {\widetilde{\mathsf {a}}}_i, {\widetilde{v}}_i) = {\left\{ \begin{array}{ll} \mathsf{Acc}(\mathsf{write}, \lfloor \mathsf {a}_i/\alpha \rfloor , \ell '_i) &{} \text {if}\, i \, \text {is a representative}\\ \mathsf{Acc}(\mathsf{read}, 0, \bot ) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

We assume without loss of generality above that clients who are not representatives simply make a read access to the block with address 0. By construction, different representatives i and j access different entries in the position map (\(\mathsf {a}_i \ne \mathsf {a}_j\)). However, it is possible that two representatives i and j need to access the same logical address \({\widetilde{\mathsf {a}}}= {\widetilde{\mathsf {a}}}_{i} = {\widetilde{\mathsf {a}}}_{j}\) in order to update different entries of the position map located in the same block \({\widetilde{\mathsf {a}}}\); we call this a write-collision. Since each block contains at most \(\alpha \) position map entries, there are at most \(\alpha \) write-collisions for the same logical address. Recall that in Subtree-OPRAM, when multiple clients write to the same logical address, only the write operation with the smallest index is executed. Hence, naively applying recursion to Subtree-OPRAM means that when a write-collision occurs, only one position map entry would be updated.
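The grouping of position-map updates into blocks can be sketched as follows. For simplicity we use a 0-indexed convention (our own choice; the text indexes entries from 1), under which block b holds entries \(\alpha b, \ldots , \alpha (b+1)-1\):

```python
def posmap_block(a, alpha):
    """Block of the recursive position map holding entry a
    (0-indexed sketch: block b holds entries alpha*b .. alpha*(b+1)-1)."""
    return a // alpha

def write_collisions(addrs, alpha):
    """Group the representatives' distinct position-map entries by
    target block.  Each block can collect at most alpha colliding
    writes, since a block holds only alpha entries."""
    groups = {}
    for a in addrs:
        groups.setdefault(posmap_block(a, alpha), []).append(a)
    return groups
```

With \(\alpha = 4\), entries 0 and 1 collide in block 0, while entries 5 and 7 collide in block 1.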

This problem can be addressed by slightly modifying the interface of Subtree-OPRAM so that, under the constraint that there are at most \(\alpha \) writes to different parts of the same block, all writes are executed. In the recursion, the modified scheme is invoked to ensure that position maps are updated correctly, whereas at the top level, the original Subtree-OPRAM is used. To accommodate \(\alpha \) write-collisions, the only change appears in Step 1: In Subtree-OPRAM, the sub-protocol \(\mathsf {OblivElect}\) is used, which ensures that for each address \(\mathsf {a}\), only the write with the smallest index is executed. We now modify this step to run the sub-protocol \(\mathsf {OblivAgg}\) (with appropriate key, data, and aggregate function specified shortly), so that a unique representative is elected for each \(\mathsf {a}\), who receives all the write requests to that \(\mathsf {a}\) and executes all of them (note that while the write requests are for the same block, they concern different portions of the block corresponding to distinct position map entries, and thus “executing all of them” has a well-defined meaning):

  1. Select block representatives, modified. The m clients run sub-protocol \(\mathsf {OblivAgg}\), where client i uses input \((\mathsf {key}_i = \mathsf {a}_i, \mathsf {d}_i = v_i)\) and aggregation function \(\mathsf {agg}(\mathsf {d}_1, \mathsf {d}_2, \ldots ) = \mathsf {d}_1||\mathsf {d}_2||\cdots = V\). \(\mathsf {OblivAgg}\) ensures that for each address \(\mathsf {a}\), a unique client accessing that address receives output \((\mathsf{rep}, V)\), and all other clients receive output \((\bot , \bot )\). In the former case, the client knows it is the representative for accessing block \(\mathsf {a}\), and V determines the new value of the block.

The rest of the protocol proceeds identically as before. Since there are at most \(\alpha \) write-collisions for each address, the length of the output of \(\mathsf {agg}\) is bounded by \(\ell = \alpha B\). Thus the protocol proceeds in \(O(\log m)\) rounds, where in each round every client sends O(1) messages of size \(O(\log N + \log m + B)\) bits.
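The modified aggregation and the representative's job can be sketched concretely. Here a pending write is modeled as an `(offset, value)` pair within the block, a representation we choose for illustration; the interface guarantees that colliding writes touch distinct offsets, so executing all of them is order-independent:

```python
def agg_concat(items):
    """Aggregate function for the modified Step 1: concatenate all
    pending (offset, value) writes for the same block address."""
    merged = []
    for updates in items:
        merged.extend(updates)
    return merged

def execute_all(block, merged):
    """The elected representative executes every collided write.
    block: list of alpha position-map entries; merged: output of
    agg_concat, with at most alpha writes on distinct offsets."""
    out = list(block)
    for off, val in merged:
        out[off] = val
    return out
```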

4 Generic OPRAM Scheme

In this section, we generalize the ideas from Subtree-OPRAM to obtain a generic transformation that turns an arbitrary single-client ORAM into an OPRAM scheme, incurring only an \(O(\log N)\) factor of efficiency loss. Overall, we are going to prove the following general theorem.

Theorem 2

(Generic-OPRAM). There exists a generic transformation that turns any ORAM scheme \(\mathcal {O}\) into an m-client OPRAM scheme Generic-OPRAM such that, for any \(R = \omega (\log \lambda )\), the following are satisfied, as long as the block length satisfies \(B \ge 2 \log m\) and, moreover, \(N/m \ge R\):

  • Server Communication Overhead. The amortized communication overhead is \(O(\log N \cdot \alpha (N/m))\) and the worst-case communication overhead is \(O((\log N + \omega (\log \lambda )) \cdot \alpha (N/m))\), where \(\alpha (N')\) is the communication overhead of ORAM scheme \(\mathcal {O}\) with logical address space \([N']\).

  • Inter-Client Communication Overhead. The amortized and worst-case overheads are both \(\omega (\log \lambda )\log m (\log m + \log N)\) with overwhelming probability.

  • Server and Client Storage. The server stores \(O(m \cdot M(N/m))\) blocks, where \(M(N')\) is the number of blocks stored by \(\mathcal {O}\) for logical address space \(N'\). Moreover, each client’s local storage overhead is \(R + {{\mathrm{polylog}}}(N)\).

Our presentation will avoid taking the detour of introducing a single-client ORAM scheme allowing for parallel processing of batches of m access operations, as we have done above with Subtree-OPRAM. A direct description of Generic-OPRAM is conceptually simpler. Before we turn to discussing Generic-OPRAM, however, we discuss a basic building block behind our protocol.

4.1 Oblivious Dictionaries

In our construction below, every client will be responsible for a partition holding roughly N / m blocks. One of the challenges is to store these blocks obliviously using space which is roughly equivalent to that of storing N / m blocks. Ideally, we want to implement this using an ORAM with logical address space for N / m blocks, as this would result in constant storage overhead when the ORAM also has constant overhead. In particular, the elements assigned to a certain partition have addresses spread around the whole of [N], and we have to map them efficiently to be stored into some block in [N / m] in a way which is (a) storage efficient for the client, and (b) only requires accessing a small (i.e., constant) number of blocks to fetch or insert a new block. We are going to solve this via an oblivious data structure implementing a dictionary interface and able to store roughly N / m blocks into a not-much-larger amount of memory.

The Data Structure. We want an oblivious implementation \(\mathcal {OD}\) of a dictionary data structure holding at most n pairs \((\mathsf {a}, v)\), where v corresponds to a data block in our ORAM scheme, and \(\mathsf {a}\in [N]\). (For our purposes, think of \(n \approx N/m\).) At any point in time, \(\mathcal {OD}\) stores at most one pair \((\mathsf {a}, v)\) for every \(\mathsf {a}\). It allows us to perform two operations:

  • \(\mathcal {OD}_{}(\mathcal {I}, \mathsf {a}, v)\) inserts an item \((\mathsf {a}, v)\), where \(\mathsf {a}\in [N]\), if the data structure contains fewer than n elements. Otherwise, if n elements are already stored, it does not add the element and returns an error symbol \(\bot \).

  • \( \mathcal {OD}_{}(\mathcal {R} \& \mathcal {D}, \mathsf {a})\) retrieves and deletes the item \((\mathsf {a}, v)\) stored in the data structure (if it exists), returning v; otherwise, if no such element is contained, it returns the error symbol \(\bot \).

Moreover, \(\mathcal {OD}\) enables two additional “dummy” operations \( \mathcal {OD}_{}(\mathcal {R} \& \mathcal {D}, \bot )\) and \(\mathcal {OD}_{}(\mathcal {I}, \bot , \bot )\), which are meant to have no effect on the data structure. Informally, for security, we demand that the access patterns resulting from any two equally long sequences of operations of type \(\mathcal {OD}_{}(\mathcal {I}, *, *)\) and \( \mathcal {OD}_{}(\mathcal {R} \& \mathcal {D}, *)\) are (statistically) indistinguishable.Footnote 7
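Ignoring obliviousness, the \(\mathcal {OD}\) interface behaves like the following plain dictionary with a capacity bound (a reference sketch; the class name and the `'bot'` sentinel for \(\bot \) are our own conventions):

```python
BOT = 'bot'  # stands in for the error symbol ⊥

class PlainDict:
    """Plain (non-oblivious) reference for the OD interface.  The real
    OD additionally hides access patterns, e.g. by running a
    cuckoo-hashing dictionary on top of an ORAM."""
    def __init__(self, n):
        self.n = n           # capacity bound
        self.items = {}      # at most one value per address a

    def insert(self, a, v):
        """OD(I, a, v); insert(None, None) is the dummy no-op."""
        if a is None:
            return None      # dummy insert: no effect
        if len(self.items) >= self.n:
            return BOT       # full: refuse and signal the error
        self.items[a] = v

    def retrieve_and_delete(self, a):
        """OD(R&D, a); retrieve_and_delete(None) is the dummy no-op."""
        if a is None:
            return None      # dummy access: no effect
        return self.items.pop(a, BOT)
```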

The Implementation. We can easily obtain the above \(\mathcal {OD}\) data structure using for instance any Cuckoo-hashing based dictionary data structure with constant worst-case access complexity.Footnote 8

Theorem 3

(Efficient Cuckoo-Hashing Based Dictionary [10]). There exists an implementation of a dictionary data structure holding at most n blocks with the following properties: (1) It stores \(n' = O(n)\) blocks in the memory. (2) Every insert, delete, and lookup operation, requires \(c = O(1)\) accesses to blocks in memory. (3) The client stores \({{\mathrm{polylog}}}(n)\) blocks in local memory. (4) The failure probability is negligible (in n) for any \({poly}(n)\)-long sequence of lookups, insertions, and deletions which guarantees that at most n elements are ever stored in the data structure.

From any ORAM scheme \(\mathcal {O}\) with address space \(n'\), it is easy to implement the oblivious data-structure \(\mathcal {OD}\): The client simply implements the dictionary data structure from Theorem 3 on top of the ORAM’s logical address space, and uses additional \({{\mathrm{polylog}}}(n)\) local memory for managing this data structure. Dummy accesses can be performed by simply issuing c arbitrary read requests to the ORAM storage. We omit a formal analysis of this construction, which is immediate.
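To give intuition for Theorem 3, here is a minimal two-table cuckoo hash table. This is an illustrative sketch only: the construction of [10] additionally uses a stash and de-amortization to obtain O(1) worst-case accesses and a negligible failure probability, which this toy version does not provide:

```python
import random

class CuckooDict:
    """Toy two-table cuckoo hash table: every key has one candidate
    slot per table, so lookups probe exactly two cells."""
    def __init__(self, size, max_kicks=32):
        self.size, self.max_kicks = size, max_kicks
        self.tables = [[None] * size, [None] * size]
        self.seeds = [random.random(), random.random()]

    def _slot(self, t, key):
        # hash of (seed, key) plays the role of the t-th hash function
        return hash((self.seeds[t], key)) % self.size

    def lookup(self, key):
        for t in (0, 1):
            cell = self.tables[t][self._slot(t, key)]
            if cell is not None and cell[0] == key:
                return cell[1]
        return None

    def insert(self, key, val):
        item, t = (key, val), 0
        for _ in range(self.max_kicks):
            slot = self._slot(t, item[0])
            item, self.tables[t][slot] = self.tables[t][slot], item
            if item is None:
                return True   # placed without (further) eviction
            t = 1 - t         # the evicted item tries its other table
        return False          # eviction cycle: would trigger a rehash
```

Running the lookups over an ORAM, as described above, hides which of the two candidate cells held the key.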

4.2 The Generic OPRAM Protocol

We finally show how to obtain our main generic construction of an oblivious parallel RAM: The server storage consists of m partitions, and the i-th client manages the i-th partition. In particular, client i runs the oblivious dictionary scheme \(\mathcal {OD}\) presented above (we refer to its interface as \(\mathcal {OD}_i\)) on the i-th partition. Here, we assume that the clients have access to the partition map, mapping each address \(\mathsf {a}\in [N]\) to some partition \(\mathsf {partition}[\mathsf {a}]\). (We will discuss in the analysis how to eliminate this sharing using recursion.) In addition, Generic-OPRAM takes care of the communication among clients using the sub-protocols \(\mathsf {OblivElect}\), \(\mathsf {OblivMCast}\), and \(\mathsf {OblivRoute}\) from Sect. 3.2.

We postpone a complexity, correctness, and security analysis to Appendix E, as well as a discussion of the recursive version.

Data Structures. The non-recursive version of Generic-OPRAM keeps a partition map with N entries that maps block addresses \(\mathsf {a}\) to their currently assigned partition \(\mathsf {partition}[\mathsf {a}]\), and that can be accessed by all clients obliviously (i.e., accesses to the partition map are kept secret). Every client additionally keeps a stash \(\mathsf {SS}_i\), which contains at most R items to be inserted into \(\mathcal {OD}_i\). For our analysis to work out, we need \(R = \omega (\log \lambda )\). Also let \(\kappa \ge 2\) be a constant.

Generic OPRAM Protocol. In each iteration, given the logical access requests \(\left( \mathsf{Acc}({\text {op}}_i, \mathsf {a}_i, v_i)\right) _{i \in [m]}\) input to the clients, the m clients go through the following steps (all messages are tacitly encrypted with fresh random coins):

  1. Select block representatives. Run \(\mathsf {OblivElect}\) between the clients with inputs \((\mathsf {a}_i, {\text {op}}_i)_{i \in [m]}\). In the end, each client i knows whether or not it has been selected as the representative for the block with address \(\mathsf {a}_i\).

  2. Query blocks. Clients do one of two things:

    • Real requests. Each representative client i gets the partition index \(p_i = \mathsf {partition}[\mathsf {a}_i]\), and sends a request \(\mathsf {a}_i\) to client \(p_i\). Moreover, it reassigns \(\mathsf {partition}[\mathsf {a}_i] \displaystyle \mathop {\leftarrow }^{\$}[m]\).

    • Fake requests. Every non-representative client i generates a random \(q_i \displaystyle \mathop {\leftarrow }^{\$}[m]\) and sends a request \(\perp \) to client \(q_i\).

  3. Retrieve the blocks. Each client \(p \in [m]\) processes the received requests in some random order: For each request \(\mathsf {a}_i \ne \bot \) received from client i, client p executes \( \mathcal {OD}_{p}(\mathcal {R} \& \mathcal {D}, \mathsf {a}_i)\) and denotes the retrieved block value by \(\overline{v}_i\). If \(\overline{v}_i = \bot \), then there must be an entry \((\mathsf {a}_i, \overline{v}_i')\) in \(\mathsf {SS}_p\); client p deletes this entry and sets \(\overline{v}_i = \overline{v}'_i\). Finally, it sends \(\overline{v}_i\) back to client i. For every \(\bot \) request received from some client i, client p executes the fake read access \( \mathcal {OD}_{p}(\mathcal {R} \& \mathcal {D}, \perp )\) and returns \(\overline{v}_i = \bot \) to client i.

  4. Representatives inform. At the end of the previous step, each client holds a value \(\overline{v}_i\) which is \(\ne \bot \) if and only if i is the representative for \(\mathsf {a}_i\). Next, the m clients run sub-protocol \(\mathsf {OblivMCast}\) to allow each representative to multicast the value it holds to other clients requesting the same block: Concretely, each client i uses input \((\mathsf {a}_i, \overline{v}_i)\) (recall a non-representative has \(\overline{v}_i =\bot \)) and receives output \(\overline{v}'_i\), which is guaranteed to be the value of block \(\mathsf {a}_i\) it requests. Each client i answers its logical request with \(\overline{v}'_i\).

  5. Send updated values. For each representative i such that \(\mathsf{Acc}({\text {op}}_i, \mathsf {a}_i, v_i)\) is a write command, let \(\mathrm {id}_i = \mathsf {partition}[\mathsf {a}_i]\) and \(\mathrm {msg}_i = (\mathsf {a}_i, v_i)\). Otherwise, if it is not a write command (but i is still a representative), it sets \(\mathrm {msg}_i = (\mathsf {a}_i, \overline{v}_i)\) instead. Non-representative clients set \(\mathrm {msg}_i = \bot \) and \(\mathrm {id}_i \displaystyle \mathop {\leftarrow }^{\$}[m]\). Then, the clients run \(\mathsf {OblivRoute}\) with respective inputs \((\mathrm {id}_i, \mathrm {msg}_i)\).

  6. Write back. Each client \(p \in [m]\) adds all pairs \((\mathsf {a}, v)\) received through \(\mathsf {OblivRoute}\) to \(\mathsf {SS}_p\). Then, client p picks the first \(\kappa \) elements from \(\mathsf {SS}_{p}\), and for each such element \((\mathsf {a}, v)\), executes \(\mathcal {OD}_{p}(\mathcal {I}, \mathsf {a}, v)\). If only \(\kappa ' < \kappa \) elements are in \(\mathsf {SS}_p\), then the last \(\kappa - \kappa '\) insertions are dummy insertions \(\mathcal {OD}_{p}(\mathcal {I}, \bot , \bot )\). If at any time the stash \(\mathsf {SS}_p\) needs to store more than R blocks, or the partition holds more than \(2N/m + R\) blocks, output “overflow” and halt.
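The padding logic of the write-back step can be sketched as follows. The function signature is our own abstraction: the two callbacks stand in for \(\mathcal {OD}_{p}(\mathcal {I}, \mathsf {a}, v)\) and the dummy \(\mathcal {OD}_{p}(\mathcal {I}, \bot , \bot )\), so that exactly \(\kappa \) insert operations are visible regardless of how full the stash is:

```python
def write_back(stash, od_insert, od_dummy, kappa, R):
    """Step 6 sketch for one client: pop up to kappa real items from
    the stash and insert them, padding with dummy insertions so the
    number of visible OD insert operations is always exactly kappa."""
    if len(stash) > R:
        raise OverflowError('overflow')  # stash bound exceeded: halt
    for _ in range(kappa):
        if stash:
            a, v = stash.pop(0)   # first element of the stash
            od_insert(a, v)
        else:
            od_dummy()            # pad to kappa operations
```

Padding to a fixed \(\kappa \) is what keeps the server-visible access pattern independent of how many blocks were actually routed to this partition.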