
1 Introduction

Oblivious RAM (ORAM), initially proposed by Goldreich and Ostrovsky [19, 20, 36], is a cryptographic primitive that allows a client to store private data on an untrusted server and maintain obliviousness while accessing that data, i.e., guarantee that the server or any other observer learns nothing about the data or the client’s access pattern (the sequence of addresses or operations) to that data. Since its initial proposal, ORAM has been studied in theory [21, 25, 39, 41, 45, 49] and in various application settings including secure outsourced storage [8, 29, 32, 42, 43, 50], secure processors [10–12, 31, 38, 40, 51] and secure multi-party computation [13, 14, 24, 28, 47, 48].

1.1 Server Computation in ORAM

The ORAM model considered historically, starting with the work of Goldreich and Ostrovsky [19, 20, 36], assumed that the server acts as a simple storage device that allows the client to read and write data to it, but does not perform any computation otherwise. However, in many scenarios investigated by subsequent works [8, 32, 42, 50] (e.g., the setting of remote oblivious file servers), the untrusted server has significant computational power, possibly even much greater than that of the client. Therefore, it is natural to extend the ORAM model to allow for server computation, and to distinguish between the amount of computation performed by the server and the amount of communication with the client.

Indeed, many recent ORAM schemes have implicitly or explicitly leveraged some amount of server computation to either reduce bandwidth cost [1, 7, 13, 14, 29, 32, 39, 43, 52], or reduce the number of online roundtrips [49]. We remark that some prior works [1, 32] call themselves oblivious storage (or oblivious outsourced storage) to distinguish from the standard ORAM model where there is no server computation. We will simply apply the term ORAM to both models, and refer to ORAM with/without server computation to distinguish between the two.

At first, many works implicitly used server computation in ORAM constructions [13, 14, 32, 39, 43, 49, 52], without making a clear definitional distinction from standard ORAM. Apon et al. were the first to observe that such a distinction is warranted [1], not only for the extra rigor, but also because the definition renders the important Goldreich-Ostrovsky ORAM lower bound [20] inapplicable to the server computation setting — as we discuss below.

1.2 Attempts to “Break” the Goldreich-Ostrovsky Lower Bound

Traditionally, ORAM constructions are evaluated by their bandwidth, client storage and server storage. Bandwidth is the amount of communication (in bits) between client/server to serve a client request, including the communication in the background to maintain the ORAM (i.e., ORAM evictions). We also define bandwidth blowup to be bandwidth measured in the number of blocks (i.e., blowup compared to a normal RAM). Client storage is the amount of trusted local memory required at the client side to manage the ORAM protocol and server storage is the amount of storage needed at the server to store all data blocks.

In their seminal work [20], Goldreich and Ostrovsky showed that an ORAM of N blocks must incur \(\varOmega (\log N)\) bandwidth blowup under O(1) blocks of client storage. If we allow the server to perform computation, however, the Goldreich-Ostrovsky lower bound no longer applies with respect to client-server bandwidth [1]. The reason is that the Goldreich-Ostrovsky bound is in terms of the number of operations that must be performed. With server computation, though the number of operations is still subject to the bound, most operations can be performed on the server side without client intervention, making it possible to break the bound in terms of bandwidth between client and server. Since historically bandwidth has been the most important metric and the bottleneck for ORAM, breaking the bound in terms of bandwidth constitutes a significant advance.

However, it turns out that this is not easy. Indeed, two prior works [1, 32] have made endeavors towards this direction using homomorphic encryption. Path-PIR [32] leverages additively homomorphic encryption (AHE) to improve ORAM online bandwidth, but its overall bandwidth blowup is still poly-logarithmic. On the other hand, Apon et al. [1] showed that using a fully homomorphic encryption (FHE) scheme with constant ciphertext expansion, one can construct an ORAM scheme with constant bandwidth blowup. The main idea is that, instead of having the client move data around on the server “manually” by reading and writing to the server, the client can instruct the server to perform ORAM request and eviction operations under an FHE scheme without revealing any data and its movement. While this is a very promising direction, it suffers from the following drawbacks:

  • First, ORAM keeps access patterns private by continuously shuffling memory as data is accessed. This means the ORAM circuit depth that has to be evaluated under FHE depends on the number of ORAM accesses made and can grow unbounded (by which we mean any polynomial amount in N). Therefore, Apon et al. [1] need FHE bootstrapping, which not only requires circular security but also incurs a large performance penalty in practice.

  • Second, with the server performing homomorphic operations on encrypted data, achieving malicious security is difficult. Consequently, most existing works either only guarantee semi-honest security [32, 52], or leveraged powerful tools such as SNARKs to ensure malicious security [1]. However, SNARKs not only require non-standard assumptions [18], but also incur prohibitive cost in practice.

1.3 Our Contributions

With the above observation, the goal of this work is to construct constant bandwidth blowup ORAM schemes from standard assumptions that have practical efficiency and verifiability in the malicious setting. Specifically, we give proofs by construction for the following theorems. Let B be the block size in bits and N the number of blocks in the ORAM.

Theorem 1

(Semi-honest Security Construction). Under the Decisional Composite Residuosity assumption (DCR) or Learning With Errors (LWE) assumption, there exists an ORAM scheme with semi-honest security, O(B) bandwidth, O(BN) server storage and O(B) client storage. To achieve negligible in N probability of ORAM failure and success from best known attacks, our schemes require poly-logarithmic in N block size and server computation.

We use negligible in N security following prior ORAM work but also give asymptotics needed for exact exponential security in Sect. 6.

Looking at the big picture, our DCR-based scheme is the first demonstration of a constant bandwidth blowup ORAM using any additively homomorphic encryption scheme (AHE), as opposed to FHE. Our LWE-based scheme (detailed in the online version [9]) is the first time ORAM has been combined with SWHE/FHE in a way that does not require Gentry’s bootstrapping procedure.

Our next goal is to extend our semi-honest constructions to the malicious setting. In Sect. 5, we will introduce the concept of “abstract server-computation ORAM” which both of our constructions satisfy. Then, we can achieve malicious security due to the following theorem:

Theorem 2

(Malicious Security Construction). With the additional assumption of collision-resistant hash functions, any “abstract server-computation ORAM” scheme with semi-honest security can be compiled into a “verified server-computation ORAM” scheme which has malicious security.

We stress that these are the only required assumptions. We do not need the circular security common in FHE schemes and do not rely on SNARKs for malicious security. We defer formal definitions of server-computation ORAM and malicious security to Appendix A.

Main Ideas. The key technical contributions enabling the above results are:

  • (Sect. 3) An ORAM that, when combined with server computation, has shallow circuit depth, i.e., \(O(\log N)\) over the entire history of all ORAM accesses. This is a necessity for our constructions based on AHE or SWHE, and removes the need for FHE (Gentry’s bootstrapping operations). We view this technique as an important step towards practical constant bandwidth blowup ORAM schemes.

  • (Sect. 5) A novel technique that combines a cut and choose-like idea with an error-correcting code to amplify soundness.

Table 1 summarizes our contributions and compares our schemes with some of the state-of-the-art ORAM constructions.

Practical Efficiency. To show how our results translate to practice, Sect. 6.4 compares our semi-honest AHE-based construction against Path-PIR [32] and Circuit ORAM [47], the best prior schemes with and without server computation, respectively, that match our scheme in client/server storage. The main takeaway is that as block size increases, our construction’s bandwidth approaches 2B. When all three schemes use an 8 MB block size (a proxy for modern image file size), Onion ORAM improves over the bandwidth (in bits) of Circuit ORAM and Path-PIR by \(\mathbf {35}\times \) and \(\mathbf {22}\times \), respectively. For larger block sizes, our improvement increases. We note that in many cases, block size is an application constraint: for applications asking for a large block size (e.g., image sharing), all ORAM schemes will use that block size.

Table 1. Our Contribution. N is the number of blocks. The optimal block size is the data block size needed to achieve the stated bandwidth, and is measured in bits. All schemes have O(B) client storage and O(BN) server storage (both asymptotically optimal) and negligible failure probability in N. Computation measures the number of two-input plaintext gates evaluated per ORAM access. “M” stands for malicious security, and “SH” stands for semi-honest. We set parameters for AHE/SWHE (the Damgård-Jurik and Ring-LWE cryptosystems [3, 6], respectively) to get super-poly in N defense to best known attacks [26, 27]. For derivation of parameters for the SWHE schemes, see the extended version [9].

1.4 Related Work

Recent non-server-computation ORAMs are approaching the Goldreich-Ostrovsky lower bound under O(1) blocks of client storage. Goodrich et al. [21] and Kushilevitz et al. [25] demonstrated \(O(\log ^2 N)\) and \(O(\log ^2 N / \log \log N)\) bandwidth blowup schemes, respectively. Recently, Wang et al. constructed Circuit ORAM [47], which achieves \(\omega (\log N)\) bandwidth blowup.

Many state-of-the-art ORAM schemes or implementations make use of server computation. For example, the SSS construction [42, 43], Burst ORAM [8] and Ring ORAM [39] assumed the server is able to perform matrix multiplication or XOR operations. Path-PIR [32] and subsequent work [7, 52] increased the allowed computation to additively homomorphic encryption. Apon et al. [1] and Gentry et al. [13, 14] further augmented ORAM with Fully Homomorphic Encryption (FHE). Williams and Sion rely on server computation to achieve a single online roundtrip [49]. We remark that the techniques of Gentry et al. [13] and Wang et al. [46], for improving data structure performance on top of ORAM, can be combined with our techniques.

Recent works on Garbled RAM [15, 30] can also be seen as generalizing the notion of server-computation ORAM. However, existing Garbled RAM constructions incur \(\mathsf {poly}(\lambda ) \cdot \mathsf {polylog}(N)\) client work and bandwidth blowup, and therefore Garbled RAM does not give a server-computation RAM with constant bandwidth blowup. Reusable Garbled RAM [16] achieves constant client work and bandwidth blowup, but known reusable garbled RAM constructions rely on non-standard assumptions (indistinguishability obfuscation, or more) and are prohibitive in practice.

The mechanics of running our shallow depth ORAM over a homomorphic encryption scheme are similar to those used to evaluate encrypted branching programs [23]. (One may think of our contribution as formulating ORAM as a shallow enough circuit so that the techniques of [23] apply.)

2 Overview of Techniques

In our schemes, the client “guides” the server to perform ORAM accesses and evictions homomorphically by sending the server some “helper values”. With these helper values, the server’s main job will be to run a sub-routine called the “homomorphic select” operation (select operation for short), which can be implemented using either AHE or SWHE, resulting in two different constructions. We can achieve constant bandwidth blowup because helper value size is independent of data block size: when the block size is sufficiently large, sending helper values does not affect the asymptotic bandwidth blowup. We now explain these ideas along with pitfalls and solutions in more detail. For the rest of the section, we focus on the AHE-based scheme but note that the story with SWHE is very similar.

Building Block: Homomorphic Select Operation. The select operation, which resembles techniques from private information retrieval (PIR) [27], takes as input m plaintext data blocks \(\mathsf {pt}_1,\ldots ,\mathsf {pt}_m\) and encrypted helper values which represent a user-chosen index \(i^*\). The output is an encryption of block \(\mathsf {pt}_{i^*}\). Obviously, the helper values should not reveal \(i^*\).

Our ORAM protocol will need select operations to be performed over the outputs of prior select operations. For this, we require a sequence of AHE schemes \(\mathcal {E}_\ell \) with plaintext space \(\mathbb {L}_\ell \) and ciphertext space \(\mathbb {L}_{\ell +1}\) where \(\mathbb {L}_{\ell +1}\) is again in the plaintext space of \(\mathcal {E}_{\ell +1}\). Each scheme \(\mathcal {E}_\ell \) is additively homomorphic meaning \(\mathcal {E}_\ell (x) \oplus \mathcal {E}_\ell (y) = \mathcal {E}_\ell (x + y)\). We denote an \(\ell \)-layer onion encryption of a message x by \(\mathcal {E}^{\ell }(x) := \mathcal {E}_\ell (\mathcal {E}_{\ell -1}( \ldots \mathcal {E}_1(x) ))\).

Suppose the inputs to a select operation are encrypted with \(\ell \) layers of onion encryption, i.e., \(\mathsf {ct}_{i} = \mathcal {E}^\ell (\mathsf {pt}_{i})\). To select block \(i^*\), the client sends an encrypted select vector (select vector for short), \(\mathcal {E}_{\ell +1}(b_1),\ldots ,\mathcal {E}_{\ell +1}(b_m)\) where \(b_{i^*}=1\) and \(b_i = 0\) for all other \(i\ne i^*\). Using this select vector, the server can homomorphically compute \(\mathsf {ct}^* = \bigoplus _i \mathcal {E}_{\ell +1}\left( b_i \right) \cdot \mathsf {ct}_{i} = \mathcal {E}_{\ell +1}\left( \sum _i b_i \cdot \mathsf {ct}_{i}\right) = \mathcal {E}_{\ell +1}(\mathsf {ct}_{i^*}) = \mathcal {E}^{\ell +1}(\mathsf {pt}_{i^*})\). The result is the selected data block \(\mathsf {pt}_{i^*}\), with \(\ell +1\) layers of onion encryption. Notice that the result has one more layer than the input.
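The first layer of this computation can be sketched with a toy Paillier cryptosystem, a standard AHE scheme used here only as a stand-in (the paper's AHE construction uses Damgård-Jurik). The primes, the function names and the restriction to a single layer over small integer "blocks" are all assumptions made for illustration; the parameters are far too small to be secure.

```python
import math
import random

# Toy Paillier parameters (hypothetical demo primes, NOT secure).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1), private key
MU = pow(LAM, -1, N)  # modular inverse; valid for the generator g = N + 1


def encrypt(m):
    """E(m) = (1 + N)^m * r^N mod N^2 for a random r coprime to N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover m via L(c^LAM mod N^2) * MU mod N, where L(x) = (x - 1) / N."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N * MU) % N


def select(enc_bits, blocks):
    """Server side: compute E(sum_i b_i * blocks[i]) without learning the b_i."""
    acc = 1  # a trivial encryption of 0
    for c, pt in zip(enc_bits, blocks):
        # Raising E(b_i) to the power pt gives E(b_i * pt);
        # multiplying ciphertexts adds the underlying plaintexts.
        acc = (acc * pow(c, pt, N2)) % N2
    return acc


# Client selects index i* = 2 out of m = 4 blocks.
blocks = [11, 22, 33, 44]
enc_bits = [encrypt(1 if i == 2 else 0) for i in range(len(blocks))]
selected = decrypt(select(enc_bits, blocks))
```

In this sketch the ciphertext operation written \(\oplus \) in the text is multiplication modulo \(N^2\), and the scalar product is modular exponentiation. Stacking further layers would require the next scheme's plaintext space to contain these ciphertexts, which plain Paillier with a fixed modulus does not provide.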

All ORAM Operations can be Implemented Using Homomorphic Select Operations. In our schemes, for each ORAM operation, the client reads and writes per-block metadata and creates one or more select vectors based on that metadata. The client then sends the encrypted select vector(s) to the server, which does the heavy work of performing actual computation over block contents.

Specifically, we will build on top of tree-based ORAMs [41, 45], a standard type of ORAM without server computation. Metadata for each block includes its logical address and the path it is mapped to. To request a data block, the client first reads the logical addresses of all blocks along the read path. After this step, the client knows which block to select and can run the homomorphic select protocol with the server. ORAM eviction operations require that the client sends encrypted select vectors to indicate how blocks should percolate down the ORAM tree. As explained above, each select operation adds an encryption layer to the selected block.

Achieving Constant Bandwidth Blowup. To get constant bandwidth blowup, we must ensure that select vector bandwidth is smaller than the data block size. For this, we need several techniques. First, we will split each plaintext data block into C chunks \(\mathsf {pt}_i = (\mathsf {pt}_{i}[1],\ldots ,\mathsf {pt}_{i}[C])\), where each chunk is encrypted separately, i.e., \(\mathsf {ct}_i = (\mathsf {ct}_{i}[1],\ldots ,\mathsf {ct}_{i}[C])\) where \(\mathsf {ct}_{i}[j]\) is an encryption of \(\mathsf {pt}_{i}[j]\). Crucially, each select vector can be reused for all the C chunks. By increasing C, we can increase the data block size to decrease the relative bandwidth of select vectors.

Second, we require that each encryption layer adds a small additive ciphertext expansion (even a constant multiplicative expansion would be too large). Fortunately, we do have well established additively homomorphic encryption schemes that meet this requirement, such as the Damgård-Jurik cryptosystem [6]. Third, the “depth” of the homomorphic select operations has to be bounded and shallow. This requirement is the most technically challenging to satisfy, and we will now discuss it in more detail.

Bounding the Select Operation Depth. We address this issue by constructing a new tree-based ORAM, which we call a “bounded feedback ORAM”. By “feedback”, we refer to the situation where during an eviction some block a gets stuck in its current bucket b. When this happens, an eviction into b needs select operations that take both incoming blocks and block a as input, resulting in an extra layer on bucket b (on top of the layers bucket b already has). The result is that buckets will accumulate layers (with AHE) or ciphertext noise (with SWHE) on each eviction, which grows unbounded over time.

Our bounded feedback ORAM breaks the feedback loop by guaranteeing that bucket b will be empty at public times, which allows upstream blocks to move into b without feedback from blocks already in b. It turns out that breaking this feedback is not trivial: in all existing tree-based ORAM schemes [39, 41, 45, 47], blocks can get stuck in buckets during evictions, which means there is no guarantee on when buckets are empty. We remark that cutting feedback is equivalent to our claim of shallow circuit depth in Sect. 1.3: without cutting feedback, the depth of the ORAM circuit keeps growing with the number of ORAM accesses.

Techniques for Malicious Security. We are also interested in achieving malicious security, i.e., enforcing honest behaviors of the server, while avoiding SNARKs. Our idea is to rely on probabilistic checking, and to leverage an error-correcting code to amplify the probability of detection. As mentioned before, each block is divided into C chunks. We will have the client randomly sample security parameter \(\lambda \ll C\) chunks per block (the same random choice for all blocks), referred to as verification chunks, and use standard memory checking to ensure their authenticity and freshness. On each step, the server will perform homomorphic select operations on all C chunks in a block, and the client will perform the same homomorphic select operations on the \(\lambda \) verification chunks. In this way, whenever the server returns the client some encrypted block, the client can check whether the \(\lambda \) corresponding chunks match the verification chunks.

Unfortunately, the above scheme does not guarantee negligible failure of detection. For example, the server can simply tamper with a random chunk and hope that it’s not one of the verification chunks. Clearly, the server succeeds with non-negligible probability. The fix is to leverage an error-correcting code to encode the original C chunks of each block into \(C' = 2C\) chunks, and ensure that as long as \(\frac{3}{4} C'\) chunks are correct, the block can be correctly decoded. Therefore, the server knows a priori that it will have to tamper with at least \(\frac{1}{4} C'\) chunks to cause any damage at all, in which case it will get caught except with negligible probability.
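The gap between the two attacks can be checked numerically. The sketch below assumes the \(\lambda \) verification chunks are drawn uniformly without replacement from the \(C'\) positions (the sampling details are an assumption, and the function name is hypothetical). It computes the probability that none of the tampered chunks is a verification chunk.

```python
from math import comb


def escape_probability(total_chunks, tampered, verified):
    """Probability that `verified` positions, sampled uniformly without
    replacement from `total_chunks`, all miss the `tampered` positions."""
    if verified > total_chunks - tampered:
        return 0.0  # not enough untampered positions to hide in
    return comb(total_chunks - tampered, verified) / comb(total_chunks, verified)


# Hypothetical parameters: C' = 4096 encoded chunks, lambda = 80 checks.
single = escape_probability(4096, 1, 80)      # tamper with one chunk
quarter = escape_probability(4096, 1024, 80)  # tamper with C'/4 chunks
```

Under these assumptions a single tampered chunk escapes with probability \(1 - \lambda /C'\), which is close to 1, whereas tampering with \(\frac{1}{4} C'\) chunks escapes with probability at most \((3/4)^{\lambda }\), since each successive draw misses the tampered set with probability at most 3/4.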

Table 2. ORAM parameters and notations.

3 Bounded Feedback ORAM

We now present the bounded feedback ORAM, a traditional ORAM scheme without server computation, to illustrate its important features. All notation used throughout the rest of the paper is summarized in Table 2.

3.1 Bounded Feedback ORAM Basics

We build on the tree-based ORAM framework of Shi et al. [41], which organizes server storage as a binary tree of nodes. The binary tree has \(L+1\) levels, where the root is at level 0 and the leaves are at level L. Each node in the binary tree is called a bucket and can contain up to Z data blocks. The leaves are numbered \(0, 1, \ldots , 2^L-1\) in the natural manner. Pseudo-code for our algorithm is given in Fig. 1 and described below.

Note that many parts of our algorithm refer to paths down the tree where a path is a contiguous sequence of buckets from the root to a leaf. For a leaf bucket l, we refer to the path to l as path l or \(\mathcal {P}(l)\). \(\mathcal {P}(l, k)\) denotes the bucket at level \(k \in [0..L]\) on \(\mathcal {P}(l)\). Specifically, \(\mathcal {P}(l, 0)\) denotes the root, and \(\mathcal {P}(l, L)\) denotes the leaf bucket on \(\mathcal {P}(l)\).
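As a concrete reading of this notation, the sketch below identifies each bucket by a pair (level, index within level) and assumes that "the natural manner" means leaves are numbered left to right, so that the bucket at level k on the path to leaf l is given by the top k bits of l. The function names and the pair representation are illustrative choices, not taken from the paper.

```python
def bucket_on_path(leaf, level, L):
    """Return P(leaf, level) as (level, index within that level) for a tree
    with levels 0..L and leaves numbered 0..2^L - 1 from left to right."""
    if not (0 <= level <= L and 0 <= leaf < 2 ** L):
        raise ValueError("leaf or level out of range")
    # Dropping the low (L - level) bits of the leaf label leaves the index
    # of its ancestor at the requested level.
    return (level, leaf >> (L - level))


def path(leaf, L):
    """Return P(leaf): the buckets from the root (level 0) to the leaf."""
    return [bucket_on_path(leaf, k, L) for k in range(L + 1)]


example = path(5, 3)  # leaf 5 = 0b101 in a tree with L = 3
```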

Main Invariant. As in all tree-based ORAMs, each block is associated with a random path, and a block may only reside in a bucket along that path at any time. The client stores the path associated with each block in a local position map.

Recursion. To avoid incurring a large amount of client storage, the position map should be recursively stored in other smaller ORAMs [41]. When the data block size is \(\varOmega (\log ^2N)\) for an N element ORAM—which will be the case for all of our final parameterizations—the asymptotic costs of recursion (in terms of server storage or bandwidth blowup) are insignificant relative to the main ORAM [44]. Thus, for the remainder of the paper, we no longer consider the bandwidth cost of recursion.

Metadata. To enable all ORAM operations, each block of data in the ORAM tree is stored alongside its address and leaf label (the path the block is mapped to). This metadata is encrypted using a semantically secure encryption scheme.

ORAM Request. Requesting a block with address a (\(\mathsf {ReadPath}\) in Fig. 1) is similar to most tree-based ORAMs: look up the position map to obtain the path block a is currently mapped to, read all blocks on that path to find block a, invalidate block a, remap it to a new random path and add it to the root bucket. This involves decrypting the address metadata of every block on the path (Line 13) and setting one address to \(\bot \) (Line 15). All addresses must then be re-encrypted to hide which block was invalidated.

ORAM Eviction. The goal of eviction is to percolate blocks towards the leaves to avoid bucket overflows and it is this procedure where we differ from existing tree-based ORAMs [13, 39, 41, 45, 47]. We now describe our eviction procedure in detail.

Fig. 1. Bounded Feedback ORAM (no server computation). Note that our construction differs from the original tree ORAM [41] only in the \(\mathsf {Evict}\) procedure. We split \(\mathsf {Evict}\) into \(\mathsf {EvictAlongPath}\) to simplify the presentation later.

3.2 New Triplet Eviction Procedure

We combine techniques from [13, 39, 41] to design a novel eviction procedure (\(\mathsf {Evict}\) in Fig. 1) that enables us to break select operation feedback.

Triplet Eviction on a Path. Similar to other Tree ORAMs, eviction is performed along a path. To perform an eviction, for every non-leaf bucket \(\mathcal {P}(l_e, k)\) (k from 0 to \(L-1\), i.e., from the root down to the parent of the leaf), we move blocks from \(\mathcal {P}(l_e, k)\) to its two children. Specifically, each block in \(\mathcal {P}(l_e, k)\) moves to either the left or right child bucket depending on which move keeps the block on the path to its leaf (this can be determined by comparing the block’s leaf label to \(l_e\)). We call this process a bucket-triplet eviction.

In each of these bucket-triplet evictions, we call \(\mathcal {P}(l_e, k)\) the source bucket, the child bucket also on \(\mathcal {P}(l_e)\) the destination bucket, and the other child the sibling bucket. A crucial change that we make to the eviction procedure of the original binary-tree ORAM [41] is that we move all the blocks in the source bucket to its two children.
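The procedure can be sketched on plaintext metadata as follows. This is a minimal illustration, not the pseudo-code of Fig. 1: buckets are keyed by (level, index), blocks are (address, leaf label) pairs, and the helper names are hypothetical. It assumes leaves are numbered left to right so that a block's child bucket is given by the leading bits of its leaf label.

```python
def empty_tree(L):
    """Create an empty tree with levels 0..L, keyed by (level, index)."""
    return {(k, i): [] for k in range(L + 1) for i in range(2 ** k)}


def evict_along_path(tree, l_e, L, Z):
    """Run bucket-triplet evictions for source levels 0..L-1 on path l_e."""
    for k in range(L):
        src = (k, l_e >> (L - k))  # source bucket P(l_e, k)
        for block in tree[src]:
            _, leaf = block
            # The child that keeps the block on the path to its own leaf;
            # this is either the destination or the sibling bucket.
            child = (k + 1, leaf >> (L - k - 1))
            tree[child].append(block)
            if len(tree[child]) > Z:
                raise OverflowError("bucket overflow: ORAM failure")
        tree[src] = []  # every block has left the source bucket


L, Z = 2, 4
tree = empty_tree(L)
tree[(0, 0)] = [("a", 0), ("b", 3), ("c", 1)]
evict_along_path(tree, 0, L, Z)
```

After this eviction along path 0, the root and bucket (1, 0) are empty, block b waits in the sibling bucket (1, 1), and blocks a and c sit in leaves 0 and 1.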

Fig. 2. The reverse lexicographical eviction order. Black buckets indicate those on each eviction path and G is the eviction count from Fig. 1. As indicated in Fig. 1, the eviction paths corresponding to \(G=4\) and \(G=0\) are equal: the exact eviction sequence shown above cycles forever. We mark the eviction path edges as 0/1 (goto left child = 0, right child = 1) to illustrate that the eviction path equals G in reverse binary representation.

Eviction Frequency and Order. For every A (a parameter proposed in [39], which we will set later) ORAM requests, we select the next path to evict based on the reverse lexicographical order of paths (proposed in [13] and illustrated in Fig. 2). The reverse lexicographical order eviction most evenly and deterministically spreads out the eviction on all paths in the tree. Specifically, a bucket at level k will get evicted exactly every \(A \cdot 2^k\) ORAM requests.
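A minimal sketch of this order, assuming leaves are numbered left to right and that the eviction count G wraps around modulo \(2^L\) (the function name is hypothetical): the leaf label of the G-th eviction path is obtained by reversing the L low-order bits of G.

```python
def eviction_leaf(G, L):
    """Leaf label of the G-th eviction path: the L low-order bits of G,
    read in reverse (the first edge from the root is G's lowest bit)."""
    g = G % (2 ** L)
    leaf = 0
    for _ in range(L):
        leaf = (leaf << 1) | (g & 1)
        g >>= 1
    return leaf


order_small = [eviction_leaf(G, 2) for G in range(5)]  # tree with 4 leaves
order_large = [eviction_leaf(G, 3) for G in range(8)]  # tree with 8 leaves
```

Because the bucket at level k on the eviction path depends only on the k low-order bits of G, a given level-k bucket lies on the eviction path once every \(2^k\) evictions, which matches the \(A \cdot 2^k\) request period stated above.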

Setting Parameters for Bounded Feedback. As mentioned, we require that during a bucket-triplet eviction, all blocks in the source bucket move to the two child buckets. The last step to achieve bounded feedback is to show that child buckets will have enough room to receive the incoming blocks, i.e., no child bucket should ever overflow except with negligible probability. (If any bucket overflows, we have experienced ORAM failure.) We guarantee this property by setting the bucket size Z and the eviction frequency A properly. According to the following lemma, if we simply set \(Z=A=\varTheta (\lambda )\), the probability that a bucket overflows is \(2^{-\varTheta (\lambda )}\), exponentially small.

Lemma 1

(No Bucket Overflows). If \(Z \ge A\) and \(N \le A \cdot 2 ^ {L-1}\), the probability that a bucket overflows after an eviction operation is bounded by \(e^{-\frac{(2Z-A)^2}{6A}}\).
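As a quick check of how the bound behaves under the parameter choice \(Z=A\) discussed above, substituting into the expression of Lemma 1 gives

\[
e^{-\frac{(2Z-A)^2}{6A}} \Big |_{Z=A} \;=\; e^{-\frac{A^2}{6A}} \;=\; e^{-A/6} \;=\; 2^{-A/(6\ln 2)},
\]

which is \(2^{-\varTheta (\lambda )}\) when \(A=\varTheta (\lambda )\). For instance, the illustrative choice \(Z=A=120\) (not a parameter taken from the paper) yields a per-eviction overflow bound of \(e^{-20}\).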

The proof of Lemma 1 relies on a careful analysis of the stochastic process stipulated by the reverse lexicographic ordering of eviction, and boils down to a Chernoff bound. We defer the full proof to Appendix B.1. Now, Lemma 1 with \(Z=A=\varTheta (\lambda )\) immediately implies the following key observation.

Fig. 3. ORAM tree state immediately after each of a sequence of four evictions. After an eviction, the buckets on the eviction path (excluding the leaves) are guaranteed to be empty. Further, at the start of each eviction, each sibling bucket for that eviction is guaranteed to be empty. Notations: Assume the ORAM tree has more levels (not shown for simplicity). The eviction path is marked with arrows. The dotted boxes indicate bucket triplets during each eviction.

Observation 1

(Empty Source Bucket). After a bucket-triplet eviction, the source bucket is empty.

Furthermore, straightforwardly from the definition of reverse lexicographical order, we have,

Observation 2

In reverse-lexicographic order eviction, each bucket rotates between the following roles in the following order: source, sibling, and destination.

These observations together guarantee that buckets are empty at public and pre-determined times, as illustrated in Fig. 3.

Towards Bounded Feedback. The above two observations are the keys to achieving bounded feedback. An empty source bucket b will be a sibling bucket the next time it is involved in a triplet eviction. So select operations that move blocks into b do not get feedback from b itself. Thus, the number of encryption layers (with AHE) or ciphertext noise (SWHE) becomes a function of previous levels in the tree only, which we can tightly bound later in Lemma 2 in Sect. 4.3.

Constant Server Storage Blowup. We note that under our parameter setting \(N \le A \cdot 2 ^ {L-1}\) and \(Z=A\), our bounded feedback ORAM’s server storage is \(O(2^{L+1} \cdot Z \cdot B)=O(BN)\), a constant blowup.
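The constant can be made explicit under two additional assumptions that are not stated in the text: \(N > A\), and L is chosen as the smallest value satisfying \(N \le A \cdot 2^{L-1}\). Then \(2^{L-2} < N/A\), and since the tree has \(2^{L+1}-1\) buckets of Z blocks each,

\[
(2^{L+1}-1) \cdot Z \cdot B \;<\; 8 \cdot 2^{L-2} \cdot Z \cdot B \;<\; 8 \cdot \frac{N}{A} \cdot Z \cdot B \;=\; 8BN \quad \text{for } Z=A .
\]

This counts only the plaintext block capacity; ciphertext expansion from encryption layers is treated separately in Sect. 4.4.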

4 Semi-honest Onion ORAM with an Additively Homomorphic Encryption

In this section, we describe how to leverage an AHE scheme with additive ciphertext expansion to transform our bounded feedback ORAM into our semi-honest secure Onion ORAM scheme. First, we detail the homomorphic select operation that we introduced in Sect. 2.

4.1 Additively Homomorphic Select Sub-protocol

Suppose the client wishes to select the \(i^*\)-th block from m blocks denoted \(\mathsf {ct}_1, \ldots , \mathsf {ct}_m\), each with \(\ell _1, \ldots , \ell _m\) layers of encryption respectively. The sub-protocol works as follows:

  1. Let \(\ell := \max (\ell _1, \ldots , \ell _m)\). The client creates and sends to the server the following encrypted select vector \(\langle \mathcal {E}_{\ell +1}(b_1), \mathcal {E}_{\ell +1}(b_2), \ldots , \mathcal {E}_{\ell +1}(b_m) \rangle \), where \(b_{i^*}=1\) and \(b_i=0\) for \(i\ne i^*\).

  2. The server “lifts” each block to an \(\ell \)-layer ciphertext by repeatedly re-encrypting it until it has \(\ell \) layers: \(\mathsf {ct}'_i[j]=\mathcal {E}_{\ell }(\mathcal {E}_{\ell -1}(\ldots \mathcal {E}_{\ell _i+1}(\mathsf {ct}_i[j])))\).

  3. The server evaluates the homomorphic select operation on the lifted blocks: \(\mathsf {ct}_{out}[j] := \bigoplus _{i} \left( \mathcal {E}_{\ell +1}(b_i) \otimes \mathsf {ct}'_i[j] \right) = \mathcal {E}_{\ell +1} (\mathsf {ct}'_{i^*}[j]).\) The outcome is the selected block \(\mathsf {ct}_{i^*}\) with \(\ell +1\) layers of encryption.

As mentioned in Sect. 2, we divide each block into C chunks. Each chunk is encrypted separately. All C chunks share the same select vector; therefore, each encrypted element of the select vector only needs to be about as large as a chunk ciphertext (instead of a whole block ciphertext).

We stress again that every time a homomorphic select operation is performed, the output block gains an extra layer of encryption, on top of \(\ell =\max (\ell _1, \ldots , \ell _m)\) onion layers. This poses the challenge of bounding onion encryption layers, which we address in Sect. 4.3.
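The layer bookkeeping of the sub-protocol can be modeled symbolically. The sketch below performs no cryptography: a hypothetical `Onion` record only tracks how many layers a block carries and holds its chunks in the clear, and the select bits are likewise unencrypted. It is meant to show the lifting step, the reuse of one select vector across all chunks, and the extra layer on the output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Onion:
    """Symbolic stand-in for a chunked onion ciphertext (no real crypto)."""
    layers: int    # number of encryption layers on the block
    chunks: tuple  # the C chunks, kept as plain integers in this model


def lift(ct, target):
    """Model re-encrypting a block until it carries `target` layers."""
    if ct.layers > target:
        raise ValueError("lifting cannot remove layers")
    return Onion(target, ct.chunks)


def select(blocks, bits):
    """Model the select sub-protocol: lift all inputs to the maximum layer
    count, apply the same select vector to every chunk, add one layer."""
    if len(blocks) != len(bits):
        raise ValueError("select vector length must match the block count")
    top = max(ct.layers for ct in blocks)
    lifted = [lift(ct, top) for ct in blocks]
    num_chunks = len(lifted[0].chunks)
    out = tuple(
        sum(b * ct.chunks[j] for b, ct in zip(bits, lifted))
        for j in range(num_chunks)
    )
    return Onion(top + 1, out)


inputs = [Onion(1, (1, 2)), Onion(3, (3, 4)), Onion(2, (5, 6))]
result = select(inputs, [0, 0, 1])
```

Here the selected block entered with 2 layers but leaves with 4, because the output always carries one more layer than the deepest input.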

4.2 Detailed Protocol

We now describe the detailed protocol. Recall that each block is tagged with the following metadata: the block’s logical address and the leaf it is mapped to, and that the size of the metadata is independent of the block size.

Initialization. The client runs a key generation routine for all layers of encryption, and gives all public keys to the server.

Read Path. \(\mathsf {ReadPath}(l,a)\) from Sect. 3.1 can be done with the following steps:

  1. Client downloads and decrypts the addresses of all blocks on path l, locates the block of interest a, and creates a corresponding select vector \({\varvec{b}} \in \{0, 1\}^{{Z(L+1)}}\).

  2. Client and server run the homomorphic select sub-protocol with client’s input being encryptions of each element in \({\varvec{b}}\) and server’s input being all encrypted blocks on path l. The outcome of the sub-protocol, block a, is sent to the client.

  3. Client re-encrypts and writes back the addresses of all blocks on path l, with block a now invalidated. This removes block a from the path without revealing its location. Then, the client re-encrypts block a (possibly modified) under 1 layer, and appends it to the root bucket.
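The client-side metadata handling in steps 1 and 3 can be sketched as below. The flat list of slot addresses, the use of `None` for the invalid address \(\bot \), and the function names are assumptions for illustration; encryption of the resulting bits and addresses is omitted.

```python
def read_path_select_vector(path_addresses, a):
    """Build the 0/1 select vector over the Z*(L+1) slots on the path.
    `path_addresses` holds the decrypted address of each slot, with None
    standing in for an empty or invalidated slot."""
    b = [1 if addr == a else 0 for addr in path_addresses]
    if sum(b) != 1:
        raise LookupError("block must appear exactly once on its path")
    return b


def invalidate(path_addresses, a):
    """Return the address list to re-encrypt and write back, with block a
    marked invalid so that it is logically removed from the path."""
    return [None if addr == a else addr for addr in path_addresses]


# Hypothetical path with Z = 2 and L = 2, i.e. 6 slots.
addresses = [None, 17, 4, None, 9, None]
vector = read_path_select_vector(addresses, 4)
written_back = invalidate(addresses, 4)
```

Every entry of the vector and every address is encrypted afresh before being sent, so the server sees the same amount of data regardless of which slot held block a.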

Eviction. To perform \(\mathsf {EvictAlongPath}(l_e)\), do the following for each level k from 0 to \(L-1\):

  1. Client downloads all the metadata (addresses and leaf labels) of the bucket triplet. Based on the metadata, the client determines each block’s location after the bucket-triplet eviction.

  2. For each slot to be written in the two child buckets:

    • Client creates a corresponding select vector \({\varvec{b}} \in \{0, 1\}^{2Z}\).

    • Client and server run the homomorphic select sub-protocol with the client’s input being encryptions of each element in \({\varvec{b}}\), and the server’s input being the child bucket (being written to) and its parent bucket. Note that if the child bucket is empty due to Observation 1 (which is public information to the server), it conceptually has zero encryption layers.

    • Server overwrites the slot with the outcome of the homomorphic select sub-protocol.
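As a rough sketch of step 2, the client can derive one select vector per destination slot from its eviction plan. The ordering of the \(2Z\) candidates (child slots first, then parent slots) and the `plan` dictionary are assumptions made for this illustration; the protocol itself does not fix them.

```python
from typing import Dict, List


def eviction_select_vectors(plan: Dict[int, int], Z: int) -> Dict[int, List[int]]:
    """Build one select vector in {0,1}^(2Z) per child slot being written.

    plan maps a destination slot in the child bucket to the index of its
    source among the 2Z candidates: indices 0..Z-1 are the child's current
    slots and Z..2Z-1 are the parent's slots (an assumed ordering).
    """
    vectors = {}
    for dest_slot, source in plan.items():
        if not (0 <= dest_slot < Z and 0 <= source < 2 * Z):
            raise ValueError("slot index out of range")
        vectors[dest_slot] = [1 if i == source else 0 for i in range(2 * Z)]
    return vectors
```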

4.3 Bounding Layers

Given the above protocol, we bound layers with the following lemma:

Lemma 2

Any block at level \(k \in [0..L]\) has at most \(2k + 1\) encryption layers.

The proof of Lemma 2 is deferred to Appendix B.2. The key intuition for the proof is that due to the reverse-lexicographic eviction order, each bucket will be written to exactly twice (i.e., be a destination or sibling bucket) before being emptied (as a source bucket). Also in Appendix B.2, we introduce a further optimization called the “copy-to-sibling” optimization, which yields a tighter bound: blocks at level \(k \in [0..L]\) will have only \(k + 1\) layers.

Eviction Post-processing—Peel off Layers in Leaf. The proof only applies to non-leaf buckets: blocks can stay inside a leaf bucket for an unbounded amount of time. Therefore, we need the following post-processing step for leaf nodes. After \(\mathsf {EvictAlongPath}(l_e)\), the client downloads all blocks from the leaf node, peels off the encryption layers, and writes them back to the leaves as layer-\(\varTheta (L)\) re-encrypted ciphertexts (meeting the same layer bound as other levels). Since the client performs an eviction every A ORAM requests, and each leaf bucket has size \(Z=A\), this incurs only O(1) amortized bandwidth blowup.

4.4 Remarks on Cryptosystem Requirements

Let \(L'\) be the layer bound (derived in Sect. 4.3). For efficiency (in bandwidth for the overall protocol) we require the output of an arbitrary select operation performed during an ORAM request (note that \(\ell = L'\) in this case) to be at most a constant factor larger than the block size B. Since \(L'=\omega (1)\), this implies we need additive blowup per encryption layer, independent of \(L'\). One cryptosystem that satisfies the above requirement, for appropriate parameters, is the Damgård-Jurik cryptosystem (Sect. 6.2). We use this scheme to derive final parameters for the AHE construction in Sect. 6.

5 Security Against Fully Malicious Server

So far, we have seen an ORAM scheme that achieves security against an honest-but-curious server who follows the protocol correctly. We now show how to extend this to get a scheme that is secure against a fully malicious server who can deviate arbitrarily from the protocol.

5.1 Abstract Server-Computation ORAM

We start by describing several abstract properties of the Onion ORAM scheme from the previous section. We will call any server-computation ORAM scheme satisfying these properties an abstract server-computation ORAM.

Data Blocks and Metadata. The server storage consists of two types of data: data blocks and metadata. The server performs computation on data blocks, but never on metadata. The client reads and writes the metadata directly, so the metadata can be encrypted under any semantically secure encryption scheme.

Operations on Data Blocks. Following the notations in Sect. 2, each plaintext data block is divided into C chunks, and each chunk is separately encrypted \(\mathsf {ct}_i = (\mathsf {ct}_i[1],\ldots ,\mathsf {ct}_i[C])\). The client operates on the data blocks either by: (1) directly reading/writing an encrypted data block, or (2) instructing the server to apply a function f to form a new data block \(\mathsf {ct}_i\), where \(\mathsf {ct}_i[j]\) only depends on the j-th chunk of other data blocks, i.e., \(\mathsf {ct}_{i}[j] = f(\mathsf {ct}_1[j], \ldots , \mathsf {ct}_m[j])\) for all \(j \in [1..C]\).

It is easy to check that the two Onion ORAM schemes are instances of the above abstraction. The metadata consists of the encrypted addresses and leaf labels of each data block, as well as additional space needed to implement ORAM recursion. The data blocks are encrypted under either a layered AHE scheme or a SWHE scheme. Function f is a “homomorphic select operation”, and is applied to each chunk.

5.2 Semi-honest to Malicious Compiler

We now describe a generic compiler that takes any “abstract server-computation ORAM” that satisfies honest-but-curious security and compiles it into a “verified server-computation ORAM” which is secure in the fully malicious setting.

Verifying Metadata. We can use standard “memory checking” [2] schemes based on Merkle trees [33] to ensure that the client always gets the correct metadata, or aborts if the malicious server ever sends an incorrect value. A generic use of Merkle tree would add an \(O(\log N)\) multiplicative overhead to the process of accessing metadata [29], which is good enough for us. This \(O(\log N)\) overhead can also be avoided by aligning the Merkle tree with the ORAM tree [38], or using generic authenticated data structures [34]. In any case, verifying metadata is basically free in Onion ORAM.

Challenge of Verifying Data Blocks. Unfortunately, we cannot rely on standard memory checking to protect the encrypted data blocks when the client doesn’t read/write them directly but rather instructs the server to compute on them. The problem is that a malicious server can learn some information about the client’s access pattern based on whether the client aborts or not.

Consider Onion ORAM for example. The malicious server wants to learn if, during the homomorphic select operation of an ORAM request, the location being selected is i. The server can perform the operation correctly except that it would replace the ciphertext at position i with some incorrect value. In this case, if the location being selected was indeed i then the client will abort since the data it receives will be incorrect, but otherwise the client will accept. This violates ORAM’s privacy requirement.

A more general way to see the problem is to notice that the client’s abort decision above depends on the decrypted value, which depends on the secret key of the homomorphic encryption scheme. Therefore, we can no longer rely on the semantic security of the encryption scheme if the abort decision is revealed to the server. To fix this problem, we need to ensure that the client’s abort decision only depends on ciphertext and not on the plaintext data.

Verifying Data Blocks. For our solution, the client selects a random subset S consisting of \(\lambda \) chunk positions. This set S is kept secret from the server. The subset of chunks in positions \(\{j: j \in S\}\) of every encrypted data block are treated as additional metadata, which we call the “verification chunks”. Verification chunks are encrypted and memory checked in the same way as the other metadata. Whenever the client instructs the server to update an encrypted data block, the client performs the same operation himself on the verification chunks. Then, when the client reads an encrypted data block from the server, he can check the chunks in S against the ciphertexts of verification chunks. This check ensures that the server cannot modify too many chunks without getting caught. To ensure that this check is sufficient, we apply an error-correcting code which guarantees that the server has to modify a large fraction of chunks to affect the plaintext. In more detail:

  • Every plaintext data block \(\mathsf {pt}= (\mathsf {pt}[1],\ldots ,\mathsf {pt}[C])\) is first encoded via an error-correcting code into a codeword block \(\mathsf {pt\_ecc}= \mathsf {ECC}(\mathsf {pt}) = (\mathsf {pt\_ecc}[1],\ldots ,\mathsf {pt\_ecc}[C'])\). The error-correcting code \(\mathsf {ECC}\) has a rate \(C/C' = \alpha <1\) and can efficiently recover the plaintext block if at most a \(\delta \)-fraction of the codeword chunks are erroneous. For concreteness, we can use a Reed-Solomon code, and set \(\alpha = \frac{1}{2}, \delta = (1-\alpha )/2 = \frac{1}{4}\). The client then uses the “abstract server-computation ORAM” over the codeword blocks \(\mathsf {pt\_ecc}\) (instead of \(\mathsf {pt}\)).

  • During initialization, the client selects a secret random set \(S = \{s_1,\ldots ,s_\lambda \} \subseteq [C']\). Each ciphertext data block \(\mathsf {ct}_i\) has verification chunks \(\mathsf {verCh}_i = (\mathsf {verCh}_i[1],\ldots ,\mathsf {verCh}_i[\lambda ])\). We ensure the invariant that, during an honest execution, \(\mathsf {verCh}_i[j] = \mathsf {ct}_i[s_j]\) for \(j \in [1..\lambda ]\).

  • The client uses a memory checking scheme to ensure the authenticity and freshness of the metadata including the verification chunks. If the client detects a violation in metadata at any point, the client aborts (we call this \(\mathsf {abort}_0\)).

  • Whenever the client directly updates or instructs the server to apply the aforementioned function f on an encrypted data block \(\mathsf {ct}_i\), it also updates or applies the same function f on the corresponding verification chunks \(\mathsf {verCh}_{i}[j]\) for \(j \in [1..\lambda ]\), which possibly involves reading other verification chunks that are input to f.

  • When the client reads an encrypted data block \(\mathsf {ct}_i\), it also reads \(\mathsf {verCh}_i\) and checks that \(\mathsf {verCh}_i[j] = \mathsf {ct}_i[s_j]\) for each \(j \in [1..\lambda ]\) and aborts if this is not the case (we call this \(\mathsf {abort}_1\)). Otherwise the client decrypts \(\mathsf {ct}_i\) to get \(\mathsf {pt\_ecc}_i\) and performs error-correction to recover \(\mathsf {pt}_i\). If the error-correction fails, the client aborts (we call this \(\mathsf {abort}_2\)).

If the client ever aborts during any operation with \(\mathsf {abort}_0,\mathsf {abort}_1\) or \(\mathsf {abort}_2\), it refuses to perform any future operations. This completes the compiler which gives us Theorem 2.
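The bookkeeping of the compiler can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the construction itself: ciphertext chunks are modelled as plain integers, chunk positions are 0-based, memory checking and error correction are omitted, and all function names are hypothetical. The function `f` stands in for the chunk-wise homomorphic select.

```python
import secrets
from typing import Callable, List, Sequence

Chunk = int  # stand-in for one ciphertext chunk


def sample_secret_positions(num_chunks: int, lam: int) -> List[int]:
    """Pick the secret set S of lam distinct chunk positions (0-based)."""
    return secrets.SystemRandom().sample(range(num_chunks), lam)


def verification_chunks(ct: Sequence[Chunk], S: Sequence[int]) -> List[Chunk]:
    """verCh[j] = ct[s_j], the invariant kept during an honest execution."""
    return [ct[s] for s in S]


def apply_chunkwise(f: Callable[..., Chunk],
                    blocks: Sequence[Sequence[Chunk]]) -> List[Chunk]:
    """Apply f position by position: out[j] = f(blocks[0][j], ..., blocks[m-1][j])."""
    return [f(*column) for column in zip(*blocks)]


def passes_abort1_check(ct: Sequence[Chunk], ver_ch: Sequence[Chunk],
                        S: Sequence[int]) -> bool:
    """True if every verification chunk matches; False corresponds to abort_1."""
    return all(ver_ch[j] == ct[s] for j, s in enumerate(S))
```

In this sketch the server would call `apply_chunkwise` on the full blocks, while the client calls it only on the verification chunks of the same input blocks and later compares the two with `passes_abort1_check`. The comparison involves ciphertexts only, which is the property the compiler relies on.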

Security Intuition. Notice that in the above scheme, the decision whether \(\mathsf {abort}_1\) occurs does not depend on any secret state of the abstract server-computation ORAM scheme, and therefore can be revealed to the server without sacrificing privacy. We will argue that, if \(\mathsf {abort}_1\) does not occur, then the client retrieves the correct data (so \(\mathsf {abort}_2\) will not occur) with overwhelming probability. Intuitively, the only way that a malicious server can cause the client to either retrieve the incorrect data or trigger \(\mathsf {abort}_2\) without triggering \(\mathsf {abort}_1\) is to modify at least a \(\delta \) (by default, \(\delta = 1/4\)) fraction of the chunks in an encrypted data block, but avoid modifying any of the \(\lambda \) chunks corresponding to the secret set S. This happens with probability at most \((1- \delta )^\lambda \) over the random choice of S, which is negligible. The complete proof is given in Appendix B.3.
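To get a feel for the bound \((1-\delta )^\lambda \), the following Python sketch evaluates it and finds the smallest \(\lambda \) that meets a chosen target. The helper names and the target expressed in bits are assumptions made for illustration; the paper only requires the bound to be negligible.

```python
import math


def evasion_bound(delta: float, lam: int) -> float:
    """Upper bound (1 - delta)^lam on corrupting a delta-fraction undetected."""
    return (1.0 - delta) ** lam


def min_lambda(delta: float, target_bits: int) -> int:
    """Smallest lam with (1 - delta)^lam <= 2^(-target_bits)."""
    return math.ceil(target_bits / -math.log2(1.0 - delta))
```

For the default \(\delta = 1/4\) and an illustrative target of \(2^{-80}\), `min_lambda(0.25, 80)` returns 193, since each secret position contributes about \(\log _2(4/3) \approx 0.415\) bits.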

6 Optimizations and Analysis

In this section we present two optimizations, an asymptotic analysis and a concrete (with constants) analysis for our AHE-based protocol.

6.1 Optimizations

Hierarchical Select Operation and Sorting Networks. For simplicity, we have discussed select operations as inner products between the data vector and the coefficient vector. As an optimization, we may use the Lipmaa construction [27] to implement select hierarchically as a tree of d-to-1 select operations for a constant d (say \(d=2\)). In that case, for a given 1 out of Z selection, \({\varvec{b}}^{\mathsf {hier}}\in \{0,1\}^{\log Z}\). Eviction along a path requires \(O(\log N)\) bucket-triplet operations, each of which is a Z-to-Z permutation. To implement an arbitrary Z-to-Z permutation, we can use the Beneš sorting network, which consists of a total of \(O(Z\log Z)\) 2-to-1 select operations per triplet.
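The structure of a hierarchical select for \(d=2\) can be sketched over plaintext integers as follows. This only illustrates the tree of 2-to-1 selects driven by \(\log Z\) index bits; it is not the Lipmaa construction itself, in which every 2-to-1 select is evaluated homomorphically under encrypted bits. The function names and the least-significant-bit-first convention are assumptions.

```python
from typing import List


def select2(bit: int, x0: int, x1: int) -> int:
    """2-to-1 select written as arithmetic: returns x0 if bit == 0, else x1."""
    return (1 - bit) * x0 + bit * x1


def hierarchical_select(bits: List[int], items: List[int]) -> int:
    """Select one of len(items) = 2**len(bits) items using a tree of 2-to-1 selects.

    bits[0] is the least significant index bit and is consumed at the first
    (bottom) tree level; each level halves the number of candidates.
    """
    if len(items) != 2 ** len(bits):
        raise ValueError("need exactly 2**len(bits) items")
    level = list(items)
    for bit in bits:
        level = [select2(bit, level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Selecting index 5 out of 8 items uses the bits `[1, 0, 1]`, so the client supplies 3 coefficients instead of 8.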

At the same time, both the hierarchical select and the Beneš network add \(\varTheta (\log Z)\) layers to the output as opposed to a single layer. Clearly, this makes the layer bound from Lemma 2 increase to \(\varTheta (\log Z \log N)\). But we can set related parameters larger to compensate.

Permuted Buckets. Observe that on a request operation, the client and the server need to run a homomorphic select protocol among \(O(\lambda \log N)\) blocks. We can reduce this number to \(O(\lambda )\) using the permuted bucket technique from Ring ORAM [39] (similar ideas were used in hierarchical ORAMs [20]). Instead of reading all slots along the tree path during each read, we can randomly permute blocks in each bucket and only read/remove a block at a random-looking slot (out of \(Z=\varTheta (\lambda )\) slots) per bucket. Each random-looking location will either contain the block of interest or a dummy block. We must ensure that no bucket runs out of dummies before the next eviction refills that bucket’s dummies. Given our reverse-lexicographic eviction order, a simple Chernoff bound shows that adding \(\varTheta (A)=\varTheta (\lambda )\) dummies, which increases bucket size by a constant factor, is sufficient to ensure that dummies do not run out except with probability \(2^{-\varTheta (\lambda )}\). We do not permute the root bucket since it will require additional techniques (and does not give much benefit). Therefore, a read path selects among \(O(Z + \log N) = O(\lambda + \log N) = O(\lambda )\) blocks.

6.2 Damgård-Jurik Cryptosystem

We implement our AHE-based protocol over the Damgård-Jurik cryptosystem [6], a generalization of Paillier’s cryptosystem [37]. Both schemes are based on the hardness of the decisional composite residuosity assumption. In this system, the public key \(\mathsf {pk}=n=pq\) is an RSA modulus (p and q are two large, random primes) and the secret key \(\mathsf {sk}=\mathsf {lcm}(p-1,q-1)\). In the terminology from our onion encryptions, \(\mathsf {sk}_i,\mathsf {pk}_i=\mathcal {G}_i()\) for \(i\ge 0\).

We denote the integers mod n as \(\mathbb {Z}_n\). The plaintext space for the i-th layer of the Damgård-Jurik cryptosystem encryption, \(\mathbb {L}_i\), is \(\mathbb {Z}_{n^{s_0+i}}\) for some user specified choice of \(s_0\). The ciphertext space for this layer is \(\mathbb {Z}_{n^{s_0+i+1}}\). Thus, we clearly have the property that ciphertexts are valid plaintexts in the next layer. An interesting property that immediately follows is that if \(s_0 = \varTheta (i)\), then \(|\mathbb {L}_{i}|/|\mathbb {L}_{0}|\) is a constant. In other words, by setting \(s_0\) appropriately the ciphertext blowup after i layers of encryption is a constant.

We further have that \(\oplus \) (the primitive for homomorphic addition) is integer multiplication and \(\otimes \) (for scalar multiplication) is modular exponentiation. If these operations are performed on ciphertexts in \(\mathbb {L}_{i}\), operations are mod \(\mathbb {Z}_{n^{s_0+i}}\).
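To make the two primitives concrete, here is a toy Python sketch of a single-layer homomorphic select. It uses plain Paillier, i.e. the \(s=1\) special case of Damgård-Jurik, with tiny hard-coded demo primes, so it is insecure and purely illustrative. The function names, the choice \(g = n+1\), and the use of raw integers as data chunks are assumptions of this sketch; in Onion ORAM the selected items are themselves ciphertexts of the previous layer.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Toy Paillier keys; p and q are small demo primes (not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # sk = lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # decryption constant, valid for g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Enc(m) = (n+1)^m * r^n mod n^2 for a random unit r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, sk, c: int) -> int:
    """Recover m from c using L(x) = (x - 1) / n."""
    lam, mu = sk
    n2 = n * n
    return (((pow(c, lam, n2) - 1) // n) * mu) % n


def homomorphic_select(n: int, enc_bits, chunks) -> int:
    """Return Enc(sum_i b_i * chunk_i) from encrypted bits and known chunks.

    Scalar multiplication is modular exponentiation and homomorphic
    addition is multiplication, both mod n^2. Each chunk must be < n.
    """
    n2 = n * n
    acc = 1
    for c_b, chunk in zip(enc_bits, chunks):
        acc = (acc * pow(c_b, chunk, n2)) % n2
    return acc
```

With a one-hot select vector the decrypted result equals the selected chunk, while the server only ever sees encryptions of the bits. The modular inverse via `pow(lam, -1, n)` needs Python 3.8 or later.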

6.3 Asymptotic Analysis

We first perform the asymptotic analysis for exact exponential security. The results for negligible in N security in Table 1 are derived by setting \(\lambda =\omega (\log N)\) and \(\gamma =\varTheta (\log ^3 N)\) according to best known attacks [27].

Semi-honest Case

Chunk Size. The Damgård-Jurik cryptosystem encrypts a message of length \(\gamma s_0\) bits to a ciphertext of length \(\gamma (s_0 + 1)\) bits, where \(\gamma \) is a parameter dependent on the security parameter \(\lambda \), and \(s_0\) is a user-chosen parameter. Using Beneš network, each ciphertext chunk accumulates \(O(\log \lambda \log N)\) layers of encryption at the maximum. Suppose the plaintext chunk size is \(B_c := \gamma s_0\), then at the maximum onion layer, the ciphertext size would be \(\gamma (s_0 + O(\log \lambda \log N))\). Therefore, to ensure constant ciphertext expansion at all layers, it suffices to set \(s_0 := \varOmega (\log \lambda \log N)\) and chunk size \(B_c := \varOmega (\gamma \log \lambda \log N)\). This means ciphertext chunks and homomorphic select vectors are also \(\varOmega (\gamma \log \lambda \log N)\) bits.
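The size accounting above can be checked with a few lines of Python. The helper names are hypothetical; the formulas are the ones stated in the paragraph, namely a plaintext chunk of \(\gamma s_0\) bits growing to \(\gamma (s_0 + \ell )\) bits after \(\ell \) onion layers.

```python
def ciphertext_bits(gamma: int, s0: int, layers: int) -> int:
    """Size in bits of a chunk after `layers` onion layers: gamma * (s0 + layers)."""
    return gamma * (s0 + layers)


def expansion(s0: int, layers: int) -> float:
    """Ciphertext size divided by the plaintext chunk size gamma * s0."""
    return (s0 + layers) / s0
```

For example, `expansion(40, 40)` is 2.0 whereas `expansion(1, 40)` is 41.0, which is why \(s_0\) has to grow with the maximum number of layers to keep the expansion constant.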

Then we want our block size to be asymptotically larger than the select vectors at each step of our protocol (other metadata are much smaller).

Size of Select Vectors. Each read requires \(O(\log \lambda )\) encrypted coefficients of \(O(B_c)\) bits each. Eviction along a path requires \(O(\log N)\) Beneš networks (bucket-triplet operations), for a total of \(O(\lambda \log \lambda \log N)\) encrypted coefficients. Also recall that one eviction happens per \(A=\varTheta (\lambda )\) accesses. Therefore, the select vector size per ORAM access (amortized) is dominated by evictions, and is \(\varTheta (B_c \log \lambda \log N)\) bits.

Setting the Block Size. Clearly, if we set the block size to be \(B := \varOmega (B_c \log \lambda \log N)\), the cost of homomorphic select vectors could be asymptotically absorbed, thereby achieving constant bandwidth blowup. Since the chunk size \(B_c = \varOmega (\gamma \log \lambda \log N)\), we have \(B = \varOmega (\gamma \log ^2\lambda \log ^2 N) \) bits.

Server Computation. The bottleneck of server computation is to homomorphically multiply a block with an encrypted select coefficient. In Damgård-Jurik, this is a modular exponentiation operation, which has \(\widetilde{O}(\gamma ^2)\) computational complexity for \(\gamma \)-bit ciphertexts. This means the per-bit computational overhead is \(\widetilde{O}(\gamma )\). The server needs to perform this operation on \(O(\lambda )\) blocks of size B, and therefore has a computational overhead of \(\widetilde{O}(\gamma )O(B\lambda )\).

Client Computation. Client needs to decrypt \(O(\log \lambda \log N)\) layers to get the plaintext block, and therefore has a computational overhead of \(\widetilde{O}(\gamma )O(B\log \lambda \log N)\).

Malicious Case

Setting the Block Size. The main difference from the semi-honest case is that on a read, the client must additionally download \(\varTheta (\lambda )\) verification chunks from each of the \(\varTheta (\lambda )\) blocks (assuming permuted buckets). Select vector size stays the same, and the error-correcting code increases block size by only a constant factor. Thus, the block size we need to achieve constant bandwidth over the entire protocol is \(B= \varOmega (B_c \lambda ^2) = \varOmega (\gamma \lambda ^2 \log \lambda \log N)\).

Client Computation. Another difference is that the client now needs to emulate the server’s homomorphic select operation on the verification chunks. But a simple analysis will show that the bottleneck of client computation is still onion decryption, and therefore remains the same asymptotically.

6.4 Concrete Analysis (Semi-honest Case Only)

Figure 4 shows bandwidth as a function of block size for our optimized semi-honest construction, taking into account all constant factors (including the extra bandwidth cost to recursively look up the position map). Other scheme variants in this paper have the same general trend. We compare to Path PIR and Circuit ORAM, the most bandwidth-efficient constructions with/without server computation that match our server/client storage asymptotics.

Takeaway. The high order bit is that as block size increases, Onion ORAM’s bandwidth approaches 2B. Note that 2B is the inherent lower bound in bandwidth since every ORAM access must at least read the block of interest from the server and send it back after possibly modifying it. Given an 8 MB block size, which is approximately the size of an image file, we improve in bandwidth over Circuit ORAM by \(\mathbf {35}\times \) and improve over Path PIR by \(\mathbf {22}\times \). For very large block sizes, our improvement continues to increase but Circuit ORAM and Path PIR improve less dramatically because their asymptotic bandwidth blowup has a \(\log N\) factor. Note that for sufficiently small block sizes, both Path PIR and Circuit ORAM beat our bandwidth because our select vector bandwidth dominates. Yet, this crossover point is around 128 KB, which is reasonable in many settings.

Fig. 4. Plots the bandwidth multiplier (i.e., the hidden constant for O(B)) for semi-honest Onion ORAM and two prior proposals. We fix the ORAM capacity to \(NB=2^{50}\) and give each scheme the same block size across different block sizes (hence as B increases, N decreases).

Constant Factor Optimization: Less Frequent Leaf Post-processing. In the above evaluation, we apply an additional constant factor optimization. Since \(Z=A=\varTheta (\lambda )\), we must send and receive one additional data block (amortized) per ORAM request to post-process leaf buckets during evictions (Sect. 4.3). To save bandwidth, we can perform this post-processing on a particular leaf bucket every p evictions to that leaf (p is a free variable). The consequence is that the number of layers that accumulate on leaf buckets increases by p which makes each ORAM read path more expensive by the corresponding amount. In practice, \(p\ge 8\) yields the best bandwidth.

Parameterization Details. For all three schemes, we set acceptable ORAM failure probability to \(2^{-80}\) which results in \(Z=A\approx 300\) for Onion ORAM, \(Z=120\) for Path PIR [41] and a stash size (stored on the server) of 50 blocks for Circuit ORAM [47]. For Onion ORAM and Path PIR we set \(\gamma = 2048\) bits. For Circuit ORAM, we use the reverse lexicographic eviction order as described in that work, which gives 2 evictions per access and \(Z=2\). For Path PIR, we set the eviction frequency \(v=2\) [41].

6.5 Other Optimizations and Remarks

De-Amortization. We remark that it is easy to de-amortize the above algorithm so that the worst-case bandwidth equals amortized bandwidth and overall bandwidth doesn’t increase. First, it is trivial to de-amortize the leaf bucket post-processing (Sect. 4.3) over the A read path operations because \(A=Z\) and post-processing doesn’t change the underlying plaintext contents of that bucket. Second, the standard de-amortization trick of Williams et al. [50] can be applied directly to our \(\mathsf {EvictAlongPath}\) operation: evictions can be spread over the next A read operations because moving blocks from buckets (possibly on the eviction path) to the root bucket does not impact our eviction algorithm.

Online Roundtrips. The standard recursion technique [44] uses a small block size for position map ORAMs (to save bandwidth) and requires \(O(\log N)\) roundtrips. In Onion ORAM, the block in the main ORAM is large \(B=\varOmega (\lambda \log N)\). We can use Onion ORAM with the same large block size for position map ORAMs. This achieves a constant number of recursive levels if N is polynomial in \(\lambda \), and therefore maintains the constant bandwidth blowup.

7 Conclusion and Open Problems

This paper proposes Onion ORAM, the first concrete ORAM scheme with optimal asymptotics in worst-case bandwidth blowup, server storage and client storage in the single-server setting. We have shown that FHE or SWHE are not necessary in constructing constant bandwidth ORAMs, which instead can be constructed using only an additively homomorphic scheme such as the Damgård-Jurik cryptosystem. Yet combining SWHE with Onion ORAM improves the computational efficiency of the scheme. We further extend Onion ORAM to be secure in the fully malicious setting using standard assumptions. Due to the known efficiency of SWHE schemes like BGV, we think of our work as an important step towards practical constant bandwidth blowup ORAM schemes.

We do note that while our block size is poly-logarithmic, the exponent is rather large (especially for our malicious construction). Subsequent to our proposal of Onion ORAM, Moataz et al. [35] combined our bounded feedback ORAM with an optimized merge procedure for evictions which reduces server computation and block size for the semi-honest construction. We applaud this effort and argue that semi-honest constant bandwidth blowup ORAM is practical (or nearly practical). We leave tightening up poly-logarithmic factors for our malicious security construction as future work.

Beyond tightening parameters, an open problem is whether constant bandwidth blowup ORAMs can be constructed from non-homomorphic encryption schemes. The computational complexity of the Damgård-Jurik cryptosystem (which relies on modular exponentiation for homomorphic operations), or even more efficient SWHE schemes may be a bottleneck in practice. Can we construct constant bandwidth ORAM using simple computation such as XOR and any semantically secure encryption scheme with small ciphertext blowup? A partial result in this direction comes from Burst ORAM [8]: simple computation on ciphertexts (mod 2 XOR) enables a family of schemes (e.g., [39]) to achieve constant online bandwidth blowup on a request. Whether similar ideas can lead to constant bandwidth blowup on eviction is unclear.