1 Introduction

Data outsourcing allows a client to store their data on the cloud to reduce data management and maintenance costs. Despite its merits, cloud services come with severe privacy issues. The client may encrypt their data with standard encryption to protect their privacy. However, these techniques also prevent the client from performing basic operations (e.g., search/update) over the outsourced encrypted data. This significantly degrades the benefits of cloud services. In the following, we first outline the current state-of-the-art techniques and their limitations and then, present our methods towards addressing these challenges.

1.1 State-of-the-Art and Limitations

Information Leakage in DSSE. The concept of searchable symmetric encryption (SSE) was first proposed by Song et al. [24]. This construction can only search on static encrypted data. Curtmola et al. [11] introduced single-keyword-searched SSE with formal security definition, followed by refinements with extended capabilities such as ranked query [27], multi-keyword search [26] or their combinations [7]. Dynamic Searchable Symmetric Encryption (DSSE) was introduced by Kamara et al. [17], which offers both search and update on encrypted files \( \mathcal {F}\) via an encrypted index \( \mathbf {I}\) representing keyword-file relationships. Many DSSE schemes have been proposed, each offering various performance, functionality and security trade-offs [4] (e.g., [6, 9, 17, 20, 29, 31]).

It is known that all standard DSSE schemes leak significant information, which makes them vulnerable to statistical inference analysis [8, 16, 18, 30]. There are two sources of information leakage in DSSE: (i) leakage through search and update on the encrypted index \(\mathbf {I}\), and (ii) leakage due to access to the encrypted files \( \mathcal {F} \). Specifically, since the search and update tokens are deterministic, all DSSE schemes leak access patterns on both \( \mathbf {I}\) and \( \mathcal {F}\). Furthermore, most of them also leak the content of updated files during the update (i.e., they lack forward privacy) and historical updates (add/delete) on the keyword during the search on \( \mathbf {I}\) (i.e., they lack backward privacy). By exploiting these leakages, recent studies have shown that sensitive information about encrypted queries and files can be recovered [8, 18]. Zhang et al. [30] presented file-injection attacks that can determine which keywords have been searched, especially in forward-insecure DSSE schemes. Although some DSSE schemes with improved security (e.g., forward and backward privacy) have been proposed (e.g., [6]), they rely on extremely costly public key operations and still leak access patterns. Liu et al. [18] demonstrated an attack that can determine which keywords have been searched by observing the frequency of search queries (search patterns). Zhang et al. [30] indicated that future research on DSSE should focus on sealing information leakages rather than accepting them by default. Unless these leakages are prevented, a trustworthy deployment of DSSE for privacy-critical applications may remain extremely difficult.

Performance Hurdles of the Existing Approaches to Reduce Information Leakages in DSSE. Several attempts (e.g., [5, 15]) are either highly costly or unable to completely seal all leakages in DSSE access patterns. Generic Oblivious Random Access Machine (ORAM) [25] can hide access patterns, and therefore, it can prevent most of the information leakages in DSSE. Garg et al. [12] proposed TWORAM scheme, which optimizes the round-trip communication under \( \mathcal {O}{(1)} \) client storage when using ORAM to hide file access patterns in DSSE. Despite its merits, prior studies (e.g., [9, 21]) stated that generic ORAM (e.g., [25]) is still costly to be used in DSSE due to its logarithmic communication overhead. Although several ORAMs with \( \mathcal {O}(1) \) bandwidth complexity have been introduced recently, they are still very costly due to the use of Homomorphic Encryption (HE). The performance of such schemes has been shown to be worse than \( \mathcal {O}(\log N) \)-bandwidth ORAMs [2].

Fig. 1. Our research objective and high-level approach.

1.2 Our Research Objective and Contributions

It is imperative to seal information leakages from accessing encrypted files \( \mathcal {F}\) and encrypted index \( \mathbf {I}\). Since the size of individual files in \( \mathcal {F} \) might be arbitrarily large and each search/update query might involve a different number of files, to the best of our knowledge, generic ORAM seems to be the only option for oblivious access on \( \mathcal {F}\). The objective of this paper is to design oblivious access techniques on \( \mathbf {I}\), which are more efficient than using generic ORAM, by exploiting special properties of searchable encryption and \( \mathbf {I}\) as elaborated in Fig. 1. Particularly, we identify a suitable data structure for \( \mathbf {I}\) that allows search and update to operate on separate dimensions. This property permits us to harness communication-efficient techniques such as Write-Only ORAM for update and, by exploiting distributed cloud infrastructure, multi-server PIR for search with low computation overhead. Note that the low communication and computation are important factors in practice since they directly translate into the low end-to-end delay and consequently, improve the quality of services of cloud systems. Notice that the price to pay for such low delay is the collusion vulnerability in the distributed setting, where we assume a limited number of servers that can collude with each other, which is the common adversarial model of multi-server PIR techniques (see Sects. 2 and 4).

We propose a series of Oblivious Distributed Encrypted Index \( \mathbf {I}\) on the distributed cloud infrastructure with the application on DSSE, which we refer to as \( \text {ODSE}\) (Fig. 1). We present two \( \text {ODSE}\) schemes called \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) and \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\), each offering various desirable performance and security properties as follows.

  • Low end-to-end delay: \( \text {ODSE}\) schemes achieve low end-to-end delay and are \( 3 \times \) to \(57\times \) faster than the use of efficient generic ORAMs (e.g., [22, 25]) (with optimization [12]) on the encrypted index under real network settings (see Sect. 5).

  • Full obliviousness with information-theoretic security: \( \text {ODSE}\) seals the information leakages due to accesses on the encrypted index \( \mathbf {I}\) that enable statistical attacks, including forward/backward privacy leakage, query type (search/update), hidden size and access patterns. \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) and \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) offer computational and information-theoretic security for \( \mathbf {I}\) and operations on it, respectively.

  • Robustness against malicious servers: \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) can tolerate a certain number of malicious servers in the system.

  • Full-fledged implementation and open-sourced framework: We fully implemented all the proposed \( \text {ODSE}\) schemes, and evaluated their performance on real cloud infrastructure. To the best of our knowledge, we are among the first to open-source an oblivious access framework for DSSE encrypted index that can be publicly used for comparison and wide adoption (see Sect. 5).

It is clear that the standard DSSE constructions (e.g., [9]) are much faster, but also less secure than our proposed methods, in the sense that they leak more information beyond the access patterns (e.g., forward-privacy and backward-privacy leakage) over the encrypted index. Compared with standard DSSE where access patterns are leaked by default, \( \text {ODSE}\) schemes offer higher security by sealing all these leakages at the cost of higher latency. Nevertheless, they are more efficient than using generic ORAM techniques atop the DSSE encrypted index to seal such leakages in certain cases, depending on the database and query sizes. We provide a detailed analysis in Sect. 5.

2 Preliminaries and Building Blocks

Notation. We denote \(\mathbb {F}_p\) as a finite field where p is a prime. Operators || and \( \oplus \) denote the concatenation and XOR, respectively. \({(\cdot )}_{\mathsf {bin}}\) denotes the binary representation. \( \mathbf {u} \cdot \mathbf {v} \) denotes the inner product of two vectors \( \mathbf {u} \) and \( \mathbf {v} \). \(x{\mathop {\leftarrow }\limits ^{\$}}\mathcal {S}\) denotes that x is randomly and uniformly selected from set \(\mathcal {S}\). Given I as a row/column of a matrix, I[i] denotes accessing i-th component of I. Given a matrix \(\mathbf {I}\), \( \mathbf {I}[*,j\dots j'] \) denotes accessing columns j to \( j' \) of \(\mathbf {I}\). Let \(\mathcal {E}=(\mathsf {Enc},\mathsf {Dec},\mathsf {Gen})\) be an IND-CPA symmetric encryption: \( \kappa \leftarrow \mathcal {E}.\mathsf {Gen}(1^{\theta }) \) generating key with security parameter \( \theta \); \( C \leftarrow \mathcal {E}.\mathsf {Enc}_{\kappa }(M) \) encrypting plaintext M with key \( \kappa \); \(M\leftarrow \mathcal {E}.\mathsf {Dec}_{\kappa }(C)\) decrypting ciphertext C with key \( \kappa \).

Fig. 2. Shamir Secret Sharing (SSS) scheme [23].

Shamir Secret Sharing (SSS). We present the \((t,\ell )\)-threshold Shamir Secret Sharing (SSS) scheme [23] in Fig. 2. Given a secret \(\alpha \in \mathbb {F}_p\), the dealer generates a random t-degree polynomial f and evaluates \( f(x_l) \) for each party \(\mathcal {P}_l \in \{\mathcal {P}_1,\dots ,\mathcal {P}_\ell \}\), where \( x_l \in \mathbb {F}_p\setminus \{0\}\) is the deterministic identifier of \( \mathcal {P}_l \). We denote the share for \( \mathcal {P}_l \) as \([\![\alpha \!]\!]_l\). The secret can be reconstructed by combining at least \( t+1 \) correct shares via Lagrange interpolation. Note that the secret can also be recovered in the presence of a number of incorrect shares by error correction techniques (discussed in Sect. 4). We use this property to improve the robustness of our protocol in malicious settings.

\( {\text {SSS}}\) is t-private, so any combination of t shares leaks no information about the secret. \( {\text {SSS}}\) offers homomorphic properties including addition, scalar multiplication, and partial multiplication. We extend the notion of the share of a value to the share of a vector: \([\![\mathbf {v} \!]\!]_l = ([\![v_1 \!]\!]_l,\dots ,[\![v_n \!]\!]_l)\) denotes the share of vector \( \mathbf {v} \) for party \( \mathcal {P}_l \), in which \([\![v_i \!]\!]_l\) is the share of component \( v_i \) in \( \mathbf {v} \).
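To make the sharing and reconstruction steps concrete, the following is a minimal Python sketch of \((t,\ell )\)-threshold sharing over \(\mathbb {F}_p\) with Lagrange interpolation at zero. The prime 65521, the function names `sss_create` and `sss_recover`, and the use of Python's `secrets` module are illustrative assumptions, not the authors' implementation.

```python
import secrets

P = 65521  # a 16-bit prime, chosen here only for illustration (assumption)

def sss_create(secret, t, xs, p=P):
    """Split `secret` into len(xs) shares using a random polynomial of degree at most t."""
    coeffs = [secret % p] + [secrets.randbelow(p) for _ in range(t)]

    def f(x):
        # Horner evaluation of f(x) = secret + a_1 x + ... + a_t x^t (mod p)
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % p
        return acc

    return {x: f(x) for x in xs}

def sss_recover(shares, p=P):
    """Lagrange interpolation at x = 0 from a dict {x_l: share_l} with at least t+1 entries."""
    secret = 0
    for xl, yl in shares.items():
        num, den = 1, 1
        for xm in shares:
            if xm != xl:
                num = (num * (-xm)) % p
                den = (den * (xl - xm)) % p
        secret = (secret + yl * num * pow(den, -1, p)) % p
    return secret
```

For example, with \( t = 2 \) and five parties, any three shares recover the secret, and adding two share vectors component-wise yields shares of the sum, which is the additive homomorphism mentioned above.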

Private Information Retrieval (PIR). PIR enables private retrieval of a data item from an (unencrypted) public database server. We recall two efficient multi-server PIR protocols: (i) XOR-based PIR [10] (Fig. 3), which uses XOR operations and requires each server \( S_l \) to store \( \mathbf {b}_l \), a replica of database \( \mathbf {b} \) containing m blocks \((b_1,\dots ,b_m) \) of the same size; (ii) SSS-based PIR [13] (Fig. 4), which relies on homomorphic properties of \( {\text {SSS}}\), where each server stores \( \mathbf {b}_l \), a replica of the database \( \mathbf {b} \) containing m blocks \((b_1,\dots ,b_m) \), where \( b_i \in \mathbb {F}_p\).

Fig. 3. XOR-based PIR [10].
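Since the figure body is not reproduced in this text, the following Python sketch illustrates the standard idea behind multi-server XOR-based PIR: the client sends each server a selection bit vector such that the vectors XOR to the unit vector of the wanted index, and each server XORs together the selected blocks. The function names and the 0-based indexing are assumptions made for illustration.

```python
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def pir_query(m, index, num_servers):
    """One m-bit selection vector per server; their XOR is the unit vector e_index."""
    queries = [[secrets.randbelow(2) for _ in range(m)] for _ in range(num_servers - 1)]
    last = [0] * m
    last[index] = 1
    for q in queries:
        last = [a ^ b for a, b in zip(last, q)]
    return queries + [last]

def pir_answer(db, query):
    """Server side: XOR together the equal-size blocks selected by the query bits."""
    acc = bytes(len(db[0]))
    for block, bit in zip(db, query):
        if bit:
            acc = xor_bytes(acc, block)
    return acc

def pir_reconstruct(answers):
    """Client side: XOR all server answers to obtain the requested block."""
    acc = bytes(len(answers[0]))
    for a in answers:
        acc = xor_bytes(acc, a)
    return acc
```

Each individual query vector is uniformly random, so any \( \ell -1 \) servers together still see only random bits; only the XOR of all \( \ell \) vectors reveals the index.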

Fig. 4. SSS-based PIR [13].
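As the figure body is not reproduced here, the sketch below shows the general shape of an SSS-based PIR query in Python: the client secret-shares each component of the unit vector, every server returns the inner product of its share vector with the database, and the client interpolates the answers at zero. The prime, the helper names and the 0-based index are illustrative assumptions; the robust decoding of [13] is not included.

```python
import secrets

P = 65521  # illustrative 16-bit prime (assumption)

def share(value, t, xs, p=P):
    """Shamir shares of `value` at evaluation points xs, polynomial of degree at most t."""
    coeffs = [value % p] + [secrets.randbelow(p) for _ in range(t)]
    return [sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p for x in xs]

def interpolate_at_zero(xs, ys, p=P):
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def pir_query(m, index, t, xs):
    """Share every component of the unit vector e_index; return one share vector per server."""
    per_component = [share(1 if i == index else 0, t, xs) for i in range(m)]
    return [[per_component[i][l] for i in range(m)] for l in range(len(xs))]

def pir_answer(db, query_share, p=P):
    """Server side: inner product of the shared selection vector with the database."""
    return sum(q * b for q, b in zip(query_share, db)) % p
```

Because the database entries are public constants, each answer is a share (of the same degree t) of the inner product \( \mathbf {e}_i \cdot \mathbf {b} = b_i \), so interpolating at least \( t+1 \) answers yields the requested block.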

Write-Only ORAM. ORAM allows the user to hide the access patterns when accessing their encrypted data on the cloud. In contrast to generic ORAM where both read and write operations are hidden, Blass et al. [3] proposed a Write-Only ORAM scheme, which only hides the write pattern in the context of hidden volume encryption. Intuitively, 2n memory slots are used to store n blocks, each assigned to a distinct slot and a position map is maintained to keep track of block’s location. Given a block to be rewritten, the client reads \( \lambda \) slots chosen uniformly at random and writes the block to a dummy slot among \( \lambda \) slots. Data in all slots are encrypted to hide which slot is updated. By selecting \( \lambda \) sufficiently large (e.g., 80), one can achieve a negligible failure probability, which might occur when all \( \lambda \) slots are non-dummy. It is possible to select a small \( \lambda \) (e.g., 4). In this case, the client maintains a stash component \( S\) of size \( \mathcal {O}(\log n) \) to temporarily store blocks that cannot be rewritten when all read slots are full.
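The write procedure described above can be sketched in Python as follows. This is a simplified model under stated assumptions: encryption and re-encryption of the touched slots are omitted, the \( \lambda \) slots are sampled without replacement, and the class and method names are hypothetical.

```python
import secrets

class WriteOnlyORAM:
    def __init__(self, n, lam=4):
        self.slots = [None] * (2 * n)   # server memory: 2n slots for n blocks
        self.pos = {}                   # position map: block id -> slot index
        self.stash = {}                 # client-side stash: block id -> data
        self.lam = lam
        self.rng = secrets.SystemRandom()

    def write(self, block_id, data):
        self.stash[block_id] = data
        self.pos.pop(block_id, None)    # any old copy now counts as a dummy slot
        occupied = set(self.pos.values())
        # Slots are chosen independently of block_id, which is what hides the write pattern.
        chosen = self.rng.sample(range(len(self.slots)), self.lam)
        for s in chosen:
            if s not in occupied and self.stash:
                bid, d = self.stash.popitem()
                self.slots[s] = d
                self.pos[bid] = s
                occupied.add(s)
            # A real deployment re-encrypts and writes back every chosen slot here,
            # so the server cannot tell which slot actually changed.
        return chosen

    def read(self, block_id):
        # Reads are not hidden in a Write-Only ORAM; this helper only checks correctness.
        if block_id in self.stash:
            return self.stash[block_id]
        return self.slots[self.pos[block_id]]
```

With a small \( \lambda \), a write may find no dummy slot among the chosen ones; the block then simply stays in the stash until a later write finds room for it.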

3 The Proposed \( \text {ODSE}\) Schemes

Intuition. In DSSE, keyword search and file update on \( \mathbf {I}\) are read-only and write-only operations, respectively. This property permits us to leverage specific bandwidth-efficient oblivious access techniques for each operation such as multi-server PIR (for search) and Write-Only ORAM (for update) rather than using generic ORAM. The second requirement is to identify an appropriate data structure for \( \mathbf {I}\) so that the above techniques can be adapted. We found that forward index and inverted index are the ideal choices for the file update and keyword search operations, respectively as proposed in [14]. However, doing search and update on two isolated indexes can cause an inconsistency, which requires the server to perform synchronization. The synchronization operation leaks significant information [14]. To avoid this problem, it is necessary to integrate both search index and update index in an efficient manner. Fortunately, this can be achieved by leveraging a two-dimensional index (i.e., matrix), which allows keyword search and file update to be performed in two separate dimensions without creating any inconsistency at their intersection. This strategy permits us to perform computation-efficient (multi-server) PIR on one dimension, and communication-efficient (Write-Only) ORAM on the other dimension to achieve oblivious search and update, respectively, with a high efficiency.

3.1 ODSE Models and Data Structures

System Model. Our model comprises a client and \(\ell \) servers \(\varvec{\mathcal {S}} = (\mathcal {S}_1, \dots , \mathcal {S}_{\ell })\), each storing a version of the encrypted index. In our system, the encrypted files are stored on \( S' \), a separate server different from \( \varvec{\mathcal {S}} \) (as in [15]), which can be obliviously accessed via a generic ORAM (e.g., [25]). In this paper, we only focus on oblivious access of the encrypted index on \( \varvec{\mathcal {S}} \).

Threat Model. In our system, the client is trusted and the servers \( \mathcal {S}\) are untrusted. We consider the servers to be semi-honest, meaning that they follow the protocol faithfully, but can record the protocol transcripts to learn information regarding the client’s access pattern. However, our system can be easily extended to deal with malicious servers that attempt to tamper with the input data to compromise the correctness and the security of the system (see Sect. 4). We allow up to \( t < \ell \) (privacy parameter) servers among \( \mathcal {S}\) to collude, meaning that they can share their own recorded protocol transcripts with each other. We present the formal security model in Sect. 4.

Data Structures. Assuming that the outsourced database can store up to \( {N}\) distinct files and \( {M}\) unique keywords, our index is an incidence matrix \( \mathbf {I}\), where each cell \( \mathbf {I}[i,j] \in \{0,1\} \) represents the relationship between the keyword at row i and the file at column j. Each keyword and file is assigned to a unique row and column index, respectively. Each row of \( \mathbf {I}\) represents the search result of a keyword while the content (unique keywords) of a file is represented by a column. Since we use Write-Only ORAM for file update, the number of columns in \( \mathbf {I}\) is doubled and a stash \( S\) is used to store columns of \( \mathbf {I}\) during the update. Therefore, the size of the search index \( \mathbf {I}\) is \( {M}\times 2{N}\).

We leverage two static hash tables \( T_w\), \( T_f\) as in [28] to keep track of the location of keywords and files in \( \mathbf {I}\), respectively. They have the following structure: \(T:= \langle \mathsf {key}, \mathsf {value} \rangle \), where \( \mathsf {key} \) is a keyword or file ID and \( \mathsf {value} \leftarrow T[\mathsf {key}]\) is the (row/column) index of \( \mathsf {key} \) in \( \mathbf {I}\). Since there are \( 2{N}\) columns in \( \mathbf {I}\) while only \( {N}\) files, we denote \( \mathcal {D}\) as the set of dummy columns that are not assigned to a particular file.
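The plaintext layout of these structures can be illustrated with a small Python sketch. It models only the \( {M}\times 2{N}\) incidence matrix, the two lookup tables and the dummy-column set \( \mathcal {D}\); encryption, the random permutations used at setup, and the Write-Only ORAM column relocation are deliberately left out, and the class and method names are hypothetical.

```python
class IncidenceIndex:
    """Plaintext M x 2N keyword-file incidence matrix with lookup tables (no encryption)."""

    def __init__(self, M, N):
        self.I = [[0] * (2 * N) for _ in range(M)]
        self.T_w, self.T_f = {}, {}          # keyword -> row index, file id -> column index
        self.free_rows = list(range(M))
        self.dummy = set(range(2 * N))       # D: columns not assigned to any file

    def add_file(self, file_id, keywords):
        col = self.T_f.get(file_id)
        if col is None:
            col = min(self.dummy)            # deterministic choice, for this sketch only
            self.dummy.discard(col)
            self.T_f[file_id] = col
        for row in self.I:                   # a column holds the file's full keyword set
            row[col] = 0
        for w in keywords:
            if w not in self.T_w:
                self.T_w[w] = self.free_rows.pop(0)
            self.I[self.T_w[w]][col] = 1

    def search(self, w):
        row = self.I[self.T_w[w]]            # a row is the search result of one keyword
        col_to_file = {c: f for f, c in self.T_f.items()}
        return sorted(col_to_file[c] for c, bit in enumerate(row)
                      if bit and c not in self.dummy)
```

The point of the layout is visible in the two methods: `search` touches exactly one row, while `add_file` touches exactly one column, so the two operations can be protected by different oblivious access techniques.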

3.2 \(\textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\): Fast ODSE

We introduce \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) that harnesses XOR-based PIR and Write-Only ORAM to achieve low search and update latency.

Setup. Let \(\varPi \) and \(\varPi '\) be random permutations on \(\{1,\dots ,2{N}\}\) and \(\{1,\dots ,{M}\}\) respectively. The procedure to setup encrypted index for \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) is as follows.

figure a

Once \( \mathcal {I}\) is constructed, the client sends \( \mathbf {I}_i \) to server \( \mathcal {S}_i \), and keeps \( \sigma \) as secret.

Search. Intuitively, to search for a keyword w, the client and the servers execute the XOR-based PIR protocol on the row dimension of \( \mathbf {I}\) to privately retrieve the row data of w. Since the row is encrypted rather than being public as in the traditional PIR model, the client decrypts the retrieved data and filters out dummy column indexes to obtain the search result. The details are as follows.

figure b

Update: The overall strategy is to perform a Write-Only ORAM on the column of \( \mathbf {I}\) to achieve oblivious file update operations as follows.

figure c

3.3 \(\textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\): Robust and IT-Secure \(\text {ODSE}\)

Although \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) offers highly efficient search and update operations, it has the following security limitations: (i) it can (at most) detect but cannot recover from malicious servers, which might tamper with the data to compromise the privacy and correctness of the protocol. In privacy-critical applications, it is desirable to recover from malicious servers to improve the robustness of the protocol; (ii) the encrypted index and update operations on it are only computationally secure due to the IND-CPA encryption.

To address the limitations of \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\), we introduce \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) that offers (i) improved robustness against malicious servers with a partial recovery capability, and (ii) the highest level of security (i.e., information-theoretic) for both \( \mathbf {I}\) and operations on it. The main idea is to share the index with \( {\text {SSS}}\), and harness \( {\text {SSS}}\)-based PIR to conduct private search. The robustness comes from the ability to recover the secret shared by \( {\text {SSS}}\) in the presence of incorrect shares (see Sect. 4).

Setup: The client first constructs an index \( \mathbf {I}' \) representing keyword-file relationships as in \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}.\mathsf {Setup}\). Instead of encrypting \( \mathbf {I}' \), the client creates shares of \( \mathbf {I}' \) by \( {\text {SSS}}\). Since \( {\text {SSS}}\) operates on elements in \( \mathbb {F}_p\), each row of \( \mathbf {I}' \) is split into \( {\lfloor }\log _2 p{\rfloor } \)-bit chunks before \( {\text {SSS}}\) computation. So, the index \( \mathbf {I}_i \) is the \( {\text {SSS}}\) share of \( \mathbf {I}' \) for server \( \mathcal {S}_i \), which is a matrix of size \( {M}\times 2{N}' \), where \( \mathbf {I}_i[i,j] \in \mathbb {F}_p\) and \( {N}' = {{N}}/{{\lfloor }\log _2 p{\rfloor }} \). The detail is as follows.

figure d

Similar to \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\), the client sends \( \mathbf {I}_i \) to server \( \mathcal {S}_i \) and keeps \( \sigma \) secret.
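The chunking step of this setup, where each binary row is split into \( {\lfloor }\log _2 p{\rfloor } \)-bit pieces so that every piece fits in \( \mathbb {F}_p\) before sharing, can be sketched in Python as follows. The prime 65521, the zero-padding of the last chunk and the helper names are assumptions made for illustration.

```python
P = 65521                       # illustrative 16-bit prime (assumption)
CHUNK = P.bit_length() - 1      # floor(log2 p) = 15 bits per field element

def pack_row(bits, chunk=CHUNK):
    """Pack a list of 0/1 values into integers of `chunk` bits each (last one zero-padded)."""
    out = []
    for i in range(0, len(bits), chunk):
        piece = bits[i:i + chunk]
        piece = piece + [0] * (chunk - len(piece))
        out.append(int("".join(map(str, piece)), 2))
    return out

def unpack_row(elems, n_bits, chunk=CHUNK):
    """Inverse of pack_row: expand field elements back into the first n_bits bits."""
    bits = []
    for e in elems:
        bits.extend(int(b) for b in format(e, f"0{chunk}b"))
    return bits[:n_bits]
```

Every packed value is below \( 2^{15} < p \), so it is a valid field element; each such element would then be secret-shared among the servers, which is why one column of \( \mathbf {I}_i \) covers \( {\lfloor }\log _2 p{\rfloor } \) successive columns of \( \mathbf {I}' \).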

Search. The client executes the \( {\text {SSS}}\)-based PIR protocol on the row dimension of encrypted index to retrieve the row of searched keyword as follows.

figure e

Update: We execute Write-Only ORAM on the column dimension of the encrypted index for the file update. Recall that in \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\), \( \lambda \) random columns of the original index \( \mathbf {I}' \) are read to update one column. In \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\), each column of the index \( \mathbf {I}_i \) on \( \mathcal {S}_i \) contains the share of \( {\lfloor }\log _2p{\rfloor } \) successive columns of \( \mathbf {I}' \). Therefore, the client reads \( \lambda ' = {\lceil }\frac{\lambda }{{\lfloor }\log _2p{\rfloor }}{\rceil } \) random columns of \( \mathbf {I}_i \) from \( t+1 \) servers to recover \( \lambda \) columns of \( \mathbf {I}' \) before performing update. The detail is as follows.

figure f
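As a worked example of the relation \( \lambda ' = {\lceil }\lambda /{\lfloor }\log _2p{\rfloor }{\rceil } \) used in this update, the short Python helper below computes \( \lambda ' \) for a given prime. The helper name and the sample prime 65521 are illustrative assumptions.

```python
def columns_to_read(lam, p):
    """lambda' = ceil(lambda / floor(log2 p)), computed with integer arithmetic."""
    bits_per_element = p.bit_length() - 1   # floor(log2 p) for an odd prime p
    return -(-lam // bits_per_element)      # ceiling division

# With a 16-bit prime, each shared column covers 15 columns of the original index.
small = columns_to_read(4, 65521)    # lambda = 4
large = columns_to_read(80, 65521)   # lambda = 80
```

Under these assumptions, \( \lambda = 4 \) requires reading a single shared column per server, while \( \lambda = 80 \) requires six.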

4 Security

Definition 1

(ODSE security). Let \({\varvec{op}} = (\mathsf {op}_1,\dots ,\mathsf {op}_q)\) be an operation sequence over the distributed encrypted index \( \mathcal {I}\), where \( \mathsf {op}_i \in \big \{\textsf {Search}(w), \textsf {Update}(f_{id})\big \} \), w is a keyword to be searched and \( f_{id} \) is a file with keywords to be updated. Let \( \mathbf {ODSE}_j({\varvec{op}}) \) represent the \( \text {ODSE}\) client’s sequence of interactions with server \( \mathcal {S}_j \), given an operation sequence \({\varvec{op}}\).

An \( \text {ODSE}\) is t-secure if \( \forall \mathcal {L} \subseteq \{1,\dots ,\ell \} \) s.t. \( |\mathcal {L}| \le t \), for any two operation sequences \( {\varvec{op}} \) and \( {\varvec{op}}' \) where \( |{\varvec{op}}| = |{\varvec{op}}'| \), the views \( \{\mathbf {ODSE}_{i\in \mathcal {L}}({\varvec{op}})\} \) and \( \{\mathbf {ODSE}_{i\in \mathcal {L}}({\varvec{op}}')\} \) observed by a coalition of up to t servers are (perfectly, statistically or computationally) indistinguishable.

Remark 1

One might observe that search and update operations in \( \text {ODSE}\) schemes are performed on rows and columns of the encrypted index, respectively. This access structure might enable the adversary to learn whether the operation is search or update, even though each operation is secure. Therefore, to achieve security as in Definition 1, where the query type should also be hidden, we can invoke both search and update protocols (one of them is the dummy operation) regardless of whether the intended action is search or update.
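The idea in this remark can be sketched as a thin client-side wrapper in Python. The four callables are hypothetical stand-ins for the real and dummy search/update protocols; the sketch only shows that every operation triggers one search followed by one update, so the sequence of protocol invocations does not depend on the operation type.

```python
def run_operation(op, search, update, dummy_search, dummy_update):
    """Always issue one search and then one update; only one of them is real.

    `op` is ("search", keyword) or ("update", file_id). The four callables are
    hypothetical placeholders for the ODSE protocols and their dummy variants.
    """
    kind, arg = op
    if kind == "search":
        result = search(arg)
        dummy_update()
    elif kind == "update":
        dummy_search()
        result = update(arg)
    else:
        raise ValueError(f"unknown operation: {kind}")
    return result
```

Whether the dummy variant is indistinguishable from the real one depends on the underlying protocols (for instance, a dummy search would query an arbitrary row and a dummy update would rewrite slots without changing their content); the wrapper itself only fixes the order of invocations.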

We argue the security of our proposed schemes as follows.

Theorem 1

\(\textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) scheme is computationally \( (\ell -1) \)-secure by Definition 1.

Proof

(Sketch) (i) Oblivious Search: \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) leverages XOR-based PIR and therefore, achieves (\( \ell -1 \))-privacy for keyword search as proven in [10]. (ii) Oblivious Update: \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) employs Write-Only ORAM which achieves negligible write failure probability and therefore, it offers the statistical security without counting the encryption. The index in \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) is IND-CPA encrypted, which offers computational security. Therefore in general, the update access pattern of \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) scheme is computationally indistinguishable. \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) performs Write-Only ORAM with an identical procedure on \( \ell \) servers (e.g., the indexes of accessed columns are the same in \( \ell \) servers), and therefore, the server coalition does not affect the update privacy of \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\). (iii) ODSE Security: By Remark 1, \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) performs both search and update regardless of the actual operation. As analyzed, search is \( (\ell -1) \)-private and update pattern is computationally secure. Therefore, \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) achieves computational \( (\ell -1) \)-security by Definition 1.    \(\square \)

Theorem 2

\(\textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) scheme is statistically t-secure by Definition 1.

Proof

(Sketch) (i) Oblivious Search: \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) leverages an SSS-based PIR protocol and therefore, achieves t-privacy for keyword search due to the t-privacy property of \( {\text {SSS}}\) [13]. (ii) Oblivious Update: The index in \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) is \( {\text {SSS}}\)-shared, which is information-theoretically secure in the presence of t colluding servers. \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) also employs Write-Only ORAM, which offers statistical security due to negligible write failure probability. Therefore in general, the update access pattern of \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) scheme is information-theoretically (statistically) indistinguishable in the coalition of up to t servers. (iii) ODSE Security: By Remark 1, \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) performs both search and update protocols regardless of the actual operation. As analyzed above, search is t-private and update pattern is statistically t-indistinguishable. Therefore, \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) is information-theoretically (statistically) t-secure by Definition 1.    \(\square \)

4.1 Malicious Input Tolerance

We have shown that \( \text {ODSE}\) schemes offer a certain level of collusion-resiliency in the honest-but-curious setting where the server follows the protocol faithfully. In some privacy-critical applications, it is necessary to achieve data integrity in the malicious environment, where the adversary can tamper with the query and data to compromise the correctness and privacy of the protocol. We show that \( \text {ODSE}\) schemes can be extended to detect and be robust against malicious servers as follows. In \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\), we can leverage a Message Authentication Code (e.g., HMAC) as presented in [19], where an authenticated tag for each row and each column of \( \mathbf {I}\) is generated. The server performs the same operations (i.e., PIR, Write-Only ORAM) on these tags as on the encrypted index data and sends the result to the client. The client can then recover/decrypt the row/column together with its authenticated tag and verify the integrity.
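A minimal Python sketch of the tag generation and verification step is given below, using the standard `hmac` and `hashlib` modules. It does not reproduce the construction of [19]; binding the tag to a row/column label and index, the message layout and the function names are assumptions made for illustration.

```python
import hashlib
import hmac

def make_tag(key, kind, index, ciphertext):
    """HMAC-SHA256 tag that binds the data to its position, e.g. kind=b"row", index=7."""
    msg = kind + index.to_bytes(8, "big") + ciphertext
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify(key, kind, index, ciphertext, tag):
    """Constant-time comparison of the recomputed tag with the retrieved one."""
    return hmac.compare_digest(make_tag(key, kind, index, ciphertext), tag)
```

In this sketch the client would recompute the tag over the retrieved row (or column) and reject the result on mismatch, which detects tampering but gives no way to recover the correct data.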

Since \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) relies on \( {\text {SSS}}\) as the building block, it can not only detect but also be robust against malicious servers. The main idea is to leverage a list decoding algorithm as in [13] when the Lagrange interpolation in the \( \mathsf {{\text {SSS}}.Recover}\) algorithm does not return a consistent value. Such techniques also allow the client to determine precisely which server has tampered with the data. We refer readers to [13] for a detailed description. In general, list decoding tolerates \( t_m \) incorrect shares of \([\![\alpha \!]\!]^{(t)}\), where \(t_m \le t < \ell - {\lceil }\sqrt{\ell t}{\rceil }\).
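To illustrate what an inconsistent interpolation looks like and how redundant shares expose a tampering server, the Python sketch below uses a naive subset search as a stand-in for list decoding. It is not the algorithm of [13]: it tries every \( (t+1) \)-subset, runs in exponential time, and is only meaningful when enough honest shares are present for the answer to be unique. The prime and function names are assumptions.

```python
from itertools import combinations

P = 65521  # illustrative 16-bit prime (assumption)

def interpolate(points, x, p=P):
    """Evaluate at x the unique polynomial through `points` (a sequence of (x_i, y_i))."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def robust_recover(shares, t, max_bad, p=P):
    """Return (secret, bad_xs), or None if no candidate polynomial fits.

    A candidate is accepted when at most `max_bad` of the given shares disagree with it.
    """
    points = sorted(shares.items())
    for subset in combinations(points, t + 1):
        bad = [x for x, y in points if interpolate(subset, x, p) != y]
        if len(bad) <= max_bad:
            return interpolate(subset, 0, p), bad
    return None
```

For instance, with \( t = 1 \), five shares and one tampered share, the four honest shares lie on a single line, so the search returns the secret together with the identifier of the inconsistent server.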

5 Experimental Evaluation

5.1 Configurations

Implementation Details. We implemented all \( \text {ODSE}\) schemes in C++. Specifically, we used Google Sparsehash to implement hash tables \( T_f\) and \( T_w\). We utilized Intel AES-NI library to implement AES-CTR encryption/decryption in \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\). We leveraged Shoup’s NTL library for pseudo-random number generator and arithmetic operations over finite field. We used ZeroMQ library for client-server communication. We used multi-threading technique to accelerate PIR computation at the server. Our code is publicly available at

figure g

Hardware and Network Settings. We used Amazon EC2 with r4.4xlarge instance for server(s), each equipped with 16 vCPUs Intel Xeon @ 2.3 GHz and 122 GB RAM. We used a laptop with Intel Core i5 @ 2.90 GHz and 16 GB RAM as the client. All machines ran Ubuntu 16.04. The client established a network connection with the server via WiFi. We used a real network setting, where the download and upload throughputs are 27 and 5 Mbps, respectively.

Dataset. We used subsets of the Enron dataset to build \(\mathbf {I}\) containing from millions to billions of keyword-file pairs. The largest database in this study contains around 300,000 files with 320,000 unique keywords. Our tokenization is identical to [21] so that our keyword distribution and query pattern are similar to [21].

Instantiation of Compared Techniques. We compared \( \text {ODSE}\) with a standard DSSE scheme [9], and the use of generic ORAM atop the DSSE encrypted index. The performance of all schemes was measured under the same setting and in the average-case cost, where each query involves half of the keywords/files in the database. We configured \( \text {ODSE}\) schemes and their counterparts as follows.

  • ODSE: We used two servers for \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) and three servers for \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) scheme. We selected \( \lambda = 4 \) for \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\), and \( \lambda '=4 \) with \( \mathbb {F}_p\) where p is a 16-bit prime for \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\). We note that selecting larger p (up to 64 bits) can reduce the PIR computation time, but also increase the bandwidth overhead. We chose a 16-bit prime field to achieve a balanced computation vs. communication overhead.

  • Standard DSSE: We selected one of the most efficient DSSE schemes by Cash et al. in [9] (i.e., \( \mathrm {\varPi }_{\mathsf {2lev}}^{\mathsf {dyn}} \) variant) to showcase the performance gap between \( \text {ODSE}\) and standard DSSE. We estimated the performance of \( \mathrm {\varPi }_{\mathsf {2lev}}^{\mathsf {dyn}} \) using the same software/hardware environments and optimizations as \( \text {ODSE}\) (e.g., parallelization, AES-NI acceleration). Note that we did not use the Java implementation of this scheme available in Clusion library [1] for comparison due to its lack of hardware acceleration support (no AES-NI) and the difference between running environments (Java VM vs. C). Our estimation is conservative in that we used numbers that would be better than those of the Clusion library.

  • Using generic ORAM atop DSSE encrypted index: We selected non-recursive Path-ORAM [25] and Ring-ORAM [22], rather than recent ORAMs as \( \text {ODSE}\) counterparts since they are the most efficient generic ORAM schemes to date. Since we focus on encrypted index rather than encrypted files in DSSE, we did not explicitly compare our schemes with TWORAM [12] but instead, used one of their techniques to optimize the performance of using generic ORAM on DSSE encrypted index. Specifically, we applied the selected ORAMs on the dictionary index containing keyword-file pairs as in [21] along with the round-trip optimization as in [12]. Note that our estimates are also conservative where memory access delays were excluded, and cryptographic operations were optimized and parallelized to make a fair comparison between the considered schemes.

Fig. 5. Latency of \( \text {ODSE}\) schemes and their counterparts.

5.2 Overall Results

Figure 5 presents the end-to-end delays of \( \text {ODSE}\) schemes and their counterparts, where both search and update are performed in \( \text {ODSE}\) schemes to hide the actual type of operation (see Remark 1). \( \text {ODSE}\) offers higher security than standard DSSE at the cost of a longer delay. However, \( \text {ODSE}\) schemes are \( 3 \times \) to \(57\times \) faster than the use of generic ORAMs to hide the access patterns. Specifically, with an encrypted index containing ten billion keyword-file pairs, \( \mathrm {\varPi }_{\mathsf {2lev}}^{\mathsf {dyn}} \) took 36 ms and 600 ms to finish a search and an update operation, respectively. \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) and \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) took 2.8 s and 7.1 s, respectively, to accomplish both keyword search and file update operations, compared with 160 s by using Path-ORAM with the round-trip optimization [12]. \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) is the most efficient in terms of search, with a delay of less than 1 s. This is due to the fact that \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) only requires XOR operations and the size of the search query is minimal (i.e., a binary string). \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) is more robust (e.g., malicious tolerant) and more secure (e.g., unconditional security) than \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) at the cost of a higher search delay (i.e., 4 s) due to the larger search query and \( {\text {SSS}}\) arithmetic computations. For the file update, \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) took 3 s, which is slightly higher than \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) (i.e., 2.2 s) since it needs to transmit more data (4 blocks vs. 4 columns) to more servers (3 vs. 2). We further provide a comparison of ODSE schemes with their counterparts in Table 1.

We dissected the total cost to investigate which factors contributed the most to the latency of \( \text {ODSE}\) schemes as follows.

5.3 Detailed Cost Analysis

Figure 6 presents the total delays of separate keyword search and file update operations, as well as their detailed costs in \( \text {ODSE}\) schemes. Note that \( \text {ODSE}\) performs both search and update (one of them is dummy) to hide the actual type of operation performed by the client.

Table 1. Comparison of \( \text {ODSE}\) and its counterparts for oblivious access on \( \mathbf {I}\).
Fig. 6.

Detailed search and update costs of \( \text {ODSE}\) schemes.

  • Client processing: As shown in Fig. 6, client computation contributes the least to the overall search delay (less than \(10\%\)) in all \( \text {ODSE}\) schemes. The client computation comprises the following operations: (1) generating select queries (with SSS in \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) and PRF in \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\)); (2) \( {\text {SSS}}\) recovery (in \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\)) and IND-CPA decryption (in \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\)); (3) filtering dummy columns. Note that the client delay of \( \text {ODSE}\) schemes can be further reduced (by at least \(50\%\) to \(60\%\)) by pre-computing some values, such as row keys and select queries (which only contain shares of 0 or 1). For the file update, the client performs decryption and re-encryption on \( \lambda \) columns (in \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\)), or SSS over \( \lambda ' \) blocks (in \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\)). Since we used crypto acceleration (i.e., Intel AES-NI) and highly optimized number theory libraries (i.e., NTL), all these computations contributed only a small fraction of the total delay.

  • Client-server communication: Data transmission is the dominating factor in the delay of \(\text {ODSE}\) schemes. The communication cost of \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) is smaller than that of other \( \text {ODSE}\) schemes, since both the search query and the data returned by the servers are binary vectors. In \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\), each component of the select vector is 16 bits long. The communication overhead of \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) can be reduced by using a smaller finite field, but at the cost of increased PIR computation on the server side.

  • Server processing: The cost of PIR operations in \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) is negligible as it uses XOR. The PIR computation of \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) is reasonable, as it operates on batches of 16-bit values. For update operations, the server-side cost is mainly due to memory accesses for the column update. \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) is highly memory access-efficient since we organized the memory layout for column-friendly access. This layout minimizes the memory access delay not only in update but also in search, since the inner product in PIR also accesses contiguous memory blocks under this organization. In \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\), we stored the matrix for row-friendly access to permit efficient XOR operations during search. However, this requires the file update to access non-contiguous memory blocks. Hence, the file update in \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) incurred a higher memory access delay than that of \( \textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\), as shown in Fig. 6.

  • Storage overhead: The main limitation of \( \text {ODSE}\) schemes is the size of the encrypted index, whose asymptotic cost is \( \mathcal {O}({N}\cdot {M}) \), where \( {N}\) and \( {M}\) are the number of files and unique keywords, respectively. For the largest database in our experiments, the size of our encrypted index is 23 GB. The client storage includes two hash tables of size \( \mathcal {O}({M}) \) and \( \mathcal {O}({N}\log {N}) \), the stash of size \( \mathcal {O}({M}\cdot \log {N}) \), the set of dummy column indexes of size \( \mathcal {O}({N}\log {N}) \), a counter vector of size \( \varOmega ({N}) \) and a master key (in the \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) scheme). Empirically, for the same database, the client requires approximately 22 MB of storage in both \( \text {ODSE}\) schemes.
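To illustrate why the XOR-based search is so cheap on both the communication and the server side, the following is a minimal sketch of a textbook two-server XOR PIR read over a bit-packed index. It is not the released \( \text {ODSE}\) implementation: the function names (`make_queries`, `xor_rows`, `pir_read`) and the representation of each row as a Python integer are assumptions made for illustration, and the sketch omits the IND-CPA encryption of the index, the PRF-based query generation, and the dummy update that the actual scheme performs.

```python
import secrets


def make_queries(num_rows, target):
    """Split the unit vector e_target into two binary select queries.

    Each query on its own is a uniformly random bit vector, so a single
    non-colluding server learns nothing about `target`.
    """
    q1 = [secrets.randbits(1) for _ in range(num_rows)]
    q2 = list(q1)
    q2[target] ^= 1  # q1 XOR q2 equals the unit vector at `target`
    return q1, q2


def xor_rows(matrix, selector):
    """Server side: XOR together the rows whose selector bit is 1.

    Rows are bit-packed integers (hypothetical layout), which mirrors a
    row-friendly memory organization.
    """
    acc = 0
    for row, bit in zip(matrix, selector):
        if bit:
            acc ^= row
    return acc


def pir_read(matrix_server1, matrix_server2, target):
    """Client side: query both replicas and combine the two answers."""
    q1, q2 = make_queries(len(matrix_server1), target)
    answer1 = xor_rows(matrix_server1, q1)
    answer2 = xor_rows(matrix_server2, q2)
    # All rows selected by both queries cancel out; only `target` remains.
    return answer1 ^ answer2
```

In this sketch the query is one bit per row and each answer is a single row, which matches the observation above that the queries and responses are binary vectors and that the server only performs XOR.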

5.4 Experiment with Various Query Sizes

We studied the performance of our schemes and their counterparts for varying numbers of keywords and files involved in search and update operations, which we refer to as the “query size”. As shown in Fig. 7, \( \text {ODSE}\) schemes are more efficient than using generic ORAMs when more than 5% of keywords/files in the database are involved in the search/update operations. Since the complexity of \( \text {ODSE}\) schemes is linear in the number of keywords and files (i.e., \( \mathcal {O}({M}+ {N}) \)), their delay is constant and independent of the query size. The complexity of ORAM approaches is \( \mathcal {O}(r \log ^2 ({N}\cdot {M})) \), where r is the query size. Although the bandwidth cost of \( \text {ODSE}\) schemes is asymptotically linear, their actual delay is much lower than that of generic ORAM, whose cost is poly-logarithmic in the total number of keywords/files but linear in the query size. This confirms the results of Naveed et al. [21] on the performance limitations of generic ORAM and DSSE composition; we used the same dataset in our experiments.
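The trade-off between the two asymptotic costs can be made concrete with a small model. The sketch below simply evaluates the two expressions above with all hidden constants set to 1, which is an assumption made purely for illustration; the function names are hypothetical, the returned values are abstract cost units rather than seconds, and the crossover it computes is therefore not expected to reproduce the 5% threshold measured in Fig. 7.

```python
import math


def odse_cost(num_keywords, num_files):
    """Linear model O(M + N): independent of the query size."""
    return num_keywords + num_files


def oram_cost(num_keywords, num_files, query_size):
    """Generic ORAM model O(r * log^2(N * M)) with unit constant."""
    return query_size * math.log2(num_keywords * num_files) ** 2


def crossover_query_size(num_keywords, num_files):
    """Smallest query size r at which the ORAM model costs at least as much."""
    per_item = math.log2(num_keywords * num_files) ** 2
    return math.ceil(odse_cost(num_keywords, num_files) / per_item)
```

Under this model the \( \text {ODSE}\) cost is a flat line in r while the ORAM cost grows linearly in r, so there is always a query size beyond which the linear-cost scheme is cheaper; where that point falls in practice depends on the real constants and on network conditions.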

Fig. 7.

Latency of \( \text {ODSE}\) schemes and their counterparts with different fractions of keywords/files involved in a search/update operation.

6 Conclusion

We proposed a new set of Oblivious Distributed DSSE schemes called \( \text {ODSE}\), which simultaneously achieve full obliviousness, a hidden size pattern, and low end-to-end delay. Specifically, \( \textsf {ODSE}_{\textsf {xor}}^{\textsf {wo}}\) achieves the lowest end-to-end delay and the smallest communication overhead among all of its counterparts, together with the highest resiliency against colluding servers. \(\textsf {ODSE}_{\textsf {it}}^{\textsf {wo}}\) achieves the highest level of privacy, with information-theoretic security for access patterns and the encrypted index, along with robustness against malicious servers. Our experiments demonstrated that \( \text {ODSE}\) schemes are one order of magnitude faster than the most efficient ORAM techniques over a DSSE encrypted index. We have released the full implementation of our \( \text {ODSE}\) schemes for public use and wide adoption.