Keywords

1 Introduction

Cloud computing has become a prevalent and economic platform for users to outsource and share data. For instance, through a file or picture sharing service (e.g., Dropbox, Flickr), users can conveniently upload files or pictures to the cloud to share them with their friends, family or colleagues. In a health information sharing system, a number of research institutes and hospitals may share medical data to facilitate collaboration and accelerate scientific discovery. In such data sharing applications, the sensitive nature of the data means that it is necessary for data owners to encrypt the data before outsourcing it to the cloud. On the other hand, useful functions, e.g., searching over the outsourced encrypted data, should still be supported to preserve the utility of the data. Another complication in such data sharing applications is that data owners often selectively share data with others. For instance, Alice may want to share family pictures with her family members, while share work related files with her colleagues. As such, a user can only have access to a subset of the files that are permitted by the corresponding data owners. Since different users have access to different files, a natural way for encryption is that a data owner encrypts different files using different keys. After that, a data owner gives each user the encryption/decryption keys of all the files that the user is authorized to search. This trivial solution is clearly inefficient. First, the number of keys that the data owner needs to send to a user depends on the number of files. Secondly, the number of trapdoors (or tokens) that a user needs to submit when searching for a keyword also depends on the number of files.

In a recent study [25], Popa and Zeldovich solve the above problem by proposing a multi-key searchable encryption scheme that allows a user to provide a single search token to the server, while allowing the server to search for the (encrypted) keyword in documents encrypted with different keys. Their solution requires each data owner to send the server some public information (delta values) for each user that is allowed to search a file. In general, suppose a data owner has m files, each file is shared with up to n users, and each file contains up to k keywords, the data owner needs to outsource \(O(m(n+k))\) amount of data to the server to support keyword search, where O(mn) and O(mk) correspond to delta values and keyword ciphertexts, respectively. As a result, both the network bandwidth overhead (from the data owner to the server) and storage overhead (at the server) are \(O(m(n+k))\). While the overhead associated with outsourcing keyword ciphertexts is unavoidable, it is desirable to reduce the overhead for outsourcing the delta values, which can dominate the overhead when \(k \ll n\) (e.g., for a picture with a small number of keywords but shared with many users).

Our contributions. Our contributions are as follows:

  • We give an efficient encrypted keyword search scheme for multi-user data sharing that overcomes the drawbacks in [25]. Specifically, in our scheme, when a data owner has m files, each file has up to k keywords, to support encrypted keyword search by up to n users, the data owner only needs to outsource O(mk) amount of information to the cloud. Given that outsourcing keyword ciphertexts is unavoidable (which is O(mk)), our scheme is asymptotically optimal in network bandwidth usage and cloud storage. In addition, each user has constant size secret keys and generates constant size trapdoor for a keyword search in all the files that he is authorized to search by a data owner.

  • Our scheme requires very different techniques to compress ciphertexts and succinctly represent the search/access policies that dictate which files a user can access. We model and analyze the security of our construction and provide detailed proofs for keyword, file and trapdoor privacy, respectively.

  • We give an extension that allows a system with multiple trusted parties (e.g., managers, data owners) to generate distributively the public key for the system and secret keys for the users, thus eliminating the need of having a single trusted party.

Related Work. To the best of our knowledge, no existing study achieves efficiency comparable to ours in the setting that we study. Our work is related to searchable encryption in general. Single-user searchable encryption considers a single data owner and only the data owner is allowed to submit search queries. It has been studied extensively, with focus on symmetric-key settings [13, 16, 19, 27], asymmetric-key settings [5, 8], supporting complicated (conjunctive, subset, or boolean) queries [10, 12], or dynamic settings [21]. Our work is in multi-user setting where a group of users can submit search queries.

Multi-user searchable encryption. While multi-user searchable encryption has been investigated in a number of studies, these studies differ from our study in important aspects. In [16], a user is allowed to search all the files owned by a data owner while in our study each user is allowed to search a different subset of the files. The study in [4] relies on a fully trusted user manager, which not only sets system public parameters but also generates secret keys of all parties. In addition, when a data owner outsources his data, he needs to interact with the semi-trusted server, and the keyword ciphertext size depends on the number of users. Similar to [4], the study in [17] also uses a fully trusted third party. In addition, their scheme does not provide an access control mechanism that specifies which users can access which files. A recent study [20] proposes multi-user searchable encryption that supports boolean queries. It does not provide an access control mechanism either. In addition, an authorized user needs to interact with the data owner to get a token. Then, he can generate a trapdoor based on the token for the query. The same limitations hold for [18], which extends [20] to support range, substring, wildcard and phrase queries.

In the above studies, the documents are encrypted with a single key. In other studies, the documents are encrypted with different keys. We have mentioned [25], which has motivated our study. Tang [28] improves the study of [25] by presenting a new security model for multiparty searchable encryption and proposing a scheme that authorizes users to share their files. Liu et al. [24] propose a data sharing scheme, which however has two major drawbacks. First, a user needs to interact with a honest-but-curious server to download some information in advance to generate a trapdoor. Secondly, the trapdoor size is linear in the number of files. Cui et al. [15] adapt the idea in [14] and propose a key-aggregate searchable encryption scheme. In their scheme, a data owner provides a single aggregated key that contains information about all files that a user is authorized to search. Each user is given a secret key (i.e., the aggregated key) and then a user generates constant size trapdoor to search a keyword from all files. Their scheme is, however, vulnerable to cross pairing attack, and hence has neither data privacy nor trapdoor privacy. Basically, an untrusted server can figure out the encrypted keyword from the ciphertext and recover the aggregate key of any user (see more details in Appendix A.1). Rompay et al. [29] propose a scheme that uses a proxy based architecture to achieve a stronger security model. Their scheme suffers from the same inefficiency problem as that in [25].

Other related primitives for encrypted keyword search. The study [8] introduces a scheme that transforms a given identity based encryption (IBE) to public key encryption with keyword search (PEKS). A later study [1] examines the relationships between these two primitives and points out that if one has an anonymous IBE scheme, then he achieves secure PEKS. Attrapadung et al. [2, 3] introduce a new cryptographic primitive called coupling. It is a broadcast encryption that further has hierarchical identity based dimension. It combines user index with identity. They provide restricted anonymous identity based encryption. Their scheme does not have a concrete proof of security. Moreover, it cannot be applied to our setting since the transformation does not provide trapdoor privacy. Last, existing studies [23, 30] investigate attribute based keyword search, which differ significantly from our study. In [30], the access control policy is based on keywords instead of files, while [23] does not allow a secret key holder to generate search token (trapdoor) individually. In [23], the trapdoor is generated by the collaboration between a secret key holder and the fully trusted PKG (private key generator). In addition, in both schemes, the keyword cliphertext, and trapdoor and secret key sizes depend on the total number of attributes that are involved in a data owner’s access control policy.

2 Preliminaries

Bilinear Map: (1) \(\mathbf {G}_1\), \(\mathbf {G}_2\) and \(\mathbf {G}'\) are three multiplicative cyclic groups of prime order p; (2) \(g_1\) is a generator of \(\mathbf {G}_1\), \(g_2\) is a generator of \(\mathbf {G}_2\). A bilinear map \(e:\mathbf {G}_1 \times \mathbf {G}_2 \rightarrow \mathbf {G}'\) has the following properties: (1) for all \(u\in \mathbf {G}_1,v \in \mathbf {G}_2\) and \(a,b \in \mathbb Z_{p}\), we have \(e\left( u^{a},v^{b}\right) = e\left( u,v\right) ^{ab}\); (2) the map is not degenerate, i.e., \(e\left( g_1,g_2\right) \not =1\). It is a symmetric bilinear map when \(\mathbf {G}_1=\mathbf {G}_2=\mathbf {G}\).

2.1 Popa-Zeldovich Scheme [25]

We briefly summarize the construction in [25]. For simplicity, we only present the scenario where a single data owner has m files (each file is encrypted with a different key) and can search all the files with a single trapdoor (token).

Let \(H:\{0,1\}^{*} \rightarrow \mathbf {G}_{1}\), \(H_{2}:\mathbf {G}' \times \mathbf {G}' \rightarrow \{0,1\}^{*}\) be hash functions that are modeled as random oracles. Let \(e:\mathbf {G}_{1} \times \mathbf {G}_{2} \rightarrow \mathbf {G}'\) be a bilinear map. The multi-key searchable encryption (MKSE) scheme proposed in [25] is as follows.

  • MK.Setup(\(\lambda \)): return params (\(p,\mathbf {G}_{1},\mathbf {G}_{2},\mathbf {G},e,g_{1},g_{2},g'\)).

  • MK.KeyGen(params): returns \(uk, k_{1},..., k_{m}\), where uk is the user secret key, \(k_j\) is the encryption key of file j.

  • MK.Delta(\(uk, k_{1},..., k_{m}\)): returns \(\varDelta _{j}=g_{2}^{k_{j}/uk} \in \mathbf {G}_{2}\).

  • MK.Token(ukw): returns \(tk_{w}=H(w)^{uk} \in \mathbf {G}_{1}\).

  • MK.Enc(\(k_{j}, w\)): Draw \(r \in \mathbf {G}'\), outputs \(c_{j,w}=(r,h)\), where

    $$\begin{aligned} h=H_{2}(r,e(H(w),g_{2})^{k_{j}})\,. \end{aligned}$$
  • MK.Adjust(\(tk_{w}, \varDelta _{j}\)): returns \(stk_{j,w}=e(tk_{w},\varDelta _{j})=e(H(w)^{uk},g_{2}^{k_{j}/uk})\).

  • MK.Match(\(stk_{j,w}, c_{j,w}\)): parse \(c_{j,w}\) as (rh) and check if \(H_{2}(r,stk_{j,w})\mathop {=}\limits ^{?}h\). if so, outputs \(b=1\), otherwise \(b=0\).

In the above construction, to search T different keywords, a user only needs to provide \(O(T+m)\) pieces of information to the server (T trapdoors and m delta values), much more efficient than that when using standard searchable encryption (which needs O(Tm) pieces of information).

The authors show that the MKSE scheme has data hiding and token hiding properties. Data hiding (privacy) requires that the semi-honest adversary is not able to distinguish between ciphertexts of two values not matched by some token. Token hiding (privacy) requires that the adversary cannot learn the keyword that one searches for.

The MKSE scheme is inefficient since the data owner needs to provide a delta value for each user that is authorized to search a file. Hence the number of delta values for a file is linear in the number of users. This results a significant communication overhead between a data owner and the server when the data owner outsources the delta values, and significant storage overhead for the server to store these ciphertexts.

2.2 Complexity Assumptions

Decision Linear Assumption. Let \(\mathbf {G}\) be a bilinear group of prime order p. The Decision Linear problem [7] in \(\mathbf {G}\) is stated as follows: given a vector \((g,Z_1=g^{z_{1}},Z_2=g^{z_{2}},Z_{13}=g^{z_{1}z_{3}},Z_{24}=g^{z_{2}z_{4}},Z) \in \mathbf {G}^{6}\) as input, determine whether \(Z=g^{z_{3}+z_{4}}\) or a random value \(R \in \mathbf {G}\). The advantage of an algorithm \(\mathcal {B}\) in deciding the decision linear problem in \(\mathbf {G}\) is

$$\begin{aligned}&\Bigl |\Pr \left[ \mathcal {B}\left( g,g^{z_{1}},g^{z_{2}},g^{z_{1}z_{3}},g^{z_{2}z_{4}},g^{z_{3}+z_{4}}\right) =0\right] \\&\qquad \qquad \qquad \qquad \qquad \qquad \quad -\Pr \left[ \mathcal {B}\left( g,g^{z_{1}},g^{z_{2}},g^{z_{1}z_{3}},g^{z_{2}z_{4}},R\right) =0\right] \Bigr | \le \epsilon \,. \end{aligned}$$

\(\ell \) -Bilinear Diffie-Hellman Exponentiation Assumption ( \(\ell \) -BDHE) [6]. Let \(\mathbf {G}\) be a bilinear group of prime order p. The \(\ell \)-BDHE problem in \(\mathbf {G}\) is defined as follows: given a vector of \(2\ell +1\) elements \((h,g,g^{\alpha },g^{\alpha ^{2}},...,g^{\alpha ^{\ell }},g^{\alpha ^{\ell +2}},..., g^{\alpha ^{2\ell }})\) as input, output \(e\left( g,h\right) ^{\alpha ^{\ell +1}}\). Once g and \(\alpha \) are specified, we denote \(g_{i}=g^{\alpha ^{i}}\) as shorthand. An algorithm \(\mathcal {A}\) has advantage \(\epsilon \) in solving \(\ell \)-BDHE in \(\mathbf {G}\) if

$$\begin{aligned} \Pr \left[ \mathcal {A}\left( h,g,g^{\alpha }, ...,g^{\alpha ^{\ell }},g^{\alpha ^{\ell +2}},..., g^{\alpha ^{2\ell }}\right) =e\left( g^{\alpha ^{\ell +1}},h\right) \right] \ge \epsilon \,, \end{aligned}$$

where the probability is over the random choice of generators \(g,h \in \mathbf {G}\), the random choice of \(\alpha \in Z_{p}\) and random bits used by \(\mathcal {B}\). The decisional version of the \(\ell \)-BDHE problem in \(\mathbf {G}\) is defined analogously. Let

$$\begin{aligned} \bar{{\varvec{y}}}_{g,\alpha ,\ell }=\left( g^{\alpha },\dots ,g^{\alpha ^{\ell }},g^{\alpha ^{\ell +2}},\dots , g^{\alpha ^{2\ell }}\right) . \end{aligned}$$

An algorithm \(\mathcal {B}\) that outputs \(b \in \{0,1\}\) has advantage \(\epsilon \) in solving decision \(\ell \)-BDHE in \(\mathbf {G}\) if

$$\begin{aligned} \left| \Pr \left[ \mathcal {B}\left( g,h,\bar{{\varvec{y}}}_{g,\alpha ,\ell },e\left( g_{\ell +1},h\right) \right) =0\right] -\Pr \left[ \mathcal {B}\left( g,h,\bar{{\varvec{y}}}_{g,\alpha ,\ell },R\right) =0\right] \right| \ge \epsilon , \end{aligned}$$

where the probability is over the random choice of generators gh in \(\mathbf {G}\), the random choice of \(\alpha \in Z_p\), the random choice of \(R \in \mathbf {G}'\), and the random bits consumed by \(\mathcal {B}\).

\(n\text {-}{} \mathbf{Decisional~Diffie}\text {-}{} \mathbf{Hellman~Inverse~Assumption}~\mathbf{(}n\text {-}{} \mathbf{DDHI).}\) Let \(\mathbf {G}\) be a bilinear group of prime order p. The n-DDHI assumption in \(\mathbf {G}\) is stated as follows: given a vector \(g,\bar{{\varvec{y}}}_{g,\alpha ,n}=(g_1,\dots ,g_n,g_{n+2},\dots ,g_{2n}),g^{\gamma },h=g^{z},Z \in \mathbf {G}^{2n+3}\) as input, determine whether \(Z=h^{{\gamma }/{\alpha ^{n+1}}}\) or a random value R in \(\mathbf {G}\). The advantage of an algorithm \(\mathcal {B}\) in deciding the n-DDHI in \(\mathbf {G}\) is

$$\begin{aligned} \left| \Pr \left[ \mathcal {B}\left( g,\bar{{\varvec{y}}}_{g,\alpha ,n},g^{\gamma },g^{z},g^{{z\gamma }/{\alpha ^{n+1}}}\right) =0\right] -\Pr \left[ \mathcal {B}\left( g,\bar{{\varvec{y}}}_{g,\alpha ,n},g^{\gamma },g^{z},R\right) =0\right] \right| \le \epsilon \,. \end{aligned}$$

We provide security evidence of our hardness assumption (n-DDHI) by presenting bounds on the success probabilities of an adversary \(\mathcal {A}\). We follow the theoretical generic group model (GGM) as presented in [26] and show the justification of it.

Theorem 1

Let \(\mathcal {A}\) be an algorithm that solves the n-DDHI problem in the generic group model, making a total of at most q queries to the oracles computing the group action in \(\mathbf {G},\mathbf {G}'\) and the oracle computing the bilinear pairing e. If \(\alpha ,\gamma ,r,z \in Z^{*}_{p}\) and \(\xi ,\xi '\) are chosen at random, then

$$\begin{aligned} \Pr \begin{bmatrix} \mathcal {A}(p,\xi (1),\xi (z),\xi (\alpha ),\ldots ,\xi (\alpha ^{n}),\\ \xi (\alpha ^{n+2}),\ldots ,\xi (\alpha ^{2n}),\xi (\gamma ),\xi (t_{0}),\xi (t_{1}))=d:\\ \alpha ,\gamma ,z \leftarrow Z^{*}_{p}, d\leftarrow \{0,1\},t_{d} \leftarrow \frac{z\gamma }{\alpha ^{n+1}},t_{1-d} \leftarrow r \end{bmatrix} \le \frac{1}{2} + \frac{16n(q+n+1)^{2}}{p}\,. \end{aligned}$$

We provide security evidence for the hardness of the n-DDHI in Appendix A.2.

3 Secure and Efficient Multi-user Encrypted Keyword Search (SEMEKS)

In this section, we define the notion of secure and efficient multi-user encrypted keyword search (SEMEKS) for data sharing. The constructions are deferred to Sect. 4.

Definition 1

In SEMEKS, there are n users, m documents (files) and a server. Let U denote the set of users and \(D= \{M_1,\ldots ,M_m\}\) denote the set of documents. Each document has a set of unique keywords. Let \(F_j\) denote the set of unique keywords in \(M_j\). Let \(S_{j} \subseteq U\) denote the set of users that can access file j. If user \(i\in S_{j}\), then he is able to search any keyword in \(F_j\) and retrieve \(M_j\).

The server \(\mathcal {S}\) stores all ciphertexts for keywords and documents. The server is honest-but-curious, i.e., he does not change any data, and he gives the query results honestly, but he is curious in that he is trying to learn more information from the data and queries (for extracting keywords from ciphertexts and trapdoors). Our model does not allow the adversary (the server) to collude with any of the users. Otherwise, it leaks keywords.

Definition 2

A SEMEKS scheme consists of the following five algorithms.

  • Setup(\(n,\lambda \)): a randomized algorithm that takes the number of users n, and the security parameter \(\lambda \), as input. It outputs \((pk, \{sk_{1},\dots , sk_{n}\})\), where pk is the set of system public key and \(sk_{i}\) is the secret key of user i.

  • Enc(\(pk,F_k,S_k,M_k\)): a randomized algorithm that takes file \(M_k\), the set of unique keywords \(F_k\), public key pk, and user set \(S_k\) as input. It outputs keyword ciphertext \(C_k\) and file ciphertext \(C''_k\).

  • Trap(\(sk_i,w\)): a randomized algorithm that takes secret key \(sk_{i}\) and keyword w as input. It outputs trapdoor \(t_{i,w}\) for user i to query w.

  • Test(\(pk,S_k,t_{i,w},C_k\)): a deterministic algorithm that takes keyword ciphertext \(C_k\), trapdoor \(t_{i,w}\), user set \(S_k\), and public key pk as input. It outputs \(b\in \{0,1\}\), where \(b = 1\) if \(i\in S_k\) and \(C_k\) includes the ciphertext of w; otherwise, \(b = 0\).

  • Dec(\(pk,sk_{i},C''_k,S_k\)): a deterministic algorithm. If \(b = 1\), then the algorithm takes public key pk, secret key \(sk_i\), file ciphertext \(C''_k\) and user set \(S_k\) as input, and outputs the plaintext file \(M_k\). Otherwise, it outputs \(\bot \).

We consider keyword privacy, file privacy, and trapdoor privacy. In addition, we refer to keyword and file privacy together as data privacy. The security definitions of keyword and trapdoor privacy are similar as those defined in [25] with some differences (our encryption scheme uses asymmetric key, and hence the adversary is able to encrypt keywords himself in our setting, which differs from that in [25]). File privacy is defined in our study while not in [25] since it does not consider file decryption functionality.

We define a semantically secure keyword privacy game. In the game, \(\mathcal {A}\) is static that he outputs a keyword and a set pair that he wants to be challenged on. He observes encryption of keywords and trapdoors. However, he is not able to distinguish whether the challenge ciphertext is encoded by the challenge keyword or a random keyword.

Definition 3

(Keyword Privacy). We define static semantic security for keyword privacy in SEMKS by the following game between an adversary \(\mathcal {A}\) and a challenger \(\mathcal {C}\). Both \(\mathcal {C}\) and \(\mathcal {A}\) are given \((n,\lambda )\) as input.

  • Init: \(\mathcal {A}\) takes security parameter \(\lambda \) and outputs a set \(S_{0}\), a keyword \(w^{*}\) that he wants to be challenged on.

  • Setup: \(\mathcal {C}\) runs \(\textit{Setup}(n,\lambda )\) algorithm to obtain system public key pk and a set of private keys, \(sk_{1},\ldots ,sk_{n}\). It then gives the public key pk to \(\mathcal {A}\).

  • Query: \(\mathcal {A}\) adaptively issues queries \(q_{1},\ldots ,q_{\lambda }\), where query \(q_{k}\) is a trapdoor query (iw). For such a query, \(\mathcal {C}\) responds by running algorithm \(\textit{Trap}(sk_i,w)\) to derive \(t_{i,w}\), and sends it to \(\mathcal {A}\).

  • Guess: \(\mathcal {C}\) picks a random number \(b \in \{0,1\}\), computes \((C_0,S_0)\) as the output of \(\text {Enc}(pk,w_{b},S_0)\), where \(w_{0}=w^{*}\), and \(w_{1}\) is a random keyword (of the same length as \(w_0\)), returns the value \((C_0,S_0)\) to \(\mathcal {A}\). \(\mathcal {A}\) outputs its guess \(b' \in \{0,1\}\) for b and wins the game if \(b=b'\).

Restriction: The adversary asks trapdoor queries only when \(i\notin S_0\) and \(w \ne w^{*}\).

A SEMKS scheme is keyword private if, for all \(\text {PPT}\) adversaries \(\mathcal {A}\), for all sufficiently large \(\lambda \), \(\Pr [\text {win}_{\mathcal {A}}(\lambda ,n)] < 1/2\) + \(\text {negl}(\lambda ,n)\), where \(\text {win}_\mathcal {A}(\lambda ,n)\) is a random variable indicating whether the adversary wins the game for security parameter \(\lambda \).

We define a file privacy game that is similar to a semantically secure broadcast encryption definition since a file is encrypted for a set of users. If a user is a member of the corresponding set, he can download the file ciphertext and retrieves the file using his decryption key. In file privacy game, \(\mathcal {A}\) outputs two messages \(M_{0},M_{1}\) and a user set \(S_0\) he wants to be challenged upon. He gets public parameters and user secret keys that are not in the target user set (\(S_0\)). However, he is not able to distinguish whether the given ciphertext is the ciphertext of \(M_{0}\) or \(M_{1}\).

Definition 4

(File Privacy). We define static semantic security for file privacy in SEMEKS by the following game between a challenger \(\mathcal {C}\) and an adversary \(\mathcal {A}\).

  • Init: \(\mathcal {A}\) takes parameters \((n,\lambda )\) and outputs a set \(S_{0}\) and two messages \(M_{0},M_{1}\) that he wants to be challenged on.

  • Setup: \(\mathcal {C}\) runs \(\textit{Setup}(n,\lambda )\) algorithm to obtain system public key pk. It then gives the public key pk to \(\mathcal {A}\).

  • Query: \(\mathcal {A}\) adaptively issues private key queries of user \(j\notin S_{0}\). \(\mathcal {C}\) gives the secret keys, \(sk_{j}\) to \(\mathcal {A}\), where \(j\notin S_{0}\).

  • Challenge: \(\mathcal {C}\) chooses a random \(b \in \{0,1\}\) and runs \(Enc(S_{0},pk,M_{b})\) to obtain the ciphertext \(C''^{*}\), and gives it to \(\mathcal {A}\).

  • Guess: \(\mathcal {A}\) guesses \(b' \in \{0, 1\}\) for b and wins the game if \(b = b'\).

File privacy game is secure against CPA if for all attacks \(\left| \Pr [b = b'] - {1}/{2}\right| \le \text {negl}(\lambda ,n)\).

We define a static trapdoor privacy game that \(\mathcal {A}\) outputs challenge user index and keyword pair at the beginning of the game to be challenged on. In the game, \(\mathcal {A}\) can observe encryptions of keywords and the challenge trapdoor, but he is not able to distinguish the challenge keyword from a random keyword.

Definition 5

(Trapdoor Privacy). The trapdoor privacy game is between a challenger \(\mathcal {C}\) and an adversary \(\mathcal {A}\) as follows:

  • Init: \(\mathcal {A}\) takes parameters \((n,\lambda )\) and outputs a user and keyword tuple \((i^{*},w^{*})\) that he wants to be challenged on.

  • Setup: \(\mathcal {C}\) runs \(\textit{Setup}(n,\lambda )\) to obtain system public key pk and private keys \(sk_1,...,sk_n\). It gives pk to \(\mathcal {A}\).

  • Query: \(\mathcal {A}\) adaptively issues queries \(q_{1},...,q_{\lambda }\), where query \(q_{k}\) is a trapdoor query (iw). \(\mathcal {C}\) responds by running algorithm \(\textit{Trap}(sk_i,w)\) to derive \(t_{i,w}\), and sends it to \(\mathcal {A}\).

  • Guess: \(\mathcal {C}\) runs \(\text {Trapdoor}\) algorithm on input (\(i^{*},w^{*}_{b}\)) for a random bit b to obtain \(t_{i^{*},w^{*}_{b}}\), where \(w^{*}_b=w^{*}\) if \(b=0\), otherwise, it is a random keyword, and sends it to \(\mathcal {A}\). \(\mathcal {A}\) outputs its guess \(b' \in \{0,1\}\) for b and wins the game if \(b=b'\).

A SEMEKS scheme is trapdoor private if, for all \(\text {PPT}\) adversaries \(\mathcal {A}\),

$$\begin{aligned} \Pr [\text {win}_{\mathcal {A}}(\lambda ,n)] < 1/2 + \text {negl}(\lambda )\,. \end{aligned}$$

4 Constructions

In this section, we give two constructions. For simplicity, we present the constructions assuming there is a single data owner; the scenario where there are multiple data owner can be solved similarly. For ease of exposition, we start with assuming that a centralized trusted third party initializes the system public keys; in Sect. 6, we describe how to eliminate the need of this trusted party. The data owner generates the secret key for each user, and then distributes the secret key to the user.

We adapt the coupling primitive in [2, 3] carefully to the setting of multi-user keyword search where the data owner encrypts each file with a different key; different users are allowed to search different subset of (encrypted) files. Specifically, we couple the broadcast dimension (user index; each user is assigned an index) and the keyword dimension (keyword searchability). When user i wants to retrieve the documents that contain keyword w, user i generates trapdoor \(t_{i,w}\) that has two dimensions: index i and keyword w. It binds both dimensions (index, keyword) to let the server search w in all the files that user i is allowed to access, and then retrieve the corresponding files. Both constructions use two useful cryptographic primitives: broadcast encryption from [9] and anonymous identity-based encryption from [11]. The broadcast encryption primitive provides constant size keyword ciphertexts and user secret keys. Specifically, it uses an aggregation method for ciphertexts that results in constant size ciphertexts (i.e., ciphertext size does not depend on the number of users). The anonymous encryption primitive provides anonymity of the keyword that is being encrypted. It uses anonymous encryption that does not reveal keyword from the ciphertext. Specifically, it uses linear splitting method on the random exponent values, which does not allow the adversary to do guessing attack (cross pairing attack).

The first construction uses a single server. We show that it satisfies data privacy but does not satisfy trapdoor privacy. The second construction (i.e., the main construction) is developed to address the problem. It uses two servers that do not collude. We show that it satisfies both data privacy and trapdoor privacy. At the end, we describe how to eliminate the need of having a single trusted party in our constructions.

4.1 First Construction

Setup(\(n,\lambda \)): Let \(\mathbf {G}\) be a bilinear group of prime order p. The algorithm first picks a random generator \(g \in \mathbf {G}\) and a random value \(\alpha \in Z_{p}\). It computes \(g_{i}= g^{\alpha ^{i}} \in \mathbf {G}\) for \(i=1,2,\ldots ,n,n+2,\ldots ,2n\) (these values are generated by the centralized trusted third party). Next, it picks at random \(\gamma ,\beta \in Z_{p}\) and sets \(v=g^{\gamma },v'=g^{\beta } \in \mathbf {G}\). It then picks random elements \(h_{0,1}, h_{0,2}, h_{1,1}, h_{1,2} \in \mathbf {G}\) and \(a_{1}, b_{1}, a_{2}, b_{2} \in Z_{p}\) (these values are picked by the data owner (DO)). The public key is: \(pk= (g,g_{1},\ldots ,g_{n},g_{n+2},\ldots ,g_{2n}\), \(v,v',h^{a_{1}}_{\ell ,1} ,h^{b_{1}}_{\ell ,1} ,h^{a_{2}}_{\ell ,2} ,h^{b_{2}}_{\ell ,2}\)), where \(\ell =\{0,1\}\). The secret key of the DO is \(sk_{DO} = (\gamma , \beta , a_{1}, b_{1}, a_{2}, b_{2})\). The algorithm outputs pk and \(sk_{DO}\). For users’ secret keys, the DO chooses random \(\rho _{i1},\rho _{i2},\rho '_{i1},\rho '_{i2}\) for user i and computes secret key for user i as follows:

$$\begin{aligned} d_{i,1}&=g_{i}^{\gamma },&d_{i,2}&=g^{a_{1}\rho _{i1}},&d_{i,3}&=g^{a_{1}\rho '_{i1}}, \\ d_{i,4}&=g^{a_{2}\rho _{i2}},&d_{i,5}&=g^{a_{2}\rho '_{i2}},&d_{i,6}&=g^{b_{1}\rho _{i1}}, \\ d_{i,7}&=g^{b_{1}\rho '_{i1}},&d_{i,8}&=g^{b_{2}\rho _{i2}},&d_{i,9}&=g^{b_{2}\rho '_{i2}}, \\ d_{i,10}&=h_{0,1}^{a_{1}b_{1}\rho _{i1}}h_{0,2}^{a_{2}b_{2}\rho _{i2}},&d_{i,11}&=h_{0,1}^{a_{1}b_{1}\rho '_{i1}}h_{0,2}^{a_{2}b_{2}\rho '_{i2}},&d_{i,12}&=h_{1,1}^{a_{1}b_{1}\rho _{i1}}h_{1,2}^{a_{2}b_{2}\rho _{i2}}, \\ d_{i,13}&=h_{1,1}^{a_{1}b_{1}\rho '_{i1}}h_{1,2}^{a_{2}b_{2}\rho '_{i2}},&d_{i,14}&=g_{i}^{\beta }. \end{aligned}$$

These values are given to user i via a secure channel. Specifically, \(sk_i=(d_{i,1},\ldots ,d_{i,14})\).

Enc(\(pk,w,S_{k},M_k\)): The DO picks random values \(t,t',t_{1},t_{2} \in Z_{p}\) for keyword w from \(F_k\), and computes the following:

$$\begin{aligned} K&=e(g_{n+1},g)^{t},&K'&=e(g_{n+1},g)^{t'},&hdr_{k,1}&=(h_{0,1}^{a_{1}}(h_{1,1}^{a_{1}})^{w})^{t_{1}}, \\ hdr_{k,2}&=(h_{0,1}^{b_{1}}(h_{1,1}^{b_{1}})^{w})^{t-t_{1}},&hdr_{k,3}&=(h_{0,2}^{a_{2}}(h_{1,2}^{a_{2}})^{w})^{t_{2}},&hdr_{k,4}&=(h_{0,2}^{b_{2}}(h_{1,2}^{b_{2}})^{w})^{t-t_{2}}, \\ hdr_{k,5}&=g^{t},&hdr_{k,6}&=\Bigl (v\prod _{j\in S_k} g_{n+1-j}\Bigr )^{t},&hdr_{k,7}&=K, \\ hdr_{k,8}&=g^{t'},&hdr_{k,9}&=\Bigl (v'\prod _{j\in S_k} g_{n+1-j}\Bigr )^{t'},&hdr_{k,10}&=K'M_k. \end{aligned}$$

Let the first part of the ciphertext \(C_k=(hdr_{k,1},\ldots ,hdr_{k,7})\). Let the second part of the ciphertext \(C''_k=(hdr_{k,8},hdr_{k,9},hdr_{k,10})\)

Trap(\(d_{i,1}||....||d_{i,13},w\)): User i picks \(r,r' \in Z_{p}\) and computes \(t_{i,w}\) as

$$\begin{aligned} tr_{1}&=d_{i,1}d^{r}_{i,10}d^{r'}_{i,11}\left( d^{r}_{i,12}d^{r'}_{i,13}\right) ^{w},&tr_{2}&=d^{r}_{i,2}d^{r'}_{i,3},&tr_{3}&=d^{r}_{i,4}d^{r'}_{i,5},\\ tr_{4}&=d^{r}_{i,6}d^{r'}_{i,7},&tr_{5}&=d^{r}_{i,8}d^{r'}_{i,9}\,. \end{aligned}$$

Test(\(pk,S_k,t_{i,w},C_k\)): The server checks if

$$\begin{aligned} hdr_{k,7}&\mathop {=}\limits ^{?}&\frac{e(g_{i},hdr_{k,6})e(hdr_{k,1},tr_{4})T}{e(tr_{1}\prod _{j\in S_k, i \ne j} g_{n+1-j+i},hdr_{k,5})}, \end{aligned}$$

where \(T=e(hdr_{k,3},tr_{5})e(hdr_{k,2},tr_{2})e(hdr_{k,4},tr_{3})\). If the equality holds, then the test result \(b=1\). Otherwise, \(b=0\).

Dec(\(pk,d_{i,14},C''_k,S_{k}\)): Once the server outputs \(b=1\) from the Test algorithm, he sends \(S_{k},C''_k\) to user i (we call this process as download). Then, user i does the decryption in the same way as that in [9] to recover first \(K'\) then extracts \(M_k\) by computing \(M_k={K'M_k}/{K'}\).

It is easy to see that the keyword ciphertext does not reveal w by using cross pairing. The data privacy is achieved by using linear splitting method that is introduced in [11] for ciphertexts. The idea in [11] is to use different random blind values in ciphertexts. This scheme is correct as follows:

\((pk,sk_{DO},sk_{i}=(d_{i,1}||\dots ||d_{i,14})_{i=1,..n})) \leftarrow \mathbf{Setup }(n,\lambda )\),

\((C_{k},C''_{k})=(hdr_{k,j})_{j=[1,10]} \leftarrow \mathbf{Enc }(pk,w,S_k,M_k)\),

\(t_{i,w}=(tr_{l})_{l=[1,5]} \leftarrow \mathbf{Trap }(sk_{i}=(d_{i,1}||\dots ||d_{i,13}),w)\),

If \(i\in S_{k}, b \leftarrow \mathbf{Test }(pk,t_{i,w},C_k,S_k)\),

Otherwise, \(\bot \leftarrow \mathbf{Test }(pk,t_{i,w},C_k,S_k)\),

If \(b=1, M_k \leftarrow \mathbf{Dec }(pk,d_{i,14},S_k,C''_k)\).

The first construction does not have trapdoor privacy. The first reason is that the adversary (i.e., the server) can extract the keyword from the generated trapdoor. The problem happens because user i does not blind his first part of the secret key \(d_{i,1}\) when he generates the first part of the trapdoor \(tr_1\). The attack basically occurs when given a trapdoor for a keyword w from user i, \(t_{i,w}=(tr_1,tr_2,tr_3,tr_4,tr_5)\), the server picks a keyword \(w^{*}\) and checks if

$$\begin{aligned} \frac{e(tr_1,g)}{e(h^{b_1}_{0,1},tr_2)e(h^{b_2}_{0,2},tr_3)e(g_i,g^{\gamma })}=e(h^{a_1}_{1,1},tr_4)^{w^{*}}e(h^{a_2}_{1,2},tr_5)^{w^{*}}. \end{aligned}$$

If the above equality holds, the server concludes that \(w=w^{*}\). The second reason is going to be explained in Sect. 5.2.

To counter the above attack, we need to blind the first part of the secret key of a user, which however breaks the BGW decryption process since it needs the first part of the secret key not to be blinded. We solve the above issue in the following construction by using two servers \(\mathcal {S}_{main}\) and \(\mathcal {S}_{aid}\). In our new model, \(\mathcal {S}_{aid}\) is trusted while \(\mathcal {S}_{main}\) is the semi-honest adversary. In trapdoor phase, a user chooses three random values \(r,r',r''\) and generates the trapdoor using \(r,r',r''\). The user uses \(r''\) to blind the first part of secret key \(d_{1}\) and uses \(r,r'\) to blinds other parts of the secret keys (\(d_{j}\), where \(j=2,..,13\)). Then, the user sends a random value \(r_2\) with trapdoor to \(\mathcal {S}_{main}\) and another random value \(r_1\) to \(\mathcal {S}_{aid}\). Here, \(r''\) can be thought as a function of \(r_{1},r_{2}\): \(f(r_1,r_2)=r''\). In our construction, the function f simply takes \(r_1,r_2\) and outputs \(r''=r_1+r_2\). \(\mathcal {S}_{aid}\) stores the values \(C'_k=hdr_{k,5},hdr_{k,6},hdr_{k,7},S_k\) while \(\mathcal {S}_{main}\) stores \((C_k,C''_k)=(hdr_{k,1},hdr_{k,2},hdr_{k,3},hdr_{k,4}, hdr_{k,5},hdr_{k,6},hdr_{k,7},hdr_{k,8},hdr_{k,9},hdr_{k,10},S_k)\) for file \(F_k\). Once the user sends trapdoor and random values to the servers, \(\mathcal {S}_{aid}\) first computes \(f_{1}(r_{1},C'_{k})\) and sends it to \(\mathcal {S}_{main}\). Then \(\mathcal {S}_{main}\) internally checks if the keyword appears in the ciphertext by using \(t_{i,w},f_{2}(r_{2},C_{k}),f_{1}(r_{1},C'_{k}),C_{k},S_{k}\). If so, \(\mathcal {S}_{main}\) sends \(C''_k\) and \(S_k\) to the user. The user decrypts it and recovers the plaintext file k where the keyword appears. We illustrate the query process in Fig. 1.

Fig. 1.
figure 1

Illustration of the secure and efficient construction.

4.2 Main Construction

This construction has two servers. It is as follows (illustrated in Fig. 1).

Setup(\(n,\lambda \)): This algorithm is the same as that in the first construction in Sect. 4.1. It outputs system public key pk and user secret key \(sk_{i}\) for user \(i=1,\ldots ,n\).

Enc(\(pk,w,S_{k},M_k\)): This algorithm is the same as that in the first construction in Sect. 4.1 and outputs \(hdr_{k,1} \ldots hdr_{k,10}\). Let the first part of the ciphertext \(C_k=(hdr_{k,1},\ldots ,hdr_{k,7})\). Let the second part of the ciphertext \(C''_k=(hdr_{k,8},hdr_{k,9},hdr_{k,10})\). In addition, let \(C_k'=(hdr_{k,5},hdr_{k,6},hdr_{k,7})\). The main server \(\mathcal {S}_{main}\) stores \(C_k\), \(C''_k\) and \(S_k\). The aid server \(\mathcal {S}_{aid}\) stores \(C'_k\) and \(S_k\).

Trap(\(d_{i,1}||....||d_{i,13},w\)): User i picks \(r,r',r'' \in Z_{p}\) and computes \(t_{i,w}\) as

$$\begin{aligned} tr_{1}&=d^{r''}_{i,1}d^{r}_{i,10}d^{r'}_{i,11}\left( d^{r}_{i,12}d^{r'}_{i,13}\right) ^{w},&tr_{2}&= d^{r}_{i,2}d^{r'}_{i,3},&tr_{3}&=d^{r}_{i,4}d^{r'}_{i,5},\\ tr_{4}&=d^{r}_{i,6}d^{r'}_{i,7},&tr_{5}&=d^{r}_{i,8}d^{r'}_{i,9},&tr_{6}&=g^{r''}_{n+1+i}\,. \end{aligned}$$

He sends \(t_{i,w}=\left( tr_1,\ldots ,tr_6\right) \) and \(r_2\) to \(\mathcal {S}_{main}\), and sends \(r_1\) to \(\mathcal {S}_{aid}\).

Test(\(pk,S_k,f_{2}(r_2,C_k),f_{1}(r_1,C'_k),t_{i,w},C_k\)): Both servers compute

$$\begin{aligned} A_{k}=\frac{e\left( g_{i},hdr_{k,6}\right) }{hdr_{k,7}e\left( \prod _{j\in S_k, i\ne j} g_{n+1-j+i}, hdr_{k,5}\right) } \end{aligned}$$

for file k. The aid server sends \(f_{1}(r_1,C'_k)=A_{k}^{r_1}\) to \(\mathcal {S}_{main}\). \(S_{main}\) then computes \(f_{2}(r_2,C_k)f_{1}(r_1,C'_k)=A_{k}^{r_2}A_{k}^{r_1}=A_{k}^{r}\) and checks if

$$\begin{aligned} \frac{e\left( tr_1,hdr_{k,5}\right) }{e\left( hdr_{k,1},tr_4\right) T}=A_{k}^{r}\,, \end{aligned}$$

where \(T=e(hdr_{k,3},tr_{5})e(hdr_{k,2},tr_{2})e(hdr_{k,4},tr_{3})\). If the equality holds, then the test result \(b=1\). Otherwise, \(b=0\).

Dec(\(pk,d_{i,14},C''_k,S_{k}\)): Once the server outputs \(b=1\) from the Test algorithm, he sends \(S_{k},C''_k\) to user i (we call this process as download). Then, user i does the decryption in the same way as that in [9] to recover first \(K'\) then extracts \(M_k\) by computing \(M_k={K'M_k}/{K'}\).

5 Security

5.1 Data Privacy

Proving Security: We prove security using a hybrid experiment as that in [11]. Let \([hdr_{0,1},hdr_{0,2},hdr_{0,3},hdr_{0,4},hdr_{0,5},hdr_{0,6},hdr_{0,7}]\) denote the challenge ciphertext for keyword \(w^{*}\) and user set \(S_0\) that are given to the adversary during a real attack. Additionally, let \(R,R'\) be two random elements of \(\mathbf {G}\). We define the following hybrid games which differ on what challenge ciphertext is given by a simulator \(\mathcal {SIM}\) to the adversary:

\(\varGamma _{0}\): The challenge ciphertext is

$$\begin{aligned} C_{0}=[hdr_{0,1},hdr_{0,2},hdr_{0,3},hdr_{0,4},hdr_{0,5},hdr_{0,6},hdr_{0,7}]. \end{aligned}$$

\(\varGamma _{1}\): The challenge ciphertext is

$$\begin{aligned} C_{1}=[hdr_{0,1},R,hdr_{0,3},hdr_{0,4},hdr_{0,5},hdr_{0,6},hdr_{0,7}]. \end{aligned}$$

\(\varGamma _{2}\): The challenge ciphertext is

$$\begin{aligned} C_{2}=[hdr_{0,1},R,hdr_{0,3},R',hdr_{0,5},hdr_{0,6},hdr_{0,7}]. \end{aligned}$$

We remark that the challenge ciphertext in \(\varGamma _{2}\) leaks no information about the keyword since it is composed of seven random group elements, whereas in \(\varGamma _{0}\) the challenge is well formed. We show that the transitions from \(\varGamma _{0}\) to \(\varGamma _{1}\) and \(\varGamma _{1}\) to \(\varGamma _{2}\) are all computationally indistinguishable.

Since we use two servers, we slightly change the definition of keyword privacy game in Sect. 3. During the game, the adversary makes encryption queries since \(\mathcal {S}_{aid}\) is trusted and controlled by the simulator. The adversary is not able to upload any ciphertext that he wants. We restrict the adversary.

Theorem 2

(Keyword Privacy). The main construction of the SEMEKS scheme has keyword privacy under Decision Linear Assumption.

Proof

Suppose the existence of an adversary \(\mathcal {A}\) that distinguishes between the two games (\(\varGamma _{0}\) and \(\varGamma _{1}\)) with a non-negligible advantage \(\epsilon \). Then we construct \(\mathcal {SIM}\) that wins the Decision Linear game as follows. \(\mathcal {SIM}\) takes in a decision linear instance \(g,g^{z_{1}},g^{z_{2}},g^{z_{1}z_{3}},g^{z_{2}z_{4}},Z\), where Z is either \(g^{z_{3}+z_{4}}\) or random in \(\mathbf {G}\) with equal probability. For convenience, we rewrite this as \(g,g^{z_{1}},g^{z_{2}},g^{z_{1}z_{3}},Y,g^{t}\) for t such that \(g^{t}=Z\), and consider the task of deciding whether \(Y=g^{z_{2}(t-z_{3})}\). \(\mathcal {SIM}\) plays the game in the following stages.

  • Init: \(\mathcal {A}\) gives \(\mathcal {SIM}\) the challenge keyword \(w^{*}\) and the challenge set \(S_{0}\).

  • Setup: The simulator first chooses random exponents \(\alpha ,\gamma ,\zeta ,a_2,b_2\). It uses the same g as in the decision linear instance, and sets \(g_{1},\dots ,g_{n}\), \(g_{n+2},\dots ,g_{2n}\), v, where \(g_{i}=g^{\alpha ^{i}}\), \(v=g^{\gamma }\), \(h_{0,1}=g^{-w^{*} \zeta +\zeta }\), \(h_{1,1}=g^{\zeta }\),\(h_{0,2}=g^{z_2(-w^{*} \zeta )}g^{\zeta }\), \(h_{1,2}=g^{z_2\zeta }\). Next the simulator sets \(h^{a_1}_{0,1}\) \(\leftarrow \) \((g^{z_{1}})^{-w^{*} \zeta +\zeta }\), \(h^{a_1}_{1,1}\) \(\leftarrow \) \((g^{z_{1}})^{\zeta }\), \(h^{b_1}_{0,1}\) \(\leftarrow \) \((g^{z_{2}})^{-w^{*} \zeta +\zeta }\), \(h^{b_1}_{1,1}\) \(\leftarrow \) \((g^{z_{2}})^{\zeta }\),\(h^{a_2}_{0,2} \leftarrow g^{a_2z_2(-w^{*} \zeta )}g^{a_2\zeta }\), \(h^{a_2}_{1,2} \leftarrow g^{a_2z_2\zeta }\), \(h^{b_2}_{0,2} \leftarrow g^{b_2z_2(-w^{*} \zeta )}g^{b_2\zeta }\), \(h^{b_2}_{1,2} \leftarrow g^{b_2z_2\zeta }\).

    The system public key is \(pk=g,g_{1},\dots ,g_{n}\), \(g_{n+2},\dots ,g_{2n}\), v, \(h^{a_1}_{0,1}\), \(h^{a_1}_{1,1}\), \(h^{b_1}_{0,1}\), \(h^{b_1}_{1,1}\), \(h^{a_2}_{0,2}\), \(h^{a_2}_{1,2}\), \(h^{b_2}_{0,2}\), \(h^{b_2}_{1,2}\) and \(\mathcal {SIM}\) gives the public key to \(\mathcal {A}\) and \(\mathcal {S}_{aid}\).

  • Query(\(S_{\ell },w_{\ell }\)): To answer encryption query for (\(S_{\ell },w_{\ell }\)) where \(w_{\ell }\ne w^{*}\), the simulator picks \(\ell ,\ell _1,\ell _2\), and computes \(hdr_{k,1},\ldots ,hdr_{k,7}\) as

    $$\begin{aligned} K&=e(g_{n+1},g)^{\ell },&hdr_{k,1}&=(h_{0,1}^{a_{1}}(h_{1,1}^{a_{1}})^{w_{\ell }})^{\ell _{1}}, \\ hdr_{k,2}&=(h_{0,1}^{b_{1}}(h_{1,1}^{b_{1}})^{w_{\ell }})^{\ell -\ell _{1}},&hdr_{k,3}&=(h_{0,2}^{a_{2}}(h_{1,2}^{a_{2}})^{w_{\ell }})^{\ell _{2}}, \\ hdr_{k,4}&=(h_{0,2}^{b_{2}}(h_{1,2}^{b_{2}})^{w_{\ell }})^{\ell -\ell _{2}},&hdr_{k,5}&=g^{\ell },\\ hdr_{k,6}&=\Bigl (v\prod _{j\in S_{\ell }} g_{n+1-j}\Bigr )^{\ell },&hdr_{k,7}&=K \end{aligned}$$

    and gives them to \(\mathcal {A}\) (\(\mathcal {S}_{main}\)) and gives \(hdr_{k,5},hdr_{k,6},hdr_{k,7},S_{\ell }\) to \(\mathcal {S}_{aid}\).

  • Query(iw): To answer trapdoor queries for (iw) where \(i \notin S_0\) and \(w\ne w^{*}\), the simulator picks \(\rho _{i1},\rho '_{i1},\rho _{i2},\rho '_{i2},r,r',r''\), and computes \(t_{i,w}=(tr_1,\ldots ,tr_6)\) as

    $$\begin{aligned} tr_{1}&=g_{i}^{r''\gamma }(h_{0,2}h^{w}_{1,2})^{a_2b_2Y}(g^{\frac{-z_{1}X\zeta }{w-w^{*}}})(g^{-z_1X\zeta }),&tr_{2}&=g^{z_{1}X},\\ tr_{3}&=g^{z_{2}X},&tr_4&=g^{a_2Y}g^{\frac{-z_1X}{b_2}}g^{\frac{-z_1X}{b_2(w-w^{*})}},\\ tr_5&=g^{b_2Y}g^{\frac{-z_1X}{a_2}}g^{\frac{-z_1X}{a_2(w-w^{*})}},&tr_{6}&=g^{r''}_{n+1+i}, \end{aligned}$$

    where \(X=\rho _{i1}r+\rho '_{i1}r'\), \(Y=\rho _{i2}r+\rho '_{i2}r'\). Then, he gives \(tr_1,tr_2,tr_3,tr_4,tr_5,\) \(tr_6,r_2\) to \(\mathcal {A}\) and \(r_1\) to \(\mathcal {S}_{aid}\).

  • Guess: The simulator responds with a challenge ciphertext for the keyword \(w^{*}\) and the set \(S_0\). Assume \(t_{1}=z_{3}\). The simulator picks random \(t_2 \in Z_p\). To proceed, the simulator outputs the ciphertext as

    $$\begin{aligned} hdr_{0,1}&=(g^{\zeta })^{(t_{1}z_{1})},&hdr_{0,2}&=Y^{\zeta },&hdr_{0,3}&=(g^{\zeta })^{(t_{2}a_{2})},\\ hdr_{0,4}&=Z^{\zeta b_2} g^{-\zeta b_2t_2},&hdr_{0,5}&=Z=g^{t},&hdr_{0,6}&=\left( v\prod _{j\in S_0} g_{n+1-j}\right) ^{t}, \\ hdr_{0,7}&=K=e(g_{n+1},g)^{t}. \end{aligned}$$

    If \(Y=g^{z_{2}(t-z_{3})}\) so \(Z=g^{t}\), then all parts of the challenge are well formed and the simulator simulates game \(\varGamma _{0}\). If instead Y is independent of \(z_{1},z_{2},t,t_{1},t_2\), which happens when Z is random, then the simulator emulates the game \(\varGamma _{1}\).

  • Output: \(\mathcal {A}\) outputs a bit b to guess which hybrid game the simulator has been playing. To conclude, the simulator forwards b as its own answer in the Decision Linear game.

Restriction: \(\mathcal {S}_{aid}\) does not use the public key \(g_i\) that is asked in trapdoor query to compute function \(f(r_1,C')\). The function should be independent of \(g_i\) since the index i does not appear in the set \(S_{0}\).

Analysis: Since the challenge ciphertext is independent of \(w^{*}\), the adversary’s best success probability is 1/2 when the adversary \(\mathcal {A}\) gets \(\varGamma _{1}\) as challenge ciphertext. The success probability is \(\Pr [\text {win}_{\mathcal {A}}(\lambda ,n)] < 1/2\) + \(\epsilon \) when \(\mathcal {A}\) gets \(\varGamma _{0}\) as challenge ciphertext. So, \(\mathcal {SIM}\) breaks the Decision Linear assumption with probability \(|\Pr (\mathcal {A}(\varGamma _{0})=1)-\Pr (\mathcal {A}(\varGamma _{1})=1|\)=1/2 + \(\epsilon -1/2=\epsilon \), which is non-negligible.    \(\square \)

Remark 1

The indistinguishability of the hybrid games \(\varGamma _{1}\) and \(\varGamma _{2}\) can be shown similarly by adjusting the parameters as \(a_2=z_1\), \(b_2=z_2\).

Theorem 3

(File Privacy). The main construction of SEMEKS has file privacy under n-DBDHE assumption.

Proof

The other parts of the (file) ciphertexts, \((g^{t'},(v'(\prod _{j\in S_0} g_{n+1-j}))^{t'},K'M_{0})\) are just for downloading process (independent of the first part of the keyword ciphertext and other public key values that are formed by \(a_1,a_2,b_1,b_2,\gamma \) and \(h_{0,1},h_{1,1},h_{0,2},h_{1,2}\)) when the searched keyword matches the first part of the keyword ciphertexts. These parts, \(C''_{0}\), can be simulated in the same way as in [9]. In the interests of space, we do not give concrete proof.

As a proof sketch, once \(\mathcal {SIM}\) gets n-DBDHE parameters, he will set pk, \(hdr_{0,8}\), \(hdr_{0,9}\), \(v'\) and user secret keys \(d_{j,14}\), where \(j \notin S_{0}\), in the same way as in [9]. Then, he follows the same game steps (Challenge and Guess) as those in [9].    \(\square \)

5.2 Trapdoor Privacy

Formalizing a security model for trapdoor privacy is challenging since an adversary can use the public resource and/or provided secret information (trapdoor) to verify if the given trapdoor is generated under a specific keyword. This is a serious problem if a scheme is built in a public key setting. It is because the adversary can encrypt any keyword, then he can verify if the encrypted keyword and the given trapdoor are generated under the same keyword. The solution for this is to restrict the adversary to avoid generating any ciphertext for a set of users to verify if the given trapdoor and the generated ciphertext match under the same keyword.

Our Solution: Since we use a two-server solution for trapdoor privacy, we change the definition of trapdoor privacy in Sect. 3. In the new model, we are going to have some query restrictions.

Restriction: For a challenge trapdoor (\(i^{*},w^{*}\)) in the trapdoor privacy game, the restriction on \(\mathcal {A}\) for encryption query \(\text {Enc}(pk,w,S_{j})\): if \(i^{*}\in S_{j}\) or \(w=w^{*}\), the simulator outputs \(\bot \). It means that the adversary is not able to ask the challenger an encryption query where the challenge user index is in the given set \(S_j\) or the challenge keyword. One can think that the adversary can encrypt whatever he wants to encrypt since he has the system public key pk. However, in our model \(\mathcal {S}_{aid}\) is trusted and is controlled by the simulator. So, the adversary is not able to upload any keyword ciphertext to \(\mathcal {S}_{aid}\). But he can encrypt keywords on his will offline. It means that he can encrypt keywords and see the ciphertext but he is not able to use them in \(\text {test}\) algorithm since he needs \(\mathcal {S}_{aid}\) to send partial trapdoor information. Another restriction is that \(\mathcal {S}_{aid}\) does not use the public key of the challenge user (\(g_i\)) to compute function \(f(r_1,C')\). The function should be independent of \(g_i\).

Theorem 4

(Trapdoor Privacy). The construction of SEMEKS is trapdoor private under n-DDHI assumption.

In the interest of space, we leave the proof to Appendix A.3.

6 Eliminating Single Trusted Party

The two constructions above use a single trusted party to generate the public key pk. The single trusted party can be a manager or a DO. In either case, a DO generates secret keys for the users that are eligible to do keyword search on the DO’s files and distributes the secret keys to the users. A user’s secret key is hence known by the DO, which is not desirable since a user might want her secret key to be only known by herself. Furthermore, when there are multiple DOs and a user wants to search multiple DOs’ data files, each DO needs to generate and send a secret key to the user, and the user needs to generate a trapdoor that is linear in the number of DOs. This results in a scalability problem. In addition, if there are multiple DOs, each setting her own public key which is of size O(n), the system total public key size is going to be \(O(n^{2})\) for n DOs. This results in another scalability problem.

To address the above issues, we can use the recently proposed \(distributed \) \(parameter \) generation protocol [22]. In [22], n parties can jointly generate system public key and user secret keys that eliminates the need of having a single trusted authority. These n parties can be all DOs that share their data files with each other (group data sharing), or a single DO that share his files with \(n-1\) users. Since most parts of the system public key are in the from of \(n-BDHE \) parameters, they can be generated straightforwardly from [22]. The remaining part of the values \(h_{0,1}^{a_1},h_{1,1}^{a_1},h_{0,1}^{b_1},h_{1,1}^{b_1},h_{0,2}^{a_2},h_{1,2}^{a_2},h_{0,2}^{b_2},h_{1,2}^{b_2}\) can be distributively generated by applying \(\mathcal {DKG}\) and \(\mathcal {REC}\) a couple of times. The secret keys of the users can be distributively generated by applying \(\mathcal {DKG},\mathcal {REC}\) and \(\mathcal {RECSQ}\) sub-protocols. Since these are straightforward, we do not provide the constructions explicitly.

7 Conclusion

We have proposed a secure and efficient encrypted keyword search (SEMEKS) scheme for multi-user data sharing. In the scheme, an authorized user uses a single trapdoor to search all files that he is authorized to search. In addition, for keyword ciphertexts of a file, the network bandwidth usage and the storage required at the server does not depend on the number of authorized users that can search that file. We have also performed rigorous security analysis to show that SEMEKS has trapdoor privacy and data privacy.

As future work, we will investigate how to use a single server to achieve secure and efficient keyword search for data sharing. In addition, we want to support boolean and not conjunctive keyword queries and update users set efficiently (adding or removing users from pre-defined authorized set).