
1 Introduction

Cloud storage services are widely deployed nowadays. Popular services include Amazon S3 [1], Apple iCloud [5] and Microsoft Azure [6]. By using cloud services, data owners pay for the storage they use, eliminating the expensive cost of maintaining dedicated infrastructures.

As more and more users turn to clouds for storage, the amount of data stored in the clouds grows rapidly. Conventionally, the clouds simply store whatever the users send. This unfortunately leads to a significant waste of storage space, as different users may upload identical data. A promising remedy is data deduplication, in which the clouds store only a single copy of data duplicated across different users. For example, recent research from Microsoft [34] showed that deduplication can achieve 50% and 90–95% storage savings in standard file systems and backup systems, respectively. Almost all the existing popular file hosting services like Dropbox [4] and Box [2] perform data deduplication.

There are two popular data deduplication mechanisms: server-side deduplication and client-side deduplication. The main difference between them lies in where deduplication takes place. In server-side deduplication, servers transparently perform deduplication on the data outsourced by the clients. In client-side deduplication, however, the servers and the clients cooperate to perform deduplication. Compared to server-side deduplication, client-side deduplication has the significant benefit that the clients do not need to upload data which have already been stored by the servers, significantly reducing bandwidth consumption. Therefore, client-side deduplication has been used extensively in public file hosting services [2, 4].

Encryption is necessary to protect the confidentiality of sensitive data. However, it creates a severe obstacle for deduplication, as identical plaintexts may be encrypted into different ciphertexts by different users using different keys. Message-Locked Encryption (MLE) [13] is a cryptographic primitive which can resolve the aforementioned issue. MLE derives encryption keys from the messages being encrypted, such that different users are able to generate the same key for identical data. Existing MLE schemes include CE [23], DupLESS [12], Duan Scheme [24], and LAP scheme [33].

To ensure continuous data confidentiality for encrypted cloud storage, re-encryption seems unavoidable, due to the potential key exposure [3, 8] or user revocation [39, 41]. Compared to conventional encrypted cloud storage, re-encryption in deduplication-based cloud storage is much more challenging, as it needs to be performed in such a manner that deduplication should not be disturbed. Li et al. proposed REED [32] to address the re-encryption problem for deduplication-based storage systems by smartly transforming the encrypted data such that they can be efficiently re-encrypted when revoking keys/users. REED however is specifically designed for server-side deduplication, which is not immediately applicable to the more beneficial client-side deduplication.

To design a secure client-side deduplication system which supports efficient re-encryption, we face several challenges: (1) To conform to the notion of storage outsourcing, we usually outsource both the data and the management of data [16, 19], such that once the data have been outsourced, the client will be involved as little as possible. It is thus challenging to allow re-encryption with the least client intervention. (2) Different from server-side deduplication, in which the client will always upload the data being outsourced, in client-side deduplication, the client will not upload the data when he/she convinces the cloud server that he/she possesses the data which have already been stored in the cloud, and the keys for decrypting these data will be disclosed to such clients after re-encryption. To ensure continuous confidentiality, we need a technique which allows the cloud server to differentiate valid clients from invalid ones by efficiently verifying that a client possesses the file, without the server being able to learn the plaintext of the file. (3) Considering that the data stored in clouds are usually large in size, re-encrypting them may be prohibitively expensive. Designing an efficient re-encryption approach is therefore challenging.

In this paper, we propose SEDER, the first secure client-side deduplication system for cloud storage supporting efficient re-encryption. Our key insights are threefold: First, we leverage proofs of ownership (PoWs), by which we can ensure that after re-encryption, the new key is only disclosed to valid users who can prove they are the owners of the data. Second, we leverage all-or-nothing transform, by which it is possible to re-encrypt a file by only re-encrypting a small portion of it. Third, by observing that the existing proxy re-encryption can not be used in SEDER directly, we re-design a delegated re-encryption scheme, by which we can freely delegate the re-encryption to the cloud server without disclosing the plaintext data. This is advantageous as the client can be released from the burden of re-encryption and remains lightweight.

Comparison. Although both SEDER and REED [32] aim to address the re-encryption problem, they differ in multiple aspects: (1) REED resolves the re-encryption problem for server-side deduplication, whereas SEDER resolves this problem for client-side deduplication, which has a more complicated design and hence a larger attack surface; (2) REED has very expensive computation/communication overhead for uploading files, as the client always needs to perform the expensive all-or-nothing transform and upload the file. In SEDER, however, the amortized computation/communication overhead for uploading files is significantly reduced, as the clients do not need to perform the expensive all-or-nothing transform and upload the file if it already exists in the cloud server; (3) In REED, the client is heavily involved in the re-encryption process, which imposes a significant burden on the client and contradicts the notion of storage outsourcing. In SEDER, however, the re-encryption is delegated to the cloud server, which has rich computation resources, such that the client can always remain lightweight.

Contributions. We summarize our contributions in the following:

  • We initiate the research of designing efficient re-encryption schemes for secure client-side deduplication in cloud storage.

  • We design a delegated re-encryption scheme which can be used to delegate re-encryption to the untrusted third party. In addition, we design SEDER by smartly leveraging all-or-nothing transform (\(\textsf {AONT}\)), proofs of ownership (\(\textsf {PoWs}\)), and delegated re-encryption (\(\textsf {DRE}\)).

  • We evaluate the performance of SEDER. Experimental results validate the efficiency of SEDER.

2 System and Adversarial Model

System Model. We consider two entities: (1) Cloud server (\(\textsf {CS}\)). \(\textsf {CS}\) provides storage services and wants to perform client-side deduplication to reduce both storage and bandwidth cost; (2) Cloud users (\(\textsf {U}\)). The users outsource their data to the cloud. To maintain confidentiality of their outsourced data, they will encrypt the data before outsourcing them. Note that when the cloud user tries to upload a file that has been stored in the cloud server, \(\textsf {CS}\) will append this cloud user to the owner list of the corresponding file without requiring uploading the file again.

Adversarial Model. We consider an honest-but-curious cloud server [39, 41]. \(\textsf {CS}\) will honestly store the encrypted data uploaded by the users, perform data deduplication, and respond to the requests from the users. Moreover, \(\textsf {CS}\) will not disclose the data to any parties who fail to prove ownership of the data. However, it is curious and attempts to infer information about the users’ encrypted data. In addition, there is a malicious entity (\(\textsf {ME}\)) who has obtained the key materials and tries to gain access to the sensitive data.

Assumptions. We assume the MLE used in SEDER is secure. All the communication channels between \(\textsf {CS}\) and \(\textsf {U}\) are protected by SSL/TLS, so that eavesdroppers cannot infer the messages being transmitted. Each entity (\(\textsf {CS}\) and \(\textsf {U}\)) has an asymmetric key pair, and the private key is well protected. We also assume that \(\textsf {CS}\) and \(\textsf {ME}\) will not collude with each other.

3 Building Blocks

Message-Locked Encryption ( MLE ). \(\textsf {MLE}\) [13] is a cryptographic primitive which derives encryption keys from the messages being encrypted. In an MLE scheme, different users are able to generate the same key for identical data. Existing MLE schemes include CE [23], DupLESS [12], Duan Scheme [24], and LAP scheme [33]. \(\textsf {MLE}\) uses symmetric encryption to encrypt a message with its MLE key.
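To make the key-derivation idea concrete, the following is a minimal sketch in the spirit of convergent encryption, not the construction of any particular MLE scheme cited above. The function names, the `mle-key` domain-separation prefix, and the SHA-256 counter-mode keystream (a stand-in for a real cipher such as AES) are assumptions made purely for illustration.

```python
import hashlib


def mle_keygen(message: bytes) -> bytes:
    """Derive the key from the message itself, so equal messages give equal keys."""
    return hashlib.sha256(b"mle-key" + message).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """Toy deterministic keystream (SHA-256 in counter mode); NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def mle_encrypt(key: bytes, message: bytes) -> bytes:
    """Deterministic encryption: XOR the message with the key-derived stream."""
    return bytes(a ^ b for a, b in zip(message, _keystream(key, len(message))))


# XOR with the same keystream undoes the encryption.
mle_decrypt = mle_encrypt


def mle_tag(ciphertext: bytes) -> str:
    """Deduplication tag computed over the ciphertext."""
    return hashlib.sha256(ciphertext).hexdigest()
```

Because both the key and the encryption are deterministic functions of the message, two users holding the same file obtain the same ciphertext and the same tag, which is what allows the server to deduplicate encrypted data.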

All-or-Nothing Transform ( AONT ). \(\textsf {AONT}\) [35] is an unkeyed, invertible and randomized transformation. No one can perform the inverse transformation without knowing the entire output of the \(\textsf {AONT}\). Specifically, given a message m of s blocks: \(m=m_1||\ldots ||m_s\) where || denotes block concatenation, AONT transforms m into a message \(m'\) of t blocks: \(m' = m'_1|| \ldots || m'_t\) where \(t\ge s+1\), and satisfies the following properties:

  • Given m, \(m' \leftarrow \textsf {AONT}(m)\) can be computed efficiently. That is, the complexity of \(\textsf {AONT}(m)\) is polynomial to the length of m.

  • Given \(m'\), \(m \leftarrow \textsf {AONT}^{-1}(m')\) can be computed efficiently.

  • Without knowing the entire \(m'\) (i.e., if one block is missing), the probability of recovering m is negligibly small.

In this paper, we instantiate \(\textsf {AONT}\) with the package transform [35], which takes as input an s-block message m and outputs an \((s+1)\)-block message \(m'\).
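The package transform can be sketched as follows. This is a hedged illustration rather than the implementation evaluated in this paper: the block cipher is replaced by a SHA-256 based pseudorandom function, and the block size `BLOCK`, the public constant `K0`, and all function names are assumptions. Each input block is expected to be exactly `BLOCK` bytes long.

```python
import hashlib
import secrets
from typing import List

BLOCK = 32                 # block size in bytes (assumed)
K0 = b"\x00" * BLOCK       # fixed, publicly known key (assumed value)


def _prf(key: bytes, data: bytes) -> bytes:
    """Stand-in for a block cipher call E(key, data); only the forward direction is needed."""
    return hashlib.sha256(key + data).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _index(i: int) -> bytes:
    return i.to_bytes(BLOCK, "big")


def aont(blocks: List[bytes]) -> List[bytes]:
    """Map s blocks to s+1 blocks: mask each block, then hide the mask key in the last block."""
    k = secrets.token_bytes(BLOCK)                       # fresh random inner key
    masked = [_xor(m, _prf(k, _index(i))) for i, m in enumerate(blocks, 1)]
    last = k
    for i, c in enumerate(masked, 1):
        last = _xor(last, _prf(K0, _xor(c, _index(i))))  # fold in a digest of every block
    return masked + [last]


def aont_inverse(packed: List[bytes]) -> List[bytes]:
    """Recover the inner key from ALL output blocks, then unmask."""
    *masked, last = packed
    k = last
    for i, c in enumerate(masked, 1):
        k = _xor(k, _prf(K0, _xor(c, _index(i))))
    return [_xor(c, _prf(k, _index(i))) for i, c in enumerate(masked, 1)]
```

Since the inner key can only be recovered by combining every output block, withholding or altering a single block prevents the inverse from producing the original message. This is the property that later allows a whole file to be protected by encrypting one block.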

Proofs of Ownership (\(\textsf {PoWs}\)). \(\textsf {PoWs}\) [26] is a cryptographic protocol that allows the cloud server (as a verifier) to efficiently and securely validate that the data owner (as a prover), who wants to upload a data file that has already been stored in the server, really possesses that data file. Here efficiency means that the communication is far less than the bandwidth needed to upload the data file, and security means that the data owner cannot cheat the server with non-negligible probability even if he/she possesses a large portion of the file and its metadata (e.g., hash value).

  • \(\textsf {witness}\leftarrow \textsf {PoWs}.\textsf {Init}(f)\): Given a data file f, the verifier first preprocesses it and obtains some auxiliary data \(\textsf {witness}\) for the verification purpose:

    • The verifier uses an \(\alpha \)-erasure-code EC to encode the data file f, where \(\alpha \) denotes the erasure recovery capability.

    • The verifier computes the Merkle tree \(MT_{H,b}(f)\) of the data file f, where H is a hash function used in computing Merkle tree and b is the size of a Merkle-tree leaf. The root value of Merkle-tree \(r_{MT}(f)\) will be \(\textsf {witness}\).

  • \(\textsf {challenge}\leftarrow \textsf {PoWs}.\textsf {Challenge}\): When a prover declares that he/she owns a file f, the verifier chooses randomly x leaf indexes \(l_1,l_2,...,l_x\) and sends \(\textsf {challenge}= (l_1,l_2,...,l_x)\) to the prover, where \(\epsilon \) is the soundness bound and x is the minimum integer satisfying \((1-\alpha )^x< \epsilon \).

  • \(\textsf {prof}\leftarrow \textsf {PoWs}.\textsf {Prove}(challenge, f)\): The prover builds the Merkle tree on top of data file f and returns the proof \(\textsf {prof}\) which consists of the sibling-paths of \(l_1,l_2,...,l_x\).

  • \(\{0,1\}\leftarrow \textsf {PoWs}.\textsf {Verify}(\textsf {witness}, \textsf {challenge}, \textsf {prof})\): The verifier returns 1 if all the sibling-paths are valid with the Merkle tree root, and 0 otherwise.
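The four algorithms above can be sketched as follows. This is a simplified illustration under stated assumptions: the erasure-coding step is omitted, SHA-256 plays the role of H, the leaf size defaults to 64 bytes, an odd node is paired with itself, the prover returns each challenged leaf together with its sibling path, and the function names and signatures (for example passing the leaf count to the challenge function) are hypothetical.

```python
import hashlib
import secrets
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _levels(data: bytes, leaf_size: int) -> List[List[bytes]]:
    """All Merkle tree levels, from leaf hashes up to the single root."""
    level = [_h(data[i:i + leaf_size]) for i in range(0, len(data), leaf_size)] or [_h(b"")]
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level  # pair odd node with itself
        level = [_h(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def pow_init(data: bytes, leaf_size: int = 64) -> bytes:
    """witness = Merkle root of the file."""
    return _levels(data, leaf_size)[-1][0]


def min_challenges(alpha: float, eps: float) -> int:
    """Smallest x with (1 - alpha)^x < eps."""
    x = 1
    while (1 - alpha) ** x >= eps:
        x += 1
    return x


def pow_challenge(n_leaves: int, x: int) -> List[int]:
    return [secrets.randbelow(n_leaves) for _ in range(x)]


def pow_prove(challenge: List[int], data: bytes, leaf_size: int = 64) -> List[Tuple[bytes, List[bytes]]]:
    levels = _levels(data, leaf_size)
    proof = []
    for idx in challenge:
        leaf = data[idx * leaf_size:(idx + 1) * leaf_size]
        path, pos = [], idx
        for level in levels[:-1]:
            sib = pos ^ 1
            path.append(level[sib] if sib < len(level) else level[pos])
            pos //= 2
        proof.append((leaf, path))
    return proof


def pow_verify(witness: bytes, challenge: List[int], proof) -> int:
    if len(proof) != len(challenge):
        return 0
    for idx, (leaf, path) in zip(challenge, proof):
        node, pos = _h(leaf), idx
        for sib in path:
            node = _h(node + sib) if pos % 2 == 0 else _h(sib + node)
            pos //= 2
        if node != witness:
            return 0
    return 1
```

For instance, with \(\alpha = 0.5\) and \(\epsilon = 2^{-20}\), the condition \((1-\alpha )^x< \epsilon \) first holds at \(x = 21\), so the verifier only needs a few dozen sibling paths regardless of the file size.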

4 SEDER

In this section, we first present a delegated re-encryption scheme (\(\textsf {DRE}\)) which allows re-encryption to be delegated to an untrusted third party, and then elaborate the design of SEDER by leveraging \(\textsf {DRE}\) and other building blocks (Sect. 3).

4.1 Delegated Re-encryption

Proxy re-encryption (\(\textsf {PRE}\)) [14, 27] allows a proxy to convert the ciphertext, which can only be decrypted by the delegator, into another ciphertext that can be decrypted by the delegatee, without leaking the plaintext to the proxy. Proxy re-encryption has been well studied and many promising features have been proposed, such as uni-direction, key privacy and no-interaction key generation. However, proxy re-encryption cannot be used here, because it cannot support unlimited hops. Based on the scheme [11] which only supports single hop, we re-design a delegable re-encryption scheme supporting unlimited hops (\(\textsf {DRE}\)). The detail of \(\textsf {DRE}\) is as follows:

  • \(\textsf {DRE}.\textsf {SetUp}(1^\ell )\): G is a multiplicative cyclic group of prime order q (q is an \(\ell \)-bit system parameter, and \(\ell \) is large enough). g is chosen from G at random and is known to all the parties.

  • \(\textsf {DRE}.\textsf {KeyGen}(\textsf {U}_i)\): Given user \(\textsf {U}_i\), this algorithm generates the public key \(\textsf {pk}_i=\{g^{a_i}\}\) and \(\textsf {sk}_i=\{a_i\}\), where \(a_i\) is chosen at random from \( \mathbb {Z}_q\).

  • \(\textsf {DRE}.\textsf {Enc}(\textsf {pk}_i, m)\): Message m is encrypted into \(c_i=(c_{i_1},c_{i_2})=((g^{a_i})^{k_i},mg^{k_i})\), where \(k_i\) is chosen at random from \(\mathbb {Z}_q\).

  • \(\textsf {DRE}.\textsf {ReKeyGen}(\textsf {sk}_i, \textsf {pk}_j,c_{i_1})\): Given user \(\textsf {U}_i\)’s private key \(\textsf {sk}_i\), user \(\textsf {U}_j\)’s public key \(\textsf {pk}_j\) (note that by running \(\textsf {DRE}.\textsf {KeyGen}(\textsf {U}_j)\), \(\textsf {U}_j\) generates public key \(\textsf {pk}_j=\{g^{a_j}\}\) and private key \(\textsf {sk}_j=\{a_j\}\), where \(a_j\) is chosen at random from \(\mathbb {Z}_q\)) and \(c_{i_1}\), the re-encryption key \(rk_{i \rightarrow j}\) can be generated: \(rk_{i \rightarrow j}=(rk_{i \rightarrow j_1},rk_{i \rightarrow j_2})=((g^{a_j})^{k_j},\frac{g^{k_j}}{(c_{i_1})^{1/a_i}})\), where \(k_j\) is randomly selected from \(\mathbb {Z}_q\).

  • \(\textsf {DRE}.\textsf {ReEnc}(rk_{i\rightarrow j}, c_i)\): Given the re-encryption key \(rk_{i\rightarrow j} =(rk_{i \rightarrow j_1},rk_{i \rightarrow j_2})\), the proxy can re-encrypt the ciphertext \(c_i=(c_{i_1},c_{i_2})\) to \(c_j\) by computing: \(c_j =(c_{j_1},c_{j_2} )= (rk_{i \rightarrow j_1},c_{i_2}rk_{i \rightarrow j_2}).\)

  • \(\textsf {DRE}.\textsf {Dec}(\textsf {sk}_j, c_j)\): Given the ciphertext \(c_j=(c_{j_1},c_{j_2} )\), the user \(\textsf {U}_j\) decrypts it using \(\textsf {sk}_j=\{a_j\}\) by computing: \(m = \frac{c_{j_2}}{(c_{j_1})^{1/a_j}}.\)
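The algorithms above can be exercised with a toy instantiation. The sketch below assumes the order-1019 subgroup of \(\mathbb {Z}_{2039}^*\) generated by 4, which is far too small for real use, and draws exponents from \([1, q-1]\) so that the inverse \(1/a_i\) modulo q always exists; the function names are illustrative. It relies on Python's three-argument `pow` accepting a negative exponent for modular inverses (Python 3.8 or later).

```python
import secrets

# Toy parameters (assumed for illustration): P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def _rand_exp() -> int:
    """Random exponent in [1, Q-1], so it is invertible modulo Q."""
    return 1 + secrets.randbelow(Q - 1)


def _root(c1: int, sk: int) -> int:
    """Compute c1^(1/sk), i.e. recover g^k from (g^sk)^k."""
    return pow(c1, pow(sk, -1, Q), P)


def keygen():
    a = _rand_exp()
    return pow(G, a, P), a                       # (pk, sk) = (g^a, a)


def enc(pk: int, m: int):
    k = _rand_exp()
    return pow(pk, k, P), (m * pow(G, k, P)) % P  # ((g^a)^k, m * g^k)


def rekeygen(sk_i: int, pk_j: int, c_i1: int):
    k_j = _rand_exp()
    g_ki = _root(c_i1, sk_i)                     # g^{k_i}
    return pow(pk_j, k_j, P), (pow(G, k_j, P) * pow(g_ki, -1, P)) % P


def reenc(rk, c):
    return rk[0], (c[1] * rk[1]) % P             # swaps g^{k_i} for g^{k_j}


def dec(sk: int, c):
    return (c[1] * pow(_root(c[0], sk), -1, P)) % P
```

A usage example: after `c = enc(pk_i, m)`, calling `reenc(rekeygen(sk_i, pk_j, c[0]), c)` yields a ciphertext that `dec(sk_j, ...)` opens to `m`, and the same step can be repeated towards further keys because the re-encrypted ciphertext has the same form as a fresh one.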

4.2 Design Rationale of SEDER

SEDER contains several key designs: First, we use \(\textsf {AONT}\) and \(\textsf {DRE}\) together to support efficient re-encryption of the outsourced file. Specifically, given a file, we apply \(\textsf {MLE}\), obtaining the \(\textsf {MLE}\) ciphertext. MLE ensures that the same ciphertext will be generated from different users if the file content is the same. Then \(\textsf {AONT}\) is applied to \(\textsf {MLE}\) ciphertext, generating a set of data blocks. Note that without fetching all the data blocks, the \(\textsf {MLE}\) ciphertext cannot be recovered thanks to the interesting property of \(\textsf {AONT}\). In this way, to re-encrypt a data file, the data owner only needs to re-encrypt one data block, rather than all the data blocks. In addition, by leveraging \(\textsf {DRE}\), we can delegate the re-encryption process to the untrusted cloud server, without leaking the plaintext of the file. This is advantageous as we can eliminate the burden on the client who is supposed to be kept lightweight.

Second, to ensure that only valid data owners are able to decrypt the data being re-encrypted, we perform the following: (1) We leverage proofs of ownership (\(\textsf {PoWs}\)) to distinguish valid from invalid data owners. A valid data owner for a file should be able to prove his/her ownership, as he/she possesses the file. When a data owner passes the verification, the cloud server will add him/her to the owner list of the file. (2) The cloud user who re-encrypts the file will compute new assisting information that is required to decode the file being re-encrypted. The new assisting information will only be disclosed to the valid data owners. The malicious entity, even having obtained the secret key, will not be able to pass the \(\textsf {PoWs}\) verification, and thus cannot obtain the new assisting information which is required to decode the re-encrypted file.

4.3 Design Details of SEDER

Let \(\lambda \) and \(\beta \) be the security parameters. Let \(\pi _{\textsf {DRE}}\) be a delegated re-encryption scheme, such that \(\pi _{\textsf {DRE}}=(\pi _{\textsf {DRE}}.\textsf {SetUp}\), \(\pi _{\textsf {DRE}}.\textsf {KeyGen}\), \(\pi _{\textsf {DRE}}.\textsf {Enc}, \pi _{\textsf {DRE}}.\textsf {ReKeyGen},\) \( \pi _{\textsf {DRE}}.\textsf {ReEnc}, \) \(\pi _{\textsf {DRE}}.\textsf {Dec})\). \(\pi _{\textsf {sym}}\) is a symmetric encryption scheme such that \(\pi _{\textsf {sym}} = (\pi _{\textsf {sym}}.\textsf {KeyGen}\), \(\pi _{\textsf {sym}}.\textsf {Enc}, \pi _{\textsf {sym}}.\textsf {Dec})\), and \(\pi _{\textsf {asym}}\) is an asymmetric encryption scheme such that \(\pi _{\textsf {asym}} = (\pi _{\textsf {asym}}.\textsf {KeyGen}, \pi _{\textsf {asym}}.\textsf {Enc}, \pi _{\textsf {asym}}.\textsf {Dec})\). Let \(H_1\) be a cryptographic hash function: \(H_1: \{0 ,1\}^{*}\rightarrow \{0, 1\}^{\lambda }\). In the following, we describe the design details of SEDER, which contains six phases: \(\textsf {SetUp}\), \(\textsf {PreUpload}\), \(\textsf {Upload}\), \(\textsf {Update}\), \(\textsf {Download}\) and \(\textsf {Delete}\).

\(\underline{\textsf {SetUp}}\): This is to bootstrap the system parameters, and initialize cryptographic parameters for cloud users and cloud server. The system runs \(\pi _{\textsf {DRE}}.\textsf {SetUp}(1^\beta )\) to initialize the system parameters. In addition,

  • Cloud user \(\textsf {U}_i\): He/She runs the key generation algorithm of asymmetric encryption scheme to generate the public/private key: \((\pi _{\textsf {asym}}.\textsf {pk}_{\textsf {U}_i}, \pi _{\textsf {asym}}.\textsf {sk}_{\textsf {U}_i}) \leftarrow \pi _{\textsf {asym}}.\textsf {KeyGen}(1^\beta )\).

  • Cloud server: It runs the key generation algorithm of asymmetric encryption scheme to generate the public/private key: \((\pi _{\textsf {asym}}.\textsf {pk}_\textsf {CS}, \pi _{\textsf {asym}}.\textsf {sk}_\textsf {CS}) \leftarrow \pi _{\textsf {asym}}.\textsf {KeyGen}(1^\beta )\).

\(\underline{\textsf {PreUpload}}\): The \(\textsf {PreUpload}\) phase is run by the cloud user \(\textsf {U}_i\) before \(\textsf {U}_i\) uploads file f to the cloud. \(\textsf {U}_i\) uses MLE [12, 13, 23, 24] to obtain the file key \(k_f\) for file f. MLE can ensure that different users are able to generate the same key for the same file content.

\(\underline{\textsf {Upload}}\): The \(\textsf {Upload}\) phase is run by \(\textsf {U}_i\) to upload file f. Note that \(\textsf {U}_i\) has obtained the file key \(k_f\) during the \(\textsf {PreUpload}\) phase. \(\textsf {U}_i\) encrypts f by running \(\textsf {ct}= \pi _{\textsf {sym}}.\textsf {Enc}(k_f, f)\). \(\textsf {U}_i\) then computes a tag for f: \(\textsf {Tag}_{f} = H_1(\textsf {ct})\), and sends \(\textsf {Tag}_{f}\) to cloud server \(\textsf {CS}\). \(\textsf {CS}\) proceeds as follows:

Case 1: \(\textsf {Tag}_f\) does not exist in the cloud server: In this case, the cloud user conducts the following operations and uploads the corresponding file to the cloud:

  • Given \(\textsf {ct}\), \(\textsf {U}_i\) runs \(\textsf {PoWs}.\textsf {Init}(\textsf {ct})\) to generate the \(\textsf {witness}\).

  • Assume that the encrypted file \(\textsf {ct}\) consists of s blocks: \(\textsf {ct}=\textsf {ct}_1||\textsf {ct}_2||\ldots ||\textsf {ct}_s\). \(\textsf {U}_i\) first applies the all-or-nothing transform on \(\textsf {ct}\), generating \(s+1\) blocks, such that \( \textsf {ct}' \leftarrow \textsf {AONT}(\textsf {ct})\) where \(\textsf {ct}' = \textsf {ct}'_1||\textsf {ct}'_2||...||\textsf {ct}'_s||\textsf {ct}'_{s+1}\).

  • \(\textsf {U}_i\) generates a public/private key pair by applying \((\pi _{\textsf {DRE}}.\textsf {pk}_i, \pi _{\textsf {DRE}}.\textsf {sk}_i) \leftarrow \pi _{\textsf {DRE}}.\textsf {KeyGen}(\textsf {U}_i)\).

  • \(\textsf {U}_i\) randomly selects a data block \(\textsf {ct}'_z\) from \(\textsf {ct}'_1, \ldots , \textsf {ct}'_{s+1}\). Then \(\textsf {U}_i\) applies the delegated re-encryption scheme \(\pi _\textsf {DRE}\) to encrypt \(\textsf {ct}'_z\) into c, such that \(c =(c_1,c_2)= \pi _{\textsf {DRE}}.\textsf {Enc}(\pi _{\textsf {DRE}}.\textsf {pk}_i, \textsf {ct}'_z)\). Therefore, the final ciphertext to be uploaded is: \(\textsf {ct}_{\textsf {Upload}}= \textsf {ct}'_1||\ldots ||\textsf {ct}'_{z-1}|| c || \textsf {ct}'_{z+1} ||\ldots || \textsf {ct}'_{s+1}\).

  • Using file key \(k_f\), \(\textsf {U}_i\) encrypts \(\pi _{\textsf {DRE}}.\textsf {sk}_i\) using \(\pi _{\textsf {sym}}.\textsf {Enc}\) such that \(\textsf {ct}^*_{\textsf {sym}} = \pi _{\textsf {sym}}.\textsf {Enc}(k_f,\pi _{\textsf {DRE}}.\textsf {sk}_i).\)

  • Using the cloud server’s public key \(\pi _{\textsf {asym}}.\textsf {pk}_{\textsf {CS}}\), \(\textsf {U}_i\) encrypts \(\textsf {ct}^*_{\textsf {sym}}\) such that \(\textsf {ct}_{\textsf {asym}}=\pi _{\textsf {asym}}.\textsf {Enc}(\textsf {pk}_{\textsf {CS}}, \textsf {ct}^*_{\textsf {sym}})\).

  • \(\textsf {U}_i\) uploads \(\textsf {ct}_{\textsf {Upload}}\), \(\textsf {witness}\), and \(\textsf {ct}_{\textsf {asym}}\).

  • After receiving the aforementioned information, \(\textsf {CS}\) organizes them in the format \(<\textsf {Tag}_f\), \(\textsf {ct}_{\textsf {Upload}}\), \(\textsf {witness}\), \(\textsf {ct}_{\textsf {asym}}\), user list \(ul_{\textsf {ct}_{\textsf {Upload}}}>\). By decrypting \(\textsf {ct}_{\textsf {asym}}\) using \(\textsf {sk}_{\textsf {CS}}\), \(\textsf {CS}\) obtains the assisting information \(\textsf {ct}^*_{\textsf {sym}}\), which will be distributed to valid cloud users in the following manner: encrypt \(\textsf {ct}^*_{\textsf {sym}}\) using each user’s public key and send the corresponding ciphertext to that user.

Case 2: \(\textsf {Tag}_f\) exists in the cloud server: To further confirm that \(\textsf {U}_i\) really possesses f, \(\textsf {CS}\) and \(\textsf {U}_i\) proceed as follows:

  • \(\textsf {CS}\) runs \(\textsf {PoWs}.\textsf {Challenge}\) to generate a challenge which is sent to \(\textsf {U}_i\).

  • \(\textsf {U}_i\) computes a proof \(\textsf {prof}\) by running \(\textsf {PoWs}.\textsf {Prove}(\textsf {challenge}, \textsf {ct})\).

  • \(\textsf {CS}\) further runs \(\textsf {PoWs}.\textsf {Verify}(\textsf {witness}, \textsf {challenge}, \textsf {prof})\). If the output is 1, \(\textsf {CS}\) appends \(\textsf {U}_i\) to the user list \(ul_{\textsf {ct}_{\textsf {Upload}}}\) and sends the assisting information of file f to \(\textsf {U}_i\). Otherwise, \(\textsf {CS}\) terminates.
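The two upload cases can be summarized from the server's point of view with the following sketch. The class and method names are hypothetical, the record layout mirrors the tuple stored by \(\textsf {CS}\) in Case 1, and the whole \(\textsf {PoWs}\) challenge/prove/verify round is abstracted into a caller-supplied `run_pow` callback that receives the stored witness and reports whether verification succeeded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class FileRecord:
    ct_upload: bytes          # AONT output with one DRE-encrypted block
    witness: bytes            # Merkle root used for PoWs verification
    assisting_info: bytes     # ct*_sym, released only to verified owners
    owners: Set[str] = field(default_factory=set)


class CloudServer:
    def __init__(self) -> None:
        self.records: Dict[str, FileRecord] = {}

    def tag_exists(self, tag: str) -> bool:
        """First step of Upload: does Tag_f already exist?"""
        return tag in self.records

    def upload_new(self, user: str, tag: str, ct_upload: bytes,
                   witness: bytes, assisting_info: bytes) -> None:
        """Case 1: the tag is unknown, so the user uploads everything."""
        if tag in self.records:
            raise ValueError("tag already stored; use claim_existing instead")
        self.records[tag] = FileRecord(ct_upload, witness, assisting_info, {user})

    def claim_existing(self, user: str, tag: str,
                       run_pow: Callable[[bytes], bool]) -> Optional[bytes]:
        """Case 2: no upload; the user must pass PoWs to join the owner list."""
        record = self.records[tag]
        if not run_pow(record.witness):
            return None                       # CS terminates
        record.owners.add(user)
        return record.assisting_info
```

In Case 2 no file data crosses the network: the only traffic is the tag, the proof exchange hidden behind `run_pow`, and the assisting information returned on success.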

\(\underline{\textsf {Update}}\): When a data owner finds that his/her file key is compromised, he/she needs to re-encrypt the corresponding file and make sure that other data owners of the file can decrypt the latest ciphertext of the file. Thanks to \(\textsf {AONT}\), we only need to re-encrypt the encrypted block c rather than the entire outsourced file. Note that \(c =(c_1,c_2)\), and \(c_1\) is also sent to cloud users when \(\textsf {CS}\) distributes the assisting information \(\textsf {ct}^*_{\textsf {sym}}\). The \(\textsf {Update}\) phase is performed between cloud user \(\textsf {U}_j\) (who is on the user list of file f) and \(\textsf {CS}\). The phase proceeds as follows:

  • \(\textsf {U}_j\) runs \(\textsf {DRE}.\textsf {KeyGen}(\textsf {U}_j)\) to generate a pair of public/private key, namely \((\textsf {DRE}.\textsf {pk}_j, \textsf {DRE}.\textsf {sk}_j) \leftarrow \textsf {DRE}.\textsf {KeyGen}(\textsf {U}_j)\).

  • \(\textsf {U}_j\) decrypts \(\textsf {ct}^*_{\textsf {sym}}\) using \(k_f\), obtaining \(\textsf {DRE}.\textsf {sk}_i\).

  • Using \(\textsf {DRE}.\textsf {sk}_i, c_1\) and \(\textsf {DRE}.\textsf {pk}_j\), \(\textsf {U}_j\) generates the delegable re-encryption key \(rk_{i\rightarrow j} \leftarrow \textsf {DRE}.\textsf {ReKeyGen}(\textsf {DRE}.\textsf {sk}_i, \textsf {DRE}.\textsf {pk}_j,c_1)\).

  • Using \(k_f\), \(\textsf {U}_j\) encrypts \(\pi _{\textsf {DRE}}.\textsf {sk}_j\): \(\textsf {ct}^{\#}_{\textsf {sym}} = \pi _{\textsf {sym}}.\textsf {Enc}(k_f,\pi _{\textsf {DRE}}.\textsf {sk}_j).\)

  • Using \(\pi _{\textsf {asym}}.\textsf {pk}_{\textsf {CS}}\), \(\textsf {U}_j\) encrypts \(\textsf {ct}^{\#}_{\textsf {sym}}\): \(\textsf {ct}'_{\textsf {asym}}=\pi _{\textsf {asym}}.\textsf {Enc}(\textsf {pk}_{\textsf {CS}}, \textsf {ct}^{\#}_{\textsf {sym}})\).

  • \(\textsf {U}_j\) sends \(\textsf {ct}'_{\textsf {asym}}\) and \(rk_{i\rightarrow j}\) to \(\textsf {CS}\).

  • \(\textsf {CS}\) runs \(c' \leftarrow \pi _{\textsf {DRE}}.\textsf {ReEnc}(rk_{i\rightarrow j}, c)\) and replaces c with \(c'\). In addition, \(\textsf {CS}\) replaces \(\textsf {ct}_{\textsf {asym}}\) with \(\textsf {ct}'_{\textsf {asym}}\), decrypts \(\textsf {ct}'_{\textsf {asym}}\) obtaining \(\textsf {ct}^{\#}_{\textsf {sym}}\), and distributes \(\textsf {ct}^{\#}_{\textsf {sym}}\) to the users on the user list \(ul_{\textsf {ct}_{\textsf {Upload}}}\).
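The core of the \(\textsf {Update}\) phase can be traced end to end with a toy example. The sketch below repeats a compact toy \(\textsf {DRE}\) over an assumed order-1019 subgroup of \(\mathbb {Z}_{2039}^*\) (insecure, for illustration only), models the stored file as a hypothetical dictionary holding the untouched \(\textsf {AONT}\) blocks and the encrypted block c, and leaves out the symmetric and asymmetric wrapping of \(\pi _{\textsf {DRE}}.\textsf {sk}_j\).

```python
import secrets

P, Q, G = 2039, 1019, 4        # toy group parameters (assumed)


def _rand_exp() -> int:
    return 1 + secrets.randbelow(Q - 1)


def _root(c1: int, sk: int) -> int:
    return pow(c1, pow(sk, -1, Q), P)            # c1^(1/sk)


def dre_keygen():
    a = _rand_exp()
    return pow(G, a, P), a


def dre_enc(pk: int, m: int):
    k = _rand_exp()
    return pow(pk, k, P), (m * pow(G, k, P)) % P


def dre_rekeygen(sk_i: int, pk_j: int, c1: int):
    k_j = _rand_exp()
    return pow(pk_j, k_j, P), (pow(G, k_j, P) * pow(_root(c1, sk_i), -1, P)) % P


def dre_reenc(rk, c):
    return rk[0], (c[1] * rk[1]) % P


def dre_dec(sk: int, c):
    return (c[1] * pow(_root(c[0], sk), -1, P)) % P


def user_update(sk_i: int, c1: int):
    """U_j's side: fresh DRE key pair and re-encryption key rk_{i->j}."""
    pk_j, sk_j = dre_keygen()
    return dre_rekeygen(sk_i, pk_j, c1), sk_j


def server_update(record: dict, rk) -> None:
    """CS's side: re-encrypt only block c; every other stored block is untouched."""
    record["c"] = dre_reenc(rk, record["c"])
```

The work done by both parties is a handful of modular exponentiations on a single block, independent of how many other blocks the stored file contains.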

\(\underline{\textsf {Download}}\): If user \(\textsf {U}_i\) wants to download \(\textsf {ct}_{\textsf {Upload}}\) from the cloud server, \(\textsf {U}_i\) will send a download request \((Tag_f,download)\) to \(\textsf {CS}\). When \(\textsf {CS}\) receives the request, \(\textsf {CS}\) returns \(\textsf {ct}_{\textsf {Upload}}\) to the requestor. \(\textsf {U}_i\) uses the file key and the assisting information to decode \(\textsf {ct}_{\textsf {Upload}}\).

\(\underline{\textsf {Delete}}\): When \(\textsf {CS}\) receives a delete request \((Tag_f,delete)\) from user \(\textsf {U}_i\), \(\textsf {CS}\) will delete \(\textsf {U}_i\) from \(ul_{\textsf {ct}_{\textsf {Upload}}}\). If \(ul_{\textsf {ct}_{\textsf {Upload}}}\) turns empty, \(\textsf {CS}\) will delete \(\textsf {ct}_{\textsf {Upload}}\).
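The owner-list bookkeeping behind \(\textsf {Download}\) and \(\textsf {Delete}\) amounts to reference counting. A minimal sketch, with hypothetical class and method names and with requests reduced to plain method calls, could look as follows.

```python
from typing import Dict, Optional, Set, Tuple


class OwnerLists:
    """Maps Tag_f to (ct_Upload, owner list) and frees data when no owner remains."""

    def __init__(self) -> None:
        self.files: Dict[str, Tuple[bytes, Set[str]]] = {}

    def add_owner(self, tag: str, ct_upload: bytes, user: str) -> None:
        if tag in self.files:
            self.files[tag][1].add(user)          # deduplicated: only the list grows
        else:
            self.files[tag] = (ct_upload, {user})

    def download(self, tag: str) -> Optional[bytes]:
        """Return ct_Upload for a (Tag_f, download) request, or None if unknown."""
        entry = self.files.get(tag)
        return entry[0] if entry else None

    def delete(self, tag: str, user: str) -> bool:
        """Handle (Tag_f, delete); returns True if ct_Upload itself was removed."""
        entry = self.files.get(tag)
        if entry is None:
            return False
        entry[1].discard(user)
        if not entry[1]:                          # owner list is now empty
            del self.files[tag]
            return True
        return False
```

A file shared by several owners therefore survives individual delete requests and disappears only when the last owner leaves.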

5 Security Analysis and Discussion

5.1 Security Analysis

Correctness and Security of DRE . When receiving the ciphertext \(c_j\), \(\textsf {U}_j\) can successfully decrypt it as follows:

$$\begin{aligned} \frac{c_{j_2}}{(c_{j_1})^{1/a_j}}=\frac{c_{i_2}rk_{i \rightarrow j_2}}{(rk_{i \rightarrow j_1})^{1/a_j}}=\frac{c_{i_2}(\frac{g^{k_j}}{(c_{i_1})^{1/a_i}})}{((g^{a_j})^{k_j})^{1/a_j}}=\frac{mg^{k_i}(\frac{g^{k_j}}{((g^{a_i})^{k_i})^{1/a_i}})}{g^{k_j}}=\frac{mg^{k_j}}{g^{k_j}}=m. \end{aligned}$$

In addition, by knowing g, user \(\textsf {U}_i\)’s public key \(g^{a_i}\) and user \(\textsf {U}_j\)’s public key \(g^{a_j}\), the proxy cannot learn anything about plaintext m by observing: (1) \(c_i=(c_{i_1},c_{i_2})=((g^{a_i})^{k_i},mg^{k_i})\); and (2) \(rk_{i \rightarrow j}=(rk_{i \rightarrow j_1},rk_{i \rightarrow j_2})=((g^{a_j})^{k_j},\frac{g^{k_j}}{(c_{i_1})^{1/a_i}})=((g^{a_j})^{k_j},\frac{g^{k_j}}{g^{k_i}})\), due to the hardness of discrete logarithm problem.
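As a worked check of the unlimited-hop property, consider a second re-encryption from \(\textsf {U}_j\) to a further user \(\textsf {U}_l\) (a hypothetical third user with key pair \((g^{a_l}, a_l)\) and fresh randomness \(k_l\), introduced here only for illustration). Using \(c_{j_1}=(g^{a_j})^{k_j}\) and \(c_{j_2}=mg^{k_j}\) from the first hop:

$$\begin{aligned} c_{l_1}=rk_{j \rightarrow l_1}=(g^{a_l})^{k_l},\qquad c_{l_2}=c_{j_2}rk_{j \rightarrow l_2}=mg^{k_j}\cdot \frac{g^{k_l}}{(c_{j_1})^{1/a_j}}=mg^{k_j}\cdot \frac{g^{k_l}}{g^{k_j}}=mg^{k_l}. \end{aligned}$$

The result \((c_{l_1},c_{l_2})=((g^{a_l})^{k_l},mg^{k_l})\) has exactly the form of a fresh encryption under \(\textsf {pk}_l\), so the same argument applies to every subsequent hop.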

Data Confidentiality. In the following, we show that cloud server \(\textsf {CS}\) and the malicious entity \(\textsf {ME}\) are not able to learn the plaintext of the encrypted data.

\(\textsf {CS}\) possesses the following information: \(Tag_f\), \(\textsf {ct}_{\textsf {Upload}}\), \(\textsf {ct}^*_{\textsf {sym}}\), \(\textsf {ct}^{\#}_{\textsf {sym}}\) and \(\textsf {witness}\). Based on \(\textsf {ct}_{\textsf {Upload}}\), \(\textsf {CS}\) is not able to obtain \(\textsf {ct}\), as it is not able to decrypt block c or \(c'\) (security of \(\textsf {AONT}\)). By knowing \(Tag_f\), \(\textsf {CS}\) may try to learn \(\textsf {ct}\) which is computationally impossible. Even if \(\textsf {CS}\) can learn something about \(\textsf {ct}\), without knowing the file key \(k_f\), \(\textsf {CS}\) still cannot learn anything about the plaintext of the file. Also, as \(\textsf {witness}\) is computed from \(\textsf {ct}\), it cannot help to learn the plaintext of the file. The users who really possess a file can prove their ownership of the corresponding file to \(\textsf {CS}\). This ensures that only valid data owners can obtain the latest assisting information from \(\textsf {CS}\) and hence are able to recover the original file from \(\textsf {ct}_{\textsf {Upload}}\).

\(\textsf {ME}\) is not able to prove to \(\textsf {CS}\) the ownership of f, and is thus not able to obtain the new assisting information from \(\textsf {CS}\). Upon having access to \(\textsf {ct}_{\textsf {Upload}}\), \(\textsf {ME}\) cannot use the old assisting information to decrypt the re-encrypted block c, and is thus not able to decode \(\textsf {ct}_{\textsf {Upload}}\) to obtain \(\textsf {ct}\) (security of \(\textsf {AONT}\)). Therefore, even if he/she can have access to the file key \(k_f\), he/she is not able to obtain f.

5.2 Discussion

Zero-Day Attack. SEDER is vulnerable to the zero-day attack, in which the key is leaked and the re-encryption has not been performed. During this period, the adversary can have access to the original file using the obtained key materials. This seems to be unavoidable and we currently do not have a good solution for mitigating such a strong attack.

Supporting User Revocation. Consider the scenario in which each data owner has a few users, and the data owner wants to revoke a certain user, which requires re-encrypting the outsourced file. SEDER can be simply adapted to this scenario, but may face an additional attack: the malicious user can store the decrypted version of block c, and is then always able to decode \(\textsf {ct}_{\textsf {Upload}}\), even though he/she is not able to obtain the new assisting information. This attack can be mitigated by re-encrypting a randomly chosen block during each re-encryption process.

The Nature of the Storage Being Supported by SEDER. Currently, SEDER only supports archival storage [10, 15, 20, 22]. We will extend SEDER to support dynamic storage (i.e., supporting dynamic operations like insert, delete, modify, and append [17, 18, 25, 39]) in our future work.

6 Experimental Evaluation

We evaluated the overhead of each operation in SEDER. We used OpenSSL v1.0.0e [7] for data encryption/decryption and large-number modular operations. The symmetric and asymmetric encryption/decryption functions are instantiated by AES-128 and RSA-1024, respectively. Throughout the experiment, the client and the server both ran on local workstations with an Intel i7-2600 (3.4 GHz) CPU and 10 GB RAM.

The \(\textsf {PreUpload}\) phase just uses existing MLE schemes. Therefore, we only focus on the performance overhead in \(\textsf {Upload}\), \(\textsf {Download}\) and \(\textsf {Update}\) phases.

6.1 Performance Evaluation

Communication. SEDER only introduces the following extra communication between the user and the cloud server: the ciphertext (i.e., 512 bits) for encrypting one block of the AONT-transformed encrypted data, verification information (i.e., 256 bits) and the assisting information (i.e., 256 bytes) in the \(\textsf {Upload}\) phase; the delegable re-encryption public key (i.e., 512 bits) and the assisting information (i.e., 256 bytes) in the \(\textsf {Update}\) phase.

Computation. We evaluated the processing time for uploading, downloading and re-encrypting data with size varying from 100 MB to 2 GB. Results are averaged over 100 runs.

The computation overhead is shown in Fig. 1(a). In SEDER, the user has to perform \(\textsf {AONT}\), which consists of multiple AES encryptions (the number is determined by the size of the data) and XOR operations on the regular encrypted data. We observed that the computation overhead increases with the size of the processed data and is slightly larger than that of the regular data encryption in regular MLEs, as we store the intermediate data on disk to reduce the memory required on the user side.
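To illustrate why the cost of an all-or-nothing transform grows with the data size, the following is a minimal sketch of a package-transform-style \(\textsf {AONT}\). It is not necessarily the exact transform used in SEDER: SHA-256 in counter mode is used as a stand-in for the AES block encryptions (the Python standard library has no AES), and the function names `aont_encode` and `aont_decode` are hypothetical.

```python
import hashlib
import secrets

BLOCK = 32  # bytes per block; equals the SHA-256 output size used as a PRF stand-in


def _prf(key: bytes, counter: int) -> bytes:
    """Stand-in for one AES block encryption of a counter under `key`."""
    return hashlib.sha256(key + counter.to_bytes(8, "big")).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def _mask(key: bytes, data: bytes) -> bytes:
    """XOR every block of `data` with a per-block pad derived from `key`."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        out += _xor(data[offset:offset + BLOCK], _prf(key, offset // BLOCK))
    return bytes(out)


def aont_encode(data: bytes) -> bytes:
    """Mask the data under a fresh random key, then hide that key in a tail block."""
    key = secrets.token_bytes(BLOCK)
    body = _mask(key, data)
    tail = _xor(key, hashlib.sha256(body).digest())  # key XOR hash of all masked blocks
    return body + tail


def aont_decode(package: bytes) -> bytes:
    """Recover the key from the tail (needs every body block), then unmask."""
    body, tail = package[:-BLOCK], package[-BLOCK:]
    key = _xor(tail, hashlib.sha256(body).digest())
    return _mask(key, body)
```

In this sketch, one pad computation and one XOR are needed per block, so the work is linear in the data size. Because the key can only be recovered from the tail together with all masked blocks, withholding or re-encrypting a single block is enough to make the whole package undecodable.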

In the \(\textsf {Upload}\) phase, once a cloud user (User 1) has completed the above operation, it has to perform the asymmetric encryption on one block of the transformation encrypted data. The user has to generate the asymmetric key pair, encrypt one block of the transformation encrypted data with the corresponding public key, and encrypt the corresponding private key using the file key and the public key of the cloud server, in that order. From Fig. 1(b), we observed that the processing overhead (denoted as User1BlEncrypt) is the same for different data sizes and is less than 0.5 ms, which is rather small. The cloud server processes the ciphertext of \(ct^*_{sym}\) and returns the re-encrypted \(ct^*_{sym}\) to the user (denoted as CloudReturnKey in Fig. 1(b)).

When a user (User 2) wants to update the key, he/she needs the cloud server to cooperate in performing the re-encryption. The user generates the delegable re-encryption key \(rk_{i\rightarrow j}\) and encrypts \(\pi _{\textsf {DRE}}.\textsf {sk}_j\) with the MLE key and the cloud public key, in that order (denoted as User2ReEncrypt), and asks the cloud to complete the re-encryption (denoted as CloudReEncrypt). We observed that the performance overhead for re-encryption is independent of the data size and is very small (less than 6 ms) compared to the time for encrypting the data (more than 2,000 ms for 100 MB data).

To download the data, the user needs to decrypt the re-encrypted block (denoted as ReDecryption in Fig. 1(c)) if one or more re-encryptions have been executed. Then, the user performs the inverse transformation of \(\textsf {AONT}\) and finally decrypts the data to recover the plaintext (DataDecryption). From Fig. 1(c), we observed that the performance overhead caused by re-encryption is the same (less than 6 ms) for different data sizes, which is negligible compared to the cost of data decryption (more than 7,000 ms for 100 MB data). The processing time for data decryption increases with the data size and is almost equal to the time for data encryption plus the \(\textsf {AONT}\) transformation, which is reasonable. Compared with the basic MLE schemes, SEDER spends about twice the time processing the outsourced file, which is acceptable, as this is a one-time cost. In addition, the users can achieve highly efficient re-encryption in the \(\textsf {Update}\) phase.

Fig. 1. SEDER performance

7 Related Work

Bellare et al. [13] formalized a new cryptographic primitive, Message-Locked Encryption (MLE), which derives the encryption/decryption key from the message being encrypted/decrypted. This primitive facilitates deduplication over data encrypted by different users.

Douceur et al. [23] proposed convergent encryption (CE), the first MLE scheme, in which the key used to encrypt a file is the hash value of the file, so that the same file possessed by different users is encrypted under the same key. CE has been used in a few systems [21, 28, 30, 31, 36, 37, 40]. CE, however, is vulnerable to an off-line dictionary attack, as file data are usually drawn from a predictable space [13]. Following CE, several MLE schemes were proposed. Bellare et al. proposed DupLESS [12] to mitigate the off-line dictionary attack using a per-client rate-limiting strategy. Specifically, they introduced a key server during key derivation to restrict the number of signature requests allowed for a user during a fixed time interval. Duan [24] proposed another MLE scheme based on distributed oblivious key generation. Liu et al. [33] proposed a new MLE scheme that eliminates the additional independent servers. In their scheme, users use PAKE to exchange the file encryption key with the help of cloud servers. To prevent the online dictionary attack, their scheme realizes a per-file rate-limiting strategy (every user limits the number of key agreements he/she takes part in).
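The idea behind convergent encryption, and the reason it is exposed to an off-line dictionary attack, can be sketched as follows. This is a toy illustration rather than the construction from [23]: a SHA-256 counter-mode keystream stands in for a real deterministic cipher, and all function names are hypothetical.

```python
import hashlib


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream (stand-in for a real cipher)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out += bytes(d ^ p for d, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def ce_encrypt(file_bytes: bytes):
    """Derive the key from the file itself, then encrypt deterministically."""
    key = hashlib.sha256(file_bytes).digest()
    return key, _keystream_xor(key, file_bytes)


def ce_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Invert ce_encrypt (XOR with the same keystream)."""
    return _keystream_xor(key, ciphertext)


def dictionary_attack(ciphertext: bytes, candidates):
    """Off-line guess: re-encrypt each candidate file and compare ciphertexts."""
    for guess in candidates:
        if ce_encrypt(guess)[1] == ciphertext:
            return guess
    return None


# Two users holding the same file obtain the same key and ciphertext,
# so the server can deduplicate without seeing the plaintext.
key_a, ct_a = ce_encrypt(b"annual report 2016")
key_b, ct_b = ce_encrypt(b"annual report 2016")
```

Because encryption is fully determined by the file, anyone who can enumerate likely files can test each guess against a stored ciphertext without interacting with any server, which is what the rate-limiting strategies described above aim to prevent.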

REED [32] aimed to address the key revocation problem for secure server-side deduplication in cloud storage. To efficiently replace old keys and re-encrypt the data, REED introduced two special all-or-nothing transforms derived from CAONT (CAONT is a special case of all-or-nothing transforms in which the key used for \(\textsf {AONT}\) is the hash of the message). ClearBox [9] is a transparent deduplication scheme in which storage service providers can attest to users the number of owners of a file, so that users can share the fee for storing the same file. Li et al. [29] proposed \(SecCloud^+\) to achieve data integrity and deduplication simultaneously. Tang et al. [38] performed data deduplication on CP-ABE.

8 Conclusion

In this paper, we propose SEDER to address the re-encryption problem for secure client-side deduplication in cloud storage. Security analysis and experimental results show that our design incurs acceptable overhead in the various phases while ensuring continuous confidentiality for encrypted cloud storage based on client-side deduplication.