A Secure Ranked Search Model Over Encrypted Data in Hybrid Cloud Computing

Zhang, Jiuling; Shen, Shijun; Huang, Daochao

doi:10.1007/978-981-33-4922-3_3


Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1299))

Included in the following conference series:

China Cyber Security Annual Conference


Abstract

Security concerns are becoming increasingly prominent as users’ private information is outsourced to clouds that are not fully trustworthy. Encrypting the information before uploading it to the cloud is one of the ultimate solutions. Secure searchable encryption schemes and secure ranking schemes have been proposed to help retrieve the most relevant documents from the cloud. However, the present methods are encumbered by the heavy computation and communication costs of processing cipher text. In this paper, a fully homomorphic encryption based secure ranked search model over the hybrid cloud is proposed. By introducing a hybrid cloud, which is typically composed of a private cloud and a public cloud, the high computation and communication costs of the cipher text are transferred to the trustworthy private cloud, on which the decryption is performed. The client does not need to perform any heavy computation, which makes secure ranking practical from the client’s point of view.



1 Introduction

With the unprecedented growth of information, and given the limited storage, computation power, and battery capacity of their terminals, users are outsourcing more and more personal information to remote servers or clouds. However, since the public cloud is not fully trustworthy, the security of the information stored on it cannot be guaranteed. This security issue has attracted considerable attention in both engineering and research. One solution that meets the need for data outsourcing while preserving privacy is to encrypt the data before storing them on the cloud, and this is one of the generally accepted ultimate methods. After the sensitive data are encrypted with some scheme, the cipher text may be deemed secure and can be outsourced to the public cloud. However, once the data are encrypted, processing and using the cipher text becomes the next problem to consider. As the information outsourced to the cloud accumulates, the collection becomes so large that retrieving information in encrypted form becomes the next conundrum.

Several kinds of schemes have been proposed since the pioneering work of Song et al. [1], which proposed a cryptographic scheme for searching on encrypted data. That scheme is practical and provably secure, since the server cannot learn anything more about the plain text from the query. Although the scheme performs well with linear search, it is almost impractical for retrieval over huge collections. In another work, Goh introduced a Bloom filter based searchable scheme whose search complexity is proportional to the number of documents in the collection on the cloud [2], which is also not applicable to retrieval over huge collections. Other work includes a secure conjunctive keyword search over encrypted information with a linear communication cost [6], privacy-preserving multi-keyword ranked search over encrypted cloud data [3, 10], and fuzzy keyword search over the encrypted cloud [4].

In the general scenario of retrieval over huge plaintext collections, different pieces of information are organized in the form of documents. The amount of information stored in the cloud is very large, so the requested information must be obtained with the help of retrieval methods. Many documents may contain a given query term, so a search for it may retrieve many documents. After the more or less relevant documents are retrieved, they need to be ranked on the cloud. In large-scale information retrieval, the retrieved documents should be ranked by the relevance scores between the documents and the query. This is because the number of documents that contain a keyword, or several keywords, is so large that it is hard for the client to pick out the most relevant ones. The most relevant documents should be retrieved and returned to the users. In plaintext retrieval, a similarity based ranking scheme named locality sensitive hashing [7] and an inner product similarity that measures the relevance between the query and the document have been presented separately [4].

For cipher text retrieval over huge collections, a one-to-many order-preserving mapping technique is employed to rank sensitive score values [13]. A secure and efficient similarity search over outsourced cloud data is proposed in [14]. There is also work exploring semantic search based on conceptual graphs over encrypted outsourced data [8]. Although the one-to-many order-preserving mapping design enables efficient cloud side ranking without revealing keyword privacy and imposes no cost on the client’s terminal, the precision of this model is lower than that in the plaintext scenario. There is a trade-off between the accuracy of the results and security, because statistical information is exposed. There are also efforts that use fully homomorphic encryption [5] to calculate the relevance scores between documents and queries. However, since encryption and decryption are both performed on the client’s terminal and are resource consuming, the time cost is intolerable.

To address the enormous computation and communication costs of fully homomorphic encryption based ranking, the hybrid cloud [12] is employed. A hybrid cloud generally consists of a public cloud and a private cloud. The public cloud is provided by a commercial vendor and is not fully trustworthy, while the private cloud belongs to the organization and is thus trustable. Work on hybrid clouds also describes the architecture and cooperation among different cloud vendors, and gives solutions for communication, storage, and computation among different clouds [11]. Here, we assume that there is at least one secure private cloud in the hybrid cloud. The plain text information is handled on the trustworthy private cloud. The secure ranking model is built on its cooperation with the public clouds, on which encrypted information is stored and processed.

This paper is organized as follows. Related work is reviewed in Sect. 2, and a secure ranked search model over the hybrid cloud is introduced in Sect. 3. Experiments are reported in Sect. 4. Finally, a conclusion is drawn in Sect. 5.

2 Related Work

2.1 The Okapi BM25 Model Over Plain Text

In information retrieval, a document D is generally processed into a bag of words. The collection of documents is denoted by C. Documents and queries are generally preprocessed and stemmed, and an index and an inverted index are built to facilitate retrieval [9]; the details are omitted here.

There are a variety of mature information retrieval models, ranging from linear search to the Boolean model to ranked vector space models (VSM). Different retrieval models apply in different scenarios. Ranking models are used most frequently for general purposes.

The Okapi BM25 model [15] is one of the most popular ranking models for obtaining the relevance scores of documents and queries. In the Okapi BM25 model, the term frequency is defined by Eq. 1.

$$ {\text{TF}}\left( {{\text{q}}_{\text{i}} } \right) = {\text{f}}\left( {{\text{q}}_{\text{i}} ,{\text{D}}} \right) $$

(1)

The inverse document frequency is given by Eq. 2.

$$ {\text{IDF}}\left( {{\text{q}}_{\text{i}} } \right) = \log \frac{\text{N}}{{{\text{n}}\left( {{\text{q}}_{\text{i}} } \right)}} $$

(2)

Here $ {\text{f}}\left( {{\text{q}}_{\text{i}} ,{\text{D}}} \right) $ is the number of occurrences of the query term $ {\text{q}}_{\text{i}} $ in D, N is the number of documents in the collection C, and $ {\text{n}}\left( {{\text{q}}_{\text{i}} } \right) $ is the number of documents that contain $ {\text{q}}_{\text{i}} $.

The Okapi relevance score between a query Q and a document D is given by Eq. 3.

$$ {\text{Score}}\left( {{\text{D}},{\text{Q}}} \right) = \mathop \sum \limits_{{{\text{q}}_{\text{i}} \in {\text{Q}}}} {\text{TF}}\left( {{\text{q}}_{\text{i}} } \right) \times {\text{IDF}}\left( {{\text{q}}_{\text{i}} } \right) $$

(3)

The relevance between a document and a query is quantified by the Okapi relevance scores.
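As a minimal sketch of Eqs. 1 to 3 on plaintext, the following Python code scores a small, hypothetical collection. The natural logarithm is assumed for Eq. 2 (the base is not stated above), and the function names, document IDs, and tokens are illustrative only.

```python
import math
from collections import Counter

def build_stats(collection):
    """Return per-document term counts and the document frequency n(q_i) of each term."""
    term_counts = {doc_id: Counter(tokens) for doc_id, tokens in collection.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # each document counts once per term
    return term_counts, doc_freq

def score(query, doc_id, term_counts, doc_freq):
    """Score(D, Q) = sum over q_i in Q of TF(q_i) * IDF(q_i), as in Eq. 3."""
    n_docs = len(term_counts)
    total = 0.0
    for term in query:
        n_q = doc_freq.get(term, 0)
        if n_q == 0:
            continue  # term absent from the whole collection: contributes nothing
        tf = term_counts[doc_id][term]   # Eq. 1: f(q_i, D)
        idf = math.log(n_docs / n_q)     # Eq. 2: log(N / n(q_i)), natural log assumed
        total += tf * idf
    return total

# Hypothetical, already tokenized and stemmed documents.
collection = {
    "d1": ["secure", "cloud", "search", "cloud"],
    "d2": ["ranked", "search"],
    "d3": ["encrypted", "data"],
}
term_counts, doc_freq = build_stats(collection)
query = ["cloud", "search"]
ranking = sorted(
    collection,
    key=lambda d: score(query, d, term_counts, doc_freq),
    reverse=True,
)
print(ranking)
```

Sorting the document IDs by descending score gives the ranked list that the private cloud would return in the encrypted setting.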

2.2 Fully Homomorphic Encryption

Homomorphism is a very valuable property of encryption algorithms: the result of a computation over cipher texts corresponds to the result of the same computation over the plaintexts. Fully homomorphic encryption (FHE) is both additively and multiplicatively homomorphic, satisfying both Eqs. 4 and 5.

$$ {\text{D}}\left( {{\text{E}}\left( {\text{a}} \right) \oplus {\text{E}}\left( {\text{b}} \right)} \right) = {\text{a}} + {\text{b}} $$

(4)

$$ {\text{D}}\left( {{\text{E}}\left( {\text{a}} \right) \otimes {\text{E}}\left( {\text{b}} \right)} \right) = {\text{a}} \times {\text{b}} $$

(5)

Here E and D denote encryption and decryption, $ \oplus $ denotes “addition” over the cipher text, and $ \otimes $ denotes “multiplication” over the cipher text.
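To make Eqs. 4 and 5 concrete, the following minimal sketch uses a toy symmetric scheme over the integers, in which a cipher text is a multiple of a secret odd integer plus small noise. This is an illustrative assumption: it is not the FHE scheme of [5], it is not intended to be secure at these parameter sizes, and the names T, NOISE_BITS, KEY_BITS, keygen, encrypt, and decrypt are hypothetical. In the sketch, $ \oplus $ and $ \otimes $ are ordinary integer addition and multiplication of cipher texts, and results are correct modulo T as long as the accumulated noise stays below the key.

```python
import secrets

T = 1 << 16        # plaintext modulus (assumed): results are correct modulo T
NOISE_BITS = 8     # bit length of the fresh random noise r
KEY_BITS = 128     # bit length of the secret odd integer p

def keygen():
    """Secret key: a random odd integer p with its top bit set."""
    return secrets.randbits(KEY_BITS) | (1 << (KEY_BITS - 1)) | 1

def encrypt(p, m):
    """E(m) = p*q + T*r + m, with random q and small random noise r."""
    q = secrets.randbits(KEY_BITS)
    r = secrets.randbits(NOISE_BITS)
    return p * q + T * r + (m % T)

def decrypt(p, c):
    """D(c) = (c mod p) mod T; valid while the noise term stays below p."""
    return (c % p) % T

p = keygen()
a, b = 7, 12
ca, cb = encrypt(p, a), encrypt(p, b)

sum_plain = decrypt(p, ca + cb)    # Eq. 4: addition over cipher texts
prod_plain = decrypt(p, ca * cb)   # Eq. 5: multiplication over cipher texts
print(sum_plain, prod_plain)
```

Each multiplication roughly squares the noise, so a toy scheme of this kind supports only a bounded number of operations; a fully homomorphic scheme removes that bound, which is beyond this sketch.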

In this work, the term frequency (TF) and inverse document frequency (IDF) values are encrypted separately with FHE. The documents that contain the terms are encrypted with some other encryption scheme, such as AES, solely to protect the information stored on the public cloud.

All information is thus encrypted with one of these schemes before it is uploaded to the public cloud, where the cipher text of Score(D, Q) can then be obtained.

2.3 The Applicability of Hybrid Cloud Computing

We assume that the hybrid cloud consists simply of one private cloud and one public cloud. The private cloud stores the client’s sensitive information, and the public cloud performs computation over the cipher text.

A new scheme based on a private and public cloud platform is proposed here. The public cloud in this hybrid cloud scenario is assumed to have the following characteristics: its computing resources are enormous, and the resources allocated to a client can be provisioned elastically to meet the client’s computation demands.

The private cloud acts as an agent for the client in this scenario. Since computation and storage resources are relatively abundant on the private cloud, it has enough computing power to encrypt a user’s plaintext information. The bandwidth between the private cloud and the public cloud is also assumed to be large enough to transfer the cipher text of the relevance scores.

3 Secure Ranked Search Model Over Hybrid Cloud

A new encryption based secure and efficient retrieval scheme over hybrid cloud is proposed in this section.

3.1 The Architecture of the Secure Ranked Search Model Over Hybrid Cloud

There are three parties in this architecture: the client, the private cloud, and the public cloud, as shown in Fig. 1.

In the building process, the client uploads the original sensitive information to the private cloud, as shown by step (1) in Fig. 1. The private cloud preprocesses the documents and encrypts the TF and IDF values and the documents themselves. The encrypted information is then uploaded to the public cloud, as shown by step (2). On the public cloud, an inverted index is built, and the corresponding computations are performed.

In the retrieval process, the client gives a keyword to the private cloud, as shown by step (3). The private cloud encrypts the keyword and searches over the public cloud, as shown by step (4). On the public cloud, the calculation over the cipher text is carried out. The cipher text of the evaluation scores is downloaded by the private cloud, as shown by step (5). After decryption, the scores are ranked, and the top N document IDs are sent to the public cloud, as shown by step (6). The private cloud then downloads the encrypted documents, as shown by step (7). After decryption, the plaintext documents are returned to the client, as shown by step (8).
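The private cloud’s part of the retrieval process, steps (5) to (8), can be sketched as follows. The cryptography is deliberately replaced by placeholders: toy_decrypt_score, toy_encrypt_doc, and toy_decrypt_doc are hypothetical stand-ins (a fixed offset and a byte-wise XOR) for FHE and AES decryption and provide no security, and the document IDs and contents are invented for illustration.

```python
import heapq
from typing import Callable, Dict, List

def select_top_n(encrypted_scores: Dict[str, int],
                 decrypt_score: Callable[[int], float],
                 n: int) -> List[str]:
    """Steps (5)-(6): decrypt the downloaded scores, rank them, keep the top-N IDs."""
    plain = {doc_id: decrypt_score(c) for doc_id, c in encrypted_scores.items()}
    return heapq.nlargest(n, plain, key=plain.get)

def retrieve(encrypted_scores, encrypted_docs, decrypt_score, decrypt_doc, n):
    """Steps (5)-(8) as seen from the private cloud."""
    top_ids = select_top_n(encrypted_scores, decrypt_score, n)   # steps (5)-(6)
    downloaded = {d: encrypted_docs[d] for d in top_ids}         # step (7)
    return [decrypt_doc(downloaded[d]) for d in top_ids]         # step (8)

# Placeholder "cryptography" (NOT secure, for illustration only).
MASK = 1000

def toy_decrypt_score(c: int) -> int:
    return c - MASK

def toy_encrypt_doc(text: str) -> bytes:
    return bytes(b ^ 0x5A for b in text.encode("utf-8"))

def toy_decrypt_doc(blob: bytes) -> str:
    return bytes(b ^ 0x5A for b in blob).decode("utf-8")

# Hypothetical state held by the public cloud.
encrypted_scores = {"d1": 1026, "d2": 1004, "d3": 1000, "d4": 1017}
encrypted_docs = {
    "d1": toy_encrypt_doc("doc one text"),
    "d2": toy_encrypt_doc("doc two text"),
    "d3": toy_encrypt_doc("doc three text"),
    "d4": toy_encrypt_doc("doc four text"),
}

results = retrieve(encrypted_scores, encrypted_docs,
                   toy_decrypt_score, toy_decrypt_doc, n=2)
print(results)
```

Only the score cipher texts and the N selected documents cross the link between the two clouds, while the client itself receives plaintext documents without performing any decryption.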

3.2 The Implementation of Fully Homomorphic Encryption Based Secure Ranking Scheme

In the inverted index building process, the encryption of the plain text is performed on the private cloud.

The encrypted form of term frequency is expressed as Eq. 6.

$$ {\text{v}}_{\text{tf}} = \left( {{\text{FHE}}\left( {{\text{tf}}_{1} } \right),{\text{FHE}}\left( {{\text{tf}}_{2} } \right), \cdots ,{\text{FHE}}\left( {{\text{tf}}_{\text{N}} } \right)} \right) $$

(6)

The encrypted form of inverse document frequency is given as Eq. 7.

$$ {\text{v}}_{\text{idf}} = \left( {{\text{FHE}}\left( {{\text{idf}}_{1} } \right),{\text{FHE}}\left( {{\text{idf}}_{2} } \right), \cdots ,{\text{FHE}}\left( {{\text{idf}}_{\text{N}} } \right)} \right) $$

(7)

In the ranking process, computations such as addition and multiplication over the cipher text are performed on the public cloud.

The full process can be described as follows. First, the TF and IDF values in decimal form are transformed into binary, and each of them is encrypted; the relevance score is then obtained through addition and multiplication. The process is shown in Fig. 2.

The calculation of the relevance score between the query and the document is given by Eq. 8.

$$ {\text{FHE}}\left( {\text{score}} \right) = \mathop \sum \limits_{{{\text{q}}_{\text{i}} }} {\text{FHE}}\left( {{\text{tf}}_{\text{i}} } \right) \times {\text{FHE}}\left( {{\text{idf}}_{\text{i}} } \right) $$

(8)

Thus the relevance scores in FHE form are obtained over the hybrid cloud. By decrypting them, the documents can subsequently be ranked.
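The division of labor in Eqs. 6 to 8 can be sketched with a toy integer scheme. Several simplifications are assumed here and are not part of the model above: the scheme is an insecure stand-in for FHE, it encrypts whole integers rather than individually encrypted bits, IDF values are stored in fixed point with a hypothetical factor SCALE, and all names and sample values are illustrative.

```python
import math
import secrets

T = 1 << 32     # plaintext modulus (assumed); scaled scores must stay below T
SCALE = 1000    # fixed-point factor (assumed): idf is stored as round(idf * SCALE)

def keygen(bits=256):
    """Secret key: a random odd integer with its top bit set."""
    return secrets.randbits(bits) | (1 << (bits - 1)) | 1

def encrypt(p, m):
    """Toy encryption: multiple of p, plus T times small noise, plus the message."""
    return p * secrets.randbits(256) + T * secrets.randbits(8) + (m % T)

def decrypt(p, c):
    """Toy decryption: strip the multiple of p, then the noise."""
    return (c % p) % T

def encrypted_score(v_tf, v_idf):
    """Eq. 8 on cipher texts: sum of FHE(tf_i) * FHE(idf_i); no key is needed."""
    return sum(c_tf * c_idf for c_tf, c_idf in zip(v_tf, v_idf))

# Private cloud: encrypt TF and scaled IDF values for one document.
p = keygen()
tf = [2, 1, 0]                                             # hypothetical term frequencies
idf = [math.log(3 / 1), math.log(3 / 2), math.log(3 / 1)]  # hypothetical IDF values
idf_fixed = [round(x * SCALE) for x in idf]
v_tf = [encrypt(p, x) for x in tf]           # Eq. 6
v_idf = [encrypt(p, x) for x in idf_fixed]   # Eq. 7

# Public cloud: operates on cipher texts only.
c_score = encrypted_score(v_tf, v_idf)

# Private cloud: decrypt and undo the fixed-point scaling.
score = decrypt(p, c_score) / SCALE
print(score)
```

The public cloud never sees the key p, and the private cloud performs one decryption per candidate document, which is the cost that Sect. 4.2 proposes to reduce.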

4 Experiment Result and Future Work

4.1 Preliminary Experimental Result

Based on the proposed retrieval and ranking model over the hybrid cloud, some preliminary experiments were carried out. The experiments used the small-sized Cranfield collection. The experimental result is compared with order preserving encryption (OPE), which is employed in [13].

The precision of the top N retrieved documents (P@N) and the mean average precision (MAP) [9] are used to evaluate the different ranking schemes. The experimental results are shown in Table 1.

Table 1. The comparison result of different methods.


The tentative experimental results show that the retrieval quality of order preserving encryption based ranking is dramatically lower than that of the Okapi BM25 ranking model on the crucial P@N criteria.
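For reference, the two evaluation measures used in this comparison, P@N and MAP, can be computed as in the following sketch. The rankings and relevance judgments are hypothetical and are not the experimental data of this paper; the function names are illustrative.

```python
from typing import Dict, List, Set

def precision_at_n(ranked: List[str], relevant: Set[str], n: int) -> float:
    """Fraction of the top-n retrieved documents that are relevant."""
    if n <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked[:n] if doc_id in relevant)
    return hits / n

def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of P@k over the ranks k at which a relevant document appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)

def mean_average_precision(runs: Dict[str, List[str]],
                           judgments: Dict[str, Set[str]]) -> float:
    """Average of the per-query average precision over all queries."""
    return sum(average_precision(runs[q], judgments[q]) for q in runs) / len(runs)

# Hypothetical rankings and relevance judgments for two queries.
runs = {"q1": ["d3", "d1", "d7", "d2"], "q2": ["d5", "d6"]}
judgments = {"q1": {"d1", "d2"}, "q2": {"d5"}}

p_at_2 = precision_at_n(runs["q1"], judgments["q1"], 2)
map_score = mean_average_precision(runs, judgments)
print(p_at_2, map_score)
```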

4.2 Future Work

During retrieval, the proposed scheme requires the private cloud to download the cipher text of the relevance scores of all possibly relevant documents, which can also be enormous. To make it more practicable, future work may combine OPE and FHE over the hybrid cloud. With OPE, a pre-ranking could be performed on the public cloud, which would send the top M relevance scores to the private cloud. Here, M should be a sufficiently large number, say 10000. The private cloud then decrypts the top M scores and ranks them. In this way, both the computation and communication costs on the private cloud would be limited, and the efficiency of retrieval and ranking would be greatly enhanced.
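A rough sketch of this two-stage idea follows. Everything in it is an assumption made for illustration: ope_encode is a toy one-to-many order-preserving map and not the construction of [13], the integer division by 10 models the coarser precision of OPE based ranking, toy_decrypt is an insecure placeholder for FHE decryption, and the scores are invented.

```python
import heapq
import secrets

BUCKET = 1 << 16   # width of the random bucket per coarse score (assumed)
MASK = 1000        # placeholder "key" for the stand-in exact-score encoding

def ope_encode(coarse_score: int) -> int:
    """Toy one-to-many order-preserving map: a random point in the score's bucket."""
    return coarse_score * BUCKET + secrets.randbelow(BUCKET)

def pre_rank(ope_scores, m):
    """Public cloud: keep the M documents with the largest OPE values."""
    return heapq.nlargest(m, ope_scores, key=ope_scores.get)

def final_rank(candidates, encrypted_scores, decrypt, n):
    """Private cloud: decrypt only the M candidate scores and rank them exactly."""
    exact = {d: decrypt(encrypted_scores[d]) for d in candidates}
    return heapq.nlargest(n, exact, key=exact.get)

def toy_decrypt(c: int) -> int:
    return c - MASK  # placeholder, NOT real decryption

# Hypothetical exact relevance scores for five documents.
exact_scores = {"d1": 57, "d2": 52, "d3": 31, "d4": 58, "d5": 12}
ope_scores = {d: ope_encode(s // 10) for d, s in exact_scores.items()}
encrypted_scores = {d: s + MASK for d, s in exact_scores.items()}

candidates = pre_rank(ope_scores, m=3)   # coarse and cheap, on the public cloud
top = final_rank(candidates, encrypted_scores, toy_decrypt, n=2)
print(top)
```

The private cloud decrypts M scores instead of one per matching document, and the exact ordering among the candidates is restored in the second stage.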

5 Conclusion

A fully homomorphic encryption based secure ranked search model over the hybrid cloud is proposed, and the implementation of the retrieval and ranking process is described in detail. The experimental results show its advantage over the existing purely OPE based ranking. In the future, we will combine OPE and FHE to implement an industrial model while preserving users’ privacy over the hybrid cloud.

References

1. Song, D.X., Wagner, D., Perrig, A.: Practical techniques for searches on encrypted data. In: Proceeding 2000 IEEE Symposium on Security and Privacy, pp. 44–55 (2000)
2. Goh, E.-J.: Secure indexes. IACR Cryptology ePrint Archive, p. 216 (2003)
3. Curtmola, R., Garay, J., Kamara, S., Ostrovsky, R.: Searchable symmetric encryption: improved definitions and efficient constructions. J. Comput. Secur. 19, 895–934 (2011)
4. Li, J., Wang, Q., Wang, C., Cao, N., Ren, K., Lou, W.: Fuzzy keyword search over encrypted data in cloud computing. In: 2010 Proceedings IEEE INFOCOM, pp. 1–5 (2010)
5. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: STOC 2009
6. Golle, P., Staddon, J., Waters, B.: Secure conjunctive keyword search over encrypted data. In: ACNS (2004)
7. Kuzu, M., Islam, M.S., Kantarcioglu, M.: Efficient similarity search over encrypted data. In: 2012 IEEE 28th International Conference on Data Engineering, pp. 1156–1167 (2012)
8. Fu, Z., Huang, F., Sun, X., Vasilakos, A., Yang, C.: Enabling semantic search based on conceptual graphs over encrypted outsourced data. IEEE Trans. Serv. Comput. 12, 813–823 (2019)
9. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, 1st edn. Cambridge University Press, Cambridge (2005)
10. Vishvapathi, P., Reddy, M.J.: Privacy-preserving multi-keyword ranked search over encrypted cloud data (2016)
11. Wang, H., Ding, B.: Growing construction and adaptive evolution of complex software systems. Sci. China Inf. Sci. 59, 1–3 (2016)
12. Wang, H., Shi, P., Zhang, Y.: JointCloud: a cross-cloud cooperation architecture for integrated internet service customization. In: IEEE 37th International Conference on Distributed Computing Systems, pp. 1846–1855 (2017)
13. Wang, C., Cao, N., Ren, K., Lou, W.: Enabling secure and efficient ranked keyword search over outsourced cloud data. IEEE Trans. Parallel Distrib. Syst. 23, 1467–1479 (2012)
14. Wang, C., Ren, K., Yu, S., Urs, K.M.: Achieving usable and privacy-assured similarity search over outsourced cloud data. In: Proceedings IEEE INFOCOM, pp. 451–459 (2012)
15. Whissell, J.S., Clarke, C.L.: Improving document clustering using Okapi BM25 feature weighting. Inf. Retr. 14, 466–487 (2011)


Acknowledgments

This work is supported by The National Key Research and Development Program of China (2016YFB1000105).

Author information

Authors and Affiliations

CNCERT/CC, Beijing, 100029, China
Jiuling Zhang, Shijun Shen & Daochao Huang


Corresponding author

Correspondence to Jiuling Zhang.

Editor information

Editors and Affiliations

CNCERT/CC, Beijing, China
Wei Lu
Beijing University of Posts and Telecommunications, Beijing, China
Qiaoyan Wen
University of Chinese Academy of Sciences, Beijing, China
Yuqing Zhang
Beihang University, Beijing, China
Bo Lang
Peking University, Beijing, China
Weiping Wen
CNCERT/CC, Beijing, China
Hanbing Yan
CNCERT/CC, Beijing, China
Chao Li
CNCERT/CC, Beijing, China
Li Ding
CNCERT/CC, Beijing, China
Ruiguang Li
CNCERT/CC, Beijing, China
Yu Zhou

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


About this paper

Cite this paper

Zhang, J., Shen, S., Huang, D. (2020). A Secure Ranked Search Model Over Encrypted Data in Hybrid Cloud Computing. In: Lu, W., et al. Cyber Security. CNCERT 2020. Communications in Computer and Information Science, vol 1299. Springer, Singapore. https://doi.org/10.1007/978-981-33-4922-3_3


DOI: https://doi.org/10.1007/978-981-33-4922-3_3
Published: 19 January 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4921-6
Online ISBN: 978-981-33-4922-3
eBook Packages: Computer Science, Computer Science (R0)
