Abstract
Most traditional privacy-preserving search schemes for the cloud adopt the TF-IDF model, which is based on keyword frequency statistics, and do not consider the embedded semantic associations between keywords and documents. To solve this problem, we propose a novel semantic-aware search scheme based on a BCI-tree index over encrypted cloud data. The LDA model is adopted to generate vectors for documents and queried keywords, and these vectors contain topic-based semantic information. Homomorphic encryption of the vectors is used to compute semantic relevance scores between queried keywords and documents in a privacy-preserving manner. To achieve efficient search processing, a novel binary clustering tree-based index (BCI-tree index) is designed, which is constructed following the divisive hierarchical clustering algorithm. Using the BCI-tree index, a depth-first recursive search algorithm is proposed. In addition, a threshold presetting-based optimization is applied to further accelerate the search. The experimental results show that the proposed scheme performs well in terms of the semantic precision of search results and the search time cost.
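To make the indexing idea concrete, the following is a minimal plaintext sketch of a binary clustering tree with a depth-first, threshold-pruned top-k search. It is not the BCI-tree construction from the paper, whose details are not given in this abstract. Everything here is an assumption for illustration: the names `Node`, `build`, and `search`, the two-seed divisive split, the element-wise maximum stored as each node's bound, and the use of an inner product as the relevance score. The bound is only valid for non-negative vectors, which holds for topic distributions such as those LDA produces. Encryption is omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
import heapq

Vector = List[float]


@dataclass
class Node:
    bound: Vector                    # element-wise max over all vectors in the subtree
    doc_id: Optional[int] = None     # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sqdist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build(docs: List[Tuple[int, Vector]]) -> Node:
    """Divisive (top-down) clustering: split the set in two until leaves remain."""
    if len(docs) == 1:
        doc_id, vec = docs[0]
        return Node(bound=list(vec), doc_id=doc_id)
    dim = len(docs[0][1])
    centroid = [sum(v[i] for _, v in docs) / len(docs) for i in range(dim)]
    # Hypothetical split rule: two far-apart seeds, each vector joins the nearer one.
    seed_a = max(docs, key=lambda d: sqdist(d[1], centroid))[1]
    seed_b = max(docs, key=lambda d: sqdist(d[1], seed_a))[1]
    left = [d for d in docs if sqdist(d[1], seed_a) <= sqdist(d[1], seed_b)]
    right = [d for d in docs if sqdist(d[1], seed_a) > sqdist(d[1], seed_b)]
    if not right:                    # all vectors identical: fall back to halving
        mid = len(docs) // 2
        left, right = docs[:mid], docs[mid:]
    l, r = build(left), build(right)
    return Node(bound=[max(x, y) for x, y in zip(l.bound, r.bound)], left=l, right=r)


def search(root: Node, query: Vector, k: int, threshold: float = 0.0) -> List[Tuple[int, float]]:
    """Depth-first top-k search; subtrees whose bound cannot qualify are skipped."""
    if k <= 0:
        return []
    best: List[Tuple[float, int]] = []   # min-heap of (score, doc_id)

    def visit(node: Node) -> None:
        upper = dot(node.bound, query)   # upper bound on any score in this subtree
        if upper < threshold:
            return                       # below the preset threshold
        if len(best) == k and upper <= best[0][0]:
            return                       # cannot beat the current k-th best score
        if node.doc_id is not None:      # leaf: the bound is the exact score
            heapq.heappush(best, (upper, node.doc_id))
            if len(best) > k:
                heapq.heappop(best)
            return
        # Visit the more promising child first so the cutoff rises quickly.
        for child in sorted((node.left, node.right),
                            key=lambda c: dot(c.bound, query), reverse=True):
            visit(child)

    visit(root)
    return [(doc_id, score) for score, doc_id in sorted(best, reverse=True)]


# Toy topic vectors (made-up values, three topics each).
docs = [(0, [0.9, 0.1, 0.0]), (1, [0.8, 0.1, 0.1]), (2, [0.1, 0.8, 0.1]),
        (3, [0.0, 0.2, 0.8]), (4, [0.3, 0.3, 0.4])]
root = build(docs)
top2 = search(root, [0.7, 0.2, 0.1], k=2)
print([doc_id for doc_id, _ in top2])    # prints [0, 1]
```

Raising `threshold` above zero lets the search discard weak subtrees before the top-k heap has filled, which is the intuition behind presetting a threshold. How the paper chooses that value is not stated in the abstract.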
Data Availability
Details of the datasets have been described in Section 8.
Acknowledgements
Not applicable.
Funding
This work was supported by the National Natural Science Foundation of China under grant Nos. 61902199, 61872197, and 61972209, and by the Jiangsu Province Postgraduate Scientific Research Innovation Program under grant No. KYCX22_0984.
Author information
Contributions
The authors contributed to this work as follows: Qian Zhou and Hua Dai contributed to the conception of the study; Qian Zhou and Yuanlong Liu performed the experiments; Qian Zhou and Zheng Hu contributed significantly to the security analysis and manuscript preparation; Qian Zhou and Hua Dai performed the data analyses and wrote the manuscript; Geng Yang and Xun Yi helped perform the analysis through constructive discussions.
Ethics declarations
Ethical Approval and Consent to participate
This manuscript is original and has not been submitted to multiple journals for simultaneous consideration. All authors agree with the content of the article.
Human and Animal Ethics
Not applicable.
Consent for publication
Our manuscript is approved by all authors for publication.
Competing interests
No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, Q., Dai, H., Liu, Y. et al. A novel semantic-aware search scheme based on BCI-tree index over encrypted cloud data. World Wide Web 26, 3055–3079 (2023). https://doi.org/10.1007/s11280-023-01176-w
DOI: https://doi.org/10.1007/s11280-023-01176-w