
LSH-aware multitype health data prediction with privacy preservation in edge environment


As electronic technology advances, traditional paper-based medical systems are being replaced by electronic records that can be easily checked and transmitted. However, system updates and equipment failures make missing data very common in the healthcare field. Because health data help people evaluate their health status and adjust their fitness routines, predicting missing health data is a pressing task. Two challenges arise when predicting missing data: (1) people’s health data are complex and contain multiple data types (such as continuous, discrete and Boolean data); and (2) privacy issues arise at the edge, because huge amounts of health data are published while edge devices can provide only limited computing and storage resources. This paper therefore proposes a novel privacy-aware approach to multitype health data prediction based on locality-sensitive hashing (LSH). Through locality-sensitive hashing, the proposed method achieves a good tradeoff between prediction accuracy and privacy preservation. Finally, a set of experiments on the WISDM dataset verifies the validity of the approach in handling multitype data and protecting user privacy.
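
The abstract does not specify the hashing scheme, so the following is only a minimal sketch of the general idea, assuming random-hyperplane (sign random projection) LSH. It is not the authors' implementation. The function names, the toy columns, the choice to cast discrete and Boolean values to floats, and the mean/majority-vote aggregation are all assumptions made for illustration. Records are compared through short bit signatures rather than raw values, and a missing entry is estimated from the records that share a signature with the target in at least one hash table.

```python
import random
from collections import Counter
from statistics import mean


def make_hyperplanes(num_bits, dim, seed):
    """Draw num_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(record, hyperplanes, skip):
    """Hash a record to a tuple of bits, one sign bit per hyperplane.

    The column at index `skip` (the one being predicted) is left out, and
    discrete/Boolean values are cast to float (an illustrative assumption).
    """
    features = [float(v) for i, v in enumerate(record) if i != skip]
    return tuple(
        1 if sum(w * x for w, x in zip(plane, features)) >= 0 else 0
        for plane in hyperplanes
    )


def predict_missing(target, others, column, kind,
                    num_bits=4, num_tables=8, seed=0):
    """Estimate target[column] from records that collide with the target.

    `others` is assumed to hold complete records. A record counts as a
    neighbour if its signature equals the target's in at least one table.
    Returns None when no neighbour is found.
    """
    dim = len(target) - 1  # the predicted column is not hashed
    neighbour_ids = set()
    for table in range(num_tables):
        planes = make_hyperplanes(num_bits, dim, seed + table)
        target_sig = lsh_signature(target, planes, column)
        for idx, record in enumerate(others):
            if lsh_signature(record, planes, column) == target_sig:
                neighbour_ids.add(idx)
    if not neighbour_ids:
        return None
    values = [others[i][column] for i in sorted(neighbour_ids)]
    if kind == "continuous":
        return mean(values)  # average for continuous data
    return Counter(values).most_common(1)[0][0]  # majority vote otherwise


# Hypothetical toy records:
# (heart rate, activity level 0-3, smoker flag, daily steps in thousands)
others = [
    [62.0, 3, False, 11.5],
    [64.0, 3, False, 10.8],
    [88.0, 0, True, 2.1],
]
target = [63.0, 3, False, None]
estimate = predict_missing(target, others, column=3, kind="continuous")
```

More bits per signature make collisions rarer and reveal less about which records are similar, while more tables raise the chance of finding neighbours. This is one way to picture the accuracy versus privacy tradeoff mentioned in the abstract, although the paper's actual parameters and aggregation rule may differ.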






This work is supported by the National Natural Science Foundation of China (No. 61872219) and the Natural Science Foundation of Shandong Province (ZR2019MF001).

Author information



Corresponding author

Correspondence to Lianyong Qi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Resource Management at the Edge for Future Web, Mobile and IoT Applications

Guest Editors: Qiang He, Fang Dong, Chenshu Wu, and Yun Yang


About this article


Cite this article

Kong, L., Wang, L., Gong, W. et al. LSH-aware multitype health data prediction with privacy preservation in edge environment. World Wide Web (2021).



  • Multitype health data
  • Missing data prediction
  • Privacy
  • Locality-sensitive hashing
  • Edge