Fast privacy-preserving utility mining algorithm based on utility-list dictionary

Applied Intelligence

Abstract

Privacy-preserving utility mining (PPUM) aims to prevent the leakage of sensitive information in utility pattern mining. In recent years, researchers have proposed several algorithms for this problem. However, these algorithms suffer from high side effects, long sanitization times, and high computational complexity. Although the FPUTT algorithm reduces the number of database scans, its tree construction and traversal still take considerable time. This paper proposes a fast utility-list dictionary algorithm (FULD). The utility-list dictionary consists of all sensitive items, so sensitive items can be found and modified through dictionary lookup. In addition, the novel concepts of SINS and tns are proposed to reduce the side effects of the algorithm. Experiments show that FULD performs well in terms of both running time and side effects: its running time is 15–20 times shorter than that of the FPUTT algorithm, and it performs well on both sparse and dense datasets.
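The dictionary-lookup idea can be illustrated with a minimal Python sketch. This is not the FULD algorithm itself, whose details are not given in the abstract; it only assumes that a dictionary keyed by sensitive item stores (transaction id, utility) entries, and that sanitization lowers an item's quantity in a chosen transaction. The toy data, the function names, and the choice of which transaction to modify are all hypothetical, and the SINS and tns concepts are not modeled here.

```python
from collections import defaultdict

# Hypothetical toy database: transaction id -> {item: purchased quantity}
transactions = {
    1: {"a": 2, "b": 1, "c": 3},
    2: {"a": 1, "c": 2},
    3: {"b": 4, "d": 1},
}
# Hypothetical unit profit of each item
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 10}


def build_utility_dictionary(transactions, unit_profit, sensitive_items):
    """Map each sensitive item to a list of (tid, utility) entries."""
    dictionary = defaultdict(list)
    for tid, items in transactions.items():
        for item, qty in items.items():
            if item in sensitive_items:
                dictionary[item].append((tid, qty * unit_profit[item]))
    return dict(dictionary)


def itemset_utility(transactions, unit_profit, itemset):
    """Sum the utility of `itemset` over the transactions containing all its items."""
    total = 0
    for items in transactions.values():
        if all(i in items for i in itemset):
            total += sum(items[i] * unit_profit[i] for i in itemset)
    return total


def reduce_quantity(transactions, dictionary, unit_profit, item, tid, new_qty):
    """Lower the quantity of `item` in transaction `tid` and keep the dictionary in sync."""
    transactions[tid][item] = new_qty
    dictionary[item] = [
        (t, new_qty * unit_profit[item] if t == tid else u)
        for t, u in dictionary[item]
    ]


# Assumed sensitive itemset {a, c}; its items become the dictionary keys.
sensitive_items = {"a", "c"}
uld = build_utility_dictionary(transactions, unit_profit, sensitive_items)

before = itemset_utility(transactions, unit_profit, sensitive_items)
# Look up item "a" directly in the dictionary instead of rescanning the database.
reduce_quantity(transactions, uld, unit_profit, "a", tid=1, new_qty=1)
after = itemset_utility(transactions, unit_profit, sensitive_items)
print(before, after)  # prints: 20 15
```

In this sketch the utility of the sensitive itemset drops from 20 to 15 after one modification. A real sanitization procedure would repeat such modifications until the utility falls below the minimum utility threshold, while choosing which item and transaction to modify so as to limit side effects.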



Acknowledgments

The authors are sincerely grateful to the editor and anonymous reviewers for their insightful and constructive comments, which helped us improve this work greatly.

Funding

This research was funded by the National Natural Science Foundation of China (61772282). It was also supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX21 1008).

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Chunyong Yin and Ying Li. The first draft of the manuscript was written by Ying Li and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Chunyong Yin.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interest or personal relationship that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yin, C., Li, Y. Fast privacy-preserving utility mining algorithm based on utility-list dictionary. Appl Intell 53, 29363–29377 (2023). https://doi.org/10.1007/s10489-023-04791-2

  • DOI: https://doi.org/10.1007/s10489-023-04791-2
