Abstract
This paper introduces a new theoretical scheme for solving the frequent itemset hiding problem. We propose an algorithmic approach that consists of a novel constraint-based hiding model, which folds hiding into a single mining pass, together with a solution methodology that relies on Linear Programming. The patterns induced by the constraint-based mining algorithm are used to build a minimal linear program whose solution dictates the construction of a database extension that delivers the required hiding. This extension is appended to the original database and the two are released together for mining, so that the resulting extended database hides the sensitive knowledge we want to protect. In a series of experiments against existing approaches, the proposed scheme outperformed them in both space complexity and accuracy. Our proposal sheds new light on algorithmic techniques that can be readily applied to model hiding problems, yielding solutions that computationally outperform the existing modeling approaches for hiding.
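To make the database-extension idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that an itemset counts as frequent when its relative support is at least the threshold, and it only computes how many extension transactions are needed to push one sensitive itemset below that threshold. The function names, the toy database, and the filler transaction are hypothetical; the constraint-based mining step and the linear program that decides the contents of the extension are not reproduced here.

```python
import math
from fractions import Fraction


def support_count(database, itemset):
    """Count the transactions that contain every item of `itemset`."""
    target = set(itemset)
    return sum(1 for transaction in database if target <= set(transaction))


def min_extension_size(database, sensitive, min_support):
    """Smallest number q of appended transactions, none containing
    `sensitive`, such that support(sensitive) / (n + q) < min_support.

    Assumption: an itemset is frequent when its relative support is
    greater than or equal to `min_support` (passed as a Fraction).
    """
    n = len(database)
    s = support_count(database, sensitive)
    if s == 0:
        return 0  # the itemset never occurs, so it is already hidden
    sigma = Fraction(min_support)
    # s / (n + q) < sigma  is equivalent to  q > s / sigma - n
    bound = Fraction(s) / sigma - n
    return max(math.floor(bound) + 1, 0)


# Hypothetical toy database: {a, b} appears in 2 of 4 transactions.
database = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
sensitive = {"a", "b"}
threshold = Fraction(1, 2)

q = min_extension_size(database, sensitive, threshold)

# Filler transactions are arbitrary here; in the paper their contents
# are dictated by the solution of the linear program.
extended = database + [{"c"} for _ in range(q)]
new_support = Fraction(support_count(extended, sensitive), len(extended))
```

In this toy case a single appended transaction suffices, since the support of {a, b} drops from 2/4 to 2/5. A real extension would also have to keep the non-sensitive frequent itemsets frequent, which is the role the abstract assigns to the linear program.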
Acknowledgements
We would like to thank the Department of Informatics at the University of Piraeus for making its infrastructure available for the extensive experimental tests.
Cite this article
Verykios, V.S., Stavropoulos, E.C., Krasadakis, P. et al. Frequent itemset hiding revisited: pushing hiding constraints into mining. Appl Intell 52, 2539–2555 (2022). https://doi.org/10.1007/s10489-021-02490-4
DOI: https://doi.org/10.1007/s10489-021-02490-4