Abstract
Data mining technology helps extract usable knowledge from large data sets. The process of collecting and disseminating data, however, carries an inherent risk to privacy. Some sensitive or private information about individuals, businesses and organizations needs to be suppressed before the data are shared or published. Privacy-preserving data mining (PPDM) has thus become an important issue in recent years. In this paper, we propose an algorithm called SIF-IDF that modifies original databases in order to hide sensitive itemsets. It is a greedy approach based on a concept borrowed from Term Frequency and Inverse Document Frequency (TF-IDF) in text mining. This concept is used to evaluate the degree of similarity between the items in transactions and the sensitive itemsets to be hidden; appropriate items in some transactions are then selected and hidden. The proposed algorithm can easily achieve a good trade-off between privacy preservation and execution time. Experimental results are also reported to show the performance of the proposed approach.
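The greedy idea described in the abstract can be illustrated with a minimal sketch. It is not the SIF-IDF algorithm itself, whose exact scoring formula is not given here. The function names `support` and `hide_sensitive_itemsets`, the parameter `min_count` (an absolute support threshold, assumed to be at least 1), and the particular TF-like and IDF-like weights are all assumptions made for illustration.

```python
import math
from collections import Counter


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def hide_sensitive_itemsets(transactions, sensitive, min_count):
    """Greedy sanitization sketch (hypothetical, not the paper's formula).

    Deletes one item at a time until every sensitive itemset occurs in
    fewer than `min_count` transactions. Assumes min_count >= 1.
    """
    db = [set(t) for t in transactions]  # work on a copy of the database
    sensitive = [frozenset(s) for s in sensitive]

    while True:
        # Sensitive itemsets that are still frequent and must be hidden.
        exposed = [s for s in sensitive if support(s, db) >= min_count]
        if not exposed:
            return db

        # TF-like part: how often an item occurs across exposed itemsets.
        tf = Counter(item for s in exposed for item in s)
        # IDF-like part: items that are rare in the database weigh more.
        df = Counter(item for t in db for item in t)

        def weight(item):
            return tf[item] * math.log(1 + len(db) / df[item])

        # Only transactions that fully contain an exposed itemset qualify.
        candidates = [t for t in db if any(s <= t for s in exposed)]
        victim = max(
            candidates,
            key=lambda t: sum(weight(i) for i in t if i in tf),
        )

        # Delete the heaviest item among the exposed itemsets it supports.
        removable = {i for s in exposed if s <= victim for i in s}
        victim.discard(max(sorted(removable), key=weight))
```

Each pass removes one item from a transaction that supports a still-frequent sensitive itemset, so the total support of the sensitive itemsets strictly decreases and the loop terminates. Recomputing the weights on every pass keeps the sketch short at the cost of speed; a practical implementation would update them incrementally.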
Cite this article
Hong, TP., Lin, CW., Yang, KT. et al. Using TF-IDF to hide sensitive itemsets. Appl Intell 38, 502–510 (2013). https://doi.org/10.1007/s10489-012-0377-5
DOI: https://doi.org/10.1007/s10489-012-0377-5