Using TF-IDF to hide sensitive itemsets

Hong, Tzung-Pei; Lin, Chun-Wei; Yang, Kuo-Tung; Wang, Shyue-Liang

doi:10.1007/s10489-012-0377-5

Published: 26 August 2012

Applied Intelligence, volume 38, pages 502–510 (2013)

Abstract

Data mining technology helps extract usable knowledge from large data sets. However, the processes of data collection and data dissemination carry an inherent risk of privacy threats. Some sensitive or private information about individuals, businesses and organizations needs to be suppressed before it is shared or published. Privacy-preserving data mining (PPDM) has thus become an important issue in recent years. In this paper, we propose an algorithm called SIF-IDF that modifies original databases in order to hide sensitive itemsets. It is a greedy approach based on a concept borrowed from term frequency-inverse document frequency (TF-IDF) in text mining. This concept is used to evaluate the degree of similarity between the items in transactions and the sensitive itemsets to be hidden, and the algorithm then selects appropriate items in some transactions to hide. The proposed algorithm can easily make good trade-offs between privacy preservation and execution time. Experimental results are also presented to show the performance of the proposed approach.
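The abstract describes SIF-IDF only at a high level. The following is a minimal Python sketch of the general idea of a greedy, TF-IDF-style sanitizer, and it is not the paper's SIF-IDF algorithm. Transactions play the role of documents and items play the role of terms. The scoring formula, the one-item deletion rule, and the names `hide_sensitive`, `support`, `idf` and `min_count` are hypothetical choices made for illustration. On each pass, the transaction most "similar" to the still-frequent sensitive itemsets is modified first.

```python
import math
from typing import FrozenSet, List, Set


def support(db: List[Set[str]], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def idf(db: List[Set[str]], item: str) -> float:
    """IDF of an item, treating each transaction as a 'document'."""
    df = sum(1 for t in db if item in t)
    return math.log(len(db) / df) if df else 0.0


def hide_sensitive(db: List[Set[str]], sensitive: List[FrozenSet[str]],
                   min_count: int) -> List[Set[str]]:
    """Greedily delete items until each sensitive itemset has support < min_count.

    Hypothetical TF-IDF-style score for a transaction t (not the paper's formula):
        score(t) = sum over still-frequent sensitive itemsets s with s <= t
                   of sum over items i in s of (1 / |t|) * idf(i)
    Short transactions that support many sensitive itemsets score highest.
    """
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    db = [set(t) for t in db]  # sanitize a copy, not the caller's data
    while True:
        frequent = [s for s in sensitive if support(db, s) >= min_count]
        if not frequent:
            return db
        weights = {i: idf(db, i) for s in frequent for i in s}
        best_idx, best_score = -1, -1.0
        for idx, t in enumerate(db):
            covered = [s for s in frequent if s <= t]
            if not covered:
                continue
            # TF analogue: each item in t has term frequency 1/|t|.
            score = sum(weights[i] / len(t) for s in covered for i in s)
            if score > best_score:  # strict '>' keeps the earliest on ties
                best_idx, best_score = idx, score
        victim_tx = db[best_idx]
        covered = [s for s in frequent if s <= victim_tx]
        # Remove the item shared by the most covered sensitive itemsets;
        # ties are broken alphabetically so the result is deterministic.
        candidates = sorted(set().union(*covered))
        victim = max(candidates, key=lambda i: sum(i in s for s in covered))
        victim_tx.discard(victim)
```

For example, take `db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c", "d"}, {"b", "c"}, {"a", "b", "d"}]` with sensitive itemset {a, b} and `min_count=2`. The sketch first deletes `a` from the short transaction {a, b}, then from {a, b, c}, leaving {a, b} supported only by {a, b, d}. Recomputing every support and IDF value on each pass keeps the code short but is slow on large databases. A practical implementation would likely maintain these counts incrementally.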

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong & Kuo-Tung Yang
Department of Information Management, National University of Kaohsiung, Kaohsiung, Taiwan
Shyue-Liang Wang
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
Tzung-Pei Hong
Innovative Information Industry Research Center (IIIRC), School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus of Shenzhen University Town, Xili, Shenzhen, 518055, P.R. China
Chun-Wei Lin

Corresponding author

Correspondence to Chun-Wei Lin.

About this article

Cite this article

Hong, TP., Lin, CW., Yang, KT. et al. Using TF-IDF to hide sensitive itemsets. Appl Intell 38, 502–510 (2013). https://doi.org/10.1007/s10489-012-0377-5

Issue Date: June 2013
DOI: https://doi.org/10.1007/s10489-012-0377-5

Similar content being viewed by others

Privacy Preservation of Periodic Frequent Patterns Using Sensitive Inverse Frequency

A Greedy Approach to Hide Sensitive Frequent Itemsets with Reduced Side Effects

A Frequent Itemset Hiding Toolbox