Using TF-IDF to hide sensitive itemsets

Applied Intelligence

Abstract

Data mining extracts usable knowledge from large data sets. Collecting and disseminating data, however, carries an inherent risk to privacy: sensitive or private information about individuals, businesses and organizations needs to be suppressed before the data are shared or published. Privacy-preserving data mining (PPDM) has thus become an important issue in recent years. In this paper, we propose an algorithm called SIF-IDF that modifies an original database in order to hide sensitive itemsets. It is a greedy approach based on a concept borrowed from Term Frequency and Inverse Document Frequency (TF-IDF) in text mining. This concept is used to evaluate the degree of similarity between the items in each transaction and the sensitive itemsets to be hidden; the algorithm then selects appropriate items in some transactions to hide. The proposed algorithm can easily make good trade-offs between privacy preservation and execution time. Experimental results also show the performance of the proposed approach.
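
The abstract does not give the SIF-IDF scoring formula, so the following Python sketch only illustrates the general idea of greedy, TF-IDF-style itemset hiding under stated assumptions. The function names `support` and `hide_itemsets`, the `min_count` threshold, and the scoring rule itself are hypothetical and are not taken from the paper. The assumed score favours short transactions whose sensitive items are rare in the database.

```python
import math
from collections import Counter


def support(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def hide_itemsets(transactions, sensitive, min_count):
    """Greedily delete items until every sensitive itemset has support
    below `min_count`. Illustrative only: the scoring rule is an assumed
    TF-IDF analogue, not the formula from the paper."""
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    db = [set(t) for t in transactions]            # work on a copy
    sensitive = [frozenset(s) for s in sensitive if s]
    n = len(db)
    # Weight of an item: number of sensitive itemsets it appears in.
    weight = Counter(i for s in sensitive for i in s)

    while True:
        # Sensitive itemsets that are still frequent.
        exposed = [s for s in sensitive if support(db, s) >= min_count]
        if not exposed:
            return db
        # Per-item transaction count (the "document frequency").
        df = Counter(i for t in db for i in t)

        def score(t):
            # Assumed score: (sensitive weight / transaction length)
            # times log(n / df), summed over the sensitive items in t.
            return sum(weight[i] / len(t) * math.log(n / df[i])
                       for i in t if i in weight)

        # Only transactions supporting an exposed itemset can help.
        candidates = [t for t in db if any(s <= t for s in exposed)]
        target = max(candidates, key=score)
        # Delete the item shared by the most exposed itemsets in `target`;
        # sorting makes ties deterministic (items must be comparable).
        hits = Counter(i for s in exposed if s <= target for i in s)
        victim = max(sorted(hits), key=hits.__getitem__)
        target.remove(victim)
```

Calling `hide_itemsets` with a list of transactions, a list of sensitive itemsets, and a minimum support count returns a sanitized copy of the database. Each iteration lowers the support of at least one exposed itemset by one, so the loop terminates. A real implementation would also measure side effects such as non-sensitive itemsets that are hidden by accident.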

Corresponding author

Correspondence to Chun-Wei Lin.

About this article

Cite this article

Hong, TP., Lin, CW., Yang, KT. et al. Using TF-IDF to hide sensitive itemsets. Appl Intell 38, 502–510 (2013). https://doi.org/10.1007/s10489-012-0377-5

  • DOI: https://doi.org/10.1007/s10489-012-0377-5
