On anonymizing transactions with sensitive items

Applied Intelligence

Abstract

K-anonymity (Samarati and Sweeney 1998; Samarati, IEEE Trans Knowl Data Eng, 13(6):1010–1027, 2001; Sweeney, Int J Uncertain, Fuzziness Knowl-Based Syst, 10(5):557–570, 2002) and its variants, including l-diversity (Machanavajjhala et al., ACM TKDD, 2007) and t-closeness (Li et al. 2007), are anonymization techniques for relational and transaction data that protect privacy against re-identification attacks. A relational dataset D is k-anonymous if every record in D has at least k-1 other records with identical quasi-identifier attribute values. Consequently, combining the released data with external data never allows the recipient to associate a released record with fewer than k individuals (Samarati, IEEE Trans Knowl Data Eng, 13(6):1010–1027, 2001). However, the current concept of k-anonymity on transaction data treats all items as quasi-identifiers. The anonymized dataset therefore consists of groups of at least k identical transactions and suffers from lower data utility (He and Naughton 2009; He et al. 2011; Liu and Wang 2010; Terrovitis et al., VLDB J, 20(1):83–106, 2011; Terrovitis et al. 2008). To improve the utility of anonymized transaction data, this work proposes a novel anonymity concept for transaction data that contain both quasi-identifier items (QID) and sensitive items (SI): a transaction that contains sensitive items must have at least k-1 other identical transactions (Ghinita et al., IEEE TKDE, 23(2):161–174, 2011; Xu et al. 2008), whereas a transaction that contains no sensitive item requires no anonymization. A transaction dataset that satisfies this property is said to be sensitive k-anonymous. Three algorithms are developed: Sensitive Transaction Neighbors (STN), Gray Sort Clustering (GSC), and Nearest Neighbors for K-anonymization (K-NN). They achieve sensitive k-anonymity on transaction data by adding or deleting QID items and by only adding (never deleting) SI.
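To make the difference between conventional and sensitive k-anonymity on transaction data concrete, here is a minimal Python sketch. It assumes each transaction is modeled as a frozenset of item names and that the sensitive items are given as a set; the function names `is_k_anonymous` and `is_sensitive_k_anonymous` and the toy items are hypothetical illustrations, not code from the paper.

```python
from collections import Counter
from typing import FrozenSet, Sequence, Set

Transaction = FrozenSet[str]


def is_k_anonymous(transactions: Sequence[Transaction], k: int) -> bool:
    """Conventional k-anonymity on transaction data: every item is treated as a
    quasi-identifier, so every transaction needs at least k - 1 identical copies."""
    counts = Counter(transactions)
    return all(counts[t] >= k for t in transactions)


def is_sensitive_k_anonymous(
    transactions: Sequence[Transaction], sensitive: Set[str], k: int
) -> bool:
    """Sensitive k-anonymity (hypothetical checker): only transactions that contain
    at least one sensitive item need at least k - 1 other identical transactions."""
    counts = Counter(transactions)
    # t & sensitive is empty (falsy) for transactions without sensitive items,
    # so those transactions are exempt from the check.
    return all(counts[t] >= k for t in transactions if t & sensitive)


# Hypothetical toy data: "s1" is a sensitive item, the others are QID items.
SI = {"s1"}
data = [
    frozenset({"a", "b"}),
    frozenset({"c"}),
    frozenset({"a", "s1"}),
    frozenset({"a", "s1"}),
]
print(is_k_anonymous(data, 2))                # False
print(is_sensitive_k_anonymous(data, SI, 2))  # True
```

In the toy data the two non-sensitive transactions are unique, so conventional 2-anonymity fails, while sensitive 2-anonymity holds because only the two identical transactions containing "s1" are constrained.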
Additionally, a simple “privacy value” is proposed to evaluate the degree of privacy provided by different types of k-anonymity on transaction data. Extensive numerical simulations were carried out to demonstrate the characteristics of the proposed algorithms and to compare them with other k-anonymity approaches. The results show that each technique has its own advantages under different criteria, such as running time, number of operations, and information loss. These results can serve as a guideline for selecting an anonymization technique for different datasets and applications.
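The abstract does not specify how operations or information loss are measured. As a rough illustration only, the following Python sketch counts the single-item edits needed to make one group of transactions identical under the rule stated in the abstract (QID items may be added or deleted, SI may only be added). The helper `merge_group` and its majority-vote target are assumptions for illustration; this is not the STN, GSC, or K-NN algorithm.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple

Transaction = FrozenSet[str]


def merge_group(group: List[Transaction], sensitive: Set[str]) -> Tuple[Transaction, int]:
    """Hypothetical helper (not from the paper): rewrite every transaction in
    `group` into one common target and count single-item edits.

    Assumed rules, following the abstract: QID items may be added or deleted,
    sensitive items (SI) may only be added.
    """
    # SI cannot be deleted, so the target must contain every SI seen in the group.
    si_target = frozenset().union(*(t & sensitive for t in group))

    # For a QID item present in c of n transactions, keeping it costs n - c
    # additions and dropping it costs c deletions, so a strict majority vote
    # minimizes the total number of QID edits for this group.
    n = len(group)
    qid_counts = Counter(i for t in group for i in t if i not in sensitive)
    qid_target = frozenset(i for i, c in qid_counts.items() if 2 * c > n)

    target = qid_target | si_target
    # Each element of the symmetric difference is one addition or deletion;
    # SI only ever appear as additions because si_target covers all of them.
    ops = sum(len(t ^ target) for t in group)
    return target, ops


# Toy example with hypothetical items: "s1" is sensitive, the rest are QID.
group = [frozenset({"a", "b", "s1"}), frozenset({"a", "c"}), frozenset({"a", "b"})]
target, ops = merge_group(group, {"s1"})
print(sorted(target), ops)  # ['a', 'b', 's1'] 4
```

After the merge, all three transactions equal {a, b, s1}, so the group is sensitive 3-anonymous at a cost of four edits (delete c, add b, add s1 to the second transaction; add s1 to the third). How the groups are formed in the first place is not modeled here.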

References

  1. Aggarwal G, Feder T, Kenthapadi K, Khuller S, Panigrahy R, Thomas D, Zhu A (2006) Achieving anonymity via clustering. In: Proc. of the 25th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 153–162

  2. Barbaro M, Zeller T Jr (2006) A face is exposed for AOL searcher no. 4417749. New York Times

  3. Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv 42(4)

  4. Ghinita G, Tao Y, Kalnis P (2008) On the anonymization of sparse high-dimensional data. In: Proc. of ICDE, pp 715–724

  5. Ghinita G, Kalnis P, Tao Y (2011) Anonymous publication of sensitive transactional data. IEEE Trans Knowl Data Eng 23(2):161–174

  6. He Y, Naughton JF (2009) Anonymization of set-valued data via top-down, local generalization. In: Proc. of PVLDB, pp 934–945

  7. He Y, Barman S, Naughton JF (2011) Preventing equivalence attacks in updated, anonymized data. In: Proc. of ICDE

  8. Hong TP, Lin CW, Yang KT, Wang SL (2013) Using TF-IDF to hide sensitive itemsets. Appl Intell, pp 502–510

  9. IBM Quest Market-Basket Synthetic Data Generator, http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData

  10. Islam MZ, Brankovic L (2011) Privacy preserving data mining: a noise addition framework using a novel clustering technique. Knowl-Based Syst, pp 1214–1223

  11. LeFevre K, DeWitt D, Ramakrishnan R (2006) Mondrian multidimensional k-anonymity. In: Proc. of ICDE, p 25

  12. Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: Proc. of ICDE, pp 106–115

  13. Liu JQ, Wang K (2010) Anonymizing transaction data by integrating suppression and generalization. In: Proc. of PAKDD, pp 171–180

  14. Liu L, Zhu H, Huang Z (2011) Analysis of the minimal privacy disclosure for web services collaborations with role mechanisms. Expert Syst Appl 38(4):4540–4549

  15. Loukides G, Shao J (2011) Preventing range disclosure in k-anonymised data. Expert Syst Appl 38(4):4559–4574

  16. Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) l-diversity: privacy beyond k-anonymity. ACM TKDD, article 3

  17. Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: Proc. of PODS, pp 223–228

  18. Mortazavi R, Jalili S, Gohargazi H (2013) Multivariate microaggregation by iterative optimization. Appl Intell, pp 529–544

  19. Motwani R, Nabar SU (2008) Anonymizing unstructured data. arXiv:0810.5582v2 [cs.DB]

  20. Ni W, Chong Z (2012) Clustering-oriented privacy-preserving data publishing. Knowl-Based Syst, pp 264–270

  21. Park H, Shim K (2007) Approximate algorithms for k-anonymity. In: Proc. of ACM SIGMOD, pp 67–78

  22. Samarati P, Sweeney L (1998) Generalizing data to provide anonymity when disclosing information. In: Proc. of ACM symposium on principles of database systems, p 188

  23. Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027

  24. Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain, Fuzziness Knowl-Based Syst 10(5):557–570

  25. Sweeney L (2002) Achieving k-anonymity privacy protection using generalization and suppression. Int J Uncertain, Fuzziness Knowl-Based Syst 10(5):571–588

  26. Terrovitis M, Mamoulis N, Kalnis P (2011) Local and global recoding methods for anonymizing set-valued data. VLDB J 20(1):83–106

  27. Terrovitis M, Mamoulis N, Kalnis P (2008) Privacy-preserving anonymization of set-valued data. In: Proc. of PVLDB, pp 115–125

  28. Wang SL, Tsai YC, Kao HY, Hong TP (2010) Anonymizing set-valued social data. In: Proc. of the IEEE International Symposium on Social Computing and Networking (SocialNet)

  29. Wang SL, Tsai YC, Kao HY, Hong TP (2011) Extending suppression for anonymization on set-valued data. Int J Innov Comput, Inf Control 7(12):6849–6863

  30. Wang SL, Tsai YC, Kao HY, Hong TP (2011) K-anonymity on sensitive transaction items. In: Proc. of the IEEE International Conference on Granular Computing (GrC), pp 723–727

  31. Xu T, Wang K, Fu AWC, Yu PS (2008) Anonymizing transaction databases for publication. In: Proc. of SIGKDD, pp 767–775

  32. Xu Y, Fung BCM, Wang K, Fu AWC, Pei J (2008) Publishing sensitive transactions for itemset utility. In: Proc. of ICDM, pp 1109–1114

  33. Xue M, Karras P, Raissi C, Vaidya J, Tan K (2012) Anonymizing set-valued data by nonreciprocal recoding. In: Proc. of SIGKDD, pp 1050–1058

  34. Yang W, Qiao S (2010) A novel anonymization algorithm: privacy protection and knowledge preservation. Expert Syst Appl 37(1):756–766

Acknowledgments

This work was supported in part by the National Science Council, Taiwan, under grants NSC-100-2221-E-390-030, NSC-101-2221-E-390-028-MY3.

Author information

Correspondence to Yu-Chuan Tsai.

About this article

Cite this article

Wang, SL., Tsai, YC., Kao, HY. et al. On anonymizing transactions with sensitive items. Appl Intell 41, 1043–1058 (2014). https://doi.org/10.1007/s10489-014-0554-9

  • DOI: https://doi.org/10.1007/s10489-014-0554-9
