On anonymizing transactions with sensitive items

Wang, Shyue-Liang; Tsai, Yu-Chuan; Kao, Hung-Yu; Hong, Tzung-Pei

doi:10.1007/s10489-014-0554-9

Published: 03 September 2014

Applied Intelligence, Volume 41, pages 1043–1058 (2014)

Abstract

K-anonymity (Samarati and Sweeney 1998; Samarati, IEEE Trans Knowl Data Eng, 13(6):1010–1027, 2001; Sweeney, Int J Uncertain, Fuzziness Knowl-Based Syst, 10(5):557–570, 2002) and its variants, such as l-diversity (Machanavajjhala et al., ACM TKDD, 2007) and t-closeness (Li et al. 2007), are anonymization techniques for relational and transaction data that protect privacy against re-identification attacks. A relational dataset D is k-anonymous if every record in D has at least k-1 other records with identical quasi-identifier attribute values. Combining the released data with external data therefore never allows a recipient to associate a released record with fewer than k individuals (Samarati, IEEE Trans Knowl Data Eng, 13(6):1010–1027, 2001). However, the current concept of k-anonymity on transaction data treats all items as quasi-identifiers. The anonymized dataset consequently consists of groups of at least k identical transactions and suffers from lower data utility (He and Naughton 2009; He et al. 2011; Liu and Wang 2010; Terrovitis et al., VLDB J, 20(1):83–106, 2011; Terrovitis et al. 2008). To improve the utility of anonymized transaction data, this work proposes a novel anonymity concept for transaction data that contain both quasi-identifier items (QID) and sensitive items (SI): a transaction that contains sensitive items must have at least k-1 other identical transactions (Ghinita et al., IEEE TKDE, 23(2):161–174, 2011; Xu et al. 2008), while a transaction that contains no sensitive item requires no anonymization. A transaction dataset that satisfies this property is said to be sensitive k-anonymous. Three algorithms are developed: Sensitive Transaction Neighbors (STN), Gray Sort Clustering (GSC), and Nearest Neighbors for K-anonymization (K-NN). These algorithms achieve sensitive k-anonymity on transaction data by adding or deleting QID items and by only adding SI.
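The notions of k-anonymity described above differ only in which records or transactions must be duplicated. The following Python sketch checks each property directly from its definition; transactions are modelled as `frozenset`s of item labels and the sensitive items as a separate set. The function names, the data representation, and the item labels in the example are illustrative assumptions, not code from the paper.

```python
from collections import Counter
from typing import Dict, FrozenSet, Hashable, List, Sequence

# A transaction is modelled as an immutable set of item labels (assumption).
Transaction = FrozenSet[str]


def is_relational_k_anonymous(rows: List[Dict[str, Hashable]],
                              qid: Sequence[str], k: int) -> bool:
    """Every record shares its quasi-identifier values with at least k-1 others."""
    groups = Counter(tuple(row[a] for a in qid) for row in rows)
    return all(size >= k for size in groups.values())


def is_transaction_k_anonymous(data: List[Transaction], k: int) -> bool:
    """Conventional transaction k-anonymity: all items act as quasi-identifiers,
    so every transaction must occur at least k times in the dataset."""
    counts = Counter(data)
    return all(counts[t] >= k for t in data)


def is_sensitive_k_anonymous(data: List[Transaction],
                             sensitive: FrozenSet[str], k: int) -> bool:
    """Sensitive k-anonymity: only transactions containing at least one
    sensitive item need k-1 other identical transactions."""
    counts = Counter(data)
    return all(counts[t] >= k for t in data if t & sensitive)
```

For example, with a hypothetical sensitive item `s`, the dataset `[{a, s}, {a, s}, {b}]` is sensitive 2-anonymous but not 2-anonymous in the conventional sense, because the non-sensitive transaction `{b}` occurs only once.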
Additionally, a simple “privacy value” is proposed to evaluate the degree of privacy provided by different types of k-anonymity on transaction data. Extensive numerical simulations were carried out to demonstrate the characteristics of the proposed algorithms and to compare them with other types of k-anonymity approaches. The results show that each technique has its own advantages under different criteria, such as running time, operations, and information loss. These results can serve as a guideline for selecting an anonymization technique for different datasets and applications.
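The abstract states that the algorithms may add or delete QID items but may only add SI, and it lists operations among the comparison criteria. The sketch below illustrates that constraint under our own assumptions: every item addition or deletion counts as one operation, and a group of transactions is made identical by rewriting each member to one shared target. The helpers `edit_ops`, `group_target`, and `group_cost` are hypothetical; they are not the STN, GSC, or K-NN algorithms, whose grouping strategies the abstract does not describe.

```python
from collections import Counter
from typing import FrozenSet, List, Optional

Transaction = FrozenSet[str]


def edit_ops(t: Transaction, target: Transaction,
             sensitive: FrozenSet[str]) -> Optional[int]:
    """Unit-cost item operations needed to rewrite t as target.

    QID items may be added or deleted; sensitive items may only be added.
    Returns None when target would require deleting a sensitive item of t.
    """
    t_si, g_si = t & sensitive, target & sensitive
    if not t_si <= g_si:
        return None
    t_qid, g_qid = t - sensitive, target - sensitive
    # Symmetric difference = QID items to add plus QID items to delete.
    return len(t_qid ^ g_qid) + len(g_si - t_si)


def group_target(group: List[Transaction],
                 sensitive: FrozenSet[str]) -> Transaction:
    """Common transaction minimising the total edit_ops cost of a group.

    Sensitive items: the union over the group (none can be removed, and any
    extra one would only add cost). QID items: keep an item only if more than
    half of the group contains it, because keeping it costs one addition per
    transaction lacking it and dropping it costs one deletion per holder.
    """
    n = len(group)
    si = frozenset().union(*(t & sensitive for t in group))
    qid_counts = Counter(item for t in group for item in t - sensitive)
    qid = frozenset(item for item, c in qid_counts.items() if 2 * c > n)
    return qid | si


def group_cost(group: List[Transaction], sensitive: FrozenSet[str]) -> int:
    """Total operations needed to make every transaction in the group identical."""
    target = group_target(group, sensitive)
    return sum(edit_ops(t, target, sensitive) for t in group)
```

For the hypothetical group `{a, b, s}`, `{a, c}`, `{a, b}` with sensitive item `s`, `group_target` returns `{a, b, s}` and `group_cost` returns 4. If every group formed this way has at least k members, the rewritten transactions in it are sensitive k-anonymous under the definition above.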

References

Aggarwal G, Feder T, Kenthapadi K, Khuller S, Panigrahy R, Thomas D, Zhu A (2006) Achieving anonymity via clustering. In: Proc. of the 25th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 153–162
Barbaro M, Zeller T Jr (2006) A face is exposed for AOL searcher no. 4417749. New York Times
Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey on recent developments. ACM Comput Surv 42(4)
Ghinita G, Tao Y, Kalnis P (2008) On the anonymization of sparse high-dimensional data. In: Proc. of ICDE, pp 715–724
Ghinita G, Kalnis P, Tao Y (2011) Anonymous publication of sensitive transactional data. IEEE TKDE 23(2):161–174
He Y, Naughton JF (2009) Anonymization of set-valued data via top-down, local generalization. In: Proc. of PVLDB, pp 934–945
He Y, Barman S, Naughton JF (2011) Preventing equivalence attacks in updated, anonymized data. In: Proc. of ICDE
Hong TP, Lin CW, Yang KT, Wang SL (2013) Using TF-IDF to hide sensitive itemsets. Appl Intell, pp 502–510
IBM Quest Market-Basket Synthetic Data Generator, http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData
Islam MZ, Brankovic L (2011) Privacy preserving data mining: a noise addition framework using a novel clustering technique. Knowl-Based Syst, pp 1214–1223
LeFevre K, DeWitt D, Ramakrishnan R (2006) Mondrian multidimensional k-anonymity. In: Proc. of SIGMOD, p 25
Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: Proc. of ICDE, pp 106–115
Liu JQ, Wang K (2010) Anonymizing transaction data by integrating suppression and generalization. In: Proc. of PAKDD, pp 171–180
Liu L, Zhu H, Huang Z (2011) Analysis of the minimal privacy disclosure for web services collaborations with role mechanisms. Expert Syst Appl 38(4):4540–4549
Loukides G, Shao J (2011) Preventing range disclosure in k-anonymised data. Expert Syst Appl 38(4):4559–4574
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) l-diversity: privacy beyond k-anonymity. ACM TKDD, article 3
Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: Proc. of PODS, pp 223–228
Mortazavi R, Jalili S, Gohargazi H (2013) Multivariate microaggregation by iterative optimization. Appl Intell, pp 529–544
Motwani R, Nabar SU (2008) Anonymizing unstructured data, arXiv: 0810.5582v2, [cs.DB]
Ni W, Chong Z (2012) Clustering-oriented privacy-preserving data publishing. Knowl-Based Syst, pp 264–270
Park H, Shim K (2007) Approximate algorithms for k-anonymity. In: Proc. of ACM SIGMOD, pp 67–78
Samarati P, Sweeney L (1998) Generalizing data to provide anonymity when disclosing information. In: Proc. of ACM symposium on principles of database systems, p 188
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027
Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain, Fuzziness Knowl-Based Syst 10(5):557–570
Sweeney L (2002) Achieving k-anonymity privacy protection using generalization and suppression. Int J Uncertain, Fuzziness Knowl-Based Syst 10(5):571–588
Terrovitis M, Mamoulis N, Kalnis P (2011) Local and global recoding methods for anonymizing set-valued data. VLDB J 20(1):83–106
Terrovitis M, Mamoulis N, Kalnis P (2008) Privacy-preserving anonymization of set-valued data. In: Proc. of PVLDB, pp 115–125
Wang SL, Tsai YC, Kao HY, Hong TP (2010) Anonymizing set-valued social data. In: Proc. of the IEEE International Symposium on Social Computing and Networking (SocialNet)
Wang SL, Tsai YC, Kao HY, Hong TP (2011) Extending suppression for anonymization on set-valued data. Int J Innov Comput, Inf Control 7(12):6849–6863
Wang SL, Tsai YC, Kao HY, Hong TP (2011) K-anonymity on sensitive transaction items. In: Proc. of the IEEE International Conference on GrC, pp 723–727
Xu T, Wang K, Fu AWC, Yu PS (2008) Anonymizing transaction databases for publication. In: Proc. of SIGKDD, pp 767–775
Xu Y, Fung BCM, Wang K, Fu AWC, Pei J (2008) Publishing sensitive transactions for itemset utility. In: Proc. of ICDM, pp 1109–1114
Xue M, Karras P, Raissi C, Vaidya J, Tan K (2012) Anonymizing set-valued data by nonreciprocal recoding. In: Proc. of SIGKDD, pp 1050–1058
Yang W, Qiao S (2010) A novel anonymization algorithm: privacy protection and knowledge preservation. Expert Syst Appl 37(1):756–766

Acknowledgments

This work was supported in part by the National Science Council, Taiwan, under grants NSC-100-2221-E-390-030, NSC-101-2221-E-390-028-MY3.

Author information

Authors and Affiliations

Department of Information Management, National University of Kaohsiung, Kaohsiung, 81148, Taiwan
Shyue-Liang Wang
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 70101, Taiwan
Yu-Chuan Tsai & Hung-Yu Kao
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, 81148, Taiwan
Tzung-Pei Hong

Corresponding author

Correspondence to Yu-Chuan Tsai.

About this article

Cite this article

Wang, SL., Tsai, YC., Kao, HY. et al. On anonymizing transactions with sensitive items. Appl Intell 41, 1043–1058 (2014). https://doi.org/10.1007/s10489-014-0554-9

Published: 03 September 2014
Issue Date: December 2014
DOI: https://doi.org/10.1007/s10489-014-0554-9
