An Anonymization Method to Improve Data Utility for Classification

Han, Jianmin; Yu, Juan; Lu, Jianfeng; Peng, Hao; Wu, Jiandang

doi:10.1007/978-3-319-69471-9_5

Conference paper
First Online: 21 October 2017

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10581)

Abstract

k-anonymity is a popular method for preserving privacy in microdata, but it sacrifices data utility to protect individuals' privacy. How to preserve privacy while retaining high data utility has therefore become a hot topic in k-anonymity research. Existing anonymization methods seldom consider data utility for a specific data mining task. To address this problem, we define a novel attribute weight measurement for determining the generalization order, and further propose a new anonymization algorithm based on this weight measurement using global generalization, called the Weighted Full-Domain Anonymization (WFDA) algorithm. The main idea of the algorithm is to generalize attributes with large weights to low levels and attributes with small weights to high levels. The proposed algorithm preserves data utility for classification to a large extent. Experiments show that anonymized data produced by the proposed method retains higher utility, i.e., yields better classification accuracy, than data generated by other anonymization methods.
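
The main idea stated in the abstract, generalizing low-weight attributes further than high-weight ones under full-domain (global) generalization, can be illustrated with a minimal sketch. This is not the authors' WFDA algorithm: the abstract does not define the weight measurement, so the weights here are simply supplied by the caller, and the hierarchies (`generalize_age`, `generalize_zip`), the attribute names, and the greedy loop are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical generalization hierarchies (assumptions, not from the paper).
# Level 0 is the raw value; higher levels are coarser.
def generalize_age(age, level):
    if level == 0:
        return str(age)
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"
    return "*"

def generalize_zip(zipcode, level):
    # Mask the last `level` characters of the ZIP code.
    level = min(level, len(zipcode))
    return zipcode[: len(zipcode) - level] + "*" * level

# attribute -> (generalization function, maximum level)
HIERARCHIES = {"age": (generalize_age, 2), "zip": (generalize_zip, 5)}

def apply_levels(records, levels):
    """Full-domain generalization: every value of an attribute uses the same level."""
    attrs = sorted(levels)
    return [tuple(HIERARCHIES[a][0](r[a], levels[a]) for a in attrs) for r in records]

def is_k_anonymous(rows, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())

def weighted_full_domain(records, weights, k):
    """Greedy sketch: raise the level of the lowest-weight attribute first."""
    levels = {a: 0 for a in weights}
    while not is_k_anonymous(apply_levels(records, levels), k):
        candidates = [a for a in levels if levels[a] < HIERARCHIES[a][1]]
        if not candidates:
            raise ValueError("k-anonymity is not reachable with these hierarchies")
        least_important = min(candidates, key=lambda a: weights[a])
        levels[least_important] += 1
    return levels

records = [
    {"age": 23, "zip": "32100"},
    {"age": 27, "zip": "32104"},
    {"age": 35, "zip": "32111"},
    {"age": 38, "zip": "32119"},
]
# Assumed weights: "zip" matters more for the classifier than "age".
levels = weighted_full_domain(records, {"age": 0.2, "zip": 0.8}, k=2)
print(levels)
```

In this toy run the low-weight attribute `age` is generalized all the way up before the high-weight attribute `zip` loses its first digit. The loop exhausts one attribute before touching the next, which a practical method would likely refine, for example by searching the lattice of level combinations.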

Acknowledgment

We thank the anonymous reviewers for their constructive comments, which led to a substantial improvement of this paper. This work is supported by the National Natural Science Foundation of China (Grant No. 61402418, 61503342, 61672468, 61602418), the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 61702148), MOE (Ministry of Education in China) Project of Humanity and Social Science (Grant No. 12YJCZH142, 15YJCZH125), Social development project of Zhejiang provincial public technology research (Grant No. 2016C33168), Zhejiang Provincial Natural Science Foundation of China (Grant No. LY15F020013, LQ13F020007, LY16F030002, LQ16F020002), Key Lab of Information Network Security, Ministry of Public Security (Grant No. C15610), and Opening Fund of Shanghai Information Security Key Laboratory of Integrated Management of Technology (Grant No. AGK2013003).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, China
Jianmin Han, Jianfeng Lu, Hao Peng & Jiandang Wu
Smart City Research Center, Hangzhou Dianzi University, Hangzhou, 310018, China
Juan Yu

Corresponding author

Correspondence to Jianmin Han.

Editor information

Editors and Affiliations

Deakin University, Geelong, Australia
Sheng Wen
Fujian Normal University, Fuzhou Shi, China
Wei Wu
University of Salerno, Fisciano, Italy
Aniello Castiglione

About this paper

Cite this paper

Han, J., Yu, J., Lu, J., Peng, H., Wu, J. (2017). An Anonymization Method to Improve Data Utility for Classification. In: Wen, S., Wu, W., Castiglione, A. (eds) Cyberspace Safety and Security. CSS 2017. Lecture Notes in Computer Science, vol 10581. Springer, Cham. https://doi.org/10.1007/978-3-319-69471-9_5

DOI: https://doi.org/10.1007/978-3-319-69471-9_5
Published: 21 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69470-2
Online ISBN: 978-3-319-69471-9
eBook Packages: Computer Science, Computer Science (R0)
