Abstract
As a representative model for privacy preserving data publishing, K-anonymity has raised many research questions over the past few decades. Chief among them are how to release data without sacrificing users' privacy and how to maximize the availability of the published data, which together form the ultimate goal of privacy preserving data publishing. To improve the clustering effect and reduce unnecessary computation, this paper proposes a weighted K-member clustering algorithm. A series of weight indicators is designed to evaluate the outlyingness of records, the distance between records, and the information loss of the published data. The proposed algorithm reduces the influence of outliers on the clustering effect and preserves the availability of data as far as possible during the clustering process. Experimental analysis suggests that, compared with some existing methods, the proposed method incurs lower information loss, improves the clustering effect, and is less sensitive to outliers.
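
For readers unfamiliar with K-member clustering, the following minimal sketch illustrates the general greedy idea the abstract builds on: grow each cluster to at least K records by repeatedly adding the record that yields the smallest information loss. It is not the weighted algorithm proposed in the paper; the abstract does not specify the weight indicators, so none are modelled here. The function names, the range-based loss for numeric attributes, and the choice of the first remaining record as seed are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def info_loss(cluster: Sequence[Record], ranges: Sequence[float]) -> float:
    """Assumed range-based loss for numeric attributes:
    |cluster| * sum over attributes of (cluster span / global span)."""
    if not cluster:
        return 0.0
    total = 0.0
    for j, global_span in enumerate(ranges):
        values = [r[j] for r in cluster]
        span = max(values) - min(values)
        total += span / global_span if global_span > 0 else 0.0
    return len(cluster) * total


def k_member_clusters(records: List[Record], k: int) -> List[List[Record]]:
    """Greedy K-member clustering sketch (unweighted, hypothetical helper)."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    dims = len(records[0])
    # Global span of each attribute, used to normalize the loss.
    ranges = [
        max(r[j] for r in records) - min(r[j] for r in records)
        for j in range(dims)
    ]
    remaining = list(records)
    clusters: List[List[Record]] = []
    while len(remaining) >= k:
        # Assumption: seed with the first remaining record.
        cluster = [remaining.pop(0)]
        while len(cluster) < k:
            # Add the record that keeps the cluster's loss smallest.
            best = min(
                range(len(remaining)),
                key=lambda i: info_loss(cluster + [remaining[i]], ranges),
            )
            cluster.append(remaining.pop(best))
        clusters.append(cluster)
    # Fewer than k records are left: attach each to the cluster whose
    # loss increases the least, so every cluster keeps size >= k.
    for r in remaining:
        target = min(
            clusters,
            key=lambda c: info_loss(c + [r], ranges) - info_loss(c, ranges),
        )
        target.append(r)
    return clusters


if __name__ == "__main__":
    # Toy (age, income) records, invented for this example.
    data = [(25, 50000), (60, 90000), (26, 51000), (62, 91000), (27, 52000)]
    for group in k_member_clusters(data, k=2):
        print(group)
```

Because every cluster ends up with at least K records, generalizing each cluster to its attribute ranges would satisfy K-anonymity on these quasi-identifiers. A fixed seed such as the one assumed here can let an outlying record distort a cluster, which is the sensitivity the paper's weight indicators are intended to reduce.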
Acknowledgements
This research is supported by the National Natural Science Foundation of China (Nos. 61762059, 61762060, and 61862040).
About this article
Cite this article
Yan, Y., Herman, E.A., Mahmood, A. et al. A weighted K-member clustering algorithm for K-anonymization. Computing (2021). https://doi.org/10.1007/s00607-021-00922-0
Keywords
- K-anonymity
- Privacy preserving data publishing
- Information loss
- Clustering
- Outliers
Mathematics Subject Classification
- 68P27