A weighted K-member clustering algorithm for K-anonymization

Abstract

As a representative model for privacy preserving data publishing, K-anonymity has raised a considerable number of research questions over the past few decades. Chief among them are how to release data without sacrificing users’ privacy and how to maximize the availability of the published data, which together constitute the ultimate goal of privacy preserving data publishing. To enhance the clustering effect and reduce unnecessary computation, this paper proposes a weighted K-member clustering algorithm. A series of weight indicators is designed to evaluate the outlyingness of records, the distance between records, and the information loss of the published data. The proposed algorithm reduces the influence of outliers on the clustering effect and maintains the availability of data to the best possible extent during the clustering process. Experimental analysis suggests that the proposed method generates lower information loss, improves the clustering effect, and is less sensitive to outliers than some existing methods.
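
For orientation, the following is a minimal sketch of a plain, unweighted greedy k-member clustering over numeric quasi-identifiers, the general family of procedure that the abstract builds on. It is not the weighted algorithm proposed in the paper: the weight indicators for outlyingness, distance, and information loss are not given in this preview, so none are modelled here. The function names, the range-normalized distance, the range-based information loss, and the deterministic choice of the first seed are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def info_loss(cluster: Sequence[Record], ranges: Sequence[float]) -> float:
    """Assumed loss: cluster size times the sum of range-normalized spreads."""
    if not cluster:
        return 0.0
    total = 0.0
    for j, span in enumerate(ranges):
        if span > 0:
            values = [rec[j] for rec in cluster]
            total += (max(values) - min(values)) / span
    return len(cluster) * total


def distance(a: Record, b: Record, ranges: Sequence[float]) -> float:
    """Assumed distance: sum of range-normalized absolute differences."""
    return sum(abs(x - y) / s for x, y, s in zip(a, b, ranges) if s > 0)


def k_member_clustering(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Greedy sketch: every returned cluster holds at least k records."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    dims = len(records[0])
    ranges = [
        max(r[j] for r in records) - min(r[j] for r in records)
        for j in range(dims)
    ]
    remaining = list(records)
    clusters: List[List[Record]] = []
    seed = remaining[0]  # assumption: deterministic start instead of a random one
    while len(remaining) >= k:
        # Start the next cluster from the record furthest from the previous seed.
        seed = max(remaining, key=lambda r: distance(r, seed, ranges))
        remaining.remove(seed)
        cluster = [seed]
        while len(cluster) < k:
            # Add the record that keeps the cluster's information loss lowest.
            best = min(remaining, key=lambda r: info_loss(cluster + [r], ranges))
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    # Fewer than k records are left: attach each to the cluster it disturbs least.
    for rec in remaining:
        target = min(
            clusters,
            key=lambda c: info_loss(c + [rec], ranges) - info_loss(c, ranges),
        )
        target.append(rec)
    return clusters


if __name__ == "__main__":
    # Hypothetical toy records with two numeric quasi-identifiers.
    data = [(25, 50), (27, 52), (26, 51), (60, 90), (62, 95), (61, 92), (40, 70)]
    for group in k_member_clustering(data, k=3):
        print(sorted(group))
```

Because the next seed is always the record furthest from the previous one, an isolated record tends to be picked as a seed and then pulls ordinary records into a wide, high-loss cluster. This is the kind of outlier sensitivity that the abstract says the weighted variant is designed to reduce.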

Notes

  1. http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data

Acknowledgements

This research is supported by the National Natural Science Foundation of China (Nos. 61762059, 61762060, and 61862040).

Author information

Corresponding author

Correspondence to Eyeleko Anselme Herman.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, Y., Herman, E.A., Mahmood, A. et al. A weighted K-member clustering algorithm for K-anonymization. Computing (2021). https://doi.org/10.1007/s00607-021-00922-0

Keywords

  • K-anonymity
  • Privacy preserving data publishing
  • Information loss
  • Clustering
  • Outliers

Mathematics Subject Classification

  • 68P27