Abstract
Anonymization of graph-based data is a problem that has been widely studied in recent years, and several anonymization methods have been developed. Information loss measures have been used to evaluate the noise introduced in the anonymized data. Generic information loss measures ignore the intended use of the anonymized data. When data has to be released to third parties and there is no control over the kinds of analyses users may perform, these generic measures are the standard ones. In this paper we study different generic information loss measures for graphs and compare them with cluster-specific ones. We want to evaluate whether the generic information loss measures are indicative of the usefulness of the data for subsequent data mining processes.
Acknowledgments
This work was partly funded by the Spanish Government through projects TIN2011-27076-C03-02 “CO-PRIVACY”, CONSOLIDER INGENIO 2010 CSD2007-0004 “ARES” and TIN2010-15764 “N-KHRONOUS”.
Cite this article
Casas-Roma, J., Herrera-Joancomartí, J. & Torra, V. Anonymizing graphs: measuring quality for clustering. Knowl Inf Syst 44, 507–528 (2015). https://doi.org/10.1007/s10115-014-0774-7
DOI: https://doi.org/10.1007/s10115-014-0774-7