Scalable non-deterministic clustering-based k-anonymization for rich networks

Ros-Martín, Miguel; Salas, Julián; Casas-Roma, Jordi

doi:10.1007/s10207-018-0409-1

Scalable non-deterministic clustering-based k-anonymization for rich networks

Regular Contribution
Published: 21 May 2018

Volume 18, pages 219–238, (2019)
Cite this article

International Journal of Information Security Aims and scope Submit manuscript

458 Accesses
13 Citations
2 Altmetric
Explore all metrics

Abstract

In this paper, we tackle the problem of graph anonymization in the context of privacy-preserving social network mining. We present a greedy and non-deterministic algorithm to achieve k-anonymity on labeled and undirected networks. Our work aims to create a scalable algorithm for real-world big networks, which runs in parallel and uses biased randomization for improving the quality of the solutions. We propose new metrics that consider the utility of the clusters from a recommender system point of view. We compare our approach to SaNGreeA, a well-known state-of-the-art algorithm for k-anonymity generalization. Finally, we have performed scalability tests, with up to 160 machines within the Hadoop framework, for anonymizing a real-world dataset with around 830 K nodes and 63 M relationships, demonstrating our method’s utility and practical applicability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Big data analytics on Apache Spark

Article 13 October 2016

The homophily principle in social network analysis: A survey

Article 18 January 2022

Private measures, random walks, and synthetic data

Article Open access 20 April 2024

Notes

http://www.xing.com.
The Scala source code is available at: https://bitbucket.org/jcasasr/scan-algorithm/.
Available at: http://igraph.org.

References

Adamic, L.A., Glance, N.: The political blogosphere and the 2004 U.S. election. In: LinkKDD’05, pp. 36–43. ACM (2005)
Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
Article MathSciNet MATH Google Scholar
Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 10, 2008 (2008)
Google Scholar
Byun, J.W., Kamra, A., Bertino, E., Li, N.: Efficient \(k\)-anonymization using clustering techniques. In: International Conference on Database Systems for Advanced Applications (DASFAA), pp. 188–200 (2007)
Cai, B.J., Wang, H.Y., Zheng, H.R., Wang, H.: Evaluation repeated random walks in community detection of social networks. In: ICMLC’10, pp. 1849–1854. IEEE (2010)
Campan, A., Truta, T.M.: Data and structural k-anonymity in social networks. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds.) Privacy, Security, and Trust in KDD, pp. 33–54. Springer, Berlin (2009)
Chapter Google Scholar
Campan, A., Alufaisan, Y., Truta, T.M.: Preserving communities in anonymized social networks. Trans. Data Privacy 8(1), 55–87 (2015)
Google Scholar
Casas-Roma, J., Herrera-Joancomartí, J., Torra, V.: Anonymizing graphs: measuring quality for clustering. Knowl. Inf. Syst. 44(3), 507–528 (2014)
Article Google Scholar
Casas-Roma, J., Rousseau, F.: Community-preserving generalization of social networks. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1465–1472. ACM, Paris, France (2015)
Casas-Roma, J., Herrera-Joancomartí, J., Torra, V.: A survey of graph-modification techniques for privacy-preserving on networks. Artif. Intell. Rev. 47(3), 341366 (2017)
Article Google Scholar
Chester, S., Kapron, B., Srivastava, G., Venkatesh, S.: Complexity of social network anonymization. Soc. Netw. Anal. Min. 3(2), 151–166 (2012)
Article Google Scholar
Chester, S., Kapron, B.M., Ramesh, G., Srivastava, G., Thomo, A., Venkatesh, S.: Why Waldo befriended the dummy? k-anonymization of social networks with pseudo-nodes. Soc. Netw. Anal. Min. 3(3), 381–399 (2013)
Article Google Scholar
Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 1–6 (2004)
Article Google Scholar
Erdös, P., Rényi, A.: On random graphs I. Publ. Math. 6, 290–297 (1959)
MathSciNet MATH Google Scholar
Hay, M., Miklau, G., Jensen, D., Towsley, D., Weis, P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endow. 1(1), 102–114 (2008)
Article Google Scholar
Hay, M., Liu, K., Miklau, G., Pei, J., Terzi, E.: Privacy-aware data management in information networks. In: International Conference on Management of Data (SIGMOD), pp. 1201–1204. ACM Press, New York, USA (2011)
He, J., Chu, W.W.: A social network-based recommender system (SNRS). Ann. Inf. Syst. 12, 47–74 (2010)
Article Google Scholar
Juan, A.A., Fauln, J., Ferrer, A., Lourenço, H.R., Barrios, B.: MIRHA: multi-start biased randomization of heuristics with adaptive local search for solving non-smooth routing problems. TOP 21(1), 109–132 (2011)
Article MathSciNet MATH Google Scholar
Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA (2013). http://archive.ics.uci.edu/ml
Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: ACM SIGMOD International Conference on Management of Data, pp. 93–106. ACM,NY, USA (2008)
Lu, X., Song, Y., Bressan, S.: Fast identity anonymization on graphs. In: 23rd International Conference on Database and Expert Systems Applications, pp. 281–295. Springer, Berlin (2012)
Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. In: 22nd IEEE International Conference on Data Engineering (2006)
Nettleton, D.F., Salas, J.: A data driven anonymization system for information rich online social network graphs. Expert Syst. Appl. 55, 87–105 (2016)
Article Google Scholar
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 1–16 (2003)
Google Scholar
Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. PNAS 105(4), 1118–1123 (2008)
Article Google Scholar
Salas, J., Torra, V.: Graphic sequences, distances and k-degree anonymity. Disc. Appl. Math. 188, 2531 (2015)
Article MathSciNet MATH Google Scholar
Salas, J., Torra, V.: Improving the characterization of P-stability for applications in network privacy. Disc. Appl. Math. 206, 109114 (2016)
Article MathSciNet MATH Google Scholar
Salas, J.: Sampling and merging for graph anonymization. In: Torra, V., Narukawa, Y., Navarro-Arribas, G., Yañez, C. (eds.) Modeling Decisions for Artificial Intelligence, pp. 250–260. Springer, Andorra (2016)
Chapter Google Scholar
Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)
Article Google Scholar
Sihag, V.K.: A clustering approach for structural \(k\)-anonymity in social networks using genetic algorithm. In: CUBE International Information Technology Conference, pp. 701–706. ACM, Pune, India (2012)
Singh, L., Schramm, C.: Identifying similar neighborhood structures in private social networks. In: International Conference on Data Mining Workshops, pp. 507–516. IEEE, Sydney, NSW (2010)
Skarkala, M.E., Maragoudakis, M., Gritzalis, S., Mitrou, L., Toivonen, H., Moen, P.: Privacy preservation by k-anonymization of weighted social networks. In: Proceedings of ASONAM. pp. 423–428. IEEE Computer Society (2012)
Stokes, K., Torra, V.: On some clustering approaches for graphs. In: 2011 IEEE International Conference on Fuzzy Systems, pp. 409–415 (2011)
Stokes k, K., Torra, V.: Reidentification and k-anonymity: a model for disclosure risk in graphs. Soft Comput. 16(10), 16571670 (2012)
Article MATH Google Scholar
Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)
Article MathSciNet MATH Google Scholar
Tassa, T., Cohen, D.J.: Anonymization of centralized and distributed social networks by sequential clustering. IEEE Trans. Knowl. Data Eng. 25(2), 311–324 (2013)
Article Google Scholar
Wu, X., Ying, X., Liu, K., Chen, L.: Managing and Mining Graph Data, Chapter A Survey of Privacy-Preservation of Graphs and Social Networks, pp. 421–453. Springer, Boston, MA (2010)
Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33(4), 452–473 (1977)
Article Google Scholar
Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: ICDE (2008)
Zhou, B., Pei, J., Luk, W.S.: A brief survey on anonymization techniques for privacy preserving publishing of social network data. ACM SIGKDD Explor. Newsl. 10(2), 1222 (2008)
Article Google Scholar

Download references

Acknowledgements

Jordi Casas-Roma was partially supported by the Spanish MCYT and the FEDER funds under Grants TIN2011-27076-C03 “CO-PRIVACY” and TIN2014-57364-C2-2-R “SMARTGLACIS.” Julián Salas acknowledges the support of a UOC postdoctoral fellowship.

Author information

Authors and Affiliations

XING, Barcelona, Spain
Miguel Ros-Martín
Internet Interdisciplinary Institute (IN3), Universitat Oberta de Catalunya (UOC), Barcelona, Spain
Julián Salas & Jordi Casas-Roma

Authors

Miguel Ros-Martín
View author publications
You can also search for this author in PubMed Google Scholar
Julián Salas
View author publications
You can also search for this author in PubMed Google Scholar
Jordi Casas-Roma
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jordi Casas-Roma.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ros-Martín, M., Salas, J. & Casas-Roma, J. Scalable non-deterministic clustering-based k-anonymization for rich networks. Int. J. Inf. Secur. 18, 219–238 (2019). https://doi.org/10.1007/s10207-018-0409-1

Download citation

Published: 21 May 2018
Issue Date: 01 April 2019
DOI: https://doi.org/10.1007/s10207-018-0409-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Scalable non-deterministic clustering-based k-anonymization for rich networks

Abstract

Access this article

Similar content being viewed by others

Big data analytics on Apache Spark

The homophily principle in social network analysis: A survey

Private measures, random walks, and synthetic data

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Scalable non-deterministic clustering-based k-anonymization for rich networks

Abstract

Access this article

Similar content being viewed by others

Big data analytics on Apache Spark

The homophily principle in social network analysis: A survey

Private measures, random walks, and synthetic data

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation