A synthetic data generator for online social network graphs

  • David F. Nettleton
Original Article


Two difficulties facing data analysts of online social networks are (1) the limited public availability of data and (2) the need to respect the privacy of users. One possible solution to both problems is to use synthetically generated data. However, this poses a series of challenges in generating a realistic dataset in terms of topologies, attribute values, communities, data distributions, correlations and so on. In this work, we present and validate an approach for populating a graph topology with synthetic data that approximates an online social network. The empirical tests confirm that our approach generates a dataset that is both diverse and a good fit to the target requirements, with realistic modeling of noise and fitting to communities. A good match is obtained between the generated data and the target profiles and distributions, which is competitive with other state-of-the-art methods. The data generator is also highly configurable, with a sophisticated control parameter set for different “similarity/diversity” levels.


Graphs and networks; Online social networks; Synthetic data generation; Topology; Attributes; Attribute-values; Seeds; Communities



This work is partially funded by the Spanish MEC (project TIN2013-49814-EXP). The author is grateful for the suggestions of Prof. Vladimir Estivill-Castro of the Pompeu Fabra University, Barcelona, Spain, and of Dr. Julián Salas of the University Rovira i Virgili, Tarragona, Spain.



Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  1. Universitat Pompeu Fabra, Barcelona, Spain
