Using the Marshall-Olkin Extended Zipf Distribution in Graph Generation

  • Ariel Duarte-López
  • Arnau Prat-Pérez
  • Marta Pérez-Casany
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9523)


Abstract

Being able to generate large synthetic graphs that resemble real-world graphs is highly important for the design of new graph algorithms and benchmarks. In this paper, we first compare the goodness of fit of several probability models when used to model the degree distribution of real graphs. Second, after confirming that the Marshall-Olkin Extended Zipf (MOEZipf) model gives the best fits, we present a method for generating MOEZipf distributions. The method is shown to work well in practice when implemented in a scalable synthetic graph generator.
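The abstract does not spell out the generation method, so the following is only an illustrative sketch, not the authors' method. It assumes the MOEZipf definition of Pérez-Casany and Casellas (2013) as we understand it: for α > 1 and β > 0, the survival function is S(x) = P(Y > x) = β ζ(α, x+1) / (ζ(α) − (1 − β) ζ(α, x+1)), where ζ(α, x) = Σ_{k≥x} k^(−α) is the Hurwitz zeta function. For sampling it swaps in a generic technique: a standard property of Marshall-Olkin families is that the minimum (β ≤ 1, p = β) or maximum (β > 1, p = 1/β) of a Geometric(p) number of i.i.d. draws has survival function β F̄ / (1 − (1 − β) F̄), so it is enough to draw Zipf variates, here with Devroye's rejection sampler. All function names are hypothetical, the Hurwitz zeta is approximated with a truncated Euler-Maclaurin series to stay stdlib-only, and α very close to 1 can overflow the rejection step.

```python
import math
import random


def hurwitz_zeta(s: float, q: float, n_terms: int = 20) -> float:
    """Approximate zeta(s, q) = sum_{k>=0} (q + k)**(-s) for s > 1, q > 0 (Euler-Maclaurin)."""
    head = sum((q + k) ** (-s) for k in range(n_terms))
    x = q + n_terms
    tail = x ** (1 - s) / (s - 1) + 0.5 * x ** (-s)
    # Bernoulli corrections with B2 = 1/6, B4 = -1/30, B6 = 1/42.
    tail += s * x ** (-s - 1) / 12
    tail -= s * (s + 1) * (s + 2) * x ** (-s - 3) / 720
    tail += s * (s + 1) * (s + 2) * (s + 3) * (s + 4) * x ** (-s - 5) / 30240
    return head + tail


def moezipf_survival(x: int, alpha: float, beta: float) -> float:
    """P(Y > x) for Y ~ MOEZipf(alpha, beta) on {1, 2, ...}, x >= 0 (assumed definition)."""
    z = hurwitz_zeta(alpha, 1.0)
    t = hurwitz_zeta(alpha, x + 1.0)  # sum_{k >= x+1} k**(-alpha)
    return beta * t / (z - (1.0 - beta) * t)


def moezipf_pmf(x: int, alpha: float, beta: float) -> float:
    """P(Y = x): closed form of S(x - 1) - S(x)."""
    z = hurwitz_zeta(alpha, 1.0)
    a = hurwitz_zeta(alpha, float(x))
    b = hurwitz_zeta(alpha, x + 1.0)
    c = 1.0 - beta
    return beta * z * x ** (-alpha) / ((z - c * a) * (z - c * b))


def sample_zipf(alpha: float, rng: random.Random) -> int:
    """Devroye's rejection sampler for Zipf(alpha), alpha > 1."""
    b = 2.0 ** (alpha - 1.0)
    while True:
        u = 1.0 - rng.random()  # in (0, 1], so u ** negative never divides by zero
        v = rng.random()
        x = math.floor(u ** (-1.0 / (alpha - 1.0)))
        t = (1.0 + 1.0 / x) ** (alpha - 1.0)
        if v * x * (t - 1.0) / (b - 1.0) <= t / b:
            return x


def sample_moezipf(alpha: float, beta: float, rng: random.Random) -> int:
    """Min (beta <= 1) or max (beta > 1) of a Geometric(p) number of Zipf draws."""
    p = beta if beta <= 1.0 else 1.0 / beta
    n = 1  # N ~ Geometric(p) on {1, 2, ...}
    while rng.random() >= p:
        n += 1
    draws = [sample_zipf(alpha, rng) for _ in range(n)]
    return min(draws) if beta <= 1.0 else max(draws)


if __name__ == "__main__":
    rng = random.Random(7)
    sample = [sample_moezipf(2.0, 0.5, rng) for _ in range(20000)]
    # Empirical P(Y = 1) next to the model value.
    print(sample.count(1) / len(sample), moezipf_pmf(1, 2.0, 0.5))
```

With β = 1 the geometric count is always 1 and the sampler reduces to plain Zipf. The expected number of Zipf draws per MOEZipf variate is max(β, 1/β), so for β far from 1 a scalable generator would more likely invert S(x) directly; the geometric construction is used here only because its correctness is easy to check.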



Acknowledgments

The authors, all members of DAMA-UPC, thank the Ministry of Science and Innovation of Spain and the Generalitat de Catalunya for grants TIN2013-47008-R and SGR2014-890, respectively, and the EU FP7/2007-2013 programme for funding the LDBC project (ICT2011-8-317548). M. Pérez-Casany also thanks the Spanish Ministry of Education and Science for grant MTM2013-43992-R and the Generalitat de Catalunya for grant 2014 SGR 890 (AGAUR). The authors thank Oracle Labs for its strategic support of the Graphalytics project.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ariel Duarte-López (1; corresponding author)
  • Arnau Prat-Pérez (1)
  • Marta Pérez-Casany (2)
  1. DAMA-UPC, Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain
  2. DAMA-UPC, Departament Matemàtica Aplicada II, Universitat Politècnica de Catalunya, Barcelona, Spain
