Generating Large-Scale Heterogeneous Graphs for Benchmarking

  • Conference paper
Specifying Big Data Benchmarks (WBDB 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8163)

Abstract

Graphs have emerged as an important kind of data found in a wide class of applications. The dominant benchmark for graph data today is Graph 500, which generates stochastic Kronecker graphs of various sizes and reports the time to perform a breadth-first search. Apache Giraph uses PageRank computation as an algorithmic benchmark for large graphs, but does not provide a mechanism to generate graph data. Other graph benchmarks have been developed by smaller communities and are not widely known. However, most benchmarking data for graphs are derived from a single structure generation model and therefore do not capture the variability of structure and content. To this end, we propose heterogeneous graphs, a mixed-model graph structure that combines several existing generation techniques into a single benchmark. It is a hybrid that constructs edge-labeled multigraphs with multiple components, which can be hierarchical graphs, power-law graphs, community-forming graphs, and a new class of graphs formed by motif composition. The user can specify the graph with a simple set of four parameters, but has the option to use several more parameters for finer control of the hybrid structure. We define the generation process for heterogeneous graphs and propose an initial set of query operations against the generated data.
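To make the mixed-model idea concrete, the following is a minimal sketch of how an edge-labeled multigraph could be assembled from the four component types the abstract names. It is not the paper's generator: the function names, the edge labels, the two parameters `n_nodes` and `n_components`, and the defaults are all hypothetical, and preferential attachment is swapped in as a simple stand-in for a power-law model. Components are kept disjoint here purely for simplicity.

```python
import random
from collections import defaultdict, deque

# Every edge is a (source, target, label) triple; repeated pairs are allowed,
# so the result is an edge-labeled multigraph. All names are illustrative.

def hierarchical(nodes, rng, branching=3):
    # Tree: node i hangs under node (i - 1) // branching.
    return [(nodes[(i - 1) // branching], nodes[i], "child_of")
            for i in range(1, len(nodes))]

def power_law(nodes, rng, edges_per_node=2):
    # Preferential attachment as a stand-in power-law model (assumption).
    # Targets are drawn from a degree-weighted pool of earlier nodes.
    edges, pool = [], [nodes[0]]
    for v in nodes[1:]:
        targets = [rng.choice(pool) for _ in range(edges_per_node)]
        edges += [(v, u, "links_to") for u in targets]
        pool += targets + [v] * edges_per_node
    return edges

def community(nodes, rng, n_groups=2, p_in=0.6):
    # Random edges inside each group only, with probability p_in per pair.
    edges = []
    for g in (nodes[i::n_groups] for i in range(n_groups)):
        edges += [(g[i], g[j], "member_with")
                  for i in range(len(g)) for j in range(i + 1, len(g))
                  if rng.random() < p_in]
    return edges

def motif_chain(nodes, rng):
    # Motif composition: triangles chained by sharing one node each.
    edges = []
    for i in range(0, len(nodes) - 2, 2):
        a, b, c = nodes[i:i + 3]
        edges += [(a, b, "motif"), (b, c, "motif"), (c, a, "motif")]
    return edges

BUILDERS = [hierarchical, power_law, community, motif_chain]

def generate(n_nodes, n_components, seed=0):
    # Split the node ids into equal blocks and cycle through the builders.
    rng = random.Random(seed)
    size = n_nodes // n_components
    edges = []
    for k in range(n_components):
        nodes = list(range(k * size, (k + 1) * size))
        edges += BUILDERS[k % len(BUILDERS)](nodes, rng)
    return edges

def bfs_reachable(edges, source, labels=None):
    # Breadth-first search over undirected adjacency, optionally
    # restricted to edges whose label is in `labels`.
    adj = defaultdict(list)
    for u, v, label in edges:
        if labels is None or label in labels:
            adj[u].append(v)
            adj[v].append(u)
    seen, queue = {source}, deque([source])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen
```

Calling `generate(40, 4, seed=1)` yields four disjoint 10-node components, one per builder, and `bfs_reachable(edges, 0)` then visits only the hierarchical component. The label filter is one illustrative way a query could exploit edge labels; the paper's actual query operations are not given in the abstract.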

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gupta, A. (2014). Generating Large-Scale Heterogeneous Graphs for Benchmarking. In: Rabl, T., Poess, M., Baru, C., Jacobsen, H.-A. (eds) Specifying Big Data Benchmarks. WBDB 2012. Lecture Notes in Computer Science, vol 8163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53974-9_11

  • DOI: https://doi.org/10.1007/978-3-642-53974-9_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53973-2

  • Online ISBN: 978-3-642-53974-9

  • eBook Packages: Computer Science, Computer Science (R0)