Fast Construction of Compressed Web Graphs

  • Jan Broß
  • Simon Gog
  • Matthias Hauck
  • Marcus Paradies
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10508)

Abstract

Several compressed graph representations have been proposed in the last 15 years. Today, all these representations are highly relevant in practice because they make it possible to keep large-scale web and social graphs in the main memory of a single machine, and consequently they facilitate fast random access to nodes and edges.

While much effort has been spent on finding space-efficient and fast representations, one issue has been only partially addressed: developing resource-efficient construction algorithms. In this paper, we engineer the construction of regular and hybrid \(k^2\)-trees. We show that algorithms based on Z-order sorting reduce the memory footprint significantly and at the same time are faster than previous approaches. We also engineer a parallel version, which fully utilizes all CPUs and caches. We show the practicality of the latter version by constructing partitioned hybrid \(k^2\)-trees for Web graphs on the scale of a billion nodes and up to 100 billion edges.
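
To illustrate the idea of construction by Z-order sorting, the following is a minimal sketch for a regular \(k^2\)-tree with \(k = 2\). It is not the implementation engineered in the paper: it is sequential, keeps everything in memory, and does not cover the hybrid or partitioned variants. The function names `morton_code` and `build_k2_tree` and the list-of-lists output are assumptions made for this example. The sketch relies on the fact that, once the edges are sorted by their Z-order (Morton) code, the distinct code prefixes of length \(2d\) enumerate the non-empty submatrices at depth \(d\) in level order, so each level bitmap can be emitted in one scan.

```python
def morton_code(row, col, height):
    """Interleave the low `height` bits of row and col, row bit first."""
    code = 0
    for bit in range(height - 1, -1, -1):
        code = (code << 2) | (((row >> bit) & 1) << 1) | ((col >> bit) & 1)
    return code


def build_k2_tree(edges, num_nodes):
    """Return one bit list per level of a k^2-tree with k = 2.

    `edges` is an iterable of (row, col) pairs with values below
    `num_nodes`. The adjacency matrix is implicitly padded to a power
    of two. An empty edge set yields empty bit lists in this sketch.
    """
    height = max(1, (num_nodes - 1).bit_length())
    # Sorting by Morton code places the edges in Z-order; the set drops duplicates.
    codes = sorted({morton_code(r, c, height) for r, c in edges})

    levels = []
    for depth in range(1, height + 1):
        shift = 2 * (height - depth)  # keep only the top 2 * depth bits
        bits = []
        last_parent = None
        last_child = None
        for code in codes:
            child = code >> shift
            if child == last_child:
                continue  # this submatrix was already marked as non-empty
            last_child = child
            parent = child >> 2
            if parent != last_parent:
                bits.extend([0, 0, 0, 0])  # open a new group of k^2 = 4 bits
                last_parent = parent
            bits[-4 + (child & 3)] = 1  # quadrant index within the parent
        levels.append(bits)
    return levels


if __name__ == "__main__":
    example_edges = [(0, 1), (1, 0), (3, 3), (2, 0)]
    for level in build_k2_tree(example_edges, 4):
        print("".join(map(str, level)))
```

In this sketch the quadrants of each submatrix are numbered row by row (top-left, top-right, bottom-left, bottom-right), which follows from placing the row bit before the column bit in the Morton code.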

Keywords

Web graphs, Compact data structures, Graph compression

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jan Broß (1)
  • Simon Gog (2)
  • Matthias Hauck (1, 3)
  • Marcus Paradies (1)

  1. SAP SE, Walldorf, Germany
  2. Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
  3. Institute of Computer Engineering, Ruprecht-Karls Universität Heidelberg, Mannheim, Germany
