Fast Construction of Compressed Web Graphs
Several compressed graph representations were proposed in the last 15 years. Today, all these representations are highly relevant in practice since they enable to keep large-scale web and social graphs in the main memory of a single machine and consequently facilitate fast random access to nodes and edges.
While much effort was spent on finding space-efficient and fast representations, one issue was only partially addressed: developing resource-efficient construction algorithms. In this paper, we engineer the construction of regular and hybrid \(k^2\)-trees. We show that algorithms based on the Z-order sorting reduce the memory footprint significantly and at the same time are faster than previous approaches. We also engineer a parallel version, which fully utilizes all CPUs and caches. We show the practicality of the latter version by constructing partitioned hybrid k-trees for Web graphs in the scale of a billion nodes and up to 100 billion edges.
KeywordsWeb graphs Compact data structures Graph compression
- 4.Boldi, P., Marino, A., Santini, M., Vigna, S.: BUbiNG: massive crawling for the masses. In: Proceedings of WWW, pp. 227–228 (2014)Google Scholar
- 5.Boldi, P., Vigna, S.: The webgraph framework I: compression techniques. In: Proceedings of WWW, pp. 595–601 (2004)Google Scholar
- 12.Jacobson, G.: Space-efficient static trees and graphs. In: Proceedings of FOCS, pp. 549–554 (1989)Google Scholar
- 13.Junghanns, M., Petermann, A., Gómez, K., Rahm, E.: GRADOOP: scalable graph data management and analytics with Hadoop. CoRR abs/1506.00548 (2015)Google Scholar
- 14.Kyrola, A., Blelloch, G., Guestrin, C.: GraphChi: large-scale graph computation on just a PC. In: Proceedings of USENIX, pp. 31–46 (2012)Google Scholar
- 15.Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD, pp. 135–146 (2010)Google Scholar
- 17.Xin, R.S., Crankshaw, D., Dave, A., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: unifying data-parallel and graph-parallel analytics. CoRR abs/1402.2394 (2014)Google Scholar