On effective and efficient graph edge labeling

Abstract

Graphs, such as social, road, and information networks, are ubiquitous because they naturally model entities and their relationships. Many query processing tasks on graphs depend on efficiently accessing nodes and edges stored in some order on disk or in main memory. A natural follow-up question, which we focus on here, is: given a directed graph, how should we label (order) its edges to achieve better disk locality and support various neighborhood queries efficiently? We answer this question by introducing two edge-labeling schemes, GrdRandom and FlipInOut, which label edges with natural numbers on the premise that edges should be assigned integer identifiers in a way that exploits consecutiveness as much as possible. We conduct an extensive experimental analysis on real-world graphs and compare our proposed schemes with various baseline labeling methods. We demonstrate that our methods are efficient and yield significantly improved query I/O performance. Finally, we propose an effective streaming graph partitioning method, FlipCut, which leverages the FlipInOut edge labeling.
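
The consecutiveness premise can be illustrated with a small sketch. The code below is not GrdRandom or FlipInOut; it only shows a simple baseline (labeling edges in sorted source order) and a hypothetical locality proxy, the number of maximal runs of consecutive edge labels needed to cover each node's out-edges and in-edges. The function names and the run-count metric are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def label_by_source(edges):
    """Baseline labeling: number edges 0..m-1 in (source, target) sorted order."""
    return {e: i for i, e in enumerate(sorted(edges))}


def count_runs(ids):
    """Count maximal runs of consecutive integers in a collection of labels."""
    ids = sorted(ids)
    if not ids:
        return 0
    return 1 + sum(1 for a, b in zip(ids, ids[1:]) if b != a + 1)


def neighborhood_runs(edges, labels):
    """Return (out_runs, in_runs): total runs over all out- and in-neighborhoods.

    Fewer runs means a neighborhood's edges sit in fewer contiguous label
    ranges, which is used here as a rough stand-in for sequential access.
    """
    out_ids, in_ids = defaultdict(list), defaultdict(list)
    for (u, v) in edges:
        out_ids[u].append(labels[(u, v)])
        in_ids[v].append(labels[(u, v)])
    out_runs = sum(count_runs(x) for x in out_ids.values())
    in_runs = sum(count_runs(x) for x in in_ids.values())
    return out_runs, in_runs


# Toy directed graph: nodes 0 and 1 each point to nodes 2 and 3.
edges = [(0, 2), (0, 3), (1, 2), (1, 3)]
labels = label_by_source(edges)
print(neighborhood_runs(edges, labels))  # (2, 4)
```

On this toy graph, source-order labeling makes every out-neighborhood a single run, but each in-neighborhood is split into two runs. A labeling that favors one direction can penalize the other, which is the kind of tension an edge-labeling scheme for both out- and in-neighborhood queries has to balance.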

Notes

  1. https://snap.stanford.edu/data/index.html
  2. http://konect.uni-koblenz.de

Author information

Corresponding author

Correspondence to Oshini Goonetilleke.

About this article

Cite this article

Goonetilleke, O., Koutra, D., Liao, K. et al. On effective and efficient graph edge labeling. Distrib Parallel Databases 37, 5–38 (2019). https://doi.org/10.1007/s10619-018-7234-4

Keywords

  • Edge labeling
  • Consecutiveness
  • Query processing