On effective and efficient graph edge labeling

  • Oshini Goonetilleke
  • Danai Koutra
  • Kewen Liao
  • Timos Sellis
Article
Part of the following topical collections:
  1. Special Issue on Scientific and Statistical Data Management

Abstract

Graphs, such as social, road and information networks, are ubiquitous because they naturally model entities and their relationships. Many query processing tasks on graphs depend on efficiently accessing nodes and edges that are stored in some order on disk or in main memory. A natural question follows, and it is the focus here: given a directed graph, how should we label (order) its edges to achieve better disk locality and to support various neighborhood queries efficiently? We answer this question by introducing two edge-labeling schemes, GrdRandom and FlipInOut, which label edges with natural numbers on the premise that edges should be assigned integer identifiers in a way that exploits consecutiveness to the maximum degree. We conduct an extensive experimental analysis on real-world graphs and compare the proposed schemes with various baseline labeling methods. We demonstrate that our methods are efficient and result in significantly improved query I/O performance. Finally, we propose an effective streaming graph partitioning method, FlipCut, which leverages the FlipInOut edge labeling.
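
The premise about consecutive identifiers can be illustrated with a small sketch. It is not GrdRandom or FlipInOut, whose details are not given in the abstract. It only assumes that a neighborhood query is cheaper when the labels of a node's out-edges (or in-edges) form few contiguous runs, so that fewer separate ranges have to be read. The helper names `count_runs` and `total_runs`, the toy graph, and the two baseline orderings are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def count_runs(labels):
    """Return the number of maximal runs of consecutive integers in labels."""
    ordered = sorted(labels)
    if not ordered:
        return 0
    # A new run starts wherever two neighbouring labels are not consecutive.
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b != a + 1)


def total_runs(labeled_edges):
    """Sum the runs over all out-neighborhoods and all in-neighborhoods.

    labeled_edges maps a directed edge (src, dst) to its integer label.
    Returns a pair (out_runs, in_runs); lower values mean more consecutiveness.
    """
    out_labels = defaultdict(list)
    in_labels = defaultdict(list)
    for (src, dst), label in labeled_edges.items():
        out_labels[src].append(label)
        in_labels[dst].append(label)
    out_runs = sum(count_runs(v) for v in out_labels.values())
    in_runs = sum(count_runs(v) for v in in_labels.values())
    return out_runs, in_runs


# Toy directed graph (hypothetical data).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("b", "a")]

# Two simple baseline labelings: order edges by source, or by target.
by_source = {e: i for i, e in enumerate(sorted(edges))}
by_target = {e: i for i, e in enumerate(sorted(edges, key=lambda e: (e[1], e[0])))}

print(total_runs(by_source))  # out-edges are contiguous, in-edges are not
print(total_runs(by_target))  # in-edges are contiguous, out-edges are not
```

Ordering by source makes every out-neighborhood a single run but fragments the in-neighborhoods, and ordering by target does the reverse. A labeling that serves both kinds of neighborhood query has to balance the two counts, which is the trade-off the abstract's question points at.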

Keywords

Edge labeling, Consecutiveness, Query processing

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Oshini Goonetilleke (1)
  • Danai Koutra (2)
  • Kewen Liao (3)
  • Timos Sellis (3)
  1. RMIT University, Melbourne, Australia
  2. University of Michigan, Ann Arbor, USA
  3. Swinburne University of Technology, Melbourne, Australia
