Distributed and Parallel Databases, Volume 35, Issue 3–4, pp 249–285

HiNode: an asymptotically space-optimal storage model for historical queries on graphs

  • Andreas Kosmatopoulos
  • Kostas Tsichlas
  • Anastasios Gounaris
  • Spyros Sioutas
  • Evaggelia Pitoura
Article

Abstract

Most modern networks are perpetually evolving and can be modeled by graph data structures. By collecting and indexing the state of a graph at various time instances, we are able to perform queries on its entire history and thus gain insight into its fundamental features and attributes. This calls for advanced solutions for storing and indexing graph history that support application queries efficiently while coping with the increased space requirements. To this end, we advocate a purely vertex-centric storage model that is asymptotically space-optimal and more space-efficient than any other proposal to date. In addition to space efficiency, the model's purely vertex-centric approach shows great promise with respect to the efficiency and functionality of update and query operations. Furthermore, we make a qualitative comparison with other general methods for graph history storage, identifying the pros and cons of our approach. Finally, we implement our technique and incorporate it into the \(G^*\) parallel graph processing system, conduct a thorough experimental evaluation, and show that it can yield time and space improvements of up to an order of magnitude compared to \(G^*\).

Keywords

Historical queries; Evolving graphs; Indexing; Space efficiency

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Andreas Kosmatopoulos (1)
  • Kostas Tsichlas (1)
  • Anastasios Gounaris (1)
  • Spyros Sioutas (2)
  • Evaggelia Pitoura (3)

  1. Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
  2. Department of Informatics, Ionian University, Corfu, Greece
  3. Computer Science Department, University of Ioannina, Ioannina, Greece
