Abstract
Most modern networks are perpetually evolving and can be modeled by graph data structures. By collecting and indexing the state of a graph at various time instances, we can query its entire history and thus gain insight into its fundamental features and attributes. This calls for advanced solutions for storing and indexing graph history that support application queries efficiently while coping with the increased space requirements. To this end, we advocate a purely vertex-centric storage model that is asymptotically space-optimal and more space efficient than any other proposal to date. In addition to space efficiency, the model's purely vertex-centric approach shows great promise with respect to the efficiency and functionality of update and query operations. Furthermore, we make a qualitative comparison with other general methods for graph history storage, identifying the pros and cons of our approach. Finally, we implement and incorporate our technique in the \(G^*\) parallel graph processing system, conduct a thorough experimental evaluation, and show that it yields time and space improvements of up to an order of magnitude compared to \(G^*\).
Notes
1. Given a query point \(p \in \mathcal {R}\) and a set of N intervals on the real line, a stabbing query returns all intervals that contain p.
2. For \(G^*\), TGI and our solution (and to a lesser extent for the other methods), one could indeed describe the complexity w.r.t. a variety of parameters and provide a more detailed description. However, doing so would not permit a direct comparison between the methods and would thus defeat the very purpose for which this table is provided.
3. The source code is available at https://github.com/hinodeauthors/hinode.
4. Dataset: undirected Barabási–Albert graph; starting vertices = 1M, edges per newly inserted vertex = 5, vertex insertions per snapshot = 2K, snapshots = 100.
5. Dataset: undirected Barabási–Albert graph; starting vertices = 1M, edges per newly inserted vertex = 5, vertex insertions per snapshot = 20K, snapshots = 100.
6. When querying 40% of the sequence for the two-hop neighborhood of the vertex with the largest degree, \(G^*\) was unable to finish because it ran out of memory.
7. We would like to thank an anonymous reviewer for pointing out this issue.
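As an illustration of the stabbing query defined in note 1, the following is a minimal sketch that answers it with a linear scan over closed intervals. The function name `stabbing_query` and the sample intervals are assumptions made for this example; the naive scan is not the interval-tree-based structure \({\mathcal {I}}_{v}\) used in the paper, which answers the same query more efficiently.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [start, end]


def stabbing_query(intervals: List[Interval], p: float) -> List[Interval]:
    """Return every closed interval [s, e] that contains the point p.

    Hypothetical helper: a plain O(N) scan, used only to make the
    definition concrete.
    """
    return [(s, e) for (s, e) in intervals if s <= p <= e]


intervals = [(1, 4), (3, 8), (6, 9), (10, 12)]
print(stabbing_query(intervals, 3.5))  # [(1, 4), (3, 8)]
```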
Appendix: The WriteAttribute cases
We analyze the two possible cases in WriteAttribute. In the first case, the field f does not have any values associated with it in the time interval \([t_s,t_e]\). We then insert a quadruple \((f,\{\ell _1, \ell _2, \ldots \},t_s,t_e)\) in \({\mathcal {I}}_{v}\) and store a record \((\{\ell _1, \ell _2, \ldots \},t_s,t_e)\) in f's respective B-tree \(A_v^f\).
In the second case, the field f has values associated with it in the time interval \([t_s,t_e]\), i.e. there exist (up to) two intervals \([t'_s,t'_e]\) and \([t''_s,t''_e]\) in the data structure, such that either (a) \(t'_s<t_s<t'_e<t_e\), (b) \(t_s<t'_s<t_e<t'_e\), (c) \(t'_s<t_s<t_e<t'_e\) or (d) \(t'_s<t_s<(t'_e=t''_s)<t_e<t''_e\) is true (Fig. 12). In that case, we search \({\mathcal {I}}_{v}\) for the interval \([t'_s,t'_e]\) corresponding to the field f (and for \([t''_s,t''_e]\) if it exists) by simulating an insertion of this interval in \({\mathcal {I}}_{v}\). Let \(v_{t'}\) be the node of \({\mathcal {I}}_{v}\) in which interval \([t'_s,t'_e]\) is to be stored. After locating the at most three lists in which it is to be stored, we search these lists based on the endpoints of \([t'_s,t'_e]\). If there is more than one such interval, we use the identifier of \([t'_s,t'_e]\) to search among them and locate this interval. The same procedure is applied for \([t''_s,t''_e]\).
Afterwards, we perform a series of interval insertions and deletions in \({\mathcal {I}}_{v}\) and the corresponding B-tree \(A_v^f\), depending on the subcases presented below (the resulting intervals end up with the appropriate set of values based on their original intervals):
 Subcase (a): Deletion of \([t'_s,t'_e]\) followed by the insertion of \([t'_s,t_s)\), \([t_s,t'_e)\) and \([t'_e,t_e]\).
 Subcase (b): Deletion of \([t'_s,t'_e]\) followed by the insertion of \([t_s,t'_s)\), \([t'_s,t_e)\) and \([t_e,t'_e]\).
 Subcase (c): Deletion of \([t'_s,t'_e]\) followed by the insertion of \([t'_s,t_s)\), \([t_s,t_e)\) and \([t_e,t'_e]\).
 Subcase (d): Deletion of \([t'_s,t'_e]\) and \([t''_s,t''_e]\) followed by the insertion of \([t'_s,t_s)\), \([t_s,t'_e)\), \([t''_s,t_e)\) and \([t_e,t''_e]\).
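The four subcases can be made concrete with a small sketch that, given the new interval \([t_s,t_e]\) and the one or two stored intervals it overlaps, reports which subcase applies, which intervals are deleted and which pieces are inserted. The function name `split_on_write` and the tuple encodings are assumptions made for this illustration; the sketch covers only the endpoint arithmetic and leaves out the value sets attached to each piece as well as the actual updates to \({\mathcal {I}}_{v}\) and \(A_v^f\).

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]     # stored closed interval [start, end]
Piece = Tuple[int, int, bool]  # (start, end, right_closed)


def split_on_write(
    new: Interval,
    first: Interval,
    second: Optional[Interval] = None,
) -> Tuple[str, List[Interval], List[Piece]]:
    """Return (subcase, intervals to delete, pieces to insert).

    Hypothetical helper: new is [t_s, t_e], first is [t'_s, t'_e] and
    second, if given, is [t''_s, t''_e].
    """
    ts, te = new
    fs, fe = first
    if second is not None:
        ss, se = second
        # Subcase (d): t'_s < t_s < (t'_e = t''_s) < t_e < t''_e
        if fs < ts < fe and fe == ss and ss < te < se:
            pieces = [(fs, ts, False), (ts, fe, False),
                      (ss, te, False), (te, se, True)]
            return "d", [first, second], pieces
        raise ValueError("intervals do not match subcase (d)")
    if fs < ts < fe < te:  # subcase (a)
        return "a", [first], [(fs, ts, False), (ts, fe, False), (fe, te, True)]
    if ts < fs < te < fe:  # subcase (b)
        return "b", [first], [(ts, fs, False), (fs, te, False), (te, fe, True)]
    if fs < ts < te < fe:  # subcase (c)
        return "c", [first], [(fs, ts, False), (ts, te, False), (te, fe, True)]
    raise ValueError("intervals do not match subcases (a)-(c)")


case, deleted, inserted = split_on_write(new=(5, 12), first=(2, 8))
print(case, deleted, inserted)
```

With the stored interval \([2,8]\) and the new interval \([5,12]\), the call falls under subcase (a): \([2,8]\) is deleted and the pieces \([2,5)\), \([5,8)\) and \([8,12]\) are inserted.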
Cite this article
Kosmatopoulos, A., Tsichlas, K., Gounaris, A. et al. HiNode: an asymptotically space-optimal storage model for historical queries on graphs. Distrib Parallel Databases 35, 249–285 (2017). https://doi.org/10.1007/s10619-017-7207-z
Keywords
 Historical queries
 Evolving graphs
 Indexing
 Space efficiency