
Ingress: an automated incremental graph processing system

  • Regular Paper
  • Published in The VLDB Journal

Abstract

Graph data keep growing over time in real life. The ever-growing amount of dynamic graph data demands efficient techniques for incremental graph computation. However, incremental graph algorithms are challenging to develop. Existing approaches usually require users to manually design nontrivial incremental operators, or to choose different memoization strategies for specific types of computation, limiting their usability and generality. In light of these challenges, we propose \(\textsf{Ingress}\), an automated system for incremental graph processing. \(\textsf{Ingress}\) is able to deduce the incremental counterpart of a batch vertex-centric algorithm, without requiring users to redesign logic or data structures. Underlying \(\textsf{Ingress}\) is an automated incrementalization framework equipped with four different memoization policies, supporting all kinds of vertex-centric computations with optimized memory utilization. We identify sufficient conditions for the applicability of these policies. \(\textsf{Ingress}\) automatically chooses the best-fit policy for a given algorithm by verifying these conditions. Beyond its ease of use and generality, \(\textsf{Ingress}\) outperforms state-of-the-art incremental graph systems by \(12.14\times \) on average (up to \(49.23\times \)) in efficiency.
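As a concrete point of reference for the batch vertex-centric model that \(\textsf{Ingress}\) incrementalizes, the following is a minimal Python sketch using single-source shortest paths (SSSP) as the example. All names here are hypothetical illustrations of the message-passing style, not the \(\textsf{Ingress}\) API.

```python
# Batch vertex-centric SSSP in the message-passing style: in each round,
# every vertex aggregates its incoming messages, updates its state, and
# generates messages for its out-neighbors.  Illustrative sketch only.
import math

def batch_sssp(vertices, edges, source):
    """edges: dict mapping u -> list of (v, weight) pairs."""
    state = {v: math.inf for v in vertices}   # vertex states X^0
    messages = {source: [0.0]}                # initial messages M^0
    while messages:
        nxt = {}
        for v, msgs in messages.items():
            cand = min(msgs)                  # aggregate incoming messages
            if cand < state[v]:               # update function (min)
                state[v] = cand
                for w, wt in edges.get(v, []):    # generate function
                    nxt.setdefault(w, []).append(cand + wt)
        messages = nxt                        # next round's messages
    return state
```

Deducing the incremental counterpart of such a loop, reusing memoized messages or states across graph updates, is what the paper automates.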



Notes

  1. This is essentially the VAD-Reset approach in [50].

References

  1. Acar, U.A.: Self-adjusting computation. Ph.D. thesis, CMU (2005)

  2. Baluja, S., Seth, R., Sivakumar, D., Jing, Y., Yagnik, J., Kumar, S., Ravichandran, D., Aly, M.: Video suggestion and discovery for youtube: taking random walks through the view graph. In: WWW, pp. 895–904 (2008)

  3. Bang-Jensen, J., Gutin, G.Z.: Digraphs-Theory, Algorithms and Applications, 2nd edn. Springer, Cham (2009)


  4. Cai, Y., Giarrusso, P.G., Rendel, T., Ostermann, K.: A theory of changes for higher-order languages: incrementalizing \(\lambda \)-calculi by static differentiation. In: PLDI, pp. 145–155 (2014)

  5. Chang, X., Liu, X., Wen, J., Li, S., Fang, Y., Song, L., Qi, Y.: Continuous-time dynamic graph learning via neural interaction processes. In: CIKM, pp. 145–154 (2020)

  6. Europe-OSM. https://www.cise.ufl.edu/research/sparse/matrices/DIMACS10/europe_osm.html (2010)

  7. Fan, W., Hu, C., Tian, C.: Incremental graph computations: Doable and undoable. In: SIGMOD, pp. 155–169 (2017)

  8. Fan, W., Liu, M., Tian, C., Xu, R., Zhou, J.: Incrementalization of graph partitioning algorithms. PVLDB 13(8), 1261–1274 (2020)


  9. Fan, W., Tian, C., Xu, R., Yin, Q., Yu, W., Zhou, J.: Incrementalizing graph algorithms. In: SIGMOD, pp. 459–471 (2022)

  10. Fan, W., Xu, J., Wu, Y., Yu, W., Jiang, J., Zheng, Z., Zhang, B., Cao, Y., Tian, C.: Parallelizing sequential graph computations. In: SIGMOD, pp. 495–510 (2017)

  11. Feng, G., Ma, Z., Li, D., Chen, S., Zhu, X., Han, W., Chen, W.: Risgraph: A real-time streaming system for evolving graphs to support sub-millisecond per-update analysis at millions ops/s. In: SIGMOD, pp. 513–527 (2021)

  12. Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34(3), 596–615 (1987)


  13. Gong, S., Tian, C., Yin, Q., Yu, W., Zhang, Y., Geng, L., Yu, S., Yu, G., Zhou, J.: Automating incremental graph processing with flexible memoization. PVLDB 14(9), 1613–1625 (2021)


  14. Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: OSDI, pp. 17–30 (2012)

  15. Grädel, E., Kolaitis, P.G., Vardi, M.Y.: On the decision problem for two-variable first-order logic. Bull. Symb. Log. 3(1), 53–69 (1997)


  16. Guan, Z., Wu, J., Zhang, Q., Singh, A.K., Yan, X.: Assessing and ranking structural correlations in graphs. In: SIGMOD, pp. 937–948 (2011)

  17. Hamilton, W.L., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: NIPS (2017)

  18. Hammer, M.A., Khoo, Y.P., Hicks, M., Foster, J.S.: Adapton: composable, demand-driven incremental computation. In: PLDI, pp. 156–166 (2014)

  19. Holm, J., de Lichtenberg, K., Thorup, M.: Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity. J. ACM 48(4), 723–760 (2001)


  20. Jeh, G., Widom, J.: Simrank: a measure of structural-context similarity. In: KDD, pp. 538–543 (2002)

  21. Jiang, X., Xu, C., Yin, X., Zhao, Z., Gupta, R.: Tripoline: generalized incremental graph processing via graph triangle inequality. In: EuroSys, pp. 17–32 (2021)

  22. Katz, L.: A new status index derived from sociometric analysis. Psychometrika 18(1), 39–43 (1953)


  23. Kim, K., Seo, I., Han, W., Lee, J., Hong, S., Chafi, H., Shin, H., Jeong, G.: Turboflux: a fast continuous subgraph matching system for streaming graph data. In: SIGMOD, pp. 411–426 (2018)

  24. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: ICLR (2017)

  25. Li, R., Yu, J.X., Mao, R.: Efficient core maintenance in large dynamic graphs. TKDE 26(10), 2453–2465 (2014)


  26. Liu, Y.A.: Efficiency by incrementalization: an introduction. High. Order Symb. Comput. 13(4), 289–313 (2000)


  27. Luo, X., Liu, L., Yang, Y., Bo, L., Cao, Y., Wu, J., Li, Q., Yang, K., Zhu, K.Q.: Alicoco: Alibaba e-commerce cognitive concept net. In: SIGMOD, pp. 313–327 (2020)

  28. Libgrape-Lite. https://github.com/alibaba/libgrape-lite

  29. Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010)

  30. Mariappan, M., Che, J., Vora, K.: Dzig: Sparsity-aware incremental processing of streaming graphs. In: EuroSys, pp. 83–98 (2021)

  31. Mariappan, M., Vora, K.: Graphbolt: Dependency-driven synchronous processing of streaming graphs. In: EuroSys, pp. 1–16 (2019)

  32. Matijasevič, Y.V.: Diophantine representation of recursively enumerable predicates. In: Studies in Logic and the Foundations of Mathematics, vol. 63, pp. 171–177 (1971)

  33. McCune, R.R., Weninger, T., Madey, G.: Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Comput. Surv. 48(2), 1–39 (2015)


  34. McGregor, A., Vorotnikova, S., Vu, H.T.: Better algorithms for counting triangles in data streams. In: PODS, pp. 401–411 (2016)

  35. McSherry, F., Murray, D.G., Isaacs, R., Isard, M.: Differential dataflow. In: CIDR (2013)

  36. de Moura, L.M., Bjørner, N.: Z3: an efficient SMT solver. In: TACAS, pp. 337–340 (2008)

  37. Murray, D.G., McSherry, F., Isaacs, R., Isard, M., Barham, P., Abadi, M.: Naiad: a timely dataflow system. In: SOSP, pp. 439–455 (2013)

  38. Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. Technical report, Stanford InfoLab (1999)

  39. Pearl, J.: Reverend Bayes on inference engines: a distributed hierarchical approach. In: AAAI, pp. 129–138 (1982)

  40. Rossi, R.A., Ahmed, N.K.: The network data repository with interactive graph analytics and visualization. In: AAAI (2015). http://networkrepository.com

  41. Road-USA-Graph. https://www.cise.ufl.edu/research/sparse/matrices/DIMACS10/road_usa.html (2011)

  42. Schieber, B., Vishkin, U.: On finding lowest common ancestors: simplification and parallelization. SIAM J. Comput. 17(6), 1253–1262 (1988)


  43. Sengupta, D., Sundaram, N., Zhu, X., Willke, T.L., Young, J., Wolf, M., Schwan, K.: Graphin: an online high performance incremental graph processing framework. In: Euro-Par, pp. 319–333 (2016)

  44. Shi, X., Cui, B., Shao, Y., Tong, Y.: Tornado: a system for real-time iterative analysis over evolving data. In: SIGMOD, pp. 417–430 (2016)

  45. Sukhbaatar, S., Szlam, A., Fergus, R.: Learning multiagent communication with backpropagation. In: NIPS (2016)

  46. Size of Wikipedia (2020). https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

  47. Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From "think like a vertex" to "think like a graph". PVLDB 7(3), 193–204 (2013)


  48. UK-2005. https://www.cise.ufl.edu/research/sparse/matrices/LAW/uk-2005.html (2005)

  49. Valiant, L.G.: A bridging model for parallel computation. CACM 33(8), 103–111 (1990)


  50. Vora, K., Gupta, R., Xu, G.: Kickstarter: Fast and accurate computations on streaming graphs via trimmed approximations. In: ASPLOS, pp. 237–251 (2017)

  51. Wang, Q., Zhang, Y., Wang, H., Geng, L., Lee, R., Zhang, X., Yu, G.: Automating incremental and asynchronous evaluation for recursive aggregate data processing. In: SIGMOD, pp. 2439–2454 (2020)

  52. Xu, S., Zhang, H., Neubig, G., Dai, W., Kim, J.K., Deng, Z., Ho, Q., Yang, G., Xing, E.P.: Cavs: an efficient runtime system for dynamic neural networks. In: ATC, pp. 937–950 (2018)

  53. Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. In: ICDM, pp. 1–8 (2015)

  54. Zakian, T.A.K., Capelli, L.A.R., Hu, Z.: Incrementalization of vertex-centric programs. In: IPDPS, pp. 1019–1029 (2019)

  55. Zhang, Y., Gao, Q., Gao, L., Wang, C.: Priter: a distributed framework for prioritized iterative computations. In: SOCC, pp. 1–14 (2011)

  56. Zhang, Y., Gao, Q., Gao, L., Wang, C.: Maiter: an asynchronous graph processing framework for delta-based accumulative iterative computation. TPDS 25(8), 2091–2100 (2013)


  57. Zhao, J., Yang, Y., Zhang, Y., Liao, X., Gu, L., He, L., He, B., Jin, H., Liu, H., Jiang, X., Yu, H.: Tdgraph: a topology-driven accelerator for high-performance streaming graph processing. In: ISCA, pp. 116–129 (2022)

  58. Zheng, L., Li, Z., Li, J., Li, Z., Gao, J.: Addgraph: anomaly detection in dynamic graph using attention-based temporal GCN. In: IJCAI (2019)

  59. Zhu, X., Chen, W., Zheng, W., Ma, X.: Gemini: A computation-centric distributed graph processing system. In: OSDI, pp. 301–316 (2016)


Acknowledgements

The work is supported by the National Natural Science Foundation of China (62202301, 62072082, U2241212, 62202088, 62137001, 62302027), the 111 Project (B16009), the Fundamental Research Funds for the Central Universities (N2216012, N2216015), and a research Grant from Alibaba Innovative Research (AIR) Program.

Author information

Corresponding author

Correspondence to Qiang Yin.


Appendices

Appendix

A1. Proof of Lemma 1

Proof

Following the computation process of an \(\textsf{MF}\)-applicable \(\mathcal {A}\) as shown in Eq. (4), we have that

[Displayed derivation, lines (a)–(d), rendered as an image in the original; it concludes with \(\mathbb {X}^k = {\mathcal {U}}\big (\mathbb {X}^0\cup \bigcup _{\ell =0}^{k-1}\mathcal {G}^\ell (\mathbb {M}^0)\big )\).]

In the above equations, lines (a) and (b) are due to the property (C2) as defined in Sect. 4.1; line (c) follows from a simple induction; and line (d) is true because of Eq. (4) and the property (C3). \(\square \)
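To make the flattened reading of Eq. (4) concrete, here is a hedged Python illustration (all names hypothetical): for an \(\textsf{MF}\)-applicable computation, message generation never reads vertex states, so the final states equal a single application of \({\mathcal {U}}\) to \(\mathbb {X}^0\) together with every batch \(\mathcal {G}^\ell (\mathbb {M}^0)\).

```python
# Sketch of X^k = U(X^0 ∪ ⋃_{ℓ<k} G^ℓ(M^0)): run the generate function
# G on message batches alone, then apply the update U once at the end.
import math

def run_flat(x0, m0, generate, update, k):
    """generate maps one message batch to the next and, per the MF
    condition, never consults vertex states; update aggregates."""
    batches, msgs = [], m0
    for _ in range(k):
        batches.append(msgs)          # record G^ℓ(M^0)
        msgs = generate(msgs)
    return {
        v: update([x0[v]] + [m for b in batches for m in b.get(v, [])])
        for v in x0
    }

# SSSP-style instantiation: U is min, G relays distances along edges.
# (No state-based pruning is done here; a real system would prune.)
EDGES = {'a': [('b', 1.0)], 'b': [('c', 1.0)]}

def gen(msgs):
    nxt = {}
    for v, ds in msgs.items():
        for w, wt in EDGES.get(v, []):
            for d in ds:
                nxt.setdefault(w, []).append(d + wt)
    return nxt
```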

A2. Proof of Lemma 2

Proof

The incremental \(\textsf{MF}\)-applicable \(\mathcal {A}\) on \(G\oplus \varDelta G\) starts from the previous computation status (see Algorithm 1). By Lemma 1, we have that \(\hat{\mathbb {X}}^0=\mathbb {X}^k={\mathcal {U}}\left( \mathbb {X}^0\cup \bigcup _{\ell =0}^{k-1} \mathcal {G}^\ell (\mathbb {M}^0)\right) \). It remains to verify \(\hat{\mathbb {M}}^{0}\). Let \(\mathbb {M}_v = \bigcup _{\ell =0}^{k-1} m_v^{\ell }\) be the messages received by v in the computation over G. Observe the following.

(1) Algorithm 1 generates two sets of messages, \(M_v^{+}\) and \(M_v^{-}\), when the edges related to v have evolved, i.e., \(\mathcal {G}(\mathbb {M}_{v}) \ne \hat{\mathcal {G}}(\mathbb {M}_{v})\). By Algorithm 1, we have that \({\mathcal {U}}\left( \hat{\mathcal {G}}(\mathbb {M}_{v}) \ominus \mathcal {G}(\mathbb {M}_{v})\right) = {\mathcal {U}}(M_v^{+}\cup M_v^{-})\).

(2) If the edges related to v have not evolved, we have that \(\mathcal {G}(\mathbb {M}_v) = \hat{\mathcal {G}}(\mathbb {M}_v)\). By (C1) and (C2), it holds that:

$$\begin{aligned} \begin{aligned} {\mathcal {U}}\left( \hat{\mathcal {G}}(\mathbb {M}_v) \ominus \mathcal {G}(\mathbb {M}_v)\right)&= {\mathcal {U}}\left( \hat{\mathcal {G}}(\mathbb {M}_v) \cup {\mathcal {U}}^{-}\circ {\mathcal {U}}\circ \mathcal {G}(\mathbb {M}_v)\right) \\&= {\mathcal {U}}\left( \hat{\mathcal {G}}(\mathbb {M}_v) \setminus \mathcal {G}(\mathbb {M}_v)\right) = {\mathcal {U}}({\textbf {0}}). \end{aligned} \end{aligned}$$

That is, the messages in \(\hat{\mathcal {G}}(\mathbb {M}_v)\) and \(\mathcal {G}(\mathbb {M}_v)\) can be canceled out w.r.t. function \({\mathcal {U}}\). Thus there is no need to generate these messages in incremental computation over \(G\oplus \varDelta G\).

According to Algorithm 1, the initial messages \(\hat{\mathbb {M}}^{0}\) for the incremental computation consist of \({\mathcal {U}}(M_v^{+}\cup M_v^{-})\) for vertices with evolved edges. By the above analysis, vertices without evolved edges contribute nothing even if the corresponding messages are generated. As a consequence, we can express \(\hat{\mathbb {M}}^{0}\) as follows:

$$\begin{aligned} \begin{aligned} \hat{\mathbb {M}}^{0}&= {\mathcal {U}}\left( \bigcup _{v\in V}\hat{\mathcal {G}}(\mathbb {M}_v) \ominus \bigcup _{v\in V}\mathcal {G}(\mathbb {M}_v)\right) \\&= {\mathcal {U}}\left( \hat{\mathcal {G}}\left( \bigcup _{v\in V}\bigcup _{\ell =0}^{k-1} m_v^{\ell }\right) \ominus \mathcal {G}\left( \bigcup _{v\in V}\bigcup _{\ell =0}^{k-1} m_v^{\ell }\right) \right) \\&= {\mathcal {U}}\left( \hat{\mathcal {G}}\left( \bigcup _{\ell =0}^{k-1} \mathcal {G}^\ell (\mathbb {M}^0)\right) \ominus \mathcal {G}\left( \bigcup _{\ell =0}^{k-1} \mathcal {G}^\ell (\mathbb {M}^0)\right) \right) .\end{aligned} \end{aligned}$$

\(\square \)
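As an illustration of how the initial messages \(\hat{\mathbb {M}}^{0}\) of Lemma 2 can be assembled in practice, the sketch below replays a memoized per-vertex message log only over evolved edges, so that messages over unchanged edges cancel as in observation (2). The names and the SSSP-style message form (distance plus edge weight) are assumptions, not the actual implementation.

```python
# Form M_v^+ (from added edges) and M_v^- (retractions from deleted
# edges) out of the memoized message log; unchanged edges contribute
# nothing, matching U(Ĝ(M) ⊖ G(M)) restricted to evolved edges.

def initial_messages(msg_log, added, removed):
    """msg_log: dict v -> list of message values received by v over G.
    added/removed: sets of (u, v, weight) edge changes.
    Returns (M_plus, M_minus), each keyed by the target vertex."""
    m_plus, m_minus = {}, {}
    for u, v, wt in added:            # new edges: re-send memoized msgs
        for d in msg_log.get(u, []):
            m_plus.setdefault(v, []).append(d + wt)
    for u, v, wt in removed:          # deleted edges: retract old msgs
        for d in msg_log.get(u, []):
            m_minus.setdefault(v, []).append(d + wt)
    return m_plus, m_minus
```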

A3. Proof of Lemma 3

Proof

Since the batch algorithm \(\mathcal {A}\) is \(\textsf{MP}\)-applicable, i.e., condition (C4) holds, each state \(x_v^k\) of an unreset vertex v in \(\mathbb {X}_N^k\) is determined by an effective message generated when running \(\mathcal {A}\) on G, denoted as \(m_v^c\). Hence \(x_v^k = m_v^c\). Observe that the right-hand side of Eq. (8) is the result of executing \(\mathcal {A}\) over the reserved graph \(G_\mathcal {R}\) (see Lemma 1). Hence, if for each unreset vertex v the effective message \(m_v^c\) coincides with the effective message generated for v when invoking \(\mathcal {A}\) over \(G_\mathcal {R}\), then Eq. (8) holds. To establish this precondition, it suffices to prove that \(m_v^c\) is transmitted from another unreset vertex \(v'\) and that \((v', v)\) is not a deleted edge in \(\varDelta G\). This is because the computation of \(\mathcal {A}\) strictly follows the propagation of messages, which can be formally verified by induction on the rounds of the computation.

We next prove that each effective message \(m_v^c\) is sent from another unreset vertex via an edge that is not deleted. Assume by contradiction that (i) a reset vertex \(v'\) sends \(m_v^c\) to the unreset v, or (ii) \(m_v^c\) is transmitted along a deleted edge \((v',v)\). (i) If \(v'\) has been reset, then v must also be a reset vertex, as Algorithm 2 propagates \(\bot \) along the paths formed by all effective messages, including \(m_v^c\). (ii) If \((v',v)\) is a deleted edge, then the state of v should also be reset, since the deleted \((v',v)\) initiates a cancelation message in Algorithm 2. Either case leads to a contradiction, and the correctness of Eq. (8) follows. \(\square \)
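The dependency argument above can be sketched as reset propagation over the forest induced by effective messages: each unreset vertex is pinned to one parent, and a deleted edge resets its target and all descendants. This is a hedged illustration of the \(\bot \)-propagation behavior attributed to Algorithm 2, with hypothetical names.

```python
# BFS over the effective-message forest: seed with targets of deleted
# edges whose effective message used that edge, then cascade ⊥ to all
# vertices whose state transitively depends on a reset parent.
from collections import deque

def reset_vertices(parent, deleted_edges):
    """parent: dict v -> u, where u sent v's effective message.
    deleted_edges: set of (u, v) pairs removed from the graph."""
    children = {}
    for v, u in parent.items():
        children.setdefault(u, []).append(v)
    queue = deque(v for (u, v) in deleted_edges if parent.get(v) == u)
    reset = set()
    while queue:                      # propagate ⊥ downward
        v = queue.popleft()
        if v in reset:
            continue
        reset.add(v)
        queue.extend(children.get(v, []))
    return reset
```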

A4. Proof of Lemma 4

Proof

When messages \(\bot \) are propagated in Algorithm 2, the initial \(\hat{\mathbb {X}}^0\) includes the current states for both reset and unreset vertices. By Lemma 3 and the definition of changed graph \(G_\mathcal {C}\), we have that \(\hat{\mathbb {X}}^0 = {\mathcal {U}}\big (\mathbb {X}_\mathcal {R}^0 \cup \bigcup ^{k-1}_{\ell =0}\mathcal {G}_\mathcal {R}^\ell (\mathbb {M}_\mathcal {R}^0) \big ) \cup \mathbb {X}_\mathcal {C}^0 = {\mathcal {U}}\left( \mathbb {X}_\mathcal {R}^0 \cup \bigcup ^{k}_{\ell =0}\mathbb {M}_\mathcal {R}^\ell \right) \cup \mathbb {X}_\mathcal {C}^0 \).

We next analyze the initial compensation messages \(\hat{\mathbb {M}}_0\) initiated by Algorithm 2. By its operations, we can see that each initial compensation message can be sent from either an unreset vertex in \(V_\mathcal {R}\) or a reset vertex in \(V_\mathcal {C}\). In light of this, we denote by \(\hat{\mathbb {M}}^r\) and \(\hat{\mathbb {M}}^c\) the collections of initial compensation messages sent from unreset and reset vertices, respectively.

Suppose that an unreset vertex v sends an initial compensation message to \(v'\). Due to the logic of Algorithm 2 (line 4), \(v'\) must be a reset vertex or \((v, v')\) is an evolved edge. Hence in either case the edge \((v, v')\) cannot appear in the reserved graph \(G_\mathcal {R}\). Recall that \(M_v^+\) denotes the initial compensation messages sent from v. Then for each unreset vertex \(v \in V_\mathcal {R}\), \(M_v^+ = {\mathcal {U}}\circ \hat{\mathcal {G}}(x_v^k) {\setminus } {\mathcal {U}}\circ \mathcal {G}_\mathcal {R}(x_v^k)\). This expression also indicates that no actual message is created at an unreset v if none of v's adjacent edges has evolved or leads to a reset vertex. Based on this observation and Lemma 3, we have

$$\begin{aligned} \begin{aligned} \hat{\mathbb {M}}^r&= \bigcup _{v\in V_\mathcal {R}} \big ( {\mathcal {U}}\circ \hat{\mathcal {G}}(x_v^k) \setminus {\mathcal {U}}\circ \mathcal {G}_\mathcal {R}(x_v^k) \big ) \\&= \bigcup _{v\in V_\mathcal {R}} \bigcup _{\ell = 0}^k \big ( {\mathcal {U}}\circ \hat{\mathcal {G}}(m_v^\ell ) \setminus {\mathcal {U}}\circ \mathcal {G}_\mathcal {R}(m_v^\ell ) \big ) \\&= \bigcup _{\ell = 0}^k \big ( {\mathcal {U}}\circ \hat{\mathcal {G}}(\mathbb {M}_\mathcal {R}^\ell ) \setminus {\mathcal {U}}\circ \mathcal {G}_\mathcal {R}(\mathbb {M}_\mathcal {R}^\ell ) \big ). \end{aligned} \end{aligned}$$

Putting this and the fact that \(\hat{\mathbb {M}}^c {=} \mathbb {M}_\mathcal {C}^0\) together, i.e., the initial messages constructed at reset vertices are the same as their counterparts created in the changed graph, we have \(\hat{\mathbb {M}}^0 = \mathbb {M}^0_\mathcal {C}\cup \bigcup ^{k}_{\ell =0}\left( {\mathcal {U}}\circ \hat{\mathcal {G}}( \mathbb {M}^\ell _\mathcal {R}){\setminus } {\mathcal {U}}\circ \mathcal {G}_\mathcal {R}(\mathbb {M}^\ell _\mathcal {R})\right) \). \(\square \)
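A hedged sketch of \(M_v^+\) for unreset vertices follows, under the assumption of SSSP-style messages (state plus edge weight); the function and parameter names are illustrative, not the system's actual interface.

```python
# Re-send an unreset vertex's final state only over edges missing from
# the reserved graph G_R: evolved edges, or edges into reset vertices.

def compensation_messages(state, out_edges, reset, evolved):
    """state: dict v -> final state x_v^k; out_edges: dict v -> list of
    (w, weight); reset: set of reset vertices; evolved: set of (v, w)
    edges that were added or changed."""
    comp = {}
    for v, d in state.items():
        if v in reset:
            continue                  # reset vertices act as in G_C
        for w, wt in out_edges.get(v, []):
            if w in reset or (v, w) in evolved:   # edge not in G_R
                comp.setdefault(w, []).append(d + wt)
    return comp
```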


About this article


Cite this article

Gong, S., Tian, C., Yin, Q. et al. Ingress: an automated incremental graph processing system. The VLDB Journal 33, 781–806 (2024). https://doi.org/10.1007/s00778-024-00838-z

