Accelerating directed densest subgraph queries with software and hardware approaches

  • Regular Paper
  • Published in: The VLDB Journal

Abstract

Given a directed graph G, the directed densest subgraph (DDS) problem is to find a subgraph of G whose density is the highest among all subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fake follower detection and community mining. Theoretically, the DDS problem is closely connected to other fundamental graph problems, such as network flow and bipartite matching. However, existing DDS solutions suffer from efficiency and scalability issues. In this paper, we develop a convex-programming-based solution by transforming the DDS problem into a set of linear programs. Based on the duality of linear programs, we develop efficient exact and approximation algorithms. In particular, our approximation algorithm supports flexible parameterized approximation guarantees. We further investigate using GPUs to solve the convex programs in parallel, achieving speedups of hundreds of times over the original Frank–Wolfe computation. We have performed an extensive empirical evaluation of our approaches on eight large real-world datasets. The results show that our proposed algorithms are up to five orders of magnitude faster than the state of the art.
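To pin down the objective, here is a brute-force sketch in Python. It assumes the directed density \(\rho (S,T)=\frac{|E(S,T)|}{\sqrt{|S||T|}}\) that appears in the proof of Lemma 4.1 in the appendix, where E(S, T) is the set of edges from S to T. The helper names are hypothetical, and exhaustive search over all (S, T) pairs takes exponential time; it only illustrates the definition and is not one of the paper's algorithms.

```python
import itertools
import math


def directed_density(edges, S, T):
    """rho(S, T) = |E(S, T)| / sqrt(|S| * |T|), where E(S, T) holds the edges from S to T."""
    e_st = sum(1 for u, v in edges if u in S and v in T)
    return e_st / math.sqrt(len(S) * len(T))


def brute_force_dds(edges, n):
    """Try every pair of non-empty vertex subsets (S, T) of {0, ..., n-1}.

    Exponential in n: only usable on toy graphs, to make the definition concrete.
    """
    subsets = [frozenset(comb) for k in range(1, n + 1)
               for comb in itertools.combinations(range(n), k)]
    best = max(((S, T) for S in subsets for T in subsets),
               key=lambda pair: directed_density(edges, *pair))
    return best, directed_density(edges, *best)


# Toy directed graph on vertices 0..2.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
(S, T), rho = brute_force_dds(edges, 3)
print(sorted(S), sorted(T), rho)  # prints [0, 1] [1, 2] 1.5
```

Note that S and T may overlap: a DDS is defined by a pair of vertex sets, not by a single induced vertex set.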

Figures 1–13 and Algorithms 1–9 (not reproduced here).

Notes

  1. The dataset was provided by Dr. Saravanan Thirumuruganathan of QCRI, HBKU.

  2. A graph may have several directed densest subgraphs; our algorithm finds one of them.

  3. On a real GPU, a block can contain several warps, and each warp contains 32 threads. Here, we use small numbers for illustration.

  4. http://konect.uni-koblenz.de/networks/.

  5. https://github.com/chenhao-ma/DDS-convex-code.

  6. https://deci.ai/blog/measure-inference-time-deep-neural-networks/.

References

  1. Federal Aviation Administration: Air Traffic Control System Command Center. https://www.faa.gov (2019)

  2. Albert, R., Jeong, H., Barabási, A.L.: Internet: diameter of the world-wide web. Nature 401(6749), 130 (1999)

  3. Angel, A., Koudas, N., Sarkas, N., Srivastava, D., Svendsen, M., Tirthapura, S.: Dense subgraph maintenance under streaming edge weight updates for real-time story identification. VLDB J. 23(2), 175–199 (2014)

  4. Bahmani, B., Kumar, R., Vassilvitskii, S.: Densest subgraph in streaming and mapreduce. Proc. VLDB Endowm. 5(5), 454–465 (2012)

  5. Bhaskara, A., Charikar, M., Chlamtac, E., Feige, U., Vijayaraghavan, A.: Detecting high log-densities: an \(O(n^{1/4})\) approximation for densest k-subgraph. In: STOC, pp. 201–210. ACM (2010)

  6. Bhattacharya, S., Henzinger, M., Nanongkai, D., Tsourakakis, C.: Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In: STOC, pp. 173–182 (2015)

  7. Boob, D., Gao, Y., Peng, R., Sawlani, S., Tsourakakis, C., Wang, D., Wang, J.: Flowless: Extracting densest subgraphs without flow computations. In: Proceedings of The Web Conference 2020, WWW ’20, pp. 573–583. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3366423.3380140

  8. Buehrer, G., Chellapilla, K.: A scalable pattern mining approach to web graph compression with communities. In: WSDM, pp. 95–106. ACM (2008)

  9. Capocci, A., Servedio, V.D., Colaiori, F., Buriol, L.S., Donato, D., Leonardi, S., Caldarelli, G.: Preferential attachment in the growth of social networks: the internet encyclopedia Wikipedia. Phys. Rev. E 74(3), 036116 (2006)

  10. Charikar, M.: Greedy approximation algorithms for finding dense components in a graph. In: International Workshop on Approximation Algorithms for Combinatorial Optimization, pp. 84–95. Springer (2000)

  11. Chekuri, C., Quanrud, K., Torres, M.R.: Densest subgraph: Supermodularity, iterative peeling, and flow. In: Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1531–1555. SIAM (2022)

  12. Clauset, A., Shalizi, C.R., Newman, M.E.: Power-law distributions in empirical data. SIAM Rev. 51(4), 661–703 (2009)

  13. Danisch, M., Chan, T.H.H., Sozio, M.: Large scale density-friendly graph decomposition via convex programming. In: WWW, pp. 233–242. International World Wide Web Conferences Steering Committee (2017)

  14. Epasto, A., Lattanzi, S., Sozio, M.: Efficient densest subgraph computation in evolving graphs. In: Proceedings of the 24th International Conference on World Wide Web, pp. 300–310 (2015)

  15. Fang, Y., Yu, K., Cheng, R., Lakshmanan, L.V., Lin, X.: Efficient algorithms for densest subgraph discovery. Proc. VLDB Endowm. 12(11), 1719–1732 (2019)

  16. Frank, M., Wolfe, P., et al.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3(1–2), 95–110 (1956)

  17. Freeman, L.C., Webster, C.M., Kirke, D.M.: Exploring social structure using dynamic three-dimensional color images. Soc. Netw. 20(2), 109–118 (1998)

  18. Gibson, D., Kumar, R., Tomkins, A.: Discovering large dense subgraphs in massive graphs. In: PVLDB, pp. 721–732. VLDB Endowment (2005)

  19. Gionis, A., Tsourakakis, C.E.: Dense subgraph discovery: KDD 2015 tutorial. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2313–2314. ACM (2015)

  20. Goldberg, A.V.: Finding a Maximum Density Subgraph. University of California, Berkeley, CA (1984)

  21. Han, X., Cheng, R., Grubenmann, T., Maniu, S., Ma, C., Li, X.: Leveraging contextual graphs for stochastic weight completion in sparse road networks. In: SIAM International Conference on Data Mining. SIAM (2022)

  22. Han, X., Cheng, R., Ma, C., Grubenmann, T.: DeepTEA: Effective and efficient online time-dependent trajectory outlier detection. In: Proceedings of the VLDB Endowment (2022)

  23. Han, X., Dell’Aglio, D., Grubenmann, T., Cheng, R., Bernstein, A.: A framework for differentially-private knowledge graph embeddings. J. Web Semant. 72, 100696 (2022)

  24. Han, X., Grubenmann, T., Cheng, R., Wong, S.C., Li, X., Sun, W.: Traffic incident detection: A trajectory-based approach. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 1866–1869. IEEE (2020)

  25. Hooi, B., Song, H.A., Beutel, A., Shah, N., Shin, K., Faloutsos, C.: Fraudar: Bounding graph fraud in the face of camouflage. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 895–904. ACM (2016)

  26. Hu, S., Wu, X., Chan, T.H.: Maintaining densest subsets efficiently in evolving hypergraphs. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 929–938 (2017)

  27. Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: Proceedings of the 30th International Conference on Machine Learning, pp. 427–435 (2013)

  28. Java, A., Song, X., Finin, T., Tseng, B.: Why we twitter: understanding microblogging usage and communities. In: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, pp. 56–65. ACM (2007)

  29. Kannan, R., Vinay, V.: Analyzing the structure of large graphs. Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn (1999)

  30. Karlebach, G., Shamir, R.: Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9(10), 770–780 (2008)

  31. Khuller, S., Saha, B.: On finding dense subgraphs. In: International Colloquium on Automata, Languages, and Programming, pp. 597–608. Springer (2009)

  32. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM (JACM) 46(5), 604–632 (1999)

  33. Kunegis, J.: KONECT – The Koblenz Network Collection. In: WWW, pp. 1343–1350 (2013). http://userpages.uni-koblenz.de/~kunegis/paper/kunegis-koblenz-network-collection.pdf

  34. Lakshmanan, L.V.: On a quest for combating filter bubbles and misinformation. In: Proceedings of the 2022 International Conference on Management of Data, p. 2 (2022)

  35. Leskovec, J., Adamic, L.A., Huberman, B.A.: The dynamics of viral marketing. ACM Trans. Web 1(1) (2007)

  36. Li, X., Cheng, R., Chang, K.C.C., Shan, C., Ma, C., Cao, H.: On analyzing graphs with motif-paths. Proc. VLDB Endowm. 14(6), 1111–1123 (2021)

  37. Luo, W., Ma, C., Fang, Y., Lakshmanan, L.V.S.: A survey of densest subgraph discovery on large graphs (2023)

  38. Ma, C., Cheng, R., Lakshmanan, L.V., Grubenmann, T., Fang, Y., Li, X.: Linc: a motif counting algorithm for uncertain graphs. Proc. VLDB Endowm. 13(2), 155–168 (2019)

  39. Ma, C., Cheng, R., Lakshmanan, L.V., Han, X.: Finding locally densest subgraphs: a convex programming approach. Proc. VLDB Endowm. 15(11), 2719–2732 (2022)

  40. Ma, C., Fang, Y., Cheng, R., Lakshmanan, L.V., Zhang, W., Lin, X.: Efficient algorithms for densest subgraph discovery on large directed graphs. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 1051–1066 (2020)

  41. Ma, C., Fang, Y., Cheng, R., Lakshmanan, L.V., Zhang, W., Lin, X.: Efficient directed densest subgraph discovery. ACM SIGMOD Rec. 50(1), 33–40 (2021)

  42. Ma, C., Fang, Y., Cheng, R., Lakshmanan, L.V., Zhang, W., Lin, X.: On directed densest subgraph discovery. TODS 46(4), 1–45 (2021)

  43. Massa, P., Salvetti, M., Tomasoni, D.: Bowling alone and trust decline in social network sites. In: Proc. Int. Conf. Dependable, Autonomic and Secure Computing, pp. 658–663 (2009)

  44. Mitzenmacher, M., Pachocki, J., Peng, R., Tsourakakis, C., Xu, S.C.: Scalable large near-clique detection in large-scale networks via sampling. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 815–824. ACM (2015)

  45. Mukherjee, A., Liu, B., Glance, N.: Spotting fake reviewer groups in consumer reviews. In: WWW, pp. 191–200 (2012)

  46. Niu, X., Sun, X., Wang, H., Rong, S., Qi, G., Yu, Y.: Zhishi.me – weaving Chinese linking open data. In: Proc. Int. Semantic Web Conf., pp. 205–220 (2011)

  47. Opsahl, T., Agneessens, F., Skvoretz, J.: Node centrality in weighted networks: generalizing degree and shortest paths. Soc. Netw. 32(3), 245–251 (2010)

  48. Orlin, J.B.: Max flows in \(O(nm)\) time, or better. In: STOC, pp. 765–774 (2013)

  49. Prakash, B.A., Sridharan, A., Seshadri, M., Machiraju, S., Faloutsos, C.: Eigenspokes: Surprising patterns and scalable community chipping in large graphs. In: PAKDD, pp. 435–448. Springer (2010)

  50. Qin, L., Li, R.H., Chang, L., Zhang, C.: Locally densest subgraph discovery. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 965–974. ACM (2015)

  51. Rossi, R., Ahmed, N.: Network repository (2013). http://networkrepository.com

  52. Sarma, A.D., Lall, A., Nanongkai, D., Trehan, A.: Dense subgraphs on dynamic networks. In: International Symposium on Distributed Computing, pp. 151–165. Springer (2012)

  53. Sawlani, S., Wang, J.: Near-optimal fully dynamic densest subgraph. In: STOC, pp. 181–193 (2020)

  54. Shiloach, Y., Vishkin, U.: An \(O(n^{2}\log n)\) parallel max-flow algorithm. J. Algorithms 3(2), 128–146 (1982)

  55. Stratton, J.A., Anssari, N., Rodrigues, C., Sung, I.J., Obeid, N., Chang, L., Liu, G.D., Hwu, W.-m.: Optimization and architecture effects on GPU computing workload performance. In: 2012 Innovative Parallel Computing (InPar), pp. 1–10. IEEE (2012)

  56. Sun, B., Danisch, M., Chan, H., Sozio, M.: KClist++: a simple algorithm for finding k-clique densest subgraphs in large graphs. Proc. VLDB Endowm. 13(10), 1628–1640 (2020)

  57. Tatti, N., Gionis, A.: Density-friendly graph decomposition. In: WWW, pp. 1089–1099. International World Wide Web Conferences Steering Committee (2015)

  58. Tsourakakis, C.: The k-clique densest subgraph problem. In: WWW, pp. 1122–1132. International World Wide Web Conferences Steering Committee (2015)

  59. Xu, Y., Ma, C., Fang, Y., Bao, Z.: Efficient and effective algorithms for generalized densest subgraph discovery. Proc. ACM Manag. Data 1(2), 1–27 (2023)

Acknowledgements

This work was supported in part by NSFC under Grant 62102341, Basic and Applied Basic Research Fund in Guangdong Province under Grant 2023A1515011280 and 2022A1515010166, Guangdong Talent Program under Grant 2021QN02X826, Shenzhen Science and Technology Program under Grants JCYJ20220530143602006 and ZDSYS20211021111415025, the Fundamental Research Funds for the Central Universities (No. D5000230191), the University of Hong Kong (Projects 104005858 and 10400599), the Guangdong-Hong Kong-Macau Joint Laboratory Program 2020 (Project No: 2020B1212030009), The Hong Kong Jockey Club Charities Trust (HKJC), No. 260920140, and a grant from the Natural Sciences and Engineering Research Council of Canada.

Author information

Correspondence to Xiaolin Han.

Appendix

1.1 Convergence rate of Frank-Wolfe-DDS

The convergence analysis of Algorithm 1 is simpler when the objective function is differentiable [27], which, however, is not the case for \(\Vert \textbf{r} \Vert _{\infty }\) in \({\textsf{D}}{\textsf{P}}(c)\) (4.6). Hence, we construct a convex program with a differentiable objective function that shares with \({\textsf{D}}{\textsf{P}}(c)\) both the optimal solution and the minimizer (4.8) of the linearized objective at a given point.

$$\begin{aligned} \begin{aligned} {\textsf{C}}{\textsf{P}}(c)\text { } \min{} & {} f(\alpha , \beta )=\frac{1}{4\sqrt{c}}\sum _{u\in V} r_{\alpha }(u)^{2} + \frac{\sqrt{c}}{4}\sum _{v\in V}r_{\beta }(v)^{2}&\\ \text {s.t.}{} & {} \alpha , \beta , \textbf{r} \text { satisfy the constraints in } {\textsf{D}}{\textsf{P}}(c). \end{aligned} \end{aligned}$$
(9.1)

One can verify that (4.8) also minimizes the linear function defined by \(\partial f(\alpha , \beta )\). Hence, Algorithm 1 applies to both \({\textsf{D}}{\textsf{P}}(c)\) and \({\textsf{C}}{\textsf{P}}(c)\). Furthermore, the following two lemmas show that an optimal solution of \({\textsf{C}}{\textsf{P}}(c)\) induces the c-biased DDS, which is also what an optimal solution of \({\textsf{D}}{\textsf{P}}(c)\) yields.

Lemma 9.1

Suppose that an optimal solution \((\alpha , \beta )\) of \({\textsf{C}}{\textsf{P}}(c)\) induces the density vector \(\textbf{r}\in {\mathbb {R}}_{+}^{2|V|}\). Then, for every edge \((u,v) \in E\), we have

  1. \(r_{\alpha }(u)>r_{\beta }(v) \Rightarrow \alpha _{u,v} = 0, \beta _{v,u} = 1\);

  2. \(r_{\alpha }(u)<r_{\beta }(v) \Rightarrow \beta _{v,u} = 0, \alpha _{u,v} = 1\).

Proof

We prove the lemma by contradiction. For (1), suppose \(\alpha _{u,v} > 0\). Since \(\frac{\partial f}{\partial \alpha _{u,v}}=r_{\alpha }(u) > \frac{\partial f}{\partial \beta _{v,u}}=r_{\beta }(v)\), there exists \(\epsilon > 0\) such that decreasing \(\alpha _{u,v}\) by \(\epsilon \) and increasing \(\beta _{v,u}\) by \(\epsilon \) strictly decreases the objective function. This contradicts the optimality of \((\alpha , \beta )\). Statement (2) follows symmetrically. \(\square \)
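The exchange argument rests on the identities \(\frac{\partial f}{\partial \alpha _{u,v}}=r_{\alpha }(u)\) and \(\frac{\partial f}{\partial \beta _{v,u}}=r_{\beta }(v)\). The Python sketch below checks them with central differences on a toy graph. Because the constraints of \({\textsf{D}}{\textsf{P}}(c)\) are not restated in this appendix, it assumes the scaling \(r_{\alpha }(u)=2\sqrt{c}\sum _{v}\alpha _{u,v}\) and \(r_{\beta }(v)=\frac{2}{\sqrt{c}}\sum _{u}\beta _{v,u}\), with \(\alpha _{u,v}+\beta _{v,u}=1\) on each edge. This scaling is our inference, chosen because it reproduces both the partial derivatives used in this proof and the row sums \(2\sqrt{c}d^{+}\) and \(\frac{2}{\sqrt{c}}d^{-}\) used in the proof of Lemma 9.3. All function names are hypothetical.

```python
import math


def density_vector(edges, n, alpha, beta, c):
    """Assumed scaling: r_alpha(u) = 2*sqrt(c) * sum_v alpha[u,v] and
    r_beta(v) = (2/sqrt(c)) * sum_u beta[v,u]; alpha[k], beta[k] belong to edges[k] = (u, v)."""
    sc = math.sqrt(c)
    r_a, r_b = [0.0] * n, [0.0] * n
    for (u, v), a, b in zip(edges, alpha, beta):
        r_a[u] += 2 * sc * a
        r_b[v] += 2 / sc * b
    return r_a, r_b


def cp_objective(edges, n, alpha, beta, c):
    """f(alpha, beta) of CP(c) under the assumed scaling."""
    r_a, r_b = density_vector(edges, n, alpha, beta, c)
    sc = math.sqrt(c)
    return sum(x * x for x in r_a) / (4 * sc) + sc / 4 * sum(x * x for x in r_b)


def numeric_partials(edges, n, alpha, beta, c, k, h=1e-5):
    """Central differences of f with respect to alpha[k] and beta[k]."""
    def f_at(da, db):
        a, b = list(alpha), list(beta)
        a[k] += da
        b[k] += db
        return cp_objective(edges, n, a, b, c)
    d_alpha = (f_at(h, 0.0) - f_at(-h, 0.0)) / (2 * h)
    d_beta = (f_at(0.0, h) - f_at(0.0, -h)) / (2 * h)
    return d_alpha, d_beta


edges = [(0, 1), (0, 2), (1, 2)]
alpha = [0.7, 0.4, 0.5]
beta = [1 - a for a in alpha]  # assumed constraint: each edge splits one unit of weight
c = 2.0
r_a, r_b = density_vector(edges, 3, alpha, beta, c)
for k, (u, v) in enumerate(edges):
    d_alpha, d_beta = numeric_partials(edges, 3, alpha, beta, c, k)
    print(k, round(d_alpha, 6), round(r_a[u], 6), round(d_beta, 6), round(r_b[v], 6))
```

On edge (0, 1), \(r_{\alpha }(0)>r_{\beta }(1)\), so shifting a little weight from \(\alpha _{0,1}\) to \(\beta _{1,0}\) lowers f, which is exactly the move used in the proof.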

To simplify notation, we denote by \({\mathcal {D}}_{c}\) the common feasible set of \({\textsf{D}}{\textsf{P}}(c)\) and \({\textsf{C}}{\textsf{P}}(c)\), since the two programs share the same constraints.

Lemma 9.2

Suppose a non-empty subset pair \((S,T)\), where \(S, T \subseteq V\), is stable with respect to \((\alpha , \beta , \textbf{r}) \in {\mathcal {D}}_{c}\), and suppose that there exists \(\rho _{c}^{*} \in {\mathbb {R}}\) such that \(r_{\alpha }(u)=\rho _{c}^{*}\) for all \(u \in S\) and \(r_{\beta }(v)=\rho _{c}^{*}\) for all \(v \in T\). Then, \(G[S,T]\) is the c-biased DDS and has c-biased density \(\rho _{c}^{*}\).

Proof

As \((\alpha , \beta , \textbf{r})\) is a feasible solution of \({\textsf{D}}{\textsf{P}}(c)\) and \((S,T)\) is stable, the objective value of \((\alpha , \beta , \textbf{r})\) is \(\Vert \textbf{r} \Vert _{\infty }=\rho _{c}^{*}\). Moreover, since \((S,T)\) is stable, \(\rho _{c}(S,T)=\frac{2\sqrt{c c'}}{c+c'}\cdot \frac{|E(S,T)|}{\sqrt{|S||T|}}=\rho _{c}^{*}\), where \(c'=\frac{|S|}{|T|}\); this follows from \(|E(S, T)|= (\frac{|S|}{2\sqrt{c}}+\frac{\sqrt{c}|T|}{2})\rho _{c}(S, T)\). By Lemma 4.1, \((S,T)\) gives a feasible primal solution of \({\textsf{L}}{\textsf{P}}(c)\) with objective value \(\rho _{c}^{*}\). Hence, \(\rho _{c}^{*}\) is the optimal value of both \({\textsf{L}}{\textsf{P}}(c)\) and \({\textsf{D}}{\textsf{P}}(c)\), which means that \(G[S,T]\) is the c-biased DDS. \(\square \)
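The equivalence between the two expressions for \(\rho _{c}(S,T)\) used in this proof can be checked directly. With \(c'=\frac{|S|}{|T|}\), we have \(\frac{\sqrt{c'}}{\sqrt{|S||T|}}=\frac{1}{|T|}\) and \((c+c')|T|=|S|+c|T|\), so

$$\begin{aligned} \rho _{c}(S,T)=\frac{2\sqrt{c c'}}{c+c'}\cdot \frac{|E(S,T)|}{\sqrt{|S||T|}} =\frac{2\sqrt{c}\,|E(S,T)|}{(c+c')|T|} =\frac{2\sqrt{c}\,|E(S,T)|}{|S|+c|T|} =\frac{|E(S,T)|}{\frac{|S|}{2\sqrt{c}}+\frac{\sqrt{c}|T|}{2}}. \end{aligned}$$

Rearranging the last expression gives \(|E(S, T)|= (\frac{|S|}{2\sqrt{c}}+\frac{\sqrt{c}|T|}{2})\rho _{c}(S, T)\).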

Lemmas 9.1 and 9.2 imply that an optimal solution \((\alpha , \beta , \textbf{r})\) of \({\textsf{C}}{\textsf{P}}(c)\) induces the c-biased DDS \(G[S_{c}^{*}, T_{c}^{*}]\) in G, where \(S_{c}^{*}=\{u | r_{\alpha }(u)=\Vert \textbf{r} \Vert _{\infty }\}\) and \(T_{c}^{*}=\{v | r_{\beta }(v)=\Vert \textbf{r} \Vert _{\infty }\}\).

We have thus confirmed that Algorithm 1 applies to both \({\textsf{D}}{\textsf{P}}(c)\) and \({\textsf{C}}{\textsf{P}}(c)\) and that the optimal solutions of both programs induce the c-biased DDS. We therefore use \({\textsf{C}}{\textsf{P}}(c)\) to analyze the convergence rate of Algorithm 1. Following the convergence analyses of Frank–Wolfe-based algorithms in [13, 27], the convergence rate of our algorithm is governed by the graph-dependent quantity \(Q_{c}=\frac{1}{2}\textsf{Diam}({\mathcal {D}}_{c})^{2}\sup _{(\alpha ,\beta )\in {\mathcal {D}}_{c}}\Vert \nabla ^{2}f(\alpha ,\beta ) \Vert _{2}\), where \(\textsf{Diam}({\mathcal {D}}_{c})\) is the diameter of \({\mathcal {D}}_{c}\), \(\nabla ^{2}f(\alpha ,\beta )\) is the Hessian of f, and \(\Vert \cdot \Vert _{2}\) is the spectral norm of a matrix.

Theorem 9.1

(Convergence Rate of Frank–Wolfe [27]) Suppose \((\alpha ^{*}, \beta ^{*}) \in {\mathcal {D}}_{c}\) is an optimal solution of \({\textsf{C}}{\textsf{P}}(c)\). Then, for all \(i\ge 1\), \(f(\alpha ^{(i)}, \beta ^{(i)}) - f(\alpha ^{*}, \beta ^{*}) \le \frac{2Q_{c}}{i+2}\).
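A minimal Frank–Wolfe sketch for \({\textsf{C}}{\textsf{P}}(c)\) follows, in Python, so the rate can be observed on a toy graph. It rests on assumptions not restated in this appendix: each edge (u, v) carries \(\alpha _{u,v}+\beta _{v,u}=1\) with \(\alpha , \beta \ge 0\), \(r_{\alpha }(u)=2\sqrt{c}\sum _{v}\alpha _{u,v}\), and \(r_{\beta }(v)=\frac{2}{\sqrt{c}}\sum _{u}\beta _{v,u}\). Under these assumptions, minimizing the linearized objective puts each edge's unit weight on the endpoint with the smaller partial derivative; we assume, without having checked it against (4.8), that this matches the update of Algorithm 1. The step size \(\frac{2}{i+2}\) is the standard choice in [27]. For comparison, the exact value uses \(\rho _{c}(S,T)=\frac{2\sqrt{c}\,|E(S,T)|}{|S|+c|T|}\), which is algebraically equal to the expression in the proof of Lemma 9.2. All names are hypothetical.

```python
import itertools
import math


def densities(edges, n, alpha, c):
    """r_alpha and r_beta under the assumed scaling, with beta_{v,u} = 1 - alpha_{u,v}."""
    sc = math.sqrt(c)
    r_a, r_b = [0.0] * n, [0.0] * n
    for (u, v), a in zip(edges, alpha):
        r_a[u] += 2 * sc * a
        r_b[v] += 2 / sc * (1.0 - a)
    return r_a, r_b


def frank_wolfe_cp(edges, n, c, iters):
    """Run `iters` Frank-Wolfe steps on CP(c); return ||r||_inf of the final iterate."""
    alpha = [0.5] * len(edges)  # feasible start: half of each edge to each side
    for i in range(iters):
        r_a, r_b = densities(edges, n, alpha, c)
        # Linear minimization oracle: df/dalpha_{u,v} = r_a[u], df/dbeta_{v,u} = r_b[v],
        # so each edge sends its whole unit to the side with the smaller partial derivative.
        target = [1.0 if r_a[u] <= r_b[v] else 0.0 for (u, v) in edges]
        gamma = 2.0 / (i + 2)  # standard Frank-Wolfe step size
        alpha = [(1 - gamma) * a + gamma * t for a, t in zip(alpha, target)]
    r_a, r_b = densities(edges, n, alpha, c)
    return max(max(r_a), max(r_b))


def brute_force_rho_c(edges, n, c):
    """Exact c-biased density of the c-biased DDS by exhaustive search (toy graphs only)."""
    subsets = [set(comb) for k in range(1, n + 1)
               for comb in itertools.combinations(range(n), k)]
    best = 0.0
    for S in subsets:
        for T in subsets:
            e_st = sum(1 for u, v in edges if u in S and v in T)
            best = max(best, 2 * math.sqrt(c) * e_st / (len(S) + c * len(T)))
    return best


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
print(brute_force_rho_c(edges, 3, 1.0), frank_wolfe_cp(edges, 3, 1.0, 20000))
```

Every feasible iterate satisfies \(\Vert \textbf{r} \Vert _{\infty }\ge \rho _{c}^{*}\), so the Frank–Wolfe value approaches the exact one from above.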

Lemma 9.3

(Bounding \(Q_{c}\)) Given a directed graph \(G=(V,E)\) with maximum outdegree \(d^{+}_{\max }\) and maximum indegree \(d^{-}_{\max }\), and a parameter c, we have \(Q_{c} \le 2|E|\max \{\sqrt{c}d^{+}_{\max },\frac{1}{\sqrt{c}}d^{-}_{\max }\}\).

Proof

First, \(\textsf{Diam}({\mathcal {D}}_{c})=\sqrt{2|E|}\). The Hessian of \(f(\alpha ,\beta )\) does not depend on \((\alpha , \beta )\), and it is a symmetric matrix with nonnegative entries. Therefore, \(\sup _{(\alpha ,\beta )\in {\mathcal {D}}_{c}}\Vert \nabla ^{2}f(\alpha ,\beta ) \Vert _{2}\) is simply the maximum singular value of \(\nabla ^{2}f(\alpha ,\beta )\). Let \(A=\nabla ^{2}f(\alpha ,\beta )\), let \(\lambda _{1}\) be the maximum singular value of A (which is also its maximum eigenvalue, since A is positive semidefinite as the Hessian of a convex function), let x be an eigenvector associated with \(\lambda _{1}\), and let p be the index of the component of x with the maximum absolute value. Without loss of generality, we assume \(x_{p}>0\). We have

$$\begin{aligned} \lambda _{1} x_{p} = (Ax)_{p}=\sum _{q=1}^{2|E|}A_{p,q} x_{q}&\le \sum _{q=1}^{2|E|}A_{p,q}x_{p} \\&\le x_{p} \max \{2\sqrt{c}d^{+}_{\max },\frac{2}{\sqrt{c}}d^{-}_{\max } \}. \end{aligned}$$

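Here, the first inequality holds because \(A_{p,q}\ge 0\) and \(x_{q}\le |x_{q}|\le x_{p}\) for all q, and the second because every row sum of A is at most \(\max \{2\sqrt{c}d^{+}_{\max },\frac{2}{\sqrt{c}}d^{-}_{\max }\}\). Hence, \(\lambda _{1}\le \max \{2\sqrt{c}d^{+}_{\max },\frac{2}{\sqrt{c}}d^{-}_{\max }\}\), and therefore \(Q_{c}=\frac{1}{2}\cdot 2|E|\cdot \lambda _{1} \le 2|E|\max \{\sqrt{c}d^{+}_{\max },\frac{1}{\sqrt{c}}d^{-}_{\max }\}\). \(\square \)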
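The bound can be checked numerically on a toy graph. The Python sketch below assumes (our inference, not restated in this appendix) that \(r_{\alpha }(u)=2\sqrt{c}\sum _{v}\alpha _{u,v}\) and \(r_{\beta }(v)=\frac{2}{\sqrt{c}}\sum _{u}\beta _{v,u}\), so that the Hessian has entry \(2\sqrt{c}\) between two \(\alpha \) variables sharing a tail, \(\frac{2}{\sqrt{c}}\) between two \(\beta \) variables sharing a head, and 0 elsewhere. It builds that \(2|E|\times 2|E|\) matrix, estimates \(\lambda _{1}\) by power iteration, and compares \(\frac{1}{2}\textsf{Diam}({\mathcal {D}}_{c})^{2}\lambda _{1}\) with the bound of Lemma 9.3. Function names are hypothetical.

```python
import math


def hessian_cp(edges, c):
    """Hessian of f over the 2|E| variables (alpha_k for k < m, then beta_k), assuming
    r_alpha(u) = 2*sqrt(c)*sum_v alpha[u,v] and r_beta(v) = (2/sqrt(c))*sum_u beta[v,u]."""
    m, sc = len(edges), math.sqrt(c)
    H = [[0.0] * (2 * m) for _ in range(2 * m)]
    for k, (u, v) in enumerate(edges):
        for l, (x, y) in enumerate(edges):
            if u == x:  # alpha_k and alpha_l share the tail u
                H[k][l] = 2 * sc
            if v == y:  # beta_k and beta_l share the head v
                H[m + k][m + l] = 2 / sc
    return H


def largest_eigenvalue(H, iters=200):
    """Power iteration from the all-ones vector (H is PSD with nonnegative entries)."""
    x = [1.0] * len(H)
    lam = 0.0
    for _ in range(iters):
        y = [sum(h * xi for h, xi in zip(row, x)) for row in H]
        norm_y = math.sqrt(sum(t * t for t in y))
        lam = norm_y / math.sqrt(sum(t * t for t in x))
        x = [t / norm_y for t in y]
    return lam


def q_c_and_bound(edges, c):
    """Return (Q_c estimated from the Hessian, the upper bound of Lemma 9.3)."""
    m, sc = len(edges), math.sqrt(c)
    lam = largest_eigenvalue(hessian_cp(edges, c))
    q_c = 0.5 * (2 * m) * lam  # Diam(D_c)^2 = 2|E|
    d_out = max(sum(1 for u, _ in edges if u == w) for w, _ in edges)
    d_in = max(sum(1 for _, v in edges if v == w) for _, w in edges)
    bound = 2 * m * max(sc * d_out, d_in / sc)
    return q_c, bound


print(q_c_and_bound([(0, 1), (0, 2), (1, 2), (2, 0)], 4.0))
```

Under this assumed scaling the Hessian is block diagonal with constant blocks, so on this toy graph the estimate and the bound coincide.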

Lemma 9.4

Suppose \((\alpha , \beta , \textbf{r}) \in {\mathcal {D}}_{c}\), and let \(\varepsilon := \Vert \textbf{r} \Vert _{\infty } - \rho _{c}^{*}\), where \(\rho _{c}^{*}=\Vert \textbf{r}^{*}\Vert _{\infty }\) and \((\alpha ^{*}, \beta ^{*}, \textbf{r}^{*})\) is the optimal solution of \({\textsf{D}}{\textsf{P}}(c)\). Then, \((4\sqrt{c}+\frac{4}{\sqrt{c}})\cdot \left( f(\alpha , \beta ) - f(\alpha ^{*}, \beta ^{*}) \right) \ge \varepsilon ^{2}\).

Proof

First, \(f(\alpha , \beta )-f(\alpha ^{*}, \beta ^{*}) \ge f(\alpha - \alpha ^{*}, \beta - \beta ^{*})\), because \(f(\alpha , \beta ) - f(\alpha ^{*}, \beta ^{*}) - f(\alpha - \alpha ^{*}, \beta - \beta ^{*})\) is an affine function on \({\mathcal {D}}_{c}\) that, by the first-order optimality of \((\alpha ^{*}, \beta ^{*})\), attains its minimum value 0 at \((\alpha ^{*}, \beta ^{*})\). Second, \(f(\alpha - \alpha ^{*}, \beta - \beta ^{*})\) can be bounded in terms of the \(l^{2}\)-norm of \(\textbf{r}-\textbf{r}^{*}\), i.e., \((4\sqrt{c} + \frac{4}{\sqrt{c}})f(\alpha - \alpha ^{*}, \beta - \beta ^{*}) \ge \Vert \textbf{r} - \textbf{r}^{*} \Vert _{2}^{2}\). Third, relating the infinity norm to the \(l^{2}\)-norm, \(\Vert \textbf{r} \Vert _{\infty } - \rho _{c}^{*} \le \Vert \textbf{r} - \textbf{r}^{*} \Vert _{\infty } \le \Vert \textbf{r} - \textbf{r}^{*} \Vert _{2}\). Combining these inequalities proves the lemma. \(\square \)

Corollary 9.1

(Convergence of Algorithm 1) Suppose \(d^{+}_{\max }\) (resp. \(d^{-}_{\max }\)) is the maximum outdegree (resp. indegree) of G and c is fixed. In Algorithm 1, for \(i > 16(\sqrt{c} + \frac{1}{\sqrt{c}}) \frac{|E|\max \{\sqrt{c}d^{+}_{\max },\frac{1}{\sqrt{c}}d^{-}_{\max }\}}{\varepsilon ^{2}}\), we have \(\Vert \textbf{r}^{(i)} \Vert _{\infty } - \rho _{c}^{*} \le \varepsilon \).
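As a worked instance of Corollary 9.1, the hypothetical helper below evaluates the stated threshold for a given edge list, c, and \(\varepsilon \), and returns the smallest integer iteration count that exceeds it. It only evaluates the formula; it does not run Algorithm 1.

```python
import math
from collections import Counter


def fw_iteration_bound(edges, c, eps):
    """Smallest integer i with
    i > 16 (sqrt(c) + 1/sqrt(c)) |E| max{sqrt(c) d+_max, d-_max / sqrt(c)} / eps^2."""
    d_out = max(Counter(u for u, _ in edges).values())  # maximum outdegree
    d_in = max(Counter(v for _, v in edges).values())   # maximum indegree
    sc = math.sqrt(c)
    threshold = 16 * (sc + 1 / sc) * len(edges) * max(sc * d_out, d_in / sc) / eps ** 2
    return math.floor(threshold) + 1


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
print(fw_iteration_bound(edges, 1.0, 0.5))  # prints 1025
```

Because the threshold scales as \(1/\varepsilon ^{2}\), halving \(\varepsilon \) roughly quadruples the guaranteed iteration count (4097 for \(\varepsilon =0.25\) on the same graph).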

1.2 Proofs

Proof of Lemma 4.1

We prove the lemma by exhibiting a feasible solution \((x, s, t, a, b)\) of \({\textsf{L}}{\textsf{P}}(c)\). Let \(c'=\frac{|P|}{|Q|}\), \(a=\frac{2c'}{c+c'}\), and \(b=\frac{2c}{c+c'}\). For each vertex \(u\in P\), set \(s_{u}=\frac{a\sqrt{c}}{|P|}=\frac{2c'\sqrt{c}}{(c+c')|P|}.\) For each vertex \(v\in Q\), set \(t_{v}=\frac{b}{\sqrt{c}|Q|}=\frac{2c}{(c+c')\sqrt{c}|Q|}=\frac{2c'\sqrt{c}}{(c+c')|P|}\), where the last equality uses \(c'|Q|=|P|\). For each edge \((u,v)\in E(P,Q)\), set \(x_{u,v}=s_{u}=t_{v}\). All remaining variables are set to 0. Then, \(\sum _{u\in V}s_{u}=a\sqrt{c}\) and \(\sum _{v\in V}t_{v}=\frac{b}{\sqrt{c}}\), so this is a feasible solution of \({\textsf{L}}{\textsf{P}}(c)\). Its objective value is

$$\begin{aligned}{} & {} \frac{2c'\sqrt{c}}{(c+c')|P|}|E(P,Q)|=\frac{2\sqrt{c}c'\sqrt{|Q|}}{(c+c')\sqrt{|P|}}\frac{|E(P,Q)|}{\sqrt{|P||Q|}}\\{} & {} \quad =\frac{2\sqrt{c}\sqrt{c'}}{c+c'}\rho (P,Q). \end{aligned}$$

Thus, the lemma holds. \(\square \)
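The construction can be checked numerically. The Python sketch below builds s, t, and x for a given pair (P, Q) exactly as in the proof, with \(c'=\frac{|P|}{|Q|}\), and compares the objective \(\sum x_{u,v}\) with \(\frac{2\sqrt{c}\sqrt{c'}}{c+c'}\rho (P,Q)\), where \(\rho (P,Q)=\frac{|E(P,Q)|}{\sqrt{|P||Q|}}\). Only the variables named in the proof are built; since the full constraint set of \({\textsf{L}}{\textsf{P}}(c)\) is not restated in this appendix, the sketch checks the stated sums and the objective value, not feasibility as a whole. The function name is hypothetical.

```python
import math


def lemma41_solution(edges, P, Q, c):
    """Values of (x, s, t, a, b) from the proof of Lemma 4.1; unset variables are 0."""
    c_prime = len(P) / len(Q)
    a = 2 * c_prime / (c + c_prime)
    b = 2 * c / (c + c_prime)
    s = {u: a * math.sqrt(c) / len(P) for u in P}
    t = {v: b / (math.sqrt(c) * len(Q)) for v in Q}
    x = {(u, v): s[u] for (u, v) in edges if u in P and v in Q}  # x_{u,v} = s_u = t_v
    return x, s, t, a, b


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
P, Q, c = {0, 1}, {1, 2}, 2.0
x, s, t, a, b = lemma41_solution(edges, P, Q, c)
c_prime = len(P) / len(Q)
rho = len(x) / math.sqrt(len(P) * len(Q))  # |E(P, Q)| / sqrt(|P||Q|)
predicted = 2 * math.sqrt(c) * math.sqrt(c_prime) / (c + c_prime) * rho
print(sum(x.values()), predicted)
```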

Proof of Lemma 4.2

Without loss of generality, we can assume that \(x_{u,v}=\min \{s_{u}, t_{v}\}\) for each \((u,v)\in E\). We define a family of vertex sets indexed by a parameter \(r\ge 0\): let \(S(r)=\{u|s_u \ge r\}\), \(T(r)=\{v|t_{v}\ge r\}\), and \(E(r)=\{(u,v)\in E|x_{u,v}\ge r\}\). Since \(x_{u,v}=\min \{s_{u}, t_{v}\}\), E(r) is precisely the set of edges that go from S(r) to T(r).

Now, \(\int _{0}^{\infty }|S(r)|\text {d}r=\sum _{u\in V}s_{u}=a\sqrt{c}\). Similarly, \(\int _{0}^{\infty }|T(r)|\text {d}r=\sum _{v\in V}t_{v}=\frac{b}{\sqrt{c}}\). By the Cauchy–Schwarz inequality,

$$\begin{aligned}{} & {} \int _{0}^{\infty }\sqrt{|S(r)||T(r)|}\text {d}r\\{} & {} \quad \le \sqrt{\left( \int _{0}^{\infty }|S(r)|\text {d}r\right) \left( \int _{0}^{\infty }|T(r)|\text {d}r\right) }=\sqrt{ab}. \end{aligned}$$

Note that \(\int _{0}^{\infty }|E(r)|\text {d}r=\sum _{(u,v)\in E}x_{u,v}\). This is the objective function value of the solution. Let this value be \(x_{\text {sum}}\).

We claim that there exists r such that \(\frac{|E(r)|}{\sqrt{|S(r)||T(r)|}}\ge \frac{x_{\text {sum}}}{\sqrt{ab}}\). Suppose that no such r exists. Then,

$$\begin{aligned} \int _{0}^{\infty }|E(r)|\text {d}r < \frac{x_{\text {sum}}}{\sqrt{ab}} \int _{0}^{\infty }\sqrt{|S(r)||T(r)|}\text {d}r \le x_{\text {sum}}. \end{aligned}$$

This contradicts \(\int _{0}^{\infty }|E(r)|\text {d}r=x_{\text {sum}}\). Thus, the lemma holds. \(\square \)
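The threshold argument also suggests a procedure: S(r) and T(r) change only when r crosses one of the values \(s_{u}\) or \(t_{v}\), so it suffices to try those values. The Python sketch below (hypothetical names) does that and returns the densest pair found; by the lemma, its density is at least \(\frac{x_{\text {sum}}}{\sqrt{ab}}\), where \(ab=\left( \sum _{u}s_{u}\right) \left( \sum _{v}t_{v}\right) \) because \(\sum _{u}s_{u}=a\sqrt{c}\) and \(\sum _{v}t_{v}=\frac{b}{\sqrt{c}}\).

```python
import math


def sweep_round(edges, s, t):
    """Try every threshold r among the positive s/t values; return the densest
    (S(r), T(r)) found, measured by |E(r)| / sqrt(|S(r)| |T(r)|)."""
    best, best_pair = 0.0, None
    for r in sorted({val for val in list(s.values()) + list(t.values()) if val > 0}):
        S = {u for u, val in s.items() if val >= r}
        T = {v for v, val in t.items() if val >= r}
        if not S or not T:
            continue
        e_r = sum(1 for u, v in edges if u in S and v in T)  # |E(r)|
        density = e_r / math.sqrt(len(S) * len(T))
        if density > best:
            best, best_pair = density, (S, T)
    return best, best_pair


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
s = {0: 0.5, 1: 0.3, 2: 0.1}
t = {0: 0.2, 1: 0.4, 2: 0.4}
x_sum = sum(min(s[u], t[v]) for u, v in edges)  # objective with x_{u,v} = min{s_u, t_v}
guarantee = x_sum / math.sqrt(sum(s.values()) * sum(t.values()))
best, (S, T) = sweep_round(edges, s, t)
print(best, sorted(S), sorted(T), guarantee)
```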

Proof of Lemma 5.3

As \(G[S_{c}^{*}, T_{c}^{*}]\) is the c-biased DDS with c-biased density \(\rho _{c}(S_{c}^{*}, T_{c}^{*})\), there must exist \(u\in S\) satisfying \(r_{\alpha }(u) \le \rho _{c}(S_{c}^{*}, T_{c}^{*})\), or \(v\in T\) satisfying \(r_{\beta }(v) \le \rho _{c}(S_{c}^{*}, T_{c}^{*})\); otherwise, \(G[S,T]\) would be a subgraph with a higher c-biased density than \(G[S_{c}^{*}, T_{c}^{*}]\).

We now prove the lemma by contradiction. Assume that \(G[S_{c}^{*},T_{c}^{*}]\) is not contained in \(G[S,T]\). Depending on whether \(G[S_{c}^{*},T_{c}^{*}]\) overlaps \(G[S,T]\), there are two cases.

  1. \(S_{c}^{*} \cap S = \emptyset \) and \(T_{c}^{*} \cap T = \emptyset \). Since \(|E(S_{c}^{*}, T_{c}^{*})|=(\frac{|S_{c}^{*}|}{\sqrt{c}}+\sqrt{c}|T_{c}^{*}|)\rho _{c}(S_{c}^{*}, T_{c}^{*})\), there exists \(u\in S_{c}^{*}\) with \(r_{\alpha }(u) \ge \rho _{c}(S_{c}^{*}, T_{c}^{*})\) or \(v\in T_{c}^{*}\) with \(r_{\beta }(v) \ge \rho _{c}(S_{c}^{*}, T_{c}^{*})\).

  2. \(S_{c}^{*} \cap S \ne \emptyset \) or \(T_{c}^{*} \cap T \ne \emptyset \).

    $$\begin{aligned} \begin{aligned}&|E(S_c^{*}, T_{c}^{*})| \\ =&|E(S_c^{*}\cap S, T_{c}^{*}\cap T)| + |E(S_c^{*}, T_{c}^{*})\setminus E(S, T)| \\ =&\left( \frac{|S_{c}^{*}\cap S|}{\sqrt{c}}+\sqrt{c}|T_{c}^{*}\cap T|\right) \rho _{c}(S_c^{*}\cap S, T_{c}^{*}\cap T) \\&+ \left( \frac{|S_{c}^{*}\setminus S|}{\sqrt{c}}+\sqrt{c}|T_{c}^{*}\setminus T|\right) \rho '_{c}. \end{aligned} \end{aligned}$$

    Since \(\rho _{c}(S_c^{*}\cap S, T_{c}^{*}\cap T) \le \rho _{c}(S_{c}^{*}, T_{c}^{*})\), we have \(\rho '_{c} \ge \rho _{c}(S_{c}^{*}, T_{c}^{*})\). Thus, there exists \(u\in S_{c}^{*}{\setminus } S, r_{\alpha }(u) \ge \rho _{c}(S_{c}^{*}, T_{c}^{*})\) or \(v\in T_{c}^{*}{\setminus } T, r_{\beta }(v) \ge \rho _{c}(S_{c}^{*}, T_{c}^{*})\).

In either case, combining the inequalities yields a contradiction with the first condition in the definition of a stable \((S,T)\)-induced subgraph. Hence, the lemma holds. \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, C., Fang, Y., Cheng, R. et al. Accelerating directed densest subgraph queries with software and hardware approaches. The VLDB Journal 33, 207–230 (2024). https://doi.org/10.1007/s00778-023-00805-0

  • DOI: https://doi.org/10.1007/s00778-023-00805-0
