Accelerating directed densest subgraph queries with software and hardware approaches

Ma, Chenhao; Fang, Yixiang; Cheng, Reynold; Lakshmanan, Laks V. S.; Han, Xiaolin; Li, Xiaodong

doi:10.1007/s00778-023-00805-0


Regular Paper
Published: 31 July 2023

Volume 33, pages 207–230, (2024)

The VLDB Journal

Chenhao Ma ORCID: orcid.org/0000-0002-3243-8512¹,
Yixiang Fang¹,
Reynold Cheng²,
Laks V. S. Lakshmanan³,
Xiaolin Han⁴ &
…
Xiaodong Li⁵


Abstract

Given a directed graph G, the directed densest subgraph (DDS) problem refers to finding a subgraph of G whose density is the highest among all subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fake follower detection and community mining. Theoretically, the DDS problem closely connects to other essential graph problems, such as network flow and bipartite matching. However, existing DDS solutions suffer from efficiency and scalability issues. In this paper, we develop a convex-programming-based solution by transforming the DDS problem into a set of linear programs. Based on the duality of linear programs, we develop efficient exact and approximation algorithms. In particular, our approximation algorithm supports flexible parameterized approximation guarantees. We further investigate using GPUs to solve the convex programs in parallel and achieve speedups of hundreds of times over the original Frank–Wolfe computation. We have performed an extensive empirical evaluation of our approaches on eight large real-world datasets. The results show that our proposed algorithms are up to five orders of magnitude faster than the state of the art.


Notes

The dataset is due to Dr. Saravanan Thirumuruganathan from QCRI, HBKU.
There might be several directed densest subgraphs of a graph, and our algorithm will find one of them.
In a real GPU, a block can have several warps, and a warp contains 32 threads. Here, we use small numbers for illustration.
http://konect.uni-koblenz.de/networks/.
https://github.com/chenhao-ma/DDS-convex-code.
https://deci.ai/blog/measure-inference-time-deep-neural-networks/.

References

Administration, F.A.: Air traffic control system command center. https://www.faa.gov (2019)
Albert, R., Jeong, H., Barabási, A.L.: Internet: diameter of the world-wide web. Nature 401(6749), 130 (1999)
Angel, A., Koudas, N., Sarkas, N., Srivastava, D., Svendsen, M., Tirthapura, S.: Dense subgraph maintenance under streaming edge weight updates for real-time story identification. VLDB J. 23(2), 175–199 (2014)
Bahmani, B., Kumar, R., Vassilvitskii, S.: Densest subgraph in streaming and mapreduce. Proc. VLDB Endowm. 5(5), 454–465 (2012)
Bhaskara, A., Charikar, M., Chlamtac, E., Feige, U., Vijayaraghavan, A.: Detecting high log-densities: an $O(n^{1/4})$ approximation for densest k-subgraph. In: STOC, pp. 201–210. ACM (2010)
Bhattacharya, S., Henzinger, M., Nanongkai, D., Tsourakakis, C.: Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In: STOC, pp. 173–182 (2015)
Boob, D., Gao, Y., Peng, R., Sawlani, S., Tsourakakis, C., Wang, D., Wang, J.: Flowless: Extracting densest subgraphs without flow computations. In: Proceedings of The Web Conference 2020, WWW ’20, pp. 573–583. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3366423.3380140
Buehrer, G., Chellapilla, K.: A scalable pattern mining approach to web graph compression with communities. In: WSDM, pp. 95–106. ACM (2008)
Capocci, A., Servedio, V.D., Colaiori, F., Buriol, L.S., Donato, D., Leonardi, S., Caldarelli, G.: Preferential attachment in the growth of social networks: the internet encyclopedia Wikipedia. Phys. Rev. E 74(3), 036116 (2006)
Charikar, M.: Greedy approximation algorithms for finding dense components in a graph. In: International Workshop on Approximation Algorithms for Combinatorial Optimization, pp. 84–95. Springer (2000)
Chekuri, C., Quanrud, K., Torres, M.R.: Densest subgraph: Supermodularity, iterative peeling, and flow. In: Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1531–1555. SIAM (2022)
Clauset, A., Shalizi, C.R., Newman, M.E.: Power-law distributions in empirical data. SIAM Rev. 51(4), 661–703 (2009)
Danisch, M., Chan, T.H.H., Sozio, M.: Large scale density-friendly graph decomposition via convex programming. In: WWW, pp. 233–242. International World Wide Web Conferences Steering Committee (2017)
Epasto, A., Lattanzi, S., Sozio, M.: Efficient densest subgraph computation in evolving graphs. In: Proceedings of the 24th International Conference on World Wide Web, pp. 300–310 (2015)
Fang, Y., Yu, K., Cheng, R., Lakshmanan, L.V., Lin, X.: Efficient algorithms for densest subgraph discovery. Proc. VLDB Endowm. 12(11), 1719–1732 (2019)
Frank, M., Wolfe, P., et al.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3(1–2), 95–110 (1956)
Freeman, L.C., Webster, C.M., Kirke, D.M.: Exploring social structure using dynamic three-dimensional color images. Soc. Netw. 20(2), 109–118 (1998)
Gibson, D., Kumar, R., Tomkins, A.: Discovering large dense subgraphs in massive graphs. In: PVLDB, pp. 721–732. VLDB Endowment (2005)
Gionis, A., Tsourakakis, C.E.: Dense subgraph discovery: Kdd 2015 tutorial. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2313–2314. ACM (2015)
Goldberg, A.V.: Finding a Maximum Density Subgraph. University of California Berkeley, CA (1984)
Han, X., Cheng, R., Grubenmann, T., Maniu, S., Ma, C., Li, X.: Leveraging contextual graphs for stochastic weight completion in sparse road networks. In: SIAM International Conference on Data Mining. SIAM (2022)
Han, X., Cheng, R., Ma, C., Grubenmann, T.: Deeptea: Effective and efficient online time-dependent trajectory outlier detection. In: Proceedings of the VLDB Endowment (2022)
Han, X., Dell’Aglio, D., Grubenmann, T., Cheng, R., Bernstein, A.: A framework for differentially-private knowledge graph embeddings. J. Web Semant. 72, 100696 (2022)
Han, X., Grubenmann, T., Cheng, R., Wong, S.C., Li, X., Sun, W.: Traffic incident detection: A trajectory-based approach. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 1866–1869. IEEE (2020)
Hooi, B., Song, H.A., Beutel, A., Shah, N., Shin, K., Faloutsos, C.: Fraudar: Bounding graph fraud in the face of camouflage. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 895–904. ACM (2016)
Hu, S., Wu, X., Chan, T.H.: Maintaining densest subsets efficiently in evolving hypergraphs. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 929–938 (2017)
Jaggi, M.: Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In: Proceedings of the 30th International Conference on Machine Learning, pp. 427–435 (2013)
Java, A., Song, X., Finin, T., Tseng, B.: Why we twitter: understanding microblogging usage and communities. In: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, pp. 56–65. ACM (2007)
Kannan, R., Vinay, V.: Analyzing the structure of large graphs. Rheinische Friedrich-Wilhelms-Universität Bonn Bonn (1999)
Karlebach, G., Shamir, R.: Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9(10), 770–780 (2008)
Khuller, S., Saha, B.: On finding dense subgraphs. In: International Colloquium on Automata, Languages, and Programming, pp. 597–608. Springer (2009)
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM (JACM) 46(5), 604–632 (1999)
Kunegis, J.: KONECT – The Koblenz Network Collection. In: WWW, pp. 1343–1350 (2013). http://userpages.uni-koblenz.de/~kunegis/paper/kunegis-koblenz-network-collection.pdf
Lakshmanan, L.V.: On a quest for combating filter bubbles and misinformation. In: Proceedings of the 2022 International Conference on Management of Data, pp. 2–2 (2022)
Leskovec, J., Adamic, L.A., Huberman, B.A.: The dynamics of viral marketing. ACM Trans. Web 1(1) (2007)
Li, X., Cheng, R., Chang, K.C.C., Shan, C., Ma, C., Cao, H.: On analyzing graphs with motif-paths. Proc. VLDB Endowm. 14(6), 1111–1123 (2021)
Luo, W., Ma, C., Fang, Y., Lakshmanan, L.V.S.: A survey of densest subgraph discovery on large graphs (2023)
Ma, C., Cheng, R., Lakshmanan, L.V., Grubenmann, T., Fang, Y., Li, X.: Linc: a motif counting algorithm for uncertain graphs. Proc. VLDB Endowm. 13(2), 155–168 (2019)
Ma, C., Cheng, R., Lakshmanan, L.V., Han, X.: Finding locally densest subgraphs: a convex programming approach. Proc. VLDB Endowm. 15(11), 2719–2732 (2022)
Ma, C., Fang, Y., Cheng, R., Lakshmanan, L.V., Zhang, W., Lin, X.: Efficient algorithms for densest subgraph discovery on large directed graphs. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 1051–1066 (2020)
Ma, C., Fang, Y., Cheng, R., Lakshmanan, L.V., Zhang, W., Lin, X.: Efficient directed densest subgraph discovery. ACM SIGMOD Rec. 50(1), 33–40 (2021)
Ma, C., Fang, Y., Cheng, R., Lakshmanan, L.V., Zhang, W., Lin, X.: On directed densest subgraph discovery. TODS 46(4), 1–45 (2021)
Massa, P., Salvetti, M., Tomasoni, D.: Bowling alone and trust decline in social network sites. In: Proc. Int. Conf. Dependable, Autonomic and Secure Computing, pp. 658–663 (2009)
Mitzenmacher, M., Pachocki, J., Peng, R., Tsourakakis, C., Xu, S.C.: Scalable large near-clique detection in large-scale networks via sampling. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 815–824. ACM (2015)
Mukherjee, A., Liu, B., Glance, N.: Spotting fake reviewer groups in consumer reviews. In: WWW, pp. 191–200 (2012)
Niu, X., Sun, X., Wang, H., Rong, S., Qi, G., Yu, Y.: Zhishi.me – weaving Chinese linking open data. In: Proc. Int. Semantic Web Conf., pp. 205–220 (2011)
Opsahl, T., Agneessens, F., Skvoretz, J.: Node centrality in weighted networks: generalizing degree and shortest paths. Soc. Netw. 32(3), 245–251 (2010)
Orlin, J.B.: Max flows in O(nm) time, or better. In: STOC, pp. 765–774 (2013)
Prakash, B.A., Sridharan, A., Seshadri, M., Machiraju, S., Faloutsos, C.: Eigenspokes: Surprising patterns and scalable community chipping in large graphs. In: PAKDD, pp. 435–448. Springer (2010)
Qin, L., Li, R.H., Chang, L., Zhang, C.: Locally densest subgraph discovery. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 965–974. ACM (2015)
Rossi, R., Ahmed, N.: Network repository (2013). http://networkrepository.com
Sarma, A.D., Lall, A., Nanongkai, D., Trehan, A.: Dense subgraphs on dynamic networks. In: International Symposium on Distributed Computing, pp. 151–165. Springer (2012)
Sawlani, S., Wang, J.: Near-optimal fully dynamic densest subgraph. In: STOC, pp. 181–193 (2020)
Shiloach, Y., Vishkin, U.: An $O(n^{2}\log n)$ parallel max-flow algorithm. J. Algorithms 3(2), 128–146 (1982)
Stratton, J.A., Anssari, N., Rodrigues, C., Sung, I.J., Obeid, N., Chang, L., Liu, G.D., Hwu, W.m.: Optimization and architecture effects on gpu computing workload performance. In: 2012 Innovative Parallel Computing (InPar), pp. 1–10. IEEE (2012)
Sun, B., Danisch, M., Chan, H., Sozio, M.: Kclist++: a simple algorithm for finding k-clique densest subgraphs in large graphs. Proc. VLDB Endowm. 13(10), 1628–1640 (2020)
Tatti, N., Gionis, A.: Density-friendly graph decomposition. In: WWW, pp. 1089–1099. International World Wide Web Conferences Steering Committee (2015)
Tsourakakis, C.: The k-clique densest subgraph problem. In: WWW, pp. 1122–1132. International World Wide Web Conferences Steering Committee (2015)
Xu, Y., Ma, C., Fang, Y., Bao, Z.: Efficient and effective algorithms for generalized densest subgraph discovery. Proc. ACM Manag. Data 1(2), 1–27 (2023)


Acknowledgements

This work was supported in part by NSFC under Grant 62102341, Basic and Applied Basic Research Fund in Guangdong Province under Grant 2023A1515011280 and 2022A1515010166, Guangdong Talent Program under Grant 2021QN02X826, Shenzhen Science and Technology Program under Grants JCYJ20220530143602006 and ZDSYS20211021111415025, the Fundamental Research Funds for the Central Universities (No. D5000230191), the University of Hong Kong (Projects 104005858 and 10400599), the Guangdong-Hong Kong-Macau Joint Laboratory Program 2020 (Project No: 2020B1212030009), The Hong Kong Jockey Club Charities Trust (HKJC), No. 260920140, and a grant from the Natural Sciences and Engineering Research Council of Canada.

Author information

Authors and Affiliations

The Chinese University of Hong Kong, Shenzhen, China
Chenhao Ma & Yixiang Fang
Department of Computer Science and Guangdong-Hong Kong-Macau Joint Laboratory and HKU Musketeers Foundation Institute of Data Science, The University of Hong Kong, Hong Kong, China
Reynold Cheng
The University of British Columbia, Vancouver, Canada
Laks V. S. Lakshmanan
Northwestern Polytechnical University, Xi’an, China
Xiaolin Han
The University of Hong Kong, Hong Kong, China
Xiaodong Li


Corresponding author

Correspondence to Xiaolin Han.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Convergence rate of Frank-Wolfe-DDS

To analyze the convergence of Frank-Wolfe-DDS (Algorithm 1), it is easier to work with a differentiable objective function [27], which is not the case for $\Vert \textbf{r} \Vert _{\infty }$ in ${\textsf{D}}{\textsf{P}}(c)$ (Eq. (4.6)). Hence, we construct a convex program with a differentiable objective function that shares with ${\textsf{D}}{\textsf{P}}(c)$ both the optimal solution and the minimizer of the linearization of the objective function at a given point (Eq. (4.8)).

$$\begin{aligned} {\textsf{C}}{\textsf{P}}(c)\quad \min \quad & f(\alpha , \beta )=\frac{1}{4\sqrt{c}}\sum _{u\in V} r_{\alpha }(u)^{2} + \frac{\sqrt{c}}{4}\sum _{v\in V}r_{\beta }(v)^{2} \\ \text {s.t.}\quad & \alpha , \beta , \textbf{r} \text { satisfy the constraints in } {\textsf{D}}{\textsf{P}}(c). \end{aligned} \tag{9.1}$$

We can verify that (4.8) is also the minimizer of the linear function given by $\partial f(\alpha , \beta )$. Hence, Frank-Wolfe-DDS (Algorithm 1) applies to both ${\textsf{D}}{\textsf{P}}(c)$ and ${\textsf{C}}{\textsf{P}}(c)$. Further, the following two lemmas indicate that an optimal solution of ${\textsf{C}}{\textsf{P}}(c)$ induces the c-biased DDS, which is also the objective of ${\textsf{D}}{\textsf{P}}(c)$.
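As a minimal sketch of how the two objectives relate, the Python snippet below evaluates $\Vert \textbf{r} \Vert _{\infty }$ and $f(\alpha , \beta )$ for given edge weights. The definitions of $r_{\alpha }$ and $r_{\beta }$ are not restated in this appendix, so the sketch assumes $r_{\alpha }(u)=2\sqrt{c}\sum _{(u,v)\in E}\alpha _{u,v}$ and $r_{\beta }(v)=\frac{2}{\sqrt{c}}\sum _{(u,v)\in E}\beta _{v,u}$, the scaling under which $\frac{\partial f}{\partial \alpha _{u,v}}=r_{\alpha }(u)$ and $\frac{\partial f}{\partial \beta _{v,u}}=r_{\beta }(v)$. The function names and the dictionary layout are illustrative only.

```python
import math
from collections import defaultdict


def density_vector(edges, alpha, beta, c):
    """Return (r_alpha, r_beta) as dicts keyed by vertex.

    Assumed scaling (not restated in the appendix):
      r_alpha(u) = 2*sqrt(c) * sum of alpha over out-edges of u
      r_beta(v)  = 2/sqrt(c) * sum of beta over in-edges of v
    Both alpha and beta are keyed by the edge tuple (u, v).
    """
    sqrt_c = math.sqrt(c)
    r_alpha = defaultdict(float)
    r_beta = defaultdict(float)
    for (u, v) in edges:
        r_alpha[u] += 2.0 * sqrt_c * alpha[(u, v)]
        r_beta[v] += 2.0 / sqrt_c * beta[(u, v)]
    return dict(r_alpha), dict(r_beta)


def dp_objective(edges, alpha, beta, c):
    """Infinity norm of r, the non-differentiable objective of DP(c)."""
    r_alpha, r_beta = density_vector(edges, alpha, beta, c)
    return max(list(r_alpha.values()) + list(r_beta.values()))


def cp_objective(edges, alpha, beta, c):
    """Quadratic objective f(alpha, beta) of CP(c), Eq. (9.1)."""
    sqrt_c = math.sqrt(c)
    r_alpha, r_beta = density_vector(edges, alpha, beta, c)
    return (sum(r * r for r in r_alpha.values()) / (4.0 * sqrt_c)
            + sqrt_c / 4.0 * sum(r * r for r in r_beta.values()))


# Toy graph: vertex 0 points to vertices 1 and 2; weights split evenly.
edges = [(0, 1), (0, 2)]
alpha = {e: 0.5 for e in edges}
beta = {e: 0.5 for e in edges}
print(dp_objective(edges, alpha, beta, 1.0))  # 2.0
print(cp_objective(edges, alpha, beta, 1.0))  # 1.5
```

On the toy graph, $r_{\alpha }(0)=2$ and $r_{\beta }(1)=r_{\beta }(2)=1$, so the infinity norm is 2 and $f=\frac{4}{4}+\frac{1+1}{4}=1.5$.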

Lemma 9.1

Suppose that an optimal solution $(\alpha , \beta )$ of ${\textsf{C}}{\textsf{P}}(c)$ induces the density vector $\textbf{r}\in {\mathbb {R}}_{+}^{2|V|}$. Then, we have

1. $\exists (u,v) \in E, r_{\alpha }(u)>r_{\beta }(v) \Rightarrow \alpha _{u,v} = 0, \beta _{v,u} = 1$;
2. $\exists (u,v) \in E, r_{\alpha }(u)<r_{\beta }(v) \Rightarrow \beta _{v,u} = 0, \alpha _{u,v} = 1$.

Proof

We prove the lemma by contradiction. For (1), suppose $\alpha _{u,v} > 0$. Then there exists $\epsilon > 0$ such that decreasing $\alpha _{u,v}$ by $\epsilon $ and increasing $\beta _{v,u}$ by $\epsilon $ strictly decreases the objective function, because $\frac{\partial f}{\partial \alpha _{u,v}}=r_{\alpha }(u) > \frac{\partial f}{\partial \beta _{v,u}}=r_{\beta }(v)$. This contradicts the optimality assumption. Statement (2) follows by a symmetric argument. $\square $

To simplify the notations, we denote ${\mathcal {D}}_{c}$ as the feasible set of ${\textsf{D}}{\textsf{P}}(c)$ and ${\textsf{C}}{\textsf{P}}(c)$, as ${\textsf{D}}{\textsf{P}}(c)$ and ${\textsf{C}}{\textsf{P}}(c)$ share the same constraints.

Lemma 9.2

Suppose a non-empty subset pair (S, T), where $S, T \subseteq V$, is stable with respect to a triple $(\alpha , \beta , \textbf{r}) \in {\mathcal {D}}_{c}$. Suppose that $\exists \rho _{c}^{*} \in {\mathbb {R}}$ such that $\forall u \in S, r_{\alpha }(u)=\rho _{c}^{*}$ and $\forall v \in T, r_{\beta }(v)=\rho _{c}^{*}$. Then, G[S, T] is the c-biased DDS and has c-biased density $\rho _{c}^{*}$.

Proof

As $(\alpha , \beta , \textbf{r})$ is a feasible solution of ${\textsf{D}}{\textsf{P}}(c)$ and (S, T) is stable, the objective value of $(\alpha , \beta , \textbf{r})$ is $\Vert \textbf{r} \Vert _{\infty }=\rho _{c}^{*}$. Moreover, since (S, T) is stable, $\rho _{c}(S,T)=\frac{2\sqrt{c c'}}{c+c'}\cdot \frac{|E(S,T)|}{\sqrt{|S||T|}}=\rho _{c}^{*}$, where $c'=\frac{|S|}{|T|}$. This comes from $|E(S, T)|= (\frac{|S|}{2\sqrt{c}}+\frac{\sqrt{c}|T|}{2})\rho _{c}(S, T)$. By Lemma 4.1, (S, T) gives a feasible primal solution in ${\textsf{L}}{\textsf{P}}(c)$ with objective value $\rho _{c}^{*}$. Hence, $\rho _{c}^{*}$ is the optimal value for both ${\textsf{L}}{\textsf{P}}(c)$ and ${\textsf{D}}{\textsf{P}}(c)$, which means that G[S, T] is the c-biased DDS. $\square $

Lemmas 9.1 and 9.2 imply that an optimal solution $(\alpha , \beta , \textbf{r})$ of ${\textsf{C}}{\textsf{P}}(c)$ induces the c-biased DDS $G[S_{c}^{*}, T_{c}^{*}]$ in G, where $S_{c}^{*}=\{u | r_{\alpha }(u)=\Vert \textbf{r} \Vert _{\infty }\}$ and $T_{c}^{*}=\{v | r_{\beta }(v)=\Vert \textbf{r} \Vert _{\infty }\}$.
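The extraction of $S_{c}^{*}$ and $T_{c}^{*}$ from a density vector can be sketched in a few lines of Python. The function name `extract_pair` and the tolerance `tol` are assumptions added here for illustration: the definition above uses exact equality at an optimal solution, whereas a numerical solver only returns approximate values, so some tolerance is needed in practice.

```python
def extract_pair(r_alpha, r_beta, tol=1e-9):
    """Return (S, T, r_max), the vertices whose density attains the infinity norm.

    r_alpha and r_beta are dicts mapping a vertex to r_alpha(u) and r_beta(v).
    tol is a hypothetical numerical tolerance; the text uses exact equality.
    """
    r_max = max(list(r_alpha.values()) + list(r_beta.values()))
    S = {u for u, r in r_alpha.items() if r >= r_max - tol}
    T = {v for v, r in r_beta.items() if r >= r_max - tol}
    return S, T, r_max


# Toy density vector: vertex 0 (source side) and vertex 2 (target side) attain the maximum.
S, T, r_max = extract_pair({0: 2.0, 1: 1.0}, {2: 2.0, 3: 0.5})
print(S, T, r_max)  # {0} {2} 2.0
```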

Hence, Frank-Wolfe-DDS (Algorithm 1) applies to both ${\textsf{D}}{\textsf{P}}(c)$ and ${\textsf{C}}{\textsf{P}}(c)$, and the optimal solutions of both programs induce the c-biased DDS. We therefore use ${\textsf{C}}{\textsf{P}}(c)$ to analyze the convergence rate of Frank-Wolfe-DDS. According to the previous convergence analysis of Frank–Wolfe-based algorithms in [13, 27], the convergence rate of our algorithm can be described by a value related to the graph, $Q_{c}=\frac{1}{2}\textsf{Diam}({\mathcal {D}}_{c})^{2}\sup _{(\alpha ,\beta )\in {\mathcal {D}}_{c}}\Vert \nabla ^{2}f(\alpha ,\beta ) \Vert _{2}$, where $\textsf{Diam}({\mathcal {D}}_{c})$ is the diameter of ${\mathcal {D}}_{c}$, $\nabla ^{2}f(\alpha ,\beta )$ is the Hessian, and $\Vert \cdot \Vert _{2}$ is the spectral norm of a matrix.

Theorem 9.1

(Convergence Rate of Frank–Wolfe [27]) Suppose $(\alpha ^{*}, \beta ^{*}) \in {\mathcal {D}}_{c}$ is an optimal solution of ${\textsf{C}}{\textsf{P}}(c)$. Then, for all $i\ge 1$, $f(\alpha ^{(i)}, \beta ^{(i)}) - f(\alpha ^{*}, \beta ^{*}) \le \frac{2Q_{c}}{i+2}$.
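To make the bound concrete, the following Python sketch runs a generic Frank–Wolfe iteration on ${\textsf{C}}{\textsf{P}}(c)$ and records $f$ at every iterate. It is not a reproduction of Algorithm 1, which is not restated in this appendix. The sketch assumes the step size $\gamma _{i}=\frac{2}{i+2}$ from [27], the scaling $r_{\alpha }(u)=2\sqrt{c}\sum \alpha _{u,v}$ and $r_{\beta }(v)=\frac{2}{\sqrt{c}}\sum \beta _{v,u}$, a starting point with all weight on $\alpha $, and a linear minimizer that moves each edge's unit weight to the endpoint with the smaller density (ties go to $\alpha $). All names are illustrative.

```python
import math


def frank_wolfe_sketch(edges, c, iterations):
    """Generic Frank-Wolfe on CP(c); returns (alpha, beta, history of f values)."""
    sqrt_c = math.sqrt(c)
    # Assumed starting point: every edge puts its whole unit weight on alpha.
    alpha = {e: 1.0 for e in edges}
    beta = {e: 0.0 for e in edges}
    history = []
    for i in range(iterations):
        # Density vector under the assumed scaling.
        r_alpha, r_beta = {}, {}
        for (u, v) in edges:
            r_alpha[u] = r_alpha.get(u, 0.0) + 2.0 * sqrt_c * alpha[(u, v)]
            r_beta[v] = r_beta.get(v, 0.0) + 2.0 / sqrt_c * beta[(u, v)]
        f = (sum(r * r for r in r_alpha.values()) / (4.0 * sqrt_c)
             + sqrt_c / 4.0 * sum(r * r for r in r_beta.values()))
        history.append(f)
        gamma = 2.0 / (i + 2)  # step size used in the analysis of [27]
        for (u, v) in edges:
            # The partial derivatives are r_alpha(u) and r_beta(v), so the
            # linearized objective is minimized by the smaller-density side.
            a_hat = 1.0 if r_alpha[u] <= r_beta[v] else 0.0
            alpha[(u, v)] = (1.0 - gamma) * alpha[(u, v)] + gamma * a_hat
            beta[(u, v)] = (1.0 - gamma) * beta[(u, v)] + gamma * (1.0 - a_hat)
    return alpha, beta, history


# Single edge with c = 1: f = alpha^2 + beta^2, minimized at alpha = beta = 1/2.
alpha, beta, history = frank_wolfe_sketch([(0, 1)], 1.0, 200)
print(history[-1])  # close to the optimum 0.5
```

For the single-edge graph with $c=1$, the optimum is $f^{*}=\frac{1}{2}$ and $Q_{c}=\frac{1}{2}\cdot 2\cdot 2=2$, so the recorded values can be checked directly against $\frac{2Q_{c}}{i+2}=\frac{4}{i+2}$.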

Lemma 9.3

(Bounding $Q_{c}$) Given a directed graph $G=(V,E)$ with maximum outdegree $d^{+}_{\max }$ and maximum indegree $d^{-}_{\max }$ and a given c, we have that $Q_{c} \le 2|E|\max \{\sqrt{c}d^{+}_{\max },\frac{1}{\sqrt{c}}d^{-}_{\max }\}$.

Proof

First, we have $\textsf{Diam}({\mathcal {D}}_{c})=\sqrt{2|E|}$. The Hessian of $f(\alpha ,\beta )$ does not depend on the value of $(\alpha , \beta )$, and it is a symmetric matrix with nonnegative entries. Therefore, $\sup _{(\alpha ,\beta )\in {\mathcal {D}}_{c}}\Vert \nabla ^{2}f(\alpha ,\beta ) \Vert _{2}$ is the maximum singular value of $\nabla ^{2}f(\alpha ,\beta )$. Let $A=\nabla ^{2}f(\alpha ,\beta )$, $\lambda _{1}$ be the maximum singular value (also the maximum eigenvalue) of A, x be the eigenvector associated with $\lambda _{1}$, and p be the component in which x has maximum absolute value. Without loss of generality, we assume $x_{p}$ is positive. We have

$$\begin{aligned} \lambda _{1} x_{p} = (Ax)_{p}=\sum _{q=1}^{2n}A_{p,q} x_{q}&\le \sum _{q=1}^{2n}A_{p,q}x_{p} \\&\le x_{p} \max \{2\sqrt{c}d^{+}_{\max },\frac{2}{\sqrt{c}}d^{-}_{\max } \}. \end{aligned}$$

Therefore, $Q_{c} \le 2|E|\max \{\sqrt{c}d^{+}_{\max },\frac{1}{\sqrt{c}}d^{-}_{\max }\}$. $\square $
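The row-sum argument can be checked numerically on a small graph. The Python sketch below builds the Hessian of $f$ with one row per variable $\alpha _{u,v}$ and one per $\beta _{v,u}$, under the assumed scaling $r_{\alpha }(u)=2\sqrt{c}\sum \alpha _{u,v}$ and $r_{\beta }(v)=\frac{2}{\sqrt{c}}\sum \beta _{v,u}$. Under that assumption, two $\alpha $ variables interact only when they share a source vertex (entry $2\sqrt{c}$), and two $\beta $ variables only when they share a target vertex (entry $\frac{2}{\sqrt{c}}$). The function names are illustrative.

```python
import math


def hessian(edges, c):
    """Hessian of f: rows 0..m-1 are alpha variables, rows m..2m-1 are beta variables."""
    sqrt_c = math.sqrt(c)
    m = len(edges)
    H = [[0.0] * (2 * m) for _ in range(2 * m)]
    for p, (u, v) in enumerate(edges):
        for q, (u2, v2) in enumerate(edges):
            if u == u2:  # alpha variables sharing a source vertex
                H[p][q] = 2.0 * sqrt_c
            if v == v2:  # beta variables sharing a target vertex
                H[m + p][m + q] = 2.0 / sqrt_c
    return H


def max_row_sum(H):
    """Largest row sum, which upper-bounds the top eigenvalue of a nonnegative matrix."""
    return max(sum(row) for row in H)


def q_bound(edges, c):
    """Upper bound on Q_c stated in Lemma 9.3."""
    sqrt_c = math.sqrt(c)
    out_deg, in_deg = {}, {}
    for (u, v) in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    d_out_max = max(out_deg.values())
    d_in_max = max(in_deg.values())
    return 2.0 * len(edges) * max(sqrt_c * d_out_max, d_in_max / sqrt_c)


edges = [(0, 1), (0, 2), (3, 2)]
print(max_row_sum(hessian(edges, 4.0)))  # 8.0
print(q_bound(edges, 4.0))               # 24.0
```

With $c=4$, $d^{+}_{\max }=d^{-}_{\max }=2$, the largest row sum is $\max \{2\sqrt{c}d^{+}_{\max },\frac{2}{\sqrt{c}}d^{-}_{\max }\}=\max \{8,2\}=8$, and $\frac{1}{2}\cdot 2|E|\cdot 8=24$ matches the stated bound.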

Lemma 9.4

Suppose $(\alpha , \beta , \textbf{r}) \in {\mathcal {D}}_{c}$ and let $\varepsilon := \Vert \textbf{r} \Vert _{\infty } - \rho _{c}^{*}$, where $\rho _{c}^{*}=\Vert \textbf{r}^{*}\Vert _{\infty }$ and $(\alpha ^{*}, \beta ^{*}, \textbf{r}^{*})$ is an optimal solution of ${\textsf{D}}{\textsf{P}}(c)$. Then, we have that $(4\sqrt{c}+\frac{4}{\sqrt{c}})\cdot \left( f(\alpha , \beta ) - f(\alpha ^{*}, \beta ^{*}) \right) \ge \varepsilon ^{2}$.

Proof

First, we have $f(\alpha , \beta )-f(\alpha ^{*}, \beta ^{*}) \ge f(\alpha - \alpha ^{*}, \beta - \beta ^{*})$, because $f(\alpha , \beta ) - f(\alpha ^{*}, \beta ^{*}) - f(\alpha - \alpha ^{*}, \beta - \beta ^{*})$ is an affine function on ${\mathcal {D}}_{c}$ and attains its minimum value 0 at $(\alpha ^{*}, \beta ^{*})$. Second, $f(\alpha - \alpha ^{*}, \beta - \beta ^{*})$ can be bounded by the $l^{2}$-norm of $\textbf{r}-\textbf{r}^{*}$, i.e., $(4\sqrt{c} + \frac{4}{\sqrt{c}})f(\alpha - \alpha ^{*}, \beta - \beta ^{*}) \ge \Vert \textbf{r} - \textbf{r}^{*} \Vert _{2}^{2}$. For the infinity norm and the $l^{2}$-norm, we have $\Vert \textbf{r} \Vert _{\infty } - \rho _{c}^{*} \le \Vert \textbf{r} - \textbf{r}^{*} \Vert _{\infty } \le \Vert \textbf{r} - \textbf{r}^{*} \Vert _{2}$. Combining the above inequalities yields the lemma. $\square $

Corollary 9.1

(Convergence of Algorithm 1) Suppose $d^{+}_{\max }$ (resp. $d^{-}_{\max }$) is the maximum outdegree (resp. indegree) of G and c is fixed. In Algorithm 1, for $i > 16(\sqrt{c} + \frac{1}{\sqrt{c}}) \frac{|E|\max \{\sqrt{c}d^{+}_{\max },\frac{1}{\sqrt{c}}d^{-}_{\max }\}}{\varepsilon ^{2}}$, we have $\Vert \textbf{r}^{(i)} \Vert _{\infty } - \rho _{c}^{*} \le \varepsilon $.
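The corollary follows by chaining the three preceding results: Lemma 9.4 gives $\varepsilon ^{2} \le (4\sqrt{c}+\frac{4}{\sqrt{c}})(f(\alpha ^{(i)},\beta ^{(i)})-f(\alpha ^{*},\beta ^{*}))$, Theorem 9.1 bounds the gap by $\frac{2Q_{c}}{i+2}$, and Lemma 9.3 bounds $Q_{c}$. As a small illustration, the Python helper below evaluates the resulting threshold and returns the smallest integer strictly above it. The function name and argument names are hypothetical.

```python
import math


def iterations_needed(num_edges, d_out_max, d_in_max, c, eps):
    """Smallest integer i exceeding the threshold of Corollary 9.1."""
    sqrt_c = math.sqrt(c)
    threshold = (16.0 * (sqrt_c + 1.0 / sqrt_c) * num_edges
                 * max(sqrt_c * d_out_max, d_in_max / sqrt_c) / eps ** 2)
    # The corollary requires i > threshold, hence floor(threshold) + 1.
    return math.floor(threshold) + 1


# |E| = 3, maximum out- and indegree 2, c = 1.
print(iterations_needed(3, 2, 2, 1.0, 1.0))  # 193
print(iterations_needed(3, 2, 2, 1.0, 0.5))  # 769
```

The threshold grows as $\frac{1}{\varepsilon ^{2}}$: halving $\varepsilon $ from 1 to 0.5 raises it from 192 to 768 in this example. It is a worst-case guarantee, so it says nothing about how many iterations a given graph needs in practice.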

1.2 Proofs

Proof of Lemma 4.1

We prove the lemma by showing a feasible solution (x, s, t, a, b) of ${\textsf{L}}{\textsf{P}}(c)$. Let $a=\frac{2c'}{c+c'}$ and $b=\frac{2c}{c+c'}$. For each vertex $u\in P$, set $s_{u}=\frac{a\sqrt{c}}{|P|}=\frac{2c'\sqrt{c}}{(c+c')|P|}.$ For each vertex $v\in Q$, set $t_{v}=\frac{b}{\sqrt{c}|Q|}=\frac{2c}{(c+c')\sqrt{c}|Q|}=\frac{2c'\sqrt{c}}{(c+c')|P|}$. For each edge $(u,v)\in E(P,Q)$, set $x_{u,v}=s_{u}=t_{v}$. All the remaining variables are set to 0. Now, $\sum _{u\in V}s_{u}=a\sqrt{c}$ and $\sum _{v\in V}t_{v}=\frac{b}{\sqrt{c}}$. Hence, this is a feasible solution to ${\textsf{L}}{\textsf{P}}(c)$. The value of this solution is

$$\begin{aligned} \frac{2c'\sqrt{c}}{(c+c')|P|}|E(P,Q)|&=\frac{2\sqrt{c}c'\sqrt{|Q|}}{(c+c')\sqrt{|P|}}\cdot \frac{|E(P,Q)|}{\sqrt{|P||Q|}}\\&=\frac{2\sqrt{c}\sqrt{c'}}{c+c'}\rho (P,Q). \end{aligned}$$

Thus, the lemma holds. $\square $
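The construction in this proof can be verified numerically. The Python sketch below builds $s_{u}$ and $t_{v}$ as defined above for a given pair (P, Q) with $c'=\frac{|P|}{|Q|}$, sums $x_{u,v}$ over $E(P,Q)$, and compares the total with $\frac{2\sqrt{c}\sqrt{c'}}{c+c'}\rho (P,Q)$. The function name and the toy graph are illustrative, and the sketch does not check the remaining constraints of ${\textsf{L}}{\textsf{P}}(c)$, which are not restated in this appendix.

```python
import math


def lemma41_values(edges, P, Q, c):
    """Return (value of the constructed solution, closed-form value from the proof)."""
    c_prime = len(P) / len(Q)
    a = 2.0 * c_prime / (c + c_prime)
    b = 2.0 * c / (c + c_prime)
    s_u = a * math.sqrt(c) / len(P)        # s_u for every u in P
    t_v = b / (math.sqrt(c) * len(Q))      # t_v for every v in Q
    assert math.isclose(s_u, t_v)          # the proof sets x_{u,v} = s_u = t_v
    num_edges = sum(1 for (u, v) in edges if u in P and v in Q)
    value = s_u * num_edges                # sum of x_{u,v} over E(P, Q)
    rho = num_edges / math.sqrt(len(P) * len(Q))
    closed_form = 2.0 * math.sqrt(c * c_prime) / (c + c_prime) * rho
    return value, closed_form


edges = [(0, 2), (0, 3), (1, 2)]
value, closed_form = lemma41_values(edges, {0, 1}, {2, 3}, 0.5)
print(value, closed_form)  # both approximately 1.41421
```

Here $c'=1$, $\rho (P,Q)=\frac{3}{2}$ and $\frac{2\sqrt{0.5}}{1.5}\cdot \frac{3}{2}=\sqrt{2}$, which agrees with $3 s_{u}$.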

Proof of Lemma 4.2

Without loss of generality, we can assume that for each $(u,v)\in E$, $x_{u,v}=\min \{s_{u}, t_{v}\}$. We define a collection of sets S, T indexed by a parameter $r\ge 0$. Let $S(r)=\{u|s_u \ge r\}$, $T(r)=\{v|t_{v}\ge r\}$, and $E(r)=\{(u,v)\in E|x_{u,v}\ge r\}$. Since $x_{u,v}=\min \{s_{u}, t_{v}\}$, E(r) is precisely the set of edges that go from S(r) to T(r).

Now, $\int _{0}^{\infty }|S(r)|\text {d}r=\sum _{u\in V}s_{u}=a\sqrt{c}$. Similarly, $\int _{0}^{\infty }|T(r)|\text {d}r=\sum _{v\in V}t_{v}=\frac{b}{\sqrt{c}}$. By the Cauchy–Schwarz inequality,

$$\begin{aligned} \int _{0}^{\infty }\sqrt{|S(r)||T(r)|}\text {d}r \le \sqrt{\left( \int _{0}^{\infty }|S(r)|\text {d}r\right) \left( \int _{0}^{\infty }|T(r)|\text {d}r\right) }=\sqrt{ab}. \end{aligned}$$

Note that $\int _{0}^{\infty }|E(r)|\text {d}r=\sum _{(u,v)\in E}x_{u,v}$. This is the objective function value of the solution. Let this value be $x_{\text {sum}}$.

We claim that there exists r such that $\frac{|E(r)|}{\sqrt{|S(r)||T(r)|}}\ge \frac{x_{\text {sum}}}{\sqrt{ab}}$. Suppose there was no such r. Then,

$$\begin{aligned} \int _{0}^{\infty }|E(r)|\text {d}r < \frac{x_{\text {sum}}}{\sqrt{ab}} \int _{0}^{\infty }\sqrt{|S(r)||T(r)|}\text {d}r \le x_{\text {sum}}. \end{aligned}$$

This gives a contradiction. Thus, the lemma holds. $\square $
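The existence argument suggests a simple threshold sweep, sketched below in Python. Because S(r) and T(r) only change when r crosses one of the values $s_{u}$ or $t_{v}$, it suffices to try those values as thresholds. The function name and the toy input are illustrative; the sketch takes $x_{u,v}=\min \{s_{u},t_{v}\}$ as in the proof, so E(r) is the set of edges from S(r) to T(r).

```python
import math


def best_threshold_pair(edges, s, t):
    """Sweep thresholds r and return (density, S(r), T(r)) maximizing |E(r)|/sqrt(|S(r)||T(r)|)."""
    thresholds = sorted({val for val in list(s.values()) + list(t.values()) if val > 0})
    best = (0.0, set(), set())
    for r in thresholds:
        S = {u for u, val in s.items() if val >= r}
        T = {v for v, val in t.items() if val >= r}
        if not S or not T:
            continue
        # With x_{u,v} = min(s_u, t_v), E(r) is exactly the edge set from S(r) to T(r).
        num_edges = sum(1 for (u, v) in edges if u in S and v in T)
        density = num_edges / math.sqrt(len(S) * len(T))
        if density > best[0]:
            best = (density, S, T)
    return best


edges = [(0, 2), (0, 3), (1, 2)]
s = {0: 0.5, 1: 0.2}
t = {2: 0.5, 3: 0.1}
density, S, T = best_threshold_pair(edges, s, t)
x_sum = sum(min(s[u], t[v]) for (u, v) in edges)
lower_bound = x_sum / math.sqrt(sum(s.values()) * sum(t.values()))
print(density, lower_bound)  # density 1.5 is at least the bound (about 1.234)
```

In the example, $\sqrt{ab}=\sqrt{\sum _{u}s_{u}\cdot \sum _{v}t_{v}}=\sqrt{0.42}$ and $x_{\text {sum}}=0.8$, so the lemma guarantees a threshold with density at least about 1.234; the sweep finds density 1.5 at $r=0.1$.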

Proof of Lemma 5.3

As $G[S_{c}^{*}, T_{c}^{*}]$ is the c-biased DDS with c-biased density $\rho _{c}(S_{c}^{*}, T_{c}^{*})$, there must exist $u\in S$ satisfying $r_{\alpha }(u) \le \rho _{c}(S_{c}^{*}, T_{c}^{*})$, or $v\in T$ satisfying $r_{\beta }(v) \le \rho _{c}(S_{c}^{*}, T_{c}^{*})$. Otherwise, G[S, T] would be a subgraph with a higher c-biased density than $G[S_{c}^{*}, T_{c}^{*}]$.

Now, we prove the lemma by contradiction. Assume $G[S_{c}^{*},T_{c}^{*}]$ is not contained in G[S, T]. Based on whether $G[S_{c}^{*},T_{c}^{*}]$ overlaps G[S, T], there are two cases.

1. $S_{c}^{*} \cap S = \emptyset $ and $T_{c}^{*} \cap T = \emptyset $. Since $|E(S_{c}^{*}, T_{c}^{*})|=(\frac{|S_{c}^{*}|}{\sqrt{c}}+\sqrt{c}|T_{c}^{*}|)\rho _{c}(S_{c}^{*}, T_{c}^{*})$, there exists $u\in S_{c}^{*}, r_{\alpha }(u) \ge \rho _{c}(S_{c}^{*}, T_{c}^{*})$ or $v\in T_{c}^{*}, r_{\beta }(v) \ge \rho _{c}(S_{c}^{*}, T_{c}^{*})$.
2. $S_{c}^{*} \cap S \ne \emptyset $ or $T_{c}^{*} \cap T \ne \emptyset $. In this case,
$$\begin{aligned} \begin{aligned}&|E(S_c^{*}, T_{c}^{*})| \\ =&|E(S_c^{*}\cap S, T_{c}^{*}\cap T)| + |E(S_c^{*}, T_{c}^{*})\setminus E(S, T)| \\ =&\left( \frac{|S_{c}^{*}\cap S|}{\sqrt{c}}+\sqrt{c}|T_{c}^{*}\cap T|\right) \rho _{c}(S_c^{*}\cap S, T_{c}^{*}\cap T) \\&+ \left( \frac{|S_{c}^{*}\setminus S|}{\sqrt{c}}+\sqrt{c}|T_{c}^{*}\setminus T|\right) \rho '_{c}. \end{aligned} \end{aligned}$$
Since $\rho _{c}(S_c^{*}\cap S, T_{c}^{*}\cap T) \le \rho _{c}(S_{c}^{*}, T_{c}^{*})$, we have $\rho '_{c} \ge \rho _{c}(S_{c}^{*}, T_{c}^{*})$. Thus, there exists $u\in S_{c}^{*}{\setminus } S, r_{\alpha }(u) \ge \rho _{c}(S_{c}^{*}, T_{c}^{*})$ or $v\in T_{c}^{*}{\setminus } T, r_{\beta }(v) \ge \rho _{c}(S_{c}^{*}, T_{c}^{*})$.

Consequently, for each case above, combining the inequalities will give us a contradiction to the first condition of the stable (S, T)-induced subgraph definition. Hence, the lemma holds. $\square $

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Ma, C., Fang, Y., Cheng, R. et al. Accelerating directed densest subgraph queries with software and hardware approaches. The VLDB Journal 33, 207–230 (2024). https://doi.org/10.1007/s00778-023-00805-0


Received: 13 October 2022
Revised: 17 April 2023
Accepted: 19 June 2023
Published: 31 July 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s00778-023-00805-0
