Top-k overlapping densest subgraphs: approximation algorithms and computational complexity

A central problem in graph mining is finding dense subgraphs, with several applications in different fields, a notable example being identifying communities. While a lot of effort has been put into the problem of finding a single dense subgraph, only recently has the focus shifted to the problem of finding a set of densest subgraphs. An approach introduced to find possibly overlapping subgraphs is the Top-k-Overlapping Densest Subgraphs problem. Given an integer k ≥ 1 and a parameter λ > 0, the goal of this problem is to find a set of k dense subgraphs that may share some vertices. The objective function to be maximized takes into account the density of the subgraphs, the parameter λ and the distance between each pair of subgraphs in the solution. The Top-k-Overlapping Densest Subgraphs problem has been shown to admit a 1/10-factor approximation algorithm. Furthermore, the computational complexity of the problem has been left open.
In this paper, we present contributions concerning the approximability and the computational complexity of the problem. For the approximability, we present approximation algorithms that improve the approximation factor to 1/2, when k is smaller than the number of vertices in the graph, and to 2/3, when k is a constant. For the computational complexity, we show that the problem is NP-hard even when k = 3.


Introduction
Complex systems are usually analyzed with graphs. One of the most studied and central tasks for understanding the behaviour of complex systems is the identification of communities, that is, cohesive subgraphs. This problem has been raised in several contexts, from social network analysis (Kumar et al. 1999) to finding functional motifs in biological networks (Fratkin et al. 2006). Different definitions of cohesive graphs have been proposed and applied in the literature. One of the most remarkable examples is Clique, and finding a maximum size clique is a well-known and studied problem in theoretical computer science (Karp 1972). Other interesting definitions of cohesive subgraphs have been proposed in the literature, for example relaxed cliques (Alba 1973; Mokken 1979; Komusiewicz 2016), which are graphs that satisfy a relaxation of some clique property, like the distance between vertices of the clique or the degree of the vertices of the clique. Notable examples of relaxed cliques are s-clubs, t-cliques, k-core, and s-plex [for an overview of the different clique relaxations, see Komusiewicz (2016)].
Most of the definitions of cohesive subgraph lead to NP-hard problems, in some cases even hard to approximate. For example, finding a clique of maximum size in a graph G = (V, E) is an NP-hard problem (Karp 1972) and it is even hard to approximate within factor O(|V|^{1−ε}), for each ε > 0 (Zuckerman 2007). Similarly, finding an s-club, with s ≥ 2, of maximum size in a graph G = (V, E) is an NP-hard problem (Bourjolly et al. 2002) which admits an approximation algorithm of factor O(|V|^{1/2}) (Asahiro et al. 2017), while it is not approximable within factor O(|V|^{1/2−ε}), for each ε > 0 (Asahiro et al. 2017). A definition of a dense subgraph that leads to a polynomial-time algorithm is that of average-degree density. For this problem, called Densest Subgraph, Goldberg gave an elegant polynomial-time algorithm (Goldberg 1984) that requires O(|V|^3) time (Kawase and Miyauchi 2018), while a linear-time greedy algorithm that achieves an approximation factor of 1/2 for Densest Subgraph has been given in Asahiro et al. (1996) and Charikar (2000). A related problem, Densest k-Subgraph, is that of finding a densest subgraph with a constraint on the size of the subgraph. The problem becomes NP-hard if it looks for a densest subgraph of a given size (Asahiro et al. 2002; Feige et al. 2001), of at most a given size (Andersen and Chellapilla 2009) or of at least a given size (Khuller and Saha 2009; Goldstein and Langberg 2009).
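The greedy strategy mentioned above can be illustrated with a minimal sketch. It assumes that the density of a vertex set S is |E(S)|/|S| and that the input is a simple graph with at least one edge; the function name is hypothetical, and this plain version runs in quadratic rather than linear time, since it rescans the vertices to find a minimum-degree one.

```python
from collections import defaultdict

def greedy_densest_subgraph(edges):
    """Peeling heuristic: repeatedly delete a minimum-degree vertex and
    keep the densest vertex set seen (density assumed to be |E(S)|/|S|)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(adj)
    m = len({frozenset(e) for e in edges})  # number of distinct edges
    best_set, best_density = set(alive), m / len(alive)
    while len(alive) > 1:
        v = min(alive, key=lambda x: len(adj[x]))  # a minimum-degree vertex
        m -= len(adj[v])                           # its edges disappear
        for w in adj[v]:
            adj[w].discard(v)
        adj[v].clear()
        alive.remove(v)
        if m / len(alive) > best_density:
            best_set, best_density = set(alive), m / len(alive)
    return best_set, best_density
```

On a 4-clique with one pendant vertex attached, the sketch first peels the pendant vertex and then keeps the clique as the densest set encountered.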
The Densest Subgraph problem aims at finding a single subgraph, but in many applications it is of interest to find a collection of dense subgraphs of a given graph. More precisely, it is interesting to compute a collection of distinct subgraphs having maximum density in a given graph. A recent approach proposed in Galbrun et al. (2016) asks for a collection of top k densest, possibly overlapping, distinct subgraphs (denoted as Top-k-Overlapping Densest Subgraphs), since in many real-world cases dense subgraphs are related to non-disjoint communities. As pointed out in Leskovec et al. (2009) and Galbrun et al. (2016), for example hubs are vertices that may be part of several communities and hence of several densest subgraphs, thus motivating the quest for overlapping distinct subgraphs. Top-k-Overlapping Densest Subgraphs, proposed in Galbrun et al. (2016), addresses this problem by looking for a set of k subgraphs that maximize an objective function that takes into account both the density of the subgraphs and the distance between the subgraphs of the solution, thus allowing an overlap between the subgraphs which depends on a parameter λ. When λ is small, compared to the density, then the density plays a dominant role in the objective function, so the output subgraphs can share a significant part of vertices. On the other hand, if λ is large compared to the density, then the subgraphs will share few or no vertices, so the subgraphs may be disjoint.
An approach similar to Top-k-Overlapping Densest Subgraphs was proposed in Balalau et al. (2015), where the goal is to find a set of k subgraphs of maximum density, with the constraint that the pairwise Jaccard coefficient (originally defined in Jaccard (1912)) between the subgraphs is bounded. A dynamic variant of the problem, whose goal is finding a set of k disjoint subgraphs, has been recently considered in Nasir et al. (2017).
Other approaches related to Top-k-Overlapping Densest Subgraphs include covering or partitioning an input graph in dense subgraphs, like Minimum Clique Partition (Garey and Johnson 1979) or Minimum s-Club Covering. However, notice that these approaches require that all the vertices of the graph belong to some dense subgraph of the solution, which is not the case for Top-k-Overlapping Densest Subgraphs.
Top-k-Overlapping Densest Subgraphs has been shown to be approximable within factor 1/10 (Galbrun et al. 2016), while its computational complexity has been left open (Galbrun et al. 2016). In this paper, we present algorithmic and complexity results for Top-k-Overlapping Densest Subgraphs when k is less than the number of vertices in the graph. This last assumption (required in Sect. 3) is reasonable: for example, notice that in the experimental results presented in Galbrun et al. (2016) k is equal to 20, even for graphs having thousands or millions of vertices. Concerning the approximation of the problem, we provide in Sect. 3 a 2/3-approximation algorithm when k is a constant (notice that the time complexity of this algorithm depends exponentially on k), and we present a 1/2-approximation algorithm when k < |V|. From the computational complexity point of view, we show in Sect. 4 that Top-k-Overlapping Densest Subgraphs is NP-hard even if k = 3 (that is, we ask for three densest subgraphs), when λ = 3|V|^3, for an input graph G = (V, E). Notice that, since λ is large, the three subgraphs computed by the reduction are disjoint. The rest of the paper is organized as follows. In Sect. 2, we present some definitions and we give the formal definition of the Top-k-Overlapping Densest Subgraphs problem. In Sect. 3, we present the two approximation algorithms. In Sect. 4, we present the complexity result for Top-k-Overlapping Densest Subgraphs and we show that it is NP-hard even if k = 3, when λ = 3|V|^3.
We conclude the paper in Sect. 5 with some open problems.

Definitions
In this section, we present some definitions that will be useful in the rest of the paper. Moreover, we provide the formal definition of the problem we are interested in. All the graphs we consider in this paper are undirected. Given a graph G = (V, E) and a set V' ⊆ V, we denote by G[V'] the subgraph of G induced by V'. Given a subset U ⊆ V, we denote by E(U) the set of edges of G having both endpoints in U. Moreover, given two sets V_1, V_2 ⊆ V, E(V_1, V_2) is the set of edges having exactly one endpoint in V_1 and exactly one endpoint in V_2. Two subgraphs G[V_1] and G[V_2] are distinct when V_1 ≠ V_2. Next, we present the definition of crossing subgraphs, which is fundamental in Sect. 3.2.

Definition 1 Given a graph G = (V, E), two subgraphs G[V_1] and G[V_2] of G are crossing when V_1 ∩ V_2 ≠ ∅, V_1\V_2 ≠ ∅ and V_2\V_1 ≠ ∅ (see Fig. 1).

Now, we present the definition of density of a subgraph.

Definition 2 Given a graph G = (V, E) and a set of k pairwise distinct subgraphs W = {G[W_1], ..., G[W_k]}, the density of W, denoted by dens(W), is defined as follows: dens(W) = Σ_{i=1}^{k} dens(G[W_i]), where dens(G[W_i]) is the density of the subgraph G[W_i].

The goal of the problem we are interested in is to find a set of k, with 1 ≤ k < |V|, pairwise distinct and possibly overlapping subgraphs having high density. In order to differentiate these k subgraphs, in Galbrun et al. (2016) a distance function between subgraphs of the solution is included in the objective function. The problem we consider maximizes an objective function that includes the sum of the densities of the subgraphs and the distances between subgraphs. We present here the distance function between two subgraphs introduced in Galbrun et al. (2016).

Definition 3 Given a graph G = (V, E) and two subgraphs G[U] and G[Z], induced by U ⊆ V and Z ⊆ V, respectively, the distance between G[U] and G[Z] is defined as in Galbrun et al. (2016).
We prove an upper and a lower bound for the distance between two distinct subgraphs.
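As a worked illustration, the following derivation assumes that the distance of Galbrun et al. (2016) has the closed form recalled in the first display. This form is not restated in the present text, so it is an assumption here and should be checked against the cited paper. Under that assumption, bounds for two distinct subgraphs follow from |U ∩ Z| ≤ min(|U|, |Z|).

```latex
% Assumed form of the distance (to be checked against Galbrun et al. 2016):
d(G[U], G[Z]) =
\begin{cases}
  2 - \dfrac{|U \cap Z|^2}{|U|\,|Z|} & \text{if } U \neq Z,\\[2mm]
  0 & \text{otherwise.}
\end{cases}

% For U \neq Z we have |U \cap Z| \le \min(|U|, |Z|), and
% |U \cap Z| = |U| = |Z| would force U = Z, hence
0 \le \frac{|U \cap Z|^2}{|U|\,|Z|} < 1
\quad\Longrightarrow\quad
1 < d(G[U], G[Z]) \le 2 .

% Example with U = \{1,2\} and Z = \{1,2,3\}:
d(G[U], G[Z]) = 2 - \frac{2^2}{2 \cdot 3} = \frac{4}{3}.
```

The upper bound 2 is attained exactly when U and Z are disjoint.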
Now, we are able to define the problem we are interested in, introduced in Galbrun et al. (2016), where we add the constraint that k < |V |.

Problem 1 Top-k-Overlapping Densest Subgraphs
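The objective described in the Introduction, which combines the densities of the subgraphs with the λ-weighted pairwise distances, can be sketched as follows. The sketch rests on two assumptions that are not restated in this text: the density of G[U] is taken to be |E(U)|/|U|, and the distance is taken to be 2 − |U ∩ Z|²/(|U||Z|) for U ≠ Z and 0 otherwise, as we recall it from Galbrun et al. (2016). All function names are hypothetical.

```python
from itertools import combinations

def density(edges, U):
    """Assumed density of G[U]: edges with both endpoints in U, over |U|."""
    return sum(1 for u, v in edges if u in U and v in U) / len(U)

def distance(U, Z):
    """Assumed distance between the subgraphs induced by U and Z."""
    if U == Z:
        return 0.0
    return 2.0 - len(U & Z) ** 2 / (len(U) * len(Z))

def objective(edges, W, lam):
    """Sum of densities plus lam times the sum of pairwise distances."""
    total_density = sum(density(edges, U) for U in W)
    total_distance = sum(distance(U, Z) for U, Z in combinations(W, 2))
    return total_density + lam * total_distance
```

For instance, on two disjoint edges with W made of the two edges and lam = 1, the sketch gives 0.5 + 0.5 + 2 = 3.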
Notice that a solution W of Top-k-Overlapping Densest Subgraphs (see Fig. 1 for an example) consists of k distinct subgraphs, since W is a set. We denote by (G, λ) an instance of Top-k-Overlapping Densest Subgraphs. Moreover, we assume in what follows that |V| > 5 (it is required in the proof of Lemma 5). Notice that, when |V| ≤ 5, Top-k-Overlapping Densest Subgraphs can be solved optimally in constant time. In the approximation algorithms, we will apply a modification of Goldberg's Algorithm given in Zou (2013). We refer to this algorithm as the Extended Goldberg's Algorithm. Extended Goldberg's Algorithm (Zou 2013) addresses a constrained variant of Densest Subgraph that, given as input a graph G = (V, E) and a subset of V (the constrained set), asks for a densest subgraph of G that contains every vertex of the constrained set.

Approximating Top-k-Overlapping Densest Subgraphs
In this section, we present a 2/3-approximation algorithm for Top-k-Overlapping Densest Subgraphs when k is a constant and a 1/2-approximation algorithm when k is not a constant. First, the two approximation algorithms compute a densest subgraph of G, denoted by G[W_1]. Then, the two approximation algorithms iteratively compute a solution for an intermediate problem, called Densest-Distinct-Subgraph. When k is constant we are able to solve the Densest-Distinct-Subgraph problem in polynomial time, while for general k we are able to provide a 1/2-approximation algorithm for it. First, we introduce the Densest-Distinct-Subgraph problem, then we present the two approximation algorithms and the analysis of their approximation factors.

Problem 2 Densest-Distinct-Subgraph
Notice that Densest-Distinct-Subgraph is not the same as computing a densest subgraph of G, as we need to ensure that the returned subgraph G[Z] is distinct from any subgraph in W. Moreover, notice that we assume t ≤ k − 1, since if t = k we already have k subgraphs in our solution of Top-k-Overlapping Densest Subgraphs.
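To make the difference from plain Densest Subgraph concrete, the following exhaustive sketch searches for a densest vertex set that is not already in W. It is exponential in |V| and meant only for tiny graphs; it is not Algorithm 1. The density |E(Z)|/|Z| and all names are assumptions made for illustration.

```python
from itertools import combinations

def density(edges, U):
    """Assumed density of G[U]: |E(U)| / |U|."""
    return sum(1 for u, v in edges if u in U and v in U) / len(U)

def densest_distinct_subgraph(vertices, edges, W):
    """Try every non-empty vertex set Z that is not in W and keep the
    densest one (exhaustive search, for illustration only)."""
    forbidden = {frozenset(S) for S in W}
    best, best_density = None, -1.0
    ordered = sorted(vertices)
    for size in range(1, len(ordered) + 1):
        for candidate in combinations(ordered, size):
            Z = frozenset(candidate)
            if Z in forbidden:
                continue  # must be distinct from every subgraph in W
            d = density(edges, Z)
            if d > best_density:
                best, best_density = Z, d
    return best, best_density
```

On a 4-clique with a pendant vertex, the search returns the clique when W is empty, and the whole graph once the clique is already in W.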

Approximation for constant k
First, we show that Densest-Distinct-Subgraph is polynomial-time solvable when k is a constant. The approximation algorithm for Top-k-Overlapping Densest Subgraphs returns the solution of maximum value between a solution obtained by iteratively solving Densest-Distinct-Subgraph (see Algorithm 2) and a solution consisting of k singletons.

A polynomial-time algorithm for Densest-Distinct-Subgraph
We start by proving a property of solutions of Densest-Distinct-Subgraph.

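Lemma 2 Consider a graph G = (V, E) and a set W = {G[W_1], ..., G[W_t]}, with 1 ≤ t ≤ k − 1, of subgraphs of G. Given a subgraph G[Z] distinct from the subgraphs in W, there exist t (not necessarily distinct) vertices u_1, ..., u_t that can be partitioned into two sets U_1 and U_2 such that Z ⊇ U_1, Z ∩ U_2 = ∅ and there is no G[W_j] in W, with 1 ≤ j ≤ t, such that W_j ⊇ U_1 and W_j ∩ U_2 = ∅.
Proof Since G[Z] is distinct from each G[W_j], with 1 ≤ j ≤ t, it follows that: (1) there exists u_j ∈ Z\W_j, then add u_j to U_1, or (2) there exists u_j ∈ W_j\Z, then add u_j to U_2. By construction, the two sets U_1 and U_2 satisfy the lemma.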
Next, based on Lemma 2, we provide Algorithm 1 that computes an optimal solution of Densest-Distinct-Subgraph, when k is a constant. Algorithm 1 iterates over each subset U of at most t vertices (recall that |W| = t < k) and over the subsets U_1, U_2 ⊆ U such that U_1 ⊎ U_2 = U. Algorithm 1 computes a densest subgraph G[Z] of G, with constrained set U_1 and with Z ∩ U_2 = ∅, such that there is no subgraph of W that contains U_1 and whose set of vertices is disjoint from U_2. Algorithm 1 applies the Extended Goldberg's Algorithm on the subgraph G[V\U_2], with constrained set U_1.
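The witness sets used by Lemma 2 can be built directly from the case distinction in its proof. The sketch below is only an illustration: the function names are hypothetical, vertices are assumed to be comparable (for example integers) so that min picks a deterministic witness, and Z is assumed to differ from every set in W.

```python
def distinguishing_sets(Z, W):
    """For each W_j pick one vertex of Z \\ W_j (goes to U1) or, if there
    is none, one vertex of W_j \\ Z (goes to U2)."""
    Z = set(Z)
    U1, U2 = set(), set()
    for Wj in map(set, W):
        extra = Z - Wj
        if extra:
            U1.add(min(extra))
        else:
            missing = Wj - Z
            if not missing:
                raise ValueError("Z must be distinct from every set in W")
            U2.add(min(missing))
    return U1, U2

def respects(U1, U2, S):
    """True when the vertex set S contains U1 and avoids U2."""
    S = set(S)
    return U1 <= S and not (U2 & S)
```

By construction Z respects (U1, U2), while no set of W does, which is the property Algorithm 1 exploits when it fixes U_1 as the constrained set and removes U_2 from the graph.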

Algorithm 1: Returns an optimal solution for Densest-Distinct-Subgraph
We prove the correctness of Algorithm 1 in the next theorem.

Theorem 1 Let G[Z] be the solution returned by Algorithm 1. Then, an optimal solution of Densest-Distinct-Subgraph over instance (G, W) has density at most dens(G[Z]).
Proof Let G[Z] be the solution returned by Algorithm 1. By Lemma 2 it follows that for each subgraph distinct from those in W, hence also for an optimal solution G[X] of Densest-Distinct-Subgraph over instance (G, W), there exist t (not necessarily distinct) vertices u_1, ..., u_t that can be partitioned into two sets U'_1, U'_2 such that X ⊇ U'_1, X ∩ U'_2 = ∅ and there is no G[W_j] in W, with 1 ≤ j ≤ t, such that W_j ⊇ U'_1 and W_j ∩ U'_2 = ∅. The subgraph G[Z] returned by Algorithm 1 is computed as a densest subgraph over each subset U of at most t vertices and for each partition of U into two sets U_1 and U_2, such that Z ⊇ U_1, Z ∩ U_2 = ∅ and there is no G[W_j] in W, with 1 ≤ j ≤ t, such that W_j ⊇ U_1 and W_j ∩ U_2 = ∅. This holds also when U_1 = U'_1 and U_2 = U'_2, hence dens(G[Z]) ≥ dens(G[X]).
We recall that a densest subgraph constrained to a given set can be computed in time O(|V|^3) with the Extended Goldberg's Algorithm (Zou 2013; Kawase and Miyauchi 2018). The subsets U can be enumerated in O(|V|^{k−1}) time, by selecting t elements from V, since there are |V|^t ≤ |V|^{k−1} many of these subsets. For each U, the possible choices of U_1 and U_2 are O(2^{k−1}), which is a constant, since k is a constant. It follows that Algorithm 1 returns an optimal solution of Densest-Distinct-Subgraph in time O(|V|^{k+2}).

A 2/3-approximation algorithm when k is a constant
We show that, by solving the Densest-Distinct-Subgraph problem optimally, we achieve a 2/3-approximation ratio for Top-k-Overlapping Densest Subgraphs. First, we consider the solution returned by Algorithm 2. At each step, Algorithm 2 computes an optimal solution of Densest-Distinct-Subgraph in time O(|V|^{k+2}) and the output subgraph is added to the solution. Since k is a constant, the number of iterations of Algorithm 2 is a constant, hence the overall time complexity of Algorithm 2 is O(|V|^{k+2}).
Algorithm 2: Algorithm that returns an approximate solution of Top-k-Overlapping Densest Subgraphs
Proof The second inequality follows from Lemma 1 and from the fact that the subgraphs in W are all distinct. We prove the first inequality of the lemma by induction on the number h ≤ k of subgraphs added to W.
thus concluding the proof.
Consider a trivial algorithm, called Algorithm A_T, that, given an instance (G, λ), returns a solution W_T consisting of k singletons. We can prove now that the maximum between r(W) (where W is the solution returned by Algorithm 2) and r(W_T) (where W_T is the solution returned by Algorithm A_T) is at least 2/3 of the value of an optimal solution of Top-k-Overlapping Densest Subgraphs.
thus in this case A_T returns a solution having approximation factor 2/3. Second, assume that Since we can conclude that hence r(W) ≥ 2/3 r(W^o).
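The overall scheme for constant k, namely iterating Densest-Distinct-Subgraph and comparing the result with k singletons, can be sketched as follows. This is a minimal illustration and not the authors' implementation: an exhaustive search replaces Algorithm 1, the density |E(U)|/|U| and the distance 2 − |U ∩ Z|²/(|U||Z|) are assumed forms, and every function name is hypothetical.

```python
from itertools import combinations

def density(edges, U):
    return sum(1 for u, v in edges if u in U and v in U) / len(U)

def distance(U, Z):
    return 0.0 if U == Z else 2.0 - len(U & Z) ** 2 / (len(U) * len(Z))

def objective(edges, W, lam):
    pairwise = sum(distance(U, Z) for U, Z in combinations(W, 2))
    return sum(density(edges, U) for U in W) + lam * pairwise

def densest_distinct(vertices, edges, W):
    """Exhaustive stand-in for an exact Densest-Distinct-Subgraph solver."""
    candidates = (frozenset(c)
                  for size in range(1, len(vertices) + 1)
                  for c in combinations(sorted(vertices), size))
    return max((Z for Z in candidates if Z not in W),
               key=lambda Z: density(edges, Z))

def top_k_sketch(vertices, edges, k, lam):
    """Return the better of the iterative solution and k singletons
    (requires k < |V|)."""
    W = []
    for _ in range(k):
        W.append(densest_distinct(vertices, edges, W))
    singletons = [frozenset([v]) for v in sorted(vertices)[:k]]
    return max(W, singletons, key=lambda S: objective(edges, S, lam))
```

With a small λ the iterative solution wins because density dominates, while with a large λ the k singletons win because they are pairwise disjoint and therefore at maximum distance.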

Approximation when k is not a constant
Now, we show that Top-k-Overlapping Densest Subgraphs can be approximated within factor 1/2 when k is not a constant. The approximation algorithm (Algorithm 3) consists of two phases. In the first phase, while W does not contain crossing subgraphs (see Definition 1), Algorithm 3 adds to W a subgraph which is an optimal solution of Densest-Distinct-Subgraph. When W contains crossing subgraphs (Property 1 holds), Phase 2 of Algorithm 3 completes W by adding a set of subgraphs so that W contains k distinct subgraphs (see the description of Phase 2). We prove that the subgraphs added by Phase 2 are sufficiently dense (see Lemma 6). Notice that the subgraphs added by the algorithm are only required to be distinct, that is, a subgraph may be contained in, or have almost the same vertex set as, another subgraph.
The final steps of Algorithm 3 (Phase 2) read: complete W by adding the k − |W| densest distinct subgraphs (not already in W) induced by W_i ∪ {v}, with v ∈ V\W_i, by W_j ∪ {u}, with u ∈ V\W_j, and by W_j\{w}, with w ∈ W_{i,j} (or equivalently by W_i\{w}); Return(W).
First, we define formally the property on which Algorithm 3 is based.

Algorithm 3: Returns an approximate solution of Top-k-Overlapping Densest Subgraphs
Property 1 W contains two crossing subgraphs.
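Property 1 can be tested directly on the vertex sets. The sketch below assumes, consistently with the way crossing subgraphs are used in the proof of Lemma 4, that two subgraphs cross when they share at least one vertex and neither vertex set contains the other; the function names are hypothetical.

```python
from itertools import combinations

def are_crossing(A, B):
    """A and B share a vertex and neither one contains the other."""
    A, B = set(A), set(B)
    return bool(A & B) and bool(A - B) and bool(B - A)

def satisfies_property_1(W):
    """True when some pair of vertex sets in W is crossing."""
    return any(are_crossing(A, B) for A, B in combinations(W, 2))
```

Nested or disjoint vertex sets never satisfy the property, so Phase 1 keeps running as long as the sets containing a common vertex form a chain under inclusion.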

Description and analysis of phase 1
We show that, while W does not satisfy Property 1, Densest-Distinct-Subgraph can be solved optimally in polynomial time. We assume that a solution of Densest-Distinct-Subgraph contains at least two vertices, otherwise such a subgraph can be easily computed in polynomial time, since it consists of a single vertex and has density 0. First, we prove a property of a solution of Densest-Distinct-Subgraph when Property 1 does not hold.

Lemma 4 Consider a graph G = (V, E) and a set W = {G[W_1], ..., G[W_t]}, with 1 ≤ t ≤ k − 1, of distinct subgraphs of G that does not satisfy Property 1. Given a subgraph G[Z] distinct from the subgraphs in W, there exists a set U of at most three vertices that can be partitioned in two subsets U_1 and U_2, where U_2 can possibly be empty, such that Z ⊇ U_1, Z ∩ U_2 = ∅ and there is no G[W_j] in W, with 1 ≤ j ≤ t, such that W_j ⊇ U_1 and W_j ∩ U_2 = ∅.
Proof Consider a subgraph G[Z] distinct from the subgraphs in W and a vertex v_1 ∈ Z. Set U = {v_1} and U_1 = {v_1}. Notice that, for each subgraph in W that does not contain v_1, the lemma holds. Now, we consider the set W' of subgraphs in W that contain v_1, and we assume in the following that W' ≠ ∅.
Consider the pair (W', ⊆), where ⊆ is the subgraph inclusion relation. We claim that (W', ⊆) is a well-ordered set. Clearly, ⊆ is reflexive, antisymmetric and transitive on W'. We show that any two elements of (W', ⊆) are comparable, that is, given G[W_i] and G[W_j] in W', either G[W_i] ⊆ G[W_j] or G[W_j] ⊆ G[W_i]. Assume this is not the case. It follows that they are crossing subgraphs, since they both contain v_1, contradicting the hypothesis that Property 1 does not hold. Since W' is a finite set, it follows that (W', ⊆) is a well-ordered set.
Consider now the set W'_C of subgraphs in W' that are subgraphs of G[Z] and notice that, since (W', ⊆) is a well-ordered set, then also (W'_C, ⊆) is a well-ordered set. Let G[W_x] be the graph of maximum cardinality in W'_C. Since G[Z] is distinct from G[W_x], there exists a vertex v_2 ∈ Z\W_x and, since (W'_C, ⊆) is a well-ordered set, v_2 does not belong to any subgraph in W'_C. Hence add v_2 to U and set U_1 = {v_1, v_2}. Consider now the set W'_N of subgraphs in W' which are not subgraphs of G[Z]. Notice that (W'_N, ⊆) is a well-ordered set and let G[W_y] be the graph of minimum cardinality in W'_N. It follows that there exists a vertex v_3 ∈ W_y\Z, and notice that, since (W'_N, ⊆) is a well-ordered set, v_3 belongs to each subgraph in W'_N. Hence add v_3 to U and set U_2 = {v_3}.
Since we have shown that there exists U_1 ⊆ Z that is not contained in any subgraph of W'_C and there exists U_2, disjoint from Z, that is contained in each subgraph of W'_N, the lemma follows.
Algorithm 4: Returns a subgraph G[Z], distinct from the subgraphs in W, such that dens(G[Z]) is maximum. Its first steps read: Z ← ∅; dens ← 0; for each subset U ⊆ V of at most three vertices, and each partition of U into U_1 and U_2, ...
Algorithm 4 computes an optimal solution G[Z] of Densest-Distinct-Subgraph when Property 1 does not hold. Algorithm 4 is a modified variant of Algorithm 1 (see Sect. 3.1), which considers each set U of at most three vertices and each possible partition of U into U_1, U_2 (where U_2 can be empty). Based on Lemma 4, we can prove the following result.

Theorem 3 Let G[Z] be the solution returned by Algorithm 4. Then, an optimal solution of Densest-Distinct-Subgraph over instance (G, W) when Property 1 does not hold has density at most dens(G[Z]).
Proof Given (G, W), consider a subgraph G[X] of maximum density distinct from the subgraphs in W. By Lemma 4, it follows that there exists a set U' of at most three vertices that can be partitioned into subsets U'_1, U'_2 such that U'_1 ⊆ X and U'_2 ∩ X = ∅ and there is no subgraph in W satisfying the same property. The subgraph G[Z] returned by Algorithm 4 is computed as a densest subgraph over each subset U of at most three vertices and each bipartition U_1, U_2 of U such that U_1 ⊆ Z and U_2 ∩ Z = ∅ and there is no subgraph in W satisfying the same property. This holds also in the case U_1 = U'_1 and U_2 = U'_2, hence dens(G[Z]) ≥ dens(G[X]).
Notice that Algorithm 4 returns an optimal solution of Densest-Distinct-Subgraph when Property 1 does not hold in time O(|V|^6), since it applies the Extended Goldberg's Algorithm of complexity O(|V|^3) (Zou 2013; Kawase and Miyauchi 2018) for each subset of at most three vertices in V.

Description and analysis of phase 2
Assuming that Property 1 holds and |W| = t < k, we consider Phase 2 of Algorithm 3. Given two crossing subgraphs G[W_i] and G[W_j] of W, with 1 ≤ i ≤ t, 1 ≤ j ≤ t and i ≠ j, define W_{i,j} = W_i ∩ W_j. Algorithm 3 adds h = k − t subgraphs to W until |W| = k, as follows.
If |W_{i,j}| ≤ 3, then Phase 2 of Algorithm 3 adds the h densest distinct subgraphs (not already in W) induced by W_i ∪ {v}, for some v ∈ V\W_i, and by W_j ∪ {u}, for some u ∈ V\W_j.
If |W_{i,j}| ≥ 4, then Phase 2 of Algorithm 3 adds the h densest distinct subgraphs (not already in W) induced by W_i ∪ {v}, for some v ∈ V\W_i, by W_j ∪ {u}, for some u ∈ V\W_j, and by W_j\{w}, for some w ∈ W_{i,j} (or equivalently by W_i\{w}, for some w ∈ W_{i,j}).
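The two cases above can be sketched as a candidate generator followed by a selection of the densest new candidates. This is an illustrative sketch only: the density |E(U)|/|U| is an assumed form, ties are broken by the sorted vertex list purely to make the output deterministic, and all names are hypothetical.

```python
def density(edges, U):
    """Assumed density of G[U]: |E(U)| / |U|."""
    return sum(1 for u, v in edges if u in U and v in U) / len(U)

def phase2_candidates(vertices, Wi, Wj):
    """Vertex sets W_i + v and W_j + u, plus W_j - w when |W_i & W_j| >= 4."""
    V, Wi, Wj = frozenset(vertices), frozenset(Wi), frozenset(Wj)
    cands = {Wi | {v} for v in V - Wi} | {Wj | {u} for u in V - Wj}
    if len(Wi & Wj) >= 4:
        cands |= {Wj - {w} for w in Wi & Wj}
    return cands

def phase2(vertices, edges, W, Wi, Wj, k):
    """Complete W with the k - |W| densest candidates not already in W."""
    present = {frozenset(S) for S in W}
    fresh = sorted(phase2_candidates(vertices, Wi, Wj) - present,
                   key=lambda S: (-density(edges, S), sorted(S)))
    return list(W) + fresh[: k - len(W)]
```

Because the candidates are collected in a set, a vertex set that arises both as W_i ∪ {v} and as W_j ∪ {u} is counted once, which matches the remark in the proof of Lemma 5 about identical subgraphs.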
Next, we show that, after Phase 2 of Algorithm 3, |W| = k and the set W' of subgraphs added by Phase 2 is sufficiently dense (Lemma 6). We start by proving that, after Phase 2 of Algorithm 3, |W| = k. Lemma 5 is based on the size of W_{i,j} = W_i ∩ W_j. When |W_{i,j}| ≤ 3, we distinguish two cases depending on the number of vertices that belong to W_i\W_{i,j} and W_j\W_{i,j}. If one of these sets has at least two vertices, then there are enough subgraphs obtained by adding a vertex to W_i and W_j. In the other case (that is, |W_i\W_{i,j}| = |W_j\W_{i,j}| = 1), |W_i ∪ W_j| ≤ 5, thus there are at least |V| − 5 vertices that can be added to W_i and to W_j.
When |W_{i,j}| ≥ 4, we can show that there are at least |V\W_{i,j}| − 1 subgraphs obtained by adding a vertex to W_i or to W_j. Then, we show that there are |W_{i,j}| subgraphs induced by W_j\{w} (which are added by Phase 2).

Lemma 5 |W| = k after the execution of Phase 2 of Algorithm 3.
Proof Recall that we have assumed |V| > 5 and that G[W_i] and G[W_j] are two crossing subgraphs added in Phase 1 of Algorithm 3, with W_{i,j} = W_i ∩ W_j. Next, we consider the following cases depending on the size of W_{i,j}.
Consider the case that |W_{i,j}| ≤ 3. If |W_i\W_{i,j}| ≥ 2 or |W_j\W_{i,j}| ≥ 2, then W_i ∪ {v}, with v ∈ V\W_i, and W_j ∪ {u}, with u ∈ V\W_j, induce distinct subgraphs. Hence there exist at least |V| − 3 distinct subgraphs induced by W_i ∪ {v}, with v ∈ V\W_i, or by W_j ∪ {u}, with u ∈ V\W_j. Since G[W_i] and G[W_j] are in W and k ≤ |V| − 1, it follows that in this case k subgraphs belong to W after Phase 2 of Algorithm 3.
If both |W_i\W_{i,j}| = 1 and |W_j\W_{i,j}| = 1, then there exist one subgraph induced by W_i ∪ W_j and, since we have assumed that |W_{i,j}| ≤ 3, at least |V| − 5 distinct subgraphs induced by W_i ∪ {v}, with v ∈ V\(W_i ∪ W_j), and at least |V| − 5 distinct subgraphs induced by W_j ∪ {u}, with u ∈ V\(W_i ∪ W_j). Since G[W_i] and G[W_j] are in W, |V| > 5 and k ≤ |V| − 1, it follows that in this case k subgraphs belong to W after Phase 2 of Algorithm 3.
Consider now the case that |W_{i,j}| ≥ 4. There exist at least |V\W_i| subgraphs induced by W_i ∪ {v}, with v ∈ V\W_i, and at least |V\W_j| subgraphs induced by W_j ∪ {u}, with u ∈ V\W_j. Hence there exist at least |V\W_{i,j}| − 1 distinct subgraphs induced by W_i ∪ {v}, with v ∈ V\W_i, or by W_j ∪ {u}, with u ∈ V\W_j (notice that the value −1 is due to the fact that W_i ∪ {v} and W_j ∪ {u} induce identical subgraphs when W_i\W_j = {u} and W_j\W_i = {v}).
There exist at least |W_{i,j}| subgraphs induced by W_j\{w}, for some w ∈ W_{i,j}. Since k ≤ |V| − 1, it follows that in this case k subgraphs belong to W after Phase 2 of Algorithm 3.
Each edge {v, w}, with v, w ∈ W_{i,j}, is skipped in the sum exactly twice, once for u = v and once for u = w. It follows that Thus It follows that Since |W_{i,j}| ≥ 4, it follows that Since Algorithm 3 adds the h densest subgraphs among the choices of u ∈ W_{i,j} so that |W| = k, this completes the proof.
Now, we consider the time complexity of Algorithm 3.

Lemma 7 Algorithm 3 requires O(|V|^7) time.
Proof Phase 2 of Algorithm 3 requires O(k^2 |V|) time, since we have to compare each subgraph to be added to W with the subgraphs already in W and each of these comparisons requires O(k|V|) time. Each iteration of Phase 1 of Algorithm 3 requires time O(|V|^6), hence the overall complexity of Algorithm 3 is O(|V|^7), since Phase 1 is iterated at most k ≤ |V| − 1 times.
Now, thanks to Lemma 6, we are able to prove that the density of the solution returned by Algorithm 3 is at least half the density of an optimal solution of Top-k-Overlapping Densest Subgraphs. We prove that the lemma holds for the subgraphs added by Phase 1 of Algorithm 3 by induction on k_1. When . Assume that the lemma holds for h < k_1; we prove that it holds for h + 1.
Notice that

By induction hypothesis
We prove that .
Combining Inequalities 2, 3, we obtain thus concluding the proof. We can conclude the analysis of the approximation factor with the following result.
We can conclude that r(W) ≥ 1/2 r(W^o).

Complexity of Top-k-Overlapping Densest Subgraphs
In this section, we consider the computational complexity of Top-k-Overlapping Densest Subgraphs and we show that the problem is NP-hard even if k = 3, when λ = 3|V|^3. We denote this restriction of the problem by Top-3-Overlapping Densest Subgraphs. Notice that our hardness result applies when λ is large (λ = 3|V|^3) and hence an optimal solution of Top-3-Overlapping Densest Subgraphs consists of three disjoint subgraphs. We prove the result by giving a reduction from 3-Clique Partition, which is NP-complete (Karp 1972). Next, we recall the definition of 3-Clique Partition.

Problem 3 3-Clique Partition
Input: A graph G_P = (V_P, E_P). Output: A partition of V_P into V_{P,1}, V_{P,2}, V_{P,3} such that V_P = V_{P,1} ⊎ V_{P,2} ⊎ V_{P,3} and each G_P[V_{P,i}], with 1 ≤ i ≤ 3, is a clique.
Given an instance G_P = (V_P, E_P) of 3-Clique Partition, define an instance (G = (V, E), λ) of Top-3-Overlapping Densest Subgraphs as follows: set G = G_P and λ = 3|V|^3. In order to define a reduction from 3-Clique Partition to Top-3-Overlapping Densest Subgraphs, we show the following result.
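The construction above is simple enough to be sketched directly: the graph is left unchanged and only λ is set. The following minimal sketch, with hypothetical function names, builds the instance and checks whether a given triple of vertex sets is a 3-clique partition. It allows empty parts, since the problem statement above does not exclude them.

```python
from itertools import combinations

def is_clique(edges, S):
    """Every pair of distinct vertices of S is joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(S, 2))

def is_3_clique_partition(vertices, edges, parts):
    """parts must be three pairwise disjoint cliques covering all vertices."""
    parts = [set(p) for p in parts]
    all_vertices = set(vertices)
    covers = set().union(*parts) == all_vertices
    disjoint = sum(len(p) for p in parts) == len(all_vertices)
    return (len(parts) == 3 and covers and disjoint
            and all(is_clique(edges, p) for p in parts))

def reduction_instance(vertices, edges):
    """Instance (G, lam) of the reduction: G = G_P and lam = 3 |V|^3."""
    lam = 3 * len(set(vertices)) ** 3
    return (set(vertices), list(edges)), lam
```

For a graph with six vertices the sketch sets lam = 3 · 6³ = 648, which is large enough to dominate the total density of any three subgraphs, in line with the remark that an optimal solution then consists of disjoint subgraphs.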
Since G_P[V_{P,i}], with 1 ≤ i ≤ 3, is a clique and, by construction, G[V_i] is also a clique, it follows that By construction of G, it follows that G[V_{P,1}], G[V_{P,2}], G[V_{P,3}] are disjoint, V_{P,1} ⊎ V_{P,2} ⊎ V_{P,3} = V_P and that G[V_{P,i}], with 1 ≤ i ≤ 3, is a clique.
We can conclude that Top-3-Overlapping Densest Subgraphs is NP-hard.

Theorem 5 Top-3-Overlapping Densest Subgraphs is NP-hard.
Proof From Lemma 9, it follows that we have described a polynomial-time reduction from 3-Clique Partition to Top-3-Overlapping Densest Subgraphs. Since 3-Clique Partition is NP-complete (Karp 1972), it follows that also Top-3-Overlapping Densest Subgraphs is NP-hard.

Conclusion
We have shown that Top-k-Overlapping Densest Subgraphs is NP-hard when k = 3 and we have given two approximation algorithms of factor 2/3 and 1/2, when k is a constant and when k is smaller than the number of vertices in the graph, respectively. For future work, it would be interesting to further investigate the approximability of Top-k-Overlapping Densest Subgraphs; in particular, it remains open whether the problem admits a polynomial-time approximation scheme. A second interesting open problem is the computational complexity of Top-k-Overlapping Densest Subgraphs, in particular when λ is a constant and when the subgraphs in the solution overlap. Another open problem of theoretical interest is the computational complexity of Top-k-Overlapping Densest Subgraphs when k = 2.
Another direction is the investigation of the problem with other distance functions. The distance function we have considered has been introduced and applied in Galbrun et al. (2016) and, thanks to its properties (see Lemma 1), we were able to improve the constant-factor approximation of Top-k-Overlapping Densest Subgraphs, since it is enough to return distinct subgraphs. However, for other distance functions alternative algorithmic strategies may be needed to provide approximation algorithms. For example, one may consider the following distance function: e l s e .
Notice that Lemma 1 does not hold for this distance function, so the approximation results we have given cannot be applied.