Complexity of the (Connected) Cluster Vertex Deletion problem on $H$-free graphs

The well-known Cluster Vertex Deletion problem (CVD) asks for a given graph $G$ and an integer $k$ whether it is possible to delete a set $S$ of at most $k$ vertices of $G$ such that the resulting graph $G-S$ is a cluster graph (a disjoint union of cliques). We give a complete characterization of graphs $H$ for which CVD on $H$-free graphs is polynomially solvable and for which it is NP-complete. Moreover, in the NP-completeness cases, CVD cannot be solved in sub-exponential time in the vertex number of the $H$-free input graphs unless the Exponential-Time Hypothesis fails. We also consider the connected variant of CVD, the Connected Cluster Vertex Deletion problem (CCVD), in which the set $S$ has to induce a connected subgraph of $G$. It turns out that CCVD admits the same complexity dichotomy for $H$-free graphs. Our results enlarge a list of rare dichotomy theorems for well-studied problems on $H$-free graphs.


Introduction and results
A very extensively studied version of graph modification problems asks to modify a given graph to a graph that satisfies a certain property G by deleting a minimum number of vertices. The case G being 'edgeless' is the well-known vertex cover problem, one of the classical NP-hard problems. If G is 'cluster graph', that is, a graph in which every connected component is a clique, the corresponding problem is another well-known NP-hard problem, the cluster vertex deletion problem (cluster-vd for short). In this paper, we revisit the computational complexity of cluster-vd, formally given below.

cluster-vd
Instance: A graph G = (V, E) and an integer k.
Question: Does there exist a vertex set S ⊆ V of size at most k such that G − S is a cluster graph?
Since being a cluster graph is a hereditary property on induced subgraphs, cluster-vd is NP-complete [25] and cannot be solved in $2^{o(n+m)}$ time unless the ETH (Exponential-Time Hypothesis) fails [21], where n and m are the numbers of vertices and edges of the input graph, respectively. cluster-vd remains NP-complete even when restricted to planar graphs [32] and to bipartite graphs [33].

Preliminaries
For a set H of graphs, H-free graphs are those in which no induced subgraph is isomorphic to a graph in H. We denote by $K_{1,n}$ the tree with n + 1 ≥ 3 vertices and n leaves, and by $C_n$ the n-vertex cycle. The girth girth(G) of a graph G is the smallest length of a cycle in G; we set girth(G) = ∞ if G is a forest, that is, a graph without cycles. Thus, for any fixed integer g ≥ 3, girth(G) > g if and only if G is $\{C_3, C_4, \ldots, C_g\}$-free.
As usual, we denote by $\overline{G}$ the complement of a graph G. The union G + H of two vertex-disjoint graphs G and H is the graph with vertex set V(G) ∪ V(H) and edge set E(G) ∪ E(H); we write pG for the union of p copies of G. For a subset S ⊆ V(G), let G[S] denote the subgraph of G induced by S; G − S stands for G[V(G) \ S]. By 'G contains an H' we mean that G contains H as an induced subgraph. Graphs in which every vertex has degree 3 are called 3-regular or cubic graphs, and graphs with maximum degree at most 3 are called subcubic graphs.
A graph G is a cluster graph if each of its connected components is a clique. Observe that G is a cluster graph if and only if G is $P_3$-free. If S ⊆ V(G) is a subset of vertices of G such that G − S is $P_3$-free, then S is called a cluster vertex deletion set of G. An optimal cluster vertex deletion set is one of minimum size.
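The characterization of cluster graphs as $P_3$-free graphs gives a direct test for cluster vertex deletion sets. The following is a minimal sketch, assuming a graph is stored as a dictionary mapping each vertex to the set of its neighbours; the function names are illustrative and not taken from the paper, and the exhaustive search is exponential and meant only for very small graphs.

```python
from itertools import combinations


def find_induced_p3(adj, removed=frozenset()):
    """Return an induced path (x, y, z) of the graph minus `removed`, or None.

    `adj` maps every vertex to the set of its neighbours (assumed format).
    """
    for y, neighbours in adj.items():
        if y in removed:
            continue
        alive = [v for v in neighbours if v not in removed]
        # y is the middle vertex of an induced P_3 exactly when two of its
        # surviving neighbours are non-adjacent.
        for x, z in combinations(alive, 2):
            if z not in adj[x]:
                return (x, y, z)
    return None


def is_cluster_vertex_deletion_set(adj, s):
    """True if deleting the vertex set s leaves a P_3-free (cluster) graph."""
    return find_induced_p3(adj, frozenset(s)) is None


def optimal_cluster_vertex_deletion_set(adj):
    """Exhaustive search by increasing size; for tiny graphs only."""
    vertices = sorted(adj)
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            if is_cluster_vertex_deletion_set(adj, candidate):
                return set(candidate)
```

For the path a, b, c, for example, the empty set is not a cluster vertex deletion set, while {b} is one.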
Algorithmic lower bounds in this paper are conditional, based on the Exponential Time Hypothesis (ETH) [16]. The ETH asserts that no algorithm can solve 3sat in subexponential time $2^{o(n)}$ for n-variable 3-cnf formulas. As shown by the Sparsification Lemma in [17], the hard cases of 3sat consist of sparse formulas with m = O(n) clauses. Hence, the ETH implies that 3sat cannot be solved in time $2^{o(n+m)}$.
Recall that an instance for nae 3sat is a 3-cnf formula $F = C_1 \wedge C_2 \wedge \cdots \wedge C_m$ over n variables, in which each clause $C_j$ consists of three distinct literals. The problem asks whether there is a truth assignment of the variables such that every clause in F has at least one true and at least one false literal. Such an assignment is called an nae assignment, i.e., a not-all-equal assignment. There is a polynomial reduction from 3sat to nae 3sat ([26, Theorem 7.3]), which transforms an instance for 3sat with n variables and m clauses into an equivalent instance for nae 3sat with 2n + 24m variables and 32m clauses. Thus, we obtain:
▶ Theorem 3 ([26, 17]). nae 3sat is NP-complete and, assuming ETH, cannot be solved in time $2^{o(n+m)}$ on inputs with n variables and m clauses.
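The not-all-equal condition can be made concrete with a small checker. This is only an illustrative sketch: the encoding of literals as signed integers (a DIMACS-style convention) and the function names are assumptions introduced here, and the search over all assignments is exponential.

```python
from itertools import product


def is_nae_assignment(clauses, assignment):
    """True if every clause has at least one true and one false literal.

    A literal is a non-zero integer: +i stands for variable i, -i for its
    negation (assumed encoding). `assignment` maps variable -> bool.
    """
    for clause in clauses:
        values = {assignment[abs(lit)] == (lit > 0) for lit in clause}
        if len(values) < 2:  # all literals true, or all literals false
            return False
    return True


def find_nae_assignment(clauses):
    """Try all assignments; return one nae assignment or None."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if is_nae_assignment(clauses, assignment):
            return assignment
    return None
```

Note that flipping every variable of an nae assignment again yields an nae assignment, since true and false literals swap roles in each clause.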
We will also need the following restriction of nae 3sat. For integers p, q ≥ 2, let (p, q)-3sat denote the problem of deciding if a 3-cnf formula in which each variable occurs at most p times positively and at most q times negatively is satisfiable. (p, q)-nae 3sat is defined analogously. A reduction from 3sat, linear in the number of clauses, due to Tovey [30] shows that (2, 2)-3sat remains NP-complete and, assuming ETH, cannot be solved in time $2^{o(n)}$ for inputs with n variables. Now, the reduction due to Moret [26, Theorem 7.3] mentioned above transforms an instance for (2, 2)-3sat into an equivalent instance for (4, 4)-nae 3sat, linear in the number of variables and clauses. Hence, we obtain:
▶ Theorem 4 ([30, 26, 17]). (4, 4)-nae 3sat is NP-complete and, assuming ETH, cannot be solved in time $2^{o(n)}$ on inputs with n variables.
Structure of the paper. We first address the polynomial part of Theorems 1 and 2 in the next section. Then we present two new NP-completeness results for cluster-vd and connected cluster-vd in Sections 4 and 5. These hardness results allow us to settle the NP-completeness part of Theorems 1 and 2 in Section 6. The last section concludes the paper.

H-free graphs: polynomial cases
The polynomial part in Theorems 1 and 2 consists of six cases; see Fig. 1 for all graphs H for which cluster-vd and connected cluster-vd are polynomially solvable on H-free graphs.
Figure 1 The graphs H for which cluster-vd and connected cluster-vd are polynomially solvable on H-free graphs.
Observe that H-freeness is hereditary, meaning that if H′ is an induced subgraph of H then H′-free graphs are H-free graphs. Thus, it suffices to prove the polynomial part only for the case where H is the 4-vertex path $P_4$.
The proof will follow from the concept of clique-width of graphs in connection with the so-called monadic second-order logic, $\mathrm{MSOL}_1$ for short, an extension of first-order logic with quantification over vertex set variables. Briefly, the clique-width of a graph G, introduced in [8], is the minimum number of labels needed to construct G by: creating a new vertex with label i, taking a disjoint union of two labeled graphs, joining every vertex with label i to every vertex with label j ≠ i, and renaming label i to label j. Such a construction with k labels defines an algebraic k-expression. A well-known metatheorem by Courcelle, Makowsky and Rotics [9] states that any graph property expressible in $\mathrm{MSOL}_1$ is decidable in linear time for graphs with bounded clique-width, provided a k-expression of the graphs is given. It is well known that $P_4$-free graphs, also known as cographs, have clique-width at most 2 and a corresponding 2-expression can be constructed in linear time (see, e.g., [9]). Hence, any $\mathrm{MSOL}_1$ graph property is decidable in linear time when restricted to $P_4$-free graphs.
Now, being a cluster vertex deletion set is an $\mathrm{MSOL}_1$ property; it can be expressed, for instance, as
$$\forall x\,\forall y\,\forall z\;\bigl(\neg S(x) \wedge \neg S(y) \wedge \neg S(z) \wedge E(x,y) \wedge E(y,z) \wedge x \neq z \rightarrow E(x,z)\bigr),$$
where S(x) means x ∈ S and E(x, y) means xy ∈ E(G). (The sentence says that the graph G − S is $P_3$-free.) Also, the fact that the vertex set S in a graph G induces a connected subgraph of G can be written as an $\mathrm{MSOL}_1$ sentence, for instance
$$\forall X\,\Bigl(\bigl(\exists x\,(S(x) \wedge X(x)) \wedge \exists y\,(S(y) \wedge \neg X(y))\bigr) \rightarrow \exists x\,\exists y\,\bigl(S(x) \wedge X(x) \wedge S(y) \wedge \neg X(y) \wedge E(x,y)\bigr)\Bigr).$$
(The sentence says that, for any bipartition of S into two non-empty sets, there is an edge joining two vertices in different parts of the bipartition.) Thus, cluster-vd and connected cluster-vd can be solved in linear time on $P_4$-free graphs.
Indeed, we have a stronger fact. The weighted optimization versions of cluster-vd and connected cluster-vd, minimum cluster-vd and minimum connected cluster-vd, are $\mathrm{LinEMSOL}(\tau_{1,p})$ problems ($\mathrm{LinEMSOL}(\tau_{1,p})$ is an extension of $\mathrm{MSOL}_1$ which allows one to search for optimal sets of vertices with respect to some linear objective function). We refer to the paper [9] for details, in which it is shown that every $\mathrm{LinEMSOL}(\tau_{1,p})$ problem on $P_4$-free graphs can be solved in linear time [9, Theorem 4]. To sum up, we have:
▶ Proposition 5. cluster-vd and connected cluster-vd can be solved in linear time on $P_4$-free graphs, even in the weighted optimization version.
Another approach for obtaining the above results is to use the so-called cotree of cographs.Using the cotree of a cograph G, we are able to compute an optimal (connected) cluster vertex deletion set of G in linear time in a direct and simple way.The details are given in the appendices A and B.

Cluster-VD and Connected Cluster-VD on dense graphs
In this section, we give a polynomial reduction from vertex cover to cluster-vd, showing that cluster-vd remains NP-complete when restricted to $\{3P_1, 2P_2\}$-free n-vertex graphs with minimum degree at least n − 4.
Recall that the vertex cover problem asks, for a given graph G and an integer k, if one can delete a vertex set S of size at most k such that G − S is edgeless. It is well known that vertex cover is NP-complete and, assuming ETH, cannot be solved in $2^{o(n+m)}$ time on n-vertex m-edge graphs. This fact and a result in [18] imply that, assuming ETH, vertex cover cannot be solved in $2^{o(n)}$ time on subcubic n-vertex graphs. There is a polynomial-time reduction from vertex cover in cubic graphs to vertex cover in subcubic planar graphs with arbitrarily large girth, which transforms an instance (G, k) of the first version into an equivalent instance (G′, k′) of the second version, where the vertex number of G′ is linear in the vertex number of G (see, e.g., [28] or [21]). Thus, we obtain:
▶ Theorem 6 ([18, 28, 21]). Let g ≥ 3 be a fixed integer. vertex cover is NP-complete even when restricted to subcubic graphs of girth > g and, assuming ETH, vertex cover cannot be solved in $2^{o(n)}$ time in this restricted graph class.
We now describe the announced reduction. Let g ≥ 3 be an integer and let (G, k) be an instance for vertex cover, where G is an n-vertex subcubic graph with girth > g. We may assume that G is not perfect and that k ≤ |V(G)|/2. The first assumption is justified because vertex cover is polynomially solvable on perfect graphs (see [12]) and perfect graphs can be recognized in polynomial time [5]; notice also that G is perfect if and only if $\overline{G}$ is perfect. The second assumption can be seen as follows: given G with n vertices and an integer k, let G′ be obtained from G by adding p = max{0, 2k − n} isolated vertices. Then k ≤ |V(G′)|/2, and (G, k) ∈ vertex cover if and only if (G′, k) ∈ vertex cover. Notice that, like G, G′ is subcubic, not perfect and has girth > g.
From (G, k) we construct an equivalent instance (G′, k′) for cluster-vd as follows: G′ is obtained from two disjoint copies of $\overline{G}$, $G_1$ and $G_2$, by adding all possible edges between $V(G_1)$ and $V(G_2)$. Set k′ = 2k.
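The construction can be sketched in a few lines, reading it as the join of two copies of the complement of G (which is what the degree bound 2n − 4 and the $\{3P_1, 2P_2\}$-freeness require). The adjacency-dictionary format, the pairs (v, i) used to name the copies, and the function names are assumptions made for illustration.

```python
def complement(adj):
    """Complement of a simple graph given as vertex -> set of neighbours."""
    vertices = set(adj)
    return {v: vertices - adj[v] - {v} for v in adj}


def dense_cluster_vd_instance(adj, k):
    """Build (G', k') from a vertex cover instance (G, k).

    G' is the join of two copies of the complement of G; vertex v of copy i
    is represented by the pair (v, i) (hypothetical naming).
    """
    co = complement(adj)
    g_prime = {}
    for copy in (1, 2):
        other = 3 - copy
        for v in adj:
            inside = {(u, copy) for u in co[v]}   # edges of the complement
            across = {(u, other) for u in adj}    # all edges to the other copy
            g_prime[(v, copy)] = inside | across
    return g_prime, 2 * k
```

If G has n vertices and maximum degree at most 3, every vertex of the resulting graph has at least (n − 4) + n neighbours.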
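We argue that (G, k) ∈ vertex cover if and only if (G′, k′) ∈ cluster-vd. First, let S ⊂ V(G) be a vertex cover, that is, G − S is edgeless, with |S| ≤ k. Let $S_1$ and $S_2$ be the copies of S in $G_1$ and $G_2$, respectively. Then, for each i ∈ {1, 2}, $G_i − S_i$ is a clique in $G_i = \overline{G}$, and with S′ = $S_1 ∪ S_2$, G′ − S′ is a clique. Hence S′ is a cluster vertex deletion set of G′ of size 2|S| ≤ k′. Conversely, let S′ be a cluster vertex deletion set of G′ with |S′| ≤ k′ = 2k ≤ n, and write $S_i = S′ ∩ V(G_i)$. Since $\overline{G}$ is not perfect, it is not a cluster graph, so neither $V(G_1)$ nor $V(G_2)$ is contained in S′: otherwise, S′ would contain all n vertices of one copy and at least one vertex of the other copy, giving |S′| > n ≥ 2k. Thus G′ − S′ has vertices in both copies, is therefore connected, and hence is a clique. Consequently, each $G_i − S_i$ is a clique in $\overline{G}$, that is, each $S_i$ corresponds to a vertex cover of G, and one of $S_1$, $S_2$ has size at most k. We have seen that G has a vertex cover of size at most k if and only if G′ has a cluster vertex deletion set of size at most k′, as claimed.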
Note that G′ has 2n vertices and minimum degree at least 2n − 4 (as G has n vertices and maximum degree at most 3). Now, observe that, for any connected graph X, if G is X-free then G′ is $\overline{X}$-free. Since G is $\{C_3, C_4, \ldots, C_g\}$-free, we obtain with Theorem 6:
▶ Theorem 7. For any fixed g ≥ 3, cluster-vd is NP-complete on $\{\overline{C_3}, \overline{C_4}, \ldots, \overline{C_g}\}$-free n-vertex graphs with minimum degree at least n − 4 and, assuming ETH, cannot be solved in $2^{o(n)}$ time. In particular, cluster-vd is NP-complete on $\{3P_1, 2P_2\}$-free graphs and, assuming ETH, cannot be solved in $2^{o(n)}$ time.
We observe that the proof of Theorem 7 remains true for connected cluster vertex deletion sets: G has a vertex cover of size at most k ≤ |V(G)|/2 if and only if G′ has a connected cluster vertex deletion set of size at most k′ = 2k. Thus, Theorem 7 also holds for connected cluster-vd:
▶ Theorem 8. For any fixed g ≥ 3, connected cluster-vd is NP-complete on $\{\overline{C_3}, \overline{C_4}, \ldots, \overline{C_g}\}$-free n-vertex graphs with minimum degree at least n − 4 and, assuming ETH, cannot be solved in $2^{o(n)}$ time. In particular, connected cluster-vd is NP-complete on $\{3P_1, 2P_2\}$-free graphs and, assuming ETH, cannot be solved in $2^{o(n)}$ time.

Cluster-VD and Connected Cluster-VD on sparse graphs
In [33, Lemma 1], Yannakakis gave a polynomial-time reduction from nae 3sat to cluster-vd, which transforms an instance for nae 3sat with n variables and m clauses into an equivalent instance (G, k) for cluster-vd, where G is a bipartite graph with 6n + 12m vertices. Thus, by Theorem 3, cluster-vd is NP-complete even when restricted to bipartite graphs and, assuming ETH, cluster-vd cannot be solved in $2^{o(n)}$ time on bipartite graphs with n vertices. We remark that by considering (4, 4)-nae 3sat instead of nae 3sat, the bipartite graph obtained from the reduction of Yannakakis mentioned above has maximum degree at most four. Thus, by Theorem 4, we obtain:
▶ Theorem 9 ([33]). cluster-vd is NP-complete even when restricted to n-vertex bipartite graphs of maximum degree at most 4 and, assuming ETH, cannot be solved in $2^{o(n)}$ time.
In [14], Hsieh, Le, Le and Peng gave another polynomial-time reduction from nae 3sat to cluster-vd, which transforms an instance for nae 3sat with n variables and m clauses into an equivalent instance (G, k) for cluster-vd, where G is a subcubic bipartite graph with 6nm + 30m vertices. Recall that we may assume (by the Sparsification Lemma) that m = O(n), so that G has O(n²) vertices. Thus, by Theorem 3, we obtain:
▶ Theorem 10 ([14]). cluster-vd is NP-complete even when restricted to subcubic n-vertex bipartite graphs and, assuming ETH, cannot be solved in time $2^{o(\sqrt{n})}$.
In this section, we will further improve Theorems 9 and 10 by Theorems 12 and 13, respectively. We begin with the following fact (Lemma 11).
Proof. Observe that since G is triangle-free, a cluster graph in G is a collection of isolated vertices and edges.
For one direction, extend a cluster vertex deletion set S ⊆ V(G) to a cluster vertex deletion set S′ ⊆ V(G′) of size |S| + m as follows; see also Fig. 2: initially, set S′ = S. Then, for each edge e = xy in G, if both x and y are in S or both are outside S, put $e_{xy}$ into S′; if x ∈ S and y ∉ S, put $e_y$ into S′; if x ∉ S and y ∈ S, put $e_x$ into S′. To see that G′ − S′ is $P_3$-free, notice that by construction, for each edge e = xy in G, exactly one of $e_x$, $e_{xy}$ and $e_y$ is in S′, and if $e_x, e_{xy}$ ∉ S′ then x ∈ S, and if $e_x$, x ∉ S′ then y ∉ S, hence $e_{xy}$ ∈ S′. Since each $P_3$ in G′ has the form $x e_x e_{xy}$, $e_x e_{xy} e_y$ or $e_x x e'_x$ for some edges e = xy and e′ = xz, it follows from these facts and the assumption that G is triangle-free that G′ − S′ is $P_3$-free.
For the other direction, suppose that G′ has a cluster vertex deletion set of size at most k + m, and consider such a set S′ of minimum size. Then, we may assume that, for each edge e = xy in G, S′ contains exactly one of $e_x$, $e_{xy}$ and $e_y$: note that $e_x e_{xy} e_y$ is a $P_3$, hence $|S′ ∩ \{e_x, e_{xy}, e_y\}| ≥ 1$, and by minimality, $|S′ ∩ \{e_x, e_{xy}, e_y\}| ≤ 2$. Now, if $|S′ ∩ \{e_x, e_{xy}, e_y\}| = 2$ for some edge e = xy in G, then S′ can be modified to a minimum cluster vertex deletion set containing exactly one of $e_x$, $e_{xy}$ and $e_y$ as follows.
Suppose that $e_x, e_{xy}$ ∈ S′. Then x, y ∉ S′ (if x ∈ S′ then S′ − $e_x$ would be a cluster vertex deletion set of G′, and if y ∈ S′ then S′ − $e_{xy}$ would be a cluster vertex deletion set of G′, contradicting the minimality of S′), and S″ = S′ − $e_{xy}$ + y is the desired cluster vertex deletion set of minimum size.
Suppose that $e_y, e_{xy}$ ∈ S′. Then, similarly to the above case, x, y ∉ S′, and S″ = S′ − $e_{xy}$ + x is the desired cluster vertex deletion set of minimum size.
Suppose that $e_x, e_y$ ∈ S′. Then x, y ∉ S′ (if x ∈ S′ or y ∈ S′ then S″ = S′ − $e_x$, respectively S″ = S′ − $e_y$, would be a cluster vertex deletion set of G′, contradicting the minimality of S′), and S″ = S′ − $e_x$ + x is the desired cluster vertex deletion set of minimum size.
Hence, S = S′ ∩ V(G) has at most k vertices, and G − S is $P_3$-free: if there were an induced $P_3$ xyz in G with edges e = xy and e′ = yz, then, as $|S′ ∩ \{e_x, e_{xy}, e_y\}| = 1 = |S′ ∩ \{e'_y, e'_{yz}, e'_z\}|$, one of the 3-paths $x e_x e_{xy}$, $e_y y e'_y$ and $e'_{yz} e'_z z$ would be outside S′. Thus, G has a cluster vertex deletion set of size at most k if and only if G′ has a cluster vertex deletion set of size at most k + m, as claimed. ◀
We now show that, for any given tree T containing two vertices of degree 3, cluster-vd remains NP-complete when restricted to T-free bipartite graphs of maximum degree 4 and with arbitrarily large girth.
▶ Theorem 12. For any given integer g ≥ 3 and any given tree T containing two degree-3 vertices, cluster-vd is NP-complete on T-free n-vertex bipartite graphs of maximum degree at most 4 and with girth > g and, assuming ETH, cannot be solved in $2^{o(n)}$ time.
Proof. Note that cluster-vd restricted to the graph class in question is in NP. Below we give a polynomial-time reduction from cluster-vd restricted to bipartite graphs of degree at most 4 to cluster-vd restricted to T-free bipartite graphs of degree at most 4 and with arbitrarily large girth.
First, given a bipartite graph G of maximum degree at most 4 with n vertices and m edges, let G′ be obtained from G by subdividing the edges as described in Lemma 11. Note that, like G, G′ is bipartite and has maximum degree at most 4. By Lemma 11, G has a cluster vertex deletion set of size at most k if and only if G′ has a cluster vertex deletion set of size at most k + m. Now, given g > 0 and a tree T with two degree-3 vertices, fix an integer $t \ge \max\{\log_4 g, |V(T)|\}$. Then, repeating the construction in Lemma 11 t times, the final bipartite graph G′ has girth $4^t \cdot \mathrm{girth}(G) > g$ and maximum degree at most 4, and contains no induced subgraph isomorphic to T (as the distance between two degree-3 vertices in G′ is larger than |V(T)|). Thus the NP-hardness part of the theorem follows from the first part of Theorem 9. Note that G′ has $n + (4^t − 1)m = O(n)$ vertices; hence, the second part of the theorem follows from the second part of Theorem 9. ◀
Observe that if we consider subcubic bipartite graphs and make use of Theorem 10 instead of Theorem 9 in the proof of Theorem 12, we obtain:
▶ Theorem 13. For any given integer g ≥ 3 and any given tree T containing two degree-3 vertices, cluster-vd is NP-complete on T-free subcubic n-vertex bipartite graphs with girth > g and, assuming ETH, cannot be solved in $2^{o(\sqrt{n})}$ time.
We are now going to show that connected cluster-vd remains NP-complete when restricted to bipartite graphs with arbitrarily large girth. (Notice that a reduction based on Lemma 11, similar to the reduction in Theorem 12, does not work for connected cluster-vd.) Let g > 0 be a given integer. From an instance (G, k) of cluster-vd, where G = (X ∪ Y, E) is a bipartite graph with girth > g, we construct an instance (G(g), k′) for connected cluster-vd, where G(g) is a bipartite graph of girth > g, as follows.
We may assume that g is odd (otherwise, replace g by g + 1).
Write $X = \{x_1, x_2, \ldots, x_r\}$, $Y = \{y_1, y_2, \ldots, y_s\}$, and n = r + s.
Let H(g, r, s) be the tree depicted in Fig. 3; note that H(g, r, s) has 6r + 3gr + 6s + 3gs = (6 + 3g)n vertices. The property of H(g, r, s) that will be used is that the set of all degree-3 vertices of H(g, r, s), that is, all $x_{ig}$, 1 ≤ i ≤ r, and all $y_{jg}$, 1 ≤ j ≤ s, is both an optimal cluster vertex deletion set and the unique connected cluster vertex deletion set. The vertices $x_{ig}$ and $y_{jg}$ will have degree 3 in the whole graph G(g). In Fig. 3 the unique connected cluster vertex deletion set contains the (g + 2)n black vertices.
Then, let G(g) be obtained from G and H(g, r, s) by adding an edge between $x_i$ and $x_{ig}$, 1 ≤ i ≤ r, and between $y_j$ and $y_{jg}$, 1 ≤ j ≤ s. Note that, like G, G(g) is bipartite (as g is odd) and has n′ = n + (6 + 3g)n = (7 + 3g)n vertices. See Fig. 4 for an example in case g = 3.
Finally, set k′ = k + (g + 2)n.
Clearly, (G(g), k′) can be constructed in polynomial time from (G, k). Now, let S be a cluster vertex deletion set of G of size at most k. Then G(g) has a connected cluster vertex deletion set S′ of size |S| + (g + 2)n ≤ k′: S′ is obtained from S by adding all vertices of H(g, r, s) with degree 3 in G(g) (the (g + 2)n black vertices in Fig. 3). Observe that S′ induces a connected subgraph in G(g) since every vertex in S is adjacent to some $x_{ig}$ or $y_{jg}$, and all vertices of H(g, r, s) with degree 3 in G(g) induce a connected subgraph in G(g).
Conversely, let S′ be a (connected or not) cluster vertex deletion set of G(g) of size at most k′. Since every vertex u in H(g, r, s) with degree 3 in G(g) (the black vertices in Fig. 3) belongs to an induced $P_3$ = uvw in H(g, r, s) with $\deg_{G(g)}(v) = 2$ and $\deg_{G(g)}(w) = 1$, we may assume that S′ contains all (g + 2)n vertices of H(g, r, s) with degree 3 (and no other vertices of H(g, r, s)). Let S be the restriction of S′ to G, that is, S = S′ ∩ V(G). Then S is a cluster vertex deletion set of G of size at most k′ − (g + 2)n = k.
Observe that the girth of G(g) is at least min{girth(G), 2g + 6} > g and the maximum degree of G(g) is one more than the maximum degree of G. Hence, by Theorems 12 and 13, we obtain:
▶ Theorem 14. For any given integer g ≥ 3, connected cluster-vd is NP-complete on bipartite graphs of maximum degree at most 5 and with girth > g and, assuming ETH, cannot be solved in $2^{o(n)}$ time.
▶ Theorem 15. For any given integer g ≥ 3, connected cluster-vd is NP-complete on bipartite graphs of maximum degree at most 4 and with girth > g and, assuming ETH, cannot be solved in $2^{o(\sqrt{n})}$ time.
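The vertex count used at the end of the proof of Theorem 12 can be checked by a short calculation. The following is a sketch in which $n_i$ and $m_i$ (names introduced here for illustration) denote the numbers of vertices and edges after $i$ rounds of the subdivision of Lemma 11, with $n_0 = n$ and $m_0 = m$. Each round replaces every edge by a path with four edges and three new vertices.

```latex
\begin{align*}
m_{i+1} &= 4\,m_i, & n_{i+1} &= n_i + 3\,m_i,\\
m_t &= 4^t m, & n_t &= n + 3m\sum_{i=0}^{t-1} 4^i = n + (4^t - 1)\,m .
\end{align*}
```

Since the maximum degree is at most 4, we have $m \le 2n$, so for fixed $t$ the final graph has $O(n)$ vertices. Likewise, each cycle of length $\ell$ becomes a cycle of length $4^t \ell$, which gives the girth $4^t \cdot \mathrm{girth}(G)$.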

H-free graphs: NP-completeness cases
In this section we give the proof of the NP-completeness part of Theorems 1 and 2.
Let H be a fixed graph. By Proposition 5, cluster-vd is polynomially solvable on H-free graphs whenever H is an induced subgraph of the 4-vertex path $P_4$. The following fact is easy to see:
▶ Observation 16. A graph is an induced subgraph of the 4-path $P_4$ if and only if it is a $\{3P_1, 2P_2\}$-free forest.
Thus, it remains to consider the cases where H contains a cycle or a $3P_1$ or a $2P_2$ as an induced subgraph. Now, if H contains a cycle then graphs of girth > g = |V(H)| are H-free, hence Theorems 12 and 14 imply that cluster-vd and connected cluster-vd are NP-complete on H-free graphs and, assuming ETH, cannot be solved in $2^{o(n)}$ time on H-free n-vertex graphs. If H contains a $3P_1$ or a $2P_2$ then $\{3P_1, 2P_2\}$-free graphs are H-free graphs, hence Theorems 7 and 8 imply that cluster-vd and connected cluster-vd are NP-complete on H-free graphs and, assuming ETH, cannot be solved in $2^{o(n)}$ time on H-free n-vertex graphs.
The proofs of Theorems 1 and 2 are complete.

Conclusion
We have found a complete characterization of graphs H for which cluster-vd on H-free graphs is polynomially solvable and for which it is NP-complete (Theorem 1). The same complexity dichotomy also holds for connected cluster-vd (Theorem 2). We remark that a complexity dichotomy for vertex cover and connected vertex cover on H-free graphs, like Theorem 1 and Theorem 2 for cluster-vd and connected cluster-vd, respectively, seems very hard to achieve. Indeed, it is a long-standing open problem whether there exists a constant t for which vertex cover or connected vertex cover is NP-complete on $P_t$-free graphs. So far it is known that such a constant t, if any, must be at least 7 for vertex cover [13] and at least 6 for connected vertex cover [19].
Let $\mathcal{H}$ be a set of (possibly infinitely many) graphs. A natural question generalizing the case of one forbidden induced subgraph is: what is the complexity of cluster-vd and of connected cluster-vd on $\mathcal{H}$-free graphs? The case $\mathcal{H} = \{H\}$ is completely solved by Theorems 1 and 2. The case $\mathcal{H} = \{C_\ell \mid \ell \ge 4\}$, also known as chordal graphs, addressed in [3], is still open. The next step may be the case of two-element sets $\mathcal{H} = \{H_1, H_2\}$; in particular, $\mathcal{H} = \{H, \overline{H}\}$. Another interesting problem is to settle the complexity of cluster-vd and connected cluster-vd on line graphs, a well-studied graph class defined by excluding nine small induced subgraphs.

A Computing the cluster vertex deletion number of cographs using the cotrees
Recall that $P_4$-free graphs are also called cographs [6]. More precisely, for vertex-disjoint graphs $G_1$ and $G_2$, let $G_1 ⓪ G_2$ denote the co-join of $G_1$ and $G_2$, that is, the disjoint union $G_1 + G_2$, and let $G_1 ① G_2$ be the join of $G_1$ and $G_2$, obtained from $G_1 + G_2$ by adding all possible edges between $V(G_1)$ and $V(G_2)$. With these notations, cographs are exactly those graphs that can be constructed from the one-vertex graph by applying the join and co-join operations. Thus, a cograph is the one-vertex graph or is the join of two smaller cographs or is the co-join of two smaller cographs.
Recall that S ⊆ V (G) is a vertex cover if G − S is edgeless and is a cluster vertex deletion set if G − S is a cluster graph.Let τ (G) and ς(G) denote the vertex cover number and the cluster vertex deletion number of G, respectively, τ (G) = min{|S| : S is a vertex cover of G}, ς(G) = min{|S| : S is a cluster vertex deletion set of G}.
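We will see that τ(G) and ς(G) can be computed efficiently when restricted to cographs. The calculation is based on the following fact:
▶ Lemma 17. For any (not necessarily $P_4$-free) graphs $G_1$ and $G_2$, the following relations hold:
(1) $\tau(G_1 ⓪ G_2) = \tau(G_1) + \tau(G_2)$;
(2) $\tau(G_1 ① G_2) = \min\{\tau(G_1) + |V(G_2)|,\ \tau(G_2) + |V(G_1)|\}$;
(3) $\varsigma(G_1 ⓪ G_2) = \varsigma(G_1) + \varsigma(G_2)$;
(4) $\varsigma(G_1 ① G_2) = \min\{\varsigma(G_1) + |V(G_2)|,\ \varsigma(G_2) + |V(G_1)|,\ \tau(\overline{G_1}) + \tau(\overline{G_2})\}$.
Proof. (1) and (3) are trivial.
(2): For one direction, if $S_1$ is a vertex cover of $G_1$ then $S_1 ∪ V(G_2)$ is a vertex cover of $G_1 ① G_2$, and symmetrically for $G_2$. For the other direction, let S be a vertex cover of $G_1 ① G_2$ of optimal size, and write $S_i = S ∩ V(G_i)$. Then $S_i$ is a vertex cover of $G_i$, and moreover, $S_1 = V(G_1)$ or $S_2 = V(G_2)$, since otherwise some edge between $V(G_1)$ and $V(G_2)$ would remain in $G_1 ① G_2 − S$.
(4): For one direction, each of the three quantities on the right-hand side is the size of a cluster vertex deletion set of $G_1 ① G_2$: delete all of $V(G_2)$ and an optimal cluster vertex deletion set of $G_1$, or vice versa, or delete optimal vertex covers of $\overline{G_1}$ and $\overline{G_2}$, which leaves a clique. For the other direction, let S be a cluster vertex deletion set of $G_1 ① G_2$ of optimal size, and write $S_i = S ∩ V(G_i)$. Then $S_i$ is a cluster vertex deletion set of $G_i$, and moreover, if $S_1 ≠ V(G_1)$ and $S_2 ≠ V(G_2)$ then $G_1 ① G_2 − S$ is connected and hence a clique. In the third case where each of $G_1 − S_1$ and $G_2 − S_2$ is a clique, $S_1$ and $S_2$ are vertex covers of $\overline{G_1}$ and $\overline{G_2}$, respectively. Hence in this case, $|S| \ge \tau(\overline{G_1}) + \tau(\overline{G_2})$. ◀
For any integer r ≥ 2, Lemma 17 holds accordingly for the co-join and the join of r graphs $G_1, \ldots, G_r$. We also note that Lemma 17 holds for the weighted version, too.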
With each cograph G = (V, E), one can associate a so-called cotree T of G as follows.
The leaves of T are the vertices of G;
every internal node of T has a label ⓪ or ①, and has at least two children;
no two internal nodes of T with the same label are adjacent;
two vertices u and v of G are adjacent (respectively, non-adjacent) if and only if the least common ancestor of u and v in T has label ① (respectively, ⓪).
In particular, the cotree of an n-vertex cograph has at most 2n − 1 nodes. Note that, for any internal node v of T, the subtree $T_v$ of T rooted at v is the cotree of the subgraph of G induced by the leaves of $T_v$. The cograph corresponding to $T_v$ where v has label ⓪ is the disjoint union of the cographs corresponding to the children of v. The cograph corresponding to $T_v$ where v has label ① is the join of the cographs corresponding to the children of v.
In particular, the cotree of $\overline{G}$ can be obtained from the cotree of G by changing the label ⓪ to ① and ① to ⓪.
In [7], a linear-time algorithm is given for recognizing whether a given graph is a cograph and, if so, constructing its cotree. Note that the cotree can immediately be transformed into an equivalent binary tree; see Fig. 5 for an example of a cograph G, the cotree of G and its binary version. For simplicity, we will use the binary cotree in our algorithm below.
Now, given a cograph G together with its binary cotree T, the bottom-up Algorithm 1 below computes the cluster vertex deletion number ς(G) of G, as suggested by Lemma 17. The algorithm traverses the cotree T by post-order, that is, for the current node v of T, it recursively traverses the left subtree of $T_v$, then the right subtree of $T_v$, and finally visits the current node v. The algorithm uses the following notation. For a node v of T:
if v is an internal node then ℓ(v) and r(v) stand for the left child and the right child of v, respectively;
n(v) denotes the number of vertices of the subgraph of G induced by the leaves of $T_v$. Thus, if v is a leaf then n(v) = 1 and if v is the root of T then n(v) = |V(G)|;
ς(v) denotes the cluster vertex deletion number of the subgraph of G induced by the leaves of $T_v$. Thus, if v is a leaf then ς(v) = 0 and if v is the root of T then ς(v) = ς(G);
τ(v) denotes the vertex cover number of the complement of the subgraph of G induced by the leaves of $T_v$. Thus, if v is a leaf then τ(v) = 0 and if v is the root of T then τ(v) = $\tau(\overline{G})$.
Figure 5 A cograph G, the cotree of G and its binary version.

Algorithm 1 Computing the cluster vertex deletion number
Input: A cograph G = (V, E) together with its (binary) cotree T.
Output: ς(G), the cluster vertex deletion number of G.
1 Traverse T by post-order and let v be the current node
Proof (of Proposition 19). The correctness of Algorithm 1 directly follows from Lemma 17. Since a constant number of operations is performed per node of the cotree, the algorithm runs in O(n) time. ◀
We remark that Algorithm 1 can be slightly modified for computing a minimum cluster vertex deletion set. Also, since Lemma 17 holds accordingly for the weighted version, the minimum weight cluster vertex deletion number of cographs can be computed in linear time, too.
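The bottom-up computation can be sketched as follows. This is a reconstruction from the definitions of n(v), ς(v) and τ(v) and from the join and co-join relations for ς and τ, not a transcription of the steps of Algorithm 1. It assumes, purely for illustration, that a binary cotree is given as nested tuples `(label, left, right)` with label 0 for disjoint union and 1 for join, and that every non-tuple value is a leaf.

```python
def cotree_values(node):
    """Post-order evaluation of a binary cotree.

    Returns (n, sigma, tau) for the cograph H represented by `node`:
      n     = number of vertices of H,
      sigma = cluster vertex deletion number of H,
      tau   = vertex cover number of the complement of H.
    """
    if not isinstance(node, tuple):      # leaf: the one-vertex graph
        return 1, 0, 0
    label, left, right = node
    n_l, s_l, t_l = cotree_values(left)
    n_r, s_r, t_r = cotree_values(right)
    n = n_l + n_r
    if label == 0:
        # Disjoint union: sigma adds up; the complement is a join, so a
        # vertex cover of it must contain one side completely.
        return n, s_l + s_r, min(t_l + n_r, t_r + n_l)
    # Join: delete one side entirely, or leave a clique on both sides;
    # the complement is a disjoint union, so tau adds up.
    return n, min(s_l + n_r, s_r + n_l, t_l + t_r), t_l + t_r


def cluster_vd_number(cotree):
    """Cluster vertex deletion number of the cograph given by `cotree`."""
    return cotree_values(cotree)[1]
```

For instance, the path a, b, c is the join of b with the disjoint union of a and c, i.e. `(1, "b", (0, "a", "c"))`, and the sketch returns 1 for it. The recursion is used for brevity; for very deep cotrees an explicit stack would avoid Python's recursion limit.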

B Computing the connected cluster vertex deletion number of cographs using the cotrees
(Again, we set $\theta_c(G) = ∞$ if G has no connected clique deletion set.) Notice that $\theta(G) = \tau(\overline{G})$, and thus θ(G) can be computed in linear time when restricted to cographs (by Lemma 17 and Proposition 19). Notice also that $\theta(G) ≤ \theta_c(G)$ and $\varsigma(G) ≤ \varsigma_c(G)$. We will see in this section that $\theta_c(G)$ and $\varsigma_c(G)$ can be computed efficiently when restricted to cographs.
We first consider the connected clique vertex deletion number.The following fact follows immediately from the definition: The following two lemmas provide a formula for computing the connected clique vertex deletion number of the join of two graphs.▶ Lemma 21.Let G 1 be a complete graph and let G 2 be an arbitrary graph.Then: Proof.Let S be an optimal connected clique vertex deletion set of G 1 1 ⃝G 2 , and write due to the connectedness and the optimality of S) and Let G 1 and G 2 be two arbitrary non-complete graphs.Then: Proof.Let S be an optimal connected clique deletion set of G 1 1 ⃝G 2 and write For the other direction let T i be an optimal clique deletion set of We now consider the connected cluster vertex deletion number of the disjoint union and the join of two graphs.The following fact follows immediately from the definition: ▶ Lemma 23.For arbitrary graphs G 1 and G 2 , Lemmas 24 and 26 below provide a formula for computing the connected cluster vertex deletion number of the join of two graphs.▶ Lemma 24.Let G 1 be a complete graph and let G 2 be an arbitrary graph.Then: Proof.Let S be a connected cluster vertex deletion set of G 1 1 ⃝G 2 of optimal size, and write For the other direction, observe first that by definition, ς c (G 1 1 ⃝G 2 ) ≤ θ c (G 1 1 ⃝G 2 ), and hence by Lemma 21, ς c (G 1 1 ⃝G 2 ) ≤ min {θ c (G 2 ), 1 + θ(G 2 )}.Observe next that, for any cluster vertex deletion set S of G 2 of optimal size ς(G 2 ), V (G 1 ) ∪ S is a connected cluster vertex deletion set of G Proof.Let S be a connected cluster vertex deletion set of G 1 1 ⃝G 2 of optimal size, and write S i = S ∩ V (G i ), i = 1, 2. Then S i is a cluster vertex deletion set of G i .Note, moreover, that at least one of G 1 − S 1 and G 2 − S 2 must be a clique.
(1): Let G 1 be connected, say.Observe that any cluster vertex deletion set S 1 of G 1 is non-empty (because G 1 is connected non-complete), hence V (G 2 ) ∪ S 1 is a connected cluster vertex deletion set of G 1 1 ⃝G 2 , and for any cluster vertex deletion set S 2 of G 2 , V (G 1 ) ∪ S 2 is a connected cluster vertex deletion set of G 1 1 ⃝G 2 (because G 1 is connected).Thus, (2): Observe that for any cluster vertex deletion set S 1 of G 1 of optimal size ς(G 1 ), V (G 2 )∪S 1 (if Now, given a cograph G together with its cotree, with Lemmas 20, 21, 22, 23, 24 and 26 we can compute the connected clique vertex deletion number and the connected cluster deletion number of G in linear time.This is done in the same way for computing the vertex cover number and the cluster vertex deletion number in Appendix A, hence we omit the details.

▶ Lemma 11. Given a graph G, let G′ be obtained from G by subdividing each edge e = xy in G with three new vertices $e_x$, $e_{xy}$ and $e_y$, thus obtaining the 5-vertex path $x e_x e_{xy} e_y y$ in G′ in which all new vertices are of degree 2. Assuming G is triangle-free, G has a cluster vertex deletion set of size at most k if and only if G′ has a cluster vertex deletion set of size at most k + m, where m is the edge number of G.
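The subdivision of Lemma 11, together with the rule used in its proof to extend a cluster vertex deletion set S of G to a set S′ of size |S| + m, can be sketched as follows. The edge-list representation, the tuple names of the new vertices and the function names are assumptions made for illustration; the small checker at the end is included only so that the example is self-contained.

```python
from itertools import combinations


def new_vertices(x, y):
    """Hypothetical names for the subdivision vertices e_x, e_xy, e_y of xy."""
    return ("e", x, y, "x"), ("e", x, y, "xy"), ("e", x, y, "y")


def subdivide(edges):
    """Replace every edge xy by the path x, e_x, e_xy, e_y, y."""
    out = []
    for x, y in edges:
        ex, exy, ey = new_vertices(x, y)
        out += [(x, ex), (ex, exy), (exy, ey), (ey, y)]
    return out


def extend_deletion_set(edges, s):
    """Extend S to S' by adding exactly one subdivision vertex per edge."""
    s = set(s)
    s_prime = set(s)
    for x, y in edges:
        ex, exy, ey = new_vertices(x, y)
        if (x in s) == (y in s):   # both endpoints in S, or both outside S
            s_prime.add(exy)
        elif x in s:               # x in S, y not in S
            s_prime.add(ey)
        else:                      # x not in S, y in S
            s_prime.add(ex)
    return s_prime


def is_p3_free_after_deleting(edges, removed):
    """True if the graph on `edges` minus `removed` has no induced P_3."""
    adj = {}
    for u, v in edges:
        if u in removed or v in removed:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # No induced P_3: any two neighbours of a vertex must be adjacent.
    return all(b in adj[a]
               for nbrs in adj.values()
               for a, b in combinations(nbrs, 2))
```

On the 4-cycle with S = {0, 2}, for example, the extended set has 2 + 4 vertices and leaves only isolated vertices and single edges in the subdivided graph.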

Figure 2 Proof of Lemma 11 illustrated: A triangle-free graph G (left) with two highlighted edges e = xy and e ′ = xz, and the graph G ′ obtained from G as described in Lemma 11 (right); the cluster vertex deletion set S = {x, y} of G is extended to the cluster vertex deletion set S ′ of G ′ consisting of the nine black vertices.
Figure 3 The tree H(g, r, s). The (g + 2)n black vertices form an optimal (connected) cluster vertex deletion set.

Figure 4 An example of the reduction from cluster-vd to connected cluster-vd: A bipartite graph G (left) and the bipartite graph G(3) (right) obtained from G and H(3, 4, 3); the bipartition of the vertex set is indicated by circle and rectangle vertices.

▶ Proposition 19. Given a $P_4$-free n-vertex graph G together with its cotree, Algorithm 1 correctly computes the cluster vertex deletion number ς(G) of G in O(n) time.
$\varsigma_c(G)$ = min{|S| : S is a connected cluster vertex deletion set of G}. (We set $\varsigma_c(G) = ∞$ if G has no connected cluster vertex deletion set.) When computing $\varsigma_c(G)$, we will have to consider a special case of (connected) cluster vertex deletion. A set S ⊆ V(G) is a (connected) clique deletion set if G − S is a clique (and G[S] is connected). Let θ(G) and $\theta_c(G)$ denote the clique vertex deletion number and the connected clique vertex deletion number of G, respectively, θ(G) = min{|S| : S is a clique deletion set of G}, $\theta_c(G)$ = min{|S| : S is a connected clique deletion set of G}.

1 ⃝G 2 .
where u is any vertex of G 1 , is a connected cluster vertex deletion of G 1 Hence ς c (G 1 1 ⃝G 2 ) ≤ |V (G 2 )| + max{ς(G 1 ), 1}.Similarly, ς c (G 1 1 ⃝G 2 ) ≤ |V (G 1 )| + max{ς(G 2 ), 1}.◀ : initially, set S ′ = S.Then, for each edge e = xy in G, if both x and y are in S or outside S, put e xy into S ′ ; if x ∈ S and y / ∈ S, put e y into S ′ ; if x / ∈ S and y ∈ S, put e x into S ′ .To see that G ′ − S ′ is P 3 -free, notice that by construction, for each edge e = xy in G, exactly one of e x , e xy and e y is in S ′ , and if e x , e xy / ∈ S ′ then x ∈ S, and if e x , x / ∈ S ′ then y / ∈ S, hence e xy ∈ S ′ .Since each P 3 in G ′ has the form xe x e xy , e x e xy e y or e x xe ′
A set S ⊆ V(G) is a connected cluster vertex deletion set if G − S is a cluster graph and G[S] is connected. Note that G has a connected cluster vertex deletion set if and only if G has at most one connected component that contains an induced $P_3$ (if G has two or more connected components containing an induced $P_3$ then any cluster vertex deletion set must contain vertices in different connected components). Let $\varsigma_c(G)$ denote the connected cluster vertex deletion number of G,