In this section, we first establish NP-completeness results.
Theorem 1
All considered problem variants of Dynamic Cluster Editing are NP-complete, even if the input graph G is a cluster graph.
Intuitively, Theorem 1 means that on top of the NP-hard task of transforming a graph into a cluster graph, it is computationally hard to improve an already found cluster graph with respect to being closer to the target cluster graph. Notably, while the dynamic versions of Cluster Completion turn out to be NP-complete, it is easy to see that classic Cluster Completion is solvable in polynomial time.
In the second part of this section, we show W[1]-hardness results both for the budget parameter k and for the distance parameter d for several variants of Dynamic Cluster Editing. Formally, we show the following.
Theorem 2
The following problems are W[1]-hard when parameterized by the budget k:
The following problems are W[1]-hard when parameterized by the distance d:
- DCEditing (Matching Dist),
- DCDeletion (Matching Dist),
- DCEditing (Edge Dist), and
- DCDeletion (Edge Dist).
The proof of Theorem 2 is based on several parameterized reductions which are presented in Sect. 3.2. The proof of Theorem 1 is based on nonparameterized polynomial-time many-one reductions (see Sect. 3.1) and some parameterized reductions that also imply NP-hardness (see Sect. 3.2). More precisely, Theorem 1 follows from Lemmas 1, 2, Observation 1, and Lemma 3 presented in Sect. 3.1, as well as Lemmas 4 and 5 presented in Sect. 3.2.
Polynomial-Time Many-One Reductions
We first present two polynomial-time many-one reductions from the strongly NP-hard 3-Partition problem [24] for both DCCompletion (Matching Dist) and DCCompletion (Edge Dist) with input graphs that are already cluster graphs. We start with the latter.
Lemma 1
DCCompletion (Edge Dist) is NP-complete, even if the input graph G is a cluster graph.
Proof
We present a polynomial-time reduction from 3-Partition, where, given a multi-set of 3m positive integers \(\{a_1,a_2,\dots ,a_{3m}\}\) with \(\sum _{1 \le i \le 3m}a_i=mB\), for \(1 \le i \le 3m\) it holds that \(B/4< a_i < B/2\), and the task is to determine whether this multi-set can be partitioned into m disjoint subsets \(A_1,A_2, \dots , A_m\) such that for each \(1 \le i \le m\) it holds that \(\sum _{a_j \in A_i}a_j=B\). Given an instance \(\{a_1,a_2,\dots ,a_{3m}\}\) of 3-Partition, we construct an instance \((G,G_c,k,d)\) of DCCompletion (Edge Dist) as follows. The construction is illustrated in Fig. 2. For graph G, we first create m disjoint big cliques, each with \(M=4(mB)^2\) vertices. Then for every integer \(a_i\), we create a small clique \(C_i\) with \(|C_i|=a_i\) vertices. We set \(G_c\) to be a complete graph. Further, we set \(k:=mMB+\frac{m}{2}B^2 -\frac{1}{2}\sum _{1 \le i \le 3m}{a_i}^2\) and \(d:=|E(G) \oplus E(G_c)|-k\).
Next we show that \(\{a_1,a_2,\dots ,a_{3m}\}\) is a yes-instance of 3-Partition if and only if \((G,G_c,k,d)\) is a yes-instance of DCCompletion (Edge Dist).
(\(\Rightarrow \)): Assume that \(\{a_1,a_2,\dots ,a_{3m}\}\) is a yes-instance of 3-Partition. Then there is a partition \(A_1,A_2, \dots , A_m\) such that for each \(1 \le i \le m\) it holds that \(\sum _{a_j \in A_i}a_j=B\). For each \(A_i\), we can combine the corresponding three small cliques and one big clique of size M into one clique. This costs \(MB+\frac{1}{2}(B^2-\sum _{a_j \in A_i}{a_j}^2)\) edge insertions. In total, there are
$$\begin{aligned} mMB+\frac{m}{2}B^2 -\frac{1}{2}\sum _{1 \le i \le 3m}{a_i}^2=k \end{aligned}$$
edge insertions. Hence we get a cluster graph \(G'\) with \(|E(G) \oplus E(G')|=k\) and \(|E(G') \oplus E(G_c)|= |E(G) \oplus E(G_c)| -k =d\).
(\(\Leftarrow \)): Assume that \((G,G_c,k,d)\) is a yes-instance of DCCompletion (Edge Dist) and let \(G'\) be the solution. Since \(k+d=|E(G) \oplus E(G_c)|\), to get \(G'\) we have to add exactly k edges to G. We make the following two observations. First, we can never combine two big cliques, as otherwise we need at least \(M^2>k\) edge insertions. Second, every small clique must be combined with a big clique, as otherwise we have at most \(M(mB-1)\) edge insertions between big cliques and small cliques and at most \((mB)^2\) edge insertions between small cliques, and in total there are at most \(M(mB-1)+(mB)^2=mMB-3(mB)^2<k\) edge insertions. Hence, to get solution \(G'\) we must partition all 3m small cliques \(C_1,C_2,\dots ,C_{3m}\) in G into m groups \(A_1,A_2, \dots , A_m\) and combine all cliques in each group with one big clique.
We can split the edge insertions into two parts \(k=k_1+k_2\), where \(k_1:=mMB\) is the number of edge insertions between big cliques and small cliques, and \(k_2:=\sum _{1 \le i \le m}{\sum _{C_j,C_k \in A_i}|C_j||C_k|}\) is the total number of edge insertions between small cliques in each group. We can also write \(k_2\) as
$$\begin{aligned} k_2=\frac{1}{2}\sum _{1 \le i \le m}\left( \sum _{C_j \in A_i}|C_j|\right) ^2-\frac{1}{2}\sum _{1 \le i \le 3m}{a_i}^2. \end{aligned}$$
Recall that \(k=mMB+\frac{m}{2}B^2 -\frac{1}{2}\sum _{1 \le i \le 3m}{a_i}^2\), so we have that
$$\begin{aligned} \sum _{1 \le i \le m}\left(\sum _{C_j \in A_i}|C_j|\right)^{2} = mB^{2}. \end{aligned}$$
Since \(\sum _{1 \le i \le 3m}|C_i|=\sum _{1 \le i \le 3m}a_i=mB\), the equality
$$\begin{aligned} \sum _{1 \le i \le m}\left(\sum _{C_j \in A_i}|C_j|\right)^{2} = mB^2 \end{aligned}$$
holds only if \(C_1,C_2,\dots ,C_{3m}\) can be partitioned into m disjoint subsets \(A_1,\) \(A_2, \dots , A_m\) such that for \(1 \le i \le m\) it holds that \(\sum _{C_j \in A_i}|C_j|=B\). Thus, \(\{a_1,a_2,\dots ,a_{3m}\}\) can be partitioned into m disjoint subsets \({A_1}',{A_2}', \dots , {A_m}'\) such that for \(1 \le i \le m\) it holds that \(\sum _{a_j \in {A_i}'}a_j=B\). \(\square \)
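The counting argument of this proof can be checked numerically on a toy instance. The following Python sketch is illustrative only: the function names and the sample multiset are assumptions made for this example and are not part of the construction. It evaluates the budget k from the reduction and the number of edge insertions needed when each group of small cliques is merged with one big clique.

```python
def budget(a, m, B):
    """Budget k = mMB + (m/2)B^2 - (1/2) * sum(a_i^2) with M = 4(mB)^2."""
    M = 4 * (m * B) ** 2
    return m * M * B + (m * B * B - sum(x * x for x in a)) // 2


def insertion_cost(groups, m, B):
    """Edges added when each group of small cliques joins one big clique."""
    M = 4 * (m * B) ** 2
    cost = 0
    for g in groups:
        s = sum(g)
        cost += M * s  # edges between the big clique and the small cliques
        cost += (s * s - sum(x * x for x in g)) // 2  # edges between small cliques
    return cost


# Hypothetical 3-Partition instance with m = 2 and B = 20 (all a_i in (5, 10)).
a = [6, 7, 7, 6, 6, 8]
balanced = [[6, 7, 7], [6, 6, 8]]      # both groups sum to B
unbalanced = [[6, 7, 6], [7, 6, 8]]    # groups sum to 19 and 21

print(insertion_cost(balanced, 2, 20) == budget(a, 2, 20))
print(insertion_cost(unbalanced, 2, 20) > budget(a, 2, 20))
```

In this example the balanced grouping meets the budget exactly, whereas the unbalanced grouping exceeds it, which mirrors the equality case used in the backward direction.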
We continue with DCCompletion (Matching Dist). The corresponding NP-hardness reduction uses the same basic ideas as in Lemma 1. The main difference is that in the proof of Lemma 1 we make use of the property that we need to add exactly k edges. This enforces that every small clique should be combined with a big clique, while in the following proof we need to make use of the matching-based distance to enforce this.
Lemma 2
DCCompletion (Matching Dist) is NP-complete, even if the input graph G is a cluster graph.
Proof
We present a polynomial-time reduction from 3-Partition, where, given a multi-set of 3m positive integers \(\{a_1,a_2,\dots ,a_{3m}\}\) with \(\sum _{1 \le i \le 3m}a_i=mB\), for \(1 \le i \le 3m\) it holds that \(B/4< a_i < B/2\), and the task is to determine whether this multi-set can be partitioned into m disjoint subsets \(A_1,A_2, \dots , A_m\) such that for each \(1 \le i \le m\) it holds that \(\sum _{a_j \in A_i}a_j=B\). Given an instance \(\{a_1,a_2,\dots ,a_{3m}\}\) of 3-Partition, we construct an instance \((G,G_c,k,d)\) of DCCompletion (Matching Dist) as follows.
The construction is illustrated in Fig. 3. For graph G, we first create m big cliques \(C_1^M,C_2^M,\dots ,C_m^M\) each with \(M:=4(mB)^2\) vertices. Then for every integer \(a_i\) in \(\{a_1,a_2,\dots ,a_{3m}\}\), we create a small clique \(C_i\) with \(|C_i|=a_im\). Lastly, we create a clique \(C^{M^2}\) with \(M^2\) vertices. For graph \(G_c\), we create \(m+1\) cliques as follows. For every \(C_i^M\) in G, we create a clique \(C_i^{M+3m}\) with \(M+3m\) vertices which contains all M vertices from \(C_i^M\) and one vertex from each \(C_j\) for \(1 \le j \le 3m\). In other words, each \(C_j\) in G contains exactly one vertex from each \(C_i^{M+3m}\) in \(G_c\) for \(1 \le i \le m\). Lastly, we create a clique \(C^{M^2+(B-3)m^2}\) which contains all remaining vertices, that is, \(M^2\) vertices from \(C^{M^2}\) and vertices from every \(C_j\) for \(1 \le j \le 3m\) which are not contained in any \(C_i^{M+3m}\). Thus \(C^{M^2+(B-3)m^2}\) contains \(M^2+\sum _{1 \le i \le 3m}(a_i-1)m=M^2+(B-3)m^2\) vertices. Set \(k:=m^2MB+\frac{m^3}{2}B^2 -\frac{m^2}{2}\sum _{1 \le i \le 3m}{a_i}^2\) and \(d:=m^2B-3m\).
It is easy to see that the maximum-weight matching \(M^*\) for \(B(G,G_c)\) matches \(C_i^M\) with \(C_i^{M+3m}\) for every \(1 \le i \le m\) and matches \(C^{M^2}\) with \(C^{M^2+(B-3)m^2}\). Thus the matching-based distance between G and \(G_c\) is
$$\begin{aligned} d_0=d_M(G, G_c)=\sum _{1 \le i \le 3m}a_im=m^2B. \end{aligned}$$
Now, we show that \(\{a_1,a_2,\dots ,a_{3m}\}\) is a yes-instance of 3-Partition if and only if \((G,G_c,k,d)\) is a yes-instance of DCCompletion (Matching Dist).
(\(\Rightarrow \)): Assume that \(\{a_1,a_2,\dots ,a_{3m}\}\) is a yes-instance of 3-Partition. Then there is a partition \(A_1,A_2, \dots , A_m\) such that for \(1 \le i \le m\) it holds that \(\sum _{a_j \in A_i}a_j=B\). We add edges into G to get a cluster graph \(G'\) as follows. For each \(A_i\), we combine the corresponding three small cliques for the three integers in \(A_i\) and the big clique \(C_i^M\) into one clique. This costs
$$\begin{aligned} MmB+m^2\sum _{a_k,a_j \in A_i}a_ja_k=MmB+\frac{m^2}{2}\left( B^2-\sum _{a_j \in A_i}{a_j}^2\right) \end{aligned}$$
edge insertions. In total, there are \(m^2MB+\frac{m^3}{2}B^2 -\frac{m^2}{2}\sum _{1 \le i \le 3m}{a_i}^2=k\) edge insertions. Since every small clique \(C_i\), combined with some big clique \(C_j^M\), contains one vertex from \(C_j^{M+3m}\), we obtain
$$\begin{aligned} d_M(G',G_c)=d_0-3m=m^2B-3m=d. \end{aligned}$$
(\(\Leftarrow \)): Assume that \((G,G_c,k,d)\) is a yes-instance of DCCompletion (Matching Dist) and let \(G'\) be the solution and let \(M'\) be the maximum-weight matching between \(G'\) and \(G_c\). First note that clique \(C^{M^2}\) has \(M^2\) vertices and \(M^2>k\), so we cannot combine \(C^{M^2}\) with any other clique. Since \(M^2>(B-3)m^2\), we have also \(|C^{M^2}|>\frac{1}{2}|C^{M^2+(B-3)m^2}|\). Hence, in the matching \(M'\) clique \(C^{M^2}\) must be matched with \(C^{M^2+(B-3)m^2}\). Next in the matching \(M'\) every \(C_i^{M+3m}\) in \(G_c\) must be matched with a clique in \(G'\) which contains clique \(C_i^M\), since otherwise the distance between \(G'\) and \(G_c\) is at least M and \(M>d\). This also means that we cannot combine two big cliques \(C_i^M\) and \(C_j^M\). Since \(d_M(G',G_c) \le d=d_0-3m\), to get solution \(G'\) every small clique \(C_i\) for \(1 \le i \le 3m\) has to be combined with some big clique \(C_j^M\).
We can split k into two parts \(k=k_1+k_2\), where \(k_1:=m^2MB\) is the number of edge insertions between big cliques and small cliques, and \(k_2\) is the total number of edge insertions between small cliques. Similarly to the analysis in Lemma 1, we have that \(k_2 \ge \frac{m^3}{2}B^2 -\frac{m^2}{2}\sum _{1 \le i \le 3m}{a_i}^2\) and the equality holds only if \(\{a_1,a_2,\dots ,a_{3m}\}\) can be partitioned into m disjoint subsets \(A_1,A_2, \dots , A_m\) such that for \(1 \le i \le m\) it holds that \(\sum _{a_j \in A_i}a_j=B\). \(\square \)
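The bound on \(k_2\) can be spelled out as follows. This is a sketch of the omitted step; the symbol \(s_i\) is introduced here only for illustration. Let \(s_i\) denote the total number of vertices of the small cliques combined with \(C_i^M\), so that \(\sum _{1 \le i \le m}s_i=m^2B\) and \(k_2=\frac{1}{2}\sum _{1 \le i \le m}{s_i}^2-\frac{m^2}{2}\sum _{1 \le i \le 3m}{a_i}^2\). Then

$$\begin{aligned} \sum _{1 \le i \le m}(s_i-mB)^2=\sum _{1 \le i \le m}{s_i}^2-2mB\cdot m^2B+m\cdot m^2B^2=\sum _{1 \le i \le m}{s_i}^2-m^3B^2 \ge 0, \end{aligned}$$

and therefore \(k_2 \ge \frac{m^3}{2}B^2-\frac{m^2}{2}\sum _{1 \le i \le 3m}{a_i}^2\), with equality if and only if \(s_i=mB\) for every \(1 \le i \le m\). Since \(|C_j|=a_jm\), the condition \(s_i=mB\) says that the integers assigned to the i-th group sum to B.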
Observe that when G is a cluster graph, we can “swap” G with \(G_c\) and k with d:
Observation 1
When G is a cluster graph, instance \((G,G_c,k,d)\) of DCEditing (Edge Dist) is a yes-instance if and only if instance \((G_c,G,d,k)\) of DCEditing (Edge Dist) is a yes-instance. When both G and \(G_c\) are cluster graphs and \(E(G) \subseteq E(G_c)\), instance \((G,G_c,k,d)\) of DCCompletion (Edge Dist) is a yes-instance if and only if instance \((G_c,G,d,k)\) of DCDeletion (Edge Dist) is a yes-instance.
Observe that from Lemma 1 and Observation 1 we can infer NP-hardness for DCDeletion (Edge Dist) even if G is a cluster graph. For the matching-based distance, we do not have an analogue of Observation 1. Thus, we provide another reduction showing NP-hardness for DCDeletion (Matching Dist) even if G is a cluster graph.
Lemma 3
DCDeletion (Matching Dist) is NP-complete, even if the input graph G is a cluster graph.
Proof
We present a polynomial-time reduction from the NP-hard Exact Cover by 3-Sets problem [31], where, given a set X with \(|X|=3q\) and a collection \({\mathcal {S}}\) of 3-element subsets of X, the task is to determine whether \({\mathcal {S}}\) contains a subcollection \(\mathcal {S'} \subseteq {\mathcal {S}}\) of size q that covers every element in X exactly once. Given an instance \((X,{\mathcal {S}})\) of Exact Cover by 3-Sets, where \(X=\{x_1,x_2,\dots ,x_{3q}\}\) and \({\mathcal {S}}=\{S_1,S_2,\dots ,S_m\}\), we construct an instance \((G,G_c,k,d)\) of DCDeletion (Matching Dist) in polynomial time as follows.
The construction is illustrated in Fig. 4. For every set \(S_i=\{x_{i_1},x_{i_2},x_{i_3}\}\) in \({\mathcal {S}}\), we create a clique \(C_i=\{v_1^i,v_2^i\} \cup \{x_{i_1}^i,x_{i_2}^i,x_{i_3}^i\}\) in G. So G contains m order-five cliques \(C_1,C_2,\dots ,C_m\). For \(G_c\), we first create m cliques \(D_1,D_2,\dots ,D_m\) with \(D_i=\{v_1^i,v_2^i\}\). Then for each element \(x_i\), we create a clique \(D_{x_i}=\{x_i^j \mid x_i \in S_j\}\). For example, if an element \(x_i\) is contained in some set \(S_j\), then in G the corresponding clique \(C_j\) for \(S_j\) contains a vertex \(x_i^j\) which is also contained in the clique \(D_{x_i}\) in \(G_c\). Hence, if there is a subcollection \(\mathcal {S'}\) of size q that covers every element in X exactly once, then we can find these q corresponding cliques in G and separate them to get 3q new vertices each contained in one different clique \(D_{x_i}\) in \(G_c\). Finally, we set \(k:=9q\) and \(d:=3m-3q\).
Note that the maximum-weight matching \(M^*\) for \(B(G,G_c)\) has to match every \(C_i\) in G with \(D_i\) in \(G_c\). Thus \(d_M(G, G_c)=3m\). Now we show that \((X,{\mathcal {S}})\) is a yes-instance of Exact Cover by 3-Sets if and only if \((G,G_c,k,d)\) is a yes-instance of DCDeletion (Matching Dist).
(\(\Rightarrow \)): Assume that \((X,{\mathcal {S}})\) is a yes-instance of Exact Cover by 3-Sets. Let \(\mathcal {S'}\) be the solution. For every \(S_i \in \mathcal {S'}\), we find the corresponding clique \(C_i=\{v_1^i,v_2^i\} \cup \{x_{i_1}^i,x_{i_2}^i,x_{i_3}^i\}\) in G and partition it into four cliques \(\{v_1^i,v_2^i\}\), \(\{x_{i_1}^i\}\), \(\{x_{i_2}^i\}\), and \(\{x_{i_3}^i\}\). Let \(G'\) be the resulting cluster graph. For every such clique \(C_i\), we delete nine edges to partition it. Thus, overall we need to delete \(9q=k\) edges. Since every element of X is covered by exactly one set from \(\mathcal {S'}\), we have that in \(G'\) we get 3q new cliques each with one vertex, and each vertex is contained in a different clique from \(D_{x_1},D_{x_2},\dots ,D_{x_{3q}}\). Thus, we have \(d_M(G',G_c)=3m-3q=d\).
(\(\Leftarrow \)): Assume that \((G,G_c,k,d)\) is a yes-instance of DCDeletion (Matching Dist). Let \(G'\) be the solution and \(M'\) be the maximum-weight matching between \(G'\) and \(G_c\). Since we can only delete edges to get \(G'\) and every set \(S_i\) can only contain each element from X once, we get that in \(M'\) any edge incident on \(D_{x_i}\) has weight at most one. Since \(d_M(G',G_c) \le d =3m-3q\), it has to hold that in \(M'\) every \(D_{x_i}\) is matched with a new clique in \(G'\) and they share exactly one vertex. Thus we need 3q new cliques in \(G'\) to be matched with \(D_{x_1},D_{x_2},\dots ,D_{x_{3q}}\) in \(G_c\). To get these 3q new cliques, we need to separate at least 3q vertices from \(C_1,C_2,\dots ,C_m\) in G. Since we can delete at most \(k=9q\) edges, there have to be q cliques from \(C_1,C_2,\dots ,C_m\) such that we can separate each of them into four parts, where the first part contains \(\{v_1^i,v_2^i\}\) and the remaining three parts each have one vertex. Moreover, these 3q new cliques each share one vertex with one different clique from \(D_{x_1},D_{x_2},\dots ,D_{x_{3q}}\). Thus, in the instance \((X,{\mathcal {S}})\) of Exact Cover by 3-Sets we can find the corresponding q sets and they cover each element of X exactly once. \(\square \)
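The distance values used in this proof can be reproduced on a small example. The Python sketch below is illustrative and rests on an assumption: it takes the matching-based distance to be the number of vertices minus the weight of a maximum-weight matching in \(B(G,G_c)\), where the weight of an edge is the size of the intersection of the two cliques. This reading is consistent with the values \(3m\) and \(3m-3q\) stated above. All function names and the sample collection are hypothetical.

```python
from functools import lru_cache


def matching_distance(clusters_a, clusters_b):
    """Assumed definition: |V| minus the maximum matching weight, weights = overlaps."""
    a = [frozenset(c) for c in clusters_a]
    b = [frozenset(c) for c in clusters_b]
    n = sum(len(c) for c in a)

    @lru_cache(maxsize=None)
    def best(i, used):
        # Best total overlap for clusters a[i:], given the bitmask of used b-clusters.
        if i == len(a):
            return 0
        result = best(i + 1, used)  # leave a[i] unmatched
        for j, d in enumerate(b):
            w = len(a[i] & d)
            if w and not used >> j & 1:
                result = max(result, w + best(i + 1, used | 1 << j))
        return result

    return n - best(0, 0)


def x3c_instance(sets):
    """Build the clusters of G and G_c from a collection of 3-element sets."""
    G = [{("v", i, 1), ("v", i, 2)} | {("x", x, i) for x in s}
         for i, s in enumerate(sets)]
    Gc = [{("v", i, 1), ("v", i, 2)} for i in range(len(sets))]
    for x in sorted(set().union(*sets)):
        Gc.append({("x", x, i) for i, s in enumerate(sets) if x in s})
    return G, Gc


def split(G, chosen):
    """Split every chosen clique into its pair {v_1, v_2} and three single vertices."""
    result = []
    for i, c in enumerate(G):
        if i in chosen:
            result.append({v for v in c if v[0] == "v"})
            result += [{v} for v in c if v[0] == "x"]
        else:
            result.append(c)
    return result


# Hypothetical instance: q = 2, m = 3; the sets with index 0 and 1 form an exact cover.
sets = [{1, 2, 3}, {4, 5, 6}, {1, 4, 5}]
G, Gc = x3c_instance(sets)
print(matching_distance(G, Gc))                 # distance before any deletion
print(matching_distance(split(G, {0, 1}), Gc))  # after splitting an exact cover
print(matching_distance(split(G, {0, 2}), Gc))  # after splitting overlapping sets
```

The bitmask recursion is chosen only to keep the example free of external dependencies; it is exponential in the number of cliques of \(G_c\) and is meant for toy instances.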
Parameterized Reductions
We first show that DCEditing (Matching Dist) is W[1]-hard when parameterized by the budget k.
Lemma 4
DCEditing (Matching Dist) is NP-complete and W[1]-hard with respect to the budget k, even if the input graph G is a cluster graph.
Proof
We present a parameterized reduction from Clique, where, given a graph \(G_0\) and an integer \(\ell \), we are asked to decide whether \(G_0\) contains a complete subgraph of order \(\ell \). Clique is W[1]-hard when parameterized by \(\ell \) [18]. Given an instance \((G_0,\ell )\) of Clique, we construct an instance \((G,G_c,k,d)\) of DCEditing (Matching Dist) as follows.
The construction is illustrated in Fig. 5. Let \(n=|V(G_0)|\). We first construct G. For every vertex v of \(G_0\), we create a clique \(C_v\) of size \(\ell ^7+\ell ^4+\ell ^2\). For every edge e of \(G_0\), we create a clique \(C_e\) of size \(\ell ^4+2\). Lastly, we create a big clique \(C_B\) of size \(\ell ^8\). Note that G is already a cluster graph. Next we construct \(G_c\). We first create \(\ell \) cliques \(D_i\) of size \(n\ell ^3\) for each \(1 \le i \le \ell \). Every \(D_i\) contains \(\ell ^3\) vertices in every \(C_v\) in G. In other words, every \(C_v\) in G contains \(\ell ^3\) vertices in every \(D_i\) in \(G_c\). Then we create a big clique \(D_B\) which contains all vertices in \(C_B\) and \(\ell ^7\) vertices in every \(C_v\). For every vertex v of \(G_0\), we create clique \(D_v\) which contains \(\ell ^2\) vertices in \(C_v\) and one vertex in every \(C_e\) for \(v \in e\). Lastly, for every edge e we create \(D_e\) which contains \(\ell ^4\) vertices in \(C_e\). We set \(k:=\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) (2\ell ^4+1)+\ell \left( {\begin{array}{c}\ell -1\\ 2\end{array}}\right) \) and we set \(d:=d_0-\ell (\ell -1)\), where \(d_0=d_M(G,G_c)\) is the matching-based distance between G and \(G_c\), which is computed as follows.
To compute \(d_M(G,G_c)\), we need to find an optimal matching in \(B(G,G_c)\), the weighted bipartite graph between G and \(G_c\). First, in an optimal matching \(D_B\) must be matched with \(C_B\) since \(|C_B \cap D_B|=\ell ^8 > |C_v \cap D_B|=\ell ^7\) for any \(v \in V(G_0)\) and \(C_B \subseteq D_B\). Similarly, \(D_e\) must be matched with \(C_e\) for every \(e \in E(G_0)\). Then the remaining n cliques \(C_v\) in G need to be matched to \(\ell \) cliques \(D_i\) and n cliques \(D_v\) in \(G_c\). Since \(|C_v \cap D_i|=\ell ^3 > |C_v \cap D_v|=\ell ^2\) for any \(v \in V(G_0)\) and \(1 \le i \le \ell \), it is always better to match \(C_v\) with some \(D_i\). Since there are only \(\ell \) cliques \(D_i\), we can choose any \(\ell \) cliques from \(\{C_v \mid v \in V(G_0)\}\) to be matched with \(D_i\) for \(1 \le i \le \ell \) and the remaining \(n-\ell \) cliques to be matched with \(D_v\). Thus we have many different matchings in \(B(G,G_c)\) which have the same maximum weight, and each of them corresponds to choosing \(\ell \) different cliques from \(\{C_v \mid v \in V(G_0)\}\) to be matched with \(D_i\) for \(1 \le i \le \ell \). For each optimal matching, there are \(\ell \) free cliques \(D_v\) in \(G_c\) which are not matched.
This reduction works in polynomial time. We show that there is a clique of size \(\ell \) in \(G_0\) if and only if there is a cluster graph \(G'=(V,E')\) such that \(|E(G') \oplus E(G)| \le k\) and \(d_M(G',G_c) \le d\).
(\(\Rightarrow \)): Assume that there is a clique \(C^*\) of order \(\ell \) in \(G_0\). We modify the graph G as follows. First, for every edge e in the clique \(C^*\) partition the corresponding clique \(C_e\) in G into three parts; one part contains all vertices in \(D_e\) and the other two parts each have one vertex. After this we get \(\ell (\ell -1)\) single vertices. Since \(C^*\) is a clique, all these single vertices can be partitioned into \(\ell \) groups such that each group has \(\ell -1\) vertices and all these \(\ell -1\) vertices are contained in the same \(D_v\) for some \(v \in C^*\). Then for each \(v \in C^*\), we combine the corresponding \(\ell -1\) vertices into one clique \(C_v^{\ell -1}\). Denote the resulting graph by \(G'\). For an illustration see Fig. 6. Along the way to get \(G'\), we delete \(\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) (2\ell ^4+1)\) edges and add \(\ell \left( {\begin{array}{c}\ell -1\\ 2\end{array}}\right) \) edges, thus \(|E(G) \oplus E(G')|=\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) (2\ell ^4+1)+\ell \left( {\begin{array}{c}\ell -1\\ 2\end{array}}\right) =k\). Next we show that \(d_M(G',G_c) \le d_0-\ell (\ell -1)\). Recall that an optimal matching in \(B(G,G_c)\) can choose \(\ell \) cliques from \(\{C_v \mid v \in V(G_0)\}\) to be matched with \(D_i\) for \(1 \le i \le \ell \). Now in \(B(G',G_c)\) we can choose all cliques in \(\{C_v \mid v \in C^*\}\) to be matched with \(D_i\) for \(1 \le i \le \ell \), and then match \(C_v^{\ell -1}\) with \(D_v\) for all \(v \in C^*\). Then in the new matching we have \(\ell \) additional edges between \(C_v^{\ell -1}\) and \(D_v\) for \(v \in C^*\), each with weight \(\ell -1\). Hence \(d_M(G',G_c) \le d_0-\ell (\ell -1)\).
(\(\Leftarrow \)): Assume that there is a cluster graph \(G'=(V,E')\) such that \(|E' \oplus E(G)| \le k\) and \(d_M(G',G_c) \le d\). Note that \(k<\ell ^7\), thus \(k<|C_v|\) and \(k<|C_B|\). Consequently, we can only modify edges between vertices in \(C_e\). It is easy to see that in any optimal matching in \(B(G',G_c)\), we still have that clique \(C_B\) must be matched with \(D_B\) and clique \(C_e\) must be matched with \(D_e\) for every \(e \in E(G_0)\). We should choose \(\ell \) cliques from \(\{C_v \mid v \in V(G_0)\}\) to be matched with \(D_i\) for \(1 \le i \le \ell \), which creates \(\ell \) free cliques \(D_v\). Hence, to decrease the distance between G and \(G_c\) or to increase the matching, we have to create new cliques to be matched with these \(\ell \) free cliques \(D_v\). Note that every \(D_v\) only contains single vertices from \(C_e\) with \(v \in e\) and the vertices contained in \(C_v\). To create new cliques we need to first separate \(C_e\) to get single vertices and then combine them. To decrease the distance by \(\ell (\ell -1)\), we need to separate at least \(\ell (\ell -1)\) single vertices from \(C_e\). This will cost at least \(\ell (\ell -1)(\ell ^4+1)-\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) =\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) (2\ell ^4+1)\) edge deletions if we always separate one \(C_e\) into three parts and get two single vertices. Then we need to combine these single vertices into at most \(\ell \) cliques since there are at most \(\ell \) free cliques \(D_v\). This will cost at least \(\ell \left( {\begin{array}{c}\ell -1\\ 2\end{array}}\right) \) edge insertions if all these \(\ell (\ell -1)\) single vertices can be partitioned into \(\ell \) groups and each group has \(\ell -1\) vertices. 
Since \(k=\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) (2\ell ^4+1)+\ell \left( {\begin{array}{c}\ell -1\\ 2\end{array}}\right) \), we have that in the first step we have to choose \(\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) \) cliques \(C_e\) and separate them into three parts and all these \(\ell (\ell -1)\) single vertices are evenly distributed in \(\ell \) free cliques \(D_v\). This means that in \(G_0\) we can select \(\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) \) edges between \(\ell \) vertices and each vertex has \(\ell -1\) incident edges. Thus there is a clique of size \(\ell \) in \(G_0\). \(\square \)
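The arithmetic behind the budget in this proof can be verified mechanically. The following Python sketch is illustrative; the function names are assumptions. It checks the cost of splitting one edge clique, the identity used for the lower bound on the deletions, and the bound \(k<\ell ^7\) for a range of small values of \(\ell \).

```python
from math import comb


def lemma4_budget(l):
    """k = C(l,2) * (2 l^4 + 1) + l * C(l-1,2), as set in the construction."""
    deletions = comb(l, 2) * (2 * l**4 + 1)
    insertions = l * comb(l - 1, 2)
    return deletions + insertions


def split_cost(l):
    """Edges deleted when a clique of size l^4 + 2 loses two single vertices."""
    return comb(l**4 + 2, 2) - comb(l**4, 2)


def checks_hold(l):
    same_split = split_cost(l) == 2 * l**4 + 1
    same_bound = l * (l - 1) * (l**4 + 1) - comb(l, 2) == comb(l, 2) * (2 * l**4 + 1)
    small_budget = lemma4_budget(l) < l**7
    return same_split and same_bound and small_budget


print(all(checks_hold(l) for l in range(2, 20)))
```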
The next lemma shows that DCEditing (Edge Dist) is W[1]-hard with respect to k. The corresponding parameterized reduction is from Clique and shares some similarities with the reduction presented in the proof of Lemma 4 with respect to the edge gadgets. Our proof is based on the following property for instances of DCEditing (Edge Dist) with \(k+d=|E(G) \oplus E(G_c)|\).
Observation 2
If an instance \((G,G_c,k,d)\) of DCEditing (Edge Dist) (DCDeletion (Edge Dist) or DCCompletion (Edge Dist)) has the property that \(k+d=|E(G) \oplus E(G_c)|\), then any solution \(G'\) satisfies that \(|E(G) \oplus E(G')|=k\), \(|E(G') \oplus E(G_c)|=d\), and \(E(G) \oplus E(G') \subseteq E(G) \oplus E(G_c)\).
Proof
On the one hand, for any graph \(G'\), we have that
$$\begin{aligned} |E(G') \oplus E(G)|+|E(G') \oplus E(G_c)| \ge |E(G) \oplus E(G_c)|=k+d. \end{aligned}$$
On the other hand, a solution \(G'\) satisfies that \(|E(G') \oplus E(G)| \le k\) and \(|E(G') \oplus E(G_c)| \le d\). Thus we have that \(|E(G) \oplus E(G')|=k\) and \(|E(G') \oplus E(G_c)|=d\). Let \(S_1:=\big (E(G) \oplus E(G')\big ) {\setminus } \big (E(G) \oplus E(G_c)\big ) \) and \(S_2:=\big (E(G) \oplus E(G')\big ) {\setminus } S_1\). Then
$$\begin{aligned} |E(G') \oplus E(G)|=|S_1|+|S_2|. \end{aligned} \tag{1}$$
Next we show that
$$\begin{aligned} |E(G') \oplus E(G_c)|=|E(G) \oplus E(G_c)|+|S_1|-|S_2|=k+d+|S_1|-|S_2|. \end{aligned} \tag{2}$$
Let us consider
$$\begin{aligned} E(G')\oplus E(G_c)= & {} \Big (E(G) \oplus \big (E(G)\oplus E(G')\big )\Big ) \oplus E(G_c) \\= & {} \big (E(G)\oplus (S_1 \cup S_2)\big )\oplus E(G_c)\\= & {} \big (E(G)\oplus E(G_c)\big )\oplus (S_1 \cup S_2)\\= & {} \Big (\big (E(G)\oplus E(G_c)\big ) {\setminus } (S_1 \cup S_2)\Big ) \cup \Big ((S_1 \cup S_2) {\setminus } \big (E(G)\oplus E(G_c)\big )\Big )\\= & {} \big (E(G)\oplus E(G_c)\big ) {\setminus } S_2 \cup S_1, \end{aligned}$$
where the last equation holds since
$$\begin{aligned} S_1=\big (E(G) \oplus E(G')\big ) {\setminus } \big (E(G) \oplus E(G_c)\big ) \Rightarrow S_1 \cap \big ( E(G) \oplus E(G_c) \big ) =\emptyset \end{aligned}$$
and
$$\begin{aligned} S_2=\big (E(G) \oplus E(G')\big ) {\setminus } S_1 \Rightarrow S_2 \subseteq E(G) \oplus E(G_c). \end{aligned}$$
Hence, \(|E(G')\oplus E(G_c)|=|E(G)\oplus E(G_c)|+|S_1|-|S_2|\). If \(S_1 \ne \emptyset \), then combine Eqs. (1) and (2) and we get
$$\begin{aligned} |E(G') \oplus E(G)|+|E(G') \oplus E(G_c)|=k+d+2|S_1|>k+d, \end{aligned}$$
which is a contradiction. Thus we conclude that \(S_1=\emptyset \) and hence \(E(G) \oplus E(G') \subseteq E(G) \oplus E(G_c)\). \(\square \)
From this result we can conclude that, when \(k+d=|E(G) \oplus E(G_c)|\), the only way to get a solution \(G'\) is to find a subset of \(E(G) \oplus E(G_c)\) with size exactly k such that modifying the edges of this subset in G yields a cluster graph.
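The identity obtained by adding Eqs. (1) and (2) is a statement about arbitrary finite sets, so it can be checked exhaustively on a small universe. The Python sketch below is illustrative only; the function names are assumptions, and the edge sets are modeled as plain subsets of a four-element universe without any cluster-graph requirement.

```python
from itertools import combinations


def subsets(universe):
    """Yield all subsets of the given universe as frozensets."""
    items = list(universe)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)


def check_identity(universe):
    """Check |E' ^ E| + |E' ^ E_c| = |E ^ E_c| + 2|S_1| for all triples of subsets."""
    for E in subsets(universe):
        for Ec in subsets(universe):
            for Ep in subsets(universe):
                S1 = (E ^ Ep) - (E ^ Ec)
                lhs = len(Ep ^ E) + len(Ep ^ Ec)
                if lhs != len(E ^ Ec) + 2 * len(S1):
                    return False
    return True


print(check_identity(range(4)))
```

In particular, the left-hand side equals \(|E(G) \oplus E(G_c)|\) exactly when \(S_1=\emptyset \), which is the case distinction used in the proof.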
Lemma 5
DCEditing (Edge Dist) is NP-complete and W[1]-hard with respect to the budget k, even if the input graph G is a cluster graph and \(k+d=|E(G) \oplus E(G_c)|\).
Proof
We present a parameterized reduction from Clique, where, given a graph \(G_0\) and an integer \(\ell \), we are asked to decide whether \(G_0\) contains a complete subgraph of order \(\ell \). Clique is W[1]-hard when parameterized by \(\ell \) [18]. Given an instance \((G_0,\ell )\) of Clique, we construct an instance \((G,G_c,k,d)\) of DCEditing (Edge Dist) as follows. The construction is illustrated in Fig. 7. We set \(L_1:=\ell ^7+1\) and \(L_2:=\ell ^2\). We first construct G. For every vertex v of \(G_0\), we create a clique \(C_v\) of size \(L_1+1=\ell ^7+2\), and for every edge e of \(G_0\), we create a clique \(C_e\) of size \(2L_2\). Note that G is already a cluster graph. Next, we construct \(G_c\). For every vertex v of \(G_0\), let \(C_e^1,C_e^2, \dots , C_e^p\) be all cliques of size \(2L_2\) in G which represent all edges incident on v. For each vertex v of \(G_0\), we create two cliques in \(G_c\). One of them contains \(L_1\) vertices of \(C_v\), the other contains the one remaining vertex of \(C_v\), called the single vertex of \(C_v\), and \(L_2\) vertices from every \(C_e^i\) for \(1 \le i \le p\) (see also Fig. 7). Set
$$\begin{aligned} k:=\ell L_1+\ell (\ell -1)L_2 + \ell \left( {\begin{array}{c}\ell -1\\ 2\end{array}}\right) {L_2}^2+\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) {L_2}^2 \end{aligned} \tag{3}$$
and set \(d:=|E(G) \oplus E(G_c)|-k\). This reduction works in polynomial time.
Now we show that there is a clique of size \(\ell \) in \(G_0\) if and only if there is a cluster graph \(G'=(V,E')\) such that \(|E' \oplus E(G)| \le k\) and \(|E' \oplus E(G_c)| \le d\). To simplify the proof, we assume \(\ell \ge 3\) in the following.
(\(\Rightarrow \)): Assume that there is a clique \(C^*\) of size \(\ell \) in \(G_0\). We modify graph G in the following two steps. We first partition cliques in G according to \(C^*\) by deleting edges as follows. For every vertex v in \(C^*\), find the clique \(C_v\) and delete edges between the single vertex in \(C_v\) and the remaining \(L_1\) vertices. For every edge e in \(C^*\), find the clique \(C_e\) in G and delete edges to partition the clique into two parts, each with \(L_2\) vertices. In the first step we delete \(\ell L_1+\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) {L_2}^2\) edges. The next step is to combine some cliques by adding edges. For every vertex v in \(C^*\), we combine the single vertex from \(C_v\) and \(\ell -1\) cliques of size \(L_2\) into one clique. In this step we add \(\ell (\ell -1)L_2 +\ell \left( {\begin{array}{c}\ell -1\\ 2\end{array}}\right) {L_2}^2\) edges. Thus in total we modify k edges.
(\(\Leftarrow \)): Assume that there is a cluster graph \(G'=(V,E')\) such that \(|E(G') \oplus E(G)| \le k\) and \(|E(G') \oplus E(G_c)| \le d\). Since \(k+d=|E(G) \oplus E(G_c)|\), we have \(|E(G) \oplus E(G')|=k\) and \(|E(G') \oplus E(G_c)|=d\). Thus, to get the solution \(G'\) we have to modify exactly k edges from \(E(G) \oplus E(G_c)\). As a result, we only have the following four kinds of operations:
1. partition a clique \(C_v\) in G into two parts, one with the single vertex and the other with \(L_1\) vertices, which costs \(L_1=\ell ^7+1\) edge deletions;
2. partition a clique \(C_e\) in G into two parts, each with \(L_2\) vertices contained in one clique in \(G_c\), which costs \({L_2}^2=\ell ^4\) edge deletions;
3. combine the single vertex of \(C_v\) with some cliques of size \(L_2\) which come from partitioning clique \(C_e\) into two parts, which costs \(aL_2=a\ell ^2\) edge insertions for some integer a;
4. combine some cliques of size \(L_2\) which come from partitioning clique \(C_e\) into two parts, which costs \(\left( {\begin{array}{c}b\\ 2\end{array}}\right) {L_2}^2=\left( {\begin{array}{c}b\\ 2\end{array}}\right) \ell ^4\) edge insertions for some integer b.
First, we claim that there must be \(\ell \) cliques of size \(L_1+1\) in G that have been partitioned. Note that \(k=\ell ^8+\frac{1}{2}\ell ^7-\ell ^6+\frac{1}{2}\ell ^5+\ell ^4-\ell ^3+\ell \), where the last additive term \(\ell \) can only come from partitioning \(\ell \) cliques of size \(L_1+1\) in G. In addition, there cannot be more than \(\ell \) cliques of size \(L_1+1\) in G that have been partitioned, since \((\ell +1) L_1 > k\) (assuming \(\ell \ge 3\)). Thus exactly \(\ell \) cliques of size \(L_1+1\) in G have to be partitioned and we get \(\ell \) single vertices. This costs \(\ell L_1\) edge deletions, which is the first additive term of Eq. (3).
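The expansion of k and the two inequalities used in this part of the argument can be confirmed with integer arithmetic. The Python sketch below is illustrative; the function names are assumptions, and the expanded polynomial is doubled so that all coefficients stay integral.

```python
from math import comb


def lemma5_budget(l):
    """Budget k with L_1 = l^7 + 1 and L_2 = l^2."""
    L1, L2 = l**7 + 1, l**2
    return (l * L1 + l * (l - 1) * L2
            + l * comb(l - 1, 2) * L2**2 + comb(l, 2) * L2**2)


def expanded_twice(l):
    """2 * (l^8 + l^7/2 - l^6 + l^5/2 + l^4 - l^3 + l), kept integral."""
    return 2 * l**8 + l**7 - 2 * l**6 + l**5 + 2 * l**4 - 2 * l**3 + 2 * l


def checks_hold(l):
    k = lemma5_budget(l)
    expansion_ok = 2 * k == expanded_twice(l)
    no_extra_split = (l + 1) * (l**7 + 1) > k   # more than l vertex cliques cost too much
    second_term_small = l * (l - 1) * l**2 < l**4
    return expansion_ok and no_extra_split and second_term_small


print(all(checks_hold(l) for l in range(3, 15)))
```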
Next, we claim that at least \(\ell (\ell -1)\) cliques of size \(L_2\) are combined with these \(\ell \) single vertices we got in the last step. This is because the second term of k, \(\ell (\ell -1)L_2\), is strictly less than \(\ell ^4\), and hence can only come from the third kind of operation, combining the single vertex with cliques of size \(L_2\). Suppose that \(\ell (\ell -1)+\delta \) cliques of size \(L_2\) are combined with these single vertices for some \(\delta \ge 0\). Then we need \((\ell (\ell -1)+\delta )L_2\) edge insertions. Note that the second additive term of Eq. (3) is \(\ell (\ell -1)L_2\).
Then, we need to partition at least \(\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) +\left\lceil \frac{\delta }{2}\right\rceil \) cliques of size \(2L_2\) so that we can combine them with single vertices. Denote by \(f_1(\ell ,\delta )\) the number of edge deletions this separation costs. Clearly, \(f_1(\ell ,\delta ) \ge \left( \left( {\begin{array}{c}\ell \\ 2\end{array}}\right) +\left\lceil \frac{\delta }{2}\right\rceil \right) {L_2}^2\). Notice that the last additive term of Eq. (3) is \(\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) {L_2}^2\).
Finally, when we combine a single vertex with more than one clique of order \(L_2\), then we also need to add edges between these cliques. Denote by \(f_2(\ell ,\delta )\) the number of edge insertions between these cliques. Since we have \(\ell (\ell -1)+\delta \) cliques of size \(L_2\) and \(\ell \) single vertices, and every clique is combined with one single vertex, it follows that \(f_2(\ell ,\delta ) \ge \ell \left( {\begin{array}{c}\ell -1\\ 2\end{array}}\right) {L_2}^2\). Notice that the third additive term of Eq. (3) is \(\ell \left( {\begin{array}{c}\ell -1\\ 2\end{array}}\right) {L_2}^2\).
Overall, we need
$$\begin{aligned} \ell L_1+(\ell (\ell -1)+\delta )L_2 + f_2(\ell ,\delta )+f_1(\ell ,\delta ) \ge k \end{aligned}$$
edge modifications. Equality only holds if \(\delta =0\), \(f_1(\ell ,\delta )=\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) {L_2}^2\), and \(f_2(\ell ,\delta )=\ell \left( {\begin{array}{c}\ell -1\\ 2\end{array}}\right) {L_2}^2\). Here \(f_2(\ell ,\delta )=\ell \left( {\begin{array}{c}\ell -1\\ 2\end{array}}\right) {L_2}^2\) means that we can partition all \(\ell (\ell -1)\) cliques of size \(L_2\) into \(\ell \) parts, each with \(\ell -1\) cliques, and then combine all \(\ell -1\) cliques in each part with one single vertex. Moreover, \(f_1(\ell ,\delta )=\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) {L_2}^2\) means that all these \(\ell (\ell -1)\) cliques of order \(L_2\) come from partitioning \(\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) \) cliques of order \(2L_2\). Then, in \(G_0\) we have \(\ell \) vertices (corresponding to these \(\ell \) single vertices) and \(\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) \) edges (corresponding to these \(\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) \) cliques of order \(2L_2\)) such that each vertex has \(\ell -1\) incident edges from these \(\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) \) edges. Hence, these \(\ell \) vertices form a clique in \(G_0\). \(\square \)
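The term-by-term accounting above can be sanity-checked numerically. The following is a minimal sketch, assuming that Eq. (3) is the sum of the four terms named in the proof, that \(L_2=\ell ^2\) (as used in \({L_2}^2=\ell ^4\)), and that \(L_1=\ell ^7+1\); the last value is inferred from the stated polynomial for k and is not given in this part of the text. The function names are hypothetical.

```python
from math import comb


def budget_terms(l, L1):
    """Return the four additive terms of the budget, assuming L2 = l**2."""
    L2 = l ** 2
    return (
        l * L1,                         # splitting l cliques of size L1 + 1
        l * (l - 1) * L2,               # joining single vertices to cliques of size L2
        l * comb(l - 1, 2) * L2 ** 2,   # edges among the l - 1 cliques joined to one vertex
        comb(l, 2) * L2 ** 2,           # splitting C(l, 2) cliques of size 2 * L2
    )


def budget_polynomial(l):
    """k = l^8 + l^7/2 - l^6 + l^5/2 + l^4 - l^3 + l, computed as (2k) // 2.

    2k is always even because l^5 * (l^2 + 1) is even for every integer l.
    """
    return (2 * l**8 + l**7 - 2 * l**6 + l**5 + 2 * l**4 - 2 * l**3 + 2 * l) // 2


if __name__ == "__main__":
    for l in range(3, 8):
        L1 = l**7 + 1  # assumption: inferred value, see lead-in
        print(l, sum(budget_terms(l, L1)) == budget_polynomial(l))
```

Under these assumptions the four terms add up to the stated polynomial, and the two side conditions used in the proof, \((\ell +1)L_1>k\) and \(\ell (\ell -1)L_2<\ell ^4\), can be checked the same way.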
Note that in the reduction of Lemma 5 the constructed graph G is a cluster graph. According to Observations 1 and 2, this reduction can also be used to prove W[1]-hardness with respect to the distance d.
Corollary 1
DCEditing (Edge Dist) is NP-complete and W[1]-hard with respect to the distance d, even if the input graph G is a cluster graph and \(k+d=|E(G) \oplus E(G_c)|\).
The following result also exploits the property that we need exactly k edge modifications when \(k+d=|E(G) \oplus E(G_c)|\).
Lemma 6
DCDeletion (Edge Dist) is W[1]-hard with respect to the distance d, even when \(k+d=|E(G) \oplus E(G_c)|\).
Proof
We present a parameterized reduction from Multicolored Clique. In Multicolored Clique, we are given an integer \(\ell \) and a graph where every vertex is colored with one of \(\ell \) colors. The task is to find a clique of order \(\ell \) containing one vertex of each color. Multicolored Clique is W[1]-hard with respect to \(\ell \) [20]. Let \((G_0=(V,E),\ell )\) be an instance of Multicolored Clique. We construct an instance \((G,G_c,k,d)\) of DCDeletion (Edge Dist) as follows. For every vertex v in \(G_0\), create a clique \(C_v\) with \(2\ell \) vertices in \(G_c\). Add a special clique with one vertex \(v^*\) in \(G_c\). For graph G, first copy \(G_c\) and then add more edges as follows: add edges between \(v^*\) and all other vertices in G, and for every edge \(\{u,v\}\) in \(G_0\), add all edges between vertices in \(C_u\) and vertices in \(C_v\). Set \(d=2\ell ^2+4\ell ^2 \left( {\begin{array}{c}\ell \\ 2\end{array}}\right) \) and \(k=|E(G) \oplus E(G_c)|-d\). This reduction works in polynomial time and the construction is illustrated in Fig. 8.
Note that \(k+d=|E(G) \oplus E(G_c)|\). According to Observation 2, a solution \(G'\) for instance \((G,G_c,k,d)\) has to delete from G exactly k edges of \(E(G) \oplus E(G_c)\), which is equivalent to adding exactly d edges of \(E(G) \oplus E(G_c)\) to \(G_c\). Next we show that there is a multicolored clique of order \(\ell \) in \(G_0\) if and only if there is a cluster graph \(G'=(V,E(G'))\) such that \(|E(G') \oplus E(G)| \le k\) and \(|E(G') \oplus E(G_c)| \le d\).
\((\Rightarrow :)\) Suppose that there is a multicolored clique \(C_0\) of size \(\ell \) in \(G_0\). For all vertices in \(C_0\), find the corresponding cliques in \(G_c\), and combine these \(\ell \) cliques and vertex \(v^*\) into one big clique. Denote the resulting graph by \(G'\). To get graph \(G'\) from G, we need to delete \(|E(G) \oplus E(G_c)|-(2\ell ^2+4\ell ^2 \left( {\begin{array}{c}\ell \\ 2\end{array}}\right) )=k\) edges, and all these edges are in \(E(G) \oplus E(G_c)\). In this way we get a new cluster graph \(G'\) such that \(|E(G') \oplus E(G)| = k\) and \(|E(G') \oplus E(G_c)| = d\).
\((\Leftarrow :)\) Suppose that there is a cluster graph \(G'\) such that \(|E(G') \oplus E(G)| \le k\) and \(|E(G') \oplus E(G_c)| \le d\). Since \(k+d=|E(G) \oplus E(G_c)|\), it has to hold that \(|E(G') \oplus E(G)| = k\) and \(|E(G') \oplus E(G_c)| = d\). Since \(d=2\ell ^2+4\ell ^2 \left( {\begin{array}{c}\ell \\ 2\end{array}}\right) \), and except for \(v^*\), every clique in \(G_c\) has \(2\ell \) vertices, \(2\ell ^2\) in d must come from adding edges between \(v^*\) and \(\ell \) cliques in \(G_c\). Since \(G'\) is a cluster graph, there must be edges between every pair of these \(\ell \) cliques in G, which means that there is a multicolored clique of order \(\ell \) in \(G_0\). \(\square \)
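The construction in the proof of Lemma 6 can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function name, the label `"v*"`, and the representation of the vertices of \(C_v\) as pairs `(v, i)` are assumptions made for this example. Colors are not passed in because the construction as described uses only the edges of \(G_0\).

```python
from itertools import combinations
from math import comb


def build_lemma6_instance(vertices, edges, l):
    """Return (E(G), E(G_c), k, d) for a graph G_0 given by vertices and edges.

    Edges are stored as frozensets of two endpoints.
    """
    star = "v*"  # hypothetical label for the special single-vertex clique
    # One clique C_v with 2*l vertices per vertex v of G_0.
    blocks = {v: [(v, i) for i in range(2 * l)] for v in vertices}

    # G_c: the disjoint cliques C_v (v* is an isolated vertex, so it adds no edge).
    e_gc = set()
    for block in blocks.values():
        e_gc.update(frozenset(pair) for pair in combinations(block, 2))

    # G: copy G_c, join v* to every other vertex, and fully connect C_u and C_v
    # for every edge {u, v} of G_0.
    e_g = set(e_gc)
    for block in blocks.values():
        e_g.update(frozenset((star, x)) for x in block)
    for u, v in edges:
        e_g.update(frozenset((x, y)) for x in blocks[u] for y in blocks[v])

    d = 2 * l**2 + 4 * l**2 * comb(l, 2)
    k = len(e_g ^ e_gc) - d  # so that k + d = |E(G) xor E(G_c)|
    return e_g, e_gc, k, d
```

For a triangle \(G_0\) with \(\ell =3\), the whole graph G is already one clique, so the sketch yields \(k=0\) and \(d=|E(G) \oplus E(G_c)|\).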
The final two results show W[1]-hardness with respect to the distance d for DCEditing (Matching Dist) and DCDeletion (Matching Dist).
Lemma 7
DCEditing (Matching Dist) is W[1]-hard with respect to the distance d.
Proof
We present a parameterized reduction from Clique on Regular Graphs, where, given a regular graph \(G_0=(V,E)\) on n vertices with vertex degree r with \(r<\frac{n}{2}\), and a number \(k_0\) with \(k_0\le r\), we are asked to decide whether \(G_0\) contains a clique of size \(k_0\). Clique on Regular Graphs is W[1]-hard with respect to \(k_0\) [10]. Given an instance \((G_0, k_0, r)\) of Clique on Regular Graphs, we construct an instance \((G, G_c, d, k)\) of DCEditing (Matching Dist) as follows.
Graph G is the same as \(G_0\) and graph \(G_c=(V,\left( {\begin{array}{c}V\\ 2\end{array}}\right) )\) is a complete graph. Set \(d:=k_0\) and \(k:=\frac{n(n-1-r)}{2}-k_0(n+k_0-2r-2)\). The construction can trivially be done in polynomial time. In the following we show that there is a clique of size \(k_0\) in \(G_0\) if and only if \((G, G_c, d, k)\) is a yes-instance of DCEditing (Matching Dist).
(\(\Rightarrow \)): Assume that there is a clique of order \(k_0\) in \(G_0\); we construct a graph \(G'\) which consists of two cliques, where one of them contains the vertices from the clique of order \(k_0\) in \(G_0\); the other, denoted by \(C_\text {max}\), contains the remaining vertices and has order \(n-k_0\). Next we compute \(|E(G) \oplus E(G')|\), which consists of two parts:
-
\(D(k_0)\): the set of edges between vertices in \(C_\text {max}\) and the remaining vertices, and
-
\(A(k_0)\): the set of added edges between vertices in \(C_\text {max}\).
Since the vertices outside \(C_\text {max}\) form a clique, every such vertex has \(r-k_0+1\) edges connected to vertices in \(C_\text {max}\). Thus \(|D(k_0)|=k_0(r-k_0+1)\). To determine \(|A(k_0)|\), we count the sum of the degrees of vertices in \(C_\text {max}\). Before adding edges to \(C_\text {max}\), the sum is \((n-k_0)r\). After adding edges the sum should be \((n-k_0)(n-k_0-1)+|D(k_0)|\). So the number of edges which need to be added to \(C_\text {max}\) is
$$\begin{aligned} |A(k_0)|=\frac{(n-k_0)(n-k_0-1)+|D(k_0)|-(n-k_0)r}{2}. \end{aligned}$$
Then we get the size of the modification set for \(G'\):
$$\begin{aligned} |E(G) \oplus E(G')|=|D(k_0)|+|A(k_0)|=\frac{n(n-1-r)}{2}-k_0(n+k_0-2r-2)=k. \end{aligned}$$
(\(\Leftarrow \)): To simplify the following proof, we define three functions:
-
\(g_1(x):=x(r-x+1)\),
-
\(g_2(x):=\frac{(n-x)(n-x-1)+g_1(x)-(n-x)r}{2}\), and
-
\(f(x):=g_1(x)+g_2(x)=\frac{n(n-1-r)}{2}-x(n+x-2r-2)\).
Since \(r <\frac{n}{2}\), we have that f(x) is monotonically decreasing and \(f(k_0)=k\).
Suppose that there is no clique of size \(k_0\) in \(G_0\). We need to show that there is no cluster graph \(G'\) satisfying both \(|E(G) \oplus E(G')| \le k\) and \(d_M(G_c, G') \le d\). Suppose towards a contradiction that there is such a cluster graph \(G'\). Denote the largest cluster in \(G'\) as \(C_{\max }\). Since \(d_M(G_c, G') \le d\), we have that \(|V(C_\text {max})| \ge n-k_0\). Define
-
D: the set of edges between vertices in \(C_\text {max}\) and the remaining vertices, and
-
A: the set of added edges between vertices in \(C_\text {max}\).
To get the clique \(C_{\max }\) from G, we have to delete all edges in D and add all edges in A, thus \(|E(G) \oplus E(G')| \ge |D| + |A|\). We distinguish the following two cases:
Case 1 \(|C_\text {max}| = n-k_0\). Every vertex outside \(C_{\max }\) has at least \(r-k_0+1\) edges connected to vertices in \(C_{\max }\), and since there is no clique of order \(k_0\) in \(G_0\), among all vertices outside \(C_\text {max}\), there is at least one vertex which has more than \(r-k_0+1\) edges connected to vertices in \(C_\text {max}\). This means that \(|D|>g_1(k_0)\) and \(|A|>g_2(k_0)\). Thus, we have:
$$\begin{aligned} |E(G) \oplus E(G')| \ge |D| + |A| > g_1(k_0) + g_2(k_0) = f(k_0) =k. \end{aligned}$$
Case 2 \(|C_\text {max}| > n-k_0\).
Suppose that \(|C_\text {max}|=n-k'\), where \(k'<k_0\) is the number of all vertices outside \(C_\text {max}\). Now we have \(|D|\ge g_1(k')\) and \(|A| \ge g_2(k')\), and
$$\begin{aligned} |E(G) \oplus E(G')| \ge |D| + |A| \ge g_1(k') + g_2(k')= f(k') > f(k_0) = k. \end{aligned}$$
The last inequality holds since f(x) is monotonically decreasing and \(k'<k_0\).
In both cases we have that there is no solution for instance \((G, G_c, d, k)\). \(\square \)
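The closed form of f(x) and its monotonicity, both used in the proof of Lemma 7, can be checked numerically. The sketch below is an illustration with hypothetical function names; it uses exact rational arithmetic and only restates the definitions of \(g_1\), \(g_2\), and f given in the proof.

```python
from fractions import Fraction


def g1(x, n, r):
    """Number of deleted edges: x outside vertices, each with r - x + 1 edges into C_max."""
    return x * (r - x + 1)


def g2(x, n, r):
    """Number of added edges inside C_max, from the degree-sum argument."""
    return Fraction((n - x) * (n - x - 1) + g1(x, n, r) - (n - x) * r, 2)


def f_closed(x, n, r):
    """Closed form claimed for f(x) = g1(x) + g2(x)."""
    return Fraction(n * (n - 1 - r), 2) - x * (n + x - 2 * r - 2)


def closed_form_holds(n, r):
    """Check g1 + g2 == f_closed for all integers 0 <= x <= r."""
    return all(g1(x, n, r) + g2(x, n, r) == f_closed(x, n, r) for x in range(r + 1))
```

Since \(f(x+1)-f(x)=-(n-2r+2x-1)\), the function is strictly decreasing on integers \(x \ge 1\) whenever \(r<\frac{n}{2}\); for \(x=0\) and \(n=2r+1\) the difference is zero.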
The above reduction cannot be used to show W[1]-hardness with respect to d for DCDeletion (Matching Dist) since both edge insertions and edge deletions are needed. Next we show that DCDeletion (Matching Dist) remains W[1]-hard with respect to d.
Lemma 8
DCDeletion (Matching Dist) is W[1]-hard with respect to the distance d.
Proof
We present a parameterized reduction from Clique on Regular Graphs, where, given a regular graph \(G_0=(V,E)\) on n vertices with vertex degree r with \(r<\frac{n}{2}\), and a number \(\ell \) with \(\ell \le r\), we are asked to decide whether \(G_0\) contains a clique of size \(\ell \). Clique on Regular Graphs is known to be W[1]-hard with respect to \(\ell \) [10]. Given an instance \((G_0, \ell ,r)\) of Clique on Regular Graphs, we construct an instance \((G, G_c, d, k)\) of DCDeletion (Matching Dist) as follows. The construction is illustrated in Fig. 9.
Let \(\{v_1,v_2,\dots ,v_n\}\) be the vertex set of \(G_0\). For graph G, we first copy the whole graph \(G_0\). Then we add a universal vertex and a private neighbor for each original vertex: we add a universal vertex \(v_0\) and add an edge between \(v_0\) and every vertex \(v_i\) in \(G_0\), and for every vertex \(v_i\) in \(G_0\), we add vertex \(v_i'\) and add an edge between \(v_i\) and \(v_i'\). The graph \(G_c\) has the same vertex set as G. Moreover, graph \(G_c\) contains edges between \(v_i\) and \(v_i'\) for all \(1 \le i \le n\). That is, \(G_c\) consists of \(n+1\) cliques: \(C_0\) with \(V(C_0)=\{v_0\}\) and \(C_i\) with \(V(C_i)=\{v_i,v_i'\}\) for \(1 \le i \le n\). Set \(k=n+\frac{r n}{2} -\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) \) and \(d=\ell \). Next we show that there is a clique of size \(\ell \) in \(G_0\) if and only if the constructed instance \((G, G_c, d, k)\) is a yes-instance of DCDeletion (Matching Dist).
(\(\Rightarrow \)): Assume that there is a clique \(C^*\) of order \(\ell \) in \(G_0\). Then in G we first delete edges \(\{v_i,v_i'\}\) for all \(v_i \in V(C^*)\) and delete edges \(\{v_0,v_i\}\) for all \(v_i \in \{v_1,v_2, \dots , v_n\} {\setminus } V(C^*)\). Second, delete all edges between vertices in \(\{v_1,v_2, \dots , v_n\}\) except for edges between vertices in \(C^*\). We delete n edges in the first step and \(\frac{r n}{2} -\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) \) edges in the second step, since \(G_0\) is a regular graph and \(C^*\) is a clique. By deleting these \(n+\frac{r n}{2} -\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) =k\) edges, we get a cluster graph \(G'\) which contains \(n+1\) cliques: \(C_i'\) with \(V(C_i')=\{v_i'\}\) for \(v_i \in V(C^*)\), \(C_j'\) with \(V(C_j')=\{v_j,v_j'\}\) for \(v_j \in \{v_1,v_2, \dots , v_n\} {\setminus } V(C^*)\), and \(C_0'\) with \(V(C_0')=\{v_0\} \cup V(C^*)\). Thus \(d_M(G',G_c)=\ell \).
(\(\Leftarrow \)): Assume that \((G, G_c, d, k)\) is a yes-instance of DCDeletion (Matching Dist) and let \(G'\) be a solution. Since we can only delete edges, for every pair of edges \(\{v_i,v_i'\}\) and \(\{v_i,v_0\}\) for \(v_i \in \{v_1, v_2,\dots , v_n\}\), we have to delete one of them because \(\{v_0,v_i'\} \not \in E(G)\). This means that for every \(v_i \in \{v_1,v_2, \dots , v_n\}\) vertex \(v_i\) is either in the same clique with \(v_i'\) or with \(v_0\). Suppose that in \(G'\) there are \(p \le \ell \) vertices from \(\{v_1,v_2, \dots , v_n\}\) which are in the same clique with \(v_0\). Then these p vertices must form a clique \(C'\) in \(G_0\). To get \(G'\), we have to delete edges \(\{v_i,v_i'\}\) for all \(v_i \in V(C')\) and edge \(\{v_i,v_0\}\) for every vertex \(v_i \in \{v_1,v_2, \dots , v_n\} {\setminus } V(C')\). This costs n edge deletions. Moreover, we have to delete all edges between vertices in \(\{v_1, v_2,\dots , v_n\}\) except for edges between vertices in \(C'\). This costs \(\frac{r n}{2} -\left( {\begin{array}{c}p\\ 2\end{array}}\right) \) edge deletions. Overall we have
$$\begin{aligned} |E(G) \oplus E(G')|=n+\frac{r n}{2} -\left( {\begin{array}{c}p\\ 2\end{array}}\right) . \end{aligned}$$
Since \(G'\) is a solution, we have that \(|E(G) \oplus E(G')| \le k=n+\frac{r n}{2} -\left( {\begin{array}{c}\ell \\ 2\end{array}}\right) \). Hence, \(p \ge \ell \) and \(G_0\) contains a clique of order \(\ell \). \(\square \)
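The construction of Lemma 8 and the deletion count of the forward direction can be illustrated on a small example. The sketch below is an assumption-laden illustration, not the authors' code: the function names are hypothetical, \(v_0\) is encoded as `0`, \(v_i\) as `i`, and \(v_i'\) as `-i`, and the input edge list is assumed to contain each edge of the regular graph \(G_0\) exactly once. The matching-based distance is not computed here.

```python
from itertools import combinations
from math import comb


def build_lemma8_instance(n, edges0, l):
    """Return (E(G), E(G_c), k, d) for a regular graph G_0 on vertices 1..n."""
    originals = range(1, n + 1)
    e_gc = {frozenset((i, -i)) for i in originals}    # cliques {v_i, v_i'}
    e_g = {frozenset(e) for e in edges0}              # copy of G_0
    e_g |= e_gc                                       # private neighbors v_i'
    e_g |= {frozenset((0, i)) for i in originals}     # universal vertex v_0
    k = n + len(edges0) - comb(l, 2)                  # |E(G_0)| = r * n / 2
    return e_g, e_gc, k, l


def solution_from_clique(n, clique):
    """Edge set of G' from the forward direction for a vertex set 'clique'."""
    clique = set(clique)
    kept = {frozenset(p) for p in combinations(sorted(clique | {0}), 2)}
    kept |= {frozenset((i, -i)) for i in range(1, n + 1) if i not in clique}
    return kept


def is_cluster_graph(edge_set):
    """A graph is a cluster graph iff adjacent vertices share their closed neighborhood."""
    closed = {}
    for a, b in map(tuple, edge_set):
        closed.setdefault(a, {a}).add(b)
        closed.setdefault(b, {b}).add(a)
    return all(closed[a] == closed[b] for a, b in map(tuple, edge_set))
```

For example, with \(G_0\) the cycle on five vertices (so \(r=2<\frac{5}{2}\)) and \(\ell =2\), any edge of the cycle is a clique of order \(\ell \), and the resulting \(G'\) is obtained from G by exactly \(k=9\) deletions. A vertex set that is not a clique in \(G_0\) yields an edge set that is not contained in E(G), so it cannot be reached by deletions alone.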
We have now shown all intractability results stated in Theorem 2.