1 Introduction

Stoica et al. [48] recently introduced graph-based problems on fair (re)districting, employing “margin of victory” as the measure of fair representation. In their work, they performed theoretical and empirical studies; the latter clearly supporting the practical relevance of these problems. The main contribution of their work is certainly with respect to modeling and performing promising empirical studies (based on greedy heuristics). In this paper, we instead focus on the theoretical aspects, significantly extending their findings in this direction.

Dividing agents into groups is a ubiquitous task. Electoral districting is one of the prime examples: Voters are partitioned into voting districts, each electing its own representative. Another example emerges in education; in many countries, children are assigned to schools based on their residency. In such scenarios, the agents (in the settings above, voters or school children) are often placed on a (social or geographical) network. When assigning them to districts, it is natural to require that every district should be connected in the network and meet some further criteria.

In districting, there are various objectives. What we study here can be interpreted as a “benevolent” counterpart of the well-studied gerrymandering scenario in voting theory. For gerrymandering, every voter is characterized by their projected vote in the upcoming election. The goal is then to find a partition of the voters into connected districts such that some designated alternative gains the majority in as many districts as possible. Following Stoica et al. [48], we consider an opposite objective. That is, we assume that some central authority wishes to partition the agents, which are of different types, into connected districts that are fair, where a district is deemed fair if the margin of victory in the district is smaller than a given bound. The margin of victory of a district is the minimum number of agents whose deletion results in a tie between the two most frequent types in the district. When it comes to school districts, sociodemographic attributes such as race, gender, and religion may be modeled. Here, a low margin of victory would be desirable because a majority of students sharing certain attributes may result in significant funding disparities between schools, as claimed by EdBuild [19] (see Stoica et al. [48] for a more extensive discussion). In electoral districting where agents’ types can represent their projected vote or ethnicity, a low margin of victory may foster competition among politicians, thereby motivating elected officials to do a great job. To illustrate that districts that are dominated by a certain ethnicity are a serious problem, in particular in developing countries, we quote the two Nobel Prize winners Banerjee and Duflo [4, pp. 251–252]:

There is reason to be concerned that voting [in developing countries] is often based on ethnic loyalties, which means that the candidate from the largest ethnic group often wins, whatever his intrinsic merit. [...] [I]f voters choose based on ethnicity rather than on merit, the quality of candidates representing the majority group will suffer: These candidates don’t need to make much of an effort because the fact that they are from the “right” caste or ethnic group is sufficient to ensure that they are elected.

Banerjee and Duflo [5] even found evidence that in the 1980s and 1990s in North India elected officials who belonged to the (clearly) dominating caste group were significantly more likely to be corrupt. This illustrates the practical importance of creating districts with a low margin of victory (in terms of ethnicity). The importance of this problem is further stressed by recent findings of Zhao et al. [50], who provided evidence that the 2021 Georgia Congressional Districting Plan was designed in a way to have a high margin of victory in each district (thereby making elections in each district non-competitive and non-responsive to changes in the preferences of voters).

In our work, we build upon the studies of Stoica et al. [48] to search for tractable special cases of fair districting over graphs. We focus on the Fair Connected Districting (FCD) problem (a natural special case of Stoica et al.’s Fair Connected Regrouping problem). The input of FCD consists of a graph \(G=(V,E)\) in which every vertex is assigned a color from a set C, and integers k, \(\ell\), \(s_{\textrm{min}}\), and \(s_{\textrm{max}}\). The question is whether the vertex set of G can be partitioned into k connected districts, each containing between \(s_{\textrm{min}}\) and \(s_{\textrm{max}}\) vertices, whose margin of victory is at most \(\ell\). The difference from Fair Connected Regrouping is that FCD does not impose any constraints on the districts to which an agent can be assigned.

It is easy to see that FCD generalizes the known NP-hard Perfectly Balanced Connected Partition problem [14, 18], which asks for a partition of a graph into two connected components of the same size (see Proposition 1). This motivates a parameterized complexity analysis and the study of restrictions of the underlying graph in order to identify tractable special cases. Specifically, we analyze the computational complexity of FCD on specific graph classes and the parameterized complexity of FCD with respect to several problem-specific parameters (such as \(\vert C \vert\) and k) as well as several parameters measuring structural properties of the underlying graph (such as its treewidth or vertex cover number).

1.1 Related work

1.1.1 Relation to model studied by Stoica et al. [48]

Stoica et al. [48] introduced Fair Connected Regrouping, which is a generalization of our FCD problem. Fair Connected Regrouping differs from FCD in that, in Fair Connected Regrouping, one is additionally given a function that specifies for each vertex to which district it can belong. They proved that Fair Connected Regrouping is NP-hard even for only two colors and two districts. Moreover, Stoica et al. [48] considered two special cases of Fair Connected Regrouping: Fair Regrouping (omitting connectivity constraints) and Fair Regrouping_X (omitting connectivity constraints and any restriction to which districts vertices can belong). They proved that Fair Regrouping is NP-hard for three colors but in XP with respect to the number of districts. Turning to Fair Regrouping_X, they showed that the problem is in XP with respect to the number of colors or districts, but left open the general complexity. Overall, our problem is a special case of Fair Connected Regrouping, incomparable to Fair Regrouping, and a generalization of Fair Regrouping_X. Thus, there are no direct implications of the studies of Stoica et al. for our problem.

Moreover, our problem differs from the ones studied by Stoica et al. [48] in a slightly different definition of margin of victory. While we look at the number of vertices that need to be deleted to have a tied most frequent color, they examine the number of vertices that need to change their color such that the most frequent color changes. We chose our definition in order to be able to distinguish the case of two tied most frequent colors from the case where one color appears once more than the others (which both have margin of victory one in the model of Stoica et al. [48]). In all other cases, if m is the margin of victory in our definition, then the margin of victory in the definition of Stoica et al. [48] is \({\lfloor }{\frac{m}{2}}{\rfloor }\). However, all our algorithmic results can be extended to the definition of Stoica et al. [48]. We expect that hardness results similar to ours can be obtained for their definition as well.

1.1.2 Further related work

Following up on the work of Stoica et al. [48], Boehmer and Koana [10] analyzed the Fair Regrouping and the Fair Regrouping_X problem in more detail. Among others, they proved that Fair Regrouping_X without size constraints is polynomial-time solvable while the NP-hard Fair Regrouping problem without size constraints is polynomial-time solvable for two colors and fixed-parameter tractable with respect to the number of districts. In addition to the margin of victory, they also considered the maximum difference between the occurrences of two colors in a district as an additional fairness notion. Again, their results have no implications on the complexity of our FCD problem.

FCD is relevant in district-based elections, where voters are partitioned into districts and each district elects its own representative. Several papers have studied how to assign voters to districts so as to “fairly” reflect the political choices of voters [3, 34, 35, 43, 44]. Well-studied in this context is gerrymandering, which can be regarded as a “malicious” counterpart to our problem. In gerrymandering, the task is to partition a set of voters into districts obeying certain conditions such that a designated alternative wins in as many districts as possible. An intuitive strategy to solve this problem, which is not necessarily optimal [45], is to maximize the number of districts where the designated alternative wins only by a small margin (this is somewhat related to our problem). Initially, gerrymandering was predominantly studied from the perspective of social and political science [23, 28, 40]. More recently, different variants of gerrymandering have been considered from an algorithmic perspective [20, 38]. Notably, the study of gerrymandering over graphs, which is analogous to our problem, has recently gained significant interest [6, 15, 26, 29]. In particular, as done here for FCD, Bentert et al. [6], Gupta et al. [26], and Ito et al. [29] analyzed the complexity of gerrymandering on paths, cycles, and trees and studied the influence of the number of candidates/colors and the number of districts. A similar model for graph-based redistribution scenarios and political districting has been studied under the name “network-based vertex dissolution” [7].

Partitioning agents of different types into balanced groups is conceptually closely related to studies of social segregation. In computer science, social segregation is, for instance, quite extensively studied in the context of Schelling’s segregation model [47]: While initially the work in this context was mostly concerned with the theoretical analysis of segregation patterns [8, 11], recently Schelling’s model has been approached from a game-theoretic perspective [1, 33].

1.2 Contribution

Motivated by the NP-hardness of FCD (see Proposition 1), we conduct a parameterized complexity analysis of FCD and study restrictions of the underlying graph in order to identify tractable special cases. We investigate the influence of problem-specific parameters (the number \(\vert C \vert\) of colors, the number k of districts, and the margin of victory \(\ell\)) and the structure of the underlying graph on the computational complexity of FCD.

The motivation for our refined complexity analysis is two-fold. First, analyzing special graph classes and types of graphs offers a better understanding of the complexity of FCD: Paths, stars, and cycles form the building blocks of more complex graphs and thus understanding the problem on these graphs is vital for a further analysis. Moreover, real-world networks will usually have some structure. One specific use case for our algorithms might be sampling algorithms for “fair” districting plans. These sampling algorithms often work by pre-aggregating groups of voters, creating a spanning tree on the merged space, and then continuing by further merging adjacent groups [2, 16, 42]. Here, our analysis of FCD on trees and tree-like graphs might be useful. Second, considering the analysis of the influence of the number \(\vert C \vert\) of colors and the number k of districts, in most applications we can think of, these two parameters are substantially smaller than the number of vertices, which motivates checking whether FCD becomes tractable if one (or both) of these parameters is small. For instance, in their experiments, Stoica et al. [48] partitioned 50,000 voters into 10 voting districts and 41,834 schoolchildren into 61 school districts with \(\vert C \vert =7\).

We show that FCD is NP-hard even if \(\vert C \vert =k=2\) and \(\ell =0\) but polynomial-time solvable on paths, cycles, stars, and caterpillars (for stars, our algorithm even runs in linear time). Subsequently, we extend our polynomial-time algorithms for paths and cycles to a polynomial-time algorithm for all graphs with a constant max leaf number (\({{\,\textrm{mln}\,}}\)), which are basically graphs that consist of a constant number of paths and cycles (where the two endpoints of each path and one point from each cycle can be arbitrarily connected).

Remarkably, in our most involved hardness reduction, we show that FCD already becomes NP-hard and even W[1]-hard with respect to \(\vert C \vert +k\) on trees. However, when the number of colors or the number of districts is constant, FCD on trees becomes polynomial-time solvable. In fact, we show that these results hold for some tree-like graphs as well. Herein, the tree-likeness of a graph is measured by one of three parameters, namely, the treewidth (\({{\,\textrm{tw}\,}}\)), the feedback edge number (\({{\,\textrm{fen}\,}}\)), and the feedback vertex number (\({{\,\textrm{fvn}\,}}\)). More precisely, as our most involved algorithmic results, we establish polynomial-time solvability of FCD when the number of colors and the treewidth are constant. We achieve this with a dynamic programming approach on the tree decomposition of the given graph empowered by some structural observations on FCD. Moreover, we observe that there is a simple polynomial-time algorithm on graphs with a constant feedback edge number when there are a constant number of districts. On the other hand, we prove that FCD is NP-hard for two districts even on graphs with \({{\,\textrm{fvn}\,}}=1\) (and \({{\,\textrm{tw}\,}}=2\)). Lastly, we show that FCD is polynomial-time solvable on graphs with a constant vertex cover number (\({{\,\textrm{vcn}\,}}\)) and fixed-parameter tractable with respect to the vertex cover number and the number of colors. A summary of our parameterized results can be found in Fig. 1. Notably, all our hardness results also hold without size constraints.

In our studies, we identify several sharp complexity dichotomies. For instance, FCD is polynomial-time solvable on trees with diameter at most three but NP-hard and W[1]-hard with respect to \(\vert C \vert +k\) on trees with diameter four. Similarly, FCD is NP-hard and W[1]-hard with respect to \(\vert C \vert +k\) on graphs with pathwidth at least two but polynomial-time solvable on pathwidth-one graphs.

To summarize, we show that FCD without size constraints is NP-hard even in very restricted settings, e.g., on trees or if \(\vert C \vert =k=2\) and \(\ell =0\). To make the problem tractable, one possibility is to significantly restrict the input graph, e.g., to consist of a constant number of paths and cycles, or to combine structural parameters of the given graph with the number \(\vert C \vert\) of colors or the number k of districts. For small \(\vert C \vert\) and k, the tractability of FCD extends to certain tree-like graphs and graphs with a small vertex cover number. In contrast to the parameters \(\vert C \vert\) and k, which have a strong influence on the complexity of FCD, the bound \(\ell\) on the margin of victory has only little impact as all hardness results already hold for \(\ell =0\) and all our algorithmic results hold for arbitrary \(\ell\).

1.2.1 Organization

The remainder of this paper is structured as follows. In Sect. 2, we formally introduce FCD and the graph parameters we examine. In Sect. 3, we present some preliminary results on FCD mostly concerning NP-hardness. In Sect. 4, we then consider FCD on very special graph classes such as paths and cycles as well as on graphs that can be partitioned into a bounded number of those. In Sect. 5, we shift our attention to trees and graphs that are tree-like. Lastly, in Sect. 6, we analyze the influence of the vertex cover number on the complexity of FCD. We defer the proofs of two technical results (marked with \(\bigstar\)) to the appendix.

Fig. 1

Overview of our parameterized complexity results. Each box represents one parameterization of FCD. An arc from parameter p to another parameter \(p'\) indicates that p is upper-bounded by some function of \(p'\). For parameters in the red area (dotted), we prove that FCD is NP-hard even if the parameter is a constant. For parameters in the orange area (dashed), we prove W[1]-hardness and present an XP-algorithm. For parameters in the yellow area (solid thick), we have an XP-algorithm but W[1]-hardness is unknown. The green area (solid) indicates fixed-parameter tractability (Color figure online)

2 Preliminaries

For \(a,b\in {\mathbb {N}}\), let [a, b] denote \(\{a,a+1,\dots , b-1,b\}\) and let [b] denote [1, b]. Throughout the paper, all graphs \(G = (V, E)\) are undirected and have no self-loops or multi-edges. By convention, we will use \(n:= \vert V \vert\). Given a graph \(G=(V,E)\) and a vertex set \(V'\subseteq V\), let \(G[V']\) be the graph G induced by the vertices from \(V'\).

2.1 Fair connected districting

Let \(C=\{c_1,\dots , c_{ \vert C \vert }\}\) be the set of colors. We assume that each vertex \(v\in V\) of the given graph has a color \(c\in C\) determined by a given coloring function \(\text {col}:V \rightarrow C\). For a vertex set \(V'\subseteq V\), let \(\chi _c(V')\) denote the number of vertices of color c in \(V'\). Moreover, let \(\chi (V') \in {\mathbb {N}}^{ \vert C \vert }\) denote the vector in which the \(i^{\textrm{th}}\) entry contains the number of vertices of color \(c_i\) in \(V'\), i.e., \(\chi _i(V')=\chi _{c_i}(V')\). For a vector \({\textbf{x}}=(x_1,\dots , x_t) \in {\mathbb {N}}^{t}\), let \(i^*_{\textbf{x}}\) denote an index with the largest entry in \({\textbf{x}}\), i.e., \(i^*_{\textbf{x}}\in {{\,\mathrm{arg\,max}\,}}_{i\in [t]} x_i\). The margin of victory \(\text {MOV}({\textbf{x}})\) of a vector \({\textbf{x}}\) is defined as the difference between the largest and second largest entry in \({\textbf{x}}\), i.e., \(\text {MOV}({\textbf{x}}):=x_{i^*_{\textbf{x}}}- \max _{i\in [t]{\setminus } \{i^*_{\textbf{x}}\}} x_i\). If \(t=1\), then we set \(\text {MOV}({\textbf{x}}):= x_1\). Accordingly, we define the margin of victory \(\text {MOV}(V')\) of a vertex set \(V'\subseteq V\) of colored vertices as \(\text {MOV}(V'):=\text {MOV}(\chi (V'))\). For \(\ell \in {\mathbb {N}}\), we call a vertex set \(V'\) \(\ell\)-fair if \(\text {MOV}(V')\le \ell\). We use the term district to refer to a vertex set \(V'\subseteq V\). We say that a color \(c_i\in C\) is the \(j^{\textrm{th}}\) most frequent color in a district \(V'\) if \(\chi _i(V')\) is the \(j^{\textrm{th}}\) largest entry in \(\chi (V')\) (we break ties arbitrarily unless stated otherwise). We now present our central problem:

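Fair Connected Districting (FCD)

Input: A graph \(G=(V,E)\), a set C of colors, a coloring function \(\text {col}:V \rightarrow C\), a number k of districts, a maximum margin of victory \(\ell\), and size bounds \(s_{\textrm{min}}\) and \(s_{\textrm{max}}\).

Question: Can V be partitioned into k non-empty districts \(V_1,\dots , V_k\) such that, for each \(i\in [k]\), \(G[V_i]\) is connected, \(V_i\) is \(\ell\)-fair, and \(s_{\textrm{min}}\le \vert V_i \vert \le s_{\textrm{max}}\)?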
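The margin of victory and \(\ell\)-fairness defined in this subsection can be illustrated by a minimal Python sketch. The function names and the representation of the coloring as a dictionary are assumptions made for illustration only; they do not appear in the paper.

```python
from collections import Counter


def margin_of_victory(counts):
    """MOV of a vector of color counts: largest entry minus second-largest entry.

    For a vector with a single entry (t = 1), the MOV is that entry itself.
    """
    if len(counts) == 1:
        return counts[0]
    top, second = sorted(counts, reverse=True)[:2]
    return top - second


def district_mov(district, col, colors):
    """MOV of a vertex set; `col` maps each vertex to its color (hypothetical encoding)."""
    tally = Counter(col[v] for v in district)
    # Build the vector chi(V') with one entry per color in C, including zeros.
    return margin_of_victory([tally[c] for c in colors])


def is_fair(district, col, colors, ell):
    """A district is ell-fair if its margin of victory is at most ell."""
    return district_mov(district, col, colors) <= ell
```

For example, a district with two red vertices, one blue vertex, and one green vertex has color vector (2, 1, 1) and hence margin of victory 1.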

2.2 Graph parameters

We define several graph parameters for an undirected graph \(G=(V,E)\). The diameter of G is the maximum shortest distance between any pair of vertices. The max leaf number \({{\,\textrm{mln}\,}}(G)\) (for a connected graph G) is the maximum number of leaves over all spanning trees of G. A vertex set \(V' \subseteq V\) is a vertex cover of G if \(G[V \setminus V']\) has no edge. A vertex set \(V' \subseteq V\) is a feedback vertex set if \(G [V \setminus V']\) is a forest. Analogously, an edge set \(E' \subseteq E\) is a feedback edge set if \((V, E \setminus E')\) is a forest. The vertex cover number \({{\,\textrm{vcn}\,}}(G)\), feedback vertex number \({{\,\textrm{fvn}\,}}(G)\), and feedback edge number \({{\,\textrm{fen}\,}}(G)\) are the sizes of a smallest vertex cover, feedback vertex set, and feedback edge set, respectively. A tree decomposition of a graph \(G = (V, E)\) is a pair \(\left( T, \{ B_x \}_{x \in V_T}\right)\), where \(T = (V_T, E_T)\) is a rooted tree and \(B_x \subseteq V\) for each \(x \in V_T\) such that

  (i) \(\bigcup _{x \in V_T} B_x = V\),

  (ii) for each edge \(\{u, v\} \in E\), there is an \(x \in V_T\) with \(u, v \in B_x\), and

  (iii) for each \(v \in V\), the set of nodes \(x \in V_T\) with \(v \in B_x\) induces a connected subtree in T.

The width of \((T, \{ B_x \}_{x \in V_T})\) is \(\max _{x \in V_T} \vert B_x \vert - 1\). The treewidth \({{\,\textrm{tw}\,}}(G)\) of G is the minimum width of all tree decompositions of G. The pathwidth \({{\,\textrm{pw}\,}}(G)\) of G is defined analogously with the additional constraint that T is a path.
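To make the three conditions concrete, the following Python sketch checks whether a given collection of bags is a tree decomposition and computes its width. The function names and the encoding (bags as a dictionary from tree nodes to vertex sets, the tree as an edge list that is assumed to already form a tree) are illustrative assumptions.

```python
def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check conditions (i)-(iii); `tree_edges` is assumed to describe a tree on bags' keys."""
    # (i) Every vertex of G appears in some bag.
    if set().union(*bags.values()) != set(vertices):
        return False
    # (ii) Every edge of G is contained in some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # (iii) For each vertex, the tree nodes whose bags contain it are connected in T.
    adj = {x: set() for x in bags}
    for x, y in tree_edges:
        adj[x].add(y)
        adj[y].add(x)
    for v in vertices:
        nodes = {x for x, bag in bags.items() if v in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:  # depth-first search restricted to `nodes`
            x = stack.pop()
            for y in adj[x]:
                if y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
        if seen != nodes:
            return False
    return True


def width(bags):
    """Width of a decomposition: maximum bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1
```

For the path a, b, c, the bags {a, b} and {b, c} joined by one tree edge form a tree decomposition of width 1.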

If G is clear from context, then we simply omit it and write \({{\,\textrm{mln}\,}}\), \({{\,\textrm{vcn}\,}}\), \({{\,\textrm{fvn}\,}}\), \({{\,\textrm{fen}\,}}\), \({{\,\textrm{tw}\,}}\), and \({{\,\textrm{pw}\,}}\), respectively. For all our parameterized algorithms with respect to one of these parameters, we assume that the corresponding structure is given as part of the input. For instance, in our algorithm parameterized by \({{\,\textrm{tw}\,}}\), we assume that we are given a tree decomposition of width \({{\,\textrm{tw}\,}}\). However, note that in all cases, the worst-case running time of our algorithms would not change if we computed the structure using known parameterized algorithms.

2.3 Parameterized complexity theory

An instance of a parameterized problem L consists of a problem instance \({\mathcal {I}}\) and a parameter value \(k\in {\mathbb {N}}\). Then L lies in XP with respect to k if there exists an algorithm deciding L in \(\vert {\mathcal {I}} \vert ^{f(k)}\) time for some computable function f. Furthermore, L is called fixed-parameter tractable with respect to k if there exists an algorithm deciding L in \(f(k) \vert {\mathcal {I}} \vert ^{{\mathcal {O}}(1)}\) time for a computable function f. The corresponding complexity class is called FPT. There is a hierarchy of complexity classes for parameterized problems: FPT \(\subseteq\) W[1] \(\subseteq\) W[2] \(\subseteq\) XP, where it is commonly believed that all inclusions are strict. Thus, if L is shown to be W[1]-hard, then the common belief is that it is not fixed-parameter tractable. For instance, computing a clique of size at least k in a graph is known to be W[1]-hard with respect to the parameter k. One can show that L is W[t]-hard, \(t\ge 1\), by a parameterized reduction from a known W[t]-hard parameterized problem \(L'\). A parameterized reduction from \(L'\) to L is a function that maps an instance \(({\mathcal {I}}', k')\) of \(L'\) to an instance \(({\mathcal {I}}, k)\) of L such that \(({\mathcal {I}}', k')\) is a yes-instance for \(L'\) if and only if \(({\mathcal {I}}, k)\) is a yes-instance for L. Moreover, we require that k is bounded by a function of \(k'\) and that the transformation takes at most \(f(k') \vert {\mathcal {I}}' \vert ^{{\mathcal {O}}(1)}\) time for some computable function f. Finally, we say that a parameterized problem is para-NP-hard if it is NP-hard even for constant parameter values.

3 Basic results on FCD

In this section, we make some basic observations on the computational complexity of FCD. First, we prove that FCD is para-NP-hard with respect to the combination \(\vert C \vert +k+\ell\) of all three problem-specific parameters. This strong general hardness result motivates the study of various restricted graph classes in subsequent sections.

The NP-hardness of FCD easily follows from the fact that it generalizes the Perfectly Balanced Connected Partition problem, which is NP-hard on bipartite graphs [18]:

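Perfectly Balanced Connected Partition

Input: An undirected graph \(G=(V,E)\).

Question: Is there a partition \((V_1,V_2)\) of V such that \(G[V_1]\) and \(G[V_2]\) are connected and \(\vert V_1 \vert = \vert V_2 \vert\)?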

Consider the following polynomial-time many-one reduction: Given an instance \(G=(V,E)\) of Perfectly Balanced Connected Partition, construct an equivalent instance of FCD by coloring all vertices in G in the same color, setting \(k=2\), \(\ell = \vert V \vert /2\), \(s_{\textrm{min}}=1\), and \(s_{\textrm{max}}=\infty\). Moreover, it is also possible to prove the NP-hardness for the case with \(\ell =0\) at the cost of introducing a second color:

Proposition 1

FCD is NP-hard even if G is bipartite, \(s_{\textrm{min}}=1\), \(s_{\textrm{max}}=\infty\), and (i) \(\vert C \vert =1\), \(k=2\) or (ii) \(\ell =0\), \(\vert C \vert =2\), \(k=2\).

Proof

To prove the second part, we present a polynomial-time reduction from Perfectly Balanced Connected Partition to FCD. Notably, Perfectly Balanced Connected Partition is also NP-hard if we are given two vertices \(v_1,v_2\in V\) and the question is whether there is a partition \((V_1,V_2)\) with \(v_1\in V_1\) and \(v_2\in V_2\) (in the hardness proof by Dyer and Frieze [18, Theorem 2.2], the vertex \(a\) always belongs to one part and \(b\) to the other).

Construction. Given an instance \((G=(V,E), v_1, v_2)\) of Perfectly Balanced Connected Partition with \(n:= \vert V \vert\), we construct an instance of FCD: We set \(C=\{c_1,c_2\}\), \(k=2\), and \(\ell =0\). We color all vertices of the given graph G in color \(c_1\). Moreover, we modify G by introducing n new vertices of color \(c_2\). We connect half of these vertices to \(v_1\) and the other half to \(v_2\).

Proof of Correctness. Let \((V_1,V_2)\) be a solution to the FCD instance. As \(\ell=0\) and both \(V_1\) and \(V_2\) are required to be non-empty, both districts must contain vertices of both colors. Moreover, as both \(G[V_1]\) and \(G[V_2]\) are connected, all vertices of color \(c_2\) adjacent to \(v_1\) need to be in the same district as \(v_1\) (and similarly for \(v_2\)). Thus, \(v_1\) and \(v_2\) need to be in different districts, each consisting of \(\frac{n}{2}\) vertices of color \(c_2\) and \(\frac{n}{2}\) vertices of color \(c_1\). Thereby, after removing all vertices of color \(c_2\), \((V_1,V_2)\) is a solution to the given Perfectly Balanced Connected Partition instance.

To prove the reverse direction, let \((V_1,V_2)\) be a solution to the given Perfectly Balanced Connected Partition instance with \(v_1\in V_1\) and \(v_2\in V_2\). Then, the FCD instance is a yes-instance, as it is possible to add the \(\frac{n}{2}\) vertices of color \(c_2\) attached to \(v_1\) to \(V_1\) and the \(\frac{n}{2}\) vertices of color \(c_2\) attached to \(v_2\) to \(V_2\) to arrive at a solution to the FCD instance. \(\square\)
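The construction used in this proof can be sketched in Python as follows. The function name, the dictionary-based output format, and the labels of the newly added vertices are assumptions made for illustration; the sketch also assumes that n is even (for odd n, the Perfectly Balanced Connected Partition instance is trivially a no-instance) and that the new labels do not clash with existing vertex names.

```python
def reduce_pbcp_to_fcd(vertices, edges, v1, v2):
    """Build the FCD instance of the construction with |C| = 2, k = 2, and ell = 0."""
    n = len(vertices)
    if n % 2 != 0:
        raise ValueError("sketch assumes an even number of vertices")
    # All original vertices receive color c1.
    col = {v: "c1" for v in vertices}
    new_edges = list(edges)
    # Add n new vertices of color c2: half attached to v1, half attached to v2.
    for i in range(n):
        w = ("new", i)  # hypothetical label for the i-th new vertex
        col[w] = "c2"
        new_edges.append((v1 if i < n // 2 else v2, w))
    return {
        "vertices": list(col),
        "edges": new_edges,
        "col": col,
        "k": 2,
        "ell": 0,
        "s_min": 1,
        "s_max": float("inf"),
    }
```

Applied to a path on four vertices with \(v_1\) and \(v_2\) as its endpoints, this yields a graph on eight vertices in which each endpoint has two new neighbors of color \(c_2\).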

Note that our results strengthen a result of Stoica et al. [48], who proved that FCD is NP-hard for \(\vert C \vert =2\) and \(k=2\) if we can additionally specify for each vertex the districts to which it can be assigned.

As our second basic result, using a simple dynamic programming approach, we show that an instance of FCD on a disconnected graph is polynomial-time solvable if FCD can be solved in polynomial time on its connected components. This will play a crucial role, e.g., in developing an XP-algorithm for the max leaf number (Theorem 1).

Proposition 2

Let \(\mathcal {G}\) be a class of graphs such that \({\textsc {FCD}}\) is polynomial-time solvable on any graph \(G \in \mathcal {G}\). Then, FCD is polynomial-time solvable on any graph from \(\mathcal {G}'\), where \(\mathcal {G'}\) is the class of graphs obtained by taking disjoint unions of graphs from \(\mathcal {G}\).

Proof

Let \(G'\in \mathcal {G}'\) and let \(G_1, \dots , G_p\in \mathcal {G}\) be the connected components of \(G'\). Clearly, every district is contained in the vertex set of \(G_i\) for some \(i \in [p]\). We solve the problem using a simple subset-sum-like dynamic programming algorithm. To this end, we introduce a table T[i, j] for \(i\in [p]\) and \(j\in [k]\). An entry T[i, j] is true if one can partition the vertices of \(G_1, \dots , G_i\) into j connected \(\ell\)-fair districts respecting the size constraints. We also use a table H where H[i, j] for \(i\in [p]\) and \(j\in [k]\) is true if the vertex set of \(G_i\) can be partitioned into j connected \(\ell\)-fair districts respecting the size constraints. Note that H can be computed in polynomial time by our initial assumption on \(\mathcal {G}\).

To initialize T, we set T[1, j] to H[1, j] for all \(j\in [k]\). Subsequently, for increasing \(i>1\), we update T as follows:

$$T[i,j]=\bigvee _{\begin{array}{c} j',j''\in [j]\\ j'+j''=j \end{array}} T[i-1,j'] \wedge H[i,j''].$$

In the end, we return T[p, k]. This algorithm runs in \({\mathcal {O}}(p\cdot k^2) = {\mathcal {O}}(n^3)\) time (aside from the computation of H). \(\square\)
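The recurrence for T can be sketched in Python as below. The table H is assumed to be precomputed and passed in as a list of rows of length k + 1, where index 0 is unused; this encoding and the function name are illustrative assumptions. The sketch keeps only the current row of T.

```python
def combine_components(H, k):
    """Return T[p, k] given H, where H[i][j] is True iff the (i+1)-th component
    can be partitioned into j connected ell-fair districts within the size bounds.

    Each row of H has length k + 1; entry 0 is unused and should be False.
    """
    p = len(H)
    # Initialization: T[1, j] = H[1, j] for all j.
    T = list(H[0])
    for i in range(1, p):
        new_row = [False] * (k + 1)
        for j in range(1, k + 1):
            # Split j districts into j' >= 1 for the earlier components
            # and j'' = j - j' >= 1 for the current component.
            new_row[j] = any(T[j1] and H[i][j - j1] for j1 in range(1, j))
        T = new_row
    return T[k]
```

For instance, if the first component can only form one district and the second can only form two, then exactly k = 3 districts are achievable.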

4 FCD on paths, cycles, and beyond

This section studies the computational complexity of FCD on simple graphs and graphs that can be partitioned into “few” paths and cycles. Specifically, in Sect. 4.1, we develop polynomial-time algorithms on paths, cycles, stars and caterpillars. Subsequently, in Sect. 4.2, we show that FCD is in XP when parameterized by the max leaf number, which generalizes polynomial-time solvability on paths and cycles.

4.1 Polynomial-time algorithms for FCD on simple graph classes

We start by proving that FCD is cubic-time solvable on paths using a simple dynamic programming approach.

Proposition 3

FCD on paths can be solved in \({\mathcal {O}} (k \cdot n^2)\) time.

Proof

Let \(G=\left( \{v_1,\dots ,v_n\}, \{\{v_i,v_{i+1} \} \mid i\in [n-1]\}\right)\) be the input path. We first create a table A[i, j], where A[i, j] for \(i\le j\in [n]\) is true if and only if \(\{v_i,\dots , v_j\}\) is \(\ell\)-fair and \(\vert \{v_i,\dots , v_j\} \vert =j - i + 1 \in [s_{\textrm{min}}, s_{\textrm{max}}]\). We then create a table T with entries T[i, t] for \(i\in [n]\) and \(t\in [k]\). An entry T[i, t] is true if there is a partition of the vertices \(\{v_1,\dots , v_i\}\) into t paths (connected districts) that are all \(\ell\)-fair and respect the size constraints. Note that the input is a yes-instance if and only if T[n, k] is true.

We initialize table T by setting T[i, 1] for \(i\in [n]\) to A[1, i]. Subsequently, for \(t>1\), we update the table using:

$$T[i,t]=\bigvee _{j\in [i-1]} T[j,t-1] \wedge A[j+1,i].$$

The reasoning behind this is that if we partition \(\{v_1,\dots , v_i\}\) into t districts, then there needs to be some \(j\in [i-1]\) such that \(\{v_{j+1},v_{j+2},\dots , v_i\}\) is one district in the solution. Thus, if T[i, t] is true, then there needs to be some \(j\in [i-1]\) such that the vertices from \(\{v_1,\dots , v_j\}\) can be partitioned into \(t-1\) \(\ell\)-fair districts respecting the size constraints and the vertices \(\{v_{j+1},v_{j+2},\dots , v_i\}\) form an \(\ell\)-fair district respecting the size constraints.

The table A can be filled in \({\mathcal {O}}(n^2)\) time: for a fixed \(i\in [n]\), we can compute all \(A[i,n-j]\) for \(j\in [0,n-i]\) in linear time by first computing \(\chi (\{v_i,\dots , v_{n}\})\) for \(j=0\) and, subsequently, for increasing \(j\ge 1\), computing \(\chi (\{v_i,\dots , v_{n-j}\})=\chi (\{v_i,\dots , v_{n-j+1}\})- \chi (\{v_{n-j+1}\})\) (which can be done in constant time). It can be determined in \({\mathcal {O}}(1)\) time whether \(\chi (\{v_i,\dots , v_{n-j}\})\) is \(\ell\)-fair by additionally storing the number of occurrences of integers (which are at most n) in \(\chi (\{v_i,\dots , v_{n-j}\})\). Additional bookkeeping allows us to find the two largest entries in \(\chi (\{v_i,\dots , v_{n-j}\})\) in \({\mathcal {O}}(1)\) time. Moreover, the number of occurrences in \(\chi (\{v_i,\dots , v_{n-j}\})\) can be updated in \({\mathcal {O}}(1)\) time as we increase j. We set \(A[i, n - j]\) to true if and only if \(\chi (\{v_i,\dots , v_{n-j}\})\) is \(\ell\)-fair and \(s_{\textrm{min}}\le n - i - j + 1 \le s_{\textrm{max}}\).

Note that table T consists of \(k\cdot n\) entries. We spend \({\mathcal {O}}(n)\) time for every entry, resulting in an overall running time of \({\mathcal {O}}(k\cdot n^2)\). \(\square\)
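The dynamic program from this proof can be sketched in Python as follows. This is a simplified illustration rather than the exact procedure analyzed above: the table A is filled by extending each segment to the right and recomputing the two largest color counts per segment, which costs more than the \({\mathcal {O}}(1)\) bookkeeping described in the proof. The function name and the encoding of the path as a list of colors are assumptions.

```python
def fcd_on_path(colors, k, ell, s_min, s_max):
    """Decide FCD on a path; colors[i - 1] is the color of vertex v_i (n >= 1, k >= 1)."""
    n = len(colors)
    palette = sorted(set(colors))

    def mov(counts):
        vals = sorted(counts.values(), reverse=True)
        return vals[0] if len(vals) == 1 else vals[0] - vals[1]

    # A[i][j] (1-indexed): {v_i, ..., v_j} is ell-fair and within the size bounds.
    A = [[False] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        counts = {c: 0 for c in palette}
        for j in range(i, n + 1):
            counts[colors[j - 1]] += 1
            A[i][j] = mov(counts) <= ell and s_min <= j - i + 1 <= s_max

    # T[i][t]: {v_1, ..., v_i} can be split into t valid districts.
    T = [[False] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][1] = A[1][i]
    for t in range(2, k + 1):
        for i in range(1, n + 1):
            # The last district is {v_{j+1}, ..., v_i} for some j in [i - 1].
            T[i][t] = any(T[j][t - 1] and A[j + 1][i] for j in range(1, i))
    return T[n][k]
```

For example, the path colored red, blue, red, blue can be split into two 0-fair districts (the first two and the last two vertices), whereas the path colored red, red, blue admits no partition into 0-fair districts.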

To solve FCD on cycles, we iterate over all vertices of the cycle as the starting point of the first district and split the cycle at this point to convert it into a path. Subsequently, we employ the algorithm from above, which results in a running time of \({\mathcal {O}}(k\cdot n^3)\):

Corollary 1

FCD on cycles is solvable in \({\mathcal {O}}(k\cdot n^3)\) time.
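The reduction from cycles to paths can be sketched as follows. The code is an illustrative sketch only: `path_ok` is a hypothetical compact path solver (a forward variant of the dynamic program for paths), and \(\ell\)-fairness is again assumed to mean that the two largest color counts differ by at most \(\ell\).

```python
from collections import Counter


def path_ok(colors, k, ell, s_min, s_max):
    """Compact path solver: T[i][t] says the first i vertices form t valid districts."""
    n = len(colors)
    T = [[False] * (k + 1) for _ in range(n + 1)]
    T[0][0] = True
    for i in range(n):  # candidate district starts at vertex i
        counts = Counter()
        for j in range(i, n):  # ... and ends at vertex j
            counts[colors[j]] += 1
            top = sorted(counts.values(), reverse=True) + [0]
            if s_min <= j - i + 1 <= s_max and top[0] - top[1] <= ell:
                for t in range(1, k + 1):
                    if T[i][t - 1]:
                        T[j + 1][t] = True
    return T[n][k]


def fcd_on_cycle(colors, k, ell, s_min, s_max):
    """Try every vertex as the start of the first district and cut the cycle there."""
    n = len(colors)
    return any(
        path_ok(colors[r:] + colors[:r], k, ell, s_min, s_max) for r in range(n)
    )
```

The outer loop over the n rotations accounts for the additional factor of n compared to the running time on paths.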

Next, we proceed to stars, for which we derive a precise characterization of yes-instances. In fact, using a relatively involved analysis, we consider a more general problem, in which all vertices of some set X containing the center vertex (i.e., the vertex that is adjacent to all other vertices of the star) must belong to the same district. This proves useful in speeding up the algorithms to be presented in Propositions 5 and 7.

Proposition 4

(\(\bigstar\)) Let \((G=(X\cup Y,E),C,\text {col},k,\ell ,s_{\textrm{min}},s_{\textrm{max}})\) be an FCD instance where G is a star with center vertex \(v\in X\). Then, one can decide in linear time whether there is a partition of \(X\cup Y\) into k connected \(\ell\)-fair districts respecting \(s_{\textrm{min}}\) and \(s_{\textrm{max}}\) such that all vertices from X are part of the same district.

Proposition 4 directly implies that FCD is linear-time solvable on stars, even if the center vertex has a weight for each color, i.e., the center vertex represents multiple vertices that all need to be put into one district.

Corollary 2

FCD is linear-time solvable on stars.

Lastly, we extend our polynomial-time algorithm for paths from Proposition 3 to caterpillar graphs. A caterpillar graph is a tree where every vertex is either on a central path (spine) or a neighbor of a vertex on the central path.

Proposition 5

FCD on caterpillars can be solved in \({\mathcal {O}}(k\cdot n^3 )\) time.

Proof

Let \(G=(V,E)\) be the given caterpillar, let \((u_1, \dots , u_p)\) denote the spine (central path) of G and let \(U=\{u_1,\dots , u_p\}\). Moreover, for \(i\in [p]\), let \(U_i\) consist of \(u_i\) and all vertices from \(V\setminus U\) adjacent to \(u_i\). Note that each vertex from \(V\setminus U\) is only adjacent to one vertex from U, as G is in particular a tree. We solve the problem by extending our algorithm for paths from Proposition 3.

As a first step, we introduce a table A[i, j, t] for \(i\le j\in [n]\) and \(t\in [k]\). An entry A[i, j, t] is set to true if there is a partition \((V_1, \dots , V_t)\) of \(\bigcup _{i'\in [i,j]} U_{i'}\) into t districts such that:

  • \(V_1\) contains \(u_{i'}\) for every \(i' \in [i, j]\), and

  • \(V_{t'}\) is \(\ell\)-fair, \(\vert V_{t'} \vert \in [s_{\textrm{min}}, s_{\textrm{max}}]\), and \(G[V_{t'}]\) is connected for every \(t' \in [t]\).

We can fill table A in \({\mathcal {O}}(k\cdot n^3)\) time using Proposition 4 for each A[i, j, t] (for this, it is necessary to slightly restructure \(G[\bigcup _{i'\in [i,j]} U_{i'}]\) as a star \(G'\) on \(\bigcup _{i'\in [i,j]} U_{i'}\) with \(u_i\) as the center vertex and \(X=\{u_i,\dots , u_j\}\)). Using A, we now apply dynamic programming. To this end, we introduce a table T[i, t] for \(i\in [n]\) and \(t\in [k]\). Entry T[i, t] is set to true if it is possible to partition the vertices from \(\bigcup _{j\in [i]} U_j\) into t connected districts that are \(\ell\)-fair and respect the size constraints. We initialize the table by setting T[i, 1] to true if \(\bigcup _{j\in [i]} U_j\) is \(\ell\)-fair. Subsequently, for \(t>1\), we update the table using:

$$T[i,t]=\bigvee _{\begin{array}{c} j\in [i-1] \wedge t',t''\in [t]: \\ t'+t''=t \end{array}} T[j,t'] \wedge A[j+1,i,t''].$$

The reasoning behind this equality is that in a partitioning of \(\bigcup _{j\in [i]} U_j\) into t districts, there needs to be some \(j\in [i-1]\) such that there is one district containing vertices \(\{u_{j+1},\dots , u_i\}\). However, not all vertices from \(\bigcup _{t\in [j+1,i]} U_t\) need to be part of this district. In fact, because of the connectivity constraints, there is some \(t''\) such that one district contains vertices \(\{u_{j+1},\dots , u_i\}\) and all but \(t''-1\) vertices from \(\bigcup _{t\in [j+1,i]} U_t\) and \(t''-1\) districts consist of a single vertex from \(\bigcup _{t\in [j+1,i]} U_t\) (such a partitioning respecting fairness and size constraints exists if and only if \(A[j+1,i,t'']\) is true). The remaining \(t-t''\) districts then consist of vertices from \(\bigcup _{t\in [j]} U_t\) (such a partitioning respecting fairness and size constraints exists if and only if \(T[j,t']\) is true). Table T has \(k\cdot n\) entries each computable in \({\mathcal {O}}(k \cdot n)\) time. This leads to an overall running time of \({\mathcal {O}}(k \cdot n^3)\). \(\square\)
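The running-time bound can be unpacked as follows. This is a sketch of the arithmetic, assuming \(k\le n\) (otherwise no partition into k non-empty districts exists) and constant-time lookups in A: each entry of T ranges over at most n choices of j and at most k choices of \(t'\) (which determines \(t''\)).

$$\underbrace{k\cdot n}_{\text {entries of } T}\cdot \underbrace{{\mathcal {O}}(k\cdot n)}_{\text {choices of } (j,t',t'')} = {\mathcal {O}}(k^2\cdot n^2)\subseteq {\mathcal {O}}(k\cdot n^3).$$

Hence, filling T is dominated by the \({\mathcal {O}}(k\cdot n^3)\) time needed to fill A.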

Note that a graph G has pathwidth one (\({{\,\textrm{pw}\,}}(G) = 1\)) if and only if G is a disjoint union of caterpillars. By Proposition 2 and Proposition 5, it follows that FCD is polynomial-time solvable on all graphs with pathwidth one (we later show in Corollary 4 that FCD is NP-hard on pathwidth-two graphs).

Corollary 3

FCD on a graph G with \({{\,\textrm{pw}\,}}(G) = 1\) is polynomial-time solvable.

4.2 An XP-algorithm for \({{\,\textrm{mln}\,}}\)

Now, we generalize the polynomial-time solvability on paths and cycles to a larger graph class. More precisely, we develop an XP-algorithm for the max leaf number (\({{\,\textrm{mln}\,}}\)). Recall that the max leaf number of a connected graph G is the maximum number of leaves over all spanning trees of G. Notably, any path or cycle has \({{\,\textrm{mln}\,}}= 2\). In order to develop a polynomial-time algorithm for constant \({{\,\textrm{mln}\,}}\), we use the notion of branches. (See Fig. 2 for an illustration.)

Fig. 2 An illustration of a graph with four branches: \((v_1, v_2, v_3), (v_3, v_5), (v_3, v_4, v_6, v_5), (v_5, v_7, v_8)\)

Definition 1

A branch in a graph is either a maximal path in which all inner vertices have degree two or a cycle in which all but one vertex have degree two.

Using a classical theorem of Kleitman and West [31], Eppstein [22] showed that any graph has at most \({\mathcal {O}}({{\,\textrm{mln}\,}}^2)\) branches. For notational brevity, we assume that every branch B that is a cycle has exactly one endpoint, namely, the vertex in B with degree at least three in G (if there is no such vertex in B, then we fix an arbitrary vertex as its endpoint).
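Branches can be extracted by walking from every vertex of degree other than two through chains of degree-two vertices. The following Python sketch is an illustration under assumptions: the function name is hypothetical, the graph is assumed to be simple, and components that are plain cycles (no vertex of degree other than two) are not handled. The example edge list is inferred from the branches listed in the caption of Fig. 2, not read off the figure itself.

```python
def find_branches(edges):
    """Return the branches of a simple graph given as a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    # Branch endpoints: all vertices whose degree differs from two.
    ends = [v for v in adj if len(adj[v]) != 2]
    seen = set()  # directed first steps of branches that were already reported
    result = []
    for start in ends:
        for first in adj[start]:
            if (start, first) in seen:
                continue
            path = [start, first]
            prev, cur = start, first
            # Follow the chain of degree-two vertices until the next endpoint.
            while len(adj[cur]) == 2:
                nxt = adj[cur][0] if adj[cur][0] != prev else adj[cur][1]
                path.append(nxt)
                prev, cur = cur, nxt
            seen.add((start, first))
            seen.add((cur, prev))  # do not report the same branch backwards
            result.append(tuple(path))
    return result


# Edge list inferred from the caption of Fig. 2 (vertex v_i is written as i).
fig2_edges = [(1, 2), (2, 3), (3, 5), (3, 4), (4, 6), (6, 5), (5, 7), (7, 8)]
fig2_branches = find_branches(fig2_edges)
```

A cycle attached to the rest of the graph at a single vertex is reported once as a closed walk starting and ending at that vertex, matching the convention that such a branch has exactly one endpoint.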

Theorem 1

FCD can be solved in \(n^{{\mathcal {O}}({{\,\textrm{mln}\,}}^2)}\) time.

Proof

Let \({\mathcal {B}}\) be the set of all branches of the given graph G and let X be the set of all endpoints of all branches. Suppose that there is a solution \({\mathcal {V}} = (V_1, \dots , V_k)\). Observe that there are naturally at most \(\vert X \vert\) subsets in \({\mathcal {V}}\) containing at least one vertex from X. Without loss of generality, assume that \(V_i\) contains at least one vertex from X for \(i \in [k']\) for some \(k'\in [\min (k, \vert X \vert )]\), and let \(X_i = X \cap V_i\). Let us now consider the relationship between a district \(V_i\) for \(i\in [k']\) and a branch \(B = (v_1, \dots , v_l) \in {\mathcal {B}}\) (where for all \(i\in [l-1]\), \(v_i\) and \(v_{i+1}\) are adjacent in G and if B is a cycle, then \(v_1 = v_l\)). Since \(V_i\) induces a connected subgraph in G, the following holds:

  • If \(v_1, v_l \notin X_i\), then \(V_i\) and B are disjoint: Since \(V_i\) is connected and contains at least one vertex from \(X {\setminus } \{v_1,v_l \}\), \(V_i\) contains no “inner” vertices from B.

  • If \(v_1 \in X_i\) and \(v_l \notin X_i\), then there is an integer \(j \in [l - 1]\) such that \(V_i \cap \{ v_1, \dots , v_l \} = \{ v_1, \dots , v_j \}\) (for \(v_1 \notin X_i\) and \(v_l \in X_i\), the situation is symmetric).

  • If \(v_1, v_l \in X_i\) for some \(i\in [k]\), then there are two integers \(j, j' \in [l]\) such that \(V_i \cap \{ v_1, \dots , v_l \} = \{ v_1, \dots , v_j \} \cup \{v_{l - j' + 1}, \dots , v_l \}\). Note that \(V_i\) contains all vertices of B when \(j + j' = l\).

For all other districts \(V_i\), \(i\in [k'+1,k]\), it holds that there is a branch \((v_1, \dots , v_l) \in {\mathcal {B}}\) with \(V_i\subseteq \{v_2,\dots , v_{l-1}\}\).

Using these observations, our algorithm proceeds as follows. We iterate over all possible combinations of the following (note that the number of combinations is \(n^{{\mathcal {O}}( \vert {\mathcal {B}} \vert )} = n^{{\mathcal {O}}({{\,\textrm{mln}\,}}^2)}\)):

  • An integer \(k' \in [\min (k, \vert X \vert )]\).

  • A partition \(\mathcal {X} = (X_1, \dots , X_{k'})\) of X into \(k'\) subsets.

  • For every branch \(B = (v_1, \dots , v_l)\in {\mathcal {B}}\), two integers \(j_B, j_B' \in [l]\) with \(j_B+j_B' \le l\).

Using these guesses, we can exactly determine the districts intersecting X: For every \(i \in [k']\) and every branch \(B =(v_1,\dots ,v_l)\in {\mathcal {B}}\), we define \(V_{i, B}\) as follows:

  • If \(v_1, v_l \notin X_i\), then \(V_{i, B}:= \emptyset\).

  • If \(v_1 \in X_i\) and \(v_{l} \notin X_i\), then \(V_{i, B}:=\{v_1,\dots , v_{j_B}\}\). Symmetrically, if \(v_1 \notin X_i\) and \(v_{l} \in X_i\), then \(V_{i,B}:=\{v_{l-j'_B+1},\dots , v_{l}\}\).

  • If \(v_1, v_l \in X_i\), then \(V_{i, B}:=\{v_1,\dots , v_{j_B}\}\cup \{v_{l-j'_B+1},\dots , v_{l}\}\).

For \(i\in [k']\), let \(V_i:= \bigcup _{B \in {\mathcal {B}}} V_{i, B}\). We check whether the set \(V_i\) is \(\ell\)-fair and respects the size constraints for every \(i \in [k']\). If there is a \(V_i\) that is not \(\ell\)-fair or violates the size constraints, then we proceed to the next combination. Otherwise, it remains to determine whether the vertices \(V \setminus \bigcup _{i \in [k']} V_i\) can be partitioned into \(k - k'\) \(\ell\)-fair districts that respect the size constraints. Since \(G[V \setminus \bigcup _{i \in [k']} V_i]\) is a disjoint union of paths as all endpoints of branches are contained in \(\bigcup _{i \in [k']} V_i,\) this can be done in polynomial time by Propositions 2 and 3. \(\square\)
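The bound on the number of combinations can be verified with a rough count. The following is a sketch under the assumptions that every branch has at most two endpoints, so that \(\vert X \vert \le \min (n, 2 \vert {\mathcal {B}} \vert )\), and that every branch \((v_1,\dots ,v_l)\) satisfies \(l\le n+1\). There are at most \(\vert X \vert\) choices for \(k'\), at most \(\vert X \vert ^{\vert X \vert }\) partitions of X, and at most \(l^2\) pairs \((j_B, j_B')\) per branch:

$$\vert X \vert \cdot \vert X \vert ^{\vert X \vert } \cdot \prod _{B\in {\mathcal {B}}} (n+1)^2 \le n\cdot n^{2 \vert {\mathcal {B}} \vert }\cdot (n+1)^{2 \vert {\mathcal {B}} \vert } = n^{{\mathcal {O}}( \vert {\mathcal {B}} \vert )}.$$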

We leave it open whether FCD parameterized by \({{\,\textrm{mln}\,}}\) is fixed-parameter tractable or W[1]-hard.

5 FCD on trees and tree-like graphs

After having seen in Sect. 4 that FCD is polynomial-time solvable on paths, cycles, stars, and caterpillars, we now turn to trees. In Sect. 5.1, we prove that these polynomial-time results do not extend to trees. In particular, we prove that FCD on trees is NP-hard and W[1]-hard parameterized by \(\vert C \vert +k\). We complement these hardness results with an XP-algorithm for FCD on trees parameterized by \(\vert C \vert\) and another XP-algorithm for parameter k. In fact, both XP-algorithms further extend to tree-like graphs: In Sect. 5.2, we prove that FCD parameterized by the treewidth of the given graph plus \(\vert C \vert\) is in XP. Subsequently, in Sects. 5.3 and 5.4, we consider the feedback edge number (\({{\,\textrm{fen}\,}}\)) and the feedback vertex number (\({{\,\textrm{fvn}\,}}\)), which are both alternative measures for the tree-likeness of a graph (the treewidth of a graph can be upper bounded in a function of \({{\,\textrm{fen}\,}}\) and in a function of \({{\,\textrm{fvn}\,}}\)). Thus, the XP-algorithm for \({{\,\textrm{tw}\,}}+~ \vert C \vert\) extends to the parameter combinations \({{\,\textrm{fvn}\,}}+~ \vert C \vert\) and \({{\,\textrm{fen}\,}}+~ \vert C \vert\), which is why we focus on the parameter combinations \({{\,\textrm{fvn}\,}}+~k\) and \({{\,\textrm{fen}\,}}+~k\) here. We prove that FCD parameterized by \({{\,\textrm{fen}\,}}+~k\) is in XP and we show that there is (presumably) no such result for the treewidth and the feedback vertex number by showing that FCD is NP-hard even if \({{\,\textrm{fvn}\,}}=1\), \({{\,\textrm{tw}\,}}=2\), and \(k=2\).

5.1 W[1]-hardness on trees

In this subsection, we show that in contrast to paths, cycles, stars, and caterpillars, FCD on trees is NP-hard even without size constraints. Simultaneously, we show that FCD parameterized by the number k of districts and the number \(\vert C \vert\) of colors is W[1]-hard on trees. To this end, we present a parameterized reduction from the following version of Grid Tiling which is NP-hard and W[1]-hard with respect to t [41] (in this subsection, all indices are taken modulo t):

[Problem definition of Grid Tiling, given as a figure in the source]

Without loss of generality, we assume that \(n>2\).

The general idea of the reduction is as follows. Each solution to the constructed FCD instance has a center district. The center district contains some large number Z of vertices of two “dummy” colors c and \(c'\). The instance is constructed such that in every solution, vertices of any fixed color can appear at most Z times in the center district. By definition, each tile \((x, y)\) belongs to one of \(t^2\) tile sets \(S^{i, j} \in {\mathcal {S}}\). We construct a star \(T_{x, y}^{i, j}\) for each tile such that for each tile set \(S^{i, j} \in {\mathcal {S}}\), all but exactly one star \(T_{x, y}^{i, j}\) need to be contained in the center district. Thus, the center district (respectively its complement) basically encodes a selection of one tile from each tile set. Moreover, we construct the FCD instance in such a way that for two stars from two “adjacent” tile sets the respective first or second entries of the tiles need to match; otherwise the number of vertices of some color in the center district will exceed Z.

5.1.1 Construction

Let \(({\mathcal {S}},t,m,n,X,Y)\) be an instance of \(\textsc {Grid Tiling}\). We construct an instance of FCD as follows.

First of all, we set \(\ell =0\), \(s_{\textrm{min}}= 1\), \(s_{\textrm{max}}= \infty\), and \(k=t^2+1\). For each \(i,j\in [t]\), we introduce three distinct colors \(b_{i,j}\), \(d_{i,j}\), and \(c_{i,j}\). Moreover, we introduce three distinct colors c, \(c'\), and \(c^{\star }\). Now, we fix some constants which we use later. Let \(W:=5n(t^2+t)+1\), \(Z:=2(n-1)\cdot 5\,m W\), \(f(i,j):=i\cdot t+j\), and \(g(i,j):=t^2+t+i\cdot t+j\). Note that this implies that \(W\ge 2.5\cdot n \cdot g(i,j)\) and \(W\ge 2.5\cdot n \cdot f(i,j)\) for all \(i,j\in [t]\).

We are now ready to construct the vertex-colored graph \(G=(V,E)\). We start by introducing a center vertex \(v_{\textrm{center}}\) of color \(c^{\star }\). In a solution, we call the district containing \(v_{\textrm{center}}\) the center district. We add Z vertices of color c and Z vertices of color \(c'\), all of which are only adjacent to \(v_{\textrm{center}}\).

For each \(i,j\in [t]\) and \((x,y)\in S^{i,j}\), we construct a star \(T^{i, j}_{x, y}\) and connect its center to \(v_{\textrm{center}}\). We color the center of \(T^{i, j}_{x, y}\) in \(c^{\star }\). Moreover, \(T^{i, j}_{x, y}\) has the following leaves:

  • \(\frac{Z}{2(n-1)}+W\cdot x -f(i,j)\) vertices of color \(d_{i,j}\),

  • \(\frac{Z}{2(n-1)}-W\cdot x -f(i',j)\) vertices of color \(d_{i',j}\) for \(i':= i+1 \bmod t\),

  • \(\frac{Z}{2(n-1)}+W\cdot y -g(i,j)\) vertices of color \(b_{i,j}\),

  • \(\frac{Z}{2(n-1)}-W\cdot y -g(i,j')\) vertices of color \(b_{i,j'}\) for \(j':= j + 1 \bmod t\), and

  • \(\max (\frac{Z}{2(n-1)}+W\cdot x-f(i,j),\frac{Z}{2(n-1)}+W\cdot y-g(i,j))\) vertices of color \(c_{i,j}\).

Observe that the constructed star \(T_{x,y}^{i,j}\) is 0-fair, as the number of occurrences of \(c_{i,j}\) matches the number of occurrences of the otherwise most frequent color. This concludes the construction.
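The leaf counts of a single star can be computed and checked for 0-fairness with a short script. This is a hedged sketch, not part of the original construction: the function names and the tuple encoding of colors are hypothetical, indices i and j are taken 0-based from \(\{0,\dots ,t-1\}\) so that the modulo-t convention of this subsection stays in range, \(t\ge 2\) is assumed so that the five leaf colors are distinct, and tile entries are assumed to lie in \(\{1,\dots ,m\}\).

```python
from collections import Counter


def star_color_counts(i, j, x, y, t, n, m):
    """Color counts of the star for tile (x, y) in tile set S^{i,j} (0-based i, j)."""
    W = 5 * n * (t * t + t) + 1
    half = 5 * m * W  # equals Z / (2(n-1)) for Z = 2(n-1) * 5mW

    def f(a, b):
        return a * t + b

    def g(a, b):
        return t * t + t + a * t + b

    i_next, j_next = (i + 1) % t, (j + 1) % t
    counts = Counter()
    counts[("d", i, j)] += half + W * x - f(i, j)
    counts[("d", i_next, j)] += half - W * x - f(i_next, j)
    counts[("b", i, j)] += half + W * y - g(i, j)
    counts[("b", i, j_next)] += half - W * y - g(i, j_next)
    # Color c_{i,j} ties with the otherwise most frequent color.
    counts[("c", i, j)] += max(half + W * x - f(i, j), half + W * y - g(i, j))
    counts["c_star"] += 1  # the center of the star
    return counts


def is_zero_fair(counts):
    """0-fair is taken to mean that the two most frequent colors are tied."""
    top = sorted(counts.values(), reverse=True) + [0]
    return top[0] == top[1]
```

For example, `is_zero_fair(star_color_counts(0, 1, 2, 3, t=2, n=3, m=3))` checks one star of a small hypothetical instance.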

5.1.2 Proof of correctness

We start with a simple observation on the total number of vertices of some of the colors:

Observation 1

For each \(i,j\in [t]\), it holds that \(\chi _{d_{i,j}}(V)= \frac{n}{n - 1} Z-2nf(i,j)\) and \(\chi _{b_{i,j}}(V)= \frac{n}{n - 1} Z-2ng(i,j)\). For all \(i,j\in [t]\), it holds that \(\chi _{c_{i,j}}(V)\le Z\).

Proof

For each \(i,j\in [t]\), vertices of color \(d_{i,j}\) occur in 2n different stars, that is, in all stars corresponding to tiles from \(S^{i,j}\) and \(S^{i-1,j}\). As the first entries of all tiles from \(S^{i,j}\) and of all tiles from \(S^{i-1,j}\) sum up to X, it follows that

$$\begin{aligned} \chi _{d_{i,j}}(V)&= \sum _{(x, y) \in S^{i, j} } \frac{Z}{2(n - 1)} + W x - f(i, j) + \sum _{(x, y) \in S^{i - 1, j}} \frac{Z}{2(n - 1)} - Wx - f(i, j) \\&= \left( \frac{n}{2(n - 1)} Z + WX - n f(i, j)\right) + \left( \frac{n}{2(n - 1)} Z - WX - n f(i, j)\right) \\&= \frac{n}{n - 1} Z-2nf(i,j). \end{aligned}$$

The same reasoning applies for all colors \(b_{i,j}\) proving that \(\chi _{b_{i,j}}(V)= \frac{n}{n - 1} Z-2ng(i,j)\). Lastly, for some \(i,j\in [t]\), vertices of color \(c_{i,j}\) appear only in stars corresponding to tiles from \(S^{i,j}\), each of them containing at most \(\frac{Z}{2(n-1)}+W\cdot m\) such vertices. Thus, the number of vertices of color \(c_{i,j}\) is upper-bounded by \(n\cdot (\frac{Z}{2(n-1)}+W\cdot m)\). As \(W\cdot m= \frac{1}{10(n-1)}Z\) and \(n>2\), this is smaller than Z from which \(\chi _{c_{i,j}}(V)\le Z\) follows. \(\square\)
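The bound for color \(c_{i,j}\) can be spelled out as a worked computation. This is a sketch that relies only on the definition \(Z=2(n-1)\cdot 5\,m W\), which gives \(W\cdot m=\frac{Z}{10(n-1)}\), and on \(n\ge 3\):

$$n\cdot \left( \frac{Z}{2(n-1)}+W\cdot m\right) = n\cdot \left( \frac{Z}{2(n-1)}+\frac{Z}{10(n-1)}\right) = \frac{3n}{5(n-1)}\, Z \le \frac{9}{10}\, Z < Z,$$

where the inequality uses that \(\frac{n}{n-1}\le \frac{3}{2}\) for \(n\ge 3\).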

Using Observation 1, we now prove the forward direction of the correctness of the construction.

Lemma 1

If the given Grid Tiling instance is a yes-instance, then the constructed FCD instance is a yes-instance.

Proof

Let \(S=\{(x^{i,j},y^{i,j})\in S^{i,j} \mid i,j \in [t]\}\) be a solution for the given Grid Tiling instance. From this we construct a solution of the FCD instance as follows. For each \((x^{i,j},y^{i,j})\in S\), we create a separate district and put into this district all vertices from the star \(T_{x,y}^{i,j}\) corresponding to \((x^{i,j},y^{i,j})\). We put all other vertices in the center district. Note that this construction respects the given number \(k=t^2+1\) of districts and that all districts are by construction non-empty and connected.

It remains to argue that all created districts are 0-fair. Since every star is 0-fair by construction, all non-center districts are 0-fair. For the center district, note that it contains Z vertices of color c and Z vertices of color \(c'\). Thus, it is sufficient to argue that there is no color such that the number of its occurrences in the center district exceeds Z. For color \(c_{i,j}\) this directly follows from Observation 1. Next we consider color \(d_{i,j}\) for some \(i,j\in [t]\). All stars with some vertices of color \(d_{i, j}\) are part of the center district, except for the one corresponding to \((x^{i,j},y^{i,j})\) and the one corresponding to \((x^{i-1,j},y^{i-1,j})\). Since S is a solution, we have \(x^{i-1,j}=x^{i,j}\). Consequently, exactly \(\frac{Z}{n-1}-2f(i,j)\) vertices of color \(d_{i,j}\) are excluded from the center district. Now it follows from Observation 1 that the number of vertices of color \(d_{i, j}\) in the center district is at most Z. An analogous argument also holds for \(b_{i,j}\) for \(i,j\in [t]\), which concludes the proof. \(\square\)
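The count for color \(d_{i,j}\) in the center district can be written out explicitly. As a sketch, subtracting the excluded vertices from the total given in Observation 1 yields:

$$\chi _{d_{i,j}}(V_{\textrm{center}}) = \left( \frac{n}{n-1} Z-2nf(i,j)\right) - \left( \frac{Z}{n-1}-2f(i,j)\right) = Z-2(n-1)f(i,j) \le Z,$$

where the last step uses \(f(i,j)\ge 0\).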

It remains to prove the correctness of the backward direction. To do this, we first observe that for every star, all of its vertices have to belong to the same district. Subsequently, we prove that the center district can contain at most Z vertices of each color. We then show that in order to respect this bound, for each tile set, exactly one star corresponding to a tile from this set must be excluded from the center district. Again exploiting the fact that every color has at most Z occurrences in the center district, we show that those excluded tiles form indeed a solution to the given Grid Tiling instance. We start by observing that the vertices from one star need to be part of the same district.

Observation 2

For each tile \((x,y)\in {\mathcal {S}}\), all vertices from the corresponding star need to be part of the same district in a solution to the constructed FCD instance.

Proof

As we set \(\ell =0\) in the constructed FCD instance, there cannot exist a district containing just a single vertex. Thus, all vertices from a star need to belong to the same district as the center of the star. \(\square\)

Using this observation, we make some more involved arguments dealing with the possible number of vertices of a color in the center district in a solution. In the following, let \(V_{\textrm{center}}\) denote the center district in a solution to the constructed FCD instance. The next two lemmas serve to ensure that no two different colors appear the same number of times, and more than Z times, in the center district.

Lemma 2

For each \(i, j \in [t]\), if \(\chi _{d_{i, j}}(V_{\textrm{center}}) \ge Z\), then the number α of stars \(T^{i, j}_{x, y}\) with some vertex of color \(d_{i,j}\) that are not part of the center district is at most two. In particular, it holds that \(\chi _{d_{i, j}}(V_{\textrm{center}}) = (\frac{n}{n-1} Z- 2n f(i, j)) - \alpha (\frac{Z}{2(n-1)} - f(i,j)) + W q\) for some \(q \in [-2\,m, 2\,m]\).

Proof

Let us consider some \(i,j\in [t]\). By Observation 2, for every star \(T_{x, y}^{i, j}\), all the vertices from \(T_{x, y}^{i, j}\) are in \(V_{\textrm{center}}\) or no vertex from \(T_{x, y}^{i, j}\) is in \(V_{\textrm{center}}\).

By construction, every star with some vertex of color \(d_{i,j}\) contains \(\frac{Z}{2(n-1)}+W\cdot x-f(i,j)\) vertices of this color for some \(x \in [-m, m]\). So as there are \(\frac{n}{n-1}Z-2nf(i,j)\) vertices of color \(d_{i,j}\) (Observation 1), if \( \alpha \ge 3\), then the number of vertices of color \(d_{i,j}\) in the center district is at most \(Z-\frac{Z}{2(n-1)}+3mW\). By the definition of Z this is smaller than Z.

Thus, we obtain \( \alpha\le 2\). Since there are in total \(\frac{n}{n-1} Z- 2n f(i, j)\) vertices of color \(d_{i, j}\) by Observation 1, the lemma holds. \(\square\)
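The case \(\alpha \ge 3\) can be made explicit. The following is a sketch, assuming that every star with vertices of color \(d_{i,j}\) contains at least \(\frac{Z}{2(n-1)}-mW-f(i,j)>0\) of them, so that excluding three such stars removes at least three times this amount from the total given in Observation 1:

$$\begin{aligned} \chi _{d_{i,j}}(V_{\textrm{center}})&\le \frac{n}{n-1} Z-2nf(i,j) - 3\left( \frac{Z}{2(n-1)}-mW-f(i,j)\right) \\&= Z-\frac{Z}{2(n-1)}+3mW-(2n-3)f(i,j) \\&\le Z-5mW+3mW = Z-2mW < Z, \end{aligned}$$

using \(\frac{Z}{2(n-1)}=5mW\) by the definition of Z.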

Analogously, we can prove similar bounds for the number of vertices of color \(b_{i,j}\) for some \(i,j\in [t]\) in the center district.

Lemma 3

For each \(i, j \in [t]\), if \(\chi _{b_{i, j}}(V_{\textrm{center}}) \ge Z\), then the number α of stars \(T^{i, j}_{x, y}\) with some vertex of color \(b_{i,j}\) that are not part of the center district is at most two. In particular, it holds that \(\chi _{b_{i, j}}(V_{\textrm{center}}) = (\frac{n}{n-1} Z - 2n g(i, j)) - \alpha(\frac{Z}{2(n-1)} - g(i,j)) + W v\) for some \(v \in [-2\,m, 2\,m]\).

Using these two lemmas, we can now prove that there are at most Z vertices of the same color in the center district in a solution to the constructed FCD instance.

Lemma 4

For every color q, \(\chi _q(V_{\textrm{center}}) \le Z\).

Proof

We claim that for two distinct colors \(q, q'\) with \(\chi _q(V_{{\text {center}}}), \chi _{q'}(V_{{\text {center}}}) > Z\), it holds that \(\chi _q(V_{{\text {center}}}) \ne \chi _{q'}(V_{{\text {center}}})\). The statement of the lemma directly follows from the claim, as \(V_{\textrm{center}}\) needs to be 0-fair.

Assume for a contradiction that \(\chi _q(V_{{\text {center}}}) = \chi _{q'}(V_{\text {center}}) > Z\) for some colors \(q \ne q'\). Since there are at most Z vertices of color \(c,c',c^{\star },c_{i,j}\) for \(i,j\in [t]\), q and \(q'\) are among the colors \(b_{i, j}\) and \(d_{i, j}\) for \(i, j \in [t]\). By Lemmas 2 and 3 and as it holds that \(\chi _q(V_{\text {center}}) = \chi _{q'}(V_{\text {center}})\), from this it follows that \(2n h(i, j) + a\big (\frac{Z}{2(n - 1)} - h(i, j)\big ) + Wv = 2n h'(i', j') + a'\big (\frac{Z}{2(n - 1)} - h'(i', j')\big ) + Wv'\), for some \(a, a' \in \{ 0, 1, 2 \}\), \(v, v' \in [-2\,m, 2\,m]\), \(h(i, j) \in \{ f(i, j), g(i, j) \}\), and \(h'(i', j') \in \{ f(i', j'), g(i', j') \}\). Rewriting yields that \((5\,m(a - a') + v - v') W = (2n - a') h'(i', j') - (2n - a) h(i, j)\). Since W is sufficiently large, the absolute value of the right-hand side is strictly smaller than W. Thus, we have \(5\,m(a - a') + v - v' = 0\) and \((2n - a) h(i, j) = (2n - a') h'(i', j')\). The first equation implies \(a = a'\) since \(v - v' \in [-4\,m, 4\,m]\). Cancelling out \(2n - a > 0\) in the second equation, we obtain \(h(i, j) = h'(i', j')\). Now we have \(q = q'\), as \(f({\tilde{i}},{\tilde{j}})\ne g({\tilde{i}}',{\tilde{j}}')\) for all \({\tilde{i}},{\tilde{j}},{\tilde{i}}',{\tilde{j}}'\in [t]\), \(f({\tilde{i}},{\tilde{j}})= f({\tilde{i}}',{\tilde{j}}')\) only if \({\tilde{i}}={\tilde{i}}'\) and \({\tilde{j}}={\tilde{j}}'\), and \(g({\tilde{i}},{\tilde{j}})= g({\tilde{i}}',{\tilde{j}}')\) only if \({\tilde{i}}={\tilde{i}}'\) and \({\tilde{j}}={\tilde{j}}'\). We have reached a contradiction. \(\square\)
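The claim that W is large enough can be checked directly. As a sketch, assuming \(i,j\in [t]\) so that \(0\le h(i,j)\le g(i,j)\le 2(t^2+t)\), and noting that both terms on the right-hand side are non-negative:

$$\big \vert (2n - a') h'(i', j') - (2n - a) h(i, j) \big \vert \le 2n\cdot \max \{h(i,j), h'(i',j')\} \le 4n(t^2+t) < 5n(t^2+t)+1 = W.$$

Since the left-hand side of the rewritten equation is an integer multiple of W, both sides must be zero.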

Using Lemma 4, we can prove that each solution to the constructed FCD instance induces a selection of one tile from each tile set in the given Grid Tiling instance:

Lemma 5

For each \(i,j\in [t]\), exactly one star \(T_{x, y}^{i, j}\) for some \((x,y)\in S^{i,j}\) is not part of \(V_{\textrm{center}}\).

Proof

Lemma 2 and Lemma 4 imply that for each \(i,j\in [t]\) at least two stars containing vertices of color \(d_{i,j}\) are not part of \(V_{\textrm{center}}\). As each star \(T_{x, y}^{i, j}\) contains vertices of colors \(d_{i,j}\) and \(d_{i+1,j}\) and as there exist \(t^2\) non-center districts in the end, this means that for each \(i,j\in [t]\) exactly two stars containing vertices of this color are not part of \(V_{\textrm{center}}\).

For the sake of contradiction, let us assume that there exists \(i,j\in [t]\) such that two stars corresponding to tiles \((x, y), (x', y') \in S^{i,j}\) are not part of \(V_{\textrm{center}}\). This implies by our previous observation that all other stars containing vertices of color \(d_{i+1,j}\) need to belong to \(V_{\textrm{center}}\). However, in this case, \(\frac{Z}{n-1} - W(x + x') - 2 f(i, j) \le \frac{Z}{n - 1} - 2W\) vertices of color \(d_{i+1,j}\) are not part of \(V_{\textrm{center}}\). By Observation 1, this implies that the number of vertices of color \(d_{i+1,j}\) in \(V_{\textrm{center}}\) is at least \(Z+2W-2nf(i,j)\). As \(W>n(t^2+t)\), this number is greater than Z, contradicting Lemma 4. \(\square\)

Putting all pieces together, we are now ready to prove the correctness of the backward direction of the construction.

Lemma 6

If the constructed FCD instance is a yes-instance, then the given Grid Tiling instance is a yes-instance.

Proof

Let \({\mathcal {V}}\) be a solution to the constructed FCD instance and let \(V_{\textrm{center}}\in {\mathcal {V}}\) be the district containing \(v_{\textrm{center}}\). Lemma 5 implies that for each \(i,j\in [t]\) exactly one star corresponding to a tile from \(S^{i,j}\) is not part of \(V_{\textrm{center}}\). Let

$$S:=\{(x^{i,j},y^{i,j})\in S^{i,j} \mid i,j \in [t]\wedge T^{i,j}_{x,y}\text { is not part of } V_{\textrm{center}} \}$$

be a set of all tiles corresponding to excluded stars (one for each \(i,j\in [t]\)). We claim that S is a valid solution to the given Grid Tiling instance.

For the sake of contradiction, assume that this is not the case. That is, there either (a) exist \(i,j\in [t]\) such that \(x^{i,j}\ne x^{i-1,j}\) or (b) exist \(i',j'\in [t]\) such that \(y^{i',j'}\ne y^{i',j'-1}\). Let us start by assuming that (a) is the case. Note that it is possible to assume without loss of generality that \(x^{i,j}<x^{i-1,j}\), as from the fact that there exists some \(x^{i,j}\ne x^{i-1,j}\) it follows that there also exists some \({\tilde{i}}\in [t]\) such that \(x^{{\tilde{i}},j}<x^{{\tilde{i}}-1,j}\). The only stars containing vertices of color \(d_{i,j}\) that are not part of \(V_{\textrm{center}}\) are the star corresponding to \((x^{i,j},y^{i,j})\), which contains \(\frac{Z}{2(n-1)}+W\cdot x^{i,j}-f(i,j)\) vertices of this color, and the star corresponding to \((x^{i-1,j},y^{i-1,j})\), which contains \(\frac{Z}{2(n-1)}-W\cdot x^{i-1,j}-f(i,j)\) vertices of this color. This implies that at most \(\frac{Z}{n-1}+W\cdot (x^{i,j}-x^{i-1,j})-2f(i,j)\le \frac{Z}{n-1}-W-2f(i,j)\) vertices of this color are not part of \(V_{\textrm{center}}\), where the inequality holds by our assumption that \(x^{i,j}<x^{i-1,j}\). Combining this with Observation 1, it follows that the number of vertices of color \(d_{i,j}\) in \(V_{\textrm{center}}\) is at least \(Z+W-2(n-1)f(i,j)\). By the definition of W, this is strictly greater than Z. Applying Lemma 4, we reach a contradiction.
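The final estimate in case (a) can be written out as follows. This is a sketch, assuming \(f(i,j)\le t^2+t\) for \(i,j\in [t]\), and combining the total from Observation 1 with the bound on the excluded vertices:

$$\chi _{d_{i,j}}(V_{\textrm{center}}) \ge \left( \frac{n}{n-1} Z-2nf(i,j)\right) - \left( \frac{Z}{n-1}-W-2f(i,j)\right) = Z+W-2(n-1)f(i,j) > Z,$$

where the strict inequality holds because \(W=5n(t^2+t)+1>2(n-1)(t^2+t)\ge 2(n-1)f(i,j)\).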

The same argument can also be applied if (b) holds, which proves that S is a solution to the given Grid Tiling instance. \(\square\)

From Lemma 1 and Lemma 6 the correctness of the reduction follows. As our construction takes only polynomial time and \(\vert C \vert +k=4t^2+4\) is bounded in a function of t, the NP-hardness and the W[1]-hardness with respect to \(\vert C \vert +k\) of FCD on trees follow.

Theorem 2

FCD on trees is NP-hard and W[1]-hard with respect to \(\vert C \vert +k\), even if \(s_{\textrm{min}}=1\) and \(s_{\textrm{max}}=\infty\).

Recall that by Corollary 3, FCD can be solved in polynomial time on all graphs with pathwidth one (disjoint unions of caterpillars). Observe that the tree constructed in the reduction above has pathwidth two: Graph \(G'\) obtained from G by deleting \(v_{\textrm{center}}\) is a disjoint union of stars. Thus, \(G'\) admits a path decomposition of width one. Placing \(v_{\textrm{center}}\) into every bag yields a path decomposition of G of width two. This results in the following.

Corollary 4

FCD on graphs G with \({{\,\textrm{pw}\,}}(G) = 2\) is NP-hard and W[1]-hard with respect to \(\vert C \vert +k\), even if \(s_{\textrm{min}}=1\) and \(s_{\textrm{max}}=\infty\).

To be more precise, we even get NP-hardness and W[1]-hardness with respect to \(\vert C \vert +k\) on a pathwidth one graph to which we add a single vertex and adjacent edges. Notably, this result is tight in the sense that we have proved polynomial-time solvability on pathwidth-one graphs in Corollary 3.

As the tree constructed in the previous reduction has diameter four (it consists of a center vertex and centers of stars attached to it), we conclude that FCD is computationally intractable even on trees with a small constant diameter:

Corollary 5

FCD is NP-hard and W[1]-hard with respect to \(\vert C \vert +k\) on trees with diameter four, even if \(s_{\textrm{min}}=1\) and \(s_{\textrm{max}}=\infty\).

We will complement this result in Corollary 8 where we show that FCD is polynomial-time solvable on trees with diameter at most three.

5.2 An XP-algorithm for \(\mathbf {tw + \vert C \vert }\)

Motivated by the hardness result from the previous subsection, we search for an XP-algorithm for the parameters \(\vert C \vert\) and k for FCD on trees. We start by considering the parameter \(\vert C \vert\) here and the parameter k in the next two subsections. Specifically, we show that there exists an XP-algorithm for \(\vert C \vert\) on tree-like graphs—more precisely, we present an XP-algorithm with respect to \(\vert C \vert +{{\,\textrm{tw}\,}}\), where \({{\,\textrm{tw}\,}}\) is the treewidth of the underlying graph.

Given a graph \(G=(V,E)\), a tree decomposition \((T = (V_T, E_T), \{B_x \}_{x \in V_T})\) of G (as defined in the Preliminaries) is nice if each node \(x\in V_T\) has one of the following types (Footnote 5):

  • Leaf node: A leaf of T with \(B_x = \{ v \}\) for \(v \in V\).

  • Introduce vertex v node: An internal node of T with one child \(y\in V_T\) such that \(B_x = B_y \cup \{ v \}\).

  • Introduce edge \(\{ u, v \}\) node: An internal node of T with one child \(y\in V_T\) such that \(u, v \in B_x = B_y\).

  • Forget v node: An internal node of T with one child \(y\in V_T\) such that \(B_x = B_y {\setminus } \{ v \}\) for \(v \in B_y\).

  • Join node: An internal node of T with two children \(y\in V_T\) and \(z\in V_T\) such that \(B_x = B_y = B_z\).

We will implicitly assume that every introduce edge \(\{ u, v \}\) node is labeled by \(\{ u, v \}\) and that for each edge there is exactly one such node. Given a tree decomposition, a nice tree decomposition of equal width can be computed in linear time [32]. By applying dynamic programming on top of the nice tree decomposition of the given graph, we establish the following:

Theorem 3

(\(\bigstar\)) FCD can be solved in \({\mathcal {O}}(n^{{\mathcal {O}}({{\,\textrm{tw}\,}}\cdot \vert C \vert )})\) time.

Since a tree is of treewidth one, we have the following:

Corollary 6

FCD on trees can be solved in \(n^{{\mathcal {O}}( \vert C \vert )}\) time.

5.3 An XP-algorithm for \(\textbf{fen} + \textbf{k}\)

Having constructed a polynomial-time algorithm for constant \(\vert C \vert\) on trees and on graphs with a constant treewidth, we now consider the number k of districts as our parameter for FCD on trees. We show that there is a simple XP-algorithm with respect to k for FCD on trees, which naturally extends to an XP-algorithm with respect to \({{\,\textrm{fen}\,}}+~k\), where \({{\,\textrm{fen}\,}}\) is the number of edges that need to be deleted to make the given graph a tree:

Proposition 6

FCD can be solved in \(n^{{\mathcal {O}}({{\,\textrm{fen}\,}}+ k)}\) time.

Proof

Suppose that there is a solution \((V_1, \dots , V_k)\). Since \(V_i\) is connected for each \(i\in [k]\) (which requires at least \(\vert V_i \vert -1\) edges), the number of edges whose endpoints both lie in the same district is at least \(\sum _{i = 1}^k ( \vert V_i \vert - 1) = n - k\). Since the input graph \(G = (V, E)\) has at most \(n + {{\,\textrm{fen}\,}}- 1\) edges (by the definition of \({{\,\textrm{fen}\,}}\)), it follows that there are at most \((n + {{\,\textrm{fen}\,}}- 1) - (n - k) = {{\,\textrm{fen}\,}}+ k - 1\) edges whose endpoints belong to different districts. Accordingly, for each edge set \(E' \subseteq E\) of size at most \({{\,\textrm{fen}\,}}+ k - 1\), our algorithm verifies whether \((V, E \setminus E')\) has exactly k connected components, each of which is \(\ell\)-fair and respects the size constraints. We return yes if this is the case for some subset of edges and no otherwise. Overall, it takes \((n + {{\,\textrm{fen}\,}}- 1)^{{{\,\textrm{fen}\,}}+ k - 1} \cdot n^{{\mathcal {O}}(1)} = n^{{\mathcal {O}}({{\,\textrm{fen}\,}}+ k)}\) time to do so. \(\square\)
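The enumeration in this proof can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names, the representation of edges as pairs, and the dictionary `color` are assumptions made for the example. The margin of victory is computed as the difference between the two largest color counts, and the budget of deleted edges is taken as \(\vert E \vert - (n - k)\), which is at most \({{\,\textrm{fen}\,}}+ k - 1\) by the argument above.

```python
from collections import Counter
from itertools import combinations


def margin_of_victory(colors):
    """Difference between the two largest color counts (a missing runner-up counts as 0)."""
    counts = sorted(Counter(colors).values(), reverse=True) + [0, 0]
    return counts[0] - counts[1]


def connected_components(vertices, edges):
    """Return the connected components of (vertices, edges) as a list of sets."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            for neighbor in adjacency[stack.pop()]:
                if neighbor not in component:
                    component.add(neighbor)
                    stack.append(neighbor)
        seen |= component
        components.append(component)
    return components


def fcd_by_edge_deletion(vertices, edges, color, k, ell, s_min, s_max):
    """Return a list of k districts if one exists, else None."""
    # At least n - k edges stay inside districts, so at most m - (n - k) are cut;
    # since m <= n + fen - 1, this is at most fen + k - 1 as in the proof.
    budget = len(edges) - (len(vertices) - k)
    for size in range(budget + 1):
        for removed in combinations(edges, size):
            removed_set = set(removed)
            kept = [e for e in edges if e not in removed_set]
            districts = connected_components(vertices, kept)
            if len(districts) == k and all(
                s_min <= len(d) <= s_max
                and margin_of_victory(color[v] for v in d) <= ell
                for d in districts
            ):
                return districts
    return None
```

For instance, on a path with four vertices colored alternately with two colors, calling `fcd_by_edge_deletion` with k = 2 and \(\ell = 0\) cuts the middle edge and returns two districts of size two.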

As the treewidth of a graph is upper-bounded by a function of its feedback edge number, we can conclude from Theorem 3 that FCD parameterized by \({{\,\textrm{fen}\,}}+~ \vert C \vert\) is in XP. However, Proposition 6 leaves it open whether there is an XP-algorithm for \({{\,\textrm{tw}\,}}+~k\). We answer this question negatively in the next subsection.

5.4 NP-hardness for \(\mathbf {fvn = 1}\) and \(\textbf{k} = \textbf{2}\)

Having already considered the parameters treewidth and feedback edge number, we now consider a third way to measure the distance from a tree, namely the minimum number of vertices whose deletion makes the graph acyclic (feedback vertex number \({{\,\textrm{fvn}\,}}\)). As \({{\,\textrm{tw}\,}}\le {{\,\textrm{fvn}\,}}+1\), from Theorem 3 it follows that there is an XP-algorithm for \({{\,\textrm{fvn}\,}}+~ \vert C \vert\). We now prove that FCD is NP-hard even for \({{\,\textrm{fvn}\,}}=1\) and \(k=2\), in contrast to the result from the previous subsection, where we gave an \(n^{\mathcal O({{\,\textrm{fen}\,}}+ k)}\)-time algorithm. Notably, this NP-hardness result also excludes the existence of an XP-algorithm for \({{\,\textrm{tw}\,}}+~k\) unless P \(=\) NP:

Theorem 4

FCD is NP-hard for \({\text {fvn}} = 1\) and \(k=2\), even if \(s_{\textrm{min}}=1\) and \(s_{\textrm{max}}=\infty\).

The rest of the section is devoted to proving the theorem. We reduce from the NP-hard Not-All-Equal 3-Sat problem [46] where the input is a set X of n boolean variables and a set Y of m clauses over X such that each clause \(y\in Y\) contains three different literals, and the question is whether there exists a truth assignment to the variables in X such that for each clause \(y\in Y\) at least one literal is set to true and at least one literal is set to false. Notably, given an assignment fulfilling these constraints, the assignment that assigns all variables in X the opposite truth value also fulfills the constraints.

The general idea of the construction is that we introduce one vertex for each literal and that the two districts in the solution correspond to two opposite truth assignments of the variables from X that are both solutions of the Not-All-Equal 3-Sat instance.

Fig. 3

Example of the hardness reduction from Theorem 4 for the Not-All-Equal 3-Sat instance \(X=\{x_1,x_2,x_3\}\), \(Y=\{y_1=\{x_1,x_2,\overline{x}_3\}, y_2=\{x_1,x_2,x_3\}, y_3=\{\overline{x}_1,\overline{x}_2,x_3\} \}\). The district marked in red corresponds to the solution setting \(x_1\) to true and \(x_2\) and \(x_3\) to false.

5.4.1 Construction

Given an instance \((X=\{x_1,\dots , x_n\},Y=\{y_1,\dots , y_m\})\) of Not-All-Equal 3-Sat, we construct an instance of FCD as follows. We add a color \(c_i^x\) for each variable \(x_i\in X\) and a color \(c^y_j\) for each clause \(y_j\in Y\). In addition, we add three colors c, \(c'\), and \(c''\). Moreover, we set \(k=2\), \(\ell =0\), \(s_{\textrm{min}}=1\), and \(s_{\textrm{max}}=\infty\). Let \(Z:=2\cdot n\cdot m+1\).

We start the construction of the vertex-colored graph \(G=(V,E)\) by introducing two central vertices \(v^{\star }_1\) and \(v^{\star }_2\) of color c. We will construct the instance in a way that these two vertices need to lie in different districts. For each central vertex \(v^{\star }_i\), \(i \in \{ 1, 2 \}\), we introduce 3Z vertices of color \(c'\) and 3Z vertices of color \(c''\) and connect them to \(v^{\star }_i\).

Subsequently, for each variable \(x_i\in X\), we introduce two literal vertices \(v_{i}\) and \(\overline{v}_{i}\) of color c and connect both these vertices to the two central vertices. For each literal vertex \(\tilde{v}_i \in \{ v_i, \overline{v}_i \}\), we introduce \(3Z-i\) vertices of color \(c^x_{i}\) and connect them to \(\tilde{v}_i\) (these vertices make sure that the two literal vertices end up in different districts). For each clause \(y_j\in Y\) in which \(x_i\) occurs positively, we introduce \(Z+j\) vertices of color \(c^y_{j}\) and connect them to \(v_{i}\). For each clause \(y_j\in Y\) in which \(x_i\) occurs negatively, we introduce \(Z+j\) vertices of color \(c^y_{j}\) and connect them to \(\overline{v}_{i}\) (these vertices ensure that there is no clause in which all three literal vertices corresponding to literals from the clause lie in the same district). See Fig. 3 for a visualization of the construction.
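The construction can be sketched in Python as follows. This is an illustrative sketch only: the function name, the tuple-based vertex names, and the encoding of a literal as a signed integer (`+i` for \(x_i\), `-i` for its negation) are assumptions made for the example. The remaining parameters of the FCD instance (\(k=2\), \(\ell =0\), \(s_{\textrm{min}}=1\), \(s_{\textrm{max}}=\infty\)) are fixed and therefore not returned.

```python
def build_fcd_instance(n, clauses):
    """Build the colored graph of the reduction.

    clauses: list of 3-tuples of nonzero ints; +i stands for x_i, -i for its negation.
    Returns (color, edges, Z), where color maps each vertex name to its color.
    """
    m = len(clauses)
    Z = 2 * n * m + 1
    color, edges = {}, []

    def attach_leaves(center, leaf_color, count):
        # Attach `count` fresh leaves of the given color to `center`.
        for t in range(count):
            leaf = ("leaf", center, leaf_color, t)
            color[leaf] = leaf_color
            edges.append((center, leaf))

    # Two central vertices of color c, each with 3Z leaves of color c' and c''.
    centers = [("center", 1), ("center", 2)]
    for center in centers:
        color[center] = "c"
        attach_leaves(center, "c'", 3 * Z)
        attach_leaves(center, "c''", 3 * Z)

    # Two literal vertices per variable, adjacent to both central vertices.
    for i in range(1, n + 1):
        for literal in (i, -i):
            vertex = ("lit", literal)
            color[vertex] = "c"
            edges.extend((vertex, center) for center in centers)
            attach_leaves(vertex, ("x", i), 3 * Z - i)

    # Z + j leaves of the clause color for every literal occurring in clause y_j.
    for j, clause in enumerate(clauses, start=1):
        for literal in clause:
            attach_leaves(("lit", literal), ("y", j), Z + j)

    return color, edges, Z
```

For the instance of Fig. 3, encoded as `build_fcd_instance(3, [(1, 2, -3), (1, 2, 3), (-1, -2, 3)])`, this gives \(Z = 19\) and a graph with 755 vertices and 759 edges.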

We start by showing the forward direction of the correctness of the construction.

Lemma 7

If the given Not-All-Equal 3-Sat instance is a yes-instance, then the constructed FCD instance is a yes-instance.

Proof

Let \(X'\subseteq X\) be the set of variables set to true in a solution to the given Not-All-Equal 3-Sat instance. From this, we construct a solution \((V_1, V_2)\) to the constructed FCD instance as follows. We include \(v^{\star }_1\) and all leaves attached to it in \(V_1\). Moreover, we include in \(V_1\) the following vertices: \(v_i\) and all leaves attached to it for all \(x_i\in X'\) and \(\overline{v}_i\) and all leaves attached to it for all \(x_i \in X \setminus X'\). We include all other vertices in \(V_2\).

It is easy to verify that \(V_1\) and \(V_2\) are both connected. We will show that \(V_1\) is 0-fair. By symmetry, an analogous argument will show that \(V_2\) is 0-fair. First, observe that \(V_1\) contains exactly 3Z vertices of color \(c'\) and 3Z vertices of color \(c''\). We show that \(\chi _{\tilde{c}}(V_1)\le 3Z\) for every color \(\tilde{c} \in C\). For color c, we have that \(\chi _{c}(V_1) \le \chi _{c}(V) = 2n+2 < 3Z\) (the two central vertices and the 2n literal vertices). For each variable \(x_i\in X\), as the two corresponding literal vertices are part of different districts, we have that \(\chi _{c^x_i}(V_1)= 3Z-i < 3Z\). For each clause \(y_j\in Y\), as \(X'\) is a solution, either one or two literal vertices corresponding to literals in \(y_j\) and the attached leaves are part of \(V_1\). Thus, the number of vertices of color \(c^y_j\) in \(V_1\) is either \(Z+j\) or \(2Z+2j\). As \(Z>2m\), it follows that \(\chi _{c^y_j}(V_1) < 3Z\). Thus, both districts \(V_1\) and \(V_2\) are 0-fair. \(\square\)
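The counting argument of this proof can be checked on concrete instances with a short Python sketch. It does not build the graph; it only tallies the color counts of a district that contains one central vertex, a chosen set of literal vertices, and all leaves attached to them. The function names and the signed-integer encoding of literals (`+i` for \(v_i\), `-i` for \(\overline{v}_i\)) are assumptions made for illustration.

```python
from collections import Counter


def district_color_counts(n, clauses, literals):
    """Color counts of the district holding one central vertex, the literal
    vertices in `literals`, and all leaves attached to these vertices."""
    Z = 2 * n * len(clauses) + 1
    counts = Counter({"c": 1 + len(literals), "c'": 3 * Z, "c''": 3 * Z})
    for literal in literals:
        counts[("x", abs(literal))] += 3 * Z - abs(literal)
    for j, clause in enumerate(clauses, start=1):
        for literal in clause:
            if literal in literals:
                counts[("y", j)] += Z + j
    return counts


def is_zero_fair(counts):
    """True if the two most frequent colors occur equally often."""
    top = sorted(counts.values(), reverse=True)
    return top[0] == top[1]


def districts_from_assignment(n, true_variables):
    """Literal vertices of V_1 and V_2 as chosen in the proof of Lemma 7."""
    first = {i if i in true_variables else -i for i in range(1, n + 1)}
    second = {-literal for literal in first}
    return first, second
```

On the instance of Fig. 3 with only \(x_1\) set to true, both districts come out 0-fair. Setting all three variables to true violates the not-all-equal condition for \(y_2\), and the first district then contains \(3Z+3\cdot 2 = 63 > 57 = 3Z\) vertices of color \(c^y_2\), so it is not 0-fair.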

It remains to prove the correctness of the backward direction of the reduction. For this, note that the FCD instance is constructed such that the two central vertices need to end up in different districts and for each \(x_i \in X\), the two literal vertices \(v_i\) and \(\overline{v}_{i}\) need to end up in different districts. Thus, the two districts correspond to two inverse truth assignments. Subsequently, we will prove that there is no clause in which all corresponding vertices are in the same district. This will show that the two truth assignments induced by the two districts are a solution to the given Not-All-Equal 3-Sat instance.

We start by observing that all leaves need to lie in the same district as the vertex they are attached to.

Observation 3

In a solution to the constructed FCD instance, all leaves attached to a vertex \(v\in V\) need to lie in the same district as v.

Proof

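If a leaf is not in the same district as its neighbor, then the leaf needs to form its own district. However, this is not possible, as we require each district to be 0-fair: in a district consisting of a single vertex, one color occurs once and every other color does not occur at all, so its margin of victory is one. \(\square\)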

Next, we show that indeed the two central vertices \(v^{\star }_1\) and \(v^{\star }_2\) always lie in different districts.

Observation 4

The two central vertices \(v^{\star }_1\) and \(v^{\star }_2\) cannot belong to the same district in any solution to the constructed FCD instance.

Proof

Assume that \(v^{\star }_1\) and \(v^{\star }_2\) belong to the same district. By Observation 3, the other district \(V'\) needs to consist of a single literal vertex \(v_i\) or \(\overline{v}_i\) for some \(x_i\in X\) and all leaves attached to it. The most frequent color appearing in \(V'\) is \(c^x_i\) which occurs \(3Z-i > 2Z\) times (by the definition of Z). The second most frequent color is either \(c^y_j\) for some \(j\in [m]\) occurring \(Z+j < 2Z\) times or c occurring once (by the definition of Z). Thus, the district cannot be 0-fair. \(\square\)

To prove that, for each variable, the two corresponding literal vertices need to be part of different districts and that not all literal vertices corresponding to literals from a clause can lie in the same district, we prove that neither district can contain more than 3Z vertices of the same color.

Lemma 8

For any solution to the constructed FCD instance, no district contains more than 3Z vertices of the same color.

Proof

Assume that there exists a solution with a district \(V'\) containing more than 3Z vertices of the same color. Since \(V'\) is 0-fair, there are two colors q and \(q'\) with \(\chi _q(V') = \chi _{q'}(V') > 3Z\). We have \(\chi _{c}(V') \le \chi _{c}(V) = 2n + 2 < 3Z\) and \(\chi _{c'}(V') = \chi _{c''}(V') = 3Z\) by Observations 3 and 4. Thus, we have \(q, q' \in \{ c_{i}^x \mid x_i \in X \} \cup \{ c_{j}^y \mid y_j \in Y \}\).

For \(x_i\in X\), there are two literal vertices, each of which has \(3Z-i\) leaves of color \(c^x_i\) attached to it. For \(y_j\in Y\), there are three literal vertices, each of which has \(Z+j\) leaves of color \(c^y_j\) attached to it. It follows from Observation 3 that \(\chi _{c^x_i}(V') \in \{ 0, 3Z - i, 6Z - 2i \}\) for each \(x_i \in X\) and that \(\chi _{c^y_j}(V') \in \{ 0, Z + j, 2Z + 2j, 3Z + 3j \}\) for each \(y_j \in Y\). Since \(i \le n\), \(j \le m\), \(Z = 2\cdot n\cdot m+1\), and \(\chi _q(V') = \chi _{q'}(V')\) for \(q, q' \in \{ c_{i}^x \mid x_i \in X \} \cup \{ c_{j}^y \mid y_j \in Y \}\), it needs to hold that \(\chi _q(V') = \chi _{q'}(V') = 0\), which contradicts \(\chi _q(V') = \chi _{q'}(V') > 3Z\). \(\square\)
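The arithmetic behind the last step can be sanity-checked numerically. Among the possible counts listed above, only \(6Z-2i\) and \(3Z+3j\) exceed 3Z, so a tie above 3Z would require two of these values to coincide for different colors. The following Python sketch (function names are hypothetical, chosen for illustration) enumerates these values for given n and m and tests whether any two coincide.

```python
def counts_above_threshold(n, m):
    """All (color, count) options with count > 3Z that a district can realize,
    following the case analysis in the proof of Lemma 8."""
    Z = 2 * n * m + 1
    options = [(("x", i), 6 * Z - 2 * i) for i in range(1, n + 1)]
    options += [(("y", j), 3 * Z + 3 * j) for j in range(1, m + 1)]
    # Every listed option indeed exceeds the threshold 3Z.
    assert all(value > 3 * Z for _, value in options)
    return options


def has_tie_above_threshold(n, m):
    """True if two different colors can share the same count above 3Z."""
    values = [value for _, value in counts_above_threshold(n, m)]
    return len(values) != len(set(values))
```

For small values of n and m, `has_tie_above_threshold` returns `False`, in line with the lemma.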

Recalling that \(3Z-i\) vertices of color \(c^x_i\) are attached to each literal vertex corresponding to \(x_i\in X\) and that \(6Z-2i>3Z\), the next observation directly follows from the previous lemma and Observation 3.

Observation 5

Let \((V_1,V_2)\) be a solution of the constructed FCD instance. Then, for each \(x_i\in X\), exactly one of the two corresponding literal vertices \(v_i\) and \(\overline{v}_i\) is part of \(V_1\) and the other is part of \(V_2\).

We are now ready to prove the backward direction of the correctness of our construction.

Lemma 9

If the constructed FCD instance is a yes-instance, then the given Not-All-Equal 3-Sat instance is a yes-instance.

Proof

Let \((V_1,V_2)\) be a solution to the constructed FCD instance. From Observation 5, it follows that for each \(x_i\in X\), exactly one of \(v_i\) and \(\overline{v}_i\) is part of \(V_1\). Let \(\varphi\) be the truth assignment induced by \(V_1\), i.e., \(\varphi\) sets \(x_i\) to true if \(v_i\in V_1\) and \(x_i\) to false if \(\overline{v}_i\in V_1\). We claim that \(\varphi\) is a solution to the given Not-All-Equal 3-Sat instance. Firstly, for the sake of contradiction, assume that there exists a clause \(y_j\in Y\) containing only literals that are satisfied by \(\varphi\). However, by Observation 3, this implies that all vertices of color \(c^y_j\) are part of \(V_1\), contradicting Lemma 8 as there exist \(3Z+3j\) such vertices. Secondly, for the sake of contradiction, assume that \(y_j\) contains no literal satisfied by \(\varphi\). However, by Observation 3, this implies that all vertices of color \(c^y_j\) are part of \(V_2\), again contradicting Lemma 8. Consequently, \(\varphi\) is a solution. \(\square\)

Observing that \(\{v^{\star }_1\}\) is a feedback vertex set of the constructed graph and that the construction can be computed in polynomial time, Theorem 4 follows directly from Lemma 7 and Lemma 9.

From our reduction, we can further conclude that FCD is also para-NP-hard with respect to the treewidth plus the number k of districts.

Corollary 7

FCD is NP-hard for \({{\,\textrm{tw}\,}}= 2\) and \(k=2\), even if \(s_{\textrm{min}}=1\) and \(s_{\textrm{max}}=\infty\).

6 FCD on graphs of bounded vertex cover number

Motivated by our hardness results for graphs with constant treewidth, we now turn to the size \({{\,\textrm{vcn}\,}}\) of a minimum vertex cover, a parameter never smaller than the treewidth. In this section, we present two parameterized algorithms, namely, an XP-algorithm for \({{\,\textrm{vcn}\,}}\) and an FPT-algorithm for \({{\,\textrm{vcn}\,}}+ \vert C \vert\). Unfortunately, we were unable to settle whether FCD parameterized by \({{\,\textrm{vcn}\,}}\) is W[1]-hard or fixed-parameter tractable. In contrast, we develop an FPT-algorithm for the number of vertices with degree at least two (a parameter which is never smaller than \({{\,\textrm{vcn}\,}}\)).

Both algorithms for \({{\,\textrm{vcn}\,}}\) rely on the following lemma.

Lemma 10

Let S be a vertex cover of minimum size and \({\mathcal {V}} = (V_1, \dots , V_k)\) a solution to an FCD instance on a graph G. There are at most \({{\,\textrm{vcn}\,}}\) districts in \({\mathcal {V}}\) that contain at least one vertex from S. Moreover, for every \(V_i\in {\mathcal {V}}\) with \(S \cap V_i \ne \emptyset\), there is a set \(J_i \subseteq V_i {\setminus } S\) of at most \(\vert S \cap V_i \vert -1\) vertices such that \(G[(S \cap V_i) \cup J_i]\) is connected.

Proof

As each vertex is only contained in one district, the first part of the lemma follows from the definition of \({{\,\textrm{vcn}\,}}\). To prove the second part, fix some \(V_i\in {\mathcal {V}}\). Consider a spanning tree \(T = (V_i, F)\) of \(G[V_i]\). Let \(J_i\) be the set of vertices from \(V_i \setminus S\) that have degree at least two in T. Let \(T':= ((S \cap V_i) \cup J_i, F')\) be the result of deleting from T each vertex \(v \in V_i \setminus (S \cup J_i)\) along with the edge incident to it. Observe that \(T'\) is connected and thus \(G[(S \cap V_i) \cup J_i]\), which contains \(T'\) as a subgraph, is connected. We show that \(\vert J_i \vert \le \vert S \cap V_i \vert - 1\). As by the definition of \(J_i\) vertices from \(J_i\) are only adjacent to vertices from \(S\cap V_i\) and as every vertex from \(J_i\) has degree at least two in T and \(T'\), we have \(\vert F' \vert \ge 2 \vert J_i \vert\). We also have \(\vert F' \vert = \vert S \cap V_i \vert + \vert J_i \vert - 1\) since \(T'\) is a tree. Thus, we have \(\vert J_i \vert \le \vert S \cap V_i \vert - 1\). \(\square\)
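The set \(J_i\) from this proof can be computed directly from a spanning tree. The following Python sketch uses a breadth-first search tree as the spanning tree; the function name and the input representation (a set of district vertices, a list of edge pairs, and a set `cover` playing the role of S) are assumptions made for illustration.

```python
from collections import deque


def connector_set(district, edges, cover):
    """Return J: the non-cover vertices of degree >= 2 in a BFS spanning tree
    of the district. Assumes the district is connected and `cover` is a
    vertex cover of the graph."""
    adjacency = {v: [] for v in district}
    for u, v in edges:
        if u in adjacency and v in adjacency:
            adjacency[u].append(v)
            adjacency[v].append(u)
    # Build a BFS spanning tree, stored via parent pointers.
    root = next(iter(district))
    parent, queue = {root: None}, deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    # Count the degree of every vertex within the spanning tree.
    tree_degree = {v: 0 for v in district}
    for child, par in parent.items():
        if par is not None:
            tree_degree[child] += 1
            tree_degree[par] += 1
    return {v for v in district if v not in cover and tree_degree[v] >= 2}
```

By the lemma, the returned set has at most \(\vert S \cap V_i \vert - 1\) elements whenever the district contains at least one vertex from the cover.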

We first show that FCD parameterized by \({{\,\textrm{vcn}\,}}\) is in XP using the following approach: We first guess how the vertex cover is partitioned into districts in the sought solution, which gives partial (not necessarily connected) districts. For every partial district, we then guess some vertices outside the vertex cover to include such that the partial district becomes connected, the two most frequent colors, and how often they occur in the resulting district. The remaining problem can be reduced to the polynomial-time solvable (g, f)-Factor problem [25].

Theorem 5

FCD is solvable in \(n^{{\mathcal {O}}({{\,\textrm{vcn}\,}})}\) time.

Proof

Suppose there is a solution \({\mathcal {V}} = (V_1, \dots , V_k)\) to the given FCD instance. Let S be a vertex cover of minimum size and let \(I = V \setminus S\) be the vertices outside the vertex cover. As in the proof of Theorem 6, our algorithm first tries all possibilities to fix some structure with respect to S. Then, we will construct an instance of the polynomial-time solvable (g, f)-Factor problem, which generalizes Maximum Matching, for each choice of the following:

  • An integer \(k' \le \min (k,{{\,\textrm{vcn}\,}})\). If \(\ell = 0\) or \(s_{\textrm{min}}\ge 2\), then we consider only one choice for \(k'\), namely \(k' = k\).

  • A partition of S into \(k'\) non-empty subsets \(S_1, \dots , S_{k'}\). There are at most \({{\,\textrm{vcn}\,}}^{{{\,\textrm{vcn}\,}}}\) such partitions.

  • For every \(i \in [k']\), a set \(J_i \subseteq I\) of at most \(\vert S_i \vert - 1\) vertices such that \(G[S_i \cup J_i]\) is connected and \(J_i \cap J_{i'} = \emptyset\) for \(i \ne i' \in [k']\). We can assume that \(\vert J_i \vert \le \vert S_i \vert - 1\) vertices are sufficient to make \(G[S_i]\) connected by Lemma 10. The number of choices for \(J_i\) is at most \(\prod _{i \in [k']} n^{ \vert S_i \vert - 1} \le n^{{{\,\textrm{vcn}\,}}}\).

  • For every \(i \in [k']\), a pair \((c_i, c_i')\) of colors, which have the largest numbers of occurrences in the sought \(V_i\) among all colors. There are at most \(\vert C \vert ^{2{{\,\textrm{vcn}\,}}}\) choices for all pairs of colors.

  • For every \(i \in [k']\), the numbers \(z_{c_i}, z_{c_i'}\) of occurrences of color \(c_i\) and \(c_i'\), respectively, with \(z_{c_i} - \ell \le z_{c_i'} \le z_{c_i}\). Since \(z_{c_i} \le n\) and \(z_{c_i'} \le n\), there are at most \(n^{2{{\,\textrm{vcn}\,}}}\) choices.

Since \(\vert C \vert \le n\), there are at most \(n^{{\mathcal {O}}({{\,\textrm{vcn}\,}})}\) choices. Let \(I' = I {\setminus } \bigcup _{i \in [k']} J_i\). Now the question is whether we can partition the vertices V into k districts \((V_1, \dots , V_k)\) such that, for \(i\in \{k'+1, \dots , k\}\), \(V_i\) consists of a single vertex from \(I'\), and for \(i\in [k']\), \(S_i\cup J_i\subseteq V_i\), \(G[V_i]\) is connected, \(\vert V_i \vert \in [s_{\textrm{min}},s_{\textrm{max}}]\), \(\chi _{c_i}(V_i)=z_{c_i}\), \(\chi _{c_i'}(V_i)=z_{c_i'}\), and \(\chi _{c}(V_i)\le \chi _{c_i'}(V_i)\) for all \(c\in C\setminus \{c_i\}\). We reject the current combination if, for some \(i\in [k']\), \(\chi _{c_i}(S_i\cup J_i)> z_{c_i}\), \(\chi _{c_i'}(S_i\cup J_i)>z_{c_i'}\), \(\chi _{c}(S_i\cup J_i)> z_{c_i'}\) for some \(c\in C{\setminus } \{c_i\}\), or \(\vert S_i \cup J_i \vert > s_{\textrm{max}}\). Otherwise, to decide whether it is possible to distribute the vertices \(I'\) to construct a partition respecting the above-described properties, we reduce to (g, f)-Factor, where given a graph \(H=(U,F)\) and two functions \(g,f:U\rightarrow {\mathbb {N}}\), the question is whether there is a subgraph \(H' = (U, F')\) with \(F' \subseteq F\) such that every vertex \(v \in U\) has at least g(v) and at most f(v) neighbors in \(H'\).

We construct a bipartite graph H and the functions f and g as follows. The left bipartition of H consists of all vertices \(v\in I'\). For every \(i \in [k']\), we add the following vertices to the right bipartition:

  • For each color \(c \in \{ c_i, c_i' \}\), we add \(z_c-\chi _{c}(S_i\cup J_i)\) vertices (call them \(A_i\)) and connect them via edges to all vertices \(v\in I'\) having color c in G which have at least one neighbor in \(S_i\) in G.

  • For each color \(c \in C \setminus \{ c_i, c_i' \}\), we add \(z_{c_i'}-\chi _{c}(S_i\cup J_i)\) vertices (call them \(B_i^c\)) and connect them via edges to all vertices \(v\in I'\) having color c in G which have at least one neighbor in \(S_i\) in G. Let \(B_i:= \bigcup _{c \in C {\setminus } \{ c_i, c_i' \}} B_i^c\).

If \(s_{\textrm{min}}= 1\), then we add a set \(A'\) of \(k - k'\) vertices to the right bipartition and connect them via edges to all vertices of \(I'\). For every vertex v introduced thus far, let \(f(v) = g(v) = 1\). Finally, for every \(i \in [k']\), add a vertex \(v_i\) to the left side and let \(f(v_i) = \vert B_i \vert - \max (0, s_{\textrm{min}}- \vert A_i\cup S_i \cup J_i \vert )\) and \(g(v_i) = \vert B_i \vert - s_{\textrm{max}}+ \vert A_i\cup S_i \cup J_i \vert\) (if it does not hold that \(0\le g(v_i)\le f(v_i)\), then we continue with the next combination). We connect \(v_i\) to all vertices from \(B_i\).

If the constructed (g, f)-Factor instance is a yes-instance, we return yes; otherwise, we continue with the next combination. To see why the algorithm works correctly, suppose that there is a subgraph \(H' = (U, F')\) of H such that the degree of every vertex v in \(H'\) is in the range [g(v), f(v)]. From \(H'\), we can construct a solution of the given FCD instance by putting each vertex \(v\in I'\) into the district corresponding to the neighbor of v in \(H'\) (if \(s_{\textrm{min}}= 1\), then every vertex adjacent to some vertex from \(A'\) forms a district of its own). Moreover, for all \(i\in [k']\), we add \(S_i\cup J_i\) to \(V_i\). As all vertices from \(I'\) have one neighbor in \(H'\), each vertex from \(I'\) is assigned to a district. As all vertices from \(A_i\) have one neighbor in \(H'\), for \(i\in [k']\), it holds that \(\chi _{c_i}(V_i)=z_{c_i}\) and \(\chi _{c'_i}(V_i)=z_{c'_i}\) and thus that the resulting districts are \(\ell\)-fair. Moreover, the number of neighbors of \(v_i\) in \(H'\) is between \(\vert B_i \vert - s_{\textrm{max}}+ \vert A_i\cup S_i \cup J_i \vert\) and \(\vert B_i \vert - \max (0, s_{\textrm{min}}- \vert A_i\cup S_i \cup J_i \vert )\), implying that there are at least \(s_{\textrm{min}}- \vert A_i\cup S_i \cup J_i \vert\) and at most \(s_{\textrm{max}}- \vert A_i\cup S_i \cup J_i \vert\) vertices in \(B_i\) that are adjacent to some vertex from \(I'\). Thus, the size of each district is in \([s_{\textrm{min}}, s_{\textrm{max}}]\). \(\square\)

Recall that FCD is NP-hard and W[1]-hard with respect to \(\vert C \vert +k\) on trees with diameter four (Corollary 5). In contrast to this, since a tree of diameter at most three has a vertex cover of size at most two, by Theorem 5, the following holds.

Corollary 8

FCD is polynomial-time solvable on trees with diameter at most three.

Hence, unless P \(=\) NP, three is the largest tree diameter for which FCD remains polynomial-time solvable, as FCD is NP-hard on trees with diameter four (Corollary 5).

Using a similar approach as for Theorem 5, we show that FCD is fixed-parameter tractable with respect to \({{\,\textrm{vcn}\,}}+ \vert C \vert\). The overall idea is the following: We guess the partition of the vertex cover into the districts. We categorize vertices outside the vertex cover according to their neighborhood and color. We formulate the resulting problem as an ILP whose number of variables only depends on \({{\,\textrm{vcn}\,}}+ \vert C \vert\). Subsequently we employ Lenstra’s FPT-algorithm for ILPs [30, 36] to obtain the following:

Theorem 6

FCD parameterized by \({{\,\textrm{vcn}\,}}+~ \vert C \vert\) is fixed-parameter tractable.

Proof

Suppose there is a solution \({\mathcal {V}} = (V_1, \dots , V_k)\). Let S be a vertex cover of minimum size. Then the vertices \(I = V {\setminus } S\) that are not part of the vertex cover form an independent set. Let \(k'\) be the number of districts in \({\mathcal {V}}\) that contain at least one vertex of S. By Lemma 10, we have \(k' \le {{\,\textrm{vcn}\,}}\). Assume without loss of generality that \(S_i \ne \emptyset\) exactly for \(i \in [k']\), where \(S_i = S \cap V_i\). Then, for every \(i \in \{ k' + 1, \dots , k \}\), we have \(V_i = \{ v \}\) where v is some vertex from I.

We say that a vertex \(v \in I\) has type (c, X) for \(c \in C\) and \(X \subseteq S\) if the color of v in G is c and its neighborhood is X. Let \({\mathcal {T}}\) denote the set of all types (note that \(\vert {\mathcal {T}} \vert = 2^{{{\,\textrm{vcn}\,}}} \cdot \vert C \vert\)). For type \(T \in {\mathcal {T}}\), let \(n_T\) be the number of vertices of type T, and for type T with \(n_T > 0\) let \(v_{T} \in I\) be an arbitrary vertex of type T. Our algorithm will construct an ILP instance for each choice of the following:
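Grouping the vertices outside the vertex cover by type is straightforward; a minimal Python sketch is given below. The function name and the input representation are assumptions made for illustration, and only types with \(n_T > 0\) appear in the result.

```python
from collections import Counter


def type_counts(vertices, edges, color, cover):
    """Map each type (c, X) to n_T, the number of vertices outside the cover
    with color c and neighborhood X (stored as a frozenset of cover vertices)."""
    neighborhood = {v: set() for v in vertices if v not in cover}
    for u, v in edges:
        # Since `cover` is a vertex cover, every neighbor of a vertex
        # outside the cover lies inside the cover.
        if u in neighborhood:
            neighborhood[u].add(v)
        if v in neighborhood:
            neighborhood[v].add(u)
    return Counter((color[v], frozenset(nbrs)) for v, nbrs in neighborhood.items())
```

Two vertices of the same type are interchangeable with respect to color counts and connectivity, which is why the ILP only needs one variable per pair of district and type.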

  • An integer \(k' \le {{\,\textrm{vcn}\,}}\). If \(\ell = 0\) or \(s_{\textrm{min}}> 1\), then we consider only one choice for \(k'\), namely \(k' = k\).

  • A partition of S into non-empty subsets \(S_1, \dots , S_{k'}\). There are at most \({{\,\textrm{vcn}\,}}^{{{\,\textrm{vcn}\,}}}\) such partitions.

  • For every \(i \in [k']\), a set \({\mathcal {T}}_i\) of at most \(\vert S_i \vert - 1\) types such that \(G[S_i \cup J_i]\) is connected for \(J_i = \{ v_T \mid T \in {\mathcal {T}}_i \}\). If \(\vert \{ i \mid T \in {\mathcal {T}}_i \} \vert > n_T\) for some \(T \in {\mathcal {T}}\), then we reject the current combination. Note that we can assume that \(\vert {\mathcal {T}}_i \vert \le \vert S_i \vert - 1\) vertices are sufficient to make \(G[S_i]\) connected by Lemma 10. As there are at most \(2^{{{\,\textrm{vcn}\,}}} \cdot \vert C \vert\) types, the number of choices for all \({\mathcal {T}}_i\) is at most \(\prod _{i \in [k']} (2^{{{\,\textrm{vcn}\,}}} \cdot \vert C \vert )^{ \vert S_i \vert - 1} \le 2^{{{\,\textrm{vcn}\,}}^2} \cdot \vert C \vert ^{{{\,\textrm{vcn}\,}}}\).

  • For every \(i \in [k']\), a pair \((c_i, c_i')\) of colors, which have the largest numbers of occurrences in \(V_i\) among all colors. In the sought solution it holds that \(\chi _{c_i}(V_i) \ge \chi _{c_i'}(V_i)\). There are at most \(\vert C \vert ^{2{{\,\textrm{vcn}\,}}}\) choices for all pairs of colors.

To decide on the distribution of the vertices from I, we now construct an ILP. For the ILP formulation, we introduce an integer variable \(x_{i, T}\) for each \(i \in [k']\) and \(T \in {\mathcal {T}}\). The variable \(x_{i, T}\) will indicate the number of vertices of type T that we put in \(V_i\). Clearly, there are at most \(n_T\) vertices for each type \(T\in {\mathcal {T}}\) that belong to one of \(V_1, \dots , V_{k'}\):

$$\begin{aligned} \sum _{i \in [k']} x_{i, T} \le n_T. \end{aligned}$$

Then for every \(i \in [k']\) and \(T \in {\mathcal {T}}_i\), at least one vertex of type T is included in \(V_i\) to satisfy our previous guesses:

$$\begin{aligned} x_{i, T} \ge 1. \end{aligned}$$

Further for every \(i \in [k']\), only vertices from I that are adjacent to a vertex from \(S_i\) can be part of \(V_i\). Thus, for all types \(T=(c,X)\in {\mathcal {T}}\) with \(X\cap S_i=\emptyset\) it holds that:

$$\begin{aligned} x_{i, T} = 0. \end{aligned}$$

Moreover, there are exactly \(k - k'\) vertices which are contained in none of \(V_1, \dots , V_{k'}\) as they form their own districts:

$$\begin{aligned} \sum _{T \in {\mathcal {T}}} \left( n_T - \sum _{i \in [k']} x_{i, T} \right) = k - k'. \end{aligned}$$

For \(i\in [k']\) and \(c\in C\), let \(n_{i, c}\) be the number of vertices of color c in \(V_i\):

$$\begin{aligned} n_{i, c} = \chi _{c}(S_i) + \sum _{T \in {\mathcal {T}} \text { with } T = (c, X) \text { for some }X\subseteq S} x_{i, T}. \end{aligned}$$

For every \(i \in [k']\), the following two constraints will ensure that the district \(V_i\) is \(\ell\)-fair.

$$\begin{aligned} n_{i,c_i'} \le n_{i, c_i} \le n_{i, c_i'} + \ell , \text { and } n_{i, c_i'} \ge n_{i, c} \text { for all } c \in C \setminus \{ c_i \}. \end{aligned}$$

Finally, for every \(i \in [k']\), the following imposes the size constraints:

$$\begin{aligned} s_{\textrm{min}}\le \sum _{c \in C} n_{i, c} \le s_{\textrm{max}}. \end{aligned}$$

For the running time, observe that we construct at most \(2^{{\mathcal {O}}({{\,\textrm{vcn}\,}}^2)} \cdot \vert C \vert ^{{{\,\textrm{vcn}\,}}}\) ILP instances. Since each ILP instance uses at most \({{\,\textrm{vcn}\,}}\cdot 2^{{{\,\textrm{vcn}\,}}} \cdot \vert C \vert\) variables, it can be solved in \(f({{\,\textrm{vcn}\,}}+ \vert C \vert ) \cdot n^{{\mathcal {O}}(1)}\) time for some computable function f due to a result of Lenstra [30, 36]. \(\square\)
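To make the constraints concrete, the following Python sketch checks whether a given assignment of values to the variables \(x_{i, T}\) satisfies all of them for one fixed combination of guesses. It is a feasibility checker under stated assumptions, not a solver and not Lenstra's algorithm: the function name and all parameter names are hypothetical, districts are indexed from 0, types are pairs of a color and a frozenset of cover vertices, and all values of x are assumed to be non-negative integers.

```python
def satisfies_ilp(x, n_T, parts, required, top_two, cover_color, colors,
                  k, ell, s_min, s_max):
    """x[i][T]: number of type-T vertices placed in district i.
    parts[i]: the set S_i; required[i]: the guessed types T_i;
    top_two[i]: the guessed pair (c_i, c_i')."""
    k_prime = len(parts)
    types = list(n_T)
    used = {T: sum(x[i].get(T, 0) for i in range(k_prime)) for T in types}
    # At most n_T vertices of each type are distributed.
    if any(used[T] > n_T[T] for T in types):
        return False
    # Exactly k - k' vertices remain and form districts of their own.
    if sum(n_T[T] - used[T] for T in types) != k - k_prime:
        return False
    for i in range(k_prime):
        # Every guessed connector type occurs at least once in district i.
        if any(x[i].get(T, 0) < 1 for T in required[i]):
            return False
        # Only types adjacent to S_i may be used in district i.
        if any(x[i].get(T, 0) > 0 and not (T[1] & parts[i]) for T in types):
            return False
        # n_{i,c}: color counts from S_i plus the assigned outside vertices.
        count = {c: sum(1 for v in parts[i] if cover_color[v] == c) for c in colors}
        for (c, _), amount in ((T, x[i].get(T, 0)) for T in types):
            count[c] += amount
        first, second = top_two[i]
        # Fairness constraints.
        if not (count[second] <= count[first] <= count[second] + ell):
            return False
        if any(count[c] > count[second] for c in colors if c != first):
            return False
        # Size constraints.
        if not (s_min <= sum(count.values()) <= s_max):
            return False
    return True
```

Such a checker is mainly useful for testing an ILP model on small instances before handing it to an actual ILP solver.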

It remains open whether FCD is fixed-parameter tractable or W[1]-hard with respect to \({{\,\textrm{vcn}\,}}\). However, we can prove fixed-parameter tractability with respect to the number of vertices with degree at least two. Neglecting connected components consisting of two vertices, the set of vertices with degree at least two is always also a vertex cover; therefore, this parameter upper-bounds the vertex cover number in connected graphs on at least three vertices. The idea of the algorithm is to guess the partitioning of the vertices with degree at least two into the districts. Then, for each of these districts, there exists a set of degree-one vertices where each of these vertices either belongs to the district or needs to form its own. Lastly, we can distribute the degree-one vertices using Proposition 4.

Proposition 7

Let \(p\) be the number of vertices with degree at least two. FCD is solvable in \({\mathcal {O}}(p^{p}\cdot k \cdot n)\) time.

For the sake of simplicity, we assume that the graph does not contain connected components of size two (we can deal with them easily separately). Let \(X\subseteq V\) be the set of vertices with degree at least two and \(Y=V\setminus X\) be the set of degree-one vertices. We iterate over all combinations of the following:

  • An integer \(k'\le k\). If \(\ell =0\) or \(s_{\textrm{min}}> 1\), then we only consider \(k'=k\).

  • A partition of X into \(k'\) non-empty subsets \(X_1,\dots , X_{k'}\). There are at most \(p^{p}\) such partitions.

For \(i\in [k']\), let \(Y_i\subseteq Y\) be the set of vertices from Y that are adjacent to a vertex from \(X_i\). If \(\ell =0\) or \(s_{\textrm{min}}> 1\), then \(k' = k\) and hence we accept if (\(X_1\cup Y_1,\dots , X_k\cup Y_k\)) is a solution and otherwise continue with the next combination.

Otherwise, we put all vertices from \(X_i\) in district \(V_i\). Each vertex \(v\in Y_i\) is either part of \(V_i\) or forms its own district. Thereby, for each \(i\in [k']\), we know by applying Proposition 4 an interval \([\alpha _i,\beta _i]\) for the number of \(\ell\)-fair districts respecting the size constraints into which the vertices \(X_i\cup Y_i\) can be partitioned or we know that no solution exists (in this case, we continue with the next combination). We now check whether the given k lies between \(\sum _{i\in [k']} \alpha _i\) and \(\sum _{i\in [k']} \beta _i\) and return the answer.

As there exist at most \(p^{p}\) partitions of X, the running time of the algorithm is \({\mathcal {O}}(p^{p} \cdot k\cdot n)\).

7 Conclusion

We initiated a thorough study of the NP-hard Fair Connected Districting (FCD) problem. We considered FCD on specific graph classes and analyzed the parameterized complexity of FCD with a focus on the number of districts, the number of colors, and various graph parameters. We have shown that while FCD can be solved on simple graph classes in polynomial time (mostly using approaches based on dynamic programming), on trees it is already NP-hard and W[1]-hard with respect to the combined parameter number of colors plus number of districts. Nevertheless, for graph parameters such as the vertex cover number and the max leaf number, we developed XP-algorithms.

As to challenges for future research, we left open whether FCD is fixed-parameter tractable or W[1]-hard with respect to the vertex cover number or max leaf number. The former question is particularly intriguing because it is related to an open question of Stoica et al. [48] on the parameterized complexity of Fair Regrouping with respect to the number of districts (which has been very recently partly solved by Boehmer and Koana [10]): Recall that the input of Fair Regrouping is additionally endowed with a function \(f :V \rightarrow 2^{[k]}\) with the requirement that every vertex \(v \in V\) should belong to district \(V_i\) for some \(i \in f(v)\) (and that the connectivity requirement is dropped). Consider the FCD instance on the bipartite graph with bipartition (V, [k]), where there is an edge between \(i \in [k]\) and \(v \in V\) if and only if \(i \in f(v)\). The vertex cover number of this graph is at most the number k of districts. Under certain constraints, FCD on this graph becomes essentially equivalent to Fair Regrouping. Thus an algorithm for the vertex cover number for FCD would also shine further light on the complexity of Fair Regrouping with respect to the number of districts. The latter question concerning the max leaf number is interesting because FCD could turn out to be one of few problems that are in XP yet W[1]-hard with respect to the max leaf number. It would be also promising to consider additional graph parameters (such as cliquewidth, treedepth or sparsity related parameters like the maximum degree or the degeneracy) or to examine further graph classes (such as grids).

From a broader perspective, there are several natural extensions of FCD. For instance, as already suggested by Stoica et al. [48], there may be a function that specifies for each vertex a set of districts to which the vertex can belong (see Footnote 6). While all our hardness results still hold for this more complicated setting, it is open which of our algorithmic results can be adapted. Moreover, we did not study the generalization of FCD where each vertex has an integer weight for each color (such a study has been done in the context of gerrymandering by Cohen-Zemach et al. [15] and Ito et al. [29]). Weighted FCD is motivated by the following application scenarios: First, in some settings we might be restricted to put a certain group of agents always in the same district (those can be combined into one vertex). Second, if each vertex represents a voter and the colors represent alternatives, then voters might want to give points to different alternatives (such as done in the context of positional scoring rules).

Lastly, one can modify the definition of FCD. For instance, as done, among others, by Lewenberg et al. [38] and Eiben et al. [20] in the context of Gerrymandering and, among others, by Lu et al. [39] and Fotakis and Tzamos [24] in the context of facility location problems, instead of placing agents on a social network, the agents may be placed in a metric space. The task is then to place k ballot boxes in the space where each agent is assigned to the closest ballot box. The goal is again to make the resulting districts as fair as possible.