Mixed-integer programming techniques for the connected max-k-cut problem

We consider an extended version of the classical Max-k-Cut problem in which we additionally require that the parts of the graph partition are connected. For this problem we study two alternative mixed-integer linear formulations and review existing as well as develop new branch-and-cut techniques like cuts, branching rules, propagation, primal heuristics, and symmetry breaking. The main focus of this paper is an extensive numerical study in which we analyze the impact of the different techniques for various test sets. It turns out that the techniques from the existing literature are not sufficient to solve an adequate fraction of the test sets. However, our novel techniques significantly outperform the existing ones both in terms of running times and the overall number of instances that can be solved.


Introduction
In this paper we study a special version of the graph partitioning problem in which all parts of the partition have to be connected. The objective is to maximize the number of edges between the parts. We call it the connected Max-k-Cut problem or C-Max-k-Cut for short.
On the one hand, Max-k-Cut is a classical graph theoretical problem and on the other hand, connectivity is a commonly studied restriction for graph problems; see also Sect. 1.1. Furthermore, our reason to study the C-Max-k-Cut stems from an application: it is a subproblem in computing a market splitting for electricity markets; see, e.g., [3,21,22,34]. The C-Max-k-Cut problem is NP-hard (see Sect. 2), and it is also hard to solve with out-of-the-box mixed-integer linear programming (MILP) solvers; see Sect. 8. One reason for the latter is the strong inherent symmetry of the problem. Hence, we focus on developing specialized branch-and-cut techniques to speed up the solution process.
C-Max-k-Cut emerges in other applications as well. For example, forest planning problems are studied in [8], where an important constraint is that, for a number of different planning problems, old tree populations must stay connected. Another related problem is that of finding a coloring of a graph such that each color induces a connected subgraph. However, in this context typically other objectives are used than the one described above. A further application is phylogenetics, where phylogenetic trees are used to model the relationship between different species based upon their similarities and differences [37]. Biological constraints for the characteristics then demand connectivity [10]. Additionally, image segmentation can be done via graph cuts. In this field, imposing connectivity constraints also seems to be beneficial [49].
In the last couple of years, the focus on Max-k-Cut was on approximation algorithms, especially using (complex) semidefinite programming; see, e.g., [19,42].
In contrast, our interest is in exact methods that deliver globally optimal solutions for C-Max-k-Cut. However, due to both the theoretical and computational hardness of the problem, out-of-the-box MILP solvers typically cannot produce such solutions within an acceptable amount of time. Our contribution is the following. We augment a general-purpose MILP solver by (i) tweaking known techniques from the existing literature for related problems and by (ii) developing new MILP-techniques that are tailored for the C-Max-k-Cut problem. Regarding the adapted techniques from the literature, we consider cutting planes for MaxCut and Max-k-Cut, standard concepts for primal heuristics like rounding heuristics, and symmetry handling. For a general treatment of such MILP-techniques we refer to [29]. Up to this point, the connectivity constraints are not yet exploited. Hence, we also develop methods specifically for C-Max-k-Cut, namely branching rules, a propagation algorithm, and further cuts. As it turns out, the novel techniques presented in this paper yield a tailored branch-and-cut algorithm that significantly outperforms the method obtained by using the techniques from the existing literature.
In the remainder of this section, we further discuss related literature. Afterward, we give the main definitions and two MILP models, a flow- and a cut-based formulation, for the C-Max-k-Cut problem in Sect. 2. In Sects. 3-6, we describe the details of our tailored MILP-techniques regarding cuts, propagation, branching rules, and primal heuristics. Another focus is on symmetry breaking, which is discussed in Sect. 7. The computational results are presented in Sect. 8 and we give our conclusions in Sect. 9.

Related literature
Especially for k = 2, C-Max-k-Cut has been studied before. In [23], the authors show how the connectivity constraints may change MaxCut: the fraction of the edges in the cut can be arbitrarily close to zero. Furthermore, they prove that C-Max-2-Cut is NP-hard even for planar graphs. In [24], NP-hardness of a slightly different problem is shown. The authors consider MaxCut where only one side has to be connected. For this problem, they present an Ω(1/log n)-approximation algorithm. Moreover, for bounded-genus graphs, the existence of a (1/2 − ε)-approximation algorithm for fixed ε is shown in [36]. For series-parallel graphs, a linear-time algorithm for C-Max-2-Cut is presented in [9].
Coming from the application of phylogenetic trees, [10] describes another interesting approach. They search for a "convex coloring", where convex means that the induced subgraph for each color is connected. After starting with a partial coloring of the nodes, they try to find a minimum number of nodes that must be recolored so that a convex coloring can be obtained. Due to their application, the authors' study is restricted to trees. C-Max-k-Cut with a nonlinear objective function is considered in [44], where different local search methods are used for solving problems from political districting.
As mentioned above, connectivity constraints arise in a number of different graph problems. We now discuss a small selection of them. In [30] sufficient conditions are derived under which a connected partition of the edges exists such that every part has the same number of edges. The general modeling of connectivity constraints is discussed in [17]. The authors compare a compact flow formulation with a formulation using an exponential number of cuts, which can be separated efficiently. Their suggestion is a formulation with variables on nodes and constraints on node-cut sets. Furthermore, they show how to strengthen the cut inequalities. The polyhedral properties of the connectivity cuts are analyzed in [50]. A different approach is to use edge variables and model connectivity with a tailored objective function. We refer to the survey article [51] for this case, where the connection to Steiner trees is discussed as well.
In [8], the above mentioned forest planning is studied. However, the main focus is on connected subgraphs instead of graph partitioning. The connectivity constraints only use node variables and special node-cut sets.
The related connected subgraph problem is considered in [14], where a maximum-weight connected subgraph with some additional characteristics is searched. For example, this has an application in wildlife conservation. To model the connectivity, the authors use single- and multi-commodity flow techniques as well as directed Steiner trees. The multi-commodity formulation is larger than the single-commodity formulation and the authors observe that the larger formulation results in a stronger LP relaxation of the problem. Moreover, reduction techniques and heuristics for the maximum connected subgraph problem are investigated in [43].
Finally, in [46], the hardness of C-Max-2-Cut with the additional requirement that the supplies in both parts are balanced is analyzed. This has an application in power grid islanding, which is used to mitigate cascading failures.

Problem statement
Let G = (V, E) be an undirected, simple, and connected graph and let k ≥ 2 be an integer. The connected Max-k-Cut problem can then be formulated as the following mixed-integer linear problem:

max Σ_{{u,v}∈E} w_uv y_uv (1a)
s.t. Σ_{i∈[k]} x_vi = 1 for all v ∈ V, (1b)
y_uv ≥ x_ui − x_vi for all {u,v} ∈ E, i ∈ [k], (1c)
y_uv ≥ x_vi − x_ui for all {u,v} ∈ E, i ∈ [k], (1d)
y_uv ≤ 2 − x_ui − x_vi for all {u,v} ∈ E, i ∈ [k], (1e)
x_vi ∈ {0,1} for all v ∈ V, i ∈ [k], (1f)
y_uv ∈ [0,1] for all {u,v} ∈ E, (1g)
x ∈ C. (1h)
Here and in what follows, the abstract constraint x ∈ C in (1h) models connectivity of the parts of the partition that is given by the vector x = (x_vi)_{v∈V, i∈[k]}. The details are discussed below. The binary variables x_vi model whether vertex v is located in part i of the partition (x_vi = 1) or not (x_vi = 0). Furthermore, cut edges {u, v} have y_uv = 1, and y_uv = 0 holds if and only if the vertices u and v are located in the same part. The objective function aims to maximize the total weight w^T y of the cut. This problem is NP-hard. Considering the weighted case, the hardness follows since our problem contains MaxCut (k = 2), which is NP-hard even for the complete graph [33] (both for positive and for free weights), where the connectivity is trivially fulfilled. For the unweighted case, i.e., w_uv = 1 for all {u, v} ∈ E, the hardness is shown in [23] even for planar graphs.
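The coupling between the x- and y-variables can be illustrated with a small sketch (plain Python, illustrative names): given an assignment of vertices to parts, y_uv takes value 1 exactly if u and v lie in different parts, and the objective is the total weight of these cut edges.

```python
def cut_value(edges, weights, part):
    """Objective value w^T y induced by a partition.

    edges:   list of (u, v) tuples
    weights: dict mapping (u, v) -> w_uv
    part:    dict mapping vertex -> part index (part[u] = i models x_ui = 1)
    """
    total = 0
    for (u, v) in edges:
        y_uv = 1 if part[u] != part[v] else 0  # cut edge iff different parts
        total += weights[(u, v)] * y_uv
    return total

# Path 1-2-3-4 split into parts {1, 2} and {3, 4}: only edge (2, 3) is cut.
edges = [(1, 2), (2, 3), (3, 4)]
weights = {e: 1 for e in edges}
part = {1: 0, 2: 0, 3: 1, 4: 1}
```

Both parts of this example partition are connected, so the abstract constraint x ∈ C is satisfied.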
If all weights are positive and we minimize the objective instead of maximizing it, the connectivity constraint is not relevant: suppose there is a solution in which a part decomposes into two connected components V_1 and V_2. Then there must be (at least) two edges connecting V_1 and V_2 to other parts, since we assume that the graph is connected. Hence, there are fewer edges in the cut if we add V_1 or V_2 to another part, i.e., we obtain a better solution. Therefore, the minimization version is equivalent to the min-k-cut problem, which is known to be NP-hard for arbitrary k and solvable in O(|V|^{k^2}) time for constant k [20]. If the weights are allowed to be negative, we can transform every minimization instance into the corresponding C-Max-k-Cut instance with negated objective function. Hence, this version is also NP-hard.
In this paper, we discuss two different possibilities to model the connectivity requirement of Constraint (1h): a flow and a cut formulation. For the former, we follow the modeling of [48] and [21,34] that works on the bi-directed graph G := (V, A) with arc set A consisting of arcs a_e^1 = (u, v) and a_e^2 = (v, u) for all e = {u, v} ∈ E. Moreover, we introduce flow variables f_uv for every arc (u, v) ∈ A. Connectivity of the parts of the partition can then be modeled using the constraints

Σ_{u∈V} z_ui = 1 for all i ∈ [k], (2a)
z_ui ≤ x_ui for all u ∈ V, i ∈ [k], (2b)
f_uv + f_vu ≤ M(1 − y_uv) for all {u, v} ∈ E, (2c)
Σ_{a∈δ^out(u)} f_a − Σ_{a∈δ^in(u)} f_a ≥ 1 − M Σ_{i∈[k]} z_ui for all u ∈ V, (2d)

where we use the standard δ-notation for in- and outgoing arcs of a node u.
Here and in what follows, M is a sufficiently large constant. In our specific setting, we can choose M = |V| − k + 1. The idea of the flow formulation is to declare one node of a part as an "artificial sink" and to route flow from every other node of the part to the sink by using only noncut edges to ensure connectivity. The newly introduced variables z_ui model whether vertex u is the artificial sink of part i. Constraints (2a) and (2b) state that every part has exactly one sink. Constraint (2c) models that flow is only allowed on arcs that are not in the cut. Finally, Constraint (2d) models that if u is not an artificial sink, it has to have a net outflow of at least 1; otherwise, the constraint is redundant. Note that Constraints (2a) and (2b) ensure that every part i ∈ [k] contains at least one vertex. Thus, every part is connected and non-empty.
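The routing idea behind the flow formulation can be sketched as follows: for a connected part with a chosen sink, sending the size of each BFS subtree along the tree arcs yields a feasible flow in which every non-sink node has net outflow 1 and no arc carries more than |V| − k + 1 units. This is an illustrative sketch with assumed names, not the formulation itself.

```python
from collections import deque

def tree_flow(adj, part_nodes, sink):
    """Feasible flow for one part: route one unit from every non-sink node of
    the part to the sink along a BFS tree using only edges inside the part.
    Returns a dict (u, v) -> flow on arc u -> v."""
    parent = {sink: None}
    order = [sink]
    dq = deque([sink])
    while dq:
        u = dq.popleft()
        for v in adj[u]:
            if v in part_nodes and v not in parent:
                parent[v] = u
                order.append(v)
                dq.append(v)
    flow = {}
    subtree = {u: 1 for u in order}          # each node ships its own unit
    for u in reversed(order):                # process leaves first
        if parent[u] is not None:
            flow[(u, parent[u])] = subtree[u]
            subtree[parent[u]] += subtree[u]
    return flow
```

On a path 1-2-3 forming one part with sink 1, the arc (2, 1) carries two units (its own and that of node 3); since every part has at most |V| − k + 1 nodes, such a flow never exceeds M.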
Besides the flow formulation (2) further approaches to enforce connectivity of the parts of the partition are discussed in the literature. On the one hand, one can model connectivity via multi-commodity flow formulations; see, e.g., [14,21,34]. On the other hand, connectedness can also be modeled directly without using additional variables but by using additional constraints [8]. In this paper, we combine the concepts of adding additional variables and additional constraints to enforce connectivity. We decide to use the single-commodity flow formulation (2) instead of the multi-commodity flow formulation because both formulations model connectivity, but the single-commodity version introduces fewer auxiliary variables.
Next, we introduce the cut formulation, for which we need the concept of induced subgraphs. Given a graph G = (V, E) and a vertex subset V' ⊆ V, the induced subgraph G[V'] consists of the vertices V' and all edges of E with both endpoints in V'. A set S ⊆ V is called a (u, r)-separator for two distinct vertices u, r ∈ V if u and r lie in different connected components of G[V\S]; the separator is minimal if no proper subset of S is a (u, r)-separator. Using this definition, connectivity can also be modeled by imposing the inequalities

x_ui + x_ri − Σ_{w∈S} x_wi ≤ 1 for all i ∈ [k], all distinct u, r ∈ V, and all (u, r)-separators S. (4)

Moreover, only imposing the minimal separators suffices, since they dominate the remaining separator inequalities. The latter inequalities can be separated by maximum-flow calculations, but the authors of [16] refrain from using this procedure because it is too time consuming.
As we want to partition the graph into exactly k parts, we also add the non-emptiness constraints

Σ_{u∈V} x_ui ≥ 1 for all i ∈ [k].

Note that these are not necessary if we assume all edge weights to be strictly positive or if we use the flow formulation due to (2a) and (2b).

Cuts
In this section, we discuss cuts for the considered problem. We start with the novel ones derived from the connectivity condition (Sects. 3.1-3.3) and then proceed with a review of known cuts from the MaxCut and Max-k-Cut literature (Sects. 3.4-3.6). For an overview of cuts for Max-k-Cut see, e.g., [4]. Furthermore, we state some implementation details, e.g., about separation routines, and give a brief comparison in Table 1 at the end of this section. Later, in Sect. 8, we evaluate the computational gain of the presented cuts in a detailed numerical study.

Articulation-vertex cuts
To state the first cut, we first have to give another definition.

Definition 2
An articulation vertex is a vertex u ∈ V such that the graph without u is no longer connected, i.e., G[V \{u}] decomposes into at least two components. We denote the set of all articulation vertices by A.
In particular, every articulation vertex u yields a minimal separator {u}. For an articulation vertex u, we denote the resulting c connected components by V_j^u, j ∈ [c], and we identify a component with its vertex set. Let ℓ and r be two vertices in different components. Then the following special case of (4) holds:

x_ℓi + x_ri − x_ui ≤ 1 for all i ∈ [k]; (5)

see Fig. 1 (left). In principle, one can add all articulation-vertex cuts at the beginning of the branch-and-bound algorithm. However, our numerical experiments revealed that for many graphs of our test sets there are simply too many of them for this to be efficient. Note that for separating these cuts it suffices to find a violated inequality for the maximal values x_ℓi and x_ri, respectively, which has a running time of O(k|A||V|) after a preliminary calculation of the articulation vertices and the corresponding connected components. Testing all possible inequalities of type (5) is thus possible in O(k|A||V|^2). This procedure turned out to be beneficial in our case and was implemented in all cases where separator inequalities are used.
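The preliminary calculation of articulation vertices can be done in linear time with the classical lowpoint (Hopcroft-Tarjan) algorithm. A possible iterative sketch, assuming an adjacency-list interface:

```python
def articulation_vertices(adj):
    """Articulation vertices of an undirected graph via DFS lowpoints.
    adj: dict vertex -> list of neighbours (simple graph)."""
    disc, low = {}, {}
    art = set()
    timer = 0
    for root in adj:
        if root in disc:
            continue
        disc[root] = low[root] = timer; timer += 1
        stack = [(root, None, iter(adj[root]))]
        root_children = 0
        while stack:
            u, par, it = stack[-1]
            advanced = False
            for v in it:
                if v == par:
                    continue            # skip the tree edge back to the parent
                if v in disc:
                    low[u] = min(low[u], disc[v])   # back edge
                else:
                    disc[v] = low[v] = timer; timer += 1
                    stack.append((v, u, iter(adj[v])))
                    advanced = True
                    break
            if not advanced:
                stack.pop()
                if par is not None:
                    low[par] = min(low[par], low[u])
                    if par == root:
                        root_children += 1
                    elif low[u] >= disc[par]:
                        art.add(par)    # no back edge above par from u's subtree
        if root_children >= 2:
            art.add(root)
    return art
```

For the path 1-2-3, the middle vertex is the only articulation vertex; a triangle has none.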
Since there were initially too many of these cuts, we tried an even more restricted form of cuts, which is explained next. To this end, we denote for a vertex v of an undirected graph the set of incident edges by δ(v). The set of all neighbors of v, i.e., the nodes that are adjacent to v, is denoted by N (v).

Leaf cuts
Let ℓ ∈ V be a leaf of G, i.e., |δ(ℓ)| = 1. Furthermore, let u be the unique neighbor of ℓ, i.e., N(ℓ) = {u}. This, in particular, means that u is an articulation vertex and {u} a minimal separator. Hence, the following inequality holds for every vertex r ∈ V\{ℓ, u} and every i ∈ [k]:

x_ℓi + x_ri − x_ui ≤ 1;

see also Fig. 1 (right) for an illustration. As for the articulation-vertex cuts, the number of these inequalities is still too large for many of our examples, because we have to consider for each part all pairs of leaves and (basically) all remaining nodes of G. We added another separation routine for these inequalities, which iterates over all possible leaves and checks whether the leaf cut is violated. This routine has a running time of O(k|V|^2).

Bounded-edge cuts
A simple cut involving all y-variables is given by

Σ_{{u,v}∈E} y_uv ≤ |E| − |V| + k. (6)

This inequality is valid since

Σ_{{u,v}∈E} y_uv ≤ |E| − Σ_{i∈[k]} (n_i − 1) = |E| − |V| + k,

where n_i is the number of nodes in part i of the partition. Observe that the first estimation is valid because each connected part contains at least n_i − 1 edges. The last equality follows from Constraint (1b). One drawback of (6) might be that all edge variables are involved; a cut linking decisions about fewer edges could be more significant. Hence, if we can identify certain subsets of vertices in which fewer than k different parts are possible, this might lead to stronger cuts. To this end, we want to utilize the additional information from vertices that are already assigned to a certain part. This means going from the globally valid cut (6) to locally valid cuts, i.e., cuts that are only valid in certain subtrees of the branch-and-bound tree. Herewith, we can generalize the idea of (6): We denote by F_i := {u ∈ V : x_ui = 1} the set of vertices that are fixed to be in part i of the partition; see also Fig. 2. Let G̃_i be the graph defined by removing all nodes that are fixed to part i and let C_1, …, C_q be the connected components of G̃_i. For ℓ ∈ [q], let k_ℓ denote the number of parts that contain at least one node from a connected component C_r, r ≠ ℓ.
Since we know that if a node from C_r is in part j, no node u ∈ C_ℓ, r ≠ ℓ, can be in part j, we can extend (6) to

Σ_{{u,v}∈E[C_ℓ]} y_uv ≤ |E[C_ℓ]| − |C_ℓ| + k − k_ℓ (7)

for every ℓ ∈ [q]. Note that (7) is (6) applied to the subgraph induced by C_ℓ with the additional subtraction of k_ℓ. In particular, if k = 2, the global and local cuts coincide: If there exists r ≠ ℓ such that a node u ∈ C_r is in part j, then no node in C_ℓ can be contained in part j. Hence, all nodes in C_ℓ can be assigned to the other part and every y_uv with {u, v} ∈ E[C_ℓ] has value 0. Consequently, the global bounded-edge cut reduces to the local cut on E[C_ℓ].
Note that the global cut is completely dense, while the local cuts may be much sparser if E[C_ℓ] is small. Thus, because dense constraints may slow down the solution process [40], it might be favorable to separate the local cuts instead of adding the global cut initially. These cuts can be separated by computing the graph G̃_i as well as the corresponding values k_ℓ, which can be done in O(k(|V| + |E|)) time, and adding the locally valid cut if (7) is violated.
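A quick consistency check of the global bounded-edge cut, written with unit weights as Σ y_uv ≤ |E| − |V| + k in line with the counting argument above, might look as follows (illustrative names):

```python
def bounded_edge_sides(edges, part, n_vertices, k):
    """Left- and right-hand side of the global bounded-edge cut:
    number of cut edges vs. |E| - |V| + k (unit weights)."""
    lhs = sum(1 for (u, v) in edges if part[u] != part[v])
    rhs = len(edges) - n_vertices + k
    return lhs, rhs

# 4-cycle split into two connected parts {1, 2} and {3, 4}: the cut
# contains exactly the edges (2, 3) and (4, 1), so the bound is tight.
lhs, rhs = bounded_edge_sides([(1, 2), (2, 3), (3, 4), (4, 1)],
                              {1: 0, 2: 0, 3: 1, 4: 1}, 4, 2)
```

The 4-cycle example shows that the inequality can hold with equality, so it cannot be strengthened in general.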

Odd-cycle cuts
In the remainder of this section we review some known cuts from the literature that we later also use to enhance our branch-and-cut framework.
For k = 2, any cycle crosses a cut in the graph an even number of times, which implies that the so-called odd-cycle cuts [5] are valid:

Σ_{e∈F} y_e − Σ_{e∈Z\F} y_e ≤ |F| − 1 for all cycles Z ⊆ E and all F ⊆ Z with |F| odd.

We separate these inequalities in polynomial time using a shortest-path calculation in an auxiliary graph G'; see [5]. Let G' = (V', E') be derived from G = (V, E) by having two nodes u' and u'' for each u ∈ V. Every edge {u, v} ∈ E gives rise to two edges {u', v'} and {u'', v''} in E' with weight y_uv and two edges {u', v''} and {u'', v'} with weight 1 − y_uv. For every vertex u ∈ V, we calculate a shortest path from u' to u''. The minimum length over all these paths gives rise to the required cycle. The running time of this separation procedure is O(|V|(|V| log|V| + |E|)) by using Dijkstra's algorithm.
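The shortest-path separation can be sketched directly from the auxiliary-graph description: assuming vertices 0, …, n−1, copy u'' is represented by index u + n, and a path of length below 1 from u' to u'' certifies a violated odd-cycle inequality (illustrative code):

```python
import heapq

def separate_odd_cycle(n, edges, y):
    """Odd-cycle separation via shortest paths in a doubled auxiliary graph.
    edges: list of (u, v) pairs with u, v in 0..n-1; y: dict edge -> value.
    Same-copy edges get weight y_uv, cross-copy edges weight 1 - y_uv; a path
    from u to u + n crosses between the copies an odd number of times."""
    adj = [[] for _ in range(2 * n)]
    for (u, v) in edges:
        w_same, w_cross = y[(u, v)], 1.0 - y[(u, v)]
        adj[u].append((v, w_same)); adj[v].append((u, w_same))
        adj[u + n].append((v + n, w_same)); adj[v + n].append((u + n, w_same))
        adj[u].append((v + n, w_cross)); adj[v + n].append((u, w_cross))
        adj[v].append((u + n, w_cross)); adj[u + n].append((v, w_cross))
    best = float("inf")
    for s in range(n):                       # Dijkstra from s to s + n
        dist = [float("inf")] * (2 * n)
        dist[s] = 0.0
        pq = [(0.0, s)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(pq, (dist[v], v))
        best = min(best, dist[s + n])
    return best < 1.0 - 1e-9                 # some odd-cycle cut is violated?
```

On a triangle with all y-values equal to 1 the inequality with F = Z is violated and a zero-length path exists, whereas y ≡ 0.5 satisfies all odd-cycle cuts.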

Triangle cuts
For k = 2, no cut can contain exactly one edge from any triangle. Therefore, if the nodes u, v, w form a triangle, the following cuts are also valid [11]:

y_vw − y_wu − y_uv ≤ 0,
−y_vw + y_wu − y_uv ≤ 0,
−y_vw − y_wu + y_uv ≤ 0.

We separate these cuts by first calculating all triangles in G and then iterating over all of them to check if there is a violated one.

Clique cuts
For any subset V' ⊆ V with |V'| = k + 1 vertices, at least two vertices have to be in the same part. Hence, if G[V'] forms a clique, at least one edge is not in the cut. Thus, the so-called clique inequalities hold [11]:

Σ_{{u,v}⊆V'} y_uv ≤ (k + 1)k/2 − 1.

To separate these cuts, we compute, at the beginning of the branch-and-cut algorithm, a list containing all cliques of size k + 1 by a brute-force method, which runs in O(|V|^{k+1}) time. In each separation round of the branch-and-cut method, we iterate over the list of cliques and check whether a corresponding inequality is violated. Since the initialization is very costly and the size of the list may grow exponentially in k, we use this routine only for k ≤ 3. Table 1 summarizes the cutting planes discussed in this section. The column "cutting plane" contains the names of the considered cuts and column "k" specifies for which values of k the corresponding cutting planes are defined. Column "separation complexity" reports the complexity of separating the cuts and column "reference" provides references for the discussed families of inequalities.

Propagation algorithm
In this section we state an algorithm for propagating the information of vertices that are already assigned to a part of the partition to non-assigned ones. Hence, we consider the situation in which we are given a partial solution x, i.e., some variables are set to 1, some to 0, and the remaining ones are still fractional or unfixed. Given a partial solution, Algorithm 1 assigns vertices to a certain part if this assignment is necessary for connectedness. Moreover, if an unassigned vertex cannot be reached from the vertices of part i, then that vertex cannot be in this part. In the following, we denote by F_i := {u ∈ V : x_ui = 1} the set of vertices that are fixed to be in part i of the partition.
The first aim of Algorithm 1 is to detect vertices u ∈ V that cannot be assigned to a part i. If such a vertex u is detected, the algorithm sets x_ui = 0. To find such vertices for part i, we remove all nodes from G that have already been assigned to a part i' ≠ i, which results in a graph G̃ (Line 2). Afterward, we select a vertex q ∈ F_i and compute all nodes in G̃ that are reachable from q by a breadth-first search (BFS). Let T be the obtained BFS tree. If there exists a vertex u in G̃ that is not contained in T, every path from q to u in G has to traverse a vertex assigned to another part. Hence, assigning both q and u to part i results in a disconnected part. Consequently, if there exists a vertex assigned to part i that is not contained in T, the partial solution cannot be extended to a connected solution, i.e., it is infeasible; see Line 4. Otherwise, x_ui can be set to 0 for all unassigned vertices in G̃ that are not in T; see Line 5.
The second idea incorporated in Algorithm 1 is to assign a vertex u to part i if every extension of the partial solution satisfies x_ui = 1. This is the case if there exists an unassigned articulation vertex u in G̃ that separates two vertices contained in F_i. Such articulation vertices need to be assigned to part i as well in order to ensure connectedness; see Line 8.

Algorithm 1 Propagation.
Input: A graph G = (V, E), a partial solution x, and the corresponding sets F_i
1: for all i ∈ [k] do
2:  Remove from G all nodes assigned to a part i' ≠ i; call the resulting graph G̃.
3:  Let q be some arbitrary vertex in F_i and set T ← BFS-T(G̃, q).
4:  if there exists u ∈ F_i\T then partial solution is infeasible, terminate
5:  for all unassigned u ∉ T do x_ui = 0
6:  Compute the set of all articulation vertices A of G̃.
7:  for all vertices u ∈ A with u ∉ F_i do
8:   if u separates two vertices from F_i then x_ui = 1
9: function BFS-T(G, u)
10:  Compute a spanning tree T of G starting from u using BFS.
11:  return T

The running time of this algorithm is O(k|V||E|), which can be seen as follows: Line 1 needs O(k) time. The part inside the for-loop is dominated by the inner for-loops over the articulation vertices. Computing them in Line 6 can be done in O(|V| + |E|) time by one breadth-first-search call and some linear checking [26]. Iterating over the articulation vertices in Line 7 takes O(|V|) time, and checking whether a vertex separates parts containing at least one assigned vertex in Line 8 can be realized by, e.g., one BFS call and linear checking in O(|V| + |E|) time.
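The two propagation ideas can be condensed into a short routine. For brevity, the sketch below tests each unassigned vertex by removal instead of computing all articulation vertices at once, which is asymptotically worse but easier to read; names and interfaces are illustrative.

```python
from collections import deque

def propagate(adj, k, fixed):
    """One round of BFS-based propagation on a partial solution.

    adj:   dict vertex -> set of neighbours
    fixed: dict vertex -> part index (the partial solution)
    Returns (feasible, forbidden, forced): forbidden collects pairs (u, i)
    meaning x_ui = 0, forced maps u -> i for newly forced x_ui = 1."""
    forbidden, forced = set(), {}
    for i in range(k):
        keep = {u for u in adj if u not in fixed or fixed[u] == i}
        F_i = [u for u in keep if fixed.get(u) == i]
        if not F_i:
            continue
        # BFS tree from some q in F_i inside the reduced graph
        seen = {F_i[0]}
        dq = deque([F_i[0]])
        while dq:
            u = dq.popleft()
            for v in adj[u]:
                if v in keep and v not in seen:
                    seen.add(v)
                    dq.append(v)
        if any(u not in seen for u in F_i):
            return False, forbidden, forced    # part i cannot stay connected
        for u in keep:
            if u not in seen and u not in fixed:
                forbidden.add((u, i))          # u unreachable from part i
        # force unassigned vertices whose removal separates two fixed vertices
        for u in keep:
            if u in fixed or len(F_i) < 2:
                continue
            seen2 = {F_i[0]}
            dq = deque([F_i[0]])
            while dq:
                w = dq.popleft()
                for v in adj[w]:
                    if v in keep and v != u and v not in seen2:
                        seen2.add(v)
                        dq.append(v)
            if any(w not in seen2 for w in F_i):
                forced[u] = i
    return True, forbidden, forced
```

On a path 1-2-3 with both endpoints fixed to part 0, the middle vertex is forced into part 0; fixing the middle vertex to part 1 instead makes the partial solution infeasible.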
As the algorithm depends only on components rather than specific vertices, one idea to improve the running time is to contract edges whose endpoints are both assigned to the same part of the partition. This leads to smaller graphs on which the propagation might be faster. However, since the algorithm can in principle be applied in every node of the branch-and-bound tree, it is very costly to keep the contracted graph up to date: Suppose we are given two nodes b and b' of the branch-and-bound tree that are treated consecutively and let p be their first common predecessor. To provide the correct contracted graph in b', we basically have two choices. On the one hand, we can expand the edges that have been contracted along the path from p to b to find the contracted graph at p and then contract the edges along the path from p to b'. On the other hand, we can compute the contracted graph from scratch at each node of the branch-and-bound tree. Consequently, the cost of contracting and expanding as well as the handling of the corresponding data structures may exceed the time saved by a faster propagation routine. Of course, to keep the changes in the contracted graph as small as possible, one can use a depth-first-search node selection strategy in branch-and-bound. Except for feasibility problems, however, a pure depth-first-search strategy is not beneficial in general; see [1]. Thus, we refrained from implementing this idea.

Branching rules
Branching rules are a central component of any branch-and-bound algorithm. A variable-based branching rule typically selects a variable x_i with violated integrality constraint and creates two subproblems by adding either x_i ≤ ⌊x̂_i⌋ or x_i ≥ ⌈x̂_i⌉ to its child nodes, where x̂ is the relaxation's solution at the current node. Constraint-based branching rules, on the contrary, create two subproblems by choosing a hyperplane a^T x = β and adding either a^T x ≤ β or a^T x ≥ β to a child of an open node. For more details on branching rules, we refer to, e.g., [2,38].
In the following, we describe three variable-based and one constraint-based branching rule for the C-Max-k-Cut problem. To this end, we denote by F_b^α the set of all x-variables that are fixed to value α in a given node b of the branch-and-bound tree τ.

Articulation-vertex branching
By definition, a graph splits into at least two connected components C_1, …, C_s if an articulation vertex u is removed from the graph. For C-Max-k-Cut, this implies that if u is assigned to a specific part i ∈ [k], every remaining part in [k]\{i} contains vertices from exactly one connected component C_r, r ∈ [s]; see Sect. 4. Thus, once an articulation vertex is assigned to a part, the number of possible assignments of vertices to parts reduces drastically, since at most one part may contain vertices from more than one of the connected components. Based on this observation, the idea of the articulation-vertex branching rule is to assign articulation vertices to a part first. In our implementation, we compute all articulation vertices of G, which can be done in O(|V| + |E|) time (see [26]), when the branch-and-bound algorithm is initialized, and we store the set A of articulation vertices for the entire algorithm. Whenever the branching rule is called in a node b of τ, we select a vertex u ∈ A and check whether u has not yet been assigned. In this case, we generate for each part i ∈ [k] with x_ui ∉ F_b^0 a child node of b in which we fix x_ui = 1. Otherwise, if u is already assigned to a part, we analogously proceed with another articulation vertex until we either have created child nodes or have processed all vertices in A. In the latter case, the articulation-vertex branching rule terminates unsuccessfully and another branching rule is called. Thus, the running time (neglecting the initialization of A) of the branching rule is O(k|A|).

Infeasibility branching
Consider now a vertex u ∈ V for which only a few assignments x_ui = 1 exist such that F_b^1 ∪ {x_ui} can be extended to a feasible solution of the C-Max-k-Cut problem. If u is chosen for branching and a child node is generated for each i ∈ [k] with x_ui ∉ F_b^0 by setting x_ui = 1, there is hope that many of these child nodes are infeasible and can thus be pruned. Of course, using this infeasibility branching rule may produce an unbalanced branch-and-bound tree. But if infeasibility of a child node can be detected early, we can potentially rule out many branching decisions that would end up in an infeasible node of τ at a later stage if we had chosen another branching rule.
To find a candidate vertex for branching, we select an unassigned vertex u ∈ V with the smallest number of unassigned neighbors. To break ties, we choose u such that the number of remaining possible assignments of u to a part is minimal. As another tie break, we choose a vertex with maximal degree. Performing these steps requires computing the number of remaining assignments to a part for each vertex, which is possible in O(k|V|) time. If we store these numbers, the remaining steps of the branching rule can be implemented to run in O(Δ|V|) time, where Δ is the maximum degree of a vertex in G. This leads to an overall running time of O((k + Δ)|V|).

Objective branching
In this section, we assume that all objective coefficients w_uv are 1. The idea of the objective branching rule is to incorporate this objective into branching decisions. Since we aim to find a partition with the maximum number of edges between different parts, this branching rule selects an unassigned vertex u with as many already assigned neighbors as possible. The child nodes are generated via the different possibilities for u to be assigned to a part. Since many neighbors of u are already assigned, assigning u to a part that is not used often by its neighbors hopefully increases the objective in this child node. However, the same argumentation as for the infeasibility branching rule shows that the objective branching rule may produce infeasible child nodes with increased probability for some nodes. The time to find a branching candidate u is in O((k + Δ)|V|).

Path branching
In contrast to the previous branching rules, the path branching rule does not change the bounds of a single variable, but adds a specific constraint to the subproblems of the generated child nodes. To find this inequality, the branching rule selects an index i ∈ [k] such that the subgraph G_i of G induced by the vertices u ∈ V with x_ui ∈ F_b^1 is disconnected. If the assignments of vertices to parts in b can be extended to a feasible solution of C-Max-k-Cut, there exists a path P = (V_P, E_P) in G that connects two vertices in different connected components of G_i such that the vertices of P are unassigned or assigned to part i. To incorporate this observation into a branching decision, the path branching rule chooses such a path P and creates two child nodes of b by adding either of the inequalities

Σ_{v∈V_P} x_vi ≤ |V_P| − 1 or Σ_{v∈V_P} x_vi ≥ |V_P|.

In particular, the inequality added in the latter case ensures that the number of connected components of G_i decreases by one and thus tries to enforce finding a partition with connected parts. Since the connected components of G_i and a path between two of these connected components can be found by two calls of a (modified) BFS algorithm, a branching decision can be made in O(k(|V| + |E|)) time.
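Finding the path P can be sketched with two BFS passes: one labels the connected components of the vertices fixed to part i, the other grows from one component through admissible vertices (unassigned or fixed to part i) until it hits another component. Names and interfaces are illustrative:

```python
from collections import deque

def find_branching_path(adj, part_i_nodes, allowed):
    """Path between two components of the vertices fixed to part i.
    adj: dict vertex -> set of neighbours; allowed: vertices that may lie on
    the path.  Returns a vertex list, or None if part i is connected."""
    part_i_nodes = set(part_i_nodes)
    comp = {}
    for s in sorted(part_i_nodes):           # label components inside part i
        if s in comp:
            continue
        comp[s] = s
        dq = deque([s])
        while dq:
            u = dq.popleft()
            for v in adj[u]:
                if v in part_i_nodes and v not in comp:
                    comp[v] = s
                    dq.append(v)
    if len(set(comp.values())) <= 1:
        return None                          # part i is already connected
    src = comp[min(part_i_nodes)]
    parent = {u: None for u in part_i_nodes if comp[u] == src}
    dq = deque(sorted(parent))
    while dq:                                # BFS through admissible vertices
        u = dq.popleft()
        for v in adj[u]:
            if v in parent or v not in allowed:
                continue
            parent[v] = u
            if v in part_i_nodes and comp[v] != src:
                path = [v]                   # walk back into the source component
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            dq.append(v)
    return None
```

On a path graph 1-2-3-4-5 with vertices 1 and 5 fixed to part i, the routine returns the whole path, which would then appear in the two branching inequalities.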

Primal heuristics
In this section we present different primal heuristics for the C-Max-k-Cut problem. We start by describing a relaxation-based construction heuristic and an improvement heuristic in Sect. 6.1. Afterward, we present a root-node heuristic in Sect. 6.2 that is based on a spanning tree computation.

A relaxation-based rounding heuristic
Algorithm 2 A relaxation-based rounding heuristic.
Input: A relaxation solution x̂ of Problem (1)
1: Sort x̂ in descending order and let ν and π map each entry of the sorted vector to its vertex and part, respectively.
2: Set M ← ∅ and P_i ← ∅ for all i ∈ [k]; set α ← 1.
3: while some part P_i is empty do
4:  if P_π(α) = ∅ and ν(α) ∉ M then
5:   Set P_π(α) ← {ν(α)} and M ← M ∪ {ν(α)}.
6:  α ← α + 1
7: for all entries α of the sorted vector do
8:  if ν(α) ∉ M and P_π(α) ∪ {ν(α)} is connected then set P_π(α) ← P_π(α) ∪ {ν(α)} and M ← M ∪ {ν(α)}.

The relaxation-based rounding algorithm is formally stated in Algorithm 2. It is mainly taken from [21], where it is used in a multilevel framework for electricity market splitting. However, it can be directly used for the more general problem considered in this paper. Let x̂_ui ∈ [0, 1] for u ∈ V, i ∈ [k], be part of a relaxation solution of Problem (1). We interpret these relaxation values as the probability that vertex u should be in part i. The idea is now as follows. First, we sort the vector x̂ ∈ [0, 1]^{k|V|} in descending order in Line 1. While the indices u and i of the entries of the vector x̂ clearly put every entry in relation to a node and a part, this is obviously not the case anymore after sorting. To still be able to relate an entry of the sorted vector to the corresponding node and part, we introduce the mappings ν and π to encode the information to which pair of vertex and part each entry of the sorted vector belongs. For instance, assume entry x̂_ui has index α ∈ {1, …, k|V|} after sorting. Then ν(α) = u and π(α) = i holds. Next, we assign to every part the vertex with the highest probability of being assigned to that part (first while-loop). The set M used in the algorithm collects all vertices that have already been associated with a certain part of the partition. Afterward, we again iterate over the sorted vector of relaxation values and assign every vertex that is not yet assigned to a part if this assignment does not violate connectivity (for-loop).
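In plain Python, the rounding scheme might be sketched as follows; `xhat` maps (vertex, part) pairs to relaxation values, and a vertex is only added to a part if it is adjacent to that part, which keeps every part connected (illustrative names; the repair of vertices that remain unassigned is omitted):

```python
def round_relaxation(adj, xhat, k):
    """Relaxation-based rounding: sort the fractional x-values, seed every
    part with its best vertex, then greedily assign the remaining vertices
    while preserving connectivity."""
    entries = sorted(((xhat[(u, i)], u, i) for u in adj for i in range(k)),
                     reverse=True)
    part = {}
    for val, u, i in entries:        # seed each part with its best vertex
        if i not in part.values() and u not in part:
            part[u] = i
        if len(set(part.values())) == k:
            break
    for val, u, i in entries:        # greedy connectivity-preserving pass
        if u in part:
            continue
        if any(v in part and part[v] == i for v in adj[u]):
            part[u] = i              # u touches part i, so i stays connected
    return part
```

Since each part starts as a single vertex and every later addition is adjacent to the part, all parts remain connected throughout the procedure.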
It is possible that in early iterations of Algorithm 2, a vertex cannot be assigned to a favorable part because it is not yet connected to any other vertex of that part. Thus, we subsequently apply local improvement steps in which we iteratively check for each vertex whether it should be assigned to a different part than the one to which it is currently assigned. The method is given in Algorithm 3. More formally, suppose a vertex u is assigned to part i and let us denote the corresponding objective function value by ϕ_{ui}. Consider now the situation in which vertex u is assigned to another part j ≠ i and everything else stays unchanged. Let ϕ_{uj} be the corresponding objective function value. If ϕ_{ui} is smaller than ϕ_{uj}, then u is moved from part i to j, provided that this leaves all parts non-empty and that the resulting parts are still connected.

Algorithm 3 A 1-opt improvement heuristic.
One crucial aspect of Algorithm 3 is the choice of the part j to which vertex u should be assigned. We implemented a brute-force strategy to search for possible candidate parts and found that this is appropriate in our numerical experiments. However, more involved strategies are given in the literature; see, e.g., [6].
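A minimal sketch of such a 1-opt step, with the brute-force search over candidate parts, might look as follows (function names and the edge-weight layout, with `w[(u, v)]` stored under both orderings, are our own assumptions):

```python
from collections import deque

def is_connected(vertices, adj):
    """BFS connectivity test on the subgraph induced by `vertices`."""
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == vertices

def one_opt(part, adj, w, k):
    """Sketch of the 1-opt improvement idea of Algorithm 3:
    move a vertex to another part whenever this increases the cut
    value while keeping all parts non-empty and connected."""
    improved = True
    while improved:
        improved = False
        for u in list(part):
            i = part[u]
            for j in range(k):
                if j == i:
                    continue
                # Change in cut value if u moves from part i to part j:
                # edges into part i become cut, edges into part j uncut.
                delta = sum(w[(u, v)] * ((part[v] == i) - (part[v] == j))
                            for v in adj[u])
                Pi = {v for v in part if part[v] == i} - {u}
                Pj = {v for v in part if part[v] == j} | {u}
                if delta > 0 and Pi and is_connected(Pi, adj) \
                        and is_connected(Pj, adj):
                    part[u] = j
                    improved = True
                    break
    return part
```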

Spanning tree heuristic
We now describe a greedy root-node heuristic that uses a spanning tree of the graph to enforce the connectedness of the resulting parts. The idea is to initialize the components with the root node of the spanning tree and with k − 1 leaves and then to decide for each branch where it is chopped, i.e., which portion of the branch is added to the respective leaf component and which portion is added to the root component.

Algorithm 4 Spanning Tree Heuristic.
Input: A connected graph G with edge-weights w and the number of parts k. 1: Compute a minimum spanning tree T of G with starting/root node r and let L be the set of leaves of T .
Unfortunately, several cases need to be considered; see Algorithm 4. Since the objective is to find a maximum-weight cut, we start by computing a minimum spanning tree T. Typically, the edges of the spanning tree tend to end up inside one part of the generated partition. One component is always initialized with the tree's root node r, but we have to distinguish between the number of leaves L of T compared to k. If we do not have enough leaves to initialize all remaining k − 1 components (see Lines 3-7), we look for the leaf ℓ with the largest w(δ(ℓ)) and set the last uninitialized component to contain ℓ and only ℓ. Afterward, we remove ℓ from T and iterate this process until there are enough leaves in the current spanning tree to cover the remaining components. Note that the components P_{|L|+1}, . . . , P_k will not change in the second part of the algorithm due to the usage of k̄.
Afterward (or if there are enough leaves to start with), every component is initialized with one leaf. If there are more leaves than components (see Line 11), the remaining leaves together with their complete branches are put in the root-component.
Then, the actual heuristic starts in Line 13. For every remaining branch of the tree, we successively compute the weights of the vertex sets that are obtained by starting with the leaf and adding the respective next vertex on the unique path to the root (or to the first vertex that was already assigned in a previous step). We choose the largest of these weights and set the component to contain the corresponding set. The remainder of the branch is (theoretically) added to the root-component. Since later leaves and their branches should be able to claim these vertices, we do not mark them directly but wait until after the for-loop to put all remaining unclaimed vertices in the root-component. Due to this construction, every component is connected.
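The branch-chopping step for a single leaf can be sketched as follows (a simplified illustration with hypothetical helper names; the full Algorithm 4 additionally handles the initialization cases discussed above):

```python
def chop_branch(leaf, parent, assigned, adj, w):
    """Sketch of the branch-chopping step for one leaf of the spanning
    tree (names and layout are ours): walk from the leaf toward the
    root and keep the prefix of the branch with the largest cut weight.
    `parent` holds the tree's parent pointers, `assigned` the vertices
    already claimed, and w[(u, v)] the weight of edge {u, v} under
    both orderings."""
    prefix, best_prefix, best_val = [], None, float("-inf")
    u = leaf
    while u not in assigned:          # stop at the first assigned vertex
        prefix.append(u)
        S = set(prefix)
        # w(delta(S)): total weight of the edges leaving the prefix set S.
        val = sum(w[(a, b)] for a in S for b in adj[a] if b not in S)
        if val > best_val:
            best_val, best_prefix = val, list(prefix)
        u = parent[u]
    return best_prefix                # this set becomes the leaf component
```

The remainder of the branch stays unclaimed and eventually falls to the root-component, as described above.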

Symmetry handling
Assume we are given a solution of C-Max-k-Cut, i.e., an assignment x_{vi} of the vertices v ∈ V to the parts i ∈ [k]. Then, we can transform this solution into another solution with the same objective value by permuting the partitioning labels. Moreover, if the graph G admits a nontrivial automorphism, relabeling the vertices according to this automorphism also yields a solution with the same objective value.
In the following, we refer to the first kind of symmetries as partitioning symmetries and to the latter kind as graph symmetries.
Since exploring symmetric solutions within branch-and-bound typically increases the solution time, we use two symmetry handling approaches that are discussed in the literature for handling partitioning and graph symmetries. The aim of both approaches is to cut off all symmetric solutions except for chosen representatives, which hopefully decreases the solution time.

Partitioning symmetries
In Formulation (1), an assignment of vertices to parts is given by a matrix x ∈ {0, 1}^{V×[k]} with exactly one 1-entry per row. Thus, a partitioning symmetry, i.e., a relabeling of the parts, permutes the columns of x and keeps the number of 1-entries per row invariant. To handle such column symmetries, [32] introduced the concept of partitioning orbitopes. The partitioning orbitope O_{m,n} is the convex hull of all binary (m × n)-matrices with exactly one 1-entry per row whose columns are sorted in a lexicographically non-increasing way. In [32], a facet description of partitioning orbitopes has been developed, and the authors proved that it can be separated in O(|V|k) time. Moreover, in [31] it is shown that partitioning orbitopes can be propagated in O(|V|k) time. In our experiments, we use an implementation of both the separation and the propagation routine to handle partitioning symmetries.
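The representative condition enforced by partitioning orbitopes can be illustrated by a simple check (our own helper, not SCIP's implementation):

```python
def columns_lex_nonincreasing(x):
    """Check whether the columns of a 0/1 assignment matrix x (rows =
    vertices, columns = parts) are sorted lexicographically
    non-increasingly, i.e. whether x is the representative that
    partitioning orbitope constraints select."""
    cols = list(zip(*x))  # column-major view; tuples compare lexicographically
    return all(cols[j] >= cols[j + 1] for j in range(len(cols) - 1))
```

For instance, relabeling the two parts of a feasible assignment turns a representative matrix into a non-representative one, and the orbitope constraints cut the latter off.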

Graph symmetries
Let γ : V → V be an automorphism of G. By the above discussion, we can associate with γ a permutation γ̄ of the variables that permutes the rows of x according to γ. To handle graph symmetries γ̄, we use the concept of symresacks that was introduced in [25]. The symresack P_γ̄ w.r.t. a permutation γ̄ contains all binary vectors that are lexicographically not smaller than their permutation w.r.t. γ̄. Thus, similarly to orbitopes, valid inequalities for symresacks can be used to cut off symmetric solutions. The authors of [25] show that there exists an IP formulation of P_γ̄ with left-hand side coefficients in {0, ±1} that can be separated in O(N α(N)) time, where N is the length of the binary vectors and α is the inverse Ackermann function. Moreover, a linear-time propagation algorithm for P_γ̄ is described in [25]. Both the separation and the propagation algorithm for symresacks are used in our implementation to handle graph symmetries in C-Max-k-Cut.
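The containment condition of a symresack can likewise be checked directly (our own helper; the sign convention of the permutation is one common choice, not necessarily the one of [25]):

```python
def symresack_feasible(x, gamma):
    """Check whether a 0/1 vector x is lexicographically not smaller
    than its permutation under gamma, where position i of the permuted
    vector takes the value of position gamma[i] (one common convention)."""
    return x >= [x[gamma[i]] for i in range(len(x))]
```

A vector violating this condition is a symmetric copy of its representative and can be cut off.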
Note that while different symmetry handling techniques cannot be used simultaneously in general, orbitopes and symresacks can be applied on the same instance because both are based on a lexicographic order. Thus, if we guarantee the same underlying variable order, no conflicts can arise.

Computational study
In Sects. 3-7, we presented enhancements of a branch-and-cut framework for solving the C-Max-k-Cut problem. In this section, we now discuss the numerical results that we obtained using these enhancements. As we have mentioned throughout the paper, some of these enhancements are new and some of them can be found in the literature. Our goal in this section is to compare the novel techniques with a reference branch-and-cut algorithm that involves all the ingredients from the literature.
In order to obtain credible results, we tested the techniques on different test sets from the literature and on an additional test set of randomly generated instances. The entire test set and the general computational setup are described in Sect. 8.1. The large test set allows for a detailed numerical study that (i) evaluates the benefit of different (combinations of) techniques applied to different test sets and (ii) shows the improvement due to the novel techniques compared to the current state-of-the-art. Moreover, the broadness of the numerical study allows us to draw conclusions on why a certain technique leads to computational enhancements for different instances or not. Both are carried out in Sects. 8.2-8.4, where we analyze the results for k ∈ {2, 5, 10} in order to also shed light on the computational benefit of the techniques for different values of k. In Sect. 8.5, we collect and discuss the general observations and insights. Finally, we use these insights to set up a problem-tailored parameterization of a branch-and-cut method for solving the C-Max-k-Cut on realistic gas and power transport networks in Sect. 8.6.

Test sets and computational setup
The test sets that we use in our computational study are the following:

Color02
51 of the smallest instances from the Color02 symposium on coloring problems [12]. The density of the considered instances ranges from 3.4 to 89.6% of possible edges; see Table 16. This test set contains the largest graphs (with 11-282 nodes; except for one outlier with 2368 nodes) and the graphs that vary the most w.r.t. their density.

Random
150 randomly generated instances with sizes in the range of 50-100 vertices and with densities that range from 4.7 to 21.3%; see Table 17. These instances were created with Mathematica [27] employing different edge probabilities.

Steiner-80
81 instances containing graphs with 80 vertices from the SteinLib I080 library [35,47] of Steiner tree problems. Since the original test set contains 20 Steiner tree instances on the complete graph, which are identical for our problem, we removed 19 of these instances. The density of the considered instances varies between 3.8 and 11.1% with the exception of one complete graph; see Table 18.

Steiner-160
81 instances containing graphs with 160 vertices from the SteinLib I160 library [35,47] of Steiner tree problems. Similar to Steiner-80, we removed 19 out of the 20 instances on the complete graph. The density of the considered instances varies between 1.9 and 6.4% with the exception of one complete graph; see Table 19. Both SteinLib test sets contain the graphs with the largest number of articulation vertices.
Recall from the definition of C-Max-k-Cut that we require all graphs to be simple. Thus, we replace parallel edges in the above instances by a single edge in a preprocessing step. Detailed information about the (preprocessed) test sets is given in the tables in Appendix A.
We use SCIP 5.0.1 [18] with the LP solver CPLEX 12.7.1 [13] to solve the instances and to implement the techniques discussed in Sects. 3-6. To handle symmetries as described in Sect. 7, we use the implementation of these methods that is already available in SCIP. Note, however, that SCIP itself is not able to detect the symmetries automatically because of our additional plug-ins. Thus, we add partitioning orbitopes by hand. Moreover, we use Nauty [41] to detect automorphisms of the underlying graphs and add a symresack to the problem formulation for each generator of the automorphism group. All graph algorithms (like computing connected components or articulation vertices) have been implemented using the Boost graph library [7].
All computations were run on a Linux cluster with Intel Xeon E5 3.5 GHz quad-core processors and 32 GB memory. The code was executed using a single thread. The time limit of all computations is 1 h per instance.
Before we start analyzing the specific results, we briefly describe the reference branch-and-cut algorithm to which we compare the novel techniques in the following. We will see in this section that the novel techniques discussed in this paper significantly outperform the reference branch-and-cut algorithm. This is especially the case for larger values of k, for which the reference method is almost never able to compute a globally optimal solution. During our preliminary numerical tests it turned out that the situation is even more drastic for the general-purpose solver SCIP without any additional problem-specific components. Since a comparison of new techniques with a setting that is almost never able to compute solutions does not lead to useful insights, we decided to use SCIP extended by computationally useful existing techniques as the reference branch-and-cut method.
To decide which of the cutting planes, heuristics, and symmetry handling methods from the literature discussed in the preceding sections are activated in the reference algorithm, we ran preliminary experiments. The experiments show that for both the flow and the cut formulation it is favorable to handle graph and partitioning symmetries as well as to activate all discussed heuristics. Moreover, the results show that separating clique cuts and separator inequalities during branch-and-bound has a positive impact on the performance, whereas odd-cycle and triangle cuts have an adverse effect. Since articulation-vertex cuts and leaf cuts are special cases of separator inequalities, we also evaluated the impact of these cutting planes. Our experiments show that the leaf cuts perform worse in comparison with separator inequalities, whereas the articulation-vertex cuts perform well. For this reason, the leaf cuts are deactivated in the reference method and the articulation-vertex cuts are activated. Table 2 provides a summary of the activated components in the reference branch-and-cut method.
Regarding the presented novel techniques, we ran preliminary numerical experiments before we obtained the results that are discussed in the following sections. First, for the branching rules introduced in Sect. 5, it turned out that only the articulation-vertex branching rule reliably yields improved results. All other branching rules may lead to an improved method for some instances but also harm the solution process on other instances. Thus, we focus on the articulation-vertex branching rule (cf. Sect. 5.1) in what follows.
Next, we report on our experiments on the four test sets and for k ∈ {2, 5, 10}. All reported results on the number of nodes in the branch-and-bound trees ("#nodes") and running times ("time") are given as the shifted geometric mean (∏_{i=1}^{n} (t_i + s))^{1/n} − s over all n instances within a test set, where we use a shift of s = 10 for running times and of s = 100 for nodes to reduce the impact of very easy instances. Moreover, we denote the total number of instances that can be solved by a setting within the time limit by "#opt". The columns "branch.", "prop.", "glo. cut", and "loc. cut" encode whether the articulation-vertex branching rule, the propagation algorithm, the global bounded-edge cut, and the local bounded-edge cut, respectively, are used in a specific setting.
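The shifted geometric mean and the solved fraction behind the performance profiles used throughout this section can be sketched as follows (our own helper names; the profile follows the usual Dolan-Moré definition with running times as performance measure):

```python
import math

def shifted_geometric_mean(values, shift):
    """Shifted geometric mean (prod_i (t_i + s))^(1/n) - s, computed in
    log-space for numerical stability."""
    n = len(values)
    return math.exp(sum(math.log(t + shift) for t in values) / n) - shift

def performance_profile(times, taus):
    """For each solver setting s, the fraction of instances it finishes
    within a factor tau of the per-instance best setting; times[s][i] is
    the time of setting s on instance i, float('inf') for a timeout."""
    n = len(next(iter(times.values())))
    best = [min(times[s][i] for s in times) for i in range(n)]
    return {s: [sum(best[i] < float("inf")
                    and times[s][i] <= tau * best[i]
                    for i in range(n)) / n
                for tau in taus]
            for s in times}
```

The shift s dampens the influence of very easy instances: an instance solved in fractions of a second barely changes the mean when s = 10 is added to every running time.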
To gain further insight, we also evaluate the performance of selected parameterizations of the different settings using performance profiles with running times as the performance measure [15]. In these profiles, one parameterization dominates another one if its curve lies above the other's. Here, "b" denotes the branching rule, "p" the propagation, "gc" the global cut, and "lc" the local cut. If the corresponding method is used, this is denoted by "1", otherwise by "0".

Discussion of the numerical results for k = 2

The Color02 test set
We start with the Color02 test set. As can be seen in Table 16, there are only very few articulation vertices. This explains why turning the articulation-vertex branching rule on or off changes almost nothing in the results; see the Color02 block in Table 3. There is no difference in the number of solved instances, and the running times as well as the numbers of required branch-and-bound nodes do not differ significantly either. This holds both for the flow and the cut formulation. However, the branching rule is not costly (cf. Sect. 5.1), and thus running times are not negatively affected. It can also be seen that the cut formulation solves, on average, more instances. Using the flow formulation, additionally applying the propagation algorithm (Algorithm 1) and the global bounded-edge cut (6), without using the separation of the local bounded-edge cut (7) and irrespective of whether the branching rule is used, leads to the most successful methods. Using the cut formulation, the most important components are the local and global bounded-edge cuts. While Algorithm 1 alone deteriorates the performance of the cut formulation both in terms of solved instances and running time, combining either variant of the bounded-edge cut with propagation leads to the best results. Comparing the global and the local variant of the bounded-edge cut, it can be seen that the local one performs better if the global cut is not used, which also leads to the best results on this test set.
Choosing the most successful methods w.r.t. this criterion and again comparing them using a performance profile, we obtain the results given in Fig. 3, where we also integrated the reference branch-and-cut method.
First, we see that all combinations of novel techniques outperform the reference approach both for the flow and the cut formulation. Moreover, the figure emphasizes that for the cut formulation (right figure) using the propagation algorithm and the local bounded-edge cut leads to the best approach, whereas the articulation-vertex branching rule is not important for either formulation, as explained above. After having chosen these parameterizations, the cut formulation clearly outperforms the flow formulation; see Table 3.
Finally, we see that the Color02 test set for k = 2 contains quite hard instances, since the best method obtained still only solves 20 instances out of the 51 Color02 instances.

The Random test set
First of all, the second block in Table 3 reveals that the cut formulation outperforms the flow formulation both in terms of the number of solved instances and running time. Thus, we focus on the cut formulation in the following. Using the articulation-vertex branching rule yields better results on average: the maximum number of solved instances without the branching rule is 144, which is obtained for 1 (out of 8) parameterizations, whereas the same number of solved instances (144) is obtained for 6 (again out of 8) parameterizations if the problem-specific branching rule is activated. Thus, in contrast to the Color02 test set, the articulation-vertex branching rule has a positive effect on the branch-and-bound process. This behavior is comprehensible because the Random test set contains 21 (out of 150) instances with articulation vertices, whereas only 2 (out of 51) instances from the Color02 test set have articulation vertices. Thus, using the branching rule, the problem decomposes into subproblems, which can be solved easily since the graphs are relatively sparse. Regarding running times, however, there is no obvious difference between the variants with and without the branching rule. Considering only the combinations using the branching rule, one can see that the most efficient methods use the global bounded-edge cut. The usage of the propagation algorithm has a slightly positive effect but does not influence the remaining parameterizations significantly. Hence, the most important choices are to use the cut formulation and to activate the articulation-vertex branching rule as well as the global bounded-edge cut. Again, all novel techniques dominate the reference approach and, among the novel ones (compare Fig. 4), it can also be seen that the cut formulation performs better than the flow formulation.
The statistical tool of performance profiles, i.e., of distribution functions, supports the conclusions that can be drawn from Table 3. Finally, since the randomly generated instances are on average smaller and sparser than the Color02 instances, they are much easier to solve than the Color02 instances. Here, we solve 144 out of 150 instances in the best cases, which corresponds to 96.0% in comparison to less than 40% for the Color02 test set.

The Steiner-80 test set
Table 3 shows that, on average, using the articulation-vertex branching rule again yields slightly better results both in terms of the maximum number of solved instances and in terms of running times. The larger impact can be seen for running times. Moreover, using the bounded-edge cut (in its global or local variant) clearly speeds up the solution process and yields a larger number of solved instances. One might thus ask whether both the global and the local bounded-edge cut should be used. Interestingly, the answer strongly depends on whether the articulation-vertex branching rule is activated. If the branching rule is not used and the propagation is activated, it is better to use the global cut (6) and to deactivate its local variant (7). Deactivating the propagation routine and using the local bounded-edge cut, however, leads to the best result in terms of solved instances and almost the best performance w.r.t. running time for the cut formulation. If the articulation-vertex branching rule is used, the global bounded-edge cut still has the greatest impact on performance. In contrast to the case of the deactivated branching rule, it then seems better to use both the global and the local variant of the cut. A more general pattern additionally reveals that the local bounded-edge cut yields drastic improvements in those cases in which the global variant is not used and leads on average to the best results in the cut formulation. Again, some of the best combinations for the flow and the cut formulations are compared in the performance profiles in Fig. 5.
One can see that all novel techniques drastically outperform the reference approach and that the results for the flow and the cut formulation are quite comparable. The above discussion suggests using the global bounded-edge cut. Thus, we only included combinations in which this cut is used. For the flow formulation, the branching rule and both variants of the bounded-edge cut lead to the most reliable method (which is in line with the analysis above), whereas deactivating the local variant of the cut yields a faster method. The same applies to the cut formulation but with less clear differences between the different combinations of techniques. Moreover, Fig. 5 shows that for the easy instances it is better to deactivate the separation of the local bounded-edge cut, whereas separating these inequalities improves the running time on the harder instances, a conclusion that cannot be drawn from the average numbers of Table 3 alone.

The Steiner-160 test set
We now turn to the Steiner-160 test set that contains larger graphs. It is obvious from Table 3 that omitting both the global and the local bounded-edge cut leads to very bad results: only 1 out of 81 instances is solved. Thus, this unfavorable result is also obtained with the reference method. For the flow formulation, we observe that using the global or local cuts gives much better results than the reference method. However, using the global variant alone is clearly the best choice regarding these cuts. An interesting pattern of Table 3 is that using the articulation-vertex branching rule does not reduce the number of solved instances in the cut formulation, whereas it almost always (except for one case) leads to fewer solved instances when used in combination with the flow formulation. For the flow formulation, the most important technique, both in terms of overall solved instances and running times, is the global bounded-edge cut. Additionally using the propagation algorithm then seems to be too costly, yielding longer running times and fewer solved instances (26 vs. 29 for the deactivated branching rule and 26 vs. 28 for the activated branching rule). Next, we discuss the results for the cut formulation and focus on the combinations that use the articulation-vertex branching rule. Using only the local bounded-edge cuts, our implementation is able to solve the largest number of instances. In terms of running times, however, it is necessary to additionally activate the propagation algorithm and the global bounded-edge cut to achieve the best results on average. Unfortunately, the trade-off for the faster average running time is that some instances become unsolvable (26 vs. 32). Nevertheless, the propagation algorithm leads to a more successful method on average. One additional observation is that, when using the cut formulation, it is always preferable (both w.r.t. running times and the number of solved instances) to use the local bounded-edge cuts. When using the flow formulation, however, this is not the case.

Discussion of the numerical results for k = 5
We now discuss the numerical results for the case of k = 5. For this situation, our preliminary experiments revealed that, without the global bounded-edge cut, only very few instances can be solved within the time limit. The bounded-edge cut, however, significantly improves the solution process. Thus, we only discuss the cases in which this cut is used. An overview of all results is given in Table 4.
In contrast to the results for k = 2, for larger k it turned out to be very helpful to further split the separate test sets depending on the hardness of the contained instances in order to obtain a better analysis of the results. This splitting is given in Table 5 for the Color02 test set. For the grouping of the instances of our test sets into difficulty classes, we use the following notation: the class [ℓ, u) contains all instances that need at least ℓ seconds and less than u seconds for every setting to be solved.

The Color02 test set
First of all, Table 5 shows that the usage of the local bounded-edge cuts hampers the solution process. This holds for all instances except for the very easy instances in class [0, 100), for which the local bounded-edge cut leads to a speedup between 30 and 50% in the flow formulation. It can also be seen that the cut formulation is faster than the flow formulation on the easy instances and that the flow formulation is superior on the medium and hard instances. Moreover, Table 4 shows that the number of solved instances is larger using the flow formulation, whereas the cut formulation is usually faster. Finally, the propagation algorithm improves the solution process both w.r.t. the overall number of solved instances and the running times if the flow formulation is used. Using the cut formulation, however, this behavior can only be observed if the local bounded-edge cuts are not separated. The articulation-vertex branching rule has almost no impact because the graphs of the Color02 test set contain almost no articulation vertices.
In Fig. 6 we compare the most successful settings, i.e., both the cut and flow formulation with activated global bounded-edge cut and propagation. Since the reference method only solves a single instance for both formulations, we refrain from integrating it into the figure. Figure 6 confirms the above analysis: We solve more instances using the flow formulation. However, if the cut formulation solves an instance, it always solves it faster than the flow formulation.

The Random test set
For the Random test set, Table 6 shows that the cut formulation can be solved significantly faster than the flow formulation: the former has 26 instances in the class [0, 100) compared to only 2 instances for the latter. Thus, the flow formulation clearly leads to longer running times. In contrast, the class [1000, 3600) of hard instances contains almost the same number of instances (28 vs. 27). Interestingly, in this class, the flow formulation solves significantly more instances if the separation of the local bounded-edge cuts is turned off, which is also the better choice for the hard instances solved with the cut formulation. Moreover, it can be seen that the propagation technique is advantageous for hard instances solved with the flow formulation, whereas it leads to larger running times for the cut formulation; both statements refer to the case in which the local bounded-edge cuts are not used.
Regarding the articulation-vertex branching rule, it can be seen that the flow formulation clearly benefits from this rule when applied to the easy instances, whereas the cut formulation is slightly weakened by it. For the other instances, the branching rule has no significant influence on either formulation. Table 4 reveals that, in the flow formulation, the local bounded-edge cut does not help either. In contrast, it improves the cut formulation if the propagation algorithm is also used.
We again compare the best settings (flow formulation with propagation, with branching rule, and without local cuts, as well as the cut formulation with propagation and local cuts but without the branching rule) in Fig. 7.
It is apparent that the flow formulation solves many more instances. Taking Table 6 into account, this also becomes clear because the flow formulation is much more successful especially on the hard instances. However, if the cut formulation solves an instance, it is typically faster, as was already the case for the Color02 test set.

The Steiner-80 test set
We now turn to the Steiner-80 test set. Table 7 shows that, again, there exist significantly fewer easy instances for the flow formulation compared to the cut formulation (1 vs. 26). Regarding the flow formulation, it is noticeable that the local cuts and the propagation algorithm are disadvantageous on the medium instances in class [100, 1000) and on the hard instances in class [1000, 3600). The results for the cut formulation show that hard instances can only be solved if both the propagation algorithm and the local bounded-edge cuts are used. Table 4 shows that the articulation-vertex branching rule weakens the cut formulation both in terms of running time and solved instances. The same holds for the flow formulation and the local cuts, whereas the propagation algorithm improves the method, yielding slightly more solved instances. As before, we compare the best settings (flow formulation with branching rule and propagation, and cut formulation with branching rule and local cuts) in Fig. 8. The analysis is the same as for the two other test sets: the flow formulation solves more instances but the cut formulation is faster.

The Steiner-160 test set
Table 8 clearly shows that this is the hardest test set: almost all instances are classified as hard in the class [1000, 3600). For this set of instances, the cut formulation slightly outperforms the flow formulation. Since we can only solve very few instances for this test set, a reliable discussion of the impact of the different techniques is not possible. However, Table 4 shows that the usage of local cuts weakens the flow formulation, both in terms of the number of solved instances and running times. Additionally, the branching rule does not help solving the flow formulation. The propagation rule improves the solution process of the flow formulation on average but might also lead to slightly fewer solved instances.
In contrast, the cut formulation benefits from the articulation-vertex branching rule and, even more clearly, from the propagation algorithm. Using local bounded-edge cuts weakens the cut formulation in terms of solution time. The number of solved instances, however, increases if the separation of the local cuts is enabled.
Regarding the best settings, the situation is not as clear as for the other test sets. For the flow formulation, the propagation should be used. Turning on the branching rule gives faster running times but leads to 3 fewer solved instances. For the cut formulation, one should also use the propagation as well as the branching rule but turn off (considering averages) the local cuts. However, local cuts lead to 3 more solved instances for the best combination.
In summary, the cut formulation is significantly faster and can solve, for the first time, more instances.  The main overview of all results is given in Table 9. Again, the solution process always benefits from activating the global bounded-edge cut, so we only discuss combinations that use this cut. Table 10 shows that the number of easy and hard instances is comparable for the flow and cut formulation and that the cut formulation is significantly faster on the easy instances. For the hard instances, we only solve up to 2 instances, so that trends cannot be discussed reliably. In Table 9 one can additionally see that both the flow and the cut formulation are weakened by using the local bounded-edge cuts. Thus, we only discuss the cases further where the local cuts are not used. Here, the propagation algorithm is always beneficial and yields a speed-up of 6.1% for the flow formulation and even 11.4% for the cut formulation. As before, the articulation-vertex branching rule makes no difference since almost no articulation vertices exist in the Color02 test set.

The Color02 test set
In contrast to the case of k = 5, the cut formulation is now better both w.r.t. the number of solved instances (5 more than the flow variant) and the running times (almost twice as fast as the flow variant). Thus, the overall best setting is the cut formulation with activated propagation and global bounded-edge cut. Table 11 reveals that the cut formulation is again superior: There are 20 easy instances, whereas we only have 2 easy instances for the flow formulation. Conversely, the flow formulation has 41 hard instances, whereas there are only 3 hard instances for the cut variant. For the easy instances, it is clear that the local bounded-edge cuts hinder the solution process; with deactivated local cuts, the propagation slows down the solution process. Since all easy instances are solved, this means that using the propagation algorithm is not required. Interestingly, the cut formulation clearly benefits from using the local cuts on the hard instances. Since there are only 3 of them, this is, however, not a reliable trend. In addition, Table 9 shows that, on average, local cuts do not yield better methods. The same table also shows that propagation is beneficial, especially when applied to the cut formulation. The branching rule does not yield improved results.

In summary, the cut formulation is the clear winner and should be used with the propagation algorithm.

The Steiner-80 and Steiner-160 test sets
We now turn to the Steiner-80 test set. Again, the cut formulation is obviously easier to solve than the flow formulation; see Table 12 (22 vs. 0 easy instances). Since there is only one medium instance, no trends can be discussed. The flow formulation has 8 hard instances and the cut variant only 1. Table 9 additionally shows that the local cuts are unfavorable, but that the propagation and the branching rule help to improve the solution process on average for the flow formulation. The best combination is thus obtained by activating the propagation algorithm and the branching rule, although the latter is deactivated in the settings that solve the most instances. In the cut formulation, only the propagation routine leads to improved results.

Finally, we briefly discuss the results for the Steiner-160 instances. Table 9 states that the cut formulation is superior, which is also confirmed by Table 13. The clear winner is the cut formulation with activated articulation-vertex branching rule and activated propagation algorithm.

General observations
In the following, we summarize the results that we obtained on the different test sets and draw conclusions from the overall results, independent of a specific test set. Based on our preceding analysis, we split our test instances into two classes for which we can observe a different behavior: dense graphs (Color02) and sparse graphs (Random, Steiner-80, Steiner-160).

General observations for dense graphs
Independent of the value of k and the problem formulation, we observe that it is almost always preferable to deactivate the separation of local bounded-edge cuts. This behavior is expected because the local bounded-edge cut improves the global cut only if the removal of the nodes that are assigned to a part splits the underlying graph into several connected components; cf. Sect. 3.3. Consequently, it is unlikely that a local bounded-edge cut can be applied in a dense graph (in comparison to a sparse graph).

To evaluate the impact of the propagation algorithm, we compare in the following the parameterizations with deactivated branching rule and local bounded-edge cut separation, as well as activated global bounded-edge cut, which are on average the best settings. For k = 2, the improvement caused by the propagation algorithm in the flow formulation is 1.8%, whereas the cut formulation is improved by 3.6%. If k = 5, the improvements are 6.5% and 12.4%, respectively, as well as 6.0% and 11.4%, respectively, for k = 10. Thus, the improvement for the cut formulation is about twice as large as for the flow formulation. A possible explanation for this behavior is that the cut formulation needs to separate inequalities to ensure connectivity of a solution, whereas connectivity in the flow formulation is modeled by flow constraints that do not need to be separated. For this reason, it is expected that the propagation algorithm is more powerful in the cut formulation than in the flow model. Furthermore, it is reasonable that the improvement grows for increasing values of k, since the propagation algorithm can only find variable reductions if removing one part (almost) disconnects the graph, which is more likely for a larger value of k.

The impact of the branching rule cannot be evaluated on our test set of dense instances, because the corresponding graphs contain almost no articulation vertices. Hence, it remains to discuss the impact of the problem formulation.
Based on our findings above, we use the setting with activated propagation routine and global bounded-edge cut as well as disabled local bounded-edge cut and branching rule for our comparison. We observe that the cut formulation is on average 10.9% faster than the flow formulation for k = 2. For k = 5 and k = 10, the speed-up even increases to 26.7% and 45.5%, respectively. Thus, in terms of running time, the cut formulation clearly dominates the flow formulation. In terms of solved instances, however, the flow formulation is preferable since it performs better on the harder instances.
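The splitting condition behind the local bounded-edge cut can be made concrete: the local cut only strengthens the global one if deleting the vertices assigned to a part disconnects the remainder of the graph. The following sketch (our own illustration with hypothetical names, not the authors' SCIP code) counts the components that remain after such a removal:

```python
from collections import defaultdict, deque

def components_after_removal(n, edges, removed):
    """Count connected components of the graph on vertices 0..n-1
    after deleting the vertex set `removed` (e.g., the vertices
    already assigned to one part of the partition)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = set(removed)
    comps = 0
    for s in range(n):
        if s in seen:
            continue
        comps += 1                      # new component found; flood-fill it
        queue = deque([s])
        seen.add(s)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return comps

# On a sparse path graph, removing an interior vertex splits the rest:
print(components_after_removal(5, [(0, 1), (1, 2), (2, 3), (3, 4)], {2}))   # 2
# On a dense (complete) graph, a removal almost never disconnects:
k4_edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(components_after_removal(4, k4_edges, {0}))                           # 1
```

The contrast between the two calls mirrors the observation above: in dense graphs the removal rarely yields more than one component, so the local cut rarely improves on the global one.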

General observations for sparse graphs
In contrast to dense graphs, there is a qualitative difference if the novel techniques are used in the flow or the cut formulation. For k = 2, the propagation algorithm has only a minor impact on the running time for small graphs (Random, Steiner-80). The larger instances (Steiner-160), however, clearly benefit from the propagation routine in the cut formulation, whereas the routine has a negative effect in the flow formulation. One explanation for this might be that the flow constraints enforce connectivity in sparse graphs much better than in dense graphs if k is small. In contrast to this, the cut formulation needs to separate inequalities to ensure connectivity, which increases the size of the LP relaxations. By propagating connectivity, however, fewer inequalities need to be added, keeping the LP relaxations small. If the number of parts k is increased, both the cut and flow formulation benefit on average from the propagation algorithm.
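To illustrate the kind of reduction the connectivity propagation can find, the following sketch (hypothetical names; the actual propagator operates on the MILP variables inside SCIP) marks the free vertices that can still reach a given part without crossing vertices fixed to other parts. For every other free vertex, the assignment variable for that part could be fixed to zero:

```python
from collections import defaultdict, deque

def propagate_connectivity(n, edges, assigned, p):
    """Return the set of free vertices that can still join part p.
    `assigned` maps vertex -> part for the fixed vertices only.  A free
    vertex may join p only if it reaches some vertex of p through
    vertices that are free or already in p; for all other free
    vertices v, the variable x[v,p] can be fixed to zero."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # BFS from all vertices of part p, moving only through vertices
    # that are free or belong to p.
    start = [v for v, q in assigned.items() if q == p]
    seen = set(start)
    queue = deque(start)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen and assigned.get(w, p) == p:
                seen.add(w)
                queue.append(w)
    return {v for v in seen if v not in assigned}

# Path 0-1-2-3-4: vertex 0 is fixed to part 0, vertex 2 to part 1.
# Vertices 3 and 4 are cut off from part 0 by vertex 2:
reachable = propagate_connectivity(5, [(0, 1), (1, 2), (2, 3), (3, 4)],
                                   {0: 0, 2: 1}, 0)
print(sorted(reachable))  # [1]
```

In a sparse graph a single fixed vertex often blocks many such paths, which is consistent with the propagation becoming more effective for larger k.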
Similarly, using the local bounded-edge cut in combination with the global version has almost always a negative effect on the running time in the flow formulation. In the cut formulation, however, the situation is more complicated. For smaller values of k, the separation of the local cut should be activated. Although this might lead to larger running times on some instances, this increase is relatively small in comparison to the speed-up we gain on larger instances (e.g., Steiner-160 for k = 2). For k = 10, however, the separation routine has in general no positive impact.
Concerning the branching rule, we cannot observe a clear trend whether it is beneficial to activate it or not. By analyzing results on single instances, it seems that we can benefit from using this rule only if a graph does not contain too many articulation vertices. At first glance, this might be surprising. However, if a graph contains many articulation vertices, it is very likely that the default branching rule of SCIP frequently branches on articulation vertices on its own. Thus, it is not necessary to guide the branching process to use articulation vertices and we can benefit from the SCIP rules that take other parameters of a fractional solution into account to find a good branching decision. This conclusion is also supported by the results on hard instances reported in Tables 7, 8, 12, and 13: Since these instances are hard, we need many more nodes in the branch-and-bound tree and thus it is likely that an articulation vertex is used in a branching decision anyway. Consequently, there is almost no difference in the running time if the branching rule is enabled or not.
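The articulation vertices that the branching rule targets can be found in linear time with the classic lowpoint computation; the following self-contained sketch (illustrative only, not the paper's implementation) uses an iterative DFS to avoid recursion limits on large networks:

```python
def articulation_vertices(n, edges):
    """Articulation vertices of an undirected simple graph on vertices
    0..n-1, computed with the classic lowpoint (Tarjan-style) DFS."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc = [0] * n          # discovery times; 0 means unvisited
    low = [0] * n           # lowest discovery time reachable via back edges
    cut = set()
    timer = 1
    for root in range(n):
        if disc[root]:
            continue
        root_children = 0
        disc[root] = low[root] = timer; timer += 1
        stack = [(root, -1, iter(adj[root]))]
        while stack:
            u, parent, it = stack[-1]
            advanced = False
            for w in it:
                if w == parent:
                    continue
                if disc[w]:
                    low[u] = min(low[u], disc[w])   # back edge
                else:
                    disc[w] = low[w] = timer; timer += 1
                    if u == root:
                        root_children += 1
                    stack.append((w, u, iter(adj[w])))
                    advanced = True
                    break
            if not advanced:
                stack.pop()                         # u is fully explored
                if stack:
                    pu = stack[-1][0]
                    low[pu] = min(low[pu], low[u])
                    # non-root pu is a cut vertex if no back edge from the
                    # subtree of u climbs above pu
                    if pu != root and low[u] >= disc[pu]:
                        cut.add(pu)
        if root_children >= 2:                      # root rule
            cut.add(root)
    return cut

# Every interior vertex of a path is an articulation vertex:
print(sorted(articulation_vertices(5, [(0, 1), (1, 2), (2, 3), (3, 4)])))  # [1, 2, 3]
```

Gas and power networks, being tree-like, contain many such vertices, which is exactly the regime in which SCIP's default rule already tends to branch on them.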
Finally, regarding the average running times, the cut formulation is almost always faster than the flow formulation. However, and analogously to the dense instances, the flow formulation is often able to solve more instances.

Conclusions for good default settings
Based on the analysis of Sects. 8.5.1 and 8.5.2, we have added switches to our code to enable or disable the novel techniques where this seems favorable:
- The separation of local bounded-edge cuts is disabled in the flow formulation.
- Moreover, we deactivate the separation routine in the cut formulation if either k > 5 or the density of the graph exceeds 15%.

Using these parameterized settings, we have conducted new experiments on the four test sets described above. Table 14 summarizes these experiments.
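The switches above amount to a simple predicate on the formulation, the number of parts k, and the graph density. The sketch below (function and parameter names are ours; the thresholds are those stated above, assuming the usual density 2|E|/(|V|(|V|-1))) illustrates the decision:

```python
def separate_local_cuts(formulation, k, n_vertices, n_edges):
    """Decide whether the local bounded-edge cut separator should be
    active, mirroring the parameterized switches described in the text.
    `formulation` is "flow" or "cut"."""
    if formulation == "flow":
        return False                   # always disabled in the flow model
    # density of a simple undirected graph
    density = 2.0 * n_edges / (n_vertices * (n_vertices - 1))
    # cut formulation: disable for k > 5 or density above 15%
    return k <= 5 and density <= 0.15

# Sparse cut-formulation instance with small k: separator stays on.
print(separate_local_cuts("cut", 5, 100, 500))   # True
# Large k turns it off even on the same sparse graph.
print(separate_local_cuts("cut", 10, 100, 500))  # False
```

All remaining techniques (global bounded-edge cut, propagation, branching rule) are left to the per-formulation defaults discussed above.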
For the flow formulation, we can see that the parameterized settings perform very well in comparison to the standard settings described in the previous sections. Besides two outliers (Steiner-160 for k = 2 and Steiner-80 for k = 10), the average running time of the parameterized settings is comparable to or only slightly worse than the best of the standard settings. In particular, the parameterized setting achieves the best average running times for all test sets in the case of k = 5 and improves the best setting for Steiner-80 by 4.2%.
In the cut formulation, the best results of the parameterized settings are achieved for k = 10, where they realize the best results on Color02 and Steiner-160 as well as almost the best absolute running times for Random and Steiner-80. For smaller values of k, the running times of the parameterized settings are between 5.5% and 9.1% worse than the best setting on Color02 and Steiner-160. On the other test sets, the absolute running times are rather small, and thus the percentage deviation from the best settings is larger, although the absolute deviation is in most cases quite small.
In summary, the parameterized settings reliably produce good results for the flow formulation over all test sets. In the cut formulation, the performance of the parameterized settings is on average worse than the best of the standard settings. The best results, however, are often achieved by an outlier and the best setting differs between the test sets. If we take this into account and again compare the running times, the parameterized settings also perform well in the cut formulation over all test sets. Thus, the parameterization provides a reasonable mechanism to efficiently solve C-Max-k-Cut problems. Finally, the parameterized settings significantly outperform the state-of-the-art techniques.

An application to gas and power networks
In this section we finally focus on solving C-Max-k-Cut problems on realistic gas and power networks. Typically, these instances are large in terms of the number of nodes, very sparse, and contain a significant number of articulation vertices. Since these characteristics have already been taken into account in determining the parameterized setting in the previous section, there is hope that these settings also allow us to efficiently solve C-Max-k-Cut on gas and power instances.
The gas instances (Gas) used in our experiments have been extracted from gas networks of the publicly available GasLib library [45]. This test set consists of 5 instances for which the number of nodes varies between 24 and 582. The density of the corresponding graphs lies between 0.4% and 9.1%. Between 45.9% and 64.2% of the vertices are articulation vertices. An overview of these numbers can be found in Table 20 in the appendix.
The test set of power instances (Power) consists of all power flow networks extracted from the matpower [39] software tool that contain at least 10 and at most 10,000 vertices. These graphs are also very sparse and, besides two outliers, the density is smaller than 9.5%. Furthermore, all instances contain articulation vertices. Table 21 in the appendix contains a detailed overview of these characteristics for every instance.
In our experiments, we compare the performance of the parameterized setting against the standard settings that we have used in Sects. 8.2-8.4. Besides the measures used in the preceding sections, Table 15 also gives the arithmetic mean of the primal-dual gap (column "gap") of the unsolved instances within a test set in order to evaluate the quality of the best solutions found.
For the gas instances, we observe that the parameterized setting is the best setting in the cut formulation for k ∈ {2, 5}, both in terms of running time and average gap. For k = 10, the parameterization also performs very well, leading to almost the smallest average running times and gaps. In the flow formulation, the parameterization is also one of the best settings for k = 2. For larger k, however, the quality of the parameterized setting deteriorates and the running time is much slower than in the best setting. Moreover, we observe that the cut formulation clearly dominates the flow formulation in terms of running times. Thus, using the parameterized setting on the cut formulation reliably leads to the best performance in solving C-Max-k-Cut on gas instances.

On the power instances, the parameterized setting is only slightly slower than the best setting (at most 3.8%) in both the cut and flow formulation. The only exceptions are the flow formulation for k = 10 and the cut formulation for k = 5, where the parameterization is about 7.3% and 16.6% slower, respectively. Thus, again, our parameterization performs very well on average. Since the cut formulation leads on average to faster running times than the flow formulation, we conclude, as for the gas instances, that using the parameterized setting on the cut formulation is a good choice for solving C-Max-k-Cut problems on power instances.

Conclusion
In this paper we studied tailored branch-and-cut techniques for the connected Max-k-Cut problem. We reviewed existing mixed-integer programming techniques from the literature and showed in an extensive numerical study that these techniques do not yield an effective branch-and-cut algorithm for a large variety of test sets. Thus, we also developed novel techniques, which are shown to yield a much more successful method for solving hard instances. Finally, we showed that we can determine general-purpose combinations of the large set of techniques that yield very good results on the studied test sets and that can also solve large-scale and realistic instances of gas and power transport networks.
Acknowledgements The second author was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy (EXC 2163/1, Sustainable and Energy Efficient Aviation, Project-ID 390881007). The third author thanks the DFG for their support within project A2 in SFB 666. The last author thanks the DFG for their support within projects A05 and B08 in SFB/TRR 154. This research has been performed as part of the Energie Campus Nürnberg and has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 764759. Moreover, we would like to thank Frauke Liers for many fruitful discussions on the topic of this paper and Tristan Gally for useful hints concerning implementation issues. Finally, we thank Marc E. Pfetsch for helpful discussions and his agreement to use parts of the code developed in [28] as the basis for our code.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

See Tables 16, 17, 18, 19, 20 and 21.