1 Introduction

We consider optimization problems \(\textrm{OPT}(f,\Phi ) \,{:}{=}\,\min \{ f(x): x \in \Phi \}\) for a real-valued function \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) and feasible region \(\Phi \subseteq \mathbb {R}^n\) such that \(\textrm{OPT}(f,\Phi )\) can be solved by (spatial) branch-and-bound (B&B) [1, 2]. This class of problems is very rich and captures problems such as mixed-integer linear programs and mixed-integer nonlinear programs. The core of B&B-methods is to repeatedly partition the feasible region \(\Phi \) into smaller subregions \(\Phi '\) and to solve the reduced problem \(\textrm{OPT}(f, \Phi ')\). Subregions do not need to be explored further if it is known that they do not contain optimal or improving solutions (i.e., pruning by bound), or if the region becomes empty (i.e., pruning by infeasibility). The sketched mechanism allows one to routinely solve problems with thousands of variables and constraints. If symmetries are present, however, plain B&B usually struggles to solve optimization problems, as we explain next.

A symmetry of the optimization problem is a bijection \(\gamma :\mathbb {R}^n \rightarrow \mathbb {R}^n\) that maps solution vectors \(x \in \mathbb {R}^n\) to solution vectors \(\gamma (x)\) while preserving the objective value and feasibility state, i.e., \(f(x) = f(\gamma (x))\) holds for all \(x \in \mathbb {R}^n\) and \(x \in \Phi \) if and only if \(\gamma (x) \in \Phi \). It follows from this definition that the set of all symmetries of an optimization problem forms a group \({\bar{\Gamma }}\). When enumerating the subregions \(\Phi '\) in B&B, it might happen that several subregions at different parts of the branch-and-bound tree contain equivalent copies of (optimal) solutions. This results in unnecessarily large B&B trees. By exploiting the presence of symmetries, one could enhance B&B by finding more reductions and pruning rules that further restrict the (sub-)regions without sacrificing finding optimal solutions to the original problem [3,4,5].

To handle symmetries, different approaches have been discussed in the literature. Most of these approaches avoid computing symmetric solutions by defining a subset of the feasible region that contains representative solutions of classes of symmetric solutions, so-called fundamental domains [21]. There is, however, no unique way to define a fundamental domain, and in the literature two opposing approaches are used: a static and a dynamic one. In the static setting, a fundamental domain \(F \subseteq \mathbb {R}^n\) is defined before the solving process starts. In the case of mixed-integer programming, such a fundamental domain is usually a polyhedron, and to enforce that only solutions within \(F \cap \Phi \) are computed, symmetry handling constraints (SHCs) [4, 6,7,8,9,10,11,12,13,14] are added to the initial formulation. In contrast to the static setting, a dynamic fundamental domain is not determined a priori. Instead, during the solving process symmetry handling methods inspect the current decisions of a solver and adapt the solution process to guarantee that only solutions within \(F \cap \Phi \) are computed. The latter usually happens via variable domain reductions (VDRs) derived from the B&B-tree [15,16,17].

The static and dynamic approaches thus use different techniques, and it is not always clear whether different techniques can be combined, in particular if symmetry handling constraints rely on a specific fundamental domain. In their textbook form, SHCs and VDRs are thus incompatible, i.e., they cannot be combined, which might leave some potential for symmetry reductions unexploited. Moreover, many SHCs and VDRs have only been studied for binary variables and for symmetries corresponding to permutations of variables, which restricts their applicability.

The goal of this article is to overcome these drawbacks. We therefore devise a unified framework for symmetry handling. The contributions of our framework are that it

  (C1) allows one to reveal whether SHCs and VDRs are compatible,

  (C2) applies to general variable types, and

  (C3) can handle symmetries of arbitrary finite groups, which are not necessarily permutation groups.

Due to C1, one is no longer restricted to using either SHCs or VDR methods. In particular, we show that many popular VDR techniques for binary variables such as orbital fixing [17] and isomorphism pruning [16, 18], but also SHCs, can be simultaneously cast into our framework. That is, our framework unifies the application of these techniques. To fully exploit our framework regarding C2, the second contribution of this paper is a generalization of many symmetry handling techniques from binary variables to general variable types. This allows for handling symmetries in more classes of optimization problems, in particular classes with non-binary variables.

Regarding C3, we stress that this result is not based on the observation that every finite group is isomorphic to a permutation group by Cayley’s theorem [19], because the space in which the isomorphic permutation group is acting might differ from \(\mathbb {R}^n\). Moreover, we remark that Margot [21] also presents a framework for handling symmetries. Our framework is more general though and we stress the differences in Sect. 3.2 in detail.

Outline

After providing basic notations and definitions, Sect. 2 provides an overview of existing symmetry handling methods. In particular, we illustrate the techniques that we will later on cast into our unified framework. The framework itself will be introduced in Sect. 3. Section 4 shows how existing symmetry handling methods can be used in our framework and how these methods can be generalized from binary to general variables. We conclude this article in Sect. 5 with an extensive numerical study of our new framework both for specific applications and benchmarking instances. The study reveals that our novel framework is substantially faster than the state-of-the-art methods on both SHCs and VDRs as implemented in the solver SCIP.

Notation and Definitions

Throughout the article, we assume that we have access to a group \(\Gamma \) consisting of (not necessarily all) symmetries of the optimization problem \(\textrm{OPT}(f, \Phi )\). That is, \(\Gamma \) is a subgroup of \({\bar{\Gamma }}\), which we denote by \(\Gamma \le {\bar{\Gamma }}\). We refer to \(\Gamma \) as a symmetry group of the problem. For solution vectors \(x \in \mathbb {R}^n\), the set of symmetrically equivalent solutions is its \(\Gamma \)-orbit \(\{\gamma (x): \gamma \in \Gamma \}\).

Let \({\mathcal {S}}_{n}\) be the symmetric group of \([n] \,{:}{=}\,\{1,\dots ,n\}\). Moreover, let \([n]_0 \,{:}{=}\,{[n] \cup \{0\}}\). Being in line with the existing literature on symmetry handling, we assume that permutations \(\gamma \in {\mathcal {S}}_{n}\) act on vectors \(x \in \mathbb {R}^n\) by permuting their index sets, i.e., \(\gamma (x) \,{:}{=}\,( x_{\gamma ^{-1}(i)} )_{i=1}^n\). We call such symmetries permutation symmetries. The identity permutation is denoted by \(\textrm{id}\). To represent permutations \(\gamma \), we use their disjoint cycle representation, i.e., \(\gamma \) is the composition of disjoint cycles \((i_1, \dots , i_r)\) such that \(\gamma (i_k) = i_{k + 1}\) for \(k \in \{1,\dots , r-1\}\) and \(\gamma (i_r) = i_1\).
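The action convention can be made concrete with a small sketch. The helper names `perm_from_cycles`, `act`, and `orbit` are illustrative assumptions and do not appear in the literature cited here; moreover, the code uses 0-based indices, whereas the text uses \([n] = \{1,\dots ,n\}\), and it represents a group naively as an explicit list of permutations.

```python
from itertools import permutations


def perm_from_cycles(n, cycles):
    """Build gamma as a list with gamma[i] = image of i (0-based) from disjoint cycles."""
    gamma = list(range(n))
    for cycle in cycles:
        for pos, i in enumerate(cycle):
            gamma[i] = cycle[(pos + 1) % len(cycle)]
    return gamma


def act(gamma, x):
    """Return gamma(x) with gamma(x)_i = x_{gamma^{-1}(i)}.

    Equivalently, the entry at position i is moved to position gamma[i].
    """
    result = [None] * len(x)
    for i, value in enumerate(x):
        result[gamma[i]] = value
    return tuple(result)


def orbit(group, x):
    """Gamma-orbit of x for a group given as an explicit list of permutations."""
    return {act(gamma, x) for gamma in group}


# The cycle (1, 2, 3) of the text becomes (0, 1, 2) with 0-based indices.
gamma = perm_from_cycles(3, [(0, 1, 2)])
print(act(gamma, ("a", "b", "c")))

# Orbit of (1, 0, 0) under the full symmetric group on three indices.
s3 = [list(p) for p in permutations(range(3))]
print(sorted(orbit(s3, (1, 0, 0))))
```

Storing \(\gamma \) as an image list keeps the action a single pass over the vector; practical implementations typically work with generators of \(\Gamma \) rather than with all group elements.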

In practice, the symmetry group \(\Gamma \) is either provided by a user or found using detection methods such as in [5, 20]. Detecting the full permutation symmetry group for binary problems, however, is \(\textrm{NP}\)-hard [21]. For non-linear problems, depending on how the feasible region \(\Phi \) is given, already verifying if \(\gamma \) is a symmetry might be undecidable [4]. In practice, one therefore usually computes a subgroup of symmetries that keep the problem formulation invariant, which can be achieved by solving a suitable graph automorphism problem [4, 20].

To handle symmetries, among others, we will make use of variable domain propagation. The idea of propagation approaches is, given a symmetry reduction rule and domains for all variables, to derive reductions of some variable domains if every solution adhering to the symmetry rule is contained in the reduced domain. More concretely, let \(\Phi '\) be the feasible region of some subproblem encountered during branch-and-bound. For every variable \(x_i\), \(i \in [n]\), let \({\mathcal {D}}_i \subseteq \mathbb {R}\) be its domain, which covers the projection of \(\Phi '\) on \(x_i\), i.e., \({\mathcal {D}}_i \supseteq \{ v \in \mathbb {R}: x_i = v \text { for some } x \in \Phi '\}\). In an integer programming context, the domain \({\mathcal {D}}_i\) corresponds to an interval in practice. A symmetry reduction rule is encoded as a set \({\mathcal {C}} \subseteq \mathbb {R}^n\), which consists of all solution vectors that adhere to the rule. The goal of variable domain propagation is to find sets \({\mathcal {D}}'_i \subseteq {\mathcal {D}}_i\), \(i \in [n]\), such that \({\mathcal {C}} \cap ({\mathcal {D}}_1 \times \dots \times {\mathcal {D}}_n) \subseteq {\mathcal {D}}'_1 \times \dots \times {\mathcal {D}}'_n\). In this case, the domain of variable \(x_i\) can be reduced to \({\mathcal {D}}'_i\). We say that propagation is complete if, for every \(i \in [n]\), domain \({\mathcal {D}}'_i\) is inclusionwise minimal.
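For finite domains, complete propagation can be sketched by plain enumeration. The sketch assumes the reading that \({\mathcal {D}}'_i\) should retain exactly those values of \({\mathcal {D}}_i\) that are attained by some vector of \({\mathcal {D}}_1 \times \dots \times {\mathcal {D}}_n\) adhering to the rule; the function name `propagate` and the example rule are hypothetical and chosen only for illustration.

```python
from itertools import product


def propagate(domains, rule):
    """Complete propagation by enumeration (exponential, for illustration only).

    domains: list of finite sets D_i; rule: predicate deciding whether x is in C.
    Returns the inclusionwise minimal sets D'_i, or None if no vector in
    D_1 x ... x D_n adheres to the rule.
    """
    reduced = [set() for _ in domains]
    for x in product(*domains):
        if rule(x):
            for i, value in enumerate(x):
                reduced[i].add(value)
    if not all(reduced):
        return None
    return reduced


# Hypothetical rule C = {x : x_1 >= x_2} on two integer variables.
domains = [{0, 1, 2}, {1, 2, 3}]
print(propagate(domains, lambda x: x[0] >= x[1]))
```

In this toy instance both domains shrink to \(\{1, 2\}\): the value 0 for the first variable and the value 3 for the second variable are not attained by any vector adhering to the rule.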

Throughout this article, we denote full B&B-trees by \({\mathcal {B}}= ({\mathcal {V}}, {\mathcal {E}})\), i.e., we do not prune nodes by their objective value and do not apply enhancements such as cutting planes or bound propagation. This is only required to prove theoretical statements about symmetry handling and does not restrict their practical applicability as we will discuss below. If not mentioned differently, we assume \({\mathcal {B}}\) to be finite, which might not be the case for spatial branch-and-bound; the case of infinite B&B-trees will be discussed separately. For \(\beta \in {\mathcal {V}}\), let \(\chi _\beta \) be the set of its children and let \(\Phi (\beta ) \subseteq \Phi \) be the feasible solutions at \(\beta \), i.e., the intersection of \(\Phi \) and the branching decisions. If \(\beta \) is not a leaf, we assume that \(\Phi (\omega )\), \(\omega \in \chi _\beta \), partitions \(\Phi (\beta )\). In our definitions, this is even the case for spatial branch-and-bound, meaning that the partitioned feasible regions are not necessarily closed sets. We will discuss the practical consequences of this assumption below.

2 Overview of symmetry handling methods for binary programs

This section provides an overview of symmetry handling methods. The methods lexicographic fixing, orbitopal fixing, isomorphism pruning, and orbital fixing are described in detail, because we will later on show that these methods can be cast into our framework and can be generalized from binary to arbitrary variable domains. Further symmetry handling methods will only be mentioned briefly. We illustrate the different methods using the following running example.

Problem 1

(NDB) Sherali and Smith [22] consider the Noise Dosage Problem (ND). There are p machines, and on every machine a number of tasks must be executed. For machine \({i \in [p]}\), there are \(d_i\) work cycles, each requiring \(t_i\) hours of operation, and each such work cycle induces \(\alpha _i\) units of noise. There are q workers to be assigned to the machines, each of which is limited to H hours of work. The problem is to minimize the noise dosage of the worker that receives the most units of noise. We extend this problem definition with the requirement that each worker can only be assigned once to the same machine, which makes the problem a binary problem (NDB), namely to

$$\begin{aligned} \text {minimize}\ \eta&, \end{aligned}$$
(NDB1)
$$\begin{aligned} \text {subject to}\ \eta&\ge \sum \nolimits _{i \in [p]} \alpha _i \vartheta _{i,j}{} & {} \text {for all}\ j \in [q], \end{aligned}$$
(NDB2)
$$\begin{aligned} \sum \nolimits _{j \in [q]} \vartheta _{i,j}&= d_i{} & {} \text {for all}\ i \in [p], \end{aligned}$$
(NDB3)
$$\begin{aligned} \sum \nolimits _{i \in [p]} t_i \vartheta _{i, j}&\le H{} & {} \text {for all}\ j \in [q], \end{aligned}$$
(NDB4)
$$\begin{aligned} \vartheta&\in \{0, 1\}^{p \times q}, \end{aligned}$$
(NDB5)
$$\begin{aligned} \eta&\ge 0. \end{aligned}$$
(NDB6)

For a solution, \(\vartheta \) represents the worker schedules in a \(p \times q\) binary matrix. The value of variable \(\vartheta _{i, j}\) states whether a task on machine i is allocated to worker j. Since all workers have the same properties in this model, symmetrically equivalent solutions are found by permuting the worker schedules. This corresponds to permuting the columns of the \(\vartheta \)-matrix. As such, a symmetry group of this problem is the group \(\Gamma \) consisting of all column permutations of this \(p \times q\) matrix.

For illustration purposes, we focus on an NDB instance with \({p=3}\) machines and \({q=5}\) workers. We stress that the symmetry handling methods work even if variable domain reductions inferred by the model constraints are applied. For the ease of presentation, however, we assume no such reductions are made in the NDB problem instance. For this reason, we do not specify \(d_i\), \(t_i\), and H.

2.1 Symmetry handling constraints based on lexicographic order

The philosophy of symmetry handling constraints (SHCs) is to restrict the feasible region of an optimization problem to representatives of the \(\Gamma \)-orbits of feasible solutions. A common way to do this is to enforce that feasible solutions must be lexicographically maximal in their \(\Gamma \)-orbit [7].

Let \(x, y \in \mathbb {R}^n\). We say x is lexicographically larger than y, denoted \(x \succ y\), if for some \(k \in [n]\) we have \(x_i = y_i\) for \(i < k\), and \(x_k > y_k\). If \(x \succ y\) or \(x = y\), we write \(x \succeq y\). Since the lexicographic order specifies a total ordering on \(\mathbb {R}^n\), to solve the optimization problem \(\textrm{OPT}(f, \Phi )\), it is sufficient to consider only those solutions x that are lexicographically maximal in their \(\Gamma \)-orbit. Let

$$\begin{aligned} {\mathcal {X}} \,{:}{=}\,\{ x \in \{0, 1\}^n: x \succeq \gamma (x) \text { for all } \gamma \in \Gamma \}. \end{aligned}$$

Then, solving \(\textrm{OPT}(f, \Phi \cap {\mathcal {X}})\) yields the same optimal objective and the same feasibility state as the original problem. Note, however, that deciding whether a vector \(x \in \{0, 1\}^n\) is contained in \({\mathcal {X}} \) is \(\textrm{coNP}\)-complete [23]. Complete propagation of the SHCs \({\mathcal {X}}\) is thus \(\textrm{coNP}\)-hard for general groups. In practice, one therefore either neglects the group structure or applies specialized algorithms for particular groups [6, 24]. We discuss lexicographic fixing and orbitopal fixing as representatives for these two approaches.
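For very small instances, membership in \({\mathcal {X}}\) can be checked by enumerating the whole group, which illustrates the definition but not a practical algorithm. The following sketch is an assumption-laden toy: the names `act` and `is_lex_max_in_orbit` are hypothetical, indices are 0-based, and the group is the full symmetric group on three coordinates. It relies on the fact that Python compares tuples lexicographically.

```python
from itertools import permutations, product


def act(gamma, x):
    """gamma(x)_i = x_{gamma^{-1}(i)}: the entry at position i moves to position gamma[i]."""
    result = [0] * len(x)
    for i, value in enumerate(x):
        result[gamma[i]] = value
    return tuple(result)


def is_lex_max_in_orbit(x, group):
    """Check x >= gamma(x) lexicographically for every gamma in the group."""
    x = tuple(x)
    return all(x >= act(gamma, x) for gamma in group)


# Hypothetical toy instance: the full symmetric group acting on n = 3 coordinates.
n = 3
group = [list(p) for p in permutations(range(n))]
representatives = [x for x in product((0, 1), repeat=n) if is_lex_max_in_orbit(x, group)]
print(representatives)
```

For this group, the representatives are exactly the binary vectors with non-increasing entries, one per orbit. The enumeration touches all \(n!\) group elements, which is the cost that the methods discussed next try to avoid.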

2.1.1 Lexicographic fixing (LexFix)

Instead of handling \(x \succeq \gamma (x)\) for all \(\gamma \in \Gamma \), one can handle this SHC for a single permutation \(\gamma \) only. For binary problems, [7] shows that \(x \succeq \gamma (x)\) is equivalent to the linear inequality \(\sum _{k=1}^n 2^{n-k} x_k \ge \sum _{k=1}^n 2^{n-k} \gamma (x)_k\). Due to the large coefficients, however, these inequalities might cause numerical instabilities. To circumvent numerical instabilities, [9] presents an alternative family of linear inequalities modeling \(x \succeq \gamma (x)\) in which all variable coefficients are either 0 or \(\pm 1\) and that can be separated efficiently.
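The equivalence between the lexicographic comparison and the inequality with coefficients \(2^{n-k}\) can be checked numerically for small n. The sketch below is only a brute-force sanity check under assumed names (`act`, `weight`, `equivalence_holds`) and 0-based permutations; it is not the separation routine of [9].

```python
from itertools import permutations, product


def act(gamma, x):
    """gamma(x)_i = x_{gamma^{-1}(i)}: the entry at position i moves to position gamma[i]."""
    result = [0] * len(x)
    for i, value in enumerate(x):
        result[gamma[i]] = value
    return tuple(result)


def weight(x):
    """Left-hand side sum_{k=1}^n 2^(n-k) x_k of the linear inequality."""
    n = len(x)
    return sum(2 ** (n - k) * x_k for k, x_k in enumerate(x, start=1))


def equivalence_holds(gamma):
    """Compare both formulations of x >= gamma(x) on every binary vector."""
    n = len(gamma)
    return all(
        (x >= act(gamma, x)) == (weight(x) >= weight(act(gamma, x)))
        for x in product((0, 1), repeat=n)
    )


# Check all permutations of n = 4 coordinates.
print(all(equivalence_holds(list(p)) for p in permutations(range(4))))
```

The check succeeds because `weight` reads a binary vector as a binary number, and comparing binary numbers of equal length coincides with comparing their digit vectors lexicographically. The coefficients grow exponentially in n, which is the source of the numerical issues mentioned above.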

Alternatively, \(x \succeq \gamma (x)\) can also be enforced using a complete propagation algorithm that runs in linear time [24, 25]. Since a variable domain reduction in the binary setting corresponds to fixing a variable, we refer to this algorithm as the lexicographic fixing algorithm, or LexFix in short.

Using our running example NDB, we illustrate the idea of LexFix for the permutation \(\gamma \) that exchanges columns 2 and 3 and fixes all remaining columns. Since the lexicographic order depends on a specific variable ordering, we assume that the variables of the \(\vartheta \)-matrix are sorted row-wise. That is, \(x = (\vartheta _{1,1}, \dots , \vartheta _{1, 5}; \dots ; \vartheta _{3,1}, \dots , \vartheta _{3, 5})\). We omit \(\eta \) from the vector, since the orbit of \(\eta \) is trivial with respect to \(\Gamma \).

When removing fixed points of the solution vector from \(\gamma \), enforcement of \(x \succeq \gamma (x)\) corresponds to \( (\vartheta _{1,2}, \vartheta _{1,3}, \vartheta _{2,2}, \vartheta _{2,3}, \vartheta _{3,2}, \vartheta _{3,3}) \succeq (\vartheta _{1,3}, \vartheta _{1,2}, \vartheta _{2,3}, \vartheta _{2,2}, \vartheta _{3,3}, \vartheta _{3,2}) \), which in turn corresponds to \( (\vartheta _{1,2}, \vartheta _{2,2}, \vartheta _{3,2}) \succeq (\vartheta _{1,3}, \vartheta _{2,3}, \vartheta _{3,3}) \). Complete propagation of that constraint for the running example is shown in Fig. 1. For instance, in the leftmost node, \(\vartheta _{1,3}\) can be fixed to 0, because any solution with \(\vartheta _{1,3} = 1\) and satisfying the remaining local variable domains violates the lexicographic order constraint as \((\vartheta _{1,2},\vartheta _{2,2},\vartheta _{3,2}) = (0,\vartheta _{2,2},\vartheta _{3,2}) \nsucceq (1,0,\vartheta _{3,3}) = (\vartheta _{1,3},\vartheta _{2,3},\vartheta _{3,3})\).

Fig. 1 Branch-and-bound tree for the NDB problem. Fixings by LexFix are drawn red
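The fixing of \(\vartheta _{1,3}\) described above can be reproduced by enumerating the six variables of the reduced constraint. The sketch is a brute-force stand-in, not the linear-time algorithm of [24, 25]; the variable names `t12`, ..., `t33` and the function `lexfix_bruteforce` are hypothetical, and the node is assumed to carry the branching decisions \(\vartheta _{2,3} = 0\) and \(\vartheta _{1,2} = 0\).

```python
from itertools import product

LEFT = ["t12", "t22", "t32"]   # column 2 of theta, read top to bottom
RIGHT = ["t13", "t23", "t33"]  # column 3 of theta, read top to bottom


def lexfix_bruteforce(fixed):
    """Return all new fixings implied by LEFT >= RIGHT (lexicographically).

    fixed maps variable names to 0 or 1. Returns None if the constraint
    cannot be satisfied under the given fixings.
    """
    names = LEFT + RIGHT
    domains = [(fixed[v],) if v in fixed else (0, 1) for v in names]
    seen = {v: set() for v in names}
    for values in product(*domains):
        x = dict(zip(names, values))
        if tuple(x[v] for v in LEFT) >= tuple(x[v] for v in RIGHT):
            for v in names:
                seen[v].add(x[v])
    if not all(seen.values()):
        return None
    return {v: min(vals) for v, vals in seen.items() if len(vals) == 1 and v not in fixed}


print(lexfix_bruteforce({"t23": 0, "t12": 0}))
```

Under the assumed branching decisions, the only implied fixing is \(\vartheta _{1,3} = 0\); the remaining variables keep both values because \(\vartheta _{2,2} = 1\) already makes the left column strictly larger.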

2.1.2 Orbitopal fixing

Note that LexFix neglects the entire group structure and thus might not find variable domain reductions that are based on the interplay of different symmetries. Since propagation for \({\mathcal {X}} \) is \(\textrm{coNP}\)-hard, special cases of groups have been investigated that appear very frequently in practice. One of these groups corresponds to the symmetries present in NDB, i.e., the symmetry group \(\Gamma \) acts on \(p \times q\) matrices of binary variables by exchanging their columns. We refer to such matrix symmetries as orbitopal symmetries. Besides in NDB, orbitopal symmetries arise in many further applications such as graph coloring or unit commitment problems [6, 10, 26].

Fig. 2 Branch-and-bound tree for the NDB problem. Fixings by orbitopal fixing are drawn red

For the variable ordering discussed in Sect. 2.1.1 and \(\Gamma \) being a group of orbitopal symmetries, one can show that enforcing \(x \succeq \gamma (x)\) for all \(\gamma \in \Gamma \) is equivalent to sorting the columns of the variable matrix in lexicographically non-increasing order. Bendotti et al. [6] present a propagation algorithm for such symmetries, so-called orbitopal fixing. Kaibel et al. [10] discuss a propagation algorithm for the case that each row of the variable matrix has at most one 1-entry. Both algorithms are complete and run in linear time. Moreover, Kaibel and Pfetsch [11] derive a facet description of all binary matrices with lexicographically sorted columns and at most (or exactly) one 1-entry per row. That is, the SHC  \({\mathcal {X}} \) can be replaced by the facet description in this case.

Given initial variable domains \({\mathcal {D}}\subseteq \{0, 1\}^{p \times q}\), the algorithm of Bendotti et al. finds the tightest variable domains as follows. First, the lexicographically minimal and maximal matrices in \({\mathcal {X}} \cap {\mathcal {D}}\) are computed. Then, for each column in the variable matrix, the associated variables can be fixed to the value of the lexicographically extreme matrices up to the first row where these extremal matrices differ. If the columns of the extremal matrices are identical, the whole column can be fixed.

For the running example, Fig. 2 presents the branch-and-bound tree with variable fixings by orbitopal fixing. For instance, if \((\vartheta _{2,3}, \vartheta _{1,2}) \leftarrow (0, 0)\), the lexicographically minimal and maximal matrices are \( \left[ \begin{array}{ccccc} 0&{}{\underline{0}}&{}0&{}0&{}0\\ 0&{}0&{}{\underline{0}}&{}0&{}0\\ 0&{}0&{}0&{}0&{}0 \end{array}\right] \) and \( \left[ \begin{array}{ccccc} 1&{}{\underline{0}}&{}0&{}0&{}0\\ 1&{}1&{}{\underline{0}}&{}0&{}0\\ 1&{}1&{}1&{}1&{}1 \end{array}\right] \), with the branching decisions underlined, respectively. Applying orbitopal fixing then leads to the leftmost matrix in Fig. 2.
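The two extremal matrices and the resulting fixings can be recomputed by enumeration for the \(3 \times 5\) example. The sketch below replaces the linear-time algorithm of Bendotti et al. by brute force and is meant only to illustrate the fixing rule; the function names and the 0-based (row, column) keys are assumptions made for this illustration.

```python
from itertools import product

P, Q = 3, 5  # machines (rows) and workers (columns) of the running example


def columns_sorted(matrix):
    """True if the columns are in lexicographically non-increasing order."""
    cols = list(zip(*matrix))
    return all(cols[j] >= cols[j + 1] for j in range(len(cols) - 1))


def feasible_matrices(fixed):
    """Yield all P x Q binary matrices with sorted columns that respect the fixings.

    fixed maps 0-based (row, column) pairs to 0 or 1.
    """
    cells = [(i, j) for i in range(P) for j in range(Q)]
    domains = [(fixed[c],) if c in fixed else (0, 1) for c in cells]
    for flat in product(*domains):
        matrix = tuple(tuple(flat[i * Q:(i + 1) * Q]) for i in range(P))
        if columns_sorted(matrix):
            yield matrix


def orbitopal_fixing(fixed):
    """Fix each column up to the first row where the extremal matrices differ.

    Assumes that at least one feasible matrix exists.
    """
    matrices = list(feasible_matrices(fixed))
    # Tuples of rows compare row by row, i.e. in the row-wise variable order.
    lex_min, lex_max = min(matrices), max(matrices)
    fixings = {}
    for j in range(Q):
        for i in range(P):
            if lex_min[i][j] != lex_max[i][j]:
                break
            fixings[(i, j)] = lex_min[i][j]
    return lex_min, lex_max, fixings


# Branching decisions theta_{2,3} = 0 and theta_{1,2} = 0 (0-based keys).
lex_min, lex_max, fixings = orbitopal_fixing({(1, 2): 0, (0, 1): 0})
print(lex_min, lex_max, sorted(fixings))
```

For the stated branching decisions, the enumeration yields the two matrices given above and fixes \(\vartheta _{1,3}, \vartheta _{1,4}, \vartheta _{1,5}, \vartheta _{2,4}, \vartheta _{2,5}\) to 0 in addition to the two branching variables.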

We say that orbitopal fixing is a complete propagation method, since it finds all variable bound reductions implied by the set of constraints that order the columns of the matrix lexicographically non-increasingly. The lexicographic comparison constraints \(x \succeq \gamma (x)\) can also be handled independently for every \(\gamma \in \Gamma \) with LexFix. There are two problems with this approach. First, in the case of orbitopal fixing, \(\Gamma \) is the symmetric group, so \(|\Gamma | = q!\), which is potentially very large. Second, it is possible that not all fixings can be found, e.g., see the example in Appendix A.

Note that orbitopal fixing does not find any variable domain reduction after the first branching decision in Fig. 2. The reason is that branching occurred for a variable in the second row. To still be able to benefit from some symmetry reductions in this case, Bendotti et al. [6] also discuss a variant of orbitopal fixing that adapts the order of rows based on the branching decisions. They empirically show that this adapted algorithm performs better than the original algorithm for the unit commitment problem. We discuss the adapted variant in more detail and with more flexibility in terms of our new framework in Sect. 3 in Example 12.

2.2 Symmetry reductions based on branching tree structure

Recall that the SHCs discussed in the previous section restrict the feasible region of an optimization problem. That is, already before solving the optimization problem, it is determined which symmetric solutions are discarded. A second family of symmetry handling techniques uses a more dynamic approach, which prevents the creation of symmetric copies of subproblems already created in the branch-and-bound tree. The motivation for this is that symmetry reductions can be carried out earlier than in a static setting as described in the previous section.

Throughout this section, let \({\mathcal {B}}= ({\mathcal {V}}, {\mathcal {E}})\) be a branch-and-bound tree, where branching is applied on a single variable. For a node \(\beta \in {\mathcal {V}}\), let \(B_0^\beta \) (resp. \(B_1^\beta \)) be the set of variable indices of a solution vector that are fixed to 0 (resp. 1) by the branching decisions on the rooted tree path to \(\beta \). Moreover, we assume \(\Phi \subseteq \{0, 1\}^n\) as the techniques that we describe next have mostly been discussed for binary problems.

2.2.1 Isomorphism pruning

In classical branch-and-bound approaches, a node can be pruned if the corresponding subproblem is infeasible (pruning by infeasibility) or if the subproblem cannot contain an improving solution (pruning by bound). In the presence of symmetry, Margot [15, 18] and Ostrowski [16] discuss another pruning rule that discards symmetric or isomorphic subproblems, so-called isomorphism pruning.

The least restrictive version of isomorphism pruning is due to Ostrowski [16]. Note that the way we phrase isomorphism pruning differs from the notation in [16]. In terms of our unified framework that we discuss in Sect. 3, however, our notation is more suitable.

Let \(\beta \in {\mathcal {V}}\) be a branch-and-bound tree node at depth m, and suppose that every branching decision corresponds to a single variable fixing. Let \(i_k\) be the index of the branching variable at depth \(k \in [m]\) on the rooted path to \(\beta \). Then, \(B_0^\beta \cup B_1^\beta = \{i_1, \dots , i_m\}\). Let \(\pi _\beta \in {\mathcal {S}}_{n}\) be any permutation with \(\pi _\beta (i_k) = k\) for \(k \in [m]\) and let \(y \in \{0, 1\}^n\) such that \(y_i = 1\) if and only if \(i \in B_1^\beta \). For a vector \(x \in \mathbb {R}^n\) and \(A \subseteq [n]\), we denote by \( x{\big |}{}_{\smash {A}} \) its restriction to the entries in A.

Theorem 2

(Isomorphism Pruning) Let \(\beta \in {\mathcal {V}}\) be a node at depth m. Node \(\beta \) can be pruned if there exists \(\gamma \in \Gamma \) such that \( \pi _\beta (y){\big |}{}_{\smash {[m]}} \prec \pi _\beta (\gamma (y)){\big |}{}_{\smash {[m]}} \).

Testing if a vector is lexicographically maximal in its orbit is a \(\textrm{coNP}\)-complete problem [23]. As such, deciding if \(\beta \) can be pruned by isomorphism is an \(\textrm{NP}\)-complete problem.

Remark 3

Margot [26] also describes a variant of isomorphism pruning that can be used to handle symmetries of general integer variables. Margot’s variant assumes a specific branching rule. We do not describe it in more detail, as our framework can also handle general integer variables and, like Ostrowski’s version for binary variables, does not rely on any assumptions on the branching rule.

Figure 3 shows the branch-and-bound tree after applying isomorphism pruning. The only node \(\beta \) that can be pruned is the one where \((\vartheta _{2,3},\vartheta _{1,2},\vartheta _{1,3}) = (0, 0, 1)\). This is due to the symmetry \(\gamma \) swapping columns 2 and 3. For this node, \(\pi _\beta (x) = (\vartheta _{2,3},\vartheta _{1,2},\vartheta _{1,3})\) and \((\pi _\beta \circ \gamma )(x) = (\vartheta _{2,2},\vartheta _{1,3},\vartheta _{1,2})\), and as such

$$\begin{aligned} \pi _\beta (y) = (0, 0, 1, \dots ) \prec (0, 1, 0, \dots ) = \pi _\beta (\gamma (y)). \end{aligned}$$

Fig. 3 Branch-and-bound tree for the NDB problem with pruned nodes by isomorphism pruning
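The pruning test of Theorem 2 can be replayed for the running example by enumerating all \(5! = 120\) column permutations. This is a sketch under assumptions: the function names are hypothetical, indices are 0-based, and the exhaustive loop over \(\Gamma \) is affordable only because the example is tiny.

```python
from itertools import permutations

P, Q = 3, 5  # rows and columns of the theta-matrix


def index(row, col):
    """0-based position of theta_{row+1, col+1} in the row-wise variable order."""
    return row * Q + col


def column_permutation_group():
    """All permutations of the P*Q variables induced by permuting the Q columns."""
    return [
        [index(i, sigma[j]) for i in range(P) for j in range(Q)]
        for sigma in permutations(range(Q))
    ]


def act(gamma, x):
    """gamma(x)_i = x_{gamma^{-1}(i)}: the entry at position i moves to position gamma[i]."""
    result = [0] * len(x)
    for i, value in enumerate(x):
        result[gamma[i]] = value
    return tuple(result)


def can_prune(branching, group):
    """Isomorphism pruning test for a node.

    branching: list of (variable index, value) pairs, ordered from the root
    to the node. Restricting to the branching variables in this order plays
    the role of pi_beta followed by the restriction to [m].
    """
    y = [0] * (P * Q)
    for i, value in branching:
        y[i] = value  # y is the incidence vector of the 1-branchings
    order = [i for i, _ in branching]
    restricted_y = tuple(y[i] for i in order)
    for gamma in group:
        image = act(gamma, y)
        if restricted_y < tuple(image[i] for i in order):
            return True
    return False


group = column_permutation_group()
node = [(index(1, 2), 0), (index(0, 1), 0), (index(0, 2), 1)]     # (0, 0, 1)
sibling = [(index(1, 2), 0), (index(0, 1), 0), (index(0, 2), 0)]  # (0, 0, 0)
print(can_prune(node, group), can_prune(sibling, group))
```

For the node with \((\vartheta _{2,3},\vartheta _{1,2},\vartheta _{1,3}) = (0, 0, 1)\), the permutation swapping columns 2 and 3 produces the comparison \((0,0,1) \prec (0,1,0)\) and the test succeeds, whereas its sibling with all three branching variables at 0 is not pruned.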

Note that isomorphism pruning is a pruning method, which means that it does not find reductions. However, isomorphism pruning can be enhanced by fixing rules that allow finding additional variable fixings early on in the branch-and-bound tree, as we discuss next.

2.2.2 Orbital fixing

Orbital fixing (OF) refers to a family of variable domain reductions (VDRs), whose common ground is to fix variables within orbits of already fixed variables. The exact definition of OF differs between different authors [15, 17]. The main difference is whether fixings found by branching decisions are distinguished from fixings found by orbital fixing. We describe the variant [17, Theorem 3], which is compatible with isomorphism pruning. Let \(\beta \in {\mathcal {V}}\). The group of all permutations that stabilize the 1-branchings up to node \(\beta \) is denoted by \(\Delta ^\beta \,{:}{=}\,\textrm{stab}(\Gamma , B_1^\beta ) \,{:}{=}\,\{ \gamma \in \Gamma : {\gamma (B_1^\beta ) = B_1^\beta } \}\).

Theorem 4

(Orbital Fixing) Let \(\beta \in {\mathcal {V}}\). If \(i \in B_0^\beta \), all variables in the \(\Delta ^\beta \)-orbit of i can be fixed to 0.

Figure 4 shows the branch-and-bound tree obtained by applying this orbital fixing rule to the running example. Note that, if up to node \(\beta \) no variables are branched to one, \(\Delta ^\beta \) corresponds to the symmetry group \(\Gamma \). This means that, for a zero-branching, the whole \(\Gamma \)-orbit of the branching variable (the corresponding row in \(\vartheta \)) can be fixed to zero.
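The rule of Theorem 4 amounts to computing a setwise stabilizer and a few orbits, which can be sketched by enumeration for the running example. The names `index`, `column_permutation_group`, and `orbital_fixing` are hypothetical, indices are 0-based, and the explicit list of all 120 column permutations stands in for the generator-based computations used in practice.

```python
from itertools import permutations

P, Q = 3, 5  # rows and columns of the theta-matrix


def index(row, col):
    """0-based position of theta_{row+1, col+1} in the row-wise variable order."""
    return row * Q + col


def column_permutation_group():
    """All permutations of the P*Q variables induced by permuting the Q columns."""
    return [
        [index(i, sigma[j]) for i in range(P) for j in range(Q)]
        for sigma in permutations(range(Q))
    ]


def orbital_fixing(b0, b1, group):
    """Indices that can additionally be fixed to 0 at a node.

    b0, b1: sets of variable indices branched to 0 and to 1, respectively.
    The stabilizer Delta keeps b1 invariant as a set; every index in the
    Delta-orbit of some i in b0 is fixed to 0.
    """
    b0, b1 = set(b0), set(b1)
    stabilizer = [g for g in group if {g[i] for i in b1} == b1]
    fixed_to_zero = set()
    for i in b0:
        fixed_to_zero.update(g[i] for g in stabilizer)
    return fixed_to_zero - b0


group = column_permutation_group()

# After the single branching decision theta_{2,3} = 0, the whole second row is fixed.
print(sorted(orbital_fixing({index(1, 2)}, set(), group)))

# If additionally theta_{1,2} = 1, column 2 must stay in place.
print(sorted(orbital_fixing({index(1, 2)}, {index(0, 1)}, group)))
```

In the first call no variable is branched to one, so \(\Delta ^\beta = \Gamma \) and all of row 2 is fixed to zero. In the second call the 1-branching on \(\vartheta _{1,2}\) shrinks the stabilizer to the permutations that keep column 2 in place, so \(\vartheta _{2,2}\) is no longer fixed.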

Fig. 4 Branch-and-bound tree for the NDB problem with fixings by orbital fixing

Since [17] does not distinguish variables fixed to 1 by branching or other decisions, \(\Delta ^\beta \) can be replaced by all permutations that stabilize the variables that are fixed to 1 (opposed to just branched to be 1). Note that neither definition of \(\Delta ^\beta \) contains the other, i.e., neither version of OF dominates the other in terms of the number of fixings that can be found. Another variant of OF that also finds 1-fixings is presented in [16], see also [5].

2.3 Further symmetry handling methods

Liberti and Ostrowski [14] as well as Salvagnin [27] present symmetry handling inequalities that can be derived from the Schreier-Sims table of the group. Further symmetry handling inequalities are described by Liberti [13]. In contrast to the constraints from Sect. 2.1, they are also able to handle symmetries of non-binary variables; their symmetry handling effect is limited though. Another class of inequalities, so-called orbital conflict inequalities, has been proposed in [28]. Moreover, symmetry handling inequalities for specific problem classes are discussed, among others, by [8, 11, 22, 29, 30].

Besides the propagation approaches discussed above, also tailored complete algorithms that can handle special cyclic groups exist [24]. Moreover, Ostrowski [16] presents smallest-image fixing, a propagation algorithm for binary variables. Instead of exploiting symmetries in a propagation framework, symmetries can also be handled by tailored branching rules [17, 31]. Furthermore, orbital shrinking [32] is a method that handles symmetries by aggregating variables contained in a common orbit, which results in a relaxation of the problem. Finally, core points [33,34,35] can be used to restrict the feasible region of problems to a subset of solutions. This latter approach does not coincide with lexicographically maximal representatives.

3 Unified framework for symmetry handling

As the literature review shows, different symmetry handling methods use different paradigms to derive symmetry reductions. For instance, SHCs remove symmetric solutions from the initial problem formulation, whereas methods such as orbital fixing remove symmetric solutions based on the branching history. At first glance, these methods thus are not necessarily compatible.

To overcome this seeming incompatibility, we present a unified framework for symmetry handling that makes it easy to check whether symmetry handling methods are compatible. It turns out that, via our framework, isomorphism pruning and OF can be made compatible with a variant of LexFix. Moreover, in contrast to many symmetry handling methods discussed in the literature, our framework also applies to non-binary problems and is not restricted to permutation symmetries. Before we present our framework in Sect. 3.2, it will be useful to first provide an interpretation of isomorphism pruning through the lens of symmetry handling constraints.

3.1 Isomorphism pruning and orbital fixing revisited

Let \(\beta \in {\mathcal {V}}\) be a node at depth m and let y be the incidence vector of 1-branching decisions as described in Sect. 2.2.1. Due to Theorem 2, node \(\beta \) can be pruned by isomorphism if solution vector y violates \( \pi _\beta (y){\big |}{}_{\smash {[m]}} \succeq \pi _\beta (\gamma (y)){\big |}{}_{\smash {[m]}} \) for some \(\gamma \in \Gamma \). The latter condition looks very similar to classical SHCs; however, there are some differences: the variable order is changed via \(\pi _\beta \), not all variables are present in this constraint due to the restriction, and most importantly, every node has a potentially different reordering and restriction. Nevertheless, these modified SHCs can be used to remove all symmetries from a binary problem in the sense that, for every solution x of a binary problem, there exists exactly one node of \({\mathcal {B}}\) at depth n that contains a symmetric counterpart of x, see [16, Thm. 4.5].

Based on the modified SHCs, it is easy to show that isomorphism pruning and orbital fixing are compatible, provided one can show that both methods are compatible with the modified SHCs.

Lemma 5

Let \(\beta \in {\mathcal {V}}\) be a node at depth m. If \(\beta \) gets pruned by isomorphism, there is no \(x \in \{0, 1\}^n\) that is feasible for the subproblem at \(\beta \) and that satisfies \( \pi _\beta (x){\big |}{}_{\smash {[m]}} \succeq \pi _\beta (\gamma (x)){\big |}{}_{\smash {[m]}} \) for all \(\gamma \in \Gamma \).

Proof

As in Sect. 2.2.1, let \(y \in \{0, 1\}^n\) be such that \(y_i = 1\) if and only if \(i \in B_1^\beta \). If isomorphism pruning prunes \(\beta \), there is \({\gamma \in \Gamma }\) with \({ \pi _\beta (y){\big |}{}_{\smash {[m]}} \prec \pi _\beta (\gamma (y)){\big |}{}_{\smash {[m]}} }\). As the first m entries of \(\pi _\beta (y)\) correspond to the branching variables and the remaining entries are 0, we find \({ \pi _\beta (\gamma (x)){\big |}{}_{\smash {[m]}} \ge \pi _\beta (\gamma (y)){\big |}{}_{\smash {[m]}} }\) (componentwise) for each \({x \in \Phi (\beta )}\). Thus, \({ \pi _\beta (x){\big |}{}_{\smash {[m]}} = \pi _\beta (y){\big |}{}_{\smash {[m]}} \prec \pi _\beta (\gamma (y)){\big |}{}_{\smash {[m]}} \le \pi _\beta (\gamma (x)){\big |}{}_{\smash {[m]}} }\), which means every solution in \(\Phi (\beta )\) violates \({ \pi _\beta (x){\big |}{}_{\smash {[m]}} \succeq \pi _\beta (\gamma (x)){\big |}{}_{\smash {[m]}} }\). \(\square \)

Lemma 6

Let \(\beta \in {\mathcal {V}}\) be a node at depth m. Every fixing found by OF at node \(\beta \) is implied by \( \pi _\beta (x){\big |}{}_{\smash {[m]}} \succeq \pi _\beta (\gamma (x)){\big |}{}_{\smash {[m]}} \) for all \(\gamma \in \Gamma \).

Proof

Assume OF is not compatible with \( \pi _\beta (x){\big |}{}_{\smash {[m]}} \succeq \pi _\beta (\gamma (x)){\big |}{}_{\smash {[m]}} \), \(\gamma \in \Gamma \). Then, there exists a node \(\beta \in {\mathcal {V}}\), a solution \({\bar{x}} \in \Phi (\beta )\) that satisfies \( \pi _\beta ({\bar{x}}){\big |}{}_{\smash {[m]}} \succeq \pi _\beta (\gamma ({\bar{x}})){\big |}{}_{\smash {[m]}} \) for all \(\gamma \in \Gamma \), and an index \(j \in [m]\) with \(i_j \in B_0^\beta \) such that \({\bar{x}}_{\ell } = 1\) for some \(\ell \) in the \(\Delta ^\beta \)-orbit of \(i_j\). Suppose j is minimal.

Since \(\ell \) is contained in the \(\Delta ^\beta \)-orbit of \(i_j\), there exists \(\gamma \in \Delta ^\beta \) with \(\gamma (\ell ) = i_j\). By definition of \(\Delta ^\beta \), we have \(\gamma (k) \in B^\beta _1\) for all \(k \in B_1^\beta \). Moreover, \({\pi _\beta (\gamma ({{\bar{x}}}))_k = {{\bar{x}}}_{\gamma ^{-1}(i_k)} = 0}\) for all \(k \in [j-1]\) with \(i_k \in B_0^\beta \), because j is selected minimally. Consequently, since \({B_0^\beta \cup B_1^\beta = [m]}\) holds, \(\pi _\beta ({{\bar{x}}})\) and \(\pi _\beta (\gamma ({{\bar{x}}}))\) coincide on the first \(j-1\) entries, and \(1 = {{\bar{x}}}_\ell = {{\bar{x}}}_{\gamma ^{-1}(i_j)} = \pi _\beta (\gamma ({{\bar{x}}}))_j > \pi _\beta ({{\bar{x}}})_j = {{\bar{x}}}_{i_j} = 0\). That is, \({ \pi _\beta ({{\bar{x}}}){\big |}{}_{\smash {[m]}} \prec \pi _\beta (\gamma ({{\bar{x}}})){\big |}{}_{\smash {[m]}} }\), contradicting that \({\bar{x}}\) satisfies all SHCs. OF is thus compatible with the SHCs. \(\square \)

Isomorphism pruning and orbital fixing are thus compatible. While isomorphism pruning can become active as soon as one can show that no lexicographically maximal solution w.r.t. the modified SHCs is feasible at a node \(\beta \), orbital fixing might not be able to find all symmetry-related variable reductions.

Example 7

Let \(\Gamma \le {\mathcal {S}}_{4}\) be generated by a cyclic right shift, i.e., the non-trivial permutations in \(\Gamma \) are \(\gamma _1 = (1,2,3,4)\), \(\gamma _2 = (1,3)(2,4)\), and \(\gamma _3 = (1,4,3,2)\). Consider the branch-and-bound tree in Fig. 5. At node \(\beta _0\), no reductions can be found by OF as no proper shift fixes the 1-branching variable \(x_3\). For \(\gamma _1\), the SHC \( \pi _{\beta _0}(x){\big |}{}_{\smash {[2]}} \succeq \pi _{\beta _0}(\gamma (x)){\big |}{}_{\smash {[2]}} \) reduces to  \({(x_3, x_4) \succeq (x_2, x_3)}\). Due to the variable bounds at \(\beta _0\), the constraint simplifies to \((1,0) \succeq (x_2,1)\). This constraint is violated if \(x_2\) has value 1, so \(x_2\) can be fixed to 0.

Fig. 5 Branch-and-bound tree for Example 7
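The reduction in Example 7 can be replayed by enumeration. The following minimal Python sketch assumes 0-based indices, represents a permutation g by the tuple of images g[i], and lets it act on vectors such that the entry at position i moves to position g[i]. The helper names are illustrative and not part of any solver.

```python
from itertools import product

def act(g, x):
    """Apply permutation g (g[i] = image of index i) to x: result[g[i]] = x[i]."""
    y = [None] * len(x)
    for i, gi in enumerate(g):
        y[gi] = x[i]
    return tuple(y)

n = 4
# The four cyclic shifts; group[1] is gamma_1 = (1,2,3,4) in 0-based form.
group = [tuple((i + s) % n for i in range(n)) for s in range(n)]
gamma1 = group[1]

# Node beta_0: x_3 = 1 (1-branch) and x_4 = 0 (0-branch), i.e. indices 2 and 3.
order = (2, 3)  # the SHC compares (x_3, x_4) with the permuted counterpart

def restricted(x):
    return tuple(x[i] for i in order)

# Orbital fixing: only shifts that fix the 1-branching variable may be used.
stabilizer = [g for g in group if g[2] == 2]

# Values of x_2 that survive the SHC for gamma_1 under the bounds at beta_0.
feasible_x2 = set()
for x1, x2 in product((0, 1), repeat=2):
    x = (x1, x2, 1, 0)
    if restricted(x) >= restricted(act(gamma1, x)):  # lexicographic comparison
        feasible_x2.add(x2)

print(len(stabilizer), feasible_x2)
```

Only the identity stabilizes the 1-branching variable, so orbital fixing has nothing to work with, while the SHC for \(\gamma _1\) leaves 0 as the only admissible value of \(x_2\).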

Consequently, symmetry handling by isomorphism pruning and orbital fixing can be improved by identifying further symmetry handling methods that are compatible with the modified SHCs.

3.2 The framework

In this section, we present our unified framework for symmetry handling with the following goals: It should

  1. (G1) Allow checking whether different symmetry handling methods are compatible. In particular, it should ensure compatibility of LexFix, isomorphism pruning, and OF.

  2. (G2) Generalize the modified SHCs by Ostrowski [16].

  3. (G3) Apply to general variable types and general symmetries (not necessarily permutations).

To achieve these goals, we define a more general class of SHCs \(\sigma _\beta (x) \succeq \sigma _\beta (\gamma (x))\), where \(\gamma \in \Gamma \) and \(\beta \in {\mathcal {V}}\), that are not necessarily based on branching decisions.

Let \(\Phi \subseteq \mathbb {R}^n\) and let \(f:\Phi \rightarrow \mathbb {R}\) be such that \(\textrm{OPT}(f,\Phi )\) can be solved by (spatial) branch-and-bound. Let \(\Gamma \) be a group of symmetries of \(\textrm{OPT}(f,\Phi )\). Let \({\mathcal {B}}= ({\mathcal {V}}, {\mathcal {E}})\) be a branch-and-bound tree and let \(\beta \in {\mathcal {V}}\). In our modified SHCs, the map \({\sigma _\beta :\mathbb {R}^n \rightarrow \mathbb {R}^{m_\beta }}\) will be parameterized via a permutation \(\pi _\beta \in {\mathcal {S}}_{n}\), a symmetry \(\varphi _\beta \in \Gamma \), and an integer \(m_\beta \in \{0, \dots , n\}\) as \(\sigma _\beta (\cdot ) \,{:}{=}\, \left( \pi _\beta \circ \varphi _\beta (\cdot ) \right) {\big |}{}_{\smash {[m_\beta ]}} \). As in Ostrowski’s approach, \(\pi _\beta \) selects a variable ordering and \(m_\beta \) allows restricting the SHCs to a subset of variables. In contrast to [16], however, \(\pi _\beta \) does not necessarily correspond to the branching order. Moreover, \(\varphi _\beta \) provides more degrees of freedom as it makes it possible to change the variable order imposed by \(\pi _\beta \). We refer to the structure \((m_\beta , \pi _\beta , \varphi _\beta )_{\beta \in {\mathcal {V}}}\) as a symmetry prehandling structure for \({\mathcal {B}}\). Note that this definition already achieves goal (G2) by setting \(\varphi _\beta = \textrm{id}\), using the same \(\pi _\beta \) as in Sect. 3.1, and setting \(m_\beta \) to be the number of different branching variables in node \(\beta \).

Note that Margot [21] also provides a generalization of Ostrowski’s approach. After presenting our framework, we will compare our framework also with the one by Margot and argue that ours is applicable in a more general setting.

Theorem 8

Let \(\Phi \subseteq \mathbb {R}^n\) and let \(f:\Phi \rightarrow \mathbb {R}\) be such that \(\textrm{OPT}(f,\Phi )\) can be solved by (spatial) branch-and-bound. Let \(\Gamma \) be a finite group of symmetries of \(\textrm{OPT}(f,\Phi )\). Suppose that the branch-and-bound method used for solving \(\textrm{OPT}(f,\Phi )\) generates a finite full B&B-tree \({{\mathcal {B}}= ({\mathcal {V}}, {\mathcal {E}})}\). For each \(\beta \in {\mathcal {V}}\), let \((m_\beta ,\pi _\beta ,\varphi _\beta ) \in [n]_0 \times {\mathcal {S}}_{n} \times \Gamma \). Let \(\sigma _\beta (\cdot ) = (\pi _\beta \circ \varphi _\beta (\cdot )){\big |}{}_{\smash {[m_\beta ]}} \). Suppose that we enforce, for every \(\beta \in {\mathcal {V}}\),

$$\begin{aligned} \sigma _\beta (x) \succeq \sigma _\beta (\gamma (x))\ \text {for all}\ \gamma \in \Gamma . \end{aligned}$$
(2)

If \((m_\beta ,\pi _\beta ,\varphi _\beta )\) satisfies so-called correctness conditions  (C1–C4) for all nodes \(\beta \in {\mathcal {V}}\):

  1. (C1) If \(\beta \) has a parent \(\alpha \in {\mathcal {V}}\), then \(m_\beta \ge m_\alpha \) and \(\pi _\alpha (x)_i = \pi _\beta (x)_i\) holds for all \(i \le m_\alpha \) and \(x \in \mathbb {R}^n\);

  2. (C2) If \(\beta \) has a parent \(\alpha \in {\mathcal {V}}\), then \(\varphi _\beta =\varphi _\alpha \circ \psi _\alpha \) for some

    $$\begin{aligned} \psi _\alpha \in \textrm{stab}(\Gamma , \Phi (\alpha )) \,{:}{=}\,\{ \gamma \in \Gamma : \Phi (\alpha ) = \gamma (\Phi (\alpha )) \}; \end{aligned}$$

  3. (C3) If \(\beta \) has a sibling \(\beta ' \in {\mathcal {V}}\), then \(m_\beta = m_{\beta '}\), \(\pi _\beta = \pi _{\beta '}\), and \(\varphi _\beta = \varphi _{\beta '}\), i.e., \(\psi _\alpha \) in (C2) does not depend on \(\beta \);

  4. (C4) If \(\beta \) has a feasible solution \(x \in \Phi (\beta )\), then for all permutations \(\xi \in \Gamma \) that satisfy \({\sigma _\beta (x) = \sigma _\beta (\xi (x))}\), the permuted solution \(\xi (x)\) is also feasible in \(\Phi (\beta )\);

then, for each \({\tilde{x}} \in \Phi \), there is exactly one leaf \(\nu \) of the B&B-tree containing a solution symmetric to \({\tilde{x}}\), i.e., for which there is \(\xi \in \Gamma \) with \(\xi ({\tilde{x}}) \in \Phi (\nu )\).
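The stabilizer \(\textrm{stab}(\Gamma , \Phi (\alpha ))\) from (C2) can be computed by enumeration when both the group and the feasible region are small and finite. The sketch below is only meant to make the definition concrete; the group (cyclic shifts of four coordinates), the region, and the helper names are assumptions chosen for illustration.

```python
from itertools import product

def act(g, x):
    """Apply permutation g (g[i] = image of index i) to x: result[g[i]] = x[i]."""
    y = [None] * len(x)
    for i, gi in enumerate(g):
        y[gi] = x[i]
    return tuple(y)

def setwise_stabilizer(group, region):
    """All g in group with g(region) == region, for a finite set of vectors."""
    region = set(region)
    return [g for g in group if {act(g, x) for x in region} == region]

n = 4
group = [tuple((i + s) % n for i in range(n)) for s in range(n)]

# Hypothetical feasible region of a node: binary vectors whose first and
# third entries agree.
region = [x for x in product((0, 1), repeat=n) if x[0] == x[2]]

stab = setwise_stabilizer(group, region)
print(stab)
```

Here only the identity and the shift by two positions map the region onto itself, so these are the admissible choices for \(\psi _\alpha \) in this toy setting.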

Before we apply and prove this theorem, we interpret the correctness conditions and provide some implications and consequences. We start with the latter.

  • Enforcing (2) handles symmetries by excluding feasible solutions from the search space while guaranteeing that exactly one representative solution per class of symmetric solutions remains feasible (recall that \({\mathcal {B}}\) does not prune nodes by bound). Note that by enforcing (2), symmetry reductions can only take place on variables “seen” by \(\sigma _\beta (x) \succeq \sigma _\beta (\gamma (x))\) for some \(\gamma \in \Gamma \). We stress that it is not immediate how (2) can be enforced efficiently. We will turn to this question in Sect. 4.

  • If we prune nodes by bound, (2) still can be used to handle symmetries. But not necessarily all \(x \in \Phi \) have a symmetric counterpart feasible at some leaf (e.g., if x is suboptimal).

  • If not all constraints of type (2) are completely enforced, we still find valid symmetry reductions, but not necessarily exactly one representative solution.

  • If different symmetry handling methods can be expressed in terms of (2) having the same choice of the symmetry prehandling structure \((m_\beta , \pi _\beta , \varphi _\beta )_{\beta \in {\mathcal {V}}}\), then both symmetry handling methods can be applied at the same time, i.e., they are compatible.

  • In practice, B&B is enhanced by cutting planes or domain propagation such as reduced cost fixing. Both also work in our framework if their reductions are symmetry compatible, i.e., if, for \(\beta \in {\mathcal {V}}\), the domain of a variable \(x_i\) is reduced, the same reduction can be applied to all symmetric variables w.r.t. symmetries at \(\beta \). Margot [15, Section 4] discusses this in detail for IsoPr. He refers to this as strict setting algorithms.

  • For spatial branch-and-bound, the children of a node \(\alpha \) do not necessarily partition \(\Phi (\alpha )\) (the feasible regions of children can overlap on their boundary). In this case, (2) can still be used to handle symmetries, but there might exist several leaves containing a symmetrically equivalent solution.

Remark 9

As propagating SHCs cuts off feasible solutions, such propagations are not symmetry-compatible. Therefore, we consider SHC reductions in our framework as special branching decisions, called improper: For an SHC reduction \(C \subseteq \mathbb {R}^n\) at node \(\beta \in {\mathcal {V}}\), two children \(\omega , \omega '\) are introduced with \(\Phi (\omega ) = \Phi (\beta ) \cap C\) and \(\Phi (\omega ') = \Phi (\beta ) {\setminus } \Phi (\omega )\). Node \(\omega '\) can then be pruned by symmetry. Complementing this, traditional (standard) branching decisions are called proper.

We conclude the discussion of our framework by a comparison with Margot’s framework. As there are some similarities between our framework and the one by Ostrowski [16], there are also some similarities with the framework by Margot [21]. The difference between Margot’s and Ostrowski’s framework is that Margot allows any preorder \(\sqsupseteq _\beta \) at a node \(\beta \) of the branch-and-bound tree, whereas Ostrowski uses a lexicographic comparison. Margot only needs to make sure that for a node \(\beta \) with child \(\beta '\), the preorder \(\sqsupseteq _{\beta '}\) refines \(\sqsupseteq _\beta \). For ease of exposition, we phrased our framework in terms of a lexicographic comparison, but a different order would also be possible in our setting. In contrast to both Ostrowski and Margot, our framework features the parameter \(\varphi _\beta \), which makes it possible to “partially forget” the preorder at an ancestor node. As we will see below, this is particularly useful for methods such as orbitopal fixing.

Furthermore, Margot assumes that the underlying branching rule partitions the domain of a single variable (without providing further details where this is used). In our framework, the branching rule can be arbitrary as long as the symmetry prehandling structures satisfy the correctness conditions. That is, in particular, our framework applies to constraint-based branching rules. Finally, Margot discusses his framework only for permutation groups, whereas our framework applies to arbitrary finite groups. In Remark 15 below, we will see that even this condition can be further relaxed, provided certain lexicographically maximal solutions exist.

Interpretation

Theorem 8 iteratively builds SHCs \(\sigma _\beta (x) \succeq \sigma _\beta (\gamma (x))\) that do not necessarily build upon a common lexicographic order for different nodes \(\beta \in {\mathcal {V}}\). The map \({\sigma _\beta (\cdot ) = (\pi _\beta \circ \varphi _\beta (\cdot )){\big |}{}_{\smash {[m_\beta ]}} }\) accepts an n-dimensional vector, considers a symmetrically equivalent representative solution hereof (\(\varphi _\beta \)), reorders its entries (\(\pi _\beta \)), and afterwards restricts them to the first \(m_\beta \) coordinates. This way, \(\sigma _\beta \) selects \(m_\beta \) expressions (and their images) that appear in the SHCs (2). To ensure that consistent SHCs are derived, sufficient information needs to be passed on to a node’s children in the B&B-tree, which is achieved as follows.
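The three steps of \(\sigma _\beta \) (apply \(\varphi _\beta \), reorder via \(\pi _\beta \), restrict to the first \(m_\beta \) entries) can be written down directly when \(\varphi _\beta \) and \(\pi _\beta \) are both coordinate permutations. The following is a minimal sketch under that assumption, with 0-based indices and the convention that a permutation g moves the entry at position i to position g[i]; the function names are hypothetical.

```python
def act(g, x):
    """Apply permutation g (g[i] = image of index i) to x: result[g[i]] = x[i]."""
    y = [None] * len(x)
    for i, gi in enumerate(g):
        y[gi] = x[i]
    return tuple(y)

def sigma(x, m, pi, phi):
    """sigma(x) = (pi o phi)(x) restricted to the first m coordinates."""
    return act(pi, act(phi, x))[:m]

identity = (0, 1, 2, 3)
# pi moves the third and fourth variable to the front.
pi = (2, 3, 0, 1)
# phi is a cyclic shift by one position.
shift = (1, 2, 3, 0)

x = (5, 6, 7, 8)
print(sigma(x, 2, pi, identity))  # third and fourth entry of x
print(sigma(x, 2, pi, shift))     # same positions, but of the shifted vector
print(sigma(x, 0, pi, identity))  # m = 0 gives the void map
```

With \(\varphi _\beta = \textrm{id}\), the map only reorders and truncates; a non-trivial \(\varphi _\beta \) changes which original variables end up in the compared positions.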

For ease of explanation, let us first assume \(\varphi _\beta \) is the identity \(\textrm{id}\). Then, (C1) guarantees that a child has no less information than its parent. Moreover, siblings must not be too different, i.e., new information at one child also needs to be known to its siblings (C3). (C4) ensures that if two solutions x and \(\xi (x)\) appear identical for the SHCs in the sense \(\sigma _\beta (x) = \sigma _\beta (\xi (x))\), feasibility of x should imply feasibility of \(\xi (x)\). In other words, if x and \(\xi (x)\) are identical with respect to \(\sigma _\beta \), it may not be that one solution is feasible at \(\beta \) while the other solution is not.

Conditions (C1), (C3), and (C4) describe how \(\sigma _\beta (x)\) “grows” as nodes \(\beta \) follow a rooted path, and that siblings are handled in the same way. If \(\varphi _\beta = \textrm{id}\), for a node \(\beta \) with ancestor \(\mu \) all variables and expressions of \(\sigma _\mu (x)\) also occur in the first \(m_\mu \) elements of \(\sigma _\beta (x)\). Condition (C2) allows for more flexibility in this. Let \(\alpha \) be the parent of \(\beta \). If there is a symmetry \(\gamma \in \Gamma \) that leaves the feasible region of \(\alpha \) invariant (i.e., \(\Phi (\alpha ) = \gamma (\Phi (\alpha ))\)), one can choose to handle the symmetries considering the symmetrically equivalent solution space as of node \(\beta \). This degree of freedom might help a solver to find more symmetry reductions in comparison to just “growing” the considered representatives. For example, in Fig. 2 at node \(\alpha \) with \({(\vartheta _{2,3},\vartheta _{1,2},\vartheta _{1,3}) \leftarrow (0,1,0)}\) the feasible region \(\Phi (\alpha )\) is identical when permuting the first two columns or the last three columns. Suppose that one branches next on variable \(\vartheta _{3,3}\), then the zero-branch will find two reductions (namely \(\vartheta _{3,4},\vartheta _{3,5} \leftarrow 0\)) and the one-branch will find no reductions. If the solver has a preference to reduce the discrepancy between the number of reductions found over the siblings, one could exchange column 3 and 4 for the sake of symmetry handling. Effectively, this moves the branching variable to the fourth column. Applying orbitopal fixing on the matrix where these columns are exchanged leads to one fixing in either child.

Examples

Let \({\mathcal {B}}= ({\mathcal {V}}, {\mathcal {E}})\) be a B&B-tree, in which each branching decision partitions the domain of exactly one variable. We will show that there are many possible symmetry prehandling structures \((m_\beta , \pi _\beta , \varphi _\beta )_{\beta \in {\mathcal {V}}}\) that satisfy the correctness conditions of Theorem 8. Hence, this gives many degrees of freedom to handle symmetries. In the following, we discuss choices that resemble three symmetry handling techniques: static SHCs, Ostrowski’s branching variable ordering, and a variant of orbitopal fixing that is more flexible than the setting of Bendotti et al.

Example 10

(Static SHCs) The static SHCs \(x \succeq \gamma (x)\) for all \(\gamma \in \Gamma \) can be derived in our framework by setting, for each \(\beta \in {\mathcal {V}}\), the parameters \(m_\beta \!=\!n\), \(\pi _\beta \!=\!\varphi _\beta \!=\!\psi _\beta \!=\!\textrm{id}\). (C1–C3) are satisfied trivially. As any \(x \in \mathbb {R}^n\) satisfies \(\sigma _\beta (x) = \pi _\beta \varphi _\beta (x){\big |}{}_{\smash {[n]}} = x\), we find \(\sigma _\beta (x) = \sigma _\beta \gamma (x)\) if and only if \(x = \gamma (x)\). Hence, also (C4) holds.
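For a small permutation group, the effect of the static SHCs can be checked by enumeration: the vectors that satisfy \(x \succeq \gamma (x)\) for all \(\gamma \in \Gamma \) are exactly the lexicographically maximal elements of their orbits. The sketch below assumes binary vectors of length four and the group of cyclic shifts; these choices and the helper names are illustrative only.

```python
from itertools import product

def act(g, x):
    """Apply permutation g (g[i] = image of index i) to x: result[g[i]] = x[i]."""
    y = [None] * len(x)
    for i, gi in enumerate(g):
        y[gi] = x[i]
    return tuple(y)

n = 4
group = [tuple((i + s) % n for i in range(n)) for s in range(n)]
vectors = list(product((0, 1), repeat=n))

# Vectors that satisfy all static SHCs (tuple comparison is lexicographic).
representatives = [x for x in vectors if all(x >= act(g, x) for g in group)]

# Orbits of the group action on the binary vectors.
orbits = {frozenset(act(g, x) for g in group) for x in vectors}

# Number of surviving vectors in each orbit.
survivors_per_orbit = [sum(1 for x in representatives if x in orbit)
                       for orbit in orbits]

print(len(orbits), sorted(representatives))
```

Each of the six orbits keeps exactly one vector, which matches the role of the static SHCs as selecting one representative per symmetry class.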

Next, we resemble Ostrowski’s rank for binary variables and generalize it to arbitrary variable types. In the latter case, only considering the branching order is not sufficient as one might branch several times on the same variable.

Example 11

(Branching-based) Let \(\beta \in {\mathcal {V}}\). If \(\beta \) is the root node, let \(m_\beta = 0\), i.e., \(\sigma _\beta \) is void. Otherwise, let \(\alpha \) be the parent of \(\beta \). If \(\beta \) arises from \(\alpha \) by a proper branching decision on variable \(x_{{{\hat{\imath }}}}\) and \({{\hat{\imath }}}\) has not been used for branching before, i.e., \({{\hat{\imath }}}\notin (\pi _\alpha \varphi _\alpha )^{-1}([m_\alpha ])\), then set \(m_\beta = m_\alpha + 1\), \(\varphi _\beta = \textrm{id}\) and select \(\pi _\beta \in {\mathcal {S}}_{n}\) with \(\pi _\beta (i) = \pi _\alpha (i)\) for \(i \le m_\alpha \) and \(\pi _\beta (m_\beta ) = {{\hat{\imath }}}\). Otherwise, inherit the symmetry prehandling structure from \(\alpha \), i.e., \(\pi _\beta = \pi _\alpha \), \(m_\beta = m_\alpha \), and \(\varphi _\beta = \textrm{id}\).
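The update rule of Example 11 can be sketched compactly by storing, for each node, only the ordered tuple of variable indices seen by \(\sigma _\beta \); its length plays the role of \(m_\beta \), and \(\varphi _\beta \) is the identity throughout. This encoding and the names `child_order` and `sigma` are assumptions made for illustration, using 0-based indices.

```python
def child_order(parent_order, branch_var, proper=True):
    """Variable order at a child node.

    A proper branching on a variable that is not yet part of the order
    appends it; otherwise the parent's order is inherited unchanged.
    """
    if proper and branch_var not in parent_order:
        return parent_order + (branch_var,)
    return parent_order

def sigma(order, x):
    """Restriction of x to the ordered variables (phi is the identity here)."""
    return tuple(x[i] for i in order)

root = ()                                    # m = 0 at the root: sigma is void
node1 = child_order(root, 2)                 # branch on index 2
node2 = child_order(node1, 3)                # branch on index 3
node3 = child_order(node2, 2)                # branch on index 2 again: inherit
node4 = child_order(node3, 0, proper=False)  # improper decision: inherit

print(node2, sigma(node2, (5, 6, 7, 8)))
```

Branching a second time on the same variable, as can happen for non-binary domains, leaves the order unchanged, which is the reason the example does not simply count branching decisions.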

Example 11 satisfies (C1–C4)

(C1–C3) hold trivially. To show (C4), let \(x \in \Phi (\beta )\) and \(\xi \in \Gamma \) such that \(\sigma _\beta (x) = \sigma _\beta (\xi (x))\). By definition of \((m_\beta ,\pi _\beta ,\varphi _\beta )\), \(\sigma _\beta (x)\) restricts x onto all (reordered) variables used for branching up to node \(\beta \). Note that the feasible region \(\Phi (\beta )\) is the intersection of (i) \(\Phi \), (ii) proper branching decisions, and (iii) symmetry reductions due to (2).

It is thus sufficient to show \(\xi (x)\) is contained in each of these sets. Since \(\xi \) is a problem symmetry and \(x \in \Phi \), also \(\xi (x) \in \Phi \). Moreover, as all branching variables are represented in \(\sigma _\beta \) and x respects the branching decisions, \(\sigma _\beta (x) = \sigma _\beta (\xi (x))\) implies that \(\xi (x)\) satisfies the branching decisions. Thus, (i) and (ii) hold. Finally, the SHCs (2) for \(\beta \) dominate the SHCs for its ancestors \(\alpha \) since (C1) and \(\varphi _\alpha = \textrm{id}\) hold, i.e., if \(\xi (x)\) satisfies the SHCs for \(\beta \), then it also satisfies all previous SHCs. As \(\Gamma \circ \xi = \Gamma \), each \(\gamma \in \Gamma \) can be written as \(\gamma ' \circ \xi \) for some \(\gamma ' \in \Gamma \). Therefore, for all \(\gamma ' \in \Gamma \), we conclude \({\sigma _\beta (\xi (x)) = \sigma _\beta (x) \succeq \sigma _\beta (\gamma (x)) = \sigma _\beta (\gamma '(\xi (x)))}\), i.e., (iii) and thus (C4) holds. \(\square \)

By adapting the variable order used in LexFix to the order imposed by \(\sigma _\beta \), LexFix is thus compatible with isomorphism pruning and OF, i.e., the framework achieves goal (G1). In particular, the statement is true for non-binary problems if these methods can be generalized to arbitrary variable domains. We will discuss this in more detail in the next section.

The last symmetry prehandling structure accommodates orbitopal fixing. Bendotti et al. [6] already discussed a dynamic variant of orbitopal fixing, which reorders the rows of the orbitope matrix similar to Ostrowski’s rank; columns, however, are not reordered. As described above, allowing also column reorderings might lead to more balanced branch-and-bound trees, which can be achieved as follows.

Example 12

(Specialized for orbitopal fixing) Let M be the \(p \times q\) orbitope matrix corresponding to the problem variables via \(M_{i,j} = x_{q(i-1) + j}\). That is, x is filled row-wise with the entries of M. Let \(\beta \in {\mathcal {V}}\). If \(\beta \) is the root node, define \((m_\beta , \pi _\beta , \varphi _\beta ) = (0, \textrm{id}, \textrm{id})\). Otherwise, let \(\alpha \) be the parent of \(\beta \). If \(\beta \) arises from \(\alpha \) by a proper branching decision on variable \(M_{{{\hat{\imath }}},{\hat{\jmath }}}\) and no variable in the \({{\hat{\imath }}}\)-th row has been used for branching before, set \(m_\beta = m_\alpha + q\), select \(\pi _\beta \in {\mathcal {S}}_{n}\) with \(\pi _\beta (k) = \pi _\alpha (k)\) for \(k \in [m_\alpha ]\), and, for \(k \in [q]\), define \(\pi _\beta (m_\alpha + k) = q({{\hat{\imath }}}-1) + k\). Also choose \(\psi _\alpha \in \textrm{stab}(\Gamma , \Phi (\alpha ))\) yielding \(\varphi _\beta = \varphi _\alpha \circ \psi _\alpha \). Consistent with Condition (C3), the choice of \(\psi _\alpha \) is the same for all children sharing the same parent \(\alpha \). If the variable is already included in the variable ordering or if the branching decision is improper, inherit \((m_\beta , \pi _\beta , \varphi _\beta ) = (m_\alpha , \pi _\alpha , \varphi _\alpha )\). Effectively, this creates a new matrix in which the rows are sorted based on branching decisions and columns can be permuted as long as this does not affect symmetrically feasible solutions.

Completely handling SHCs (2) on \(\beta \) corresponds to using orbitopal fixing on the \((m_\beta / q) \times q\)-matrix filled row-wise with the variables with indices in \((\pi _\beta \varphi _\beta )^{-1}(i)\) for \(i \in [m_\beta ]\). Bendotti et al. [6] introduce this without the freedom of permuting the matrix columns, i.e., for all \(\beta \in {\mathcal {V}}\) they choose \(\varphi _\beta = \textrm{id}\). We call their setting row-dynamic, whereas we refer to our setting as row- and column-dynamic.
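The row bookkeeping of Example 12 can be sketched as follows. The code covers only the row-dynamic part, i.e., it assumes \(\varphi _\beta = \textrm{id}\) and omits the column permutations \(\psi _\alpha \); indices are 0-based, so matrix entry (i, j) corresponds to variable index q·i + j. All function names are hypothetical.

```python
def child_order(parent_order, branch_var, q, proper=True):
    """Variable order at a child node for a matrix with q columns.

    A proper branching in a row that is not yet part of the order appends
    all q variables of that row; otherwise the parent's order is inherited.
    """
    row = branch_var // q
    row_vars = tuple(q * row + k for k in range(q))
    # Rows are always appended as a whole, so testing one entry suffices.
    if proper and row_vars[0] not in parent_order:
        return parent_order + row_vars
    return parent_order

def as_matrix(order, q, x):
    """Matrix with len(order) / q rows, filled row-wise with the ordered variables."""
    return [[x[v] for v in order[r:r + q]] for r in range(0, len(order), q)]

q = 3
x = tuple(range(9))                 # a 3 x 3 matrix stored row-wise
order1 = child_order((), 3, q)      # branch in the second row
order2 = child_order(order1, 0, q)  # branch in the first row
order3 = child_order(order2, 4, q)  # second row again: nothing changes

print(order2, as_matrix(order2, q, x))
```

The resulting matrix lists the rows in the order in which they were first branched on, which is the matrix on which orbitopal fixing would operate in the row-dynamic setting.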

Example 12 satisfies (C1–C4)

Obviously, (C1–C3) hold. To show (C4), we use induction. As (C4) holds at the root node, the induction base holds. So, assume (C4) holds at node \(\alpha \) with child \(\beta \) (IH). We show (C4) also holds at \(\beta \).

Let \(x \in \Phi (\beta )\) and \(\xi \in \Gamma \) with \(\sigma _\beta (x) = \sigma _\beta (\xi (x))\). To show \(\xi (x) \in \Phi (\beta )\), we distinguish if the branching decision from \(\alpha \) to \(\beta \) is proper or not. Note that both proper and improper branching decisions only happen on variables present in \(\sigma _\beta \) by construction of \((m_\beta ,\pi _\beta ,\varphi _\beta )\). Hence, if \({\xi (x) \in \Phi (\alpha )}\) holds, \(\sigma _\beta (x) = \sigma _\beta (\xi (x))\) implies \(\xi (x) \in \Phi (\beta )\). Thus, it suffices to prove \({\xi (x) \in \Phi (\alpha )}\).

For improper branching decisions, \(\sigma _\beta = \sigma _\alpha \) and SHCs (2) are propagated. Because \({x \in \Phi (\beta ) \subseteq \Phi (\alpha )}\) and \(\sigma _\alpha (x) = \sigma _\alpha (\xi (x))\), (IH) yields \(\xi (x) \in \Phi (\alpha )\). For proper branching decisions, we observe that \(\sigma _\beta (\cdot ) = (\pi _\beta \varphi _\alpha \psi _\alpha (\cdot )){\big |}{}_{\smash {[m_\beta ]}} \), \(\sigma _\alpha (\cdot ) = (\pi _\beta \varphi _\alpha (\cdot )){\big |}{}_{\smash {[m_\alpha ]}} \) and \(m_\beta \ge m_\alpha \). Thus, \(\sigma _\beta (x) = \sigma _\beta (\xi (x))\) implies \(\sigma _\alpha (\psi _\alpha (x)) = \sigma _\alpha (\psi _\alpha \xi (x))\). As \({x \in \Phi (\beta ) \subseteq \Phi (\alpha )}\) and \(\psi _\alpha \in \textrm{stab}(\Gamma , \Phi (\alpha ))\), we find \(\psi _\alpha (x) \in \Phi (\alpha )\). By (IH), \(\sigma _\alpha (\psi _\alpha (x)) = \sigma _\alpha (\psi _\alpha \xi (x))\) yields \(\psi _\alpha \xi (x) \in \Phi (\alpha )\). Again, since \(\psi _\alpha \in \textrm{stab}(\Gamma , \Phi (\alpha ))\), applying \(\psi _\alpha ^{-1}\) from the left yields \(\xi (x) \in \Phi (\alpha )\). \(\square \)

Proof of Theorem 8

The examples illustrate that many symmetry prehandling structures are compatible with the correctness conditions, which shows that there are potentially many variants to handle symmetries based on Theorem 8. We proceed to prove this theorem. To this end, we make use of the following lemma.

Lemma 13

Let \(\Phi \subseteq \mathbb {R}^n\) and let \(f:\Phi \rightarrow \mathbb {R}\) be such that \(\textrm{OPT}(f,\Phi )\) can be solved by (spatial) branch-and-bound. Let \(\Gamma \) be a finite group of symmetries of \(\textrm{OPT}(f,\Phi )\). Suppose that the branch-and-bound method used for solving \(\textrm{OPT}(f,\Phi )\) generates a full B&B-tree \({\mathcal {B}}= ({\mathcal {V}}, {\mathcal {E}})\). Let \(\beta \in {\mathcal {V}}\) be not a leaf of the B&B-tree. If there is a feasible solution \({{\tilde{x}}} \in \Phi (\beta )\) with

$$\begin{aligned} \sigma _\beta ({{\tilde{x}}}) \succeq \sigma _\beta \gamma ({{\tilde{x}}})\ \text {for all}\ \gamma \in \Gamma , \end{aligned}$$
(3a)

then \(\beta \) has exactly one child \(\omega \in \chi _\beta \) for which there is \(\xi \in \Gamma \) such that

$$\begin{aligned} \xi ({{\tilde{x}}})&\in \Phi (\omega ),\ \end{aligned}$$
(3b)
$$\begin{aligned} \text {and}\ \sigma _\omega \xi ({{\tilde{x}}})&\succeq \sigma _\omega \gamma \xi ({{\tilde{x}}})\ \text {for all}\ \gamma \in \Gamma . \end{aligned}$$
(3c)

Proof

Let \({{\tilde{x}}} \in \Phi (\beta )\) respect (3a). First, we show the existence of \(\omega \in \chi _\beta \) satisfying (3b) and (3c). Thereafter, we show that \(\omega \) is unique.

Existence:   By Condition (C3), the maps \(\sigma _\omega \) for all children \(\omega \in \chi _\beta \) are the same. Let \(\xi \in \Gamma \) be such that \(\sigma _\omega \xi ({{\tilde{x}}})\) is lexicographically maximal. Note that \(\xi \) exists, since \(\Gamma \) is a finite group. Then, \(\sigma _\omega \xi ({{\tilde{x}}}) \succeq \sigma _\omega \gamma \xi (\tilde{x})\) for all \(\gamma \in \Gamma \), because \(\xi , \gamma \in \Gamma \) implies \(\gamma \xi \in \Gamma \). Thus, \(\xi \) satisfies (3c). We show that \(\xi ({{\tilde{x}}}) \in \Phi (\omega )\) for some \(\omega \in \chi _\beta \).

Recall that the branching decision at \(\beta \) partitions its feasible region, i.e., \({\{ \Phi (\omega ): \omega \in \chi _\beta \}}\) partitions \(\Phi (\beta )\). As such, there is exactly one child \(\omega \in \chi _\beta \) with \({\xi ({{\tilde{x}}}) \in \Phi (\omega )}\) if \(\xi ({{\tilde{x}}}) \in \Phi (\beta )\). To show (3b), it thus suffices to prove \(\xi ({{\tilde{x}}}) \in \Phi (\beta )\).

For any child \(\omega \in \chi _\beta \), vector x, and \(i \le m_\beta \), we have

$$\begin{aligned} (\sigma _\omega (x))_i = (\pi _\omega \varphi _\omega (x))_i {\mathop {=}\limits ^{(\text {C1})}} (\pi _\beta \varphi _\omega (x))_i {\mathop {=}\limits ^{(\text {C2})}} (\pi _\beta \varphi _\beta \psi _\beta (x))_i = (\sigma _\beta \psi _\beta (x))_i. \end{aligned}$$
(4)

Recall that \(\xi \in \Gamma \) satisfies (3c). Substituting (4) yields \(\sigma _\beta \psi _\beta \xi ({{\tilde{x}}}) \succeq \sigma _\beta \psi _\beta \gamma \xi ({{\tilde{x}}})\) for all \(\gamma \in \Gamma \). In particular, for \(\gamma = \psi _\beta ^{-1}\xi ^{-1} \in \Gamma \), we find \(\sigma _\beta \psi _\beta \xi ({{\tilde{x}}}) \succeq \sigma _\beta ({{\tilde{x}}})\). Then (3a) yields \(\sigma _\beta \psi _\beta \xi ({{\tilde{x}}}) = \sigma _\beta ({{\tilde{x}}})\). By (C4), we thus have \(\psi _\beta \xi ({{\tilde{x}}}) \in \Phi (\beta )\). Since \(\psi _\beta \in \textrm{stab}(\Gamma , \Phi (\beta ))\), applying \(\psi _\beta ^{-1}\) from the left to this solution yields \(\xi ({{\tilde{x}}}) \in \Phi (\beta )\), which completes the first part.

Uniqueness:   Suppose \(\xi , \xi ' \in \Gamma \) satisfy (3c). For \(\gamma = \xi ' \xi ^{-1}\), (3c) for \(\xi \) implies \({\sigma _\omega \xi ({{\tilde{x}}}) \succeq \sigma _\omega \xi '({{\tilde{x}}})}\). Analogously, for \(\xi '\) we choose \(\gamma = \xi (\xi ')^{-1}\) to find \(\sigma _\omega \xi '({{\tilde{x}}}) \succeq \sigma _\omega \xi ({{\tilde{x}}})\). As a result, \(\sigma _\omega \xi ({{\tilde{x}}}) = \sigma _\omega \xi '({{\tilde{x}}})\).

Suppose \(\xi ({{\tilde{x}}}) \in \Phi (\omega )\). Let \(x = \xi ({{\tilde{x}}})\) and \(\gamma = \xi ' \xi ^{-1} \in \Gamma \). Then, we find

$$\begin{aligned} \sigma _\omega (x) = \sigma _\omega \xi ({{\tilde{x}}}) = \sigma _\omega \xi '({{\tilde{x}}}) = \sigma _\omega \xi '(\xi ^{-1}(x)) = \sigma _\omega \gamma (x), \end{aligned}$$

and Condition (C4) yields \(\xi '({{\tilde{x}}}) = \gamma (x) \in \Phi (\omega )\). As the children \(\chi _\beta \) partition \(\Phi (\beta )\) and \(\xi '({{\tilde{x}}}) \in \Phi (\omega )\), there is no other child of \(\beta \) where \(\xi '({{\tilde{x}}})\) is feasible. Thus, independent from \(\xi \in \Gamma \) satisfying (3c), there is exactly one child \(\omega \in \chi _\beta \) with \(\xi ({{\tilde{x}}}) \in \Phi (\omega )\). \(\square \)

We are now able to prove Theorem 8.

Proof of Theorem 8

Recall that we assumed \({\mathcal {B}}= ({\mathcal {V}}, {\mathcal {E}})\) to be finite and that we do not prune nodes by bound. Let \({\mathcal {B}}_d\) be the tree arising from \({\mathcal {B}}\) by pruning all nodes at depth larger than d. Let \((m_\beta ,\pi _\beta ,\varphi _\beta )_{\beta \in {\mathcal {V}}}\) satisfy the correctness conditions. Let \(\check{x} \in \Phi \) be any feasible solution to the original problem. We proceed by induction and show that, for every depth d of the tree, there is exactly one leaf node in \({\mathcal {B}}_d\) for which a permutation of \(\check{x}\) is feasible and that does not violate the local SHCs (2).

Let \(d = 0\). The only node at depth d is the root node \(\alpha \). Any feasible solution \({\check{x}} \in \Phi \) is feasible in the root node \(\alpha \in {\mathcal {V}}\) of the branch-and-bound tree \({\mathcal {B}}\). In particular, we can permute \({\check{x}}\) by any \(\xi \in \Gamma \), and have a feasible symmetrical solution. For the root node, choose \(\xi \in \Gamma \) such that \(\sigma _\alpha \xi ({\check{x}}) \succeq \sigma _\alpha \gamma \xi ({\check{x}})\) for all \(\gamma \in \Gamma \). That is, \(\xi (\check{x})\) is not cut off by (2) at \(\alpha \).

Let \(d > 0\). By induction, we may assume that there is exactly one leaf node \(\beta \) of \({\mathcal {B}}_{d-1}\) at which a permutation \(\xi (\check{x})\) is feasible and that is not cut off by (2). If \(\beta \) is also a leaf in \({\mathcal {B}}\), we are done. Otherwise, since \(\xi (\check{x})\) is not cut off by (2), we can apply Lemma 13 and find that \(\beta \) has exactly one child \(\omega \) at which a permutation of \(\xi (\check{x})\) is feasible and is not cut off by (2) at node \(\omega \). This concludes the proof. \(\square \)
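The statement of Theorem 8 can be checked by enumeration on a toy instance with the branching-based structure of Example 11. The sketch below assumes binary vectors of length four without further constraints, the group of cyclic shifts, and a hypothetical branching rule whose variable order depends on the first branching decision. Since the SHCs at a leaf imply those of its ancestors in this setting, a leaf survives if and only if its fully fixed vector satisfies the leaf's SHCs.

```python
from itertools import product

def act(g, x):
    """Apply permutation g (g[i] = image of index i) to x: result[g[i]] = x[i]."""
    y = [None] * len(x)
    for i, gi in enumerate(g):
        y[gi] = x[i]
    return tuple(y)

n = 4
group = [tuple((i + s) % n for i in range(n)) for s in range(n)]
vectors = list(product((0, 1), repeat=n))

def branching_order(x):
    """Hypothetical rule: branch on index 0 first; the remaining order
    depends on the branch taken, so leaves use different variable orders."""
    return (0, 1, 2, 3) if x[0] == 0 else (0, 3, 2, 1)

def sigma(order, x):
    return tuple(x[i] for i in order)

def leaf_survives(x):
    """Check the SHCs (2) at the leaf whose branching decisions fix x."""
    order = branching_order(x)
    return all(sigma(order, x) >= sigma(order, act(g, x)) for g in group)

leaves = [x for x in vectors if leaf_survives(x)]
orbits = {frozenset(act(g, x) for g in group) for x in vectors}
leaves_per_orbit = [sum(1 for x in leaves if x in orbit) for orbit in orbits]

print(len(leaves), leaves_per_orbit)
```

Every orbit is represented at exactly one surviving leaf, although the two subtrees use different lexicographic orders.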

Remark 14

For spatial branch-and-bound algorithms, two subtleties arise. On the one hand, there might not exist a finite branch-and-bound tree. If all branching decisions partition a subproblem’s feasible region, Theorem 8 holds true for all trees pruned at a certain depth level. On the other hand, branching decisions do not necessarily partition the feasible region. In this case, (2) can still be used to handle symmetries. However, in the depth-pruned tree there might exist more than one leaf containing a symmetric copy of a feasible solution.

Remark 15

Theorem 8 still holds in case of some infinite groups. The only place where finiteness is used is in the proof of Lemma 13, where it implies that a symmetry \(\xi \in \Gamma \) exists such that \(\sigma _{\omega }\xi ({{\tilde{x}}})\) is lexicographically maximal for a fixed solution vector \({{\tilde{x}}} \in \Phi (\beta )\). For instance, for infinite groups of rotational symmetries, such a symmetry always exists.

4 Apply framework on generic optimization problems

Due to Theorem 8, we can completely handle all symmetries of an arbitrary problem \(\textrm{OPT}(f, \Phi )\), provided we know how to handle Constraints (2). The aim of this section is therefore to find symmetry handling methods that can deal with non-binary variables. Since handling Constraints (2) is already difficult for binary problems, we cannot expect to handle all symmetries efficiently. Instead, we revisit the efficient methods LexFix, orbitopal fixing, and OF for binary variables and provide proper generalizations for non-binary problems, which allows us to partially enforce Constraints (2). We refer to these generalizations as lexicographic reduction, orbitopal reduction, and orbital symmetry handling, respectively.

Throughout this section, we assume that \(\Gamma \le {\mathcal {S}}_{n}\).

4.1 Lexicographic reduction

4.1.1 The static setting

Assume the symmetry prehandling structure of Example 10 is used in Theorem 8. Then, the SHCs \(x \succeq \gamma (x)\) for all \(\gamma \in \Gamma \) are enforced at each node of the branch-and-bound tree. For all \(i \in [n]\), let \({\mathcal {D}}_i \subseteq \mathbb {R}\) be the domain of variable \(x_i\) at a node of the branch-and-bound tree and let \({\mathcal {D}}= ({\mathcal {D}}_i)_{i \in [n]}\) be the vector of variable domains. The aim of the lexicographic reduction (LexRed) algorithm is to find, for a fixed permutation \(\gamma \in \Gamma \), the smallest domains \({\mathcal {D}}'_i \subseteq {\mathcal {D}}_i\), \(i \in [n]\), such that \(\{ x \in {\mathcal {D}}': x \succeq \gamma (x) \} = \{ x \in {\mathcal {D}}: x \succeq \gamma (x) \}\), where \({\mathcal {D}}' = ({\mathcal {D}}'_i)_{i \in [n]}\).

If \({\mathcal {D}}_i \subseteq \{0, 1\}\) for all \(i \in [n]\), the reductions found by LexRed are equivalent to the reductions found by LexFix. For non-binary domains, similar ideas as for LexFix, which are described in [24, 25], can be used: We iterate over the variables \(x_i\) with indices in increasing order. If \(x_j = \gamma (x)_j\) for all indices \(j < i\), we enforce \(x_i \ge \gamma (x)_i\), and we check if a solution with \(x_i = \gamma (x)_i\) exists. Before we provide a rigorous algorithm, we illustrate the idea.

Example 16

Let \(\Phi = [-1, 1]^4 \cap \mathbb {Z}^4\) and \(\gamma = (1,3,2,4)\). Consider a node with relaxed region \(x \in \{ 0 \} \times [-1, 0] \times \{ 1 \} \times [-1, 1]\). Propagating \(x \succeq \gamma (x)\), we find

$$\begin{aligned} \begin{bmatrix} x_1&{} =&{} 0\\ x_2&{} \in &{} [-1,0]\\ x_3&{} =&{} 1 \\ x_4&{} \in &{} [-1, 1]\\ \end{bmatrix} \succeq \begin{bmatrix} x_4&{} \in &{} [-1, 1]\\ x_3&{} =&{} 1 \\ x_1&{} =&{} 0\\ x_2&{} \in &{} [-1,0]\\ \end{bmatrix} {\mathop {\leadsto }\limits ^{\text {(}\dagger \text {)}}} \begin{bmatrix} x_1&{} =&{} 0\\ x_2&{} \in &{} [-1,0]\\ x_3&{} =&{} 1 \\ x_4&{} \in &{} [-1, 0]\\ \end{bmatrix} \succeq \begin{bmatrix} x_4&{} \in &{} [-1, 0]\\ x_3&{} =&{} 1 \\ x_1&{} =&{} 0\\ x_2&{} \in &{} [-1,0]\\ \end{bmatrix} \!. \end{aligned}$$

In (\(\dagger \)), we restrict the domain of \(x_4\) by propagating \(0 = x_1 \ge x_4\), resulting in \(x_4 \in [-1, 0]\). If \({x_1 = x_4 = 0}\), then SHC \(x \succeq \gamma (x)\) implies the contradiction \({[-1, 0] \ni x_2 \ge x_3 = 1}\), so we must have \(x_1 > x_4\). Since \(x_4 \in \mathbb {Z}\), \(x_4\) must be fixed to \(-1\). No further domain reductions can be derived from \(x \succeq \gamma (x)\).
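The reductions of Example 16 can be cross-checked by brute force. The following sketch enumerates the integer points of the relaxed region and keeps those satisfying \(x \succeq \gamma (x)\); it is only a sanity check, not the propagation algorithm itself. The names `gamma`, `region`, and `feasible` are illustrative, and indices are 0-based.

```python
from itertools import product


def gamma(x):
    # Example 16: gamma(x) = (x_4, x_3, x_1, x_2), written with 0-based indices.
    return (x[3], x[2], x[0], x[1])


# Integer points of the relaxed region {0} x [-1, 0] x {1} x [-1, 1].
region = product([0], [-1, 0], [1], [-1, 0, 1])

# Python compares tuples lexicographically, so ">=" models the relation
# "x is lexicographically not smaller than gamma(x)".
feasible = [x for x in region if x >= gamma(x)]

# Values that each variable attains in some feasible point.
x2_values = sorted({x[1] for x in feasible})
x4_values = sorted({x[3] for x in feasible})
print(feasible, x2_values, x4_values)
```

Under these assumptions, only points with \(x_4 = -1\) survive, while both values of \(x_2\) remain attainable, in line with the example.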

We now proceed with our generalization of LexFix. To enforce \(x \succeq \gamma (x)\) for general variable domains \({\mathcal {D}}\), some artifacts need to be taken into account. For example, if \(n = 3\) and \(\gamma \) is the cyclic right-shift, then \(y^\epsilon \,{:}{=}\,(1+\epsilon ,0,1) \succeq \gamma (y^\epsilon ) = (1,1+\epsilon ,0)\) for every \(\epsilon > 0\), but \(y^0 \prec \gamma (y^0)\), i.e., \(\{x \in \mathbb {R}^n: x \succeq \gamma (x)\}\) is not necessarily closed. Since optimization software usually can only handle closed sets, we propose the following solution. We extend \(\mathbb {R}\) by an infinitesimal symbol \(\varepsilon \) that we can add to or subtract from any real number to represent a strict difference. This results in a symbolically correct algorithm that is as strong as possible. For example, \(\min \{ 1 + x: x > 1 \} = 2 + \varepsilon \), \(\min \{1 + x + \varepsilon : x > 1 \} = 2 + \varepsilon \), \(\max \{ 1 + x: x < 2 \} = 3 - \varepsilon \), and we do not allow further arithmetic with the \(\varepsilon \) symbol. This restriction is no problem for our purposes: since we only ever apply either the \(\min \)-operator or the \(\max \)-operator, the sign of \(\varepsilon \) is always the same; namely, if \(\varepsilon \) appears, it has a positive sign in minimization operations and a negative sign in maximization operations. In practice, however, we cannot enforce strict inequalities. We thus replace \(\varepsilon \) by 0, which leads to slightly weaker but still correct reductions.
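The non-closedness example can be checked numerically. The sketch below uses exact rational arithmetic and the names `right_shift`, `satisfies_shc`, and `y_eps`, all of which are illustrative choices: \(y^\epsilon \) satisfies the constraint for each tested \(\epsilon > 0\), whereas the limit point \(y^0\) does not.

```python
from fractions import Fraction


def right_shift(y):
    # Cyclic right-shift on three entries: (y1, y2, y3) -> (y3, y1, y2).
    return (y[2], y[0], y[1])


def satisfies_shc(y):
    # Tuple comparison in Python is lexicographic.
    return tuple(y) >= right_shift(tuple(y))


def y_eps(eps):
    # The vector (1 + eps, 0, 1) from the text, with exact rationals.
    return (1 + eps, Fraction(0), Fraction(1))


# eps = 1/10, 1/100, ..., 1/100000: the constraint holds for every eps > 0 tried.
positive_checks = [satisfies_shc(y_eps(Fraction(1, 10**k))) for k in range(1, 6)]
# eps = 0: the limit point violates the constraint.
limit_check = satisfies_shc(y_eps(Fraction(0)))
print(positive_checks, limit_check)
```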

Now, we turn to the generalization of LexFix to arbitrary variable domains. We introduce a timestamp t. At every time t, the current domain is denoted by \({\mathcal {D}}^t\). We initialize \({\mathcal {D}}_i^0 = {\mathcal {D}}_i\) for all \(i \in [n]\), and for two timestamps \(t > t'\), we will possibly strengthen the domains, i.e., \({\mathcal {D}}_i^{t} \subseteq {\mathcal {D}}_i^{t'}\).

The core of LexRed is the observation that if \( x{\big |}{}_{\smash {[t-1]}} = \gamma (x){\big |}{}_{\smash {[t-1]}} \) holds for some \({t \ge 1}\), then constraint \(x \succeq \gamma (x)\) can only hold when \(x_t \ge \gamma (x)_t = x_{\gamma ^{-1}(t)}\). This observation is exploited in a two-stage approach. In the first stage, LexRed performs the following steps for all \(t = 1,\dots ,n\):

  1. The algorithm propagates \(x_t \ge \gamma (x)_t\) by updating the variable domains via

    $$\begin{aligned} \begin{aligned} {\mathcal {D}}_t^t&= \{ z \in {\mathcal {D}}_t^{t-1}: z \ge \min ({\mathcal {D}}_{\gamma ^{-1}(t)}^{t-1}) \},\\ {\mathcal {D}}_{\gamma ^{-1}(t)}^t&= \{ z \in {\mathcal {D}}_{\gamma ^{-1}(t)}^{t-1}: z \le \max ({\mathcal {D}}_{t}^{t-1}) \}, \text { and}\\ {\mathcal {D}}_i^t&= {\mathcal {D}}_i^{t-1}\ \text {for}\ i \in [n] \setminus \{ t, \gamma ^{-1}(t) \}. \end{aligned} \end{aligned}$$
    (5)
  2. Then, it checks whether \({\mathcal {D}}^t_i \ne \emptyset \) for all \(i \in [n]\) and whether \(x \in {\mathcal {D}}^t\) guarantees \( x{\big |}{}_{\smash {[t]}} = \gamma (x){\big |}{}_{\smash {[t]}} \). If this is the case, the algorithm continues with iteration \(t+1\). Otherwise, the first phase of LexRed terminates, say at time \(t^\star \).

Of course, all variable domain reductions found during phase one are correct based on the previously mentioned observation.

At the end of phase one, three possible cases can occur: a variable domain is empty, phase one has propagated all variables, i.e., \(x = \gamma (x)\) for all \(x \in {\mathcal {D}}^n\), or \({ x{\big |}{}_{\smash {[t^\star -1]}} = \gamma (x){\big |}{}_{\smash {[t^\star -1]}} }\) and there exists \((v,w) \in {\mathcal {D}}^{t^\star }_{t^\star } \times {\mathcal {D}}^{t^\star }_{\gamma ^{-1}(t^\star )}\) with \(v \ne w\). In either of the first two cases, the algorithm stops because it either has shown that no solution \(x \in {\mathcal {D}}^0\) exists with \(x \succeq \gamma (x)\) or all variables are fixed. In the last case, note that \(v > w\) holds due to the domain reductions at time \(t^\star \). Since \( x{\big |}{}_{\smash {[t^\star -1]}} = \gamma (x){\big |}{}_{\smash {[t^\star -1]}} \) holds for all \(x \in {\mathcal {D}}^{t^\star }\), the relation \(v > w\) shows that there exists \(x \in {\mathcal {D}}^{t^\star }\) such that \( x{\big |}{}_{\smash {[t^\star ]}} \succ \gamma (x){\big |}{}_{\smash {[t^\star ]}} \). Consequently, the domains of variables \(x_{t^\star +1},\dots ,x_n\) cannot be tightened. It might be possible, however, that the domains of \(x_{t^\star }\) and \(\gamma (x)_{t^\star }\) can be reduced further. Namely, if \(x_{t^\star } = \min {\mathcal {D}}^{t^\star }_{\gamma ^{-1}(t^\star )}\) or \(x_{\gamma ^{-1}(t^\star )} = \max {\mathcal {D}}^{t^\star }_{t^\star }\). In this case, the other variable necessarily attains the same value, which means that a solution with \( x{\big |}{}_{\smash {[t^\star ]}} = \gamma (x){\big |}{}_{\smash {[t^\star ]}} \) is created, which might lead to a contradiction with \(x \succeq \gamma (x)\) as illustrated in Example 16.

In the second stage of LexRed, it is checked whether one of these cases indeed leads to a contradiction. If this is the case, \(\min {\mathcal {D}}^{t^\star }_{\gamma ^{-1}(t^\star )}\) can be removed from the domain of \(x_{t^\star }\) or \(\max {\mathcal {D}}^{t^\star }_{t^\star }\) can be removed from the domain of \(x_{\gamma ^{-1}(t^\star )}\). To detect whether a contradiction occurs, the second phase hypothetically fixes \(x_{t^\star }\) or \(x_{\gamma ^{-1}(t^\star )}\) to the respective value and continues with stage one since \( x{\big |}{}_{\smash {[t^\star ]}} = \gamma (x){\big |}{}_{\smash {[t^\star ]}} \) now holds. If phase one then terminates because a variable domain becomes empty, this shows that the domain of \(x_{t^\star }\) or \(x_{\gamma ^{-1}(t^\star )}\) can be reduced. Otherwise, no further variable domain reductions can be derived.
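The two stages can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not part of the text: all domains are non-empty closed integer intervals `(lo, hi)`, indices are 0-based, and the permutation is passed as a list `ginv` with `ginv[t]` playing the role of \(\gamma ^{-1}(t)\), so that \(\gamma (x)_t = x_{\texttt {ginv[t]}}\). The names `phase_one` and `lexred` are hypothetical. For integer intervals, removing an endpoint simply shifts the bound by one, so the \(\varepsilon \) symbol is not needed here.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # closed integer interval [lo, hi]


def phase_one(dom: List[Interval], ginv: List[int], start: int) -> Tuple[str, int]:
    """Stage one from position `start`; tightens `dom` in place.

    Returns ("empty", t) if a domain became empty at time t, ("stop", t) if
    x_t = gamma(x)_t is not guaranteed at time t, and ("equal", n) otherwise.
    """
    for t in range(start, len(dom)):
        s = ginv[t]  # gamma(x)_t = x_s
        if s == t:
            continue  # x_t >= x_t is trivial and equality is guaranteed
        (lo_t, hi_t), (lo_s, hi_s) = dom[t], dom[s]
        # Update (5): propagate x_t >= x_s on both domains.
        dom[t] = (max(lo_t, lo_s), hi_t)
        dom[s] = (lo_s, min(hi_s, hi_t))
        if dom[t][0] > dom[t][1] or dom[s][0] > dom[s][1]:
            return "empty", t
        # Equality is only guaranteed if both variables are fixed to one value.
        if not (dom[t][0] == dom[t][1] == dom[s][0] == dom[s][1]):
            return "stop", t
    return "equal", len(dom)


def lexred(domains: List[Interval], ginv: List[int]) -> Optional[List[Interval]]:
    """Tightened domains for x >= gamma(x) (lexicographically), or None if infeasible."""
    dom = list(domains)
    status, t = phase_one(dom, ginv, 0)
    if status == "empty":
        return None
    if status == "equal":
        return dom
    s = ginv[t]
    (lo_t, hi_t), (lo_s, hi_s) = dom[t], dom[s]

    def tie_is_infeasible(value: int) -> bool:
        # Stage two: hypothetically fix x_t = x_s = value and rerun stage one.
        trial = list(dom)
        trial[t] = trial[s] = (value, value)
        return phase_one(trial, ginv, t + 1)[0] == "empty"

    new_lo_t, new_hi_s = lo_t, hi_s
    if lo_t == lo_s and tie_is_infeasible(lo_s):
        new_lo_t = lo_t + 1  # remove min D_s from the domain of x_t
    if hi_s == hi_t and tie_is_infeasible(hi_t):
        new_hi_s = hi_s - 1  # remove max D_t from the domain of x_s
    dom[t], dom[s] = (new_lo_t, hi_t), (lo_s, new_hi_s)
    return dom


# Example 16 with 0-based indices: gamma(x) = (x_4, x_3, x_1, x_2).
example = lexred([(0, 0), (-1, 0), (1, 1), (-1, 1)], [3, 2, 0, 1])
print(example)
```

On the data of Example 16, stage one stops at the first position after shrinking the domain of \(x_4\) to \([-1, 0]\), and stage two detects that the tie \(x_1 = x_4 = 0\) is infeasible, so \(x_4\) is fixed to \(-1\). Domains with holes would need a richer domain representation than this sketch assumes.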

Proposition 17

Let \(\tau \) be the time needed to perform one variable domain reduction in (5). Then, LexRed finds all possible variable domain reductions for \(x \succeq \gamma (x)\) in \(\mathop {{\mathcal {O}}}(n \cdot \tau )\) time.

Proof

Completeness of LexRed follows from the previous discussion. The running time holds as the first stage computes at most n domain reductions and the second stage triggers phase one at most twice. \(\square \)

In many cases, for instance, if variable domains are continuous or discrete intervals, \(\tau = \mathop {{\mathcal {O}}}(1)\), turning lexicographic reduction into a linear time algorithm. This is the case for mixed-integer programs, but for other paradigms such as constraint programming, variable domains could be more complicated. For instance, variable domains could contain holes and, depending on the data structure used to represent domains, an update might not be immediate.

4.1.2 Dynamic settings

Theorem 8 shows that \(\sigma _\beta (x) \succeq \sigma _\beta \gamma (x)\) is a valid symmetry handling constraint for certain symmetry prehandling structures \((m_\beta ,\pi _\beta ,\varphi _\beta )_{\beta \in {\mathcal {V}}}\). If \(\Gamma \) is a permutation group, \(\sigma _\beta (x)\) and \(\sigma _\beta (\gamma (x))\) are just permutations of the solution vector entries and a restriction of this vector. In this case, lexicographic reduction can, of course, also propagate these SHCs by changing the order in which we iterate over the solution vector entries.

In particular, in the binary case and the symmetry prehandling structure of Example 11, the adapted version of LexRed is compatible with IsoPr and OF as we have seen in Sect. 3 that the latter two methods propagate \(\sigma _\beta (x) \succeq \sigma _\beta \gamma (x)\) for all \(\gamma \in \Gamma \).

4.2 Orbitopal reduction

Bendotti et al. [6] present a complete propagation algorithm to handle orbitopal symmetries on binary variables. In this section, we generalize their algorithm to arbitrary variable domains. We call the generalization of orbitopal fixing orbitopal reduction as it does not necessarily fix variables.

4.2.1 The static setting

Suppose that \(\Gamma \) is the group that contains all column permutations of a \(p \times q\) variable matrix X. Further, assume that Theorem 8 uses the symmetry prehandling structure from Example 10, where we assume that the variable vector x associated with the \(p \cdot q\) variables in X is such that enforcing \(x \succeq \gamma (x)\) for all \(\gamma \in \Gamma \) corresponds to sorting the columns of X in lexicographic order. With slight abuse of notation, for \(\gamma \in \Gamma \), we write \(X \succeq \gamma (X)\) if and only if the corresponding vector \(x \in \mathbb {R}^{pq}\) satisfies \(x \succeq \gamma (x)\).

We use the following notation. For any \(M \in \mathbb {R}^{p \times q}\) and \((i,j) \in [p] \times [q]\), we denote by \(M_i\) the i-th row of M, by \(M^j\) the j-th column of M, and by \(M_i^j\) the entry at position \((i, j)\). For every variable \(X_i^j\), we denote its domain by \({\mathcal {D}}_i^j \subseteq \mathbb {R}\). Using the same matrix notation, \({\mathcal {D}}\subseteq \mathbb {R}^{p \times q}\) denotes the \(p \times q\)-matrix where entry \((i, j)\) corresponds to \({\mathcal {D}}_i^j\). For given domain \({\mathcal {D}}\subseteq \mathbb {R}^{p \times q}\), we denote by \(\underline{M}({\mathcal {D}})\) and \({\overline{M}}({\mathcal {D}})\) the lexicographically smallest and largest element in \({\mathcal {D}}\), respectively. Whenever the domain \({\mathcal {D}}\) is clear from the context, we just write \(\underline{M}\) and \({\overline{M}}\). Moreover, let

$$\begin{aligned} O_{p \times q} \,{:}{=}\,\{ X \in \mathbb {R}^{p \times q}: X \succeq \gamma (X)\ \text {for all}\ \gamma \in \Gamma \} \end{aligned}$$

be the set of all matrices with lexicographically sorted columns. Our goal is to find all possible VDRs of the SHCs \(X \succeq \gamma (X)\) for \(\gamma \in \Gamma \), i.e., we want to find the smallest \({{\hat{{\mathcal {D}}}} \subseteq \mathbb {R}^{p \times q}}\) such that \( {\hat{{\mathcal {D}}}} \cap O_{p \times q} = {\mathcal {D}}\cap O_{p \times q} \). It turns out that, as for the binary case [6], the matrices \(\underline{M}({\mathcal {D}})\) and \({\overline{M}}({\mathcal {D}})\) contain sufficient information for finding \({\hat{{\mathcal {D}}}}\). In the following, recall that we (implicitly) use the infinitesimal notation introduced in the previous section to represent strict inequalities.

Theorem 18

Let \({\mathcal {D}}\subseteq \mathbb {R}^{p \times q}\). For \(j \in [q]\), let

$$\begin{aligned}{i_j \,{:}{=}\,\min (\{ i \in [p]: \underline{M}({\mathcal {D}})_i^j \ne {\overline{M}}({\mathcal {D}})_i^j \} \cup \{ p + 1 \})}.\end{aligned}$$

Then, the smallest \({\hat{{\mathcal {D}}}} \subseteq \mathbb {R}^{p \times q}\) for which \({\hat{{\mathcal {D}}}} \cap O_{p \times q} = {\mathcal {D}}\cap O_{p \times q}\) holds satisfies, for every \({(i,j) \in [p] \times [q]}\),

$$\begin{aligned} {\hat{{\mathcal {D}}}}^j_i = {\left\{ \begin{array}{ll} {\mathcal {D}}^j_i \cap [\underline{M}({\mathcal {D}})_i^j, {\overline{M}}({\mathcal {D}})_i^j], &{} \text {if } i \le i_j,\\ {\mathcal {D}}^j_i, &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
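The domain update of Theorem 18 is easy to state in code once the two extreme matrices are available. The sketch below assumes closed integer intervals `(lo, hi)` as domains, 0-based indices, and that \(\underline{M}\) and \({\overline{M}}\) are supplied by the caller as `m_lo` and `m_hi`; the function name `orbitopal_domains` and the small \(2 \times 2\) instance (whose extreme matrices were worked out by hand) are illustrative assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval [lo, hi]
Matrix = List[List[int]]    # indexed as M[i][j]: row i, column j


def orbitopal_domains(dom: List[List[Interval]], m_lo: Matrix, m_hi: Matrix) -> List[List[Interval]]:
    """Apply the case distinction of Theorem 18 entry by entry."""
    p, q = len(dom), len(dom[0])
    new_dom = [row[:] for row in dom]
    for j in range(q):
        # i_j: first row where the extreme matrices differ in column j (p if none).
        i_j = next((i for i in range(p) if m_lo[i][j] != m_hi[i][j]), p)
        # Rows up to and including i_j are intersected with [m_lo, m_hi];
        # rows below i_j keep their original domain.
        for i in range(min(i_j, p - 1) + 1):
            lo, hi = dom[i][j]
            new_dom[i][j] = (max(lo, m_lo[i][j]), min(hi, m_hi[i][j]))
    return new_dom


# Hypothetical instance: first column in [0,2] x [0,1], second column in [1,3] x [0,1].
dom = [[(0, 2), (1, 3)],
       [(0, 1), (0, 1)]]
m_lo = [[1, 1], [0, 0]]  # columns (1,0), (1,0)
m_hi = [[2, 2], [1, 1]]  # columns (2,1), (2,1)
reduced = orbitopal_domains(dom, m_lo, m_hi)
print(reduced)
```

In this instance both columns first differ in the first row, so only the first-row domains are tightened to \([1, 2]\), while the second-row domains are left untouched.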

This theorem is proven by the following two lemmas. The first lemma shows that no tighter VDRs can be achieved: for every \((i, j) \in [p] \times [q]\) and \(v \in {{\hat{{\mathcal {D}}}}}_i^j\) a lexicographically non-increasing solution matrix \({{\tilde{X}}}\) exists with \({{\tilde{X}}}_i^j = v\). The second lemma shows that the VDRs are valid: for every \((i, j) \in [p] \times [q]\) and \(v \in {\mathcal {D}}_i^j {\setminus } {{\hat{{\mathcal {D}}}}}_i^j\), no matrix \({{\tilde{X}}}\) with \({{\tilde{X}}}_i^j = v\) exists.

Lemma 19

Suppose that \(O_{p \times q} \cap {\mathcal {D}}\ne \emptyset \). Let \(i' \in [p]\) and \(j' \in [q]\) with \(i' \le i_{j'}\). For all \(x \in {\mathcal {D}}_{i'}^{j'}\) with \({\underline{M}} _{i'}^{j'} \le x \le {{\overline{M}}} _{i'}^{j'}\) there is \(X \in O_{p \times q} \cap {\mathcal {D}}\) with \(X_{i'}^{j'} = x\).

Proof

We define two matrices \(A, B \in {\mathcal {D}}\), for which entries \((i,j) \in [p] \times [q]\) are

$$\begin{aligned} A_i^j = {\left\{ \begin{array}{ll} {{\overline{M}}}_i^j &{} \text {if}\ j< j', \\ {\underline{M}}_i^j &{} \text {if}\ j> j', \\ {{\overline{M}}}_i^j = {\underline{M}}_i^j &{} \text {if}\ j = j', i< i_j, \\ {{\overline{M}}}_i^j (> {\underline{M}}_i^j) &{} \text {if}\ j = j', i = i_j,\\ \min ({\mathcal {D}}_i^j) &{} \text {if}\ j = j', i> i_j, \end{array}\right. } \ \text {and}\ B_i^j = {\left\{ \begin{array}{ll} {{\overline{M}}}_i^j &{} \text {if}\ j< j', \\ {\underline{M}}_i^j &{} \text {if}\ j> j', \\ {{\overline{M}}}_i^j = {\underline{M}}_i^j &{} \text {if}\ j = j', i< i_j, \\ {\underline{M}}_i^j (< {{\overline{M}}}_i^j) &{} \text {if}\ j = j', i = i_j, \\ \max ({\mathcal {D}}_i^j) &{} \text {if}\ j = j', i > i_j. \end{array}\right. } \end{aligned}$$

From these two matrices, we show that for any \(x \in {\mathcal {D}}_{i'}^{j'}\) with \({\underline{M}} _{i'}^{j'} \le x \le {{\overline{M}}} _{i'}^{j'}\) there is \(X \in O_{p \times q} \cap {\mathcal {D}}\) with \(X_{i'}^{j'} = x\). We call such a matrix X a certificate for x. In the following, we first provide a construction for these certificates, and after that we show that they are contained in \({\mathcal {D}}\cap O_{p \times q}\).

If \(i' < i_{j'}\), then \(x = {{\overline{M}}}_{i'}^{j'} = {\underline{M}}_{i'}^{j'}\). Thus, \(X = A\) is a certificate. If \(i' = i_{j'}\), then there are three options: If \(x = {\underline{M}}_{i'}^{j'}\), then \(X = B\) is a certificate; if \(x = {{\overline{M}}}_{i'}^{j'}\), then \(X = A\) is a certificate; and if \({\underline{M}}_{i'}^{j'}< x < {{\overline{M}}}_{i'}^{j'}\), then construct X with \(X_i^j = A_i^j\) if \((i, j) \ne (i', j')\), and \(X_{i'}^{j'} = x\).

Note that \(X \in {\mathcal {D}}\). We finally show that \(X \in O_{p \times q}\), concluding the proof. The first \(j' - 1\) columns of X correspond to \({{\overline{M}}}\). That is, they satisfy \(X^j \succeq X^{j + 1}\) for all \(1 \le j < j' - 1\). Similarly, the columns after column \(j'\) correspond to \({\underline{M}}\). Hence, \(X^j \succeq X^{j + 1}\) for all \(j'< j < q\). By the definition of A and B, \({{\overline{M}}}^{j' - 1} \succeq {{\overline{M}}}^{j'} \succeq A^{j'}\) and \(B^{j'} \succeq {\underline{M}}^{j'} \succeq {\underline{M}}^{j' + 1}\). As the columns of X are either columns of A or B, or equal to \(A^{j'}\) up to one entry while remaining lexicographically larger than \(B^{j'}\), we find \({{\overline{M}}}^{j' - 1} = A^{j' - 1} = X^{j' - 1} \succeq A^{j'} \succeq X^{j'} \succeq B^{j'} \succeq B^{j' + 1} = X^{j' + 1} = {\underline{M}}^{j' + 1}\). So, for all consecutive \(j\in [q - 1]\), we have \(X^j \succeq X^{j + 1}\), and hence \(X \in O_{p \times q} \cap {\mathcal {D}}\). \(\square \)

Lemma 20

Suppose that \(O_{p \times q} \cap {\mathcal {D}}\ne \emptyset \). Let \(i' \in [p]\) and \(j' \in [q]\) with \(i' \le i_{j'}\). For all \(X \in O_{p \times q} \cap {\mathcal {D}}\), we have \({\underline{M}} _{i'}^{j'} \le X_{i'}^{j'} \le {{\overline{M}}} _{i'}^{j'}\).

Proof

Suppose the contrary, i.e., for \(X \in O_{p \times q} \cap {\mathcal {D}}\) either \(X_{i'}^{j'} < {\underline{M}} _{i'}^{j'}\) or \(X_{i'}^{j'} > {{\overline{M}}} _{i'}^{j'}\). Suppose that \(i'\) is minimal, i.e., there is no \(i'' < i'\) with \(X_{i''}^{j'} < {\underline{M}} _{i''}^{j'}\) or \(X_{i''}^{j'} > {{\overline{M}}} _{i''}^{j'}\). By symmetry, it suffices to consider the case \(X_{i'}^{j'} < {\underline{M}} _{i'}^{j'}\).

If \(X_{i}^{j'} \le {\underline{M}} _{i}^{j'}\) holds for all \(i < i'\), then \(X^{j'} \prec {\underline{M}}^{j'}\), which contradicts that \({\underline{M}}\) is the lexicographically minimal solution of \(O_{p \times q} \cap {\mathcal {D}}\). Hence, there is a row \(i'' < i'\) with \(X_{i''}^{j'} > {\underline{M}}_{i''}^{j'}\). Since \(i'' < i' \le i_{j'}\) yields \({\underline{M}}_{i''}^{j'} = {{\overline{M}}}_{i''}^{j'}\), we have \(X_{i''}^{j'} > {{\overline{M}}}_{i''}^{j'}\). This contradicts the minimality of \(i'\) with \(X_{i'}^{j'} < {\underline{M}}_{i'}^{j'}\) or \(X_{i'}^{j'} > {{\overline{M}}}_{i'}^{j'}\), since row \(i'' < i'\) already satisfies the second condition. \(\square \)

Proof of Theorem 18

Lemmas 19 and 20 prove the assertion for \(i' \le i_{j'}\). Since the domains for \({i' > i_{j'}}\) are not restricted in comparison to \({\mathcal {D}}\), domain \({\hat{{\mathcal {D}}}}^{j'}_{i'}\) is valid. To show that it is as tight as possible, we can reconsider in the proof of Lemma 19 the matrix \(A \in {\mathcal {D}}\cap O_{p \times q}\). Replacing entry \((i', j')\) in A with any value in \({\mathcal {D}}_{i'}^{j'}\) yields a matrix \({{\tilde{A}}}\). If \(i' > i_{j'}\), this change does not affect the lexicographic order constraint, so \({{\tilde{A}}} \in {\mathcal {D}}\cap O_{p \times q}\) is a certificate of tightness. Combining these statements shows correctness of Theorem 18. \(\square \)

We conclude this section with an analysis of the time needed to find \({\hat{{\mathcal {D}}}}\). The crucial step is to find the matrices \(\underline{M}\) and \({\overline{M}}\). To find these matrices, we adapt the idea from [6] for the binary case. Denote by \(\text {lexmin}(\cdot )\) and \(\text {lexmax}(\cdot )\) the operators that determine the lexicographically minimal and maximal elements of a set, respectively. We claim that for the lexicographically minimal element \(\underline{M}\), the j-th column is

$$\begin{aligned} {\underline{M}}^j = {\left\{ \begin{array}{ll} \text {lexmin}\left\{ X \in {\mathcal {D}}^j: X \succeq {\underline{M}}^{j+1} \right\} , &{} \text {if}\ j < q, \\ \text {lexmin}\{ X \in {\mathcal {D}}^j \}, &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

This can be computed iteratively, starting with the last column \(j=q\), and then iteratively reducing j until the first column. The arguments for correctness are the same as in [6, Thm. 1, Lem. 2]. For this reason, we only describe how to compute the j-th column. Due to this iterative approach, when computing column \(\underline{M}^{j}\) for \(j < q\), column \(\underline{M}^{j+1}\) is known. The idea is to choose the entries of \(\underline{M}^{j}\) minimally such that \({\underline{M}^{j} \succeq \underline{M}^{j+1}}\) holds. This resembles the propagation method of the previous section (LexRed), by choosing the entries minimally such that the constraint holds when restricted to the first elements, then increasing the vector sizes by one iteratively. If this leads to a contradiction with the constraint, the algorithm returns to the last step where the entry was not fixed, and this entry is increased to repair feasibility of the constraint.

More precisely, \({\underline{M}}^{j}\) is found by iterating i from 1 to p as follows. If there is a row index \({i' < i}\) with \({\underline{M}}_{i'}^j > {\underline{M}}_{i'}^{j + 1}\), let \({\underline{M}}_i^j \leftarrow \min ( {\mathcal {D}}_i^j )\). This is possible, because row \(i'\) already guaranteed that \(\underline{M}^j \succ \underline{M}^{j+1}\). If no such index exists, we may assume that all preceding rows \(i' < i\) have \({{\underline{M}}_{i'}^j = {\underline{M}}_{i'}^{j+1}}\) (otherwise, the j-th column cannot be lexicographically larger than column \(j+1\) as becomes clear in the following). In this case, denote \(S^i \,{:}{=}\,\{ x \in {\mathcal {D}}_i^j: x \ge {\underline{M}}_i^{j+1} \}\). On the one hand, if \(|S^i| > 0\), set \({\underline{M}}_i^j \leftarrow \min (S^i)\). If this yields \({\underline{M}}_i^j > {\underline{M}}_i^{j + 1}\), then stop the iteration, and for all \({i'' > i}\) set \({\underline{M}}_{i''}^{j} \leftarrow \min ({\mathcal {D}}_{i''}^j)\). This makes sure that \(\underline{M}^j\) is lexicographically strictly larger than \(\underline{M}^{j+1}\).

On the other hand, if \(|S^i| = 0\), we cannot enforce \(\underline{M}^j \succ \underline{M}^{j+1}\) in row i. To ensure \(\underline{M}\) becomes the lexicographically smallest element in \(O_{p \times q} \cap {\mathcal {D}}\), we return to the largest \(i' < i\) with \(|S^{i'}| > 1\) and enforce a lexicographic difference by setting  \({{\underline{M}}_{i'}^j \leftarrow \min \{ x \in {\mathcal {D}}_{i'}^j: x > {\underline{M}}_{i'}^{j+1} \}}\), and, for all \(i'' > i'\), we assign \({{\underline{M}}_{i''}^{j} \leftarrow \min ({\mathcal {D}}_{i''}^j)}\). If no \(i' < i\) exists with \(|S^{i'}| > 1\), column j cannot become lexicographically at least as large as column \(j+1\). That is, \({\mathcal {D}}\cap O_{p \times q} = \emptyset \).
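The row iteration described above can be sketched for closed integer intervals as follows. The function names `lexmin_column` and `lexmin_matrix`, the 0-based indexing, and the interval representation `(lo, hi)` are assumptions made for illustration; for integer intervals, \(\min \{ x \in {\mathcal {D}}_{i'}^j: x > {\underline{M}}_{i'}^{j+1} \}\) is simply the next integer.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # closed integer interval [lo, hi]


def lexmin_column(col_dom: List[Interval], nxt: List[int]) -> Optional[List[int]]:
    """Lexicographically smallest column in col_dom that is at least `nxt`, or None."""
    p = len(col_dom)
    col = [lo for lo, _ in col_dom]  # default: every entry at its minimum
    last_slack = None                # largest tied row i' so far with |S^{i'}| > 1
    for i in range(p):
        lo, hi = col_dom[i]
        if hi < nxt[i]:
            # S^i is empty: backtrack to the last tied row that can be increased.
            if last_slack is None:
                return None
            col[last_slack] = nxt[last_slack] + 1
            for k in range(last_slack + 1, p):
                col[k] = col_dom[k][0]
            return col
        col[i] = max(lo, nxt[i])     # min(S^i)
        if col[i] > nxt[i]:
            # Strictly larger in row i: the remaining rows stay at their minimum.
            for k in range(i + 1, p):
                col[k] = col_dom[k][0]
            return col
        if hi > nxt[i]:
            last_slack = i           # tie in row i, but a larger value exists
    return col                       # the column equals `nxt`


def lexmin_matrix(dom: List[List[Interval]]) -> Optional[List[List[int]]]:
    """Build the lexicographically minimal matrix column by column, last column first."""
    p, q = len(dom), len(dom[0])
    cols: List[Optional[List[int]]] = [None] * q
    cols[q - 1] = [dom[i][q - 1][0] for i in range(p)]
    for j in range(q - 2, -1, -1):
        cols[j] = lexmin_column([dom[i][j] for i in range(p)], cols[j + 1])
        if cols[j] is None:
            return None              # no matrix with sorted columns exists
    return [[cols[j][i] for j in range(q)] for i in range(p)]


m_lo = lexmin_matrix([[(0, 2), (1, 3)], [(0, 1), (0, 1)]])
print(m_lo)
```

A single backward pass over the columns with a bounded amount of work per entry matches the running time discussed for intervals; the mirrored routine for \({\overline{M}}\) would iterate forward over the columns and maximize instead.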

Analogously, one computes \({{\overline{M}}}\) by

$$\begin{aligned} {\overline{M}}^j = {\left\{ \begin{array}{ll} \text {lexmax}\left\{ X \in {\mathcal {D}}^j: {{\overline{M}}}^{j - 1} \succeq X \right\} &{} \text {if}\ j > 1,\ \text {and} \\ \text {lexmax}\{ X \in {\mathcal {D}}^j \} &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Determining the j-th column of \(\underline{M}\) and \({\overline{M}}\) requires iterating over its elements a constant number of times, and for each element a constant number of comparisons and variable domain reductions is executed. The time for finding \(\underline{M}\) and \({\overline{M}}\) is therefore \(\mathop {{\mathcal {O}}}(pq\tau )\), where \(\tau \) is again the time needed to reduce variable domains. Combining all arguments thus yields the following result regarding orbitopal reduction.

Theorem 21

Let \({\mathcal {D}}\subseteq \mathbb {R}^{p \times q}\). Orbitopal reduction finds the smallest \({\hat{{\mathcal {D}}}} \subseteq \mathbb {R}^{p \times q}\) such that \({{\hat{{\mathcal {D}}}} \cap O_{p \times q} = {\mathcal {D}}\cap O_{p \times q}}\) holds in \(\mathop {{\mathcal {O}}}(pq\tau )\) time. In particular, if all variable domains are intervals, orbitopal reduction can be implemented to run in \(\mathop {{\mathcal {O}}}(pq)\) time.

4.2.2 Dynamic settings

Similar to LexRed, orbitopal reduction can also be used to propagate SHCs \({\sigma _\beta (x) \succeq } {\sigma _\beta \gamma (x)}\) for permutations \(\gamma \) from a group \(\Gamma \) of orbitopal symmetries. The only requirement is that \(\sigma _\beta \) is compatible with the matrix interpretation of a solution x, which can be achieved by using the symmetry prehandling structure of Example 12. In this case, the static orbitopal reduction algorithm is only applied to the variables “seen” by \(\sigma _\beta (x) \succeq \sigma _\beta \gamma (x)\).

Note that this symmetry prehandling structure admits some degrees of freedom in selecting \(\varphi _\beta \). If \(\varphi _\beta = \textrm{id}\) for all \(\beta \in {\mathcal {V}}\), this resembles the adapted version of orbitopal fixing as mentioned in Sect. 2.1.2. But also other choices are possible as we will discuss in Sect. 5.

4.3 Variable ordering derived from branch-and-bound

A natural question is whether generalizations of isomorphism pruning and OF also exist. The main challenge is that after branching on general variables, they are not necessarily fixed (in contrast to the binary setting). Thus, stabilizer computations as discussed in Sect. 2 might not apply in the generalized setting. Inspired by OF (Sect. 2.2.2), we present a way to reduce variable domains of arbitrary variables based on symmetry, called orbital reduction.

Let \(\beta \in {\mathcal {V}}\) be a node of the branch-and-bound tree. Denote the set of solutions respecting the SHCs (2) by \(W^\beta \,{:}{=}\,\{ x \in \mathbb {R}^n: \sigma _\beta (x) \succeq \sigma _\beta (\delta (x))\ \text {for all}\ \delta \in \Gamma \}\). Recall that the VDRs in OF are based on the symmetries in the set \(\Delta ^\beta \) that stabilize the one-branched variables. For the generalization in this section, we provide a similar but different set definition. For vectors x and y of equal length m, we write \(x \le y\) if \(x_i \le y_i\) for all \(i \in [m]\). Let \(\Delta ^\beta \,{:}{=}\,\{ \gamma \in \Gamma : \sigma _\beta (x) \le \sigma _\beta \gamma (x)\ \text {for all}\ x \in \Phi (\beta ) \cap W^\beta \} \), i.e., the set of all symmetries for which feasible solutions at \(\beta \) respecting (2) are not larger than their symmetric ones when comparing elementwise with respect to \(\sigma _\beta \). Based on \(\Delta ^\beta \), we show two rules for VDRs that work in the setting of Example 11. These rules are similar to the rules of Sect. 2.2.2. In particular, we show that any reduction by OF in a binary setting is also implied by these reduction rules. These results depend on the following auxiliary lemma:

Lemma 22

Let \(\beta \) be a node of a B&B-tree using single variable branching with symmetry prehandling structure of Example 11. Then, \(\Delta ^\beta \) is a group.

Proof

Recall that we assume \(\Gamma \le {\mathcal {S}}_{n}\) in this section. Therefore, both \(\Gamma \) and \(\Delta ^\beta \subseteq \Gamma \) are finite. To show that it is a group, it suffices to show that compositions of elements of \(\Delta ^\beta \) are also contained therein. The identity and inverses follow implicitly.

Let \(\gamma _1, \gamma _2 \in \Delta ^\beta \), and suppose \(x \in \Phi (\beta ) \cap W^\beta \). By definition of \(\Delta ^\beta \), we have \({\sigma _\beta (x) \le \sigma _\beta (\gamma _2(x))}\). Since \(\gamma _2 \in \Delta ^\beta \le \Gamma \) and \(x \in W^\beta \), we have \(\sigma _\beta (x) \succeq \sigma _\beta (\gamma _2(x))\). Note that \(\sigma _\beta (x) \succeq \sigma _\beta (\gamma _2(x))\) and  \(\sigma _\beta (x) \le \sigma _\beta (\gamma _2(x))\) imply  \(\sigma _\beta (x) = \sigma _\beta (\gamma _2(x))\). Since the correctness conditions are satisfied for Example 11, due to Condition (C4), the properties \(x \in \Phi (\beta )\) and \(\sigma _\beta (x) = \sigma _\beta (\gamma _2(x))\) imply that \(\gamma _2(x) \in \Phi (\beta )\) holds.

Since \(x \in W^\beta \), for all \(\delta \in \Gamma \) we have  \(\sigma _\beta (\gamma _2(x)) = \sigma _\beta (x) \succeq \sigma _\beta (\delta (x))\). Because \(\gamma _2\) is a group element of \(\Gamma \), we thus also have \(\sigma _\beta (\gamma _2(x)) \succeq \sigma _\beta (\delta \circ \gamma _2(x))\) for all \(\delta \in \Gamma \), meaning that \(\gamma _2(x) \in W^\beta \). Summarizing, we have \(\sigma _\beta (x) = \sigma _\beta (\gamma _2(x))\) and \(\gamma _2(x) \in \Phi (\beta ) \cap W^\beta \). By analogy, the same results hold for \(\gamma _1(x)\).

Since \(\gamma _2(x) \in \Phi (\beta ) \cap W^\beta \) and \(\gamma _1 \in \Delta ^\beta \), we get from the definition of \(\Delta ^\beta \) that  \({\sigma _\beta (\gamma _2(x)) \le \sigma _\beta (\gamma _1 \circ \gamma _2(x))}\). Using the same reasoning as above, \(\gamma _2(x) \in W^\beta \) implies  \(\sigma _\beta (\gamma _2(x)) \succeq \sigma _\beta (\gamma _1 \circ \gamma _2(x))\), so \(\sigma _\beta (\gamma _2(x)) = \sigma _\beta (\gamma _1 \circ \gamma _2(x))\). We thus find that  \(\sigma _\beta (x) = \sigma _\beta (\gamma _2(x)) = \sigma _\beta (\gamma _1 \circ \gamma _2(x))\), which implies \(\gamma _1 \circ \gamma _2 \in \Delta ^\beta \). \(\square \)

We show two feasible reductions that are based on \(\Delta ^\beta \). The first reduction shows that the variable domains of the variables in the \(\Delta ^\beta \)-orbit of the branching variable can possibly be tightened. To this end, denote the orbit of i in \(\Delta ^\beta \) by \({O_i^\beta \,{:}{=}\,\{ \gamma (i): \gamma \in \Delta ^\beta \}}\). If the branching decision after node \(\beta \) decreased the upper bound of variable \(x_i\) for some \(i \in [n]\), a valid VDR is to decrease the upper bounds of \(x_j\) for all \(j \in O^\beta _i\) to the same value as we show next.

Lemma 23

(Orbital symmetry handling) Let \({\mathcal {B}}= ({\mathcal {V}}, {\mathcal {E}})\) be a B&B-tree using single variable branching with symmetry prehandling structure of Example 11. Let \(\omega \in {\mathcal {V}}\) be a child of \(\beta \in {\mathcal {V}}\) where \(x_i\) is the branching variable for some \(i \in [n]\). Then, at node \(\omega \), each solution \(x \in \Phi (\omega )\) satisfying \({\sigma _\omega (x) \succeq \sigma _\omega (\delta (x))}\) for all \(\delta \in \Gamma \) (i.e., (2) for node \(\omega \)) also satisfies \(x_i \ge x_j\) for all \(j \in O_i^\beta \).

Proof

Let \(\gamma \in \Delta ^\beta \) and let \(x \in \Phi (\omega )\) satisfy \(\sigma _\omega (x) \succeq \sigma _\omega (\delta (x))\) for all \(\delta \in \Gamma \). Since \(\omega \) is a child of \(\beta \), we have \(x \in \Phi (\omega ) \subseteq \Phi (\beta )\). Also, for all \(\delta \in \Gamma \) we have \({\sigma _\omega (x) \succeq \sigma _\omega (\delta (x))}\), so due to the symmetry prehandling structure of Example 11, we also have \({\sigma _\beta (x) \succeq \sigma _\beta (\delta (x))}\), meaning that \(x \in W^\beta \). Hence, we have \({x \in \Phi (\beta ) \cap W^\beta }\). By definition of \(\Delta ^\beta \), we thus have \(\sigma _\beta (x) \le \sigma _\beta (\gamma (x))\). Recall that due to Example 11, we have for all \(\delta \in \Gamma \) that \(\sigma _\beta (x) \succeq \sigma _\beta (\delta (x))\). Since \(\gamma \in \Delta ^\beta \le \Gamma \), therefore \({\sigma _\beta (x) \le \sigma _\beta (\gamma (x))}\) and \(\sigma _\beta (x) \succeq \sigma _\beta (\gamma (x))\) hold, implying \(\sigma _\beta (x) = \sigma _\beta (\gamma (x))\). Denote this result by (\(\dagger \)).

Due to Example 11, we have

$$\begin{aligned} \sigma _\omega (x) = {\left\{ \begin{array}{ll} \sigma _\beta (x), &{} \text {if}\ i \in (\pi _\beta \varphi _\beta )^{-1}([m_\beta ]) \ \text {(i.e., variable }x_i\text { appears in }\sigma _\beta (x)\text {), and}\\ \left( {\begin{array}{c}\sigma _\beta (x)\\ x_i\end{array}}\right) ,&\text {otherwise.} \end{array}\right. } \end{aligned}$$

As such, SHC \(\sigma _\omega (x) \succeq \sigma _\omega (\gamma (x))\) is equivalent to either \(\sigma _\beta (x) \succ \sigma _\beta (\gamma (x))\), or both \(\sigma _\beta (x) = \sigma _\beta (\gamma (x))\) and \(x_i \ge \gamma (x)_i\). Note that this equivalence holds independently of whether entry i has been branched on before (i.e., whether \({i \in (\pi _\beta \varphi _\beta )^{-1}([m_\beta ])}\) or not). By (\(\dagger \)), the first of the two options cannot hold, so we must have \(x_i \ge \gamma (x)_i = x_{\gamma ^{-1}(i)}\). Consequently, \(x_i \ge x_{\gamma ^{-1}(i)}\) is a valid SHC for every \(\gamma \in \Delta ^\beta \). Since \(\Delta ^\beta \) is a group, \(\gamma ^{-1}\) ranges over all of \(\Delta ^\beta \), and thus we can propagate \(x_i \ge x_j\) for all \(j \in O_i^\beta \). \(\square \)
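The propagation of Lemma 23 can be illustrated by a small sketch. The following Python code is not the implementation used in SCIP. It assumes that generators of (a subgroup of) \(\Delta ^\beta \) are given as 0-indexed tuples, where `perm[k]` is the image of index `k`, and the helper names `orbit` and `propagate_upper_bounds` are hypothetical.

```python
from collections import deque


def orbit(i, generators):
    """Orbit of index i under the group generated by `generators`.

    Each generator is a tuple `perm` with perm[k] the image of index k.
    For permutations of a finite set, closing {i} under the generators
    alone already yields the full orbit (inverses are powers).
    """
    seen = {i}
    queue = deque([i])
    while queue:
        k = queue.popleft()
        for perm in generators:
            image = perm[k]
            if image not in seen:
                seen.add(image)
                queue.append(image)
    return seen


def propagate_upper_bounds(i, generators, ub):
    """Lemma 23 as a bound update: x_i >= x_j implies ub[j] <= ub[i]."""
    new_ub = list(ub)
    for j in orbit(i, generators):
        new_ub[j] = min(new_ub[j], ub[i])
    return new_ub


# Hypothetical data: a 4-cycle on indices 0..3 and a fifth, fixed index.
cycle = (1, 2, 3, 0, 4)
# Branching decreased the upper bound of x_0 to 2.
tightened = propagate_upper_bounds(0, [cycle], [2, 7, 7, 7, 7])
```

In this toy instance, the upper bounds of the four variables in the orbit of index 0 are reduced to 2, whereas the fixed index keeps its bound.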

Second, recall our assumption that any VDR that is not based on our symmetry framework needs to be symmetry compatible, see Sect. 3. In practice, however, a solver might not find all symmetric VDRs, e.g., due to iteration limits. The following lemma allows us to find missing (but not necessarily all) VDRs based on symmetries, which corresponds to orbital fixing as discussed in [5] without the restriction to binary variables.

Lemma 24

Let \(\beta \) be a node of a B&B-tree using single variable branching with symmetry prehandling structure of Example 11. Then, when SHCs (2) are enforced (i.e., solutions are in \(W^\beta \)), a valid VDR, for \(i \in [n]\), is to reduce the domain of \(x_i\) to the intersection of the domains of all variables \(x_j\) at node \(\beta \) for \(j \in O_i^\beta \).

Before proving the lemma, we stress that the orbits \(O_i^\beta \) of \(\Delta ^\beta \) are based on \(\Phi (\beta ) \cap W^\beta \). The VDRs do not change \(\Phi (\beta ) \cap W^\beta \), since these are the actual acceptable solution vectors in consideration at node \(\beta \), and not the relaxation. Hence, \(O_i^\beta \) is not changed by applying these VDRs.

Proof

Let \(x \in \Phi (\beta )\) and let \(i \in [n]\). Let \(j \in O_i^\beta \), i.e., there exists \(\gamma \in \Delta ^\beta \) with \(\gamma (i) = j\). As \(\gamma \in \Delta ^\beta \), \(\sigma _\beta (x) \le \sigma _\beta (\gamma (x))\) holds. When SHCs (2) are enforced (i.e., \(x \in W^\beta \)), it must also hold that \(\sigma _\beta (x) \succeq \sigma _\beta (\gamma (x))\), and thus \(\sigma _\beta (x) = \sigma _\beta (\gamma (x))\). Since Example 11 satisfies the correctness conditions, Condition (C4) yields \(\gamma (x) \in \Phi (\beta )\), or equivalently, \(x \in \gamma ^{-1}(\Phi (\beta ))\). Hence, if \(x \in \Phi (\beta ) \cap W^\beta \), then \(x \in \gamma ^{-1}(\Phi (\beta ))\) holds for all \(\gamma \in \Delta ^\beta \). For this reason, the domain of \(x_i\) can be restricted to the intersection of the domains of the variables \(x_j\) for all \(j \in O_i^\beta \). \(\square \)

In fact, we do not need to restrict to reductions of individual variable domains. The proof also shows that any reduction (e.g., a cutting plane) of \(\bigcap _{\gamma \in \Delta ^\beta } \gamma (\Phi (\beta ))\) is implied by \(\Phi (\beta ) \cap W^\beta \).
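For interval domains, the reduction of Lemma 24 amounts to intersecting the bounds within each orbit. The following Python sketch illustrates this under assumptions that are not part of the text: domains are intervals given by lists `lb` and `ub`, permutations are 0-indexed tuples with `perm[k]` the image of `k`, and the helper names are hypothetical.

```python
def orbits(n, generators):
    """Partition {0, ..., n-1} into orbits of the generated group."""
    parent = list(range(n))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # Indices k and perm[k] always lie in the same orbit.
    for perm in generators:
        for k in range(n):
            parent[find(k)] = find(perm[k])

    groups = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k)
    return list(groups.values())


def intersect_orbit_domains(generators, lb, ub):
    """Give every variable of an orbit the interval [max lb, min ub]."""
    new_lb, new_ub = list(lb), list(ub)
    for orb in orbits(len(lb), generators):
        lo = max(lb[k] for k in orb)
        hi = min(ub[k] for k in orb)
        for k in orb:
            new_lb[k], new_ub[k] = lo, hi
    return new_lb, new_ub


# Hypothetical data: one generator swapping 0 <-> 1 and 2 <-> 3.
swap = (1, 0, 3, 2)
new_lb, new_ub = intersect_orbit_domains([swap], [0, 1, 2, 0], [5, 4, 9, 3])
```

If the intersection is empty for some orbit (`lo > hi`), the node can be pruned. The sketch does not handle this case explicitly.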

In practice, as part of a solver, Lemmas 23 and 24 cannot be used immediately as the orbits depend on \(\Delta ^\beta \), which cannot be computed easily as it depends on \(\Phi (\beta )\) and \(W^\beta \). Instead, we rely on a suitably determined subgroup \({{\tilde{\Delta }}}^\beta \le \Delta ^\beta \), and apply the reductions induced by that subgroup. Because the reductions are based on variables in the same orbit, and orbits of the subgroup are subsets of the orbits of the larger group, VDRs found by \({{\tilde{\Delta }}}^\beta \) would also be found by \(\Delta ^\beta \). We provide an example illustrating the reductions.

Example 25

We consider the problem of assigning non-negative numbers to the vertices of a regular polygon with n vertices. Numbers on adjacent vertices must differ by at most 2, and the numbers on vertices at distance d must differ by at least d. The goal is to find a number assignment with minimal total sum. A natural formulation of this problem uses variables \(x_i\), \(i \in [n]\), to model the numbers that are assigned to vertices \(i \in [n]\) and reads as follows:

$$\begin{aligned} \min \left\{ \! \sum _{i \in [n]} x_i: \begin{aligned}&|x_i - x_j| \le 2\ \text {for all}\ i, j \in [n]\ \text {with}\ i-j=1 \bmod {n}; \\&|x_i - x_j| \ge |i-j \bmod {n}|\ \text {for all}\ i, j \in [n]; x \in \mathbb {Z}_{\ge 0}^n \end{aligned} \right\} . \end{aligned}$$

We consider permutation symmetries of this problem. It is easy to see that these are given by the symmetries of the dihedral group, i.e., rotations of the polygon and reflections along axes through one vertex if n is odd, and along axes through opposite vertices if n is even. For example, if \(n = 5\), the symmetries are given by the five cyclic shifts of the vertex label vector (1, 2, 3, 4, 5) as well as the reflections (2, 5)(3, 4), (1, 3)(4, 5), (1, 5)(2, 4), (1, 2)(3, 5), and (1, 4)(2, 3).

Let \(n = 5\). Consider the branch-and-bound tree where the root node branches on \({x_1 \le 2}\) and \(x_1 \ge 3\). We call the corresponding child nodes \(\beta _{\le }\) and \(\beta _{\ge }\), respectively. Let \(\beta \in \{ \beta _\le , \beta _\ge \}\). Following Example 11, \(\sigma _\beta (x) = [x_1]\). To obtain an easily computable subgroup of \(\Delta ^\beta \), we can, for instance, select \({{\tilde{\Delta }}}^{\beta }\) to stabilize the branching vertex 1. In our example, this group consists of the identity and the reflection along the axis through vertex 1. It is easy to see that, regardless of the choice of x, every permutation \(\gamma \in {{\tilde{\Delta }}}^\beta \) satisfies \(\sigma _\beta (x) \le \sigma _\beta (\gamma (x))\). We now describe the symmetry reductions that can be found in both \(\beta _\le \) and \(\beta _\ge \).

At node \(\beta _\le \), by Lemma 23, we can apply \(x_1 \ge x_j\) for each j in the orbit of 1 at the parent of \(\beta _{\le }\) (i.e., the root node). The symmetries at the root node are given by the dihedral group, i.e., the orbit of vertex 1 is [n]. Since \(x_1 \le 2\), we find the reduction \(x_j \le 2\) for all \(j \in [n]\).

At node \(\beta _\ge \), due to the branching decision \(x_1 \ge 3\), constraints \(|x_1 - x_2| \le 2\) and \({|x_1 - x_5| \le 2}\) imply the (symmetry-compatible) reductions \(x_2 \ge 1\) and \(x_5 \ge 1\), respectively. As discussed, it is possible that not all variable domain reductions are found. For instance, suppose that a solver only detects that \(x_2 \in [1, \infty )\) and keeps the larger domain of \([0, \infty )\) for \(x_5\). Since vertices 2 and 5 are in the same orbit of \({{\tilde{\Delta }}}^{\beta _\ge }\), Lemma 24 can also find the reduction \(x_5 \ge 1\). That is, the missing reduction can be found by symmetry.
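The group-theoretic part of Example 25 can be reproduced with a few lines of code. The following Python sketch is an illustration only: vertices are 0-indexed (vertex 1 of the example is index 0), permutations are tuples with `perm[k]` the image of `k`, and all names are hypothetical. It enumerates the dihedral group for \(n = 5\), extracts the stabilizer of the branching vertex, and applies the bound updates described for \(\beta _\le \) and \(\beta _\ge \).

```python
def compose(p, q):
    """Return the permutation k -> p[q[k]]."""
    return tuple(p[k] for k in q)


def generate_group(generators):
    """Enumerate the (small) group generated by the given permutations."""
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


n = 5
rotation = tuple((k + 1) % n for k in range(n))  # vertex k -> k + 1
reflection = tuple((-k) % n for k in range(n))   # axis through index 0

dihedral = generate_group([rotation, reflection])
stabilizer = {g for g in dihedral if g[0] == 0}
orbit_of_0 = {g[0] for g in dihedral}

INF = float("inf")

# Node beta_<= (x_1 <= 2): Lemma 23 with the root-node orbit of vertex 1.
ub_le = [2 if k in orbit_of_0 else INF for k in range(n)]

# Node beta_>= (x_1 >= 3): the solver found x_2 >= 1 but missed x_5 >= 1.
lb_ge = [3, 1, 0, 0, 0]
# Lemma 24 with the stabilizer: take the largest lower bound in each orbit.
lb_ge_new = [max(lb_ge[g[k]] for g in stabilizer) for k in range(n)]
```

The stabilizer consists of the identity and the reflection through the branching vertex, and the last list recovers the missing lower bound on the fifth variable.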

In the example, we determine \({{\tilde{\Delta }}}^\beta \) in an ad-hoc way. One way to define \({{\tilde{\Delta }}}^\beta \) is the following. For all \(i \in [n]\), let \({\mathcal {D}}_i^\beta \subseteq \mathbb {R}\) be the (known) domain of variable \(x_i\) at node \(\beta \). In particular, we thus have \({\Phi (\beta ) \cap W^\beta \subseteq \prod _{i \in [n]} {\mathcal {D}}_i^\beta }\). By replacing \({\Phi (\beta ) \cap W^\beta }\) in the definition of \(\Delta ^\beta \) by \(\prod _{i \in [n]} {\mathcal {D}}_i^\beta \), we get a subset of symmetries: \(\{ \gamma \in \Gamma : \sigma _\beta (x) \le \sigma _\beta (\gamma (x))\ \text {for all}\ x \in \prod _{i \in [n]} {\mathcal {D}}_i^\beta \} \subseteq \Delta ^\beta \). In particular, using (a subset of) the left set as generating set, one finds a permutation group that is a subgroup of \(\Delta ^\beta \). In Sect. 5, we discuss the subset selection procedure that we chose in our implementation.
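A domain-based membership test of this kind can be sketched as a filter on a known generating set. The following Python code is an illustration under several assumptions that go beyond the text: domains are intervals \([\ell _i, u_i]\), the relation \(\sigma _\beta (x) \le \sigma _\beta (\gamma (x))\) is read componentwise (as it is used in the proof of Lemma 26), and \(\gamma (x)_i = x_{\gamma ^{-1}(i)}\) as in the proof of Lemma 23. Indices are 0-based and the function names are hypothetical.

```python
def inverse(perm):
    """Inverse of a permutation given as a tuple of images."""
    inv = [0] * len(perm)
    for k, image in enumerate(perm):
        inv[image] = k
    return tuple(inv)


def keeps_sigma(perm, branched, lb, ub):
    """Check x_i <= x_{perm^{-1}(i)} for all x in the box [lb, ub].

    `branched` lists the indices whose variables appear in sigma_beta.
    For two distinct indices, the inequality holds on the whole box
    exactly if the upper bound of x_i is at most the lower bound of
    the variable it is compared with.
    """
    inv = inverse(perm)
    return all(inv[i] == i or ub[i] <= lb[inv[i]] for i in branched)


def filter_generators(generators, branched, lb, ub):
    """Keep the generators that pass the domain-based check."""
    return [g for g in generators if keeps_sigma(g, branched, lb, ub)]


# Node beta_<= of Example 25 with 0-based indices: x_0 in [0, 2].
INF = float("inf")
rotation = (1, 2, 3, 4, 0)
reflection = (0, 4, 3, 2, 1)
kept = filter_generators(
    [rotation, reflection], branched=[0], lb=[0] * 5, ub=[2] + [INF] * 4
)
```

For this node, only the reflection through the branching vertex passes the check, which matches the ad-hoc choice made in Example 25.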

We finish this section by showing that, in binary problems, VDRs yielded by OF from Sect. 2.2 are also yielded by the generalized setting of Lemmas 23 and 24.

Lemma 26

Denote \(\Delta ^{\beta }_{\textrm{bin}}\) as the group \(\Delta ^{\beta }\) defined in Sect. 2.2, and \(\Delta ^{\beta }_{\textrm{gen}}\) as the group with the same symbol defined here. If the symmetry group acts on binary variables exclusively, then \(\Delta ^\beta _{\textrm{bin}} \le \Delta ^\beta _{\textrm{gen}}\).

Proof

In binary problems, branching on variables fixes their values. As such, because vector \(\sigma _\beta (x)\) contains all branched variables, it is the same for all \({x \in \Phi (\beta )}\). Suppose \(\gamma \in \Delta ^\beta _{\textrm{bin}}\), i.e., \(\gamma \in \Gamma \) and \(\gamma (B_1^\beta ) = B_1^\beta \). For all \(i \in [m_\beta ]\), with \(\sigma _\beta (x)_i = 1\), we have \(\sigma _\beta (\gamma (x))_i = 1\) for all \(x \in \Phi (\beta )\). Similarly, for all \(i \in [m_\beta ]\) with \(\sigma _\beta (x)_i = 0\), we have \(\sigma _\beta (\gamma (x))_i \ge 0\). This means that for all \(x \in \Phi (\beta )\) we have \(\sigma _\beta (x) \le \sigma _\beta (\gamma (x))\). This holds in particular for all \(x \in \Phi (\beta ) \cap W^\beta \), i.e., \(\gamma \in \Delta ^\beta _{\textrm{gen}}\). \(\square \)

By the OF rule described in Sect. 2.2, for all variable indices i where \(x_i\) is branched to zero, all variables \(x_j\) with j in the orbit of i in \(\Delta ^\beta _{\textrm{bin}}\) can be fixed to zero, as well. Because \(\Delta ^\beta _{\textrm{bin}} \le \Delta ^\beta _{\textrm{gen}}\), every orbit of \(\Delta ^\beta _{\textrm{bin}}\) is contained in an orbit of \(\Delta ^\beta _{\textrm{gen}}\). As such, if \(x_i\) is the branching variable at the present node, this is implied by \(x_i \ge x_j\) in Lemma 23. Otherwise, this is implied by Lemma 24.

5 Computational study

To assess the effectiveness of our methods, we compare the running times of the implementations of the various dynamic symmetry handling methods of Sect. 4 (in the regime of Examples 11 and 12) to similar existing methods. To this end, we use the following testsets:

  • Symmetric benchmark instances from MIPLIB 2010 [36] and MIPLIB 2017 [37].

  • Existence of minimum \(t\text {-}(v,k,\lambda )\)-covering designs.

  • Noise dosage problem instances as discussed by Sherali and Smith [22] (cf. Problem 1).

  • Kissing number problem instances as discussed by Liberti [42].

The MIPLIB instances offer a diverse set of instances that contain symmetries, but these symmetries operate on binary variables predominantly. To evaluate the effectiveness of our framework for non-binary problems, we consider the covering design and noise dosage instances. The symmetries of the former are orbitopal, whereas the latter has no orbitopal symmetries. As opposed to the other testsets, the kissing number problems are non-linear optimization problems.

Although our framework can handle more general symmetries, we restrict the numerical experiments to permutation symmetries. On the one hand, SCIP can currently only detect permutation symmetries. On the other hand, most symmetry handling methods discussed in the literature only apply to permutation symmetries. The development of methods for other kinds of symmetries is beyond the scope of this article.

5.1 Solver components and configurations

We use a development version of the solver SCIP 8.0.3.5 [25, 43], commit 8443db2, with LP-solver Soplex 6.0.3. SCIP contains implementations of the state-of-the-art methods LexFix, orbitopal fixing, and OF to which we compare our methods. We have extended the code with our dynamic methods. Our modified code is available on GitHub. This repository also contains the instance generators and problem instances for the noise dosage and covering design problems.

For all settings, symmetries are detected by finding automorphisms of a suitable graph [5, 20] using bliss 0.77 [38]. We make use of the readily implemented symmetry detection code of SCIP, which finds a set of permutations \(\Pi \) that generate a symmetry group \(\Gamma \) of the problem, namely the symmetries implied by its formulation [5, 20, 21]. This is a permutation group acting on the solution vector index space, so the setting of Sects. 2 and 4 applies.

If \(\Gamma \) is a product group consisting of k components, i.e., \(\Gamma = \Gamma _1 \times \dots \times \Gamma _k\), then by using similar arguments as in [9, Proposition 5], the symmetries of the different components \(\Gamma _i\), \(i \in [k]\), can be handled independently; compositions of permutations from different components do not need to be taken into account. In particular, it is possible to select a different symmetry prehandling structure for the different components. We therefore decompose the set of permutations \(\Pi \) generating \(\Gamma \) that are found by SCIP into components, yielding generating sets \(\Pi _1, \dots , \Pi _k\) for components \(\Gamma _1, \dots , \Gamma _k\). Symmetry in each component is handled separately.
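One simple way to obtain such a decomposition is to group generators whose supports (the sets of indices they move) overlap. The following Python sketch illustrates this idea. Whether SCIP computes its components in exactly this way is not stated here, so the procedure and the names `support` and `decompose` should be read as assumptions.

```python
def support(perm):
    """Indices that are moved by the permutation."""
    return {k for k, image in enumerate(perm) if image != k}


def decompose(generators):
    """Group generators into components with pairwise disjoint supports."""
    components = []  # list of (support set, list of generators)
    for perm in generators:
        moved = support(perm)
        merged_support, merged_gens = set(moved), [perm]
        remaining = []
        for comp_support, comp_gens in components:
            if comp_support & moved:
                # The new generator links this component to the merged one.
                merged_support |= comp_support
                merged_gens = comp_gens + merged_gens
            else:
                remaining.append((comp_support, comp_gens))
        remaining.append((merged_support, merged_gens))
        components = remaining
    return [gens for _, gens in components]


# Hypothetical generators on six indices (transpositions).
a = (1, 0, 2, 3, 4, 5)  # swaps 0 and 1
b = (0, 1, 3, 2, 4, 5)  # swaps 2 and 3
c = (0, 2, 1, 3, 4, 5)  # swaps 1 and 2, linking a and b
d = (0, 1, 2, 3, 5, 4)  # swaps 4 and 5
parts = decompose([a, b, d, c])
```

Here, the generators a, b, and c end up in one component, and d forms a component of its own.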

For all settings, we disable restarts to ensure that all methods exploit the same symmetry information. We compare our newly implemented methods to the methods originally implemented in SCIP.

Our configurations

Given the symmetry group generator description, SCIP 8.0.3.5 [25] features a selection mechanism for which symmetry handling methods are activated. Inspired by the original code, we replace this mechanism by the following. For every component \(\Gamma _i\), we handle symmetries as follows, where we skip some steps if the corresponding symmetry handling method is disabled.

  1. If a SCIP-internal heuristic, cf. [9, Sect. 4.1], detects that \(\Gamma _i\) consists of orbitopal symmetries:

    (a) If the orbitope matrix is a single row \([x_1 \cdots x_\ell ]\), add linear constraints \({x_1 \ge \dots } {\ge x_\ell }\).

    (b) Otherwise, if the orbitope matrix contains only two columns (i.e., is generated by a single permutation \(\gamma \)), then use lexicographic reduction using the dynamic variable ordering of Example 11, as described in Sect. 4.1.

    (c) Otherwise, if there are at least 3 rows with binary variables whose sum is at most 1 (so-called packing-partitioning type) then use the complete static propagation method for packing-partitioning orbitopes as described by Kaibel and Pfetsch [11], where the orbitope matrix is restricted to the rows with this structure.

    (d) Otherwise, use dynamic orbitopal reduction as described in Sect. 4.2 using the dynamic variable ordering of Example 12. We select \(\varphi _\beta \) such that it swaps the column containing the branched variable to the middlemost (or leftmost) symmetrically equivalent column when propagating the SHCs (2) using static orbitopal reduction.

  2. Otherwise (i.e., if the symmetries are not orbitopal), use the symmetry prehandling structure of Example 11 and use two compatible methods simultaneously:

    (a) Lexicographic reduction as described in Sect. 4.1.

    (b) Orbital reduction as described in Sect. 4.3. Since computing \(\Delta ^\beta \) is non-trivial, we work with a subgroup of \(\Delta ^\beta \), namely the group generated by all permutations \(\gamma \in \Pi _i\) for which \(\sigma _\beta (x) \le \sigma _\beta (\gamma (x))\) for all \(x \in \prod _{i \in [n]} {\mathcal {D}}_i^\beta \).

We also compare settings where orbitopal reduction, orbital reduction and lexicographic reduction are turned off. If orbitopal symmetries are not handled, we always resort to the second setting, where lexicographic reduction (if enabled) and/or orbital reduction (if enabled) are applied.

For orbitopal symmetries, we have chosen to handle certain common cases before resorting to the dynamic orbitopal reduction code that we devised. First, for single-row orbitopes, the symmetry is completely handled by the linear constraints. Since linear constraints are strong and work well with other components of the solver, we decided to handle those this way. Second, if an orbitope only has two columns, the underlying symmetry group is generated by a single permutation of order 2. In that case, the symmetry is completely handled by lexicographic reduction. Third, it is well known that exploiting problem-specific information can greatly assist symmetry handling. If a packing-partitioning structure can be detected, we therefore apply specialized methods as discussed above. Otherwise, we use orbitopal reduction as discussed in Sect. 4.2. In Step 1(d), the choice of \(\varphi _\beta \) is inspired by the discussion in Sect. 3.2 (“Interpretation”). By moving the branching variable to the middlemost possible column, balanced subproblems are created, whereas the leftmost possible column might lead to more reductions in one child than in the other. Below, we will investigate which technique is more favorable.

For the components that do not consist of orbitopal symmetries, we decided to settle on the setting of Example 11 and use both compatible methods. Since the SCIP version that we compare to either uses orbital fixing or (static) lexicographic fixing for such components, our setting allows us to assess the impact of adapting lexicographic fixing to make it compatible with orbital fixing.

Comparison base

We compare to similar, readily implemented methods in SCIP. In SCIP jargon, these methods are called polyhedral and orbital fixing and can be enabled/disabled independently. The polyhedral methods consist of LexFix for the SHCs \(x \succeq \gamma (x)\) for \(\gamma \in \Pi _i\) and methods to handle orbitopal symmetries; orbital fixing uses OF from [5]. Note that only symmetries of binary variables are handled.

If a component consists of orbitopal symmetries, the polyhedral methods handle these symmetries by carrying out the check of Step 1(c). In case it evaluates positively, methods exploiting packing-partitioning structures are applied as described above. Otherwise, orbitopal symmetries of binary variables are handled by a variant of row-dynamic orbitopal fixing. The remaining components are either handled by static LexFix or orbital fixing, depending on whether the polyhedral methods or orbital fixing is enabled. Moreover, if polyhedral methods are disabled, orbital fixing is also applied to components consisting of orbitopal symmetries.

5.2 Results

All experiments with integer variables have been run in parallel on the Dutch National supercomputer Snellius “thin” consisting of compute nodes with dual AMD Rome 7H12 processors providing a total of 128 physical CPU cores, and \({256}\,{\textrm{GB}}\) memory. Each process has an allocation of 4 physical CPU cores and \({8}\,{\textrm{GB}}\) of memory. Note that due to the architecture of the supercomputer, jobs cannot be run exclusively, i.e., the performance of one job impacts the performance of others. In the results below we report the running times (column time) and time spent on symmetry handling (column sym) in shifted geometric mean \(\prod _{i = 1}^n (t_i + 1)^{\frac{1}{n}} - 1\) to reduce the impact of outliers. We also report the number of instances solved within the time limit of \({1}\,\textrm{h}\) (column #S). If the time limit is reached, the solving time of that instance is reported as \({1}\,\textrm{h}\). None of the instances failed or exceeded the memory limit. We report the aggregated results for all instances, for all instances for which at least one setting solved the instance within the time limit, and for all instances solved by all settings. For each of these classes, we provide their size below.
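The shifted geometric mean used in the tables can be computed as in the following Python sketch. The function name and the sample running times are hypothetical, and the computation via logarithms is merely one numerically convenient way to evaluate the product.

```python
import math


def shifted_geometric_mean(times, shift=1.0):
    """Return prod_i (t_i + shift)^(1/n) - shift, computed via logarithms."""
    n = len(times)
    log_sum = sum(math.log(t + shift) for t in times)
    return math.exp(log_sum / n) - shift


TIME_LIMIT = 3600.0  # unsolved instances are counted with the time limit

# Hypothetical running times in seconds; the last instance hit the limit.
times = [0.5, 12.0, 250.0, TIME_LIMIT]
sgm = shifted_geometric_mean(times)
arithmetic_mean = sum(times) / len(times)
```

In this toy sample, the single run that reaches the time limit dominates the arithmetic mean but has a much smaller effect on the shifted geometric mean.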

We use abbreviations for the settings. We compare no symmetry handling (Nosym), traditional polyhedral methods (Polyh), traditional orbital fixing (OF), dynamic orbitopal reduction (OtopRed), dynamic lexicographic reduction (LexRed), orbital reduction (OR), and combinations thereof. Note that setting Nosym also reports a small symmetry time because, due to SCIP’s architecture, handling the corresponding plug-in requires some time even if it is not used. Moreover, if symmetries are handled by linear constraints in the model (cf. Step 1(a)), these are not reported in the symmetry handling figures. Recall from Step 1(d) that we consider two variants of selecting \(\varphi _\beta \) for dynamic orbitopal fixing. We refer to these variants as first and median for the leftmost and middlemost column, respectively. As the noise dosage testset exclusively consists of orbitopal symmetries, we test both parameterizations there. For the remaining testsets, we restrict ourselves to median as it performs better on average for the noise dosage instances.

Since the covering design and noise dosage testsets are relatively small and contain many easy instances, performance variability is minimized by repeating each configuration-instance pair three times with a different global random seed shift. Due to the large number of instances, settings and large time requirements, only one seed is used for the MIPLIB instances.

5.2.1 MIPLIB

To compose our testset, we presolved all instances from the MIPLIB 2010 [36] and MIPLIB 2017 [37] benchmark testsets and selected those for which SCIP could detect symmetries. This results in 129 instances. We excluded instance mspp16 as it exceeds the memory limit during presolving.

The goal of our experiments for MIPLIB instances is twofold. On the one hand, we investigate whether our framework allows us to solve symmetric mixed-integer programs faster than the state-of-the-art methods as implemented in SCIP. On the other hand, we are interested in the effect of adapting different symmetry handling methods for \(\sigma _\beta (x) \succeq \sigma _\beta (\gamma (x))\) in comparison with their static counterparts. Table 1 shows the aggregate results of our experiments.

Table 1 Results for MIPLIB 2010 and MIPLIB 2017

Regarding the first question, we observe that using any type of symmetry handling is vastly superior to the setting where no symmetry is handled. Considering all instances, the best of the traditional settings reports an average running time improvement of \({24.9}\,\%\) over Nosym, and the best of the dynamified methods reports \({28.8}\,\%\). Our framework thus improves on SCIP’s state-of-the-art by \({5.2}\,\%\). On the instances that can be solved by at least one setting, this effect is even more pronounced: our framework improves on SCIP’s best setting by \({8.6}\,\%\). We believe that this is a substantial improvement, because the MIPLIB instances are rather diverse. In particular, some of these instances contain only very few symmetries.
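The \({5.2}\,\%\) figure is consistent with relating the two improvements over Nosym to each other. As a sketch, write \(t_{\textrm{Nosym}}\), \(t_{\textrm{trad}}\), and \(t_{\textrm{dyn}}\) for the mean running times of Nosym, the best traditional setting, and the best dynamified setting on all instances (these symbols are introduced here for illustration only). Using the rounded percentages stated above, we obtain

$$\begin{aligned} 1 - \frac{t_{\textrm{dyn}}}{t_{\textrm{trad}}} = 1 - \frac{(1 - 0.288)\, t_{\textrm{Nosym}}}{(1 - 0.249)\, t_{\textrm{Nosym}}} = 1 - \frac{0.712}{0.751} \approx 0.052. \end{aligned}$$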

Regarding the second question, we compare the SCIP settings Polyh, OF, and Polyh + OF with their counterparts in our framework, namely OtopRed + LexRed, OR, and OR + OtopRed, respectively. The running time of the pure polyhedral setting Polyh can be improved in our framework by \({7.5}\,\%\) when considering all instances, and by \({12.4}\,\%\) when considering only the instances solved by some setting within the time limit. Consequently, adapting symmetry handling to the branching order via the symmetry prehandling structure of Examples 11 and 12 yields substantial performance improvements. Our explanation for this behavior is that symmetry reductions can be found much earlier in branch-and-bound than in the static setting (cf. Fig. 2, where no reductions can be found at depth 1 if no adaptation is used). Thus, symmetric parts of the branch-and-bound tree can be pruned earlier.

Comparing OF and OR, we observe that OR is slightly faster than OF. Both methods, however, are much slower than Polyh and OtopRed + LexRed. A possible explanation is that the latter methods make use of orbitopal fixing, which can handle entire symmetry groups, whereas OF and OR only find some reductions based on orbits.

Among SCIP’s methods, Polyh + OF performs best. Its counterpart OR + OtopRed in our framework performs comparably on all instances and on the solvable instances; however, three fewer instances are solved. A possible explanation for the comparable running time is that traditional orbital fixing and the variant of row-dynamic orbitopal fixing are already dynamic methods. Thus, a comparable running time can be expected (although this does not explain the difference in the number of solved instances).

Lastly, we discuss settings which are not possible in the traditional setting, i.e., combining LexRed and orbital reduction. Enhancing orbital reduction by LexRed indeed leads to an improvement of \({2.4}\,\%\) and allows us to solve two more instances. The best setting in our framework, however, does not enable all methods in our framework. Indeed, the running time of OR + OtopRed + LexRed can be improved by \({4.9}\,\%\) when disabling orbital reduction. We explain this phenomenon with the fact that orbitopal reduction already handles a lot of group structure. Combining LexRed and orbital reduction on the remaining components only finds a few more reductions. The time needed for finding these reductions is then not compensated by the symmetry reduction effect. If no group structure is handled via orbitopal reduction, LexRed can indeed be enhanced by orbital reduction.

Recall that symmetries in MIPLIB instances act predominantly on binary variables. As opposed to the traditional settings that we compare to, the generalized setting can handle symmetries on non-binary variable domains. Thus, potentially more reductions can follow from larger symmetry groups that include non-binary variables. As shown in Appendix B, a larger group is detected in only 10 out of the 128 instances, and only 2 of these can be solved by some setting. When considering the subset of instances where symmetry is handled based on the same group, we report similar results as before. This shows that even if we only consider problems where symmetries act on the binary variables, our generalized methods outperform similar state-of-the-art methods.

5.2.2 Minimum \(\varvec{t\text {-}(v, k, \lambda )}\)-covering designs

Since the symmetries of MIPLIB instances predominantly act on binary variables, we now turn to symmetric problems without binary variables to assess our framework in this regime. Let \(v \ge k \ge t \ge 0\) and \(\lambda > 0\) be integers. Let V be a set of cardinality v, and let \({\mathcal {K}}\) (resp. \({\mathcal {T}}\)) be the collection of all subsets of V having size k (resp. t). A \(t\text {-}(v, k, \lambda )\)-covering design is a multiset \({\mathcal {C}}\) of elements of \({\mathcal {K}}\) such that all sets \(T \in {\mathcal {T}}\) are contained in at least \(\lambda \) sets of \({\mathcal {C}}\), counting with multiplicity. A covering design is minimum if no smaller covering design with these parameters exists, and finding these is of interest, e.g., in [39,40,41]. Margot [39] gives an ILP-formulation, having decision variables \(\nu \in \{0, \dots , \lambda \}^{{\mathcal {K}}}\) specifying the multiplicity of the sets \(K \in {\mathcal {K}}\) for the minimum \(t\text {-}(v, k, \lambda )\)-covering design sought after. The problem is to

$$\begin{aligned}&\text {minimize}\ \sum _{K \in {\mathcal {K}}} \nu _{K}, \end{aligned}$$
(CD1)
$$\begin{aligned}&\text {subject to}\ \sum _{K \in {\mathcal {K}} : T \subseteq K} \nu _K \ge \lambda \quad \text {for all}\ T \in {\mathcal {T}}, \end{aligned}$$
(CD2)
$$\begin{aligned}&\nu \, \in \{ 0, \dots , \lambda \}^{{\mathcal {K}}}. \end{aligned}$$
(CD3)
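To make the constraints concrete, the following Python sketch enumerates the index sets of (CD2) and checks whether a given multiplicity vector satisfies (CD2) and (CD3). It is an illustration only: the ground set is taken to be \(\{0, \dots , v-1\}\), the multiplicities are stored in a dictionary, and the function names are hypothetical.

```python
from itertools import combinations


def covering_constraints(v, k, t):
    """For each t-subset T, list the k-subsets K that contain T."""
    blocks = list(combinations(range(v), k))
    rows = {}
    for T in combinations(range(v), t):
        T_set = set(T)
        rows[T] = [K for K in blocks if T_set <= set(K)]
    return blocks, rows


def is_covering_design(nu, v, k, t, lam):
    """Check (CD2) and (CD3) for multiplicities nu: block -> count."""
    blocks, rows = covering_constraints(v, k, t)
    # (CD3): every multiplicity lies in {0, ..., lam}.
    if any(not (0 <= nu.get(K, 0) <= lam) for K in blocks):
        return False
    # (CD2): every t-subset is covered at least lam times.
    return all(sum(nu.get(K, 0) for K in Ks) >= lam for Ks in rows.values())


# Toy instance: 2-(4, 3, 2). Every pair lies in exactly two triples.
blocks, rows = covering_constraints(4, 3, 2)
all_once = {K: 1 for K in blocks}
three_only = {K: 1 for K in blocks[:3]}
```

In this toy instance, taking each of the four triples once covers every pair twice, whereas dropping one triple leaves some pair covered only once.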

Symmetries in this problem re-label the elements of V, and these are also detected by the symmetry detection routine of SCIP. Note that, although the underlying group is symmetric, these symmetries in terms of the variables \(\nu \) are not orbitopal.

Margot [39] considers an instance with \(\lambda = 1\), which is a binary problem. We consider all non-binary instances with parameters \(\lambda \in \{ 2, 3 \}\) and \(12 \ge v \ge k \ge t \ge 1\), and restrict ourselves to instances that were solved within \({7200}\,\textrm{s}\) in preliminary runs and that require at least \({10}\,\textrm{s}\) for some setting. This way, 58 instances remain. With 3 seeds per instance, we end up with 174 results. The aggregated results are shown in Table 2. Because the instances are non-binary, none of the considered traditional methods can be applied to handle the symmetries. As such, we can only compare to no symmetry handling. The best of our settings reports an improvement of \({77.8}\,\%\) over no symmetry handling.

Table 2 Results for finding minimum \(t\text {-}(v,k,\lambda )\) covering designs

If orbital reduction and LexRed are not combined, orbital reduction is the more competitive method as it improves upon LexRed by \({47.2}\,\%\). Although LexRed is much faster than Nosym, this comparison shows that more symmetries can be handled when the group structure is exploited via orbits. Nevertheless, our framework allows us to further improve orbital reduction by another \({6.6}\,\%\) if it is combined with LexRed. That is, orbital reduction is not able to capture the entire group structure and missing information can be handled by LexRed efficiently via our framework.

5.2.3 Noise dosage

To assess the effectiveness of orbitopal reduction isolated from other symmetry handling methods, we consider the noise dosage (ND) problem [22], which has orbitopal symmetries as explained in Problem 1. However, we replace the binary constraint (NDB5) by \({\vartheta \in \mathbb {Z}_{\ge 0}^{p \times q}}\) to be able to evaluate the effect of orbitopal reduction on a non-binary problem. In particular, we are interested in whether the choice between the first and median variants matters for the parameter \(\varphi _\beta \).

For each of the parameters \((p, q) \in \{ (3, 8), (4, 9), (5, 10) \}\), Sherali and Smith have generated four instances [22]. We thank J. Cole Smith for providing us with these instances. It turns out that these instances are dated and very easy to solve even without symmetry handling methods. As such, we have extended the testset. For each parameterization \((p, q) \in \{ (6, 11), (7, 12), \dots , (11, 16) \}\), we generate five instances. The details of our generator are in Appendix C.

Symmetries in the noise dosage problem can be handled by adding linear inequalities \({\sum _{i=1}^p M^{p-i} \vartheta _{i,j} \ge \sum _{i=1}^p M^{p-i} \vartheta _{i,j+1}}\) for \(j \in \{1, \dots , q-1\}\), where M is an upper bound on the maximal number of tasks that one worker can perform on a machine, as described by Sherali and Smith [22]. This is similar to fundamental domain inequalities [7] and the symmetry handling constraints of Ostrowski [16]. Although these can be used to handle symmetries, it is folklore that constraints whose coefficients differ widely in magnitude can lead to numerical instabilities. These constraints work well for instances with a small number of machines p, such as in the original instances of Sherali and Smith. However, for our instance noise11_16_480_s2, such a constraint has minimal absolute coefficient 1 and maximal absolute coefficient \(11^{10}\). Various warnings in the log files confirm the presence of numerical instabilities.

In fact, observably incorrect results follow. For instance, when adding these linear constraints, instance noise11_16_480_s1 is declared infeasible during presolving, whereas it is a feasible instance as illustrated by the no symmetry handling and dynamic orbitopal fixing runs. Moreover, for instance noise9_14_480_s3, no infeasibility is detected, but a wrong optimal solution is reported. Thus, there is a need to replace these inequalities with numerically more stable methods.
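The magnitude of the coefficients in the symmetry handling inequalities can be made explicit with a short computation. The following Python sketch is an illustration only. It assumes \(p = 11\) and \(M = 11\), which is consistent with the largest coefficient \(11^{10}\) reported above, and it uses a perturbation of \(10^{-6}\) as an assumed order of magnitude for a feasibility tolerance. The function name is hypothetical.

```python
def symmetry_inequality_coefficients(p, M):
    """Coefficients M^(p-i), i = 1, ..., p, of one side of the inequality."""
    return [M ** (p - i) for i in range(1, p + 1)]


# Assumed parameters, consistent with the largest coefficient 11^10.
coeffs = symmetry_inequality_coefficients(p=11, M=11)
largest, smallest = max(coeffs), min(coeffs)

# In double precision, a change of 1e-6 (assumed tolerance) in a term of
# magnitude 11^10 is below the spacing of representable numbers there.
absorbed = float(largest) + 1e-6 == float(largest)
```

With these parameters, the coefficients span more than ten orders of magnitude, and a perturbation at the assumed tolerance level is lost when added to a term of the size of the largest coefficient.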

Table 3 Results for finding optimal solutions to the noise dosage problems

We have removed these two instances from our testset as they evidently yield wrong results. The aggregated results for the remaining instances are presented in Table 3. We observe that the symmetry handling inequalities perform \({22.0}\,\%\) better than orbitopal reduction. However, the presented numbers for the inequality-based approach need to be interpreted carefully as reporting a correct objective value does not necessarily mean that the branch-and-bound algorithm worked correctly. For instance, nodes might have been pruned because of numerical inaccuracies although a numerically correct algorithm would not have pruned them. These issues do not occur for the propagation algorithm orbitopal reduction as it just reduces variable domains.

Comparing the two parameterizations of orbitopal fixing, we see that the median variant performs \({4.7}\,\%\) better than the first variant. This shows that the choice of the column reordering by symmetry \(\varphi _\beta \) has a measurable impact on the running time of dynamic orbitopal reduction. A possible explanation for the median rule performing better than the first rule is that median creates more balanced branch-and-bound trees, cf. Sect. 3.2. Consequently, the right choice of \(\varphi _\beta \) might significantly change the performance of orbitopal reduction. We leave it as future research to find a good rule for the selection of \(\varphi _\beta \), as this rule might depend on the structure of the underlying problem, e.g., the size of the variable domains, the number of rows, and the number of columns.

5.2.4 Kissing number

The previous instances feature problems with predominantly integer variables. To study the effect of handling symmetries on continuous variables, we consider the kissing number problem. The decision variant asks whether it is possible to arrange n non-overlapping d-dimensional unit spheres such that each of them touches the unit sphere centered at the origin in exactly one point. Hence, the centerpoints of the n spheres are two unit distances away from the origin. To ensure that the n spheres do not overlap, the pairwise distance of their centerpoints must also be at least 2.

Like Liberti [42], we consider an optimization version of the kissing number problem in which the minimal distance between the n spheres is maximized. The optimization problem for a configuration \((n, d)\) can be used to answer the decision version. Indeed, \((n, d)\) is a yes-instance of the kissing number problem if and only if the distance in the optimization version is at least 2. We use the nonconvex non-linear programming formulations by Liberti [42], which maximize the minimal distance between the centerpoints. The standard formulation is to

$$\begin{aligned} \text {maximize}\ {}&\alpha \end{aligned}$$
(7a)
$$\begin{aligned} \text {subject to}\,&\Vert x^i \Vert ^2 = 4&\text {for all}\ i \in [n], \end{aligned}$$
(7b)
$$\begin{aligned}&\Vert x^i - x^j \Vert ^2 \ge 4 \alpha&\text {for all}\ i, j \in [n], i \ne j, \end{aligned}$$
(7c)
$$\begin{aligned}&x^i \in [-2, 2]^d&\text {for all}\ i \in [n], \end{aligned}$$
(7d)
$$\begin{aligned}&\alpha \in [0, 1]. \end{aligned}$$
(7e)

Liberti also presents a reformulation in which (7b) is used to substitute the square terms in \(\Vert x^i - x^j \Vert ^2 = \sum _{k \in [d]} (x_k^i - x_k^j)^2\), yielding \(8 - 2 \sum _{k \in [d]} x_k^i x_k^j\). Moreover, in the standard (bounded) formulation, the bounds \(0 \le \alpha \le 1\) can be imposed. This means that the optimization is finished as soon as a solution is found in which the outer spheres do not overlap. We also consider the unbounded formulation, which enforces only \(\alpha \ge 0\). In this formulation, the minimal distance between the centerpoints of the spheres is maximized.
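As an illustration of the two formulations, the following Python sketch evaluates the largest feasible \(\alpha \) for a given set of centerpoints, once via the squared distances of (7c) and once via the reformulated expression. It is a sketch under stated assumptions and not the authors' implementation: the function names are hypothetical, and the regular polygon for \(d = 2\) is merely a convenient feasible configuration used for testing.

```python
import math
from itertools import combinations


def regular_polygon(n: int, radius: float = 2.0) -> list[tuple[float, float]]:
    """n points spread evenly on the circle of the given radius (case d = 2)."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]


def alpha_standard(points) -> float:
    """Largest alpha with ||x^i - x^j||^2 >= 4 * alpha for all pairs, cf. (7c)."""
    return min(sum((a - b) ** 2 for a, b in zip(x, y))
               for x, y in combinations(points, 2)) / 4.0


def alpha_reformulated(points) -> float:
    """Same value via 8 - 2 <x^i, x^j>; only valid if ||x^i||^2 = 4 for all i."""
    return min(8.0 - 2.0 * sum(a * b for a, b in zip(x, y))
               for x, y in combinations(points, 2)) / 4.0


hexagon = regular_polygon(6)
print(alpha_standard(hexagon), alpha_reformulated(hexagon))  # both close to 1
```

For six points in the plane, both expressions evaluate to approximately 1, so the hexagon certifies a yes-instance. For two antipodal points, the value is 4, which is only attainable in the unbounded formulation.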

The problem has many symmetries: any rotation or reflection that fixes the origin, as well as the orbitopal symmetries that permute the vectors \(x^i\). We show the effect of handling those orbitopal sub-symmetries only and compare: no symmetry handling (Nosym); static orbitopal fixing (OtopRed (static)), where the columns \(x^i\) are ordered lexicographically non-increasingly for increasing i; dynamic orbitopal fixing (OtopRed (dyn.)); the SST cuts (SST, cf. [14, 27]) \(x_1^1 \ge x_1^2 \ge \dots \ge x_1^n\), which enforce that the first coordinates of the vectors \(x^i\) are non-increasing for increasing i; and the combination of static orbitopal reduction and SST cuts. Static orbitopal reduction and SST cuts are compatible symmetry handling methods, as both handle the lexicographic ordering constraint of Example 10 with the same variable ordering.
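The relation between the lexicographic column ordering and the SST cuts can be sketched in a few lines of Python. This is an illustrative assumption-laden example with hypothetical function names, not the propagation algorithm itself: it only shows that permuting the vectors \(x^i\) into lexicographically non-increasing order yields a symmetric solution that also satisfies the SST cuts on the first coordinates.

```python
def sort_columns_lex_nonincreasing(points):
    """Permute the vectors x^i so that they are lexicographically non-increasing in i."""
    # Python compares tuples lexicographically, so reverse sorting suffices.
    return sorted(points, reverse=True)


def satisfies_sst_cuts(points) -> bool:
    """Check x_1^1 >= x_1^2 >= ... >= x_1^n (first coordinates only)."""
    firsts = [x[0] for x in points]
    return all(a >= b for a, b in zip(firsts, firsts[1:]))


# Three centerpoints on the circle of radius 2 (d = 2), in arbitrary order.
points = [(0.0, 2.0), (2.0, 0.0), (0.0, -2.0)]
ordered = sort_columns_lex_nonincreasing(points)
print(ordered)                      # [(2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
print(satisfies_sst_cuts(ordered))  # True
```

Since the reordering only permutes the vectors, the set of centerpoints and hence all pairwise distances remain unchanged.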

We use the non-linear programming interface of SCIP with Ipopt 3.12, extended with our symmetry handling methods. We consider the instances with parameters \(n \in \{2, \dots , 15 \}\) and \(d = 2\). The experiments have been run on a Linux cluster with Intel Xeon E5 \({3.5}\textrm{GHz}\) quad core processors and \({32}\,\textrm{GB}\) memory. To minimize performance variability, the instances run exclusively on these machines. Since the instances are hard to solve, we utilize a time limit of \({8}\,\textrm{h}\) per instance.

Table 4 Per-instance results for the reformulated unbounded kissing number instances

Table 4 presents the computational results for the unbounded and reformulated instances. The figures for the other instance sets are similar and are tabulated in Appendix D. Without symmetry handling, no instance with \(n \ge 7\) can be solved, whereas instances up to \(n = 14\) are solved with symmetry handling. Comparing static and dynamic orbitopal reduction, the static version performs better than the dynamic variant for small values of n. For \(n \ge 11\), however, the dynamic version is faster and can solve one more instance. Handling symmetries via SST cuts is, in general, the best-performing method. One possible explanation is that explicitly adding symmetry handling inequalities to the problem formulation allows the relaxation of the non-linear kissing number problem to be strengthened, cf. [44]. Combining static orbitopal reduction and SST cuts improves the running time; in particular, this combination is faster than dynamic orbitopal reduction. Except for \(n = 12\), however, the combined setting is not faster than exclusively using SST cuts to handle symmetries. For future research, it would be interesting to see whether one can gain synergies between SST cuts and static orbitopal reduction. In principle, one can expect that such synergies exist since SST cuts are implied by the symmetry handling constraints that static orbitopal reduction also uses. However, this requires a more in-depth study of the symmetries of the kissing number problem and how they interact with the solution techniques for non-linear problems, which is beyond the scope of this article.

6 Conclusions and future research

Symmetry handling is an important component of modern solver technology. One of the main issues, however, is the selection and combination of different symmetry handling methods. Since the latter is non-trivial, we have proposed a flexible framework that makes it easy to check whether different methods are compatible and that applies to arbitrary variable domains. Numerical results show that our framework is substantially faster than symmetry handling in the state-of-the-art solver SCIP. In particular, we benefit from combining different symmetry handling methods, which is possible in our framework, but only in a limited way in SCIP. Moreover, due to our generalization of symmetry handling algorithms for binary problems to general variable domains, our framework allows us to reliably handle symmetries in different non-binary applications.

Due to the flexibility of our framework, it is not only applicable to the methods discussed in this article, but also allows methods that will be developed in the future to be applied (provided they are compatible with SHCs (2)). This opens, among others, the following directions for future research.

In this article, we experimentally evaluated our framework only for permutation symmetries. As the framework also supports other types of symmetries such as rotational and reflection symmetries, further research could involve devising symmetry handling methods for such symmetries. Moreover, to handle permutation symmetries, we only used propagation techniques. In the future, these methods can be complemented in two ways. On the one hand, other techniques such as separation routines can be applied to handle SHCs (2). On the other hand, our symmetry handling methods have not exploited additional problem structure such as packing-partitioning structures in orbitopal fixing. Further research can focus on incorporating problem constraints when handling the symmetry handling constraint of Theorem 8. This includes a dynamification of packing-partitioning orbitopes, as well as a way to handle overlapping orbitopal subgroups within a component.

Last, in the computational results we describe decision rules for enabling/disabling certain symmetry handling methods. If new symmetry handling methods are cast into our framework, however, these rules need to be updated. Future research could thus encompass the derivation of good rules for how to handle symmetries in our framework.