1 Introduction

Mixed-integer nonlinear programming (MINLP) concerns the optimization of an objective function subject to a finite set of linear or nonlinear constraints and integrality conditions. The generality of this problem class means that many real-world applications can be modeled as MINLP [1,2,3,4], but also that software that handles this class efficiently becomes extremely complex. MINLP solvers [5] are often built on top of, or by combining, solvers for mixed-integer linear programs (MIP) and solvers that find locally optimal solutions for nonlinear programs (NLP). In fact, the first general purpose solver, DICOPT [6], decomposes the solution of an MINLP into a sequence of MIP and NLP solves [7], thereby building on established software for these program classes. DICOPT solves MINLPs with convex nonlinear constraints to optimality, but works only as a heuristic on nonconvex MINLPs. The first general purpose solvers for nonconvex MINLPs were \(\alpha \)BB, BARON, and GLOP [8,9,10], all based on convexification techniques for nonconvex constraints. The solver SCIP (Solving Constraint Integer Programs) also belongs to this category [11].

In the following, MINLPs of the form

$$\begin{aligned} \min \quad&c^{\top }x, \\ \mathrm {such\ that} \quad&\underline{g} \le g(x) \le \overline{g}, \\&\underline{b} \le Ax \le \overline{b}, \\&\underline{x} \le x \le \overline{x}, \\&x_{\mathcal {I}} \in \mathbb {Z}^{\vert \mathcal {I}\vert }, \end{aligned}$$
(MINLP)

are considered, where \(\underline{x}\), \(\overline{x} \in \overline{{\mathbb {R}}}^{n}\), \(\overline{{\mathbb {R}}}:= {\mathbb {R}}\cup \{\pm \infty \}\), \(\underline{x}\le \overline{x}\), \(\mathcal {I} \subseteq \{1, \ldots , n\}\), \(c \in {\mathbb {R}}^n\), \(\underline{g}\), \(\overline{g}\in \overline{{\mathbb {R}}}^m\), \(\underline{g}\le \overline{g}\), \(g: {\mathbb {R}}^{n} \rightarrow \overline{{\mathbb {R}}}^m\) is specified explicitly in algebraic form, \(\underline{b},\overline{b}\in \overline{{\mathbb {R}}}^{{\tilde{m}}}\), \(\underline{b}\le \overline{b}\), and \(A\in \mathbb {R}^{{\tilde{m}}\times n}\). The restriction to a linear objective function is a technical detail of SCIP and without loss of generality.

SCIP is a branch-cut-and-price framework for the solution of different types of optimization problems, most generally constraint integer programs (CIPs), and most importantly MIPs and MINLPs. CIPs are finite-dimensional optimization problems with arbitrary constraints and a linear objective function that satisfy the following property: if all integer variables are fixed, the remaining subproblem is a linear or nonlinear program. The problem class of CIP was motivated by the modeling flexibility of constraint programming and the algorithmic requirements of integrating it with efficient solution techniques for MIP [12].

In order to solve CIPs, SCIP constructs relaxations—typically linear programs (LPs). If the relaxation solution is not feasible for the current subproblem, the plugins that handle the violated constraints need to take measures to eventually render the relaxation solution infeasible for the updated relaxation, for example by branching or separation [12]. A plethora of additional plugin types, e.g., for presolving, finding feasible solutions, or tightening variable bounds, allow accelerating the solution process. After 20 years of development of the framework itself and included plugins, SCIP includes mature solvers for MIP, MINLP, and several other problem classes [13]. The extended version of this paper [14] provides a short overview of the history of the MINLP solver in SCIP. Since November 2022, SCIP is freely available under an open-source license.

SCIP solves MINLPs to global optimality via a spatial branch-and-bound algorithm that mixes branch-and-infer and branch-and-cut [15]. Important parts of the solution algorithm are presolving, domain propagation (that is, tightening of variable bounds), linear relaxation, and branching. A distinguishing feature of SCIP is that its capabilities to handle nonlinear constraints are not limited to MINLPs, but can be used for any CIP. For example, problems can be handled where linear and nonlinear constraints are mixed with typical constraints from constraint programming, as long as appropriate constraint handlers have been included in SCIP. Since most constraint handlers in SCIP construct a linear relaxation of their constraints, the handling of nonlinear constraints likewise focuses on linear relaxations. The emphasis on handling CIPs with nonlinear constraints rather than MINLP only is also a reason that the use of nonlinear relaxations or reformulations of complete MINLPs into other problem types, e.g., mixed-integer conic programs, has not been explored much so far.

With SCIP 8 [16], a complete overhaul of nonlinear constraint handling was released. The primary motivation for this change was to increase the reliability of the solver and to alleviate numerical issues that arose from problem reformulations and led to SCIP returning solutions that are feasible in the reformulated problem, but infeasible in the original problem. More precisely, previous SCIP versions built an extended formulation of (MINLP) explicitly, with the consequence that the original constraints were no longer included in the presolved problem. Even though the formulations were theoretically equivalent, it was possible that \(\varepsilon \)-feasible solutions for the reformulated problem were not \(\varepsilon \)-feasible in the original problem. SCIP 8 remedies this by building an implicit extended formulation as an annotation to the original problem. A second motivation for the major changes in SCIP 8 was to reduce the ambiguity of expression and nonlinear structure types by implementing different plugin types for low-level structure types that define expressions, and high-level structure types that add functionality for particular, sometimes overlapping structures. Finally, new features for improving the solver’s performance on MINLPs were introduced. These include intersection, SDP (semi-definite programming), and RLT (reformulation linearization technique) cuts for quadratic expressions [17, 18], perspective strengthening [19], and symmetry detection [20].

An overview of SCIP’s MINLP solving capabilities is given next. Afterwards, the performance of SCIP and other global MINLP solvers is compared.

2 MINLP capabilities of SCIP

In the following, the integration of nonlinear constraints into the branch-and-cut solver of SCIP is discussed. Next, the concept of a nonlinear handler is introduced, which is a new plugin type of SCIP 8 that facilitates the integration of extensions that handle specific nonlinear structures. The remainder of this section gives a concise overview of features that increase the efficiency of MINLP solving. Unless specified otherwise, more details can be found in [16].

2.1 Framework

2.1.1 Expressions

Algebraic expressions are well-formed combinations of constants, variables, and algebraic operations such as addition, multiplication, and exponentiation, that are used to describe mathematical functions. They are represented by a directed acyclic graph with nodes representing variables, constants, and operations and arcs indicating the flow of computation. In SCIP, all semantics of expression operands are defined by expression handler plugins. These handlers provide callbacks that are used by the SCIP core to manage expressions (create, modify, copy, parse, print), to evaluate at a point or over intervals, to compute derivatives, to simplify and compare, and to check curvature and integrality.
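The expression graph described above can be sketched as follows. This is a minimal, hypothetical representation (the class and operator names are illustrative, not SCIP's API); note how the shared subexpression \(\log (x)\) is stored as a single node and evaluated only once:

```python
import math

# Minimal expression-DAG sketch (hypothetical classes, not SCIP's API):
# each node stores an operator and child nodes; shared subexpressions
# are represented by a single object, so the graph is a DAG, not a tree.
class Expr:
    def __init__(self, op, children=(), value=None):
        self.op, self.children, self.value = op, tuple(children), value

    def eval(self, assignment, cache=None):
        cache = {} if cache is None else cache
        if id(self) in cache:            # shared nodes are evaluated once
            return cache[id(self)]
        if self.op == "var":
            r = assignment[self.value]
        elif self.op == "const":
            r = self.value
        elif self.op == "sum":
            r = sum(c.eval(assignment, cache) for c in self.children)
        elif self.op == "prod":
            r = math.prod(c.eval(assignment, cache) for c in self.children)
        elif self.op == "pow":
            r = self.children[0].eval(assignment, cache) ** self.value
        elif self.op == "log":
            r = math.log(self.children[0].eval(assignment, cache))
        cache[id(self)] = r
        return r

x, y = Expr("var", value="x"), Expr("var", value="y")
lx = Expr("log", [x])                    # log(x), shared by two terms
expr = Expr("sum", [Expr("pow", [lx], value=2),
                    Expr("prod", [Expr("const", value=2.0), lx, y]),
                    Expr("pow", [y], value=2)])
val = expr.eval({"x": math.e, "y": 3.0})   # log(e)^2 + 2*log(e)*3 + 3^2 = 16
```

In SCIP, the per-operator logic sketched in the `if`/`elif` chain is instead distributed over the expression handler plugins, each providing the corresponding evaluation callback.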

For the following operators, expression handlers are included in SCIP 8: constant, variable, affine-linear function, product, power, signpower (\(y\mapsto \textrm{sign}(y)|{y}|^p\) for \(p>1\)), exponentiation, logarithm, entropy, sine, cosine, and absolute value. In previous versions of SCIP, also high-level structures such as quadratic functions could be represented as expression types. To avoid ambiguity and reduce complexity, this has been replaced by a recognition of quadratic expressions that is no longer made explicit in the expression type.

2.1.2 Constraint handler for nonlinear constraints

All nonlinear constraints \(\underline{g}\le g(x)\le \overline{g}\) of (MINLP) are handled by the constraint handler for nonlinear constraints in SCIP, while the linear constraints \(\underline{b}\le Ax\le \overline{b}\) are handled by the constraint handler for linear constraints and its specializations (e.g., knapsack, set-covering). A constraint handler is responsible for checking whether solutions satisfy its constraints and, if that is not the case, for resolving infeasibility by enforcing its constraints. This applies in particular to solutions of the LP relaxation. The nonlinear constraint handler currently enforces its constraints by the following means:

  1. DOMAINPROP:

    by analyzing the constraints with respect to the variable bounds at the current node of the branch-and-bound tree, infeasibility or a bound tightening may be deduced, which allows pruning the node or cutting off the given solution, respectively; this is also known as domain propagation;

  2. SEPARATE:

    a cutting plane that is violated by the given solution is computed;

  3. BRANCH:

    the current node of the branch-and-bound tree is subdivided, that is, a variable \(x_i\) and a branching point \({\tilde{x}}_i\in [\underline{x}_i,\overline{x}_i]\) are selected and two child nodes with \(x_i\) restricted to \([\underline{x}_i,\tilde{x}_i]\) and \([\tilde{x}_i,\overline{x}_i]\), respectively, are created.

To decide whether a node can be pruned (DOMAINPROP), an overestimate of the range of g(x) with respect to current variable bounds is computed by means of interval arithmetic [21]. If a constraint k is found such that \(g_k([\underline{x},\overline{x}])\cap [\underline{g}_k,\overline{g}_k]=\emptyset \), then there exists no point in \([\underline{x},\overline{x}]\) for which this constraint is feasible. A bound tightening may be computed by applying the same methods in reverse order. That is, interval arithmetic is used to overestimate \(g^{-1}([\underline{g},\overline{g}])\), the preimage of g(x) on \([\underline{g},\overline{g}]\), and variable bounds are tightened to \([\underline{x},\overline{x}]\cap g^{-1}([\underline{g},\overline{g}])\). This is also known as feasibility-based bound tightening (FBBT). In the simplest case, callbacks of expression handlers are used to propagate intervals through expressions. However, in some cases, other methods that take more structure into account or that use additional information are used (see, e.g., Sects. 2.3.1 and 2.3.2).
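A forward/backward FBBT pass of this kind can be sketched on the toy constraint \(x^2 + y \le 4\) with \(x\in [-10,10]\), \(y\in [1,5]\). The helper names and the single-constraint setup are illustrative only; SCIP propagates through the expression graph via expression handler callbacks:

```python
import math

# FBBT sketch for a single constraint  x^2 + y <= 4  with
# x in [-10, 10], y in [1, 5]  (toy example, not SCIP code).

def iadd(a, b):   return (a[0] + b[0], a[1] + b[1])
def isub(a, b):   return (a[0] - b[1], a[1] - b[0])
def isect(a, b):  return (max(a[0], b[0]), min(a[1], b[1]))
def isqr(a):      # image of [a0, a1] under t -> t^2
    lo, hi = a
    cands = [lo * lo, hi * hi]
    return (0.0 if lo <= 0.0 <= hi else min(cands), max(cands))
def isqrt_preimage(a):  # preimage of [a0, a1] under t -> t^2
    hi = math.sqrt(max(a[1], 0.0))
    return (-hi, hi)

xb, yb, gb = (-10.0, 10.0), (1.0, 5.0), (-math.inf, 4.0)

# forward pass: overestimate the range of x^2 + y, intersect with [g_lo, g_hi]
sq = isqr(xb)
act = isect(iadd(sq, yb), gb)           # activity interval of the constraint
# backward pass: propagate act back to the operands (reverse of sum and square)
sq = isect(sq, isub(act, yb))           # x^2 lies in act - y
xb = isect(xb, isqrt_preimage(sq))      # x lies in the preimage of that interval
yb = isect(yb, isub(act, isqr(xb)))     # y lies in act - x^2
```

One pass tightens \(x\) to \([-\sqrt{3},\sqrt{3}]\) and \(y\) to \([1,4]\); in general, such passes are iterated until no further (sufficiently large) tightening is found.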

To construct a linear relaxation of the nonlinear constraints (SEPARATE), an extended formulation is considered:

$$\begin{aligned} \min \quad&c^{\top }x, \\ \mathrm {such\ that} \quad&h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}}) \lesseqgtr _i w_i, \quad i=1,\ldots ,{\hat{m}}, \\&\underline{b} \le Ax \le \overline{b}, \\&\underline{x} \le x \le \overline{x}, \quad \underline{w} \le w \le \overline{w}, \\&x_{\mathcal {I}} \in \mathbb {Z}^{\vert \mathcal {I}\vert }. \end{aligned}$$
(\(\text {MINLP}_\text {ext}\))

The functions \(h_i\) are obtained from the expressions that define functions \(g_i\) by recursively annotating subexpressions with auxiliary variables \(w_{i+1},\ldots ,w_{{\hat{m}}}\) for some \({\hat{m}} \ge m\). Initially, slack variables \(w_1,\ldots ,w_m\) are introduced and assigned to the root of all expressions, i.e., \(h_i:=g_i\), \(\underline{w}_i:= \underline{g}_i\), \(\overline{w}_i:=\overline{g}_i\), for \(i=1,\ldots ,m\). Next, for each function \(h_i\), subexpressions f may be assigned new auxiliary variables \(w_{i'}\), \(i'>m\), which results in extending (\(\text {MINLP}_\text {ext}\)) by additional constraints \(h_{i'}(x) = w_{i'}\) with \(h_{i'}:= f\). Bounds \(\underline{w}_{i'}\) and \(\overline{w}_{i'}\) are initialized to bounds on \(h_{i'}\), if available. Since auxiliary variables in a subexpression of \(h_i\) always receive an index larger than \(\max (m,i)\), the result is referred to by \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})\) for any \(i=1,\ldots , {\hat{m}}\). If a subexpression appears in several expressions, it is assigned at most one auxiliary variable.

For the (in)equality sense \(\lesseqgtr _i\), a valid simplification is to assume equality. For performance reasons, though, it can be beneficial to relax to inequalities if that does not change the feasible space of (\(\text {MINLP}_\text {ext}\)) when projected onto x. Therefore, for \(i\in \{1,\ldots ,m\}\), \(\lesseqgtr _i\) is set according to the finiteness of \(\underline{g}_i\) and \(\overline{g}_i\). For \(i>m\), monotonicity of expressions is taken into account to derive \(\lesseqgtr _i\).

Whether to annotate a subexpression by an auxiliary variable depends on the structures that are recognized. In the simplest case, every subexpression that is not already a variable is annotated with an auxiliary variable. This essentially corresponds to the Smith Normal Form [10]. For every function \(h_i\) of (\(\text {MINLP}_\text {ext}\)), the callbacks of the corresponding expression handler can be used to compute linear under- and overestimators, such that a linear relaxation for (\(\text {MINLP}_\text {ext}\)) is constructed. It can, however, be beneficial to not add an auxiliary variable for every subexpression, thus allowing for more complex functions in (\(\text {MINLP}_\text {ext}\)). This will be discussed in Sect. 2.1.3 below.
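The simplest annotation scheme (one auxiliary variable per non-variable subexpression, shared subexpressions annotated once) can be sketched as follows. The tuple encoding of expressions and the naming scheme `w1, w2, ...` are assumptions of this sketch, not SCIP data structures:

```python
# Sketch of building an extended formulation: every non-variable
# subexpression receives an auxiliary variable w_k, and a constraint
# "op(args) = w_k" is recorded.  A subexpression that appears several
# times gets at most one auxiliary variable (hypothetical encoding).

def annotate(expr, aux, cons):
    """expr is ('var', name) or (op, child, ...); returns the variable
    name that stands for expr in the extended formulation."""
    if expr[0] == "var":
        return expr[1]
    key = repr(expr)
    if key in aux:                       # reuse: one aux var per subexpression
        return aux[key]
    args = [annotate(c, aux, cons) for c in expr[1:]]
    w = f"w{len(aux) + 1}"
    aux[key] = w
    cons.append((expr[0], args, w))      # records op(args) = w
    return w

# g(x, y) = log(x)^2 + 2*log(x)*y + y^2   (the example from Sect. 2.1.3)
log_x = ("log", ("var", "x"))
g = ("sum",
     ("sqr", log_x),
     ("prod2", log_x, ("var", "y")),
     ("sqr", ("var", "y")))

aux, cons = {}, []
root = annotate(g, aux, cons)
n_aux = len(aux)   # log(x) appears twice but is annotated only once
```

Running this yields five constraints (one per operator node), matching the count of one slack plus four auxiliary variables used for this example in Sect. 2.1.3.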

If a constraint \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})\lesseqgtr _i w_i\) of (\(\text {MINLP}_\text {ext}\)) is violated in the LP solution and no cut is found that separates this solution, then the variables appearing in \(h_i\) are candidates for branching (BRANCH). More precisely, when an expression handler computes a linear under- or overestimator for \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})\), it also signals for which variables it used current variable bounds. Marked original variables are then added to the list of branching candidates. For an auxiliary variable \(w_{i'}\), \(i'>i\), the variables in the subexpression that \(h_{i'}\) represents are considered for branching instead.

The decision on whether to add a cutting plane that separates the solution of the LP relaxation or to branch is rather complex, but the idea is to branch if either no cutting plane is found or the violation of available cutting planes in the relaxation solution is rather small when compared to the convexification gap of the under/overestimators that define the cutting planes. In the latter case, it may be beneficial to first reduce the convexification gap by branching. To select one variable from the list of branching candidates, the violation of constraints in (\(\text {MINLP}_\text {ext}\)) and historical information about the effect of branching on a given variable on the optimal value of the LP relaxation (“pseudo costs”) are taken into account. The branching point is a convex combination of the value of the variable in the LP relaxation and the mid-point of the variable’s interval.
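The branching-point rule in the last sentence can be sketched in a few lines. The weight `alpha` and the relative margin `eps` are assumed values for illustration, not necessarily SCIP's defaults:

```python
# Branching-point sketch: a convex combination of the variable's LP value
# and the midpoint of its domain.  alpha = 0.75 and the margin eps are
# assumed values for this sketch, not necessarily SCIP's defaults.

def branching_point(lp_value, lb, ub, alpha=0.75):
    mid = 0.5 * (lb + ub)
    point = alpha * lp_value + (1.0 - alpha) * mid
    # keep the point strictly inside the interval so both children shrink
    eps = 1e-6 * (ub - lb)
    return min(max(point, lb + eps), ub - eps)

bp = branching_point(lp_value=0.9, lb=0.0, ub=1.0)  # 0.75*0.9 + 0.25*0.5 = 0.8
```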

2.1.3 Nonlinear handlers

For a constraint \(\log (x)^2 + 2\log (x)y + y^2\le 4\), a slack variable and four auxiliary variables would be introduced to construct the extended formulation \(w_2+2w_3+w_4\le w_1\), \(w_5=\log (x)\), \(w_2=w_5^2\), \(w_3=w_5y\), \(w_4=y^2\). This is due to the expression handlers having a rather myopic view: essentially, they implement techniques that can handle only their direct children. It is clear that, for this example, an extended formulation that only replaces \(\log (x)\) by an auxiliary variable \(w_2\) could be more efficient to solve. However, this requires methods to detect the quadratic (or convex) structure and to either compute linear underestimators for the quadratic (convex) expression \(w_2^2 + 2w_2y+y^2\) or to separate cutting planes for the set defined by \(w_2^2 + 2w_2y+y^2\le w_1\).

Such structure detection and handling methods are the task of the new nonlinear handler plugins that were introduced with SCIP 8. Nonlinear handlers determine the extended formulation (\(\text {MINLP}_\text {ext}\)) by deciding when to annotate subexpressions with auxiliary variables. That is, given a constraint \(h_i(x) \lesseqgtr _i w_i\), a nonlinear handler analyses the expression that defines \(h_i\) and attempts to detect specific structures. At this point, it may also request to introduce additional auxiliary variables, thus changing \(h_i(x)\) into \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})\). In addition, it informs the constraint handler that it will provide separation for \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}}) \le w_i\), or \(\ge w_i\), or both. If none of the nonlinear handlers declare that they will handle \(h_i(x) \lesseqgtr _i w_i\), auxiliary variables are introduced for each argument of the root of the expression \(h_i\) and expression handler callbacks are used to construct cutting planes from linear under-/overestimators.

In addition to separation, nonlinear handlers can also contribute to domain propagation. This is implemented analogously to separation by setting up an additional extended formulation similarly to (\(\text {MINLP}_\text {ext}\)).

Note that the extended formulations are stored as annotation on the original expressions. Thus, for each task, the most suitable formulation can be used. For example, feasibility is checked on the original constraints, domain propagation and separation use the corresponding extended formulations, but branching is performed, by default, with respect to original variables only. With SCIP 7 and earlier, only one extended formulation was constructed explicitly and the connection to the original formulation was no longer available, leading to problems in ensuring that solutions are (\(\varepsilon \)-)feasible for the original constraints.

In addition to an improved numeric reliability, the nonlinear handlers also allow for a higher flexibility when handling nonlinear structures. For each node in an expression, several nonlinear handlers can be attached, each one annotating possibly different subexpressions with auxiliary variables. For example, for a nonconvex quadratic constraint \(\sum _{i,j} a_{i,j} x_ix_j \le w\), the nonlinear handler for quadratics can declare that it will provide separation (by intersection cuts, see Sect. 2.3.5), but that also other means of separation should be tried. However, since no other nonlinear handler declares that it will provide separation, auxiliary variables are introduced for each argument of the sum, that is, an auxiliary variable \(X_{ij}\) is assigned to each product \(x_ix_j\). For the corresponding constraints \(x_ix_j\le X_{ij}\) (if \(a_{i,j}\ge 0\)), the McCormick underestimators [22]

$$\begin{aligned} X_{ij} \ge \underline{x}_ix_j + \underline{x}_jx_i - \underline{x}_i\underline{x}_j, \quad X_{ij} \ge \overline{x}_ix_j + \overline{x}_jx_i - \overline{x}_i\overline{x}_j \end{aligned}$$
(1)

or other means (see Sect. 2.3.2) will be used to construct a linear relaxation.
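The validity of the McCormick underestimators (1) over the box can be spot-checked numerically; the box bounds below are arbitrary sample values for this sketch:

```python
import random

# McCormick underestimators (1): both linear expressions underestimate
# x_i * x_j everywhere on the box [l_i, u_i] x [l_j, u_j] (sketch; the
# box bounds below are arbitrary sample values).

def mccormick_under(xi, xj, li, lj, ui, uj):
    return max(li * xj + lj * xi - li * lj,
               ui * xj + uj * xi - ui * uj)

random.seed(0)
li, ui, lj, uj = -1.0, 2.0, 0.5, 3.0
ok = all(mccormick_under(xi, xj, li, lj, ui, uj) <= xi * xj + 1e-12
         for xi, xj in ((li + random.random() * (ui - li),
                         lj + random.random() * (uj - lj))
                        for _ in range(1000)))
```

Validity follows from \((x_i-\underline{x}_i)(x_j-\underline{x}_j)\ge 0\) and \((x_i-\overline{x}_i)(x_j-\overline{x}_j)\ge 0\) on the box, so the sampling check is merely a sanity test of the formulas.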

2.1.4 NLP relaxation

Similar to the central LP relaxation of SCIP, an NLP relaxation is also available. In contrast to constraint handlers, the NLP relaxation uses a common data structure to store its constraints. Therefore, in case of a MINLP, the NLP relaxation together with the integrality conditions on variables provides a unified view of the problem. To find locally optimal solutions for the NLP relaxation, interfaces to the NLP solvers filterSQP, Ipopt, and Worhp [23,24,25] are available. Function derivatives are computed via CppAD [26].

2.2 Presolving

When presolving nonlinear constraints, expressions are simplified and brought into a canonical form. For example, recursive sums and products are flattened and fixed or aggregated variables are replaced by constants or sums of active variables. In addition, it is ensured that if a subexpression appears several times (in the same or different constraints), always the same expression object is used.

2.2.1 Variable fixings

Similar to what has been shown by Hansen et al. [27], if a bounded variable \(x_j\) does not appear in the objective (\(c_j=0\)), but in exactly one constraint \(\underline{g}_k \le g_k(x) \le \overline{g}_k\) where \(g_k(x)\) is convex in \(x_j\) for any fixing of other variables and \(\overline{g}_k = +\infty \) (or concave in \(x_j\) and \(\underline{g}_k=-\infty \)), then there always exists an optimal solution where \(x_j \in \{\underline{x}_j,\overline{x}_j\}\). For example, if \(y\in [0,1]\) appears only in a constraint \(xy+yz-y^2\le 5\), then y can be changed to a binary variable.

SCIP recognizes such variables for polynomial constraints (under additional assumptions [16]) and changes the variable type to binary, if \(\underline{x}_j = 0\) and \(\overline{x}_j=1\), or adds a bound disjunction constraint \(x_j \le \underline{x}_j \vee x_j \ge \overline{x}_j\). As a consequence, branching on \(x_j\) leads to fixing the variable in both children.
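The example from above can be verified by brute force: \(y\in [0,1]\) appears only in \(xy+yz-y^2\le 5\), which is concave in \(y\) with \(\underline{g}=-\infty \), so restricting \(y\) to its bounds does not worsen the optimal value. The objective, grids, and box below are toy choices for this sketch:

```python
# Brute-force illustration of Sect. 2.2.1: y in [0, 1] appears only in
# x*y + y*z - y^2 <= 5 and not in the objective (here: minimize x + z
# over x, z in [0, 2]; toy grids, not SCIP).  Since g is concave in y
# and unbounded below, restricting y to {0, 1} preserves the optimum.

grid = [i / 20.0 for i in range(41)]          # x, z in {0, 0.05, ..., 2}
ys = [i / 20.0 for i in range(21)]            # y in {0, 0.05, ..., 1}

def feasible(x, y, z):
    return x * y + y * z - y * y <= 5.0

best_all = min(x + z for x in grid for z in grid
               for y in ys if feasible(x, y, z))
best_bnd = min(x + z for x in grid for z in grid
               for y in (0.0, 1.0) if feasible(x, y, z))
```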

2.2.2 Linearization of products

To better utilize SCIP’s techniques for MIP solving, products of binary variables are linearized. In the simplest case, a product \(\prod _i x_i\) is replaced by a new variable z and a constraint of type “and” that models \(z = \bigwedge _i x_i\) is added. The “and”-constraint handler will then separate a linearization of this product [28]. For a product of only two binary variables, the linearization is added directly.

For a quadratic function in binary variables with many terms, the number of variables introduced may be large. In this case, a linearization that requires fewer additional variables is used, even though it may lead to a weaker relaxation.
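The linearization separated by the “and”-constraint handler is, in its standard form, \(z\le x_i\) for all \(i\) and \(z\ge \sum _i x_i-(n-1)\). The sketch below (toy code, not SCIP's) confirms by enumeration that on binary points these inequalities force \(z=\prod _i x_i\):

```python
from itertools import product

# Standard linearization of z = x1 AND ... AND xn for binary variables:
#   z <= x_i for all i,   z >= sum(x_i) - (n - 1),   z in {0, 1}.
# On binary points these inequalities determine z uniquely (sketch).

def and_polytope_z(xs):
    """Binary values of z satisfying the linearization at point xs."""
    n = len(xs)
    return [z for z in (0, 1)
            if all(z <= xi for xi in xs) and z >= sum(xs) - (n - 1)]

captures_product = all(and_polytope_z(xs) == [xs[0] * xs[1] * xs[2]]
                       for xs in product((0, 1), repeat=3))
```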

2.2.3 KKT strengthening for QPs

A presolving method that aims to tighten the relaxation of a quadratic program (QP) by adding redundant constraints derived from Karush-Kuhn-Tucker (KKT) conditions is available. Consider a quadratic program of the form \(\min \{\tfrac{1}{2}\, x^\top Q x + c^\top x: Ax \le b \}\), where \(Q \in {\mathbb {R}}^{n \times n}\) is symmetric, \(c \in {\mathbb {R}}^n\), \(A \in {\mathbb {R}}^{m \times n}\), and \(b \in {\mathbb {R}}^m\). If the QP is bounded, then all optima satisfy the KKT conditions \(Q x + c + A^\top \mu = 0\), \(Ax \le b\), \(\mu _i (Ax - b)_i = 0\), \(i=1, \ldots , m\), where \(\mu \ge 0\) is the vector of Lagrangian multipliers of constraints \(Ax\le b\).
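The KKT conditions can be checked on a tiny hand-solved instance. The QP data, its optimum \(x^*=(1/2,1/2)\), and the multiplier \(\mu =1/2\) below are derived by hand for this sketch (they are not SCIP output):

```python
# KKT check for the toy QP  min 1/2 x'Qx + c'x  s.t.  x1 + x2 <= 1
# with Q = I and c = (-1, -1).  The optimum x* = (1/2, 1/2) and
# multiplier mu = 1/2 are hand-derived for this sketch.

x = (0.5, 0.5)
c = (-1.0, -1.0)
a, b, mu = (1.0, 1.0), 1.0, 0.5

# stationarity:  Q x + c + A' mu = 0   (here Q is the identity)
stat = [x[i] + c[i] + a[i] * mu for i in range(2)]
# primal feasibility and complementarity:  mu * (Ax - b) = 0
slack = b - (a[0] * x[0] + a[1] * x[1])
comp = mu * slack
```

In SCIP's presolver, exactly these relations are added as redundant constraints, with the complementarity condition modeled via special ordered sets of type 1.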

If SCIP recognizes that (MINLP) is equivalent to a QP and all variables are bounded, then the KKT conditions are added as redundant constraints to the problem, whereby the complementarity constraints are formulated via special ordered sets of type 1. The redundant constraints can help to strengthen the linear relaxation and prioritize branching decisions to satisfy the complementarity constraints, which focuses the search more on the local optima.

In addition to a QP, the implementation can also handle mixed-binary quadratic programs. For all details, see [29, 30]. When this presolver was added in SCIP 4.0, it proved very beneficial for box-constrained quadratic programs. Due to the many changes and extensions in SCIP 8 for the handling of quadratic constraints (Sect. 2.3), it needs to be reevaluated under which conditions this presolver should be enabled. Currently, it is disabled by default.

2.2.4 Symmetry detection

Symmetries are automorphisms on \(\mathbb {R}^n\) that map optimal solutions to optimal solutions. They have an adverse effect on the performance of branch-and-bound solvers, because symmetric subproblems may be treated repeatedly. Therefore, SCIP can enforce lexicographically maximal solutions from an orbit of symmetric solutions via bound tightening and separation [16, 31,32,33].

Since optimal solutions are naturally not known in advance, the symmetry detection resorts to finding permutations of variables that map the feasible set onto itself and map each point to one with the same objective function value [34]. These permutations are given by automorphisms of an auxiliary symmetry detection graph, which is constructed from the problem data (e.g., c, A, \(\mathcal {I}\), and the expressions that define g(x)) [20, 35].
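For linear problem data, such formulation symmetries can be illustrated by brute force on a tiny instance (a stand-in for the graph-automorphism machinery; the data below are toy values for this sketch):

```python
from itertools import permutations

# Brute-force symmetry sketch for  min c'x  s.t.  Ax <= b:  a variable
# permutation is a formulation symmetry if permuting the columns of A
# (and the entries of c) yields the same constraint system up to a
# reordering of rows.  (Toy stand-in for graph-automorphism detection.)

c = (1.0, 1.0, 2.0)
A = ((1.0, 1.0, 0.0),
     (0.0, 0.0, 1.0))

def is_symmetry(p):
    cp = tuple(c[p[j]] for j in range(3))
    Ap = tuple(tuple(row[p[j]] for j in range(3)) for row in A)
    return cp == c and sorted(Ap) == sorted(A)

syms = [p for p in permutations(range(3)) if is_symmetry(p)]
# only the identity and the exchange of x1 and x2 survive
```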

2.3 Quadratics

Since quadratic functions frequently appear in MINLPs, a number of techniques have been added to SCIP to handle this structure. Next to the presolving methods that were discussed in the previous section, three nonlinear handlers and four separators deal with quadratic structures. When none of the nonlinear handlers are active, then for each square and bilinear term in a quadratic function, an auxiliary variable is added in the extended formulation and gradient, secant, and McCormick under- and overestimators (1) are generated.

2.3.1 Domain propagation

If variables appear more than once in a quadratic function, then a term-wise domain propagation does not necessarily yield the best possible results, because it suffers from the so-called dependency problem of interval arithmetic. For example, it is easy to compute the range for \(x^2+x\) for given bounds on x, or bounds on x for a given interval on \(x^2+x\), but standard interval arithmetic treats the terms \(x^2\) and x separately, which leads to overestimating the result.
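The example can be made concrete for \(x\in [-1,1]\): term-wise evaluation yields \([0,1]+[-1,1]=[-1,2]\), while the exact range of \(x^2+x\) is \([-1/4,2]\) (minimum at \(x=-1/2\)):

```python
# Dependency problem illustration: term-wise interval evaluation of
# x^2 + x over x in [-1, 1] overestimates the exact range, because the
# correlation between the terms x^2 and x is lost.

lo, hi = -1.0, 1.0
sq = (0.0, 1.0)                      # exact image of [-1, 1] under x -> x^2
termwise = (sq[0] + lo, sq[1] + hi)  # [-1, 2]: terms treated independently

# exact range of x^2 + x on [-1, 1]: minimum -1/4 at x = -1/2, maximum 2 at x = 1
exact = (-0.25, 2.0)
overestimated = termwise[0] < exact[0]
```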

Therefore, a specialized nonlinear handler in SCIP provides a domain propagation procedure for quadratics that aims to reduce overestimation. For this, the detection routine of the handler writes a quadratic expression as \(q(y) = \sum _{i=1}^k q_i(y)\) with \(q_i(y) = a_iy_i^2 + c_iy_i + \sum _{j\in P_i} b_{i,j}y_iy_j\), where \(y_i\) is either an original variable (x) or another expression, \(a_i,c_i\in {\mathbb {R}}\), \(b_{i,j}\in {\mathbb {R}}{\setminus }\{0\}\), \(j\in P_i \Rightarrow i\not \in P_j\) for all \(j\in P_i\), \(P_i\subset \{1,\ldots ,k\}\), \(i=1,\ldots ,k\). For functions \(q_i\) with at least two terms (at least two of \(a_i\), \(b_{i,j}\), \(j\in P_i\), and \(c_i\) are nonzero), a relaxation is obtained by replacing each \(y_j\) by \([\underline{y}_j,\overline{y}_j]\), \(j\in P_i\). For this univariate quadratic interval-term in \(y_i\), tight bounds can be computed [36].

In addition, bounds on variables \(y_j\), \(j\in P_i\), are computed by considering \(\sum _{j\in P_i}b_{i,j}y_j \in ([\underline{q},\overline{q}] - \sum _{i'\ne i} q_{i'}(y))/y_i - a_iy_i - c_i\), \(y_i\in [\underline{y}_i,\overline{y}_i]\), where \([\underline{q},\overline{q}]\) are given bounds on q(y). After relaxing each \(q_{i'}\) to an interval, bounds on each \(y_j\), \(j\in P_i\), can be computed.

2.3.2 Bilinear terms

For a product \(y_1y_2\), where \(y_1\) and \(y_2\) are either non-binary variables or other expressions, the best possible linear under- and overestimators with respect to the bounds \([\underline{y}_1,\overline{y}_1] \times [\underline{y}_2,\overline{y}_2]\) alone are given by (1). However, if linear inequalities in \(y_1\) and \(y_2\) are available, then possibly tighter linear estimates and variable bounds can be computed using an algorithm by Locatelli [37]. The inequalities are found by projection of the LP relaxation onto variables \((y_1,y_2)\). For more details, see [38]. An alternative method that uses linear constraints to tighten the relaxation of quadratic constraints is described in the following.

2.3.3 RLT cuts

The Reformulation–Linearization Technique (RLT) [39, 40] has proven very useful to tighten relaxations of polynomial programming problems. In SCIP, an RLT separator for bilinear product relations in (\(\text {MINLP}_\text {ext}\)) is available.

For simplicity, denote by \(X_{ij}\) the auxiliary variable that is associated with a constraint \(x_ix_j \lesseqgtr X_{ij}\) of (\(\text {MINLP}_\text {ext}\)) (\(X_{ji}\) denotes the same variable as \(X_{ij}\)). Recall that it is valid to replace \(\lesseqgtr \) by \(=\). RLT cuts are derived by multiplying a linear constraint by a nonnegative bound factor and replacing the product relations by variables from X. For example, given a linear constraint \(a^\top x \le b\) and a bound \(x_i \ge \underline{x}_i\), the quadratic inequality \(a^\top x\, (x_i - \underline{x}_i) \le b\,(x_i - \underline{x}_i)\) is formed. Next, each term \(x_kx_i\) is replaced by \(X_{ki}\), if \(X_{ki} = x_kx_i\) exists in (\(\text {MINLP}_\text {ext}\)), or estimated by (1), otherwise.
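Expanding \((b-a^{\top }x)(x_i-\underline{x}_i)\ge 0\) and substituting \(X_{ki}\) for \(x_kx_i\) gives the cut \(\sum _k a_kX_{ki}-\underline{x}_i\,a^{\top }x-b\,x_i+b\,\underline{x}_i\le 0\). The sketch below (toy code with arbitrary sample data, not SCIP's separator) checks validity on points where \(X=xx^{\top }\) holds exactly:

```python
# RLT-cut sketch: multiply a'x <= b by the nonnegative factor (x_i - lb_i)
# and substitute X_ki for each product x_k * x_i, yielding
#   sum_k a_k X_ki - lb_i * a'x - b * x_i + b * lb_i <= 0.

def rlt_cut_lhs(x, X, a, b, i, lbi):
    n = len(x)
    lin = sum(a[k] * x[k] for k in range(n))
    return (sum(a[k] * X[k][i] for k in range(n))
            - lbi * lin - b * x[i] + b * lbi)   # valid iff <= 0

a, b, i, lbi = (1.0, 2.0), 4.0, 0, -1.0
pts = [(-1.0, 0.5), (0.0, 0.0), (2.0, 1.0), (-0.5, 2.0)]
valid = all(rlt_cut_lhs(x, [[xi * xj for xj in x] for xi in x],
                        a, b, i, lbi) <= 1e-12
            for x in pts
            if a[0] * x[0] + a[1] * x[1] <= b and x[i] >= lbi)
```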

In addition, the RLT separator can reveal linearized products between binary and continuous variables. To do so, it checks whether pairs of linear inequalities that are defined on the same triple of variables (one of them binary, the other two continuous) imply a product relation. These implicit products can then be used in the linearization step of RLT cut generation [18].

2.3.4 SDP cuts

A popular convex relaxation of the condition \(X = xx^\top \) (see previous section) is given by requiring \(X-xx^\top \) to be positive semidefinite (psd). Separation for the set \(\{(x,X): X-xx^\top \succeq 0\}\) itself is possible, but cuts are typically dense and may include variables \(X_{ij}\) for products that do not exist in the problem. Therefore, only principal \(2 \times 2\) minors of \(X-xx^\top \), which also need to be psd, are considered. By Schur’s complement, this means that the condition

$$\begin{aligned} A_{ij}(x,X):= \begin{bmatrix} 1 &{}\quad x_i &{}\quad x_j \\ x_i &{}\quad X_{ii} &{}\quad X_{ij} \\ x_j &{}\quad X_{ij} &{}\quad X_{jj} \end{bmatrix} \succeq 0 \end{aligned}$$
(2)

needs to hold for any \(i,j\), \(i\ne j\). A separator in SCIP detects minors for which \(X_{ii}\), \(X_{jj}\), \(X_{ij}\) exist in (\(\text {MINLP}_\text {ext}\)) and enforces \(A_{ij}(x,X)\succeq 0\) by adding a linear inequality \(v^\top A_{ij}(x,X)v \ge 0\), where \(v\in {\mathbb {R}}^3\) is an eigenvector of \(A_{ij}(\hat{x},\hat{X})\) with \(v^\top A_{ij}(\hat{x},\hat{X})v<0\) and \((\hat{x},\hat{X})\) is the solution that violates (2).
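Since \(v^{\top }A_{ij}(x,X)v\) is linear in \((x,X)\), each such \(v\) yields a globally valid linear cut. The sketch below replaces the exact eigenvector computation by a small set of trial vectors (an assumption of this sketch), then checks the cut on rank-1 consistent points:

```python
# 2x2-minor SDP cut sketch: given (x_hat, X_hat) for which the matrix
# A_ij in (2) is not psd, find v with v'Av < 0 from a small trial set
# (a stand-in for an exact eigenvector computation) and emit the
# globally valid linear cut v' A_ij(x, X) v >= 0.

def A(xi, xj, Xii, Xij, Xjj):
    return ((1.0, xi, xj), (xi, Xii, Xij), (xj, Xij, Xjj))

def quad(v, M):
    return sum(v[r] * M[r][k] * v[k] for r in range(3) for k in range(3))

trial = [(0.0, 1.0, -1.0), (0.0, 1.0, 1.0), (1.0, -1.0, 0.0),
         (1.0, 0.0, -1.0), (1.0, -1.0, -1.0)]
Ahat = A(0.0, 0.0, 0.0, 1.0, 0.0)            # X_hat is not psd-consistent here
v = next(w for w in trial if quad(w, Ahat) < 0.0)

# the cut is valid wherever X really equals x x^T (then A_ij is psd):
valid = all(quad(v, A(xi, xj, xi * xi, xi * xj, xj * xj)) >= -1e-12
            for xi in (-1.0, 0.3, 2.0) for xj in (-0.7, 0.0, 1.5))
cut_violation = quad(v, Ahat)                # negative: (x_hat, X_hat) is cut off
```

Here the found vector \(v=(0,1,-1)\) produces the cut \(X_{ii}-2X_{ij}+X_{jj}\ge 0\), which is valid because it equals \((x_i-x_j)^2\ge 0\) on rank-1 points.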

2.3.5 Intersection cuts

Intersection cuts [41, 42] have proven effective in strengthening relaxations of MIPs. A recently described method to compute the tightest possible intersection cuts for quadratic programs [43] has been implemented in SCIP [17].

Assume a nonconvex quadratic constraint of (\(\text {MINLP}_\text {ext}\)) is \(q(y)\le w\) with q being a quadratic as in Sect. 2.3.1. The separation of intersection cuts is implemented for the set \(S:= \{ (y,w) \in {\mathbb {R}}^{k+1}: q(y) \le w \}\) that is defined by this constraint. Let \((\hat{y},{\hat{w}})\) be a basic feasible LP solution violating \(q(y) \le w\). First, a convex inequality \(g(y,w) < 0\) is built that is satisfied by \((\hat{y},{\hat{w}})\), but by no point of S. This defines a so-called S-free set \(C = \{ (y,w) \in {\mathbb {R}}^{k+1}: g(y,w) \le 0 \}\), that is, a convex set with \((\hat{y},{\hat{w}}) \in \text {int}(C)\) containing no point of S in its interior. The quality of the resulting cut highly depends on which S-free set is used, but using maximal S-free sets yields the tightest possible intersection cuts [43].

By using the conic relaxation K of the LP-feasible region defined by the nonbasic variables at \((\hat{y},{\hat{w}})\), the intersection points between the extreme rays of K and the boundary of C are computed. The intersection cut is then defined by the hyperplane through these points and separates \((\hat{y},{\hat{w}})\) from S. To obtain even better cuts, a strengthening procedure is also implemented that uses the idea of negative edge extension of the cone K [44].

In addition to the separation of intersection cuts for a set S given by a constraint \(q(y)\le w\), SCIP can also generate intersection cuts for quadratic equations implied by the condition \(X=xx^\top \) (see Sect. 2.3.3). Since X needs to have rank 1, any \(2\times 2\) minor of X needs to have determinant zero. Therefore, for any set of variable indices \(i_1\), \(i_2\), \(j_1\), \(j_2\) with \(i_1\ne i_2\) and \(j_1\ne j_2\), the condition \( X_{i_1j_1}X_{i_2j_2} = X_{i_1j_2}X_{i_2j_1} \) needs to hold. If all variables in this condition exist in (\(\text {MINLP}_\text {ext}\)), then the procedure to generate intersection cuts is applied to the set defined by this condition, if it is violated.
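The rank-1 minor condition is easy to verify numerically; the sketch below checks that every \(2\times 2\) minor vanishes on a point with \(X=xx^{\top }\) (sample data for illustration) and exhibits a relaxation point that violates it:

```python
# Rank-1 2x2-minor condition: if X = x x^T, then every 2x2 minor of X
# vanishes, i.e. X[i1][j1]*X[i2][j2] == X[i1][j2]*X[i2][j1] (sketch
# with arbitrary sample data).

x = (1.5, -2.0, 0.5)
X = [[xi * xj for xj in x] for xi in x]

def minor(i1, i2, j1, j2):
    return X[i1][j1] * X[i2][j2] - X[i1][j2] * X[i2][j1]

all_zero = all(abs(minor(i1, i2, j1, j2)) < 1e-12
               for i1 in range(3) for i2 in range(3) if i1 != i2
               for j1 in range(3) for j2 in range(3) if j1 != j2)

# a relaxation point that violates the condition, hence can be separated:
Xhat = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
violation = Xhat[0][0] * Xhat[1][1] - Xhat[0][1] * Xhat[1][0]  # nonzero
```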

Since intersection cuts can be rather dense, it is not clear yet how to decide when it will be beneficial to generate such cuts. Their separation is therefore currently disabled by default. For more details, see [17].

2.3.6 Edge-concave cuts

Another method to obtain a linear outer-approximation for a quadratic constraint is to utilize an edge-concave decomposition of the quadratic function. This has been shown to be particularly useful for randomly generated quadratic instances [45, 46]. A function is edge-concave over the variables’ domain (e.g., \([\underline{x},\overline{x}]\)) if it is componentwise concave.

Given a quadratic function, the separator for edge-concave cuts solves an auxiliary MIP to partition the square and bilinear terms into a sum of edge-concave functions and a remaining function. Since the convex envelope of edge-concave functions is vertex-polyhedral [47], that is, it is a polyhedral function with vertices corresponding to the vertices of the box of variable bounds, facets of the convex envelope of each edge-concave function can be computed by solving an auxiliary linear program (see also Sect. 2.4.1). For the remaining terms, linear underestimators such as (1) are summed up.

Since the current implementation of edge-concave cuts in SCIP has not proven particularly useful for general MINLP, it is disabled for now.

2.3.7 Second-order cones

An important connection between MINLP and conic programming is the detection of constraints that can be represented as a second-order cone (SOC) constraint, since the latter defines a convex set, while the original constraint may use a nonconvex constraint function. Thus, SOC detection is the aim of a specialized nonlinear handler in SCIP. In the detection phase, a constraint \(h_i(x) \le w_i\) (the case \(\ge \) is handled similarly) of (\(\text {MINLP}_\text {ext}\)) is passed to the nonlinear handler. For this constraint, it is checked whether it defines a bound on a Euclidean norm (\(\sqrt{\sum _{j=1}^k (a_j y_j^2 + b_j y_j) + c}\le w_i\) for some coefficients \(a_j,b_j,c\in {\mathbb {R}}\), \(a_j>0\), where \(y_j\) is either an original variable or some subexpression of \(h_i(\cdot )\)), or is a quadratic constraint that is SOC-representable [48]. Since the introduction of slack variables \(w_i\), \(i\le m\), may prevent such a detection, the equivalent constraint \(h_i(x) \le \bar{w}_i\) is considered instead.

Once a SOC constraint has been detected, a solution that violates this constraint can be separated. However, if the detected cone is of high dimension, then many cuts may be required to provide a tight linear relaxation. Thus, a disaggregation into three-dimensional cones as suggested by Vielma [49] is used.
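In simplified notation (ours, not that of [49]), the disaggregation introduces variables \(z_j\ge 0\) and replaces one \((k+1)\)-dimensional cone by k three-dimensional rotated cones coupled through an aggregation constraint:

$$\begin{aligned} \sqrt{\sum _{j=1}^k y_j^2} \le w \quad \Longleftrightarrow \quad \exists z \in {\mathbb {R}}^k_{\ge 0}: \quad y_j^2 \le z_j w, \; j=1,\ldots ,k, \qquad \sum _{j=1}^k z_j \le w. \end{aligned}$$

Each constraint \(y_j^2 \le z_j w\) involves only three variables, so each can be outer-approximated tightly with few cuts; the equivalence follows by choosing \(z_j = y_j^2/w\) for \(w>0\).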

2.4 Convexity

2.4.1 Convex and concave constraints

For the linear underestimation of functions like \(x\exp (x)\) or \(x^2 + 2xy + y^2\), the construction of an extended formulation (xw, \(\exp (x)=w\); \(w_1+2w_2+w_3\), \(w_1=x^2\), \(w_2=xy\), \(w_3=y^2\)) is not advisable. Instead, hyperplanes that support the epigraph of a convex function can be used if convexity is recognized. In SCIP, specialized nonlinear handlers are available to detect for a function \(h_i(x)\) of (\(\text {MINLP}_\text {ext}\)) the subexpressions that need to be replaced by auxiliary variables \(w_{i+1},\ldots , w_{\hat{m}}\) such that the remaining expression \(h_i(x,w_{i+1},\ldots , w_{\hat{m}})\) is convex or concave. The detection utilizes the often applied rules for convexity/concavity of function compositions (e.g., f convex and monotone decreasing, g concave \(\Rightarrow \) \(f \circ g\) convex), but applies them in reverse order. That is, instead of deciding whether a function is convex/concave based on information on the convexity/concavity and monotonicity of its arguments, the algorithm formulates conditions on the convexity/concavity of the function arguments given a convexity/concavity requirement on the function itself. When a condition on an argument cannot be fulfilled, it is replaced by an auxiliary variable.
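The reverse application of composition rules can be sketched on a toy expression encoding (ours, with a single rule for exp; SCIP's detection algorithm handles many more operators):

```python
def detect(expr, want_convex=True):
    """Reverse convexity detection (toy sketch): given a convexity requirement
    on expr, derive requirements on its arguments; a node that cannot meet its
    requirement is replaced by an auxiliary variable.

    Node types: ('var', name) and ('exp', child).  Since exp is convex and
    increasing, convexity of exp(g) requires g to be convex."""
    aux = []
    op = expr[0]
    if op == 'var':
        return expr, aux          # a variable satisfies any requirement
    if op == 'exp':
        if want_convex:
            child, aux = detect(expr[1], want_convex=True)
            return ('exp', child), aux
        # exp is nowhere concave: replace the node by an auxiliary variable
        w = ('var', 'w')
        return w, [(w, expr)]
    raise NotImplementedError(op)

# exp(exp(x)) is convex: detection succeeds without auxiliary variables.
node, aux = detect(('exp', ('exp', ('var', 'x'))), want_convex=True)
assert aux == []

# A concavity requirement on exp(x) forces an auxiliary variable w = exp(x).
node, aux = detect(('exp', ('var', 'x')), want_convex=False)
assert node == ('var', 'w') and len(aux) == 1
```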

In addition to the “myopic” rules for convexity/concavity that are implemented by the expression handlers, rules for product compositions, signomials, and quadratic forms are also available. Further, it has been shown that for a composition of convex functions \(f \circ g\), it can be beneficial for the linear relaxation to consider the extended formulation f(w), \(w\ge g(x)\), instead of the composition f(g(x)) [50]. This is enforced by a small variation of the detection algorithm.

When a convex constraint \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}}) \le w_i\) of (\(\text {MINLP}_\text {ext}\)) is violated at a point \(({\hat{x}},{\hat{w}})\), a tangent to the graph of \(h_i\) at \(({\hat{x}},{\hat{w}})\) provides a separating hyperplane. If, however, \(h_i\) is univariate, that is, \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})=f(y)\) for some variable y, and y is integral, then taking the hyperplane through the points \((\lfloor {\hat{y}}\rfloor , f(\lfloor {\hat{y}}\rfloor ))\) and \((\lfloor {\hat{y}}\rfloor +1, f(\lfloor {\hat{y}}\rfloor +1))\) gives a tighter underestimator.
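Both estimators are easy to state for a univariate example (the function and reference point below are ours, for illustration):

```python
import math

def gradient_cut(f, grad, xhat):
    """Tangent underestimator of a convex univariate f at xhat:
    f(xhat) + f'(xhat) * (x - xhat).  Returns (slope, intercept)."""
    g = grad(xhat)
    return g, f(xhat) - g * xhat

def integer_secant_cut(f, yhat):
    """Secant through (floor(yhat), f(floor(yhat))) and the next integer
    point: a tighter underestimator when the argument is integral."""
    lo = math.floor(yhat)
    slope = f(lo + 1) - f(lo)
    return slope, f(lo) - slope * lo

# Example: f(y) = y^2 at reference point yhat = 0.5.
f = lambda y: y * y
tslope, ticpt = gradient_cut(f, lambda y: 2 * y, 0.5)
sslope, sicpt = integer_secant_cut(f, 0.5)

# The secant is tight at both neighboring integers; the tangent is not:
assert sslope * 0 + sicpt == f(0) and sslope * 1 + sicpt == f(1)
assert tslope * 1 + ticpt < f(1)
```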

For a concave function \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})\), any hyperplane \(\alpha x+\beta w+\gamma \) that underestimates \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})\) in all vertices of the box \([\underline{x},\overline{x}]\times [\underline{w}_{i+1},\overline{w}_{i+1}]\times \cdots \times [\underline{w}_{{\hat{m}}},\overline{w}_{{\hat{m}}}]\) is a valid linear underestimator, since \(h_i\) is vertex-polyhedral with respect to the box. Maximizing \(\alpha {\hat{x}} + \beta {\hat{w}} + \gamma \) such that \(\alpha x + \beta w + \gamma \) does not exceed \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})\) for all vertices gives an underestimator that is as tight as possible at a given reference point \(({\hat{x}}, {\hat{w}})\). Since the size of this cut-generating LP grows exponentially with the number of variables, underestimators for concave functions in more than 14 variables are currently not computed.

2.4.2 Tighter gradient cuts

The separating hyperplanes generated for convex functions of (\(\text {MINLP}_\text {ext}\)) as discussed in the previous section are, in general, not supporting for the feasible region of (\(\text {MINLP}_\text {ext}\)), because the point where the functions are linearized is not at the boundary of the feasible region. Therefore, often several rounds of cut generation and LP solving are required until the relaxation solution satisfies the convex constraints. Solvers for convex MINLP have handled this problem in various ways [7, 51], but the basic idea is to build gradient cuts at a suitable boundary point of the feasible region.

In SCIP, three procedures for building tighter and/or deeper gradient cuts for convex relaxations are included. The first two methods compute a point on the boundary of the set defined by all convex constraints of (MINLP) that is close to the point to be separated [29]. The first method solves an additional nonlinear program to project the point to be separated onto the convex set. Since solving an NLP for every point to be separated can be quite expensive, the second method, going back to an idea by Veinott [52], does a binary search between an interior point of the convex set and the point to be separated. The interior point is computed once at the beginning of the search by solving an auxiliary NLP. The third method does not aim to separate a given point, but utilizes the feasible points that are found by primal heuristics of SCIP. When a new solution is found, gradient cuts are generated at this solution for convex constraints of (\(\text {MINLP}_\text {ext}\)) and added to the cutpool. If such a cut is later found to separate the relaxation solution, it is added to the LP.
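The binary search of the second method can be sketched as follows (the example set, interior point, and tolerance are ours):

```python
def boundary_point(g, x_int, x_out, tol=1e-8):
    """Binary search on the segment between an interior point x_int
    (g(x_int) < 0) and a point x_out to be separated (g(x_out) > 0) for a
    point on the boundary of the convex set {x : g(x) <= 0}.  A gradient cut
    at the returned point then supports the set.  Sketch of the idea only."""
    lo, hi = 0.0, 1.0  # lo stays on the interior side, hi outside
    point = lambda t: [a + t * (b - a) for a, b in zip(x_int, x_out)]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(point(mid)) <= 0:
            lo = mid
        else:
            hi = mid
    return point(lo)

# Separate (2, 2) from the unit disk {x : x1^2 + x2^2 - 1 <= 0},
# with interior point (0, 0):
g = lambda x: x[0] ** 2 + x[1] ** 2 - 1
xb = boundary_point(g, [0.0, 0.0], [2.0, 2.0])
assert abs(g(xb)) < 1e-6  # xb lies (numerically) on the boundary
```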

All methods are currently disabled as they are not yet efficient in general.

2.5 Quotients

Note that SCIP does not include a dedicated expression handler for quotients, since they can equivalently be written using a product and a power expression. Therefore, the default extended formulation for an expression \(y_1y_2^{-1}\) is given by replacing \(y_2^{-1}\) by a new auxiliary variable w. The linear outer-approximation is then obtained by estimating \(y_1w\) and \(y_2^{-1}\) separately. However, tighter linear estimates are often possible. Therefore, a specialized nonlinear handler checks whether a given function \(h_i(x)\) can be cast as \(f(y) = \frac{ay_1 + b}{cy_2 + d} + e\) with \(a,b,c,d,e\in {\mathbb {R}}\), \(a,c\ne 0\), and \(y_1\) and \(y_2\) being either original variables or subexpressions of \(h_i(x)\). By distinguishing a number of cases, linear estimators are computed, e.g., by exploiting vertex-polyhedrality or by using a formula from [53]. In the univariate case (\(y_1=y_2\)), f is either convex or concave if \(-d/c\not \in [\underline{y}_2,\overline{y}_2]\) and a specialized domain propagation method is used to avoid the dependency problem of interval arithmetic.

2.6 Perspective strengthening

Perspective reformulations have shown to efficiently tighten relaxations of convex mixed-integer nonlinear programs with on/off-structures, which are often modeled via big-M constraints or semi-continuous variables [54]. A variable \(x_j\) is semi-continuous with respect to the binary indicator variable \(x_{j'}\) if it is fixed to a value \(x^0_j\) when \(x_{j'}=0\) and restricted to a domain \([\underline{x}^1_j, \overline{x}^1_j]\) when \(x_{j'}=1\).

In SCIP, a strengthening of under- and overestimators for functions that depend on semi-continuous variables is available. Consider a constraint \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}}) \lesseqgtr w_i\) of (\(\text {MINLP}_\text {ext}\)). A strengthening of under- or overestimators for \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})\) is attempted if the variables that \(h_i\) depends on are semi-continuous with respect to the same indicator variable \(x_{j'}\).

To determine whether a variable is semi-continuous, suitable bounds that are implied by fixing the same binary variable are searched for. The implied bounds can be obtained either from linear constraints directly or by probing, and are stored by SCIP in a globally available data structure. In addition, an auxiliary variable \(w_i\) is found to be semi-continuous if function \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})\) depends only on semi-continuous variables with the same indicator variable.

Assume that a linear underestimator \(\ell (x,w_{i+1},\ldots ,w_{{\hat{m}}})\) has been computed for \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})\). The perspective strengthening extends the underestimator such that it is tight for \(x_{j'}=0\):

$$\begin{aligned}\ell (x,w_{i+1},\ldots ,w_{{\hat{m}}}) + \left( h_i(x^0,w^0_{i+1},\ldots ,w^0_{{\hat{m}}}) - \ell (x^0,w^0_{i+1},\ldots ,w^0_{{\hat{m}}})\right) (1-x_{j'}).\end{aligned}$$

This extension ensures that the estimator is equal to \(h_i(x,w_{i+1},\ldots ,w_{{\hat{m}}})\) for \(x_{j'}=0\), \((x,w) = (x^0,w^0)\), and equal to \(\ell (x,w_{i+1},\ldots ,w_{{\hat{m}}})\) for \(x_{j'}=1\). If \(h_i\) is convex, cuts obtained this way are equivalent to the classic perspective cuts [54]. The method is also applicable when there is a linear part of \(h_i\) that depends on variables that are not semi-continuous and that do not appear in the nonlinear part. For more details on the implementation in SCIP, see [19].
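A minimal sketch (with hypothetical data) that checks the two tightness properties of the strengthened estimator:

```python
def perspective_strengthen(ell, h, x0):
    """Strengthened underestimator ell(x) + (h(x0) - ell(x0)) * (1 - z) for a
    function h of a semi-continuous variable x with off-value x0 and binary
    indicator z.  Direct transcription of the formula in the text, for the
    univariate case."""
    def est(x, z):
        return ell(x) + (h(x0) - ell(x0)) * (1 - z)
    return est

# Hypothetical data: h(x) = x^2 with off-value x0 = 0, and underestimator
# ell(x) = 2x - 1 (the tangent at x = 1).
h = lambda x: x * x
ell = lambda x: 2 * x - 1
est = perspective_strengthen(ell, h, 0.0)

assert est(0.0, 0) == h(0.0)    # tight at the "off" point (z = 0)
assert est(1.0, 1) == ell(1.0)  # reduces to ell when z = 1
```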

2.7 Optimization-based bound tightening

Optimization-Based Bound Tightening (OBBT) is a domain propagation technique which minimizes and maximizes each variable over the feasible set of the problem or a relaxation thereof [55]. Whereas FBBT (see Sect. 2.1.2) propagates the nonlinearities individually, OBBT considers (a relaxation of) all constraints together, and may hence compute tighter bounds, with higher effort.

In SCIP, OBBT solves for each variable \(x_k\) that could be subject to spatial branching two LPs that minimize and maximize the variable with respect to the constraints of the LP relaxation and the objective cutoff constraint \(c^\top x \le U\). The optimal values of these LPs may then be used to tighten the bounds of \(x_k\).
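One OBBT step can be sketched with an off-the-shelf LP solver; the instance data below is hypothetical, SciPy's `linprog` stands in for the LP solver, and SCIP's implementation additionally limits the effort spent:

```python
from scipy.optimize import linprog

def obbt_bounds(k, A_ub, b_ub, bounds, c, U):
    """Tighten the bounds of variable x_k by minimizing and maximizing it
    over the LP relaxation {A_ub x <= b_ub, bounds} intersected with the
    objective cutoff c^T x <= U.  Sketch of a single OBBT step."""
    A = A_ub + [c]          # append the cutoff row c^T x <= U
    b = b_ub + [U]
    n = len(bounds)
    obj = [0.0] * n
    obj[k] = 1.0
    lo = linprog(obj, A_ub=A, b_ub=b, bounds=bounds).fun
    hi = -linprog([-v for v in obj], A_ub=A, b_ub=b, bounds=bounds).fun
    return lo, hi

# Hypothetical data: x1 + x2 <= 3, x in [0, 10]^2, objective -x1 with cutoff
# -x1 <= -1 (a solution of value -1 is known).  OBBT tightens x1 to [1, 3]:
lo, hi = obbt_bounds(0, [[1.0, 1.0]], [3.0], [(0, 10), (0, 10)], [-1.0, 0.0], -1.0)
assert abs(lo - 1.0) < 1e-6 and abs(hi - 3.0) < 1e-6
```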

By default, OBBT is applied at the root node to tighten bounds globally. It restricts the computational effort by limiting the number of iterations spent for solving the auxiliary LPs and interrupting for cheaper domain propagation techniques between LP solves. Further, the dual solutions of the auxiliary LPs are used to derive linear inequalities that serve as computationally cheap approximation of OBBT during the branch-and-bound search. These inequalities are propagated whenever bounds of variables in the inequality become tighter or a new primal solution is found. For further details, see [56].

In addition to OBBT with respect to the LP relaxation, also a variant is available that optimizes variables with respect to the potentially tighter convex NLP relaxation that is given by all linear and convex nonlinear constraints of (MINLP) [29]. Because of the potentially high computational cost of solving many NLPs, this variant of OBBT is deactivated by default.

2.8 Primal heuristics

The purpose of primal heuristics is to find high quality feasible solutions early in the search. When given an MINLP, up to 40 primal heuristics are active in SCIP by default. Many of them aim to find an integer-feasible solution to the LP relaxation. In the following, primal heuristics that are only active in the presence of nonlinear constraints are discussed.

2.8.1 subNLP

A primal heuristic like subNLP is implemented in virtually any MINLP solver. Given a point \({\tilde{x}}\) that satisfies the integrality requirements, the heuristic fixes all integer variables in (MINLP) to the values given by \({\tilde{x}}\). It then calls the SCIP presolver on this subproblem for possible simplifications. Finally, it triggers a solution of the remaining NLP, using \({\tilde{x}}\) as the starting point. If the NLP solver, such as Ipopt, finds a solution that is feasible (and often also locally optimal) for the NLP relaxation, then a feasible point for (MINLP) has been found.

The starting point \({\tilde{x}}\) can be the current solution of the LP relaxation if integer-feasible, a point found by a primal heuristic that searches for integer-feasible solutions of the LP relaxation, or a point that is passed on by other primal heuristics for MINLP, such as those mentioned in the next sections.

2.8.2 Multistart

If (MINLP) is nonconvex after fixing all integer variables, then several local optima may exist for the NLPs solved by heuristic subNLP. Depending on the starting point, the NLP solver may find a different local optimum. Therefore, the multistart heuristic aims to compute several starting points for subNLP.

The algorithm, originally developed in [57], aims to approximate the boundary of the feasible set of the NLP relaxation by sampling points from \([\underline{x},\overline{x}]\) and pushing them towards the feasible set by the use of an inexpensive gradient descent method. Afterwards, points that are relatively close to each other are grouped into clusters. Ideally, each cluster approximates the boundary of some connected component of the feasible set. For each cluster, a linear combination of the points is passed as a starting point to subNLP. For integer variables, the value in the starting point is rounded to an integral value. However, since this most likely leads to infeasible NLPs, the multistart heuristic currently runs for continuous problems only by default. For more details, see [29].

2.8.3 NLP diving

As an alternative to finding a good fixing for all integer variables of (MINLP), the NLP diving heuristic starts by solving the NLP relaxation at the current branch-and-bound node with an NLP solver, using the solution of the LP relaxation as starting point. It then iteratively fixes integer variables with fractional value and resolves both the LP and NLP relaxations, thereby simulating a depth-first-search in a branch-and-bound tree. By default, variables for which the sum of the distances from the solutions of the LP and NLP relaxations to a common integer value is minimal are rounded to the nearest integer value. Further, binary and nonlinear variables are preferred. If the resulting NLP is found to be (locally) infeasible, one-level backtracking is applied, that is, the last fixing is undone, and the opposite fixing is tried.

2.8.4 MPEC

While the NLP diving heuristic either completely omits or enforces integrality restrictions in the NLP relaxation, the MPEC heuristic adds a relaxation of the integrality restriction to the NLP and tightens this relaxation iteratively. The heuristic is only applicable to mixed-binary nonlinear programs at the moment.

The basic idea of the heuristic, originally developed in [58], is to reformulate (MINLP) as a mathematical program with equilibrium constraints (MPEC) and to solve this MPEC to local optimality. The MPEC is obtained by rewriting the condition \(x_i\in \{0,1\}\), \(i\in \mathcal {I}\), as complementarity constraint \(x_i \perp 1 - x_i\). This reformulation is again reformulated to an NLP by writing it as \(x_i\, (1-x_i) = 0\). However, these reformulated complementarity constraints will not, in general, satisfy constraint qualifications. Therefore, in order to increase the chances of solving the NLP reformulation, the heuristic solves regularized versions of the NLP by relaxing \(x_i (1-x_i) = 0\) to \(x_i (1-x_i) \le \theta \), for different, ever smaller \(\theta > 0\). If the NLP solution is close to satisfying \(x_\mathcal {I}\in \{0,1\}^{\vert \mathcal {I}\vert }\), it is passed as starting point to the subNLP heuristic. If an NLP is (locally) infeasible, the heuristic does two more attempts where the values for binary variables that are already close to 0 or 1 are flipped to 1 or 0, respectively. For more details, see [32].

2.8.5 Undercover

While the previous heuristics focused on enforcing the integrality conditions of an NLP, the undercover heuristic [59] starts from a completely different angle. The heuristic is based on the observation that it sometimes suffices to fix only a comparatively small number of variables of (MINLP) to yield a mixed-integer linear subproblem. For example, for a bilinear term, only one of the variables needs to be fixed. A set covering problem is solved to minimize the number of variables to fix. The values for the fixed variables are taken from solutions of the LP or NLP relaxation or a known feasible solution of the MINLP.
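The covering idea can be sketched with a greedy stand-in (SCIP solves the set covering problem exactly; the term list below is hypothetical):

```python
def undercover_greedy(bilinear_terms):
    """Greedy choice of variables to fix so that every bilinear term (i, j)
    has at least one fixed variable, after which the remaining problem is a
    MIP.  Sketch only: SCIP minimizes the number of fixings exactly via a
    set covering problem."""
    uncovered = set(bilinear_terms)
    fixed = set()
    while uncovered:
        # fix the variable appearing in the most uncovered terms
        counts = {}
        for i, j in uncovered:
            counts[i] = counts.get(i, 0) + 1
            counts[j] = counts.get(j, 0) + 1
        best = max(counts, key=counts.get)
        fixed.add(best)
        uncovered = {t for t in uncovered if best not in t}
    return fixed

# Terms x0*x1, x0*x2, x0*x3 are all linearized by fixing x0 alone:
assert undercover_greedy([(0, 1), (0, 2), (0, 3)]) == {0}
```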

The resulting sub-MIP is less complex to solve, and does not need to be solved to proven optimality. The solutions of the sub-MIP are immediately feasible for (MINLP). However, the best one is also passed as starting point to heuristic subNLP to try for further improvement. For more details, see [59].

3 Benchmark

This section aims to present a fair comparison of SCIP with several other state-of-the-art solvers for general MINLP. Doing so is not trivial at all. First, a set of instances needs to be selected that is suitable as a benchmark set. Second, solver parameters have to be set such that all solvers solve the same instances with the same working limits and the same requirements on feasibility and optimality—this goal could not be reached completely. Third, the solvers’ results have to be checked for correctness, or, when this is not possible, plausibility.

GAMS was used for the experiments, as it provides various facilities to help with solver comparisons and comes with current versions of SCIP and the commercial solvers BARON [60], Lindo API [61], and Octeract included.

All computations were run on a Linux cluster with Intel Xeon E5-2670 v2 CPUs (20 cores). The GAMS version is 41.2.0, which includes SCIP 8.0.2, BARON 22.9.30, Lindo API 14.0.5099.162, and Octeract 4.5.1. A GAMS license with all solvers enabled was used, so that SCIP uses CPLEX 22.1.0.0 as LP solver and Ipopt with HSL MA27 as NLP solver, BARON can choose between all LP/MIP/NLP solvers that it interfaces with, and Octeract uses CPLEX 22.1.0.0 as LP/MIP/QP/QCP solver.

3.1 Test set

To construct a test set suitable for benchmarking, the MINLPLib [62] collection of 1595 MINLPs was used as source. First, instances that could not be handled by some solver were excluded. All solvers were then run on the remaining 1505 instances using the parameter settings described below. The results of these runs were then used to select 200 instances that could be solved by at least one solver, that were not trivial for all solvers, had a varying degree of integrality and nonlinearity, and such that including many instances with similar names was avoided. The latter was done to avoid overrepresentation of problems for which many instances were added to MINLPLib.

Since small changes to an instance can lead to large variations in the solver’s performance, the benchmark’s reliability is improved by considering for each instance four additional variants where the order of variables and equations has been permuted. Thus, a test set of 1000 instances is obtained.

The following approach was used to select the benchmark set of 200 instances before permutation: Let I be the initial set of 1505 instances from MINLPLib, \(d_i\) be the fraction of integer variables in instance \(i\in I\), and \(e_i\) be the fraction of nonzeros in the Jacobian and objective function gradient that correspond to nonlinear terms. Next, assign to each instance an identifier \(f_i\in F\) such that instances that seem to come from the same model are assigned the same identifier. This goal is approximated by mapping i to the name of the instance until the first digit, underscore, or dash, except for the block layout design instances fo*, m*, no*, o*, which were all assigned to the same identifier. \(\vert F\vert =230\) different identifiers were found this way.
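The identifier mapping can be sketched as follows; the regular expressions and the special-casing of the block layout design instances are our approximation of the rule described above:

```python
import re

def instance_identifier(name):
    """Map an instance name to a model identifier: the prefix of the name up
    to the first digit, underscore, or dash; block layout design instances
    fo*, m*, no*, o* all share one identifier.  Approximation of the rule in
    the text (the prefixes are assumed to be followed by a digit)."""
    if re.match(r'(fo|m|no|o)\d', name):
        return 'blocklayout'
    return re.split(r'[\d_\-]', name, maxsplit=1)[0]

# Example MINLPLib-style names:
assert instance_identifier('ex1223a') == 'ex'
assert instance_identifier('nvs21') == 'nvs'
assert instance_identifier('fo7_2') == instance_identifier('o7')
```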

Further, let \(\overline{t}_i\) be the largest time in seconds that any solver that did not produce wrong results on instance i spent on instance i. Finally, let S be the number of instances that could be solved by at least one solver.

To ensure that instances with a varying amount of integer variables and nonlinearity are included, the interval [0, 1] was split once at breakpoints 0.05, 0.25, 0.5, 0.9 and once at 0.1, 0.25, 0.5. Let D and E be the resulting partitions of [0, 1]. For every interval from D and E, the aim is to have roughly the same number of instances with \(d_i\) and \(e_i\) in the respective intervals. For the choice of breakpoints that define D and E, the distribution of \(d_i\) and \(e_i\), \(i\in I\), have been taken into account. For example, MINLPLib contains many purely continuous and purely discrete instances, but not many instances that are mostly linear or completely nonlinear.

To avoid including too many instances originating from the same model, including more than two instances for each identifier in F is discouraged. Further, instances that seem trivial, i.e., which are solved by all solvers in no more than five seconds, or could not be solved by any solver are excluded. Introducing penalty terms, the following optimization problem for instance selection is obtained:

This problem was solved for N varying between 180 and 220. For \(N=208\), this yielded a selection of 200 instances with an acceptable penalty value of 106. Table 1 shows the number of instances for each element of \(D\times E\). For five identifiers from F, three instead of two instances were selected, i.e., \(\lambda _f=1\) for five \(f\in F\). Section 1 in the supplement gives the list of selected instances.

Table 1 Number of instances selected with “discreteness” \(d_i\) and “nonlinearity” \(e_i\) in intervals from D and E

3.2 Parameter settings

3.2.1 Missing variable bounds

To compute a lower bound on the optimal value of a minimization problem, all solvers considered here construct a convex relaxation of the given problem. For nonconvex constraints, this often relies on the computation of valid convex underestimators or concave overestimators. As these typically depend on variables’ bounds (recall (1)), an instance with missing or very large bounds on variables in nonconvex terms can be very hard or impossible to solve.

Even when the user forgot to specify some variable bounds, the solver may still be able to derive bounds via domain propagation. Further, once a feasible solution \({\hat{x}}\) has been found, additional bounds may be derived from the inequality \(c^\top x\le c^\top {\hat{x}}\). However, as there are always cases where bounds are still missing after presolve, solvers invented different ways to deal with this obstacle.

If SCIP cannot under- or overestimate because of missing variable bounds, it continues by branching on an unbounded variable. This way, there will eventually be a node in the branch-and-bound tree where all variables are bounded. Nodes that still contain unbounded variable domains may be pruned due to a derived lower bound on the objective function exceeding the incumbent’s objective function value. But it may also be the case that pruning will not be possible and SCIP does not terminate. However, variable bounds after branching cannot grow indefinitely in SCIP, but are limited by \(\pm 10^{20}\) by default. That is, SCIP does not search for solutions with variable values beyond this value.

The other solvers considered here add variable bounds based on a heuristic decision. If BARON is still missing bounds on variables in nonconvex terms after presolve, it sets the bound to a value that depends on the type of nonlinearity involved. Typically, this value is around \(\pm 10^{10}\). BARON also prints a warning and no longer claims to have solved a problem to global optimality, i.e., it does not return a lower bound. Lindo API adjusts the bounds for all variables that are involved in convexification to be within \([-10^{10},10^{10}]\). At termination, it returns the lower bound for the restricted problem. Octeract proceeds similarly and introduces a bound of \(\pm 10^7\) for every missing bound and returns the lower bound for the restricted problem at termination.

Evidently, passing an instance with unbounded variables to several solvers with default settings may mean that each solver solves a different subproblem of the actual problem and often also reports a lower bound that corresponds to the solved subproblem only. Fortunately, parameters are available to adjust the treatment of unbounded variables. A first impulse could be to tell all solvers to set missing bounds to infinity, but this is not possible as each solver treats values beyond a different finite value as “infinity” (BARON: \(10^{50}\), Octeract: \(10^{308}\), SCIP: \(10^{20}\)). Changing this value is either not possible or not advisable.

We therefore decided to aim for \(\pm 10^{12}\) as replacement for a missing variable bound. For BARON and SCIP, the GAMS interface can replace any missing bound by \(\pm 10^{12}\) before the instance is passed to the solver. BARON will hence also return a lower bound for this restricted problem. For Lindo API, a solver parameter can be changed so that bounds for all variables subject to convexification are bounded by \(\pm 10^{12}\) (instead of \(\pm 10^{10}\)). Finally, also for Octeract, all missing bounds are set to \(\pm 10^{12}\) (instead of \(\pm 10^7\)) by changing a solver parameter. Note that this still does not ensure that all solvers solve the same instance, since Lindo API may still change initial finite bounds beyond \(10^{12}\) and may not bound variables that are not involved in convexification.

Next to missing bounds on problem variables, also singularities in functions (e.g., 1/x, \(\log (x)\)) can make finite estimators unavailable. Unfortunately, there are no parameters available to ensure a uniform treatment of this case in all solvers. SCIP ensures that the variable in \(x^p\), \(p<0\), or \(\log (x)\) is bounded away from zero by \(10^{-9}\), and terminates with a lower bound for this modified problem. BARON applies the same method as the one for missing variable bounds to choose a suitable bound on x. No lower bound is returned at termination then. The methods in Lindo API and Octeract are not known to us.

3.2.2 Solution quality

To ensure that all solvers return solutions of the same quality, constraints of (MINLP) are required to be satisfied with an absolute tolerance of \(10^{-6}\). This applies to linear and nonlinear equations, variable bounds, and integrality.

In addition, a tolerance on the proof of optimality is set. For this purpose, typically, solvers are allowed to stop when the absolute or relative gap between lower and upper bounds on the optimal value is sufficiently small. Since the test set is diverse and has optimal values of varying magnitude, setting only a relative gap limit and no absolute gap limit would be preferable. Unfortunately, Octeract does not permit different values for these limits. As a compromise, BARON, Lindo API, and SCIP are run with \(10^{-4}\) as relative gap limit and \(10^{-6}\) as absolute gap limit, while for Octeract, \(10^{-6}\) is used for both the absolute and relative gap limit. Section 2.2 in the supplement shows that this tighter optimality tolerance has essentially no effect on the performance of Octeract.
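The resulting stopping test can be sketched as follows; solvers differ in how they define the relative gap, and the denominator used below is one common choice (ours):

```python
def gap_closed(lower, upper, rel=1e-4, abs_=1e-6):
    """Termination test on the optimality gap: stop when the absolute gap or
    the relative gap |U - L| / max(|L|, |U|) falls below its limit.  Sketch
    with the tolerances used for BARON, Lindo API, and SCIP in this study."""
    gap = abs(upper - lower)
    if gap <= abs_:
        return True
    denom = max(abs(lower), abs(upper))
    return denom > 0 and gap / denom <= rel

assert gap_closed(1000.0, 1000.05) is True  # relative gap below 1e-4
assert gap_closed(0.0, 5e-7) is True        # absolute gap below 1e-6
assert gap_closed(1.0, 2.0) is False
```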

3.2.3 Working limits

As working limits, a time limit of two hours is used and the jobs on the cluster are restricted to 50 GB of RAM. Further, parallelization functionality has been disabled. For a comparison with parallelization enabled, see [14].

3.2.4 Summary

To summarize, the following parameters are used:

  1. GAMS

    (applied to all solvers): optcr=1e-4, optca=1e-6, reslim=7200, workspace=50000, threads=1

  2. BARON:

    InfBnd=1e12, AbsConFeasTol=1e-6, AbsIntFeasTol=1e-6

  3. Lindo API:

    GOP_BNDLIM=1e12, SOLVER_FEASTOL=1e-6

  4. Octeract:

    INFINITY=1e12, INTEGRALITY_VIOLATION_TOLERANCE=1e-6

  5. SCIP:

    gams/infbound=1e12, constraints/nonlinear/linearizeheursol=o (this undoes a change in the algorithmic settings of SCIP that is part of the GAMS/SCIP interface)

3.3 Correctness checks

A run of a solver on an instance is marked as failed if the solver terminated abnormally, the solution is not feasible with respect to the feasibility tolerance, or the lower or upper bound contradicts the bounds on the optimal value that are specified on the MINLPLib page.

A run that has not failed is marked as solved if the relative or absolute gap limits are satisfied. If a solver stopped without closing the gap before the time limit, then the solver time is changed to the time limit. The only exception here is BARON, which stops on two instances before the time limit without reporting a lower bound due to singularities in functions (see Sect. 3.2.1). To be consistent with the treatment of other solvers, these two instances were accounted as solved by BARON with the original solver time.

3.4 Results

Table 2 shows for each solver the number of instances that could be solved, how often the time limit was reached, and the number of runs that were marked as failed. In addition, the number of instances for which a solution with objective value not more than 1% worse than the best solution found by any considered solver is shown. Finally, the shifted geometric mean of the running time of the solver is provided. The shift has been set to 1 s. Here, instances that failed are accounted with the time limit. In addition, results for the virtual best and virtual worst solver are reported, which are obtained by picking for each instance the fastest or slowest solver (best or worst objective function value for “best obj.” column), respectively. The performance profile in Fig. 1 shows the number of instances a solver solved with a time that is at most a factor of the fastest solver’s time. Section 2.1 in the supplement provides detailed results.
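For times \(t_1,\ldots,t_n\) and shift s, the shifted geometric mean is \(\bigl (\prod _{i}(t_i+s)\bigr )^{1/n}-s\); a small sketch:

```python
import math

def shifted_geometric_mean(times, shift=1.0):
    """Shifted geometric mean (prod_i (t_i + s))^(1/n) - s, with the shift
    s = 1 second as used for Table 2."""
    n = len(times)
    return math.exp(sum(math.log(t + shift) for t in times) / n) - shift

# With shift 0 this is the ordinary geometric mean:
assert abs(shifted_geometric_mean([2.0, 8.0], shift=0.0) - 4.0) < 1e-9

# The shift damps the influence of very small times:
assert shifted_geometric_mean([0.01, 100.0]) > \
       shifted_geometric_mean([0.01, 100.0], shift=0.0)
```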

Table 2 Aggregated performance data for all solvers on test set of 1000 instances
Fig. 1 Performance profile comparing all solvers

The results show a small lead of BARON over SCIP with respect to the number of instances solved, the number of instances on which a best solution was found, and the average time. Since the number of timeouts is almost equal, one could argue that it is the higher stability of BARON that puts it in first place here. In fact, the 41 fails of SCIP are due to returning a wrong optimal value 16 times, returning an infeasible solution 23 times, and two aborts due to numerical troubles. For BARON, the fails are due to returning a wrong optimal value 26 times and an infeasible solution only once. While SCIP 8 has made a large step forward in ensuring that nonlinear constraints are satisfied in the non-presolved problem, violations of linear constraints or variable bounds still occur for a few instances. These are typically due to variables being aggregated during presolve.

Even though Octeract and Lindo API solved considerably fewer instances than BARON and SCIP, which also results in an increased mean time, it is noteworthy that each of the two is also the fastest solver on 270 and 66 instances, respectively. Octeract also produced correct results for 95% of the test set, while for Lindo API a relatively high number of wrong optimal values, infeasible solutions, or aborts is observed.

The large differences between the real and virtual solvers show that no solver dominates all others, nor is any solver dominated by the rest.

4 Conclusion

The development of the MINLP solver in SCIP has come a long way. A recent version-to-version comparison [13, slides 49–51] measured a steady improvement in the performance of SCIP on MINLP over the last ten years, with SCIP 8 solving twice as many instances as SCIP 3 and achieving a speed-up by a factor of three. In part, this improvement has been achieved by improving and adding features specific to MINLP. However, due to the generality of SCIP as a CIP solver, many developments that targeted MIP solving were also immediately available for MINLP solving.

With version 8, the MINLP solving capabilities of SCIP have been largely reworked and extended, which resulted in a considerable improvement in both robustness and performance [13, 16]. As a result, SCIP’s performance is currently on par with the state-of-the-art commercial solver BARON.

In contrast to the commercial solvers considered here, SCIP offers a variety of possibilities for a user, developer, or researcher to interact with the solving process. In particular, the newly added "nonlinear handler" plugin type sets SCIP apart from most other MINLP solvers, as it allows experimenting with new algorithms for handling certain structures in nonlinear functions without modifying the solver's code.

The rather large number of features that are disabled by default shows that tuning and improving the existing code base has become increasingly necessary. Of course, new features will also be added in the future, e.g., improved separation for signomial functions [63], alternative relaxations for polynomial functions [64], or monoidal strengthening of intersection cuts for quadratics [65].

5 Supplementary information

A supplement with all data generated or analyzed for the computational experiments during this study is available online.