Global Optimization of Mixed-Integer Nonlinear Programs with SCIP 8

For over ten years, the constraint integer programming framework SCIP has been extended by capabilities for the solution of convex and nonconvex mixed-integer nonlinear programs (MINLPs). With the recently published version 8.0, these capabilities have been largely reworked and extended. This paper discusses the motivations for recent changes and provides an overview of features that are particular to MINLP solving in SCIP. Further, difficulties in benchmarking global MINLP solvers are discussed and a comparison with several state-of-the-art global MINLP solvers is provided.


Introduction
Mixed-integer nonlinear programming (MINLP) concerns the optimization of an objective function subject to a finite set of linear or nonlinear constraints and integrality conditions. The generality of this problem class means that many real-world applications can be modeled as MINLPs [28,37,58,69], but also that software that can handle this class efficiently becomes extremely complex. Solvers for MINLP [19] are often built on top of, or by combining, solvers for mixed-integer linear programming (MIP) and solvers that find locally optimal solutions for nonlinear programs (NLPs). In fact, one of the first commercial MINLP solvers, SCICONIC [7], extends a MIP solver by piecewise linear approximations of low-dimensional nonlinear terms. The first general-purpose solver was DICOPT [41], which decomposes the solution of an MINLP into a sequence of MIP and NLP solves [25], thereby building on established software for these two program classes. DICOPT can solve MINLPs where nonlinear constraints are convex to optimality, but works only as a heuristic on nonconvex MINLPs. The first general-purpose solvers to also solve nonconvex MINLPs to optimality were αBB, BARON, and GLOP [4,60,65], all based on convexification techniques for nonconvex constraints. The solver SCIP (Solving Constraint Integer Programs), for which this paper provides an overview, also belongs to the latter category.
In the following, MINLPs of the form

    min  c⊤x
    s.t. g̲ ≤ g(x) ≤ ḡ,
         b̲ ≤ Ax ≤ b̄,
         x̲ ≤ x ≤ x̄,
         x_i ∈ Z for i ∈ I,                                   (MINLP)

are considered, where x̲, x̄ ∈ R̄^n with R̄ := R ∪ {±∞} and x̲ ≤ x̄, I ⊆ {1, . . . , n}, c ∈ R^n, g̲, ḡ ∈ R̄^m with g̲ ≤ ḡ, g : R^n → R^m is specified explicitly in algebraic form, b̲, b̄ ∈ R̄^m' with b̲ ≤ b̄, and A ∈ R^{m'×n}. The restriction to a linear objective function is a technical detail of SCIP and without loss of generality.
The software SCIP has been designed as a branch-cut-and-price framework to solve different types of optimization problems, most generally constraint integer programs (CIPs), and most importantly MIPs and MINLPs. Roughly speaking, CIPs are finite-dimensional optimization problems with arbitrary constraints and a linear objective function that satisfy the following property: if all integer variables are fixed, the remaining subproblem must form a linear or nonlinear program. The problem class of CIP was motivated by the modeling flexibility of constraint programming and the algorithmic requirements of integrating it with efficient solution techniques available for MIP [1].
In order to solve CIPs, SCIP constructs relaxations, typically linear programs (LPs). If the relaxation solution is not feasible for the current subproblem, the plugins that handle the violated constraints need to take measures to eventually render the relaxation solution infeasible for the updated relaxation, for example by branching or separation [1]. A plethora of additional plugin types, e.g., for presolving, finding feasible solutions, or tightening variable bounds, allows accelerating the solution process. After 20 years of development of the framework itself and the included plugins, SCIP provides mature solvers for MIP and MINLP, as well as for several other problem classes. Since November 2022, SCIP is freely available under an open-source license.
SCIP solves problems like (MINLP) to global optimality via a spatial branch-and-bound algorithm that mixes branch-and-infer and branch-and-cut [9]. Important parts of the solution algorithm are presolving, domain propagation (that is, tightening of variable bounds), linear relaxation, and branching. A distinguishing feature of SCIP is that its capabilities to handle nonlinear constraints are not limited to MINLPs, but can be used for any CIP. For example, problems can be handled where linear and nonlinear constraints are mixed with typical constraints from constraint programming, as long as appropriate constraint handlers have been included in SCIP. Since most constraint handlers in SCIP construct a linear relaxation of their constraints, the handling of nonlinear constraints also focuses on linear relaxations. The emphasis on handling CIPs with nonlinear constraints rather than MINLPs only is also a reason that the use of nonlinear relaxations or reformulations of complete MINLPs into other problem types, e.g., mixed-integer conic programs, has not been explored much so far.
The development of SCIP initially focused on solving CIPs where fixing all integer variables resulted in a linear program [1]. However, it was soon realized that this requirement was not actually enforced by the implementation. As long as constraint handlers were able to resolve infeasibilities by separation, branching, or other means, the problem could be handled by SCIP. First experiments on handling nonlinear constraints in continuous variables were conducted for bilinear mixing constraints in mine production planning [18]. The positive results of these experiments motivated the decision to include support for more general nonlinear constraints. With version 1.2 (2009), initial support for quadratic constraints (convex or nonconvex) and for solving quadratically constrained programs (QCPs) to local optimality with Ipopt [75] was added [13]. For version 2.0 (2010), a primal heuristic that solves sub-MIPs was added [10] and other large-neighborhood-search heuristics were extended to create sub-MINLPs [12]. Further, second-order cone constraints in three variables could be handled. More general nonlinear constraints, specified in algebraic form, were first supported by SCIP 2.1 (2011) [73]. Next to the specialized treatment of quadratic constraints, handlers for signpower constraints (sign(x)|x|^p = z for some p > 1) [32] and 1-convex bivariate constraints (f(x, y) = z for f being convex or concave whenever x or y is fixed) [6] were added.
With the basic handling of nonlinear constraints in place [74], the next releases were dedicated to adding features that improved performance. SCIP 3.0 (2012) brought optimization-based bound tightening (OBBT) [33] and an NLP diving heuristic. SCIP 3.2 (2015) added a reformulation of general quadratic constraints into second-order cone constraints and separation for edge-concave decompositions of quadratic constraints [30]. With SCIP 4.0 (2017), higher-dimensional second-order cone constraints were disaggregated, KKT conditions for quadratic programs were utilized, multiple starting points were tried for NLP solves, solutions of the LP relaxation were projected onto a convex NLP relaxation, and OBBT could also be performed on the NLP instead of the LP relaxation [47]. Improved convexification of bilinear constraints by use of additional linear constraints [54], a new primal heuristic that solves a sequence of NLP reformulations, and interfaces to the NLP solvers filterSQP and Worhp [27,20,56] were added for SCIP 5.0 (2017) [34]. The following two major releases brought a branch-and-price based solver for ring-packing [35] (SCIP 6.0, 2018) and support for convex nonlinear subproblems in Benders decomposition (SCIP 7.0 [31], 2020).
That versions 6 and 7 added comparatively few features for MINLP was due to an ongoing complete overhaul of the way nonlinear constraints are handled. The primary motivation for this change, which was released with SCIP 8.0 (2022) [14], was to increase the reliability of the solver and to alleviate numerical issues that arose from problem reformulations and led to SCIP returning solutions that are feasible for the reformulated problem, but infeasible for the original problem. More precisely, previous SCIP versions built an extended formulation of (MINLP) explicitly, with the consequence that the original constraints were no longer included in the presolved problem. Even though the formulations were theoretically equivalent, it was possible that ε-feasible solutions for the reformulated problem were not ε-feasible for the original problem. SCIP 8 remedies this by building an implicit extended formulation as an annotation of the original problem. A second motivation for the major changes in SCIP 8 was to reduce the ambiguity of expression and nonlinear structure types by implementing different plugin types for low-level structure types that define expressions, and high-level structure types that add functionality for particular, sometimes overlapping structures. Finally, new features for improving the solver's performance on MINLPs were introduced with SCIP 8. These include intersection, SDP (semidefinite programming), and RLT (reformulation-linearization technique) cuts for quadratic expressions [21,16], perspective strengthening [15], and symmetry detection [76].
SCIP can read MINLPs from files in the following formats: LP, MPS, NL (AMPL), OSiL, PIP, and ZIMPL. In addition, problems can be passed to SCIP via interfaces to a variety of programming languages and modeling packages, including AMPL, C, GAMS, Java, Julia, Python, and MATLAB.
The following section provides an overview of the MINLP solving capabilities of SCIP. Afterwards, the performance of SCIP is compared with that of other state-of-the-art global solvers for MINLP.

MINLP capabilities of SCIP
In the following, an overview of the facilities available in SCIP that are specific to the handling of MINLPs is provided. First, the available nonlinear functions are listed and the integration of nonlinear constraints into the branch-and-cut solver of SCIP is discussed. Next, the concept of a nonlinear handler is introduced, which is a new plugin type that was added with SCIP 8 and facilitates the integration of extensions that handle specific nonlinear structures. The remainder of this section gives an overview of features available in SCIP that increase the efficiency of MINLP solving, e.g., cut generators to tighten the linear relaxation, presolve reductions to simplify the problem, and primal heuristics to find feasible solutions early.
To be concise, the presentation is limited to high-level descriptions that spare technical details. Unless specified otherwise, further details can be found in [14].

Expressions
Algebraic expressions are well-formed combinations of constants, variables, and various algebraic operations such as addition, multiplication, and exponentiation that are used to describe mathematical functions. They are often represented by a directed acyclic graph with nodes representing variables, constants, and operations and arcs indicating the flow of computation, see Figure 1 for an example. Also in SCIP, expressions are stored as directed acyclic graphs, while all semantics of expression operands are defined by expression handler plugins. These handlers implement callbacks that are used by methods in the SCIP core to manage expressions (create, modify, copy, free, parse, print), to evaluate them and compute derivatives at a point, to evaluate them over intervals, to simplify them, to identify common subexpressions, to check curvature and integrality, and to iterate over them. Some additional expression handler callbacks are used exclusively by the constraint handler for nonlinear constraints (Section 2.1.2).
Expression handlers for the following operators are included in SCIP 8.0:
• val: scalar constant;
• var: a SCIP variable;
• sum: an affine-linear function, y ↦ a_0 + Σ_{j=1}^k a_j y_j for y ∈ R^k with constant coefficients a ∈ R^{k+1};
• prod: a product, y ↦ c Π_{j=1}^k y_j for y ∈ R^k with constant factor c ∈ R;
• pow: a power with a constant exponent, y ↦ y^p for y ∈ R and exponent p ∈ R (if p ∉ Z, then y ≥ 0 is required);
• signpower: a signed power, y ↦ sign(y)|y|^p for y ∈ R and constant exponent p ∈ R, p > 1;
• exp: exponentiation, y ↦ exp(y) for y ∈ R;
• log: natural logarithm, y ↦ log(y) for y ∈ R_{>0};
• entropy: y ↦ −y log(y) if y > 0 and 0 if y = 0, for y ∈ R_{≥0};
• sin: sine, y ↦ sin(y) for y ∈ R;
• cos: cosine, y ↦ cos(y) for y ∈ R;
• abs: absolute value, y ↦ |y| for y ∈ R.
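To illustrate how per-operator callbacks (point evaluation, interval evaluation) combine with a shared DAG representation, the following is a minimal Python sketch; the class layout and names are invented for illustration only and do not reflect SCIP's actual C API:

```python
# Sketch of an expression DAG where each operator supplies point and
# interval evaluation (illustrative names, not SCIP's implementation).

class Expr:
    def __init__(self, op, children=(), data=None):
        self.op, self.children, self.data = op, tuple(children), data

def eval_point(e, x):
    if e.op == "var":
        return x[e.data]
    if e.op == "val":
        return e.data
    vals = [eval_point(c, x) for c in e.children]
    if e.op == "sum":
        a0, a = e.data                    # constant term and coefficients
        return a0 + sum(ai * v for ai, v in zip(a, vals))
    if e.op == "pow":
        return vals[0] ** e.data
    raise NotImplementedError(e.op)

def eval_interval(e, box):
    if e.op == "var":
        return box[e.data]
    if e.op == "val":
        return (e.data, e.data)
    ivs = [eval_interval(c, box) for c in e.children]
    if e.op == "sum":
        a0, a = e.data
        lo = a0 + sum(ai * (iv[0] if ai >= 0 else iv[1]) for ai, iv in zip(a, ivs))
        hi = a0 + sum(ai * (iv[1] if ai >= 0 else iv[0]) for ai, iv in zip(a, ivs))
        return (lo, hi)
    if e.op == "pow" and e.data == 2:     # range of y^2 over an interval
        l, u = ivs[0]
        cands = [l * l, u * u]
        lo = 0.0 if l <= 0.0 <= u else min(cands)
        return (lo, max(cands))
    raise NotImplementedError(e.op)

# f(x, y) = x^2 + 2*y encoded as sum with constant 0 and coefficients (1, 2)
x, y = Expr("var", data="x"), Expr("var", data="y")
f = Expr("sum", [Expr("pow", [x], data=2), y], data=(0.0, [1.0, 2.0]))
print(eval_point(f, {"x": 3.0, "y": 1.0}))                     # 11.0
print(eval_interval(f, {"x": (-1.0, 2.0), "y": (0.0, 1.0)}))   # (0.0, 6.0)
```

The interval evaluation of the pow node returns [0, 4] rather than the naive [1, 4] because 0 lies inside the argument's interval, mirroring the kind of operator-specific knowledge an expression handler encapsulates.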
In previous versions of SCIP, high-level structures such as quadratic functions could also be represented as expression types. To avoid ambiguity and reduce complexity, this has been replaced by a recognition of quadratic expressions that is no longer made explicit by a change of the expression type.

Constraint Handler for Nonlinear Constraints
All nonlinear constraints g̲ ≤ g(x) ≤ ḡ of (MINLP) are handled by the constraint handler for nonlinear constraints in SCIP, while the linear constraints b̲ ≤ Ax ≤ b̄ are handled by the constraint handler for linear constraints and its specializations (e.g., knapsack, set-covering). A constraint handler is responsible for checking whether solutions satisfy its constraints and, if that is not the case, for resolving the infeasibility by enforcing its constraints. This applies in particular to solutions of the LP relaxation. The nonlinear constraint handler currently enforces constraints by the following means:
• DOMAINPROP: by analyzing the nonlinear constraints with respect to the variable bounds at the current node of the branch-and-bound tree, infeasibility or a bound tightening may be deduced, which allows pruning the node or cutting off the given solution, respectively; this is also known as domain propagation;
• SEPARATE: a cutting plane that is violated by the given solution may be computed;
• BRANCH: the current node of the branch-and-bound tree is subdivided, that is, a variable x_i and a branching point x̂_i ∈ [x̲_i, x̄_i] are selected and two child nodes with x_i restricted to [x̲_i, x̂_i] and [x̂_i, x̄_i], respectively, are created.
To decide whether a node can be pruned (DOMAINPROP), an overestimate of the range of g(x) with respect to the current variable bounds is computed by means of interval arithmetic [53]. If a constraint k is found such that g_k([x̲, x̄]) ∩ [g̲_k, ḡ_k] = ∅, then there exists no point in [x̲, x̄] for which this constraint is feasible. A bound tightening may be computed by applying the same methods in reverse order. That is, interval arithmetic is used to overestimate g⁻¹([g̲, ḡ]), the preimage of [g̲, ḡ] under g, and the variable bounds are tightened to [x̲, x̄] ∩ g⁻¹([g̲, ḡ]). This is also known as feasibility-based bound tightening (FBBT). In the simplest case, callbacks of expression handlers are used to propagate intervals through expressions. However, in some cases, other methods that take more structure into account or that use additional information to tighten variable bounds and constraint sides are used (see, e.g., Sections 2.3.1 and 2.3.2).
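As a concrete FBBT illustration, consider the single constraint x² + y ≤ 4. The following sketch (not SCIP code) performs one forward interval pass to detect infeasibility and one backward pass to tighten the bounds of x and y:

```python
# Sketch of feasibility-based bound tightening (FBBT) for the single
# constraint x**2 + y <= rhs (illustrative, not SCIP's implementation).

def sq_interval(l, u):
    """Range of v**2 for v in [l, u]."""
    lo = 0.0 if l <= 0.0 <= u else min(l * l, u * u)
    return lo, max(l * l, u * u)

def fbbt_square_plus_y(xb, yb, rhs):
    (xl, xu), (yl, yu) = xb, yb
    s_lo, _ = sq_interval(xl, xu)
    # Forward pass: if even the smallest possible value of the left-hand
    # side exceeds rhs, the node can be pruned.
    if s_lo + yl > rhs:
        return None
    # Backward pass: y <= rhs - min(x**2), and x**2 <= rhs - min(y).
    yu = min(yu, rhs - s_lo)
    t = rhs - yl                      # upper bound on x**2
    if t >= 0:
        r = t ** 0.5                  # x must lie in [-sqrt(t), sqrt(t)]
        xl, xu = max(xl, -r), min(xu, r)
    return (xl, xu), (yl, yu)

# x in [0, 10], y in [3, 5]: x**2 <= 4 - 3 = 1, so x in [0, 1]; y <= 4.
print(fbbt_square_plus_y((0.0, 10.0), (3.0, 5.0), 4.0))
# ((0.0, 1.0), (3.0, 4.0))
```

The backward pass is exactly the "reverse order" application described above: the interval of the composite expression is intersected with the constraint sides, and the result is propagated back to the arguments.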
To construct a linear relaxation of the nonlinear constraints (SEPARATE), an extended formulation

    min  c⊤x
    s.t. h_i(x, w_{i+1}, . . . , w_m̃) ⋈_i w_i,  i = 1, . . . , m̃,
         b̲ ≤ Ax ≤ b̄,
         x̲ ≤ x ≤ x̄,  w̲ ≤ w ≤ w̄,
         x_i ∈ Z for i ∈ I,                                   (MINLP_ext)

with ⋈_i ∈ {≤, =, ≥}, is considered. The functions h_i are obtained from the expressions that define the functions g_i by recursively annotating subexpressions with auxiliary variables w_{m+1}, . . . , w_m̃ for some m̃ ≥ m. Initially, slack variables w_1, . . . , w_m are introduced and assigned to the roots of all expressions, i.e., h_i := g_i, w̲_i := g̲_i, w̄_i := ḡ_i, for i = 1, . . . , m. Next, for each function h_i, subexpressions f may be assigned new auxiliary variables w_{i'}, i' > m, which results in extending (MINLP_ext) by additional constraints h_{i'}(x) = w_{i'} with h_{i'} := f. The bounds w̲_{i'} and w̄_{i'} are initialized to bounds on h_{i'}, if available. Since auxiliary variables in a subexpression of h_i always receive an index larger than max(m, i), the result is referred to as h_i(x, w_{i+1}, . . . , w_m̃) for any i = 1, . . . , m̃; that is, to simplify notation, w_{i+1} is used instead of w_{max(i,m)+1}. If a subexpression appears in several expressions, it is assigned at most one auxiliary variable and reindexing may be necessary to have h_i depend on x and w_{i+1}, . . . , w_m̃ only. For the (in)equality sense ⋈_i, a valid simplification would be to assume equality everywhere. For performance reasons, though, it can be beneficial to relax certain equalities to inequalities if that does not change the feasible set of (MINLP_ext) when projected onto x. Therefore, for i > m, the monotonicity of expressions is taken into account to derive ⋈_i.
Whether to annotate a subexpression with an auxiliary variable depends on the structures that are recognized. In the simplest case, every subexpression that is not already a variable is annotated with an auxiliary variable. This essentially corresponds to the standard form of Smith and Pantelides [65]. For every function h_i of (MINLP_ext), the callbacks of the corresponding expression handler can then be used to compute linear under- and overestimators, such that a linear relaxation of (MINLP_ext) is constructed. It can, however, be beneficial to not add an auxiliary variable for every subexpression, thus allowing for more complex functions in (MINLP_ext). This will be discussed in Section 2.1.3 below.
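The "one auxiliary variable per non-variable subexpression" annotation can be sketched in a few lines; the tuple encoding of expressions and all names below are invented for illustration:

```python
# Sketch: assign an auxiliary variable to every non-variable node of an
# expression tree, recording one constraint op(args) = w per node.
# Expressions are encoded as ("var", name) or ("op", opname, child, ...).

from itertools import count

def smithify(expr, cons, counter):
    """Return a variable name for expr; append (w, op, args) to cons
    for each non-variable node, meaning op(args) = w."""
    if expr[0] == "var":
        return expr[1]
    args = [smithify(c, cons, counter) for c in expr[2:]]
    w = f"w{next(counter)}"
    cons.append((w, expr[1], args))
    return w

# log(x)^2 + y  ->  w1 = log(x), w2 = sqr(w1), w3 = plus(w2, y)
e = ("op", "plus", ("op", "sqr", ("op", "log", ("var", "x"))), ("var", "y"))
cons = []
root = smithify(e, cons, count(1))
for w, op, args in cons:
    print(f"{w} = {op}({', '.join(args)})")
```

Each recorded constraint involves only one operator applied to variables, so operator-wise under- and overestimators suffice to relax the whole system.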
By annotating the root of the expression graph with a slack variable w_1 and every other non-variable node with an auxiliary variable, the extended formulation

    w_2 + 2w_3 + w_4 ≤ w_1,
    w_5² ≤ w_2,  w_5·y ≤ w_3,  y² ≤ w_4,  log(x) = w_5

is obtained. Bounds on the auxiliary variables have been omitted here. The constraints w_5² = w_2, w_5·y = w_3, and y² = w_4 were relaxed to inequalities because w_2 + 2w_3 + w_4 is monotonically increasing in each variable. However, to relax log(x) = w_5 to log(x) ≤ w_5, both w_5² and w_5·y would need to be monotonically increasing in w_5. This would be the case if x ≥ 1 and y ≥ 0.
If a constraint h_i(x, w_{i+1}, . . . , w_m̃) ≤ w_i (the ≥-case is analogous) of (MINLP_ext) is violated and h_i is nonconvex, then linear underestimators of h_i can only be as tight as the convex envelope of h_i. Therefore, it may not be possible to find a hyperplane that is violated by the solution of the LP relaxation. Since the convex envelope of h_i depends on the bounds of the variables appearing in h_i, these variables are candidates for branching (BRANCH). More precisely, when an expression handler computes a linear under- or overestimator for h_i(x, w_{i+1}, . . . , w_m̃), it also signals for which variables it used the current variable bounds. Marked original variables are then added to the list of branching candidates. For a marked auxiliary variable w_{i'}, i' > i, the variables in the subexpression that h_{i'} represents are considered for branching instead.
The decision whether to add a cutting plane that separates the solution of the LP relaxation or to branch is rather complex, but the idea is to branch if either no cutting plane is found or if the violation of the available cutting planes in the relaxation solution is small when compared to the convexification gap of the under-/overestimators that define the cutting planes. In the latter case, it may be beneficial to first reduce the convexification gap by branching. To select one variable from the list of branching candidates, the violation of constraints in (MINLP_ext) and historical information about the effect of branching on a given variable on the optimal value of the LP relaxation ("pseudo costs") are taken into account. The branching point is a convex combination of the value of the variable in the LP relaxation and the midpoint of the variable's interval.
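The branching-point rule can be sketched directly; the weight alpha and the interiority margin below are illustrative parameters, not SCIP's actual defaults:

```python
# Sketch of the branching-point rule: a convex combination of the value
# in the LP relaxation and the interval midpoint (alpha is illustrative).

def branching_point(x_lp, lb, ub, alpha=0.75):
    mid = 0.5 * (lb + ub)
    x_hat = alpha * x_lp + (1.0 - alpha) * mid
    # Keep the point in the interior of [lb, ub] so that both child
    # intervals are strictly smaller than the parent interval.
    eps = 0.1 * (ub - lb)
    return min(max(x_hat, lb + eps), ub - eps)

print(branching_point(0.9, 0.0, 1.0))
```

Pulling the branching point away from the LP value toward the midpoint hedges against repeatedly branching at nearly identical points when the LP solution sits close to a bound.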

Nonlinear Handlers
In the previous example, four auxiliary variables were introduced to construct the extended formulation. This is due to the expression handlers having a rather myopic view, basically implementing techniques that can handle only their direct children. It is clear that, for this example, an extended formulation that only replaces log(x) by an auxiliary variable w_2 could be more efficient to solve. However, this requires methods to detect the quadratic (or convex) structure and to either compute linear underestimators for the quadratic (convex) expression w_2² + 2w_2·y + y² or to separate cutting planes for the set defined by w_2² + 2w_2·y + y² ≤ w_1. Such structure detection and handling methods are the task of the new nonlinear handler plugins that were introduced with SCIP 8. Nonlinear handlers determine the extended formulation (MINLP_ext) by deciding when to annotate subexpressions with auxiliary variables. That is, given a constraint h_i(x) ⋈_i w_i, a nonlinear handler analyzes the expression that defines h_i and attempts to detect specific structures. At this point, it may also request to introduce additional auxiliary variables, thus changing h_i(x) into h_i(x, w_{i+1}, . . . , w_m̃). In addition, it informs the constraint handler that it will provide separation for h_i(x, w_{i+1}, . . . , w_m̃) ≤ w_i, or ≥ w_i, or both. If none of the nonlinear handlers declares that it will handle h_i(x) ⋈_i w_i, auxiliary variables are introduced for each argument of the root of the expression h_i and expression handler callbacks are used to construct cutting planes from linear under-/overestimators.
In addition to separation, nonlinear handlers can also contribute to domain propagation. This is implemented analogously to separation by setting up an additional extended formulation similarly to (MINLP ext ), with the main difference that slack and auxiliary variables are not actually created in SCIP and equalities are currently not relaxed to inequalities.
Note that the extended formulations are stored as annotations on the original expressions. Thus, for each task, the most suitable formulation can be used. For example, feasibility is checked on the original constraints, domain propagation and separation use the corresponding extended formulations, but branching is performed, by default, with respect to original variables only. With SCIP 7 and earlier, only one extended formulation was constructed explicitly and the connection to the original formulation was no longer available, which led to issues because solutions were not guaranteed to be (ε-)feasible for the original constraints.
In addition to the improved numerical reliability, the nonlinear handlers also allow for a higher flexibility when handling nonlinear structures. For each node in an expression, more than one nonlinear handler can be attached, each one annotating possibly different subexpressions with auxiliary variables. For example, for a nonconvex quadratic constraint Σ_{i,j} a_{i,j} x_i x_j ≤ w, the nonlinear handler for quadratics can declare that it will provide separation (by intersection cuts, see Section 2.3.5), but that other means of separation should also be tried. However, since no other nonlinear handler declares that it will provide separation, auxiliary variables are introduced for each argument of the sum, that is, an auxiliary variable X_{ij} is assigned to each product x_i x_j. For the corresponding constraints x_i x_j ≤ X_{ij} (if a_{i,j} ≥ 0), the well-known McCormick underestimators [49], or other means (see Section 2.3.2), are used to construct a linear relaxation.
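For reference, the McCormick estimators [49] for a product w = x·y over a box can be written down directly. The following sketch returns the two underestimators and two overestimators as affine functions of (x, y):

```python
# The four McCormick inequalities for w = x*y on [xl, xu] x [yl, yu],
# returned as affine functions of (x, y): w >= each under, w <= each over.

def mccormick(xl, xu, yl, yu):
    under = [
        lambda x, y: xl * y + yl * x - xl * yl,
        lambda x, y: xu * y + yu * x - xu * yu,
    ]
    over = [
        lambda x, y: xu * y + yl * x - xu * yl,
        lambda x, y: xl * y + yu * x - xl * yu,
    ]
    return under, over

under, over = mccormick(0.0, 2.0, -1.0, 3.0)
x, y = 1.0, 1.0   # the true product is 1.0
print(max(u(x, y) for u in under), min(o(x, y) for o in over))  # -1.0 3.0
```

At (1, 1) the relaxation sandwiches the true product value 1 between −1 and 3; the gap shrinks as the box is reduced by branching, which is exactly why product variables are branching candidates.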

NLP Relaxation
Similar to the central LP relaxation of SCIP, an NLP relaxation is also available. In contrast to the constraint handlers, the NLP relaxation uses a common data structure to store its constraints. At the moment, the constraint handlers for linear constraints and the constraint handler for nonlinear constraints store a representation of their constraints in the NLP relaxation, so that in the case of a MINLP, the NLP relaxation together with the integrality conditions on variables provides a unified view of the problem. For nonlinear constraints, the original (non-extended) form g̲ ≤ g(x) ≤ ḡ is added to the NLP. To find locally optimal solutions of the NLP relaxation, interfaces to the NLP solvers filterSQP, Ipopt, and Worhp [27,75,20] are available. First- and second-order derivatives for these solvers are computed via CppAD [8].
The NLP relaxation is mainly used by some primal heuristics (Section 2.8) and separators (Section 2.4.2) at the moment.

Presolving
When presolving nonlinear constraints, expressions are simplified and brought into a canonical form. For example, recursive sums and products are flattened and fixed or aggregated variables are replaced by constants or sums of active variables. In addition, it is ensured that if a subexpression appears several times (in the same or different constraints), always the same expression object is used. This ensures that in the extended formulation (MINLP ext ) at most one auxiliary variable is attached to such common subexpressions.

Variable Fixings
Similar to what has been shown by Hansen et al. [38], if a bounded variable x_j does not appear in the objective (c_j = 0), but appears in exactly one constraint g̲_k ≤ g_k(x) ≤ ḡ_k, where g_k(x) is convex in x_j for any fixing of the other variables and ḡ_k = +∞ (or concave in x_j and g̲_k = −∞), then there always exists an optimal solution with x_j ∈ {x̲_j, x̄_j}.
For example, if y ∈ [0, 1] appears only in a constraint xy + yz − y 2 ≤ 5, then y can be changed to a binary variable.
SCIP recognizes such variables for polynomial constraints (under additional assumptions [14]) and changes the variable type to binary if x̲_j = 0 and x̄_j = 1, or adds a bound disjunction constraint x_j ≤ x̲_j ∨ x_j ≥ x̄_j. As a consequence, branching on x_j leads to fixing the variable in both children.
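The claim for the example above can be checked by brute force: for x·y + y·z − y² ≤ 5 with y ∈ [0, 1], whenever any y is feasible for fixed (x, z), an endpoint y ∈ {0, 1} is feasible as well. (For this particular constraint the check is even trivially satisfied, since y = 0 makes the left-hand side vanish; concavity in y guarantees it in general.)

```python
# Brute-force check of the bound-fixing argument for the example
# constraint x*y + y*z - y**2 <= 5 with y in [0, 1].

def g(x, y, z):
    return x * y + y * z - y * y

grid = [i / 2.0 for i in range(-10, 11)]    # x, z sampled in [-5, 5]
ys = [i / 50.0 for i in range(51)]          # y sampled in [0, 1]
ok = True
for x in grid:
    for z in grid:
        some_feasible = any(g(x, y, z) <= 5 for y in ys)
        endpoint_feasible = g(x, 0.0, z) <= 5 or g(x, 1.0, z) <= 5
        if some_feasible and not endpoint_feasible:
            ok = False
print(ok)  # True: restricting y to {0, 1} loses no feasible choice
```

The underlying argument: since g is concave in y, the infeasible set {y : g > 5} is an interval, so its complement within [0, 1] always contains an endpoint whenever it is nonempty.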

Linearization of Products
The introduction emphasized that with SCIP 8, an explicit extended reformulation of nonlinear constraints is avoided. An exception that proves this "rule" is the linearization of products of binary variables in presolving. Doing so has the advantage that more of SCIP's techniques for MIP solving can be utilized.
In the simplest case, a product Π_i x_i of binary variables is replaced by a new variable z, and a constraint of type "and" that models z = Π_i x_i is added. The "and"-constraint handler will then separate a linearization of this product [11]. For a product of only two binary variables, the linearization is added directly.
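A standard linearization of z = x_1⋯x_k for binary variables uses z ≤ x_i for all i together with z ≥ Σ_i x_i − (k − 1); whether the "and"-handler emits exactly these inequalities is not claimed here, but this linearization is exact on binary points, as a brute-force check confirms:

```python
# Check that the standard "and" linearization
#   z <= x_i for all i,  z >= x_1 + ... + x_k - (k - 1),  z >= 0
# characterizes z = x_1 * ... * x_k exactly on binary points.

from itertools import product

def and_relaxation_is_exact(k):
    for xs in product((0, 1), repeat=k):
        for z in (0, 1):
            lin_ok = all(z <= x for x in xs) and z >= sum(xs) - (k - 1)
            if lin_ok != (z == min(xs)):   # min(xs) equals the product
                return False
    return True

print(and_relaxation_is_exact(3))  # True
```

On binary points the linearization introduces no relaxation error; the weaker, variable-saving linearization mentioned below trades some of this tightness for fewer auxiliary variables.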
For a quadratic function in binary variables with many terms, the number of variables introduced may be large. Thus, in this case, a linearization that requires fewer additional variables is used, even though it may lead to a weaker relaxation.

KKT Strengthening for QPs
A presolving method is available that aims to tighten the relaxation of a quadratic program (QP) by adding redundant constraints derived from the Karush-Kuhn-Tucker (KKT) conditions. Consider a quadratic program of the form

    min  ½ x⊤Qx + c⊤x
    s.t. Ax ≤ b,                                              (QP)

where Q ∈ R^{n×n} is symmetric, c ∈ R^n, A ∈ R^{m×n}, and b ∈ R^m. If (QP) is bounded, then all optima of (QP) satisfy the following KKT conditions:

    Qx + c + A⊤μ = 0,
    μ_i (Ax − b)_i = 0,  i = 1, . . . , m,
    Ax ≤ b,  μ ≥ 0,                                           (KKT)

where μ is the vector of Lagrangian multipliers of the constraints Ax ≤ b.
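On a one-dimensional toy instance, the (KKT) system in the sign convention above can be verified directly at the known optimum:

```python
# Toy QP: min 0.5*x^2 - 2x  s.t.  x <= 1, i.e. Q = [1], c = [-2],
# A = [1], b = [1]. The optimum is x* = 1 with multiplier mu* = 1.

x, mu = 1.0, 1.0
stationarity = 1.0 * x - 2.0 + 1.0 * mu   # Qx + c + A^T mu
complementarity = mu * (1.0 * x - 1.0)    # mu_i * (Ax - b)_i
print(stationarity, complementarity)      # 0.0 0.0
```

The complementarity condition is the part that SCIP formulates via special ordered sets of type 1, since it is the only nonconvex piece of (KKT).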
In a specialized presolver, SCIP recognizes whether (MINLP) is equivalent to (QP) by checking whether a quadratic objective function has been reformulated into a constraint. If a (QP) has been found and all variables are bounded, then the equations (KKT) are added as redundant constraints to the problem, whereby the complementarity constraints are formulated via special ordered sets of type 1. The redundant constraints can help to strengthen the linear relaxation and prioritize branching decisions to satisfy the complementarity constraints, which focuses the search more on the local optima of (QP).
In addition to (QP), the implementation can also handle mixed-binary quadratic programs. For all details, see [47,26]. When this presolver was added to SCIP 4.0, it was shown to be very beneficial for box-constrained quadratic programs. Due to the many changes and extensions in SCIP 8, in particular for the handling of quadratic constraints (Section 2.3), it needs to be reevaluated under which conditions this presolver should be enabled. Currently, it is disabled by default.

Symmetry Detection
Symmetries in a MINLP are automorphisms of R^n that map optimal solutions to optimal solutions. Such symmetries have an adverse effect on the performance of branch-and-bound solvers, because symmetric subproblems may be treated repeatedly. Therefore, SCIP can enforce lexicographically maximal solutions within an orbit of symmetric solutions via bound tightening and separation of linear inequalities [39,34,31,14].
Since optimal solutions are naturally not known in advance, the symmetry detection resorts to finding permutations of variables that map the feasible set onto itself and map each point to one with the same objective function value [48]. These permutations are found as automorphisms of an auxiliary symmetry detection graph, which is constructed from the problem data (e.g., c, A, I, and the expressions that define g(x)) [43,76].

Quadratics
Since quadratic functions frequently appear in MINLPs (every second instance of MINLPLib [50] has only linear and quadratic constraints), a number of techniques have been added to SCIP to handle this structure. Next to the presolving methods that were discussed in the previous section, three nonlinear handlers and four separators deal with quadratic structures. When none of the nonlinear handlers are active, then for each square and bilinear term in a quadratic function, an auxiliary variable is added in the extended formulation and gradient, secant, and McCormick under-and overestimators (see (1)) are generated.

Domain Propagation
If a variable appears more than once in a quadratic function, then term-wise domain propagation does not necessarily yield the best possible results, because it suffers from the so-called dependency problem of interval arithmetic. For example, it is easy to compute the range of x² + x for given bounds on x, or bounds on x for a given interval on x² + x, but standard interval arithmetic would treat the terms x² and x separately, which can lead to an overestimated result.
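The example can be made numeric: for x ∈ [−1, 1], term-wise interval evaluation of x² + x yields [−1, 2], while the true range is [−0.25, 2]:

```python
# Dependency problem of interval arithmetic for f(x) = x**2 + x, x in [-1, 1].

# Term-wise: [0, 1] (range of x**2) + [-1, 1] (range of x) = [-1, 2].
naive = (0.0 + -1.0, 1.0 + 1.0)

# True range: the parabola has its minimum at x = -0.5 with f(-0.5) = -0.25
# and its maximum at x = 1 with f(1) = 2. Confirm on a fine grid.
xs = [i / 1000.0 for i in range(-1000, 1001)]
vals = [x * x + x for x in xs]
exact = (min(vals), max(vals))

print(naive, exact)  # (-1.0, 2.0) (-0.25, 2.0)
```

The overestimation stems from the two occurrences of x being treated as independent; the quadratic-specific propagation described next avoids exactly this.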
Therefore, a specialized nonlinear handler in SCIP provides a domain propagation procedure for quadratics that aims to reduce overestimation. For this, the detection routine of the nonlinear handler writes a quadratic expression as

    q(y) = Σ_{i=1}^k q_i(y)  with  q_i(y) := a_i y_i² + (c_i + Σ_{j∈P_i} b_{i,j} y_j) y_i,    (2)

where y_i is either an original variable (x) or another expression, a_i, c_i, b_{i,j} ∈ R, and P_i ⊆ {1, . . . , k} \ {i}, i = 1, . . . , k. For functions q_i with at least two terms (at least two of a_i, b_{i,j}, j ∈ P_i, and c_i are nonzero), a relaxation is obtained by replacing each y_j by [y̲_j, ȳ_j], j ∈ P_i. For this univariate quadratic interval-term in y_i, tight bounds can be computed [24]. In addition, bounds on the variables y_j, j ∈ P_i, are computed by considering

    q_i(y) ∈ [q̲, q̄] − Σ_{i'≠i} q_{i'}(y),    (3)

where [q̲, q̄] are given bounds on q(y). After relaxing each q_{i'} to an interval, bounds on the right-hand side of (3) are computed, which are then used to calculate bounds on each y_j, j ∈ P_i.

Bilinear Terms
For a product y_1·y_2, where y_1 and y_2 are either non-binary variables or other expressions, the expression handler for products already provides linear under- and overestimators and domain propagation that are best possible when considering only the bounds [y̲_1, ȳ_1] × [y̲_2, ȳ_2]. However, if linear inequalities in y_1 and y_2 are available, then possibly tighter linear estimators and variable bounds can be computed. In SCIP, this is done by a specialized nonlinear handler that implements the algorithm by Locatelli [45]. The inequalities are found by projecting the LP relaxation onto the variables (y_1, y_2). For more details, see [54]. An alternative method that uses linear constraints to tighten the relaxation of quadratic constraints is given by the RLT cuts described in the following.

RLT Cuts
The Reformulation-Linearization Technique (RLT) [2,3] has proven very useful for tightening relaxations of polynomial programming problems. In SCIP, a separator is available that computes cuts via RLT for the bilinear product relations in (MINLP_ext). For simplicity, denote by X_ij the auxiliary variable that is associated with a constraint x_i x_j ⋈ X_ij of (MINLP_ext), where X_ji denotes the same variable as X_ij. Recall that it is valid to replace ⋈ by =. Given the variable bounds x̲ ≤ x ≤ x̄ and a linear constraint a⊤x ≤ b, RLT cuts are derived by first multiplying the constraint by a nonnegative bound factor (x̄_i − x_i) or (x_i − x̲_i). For instance, multiplication by the factor (x_i − x̲_i) yields the valid nonlinear inequality

    Σ_k a_k x_k x_i − x̲_i a⊤x − b x_i + b x̲_i ≤ 0.    (4)

This is referred to as the reformulation step. The linearization step is then performed for all terms x_k x_i in (4): if a product relation X_ki = x_k x_i exists, the product is replaced with X_ki; if x_k and x_i are contained in the same clique, the product is replaced with an equivalent linear expression; otherwise, it is replaced by a linear under- or overestimator such as (1).
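The reformulation step can be sketched in a few lines: multiply a⊤x ≤ b by the bound factor (x_i − x̲_i) and collect the bilinear and linear terms of (4). The data layout and names below are illustrative, not SCIP's implementation:

```python
# Sketch of the RLT reformulation step: multiply a^T x <= b by the
# nonnegative bound factor (x_i - lb_i), producing
#   sum_k a_k x_k x_i  -  lb_i * a^T x  -  b x_i  <=  -b * lb_i.
# The linearization step would then replace each product x_k * x_i by
# X_ki or by a McCormick estimator.

def rlt_reformulate(a, b, i, lb_i):
    """Return (bilinear, linear, rhs) coefficient maps for the inequality
    sum bilinear[(k, i)] * x_k * x_i + sum linear[k] * x_k <= rhs."""
    bilinear = {(k, i): ak for k, ak in enumerate(a) if ak != 0.0}
    linear = {k: -lb_i * ak for k, ak in enumerate(a) if ak != 0.0}
    linear[i] = linear.get(i, 0.0) - b
    return bilinear, linear, -b * lb_i

# x0 + 2*x1 <= 3 multiplied by (x1 - 1), assuming the bound x1 >= 1:
bil, lin, rhs = rlt_reformulate([1.0, 2.0], 3.0, 1, 1.0)
print(bil, lin, rhs)
```

Here the result encodes x_0 x_1 + 2x_1² − x_0 − 5x_1 ≤ −3, which after replacing x_0 x_1 by X_01 and x_1² by X_11 becomes a linear cut in (x, X).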
In addition, the RLT separator can reveal linearized products between binary and continuous variables. To do so, it checks whether pairs of linear inequalities that involve the same triple of variables (one of them binary, the other two continuous) imply a product relation. These implicit products can then be used in the linearization step of RLT cut generation [16].
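The two steps can be made concrete as follows. This sketch derives the cut coefficients for multiplication with a lower-bound factor, assuming for simplicity that an auxiliary variable X_ki exists for every product (the clique and estimator fallbacks are omitted; all names are illustrative, not SCIP's internals):

```python
def rlt_cut(a, b, x_lb, i):
    """RLT cut from multiplying a^T x <= b with the bound factor
    (x_i - x_lb[i]) >= 0 and linearizing every product x_k * x_i by its
    auxiliary variable X_ki (assumed to exist for all k here).
    Returns (coef_x, coef_X, rhs) describing
        sum_k coef_x[k] * x_k + sum_k coef_X[k] * X_ki <= rhs."""
    n = len(a)
    # (x_i - l_i)(a^T x - b) <= 0 expands to
    #   sum_k a_k x_k x_i - b x_i - l_i a^T x + l_i b <= 0
    coef_x = [-x_lb[i] * a[k] for k in range(n)]  # -l_i * a^T x part
    coef_X = [a[k] for k in range(n)]             # a_k x_k x_i -> a_k X_ki
    coef_x[i] -= b                                # -b x_i part
    rhs = -x_lb[i] * b                            # l_i b moved to the rhs
    return coef_x, coef_X, rhs
```

By construction, the cut is tight whenever the bound factor or the slack of the multiplied constraint vanishes.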

SDP Cuts
As in the previous section, denote by X_ij the auxiliary variable that is associated with a constraint x_i x_j ≤ X_ij of (MINLP ext). A popular convex relaxation of the condition X = xxᵀ is given by requiring X − xxᵀ to be positive semidefinite. Separation for the set {(x, X) : X − xxᵀ ⪰ 0} itself is possible, but cuts are typically dense and may include variables X_ij for products that do not exist in the problem. Therefore, only principal 2 × 2 minors of X − xxᵀ, which also need to be positive semidefinite, are considered. By Schur's complement, this means that the condition

A_ij(x, X) := ( 1, x_i, x_j ; x_i, X_ii, X_ij ; x_j, X_ij, X_jj ) ⪰ 0 (5)

needs to hold for any i, j, i ≠ j. A separator in SCIP detects minors for which X_ii, X_jj, X_ij exist in (MINLP ext) and enforces A_ij(x, X) ⪰ 0. To do so for a solution (x̂, X̂) that violates (5), an eigenvector v ∈ R³ of A_ij(x̂, X̂) with vᵀA_ij(x̂, X̂)v < 0 is computed and the globally valid linear inequality vᵀA_ij(x, X)v ≥ 0 is added.
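A minimal sketch of this separation step, with a pure-Python power iteration standing in for the eigenvalue routine a solver would use (all names are illustrative): compute an eigenvector for the most negative eigenvalue of A_ij(x̂, X̂) and read off the coefficients of the linear cut vᵀA_ij(x, X)v ≥ 0.

```python
def sdp_minor_cut(x_i, x_j, X_ii, X_ij, X_jj, iters=500):
    """Separate the 2x2-minor PSD condition (5) at a candidate point.
    Returns the coefficients of the cut v^T A_ij(x, X) v >= 0,
    or None if the minor is (numerically) already PSD."""
    A = [[1.0, x_i, x_j], [x_i, X_ii, X_ij], [x_j, X_ij, X_jj]]
    # power iteration on s*I - A yields the eigenvector of A's smallest
    # eigenvalue; s is a Gershgorin bound, so s*I - A is PSD
    s = 1.0 + max(sum(abs(e) for e in row) for row in A)
    v = [1.0, 0.3, -0.5]
    for _ in range(iters):
        w = [s * v[r] - sum(A[r][c] * v[c] for c in range(3)) for r in range(3)]
        nrm = max(abs(e) for e in w) or 1.0
        v = [e / nrm for e in w]
    val = sum(v[r] * A[r][c] * v[c] for r in range(3) for c in range(3))
    if val >= -1e-9:
        return None
    # v^T A_ij(x, X) v is linear in (x, X); collect its coefficients
    return {'const': v[0] * v[0], 'x_i': 2 * v[0] * v[1],
            'x_j': 2 * v[0] * v[2], 'X_ii': v[1] * v[1],
            'X_ij': 2 * v[1] * v[2], 'X_jj': v[2] * v[2]}
```

Validity is immediate: substituting the true products X_ii = x_i², X_ij = x_i x_j, X_jj = x_j² makes the left-hand side equal to (v_0 + v_1 x_i + v_2 x_j)² ≥ 0.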

Intersection Cuts
Intersection cuts [70,5] have been shown to be an efficient tool to strengthen relaxations of MIPs. Recently, Muñoz and Serrano showed how to compute the tightest possible intersection cuts for quadratic programs [55]. This method has been implemented in SCIP [21]. Assume a nonconvex quadratic constraint of (MINLP ext) is q(y) ≤ w with q(y) as in (2) and w an auxiliary variable. The separation of intersection cuts is implemented for the set S := {(y, w) ∈ R^{k+1} : q(y) ≤ w} that is defined by this constraint.
Let (ŷ, ŵ) be a basic feasible LP solution violating q(y) ≤ w. First, a convex inequality g(y, w) < 0 is built that is satisfied by (ŷ, ŵ), but by no point of S. This defines a so-called S-free set C = {(y, w) ∈ R^{k+1} : g(y, w) ≤ 0}, that is, a convex set with (ŷ, ŵ) ∈ int(C) containing no point of S in its interior. The quality of the resulting cut highly depends on which S-free set is used; using maximal S-free sets yields the tightest possible intersection cuts [55].
By using the conic relaxation K of the LP-feasible region defined by the nonbasic variables at (ŷ, ŵ), the intersection points between the extreme rays of K and the boundary of C are computed. The intersection cut is then defined by the hyperplane going through these points and separates (ŷ, ŵ) from S. See Figure 2 for an illustration. To obtain even better cuts, a strengthening procedure is also implemented that uses the idea of negative edge extension of the cone K [36].

Figure 2: The cut is computed using the intersection points of an S-free set C (orange) and the rays of a simplicial cone K ⊇ S (boundary in green) with apex (ŷ, ŵ) ∉ S.
In addition to the separation of intersection cuts for a set S given by a constraint q(y) ≤ w, SCIP can also generate intersection cuts for implied quadratic equations. Recall the matrix of auxiliary variables X as introduced in Section 2.3.3. The condition X = xxᵀ implies that X needs to have rank 1. Therefore, every 2 × 2 minor of X needs to have determinant zero. That is, for any set of variable indices i_1, i_2, j_1, j_2 with i_1 ≠ i_2 and j_1 ≠ j_2, the condition X_{i_1 j_1} X_{i_2 j_2} = X_{i_1 j_2} X_{i_2 j_1} needs to hold. If all variables in this condition exist in (MINLP ext) and a solution violates the condition, then the previously described procedure to generate intersection cuts is applied to the set defined by it. Since intersection cuts can be rather dense, it is not clear yet how to decide when generating such cuts will be beneficial. Their separation is therefore currently disabled by default. For more details, see [21].

Edge-Concave Cuts
Another method to obtain a linear outer-approximation for a quadratic constraint is to utilize an edge-concave decomposition of the quadratic function. This has been shown to be particularly useful for randomly generated quadratic instances [51,52]. A function is edge-concave over a box domain (e.g., [x̲, x̄]) if it is componentwise concave.
Given a quadratic function, the separator for edge-concave cuts solves an auxiliary MIP to partition the square and bilinear terms into a sum of edge-concave functions and a remaining function. The convex envelope of an edge-concave function is vertex-polyhedral [67], that is, it is a polyhedral function whose vertices correspond to the vertices of the box of variable bounds; therefore, facets of the convex envelope of each edge-concave function can be computed by solving an auxiliary linear program (see also Section 2.4.1). For the function of remaining terms, term-wise linear underestimators such as (1) are summed up.
Since the current implementation of edge-concave cuts in SCIP has not proven particularly useful for general MINLP, this separator is disabled for now.

Second-Order Cones
An important connection between MINLP and conic programming is the detection of constraints that can be represented as a second-order cone (SOC) constraint, since the latter defines a convex set, while the original constraint may use a nonconvex constraint function.
A specialized nonlinear handler aims to detect SOC-representable structures. In the detection phase, a constraint h_i(x) ≤ w_i (the case ≥ is handled analogously) of the extended formulation (MINLP ext) is passed to the nonlinear handler. For this constraint, it is checked whether it defines a bound on a Euclidean norm, that is, whether it can be written as sqrt(Σ_{j=1}^k (a_j y_j² + b_j y_j) + c) ≤ w_i for some coefficients a_j, b_j, c ∈ R, a_j > 0, where y_j is either an original variable or some subexpression of h_i(·), or whether it is a quadratic constraint that is SOC-representable [46]. Since the introduction of the slack variables w_i, i ≤ m, may prevent such a detection, an equivalent form of the constraint is considered instead.
A detected SOC constraint is stored in the form

sqrt(Σ_{j=1}^k (v_jᵀy + β_j)²) ≤ v_{k+1}ᵀy + β_{k+1}, (6)

with v_j ∈ R^ℓ, β_j ∈ R, j = 1, …, k + 1, where y_1, …, y_ℓ are variables of (MINLP ext). Since the left-hand side of (6) is convex, a solution ŷ that violates (6) can be separated by linearization of the left-hand side of (6). However, if there are many terms on the left-hand side of (6) (k being large), then it can require many cuts to provide a tight linear relaxation of (6). Thus, a disaggregation of the cone [72] is used if k ≥ 3:

(v_jᵀy + β_j)² ≤ z_j (v_{k+1}ᵀy + β_{k+1}), j = 1, …, k, (7)
Σ_{j=1}^k z_j ≤ v_{k+1}ᵀy + β_{k+1}, (8)

where z_1, …, z_k are new variables. A solution (ŷ, ẑ) that violates (6) needs to violate also (7) for some j ∈ {1, …, k} or (8). The latter is already linear and can be added as a cut. If a rotated second-order cone constraint (7) is violated for some j, then it is transformed into the standard form and a gradient cut is constructed by linearization of the left-hand side.
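The disaggregation logic can be checked numerically. A sketch under illustrative naming (t_j and r denote the values of the affine terms in (6), so that (6) reads Σ t_j² ≤ r²): if the aggregated constraint is violated, then for any choice of z at least one of (7) and (8) must be violated, so a cut can always be produced.

```python
def violated_disaggregation(t, r, z, tol=1e-9):
    """t: values of v_j^T y + beta_j; r: value of v_{k+1}^T y + beta_{k+1};
    z: values of the disaggregation variables.
    Returns the index j of a violated rotated SOC constraint (7),
    the string 'linear' if (8) is violated, or None.

    If sum(t_j^2) > r^2 but nothing were violated, then
    sum(t_j^2) <= sum(z_j)*r <= r^2 would give a contradiction."""
    if sum(z) > r + tol:
        return 'linear'
    for j, tj in enumerate(t):
        if tj * tj > z[j] * r + tol:
            return j
    return None
```

This argument is exactly why a violated point of (6) can always be separated through the disaggregated system.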

Convex and Concave Constraints
For the linear underestimation of functions like x exp(x) or x² + 2xy + y², the construction of an extended formulation (xw with exp(x) = w; or w_1 + 2w_2 + w_3 with w_1 = x², w_2 = xy, w_3 = y²) is not advisable. Instead, hyperplanes that support the epigraph of a convex function can be used if convexity is recognized. In SCIP, specialized nonlinear handlers are available to detect for a function h_i(x) of (MINLP ext) the subexpressions that need to be replaced by auxiliary variables w_{i+1}, …, w_m such that the remaining expression h_i(x, w_{i+1}, …, w_m) is convex or concave. The detection utilizes the commonly applied rules for convexity/concavity of function compositions (e.g., f convex and monotonically decreasing, g concave ⇒ f ∘ g convex), but applies them in reverse order. That is, instead of deciding whether a function is convex/concave based on information on the convexity/concavity and monotonicity of its arguments, the algorithm formulates conditions on the convexity/concavity of the function arguments given a convexity/concavity requirement on the function itself. When a condition on an argument cannot be fulfilled, the argument is replaced by an auxiliary variable. Next to the "myopic" rules for convexity/concavity that are implemented by the expression handlers, also rules for product compositions (a f(b g(x) + c) g(x) with constants a, b, c and repeating subexpression g(x)), signomials (c Π_{j=1}^k f_j(x)^{p_j} with c, p_j ∈ R and subexpressions f_j(x), j = 1, …, k), and quadratic forms are available. The latter may check for definiteness of the Hessian by calculating its eigenvalues. Further, it has been shown that for a composition of convex functions f ∘ g, it can be beneficial for the linear relaxation to consider the extended formulation f(w), w ≥ g(x), instead of the composition f(g(x)) [68]. This is enforced by a small variation of the detection algorithm.
When a convex constraint h_i(x, w_{i+1}, …, w_m) ≤ w_i of (MINLP ext) is violated at a point (x̂, ŵ), a tangent to the graph of h_i at (x̂, ŵ) is used to compute a separating hyperplane. The slope of the tangent is given by the gradient of h_i at (x̂, ŵ), which is calculated via automatic differentiation on the expression graph. If, however, h_i is univariate, that is, h_i(x, w_{i+1}, …, w_m) = f(y) for some variable y, and y is integral, then taking the hyperplane through the points (⌊ŷ⌋, f(⌊ŷ⌋)) and (⌊ŷ⌋ + 1, f(⌊ŷ⌋ + 1)) can give a tighter underestimator.
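The integer-secant idea can be sketched as follows (illustrative names; y is assumed to take only integer values, so the cut needs to be valid at integer points only):

```python
import math

def integer_secant_cut(f, y_hat):
    """For a univariate convex f over an *integer* variable y, the secant
    through (floor(y_hat), f(floor(y_hat))) and (floor(y_hat)+1, ...)
    underestimates f at every integer point and is tighter than the
    tangent at y_hat between those two neighbors.
    Returns (slope, intercept) of the cut f(y) >= slope*y + intercept."""
    lo = math.floor(y_hat)
    slope = f(lo + 1) - f(lo)
    intercept = f(lo) - slope * lo
    return slope, intercept
```

Note that the secant may cut above f at fractional points between ⌊ŷ⌋ and ⌊ŷ⌋ + 1; this is harmless precisely because y is integral.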
For a concave function h_i(x, w_{i+1}, …, w_m), any hyperplane αᵀx + βᵀw + γ that does not exceed h_i at any vertex of the box of variable bounds is a valid linear underestimator, since the convex envelope of h_i is vertex-polyhedral with respect to the box. Maximizing αᵀx̂ + βᵀŵ + γ subject to the condition that αᵀx + βᵀw + γ does not exceed h_i(x, w_{i+1}, …, w_m) at any vertex gives an underestimator that is as tight as possible at a given reference point (x̂, ŵ). For the frequent cases k = 1 and k = 2 (with k the number of variables of h_i), routines that directly compute such an underestimator are available. For k > 2, a linear program is solved. Since the size of this LP is exponential in k, underestimators for concave functions in more than 14 variables are currently not computed.

Tighter Gradient Cuts
The separating hyperplanes generated for convex functions of (MINLP ext ) as discussed in the previous section are, in general, not supporting for the feasible region of (MINLP ext ), because the point where the functions are linearized is not at the boundary of the feasible region (which is the reason why it needs to be separated). Therefore, often several rounds of cut generation and LP solving are required until the relaxation solution satisfies the convex constraints. Solvers for convex MINLP have handled this problem in various ways [25,42], but the basic idea is to build gradient cuts at a suitable boundary point of the feasible region.
In SCIP, three procedures for building tighter and/or deeper gradient cuts for convex relaxations are included. The first two methods compute a point on the boundary of the set defined by all convex constraints of (MINLP) that is close to the point to be separated. The first method solves an additional nonlinear program to project the point to be separated onto the convex set. Since solving an NLP for every point to be separated can be quite expensive, the second method, going back to an idea by Veinott [71], does a binary search between an interior point of the convex set and the point to be separated. The interior point is computed once in the beginning of the search by solving an auxiliary NLP. For more details, see [47].
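The binary search of the second method can be sketched as follows (illustrative names; SCIP computes the interior point once by an auxiliary NLP, here it is simply given): find the boundary point on the segment between the interior point and the point to be separated, then build a gradient cut there, which supports the convex set.

```python
def boundary_point(g, x_int, x_out, tol=1e-8):
    """Bisection between an interior point (g < 0) and an exterior point
    (g >= 0) of the convex set {x : g(x) <= 0}; returns a point on
    (or marginally outside) the boundary."""
    point = lambda lam: [a + lam * (b - a) for a, b in zip(x_int, x_out)]
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(point(mid)) < 0.0:
            lo = mid
        else:
            hi = mid
    return point(hi)

def gradient_cut(g, grad, x_ref):
    """Gradient cut g(x_ref) + grad(x_ref)^T (x - x_ref) <= 0, valid for
    the convex set and supporting it when x_ref is on the boundary.
    Returned as (coefficients, rhs) of grad^T x <= rhs."""
    gr = grad(x_ref)
    rhs = sum(gi * xi for gi, xi in zip(gr, x_ref)) - g(x_ref)
    return gr, rhs
```

Compared to linearizing at the separated point itself, the cut at the boundary point touches the feasible region, so fewer separation rounds are typically needed.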
The third method does not aim to separate a given point, but utilizes the feasible points that are found by primal heuristics of SCIP. When a new solution is found, gradient cuts are generated at this solution for convex constraints of (MINLP ext ) and added to the cutpool. If such a cut is later found to separate the relaxation solution, it is added to the LP.
All methods are currently disabled as they require more tuning to be efficient in general.

Quotients
Note that the available expression handlers (see Section 2.1.1) do not include a handler for quotients, since they can equivalently be written using a product and a power expression. Therefore, the default extended formulation for an expression y_1 y_2^{-1} is given by replacing y_2^{-1} by a new auxiliary variable w. The linear outer-approximation is then obtained by estimating y_1 w and y_2^{-1} separately. However, tighter linear estimates are often possible. Therefore, a specialized nonlinear handler checks whether a given function h_i(x) can be cast as

f(y_1, y_2) = (a y_1 + b) / (c y_2 + d) + e, (9)

with a, b, c, d, e ∈ R, a, c ≠ 0, and y_1 and y_2 being either original variables or subexpressions of h_i(x). Tight linear estimators for (9) are computed by distinguishing a number of cases. For example, for a y_1 + b ≥ 0 and c y_2 + d > 0 (if c > 0), a linear underestimator is obtained by computing a tangent to the graph of the convex underestimator of f that is given in [78]. A linear overestimator is obtained by computing a facet of the concave envelope of f, which is easy since −f is vertex-polyhedral. Furthermore, in the univariate case (y_1 = y_2), f is either convex or concave on [y̲_1, ȳ_1], provided the pole −d/c lies outside this interval. Since in the univariate case the same variable appears twice, also a specialized domain propagation method that avoids the dependency problem of interval arithmetic is available.
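The univariate propagation can be illustrated directly. A sketch with illustrative naming: outside the pole, f is monotone (its derivative is (ad − bc)/(cy + d)², which has constant sign), so evaluating the endpoints gives tight bounds and avoids the dependency problem of treating the two occurrences of y independently.

```python
def quotient_bounds(a, b, c, d, e, lo, hi):
    """Tight bounds of f(y) = (a*y + b)/(c*y + d) + e over [lo, hi],
    assuming the pole -d/c lies outside the interval. f is then
    monotone, so the extrema are attained at the endpoints."""
    assert c != 0 and not (lo <= -d / c <= hi), "pole must lie outside"
    f = lambda y: (a * y + b) / (c * y + d) + e
    vals = (f(lo), f(hi))
    return min(vals), max(vals)
```

For instance, for f(y) = 2y/(y + 1) on [1, 3] this gives [1, 1.5], whereas naive interval arithmetic evaluates [2, 6]/[2, 4] = [0.5, 3], a much weaker enclosure.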

Perspective Strengthening
Perspective reformulations have been shown to efficiently tighten relaxations of convex mixed-integer nonlinear programs with on/off-structures, which are often modeled via big-M constraints or semi-continuous variables [29]. A variable x_j is semi-continuous with respect to a binary indicator variable x_{j'}, j' ∈ I, if it is restricted to the domain [x̲_j^1, x̄_j^1] when x_{j'} = 1 and has a fixed value x_j^0 when x_{j'} = 0. In SCIP, a strengthening of under- and overestimators for functions that depend on semi-continuous variables is available. Consider a constraint h_i(x, w_{i+1}, …, w_m) ≤ w_i of (MINLP ext) and write h_i as a sum of its nonlinear and linear parts,

h_i(x, w_{i+1}, …, w_m) = h_i^nl(x^nl, w^nl) + h_i^l(x^l, w^l),

where h_i^nl is a nonlinear function, h_i^l is a linear function, x^nl and w^nl are the vectors of variables x and w, respectively, that appear only in the nonlinear part of h_i, and x^l and w^l are the vectors of variables x and w, respectively, that appear only in the linear part of h_i. A strengthening of under- or overestimators for h_i(x, w_{i+1}, …, w_m) is attempted if x^nl and w^nl are semi-continuous with respect to the same indicator variable x_{j'}.
To determine whether a variable x_j is semi-continuous, bounds on x_j that are implied by fixing a binary variable are analyzed. The implied bounds can be obtained either from linear constraints directly or by probing, and are stored by SCIP in a globally available data structure. If a pair of implied bounds on x_j with the same binary variable x_{j'} is found, then x_j is concluded to be semi-continuous with respect to x_{j'}. In addition, an auxiliary variable w_i is found to be semi-continuous if the function h_i(x, w_{i+1}, …, w_m) depends only on semi-continuous variables with the same indicator variable.
Assume that a linear underestimator ℓ(x, w_{i+1}, …, w_m) of h_i(x, w_{i+1}, …, w_m) has been computed, and split it into parts corresponding to the nonlinear and linear variables of h_i, respectively: ℓ(x, w_{i+1}, …, w_m) = ℓ^nl(x^nl, w^nl) + ℓ^l(x^l, w^l). The perspective strengthening consists of extending the part of the underestimator that corresponds to the nonlinear part such that it is tight for x_{j'} = 0:

ℓ^nl(x^nl, w^nl) + (h_i^nl(x^0_nl, w^0_nl) − ℓ^nl(x^0_nl, w^0_nl))(1 − x_{j'}) + ℓ^l(x^l, w^l).

The linear part remains unchanged, since it shares none of the variables with the nonlinear part. This extension ensures that the estimator is equal to h_i(x, w_{i+1}, …, w_m) for x_{j'} = 0, x^nl = x^0_nl, and w^nl = w^0_nl, and equal to ℓ(x, w_{i+1}, …, w_m) for x_{j'} = 1. If h_i is convex, cuts obtained this way are equivalent to the classic perspective cuts [29]. For more details on the implementation in SCIP, see [15]. An example is given in Figure 3.

Figure 3: From the original set, cuts of the form x̂² + 2x̂(x − x̂) ≤ w for some reference point x̂ would be generated. With perspective strengthening, a linearization on the right set is obtained instead, i.e., x̂² + 2x̂(x − x̂) + x̂²(1 − y) ≤ w. The latter is typically better as it is tight for y = 0 as well.
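The strengthened cut of the example can be verified directly. A sketch with illustrative naming, for h(x) = x² ≤ w with x semi-continuous (x⁰ = 0) and indicator y:

```python
def perspective_cut(x_hat):
    """Underestimator of w >= x^2 where x is forced to 0 when the
    binary indicator y is 0 (i.e., x^0 = 0).
    Gradient cut:     x_hat^2 + 2*x_hat*(x - x_hat)               <= w
    Strengthened cut: x_hat^2 + 2*x_hat*(x - x_hat) + x_hat^2*(1-y) <= w
    Returns a function (x, y) -> left-hand side of the strengthened cut."""
    return lambda x, y: x_hat ** 2 + 2 * x_hat * (x - x_hat) \
        + x_hat ** 2 * (1 - y)
```

The strengthening term x̂²(1 − y) vanishes for y = 1, so nothing is lost there, while for y = 0 (and hence x = 0) the cut becomes tight at zero instead of dipping below.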

Optimization-Based Bound Tightening
Optimization-Based Bound Tightening (OBBT) is a domain propagation technique which minimizes and maximizes each variable over the feasible set of the problem or a relaxation thereof [59]. Whereas FBBT (see Section 2.1.2) propagates the nonlinearities individually, OBBT considers (a relaxation of) all constraints together, and may hence compute tighter bounds. However, it is rather expensive compared to FBBT.
In SCIP, OBBT solves two auxiliary LPs for each variable x_k that could be subject to spatial branching:

min / max { x_k : (x, w) ∈ P, cᵀx ≤ U }, (10)

where P is the linear relaxation of the feasible region of (MINLP ext), and cᵀx ≤ U is an objective cutoff constraint that excludes solutions with objective value worse than the current incumbent. The optimal values of (10) may then be used to tighten the lower and upper bounds of variable x_k. A variable is subject to spatial branching if cut separation routines use the bounds of the variable at a node of the branch-and-bound tree. SCIP, by default, applies OBBT at the root node to tighten bounds globally. It restricts the computational effort by limiting the number of LP iterations spent on solving the auxiliary LPs and by interrupting to let cheaper domain propagation techniques run between LP solves.
Further, SCIP does not only use the optimal objective values of (10) to tighten the bounds on x_k, but also applies a computationally cheap approximation of OBBT during the branch-and-bound search by exploiting the dual solutions of the LPs (10) solved at the root node. Suppose the maximization LP is solved, and feasible dual multipliers together with the corresponding reduced cost vectors r_x and r_w are obtained. Then LP duality yields a valid linear inequality (11) that bounds x_k from above by an affine expression in the remaining variables and the cutoff value U. This inequality is called a Lagrangian variable bound (LVB); its right-hand side is a valid upper bound for x_k that equals the OBBT bound if the dual multipliers are optimal. SCIP learns LVBs at the root node and propagates them during the tree search whenever the bounds of variables on the right-hand side of (11) become tighter or an improved primal solution is found. For further details, see [33].
In addition to OBBT with respect to the LP relaxation, also a variant is available that optimizes single variables over the potentially tighter convex NLP relaxation that is given by all linear and convex nonlinear constraints of (MINLP). Also for this variant, linear Lagrangian variable bounds similar to (11) can be constructed by taking constraint convexity and KKT conditions into account. Because of the potentially high computational cost of solving many NLPs, this variant of OBBT is deactivated by default. For more details, see [47].

Primal Heuristics
The purpose of primal heuristics is to find high quality feasible solutions early in the search. When given an MINLP, up to 40 primal heuristics are active in SCIP by default. Many of them aim to find an integer-feasible solution to the LP relaxation. In the following, primal heuristics that are only active in the presence of nonlinear constraints are discussed.

subNLP
A primal heuristic like subNLP is implemented in virtually every global MINLP solver. Given a point x̂ that satisfies the integrality requirements (x̂_I ∈ Z^{|I|}), the heuristic starts by fixing all integer variables in (MINLP) to the values given by x̂. It then calls the SCIP presolver on this subproblem for possible simplifications. Finally, it triggers a solution of the remaining NLP, using x̂ as the starting point. If the NLP solver, such as Ipopt, finds a solution that is feasible (and often also locally optimal) for the NLP relaxation, then a feasible point for (MINLP) has been found.
The starting point x̂ can be the current solution of the LP relaxation if it is integer-feasible, a point found by a primal heuristic that searches for integer-feasible solutions of the LP relaxation, or a point that is passed on by other primal heuristics for MINLP, such as those mentioned in the next sections.
How frequently the heuristic should run and how much effort to spend on an NLP solve is a nontrivial decision. In the current implementation, the heuristic uses a fixed number for the iteration limit of the NLP solver for its first run. For the following calls, the limit is set to twice the average number of iterations required in previous runs. If, however, many of the previous runs hit the iteration limit, then an increased iteration limit is used. Whether to run the heuristic at a node of the branch-and-bound tree depends on the number of nodes processed since it ran the last time, the iteration limit that would be used, and how successful the heuristic has been in finding feasible points in previous calls.

Multistart
If (MINLP) is nonconvex after fixing all integer variables, then several local optima may exist for the NLPs solved by heuristic subNLP. The success of the NLP solver then strongly depends on the starting point. Therefore, the multistart heuristic aims to compute several starting points that are passed to the subNLP heuristic.
The algorithm, originally developed in [66], tries to approximate the boundary of the feasible set of the NLP relaxation by sampling points from [x, x] and pushing them towards the feasible set by the use of an inexpensive gradient descent method. Afterwards, points that are relatively close to each other are grouped into clusters. Ideally, each cluster approximates the boundary of some connected component of the feasible set. For each cluster, a linear combination of the points is passed as a starting point to subNLP. For integer variables x i , i ∈ I, the value in the starting point is rounded to an integral value.
To reduce the infeasibility of a point x̂, the constraint consensus method [66] is used. The algorithm computes a descent direction for each violated constraint of (MINLP). The point x̂ is then updated by adding the average of the descent directions of all violated linear and nonlinear constraints. This step is iterated until x̂ becomes feasible or a stopping criterion is fulfilled.
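One iteration of this scheme can be sketched as follows (illustrative names; the feasibility vector −g(x̂)·∇g/‖∇g‖² is the step to the boundary of the linearized constraint, as in the standard constraint consensus method):

```python
def consensus_step(x, constraints):
    """One constraint consensus iteration. constraints is a list of
    (g, grad) pairs for constraints g(x) <= 0; x is moved by the
    average of the feasibility vectors of all violated constraints.
    Returns (new_point, feasible_flag)."""
    moves = []
    for g, grad in constraints:
        viol = g(x)
        if viol > 0.0:
            gr = grad(x)
            nrm2 = sum(c * c for c in gr)
            if nrm2 > 0.0:
                # step to the zero of the linearization of g at x
                moves.append([-viol * c / nrm2 for c in gr])
    if not moves:
        return x, True
    n = len(moves)
    return [xi + sum(m[i] for m in moves) / n
            for i, xi in enumerate(x)], False
```

For a single constraint this reduces to a Newton-like step on g, which explains the fast progress the method typically makes far from the feasible set.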
The multistart heuristic currently runs only for continuous problems (I = ∅) by default, since rounding and fixing the integer variables would most likely lead to infeasible NLP subproblems. For more details, see [47].

NLP Diving
As an alternative to finding a good fixing for all integer variables of (MINLP) at once, the NLP diving heuristic starts by solving the NLP relaxation at the current branch-and-bound node with an NLP solver. It then iteratively fixes integer variables with fractional value and re-solves both the LP and NLP relaxations, thereby simulating a depth-first search in a branch-and-bound tree. By default, variables for which the sum of the distances from the solutions of the LP and NLP relaxations to a common integer value is minimal are rounded to the nearest integer value. Further, binary variables and nonlinear variables are preferred. If the resulting NLP is found to be (locally) infeasible, one-level backtracking is applied, that is, the last fixing is undone and the opposite fixing is tried. If this is infeasible, too, the heuristic aborts.
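The variable selection rule can be sketched as follows (all names are illustrative; the binary/nonlinear preference is modeled as a simple score penalty, which is a deliberate simplification of SCIP's actual tie-breaking):

```python
import math

def diving_candidate(lp_vals, nlp_vals, is_binary, is_nonlinear,
                     integer_vars):
    """Pick the next variable to fix: minimal summed distance of the LP
    and NLP values to a common integer, preferring binary variables and
    variables that appear in nonlinear terms.
    Returns (score, variable index, integer value to fix to)."""
    best = None
    for j in integer_vars:
        lp, nlp = lp_vals[j], nlp_vals[j]
        if abs(lp - round(lp)) < 1e-6 and abs(nlp - round(nlp)) < 1e-6 \
                and round(lp) == round(nlp):
            continue  # already integral and consistent: nothing to fix
        for target in (math.floor(lp), math.ceil(lp)):
            score = abs(lp - target) + abs(nlp - target)
            if not is_binary[j]:
                score += 0.1       # prefer binary variables
            if not is_nonlinear[j]:
                score += 0.1       # prefer nonlinear variables
            if best is None or score < best[0]:
                best = (score, j, target)
    return best
```

A small combined distance means both relaxations already "agree" on the integer value, so the fixing is unlikely to cause infeasibility in the subsequent re-solves.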

MPEC
While the NLP diving heuristic either completely omits or enforces integrality restrictions in the NLP relaxation, the MPEC heuristic adds a relaxation of the integrality restriction to the NLP and tightens this relaxation iteratively. The heuristic is only applicable to mixed-binary nonlinear programs at the moment.
The basic idea of the heuristic, originally developed in [61], is to reformulate (MINLP) as a mathematical program with equilibrium constraints (MPEC) and to solve this MPEC to local optimality. The MPEC is obtained from (MINLP) by rewriting the condition x_i ∈ {0, 1}, i ∈ I, as the complementarity constraint x_i ⊥ (1 − x_i). This is further reformulated to an NLP constraint by writing it as x_i(1 − x_i) = 0. However, since these reformulated complementarity constraints will not, in general, satisfy constraint qualifications, solving this NLP reformulation with a generic NLP solver will often fail.
Therefore, in order to increase the chances of solving the NLP reformulation, the heuristic solves regularized versions of the NLP in which x_i(1 − x_i) = 0 is relaxed to x_i(1 − x_i) ≤ θ for a decreasing sequence of values θ > 0. The solution of one NLP is thereby used as the starting point for the next solve. If the NLP solution is close to satisfying x_I ∈ {0, 1}^{|I|}, it is passed as starting point to the subNLP heuristic. If an NLP is (locally) infeasible, the heuristic makes two more attempts in which the values of binary variables that are already close to 0 or 1 are flipped to 1 or 0, respectively. For more details, see [34].

Undercover
While the previous heuristics focused mainly on enforcing the integrality condition on an NLP, heuristic undercover [10] starts from a completely different angle. The heuristic is based on the observation that it sometimes suffices to fix only a comparatively small number of variables of (MINLP) to yield a subproblem with all constraints being linear. For example, for a bilinear term, only one of the variables needs to be fixed. The variables to fix are chosen by solving a set covering problem, which aims at minimizing the number of variables to fix. The values for the fixed variables are taken from the solution of the LP or NLP relaxation or a known feasible solution of the MINLP.
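The covering step can be sketched as follows. SCIP solves the set covering problem exactly as a MIP; a greedy cover is used here instead for illustration (all names are my own):

```python
def undercover_greedy(bilinear_terms):
    """Choose a small set of variables to fix so that every bilinear
    term x_i*x_j becomes linear (at least one factor fixed).
    bilinear_terms: list of (i, j) pairs; a square term is (i, i)."""
    uncovered = set(bilinear_terms)
    cover = set()
    while uncovered:
        # pick the variable appearing in the most uncovered terms
        counts = {}
        for i, j in uncovered:
            counts[i] = counts.get(i, 0) + 1
            if j != i:
                counts[j] = counts.get(j, 0) + 1
        var = max(counts, key=counts.get)
        cover.add(var)
        uncovered = {t for t in uncovered if var not in t}
    return cover
```

For example, with terms x_0x_1, x_0x_2, x_0x_3, and x_2², fixing only x_0 and x_2 already renders all constraints linear, which illustrates why the covers found in practice are often small.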
The resulting sub-MIP is less complex to solve and does not need to be solved to proven optimality. The solutions of the sub-MIP are immediately feasible for (MINLP). However, the best one is also passed as starting point to the subNLP heuristic to try for further improvement. For more details, see [10].

Benchmark
The following aims to present a fair comparison of SCIP with several other state-of-the-art solvers for general MINLP. Doing so is not trivial at all. First, a set of instances needs to be selected that is suitable as a benchmark set. Second, solver parameters have to be set such that all solvers solve the same instances with the same working limits and the same requirements on feasibility and optimality; this goal could not be reached completely. Third, the solvers' results have to be checked for correctness or, when this is not possible, plausibility.
GAMS was used for the experiments, as it provides various facilities to help on solver comparisons and comes with current versions of SCIP and the commercial solvers BARON [40], Lindo API [44], and Octeract included. ANTIGONE has not been included in the comparison, as its development seems to have stopped years ago.
All computations were run on a Linux cluster with Intel Xeon E5-2670 v2 CPUs (20 cores).

Test Set
To construct a test set suitable for benchmarking, the MINLPLib [50] collection of 1595 MINLP instances was used as source. First, all instances that could not be handled by some of the considered solvers were excluded, e.g., instances with trigonometric functions, as they are not supported by BARON. All solvers were then run in serial mode (that is, with parallelization features disabled) on the remaining 1505 instances and using the parameter settings described below. The results of these runs were then used to select a set of 200 instances that could be solved by at least one solver, that were not all trivial, had a varying degree of integrality and nonlinearity, and such that having many instances with a similar name is avoided. The latter was done to avoid overrepresentation of optimization problems for which many instances were added to MINLPLib.
Since small changes to an instance can lead to large variations in the solver's performance, the benchmark's reliability is improved by considering for each instance four additional variants where the order of variables and equations has been permuted. The permuted instances were generated with GAMS/Convert. Thus, a test set of 1000 instances is obtained.
The following approach was used to select the 200 instances before permutation: Let I be the set of 1505 instances, d_i the fraction of integer variables in instance i ∈ I, and e_i the fraction of nonzeros in the Jacobian and objective function gradient that correspond to nonlinear terms. Next, assign to each instance an identifier f_i ∈ F such that instances that seem to come from the same model are assigned the same identifier. This goal is approximated by mapping i to the instance name truncated at the first digit, underscore, or dash, except for the block layout design instances fo*, m*, no*, o*, which were all assigned the same identifier. |F| = 230 different identifiers were found this way.
Further, let t_i be the largest time in seconds that any solver that did not produce wrong results on instance i spent on it. Finally, let S be the number of instances that could be solved by at least one solver.
To ensure that instances with a varying amount of integer variables and nonlinearity are included, the interval [0, 1] was split once at breakpoints 0.05, 0.25, 0.5, 0.9 and once at 0.1, 0.25, 0.5. Let D and E be the resulting partitions of [0, 1]. For every interval from D and E, the aim is to have roughly the same number of instances with d i and e i in the respective intervals. For the choice of breakpoints that define D and E, the distribution of d i and e i , i ∈ I, have been taken into account. For example, MINLPLib contains many purely continuous and purely discrete instances, but not many instances that are mostly linear or completely nonlinear.
To avoid including too many instances originating from the same model, including more than two instances for each identifier in F is discouraged. Further, instances that seem trivial, i.e., that are solved by all solvers in no more than five seconds, and instances that could not be solved by any solver are excluded. Introducing penalty terms for deviations from these goals, an optimization problem for the instance selection is obtained. This problem was solved for a target instance count N varying between 180 and 220. For N = 208, this yielded a selection of 200 instances with an acceptable penalty value of 106. See Section A for a list of all selected instances. For five identifiers from F, three instead of two instances were selected. Table 1 shows the number of instances for each element of D × E.

Table 1: Number of instances selected with "discreteness" d_i and "nonlinearity" e_i in intervals from D and E.

Missing Variable Bounds
To compute a lower bound on the optimal value of a minimization problem, all solvers considered here construct a convex relaxation of the given problem. For nonconvex constraints, this often relies on the computation of valid convex underestimators or concave overestimators. As these typically depend on the variables' bounds (recall the McCormick underestimators (1)), missing or very large bounds on variables in nonconvex terms can mean that an instance will be very hard or impossible to solve. Even when the user forgot to specify some variable bounds, the solver may still be able to derive bounds via domain propagation. Further, once a feasible solution x̃ has been found, additional bounds may be derived from the inequality cᵀx ≤ cᵀx̃. However, as there are always cases where bounds are still missing after presolve, solvers have devised different ways to deal with this obstacle.
If SCIP cannot construct an under- or overestimator because of missing variable bounds, it continues by branching on an unbounded variable. This way, there will eventually be a node in the branch-and-bound tree where all variables are bounded. Nodes that still contain unbounded variable domains may be pruned due to a derived lower bound on the objective function exceeding the incumbent's objective function value. But it may also be the case that pruning will not be possible and SCIP does not terminate. However, variable bounds after branching cannot grow indefinitely in SCIP, but are limited to ±10^20 by default. That is, SCIP does not search for solutions with variable values beyond this value.
The other solvers considered here add variable bounds based on a heuristic decision. If BARON is still missing bounds on variables in nonconvex terms after presolve, it sets the bound to a value that depends on the type of nonlinearity involved. Typically, this value is around ±10^10. BARON also prints a warning to the log and no longer claims to have solved a problem to global optimality, i.e., it does not return a lower bound. Lindo API adjusts the bounds for all variables that are involved in convexification to be within [−10^10, 10^10]. At termination, it returns the lower bound for the restricted problem. Octeract proceeds similarly: it introduces a bound of ±10^7 for every missing bound and returns the lower bound for the restricted problem at termination.
Evidently, just passing an instance with unbounded variables to a solver with default settings may mean that each solver solves a different subproblem of the actual problem and often also reports a lower bound that corresponds to the solved subproblem only. Fortunately, every solver considered here provides parameters to adjust the treatment of unbounded variables. A first impulse could be to tell all solvers to set missing bounds to infinity, but this is not possible, as each solver treats values beyond a certain finite value as "infinity" (BARON: 10^50, Octeract: 10^308, SCIP: 10^20). Changing this value is either not possible or not advisable.
We therefore decided to aim for ±10^12 as replacement for a missing variable bound. For BARON and SCIP, the GAMS interface can replace any missing bound by ±10^12 before the instance is passed to the solver. BARON will hence also return a lower bound for this restricted problem. For Lindo API, a solver parameter can be changed so that the bounds of all variables subject to convexification are limited to ±10^12 (instead of ±10^10). Finally, also for Octeract, all missing bounds are set to ±10^12 (instead of ±10^7) by changing a solver parameter. Note that this still does not ensure that all solvers solve the same instance, since Lindo API would still change initial finite bounds beyond 10^12 and may also not set any bounds for variables that are not involved in convexification.
Besides missing bounds on problem variables, singularities in functions (e.g., 1/x, log(x)) can also prevent finite under- or overestimators from being available. Unfortunately, no parameters are available to ensure a uniform treatment of this case in all solvers. SCIP ensures that a variable x in x^p, p < 0, or log(x) is bounded away from zero by 10^−9, and terminates with a lower bound for this modified problem. BARON applies the same method as for missing bounds on problem variables to choose a suitable bound on x. No lower bound is returned at termination then. The methods used by Lindo API and Octeract are not known to us.

Solution Quality
To ensure that all solvers return solutions of the same quality, the constraints of (MINLP) are required to be satisfied with an absolute tolerance of 10^−6. This applies to linear and nonlinear equations, variable bounds, and integrality.
In addition, a tolerance for the proof of optimality is set. For this purpose, solvers are typically allowed to stop when the absolute or relative gap between the lower and upper bound on the optimal value is sufficiently small. Since the test set is diverse and has optimal values of varying magnitude, setting only a relative gap limit and no absolute gap limit would be preferable. Unfortunately, Octeract does not permit different values for these limits. As a compromise, BARON, Lindo API, and SCIP are run with 10^−4 as relative gap limit and 10^−6 as absolute gap limit, while for Octeract, 10^−6 is used for both the absolute and the relative gap limit. Below, the impact of using a tighter optimality tolerance for Octeract is analyzed in a separate comparison.
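A gap-based stopping test of this kind can be sketched as follows. The exact formula for the relative gap differs between solvers (a point that also matters when interpreting the detailed results in Appendix B), so this shows only one common convention, not the test of any particular solver:

```python
def gap_closed(lower, upper, rel_limit=1e-4, abs_limit=1e-6):
    """Return True if the bounds satisfy the absolute or relative gap limit.
    Uses the larger bound magnitude as denominator; solvers differ here."""
    gap = upper - lower
    if gap <= abs_limit:
        return True
    denom = max(abs(lower), abs(upper))
    return denom > 0 and gap / denom <= rel_limit
```

With these defaults, a run with bounds (1.0, 1.00005) would be allowed to stop via the relative limit, while bounds (1.0, 1.1) would not.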

Working Limits
As working limits, a time limit of two hours is used and the jobs on the cluster are restricted to 50 GB of RAM. Further, the amount of parallelization (multiple threads or processes) that a solver is allowed to use is limited in varying degrees. To simplify the presentation, the term "threads" is also used for Octeract, even though it uses multiple processes instead of threads to parallelize its solving process.

Summary
To summarize, the following parameters are used:

Correctness Checks
The GAMS/Examiner 2.0 tool is used to evaluate the violation of constraints, bounds, and integrality in the solutions reported by the solvers. Examiner generates for each solver a file that contains, for each instance, the solving time, the returned lower and upper bounds, and the solution infeasibility. A run of a solver on an instance is marked as failed if the solver terminated abnormally, the solution is not feasible with respect to the feasibility tolerance, or the lower or upper bound contradicts the bounds on the optimal value that are specified on the MINLPLib page. Note that the primal and dual bounds on the MINLPLib page were calculated without enforcing the ±10^12 limit on unbounded variables. However, for an instance to be accepted into the test set, one of the solvers considered here must have solved the instance and found an optimal value that fits within the lower and upper bounds given at MINLPLib. It is therefore acceptable to use these bounds for checking.
A run that has not failed is marked as solved if the relative or absolute limit on the gap between the lower and upper bound is satisfied. If a solver stopped without closing the gap before the time limit, then the solver time is changed to the time limit. The only exception here is BARON, which stops on two instances before the time limit without reporting a lower bound due to singularities in functions (see Section 3.2.1). To be consistent with the treatment of the other solvers, these two instances were counted as solved by BARON with the original solver time.
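The evaluation logic described above can be summarized in a simplified sketch; the function and parameter names are hypothetical, and this is not the actual Examiner-based evaluation script:

```python
TIME_LIMIT = 7200.0  # two hours, see the working limits above

def classify_run(terminated_ok, max_violation, lower, upper,
                 known_lb, known_ub, solve_time,
                 feastol=1e-6, rel_limit=1e-4, abs_limit=1e-6):
    """Return (status, accounted time) for one solver run on one instance."""
    # failed: abnormal termination, infeasible solution, or a bound that
    # contradicts the optimal-value range known from MINLPLib
    if (not terminated_ok or max_violation > feastol
            or lower > known_ub + feastol or upper < known_lb - feastol):
        return "failed", TIME_LIMIT
    gap = upper - lower
    if gap <= abs_limit or gap <= rel_limit * max(abs(lower), abs(upper)):
        return "solved", solve_time
    # gap not closed before the time limit: account with the time limit
    return "timelimit", TIME_LIMIT
```

The BARON exception mentioned above (runs counted as solved with the original time despite a missing lower bound) would be handled as a special case outside this sketch.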

Serial Mode
For the main comparison, all parallelization features in the solvers were disabled, that is, GAMS was run with option threads set to 1. In addition to the solvers themselves, results for the virtual best and virtual worst solver are reported, which are obtained by picking for each instance the fastest or the slowest solver, respectively. Table 2 shows for each solver the number of instances that could be solved, the number of times the time limit was reached, and the number of runs that were marked as failed. Further, the shifted geometric mean of the running time of each solver is provided; the shift has been set to 1 second, and instances that failed are counted with the time limit. The performance profile [23] in Figure 4 shows the number of instances a solver solved within a time that is at most a factor of the fastest solver's time.

The results show a small lead of BARON over SCIP with respect to both the number of instances solved and the average time. Since the number of timeouts is almost equal, one could argue that it is the higher stability of BARON that moves it into first place here. In fact, the 41 fails of SCIP are due to returning a wrong optimal value 16 times, returning an infeasible solution 23 times, and aborting due to numerical troubles on two instances. For BARON, fails are due to returning a wrong optimal value 26 times and an infeasible solution only once. While SCIP 8.0 has made a large step forward in ensuring that nonlinear constraints are satisfied in the non-presolved problem, violations of linear constraints or variable bounds still occur for a few instances. These are typically due to variables being aggregated during presolve.
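The shifted geometric mean used in Table 2 is a standard aggregation in MIP/MINLP benchmarking and can be computed as follows (with shift s = 1 second, as here):

```python
import math

def shifted_geomean(times, shift=1.0):
    """exp(mean(log(t + shift))) - shift; the shift damps the influence
    of very small times compared to the plain geometric mean."""
    n = len(times)
    return math.exp(sum(math.log(t + shift) for t in times) / n) - shift
```

For example, for times [1, 100] the plain geometric mean is 10, while the shifted mean with s = 1 is sqrt(2 · 101) − 1 ≈ 13.2, giving the slower run slightly more weight.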
Even though Octeract and Lindo API solved considerably fewer instances than BARON and SCIP, which also results in an increased mean time, it is noteworthy that each of the two is also the fastest solver on 270 and 66 instances, respectively. Octeract also produced correct results for 95% of the test set, while for Lindo API a relatively high number of wrong optimal values, infeasible solutions, or aborts is observed.
The large differences between the real and virtual solvers show that none of the solvers dominates all others or is dominated.
Next, the effect of changing the gap limit for Octeract was investigated. Recall from Section 3.2.2 that relative and absolute gap limits of 10^−4 and 10^−6, respectively, were used for all solvers except Octeract. Since Octeract does not allow choosing these limits separately, it had been run with the tighter relative gap limit of 10^−6. To check whether this led to a considerable disadvantage for this solver, it was rerun on the 200 non-permuted instances with both the relative and the absolute gap limit set to 10^−4. The table in Section B.2 shows that the change in the convergence tolerance had essentially no effect on the solver's performance. In both cases, the same 134 instances could be solved. The mean time changed from 178.6 seconds for a limit of 10^−6 to 179.0 seconds for a limit of 10^−4.

Parallel Mode
In the next comparison, each solver is allowed to use multiple threads or processes. Since SCIP's use of multiple threads is limited to presolving MIPs, checking quadratic functions for convexity, and the linear algebra in Ipopt, FiberSCIP [64] is used to run SCIP in parallel mode. FiberSCIP is a shared-memory instantiation of the UG framework [62] for the parallelization of branch-and-bound based solvers. The framework parallelizes the search of the branch-and-bound tree by collecting and distributing open problems between independent instances of SCIP. In addition, the first seconds of the solving process are used for a "racing ramp-up" phase. Here, multiple SCIP instances with differing parameter sets are run concurrently, and the one with the best lower bound is used for the remaining solve. The UG version was 1.0.0 beta3.
For the runs in serial mode, reaching the memory limit of 50 GB was not observed for any solver. But since parallelization often increases memory requirements, a memory limit of 100 GB has been used for the runs in parallel mode. Since this meant a reduction in available computing resources, only the 200 non-permuted instances are used for the comparison. Table 3 shows, for an increasing number of threads, the number of instances that could be solved by each solver and the mean time spent. In addition, Figure 5 provides a performance profile that compares SCIP and FiberSCIP only. Section B.3 gives detailed results.

Apparently, enabling parallelization seldom has a considerable advantage on this test set. For Octeract, where parallelization was part of the original design, a small increase in the number of instances that could be solved and a reduction in time by 34% when using up to 8 parallel processes is observed. As far as we know, BARON's use of multiple threads is currently limited to enabling this feature in the solver for a MIP relaxation. As a consequence, only moderate improvements in running time of up to 11% are seen. For Lindo API, an improvement due to parallelization seems to be impeded by a further increase in fails when using multiple threads (1 thread: 24, 4: 28, 8: 35, 16: 43). Finally, for SCIP/FiberSCIP, the additional overhead due to the parallelization being built on top of the solver instead of being tightly integrated is not compensated by the use of multiple threads. However, in contrast to the other solvers, a monotonic improvement in both the number of instances solved and the mean solving time is observed when increasing from 4 to 16 threads. Further, the virtual solvers in the performance profile show that FiberSCIP can solve instances that SCIP on one thread could not solve.
Finally, note that a benefit from parallelization can usually only be expected on rather challenging instances, because of the additional overhead of duplicating and synchronizing data and processes. The test set, however, deliberately includes only instances that could already be solved by some solver in serial mode; only instances that were trivial for all solvers were excluded, so many instances may still be solved quickly by some solver. As a small experiment, for each solver only those instances that required at least 10 or 100 seconds to solve in serial mode were considered. Since this essentially reproduced the trends shown in Table 3, details are omitted here. A more thorough analysis of the parallelization capabilities of MINLP solvers on a set of exclusively challenging instances would be necessary, but exceeds the scope of this paper.

Conclusion
The development of the MINLP solver in SCIP has come a long way. In a recent version-to-version comparison [57, slides 49-51], a steady improvement in the performance of SCIP on MINLP over the last ten years has been measured, with SCIP 8 solving twice as many instances as SCIP 3 and achieving a speed-up of a factor of three. Partially, this improvement has been achieved by improving and adding features particular to MINLP. However, due to the generality of SCIP as a CIP solver, many developments that targeted MIP solving were also immediately available for MINLP solving.
With version 8.0, the MINLP solving capabilities of SCIP have been largely reworked and extended, which resulted in a considerable improvement in both robustness and performance [14,57]. As a result, SCIP's performance is currently on par with the state-of-the-art commercial solver BARON.
In contrast to the commercial solvers considered here, SCIP offers a variety of possibilities for a user, developer, or researcher to interact with the solving process. In particular, the newly added "nonlinear handler" plugin type sets SCIP apart from most other MINLP solvers, as it allows focusing on experimenting with new algorithms to handle certain structures in nonlinear functions without modifications to the solver's code.
The rather large number of features that are disabled by default shows that tuning and improving the existing code base has become increasingly necessary. Future work will of course also include the addition of new features, e.g., improved separation for signomial functions [77], the use of alternative relaxations for polynomial functions [17], or monoidal strengthening of intersection cuts for quadratic constraints [22].
The increasing number of cores in present-day CPUs means that to fully utilize an ordinary desktop computer, a solver needs to be parallelized. While the UG framework provides such a possibility for SCIP in both shared and distributed memory environments, the experiments with FiberSCIP on up to 16 threads show that more tuning is necessary to ensure that the additional overhead can be compensated by the use of additional computing resources. Since the development of UG was initially motivated and has focused primarily on the use of large-scale parallel computing environments [63], an investigation on using UG with SCIP to solve challenging MINLPs in distributed memory environments with many CPU cores could be interesting as well.

B Detailed Computational Results
The following tables show the outcomes of running each solver on the instances of the test set. If an instance has been solved to optimality, the time spent is reported. Note that, due to differences in the formulas for the relative gap in the various solvers, an instance may be counted as solved even though the solver stopped at the time limit. If a run has been flagged as failed, the reason for this decision is given: "abort" if the solver did not return with a result, "nonopt" if the reported upper or lower bound is not consistent with those given by MINLPLib, and "infeas" if the reported solution is not feasible with respect to the feasibility tolerance. Otherwise, the relative gap at termination is reported, which is ∞ if no feasible solution or lower bound has been computed. An exception here is BARON, where an instance is considered as solved if the solver only decided not to return a lower bound due to singularities in functions (see Section 3.2.1). This is the case for the instances mhw4d and multiplants mtg2 and their permutations. For each instance, a time or gap that is at most 10% worse than that of the best solver on this instance is printed in bold font.

B.1 Serial Mode
The following