New bounding schemes and algorithmic options for the Branch-and-Sandwich algorithm

We consider the global solution of bilevel programs involving nonconvex functions. Deterministic global optimization algorithms for the solution of this challenging class of optimization problems have started to emerge over the last few years. We present new schemes to generate valid bounds on the solution of nonconvex inner and outer problems and examine new strategies for branching and node selection. We integrate these within the Branch-and-Sandwich algorithm (Kleniati and Adjiman in J Glob Opt 60:425–458, 2014), which is based on a branch-and-bound framework and enables the solution of a wide range of problems, including those with nonconvex inequalities and equalities in the inner problem. The impact of the proposed modifications is demonstrated on an illustrative example and 10 nonconvex bilevel test problems from the literature. It is found that the performance of the algorithm is improved for all but one problem (where the CPU time is increased by 2%), with an average reduction in CPU time of 39%. For the two most challenging problems, the CPU time required is decreased by factors of over 3 and 10.


Introduction
Bilevel programming problems (BPP) have a long history in operations research [6,20] and occur in diverse applications, such as chemical and civil engineering, economics, management and transportation; see e.g. [4,10,14,16] and references therein. From the mathematical point of view, bilevel problems are hierarchical mathematical programming problems in which an outer (upper-level) problem is constrained by an embedded inner (lower-level) problem:

min_{x∈X, y∈Y} F(x, y)
s.t. G(x, y) ≤ 0, H(x, y) = 0,
y ∈ arg min_{y′∈Y} { f(x, y′) : g(x, y′) ≤ 0, h(x, y′) = 0 },   (BPP)

where the n-dimensional vector x ∈ X ⊂ R^n denotes the outer (leader) variables and the m-dimensional vector y ∈ Y ⊂ R^m denotes the inner (follower) variables. The functions F, f : R^n × R^m → R denote the outer and inner objective functions, G : R^n × R^m → R^p and g : R^n × R^m → R^r are vector-valued outer and inner inequality constraint functions, and H : R^n × R^m → R^q and h : R^n × R^m → R^s are vector-valued outer and inner equality constraint functions. In this work, all bilevel problems that fall within the class (BPP) are considered, without any convexity assumptions on the functions in the problem. Notice that if, for a given x ∈ X, the inner (sub)problem parameterized by the upper-level variables,

min_{y∈Y} { f(x, y) : g(x, y) ≤ 0, h(x, y) = 0 },

has multiple globally optimal solutions in Y to which the follower is indifferent, one must decide whether to adopt an optimistic or a pessimistic formulation [14]. Here the optimistic (co-operative) formulation is assumed, i.e., the leader can choose among globally optimal lower-level solutions to achieve the best outer objective value. Special cases of bilevel programming have been studied extensively and many algorithms have been proposed; see e.g. [4,7,8,11,12,14-16,19,28,35,36,43] for reviews. However, the general nonconvex form is very challenging and only recently were the first methods to tackle this class of problems proposed. The deterministic approach of Mitsos et al. [27] and the approximation method of Tsoukalas et al. [41] apply to very general nonlinear bilevel problems, restricted solely by the absence of inner equality constraints. Recently, both these approaches were extended in a new discretization-based algorithm for bilevel problems with coupling equality constraints [17].
The Branch-and-Sandwich (B&S) algorithm introduced in [22,23] makes it possible to solve general nonconvex bilevel problems that include inner equality constraints provided that a constraint qualification holds for the inner problem. The theoretical advances in [22] were demonstrated practically in [23] by applying the algorithm to a library of test problems [26], using a combination of software and manual application. The impact of some algorithmic options on performance was briefly investigated in [30]. Recent efforts to develop a fully automated implementation of the algorithm have led to further insights and developments that can result in a significant reduction in the computational effort required to solve nonconvex bilevel problems. New theoretical findings, together with a revision of the B&S algorithm, an illustration of the impact of the proposed changes on an illustrative example, and a comparison of the computational performance of the various algorithmic options on 10 problems are presented in this paper. It is assumed that the reader has some familiarity with bilevel programming and deterministic global optimization. Thus, not all concepts are defined but appropriate references are provided wherever relevant for readers who are new to the field. Most aspects of the proposed approach are applicable to the case of mixed integer nonlinear bilevel problems and can therefore be incorporated in the MINLP version of B&S [24], although the specifics of this are beyond the scope of this paper.
The remainder of this paper is organized as follows. Basic concepts of B&S and notation are introduced in Sect. 2. In Sect. 3, new bounding schemes for B&S are introduced based on theoretical analysis. In Sect. 4, the B&S search tree management strategy is extended to include heuristics for branching and node selection. In Sect. 5, the revised and the original B&S approaches are summarized. In Sect. 6, an illustrative example is used to investigate the impact of the proposed changes. In Sect. 7, a computational comparison of the performance of B&S using the new and original bounding schemes, as well as the proposed heuristics, is presented. Finally, Sect. 8 gives conclusions.

Basic concepts and notation
We start with concepts and definitions required for the Branch-and-Sandwich algorithm.
Throughout the text, we use Remarks as a way to emphasize key implications of theorems and corollaries.
Definition 1 (Node) A node V^k [equivalently an (n + m)-rectangle] in the Branch-and-Sandwich tree is uniquely defined by a set of bounds, representing a subdomain of X × Y:

V^k = [x^{k,L}, x^{k,U}] × [y^{k,L}, y^{k,U}] ⊆ X × Y.   (1)

Definition 2 (Node properties) Each node V^k in the Branch-and-Sandwich tree has a unique number (index), denoted by superscript k. The root node (the whole X × Y domain) has k = 1. A node V^k has the following attributes (properties):
- f^k: a valid lower bound on the global solution of the inner problem restricted to node V^k;
- f̄^k: a valid upper bound on the global solution of the inner problem restricted to node V^k;
- f^{UB,k}: the best inner upper bound over the domain X^k × Y (note this includes all of Y, rather than the subset of Y in node V^k);
- F^k: a valid lower bound on the global solution of the bilevel problem restricted to node V^k;
- x^k: the outer variable vector corresponding to F^k;
- s^k: the state of node V^k: active, inner-active or inactive;
- l^k: the level (depth) of node V^k in the B&S search tree; for the root node, l^1 = 0.

Based on the bounding values computed at a node V^k, we may decide to fathom the current node. Two types of fathoming operations take place in the B&S algorithm (cf. Sect. 4.6 in [22]). If a node has been shown not to contain a global solution of the inner problem, i.e., for all x ∈ X^k the global solution of the inner problem lies outside of Y^k, it is "fully-fathomed". When a node has been shown not to contain a global solution of the bilevel problem, but it may nevertheless contain global solutions of the inner problem over the whole of Y for x ∈ X^k, we need to continue exploring it via branching. In such a case, it is "outer-fathomed" rather than fully-fathomed. As a result, the state s^k of a node V^k in the Branch-and-Sandwich tree may take one of three values:
- Active (open): B&S continues to explore such nodes for both the outer and inner problems, because they may contain a global solution of the bilevel problem. These nodes are stored in list L, as in standard branch-and-bound algorithms [21].
- Inner-active (outer-fathomed): B&S continues to explore such nodes for the inner problem only, because they have been shown not to contain a global solution of the bilevel problem but, for some x ∈ X^k, may still contain a global solution of the inner problem. Such nodes are stored in a list L_In.
- Inactive (fully-fathomed): all other nodes. No further exploration of these nodes is performed and they are not stored.
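To make the bookkeeping concrete, the node attributes of Definition 2 and the three-state fathoming logic can be sketched as follows. This is a minimal Python sketch under our own naming conventions; the class, list and function names are illustrative, not part of any published B&S implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

class State(Enum):
    ACTIVE = "active"            # explored for both the outer and inner problems
    INNER_ACTIVE = "inner"       # outer-fathomed: explored for the inner problem only
    INACTIVE = "fathomed"        # fully-fathomed: discarded entirely

@dataclass
class Node:
    """One (n+m)-rectangle V^k of the Branch-and-Sandwich tree."""
    index: int                               # unique node index k (root: 1)
    x_bounds: List[Tuple[float, float]]      # [x^{k,L}, x^{k,U}] per outer variable
    y_bounds: List[Tuple[float, float]]      # [y^{k,L}, y^{k,U}] per inner variable
    f_lo: float = float("-inf")              # inner lower bound f^k
    f_up: float = float("inf")               # (relaxed) inner upper bound
    f_ub_best: float = float("inf")          # best inner upper bound f^{UB,k} over X^k x Y
    F_lo: float = float("-inf")              # outer lower bound F^k
    level: int = 0                           # depth l^k (root: 0)
    state: State = State.ACTIVE

# The two node lists: L (active) and L_In (inner-active) are disjoint.
L: list = []
L_in: list = []

def fathom(node: Node, outer_only: bool) -> None:
    """Move a node between the lists according to the fathoming rules."""
    if outer_only and node.state is State.ACTIVE:
        node.state = State.INNER_ACTIVE      # outer-fathomed: keep for the inner problem
        L.remove(node)
        L_in.append(node)
    elif not outer_only:
        node.state = State.INACTIVE          # fully-fathomed: dropped from all lists
        if node in L:
            L.remove(node)
        if node in L_in:
            L_in.remove(node)
```

An outer-fathomed node thus migrates from L to L_In, while a fully-fathomed node disappears from both lists, mirroring the three states described above.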
The sequential-decision making inherent in bilevel problems implies that the entire host set Y must be considered to bound the solution of the inner problem. Thus, the branching scheme for bilevel problems differs from that for single level problems. The algorithm proposed by Mitsos et al. [27] allows branching on the outer variables, but not on the inner variables. The branching scheme used in the B&S algorithm, on the other hand, allows branching on both variable sets x and y. This is made possible by the existence of list L In .
Lists L and L_In are disjoint, i.e., L ∩ L_In = ∅, and their union L ∪ L_In contains all nodes that remain under exploration. In addition to lists L and L_In, we use two further types of lists that link together the nodes in an X-partition (Definition 3). We first define an X-partition as follows.
Definition 3 (X-partition) Let P ⊂ N be a finite index set and X^p denote a subdomain of X. Then an X-partition is denoted as {X^p ⊆ X : p ∈ P}, where

⋃_{p∈P} X^p = X and X^p ∩ X^q = ∂X^p ∩ ∂X^q for all p, q ∈ P, p ≠ q,

and ∂X^p denotes the relative boundary [5] of X^p.

Remark 1
At any given iteration of the B&S algorithm, a specific X -partition is defined and this in turn defines the index set P. Whenever branching takes place on an x variable, the partition is updated.
We can then define the concept of a sublist.
Definition 4 (Sublist) At a given point in the branch-and-bound process, let us consider the partition of X obtained by using the smallest subdomains generated for each element of x ∈ X. Consider some subdomain X^{p,s}, s ∈ S, in this partition, where S ⊂ N is a finite index set. The sublist L^p_s consists of all nodes in L ∪ L_In whose X domain contains X^{p,s}.
Then, we define a sublist index set K^p_s that contains the indices of all nodes within a sublist L^p_s.

Definition 5 (Sublist index set)
The sublist index set K^p_s is given by

K^p_s = { k : V^k ∈ L^p_s }.

By construction of the branch-and-bound tree, the Y subdomains of the nodes in L^p_s are non-overlapping. However, the X subdomains in two different sublists may overlap, because not all nodes have undergone the same extent of branching. We thus define an independent list that connects together all sublists that cover overlapping parts of the X domain but different subdomains of Y. Such a list makes it possible to take into account, during the exploration of the branch-and-bound tree, the hierarchical structure of the bilevel problem, in which the outer x variables drive the decision-making at the inner level.
Definition 6 (Independent list) An independent list L^p, p ∈ P, is given by

L^p = { L^p_1, . . . , L^p_{s_p} },

where s_p is the number of sublists in L^p and, when s_p > 1, for each s = 1, . . . , s_p there exists s′ = 1, . . . , s_p, s′ ≠ s, and a pair of nodes V^k ∈ L^p_s, V^{k′} ∈ L^p_{s′} whose X subdomains overlap. For simplicity, we sometimes use the shorthand V^k ∈ L^p to indicate that node V^k is such that there exists at least one s ∈ {1, . . . , s_p} for which V^k ∈ L^p_s.
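For a single outer variable, the construction of sublists from the current nodes (Definitions 4 and 5) can be sketched as below. The tuple-based node encoding and the function name are our own, chosen for brevity; they are not part of the B&S specification.

```python
def build_sublists(nodes):
    """Group nodes into sublists (Definition 4): one sublist per smallest
    x-subdomain X^{p,s}, containing every node whose X domain covers that
    subdomain. Each node is (index, (xL, xU), (yL, yU)); a single outer
    variable is assumed. Returns [(x-subdomain, sorted node indices), ...],
    i.e., the sublists together with their index sets K^p_s."""
    # Breakpoints of all node x-intervals define the smallest x-subdomains.
    cuts = sorted({b for _, (xl, xu), _ in nodes for b in (xl, xu)})
    smallest = list(zip(cuts, cuts[1:]))
    sublists = []
    for lo, hi in smallest:
        members = sorted(k for k, (xl, xu), _ in nodes if xl <= lo and hi <= xu)
        if members:
            sublists.append(((lo, hi), members))
    return sublists

# Hypothetical tree state: V^2 and V^3 partition X in x, while V^4 still
# spans all of X in a different Y subdomain.
nodes = [(2, (0.0, 0.5), (0.0, 1.0)),
         (3, (0.5, 1.0), (0.0, 1.0)),
         (4, (0.0, 1.0), (1.0, 2.0))]
```

In this example, node V^4 appears in both sublists because its X domain covers both smallest x-subdomains: exactly the overlap of X subdomains across sublists that an independent list (Definition 6) ties together.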
Finally, we define the concept of the independence condition.

Definition 7 (Independence condition (IC) [22]) If, after partitioning of some nodes from L^p, p ∈ P, there exist index sets I and J, partitioning the sublist indices of L^p, such that the X subdomains covered by the sublists indexed by I do not overlap with those covered by the sublists indexed by J, then replace L^p with two new independent lists L^{p_1} and L^{p_2}, where p_1 = p and p_2 = |P| + 1.
In the revised B&S algorithm, we use the same list management scheme as in the original B&S; we refer the reader to Sect. 4.1 in [22] for details. To provide some context for the concepts presented in the following sections, a brief statement of the main steps of the Branch-and-Sandwich algorithm is given in Algorithm 1.

New bounding schemes
As in all branch-and-bound algorithms, upper and lower bounds on the solution are obtained at each node. Given the bilevel nature of the problem, such bounds are derived for both the outer and inner problems, i.e., four bounds are obtained for each node. In this section, we present new bounding schemes based on theoretical analysis and practical observations and refer the reader to [22] for the bounding schemes that remain unchanged.

Inner upper bounding scheme at a node
To generate the inner upper bound at a node V k ∈ (L ∪ L In ), the original bounding scheme involves the solution of a challenging optimization problem. Here, we show that this can be avoided at some nodes and that this can result in a guaranteed improvement in the performance of the algorithm.
Two formulations of the inner upper bounding problem have been previously proposed (cf. Sect. 4.4 in [22]): a max-min problem, which yields an exact value of the inner upper bound,

f^{k,U} = max_{x∈X^k} min_{y∈Y^k} { f(x, y) : g(x, y) ≤ 0, h(x, y) = 0 },   (IUB(V^k))

or a computationally less expensive relaxed inner upper bound f̄^k, which is based on a semi-infinite reformulation of (IUB(V^k)) and a KKT relaxation [37,38], in which g̃ is the concatenation of the inner inequalities g and the bound constraints y^L ≤ y ≤ y^U, i.e., g̃ = (g, y^L − y, y − y^U). We consider each formulation in turn.

Bound inheritance for the inner upper bound
We first prove a useful property of the inner upper bound f^{k,U}.

Theorem 1
The inner upper bound f^{k,U} cannot improve when branching on an inner (y) variable takes place, i.e.,

f^{k,U} = min { f^{k_1,U}, f^{k_2,U} }.

Proof After branching on an inner variable (y) at a node V^k = X^k × Y^k, we obtain two new child nodes X^k × Y^{k_1} and X^k × Y^{k_2} such that Y^k = Y^{k_1} ∪ Y^{k_2}. Then, for each fixed x̄ ∈ X^k we have

min_{y∈Y^k} { f(x̄, y) : g(x̄, y) ≤ 0, h(x̄, y) = 0 } ≤ min_{y∈Y^{k_i}} { f(x̄, y) : g(x̄, y) ≤ 0, h(x̄, y) = 0 }, i = 1, 2,   (7)

where the equality must hold for at least one of k_1 and k_2, since the solution(s) of the left-hand side is (are) in Y^k = Y^{k_1} ∪ Y^{k_2}. Next, from Eq. (7) it follows that over all x ∈ X^k we have

f^{k,U} ≤ f^{k_i,U}, i = 1, 2,

where the equality must again hold for at least one of k_1 and k_2. Finally, it follows that f^{k,U} = min{ f^{k_1,U}, f^{k_2,U} }, and this concludes the proof.
A graphical illustration of Theorem 1 can be seen in Fig. 1, where the bisection takes place at point y = 0. After branching, the two child nodes V^2 and V^3 are obtained. Selecting a fixed value of x̄ = 0.5, we find that f^{1,U} = −0.08, f^{2,U} = 0.02 and f^{3,U} = −0.08, so that f^{1,U} = min{ f^{2,U}, f^{3,U} }, in agreement with Theorem 1.

Corollary 1
After branching at node V^k on a y-variable and creating two child nodes V^{k_1} and V^{k_2}, computation of the inner upper bounds f^{k_1,U} and f^{k_2,U} for the child nodes is unnecessary.
To describe the proposed inner upper bounding scheme when branching on a y variable, we first note that all sublists L^p_s, s ∈ {1, 2, . . . , s_p}, containing node V^k are modified in Step 5 of Algorithm 1, such that node V^k is replaced by its two child nodes V^{k_1} and V^{k_2}, and that no new sublists are created. Then, we can use Theorem 1 to set

f^{k_1,U} = f^{k_2,U} = f^{k,U}.

Thus computation of the inner upper bound or relaxed inner upper bound only takes place at the root node and following branching on an x variable. This is a useful property because the computational complexity of solving the inner upper bounding problem is similar to that of solving the original bilevel problem [18]. Theorem 1 shows that there is no impact on the bounds when avoiding these calculations.
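The resulting y-branching step can be sketched as follows; the helper names are hypothetical, and the bound value shown is simply the quantity a B&S implementation would store per node.

```python
def bisect_inner(y_bounds, j):
    """Bisect the j-th inner variable of a node, so that Y^k = Y^{k1} u Y^{k2}.
    `y_bounds` is a list of (lo, hi) pairs, one per inner variable."""
    lo, hi = y_bounds[j]
    mid = 0.5 * (lo + hi)
    y1 = y_bounds[:j] + [(lo, mid)] + y_bounds[j + 1:]
    y2 = y_bounds[:j] + [(mid, hi)] + y_bounds[j + 1:]
    return y1, y2

def inherit_inner_upper_bound(f_up_parent):
    """Theorem 1 / Corollary 1: after branching on a y variable, set
    f^{k1,U} = f^{k2,U} = f^{k,U} instead of re-solving the expensive
    (IUB) max-min problem at each child."""
    return f_up_parent, f_up_parent
```

The (IUB) solve is then only triggered at the root node and after x-branching, as stated above.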

Bound inheritance for the relaxed inner upper bound
Next, we consider the case of the relaxed inner upper bound f̄^k. In contrast to (IUB(V^k)), the relaxed inner upper bounding scheme (RIUB(V^k)) does not perform a minimization over Y, but rather a maximization over a subset of Y; therefore, an analog of Theorem 1 does not hold. This situation can easily be observed in Fig. 1. The relaxed inner upper bound over the root node V^1 gives f̄^1 = 0.17 at the suboptimal KKT point (1, 1); solution over node V^2 yields f̄^2 = 0.0 (the former suboptimal KKT point (1, 1) does not belong to this node) and solution over node V^3 gives f̄^3 = 0.17; thus, f̄^1 > min{ f̄^2, f̄^3 }. We note, however, that f̄^1 remains a valid upper bound on the solution of the inner problem at both child nodes and can therefore be inherited by the child nodes. Heuristics could be developed to trade off the cost of computing f̄^k against the benefit of tighter relaxed inner upper bounds, but this is not explored here. Instead, we update the relaxed inner upper bound for each new node when appropriate (see Remarks 3, 5).

Inner upper bounding scheme over X^k × Y
Having computed the inner upper bound f k,U or relaxed inner upper boundf k for a specific node V k , we need to generate an upper bound that is valid for the inner problem over X k on the entire Y domain. In this section, we show how the cost of obtaining such a bound can be reduced and how the value of the bound can sometimes be improved (decreased).

Best inner upper bound for X^k × Y based on list L^p
Within each sublist of L^p, an entire Y partition is present for a given subdomain of X^p. Therefore, the minimum value of the inner upper bound within a sublist expresses the lowest inner upper bound with respect to Y. Furthermore, different sublists contain overlapping X subdomains. Taking this into consideration, we obtain the following definition of the best inner upper bound over L^p.

Definition 8 (Best inner upper bound for a list L^p, cf. Definition 7 in [22]) The best inner upper bound f^{UB,p} for the list L^p is the lowest value of the inner upper bound over the y variables but the largest over the x variables:

f^{UB,p} = max_{s=1,...,s_p} min_{k∈K^p_s} f̄^k.   (BIUB(L^p))

Remark 2 f^{UB,p} is a valid inner upper bound for all nodes belonging to list L^p (corresponding to the X-partition element X^p). It is used to check whether full-fathoming of some nodes is possible (see Sect. 4.6 in [22]).
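In code, (BIUB(L^p)) is simply a max of per-sublist minima over the stored relaxed inner upper bounds f̄^k. A sketch; the list-of-dicts layout, mapping node index to f̄^k, is our own choice of data structure.

```python
def best_inner_upper_bound(sublists):
    """f^{UB,p} = max over sublists s of (min over nodes k in K^p_s of f_bar^k).
    `sublists` is a list of dicts {node index k: relaxed inner upper bound},
    one dict per sublist L^p_s of the independent list L^p."""
    return max(min(f_bar.values()) for f_bar in sublists)
```

The inner min exploits the fact that each sublist covers an entire Y partition; the outer max accounts for the overlapping X subdomains across sublists.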

Theorem 2 If, after branching at a node V^k on an x-variable, the independence condition (see Definition 7) is not satisfied, then the best inner upper bound f^{UB,p} cannot improve.
Proof For clarity, all lists, sublists, index sets and the best inner upper bound before branching are indicated by the superscript 0, and those after branching by the superscript n for "new". Before branching, the best inner upper bound f^{UB,p,0} for a list L^{p,0} is

f^{UB,p,0} = max_{s=1,...,s_p} min_{k∈K^{p,0}_s} f̄^k.   (10)

After bisecting on an x-variable, two child nodes V^{k_1} and V^{k_2} are created such that X^k = X^{k_1} ∪ X^{k_2}. We then replace all sublists of L^{p,0} containing node V^k by one or two new sublists that contain V^{k_1} or V^{k_2} instead of V^k (cf. Definition 6 in [22]). If the independence condition (IC) is not satisfied, no new independent list is created; moreover, from the properties of f̄^k, it follows in such a case that (11) holds. One can thus see that a value f̄^k will only be removed from the maximization in (10) when (IC) is satisfied. More formally, three cases, (12)-(14), are possible. There is no possible improvement to f^{UB,p,0} when (14) holds; without loss of generality, consider the case when (12) holds, noting that equivalent arguments can be made when (13) is satisfied. From (11) and (12), the minimum of f̄^j over each affected sublist index set K^{p,n}_s is unchanged, (15)-(17), and therefore it follows that f^{UB,p,n} = f^{UB,p,0}.
This situation is illustrated in Example 1.

Example 1
Assume that the current Branch-and-Sandwich tree consists of one independent list containing one sublist (Fig. 2a). Next, assume that node V^2 is selected and bisection on variable x takes place. We then have one independent list containing two sublists (Fig. 2b). While a better (lower) inner upper bound is achieved for node V^4 (f̄^4 = 0.0), this does not improve the best inner upper bound computed using (BIUB(L^p)). Finally, assume that node V^3 is selected and bisected again on variable x. We obtain two new child nodes, V^6 and V^7. This time, the independence condition is satisfied, i.e., two independent lists containing one sublist each are created (Fig. 2c). As a result, a tighter best inner upper bound over list L^1 is obtained.

Remark 3
From Theorem 2 and Remark 2, it follows that for the nodes newly created after branching on an x-variable, the computation of the relaxed inner upper bound f̄^k (as well as of the best inner upper bound f^{UB,p}) can be postponed until the list satisfies the independence condition (IC).

Remark 4
It further follows that, in cases where all nodes within an independent list are outer-fathomed before the independence condition is met, the computation of the relaxed inner upper bound f̄^k can be avoided altogether.

Best inner upper bound for X^k × Y based on a subset of list L^p
Although Theorem 2 indicates that the best inner upper bound cannot improve with the original inner upper bounding scheme when the independence condition does not hold, this can be circumvented by modifying the bounding scheme. We take advantage of the fact that a given node V^k which belongs to at least one sublist in L^p may not belong to all sublists in L^p. Then, a best inner upper bound that is valid for node V^k can be obtained by taking into account only those sublists that contain V^k, i.e., by considering smaller subsets of subdomain X^p where possible.

Definition 9 (Best inner upper bound over the set of sublists of L^p containing node V^k)

f^{UB,k} = max { min_{j∈K^p_s} f̄^j : s ∈ {1, . . . , s_p}, V^k ∈ L^p_s }.   (BIUB(L^p_s))

Note that this best inner upper bound is differentiated from the best inner upper bound defined over all of L^p by the use of the superscript k rather than p. Next, this new bound is shown to have the useful property that it is at least as tight as the original best inner upper bound (BIUB(L^p)).

Theorem 3
The best inner upper bound f^{UB,k} for the set of sublists in L^p that contain node V^k is at least as tight as the best inner upper bound f^{UB,p} for the list L^p containing node V^k, i.e., f^{UB,k} ≤ f^{UB,p}.

Proof The set L^{p,k} of sublists to which node V^k belongs is a subset of all sublists in independent list L^p (19). From (19) it follows that the set of minimal inner upper bound values for the sublists containing node V^k is also a subset of the set of minimal values for all sublists in L^p (20). Thus, from (19) and (20) it follows that

max { min_{j∈K^p_s} f̄^j : V^k ∈ L^p_s } ≤ max_{s=1,...,s_p} min_{j∈K^p_s} f̄^j,

and this concludes the proof.

Example 2
This approach is illustrated in Fig. 3, which shows a case in which the best inner upper bound for node V^k based on a set of sublists (BIUB(L^p_s)) yields a tighter bound than the original approach, in which the best inner upper bound (BIUB(L^p)) is derived based on L^p. As before, the numbers within rectangles denote node indices. In the case shown in Fig. 3, there is one independent list containing two sublists. Using the original (BIUB(L^p)) approach, the best inner upper bound for list L^p is f^{UB,p=1} = max{min{0.5, 0.0}, min{0.5, 1.0}} = max{0.0, 0.5} = 0.5, and this is valid for all nodes. Using the alternative approach (BIUB(L^p_s)), the best inner upper bound for nodes V^3 and V^5 is the same as f^{UB,p=1}: f^{UB,k=3} = f^{UB,k=5} = min{0.5, 1.0} = 0.5, but for node V^4, a lower (tighter) value is obtained: f^{UB,k=4} = min{0.5, 0.0} = 0.0. The impact of generating a potentially tighter best inner upper bound f^{UB,k} for the domain X^k × Y is investigated in Sect. 7. We note that the computational cost of computing f^{UB,k} is similar to that of computing f^{UB,p}.
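The tighter per-node bound of Definition 9 restricts the outer max to the sublists containing V^k. A sketch reproducing the values of Example 2; the membership of a hypothetical node V^2 in the first sublist is assumed purely for illustration, as Fig. 3's full node layout is not reproduced here.

```python
def best_inner_upper_bound_for_node(k, sublists):
    """f^{UB,k}: max over only those sublists that contain node V^k of the
    minimum relaxed inner upper bound within each such sublist. By Theorem 3
    this is at least as tight as f^{UB,p} over the whole list."""
    return max(min(f_bar.values()) for f_bar in sublists if k in f_bar)

# Example 2: one independent list with two sublists; node index -> f_bar value.
sublists = [{2: 0.5, 4: 0.0},   # sublist 1 (node V^2 membership assumed)
            {3: 0.5, 5: 1.0}]   # sublist 2
```

Node V^4 belongs only to the first sublist, so the loose value 0.5 contributed by the second sublist no longer enters the max, giving the tighter bound 0.0 used for fathoming checks.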

Remark 5
From Theorem 3, it follows that, to take full advantage of the fact that f^{UB,k} is a tighter bound than f^{UB,p}, the computation of f^{UB,k} should take place whenever branching on an x-variable occurs, because f^{UB,k} can improve after branching on an x-variable even when the independence condition is not satisfied, in contrast to f^{UB,p} (cf. Theorem 2 and Remark 3).

Remark 6
The use of the potentially tighter f^{UB,k} instead of f^{UB,p} to check for full-fathoming (Sect. 4.6 in [22]) can lead to the faster removal of nodes from all lists.

Remark 7
Finally, recall that it was proven in Lemma 2 in [22] that the original fathoming scheme guarantees that a subregion in which a globally optimal solution lies remains inner-active (i.e., in at least one list L^p, p ∈ P) until that global solution is identified. It can be shown trivially that this property is preserved by construction with the revised scheme.

Outer lower bounding scheme
The formulation of the outer lower bounding problem at a node V^k ∈ L ∩ L^p is based on the single-level reformulation of the bilevel problem [14] and makes use of the best inner upper bound for the corresponding X^k × Y domain (cf. Sect. 3.2 of [22]) as a constraint, f(x, y) ≤ f^{UB,p}, on the value of the inner objective function. We therefore update the formulation to use f^{UB,k} rather than f^{UB,p}, thereby obtaining a lower bound F^k on the bilevel objective function that is at least as tight as that proposed in [22].

Outer upper bounding scheme
In the revised B&S, we propose using an outer upper bounding scheme based on [27]. To motivate this choice, we undertake an analysis of the behaviour of the outer upper bounding scheme in [22], discussing observed drawbacks and suggesting possible ways to overcome them.

Analysis of existing outer upper bounding scheme
We begin by summarizing the outer upper bounding scheme in [22], given a node V^k ∈ L ∩ L^p and (x̄^k, ȳ^k), the solution of the corresponding outer lower bounding problem. We set x̄ equal to x̄^k, and seek a node V^{k̂} ∈ (L ∪ L_In) ∩ L^p based on (MinRISP(x̄, V^k)) with which to formulate the outer upper bounding problem. We first introduce the relevant definitions.
Definition 10 (Node index set based on x̄) Given a domain X^k, the corresponding list L^p in which all nodes that cover X^k are stored, and a point x̄ ∈ X^k, the index set K^{p,x̄} is defined as the set of the indices of those nodes V^j that belong to a sublist L^p_s ∈ L^p and for which x̄ ∈ X^j:

K^{p,x̄} = { j : V^j ∈ L^p_s, s ∈ {1, . . . , s_p}, x̄ ∈ X^j }.

Next, recall the definition of w_j(x̄) at some node V^j for which x̄ ∈ X^j, given by [22]:

w_j(x̄) = min_{y∈Y^j} { f̃(x̄, y) : g̃(x̄, y) ≤ 0, h̃(x̄, y) ≤ 0, h̃⁻(x̄, y) ≤ 0 },   (RISP(x̄, V^j))

where f̃, g̃, h̃ and h̃⁻ represent convex underestimators (e.g., [2,39]) of the functions f, g, h and −h. We are interested in the node(s) that yield the smallest value of w_j(x̄) over all sublists L^p_s ∈ L^p, s = 1, . . . , s_p, and among those, we choose the node with the lowest value of the outer lower bound. The index k̂ of node V^{k̂} is then given by

k̂ ∈ arg min { F^j : j ∈ K^{p,x̄}_w },   (MinRISP(x̄, V^k))

where K^{p,x̄}_w denotes the set of the indices of all nodes in the sublists of L^p for which w_j(x̄) attains its minimum value.

Remark 8
The use of convex underestimators means that problem (RISP(x̄, V^j)) is convex and hence relatively inexpensive to solve, while maintaining the convergence properties of the bounding scheme (cf. [22]). This may, however, come at a cost in terms of the quality of the upper bounds generated; this issue is discussed further in this section.

Remark 9
The choice of the node with the lowest value of the outer lower bound is consistent with the requirement in optimistic bilevel programming that the leader consider the best candidate among indifferent solutions for the follower. Under this assumption, the algorithmic behavior is not affected by which global minimum is obtained during the solution of the inner problem.
For a given x̄, problem (RISP(x̄, V^j)) may be infeasible over some, or even over all, nodes whose indices belong to K^{p,x̄}. In the latter case, no solution to the bilevel problem exists at x̄ and no upper bound can be obtained for the point x̄. If, however, there exists a node V^{k̂} ∈ L ∪ L_In such that the corresponding (RISP(x̄, V^{k̂})) problem is feasible, the following outer upper bounding problem is solved for x = x̄:

min_{y∈Y^{k̂}} { F(x̄, y) : G(x̄, y) ≤ 0, H(x̄, y) = 0, g(x̄, y) ≤ 0, h(x̄, y) = 0, f(x̄, y) ≤ w_{k̂}(x̄) + ε_f },   (UB(x̄, V^{k̂}))

where ε_f is the optimality tolerance for the inner problem. If problem (UB(x̄, V^{k̂})) is infeasible, then no solution exists for x̄; otherwise, an upper bound is obtained in the sense of an ε_f-feasible point [27]. The implications of seeking an ε-optimal bilevel solution have been discussed in the literature, and it has been shown in [3,9] that there is no clear pattern in the way suboptimality in the inner problem affects the outer objective function, i.e., two close inner solutions may yield very different outer objective function values. Alternatively, a valid upper bound can be found through any feasible point of problem (UB(x̄, V^{k̂})), e.g., a point obtained from a local solution.
In the following set of remarks, we explore the behavior of this scheme for generating outer upper bounds.

Remark 10
When node V^k is inner-active, i.e., it has been outer-fathomed, we already know that the global solution of the bilevel problem cannot lie in the subdomain X^k × Y^k, and we can skip the solution of (UB(x̄, V^k)).

Remark 11
While we would like to avoid repeating unnecessary calculations, the solution of (RISP(x̄, V^k)) can only be avoided if another (RISP(x̄′, V^j)) has already been solved with x̄′ = x̄ and Y^j = Y^k. These conditions are restrictive, as illustrated in the example of Fig. 4: the value found for (RISP(x̄, V^k)) is used in (UB(x̄, V^k)) (Fig. 4b); next, after branching on variable y, the situation shown in Fig. 4c, d is obtained, i.e., there is one independent list containing one sublist, and the earlier solution cannot be reused because the Y subdomains of the new nodes differ from Y^k. Finally, the use of convex relaxations in problem (RISP(x̄, V^j)) can cause slow convergence, as described in Remark 12.

Remark 12
The value obtained from the solution of problem (RISP(x̄, V^j)) plays a significant role in the (UB(x̄, V^k)) procedure, specifically through the inner objective constraint f(x̄, y) ≤ w_k(x̄) + ε_f. The lower the value of w_k(x̄) obtained from (RISP(x̄, V^j)), the more likely it is that the identification of a bilevel feasible point is delayed. This can cause slow outer-fathoming and thus cause the Branch-and-Sandwich tree to expand. This is especially a concern in the early stages of the algorithm, when the convex relaxations in (RISP(x̄, V^j)) can be very loose due to the size of the domain. A natural way to overcome this is to replace subproblem (RISP(x̄, V^j)) with the computationally more expensive but tighter global solution of the inner subproblem,

w_j(x̄) = min_{y∈Y^j} { f(x̄, y) : g(x̄, y) ≤ 0, h(x̄, y) = 0 },   (ISP(x̄, V^j))

and to select the node V^k for problem (UB(x̄, V^k)) analogously, using the resulting w_j(x̄) values. With tighter w_j(x̄) values, the B&S algorithm is likely to locate bilevel feasible points faster. This can help to improve the best known upper bound value F^{UB}, which can, in turn, lead to earlier fathoming of nodes and thus help to keep the Branch-and-Sandwich tree smaller and reduce the total number of subproblems solved.

Upper bounding problem based on [27]
Let us observe that the original upper bounding procedure can be computationally demanding, especially during the formulation of (MinRISP(x̄, V^k)), which requires the solution of (RISP(x̄, V^j)) over several nodes V^j ∈ L^p. The original upper bounding procedure therefore becomes more and more expensive as the size of the independent lists increases. Thus, motivated by the performance of the upper bounding scheme in [27], we adopt this alternative approach to obtain a valid upper bound that avoids this overhead. Using the notation of the current work, this requires solving (ISP(x̄, Y)) defined over the whole of Y:

w(x̄) = min_{y∈Y} { f(x̄, y) : g(x̄, y) ≤ 0, h(x̄, y) = 0 }.   (ISP(x̄, Y))

This alternative approach of solving only one (ISP(x̄, Y)) and one (UB(x̄, Y)) for each unique x̄ can have at least three significant benefits, highlighted in the following remarks.

Remark 13
This bounding scheme significantly reduces the total number of optimal-value functions w(x̄) that must be calculated during the course of the algorithm.

Remark 14
When solving (UB(x̄, Y)) globally over the whole of Y, the behavior of the B&S algorithm is not affected by which global minimum is obtained in (ISP(x̄, Y)). While equality holds between the bounds obtained using (ISP(x̄, Y)) and (ISP(x̄, V^k)), the upper bound obtained with (UB(x̄, Y)) is at least as tight as that obtained with (UB(x̄, V^k)). This is due to the fact that (UB(x̄, Y)) is solved over the whole of Y, while (UB(x̄, V^k)) is solved over a subset of Y.

Remark 15
Following the identification of a value x̄ as the solution of the lower bounding problem, it is trivial to determine whether (ISP(x̄, Y)) has been solved at a previous iteration. Since every (ISP(x̄, Y)) problem is solved over the whole of Y, it suffices to check whether the same value x̄ has been generated previously. For this, we maintain a set of the unique solution points x̄ of the outer variables. This procedure is much simpler than identifying duplicate problems in the original bounding scheme (see Remark 11).
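Duplicate detection then reduces to a point lookup. A sketch; the cache layout, wrapper name and rounding tolerance are our own choices, not prescribed by the algorithm.

```python
def make_isp_cache(solve_isp, decimals=9):
    """Wrap a solver for (ISP(x_bar, Y)) so that each unique outer point
    x_bar is solved at most once (Remark 15): because the problem is always
    posed over the whole of Y, matching the point x_bar itself suffices to
    detect a duplicate."""
    cache = {}
    def solve(x_bar):
        # Round to a fixed precision so numerically identical points match.
        key = tuple(round(v, decimals) for v in x_bar)
        if key not in cache:
            cache[key] = solve_isp(x_bar)
        return cache[key]
    return solve
```

Contrast this with the original scheme, where reuse additionally requires matching the Y subdomain of the node (Remark 11).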

New branching and node selection rules
In this section, new strategies for branching and the management of nodes are introduced.

Branching variable and branching point selection
The branching scheme of B&S is specified by two rules: a branching variable selection rule and a node selection rule. The choice of branching variable (coordinate axis) influences the structure of the tree generated and can significantly affect the computational performance of the algorithm. Let the variable vector v = (x, y)^T = (v_1, . . . , v_n, v_{n+1}, . . . , v_{n+m})^T be formed first from the n outer variables and then from the m inner variables. In the original B&S approach [22], the branching variable is selected by giving the highest priority to the variable with the largest range (longest edge). If several edges satisfy the longest-edge requirement, then the one with the smallest index is selected to be subdivided.
It is easy to see that, if some variable bounds differ significantly in magnitude, branching on the variables with the shortest edges is not performed until the ranges of all other variables have been sufficiently reduced. In the revised version, we select the branching variable with the largest normalized range, as has long been practised in branch-and-bound algorithms (see, e.g., [1,33]). When an ambiguous situation arises, i.e., when several variables satisfy the longest (normalized) edge requirement, we employ two different strategies: priority is given either to the variable with the lowest index [strategy (XY), which gives priority to the outer variables x] or to the variable with the highest index [strategy (YX)]. The rules for branching variable selection are stated in Algorithm 2.

Algorithm 2 Branching variable selection
Consider a node V^k ∈ (L ∪ L_In).
1: Select the branching variable v_br with the largest normalized range (edge).
2: If several variables satisfy the largest normalized range requirement, there are two options:
Option (XY): select the candidate branching variable v_br with the lowest index.
Option (YX): select the candidate branching variable v_br with the highest index.
Here, superscripts k,L and k,U indicate the lower and upper variable bounds for node V^k, respectively.
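Algorithm 2, together with the bisection rule of Property 1 below, can be prototyped as follows. This is a sketch under an assumed data layout (per-variable lists of bounds, with the n outer variables first); `select_branching_variable` and `bisect` are hypothetical helper names, not BASBL functions:

```python
# Illustrative sketch of Algorithm 2 plus exact bisection (not BASBL source).
# Bounds are per-variable (lower, upper) lists for node V^k; the first n
# entries are outer (x) variables and the remaining m are inner (y) variables.

def select_branching_variable(lb, ub, lb0, ub0, strategy="XY", tol=1e-12):
    """Return the index of the branching variable v_br.

    lb, ub   -- current bounds at node V^k
    lb0, ub0 -- original (root) bounds, used for normalization
    """
    # Step 1: normalized range (edge) of each variable.
    ranges = [(u - l) / (u0 - l0) for l, u, l0, u0 in zip(lb, ub, lb0, ub0)]
    best = max(ranges)
    # Step 2: indices attaining the largest normalized range (within tol).
    ties = [i for i, r in enumerate(ranges) if best - r <= tol]
    # Option (XY): lowest index (outer variables first);
    # Option (YX): highest index (inner variables first).
    return ties[0] if strategy == "XY" else ties[-1]

def bisect(lb, ub, i):
    """Branching point rule: exact bisection of variable i into two children."""
    mid = 0.5 * (lb[i] + ub[i])
    left = (lb[:], ub[:i] + [mid] + ub[i + 1:])
    right = (lb[:i] + [mid] + lb[i + 1:], ub[:])
    return left, right
```

For the three-dimensional box of Example 4 with equal unit-normalized edges, the (XY) option returns the first variable and (YX) the last, matching the two partitions of Fig. 5a, d.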
Having identified a branching variable, we adopt a standard bisection strategy to select the branching point, although other approaches could be adopted [2].

Property 1 (Branching point selection rule)
For the selected branching variable v br , the branching point is found using exact bisection [42].
An illustration of the subdivision process, i.e., the selection of the branching variable and branching point as well as the list management, is presented in Example 4.

Example 4 Consider the six partitioning examples of the three-dimensional space shown in Fig. 5. In these figures, the grey color highlights a node that has been selected for branching, and the numbers inside the cuboids are the unique node indices k. At the first iteration (Fig. 5a, d), the selected branching variable v_br depends on the branching variable selection rule, as all three variables satisfy the normalized longest-range requirement (see Algorithm 2, Step 1). When the (XY) strategy is used, i.e., when priority is given to the variable with the lowest index, the selected branching variable is v_br = x_1 (as noted in Fig. 5a); when the (YX) strategy is used, the selected branching variable is v_br = y_1 (see Fig. 5d). After branching (bisection) on the selected variable in each case, we obtain two completely different partitions, illustrated in Fig. 5b, e, respectively. After bisection on variable x_1 there are two independent lists with one sublist each, but there is only one independent list after bisection on variable y_1. If we assume that node V^2 is selected at the next iteration, the same variable, x_2, is chosen by both branching variable selection rules. However, after branching we obtain two completely different Branch-and-Sandwich trees. In the (XY) case, shown in Fig. 5c, we have three independent lists, while in the (YX) case we continue to have one independent list, now with two sublists. Thus, the (XY) branching variable strategy speeds up the refinement of the X partition and produces a higher number of independent lists, which can be examined completely independently, while the (YX) strategy results in a smaller number of independent lists and faster refinement of the Y space. While the performance of the two approaches may be problem-dependent in a serial implementation, strategy (XY) appears to be better suited to a parallel implementation.

Node selection rules
The selection of nodes for branching influences the structure of the branch-and-bound tree and can have a significant impact on the number of iterations needed for convergence [25,31,32]. However, well-known heuristics for node selection from single-level global optimization cannot be applied directly to the bilevel case, as they consider only the "outer" level information. To select the "most promising" nodes in L ∪ L_In based on information from both levels, B&S first identifies the most promising independent list (X-partition), taking into account outer-level information. Once a list is found, B&S looks for the best candidate only among nodes belonging to this list and selects an appropriate node by taking into account inner-level information. In the revised version presented here, we employ four variants of the node selection operation. In the same vein as in the original selection procedure, B&S selects one node from the list of active nodes L at each iteration and one from the list of inner-active nodes L_In, if non-empty. The node selection procedure is described in Algorithm 3. For each of the two steps in the node selection procedure, two options are given, and the four variants are obtained by taking all combinations of these options. The original B&S node selection procedure [22] is very similar to the (Fl)-(lf) heuristic described here: preference is given in Step 1 to the independent list containing the node(s) with the best (lowest) outer lower bounding value and, in Step 2, to the nodes with the smallest level, i.e., those covering the largest subdomain of X × Y. When several nodes in the selected list are at the same smallest level in the Branch-and-Sandwich tree, the node corresponding to the lowest inner lower bound f is chosen (the best candidate from the perspective of the inner problem).
An ambiguity can arise in Step 1 when nodes with the same smallest outer lower bound value appear in different independent lists. We address this in (Fl)-(lf) by having B&S select the list containing the least reduced such node, i.e., the one with the smallest level l.
The (lf̄) heuristic is an alternative strategy to (lf) where, in Step 2, if there are several nodes at the same level of the tree in the selected independent list, the node with the lowest relaxed inner upper bound value f̄ is chosen. In this way, we choose the best candidate in the selected list from the perspective of the outer problem, as tighter inner upper bound values lead to larger values of the outer lower bound and are thus likely to result in faster outer-fathoming.

Algorithm 3 Extended node selection
1: Select an independent list. There are two options for this step:
Option (Fl): Find the independent list L^p, p ∈ P, containing a node V^k ∈ L^p ∩ L with the lowest outer lower bound F^k. If several such nodes exist, select a node with the smallest level l^k, and if several such nodes remain, take the node with the smallest index.
Option (lF): Find L^p containing a node V^k with the smallest level l^k. If several nodes with the smallest level exist, select a node with the lowest outer lower bound F^k, and if several such nodes remain, choose the node with the smallest index.
2: Select nodes for branching. Having found L^p, there are two options:
Option (lf): Select a node V^{k*} ∈ L ∩ L^p and a node V^{k*_In} ∈ L_In ∩ L^p, if L_In ≠ ∅, with the smallest levels l^{k*} and l^{k*_In}, respectively. If several nodes with the smallest level exist, select the nodes with the lowest inner lower bounds f^{k*} and f^{k*_In}, respectively, and if several such nodes remain, choose the nodes with the smallest indices.
Option (lf̄): Select nodes V^{k*} ∈ L ∩ L^p and V^{k*_In} ∈ L_In ∩ L^p with the smallest levels l^{k*} and l^{k*_In}, respectively. If several nodes with the smallest level exist, select the nodes with the smallest relaxed inner upper bounds f̄^{k*} and f̄^{k*_In}, respectively.
In the remaining two node selection variants, the independent list is selected according to (lF), i.e., we focus on the list containing the largest node (that with the smallest level l).
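The two-step selection of Algorithm 3 amounts to lexicographic minimization over different key orderings, which can be sketched compactly in Python. The dictionary-based node layout and the helper names `select_list`/`select_node` are our own illustrative choices, not BASBL's data structures:

```python
# Hedged sketch of Algorithm 3 (illustrative data layout, not BASBL's).
# Each node is a dict with keys: "idx" (unique node index), "level" (l^k),
# "F" (outer lower bound F^k), "f" (inner lower bound f^k),
# "fbar" (relaxed inner upper bound) and "list" (independent list p).

def select_list(active_nodes, option="Fl"):
    """Step 1: choose the independent list p containing the best node."""
    if option == "Fl":   # lowest F, then smallest level, then smallest index
        key = lambda v: (v["F"], v["level"], v["idx"])
    else:                # (lF): smallest level, then lowest F, then index
        key = lambda v: (v["level"], v["F"], v["idx"])
    return min(active_nodes, key=key)["list"]

def select_node(nodes_in_list, option="lf"):
    """Step 2: choose a node within the selected list."""
    if option == "lf":   # smallest level, then lowest inner lower bound f
        key = lambda v: (v["level"], v["f"], v["idx"])
    else:                # (l fbar): smallest level, then lowest relaxed
        key = lambda v: (v["level"], v["fbar"], v["idx"])  # inner upper bound
    return min(nodes_in_list, key=key)
```

The same call would be made once over L ∩ L^p and once over L_In ∩ L^p (when non-empty), yielding the pair of nodes branched at each iteration.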
The four variants are illustrated in Example 5.

Example 5 Consider the partition shown in Fig. 6a, with the corresponding tree shown in Fig. 6b. All leaf nodes belong to the active list, i.e., L = {V^3, V^4, V^5} and L_In = ∅. There are two independent lists, corresponding to the two member sets X^p of the X partition, with one sublist each: for X^1 = [−1, 0], L^1 = {L^1_1} = {{V^4, V^5}}, and for X^2 = [0, 1], L^2 = {L^2_1} = {{V^3}}. In Step 1, using the (lF) heuristic, B&S selects independent list L^2, as it contains node V^3, the only active node at level 1. As node V^3 is the sole candidate in L^2, B&S selects this node in Step 2. However, when the (Fl) selection heuristic is used in Step 1, B&S instead selects the independent list containing the node with the lowest outer lower bound value.

Summary of the revised and the original B&S
In this section, in Table 1, we provide a comparative summary of the options available in the two versions of the B&S algorithm: the revised B&S version presented in this paper and the original B&S [22]. We note that it is not possible to reproduce exactly the configuration in [22], as some small ambiguities were present in the algorithm described (as explained in Sects. 4.1 and 4.2). We thus use the (YX) branching and (Fl)-(lf) node selection strategies, as they are closest to those used in the original paper [22]. In the revised algorithm, changes have been made to the node selection and branching rules used in Steps 4 and 5 of the algorithm, while the bounding schemes in Steps 9, 11 and 12 have been revised, either by modifying the bounding problems themselves or by changing the timing of their solution.

Algorithmic comparison of the revised B&S versus the original B&S
In this section, the main emphasis is on illustrating the influence of the different bounding procedures on the B&S algorithm, using the default options shown in Table 1. We concentrate on the bilevel instance used in Example 1 in [23], where the behavior of the B&S algorithm at each step was described and explained in detail. The inner problem satisfies the linear independence constraint qualification (LICQ) and the problem has a unique globally optimal solution at (x*, y*) = (−1, 1), yielding F* = 0 and f* = −0.83 (see the visualization of the outer and the inner problems in Fig. 7a, b). All bound values obtained using the B&S algorithm with the revised and original bounding schemes are summarized in Table 2a, b, respectively.
From the values obtained at the root node (k = 1), we observe that the inner lower bound f^{k,L} gives a tighter value than the relaxed inner lower bound f̄^{k,L}. Using the revised upper bounding scheme, we can safely skip the upper bounding procedure for both nodes, as x̂_2 = x̂_3 = −1.0 is already in the set of unique solution points. Using the original upper bounding scheme, B&S first solves two (RISP(x̂, V^j)) problems and then one (UB(x̂, V^k)) problem, which yields the first bilevel-feasible solution, F̄_3(−1.0) = 0.0; this solution was located at the root node using the revised approach.

At the second iteration, in both versions of B&S, node V^3 is selected and bisected on variable x, resulting in two new nodes (V^4 and V^5), as shown in Fig. 7d. The two Branch-and-Sandwich trees then consist of one list with two sublists. Note that, using the results from Theorem 2 and Remark 3, the original B&S approach can be improved by postponing the redundant computation of the relaxed inner upper bound f̄^k and of the best inner upper bound f^{UB,p} until the independence condition is satisfied. However, this strategy is not needed for the revised B&S scheme, as stated in Remark 5. The biggest difference in algorithmic behavior is observed when the solution points from the lower bounding procedure are obtained. The original scheme does not recognize the first solution point as one already explored, which leads to the redundant evaluation of w_4(x̂) = −0.83. A further issue arises with the second solution point, x̂_5 = 0.0. First, we observe that this is a new solution point from the perspective of the outer problem. However, before starting the upper bounding procedure, the original B&S checks the outer-fathoming rule (Sect. 4.6 in [22]) for node V^5, leading to it being moved from the list of open nodes to L_In. As a result, the upper bounding procedure is bypassed completely, which can delay the identification of an improved upper bound and therefore slow down convergence.
Using the revised upper bounding scheme, such a situation cannot arise, as the whole of Y is considered for each unique solution of the lower bounding problem. Therefore, we add the new solution point x̂ = 0.0 to the set of unique solutions of the lower bounding problem, X̂ = X̂ ∪ {0.0} = {−1.0, 0.0}, and perform the revised outer upper bounding procedure for this solution point.
At the third iteration, both versions of B&S select node V^2 and bisect on variable x, generating two new nodes (V^6 and V^7), as shown in Fig. 7e. After branching, the independence condition is satisfied, so the inner upper and best inner upper bounds need to be evaluated (Remark 3), and the two Branch-and-Sandwich trees therefore consist of two lists with one sublist each. The most obvious difference between the two approaches at this iteration is that the original approach performs an unnecessary evaluation of w_6(−1.0) = −0.24. Observe that w_6(−1.0) = w_2(−1.0), but node V^2 is no longer in the Branch-and-Sandwich tree; therefore the original approach fails to recognize that this evaluation is unnecessary. In both approaches, we first perform outer-fathoming of node V^7; this is followed by the deletion of list L^2 and the full-fathoming of nodes V^5 and V^7.
At the fourth iteration, both versions of B&S select node V^4 and bisect variable y at y = 0.5, generating two new nodes (V^8 and V^9), shown in Fig. 7f, and the node lists of the two Branch-and-Sandwich trees are updated accordingly. At this iteration, the best inner upper bound is the same using both schemes, i.e., f^{UB,p=1} = f^{UB,k} = −0.33 for each k ∈ {6, 8, 9}. This leads to the full-fathoming of node V^8, followed by outer-fathoming of node V^9. Finally, using the revised scheme, we can fully fathom node V^6, since f^{6,L} = −0.17, delete list L^1, and terminate the B&S algorithm successfully. However, using the original approach, we cannot fully fathom node V^6, as f̄^{6,L} = −0.41 is lower than the current best inner upper bound. This leads to an additional B&S iteration and the exploration of four new nodes, V^10, V^11, V^12 and V^13. A further aspect worthy of note is that the timing of the fathoming and list deletion tests can help to reduce the number of relaxed inner lower bounding problems that need to be solved from 13 to 11: specifically, inner open nodes should only be explored after it has been ascertained that the corresponding list L^p contains at least one active node. To conclude, the revised approach reduces the total number of solved subproblems and provides a clearer framework for avoiding unnecessary calculations and issues with the management of the solution points obtained from the lower bounding problem.
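The fathoming comparison in this final iteration can be paraphrased numerically. The sketch below is a deliberate simplification (the actual B&S fathoming rules also involve outer-fathoming tests and convergence tolerances); it only shows why the tighter inner lower bound of the revised scheme allows node V^6 to be fathomed against the best inner upper bound of −0.33, while the relaxed bound used by the original scheme does not:

```python
# Simplified inner-fathoming test (illustrative; the full B&S rules also
# include outer-fathoming checks and tolerances on the comparisons).
def inner_fathomed(f_lower, f_ub_best):
    """A node cannot contain inner-optimal points when its inner lower
    bound already exceeds the best inner upper bound of its list."""
    return f_lower > f_ub_best
```

With the values from the example, the revised bound f^{6,L} = −0.17 triggers fathoming against f^{UB} = −0.33, whereas the relaxed bound f̄^{6,L} = −0.41 forces the original algorithm to continue branching.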

Computational comparison of the revised B&S versus the original B&S
In this section, we analyze the computational impact of the proposed extensions, modifications and improvements. We have extended the BASBL solver [30], an initial implementation of the original B&S approach. The revised bounding schemes and the proposed heuristics for node selection and branching have been implemented within BASBL and compared against the initial version, which is based on the original algorithm. To test the revised implementation, we use the same ten challenging bilevel instances as in [30], originally introduced in [26], with some modifications introduced in [23]. All test instances are solved using the BASBL solver with the revised and original bounding schemes. BASBL subproblems are solved globally with BARON version 17.4.1 [34,40], accessed using GAMS version 24.8.4 [13] through system calls. Absolute and relative termination tolerances for BARON are set to 10^−5 for all problems. The same outer and inner objective tolerances as in [30] are used, i.e., ε_F = 10^−3 and ε_f = 10^−5 for all problem instances except mb_1_1_06 and mb_1_1_11, where ε_F = 0.1 is used. All experiments are performed on the same computer as in [30], i.e., a 64-bit Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30 GHz processor running Linux. The combination of branching and node selection rules closest to the original B&S [23] was used, i.e., (YX) for branching and (Fl)-(lf) for node selection. We begin by investigating the impact of the new bounding schemes on the performance of the revised B&S algorithm. The number of iterations (Iter), total number of inner and outer (sub)problems (#P) and wall-clock CPU time (t(s)) are reported in Table 3. On average, using the new bounding schemes, the total number of iterations is 39% smaller, the number of solved subproblems decreases by 75% and the execution time is reduced by 87%. This shows the significant benefits that can be achieved through better bounding.
Next, we test the impact of different branching and node selection rules on the performance of the revised B&S algorithm. We report results in Table 4 using three different combinations of branching and node selection rules. Note that we do not include node selection strategies with the (lF) rule, as we find that the choice of (Fl) or (lF) does not have a significant effect on the performance of the revised B&S for the test problems used. The combination (YX)-(Fl)-(lf) is closest to the original B&S [23]. The other combinations are obtained by changing one of the options at a time.
Comparing the different branching variable selection rules ((XY) vs. (YX)), we observe that branching preferentially on the x variables (the (XY) strategy) reduces the total number of solved (sub)problems (#P) by around 11% on average. This is especially evident for the two most challenging problems, mb_1_1_06 and mb_1_1_11, where the (XY) strategy reduces the total number of solved (sub)problems by 22% on average. While the (XY) strategy gives the best average performance, it is not optimal for all tested problems. Finally, only a small difference (of up to around 5%) is observed when choosing one node selection strategy over the other ((lf) vs. (lf̄)). We only show the results obtained with (XY)-(Fl)-(lf̄), as similar results are found with the (YX) branching strategy.

Conclusions
We have presented algorithmic improvements and extensions to the recently proposed deterministic global optimization algorithm for bilevel problems, Branch-and-Sandwich, based on a combination of new theoretical results and heuristic rules. Our algorithmic extensions include changes to the bounding schemes, including an alternative way to obtain tighter best inner upper bounds and a simpler, significantly faster outer upper bounding scheme, as well as alternative choices for branching and node selection. Detailed algorithmic and computational comparisons have been used to demonstrate the beneficial impact of the proposed extensions and modifications on the number of iterations and the solution performance for a set of challenging problems. In particular, the number of nonconvex subproblems to be solved and the total execution time were found to be greatly reduced. The numerical comparisons also revealed that a notable performance improvement may be achieved by selecting specific node selection and branching rules.
Finally, a detailed description of the fully-fledged BASBL implementation and its application to larger problems is presented in a recent submission [29].