On interval branch-and-bound for additively separable functions with common variables

Interval branch-and-bound (B&B) algorithms are powerful methods which look for guaranteed solutions of global optimisation problems. The computational effort needed to reach this aim increases exponentially with the problem dimension in the worst case. For separable functions this effort is smaller, as lower-dimensional sub-problems can be solved individually. The question is how to design specific methods for cases where the objective function can be considered separable, but common variables occur in the sub-problems. This paper is devoted to establishing the bases of B&B algorithms for separable problems. New B&B rules are presented based on derived properties to compute bounds. A numerical illustration is elaborated with a test-bed of problems, mostly generated by combining traditional box-constrained global optimisation problems, to show the potential of using the derived theoretical basis.

Keywords Branch-and-bound · Interval arithmetic · Separable functions

Introduction
Interval branch-and-bound methods look for guaranteed solutions of global optimisation problems [4,6,7,12,15]. Although these methods have the ability to handle constraints, we focus here on the generic box-constrained global optimisation problem, which is to find

f* = min_{x ∈ S} f(x), (1)

where S ∈ I^n is the search region and I stands for the set of all closed real intervals. With increasing dimension n, the computational effort increases drastically. In design problems, it is not unusual that the objective function f is the composition of several sub-functions, for example in the design of electrical devices [9]. If this composition is additive and the variables can be split into subgroups that do not overlap, we call the function completely additively separable.
To be precise, we introduce the following notation. The complete search space is S = (S_1, ..., S_n) ∈ I^n and the index set of variables is I = {1, ..., n}. Index sets I^[j] = {i_1, ..., i_{n^[j]}} ⊆ I with n^[j] = |I^[j]| elements are used to denote subgroups of variables and their corresponding search region S^[j] ∈ I^{n^[j]}. In this context, we use the following definitions.
Definition 1 Function f : S ⊂ R^n → R is additively separable if ∃ p > 1 such that f can be written as

f(x) = Σ_{j=1}^{p} f^[j](x^[j]),

where x^[j] contains the variables indexed by I^[j]. The index sets facilitate the definition of a completely separable function.
We summarise the notation used in the mathematical description:
- Box, or n-dimensional interval: X = (X_1, ..., X_n) ∈ I^n.
- m(X) is the midpoint of the box X.
- w(X) is the width of X, i.e., the width of the largest component of X.
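The box operations above can be sketched in a few lines. This is a hypothetical Python sketch, not part of the original implementation; the representation of a box as a list of interval tuples is an assumption for illustration.

```python
# Hypothetical sketch (not the authors' implementation): a box as a list of
# closed intervals (lo, hi), with the midpoint m(X) and width w(X) as above.
def midpoint(box):
    """m(X): component-wise midpoint of the box."""
    return [(lo + hi) / 2.0 for lo, hi in box]

def width(box):
    """w(X): width of the largest component of X."""
    return max(hi - lo for lo, hi in box)

X = [(0.0, 2.0), (1.0, 1.5)]
print(midpoint(X))   # [1.0, 1.25]
print(width(X))      # 2.0
```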
In general, problem (3) is easier to solve than (1), as the subproblems have a lower dimension. However, engineering design cases may exhibit an additive structure in the objective function in which common variables are shared by the underlying sub-functions, see [9]. Nocedal and Wright [13] discuss in their book the related concept of so-called partially separable problems, where a sparse Hessian is exploited to derive numerical advantages in quasi-Newton methods.
We focus on the case where one group of variables appears in each sub-function. The developed theory is valid for more than one common variable, but in order to keep the notation comprehensible, we present it for one common variable. This theory presents the basis for future studies on more complicated relations in the model. Without loss of generality, the common variable is taken to be the last one, x_n. So, common variable x_n appears in all sub-functions, n ∈ I^[1] ∩ ... ∩ I^[p]. Moreover, in the numerical examples we restrict ourselves to p = 2.
To separate the common variable x_n from the non-common variables, the latter are denoted by z^[j], so that x^[j] = (z^[j], x_{n^[j]}). The generic problem under investigation can be written as

min_{x ∈ S} f(x) = min_{x ∈ S} Σ_{j=1}^{p} f^[j](z^[j], x_n),

where each vector z^[j] lies in its corresponding search region T^[j] and the common variable x_n ∈ S_n ∈ I. There are several ways to denote the common variable, depending on its context. The common variable is denoted by x_n when x ∈ S and by x_{n^[j]} when x^[j] ∈ S^[j].
Consider an example with n = 3 and p = 2, where x_3 is the common variable. We can set z^[1] = x_1 and z^[2] = x_2, such that x^[1] = (z^[1], x_3) and x^[2] = (z^[2], x_3). In this way, function f can be rewritten as f(x) = f^[1](z^[1], x_3) + f^[2](z^[2], x_3).

Section 2 summarizes interval arithmetic properties relevant for this study. Section 3 shows a standard interval branch-and-bound algorithm which is used for comparison. Section 4 presents the so-called decomposed sub-function perspective (DSP). Section 5 studies DSP properties. This leads to a specific branch-and-bound algorithm described in Sect. 6. Finally, a numerical illustration and conclusions are presented in Sects. 7 and 8, respectively.

Properties of interval inclusion functions
Algorithms based on interval arithmetic have several ingredients. We start with the generic ideas and then focus on how they can be adapted to separable functions with a common variable. Interval arithmetic has been widely studied in the last 40 years [10,11]. We mention some relevant interval arithmetic properties and definitions.
Definition 3 Let f(X) be the range of f on X. An interval function F : I^n → I is an inclusion function of f if f(X) ⊆ F(X) for all X ∈ I^n.

Definition 4 Inclusion isotonicity. Inclusion function F is inclusion isotone if X ⊆ Y implies F(X) ⊆ F(Y).

In the following, we assume that F is an isotone inclusion function.
The lower endpoint of F(X) is a valid lower bound of f over X. This follows directly from Property 1 and is of interest for minimisation problems.
In interval analysis [10], isotone inclusion functions can be constructed as follows.

Property 3 Fundamental Theorem of Interval Analysis. Given a real function f : R^n → R, a natural interval extension of f, denoted by F_NIE, is an isotone inclusion function [10].
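A natural interval extension can be illustrated with a minimal, hypothetical Python sketch (not the C-XSC library used later in the paper): the expression of f is evaluated with interval operands, which encloses, and may overestimate, the true range.

```python
# Hypothetical sketch of a natural interval extension F_NIE: evaluate the
# expression of f with interval operands (lo, hi) instead of reals.
def isub(a, b):
    """Interval subtraction."""
    return (a[0] - b[1], a[1] - b[0])

def imul(a, b):
    """Interval multiplication: min/max over the endpoint products."""
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))

# f(x) = x * (x - 1); F_NIE evaluates the same expression on intervals.
def F_nie(X):
    return imul(X, isub(X, (1.0, 1.0)))

# Encloses the true range [-0.25, 0.0] on [0, 1], with overestimation
# at the lower end caused by the two occurrences of x.
print(F_nie((0.0, 1.0)))   # (-1.0, 0.0)
```

The overestimation shown here is exactly the multiple-occurrence effect mentioned later as a motivation for decomposition.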
In this work, we also use the Baumann inclusion function F_B based on the first-order Taylor expansion [1,16]:

F_B(X) = F(c) + ∇F_NIE(X)^T (X − c),

where c ∈ X and ∇F_NIE(X) is the component-wise enclosure of the gradient. In [1], Baumann proved that the best lower bound using this formulation is generated by choosing c as the so-called Baumann centre c_B.
The intersection of an isotone inclusion function with another inclusion function also provides an isotone inclusion function:

F_∩(X) = F_NIE(X) ∩ F_B(X). (5)

Other isotone inclusion functions are described in [5,8,14,16,17].

A standard interval branch-and-bound algorithm

Algorithm 1 outlines the standard interval B&B method used for comparison.

Algorithm 1. Standard interval branch-and-bound (IBB)
  Set working list L := {S}, final list Q := ∅ and upper bound f_U := ∞
  while L ≠ ∅
    Select an interval X from L                        (Selection rule)
    Compute f_U := min{f_U, F(m(X))}
    Divide X into subintervals X_j, j = 1, ..., k      (Division rule)
    for j = 1 to k
      Compute a lower bound F(X_j) of f(X_j)           (Bounding rule)
      if X_j cannot be eliminated                      (Elimination rule)
        if X_j satisfies the termination criterion     (Termination rule)
          Store X_j in Q
        else
          Store X_j in L
  return Q

The rules are instantiated as follows:
- Bounding: F(X) is a lower bound of f on X. The upper bound f_U of f* is the lowest function value found in the midpoints m(X) of the evaluated boxes X, F(m(X)).
- Selection: the box X with the lowest value of the bound F(X) is selected (best-first search) in order to find improvements of the upper bound f_U of f* in an early stage.
- Elimination: box X_j is rejected when its lower bound exceeds the current upper bound, F(X_j) > f_U (cut-off test).
- Division: the selected and not eliminated interval X is bisected at the midpoint of its widest component. This avoids slicing off small parts, assuring convergence of the algorithm.
- Termination: if one intends to find all minimum points, a common termination rule is to finish when the size of all boxes is sufficiently small. Algorithm 1 stores boxes with w(X) ≤ ε in the final list Q.
As a result of the algorithm, the set of minimum points x* is enclosed by ∪_{X ∈ Q} X, and f* is enclosed by the interval [min_{X ∈ Q} F(X), f_U].
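The loop structure of Algorithm 1 can be sketched in one dimension. This is a hypothetical Python illustration, not the authors' C-XSC implementation; the function f(x) = (x − 1)^2 and the exact inclusion function are assumptions chosen so that the sketch stays short.

```python
# Hypothetical 1-D sketch of Algorithm 1 (standard interval B&B) for
# f(x) = (x - 1)^2 on S = [-2, 3]; not the authors' implementation.
def F(X):
    """Exact inclusion function of f(x) = (x - 1)^2 on X = (lo, hi)."""
    a, b = X[0] - 1.0, X[1] - 1.0
    if a <= 0.0 <= b:
        return (0.0, max(a * a, b * b))
    return (min(a * a, b * b), max(a * a, b * b))

def ibb(S, eps=1e-3):
    f_up = float('inf')                      # upper bound f_U of f*
    L, Q = [S], []                           # working and final lists
    while L:
        L.sort(key=lambda X: F(X)[0])        # selection rule: best-first
        X = L.pop(0)
        m = (X[0] + X[1]) / 2.0
        f_up = min(f_up, (m - 1.0) ** 2)     # evaluate midpoint
        for Xj in ((X[0], m), (m, X[1])):    # division rule: bisection
            if F(Xj)[0] > f_up:              # elimination rule: cut-off test
                continue
            if Xj[1] - Xj[0] <= eps:         # termination rule: w(X) <= eps
                Q.append(Xj)
            else:
                L.append(Xj)
    return Q, f_up

Q, f_up = ibb((-2.0, 3.0))
print(len(Q), f_up)   # the final boxes enclose the minimiser x* = 1
```

Note that the minimiser can never be cut off, since its box always has a lower bound of at most f*.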

Decomposed sub-function perspective (DSP)
The basic question is how to exploit the fact that the function is separable with common variables when constructing a specific B&B algorithm. In particular, one would like to exploit that the subproblems have a dimension n^[j] smaller than n.
One way is to consider copies of the common subintervals X_n ⊆ S_n for each sub-function. In a branch-and-bound environment, this implies keeping lists L^[j] and Q^[j] for each sub-function f^[j] in an n^[j]-dimensional space.
Instead of only one list L as in Algorithm 1, we have a set of p lists L^[j] to store the n^[j]-dimensional (generated and not eliminated) subproblems for each sub-function f^[j]. The algorithm finishes when all L^[j] are empty. Final boxes are stored in Q^[j]. For each sub-function j, the lower bound on the sub-function is given by

lb^[j] = min_{X ∈ L^[j] ∪ Q^[j]} F^[j](X). (6)

Notice that after running the algorithm, these lists have to be combined again into a final list Q in the n-dimensional space. This requires a last phase to combine those subboxes that have the same range of the common variable.
For higher dimensions, we expect two effects of using a decomposed approach compared to the standard (full-dimensional) Algorithm 1. First of all, the number of boxes generated by Algorithm 1 in the worst case grows exponentially with the dimension. Moreover, in higher dimensions the overestimation of interval arithmetic can be larger than in lower dimensions when there are multiple occurrences of variables. If the ratio between non-common and common variables is large, using a decomposition may provide more advantage.
In the following section, we introduce properties of a separable function with a common variable that can be used for the decomposed sub-function perspective (DSP). These properties are the basis for a specific branch-and-bound algorithm.

Properties of a separable function with a common variable
We focus on properties that are of interest in the DSP approach in order to avoid searching in areas not containing a global solution. For this purpose, one should obtain bounds of function values as sharp as possible.

Bounding
We have to distinguish bounds of f^[j] from bounds of f. A lower bound of f^[j] over a subbox X ⊆ S^[j] is relatively easy to calculate using inclusion functions F^[j]. An upper bound of the minimum of f^[j] on X ⊆ S^[j] can now be defined depending on the value of the common variable x_n at the midpoint x_n = m(X_{n^[j]}). Let g^[j](x_n) be the best value of f^[j] found so far with the common variable having the value m(X_{n^[j]}) = x_n. Extension of the general bounding concept to the sub-functions is then straightforward, as formalised in the following theorem.
Theorem 1 If F^[j](X) > g^[j](m(X_{n^[j]})), then X cannot contain an n^[j]-dimensional part of an optimum point x*.
There are several ways to obtain a lower bound of f . A straightforward observation is the following.
Property 6 By Definition 1, Σ_{j=1}^{p} lb^[j] is a lower bound of f over S.
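Property 6 can be illustrated with a hypothetical Python sketch using fabricated enclosure values (the stored lists and the numbers are assumptions, not data from the paper):

```python
# Hypothetical sketch of Property 6 with fabricated enclosure values: lb[j]
# is the smallest lower bound over the subboxes stored for sub-function j,
# and the sum of the lb[j] bounds f from below over S.
def sub_lower_bound(enclosures):
    """lb[j]: minimum lower endpoint over the stored enclosures F[j](X)."""
    return min(lo for lo, hi in enclosures)

stored = {1: [(-1.0, 0.5), (0.2, 0.9)],    # enclosures in L[1] and Q[1]
          2: [(3.0, 4.0), (2.5, 6.0)]}     # enclosures in L[2] and Q[2]
lb = {j: sub_lower_bound(encs) for j, encs in stored.items()}
lower_bound_f = sum(lb.values())
print(lower_bound_f)   # -1.0 + 2.5 = 1.5
```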
Function f is defined in the n-dimensional space, so a bound of f on X ⊆ S^[j] is not defined. Alternatively, one can look for the lowest values that f can reach when box X marks the solution in the n^[j]-dimensional subspace. For this, it is convenient to define, for each interval Y ⊂ S_n in the common variable space and for each sub-function j, Ψ^[j](Y) as the subbox in the set of non-rejected subboxes in the lists L^[j] ∪ Q^[j] with the smallest lower bound of f^[j] and whose range of the common variable overlaps with Y. The following proposition will facilitate the reasoning.
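The subbox selection just described can be sketched as follows. This is a hypothetical Python illustration with fabricated list entries; each stored subbox is reduced to its common-variable interval and its lower bound.

```python
# Hypothetical sketch of the subbox selection described above: among the
# stored entries of sub-function j (fabricated here), pick the one with the
# smallest lower bound whose common-variable interval overlaps Y.
def psi(entries, Y):
    """Each entry is (common_interval, lower_bound)."""
    overlapping = [e for e in entries
                   if e[0][0] <= Y[1] and Y[0] <= e[0][1]]
    return min(overlapping, key=lambda e: e[1])

entries = [((0.0, 0.5), 2.0), ((0.5, 1.0), -1.0), ((1.0, 2.0), 0.5)]
print(psi(entries, (0.25, 0.75)))   # ((0.5, 1.0), -1.0)
```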

Proposition 1 Given an interval Y of S_n obtained by an iterative uniform division, F^[j](Ψ^[j](Y)) is a lower bound of f^[j] over all non-rejected subboxes whose common-variable interval overlaps with Y.

Proof The result follows from the definition of Ψ^[j](Y) and the inclusion isotonicity of interval arithmetic (see Definition 4). As the intervals X_{n^[j]} are generated from the same division scheme, X_{n^[j]} belongs to a uniform subdivision of Y. According to Property 2, a partial overlap of X_{n^[j]} and Y without containment cannot happen because all intervals are generated using the same division scheme.
Given a subbox X ⊆ S^[j], a corresponding lower bound of f can be obtained as follows. Consider f to be evaluated over an extended box E_X^[j], with the components in I^[j] limited to the subbox X and the other components, I \ I^[j], free in S.

Proposition 2 Consider sub-function j and a corresponding box X ⊆ S^[j]. A lower bound of f over E_X^[j] is given by

F^[j](X) + Σ_{k≠j} lb^[k].

Proof Based on (6), Property 5 and inclusion isotonicity (Definition 4), each term bounds the corresponding sub-function from below over E_X^[j].

A sharper lower bound of f over E_X^[j] is given by the following theorem.
Theorem 2

F_DSP(E_X^[j]) = F^[j](X) + Σ_{k≠j} F^[k](Ψ^[k](X_{n^[j]})) (11)

is a valid lower bound of f over E_X^[j].
Proof This theorem is a direct consequence of Propositions 1 and 2.
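The decomposed lower bound F_DSP(E_X^[j]) can be sketched with a hypothetical Python fragment: the subbox's own lower bound is added to, for every other sub-function k, the smallest stored lower bound whose common-variable interval overlaps X_{n^[j]}. The list entries and numbers are fabricated for illustration.

```python
# Hypothetical sketch of the decomposed lower bound F_DSP: the subbox's own
# lower bound plus, for every other sub-function k, the smallest stored
# lower bound whose common interval overlaps X_n (fabricated data).
def psi_lb(entries, Y):
    """Smallest stored lower bound with common interval overlapping Y."""
    return min(lb for (lo, hi), lb in entries if lo <= Y[1] and Y[0] <= hi)

def f_dsp(own_lb, Xn, other_lists):
    """own_lb: F[j](X); Xn: common interval of X; other_lists: lists k != j."""
    return own_lb + sum(psi_lb(entries, Xn) for entries in other_lists)

other = [[((0.0, 0.5), 2.0), ((0.5, 1.0), -1.0)]]   # lists of sub-functions k != j
print(f_dsp(3.0, (0.6, 0.8), other))   # 3.0 + (-1.0) = 2.0
```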

Monotonicity
The monotonicity property for non-common variables is straightforward and is valid for each list separately.
Property 8 Non-common variables monotonicity. Property 4 applies to non-common variables.
Property 9 Common variable monotonicity. From Property 4, focusing on the common variable x_n, we know that for an interior optimum x*

∂f/∂x_n (x*) = Σ_{j=1}^{p} ∂f^[j]/∂x_n (x*) = 0. (12)

Similar to the bounding rule, one has to keep track of what happens in the other lists when checking monotonicity in the common variable. We introduce bounds on the derivative of each sub-function with respect to x_n over the corresponding subboxes, and their sum over all sub-functions. Now we can formulate the monotonicity property for the common variable in the following theorem.
Proof This theorem follows from (12), taking into account that over the whole box either ∂f/∂x_n < 0 or ∂f/∂x_n > 0 holds.
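The common-variable monotonicity idea behind (12) can be sketched as follows. This is a hypothetical Python illustration with fabricated derivative enclosures, not the paper's implementation.

```python
# Hypothetical sketch of the common-variable monotonicity idea: an enclosure
# of df/dx_n is the interval sum of the sub-function derivative enclosures
# over subboxes sharing the common-variable interval; if the summed
# enclosure excludes zero, no interior optimum can occur there (cf. (12)).
def derivative_sum(enclosures):
    """Interval sum of derivative enclosures (lo, hi)."""
    return (sum(lo for lo, hi in enclosures),
            sum(hi for lo, hi in enclosures))

def monotone(enclosures):
    """True if the summed derivative enclosure excludes zero."""
    lo, hi = derivative_sum(enclosures)
    return lo > 0.0 or hi < 0.0

print(monotone([(0.25, 0.75), (0.25, 0.5)]))   # True: f increasing in x_n
print(monotone([(-0.5, 0.25), (0.25, 0.5)]))   # False: zero not excluded
```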

DSP based interval branch-and-bound algorithm (IBB-DSP)
The ingredients of the sketched IBB algorithm are extended towards the DSP approach.

Selection rule
This rule determines the subproblems to be visited in the search tree, searching for the most promising regions. First it should be decided which list L^[j] to process next. A round-robin scheme can be applied over the set of non-empty lists, but other schemes can also be considered. For the selection of the subbox X from a list L^[j] to be processed next, we evaluated the following criteria.
6. Take X according to criterion 3 and evaluate F_DSP(E_X^[j]). If its value is higher than the saved value, restore X and repeat criterion 3.
Criterion 1 usually does not improve the upper bound of f* due to its focus on subproblem bounds instead of the full objective function.
Criterion 2 is a breadth-first selection rule which attempts to improve values of F_DSP(E_X^[j]) (see (11)) by diminishing the width of boxes in the search tree, avoiding large differences in the size of the boxes in both lists.
Criterion 3 seems a good choice because it uses best-first search and global information, similar to the IBB algorithm. However, the subproblems usually have different search behaviour: they differ in the total number of evaluated subproblems and in the number of evaluated subproblems per search tree level. So, it is difficult to select the box that improves F_DSP(E_X^[j]) in the future, because F_DSP(E_X^[j]) depends not only on X, but also on Ψ^[k](X_{n^[j]}) (see (7) and (11)), which are updated during the algorithm run.
Criterion 4 combines criteria 2 and 3, with their advantages and drawbacks.
Criterion 5 attempts to improve the value of F_DSP(E_X^[j]) by dividing the widest box among X^[j] and Ψ^[k](X^[j]_{n^[j]}). Criterion 6 tries to keep the values of F_DSP(E_X^[j]) updated. According to our preliminary experiments, this is the best of the presented selection rules and it is used in the illustration in Sect. 7.

Elimination rules
The following rules can be used to eliminate subboxes X = (Z, X_{n^[j]}) that are shown not to contain components of the optimal solution. In particular, a subbox can be rejected when 0 ∉ F'_i(X) for some i ≠ n^[j] and Z is interior with respect to T^[j] (Property 8).

Division rule
Similar to the IBB algorithm, the selected and not eliminated subbox X ⊆ S^[j] is bisected at the midpoint of its widest component.

Termination rule
All subboxes X ⊆ S^[j] with w(X) ≤ ε are stored in the final list Q^[j]. At the end, the algorithm combines the lists Q^[i], i = 1, ..., p, into a final list Q of boxes in the n-dimensional space. Due to the division rule, the algorithm has a finite number of steps. As subboxes that are guaranteed not to contain components of the optimal solution are discarded, the algorithm converges to a finite set of combinations that contains the optimal solutions.
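The final combination of the Q^[i] lists can be sketched for p = 2 with a hypothetical Python fragment (fabricated data; each stored element is reduced to a non-common interval and a common-variable interval):

```python
# Hypothetical sketch of the final combination phase for p = 2: subboxes of
# the two final lists are merged into full-dimensional boxes whenever they
# share the same common-variable interval (fabricated data).
def combine(Q1, Q2):
    """Each element: (non_common_interval, common_interval)."""
    out = []
    for z1, c1 in Q1:
        for z2, c2 in Q2:
            if c1 == c2:                  # same range of the common variable
                out.append((z1, z2, c1))  # box (Z1, Z2, Xn) in full space
    return out

Q1 = [((0.0, 1.0), (0.0, 0.5)), ((1.0, 2.0), (0.5, 1.0))]
Q2 = [((5.0, 6.0), (0.5, 1.0))]
print(combine(Q1, Q2))   # [((1.0, 2.0), (5.0, 6.0), (0.5, 1.0))]
```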

Implementation aspects
One of the most time-consuming parts of the algorithm is the search in (11). To avoid searching the L^[j] and Q^[j] data structures, which usually contain many elements, an auxiliary data structure stores Ψ^[k](X_{n^[j]}) for all existing intervals X_{n^[j]}. The IBB-DSP algorithm has three phases: the decomposed B&B search over the sub-function lists (phase 1), further reduction of the final lists Q^[j] (phase 2) and the combination of the remaining subboxes into n-dimensional boxes (phase 3).

Numerical illustration
A set of 13 test functions with a common variable has been composed. The Appendix describes the characteristics of these functions. The first five test problems are three-dimensional, which facilitates obtaining a graphical output like Fig. 1. Example 1 is a simple test function created specifically for this work. Most of the problems have been constructed by combining two well-known global optimisation test functions sharing one variable. For instance, problem Nr. 3 (L5P) is the combination of the Levy-5 and Price functions, and problem Nr. 6 (G5G5) is the sum of two Griewank-5 functions. The algorithm was run using two different values for the accuracy in the termination criterion, ε = 10^−3 and ε = 10^−6. The algorithms were implemented using version 2.4.0 of C-XSC for interval arithmetic and interval automatic differentiation [2]. In order to limit the negative effects of the clustering problem [3], the isotone inclusion function F_∩ is applied for the computation of bounds, see (5). We used an Intel Core Duo T2300 1.66 GHz processor with 1.5 GB of RAM.

We first analyse one instance to get hold of the advantages and disadvantages of running the complete-dimensional IBB compared to the IBB-DSP algorithm. Table 1 shows the behaviour of the IBB and IBB-DSP algorithms on test function GP3: the performance of the IBB algorithm above and of the IBB-DSP algorithm below, divided into phases 1, 2 and 3. The computational effort for IBB is measured as Effort = #F_∩ + n · #F_NIE, where # stands for the number of evaluations; for IBB-DSP, the corresponding counts of sub-function evaluations F^[j]_∩ are used. Additional information in Table 1 shows the number of boxes rejected by the tests presented in Sects. 3 and 6.3. Notice that phase 3 of the IBB-DSP uses the rejection tests of the IBB algorithm. The number of final boxes |Q^[i]| and the interval including f* are also shown for phases 1 and 2. Any sub-function lower bound lb^[i] (see (6)) can be used to obtain the interval including f* in phases 1 and 2.
The performance of the IBB-DSP algorithm, in terms of effort and CPU time, is better than that of the IBB algorithm for the GP3 test function. Besides, the IBB-DSP algorithm obtains a better final inclusion of f* for ε = 10^−3. We first focus on possible advantages of the decomposed perspective:
1. Decomposition leads to lower-dimensional problems.
2. Separation of terms may lead to fewer occurrences of the same variable. This helps to diminish overestimation of the real function range.
3. The new rejection tests for sub-functions (SubRUpT, NCVMT and UVCV) perform well due to point 2.
4. In IBB-DSP, the point evaluated to update f_U is restricted in the common component. Therefore, RUpT in phase 1 of the IBB-DSP algorithm is not as effective as RUpT of the IBB algorithm, but it still rejects a large number of boxes.
Possible disadvantages that can be derived from the observations:
1. In phase 1, the interval enclosing f* is larger than the final enclosing interval in phase 3. The lower bound of f(E_X^[j]) obtained by F_DSP(E_X^[j]) in (11) depends not only on F^[j](X^[j]), but also on how many subboxes in L^[k] ∪ Q^[k], k ≠ j, have points in common with X^[j]_{n^[j]}. These subboxes determine Ψ^[k](X^[j]_{n^[j]}), see (7). Fewer subboxes in L^[k] ∪ Q^[k] give better values of F_DSP(E_X^[j]). Therefore, in phase 3 the RUpT is effective in removing more boxes (Table 1).
2. Testing monotonicity in the common variable is left for phase 3.
3. Better values of F_DSP(E_X^[j]) are obtained as the lists shrink; RUpT is most effective at the end of the run in phase 2.
4. In phase 2, working with final boxes, the UVCV test is effective due to the following reasons:
- In phase 1, one sub-function j may result in final subboxes in Q^[j], whereas subboxes of another sub-function k that share common variable values are stored in L^[k]. Not all of these subboxes have offspring that reach Q^[k].
- Rejection of boxes of one sub-function by RUpT in phase 2 (see point 2) allows the UVCV test to remove boxes of other sub-functions.
Figure 1 shows rejected and final subboxes in phases 1 and 2 for the IBB-DSP algorithm on the GP3 problem with ε = 10^−1. It illustrates the points mentioned above.
In Tables 2 and 3, the effectiveness of the IBB and IBB-DSP algorithms is measured in terms of the interval containing f* and the number of boxes in the final list |Q| for two values of the accuracy ε. For both values, the IBB-DSP algorithm obtains a smaller or equal number of final boxes. Moreover, the IBB-DSP algorithm results in a smaller or equal range for f*.
In Tables 4 and 5, the efficiency is measured in terms of effort, CPU time (in seconds) and the maximum number of elements in the lists reached during algorithm execution. For an accuracy of ε = 10^−3, the IBB-DSP algorithm requires less effort than the IBB algorithm for all problems and spends similar or less time for all problems apart from case Nr. 9. For an accuracy of ε = 10^−6, IBB-DSP requires less effort than IBB for 6 out of 13 problems and consumes less time for 4 problems. For most of the problems, using a higher accuracy results in an increase in the size of L^[j] and Q^[j]. This makes it harder for the tests in the decomposed algorithm IBB-DSP to be effective, as mentioned before.
The maximum number of elements in the working and final lists directly affects the efficiency of IBB-DSP with respect to CPU time. This can be observed when applying a higher accuracy. As mentioned in disadvantage number 1 above, a larger number of elements in the working and final lists leads to a worse estimation of the lower bound. Additionally, far more computational effort is needed to find the value Ψ^[j](Y) defined in (7), as one has to go through the complete list of the other subproblem. For an accuracy of ε = 10^−3, the maximum number of elements in the working and final lists is in general smaller when using IBB-DSP compared to IBB.
What can be observed is that for the high-dimensional case Nr. 6 (with 9 variables), the advantage of using a decomposed approach is larger than for the lower-dimensional cases. A problem with 3 variables decomposes into two subproblems in two dimensions, whereas the 9-dimensional case has two subproblems in 5-dimensional space. On the other hand, the smallest advantage of using decomposition is also found in a higher dimension, with function Nr. 7. This case is characterised by an imbalance in the number of variables of the sub-functions, which causes an imbalance in the lengths of their lists. Better branch-and-bound selection strategies should be developed to overcome this imbalance.
We remark from Tables 2, 3, 4 and 5 that two different decompositions of the same problem (case Nr. 11) yield different numerical results (case Nr. 11b gives better efficiency). This shows that the way the problem is decomposed affects the efficiency, depending on the resulting maximum working and final list sizes.

Conclusions and future work
Interval B&B algorithms aim at guaranteed solutions for global optimisation problems. Such methods suffer from the curse of dimensionality. We studied functions that are separable, but where the resulting sub-functions have variables in common. Can one use the underlying structure to obtain more efficient algorithms based on interval branch-and-bound? To answer this question, we derived properties that can be used in interval branch-and-bound algorithms. Good bookkeeping of (sorted) underlying lists gives the opportunity to derive both local lower bounds (for each sub-function) and global lower bounds. It has been shown that the monotonicity test for the individual (non-common) variables can easily be extended to this framework. Applying a monotonicity test for the common variable over all sub-functions requires further investigation of possible implementations.
The developed properties have been included in a new branch-and-bound algorithm as part of the selection, bounding and elimination rules.
Experimental results show a potential improvement in effort and CPU time that depends mainly on the number of subproblems stored during algorithm execution. This shows possible advantages of a decomposed strategy over a full-dimensional strategy. Experiments have been designed in a controlled way, varying test instances composed artificially from existing test cases. As future work, it is interesting to apply the derived theory to real problems where more than one common variable can appear. Additionally, this paper facilitates the development of new variants and strategies to increase efficiency and their application to design problems such as in [9].

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.