Tighter αBB relaxations through a refinement scheme for the scaled Gerschgorin theorem

Of central importance to the αBB algorithm is the calculation of the α values that guarantee the convexity of the underestimator. Improvement (reduction) of these values can result in tighter underestimators and thus increase the performance of the algorithm. For instance, it was shown by Wechsung et al. (J Glob Optim 58(3):429–438, 2014) that the emergence of the cluster effect can depend on the magnitude of the α values. Motivated by this, we present a refinement method that can improve (reduce) the magnitude of the α values given by the scaled Gerschgorin method and thus create tighter convex underestimators for the αBB algorithm.
We apply the new method and compare it with the scaled Gerschgorin method on randomly generated symmetric interval matrices as well as interval Hessians taken from test functions. As a measure of comparison, we use the maximal separation distance between the original function and the underestimator. Based on the results obtained, we conclude that the proposed refinement method can significantly reduce the maximal separation distance when compared to the scaled Gerschgorin method. This approach therefore has the potential to improve the performance of the αBB algorithm.


Introduction
The αBB algorithm [1,2,5,15] is a branch-and-bound algorithm based on creating convex underestimators for general twice-continuously differentiable (C²) functions. The tightness of the underestimator plays a key role in the efficiency of the algorithm. In the αBB method, the underestimator of a C² term or function is obtained by adding an appropriate quadratic term to the original expression. The validity of the underestimator depends on the calculation of the so-called α values, which must be chosen appropriately in order to ensure convexity. One must take care, however, not to be over-conservative by selecting α values that are larger than needed, as smaller α values lead to a tighter underestimator with respect to the original function.
A number of methods for the calculation of α values that are rigorously valid, i.e., such that the underestimator is guaranteed to be convex, have been presented in the literature [2,12,19,20]. It is usual, but not necessary, for a trade-off between tightness of the underestimator and computational cost to exist. With respect to this, a comparative study among different methods for calculating α values for the original αBB underestimator, as well as methods that employ different underestimators [2–4,14,16,20], has been presented by Guzman et al. [9].
One important aspect of the choice of α values relates to the so-called cluster problem [8]. The cluster problem describes the situation where a branch-and-bound algorithm creates a large number of unfathomed boxes around a solution because it creates nodes much faster than it fathoms them. This effect is of course dependent on the quality of the underestimator and can significantly impact the performance of the algorithm. As shown by Wechsung et al. [21], improving the α values can be critical with respect to the cluster effect during execution of the αBB algorithm.
Motivated by the above observations, we introduce a "refinement" algorithm, based on Haynsworth's theorem [6,11], to improve the α values given by the scaled Gerschgorin method [2]. Although the algorithm can be applied to improve the α values given by any of the methods used in the original αBB method (see [2]) we choose the scaled Gerschgorin method because it usually gives good (i.e., comparatively small) α values, it is computationally cheap and the use of a different α value for each variable (non-uniform shift) allows for more flexibility than other (uniform shift) methods.
In order to test the algorithm on a number of randomly generated (symmetric) interval matrices and interval Hessians generated from test functions, we use the maximum separation distance as a measure of tightness between an αBB underestimator and the original function.
In Sect. 2 we begin by briefly presenting the αBB underestimator for general C 2 functions and the scaled Gerschgorin method for calculating α values for the underestimator. In Sect. 3 we state Haynsworth's theorem which is the basis of our new method. In Sect. 4 we present the refinement algorithm. We begin with an example to help the reader understand how we use Haynsworth's theorem for our purpose. We then give a pseudocode form of the algorithm and close the section with another example where we apply the refinement algorithm. In Sects. 5 and 6 we present the results of comparing the scaled Gerschgorin method and the refinement method. In Sect. 5 we present results from randomly generated symmetric interval matrices while in Sect. 6 we report results from Hessian matrices taken from test functions. Finally, we conclude in Sect. 7.
In what follows we will assume that the reader has some basic knowledge of the αBB algorithm and is familiar with interval arithmetic and interval matrices (see [10] for an introduction to interval analysis). We will denote single intervals using lower case letters inside square brackets, for example $[x] = [\underline{x}, \overline{x}]$, and interval matrices with capital letters inside square brackets, for example [M].

The αBB underestimator and the scaled Gerschgorin method
Given a general nonlinear function, f ∈ C², a convex underestimator, F, of f over a given hyper-rectangular domain $X = [\underline{x}_1, \overline{x}_1] \times \cdots \times [\underline{x}_n, \overline{x}_n]$ is constructed within the αBB algorithm [1,2,5,15] as

$$F(x) = f(x) + q(x), \qquad q(x) = \sum_{i=1}^{n} \alpha_i (\underline{x}_i - x_i)(\overline{x}_i - x_i). \qquad (1)$$

Note that q(x) ≤ 0, ∀x ∈ X, and thus F(x) is indeed an underestimator of f(x) over that domain. The α values have to be determined so as to ensure F is convex. This is accomplished with the use of the interval Hessian matrix, [H_f], over the hyper-rectangular domain of interest. The interval Hessian matrix [H_f] is obtained by constructing the matrix H_f(x) of second-order derivatives of f and deriving an interval enclosure $[\underline{h}_{ij}, \overline{h}_{ij}]$ for each element h_ij(x) over the domain X. In the scaled Gerschgorin method [2], the α values are calculated as

$$\alpha_i = \max\left\{0,\; -\frac{1}{2}\left(\underline{h}_{ii} - \sum_{j \neq i} \max\{|\underline{h}_{ij}|, |\overline{h}_{ij}|\}\,\frac{k_j}{k_i}\right)\right\}, \qquad (2)$$

with k_i, i = 1, …, n, being positive scaling parameters. A useful feature of the αBB underestimator is that the maximum separation distance between f(x) and the underestimator F(x) over X is explicitly given by

$$d_{\max} = \max_{x \in X}\left(f(x) - F(x)\right) = \frac{1}{4} \sum_{i=1}^{n} \alpha_i (\overline{x}_i - \underline{x}_i)^2. \qquad (3)$$

We can see from Eq. (3) that even if the α values were not to improve as we subdivide the domain, the maximum separation distance would nevertheless improve quadratically. This is an important feature of the αBB underestimator which relates to the cluster problem [8].
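As a concrete sketch, Eqs. (2) and (3) can be computed as follows (illustrative code only; the interval Hessian is assumed to be given as two NumPy arrays of elementwise lower and upper bounds, the function names are ours, and the default choice k_i = 1 is an assumption):

```python
import numpy as np

def scaled_gerschgorin_alpha(H_lo, H_hi, k=None):
    """Alpha values per Eq. (2): for each row i,
    alpha_i = max(0, -(h_lo[i,i] - sum_{j != i} max(|h_lo[i,j]|, |h_hi[i,j]|) * k_j/k_i) / 2)."""
    H_lo, H_hi = np.asarray(H_lo, float), np.asarray(H_hi, float)
    n = H_lo.shape[0]
    k = np.ones(n) if k is None else np.asarray(k, float)
    alpha = np.zeros(n)
    for i in range(n):
        radius = sum(max(abs(H_lo[i, j]), abs(H_hi[i, j])) * k[j] / k[i]
                     for j in range(n) if j != i)
        alpha[i] = max(0.0, -(H_lo[i, i] - radius) / 2.0)
    return alpha

def max_separation(alpha, x_lo, x_hi):
    """Maximal separation distance per Eq. (3): sum_i alpha_i * width_i^2 / 4."""
    return 0.25 * sum(a * (hi - lo) ** 2 for a, lo, hi in zip(alpha, x_lo, x_hi))
```

Note how `max_separation` shrinks quadratically with the box widths even if `alpha` stays fixed, which is the second-order convergence property discussed above.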
A theoretical analysis of the cluster problem was first carried out in [8]. This analysis showed that the relaxations in a branch-and-bound algorithm must have at least second-order convergence to "avoid" the cluster problem. In a later paper [21], it was shown that the pre-factor of the convergence order also plays a crucial role. For the αBB algorithm, the prefactor corresponds to the α values. Therefore, an improvement of these values could have a significant effect on the performance of the αBB algorithm.
As is evident from Eq. (3) we would like to make the α values as small as possible while ensuring that the Hessian matrix of second-order derivatives of F(x), H F (x) = H f (x) + D where D is the diagonal matrix with diagonal entries d i = 2α i , is positive semi-definite over the area of interest. With the help of Haynsworth's theorem [6,11], introduced in the next section, we can improve (reduce) the α values obtained by the scaled Gerschgorin method.

Haynsworth's theorem
The inertia of a symmetric matrix M is defined as the triple In(M) = (π, ν, δ), where π, ν and δ are, respectively, the numbers of positive, negative and zero eigenvalues of M. We now state Haynsworth's theorem, which is the basis of the refinement method.

Theorem 3.2 (Haynsworth) Let M be a symmetric matrix partitioned as

$$M = \begin{pmatrix} A & B \\ B^{T} & C \end{pmatrix},$$

with A nonsingular. Then In(M) = In(A) + In(C − BᵀA⁻¹B).

Theorem 3.2 can be used recursively for the complete calculation of the inertia of a scalar matrix [7] and therefore for revealing whether the matrix is positive semi-definite or not. This can be accomplished by choosing A to be a single diagonal entry, noting its sign, then calculating the Schur complement, C − BᵀA⁻¹B, and repeating the process on this newly formed matrix. Assume, for example, that for a given n × n symmetric matrix M we repeat this procedure n times and find a non-negative (1,1) entry, $m_{11}^{(i)} \ge 0$, for the i-th Schur complement, Mᵢ, with M₁ being the initial matrix M. Then by Theorem 3.2 we conclude that the matrix M is positive semi-definite.
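The recursive procedure just described can be sketched for scalar symmetric matrices as follows (our own minimal illustration; the fallback for a near-zero pivot is a simplification, since the full pivoting rules of [7] are not implemented):

```python
import numpy as np

def is_psd_via_schur(M, tol=1e-12):
    """Check positive semi-definiteness of a symmetric (scalar) matrix by
    recursively pivoting on the (1,1) entry and forming the Schur
    complement, using the additivity of inertia (Theorem 3.2).
    Simplification: on a (near-)zero pivot it falls back to an eigenvalue
    check instead of the pivoting rules of [7]."""
    M = np.array(M, dtype=float)
    while M.shape[0] > 1:
        a = M[0, 0]
        if a < -tol:
            return False  # a negative pivot reveals a negative eigenvalue
        if a <= tol:
            # zero pivot: cannot invert; fall back to a direct check
            return bool(np.all(np.linalg.eigvalsh(M) >= -tol))
        b = M[0, 1:]
        M = M[1:, 1:] - np.outer(b, b) / a  # Schur complement of the pivot
    return bool(M[0, 0] >= -tol)
```

Each iteration removes one row and column, so the test terminates after at most n − 1 Schur complements.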
For scalar matrices we can always proceed to calculate the complete inertia even if at some step there is no non-zero diagonal entry that can be chosen (see [7] for details). An extension of the recursive use of Theorem 3.2 for the calculation of the inertia of symmetric interval matrices has been presented in [18]. In this work, however, we are not interested in calculating the inertia but rather guaranteeing semi-definiteness. In a similar manner, we can extend the recursive procedure for determining the positive semi-definiteness of scalar matrices to the case of interval matrices.
For example, consider a symmetric interval matrix [M]. We follow the same procedure as in the scalar case but use interval arithmetic for the calculation of each subsequent (interval) Schur complement. Assume we find that, at every step, the lower bound of the (1,1) interval entry of the current Schur complement is non-negative. Then every symmetric scalar matrix contained in [M] is positive semi-definite (Proposition 3.3). The key observation in the proof is that, when we calculate an interval Schur complement, the inclusion isotonicity of interval arithmetic guarantees that it encloses the scalar Schur complement of every matrix contained in the original interval matrix. In the next section we begin with an example of how this can be used to calculate smaller α values and to help the reader understand how we utilize Theorem 3.2.
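A single step of the interval variant can be sketched as follows (illustrative code with intervals represented as (lo, hi) pairs; for simplicity it requires the pivot interval to be strictly positive):

```python
def imul(x, y):
    """Interval product [x] * [y]."""
    ps = (x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1])
    return (min(ps), max(ps))

def isub(x, y):
    """Interval difference [x] - [y]."""
    return (x[0] - y[1], x[1] - y[0])

def idiv(x, y):
    """Interval quotient [x] / [y] for a strictly positive [y]."""
    assert y[0] > 0, "pivot interval must be strictly positive"
    return imul(x, (1.0 / y[1], 1.0 / y[0]))

def interval_schur_step(M):
    """One recursive step: pivot on the leading diagonal interval entry and
    return the interval Schur complement of the remaining block.
    M is a list of lists of (lo, hi) pairs for a symmetric interval matrix."""
    n = len(M)
    return [[isub(M[i][j], idiv(imul(M[i][0], M[0][j]), M[0][0]))
             for j in range(1, n)] for i in range(1, n)]
```

Applying this step repeatedly, and inspecting the lower bound of each pivot before eliminating it, yields the interval version of the recursive semi-definiteness test.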

The refinement algorithm
Consider a 3-dimensional function f : B ⊂ ℝ³ → ℝ. We want to construct the αBB underestimator over an area X ⊆ B. After calculation of the interval Hessian [H_f] over X and calculation of the α values using Eq. (2), we consider the convex underestimator with the corresponding interval Hessian matrix [H_F] = [H_f] + D, where D is the diagonal matrix with entries d_i = 2α_i given by Eq. (2) for i = 1, 2, 3. Now assume that after applying Haynsworth's theorem recursively on [H_F] we find a non-negative lower bound at step 1 (inequality (7)), at step 2 (inequality (8)), and finally at step 3 (inequality (9)). Notice that the left-hand sides of inequalities (7), (8), (9) are the lower bounds of the (1,1) interval entries of the successive Schur complements. In the above scenario, based on Proposition 3.3, we would know that the interval matrix is positive semi-definite.
We focus on one specific output of these calculations: the lower bound of the (1,1) entry at each step. In the case shown here, since the original matrix [H_F] is used without any pivoting, any positive slack (residual) in these lower bounds indicates that the corresponding diagonal shifts d_i = 2α_i are larger than necessary: part of the slack can be subtracted from the shifts while the recursive test still certifies positive semi-definiteness. The refinement algorithm (Algorithm 1) exploits exactly this slack, computing at each iteration a residual r and reducing the corresponding shift by an amount m ∈ [0, min{r, d}]. Note that if Algorithm 1 terminates for some iteration k at step 5, any improvements obtained up to that point (m_n, …, m_{n−(k−1)}) to the α values, α_n, …, α_{n−(k−1)}, are still valid. The rest of the values, m_{n−k}, …, m_1, are still zero from the initialization at step 2. Thus, the new α values given at step 8 are valid.
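A simplified, self-contained sketch of this refinement idea follows (our own stand-in, not Algorithm 1 itself: instead of the analytic residual bound min{r, d}, it searches by bisection for the largest reduction of each shift d_i = 2α_i that keeps the shifted interval Hessian verifiably positive semi-definite under the recursive interval Schur test; the test is conservative and refuses to pivot on a non-positive lower bound):

```python
import numpy as np

def interval_schur_psd(lo, hi, tol=1e-12):
    """Certify PSD of a symmetric interval matrix [lo, hi] via recursive
    interval Schur complements: every pivot's lower bound must be
    non-negative. Conservative: returns False when it cannot pivot."""
    lo, hi = np.array(lo, float), np.array(hi, float)
    while lo.shape[0] > 1:
        a_lo, a_hi = lo[0, 0], hi[0, 0]
        if a_lo <= tol:
            return False  # cannot certify without further pivoting rules
        n = lo.shape[0]
        nlo, nhi = np.empty((n - 1, n - 1)), np.empty((n - 1, n - 1))
        for i in range(1, n):
            for j in range(1, n):
                # interval product [b_i] * [b_j]
                ps = (lo[i, 0] * lo[0, j], lo[i, 0] * hi[0, j],
                      hi[i, 0] * lo[0, j], hi[i, 0] * hi[0, j])
                p_lo, p_hi = min(ps), max(ps)
                # interval quotient by the (positive) pivot [a_lo, a_hi]
                q_lo = min(p_lo / a_lo, p_lo / a_hi)
                q_hi = max(p_hi / a_lo, p_hi / a_hi)
                nlo[i - 1, j - 1] = lo[i, j] - q_hi
                nhi[i - 1, j - 1] = hi[i, j] - q_lo
        lo, hi = nlo, nhi
    return bool(lo[0, 0] >= -tol)

def refine_alphas(H_lo, H_hi, alpha, steps=30):
    """Reduce each shift d_i = 2*alpha_i (last variable first) as far as
    the interval Schur test still certifies [H_f] + diag(d) to be PSD."""
    H_lo, H_hi = np.asarray(H_lo, float), np.asarray(H_hi, float)
    d = 2.0 * np.asarray(alpha, float)

    def ok(dv):
        return interval_schur_psd(H_lo + np.diag(dv), H_hi + np.diag(dv))

    if not ok(d):
        return d / 2.0  # cannot even certify the unrefined shifts; keep them
    for i in range(len(d) - 1, -1, -1):   # last variable first
        full = d.copy(); full[i] = 0.0
        if ok(full):
            d[i] = 0.0                    # the whole shift d_i is unnecessary
            continue
        good, bad = 0.0, d[i]             # bisect for the largest valid reduction
        for _ in range(steps):
            mid = 0.5 * (good + bad)
            trial = d.copy(); trial[i] -= mid
            if ok(trial):
                good = mid
            else:
                bad = mid
        d[i] -= good
    return d / 2.0
```

Every candidate reduction is re-verified against the full interval test, so any output of this sketch yields valid (convexity-preserving) α values, even though it forgoes the cheaper residual formulas of the paper.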
The question arises of how to choose a value for m_{n−i} ∈ [0, min{r_{n−i}, d_{n−i}}] at step 5 of Algorithm 1. At the first iteration we can choose m_n = min{r_n, d_n}. However, it might be wiser to "spread" the reduction over all the diagonal elements (if possible). We consider three approaches: shared (10), extra-weighted (11) and weighted (12). In the shared option, the current reduction is equal to the current residual divided by the number of remaining diagonal entries to be reduced. In the extra-weighted option, the current reduction has the same value as in the shared option plus a weighted portion (w_{n−i}) of what remains if we subtract this value from the residual. In the weighted option, the reduction value is a portion of the current residual given by the ratio of d_{n−i} to the sum of the remaining d_j values to be reduced.
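The three choices can be sketched as follows (a hedged reading of options (10)–(12) from the description above: r is the current residual, d_i the current shift, the cap to [0, min{r, d_i}] follows step 5, and the default weight w is purely illustrative):

```python
def shared(r, d_i, n_remaining):
    """Shared option (10): split the residual evenly over the remaining entries."""
    return min(r / n_remaining, r, d_i)

def extra_weighted(r, d_i, n_remaining, w=0.5):
    """Extra-weighted option (11): the shared value plus a weighted portion
    of the leftover residual. The default w is an assumption."""
    base = r / n_remaining
    return min(base + w * (r - base), r, d_i)

def weighted(r, d_i, d_remaining_sum):
    """Weighted option (12): portion of the residual proportional to d_i."""
    return min(r * d_i / d_remaining_sum, r, d_i)
```

All three return a reduction inside the admissible interval, so they differ only in how aggressively they spend the residual on the current entry versus saving it for later ones.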
Let us now give an example of the refinement algorithm so that it may become clearer to the reader. We will use the shared reduction option for this example. Consider the 3 × 3 (symmetric) interval matrix: The fact that the diagonal elements of the example matrix are scalar bears no significance; in fact, in practice, only the lower bounds of the diagonal elements need to be considered (see Lemma 1 in [13]). Calculating the α values using Eq. (2) (with k_i = 1 for i = 1, 2, 3) we find α₁ = 8, α₂ = 6, α₃ = 8.5. Performing calculations (7)–(9) on the resulting (hypothetical) interval Hessian and applying the shared reduction option, we obtain the reduced values α′₁, α′₂, α′₃. Although we cannot calculate actual minima in this case, since our example matrix was not derived from a specific function, we can measure the improvement obtained with the reduced values, α′ᵢ, using Eq. (3). More specifically, we can set $(\overline{x}_i - \underline{x}_i)^2 = 1$, i = 1, 2, …, n, and consider the percentage of improvement with respect to the (hypothetical) maximal separation distance,

$$I = \frac{\sum_{i=1}^{n} \alpha_i - \sum_{i=1}^{n} \alpha'_i}{\sum_{i=1}^{n} \alpha_i} \times 100\%. \qquad (17)$$

The value of I can vary from 0% (no reduction at all in the α values) up to 100% (the initial matrix is identified as positive semi-definite). For our example we have I = 21.2%, meaning that (by this measure) the refinement led to a 21.2% reduction in the maximal separation distance.
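With unit box sides, Eq. (3) gives d_max = (1/4) Σ αᵢ, so the measure I reduces to a ratio of α sums; a minimal sketch:

```python
def improvement_percent(alpha, alpha_refined):
    """I of Eq. (17): percentage reduction of the hypothetical maximal
    separation distance d_max = sum(alpha)/4 (unit box sides)."""
    return 100.0 * (sum(alpha) - sum(alpha_refined)) / sum(alpha)
```

For the worked example, Σαᵢ = 8 + 6 + 8.5 = 22.5, so I = 21.2% corresponds to refined α values summing to roughly 17.7.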
Note that we could simply apply the recursive procedure given in Eq. (5) on the initial Hessian matrix to determine whether it is positive semi-definite. This concept was proposed in [17]. In this work, however, we are interested in reducing the α values and not identifying whether the initial interval Hessian is positive semi-definite or not.

Results on random symmetric interval matrices
In this section we present results from the application of the refinement algorithm on randomly generated symmetric interval matrices. We have generated four groups of one thousand random matrices each, with dimensions 3, 4, 5 and 7 respectively, and with the interval entries of each matrix varying within a fixed range. We apply the refinement algorithm with the three reduction options [Eqs. (10), (11) and (12)] and calculate the percentage improvement (reduction) in the maximum separation distance, I, given by Eq. (17). For each group of matrices we plot three histograms of the I values obtained after applying the refinement algorithm with each reduction option respectively in Figs. 1, 2, 3 and 4. Furthermore, in Table 1 we give the mean values attained by I for each reduction option in each group of random matrices.
Notice that cases where the α values calculated by the scaled Gerschgorin method were all zero have been filtered out from the results of both this and the next section. In practice, for such cases, there is no need for underestimation as the problem is already convex over the domain of interest.
We can make the following observations. First, as the matrix dimension increases, the mean I values improve (increase) in all cases. Second, the shared (10) and extra-weighted (11) options perform significantly better than the weighted option (12) in all four cases, while the extra-weighted option performs slightly better than the shared option. For a more detailed analysis of the performance of the shared and extra-weighted options, we plot a histogram (Fig. 5) of the value of I₂ − I₁, where I₁ and I₂ are the values for the shared method (Fig. 4a) and the extra-weighted method (Fig. 4b), respectively. As can be seen, in the majority of cases in Fig. 5, I₂ − I₁ is positive. Therefore, we can conclude that the extra-weighted option appears preferable overall.

Results on random interval Hessian matrices
In this section we present results from the application of the refinement algorithm on symmetric interval Hessian matrices calculated over random sub-domains of the following three test functions:

Griewank:

$$f(x) = 1 + \sum_{i=1}^{n} \frac{x_i^2}{4000} - \prod_{i=1}^{n} \cos\left(\frac{x_i}{\sqrt{i}}\right),$$

Levy:

$$f(x) = \sin^2(\pi y_1) + \sum_{i=1}^{n-1} (y_i - 1)^2\left[1 + 10 \sin^2(\pi y_{i+1})\right] + (y_n - 1)^2, \qquad y_i = 1 + \frac{x_i - 1}{4},$$

and an extended Himmelblau function. For each function we calculate three groups of one thousand Hessian matrices each, over random hyper-rectangular domains with randomly chosen centres and with sides of randomly varying length within (0, L), where L = 2, L = 1 and L = 0.2 for each group respectively. The results are given in Figs. 6, 7 and 8. We also give the mean I value attained for each test function and each value of L in Table 2, where boldface indicates the option which gives the largest mean value of I.
As mentioned earlier, the results differ for each case since the Hessians have a particular structure and particular entry values. In Fig. 6 (Griewank) the refinement method results in an improvement of approximately 14% regardless of the value of L. In Fig. 7 (Levy) we see that the refinement method is most effective when L = 0.2, with an average improvement of 11.5%. In Fig. 8 (extended Himmelblau) the refinement algorithm performs well in all cases, with increasing improvement (21.5%, 27.4% and 32.6%) as the value of L becomes smaller. Finally, in Fig. 9 we compare once more the shared reduction option and the extra-weighted reduction option by comparing the I values given by Eq. (17).

Conclusions
We have presented a refinement method which we use in conjunction with the scaled Gerschgorin method in order to improve (reduce) the α values needed for the convex underestimator of the deterministic global optimization algorithm αBB. The refinement method can also be utilized with other available methods for the calculation of the α values.
We have applied our algorithm to randomly generated symmetric interval matrices as well as interval Hessian matrices taken from test functions. In order to compare the scaled Gerschgorin method and the refinement method, we used the maximal separation distance between the original function and the underestimator as a measure.
In the experiments with the randomly generated matrices we used four groups of matrices with dimensions 3, 4, 5 and 7 respectively, each group consisting of a thousand matrices. The results showed that the refinement method improved the maximal separation distance by an average of 7.4%, 11.3%, 13.3% and 16.3% for each group respectively. In the experiments with the interval Hessian matrices we used three test functions: the 4D Griewank, the 5D Levy and a 5D extension of the Himmelblau function. For each test function we calculated three groups of a thousand interval Hessians each. The Hessians were calculated over randomly chosen hyper-rectangular areas with sides of length in (0, L), where L = 2, L = 1 and L = 0.2 for each group respectively. As is natural, the results differ for each function. For the Griewank function the results were similar regardless of the value of L, with an average improvement of approximately 14%. For the Levy function there was no significant improvement for L = 2 and L = 1; however, for L = 0.2 the improvement was 11.5%. Finally, for the extended Himmelblau function we observed 21.5%, 27.4% and 32.6% improvement for L = 2, L = 1 and L = 0.2 respectively, with many cases showing 100% improvement.
Furthermore, we have tested three different reduction options for step 5 of the refinement algorithm and, based on our numerical results, we have concluded that the extra-weighted option (11) performs best. From the above we conclude that the refinement method can result in a considerable improvement with respect to the maximal separation distance. Although this improvement comes at a (reasonable) computational cost, the improvement in the α values can result in an overall reduction of computational time if nodes are fathomed at a higher rate during the execution of the branch-and-bound algorithm. As future work, it remains to be seen whether the refinement method is cost-effective when integrated into the αBB algorithm.