Enclosure of all index-1 saddle points of general nonlinear functions

Transition states (index-1 saddle points) play a crucial role in determining the rates of chemical transformations but their reliable identification remains challenging in many applications. Deterministic global optimization methods have previously been employed for the location of transition states (TSs) by initially finding all stationary points and then identifying the TSs among the set of solutions. We propose several regional tests, applicable to general nonlinear, twice continuously differentiable functions, to accelerate the convergence of such approaches by identifying areas that do not contain any TS or that may contain a unique TS. The tests are based on the application of the interval extension of theorems from linear algebra to an interval Hessian matrix. They can be used within the framework of global optimization methods with the potential of reducing the computational time for TS location. We present the theory behind the tests, discuss their algorithmic complexity and show via a few examples that significant gains in computational time can be achieved by using these tests.


Introduction
the interval matrix. In this paper we are interested in symmetric interval matrices since we will calculate interval Hessian matrices over a given hyper-rectangular area, [   Similarly we define the coindex for symmetric interval matrices. Note that π and ν are the same as the index and coindex respectively.  It is easy to verify that by this definition all the conditions required to hold for a norm of a scalar matrix also hold for the norm of an interval matrix.

Proposed approach
We use a branch-and-bound (B&B) algorithm and the formulation proposed in [29] (problem P below) in order to search for critical points: However, aiming to focus the computational effort on the location of TSs, we introduce a number of tests which can be used to bound the number of negative and positive eigenvalues of an interval matrix. In a branch-and-bound algorithm, at any given iteration, valid lower and upper bounds on the global minimum are calculated over hyper-rectangular subsets R of the initial domain B. By dividing each subset area improving lower and upper bounds are obtained. Whenever the lower bound of a given area is found to be greater than the best upper bound so far, the area is fathomed. We modify the approach by applying, prior to each bounding step, a test on the interval Hessian matrix, [∇ 2 f (R)], calculated over R by the natural interval extension [11] of the second derivatives ∂ 2 f /∂ x i ∂ x j . The interval Hessian can be seen as a superset of {∇ 2 f (x) : x ∈ R}. If the test reveals that every matrix in [∇ 2 f (R)] has index > 1 then we fathom the area R. If the test reveals that every matrix in [∇ 2 f (R)] is index-1 and coindex-n − 1 then we can choose to perform a local search, since it can be shown (cf. Sect. 5) that this implies that there can be at most one TS in R. If a TS is found during the local search, we fathom the area. Otherwise the test is inconclusive and we proceed to the next step of the modified B&B algorithm. A flowchart of the proposed procedure is given in Fig. 1. A check to determine if zero is contained in the interval gradient is also applied at every iteration; if it is not the area is discarded.

Regional tests
In this section, we introduce five regional tests related to the presence of TSs. The tests can be used to identify regions that do not contain any TS, or regions that contain at most one TS. The computational complexity of each test is reported in each case. If the tests are embedded within a branch-and-bound algorithm for the solution of Problem (P), the computational complexity of the solution of the convex lower bounding problem, which is NP-hard, dominates the overall cost. Furthermore, if the αBB algorithm [2,3] is used, the interval Hessian matrix information required in the tests is readily available from the construction of the lower bounding problem and an efficient implementation can be developed with minimal effort devoted to the application of tests. Examples of the application of each test can be found in the "Appendix".

The Gerschgorin test
We begin by developing a regional test based on the well-known theorem by Gerschgorin [30]. An interval extension for the first part of the above theorem was given in [2] and used for the calculation of lower bounds for the eigenvalues of symmetric interval matrices. Here we are interested in the second part of Gerschgorin's theorem, on counting the eigenvalues in disjoint sets. The extension in [2] is also valid for the second part of the theorem.  We give a pseudocode for a test based on Theorem 4.2, which we call the Gerschgorin test, in Algorithm 1. Regions for which the interval Hessian contains no negative disks (convex areas), or where a set of more than one discs lie on the negative side and are disjoint from the rest, are removed (lines 16-17 and 21-29 in Algorithm 1). By "discs" here we mean the intervals D i ([M]). Regions with one negative eigenvalue and all the other positive may also be identified (lines [18][19]. Notice that the Gerschgorin test may be inconclusive even for a scalar matrix.

Recursive inertia (RecIn) test
Based on Haynsworth's theorem [5,12] we can construct algorithms for obtaining bounds on the number of negative and positive eigenvalues of interval matrices.

A∈[A]
In(A) + min

S∈S x
In(S) ≥ min

A∈[A]
In(A) + min

S∈[S]
We can make use of Haynsworth's theorem recursively, as shown by Cottle [7]. Cottle considers scalar matrices and chooses A to be a single non-zero entry in the diagonal. By interchanging corresponding rows and columns simultaneously, thus not affecting the eigenvalues, we bring the selected entry A to the top left position of the matrix. We note the sign of A, we then calculate C − B T A − [B] and repeat. We should give priority to negative intervals. If at any point all the diagonal interval elements contain zero, then we cannot proceed further with the analysis and stop. Note that in the interval case, each time we find a negative (resp. positive) interval in the diagonal of a subsequent Schur complement, this means that all the scalar matrices contained in the initial interval matrix have a further negative (resp. positive) eigenvalue. In a similar manner, Meyer and Swartz [21] used Schur's formula, det (M) = det (A)det (C − B T A −1 B), for a convexity test applied to interval matrices (such a test was mentioned in [7] for scalar matrices) along with a branchand-bound method. In Algorithm 2 we give a pseudocode for the proposed recursive inertia test, RecIn.

Extended RecIn test
The RecIn test cannot proceed if all diagonal elements of the initial input matrix or of a subsequent Schur complement contain zero. We extend the RecIn algorithm to overcome this issue.
Also, n − min ν(L) (the inequality stems from the fact that the matrix might have zero eigenvalues) and hence finally, In a similar way we can show that, π(L). Thus, by (10) and (11), we obtain bounds for ν ([M]) and π([M]). We give the pseudocode for the RecIn_U in Algorithm 3 and then the extended recursive inertia test, xRecIn in Algorithm 4. We omit the pseudocode for RecIn_L since it is easy to derive it from RecIn_U.
Note that for the calculation of the Schur complement in step 10 of the RecIn_U algorithm,

2 × 2 Inertia test
Another possible way to make use of Theorem 4.3 for our purpose is to choose , to be any of the 2 × 2 diagonal sub-matrices of The maximum eigenvalue, λ i j = max , of each of these matrices is If λ i j < 0 for any of the sub-matrices then by Theorem 4.3 we know that every M ∈ [M] has at least two negative eigenvalues and thus we can fathom the corresponding area. In Algorithm 5 we give a pseudocode for this test to which we refer as the 2 × 2 inertia test.
Fathom area. 7: end if 8: end for 9: end for Note that the 2 × 2 inertia test does not remove TSs and minima and that it may be inconclusive even for a scalar matrix. However, it is computationally cheap and it is easy to implement. Furthermore, it is straightforward to show that this test is more effective than the Gerschgorin test in identifying non-TS areas. More formally we have the following: This implies that m ii + max{|m i j |, |m i j |} < 0 and m j j + max{|m ji |, |m ji |} < 0.
From (13)  Finding a counter-example to show that the reverse is not always true is easy (see "2 × 2 inertia test example" in the Appendix).

Rohn test
The last test we present is based on Rohn's method [17] which is derived from the interval extension of Weyl's inequality [10].

Theorem 4.8 (Weyl) Given n × n symmetric (scalar) matrices C and E, then
where for any matrixM, λ n (M) ≤ · · · ≤ λ 1 (M). Any given interval matrix Note that because C has been defined as the center matrix of [M], λ n = −λ 1 and also that the widths of the intervals λ k ([M]) are all the same. We can calculate λ n (and λ 1 ) using a number of methods (see [2,27]), the simplest being the interval extension of Gerschgorin's theorem (O(n 2 )) and the most expensive being the Hertz-Rohn method (O(2 n−1 )) [15,16,26]. The Rohn test is summarized in Algorithm 6. In Sect. 3 we stated that hyper-rectangular areas where every matrix is index-1 and coindexn − 1 has at most one TS. We give a proof of this statement here. The proof is straightforward and we state it for completeness.
Proof If f has any critical points in B then by the assumption that ∇ 2 f (x), x ∈ B is index-1 and coindex-n − 1, they would be TSs. Now assume that x 1 , x 2 ∈ B with x 1 = x 2 are critical points of f . Then, by the mean value theorem for some ξ between x 1 and x 2 and since B is a hyper-rectangle ⇒ ξ ∈ B. However, which contradicts our assumption.
In practice the interval Hessian,  . In any case a test might fail to provide a definitive answer and thus be inconclusive. By considering under what circumstances a test may be inconclusive, we can classify the proposed tests using the following definitions.

Definition 6.1 (Complete test)
A test is called complete if it is never inconclusive.
as input.

Definition 6.3 (Incomplete test) A test is called incomplete if it is not -complete.
We note that in the above definitions, for any test, we assume infinite-precision arithmetic and also that we know the maximum number of steps a priori.
The Gerschgorin and 2 × 2 inertia tests are incomplete since they can be inconclusive even for scalar matrices. The recursive inertia test is also incomplete since it cannot deal with matrices where all the diagonal elements contain zero. The extended recursive inertia test and Rohn test are -complete. We do not know of any method that can result in a complete test or if a complete test is even possible. In the next section we prove that this is an NP-hard problem.
We could attempt to construct a complete test with the following reasoning. The Hertz-Rohn method [15] gives the exact lower and upper bounds of the smallest and largest eigenvalue, respectively, of any symmetric interval matrix  The roots of h(c) = −c 2 + 2cb − 2b 2 + d 3 are given by c *

Corollary 6.5 There is no choice of S such that for any n × n symmetric interval matrix [M], S([M]) provides correct bounds for
Proof If such a choice of S existed then it would also allow the correct calculation of the bounds for the index of any symmetric interval matrix, which contradicts Proposition 6.4.
A summary with the characteristics of each test is given in Table 1.

Algorithmic complexity
In this section we investigate the algorithmic complexity of the problems that we aim to solve with the algorithms given in Sect. 4 that is, identifying a TS matrix or a non-TS matrix. By TS and non-TS we mean, given a symmetric interval matrix  [25] proved that checking positive definiteness of an interval matrix is an NP-hard problem.  Proof Simply consider the block interval matrix where D can be any diagonal k × k matrix with all the diagonal entries being negative and Therefore identifying a TS matrix is NP-hard. With the help of Haynsworth's theorem and using the same reduction as in [23], used for proving that checking the positive semidefiniteness of an interval matrix is NP-hard, we can prove the NP-hardness of identifying a non-TS matrix. First we give the following lemma from [23].

d(a) is the smallest common denominator of the entries of a.
is NP-complete. aa T )z * < 0 which would imply a "no" answer to problem 7.3.

Results
The proposed tests have been implemented in the αBB algorithm [1]. The use of the αBB algorithm for solving problem (P) requires the calculation of the second derivatives of the constraints, which include first derivatives of the function f . Therefore, function f must be three-times continuously differentiable in the specific implementation we have developed.
The tests presented here, however, are applicable to C 2 functions and can readily be integrated within algorithms that do not require the constraints to be in C 2 , e.g. [18]. As mentioned previously, an efficient implementation of the tests can be constructed by using the interval values of the second-order derivatives of f that can be computed when calculating α values for the underestimators. A more basic implementation has been used here, so that the computational performance provides a worst-case analysis of the cost of the tests. We investigate the performance of the proposed tests on a number of problems. For each problem we perform one run using no test and separate runs using each test without local search. For the Gerschgorin, RecIn and Rohn tests we also perform runs with local search in order to evaluate whether there would be any improvement regarding the CPU time. For bounding the eigenvalues in Rohn's test we used the interval extension of Gerschgorin's theorem [2]. For each problem we give a table containing the CPU times for each run and the corresponding number of (non-degenerate) minima, TSs and other solutions found and a graph which shows the number of unfathomed nodes at each iteration for each run. We also give a summary of the success rates (No. of nodes fathomed by test/No. of times test applied) for each test in each problem (no local search applied). The computations were performed on an Intel CPU @ 3060 MHz using an absolute convergence tolerance of 10 −6 and a minimum box size of 10 −6 .

Problem 1: Ackley's function
For the first example, we apply the algorithm to Ackley's function: cos(2π x i ) + 20 + e, with n = 3 and x ∈ [0.5, 3] 3 . This low-dimensional example has 81 first-order saddle points, which are found with all configurations of the algorithm (with or without tests). We can observe from Table 2 that, with the application of the regional tests, the CPU time can be reduced by more than 50 % in comparison to the "no test" case (location of all critical points), which has a CPU time of 64 s. A further reduction in CPU time of 15-30 % is achieved with the application of the local search over areas that are found to have index-1. The RecIn test has the best performance, with a CPU time of only 19 s when the local search is also applied, with the Rohn test also exhibiting very strong performance. Furthermore, the Rohn and RecIn tests only return the TSs as solutions while the Gerschgorin test and the 2 × 2 test return a  Fig. 2, the number of open nodes in the branch-and-bound tree is reported as a function of iteration number for every test. The scales used in the five panels are the same to make comparison easier. The significant reduction in the number of iterations when the tests are applied is evident and the branch-and-bound tree is found to be much smaller (Fig 2; Table 2).

Problem 2: Levy function
In this example we use a Levy function: f (x) = sin 2 (π y 1 ) + n−1 i=1 (y i − 1) 2 [1 + 10 sin 2 (π y i+1 )]+(y n − 1) 2 , where y i = 1 + (x i − 1)/4. In our case, n = 5 and x ∈ [−5, 5] 5 . This more challenging example has a total of 349 stationary points of which 142 are transition states and 63 are minima, as can be seen in Table 3. Notice that the Hessian of f is tridiagonal. Again, without local search, we see a significant reduction in CPU time, of between 9 and 38 % (Table 3), and in iteration number, of up to 41 % (Fig. 3). The maximum overall CPU time reduction achieved with the use of a test combined with local search is of 50 %. The RecIn test has the best performance with a CPU time of 108 s in contrast to the 218 s required when no regional test is applied. As in the first example, the Rohn test provides the second-best performance when accompanied by local search. However, without local search, the second-best performance is achieved with the 2 × 2 inertia test. Both Rohn and RecIn tests return only the TSs as solutions, whereas the 2 × 2 inertia test leads to the identification of all 63 minima and the Gerschgorin test to the identification of 58 other stationary points.

Problem 3: Himmelblau's function
In this example we use an extension of Himmelblau's function to multiple dimensions: where n = 6 and x ∈ [−5, 5] 6 . The results are presented in Table 4 and Fig. 4. Although this example has only one variable more than Problem 2, the number of stationary points is much greater, with 729 points in total, of which 192 are transition states and 64 are minima. There is therefore a considerable computational cost to searching for all stationary points. The basic algorithm, without any regional tests, identifies all 729 points in 520 CPU seconds, compared to 218 CPU seconds in Problem 2. In contrast, the use of tests without local search leads to a reduction in CPU time of between 36 and 52 % and the use of tests with local search to a reduction of between 38 and 54 % overall. It is clear from these numbers that the application of one test provides most of the performance improvement in this example, and that the local search, albeit beneficial, has a modest impact on the overall CPU times. Once more the RecIn test is the most effective test, reducing the CPU time by a factor greater than 2 with respect to the case when no test is applied. In this particular case, the Gerschgorin test does not lead to the identification of additional stationary points. The

Problem 4: 2D-XY lattice model
For the last example we use the 2D-XY lattice model [19]: where Λ = {1, 2, . . . , 9} and N {k} is the set of indices of the neighbouring lattice points to the lattice point with index k. The 2-dimensional XY lattice model has been studied, amongst others, in [19,20]. The model exhibits exponential growth of the number of stationary points as the number of lattice points grows. Here, we consider a 3 × 3 lattice where θ 7 = θ 8 = θ 9 = 0, θ 3 = θ 6 = π/2 and θ i ∈ [−π, π] for i = 1, 2, 4, 5. Thus, this is a 4-dimensional problem. This example has a relatively small number of stationary points (33), with only 5 transition states and one minimum, and the algorithm without tests identifies all these points within 86 CPU seconds. However, the performance of the tests, as presented in Table 5 and Fig. 5, is more disparate than in previous examples. The frequent appearance of interval Hessian matrices where some or all diagonal elements include zero makes this example more challenging for some of the tests. Thus, the Gerschgorin test leads to a reduction in the total number of iterations of less than 4 %, and no reduction in the CPU time, which remains at 86 CPU seconds. This is due to the fact that some Gerschgorin discs overlap when zero is present in the diagonal elements and this may result in the test being inconclusive. We note that the computational cost could be reduced with a more efficient implementation that permits the re-use of the calculations of the interval Hessian matrix elements carried out while constructing the αBB underestimators for the purpose of the test. Nevertheless, based on the implementation used here, the Gerschgorin test does not lead to a change in CPU time and identifies 26 "other" solutions in addition to the 5 transition states. Secondly, in this case the Rohn test performs better than the RecIn test: this latter test leads to a larger CPU time than the Rohn test and fails to remove a number of non-TS solutions. The reason for this is the presence of zeros in the diagonal entries of the interval Hessian matrices that prevent application of the RecIn test. However, the use of the xRecIn test can overcome this problem and, as can be seen in Table 5, it performs slightly better than the Rohn test.

Overall performance of the tests
Overall, the application of the proposed tests leads to a reduction in the number of iterations and this is usually accompanied by a significant reduction in CPU time, by up to 50 %. The application of the local search always leads to a reduction in both CPU time and iteration number. The most appropriate version of the recursive inertia test (RecIn or xRecIn) test, as indicated by the presence or not of zeros in the diagonal elements of the interval Hessian matrix, is found to provide the best performance in every case. The Rohn test usually performs well too, while the CPU time reduction is not as large with the Gerschgorin and 2 × 2 inertia tests. The worst performance was observed in applying the Gerschgorin test to Problem 4, where the presence of zeros in the Hessian matrix results in overlap of the Gerschgorin discs and the inability to eliminate most nodes. This provides a useful insight into the types of problems for which this test is most appropriate. It is instructive to consider the success rates of the tests. In the proposed approach, the interval gradient test was applied at every node of the branch-and-bound tree and the chosen test was then applied at every node at which the interval gradient test was passed. The success rate of each test is calculated as the ratio of the number of nodes fathomed by a test to the number of times this test was applied, and is reported in Table 6. The success rates obtained are of the order of a few percent, with a maximum value of 5.35 %. As discussed, the lowest overall success rate is exhibited by the Gerschgorin test, while the RecIn test is most consistently successful. As can be expected, the tests tend to become more effective as the nodes become smaller for two reasons. First, in the test cases considered here, there are many stationary points and a large portion of the domain contains points at which the Hessian matrix is index-1 (whether they are index-1 critical points or not). Second, the larger the volume of the node the larger the overestimation inherent in the evaluation of the interval Hessian matrix, so that large nodes cannot be eliminated easily. Despite the relative inefficiency of the tests, the CPU-times for the problems presented are halved, indicating that the tests play a useful role. Further gains in CPU time may be derived by imposing a maximum threshold on the size of the node so that tests are only applied to "small-enough" nodes. A strategy to reduce the number of iterations is to apply multiple tests. The RecIn/xRecIn tests generally lead to the elimination of regions that are eliminated by other tests. However, the reverse is not true. If the tests are applied in series, it is therefore advantageous to apply the least computationally demanding tests first, specifically Gerschgorin and 2 × 2 inertia and to follow this with RecIn/xRecIn tests. This strategy was deployed on the test problems, but due to the relatively low dimensionality of the examples (up to 6 variables), it did not lead to an improvement in CPU time compared to applying RecIn/xRecIn only. It would be interesting to explore this strategy further by deploying the tests in parallel on larger problems.

Conclusions
In this paper we considered the problem of enclosing all transition states (TSs) of general nonlinear functions in C 2 using global deterministic methods. We introduced five tests that can be applied prior to the bounding step of branch-and-bound algorithm. These tests help to identify areas of the search space which do not contain any TSs or may contain at most one. In the first case we fathom/remove the area while in the second we perform a local search and if a solution is found we then fathom the area. With the tests we aim to focus the computational effort on the location of TSs rather than the identification of all critical points. We have implemented this approach within the αBB algorithm and presented the successful application of the proposed tests to a number of low-dimensional problems in C 3 , with up to six variables. The problems typically exhibit numerous stationary points. The results indicate that the addition of the tests can reduce the computational time significantly while locating all the transition states successfully. Furthermore, the use of a local search in areas that are identified to contain at most one TS is found to be advantageous, reducing both CPU time and iteration number. We note that the proposed tests can be used within any branch-and-bound algorithm or within the interval Newton method and that, with the exception of the 2 × 2 inertia test, they can be altered in order to locate any index-k critical point. The RecIn/xRecIn tests are particularly effective for all problems considered. The use of the tests is a useful step towards the application of a branch-and-bound algorithm to the identification of transition states for larger problems: within the αBB algorithm, the tests can be implemented at relatively low cost because the required interval Hessian matrix is computed implicitly as art of the underestimation procedure. Thus, the overhead arising from the tests can be kept low, while achieving a reduction in iteration number.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix Gerschgorin test example
Consider the matrix,

RecIn test example
Consider the matrix, For the second step, the (interval) Schur complement is  clearly has two negative interval eigenvalues. The 2×2 inertia test would identify this. However, because of the large entry, m 13 = 100, the Gerschgorin discs would form one joint set with negative lower bound and positive upper bound and thus the Gerschgorin test would be inconclusive even for this very simple case.

Rohn test example
Consider again the matrix from "RecIn test example". The center matrix is