Subdomain Separability in Global Optimization

We propose a generalization of separability in the context of global optimization. Our results apply to objective functions implemented as differentiable computer programs. They are presented in the context of a simple branch and bound method. The often significant reduction of the search space can be expected to accelerate any global optimization method. We show how to use interval derivatives computed by adjoint algorithmic differentiation to examine the monotonicity of the objective with respect to so-called structural separators, and how to verify the latter automatically.


Introduction
In contrast to local optimization methods, deterministic global optimization methods, e.g. interval-based branch and bound (b&b) algorithms [1], are guaranteed to find the global solution within a predefined optimality tolerance in finite time [2]. These methods require more computational effort than their local counterparts.
An important property that should be exploited during optimization is separability of the objective function. A function $f : \mathbb{R}^n \to \mathbb{R}$ is called partially separable (also: decomposable) if it is of the form
$$f(x) = f_1\big((x_i)_{i \in X_1}\big) + f_2\big((x_j)_{j \in X_2}\big) \tag{1}$$
with a given partitioning of the set of indexes of independents into two disjoint subsets $X_1$, $X_2$ and functions $f_1 : \mathbb{R}^{|X_1|} \to \mathbb{R}$ and $f_2 : \mathbb{R}^{|X_2|} \to \mathbb{R}$. The function is called (fully) separable if the separation can be applied recursively until all disjoint subsets only contain a single element [3,4]. For a global optimization problem with partially separable objective function $f$ as in (1) it is well known [5] that the global minimum can be obtained by decomposing the problem into smaller subproblems
$$y^* = \min_{x} f(x) = \min_{(x_i)_{i \in X_1}} f_1\big((x_i)_{i \in X_1}\big) + \min_{(x_j)_{j \in X_2}} f_2\big((x_j)_{j \in X_2}\big)$$
that can be solved in parallel. In the context of b&b algorithms with a division into $k$ parts for all dimensions every non-leaf node generates $k^n$ children. The decomposition reduces the number of generated nodes to $O(k^{\max(|X_1|,|X_2|)})$ for the particular problem and thus results in a potentially significant reduction of the corresponding search space.
Separable functions have been extensively researched in the context of optimization. In [6] a quasi-Newton method is introduced that exploits the structure of partially separable functions when computing secant updates for the Hessian matrix. A parallel b&b approach was used in [7] to find optima of non-convex problems with partially separable functions over a bounded polyhedral set. In [8] a derivative-free method for exploiting partial separability in unconstrained optimization was proposed. The automatic detection of partial separability as in (1) by algorithmic differentiation was proposed in [9].
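The effect of the decomposition on the amount of work can be illustrated with a small sketch. It replaces b&b by plain grid sampling with $k$ points per dimension, and the functions `f1`, `f2` as well as the domain $[-2, 2]^3$ are hypothetical choices made only for this illustration, with $X_1 = \{0, 1\}$ and $X_2 = \{2\}$.

```python
import itertools

def f1(u):
    """Depends only on the variables indexed by X1 = {0, 1}."""
    return (u[0] - 1.0) ** 2 + u[0] * u[1] + u[1] ** 2

def f2(w):
    """Depends only on the variable indexed by X2 = {2}."""
    return abs(w[0] + 0.5)

def grid(k, lo=-2.0, hi=2.0):
    """k equidistant sample points on [lo, hi]."""
    return [lo + (hi - lo) * i / (k - 1) for i in range(k)]

def minimize_joint(k):
    """Sample f = f1 + f2 on the full three-dimensional grid (k^3 points)."""
    pts = grid(k)
    values = [f1(x[:2]) + f2(x[2:]) for x in itertools.product(pts, repeat=3)]
    return min(values), len(values)

def minimize_decomposed(k):
    """Minimize f1 and f2 independently (k^2 + k points in total)."""
    pts = grid(k)
    v1 = [f1(u) for u in itertools.product(pts, repeat=2)]
    v2 = [f2(w) for w in itertools.product(pts, repeat=1)]
    return min(v1) + min(v2), len(v1) + len(v2)

y_joint, n_joint = minimize_joint(9)
y_split, n_split = minimize_decomposed(9)
print(y_joint, n_joint)
print(y_split, n_split)
```

Both variants return the same minimum over the grid, while the decomposed variant evaluates $k^2 + k$ instead of $k^3$ sample points. The two independent minimizations in `minimize_decomposed` could be run in parallel.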
In [10] a class of problems was introduced that is described as being as easy to optimize as decomposable functions and that is related to the present work. Such functions satisfy
$$\frac{\partial f}{\partial x_i}(x) = g(x_i) \cdot h(x) \tag{2}$$
such that the first-order optimality condition can be transformed to $g(x_i) = 0$. This equation depends on a single variable only. Optima for which $h(x) = 0$ and optima at the boundary are not taken into consideration by this approach. In this paper we aim to generalize the concept of separability such that previously non-separable functions also benefit from a decomposition of the optimization problem on subdomains. To this end, the function must be of a special structure which is less restrictive than (1) but is a variation of (2), and it additionally needs to fulfill a monotonicity condition on the separator. The monotonicity condition guarantees that the decomposition still takes all possible optima into consideration, which is crucial for the integration into deterministic global optimization algorithms.
We use interval adjoints as a combination of reliable interval computations [11,12] and adjoint algorithmic differentiation [13,14] to obtain an enclosure of all adjoints over a given subdomain. In [15] we used this information for significance based approximate computing. In [16] we discussed significance analysis in the context of neural networks. Deterministic global optimization through a check for first-order optimality is described in [17]. In the following we show how to use interval adjoints for a monotonicity check of structural separators and for the verification of these separators.
The paper is organized as follows: In Section 2 we define structural separability and formulate the monotonicity condition required for the decomposition of the optimization problem. We give examples of functions that are non-separable in the sense of (1) but fulfill the new definition, such that their corresponding optimization problems can still be decomposed. Section 3 explains how to implement the presented work and how to integrate it into a b&b algorithm for deterministic global optimization. To this end, interval adjoints are used for the examination of the monotonicity condition and for the automatic detection of separators. In Section 4 we show results from a proof-of-concept implementation for the examples from Section 2, followed by conclusion and outlook in Section 5.

Subdomain Separability
We introduce subdomain separability and we show how to exploit this property in global optimization.
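Definition 1. A function $f : \mathbb{R}^n \to \mathbb{R}$ is called structurally separable if it is of the form
$$f(x) = f_s\Big(s\big((x_i)_{i \in X_1}\big), (x_j)_{j \in X_2}\Big)$$
with disjoint and non-empty index sets $X_1$ and $X_2$. The scalar function $s\big((x_i)_{i \in X_1}\big)$ is called structural separator.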
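Conventionally separable functions as in (1) are covered by Definition 1 with structural separators $s_1 = f_1\big((x_i)_{i \in X_1}\big)$ and $s_2 = f_2\big((x_j)_{j \in X_2}\big)$. Application of the chain rule of differentiation to differentiable structurally separable functions yields the gradient
$$\frac{\partial f}{\partial x_i}(x) = \begin{cases} \dfrac{\partial f_s}{\partial s}\big(s, (x_j)_{j \in X_2}\big) \cdot \dfrac{\partial s}{\partial x_i}\big((x_k)_{k \in X_1}\big) & \text{if } i \in X_1, \\[2mm] \dfrac{\partial f_s}{\partial x_i}\big(s, (x_j)_{j \in X_2}\big) & \text{if } i \in X_2. \end{cases}$$
If $X_1$ only contains a single element, then the structurally separable function $f$ also satisfies (2) with $g(x_i) = \frac{ds}{dx_i}(x_i)$ and $h(x) = \frac{df}{ds}(x)$.
Theorem 1. Consider the global optimization problem
$$y^* = \min_{x \in D} f(x) \tag{3}$$
with structurally separable, non-convex and differentiable objective function $f$ and separator $s\big((x_i)_{i \in X_1}\big)$. If the objective function is monotonic w.r.t. the separator on the domain, that is,
$$\frac{df}{ds}(x) \ge 0 \;\; \forall x \in D \quad \text{or} \quad \frac{df}{ds}(x) \le 0 \;\; \forall x \in D, \tag{4}$$
then the optimization problem in (3) can be decomposed into
$$y^* = \min_{(x_j)_{j \in X_2}} f_s\big(s^*, (x_j)_{j \in X_2}\big) \tag{5}$$
and
$$s^* = \begin{cases} \min\limits_{(x_i)_{i \in X_1}} s\big((x_i)_{i \in X_1}\big) & \text{if } \frac{df}{ds}(x) \ge 0, \\[2mm] \max\limits_{(x_i)_{i \in X_1}} s\big((x_i)_{i \in X_1}\big) & \text{if } \frac{df}{ds}(x) \le 0. \end{cases} \tag{6}$$
Proof. From (4) we know that the objective function is either monotonically increasing or monotonically decreasing w.r.t. the separator. In case it is monotonically increasing, that is $\frac{df}{ds}(x) \ge 0$ over the subdomain $D$, we have
$$f_s\big(s^-, (x_j)_{j \in X_2}\big) \le f_s\big(s^+, (x_j)_{j \in X_2}\big)$$
for $s^- \le s^+$. Since $\frac{\partial f_s}{\partial x_i}\big(s, (x_j)_{j \in X_2}\big) = 0$ for $i \in X_1$ and, due to monotonicity, $\frac{\partial f_s}{\partial s}\big(s, (x_j)_{j \in X_2}\big) \ge 0$, the global minimum of $f$ requires the separator $s$ to be minimal on the domain. The monotonically decreasing case is handled analogously.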
Remark 1. The dimension of the inner optimization problem as in (6) is |X 1 | while the dimension of the outer optimization problem in (5) is |X 2 | + 1.
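A small numerical sketch of the decomposition in Theorem 1 follows. The separator `s`, the outer function `f_s` and the domain $[-1, 1]^3$ are hypothetical choices for illustration only, and grid sampling stands in for a rigorous global solver. The outer function is constructed such that its derivative w.r.t. the separator is positive everywhere, so the monotonically increasing case applies.

```python
import itertools
import math

def s(x0, x1):
    """Structural separator: depends on the variables in X1 = {0, 1} only."""
    return (x0 - 0.5) ** 2 + x0 * x1

def f_s(sep, x2):
    """Outer function; d f_s / d sep = (1 + x2^2) * exp(...) > 0 everywhere."""
    return math.exp(sep * (1.0 + x2 ** 2)) - x2

def f(x0, x1, x2):
    """Structurally separable objective f(x) = f_s(s(x0, x1), x2)."""
    return f_s(s(x0, x1), x2)

# 21 sample points per dimension on [-1, 1]
pts = [-1.0 + 0.1 * i for i in range(21)]

# Reference: minimize over the full three-dimensional grid.
y_brute = min(f(*x) for x in itertools.product(pts, repeat=3))

# Decomposition: inner problem over X1, then outer problem over X2.
s_star = min(s(x0, x1) for x0, x1 in itertools.product(pts, repeat=2))
y_decomposed = min(f_s(s_star, x2) for x2 in pts)

print(y_brute, y_decomposed)
```

On this grid both approaches yield the same minimum, while the decomposed version evaluates $21^2 + 21$ instead of $21^3$ points.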

Remark 2.
If s (x i ) i∈X1 is also structurally separable, then the separation approach can be applied recursively and the original optimization problem decomposes into even smaller disjoint optimization problems.
Remark 3. If two structural separators s 1 (x i ) i∈X1 and s 2 (x i ) i∈X2 fulfilling (4) are independent of each other, i.e. X 1 ∩X 2 = ∅, the decomposed optimization problems can be solved in parallel. Otherwise, either separator s 1 or s 2 needs to be optimized first if X 1 ⊂ X 2 or X 2 ⊂ X 1 , respectively.

Remark 4.
If the monotonicity condition in (4) holds for separator $s = x_i$, $i \in \{0, \ldots, n-1\}$, then the minimum is located at the boundary, either at $\min_{x_i \in D_i} x_i$ or at $\max_{x_i \in D_i} x_i$.

Remark 5.
A degenerate solution is implied if $\frac{df}{ds}(x) = 0$ for all $x \in D$ and $D$ contains more than one element.

Remark 6.
If the monotonicity condition is violated, then the structural separability can still be exploited similar to [10] by solving $\frac{ds}{dx_i}\big((x_i)_{i \in X_1}\big) = 0$ to find stationary points. As already noted in Section 1, this approach does not necessarily compute all stationary points.

Examples
Five test problems are investigated in the light of subdomain separability. They illustrate different aspects of the general approach. Besides the partially separable function in Example 1, there is the exponential function in Example 2, which is globally monotonic and solvable in parallel, a recursive exponential function in Example 3, which is still globally monotonic but cannot be solved in parallel, and the Shubert function in Example 4, which is not globally monotonic but solvable in parallel. Example 5 can neither be solved in parallel nor is it globally monotonic, but it could still benefit from subdomain separability.
Example 1 (Styblinski-Tang function [18]). Partially separable functions as in (1) are structurally separable and always fulfill the monotonicity condition in (4) with
$$\frac{df}{ds_i}(x) = 1 > 0$$
on any domain, which yields the well-known fact that the corresponding optimization problem can be decomposed and solved in parallel. For example, the Styblinski-Tang function
$$f(x) = \frac{1}{2} \sum_{i=0}^{n-1} \left( x_i^4 - 16 x_i^2 + 5 x_i \right)$$
is as in (1) except for the factor in front of the sum. In [3] it is marked as nonseparable. Still, the problem can be decomposed into
$$y^* = \frac{1}{2} \sum_{i=0}^{n-1} \min_{x_i} \left( x_i^4 - 16 x_i^2 + 5 x_i \right)$$
for any $x \in \mathbb{R}^n$.

Example 2 (Exponential function [19]). For the exponential function
$$f(x) = -\exp\left( -\frac{1}{2} \sum_{i=0}^{n-1} x_i^2 \right)$$
we choose $s_i = x_i^2$ to be the separators, and the derivative of the objective w.r.t. these separators is equal to
$$\frac{df}{ds_i}(x) = \frac{1}{2} \exp\left( -\frac{1}{2} \sum_{j=0}^{n-1} s_j \right) > 0.$$
The exponential function is globally monotonically increasing w.r.t. its separators. Theorem 1 becomes applicable to all separators. The resulting subproblems can be solved in parallel.
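The decomposition of the Styblinski-Tang problem can be sketched as follows. The search interval $[-5, 5]$ per variable and the dense sampling are assumptions made for this illustration, and the sampling replaces a rigorous one-dimensional global solver. Since all $n$ one-dimensional subproblems are identical, a single one needs to be solved.

```python
def term(t):
    """One-dimensional summand of the Styblinski-Tang function."""
    return t ** 4 - 16.0 * t ** 2 + 5.0 * t

def styblinski_tang(x):
    """Styblinski-Tang function with the factor 1/2 in front of the sum."""
    return 0.5 * sum(term(t) for t in x)

def minimize_term(lo=-5.0, hi=5.0, samples=100001):
    """Approximate the minimizer of one summand by dense sampling."""
    pts = (lo + (hi - lo) * i / (samples - 1) for i in range(samples))
    return min(pts, key=term)

n = 8
t_star = minimize_term()        # minimizer of a single subproblem
x_star = [t_star] * n           # all n subproblems share this minimizer
y_star = styblinski_tang(x_star)
print(t_star, y_star)
```

The sketch solves one problem with $10^5$ samples, whereas sampling the full eight-dimensional domain at the same resolution would be infeasible.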

Example 3 (Recursive exponential function).
To demonstrate the usefulness of structural separability we consider the optimization problem in (3) which is non-separable in a conventional manner, but fulfills Definition 1 with separators $y_i$, $i = 1, \ldots, n-1$. To decompose the optimization problem it remains to be shown that the derivatives of the objective with respect to the separators $\frac{df}{dy_i}(x)$ for $i = 0, \ldots, n$ are positive (or negative) on any subdomain. From By mathematical induction we show that $y_i \ge 1$ for $i = 0, \ldots, n$. The basis $y_0 = 1$ obviously fulfills the statement. The assumption $y_i \ge 1$ yields due to monotonicity of the exponential function. Thus, $y_i \ge 1$ and $\frac{df}{dy_i}(x) \ge 1$ for $i = 0, \ldots, n$. Furthermore, we know that the global minimum is located at As a consequence of Theorem 1 the optimization problem can be reformulated as Note that this function is globally monotonic w.r.t. the separator, which does not necessarily hold in general. Since the separators are partially dependent on each other, the corresponding optimization problems need to be solved sequentially beginning with $y_1^*$.

Example 4 (Shubert function [20]). The Shubert function is given by
$$f(x) = \prod_{i=0}^{n-1} \sum_{j=1}^{5} \cos\big((j+1) x_i + j\big).$$
Each factor of the multiplication can be considered as a structural separator with $s_i = \sum_{j=1}^{5} \cos\big((j+1) x_i + j\big)$. Derivatives of the function value w.r.t. the separators are derived as
$$\frac{df}{ds_i}(x) = \prod_{k \ne i} s_k.$$
If any $s_i$ is either positive or negative on a subdomain, then the corresponding optimization problem can be decomposed by Theorem 1.
Example 5 (Salomon function [10]). We show that the Salomon function is separable only on selected subdomains. The differentiable program is given by

Introduction of an intermediate result
As $\frac{dS}{ds_i}(x)$ is always positive it remains to be shown that $\frac{df}{dS}(x)$ is either positive or negative. The roots of $\frac{df}{dS}(x)$ are The function is monotonic between those roots. Thus, Theorem 1 can be applied to the Salomon function on the (sub-)domain $x \in \left[\frac{S_z}{\sqrt{n}}, \frac{S_{z+1}}{\sqrt{n}}\right]^n$ for all $z \in \mathbb{N}^+$. If $z$ is even, the minimum of the separator is required for a minimum of the objective function. Otherwise, if $z$ is odd, the separator needs to be maximized to obtain a minimum of the objective function.
Next, we show how to compute interval adjoints and how they can be used to apply Theorem 1 to a differentiable program implementing a function f . Furthermore, we use interval adjoints to verify structural separators.

Implementation
Let $f : \mathbb{R}^n \to \mathbb{R}$ be implemented as a differentiable program $y = f(x)$ with independent variables $x$ and dependent variable $y$. Following [13], we assume that at a particular argument $x$ the implementation of $f$ can be expressed by a finite sequence of elemental function evaluations as
$$v_j = \varphi_j\big((v_i)_{i \prec j}\big), \quad j = n, \ldots, n+p, \tag{7}$$
with $v_i = x_i$ for $i = 0, \ldots, n-1$ and $y = v_{n+p}$, where $v_j$ for $j = n, \ldots, n+p-1$ are referred to as intermediate variables.
The precedence relation i ≺ j indicates a direct dependency of v j on v i . Furthermore, the transitive closure ≺ * of ≺ induces a partial ordering of all indices j = 0, . . . , n + p. Equation (7) is also referred to as the single assignment code (SAC) of f . The SAC may not be unique due to commutativity, associativity and distributivity. We assume a SAC to be given.

Interval Arithmetic
Interval arithmetic (IA) is a concept that enables the computation of bounds of a function evaluation on a given interval. A closed interval of a variable $x$ with lower bound $\underline{x}$ and upper bound $\overline{x}$ is denoted as
$$[x] = [\underline{x}, \overline{x}] = \{ x \in \mathbb{R} \mid \underline{x} \le x \le \overline{x} \}.$$
If there is only a single element in $[x]$, i.e. the endpoints are equal, $\underline{x} = \overline{x}$, then the square brackets $[\cdot]$ are dropped and $x$ is called a degenerate interval. In that sense IA represents an extension of the real/floating-point number system. Interval vectors are denoted by bold letters and have endpoints for each component, $[\mathbf{x}] = [\underline{\mathbf{x}}, \overline{\mathbf{x}}]$.

When evaluating a function $y = f(\mathbf{x})$ in IA on $[\mathbf{x}]$ we are interested in the information
$$[y]^* = f^*([\mathbf{x}]) = \{ f(\mathbf{x}) \mid \mathbf{x} \in [\mathbf{x}] \}.$$
The asterisk denotes the united extension, which computes the true range of values on the given domain. United extensions for all unary and binary elementary functions and arithmetic operations are known, and endpoint formulas can be looked up e.g. in [12]. Unfortunately, the derivation of endpoint formulas for the united extensions of composed functions might be expensive or even impossible. Hence, we compute corresponding estimates by natural interval extensions. A natural interval extension can be obtained by replacing all elemental functions $\varphi_j$ in (7) with their corresponding united extensions as
$$[v_j] = \varphi_j^*\big(([v_i])_{i \prec j}\big), \quad j = n, \ldots, n+p. \tag{8}$$
The computation of the interval function value by the natural interval extension from (8) results in
$$[y] = f([\mathbf{x}]) \supseteq f^*([\mathbf{x}]).$$
The superset relation states that the interval $[y]$ can be an overestimation of all possible values over the given domain, but it guarantees enclosure. Furthermore, the natural interval extension of Lipschitz continuous functions converges linearly to the united extension with decreasing domain size.
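The overestimation of the natural interval extension and its reduction on smaller domains can be illustrated with a minimal sketch. The class `Interval` and the helper names are hypothetical; the sketch supports only addition, subtraction and multiplication, and it omits the outward rounding that a reliable implementation requires. The test function $f(x) = x (1 - x)$ has the true range $[0, 0.25]$ on $[0, 1]$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        o = as_interval(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)
    __radd__ = __add__

    def __neg__(self):
        return Interval(-self.hi, -self.lo)

    def __sub__(self, other):
        return self + (-as_interval(other))

    def __rsub__(self, other):
        return as_interval(other) + (-self)

    def __mul__(self, other):
        o = as_interval(other)
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(p), max(p))
    __rmul__ = __mul__

def as_interval(value):
    """Promote a float to a degenerate interval."""
    return value if isinstance(value, Interval) else Interval(value, value)

def f(x):
    """Works for floats and intervals alike; x occurs twice (dependency)."""
    return x * (1.0 - x)

def hull_over_partition(m):
    """Hull of the natural extensions over m equally sized subintervals."""
    parts = [f(Interval(i / m, (i + 1) / m)) for i in range(m)]
    return Interval(min(p.lo for p in parts), max(p.hi for p in parts))

natural = f(Interval(0.0, 1.0))     # encloses [0, 0.25] but overestimates
refined = hull_over_partition(100)  # much tighter enclosure
print(natural, refined)
```

On the full domain the enclosure is $[0, 1]$. Splitting the domain into 100 parts tightens the upper bound to roughly 0.255, which reflects the convergence with decreasing domain size.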
The reader is referred to [11,12,21,22] for more information on the topic.

Adjoint Algorithmic Differentiation
Algorithmic differentiation (AD) techniques [13,14] use the chain rule to compute in addition to the function value of a primal implementation its derivatives with respect to independent variables at a specified point. The adjoint or backward mode of AD propagates derivatives of the function w.r.t. independent and intermediate variables in reverse relative to the order of their computation in the primal SAC. The computationally intractable combinatorial optimization problem known as DAG Reversal [23] is implied.
Following [14], first-order adjoints are marked with a subscript $(1)$. They are defined as
$$x_{(1)} = y_{(1)} \cdot \frac{df}{dx}(x).$$
A single adjoint computation with seed $y_{(1)} = 1$ results in the gradient $\frac{df}{dx}(x)$ stored in $x_{(1)}$.
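The adjoint of (7) can be implemented by (7) itself followed by
$$v_{(1),i} = \sum_{j : i \prec j} v_{(1),j} \cdot \frac{\partial \varphi_j}{\partial v_i}\big((v_k)_{k \prec j}\big), \quad i = n+p-1, \ldots, 0,$$
with $v_{(1),n+p} = y_{(1)}$. The evaluation of the adjoint yields the adjoints of all intermediate variables $v_j$,
$$v_{(1),j} = y_{(1)} \cdot \frac{df}{dv_j}(x), \quad j = n+p, \ldots, n.$$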
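A hand-written sketch may help to relate the single assignment code to its adjoint. The function $f(x_0, x_1) = \sin(x_0 x_1) + x_0$ and the name `f_and_adjoints` are chosen for illustration only; an AD tool would generate the reverse sweep automatically.

```python
import math

def f_and_adjoints(x0, x1, y_adj=1.0):
    """Evaluate f(x0, x1) = sin(x0 * x1) + x0 and all first-order adjoints."""
    # Primal single assignment code (forward sweep).
    v0, v1 = x0, x1
    v2 = v0 * v1
    v3 = math.sin(v2)
    v4 = v3 + v0                 # y = v4

    # Adjoint sweep: visit the assignments in reverse order.
    a = [0.0] * 5
    a[4] = y_adj                 # seed y_(1)
    a[3] += a[4]                 # from v4 = v3 + v0
    a[0] += a[4]
    a[2] += a[3] * math.cos(v2)  # from v3 = sin(v2)
    a[0] += a[2] * v1            # from v2 = v0 * v1
    a[1] += a[2] * v0
    return v4, a

y, adj = f_and_adjoints(0.7, -1.3)
print(y, adj)
```

With seed `y_adj = 1` the entries `adj[0]` and `adj[1]` hold the gradient, while `adj[2]` and `adj[3]` hold the derivatives of $y$ w.r.t. the intermediate variables $v_2$ and $v_3$. All of them result from a single reverse sweep.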

Interval Adjoints
The natural interval extension of (7) and (9) yields the interval adjoints
$$[v_{(1),i}] = \sum_{j : i \prec j} [v_{(1),j}] \cdot \frac{\partial \varphi_j}{\partial v_i}\big(([v_k])_{k \prec j}\big), \quad i = n+p-1, \ldots, 0.$$
Compared to the traditional approach of AD in which the derivatives are only computed at specified points, we now get globalized derivatives that contain all possible values of the derivative over the specified domain. The interval adjoints in (9) might be overestimated compared to the united extension, as already stated for the interval values in Section 3.1. The natural interval extension of the adjoint converges linearly for continuously differentiable functions [24]. Higher-order converging interval extensions of adjoints can be derived, e.g. by centered forms.

Monotonicity Check
A single evaluation of the interval adjoint for $y_{(1)} = 1$ suffices to verify monotonicity as in (4) for all independent and intermediate variables.
If the separation approach is embedded into a b&b solver that involves verification of the first-order optimality condition by interval adjoints, then the monotonicity check comes at no additional cost, assuming that the separators are known a priori.
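The monotonicity check can be sketched for a small hand-coded example. The objective $f(x) = (x_0^2 + x_1^2) \cdot x_2$ with separator $s = x_0^2 + x_1^2$ is a hypothetical choice, intervals are plain `(lo, hi)` tuples, and outward rounding is omitted. Since $\frac{df}{ds} = x_2$, the outcome depends on the subdomain.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

def sqr(a):
    """Tight enclosure of the square of an interval."""
    if a[0] >= 0.0:
        return (a[0] * a[0], a[1] * a[1])
    if a[1] <= 0.0:
        return (a[1] * a[1], a[0] * a[0])
    return (0.0, max(a[0] * a[0], a[1] * a[1]))

def interval_adjoints(x0, x1, x2):
    """Interval value and adjoints of y = s * x2 with s = x0^2 + x1^2."""
    s = add(sqr(x0), sqr(x1))             # forward sweep
    y = mul(s, x2)
    y_adj = (1.0, 1.0)                    # seed y_(1) = 1
    s_adj = mul(y_adj, x2)                # encloses df/ds
    x2_adj = mul(y_adj, s)
    x0_adj = mul(s_adj, mul((2.0, 2.0), x0))
    x1_adj = mul(s_adj, mul((2.0, 2.0), x1))
    return y, s_adj, (x0_adj, x1_adj, x2_adj)

def monotonicity(adjoint):
    """Classify the sign of an interval adjoint as in condition (4)."""
    if adjoint[0] >= 0.0:
        return "increasing"
    if adjoint[1] <= 0.0:
        return "decreasing"
    return "unknown"

box = (-1.0, 1.0)
results = [monotonicity(interval_adjoints(box, box, x2)[1])
           for x2 in [(1.0, 2.0), (-2.0, -1.0), (-1.0, 1.0)]]
print(results)
```

The interval adjoint of the separator is obtained in the same reverse sweep as the adjoints of the independent variables. On the third subdomain the sign is undetermined, so Theorem 1 is not applicable there and further branching would be required.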

Verification of Separators
Interval adjoints can be used to detect if an intermediate variable $s$ is a separator. Note that $\frac{df}{ds}([x])$ as well as $\frac{df}{dx_i}([x])$ are assumed to be available from the adjoint evaluation required for the monotonicity check. An additional evaluation of (10) is required with the adjoint of the intermediate variable set to $s_{(1)} = \frac{df}{ds}([x])$. The resulting adjoints of the independent variables become equal to
$$x_{(1),i} = \frac{df}{ds}([x]) \cdot \frac{ds}{dx_i}([x]).$$
If f is structurally separable and fulfills Definition 1 with separator s, then needs to hold over the entire domain, which can be verified by and since all other independent variables need to satisfy If any x (1),i fulfills neither (11) nor (12), then s is not a separator. Consequently, in addition to the interval adjoint evaluation for the monotonicity check another interval adjoint evaluation is required for the verification of each separator candidate.
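The verification idea can be sketched with two hand-coded interval adjoint sweeps. The test read into conditions (11) and (12) here is an assumption: for each independent variable, either the adjoint propagated through $s$ alone matches the full adjoint, or $s$ does not depend on that variable at all. The two objectives, the tuple-based interval helpers and all function names are hypothetical, and outward rounding is omitted.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

def sqr(a):
    if a[0] >= 0.0:
        return (a[0] * a[0], a[1] * a[1])
    if a[1] <= 0.0:
        return (a[1] * a[1], a[0] * a[0])
    return (0.0, max(a[0] * a[0], a[1] * a[1]))

def adjoints(x, extra_x0_term):
    """Adjoints of y = s * x2 (+ x0 if extra_x0_term) with s = x0^2 + x1^2.

    Returns the full interval adjoints of x and the adjoints obtained by
    propagating only the separator adjoint s_(1) = df/ds down to x.
    """
    x0, x1, x2 = x
    s = add(sqr(x0), sqr(x1))
    s_adj = x2                                    # df/ds for seed y_(1) = 1
    ds_dx = (mul((2.0, 2.0), x0), mul((2.0, 2.0), x1), (0.0, 0.0))
    via_s = [mul(s_adj, d) for d in ds_dx]        # second sweep, through s only
    full = [via_s[0], via_s[1], s]                # df/dx2 = s
    if extra_x0_term:
        full[0] = add(full[0], (1.0, 1.0))        # direct dependence on x0
    return full, via_s

def is_separator(full, via_s):
    for full_i, via_i in zip(full, via_s):
        only_through_s = full_i == via_i          # x_i acts on y via s only
        not_in_s = via_i == (0.0, 0.0)            # s is independent of x_i
        if not (only_through_s or not_in_s):
            return False
    return True

domain = ((-1.0, 1.0), (-1.0, 1.0), (1.0, 2.0))
ok = is_separator(*adjoints(domain, extra_x0_term=False))
broken = is_separator(*adjoints(domain, extra_x0_term=True))
print(ok, broken)
```

For $y = s \cdot x_2$ the candidate is accepted. Adding the direct term $x_0$ makes the full adjoint of $x_0$ differ from the adjoint propagated through $s$, so the candidate is rejected. The comparison relies on both sweeps performing identical interval operations.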
An exhaustive search for separators should be avoided, due to the potentially high number of intermediate variables and the associated number of separator candidates. Separators given by expert users can be verified efficiently. Since structural separability as given in Definition 1 is domain-independent and thus is a global property, it is sufficient to identify the separators once before performing the global search.

Case Study
The general idea of b&b algorithms [21] used for global optimization problems as given in (3) is to remove all parts of the domain that cannot contain a global minimum. The implementation used for this case study is a variation of the one presented in [17] implementing Theorem 1. The user needs to specify at least one separator. The algorithm performs the following steps: of the subdomain to find a better bound $y^*$; separator check: Check monotonicity condition for a priori known separators and generate a subproblem if Theorem 1 is applicable.
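A strongly simplified b&b loop is sketched below for one summand of the Styblinski-Tang function on $[-5, 5]$. It contains only a value check with midpoint evaluation and bisection; the first-order optimality check and the separator check of the solver are not included. All names, the tolerance and the tuple-based interval evaluation without outward rounding are assumptions made for this illustration.

```python
def sqr(a):
    if a[0] >= 0.0:
        return (a[0] * a[0], a[1] * a[1])
    if a[1] <= 0.0:
        return (a[1] * a[1], a[0] * a[0])
    return (0.0, max(a[0] * a[0], a[1] * a[1]))

def g(t):
    """One summand of the Styblinski-Tang function."""
    return t ** 4 - 16.0 * t ** 2 + 5.0 * t

def g_interval(a):
    """Natural interval extension of g on the interval a = (lo, hi)."""
    t2 = sqr(a)
    t4 = sqr(t2)
    return (t4[0] - 16.0 * t2[1] + 5.0 * a[0],
            t4[1] - 16.0 * t2[0] + 5.0 * a[1])

def branch_and_bound(lo, hi, tol=1e-3):
    upper = g(0.5 * (lo + hi))          # best known function value
    work = [(lo, hi)]
    leaves = []
    visited = 0
    while work:
        a = work.pop()
        visited += 1
        enclosure = g_interval(a)
        if enclosure[0] > upper:        # value check: cannot hold the minimum
            continue
        mid = 0.5 * (a[0] + a[1])
        upper = min(upper, g(mid))      # improve the bound at the midpoint
        if a[1] - a[0] < tol:
            leaves.append((a, enclosure[0]))
        else:                           # bisect and continue
            work.append((a[0], mid))
            work.append((mid, a[1]))
    # Re-apply the value check with the final upper bound.
    leaves = [(a, low) for a, low in leaves if low <= upper]
    lower = min(low for _, low in leaves)
    return lower, upper, visited

lower, upper, visited = branch_and_bound(-5.0, 5.0)
print(lower, upper, visited)
```

The returned pair `lower`, `upper` brackets the global minimum of the summand up to rounding errors. In an $n$-dimensional problem, each subdomain would be bisected in every direction, which is where the reduction of dimensions obtained by separation pays off.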
Obviously, the improvement of the upper bound of the global minimum can be enhanced by local searches instead of evaluation of the objective function at the midpoint of the current subdomain. Recursive separation is not supported by the current version of the solver. It is the subject of ongoing development efforts. The software implements the required interval adjoints by using the interval type from the Boost library [25] as a base type of the first-order adjoint type provided by dco/c++ [26]. Both template libraries make use of the concept of operator overloading as supported e.g. by C++.
On the left side of Fig. 1 isolines of the two-dimensional Shubert function over the domain [0, 2π] are shown with green lines around (local) minima and red lines around local maxima. The two global minima are marked by green crosses. The right side of Fig. 1 shows the subdomains that are considered by the b&b algorithm. For visualization the branching is set up to stop when the subdomain is smaller than 0.1 in any direction. Non-square domains result from the separation approach and only occur in regions that are proven to be monotonic by the interval adjoints. Green boxes are active domains that could contain the global minimum. White boxes are discarded by the value check. Orange boxes violate the first-order optimality condition.
Our solver is used to find the global minima of the examples from Section 2. The algorithm is run with and without separation. Structural separators are marked manually. The results are summarized in Table 1. Most of the presented examples benefit from the domain-dependent separation approach and have fewer subdomains generated by b&b if separation is enabled. The benefit increases with growing dimensionality due to the exponential complexity of the bisection. The Salomon function does not benefit from the domain-dependent separation since the relevant domains are already discarded by the value or first-order optimality checks.
We only measure runtimes for the Styblinski-Tang example with n = 8 with and without exploiting subdomain separability. Since the derivative information is already available for all separators after the first-order optimality check, the monotonicity check only iterates over the separators defined by the user. The number of subdomains considered by the b&b algorithm without separation is 8820 times higher than with separation. The corresponding runtime without separation is only 7673 times higher than with separation. This observation correlates with the fact that the computations of subdomains that do not pass the value check are terminated immediately. The percentage of subdomains that are eliminated due to the value check is 30.2% for the case without separation and 2.8% with separation approach. The runtime estimates are averaged over 100 calls of the solver for both cases.
Our in-house solver has been designed as a playground for novel algorithms. Neither is it optimized for speed, nor does it feature state-of-the-art non-convex optimization methodology beyond the previously described b&b algorithm. Ultimately, we aim for integration of our ideas into modern software solutions for global optimization, e.g. [27,28].

Conclusion and Outlook
Our notion of separability combined with checks for monotonicity allows us to decompose an optimization problem into smaller optimization problems. It extends the verification of the first-order optimality condition as it was proposed in [10]. Because it considers all possible optima instead of only those candidates that fulfill the first-order optimality condition, the proposed work can be implemented as an add-on to deterministic global optimization algorithms. We explained how to use interval adjoints to verify monotonicity of the objective function w.r.t. all structural separators at the cost of a single adjoint evaluation. As a first result, we revisited examples from the literature that benefit from the domain-dependent separability approach. Furthermore, we showed how to verify the separation property of a variable in a given computer program at the cost of only two adjoint evaluations.
The verification of separators can be used as a starting point for research into heuristics for automatically detecting separators in a computer program. Further work in progress includes enabling recursive separation. Moreover, interval arithmetic can result in a significant overestimation of the true value range, e.g. due to the wrapping effect or the dependency problem. The replacement of interval adjoints by an adjoint version of affine arithmetic [29] or by McCormick relaxations [30,31,32] of adjoints is expected to yield tighter enclosures.