1 Introduction

In contrast to local optimization methods, deterministic global optimization methods, e.g., interval-based branch and bound (b&b) algorithms [1], guarantee to find the global solution to within a predefined optimality tolerance in finite time [2, 3]. These methods are more expensive in terms of computational effort than their local counterparts.

An important property that should be exploited during optimization is the decomposition of the index set of variables of an optimization problem. We consider continuously differentiable objective functions \(f: S \subset {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) with an available evaluation formula f(x), where \(S\in {\mathbb {I}}^n\) and \({\mathbb {I}}\) stands for the set of all closed real intervals. We follow [4] and denote a decomposition of the index set of the variables \(I=\{1,\ldots ,n\}\) into p subsets as \(I^{[j]}\subseteq I\) with \(\bigcup _{j=1}^p I^{[j]} = I\). Each subset has \(n^{[j]}=\big |I^{[j]}\big |\) elements and a corresponding part of the domain \(S^{[j]} \in {\mathbb {I}}^{n^{[j]}}\). In this paper, we assume disjoint subsets, i.e., \(I^{[j]} \cap I^{[k]} = \emptyset , \forall k, j\in \{1,\ldots ,p\}, j\ne k\), such that there are no common variables.

A function f is called partially separable (also: decomposable or block-separable) if it is of the form

$$\begin{aligned} f(x) = \sum _{j=1}^p f^{[j]}\big (x^{[j]}\big ),\ x^{[j]}\in S^{[j]}, \end{aligned}$$
(1)

with a given partitioning of the variable indices into p disjoint subsets and sub-functions \(f^{[j]}: S^{[j]} \subset {\mathbb {R}}^{n^{[j]}} \rightarrow {\mathbb {R}}\). A function is called (fully) separable [5, 6] if it is of the form (1) with \(n^{[j]}=1\), \(j\in \{1,\ldots ,p\}\).

Separability can be exploited for box constrained global optimization problems

$$\begin{aligned} y^*= \min _{x \in S} f(x), \end{aligned}$$
(2)

with search space \(S\in {\mathbb {I}}^n\) and partially separable objective function f as in (1). It is well known [7] that the global minimum and the corresponding minimizer of (2) can be obtained by decomposition into smaller optimization problems

$$\begin{aligned} y^*=\sum _{j=1}^p \min _{x^{[j]} \in S^{[j]}} f^{[j]}\big (x^{[j]}\big ), \end{aligned}$$
(3)

that can be solved in parallel. In the context of b&b algorithms it is easier to solve (3) than (2) due to the decomposition of the search space.

Separable functions have been extensively investigated in the context of optimization. In [8] a quasi-Newton method is introduced that exploits the structure of partially separable functions when computing secant updates for the Hessian matrix. A parallel b&b approach was used in [9] to find optima of non-convex problems with partially separable functions over a bounded polyhedral set. The automatic detection of partial separability as in (1) by algorithmic differentiation (AD) was presented in [10].

Another class of problems related to the present work is introduced in the context of incomplete global search in [11] and called as easy to optimize as decomposable functions. A function \(f:S\subset {\mathbb {R}}^n\rightarrow {\mathbb {R}}\) is in this class if there exist functions \(g^{[i]}:S^{[i]}\subset {\mathbb {R}}\rightarrow {\mathbb {R}}\) and \(h^{[i]}:S\subset {\mathbb {R}}^n\rightarrow {\mathbb {R}}\) with

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}x_i}(x) = g^{[i]}(x_i)\cdot h^{[i]}(x). \end{aligned}$$
(4)

Minimizer candidates can be obtained from \(g^{[i]}(x_i)=0\), which ensures that the first-order optimality condition

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}x_i}(x) = 0, \end{aligned}$$

is fulfilled. Minimizers that satisfy \(h^{[i]}(x)=0\) and minimizers at the boundary are not taken into consideration. Thus, there is no guarantee to obtain the global minimum by this approach.
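
As a brief illustration (anticipating Example 2 below), the exponential function \(f(x) = -\exp \big (-\frac{1}{2}\sum _{i=1}^{n}{x_i^2}\big )\) belongs to this class, since

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}x_i}(x) = x_i\cdot \exp \left( -\frac{1}{2}\sum _{j=1}^{n}{x_j^2}\right) ,\quad \text {i.e.,}\quad g^{[i]}(x_i)=x_i,\quad h^{[i]}(x)=\exp \left( -\frac{1}{2}\sum _{j=1}^{n}{x_j^2}\right) , \end{aligned}$$

so the candidates obtained from \(g^{[i]}(x_i)=0\) are \(x_i=0\), which happens to be the global minimizer in this particular case.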

In this paper, we aim to generalize the concept of separability so that previously non-decomposable functions can also benefit from a decomposition of the index set of the variables on subdomains of the search space. To this end, we introduce a special structure called structural separability and formulate an additional monotonicity condition on so-called structural separators in Sect. 2. Structural separability is less restrictive than (1), but has a property similar to (4). The monotonicity condition is necessary to guarantee that all possible optima are still considered after the decomposition, which is a crucial prerequisite for the integration into deterministic global optimization algorithms. We give examples of functions that are non-separable according to (1) but fulfill the new definition, so that their corresponding optimization problems can still be decomposed.

Section 3 explains how to implement the presented work and how to integrate it into a b&b algorithm for deterministic global optimization. To this end, enclosures of the gradient of the objective function (also: interval gradients), obtained by combining reliable and inclusion isotonic interval computations [12] with adjoint AD [13, 14], are used for the examination of the monotonicity condition and for the automatic detection of separators. Interval gradients have already been used in deterministic global optimization for, among others, the verification of optimality conditions [4, 15,16,17]. They can also be used for significance-based approximate computing as presented in [18] and discussed in the context of neural networks in [19].

In Sect. 4, we show the results of a proof of concept implementation for the examples from Sect. 2 followed by conclusion and outlook in Sect. 5.

2 Subdomain separability

We introduce subdomain separability and we show how to exploit this property in deterministic global optimization.

Definition 1

(Structural separability) A function \(f: S \subset {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) is called structurally separable if there exists a decomposition of the index set of the variables into \(p+1\) disjoint subsets \(I^{[j]}\) such that

$$\begin{aligned} f(x) = h\left( f^{[1]}\big (x^{[1]}\big ),\ldots ,f^{[p]}\big (x^{[p]}\big ),x^{[p+1]}\right) ,\ x^{[j]}\in S^{[j]},\ \forall j, \end{aligned}$$
(5)

with structural separators \(f^{[j]} : S^{[j]} \subset {\mathbb {R}}^{n^{[j]}} \rightarrow {\mathbb {R}}\), \(j\in \{1,\ldots ,p\}\) and \(h : {\mathbb {R}}^{p+n^{[p+1]}} \rightarrow {\mathbb {R}}\). Subsets \(I^{[j]}\), \(j\in \{1,\ldots ,p\}\) are assumed to be non-empty, but \(I^{[p+1]}\) can be empty if \(p>1\).

Remark 1

In the directed acyclic graph (DAG) representation of the optimization problem as described in [20], structural separators \(f^{[j]}\) are cut-vertices [21], i.e., each path connecting vertices corresponding to variables \(x^{[j]}\) with the vertex corresponding to the objective value f(x) passes through the cut-vertex corresponding to \(f^{[j]}\).

Remark 2

Conventionally separable functions as in (1) are covered by Definition 1 with

$$\begin{aligned} h\big (f^{[1]}\big (x^{[1]}\big ),\ldots ,f^{[p]}\big (x^{[p]}\big ),\emptyset \big ) = f^{[1]}\big (x^{[1]}\big )+\ldots +f^{[p]}\big (x^{[p]}\big ) . \end{aligned}$$

Remark 3

Application of the chain rule of differentiation to differentiable structurally separable functions yields the gradient elements

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}x^{[j]}}(x) = \frac{{\textrm{d}}f}{{\textrm{d}}f^{[j]}}(x)\cdot \frac{{\textrm{d}}f^{[j]}}{{\textrm{d}}x^{[j]}}\big (x^{[j]}\big ) ,\ j \in \{1,\ldots ,p\}, \end{aligned}$$
(6)

with

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}f^{[j]}}(x)=\frac{\partial h}{\partial f^{[j]}} \left( f^{[1]}(x^{[1]}),\ldots ,f^{[p]}(x^{[p]}),x^{[p+1]}\right) . \end{aligned}$$

Equation (6) is the same as (4) with \(g^{[j]}(x^{[j]})=\frac{{\textrm{d}}f^{[j]}}{{\textrm{d}}x^{[j]}}(x^{[j]})\), \(h^{[j]}(x)=\frac{{\textrm{d}}f}{{\textrm{d}}f^{[j]}}(x)\) and \(n^{[j]}=1, \forall j\).

Theorem 1

(Subdomain separability) Consider the box constrained global optimization problem as in (2) with structurally separable and differentiable objective function f as in (5) and structural separators \(f^{[j]}\), \(j\in \{1,\ldots ,p\}\). If the objective function is monotonic w.r.t. a structural separator \(f^{[k]}\) on the search space S, that is,

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}f^{[k]}}(x)\ge 0\quad \forall x\in S\ \vee \ \frac{{\textrm{d}}f}{{\textrm{d}}f^{[k]}}(x)\le 0\quad \forall x\in S, \end{aligned}$$
(7)

then the optimization problem in (2) can be decomposed into

$$\begin{aligned} \min _{x^{[j]}\in S^{[j]}\ \forall j\in \{1,\ldots ,p+1\}\backslash \{k\}}&h\left( f^{[1]}(x^{[1]}),\ldots ,s^{[k],*},\ldots ,f^{[p]}(x^{[p]}),x^{[p+1]}\right) , \end{aligned}$$
(8)
$$\begin{aligned} \mathrm {s.t.}\ {}&s^{[k],*} = {\left\{ \begin{array}{ll} \displaystyle \min _{x^{[k]}\in S^{[k]}} f^{[k]}\big (x^{[k]}\big )\ \textrm{if}\ \frac{{\textrm{d}}f}{{\textrm{d}}f^{[k]}}(x)\ge 0\ \forall x\in S,\\ \displaystyle \max _{x^{[k]}\in S^{[k]}} f^{[k]}\big (x^{[k]}\big )\ \textrm{if}\ \frac{{\textrm{d}}f}{{\textrm{d}}f^{[k]}}(x)\le 0\ \forall x\in S. \end{array}\right. } \end{aligned}$$
(9)

Proof

If the objective function f is monotonically increasing w.r.t. the structural separator \(f^{[k]}\), i.e., \(\frac{{\textrm{d}}f}{{\textrm{d}}f^{[k]}}(x) \ge 0\ \forall x\in S\), we have

$$\begin{aligned}&h\left( f^{[1]}(x^{[1]}),\ldots ,s^{[k],-},\ldots ,f^{[p]}(x^{[p]}),x^{[p+1]}\right) \nonumber \\&\quad \le h\left( f^{[1]}(x^{[1]}),\ldots ,s^{[k],+},\ldots ,f^{[p]}(x^{[p]}),x^{[p+1]}\right) , \ \forall x^{[j]} \in S^{[j]}, \ \forall j, \end{aligned}$$
(10)

for \(s^{[k],-} \le s^{[k],+}\in {\mathbb {R}}\). Since h depends on \(x^{[k]}\) only through \(f^{[k]}\), i.e., \(\frac{\partial h}{\partial x^{[k]}}\big (f^{[1]}(x^{[1]}),\ldots ,f^{[p]}(x^{[p]}),x^{[p+1]}\big )=0\), inequality (10) implies that the global minimum of f requires the separator \(f^{[k]}\) to attain its minimum over \(S^{[k]}\). The monotonically decreasing case is handled analogously. \(\square \)

Remark 4

The dimension of the inner optimization problem in (9) is \(n^{[k]}\) while the dimension of the outer optimization problem in (8) is \(n-n^{[k]}+1\).
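
To illustrate the structure of (8) and (9), the following minimal sketch decomposes the hypothetical objective \(f(x_1,x_2)=\exp (x_1^2)+(x_2-1)^2\), which has the form \(h\big (f^{[1]}(x_1),x_2\big )\) with \(f^{[1]}(x_1)=x_1^2\) and \(h(z,v)=\exp (z)+(v-1)^2\); since \(\frac{\partial h}{\partial z}=\exp (z)>0\) everywhere, the inner problem minimizes the separator. The crude grid search merely stands in for a rigorous solver.

```cpp
// Minimal sketch of the decomposition (8)-(9) from Theorem 1 for the
// hypothetical objective f(x1, x2) = exp(x1^2) + (x2 - 1)^2, written as
// h(f1(x1), x2) with f1(x1) = x1^2 and h(z, v) = exp(z) + (v - 1)^2.
// Since dh/dz = exp(z) > 0 on the whole search space, the inner problem
// minimizes the separator. The grid search is only a placeholder.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <functional>

// placeholder 1-d minimizer on [lo, hi] (assumption: grid search suffices here)
double grid_min(const std::function<double(double)>& phi, double lo, double hi) {
    double best = phi(lo);
    for (int i = 1; i <= 10000; ++i)
        best = std::min(best, phi(lo + (hi - lo) * i / 10000.0));
    return best;
}

int main() {
    auto f1 = [](double x1) { return x1 * x1; };                         // separator
    auto h  = [](double z, double v) { return std::exp(z) + (v - 1.0) * (v - 1.0); };

    // inner problem (9): dh/dz >= 0, hence minimize the separator on S^[1]
    double s_star = grid_min(f1, -2.0, 3.0);

    // outer problem (8): optimize the remaining variable with the separator fixed
    double y_star = grid_min([&](double x2) { return h(s_star, x2); }, -2.0, 3.0);

    std::printf("s* = %f, y* = %f\n", s_star, y_star);                   // expect s* = 0, y* = 1
    return 0;
}
```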

Remark 5

A degenerate solution is implied if \(\frac{{\textrm{d}}f}{{\textrm{d}}f^{[k]}}(x)=0,\ \forall x\in S\) and S contains more than one element.

Remark 6

If there are multiple structural separators, e.g., \(f^{[j]}\) and \(f^{[k]}\), \(j\ne k\), fulfilling (7), then the inner optimization problems in (9) can be solved in parallel.

Remark 7

If structural separator \(f^{[j]}\) is also structurally separable, then the decomposition approach can be applied recursively and the original optimization problem decomposes into even smaller disjoint optimization problems. In that case, we will introduce an additional superscript to distinguish structural separators belonging to different recursion depths. The j-th structural separator that belongs to recursion depth \(\ell \) is denoted by \(f^{[j],\ell }\).

If there are structural separators \(f^{[j],\ell +1}\) and \(f^{[k],\ell }\) with \(I^{[j],\ell +1}\subset I^{[k],\ell }\), then the optimization problems for \(f^{[j],\ell +1}\) should be solved before solving the optimization problems for \(f^{[k],\ell }\).

Remark 8

If the monotonicity condition is violated, then structural separability can still be exploited, similarly to [11], by solving \(\frac{{\textrm{d}}f^{[k]}}{{\textrm{d}}x^{[k]}}\big (x^{[k]}\big )=0\) to find stationary points. As already stated in Sect. 1, this approach does not necessarily compute all stationary points.

For the remainder of this paper, we assume that the evaluation formula of the objective function \(y=f(x)\) is implemented as a differentiable program with independent variables x and dependent variable y. Following [13], we assume that at a particular argument x the differentiable program of f can be expressed by a finite sequence of function evaluations as

$$\begin{aligned} \begin{alignedat}{2} v_i&= x_i,\quad{} & {} i=1,\ldots , n,\\ v_j&= \varphi _j(v_i)_{i\prec j},\quad{} & {} j=n+1,\ldots ,n+p+1,\\ y&= v_{n+p+1}, \end{alignedat} \end{aligned}$$
(11)

where \(v_j\) for \(j=n+1,\ldots ,n+p\) are referred to as intermediate variables. The precedence relation \(i\prec j\) indicates a direct dependency of \(v_j\) on \(v_i\). Furthermore, the transitive closure \(\prec ^*\) of \(\prec \) induces a partial ordering of all indices \(j=1,\ldots ,n+p+1\). Equation (11) is also referred to as the single assignment code (SAC) of f. The SAC may not be unique due to commutativity, associativity and distributivity. We assume a SAC to be given.
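
For concreteness, a small hypothetical evaluation formula, \(f(x_1,x_2)=\sin (x_1 x_2)+x_2^2\), written as a SAC in the sense of (11) might look as follows; every intermediate value is assigned exactly once and depends only on previously computed values.

```cpp
// Sketch of a single assignment code (SAC) as in (11) for the hypothetical
// evaluation formula f(x1, x2) = sin(x1 * x2) + x2 * x2.
#include <cmath>
#include <cstdio>

double f(double x1, double x2) {
    double v1 = x1;            // v_i = x_i, i = 1, ..., n
    double v2 = x2;
    double v3 = v1 * v2;       // v_j = phi_j(v_i)_{i -< j}
    double v4 = std::sin(v3);
    double v5 = v2 * v2;
    double v6 = v4 + v5;
    return v6;                 // y = v_{n+p+1}
}

int main() { std::printf("%f\n", f(0.5, 2.0)); }
```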

2.1 Examples

Five test problems are investigated in the light of subdomain separability. They illustrate different aspects of the general approach. Besides the partially separable function in Example 1, there is the exponential function in Example 2, which is globally monotonic and solvable in parallel; a recursive exponential function in Example 3, which is still globally monotonic but cannot be solved in parallel; and the Shubert function in Example 4, which is not globally monotonic but solvable in parallel. Example 5 can neither be solved in parallel nor is it globally monotonic, but it can still benefit from subdomain separability. The SACs of the functions and their DAG representations are visualized to highlight that the vertices corresponding to the structural separators are cut-vertices.

Example 1

(Styblinski-Tang function [22]) Partially separable functions as in (1) are structurally separable and always fulfill the monotonicity condition in (7) with

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}f^{[j]}}(x) = 1,\ \forall j\in \{1,\ldots ,p\}, \end{aligned}$$

on any search space S, which yields the well-known fact that the corresponding optimization problem can be decomposed and solved in parallel.

For example, the Styblinski-Tang function

$$\begin{aligned} f(x)=\frac{1}{2}\sum _{i=1}^{n}\left( x_i^4 -16x_i^2+5x_i\right) , \end{aligned}$$

has the form of (1) except for the factor in front of the sum. Due to the factor, this function is marked as non-separable in [5]. Still, the problem can be decomposed into \(f(x)=\frac{1}{2}\sum _{i=1}^{n}f^{[i]}(x^{[i]})\) with structural separators \(f^{[i]}:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(f^{[i]}(x^{[i]})=x_i^4 -16x_i^2+5x_i\) due to \(\frac{{\textrm{d}}f}{{\textrm{d}}f^{[i]}}(x)=\frac{1}{2},\ \forall i\in \{1,\ldots ,n\},\ \forall x\in {\mathbb {R}}^n\). The SAC of this function is given by

$$\begin{aligned} \begin{alignedat}{2} v_i&= x_i^4-16 x_i^2+5 x_i,\ i=1,\ldots , n,\\ \Sigma&= \sum _{i=1}^n v_i,\\ y&= \frac{1}{2} \Sigma , \end{alignedat} \end{aligned}$$
(12)

and the DAG representing (12) is shown in Fig. 1.
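
A minimal sketch of the resulting decomposition is given below: each separator is minimized independently on its subdomain (here the commonly used box \([-5,5]\), an assumption of this sketch) and the partial minima are combined by h; the grid search again only stands in for a rigorous solver, and the loop over the subproblems could run in parallel.

```cpp
// Sketch: exploiting structural separability of the Styblinski-Tang function.
// Each separator f_i(x_i) = x_i^4 - 16 x_i^2 + 5 x_i is minimized on its own
// subdomain and the results are combined by h(s_1,...,s_n) = (1/2) * sum s_i.
// The grid search is only a placeholder for a rigorous global solver.
#include <algorithm>
#include <cmath>
#include <cstdio>

double separator(double x) { return std::pow(x, 4) - 16.0 * x * x + 5.0 * x; }

double grid_min(double (*phi)(double), double lo, double hi, int steps = 100000) {
    double best = phi(lo);
    for (int i = 1; i <= steps; ++i)
        best = std::min(best, phi(lo + (hi - lo) * i / steps));
    return best;
}

int main() {
    const int n = 4;
    double sum = 0.0;
    for (int i = 0; i < n; ++i)                  // independent 1-d subproblems
        sum += grid_min(separator, -5.0, 5.0);
    std::printf("y* ~ %f\n", 0.5 * sum);         // h applied to the partial minima
    return 0;
}
```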

Fig. 1

DAG representing the program structure from Example 1 and Example 2 for \(n=4\)

Example 2

(Exponential function [23]) For the exponential function

$$\begin{aligned} f(x) = -\exp \left( -\frac{1}{2}\sum _{i=1}^{n}{x_i^2}\right) , \end{aligned}$$

with SAC

$$\begin{aligned} \begin{alignedat}{2} v_i&= x_i^2,\ i=1,\ldots , n,\\ \Sigma&= \sum _{i=1}^n v_i,\\ y&= -\exp \left( -\frac{1}{2} \Sigma \right) , \end{alignedat} \end{aligned}$$
(13)

we have structural separators \(f^{[i]}:{\mathbb {R}}\rightarrow {\mathbb {R}}\) with \(f^{[i]}(x^{[i]})=x_i^2\) and derivative of the objective w.r.t. these separators equal to

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}f^{[i]}}(x) =\frac{1}{2}\exp \left( -\frac{1}{2}\sum _{k=1}^{n}{x_k^2}\right) . \end{aligned}$$

This derivative is positive everywhere, i.e., f is globally monotonically increasing w.r.t. each separator, such that Theorem 1 becomes applicable to all separators and the resulting subproblems can be solved in parallel.

The DAG of this function representing (13) is also given in Fig. 1 since it has the same structure as the DAG of the function from Example 1.

Example 3

(Recursive exponential function) For a demonstration of the usefulness of structural separability we consider the optimization problem in (2) with search space \(S=\left[ -2,3\right] ^n\), objective function \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\),

$$\begin{aligned} f(x)&= h\left( f^{[1],1}(x^{[1],1}),x^{[2],1}\right) ,\\ f^{[1],\ell }\left( x^{[1],\ell }\right)&= h\left( f^{[1],\ell +1}(x^{[1],\ell +1}),x^{[2],\ell +1}\right) , \ \forall \ell \in \{1,\ldots ,n-2\},\\ f^{[1],n-1}\left( v\right)&= \exp \left( v^2\right) ,\\ h\left( z,v\right)&= \exp \left( v^2+z-1\right) , \end{aligned}$$

with structural separable function \(h:{\mathbb {R}}^2\rightarrow {\mathbb {R}}\), structural separators \(f^{[1],\ell }\) and corresponding subsets of the index set \(I^{[1],\ell } = \{1,\ldots ,n-\ell \}\), \(I^{[2],\ell } = \{n-\ell +1\}\), \(\ell \in \{1,\ldots ,n-1\}\). The differentiable program is given by the SAC

$$\begin{aligned} \begin{alignedat}{2} v_1&= \exp (x_1^2),\\ v_i&= \exp (x_i^2 + v_{i-1}-1),\ i=2,\ldots ,n,\\ y&= v_n. \end{alignedat} \end{aligned}$$
(14)

The program is non-decomposable in the conventional sense, but it fulfills Definition 1 for all structural separators. From the DAG representation of this function given in Fig. 2 it becomes clear that the separators are arranged in a chain, which requires a recursive decomposition of the optimization problem.

To decompose the optimization problem it remains to be shown that the derivatives of the objective w.r.t. the separators \(\frac{{\textrm{d}}f}{{\textrm{d}}f^{[1],\ell }}(x)\), \(\ell \in \{1,\ldots ,n-1\}\), are non-negative (or non-positive) on any subdomain. From

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}f^{[1],1}}(x) = f(x), \end{aligned}$$

and

$$\begin{aligned} \frac{{\textrm{d}}f^{[1],\ell }}{{\textrm{d}}f^{[1],\ell +1}}\left( x^{[1],\ell }\right) = f^{[1],\ell }\left( x^{[1],\ell }\right) , \end{aligned}$$

it follows that

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}f^{[1],\ell }}(x) = \frac{{\textrm{d}}f}{{\textrm{d}}f^{[1],1}}(x) \cdot \prod _{j=1}^{\ell } \frac{{\textrm{d}}f^{[1],j}}{{\textrm{d}}f^{[1],j+1}}\left( x^{[1],j}\right) = f(x) \cdot \prod _{j=1}^{\ell } f^{[1],j}\left( x^{[1],j}\right) . \end{aligned}$$

By mathematical induction we show that \(f^{[1],\ell }(x^{[1],\ell })\ge 1\) for \(\ell \in \{1,\ldots ,n-1\}\). The basis \(f^{[1],n-1}(x^{[1],n-1})=\exp (x_1^2)\ge \exp (0)=1\) obviously fulfills the statement. The assumption \(f^{[1],\ell +1}(x^{[1],\ell +1}) \ge 1\) yields

$$\begin{aligned} f^{[1],\ell }\left( x^{[1],\ell }\right)&= \exp \left( x_{n-\ell +1}^2 + f^{[1],\ell +1}\left( x^{[1],\ell +1}\right) - 1\right) \\&\ge \exp \left( x_{n-\ell +1}^2 + 1 - 1\right) \ge \exp (0) = 1, \end{aligned}$$

due to monotonicity of the exponential function. The same is true for f(x). Thus, \(f^{[1],\ell }(x^{[1],\ell })\ge 1\) and \(\frac{{\textrm{d}}f}{{\textrm{d}}f^{[1],\ell }}(x) \ge 1\) for \(\ell \in \{1,\ldots ,n-1\}\). Furthermore, we know that the global minimizer \(x^*\) is located at \(x^*_i=0\), \(i\in \{1,\ldots ,n\}\), with a minimum of \(f(x^*)=1\).

As a consequence of Theorem 1 the optimization problem can be reformulated as

$$\begin{aligned} \min _{x_n\in \left[ -2,3\right] }&h\left( s^{[1],1,*},x_n\right) ,\\ \mathrm {s.t.}&\ s^{[1],\ell ,*} \quad = \min _{x_{n-\ell +1} \in \left[ -2,3\right] } h\left( s^{[1],\ell +1,*},x_{n-\ell +1}\right) ,\ \forall \ell \in \{1,\ldots ,n-2\}\\&s^{[1],n-1,*} =\ \ \min _{x_1 \in \left[ -2,3\right] }\quad \exp \left( x_1^2\right) ,\\&h\left( z,v\right) \ \ = \exp \left( v^2+z-1\right) . \end{aligned}$$

Note that this function is globally monotonic w.r.t. the separators, which does not necessarily hold in general. Since the separators depend recursively on each other, the corresponding optimization problems need to be solved sequentially, beginning with \(s^{[1],n-1,*}\).
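
The sequential structure of this reformulation can be sketched as follows for \(n=4\): the innermost subproblem is solved first, and each outer level minimizes \(h(z,x)=\exp (x^2+z-1)\) with z fixed to the optimal value of the level below. The grid search is again only a placeholder for a rigorous solver.

```cpp
// Sketch: sequential solution of the nested subproblems from Example 3 for
// n = 4 on S = [-2,3]^n, starting with the innermost separator
// s^{[1],n-1,*} = min exp(x_1^2).
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <functional>

double grid_min(const std::function<double(double)>& phi, double lo, double hi) {
    double best = phi(lo);
    for (int i = 1; i <= 10000; ++i)
        best = std::min(best, phi(lo + (hi - lo) * i / 10000.0));
    return best;
}

int main() {
    const int n = 4;
    const double lo = -2.0, hi = 3.0;
    auto h = [](double z, double x) { return std::exp(x * x + z - 1.0); };

    // innermost subproblem: s^{[1],n-1,*} = min_{x_1} exp(x_1^2)
    double s = grid_min([](double x1) { return std::exp(x1 * x1); }, lo, hi);

    // remaining levels, solved sequentially from the inside out
    for (int level = n - 2; level >= 0; --level)
        s = grid_min([&](double x) { return h(s, x); }, lo, hi);

    std::printf("y* ~ %f (expected 1 at x* = 0)\n", s);
    return 0;
}
```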

Fig. 2

DAG representing the program structure from Example 3 for \(n=4\)

Example 4

(Shubert function [24]) The Shubert function is given by \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) with

$$\begin{aligned} f(x)=\prod _{i=1}^{n}\sum _{j=1}^5{\cos ((j+1)x_i+j)}, \end{aligned}$$

and SAC

$$\begin{aligned} \begin{alignedat}{2} v_i&= \sum _{j=1}^5{\cos ((j+1)x_i+j)},\ i=1,\ldots , n,\\ y&= \prod _{i=1}^n v_i, \end{alignedat} \end{aligned}$$
(15)

Its DAG representation is shown in Fig. 3. Each factor of the product can be considered as a structural separator with \(f^{[i]}(x^{[i]})=\sum _{j=1}^5{\cos ((j+1)x_i+j)}\) and \(I^{[i]}=\{i\}\) for \(i\in \{1,\ldots ,n\}\). The derivatives of the function value w.r.t. the separators are

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}f^{[i]}}(x) = \prod _{\begin{array}{c} k=1 \\ k\ne i \end{array}}^{n} f^{[k]}(x^{[k]}). \end{aligned}$$

If the monotonicity condition holds for any structural separator, then the corresponding optimization problem can be decomposed by Theorem 1.

Fig. 3

DAG representing the program structure from Example 4 for \(n=4\)

Example 5

(Salomon function [11]) We show that the Salomon function \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) is separable only on selected subdomains. The function is given by

$$\begin{aligned} f(x) = 1-\cos \left( 2\pi \sqrt{\sum _{i=1}^{n} x_i^2}\right) +0.1\sqrt{\sum _{i=1}^{n} x_i^2}, \end{aligned}$$

with SAC

$$\begin{aligned} \begin{alignedat}{2} v_i&= x_i^2,\ i=1,\ldots , n,\\ \Sigma&=\sum _{i=1}^{n} v_i,\\ g&= \sqrt{\Sigma },\\ y&= 1-\cos \left( 2\pi g\right) +0.1 g. \end{alignedat} \end{aligned}$$
(16)

The DAG of (16) is visualized in Fig. 4. Introducing an intermediate result \(g:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) with \(g(x)=\sqrt{\sum _{i=1}^{n} x_i^2}\) and structural separators \(f^{[i]}:{\mathbb {R}}\rightarrow {\mathbb {R}}\) with \(f^{[i]}(x^{[i]}) = x_i^2\), \(i\in \{1,\ldots ,n\}\), yields the derivatives

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}f^{[i]}}(x)&=\frac{{\textrm{d}}f}{{\textrm{d}}g}(x)\cdot \frac{{\textrm{d}}g}{{\textrm{d}}f^{[i]}}(x),\\ \frac{{\textrm{d}}f}{{\textrm{d}}g}(x)&=2\pi \sin (2\pi g(x))+0.1,\\ \frac{{\textrm{d}}g}{{\textrm{d}}f^{[i]}}(x)&=\frac{1}{2g(x)}. \end{aligned}$$

As \(\frac{{\textrm{d}}g}{{\textrm{d}}f^{[i]}}(x)\) is always positive, it remains to show that \(\frac{{\textrm{d}}f}{{\textrm{d}}g}(x)\) is either non-negative or non-positive on the given subdomain. The roots of \(\frac{{\textrm{d}}f}{{\textrm{d}}g}(x)\) are

$$\begin{aligned} g_{2z-1}=z+\arcsin \left( -\frac{0.1}{2\pi }\right) \frac{1}{2\pi }\ \wedge \ g_{2z}=z-\frac{1}{2}-\arcsin \left( -\frac{0.1}{2\pi }\right) \frac{1}{2\pi }, \ z\in {\mathbb {N}}^+. \end{aligned}$$

The derivative \(\frac{{\textrm{d}}f}{{\textrm{d}}g}(x)\) has constant sign between consecutive roots, i.e., f is monotonic w.r.t. g there. Thus, Theorem 1 can be applied to the Salomon function on the search domain \(S=\left[ \frac{g_z}{\sqrt{n}},\frac{g_{z+1}}{\sqrt{n}}\right] ^n\) for all \(z\in {\mathbb {N}}^+\). If z is even, then the minimum of the separator is required for a minimum of the objective function. Otherwise, if z is odd, then the separator needs to be maximized to obtain a minimum of the objective function.
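
For reference, the breakpoints \(g_z\) can be evaluated directly from the root formulas above; a minimal sketch printing the first few of them in ascending order is given below.

```cpp
// Sketch: compute the first monotonicity breakpoints of df/dg for the Salomon
// function from the root formulas above. Between consecutive roots the sign of
// df/dg = 2*pi*sin(2*pi*g) + 0.1 is constant, so Theorem 1 is applicable on
// the corresponding subdomains.
#include <cmath>
#include <cstdio>

int main() {
    const double pi = std::acos(-1.0);
    const double a = std::asin(-0.1 / (2.0 * pi)) / (2.0 * pi);  // common offset
    for (int z = 1; z <= 3; ++z) {
        double g_even = z - 0.5 - a;        // g_{2z}
        double g_odd  = z + a;              // g_{2z-1}
        std::printf("g_%d = %.6f  g_%d = %.6f\n", 2 * z, g_even, 2 * z - 1, g_odd);
    }
    return 0;
}
```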

Fig. 4

DAG representing the structure of the evaluation formula from Example 5 for \(n=4\)

3 Implementation

In this section, we describe how to compute interval gradients by adjoint AD, how to use them to apply Theorem 1 to a differentiable program implementing a function f, and how to verify structural separators.

3.1 Interval arithmetic

Interval arithmetic (IA) is a concept that enables the computation of bounds of a function evaluation on a given interval. A closed interval \([x]\in {\mathbb {I}}\) of a variable \(x\in {\mathbb {R}}\) with lower bound \(\underline{x}\in {\mathbb {R}}\) and upper bound \(\overline{x}\in {\mathbb {R}}\) is denoted as

$$\begin{aligned} \left[ x\right] = \left[ \underline{x},\overline{x}\right] = \left\{ x\in {\mathbb {R}}\ |\ \underline{x}\le x\le \overline{x}\right\} . \end{aligned}$$

If there is only a single element in \(\left[ x\right] \), i.e., the endpoints are equal, \(\underline{x}=\overline{x}\), then the square brackets \(\left[ \cdot \right] \) are dropped and x is called a degenerate interval. In that sense, IA represents an extension of the real/floating-point number system.

Interval vectors \([x]\in {\mathbb {I}}^n\) have endpoints for each component

$$\begin{aligned} \left[ x\right] = \left[ \underline{x},\overline{x}\right] =\left\{ x\in {\mathbb {R}}^n\ |\ \underline{x_i}\le x_i\le \overline{x_i},\ i=1,\ldots ,n\right\} . \end{aligned}$$

When evaluating a function \(y=f(x)\) in IA on \(\left[ x\right] \) we are interested in the information

$$\begin{aligned} \left[ y\right] =f^*(\left[ x\right] )=\{f(x)\ |\ x\in \left[ x\right] \}. \end{aligned}$$

The asterisk denotes the united extension which computes the true range of values on the given domain. United extensions for all unary and binary elementary functions and arithmetic operations are known and endpoint formulas can be looked up, e.g., in [12]. Unfortunately, the derivation of endpoint formulas for the united extensions of composed functions might be expensive or even impossible. Hence, we will compute corresponding estimates by natural interval extensions. A natural interval extension can be obtained by replacing all elemental functions \(\varphi _j\) in (11) with their corresponding united extensions as

$$\begin{aligned} \begin{alignedat}{2} \left[ v_i\right]&=\left[ x_i\right] ,\quad{} & {} i=1,\ldots , n,\\ \left[ v_j\right]&=\varphi ^*_j(\left[ v_i\right] )_{i\prec j},\quad{} & {} j=n+1,\ldots ,n+p+1,\\ \left[ y\right]&=\left[ v_{n+p+1}\right] . \end{alignedat} \end{aligned}$$
(17)

The computation of the interval function value by the natural interval extension from (17) results in

$$\begin{aligned} \left[ y\right] =f(\left[ x\right] )\supseteq f^*(\left[ x\right] ). \end{aligned}$$

The superset relation states that the interval \(\left[ y\right] \) can be an overestimation of all possible values over the given domain, but it guarantees enclosure. Furthermore, the natural interval extension of Lipschitz continuous functions converges linearly to the united extension with decreasing domain size.
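
The following self-contained sketch illustrates (17) and the resulting overestimation with a deliberately minimal interval type; outward rounding and the full set of united extensions of elemental functions (as provided, e.g., by [12] or the Boost interval library used in Sect. 4) are omitted. For the small example function \(f(x)=x\cdot x-x\) on \([0,1]\), the natural extension yields \([-1,1]\), whereas the united extension is \([-1/4,0]\), a consequence of the dependency problem.

```cpp
// Sketch of a natural interval extension as in (17) with a deliberately
// minimal interval type; rounding issues are ignored, so this is illustrative.
#include <algorithm>
#include <cstdio>

struct Interval {
    double lo, hi;
};

Interval operator+(Interval a, Interval b) { return {a.lo + b.lo, a.hi + b.hi}; }
Interval operator-(Interval a, Interval b) { return {a.lo - b.hi, a.hi - b.lo}; }
Interval operator*(Interval a, Interval b) {
    double c[4] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
    return {*std::min_element(c, c + 4), *std::max_element(c, c + 4)};
}

// natural interval extension of f(x) = x*x - x, obtained by replacing every
// elemental operation with its interval counterpart
Interval f(Interval x) { return x * x - x; }

int main() {
    Interval y = f({0.0, 1.0});
    // prints [-1, 1]; the united extension on [0,1] is the tighter [-0.25, 0]
    std::printf("[%g, %g]\n", y.lo, y.hi);
    return 0;
}
```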

The reader is referred to [12, 25] for more information on the topic.

3.2 Adjoint algorithmic differentiation

AD techniques [13, 14] use the chain rule to compute, in addition to the function value of a computer program implementing the evaluation formula f(x) of a function \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\), its derivatives w.r.t. the independent variables x at a particular point x.

The adjoint or backward mode of AD propagates derivatives of the function w.r.t. independent and intermediate variables in reverse order relative to their computation in the SAC (11). This data-flow reversal implies the computationally intractable combinatorial optimization problem known as DAG Reversal [26].

Following [14], first-order adjoints are marked with a subscript \(_{(1)}\). They are defined as

$$\begin{aligned} x_{(1)} = y_{(1)} \cdot \frac{{\textrm{d}}f}{{\textrm{d}}x}(x), \end{aligned}$$

with \(x_{(1)}\in {\mathbb {R}}^n\) and \(y_{(1)}\in {\mathbb {R}}\). A single adjoint computation with seed \(y_{(1)} = 1\) results in the gradient \(\frac{{\textrm{d}}f}{{\textrm{d}}x}(x)\) stored in \(x_{(1)}\). The adjoint of (11) can be implemented by (11) itself followed by

$$\begin{aligned} \begin{alignedat}{2} v_{(1),n+p+1}&= y_{(1)},\\ v_{(1),k}&= \sum _{j:k\prec j} v_{(1),j} \cdot \frac{\partial \varphi _j}{\partial v_k}(v_i)_{i\prec j} ,\quad{} & {} k=n+p+1,\ldots ,n+1,\\ x_{(1),i}&= v_{(1),i},\quad{} & {} i=n,\ldots , 1. \end{alignedat} \end{aligned}$$
(18)

The evaluation of (18) yields the adjoints of all intermediate variables \(v_j\) in addition to those of the independent variables

$$\begin{aligned} v_{(1),j}&= y_{(1)} \cdot \frac{{\textrm{d}}f}{{\textrm{d}}v_j}(x),\quad j=n+p+1,\ldots ,n+1. \end{aligned}$$
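
A minimal sketch of a forward evaluation (11) followed by the adjoint sweep (18) is given below for the small hypothetical formula \(f(x_1,x_2)=\sin (x_1 x_2)+x_2^2\) used in the SAC sketch above; each adjoint accumulates the contributions of all successors of the corresponding variable.

```cpp
// Sketch of a forward sweep (11) followed by the adjoint sweep (18) for the
// SAC of f(x1, x2) = sin(x1 * x2) + x2 * x2, seeded with y_(1) = 1.
#include <cmath>
#include <cstdio>

int main() {
    double x1 = 0.5, x2 = 2.0;

    // forward sweep (single assignment code)
    double v1 = x1, v2 = x2;
    double v3 = v1 * v2;
    double v4 = std::sin(v3);
    double v5 = v2 * v2;
    double v6 = v4 + v5;                // y

    // adjoint (reverse) sweep
    double a6 = 1.0;                    // y_(1)
    double a4 = a6 * 1.0;               // d v6 / d v4
    double a5 = a6 * 1.0;               // d v6 / d v5
    double a3 = a4 * std::cos(v3);      // d v4 / d v3
    double a2 = a5 * 2.0 * v2           // d v5 / d v2
              + a3 * v1;                // d v3 / d v2
    double a1 = a3 * v2;                // d v3 / d v1

    std::printf("f = %f, df/dx1 = %f, df/dx2 = %f\n", v6, a1, a2);
    return 0;
}
```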

3.3 Interval adjoints

The natural interval extension of (11) and (18) computes the interval function value and its interval derivatives w.r.t. all independent and intermediate variables as the result of a single evaluation. It can be implemented as (17) followed by

$$\begin{aligned} \begin{alignedat}{2} \left[ v_{(1),n+p+1}\right]&= \left[ y_{(1)}\right] ,\\ \left[ v_{(1),k}\right]&= \sum _{j:k\prec j} \left[ v_{(1),j}\right] \cdot \frac{\partial \varphi ^*_j}{\partial v_k}(\left[ v_i\right] )_{i\prec j} ,\quad{} & {} k=n+p+1,\ldots ,n+1,\\ \left[ x_{(1),i}\right]&= \left[ v_{(1),i}\right] ,\quad{} & {} i=n,\ldots , 1. \end{alignedat} \end{aligned}$$
(19)

Compared to the traditional approach of AD, in which the derivatives are only computed at specified points, we now get enclosures of the derivatives that contain all possible values of the derivative over the specified domain. The interval adjoints in (19) might be overestimated compared to the united extension, as already stated for the interval values in Sect. 3.1. The natural interval extension of AD methods converges linearly for continuously differentiable functions [27]. Enclosures with higher convergence order, e.g., centered forms [12], slopes or McCormick relaxations [28, 29] of AD methods, can be derived [15, 27].

3.3.1 Monotonicity test

A single evaluation of the interval adjoint for \(\left[ y_{(1)}\right] =1\) suffices to verify monotonicity as in (7) for all independent and intermediate variables. If the separation approach is embedded into a b&b solver that already verifies the first-order optimality condition by interval adjoints, then the monotonicity test comes for free, assuming that the separators are known a priori.
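
A minimal sketch of this test is given below for the exponential function from Example 2 with \(n=2\) on the box \([-1,1]^2\) (the box is an assumption of the sketch), reusing a toy interval type without outward rounding, so it is only illustrative; a nonnegative lower bound of the interval adjoint of a separator \(v_i=x_i^2\) certifies that the objective is monotonically increasing w.r.t. that separator.

```cpp
// Sketch of the monotonicity test (7) via interval adjoints for
// f(x) = -exp(-0.5 * (x1^2 + x2^2)) on the box [-1,1]^2.
#include <algorithm>
#include <cmath>
#include <cstdio>

struct Interval { double lo, hi; };

Interval operator+(Interval a, Interval b) { return {a.lo + b.lo, a.hi + b.hi}; }
Interval operator*(Interval a, Interval b) {
    double c[4] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
    return {*std::min_element(c, c + 4), *std::max_element(c, c + 4)};
}
Interval scale(double s, Interval a) { return s >= 0 ? Interval{s * a.lo, s * a.hi}
                                                     : Interval{s * a.hi, s * a.lo}; }
Interval sqr(Interval a) {
    if (a.lo >= 0) return {a.lo * a.lo, a.hi * a.hi};
    if (a.hi <= 0) return {a.hi * a.hi, a.lo * a.lo};
    return {0.0, std::max(a.lo * a.lo, a.hi * a.hi)};
}
Interval exp(Interval a) { return {std::exp(a.lo), std::exp(a.hi)}; }  // exp is monotone

int main() {
    Interval x1{-1.0, 1.0}, x2{-1.0, 1.0};

    // forward sweep: f(x) = -exp(-0.5 * (x1^2 + x2^2))
    Interval v1 = sqr(x1), v2 = sqr(x2);            // separators f^[1], f^[2]
    Interval s  = v1 + v2;
    Interval t  = scale(-0.5, s);
    Interval u  = exp(t);                           // y = -u; the value is not needed below

    // adjoint sweep, seeded with [y_(1)] = 1
    Interval au = scale(-1.0, Interval{1.0, 1.0});  // d(-u)/du = -1
    Interval at = au * u;                           // d exp(t)/dt = exp(t)
    Interval as = scale(-0.5, at);                  // dt/ds = -0.5
    Interval a1 = as, a2 = as;                      // ds/dv1 = ds/dv2 = 1

    std::printf("[df/df1] = [%g, %g], [df/df2] = [%g, %g]\n", a1.lo, a1.hi, a2.lo, a2.hi);
    std::printf("monotonically increasing w.r.t. both separators: %s\n",
                (a1.lo >= 0 && a2.lo >= 0) ? "yes" : "no");
    return 0;
}
```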

3.3.2 Verification of separators

Interval adjoints can be used to verify whether an intermediate variable v is a structural separator of the program implementing the function f. W.l.o.g. we assume that the program is structurally separable with a decomposition of the index set into two subsets \(I^{[1]}\) and \(I^{[2]}\), that the structural separator v depends on \(x^{[1]}\), and that its computing formula \(v(x^{[1]})\) is implemented in the program. Furthermore, we assume that the interval adjoints \(\frac{{\textrm{d}}f}{{\textrm{d}}v}(\left[ x\right] )\in {\mathbb {I}}\) as well as \(\frac{{\textrm{d}}f}{{\textrm{d}}x^{[1]}}(\left[ x\right] )\in {\mathbb {I}}^{n^{[1]}}\) have already been computed for the monotonicity test on \(\left[ x\right] \) as described in Sect. 3.3.1. Evaluation of (19) with the adjoint of the intermediate variable set to \(\left[ v_{(1)}\right] =\frac{{\textrm{d}}f}{{\textrm{d}}v}(\left[ x\right] )\) yields

$$\begin{aligned} \left[ x_{(1)}\right] =\frac{{\textrm{d}}v}{{\textrm{d}}x}\left( \left[ x^{[1]}\right] \right) \cdot \frac{{\textrm{d}}f}{{\textrm{d}}v}(\left[ x\right] ). \end{aligned}$$
(20)

The assumption that f is structurally separable and satisfies Definition 1 with separator v is correct if

$$\begin{aligned} \frac{{\textrm{d}}f}{{\textrm{d}}x^{[1]}}(x)=&\frac{{\textrm{d}}v}{{\textrm{d}}x^{[1]}}\big (x^{[1]}\big )\cdot \frac{{\textrm{d}}f}{{\textrm{d}}v}(x),&\ \forall x \in \left[ x\right] ,\\ \frac{{\textrm{d}}v}{{\textrm{d}}x^{[2]}}\left( x^{[1]}\right) =&0,&\ \forall x \in \left[ x\right] , \end{aligned}$$

holds, which can be verified by checking the equalities

$$\begin{aligned} \left[ x^{[1]}_{(1)}\right]&=\frac{{\textrm{d}}f}{{\textrm{d}}x^{[1]}}(\left[ x\right] ), \end{aligned}$$
(21)
$$\begin{aligned} \left[ x^{[2]}_{(1)}\right]&=0, \end{aligned}$$
(22)

obtained from (20). If either (21) or (22) is violated, then v is not a structural separator. Thus, for the verification of each separator candidate one adjoint evaluation of (19) is required in addition to the one for the monotonicity test.

An exhaustive search for separators should be avoided due to the potentially high number of intermediate variables and the associated number of separator candidates. Separators given by expert users can be verified efficiently. Since structural separability as given in Definition 1 is domain-independent and thus a global property, it is sufficient to identify the separators once before performing the global search.

4 Case study

The general idea of b&b algorithms [25] used for global optimization problems as given in (2) is to remove all parts of the search space that cannot contain a global minimizer. The implementation used for this case study is a variation of the one presented in [17], extended by an implementation of Theorem 1. The user needs to specify at least one separator. The algorithm performs the following steps:

  • multi-section: half-splitting in every dimension resulting in \(2^n\) new b&b nodes;

  • cut-off test: elimination of a b&b node with search space \(\left[ x\right] \) if \(f(\left[ x\right] ) > \overline{y}^*\) with upper bound \(\overline{y}^*\) for the global minimum;

  • first-order optimality test: if \(\frac{{\textrm{d}}f}{{\textrm{d}}x_i}(\left[ x\right] )\ge 0\) and \(\underline{x_i}\) is a bound of the original search space S, then recompute with \(x_i=\underline{x_i}\); else if \(\frac{{\textrm{d}}f}{{\textrm{d}}x_i}(\left[ x\right] )\le 0\) and \(\overline{x_i}\) is a bound of the original search space S, then recompute with \(x_i=\overline{x_i}\); otherwise eliminate the b&b node with search space \(\left[ x\right] \);

  • improvement of bound \(\overline{y}^*\): Evaluate the function at any point (e.g., midpoint) of the search space to find a better bound \(\overline{y}^*\);

  • separator test: Check monotonicity condition for a priori known separators and generate a subproblem if Theorem 1 is applicable.

Obviously, the improvement of the upper bound of the global minimum can be enhanced by local searches instead of evaluation of the objective function at the midpoint of the current search space. Recursive separation is not supported by the current version of the solver. It is the subject of ongoing development efforts.
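
A condensed sketch of the b&b skeleton described above is given below, restricted to one dimension and to the branching, cut-off and upper-bound improvement steps; the first-order optimality and separator tests as well as outward rounding are omitted for brevity. The objective is the separator \(x^4-16x^2+5x\) from Example 1 on the (assumed) box \([-5,5]\).

```cpp
// Condensed sketch of a 1-d b&b loop with bisection, cut-off test and
// midpoint-based improvement of the upper bound; rounding is ignored.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Interval { double lo, hi; };
Interval operator+(Interval a, Interval b) { return {a.lo + b.lo, a.hi + b.hi}; }
Interval scale(double s, Interval a) { return s >= 0 ? Interval{s * a.lo, s * a.hi}
                                                     : Interval{s * a.hi, s * a.lo}; }
Interval sqr(Interval a) {
    if (a.lo >= 0) return {a.lo * a.lo, a.hi * a.hi};
    if (a.hi <= 0) return {a.hi * a.hi, a.lo * a.lo};
    return {0.0, std::max(a.lo * a.lo, a.hi * a.hi)};
}

Interval F(Interval x) { return sqr(sqr(x)) + scale(-16.0, sqr(x)) + scale(5.0, x); }
double   f(double x)   { return x * x * x * x - 16.0 * x * x + 5.0 * x; }

int main() {
    const double tol = 1e-6;
    std::vector<Interval> work{Interval{-5.0, 5.0}};
    double ub = f(0.0);                               // upper bound from a sample point
    double lb = ub;
    while (!work.empty()) {
        Interval box = work.back();
        work.pop_back();
        if (F(box).lo > ub) continue;                 // cut-off test
        double mid = 0.5 * (box.lo + box.hi);
        ub = std::min(ub, f(mid));                    // improve upper bound at the midpoint
        if (box.hi - box.lo < tol) { lb = std::min(lb, F(box).lo); continue; }
        work.push_back(Interval{box.lo, mid});        // bisection (multi-section for n = 1)
        work.push_back(Interval{mid, box.hi});
    }
    std::printf("y* in [%f, %f]\n", lb, ub);
    return 0;
}
```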

The software implements the required interval adjoints by using the interval type from the Boost library [30] as a base type of the first-order adjoint type provided by dco/c++ [31]. Both template libraries make use of the concept of operator overloading as supported, e.g., by C++.

Fig. 5

Isolines of the Shubert function for \(n=2\) (left) with green lines around local minima and red lines around local maxima. B&b nodes considered by the algorithm (right): active nodes are marked in green, white boxes are discarded by the cut-off test, and orange boxes violate the first-order optimality condition. Non-square boxes result from the index set decomposition approach

On the left side of Fig. 5, isolines of the two-dimensional Shubert function over the domain \(\left[ 0,2\pi \right] ^2\) are shown with green lines around (local) minima and red lines around local maxima. The two global minima are marked by green crosses. The right side of Fig. 5 shows the subdomains that are considered by the b&b algorithm. For visualization, the branching is set up to stop at an accuracy of 0.1. Non-square search spaces result from the separation approach and only occur in regions that are proven to be monotonic by the interval adjoints. Green boxes are active search spaces that could contain a global minimizer. White boxes are discarded by the cut-off test. Orange boxes violate the first-order optimality condition.

Table 1 Number of generated b&b nodes by the algorithm without and with separation

Our solver is used to find the global minima of the examples from Sect. 2. The algorithm is performed with and without separation. Structural separators are marked manually. The results are summarized in Table 1. Most of the presented examples benefit from the domain-dependent index set decomposition approach and require fewer b&b nodes if separation is enabled. The benefit increases with growing dimensionality due to the exponential complexity of the multi-section. The Salomon function does not benefit from the domain-dependent index set decomposition since the relevant domains are already discarded by the cut-off or first-order optimality tests.

We only measure runtimes for the Styblinski-Tang example with \(n=8\), with and without exploiting subdomain separability. Since the derivative information is already available for all separators after the first-order optimality test, the monotonicity test only iterates over the separators defined by the user. The number of b&b nodes considered by the algorithm without separation is 8820 times higher than with separation. The corresponding runtime without separation is only 7673 times higher than with separation. This observation correlates with the fact that the computations of b&b nodes that do not pass the cut-off test are terminated immediately. The percentage of b&b nodes that are eliminated by the cut-off test is \(30.2\%\) for the case without separation and \(2.8\%\) with the separation approach. The runtime estimates are averaged over 100 calls of the solver for both cases.

Our in-house solver has been designed as a playground for novel algorithms. Neither is it optimized for speed, nor does it feature state-of-the-art non-convex optimization methodology beyond the previously described b &b algorithm. Ultimately, we aim for integration of our ideas into modern software solutions for deterministic global optimization, e.g., [32, 33].

5 Conclusion and outlook

Our notion of separability combined with tests for monotonicity allows us to decompose an optimization problem into smaller optimization problems. It extends the verification of the first-order optimality condition as presented in the context of incomplete global search in [11] so that it can be used for deterministic global optimization, i.e., instead of just finding a few candidates fulfilling first-order optimality, our approach guarantees that all possible optima are considered.

We explained how to utilize interval adjoints to verify monotonicity of the objective function w.r.t. all structural separators at the cost of a single adjoint evaluation. As a first result, we revisited examples from the literature that benefit from the domain-dependent separability approach. Furthermore, we showed how to verify the separation property of a variable in a given computer program at the cost of only two adjoint evaluations.

The verification of separators can be used as a starting point for research into heuristics for automatically detecting separators in a computer program. Further work in progress includes enabling recursive separation. Moreover, interval arithmetic can result in a significant overestimation of the true value range, e.g., due to the wrapping effect or the dependency problem. Adjoint versions of interval centered forms, slopes and the McCormick relaxations should achieve better convergence.