1 Introduction

The generic nonconcavity of maximization problems typically leads to multiple local optima. Standard optimality conditions are local in nature, and techniques for global optimization are usually algorithmic, restricting the search for the best solution to subsets of the domain. For the simple case where the domain is an interval, a global maximizer of a continuously differentiable function can be found using techniques from dynamic systems, notably by introducing global information in the form of an adjoint variable. In this manner, we construct expressions for the solutions to a global optimization problem on an interval, which are directly related to dynamic interpretations in terms of optimal stopping and optimal starting. In addition to providing a full characterization of the solutions to a global optimization problem over an interval, the adjoint variable can also be used locally to formulate necessary and sufficient optimality conditions for one-sided subproblems of the original global optimization problem.

1.1 Literature

Following [1], global optimization methods use either deterministic search algorithms (e.g., via gradient methods) or random-sampling procedures. The first type of algorithm consists of schemes for systematic search updates. The Bolzano search finds critical points of a concave objective function via bisection (see, e.g., [2], p. 122). The golden-section search by [3] for unimodal functions increases the efficiency of the bisection method by varying the subdivision using Fibonacci numbers; see also [4]. Algorithms based on steepest ascent, such as Newton’s method (see, e.g., [7], Ch. 9.5), tend to be greedy and therefore converge to local extrema. Improvements are achieved by using (deterministic) sampling techniques capitalizing on available knowledge about the variation of the function in terms of its Lipschitz constant [8]. The latter can be refined by locally estimating the Lipschitz constant [9], using a quadratic bound [10], or by employing a higher-order approach, e.g., considering additionally the Lipschitz constant for the variation of the gradient [11]. An overview of the second type of algorithms, based on random sampling, is provided by [12], Ch. 4. An alternative Bayesian approach, which models the objective function probabilistically as a stochastic process, was proposed by [13]. All of these algorithms amount to numerical techniques, predicated on the assumption that the objective function is expensive to evaluate or nonsmooth, so that direct analytical calculations are out of reach. Breaking with this premise, our goal is to provide insights about the kind of information needed to compute solutions to a global optimization problem, as well as about their properties, rather than to improve on the numerical side.

We assume that the underlying objective function is continuously differentiable, and then reduce the solution of the global optimization problem to solving an “adjoint” differential equation. In the spirit of [14], this differential equation performs the somewhat unexpected task of aggregating global information about the available one-sided improvements. Since the adjoint equation has a discontinuous right-hand side, existence and uniqueness of the solution are obtained separately via successive Picard iterations (see, e.g., [15], p. 213), without relying on (here unavailable) Lipschitz constants.

1.2 Outline

The remainder of this paper is organized as follows. Section 2 introduces notation and basic concepts, most notably an auxiliary (adjoint) variable which represents the optimal improvement up to the interval horizon. Section 3 provides expressions for the solutions of a one-dimensional global optimization problem as well as necessary and sufficient global optimality conditions. Section 4 contains several examples to illustrate the results. It also clarifies the equivalence of global optimization with optimal stopping (or starting) problems. Section 5 discusses global optimality conditions and the relationship of the proposed methods to the analysis of optimal control problems. Section 6 concludes.

2 Preliminaries

For any given \(T>0\), consider the global optimization problem

$$\begin{aligned} F^* = \max _{t\in [0,T]} F(t), \end{aligned}$$
(P)

where \(F:[0,T]\rightarrow {{\mathbb {R}}}\) is a differentiable real-valued objective function with continuous derivative \(f:[0,T]\rightarrow {{\mathbb {R}}}\). By the Weierstrass theorem (see, e.g., [16], p. 540), problem (P) has a solution, i.e., its solution set \({{\mathcal {P}}}\subseteq [0,T]\) is nonempty, and the optimal value \(F^*\) is finite. Furthermore, it is well known that any (interior) optimizer \({\hat{t}}\in \,]0,T[\) (i.e., excluding the boundary points 0 and T) satisfies the Fermat condition,

$$\begin{aligned} f({\hat{t}}) = 0, \end{aligned}$$
(1)

but that there may be many points \({{\hat{t}}}\) that do not solve (P) but still satisfy \(f({{\hat{t}}})=0\). For example, if F is constant at a value \({\bar{F}}<F^*\) on a subinterval, then there is a continuum of such points. We are interested in characterizing the solution(s) to the global optimization problem, as element(s) of [0, T], including the boundaries. For this, we introduce an auxiliary function, also referred to as “adjoint variable,” \(x:[0,T]\rightarrow {{\mathbb {R}}}\), as the unique solution to the initial-value problem

$$\begin{aligned} \dot{x}(s) = \varPhi (f(T-s),x(s)), \quad x(0)=0, \end{aligned}$$
(2)

for \(s\in [0,T]\), where for any \(({\hat{t}},{\hat{x}})\in {{\mathbb {R}}}^2\):

$$\begin{aligned} \varPhi ({\hat{t}},{\hat{x}}):{=} \left\{ \begin{array}{ll} {\hat{t}}, &{}\quad \text{ if } \; {\hat{x}}>0,\\ \max \{0,{\hat{t}}\}, &{}\quad \text{ if } \; {\hat{x}}=0,\\ 0, &{}\quad \text{ otherwise }. \end{array}\right. \end{aligned}$$

The right-hand side of the differential equation in (2) is discontinuous and generally does not satisfy the Carathéodory conditions (see, e.g., [17], p. 3). Before we establish existence and uniqueness of a solution to the initial-value problem in the space \({{\mathcal {W}}}^{1,1}([0,T])\) of absolutely continuous functions on [0, T] (see Theorem 2.1 below), we provide a useful lower bound.

Lemma 2.1

For any \(s\in [0,T]\): \(x(s)\ge \max \{0,F(T) - F(T-s)\}\).

Proof

The adjoint variable x(s) cannot become negative, since Eq. (2) implies that \(\dot{x}\ge 0\) at the boundary of positivity, i.e., whenever \(x=0\). Thus, \(x(s)\ge 0\) for all \(s\in [0,T]\). We now show that \(x(s)\ge F(T)-F(T-s)\). For this, note that the solution to the initial-value problem

$$\begin{aligned} \dot{z}(s) = f(T-s), \quad z(0) = 0, \end{aligned}$$

for \(s\in [0,T]\), is of the form

$$\begin{aligned} z(s) = \int _{T-s}^T f(\theta )\,\mathrm{d}\theta = F(T) - F(T-s). \end{aligned}$$
(3)

Consider the difference \(\Delta :{=} x - z\). Then \(\Delta (0)=0\) and, using the fact that \(x(s)\ge 0\), we have

$$\begin{aligned} \dot{\Delta }(s) = -\min \{0,f(T-s)\}{\mathbf 1}_{\{x(s)=0\}} = \max \{0,-f(T-s)\}{\mathbf 1}_{\{x(s)=0\}}\ge 0. \end{aligned}$$

Thus,

$$\begin{aligned} \Delta (s) = \int _{T-s}^T \max \{0,-f(\theta )\}{\mathbf 1}_{\{x(T-\theta )=0\}}\,\mathrm{d}\theta \ge 0, \quad s\in [0,T], \end{aligned}$$
(4)

which implies that \(x(s)\ge z(s)\) for all \(s\in [0,T]\). This proves the claim. \(\square \)

As explained in the next section, the adjoint variable x(s) measures the optimal improvement of the objective value \(F(T-s)\) on the interval \({[T-s,T]}\). Because the comparison set includes the current value of the objective function, the improvement must be nonnegative and has to exceed the difference \({F(T)-F(T-s)}\), at least weakly.
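For illustration, the initial-value problem (2) can be integrated by a simple projected Euler scheme (a numerical sketch of our own, not part of the paper's formal development): since \(\varPhi \) suppresses negative drift exactly when x reaches zero, clipping each Euler step at zero reproduces the dynamics. The test objective \(F(t)=\sin t\) on \([0,3\pi ]\), as well as the grid size, are arbitrary choices of ours.

```python
import math

def adjoint_x(f, T, n=20000):
    """Projected Euler scheme for x'(s) = Phi(f(T - s), x(s)), x(0) = 0.

    Clipping each Euler step at zero mimics Phi: while x > 0 the step uses
    the plain drift f(T - s); once x would cross below zero, the state is
    held at zero until the drift turns nonnegative again.
    """
    h = T / n
    s = [k * h for k in range(n + 1)]
    x = [0.0] * (n + 1)
    for k in range(n):
        x[k + 1] = max(0.0, x[k] + h * f(T - s[k]))
    return s, x

# Illustrative objective (our choice): F(t) = sin(t) on [0, 3*pi], f = cos.
T = 3 * math.pi
s, x = adjoint_x(math.cos, T)

# Slack in the bound of Lemma 2.1: x(s) - max{0, F(T) - F(T - s)}.
slack = min(x[k] - max(0.0, math.sin(T) - math.sin(T - s[k]))
            for k in range(len(s)))
```

Up to discretization error, the computed trajectory stays nonnegative and dominates \(F(T)-F(T-s)\), as Lemma 2.1 requires.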

By Lemma 2.1 any solution x to Eq. (2), if it exists, cannot have negative values on [0, T]. Moreover, for any \(({\hat{t}},{\hat{x}})\in {{\mathbb {R}}}^2\):

$$\begin{aligned} {\hat{x}}\ge 0 \ \Rightarrow \ \varPhi ({\hat{t}},{\hat{x}}) = {\hat{t}}\,{\mathbf 1}_{\{\hat{x}>0\}} + \max \{0,{\hat{t}}\}\,{\mathbf 1}_{\{\hat{x}=0\}} = {\hat{t}} - \min \{0,{\hat{t}}\}\,{\mathbf 1}_{\{\hat{x}\le 0\}} =:{\hat{\varPhi }}({\hat{t}},\hat{x}). \end{aligned}$$

Thus, if we set \(\varphi (s):{=} f(T-s)\) and \(\varphi _-(s):{=} \min \{0,\varphi (s)\}\) for all \(s\in [0,T]\), then based on the preceding implication, the initial-value problem in Eq. (2) can be rewritten in the form

$$\begin{aligned} \dot{x}(s) = {\hat{\varPhi }}(\varphi (s),x(s)) = \varphi (s) - \varphi _-(s)\,{\mathbf 1}_{\{x(s)\le 0\}}, \ \ \ x(0)=0, \end{aligned}$$
(2')

without affecting its set \({{\mathcal {R}}}\subset {{\mathcal {W}}}^{1,1}([0,T])\) of solutions. The Sobolev space \({{\mathcal {W}}}^{1,1}([0,T])\) contains all absolutely continuous real-valued functions x defined on the domain [0, T] and equipped with the norm \(\Vert \cdot \Vert _{1,1}\), where

$$\begin{aligned} \Vert x\Vert _{1,1} = \int _0^T \left( |x(s)| + |\dot{x}(s)|\right) \,\mathrm{d}s. \end{aligned}$$
(5)

The vector space \({{\mathcal {W}}}^{1,1}([0,T])\) is a Banach space, i.e., a complete normed vector space: any Cauchy sequence in \({{\mathcal {W}}}^{1,1}([0,T])\) converges (in the \(\Vert \cdot \Vert _{1,1}\)-norm) to an element of the space. The solution set of the initial-value problem (2’) is

$$\begin{aligned} {{\mathcal {R}}} :{=} \{x\in {{\mathcal {W}}}^{1,1}([0,T]) : {\mathbf P}x = x \}, \end{aligned}$$

where the operator \({\mathbf P}:{{\mathcal {W}}}^{1,1}([0,T])\rightarrow {{\mathcal {W}}}^{1,1}([0,T])\) maps any absolutely continuous function x on [0, T] to a function \({\mathbf P}x\), with

$$\begin{aligned} ({\mathbf P} x)(s) :{=} \int _0^s {\hat{\varPhi }}(\varphi (\varsigma ),x(\varsigma ))\,\mathrm{d}\varsigma , \quad s\in [0,T], \end{aligned}$$
(6)

which (as can be verified) is also an element of \({{\mathcal {W}}}^{1,1}([0,T])\). The following result provides existence and uniqueness of a solution to the initial-value problems (2) and (2’).

Theorem 2.1

\({{\mathcal {R}}}=\{x\}\), i.e., there exists a unique solution \(x\in {{\mathcal {W}}}^{1,1}([0,T])\) to the initial-value problem (2), and \({\mathbf P}x = x\).

As becomes clear in the proof of the last result (provided in the Appendix), repeated application of the operator \(\mathbf P\) to \(\phi \), where \(\phi (s) :{=} \int _0^s \varphi (\varsigma )\,\mathrm{d}\varsigma \) for all \(s\in [0,T]\), converges to the unique solution of Eq. (2). That is, when considering the sequence \(\sigma :{=} (x_k)_{k=0}^\infty \), with the initial function \(x_0 = \phi \) and the Picard iteration \(x_{k+1} = {\mathbf P}x_k\) for \(k\ge 0\), then \(x_k \rightarrow x\in {{\mathcal {R}}}\) as \(k\rightarrow \infty \). In practice, the convergence of the sequence \(\sigma \) to the adjoint variable \({x = \lim _{k\rightarrow \infty } {\mathbf P}^k\phi }\) is usually very fast and takes place within a few iterations; see Fig. 1 for an example.

Fig. 1

Computation of x in 3 iterations, for \(F(t) = \sin (t) - (t-(5\pi /2))^2/50\) on \([0,5\pi ]\)
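The computation behind Fig. 1 can be reproduced by a discretized version of this Picard iteration (a sketch of ours; the grid size, left-endpoint quadrature rule, and zero-detection tolerance are our own choices, not the paper's):

```python
import math

T = 5 * math.pi
F = lambda t: math.sin(t) - (t - 5 * math.pi / 2) ** 2 / 50   # objective of Fig. 1
f = lambda t: math.cos(t) - (t - 5 * math.pi / 2) / 25        # its derivative

n = 20000
h = T / n
phi = [f(T - k * h) for k in range(n + 1)]                    # phi(s) = f(T - s)

def P(x):
    """Discrete Picard operator (left-endpoint quadrature):
    (Px)(s) = int_0^s [phi(v) - min(0, phi(v)) * 1{x(v) <= 0}] dv."""
    out = [0.0]
    for k in range(n):
        corr = -min(0.0, phi[k]) if x[k] <= 0.0 else 0.0
        out.append(out[-1] + h * (phi[k] + corr))
    return out

# Initial iterate x_0 = antiderivative of phi, then three Picard sweeps,
# matching the three iterations reported in the caption of Fig. 1.
x = [0.0]
for k in range(n):
    x.append(x[-1] + h * phi[k])
for _ in range(3):
    x = P(x)

# The largest (approximate) zero of x marks the maximizer, per Theorem 3.1.
t_star = T - h * max(k for k in range(n + 1) if x[k] <= 5e-3)
```

Up to discretization error, the third iterate already satisfies \(x(T)\approx F^*-F(0) = 1+\pi ^2/8\) and recovers the maximizer \(t^*=5\pi /2\).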

3 Main Results

Based on the notions introduced in the proof of Lemma 2.1, it is now possible to construct expressions for the solutions of (P), first for the smallest solution \(t^*\), then the largest solution \(t^{**}\), and finally for all solutions in between.

Theorem 3.1

The smallest solution of (P) is

$$\begin{aligned} t^* = T - \sup \{s\in [0,T]: x(s)=0\}. \end{aligned}$$

Proof

By Lemma 2.1 the adjoint variable \(x(s)\ge 0\) for all \(s\in [0,T]\), and \(x(0)=0\) by the initial condition in Eq. (2). The set \({{\mathcal {S}}}:{=}\{s\in [0,T]: x(s)=0\}\) is nonempty (because \(0\in {{\mathcal {S}}}\)), and its supremum, \(s^* :{=} \sup \,{{\mathcal {S}}}\), therefore exists and lies in the interval [0, T]. Depending on whether or not \({\mathcal {S}}\) is a singleton, we consider two cases.

Case 1: \({{\mathcal {S}}} = \{0\}\). Since \(x(s)>0\) for all \(s\in \,]0,T]\), by Eq. (2) we have

$$\begin{aligned} x(s) = \int _0^s f(T-\vartheta )\,\mathrm{d}\vartheta = \int _{T-s}^T f(\theta )\,\mathrm{d}\theta > 0, \quad s\in \,]0,T]. \end{aligned}$$

Thus, for any \(t\in [0,T[\), by setting \(s=T-t\), one obtains

$$\begin{aligned} F(t) = F(T) - \int _{t}^T f(\theta )\,\mathrm{d}\theta = F(T) - x(T-t) < F(T). \end{aligned}$$

Since \(s^*=0\), this implies that \(t^*= T - s^* = T\) solves (P).

Case 2: \({{\mathcal {S}}} \supsetneq \{0\}\). Let \(\hat{s}\in \,]0,T]\) be such that \(x(\hat{s})=0\). Thus, \(\hat{s}\in {{\mathcal {S}}}\) and \(s^*\ge \hat{s}>0\). By Eqs. (3) and (4) the difference

$$\begin{aligned} \Delta (s) = x(s)-z(s) = \int _{T-s}^T \max \{0,-f(\theta )\}{\mathbf 1}_{\{x(T-\theta )=0\}}\,\mathrm{d}\theta \end{aligned}$$

is nondecreasing in s. Now consider the optimal value of the global optimization problem (P) subject to the additional constraint that \(t\in [T-\hat{s},T]\), so

$$\begin{aligned} \hat{F}^*(\hat{s}):{=}\max _{t\in [T-\hat{s},T]} F(t) = \max _{t\in [T-\hat{s},T]} \left\{ F(T) - \int _t^T f(\theta )\,\mathrm{d}\theta \right\} \le F^*. \end{aligned}$$
(7)

Then, by virtue of Eq. (3) and the nonnegativity of x, we have

$$\begin{aligned} \hat{F}^*(\hat{s})= & {} \max _{t\in [T-\hat{s},T]} \left\{ F(T) - z(T-t)\right\} \nonumber \\= & {} F(T) + \max _{t\in [T-\hat{s},T]} \left\{ \Delta (T-t)-x(T-t)\right\} \nonumber \\\le & {} F(T) + \max _{t\in [T-\hat{s},T]} \left\{ \Delta (T-t)\right\} . \nonumber \end{aligned}$$

By the monotonicity of \(\Delta (s)\), alluded to earlier, the maximum on the right-hand side is achieved for \(t = T - \hat{s}\). Since by assumption \(x(\hat{s})=0\), it is \(\Delta (\hat{s})= x(\hat{s}) - z(\hat{s})=-z(\hat{s})\). Furthermore, by Eq. (3), \(-z(\hat{s}) = F(T-\hat{s}) - F(T)\), so that \(\hat{F}^*(\hat{s}) \le F(T-\hat{s})\). But the value on the right-hand side of the preceding inequality can be attained in the maximization of F over the interval \([T-\hat{s},T]\) in Eq. (7) by choosing \(t=T-\hat{s}\), which implies

$$\begin{aligned} \hat{F}^*(\hat{s}) = F(T-\hat{s}). \end{aligned}$$

Using again the monotonicity of \(\Delta (s)\), for any \(\hat{s}'\in {{\mathcal {S}}}\) with \(\hat{s}'\ge \hat{s}\), one obtains \(\hat{F}^*(\hat{s}')\ge \hat{F}^*(\hat{s})\), whence

$$\begin{aligned} \hat{F}^*(\hat{s})\le \sup _{s\in {{\mathcal {S}}}}\hat{F}^*(s) = \hat{F}^*(s^*) = F(T-s^*). \end{aligned}$$

We therefore know that

$$\begin{aligned} F(T-s^*) = \max _{t\in [T-s^*,T]} F(t), \end{aligned}$$
(8)

and \(x(s)>0\) for all \(s\in \,]s^*,T]\). Thus, \(\hat{{\mathcal {S}}}:{=} \{s\in [s^*,T]: x(s)=0\}\) is a singleton: \(\hat{{\mathcal {S}}} = \{s^*\}\). Analogous to Case 1, one can conclude that the maximum of F on the interval \([0,T-s^*]\) is attained at the upper end of the domain, so

$$\begin{aligned} F(T-s^*) = \max _{t\in [0,T-s^*]} F(t). \end{aligned}$$
(9)

Combining Eqs. (8) and (9), the solution to the global optimization problem (P) is therefore \(t^*=T-s^*\), and

$$\begin{aligned} F^* = F(T) + \Delta (T-t^*) = F(t^*), \end{aligned}$$

which completes the proof. \(\square \)

Remark 3.1

By substituting \(s=T-t\) in Theorem 3.1, the smallest solution to the global optimization problem (P) can also be written in the form

$$\begin{aligned} t^* = \inf \left\{ t\in [0,T] : x(T-t) = 0\right\} . \end{aligned}$$

Accordingly, the optimal value of (P) is

$$\begin{aligned} F^* = F(t^*) = F(T) + \int _{t^*}^T \max \{0,-f(\theta )\}{\mathbf 1}_{\{x(T-\theta )=0\}}\,\mathrm{d}\theta . \end{aligned}$$

In the foregoing derivations, the nonnegative adjoint variable \(x(T-t)\), defined as the solution to the initial-value problem (2), measures the possible cumulative improvement of a solution in the interval \([t,T]\) relative to the current value F(t). The smallest solution of (P) is the smallest \(t^*\) for which no improvement of the objective can be obtained on the interval \([t^*,T]\), so \(x(T-t^*)=0\) in particular. Alternatively, one can determine the largest solution \(t^{**}\) of (P) by measuring cumulative improvements over F(t) on the interval [0, t]. For this, consider the unique solution to the initial-value problem

$$\begin{aligned} \dot{y}(t) = \varPhi (-f(t),y(t)), \ \ \ y(0)=0, \end{aligned}$$
(10)

for \(t\in [0,T]\). Analogously to the iterative procedure for the solution of the initial-value problem (2) in Sect. 2, it is possible to obtain the (co-)adjoint variable y by successive approximation, \(\lim _{k\rightarrow \infty }\hat{\mathbf P}^k {\hat{\phi }} = y\), where the operator \(\hat{\mathbf P}:{{\mathcal {W}}}^{1,1}([0,T])\rightarrow {{\mathcal {W}}}^{1,1}([0,T])\) maps any absolutely continuous function y on [0, T] to an absolutely continuous function \(\hat{\mathbf P}y\), with

$$\begin{aligned} (\hat{\mathbf P} y)(t) :{=} \int _0^t {\hat{\varPhi }}(-f(\theta ),y(\theta ))\,\mathrm{d}\theta , \quad t\in [0,T], \end{aligned}$$
(11)

just as the operator \(\mathbf P\) in Eq. (6), and where the initial iterate is \({\hat{\phi }}(t):{=} -\int _0^t f(\theta )\,\mathrm{d}\theta = F(0) - F(t)\). As with Eq. (2’), corresponding to Eq. (2), there exists an equivalent formulation for the initial-value problem (10) for the computation of y,

$$\begin{aligned} \dot{y}(t) = {\hat{\varPhi }}(-f(t),y(t)) = -f(t) + f_+(t)\,{\mathbf 1}_{\{y(t)\le 0\}}, \quad y(0)=0, \end{aligned}$$
(10')

where \(f_+(t) :{=} \max \{0,f(t)\}\) for \(t\in [0,T]\).

Corollary 3.1

The largest solution of (P) is \(t^{**} = \sup \{t\in [0,T]:y(t)=0\}\).

Proof

For any \(s\in [0,T]\), let \(G(s):{=} F(T-s)\). Then, any solution to the global optimization problem

$$\begin{aligned} G^* = \max _{s\in [0,T]} G(s) \end{aligned}$$
(P')

is also a solution of (P). Moreover, by Theorem 3.1 the smallest solution \(s^*\) of (P’) is equal to T minus the largest solution \(t^{**}\) of (P). Mirroring the objective function from F to G also mirrors the corresponding derivatives from f to g, in the sense that

$$\begin{aligned} g(s):{=}\, \dot{G}(s) = -\dot{F}(T-s) = - f(T-s), \end{aligned}$$

for all \(s\in [0,T]\). A (unique) solution y to the initial-value problem (2), applied to the primitives of the mirrored global optimization problem (P’) (with the independent variable s suitably replaced by t), satisfies

$$\begin{aligned} \dot{y}(t)=\varPhi (g(T-t),y(t)) = \varPhi (-f(t),y(t)), \quad y(0)=0, \end{aligned}$$

for \(t\in [0,T]\). The latter corresponds to the initial-value problem (10). By Theorem 3.1, the smallest solution of (P’) is \(s^* = T - \sup \{t\in [0,T]:y(t) = 0\}\), so that the largest solution of (P) becomes

$$\begin{aligned} t^{**} = T - s^* = \sup \{t\in [0,T]:y(t)=0\}, \end{aligned}$$

which concludes the proof. \(\square \)
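Both representations are easy to verify numerically. The following sketch is our own (the projected-Euler discretization and the zero-detection tolerance are arbitrary choices); it uses \(F(t)=\sin t\) on \([0,3\pi ]\), for which the smallest and largest maximizers are \(\pi /2\) and \(5\pi /2\):

```python
import math

T = 3 * math.pi
f = math.cos                       # derivative of F(t) = sin(t)
n = 30000
h = T / n

# x from (2): dx/ds = Phi(f(T - s), x); y from (10): dy/dt = Phi(-f(t), y).
# Clipping the Euler step at zero reproduces Phi on nonnegative states.
x, y = [0.0], [0.0]
for k in range(n):
    x.append(max(0.0, x[-1] + h * f(T - k * h)))
    y.append(max(0.0, y[-1] - h * f(k * h)))

tol = 1e-3                          # zero-detection tolerance (our choice)
t_small = T - h * max(k for k in range(n + 1) if x[k] <= tol)  # Theorem 3.1
t_large = h * max(k for k in range(n + 1) if y[k] <= tol)      # Corollary 3.1
```

Here `t_small` is approximately \(\pi /2\) and `t_large` approximately \(5\pi /2\); since the suprema of the two zero sets add up to about \(5\pi > T\), Corollary 3.2 correctly reports that the maximizer is not unique.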

The two preceding results together yield a characterization of when the solution to the global optimization problem is unique.

Corollary 3.2

A solution of (P) is unique if and only if

$$\begin{aligned} \sup \{s\in [0,T]: x(s)=0\} + \sup \{t\in [0,T]: y(t) = 0\} = T. \end{aligned}$$

Proof

The result follows immediately by setting \(t^*=t^{**}\) in Theorem 3.1 and Corollary 3.1. \(\square \)

Intuitively, a solution \(t^*\) of (P) is unique if and only if, at \(t=t^*\), the length of the largest interval of zero cumulative improvement (of the objective function F) to the right of \(t^*\) and the length of the largest interval of zero cumulative improvement to the left of \(t^*\) add up to the length T of the domain [0, T].

Remark 3.2

Consider the (slightly) “generalized” global optimization problem

$$\begin{aligned} H^* = \max _{{\hat{t}}\in [a,b]} H({\hat{t}}), \end{aligned}$$
(P'')

featuring a continuously differentiable real-valued objective function H, defined on the interval \([a,b]\), where \(a,b\) are any given real numbers such that \(a<b\). While (P”) seems more general than (P), it can be reduced to the latter by maximizing \(F(t) :{=} H(a+t)\) on the interval [0, T] (for t) with \(T:{=} b-a\), just as in the original optimization problem (P). Any solution \(t^*\) of (P) directly corresponds to a solution \({\hat{t}}^*\) of (P”) via translation, \({\hat{t}}^* = a + t^*\).
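As a minimal numerical illustration of this reduction (our own sketch; the quadratic objective is an arbitrary choice), maximizing \(H({\hat{t}}) = -({\hat{t}}-3)^2\) on [2, 5] reduces to maximizing \(F(t)=H(a+t)\) on [0, 3], after which the solution is translated back by adding \(a=2\):

```python
import math

a, b = 2.0, 5.0
H = lambda th: -(th - 3.0) ** 2           # illustrative objective on [a, b]
T = b - a
F = lambda t: H(a + t)                    # shifted problem on [0, T]
fd = lambda t: -2.0 * (a + t - 3.0)       # derivative of F

n = 10000
h = T / n
x = [0.0]
for k in range(n):                        # projected Euler for (2)
    x.append(max(0.0, x[-1] + h * fd(T - k * h)))

t_star = T - h * max(k for k in range(n + 1) if x[k] <= 1e-6)
th_star = a + t_star                      # translate back to [a, b]
```

The translated solution `th_star` recovers the maximizer \({\hat{t}}^*=3\) of H up to discretization error.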

It is possible to generalize the representation of the solutions in Theorem 3.1 and Corollary 3.1 to cases where the global optimization problem has more than two solutions. Indeed, if (P) has any finite number of solutions, then all of them can be found recursively.

Corollary 3.3

If \({{\mathcal {P}}} = \{t_1,\ldots , t_N\}\subset [0,T]\) (with \(t^*=t_1<\cdots <t_N=t^{**}\)) is a complete set of \(N>2\) distinct solutions of (P), then all solutions (between the smallest and the largest) are

$$\begin{aligned} t_{k} = \check{T} - \sup \{s\in [0,\check{T}-t_{k-1}[\ : \check{x}(s)=0\}, \quad k\in \{2,\ldots ,N-1\}, \end{aligned}$$
(12)

where \(\check{x}\) is the unique solution of the initial-value problem (2) with T replaced by \(\check{T} :{=} t^{**}\).

Proof

Note first that necessarily the optimal value of (P) is such that \(F^*=F(t_k)\) for all \(k\in \{1,\ldots ,N\}\). Consider now any solution \(t_k\in \,]t^*,t^{**}[\) for \(k\in \{2,\ldots ,N-1\}\), obtained by the recursion in Eq. (12). Since \([0,\check{T}]\) is a subset of [0, T], the point \(t_k\) also solves the “generalized” global optimization problem (P”) on the interval \([a,b]=[t_k,\check{T}]\). Moreover, by Theorem 3.1:

$$\begin{aligned} t_k = \check{T} - \sup \{s\in [0,\check{T}-t_k]: \check{x}(s)=0\}. \end{aligned}$$

Since \(F^*=F(\check{T})\), there exists an \(\varepsilon \in \,]0,\check{T}-t_k[\) so that the right-sided improvement \(\check{x}(s)\) is strictly positive for all \(s\in \,]\check{T}-t_k-\varepsilon ,\check{T}-t_k[\). But this implies that

$$\begin{aligned} t_{k+1} = \check{T} - \sup \{s\in [0,\check{T}-t_k[\ : \check{x}(s)=0\}, \end{aligned}$$

which corresponds to the recursion in (12), thus concluding the proof. \(\square \)
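The recursion (12) can be sketched numerically as follows (our own illustration: the objective \(F(t)=\sin t\) on \([0,5\pi ]\), whose extreme maximizers \(t^*=\pi /2\) and \(t^{**}=9\pi /2\) we take as known, as well as the grid, tolerance, and the exclusion margin that discretizes the half-open interval, are all choices of ours):

```python
import math

t_first, t_last = math.pi / 2, 9 * math.pi / 2   # t* and t** for sin on [0, 5*pi]
T_check = t_last                                  # horizon is t** per Corollary 3.3
n = 30000
h = T_check / n

# Adjoint on [0, T_check]: projected Euler for (2) with T replaced by t**.
xc = [0.0]
for k in range(n):
    xc.append(max(0.0, xc[-1] + h * math.cos(T_check - k * h)))

tol, margin = 1e-3, 0.2   # zero tolerance; margin discretizes the open interval
sols = [t_first]
while sols[-1] < t_last - margin:
    limit = T_check - sols[-1] - margin           # upper end of [0, T - t_{k-1}[
    zeros = [k * h for k in range(n + 1) if k * h < limit and xc[k] <= tol]
    sols.append(T_check - max(zeros))
```

The loop recovers the interior solution \(5\pi /2\) and terminates once it reaches \(t^{**}=9\pi /2\).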

Note that the cardinality of the solution set \({\mathcal {P}}\) need not be finite. For instance, the objective function F, defined by \(F(t) :{=} 1-(t^2\sin (1/t))^2\) for \(t>0\), with \(F(0):{=} 1\), is continuously differentiable, and (for \(T\ge 1/\pi \)) the global optimization problem (P) has the countably infinite solution set \({{\mathcal {P}}} = \{0\}\cup \{t_1,t_2,\ldots \}\), where \(t_k = 1/(k\pi )\) for all \(k\ge 1\). But \({\mathcal {P}}\) need not even be countable: as an example, any constant objective function, \(F(t)\equiv c\in {{\mathbb {R}}}\), would produce the continuum \({{\mathcal {P}}} = [0,T]\) as solution set of (P), equal to the entire domain.

Remark 3.3

Given \(F^* = F(t^*)=F(t^{**})\), the solution set of (P), for any number of solutions, is \({{\mathcal {P}}} = \{t\in [t^*,t^{**}] : F(t)\ge F^*\}\), corresponding to the upper contour set of F relative to its globally optimal value \(F^*\) on [0, T].

By combining the interpretations of the two adjoint variables x and y as the right-sided and left-sided gains, respectively, it is possible to construct a necessary and sufficient optimality condition to decide whether a given point solves the global optimization problem. For this, we introduce the combined (or “two-sided”) adjoint variable \(\lambda (t):{=} \max \{x(T-t),y(t)\}\).

Theorem 3.2

A point \({\hat{t}}\in [0,T]\) is a solution of (P) if and only if

$$\begin{aligned} \lambda ({\hat{t}}) = 0. \end{aligned}$$
(13)

Accordingly, the solution set is \({{\mathcal {P}}} = \{t\in [0,T]: \lambda (t)=0\}\).

Proof

Consider the set \({{\mathcal {P}}}\) of solutions to (P), and let \(F^*\) be the optimal value of this global optimization problem.

(i) Necessity: If \({\hat{t}}\in {{\mathcal {P}}}\), then by Remark 3.3 no improvement is possible on the interval \([{\hat{t}},T]\), so \(x(T-{\hat{t}})=0\) necessarily. Similarly, no improvement is possible on the interval \([0,{\hat{t}}]\), which implies that \(y({\hat{t}})=0\). Together with the definition of \(\lambda \), this establishes Eq. (13) as a necessary optimality condition for any element of \({\mathcal {P}}\).

(ii) Sufficiency: Consider a point \({\hat{t}}\in [0,T]\) which satisfies \(\lambda ({\hat{t}})=0\). By Lemma 2.1, the adjoint variable x is nonnegative-valued, which—by symmetry—is also true for y. Hence, \(x(T-{\hat{t}})=y(\hat{t})=0\), so neither a right-sided (on \([{\hat{t}},T]\)) nor a left-sided (on \([0,{\hat{t}}]\)) strict improvement over \(F({\hat{t}})\) is possible, which implies that \(F({\hat{t}})=F^*\). Hence, \({\hat{t}}\) must be an element of \({{\mathcal {P}}}\).

Based on (i) and (ii), Eq. (13) characterizes any solution of (P), which implies the representation of the solution set \({\mathcal {P}}\) as the set of roots of \(\lambda (t)\), concluding the proof. \(\square \)

At any given point t, the combined adjoint variable \(\lambda (t)\) can be interpreted as the best gain over the current value F(t) available on the domain [0, T]. This implies the following invariance property.

Corollary 3.4

For any \(t\in [0,T]\): \(\lambda (t) + F(t) = F^*\).

Combining the last result with the initial conditions in Eqs. (2) and (10) yields expressions for the optimal value of (P) in terms of the adjoint variables evaluated at the interval horizon.

Corollary 3.5

\(x(T) = \lambda (0) = F^* - F(0)\) and \(y(T)=\lambda (T) = F^*- F(T)\).

The aforementioned properties of the adjoint variables reveal an inherent complementarity, in the sense that the nonnegative one-sided adjoint variables x and y can only vanish together at a global optimum. In addition, because of the normalization to zero at either interval end, the sum of the one-sided adjoint variables at the boundaries must be equal to the optimal increment of the objective function: \(x(T)+y(0) = F^* - F(0)\) and \(x(0)+y(T)=F^* - F(T)\).
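These complementarity and invariance properties are easy to probe numerically. The sketch below is our own (the sine objective, grid, zero tolerance, and cluster gap are illustrative choices); it computes \(\lambda \) on a grid for \(F(t)=\sin t\) on \([0,5\pi ]\), which has the three maximizers \(\pi /2\), \(5\pi /2\), and \(9\pi /2\):

```python
import math

T = 5 * math.pi
F, f = math.sin, math.cos
n = 30000
h = T / n

x, y = [0.0], [0.0]                      # projected Euler for (2) and (10)
for k in range(n):
    x.append(max(0.0, x[-1] + h * f(T - k * h)))
    y.append(max(0.0, y[-1] - h * f(k * h)))

# Combined adjoint lambda(t) = max{x(T - t), y(t)} on the grid.
lam = [max(x[n - j], y[j]) for j in range(n + 1)]

# Theorem 3.2: solutions are the roots of lambda; cluster nearby grid roots.
tol = 1e-3
roots, last = [], -1.0
for j in range(n + 1):
    if lam[j] <= tol:
        t = j * h
        if t - last > 0.5:               # new cluster (gap is a heuristic choice)
            roots.append(t)
        last = t

# Corollary 3.4: lambda(t) + F(t) is constant and equal to F* = 1.
drift = max(abs(lam[j] + F(j * h) - 1.0) for j in range(n + 1))
```

The three root clusters reproduce the solution set of (P), and the invariance \(\lambda (t)+F(t)=F^*\) holds on the whole grid up to discretization error.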

Remark 3.4

In the global optimality condition (13), one could replace \(\lambda \) by any nontrivial convex combination of x and y (e.g., by \(\hat{\lambda } :{=} (x + y)/2\)), and Corollary 3.5 would continue to hold. However, as the upper envelope of all convex combinations of x and y, the combined adjoint variable \(\lambda (t)=F^* - F(t)\) enjoys particular significance in terms of its interpretation as the available global gain relative to the value F(t) at any point \(t\in [0,T]\), as stated in Corollary 3.4.

4 Applications

The following examples illustrate the notions and results developed earlier.

Example 4.1

(Multiple Solutions) Consider a \(2\pi \)-periodic objective function of the form \(F(t):{=} \sin (t)\) on the interval [0, T] for \(T=(2N-1)\pi \), where \(N\ge 1\) is a given integer. Equation (2) yields the cumulative improvement of \(F(T-s)\) over the interval \([T-s,T]\),

$$\begin{aligned} x(s) = (1-\sin (s)){\mathbf 1}_{\{s\ge \pi /2\}}, \ \ \ s\in [0,T]. \end{aligned}$$

By symmetry of the objective function with respect to the midpoint \(T/2\) of the domain, the cumulative improvement of F(t) over the interval [0, t], i.e., the solution to Eq. (10), is

$$\begin{aligned} y(t) = (1-\sin (t)){\mathbf 1}_{\{t\ge \pi /2\}}, \ \ \ t\in [0,T]. \end{aligned}$$

Thus, by Theorem 3.1 and Corollary 3.1 one obtains the smallest and the largest solution of (P), respectively: \(t^*=T - \sup \{s\in [0,T]:\sin (s)=1\} =\pi /2\) and \(t^{**} = \sup \{t\in [0,T]:\sin (t)=1\} = (4N-3)(\pi /2)\). By Corollary 3.2, the solution of (P) is unique if and only if \(N=1\), since then \(t^*=t^{**}\). For \(N\ge 2\), there are exactly N different solutions: \(t_1=t^*\) and \(t_N=t^{**}\), as well as \(t_k = (4k-3)(\pi /2)\) for \(k\in \{2,\ldots ,N-1\}\), as provided by Corollary 3.3.

Example 4.2

(Monopoly Pricing) A single-product monopolist faces heterogeneous consumers whose highest willingness-to-pay (WTP) for its good is normalized to \(T=1\), without loss of generality. Given a continuous probability density function \(h:[0,1]\rightarrow {{\mathbb {R}}}_+\) describing the distribution of consumers’ WTP, the aggregate demand for the product at the price t is

$$\begin{aligned} D(t) = \int _t^1 h(\theta )\,\mathrm{d}\theta , \quad t\in [0,1]. \end{aligned}$$

Thus, assuming (for simplicity) zero marginal cost, the monopolist’s optimal pricing problem becomes

$$\begin{aligned} \max _{t\in [0,1]} \left\{ t D(t)\right\} , \end{aligned}$$

which is of the form (P) for \(F(t) = t D(t)\) and \(f(t) = D(t) - t\,h(t)\). Fermat’s necessary optimality condition (1) yields that at any positive optimal price \(t^*\in \,]0,1[\), the monopolist would set the marginal revenue f to zero, so \(D(t^*) = t^* h(t^*)\). For a multimodal distribution h, there can be many prices that satisfy this optimality condition. Figure 2 depicts the situation for a bimodal beta-mixture \(h(t) = \gamma p_{\alpha _1,\beta _1}(t) + (1-\gamma ) p_{\alpha _2,\beta _2}(t)\), where \(\gamma \in [0,1]\) and \(p_{\alpha ,\beta }(t) :{=} t^{\alpha -1}(1-t)^{\beta -1}/B(\alpha ,\beta )\) for any \(\alpha ,\beta >0\), with \(B(\alpha ,\beta )\) the Euler beta function. In order to derive a necessary and sufficient optimality condition, we use Eqs. (2’) and (10’) to compute the adjoint variables x and y. Given any price \(t\in [0,1]\), it is best for the monopolist to increase the price if and only if the adjoint variable \(x(1-t)>0\). And it is best for the monopolist to decrease the price if and only if the (co-)adjoint variable \(y(t)>0\). Hence, as stated in Theorem 3.2 the price \(t=t^*\) is globally optimal if and only if \(\lambda (t^*) = \max \{x(1-t^*),y(t^*)\}=0\); see Fig. 2. Furthermore, following Corollaries 3.4 and 3.5, the combined adjoint variable \(\lambda (t)\), at any price \(t\in [0,1]\), is equal to the distance of the profit F(t) to its optimal value \(F^*\).

Fig. 2

Objective function F(t) and cumulative one-sided gains x(t), y(t) in Example 4.2
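The pricing example can be reproduced numerically. The sketch below is our own: the mixture weight and beta parameters are illustrative stand-ins (not the values used for Fig. 2), and the grid size and tolerance are arbitrary.

```python
import math

def beta_pdf(t, a, b):
    """Beta(a, b) density via the gamma function, B(a,b) = G(a)G(b)/G(a+b)."""
    return t ** (a - 1) * (1 - t) ** (b - 1) * math.gamma(a + b) / (
        math.gamma(a) * math.gamma(b))

n = 20000
h = 1.0 / n
ts = [j * h for j in range(n + 1)]
# Bimodal WTP density (illustrative parameters of our own choosing).
dens = [0.5 * beta_pdf(t, 2, 8) + 0.5 * beta_pdf(t, 8, 2) for t in ts]

# Demand D(t) = int_t^1 h(theta) dtheta via a right-to-left trapezoid sum.
D = [0.0] * (n + 1)
for j in range(n - 1, -1, -1):
    D[j] = D[j + 1] + h * (dens[j] + dens[j + 1]) / 2.0

Fv = [ts[j] * D[j] for j in range(n + 1)]            # profit F(t) = t D(t)
fv = [D[j] - ts[j] * dens[j] for j in range(n + 1)]  # marginal revenue f(t)

x, y = [0.0], [0.0]                                  # projected Euler, (2') and (10')
for k in range(n):
    x.append(max(0.0, x[-1] + h * fv[n - k]))        # drift f(1 - s)
    y.append(max(0.0, y[-1] - h * fv[k]))            # drift -f(t)

tol = 1e-4
t_opt = 1.0 - h * max(k for k in range(n + 1) if x[k] <= tol)
j_opt = round(t_opt / h)
lam_opt = max(x[n - j_opt], y[j_opt])
```

Whatever the local maxima of the profit function, the price recovered from the largest zero of x attains the globally optimal profit, and the combined adjoint \(\lambda \) vanishes there, in line with Theorem 3.2.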

Example 4.3

(Optimal Stopping) Suppose that at any time t, a decision maker has the option to either stick with a given utility stream u(t) or to make an irreversible switch to an alternative utility stream v(t), where both u and v are defined for all times \(t\in [0,T]\). In addition, \(t=0\) denotes the present and \(t=T>0\) the relevant time horizon. By considering the utility increment of the default utility stream over the alternative utility stream,

$$\begin{aligned} \delta (t) :{=} u(t) - v(t), \quad t\in [0,T], \end{aligned}$$

the decision maker’s optimal stopping problem can be written in the form

$$\begin{aligned} \max _{t\in [0,T]} \left\{ \int _0^t \mathrm{e}^{-r\theta } u(\theta )\,\mathrm{d}\theta + \int _t^T {\hbox {e}}^{-r\theta } v(\theta )\,\mathrm{d}\theta \right\} = V_0 + \max _{t\in [0,T]} F(t), \end{aligned}$$

where \(r\ge 0\) is a given discount rate, \(V_0:{=} \int _0^T {\hbox {e}}^{-r\theta } v(\theta )\,\mathrm{d}\theta \) is a constant, and

$$\begin{aligned} F(t) :{=} \int _0^t \mathrm{e}^{-r\theta }\delta (\theta )\,\mathrm{d}\theta , \quad t\in [0,T], \end{aligned}$$

is the relevant objective function in the global optimization problem (P). Since \(F(0)=0\), the optimal utility increment \(F^*\) over the discounted utility \(V_0\) of selecting the outside option immediately must be nonnegative. For all s in the interval [0, T], Eq. (2) with \(f(T-s)={\hbox {e}}^{-r (T-s)} \delta (T-s)\) yields the incremental utility of following the optimal stopping strategy on the interval \([T-s,T]\), expressed by the adjoint variable x(s). Moreover, the best stopping strategy, upon arrival at time t (possibly suboptimally, by having stuck with the default option), is to stop if and only if \(x(T-t)=0\). Hence, the earliest time \(t^*\) at which stopping is optimal solves the global optimization problem, and \(t^* = \inf \{t\in [0,T]:x(T-t)=0\}\), as already noted in Remark 3.1.
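A numerical sketch of this stopping problem (ours; the utility streams, discount rate, and horizon are illustrative choices): with \(u(t)=1\) and \(v(t)=t\) on [0, 2], the increment \(\delta (t)=1-t\) makes stopping optimal exactly at \(t=1\).

```python
import math

T, r = 2.0, 0.05
delta = lambda t: 1.0 - t                  # u(t) - v(t), illustrative streams
f = lambda t: math.exp(-r * t) * delta(t)  # derivative of F

n = 20000
h = T / n
x = [0.0]
for k in range(n):                         # projected Euler for (2)
    x.append(max(0.0, x[-1] + h * f(T - k * h)))

tol = 1e-4
t_stop = T - h * max(k for k in range(n + 1) if x[k] <= tol)

# Cross-check against a direct grid maximization of
# F(t) = int_0^t e^{-r a} delta(a) da.
Fv, acc = [0.0], 0.0
for k in range(n):
    acc += h * f(k * h)
    Fv.append(acc)
t_grid = h * max(range(n + 1), key=lambda k: Fv[k])
```

Both routes recover the stopping time \(t^*=1\) up to discretization error.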

Remark 4.1

The foregoing example shows that a (deterministic) optimal stopping problem can be written in the form (P). The converse also holds: (P) can be interpreted as an optimal stopping problem, given the utility increment \(f(t) \equiv \dot{F}(t)\) and a zero discount rate. Theorem 3.1 addresses this interpretation. By switching the reference point, in the sense that

$$\begin{aligned} \max _{t\in [0,T]} \left\{ \int _0^t {\hbox {e}}^{-r\theta } u(\theta )\,\mathrm{d}\theta + \int _t^T {\hbox {e}}^{-r\theta } v(\theta )\,\mathrm{d}\theta \right\} = U_0 + \max _{t\in [0,T]} \hat{F}(t), \end{aligned}$$

where \(U_0:{=} \int _0^T \mathrm{e}^{-r\theta } u(\theta )\,\mathrm{d}\theta \) is a constant, the modified objective function

$$\begin{aligned} \hat{F}(t) :{=} -\int _t^T \mathrm{e}^{-r\theta }\delta (\theta )\,\mathrm{d}\theta , \quad t\in [0,T], \end{aligned}$$

is a translation of the original objective function: \(\hat{F}(t)\equiv F(t) + (V_0-U_0)\), since \(F(t)-\hat{F}(t)\equiv \int _0^T \mathrm{e}^{-r\theta }\delta (\theta )\,\mathrm{d}\theta = U_0-V_0\). Hence, one can think of (P) as an optimal starting problem. Corollary 3.1 and the cumulative left-sided benefit y(t) in Eq. (10) highlight this interpretation.
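The reference-point switch can be checked numerically: \(\hat{F}\) and F differ by the constant \(V_0-U_0\), so \(V_0+\max F = U_0+\max \hat{F}\). The primitives u, v, r, T below are again illustrative choices, not taken from the text.

```python
import math

# Verify F_hat(t) - F(t) = V0 - U0 pointwise on a grid.
T, N = 1.0, 1000
dt = T / N
r = 0.1
u = lambda th: 1.0            # hypothetical default-option utility
v = lambda th: th             # hypothetical outside-option utility

du = [math.exp(-r * k * dt) * u(k * dt) * dt for k in range(N)]  # e^{-r th} u dth
dv = [math.exp(-r * k * dt) * v(k * dt) * dt for k in range(N)]  # e^{-r th} v dth
U0, V0 = sum(du), sum(dv)

F = [0.0]                      # F(t) = int_0^t e^{-r th} (u - v) dth
for k in range(N):
    F.append(F[-1] + du[k] - dv[k])

F_hat = [0.0] * (N + 1)        # F_hat(t) = -int_t^T e^{-r th} (u - v) dth
tail = 0.0
for k in range(N - 1, -1, -1):
    tail += du[k] - dv[k]
    F_hat[k] = -tail
```

Up to floating-point rounding, \(\hat{F}(t)-F(t)\) is constant on the whole grid and both sides of the maximization identity coincide.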

5 Perspectives

The representation of solutions to the global optimization problem (P) in Sect. 3 suggests several global optimality conditions and a dynamic-systems interpretation.

5.1 Global Optimality Conditions

Consider the solution x to the initial-value problem (2) and the solution y to the initial-value problem (10). The significance of the adjoint variables x and y as cumulative one-sided gains in the objective value yields several global optimality conditions, culminating in an exact characterization of the solutions to (P).

(i) A necessary optimality condition for any solution \(t^*\) of the global optimization problem (P) is that \(x(T-t^*)=0\) (resp., \(y(t^*)=0\)).

(ii) The condition \(x(T-{\hat{t}})=0\) at a given point \({\hat{t}}\in [0,T]\) is sufficient for the existence of a solution to (P) in \([0,{\hat{t}}]\) (resp., if \(y({\hat{t}})=0\), then (P) has a solution on \([{\hat{t}},T]\)).

(iii) For a local maximum \({\hat{t}}\) that is not a solution of (P), the condition \(x(T-{\hat{t}})=0\) holds if and only if \({\hat{t}}\) globally maximizes F on \([{\hat{t}},T]\) (resp., \(y({\hat{t}})=0\) holds if and only if \({\hat{t}}\) globally maximizes F on \([0,{\hat{t}}]\)).

(iv) By Theorem 3.1 (resp., Corollary 3.1), the smallest (resp., largest) solution to (P) is \(t^* = T - \sup \{s\in [0,T]:x(s)=0\}\) (resp., \(t^{**} = \sup \{t\in [0,T]: y(t) = 0\}\)). Additional solutions can be found using Corollary 3.3, as well as Remark 3.3.

(v) By Theorem 3.2, a point \({\hat{t}}\) solves (P) if and only if \(\lambda ({\hat{t}})=0\), where \(\lambda (t) \equiv \max \{x(T-t),y(t)\}\) is the “combined” adjoint variable. This condition, which can be checked pointwise, effectively supersedes Fermat’s local necessary optimality condition (1). Furthermore, by Corollary 3.4 one obtains \(\lambda (t) \equiv F^*-F(t)\). Applied to the interval boundaries, this invariance property implies that the distance to the optimal value is attained by the appropriate one-sided adjoint variable at each endpoint; see Corollary 3.5 for details.

Statements (i)–(v) also apply to points and solutions at the boundaries of the interval [0, T], i.e., they are not limited to interior points, unlike standard (local) first-order optimality conditions such as (1). In particular, statement (v) provides a crisp representation of the solution set: \({{\mathcal {P}}} = \{t\in [0,T]: \lambda (t) = 0\}\).
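The pointwise test in statement (v) is easy to exercise numerically. The sketch below uses an illustrative objective with two global maximizers (not an example from the text): it forms the one-sided gains \(x(T-t)\) and y(t) as running maxima of F from the right and from the left, combines them into \(\lambda (t)=\max \{x(T-t),y(t)\}\), and recovers the solution set \({{\mathcal {P}}}=\{t:\lambda (t)=0\}\).

```python
import math

# Illustrative objective with two global maximizers, at t = 1/4 and t = 3/4.
N = 1000
grid = [k / N for k in range(N + 1)]                 # [0, T] with T = 1
F = [math.sin(2 * math.pi * t) ** 2 for t in grid]

# one-sided gains: x(T - t) = max_{tau >= t} F(tau) - F(t),
#                  y(t)     = max_{tau <= t} F(tau) - F(t)
suffix = F[:]
for k in range(N - 1, -1, -1):
    suffix[k] = max(suffix[k], suffix[k + 1])
prefix = F[:]
for k in range(1, N + 1):
    prefix[k] = max(prefix[k], prefix[k - 1])

x_right = [s - f for s, f in zip(suffix, F)]
y_left = [p - f for p, f in zip(prefix, F)]

# combined adjoint lambda(t) = max{x(T-t), y(t)}, which equals F* - F(t)
lam = [max(a, b) for a, b in zip(x_right, y_left)]
F_star = max(F)
solutions = [t for t, l in zip(grid, lam) if l <= 1e-12]
```

On this grid the invariance \(\lambda (t)=F^*-F(t)\) holds to rounding error, and the recovered solution set is \(\{1/4,\,3/4\}\).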

Remark 5.1

As noted after Theorem 2.1, in practice the adjoint variable x representing the right-sided gain can be efficiently computed by repeatedly applying the operator \({\mathbf P}\) in Eq. (6) a (usually small) number of times to \(\phi \), where \(\phi (s) = \int _0^s f(T-\varsigma )\,\mathrm{d}\varsigma = G(0) - G(s)\) for all \(s\in [0,T]\), as illustrated in Fig. 1. That is, \(x = \lim _{k\rightarrow \infty } {\mathbf P}^k \phi \).Footnote 7 Similarly, the adjoint variable y representing the left-sided gain can be obtained using the operator \(\hat{\mathbf P}\) in Eq. (11), so \(\lim _{k\rightarrow \infty } \hat{\mathbf P}^k{\hat{\phi }} = y\), where \({\hat{\phi }}(t) = -\int _0^t f(\theta )\,\mathrm{d}\theta = F(0)-F(t)\), for all \(t\in [0,T]\).Footnote 8
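The operator \({\mathbf P}\) itself is defined in Eq. (6), outside this excerpt. As a cross-check of the limits stated in the remark, the sketch below evaluates the adjoint variables directly from \(\phi \) and \({\hat{\phi }}\), using our reading of x and y as cumulative one-sided gains: \(x(s)=\phi (s)-\min _{0\le u\le s}\phi (u)\) and \(y(t)={\hat{\phi }}(t)-\min _{0\le \theta \le t}{\hat{\phi }}(\theta )\). The integrand f is an illustrative choice (so that \(F(t)=t-t^3\), with unique maximizer \(1/\sqrt{3}\)), not one of the paper's examples.

```python
# One-sided gains from phi and phi_hat via running minima; f is illustrative.
N = 1000
T = 1.0
ds = T / N
f = lambda t: 1.0 - 3.0 * t * t      # F(t) = t - t^3, argmax at 1/sqrt(3)

# phi(s) = int_0^s f(T - sigma) dsigma,  phi_hat(t) = -int_0^t f(theta) dtheta
phi, phi_hat = [0.0], [0.0]
for k in range(N):
    phi.append(phi[-1] + f(T - k * ds) * ds)
    phi_hat.append(phi_hat[-1] - f(k * ds) * ds)

def gains(p):
    # g[k] = p[k] - min_{j <= k} p[j]  (cumulative one-sided gain)
    run_min, g = float("inf"), []
    for val in p:
        run_min = min(run_min, val)
        g.append(val - run_min)
    return g

x = gains(phi)        # right-sided gain x(s)
y = gains(phi_hat)    # left-sided gain y(t)

# smallest and largest solutions of (P), per Theorem 3.1 and Corollary 3.1
t_star = T - max(k * ds for k in range(N + 1) if x[k] <= 1e-12)
t_star_star = max(k * ds for k in range(N + 1) if y[k] <= 1e-12)
```

Both recovered solutions agree with the analytical maximizer \(1/\sqrt{3}\approx 0.577\) up to the grid resolution, and both adjoint variables are nonnegative, as they must be.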

5.2 Dynamic-Systems Interpretation

The equivalence of global optimization on an interval and optimal stopping (see Remark 4.1) suggests a dynamic-systems interpretation of the solution method proposed in Sect. 3. By introducing the state variable \(\xi (t)\) and the adjoint variable (“co-state”) \(\psi (t) \equiv x(T-t)\), the solution of (P), given in Theorem 3.1, satisfies the following two-point boundary-value problem for \(t\in [0,T]\):

$$\begin{aligned} {\dot{\xi }}(t)&= \mu (\psi (t)), \quad \xi (0)=0, \end{aligned}$$
(14)
$$\begin{aligned} {\dot{\psi }}(t)&= -\varPhi (f(t),\psi (t)), \quad \psi (T) = 0, \end{aligned}$$
(15)

where the function \(\mu :{{\mathbb {R}}}\rightarrow {{\mathbb {R}}}\) in Eq. (14) implements the (optimal) stopping policy using a co-state feedback: \(\mu (\hat{\psi }) :{=} {\mathbf 1}_{\{\hat{\psi } \le 0\}}\), for all \({\hat{\psi }}\in {{\mathbb {R}}}\), so the state remains at zero as long as the co-state signals a positive right-sided gain. The state \(\xi (t)\) partitions the domain [0, T] into a continuation region \([0,t^*]\) (where \(\xi (t)=0\)) and a stopping region \((t^*,T]\) (where \(\xi (t)>0\)). The co-state \(\psi (t)\), independently determined by Eq. (15), is nonnegative and provides global information about possible improvements by continuing a search for the optimum to the right of the current t. Given the solution \((\xi ,\psi )(t)\) of Eqs. (14)–(15) for \(t\in [0,T]\), the current value \(\nu (t)\) solves the initial-value problem

$$\begin{aligned} \dot{\nu }(t) = {\mathbf 1}_{\{\xi (t)\le 0\}}\,f(t), \quad \nu (0) = F(0), \end{aligned}$$

for \(t\in [0,T]\), so that

$$\begin{aligned} \nu (t) = \left\{ \begin{array}{ll} F(t), &{}\quad \text{ if } \, t\le t^*,\\ F^*, &{}\quad \text{ otherwise }, \end{array}\right. \end{aligned}$$

where \(F^* = \nu (T)\) is the optimal value of (P) and \(t^*\) is the (smallest) solution of (P); see Fig. 3 for an illustration using the primitives of Example 4.2. This formalizes the heuristic that it is globally optimal to walk the ‘mountain range’ defined by F(t), starting at \(t=0\), toward the right, until the view toward the right becomes unimpeded. The global information about the function values not yet experienced during the walk is contributed by the co-state variable \(\psi \). Alternatively, one can start walking at \(t=T\) toward the left, which leads to an analogous solution, as formulated in Corollary 3.1.

While the results by themselves do not offer a ‘magic potion’ for finding a solution to a global optimization problem without checking the entire interval, they shed light on the importance of global information, in contrast to the local optimality conditions, such as (1), usually employed to identify candidates for interior local optima. The two-point boundary-value problem (14)–(15) is reminiscent of the Hamiltonian system that leads to a similar two-point boundary-value problem as part of the Pontryagin maximum principle [19]; see also [20].Footnote 9

As Bellman’s principle of optimality ([21], Ch. III.3) would suggest, the adjoint variable in fact provides a solution to an entire family of nested optimization problems. It thus gives a “complete contingent plan,” in the sense that if for some reason a global optimum \(t^*\) was missed when walking from left to right, then for any \(t\in \,]t^*,T[\) the adjoint variable still provides an optimal stopping rule on the interval \([t,T]\).

Fig. 3

State \(\xi (t)\), co-state \(\psi (t)\), and current value \(\nu (t)\) in Example 4.2
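The ‘mountain range’ walk can be imitated numerically: precompute the co-state as the remaining right-sided gain, let the current value track F while the co-state is positive, and freeze it once the co-state vanishes. The sketch below uses illustrative primitives (\(F(t)=\sin ^2(2\pi t)\), so \(f(t)=2\pi \sin (4\pi t)\)), not those of Example 4.2, and evaluates the co-state directly rather than by integrating Eq. (15), whose ingredient \(\varPhi \) is defined outside this excerpt.

```python
import math

# Discretized walk from t = 0 to the right; primitives are illustrative.
T, N = 1.0, 2000
dt = T / N
f = lambda t: 2 * math.pi * math.sin(4 * math.pi * t)   # F(t) = sin^2(2 pi t)

grid = [k * dt for k in range(N + 1)]
F = [math.sin(2 * math.pi * t) ** 2 for t in grid]

# co-state psi(t) = x(T - t): remaining right-sided gain, computed directly
psi = F[:]
for k in range(N - 1, -1, -1):
    psi[k] = max(psi[k], psi[k + 1])
psi = [m - fv for m, fv in zip(psi, F)]

# accumulate f until psi vanishes (view unimpeded), then freeze nu at F*
nu, stopped, t_stop = [F[0]], False, None
for k in range(N):
    if not stopped and psi[k] <= 1e-12:
        stopped, t_stop = True, grid[k]
    nu.append(nu[-1] + (0.0 if stopped else f(grid[k]) * dt))
```

The walk freezes at the first global maximizer (\(t^*=1/4\) here, even though a second global maximizer exists at 3/4), and the terminal current value \(\nu (T)\) approximates \(F^*\).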

6 Conclusions

Keeping track of one-sided improvements on an interval [0, T], in the form of the adjoint variables \(x(T-t)\) and y(t) for \(t\in [0,T]\), allows for a characterization of all solutions to the global optimization problem (P). The two-sided adjoint variable \(\lambda (t) = \max \{x(T-t),y(t)\}\), as the upper envelope of the two one-sided adjoint variables, vanishes at a point \(\hat{t}\) of the interval if and only if that point is a solution of (P), so \(\hat{t}\in {\mathcal P}\). The adjoint variables are uniquely determined as solutions to the initial-value problems (2) and (10), and they can be obtained using a Picard iteration that usually terminates in a finite number of steps.

Conceptually, the adjoint variables incorporate all the global information needed for solving not only (P) but also its subproblems: A one-sided adjoint variable, say y(t), describes a (‘stopping’) policy for optimizing over the subinterval between the current point t and the corresponding endpoint of the interval ([0, t] for the left-sided adjoint variable y); \(y(t)=0\) if and only if t globally maximizes F on [0, t]. Finally, an analytical description of all solutions to the global optimization problem (P) may be used to check solution properties, such as monotonicity in problem parameters, which may or may not be satisfied at points implied by merely local optimality conditions such as Eq. (1).