1 Introduction

The generic nonconcavity of maximization problems typically leads to multiple local optima. Standard optimality conditions are local in nature, and techniques for global optimization are usually algorithmic, restricting the search for the best solution to subsets of the domain. For the simple case where the domain is an interval, a global maximizer of a continuously differentiable function can be found using techniques from dynamic systems, notably by introducing global information in the form of an adjoint variable. In this manner, we construct expressions for the solutions to a global optimization problem on an interval, which are directly related to dynamic interpretations in terms of optimal stopping and optimal starting. In addition to providing a full characterization of the solutions to a global optimization problem over an interval, the adjoint variable can also be used locally to formulate necessary and sufficient optimality conditions for one-sided subproblems of the original global optimization problem.

1.1 Literature

Following [1], global optimization methods use either deterministic search algorithms (e.g., via gradient methods) or random-sampling procedures. The first type of algorithm consists of schemes for systematic search updates. The Bolzano search finds critical points of a concave objective function via bisection (see, e.g., [2], p. 122). The golden-section search by [3] for unimodal functions increases the efficiency of the bisection method by varying the subdivision using Fibonacci numbers; see also [4]. Algorithms based on steepest ascent, such as Newton’s method (see, e.g., [7], Ch. 9.5), tend to be greedy and therefore converge to local extrema. Improvements are achieved by using (deterministic) sampling techniques capitalizing on available knowledge about the variation of the function in terms of its Lipschitz constant [8]. The latter can be refined by locally estimating the Lipschitz constant [9], using a quadratic bound [10], or by employing a higher-order approach, e.g., considering additionally the Lipschitz constant for the variation of the gradient [11]. An overview of the second type of algorithms, based on random sampling, is provided by [12], Ch. 4. An alternative Bayesian approach, which models the objective function probabilistically as a stochastic process, was proposed by [13]. All of these algorithms amount to numerical techniques, predicated on the assumption that the objective function is expensive to evaluate or nonsmooth, so that direct analytical calculations are out of reach. Breaking with this premise, our goal is to provide insights about the kind of information needed to compute solutions to a global optimization problem, as well as about their properties, rather than to improve on the numerical side.

We assume that the underlying objective function is continuously differentiable, and then reduce the solution of the global optimization problem to solving an “adjoint” differential equation. In the spirit of [14], this differential equation performs the somewhat unexpected task of aggregating global information about the available one-sided improvements. Since the adjoint equation has a discontinuous right-hand side, existence and uniqueness of the solution are obtained separately via successive Picard iterations (see, e.g., [15], p. 213), without relying on (here unavailable) Lipschitz constants.

1.2 Outline

The remainder of this paper is organized as follows. Section 2 introduces notation and basic concepts, most notably an auxiliary (adjoint) variable which represents the optimal improvement up to the interval horizon. Section 3 provides expressions for the solutions of a one-dimensional global optimization problem as well as necessary and sufficient global optimality conditions. Section 4 contains several examples to illustrate the results. It also clarifies the equivalence of global optimization with optimal stopping (or starting) problems. Section 5 discusses global optimality conditions and the relationship of the proposed methods to the analysis of optimal control problems. Section 6 concludes.

2 Preliminaries

For any given \(T>0\), consider the global optimization problem

$$\begin{aligned} F^* = \max _{t\in [0,T]} F(t), \end{aligned}$$
(P)

where \(F:[0,T]\rightarrow {{\mathbb {R}}}\) is a differentiable real-valued objective function with continuous derivative \(f:[0,T]\rightarrow {{\mathbb {R}}}\). By the Weierstrass theorem (see, e.g., [16], p. 540), problem (P) has a solution, i.e., its solution set \({{\mathcal {P}}}\subseteq [0,T]\) is nonempty, and the optimal value \(F^*\) is finite. Furthermore, it is well known that any (interior) optimizer \({\hat{t}}\in \,]0,T[\) (i.e., excluding the boundary points 0 and T) satisfies the Fermat condition,

$$\begin{aligned} f({\hat{t}}) = 0, \end{aligned}$$
(1)

but that there may be many points \({{\hat{t}}}\) that do not solve (P) but still satisfy \(f({{\hat{t}}})=0\). For example, if F is constant at a value \({\bar{F}}<F^*\) on a subinterval, then there is a continuum of such points. We are interested in characterizing the solution(s) to the global optimization problem, as element(s) of [0, T], including the boundaries. For this, we introduce an auxiliary function, also referred to as “adjoint variable,” \(x:[0,T]\rightarrow {{\mathbb {R}}}\), as the unique solution to the initial-value problem

$$\begin{aligned} \dot{x}(s) = \varPhi (f(T-s),x(s)), \quad x(0)=0, \end{aligned}$$
(2)

for \(s\in [0,T]\), where for any \(({\hat{t}},{\hat{x}})\in {{\mathbb {R}}}^2\):

$$\begin{aligned} \varPhi ({\hat{t}},{\hat{x}}):{=} \left\{ \begin{array}{ll} {\hat{t}}, &{}\quad \text{ if } \; {\hat{x}}>0,\\ \max \{0,{\hat{t}}\}, &{}\quad \text{ if } \; {\hat{x}}=0,\\ 0, &{}\quad \text{ otherwise }. \end{array}\right. \end{aligned}$$

The right-hand side of the differential equation in (2) is discontinuous and generally does not satisfy the Carathéodory conditions (see, e.g., [17], p. 3). Before we establish existence and uniqueness of a solution to the initial-value problem in the space \({{\mathcal {W}}}^{1,1}([0,T])\) of absolutely continuous functions on [0, T] (see Theorem 2.1 below), we provide a useful lower bound.

Lemma 2.1

For any \(s\in [0,T]\): \(x(s)\ge \max \{0,F(T) - F(T-s)\}\).

Proof

The adjoint variable x(s) cannot become negative, since Eq. (2) implies that \(\dot{x}\ge 0\) at the boundary of positivity, i.e., whenever \(x=0\). Thus, \(x(s)\ge 0\) for all \(s\in [0,T]\). We now show that \(x(s)\ge F(T)-F(T-s)\). For this, note that the solution to the initial-value problem

$$\begin{aligned} \dot{z}(s) = f(T-s), \quad z(0) = 0, \end{aligned}$$

for \(s\in [0,T]\), is of the form

$$\begin{aligned} z(s) = \int _{T-s}^T f(\theta )\,\mathrm{d}\theta = F(T) - F(T-s). \end{aligned}$$
(3)

Consider the difference \(\Delta :{=} x - z\). Then \(\Delta (0)=0\) and, using the fact that \(x(s)\ge 0\), we have

$$\begin{aligned} \dot{\Delta }(s) = -\min \{0,f(T-s)\}{\mathbf 1}_{\{x(s)=0\}} = \max \{0,-f(T-s)\}{\mathbf 1}_{\{x(s)=0\}}\ge 0. \end{aligned}$$

Thus,

$$\begin{aligned} \Delta (s) = \int _{T-s}^T \max \{0,-f(\theta )\}{\mathbf 1}_{\{x(T-\theta )=0\}}\,\mathrm{d}\theta \ge 0, \quad s\in [0,T], \end{aligned}$$
(4)

which implies that \(x(s)\ge z(s)\) for all \(s\in [0,T]\). This proves the claim. \(\square \)

As explained in the next section, the adjoint variable x(s) measures the optimal improvement of the objective value \(F(T-s)\) on the interval \({[T-s,T]}\). Because the comparison set includes the current value of the objective function, the improvement must be nonnegative and has to exceed the difference \({F(T)-F(T-s)}\), at least weakly.
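For illustration, the initial-value problem (2) can be integrated by a simple projected Euler scheme (a numerical sketch of our own, not part of the paper's formal development): since \(\varPhi \) suppresses negative drift exactly when x reaches zero, clipping each Euler step at zero reproduces the dynamics. The test objective \(F(t)=\sin t\) on \([0,3\pi ]\), as well as the grid size, are arbitrary choices of ours.

```python
import math

def adjoint_x(f, T, n=20000):
    """Projected Euler scheme for x'(s) = Phi(f(T - s), x(s)), x(0) = 0.

    Clipping each Euler step at zero mimics Phi: while x > 0 the step uses
    the plain drift f(T - s); once x would cross below zero, the state is
    held at zero until the drift turns nonnegative again.
    """
    h = T / n
    s = [k * h for k in range(n + 1)]
    x = [0.0] * (n + 1)
    for k in range(n):
        x[k + 1] = max(0.0, x[k] + h * f(T - s[k]))
    return s, x

# Illustrative objective (our choice): F(t) = sin(t) on [0, 3*pi], f = cos.
T = 3 * math.pi
s, x = adjoint_x(math.cos, T)

# Slack in the bound of Lemma 2.1: x(s) - max{0, F(T) - F(T - s)}.
slack = min(x[k] - max(0.0, math.sin(T) - math.sin(T - s[k]))
            for k in range(len(s)))
```

Up to discretization error, the computed trajectory stays nonnegative and dominates \(F(T)-F(T-s)\), as Lemma 2.1 requires.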

By Lemma 2.1 any solution x to Eq. (2), if it exists, cannot have negative values on [0, T]. Moreover, for any \(({\hat{t}},{\hat{x}})\in {{\mathbb {R}}}^2\):

$$\begin{aligned} {\hat{x}}\ge 0 \ \Rightarrow \ \varPhi ({\hat{t}},{\hat{x}}) = {\hat{t}}\,{\mathbf 1}_{\{\hat{x}>0\}} + \max \{0,{\hat{t}}\}\,{\mathbf 1}_{\{\hat{x}=0\}} = {\hat{t}} - \min \{0,{\hat{t}}\}\,{\mathbf 1}_{\{\hat{x}\le 0\}} =:{\hat{\varPhi }}({\hat{t}},\hat{x}). \end{aligned}$$

Thus, if we set \(\varphi (s):{=} f(T-s)\) and \(\varphi _-(s):{=} \min \{0,\varphi (s)\}\) for all \(s\in [0,T]\), then based on the preceding implication, the initial-value problem in Eq. (2) can be rewritten in the form

$$\begin{aligned} \dot{x}(s) = {\hat{\varPhi }}(\varphi (s),x(s)) = \varphi (s) - \varphi _-(s)\,{\mathbf 1}_{\{x(s)\le 0\}}, \ \ \ x(0)=0, \end{aligned}$$
(2')

without affecting its set \({{\mathcal {R}}}\subset {{\mathcal {W}}}^{1,1}([0,T])\) of solutions. The Sobolev space \({{\mathcal {W}}}^{1,1}([0,T])\) contains all absolutely continuous real-valued functions x defined on the domain [0, T] and equipped with the norm \(\Vert \cdot \Vert _{1,1}\), where

$$\begin{aligned} \Vert x\Vert _{1,1} = \int _0^T \left( |x(s)| + |\dot{x}(s)|\right) \,\mathrm{d}s. \end{aligned}$$
(5)

The vector space \({{\mathcal {W}}}^{1,1}([0,T])\) is a Banach space, i.e., a complete normed vector space: any Cauchy sequence in \({{\mathcal {W}}}^{1,1}([0,T])\) converges (in the \(\Vert \cdot \Vert _{1,1}\)-norm) to an element of the space. The solution set of the initial-value problem (2’) is

$$\begin{aligned} {{\mathcal {R}}} :{=} \{x\in {{\mathcal {W}}}^{1,1}([0,T]) : {\mathbf P}x = x \}, \end{aligned}$$

where the operator \({\mathbf P}:{{\mathcal {W}}}^{1,1}([0,T])\rightarrow {{\mathcal {W}}}^{1,1}([0,T])\) maps any absolutely continuous function x on [0, T] to a function \({\mathbf P}x\), with

$$\begin{aligned} ({\mathbf P} x)(s) :{=} \int _0^s {\hat{\varPhi }}(\varphi (\varsigma ),x(\varsigma ))\,\mathrm{d}\varsigma , \quad s\in [0,T], \end{aligned}$$
(6)

which (as can be verified) is also an element of \({{\mathcal {W}}}^{1,1}([0,T])\). The following result provides existence and uniqueness of a solution to the initial-value problems (2) and (2’).

Theorem 2.1

\({{\mathcal {R}}}=\{x\}\), i.e., there exists a unique solution \(x\in {{\mathcal {W}}}^{1,1}([0,T])\) to the initial-value problem (2), and \({\mathbf P}x = x\).

As becomes clear in the proof of the last result (provided in the Appendix), repeated application of the operator \(\mathbf P\) to \(\phi \), where \(\phi (s) :{=} \int _0^s \varphi (\varsigma )\,\mathrm{d}\varsigma \) for all \(s\in [0,T]\), converges to the unique solution of Eq. (2). That is, when considering the sequence \(\sigma :{=} (x_k)_{k=0}^\infty \), with the initial function \(x_0 = \phi \) and the Picard iteration \(x_{k+1} = {\mathbf P}x_k\) for \(k\ge 0\), then \(x_k \rightarrow x\in {{\mathcal {R}}}\) as \(k\rightarrow \infty \). In practice, the convergence of the sequence \(\sigma \) to the adjoint variable \({x = \lim _{k\rightarrow \infty } {\mathbf P}^k\phi }\) is usually very fast and takes place within a few iterations; see Fig. 1 for an example.

Fig. 1

Computation of x in 3 iterations, for \(F(t) = \sin (t) - (t-(5\pi /2))^2/50\) on \([0,5\pi ]\)
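The computation behind Fig. 1 can be reproduced by a discretized version of this Picard iteration (a sketch of ours; the grid size, left-endpoint quadrature rule, and zero-detection tolerance are our own choices, not the paper's):

```python
import math

T = 5 * math.pi
F = lambda t: math.sin(t) - (t - 5 * math.pi / 2) ** 2 / 50   # objective of Fig. 1
f = lambda t: math.cos(t) - (t - 5 * math.pi / 2) / 25        # its derivative

n = 20000
h = T / n
phi = [f(T - k * h) for k in range(n + 1)]                    # phi(s) = f(T - s)

def P(x):
    """Discrete Picard operator (left-endpoint quadrature):
    (Px)(s) = int_0^s [phi(v) - min(0, phi(v)) * 1{x(v) <= 0}] dv."""
    out = [0.0]
    for k in range(n):
        corr = -min(0.0, phi[k]) if x[k] <= 0.0 else 0.0
        out.append(out[-1] + h * (phi[k] + corr))
    return out

# Initial iterate x_0 = antiderivative of phi, then three Picard sweeps,
# matching the three iterations reported in the caption of Fig. 1.
x = [0.0]
for k in range(n):
    x.append(x[-1] + h * phi[k])
for _ in range(3):
    x = P(x)

# The largest (approximate) zero of x marks the maximizer, per Theorem 3.1.
t_star = T - h * max(k for k in range(n + 1) if x[k] <= 5e-3)
```

Up to discretization error, the third iterate already satisfies \(x(T)\approx F^*-F(0) = 1+\pi ^2/8\) and recovers the maximizer \(t^*=5\pi /2\).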

3 Main Results

Based on the notions introduced in the proof of Lemma 2.1, it is now possible to construct expressions for the solutions of (P), first for the smallest solution \(t^*\), then the largest solution \(t^{**}\), and finally for all solutions in between.

Theorem 3.1

The smallest solution of (P) is

$$\begin{aligned} t^* = T - \sup \{s\in [0,T]: x(s)=0\}. \end{aligned}$$

Proof

By Lemma 2.1 the adjoint variable \(x(s)\ge 0\) for all \(s\in [0,T]\), and \(x(0)=0\) by the initial condition in Eq. (2). The set \({{\mathcal {S}}}:{=}\{s\in [0,T]: x(s)=0\}\) is nonempty (because \(0\in {{\mathcal {S}}}\)), and its supremum, \(s^* :{=} \sup \,{{\mathcal {S}}}\), therefore exists and lies in the interval [0, T]. Depending on whether or not \({\mathcal {S}}\) is a singleton, we consider two cases.

Case 1: \({{\mathcal {S}}} = \{0\}\). Since \(x(s)>0\) for all \(s\in \,]0,T]\), by Eq. (2) we have

$$\begin{aligned} x(s) = \int _0^s f(T-\vartheta )\,\mathrm{d}\vartheta = \int _{T-s}^T f(\theta )\,\mathrm{d}\theta > 0, \quad s\in \,]0,T]. \end{aligned}$$

Thus, for any \(t\in [0,T[\), by setting \(s=T-t\), one obtains

$$\begin{aligned} F(t) = F(T) - \int _{t}^T f(\theta )\,\mathrm{d}\theta = F(T) - x(T-t) < F(T). \end{aligned}$$

Since \(s^*=0\), this implies that \(t^*= T - s^* = T\) solves (P).

Case 2: \({{\mathcal {S}}} \supsetneq \{0\}\). Let \(\hat{s}\in \,]0,T]\) be such that \(x(\hat{s})=0\). Thus, \(\hat{s}\in {{\mathcal {S}}}\) and \(s^*\ge \hat{s}>0\). By Eqs. (3) and (4) the difference

$$\begin{aligned} \Delta (s) = x(s)-z(s) = \int _{T-s}^T \max \{0,-f(\theta )\}{\mathbf 1}_{\{x(T-\theta )=0\}}\,\mathrm{d}\theta \end{aligned}$$

is nondecreasing in s. Now consider the optimal value of the global optimization problem (P) subject to the additional constraint that \(t\in [T-\hat{s},T]\), so

$$\begin{aligned} \hat{F}^*(\hat{s}):{=}\max _{t\in [T-\hat{s},T]} F(t) = \max _{t\in [T-\hat{s},T]} \left\{ F(T) - \int _t^T f(\theta )\,\mathrm{d}\theta \right\} \le F^*. \end{aligned}$$
(7)

Then, by virtue of Eq. (3) and the nonnegativity of x, we have

$$\begin{aligned} \hat{F}^*(\hat{s})= & {} \max _{t\in [T-\hat{s},T]} \left\{ F(T) - z(T-t)\right\} \nonumber \\= & {} F(T) + \max _{t\in [T-\hat{s},T]} \left\{ \Delta (T-t)-x(T-t)\right\} \nonumber \\\le & {} F(T) + \max _{t\in [T-\hat{s},T]} \left\{ \Delta (T-t)\right\} . \nonumber \end{aligned}$$

By the monotonicity of \(\Delta (s)\), alluded to earlier, the maximum on the right-hand side is achieved for \(t = T - \hat{s}\). Since by assumption \(x(\hat{s})=0\), it is \(\Delta (\hat{s})= x(\hat{s}) - z(\hat{s})=-z(\hat{s})\). Furthermore, by Eq. (3), \(-z(\hat{s}) = F(T-\hat{s}) - F(T)\), so that \(\hat{F}^*(\hat{s}) \le F(T-\hat{s})\). But the value on the right-hand side of the preceding inequality can be attained in the maximization of F over the interval \([T-\hat{s},T]\) in Eq. (7) by choosing \(t=T-\hat{s}\), which implies

$$\begin{aligned} \hat{F}^*(\hat{s}) = F(T-\hat{s}). \end{aligned}$$

Using again the monotonicity of \(\Delta (s)\), for any \(\hat{s}'\in {{\mathcal {S}}}\) with \(\hat{s}'\ge \hat{s}\), one obtains \(\hat{F}^*(\hat{s}')\ge \hat{F}^*(\hat{s})\), whence

$$\begin{aligned} \hat{F}^*(\hat{s})\le \sup _{s\in {{\mathcal {S}}}}\hat{F}^*(s) = \hat{F}^*(s^*) = F(T-s^*). \end{aligned}$$

We therefore know that

$$\begin{aligned} F(T-s^*) = \max _{t\in [T-s^*,T]} F(t), \end{aligned}$$
(8)

and \(x(s)>0\) for all \(s\in \,]s^*,T]\). Thus, \(\hat{{\mathcal {S}}}:{=} \{s\in [s^*,T]: x(s)=0\}\) is a singleton: \(\hat{{\mathcal {S}}} = \{s^*\}\). Analogous to Case 1, one can conclude that the maximum of F on the interval \([0,T-s^*]\) is attained at the upper end of the domain, so

$$\begin{aligned} F(T-s^*) = \max _{t\in [0,T-s^*]} F(t). \end{aligned}$$
(9)

Combining Eqs. (8) and (9), the solution to the global optimization problem (P) is therefore \(t^*=T-s^*\), and

$$\begin{aligned} F^* = F(T) + \Delta (T-t^*) = F(t^*), \end{aligned}$$

which completes the proof. \(\square \)

Remark 3.1

By substituting \(s=T-t\) in Theorem 3.1, the smallest solution to the global optimization problem (P) can also be written in the form

$$\begin{aligned} t^* = \inf \left\{ t\in [0,T] : x(T-t) = 0\right\} . \end{aligned}$$

Accordingly, the optimal value of (P) is

$$\begin{aligned} F^* = F(t^*) = F(T) + \int _{t^*}^T \max \{0,-f(\theta )\}{\mathbf 1}_{\{x(T-\theta )=0\}}\,\mathrm{d}\theta . \end{aligned}$$

In the foregoing derivations, the nonnegative adjoint variable \(x(T-t)\), defined as the solution to the initial-value problem (2), measures the possible cumulative improvement of a solution in the interval \([t,T]\) relative to the current value F(t). The smallest solution of (P) is the smallest \(t^*\) for which no improvement of the objective can be obtained on the interval \([t^*,T]\), so \(x(T-t^*)=0\) in particular. Alternatively, one can determine the largest solution \(t^{**}\) of (P) by measuring cumulative improvements over F(t) on the interval [0, t]. For this, consider the unique solution to the initial-value problem

$$\begin{aligned} \dot{y}(t) = \varPhi (-f(t),y(t)), \ \ \ y(0)=0, \end{aligned}$$
(10)

for \(t\in [0,T]\). Analogously to the iterative procedure for the solution of the initial-value problem (2) in Sect. 2, it is possible to obtain the (co-)adjoint variable y by successive approximation, \(\lim _{k\rightarrow \infty }\hat{\mathbf P}^k {\hat{\phi }} = y\), where the operator \(\hat{\mathbf P}:{{\mathcal {W}}}^{1,1}([0,T])\rightarrow {{\mathcal {W}}}^{1,1}([0,T])\) maps any absolutely continuous function y on [0, T] to an absolutely continuous function \(\hat{\mathbf P}y\), with

$$\begin{aligned} (\hat{\mathbf P} y)(t) :{=} \int _0^t {\hat{\varPhi }}(-f(\theta ),y(\theta ))\,\mathrm{d}\theta , \quad t\in [0,T], \end{aligned}$$
(11)

just as the operator \(\mathbf P\) in Eq. (6), and where the initial iterate is \({\hat{\phi }}(t):{=} -\int _0^t f(\theta )\,\mathrm{d}\theta = F(0) - F(t)\). As with Eq. (2’), corresponding to Eq. (2), there exists an equivalent formulation for the initial-value problem (10) for the computation of y,

$$\begin{aligned} \dot{y}(t) = {\hat{\varPhi }}(-f(t),y(t)) = -f(t) + f_+(t)\,{\mathbf 1}_{\{y(t)\le 0\}}, \quad y(0)=0, \end{aligned}$$
(10')

where \(f_+(t) :{=} \max \{0,f(t)\}\) for \(t\in [0,T]\).

Corollary 3.1

The largest solution of (P) is \(t^{**} = \sup \{t\in [0,T]:y(t)=0\}\).

Proof

For any \(s\in [0,T]\), let \(G(s):{=} F(T-s)\). Then, any solution to the global optimization problem

$$\begin{aligned} G^* = \max _{s\in [0,T]} G(s) \end{aligned}$$
(P')

is also a solution of (P). Moreover, by Theorem 3.1 the smallest solution \(s^*\) of (P’) is equal to T minus the largest solution \(t^{**}\) of (P). Mirroring the objective function from F to G also mirrors the corresponding derivatives from f to g, in the sense that

$$\begin{aligned} g(s):{=}\, \dot{G}(s) = -\dot{F}(T-s) = - f(T-s), \end{aligned}$$

for all \(s\in [0,T]\). A (unique) solution y to the initial-value problem (2), applied to the primitives of the mirrored global optimization problem (P’) (with the independent variable s suitably replaced by t), satisfies

$$\begin{aligned} \dot{y}(t)=\varPhi (g(T-t),y(t)) = \varPhi (-f(t),y(t)), \quad y(0)=0, \end{aligned}$$

for \(t\in [0,T]\). The latter corresponds to the initial-value problem (10). By Theorem 3.1, the smallest solution of (P’) is \(s^* = T - \sup \{t\in [0,T]:y(t) = 0\}\), so that the largest solution of (P) becomes

$$\begin{aligned} t^{**} = T - s^* = \sup \{t\in [0,T]:y(t)=0\}, \end{aligned}$$

which concludes the proof. \(\square \)
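Both representations are easy to verify numerically. The following sketch is our own (the projected-Euler discretization and the zero-detection tolerance are arbitrary choices); it uses \(F(t)=\sin t\) on \([0,3\pi ]\), for which the smallest and largest maximizers are \(\pi /2\) and \(5\pi /2\):

```python
import math

T = 3 * math.pi
f = math.cos                       # derivative of F(t) = sin(t)
n = 30000
h = T / n

# x from (2): dx/ds = Phi(f(T - s), x); y from (10): dy/dt = Phi(-f(t), y).
# Clipping the Euler step at zero reproduces Phi on nonnegative states.
x, y = [0.0], [0.0]
for k in range(n):
    x.append(max(0.0, x[-1] + h * f(T - k * h)))
    y.append(max(0.0, y[-1] - h * f(k * h)))

tol = 1e-3                          # zero-detection tolerance (our choice)
t_small = T - h * max(k for k in range(n + 1) if x[k] <= tol)  # Theorem 3.1
t_large = h * max(k for k in range(n + 1) if y[k] <= tol)      # Corollary 3.1
```

Here `t_small` is approximately \(\pi /2\) and `t_large` approximately \(5\pi /2\); since the suprema of the two zero sets add up to about \(5\pi > T\), Corollary 3.2 correctly reports that the maximizer is not unique.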

The two preceding results together yield a characterization of when the solution to the global optimization problem is unique.

Corollary 3.2

A solution of (P) is unique if and only if

$$\begin{aligned} \sup \{s\in [0,T]: x(s)=0\} + \sup \{t\in [0,T]: y(t) = 0\} = T. \end{aligned}$$

Proof

The result follows immediately by setting \(t^*=t^{**}\) in Theorem 3.1 and Corollary 3.1. \(\square \)

Intuitively, a solution \(t^*\) of (P) is unique if and only if, at \(t=t^*\), the length of the largest interval of zero cumulative improvement (of the objective function F) to the right of \(t^*\) and the length of the largest interval of zero cumulative improvement to the left of \(t^*\) add up to the length T of the domain [0, T].

Remark 3.2

Consider the (slightly) “generalized” global optimization problem

$$\begin{aligned} H^* = \max _{{\hat{t}}\in [a,b]} H({\hat{t}}), \end{aligned}$$
(P'')

featuring a continuously differentiable real-valued objective function H, defined on the interval \([a,b]\), where \(a,b\) are any given real numbers such that \(a<b\). While (P”) seems more general than (P), it can be reduced to the latter by maximizing \(F(t) :{=} H(a+t)\) on the interval [0, T] (for t) with \(T:{=} b-a\), just as in the original optimization problem (P). Any solution \(t^*\) of (P) directly corresponds to a solution \({\hat{t}}^*\) of (P”) via translation, \({\hat{t}}^* = a + t^*\).
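As a minimal numerical illustration of this reduction (our own sketch; the quadratic objective is an arbitrary choice), maximizing \(H({\hat{t}}) = -({\hat{t}}-3)^2\) on [2, 5] reduces to maximizing \(F(t)=H(a+t)\) on [0, 3], after which the solution is translated back by adding \(a=2\):

```python
import math

a, b = 2.0, 5.0
H = lambda th: -(th - 3.0) ** 2           # illustrative objective on [a, b]
T = b - a
F = lambda t: H(a + t)                    # shifted problem on [0, T]
fd = lambda t: -2.0 * (a + t - 3.0)       # derivative of F

n = 10000
h = T / n
x = [0.0]
for k in range(n):                        # projected Euler for (2)
    x.append(max(0.0, x[-1] + h * fd(T - k * h)))

t_star = T - h * max(k for k in range(n + 1) if x[k] <= 1e-6)
th_star = a + t_star                      # translate back to [a, b]
```

The translated solution `th_star` recovers the maximizer \({\hat{t}}^*=3\) of H up to discretization error.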

It is possible to generalize the representation of the solutions in Theorem 3.1 and Corollary 3.1 to cases where the global optimization problem has more than two solutions. Indeed, if (P) has any finite number of solutions, then all of them can be found recursively.

Corollary 3.3

If \({{\mathcal {P}}} = \{t_1,\ldots , t_N\}\subset [0,T]\) (with \(t^*=t_1<\cdots <t_N=t^{**}\)) is a complete set of \(N>2\) distinct solutions of (P), then all solutions (between the smallest and the largest) are

$$\begin{aligned} t_{k} = \check{T} - \sup \{s\in [0,\check{T}-t_{k-1}[\ : \check{x}(s)=0\}, \quad k\in \{2,\ldots ,N-1\}, \end{aligned}$$
(12)

where \(\check{x}\) is the unique solution of the initial-value problem (2) with T replaced by \(\check{T} :{=} t^{**}\).

Proof

Note first that necessarily the optimal value of (P) is such that \(F^*=F(t_k)\) for all \(k\in \{1,\ldots ,N\}\). Consider now any solution \(t_k\in \,]t^*,t^{**}[\) for \(k\in \{2,\ldots ,N-1\}\), obtained by the recursion in Eq. (12). Since \([0,\check{T}]\) is a subset of [0, T], the point \(t_k\) also solves the “generalized” global optimization problem (P”) on the interval \([a,b]=[t_k,\check{T}]\). Moreover, by Theorem 3.1:

$$\begin{aligned} t_k = \check{T} - \sup \{s\in [0,\check{T}-t_k]: \check{x}(s)=0\}. \end{aligned}$$

Since \(F^*=F(\check{T})\), there exists an \(\varepsilon \in \,]0,\check{T}-t_k[\) so that the right-sided improvement \(\check{x}(s)\) is strictly positive for all \(s\in \,]\check{T}-t_k-\varepsilon ,\check{T}-t_k[\). But this implies that

$$\begin{aligned} t_{k+1} = \check{T} - \sup \{s\in [0,\check{T}-t_k[\ : \check{x}(s)=0\}, \end{aligned}$$

which corresponds to the recursion in (12), thus concluding the proof. \(\square \)
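The recursion (12) can be sketched numerically as follows (our own illustration: the objective \(F(t)=\sin t\) on \([0,5\pi ]\), whose extreme maximizers \(t^*=\pi /2\) and \(t^{**}=9\pi /2\) we take as known, as well as the grid, tolerance, and the exclusion margin that discretizes the half-open interval, are all choices of ours):

```python
import math

t_first, t_last = math.pi / 2, 9 * math.pi / 2   # t* and t** for sin on [0, 5*pi]
T_check = t_last                                  # horizon is t** per Corollary 3.3
n = 30000
h = T_check / n

# Adjoint on [0, T_check]: projected Euler for (2) with T replaced by t**.
xc = [0.0]
for k in range(n):
    xc.append(max(0.0, xc[-1] + h * math.cos(T_check - k * h)))

tol, margin = 1e-3, 0.2   # zero tolerance; margin discretizes the open interval
sols = [t_first]
while sols[-1] < t_last - margin:
    limit = T_check - sols[-1] - margin           # upper end of [0, T - t_{k-1}[
    zeros = [k * h for k in range(n + 1) if k * h < limit and xc[k] <= tol]
    sols.append(T_check - max(zeros))
```

The loop recovers the interior solution \(5\pi /2\) and terminates once it reaches \(t^{**}=9\pi /2\).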

Note that the cardinality of the solution set \({\mathcal {P}}\) need not be finite. For instance, the objective function F, defined by \(F(t) :{=} 1-(t^2\sin (1/t))^2\) for \(t>0\), with \(F(0):{=} 1\), is continuously differentiable, and (for \(T\ge 1/\pi \)) the global optimization problem (P) has the countably infinite solution set \({{\mathcal {P}}} = \{0\}\cup \{t_1,t_2,\ldots \}\), where \(t_k = 1/(k\pi )\) for all \(k\ge 1\). But \({\mathcal {P}}\) need not even be countable: as an example, any constant objective function, \(F(t)\equiv c\in {{\mathbb {R}}}\), would produce the continuum \({{\mathcal {P}}} = [0,T]\) as solution set of (P), equal to the entire domain.

Remark 3.3

Given \(F^* = F(t^*)=F(t^{**})\), the solution set of (P), for any number of solutions, is \({{\mathcal {P}}} = \{t\in [t^*,t^{**}] : F(t)\ge F^*\}\), corresponding to the upper contour set of F relative to its globally optimal value \(F^*\) on [0, T].

By combining the interpretations of the two adjoint variables x and y as the right-sided and left-sided gains, respectively, it is possible to construct a necessary and sufficient optimality condition to decide whether a given point solves the global optimization problem. For this, we introduce the combined (or “two-sided”) adjoint variable \(\lambda (t):{=} \max \{x(T-t),y(t)\}\).

Theorem 3.2

A point \({\hat{t}}\in [0,T]\) is a solution of (P) if and only if

$$\begin{aligned} \lambda ({\hat{t}}) = 0. \end{aligned}$$
(13)

Accordingly, the solution set is \({{\mathcal {P}}} = \{t\in [0,T]: \lambda (t)=0\}\).

Proof

Consider the set \({{\mathcal {P}}}\) of solutions to (P), and let \(F^*\) be the optimal value of this global optimization problem.

(i) Necessity: If \({\hat{t}}\in {{\mathcal {P}}}\), then by Remark 3.3 no improvement is possible on the interval \([{\hat{t}},T]\), so \(x(T-{\hat{t}})=0\) necessarily. Similarly, no improvement is possible on the interval \([0,{\hat{t}}]\), which implies that \(y({\hat{t}})=0\). Together with the definition of \(\lambda \), this establishes Eq. (13) as a necessary optimality condition for any element of \({\mathcal {P}}\).

(ii) Sufficiency: Consider a point \({\hat{t}}\in [0,T]\) which satisfies \(\lambda ({\hat{t}})=0\). By Lemma 2.1, the adjoint variable x is nonnegative-valued, which—by symmetry—is also true for y. Hence, \(x(T-{\hat{t}})=y(\hat{t})=0\), so neither a right-sided (on \([{\hat{t}},T]\)) nor a left-sided (on \([0,{\hat{t}}]\)) strict improvement over \(F({\hat{t}})\) is possible, which implies that \(F({\hat{t}})=F^*\). Hence, \({\hat{t}}\) must be an element of \({{\mathcal {P}}}\).

Based on (i) and (ii), Eq. (13) characterizes any solution of (P), which implies the representation of the solution set \({\mathcal {P}}\) as the set of roots of \(\lambda (t)\), concluding the proof. \(\square \)

At any given point t, the combined adjoint variable \(\lambda (t)\) can be interpreted as the best gain over the current value F(t) available on the domain [0, T]. This implies the following invariance property.

Corollary 3.4

For any \(t\in [0,T]\): \(\lambda (t) + F(t) = F^*\).

Combining the last result with the initial conditions in Eqs. (2) and (10) yields expressions for the optimal value of (P) in terms of the adjoint variables evaluated at the interval horizon.

Corollary 3.5

\(x(T) = \lambda (0) = F^* - F(0)\) and \(y(T)=\lambda (T) = F^*- F(T)\).

The aforementioned properties of the adjoint variables reveal an inherent complementarity, in the sense that the nonnegative one-sided adjoint variables x and y can only vanish together at a global optimum. In addition, because of the normalization to zero at either interval end, the sum of the one-sided adjoint variables at the boundaries must be equal to the optimal increment of the objective function: \(x(T)+y(0) = F^* - F(0)\) and \(x(0)+y(T)=F^* - F(T)\).
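These complementarity and invariance properties are easy to probe numerically. The sketch below is our own (the sine objective, grid, zero tolerance, and cluster gap are illustrative choices); it computes \(\lambda \) on a grid for \(F(t)=\sin t\) on \([0,5\pi ]\), which has the three maximizers \(\pi /2\), \(5\pi /2\), and \(9\pi /2\):

```python
import math

T = 5 * math.pi
F, f = math.sin, math.cos
n = 30000
h = T / n

x, y = [0.0], [0.0]                      # projected Euler for (2) and (10)
for k in range(n):
    x.append(max(0.0, x[-1] + h * f(T - k * h)))
    y.append(max(0.0, y[-1] - h * f(k * h)))

# Combined adjoint lambda(t) = max{x(T - t), y(t)} on the grid.
lam = [max(x[n - j], y[j]) for j in range(n + 1)]

# Theorem 3.2: solutions are the roots of lambda; cluster nearby grid roots.
tol = 1e-3
roots, last = [], -1.0
for j in range(n + 1):
    if lam[j] <= tol:
        t = j * h
        if t - last > 0.5:               # new cluster (gap is a heuristic choice)
            roots.append(t)
        last = t

# Corollary 3.4: lambda(t) + F(t) is constant and equal to F* = 1.
drift = max(abs(lam[j] + F(j * h) - 1.0) for j in range(n + 1))
```

The three root clusters reproduce the solution set of (P), and the invariance \(\lambda (t)+F(t)=F^*\) holds on the whole grid up to discretization error.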

Remark 3.4

In the global optimality condition (13), one could replace \(\lambda \) by any nontrivial convex combination of x and y (e.g., by \(\hat{\lambda } :{=} (x + y)/2\)), and Corollary 3.5 would continue to hold. However, as the upper envelope of all convex combinations of x and y, the combined adjoint variable \(\lambda (t)=F^* - F(t)\) enjoys particular significance in terms of its interpretation as the available global gain relative to the value F(t) at any point \(t\in [0,T]\), as stated in Corollary 3.4.

4 Applications

The following examples illustrate the notions and results developed earlier.

Example 4.1

(Multiple Solutions) Consider a \(2\pi \)-periodic objective function of the form \(F(t):{=} \sin (t)\) on the interval [0, T] for \(T=(2N-1)\pi \), where \(N\ge 1\) is a given integer. Equation (2) yields the cumulative improvement of \(F(T-s)\) over the interval \([T-s,T]\),

$$\begin{aligned} x(s) = (1-\sin (s)){\mathbf 1}_{\{s\ge \pi /2\}}, \ \ \ s\in [0,T]. \end{aligned}$$

By symmetry of the objective function with respect to the midpoint \(T/2\) of the domain, the cumulative improvement of F(t) over the interval [0, t], i.e., the solution to Eq. (10), is

$$\begin{aligned} y(t) = (1-\sin (t)){\mathbf 1}_{\{t\ge \pi /2\}}, \ \ \ t\in [0,T]. \end{aligned}$$

Thus, by Theorem 3.1 and Corollary 3.1 one obtains the smallest and the largest solution of (P), respectively: \(t^*=T - \sup \{s\in [0,T]:\sin (s)=1\} =\pi /2\) and \(t^{**} = \sup \{t\in [0,T]:\sin (t)=1\} = (4N-3)(\pi /2)\). By Corollary 3.2, the solution of (P) is unique if and only if \(N=1\), since then \(t^*=t^{**}\). For \(N\ge 2\), there are exactly N different solutions: \(t_1=t^*\) and \(t_N=t^{**}\), as well as \(t_k = (4k-3)(\pi /2)\) for \(k\in \{2,\ldots ,N-1\}\), as provided by Corollary 3.3.

Example 4.2

(Monopoly Pricing) A single-product monopolist faces heterogeneous consumers whose highest willingness-to-pay (WTP) for its good is normalized to \(T=1\), without loss of generality. Given a continuous probability density function \(h:[0,1]\rightarrow {{\mathbb {R}}}_+\) describing the distribution of consumers’ WTP, the aggregate demand for the product at the price t is

$$\begin{aligned} D(t) = \int _t^1 h(\theta )\,\mathrm{d}\theta , \quad t\in [0,1]. \end{aligned}$$

Thus, assuming (for simplicity) zero marginal cost, the monopolist’s optimal pricing problem becomes

$$\begin{aligned} \max _{t\in [0,1]} \left\{ t D(t)\right\} , \end{aligned}$$

which is of the form (P) for \(F(t) = t D(t)\) and \(f(t) = D(t) - t\,h(t)\). Fermat’s necessary optimality condition (1) yields that at any positive optimal price \(t^*\in \,]0,1[\), the monopolist would set the marginal revenue f to zero, so \(D(t^*) = t^* h(t^*)\). For a multimodal distribution h, there can be many prices that satisfy this optimality condition. Figure 2 depicts the situation for a bimodal beta-mixture \(h(t) = \gamma p_{\alpha _1,\beta _1}(t) + (1-\gamma ) p_{\alpha _2,\beta _2}(t)\), where \(\gamma \in [0,1]\) and \(p_{\alpha ,\beta }(t) :{=} t^{\alpha -1}(1-t)^{\beta -1}/B(\alpha ,\beta )\) for any \(\alpha ,\beta >0\), with \(B(\alpha ,\beta )\) the Euler beta function. In order to derive a necessary and sufficient optimality condition, we use Eqs. (2’) and (10’) to compute the adjoint variables x and y. Given any price \(t\in [0,1]\), it is best for the monopolist to increase the price if and only if the adjoint variable \(x(1-t)>0\). And it is best for the monopolist to decrease the price if and only if the (co-)adjoint variable \(y(t)>0\). Hence, as stated in Theorem 3.2 the price \(t=t^*\) is globally optimal if and only if \(\lambda (t^*) = \max \{x(1-t^*),y(t^*)\}=0\); see Fig. 2. Furthermore, following Corollaries 3.4 and 3.5, the combined adjoint variable \(\lambda (t)\), at any price \(t\in [0,1]\), is equal to the distance of the profit F(t) to its optimal value \(F^*\).

Fig. 2

Objective function F(t) and cumulative one-sided gains x(t), y(t) in Example 4.2
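The pricing example can be reproduced numerically. The sketch below is our own: the mixture weight and beta parameters are illustrative stand-ins (not the values used for Fig. 2), and the grid size and tolerance are arbitrary.

```python
import math

def beta_pdf(t, a, b):
    """Beta(a, b) density via the gamma function, B(a,b) = G(a)G(b)/G(a+b)."""
    return t ** (a - 1) * (1 - t) ** (b - 1) * math.gamma(a + b) / (
        math.gamma(a) * math.gamma(b))

n = 20000
h = 1.0 / n
ts = [j * h for j in range(n + 1)]
# Bimodal WTP density (illustrative parameters of our own choosing).
dens = [0.5 * beta_pdf(t, 2, 8) + 0.5 * beta_pdf(t, 8, 2) for t in ts]

# Demand D(t) = int_t^1 h(theta) dtheta via a right-to-left trapezoid sum.
D = [0.0] * (n + 1)
for j in range(n - 1, -1, -1):
    D[j] = D[j + 1] + h * (dens[j] + dens[j + 1]) / 2.0

Fv = [ts[j] * D[j] for j in range(n + 1)]            # profit F(t) = t D(t)
fv = [D[j] - ts[j] * dens[j] for j in range(n + 1)]  # marginal revenue f(t)

x, y = [0.0], [0.0]                                  # projected Euler, (2') and (10')
for k in range(n):
    x.append(max(0.0, x[-1] + h * fv[n - k]))        # drift f(1 - s)
    y.append(max(0.0, y[-1] - h * fv[k]))            # drift -f(t)

tol = 1e-4
t_opt = 1.0 - h * max(k for k in range(n + 1) if x[k] <= tol)
j_opt = round(t_opt / h)
lam_opt = max(x[n - j_opt], y[j_opt])
```

Whatever the local maxima of the profit function, the price recovered from the largest zero of x attains the globally optimal profit, and the combined adjoint \(\lambda \) vanishes there, in line with Theorem 3.2.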

Example 4.3

(Optimal Stopping) Suppose that at any time t, a decision maker has the option to either stick with a given utility stream u(t) or to make an irreversible switch to an alternative utility stream v(t), where both u and v are defined for all times \(t\in [0,T]\). In addition, \(t=0\) denotes the present and \(t=T>0\) the relevant time horizon. By considering the utility increment of the default utility stream over the alternative utility stream,

$$\begin{aligned} \delta (t) :{=} u(t) - v(t), \quad t\in [0,T], \end{aligned}$$

the decision maker’s optimal stopping problem can be written in the form

$$\begin{aligned} \max _{t\in [0,T]} \left\{ \int _0^t \mathrm{e}^{-r\theta } u(\theta )\,\mathrm{d}\theta + \int _t^T {\hbox {e}}^{-r\theta } v(\theta )\,\mathrm{d}\theta \right\} = V_0 + \max _{t\in [0,T]} F(t), \end{aligned}$$

where \(r\ge 0\) is a given discount rate, \(V_0:{=} \int _0^T {\hbox {e}}^{-r\theta } v(\theta )\,\mathrm{d}\theta \) is a constant, and

$$\begin{aligned} F(t) :{=} \int _0^t \mathrm{e}^{-r\theta }\delta (\theta )\,\mathrm{d}\theta , \quad t\in [0,T], \end{aligned}$$

is the relevant objective function in the global optimization problem (P). Since \(F(0)=0\), the optimal utility increment \(F^*\) over the discounted utility \(V_0\) of selecting the outside option immediately must be nonnegative. For all s in the interval [0, T], Eq. (2) with \(f(T-s)={\hbox {e}}^{-r (T-s)} \delta (T-s)\) yields the incremental utility of following the optimal stopping strategy on the interval \([T-s,T]\), expressed by the adjoint variable x(s). Moreover, the best stopping strategy, upon arrival at time t (possibly suboptimally, by having stuck with the default option), is to stop if and only if \(x(T-t)=0\). Hence, the earliest time \(t^*\) at which stopping is optimal solves the global optimization problem, and \(t^* = \inf \{t\in [0,T]:x(T-t)=0\}\), as already noted in Remark 3.1.
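A numerical sketch of this stopping problem (ours; the utility streams, discount rate, and horizon are illustrative choices): with \(u(t)=1\) and \(v(t)=t\) on [0, 2], the increment \(\delta (t)=1-t\) makes stopping optimal exactly at \(t=1\).

```python
import math

T, r = 2.0, 0.05
delta = lambda t: 1.0 - t                  # u(t) - v(t), illustrative streams
f = lambda t: math.exp(-r * t) * delta(t)  # derivative of F

n = 20000
h = T / n
x = [0.0]
for k in range(n):                         # projected Euler for (2)
    x.append(max(0.0, x[-1] + h * f(T - k * h)))

tol = 1e-4
t_stop = T - h * max(k for k in range(n + 1) if x[k] <= tol)

# Cross-check against a direct grid maximization of
# F(t) = int_0^t e^{-r a} delta(a) da.
Fv, acc = [0.0], 0.0
for k in range(n):
    acc += h * f(k * h)
    Fv.append(acc)
t_grid = h * max(range(n + 1), key=lambda k: Fv[k])
```

Both routes recover the stopping time \(t^*=1\) up to discretization error.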

Remark 4.1

The foregoing example shows that a (deterministic) optimal stopping problem can be written in the form (P). The converse also holds: (P) can be interpreted as an optimal stopping problem, given the utility increment \(f(t) \equiv \dot{F}(t)\) and a zero discount rate. Theorem 3.1 addresses this interpretation. By switching the reference point, in the sense that

$$\begin{aligned} \max _{t\in [0,T]} \left\{ \int _0^t {\hbox {e}}^{-r\theta } u(\theta )\,\mathrm{d}\theta + \int _t^T {\hbox {e}}^{-r\theta } v(\theta )\,\mathrm{d}\theta \right\} = U_0 + \max _{t\in [0,T]} \hat{F}(t), \end{aligned}$$

where \(U_0:{=} \int _0^T \mathrm{e}^{-r\theta } u(\theta )\,\mathrm{d}\theta \) is a constant, the modified objective function

$$\begin{aligned} \hat{F}(t) :{=} -\int _t^T \mathrm{e}^{-r\theta }\delta (\theta )\,\mathrm{d}\theta , \quad t\in [0,T], \end{aligned}$$

is a translation of the original objective function: \(\hat{F}(t)\equiv F(t) + (V_0-U_0)\), since \(F(t)-\hat{F}(t)\equiv \int _0^T \mathrm{e}^{-r\theta }\delta (\theta )\,\mathrm{d}\theta = U_0-V_0\). Hence, one can think of (P) as an optimal starting problem. Corollary 3.1 and the cumulative left-sided benefit y(t) in Eq. (10) highlight this interpretation.
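The reference-point switch can be checked numerically: \(\hat{F}\) and F differ by the constant \(V_0-U_0\), so \(V_0+\max F = U_0+\max \hat{F}\). The primitives u, v, r, T below are again illustrative choices, not taken from the text.

```python
import math

# Verify F_hat(t) - F(t) = V0 - U0 pointwise on a grid.
T, N = 1.0, 1000
dt = T / N
r = 0.1
u = lambda th: 1.0            # hypothetical default-option utility
v = lambda th: th             # hypothetical outside-option utility

du = [math.exp(-r * k * dt) * u(k * dt) * dt for k in range(N)]  # e^{-r th} u dth
dv = [math.exp(-r * k * dt) * v(k * dt) * dt for k in range(N)]  # e^{-r th} v dth
U0, V0 = sum(du), sum(dv)

F = [0.0]                      # F(t) = int_0^t e^{-r th} (u - v) dth
for k in range(N):
    F.append(F[-1] + du[k] - dv[k])

F_hat = [0.0] * (N + 1)        # F_hat(t) = -int_t^T e^{-r th} (u - v) dth
tail = 0.0
for k in range(N - 1, -1, -1):
    tail += du[k] - dv[k]
    F_hat[k] = -tail
```

Up to floating-point rounding, \(\hat{F}(t)-F(t)\) is constant on the whole grid and both sides of the maximization identity coincide.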

5 Perspectives

The representation of solutions to the global optimization problem (P) in Sect. 3 suggests several global optimality conditions and a dynamic-systems interpretation.

5.1 Global Optimality Conditions

Consider the solution x to the initial-value problem (2) and the solution y to the initial-value problem (10). The significance of the adjoint variables x and y as cumulative one-sided gains in the objective value yields several global optimality conditions, culminating in an exact characterization of the solutions to (P).

(i) A necessary optimality condition for any solution \(t^*\) of the global optimization problem (P) is that \(x(T-t^*)=0\) (resp., \(y(t^*)=0\)).

(ii) The condition \(x(T-{\hat{t}})=0\) at a given point \({\hat{t}}\in [0,T]\) is sufficient for the existence of a solution to (P) in \([0,{\hat{t}}]\) (resp., if \(y({\hat{t}})=0\), then (P) has a solution on \([{\hat{t}},T]\)).

(iii) For a local maximum \({\hat{t}}\) that is not a solution of (P), the condition \(x(T-{\hat{t}})=0\) holds if and only if \({\hat{t}}\) globally maximizes F on \([{\hat{t}},T]\) (resp., \(y({\hat{t}})=0\) holds if and only if \({\hat{t}}\) globally maximizes F on \([0,{\hat{t}}]\)).

(iv) By Theorem 3.1 (resp., Corollary 3.1), the smallest (resp., largest) solution to (P) is \(t^* = T - \sup \{s\in [0,T]:x(s)=0\}\) (resp., \(t^{**} = \sup \{t\in [0,T]: y(t) = 0\}\)). Additional solutions can be found using Corollary 3.3, as well as Remark 3.3.

(v) By Theorem 3.2, a point \({\hat{t}}\) solves (P) if and only if \(\lambda ({\hat{t}})=0\), where \(\lambda (t) \equiv \max \{x(T-t),y(t)\}\) is the “combined” adjoint variable. This condition, which can be checked pointwise, effectively supersedes Fermat’s local necessary optimality condition (1). Furthermore, by Corollary 3.4 one obtains \(\lambda (t) \equiv F^*-F(t)\). Applied to the interval boundaries, this invariance property implies that the distance to the optimal value is attained by the appropriate one-sided adjoint variable at each endpoint; see Corollary 3.5 for details.

Statements (i)–(v) also apply to points and solutions at the boundaries of the interval [0, T], i.e., they are not limited to interior points, unlike standard (local) first-order optimality conditions such as (1). In particular, statement (v) provides a crisp representation of the solution set: \({{\mathcal {P}}} = \{t\in [0,T]: \lambda (t) = 0\}\).
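The pointwise test in statement (v) is easy to exercise numerically. The sketch below uses an illustrative objective with two global maximizers (not an example from the text): it forms the one-sided gains \(x(T-t)\) and y(t) as running maxima of F from the right and from the left, combines them into \(\lambda (t)=\max \{x(T-t),y(t)\}\), and recovers the solution set \({{\mathcal {P}}}=\{t:\lambda (t)=0\}\).

```python
import math

# Illustrative objective with two global maximizers, at t = 1/4 and t = 3/4.
N = 1000
grid = [k / N for k in range(N + 1)]                 # [0, T] with T = 1
F = [math.sin(2 * math.pi * t) ** 2 for t in grid]

# one-sided gains: x(T - t) = max_{tau >= t} F(tau) - F(t),
#                  y(t)     = max_{tau <= t} F(tau) - F(t)
suffix = F[:]
for k in range(N - 1, -1, -1):
    suffix[k] = max(suffix[k], suffix[k + 1])
prefix = F[:]
for k in range(1, N + 1):
    prefix[k] = max(prefix[k], prefix[k - 1])

x_right = [s - f for s, f in zip(suffix, F)]
y_left = [p - f for p, f in zip(prefix, F)]

# combined adjoint lambda(t) = max{x(T-t), y(t)}, which equals F* - F(t)
lam = [max(a, b) for a, b in zip(x_right, y_left)]
F_star = max(F)
solutions = [t for t, l in zip(grid, lam) if l <= 1e-12]
```

On this grid the invariance \(\lambda (t)=F^*-F(t)\) holds to rounding error, and the recovered solution set is \(\{1/4,\,3/4\}\).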

Remark 5.1

As noted after Theorem 2.1, in practice the adjoint variable x representing the right-sided gain can be efficiently computed by repeatedly applying the operator \({\mathbf P}\) in Eq. (6) a (usually small) number of times to \(\phi \), where \(\phi (s) = \int _0^s f(T-\varsigma )\,\mathrm{d}\varsigma = G(0) - G(s)\) for all \(s\in [0,T]\), as illustrated in Fig. 1. That is, \(x = \lim _{k\rightarrow \infty } {\mathbf P}^k \phi \).Footnote 7 Similarly, the adjoint variable y representing the left-sided gain can be obtained using the operator \(\hat{\mathbf P}\) in Eq. (11), so \(\lim _{k\rightarrow \infty } \hat{\mathbf P}^k{\hat{\phi }} = y\), where \({\hat{\phi }}(t) = -\int _0^t f(\theta )\,\mathrm{d}\theta = F(0)-F(t)\), for all \(t\in [0,T]\).Footnote 8
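The operator \({\mathbf P}\) itself is defined in Eq. (6), outside this excerpt. As a cross-check of the limits stated in the remark, the sketch below evaluates the adjoint variables directly from \(\phi \) and \({\hat{\phi }}\), using our reading of x and y as cumulative one-sided gains: \(x(s)=\phi (s)-\min _{0\le u\le s}\phi (u)\) and \(y(t)={\hat{\phi }}(t)-\min _{0\le \theta \le t}{\hat{\phi }}(\theta )\). The integrand f is an illustrative choice (so that \(F(t)=t-t^3\), with unique maximizer \(1/\sqrt{3}\)), not one of the paper's examples.

```python
# One-sided gains from phi and phi_hat via running minima; f is illustrative.
N = 1000
T = 1.0
ds = T / N
f = lambda t: 1.0 - 3.0 * t * t      # F(t) = t - t^3, argmax at 1/sqrt(3)

# phi(s) = int_0^s f(T - sigma) dsigma,  phi_hat(t) = -int_0^t f(theta) dtheta
phi, phi_hat = [0.0], [0.0]
for k in range(N):
    phi.append(phi[-1] + f(T - k * ds) * ds)
    phi_hat.append(phi_hat[-1] - f(k * ds) * ds)

def gains(p):
    # g[k] = p[k] - min_{j <= k} p[j]  (cumulative one-sided gain)
    run_min, g = float("inf"), []
    for val in p:
        run_min = min(run_min, val)
        g.append(val - run_min)
    return g

x = gains(phi)        # right-sided gain x(s)
y = gains(phi_hat)    # left-sided gain y(t)

# smallest and largest solutions of (P), per Theorem 3.1 and Corollary 3.1
t_star = T - max(k * ds for k in range(N + 1) if x[k] <= 1e-12)
t_star_star = max(k * ds for k in range(N + 1) if y[k] <= 1e-12)
```

Both recovered solutions agree with the analytical maximizer \(1/\sqrt{3}\approx 0.577\) up to the grid resolution, and both adjoint variables are nonnegative, as they must be.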

5.2 Dynamic-Systems Interpretation

The equivalence of global optimization on an interval and optimal stopping (see Remark 4.1) suggests a dynamic-systems interpretation of the solution method proposed in Sect. 3. By introducing the state variable \(\xi (t)\) and the adjoint variable (“co-state”) \(\psi (t) \equiv x(T-t)\), the solution of (P), given in Theorem 3.1, satisfies the following two-point boundary-value problem for \(t\in [0,T]\):

$$\begin{aligned} {\dot{\xi }}(t)&= \mu (\psi (t)), \quad \xi (0)=0, \end{aligned}$$
(14)
$$\begin{aligned} {\dot{\psi }}(t)&= -\varPhi (f(t),\psi (t)), \quad \psi (T) = 0, \end{aligned}$$
(15)

where the function \(\mu :{{\mathbb {R}}}\rightarrow {{\mathbb {R}}}\) in Eq. (14) implements the (optimal) stopping policy using a co-state feedback: \(\mu (\hat{\psi }) :{=} {\mathbf 1}_{\{\hat{\psi } \le 0\}}\), for all \({\hat{\psi }}\in {{\mathbb {R}}}\), so the state remains at zero as long as the co-state signals a positive right-sided gain. The state \(\xi (t)\) partitions the domain [0, T] into a continuation region \([0,t^*]\) (where \(\xi (t)=0\)) and a stopping region \((t^*,T]\) (where \(\xi (t)>0\)). The co-state \(\psi (t)\), independently determined by Eq. (15), is nonnegative and provides global information about possible improvements by continuing a search for the optimum to the right of the current t. Given the solution \((\xi ,\psi )(t)\) of Eqs. (14)–(15) for \(t\in [0,T]\), the current value \(\nu (t)\) solves the initial-value problem

$$\begin{aligned} \dot{\nu }(t) = {\mathbf 1}_{\{\xi (t)\le 0\}}\,f(t), \quad \nu (0) = F(0), \end{aligned}$$

for \(t\in [0,T]\), so that

$$\begin{aligned} \nu (t) = \left\{ \begin{array}{ll} F(t), &{}\quad \text{ if } \, t\le t^*,\\ F^*, &{}\quad \text{ otherwise }, \end{array}\right. \end{aligned}$$

where \(F^* = \nu (T)\) is the optimal value of (P) and \(t^*\) is the (smallest) solution of (P); see Fig. 3 for an illustration using the primitives of Example 4.2. This formalizes the heuristic that it is globally optimal to walk the ‘mountain range’ defined by F(t), starting at \(t=0\), toward the right, until the view toward the right becomes unimpeded. The global information about the function values not yet experienced during the walk is contributed by the co-state variable \(\psi \). Alternatively, one can start walking at \(t=T\) toward the left, which leads to an analogous solution, as formulated in Corollary 3.1.

While the results by themselves do not offer a ‘magic potion’ for finding a solution to a global optimization problem without checking the entire interval, they shed light on the importance of global information, in contrast to the local optimality conditions, such as (1), usually employed to identify candidates for interior local optima. The two-point boundary-value problem (14)–(15) is reminiscent of the Hamiltonian system that leads to a similar two-point boundary-value problem as part of the Pontryagin maximum principle [19]; see also [20].Footnote 9

As Bellman’s principle of optimality ([21], Ch. III.3) would suggest, the adjoint variable in fact provides a solution to an entire family of nested optimization problems. It thus gives a “complete contingent plan,” in the sense that if for some reason a global optimum \(t^*\) was missed when walking from left to right, then for any \(t\in \,]t^*,T[\) the adjoint variable still provides an optimal stopping rule on the interval \([t,T]\).

Fig. 3

State \(\xi (t)\), co-state \(\psi (t)\), and current value \(\nu (t)\) in Example 4.2
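The ‘mountain range’ walk can be imitated numerically: precompute the co-state as the remaining right-sided gain, let the current value track F while the co-state is positive, and freeze it once the co-state vanishes. The sketch below uses illustrative primitives (\(F(t)=\sin ^2(2\pi t)\), so \(f(t)=2\pi \sin (4\pi t)\)), not those of Example 4.2, and evaluates the co-state directly rather than by integrating Eq. (15), whose ingredient \(\varPhi \) is defined outside this excerpt.

```python
import math

# Discretized walk from t = 0 to the right; primitives are illustrative.
T, N = 1.0, 2000
dt = T / N
f = lambda t: 2 * math.pi * math.sin(4 * math.pi * t)   # F(t) = sin^2(2 pi t)

grid = [k * dt for k in range(N + 1)]
F = [math.sin(2 * math.pi * t) ** 2 for t in grid]

# co-state psi(t) = x(T - t): remaining right-sided gain, computed directly
psi = F[:]
for k in range(N - 1, -1, -1):
    psi[k] = max(psi[k], psi[k + 1])
psi = [m - fv for m, fv in zip(psi, F)]

# accumulate f until psi vanishes (view unimpeded), then freeze nu at F*
nu, stopped, t_stop = [F[0]], False, None
for k in range(N):
    if not stopped and psi[k] <= 1e-12:
        stopped, t_stop = True, grid[k]
    nu.append(nu[-1] + (0.0 if stopped else f(grid[k]) * dt))
```

The walk freezes at the first global maximizer (\(t^*=1/4\) here, even though a second global maximizer exists at 3/4), and the terminal current value \(\nu (T)\) approximates \(F^*\).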

6 Conclusions

Keeping track of one-sided improvements on an interval [0, T], in the form of the adjoint variables \(x(T-t)\) and y(t) for \(t\in [0,T]\), allows for a characterization of all solutions to the global optimization problem (P). The two-sided adjoint variable \(\lambda (t) = \max \{x(T-t),y(t)\}\), as the upper envelope of the two one-sided adjoint variables, vanishes at a point \(\hat{t}\) of the interval if and only if that point is a solution of (P), so \(\hat{t}\in {\mathcal P}\). The adjoint variables are uniquely determined as solutions to the initial-value problems (2) and (10), and they can be obtained using a Picard iteration that usually terminates in a finite number of steps.

Conceptually, the adjoint variables incorporate all the global information needed for solving not only (P) but also its subproblems: A one-sided adjoint variable, say y(t), describes a (‘stopping’) policy for optimizing over the subinterval between the current point t and the corresponding endpoint of the interval ([0, t] for the left-sided adjoint variable y); \(y(t)=0\) if and only if t globally maximizes F on [0, t]. Finally, an analytical description of all solutions to the global optimization problem (P) may be used to check solution properties, such as monotonicity in problem parameters, which may or may not be satisfied at points implied by merely local optimality conditions such as Eq. (1).