Global Optimization on an Interval

This paper provides expressions for solutions of a one-dimensional global optimization problem, using an adjoint variable that represents the available one-sided improvements up to the interval "horizon." Interpreting the problem in terms of optimal stopping or optimal starting, the solution characterization yields two-point boundary-value problems, as in dynamic optimization. Results also include a procedure for computing the adjoint variable, as well as necessary and sufficient global optimality conditions.


Introduction
The generic nonconcavity of maximization problems leads to multiple local optima. Standard optimality conditions tend to be local, and techniques for global optimization are usually algorithmic in nature, restricting the search for the best solution to subsets of the domain. For the simple case where the domain is an interval, a global maximizer of a continuously differentiable function can be found by using techniques from dynamic systems, notably by introducing global information in the form of an adjoint variable. In this manner, we construct expressions for solutions to a global optimization problem on an interval, which are directly related to dynamic interpretations in terms of optimal stopping and optimal starting. In addition to providing a full characterization of solutions to a global optimization problem over an interval, the adjoint variable can also be used locally to formulate necessary and sufficient optimality conditions for one-sided subproblems of the original global optimization problem.

Literature
Following [1], global optimization methods use either deterministic search algorithms (e.g., via gradient methods) or random-sampling procedures. The first type of algorithm consists of schemes for systematic search updates. The Bolzano search finds critical points of a concave objective function via bisection (see, e.g., [2], p. 122). The golden-section search by [3] for unimodal functions increases the efficiency of the bisection method by varying the subdivision using Fibonacci numbers; see also [4]. Algorithms based on steepest ascent, such as Newton's method (see, e.g., [7], Ch. 9.5), tend to be greedy and therefore converge to local extrema. Improvements are achieved by using (deterministic) sampling techniques capitalizing on available knowledge about the variation of the function in terms of its Lipschitz constant [8]. The latter can be refined by locally estimating the Lipschitz constant [9], using a quadratic bound [10], or by employing a higher-order approach, e.g., considering additionally the Lipschitz constant for the variation of the gradient [11]. An overview of the second type of algorithms, based on random sampling, is provided by [12], Ch. 4. An alternative Bayesian approach, assuming a probabilistic model of the objective function as a stochastic process, was proposed by [13]. These algorithms amount to numerical techniques, predicated on the assumption that the objective function is expensive to evaluate or nonsmooth, so as to preclude direct analytical calculations. In breaking with this premise, our goal is to provide insights about the kind of information needed to compute solutions to a global optimization problem, as well as their properties, rather than to attempt improvements on the numerical side.
We assume that the underlying objective function is continuously differentiable, and then reduce the solution of the global optimization problem to solving an "adjoint" differential equation. In the spirit of [14], this differential equation performs the somewhat unexpected task of aggregating global information about the available one-sided improvements. Since the adjoint equation has a discontinuous right-hand side, existence and uniqueness of the solution are obtained separately via successive Picard iterations (see, e.g., [15], p. 213), without relying on (here unavailable) Lipschitz constants.

Outline
The remainder of this paper is organized as follows. Section 2 introduces notation and basic concepts, most notably an auxiliary (adjoint) variable which represents the optimal improvement up to the interval horizon. Section 3 provides expressions for the solutions of a one-dimensional global optimization problem as well as necessary and sufficient global optimality conditions. Section 4 contains several examples to illustrate the results. It also clarifies the equivalence of global optimization with optimal stopping (or starting) problems. Section 5 discusses global optimality conditions and the relationship of the proposed methods to the analysis of optimal control problems. Section 6 concludes.

Preliminaries
For any given T > 0, consider the global optimization problem

F* := max{F(t) : t ∈ [0, T]},    (P)

where F : [0, T] → ℝ is a differentiable real-valued objective function with continuous derivative f : [0, T] → ℝ. By the Weierstrass theorem (see, e.g., [16], p. 540), problem (P) has a solution, i.e., its solution set P ⊆ [0, T] is nonempty, and the optimal value F* is finite. Furthermore, it is well known that any interior optimizer t̂ ∈ ]0, T[ (i.e., excluding the boundary points 0 and T) satisfies the Fermat condition

f(t̂) = 0,    (1)

but there may be many points t̂ that do not solve (P) and still satisfy f(t̂) = 0. For example, if F is equal to a value F̂ < F* on a subinterval, then there is a continuum of such points. We are interested in characterizing the solution(s) to the global optimization problem, as element(s) of [0, T], including the boundaries. For this, we introduce an auxiliary function, also referred to as "adjoint variable," x : [0, T] → ℝ, as the unique solution to the initial-value problem

ẋ(s) = χ(T − s, x(s)), x(0) = 0,    (2)

for s ∈ [0, T], where for any (t, x) ∈ ℝ²:

χ(t, x) := f(t) + 1_{x ≤ 0} max{0, −f(t)}.

The right-hand side of the differential equation in (2) is discontinuous and generally does not satisfy the Carathéodory conditions (see, e.g., [17], p. 3). Before we establish existence and uniqueness of a solution to the initial-value problem in the space W^{1,1}([0, T]) of absolutely continuous functions on [0, T], we record a lower bound for the adjoint variable.

Lemma 2.1 Any solution x of the initial-value problem (2) satisfies x(s) ≥ max{0, F(T) − F(T−s)} for all s ∈ [0, T].

Proof The adjoint variable x(s) cannot become negative, since Eq. (2) implies that ẋ ≥ 0 at the boundary of positivity, i.e., whenever x(s) = 0. For the remaining part of the claim, note that the solution to the initial-value problem ż(s) = f(T−s), z(0) = 0, is z(s) = F(T) − F(T−s). Consider the difference Δ := x − z. Then, Δ(0) = 0 and, using the fact that x(s) ≥ 0, it is Δ̇(s) = 1_{x(s) ≤ 0} max{0, −f(T−s)} ≥ 0. Thus, Δ is nondecreasing, which implies that x(s) ≥ z(s) for all s ∈ [0, T]. This proves the claim.
As explained in the next section, the adjoint variable x(s) measures the optimal improvement of the objective value F(T−s) on the interval [T−s, T]. Because the comparison set includes the current value of the objective function, the improvement must be nonnegative and has to exceed the difference F(T) − F(T−s), at least weakly, by Lemma 2.1. Thus, if we set ϕ(s) := f(T−s) and ϕ⁻(s) := min{0, ϕ(s)} for all s ∈ [0, T], then based on the preceding implication, the initial-value problem in Eq. (2) can be rewritten in the form

ẋ(s) = ϕ(s) − 1_{x(s) ≤ 0} ϕ⁻(s), x(0) = 0.    (2')

The vector space W^{1,1}([0, T]) is a Banach space, i.e., a complete normed vector space, which means that any Cauchy sequence with elements in the vector space converges (in the ‖·‖_{1,1}-norm) to an element of the vector space. The solution set of the initial-value problem (2') is

R := {x ∈ W^{1,1}([0, T]) : Px = x},

where the operator P maps any absolutely continuous function x on [0, T] to the absolutely continuous function Px, with

(Px)(s) := ∫₀ˢ [ϕ(ς) − 1_{x(ς) ≤ 0} ϕ⁻(ς)] dς, s ∈ [0, T],    (6)

which (as can be verified) is also an element of W^{1,1}([0, T]). The following result provides existence and uniqueness of a solution to the initial-value problems (2) and (2').

Theorem 2.1 There exists a unique solution x ∈ W^{1,1}([0, T]) of the initial-value problem (2), and it can be obtained by successive approximation. That is, when considering the sequence σ := (x_k)_{k=0}^∞, with the initial function x₀ = φ and the Picard iteration x_{k+1} = Px_k for k ≥ 0, then x_k → x ∈ R as k → ∞.

In practice, the convergence of the sequence σ to the adjoint variable x = lim_{k→∞} P^k φ is usually very efficient and takes place within a few iterations; see Fig. 1 for an example.
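To make the fixed-point characterization concrete, the following numerical sketch (in which the objective F, the horizon T = 1, and the grid are hypothetical choices, not taken from the text) builds the right-sided improvement x(s) = max{F(t) : t ∈ [T−s, T]} − F(T−s) directly from its interpretation and checks that one application of a discretized operator P reproduces it:

```python
import numpy as np

# Hypothetical test objective and grid (assumptions for this sketch only).
T, N = 1.0, 4001
s = np.linspace(0.0, T, N)
ds = s[1] - s[0]
F = lambda u: np.sin(3 * np.pi * u) * np.exp(-u)
f = lambda u: (3 * np.pi * np.cos(3 * np.pi * u)
               - np.sin(3 * np.pi * u)) * np.exp(-u)   # F'

phi = f(T - s)                    # phi(s) = f(T - s)
phi_minus = np.minimum(0.0, phi)  # phi^-(s) = min{0, phi(s)}

# Right-sided improvement, from its interpretation in the text:
FTs = F(T - s)                          # F(T - s) on the grid
x = np.maximum.accumulate(FTs) - FTs    # max over [T-s,T] minus F(T-s)

# One application of the (trapezoid-discretized) operator P should
# reproduce x, i.e., x is a fixed point of P.
integrand = phi - (x <= 1e-9) * phi_minus
Px = np.concatenate(([0.0],
                     np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * ds)))
fixed_point_error = float(np.max(np.abs(Px - x)))

# Smallest global maximizer (cf. Sect. 3): t* = T - sup{s : x(s) = 0}.
t_star = T - s[np.nonzero(x <= 1e-9)[0][-1]]
```

For this test function the unique interior maximizer solves tan(3πt) = 3π, so t_star ≈ 0.155, and fixed_point_error is of the order of the grid resolution.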

Main Results
Based on the notions introduced in the proof of Lemma 2.1, it is now possible to construct expressions for the solutions of (P), first for the smallest solution t*, then the largest solution t**, and finally for all solutions in between. Consider the set S := {s ∈ [0, T] : x(s) = 0}. This set is nonempty (because 0 ∈ S), and its supremum, s* := sup S, therefore exists and lies in the interval [0, T].

Theorem 3.1 The smallest solution of (P) is t* = T − s*.

Proof Depending on whether or not S is a singleton, we consider two cases.
The constrained optimal value v(ŝ) := max{F(t) : t ∈ [T−ŝ, T]} is nondecreasing in ŝ. Now consider the optimal value of the global optimization problem (P) subject to the additional constraint that t ∈ [T−ŝ, T]. For any ŝ ∈ S, the vanishing of x(ŝ), by virtue of Eq. (3) and the nonnegativity of x, means that no improvement over F(T−ŝ) is available on [T−ŝ, T], so that v(ŝ) = F(T−ŝ). Using the monotonicity of v, for any s̃ ∈ S with s̃ ≥ ŝ, one obtains v(s̃) ≥ v(ŝ), i.e., F(T−s̃) ≥ F(T−ŝ). We therefore know that v(s*) = F(T−s*) is the largest of these values. Moreover, since any smallest global maximizer t̂ of F satisfies x(T−t̂) = 0, it is T−t̂ ∈ S, and hence t̂ ∈ [T−s*, T], so that v(s*) = F*. Combining these observations, the solution to the global optimization problem (P) is therefore t* = T − s*, which completes the proof.
Remark 3.1 By substituting s = T − t in Theorem 3.1, the smallest solution to the global optimization problem (P) can also be written in the form t* = inf{t ∈ [0, T] : x(T−t) = 0}. Accordingly, the optimal value of (P) is F* = F(t*). In the foregoing derivations, the nonnegative adjoint variable x(T−t), defined as the solution to the initial-value problem (2), measures the possible cumulative improvement of a solution in the interval [t, T] relative to the current value F(t). The smallest solution of (P) is the smallest t* for which no improvement of the objective can be obtained on the interval [t*, T], so x(T−t*) = 0 in particular. Alternatively, one can determine the largest solution t** of (P) by measuring cumulative improvements over the interval [0, t]. For this, consider the unique solution to the initial-value problem

ẏ(t) = −f(t) − 1_{y(t) ≤ 0} min{0, −f(t)}, y(0) = 0,    (10)

for t ∈ [0, T]. Analogous to the iterative procedure for the solution of the initial-value problem (2) in Sect. 2, it is possible to obtain the (co-)adjoint variable y by successive approximation, lim_{k→∞} P̃ᵏ Φ̃ = y, where the operator P̃ maps any absolutely continuous function y on [0, T] to an absolutely continuous function P̃y, with

(P̃y)(t) := ∫₀ᵗ [−f(τ) − 1_{y(τ) ≤ 0} min{0, −f(τ)}] dτ, t ∈ [0, T],    (11)

just as the operator P in Eq. (6), and where Φ̃ is the starting function, defined analogously to φ. As with Eq. (2'), corresponding to Eq. (2), there exists an equivalent formulation (10') of the initial-value problem (10) for the computation of y.

Corollary 3.1 The largest solution of (P) is t** = sup{t ∈ [0, T] : y(t) = 0}.

Proof Consider the mirrored objective G(s) := F(T−s) for s ∈ [0, T], together with the mirrored problem (P') of maximizing G over [0, T]. If ŝ solves (P'), then t̂ = T − ŝ is also a solution of (P). Moreover, by Theorem 3.1 the smallest solution s* of (P') is equal to T minus the largest solution t** of (P). Mirroring the objective function from F to G also mirrors the corresponding derivatives from f to g, in the sense that g(s) = −f(T−s) for all s ∈ [0, T]. A (unique) solution y to the initial-value problem (2), applied to the primitives of the mirrored global optimization problem (P') (with the independent variable s suitably replaced by t), satisfies ẏ(t) = −f(t) − 1_{y(t) ≤ 0} min{0, −f(t)}, y(0) = 0. The latter corresponds to the initial-value problem (10).
By Theorem 3.1, the smallest solution of (P') is s* = T − sup{t ∈ [0, T] : y(t) = 0}, so that the largest solution of (P) becomes t** = T − s* = sup{t ∈ [0, T] : y(t) = 0}, which concludes the proof.
The two preceding results together characterize the uniqueness of a solution to the global optimization problem.
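As a quick numerical illustration (the bimodal test objective below is an assumption, chosen so that (P) has exactly the two solutions 0.25 and 0.75), the smallest and largest solutions can be read off the zero sets of the one-sided improvements:

```python
import numpy as np

# Assumed test objective with two global maximizers at t = 0.25 and t = 0.75.
T, N = 1.0, 4001
t = np.linspace(0.0, T, N)
Fv = np.sin(2 * np.pi * t) ** 2

# One-sided improvements, from their interpretation in the text:
#   x(s) = max over [T-s,T] of F, minus F(T-s)   (right-sided gain, s = T - t)
#   y(t) = max over [0,t]  of F, minus F(t)      (left-sided gain)
x = np.maximum.accumulate(Fv[::-1]) - Fv[::-1]
y = np.maximum.accumulate(Fv) - Fv

tol = 1e-9
s_star = t[np.nonzero(x <= tol)[0][-1]]    # sup{s : x(s) = 0}
t_star = T - s_star                        # smallest solution of (P)
t_dstar = t[np.nonzero(y <= tol)[0][-1]]   # largest solution of (P)
```

Here t_star = 0.25 < t_dstar = 0.75, so the solution is not unique; accordingly, the suprema s* and sup{t : y(t) = 0} add up to 1.5, exceeding the interval length T = 1.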

Corollary 3.2 A solution of (P) is unique if and only if sup{s ∈ [0, T] : x(s) = 0} + sup{t ∈ [0, T] : y(t) = 0} = T.
Proof The result follows immediately by setting t* = t** in Theorem 3.1 and Corollary 3.1.
Intuitively, a solution t * of (P) is unique if and only if the length of the largest interval for zero cumulative improvement (of the objective function F) to the right of t and the length of the largest interval for zero cumulative improvement to the left of t add up to the length T of the domain [0, T ] at t = t * .

Remark 3.2 Consider the (slightly) "generalized" global optimization problem

max{H(t̂) : t̂ ∈ [a, b]},    (P")

featuring a continuously differentiable real-valued objective function H, defined on the interval [a, b], where a, b are any given real numbers such that a < b. While (P") seems more general than (P), it can be reduced to the latter by maximizing F(t) := H(a + t) over t ∈ [0, T], with T := b − a, just as in the original optimization problem (P). Any solution t* of (P) directly corresponds to a solution t̂* of (P") via translation, t* = t̂* − a.
It is possible to generalize the representation of the solutions in Theorem 3.1 and Corollary 3.1 to cases where the global optimization problem has more than two solutions. Indeed, if (P) has any finite number of solutions, all of them can be found recursively.
If {t₁*, …, t_N*} is a complete set of N > 2 distinct solutions of (P), then all solutions (between the smallest and the largest) can be obtained by the recursion in (12), where x̃ is the unique solution of the initial-value problem (2) with T replaced by T̃ := t**.
Moreover, by Theorem 3.1, each step of this construction yields the smallest solution on the remaining subinterval, which corresponds to the recursion in (12), thus concluding the proof.
Note that the cardinality of the solution set P need not be finite. For instance, the objective function F, defined by F(t) := 1 − (t² sin(1/t))² for t > 0, with F(0) := 1, is continuously differentiable, and (for T ≥ 1/π) the global optimization problem (P) has the countable solution set P = {0, t₁, t₂, …}, where t_k = 1/(kπ) for all k ≥ 1. But P need not even be countable: as an example, any constant objective function, F(t) ≡ c ∈ ℝ, would produce the continuum P = [0, T] as solution set of (P), equal to the entire domain. By combining the interpretations of the two adjoint variables x and y as the right-sided and left-sided gains, respectively, it is possible to construct a necessary and sufficient optimality condition to decide whether a given point solves the global optimization problem. For this, we introduce the combined (or "two-sided") adjoint variable λ(t) := max{x(T−t), y(t)}.
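The countable-solution-set example above can be checked in a few lines (the grid and the number of maximizers tested are arbitrary choices):

```python
import numpy as np

# F(t) = 1 - (t^2 sin(1/t))^2 attains its maximal value 1 at every
# t_k = 1/(k*pi), k >= 1, so the solution set of (P) is countable.
def F(t):
    return 1.0 - (t**2 * np.sin(1.0 / t))**2   # valid for t > 0

t_k = 1.0 / (np.arange(1, 8) * np.pi)   # candidate maximizers 1/(k*pi)
vals = F(t_k)

grid = np.linspace(1e-4, 1.0, 100001)   # sample away from t = 0
sup_on_grid = float(F(grid).max())
```

All sampled values F(t_k) equal 1 up to machine precision, while F never exceeds 1 anywhere on the grid.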
Theorem 3.2 A point t̂ ∈ [0, T] is a solution of (P) if and only if

λ(t̂) = 0.    (13)

Proof Consider the set P of solutions to (P), and let F* be the optimal value of this global optimization problem. (i) Necessity: Consider a point t̂ ∈ P, so that F(t̂) = F*. Then neither a right-sided (on [t̂, T]) nor a left-sided (on [0, t̂]) strict improvement over F(t̂) is possible, whence x(T−t̂) = y(t̂) = 0 and thus λ(t̂) = 0. (ii) Sufficiency: Consider a point t̂ ∈ [0, T] which satisfies λ(t̂) = 0. By Lemma 2.1, the adjoint variable x is nonnegative-valued, which, by symmetry, is also true for y. Hence, x(T−t̂) = y(t̂) = 0, so neither a right-sided (on [t̂, T]) nor a left-sided (on [0, t̂]) strict improvement over F(t̂) is possible, which implies that F(t̂) = F*. Hence, t̂ must be an element of P.
Based on (i) and (ii), Eq. (13) characterizes any solution of (P), which implies the representation of the solution set P as the set of roots of λ(t), concluding the proof.
At any given point t the combined adjoint variable λ(t) can be interpreted as the best gain available on the domain [0, T ]. This implies the following invariance property.
Combining the last result with the initial conditions in Eqs. (2) and (10) yields an expression of the optimal value of (P) as a function of the adjoint variables evaluated at the interval horizon. The aforementioned properties of the adjoint variables reveal an inherent complementarity, in the sense that the nonnegative one-sided adjoint variables x and y can only vanish together at a global optimum. In addition, because of the normalization to zero at either interval end, each one-sided adjoint variable, evaluated at the horizon, must be equal to the optimal increment of the objective function relative to the corresponding endpoint:

x(T) = F* − F(0) and y(T) = F* − F(T).
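The horizon identity can be checked numerically by integrating the adjoint equation (2') forward in time (here by a simple Euler scheme on an assumed test objective; the scheme, step size, and objective are all hypothetical choices of this sketch):

```python
import numpy as np

# Forward-Euler sketch of (2'): x' = phi(s) - 1{x <= 0} min{0, phi(s)},
# x(0) = 0, phi(s) = f(T - s), for an assumed objective F.
T, N = 1.0, 40001
s = np.linspace(0.0, T, N)
ds = s[1] - s[0]
F = lambda u: np.sin(2 * np.pi * u) * (1 - u / 2)
f = lambda u: 2 * np.pi * np.cos(2 * np.pi * u) * (1 - u / 2) \
    - 0.5 * np.sin(2 * np.pi * u)                      # F'

phi = f(T - s)
x = np.zeros(N)
for k in range(N - 1):
    rhs = phi[k] - (x[k] <= 0.0) * min(0.0, phi[k])
    x[k + 1] = x[k] + ds * rhs

F_star = float(np.max(F(s)))                           # brute-force optimum
identity_error = abs(float(F(0.0)) + x[-1] - F_star)   # F* = F(0) + x(T)
```

Here identity_error is small (of the order of the Euler-scheme accuracy), consistent with F* = F(0) + x(T); the symmetric identity F* = F(T) + y(T) can be checked the same way with the mirrored equation (10).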

Example 4.2 (Monopoly Pricing)
A single-product monopolist faces heterogeneous consumers whose highest willingness-to-pay (WTP) for its good is normalized to T = 1, without loss of generality. Given a continuous probability density function h : [0, 1] → ℝ₊ describing the distribution of consumers' WTP, the aggregate demand for the product at the price t is

D(t) = ∫ₜ¹ h(θ) dθ.

Thus, assuming (for simplicity) zero marginal cost, the monopolist's optimal pricing problem becomes

max{t·D(t) : t ∈ [0, 1]},

which is of the form (P) for F(t) = t D(t) and f (t) = D(t)−t h(t).
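This pricing problem can be replicated numerically. In the sketch below, the density h is an assumed bimodal mixture of Beta(25, 100) and Beta(100, 25) with equal weights (purely hypothetical parameters, chosen only to make h bimodal); the code recovers the demand, checks the marginal-revenue condition at the computed optimal price, and verifies that λ(t) = max{x(1−t), y(t)} equals the profit shortfall F* − F(t):

```python
import math
import numpy as np

# Assumed bimodal WTP density: equal-weight mixture of Beta(25,100)
# and Beta(100,25) (hypothetical parameters for this sketch).
a1, b1, a2, b2, w = 25.0, 100.0, 100.0, 25.0, 0.5

def beta_pdf(t, a, b):
    lnB = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return np.exp((a - 1) * np.log(t) + (b - 1) * np.log(1 - t) - lnB)

N = 20001
t = np.linspace(1e-6, 1 - 1e-6, N)
dt = t[1] - t[0]
h = w * beta_pdf(t, a1, b1) + (1 - w) * beta_pdf(t, a2, b2)

# Demand D(t) = int_t^1 h, revenue F(t) = t D(t), marginal revenue f = D - t h.
H = np.concatenate(([0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * dt)))
D = H[-1] - H
Fv = t * D
fv = D - t * h

i_star = int(np.argmax(Fv))
t_star, F_star = float(t[i_star]), float(Fv[i_star])

# Two-sided adjoint from the one-sided improvements:
x_rev = np.maximum.accumulate(Fv[::-1]) - Fv[::-1]   # x on the s-grid
y = np.maximum.accumulate(Fv) - Fv
lam = np.maximum(x_rev[::-1], y)                     # lambda(t)
gap = float(np.max(np.abs(lam - (F_star - Fv))))

sign_changes = int(np.sum(np.diff(np.sign(fv)) != 0))  # critical prices
```

With these parameters the marginal revenue changes sign several times (a multimodal revenue curve), the global optimum lies near the upper mode, the Fermat condition D(t*) = t*·h(t*) holds there, and λ coincides with F* − F(t), so its zeros are exactly the optimal prices.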
Fermat's necessary optimality condition (1) yields that at any positive optimal price t* ∈ ]0, 1[, the monopolist sets the marginal revenue f to zero, so D(t*) = t* h(t*). For a multimodal distribution h, there can be many prices that satisfy this optimality condition. Figure 2 depicts the situation for a bimodal beta-mixture h(t) = γ p_{α₁,β₁}(t) + (1−γ) p_{α₂,β₂}(t), where γ ∈ [0, 1] and p_{α,β}(t) := t^{α−1}(1−t)^{β−1}/B(α, β) for any α, β > 0. In order to derive a necessary and sufficient optimality condition, we use Eqs. (2') and (10') to compute the adjoint variables x and y. Given any price t ∈ [0, 1], it is best for the monopolist to increase the price if and only if the adjoint variable x(1−t) > 0. And it is best for the monopolist to decrease the price if and only if the (co-)adjoint variable y(t) > 0. Hence, as stated in Theorem 3.2, the price t = t* is globally optimal if and only if λ(t*) = max{x(1−t*), y(t*)} = 0; see Fig. 2. Furthermore, following Corollary 3.4 and Corollary 3.5, the combined adjoint variable λ(t), at any price t ∈ [0, 1], is equal to the distance of the profit F(t) to its optimal value F*.

In the related interpretation of (P) as a deterministic optimal stopping problem, the right-sided gain still available upon arrival at t is x(T−t). Moreover, the best stopping strategy, once having arrived at t (possibly suboptimally, by sticking to the default option), is to stop if and only if x(T−t) = 0. Hence, the earliest stopping time t* must be globally optimal, and t* = inf{t ∈ [0, T] : x(T−t) = 0}, as already noted in Remark 3.1.

Remark 4.1
The foregoing example shows that a (deterministic) optimal stopping problem can be written in the form (P). The converse also holds: (P) can be interpreted as an optimal stopping problem, given the utility increment f(t) ≡ Ḟ(t) and a zero discount rate. Theorem 3.1 addresses this interpretation. By switching the reference point, in the sense that the accumulated utility is measured relative to the horizon instead of the origin, where U₀ := ∫₀ᵀ e^{−rθ} u(θ) dθ is a constant, the modified objective function is a translation of the original objective function: F̃(t) ≡ F(t) + (U₀ − V₀). Hence, one can think of (P) as an optimal starting problem. Corollary 3.1 and the cumulative left-sided benefit y(t) in Eq. (10) highlight this interpretation.

Perspectives
The representation of solutions to the global optimization problem (P) in Sect. 3 suggests several global optimality conditions and a dynamic-systems interpretation.

Global Optimality Conditions
Consider the solution x to the initial-value problem (2) and, respectively, the solution y to the initial-value problem (10). The significance of the adjoint variables x and y as the cumulative one-sided gains of the objective value implies several global optimality conditions, culminating in an exact characterization of solutions to (P).
(i) A necessary optimality condition for any solution t * of the global optimization problem (P) is that x(T − t * ) = 0 (resp., y(t * ) = 0). Applied to the interval boundaries, this invariance property implies that the distance to the optimal value is attained by the appropriate one-sided adjoint variable at each endpoint; see Corollary 3.5 for details.
Statements (i)-(v) also apply to points and solutions at the boundaries of the interval [0, T ], i.e., they are not limited to interior points, unlike standard (local) first-order optimality conditions such as (1). In particular, statement (v) provides a crisp representation of the solution set: P = {t ∈ [0, T ] : λ(t) = 0}.
Remark 5.1 As noted after Theorem 2.1, in practice the adjoint variable x representing the right-sided gain can be efficiently computed by repeatedly applying the operator P in Eq. (6) a (usually small) number of times to the starting function φ; see Fig. 1. That is, x = lim_{k→∞} P^k φ. Similarly, the adjoint variable y representing the left-sided gain can be obtained using the operator P̃ in Eq. (11), so lim_{k→∞} P̃ᵏ Φ̃ = y.

Dynamic-Systems Interpretation
The equivalence of global optimization on an interval and optimal stopping (see Remark 4.1) suggests a dynamic-systems interpretation of the solution method proposed in Sect. 3. By introducing the state variable ξ(t) and the adjoint variable ("co-state") ψ(t) ≡ x(T−t), the solution of (P), given in Theorem 3.1, satisfies the two-point boundary-value problem (14)-(15) for t ∈ [0, T], in which the function μ : ℝ → ℝ in Eq. (14) implements the (optimal) stopping policy using co-state feedback; here F* = ν(T) is the optimal value of (P) and t* is the (smallest) solution of (P); see Fig. 3 for an illustration using the primitives of Example 4.2. This formalizes the heuristic that it is globally optimal to walk the 'mountain range' defined by F(t), starting at t = 0, toward the right, until the view toward the right becomes unimpeded.
The global information about the function values not yet experienced during the walk is contributed by the co-state variable ψ. Alternatively, it is possible to start walking on the interval at t = T toward the left, leading to an analogous solution, as formulated in Corollary 3.1. While the results by themselves do not offer a 'magic potion' for finding a solution to a global optimization problem without checking the entire interval, they shed light on the importance of global information, unlike the local optimality conditions, such as (1), usually employed to identify candidates for interior local optima. The two-point boundary-value problem (14)-(15) is reminiscent of the Hamiltonian system which leads to a similar two-point boundary-value problem as part of the Pontryagin maximum principle [19]; see also [20]. As Bellman's principle of optimality ([21], Ch. III.3) would suggest, the adjoint variable in fact provides a solution to an entire family of nested optimization problems. It thus gives a "complete contingent plan," in the sense that if for some reason a global optimum t* was missed when walking from left to right, then for any t ∈ ]t*, T[ the adjoint variable still provides an optimal stopping rule on the interval [t, T].
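Under a bang-bang reading of the feedback (assuming μ(ψ) = 1 while ψ > 0 and μ = 0 once ψ = 0, which is our interpretation of the stopping policy, not a formula taken from the text), the 'walk' can be simulated directly for a hypothetical objective:

```python
import numpy as np

# Sketch of the walk: start at t = 0 and keep moving right while the
# co-state psi(t) = x(T - t) = max over [t,T] of F, minus F(t), is positive.
T, N = 1.0, 4001
t = np.linspace(0.0, T, N)
Fv = np.sin(3 * np.pi * t) * np.exp(-t)     # hypothetical objective

psi = np.maximum.accumulate(Fv[::-1])[::-1] - Fv   # co-state on the t-grid

i = 0
while i < N - 1 and psi[i] > 1e-9:   # mu = 1: continue the walk
    i += 1                           # mu = 0 at psi = 0: stop
t_stop, F_stop = float(t[i]), float(Fv[i])
```

The walker halts at the smallest global maximizer t* ≈ 0.155 of this test objective, and F_stop equals the optimal value ν(T) = F*.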

Conclusions
Keeping track of one-sided improvements on an interval [0, T] in the form of adjoint variables x(T−t) and y(t), for all t ∈ [0, T], allows for a characterization of all solutions to the global optimization problem (P). The two-sided adjoint variable λ(t) = max{x(T−t), y(t)}, as the upper envelope of both one-sided adjoint variables, vanishes at a point t̂ of the interval if and only if that point is a solution of (P), so t̂ ∈ P. The adjoint variables are uniquely determined as solutions to the initial-value problems (2) and (10), and they can be obtained using a Picard iteration that usually terminates in a finite number of steps. Conceptually, the adjoint variables incorporate not only all the global information needed for solving (P) but also for solving subproblems of (P): a one-sided adjoint variable, say y(t), describes a ('stopping') policy for optimizing on a subinterval.

Appendix: Proof of Theorem 2.1

Let s_k be the threshold defined in Eq. (16), with the additional definition s₀ := 0. We now show the statement A(k): s_{k+1} > s_k whenever s_k < T, for all k ≥ 1. For this, note first that x₁ = Px₀ = Pφ, with which (17) yields x₁(s) = x₂(s) for all s ∈ [0, s₁ + ε₁], so necessarily s₂ ≥ s₁ + ε₁ > s₁. Thus, the statement A(1) is true. The following auxiliary result establishes an important monotonicity property for the sequence σ, useful in the sequel of the proof.
Proof All claims are implied by the validity of the statement B(j) for j ≥ 0. To show that B(j) holds for any nonnegative integer j, we use mathematical induction (see, e.g., [25]). The inequality in (17) is equivalent to x₀ ≤ x₁, while Eq. (18) immediately yields x₂ ≤ x₁. Using the telescopic sum, (17) and (18) together give that x₀ ≤ x₂. Analogously, we obtain x₃ ≥ x₂. Using the statement A(1) and substituting the already computed differences into the telescopic sum, we have therefore established the validity of the induction basis. In the induction step, we now show that if B(j) holds for some j ≥ 0, then B(j+1) must also be true. By virtue of B(j), one can compute the forward difference between two consecutive elements of σ, starting with x_{2j+3}, for all s ∈ [0, T]. Based on this, the forward difference between two consecutive elements of σ, starting with x_{2j+4}, follows for all s ∈ [0, T]. The second of these inequalities corresponds to the inequality in (20). To establish the validity of B(j+1), it therefore remains to be shown that x_{2j+2} ≤ x_{2j+4} and x_{2j+5} ≤ x_{2j+3}. Consider the first of these two inequalities. Using the telescopic-sum idea, together with Eq. (19) and B(j), one obtains that x_{2j+4} ≥ x_{2j+2}. The demonstration that x_{2j+5} ≤ x_{2j+3} proceeds analogously and is therefore omitted; this concludes the proof of Lemma A.1.
By Eqs. (17) and (18), it is φ = x₀ ≤ x₂ ≤ x₁. By virtue of Lemma A.1, if x_k = x_{k+1} (i.e., s_{k+1} = T), then x_k = x_{k+n} (i.e., s_{k+n} = T) for all n ≥ 1. In our proof of A(k) for k ≥ 1, we therefore consider the nontrivial case where s_k < T.
As in Eq. (20), the forward difference between two consecutive elements of σ, starting with an even element x_k = x_{2j+2}, is available for all s ∈ [0, T] and any integer j ≥ 0, and the definition of s_k in Eq. (16) yields x_k(s_k) = x_{k−1}(s_k). By the continuity of ϕ, there exists an ε_k ∈ ]0, T − s_k] such that x_k(s) > x_{k−1}(s) for all s ∈ ]s_k, s_k + ε_k[. But then 1_{x_k(ς) ≤ 0 < x_{k−1}(ς)} = 0 on ]s_k, s_k + ε_k[, which (by continuity) implies that x_{k+1}(s) = x_k(s) for all s ∈ [s_k, s_k + ε_k], whence (given that s₁ > 0, as shown earlier):

s_{k+1} ≥ s_k + ε_k > s_k, k = 2j + 2, j ≥ 0.    (21)

Similarly, as in Eq. (19), the forward difference between two consecutive elements of σ, starting with an odd element x_k = x_{2j+1}, is available for all s ∈ [0, T] and any integer j ≥ 0. As a result, using again the definition of s_k, the fact that x_k(s_k) = x_{k−1}(s_k) implies (by continuity) that there exists an ε_k in the interval ]0, T − s_k] such that x_k(s) < x_{k−1}(s), and therefore also 1_{x_{k−1}(s) ≤ 0 < x_k(s)} = 0, for all s ∈ ]s_k, s_k + ε_k[. Hence, x_{k+1}(s) = x_k(s) on [s_k, s_k + ε_k], resulting in

s_{k+1} ≥ s_k + ε_k > s_k, k = 2j + 1, j ≥ 0.    (22)
Combining the monotonicity of s_k in (21) and (22), (s_k)_{k=0}^∞ is an increasing sequence with upper bound T. As such it must converge ([26], p. 55), and since T is the smallest upper bound: lim_{k→∞} s_k = T.
The claims (i) and (ii) together imply that |R| = 1, i.e., there exists a unique solution to the initial-value problem (2'), which by construction has the same solution set R as the initial-value problem (2), thus concluding our proof.