This article contains a survey of some well-known facts about the complexity of global optimization and also describes some results concerning average-case complexity.

Consider the following optimization problem. Given a class F of objective functions f defined on a compact subset of d-dimensional Euclidean space, the goal is to approximate the global minimum of f based on evaluation of the function at sequentially selected points. The focus will be on the error after n observations,

Δ_n(f) = f_n − min f,

where f_n is the smallest of the first n observed function values (other approximations besides f_n are often considered).

Complexity of optimization is usually studied in the worst-case or average-case setting. In order for a worst-case analysis to be useful, the class of objective functions F must be quite restricted. Consider the case where F is a subset of the continuous functions on a compact set. It is convenient to consider the class F = C^r([0,1]^d) of real-valued functions on [0,1]^d with continuous derivatives up to order r ≥ 0. Suppose that r > 0 and that the derivatives of order up to r are uniformly bounded over F. In this case Θ(ε^{−d/r}) function evaluations are needed to ensure that the error is at most ε for every f ∈ F; see [8].
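
To make the flavour of this bound concrete, the following is a minimal Python sketch (not taken from [8]) of a nonadaptive uniform-grid method for the Lipschitz case r = 1 on [0,1]^d; the example objective, its Lipschitz constant, and the tolerance are illustrative choices. A grid of spacing h guarantees error at most L·h·√d/2, so on the order of ε^{−d} evaluations suffice, matching Θ(ε^{−d/r}) with r = 1.

```python
import itertools
import math

def grid_minimize(f, d, eps, L):
    """Approximate the minimum of an L-Lipschitz function f on [0,1]^d to within eps."""
    h = 2.0 * eps / (L * math.sqrt(d))        # spacing so that L * (h * sqrt(d) / 2) <= eps
    m = max(1, math.ceil(1.0 / h))            # grid points per coordinate, ~ 1/eps
    centers = [(i + 0.5) / m for i in range(m)]
    best = min(f(x) for x in itertools.product(centers, repeat=d))
    return best, m ** d                       # m^d = Theta(eps^{-d}) evaluations

# Example: f has Lipschitz constant sqrt(2) with respect to the Euclidean norm.
f = lambda x: abs(x[0] - 0.3) + abs(x[1] - 0.7)
value, n_evals = grid_minimize(f, d=2, eps=0.05, L=math.sqrt(2))
print(value, n_evals)   # the returned value is within eps of the true minimum 0
```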

An adaptive algorithm is one for which the (n + 1)st observation point is determined as a function of the previous observations, while a nonadaptive algorithm chooses each point independently of the observed function values. In the worst-case setting, adaptation does not help much under quite general assumptions. If F is convex and symmetric (in the sense that −F = F), then the maximum error of an optimal adaptive algorithm using n observations is not smaller than the maximum error of an optimal nonadaptive method using n + 1 observations; see [4].

Virtually all global optimization methods in practical use are adaptive. For a survey of such methods see [6], [9]. The fact that the worst-case performance cannot be significantly improved with adaptation leads to consideration of alternative settings that may be more appropriate. One such setting is the average-case setting, in which a probability measure P on F is chosen. The object of study is then the sequence of random variables Δ_n(f), and the questions of interest include under what conditions (and for which algorithms) the error converges to zero and, for convergent algorithms, the speed of convergence. While the average-case error is often defined as the mathematical expectation of the error, it is useful to take a broader view and consider, for example, convergence in probability of a_n Δ_n for some normalizing sequence {a_n}.

In the average-case setting one can consider less restricted classes F than in the worst-case setting. As F gets larger, the worst case deviates more and more from the average case, but the worst-case behaviour may occur on only a small portion of the set F. Even for continuous functions the worst case is arbitrarily bad.

Most of what is known about the average-case complexity of optimization is in the one-dimensional setting under the Wiener probability measure on C([0, 1]). Under the Wiener measure, the increments f(t) − f(s) have a normal distribution with mean zero and variance t − s (for s < t), and increments over disjoint intervals are independent. Almost every f is nowhere differentiable, and the set of local minima is dense in the unit interval. One can thus think of the Wiener measure as corresponding to assuming ‘only’ continuity; i.e., a worst-case probabilistic assumption.
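
For intuition, here is a small simulation sketch (not part of the original article) that samples approximate paths under the Wiener measure by accumulating independent Gaussian increments; the use of numpy and the choice of grid size are convenience assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def wiener_path(n_steps):
    """Sample a Brownian path at t_k = k/n_steps using independent N(0, 1/n_steps) increments."""
    t = np.linspace(0.0, 1.0, n_steps + 1)
    increments = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=n_steps)
    f = np.concatenate(([0.0], np.cumsum(increments)))   # f(0) = 0, f(t) - f(s) ~ N(0, t - s)
    return t, f

t, f = wiener_path(10_000)
print("minimum over the sampled grid:", f.min(), "near t =", t[f.argmin()])
```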

K. Ritter proved in [5] that the best nonadaptive algorithms have average error of order n^{−1/2} after n function evaluations; the optimal order is achieved by observing at equally spaced points. Since the choice of each new observation point does not depend on any of the previous observations, the computation can be carried out in parallel. Thus, under the Wiener measure, the optimal nonadaptive order of convergence can be achieved by an algorithm whose computational cost grows linearly with the number of observations and which uses constant storage. This gives the baseline against which to compare adaptive algorithms.
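
A rough Monte Carlo check of this n^{−1/2} behaviour (an illustration, not the argument in [5]) can be carried out by observing simulated Brownian paths at equally spaced points and approximating the true path minimum on a much finer grid; the grid size and number of paths below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
FINE = 2 ** 14   # fine grid whose minimum stands in for the true minimum of the path

def average_error(n, n_paths=1000):
    """Estimate E[Delta_n] for observations at t = k/n, k = 0, ..., n, under the Wiener measure."""
    errors = np.empty(n_paths)
    for i in range(n_paths):
        increments = rng.normal(0.0, np.sqrt(1.0 / FINE), size=FINE)
        f = np.concatenate(([0.0], np.cumsum(increments)))   # path on the fine grid
        observed = f[:: FINE // n]                            # values at the n + 1 equally spaced points
        errors[i] = observed.min() - f.min()                  # Delta_n for this path
    return errors.mean()

for n in (16, 64, 256):
    print(n, average_error(n))   # estimates should shrink roughly like n^{-1/2}
```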

Recent studies (as of 2000) have formally established the improved power of adaptive methods in the average-case setting by analyzing the convergence rates of certain adaptive algorithms. A randomized algorithm is described in [1] with the property that for any 0 < δ < 1, a version can be constructed so that, under the Wiener measure, the error converges to zero at rate n^{−1+δ}. This algorithm maintains a memory of only two past observation values, and its computational cost grows linearly with the number of iterations. Therefore, the convergence rate of this adaptive algorithm improves on the nonadaptive rate n^{−1/2}, attaining n^{−1+δ} with only a constant increase in storage.

Algorithms based on a random model for the objective function are well suited to average-case analysis. H. Kushner proposed [3] a global optimization method based on modeling the objective function as a Wiener process. Let {z_n} be a sequence of positive numbers, and let the (n + 1)st point be chosen to maximize the probability that the new function value is less than the previously observed minimum minus z_n. This class of algorithms, often called P-algorithms, was given a formal justification by A. Žilinskas [7].
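
The idea can be written down concretely in one dimension: between consecutive observations the conditional distribution of a Brownian path is Gaussian, with linearly interpolated mean and Brownian-bridge variance, so the improvement probability has a closed form at each candidate point. The sketch below illustrates this scheme but is not the exact formulation of [3] or [7]; the candidate grid (used instead of maximizing over each interval in closed form), the constant z, and the toy objective are arbitrary choices.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_algorithm_step(obs, z):
    """obs: list of (t, f(t)) pairs sorted by t; return the next observation point."""
    record = min(y for _, y in obs)                  # best function value seen so far
    best_t, best_p = None, -1.0
    for (a, ya), (b, yb) in zip(obs[:-1], obs[1:]):
        for k in range(1, 50):                       # interior candidate points of (a, b)
            t = a + (b - a) * k / 50.0
            mu = ya + (yb - ya) * (t - a) / (b - a)          # conditional mean
            sigma = math.sqrt((t - a) * (b - t) / (b - a))   # conditional standard deviation
            p = phi((record - z - mu) / sigma)               # P(f(t) < record - z | data)
            if p > best_p:
                best_t, best_p = t, p
    return best_t

# Toy run: a smooth objective observed first at the two endpoints of [0, 1].
f = lambda t: (t - 0.3) ** 2 + 0.05 * math.sin(20.0 * t)
obs = [(0.0, f(0.0)), (1.0, f(1.0))]
for _ in range(20):
    t_next = p_algorithm_step(obs, z=0.01)
    obs.append((t_next, f(t_next)))
    obs.sort()
print("best observed value:", min(y for _, y in obs))
```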

By allowing the {z_n} to depend on the past observations instead of being a fixed deterministic sequence, it is possible to establish a much better convergence rate than that of the randomized algorithm described above. In [2] an algorithm was constructed with the property that the error converges to zero for any continuous function and, furthermore, the error is of order e^{−n c_n}, where {c_n} (a parameter of the algorithm) is a deterministic sequence that can be chosen to approach zero at an arbitrarily slow rate. Notice that the convergence rate is now almost exponential in the number of observations n. The computational cost of the algorithm grows quadratically and the storage grows linearly, since all past observations must be stored.

See also: Global optimization based on statistical models; Adaptive simulated annealing and its application to protein folding.