Convergence of the Nelder-Mead method

We develop a matrix form of the Nelder-Mead simplex method and show that its convergence is related to the convergence of infinite matrix products. We then characterize the spectra of the involved matrices necessary for the study of convergence. Using these results, we discuss several examples of possible convergence or failure modes. Then, we prove a general convergence theorem for the simplex sequences generated by the method. The key assumption of the convergence theorem is proved in low-dimensional spaces up to 8 dimensions.


Introduction
We study the convergence of the Nelder-Mead simplex method [35] for the solution of the unconstrained minimization problem min f(x) (x ∈ R^n), where f is continuous. The Nelder-Mead method is widely used in derivative-free optimization and various application areas [2,6,21,28,42]. There are several forms and variants of the Nelder-Mead method. We use the version of Lagarias, Reeds, Wright and Wright [23]. The vertices of the initial simplex S are denoted by x_1, x_2, ..., x_{n+1} ∈ R^n. It is assumed that the vertices x_1, ..., x_{n+1} are ordered such that

f(x_1) ≤ f(x_2) ≤ ... ≤ f(x_{n+1})   (1)

and this condition is maintained during the iterations of the Nelder-Mead algorithm. Define the center x_c = (1/n) Σ_{i=1}^n x_i and x^(λ) = (1 + λ) x_c − λ x_{n+1}. The related evaluation points are the reflection point x_r = x^(1), the expansion point x_e = x^(2), the outside contraction point x_oc = x^(1/2) and the inside contraction point x_ic = x^(−1/2). Then one (major) iteration of the method consists of the following operations or inner steps:

Reflect
If f(x_1) ≤ f(x_r) < f(x_n), then replace x_{n+1} by x_r and goto 0.

Expand
If f(x_r) < f(x_1) and f(x_e) < f(x_r), then replace x_{n+1} by x_e and goto 0. If f(x_e) ≥ f(x_r), then replace x_{n+1} by x_r and goto 0.

Contract outside
If f(x_n) ≤ f(x_r) < f(x_{n+1}) and f(x_oc) ≤ f(x_r), then replace x_{n+1} by x_oc and goto 0.

Contract inside
If f(x_r) ≥ f(x_{n+1}) and f(x_ic) < f(x_{n+1}), then replace x_{n+1} by x_ic and goto 0.

Shrink
x_i ← (x_i + x_1)/2, recompute f(x_i) (for all i) and goto 0.
It is assumed that the above operations (or inner steps) are executed in the given order. Since the related logical conditions are mutually disjoint, any order of steps 1-5 results in the same output.
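The inner steps above can be summarized in code. The following is a minimal Python sketch of one major iteration of the Lagarias et al. variant; the function and variable names are ours, and it re-sorts the simplex on every call instead of using the insertion rule (2), which is sufficient for illustration. The coefficients λ = 1, 2, 1/2, −1/2 are those of the evaluation points x_r, x_e, x_oc, x_ic.

```python
import numpy as np

def nelder_mead_step(simplex, f):
    """One major iteration; `simplex` is an (n+1) x n array of vertices."""
    n = simplex.shape[1]
    fv = np.array([f(x) for x in simplex])
    order = np.argsort(fv)
    simplex, fv = simplex[order], fv[order]          # enforce ordering (1)
    xc = simplex[:n].mean(axis=0)                    # center of the n best vertices
    point = lambda lam: (1 + lam) * xc - lam * simplex[n]   # x(lambda)
    xr = point(1.0); fr = f(xr)
    if fv[0] <= fr < fv[n - 1]:                      # 1. reflect
        simplex[n] = xr
    elif fr < fv[0]:                                 # 2. expand
        xe = point(2.0)
        simplex[n] = xe if f(xe) < fr else xr
    elif fv[n - 1] <= fr < fv[n]:                    # 3. contract outside
        xoc = point(0.5)
        if f(xoc) <= fr:
            simplex[n] = xoc
        else:                                        # 5. shrink
            simplex = (simplex + simplex[0]) / 2
    else:                                            # 4. contract inside
        xic = point(-0.5)
        if f(xic) < fv[n]:
            simplex[n] = xic
        else:                                        # 5. shrink
            simplex = (simplex + simplex[0]) / 2
    return simplex

# quick sanity run on the toy function f(x) = ||x||^2
f = lambda x: float(x @ x)
S = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.3]])
for _ in range(200):
    S = nelder_mead_step(S, f)
```

On this strictly convex quadratic the vertices contract toward the minimizer at the origin, as the convergence results of [23] guarantee for n = 2.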
There are two rules that apply to reindexing after each iteration. If a nonshrink step occurs, then x_{n+1} is replaced by a new point v ∈ {x_r, x_e, x_oc, x_ic}, and v is inserted into the ordering with the highest possible index, the vertices with larger function values being shifted back by one position (2). If one of operations 1-4 is executed and the insertion rule (2) is used, the ordering operation can be skipped in the next iteration. If shrinking occurs, then the new vertices must be reordered according to (1). We adopt the following notations. The simplex of iteration k is denoted by S^(k) = [x^(k)_1, x^(k)_2, ..., x^(k)_{n+1}] ∈ R^{n×(n+1)} with vertices that satisfy the ordering condition

f(x^(k)_1) ≤ f(x^(k)_2) ≤ ... ≤ f(x^(k)_{n+1}).   (3)

The initial simplex is S^(0). The center, reflection, expansion, outside and inside contraction points of simplex S^(k) are denoted by x^(k)_c, x^(k)_r, x^(k)_e, x^(k)_oc and x^(k)_ic, respectively. The function values at the vertices x^(k)_j and at the points x^(k)_c, x^(k)_r, x^(k)_e, x^(k)_oc and x^(k)_ic are denoted by f^(k)_j and f^(k)_c, f^(k)_r, f^(k)_e, f^(k)_oc and f^(k)_ic, respectively. The insertion rule (2) guarantees that f^(k+1)_i ≤ f^(k)_i (i = 1, ..., n+1) holds for any of operations 1-4. However this is not true in the case of shrinking. If the function f is bounded from below on R^n and only a finite number of shrink iterations occur, then each sequence f^(k)_i converges to some f^∞_i for i = 1, ..., n+1 (see Lemma 3.3 of [23]).
The original Nelder-Mead paper [35] was published in 1965 and since then it has been cited over 31000 times (see Google Scholar). However only a few results are known on the convergence.
In 1998 McKinnon [29] constructed a strictly convex function f : R 2 → R with continuous derivatives for which the Nelder-Mead simplex algorithm converges to a nonstationary point.
Also in 1998 Lagarias, Reeds, Wright and Wright [23] proved convergence for strictly convex functions of one and two variables. For n = 2, they summarized the main results on p. 114 of [23]. In 1999 Kelley [19,20] gave a sufficient decrease condition for the average of the objective function values (evaluated at the vertices) and proved that if this condition is satisfied during the process, then any accumulation point of the simplices is a critical point of f.
In 2006 Han and Neumann [12] investigated the convergence and the effect of dimensionality on the Nelder-Mead method when it is applied to f (x) = x T x (x ∈ R n ). They also showed that the Nelder-Mead method deteriorates as n increases.
In 2012 Lagarias, Poonen, Wright [22] significantly improved the results of the earlier paper [23] for the restricted Nelder-Mead method, where expansion steps are not allowed. Let F be the class of twice-continuously differentiable functions R 2 → R with bounded level sets and everywhere positive definite Hessian. They proved that for any f ∈ F and any nondegenerate initial simplex S (0) , the restricted Nelder-Mead algorithm converges to the unique minimizer of f . Wright [43,44] raised several questions concerning the Nelder-Mead method such as the following: (a) Do the function values at all vertices necessarily converge to the same value? (b) Do all vertices of the simplices converge to the same point? (c) Why is it sometimes so effective (compared to other direct search methods) in obtaining a rapid improvement in f ? (d) One failure mode is known (McKinnon [29]) -but are there other failure modes? (e) Why, despite its apparent simplicity, should the Nelder-Mead method be difficult to analyze mathematically?
Although questions (a) and (b) were positively answered for one- and two-dimensional strictly convex functions by Lagarias et al. [22,23], no general answer is known as yet.
Our purpose is to analyze and prove the convergence of the simplex sequence generated by the method. The matrix formulation of the Nelder-Mead simplex algorithm, which is introduced in Section 2, represents the kth simplex as a product of transformation matrices, and so the convergence of the simplex sequence is related to the convergence of infinite matrix products. Hence the spectra of the occurring matrices, which are necessary for the convergence of the simplex sequence, are investigated in Section 3. Section 4 discusses several examples of possible convergence behavior or failure, and specifies the type of convergence we prove later. The main convergence theorem is proved in Section 5 under Assumption (A) of Section 5. The assumption is proved for 1 ≤ n ≤ 8 in Section 6. Related numerical data are given in the Appendix.
The results and examples may answer some of the questions raised by Wright [43,44]. Actually, Examples 4 and 5 answer questions (a) and (b) negatively. The examples of Section 4 answer question (d) positively. The main convergence theorem is related to questions (a) and (b). The connection of the Nelder-Mead method with infinite products of matrices may shed some light on question (e). This paper is an improvement of paper [10], where the main convergence result was proved under Assumption (A) and a second assumption on the spectra of the transformation matrices. The two assumptions were numerically checked only for n = 1, 2, 3. Here we prove the convergence without the second assumption, which now follows from Theorem 3 of Section 3. The removal of Assumption (A) is unlikely because of its connection to an algorithmically undecidable problem. This will be discussed at the end of Section 5.
The matrix form of the Nelder-Mead method

Each inner step of the algorithm replaces the current simplex S^(k) by S^(k) T^(α) P_j, where T^(α) is the transformation matrix of the operation with parameter α and the permutation matrix P_j realizes the insertion rule (2); S^(k) T^(α) P_j is the new simplex S^(k+1). Denote by P_{n+1} the set of all possible permutation matrices of order n + 1. In the case of shrinking the new simplex is S^(k) T_shr P, where T_shr = (1/2) I_{n+1} + (1/2) e_1 e^T and the permutation matrix P ∈ P_{n+1} is defined by the ordering condition (3). Hence for k ≥ 1, the kth simplex of the Nelder-Mead method is S^(k) = S^(0) B_k, where B_k = Π_{i=1}^k T_i P^(i) and T_i P^(i) ∈ T, the set of all possible transformation matrices. Note that T contains 3n + 3 + (n + 1)! matrices; all transformation matrices T_i P^(i) ∈ T are nonsingular and their column sums are 1. Such matrices have the following properties.

Lemma 1 (i) If A ∈ R^{n×n} is a matrix whose column sums are 1, then A has an eigenvalue λ = 1 with corresponding left eigenvector e^T. (ii) If A, B ∈ R^{n×n} are two matrices whose column sums are 1, then C = AB also has this property. (iii) If A ∈ R^{n×n} is a matrix whose column sums are 1, then ‖A‖ ≥ 1 in any induced matrix norm.
A matrix A is called left stochastic, if a ij ≥ 0 for all i, j and the column sums are 1. A matrix is called stochastic, if a ij ≥ 0 for all i, j and both the column sums and the row sums are 1. All matrices T shr P (P ∈ P n+1 ) and T (α) (−1 ≤ α ≤ 0) are left stochastic matrices.
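These properties are easy to check numerically. The following snippet (our illustration, not from the paper) verifies the unit column sums, the left eigenvector e^T, and the product property of Lemma 1 for the shrink matrix T_shr = (1/2) I_{n+1} + (1/2) e_1 e^T:

```python
import numpy as np

n = 3
e = np.ones(n + 1)
e1 = np.zeros(n + 1); e1[0] = 1.0
T_shr = 0.5 * np.eye(n + 1) + 0.5 * np.outer(e1, e)   # the shrink matrix

# (i) column sums are 1 and e^T is a left eigenvector for lambda = 1
assert np.allclose(T_shr.sum(axis=0), 1.0)
assert np.allclose(e @ T_shr, e)

# (ii) products of unit-column-sum matrices keep the property
rng = np.random.default_rng(1)
A = rng.random((n + 1, n + 1)); A /= A.sum(axis=0)    # a random left stochastic matrix
assert np.allclose((T_shr @ A).sum(axis=0), 1.0)

# (iii) an induced norm is at least 1; here the induced 1-norm (max column sum)
assert np.linalg.norm(T_shr, 1) >= 1.0
```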
A simplex S = [x_1, ..., x_{n+1}] is said to be nondegenerate if the (n+1) × (n+1) matrix [S; e^T] (the matrix S augmented with the row e^T) is nonsingular. Then S must be affinely independent, which is equivalent to (see, e.g., [15,31]) the linear independence of the vectors x_2 − x_1, ..., x_{n+1} − x_1. Hence rank [S; e^T] = n + 1. We always assume that the initial simplex S^(0) is nondegenerate. Since e^T B_k = e^T and [S^(k); e^T] = [S^(0); e^T] B_k is nonsingular, S^(k) is also nondegenerate.
Although S (k) ∈ R n×(n+1) we can relate the convergence of S (k) and {B k } as follows.
Assume that S^(k) → S^∞ and {B_k} is not convergent. Since {B_k} is bounded, it must have at least one accumulation point, say B^*, and a subsequence {B_{i_j}} ⊂ {B_k} such that B_{i_j} → B^* and S^{(i_j)} → S^(0) B^* = S^∞. Assume that there exists a second accumulation point B^{**} ≠ B^* and a subsequence {B_{k_j}} ⊂ {B_k} such that B_{k_j} → B^{**}. It follows that S^(0) B^* = S^(0) B^{**} and e^T B^* = e^T B^{**} = e^T. Since [S^(0); e^T] is nonsingular, we obtain that B^* = B^{**}, which is a contradiction. It follows that {B_k} converges.
If {B_k} is not bounded, then, as Examples 1 and 2 of Section 4 show, we can have convergence of the function values f^(k)_i to limit values that are not related to any extrema of the function f.
Hence we study the convergence of S^(k) through the convergence of {B_k}, or the convergence of the right infinite product B = Π_{i=1}^∞ T_i P^(i) (T_i P^(i) ∈ T). We use the following results and definitions from the theory of infinite matrix products (see, e.g., Hartfiel [13]). A right infinite product is an expression A_1 A_2 ... A_k A_{k+1} .... A set Σ of n × n matrices has the right convergence property (RCP) if all possible right infinite products Π_{i=1}^∞ A_i (A_i ∈ Σ) converge. It is easy to show (see, e.g., Hartfiel [13], p. 103) that if Σ is an RCP set, A_1, ..., A_k ∈ Σ and λ is an eigenvalue of A_1 A_2 ... A_k, then |λ| < 1, or λ = 1 and this eigenvalue is simple. Hence each matrix of Σ must satisfy this condition.
If Σ is an RCP set, then there is a vector norm ‖·‖ such that ‖A‖ ≤ 1 for all A ∈ Σ (see, e.g., [13]).
In any induced matrix norm, ‖T_i P^(i)‖ ≥ 1 holds for every T_i P^(i) ∈ T, and the norm of the expansion matrix T^(2) exceeds 1. Hence T as a whole is not an RCP set. However the eigenvalue-eigenvector structure of the transformation matrices makes it possible to identify a subset of T which might have the RCP property. The next section investigates the structure of the matrices T_i P^(i) ∈ T. Using these results we show several examples of possible convergence behavior or failure in Section 4. We also specify there the type of convergence we study in the rest of the paper. In Section 5 we identify a subset of T which might have the RCP property. Using this subset we give a sufficient condition under which the simplex sequences S^(k) converge in the specified sense.
There are many problems that may complicate the analysis of the Nelder-Mead method. Here we mention the following. The Nelder-Mead algorithm is a nonstationary iteration (see, e.g., Young [45]) of the form S^(k+1) = S^(k) T_k P^(k), where the matrices T_k P^(k) are not contractive (‖T_k P^(k)‖ ≥ 1). Operations 1-4 and the insertion rule (2) guarantee improvement only in the worst vertex value f_{n+1}. In the case of shrinking there is no guaranteed improvement at all. It is also notable that the selection of T_k P^(k) at iteration k depends only on the relative positions of the computed function values. Section 4 shows that, given an initial simplex S^(0), the Nelder-Mead algorithm may generate the same sequence S^(k) for different functions, such that lim_{k→∞} S^(k) has different meanings for the different functions.
Finally, we note that matrix T (α) appears in Lagarias et al. [23] (p. 149) but it is not exploited subsequently.

The spectra of the transformation matrices
In this section we study the spectra of the transformation matrices T^(α) P_j and T_shr P so that we can find a subset of T which has the RCP property. We then study the asymptotic behavior of the eigenvalues for n → ∞, which is important for the dimensionality effects. Han, Neumann and Xu [11] and Han and Neumann [12] obtained similar results. The similarities and connections will be discussed at the end of the section.
Theorem 1 (Cohn's rules [38], Thm. 11.5.3) Let p(z) = Σ_{i=0}^n a_i z^i be a polynomial of degree n. Denote by r and s the number of zeros of p inside the unit circle and on it, respectively. For the polynomial p_j the corresponding numbers are denoted by r_j and s_j, respectively. Then the following rules hold.

Theorem 4 (Langville and Meyer)

Since the eigenvalues of W and W^T coincide, the same result holds for the transposed matrix.

For a more precise result, we use a less known result of E. Landau [25] (see also [1,30] or [33]).

Theorem 5 (E. Landau [25]) Consider the polynomial p(z) = a_0 + a_1 z + ... + a_n z^n (a_0 a_n ≠ 0). If z is a zero of p(z) and t is any positive real number, then

Landau's theorem yields the following characterization of the eigenvalues of T^(α) P_j when n → ∞.

Lemma 4 Assume that α > 1 and consider the eigenvalues of T^(α) P_1 for n → ∞. Then the absolute values of the eigenvalues of T^(α) P_1 (the zeros of p_{n+1}(λ)) converge to 1.
Han and Neumann [12] investigated the Nelder-Mead method when it generates a sequence of simplices in R^n that follows a fixed step pattern for k ≥ k_0. They expressed the next incoming vertex v^(k+n) in a difference equation form with characteristic equation p(μ) = 0. The characteristic polynomial of A_k coincides with p(μ) and A_k is close to T^(α) P_1 ∈ R^{(n+1)×(n+1)}. Han, Neumann and Xu [11] investigated the zeros of p(μ) in the form of a two-parameter polynomial of degree n, using a different version of the Schur-Cohn criterion (see Marden [30]) and a different technique. Theorems 4.1, 4.2 and 5.1 of [11] are the results that can be related to this paper.
(i) Assume that |b| < 1. The polynomial (12) has one root in the interior of the unit disk and the remaining roots on the unit circle if (b+1)/a = −1. In this paper we investigated a set of polynomials p_ℓ(λ) whose parameters are α and ℓ (2 ≤ ℓ ≤ n + 1). In our case b = α and a = (1+α)/n, implying that (b + 1)/a = n for 2 ≤ ℓ ≤ n + 1 (see Theorem 3). Theorem 4.1 of [11] assumes that either (b + 1)/a = −1 or (b + 1)/a = n − 1. For b = α and ℓ = n + 1 (j = 1) the polynomial p_ℓ(λ) coincides with (13). Hence case (iv) of Theorem 3 also follows from case (ii) of Theorem 4.1 of [11]. In turn, case (iv) of Theorem 4.1 of [11] follows from case (vii) of Theorem 3. For ℓ ≠ n + 1, the two theorems are clearly different.

Examples of convergence behavior
Following McKinnon [29] and Han and Neumann [12] we investigate simple behavior patterns of the simplex sequences S^(k) generated by the Nelder-Mead method. If the simplex sequence S^(k) generated by the Nelder-Mead algorithm is convergent, say S^(k) → S^∞, then f^(k)_i → f(S^∞ e_i) (i = 1, 2, ..., n + 1) provided that f is continuous at the points S^∞ e_i (i = 1, 2, ..., n + 1). Note that if for some vector x, x^(k)_j → x (k → ∞, j = 1, 2, ..., n + 1), then S^(k) → x e^T, which is a rank-one matrix of special form.
We show examples where the incoming vertex v is inserted with the same fixed index j, and the type of v (reflection, expansion, outside contraction, or inside contraction point) is the same for k ≥ 0.³ Hence S^(k) = S^(0) [T^(α) P_j]^k for k ≥ 0.
Using the examples we can specify the type of convergence which is studied in the rest of the paper. Assume that |α| < 1. It follows from Theorem 3 (formula (10)) that if rank S^(0)_{j−1} ≥ 2, then lim_{k→∞} S^(k) cannot be of the form x e^T for some vector x, diam S^(k) does not tend to 0, and the f^(k)_i's do not converge to the same limit. For j = 1, 2, we can write T^(α) P_j in a block form with a block C. For |α| < 1, Theorem 3 implies that ρ(C) < 1, and so the corresponding components tend to 0 (i ∈ J).

³ Here we adopt the notations defined in Section 1.

In Example 3 the incoming vertex is x^(k)_ic, and three functions are given so that x^(0)_1 is not a stationary point of f or it is a minimum point of f. In the n-dimensional Example 4 the incoming vertex is x^(k)_ic, and lim_k x^(k)_c is not a stationary point of the three given functions f. In Example 5 the incoming vertex is x^(k)_ic, which replaces x^(k)_{n+1}, and diam S^(k) does not tend to 0. Two functions are given such that lim_k x^(k)_c is not a stationary point for the first function, while it is a saddle point for the second function.

Example 2 The expansion step is executed with the reflection point x^(k)_r as the incoming vertex infinitely many times if condition (15) holds. In this case S^(k) = S^(0) B_k, where B_k = [T^(1) P_1]^k. Since T^(1) P_1 has a 2 × 2 Jordan block belonging to λ = 1, B_k is not bounded. If condition (15) holds, then S^(k) is unbounded, while diam S^(k) is constant. It is easy to verify that condition (15) holds for the functions f_1(x, y), .... Examples 1 and 2 show that the assumption on the boundedness of {B_k} is justified.
McKinnon [29] proved the convergence behavior (17) of Example 3 for the convex function f_M with the initial simplex S^(0) of (16), where the common limit point [0, 0]^T of the simplex vertices is not a stationary point of f_M. Using the simplex S^(0) of (16) we show other functions that generate the same simplex sequence, while the limit point [0, 0]^T is either a stationary point or a nonstationary point depending on the particular function.
Note that for all three functions, we have the same limit point, and the simplex sequence depends on the initial simplex and the relative function value distribution.
Han and Neumann [12] investigated the behavior pattern (18) where the incoming point v is either x_oc or x_ic. Under this assumption they proved the convergence of the vertex sequences x^(k)_j. For n = 2, they gave an initial simplex S^(0) for which condition (18) with α_k ≡ 1/2 is fulfilled.

Example 4 Assume that x^(k)_ic is the incoming vertex infinitely many times such that condition (19) holds and n ≥ 2.
The function f_7(x) involves the factor (1 + sin(|x_n|)). Note that 0 < f_7(x) ≤ 1 + |x_n| and f_7 has no finite minimum. The functions f_8 and f_9 have an infinite number of global maximum and minimum points. Also note that diam S^(k) does not tend to 0.

Lemma 5 Assume that x_3 = ϕ(t) for some −1 ≤ t < 0.

Proof Assume that for some −1 ≤ t < 0, x_3 = ϕ(t). Then x_r = ϕ(−t), x_ic = ϕ(t/2), and (b) and (c) imply the claim.

Example 5 Consider the function f(x, y) = (1/4)(x + |x|) + (1/2)|x − |x|| + g(y). The examples show that for different functions the Nelder-Mead algorithm may generate the same simplex sequence whose limit vertices may be different from or equal to a stationary point of the function. They also show that in the case of convergence the simplex vertices x^(k)_j either converge to the same vector x or converge to different vectors, as shown by Examples 4 and 5.
Assume that the simplex vertices x^(k)_j (j = 1, 2, ..., n + 1) converge to the same vector x as k → ∞ and f is continuous at x. Then S^(k) → x e^T and f^(k)_i → f(x) (i = 1, ..., n + 1). Note that the properties f^(k)_i → f (i = 1, ..., n + 1) and diam S^(k) → 0 (k → ∞) were proved directly for strictly convex two-dimensional functions by Lagarias et al. [23] without relating the results to a stationary point of f. Except for Kelley [20], [19] and Lagarias et al. [22], no general result is known as yet on the convergence to a stationary point of the target function f.
On the basis of the preceding arguments we restrict our study to the convergence S^(k) → x e^T, which implies f^(k)_i → f(x) (i = 1, ..., n + 1) and the speed estimate (24).

The convergence of the Nelder-Mead method
The convergence result will be proved in several steps. First, using a fixed similarity transformation, we transform the matrices T_i P^(i) ∈ T to a common lower block triangular form and identify a subset of T that might have the RCP property. Next we prove a lemma on the convergence of products of lower block triangular matrices. Finally, under Assumption (A), we prove the convergence in Section 5.3.

A common similarity transformation
Since e T is a left eigenvector of each T i P (i) ∈ T , there exists a common similarity transformation that makes them block lower triangular (for a more general case, see Theorem 6.10 of Hartfiel [13]).

Lemma 6 For all T_i P^(i) ∈ T, the matrix F^{-1} T_i P^(i) F has the form

F^{-1} T_i P^(i) F = [ 1, 0^T ; b_i, C_i ],

where b_i ∈ R^n and C_i ∈ R^{n×n} are defined by T_i P^(i).
Note that the matrices C_i and their norms play the key role in the convergence proof. Accordingly we divide the set T into two disjoint sets W_1 and W_2. The matrices of W_1 correspond to the inside and outside contraction operations, when the incoming vertices are inserted in the first or the second position of the ordering (3), or they correspond to any shrinking operation. The matrices of W_2 correspond to the remaining operations of T. Theorem 3 and Corollary 1 imply that ρ(C_i) < 1 for T_i P^(i) ∈ W_1 and ρ(C_i) ≥ 1 for T_i P^(i) ∈ W_2. Note that for each matrix T_i P^(i) ∈ W_1, an induced matrix norm ‖·‖ exists such that ρ(C_i) ≤ ‖C_i‖ < 1. However for any T_i P^(i) ∈ W_2 and any induced matrix norm ‖·‖, only 1 ≤ ρ(C_i) ≤ ‖C_i‖ holds. Since max{ρ(C_i) : T_i P^(i) ∈ W_1} < 1, the set W_1 might be an RCP set if we find a proper induced norm for which ‖C_i‖ ≤ 1 for all T_i P^(i) ∈ W_1. In fact we make a stronger restriction in the form of Assumption (A).
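Such a common similarity can be demonstrated numerically. The sketch below (our illustration; the paper's particular F is not reproduced here) uses one admissible choice, namely an F whose inverse has e^T as its first row, and checks that F^{-1} T F is block lower triangular with leading entry 1 for any matrix T with unit column sums:

```python
import numpy as np

n = 3
# One admissible F: choose F^{-1} with first row e^T and identity rows below.
Finv = np.eye(n + 1)
Finv[0, :] = 1.0
F = np.linalg.inv(Finv)

# A stand-in transformation matrix with unit column sums (random, for illustration).
rng = np.random.default_rng(42)
T = rng.random((n + 1, n + 1))
T /= T.sum(axis=0)

B = Finv @ T @ F
assert np.isclose(B[0, 0], 1.0)        # leading eigenvalue 1
assert np.allclose(B[0, 1:], 0.0)      # zero first row to the right of it
C = B[1:, 1:]                          # the block C_i that governs convergence
```

The check works because e^T T = e^T (Lemma 1 (i)), so the first row of Finv @ T @ F equals the first row of Finv @ F, which is e_1^T.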

A lemma on the convergence of lower block triangular matrices
Lemma 7 For i ≥ 1, let

L_i = [ 1, 0^T ; b_i, C_i ],   ‖b_i‖ ≤ γ,   c_k = Π_{j=1}^k ‖C_j‖,

and assume that Σ_{k=1}^∞ c_k is convergent. Then Π_{i=1}^k L_i converges, as k → ∞, to a matrix of the form [ 1, 0^T ; x, 0 ] for some x.

Proof It is easy to see that

Π_{i=1}^k L_i = [ 1, 0^T ; x_k, Π_{j=1}^k C_j ],   x_k = Σ_{i=1}^k (C_1 ... C_{i−1}) b_i.

If Σ_{k=1}^∞ c_k is convergent, then c_k → 0. Hence Π_{j=1}^k C_j → 0 as k → ∞. Since s_k = Σ_{j=1}^k c_j is convergent, for any ε > 0 there is a number k_0 = k_0(ε) such that for m > k ≥ k_0, |s_m − s_k| < ε. Thus for m > k ≥ k_0, we obtain ‖x_m − x_k‖ ≤ γ |s_{m−1} − s_{k−1}| < γε. Hence x_k → x for some x.

If ‖C_j‖ ≤ q < 1 for j ≥ 1, then Π_{j=1}^k ‖C_j‖ ≤ q^k and the series Σ_{i=1}^∞ q^i is convergent.
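A quick numerical illustration of the lemma, with randomly generated blocks chosen by us so that ‖C_j‖_2 = q < 1 exactly (scaled orthogonal matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 3, 0.6
L = np.eye(n + 1)
for k in range(200):
    # a random block C with 2-norm exactly q, and a random vector b
    C = q * np.linalg.qr(rng.standard_normal((n, n)))[0]
    b = rng.standard_normal(n)
    T = np.eye(n + 1)
    T[1:, 0] = b
    T[1:, 1:] = C
    L = L @ T                       # right product L_1 L_2 ... L_k

# the product converges to [[1, 0^T], [x, 0]] for some vector x
assert np.allclose(L[0, 0], 1.0) and np.allclose(L[0, 1:], 0.0)
assert np.allclose(L[1:, 1:], 0.0, atol=1e-12)
x = L[1:, 0]                        # the limit vector x of Lemma 7
```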

The convergence theorem
Formula (25) implies that B_k is convergent if and only if L_k is convergent. The convergence of the Nelder-Mead algorithm will be proved under the following key condition.

Assumption (A): There is an induced matrix norm ‖·‖_ϑ and a constant q < 1 such that ‖C_i‖_ϑ ≤ q holds for all T_i P^(i) ∈ W_1.

Under Assumption (A) the matrix set C = {C_i : T_i P^(i) ∈ W_1} is an RCP set and all infinite products Π_{i=1}^∞ C^(i) (C^(i) ∈ C) converge to the zero matrix. By Lemma 7 the sequences B_k = Π_{i=1}^k T_i P^(i) and S^(k) = S^(0) B_k (T_i P^(i) ∈ W_1) are also convergent. Hence if Assumption (A) holds, then W_1 is an RCP set. We show in Section 6 and the Appendix that Assumption (A) holds at least for 1 ≤ n ≤ 8. Note again that the matrices of W_1 correspond to the inside and outside contraction operations, when the incoming vertices are inserted in the first or the second position of the ordering (3), or to any shrinking operation.

Theorem 9 Suppose that Assumption (A) is satisfied and S^(0) is nondegenerate. Let t_1(k) be the number of operations T_i P^(i) that belong to W_1, and t_2(k) the number of operations T_i P^(i) that belong to W_2 during the first k iterations of the Nelder-Mead method. Also assume that for some κ ∈ N, q^{1−κ} ≤ Q ≤ q^{−κ}, and that for some μ ∈ (0, 1), t_1(k) ≥ μk + κ t_2(k) holds (k ≥ k_0). Then the Nelder-Mead algorithm converges in the sense that S^(k) → x e^T for some vector x, with a convergence speed proportional to O(q^{μk}). If f is continuous at x, then f^(k)_i → f(x) (i = 1, ..., n + 1) holds as well.
Proof We first investigate the product (33). By assumption, t_1(k) is the number of those C_i's that satisfy ‖C_i‖_ϑ ≤ q, and the series Σ c_k is clearly convergent. Hence it follows from Lemma 7 that the product converges, and the estimate of the theorem holds with a suitable constant Γ_2 > 0. For higher dimensions, we can expect slower convergence, since Lemma 3 implies that q must approach 1.
Except for Lagarias et al. [23], we do not know what kinds of steps can follow each other when the Nelder-Mead method is applied to a function. Under Assumption (A), if the Nelder-Mead steps are taken only from W_1, then the algorithm converges in the sense of Theorem 9. Thus the convergence of the Han-Neumann case (18)-(19) also follows. In general the method also takes steps from W_2. If this occurs only a finite number of times, that is t_2(k) ≤ k_0, then t_1(k) ≥ k − k_0 and we can set μ = 1 in the theorem. If not, we must assume that the elements from W_1 counterbalance the effect of those from W_2. This is provided by the simple assumption t_1(k) ≥ μk + κ t_2(k).
The difficulty of Theorem 9 is to find a suitable norm · ϑ for which Assumption (A) holds. The reason for this is the following.
If all infinite products from a matrix set Σ converge, that is, Σ is an RCP set, then Σ is also product bounded (see, e.g., [13]). A set Σ of n × n matrices is product bounded if there is a constant β > 0 such that ‖A_1 ... A_k‖ ≤ β for all k and all A_1, ..., A_k ∈ Σ. A matrix set Σ is product bounded if and only if there exists a multiplicative matrix norm ‖·‖ such that ‖A‖ ≤ 1 for all A ∈ Σ (see, e.g., [4,13]).
If W 1 is an RCP set, then it is also product bounded. Blondel and Tsitsiklis [5] proved that the product boundedness of a finite matrix set Σ is algorithmically undecidable and it remains undecidable even in the special case, when Σ consists of only two matrices. Since product boundedness is a weaker property than the RCP, and yet it is algorithmically undecidable, it seems difficult to decide the RCP property in general. In Section 6 we present a technique that circumvents this problem for W 1 at least for n ≤ 8.

Convergence for 2 ≤ n ≤ 8
Using the result of Stein [40], Householder [17] proved that for each matrix A ∈ R^{m×m} with ρ(A) < 1, there is a matrix R such that ‖R A R^{-1}‖_2 < 1. A related result is given by Deutsch [7].
Here we need a matrix S such that ‖C_i‖_w = ‖S^{-1} C_i S‖_2 < 1 holds for all matrices T_i P^(i) ∈ W_1. A simultaneous diagonalization of these matrices would clearly do it. However the matrices of W_1 are not pairwise commuting and so they are not simultaneously diagonalizable (see, e.g., [9,34,37]). Hence we tried to solve the optimization problem

min_S max{ ‖S^{-1} C_i S‖_2 : T_i P^(i) ∈ W_1 }

using the standard matrix routines of Matlab R2013b and the 'fminsearch' (Nelder-Mead) algorithm starting from several initial points. All numerical computations were done on a PC with an Intel i7-8700 CPU @ 3.20GHz and the Windows 10 operating system. The numerical results presented in this section and the Appendix are given in Matlab's short format (scaled fixed point format with 5 digits). Since the number of possible (n + 1) × (n + 1) matrices T_i P^(i) is N = 3n + 3 + (n + 1)!, we only present the following computed quantities. The computed matrices S are presented in the Appendix for 2 ≤ n ≤ 8. The data of the table implies that Assumption (A) holds, and so Theorem 9 implies the convergence of the Nelder-Mead method for n = 2, ..., 8. Note that ρ_1 and ρ_2 approach 1 as indicated by Lemmas 3 and 4.
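The optimization step can be reproduced in outline. The following Python sketch uses scipy's Nelder-Mead routine in place of Matlab's fminsearch, and small toy stand-ins for the blocks C_i rather than the paper's actual matrices; it searches for a single S that certifies max_i ‖S^{-1} C_i S‖_2 < 1:

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-ins for the blocks C_i of W_1 (not the paper's matrices):
# each has spectral radius 0.5 or less, and we look for one S with
# max_i ||S^{-1} C_i S||_2 < 1, i.e. Assumption (A) for this toy set.
C_list = [np.array([[0.5, 0.7], [0.0, 0.4]]),
          np.array([[0.4, 0.0], [0.7, 0.5]])]

def objective(s):
    S = s.reshape(2, 2)
    if abs(np.linalg.det(S)) < 1e-8:
        return 1e6                      # penalize near-singular S
    Sinv = np.linalg.inv(S)
    return max(np.linalg.norm(Sinv @ C @ S, 2) for C in C_list)

res = minimize(objective, np.eye(2).ravel(), method='Nelder-Mead',
               options={'maxiter': 5000, 'xatol': 1e-10, 'fatol': 1e-12})
assert res.fun < 1.0   # a transformed norm certifying Assumption (A) was found
```

For the real problem the search space is (n+1)² parameters and all N matrices of W_1 enter the maximum, which is why the paper restarts the search from several initial points.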
If we exclude the expansion operations (T^(2) P_1, T^(1) P_1), we have smaller κ values, as shown in the corresponding table. Consequently we have a faster convergence speed, although the estimated speed still slows down for increasing n.

Summary
We analyzed the Nelder-Mead algorithm in the iterative form S^(k+1) = S^(k) T_k P^(k), where T_k P^(k) ∈ T is the matrix of the executed inner step at iteration k. Since the convergence of the sequence S^(k) clearly depends on the convergence of the infinite matrix products Π_{i=1}^∞ T_i P^(i), we used techniques from the theory of infinite matrix products [13]. First, we investigated the spectra of the matrices T_i P^(i); then, using a simultaneous similarity reduction of T to block lower triangular matrices, we proved a convergence result (Theorem 9) for the simplex sequence S^(k) to rank-one matrices of the form x e^T for some vector x. This implies the convergence f^(k)_i → f(x) (i = 1, ..., n + 1). The examples of Section 4 support the study of this type of convergence. The main idea of the convergence theorem is to identify a subset W_1 of the operations T that has the RCP property. This property follows from Assumption (A), which is proved for 1 ≤ n ≤ 8 using numerical optimization. It is not yet known if this bound can be increased in the same way or not. Theorem 9 has a deficiency in that x is not related to any stationary point of f (for a similar result, see also Lagarias et al. [23]). The results of Kelley [20], Lagarias et al. [22] and also the examples of Section 4 indicate that new techniques are to be developed for such results.