1 Introduction

We study the convergence of the Nelder-Mead simplex method [35] for the solution of the unconstrained minimization problem

$$ f\left( x\right) \rightarrow\min\quad\left( f:\mathbb{R}^{n}\rightarrow \mathbb{R}\right), $$

where f is continuous. The Nelder-Mead method is widely used in derivative-free optimization and various application areas [2, 6, 21, 28, 42].

There are several forms and variants of the Nelder-Mead method. We use the version of Lagarias, Reeds, Wright and Wright [23]. The vertices of the initial simplex S are denoted by \(x_{1},x_{2},\ldots ,x_{n+1} \in \mathbb {R}^{n}\). It is assumed that the vertices \(x_{1},\ldots ,x_{n+1}\) are ordered such that

$$ f\left( x_{1}\right) \leq f\left( x_{2}\right) \leq\cdots\leq f\left( x_{n+1}\right) $$
(1)

and this condition is maintained during the iterations of the Nelder-Mead algorithm. Define the center \(x_{c}=\frac {1}{n}{\sum }_{i=1}^{n}x_{i}\) and \(x\left (\lambda \right ) =\left (1+\lambda \right ) x_{c}-\lambda x_{n+1}\). The related evaluation points are

$$ x_{r}=x\left( 1\right) ,\quad x_{e}=x\left( 2\right) ,\quad x_{oc} =x\left( \frac{1}{2}\right) ,\quad x_{ic}=x\left( -\frac{1}{2}\right) . $$

Then one (major) iteration of the method consists of the following operations or inner steps:

  0. Ordering: Order the vertices of S such that \( f\left (x_{1}\right ) \leq \cdots \leq f\left (x_{n+1}\right ) \).

  1. Reflect: If \(f\left (x_{1}\right ) \leq f\left (x_{r}\right ) <f\left (x_{n}\right ) \), then replace \(x_{n+1}\) by \(x_{r}\) and goto 0.

  2. Expand: If \(f\left (x_{r}\right ) <f\left (x_{1}\right ) \) and \(f\left (x_{e}\right ) <f\left (x_{r}\right ) \), then replace \(x_{n+1}\) by \(x_{e}\) and goto 0. If \(f\left (x_{r}\right ) <f\left (x_{1}\right ) \) and \(f\left (x_{e}\right ) \geq f\left (x_{r}\right ) \), then replace \(x_{n+1}\) by \(x_{r}\) and goto 0.

  3. Contract outside: If \(f\left (x_{n}\right ) \leq f\left (x_{r}\right ) <f\left (x_{n+1}\right ) \) and \(f\left (x_{oc}\right ) \leq f\left (x_{r}\right ) \), then replace \(x_{n+1}\) by \(x_{oc}\) and goto 0.

  4. Contract inside: If \(f\left (x_{r}\right ) \geq f\left (x_{n+1}\right ) \) and \(f\left (x_{ic}\right ) <f\left (x_{n+1}\right ) \), then replace \(x_{n+1}\) by \(x_{ic}\) and goto 0.

  5. Shrink: Replace \(x_{i}\) by \(\left (x_{i}+x_{1}\right ) /2\) and compute \(f\left (x_{i}\right ) \) for \(i=2,\ldots ,n+1\); goto 0.

It is assumed that the above operations (or inner steps) are executed in the given order. Since the related logical conditions are mutually exclusive, any order of steps 1–5 results in the same output.
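These inner steps can be condensed into a short routine. The following Python sketch of one major iteration is only an illustration of the logic above (floating-point arithmetic, naive tie handling; the function and variable names are ours, not part of the method's specification):

```python
# Minimal sketch of one major Nelder-Mead iteration (Lagarias et al. variant
# described above); `simplex` is a list of n+1 points of R^n (lists of floats).
def nm_iteration(f, simplex):
    simplex = sorted(simplex, key=f)            # 0. ordering: f(x_1) <= ... <= f(x_{n+1})
    n = len(simplex) - 1
    best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
    xc = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]  # center of the n best
    x = lambda lam: [(1 + lam) * xc[i] - lam * worst[i] for i in range(n)]
    xr, xe, xoc, xic = x(1.0), x(2.0), x(0.5), x(-0.5)
    fr = f(xr)
    if f(best) <= fr < f(second_worst):         # 1. reflect
        simplex[-1] = xr
    elif fr < f(best):                          # 2. expand, or fall back to x_r
        simplex[-1] = xe if f(xe) < fr else xr
    elif fr < f(worst) and f(xoc) <= fr:        # 3. contract outside
        simplex[-1] = xoc
    elif fr >= f(worst) and f(xic) < f(worst):  # 4. contract inside
        simplex[-1] = xic
    else:                                       # 5. shrink toward the best vertex
        simplex = [best] + [[(p[i] + best[i]) / 2 for i in range(n)]
                            for p in simplex[1:]]
    return sorted(simplex, key=f)
```

For instance, iterating this routine on \(f\left(x\right) =x^{T}x\) from a nondegenerate starting simplex drives the best vertex toward the minimizer.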

There are two rules that apply to reindexing after each iteration. If a nonshrink step occurs, then \(x_{n+1}\) is replaced by a new point \(v\in \left \{ x_{r},x_{e},x_{oc},x_{ic}\right \} \). The following cases are possible:

$$ f\left( v\right) <f\left( x_{1}\right) ,\quad f\left( x_{1}\right) \leq f\left( v\right) \leq f\left( x_{n}\right) ,\quad f\left( x_{n}\right) <f\left( v\right) <f\left( x_{n+1}\right) . $$

Define

$$ j=\left\{ \begin{array} [c]{ll} 1, & \text{if }f\left( v\right) <f\left( x_{1}\right) ,\\ \max\left\{ 2\leq\ell\leq n+1:f\left( x_{\ell-1}\right) \leq f\left( v\right) \leq f\left( x_{\ell}\right) \right\} , & \text{otherwise.} \end{array} \right. $$

Then the new simplex vertices are

$$ x_{i}^{new}=x_{i}~\left( 1\leq i\leq j-1\right) ,~x_{j}^{new}=v,~x_{i}^{new}=x_{i-1}~\left( i=j+1,\ldots,n+1\right) . $$
(2)

This rule inserts v into the ordering with the highest possible index. If one of operations 1–4 is executed and the insertion rule (2) is used, the ordering operation can be skipped in the next iteration. If shrinking occurs, then

$$ x_{1}^{\prime}=x_{1},\quad x_{i}^{\prime}=\left( x_{i}+x_{1}\right) /2\quad(i=2,\ldots,n+1) $$

and the ordering operation is necessary in the next iteration. By convention, if \(f\left (x_{1}^{\prime }\right ) \leq f\left (x_{i}^{\prime }\right ) \) (i = 2,…,n + 1), then \(x_{1}^{new}=x_{1}\).
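The insertion rule (2) and the computation of j can be illustrated as follows (a Python sketch with 0-based indices; the names are ours, and we assume \(f\left(v\right) <f\left(x_{n+1}\right)\), as guaranteed for nonshrink steps):

```python
# Sketch of the insertion rule (2) with 0-based indexing: the incoming point v
# replaces the worst vertex and is inserted at the highest admissible position.
def insert(vertices, fvals, v, fv):
    n1 = len(vertices)                        # n1 = n + 1 ordered vertices
    if fv < fvals[0]:
        j = 0                                 # v becomes the new best vertex
    else:                                     # highest j with f(x_{j-1}) <= f(v) <= f(x_j)
        j = max(l for l in range(1, n1) if fvals[l - 1] <= fv <= fvals[l])
    # x_i keeps its place for i < j, v takes slot j, the rest shift; x_{n+1} drops out
    return vertices[:j] + [v] + vertices[j:n1 - 1], fvals[:j] + [fv] + fvals[j:n1 - 1]
```

With function values (1, 2, 3, 5) and f(v) = 2, the rule places v after the existing vertex of value 2, i.e., at the highest possible index.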

We adopt the following notations. The simplex of iteration k is denoted by \(S^{\left (k\right ) }=\left [ x_{1}^{\left (k\right ) },x_{2}^{\left (k\right ) },\ldots ,x_{n+1}^{\left (k\right ) }\right ] \in \mathbb {R}^{n\times \left (n+1\right ) }\) with vertices that satisfy the condition

$$ f\left( x_{1}^{\left( k\right) }\right) \leq f\left( x_{2}^{\left( k\right) }\right) \leq\cdots\leq f\left( x_{n+1}^{\left( k\right) }\right) \quad\left( k\geq0\right) . $$
(3)

The initial simplex is \(S^{\left (0\right ) }\). The center, reflection, expansion, outside contraction and inside contraction points of simplex \(S^{\left (k\right ) }\) are denoted by \(x_{c}^{\left (k\right ) }\), \(x_{r}^{\left (k\right ) }\), \(x_{e}^{\left (k\right ) }\), \(x_{oc}^{\left (k\right ) }\) and \(x_{ic}^{\left (k\right ) }\), respectively. The corresponding function values are denoted by \(f_{j}^{\left (k\right ) }=f\left (x_{j}^{\left (k\right ) }\right ) \) (j = 1,…,n + 1), \(f_{c}^{\left (k\right ) }=f\left (x_{c}^{\left (k\right ) }\right ) \), \(f_{r}^{\left (k\right ) }=f\left (x_{r}^{\left (k\right ) }\right ) \), \(f_{e}^{\left (k\right ) }=f\left (x_{e}^{\left (k\right ) }\right ) \), \(f_{oc}^{\left (k\right ) }=f\left (x_{oc}^{\left (k\right ) }\right ) \) and \(f_{ic}^{\left (k\right ) }=f\left (x_{ic}^{\left (k\right ) }\right ) \).

The insertion rule (2) guarantees that

$$ f_{i}^{\left( k+1\right) }\leq f_{i}^{\left( k\right) }\quad\left( i=1,2,\ldots,n+1\right) $$
(4)

holds for any of operations 1–4. However this is not true in the case of shrinking. If function f is bounded from below on \(\mathbb {R}^{n}\) and only a finite number of shrink iterations occur, then each sequence \(\left \{ f_{i}^{\left (k\right ) }\right \} \) converges to some \(f_{i}^{\infty }\) for i = 1,…,n + 1 (see Lemma 3.3 of [23]).

The original Nelder-Mead paper [35] was published in 1965 and since then it has been cited over 31000 times (see Google Scholar). However, only a few convergence results are known.

In 1998 McKinnon [29] constructed a strictly convex function \(f:\mathbb {R}^{2}\rightarrow \mathbb {R}\) with continuous derivatives for which the Nelder-Mead simplex algorithm converges to a nonstationary point.

Also in 1998, Lagarias, Reeds, Wright and Wright [23] proved convergence results for strictly convex functions of one and two variables. For n = 2, they summarized the main results as follows (see p. 114 of [23]):

  • The function values at all simplex vertices in the standard Nelder-Mead algorithm converge to the same value.

  • The simplices in the standard Nelder-Mead algorithm have diameters converging to zero.

In 1999 Kelley [19, 20] gave a sufficient decrease condition for the average of the objective function values (evaluated at the vertices) and proved that if this condition is satisfied during the process, then any accumulation point of the simplices is a critical point of f.

In 2006 Han and Neumann [12] investigated the convergence and the effect of dimensionality on the Nelder–Mead method when it is applied to \(f\left (x\right ) =x^{T}x\) (\(x\in \mathbb {R}^{n}\)). They also showed that the efficiency of the Nelder-Mead method deteriorates as n increases.

In 2012 Lagarias, Poonen, Wright [22] significantly improved the results of the earlier paper [23] for the restricted Nelder-Mead method, where expansion steps are not allowed. Let \(\mathcal {F}\) be the class of twice-continuously differentiable functions \(\mathbb {R}^{2}\rightarrow \mathbb {R}\) with bounded level sets and everywhere positive definite Hessian. They proved that for any \(f\in \mathcal {F}\) and any nondegenerate initial simplex \(S^{\left (0\right ) }\), the restricted Nelder-Mead algorithm converges to the unique minimizer of f.

Wright [43, 44] raised several questions concerning the Nelder-Mead method such as the following:

  1. (a)

    Do the function values at all vertices necessarily converge to the same value?

  2. (b)

    Do all vertices of the simplices converge to the same point?

  3. (c)

    Why is it sometimes so effective (compared to other direct search methods) in obtaining a rapid improvement in f?

  4. (d)

    One failure mode is known (McKinnon [29]) — but are there other failure modes?

  5. (e)

    Why, despite its apparent simplicity, should the Nelder-Mead method be difficult to analyze mathematically?

Although questions (a) and (b) were positively answered for one- and two-dimensional strictly convex functions by Lagarias et al. [22, 23], no general answer is known as yet.

Our purpose is to analyze and prove the convergence of the simplex sequence generated by the method. The matrix formulation of the Nelder-Mead simplex algorithm, which is introduced in Section 2, represents the k th simplex as a product of transformation matrices, and so the convergence of the simplex sequence is related to the convergence of infinite matrix products. Hence Section 3 investigates the spectra of the occurring matrices, which is necessary for the convergence of the simplex sequence. Section 4 discusses several examples of possible convergence behavior or failure, and specifies the type of convergence we prove later. The main convergence theorem is proved in Section 5 under Assumption (A). The assumption is proved for 1 ≤ n ≤ 8 in Section 6. Related numerical data is given in the Appendix.

The results and examples may answer some of the questions raised by Wright [43, 44]. Actually Examples 4 and 5 answer questions (a) and (b) negatively. Examples of Section 4 answer question (d) positively. The main convergence theorem is related to questions (b) and (a). The connection of the Nelder-Mead method with infinite products of matrices may shed some light on question (e).

This paper improves on [10], where the main convergence result was proved under Assumption (A) and a second assumption on the spectra of the transformation matrices. The two assumptions were numerically checked only for n = 1,2,3. Here we prove the convergence without the second assumption, which now follows from Theorem 3 of Section 3. The removal of Assumption (A) is unlikely because of its connection to an undecidable problem. This will be discussed at the end of Section 5.

2 A matrix form of the Nelder-Mead method and consequences

Assume that simplex \(S^{\left (k\right ) }=\left [ x_{1}^{\left (k\right ) },x_{2}^{\left (k\right ) },\ldots ,x_{n+1}^{\left (k\right ) }\right ] \) is such that condition (3) holds. If the incoming vertex v is of the form

$$ v=\frac{1+\alpha}{n}\sum\limits_{i=1}^{n}x_{i}^{\left( k\right) }-\alpha x_{n+1}^{\left( k\right) } $$

for some \(\alpha \in \left \{ 1,2,\frac {1}{2},-\frac {1}{2}\right \} \), we can define the transformation matrix

$$ T\left( \alpha\right) =\left[ \begin{array} [c]{cc} I_{n} & \frac{1+\alpha}{n}e\\ 0 & -\alpha \end{array} \right] \quad\left( e=\left[ 1,1,\ldots,1\right]^{T}\right) . $$

Since \(S^{\left (k\right ) }T\left (\alpha \right ) =\left [ x_{1}^{\left (k\right ) },\ldots ,x_{n}^{\left (k\right ) },x\left (\alpha \right ) \right ] \), we have to reorder the matrix columns according to the insertion rule (2). Define the permutation matrix

$$ P_{j}=\left[ e_{1},\ldots,e_{j-1},e_{n+1},e_{j},\ldots,e_{n}\right] \in\mathbb{R}^{\left( n+1\right) \times\left( n+1\right) }\quad\left( j=1,\ldots,n+1\right) . $$

Then \(S^{\left (k\right ) }T\left (\alpha \right ) P_{j}\) is the new simplex \(S^{\left (k+1\right ) }\). The following cases are possible:

  1. Reflection: \(S^{\left (k+1\right ) }=S^{\left (k\right ) }T\left (1\right ) P_{j}\) (j = 2,…,n)

  2a. Expansion (\(v=x_{e}^{\left (k\right ) }\)): \(S^{\left (k+1\right ) }=S^{\left (k\right ) }T\left (2\right ) P_{1}\)

  2b. Expansion (\(v=x_{r}^{\left (k\right ) }\)): \(S^{\left (k+1\right ) }=S^{\left (k\right ) }T\left (1\right ) P_{1}\)

  3. Outside contraction: \(S^{\left (k+1\right ) }=S^{\left (k\right ) }T\left (\frac {1}{2}\right ) P_{j}\) (j = 1,…,n + 1)

  4. Inside contraction: \(S^{\left (k+1\right ) }=S^{\left (k\right ) }T\left (-\frac {1}{2}\right ) P_{j}\) (j = 1,…,n + 1)

Denote by \(\mathcal {P}_{n+1}\) the set of all possible permutation matrices of order n + 1. In the case of shrinking the new simplex is

$$ S^{\left( k+1\right) }=S^{\left( k\right) }T_{shr}P\quad\left( P\in\mathcal{P}_{n+1}\right) , $$

where

$$ T_{shr}=\frac{1}{2}I_{n+1}+\frac{1}{2}e_{1}e^{T} $$

and the permutation matrix \(P\in \mathcal {P}_{n+1}\) is defined by the ordering condition (3).
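The action of these matrices can be checked in exact arithmetic. The following Python sketch (our helper names) verifies that right multiplication by \(T\left(1\right)\) appends the reflection point \(x\left(1\right) =2x_{c}-x_{n+1}\) as the last column and that \(T_{shr}\) realizes the shrink step:

```python
from fractions import Fraction as F

# Exact-arithmetic sketch of the matrices T(alpha) and T_shr of this section;
# a simplex S is stored as a list of its columns (vertices). Names are ours.
def T(alpha, n):                     # T(alpha), size (n+1) x (n+1)
    M = [[F(int(i == j)) for j in range(n + 1)] for i in range(n + 1)]
    for i in range(n):
        M[i][n] = (1 + alpha) / F(n) # last column: ((1+alpha)/n) e stacked on -alpha
    M[n][n] = -alpha
    return M

def T_shr(n):                        # T_shr = (1/2) I_{n+1} + (1/2) e_1 e^T
    return [[F(int(i == j), 2) + (F(1, 2) if i == 0 else F(0))
             for j in range(n + 1)] for i in range(n + 1)]

def times(cols, M):                  # columns of the product S * M
    return [[sum(M[k][j] * cols[k][i] for k in range(len(cols)))
             for i in range(len(cols[0]))] for j in range(len(M[0]))]

# S = [x_1, x_2, x_3] for n = 2
S = [[F(0), F(0)], [F(1), F(0)], [F(0), F(1)]]
refl = times(S, T(F(1), 2))          # last column becomes x(1) = 2 x_c - x_3
shrk = times(S, T_shr(2))            # column i becomes (x_i + x_1)/2, column 1 stays x_1
```

Here \(x_{c}=\left(\frac{1}{2},0\right)\), so the reflected point is \(\left(1,-1\right)\).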

Hence for k ≥ 1, the k th simplex of the Nelder-Mead method is

$$ S^{\left( k\right) }=S^{\left( k-1\right) }T_{k}P^{\left( k\right) }=S^{\left( 0\right) }B_{k}, $$
(5)

where

$$ B_{k}= \prod\limits_{i=1}^{k}T_{i}P^{\left( i\right) }\quad\left( T_{i}P^{\left( i\right) }\in\mathcal{T}\right) $$
(6)

and

$$ \begin{array}{@{}rcl@{}} \mathcal{T}=\left\{ T\left( \alpha\right) P_{j}:\alpha\in\left\{ -\frac {1}{2},\frac{1}{2}\right\} ,~j=1,\ldots,n+1\right\} \\ \cup\left\{ T_{shr}P:P\in\mathcal{P}_{n+1}\right\} \cup\left\{ T\left( 1\right) P_{j}:j=1,\ldots,n\right\} \cup\left\{ T\left( 2\right) P_{1}\right\} \end{array} $$
(7)

Note that \(\mathcal {T}\) contains \(3n+3+\left (n+1\right ) !\) matrices, all transformation matrices \(TP\in \mathcal {T}\) are nonsingular and their column sums are 1.
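For small n, the family (7) can be enumerated directly. The following Python sketch (our helper names) confirms the count \(3n+3+\left(n+1\right)!\) and the unit column sums for n = 3:

```python
from fractions import Fraction as F
from itertools import permutations
from math import factorial

# Enumeration sketch for a small n: the family (7) should contain exactly
# 3n + 3 + (n+1)! distinct matrices, all with unit column sums. Names are ours.
def T(alpha, n):
    M = [[F(int(i == j)) for j in range(n + 1)] for i in range(n + 1)]
    for i in range(n):
        M[i][n] = (1 + alpha) / F(n)
    M[n][n] = -alpha
    return M

def T_shr(n):
    return [[F(int(i == j), 2) + (F(1, 2) if i == 0 else F(0))
             for j in range(n + 1)] for i in range(n + 1)]

def col_perm(M, sigma):              # M * P, where column k of P is e_{sigma[k]}
    return tuple(tuple(row[s] for s in sigma) for row in M)

n = 3
family = set()
for alpha, js in [(F(-1, 2), range(n + 1)), (F(1, 2), range(n + 1)),
                  (F(1), range(n)), (F(2), [0])]:
    for j in js:                     # 0-based j here corresponds to P_{j+1}
        sigma = list(range(j)) + [n] + list(range(j, n))
        family.add(col_perm(T(alpha, n), sigma))
for sigma in permutations(range(n + 1)):
    family.add(col_perm(T_shr(n), sigma))   # all shrink matrices T_shr P

assert len(family) == 3 * n + 3 + factorial(n + 1)
assert all(sum(col) == 1 for M in family for col in zip(*M))
```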

Such matrices have the following properties.

Lemma 1

(i) If \(A\in \mathbb {R}^{n\times n}\) is a matrix whose column sums are 1, then A has the eigenvalue λ = 1 with corresponding left eigenvector \(e^{T}\).

(ii) If \(A,B\in \mathbb {R}^{n\times n}\) are two matrices whose column sums are 1, then C = AB also has this property.

(iii) If \(A\in \mathbb {R}^{n\times n}\) is a matrix whose column sums are 1, then \(\left \Vert A\right \Vert \geq 1\) in any induced matrix norm.

Proof

(i) By definition \(e^{T}A=\left [ {\sum }_{i=1}^{n}a_{i1},\ldots ,{\sum }_{i=1}^{n}a_{in}\right ] =1\cdot e^{T}\). (ii) \(e^{T}A=e^{T}\) and \(e^{T}B=e^{T}\) imply \(e^{T}AB=e^{T}B=e^{T}\). (iii) Since λ = 1 is an eigenvalue of A, \(\rho \left (A\right ) \geq 1\) and \(\left \Vert A\right \Vert \geq \rho \left (A\right ) \) imply \(\left \Vert A\right \Vert \geq 1\). □

A matrix A is called left stochastic if aij ≥ 0 for all i,j and its column sums are 1. A matrix is called stochastic if aij ≥ 0 for all i,j and both the column sums and the row sums are 1. All matrices TshrP (\(P\in \mathcal {P}_{n+1}\)) and \(T\left (\alpha \right ) \) (− 1 ≤ α ≤ 0) are left stochastic.

A simplex \(S=\left [ x_{1},\ldots ,x_{n+1}\right ] \) is said to be nondegenerate if the matrix

$$ M=\left[ x_{1}-x_{n+1},x_{2}-x_{n+1},\ldots,x_{n}-x_{n+1}\right] $$

is nonsingular. Then the vertices of S are affinely independent, which is equivalent (see, e.g., [15, 31]) to the linear independence of the vectors \(\left [ \begin {array} [c]{c} 1\\ x_{1} \end {array} \right ] ,\ldots ,\left [ \begin {array} [c]{c} 1\\ x_{n+1} \end {array} \right ] \). Hence rank\(\left (\left [ \begin {array} [c]{c} e^{T}\\ S \end {array} \right ] \right ) =n+1\). We always assume that the initial simplex \(S^{\left (0\right ) }\) is nondegenerate. Since \(e^{T}B_{k}=e^{T}\) and \(\left [ \begin {array} [c]{c} e^{T}\\ S^{\left (k\right ) } \end {array} \right ] =\left [ \begin {array} [c]{c} e^{T}\\ S^{\left (0\right ) } \end {array} \right ] B_{k}\) is nonsingular, \(S^{\left (k\right ) }\) is also nondegenerate.
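This characterization yields a simple computational test. The following Python sketch (exact rational arithmetic; the helper names are ours) decides nondegeneracy via the determinant of the bordered matrix \(\left[ e^{T};S\right]\):

```python
from fractions import Fraction as F

# Sketch of the nondegeneracy test: S = [x_1, ..., x_{n+1}] is nondegenerate
# exactly when the bordered matrix [e^T; S] of order n+1 is nonsingular.
def det(M):
    M = [row[:] for row in M]
    n, sign, d = len(M), 1, F(1)
    for k in range(n):               # Gaussian elimination with row pivoting
        piv = next((i for i in range(k, n) if M[i][k] != 0), None)
        if piv is None:
            return F(0)
        if piv != k:
            M[k], M[piv] = M[piv], M[k]
            sign = -sign
        d *= M[k][k]
        for i in range(k + 1, n):
            r = M[i][k] / M[k][k]
            M[i] = [M[i][j] - r * M[k][j] for j in range(n)]
    return sign * d

def nondegenerate(points):           # points = the columns x_1, ..., x_{n+1}
    M = [[F(1)] * len(points)] + \
        [[F(p[i]) for p in points] for i in range(len(points[0]))]
    return det(M) != 0
```

For example, the unit simplex in the plane passes the test, while three collinear points fail it.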

Although \(S^{\left (k\right ) }\in \mathbb {R}^{n\times \left (n+1\right ) }\) we can relate the convergence of \(\left \{ S^{\left (k\right ) }\right \} \) and \(\left \{ B_{k}\right \} \) as follows.

Lemma 2

If \(\left \{ B_{k}\right \} \) is bounded, then \(\left \{ S^{\left (k\right ) }\right \} \) converges to some \(S^{\infty }\) if and only if \(\left \{ B_{k}\right \} \) converges to some B.

Proof

If \(B_{k}\rightarrow B\), then \(S^{\left (k\right ) }=S^{\left (0\right ) }B_{k}\rightarrow S^{\left (0\right ) }B=S^{\infty }\). For the converse, assume that \(S^{\left (k\right ) }\rightarrow S^{\infty }\) and \(\left \{ B_{k}\right \} \) is not convergent. Since \(\left \{ B_{k}\right \} \) is bounded, it must have at least one accumulation point, say \(B^{\ast }\), and a subsequence \(\left \{ B_{i_{j} }\right \} \subset \left \{ B_{k}\right \} \) such that \(B_{i_{j}}\rightarrow B^{\ast }\) and \(S^{\left (i_{j}\right ) }\rightarrow S^{\left (0\right ) }B^{\ast }=S^{\infty }\). Assume that there exists a second accumulation point \(B^{\ast \ast }\neq B^{\ast }\) and a subsequence \(\left \{ B_{k_{j}}\right \} \subset \left \{ B_{k}\right \} \) such that \(B_{k_{j}}\rightarrow B^{\ast \ast } \). It follows that

$$ \left[ \begin{array} [c]{c} e^{T}\\ S^{\left( i_{j}\right) } \end{array} \right] \rightarrow\left[ \begin{array} [c]{c} e^{T}\\ S^{\left( 0\right) } \end{array} \right] B^{\ast}=\left[ \begin{array} [c]{c} e^{T}\\ S^{\left( 0\right) } \end{array} \right] B^{\ast\ast}\leftarrow\left[ \begin{array} [c]{c} e^{T}\\ S^{\left( k_{j}\right) } \end{array} \right] . $$

Since \(\left [ \begin {array} [c]{c} e^{T}\\ S^{\left (0\right ) } \end {array} \right ] \) is nonsingular, we obtain that \(B^{\ast }=B^{\ast \ast }\), which is a contradiction. It follows that \(\left \{ B_{k}\right \} \) converges. □

If \(\left \{ B_{k}\right \} \) is not bounded, then, as Examples 1 and 2 of Section 4 show, we can have convergence of the function values \(f_{i}^{\left (k\right ) }\) to limit values that are not related to any extrema of the function f.

Hence we study the convergence of \(\left \{ S^{\left (k\right ) }\right \} \) through the convergence of \(\left \{ B_{k}\right \} \) or the convergence of the right infinite product \(B={\prod }_{i=1}^{\infty } T_{i}P^{\left (i\right ) }\) (\(T_{i}P^{\left (i\right ) }\in \mathcal {T}\)).

We use the following results and definitions from the theory of infinite matrix products (see, e.g., Hartfiel [13]). A right infinite product is an expression of the form \(A_{1}A_{2}\cdots A_{k}A_{k+1}\cdots\). A set Σ of n × n matrices has the right convergence property (RCP) if all possible right infinite products \({\prod }_{i=1}^{\infty }A_{i}\) (\(A_{i}\in\Sigma\)) converge.

It is easy to show (see, e.g., Hartfiel [13] p. 103) that if Σ is an RCP set, A1,…,AkΣ and λ is an eigenvalue of A1A2Ak, then \(\left \vert \lambda \right \vert <1\) or λ = 1 and this eigenvalue is simple. Hence each matrix of Σ must satisfy this condition.

If Σ is an RCP set, then there is a vector norm \(\left \Vert \cdot \right \Vert \) such that \(\left \Vert A\right \Vert \leq 1\) for all AΣ (see, e.g., [13]).

In any induced matrix norm \(\left \Vert T_{i}P^{\left (i\right ) }\right \Vert \geq 1\) holds for every \(T_{i}P^{\left (i\right ) }\in \mathcal {T}\), and \(\left \Vert T\left (2\right ) \right \Vert >1\). Hence \(\mathcal {T}\) as a whole is not an RCP set. However, the eigenvalue-eigenvector structure of the transformation matrices makes it possible to identify a subset of \(\mathcal {T}\) which might have the RCP property. The next section investigates the structure of the matrices \(T_{i}P^{\left (i\right ) } \in \mathcal {T}\). Using these results, we show several examples of possible convergence behavior or failure in Section 4, where we also specify the type of convergence we study in the rest of the paper. In Section 5 we identify such a subset of \(\mathcal {T}\) and give a sufficient condition under which the simplex sequences \(\left \{ S^{\left (k\right ) }\right \} \) converge in the specified sense.

There are many problems that may complicate the analysis of the Nelder-Mead method. Here we mention the following. The Nelder-Mead algorithm is a nonstationary iteration (see, e.g., Young [45]) of the form

$$ S^{\left( k\right) }=S^{\left( k-1\right) }T_{k}P^{\left( k\right) }\quad\left( T_{k}P^{\left( k\right) }\in\mathcal{T},~k=1,2,\ldots\right) , $$

where the matrices \(T_{k}P^{\left (k\right ) }\) are not contractive (\(\left \Vert T_{k}P^{\left (k\right ) }\right \Vert \geq 1\)). Operations 1–4 and the insertion rule (2) guarantee only improvement in the worst vertex (\(f_{n+1}^{\left (k+1\right ) }\leq f_{n+1}^{\left (k\right ) }\)). In the case of shrinking there is no guaranteed improvement at all. It is also notable that the selection of \(T_{k}P^{\left (k\right ) }\) at iteration k depends only on the positions of \(f_{r}^{\left (k\right ) }\), \(f_{e}^{\left (k\right ) }\), \(f_{oc}^{\left (k\right ) }\) and \(f_{ic}^{\left (k\right ) }\) relative to the ordering (3) of \(f_{i}^{\left (k\right ) }\)’s. Example 3 of Section 4 shows that given an initial simplex \(S^{\left (0\right ) }\), the Nelder-Mead algorithm may generate the same sequence \(\left \{ S^{\left (k\right ) }\right \} \) for different functions such that \(\lim _{k\rightarrow \infty }S^{\left (k\right ) }\) has different meanings for the different functions.

Finally, we note that matrix \(T\left (\alpha \right ) \) appears in Lagarias et al. [23] (p. 149) but it is not exploited subsequently.

3 The spectra of the transformation matrices

In this section we study the spectra of the transformation matrices \(T\left (\alpha \right ) P_{j}\) and TshrP in order to find a subset of \(\mathcal {T}\) that has the RCP property. We then study the asymptotic behavior of the eigenvalues as \(n\rightarrow \infty \), which is important for understanding the dimensionality effects. Han, Neumann and Xu [11] and Han and Neumann [12] obtained similar results. The similarities and connections will be discussed at the end of the section.

We use the following result from Rahman and Schmeisser [38], which is one of the several Schur-Cohn type theorems or criteria (for others, see Marden [30], Henrici [14], Sheil-Small [39], Barnett [3]).

Definition 1

Given a polynomial \(p\left (z\right ) =a_{0}+a_{1} z+\cdots +a_{n}z^{n}\) of degree n, we associate with it the polynomial

$$ p^{\ast}\left( z\right) :=z^{n}\overline{p\left( 1/\overline{z}\right) }=\overline{a}_{n}+\overline{a}_{n-1}z+\cdots+\overline{a}_{0}z^{n}. $$

If \(p\left (z\right ) \equiv \sigma p^{\ast }\left (z\right ) \), where \(\left \vert \sigma \right \vert =1\), then p is said to be self-inversive.

Theorem 1

(Cohn’s rules [38], Thm. 11.5.3) Let \(p\left (z\right ) ={\sum }_{i=0}^{n}a_{i}z^{i}\) be a polynomial of degree n. Denote by r and s the number of zeros of p inside the unit circle and on it, respectively. For the polynomial \(p_{1}\) defined in each rule below, the corresponding numbers are denoted by \(r_{1}\) and \(s_{1}\), respectively. Then the following rules hold.

  1. Rule 1

    If \(\left \vert a_{0}\right \vert >\left \vert a_{n}\right \vert \), then \(p_{1}\left (z\right ) :=\overline {a}_{0}p\left (z\right ) -a_{n}p^{\ast }\left (z\right ) \) is not identically zero, and we have \(\deg p_{1}<\deg p\), r1 = r, and s1 = s.

  2. Rule 2

    If \(\left \vert a_{0}\right \vert <\left \vert a_{n}\right \vert \), then \(p_{1}\left (z\right ) :=\left [ \overline {a}_{n}p\left (z\right ) -a_{0}p^{\ast }\left (z\right ) \right ] /z\) is of degree n − 1, r1 = r − 1, and s1 = s.

  3. Rule 3

    If there is an index kn/2 such that

    $$ a_{0}=\sigma\overline{a}_{n},~a_{1}=\sigma\overline{a}_{n-1},\ldots ,~a_{k-1}=\sigma\overline{a}_{n-k+1},\quad a_{k}\not =\sigma\overline{a}_{n-k}\quad\left( \left\vert \sigma\right\vert =1\right) , $$

    then define \(b:=\left (a_{n-k}-\sigma \overline {a}_{k}\right ) /a_{n}\),

    $$ g\left( z\right) :=\left( z^{k}+\frac{2b}{\left\vert b\right\vert }\right) p\left( z\right) ,\quad g_{1}\left( z\right) :=\overline{g}\left( 0\right) g\left( z\right) -\overline{g}^{\ast}\left( 0\right) g^{\ast }\left( z\right) , $$

    and

    $$ p_{1}\left( z\right) :=\frac{1}{z}\left[ g_{1}^{\ast}\left( 0\right) g_{1}\left( z\right) -g_{1}\left( 0\right) g_{1}^{\ast}\left( z\right) \right] , $$

    which yields that p1 is of degree n − 1, r1 = r − 1, and s1 = s.

  4. Rule 4

    If p is self-inversive, then \(p_{1}\left (z\right ) :=np\left (z\right ) -zp^{\prime }\left (z\right ) \) is not identically zero, and we have \(\deg p_{1}<\deg p\) and r1 = r; furthermore, s = n − 2r.

We also use the classical Eneström-Kakeya theorem, which can be found in several sources (see, e.g., Rahman, Schmeisser [38], Corollary 8.3.5, Sheil-Small [39], Marden [30]).

Theorem 2

(Eneström-Kakeya [8, 18]) A polynomial \(p\left (z\right ) ={\sum }_{i=0}^{n}a_{i}z^{i}\) with positive coefficients has all its zeros in the annulus

$$ R_{1}\leq\left\vert z\right\vert \leq R_{2}, $$

where \(R_{1}=\min \limits _{0\leq i\leq n-1}a_{i}/a_{i+1}\) and \(R_{2}=\max \limits _{0\leq i\leq n-1}a_{i}/a_{i+1}\).
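As a quick numerical illustration (the helper name is ours), the bounds can be checked on a polynomial with known zeros:

```python
# A quick numerical illustration of the Enestrom-Kakeya annulus (Theorem 2).
def ek_annulus(coeffs):              # coeffs = [a_0, ..., a_n], all positive
    ratios = [coeffs[i] / coeffs[i + 1] for i in range(len(coeffs) - 1)]
    return min(ratios), max(ratios)

# 6z^2 + 5z + 1 = (2z + 1)(3z + 1) has the zeros -1/2 and -1/3
R1, R2 = ek_annulus([1.0, 5.0, 6.0])
assert R1 <= 1 / 3 <= 1 / 2 <= R2    # both zero moduli lie in [R1, R2]
```

For the polynomial \(q\left(\lambda\right) ={\sum }_{i=0}^{n-1}\left(i+1\right)\lambda^{i}\) used later in the proof of Theorem 3, the ratios are \(\left(i+1\right)/\left(i+2\right)\), so \(R_{1}=\frac{1}{2}\) and \(R_{2}=\frac{n-1}{n}<1\).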

Matrices \(T\left (\alpha \right ) \) and Tshr have a simple eigenvalue-eigenvector structure. When they are multiplied by a permutation matrix other than \(I_{n+1}\), this may change. For example, if n = 2, \(T\left (1\right ) \) is an involution, while \(T\left (1\right ) P_{2}\) (reflection) is a 6-involutory matrix (for such matrices, see Trench [41]).

Theorem 3

The characteristic polynomial of matrix \(T\left (\alpha \right ) P_{j}\) (1 ≤ jn + 1) is

$$ p_{n+1}\left( \lambda\right) =\left( 1-\lambda\right)^{j-1} p_{n+2-j}\left( \lambda\right) , $$
(8)

where

$$ p_{n+2-j}\left( \lambda\right) =\lambda^{n+2-j}-c\sum\limits_{i=1}^{n+1-j} \lambda^{i}+\alpha\quad\left( c=\frac{1+\alpha}{n}\right) . $$
(9)
  1. (i)

If j = 1, then \(p_{n+1}\left (\lambda \right ) \) has at least one zero λ = 1.

  2. (ii)

If α = 1 and j = 1, then \(p_{n+1}\left (\lambda \right ) \) has at least two zeros λ = 1.

  3. (iii)

If α > − 1 and j ≥ 2, then there are exactly j − 1 zeros λ = 1.

  4. (iv)

    For \(\left \vert \alpha \right \vert <1\) and j = 1, \(p_{n+1}\left (\lambda \right ) \) has exactly one zero λ1 = 1, while the remaining n roots are in the open unit disk.

  5. (v)

    If \(\left \vert \alpha \right \vert <1\) and 2 ≤ jn + 1, then all zeros of \(p_{n+2-j}\left (\lambda \right ) \) are in the open unit disk.

  6. (vi)

    If α = 1 and 1 ≤ jn + 1, then all zeros of \(p_{n+2-j}\left (\lambda \right ) \) are on the unit circle.

  7. (vii)

    If α > 1 and j = 1, then \(p_{n+1}\left (\lambda \right ) \) has a second zero in the interval \(\left (1,1+\frac {1+\alpha }{n}\right ) \) and all its zeros are in the annulus \(1\leq \left \vert \lambda \right \vert \leq 1+\alpha \).

Proof

For 1 ≤ jn + 1,

$$ T\left( \alpha\right) P_{j}=\left[ \begin{array} [c]{cc} I_{j-1} & ce{e_{1}^{T}}\\ 0 & A_{n+2-j} \end{array} \right] , $$
(10)

where

$$ A_{n+2-j}=\left[ \begin{array} [c]{ccccc} c & 1 & 0 & {\cdots} & 0\\ c & 0 & {\ddots} & {\ddots} & \vdots\\ {\vdots} & {\vdots} & {\ddots} & {\ddots} & 0\\ c & 0 & {\cdots} & 0 & 1\\ -\alpha & 0 & {\cdots} & 0 & 0 \end{array} \right] . $$

Since An+ 2−j is a companion matrix (for this form, see, e.g., [16]), its characteristic polynomial is

$$ p_{n+2-j}\left( \lambda\right) =\lambda^{n+2-j}-c\sum\limits_{i=1}^{n+1-j} \lambda^{i}+\alpha\quad\left( 1\leq j\leq n+1\right) $$

and the characteristic polynomial of \(T\left (\alpha \right ) P_{j}\) is

$$ \det\left( T\left( \alpha\right) P_{j}-\lambda I_{n+1}\right) =\left( 1-\lambda\right)^{j-1}p_{n+2-j}\left( \lambda\right) . $$

Note that \(p_{n+2-j}\left (1\right ) =\frac {j-1}{n}\left (\alpha +1\right ) \). If j = 1, \(p_{n+1}\left (1\right ) =0\), that is, λ1 = 1, which proves (i). Since \(p_{n+1}^{\prime }\left (1\right ) =\frac {1-\alpha } {2}\left (n+1\right ) \), there is a second zero λ2 = 1 if α = 1 (Claim (ii)). If α > − 1 and j ≥ 2, then \(p_{n+2-j}\left (1\right ) =\left (j-1\right ) c>0\), proving (iii). For proving (iv) and (v) we shall consider real polynomials of the form

$$ p\left( \lambda\right) =\alpha-c\sum\limits_{i=1}^{m-1}\lambda^{i}+\lambda^{m} \quad\left( c=\frac{1+\alpha}{n}\right) , $$

where 2 ≤ mn + 1. Since \(p^{\ast }\left (\lambda \right ) =1-c{\sum }_{i=1}^{m-1}\lambda ^{i}+\alpha \lambda ^{m}\) and \(\left \vert \alpha \right \vert <1\), we can apply Rule 2 of Theorem 1:

$$ p_{1}\left( \lambda\right) =\left[ 1\cdot p\left( \lambda\right) -\alpha p^{\ast}\left( \lambda\right) \right] /\lambda=\left( 1-\alpha^{2}\right) \lambda^{m-1}+c\left( \alpha-1\right) {\sum}_{i=0}^{m-2}\lambda^{i}. $$

Dividing \(p_{1}\left (\lambda \right ) \) by \(c\left (1-\alpha \right ) >0\), one obtains

$$ p_{1}\left( \lambda\right) =n\lambda^{m-1}-\sum\limits_{i=0}^{m-2}\lambda^{i}=n\lambda^{m-1}-\sum\limits_{i=1}^{m-2}\lambda^{i}-1. $$

For \(\left \vert \lambda \right \vert \geq 1\),

$$ \begin{array}{@{}rcl@{}} \left\vert p_{1}\left( \lambda\right) \right\vert & =&\left\vert \lambda\right\vert^{m-1}\left\vert n-{\sum}_{i=1}^{m-1}\frac{1}{\lambda^{i} }\right\vert \geq\left\vert \lambda\right\vert^{m-1}\left( n-{\sum}_{i=1}^{m-1}\frac{1}{\left\vert \lambda\right\vert^{i}}\right) \\ & \geq&\left\vert \lambda\right\vert^{m-1}\left( n-m+1\right) . \end{array} $$

If m ≤ n (j ≥ 2), then \(p_{1}\left (\lambda \right ) \) has no root with \(\left\vert\lambda\right\vert\geq 1\), so r1 = m − 1 and s1 = 0. Hence, by Rule 2, \(p\left (\lambda \right ) =0\) has m zeros inside the open unit disk, which proves (v). If m = n + 1, then \(p_{1}\left (\lambda \right ) =n\lambda ^{n}-{\sum }_{i=1}^{n-1}\lambda ^{i}-1\). For \(\left \vert \lambda \right \vert >1\), \(\left \vert p_{1}\left (\lambda \right ) \right \vert >0\), while \(p_{1}\left (1\right ) =0\). We must prove that there is no other root on the unit circle, that is, s1 = 1. If n = 2, then \(p_{1}\left (\lambda \right ) =2\lambda ^{2}-\lambda -1=0\) has the solutions λ1 = 1 and \(\lambda _{2}=-\frac {1}{2}\). Hence r1 = 1 and s1 = 1, and it follows that r = 2 and s = s1 = 1. For general n, it is easy to check that

$$ p_{1}\left( \lambda\right) =n\lambda^{n}-\sum\limits_{i=1}^{n-1}\lambda^{i}-1=\left( \lambda-1\right) \left( \sum\limits_{i=0}^{n-1}\left( i+1\right) \lambda^{i}\right) =\left( \lambda-1\right) q\left( \lambda\right) . $$

Polynomial \(q\left (\lambda \right ) \) has positive real coefficients ai = i + 1 (i = 0,1,…,n − 1),

$$ R_{1}=\frac{1}{2}\leq\frac{a_{i}}{a_{i+1}}=\frac{i+1}{i+2}\leq\frac{n-1} {n}=R_{2}<1, $$

and the Eneström-Kakeya theorem implies that \(q\left (\lambda \right ) \) has all its zeros in the annulus \(\frac {1}{2}\leq \left \vert \lambda \right \vert \leq \frac {n-1}{n}<1\). Hence r1 = n − 1 and s1 = 1, and so r = n and s = 1, which proves (iv).

Assume now that α = 1 and 1 ≤ jn. Using the notation = n + 2 − j we have

$$ p\left( \lambda\right) =1-\frac{2}{n}\sum\limits_{i=1}^{\ell-1}\lambda^{i} +\lambda^{\ell}\quad\left( c=\frac{2}{n}\right) , $$

which is self-inversive (see Definition 1). Thus we apply Rule 4 of Cohn (see Theorem 1):

$$ p_{1}\left( \lambda\right) =\ell p\left( \lambda\right) -\lambda p^{\prime}\left( \lambda\right) =\ell-\frac{2}{n}\sum\limits_{i=1}^{\ell-1}\left( \ell-i\right) \lambda^{i}. $$

If \(\left \vert \lambda \right \vert <1\), then

$$ \left\vert p_{1}\left( \lambda\right) \right\vert \geq\ell-\frac{2}{n} \sum\limits_{i=1}^{\ell-1}\left( \ell-i\right) \left\vert \lambda\right\vert ^{i}>\ell-\frac{2}{n}\sum\limits_{i=1}^{\ell-1}\left( \ell-i\right) =\frac{1} {n}\ell\left( n-\ell+1\right) \text{.} $$

Hence \(p_{1}\left (\lambda \right ) \) has no zero inside the unit disk, so r1 = 0 = r, and the number of zeros of \(p\left (\lambda \right ) \) on the unit circle is s = ℓ. Claim (vi) also follows from a result of Lakatos and Losonczi [24].

Since for j = 1 and any α, \(p_{n+1}\left (\lambda \right ) =\left (\lambda -1\right ) {\sum }_{i=0}^{n}\left (-\alpha +ic\right ) \lambda ^{i}\), we investigate \(q\left (\lambda \right ) ={\sum }_{i=0}^{n}\left (-\alpha +ic\right ) \lambda ^{i}\) with \(q\left (0\right ) =-\alpha \). By definition \(q^{\ast }\left (\lambda \right ) ={\sum }_{i=0}^{n}\left (-\alpha +\left (n-i\right ) c\right ) \lambda ^{i}\) and \(q^{\ast }\left (0\right ) =-\alpha +nc=1\). Since α > 1, we can apply Rule 1 of Theorem 1:

$$ q_{1}\left( \lambda\right) =\sum\limits_{i=0}^{n-1}\left( \alpha^{2}-1\right) \left( 1-\frac{i}{n}\right) \lambda^{i}. $$

The coefficients of \(q_{1}\left (\lambda \right ) \) are positive and

$$ 1\leq\frac{a_{i}}{a_{i+1}}=\frac{n-i}{n-i-1}\leq2. $$

The Eneström-Kakeya theorem implies that the zeros of \(q_{1}\left (\lambda \right ) \) are in the annulus \(1\leq \left \vert \lambda \right \vert \leq 2\). Hence \(q_{1}\left (\lambda \right ) \) has no zero inside the unit disk; since q and q1 have the same number of zeros inside the unit disk, r1 = 0 = r. Cauchy’s classical estimate yields the annulus \(1\leq \left \vert \lambda \right \vert \leq 1+\alpha \) for the zeros of \(p_{n+1}\left (\lambda \right ) \), if \(\alpha >\frac {1+\alpha }{n}=c\), which is satisfied for n ≥ 2. Note that \(q\left (1\right ) =\frac {1-\alpha }{2}\left (n+1\right ) <0\) for α > 1. Assume that λ≠ 1. Then

$$ q\left( \lambda\right) =\frac{\lambda\left( c+\alpha-\frac{\alpha}{\lambda }\right) +\lambda^{n+1}\left( \lambda-1-c\right) }{\left( \lambda -1\right)^{2}} $$

and

$$ q\left( 1+c\right) =\frac{c+\alpha+1}{c}>0. $$

Hence there must be a real zero in the interval \(\left (1,1+c\right ) \). □
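The α > 1 case can also be illustrated numerically. The sketch below (sample values α = 2, n = 5) computes the zeros of \(p_{n+1}\left (\lambda \right ) =\lambda ^{n+1}-c{\sum }_{i=1}^{n}\lambda ^{i}+\alpha \) and checks that they lie in the annulus 1 ≤ |z| ≤ 1 + α, with a real zero in the interval (1, 1 + c).

```python
import numpy as np

# alpha and n are illustrative sample values
alpha, n = 2.0, 5
c = (1.0 + alpha) / n
# p_{n+1}(lambda) = lambda^{n+1} - c*(lambda^n + ... + lambda) + alpha
coeffs = [1.0] + [-c] * n + [alpha]
zeros = np.roots(coeffs)
moduli = np.abs(zeros)
# real zeros strictly between 1 and 1 + c (excluding the zero at lambda = 1)
real_in_gap = [z.real for z in zeros
               if abs(z.imag) < 1e-10 and 1.0 + 1e-8 < z.real < 1.0 + c]
print(sorted(moduli), real_in_gap)
```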

For the eigenvalues of TshrP (\(P\in \mathcal {P}_{n+1}\)), we need the following result.

Theorem 4

(Langville and Meyer [26, 27]). If the spectrum of the stochastic matrix P is \(\left \{ 1,\lambda _{2},\ldots ,\lambda _{n}\right \} \), then the spectrum of

$$ W=\alpha P+\left( 1-\alpha\right) ev^{T} $$

is \(\left \{ 1,\alpha \lambda _{2},\alpha \lambda _{3},\ldots ,\alpha \lambda _{n}\right \} \), where vT is a probability vector.

Since the eigenvalues of W and WT coincide, the same result holds for the transposed matrix

$$ W^{T}=\alpha P^{T}+\left( 1-\alpha\right) ve^{T} $$

as well.

Corollary 1

The spectrum of

$$ T_{shr}P=\frac{1}{2}P+\frac{1}{2}e_{1}e^{T}\quad\left( P\in\mathcal{P} _{n+1}\right) $$

is \(\left \{ 1,\frac {1}{2}\lambda _{2},\frac {1}{2}\lambda _{3},\ldots ,\frac {1}{2}\lambda _{n+1}\right \} \). Since the eigenvalues of a permutation matrix P are on the unit circle \(\left \vert \lambda \right \vert =1\), we have \(\left \vert \lambda _{i}\left (T_{shr}P\right ) \right \vert =\frac {1}{2}\) for i = 2,…,n + 1.
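Corollary 1 is easy to confirm numerically. The sketch below uses a sample cyclic permutation matrix of order n + 1 = 6; any permutation matrix gives the same moduli.

```python
import numpy as np

m = 6                                         # n + 1, sample size
P = np.roll(np.eye(m), 1, axis=0)             # a cyclic permutation matrix
e1 = np.zeros(m); e1[0] = 1.0
W = 0.5 * P + 0.5 * np.outer(e1, np.ones(m))  # T_shr P = (1/2)P + (1/2)e1 e^T
moduli = np.sort(np.abs(np.linalg.eigvals(W)))
print(moduli)  # one eigenvalue of modulus 1, the rest of modulus 1/2
```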

Remark 1

The eigenvalues λ = 1 of \(T\left (\alpha \right ) P_{j}\) (\(\left \vert \alpha \right \vert <1\)) and TshrP (\(P\in \mathcal {P}_{n+1}\)) are simple and so \(\left \{ \left [ T\left (\alpha \right ) P_{j}\right ]^{k}\right \} \) and \(\left \{ \left [ T_{shr}P\right ]^{k}\right \} \) are convergent. For α > 1, \(\rho \left (T\left (\alpha \right ) P_{1}\right )\)> 1, and so \(\left \{ \left [ T\left (\alpha \right ) P_{1}\right ]^{k}\right \} \) is unbounded. For α = 1, \(T\left (1\right ) P_{1}\) has at least a double eigenvalue λ = 1, and since it is a companion matrix, its Jordan form has at least a 2 × 2 block belonging to λ = 1. Hence \(\left \{ \left [ T\left (1\right ) P_{1}\right ] ^{k}\right \} \) is also unbounded (see, e.g., [16, 32, 36]).

For the asymptotic behavior of the eigenvalues of \(T\left (\alpha \right ) P_{j}\), note that for j = n + 1 − ℓ with ℓ fixed and \(n\rightarrow \infty \), the matrix \(T\left (\alpha \right ) P_{j}\) has n − ℓ eigenvalues λ = 1, while the remaining ℓ + 1 eigenvalues (the zeros of \(p_{\ell +1}\left (\lambda \right ) =\lambda ^{\ell +1}-c{\sum }_{i=1}^{\ell } \lambda ^{i}+\alpha \)) are converging to the zeros of the polynomial \(p\left (\lambda \right ) =\lambda ^{\ell +1}+\alpha \).

For a more precise result, we use a lesser-known result of E. Landau [25] (see also [1, 30] or [33]).

Theorem 5

(E. Landau [25]). Consider the polynomial \(p\left (z\right ) =a_{0}+a_{1}z+\cdots +a_{n}z^{n}\) (a0an≠ 0). If z is a zero of \(p\left (z\right ) \) and t is any positive real number, then

$$ \left\vert z\right\vert \geq g\left( t\right) :=\frac{\left\vert a_{0}\right\vert t}{\left\vert a_{0}\right\vert + \max _{1\leq i\leq n}\left\vert a_{i}\right\vert t^{i}}. $$
(11)
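A direct implementation of the bound (11) may clarify how it is used below. The polynomial tested here is the paper's \(p_{\ell }\) with the sample values α = 1/2, n = 10, ℓ = 8 (so c = (1 + α)/n), and the probe values of t are arbitrary.

```python
import numpy as np

def landau_bound(coeffs_low_to_high, t):
    """Lower bound g(t) of (11) for the moduli of the zeros of p."""
    a = np.abs(np.asarray(coeffs_low_to_high, dtype=float))
    return a[0] * t / (a[0] + max(a[i] * t**i for i in range(1, len(a))))

alpha, n, l = 0.5, 10, 8                    # sample values
c = (1 + alpha) / n
coeffs = [alpha] + [-c] * (l - 1) + [1.0]   # a_0, a_1, ..., a_l
zeros = np.roots(coeffs[::-1])              # np.roots wants a_l first
min_modulus = np.abs(zeros).min()
bounds = [landau_bound(coeffs, t) for t in (0.25, 0.5, 0.75, 1.0)]
print(min_modulus, bounds)
```

Every computed bound stays below the true smallest zero modulus, as Theorem 5 asserts.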

Landau’s theorem yields the following characterizations of the eigenvalues of \(T\left (\alpha \right ) P_{j}\) when \(n\rightarrow \infty \).

Lemma 3

Assume that \(\left \vert \alpha \right \vert <1\) and consider the eigenvalues of \(T\left (\alpha \right ) P_{j}\) for fixed j and \(n\rightarrow \infty \). Then \(T\left (\alpha \right ) P_{j}\) has either one eigenvalue λ = 1 (j = 1) or j − 1 eigenvalues λ = 1 (j ≥ 2), while the absolute values of the remaining n + 2 − j eigenvalues (the zeros of \(p_{n+2-j}\left (\lambda \right ) \)) are converging to 1.

Proof

In our case \(p\left (\lambda \right ) =\lambda ^{\ell }-c{\sum }_{i=1}^{\ell -1}\lambda ^{i}+\alpha \) and

$$ \max\limits_{1\leq i\leq n}\left\vert a_{i}\right\vert t^{i}=\max\limits_{1\leq i\leq\ell -1}\left\{ ct^{i},t^{\ell}\right\} . $$

Assume that n ≥ 2. Then 0 < c < 1. For 0 < t < 1, ct > ct^i (i ≥ 2) and ct = t^ℓ for \(t=\sqrt [\ell -1]{c}<1\). If \(t_{1}=\sqrt [\ell -1]{c}\) and ℓ − 1 = n + 1 − j with j fixed, then for the quantity \(g\left (t\right ) \) given by Theorem 5,

$$ g\left( t_{1}\right) =\frac{\left\vert \alpha\right\vert \sqrt[n+1-j] {\frac{1+\alpha}{n}}}{\left\vert \alpha\right\vert +\frac{1+\alpha} {n}\sqrt[n+1-j]{\frac{1+\alpha}{n}}}\rightarrow1\quad\left( n\rightarrow \infty\right) $$

holds. □

Lemma 4

Assume that α > 1 and consider the eigenvalues of \(T\left (\alpha \right ) P_{1}\) for \(n\rightarrow \infty \). Then the absolute values of the eigenvalues of \(T\left (\alpha \right ) P_{1}\) (the zeros of \(p_{n+1}\left (\lambda \right ) \)) are converging to 1.

Proof

Since for α > 1, the zeros of \(p_{n+1}\left (\lambda \right ) =\lambda ^{n+1}-c{\sum }_{i=1}^{n}\lambda ^{i}+\alpha \) are in the annulus \(1\leq \left \vert z\right \vert \leq 1+\alpha \), the zeros of the reciprocal equation \(p_{n+1}^{\ast }\left (\lambda \right ) =1-c{\sum }_{i=1}^{n}\lambda ^{i}+\alpha \lambda ^{n+1}\) are in the annulus \(\frac {1}{1+\alpha }\leq \left \vert z\right \vert \leq 1\). The Landau theorem (Theorem 5) implies that for the zeros of \(p_{n+1}^{\ast }\left (\lambda \right ) \), we have the lower estimate

$$ \left\vert z\right\vert \geq\frac{t}{1+ \max _{1\leq i\leq n}\left\{ ct^{i},\alpha t^{n+1}\right\} }. $$

Since 0 < c < 1, ct > ct^i (2 ≤ i ≤ n) for 0 < t < 1, and ct = αt^{n+1} for \(t=\sqrt [n]{\frac {c}{\alpha }}\). If \(t_{1}=\sqrt [n]{\frac {c}{\alpha }}\), then for the quantity \(g\left (t\right ) \) given by Theorem 5,

$$ g\left( t_{1}\right) =\frac{\sqrt[n]{\frac{1+\alpha}{n\alpha}}} {1+\frac{1+\alpha}{n}\sqrt[n]{\frac{1+\alpha}{n\alpha}}}\rightarrow 1\quad\left( n\rightarrow\infty\right) $$

holds. □
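Lemma 4 can be illustrated numerically: with α = 2 the largest zero modulus of \(p_{n+1}\left (\lambda \right ) \) approaches 1 as n grows. The n values below are sample choices.

```python
import numpy as np

def max_modulus(alpha, n):
    # p_{n+1}(lambda) = lambda^{n+1} - c*sum_{i=1}^{n} lambda^i + alpha
    c = (1.0 + alpha) / n
    return np.abs(np.roots([1.0] + [-c] * n + [alpha])).max()

alpha = 2.0
spread = {n: max_modulus(alpha, n) - 1.0 for n in (10, 40, 160)}
print(spread)  # deviation of the outermost zero from the unit circle shrinks
```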

Han and Neumann [12] investigated the Nelder-Mead method when it generates a sequence of simplices in \(\mathbb {R}^{n}\) such that

$$ S^{\left( k\right) }=\left[ 0,v^{\left( k+n-1\right) },\ldots,v^{\left( k+1\right) },v^{\left( k\right) }\right] $$

and

$$ f\left( 0\right) <f\left( v^{\left( k+n-1\right) }\right) <\cdots <f\left( v^{\left( k\right) }\right) $$

for kk0. They expressed the next incoming vertex \(v^{\left (k+n\right ) }\) in a difference equation form

$$ v^{\left( k+n\right) }=\frac{1+\tau_{k}}{n}\sum\limits_{i=1}^{n-1}v^{\left( k+i\right) }-\tau_{k}v^{\left( k\right) }\quad\left( \tau_{k}=1,\frac {1}{2},-\frac{1}{2}\right) , $$

which has the characteristic equation \(p\left (\mu \right ) =\mu ^{n} -\frac {1+\tau _{k}}{n}{\sum }_{i=1}^{n-1}\mu ^{n-i}+\tau _{k}=0\). Introducing \(M_{k}=\left [ v^{\left (k+n-1\right ) },\ldots ,v^{\left (k+1\right ) },v^{\left (k\right ) }\right ] \) they expressed Mk+ 1 in the form Mk+ 1 = MkAk, where

$$ A_{k}=\left[ \begin{array} [c]{cc} \frac{1+\tau_{k}}{n}e & I_{n-1}\\ -\tau_{k} & 0 \end{array} \right] \in\mathbb{R}^{n\times n}. $$

The characteristic polynomial of Ak coincides with \(p\left (\mu \right ) \) and Ak is close to \(T\left (\alpha \right ) P_{1}\in \mathbb {R}^{\left (n+1\right ) \times \left (n+1\right ) }\). Han, Neumann and Xu [11] investigated the zeros of \(p\left (\mu \right ) \) in the form of the two parameter polynomial

$$ \widehat{p}_{n}\left( \lambda\right) =b-a\sum\limits_{i=1}^{n-1}\lambda^{i} +\lambda^{n}\quad\left( a,b\in\mathbb{C}\right) $$
(12)

of degree n using a different version of the Schur-Cohn criterion (see Marden [30]) and a different technique. Theorems 4.1, 4.2 and 5.1 of [11] are the results most closely related to the present paper.

Theorem 6

(Han-Neumann-Xu [11], Theorem 4.1) Suppose that \(a,b\in \mathbb {R}\) and a≠ 0.

(i) Assume that \(\left \vert b\right \vert <1\). The polynomial (12) has one root in the interior of the unit disk and the remaining roots on the unit circle if \(\frac {b+1}{a}=-1\).

(ii) Assume that \(\left \vert b\right \vert <1\). The polynomial (12) has one root on the unit circle and the remaining roots in the interior of the unit disk if \(\frac {b+1}{a}=n-1\).

(iii) Assume that \(\left \vert b\right \vert >1\). The polynomial (12) has one root in the exterior of the unit disk and the remaining roots on the unit circle if \(\frac {b+1}{a}=-1\).

(iv) Assume that \(\left \vert b\right \vert >1\). The polynomial (12) has one root on the unit circle and the remaining roots in the exterior of the unit disk if \(\frac {b+1}{a}=n-1\).

Theorem 7

(Han-Neumann-Xu [11], Theorem 4.2) Consider the polynomial (12). If b = 1 and \(0\leq a\leq \frac {2}{n-1}\), then all roots of the polynomial (12) are on the unit circle.

Denote by \(\lambda _{OC}\left (n\right ) \) any root of the polynomial (12) with the coefficients \(a=\frac {3}{2n}\), \(b=\frac {1}{2}\) (outside contraction for \(\alpha =\frac {1}{2}\)). Similarly, denote by \(\lambda _{IC}\left (n\right ) \) any root of the polynomial (12) with the coefficients \(a=\frac {1}{2n}\), \(b=-\frac {1}{2}\) (inside contraction for \(\alpha =-\frac {1}{2}\)).

Theorem 8

(Han-Neumann-Xu [11], Theorem 5.1) For the values \(\lambda _{OC}\left (n\right ) \) and \(\lambda _{IC}\left (n\right ) \),

$$ \lim\limits_{n\rightarrow\infty}\left\vert \lambda_{OC}\left( n\right) \right\vert =1 $$

and

$$ \lim\limits_{n\rightarrow\infty}\left\vert \lambda_{IC}\left( n\right) \right\vert =1. $$
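Theorem 8 can be checked numerically on the polynomial (12) itself; the n values below are sample choices.

```python
import numpy as np

def max_deviation(a, b, n):
    # p_hat(lambda) = b - a*sum_{i=1}^{n-1} lambda^i + lambda^n
    coeffs = [1.0] + [-a] * (n - 1) + [b]
    return np.abs(np.abs(np.roots(coeffs)) - 1.0).max()

dev_oc = {n: max_deviation(3 / (2 * n), 0.5, n) for n in (10, 200)}   # outside contraction
dev_ic = {n: max_deviation(1 / (2 * n), -0.5, n) for n in (10, 200)}  # inside contraction
print(dev_oc, dev_ic)  # deviations from the unit circle shrink with n
```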

In this paper we investigated a set of polynomials of the form

$$ p_{\ell}\left( \lambda\right) =\alpha-c\sum\limits_{i=1}^{\ell-1}\lambda^{i}+\lambda^{\ell}\quad(c=\frac{1+\alpha}{n}), $$
(13)

where the parameters are α and ℓ (2 ≤ ℓ ≤ n + 1). In our case b = α and \(a=\frac {1+\alpha }{n}\), implying that \(\left (b+1\right ) /a=n\) for 2 ≤ ℓ ≤ n + 1 (see Theorem 3).

Theorem 4.1 of [11] assumes that either \(\left (b+1\right ) /a=-1\) or \(\left (b+1\right ) /a=n-1\). In the latter case we can write that \(\widehat {p}_{\ell }\left (\lambda \right ) =b-\frac {1+b}{\ell -1}{\sum }_{i=1}^{\ell -1}\lambda ^{i}+\lambda ^{\ell }\). For b = α and ℓ = n + 1 (j = 1) the polynomial \(\widehat {p}_{\ell }\left (\lambda \right ) \) coincides with (13). Hence case (iv) of Theorem 3 also follows from case (ii) of Theorem 4.1 of [11]. In turn, case (iv) of Theorem 4.1 of [11] follows from case (vii) of Theorem 3. For ℓ < n + 1, the two theorems are clearly different.

In addition, we note that case (vi) of Theorem 3 also follows from Theorem 4.2 of [11] and Lemma 3 implies Theorem 5.1 of [11] for \(\alpha =\pm \frac {1}{2}\).

4 Examples of convergence behavior

Following McKinnon [29] and Han and Neumann [12], we investigate simple behavior patterns of the simplex sequences \(\left \{ S^{\left (k\right ) }\right \} \) generated by the Nelder-Mead method. If the simplex sequence \(\left \{ S^{\left (k\right ) }\right \} \) is convergent, that is, \(\lim _{k\rightarrow \infty }S^{\left (k\right ) }=S^{\infty }\) for some \(S^{\infty }\in \mathbb {R}^{n\times \left (n+1\right ) }\), then \(f\left (x_{i}^{\left (k\right ) }\right ) \rightarrow f\left (S^{\infty }e_{i}\right ) \) (i = 1,2,…,n + 1) provided that f is continuous at the points \(S^{\infty }e_{i}\) (i = 1,2,…,n + 1). Note that if for some vector \(\widehat {x}\), \(x_{j}^{\left (k\right ) }\rightarrow \widehat {x}\) (\(k\rightarrow \infty \), j = 1,2,…,n + 1), then \(S^{\left (k\right ) }\rightarrow \widehat {x}e^{T}\), which is a rank-one matrix of special form.

We show examples where the incoming vertex v satisfies

$$ f_{1}^{\left( k\right) }\leq\cdots\leq f_{j-1}^{\left( k\right) }\leq f\left( v\right) \leq f_{j}^{\left( k\right) }<f_{j+1}^{\left( k\right) }\leq\cdots\leq f_{n+1}^{\left( k\right) }\quad\left( k\geq0\right) $$

with a fixed index j and the type of v (reflection, expansion, outside contraction, or inside contraction point) is the same for k ≥ 0. Hence \(S^{\left (k\right ) }=S^{\left (0\right ) }\left [ T\left (\alpha \right ) P_{j}\right ]^{k}\) for k ≥ 0. Using the examples we can specify the type of convergence which is studied in the rest of the paper.

Assume that \(\left \vert \alpha \right \vert <1\) and \(S^{\left (0\right ) }=\left [ S_{j-1}^{\left (0\right ) },S_{n+2-j}^{\left (0\right ) }\right ] \) (\(S_{j-1}^{\left (0\right ) }\in \mathbb {R}^{n\times \left (j-1\right ) }\)). It follows from Theorem 3 (formula (10)) that

$$ \lim\limits_{k\rightarrow\infty}S^{\left( k\right) }=\lim\limits_{k\rightarrow\infty }S^{\left( 0\right) }\left[ T\left( \alpha\right) P_{j}\right]^{k}=\left[ S_{j-1}^{\left( 0\right) },S_{j-1}^{\left( 0\right) } ce{e_{1}^{T}}\left( I_{n+2-j}-A_{n+2-j}\right)^{-1}\right] . $$

If rank\(\left (S_{j-1}^{\left (0\right ) }\right ) \geq 2\), then \(\lim _{k\rightarrow \infty }S^{\left (k\right ) }\) cannot be of the form \(\widehat {x}e^{T}\) for some vector \(\widehat {x}\), diam\(\left (S^{\left (k\right ) }\right ) \nrightarrow 0\), and \(f_{i}^{\left (k\right ) }\)’s do not converge to the same limit. For j = 1,2, we can write \(T\left (\alpha \right ) P_{j}\) in the form

$$ T\left( \alpha\right) P_{j}=F\left[ \begin{array} [c]{cc} 1 & 0^{T}\\ b_{j} & C_{j} \end{array} \right] F^{-1}\quad\left( F=\left[ \begin{array} [c]{cc} 1 & -e^{T}\\ 0 & I_{n} \end{array} \right] \right) . $$

For \(\left \vert \alpha \right \vert <1\), Theorem 3 implies that \(\rho \left (C_{j}\right ) <1\) and so

$$ \left[ T\left( \alpha\right) P_{j}\right]^{k}\rightarrow\left[ \begin{array} [c]{c} 1-e^{T}\left( I-C_{j}\right)^{-1}b_{j}\\ \left( I-C_{j}\right)^{-1}b_{j} \end{array} \right] e^{T}=B. $$

Hence \(\lim _{k\rightarrow \infty }S^{\left (k\right ) }=S^{\left (0\right ) }B=we^{T}\) for some vector w. For j = 1, b1≠ 0, while for j = 2, b2 = 0. In the latter case \(S^{\left (0\right ) }B=x_{1}^{\left (0\right ) }e^{T}\).

Next we show five examples for different behavior patterns of \(\left \{ S^{\left (k\right ) }\right \} \). Examples 1, 2, 3 and 5 are two dimensional, while Example 4 is n-dimensional. Let \(J=\left \{ 1,2,\ldots ,n+1\right \} \).

In Example 1 the incoming vertex is \(x_{e}^{\left (k\right ) }\) for k ≥ 0, f has no finite minimum, \(f_{i}^{\left (k\right ) }\rightarrow 0\) (\(k\rightarrow \infty \), i ∈ J) and diam\(\left (S^{\left (k\right ) }\right ) \rightarrow \infty \). In Example 2 the incoming vertex is \(x_{r}^{\left (k\right ) }\) (accepted in the expansion step) for k ≥ 0, diam\(\left (S^{\left (k\right ) }\right ) \) is constant, \(\left \{ S^{\left (k\right ) }\right \} \) is unbounded and three functions are given so that \(f_{i}^{\left (k\right ) }\rightarrow -\infty \) (i ∈ J) or \(f_{i}^{\left (k\right ) }\rightarrow 0\) (i ∈ J) holds.

In Example 3 the incoming vertex is \(x_{ic}^{\left (k\right ) }\) for k ≥ 0, \(x_{i}^{\left (k\right ) }\rightarrow x_{1}^{\left (0\right ) }\) (i = 1,2,3) and three functions are given so that \(x_{1}^{\left (0\right ) }\) is not a stationary point of f or it is a minimum point of f. In the n-dimensional Example 4 the incoming vertex is \(x_{ic}^{\left (k\right ) }\) which replaces \(x_{n+1}^{\left (k\right ) }\) for k ≥ 0. Hence \(x_{i}^{\left (k\right ) }=x_{i}^{\left (0\right ) }\) (i = 1,…,n), \(x_{n+1}^{\left (k\right ) }\rightarrow x_{c}^{\left (0\right ) }\) and diam\(\left (S^{\left (k\right ) }\right ) \nrightarrow 0\). For n≠ 3, \(\lim _{k}x_{n+1}^{\left (k\right ) }=x_{c}^{\left (0\right ) }\) is not a stationary point of the three given functions f.

In Example 5 the incoming vertex is \(x_{ic}^{\left (k\right ) }\) which replaces \(x_{3}^{\left (k\right ) }\) for k ≥ 0, \(x_{3}^{\left (k\right ) }\rightarrow x_{c}^{\left (0\right ) }\), diam\(\left (S^{\left (k\right ) }\right ) \nrightarrow 0\). Two functions are given such that \(\lim _{k}x_{3}^{\left (k\right ) }=x_{c}^{\left (0\right ) }\) is not a stationary point for the first function, while it is a saddle point for the second function.

Example 1

The expansion point \(x_{e}^{\left (k\right ) }\) is the incoming vertex infinitely many times if

$$ f_{e}^{\left( k\right) }<f_{r}^{\left( k\right) }<f_{1}^{\left( k\right) }\leq f_{2}^{\left( k\right) }\leq f_{3}^{\left( k\right) }\quad\left( k\geq0\right) . $$
(14)

In this case \(S^{\left (k\right ) }=S^{\left (0\right ) }B_{k}\) where \(B_{k}=\left [ T\left (2\right ) P_{1}\right ]^{k}\) is not bounded. Select

$$ S^{\left( 0\right) }=\left[ \begin{array} [c]{ccc} \frac{5}{6}-\frac{7}{66}\sqrt{33}~ & \frac{1}{3}-\frac{1}{11}\sqrt {33}~ & \frac{1}{3}-\frac{1}{33}\sqrt{33}\\ \frac{7}{66}\sqrt{33}+\frac{5}{6} & \frac{1}{11}\sqrt{33}+\frac{1}{3} & \frac{1}{33}\sqrt{33}+\frac{1}{3} \end{array} \right] =\left[ \begin{array} [c]{ccc} a & c & e\\ b & d & f \end{array} \right] . $$

The rows of \(S^{\left (0\right ) }\) are the left eigenvectors of \(T\left (2\right ) P_{1}\) corresponding to the eigenvalues \(\lambda =\frac {1}{4}-\frac {1}{4}\sqrt {33}\) and \(\mu =\frac {1}{4}\sqrt {33}+\frac {1}{4}\), respectively. Hence if condition (14) holds, then

$$ S^{\left( k\right) }=\left[ \begin{array} [c]{ccc} \lambda^{k}a & \lambda^{k}c & \lambda^{k}e\\ \mu^{k}b & \mu^{k}d & \mu^{k}f \end{array} \right] , $$
$$ x_{r}^{\left( k\right) }=\left[ \begin{array} [c]{c} \lambda^{k}\left( a+c-e\right) \\ \mu^{k}\left( b+d-f\right) \end{array} \right] ,\quad x_{e}^{\left( k\right) }=\left[ \begin{array} [c]{c} \frac{1}{2}\lambda^{k}\left( 3a+3c-4e\right) \\ \frac{1}{2}\mu^{k}\left( 3b+3d-4f\right) \end{array} \right] . $$

Let \(f\left (x,y\right ) =\left (1+x^{2}+y^{2}\right )^{-1}\). Since \(\left \vert a\right \vert >\left \vert c\right \vert >\left \vert e\right \vert \) and \(\left \vert b\right \vert >\left \vert d\right \vert >\left \vert f\right \vert \), condition \(f_{1}^{\left (k\right ) }<f_{2}^{\left (k\right ) } <f_{3}^{\left (k\right ) }\) clearly holds. Inequalities \(f_{r}^{\left (k\right ) }<f_{1}^{\left (k\right ) }\) and \(f_{e}^{\left (k\right ) } <f_{r}^{\left (k\right ) }\) hold if and only if

$$ \lambda^{2k}\left( a+c-e\right)^{2}+\mu^{2k}\left( b+d-f\right)^{2}>\lambda^{2k}a^{2}+\mu^{2k}b^{2} $$

and

$$ \lambda^{2k}\frac{\left( 3a+3c-4e\right)^{2}}{4}+\mu^{2k}\frac{\left( 3b+3d-4f\right)^{2}}{4}>\lambda^{2k}\left( a+c-e\right)^{2}+\mu^{2k}\left( b+d-f\right)^{2} $$

hold, respectively. The last two inequalities can be verified by direct calculation. Hence (14) holds and for \(k\rightarrow \infty \), diam\(\left (S^{\left (k\right ) }\right ) \rightarrow \infty \) and \(f_{i}^{\left (k\right ) }\rightarrow 0\) for i = 1,2,3.
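The direct calculation can be sketched as follows. The loop checks condition (14) over a range of k, with row 1 of \(S^{\left (0\right ) }\) scaled by \(\left (1-\sqrt {33}\right ) /4\) and row 2 by \(\left (1+\sqrt {33}\right ) /4\), the assignment consistent with the recurrence.

```python
import numpy as np

s = np.sqrt(33.0)
a, b = 5/6 - 7*s/66, 7*s/66 + 5/6
c, d = 1/3 - s/11,   s/11 + 1/3
e, f = 1/3 - s/33,   s/33 + 1/3
lam, mu = (1 - s)/4, (s + 1)/4   # row 1 scales by lam, row 2 by mu

def fval(x, y):                  # f(x,y) = 1/(1 + x^2 + y^2)
    return 1.0 / (1.0 + x*x + y*y)

ok = True
for k in range(25):
    x1 = (lam**k * a, mu**k * b)
    x2 = (lam**k * c, mu**k * d)
    x3 = (lam**k * e, mu**k * f)
    xr = (lam**k * (a + c - e), mu**k * (b + d - f))
    xe = (lam**k * (3*a + 3*c - 4*e)/2, mu**k * (3*b + 3*d - 4*f)/2)
    ok &= fval(*xe) < fval(*xr) < fval(*x1) <= fval(*x2) <= fval(*x3)
print(ok)  # condition (14) holds for every tested k
```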

Example 2

The reflection point \(x_{r}^{\left (k\right ) }\) (taken in the expansion step) is the incoming vertex infinitely many times if

$$ f_{r}^{\left( k\right) }<f_{1}^{\left( k\right) }\leq f_{2}^{\left( k\right) }\leq f_{3}^{\left( k\right) }\wedge f_{r}^{\left( k\right) }\leq f_{e}^{\left( k\right) }\quad\left( k\geq0\right) . $$
(15)

In this case \(S^{\left (k\right ) }=S^{\left (0\right ) }B_{k}\) where \(B_{k}=\left [ T\left (1\right ) P_{1}\right ]^{k}\). Since \(T\left (1\right ) P_{1}\) has a 2 × 2 Jordan block belonging to λ = 1, Bk is not bounded. Define

$$ S^{\left( 0\right) }=\left[ \begin{array} [c]{ccc} 1 & 0 & -1\\ 2 & -2 & 2 \end{array} \right] . $$

If condition (15) holds, then

$$ S^{\left( k\right) }=\left[ \begin{array} [c]{ccc} k+1 & k & k-1\\ \left( -1\right)^{k}2 & \left( -1\right)^{k+1}2 & \left( -1\right)^{k}2 \end{array} \right] $$

and

$$ x_{r}^{\left( k\right) }=\left[ \begin{array} [c]{c} k+2\\ \left( -1\right)^{k+1}2 \end{array} \right] ,\quad x_{e}^{\left( k\right) }=\left[ \begin{array} [c]{c} k+\frac{7}{2}\\ \left( -1\right)^{k+1}4 \end{array} \right] . $$

Note that \(\left \{ S^{\left (k\right ) }\right \} \) is unbounded, while diam\(\left (S^{\left (k\right ) }\right ) \) is constant. It is easy to verify that condition (15) holds for the functions \(f_{1}\left (x,y\right ) =\frac {1}{2}-\frac {1}{2}x+\frac {1}{4}\left \vert y\right \vert +\frac {1}{4}\left \vert \left \vert y\right \vert -2\right \vert \), \(f_{2}\left (x,y\right ) =\frac {1}{1+x^{2}}+\frac {1}{12}y^{2}-\frac {1}{3}\) and \(f_{3}\left (x,y\right ) =\frac {\sin \limits \left (\left \vert y\right \vert -1.9\right ) }{1+x^{2}}\). For f1, \(f_{i}^{\left (k\right ) }\rightarrow -\infty \) (\(k\rightarrow \infty \), i = 1,2,3). For f2 and f3, \(f_{i}^{\left (k\right ) }\rightarrow 0\) (i = 1,2,3). Note that \(\inf f_{1}=-\infty \), \(\left (0,0\right ) \) is a saddle point of f2, \(\inf f_{2}=-\frac {1}{3}\), and − 1 ≤ f3 ≤ 1.
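For f1 the verification of condition (15) amounts to elementary inequalities; a short sketch over the explicit simplices:

```python
def f1(x, y):
    return 0.5 - 0.5*x + 0.25*abs(y) + 0.25*abs(abs(y) - 2)

ok = True
for k in range(50):
    sgn = (-1)**k
    # vertices and trial points of the k-th simplex of Example 2
    x1, x2, x3 = (k + 1, 2*sgn), (k, -2*sgn), (k - 1, 2*sgn)
    xr, xe = (k + 2, -2*sgn), (k + 3.5, -4*sgn)
    ok &= f1(*xr) < f1(*x1) <= f1(*x2) <= f1(*x3) and f1(*xr) <= f1(*xe)
print(ok)  # condition (15) holds for every tested k
```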

Examples 1 and 2 show that the assumption on the boundedness of \(\left \{ B_{k}\right \} \) is justified.

McKinnon [29] proved the convergence behavior (17) of Example 3 for the convex function

$$ f_{M}\left( x,y\right) =\left\{ \begin{array} [c]{c} \theta\varphi\left\vert x\right\vert^{\tau}+y+y^{2},\text{ if } x\leq0\\ \theta x^{\tau}+y+y^{2},\text{ if }x\geq0 \end{array} \right. \quad\left( \theta=6,~\tau=2,~\varphi=60\right) $$

with initial simplex

$$ S^{\left( 0\right) }=\left[ \begin{array} [c]{ccc} 0 & \frac{1+\sqrt{33}}{8} & 1\\ 0 & \frac{1-\sqrt{33}}{8} & 1 \end{array} \right] =\left[ \begin{array} [c]{ccc} 0 & a & 1\\ 0 & b & 1 \end{array} \right] , $$
(16)

where the common limit point \(\left [ 0,0\right ]^{T}\) of the simplex vertices is not a stationary point of fM. Using simplex \(S^{\left (0\right ) }\) of (16) we show other functions that generate the same simplex sequence, while the limit point \(\left [ 0,0\right ]^{T}\) is either a stationary point or a nonstationary point depending on the particular function.

Example 3

The inside contraction point \(x_{ic}^{\left (k\right ) }\) is the incoming vertex infinitely many times, if

$$ f_{1}^{\left( k\right) }\leq f_{ic}^{\left( k\right) }<f_{2}^{\left( k\right) }<f_{3}^{\left( k\right) }\leq f_{r}^{\left( k\right) } \quad\left( k\geq0\right) $$
(17)

If such a case occurs, then it is generally true that

$$ S^{\left( k\right) }=S^{\left( 0\right) }\left[ T\left( -\frac{1} {2}\right) P_{2}\right]^{k}\rightarrow S^{\left( 0\right) }e_{1} e^{T}=\left[ x_{1}^{\left( 0\right) },x_{1}^{\left( 0\right) } ,x_{1}^{\left( 0\right) }\right] , $$

no matter what \(x_{1}^{\left (0\right ) }\) is. The rows of \(S^{\left (0\right ) }\) are the left eigenvectors of \(T\left (-\frac {1}{2}\right ) P_{2}\) belonging to eigenvalues \(\lambda =\frac {1+\sqrt {33}}{8}\) and \(\mu =\frac {1-\sqrt {33}}{8}\), respectively. If (17) holds, then

$$ S^{\left( k\right) }=\left[ \begin{array} [c]{ccc} 0 & \lambda^{k}a & \lambda^{k}\\ 0 & \mu^{k}b & \mu^{k} \end{array} \right] ,\ x_{r}^{\left( k\right) }=\left[ \begin{array} [c]{c} \left( a-1\right) \lambda^{k}\\ \left( b-1\right) \mu^{k} \end{array} \right] ,\ x_{ic}^{\left( k\right) }=\left[ \begin{array} [c]{c} \left( \frac{1}{4}a+\frac{1}{2}\right) \lambda^{k}\\ \left( \frac{1}{4}b+\frac{1}{2}\right) \mu^{k} \end{array} \right] . $$

Let \(f\left (x,y\right ) =g\left (x\right ) +h\left (y\right ) \). The values of a,b,λ and μ imply that \(\lambda >\left \vert \mu \right \vert \) and the arguments x and y at the points \(x_{i}^{\left (k\right ) }\), \(x_{r}^{\left (k\right ) }\) and \(x_{ic}^{\left (k\right ) }\) will be in the range \(x\in \left [ -1,1\right ] \) and \(y\in I_{y}=\left [ -\left (\frac {1}{8}\sqrt {33}+\frac {7}{8}\right ) ,\left (\frac {1}{8} \sqrt {33}+\frac {7}{8}\right ) \right ] \), respectively. Select \(g\left (x\right ) \) and \(h\left (y\right ) \) such that \(g\left (x\right ) =\max \limits \left (\alpha x,-\beta x\right ) \), α,β > 0 and \(\left \vert h\left (y\right ) \right \vert \leq \gamma \left \vert y\right \vert \) for yIy. The inequality \(g\left (x\right ) -\gamma \left \vert y\right \vert \leq f\left (x,y\right ) \leq g\left (x\right ) +\gamma \left \vert y\right \vert \) implies that for \(\alpha >\left (\frac {7}{8}\sqrt {33}+\frac {41}{8}\right ) \gamma \) and \(\beta >\left (\frac {1}{2}\sqrt {33}+\frac {7}{2}\right ) \alpha +\left (\frac {11}{8}\sqrt {33}+\frac {69}{8}\right ) \gamma \), condition (17) holds. Hence for functions \(f_{4}\left (x,y\right ) =\max \limits \left (7x,-53x\right ) +\frac {1}{2}\sin \limits \left (y\right ) \) (\(\gamma =\frac {1}{2}\)), \(f_{5}\left (x,y\right ) =\max \limits \left (13x,-100x\right ) +0.1y^{3}\) (γ = 1) and \(f_{6}\left (x,y\right ) =\max \limits \left (13x,-100x\right ) +\frac {1}{4}y^{2}\) (γ = 1), condition (17) holds. The point \(\left (0,0\right ) \) is not a stationary point for f4 and f5, while it is a minimum point for f6.
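Condition (17) for f4 can be confirmed over the explicit simplices (a sketch; numerically a = λ and b = μ coincide):

```python
import math

s = math.sqrt(33.0)
a, b = (1 + s)/8, (1 - s)/8   # also the eigenvalues lambda and mu

def f4(x, y):
    return max(7*x, -53*x) + 0.5*math.sin(y)

ok = True
for k in range(40):
    x1  = (0.0, 0.0)
    x2  = (a**(k + 1), b**(k + 1))            # (lambda^k * a, mu^k * b)
    x3  = (a**k, b**k)
    xr  = ((a - 1)*a**k, (b - 1)*b**k)
    xic = ((a/4 + 0.5)*a**k, (b/4 + 0.5)*b**k)
    ok &= f4(*x1) <= f4(*xic) < f4(*x2) < f4(*x3) <= f4(*xr)
print(ok)  # condition (17) holds for every tested k
```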

Note that for all three functions, we have the same limit point, and the simplex sequence depends on the initial simplex and the relative function value distribution.

Han and Neumann [12] investigated the behavior pattern

$$ 0=f_{1}^{\left( k\right) }<f\left( v\right) <f_{2}^{\left( k\right) }<\cdots<f_{n+1}^{\left( k\right) }\quad\left( k\geq k_{0}\right) . $$
(18)

Here the incoming point v is either \(x_{oc}^{\left (k\right ) }\) or \(x_{ic}^{\left (k\right ) }\). Hence

$$ S^{\left( k+1\right) }=S^{\left( k\right) }T\left( \alpha_{k}\right) P_{2}\quad\left( \alpha_{k}\in\left\{ -\frac{1}{2},\frac{1}{2}\right\} \right) . $$
(19)

Under this assumption they proved the convergence \(x_{i}^{\left (k\right ) }\rightarrow 0\) (\(k\rightarrow \infty \)) for \(f\left (x\right ) =x^{T}x\) (\(x\in \mathbb {R}^{n}\)). For n = 2, they gave an initial simplex \(S^{\left (0\right ) }\) for which condition (18) with \(\alpha _{k} \equiv \frac {1}{2}\) is fulfilled.

Lagarias et al. ([23], Lemma 5.1) investigated a somewhat similar but more complicated case where the best vertex \(x_{1}^{\left (k\right ) }\) is constant for all k.

Example 4

Assume that \(x_{ic}^{\left (k\right ) }\) is the incoming vertex infinitely many times such that

$$ f_{1}^{\left( k\right) }\leq f_{2}^{\left( k\right) }\leq\cdots\leq f_{n}^{\left( k\right) }\leq f_{ic}^{\left( k\right) }<f_{n+1}^{\left( k\right) }\leq f_{r}^{\left( k\right) }\quad\left( k\geq0\right) $$
(20)

and n ≥ 2. In this case

$$ S^{\left( k\right) }=S^{\left( 0\right) }\left[ T\left( -\frac{1} {2}\right) \right]^{k}\rightarrow S^{\left( 0\right) }\left[ \begin{array} [c]{cc} I_{n} & \frac{1}{n}e\\ 0 & 0 \end{array} \right] =\left[ x_{1}^{\left( 0\right) },\ldots,x_{n}^{\left( 0\right) },\frac{1}{n}\sum\limits_{i=1}^{n}x_{i}^{\left( 0\right) }\right] . $$

We select

$$ S^{\left( 0\right) }=\left[ \begin{array} [c]{cc} I_{n-1,n} & \frac{1}{n}e\\ 0 & 1 \end{array} \right] \quad\left( I_{n-1,n}=\left[ \delta_{ij}\right]_{i,j=1}^{n-1,n}\right) , $$

whose rows are the left eigenvectors of \(T\left (-\frac {1}{2}\right ) \). If (20) holds, then

$$ S^{\left( k\right) }=\left[ \begin{array} [c]{cc} I_{n-1,n} & \frac{1}{n}e\\ 0 & \frac{1}{2^{k}} \end{array} \right] ,\quad x_{r}^{\left( k\right) }=\left[ \begin{array} [c]{r} \frac{1}{n}e\\ -\frac{1}{2^{k}} \end{array} \right] ,\quad x_{ic}^{\left( k\right) }=\left[ \begin{array} [c]{c} \frac{1}{n}e\\ \frac{1}{2^{k+1}} \end{array} \right] . $$

Assume that \(f\left (x\right ) =\left [ {\prod }_{i=1}^{n-1}g\left (x_{i}\right ) \right ] h\left (x_{n}\right ) \), where \(g\left (x\right ) ,h\left (x\right ) >0\) for x ≥ 0, \(h\left (-x\right ) \geq h\left (x\right ) \) for x > 0, and \(h\left (x\right ) \) is strictly monotone increasing for x ≥ 0 and strictly monotone decreasing for x < 0. If \(g\left (1\right ) <g\left (0\right ) <g\left (\frac {1}{n}\right ) \) holds, then it follows that the function values \(f_{i}^{\left (k\right ) }=g\left (0\right ) ^{n-2}g\left (1\right ) h\left (0\right ) \) (i = 1,…,n − 1), \(f_{n}^{\left (k\right ) }=g\left (0\right )^{n-1}h\left (0\right ) \), \(f_{ic}^{\left (k\right ) }=g\left (\frac {1}{n}\right )^{n-1}h\left (\frac {1}{2^{k+1}}\right ) \), \(f_{n+1}^{\left (k\right ) }=g\left (\frac {1}{n}\right )^{n-1}h\left (\frac {1}{2^{k}}\right ) \) and \(f_{r}^{\left (k\right ) }=g\left (\frac {1}{n}\right )^{n-1}h\left (-\frac {1}{2^{k} }\right ) \) satisfy inequality (20). Since \(x_{n+1}^{\left (k\right ) }\rightarrow \left [ \frac {1}{n}e^{T},0\right ] ^{T}\), we have \(f_{n+1}^{\left (k\right ) }\rightarrow \left [ g\left (\frac {1} {n}\right ) \right ]^{n-1}h\left (0\right ) \). For g, we may select \(g\left (x\right ) =e^{-\left (x-\frac {1}{3}\right )^{2}}\), which has a global maximum at \(x=\frac {1}{3}\), while h has a minimum at y = 0. Hence f has a saddle point \(\left [ \frac {1}{3}e^{T},0\right ]^{T}\), which is different from \(\lim _{k\rightarrow \infty }x_{n+1}^{\left (k\right ) }\) if n≠ 3. However, for n = 3, \(\lim _{k\rightarrow \infty }x_{n+1}^{\left (k\right ) }=x_{saddle}\).
We can select any of the functions \(f_{7}\left (x\right ) =e^{-{\sum }_{i=1}^{n-1}\left (x_{i}-\frac {1}{3}\right )^{2}}\left (1+\left \vert x_{n}\right \vert \right ) \), \(f_{8}\left (x\right ) =e^{-{\sum }_{i=1}^{n-1}\left (x_{i}-\frac {1}{3}\right )^{2}}\left (1+\sin \limits \left (\left \vert x_{n}\right \vert \right ) \right ) \) and \(f_{9}\left (x\right ) =e^{-{\sum }_{i=1}^{n-1}\left (x_{i}-\frac {1}{3}\right )^{2}}\sin \limits \left (1+{x_{n}^{2}}\right ) \). Note that \(0<f_{7}\left (x\right ) \leq 1+\left \vert x_{n}\right \vert \) and f7 has no finite minimum. Functions f8 and f9 have an infinite number of global maximum and minimum points. Also note that diam\(\left (S^{\left (k\right ) }\right ) \nrightarrow 0\).
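Condition (20) for f7 can be checked on the explicit simplices; n = 4 below is a sample dimension.

```python
import math

n = 4  # sample dimension

def f7(x):  # x is a list of n coordinates
    return math.exp(-sum((x[i] - 1/3)**2 for i in range(n - 1))) * (1 + abs(x[-1]))

ok = True
for k in range(30):
    # vertices of the k-th simplex of Example 4
    verts = [[1.0 if r == i else 0.0 for r in range(n)] for i in range(n - 1)]
    verts.append([0.0] * n)                      # x_n = 0
    verts.append([1/n] * (n - 1) + [2.0**(-k)])  # x_{n+1}
    xr  = [1/n] * (n - 1) + [-2.0**(-k)]
    xic = [1/n] * (n - 1) + [2.0**(-k - 1)]
    fs = [f7(v) for v in verts]
    ok &= all(fs[i] <= fs[i + 1] for i in range(n - 1)) and \
          fs[n - 1] <= f7(xic) < fs[n] <= f7(xr)
print(ok)  # condition (20) holds for every tested k
```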

For n = 2, there are plenty of similar cases.

Lemma 5

Assume that \(S^{\left (0\right ) }=\left [ x_{1}^{\left (0\right ) },x_{2}^{\left (0\right ) },x_{3}^{\left (0\right ) }\right ] \) is such that \(f_{1}^{\left (0\right ) }\leq f_{2}^{\left (0\right ) } <f_{3}^{\left (0\right ) }\) and define \(\varphi \left (t\right ) =\left (1+t\right ) x_{c}^{\left (0\right ) }-tx_{3}^{\left (0\right ) }\). If, in addition, f is such that (a) \(f\left (\varphi \left (t\right ) \right ) \) is continuous on \(\left [ -1,1\right ] \); (b) \(f\left (\varphi \left (t\right ) \right ) \geq f\left (\varphi \left (-t\right ) \right ) \) for \(t\in \left [ 0,1\right ] \); (c) \(f\left (\varphi \left (t\right ) \right ) \) is strictly monotone decreasing on \(\left [ -1,0\right ] \); (d) \(f\left (\varphi \left (t\right ) \right ) >f\left (\varphi \left (0\right ) \right ) =f_{c}^{\left (0\right ) }\geq f_{2}^{\left (0\right ) }\) (\(t\in \left [ -1,1\right ] \), t≠ 0), then

$$ f_{1}^{\left( k\right) }\leq f_{2}^{\left( k\right) }\leq f_{ic}^{\left( k\right) }<f_{3}^{\left( k\right) }\leq f_{r}^{\left( k\right) } $$
(21)

holds for all k = 0,1,2,…, \(x_{3}^{\left (k\right ) }\rightarrow x_{c}^{\left (0\right ) }\), and \(f_{3}^{\left (k\right ) }\rightarrow f_{c}^{\left (0\right ) }\).

Proof

Assume that for some − 1 ≤ t < 0, \(x_{3}=\varphi \left (t\right ) \) and \(f_{1}^{\left (0\right ) }\leq f_{2}^{\left (0\right ) }<f\left (\varphi \left (t\right ) \right ) \). Then \(x_{r}=\varphi \left (-t\right ) \), \(x_{ic}=\varphi \left (\frac {t}{2}\right ) \), and (b) and (c) imply that

$$ f\left( x_{r}\right) =f\left( \varphi\left( -t\right) \right) \geq f\left( \varphi\left( t\right) \right) =f\left( x_{3}\right) >f\left( \varphi\left( \frac{t}{2}\right) \right) =f\left( x_{ic}\right) . $$

Condition (d) implies that \(f_{1}^{\left (0\right ) }\leq f_{2}^{\left (0\right ) }<f\left (\varphi \left (\frac {t}{2}\right ) \right ) \). □

Example 5

Consider function \(f\left (x,y\right ) =\frac {1}{4}\left (x+\left \vert x\right \vert \right ) +\frac {1}{2}\left \vert x-\left \vert x\right \vert \right \vert +g\left (y\right ) \), where

$$ g\left( y\right) =\left\{ \begin{array} [c]{ll} 0.2\sin\left( 10\pi y-5\pi\right) , & \text{if }0.5\leq y\leq0.7\\ 0, & \text{otherwise} \end{array} \right. . $$

Select \(x_{1}^{\left (0\right ) }=\left [ 0,0.5\right ]^{T}\), \(x_{2} ^{\left (0\right ) }=\left [ 0,0.7\right ]^{T}\), \(x_{3}^{\left (0\right ) }=\left [ 0.5,0.6\right ]^{T}\). Then by Lemma 5, \(x_{3}^{\left (k\right ) }\rightarrow x_{c}^{\left (0\right ) }=\left [ 0,0.6\right ]^{T}\), which is not a stationary point.

Assume now that \(f\left (x,y\right ) =g\left (x\right ) -h\left (y\right ) \), where g and h are continuous real functions, \(g\left (x\right ) >0\) for x≠ 0, \(g\left (0\right ) =0\), \(g\left (x\right ) \) is strictly monotone increasing for x ≥ 0, \(g\left (x\right ) \) is strictly monotone decreasing for x < 0, \(g\left (-x\right ) \geq g\left (x\right ) \) (x ≥ 0), \(h\left (y\right ) \geq 0\) for y≠ 0, \(h\left (0\right ) =0\) and \(h\left (-y\right ) \geq h\left (y\right ) \) for y ≥ 0. Select \(x_{1}^{\left (0\right ) }=\left [ 0,-a\right ]^{T}\), \(x_{2}^{\left (0\right ) }=\left [ 0,a\right ]^{T}\) and \(x_{3}^{\left (0\right ) }=\left [ b,0\right ] ^{T}\) with a,b > 0. Lemma 5 implies that \(x_{3}^{\left (k\right ) }\) converges to the saddle point \(x_{c}^{\left (0\right ) }=\left [ 0,0\right ]^{T}\).
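The first part of Example 5 is easy to check numerically. The sketch below (Python; by Lemma 5 every iteration from this starting simplex is an inside contraction, so only that single step is implemented rather than the full method) verifies the chain (21) along the iterations and the convergence of the worst vertex to the non-stationary point \(\left [ 0,0.6\right ]^{T}\).

```python
import numpy as np

# Example 5, first function; per Lemma 5 the method repeatedly performs
# inside contractions, so we iterate exactly that step.
def g(y):
    return 0.2 * np.sin(10 * np.pi * y - 5 * np.pi) if 0.5 <= y <= 0.7 else 0.0

def f(p):
    x, y = p
    return 0.25 * (x + abs(x)) + 0.5 * abs(x - abs(x)) + g(y)

x1, x2 = np.array([0.0, 0.5]), np.array([0.0, 0.7])
x3 = np.array([0.5, 0.6])
xc = 0.5 * (x1 + x2)                 # centroid of the two best vertices (fixed)

for k in range(40):
    xr = 2.0 * xc - x3               # reflection point x(1)
    xic = 0.5 * xc + 0.5 * x3        # inside contraction point x(-1/2)
    # the ordering (21): f1, f2 <= f_ic < f3 <= f_r
    assert max(f(x1), f(x2)) <= f(xic) < f(x3) <= f(xr)
    x3 = xic                         # worst vertex replaced; x1, x2, xc unchanged
```

After 40 steps the worst vertex has collapsed onto \(x_{c}^{\left (0\right ) }=\left [ 0,0.6\right ]^{T}\).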

The examples show that the Nelder-Mead algorithm may generate the same simplex sequence for different functions, and that the limit vertices may or may not be stationary points of the function. They also show that, in case of convergence, the simplex vertices \(x_{j}^{\left (k\right ) }\) either all converge to the same vector \(\widehat {x}\) or converge to different vectors, as shown by Examples 4 and 5.

Assume that the simplex vertices \(x_{j}^{\left (k\right ) }\) (j = 1,2,…,n + 1) converge to the same vector \(\widehat {x}\) as \(k\rightarrow \infty \) and f is continuous at \(\widehat {x}\). Then

$$ \lim\limits_{k\rightarrow\infty}S^{\left( k\right) }=\left[ \widehat{x} ,\ldots,\widehat{x}\right] =\widehat{x}e^{T} $$
(22)

and \(f_{i}^{\left (k\right ) }\rightarrow f\left (\widehat {x}\right ) \) (i = 1,…,n + 1). Assume that \(B_{k}\rightarrow B=we^{T}\). Then \(S^{\left (k\right ) }=S^{\left (0\right ) }B_{k}\rightarrow S^{\left (0\right ) }we^{T}=\widehat {x}e^{T}\) and

$$ \text{diam}\left( S^{\left( k\right) }\right) = \max\limits_{i,j}\left\Vert S^{\left( k\right) }\left( e_{i}-e_{j}\right) \right\Vert \leq \left\Vert S^{\left( 0\right) }\right\Vert \max\limits_{i,j}\left\Vert B_{k}\left( e_{i}-e_{j}\right) \right\Vert \rightarrow0. $$
(23)

Since \(B\left (e_{i}-e_{j}\right ) =0\) and hence \(B_{k}\left (e_{i}-e_{j}\right ) =\left (B_{k}-B\right ) \left (e_{i}-e_{j}\right ) \) with \(\left \Vert e_{i}-e_{j}\right \Vert _{2}=\sqrt {2}\), we also have the speed estimate

$$ \text{diam}\left( S^{\left( k\right) }\right) \leq\sqrt{2}\left\Vert S^{\left( 0\right) }\right\Vert \left\Vert B_{k}-B\right\Vert . $$
(24)

Note that the properties \(f_{i}^{\left (k\right ) }\rightarrow \widehat {f}\) (i = 1,…,n + 1) and diam\(\left (S^{\left (k\right ) }\right ) \rightarrow 0\) (\(k\rightarrow \infty \)) were proved directly for strictly convex two-dimensional functions by Lagarias et al. [23] without relating the results to the stationary points of f. Except for Kelley [19, 20] and Lagarias et al. [22], no general result is yet known on the convergence to a stationary point of the objective function f.

On the basis of the preceding arguments we restrict our study to the convergence \(S^{\left (k\right ) } \rightarrow \widehat {x}e^{T}\), which implies \(f_{i}^{\left (k\right ) } \rightarrow f\left (\widehat {x}\right ) \) (i = 1,…,n + 1) and the speed estimate (24).

5 The convergence of the Nelder-Mead method

The convergence result will be proved in several steps. First, using a fixed similarity transformation, we bring the matrices \(T_{i}P^{\left (i\right ) } \in \mathcal {T}\) to a common lower block triangular form and identify a subset of \(\mathcal {T}\) that might have the RCP property. Next we prove a lemma on the convergence of products of lower block triangular matrices. Finally, under Assumption (A), we prove the convergence in Section 5.3.

5.1 A common similarity transformation

Since eT is a left eigenvector of each \(T_{i}P^{\left (i\right ) } \in \mathcal {T}\), there exists a common similarity transformation that makes them block lower triangular (for a more general case, see Theorem 6.10 of Hartfiel [13]).

Lemma 6

For all \(T_{i}P^{\left (i\right ) }\in \mathcal {T}\), matrix \(F^{-1}T_{i}P^{\left (i\right ) }F\) has the form

$$ F^{-1}T_{i}P^{\left( i\right) }F=\left[ \begin{array} [c]{cc} 1 & 0\\ b_{i} & C_{i} \end{array} \right] \quad\left( F=\left[ \begin{array} [c]{cc} 1 & -e^{T}\\ 0 & I_{n} \end{array} \right] \right) , $$
(25)

where \(b_{i}\in \mathbb {R}^{n}\) and \(C_{i}\in \mathbb {R}^{n\times n}\) are defined by \(T_{i}P^{\left (i\right ) }\).

Proof

For j > 1 we can write

$$ T\left( \alpha\right) P_{j}=\left[ \begin{array} [c]{cc} 1 & ce_{j-1}^{T}\\ 0 & W \end{array} \right] \quad\left( W\in\mathbb{R}^{n\times n}\right) , $$

and so

$$ F^{-1}T\left( \alpha\right) P_{j}F=\left[ \begin{array} [c]{cc} 1 & -e^{T}+ce_{j-1}^{T}+e^{T}W\\ 0 & W \end{array} \right] . $$

Since \(e^{T}We_{j-1}=\left (n-1\right ) c-\alpha =1-c\), \(e^{T}W=\left [ 1,\ldots ,1-c,1,\ldots ,1\right ] \), we obtain the form

$$ F^{-1}T\left( \alpha\right) P_{j}F=\left[ \begin{array} [c]{cc} 1 & 0\\ 0 & W \end{array} \right] . $$

For j = 1,

$$ T\left( \alpha\right) P_{1}=\left[ \begin{array} [c]{cc} c & {e_{1}^{T}}\\ z & W \end{array} \right] \quad\left( W\in\mathbb{R}^{n\times n}\right) $$

with \(z=\left [ c,\ldots ,c,-\alpha \right ]^{T}\). Hence

$$ F^{-1}T\left( \alpha\right) P_{1}F=\left[ \begin{array} [c]{cc} c+e^{T}z & -ce^{T}+{e_{1}^{T}}-e^{T}ze^{T}+e^{T}W\\ z & -ze^{T}+W \end{array} \right] . $$

Since \(e^{T}W=\left [ 0,1,\ldots ,1\right ] \), eTz = 1 − c, c + eTz = 1,

$$ -ce^{T}+{e_{1}^{T}}-e^{T}ze^{T}+e^{T}W=-ce^{T}+{e_{1}^{T}}-\left( 1-c\right) e^{T}+e^{T}W=0. $$

The final result is

$$ F^{-1}T\left( \alpha\right) P_{1}F=\left[ \begin{array} [c]{cc} 1 & 0\\ z & -ze^{T}+W \end{array} \right] . $$

For \(T\left (\alpha \right ) P_{j}\) (j > 1), b = 0, and for \(T\left (\alpha \right ) P_{1}\), \(\left \Vert b\right \Vert _{2}=\left (\left (n-1\right ) c^{2}+\alpha ^{2}\right )^{\frac {1}{2}}\).

Note that \(T_{shr}P=\frac {1}{2}P+\frac {1}{2}e_{1}e^{T}\) and \(P=\left [ e_{i_{1}} ,\ldots ,e_{i_{n+1}}\right ] \). If i1 = 1, then

$$ T_{shr}P=\left[ \begin{array} [c]{cc} 1 & \frac{1}{2}e^{T}\\ 0 & W_{1} \end{array} \right] $$

where W1 is an n × n permutation matrix multiplied by \(\frac {1}{2} \). Hence \(e^{T}W_{1}=\frac {1}{2}e^{T}\) and

$$ F^{-1}T_{shr}PF=\left[ \begin{array} [c]{cc} 1 & -\frac{1}{2}e^{T}+e^{T}W_{1}\\ 0 & W_{1} \end{array} \right] =\left[ \begin{array} [c]{cc} 1 & 0\\ 0 & W_{1} \end{array} \right] . $$

If i1 > 1 and TshrPej = e1, then

$$ T_{shr}P=\left[ \begin{array} [c]{cc} \frac{1}{2} & \frac{1}{2}e^{T}+\frac{1}{2}e_{j-1}^{T}\\ \frac{1}{2}e_{i_{1}-1} & W_{2} \end{array} \right] , $$

where \(W_{2}e_{j-1}=0\), \(e^{T}W_{2}e_{i}=\frac {1}{2}\) (\(i\neq j-1\)) and \(e^{T}W_{2}=\frac {1}{2}\left (e^{T}-e_{j-1}^{T}\right ) \). Since \(e^{T}e_{i_{1}-1}=1\),

$$ \begin{array}{@{}rcl@{}} F^{-1}T_{shr}PF&=&\left[ \begin{array} [c]{cc} \frac{1}{2}+\frac{1}{2}e^{T}e_{i_{1}-1} & \frac{1}{2}e_{j-1}^{T}-\frac{1} {2}e^{T}e_{i_{1}-1}e^{T}+e^{T}W_{2}\\ \frac{1}{2}e_{i_{1}-1} & -\frac{1}{2}e_{i_{1}-1}e^{T}+W_{2} \end{array} \right] \\ &=&\left[ \begin{array} [c]{cc} 1 & 0\\ \frac{1}{2}e_{i_{1}-1} & -\frac{1}{2}e_{i_{1}-1}e^{T}+W_{2} \end{array} \right] \end{array} $$

If i1 = 1, then the first column entries of F− 1TshrPF are 0 except for entry \(\left (1,1\right ) \). If i1 ≥ 2, then entry \(\left (i_{1},1\right ) \) is \(\frac {1}{2}\), while the remaining entries of the first column (in rows \(i\neq 1,i_{1}\)) are 0. Hence \(\left \Vert b\right \Vert _{2}\leq \frac {1}{2}\). The entries of the submatrix C are only 0, \(\frac {1}{2}\) and \(-\frac {1}{2}\), and each column of C contains at most two nonzero elements. Note that \(\rho \left (C_{i}\right ) =\frac {1}{2}\) and \(\left \Vert C_{i}\right \Vert _{1}\leq 1\) for \(T_{i}P^{\left (i\right ) }=T_{shr}P\) (\(P\in \mathcal {P}_{n+1}\)). □
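The reductions in the proof of Lemma 6 can be verified numerically. The following sketch (Python with NumPy; the constructions of \(T(\alpha)\), \(P_{j}\) and \(T_{shr}P\) follow the block forms displayed in the proof, with \(c=(1+\alpha)/n\)) checks for n = 3 that eT is a left eigenvector of every \(T_{i}P^{\left (i\right ) }\), that \(F^{-1}T_{i}P^{\left (i\right ) }F\) has the form (25), and the stated value of \(\left \Vert b\right \Vert _{2}\) for the case j = 1.

```python
import numpy as np
from itertools import permutations

n = 3                                    # any dimension works here

def T(alpha):
    # identity with last column [c,...,c,-alpha]^T, where c = (1+alpha)/n
    M = np.eye(n + 1)
    M[:, -1] = (1.0 + alpha) / n
    M[-1, -1] = -alpha
    return M

def Pj(j):
    # permutation inserting the new (last) vertex at position j
    order = list(range(n))
    order.insert(j - 1, n)
    return np.eye(n + 1)[:, order]

F = np.eye(n + 1); F[0, 1:] = -1.0       # F from (25)
Finv = np.eye(n + 1); Finv[0, 1:] = 1.0

e1 = np.eye(n + 1)[:, 0]
mats = [T(a) @ Pj(j) for a in (2.0, 1.0, 0.5, -0.5) for j in range(1, n + 2)]
mats += [0.5 * np.eye(n + 1)[:, list(p)] + 0.5 * np.outer(e1, np.ones(n + 1))
         for p in permutations(range(n + 1))]   # T_shr P = P/2 + e_1 e^T / 2

for M in mats:
    assert np.allclose(np.ones(n + 1) @ M, np.ones(n + 1))   # e^T M = e^T
    A = Finv @ M @ F
    assert abs(A[0, 0] - 1.0) < 1e-12 and np.allclose(A[0, 1:], 0.0)

# ||b||_2 = ((n-1)c^2 + alpha^2)^(1/2) for T(alpha)P_1; here alpha = c = 1/2
b1 = (Finv @ (T(0.5) @ Pj(1)) @ F)[1:, 0]
```
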

Note that matrices Ci and their norms play the key role in the convergence proof. Accordingly we divide the set \(\mathcal {T}\) in two disjoint sets

$$ \mathcal{W}_{1}=\left\{ T\left( \frac{1}{2}\right) P_{j},T\left( -\frac {1}{2}\right) P_{j}:j=1,2\right\} \cup\left\{ T_{shr}P:P\in\mathcal{P}_{n+1}\right\} $$
(26)

and

$$ \begin{array}{@{}rcl@{}} \mathcal{W}_{2}=\left\{ T\left( 2\right) P_{1},T\left( 1\right) P_{j}:j=1,\ldots,n\right\} \\ \cup\left\{ T\left( \frac{1}{2}\right) P_{j},T\left( -\frac{1}{2}\right) P_{j}:j=3,\ldots,n+1\right\} . \end{array} $$
(27)

The matrices of \(\mathcal {W}_{1}\) correspond to the inside and outside contraction operations, when the incoming vertices are inserted in the first or the second position of the ordering (3) or they correspond to any shrinking operation. The matrices of \(\mathcal {W}_{2}\) correspond to the remaining operations of \(\mathcal {T}\). Theorem 3 and Corollary 1 imply that

$$ \rho\left( C_{i}\right) <1\quad\left( T_{i}P^{\left( i\right) } \in\mathcal{W}_{1}\right) $$
(28)

and

$$ \rho\left( C_{i}\right) \geq1\quad\left( T_{i}P^{\left( i\right) } \in\mathcal{W}_{2}\right) . $$
(29)

Note that for each matrix \(T_{i}P^{\left (i\right ) }\in \mathcal {W}_{1}\), an induced matrix norm \(\left \Vert \cdot \right \Vert \) exists such that \(\rho \left (C_{i}\right ) \leq \left \Vert C_{i}\right \Vert <1\). However for any \(T_{i}P^{\left (i\right ) }\in \mathcal {W}_{2}\) and any induced matrix norm \(\left \Vert \cdot \right \Vert \), only \(1\leq \rho \left (C_{i}\right ) \leq \left \Vert C_{i}\right \Vert \) holds. Since \(\max \limits \left \{ \rho \left (C_{i}\right ) :T_{i}P^{\left (i\right ) }\in \mathcal {W}_{1}\right \} <1\), set \(\mathcal {W}_{1}\) might be an RCP set if we find a proper induced norm for which \(\left \Vert C_{i}\right \Vert \leq 1\) for all \(T_{i}P^{\left (i\right ) }\in \mathcal {W}_{1}\). In fact we make a stronger restriction in the form of Assumption (A).
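For n = 2 the spectral gap between the two sets can be checked directly. The following sketch (same matrix constructions as in the proof of Lemma 6, with \(c=(1+\alpha)/n\)) confirms (28) and (29).

```python
import numpy as np
from itertools import permutations

# Spectral radii of the blocks C_i for n = 2: < 1 on W_1, >= 1 on W_2.
n = 2
F = np.eye(3); F[0, 1:] = -1.0
Finv = np.eye(3); Finv[0, 1:] = 1.0

def T(alpha):
    M = np.eye(3); M[:, -1] = (1.0 + alpha) / n; M[-1, -1] = -alpha
    return M

def Pj(j):
    order = list(range(n)); order.insert(j - 1, n)
    return np.eye(3)[:, order]

def C(M):                                 # the block C_i of (25)
    return (Finv @ M @ F)[1:, 1:]

rho = lambda A: max(abs(np.linalg.eigvals(A)))
e1 = np.eye(3)[:, 0]

W1 = [T(a) @ Pj(j) for a in (0.5, -0.5) for j in (1, 2)]
W1 += [0.5 * np.eye(3)[:, list(p)] + 0.5 * np.outer(e1, np.ones(3))
       for p in permutations(range(3))]   # shrink matrices T_shr P
W2 = [T(2.0) @ Pj(1), T(1.0) @ Pj(1), T(1.0) @ Pj(2),
      T(0.5) @ Pj(3), T(-0.5) @ Pj(3)]

rho_W1 = max(rho(C(M)) for M in W1)       # < 1, cf. (28)
rho_W2 = min(rho(C(M)) for M in W2)       # >= 1, cf. (29)
```
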

5.2 A lemma on the convergence of lower block triangular matrices

For i ≥ 1, let

$$ A_{i}=\left[ \begin{array} [c]{cc} 1 & 0\\ b_{i} & C_{i} \end{array} \right] \in\mathbb{R}^{\left( n+1\right) \times\left( n+1\right) } \quad\left( C_{i}\in\mathbb{R}^{n\times n}\right) . $$
(30)

Lemma 7

Assume that \(\left \Vert {\prod }_{j=1}^{k}C_{j}\right \Vert \leq c_{k}\), \({\sum }_{k=1}^{\infty }c_{k}\) is convergent (\(<\infty \)) and \(\left \Vert b_{k}\right \Vert \leq \gamma \) for all k. Then \(L_{k}={\prod }_{j=1}^{k}A_{j}\) converges and

$$ \lim\limits_{k\rightarrow\infty}L_{k}=\left[ \begin{array} [c]{cc} 1 & 0\\ \widetilde{x} & 0 \end{array} \right] $$
(31)

for some \(\widetilde {x}\).

Proof

It is easy to see that

$$ L_{k}= \prod\limits_{j=1}^{k}A_{j}=\left[ \begin{array} [c]{cc} 1 & 0\\ {\sum}_{i=1}^{k}\left( \prod\limits_{j=1}^{i-1}C_{j}\right) b_{i} & \prod\limits_{j=1}^{k}C_{j} \end{array} \right] =\left[ \begin{array} [c]{cc} 1 & 0\\ x_{k} & \prod\limits_{j=1}^{k}C_{j} \end{array} \right] . $$
(32)

If \({\sum }_{k=1}^{\infty }c_{k}\) is convergent, then \(c_{k}\rightarrow 0\). Hence \({\prod }_{j=1}^{k}C_{j}\rightarrow 0\) as \(k\rightarrow \infty \). Since \(s_{k} ={\sum }_{j=1}^{k}c_{j}\) is convergent, for any ε > 0 there is a number \(k_{0}=k_{0}\left (\varepsilon \right ) \) such that for m > kk0, \(\left \vert s_{m}-s_{k}\right \vert <\varepsilon \). Thus for m > kk0, we obtain

$$ \left\Vert x_{m}-x_{k}\right\Vert \leq\sum\limits_{i=k+1}^{m}\left\Vert \prod\limits_{j=1}^{i-1}C_{j}\right\Vert \left\Vert b_{i}\right\Vert \leq\gamma \sum\limits_{i=k+1}^{m}c_{i-1}\leq\gamma\varepsilon. $$

Hence \(x_{k}\rightarrow \) \(\widetilde {x}\) for some \(\widetilde {x}\). □

If \(\left \Vert C_{j}\right \Vert \leq q<1\) for j ≥ 1, then \(\left \Vert {\prod }_{j=1}^{k}C_{j}\right \Vert \leq q^{k}\) and the series \({\sum }_{i=1} ^{\infty }q^{i}\) is convergent.
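A quick numerical illustration of Lemma 7 with synthetic data (a sketch: random factors \(A_{j}\) with \(\left \Vert C_{j}\right \Vert _{2}=q<1\) and bounded random \(b_{j}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
q, n = 0.9, 3

L = np.eye(n + 1)
for _ in range(400):
    C = rng.standard_normal((n, n))
    C *= q / np.linalg.norm(C, 2)        # scale so that ||C_j||_2 = q < 1
    A = np.zeros((n + 1, n + 1))
    A[0, 0] = 1.0
    A[1:, 0] = rng.standard_normal(n)    # the vectors b_j
    A[1:, 1:] = C
    L = L @ A                            # L_k = A_1 A_2 ... A_k, as in (32)

# limit shape (31): [[1, 0], [x~, 0]]
top_ok = bool(np.allclose(L[0], np.concatenate(([1.0], np.zeros(n)))))
tail = np.linalg.norm(L[1:, 1:], 2)      # = ||prod C_j||_2 <= q^400
```
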

5.3 The convergence theorem

Formula (25) implies that \(B_{k}={\prod }_{i=1}^{k}T_{i}P^{\left (i\right ) }=FL_{k}F^{-1}\), where

$$ L_{k}= \prod\limits_{i=1}^{k}\left[ \begin{array} [c]{cc} 1 & 0\\ b_{i} & C_{i} \end{array} \right] , $$
(33)

and Bk is convergent if and only if Lk is convergent. The convergence of the Nelder-Mead algorithm will be proved under the following key condition.

Assumption (A): There is an induced matrix norm \(\left \Vert A\right \Vert _{\vartheta }\) such that if \(T_{i}P^{\left (i\right ) } \in \mathcal {W}_{1}\), then \(\left \Vert C_{i}\right \Vert _{\vartheta } <1\).

If Assumption (A) holds, there exist constants 0 < q < 1 ≤ Q such that \(\left \Vert C_{i}\right \Vert _{\vartheta }\leq q<1\) (\(T_{i}P^{\left (i\right ) }\in \mathcal {W}_{1}\)) and \(1\leq \left \Vert C_{i}\right \Vert _{\vartheta }\leq Q\) (\(T_{i}P^{\left (i\right ) }\in \mathcal {W}_{2}\)). Also there is a constant γ > 0 such that for every \(T_{i}P^{\left (i\right ) }\in \mathcal {W}_{1}\cup \mathcal {W}_{2}\), \(\left \Vert b_{i}\right \Vert _{\vartheta }\leq \gamma \).

Under Assumption (A) the matrix set \(\mathcal {C}=\left \{ C_{i}:T_{i} P^{\left (i\right ) }\in \mathcal {W}_{1}\right \} \) is an RCP set and all infinite products \({\prod }_{i=1}^{\infty }C^{\left (i\right ) }\) (\(C^{\left (i\right ) }\in \mathcal {C}\)) converge to the zero matrix. By Lemma 7 the sequences \(B_{k}={\prod }_{i=1}^{k}T_{i}P^{\left (i\right ) }\) and \(S^{\left (k\right ) }=S^{\left (0\right ) }B_{k}\) (\(T_{i}P^{\left (i\right ) }\in \mathcal {W}_{1}\)) are also convergent. Hence if Assumption (A) holds, then \(\mathcal {W}_{1}\) is an RCP set. We show in Section 6 and the Appendix that Assumption (A) holds at least for 1 ≤ n ≤ 8. Note again that matrices of \(\mathcal {W}_{1}\) correspond to the inside and outside contraction operations, when the incoming vertices are inserted in the first or the second position of the ordering (3) or they correspond to any shrinking operation.
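The vanishing of the products \({\prod }_{i=1}^{\infty }C^{\left (i\right ) }\) can be observed numerically. The sketch below (n = 2; the blocks \(C_{i}\) are built as in the proof of Lemma 6, with \(c=(1+\alpha)/n\)) forms several long random products over \(\mathcal {W}_{1}\) and checks that they are negligibly small, as the RCP property predicts.

```python
import numpy as np
from itertools import permutations

n = 2
F = np.eye(3); F[0, 1:] = -1.0
Finv = np.eye(3); Finv[0, 1:] = 1.0

def T(alpha):
    M = np.eye(3); M[:, -1] = (1.0 + alpha) / n; M[-1, -1] = -alpha
    return M

def Pj(j):
    order = list(range(n)); order.insert(j - 1, n)
    return np.eye(3)[:, order]

e1 = np.eye(3)[:, 0]
W1 = [T(a) @ Pj(j) for a in (0.5, -0.5) for j in (1, 2)]
W1 += [0.5 * np.eye(3)[:, list(p)] + 0.5 * np.outer(e1, np.ones(3))
       for p in permutations(range(3))]
Cs = [(Finv @ M @ F)[1:, 1:] for M in W1]   # the blocks C_i of W_1

rng = np.random.default_rng(1)
worst = 0.0
for _ in range(10):                          # ten random products of 300 factors
    prod = np.eye(n)
    for _ in range(300):
        prod = prod @ Cs[rng.integers(len(Cs))]
    worst = max(worst, np.linalg.norm(prod, 2))
```
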

Theorem 9

Suppose that Assumption (A) is satisfied and \(S^{\left (0\right ) }\) is nondegenerate. Let \(t_{1}\left (k\right ) \) be the number of operations \(T_{i}P^{\left (i\right ) }\) that belong to \(\mathcal {W}_{1}\), and \(t_{2}\left (k\right ) \) the number of those operations \(T_{i}P^{\left (i\right ) }\) that belong to \(\mathcal {W}_{2}\), during the first k iterations of the Nelder-Mead method. Also assume that for \(\kappa \in \mathbb {N}\), \(q^{1-\kappa }<Q\leq q^{-\kappa }\), and for \(\mu \in \left (0,1\right ) \), \(t_{1}\left (k\right ) \geq \mu k+\kappa t_{2}\left (k\right ) \) holds (\(k\geq k_{0}\)). Then the Nelder-Mead algorithm converges in the sense that

$$ \lim\limits_{k\rightarrow\infty}x_{j}^{\left( k\right) }=\widehat{x}\quad (j=1,\ldots,n+1) $$
(34)

for some vector \(\widehat {x}\) with a convergence speed proportional to \(O\left (q^{\mu k}\right ) \). If f is continuous at \(\widehat {x}\), then

$$ \lim\limits_{k\rightarrow\infty}f\left( x_{j}^{\left( k\right) }\right) =f\left( \widehat{x}\right) \quad(j=1,\ldots,n+1) $$
(35)

holds as well.

Proof

We first investigate the product (33). By assumption \(t_{1}\left (k\right ) \) is the number of those Ci’s that satisfy \(\left \Vert C_{i}\right \Vert \leq q<1\) (\(1\leq i\leq k\)) and \(t_{2}\left (k\right ) \) is the number of those Ci’s that satisfy \(1\leq \left \Vert C_{i}\right \Vert \leq Q\) (\(1\leq i\leq k\)). Clearly, \(0\leq t_{i}\left (k\right ) \leq k\) and \(t_{1}\left (k\right ) +t_{2}\left (k\right ) =k\). Then

$$ \left\Vert \prod\limits_{j=1}^{k}C_{j}\right\Vert_{\vartheta}\leq q^{t_{1}\left( k\right) }Q^{t_{2}\left( k\right) }\leq q^{t_{1}\left( k\right) -\kappa t_{2}\left( k\right) }\leq q^{\mu k}=c_{k} $$

and \({\sum }_{k=1}^{\infty }c_{k}\) is clearly convergent. Hence it follows from Lemma 7 that

$$ \lim\limits_{k\rightarrow\infty}L_{k}=\left[ \begin{array} [c]{cc} 1 & 0\\ \widetilde{x} & 0 \end{array} \right] =\widetilde{L} $$

for some vector \(\widetilde {x}\). Since

$$ \left\Vert \widetilde{x}-x_{k}\right\Vert =\left\Vert \sum\limits_{i=k+1}^{\infty }\left( \prod\limits_{j=1}^{i-1}C_{j}\right) b_{i}\right\Vert_{\vartheta} \leq\gamma\sum\limits_{i=k}^{\infty}q^{\mu i}\leq{\varGamma}_{1}q^{\mu k}, $$
$$ \left\Vert L_{k}-\widetilde{L}\right\Vert_{\vartheta}\leq{\varGamma}_{2}q^{\mu k} $$

holds with a suitable constant Γ2 > 0. Hence

$$ B_{k}\rightarrow F\left[ \begin{array} [c]{cc} 1 & 0\\ \widetilde{x} & 0 \end{array} \right] F^{-1}=\left[ \begin{array} [c]{c} 1-e^{T}\widetilde{x}\\ \widetilde{x} \end{array} \right] e^{T}=we^{T}=B $$
(36)

and

$$ \left\Vert B_{k}-B\right\Vert_{\vartheta}\leq{\varGamma}_{2}\text{cond}\left( F\right) q^{\mu k}. $$
(37)

Corollary 2

diam\(\left (S^{\left (k\right ) }\right ) \rightarrow 0\) (\(k\rightarrow \infty \)) with a speed of \(O\left (q^{\mu k}\right ) \).

For higher dimensions, we can expect slower convergence, since Lemma 3 implies that q must approach 1.

Except for Lagarias et al. [23], we do not know what kinds of steps can follow each other when the Nelder-Mead method is applied to a function. Under Assumption (A), if the Nelder-Mead steps are taken only from \(\mathcal {W}_{1}\), then the algorithm converges in the sense of Theorem 9. Thus the convergence of the Han-Neumann case (18)–(19) also follows. In general the method also takes steps from \(\mathcal {W}_{2}\). If this happens only a finite number of times, that is, \(t_{2}\left (k\right ) \leq k_{0}\), then \(t_{1}\left (k\right ) \geq k-k_{0}\) and we can set μ = 1 in the theorem. If not, we must assume that the elements from \(\mathcal {W}_{1}\) counterbalance the effect of those from \(\mathcal {W}_{2}\). This is provided by the simple assumption \(t_{1}\left (k\right ) \geq \mu k+\kappa t_{2}\left (k\right ) \).

The difficulty in applying Theorem 9 is finding a suitable norm \(\left \Vert \cdot \right \Vert _{\vartheta }\) for which Assumption (A) holds. The reason is the following.

If all infinite products from a matrix set Σ converge, that is Σ is an RCP set, then Σ is also product bounded (see, e.g., [13]). A set Σ of n × n matrices is product bounded if there is a constant β > 0 such that \(\left \Vert A_{1}{\cdots } A_{k} \right \Vert \leq \beta \) for all k and all A1,…,AkΣ. A matrix set Σ is product bounded if and only if there exists a multiplicative matrix norm \(\left \Vert \cdot \right \Vert \) such that \(\left \Vert A\right \Vert \leq 1\) for all AΣ (see, e.g., [4, 13]).

If \(\mathcal {W}_{1}\) is an RCP set, then it is also product bounded. Blondel and Tsitsiklis [5] proved that the product boundedness of a finite matrix set Σ is algorithmically undecidable and it remains undecidable even in the special case, when Σ consists of only two matrices. Since product boundedness is a weaker property than the RCP, and yet it is algorithmically undecidable, it seems difficult to decide the RCP property in general. In Section 6 we present a technique that circumvents this problem for \(\mathcal {W}_{1}\) at least for n ≤ 8.

6 The convergence in low dimensions

Here we show that Assumption (A) holds for n = 1,2,…,8, which implies that Theorem 9 also holds for n = 1,2,…,8. Case n = 1 is simple, but for cases n = 2,…,8, we have to construct induced matrix norms that satisfy (A).

6.1 Convergence for n = 1

In this case

$$ \mathcal{W}_{1}=\left\{ T\left( \frac{1}{2}\right) P_{j},T\left( -\frac {1}{2}\right) P_{j}:j=1,2\right\} $$

and

$$ \mathcal{W}_{2}=\left\{ T\left( 1\right) P_{2},T\left( 2\right) P_{2},T\left( 1\right) P_{1}\right\} . $$

Lemma 6 implies

$$ F^{-1}B_{k}F= \prod\limits_{i=1}^{k}\left[ \begin{array} [c]{cc} 1 & 0\\ b_{i} & c_{i} \end{array} \right] , $$

where \(\left \vert c_{i}\right \vert =\frac {1}{2}=q\) (\(T_{i}P^{\left (i\right ) }\in \mathcal {W}_{1}\)) and \(1\leq \left \vert c_{i}\right \vert \leq 2=Q\) (\(T_{i}P^{\left (i\right ) }\in \mathcal {W}_{2}\)). Here the norm is \(\left \Vert \cdot \right \Vert _{w}=\left \vert \cdot \right \vert \), and the convergence Theorem 9 holds with κ = 1 (since \(Q=2=q^{-1}\)).
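The scalar values \(c_{i}\) can be verified directly (a sketch; for n = 1, \(T\left (\alpha \right ) \) is the 2 × 2 matrix with last column \(\left [ 1+\alpha ,-\alpha \right ]^{T}\), consistent with \(c=(1+\alpha)/n\) in the proof of Lemma 6, and \(P_{1}\) swaps the two columns while \(P_{2}=I\)):

```python
import numpy as np

F = np.array([[1.0, -1.0], [0.0, 1.0]])
Finv = np.array([[1.0, 1.0], [0.0, 1.0]])
P1 = np.array([[0.0, 1.0], [1.0, 0.0]])  # new vertex becomes best
P2 = np.eye(2)                           # new vertex stays worst

def T(alpha):
    return np.array([[1.0, 1.0 + alpha], [0.0, -alpha]])

c = lambda M: (Finv @ M @ F)[1, 1]       # the scalar c_i of the reduced form

cs_W1 = [c(T(a) @ P) for a in (0.5, -0.5) for P in (P1, P2)]
cs_W2 = [c(T(1.0) @ P2), c(T(2.0) @ P2), c(T(1.0) @ P1)]
```
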

6.2 Convergence for 2 ≤ n ≤ 8

Using the result of Stein [40], Householder [17] proved that for each matrix \(A\in \mathbb {R}^{m\times m}\) with \(\rho \left (A\right ) <1\), there is a matrix R such that \(\left \Vert RAR^{-1}\right \Vert _{2}<1\). A related result is given by Deutsch [7].

Here we need a matrix S such that \(\left \Vert C_{i}\right \Vert _{w}=\left \Vert S^{-1}C_{i}S\right \Vert _{2}<1\) holds for all matrices \(T_{i}P^{\left (i\right ) }\in \mathcal {W}_{1}\). A simultaneous diagonalization of these matrices would clearly do. However, the matrices of \(\mathcal {W}_{1}\) are not pairwise commuting, and so they are not simultaneously diagonalizable (see, e.g., [9, 34, 37]). Hence we tried to solve the optimization problem

$$ \min_{S}\max\left\{ \left\Vert S^{-1}C_{i}S\right\Vert_{2}:T_{i}P^{\left( i\right) }\in\left\{ T\left( \frac{1}{2}\right) P_{j},T\left( -\frac {1}{2}\right) P_{j}:j=1,2\right\} \right\} . $$
(38)

We used the standard matrix routines of Matlab R2013b and the ‘fminsearch’ (Nelder-Mead) algorithm starting from several initial points. All numerical computations were done on a PC with an Intel i7-8700 CPU @ 3.20GHz and the Windows 10 operating system. The numerical results presented in this section and in the Appendix are given in Matlab’s short format (scaled fixed point format with 5 digits).

Since the number of possible \(\left (n+1\right ) \times \left (n+1\right ) \) matrices \(T_{i}P^{\left (i\right ) }\) is \(N=3n+3+\left (n+1\right ) !\), we only present the following computed quantities:

$$ \rho_{1}=\max\left\{ \rho\left( C_{i}\right) :T_{i}P^{i}\in\mathcal{W}_{1}\right\} ,\ \nu_{1}=\max\left\{ \left\Vert C_{i}\right\Vert_{2} :T_{i}P^{\left( i\right) }\in\mathcal{W}_{1}\right\} , $$
$$ \rho_{2}=\max\left\{ \rho\left( C_{i}\right) :T_{i}P^{\left( i\right) }\in\mathcal{W}_{2}\right\} ,\ \nu_{2}=\max\left\{ \left\Vert C_{i} \right\Vert_{2}:T_{i}P^{\left( i\right) }\in\mathcal{W}_{2}\right\} , $$
$$ q=\max\left\{ \left\Vert S^{-1}C_{i}S\right\Vert_{2}:T_{i}P^{\left( i\right) }\in\mathcal{W}_{1}\right\} ,\ Q=\max\left\{ \left\Vert S^{-1}C_{i}S\right\Vert_{2}:T_{i}P^{\left( i\right) }\in\mathcal{W} _{2}\right\} . $$

The computed matrices S will be presented in the Appendix for 2 ≤ n ≤ 8.

The computed values are as follows, where columns ρ1, ν1 and q refer to \(T_{i}P^{i}\in \mathcal {W}_{1}\), and columns ρ2, ν2 and Q refer to \(T_{i}P^{\left (i\right ) }\in \mathcal {W}_{2}\):

  n    ρ1       ν1       q        ρ2       ν2       Q        κ
  2    0.8431   1.2892   0.8431   1.6861   3.1787   2.6568   6
  3    0.9275   1.2622   0.9275   1.5214   3.8378   2.7437   14
  4    0.9587   1.2271   0.9590   1.4201   4.3635   2.7298   24
  5    0.9735   1.2171   0.9735   1.3517   4.8195   2.7560   38
  6    0.9815   1.3155   0.9836   1.3024   5.2301   3.0442   68
  7    0.9864   1.4075   0.9885   1.2652   5.6075   3.0901   98
  8    0.9896   1.4939   0.9913   1.2361   5.9592   3.1962   133

The data in the table imply that Assumption (A) holds, and so Theorem 9 implies the convergence of the Nelder-Mead method for n = 2,…,8. Note that ρ1 and ρ2 approach 1, as indicated by Lemmas 3 and 4.
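The n = 2 row of the table is cheap to reproduce even without the optimized matrix S, since ρ1, ν1, ρ2 and ν2 do not depend on the similarity. The sketch below (constructions as in the proof of Lemma 6, with \(c=(1+\alpha)/n\)) computes them directly and recovers κ = 6 from the tabulated q and Q.

```python
import numpy as np
from itertools import permutations
from math import ceil, log

n = 2
F = np.eye(3); F[0, 1:] = -1.0
Finv = np.eye(3); Finv[0, 1:] = 1.0

def T(alpha):
    M = np.eye(3); M[:, -1] = (1.0 + alpha) / n; M[-1, -1] = -alpha
    return M

def Pj(j):
    order = list(range(n)); order.insert(j - 1, n)
    return np.eye(3)[:, order]

C = lambda M: (Finv @ M @ F)[1:, 1:]
rho = lambda A: max(abs(np.linalg.eigvals(A)))
e1 = np.eye(3)[:, 0]

W1 = [T(a) @ Pj(j) for a in (0.5, -0.5) for j in (1, 2)]
W1 += [0.5 * np.eye(3)[:, list(p)] + 0.5 * np.outer(e1, np.ones(3))
       for p in permutations(range(3))]
W2 = [T(2.0) @ Pj(1), T(1.0) @ Pj(1), T(1.0) @ Pj(2),
      T(0.5) @ Pj(3), T(-0.5) @ Pj(3)]

rho1 = max(rho(C(M)) for M in W1)                 # 0.8431
nu1 = max(np.linalg.norm(C(M), 2) for M in W1)    # 1.2892
rho2 = max(rho(C(M)) for M in W2)                 # 1.6861
nu2 = max(np.linalg.norm(C(M), 2) for M in W2)    # 3.1787
kappa = ceil(log(2.6568) / -log(0.8431))          # minimal kappa with Q <= q^-kappa
```
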

If we exclude the expansion operations (\(T\left (2\right ) P_{1}\), \(T\left (1\right ) P_{1}\)), we obtain smaller κ values, as shown in the table below.

  n     2        3        4        5        6        7        8
  q     0.8431   0.9275   0.9590   0.9735   0.9836   0.9885   0.9913
  Q′    1.3520   1.3918   1.3858   1.3889   1.5442   1.5545   1.6144
  κ′    2        5        8        13       27       39       56

Consequently we obtain a faster convergence speed, although the estimated speed still slows down as n increases.

7 Summary

We analyzed the Nelder-Mead algorithm in the iterative form

$$ S^{\left( k\right) }=S^{\left( k-1\right) }T_{k}P^{\left( k\right) }=S^{\left( 0\right) } \prod\limits_{i=1}^{k}T_{i}P^{\left( i\right) }, $$

where \(T_{i}P^{\left (i\right ) }\in \mathcal {T}\) is the matrix of the executed inner step at iteration i. Since the convergence of the sequence \(\left \{ S^{\left (k\right ) }\right \} \) clearly depends on the convergence of the infinite matrix products \({\prod }_{i=1}^{\infty }T_{i}P^{\left (i\right ) }\), we used techniques from the theory of infinite matrix products [13]. First, we investigated the spectra of the matrices \(T_{i}P^{\left (i\right ) }\); then, using a simultaneous similarity reduction of \(\mathcal {T}\) to lower block triangular matrices, we proved a convergence result (Theorem 9) for the simplex sequence \(\left \{ S^{\left (k\right ) }\right \} \) to rank-one matrices of the form \(\widehat {x}e^{T}\) for some vector \(\widehat {x}\). This implies the convergence \(f_{i}^{\left (k\right ) }\rightarrow f\left (\widehat {x}\right ) \). The examples of Section 4 support the study of this type of convergence. The main idea of the convergence theorem is to identify a subset \(\mathcal {W}_{1}\) of the operations \(\mathcal {T}\) that has the RCP property. This property follows from Assumption (A), which is proved for \(1\leq n\leq 8\) using numerical optimization. It is not yet known whether this bound can be increased in the same way. A deficiency of Theorem 9 is that \(\widehat {x}\) is not related to any stationary point of f (for a similar result, see also Lagarias et al. [23]). The results of Kelley [20], Lagarias et al. [22] and also the examples of Section 4 indicate that new techniques must be developed for such results.