1 Introduction

We derive upper bounds on the moduli of the zeros of a scalar polynomial. While, in general, it is certainly true that such bounds have become less important now that polynomial zeros can be computed very efficiently, a high-quality and simple explicit bound is still useful. The Cauchy radius, which is an implicit bound, outperforms explicit ones, but has the disadvantage that it must be computed numerically. Although the bounds we obtain here are explicit, they are related to an implicit bound to such an extent that they approach the Cauchy radius in effectiveness.

These bounds are obtained in a somewhat nontraditional way by first embedding scalar polynomials into the larger framework of their generalization to matrix polynomials and then using bounds on the eigenvalues of the latter. Matrix polynomials are encountered in polynomial eigenvalue problems, which consist in finding a nonzero eigenvector \(v \in \mathbb {C}^{m}\) and eigenvalue \(\lambda \in \mathbb {C}\) such that \(P(\lambda ) v = 0\), where

$$\begin{aligned} P(\lambda ) = A_{n} \lambda ^{n} + A_{n-1} \lambda ^{n-1} + \dots + A_{1} \lambda + A_{0} \; , \end{aligned}$$

and \(A_{j} \in \mathbb {C}^{m \times m}\). A matrix polynomial is called regular if its determinant is not identically zero, and we will assume this to be the case throughout. If \(A_{0}\) is singular, then zero is an eigenvalue, and if \(A_{n}\) is singular, then there are infinite eigenvalues (zero eigenvalues of the reverse polynomial). There are nm eigenvalues, finite and infinite, counting algebraic multiplicities. The finite eigenvalues are the solutions of \(\det P(\lambda )=0\). The matrix polynomials we will encounter here have only finite eigenvalues. A good introduction to this topic is [4], while a long list of engineering applications can be found in [1].

The Cauchy radius is defined by the following.

Definition 1.1

The Cauchy radius of the scalar polynomial \(p(z) = \sum _{j=0}^{n} a_{j}z^{j}\) of degree n that is not a monomial is the unique positive zero of the real polynomial \(|a_{n}|z^{n} -\sum _{j=0}^{n-1} |a_{j}| z^{j}\).

For a given matrix norm, the Cauchy radius of the matrix polynomial \(P(z) = \sum _{j=0}^{n} A_{j}z^{j}\) of degree n that is not a monomial is the unique positive zero of the real polynomial \(\Vert A_{n}^{-1}\Vert ^{-1}z^{n} -\sum _{j=0}^{n-1} \Vert A_{j}\Vert z^{j}\).
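Since the real polynomial in Definition 1.1 has exactly one sign change on the positive axis, its positive zero can be found by bisection. The following minimal sketch (the function name and bracketing interval are our own choices, not from the paper) computes the Cauchy radius of a scalar polynomial:

```python
def cauchy_radius(coeffs):
    """Unique positive zero of |a_n| z^n - sum_{j<n} |a_j| z^j,
    for p(z) = sum coeffs[j] * z**j with coeffs[-1] != 0, not a monomial."""
    a = [abs(c) for c in coeffs]
    n = len(a) - 1
    # Classical bound: the positive zero lies in (0, 1 + max_j |a_j| / |a_n|].
    lo, hi = 0.0, 1.0 + max(a[:n]) / a[n]
    f = lambda z: a[n] * z ** n - sum(a[j] * z ** j for j in range(n))
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return (lo + hi) / 2
```

For \(p(z)=z^{2}-2z-3\), with zeros \(3\) and \(-1\), the associated real polynomial is \(z^{2}-2z-3\) itself, so the Cauchy radius is exactly \(3\).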

The Cauchy radius is an upper bound on the moduli of the zeros or eigenvalues of the corresponding scalar or matrix polynomial, respectively. For scalar polynomials, this result is due to Cauchy ([3, 6, Theorem (27,1), p.122]), while it was generalized to matrix polynomials in [5] (2003), [2] (2013), and [7] (2013), each reaching the same result in a different way.

It is optimal for a given matrix norm, namely, it is the best possible bound among all bounds that depend only on the norms of the coefficient matrices, making it difficult to improve. However, Theorem 8.3.1 in [9] does precisely that for scalar polynomials, by showing that the Cauchy radius can be improved by using an appropriate polynomial multiplier. Several generalizations of this theorem from scalar to matrix polynomials exist, and the simplest one of these was derived in [8, Theorem 2.2]. We state it in the following theorem.

Theorem 1.1

([8]) Let \(P(z) = \sum _{j=0}^{n} A_{j} z^{j}\) be a regular matrix polynomial other than a monomial, with square complex matrix coefficients and with \(A_{n}\) nonsingular. Let \(B_{j}=A_{n}^{-1}A_{j}\), and let the matrix norm used to determine the Cauchy radius of P be unital, i.e., \(\Vert \textrm{I} \Vert =1\). Set \(B_{j}=0\) if \(j \notin \{0,1,\ldots ,n\}\), and define

$$\begin{aligned} Q(z) = \left( I z^{k} - B_{n-k} \right) A_{n}^{-1} P(z) \; , \end{aligned}$$

with k a nonnegative integer. Then, Q has a Cauchy radius \(\sigma \) satisfying \(0 < \sigma \le \rho \), where \(\rho \) is the Cauchy radius of P, and all eigenvalues of P lie in the closed disk defined by \(|z| \le \sigma \).

An analogous result is obtained by performing right-multiplication instead of left-multiplication. Theorem 1.1 also applies to the special case of scalar polynomials as they are matrix polynomials with \(1 \times 1\) coefficients.
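In the scalar (\(m=1\)) case, Theorem 1.1 says that multiplying a monic \(p\) by \(z^{k}-a_{n-k}\) cannot increase the Cauchy radius. This can be checked numerically with the sketch below (the helper names are ours; the bisection routine is repeated to keep the block self-contained):

```python
def cauchy_radius(coeffs):
    # Unique positive zero of |a_n| z^n - sum_{j<n} |a_j| z^j, by bisection.
    a = [abs(c) for c in coeffs]
    n = len(a) - 1
    f = lambda z: a[n] * z ** n - sum(a[j] * z ** j for j in range(n))
    lo, hi = 0.0, 1.0 + max(a[:n]) / a[n]
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

def improved_radius(coeffs, k):
    # Cauchy radius of Q(z) = (z**k - a_{n-k}) * p(z) for monic p,
    # the scalar case of Theorem 1.1 (left multiplier).
    n = len(coeffs) - 1
    b = coeffs[n - k] if 0 <= n - k <= n else 0.0
    mult = [-b] + [0.0] * (k - 1) + [1.0]          # coefficients of z**k - a_{n-k}
    q = [0.0] * (n + k + 1)
    for i, c in enumerate(mult):                   # polynomial product
        for j, d in enumerate(coeffs):
            q[i + j] += c * d
    return cauchy_radius(q)
```

For \(p(z)=z^{2}+2z+3\) (zeros of modulus \(\sqrt{3}\approx 1.73\)) and \(k=1\), one gets \(Q(z)=z^{3}-z-6\), so \(\sigma =2\) improves on \(\rho =3\).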

Our general approach to constructing an upper bound on the moduli of the zeros of a scalar polynomial p is to derive a quadratic matrix polynomial P whose eigenvalues are the zeros of p and then obtain the upper bound as a close approximation to the Cauchy radius of P, improved by means of Theorem 1.1.

The following section summarizes the notation and basic concepts that are needed later on. In Section 3, the bounds are derived, while numerical results are presented in Section 4. The Appendix contains the derivation of a technical result that is required in Section 3.

2 Preliminaries

Throughout, we denote the Hermitian conjugate of a matrix \(A \in \mathbb {C}^{n \times n}\) by \(A^{*}\), i.e., \(A^{*}=\bar{A}^{T}\). We use “\(\textrm{I}\)” for the identity matrix and “0” for the zero matrix without specifying their size, which should be clear from the context. Unless specified otherwise, we will use the infinity vector norm for \(v \in \mathbb {C}^{n}\), defined as \(\Vert v\Vert _{\infty } = \max _{1 \le i \le n} |v_{i}|\), where \(v_{i}\) is the ith component of v, and also the infinity matrix norm for \(A \in \mathbb {C}^{n \times n}\), defined as \(\Vert A \Vert _{\infty } = \max _{1 \le i \le n} \sum _{j=1}^{n} |a_{ij}|\), where \(a_{ij}\) is the (ij)th element of A.

The following simple lemma will be useful later on.

Lemma 2.1

For an integer \(n \ge 3\), let \(x_{j} \in \mathbb {C}\) for \(j=1,\ldots ,n\) and \(y_{j} \in \mathbb {C}\) for \(j=1,\ldots ,n-1\), and define

$$\begin{aligned} \Delta _{n}= \left( \begin{array}{ccccc} y_{1} &{} &{} &{} &{} x_{1} \\ -1 &{} y_{2} &{} &{} &{} x_{2} \\ &{} \ddots &{} \ddots &{} &{} \vdots \\ &{} &{} -1 &{} y_{n-1} &{} x_{n-1} \\ &{} &{} &{} -1 &{} x_{n} \\ \end{array}\right) \; , \end{aligned}$$

where blank spaces represent zero entries. Then,

$$\begin{aligned} \det \left( \Delta _{n} \right) = x_{n} \prod _{j=1}^{n-1} y_{j} + x_{n-1} \prod _{j=1}^{n-2} y_{j} + \dots + x_{n-i} \prod _{j=1}^{n-i-1} y_{j} + \dots + x_{2} y_{1} + x_{1} \; . \end{aligned}$$

Proof

Developing \(\det \left( \Delta _{n} \right) \) along the bottom row, we obtain \(\det \left( \Delta _{n} \right) = x_{n} \prod _{j=1}^{n-1} y_{j} + \det (\Delta _{n-1})\). Repeatedly applying this recurrence relation completes the proof.\(\square \)
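Lemma 2.1 is easy to verify numerically; the sketch below (helper names ours) builds \(\Delta _{n}\), computes its determinant by cofactor expansion, and compares it with the stated formula:

```python
def det(M):
    # Determinant by cofactor expansion along the first row (fine for small n).
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

def delta(x, y):
    # Build Delta_n of Lemma 2.1 from x_1, ..., x_n and y_1, ..., y_{n-1}.
    n = len(x)
    M = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        M[i][i] = y[i]            # diagonal y_1, ..., y_{n-1}
        M[i + 1][i] = -1          # subdiagonal of -1's
        M[i][n - 1] = x[i]        # last column x_1, ..., x_{n-1}
    M[n - 1][n - 1] = x[n - 1]    # bottom-right entry x_n
    return M

def formula(x, y):
    # x_n * y_1...y_{n-1} + x_{n-1} * y_1...y_{n-2} + ... + x_2 * y_1 + x_1
    total, prod = x[0], 1
    for i in range(1, len(x)):
        prod *= y[i - 1]
        total += x[i] * prod
    return total
```

With integer data the comparison is exact; for instance, \(x=(2,3,5,7)\), \(y=(11,13,17)\) gives \(\det (\Delta _{4}) = 7 \cdot 2431 + 5 \cdot 143 + 3 \cdot 11 + 2 = 17767\).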

We will also need the following lemma, the derivation of which has been relegated to the appendix, due to its technical nature.

Lemma 2.2

The unique positive zero \(\zeta \) of \(x^{m}-\varepsilon x -1\), with \(\varepsilon \ge 0\) and integer \(m \ge 3\), satisfies

$$\begin{aligned} \zeta \le \dfrac{1+(m-1)(1+\varepsilon )^{\frac{m}{m-1}}}{1+(m-1)(1+\varepsilon )} \le (1+\varepsilon )^{1/(m-1)} \; . \end{aligned}$$

3 Bounds

To keep expressions as simple as possible (and without loss of generality), we consider only monic scalar polynomials, i.e., polynomials with leading coefficient \(a_{n}=1\), and assume that the polynomial cannot be simplified, i.e., that the exponents of its nonzero terms do not all share a common integer factor \(q \ge 2\). A polynomial that can be simplified can be rewritten as

$$\begin{aligned} a_{0} + \sum _{j=1}^{k} a_{j} z^{qj} = a_{0} + \sum _{j=1}^{k} a_{j} y^{j} \; , \end{aligned}$$

where \(y=z^{q}\). Bounds on z are then easily obtained from bounds on y. Although not necessary to obtain our results, simplifying a polynomial significantly reduces the size of the matrices involved in those results.

We now derive a quadratic matrix polynomial whose eigenvalues are the zeros of a given scalar polynomial of even degree, with the goal of obtaining bounds for the zeros of the scalar polynomial from bounds for the eigenvalues of the matrix polynomial. If the scalar polynomial is of odd degree, we instead consider the polynomial zp(z) with no discernible detrimental effect on the ultimate bounds we will derive later. Therefore, from now on, all scalar polynomials will be considered to be of even degree, and we begin by writing such a polynomial p as

$$\begin{aligned} p(z)= & {} z^{n} + a_{n-1} z^{n-1} +\ldots + a_{1} z + a_{0} \nonumber \\= & {} \left( z^{2} + a_{n-1} z + a_{n-2} \right) z^{n-2} + \left( a_{n-3} z + a_{n-4} \right) z^{n-4} \nonumber \\{} & {} + \left( a_{n-5} z + a_{n-6} \right) z^{n-6} + \dots + \left( a_{3} z + a_{2} \right) z^{2} + \left( a_{1} z + a_{0} \right) \;. \end{aligned}$$
(1)

Lemma 2.1, applied to the \(n/2 \times n/2\) matrix with \(x_{j}=a_{2j-1}z+a_{2j-2}\) and \(y_{j}=z^{2}\) for \(j=1,\dots ,n/2-1\), and \(x_{n/2}=z^{2}+a_{n-1} z+ a_{n-2}\), shows that the right-hand side in (1) is the determinant of the \(n/2 \times n/2\) matrix

$$\begin{aligned} \left( \begin{array}{cccccc} z^{2} &{} &{} &{} &{} &{} a_{1} z + a_{0} \\ -1 &{} z^{2} &{} &{} &{} &{} a_{3} z + a_{2} \\ &{}\ddots &{} \ddots &{} &{} &{} \vdots \\ &{} &{} -1 &{} z^{2} &{} &{} a_{n-5} z + a_{n-6} \\ &{} &{} &{} -1 &{} z^{2} &{} a_{n-3} z + a_{n-4} \\ &{} &{} &{} &{} -1 &{} z^{2} + a_{n-1} z + a_{n-2} \\ \end{array}\right) \; , \end{aligned}$$

which can be written as \(\textrm{I} z^{2} - A_{1} z - A_{0}\), where

$$\begin{aligned} A_{0} = \left( \begin{array}{ccccc} 0 &{} &{} &{} &{} -a_{0} \\ 1 &{} &{} &{} &{} -a_{2} \\ &{} \ddots &{} \ddots &{} &{} \vdots \\ &{} &{} 1 &{} 0 &{} - a_{n-4} \\ &{} &{} &{} 1 &{} -a_{n-2} \\ \end{array}\right) \, \;\;\;\; \text {and} \;\;\;\; A_{1} = \left( \begin{array}{ccccc} &{} &{} &{} &{} -a_{1} \\ &{} &{} &{} &{} -a_{3} \\ &{} &{} &{} &{} \vdots \\ &{} &{} &{} &{} -a_{n-3} \\ &{} &{} &{} &{} -a_{n-1}\\ \end{array}\right) . \end{aligned}$$

In other words, the zeros of the scalar polynomial \(p(z)=z^{n}+\sum _{j=0}^{n-1} a_{j} z^{j}\) of even degree are the eigenvalues of the quadratic matrix polynomial \(\textrm{I} z^{2} - A_{1} z - A_{0}\) with \(n/2 \times n/2\) matrix coefficients, where \(A_{0}\) contains only the coefficients of even powers, and \(A_{1}\) the coefficients of odd ones. This means that the Cauchy radius of the quadratic matrix polynomial \(\textrm{I} z^{2} - A_{1}z - A_{0}\) is an upper bound on the moduli of the zeros of p. For the infinity norm, this Cauchy radius is given by the positive zero of \(z^{2}- \Vert A_{1}\Vert _{\infty }\, z - \Vert A_{0}\Vert _{\infty }\), namely,

$$\begin{aligned} \dfrac{1}{2} \left( \Vert A_{1}\Vert _{\infty } + \sqrt{\Vert A_{1}\Vert _{\infty }^{2} + 4\Vert A_{0}\Vert _{\infty }} \right) \; , \end{aligned}$$
(2)

which we will now improve with Theorem 1.1. Applying this theorem to the matrix polynomial \(\textrm{I} z^{2} -A_{1} z - A_{0}\) with both left and right multipliers implies that the moduli of the eigenvalues of \(\textrm{I} z^{2} -A_{1} z - A_{0}\), and therefore of the zeros of p, are not larger than the Cauchy radii of

$$\begin{aligned} \left( \textrm{I} z \!+\! A_{1} \right) \left( \textrm{I} z^{2} \!-\! A_{1} z \!-\! A_{0} \right) \!=\! \textrm{I} z^{3} \!-\! \left( A_{0} \!+\! A_{1}^{2} \right) z \!-\! A_{1} A_{0} \!=\! \textrm{I} z^{3} \!-\! \left( A_{0} \!-\!a_{n-1} A_{1} \right) z \!-\! A_{1} A_{0} \; , \end{aligned}$$
(3)
$$\begin{aligned} \left( \textrm{I} z^{2} \!-\! A_{1} z \!-\! A_{0} \right) \left( \textrm{I} z \!+\! A_{1} \right) \!=\! \textrm{I} z^{3} \!-\! \left( A_{0} \!+\! A_{1}^{2} \right) z \!-\! A_{0} A_{1} \!=\! \textrm{I} z^{3} \!-\! \left( A_{0} \!-\!a_{n-1} A_{1} \right) z \!-\! A_{0} A_{1} \; , \end{aligned}$$
(4)

where we have used the easily verified fact that \(A_{1}^{2}=-a_{n-1} A_{1}\).
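Because \(A_{1}\) has a single nonzero column and \(A_{0}\) consists of a unit subdiagonal plus a last column, the infinity norms appearing in (2) can be read off directly from the coefficients of \(p\). A sketch (function name ours; \(p\) monic of even degree \(n \ge 4\)):

```python
import math

def quadratic_bound(coeffs):
    """Bound (2): the Cauchy radius of I z^2 - A_1 z - A_0 in the
    infinity norm, with coeffs = [a_0, ..., a_{n-1}] and n even, n >= 4."""
    m = len(coeffs) // 2
    s = coeffs[0::2]      # even-index coefficients: last column of -A_0
    t = coeffs[1::2]      # odd-index coefficients: last column of -A_1
    # Row sums of |A_0|: row 1 contributes |a_0|; rows 2..m add a subdiagonal 1.
    norm_A0 = max(abs(s[0]), *(1 + abs(s[i]) for i in range(1, m)))
    norm_A1 = max(abs(x) for x in t)
    return 0.5 * (norm_A1 + math.sqrt(norm_A1 ** 2 + 4 * norm_A0))
```

For \(p(z)=z^{4}+z^{3}+z^{2}+z+1\), one has \(\Vert A_{0}\Vert _{\infty }=2\) and \(\Vert A_{1}\Vert _{\infty }=1\), so (2) gives the bound \(2\), while all zeros have modulus \(1\).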

To compute the Cauchy radii of (3) and (4), we define the following vectors of length n/2:

$$\begin{aligned} {\textbf {s}} = \left( \begin{array}{l} a_{0} \\ a_{2} \\ a_{4} \\ \vdots \\ a_{n-2} \end{array}\right) \;; \; {\textbf {t}} = \left( \begin{array}{l} a_{1} \\ a_{3} \\ a_{5} \\ \vdots \\ a_{n-1} \end{array}\right) \;; \; {\textbf {v}} =\left( \begin{array}{l} 0 \\ a_{1} \\ a_{3} \\ \vdots \\ a_{n-3} \end{array}\right) \;; \; {\textbf {f}} = \left( \begin{array}{l} 0 \\ 1 \\ 1 \\ \vdots \\ 1 \end{array}\right) \;. \end{aligned}$$

It is then a matter of straightforward matrix manipulation of the sparse matrices \(A_{0}\) and \(A_{1}\) to obtain that \(A_{0}-a_{n-1}A_{1}\) differs from \(A_{0}\) only in its last column, which becomes \(-\left( {\textbf {s}}-a_{n-1}{\textbf {t}} \right) \); that the only nonzero columns of \(A_{1}A_{0}\) are its last two, equal to \(-{\textbf {t}}\) and \(a_{n-2}{\textbf {t}}\), respectively; and that the only nonzero column of \(A_{0}A_{1}\) is its last, equal to \(-\left( {\textbf {v}}-a_{n-1}{\textbf {s}} \right) \).

The Cauchy radii of (3) and (4) for the infinity norm can now be readily determined from the above expressions. Defining

$$\begin{aligned} \alpha= & {} \Bigl ( {\textbf {f}} + \left| {\textbf {s}}-a_{n-1} {\textbf {t}} \right| \Bigr )_{\infty } = \max \left\{ |a_{0}-a_{n-1} a_{1}|,\, 1+|a_{2}-a_{n-1}a_{3}|,\, \dots ,\, 1+|a_{n-2} - a_{n-1}^{2}| \right\} \;, \nonumber \\ \beta _{L}= & {} (1+|a_{n-2}|) \Bigl ( |{\textbf {t}} | \Bigr )_{\infty } = (1+|a_{n-2}|) \max \left\{ |a_{1}|, |a_{3}|, \dots , |a_{n-1}| \right\} \;, \nonumber \\ \beta _{R}= & {} \Bigl ( \left| {\textbf {v}} - a_{n-1} {\textbf {s}} \right| \Bigr )_{\infty } = \max \left\{ |a_{n-1}a_{0}|, |a_{1} - a_{n-1}a_{2}|, \dots , |a_{n-3} - a_{n-1}a_{n-2}| \right\} \; , \end{aligned}$$
(5)

we have that \(\Vert A_{0} -a_{n-1} A_{1}\Vert _{\infty } = \alpha \), \(\Vert A_{1} A_{0}\Vert _{\infty } = \beta _{L}\), and \(\Vert A_{0} A_{1}\Vert _{\infty } = \beta _{R}\). Therefore, the Cauchy radii of the matrix polynomials in (3) and (4), which are upper bounds on the moduli of the zeros of \(p(z) = z^{n}+\sum _{j=0}^{n-1} a_{j} z^{j}\), are given by the unique positive zero of \(z^{3}-\alpha z- \beta _{L}\) or of \(z^{3}-\alpha z- \beta _{R}\), respectively. Although this positive zero can be computed analytically, the procedure for doing so is not entirely straightforward, which prompts us, instead, to use the highly accurate approximations for their solutions from Lemma 2.2 in the following theorem. We treat both cubics together by referring to them as \(z^{3}-\alpha z- \beta \) with the understanding that \(\beta =\beta _{L}\) or \(\beta =\beta _{R}\).

Theorem 3.1

Let p be a monic polynomial of degree n, and define \(g(z)=p(z)\) of even degree \(k=n\) if n is even or \(g(z)=zp(z)\) of even degree \(k=n+1\) if n is odd. Let \(g(z)=z^{k}+\sum _{j=0}^{k-1}a_{j} z^{j}\). Then any zero \(\zeta \) of p satisfies

$$\begin{aligned} |\zeta | \le \beta ^{\frac{1}{3}} \cdot \dfrac{1+2\left( 1+\alpha \beta ^{-\frac{2}{3}} \right) ^{\frac{3}{2}}}{1+2 \left( 1+\alpha \beta ^{-\frac{2}{3}} \right) } \le \beta ^{\frac{1}{3}} \cdot \left( 1 + \alpha \beta ^{-\frac{2}{3}} \right) ^{\frac{1}{2}} \; , \end{aligned}$$

where \(\beta =\beta _{L}\) or \(\beta =\beta _{R}\) and \(\alpha \), \(\beta _{L}\), and \(\beta _{R}\) are as defined in (5) with \(n=k\).

Proof

Setting \(x=z/\beta ^{\frac{1}{3}}\) transforms \(z^{3}-\alpha z - \beta =0\) into \(x^{3}-\varepsilon x -1=0\), with \(\varepsilon =\alpha \beta ^{-\frac{2}{3}}\). Since \(z=\beta ^{\frac{1}{3}}x\), Lemma 2.2 with \(m=3\) and the conclusions immediately preceding the statement of the theorem imply that upper bounds on the unique positive zero of \(z^{3}-\alpha z -\beta \) are given by M and N, with \(N \le M\), where

$$\begin{aligned} M = \beta ^{\frac{1}{3}} \cdot \left( 1 + \alpha \beta ^{-\frac{2}{3}} \right) ^{\frac{1}{2}} \qquad \text {and} \qquad N = \beta ^{\frac{1}{3}} \cdot \dfrac{1+2\left( 1+\alpha \beta ^{-\frac{2}{3}} \right) ^{\frac{3}{2}}}{1+2 \left( 1+\alpha \beta ^{-\frac{2}{3}} \right) } \; . \end{aligned}$$
(6)

There are “left” and “right” versions of M and N, depending on whether \(\beta =\beta _{L}\) or \(\beta =\beta _{R}\), respectively.\(\square \)
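The bounds of Theorem 3.1 are explicit and cheap to evaluate. The following sketch (our own function name, not from the paper) computes the "right" versions of \(M\) and \(N\) for a monic polynomial of even degree, assuming \(\beta _{R}>0\):

```python
import math

def mn_bounds(coeffs):
    """M and N from (6), 'right' version (beta = beta_R), for monic
    p(z) = z**n + sum coeffs[j] z**j, n even, n >= 4, beta_R > 0."""
    n = len(coeffs)
    s = coeffs[0::2]                  # a_0, a_2, ..., a_{n-2}
    t = coeffs[1::2]                  # a_1, a_3, ..., a_{n-1}
    v = [0.0] + t[:-1]                # 0, a_1, a_3, ..., a_{n-3}
    an1 = coeffs[n - 1]
    alpha = max(abs(s[0] - an1 * t[0]),
                *(1 + abs(s[i] - an1 * t[i]) for i in range(1, n // 2)))
    beta = max(abs(v[i] - an1 * s[i]) for i in range(n // 2))   # beta_R
    eps = alpha * beta ** (-2.0 / 3.0)
    M = beta ** (1.0 / 3.0) * math.sqrt(1 + eps)
    N = beta ** (1.0 / 3.0) * (1 + 2 * (1 + eps) ** 1.5) / (1 + 2 * (1 + eps))
    return M, N
```

For \(p(z)=z^{4}+z^{3}+z^{2}+z+1\), whose zeros are the primitive fifth roots of unity (all of modulus \(1\)), one finds \(\alpha =\beta _{R}=1\), \(M=\sqrt{2}\approx 1.414\), and \(N\approx 1.331\).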

4 Numerical results

To compare the bounds from Theorem 3.1 to the Cauchy radius of a polynomial, we generated 1000 scalar polynomials of degrees 20, 40, 80, and 160, with coefficients whose real and imaginary parts are uniformly randomly distributed in the interval \([-4,4]\). We then computed the ratios of the Cauchy radius (Cauchy), the bound from (2) (Quadratic), and the "right" versions of the bounds M and N in (6) to the modulus of the largest zero of each polynomial and reported the averages in Table 1, i.e., the closer this number is to 1, the better it is. The top row in the table lists the degrees of the polynomials. We observed no significant difference in the results for polynomials with even or odd degree. We have arbitrarily used the "right" version, with the "left" version producing very similar results due to the randomness of the coefficients.

As expected, the bound from (2) performs significantly worse than the bounds M and N. If one takes into account that to compute the Cauchy radius, a polynomial equation of high degree needs to be solved numerically, it is remarkable that mere explicit bounds such as M and N produce relatively similar results (for instance, N differs from the Cauchy radius by less than \(15 \%\) on average for polynomials of degree 160). Moreover, if so desired, these bounds (if they are not already better than the Cauchy radius) can be used as efficient starting points for the iterative computation of the Cauchy radius. On the other hand, it is obviously not surprising that the actual Cauchy radius outperforms these bounds on average.

Table 1 Comparison of the bounds M and N with the Cauchy radius

Although there appears to be no clear pattern (or explanation), it regularly happens, especially for lower-degree polynomials, that the bounds M and N outperform the Cauchy radius when the latter is relatively large. As an example, consider the polynomial

$$\begin{aligned} z^{10} + 6z^{9} + 7z^{8} -2z^{7} -5z^{6} -3z^{5} + 8z^{4} + 3z^{3} + 6z^{2} -2z + 3 \; , \end{aligned}$$

for which the ratio of the Cauchy radius to the magnitude of the largest zero is 1.63. The corresponding values for the Quadratic, M, and N ratios are 1.68, 1.51, and 1.42, respectively.

Another example is the polynomial

$$\begin{aligned} z^{11} + \dfrac{8}{3}z^{10} + 3z^{9} - z^{8} - 3z^{7} - \dfrac{1}{3}z^{6} + \dfrac{8}{3} z^{5} + \dfrac{2}{3} z^{4} - \dfrac{7}{3}z^{3} - \dfrac{2}{3}z^{2} - 3z +\dfrac{8}{3} \; , \end{aligned}$$

for which the ratios for the Cauchy radius, Quadratic, M, and N are 2.46, 2.53, 2.44, and 2.29, respectively.

Conclusion

We have derived explicit upper bounds on the moduli of the zeros of a scalar polynomial by using its quadratic matrix polynomial companion form. These upper bounds constitute in many cases a close approximation to the Cauchy radius of the polynomial or an efficient starting point for its iterative computation. Lower bounds can easily be obtained by applying these results to the reverse polynomial \(z^{n}p(1/z)\), whose zeros are the reciprocals of those of p. Similar but more complicated bounds can be derived by using multipliers of a higher degree than those in Theorem 1.1.

5 Appendix. Derivation of Lemma 2.2

In this appendix, we derive a close upper bound on the unique positive solution of the equation \(\varphi (x) {:=} x^{m}-\varepsilon x - 1=0\) with \(\varepsilon >0\) and positive integer \(m \ge 3\), which is formally stated in Lemma 2.2.

Fig. 1 \(\alpha (x_{0},\varphi )\) versus \(\varepsilon \) for \(m=3,4,5,6\)

Since \(\varphi ^{\prime \prime }(x) =m(m-1)x^{m-2}> 0\) for \(x>0\), \(\varphi \) is convex for positive arguments. In addition, a direct calculation shows that \(\varphi \) has a unique positive critical point at \(\tilde{x}=(\varepsilon /m)^{1/(m-1)}\), which is a minimum with \(\varphi (\tilde{x}) < 0\), and that \(\varphi (1) = -\varepsilon <0\). Consequently, the unique positive zero \(x^{*}\) of \(\varphi \) satisfies

$$\begin{aligned} \max \left\{ 1 , \left( \dfrac{\varepsilon }{m} \right) ^{\frac{1}{m-1}} \right\}< x^{*} < +\infty \;\; \text {and} \;\; \varphi ^{\prime }(x^{*}) > 0 \; . \end{aligned}$$
Fig. 2 \((x_{0}-x^{*})/x^{*}\) (left) and \((x_{1}-x^{*})/x^{*}\) (right) as a percentage for \(m=3\)

We note that Newton's method converges monotonically to \(x^{*}\) from any \(x_{0} > x^{*}\), and we will obtain an upper bound on \(x^{*}\) by carrying out a single Newton step from such an initial point \(x_{0}\). As we will see, our choice of \(x_{0}\) satisfies a useful property. To derive \(x_{0}\), we rewrite \(\varphi (x)=0\) as \(x(x^{m-1}-\varepsilon )=1\) and observe that, if \(x_{0}^{m-1}-\varepsilon =1\), then \(x_{0} > 1\) and \(\varphi (x_{0}) = x_{0} - 1 > 0\), implying that

$$\begin{aligned} x_{0}=(1+\varepsilon )^{1/(m-1)} \end{aligned}$$
(7)

lies to the right of \(x^{*}\). Moreover, \(x_{0}\) has the same leading asymptotic behavior as \(x^{*}\) when \(\varepsilon \rightarrow 0^{+}\) and \(\varepsilon \rightarrow +\infty \), namely, \(x_{0} \sim 1\) and \(x_{0} \sim \varepsilon ^{\frac{1}{m-1}}\), respectively. Although these properties make \(x_{0}\) an adequate bound in its own right, a single Newton step improves it considerably. The reason for this is an interesting result by the 1966 Fields medalist Stephen Smale ([10, Theorem A]), who showed that if one defines

$$\begin{aligned} \alpha (z,f) = \left| \dfrac{f(z)}{f^{\prime }(z)} \right| \cdot \max _{k \ge 2} \left| \dfrac{f^{(k)}(z)}{k! f^{\prime }(z)} \right| ^{\frac{1}{k-1}} \; , \end{aligned}$$

where f is a complex polynomial and \(z \in \mathbb {C}\), and if \(\alpha (z,f) < \alpha _{0}\), with \(\alpha _{0} \approx 0.130707\), then z lies in the quadratic convergence basin of Newton's method and one can, roughly speaking, expect a doubling of the number of accurate digits at every iteration. Different authors have obtained slight improvements and variations of this result. It is a tedious exercise to show analytically that, with \(x_{0}\) defined above, \(\alpha (x_{0},\varphi ) < \alpha _{0}\) for any integer \(m\ge 3\) and any positive value of \(\varepsilon \). Instead, we have graphed, in Fig. 1, the values of \(\alpha (x_{0},\varphi )\) for \(\varepsilon \in [0,40]\) and \(m=3,4,5,6\). The top curve corresponds to \(m=3\), and the ones below it correspond to \(m=4,5,6\), respectively. The figure shows that in all cases, \(\alpha (x_{0},\varphi )< 0.08 < \alpha _{0}\), so that quadratic convergence starts immediately from the starting point \(x_{0}\). We now carry out a single Newton step from \(x_{0}\), using the fact that \(\varepsilon =x_{0}^{m-1}-1\):

$$\begin{aligned} x_{1}\!=\! x_{0} - \dfrac{\varphi (x_{0})}{\varphi ^{\prime }(x_{0})}=x_{0} - \dfrac{x_{0}^{m}\! -\! \varepsilon x_{0}\! -\! 1}{m x_{0}^{m - 1}\! -\! \varepsilon } = \dfrac{(m-1)x_{0}^{m}+1}{(m\! -\! 1) x_{0}^{m-1}\! +\! 1} = \dfrac{1\! +\! (m-1)(1+\varepsilon )^{\frac{m}{m-1}}}{1+(m-1)(1+\varepsilon )} \; . \end{aligned}$$
(8)

Because of the monotonic convergence, \(x_{1} < x_{0}\). Collecting the two upper bounds from (7) and (8), we have obtained the following lemma.

Lemma

The unique positive zero \(\zeta \) of \(x^{m}-\varepsilon x -1\), with \(\varepsilon \ge 0\) and integer \(m \ge 3\), satisfies

$$\begin{aligned} \zeta \le \dfrac{1+(m-1)(1+\varepsilon )^{\frac{m}{m-1}}}{1+(m-1)(1+\varepsilon )} \le (1+\varepsilon )^{1/(m-1)} \; . \end{aligned}$$
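The quantities appearing in this derivation are easy to check numerically. The sketch below (function names ours) evaluates \(x_{0}\) from (7), the Newton iterate \(x_{1}\) from (8), and Smale's \(\alpha (x_{0},\varphi )\), using \(\varphi ^{(k)}(x) = \frac{m!}{(m-k)!}\, x^{m-k}\) for \(2 \le k \le m\):

```python
import math

def x0_x1(m, eps):
    # x0 of (7) and the single Newton step x1 of (8).
    x0 = (1 + eps) ** (1.0 / (m - 1))
    x1 = (1 + (m - 1) * (1 + eps) ** (m / (m - 1.0))) / (1 + (m - 1) * (1 + eps))
    return x0, x1

def smale_alpha(m, eps):
    # alpha(x0, phi) for phi(x) = x**m - eps*x - 1, using
    # phi^(k)(x) = m! / (m-k)! * x**(m-k) for 2 <= k <= m.
    x0, _ = x0_x1(m, eps)
    phi = x0 ** m - eps * x0 - 1
    dphi = m * x0 ** (m - 1) - eps
    terms = (abs(math.factorial(m) // math.factorial(m - k) * x0 ** (m - k)
                 / (math.factorial(k) * dphi)) ** (1.0 / (k - 1))
             for k in range(2, m + 1))
    return abs(phi / dphi) * max(terms)
```

For \(m=3\) and \(\varepsilon =2\), \(\varphi (x)=x^{3}-2x-1=(x+1)(x^{2}-x-1)\), so \(x^{*}=(1+\sqrt{5})/2\approx 1.618\), while \(x_{0}=\sqrt{3}\approx 1.732\) and \(x_{1}\approx 1.627\), in agreement with the bracketing \(x^{*} \le x_{1} \le x_{0}\).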

To assess the quality of these bounds for \(m=3\), we have compared \(x_{0}\) and \(x_{1}\) to \(x^{*}\) (the exact solution) for \(\varepsilon \in [0,40]\). Figure 2 displays the relative differences \((x_{0}-x^{*})/x^{*}\) and \((x_{1}-x^{*})/x^{*}\) as a percentage, for \(m=3\). It shows, as expected, a significant improvement from \(x_{0}\) to \(x_{1}\), with \(x_{1}\) differing from \(x^{*}\) by at most \(0.6 \%\), making \(x_{1}\) an excellent substitute for \(x^{*}\) for all values of \(\varepsilon \). It is no surprise that the same pattern as in Fig. 1 is observed, since smaller values of \(\alpha (x,\varphi )\) correspond to x being closer to \(x^{*}\).