1 Introduction

We consider a matrix polynomial equation (MPE) of the following form

$$\begin{aligned} A_nX^n+A_{n-1}X^{n-1}+\cdots +A_0=0, \end{aligned}$$
(1.1)

where \(A_n, A_{n-1},\ldots , A_0 \in \mathbb {R}^{m\times m}\) are the coefficient matrices, and \(X\in \mathbb {R}^{m\times m}\) is the unknown matrix.

Matrix polynomial equations often arise in queueing problems, differential equations, system theory, stochastic theory and many other areas [2, 3, 12, 18, 21, 27]. Different techniques have been studied for finding the minimal nonnegative solution. For the case \(n=2\), the MPE (1.1) is the well-known quadratic matrix equation (QME). In [10, 11, 20, 28], the structured QME, which is called the unilateral quadratic matrix equation (UQME), was studied. The authors showed that an algebraic Riccati equation \(XCX-AX-XD+B=0\) can be transformed into a UQME. Bini et al. [11] proposed an algorithm for solving the UQME by complementing this transformation with the shrink-and-shift technique of Ramaswami. Larin [29] generalized the Schur and doubling methods to the UQME. For the unstructured QME, which has wide applications in quasi-birth–death processes [6, 30], the minimal nonnegative solution is of importance. Davis [14, 15] considered Newton’s method for solving the unstructured QME. Higham and Kim [23, 24] studied the dominant and minimal solvents of the unstructured QME and improved the global convergence properties of Newton’s method by incorporating exact line searches. The logarithmic reduction method with quadratic convergence was introduced in [31].

For the case \(n=+\infty \), the MPE (1.1) is called a power series matrix equation and often arises in Markov chains. For a given M/G/1-type matrix S, the computation of the probability invariant vector associated with S is strongly related to the minimal nonnegative solution of the MPE (1.1) with \(n=+\infty \). Latouche [6, 30] proved that Newton’s method can be applied to solve the power series matrix equation and that the matrix sequence obtained by Newton’s method converges to the minimal nonnegative solution. Bini et al. [5] solved such matrix equations by devising new iterative techniques with quadratic convergence.

For the general case (\(n\ge 2\)), the cyclic reduction method [7,8,9], the invariant subspace algorithm [1] and the doubling technique [33] have been proposed for finding the minimal nonnegative solution of the MPE (1.1). Kratz and Stickel [26] proved that Newton’s method can also be applied to this general case. Seo and Kim [38] studied the relaxed Newton’s method for finding the minimal nonnegative solution of the MPE (1.1) and proved that it can be more efficient than the standard Newton’s method.

Since the minimal nonnegative solution of the MPE (1.1) is of practical importance and there is little work on the perturbation analysis of the MPE (1.1), this paper is devoted to the condition numbers of the MPE (1.1), which play an important role in perturbation analysis. We investigate three kinds of normwise condition numbers for Eq. (1.1). Note that the normwise condition number ignores the structure of both the input and the output data, so when the data are badly scaled or sparse, measuring the relative size of the perturbation by norms does not capture the effect of perturbations on small or zero entries and does not suffice to determine how well the problem is conditioned numerically. In this case, componentwise analysis is an alternative approach that can yield much tighter and more revealing bounds. There are two kinds of such condition numbers, called mixed and componentwise condition numbers, respectively, which were introduced by Gohberg and Koltracht [17]; we refer to [16, 22, 34, 35, 39,40,41,42,43] for more details on these two kinds of condition numbers.

We also apply the theory of mixed and componentwise condition numbers to the MPE (1.1) and present local linear perturbation bounds for its minimal nonnegative solution by using mixed and componentwise condition numbers.

This paper is organized as follows. In Sect. 2, we give a sufficient condition for the existence of the minimal nonnegative solution. In Sect. 3, we investigate three kinds of normwise condition numbers and derive explicit expressions for them. In Sect. 4, we obtain explicit expressions and upper bounds for the mixed and componentwise condition numbers. In Sect. 5, we define a backward error of the approximate minimal nonnegative solution and derive upper and lower bounds for it. In Sect. 6, we give some numerical examples to show the sharpness of the normwise, mixed and componentwise condition numbers.

We begin with the notation used throughout this paper. \(\mathbb {R}^{m\times m}\) stands for the set of \(m\times m\) real matrices. \(\Vert \cdot \Vert _2\) and \(\Vert \cdot \Vert _F\) are the spectral norm and the Frobenius norm, respectively. For \(X=(x_{ij})\in \mathbb {R}^{m\times m}\), \(\Vert X\Vert _\mathrm{max}\) is the max norm given by \(\Vert X\Vert _\mathrm{max}=\mathrm{max}_{i,j}\{|x_{ij}|\}\) and |X| is the matrix whose elements are \(|x_{ij}|\). For a vector \(v=(v_1,v_2,\ldots ,v_m)^T\in \mathbb {R}^m\), diag(v) is the diagonal matrix whose diagonal is v and \(|v|=(|v_1|, |v_2|,\ldots , |v_m|)^T\). For a matrix \(A=(a_{ij})\in \mathbb {R}^{m\times m}\), \(\mathrm{vec}(A)\) is the vector defined by \(\mathrm{vec}(A)=(a_1^T,\ldots ,a_m^T)^T\) with \(a_i\) the i-th column of A, and for a matrix B, \(A\otimes B=(a_{ij}B)\) is the Kronecker product. For matrices X and Y, we write \(X\ge 0\ (X>0)\) and say that X is nonnegative (positive) if \(x_{ij}\ge 0\ (x_{ij}>0)\) holds for all \(i,j\), and \(X\ge Y \ (X>Y)\) means \(X-Y\ge 0 \ (X-Y>0)\).

2 Existence of the Minimal Nonnegative Solution

In this section, we give a sufficient condition for the existence of the minimal nonnegative solution of the MPE (1.1). Some basic definitions are stated as follows.

Definition 2.1

[25] Let F be a matrix function from \(\mathbb {R}^{m\times n}\) to \(\mathbb {R}^{m\times n}\). Then a nonnegative (positive) solution \(S_1\) of the matrix equation \(F(X)=0\) is a minimal nonnegative (positive) solution if for any nonnegative (positive) solution S of \(F(X)=0\), it holds that \(S_1\le S.\)

Definition 2.2

[19] A matrix \(A\in \mathbb {R}^{m\times m}\) is an M-matrix if \(A=sI-B\) for some nonnegative matrix B and s with \(s\ge \rho (B)\) where \(\rho \) is the spectral radius; it is a singular M-matrix if \(s=\rho (B)\) and a nonsingular M-matrix if \(s>\rho (B)\).
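Definition 2.2 can be checked numerically: a real matrix A with nonpositive off-diagonal entries admits the representation \(A=sI-B\) with \(B\ge 0\) for any \(s\ge \max _i a_{ii}\), and A is then a nonsingular M-matrix if and only if \(s>\rho (B)\). A minimal NumPy sketch of this test follows (the function name is ours):

```python
import numpy as np

def is_nonsingular_M_matrix(A):
    """Numerical test of Definition 2.2 (illustrative sketch)."""
    off_diag = A - np.diag(np.diag(A))
    if np.any(off_diag > 0):            # an M-matrix has nonpositive off-diagonal entries
        return False
    s = np.max(np.diag(A)) + 1.0        # any s >= max_i a_ii makes B = sI - A nonnegative
    B = s * np.eye(A.shape[0]) - A
    return s > np.max(np.abs(np.linalg.eigvals(B)))   # nonsingular M-matrix iff s > rho(B)
```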

Theorem 2.3

Assume that the coefficient matrices \(A_{k}\) of the MPE (1.1) are nonnegative except for \(A_{1}\), and that \(-A_{1}\) is a nonsingular M-matrix. Then the MPE (1.1) has a unique minimal nonnegative solution if

$$\begin{aligned} B = -\sum _{k=0}^{n}A_{k}~\text {is a nonsingular, or a singular irreducible}~M\text {-matrix}. \end{aligned}$$
(2.1)

Proof

We define a matrix function by

$$\begin{aligned} G(X) = -A_{1}^{-1}\left( \sum _{k=2}^{n}A_{k}X^{k} + A_{0} \right) , \end{aligned}$$

where the \(A_k\)’s are coefficients of the MPE (1.1) and \(X\in \mathbb {R}^{m\times m}\).

Consider the sequence \(\{X_{k}\}_{k=0}^{\infty }\) defined by

$$\begin{aligned} X_{i+1} = G(X_{i}), \end{aligned}$$

with \(X_{0} = 0\).

By Theorems A.16 and A.19 in [4], there exists a vector \(v>0\) such that \(Bv > 0\) if B is a nonsingular M-matrix, or \(Bv = 0\) if B is a singular irreducible M-matrix, i.e.,

$$\begin{aligned} \left( -\sum _{k=0}^{n}A_{k} \right) v \ge 0. \end{aligned}$$

Since \(-A_{1}\) is a nonsingular M-matrix, we have \((-A_{1})^{-1}\ge 0\), and it follows that

$$\begin{aligned} v \ge -A_{1}^{-1}\left( \sum _{k=2}^{n} A_{k} + A_{0}\right) v \ge 0. \end{aligned}$$
(2.2)

We show that

$$\begin{aligned} X_{i} \le X_{i+1} \quad \text {and} \quad X_{i}v < v, \end{aligned}$$
(2.3)

hold for all \(i=0, 1, \ldots \).

Clearly,

$$\begin{aligned} X_{1} = -A_{1}^{-1}A_{0} \ge 0 = X_{0} \quad \text {and} \quad X_{0}v = 0 < v. \end{aligned}$$

Hence, (2.3) holds for \(i = 0\).

Suppose that (2.3) holds for \(i = l\). Then,

$$\begin{aligned} X_{l+2} - X_{l+1} = -A_{1}^{-1} \sum _{k=2}^{n}A_{k}\left( X_{l+1}^{k} - X_{l}^{k}\right) \ge 0. \end{aligned}$$

On the other hand, it follows from (2.2) that

$$\begin{aligned} X_{l+1}v = -A_{1}^{-1}\left( \sum _{k=2}^{n}A_{k}X_{l}^{k} + A_{0}\right) v < -A_{1}^{-1}\left( \sum _{k=2}^{n} A_{k} + A_{0}\right) v \le v. \end{aligned}$$

So, (2.3) holds for \(i = l+1\). By induction, (2.3) holds for all \(i =0,1,\ldots \). Since \(\{X_{i}\}\) is monotonically increasing and, by \(X_{i}v<v\) with \(v>0\), bounded entrywise, \(\{X_{i}\}\) converges to a nonnegative matrix.

Let S be the nonnegative matrix to which \(\{X_{i}\}\) converges. By continuity of G, \(S=G(S)\), so S is a nonnegative solution of the MPE (1.1). Let Y be any nonnegative solution of the MPE (1.1). It is trivial that \(X_{0} \le Y\). Suppose that \(X_{l} \le Y\). Then,

$$\begin{aligned} Y - X_{l+1}&= -A_{1}^{-1}\left( \sum _{k=2}^{n}A_{k}Y^{k} + A_{0}\right) + A_{1}^{-1}\left( \sum _{k=2}^{n}A_{k}X_{l}^{k} + A_{0}\right) \\&= -A_{1}^{-1} \sum _{k=2}^{n} A_{k}\left( Y^{k} - X_{l}^{k}\right) \ge 0. \end{aligned}$$

By induction, \(X_{i} \le Y\) for all \(i=0,1,\ldots \). Therefore, \(S \le Y\) for any nonnegative solution Y of the MPE (1.1), i.e., S is the minimal nonnegative solution of the MPE (1.1). \(\square \)

Remark 2.4

From the proof of Theorem 2.3, we can see that the sequence \(\{X_i\}\) generated by \(X_{i+1}=G(X_i)\) is monotonically increasing and convergent. So if \(X_i>0\) for some \(i\ge 0\), then the matrix sequence \(\{X_i\}\) monotonically converges to the minimal positive solution of the MPE (1.1).
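For illustration, the fixed-point iteration \(X_{i+1}=G(X_i)\) used in the proof of Theorem 2.3 can be implemented directly. The following NumPy sketch (the function name, tolerance and iteration cap are ours) computes the minimal nonnegative solution under the assumptions of the theorem:

```python
import numpy as np

def minimal_nonnegative_solution(A, tol=1e-14, max_iter=10000):
    """Fixed-point iteration from the proof of Theorem 2.3 (illustrative sketch).

    A = [A_0, A_1, ..., A_n]; -A_1 is assumed to be a nonsingular M-matrix
    and the remaining coefficients are assumed to be nonnegative.
    """
    n = len(A) - 1
    m = A[0].shape[0]
    neg_A1_inv = np.linalg.inv(-A[1])      # (-A_1)^{-1} >= 0 for a nonsingular M-matrix
    X = np.zeros((m, m))                   # X_0 = 0
    for _ in range(max_iter):
        S = A[0] + sum(A[k] @ np.linalg.matrix_power(X, k) for k in range(2, n + 1))
        X_new = neg_A1_inv @ S             # X_{i+1} = -A_1^{-1}(sum_{k>=2} A_k X_i^k + A_0)
        if np.linalg.norm(X_new - X, 1) <= tol * max(np.linalg.norm(X_new, 1), 1.0):
            return X_new
        X = X_new
    return X
```

This sketch only mirrors the constructive argument of the proof; in practice, faster solvers such as Newton's method [26, 38] or cyclic reduction [7,8,9] are used.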

Corollary 2.5

Under the assumption of Theorem 2.3, if

$$\begin{aligned} B = -\sum _{k=0}^{n}A_{k}~\text {is a nonsingular, or a singular irreducible}~M\text {-matrix}, \end{aligned}$$

and one of the following conditions holds true:

  1. (i)

    Both \(A_0\) and \(A_1\) are irreducible matrices;

  2. (ii)

    \(A_0\) is a positive matrix.

Then, the MPE (1.1) has a minimal positive solution.

Proof

Note that if \(A_0\) and \(A_1\) are irreducible matrices, or if \(A_0\) is a positive matrix, then \(X_1=-A_1^{-1}A_0>0\), where \(X_1\) is generated by the iteration \(X_{i+1}=G(X_i)\) in the proof of Theorem 2.3. By Remark 2.4, the existence of the minimal positive solution of the MPE (1.1) then follows by the same argument as in the proof of Theorem 2.3. \(\square \)

3 Normwise Condition Number

In this section, we investigate three kinds of normwise condition numbers of the MPE (1.1).

The perturbed equation of the MPE (1.1) is

$$\begin{aligned} (A_n+\Delta A_n)(X+\Delta X)^n+\cdots +(A_1+\Delta A_1)(X+\Delta X)+(A_0+\Delta A_0)=0. \end{aligned}$$
(3.1)

To simplify the notation, we introduce the recursion function \(\Phi : \mathbb {N}\times \mathbb {N} \times \mathbb {R}^{m\times m}\times \mathbb {R}^{m\times m}\rightarrow \mathbb {R}^{m\times m}\) as defined in [38]:

$$\begin{aligned} {\left\{ \begin{array}{ll} \Phi (i,0)(X,Y)=X^i,\ \Phi (0,j)(X,Y)=Y^j, &{}\quad i,j\in \mathbb {N},\\ \Phi (i,j)(X,Y)=\big (X\Phi (i-1,j)+Y\Phi (i,j-1)\big )(X,Y),&{}\quad i,j\in \mathbb {N}^+, \end{array}\right. } \end{aligned}$$
(3.2)

where \(\mathbb {N}\) is the set of natural numbers and \(\mathbb {N}^+=\mathbb {N}-\{0\}\). It can be easily shown that

$$\begin{aligned} \Phi (0,0)(X,Y)=I_m, \end{aligned}$$

and

$$\begin{aligned} \Phi (n,1)(X,Y)=\sum _{p=0}^nX^{n-p}YX^p. \end{aligned}$$

Using the function \(\Phi \), we can write the MPE (1.1) as

$$\begin{aligned} \sum _{p=0}^nA_p\Phi (p,0)(X,Y)=0. \end{aligned}$$

Lemma 3.1

(Theorem 2.1, [38]) If X and Y are \(m\times m\) matrices and \(\Phi \) is the recursion function defined by (3.2), then we have

$$\begin{aligned} (X+Y)^p=\sum _{i=0}^p\Phi (p-i,i)(X,Y),\quad p\in \mathbb {N}. \end{aligned}$$
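As a quick numerical illustration, the recursion (3.2) and the identity of Lemma 3.1 can be checked directly; the following NumPy sketch (the names are ours) implements \(\Phi \) by memoized recursion:

```python
import numpy as np
from functools import lru_cache

def make_phi(X, Y):
    """Recursion function Phi from (3.2), specialized to fixed matrices X and Y."""
    @lru_cache(maxsize=None)
    def phi(i, j):
        if j == 0:
            return np.linalg.matrix_power(X, i)    # Phi(i,0)(X,Y) = X^i
        if i == 0:
            return np.linalg.matrix_power(Y, j)    # Phi(0,j)(X,Y) = Y^j
        return X @ phi(i - 1, j) + Y @ phi(i, j - 1)
    return phi

# Check of Lemma 3.1: (X+Y)^p = sum_{i=0}^p Phi(p-i, i)(X, Y)
rng = np.random.default_rng(0)
X, Y = rng.random((4, 4)), rng.random((4, 4))
phi, p = make_phi(X, Y), 5
lhs = np.linalg.matrix_power(X + Y, p)
rhs = sum(phi(p - i, i) for i in range(p + 1))
print(np.allclose(lhs, rhs))                       # True up to rounding
```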

By Lemma 3.1, Eq. (3.1) can be rewritten as

$$\begin{aligned} 0&=\sum _{p=0}^n(A_p+\Delta A_p)\sum _{q=0}^p\Phi (p-q,q)(X, \Delta X)\nonumber \\&=\sum _{q=0}^n\sum _{p=q}^n(A_p+\Delta A_p)\Phi (p-q,q)(X,\Delta X)\nonumber \\&=\left( \sum _{p=0}^n(A_p+\Delta A_p)\Phi (p,0)+\sum _{p=1}^n(A_p+\Delta A_p)\Phi (p-1,1)\right) (X,\Delta X)\nonumber \\&\quad +\sum _{q=2}^n\sum _{p=q}^n(A_p+\Delta A_p)\Phi (p-q,q)(X,\Delta X). \end{aligned}$$
(3.3)

Dropping the higher-order terms in (3.3) and using the fact that X solves the MPE (1.1), i.e., \(\sum _{p=0}^nA_pX^p=0\), yields

$$\begin{aligned} \sum _{p=1}^nA_p\Phi (p-1,1)(X,\Delta X)\approx -\sum _{p=0}^n\Delta A_p\Phi (p,0)(X,\Delta X), \end{aligned}$$

that is,

$$\begin{aligned} \sum _{p=1}^n\sum _{q=0}^{p-1}A_pX^{p-1-q}\Delta XX^q\approx -\sum _{p=0}^n\Delta A_pX^p. \end{aligned}$$
(3.4)

Applying the vec operator to (3.4) gives

$$\begin{aligned} P\mathrm{vec}(\Delta X)\approx Lr, \end{aligned}$$
(3.5)

where

$$\begin{aligned} P&=\sum _{p=1}^n\left[ \sum _{q=0}^{p-1}(X^q)^T\otimes (A_pX^{p-1-q})\right] ,\nonumber \\ L&=\left[ -(X^n)^T\otimes I_m,\ -(X^{n-1})^T\otimes I_m,\ldots ,-I_{m^2}\right] ,\nonumber \\ r&=\left[ \mathrm{vec}(\Delta A_n)^T,\ \mathrm{vec}(\Delta A_{n-1})^T,\ldots ,\mathrm{vec}(\Delta A_0)^T\right] ^T. \end{aligned}$$
(3.6)

Under certain conditions, which are usually satisfied in applications, the matrix P is nonsingular, as shown in [38]. We assume that P is nonsingular throughout the remainder of this paper.
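For reference, (3.5)–(3.6) translate directly into a first-order perturbation estimate for \(\Delta X\). The following NumPy sketch (the function and variable names are ours; vec is column-stacking, as in our notation) forms P and the product Lr explicitly and solves the linear system:

```python
import numpy as np

def first_order_perturbation(A, dA, X):
    """First-order estimate of Delta X from (3.5)-(3.6) (illustrative sketch).

    A = [A_0, ..., A_n], dA = [Delta A_0, ..., Delta A_n], and X solves the MPE (1.1).
    """
    n = len(A) - 1
    m = X.shape[0]
    Xp = [np.linalg.matrix_power(X, k) for k in range(n + 1)]          # X^0, ..., X^n
    P = sum(np.kron(Xp[q].T, A[p] @ Xp[p - 1 - q])
            for p in range(1, n + 1) for q in range(p))
    # L r = -sum_p ((X^p)^T kron I_m) vec(Delta A_p)
    Lr = -sum(np.kron(Xp[p].T, np.eye(m)) @ dA[p].flatten('F') for p in range(n + 1))
    dX = np.linalg.solve(P, Lr)
    return dX.reshape((m, m), order='F')           # undo the column-stacking vec
```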

We define the following mapping

$$\begin{aligned} \varphi : (A_n, A_{n-1}, \ldots , A_0)\mapsto \mathrm{vec}(X), \end{aligned}$$
(3.7)

where X is the minimal nonnegative solution of the MPE (1.1).

Three kinds of normwise condition numbers are defined by

$$\begin{aligned} k_i(\varphi )={\mathop {\hbox {lim}}\limits _{\epsilon \rightarrow 0}}{\mathop {\hbox {sup}}\limits _{\Delta _i\le \epsilon }} \frac{\Vert \Delta X\Vert _F}{\epsilon \Vert X\Vert _F},\quad i=1,2,3, \end{aligned}$$
(3.8)

where

$$\begin{aligned} \Delta _1&=\left\| \left[ \frac{\Vert \Delta A_n\Vert _F}{\delta _n}, \frac{\Vert \Delta A_{n-1}\Vert _F}{\delta _{n-1}},\ldots , \frac{\Vert \Delta A_0\Vert _F}{\delta _0}\right] \right\| _2,\nonumber \\ \Delta _2&=\mathrm{max}\left\{ \frac{\Vert \Delta A_n\Vert _F}{\delta _n}, \frac{\Vert \Delta A_{n-1}\Vert _F}{\delta _{n-1}},\ldots , \frac{\Vert \Delta A_0\Vert _F}{\delta _0}\right\} ,\nonumber \\ \Delta _3&=\frac{\left\| [\Vert \Delta A_n\Vert _F, \Vert \Delta A_{n-1}\Vert _F, \ldots , \Vert \Delta A_0\Vert _F]\right\| _2}{\Vert [\Vert A_n\Vert _F, \Vert A_{n-1}\Vert _F, \ldots , \Vert A_0\Vert _F]\Vert _2}. \end{aligned}$$
(3.9)

The nonzero parameters \(\delta _k\) in \(\Delta _1\) and \(\Delta _2\) provide some freedom in how the perturbations are measured. Generally, \(\delta _k\) is chosen as a function of \(\Vert A_k\Vert _F\), and the choice \(\delta _k=\Vert A_k\Vert _F\) for \(k=0,1,\ldots , n\) is the most common.

Theorem 3.2

With the notation given above, let X be a solution of the MPE (1.1) at which P is nonsingular. Then the explicit expressions and upper bounds for the three kinds of normwise condition numbers at X are

$$\begin{aligned} k_1(\varphi )&\approx \frac{\Vert P^{-1}L_1\Vert _2}{\Vert X\Vert _F}, \end{aligned}$$
(3.10)
$$\begin{aligned} k_2(\varphi )&\lesssim \mathrm{min}\left\{ \sqrt{n+1}\,k_1(\varphi ), \mu /\Vert X\Vert _F\right\} , \end{aligned}$$
(3.11)
$$\begin{aligned} k_3(\varphi )&\approx \frac{\Vert P^{-1}L\Vert _2\sqrt{\sum _{i=0}^n\Vert A_i\Vert _F^2}}{\Vert X\Vert _F}, \end{aligned}$$
(3.12)

where

$$\begin{aligned} L_1&=L\,\big (\mathrm{diag}\left( \left[ \delta _n, \delta _{n-1},\ldots , \delta _0\right] ^T\right) \otimes I_{m^2}\big ),\\ \mu&=\sum _{k=0}^n\delta _k\left\| P^{-1}\big ((X^k)^T\otimes I_m\big )\right\| _2. \end{aligned}$$

Proof

It follows from (3.5) that

$$\begin{aligned} \mathrm{vec}(\Delta X)\approx P^{-1}L_1r_1, \end{aligned}$$
(3.13)

where

$$\begin{aligned} L_1&=\left[ -\delta _n(X^n)^T\otimes I_m,\quad -\delta _{n-1}(X^{n-1})^T\otimes I_m,\ldots ,-\delta _0I_{m^2}\right] ,\\ r_1&=\left( \frac{\mathrm{vec}(\Delta A_n)^T}{\delta _n},\ \frac{\mathrm{vec}(\Delta A_{n-1})^T}{\delta _{n-1}},\ldots ,\frac{\mathrm{vec}(\Delta A_0)^T}{\delta _0}\right) ^T. \end{aligned}$$

It yields

$$\begin{aligned} \Vert \Delta X\Vert _F=\Vert \mathrm{vec}(\Delta X)\Vert _2\thickapprox \Vert P^{-1}L_1r_1\Vert _2\le \Vert P^{-1}L_1\Vert _2\Vert r_1\Vert _2. \end{aligned}$$
(3.14)

Note that \(\Vert r_1\Vert _2=\Delta _1\le \epsilon \), and it follows from (3.8) (when \(i=1\)) and inequality (3.14) that (3.10) holds.

According to (3.5), we get

$$\begin{aligned} \Vert \Delta X\Vert _F=\Vert \mathrm{vec}(\Delta X)\Vert _2\lesssim \Vert P^{-1}L\Vert _2\Vert r\Vert _2. \end{aligned}$$
(3.15)

Since \(\Vert r\Vert _2=\Delta _3\cdot \big \Vert \big [\Vert A_n\Vert _F,\ \Vert A_{n-1}\Vert _F, \ldots ,\Vert A_0\Vert _F\big ]\big \Vert _2\le \epsilon \sqrt{\sum _{k=0}^n\Vert A_k\Vert _F^2}\), it follows from (3.8) (when \(i=3\)) and inequality (3.15) that (3.12) holds.

Let \(\epsilon =\Delta _2\). It follows from (3.13) that

$$\begin{aligned} \Vert \Delta X\Vert _F&\lesssim \Vert P^{-1}L_1\Vert _2\sqrt{\sum \nolimits _{i=0}^n\frac{\Vert \Delta A_i\Vert _F^2}{\delta _i^2}}\nonumber \\&\le \epsilon \sqrt{n+1}\Vert P^{-1}L_1\Vert _2\nonumber \\&\lesssim \epsilon \sqrt{n+1}\Vert X\Vert _Fk_1(\varphi ). \end{aligned}$$
(3.16)

On the other hand, (3.13) can be rewritten as

$$\begin{aligned} \mathrm{vec}(\Delta X)\approx -\sum _{k=0}^n\delta _kP^{-1}\big ((X^k)^T\otimes I_m\big )\frac{\mathrm{vec}(\Delta A_k)}{\delta _k}, \end{aligned}$$

from which it is easy to get

$$\begin{aligned} \Vert \Delta X\Vert _F\lesssim \sum _{k=0}^n\delta _k\left\| P^{-1}((X^k)^T\otimes I_m)\right\| _2\frac{\Vert \Delta A_k\Vert _F}{\delta _k}\le \epsilon \mu , \end{aligned}$$
(3.17)

where \(\mu =\sum _{k=0}^n\delta _k\Vert P^{-1}\big ((X^k)^T\otimes I_m\big )\Vert _2\).

Then, (3.11) is obtained according to inequalities (3.16) and (3.17). \(\square \)
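For completeness, the quantities appearing in Theorem 3.2 can be evaluated directly once X (or a computed approximation of it) is available. A NumPy sketch follows (the names are ours; the spectral norm is unchanged by the ordering and the signs of the blocks of L, so the blocks are stacked in the order \(k=0,\ldots ,n\)):

```python
import numpy as np

def normwise_condition_numbers(A, X, delta=None):
    """k_1, k_3 and the upper bound on k_2 from Theorem 3.2 (illustrative sketch)."""
    n = len(A) - 1
    m = X.shape[0]
    if delta is None:
        delta = [np.linalg.norm(Ak, 'fro') for Ak in A]     # delta_k = ||A_k||_F
    Xp = [np.linalg.matrix_power(X, k) for k in range(n + 1)]
    P = sum(np.kron(Xp[q].T, A[p] @ Xp[p - 1 - q])
            for p in range(1, n + 1) for q in range(p))
    Pinv = np.linalg.inv(P)
    blocks = [Pinv @ np.kron(Xp[k].T, np.eye(m)) for k in range(n + 1)]  # P^{-1}((X^k)^T kron I_m)
    normX = np.linalg.norm(X, 'fro')
    k1 = np.linalg.norm(np.hstack([delta[k] * blocks[k] for k in range(n + 1)]), 2) / normX
    k3 = (np.linalg.norm(np.hstack(blocks), 2)
          * np.sqrt(sum(np.linalg.norm(Ak, 'fro') ** 2 for Ak in A)) / normX)
    mu = sum(delta[k] * np.linalg.norm(blocks[k], 2) for k in range(n + 1))
    k2_bound = min(np.sqrt(n + 1) * k1, mu / normX)          # n+1 coefficient matrices
    return k1, k2_bound, k3
```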

We now present another sensitivity analysis for the MPE (1.1). Consider the parametrized perturbation \(A_p(\tau )=A_p+\tau E_p\) of \(A_p\) and the equation

$$\begin{aligned} \sum _{p=0}^nA_p(\tau )X^p=0, \end{aligned}$$
(3.18)

where \(E_p\in \mathbb {R}^{m\times m}\) and \(\tau \) is a real parameter.

Let \(Q(X,\tau )=\sum _{p=0}^nA_p(\tau )X^p\) and let \(X_{+}\) be any solution of the MPE (1.1) such that the matrix P in (3.6), evaluated at \(X_{+}\), is nonsingular. Then

  1. (i)

    \(Q(X_{+},0)=0\)

  2. (ii)

    \(Q(X,\tau )\) is differentiable arbitrarily many times in the neighborhood of \((X_{+},0)\), and

    $$\begin{aligned} \frac{\partial Q}{\partial X}\Big |_{(X_{+}, 0)}&=\sum _{p=1}^n(I_m\otimes A_p)\sum _{q=0}^{p-1}\left( X_{+}^q\right) ^T\otimes X_{+}^{p-1-q}\\&=\sum _{p=1}^n\sum _{q=0}^{p-1}\left( X_{+}^q\right) ^T\otimes \left( A_pX_{+}^{p-1-q}\right) . \end{aligned}$$

Note that \(\frac{\partial Q}{\partial X}|_{(X_{+}, 0)}\) is exactly the matrix P in (3.6) and is nonsingular under our assumption. By the implicit function theorem [36], there exists \(\delta >0\) such that for every \(\tau \in (-\delta ,\delta )\) there is a unique \(X(\tau )\) satisfying:

  1. (i)

    \(Q(X(\tau ),\tau )=0, X(0)=X_{+}\);

  2. (ii)

    \(X(\tau )\) is differentiable arbitrarily many times with respect to \(\tau \).

For

$$\begin{aligned} \sum _{p=0}^n A_p(\tau )X^p(\tau )=0, \end{aligned}$$
(3.19)

differentiating both sides of (3.19) with respect to \(\tau \) at \(\tau =0\) gives

$$\begin{aligned} \sum _{p=1}^n A_p\sum _{q=0}^{p-1}X_{+}^q\dot{X}(0)X_{+}^{p-1-q}+\sum _{p=0}^nE_pX_{+}^p=0. \end{aligned}$$
(3.20)

Applying the vec operator to (3.20) yields

$$\begin{aligned} T\mathrm{vec}(\dot{X}(0))=Mr, \end{aligned}$$

where

$$\begin{aligned} T&=\sum _{p=1}^n\sum _{q=0}^{p-1}\left( X_{+}^{p-1-q}\right) ^T\otimes A_p X_{+}^q,\\ M&=\left[ -\left( X^n_{+}\right) ^T\otimes I_m, \ldots , -X_{+}^T\otimes I_m, -I_{m^2}\right] ,\\ r&=\left[ \mathrm{vec}(E_n)^T,\mathrm{vec}(E_{n-1})^T,\ldots , \mathrm{vec}(E_0)^T\right] ^T. \end{aligned}$$

According to [37], we can derive the Rice condition number of \(X_{+}\):

$$\begin{aligned} k_{X_{+}}&=\lim _{\tau \rightarrow 0^+} {\mathop {{\mathop {\hbox {sup}}\limits _{E_p\in \mathbb {R}^{m\times m}}}}\limits _{p=0,1,\ldots ,n}}\left\{ \frac{\Vert X(\tau )-X_{+}\Vert _F}{\Vert X_{+}\Vert _F}\Big /\left( \frac{\tau \Vert [E_n,\ldots , E_0]\Vert _F}{\Vert [A_n,\ldots ,A_0]\Vert _F}\right) \right\} \\&={\mathop {{\mathop {\hbox {sup}}\limits _{E_p\in \mathbb {R}^{m\times m}}}}\limits _{p=0,1,\ldots ,n}}\left\{ \frac{\Vert \dot{X}(0)\Vert _F}{\Vert (E_n,\ldots , E_0)\Vert _F}\cdot \frac{\Vert [A_n,\ldots , A_0]\Vert _F}{\Vert X_{+}\Vert _F}\right\} \\&={\mathop {{\mathop {\hbox {sup}}\limits _{E_p\in \mathbb {R}^{m\times m}}}}\limits _{p=0,1,\ldots ,n}}\left\{ \frac{\Vert T^{-1}Mr\Vert _2}{\Vert r\Vert _2}\cdot \frac{\Vert [A_n,\ldots , A_0]\Vert _F}{\Vert X_{+}\Vert _F}\right\} \\&=\Vert T^{-1}M\Vert _2\cdot \frac{\Vert [A_n,\ldots , A_0]\Vert _F}{\Vert X_{+}\Vert _F}\\&=\Vert T^{-1}M\Vert _2\frac{\sqrt{\sum _{p=0}^n\Vert A_p\Vert _F^2}}{\Vert X_{+}\Vert _F}. \end{aligned}$$
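A corresponding NumPy sketch for the Rice condition number follows (the names are ours):

```python
import numpy as np

def rice_condition_number(A, X):
    """Rice condition number k_{X_+} of a solution X_+ = X of the MPE (illustrative sketch)."""
    n = len(A) - 1
    m = X.shape[0]
    Xp = [np.linalg.matrix_power(X, k) for k in range(n + 1)]
    T = sum(np.kron(Xp[p - 1 - q].T, A[p] @ Xp[q])
            for p in range(1, n + 1) for q in range(p))
    M = np.hstack([-np.kron(Xp[p].T, np.eye(m)) for p in range(n, -1, -1)])
    normA = np.sqrt(sum(np.linalg.norm(Ak, 'fro') ** 2 for Ak in A))
    return np.linalg.norm(np.linalg.solve(T, M), 2) * normA / np.linalg.norm(X, 'fro')
```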

4 Mixed and Componentwise Condition Number

In this section, we investigate the mixed and componentwise condition numbers of the MPE (1.1). Explicit expressions for these two kinds of condition numbers are derived. We first introduce some well-known results. To define mixed and componentwise condition numbers, the following distance function is useful. For any \(a, b\in \mathbb {R}^m\), define the componentwise quotient \(\frac{a}{b}=[c_1, c_2, \ldots , c_m]^T\) by

$$\begin{aligned} c_i={\left\{ \begin{array}{ll} a_i/b_i, &{}\mathrm{if}\quad b_i\ne 0,\\ 0, &{}\mathrm{if}\quad a_i=b_i=0,\\ \infty ,&{} \mathrm{otherwise}. \end{array}\right. } \end{aligned}$$

Then we define

$$\begin{aligned} d(a,b)=\left\| \frac{a-b}{b}\right\| _{\infty }={\mathop {\hbox {max}}\limits _{i=1,2,\ldots , m}}\left\{ \left| \frac{a_i-b_i}{b_i}\right| \right\} . \end{aligned}$$

Consequently for matrices \(A, B\in \mathbb {R}^{m\times m}\), we define

$$\begin{aligned} d(A,B)=d(\mathrm{vec}(A), \mathrm{vec}(B)). \end{aligned}$$

Note that if \(d(a,b)<\infty \), \(d(a,b)=\mathrm{min}\{\nu \ge 0:|a_i-b_i|\le \nu |b_i|\ \mathrm{for}\ i=1,2,\ldots ,m\}\).

In the sequel, we assume \(d(a,b)<\infty \) for any pair \((a,b)\). For \(\epsilon >0\), we set \(B^0(a,\epsilon )=\{x\,|\,d(x,a)\le \epsilon \}\). For a vector-valued function \(F: \mathbb {R}^p\rightarrow \mathbb {R}^q\), \(\text{ Dom }(F)\) denotes the domain of F.

The mixed and componentwise condition numbers introduced by Gohberg and Koltracht [17] are listed as follows:

Definition 4.1

[17] Let \(F: \mathbb {R}^p\rightarrow \mathbb {R}^q\) be a continuous mapping defined on an open set \(\text{ Dom }(F)\subset \mathbb {R}^p\) such that \(0\notin \text{ Dom }(F)\) and \(F(a)\ne 0\) for a given \(a\in \mathbb {R}^p\).

  1. (1)

    The mixed condition number of F at a is defined by

    $$\begin{aligned} m(F,a)=\lim _{\epsilon \rightarrow 0} {\mathop {\mathop {\hbox {sup}}\limits _{x\ne a}}\limits _{x\in B^0(a,\epsilon )}} \frac{\Vert F(x)-F(a)\Vert _{\infty }}{\Vert F(a)\Vert _{\infty }}\frac{1}{d(x,a)}. \end{aligned}$$
  2. (2)

    Suppose \(F(a)=\big [f_1(a), f_2(a), \ldots , f_q(a)\big ]^T\) such that \(f_j(a)\ne 0\) for \(j=1,2,\ldots , q\). The componentwise condition number of F at a is defined by

    $$\begin{aligned} c(F,a)=\lim _{\epsilon \rightarrow 0} {\mathop {\mathop {\hbox {sup}}\limits _{x\in B^0(a,\epsilon )}}\limits _{x\ne a}}\frac{d(F(x), F(a))}{d(x,a)}. \end{aligned}$$

The explicit expressions of the mixed and componentwise condition numbers of F at a are given by the following lemma [13, 17].

Lemma 4.2

Suppose F is Fréchet differentiable at a. We have

  1. (1)

    if \(F(a)\ne 0\), then

    $$\begin{aligned} m(F,a)=\frac{\Vert F'(a){ diag}(a)\Vert _{\infty }}{\Vert F(a)\Vert _{\infty }}=\frac{\Vert |F'(a)||a|\Vert _{\infty }}{\Vert F(a)\Vert _{\infty }}; \end{aligned}$$
  2. (2)

    if \(F(a)=\big [f_1(a), f_2(a), \ldots , f_q(a)\big ]^T\) such that \(f_j(a)\ne 0\) for \(j=1,2,\ldots , q\), then

    $$\begin{aligned} c(F,a)=\Vert { diag}^{-1}(F(a))F'(a){ diag}(a)\Vert _{\infty }=\left\| \frac{|F'(a)||a|}{|F(a)|}\right\| _{\infty }. \end{aligned}$$

Theorem 4.3

Let \(m(\varphi )\) and \(c(\varphi )\) be the mixed and componentwise condition numbers of the MPE (1.1), respectively. Then we have

$$\begin{aligned} m(\varphi )\approx \frac{\Vert T\Vert _{\infty }}{\Vert X\Vert _{\max }} \quad \mathrm{and}\quad c(\varphi )\approx \left\| \frac{T}{|\mathrm{vec}(X)|}\right\| _{\infty }, \end{aligned}$$

where

$$\begin{aligned} T=\sum _{k=0}^n\Big |P^{-1}\big ((X^k)^T\otimes I_m\big )\Big |\mathrm{vec}(|A_k|). \end{aligned}$$

Furthermore, we have two simple upper bounds for \(m(\varphi )\) and \(c(\varphi )\) as follows:

$$\begin{aligned} m_U(\varphi ):=\frac{\Vert P^{-1}\Vert _{\infty }\big \Vert \sum _{k=0}^n|A_k||X^k|\big \Vert _\mathrm{max}}{\Vert X\Vert _\mathrm{max}}\gtrsim m(\varphi ), \end{aligned}$$

and

$$\begin{aligned} c_U(\varphi ):=\big \Vert { diag}^{-1}\big (\mathrm{vec}(X)\big )P^{-1}\big \Vert _{\infty }\left\| \sum _{k=0}^n|A_k||X^k|\right\| _\mathrm{max}\gtrsim c(\varphi ). \end{aligned}$$

Proof

It follows from (3.5) that \(\mathrm{vec}(\Delta X)\approx P^{-1}Lr\), which implies that the Fréchet derivative of \(\varphi \) is

$$\begin{aligned} \varphi '(A_n,A_{n-1},\ldots , A_0)\approx P^{-1}L, \end{aligned}$$

where \(\varphi \) is defined by (3.7). Let \(v=[\mathrm{vec}(A_n)^T, \mathrm{vec}(A_{n-1})^T, \ldots , \mathrm{vec}(A_0)^T]^T\). From (1) of Lemma 4.2, we obtain

$$\begin{aligned} m(\varphi )\approx \frac{\big \Vert |P^{-1}L||v|\big \Vert _{\infty }}{\Vert \mathrm{vec}(X)\Vert _{\infty }}=\frac{\big \Vert |P^{-1}L||v|\big \Vert _{\infty }}{\Vert X\Vert _\mathrm{max}}=\frac{\Vert T\Vert _{\infty }}{\Vert X\Vert _\mathrm{max}}, \end{aligned}$$

where

$$\begin{aligned} T&=\big |P^{-1}L\big ||v|\\&=\sum _{k=0}^n\Big |P^{-1}\Big ((X^k)^T\otimes I_m\Big )\Big |\mathrm{vec}(|A_k|). \end{aligned}$$

It holds that

$$\begin{aligned} \Vert T\Vert _{\infty }&\le \big \Vert |P^{-1}||L||v|\big \Vert _{\infty }\\&\le \big \Vert P^{-1}\big \Vert _{\infty }\big \Vert |L||v|\big \Vert _{\infty }\\&\le \big \Vert P^{-1}\big \Vert _{\infty }\left\| \sum _{k=0}^n|A_k||X^k|\right\| _\mathrm{max}. \end{aligned}$$

Therefore,

$$\begin{aligned} m(\varphi )\lesssim \frac{\big \Vert P^{-1}\big \Vert _{\infty }\Big \Vert \sum _{k=0}^n|A_k||X^k|\Big \Vert _\mathrm{max}}{\Vert X\Vert _\mathrm{max}}. \end{aligned}$$

From (2) of Lemma 4.2, we obtain

$$\begin{aligned} c(\varphi )\thickapprox \left\| \frac{|P^{-1}L||v|}{|\mathrm{vec}(X)|}\right\| _{\infty }=\left\| \frac{T}{|\mathrm{vec}(X)|}\right\| _{\infty }. \end{aligned}$$

Similarly, it holds that

$$\begin{aligned} c(\varphi )&\lesssim \left\| \frac{|P^{-1}||L||v|}{|\mathrm{vec}(X)|}\right\| _{\infty }\\&\le \Big \Vert { diag}^{-1}\big (\mathrm{vec}(X)\big )P^{-1}\Big \Vert _{\infty }\big \Vert |L||v|\big \Vert _{\infty }\\&= \Big \Vert { diag}^{-1}\big (\mathrm{vec}(X)\big )P^{-1}\Big \Vert _{\infty }\left\| \sum _{k=0}^n|A_k||X^k|\right\| _\mathrm{max}. \end{aligned}$$

\(\square \)
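The expressions of Theorem 4.3 are equally easy to evaluate. The following NumPy sketch (the names are ours) computes \(m(\varphi )\), \(c(\varphi )\) and the upper bounds \(m_U(\varphi )\), \(c_U(\varphi )\); it assumes the minimal solution X is entrywise positive, as required for \(c(\varphi )\):

```python
import numpy as np

def mixed_componentwise(A, X):
    """m(phi), c(phi) and the upper bounds m_U, c_U of Theorem 4.3 (illustrative sketch)."""
    n = len(A) - 1
    m = X.shape[0]
    Xp = [np.linalg.matrix_power(X, k) for k in range(n + 1)]
    P = sum(np.kron(Xp[q].T, A[p] @ Xp[p - 1 - q])
            for p in range(1, n + 1) for q in range(p))
    Pinv = np.linalg.inv(P)
    # T = sum_k |P^{-1}((X^k)^T kron I_m)| vec(|A_k|)
    T = sum(np.abs(Pinv @ np.kron(Xp[k].T, np.eye(m))) @ np.abs(A[k]).flatten('F')
            for k in range(n + 1))
    vecX = X.flatten('F')                                  # vec(X), assumed entrywise positive
    S = sum(np.abs(A[k]) @ np.abs(Xp[k]) for k in range(n + 1))   # sum_k |A_k||X^k|
    m_phi = np.max(T) / np.max(np.abs(X))                  # ||T||_inf / ||X||_max
    c_phi = np.max(T / vecX)                               # || T ./ |vec(X)| ||_inf
    m_U = np.linalg.norm(Pinv, np.inf) * np.max(S) / np.max(np.abs(X))
    c_U = np.linalg.norm(np.diag(1.0 / vecX) @ Pinv, np.inf) * np.max(S)
    return m_phi, c_phi, m_U, c_U
```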

5 Backward Error

In this section, we investigate the backward error of an approximate solution Y to the MPE (1.1). The backward error is defined by

$$\begin{aligned} \theta (Y)=\mathrm{min}\left\{ \epsilon : \sum _{p=0}^n(A_p+\Delta A_p)Y^p=0, \left\| \left[ \delta _n^{-1}\Delta A_n,\ldots , \delta _0^{-1}\Delta A_0\right] \right\| _F\le \epsilon \right\} . \end{aligned}$$
(5.1)

Let

$$\begin{aligned} S=\sum _{p=0}^nA_pY^p, \end{aligned}$$

then we can write the equation in (5.1) as

$$\begin{aligned} -S&=\sum _{p=0}^n\Delta A_pY^p\nonumber \\&=\left[ \delta _n^{-1}\Delta A_n, \delta _{n-1}^{-1}\Delta A_{n-1},\ldots , \delta _0^{-1}\Delta A_0\right] \left( \begin{array}{c} \delta _nY^n\\ \delta _{n-1}Y^{n-1}\\ \vdots \\ \delta _0Y^0\\ \end{array}\right) , \end{aligned}$$
(5.2)

from which, using the inequality \(\Vert AB\Vert _F\le \Vert A\Vert _F\Vert B\Vert _F\), we get

$$\begin{aligned} \theta (Y)\ge \frac{\Vert S\Vert _F}{\left( \sum _{p=0}^n\delta _p^2\Vert Y^p\Vert _F^2\right) ^{\frac{1}{2}}}. \end{aligned}$$

Applying the vec operator to (5.2) yields

$$\begin{aligned} -\mathrm{vec}(S)=\left[ \delta _n(Y^n)^T\otimes I_m, \delta _{n-1}(Y^{n-1})^T\otimes I_m,\ldots , \delta _0I_{m^2}\right] \left( \begin{array}{c} \mathrm{vec}(\Delta A_n)/\delta _n\\ \mathrm{vec}(\Delta A_{n-1})/\delta _{n-1}\\ \vdots \\ \mathrm{vec}(\Delta A_0)/\delta _0\\ \end{array}\right) . \end{aligned}$$
(5.3)

For convenience, we write (5.3) as

$$\begin{aligned} Ha=s,\quad H\in \mathbb {R}^{m^2\times (n+1)m^2}, \end{aligned}$$
(5.4)

where

$$\begin{aligned} H&=\left[ \delta _n(Y^n)^T\otimes I_m, \delta _{n-1}(Y^{n-1})^T\otimes I_m,\ldots , \delta _0I_{m^2}\right] ,\\ a&=\left[ \mathrm{vec}(\Delta A_n)^T{/}\delta _n,\ldots , \mathrm{vec}(\Delta A_0)^T/\delta _0\right] ^T,\\ s&=-\mathrm{vec}(S). \end{aligned}$$

We assume that H has full row rank. This guarantees that (5.3) has a solution and that the backward error is finite.

From (5.4), we obtain an upper bound for \(\theta (Y)\):

$$\begin{aligned} \theta (Y)\le \Vert H^{+}\Vert _2\Vert s\Vert _2=\frac{\Vert s\Vert _2}{\sigma _\mathrm{min}(H)}, \end{aligned}$$

where \(H^+\) is the pseudoinverse of H, and \(\sigma _\mathrm{min}(H)\) is the smallest singular value of H, which is nonzero under the assumption that H has full row rank.

Note that

$$\begin{aligned} \sigma _\mathrm{min}^2(H)&=\lambda _\mathrm{min}(HH^*)\\&=\lambda _\mathrm{min}\left( \sum _{p=0}^n\delta _p^2(Y^p)^T\bar{Y}^p \otimes I_m\right) \\&=\lambda _\mathrm{min}\left( \sum _{p=0}^n\delta _p^2(Y^p)^*Y^p \otimes I_m\right) \\&\ge \sum _{p=0}^n\delta _p^2\lambda _\mathrm{min}((Y^p)^*Y^p)\\&=\sum _{p=0}^n\delta _p^2\sigma _\mathrm{min}^2(Y^p). \end{aligned}$$

Thus

$$\begin{aligned} \theta (Y)\le \frac{\Vert S\Vert _F}{\left( \sum _{p=0}^n\delta _p^2\sigma ^2_\mathrm{min}(Y^p)\right) ^{\frac{1}{2}}}. \end{aligned}$$
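In computations, the backward error and its bounds can be evaluated as follows. Since the constraint in (5.1) equals \(\Vert a\Vert _2\) for the vector a in (5.4), the minimum in (5.1) is the norm of the minimum-norm solution of (5.4), i.e., \(\Vert H^{+}s\Vert _2\). A NumPy sketch (the names are ours):

```python
import numpy as np

def backward_error(A, Y, delta=None):
    """Backward error theta(Y) of Section 5 with its lower and upper bounds (sketch)."""
    n = len(A) - 1
    m = Y.shape[0]
    if delta is None:
        delta = [np.linalg.norm(Ak, 'fro') for Ak in A]
    Yp = [np.linalg.matrix_power(Y, k) for k in range(n + 1)]
    S = sum(A[k] @ Yp[k] for k in range(n + 1))            # residual of the approximate solution
    normS = np.linalg.norm(S, 'fro')
    lower = normS / np.sqrt(sum((delta[k] * np.linalg.norm(Yp[k], 'fro')) ** 2
                                for k in range(n + 1)))
    sigma_min = lambda B: np.linalg.svd(B, compute_uv=False)[-1]
    upper = normS / np.sqrt(sum((delta[k] * sigma_min(Yp[k])) ** 2 for k in range(n + 1)))
    # exact value: minimum-norm solution of H a = s, cf. (5.4)
    H = np.hstack([delta[k] * np.kron(Yp[k].T, np.eye(m)) for k in range(n, -1, -1)])
    theta = np.linalg.norm(np.linalg.lstsq(H, -S.flatten('F'), rcond=None)[0])
    return lower, theta, upper
```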

6 Numerical Examples

In this section, we give three numerical examples to show the sharpness of the normwise, mixed and componentwise condition numbers. All computations are carried out in Matlab 7.10.0 with unit roundoff \(u\approx 2.2\times 10^{-16}\).

Example 6.1

We consider the matrix polynomial equation \(\sum _{k=0}^9A_kX^k=X\) with \(A_k=D^{-1}\bar{A}_k\) for \(k=0,1,\ldots ,9\), where \(\bar{A}_k=rand(10)\) and rand is the random matrix function in Matlab. The matrix D is the diagonal matrix whose diagonal entries are the row sums of \(\sum _{k=0}^9\bar{A}_k\), so that \((\sum _{k=0}^9A_k)\mathbf 1 _m=\mathbf 1 _m\). We rewrite the matrix polynomial equation as

$$\begin{aligned} A_9X^9+A_8X^8+\cdots +(A_1-I_m)X+A_0=0. \end{aligned}$$
(6.1)

Note that \(I_m-A_1\) is a nonsingular M-matrix and \(I_m-\sum _{k=0}^9A_k\) is a singular irreducible M-matrix. From Theorem 2.3, we know the minimal nonnegative solution S of Eq. (6.1) exists.

Suppose that the perturbations in the coefficient matrices are

$$\begin{aligned} \tilde{A}_k=A_k-10^{-s}*rand(10)\circ A_k,\quad k=0,1,\ldots , 9, \end{aligned}$$

where s is a positive integer and \(\circ \) is the Hadamard product. Note that \(I_m-\tilde{A}_1\) and \(I_m-\sum _{k=0}^9\tilde{A}_k\) are also nonsingular M-matrices. Hence the corresponding perturbed equation has a unique minimal nonnegative solution \(\tilde{S}\).
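For reference, this construction can be reproduced by the following NumPy stand-in for the Matlab script (the seed and the value of s shown are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)                  # illustrative seed
m, n, s = 10, 9, 8                              # size, degree, perturbation exponent
A_bar = [rng.random((m, m)) for _ in range(n + 1)]            # \bar A_k = rand(10)
D = np.diag(sum(A_bar).sum(axis=1))                           # row sums of sum_k \bar A_k
A = [np.linalg.solve(D, Ak) for Ak in A_bar]                  # A_k = D^{-1} \bar A_k
A_tilde = [Ak - 10.0 ** (-s) * rng.random((m, m)) * Ak for Ak in A]   # perturbed data
coeff = [A[0], A[1] - np.eye(m)] + A[2:]                      # coefficients of Eq. (6.1)
coeff_tilde = [A_tilde[0], A_tilde[1] - np.eye(m)] + A_tilde[2:]
```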

We use Newton’s method as proposed in [38] to compute the minimal nonnegative solutions S and \(\tilde{S}\). Choosing \(\delta _k=\Vert A_k\Vert _F\), from Theorem 3.2 we obtain three local normwise perturbation bounds: \(\Vert \Delta S\Vert _F/\Vert S\Vert _F\lesssim k_i(\varphi )\Delta _i\) for \(i=1,2,3\). Denoting \(k_2^U=\sqrt{n+1}\,k_1(\varphi )\) and \(k_2^M(\varphi )=\mu /\Vert S\Vert _F\), we compare the above approximate perturbation bounds with the exact relative error \(\Vert \tilde{S}-S\Vert _F/\Vert S\Vert _F\). Table 1 shows that the three normwise perturbation bounds are close to the exact relative error \(\Vert \tilde{S}-S\Vert _F/\Vert S\Vert _F\), and that the bound given by \(k_3(\varphi )\Delta _3\) is sharper than the other two.

Table 1 Comparison of exact relative error with local normwise perturbation bounds

Example 6.2

This example is taken from [32]. Consider the matrix polynomial equation \(A_0+A_1X+A_2X^2=0\). The coefficient matrices \(A_0, A_1, A_2 \in \mathbb {R}^{m\times m}\) with \(m=8\) are given by \(A_0=M_1^{-1}M_0\), \(A_1=I\), \(A_2=M_1^{-1}M_2\), where \(M_0=\mathrm{diag}(\beta _1,\ldots ,\beta _m)\), \(M_2=\rho \cdot \mathrm{diag}(\alpha _1,\ldots ,\alpha _m)\) and

$$\begin{aligned} (M_1)_{i,j}={\left\{ \begin{array}{ll} 1, &{}\mathrm{if}\quad j=(i \ \mathrm{mod} \ m)+1,\\ -1-\rho \alpha _i-\beta _i, &{}\mathrm{if}\quad i=j,\\ 0, &{}\mathrm{elsewhere}, \end{array}\right. } \end{aligned}$$

where \(\alpha =(0.2, 0.2, 0.2, 0.2, 13, 1, 1, 0.2)\), \(\beta _i=2\) for \(i=1,\ldots , m\) and \(\rho =0.99\).

This example represents a queueing system in a random environment, where periods of severe overflow alternate with periods of low arrivals. Note that in this example both \(A_0\) and \(A_2\) are nonpositive, while \(A_1=I\) is a nonsingular M-matrix. Consider the following equation

$$\begin{aligned} -A_0-A_1X-A_2X^2=0. \end{aligned}$$
(6.2)

Then Eq. (6.2) has the same solutions as the equation \(A_0+A_1X+A_2X^2=0\), and the coefficient matrices in (6.2) satisfy the conditions of Corollary 2.5, so Eq. (6.2) has a minimal positive solution X. For \(k=0,1,2\), let \(\Delta A_k=rand(m)\circ A_k\times 10^{-s}\), where s is a positive integer; then \(\tilde{A_k}=A_k+\Delta A_k\) is the perturbed coefficient matrix of the corresponding perturbed equation. Similarly, the minimal positive solution \(\tilde{X}\) of the perturbed matrix polynomial equation exists and can be obtained by Newton’s method in [38].

Let

$$\begin{aligned} \gamma _k=\frac{\Vert \Delta X\Vert _F}{\Vert X\Vert _F},\quad \gamma _m=\frac{\Vert \Delta X\Vert _\mathrm{max}}{\Vert X\Vert _\mathrm{max}},\quad \gamma _c=\left\| \frac{\Delta X}{X}\right\| _\mathrm{max}, \end{aligned}$$

and

$$\begin{aligned} \epsilon _0=\mathrm{min}\{\epsilon : |\Delta A_k|\le \epsilon |A_k|, k=0,1,\ldots , n\}. \end{aligned}$$

Table 2 shows that the mixed and componentwise analyses give tighter and more revealing bounds than the normwise perturbation bounds.

Table 2 Linear asymptotic bounds

Example 6.3

We consider the matrix differential equation

$$\begin{aligned} y^{(3)}+A_2y^{(2)}+A_1y'+A_0y=0. \end{aligned}$$

Such equations may occur in connection with vibrating systems. The associated characteristic matrix polynomial equation is

$$\begin{aligned} P_3(X)=X^3+A_2X^2+A_1X+A_0=0. \end{aligned}$$

Let

$$\begin{aligned} A_0=\left( \begin{array}{cccc} 1.600&{} 1.280&{} 2.890\\ 1.280&{} 0.840&{} 0.413\\ 2.890&{} 0.413&{} 0.725\\ \end{array}\right) ,\quad A_1=\left( \begin{array}{cccc} -20&{} 5&{} \\ 5&{}-20&{}5 \\ &{}5&{}-20\\ \end{array}\right) \end{aligned}$$

and

$$\begin{aligned} A_2=\left( \begin{array}{cccc} 2.660&{} 2.450&{} 2.100\\ 0.230&{} 1.040&{} 0.223\\ 0.600&{} 0.756&{} 0.658\\ \end{array}\right) . \end{aligned}$$

The coefficient matrices of \(P_3(X)=0\) satisfy the conditions of Corollary 2.5, so there is a minimal positive solution \(X_*\) such that \(P_3(X_*)=0\).

Let s be a positive integer and suppose the coefficient matrices are perturbed by \(\Delta A_i\ (i=0,1,2)\), where

$$\begin{aligned} \Delta A_0=\left( \begin{array}{cccc} 0.7922&{}0.0357&{}0.6787\\ 0.9595&{}0.8491&{}0.7577\\ 0.6557&{}0.9340&{}0.7431\\ \end{array} \right) \times 10^{-s},\quad \Delta A_1=\left( \begin{array}{cccc} -0.2&{} 0.1&{} \\ 0.1&{}-0.2&{}0.1 \\ &{}0.1&{}-0.2\\ \end{array}\right) \times 10^{-s} \end{aligned}$$

and

$$\begin{aligned} \Delta A_2=\left( \begin{array}{cccc} 0.9649&{}0.9572&{} 0.1419\\ 0.1576&{}0.4854&{} 0.4218\\ 0.9706&{}0.8003&{} 0.9157\\ \end{array}\right) \times 10^{-s}. \end{aligned}$$

Using the notation of Examples 6.1 and 6.2, the perturbation bounds obtained from the normwise, mixed and componentwise condition numbers are listed in Table 3. Table 3 shows that our estimated perturbation bounds are sharp. Moreover, we observe that the simple upper bounds \(m_U(\varphi )\) and \(c_U(\varphi )\) of the mixed and componentwise condition numbers \(m(\varphi )\) and \(c(\varphi )\), which are obtained in Theorem 4.3, are also tight.

Table 3 Comparison of the relative error with our estimates

7 Conclusion

In this paper, a sufficient condition for the existence of the minimal nonnegative solution of a matrix polynomial equation is given. Three kinds of normwise condition numbers of the matrix polynomial equation are investigated. Explicit expressions and upper bounds for the mixed and componentwise condition numbers are derived. A backward error of an approximate solution is defined and bounded from above and below.