1 Introduction

Matrix equations are often encountered in control theory [1, 2], system theory [3, 4], and stability analysis [5–7]. For example, the stability of the autonomous system \(\dot{x}(t)=Ax(t)\) is determined by whether the associated Lyapunov equation \(XA + A^{\top }X =-M\) has a positive definite solution X, where M is a given positive definite matrix of appropriate size [8]. In this paper, we are concerned with the following (continuous-time) Lyapunov matrix equation:

$$ AX+XA^{\top }=C, $$
(1.1)

where \(A\in \mathbb{R}^{m\times m}\) and \(C\in \mathbb{R}^{m\times m}\) are given constant matrices, and \(X\in \mathbb{R}^{m\times m}\) is the unknown matrix to be determined.

Obviously, by using the Kronecker product ⊗ and the vec-operator vec, equation (1.1) can be written as a system of linear equations:

$$ (I_{m}\otimes A+A\otimes I_{m})\operatorname{vec}(X)=\operatorname{vec}(C). $$
(1.2)

The order of its coefficient matrix is \(m^{2}\), which becomes very large as m grows. For example, if \(m=100\), then \(m^{2}=10{,}000\). Clearly, a square matrix of order 10,000 requires far more storage than several square matrices of order 100, and computing its inverse or its eigenvalues is much more demanding than performing the same computations on matrices of order 100.
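To make the size comparison concrete, the following sketch (an illustration of ours in Python/NumPy, not part of the original presentation) forms the \(m^{2}\times m^{2}\) coefficient matrix of (1.2) and solves for \(\operatorname{vec}(X)\); already for \(m=100\) the matrix K below has 10,000 rows and columns.

```python
import numpy as np

def solve_lyapunov_kron(A, C):
    """Solve AX + XA^T = C via the Kronecker reformulation (1.2).

    Only practical for small m, since the coefficient matrix is m^2 x m^2.
    """
    m = A.shape[0]
    K = np.kron(np.eye(m), A) + np.kron(A, np.eye(m))   # I_m (x) A + A (x) I_m
    vec_X = np.linalg.solve(K, C.flatten(order="F"))    # vec stacks columns
    return vec_X.reshape((m, m), order="F")
```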

To solve equation (1.1), its special cases, or its generalized versions, various methods have been developed in the literature [5, 7, 9–19], most of which are iterative methods. For example, two conjugate gradient methods were proposed in [7] to solve equation (1.1), whether it is consistent or inconsistent. Both methods terminate in finitely many steps in the absence of round-off errors and, with a suitable choice of initial matrix, yield the least Frobenius norm solution or the least-squares solution with the least Frobenius norm of equation (1.1). By using the hierarchical identification principle, Ding et al. [18] designed a gradient-based iterative algorithm and a least-squares iterative algorithm for equation (1.1), and they proved that the gradient-based iterative algorithm converges to the exact solution for any initial matrix. However, convergence of the least-squares iterative algorithm was not proved in [18]; in fact, the authors remarked that its convergence is very difficult to prove and requires further study. In this paper, we study the least-squares iterative algorithm for equation (1.1) further and present two convergent least-squares iterative algorithms, together with the feasible set of their convergence factor.

The remainder of the paper is organized as follows. Section 2 presents the first least-squares iterative algorithm for equation (1.1) and its global convergence. Section 3 discusses the second least-squares iterative algorithm for equation (1.1) and its global convergence. Section 4 gives an example to illustrate the rationality of theoretical results. Section 5 ends the paper with some remarks.

2 The first algorithm and its convergence

In this section, we give some notations, present the first least-squares iterative algorithm for equation (1.1), and analyze its global convergence.

The symbol I stands for an identity matrix of appropriate size. For any \(M\in \mathbb{R}^{n\times n}\), the symbol \(\lambda _{\max }(M)\) denotes the maximum eigenvalue of the square matrix M. For any \(N\in \mathbb{R}^{m\times n}\), we use \(N^{\top }\) to denote its transpose and \(\operatorname{tr}(N)\) to stand for its trace. The Frobenius norm \(\|N\|\) is defined as \(\|N\|=\sqrt{\operatorname{tr}(N^{\top }N)}\). The symbol \(A\otimes B\), defined as \(A\otimes B=(a_{ij}B)\), stands for the Kronecker product of the matrices A and B. For a matrix \(A\in \mathbb{R}^{m\times n}\), the vectorization operator \(\operatorname{vec}(A)\) is defined by \(\operatorname{vec}(A)=(a_{1}^{\top }, a_{2}^{\top }, \ldots ,a_{n}^{\top })^{\top }\), where \(a_{k}\) is the kth column of the matrix A. According to the properties of the Kronecker product, for any matrices M, N, and X of appropriate sizes, we have

$$ \operatorname{vec}(MXN)=\bigl(N^{\top }\otimes M\bigr) \operatorname{vec}(X). $$
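This identity can be checked numerically; in the short sketch below (ours), note that the vec operator stacks columns, which corresponds to Fortran ordering in NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
M, X, N = rng.random((3, 4)), rng.random((4, 5)), rng.random((5, 2))

lhs = (M @ X @ N).flatten(order="F")           # vec(MXN)
rhs = np.kron(N.T, M) @ X.flatten(order="F")   # (N^T (x) M) vec(X)
assert np.allclose(lhs, rhs)
```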

The following definition is a simple extension of the Frobenius norm \(\|\cdot \|\).

Definition 2.1

Given a positive definite matrix \(M\in \mathbb{R}^{n\times n}\) and a matrix \(N\in \mathbb{R}^{m\times n}\), the M-Frobenius norm \(\|N\|_{M}\) is defined as

$$ \Vert N \Vert _{M}=\sqrt{ \operatorname{tr}\bigl(N^{\top }MN\bigr)}. $$
(2.1)
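For later reference, the M-Frobenius norm (2.1) can be evaluated directly from its definition; the helper function below is a hypothetical name used only in our illustrative sketches.

```python
import numpy as np

def m_frobenius_norm(N, M):
    """Return ||N||_M = sqrt(tr(N^T M N)) for a positive definite M."""
    return np.sqrt(np.trace(N.T @ M @ N))
```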

The M-Frobenius norm \(\|\cdot \|_{M}\) defined in (2.1) satisfies the following properties.

Theorem 2.1

Given a positive definite matrix \(M\in \mathbb{R}^{n\times n}\) and three matrices \(N, N_{1}, N_{2}\in \mathbb{R}^{m\times n}\), it holds that

(1) \(\|N\|_{M}=0\Longleftrightarrow N=0\);

(2) \(\|N_{1}+N_{2}\|_{M}^{2}=\|N_{1}\|_{M}^{2}+2\operatorname{tr}(N_{1}^{\top }MN_{2})+\|N_{2}\|_{M}^{2}\).

Proof

The proof is elementary and is omitted here. □

Theorem 2.2

([18])

Equation (1.1) has a unique solution if and only if the matrix \(I_{m}\otimes A+A\otimes I_{m}\) is nonsingular, and the unique solution \(X^{*}\) is given by

$$ \operatorname{vec}\bigl(X^{*}\bigr)=[I_{m}\otimes A+A \otimes I_{m}]^{-1}\operatorname{vec}(C). $$
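Under the assumptions of Theorem 2.2, the unique solution can be recovered exactly as in the formula above; the sketch below (ours) is a variant of the earlier Kronecker-based routine that first checks nonsingularity of \(I_{m}\otimes A+A\otimes I_{m}\).

```python
import numpy as np

def unique_solution(A, C):
    """Return X* of Theorem 2.2, raising an error if (1.1) has no unique solution."""
    m = A.shape[0]
    K = np.kron(np.eye(m), A) + np.kron(A, np.eye(m))
    if np.linalg.matrix_rank(K) < m * m:
        raise np.linalg.LinAlgError("I_m (x) A + A (x) I_m is singular")
    return np.linalg.solve(K, C.flatten(order="F")).reshape((m, m), order="F")
```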

By using the hierarchical identification principle, Ding et al. [18] presented the following least-squares iterative algorithm for equation (1.1):

$$\begin{aligned}& X_{1}(k)=X(k-1)+\mu \bigl(A^{\top }A \bigr)^{-1}A^{\top }\bigl[C-AX(k-1)-X(k-1)A^{ \top } \bigr], \end{aligned}$$
(2.2)
$$\begin{aligned}& X_{2}(k)=X(k-1)+\mu \bigl[C-AX(k-1)-X(k-1)A^{\top } \bigr]A\bigl(A^{\top }A\bigr)^{-1}, \end{aligned}$$
(2.3)
$$\begin{aligned}& X(k)=\frac{X_{1}(k)+X_{2}(k)}{2}, \quad 0< \mu < 4. \end{aligned}$$
(2.4)

The initial matrix \(X(0)\) may be taken as any matrix \(X_{0}\in \mathbb{R}^{m\times m}\).
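For reference, a direct transcription of scheme (2.2)–(2.4) is sketched below (our own Python/NumPy illustration with hypothetical names); the recorded error is the Frobenius norm of the residual of (1.1) after each sweep, which is the quantity plotted in Example 2.1 below.

```python
import numpy as np

def lsia_ding(A, C, mu, X0=None, max_iter=200):
    """Least-squares iterative scheme (2.2)-(2.4) of Ding et al. [18]."""
    m = A.shape[0]
    X = np.zeros((m, m)) if X0 is None else X0.copy()
    P = np.linalg.solve(A.T @ A, A.T)          # (A^T A)^{-1} A^T, computed once
    errors = []
    for _ in range(max_iter):
        R = C - A @ X - X @ A.T                # residual of (1.1)
        X1 = X + mu * P @ R                    # (2.2)
        X2 = X + mu * R @ P.T                  # (2.3), since P.T = A (A^T A)^{-1}
        X = 0.5 * (X1 + X2)                    # (2.4)
        errors.append(np.linalg.norm(A @ X + X @ A.T - C, "fro"))
    return X, errors
```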

The following example shows that the feasible set of the convergence factor μ of iterative scheme (2.2)–(2.4) may not be the interval \((0,4)\).

Example 2.1

Consider the Lyapunov matrix equation \(AX+XA^{\top }=C\) with

$$ A= \begin{bmatrix} 2& -1 \\ 1& 1 \end{bmatrix} , \qquad C= \begin{bmatrix} -1&-5 \\ 16&16 \end{bmatrix} . $$

The numerical results of iterative scheme (2.2)–(2.4) with \(\mu =1, 0.99, 0.2\) are plotted in Fig. 1, in which

$$ \operatorname{Error}(k)= \bigl\Vert AX(k)+X(k)A^{\top }-C \bigr\Vert . $$

Figure 1. Numerical results of iterative scheme (2.2)–(2.4) with different μ

From the three curves in Fig. 1, we find that: (1) iterative scheme (2.2)–(2.4) with \(\mu =1\) is divergent, while with \(\mu =0.99\) and \(\mu =0.2\) it is convergent; (2) the constant 1 may be the upper bound of μ for this example; (3) a smaller convergence factor can often accelerate the convergence of iterative scheme (2.2)–(2.4).

Based on iterative scheme (2.2)–(2.4), we propose the following modified least-squares iterative algorithm.

The iteration \(X_{1}(k)\) is defined the same as in (2.2), while the iterations \(X_{2}(k)\) and \(X(k)\) are defined as

$$\begin{aligned}& X_{2}(k)=X(k-1)+\mu \bigl(A^{\top }A \bigr)^{-1}\bigl[C-AX(k-1)-X(k-1)A^{\top }\bigr]A, \end{aligned}$$
(2.5)
$$\begin{aligned}& X(k)=\frac{X_{1}(k)+X_{2}(k)}{2}. \end{aligned}$$
(2.6)

The initial matrix \(X(0)\) can also be taken as any matrix \(X_{0}\in \mathbb{R}^{m\times m}\).

Remark 2.1

The modified least-squares iterative algorithm (2.2), (2.5), and (2.6) involves the inverse of the matrix \(A^{\top }A\). However, since this matrix is invariant throughout the iteration, its inverse needs to be computed only once, before the iteration starts.
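A sketch of the resulting algorithm (2.2), (2.5), (2.6), hereafter LSIA1, is given below (our own illustrative Python/NumPy code, not a verbatim implementation from the paper); in line with Remark 2.1, the inverse of \(A^{\top }A\) is formed once before the loop.

```python
import numpy as np

def lsia1(A, C, mu, X0=None, tol=1e-6, max_iter=1000):
    """First modified least-squares iteration: (2.2), (2.5), and (2.6)."""
    m = A.shape[0]
    X = np.zeros((m, m)) if X0 is None else X0.copy()
    AtA_inv = np.linalg.inv(A.T @ A)           # invariant factor (Remark 2.1)
    for k in range(1, max_iter + 1):
        R = C - A @ X - X @ A.T
        X1 = X + mu * AtA_inv @ (A.T @ R)      # (2.2)
        X2 = X + mu * AtA_inv @ (R @ A)        # (2.5)
        X = 0.5 * (X1 + X2)                    # (2.6)
        if np.linalg.norm(A @ X + X @ A.T - C, "fro") < tol:
            break
    return X, k
```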

In the remainder of this section, we shall prove the global convergence of the first least-squares iterative algorithm (2.2), (2.5), and (2.6), which is motivated by Theorem 4 in [18].

Theorem 2.3

If equation (1.1) has a unique solution \(X^{*}\) and \(r(A)=m\), then the sequence \(\{X(k)\}\) generated by iterative scheme (2.2), (2.5), and (2.6) converges to \(X^{*}\) for any initial matrix \(X(0)\), provided that the convergence factor μ satisfies

$$ 0< \mu < \frac{2}{\nu }, $$
(2.7)

and the constant ν is defined as

$$ \nu =1+\lambda _{\max }\bigl(A^{\top }A\bigr)\lambda _{\max }\bigl(\bigl(A^{\top }A\bigr)^{-1}\bigr). $$

Proof

Firstly, let us define three error matrices as follows:

$$ \tilde{X}_{1}(k)=X_{1}(k)-X^{*}, \qquad \tilde{X}_{2}(k)=X_{2}(k)-X^{*}, \qquad \tilde{X}(k)=X(k)-X^{*}. $$

Then, by (2.6), the error matrix corresponding to \(X(k)\) can be written as

$$ \tilde{X}(k)=\frac{X_{1}(k)+X_{2}(k)}{2}-X^{*}= \frac{\tilde{X}_{1}(k)+ \tilde{X}_{2}(k)}{2}. $$
(2.8)

Thus, from the convexity of the function \(\|\cdot \|^{2}_{A^{\top }A}\), it holds that

$$ \bigl\Vert \tilde{X}(k) \bigr\Vert ^{2}_{A^{\top }A}\leq \frac{ \Vert \tilde{X}_{1}(k) \Vert ^{2}_{A^{\top }A}+ \Vert \tilde{X}_{2}(k) \Vert ^{2}_{A^{\top }A}}{2}. $$
(2.9)

Secondly, set

$$ \xi (k)=A\tilde{X}(k-1), \qquad \eta (k)= \tilde{X}(k-1)A^{\top }. $$
(2.10)

Then, by (2.2), (2.10), and the fact that \(X^{*}\) is a solution of equation (1.1), the error matrix \(\tilde{X}_{1}(k)\) can be written as

$$ \tilde{X}_{1}(k)=\tilde{X}(k-1)+\mu \bigl(A^{\top }A\bigr)^{-1}A^{\top }\bigl[- \xi (k)- \eta (k)\bigr]. $$
(2.11)

Similarly, by (2.5), (2.10), and the fact that \(X^{*}\) is a solution of equation (1.1), the error matrix \(\tilde{X}_{2}(k)\) can be written as

$$ \tilde{X}_{2}(k)=\tilde{X}(k-1)+\mu \bigl(A^{\top }A\bigr)^{-1}\bigl[-\xi (k)-\eta (k)\bigr]A. $$
(2.12)

From (2.11) and Theorem 2.1, we have

$$\begin{aligned} & \bigl\Vert \tilde{X}_{1}(k) \bigr\Vert ^{2}_{A^{\top }A} \\ &\quad = \bigl\Vert \tilde{X}(k-1) \bigr\Vert ^{2}_{A^{\top }A}-2\mu \operatorname{tr} \bigl(\tilde{X}^{\top }(k-1)A^{\top }\bigl[\xi (k)+\eta (k)\bigr] \bigr) \\ &\qquad {}+\mu ^{2} \bigl\Vert \bigl(A^{\top }A\bigr)^{-1}A^{\top }\bigl[\xi (k)+\eta (k)\bigr] \bigr\Vert ^{2}_{A^{\top }A} \\ &\quad = \bigl\Vert \tilde{X}(k-1) \bigr\Vert ^{2}_{A^{\top }A}-2\mu \operatorname{tr} \bigl(\tilde{X}^{\top }(k-1)A^{\top }\bigl[\xi (k)+\eta (k)\bigr] \bigr) \\ &\qquad {} +\mu ^{2}\operatorname{tr} \bigl(\bigl[\xi (k)+\eta (k)\bigr]^{\top }A\bigl(A^{\top }A\bigr)^{-1}A^{\top }\bigl[\xi (k)+\eta (k)\bigr] \bigr) \\ &\quad = \bigl\Vert \tilde{X}(k-1) \bigr\Vert ^{2}_{A^{\top }A}-2\mu \operatorname{tr} \bigl(\xi (k)^{\top }\bigl[\xi (k)+\eta (k)\bigr] \bigr)+\mu ^{2} \bigl\Vert \xi (k)+\eta (k) \bigr\Vert ^{2}. \end{aligned}$$

Here the last equality uses the fact that \(A(A^{\top }A)^{-1}A^{\top }=I_{m}\), since \(r(A)=m\) implies that A is nonsingular.

Similarly, from (2.12) and Theorem 2.1, we have

$$\begin{aligned} & \bigl\Vert \tilde{X}_{2}(k) \bigr\Vert ^{2}_{A^{\top }A} \\ &\quad = \bigl\Vert \tilde{X}(k-1) \bigr\Vert ^{2}_{A^{\top }A}-2 \mu \operatorname{tr} \bigl(\tilde{X} ^{\top }(k-1)\bigl[\xi (k)+\eta (k) \bigr]A \bigr) \\ &\qquad {} +\mu ^{2}\operatorname{tr}\bigl(A^{\top }\bigl[ \xi (k)+\eta (k)\bigr]^{\top }\bigl(A^{\top }A \bigr)^{-1}\bigl[ \xi (k)+\eta (k)\bigr]A\bigr) \\ &\quad \leq \bigl\Vert \tilde{X}(k-1) \bigr\Vert ^{2}_{A^{\top }A}-2 \mu \operatorname{tr} \bigl( \eta (k)^{\top }\bigl[\xi (k)+\eta (k)\bigr] \bigr)\\ &\qquad {}+\mu ^{2}\lambda _{\max }\bigl(A^{ \top }A\bigr) \lambda _{\max }\bigl(\bigl(A^{\top }A\bigr)^{-1} \bigr) \bigl\Vert \xi (k)+\eta (k) \bigr\Vert ^{2}. \end{aligned}$$

Substituting the above two estimates into the right-hand side of (2.9) yields

$$\begin{aligned} & \bigl\Vert \tilde{X}(k) \bigr\Vert ^{2}_{A^{\top }A} \\ &\quad \leq \bigl\Vert \tilde{X}(k-1) \bigr\Vert ^{2}_{A^{\top }A}- \mu \bigl\Vert \xi (k)+\eta (k) \bigr\Vert ^{2}+ \frac{\mu ^{2}\nu }{2} \bigl\Vert \xi (k)+\eta (k) \bigr\Vert ^{2} \\ &\quad = \bigl\Vert \tilde{X}(k-1) \bigr\Vert ^{2}_{A^{\top }A}- \mu \biggl(1-\frac{\mu \nu }{2} \biggr) \bigl\Vert \xi (k)+\eta (k) \bigr\Vert ^{2} \\ &\quad \leq \bigl\Vert \tilde{X}(0) \bigr\Vert ^{2}_{A^{\top }A}- \mu \biggl(1- \frac{\mu \nu }{2} \biggr)\sum_{i=1}^{k} \bigl\Vert \xi (i)+\eta (i) \bigr\Vert ^{2}. \end{aligned}$$

Since \(0<\mu <2/\nu \), we have

$$ \sum_{k=1}^{\infty } \bigl\Vert \xi (k)+ \eta (k) \bigr\Vert ^{2}< \infty , $$

which implies that

$$ \lim_{k\rightarrow \infty }\bigl(\xi (k)+\eta (k)\bigr)=0. $$

That is,

$$ \lim_{k\rightarrow \infty }\bigl(A\tilde{X}(k-1)+\tilde{X}(k-1)A^{\top } \bigr)=0. $$

So

$$ \lim_{k\rightarrow \infty }[I_{m}\otimes A+A\otimes I_{m}] \operatorname{vec}\bigl(\tilde{X}(k-1)\bigr)=0. $$

Since the matrix \(I_{m}\otimes A+A\otimes I_{m}\) is nonsingular, we have

$$ \lim_{k\rightarrow \infty }\operatorname{vec}\bigl(\tilde{X}(k-1)\bigr)=0. $$

Thus

$$ \lim_{k\rightarrow \infty }{X}(k)=X^{*}. $$

This completes the proof. □

Remark 2.2

Iterative methods, such as the sum method and the power method [20], can be adopted to compute the maximum eigenvalues appearing in the constant ν.
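As an illustration, a plain power iteration (a generic sketch of ours, not the specific methods of [20]) applied to the symmetric positive definite matrix \(A^{\top }A\) estimates \(\lambda _{\max }(A^{\top }A)\); applying the same routine to \((A^{\top }A)^{-1}\) gives the second factor, since \(\lambda _{\max }((A^{\top }A)^{-1})=1/\lambda _{\min }(A^{\top }A)\).

```python
import numpy as np

def power_method(M, iters=1000, tol=1e-12):
    """Estimate the dominant eigenvalue of a symmetric positive definite matrix M."""
    x = np.random.default_rng(0).standard_normal(M.shape[0])
    lam = 0.0
    for _ in range(iters):
        y = M @ x
        lam_new = np.linalg.norm(y)
        x = y / lam_new
        if abs(lam_new - lam) < tol:
            return lam_new
        lam = lam_new
    return lam

def feasible_upper_bound(A):
    """Return nu = 1 + lambda_max(A^T A) * lambda_max((A^T A)^{-1}) and 2/nu (Theorem 2.3)."""
    AtA = A.T @ A
    nu = 1.0 + power_method(AtA) * power_method(np.linalg.inv(AtA))
    return nu, 2.0 / nu
```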

3 The second algorithm and its convergence

In this section, we present the second least-squares iterative algorithm for equation (1.1) and analyze its global convergence.

Define a matrix as follows:

$$ \bar{C}=C-XA^{\top }. $$
(3.1)

Then equation (1.1) can be written as

$$ S: AX=\bar{C}. $$

From [18], the least-squares solution of the system S is

$$ X=\bigl(A^{\top }A\bigr)^{-1}A^{\top }\bar{C}. $$

Substituting (3.1) into the above equation, we have

$$ X=\bigl(A^{\top }A\bigr)^{-1}A^{\top } \bigl(C-XA^{\top }\bigr). $$

Then, we get the second least-squares iterative algorithm for equation (1.1) as follows:

$$ X(k)=X(k-1)-\mu \bigl(X(k-1)-\bigl(A^{\top }A \bigr)^{-1}A^{\top } \bigl(C-X(k-1)A ^{\top } \bigr) \bigr). $$
(3.2)

The initial matrix \(X(0)\) may be taken as any matrix \(X_{0}\in \mathbb{R}^{m\times m}\).
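A sketch of scheme (3.2), hereafter LSIA2, follows (illustrative Python/NumPy code of ours); the convergence factor μ should be chosen from the feasible interval established in Theorem 3.1 below.

```python
import numpy as np

def lsia2(A, C, mu, X0=None, tol=1e-6, max_iter=1000):
    """Second least-squares iteration (3.2)."""
    m = A.shape[0]
    X = np.zeros((m, m)) if X0 is None else X0.copy()
    P = np.linalg.solve(A.T @ A, A.T)            # (A^T A)^{-1} A^T, computed once
    for k in range(1, max_iter + 1):
        X = X - mu * (X - P @ (C - X @ A.T))     # (3.2)
        if np.linalg.norm(A @ X + X @ A.T - C, "fro") < tol:
            break
    return X, k
```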

Theorem 3.1

If equation (1.1) has a unique solution \(X^{*}\) and \(r(A)=m\), then the sequence \(\{X(k)\}\) generated by iterative scheme (3.2) converges to \(X^{*}\) for any initial matrix \(X(0)\), provided that the convergence factor μ satisfies

$$ 0< \mu < \frac{2+2\lambda _{\min }(A\otimes A^{-1})}{1+\lambda _{\max } ^{2}(A\otimes A^{-1})+2\lambda _{\min }(A\otimes A^{-1})}. $$
(3.3)

Proof

Firstly, let us define an error matrix as follows:

$$ \tilde{X}(k)=X(k)-X^{*}. $$

Then, by (3.2), it holds that

$$\begin{aligned} \tilde{X}(k)&=X(k)-X^{*} \\ &=X(k-1)-X^{*}-\mu \bigl(X(k-1)-\bigl(A^{\top }A \bigr)^{-1}A^{\top } \bigl(C-X(k-1)A ^{\top } \bigr) \bigr) \\ &=\tilde{X}(k-1)-\mu \bigl(X(k-1)-\bigl(A^{\top }A\bigr)^{-1}A^{\top } \bigl(AX ^{*}+X^{*}A^{\top }-X(k-1)A^{\top } \bigr) \bigr) \\ &=(1-\mu )\tilde{X}(k-1)+\mu \bigl(A^{\top }A\bigr)^{-1}A^{\top } \bigl(X^{*}A ^{\top }-X(k-1)A^{\top } \bigr) \\ &=(1-\mu )\tilde{X}(k-1)+\mu A^{-1}\bigl(X^{*}-X(k-1) \bigr)A^{\top } \\ &=(1-\mu )\tilde{X}(k-1)-\mu A^{-1}\tilde{X}(k-1)A^{\top }. \end{aligned}$$

So

$$\begin{aligned} \operatorname{vec} \bigl(\tilde{X}(k) \bigr) =(1-\mu )\operatorname{vec} \bigl(\tilde{X}(k-1) \bigr)-\mu \bigl(A \otimes A^{-1}\bigr) \operatorname{vec} \bigl(\tilde{X}(k-1) \bigr). \end{aligned}$$

Then

$$\begin{aligned} & \bigl\Vert \operatorname{vec} \bigl(\tilde{X}(k) \bigr) \bigr\Vert ^{2} \\ &\quad =(1-\mu )^{2} \bigl\Vert \operatorname{vec} \bigl(\tilde{X}(k-1) \bigr) \bigr\Vert ^{2}-2\mu (1-\mu ) \operatorname{vec}^{\top } \bigl(\tilde{X}(k-1) \bigr) \bigl(A\otimes A^{-1}\bigr) \operatorname{vec} \bigl(\tilde{X}(k-1) \bigr) \\ &\qquad {} +\mu ^{2} \bigl\Vert \bigl(A\otimes A^{-1}\bigr)\operatorname{vec} \bigl(\tilde{X}(k-1) \bigr) \bigr\Vert ^{2} \\ &\quad \leq \bigl(1-2\mu +\mu ^{2}-2\mu (1-\mu )\lambda _{\min }\bigl(A\otimes A^{-1}\bigr)+\mu ^{2}\lambda _{\max }^{2}\bigl(A\otimes A^{-1}\bigr) \bigr) \bigl\Vert \operatorname{vec} \bigl(\tilde{X}(k-1) \bigr) \bigr\Vert ^{2}. \end{aligned}$$

Set \(\nu =1-2\mu +\mu ^{2}-2\mu (1-\mu )\lambda _{\min }(A\otimes A^{-1})+\mu ^{2}\lambda _{\max }^{2}(A\otimes A^{-1})\). From (3.3), it holds that \(0<\nu <1\). Thus

$$ \bigl\Vert \operatorname{vec} \bigl(\tilde{X}(k) \bigr) \bigr\Vert ^{2}\leq \nu \bigl\Vert \operatorname{vec} \bigl(\tilde{X}(k-1) \bigr) \bigr\Vert ^{2}\leq \nu ^{k} \bigl\Vert \operatorname{vec} \bigl( \tilde{X}(0) \bigr) \bigr\Vert ^{2}. $$

Then

$$ \lim_{k\rightarrow \infty }\operatorname{vec}\bigl(\tilde{X}(k)\bigr)=0. $$

So

$$ \lim_{k\rightarrow \infty }{X}(k)=X^{*}. $$

This completes the proof. □

Example 3.1

Let us apply the two modified least-squares iterative algorithms, i.e., iterative scheme (2.2), (2.5), and (2.6) (denoted by LSIA1) and iterative scheme (3.2) (denoted by LSIA2), to solve the Lyapunov matrix equation in Example 2.1. We set \(\mu =0.2546\) in the first algorithm and \(\mu =0.3478\) in the second. The numerical results are plotted in Fig. 2.

Figure 2. Numerical results of the two tested algorithms (LSIA1 and LSIA2)

The two curves in Fig. 2 illustrate that the two modified least-squares iterative algorithms are both convergent, and LSIA2 is faster than LSIA1 for this problem.
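Using the sketches from Sects. 2 and 3 (the function names lsia1 and lsia2 are ours), the comparison in this example can be reproduced along the following lines.

```python
import numpy as np

# Assumes lsia1 and lsia2 from the earlier sketches are in scope.
A = np.array([[2.0, -1.0], [1.0, 1.0]])
C = np.array([[-1.0, -5.0], [16.0, 16.0]])

X_lsia1, k1 = lsia1(A, C, mu=0.2546)   # first modified algorithm
X_lsia2, k2 = lsia2(A, C, mu=0.3478)   # second modified algorithm
```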

4 Numerical results

In this section, an example is given to show the efficiency of the two proposed algorithms (denoted by LSIA1 and LSIA2) in Sect. 2 and Sect. 3, and we compare them with the gradient-based iterative algorithm in [18] (denoted by GBIA).

Example 4.1

Let us consider a medium scale Lyapunov matrix equation

$$ AX+XA^{\top }=C, $$

with

$$ A=-\mathtt{triu}\bigl(\mathtt{rand}(n),1\bigr)+\mathtt{diag}\bigl(8- \mathtt{diag}\bigl( \mathtt{rand}(n)\bigr)\bigr),\qquad C=\mathtt{rand}(n). $$

We set \(n=20\) and take the initial matrix \(X(0)=0\).
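The construction above is written in MATLAB notation; an equivalent NumPy construction (ours, with an arbitrary fixed seed for reproducibility) is given below.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
# -triu(rand(n),1) + diag(8 - diag(rand(n)))
A = -np.triu(rng.random((n, n)), 1) + np.diag(8.0 - np.diag(rng.random((n, n))))
C = rng.random((n, n))
X0 = np.zeros((n, n))
# lsia1, lsia2 (and GBIA from [18]) can now be run from X0 with mu taken as
# half of the respective upper bounds, as described below.
```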

The convergence factors in the three algorithms are all taken as half of their upper bounds. The three curves in Fig. 3 indicate that LSIA2 is much faster than LSIA1, and LSIA1 is a little faster than GBIA for this problem. In fact, the numbers of iterations of LSIA1, LSIA2, and GBIA are 134, 12, and 135, respectively. The final errors of LSIA1, LSIA2, and GBIA are \(9.1473\mbox{e}{-}07\), \(4.1768\mbox{e}{-}07\), and \(8.9406\mbox{e}{-}07\), respectively.

Figure 3. Numerical results of LSIA1, LSIA2, and GBIA

5 Conclusions

In this paper, two modified least-squares iterative algorithms have been proposed for solving the Lyapunov matrix equation (1.1), and their global convergence has been proved. The feasible set of their convergence factor has been analyzed, and numerical results have been presented to verify the theoretical results. In the future, we shall analyze the convergence properties of least-squares iterative algorithms for solving Sylvester matrix equations.