1 Introduction

Reductions of matrices to condensed forms play a central role in matrix computations. For computing the eigenvalues of a (dense) matrix, reduction to the Hessenberg form (or the tridiagonal form in the case of a symmetric matrix) by an orthogonal similarity transformation is the standard approach. Partial reductions to the Hessenberg form or to the tridiagonal form, as implemented by the Arnoldi algorithm or the Lanczos algorithm, form the basis of many methods for large-scale matrix computational problems (e.g., eigenvalue problems, linear systems, matrix functions, and model reduction). These algorithms are well studied in the literature.

When a computational problem involves two matrices, an interesting question is to what condensed forms the two matrices can be simultaneously reduced so as to simplify the problem for efficient computations. For large-scale problems, partial reductions can also serve as a means to approximate the underlying problems by lower order problems. Indeed, simultaneous reductions of matrices have been studied for generalized eigenvalue problems [15] and for eigenvalue problems and model reduction of (monic) quadratic matrix polynomials [11]. Specifically, the QZ algorithm for a pair of square matrices is preceded by a reduction to a Hessenberg–triangular pair through left and right multiplications by orthogonal matrices [10]. We have developed a reduction of a pair of matrices to banded forms through a simultaneous orthogonal similarity transformation [11]. A simultaneous reduction of a pair of matrices to upper Hessenberg and lower triangular forms through an orthogonal similarity transformation has been obtained in [12].

In this paper, we present algorithms that simultaneously reduce a pair of square matrices to upper Hessenberg and lower Hessenberg forms through a similarity transformation. The algorithms have a full reduction version suitable for dense matrices and a partial reduction version for large sparse matrices. In particular, by considering special matrix pairs, they recover the standard Arnoldi algorithm as well as the nonsymmetric Lanczos algorithm. Thus, they generalize these standard Krylov subspace methods; at the same time, they also reveal some common structures underlying these methods.

We shall apply our partial simultaneous reduction algorithm to construct a model reduction of a certain kind of second-order single-input single-output system that is defined by a pair of matrices. Namely, we construct a reduced system of lower dimension that approximates the original system, with the moments of the two transfer functions matching up to the dimension of the reduced system. A numerical example will be presented to demonstrate this useful approximation property.

The paper is organized as follows. We present the simultaneous similarity reduction algorithms in Sect. 2. We then present an application to model reduction in Sect. 3, demonstrated by a numerical example in Sect. 4. Section 5 presents our concluding remarks.

Notation Throughout this paper, \({\mathbb {K}}^{n\times m}\) is the set of all \(n\times m\) matrices with entries in \({\mathbb {K}}\), where \({\mathbb {K}}\) is \({\mathbb {C}}\) (the set of complex numbers) or \({\mathbb {R}}\) (the set of real numbers), \({\mathbb {K}}^n={\mathbb {K}}^{n\times 1}\), and \({\mathbb {K}}={\mathbb {K}}^1\). \(I_n\) (or simply \(I\) if its dimension is clear from the context) is the \(n\times n\) identity matrix, and \(e_j\) is its \(j\)th column. The superscript “\({\cdot }^*\)” takes conjugate transpose, while “\({\cdot }^T\)” takes transpose only.

We shall also adopt MATLAB-like convention to access the entries of vectors and matrices. The set of integers from \(i\) to \(j\) inclusive is \(i:j\). For a vector \(u\) and a matrix \(X\), \(u_{(j)}\) is \(u\)’s \(j\)th entry, \(X_{(i,j)}\) is \(X\)’s \((i,j)\)th entry, and \(\text{ diag }(u)\) is the diagonal matrix with \((\text{ diag }(u))_{(j,j)}=u_{(j)}\); \(X\)’s submatrices \(X_{(k:\ell ,i:j)}\), \(X_{(k:\ell ,:)}\), and \(X_{(:,i:j)}\) consist of intersections of row \(k\) to row \(\ell \) and column \(i\) to column \(j\), row \(k\) to row \(\ell \) and all columns, and all rows and column \(i\) to column \(j\), respectively. We adopt the convention that extracting a submatrix precedes taking conjugate transpose or transpose, e.g., \(X_{(k:\ell ,i:j)}^*=\left[ X_{(k:\ell ,i:j)}\right] ^*\). Finally, \(\Vert \cdot \Vert _2\) is the Euclidean norm of a vector or the spectral norm of a matrix.

2 Simultaneous Similarity Reduction

In this section, we present algorithms for a simultaneous similarity reduction of two square matrices \(A\) and \(B\) to upper and lower Hessenberg forms, respectively. Namely, given \(A\) and \(B\), we construct \(X\) such that

$$\begin{aligned} X^{-1}AX=H_a, \quad X^{-1}BX=H_b^*, \end{aligned}$$
(2.1)

where \(H_a\) and \(H_b\) are upper Hessenberg matrices. Here, a matrix \(H=[h_{ij}]\) is said to be in the upper Hessenberg form if \(h_{ij}=0\) for all \(i > j + 1\). Furthermore, the first column of \(X\) and the first row of \(X^{-1}\) can be specified. We first consider the reduction for dense problems in Subsect. 2.1 and then an iterative algorithm for partial reduction of large matrices in Subsect. 2.2. We shall also discuss the relation between the new algorithms and the traditional ones.
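
For example, for \(n=5\), the reduced pair in (2.1) has the zero patterns

$$\begin{aligned} H_a=\begin{pmatrix} \times &{}\quad \times &{}\quad \times &{}\quad \times &{}\quad \times \\ \times &{}\quad \times &{}\quad \times &{}\quad \times &{}\quad \times \\ 0 &{}\quad \times &{}\quad \times &{}\quad \times &{}\quad \times \\ 0 &{}\quad 0 &{}\quad \times &{}\quad \times &{}\quad \times \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad \times &{}\quad \times \end{pmatrix}, \quad H_b^*=\begin{pmatrix} \times &{}\quad \times &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ \times &{}\quad \times &{}\quad \times &{}\quad 0 &{}\quad 0 \\ \times &{}\quad \times &{}\quad \times &{}\quad \times &{}\quad 0 \\ \times &{}\quad \times &{}\quad \times &{}\quad \times &{}\quad \times \\ \times &{}\quad \times &{}\quad \times &{}\quad \times &{}\quad \times \end{pmatrix}, \end{aligned}$$

where \(\times \) denotes a generally nonzero entry.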

2.1 Full Reduction

In our presentation, we shall repeatedly use the fact that two vectors can be reduced to multiples of \(e_1 = (1, 0, \ldots , 0)^T\) essentially through two Gaussian transformations [10, p.96] (see [2, 13, 17] for example), which we state as the following lemma, where (2.3) appears to be new.

Lemma 2.1

Let \(x, y\in {\mathbb {C}}^n\). If \(x^*y\ne 0\), then there exist a nonsingular \(X\in {\mathbb {C}}^{n\times n}\) and scalars \(\gamma , \alpha \in {\mathbb {C}}\) such that

$$\begin{aligned} X^{-1}y=\alpha e_1, \quad X^*x=\gamma e_1, \end{aligned}$$
(2.2)

and \(x^*y=\bar{\gamma }\alpha \). Furthermore, we can choose \(X\) so that

$$\begin{aligned} \kappa (X):=\Vert X\Vert _2 \Vert X^{-1}\Vert _2\le \left( \frac{\Vert x\Vert _2\Vert y\Vert _2}{|x^*y|}\right) ^2+\sqrt{\left( \frac{\Vert x\Vert _2\Vert y\Vert _2}{|x^*y|}\right) ^2-1}. \end{aligned}$$
(2.3)

Proof

We outline a proof of (2.2), which is essentially the same as in the literature, in order to establish the bound (2.3) on the condition number of \(X\). First, let \(Q\in {\mathbb {C}}^{n\times n}\) be a unitary matrix, e.g., a Householder transformation [10], such that \(Q^*x=\gamma e_1\), where \(\gamma \in {\mathbb {C}}\) and \(|\gamma |=\Vert x\Vert _2\). Next, we write \(Q^*y=(\eta _1,\eta _2,\ldots ,\eta _n)^T\), and note

$$\begin{aligned} 0\ne x^*y=x^*QQ^*y=\bar{\gamma }\eta _1 \quad \Rightarrow \quad |\eta _1|=\frac{|x^*y|}{\Vert x\Vert _2}>0. \end{aligned}$$

Let \(z=\frac{1}{\eta _1}(\eta _2, \eta _3,\ldots , \eta _n)^T\), and

$$\begin{aligned} L=\begin{pmatrix} 1 &{}\quad 0 \\ z &{}\quad I_{n-1} \\ \end{pmatrix} \quad \Rightarrow \quad L^{-1}=\begin{pmatrix} 1 &{} \quad 0 \\ -z &{}\quad I_{n-1} \\ \end{pmatrix}. \end{aligned}$$

It can be verified that

$$\begin{aligned} L^{-1}(Q^*y)=\eta _1 e_1, \quad L^*(Q^*x)=Q^*x=\gamma e_1. \end{aligned}$$

Finally, take \(X=QL\) to give (2.2) with \(\alpha =\eta _1\).

For proving (2.3), we have \(\kappa (X)=\kappa (L)=\Vert L\Vert _2\Vert L^{-1}\Vert _2\). We now estimate \(\Vert L\Vert _2\) and \(\Vert L^{-1}\Vert _2\). Let \(Z\in {\mathbb {C}}^{(n-1)\times (n-1)}\) be a unitary matrix such that \(Z^*z=\xi e_1\), where \(\xi \in {\mathbb {C}}\) and

$$\begin{aligned} |\xi |=\Vert z\Vert _2=\sqrt{\frac{\Vert y\Vert _2^2-|\eta _1|^2}{|\eta _1|^2}} =\sqrt{\left( \frac{\Vert x\Vert _2\Vert y\Vert _2}{|x^*y|}\right) ^2-1}. \end{aligned}$$

Let \(Z_1=\text{ diag }(1,Z)\). It can be seen that

$$\begin{aligned} L_1:=Z_1^*LZ_1=\begin{pmatrix} 1 &{} \quad 0 &{} \quad 0 \\ \xi &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad I_{n-2} \end{pmatrix}, \quad L_1^*L_1=\begin{pmatrix} 1+|\xi |^2 &{} \quad \bar{\xi }&{} \quad 0 \\ \xi &{} \quad 1 &{} \quad 0 \\ 0 &{} \quad 0 &{} \quad I_{n-2} \end{pmatrix} \end{aligned}$$

whose largest eigenvalue is the larger root of \(t^2-(2+|\xi |^2)t+1=0\). Thus

$$\begin{aligned} \Vert L\Vert _2^2=\Vert L_1\Vert _2^2&=\frac{2+|\xi |^2+|\xi |\sqrt{|\xi |^2+4}}{2} \nonumber \\&\le 1+|\xi |+|\xi |^2 \nonumber \\&=\left( \frac{\Vert x\Vert _2\Vert y\Vert _2}{|x^*y|}\right) ^2+\sqrt{\left( \frac{\Vert x\Vert _2\Vert y\Vert _2}{|x^*y|}\right) ^2-1}. \end{aligned}$$
(2.4)

It is not hard to see that \(\Vert L^{-1}\Vert _2=\Vert L\Vert _2\). Therefore, \(\kappa (X)=\Vert L\Vert _2^2\), which combined with (2.4) gives (2.3). \(\square \)

Remark 2.1

Let the angle \(\theta \) between \(x\) and \(y\) be defined by

$$\begin{aligned} 0\le \theta =\arccos \frac{|x^*y|}{\Vert x\Vert _2\Vert y\Vert _2}\le \frac{\pi }{2}. \end{aligned}$$

Clearly, the transformation matrix \(X\) in the lemma becomes ill conditioned if \(\theta \) is near \(\pi /2\). In that case, the reduction is numerically unstable. This is a situation similar to the breakdown phenomenon in the nonsymmetric Lanczos algorithm [10]. In general, the smaller the angle is, the better conditioned \(X\) will be. In fact, if \(\theta =0\), then the upper bound (2.3) ensures \(\kappa (X)\le 1\), which, together with the fact that \(\kappa (X)\ge 1\) always holds, leads to \(\kappa (X)=1\), the perfect condition number. In terms of the angle \(\theta \), (2.3) becomes \(\kappa (X)\le \sec ^2\theta +\tan \theta \).
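
To make the construction in the proof concrete, the following NumPy sketch builds \(X=QL\) from a Householder reflection \(Q\) and a Gaussian transformation \(L\); the function name and interface are ours and purely illustrative.

```python
import numpy as np

def lemma21_transform(x, y):
    """Sketch of the construction in the proof of Lemma 2.1: return
    (X, alpha, gamma) with X = Q L nonsingular, X^{-1} y = alpha*e1 and
    X^* x = gamma*e1, where Q is a Householder reflection and L a
    Gaussian transformation."""
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    if np.vdot(x, y) == 0:
        raise ValueError("x^* y = 0: the required X does not exist")
    n = x.size
    # Householder reflection Q (Hermitian and unitary) with Q^* x = gamma*e1.
    sign = x[0] / abs(x[0]) if x[0] != 0 else 1.0
    gamma = -sign * np.linalg.norm(x)
    v = x.copy()
    v[0] -= gamma
    Q = np.eye(n, dtype=complex) - 2.0 * np.outer(v, v.conj()) / np.vdot(v, v)
    # Gaussian transformation L eliminating entries 2..n of Q^* y.
    w = Q @ y                    # equals Q^* y, since Q is Hermitian
    alpha = w[0]                 # alpha = eta_1 != 0 because x^* y != 0
    L = np.eye(n, dtype=complex)
    L[1:, 0] = w[1:] / alpha     # the vector z in the proof
    return Q @ L, alpha, gamma

# Quick check of (2.2) on a random pair of vectors:
rng = np.random.default_rng(0)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
y = rng.standard_normal(6) + 1j * rng.standard_normal(6)
X, alpha, gamma = lemma21_transform(x, y)
e1 = np.eye(6)[:, 0]
assert np.allclose(np.linalg.solve(X, y), alpha * e1)   # X^{-1} y = alpha e_1
assert np.allclose(X.conj().T @ x, gamma * e1)          # X^* x = gamma e_1
```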

Given \(x_1, y_1\in {\mathbb {C}}^n\) such that \(y_1^*x_1=1\), by Lemma 2.1, we can find \(X_0\in {\mathbb {C}}^{n\times n}\) with \(X_0e_1=x_1\) and \(X_0^{-*}e_1=y_1\). Let

$$\begin{aligned} A_1 = X_0^{-1}AX_0,\;\;\; B_1=X_0^{-1}BX_0, \end{aligned}$$

and partition them as

$$\begin{aligned} A_1=\left( \begin{array}{c|c} a_{11} &{} \mathtt{x} \\ \hline a_1 &{} \mathtt{x} \end{array}\right) , \quad B_1=\left( \begin{array}{c|c} b_{11} &{} b_1^* \\ \hline \mathtt{x} &{} \mathtt{x} \end{array}\right) , \end{aligned}$$

where \(a_1, b_1\in {\mathbb {C}}^{n-1}\) and \(\mathtt{x}\) denotes an unspecified block.
We say that a breakdown occurs if \(b_1^*a_1=0\); otherwise by Lemma 2.1, we find a \(\widehat{X}_{1}\in {\mathbb {C}}^{(n-1)\times (n-1)}\) such that \(\widehat{X}_{1}^*b_1=\gamma _1 e_1\) and \(\widehat{X}_{1}^{-1} a_1=\alpha _1 e_1\). Let \(X_{1}=\text{ diag }(1, \widehat{X}_{1})\). We have

$$\begin{aligned} X_{1}^{-1}A_1 X_{1}=\left( \begin{array}{c|c} a_{11} &{} \mathtt{x} \\ \hline \begin{array}{c} \alpha _1 \\ 0\end{array}&A_2 \end{array}\right) , \quad X_{1}^{-1}B_1 X_{1}=\left( \begin{array}{c|c} b_{11} &{} \bar{\gamma }_1 \quad 0 \\ \hline \mathtt{x} &{} B_2 \end{array}\right) . \end{aligned}$$

Now, applying the same reduction to \(A_2\) and \(B_2\) repeatedly, \(A\) is reduced to the upper Hessenberg form and \(B\) to the lower Hessenberg form, which are the desired reduced forms of \(A\) and \(B\) in (2.1). We summarize this process as Algorithm 2.1.

[Algorithm 2.1: full simultaneous reduction of \(A\) and \(B\) to upper and lower Hessenberg forms; pseudocode not reproduced here.]
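
As an illustration only (and not a reproduction of the published pseudocode), the following NumPy sketch of the reduction builds on the lemma21_transform helper given after Remark 2.1. The breakdown tolerance is an arbitrary choice of ours, and the optional prescription of the first column of \(X\) and the first row of \(X^{-1}\) is omitted, so the Lines cited in the surrounding text refer to the paper's pseudocode rather than to this sketch.

```python
import numpy as np

def simultaneous_full_reduction(A, B):
    """Sketch of the full reduction: compute a nonsingular X such that
    X^{-1} A X is upper Hessenberg and X^{-1} B X is lower Hessenberg,
    by applying lemma21_transform to the trailing blocks step by step."""
    n = A.shape[0]
    Ha = np.asarray(A, dtype=complex).copy()
    Hb = np.asarray(B, dtype=complex).copy()
    X = np.eye(n, dtype=complex)
    for k in range(n - 2):
        a = Ha[k + 1:, k].copy()      # a_1: column of A below the diagonal
        b = Hb[k, k + 1:].conj()      # b_1: conjugated row of B right of the diagonal
        if abs(np.vdot(b, a)) <= 1e-14 * np.linalg.norm(a) * np.linalg.norm(b):
            raise RuntimeError(f"breakdown at step {k}: b_1^* a_1 = 0")
        Xh, _, _ = lemma21_transform(b, a)  # Xh^{-1} a_1 = alpha e_1, Xh^* b_1 = gamma e_1
        # Apply the similarity X_k = diag(I_{k+1}, Xh) to A, B and accumulate it in X.
        Ha[:, k + 1:] = Ha[:, k + 1:] @ Xh
        Ha[k + 1:, :] = np.linalg.solve(Xh, Ha[k + 1:, :])
        Hb[:, k + 1:] = Hb[:, k + 1:] @ Xh
        Hb[k + 1:, :] = np.linalg.solve(Xh, Hb[k + 1:, :])
        X[:, k + 1:] = X[:, k + 1:] @ Xh
    return X, Ha, Hb

# Sanity check: X^{-1} A X is upper Hessenberg and X^{-1} B X is lower Hessenberg.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8)); B = rng.standard_normal((8, 8))
X, Ha, Hb = simultaneous_full_reduction(A, B)
assert np.allclose(np.linalg.solve(X, A @ X), Ha) and np.allclose(np.tril(Ha, -2), 0)
assert np.allclose(np.linalg.solve(X, B @ X), Hb) and np.allclose(np.triu(Hb, 2), 0)
```

In an actual implementation, one would of course exploit the structure of \(Q\) and \(L\) instead of forming the transformation matrices explicitly.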

If there is no need to specify the first column of \(X\) and the first row of \(X^{-1}\), then Lines 1 and 2 can be deleted with the input of \(x_1\) and \(y_1\) omitted. This is typically the case for a full reduction algorithm as above.

On completion, the algorithm outputs \(A\) in the upper Hessenberg form and \(B\) in the lower Hessenberg form, resulting in (2.1). We state this as the following theorem.

Theorem 2.1

Given \(x_1, y_1\in {\mathbb {C}}^n\) such that \(y_1^*x_1=1\), if the BREAK at Line 5 does not occur, then Algorithm 2.1 produces \(X\in {\mathbb {C}}^{n\times n}\) with \(Xe_1=x_1\) and \(X^{-*}e_1=y_1\) such that

$$\begin{aligned} X^{-1}AX=H_a, \quad X^{-1}BX=H_b^*, \end{aligned}$$

where \(H_a\) and \(H_b\) are upper Hessenberg matrices.

Remark 2.2

When the BREAK at Line 5 occurs, the algorithm breaks down. This is the same phenomenon as the well-known breakdown in the nonsymmetric Lanczos algorithm and typically arises in any reduction by similarity transformations. Since we have yet to find a serious application of Algorithm 2.1, we shall not discuss it further but refer to the existing literature [6, 9].

2.2 Partial Reduction

We now present an iterative process to partially reduce \(A\) and \(B\) to Hessenberg forms as in Theorem 2.1. The process bears the characteristics of both the Lanczos algorithm with the two-sided process for building biorthogonal bases and the Arnoldi algorithm with reduction to Hessenberg form. It will, therefore, be called a Lanczos–Arnoldi biorthogonal algorithm. Indeed, it recovers the Lanczos algorithm as well as the Arnoldi algorithm as special cases.

First, let \(Y=X^{-*}\). We can rewrite (2.1) as

$$\begin{aligned} AX&= X H_a, \quad H_a = (h_{a;ij}), \end{aligned}$$
(2.5a)
$$\begin{aligned} B^*Y&= Y H_b, \quad H_b = (h_{b;ij}). \end{aligned}$$
(2.5b)

Notice that \(x_1\) and \(y_1\) can be arbitrarily chosen so long as \(y_1^*x_1=1\). We can now easily deduce the following two recurrences from (2.5) to compute the columns of \(X=(x_1,x_2,\ldots ,x_n), Y=(y_1,y_2,\ldots ,y_n)\):

$$\begin{aligned} h_{a; j+1\,j} x_{j+1}&= \hat{x} \mathop {=}\limits ^{{\text{ def }}}A x_j-\sum _{i=1}^{j} x_ih_{a;ij}, \end{aligned}$$
(2.6a)
$$\begin{aligned} h_{b; j+1\,j} y_{j+1}&= \hat{y} \mathop {=}\limits ^{{\text{ def }}}B^* y_j-\sum _{i=1}^{j} y_ih_{b;ij}. \end{aligned}$$
(2.6b)

\(Y^* X=I\) implies that the sequences of \(x_i\) and \(y_i\) are biorthogonal. Therefore,

$$\begin{aligned} h_{a;ij}=y_i^*A x_j, \quad h_{b;ij}=x_i^*B^* y_j\quad \text{ for } i\le j, \end{aligned}$$

and

$$\begin{aligned} h_{a; j+1\,j} \bar{h}_{b; j+1\,j} = \hat{y}^*\hat{x}. \end{aligned}$$

We summarize the above process in Algorithm 2.2.

[Algorithm 2.2: the Lanczos–Arnoldi biorthogonal algorithm for partial simultaneous reduction; pseudocode not reproduced here.]
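
The published pseudocode is likewise not reproduced above. The sketch below implements the recurrences (2.6) with the admissible scaling \(h_{a; j+1\,j}=\Vert \hat{x}\Vert _2\), determining \(h_{b; j+1\,j}\) from the relation \(h_{a; j+1\,j} \bar{h}_{b; j+1\,j} = \hat{y}^*\hat{x}\) above; its breakdown tests are simplified stand-ins for the checks at Lines 9 and 13 referenced below, and the function name, tolerance, and normalization of the starting vectors are our choices.

```python
import numpy as np

def lanczos_arnoldi_biorth(A, B, x1, y1, k, tol=1e-14):
    """Sketch of the partial reduction: build X_{k+1}, Y_{k+1} with
    biorthogonal columns and the (k+1)-by-k Hessenberg matrices H_a, H_b
    of (2.6), starting from x1, y1 (rescaled so that y_1^* x_1 = 1)."""
    n = A.shape[0]
    X = np.zeros((n, k + 1), dtype=complex)
    Y = np.zeros((n, k + 1), dtype=complex)
    Ha = np.zeros((k + 1, k), dtype=complex)
    Hb = np.zeros((k + 1, k), dtype=complex)
    X[:, 0] = np.asarray(x1, dtype=complex) / np.linalg.norm(x1)
    Y[:, 0] = np.asarray(y1, dtype=complex) / np.conj(np.vdot(y1, X[:, 0]))
    for j in range(k):
        xh = A @ X[:, j]                        # as in (2.6a)
        yh = B.conj().T @ Y[:, j]               # as in (2.6b)
        for i in range(j + 1):                  # biorthogonalize against previous vectors
            Ha[i, j] = np.vdot(Y[:, i], xh)     # h_{a;ij} = y_i^* A x_j, cf. (2.8)
            xh = xh - Ha[i, j] * X[:, i]
            Hb[i, j] = np.vdot(X[:, i], yh)     # h_{b;ij} = x_i^* B^* y_j
            yh = yh - Hb[i, j] * Y[:, i]
        if np.linalg.norm(xh) < tol or np.linalg.norm(yh) < tol:
            raise RuntimeError(f"benign breakdown at step {j + 1}")
        s = np.vdot(yh, xh)                     # = h_{a;j+1,j} * conj(h_{b;j+1,j})
        if abs(s) < tol * np.linalg.norm(xh) * np.linalg.norm(yh):
            raise RuntimeError(f"hard breakdown at step {j + 1}")
        Ha[j + 1, j] = np.linalg.norm(xh)
        Hb[j + 1, j] = np.conj(s / Ha[j + 1, j])
        X[:, j + 1] = xh / Ha[j + 1, j]
        Y[:, j + 1] = yh / Hb[j + 1, j]
    return X, Y, Ha, Hb
```

Up to roundoff, the returned quantities satisfy the relations (2.7) and \(Y_k^*X_k=I_k\) of Theorem 2.2 below.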

If the case in either Line 9 or Line 13 occurs, the algorithm breaks down. This corresponds to the same breakdown phenomenon as discussed earlier for Algorithm 2.1. Such a phenomenon has been studied in the context of the nonsymmetric Lanczos algorithm; see [1, 8, 16, 18] for details. Specifically, the BREAK in Line 9 is a benign breakdown, upon which an invariant subspace of \(A\) has been found. The algorithm can be continued by selecting \(x_{j+1}\) to be any unit vector that is orthogonal to \(y_1, \ldots , y_j\). The BREAK in Line 13 is a hard breakdown. Several techniques to remedy this situation have been proposed in [1, 8, 16, 18], and they can be adapted to the algorithm here as well. The following theorem summarizes the results of the algorithm.

Theorem 2.2

Suppose Algorithm 2.2 runs to its completion with no breakdown to produce \(\{x_1, x_2, \ldots , x_{k+1}\} \), \(\{y_1, y_2, \ldots , y_{k+1}\} \), and \(h_{a;ij}\) and \(h_{b;ij}\) with \(1 \le i \le j+1 \le k+1\). Let \(X_k =(x_1, x_2, \ldots , x_{k})\) and \(Y_k = ( y_1, y_2, \ldots , y_{k} )\), \(H_{a,k} = (h_{a;ij})_{i, j=1}^k\) and \(H_{b,k} = (h_{b;ij})_{i, j=1}^k\), where \(h_{a;ij}=h_{b;ij}=0\) for \(i>j+1\). Then, we have

$$\begin{aligned} A X_k&= X_k H_{a,k} + h_{a; k+1\,k} x_{k+1} e_k^*,\end{aligned}$$
(2.7a)
$$\begin{aligned} B^* Y_k&= Y_k H_{b,k} + h_{b; k+1\,k} y_{k+1} e_k^*, \end{aligned}$$
(2.7b)

and

$$\begin{aligned} Y_k^* X_k = I_k. \end{aligned}$$

In particular, if the algorithm is run to step \(k=n\), then \(h_{a; n+1\,n}=h_{b; n+1\,n}=0\), \(Y_n^* X_n=I_n\), and

$$\begin{aligned} A X_n = X_n H_{a,n},\;\; B^* Y_n = Y_n H_{b,n}. \end{aligned}$$

Proof

At step \(j\) of the algorithm, \(\{x_1, x_2, \ldots , x_{j+1}\} \) and \(\{y_1, y_2, \ldots , y_{j+1}\} \) are constructed such that

$$\begin{aligned} h_{a; j+1\,j} x_{j+1}&= A x_j-\sum _{i=1}^{j} x_ih_{a;ij}, \\ h_{b; j+1\,j} y_{j+1}&= B^* y_j-\sum _{i=1}^{j} y_ih_{b;ij}. \end{aligned}$$

These immediately lead to (2.7a) and (2.7b). We next prove by induction on \(k\) that \(\{x_1, x_2, \ldots , x_{k+1}\} \) and \(\{y_1, y_2, \ldots , y_{k+1}\} \) are biorthogonal, i.e., \(y_i^* x_j =\delta _{ij}\) for \(1\le i, j \le k+1\). The case \(k=0\) is trivial. Assume that \(\{x_1, x_2, \ldots , x_{k}\} \) and \(\{y_1, y_2, \ldots , y_{k}\} \) are biorthogonal. Then, for \(1 \le \ell \le k\), \(h_{a;\ell k}\) defined at Line 4 of the algorithm can be written as

$$\begin{aligned} h_{a;\ell k} = y_\ell ^* \left( A x_k-\sum _{i=1}^{\ell -1} x_ih_{a;ik}\right) = y_\ell ^*A x_k. \end{aligned}$$
(2.8)

Hence,

$$\begin{aligned} y_\ell ^* x_{k+1}&= \left( y_\ell ^*A x_k-\sum _{i=1}^{k} y_\ell ^* x_ih_{a;ik} \right) /h_{a; k+1\,k}\\&= \left( y_\ell ^*A x_k- y_\ell ^* x_\ell h_{a;\ell k} \right) /h_{a; k+1\,k}\\&= 0. \end{aligned}$$

Similarly, \(y_{k+1}^* x_\ell =0\). Therefore, we have proved that \(\{x_1, x_2, \ldots , x_{k+1}\} \) and \(\{y_1, y_2, \ldots , y_{k+1}\} \) are biorthogonal. It follows that \(Y_k^* X_k= I_k\).

If the algorithm is run to step \(k=n\), then we have first constructed \(\{x_1, x_2, \ldots , x_{n}\} \) and \(\{y_1, y_2, \ldots , y_{n}\} \) such that they are biorthogonal, i.e., \(Y_n^* X_n=I_n\). Then, in constructing \(x_{n+1}\), we have, as above, that \(A x_n-\sum _{i=1}^{n} x_ih_{a;in}\) is orthogonal to \(\{y_1, y_2, \ldots , y_{n}\}\), which spans \({\mathbb {C}}^n\) since \(Y_n\) is nonsingular; hence \(h_{a; n+1\,n}=\Vert A x_n-\sum _{i=1}^{n} x_ih_{a;in}\Vert _2=0\). Similarly, \(h_{b; n+1\,n}=0\). Thus, \(A X_n = X_n H_{a,n}\) and \( B^* Y_n = Y_n H_{b,n}\). The proof is complete. \(\square \)

There are two special cases of Algorithm 2.2 that give rise to the standard Lanczos algorithm and the Arnoldi algorithm.

Remark 2.3

If \(B=A\), then it follows from (2.8) that for \(i\le j-2\),

$$\begin{aligned} h_{a;ij}&= y_i^*A x_j = (B^* y_i)^* x_j\\&= \left( \sum _{\ell =1}^{i+1} y_\ell h_{b;\ell i}\right) ^* x_j \\&= 0. \end{aligned}$$

Then, \(H_{a,n}\) is tridiagonal, and the recurrence (2.6a) reduces to the three-term recurrence

$$\begin{aligned} h_{a; j+1\,j} x_{j+1}= A x_j-x_{j-1}h_{a;j-1,j}- x_jh_{a;jj}. \end{aligned}$$

Similarly, \(H_{b,n}\) is tridiagonal, and the recurrence (2.6b) reduces to a three-term recurrence. Furthermore, \(h_{a;ij}=y_i^*A x_j =\bar{h}_{b;ji}\) for \(j-1 \le i\le j+1\), i.e., \(H_{a,n}=H_{b,n}^*\). Thus, this is the standard nonsymmetric Lanczos algorithm.

Remark 2.4

If \(B=A^*\) and \(y_1 = x_1\), then the construction of the sequence \(y_i\) is identical to that of \(x_i\), which implies that \(x_i=y_i\). In that case, the biorthogonality becomes orthogonality, i.e., \(x_1, x_2, \ldots , x_k\) are orthogonal, and the algorithm reduces to the standard Arnoldi algorithm. However, if \(y_1 \ne x_1\), then the algorithm differs from the Arnoldi algorithm; indeed, it constructs two biorthogonal bases of two Krylov subspaces generated by \(A\) but with two different starting vectors.
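
These two special cases can be checked numerically with the lanczos_arnoldi_biorth sketch of Subsect. 2.2. The snippet below, with arbitrary random data, verifies that \(B=A\) yields a tridiagonal \(H_{a,k}\) with \(H_{a,k}=H_{b,k}^*\) (the nonsymmetric Lanczos case) and that \(B=A^*\) with \(y_1=x_1\) yields orthonormal vectors \(x_1,\ldots ,x_k\) (the Arnoldi case); all printed norms should be at the level of roundoff.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 10
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x1 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y1 = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Remark 2.3: B = A recovers the nonsymmetric Lanczos algorithm.
X, Y, Ha, Hb = lanczos_arnoldi_biorth(A, A, x1, y1, k)
print(np.linalg.norm(np.triu(Ha[:k, :], 2)),            # H_{a,k} is tridiagonal
      np.linalg.norm(Ha[:k, :] - Hb[:k, :].conj().T))    # H_{a,k} = H_{b,k}^*

# Remark 2.4: B = A^* with y_1 = x_1 recovers the Arnoldi algorithm.
X, Y, Ha, Hb = lanczos_arnoldi_biorth(A, A.conj().T, x1, x1, k)
print(np.linalg.norm(X[:, :k].conj().T @ X[:, :k] - np.eye(k)))   # orthonormal x_i
```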

3 Model Reduction of a Quadratic System

Linear dynamical systems form an important class of models for a variety of engineering problems. Consider the following state-space formulation of a linear single-input single-output system:

$$\begin{aligned} A \frac{\mathrm{d}x (t)}{\mathrm{d}t} + x (t)&= b\, u(t), \end{aligned}$$
(3.1a)
$$\begin{aligned} y(t)&= c^* x(t) , \end{aligned}$$
(3.1b)

where \(x(t)\in {\mathbb {R}}^n\) is the state vector, \(u(t) \in {\mathbb {R}}\) is the input, and \(y(t) \in {\mathbb {R}}\) is the output of interest. For the sake of simplicity, we have taken the coefficient matrix of the \(x(t)\) term to be \(I\) and assumed the initial condition \(x(0) = 0\). Applying the Laplace transform to the equations in (3.1), we obtain

$$\begin{aligned} s A X(s) + X (s)&= b\, U(s), \end{aligned}$$
(3.2a)
$$\begin{aligned} Y(s)&= c^* X(s) , \end{aligned}$$
(3.2b)

where \(X(s), Y(s)\), and \(U(s)\) are the Laplace transforms of \(x(t), y(t)\), and \(u(t)\), respectively. Then, the input and output of the system are determined by the following rational function:

$$\begin{aligned} H(s) = {Y(s) \over U(s)} = c^* (I + sA )^{-1} b , \end{aligned}$$

called the transfer function of the linear system. \(H(s)\) describes the input–output relation in the frequency variable \(s\). In applications of linear systems, it is important to compute the transfer function over a wide range of the frequency parameter \(s\). Direct computation of \(H(s)\) for a large number of values of \(s\) is inefficient, as the matrices involved are typically large.

In model reduction, it is desirable to approximate the given system by another system of lower dimension, called a reduced system, which is more efficiently handled in ensuing computations. The lower order transfer function should approximate the original one over as wide a range of the frequency \(s\) as possible. One way to derive an approximate lower order system is to require that the transfer function \(g(s)\) of the reduced system and the original transfer function \(f(s)\) have the same moments up to a certain degree (e.g., the terms associated with \(s^0, s^1, s^2, \ldots \) of their Taylor expansions at \(s=0\)). In the case of first-order systems, it has been shown that such a lower dimensional system can be efficiently and stably constructed using the Lanczos algorithm or the Arnoldi algorithm (see [4, 7, 14]).

A second-order linear single-input single-output system takes a form similar to (3.1), but its state variable is governed by a second-order dynamical equation as follows:

$$\begin{aligned} B \frac{\mathrm{d}^2 x}{\mathrm{d}t^2} + A \frac{\mathrm{d}x}{\mathrm{d}t} + x (t)&= b\, u(t), \end{aligned}$$
(3.3a)
$$\begin{aligned} y(t)&= c^* x(t). \end{aligned}$$
(3.3b)

In a similar way, by applying the Laplace transform to the equations above, one finds that the input–output relation is determined by the transfer function, which takes the form

$$\begin{aligned} f(s) = {Y(s) \over U(s)} = c^* ( I + s A +s^2 B)^{-1} b . \end{aligned}$$

The model reduction of the second-order system (3.3) has been studied in [3, 11, 14]. Since the transfer function involves two matrices now, a simultaneous orthogonal reduction of \(A\) and \(B\) to an upper Hessenberg matrix and a banded matrix, respectively, was developed in [11] for deriving a low-dimensional approximation of \(f(s)\). However, the degree of moment matching is typically much smaller than the dimension of the reduced system. Here, we consider a type of second-order system for which the Lanczos–Arnoldi-type algorithm of the previous section can be used to derive an efficient model reduction of \(f(s)\).

We consider a system that is a composition of two linear systems with the state variables \(x_1(t)\in {\mathbb {R}}^n\) and \(x_2 (t) \in {\mathbb {R}}^n\) governed by

$$\begin{aligned} A \frac{\mathrm{d}x_1 (t) }{\mathrm{d}t} + x_1 (t)&= b u(t), \end{aligned}$$
(3.4a)
$$\begin{aligned} B \frac{\mathrm{d}x_2 (t)}{\mathrm{d}t} + x_2 (t)&= x_1(t), \end{aligned}$$
(3.4b)
$$\begin{aligned} y(t)&= c^* x_2(t). \end{aligned}$$
(3.4c)

By eliminating \(x_1(t)\), the system is reduced to the second-order system

$$\begin{aligned} AB \frac{\mathrm{d}^2 x_2 (t)}{\mathrm{d}t^2} +(A+B)\frac{\mathrm{d} x_2 (t)}{\mathrm{d}t} + x_2 (t) = b\, u(t). \end{aligned}$$

On the other hand, applying the Laplace transform directly to equations in (3.4), we obtain

$$\begin{aligned} s A X_1(s) + X_1(s)&= b\, U(s), \\ s B X_2(s) + X_2(s)&= X_1(s), \\ Y(s)&= c^* X_2(s) , \end{aligned}$$

where \(X_1(s), X_2(s), Y(s)\), and \(U(s)\) are the Laplace transforms of \(x_1(t), x_2(t), y(t)\), and \(u(t)\), respectively. Then, the transfer function that determines the input and output relation of the system is

$$\begin{aligned} f(s) = c^*(I+sB)^{-1} (I+sA)^{-1}b. \end{aligned}$$
(3.5)

We now consider constructing a lower order transfer function for approximations of \(f(s)\). Applying the Lanczos–Arnoldi-type algorithm (Algorithm 2.2) to \(A, B\) with

$$\begin{aligned} x_1 = b / \Vert b\Vert _2,\quad y_1 = c / (x_1^* c), \end{aligned}$$
(3.6)

we obtain

$$\begin{aligned} A X_k&= X_k H_{a,k} + h_{a; k+1\,k} x_{k+1} e_k^*, \\ B^* Y_k&= Y_k H_{b,k} + h_{b; k+1\,k} y_{k+1} e_k^*, \end{aligned}$$

and in particular \(A X_n = X_n H_{a,n}\) and \(B^* Y_n = Y_n H_{b,n} \) if no breakdown occurs. Then

$$\begin{aligned} f(s)&= c^*(I+sB)^{-1} (I+sA)^{-1}b \\&= (c^* b) e_1^* (I + s H_{b, n}^*)^{-1} (I+ s H_{a,n} )^{-1} e_1. \end{aligned}$$

Approximating the \(n\times n\) Hessenberg matrices \(H_{a,n}, H_{b,n}\) above by their \(k\times k\) leading principal submatrices \(H_{a,k}, H_{b,k}\), we use the following function

$$\begin{aligned} f_k (s) := (c^* b) e_1^* (I +s H_{b, k}^* )^{-1} (I+s H_{a, k} )^{-1} e_1 \end{aligned}$$
(3.7)

to approximate \(f(s)\). This approximation has the desirable moment matching property as stated in the following theorem.

Theorem 3.1

Suppose Algorithm 2.2 with \(x_1 = b / \Vert b\Vert _2\) and \( y_1 = c / (x_1^* c)\) runs to its completion without breakdown to produce \(H_{a,k} = (h_{a;ij})_{i, j=1}^k\) and \(H_{b,k} = (h_{b;ij})_{i, j=1}^k\). Let \(f(s)\) be defined by (3.5) and let \(f_k(s)\) be defined by (3.7). Then,

$$\begin{aligned} f(s) - f_k (s) = \mathcal{O}(s^{ k }). \end{aligned}$$
(3.8)

Proof

We first prove by induction that, for \(\ell \le k-1\),

$$\begin{aligned} A^\ell x_1 = X_k H_{a,k}^\ell e_1. \end{aligned}$$
(3.9)

By Theorem 2.2, we have

$$\begin{aligned} A X_k = X_k H_{a,k} + h_{a; k+1\,k} x_{k+1} e_k^*. \end{aligned}$$

Then, \(A x_1 =A X_k e_1 = X_k H_{a,k} e_1\). Since \(H_{a,k}\) is a Hessenberg matrix, it is easy to see that, for \(m \le k-2\), \(H_{a,k}^{m}\) is a banded matrix with lower bandwidth \(m\) (i.e., its \((i,j)\) entry is \(0\) if \(i-j > m\)). In particular, \(e_k^*H_{a,k}^{\ell -1} e_1=0\) for \(\ell -1\le k-2\). Now, assuming (3.9) is true for some \(\ell -1 \le k-2\), we have

$$\begin{aligned} A^\ell x_1&= A A^{\ell -1} x_1 = A X_k H_{a,k}^{\ell -1} e_1 \\&=X_k H_{a,k} H_{a,k}^{\ell -1} e_1+ h_{a; k+1\,k} x_{k+1} e_k^*H_{a,k}^{\ell -1} e_1 \\&=X_k H_{a,k}^{\ell } e_1. \end{aligned}$$

This proves (3.9). Similarly, we can prove that for \(\ell \le k-1\)

$$\begin{aligned} (B^*)^\ell y_1 = Y_k H_{b,k}^\ell e_1. \end{aligned}$$
(3.10)

Therefore, for sufficiently small \(|s|\), we have

$$\begin{aligned} (I+As)^{-1}b&=\sum _{j=0}^\infty (-1)^j s^j A^j b \\&=\sum _{j=0}^\infty (-1)^j s^j A^j x_1\Vert b\Vert _2\qquad \qquad \qquad (\text{ by } \text{(3.6) })\\&=\Vert b\Vert _2\Big [\sum _{j=0}^{k-1} (-1)^j s^j A^j x_1+\mathcal{O}(s^k)\Big ] \\&=\Vert b\Vert _2\Big [\sum _{j=0}^{k-1} (-1)^j s^j X_k H_{a,k}^j e_1+\mathcal{O}(s^k)\Big ] \qquad (\text{ by } \text{(3.9) })\\&=\Vert b\Vert _2X_k\Big [\sum _{j=0}^\infty (-1)^j s^j H_{a,k}^j e_1+\mathcal{O}(s^k)\Big ] \\&=\Vert b\Vert _2X_k\Big [(I+s H_{a,k})^{-1}e_1+\mathcal{O}(s^k)\Big ], \end{aligned}$$

and similarly,

$$\begin{aligned} c^*(I+sB)^{-1}=\frac{c^*b}{\Vert b\Vert _2}\cdot \Big [e_1^*(I+s H_{b,k}^*)^{-1}+\mathcal{O}(s^k)\Big ]Y_k^*. \end{aligned}$$

Therefore,

$$\begin{aligned} f(s)&= (c^*b)\cdot \Big [e_1^*\big (I+s H_{b,k}^*\big )^{-1}+\mathcal{O}\big (s^k\big )\Big ]Y_k^*X_k\Big [\big (I+s H_{a,k}\big )^{-1}e_1+\mathcal{O}\big (s^k\big )\Big ]\\&=\big (c^*b\big )\cdot e_1^*\big (I+s H_{b,k}^*\big )^{-1}\big (I+s H_{a,k}\big )^{-1}e_1+\mathcal{O}\big (s^k\big ) \\&=f_k(s)+\mathcal{O}\big (s^{ k }\big ), \end{aligned}$$

as was to be shown.\(\square \)
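
To complement the example in the next section, the following sketch evaluates \(f(s)\) of (3.5) and \(f_k(s)\) of (3.7) directly, using the lanczos_arnoldi_biorth function of Subsect. 2.2 on arbitrary random data; the helper names and the scaling of \(A\) and \(B\) are our choices, and passing \(b\) and \(c\) as the starting vectors reproduces the choice (3.6) after the internal normalization.

```python
import numpy as np

def f_full(A, B, b, c, s):
    """f(s) = c^* (I + sB)^{-1} (I + sA)^{-1} b, cf. (3.5)."""
    n = A.shape[0]
    return np.vdot(c, np.linalg.solve(np.eye(n) + s * B,
                                      np.linalg.solve(np.eye(n) + s * A, b)))

def f_reduced(Hak, Hbk, b, c, s):
    """f_k(s) = (c^* b) e_1^* (I + s H_{b,k}^*)^{-1} (I + s H_{a,k})^{-1} e_1, cf. (3.7)."""
    k = Hak.shape[0]
    e1 = np.zeros(k); e1[0] = 1.0
    v = np.linalg.solve(np.eye(k) + s * Hak, e1)
    return np.vdot(c, b) * np.linalg.solve(np.eye(k) + s * Hbk.conj().T, v)[0]

rng = np.random.default_rng(3)
n, k = 200, 4
A = rng.standard_normal((n, n)) / n     # scaled so that I + sA stays well conditioned
B = rng.standard_normal((n, n)) / n
b = rng.standard_normal(n); c = rng.standard_normal(n)

X, Y, Ha, Hb = lanczos_arnoldi_biorth(A, B, b, c, k)   # starting vectors as in (3.6)
for s in (0.3, 0.1, 0.03):
    print(s, abs(f_full(A, B, b, c, s) - f_reduced(Ha[:k, :], Hb[:k, :], b, c, s)))
    # the error decreases roughly like s^k as s -> 0, in line with (3.8)
```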

4 A Numerical Example

We present a numerical example to demonstrate the model reduction technique of the previous section for \(f(s)\) given by (3.5). It is artificially constructed for illustration only. Both \(A\) and \(B\) are taken from the University of Florida sparse matrix collection [5]. The matrix \(A\) is piston with \(n=2025\), and \(B\) is the \(n\times n\) leading principal submatrix of M40PI_n1 (whose original size is \(2028\times 2028\)). Random \(b\) and \(c\) are used. We ran Algorithm 2.2 with \(x_1\) and \(y_1\) given by (3.6) and with \(k=50\). For \(s=2\pi \omega \iota \) (\(\iota =\sqrt{-1}\)) with frequency \(\omega \in [0,10^{-2}]\), Fig. 1 plots \(|f(s)|\), \(|f_k(s)|\), and the relative error \(|f(s)-f_k(s)|/|f(s)|\). It can be seen that the curves for \(f(s)\) and \(f_k(s)\) are indistinguishable until \(\omega \) reaches about \(4\cdot 10^{-3}\), after which they begin to split noticeably. This behavior confirms what Theorem 3.1 suggests: \(f_k(s)\) approximates \(f(s)\) well when \(|s|\) is sufficiently small.

[Fig. 1: Left, \(|f(s)|\) and \(|f_k(s)|\); right, the relative error \(|f(s)-f_k(s)|/|f(s)|\). Plots not reproduced here.]

5 Concluding Remarks

We have presented a simultaneous reduction technique that transforms two matrices into the upper Hessenberg form and the lower Hessenberg form, respectively, by a similarity transformation. The corresponding iterative algorithm for partial reductions is found to generalize the standard Arnoldi algorithm and the nonsymmetric Lanczos algorithm and to provide an interesting connection between them. As an application, we have considered a certain second-order linear single-input single-output system and shown that the partial reduction algorithm leads to a low-dimensional approximation with a desirable moment matching property.

For future work, it would be interesting to consider other possible applications of this simultaneous reduction algorithm and to identify practical problems that give rise to the kind of second-order linear single-input single-output systems discussed in Sect. 3.