Orthogonal Iterations on Companion-Like Pencils

We present a class of fast subspace algorithms based on orthogonal iterations for structured matrices/pencils that can be expressed as small rank perturbations of unitary matrices. At the core of these algorithms is the representation of the matrix by means of a new data-sparse factorization, named the LFR factorization, built from orthogonal Hessenberg matrices. The factorization can be computed at the cost of O(nk^2) arithmetic operations, where n and k are the sizes of the matrix and of the small rank perturbation, respectively. At the same cost, from the LFR format we can easily obtain suitable QR and RQ factorizations in which the orthogonal factor Q is a product of orthogonal Hessenberg matrices and the upper triangular factor R is again given in the LFR format. The orthogonal iteration reduces to a hopping game in which Givens plane rotations are moved from one side of these two factors to the other. The resulting new algorithms approximate an invariant subspace of size s associated with a set of s leading or trailing eigenvalues using only O(nks) operations per iteration. The number of iterations required to reach an invariant subspace depends linearly on the ratio |λ_{s+1}|/|λ_s|. Numerical experiments confirm the effectiveness of our adaptations.


Introduction
Subspace methods are an important tool in modern adaptive systems [33]. The goal is the iterative estimation of the s largest or smallest eigenvalues and the associated eigenvectors of a possibly time-varying matrix/pencil. In this paper, we are concerned with the design of fast subspace algorithms for a class of structured matrices (pencils) representable as small rank perturbations of unitary matrices. The paramount example is (block) companion matrices/pencils arising from the linearization of polynomial eigenvalue problems. The companion form linearization of a matrix polynomial P(λ) = Σ_{i=0}^{d} P_i λ^i, P_i ∈ C^{k×k}, is given by

A = [ 0 0 · · · −P_0 ; I_k 0 · · · −P_1 ; . . . ; 0 · · · I_k −P_{d−1} ],  B = diag(I_k, . . . , I_k, P_d),    (1.1)

with A, B ∈ C^{n×n}, n = dk. It is well known that the eigenvalues of A − λB coincide with the eigenvalues of P(λ). Also, observe that A, B ∈ U_k, the set of unitary-plus-rank-k matrices.
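For concreteness, the standard first companion linearization can be sketched numerically. The function name and the NumPy-based layout below are our own illustration, not code from the paper; the sanity check uses the scalar case k = 1, where the eigenvalues of the pencil must coincide with the polynomial roots.

```python
import numpy as np

def companion_pencil(P):
    """Build (A, B) for P(lam) = sum_{i=0}^{d} P[i] lam^i, P[i] in C^{k x k},
    so that eig(A, B) equals the eigenvalues of P (a sketch, name ours)."""
    d = len(P) - 1
    k = P[0].shape[0]
    n = d * k
    A = np.zeros((n, n), dtype=complex)
    B = np.eye(n, dtype=complex)
    for i in range(d - 1):                 # subdiagonal identity blocks
        A[(i + 1) * k:(i + 2) * k, i * k:(i + 1) * k] = np.eye(k)
    for i in range(d):                     # last block column: -P_0, ..., -P_{d-1}
        A[i * k:(i + 1) * k, n - k:] = -P[i]
    B[n - k:, n - k:] = P[d]               # trailing diagonal block of B is P_d
    return A, B

# scalar sanity check: p(lam) = 6 - 5 lam + lam^2 has roots 2 and 3
A, B = companion_pencil([np.array([[6.0]]), np.array([[-5.0]]), np.array([[1.0]])])
ev = np.sort(np.linalg.eigvals(np.linalg.solve(B, A)).real)
print(ev)   # approximately [2. 3.]
```

Here B is nonsingular whenever P_d is, so the generalized eigenvalues are computed through B^{-1}A; the paper's fast algorithms avoid forming this product explicitly.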
The computation of selected eigenspaces of large companion pencils has several applications.
1. Subspace tracking is often required in system and signal processing analysis because the characteristics of the possibly time-varying system (signal) can be retrieved from the eigenvalues of some associated parameter-dependent matrix polynomial [22,23,30,34]. Subspace methods are ideally suited for these computations since they can use the approximate eigenspace computed in the previous step as starting guess for the new iteration.

2. Adaptations of the block power method, as well as its orthogonal generalizations, are used to compute the solvents and spectral factors of a matrix polynomial (see [37] for a review and applications of the matrix solvent theory). The unilateral matrix equation P_0 + P_1 X + · · · + P_d X^d = 0 is associated with the matrix polynomial P(λ). The matrix X is called a solvent, and its computation reduces to approximating an invariant subspace of (A, B).

3. Nonlinear eigenvalue problems of the form T(z)v = 0, v ≠ 0, where T : Ω → C^{k×k} is a holomorphic matrix-valued function and Ω ⊆ C is a connected and open set, can be addressed using interpolation techniques [20,24]. The approach consists first of interpolating T(z) with a matrix polynomial P_d(z) at certain nodes inside a target subset of Ω, and then of computing the eigenvalues of P_d(z) to provide numerical approximations of the eigenvalues of T(z) there. Since in general we are interested only in the few eigenvalues of T(z) lying inside the target region, it is convenient to approximate only the eigenvalues of interest rather than the full spectrum of P_d(z).
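The connection between solvents and the unilateral matrix equation admits a quick numerical sanity check. In the sketch below (construction and names ours) we build a quadratic matrix polynomial P(λ) = S1 S2 − (S1 + S2)λ + Iλ^2, for which X = S2 is a right solvent by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
S1 = rng.standard_normal((k, k))
S2 = rng.standard_normal((k, k))

# P(lam) = P0 + P1 lam + P2 lam^2 with P0 = S1 S2, P1 = -(S1 + S2), P2 = I:
# then P0 + P1 S2 + P2 S2^2 = S1 S2 - S1 S2 - S2^2 + S2^2 = 0,
# so X = S2 solves the unilateral matrix equation.
P0, P1, P2 = S1 @ S2, -(S1 + S2), np.eye(k)
residual = P0 + P1 @ S2 + P2 @ S2 @ S2
print(np.linalg.norm(residual))   # ~ 0 (machine precision)
```

This is the algebraic identity behind item 2: approximating the invariant subspace of (A, B) associated with the eigenvalues of S2 recovers the solvent.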
Subspace methods based on orthogonal iterations can be numerically accurate and backward stable. The method of orthogonal iteration goes back to Bauer (see [32] and the references given therein). A fast eigenvalue algorithm for n × n scalar companion matrices based on orthogonal iteration first appeared in [36]. Essentially, that algorithm is a game in which Givens plane rotations are moved from one side of orthogonal factors to the other at the cost of O(ns) flops, where s is the dimension of the subspace we want to approximate. More recently such schemes have been termed core-chasing algorithms [5].
In this paper we extend the game to more general matrices A ∈ C^{n×n} which are unitary plus some low rank-k correction term, k ≥ 1. This class includes block companion matrices together with some generalizations for matrix polynomials expressed w.r.t. certain interpolation bases [1,15]. The development follows by exploiting the properties of a suitable LFR factorization [11,12] of some bordered extension Â of A, that is, Â = L F R, where L (R) is a unitary k-lower (k-upper) Hessenberg matrix and F = U + E Z* is a unitary plus rank-k matrix, where U is a block diagonal unitary matrix of the form U = I_k ⊕ Û and E = [I_k, 0]^T.
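The unitary-plus-rank-k structure of block companion matrices is easy to verify numerically. The following sketch (ours, assuming NumPy) checks that a random block companion matrix differs from the unitary block cyclic shift only by a rank-k matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
k, d = 2, 3
n = d * k

A = np.zeros((n, n))                     # block companion matrix
for i in range(d - 1):
    A[(i + 1) * k:(i + 2) * k, i * k:(i + 1) * k] = np.eye(k)
for i in range(d):
    A[i * k:(i + 1) * k, n - k:] = -rng.standard_normal((k, k))

C = np.zeros((n, n))                     # block cyclic shift (unitary)
for i in range(d):
    r = ((i + 1) % d) * k
    C[r:r + k, i * k:(i + 1) * k] = np.eye(k)

print(np.allclose(C.T @ C, np.eye(n)))   # True: C is unitary (orthogonal)
print(np.linalg.matrix_rank(A - C))      # k = 2: A is unitary plus rank-k
```

A and C agree everywhere except in the last block column, so A − C has rank at most k.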
The unitary matrix Û can be expressed as the product of ℓ < n − k unitary Hessenberg matrices. It is shown that the shape of Û determines the shape of Â. In particular Â, and a fortiori A, is upper triangular if and only if Û is upper triangular and hence, because of the unitary structure of U, diagonal. This key property makes it possible to design a fast implementation of both direct and inverse orthogonal iteration on A. Specifically, from the LFR format of Â, we can easily get suitable QR and RQ factorizations of Â at the cost of O(nkℓ) flops, where the orthogonal factor Q is a product of orthogonal Hessenberg matrices and the upper triangular factor R is again given in the LFR format. It turns out that ℓ ≤ k for the matrices of interest. Direct and inverse orthogonal iterations applied to A can be carried out stably and efficiently by using these factorizations. Specifically, we show that both iterations can be implemented by moving the Givens plane rotations which specify the current approximation from one side of the factorizations to the other. Due to the representation of the two factors R and Q in terms of orthogonal Hessenberg matrices, the movement of one single rotation reduces to a hop from one Hessenberg matrix to the next in a chain of k matrices. The overall movement is therefore completed at each step with a cost per iteration of only O(nsk) flops.
The resulting algorithm is backward stable since it involves operations among orthogonal matrices only. The corresponding unstructured implementation, performed without any preliminary reduction of A into Hessenberg form, would require a complexity per iteration that is quadratic in both s and k. The linear dependence of our estimate on these two quantities makes the proposed algorithm maximally fast w.r.t. the sizes of the correction term and of the invariant eigenspace. Moreover, it can easily be generalized to deal with both the orthogonal and the inverse orthogonal iteration method for structured pencils (A, B) where A and/or B are perturbed unitary matrices.
The paper is organized as follows. In Sect. 2 we recall the theoretical background concerning the orthogonal iteration methods and the properties of modified unitary matrices. Section 3 presents the derivation of our fast adaptations of the orthogonal iteration methods for modified unitary matrices. In Sect. 4 we show the results of numerical experiments that lend support to the theoretical findings. Finally, Sect. 5 summarizes conclusions and future work.

Preliminaries
In this section we recall some preliminary results concerning the formulation of both direct and inverse orthogonal iterations for matrix pencils, and the structural properties and data-sparse representations of modified unitary matrices.

The Method of Orthogonal Iteration for Matrix Pencils
The method of orthogonal iteration (sometimes called subspace iteration or simultaneous iteration) can easily be generalized to matrix pencils. Let A − λB, A, B ∈ C^{n×n}, be a regular matrix pencil with A or B invertible. The orthogonal iteration method can be applied to approximate the largest or smallest magnitude eigenvalues of the matrix pencil by working on the matrix B^{−1}A or A^{−1}B, respectively. If B is nonsingular, then a generalization of the orthogonal iteration method to compute the s largest (in magnitude) generalized eigenvalues and corresponding eigenvectors of the matrix pencil A − λB proceeds as follows:

Z_i = B^{−1} A Q_{i−1},  Z_i = Q_i R_i (QR factorization),  i = 2, 3, . . . ,    (2.1)

where Q_1 ∈ C^{n×s} is a starting orthonormal matrix comprising the initial approximations of the desired eigenvectors. A detailed convergence analysis of this iteration can be found in Chapter 8 of [2]. There it is shown that convergence is properly understood in terms of invariant subspaces. Specifically, under mild assumptions it is proved that the angle between the subspace generated by the columns of Q_i and the invariant subspace associated with the s largest-magnitude generalized eigenvalues λ_1, λ_2, . . . , λ_s ∈ C, with |λ_1| ≥ . . . ≥ |λ_s| > |λ_{s+1}| ≥ . . . ≥ |λ_n|, tends to zero as O((|λ_{s+1}|/|λ_s|)^i) as i goes to infinity. The number of iterations required to reach an invariant subspace depends linearly on the ratio |λ_{s+1}|/|λ_s|.
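Iteration (2.1) can be sketched in a few lines of dense linear algebra (our own NumPy illustration; the paper's contribution is the fast structured version of exactly this scheme). On a diagonal toy pencil the dominant subspace is known in closed form, which makes convergence easy to observe.

```python
import numpy as np

def orthogonal_iteration(A, B, s, iters=200, seed=0):
    """Orthogonal iteration for the pencil A - lam*B with B nonsingular:
    approximates the invariant subspace of the s largest-magnitude
    generalized eigenvalues (a dense sketch, names ours)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, s)))
    for _ in range(iters):
        Z = np.linalg.solve(B, A @ Q)   # Z_i = B^{-1} A Q_{i-1}
        Q, _ = np.linalg.qr(Z)          # Z_i = Q_i R_i
    return Q

# diagonal toy problem: eigenvalues 1..6, so the dominant s = 2 subspace
# is span{e5, e6} (the last two coordinate vectors)
A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
B = np.eye(6)
Q = orthogonal_iteration(A, B, s=2)
print(np.linalg.norm(Q[:4, :]))   # ~ 0: Q converged to span{e5, e6}
```

The observed decay of the off-subspace components follows the ratio |λ_{s+1}|/|λ_s| = 5/6 per step, as stated above.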
Once the subspace has been approximated, projection techniques can be employed to find individual eigenvalues and corresponding eigenvectors. An effective stopping criterion is

‖E_i‖_2 ≤ τ,  E_i = (I − Q_{i−1} Q*_{i−1}) Q_i,    (2.2)

where τ is a desired tolerance. Observe that this quantity measures the distance between the subspaces S_{i−1} = span{Q_{i−1}} and S_i = span{Q_i}; in fact ‖(I − Q_{i−1} Q*_{i−1}) Q_i‖_2 can be taken as a measure of the angle between S_{i−1} and S_i. Assume now we are given a pencil A − λB with A invertible and that we would like to compute the s smallest-magnitude generalized eigenvalues λ_1, λ_2, . . . , λ_s ∈ C with |λ_1| ≤ |λ_2| ≤ · · · ≤ |λ_s| < |λ_{s+1}| ≤ . . . ≤ |λ_n|. Inverse orthogonal iterations can be used to approximate the desired eigenvalues. Starting with a set of s orthonormal vectors stored in the matrix Q_0 ∈ C^{n×s}, we compute the sequences

Z_i = A^{−1} B Q_{i−1},  Z_i = Q_i R_i (QR factorization),  i = 1, 2, . . . .    (2.3)

The direct and inverse orthogonal iterations (2.1), (2.3) can be carried out by solving the associated linear systems, but such an approach is prone to numerical instabilities due to the conditioning of the resulting coefficient matrices. A more stable way to perform these schemes is to use the QR/RQ factorizations of the matrices involved. In particular, for the inverse iteration (2.3) one may proceed at each step by computing a QR factorization B Q_{i−1} = Q_R T_R and an RQ factorization Q*_R A = T_L Q_L. The set of orthonormal vectors satisfying the linear system A Q_i R_i = B Q_{i−1} is then such that Q*_i = Q_L(1 : s, :). In the next subsection we introduce a suitable factorization of modified unitary matrices which makes it possible to realize this QR-based process in an efficient way. Since (2.1) can be implemented similarly by interchanging the roles of the matrices A and B, in the sequel we refer to orthogonal iteration as the scheme (2.3).
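The inverse iteration with the subspace-angle stopping criterion can be sketched as follows (a dense NumPy illustration of scheme (2.3), our own; the paper's fast version instead works on the factored forms).

```python
import numpy as np

def inverse_orthogonal_iteration(A, B, s, tol=1e-12, maxit=500, seed=0):
    """Inverse orthogonal iteration for A - lam*B with A invertible:
    approximates the invariant subspace of the s smallest-magnitude
    generalized eigenvalues; stops when the angle between consecutive
    subspaces drops below tol (a sketch, names ours)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, s)))
    for i in range(maxit):
        Z = np.linalg.solve(A, B @ Q)          # Z_i = A^{-1} B Q_{i-1}
        Qnew, _ = np.linalg.qr(Z)              # Z_i = Q_i R_i
        E = Qnew - Q @ (Q.conj().T @ Qnew)     # E_i = (I - Q Q*) Q_new
        Q = Qnew
        if np.linalg.norm(E, 2) < tol:
            break
    return Q, i + 1

A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
B = np.eye(6)
Q, it = inverse_orthogonal_iteration(A, B, s=2)
print(it, np.linalg.norm(Q[2:, :]))   # converged basis of span{e1, e2}
```

Here the angle between consecutive subspaces contracts by |λ_s|/|λ_{s+1}| = 2/3 per step, so a few dozen iterations suffice to reach the tolerance.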

Fast Compressed Representations of Modified Unitary Matrices
In this section we introduce a suitable compressed factorization of unitary plus rank-k matrices which can be exploited for the design of fast orthogonal iterations according to the QR-based process described above. See [11,12] for additional theoretical results.
We denote by U k the set of unitary-plus-rank-k matrices, that is, A ∈ U k if and only if there exists a unitary matrix V and two skinny matrices X , Y ∈ C n×k such that A = V + XY * . A key role is played by generalized Hessenberg factors.
Definition 2.1
A matrix R ∈ C^{m×m} is called k-upper Hessenberg if r_{ij} = 0 when i > j + k. Similarly, L is called k-lower Hessenberg if l_{ij} = 0 when j > i + k. In addition, when R is k-upper Hessenberg (L is k-lower Hessenberg) and the outermost entries are non-zero, that is, r_{j+k,j} ≠ 0 (l_{j,j+k} ≠ 0) for 1 ≤ j ≤ m − k, then the matrix is called proper. A matrix which is simultaneously k-lower and k-upper Hessenberg is called k-banded.
Note that for k = 1 a Hessenberg matrix is proper if and only if it is unreduced. Also, a k-upper Hessenberg matrix R ∈ C^{m×m} is proper if and only if det(R(k+1 : m, 1 : m−k)) ≠ 0. Similarly, a k-lower Hessenberg matrix L is proper if and only if det(L(1 : m−k, k+1 : m)) ≠ 0. To make the presentation easier, when possible we use the letter R to denote unitary generalized upper Hessenberg matrices, and the letter L for unitary generalized lower Hessenberg matrices.
Note that k-lower (upper) Hessenberg matrices can be obtained as the product of k matrices with the lower (upper) Hessenberg structure, and that unitary block Hessenberg matrices with blocks of size k are (non-proper) k-Hessenberg matrices.
In the following we will work with Givens rotations acting on two consecutive rows and columns. In particular, we will denote by G_i = I_{i−1} ⊕ G_i ⊕ I_{n−i−1} the n × n unitary matrix where G_i is a 2 × 2 complex Givens rotation of the form G_i = [c, −s; s, c̄], with |c|^2 + s^2 = 1, s ∈ R, s ≥ 0, and c̄ the complex conjugate of c. The subscript index i indicates the active part of the matrix G_i. In the case G_i = I_2 we say that G_i is a trivial rotation.

Definition 2.2
Given a unitary matrix U ∈ C n×n , we say that U has a data-sparse representation if it can be expressed as the product of O(n) Givens matrices of the form G i described before, possibly multiplied by a diagonal unitary matrix.
Note that Definition 2.2 includes the unitary (generalized) Hessenberg matrices defined in Definition 2.1, CMV matrices [17], and other zig-zag patterns [38], as well as products of a constant number of these structures. The next lemma shows how the product of data-sparse unitary terms can be refactored by swapping the roles of the two factors: given a unitary k-upper Hessenberg matrix R and a unitary matrix U, we can write R U = V S with V unitary and S unitary k-upper Hessenberg. Proof Let us partition R and U as follows, where R_21 is an upper triangular matrix of size n − k, and U_11 is square of size n − k.
Multiplying R and U and imposing the conditions on the blocks of the product V S we get

Definition 2.4
The (lower) staircase of a matrix A ∈ C^{n×n} is the nondecreasing sequence m_j(A), 1 ≤ j ≤ n, defined by m_j(A) = max{m_{j−1}(A), max{i : a_{ij} ≠ 0}}, with m_0(A) = 0 and the convention that the inner maximum is taken as 0 when the j-th column is zero.
The sequence m_j(A) allows us to represent the zero pattern of a matrix, and in particular to identify zero sub-blocks: in fact, for each 1 ≤ j ≤ n, it holds that A(m_j(A)+1 : n, 1 : j) = 0. We note that proper k-upper Hessenberg matrices have m_j(A) = j + k for j = 1, . . . , n − k, and m_j(A) = n for j = n − k + 1, . . . , n.
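The staircase sequence is straightforward to compute; the sketch below (function name ours, assuming NumPy) reproduces the values stated above for a proper 2-upper Hessenberg pattern, and also checks that the staircase is unchanged by left multiplication with a nonsingular upper triangular matrix, the invariance used below.

```python
import numpy as np

def staircase(A, tol=0.0):
    """Lower staircase m_j(A): the nondecreasing (1-based) row index such
    that A(m_j+1:n, 1:j) = 0 (a sketch of Definition 2.4, name ours)."""
    n = A.shape[0]
    m, last = [], 0
    for j in range(n):
        rows = np.nonzero(np.abs(A[:, j]) > tol)[0]
        if rows.size:
            last = max(last, int(rows[-1]) + 1)
        m.append(last)
    return m

n, k = 6, 2
R = np.triu(np.ones((n, n)), -k)    # proper 2-upper Hessenberg pattern
print(staircase(R))                  # [3, 4, 5, 6, 6, 6]: m_j = j + k, then n

# invariance under left multiplication by nonsingular upper triangular T
T = np.triu(np.random.default_rng(5).standard_normal((n, n)))
np.fill_diagonal(T, 1.0)
print(staircase(T @ R, tol=1e-12) == staircase(R))   # True
```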

Lemma 2.5
Let A ∈ C^{n×n} be a matrix with staircase described by the sequence {m_j(A)}, 1 ≤ j ≤ n, and let T ∈ C^{n×n} be a nonsingular upper triangular matrix. Then B = T A has the same staircase as A, that is, m_j(T A) = m_j(A) for 1 ≤ j ≤ n. To prove the equality of the staircase profiles of B and A, note that, since T is upper triangular, entry (i, j) of B is a combination of the entries of column j of A with row indices not smaller than i; in particular B(i, j) = 0 whenever i > m_j(A), while, for the column attaining the maximum, the entry of B on the staircase is a nonzero multiple of the corresponding entry of A.

Any unitary matrix of size n can be factorized as the product of at most n − 1 unitary upper Hessenberg matrices and a unitary diagonal matrix, U = H_1 H_2 · · · H_{n−1} D, where H_i = G_i G_{i+1} · · · G_{n−1} and D is a diagonal unitary matrix. To describe the representation and the algorithm we use a pictorial representation already introduced in several papers (compare with [5] and the references given therein). Specifically, the action of a Givens rotation on two consecutive rows of the matrix is depicted as a two-pointed arrow spanning the two rows on which it acts, and a chain of ascending two-pointed arrows represents a unitary upper Hessenberg matrix. Some of the rotations may be identities (trivial rotations), and we may omit them in the picture; for example, in the above definition of H_i only the rotations G_j with j ≥ i are non-trivial. Givens transformations can also interact with each other by means of the fusion or the turnover operations (see [39], pp. 112-115). The fusion operation consists of the concatenation of two Givens transformations acting on the same rows; the result is a single Givens rotation multiplied by a 2 × 2 phase matrix. The turnover operation allows one to rearrange the order of Givens transformations (see [39]).
Graphically, the turnover rearranges three Givens transformations acting on rows (i, i + 1), (i + 1, i + 2), (i, i + 1) into three equivalent transformations acting on rows (i + 1, i + 2), (i, i + 1), (i + 1, i + 2), and vice versa. Note that if the Givens transformations involved in a turnover operation are all non-trivial, the three resulting transformations are non-trivial as well (see [4]).
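The turnover can be sketched numerically. Below is a minimal real-arithmetic illustration (complex rotations work analogously); the function names `rot`, `givens`, `embed`, `turnover` are ours. Given a 3 × 3 product of rotations in the pattern (rows 1-2)(rows 2-3)(rows 1-2), it produces three rotations in the pattern (rows 2-3)(rows 1-2)(rows 2-3) with the same product.

```python
import numpy as np

def rot(c, s):
    return np.array([[c, -s], [s, c]])

def givens(a, b):
    """Real Givens pair (c, s) with rot(c, s).T @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

def embed(c, s, i, n=3):
    G = np.eye(n)
    G[i:i + 2, i:i + 2] = rot(c, s)
    return G

def turnover(M):
    """Refactor M = G1(rows 1-2) G2(rows 2-3) G3(rows 1-2) as
    M = H1(rows 2-3) H2(rows 1-2) H3(rows 2-3)."""
    c1, s1 = givens(M[1, 0], M[2, 0])     # H1: zero out the (3,1) entry
    M1 = embed(c1, s1, 1).T @ M
    c2, s2 = givens(M1[0, 0], M1[1, 0])   # H2: zero out the (2,1) entry
    M2 = embed(c2, s2, 0).T @ M1          # remainder acts on rows 2-3 only
    return embed(c1, s1, 1), embed(c2, s2, 0), M2

rng = np.random.default_rng(2)
t = rng.uniform(0, 2 * np.pi, 3)
M = (embed(np.cos(t[0]), np.sin(t[0]), 0)
     @ embed(np.cos(t[1]), np.sin(t[1]), 1)
     @ embed(np.cos(t[2]), np.sin(t[2]), 0))
H1, H2, H3 = turnover(M)
# H3 really is a rotation confined to rows/columns 2-3:
print(np.allclose(H3[0], [1, 0, 0]), np.allclose(H1 @ H2 @ H3, M))
```

The key point, used throughout the swap procedures, is that the operation is exact: only three new rotations are produced and the overall product is unchanged.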
In this paper we are interested in the computation of a few eigenvalues of matrices belonging to U_k by means of the orthogonal iteration schemes outlined in Sect. 2.1. We will represent these matrices in the so-called LFR format, a factorization introduced in [11,12].

Definition 2.6
We say that a matrix A ∈ C^{n×n}, A ∈ U_k, is represented in the LFR format if (L, F, R) are matrices such that:
1. A = L F R;
2. L is a unitary k-lower Hessenberg matrix and R is a unitary k-upper Hessenberg matrix;
3. F = U + E Z* is unitary plus rank-k, where U is a block diagonal unitary matrix of the form U = I_k ⊕ Û, E = [I_k, 0]^T ∈ C^{n×k} and Z ∈ C^{n×k}.

Any matrix in U_k can be brought into the LFR format as follows. Let A ∈ U_k be such that A = V + X Y*, with V unitary and X, Y ∈ C^{n×k}. One first determines a unitary k-lower Hessenberg matrix L such that L* X = E T_k for a suitable T_k ∈ C^{k×k}, and then factors the unitary matrix L* V as U R, with U = I_k ⊕ Û and R unitary k-upper Hessenberg; this gives L* A = U R + E T_k Y*. Bringing R to the right we get our factorization, where F = U + E Z* and Z = R Y T*_k. The LFR format of modified unitary matrices is the key tool to develop fast and accurate adaptations of the orthogonal iterations, as we will see in the next section.

Fast Adaptations of the Orthogonal Iterations
As underlined in Sect. 2.1, to compute the next orthogonal vectors approximating a basis of the invariant subspace we need to compute the QR decomposition of B Q_i and the RQ decomposition of Q*_R A.

RQ and QR Factorization of Unitary Plus Low Rank Matrices
To recognize the triangular factors from the LFR decompositions of A and B, it is useful to embed the matrices of the pencil into larger matrices obtained by bordering them with k additional rows and columns. The next theorem explains how we can obtain such larger matrices while maintaining the unitary-plus-rank-k structure.
The unitary part of Â can be described with nk additional Givens rotations with respect to the representation of the unitary part of A.
We assume that Y ∈ C^{n×k} has orthonormal columns; otherwise we compute the economy-size QR factorization Y = Q R_Y and then set Y = Q and X = X R*_Y. Set C_A = V Y, and consider the bordered matrices Â = V̂ + X̂ Ŷ*. We can prove that V̂ is unitary by direct substitution. The last k rows of Â = V̂ + X̂ Ŷ* are zero. The matrix V̂ can be factorized as the product of unitary factors which are related to the original players of A, namely V, X and Y. In particular, the factorization involves a proper k-lower Hessenberg matrix S, and such an S always exists (see Lemma 3 in [12]).
Note that the LFR format of Â is such that L is proper, since X̂ has the last k rows equal to −I_k (see [12], Lemma 3). The shape of Û determines the shape of Â, as stated above. Proof Since L is unitary, we have L* Â = F R. Let us partition L and R as follows: L = [L_11, L_12; L_21, L_22], with L_12 an n × n lower triangular matrix, and similarly partition R in such a way that R_21 is n × n upper triangular. Then we get L*_12 A = Û R_21. By Lemma 2.5 we know that L*_12 A has the same staircase profile as A, and Û R_21 has the same staircase profile as Û.
The next lemma helps us to recognize triangular matrices in the LFR format. Proof We have L*_12 A = Û R_21, which is upper triangular when Û is diagonal. Because L is proper, the triangular block L*_12 is nonsingular and A = (L*_12)^{−1} Û R_21. Hence A is upper triangular, because it is the product of upper triangular factors. Â is upper triangular as well, because it is obtained by padding (3.1) with zeros.
We now give an algorithmic interpretation of Lemma 2.3. A pictorial interpretation of the lemma is given in Fig. 1, where we omit the diagonal unitary factors that are possibly present in the general case.
Starting from Fig. 1 we can describe an algorithm for the "swap" of two unitary terms. In fact, we can obtain the new Givens rotations in the factors V and S simply by applying fusion and turnover operations repeatedly, as described by the algorithm in Fig. 2. We formalize the algorithm as if the Givens rotations involved in the swap were all non-trivial. The algorithm has a cost of O(nk) only when U admits a data-sparse representation. In general, U can be factorized as the product of at most ℓ ≤ n − 1 unitary upper or lower Hessenberg matrices; in this case the overall cost is O(nkℓ). In procedure SwapRU we choose to factorize U as the product of lower Hessenberg factors, but we can obtain a similar algorithm expressing U in terms of upper Hessenberg factors; here we consider the worst case ℓ = n − 1. At step i − 1 we have removed the first i − 1 chains of ascending Givens rotations from U; the intermediate k-upper Hessenberg matrix R^{(i)} is transformed by the turnover and fusion operations, with R^{(1)} = R and R^{(n−1)} = S. Similarly, swapping U and R in such a way that U R = R̃ Q, we get an RQ decomposition of Â where the triangular factor is L(I + E Ẑ*)R̃, with Ẑ = U Z, and the unitary factor is Q. The proof is straightforward since, again from Lemma 3.3, we have that L(I + E Ẑ*)R̃ is upper triangular.

The Algorithm
In this section we describe the orthogonal iterations on a pencil (A, B) where A and B are unitary-plus-low-rank matrices. For the sake of readability we assume A, B ∈ U_k, even if situations where the low-rank parts of A and B do not have the same rank are possible; in that case we take k to be the maximum of the two ranks.
We will assume that A and B have been embedded in a larger pencil (Â, B̂) as described in Theorem 3.1. Since det(Â − λB̂) ≡ 0, the pencil is singular and the k new eigenvalues introduced with the embedding are indeterminate: MATLAB returns "NaN" as eigenvalues in these cases. However, thanks to the block triangular structure of Â and B̂, the remaining eigenvalues coincide with those of the original pencil (A, B). The LFR formats of Â and B̂ implicitly reveal these block triangular profiles and, although we use the representation of the larger matrices, the proposed iterative scheme basically works on the original pencil (A, B), so that its convergence is not affected by the indeterminate eigenvalues. In order to understand this crucial point, the following properties play a role. The orthogonal matrices Q̂_i generated by the iteration are represented by Givens plane rotations. Suppose we apply such a rotation on the right of Â, acting on its columns j, j + 1. From the structure of Q̂_i we find j < n. Due to the opposite profiles of L and R, the bulge introduced into the upper triangular factor can be removed on the left by a Givens plane rotation acting on rows j, j + 1. In other words, regardless of the composite form of the upper triangular factor, the overall scheme is mathematically equivalent to one working on the input matrix A.
We are now in a position to describe how we can carry out an orthogonal iteration using only the turnover and fusion operations. The key ingredient for the algorithm is the SwapUR procedure and its variants, as described in Sect. 3.1. In fact, working with the pencil and with the LFR factorizations, the orthogonal iterations can be reformulated as follows. Let Â = L_A F_A R_A and B̂ = L_B F_B R_B be the LFR decompositions of Â and B̂. The s starting orthogonal vectors in Q̂_0 ∈ C^{N×s}, N = n + k, as well as all the intermediate orthogonal vectors Q̂_i, can be represented as the product of s sequences of ascending Givens rotations. In fact, the columns of Q̂_0 can always be completed to an orthogonal basis {q_1, . . . , q_N} such that [q_1, . . . , q_N] is a k-lower Hessenberg matrix (see [12]). In the description of the algorithm, since we are working with the representation of Q̂_i in terms of Givens rotations, we will identify Q̂_i with either the full square matrix or its first s columns.
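The representation of an orthonormal N × s block by roughly Ns rotations on consecutive rows can be sketched directly (our own NumPy illustration; the function name is ours). Reducing the block to [I_s; 0] column by column yields one ascending chain per column; applying the transposed rotations in reverse reconstructs the block.

```python
import numpy as np

def chains_from_orthonormal(Q):
    """Reduce an N x s orthonormal matrix to [I_s; 0] with Givens rotations
    acting on consecutive rows, recorded column by column (bottom-up);
    a sketch of the chain representation of Q_0, name ours."""
    Q = Q.copy()
    N, s = Q.shape
    chains = []
    for j in range(s):
        for i in range(N - 2, j - 1, -1):      # zero Q[i+1, j] bottom-up
            a, b = Q[i, j], Q[i + 1, j]
            r = np.hypot(a, b)
            c, sn = (1.0, 0.0) if r == 0 else (a / r, b / r)
            G = np.array([[c, sn], [-sn, c]])  # transposed rotation
            Q[i:i + 2, :] = G @ Q[i:i + 2, :]
            chains.append((i, c, sn))
    return chains, Q

rng = np.random.default_rng(3)
N, s = 8, 2
Q0, _ = np.linalg.qr(rng.standard_normal((N, s)))
chains, Rred = chains_from_orthonormal(Q0)
print(len(chains))                        # (N-1) + (N-2) = 13 rotations
print(np.allclose(Rred[:s], np.eye(s)), np.allclose(Rred[s:], 0))
```

The rotation count grows like Ns, which is what makes the O(nks) cost per iteration of the swap-based scheme possible.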
As shown in the algorithm in Fig. 3, the procedure starts by computing the RQ factorization of Â and the QR factorization of B̂ by means of SwapUR and SwapLU. This is a preprocessing step which simplifies the iterations of the algorithm; it yields the RQ and QR factorizations of Â and B̂, respectively. The factors Q_A and Q_B obtained with the swap procedures are block diagonal with the trailing diagonal block equal to I_k. In terms of Givens rotations, this corresponds to having trivial rotations at the bottom of each Givens chain. Note that we do not actually have to update either Z_A or Z_B, which are not needed for the computation of the Q̂_i.
The iterations then boil down to the application of the two procedures MoveSequenceLeft and MoveSequenceRight, which can be described in terms of the LFR representation. Note that the factors Q_R and Q_L returned by the procedures MoveSequenceRight and MoveSequenceLeft have the block diagonal structure with a trailing identity block.

Computational Complexity
In analyzing the cost of the orthogonal iterations for companion-like pencils, we have to consider the initialization phase, where the LFR decompositions are computed and the initial swaps for the computation of the RQ and QR factorizations of Â and B̂ are performed. The computation of the initial LFR forms of Â and B̂ requires in general O(n^2 k) turnover or fusion operations, but reduces to O(nk^2) when the unitary factors in A and B have a data-sparse representation, as in the block-companion-like case (see [12] for more details). The computation of the LFR format does not require a preprocessing step to transform the matrix into scalar Hessenberg form, but can be applied directly to the two matrices Â and B̂.

(Fig. 3: Inverse orthogonal iterations described in terms of the LFR representation of the pencil.)
The cost of the swap procedures depends on the number of chains into which the factors to be swapped are decomposed, as explained in detail in Sect. 2. In the case of interest the cost is O(nks). In fact, the representation of the initial Q̂_0 requires ns rotations, and therefore it can be decomposed into the product of s unitary Hessenberg matrices. The cost of each iteration is given by the cost of the two procedures MoveSequenceRight and MoveSequenceLeft, each performing three swaps between k and s chains of rotations. Hence each step requires O(nks) operations.
The number of iterations needed to reach the invariant subspace depends linearly on the ratio |λ_{s+1}|/|λ_s|, as described in [2], Chapter 8. Denoting by it the number of iterations needed to reach an invariant subspace, the total cost is O(nk^2 + it · nks) floating point operations.

A Posteriori Measure of Backward Error
The backward stability of the iterative algorithm in Fig. 3 follows from the observation that it involves operations between unitary factors only. A formal proof using the properties of LFR formats can be obtained as in [12] by relying upon the estimates in [5] for the arithmetic among Givens plane rotations. In this section we are interested in devising an a posteriori computable measure of the backward error, which will be used in the next section to illustrate the global behaviour of our proposed algorithm.
Suppose that the orthogonal iteration method has reached a numerically invariant subspace spanned by the s orthonormal columns of the matrix Q. We expect that for the computed Q the value σ_{s+1}([AQ, BQ]) is small, since in exact arithmetic we should have AQ = BQΛ, with Λ ∈ C^{s×s}, so that [AQ, BQ] has rank s. To evaluate the stability we then analyze the quantity back_s, defined in terms of σ_{s+1}([AQ, BQ]). The following theorem proves that under mild assumptions we can use back_s to estimate the backward stability of the method.
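The rank argument above translates directly into a computable check; the normalization by the largest singular value below is our own choice for the sketch, not necessarily the paper's exact definition of back_s.

```python
import numpy as np

# If Q spans an exact invariant subspace of (A, B), then AQ = BQ Lam for
# some s x s matrix Lam, so the n x 2s matrix [AQ, BQ] has rank s and its
# (s+1)-st singular value vanishes: a natural a posteriori indicator.
n, s = 8, 3
A = np.diag(np.arange(1.0, n + 1))
B = np.eye(n)
Q = np.eye(n)[:, :s]                # exact invariant subspace of (A, B)
sv = np.linalg.svd(np.hstack([A @ Q, B @ Q]), compute_uv=False)
back_s = sv[s] / sv[0]              # normalized (s+1)-st singular value
print(back_s)                       # ~ 0: Q is (numerically) invariant
```

For a Q that has not yet converged, the same quantity stays bounded away from machine precision, which is what makes it usable as a global stopping/quality measure.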

Theorem 3.4 Let A be invertible and let Q be the computed orthonormal matrix spanning the numerically invariant subspace.
This shows that under this assumption V_21 is invertible as well. Now we look for matrices Δ_A, Δ_B and Λ̃ such that the equality (A + Δ_A) Q = (B + Δ_B) Q Λ̃ is fulfilled, that is, Δ_A Q − Δ_B Q Λ̃ = −A Q + B Q Λ̃. Rewriting the relation in terms of the SVD factors we get

Numerical Results
In this section we provide a few numerical illustrations of the convergence and stability properties of the proposed methods. We perform several tests using nonlinear matrix functions T(λ) ∈ C^{k×k}. For matrix polynomials T(λ) = P(λ) = Σ_{i=0}^{d} P_i λ^i we consider the companion linearization (1.1) with A, B ∈ U_k. For non-polynomial matrix functions we first approximate the matrix function by interpolation with polynomials of different degrees, and then linearize the polynomials as companion pencils (A, B), with A, B ∈ U_k.
We focus on the problem of computing a selected eigenspace of (A, B). After building the pencil, our method performs the inverse orthogonal iterations as defined in the algorithm Orthogonal Iterations in Fig. 3, until an invariant subspace is revealed. Then the corresponding eigenvalues λ̃_i are computed by applying the MATLAB eig function to the s × s pencil (A_s, B_s) determined as the restriction of A and B to the subspace spanned by the columns of Q_i, i.e., the generalized Rayleigh quotients of A and B.
As an error measure for individual eigenvalues we consider the residual ‖T(λ̃_i) v‖_2, where v is the k-th right singular vector of T(λ̃_i). In practice we compute err_T(i) = σ_k, where σ_1 ≥ σ_2 ≥ · · · ≥ σ_k are the singular values of T(λ̃_i). Following [24], it can be shown that err_T(i) is an upper bound on the backward error for the approximate eigenvalue λ̃_i of T(λ). According to the customary rule of thumb, this also measures the forward error for well-conditioned problems. As a measure of convergence of the orthogonal iterations we consider the average error averr_P with respect to the "exact" eigenvalues μ_i of the pencil (A, B) obtained with MATLAB eig. The subscript P specifies that this quantity is calculated for polynomial matrix functions T(λ) = P(λ) only. If Λ̃_s = {λ̃_1, . . . , λ̃_s} and Λ = {μ_1, . . . , μ_s} are the computed eigenvalues of (A_s, B_s) and of (A, B), respectively, then these eigenvalues are paired by means of the following greedy rule: for i = 1, . . . , s, pick the μ_j ∈ Λ minimizing the relative error τ_i = |λ̃_i − μ_j|/|μ_j|, then set Λ = Λ − {μ_j} and averr_P = averr_P + τ_i; at the end, averr_P = averr_P / s.
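The greedy pairing rule can be sketched as follows (function name and the tiny safeguard against division by zero are ours):

```python
import numpy as np

def averr_P(computed, exact):
    """Greedily pair each computed eigenvalue with the closest remaining
    'exact' one (in relative distance) and return the average relative
    error -- a sketch of the pairing rule described in the text."""
    pool = list(exact)
    total = 0.0
    for lam in computed:
        errs = [abs(lam - mu) / max(abs(mu), 1e-300) for mu in pool]
        j = int(np.argmin(errs))
        total += errs[j]
        pool.pop(j)               # each exact eigenvalue is used once
    return total / len(computed)

print(averr_P([1.0 + 1e-12, 2.0], [2.0, 1.0]))   # ~ 5e-13
```

Removing each matched μ_j from the pool prevents two computed eigenvalues from being charged against the same exact one, which matters for clustered spectra.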
Note that in the genuinely (nonpolynomial) nonlinear case, where P(λ) is a polynomial approximation of T (λ), averr P refers to the average error with respect to the eigenvalues of the approximating polynomial while err T (i) also depends on the quality of the approximation of the nonlinear function with the matrix polynomial.

Matrix Polynomials
We tested our method on some matrix polynomials of small and large degree. The first test suite consists of matrix polynomials of degree 3 and 4 from the NLEVP collection [9] using the companion linearization. Table 1 summarizes the results. We see that all the tests give good results in terms of the backward error with respect to the pencil. For the plasma_drift polynomial, the backward errors on the pencil reflect the termination of the algorithm with the adopted criteria, and the invariant subspace is found with adequate accuracy. plasma_drift is a very challenging problem for any eigensolver [26], due to several eigenvalues of high multiplicity and/or clustered around zero. In [26] the authors proposed a variation of the Jacobi-Davidson method for computing several eigenpairs of the polynomial eigenvalue problem. For this problem, they ran their algorithm with a residual threshold of 1.0e−2 and within 200 iterations they were able to compute approximations of the 19 eigenvalues closest to the origin. To compare with those results, we repeated the experiment on plasma_drift twice, once setting the stopping criterion (2.2) to a tolerance of 1.0e−04, and then performing 250 iterations. With only 6 iterations we get a backward error of 8.12e−06, while the error with respect to the eigenvalues is approximately of order 1.0e−09.
Performing 250 iterations the error on the eigenvalues reaches machine precision.
Regarding the other experiments, we ran the tests setting the tolerance for the stopping criterion to tol = 1.0e−14 and the maximum number of iterations to 1000. For the orr_sommerfeld problem, we observe that the condition ‖E_i‖_2 < tol is not sufficient to guarantee a sufficiently small error on the eigenvalues. In fact, the method requires only six iterations to stop because the approximating subspaces spanned by the columns of Q_i are converging very slowly, so that ‖E_i‖_2 is small even though the invariant subspace has not been reached yet. For this problem, also with the MATLAB command polyeig, or with methods which do not use scaling and balancing of the coefficients of the polynomial (see Table 1 in [19]), we obtain similar bounds on err_P. However, scaling the orr_sommerfeld polynomial as described in [8,19], the error on the coefficients of the polynomial reaches O(10^{−14}) and the number of iterations increases. The result is better than some of the results reported in [19], where a balanced version of the Sakurai-Sugiura method with Rayleigh-Ritz projection is presented. Comparing our result with that reported in [6], where a structured version of the QZ method is employed, we see that for the orr_sommerfeld problem we get a slightly better backward error on the pencil. However, we are estimating only 2 or 4 eigenvalues, while the QZ method approximates the full spectrum and, moreover, differently from [6], our error analysis assumes a uniform bound for the norm of the perturbations of A and B. The accuracy of the computed eigenvalues is in accordance with the conditioning estimates.
For the butterfly problem our method performs similarly to the results reported in [18], and the number of iterations for relative_pose_5pt agrees with the separation ratio of the eigenvalues. For the other tests there are remarkable differences in the number of iterations, depending on the sensitivity of the stopping criterion (2.2) used in the orthogonal iteration algorithm.

Concerning large degree polynomials, we have considered applications arising in Markov chain theory that require the computation of solvents. It is well known that the computation of the steady-state vector of an M/G/1-type Markov chain can be related to the solution of a certain unilateral matrix equation (see [14] and the references given therein). In particular, we are interested in the computation of the minimal nonnegative solution of the matrix equation. As already pointed out in the introduction, the nonlinear problem can be reformulated in a matrix setting as the computation of a certain invariant subspace of a matrix pencil (A, B) corresponding to the generalized eigenvalues of minimum modulus. In our experiments, we have tested several cases of M/G/1-type Markov chains introduced in [16]. We do not describe the construction in detail, as it would take some space, but refer the reader to [16, Section 7.1]. The construction of the Markov chain depends on two probability distributions and a parameter ρ which makes it possible to tune the separation ratio |λ_s|/|λ_{s+1}|. In Table 2 we show our results in two settings. In each setting we have fixed the probability distributions and varied ρ = 0.5, 0.7, 0.9. The stopping tolerance was set to tol = 10^{−13}. We see that the backward errors slightly deteriorate and the number of iterations increases as the parameter ρ approaches 1. This is the same behaviour usually observed in the unshifted QR eigenvalue method.
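For reference, the minimal nonnegative solution G of the unilateral equation G = Σ_i A_i G^i can also be computed by the classical functional iteration started from G_0 = 0 (see [14]). The sketch below, with made-up scalar coefficients, is only the standard baseline against which invariant-subspace approaches are usually compared, not the method of this paper:

```python
import numpy as np

def minimal_solvent(coeffs, tol=1e-13, maxit=100000):
    """Functional iteration G_{j+1} = sum_i A_i G_j^i for the unilateral
    matrix equation G = A_0 + A_1 G + A_2 G^2 + ...

    Started from G_0 = 0, it converges monotonically to the minimal
    nonnegative solution for M/G/1-type chains (classical result; see [14]).
    """
    k = coeffs[0].shape[0]
    G = np.zeros((k, k))
    for _ in range(maxit):
        S = np.zeros((k, k))
        P = np.eye(k)                  # P holds G^i, built incrementally
        for Ai in coeffs:
            S = S + Ai @ P
            P = P @ G
        if np.linalg.norm(S - G, 1) < tol:
            return S
        G = S
    return G
```

In the scalar case with coefficients 0.2, 0.1, 0.7 the equation g = 0.2 + 0.1g + 0.7g^2 has roots 2/7 and 1, and the iteration converges to the minimal one, 2/7. The linear convergence rate of this iteration degrades as the separation ratio approaches 1, mirroring the iteration counts in Table 2.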
We used large degree polynomials associated with Markov chains to assess the time complexity of the proposed algorithm. Figure 4 shows the linear dependence on the dimension s of the invariant subspace. For a matrix of size n = 80 drawn from the M/G/1-type Markov chain model described in [16], we applied our algorithm with increasing values of s and plotted the time per iteration against s. In Fig. 5, for some 10 × 10 matrix polynomials with degree ranging from 590 to 2016 (constructed with the technique described in [16]), corresponding to pencils of size n = 5190 up to n = 20,160, and with s = 2, we draw a dot at the coordinates (n, T/(ks)), where T is the time for an iteration, excluding the computation of the LFR factorization. In both figures, the solid line is the linear fit of these points.

Nonlinear Matrix Functions
Another set of experiments dealt with matrix polynomials generated in the process of solving nonlinear eigenvalue problems det T(z) = 0, with T(z) ∈ C^{k×k} a nonlinear matrix function. Our test suite includes:

1. Shifted time-delay equation [13]. Applying the transformation z → 6z − 1, the matrix function in [13] becomes T(z) = 6 I_2 z + T_0 + T_1 exp(−6z + 1), with T_0 and T_1 the constant matrices given in [13]. This function has three eigenvalues inside the unit circle.

2. Model of cancer growth [7]. The matrix function is given in [7]. The parameters are chosen as suggested in [7] by setting r = 5, b_1 = 0.13, b_Q = 0.2, μ_1 = 0.28, μ_0 = 0.11, μ_Q = 0.02, μ_G = 0.0001, μ_2 = μ_0 + μ_Q. We refer to [7] for the physical meaning of the constants and for the description of the model. This function has three eigenvalues inside the unit circle.

3. Neutral functional differential equation [21]. The function is scalar, t(z) = 1 + 0.5z + z^2 + hz^2 exp(−τz). The case h = −0.82465048736655, τ = 6.74469732735569, corresponding to a Hopf bifurcation point, is analyzed in [28]. This function has three eigenvalues inside the unit circle.

4. Spectral abscissa optimization [29]. The function is T(z) = z I_3 + ⋯, where the omitted terms, depending on a parameter q, are given in [29]. The choice of q leads to an eigenvalue problem with multiple eigenvalues that are often defective. This function has 4 eigenvalues inside the unit circle.

5. Hadeler problem [9]. The matrix function is given in [9]. In our experiments we set k = 8 and α = 100. Ruhe [31] proved that the problem has k real and positive eigenvalues; in particular, two of them lie in the open interval (0, 1), and hence inside the unit circle.

6. Vibrating string [9,35]. The model refers to a string of unit length clamped at one end, while the other end is free but loaded with a mass m attached by an elastic spring of stiffness k_p. Assuming m = 1 and discretizing the differential equation, one gets a nonlinear eigenvalue problem.

7. The function F(z): C → C^{3×3} from [3], defined in (4.1), is obtained by elementary transformations from diag(cos(z), sin(z), e^z − 7). Hence, it has seven known real eigenvalues, given by {±π, ±π/2, 0, log(7), 3π/2}. We applied the transformation z → 4z + 1 to bring six of the seven eigenvalues inside the unit disk. With this transformation we do not get an approximation of the eigenvalue −π, which after the transformation lies outside the unit disk.
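As a quick sanity check of the last claim, one can map the seven known eigenvalues through the substitution z → 4z + 1 (so each eigenvalue λ of F moves to (λ − 1)/4) and verify that only −π leaves the unit disk:

```python
import numpy as np

# the seven known real eigenvalues of the original function F(z) from [3]
eigs = np.array([np.pi, -np.pi, np.pi / 2, -np.pi / 2,
                 0.0, np.log(7.0), 3 * np.pi / 2])
# substituting z -> 4z + 1 sends each eigenvalue lambda to (lambda - 1)/4
mapped = (eigs - 1.0) / 4.0
inside = np.abs(mapped) < 1.0          # six True, one False (for -pi)
```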
For all these functions we compute the interpolating polynomial P_d(z) over the roots of unity for different degrees d ranging from 16 to 128. With this choice of the interpolation nodes there are theoretical results [10,20] on the uniform convergence of the interpolating polynomials to the nonlinear function inside the unit disk, which prevents the occurrence of spurious eigenvalues. Furthermore, the coefficients of the interpolating polynomial in the monomial basis are computed by means of an FFT, which is very stable. Table 3 summarizes some results for the nonlinear functions considered. Table 4 reports the complete results for test 7, where the exact eigenvalues are known.
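The interpolation step can be sketched as follows: sampling a k × k matrix function at the d-th roots of unity and applying one DFT per matrix entry recovers the monomial coefficients. The function handle F below is a placeholder for any of the test functions above:

```python
import numpy as np

def interp_coeffs(F, d):
    """Monomial coefficients P_0, ..., P_{d-1} of the polynomial of degree
    d - 1 interpolating the matrix function F at the d-th roots of unity.

    If P(z) = sum_j P_j z^j and omega_m = exp(2*pi*1j*m/d), then the
    samples P(omega_m) form the (scaled) inverse DFT of the P_j, so the
    forward DFT of the samples divided by d returns the coefficients.
    """
    nodes = np.exp(2j * np.pi * np.arange(d) / d)
    samples = np.array([F(z) for z in nodes])     # shape (d, k, k)
    return np.fft.fft(samples, axis=0) / d        # coefficient P_j at index j
```

For a function that is already a polynomial of degree less than d, the computed coefficients are exact up to rounding, consistent with the stability remark above.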
In general, for sufficiently large values of the degree we get a very good approximation of the eigenvalues inside the unit disk. Comparing the results in Table 4 with those reported in [3], we see that using the same degree (d = 64) as in [3] we obtain results with 5 more digits of precision.
The error results in Table 3 are also comparable with those reported in [10], where a structured QR method was employed to compute all the eigenvalues of the matrix/pencil. In general, the number of iterations does not depend on the size of the problem, but only on the ratio |λ_s|/|λ_{s+1}|, so the cost of the orthogonal iterations can be asymptotically cheaper. In addition, the computation of all eigenvalues of P_d(λ) is wasteful, since only s of them are reliable because the polynomial is a good approximation of the nonlinear function only inside the unit disk. For the inverse orthogonal iterations we used s = 6, but we show the results only for the first three eigenvalues (corresponding to the values π/2, log(7) and 0). The remaining eigenvalues are approximated just as well.

Conclusions and Future Work
In this paper we have presented a fast and backward stable subspace algorithm for block companion forms based on orthogonal iterations. The proposed method exploits the properties of a suitable data-sparse factorization of the matrix involving unitary factors. The method can be extended to more general perturbed unitary matrices, and it can incorporate acceleration techniques based on the updated computation of Ritz eigenvalues and eigenvectors [2]. The design of fast adaptations using adaptive shifting techniques such as the ones proposed in [27] is an ongoing research project. Another very interesting topic for future work is the comparison of orthogonal subspace iteration algorithms with Krylov subspace methods; a refined implementation of the latter for nonlinear eigenvalue problems can be found in [25]. Our approach could be regarded as a robust alternative once complemented with efficient techniques for choosing the degree of the polynomial approximation and the size of the invariant subspace.
Code availability The code can be requested from the corresponding author.

Conflict of interest
The authors have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.