1 Introduction

It is well known that Hermitian matrices, skew-Hermitian matrices, and unitary matrices are unitarily diagonalizable. More generally, this is true for normal matrices, i.e., for matrices \(A\in \mathbb {C}^{n,n}\) satisfying \(AA^{H}=A^{H}A\) - a class of matrices that generalizes the classes of Hermitian, skew-Hermitian, and unitary matrices. The fact that unitary matrices can be used as transformation matrices for diagonalizing normal matrices is important because of two fundamental properties: firstly, similarity transformations with unitary matrices preserve the given Hermitian, skew-Hermitian, unitary, or normal structure of the matrix - a fact that is crucial for the development of structure-preserving algorithms for the solution of the given eigenvalue problem - and secondly, unitary matrices are optimally conditioned. Therefore, the normal eigenvalue problem can be considered a well-behaved problem in the sense that a diagonalization can be performed by backward stable and structure-preserving algorithms.

The picture changes drastically if one considers matrices that carry a symmetry structure with respect to an indefinite inner product, i.e., with respect to a nondegenerate Hermitian form that is not necessarily positive definite, or with respect to a nondegenerate skew-Hermitian form. An important example is the class of Hamiltonian matrices, i.e., matrices \(A\in \mathbb {R}^{2n,2n}\) satisfying \(A^{T}J+JA=0\), where J denotes the skew-symmetric matrix

$$ J=\left[\begin{array}{cc} 0&I_{n}\\ -I_{n}&0 \end{array}\right]. $$
(1)

The identity \(A^{T}J=-JA\) can be interpreted as skew-symmetry of the matrix A with respect to the skew-symmetric inner product induced by J. The corresponding Hamiltonian eigenvalue problem arises in many applications, e.g., in system theory and the theory of algebraic Riccati equations, see [1, 5, 11] and the references therein. For practical reasons, one often switches to the complex version of the problem, which leads to the consideration of matrices \(A\in \mathbb {C}^{2n,2n}\) satisfying \(A^{H}J+JA=0\). Typically, these matrices are also called Hamiltonian in the Numerical Linear Algebra community and we will follow this convention in this paper. It should be noted, though, that other communities prefer the terminology J-skew-adjoint for such matrices (e.g., see [2]) in order to avoid confusion with complex matrices \(A\in \mathbb {C}^{2n,2n}\) satisfying \(A^{T}J+JA=0\), which are called Hamiltonian as well.
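For illustration, in the smallest case n = 1 the complex Hamiltonian matrices are exactly the matrices of the form

$$ A=\left[\begin{array}{cc} a&b\\ c&-\overline{a} \end{array}\right],\quad a\in\mathbb{C},\ b,c\in\mathbb{R}, $$

since \(A^{H}J+JA=0\) is equivalent to JA being Hermitian.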

For the solution of the Hamiltonian eigenvalue problem, the so-called Hamiltonian Schur form was suggested in [12] as a target form. This is a Hamiltonian matrix of the block form

$$ \left[\begin{array}{cc} R&B\\ 0&-R^{H} \end{array}\right], $$
(2)

where \(R\in \mathbb {C}^{n,n}\) is upper triangular. It is straightforward to see that this is just a permutation of the classical upper triangular Schur form of a complex matrix and, as a consequence, the eigenvalues can be read off from the diagonal. It was shown in [12] that this form can always be achieved under a unitary symplectic similarity transformation provided that the given Hamiltonian matrix does not have eigenvalues on the imaginary axis. Recall that a matrix \(S\in \mathbb {C}^{2n,2n}\) is called symplectic (following the convention in the Numerical Linear Algebra community) if \(S^{H}JS=J\). It is easily checked that similarity transformations with symplectic matrices preserve the Hamiltonian structure and are therefore the basis of structure-preserving algorithms for the solution of the Hamiltonian eigenvalue problem. However, since the condition number of symplectic matrices may be arbitrarily large, it is favorable to further restrict oneself to the class of unitary symplectic similarity transformations in order to guarantee numerical stability.
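The two facts just mentioned are easy to check numerically. The following NumPy sketch (the random construction via the matrix exponential is ours and only one of several standard ways to produce symplectic matrices) verifies that \(A=J^{-1}W\) with Hermitian W is Hamiltonian, that \(S=e^{A}\) is symplectic, and that the symplectic similarity \(S^{-1}AS\) is again Hamiltonian:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 3
Z = np.zeros((n, n))
J = np.block([[Z, np.eye(n)], [-np.eye(n), Z]])

# A complex Hamiltonian matrix: A = J^{-1} W with W Hermitian gives A^H J + J A = 0.
W = rng.standard_normal((2 * n, 2 * n)) + 1j * rng.standard_normal((2 * n, 2 * n))
W = (W + W.conj().T) / 2
A = np.linalg.solve(J, W) / 4        # scaled so that expm stays well conditioned
assert np.allclose(A.conj().T @ J + J @ A, 0)

# The exponential of a Hamiltonian matrix is symplectic: S^H J S = J ...
S = expm(A)
assert np.allclose(S.conj().T @ J @ S, J)

# ... and a symplectic similarity preserves the Hamiltonian structure.
B = np.linalg.solve(S, A @ S)        # B = S^{-1} A S
assert np.allclose(B.conj().T @ J + J @ B, 0)
```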

Surprisingly, there are Hamiltonian matrices that cannot be transformed to Hamiltonian Schur form via a unitary symplectic similarity transformation. As an obvious example, consider the matrix J, which is both Hamiltonian and symplectic (and even unitary). Clearly, if \(U\in \mathbb {C}^{2n,2n}\) is unitary and symplectic then \(U^{-1}JU=U^{H}JU=J\), so J is invariant under any unitary symplectic similarity transformation and hence cannot be transformed to Hamiltonian Schur form. It is clear from the form in (2) that a necessary condition for the existence of a Hamiltonian Schur form is that the purely imaginary eigenvalues of the given matrix have even multiplicity. Indeed, if λ is an eigenvalue of R then \(-\overline {\lambda }\) is an eigenvalue of \(-R^{H}\), so any purely imaginary eigenvalue of R will also be one of \(-R^{H}\). The example of J however shows that this condition is not sufficient. The long open problem of characterizing all Hamiltonian matrices that can be transformed to Hamiltonian Schur form was finally solved in [6] with the help of a newly developed structured canonical form for Hamiltonian matrices.

At first sight, one may come to the conclusion that the nonexistence of the Hamiltonian Schur form is related to the fact that in contrast to the Euclidean inner product the two fundamental properties of structure-preservation and numerical stability are now partitioned among two sets of transformation matrices instead of only one. Thus, to enable both features, one has to restrict oneself to the set of unitary symplectic matrices which is a much “smaller” subset (in terms of dimension as a manifold) than the sets of symplectic matrices or unitary matrices. This conclusion, however, turns out to be wrong as it is well-known that the Hamiltonian Schur form exists under unitary symplectic similarity transformations if and only if it exists under similarity transformations that are symplectic only. This equivalence can easily be shown by applying a structure-preserving QR decomposition to the transformation matrix, see, e.g., [6] for details. Thus, both in the case of normal matrices with respect to the Euclidean inner product and in the case of Hamiltonian matrices, the actual problem is the computation of a Schur-like form by similarity transformations from the group of matrices that are unitary with respect to the considered inner product and therefore preserve the given symmetry structure of the matrix they are acting on.

We will show in this paper that the diagonal Schur form of normal matrices and the Hamiltonian Schur form of Hamiltonian matrices are two extreme cases of a much more general Schur-like form for matrices carrying a symmetry structure with respect to an indefinite inner product. To treat the problem in full generality, we will consider a generalization of normality of matrices in an indefinite inner product space, the so-called polynomial H-normality which will be introduced in Section 2 where we will also review the basic theory of indefinite inner products. In Section 3 we will formulate the main result of this paper and develop a Schur-like form for polynomially H-normal matrices for the case of a Hermitian and unitary Gram matrix H. Then we will discuss how the Schur form of normal matrices and the Hamiltonian Schur form can be deduced as special cases of this result. The proof of the main result will then be given in Section 4 followed by a short summary in Section 5.

Notation

By \(\mathcal {J}_{n}(\lambda )\) we denote the n × n upper triangular Jordan block associated with the eigenvalue λ. The reverse identity of size n is denoted by \(R_{n}\), i.e.,

$$ R_{n}=\left[\begin{array}{ccc} 0&&1\\ &\iddots&\\ 1&&0 \end{array}\right]\in\mathbb{C}^{n,n}, $$

the matrix with ones on the antidiagonal and zeros elsewhere.

If \(H\in \mathbb {C}^{n,n}\) is a Hermitian matrix, then its inertia index is denoted by (π,ν,ζ), where π, ν, and ζ are the numbers of positive, negative and zero eigenvalues, respectively, each counted with algebraic multiplicities.

2 Indefinite Inner Products and Polynomially H-normal Matrices

Let \(H\in \mathbb {C}^{n,n}\) be Hermitian and invertible. Then H defines an indefinite inner product on \(\mathbb {C}^{n}\) via

$$ {[x,y]}:=x^{H}Hy,\quad x,y\in\mathbb{C}^{n}. $$

As in the case of a positive definite inner product, we will call H the Gram matrix of the indefinite inner product. If \(A\in \mathbb {C}^{n,n}\), then the matrix \(A^{[\ast]}:=H^{-1}A^{H}H\) is called the H-adjoint of A, because it is the unique matrix satisfying the identity \([Ax,y]=[x,A^{[\ast]}y]\) for all \(x,y\in \mathbb {C}^{n}\). These notions are illustrated by a short numerical sketch after the following list. The matrix A is called

  • H-selfadjoint, if \(A^{[\ast]}=A\), or, equivalently, \(A^{H}H=HA\),

  • H-skew-adjoint, if \(A^{[\ast]}=-A\), or, equivalently, \(A^{H}H+HA=0\),

  • and H-unitary, if \(A^{[\ast]}=A^{-1}\), or, equivalently, \(A^{H}HA=H\).
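The following minimal NumPy sketch (the matrices are hypothetical toy examples of ours) computes the H-adjoint and verifies two of the structure classes above:

```python
import numpy as np

def h_adjoint(A, H):
    """H-adjoint A^[*] = H^{-1} A^H H of A."""
    return np.linalg.solve(H, A.conj().T @ H)

H = np.diag([1.0, -1.0])                   # Hermitian, invertible Gram matrix

A = np.array([[1.0, 2.0], [-2.0, 3.0]])    # H-selfadjoint: A^H H = H A
assert np.allclose(h_adjoint(A, H), A)

B = np.array([[0.0, 1.0], [1.0, 0.0]])     # H-skew-adjoint: B^H H + H B = 0
assert np.allclose(h_adjoint(B, H), -B)
```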

In the following, we will restrict ourselves to Hermitian inner products, because any skew-Hermitian inner product can easily be transformed to a Hermitian inner product by multiplying the corresponding Gram matrix with the imaginary unit i. In particular, a matrix \(A\in \mathbb {C}^{2n,2n}\) is Hamiltonian if and only if A is (iJ)-skew-adjoint, where J is the matrix as in (1).

Canonical forms for H-selfadjoint, H-skew-adjoint, and H-unitary matrices are well known, see, e.g., [2, 8]. More generally, one can define the set of H-normal matrices as the set of all matrices \(A\in \mathbb {C}^{n,n}\) satisfying \(A^{[\ast]}A=AA^{[\ast]}\). Unfortunately, this set turns out to be "too big", because it was shown in [3] that the problem of classifying H-normal matrices is a wild problem and hence canonical forms cannot be obtained. Therefore, it was suggested in [10] to consider the set of polynomially H-normal matrices instead. A matrix \(A\in \mathbb {C}^{n,n}\) is called polynomially H-normal if there exists a polynomial p in one variable such that \(A^{[\ast]}=p(A)\). The polynomial p is then called the H-normality polynomial of A. It is easily checked that any polynomially H-normal matrix is H-normal, but the converse is not true, see [10]. Still, the set of polynomially H-normal matrices is large enough to contain the sets of H-selfadjoint, H-skew-adjoint, and H-unitary matrices. Indeed, if the matrix A is H-selfadjoint or H-skew-adjoint, then it is polynomially H-normal with H-normality polynomial p(t) = t or p(t) = −t, respectively, and since \(A^{[\ast]}=A^{-1}\) for an H-unitary matrix A and the inverse of a matrix is always a polynomial in that matrix, any H-unitary matrix is polynomially H-normal as well.
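For instance, by the Cayley-Hamilton theorem any invertible \(A\in\mathbb{C}^{2,2}\) satisfies \(A^{-1}=\frac{1}{\det A}\left((\operatorname{tr} A)I_{2}-A\right)\), so a 2 × 2 H-unitary matrix is polynomially H-normal with the explicit H-normality polynomial

$$ p(t)=\frac{\operatorname{tr} A-t}{\det A}. $$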

The major advantage of polynomially H-normal matrices over H-normal matrices is the fact that a complete classification is available under the following equivalence relation.

Remark 1

Let \(H\in \mathbb {C}^{n,n}\) be Hermitian and invertible and let \(A\in \mathbb {C}^{n,n}\) be polynomially H-normal with H-normality polynomial p. If \(T\in \mathbb {C}^{n,n}\) is invertible, then \(T^{-1}AT\) is polynomially \(T^{H}HT\)-normal with H-normality polynomial p. In particular, the relation

$$ (A_{1},H_{1}) \sim (A_{2},H_{2}):~\Leftrightarrow~\exists T\in GL_{n}(\mathbb{C}):A_{2}=T^{-1}A_{1}T\wedge H_{2}=T^{H}H_{1}T $$
(3)

is an equivalence relation on the set of pairs (A,H), where H is Hermitian and invertible and A is polynomially H-normal with H-normality polynomial p.

The following canonical form for polynomially H-normal matrices was developed in [8, Theorem 6.1]. Here, \(T(\alpha _{1},\dots ,\alpha _{n})\) denotes an upper triangular n × n Toeplitz matrix that has \(\left [\begin {array}{ccc} \alpha _{1}&\dots &\alpha _{n} \end {array}\right ]\) as its first row.
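For example, in the case n = 3 we have

$$ T(\alpha_{1},\alpha_{2},\alpha_{3})=\left[\begin{array}{ccc} \alpha_{1}&\alpha_{2}&\alpha_{3}\\ 0&\alpha_{1}&\alpha_{2}\\ 0&0&\alpha_{1} \end{array}\right]. $$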

Theorem 1 (Canonical form for polynomially H-normal matrices)

Let the matrix \(A\in \mathbb {C}^{n,n}\) be polynomially H-normal with H-normality polynomial p. Then there exists a nonsingular matrix T such that

$$ T^{-1}AT=A_{1}\oplus\cdots\oplus A_{q},\quad T^{H} HT=H_{1}\oplus\cdots\oplus H_{q}, $$
(4)

where Aj is Hj-indecomposable and where Aj and Hj have one of the following forms:

  i)

    blocks associated with eigenvalues \(\lambda _{j}\in \mathbb {C}\) satisfying \(p(\lambda _{j})=\overline {\lambda _{j}}\):

    $$ A_{j}=\lambda_{j} I_{n_{j}} + e^{i\theta_{j}}T(0,1,ir_{j,2},\dots,ir_{j,n_{j}-1}),\quad H_{j}=\sigma_{j} R_{n_{j}}, $$
    (5)

    where \(n_{j}\in \mathbb {N}\), \(\sigma_{j}\in\{1,-1\}\), \(\theta_{j}\in[0,\pi)\), and \(r_{j,2},\dots ,r_{j,n_{j}-1}\in \mathbb {R}\);

  ii)

    blocks associated with a pair (λj,μj) of eigenvalues with \(\mu _{j}=\overline {p(\lambda _{j})} \neq \lambda _{j}\), \(\overline {p(\mu _{j})} = \lambda _{j}\), and Re(λj) > Re(μj) or Im(λj) > Im(μj) if Re(λj) = Re(μj):

    $$ A_{j}=\left[ \begin{array}{cc} \mathcal{J}_{m_{j}}(\lambda_{j})&0\\ 0& p\left( \mathcal{J}_{m_{j}}(\lambda_{j})\right)^{H} \end{array}\right],\quad H_{j}=\left[ \begin{array}{cc} 0 & I_{m_{j}}\\ I_{m_{j}} & 0 \end{array}\right], $$
    (6)

    where \(m_{j}\in \mathbb {N}\).

Moreover, the form (4) is unique up to the permutation of blocks, and the parameters \(\theta_{j}\) and \(r_{j,2},\dots ,r_{j,n_{j}-1}\) in (5) are uniquely determined by \(\lambda_{j}\) and the coefficients of p; they can be computed from the identity

$$ \overline{\lambda_{j}}I_{n_{j}} + e^{-i\theta_{j}}T(0,1,-ir_{j,2},\dots,-ir_{j,n_{j}-1}) = p\!\left( \lambda_{j} I_{n_{j}} + e^{i\theta_{j}} T(0,1,ir_{j,2},\dots,ir_{j,n_{j}-1})\right). $$

(We highlight that the eigenvalues λj in i) are not necessarily pairwise distinct, i.e., the same eigenvalue may occur in different blocks. The same is true for the eigenvalues λj,μj in ii).)

Besides the eigenvalues and their partial multiplicities, the signs σj = ± 1 attached to each Jordan block corresponding to an eigenvalue λj satisfying \(\overline {\lambda }_{j}=p(\lambda _{j})\) are additional invariants under the equivalence relation (3). The list of all signs associated with a fixed eigenvalue λj is referred to as the sign characteristic of the eigenvalue λj extending the terminology in [2] used for H-selfadjoint and H-unitary matrices.

The following values related to the sign characteristic of a fixed eigenvalue will play a crucial role in the characterization of when a structured Schur-like form exists.

Definition 1

Let \(H\in \mathbb {C}^{n,n}\) be Hermitian and invertible, let \(A\in \mathbb {C}^{n,n}\) be polynomially H-normal with H-normality polynomial p, and let \(\lambda \in \mathbb {C}\) be an eigenvalue of A that satisfies \(\overline {\lambda }=p(\lambda )\). Then the sum of all signs σj from the sign characteristic of λ that are attached to blocks of odd size is called the sign sum of λ and is denoted by signsum(λ).

To illustrate Definition 1 consider the matrices

$$ \begin{array}{@{}rcl@{}} A&=&\mathcal{J}_{5}(0)\oplus\mathcal{J}_{4}(0)\oplus\mathcal{J}_{3}(0)\oplus\mathcal{J}_{3}(0)\oplus\mathcal{J}_{3}(0)\oplus\mathcal{J}_{2}(0)\oplus\mathcal{J}_{1}(0),\\ H&=&\sigma_{1}R_{5}\oplus\sigma_{2}R_{4}\oplus\sigma_{3}R_{3}\oplus\sigma_{4}R_{3}\oplus\sigma_{5}R_{3}\oplus\sigma_{6}R_{2}\oplus\sigma_{7}R_{1} \end{array} $$

with σi ∈{1,− 1} for \(i=1,\dots ,7\). Then A is polynomially H-normal with H-normality polynomial p(t) = t (in fact, A is H-selfadjoint), and \(\overline {0}=p(0)\). The sign sum of the eigenvalue λ = 0 is then given by

$$ \operatorname{signsum}(0)=\sigma_{1}+\sigma_{3}+\sigma_{4}+\sigma_{5}+\sigma_{7}. $$

Note that in accordance with Definition 1 the values σ2 and σ6 do not contribute to the sign sum as they are attached to blocks of the even sizes 4 and 2, respectively.

The sign sum has an important impact on the inertia index of the given Hermitian matrix defining the indefinite inner product, as we will show in the following lemma.

Lemma 1

Let \(H\in \mathbb {C}^{n,n}\) be Hermitian with inertia index (π,ν,0) and let \(A\in \mathbb {C}^{n,n}\) be polynomially H-normal with H-normality polynomial p. If \(\lambda _{1},\dots ,\lambda _{r}\in \mathbb {C}\) are the pairwise distinct eigenvalues of A satisfying \(\overline {\lambda _{j}}=p(\lambda _{j})\), then

$$ \pi-\nu={\sum}_{j=1}^{r}\operatorname{signsum}(\lambda_{j}). $$

Proof

The proof immediately follows by inspection from the canonical form given in Theorem 1. Indeed, one easily checks that the matrices Hj from blocks of type ii) and blocks of type i) corresponding to an even size nj contribute equally to the positive and negative eigenvalues of H. On the other hand, the matrix \(H_{j}=\sigma _{j}R_{n_{j}}\) of a block as in type i) that corresponds to an odd size nj = 2k + 1 has inertia index (k + 1,k,0) if σj = 1, and inertia index (k,k + 1,0) if σj = − 1. □
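For instance, for a single block of type i) with \(n_{j}=3\) and \(\sigma_{j}=1\), i.e., \(H_{j}=R_{3}\), the counting in the proof can be made concrete:

$$ R_{3}(e_{1}+e_{3})=e_{1}+e_{3},\quad R_{3}e_{2}=e_{2},\quad R_{3}(e_{1}-e_{3})=-(e_{1}-e_{3}), $$

so \(R_{3}\) has inertia index \((2,1,0)=(k+1,k,0)\) with \(k=1\).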

Finally, we recall the concept of neutral subspaces in indefinite inner product spaces. If \(H\in \mathbb {C}^{n,n}\) is Hermitian and invertible, then a subspace \(\mathcal {V} \subseteq \mathbb {C}^{n}\) is called H-neutral if [v,w] = 0 (or, equivalently, \(v^{H}Hw=0\)) for all \(v,w\in \mathcal {V}\). It is well known that if (π,ν,0) is the inertia index of H, then the maximal possible dimension of an H-neutral subspace is equal to \(k=\frac {n-|\pi -\nu |}{2}=\min \{\pi ,\nu \}\). An H-neutral subspace of this maximal dimension k is called a maximal H-neutral subspace.
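For example, for \(H=\operatorname{diag}(1,1,-1)\) we have \((\pi,\nu,\zeta)=(2,1,0)\) and hence \(k=1\); the subspace \(\operatorname{span}\{e_{1}+e_{3}\}\) is a maximal H-neutral subspace, since \((e_{1}+e_{3})^{H}H(e_{1}+e_{3})=1-1=0\).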

3 Schur-like Forms and Invariant Maximal H-neutral Subspaces

In this section, we will develop the main result of this paper: the introduction of a structured Schur-like form for polynomially H-normal matrices, combined with a characterization of its existence. As pointed out in the introduction, it was shown in [6] that a Hamiltonian matrix can be transformed to Hamiltonian Schur form via a unitary symplectic similarity transformation if and only if the same can be done via a similarity transformation that is only symplectic. An analysis of the corresponding proof in [6] reveals that the unitarity of the matrix J in (1) is crucial for this equivalence. Therefore, we will assume throughout this section that the Gram matrix of the given indefinite inner product is not only Hermitian, but also unitary. Many important examples of Gram matrices such as

$$ R_{n},\quad \left[\begin{array}{cc} 0&I_{n}\\ I_{n}&0 \end{array}\right], \quad \left[ \begin{array}{cc} I_{p}&0\\ 0&-I_{q} \end{array}\right] $$

satisfy this extra condition. (We mention in passing that properties of inner products that are either Hermitian or unitary are discussed in [7].)

Theorem 2

Let \(H\in \mathbb {C}^{n,n}\) be unitary and Hermitian with inertia index (π,ν,0) and let \(A\in \mathbb {C}^{n,n}\) be polynomially H-normal with H-normality polynomial p. Furthermore, let \(m:=|\pi-\nu|\). Then \(n-m\) is even and the following statements are equivalent, where \(k:=\frac {n-m}{2}\).

  1)

    There exists an H-neutral subspace of dimension k that is A-invariant.

  2)

    There exists a unitary matrix \(U\in \mathbb {C}^{n,n}\) such that

    $$ U^{-1}AU=\left[ \begin{array}{ccc} B_{11}&B_{12}&B_{13}\\ 0&p(B_{11})^{H}&0\\ 0&B_{32}&B_{33} \end{array}\right] \quad \text{and}\quad U^{H}HU=\left[ \begin{array}{ccc} 0&I_{k}&0\\ I_{k}&0&0\\ 0&0&sI_{m} \end{array}\right], $$
    (7)

    where \(s=\frac {\pi -\nu }{|\pi -\nu |}\) (if m≠ 0 and thus \(\pi-\nu\neq 0\); otherwise let s := 1), \(B_{11}\in \mathbb {C}^{k,k}\) is upper triangular, and \(B_{33}\in \mathbb {C}^{m,m}\) is diagonal.

  3)

    There exists an invertible matrix \(S\in \mathbb {C}^{n,n}\) such that

    $$ S^{-1}AS=\left[ \begin{array}{ccc} B_{11}&B_{12}&B_{13}\\ 0&p(B_{11})^{H}&0\\ 0&B_{32}&B_{33} \end{array}\right] \quad\text{and}\quad S^{H}HS=\left[ \begin{array}{ccc} 0&I_{k}&0\\ I_{k}&0&0\\ 0&0&sI_{m} \end{array}\right], $$
    (8)

    where \(s=\frac {\pi -\nu }{|\pi -\nu |}\) (if m≠ 0 and thus \(\pi-\nu\neq 0\); otherwise let s := 1), \(B_{11}\in \mathbb {C}^{k,k}\) is upper triangular, and \(B_{33}\in \mathbb {C}^{m,m}\) is diagonal.

  4)

    Let \(\lambda _{1},\dots ,\lambda _{r}\in \mathbb {C}\) be the pairwise distinct eigenvalues of A satisfying the equation \(\overline {\lambda }=p(\lambda )\). Then s ⋅ signsum(λi) ≥ 0 for \(i=1,\dots ,r\) and

    $$ {\sum}_{i=1}^{r}\operatorname{signsum}(\lambda_{i})=sm. $$

Proof

The proof of Theorem 2 is rather long and will therefore be presented in a separate section. □

As mentioned in the introduction, Theorem 2 combines and generalizes two important results from the literature that we will restate below as corollaries. The first result recovers the well-known unitary diagonalizability of (\(I_{n}\)-)normal matrices.

Corollary 1 (Schur-form of normal matrices)

Let \(A\in \mathbb {C}^{n,n}\) be a normal matrix, i.e., \(A^{H}A=AA^{H}\). Then A is unitarily diagonalizable.

Proof

By [4], normality with respect to the Euclidean inner product is equivalent to polynomial \(I_{n}\)-normality and hence Theorem 2 can be applied with \(H=I_{n}\). Since \(I_{n}\) has the inertia index (n,0,0), we find that k = 0. Thus, condition 2) of Theorem 2 states the existence of a unitary matrix \(U\in \mathbb {C}^{n,n}\) such that \(U^{-1}AU=B_{33}\) is diagonal. □

Corollary 2 (Hamiltonian Schur-form of Hamiltonian matrices)

Let \(A\in \mathbb {C}^{2n,2n}\) be a Hamiltonian matrix, i.e., \(A^{H}J+JA=0\). Then the following statements are equivalent:

  1)

    There exists an n-dimensional subspace of \(\mathbb {C}^{2n}\) that is J-neutral and A-invariant.

  2)

    There exists a unitary symplectic matrix \(Q\in \mathbb {C}^{2n,2n}\) such that \(Q^{-1}AQ\) is in Hamiltonian Schur form, i.e.,

    $$ Q^{-1}AQ=\left[ \begin{array}{cc} B&C\\ 0&-B^{H} \end{array}\right], $$

    where \(B\in \mathbb {C}^{n,n}\) is upper triangular.

  3)

    There exists a symplectic matrix \(S\in \mathbb {C}^{2n,2n}\) such that \(S^{-1}AS\) is in Hamiltonian Schur form, i.e.,

    $$ S^{-1}AS=\left[ \begin{array}{cc} B&C\\ 0&-B^{H} \end{array}\right], $$

    where \(B\in \mathbb {C}^{n,n}\) is upper triangular.

  4)

    For any purely imaginary eigenvalue λ of A, the number of odd partial multiplicities corresponding to λ with sign + 1 is equal to the number of odd partial multiplicities corresponding to λ with sign − 1.

Proof

First, we recall that A is Hamiltonian if and only if A is H-skew-adjoint with H = iJ. In particular, A is polynomially H-normal with H-normality polynomial p(t) = −t. We will frequently make use of this fact in the following.

“1) ⇒ 2)”: It is trivial that the J-neutral subspace in 1) is also iJ-neutral. Moreover, the inertia index of iJ is (n,n,0) and thus, Theorem 2 implies the existence of a unitary matrix \(U\in \mathbb {C}^{2n,2n}\) such that

$$ U^{-1}AU=\left[ \begin{array}{cc} B&C\\ 0&-B^{H} \end{array}\right]\quad\text{and}\quad U^{H}(iJ)U=\left[ \begin{array}{cc} 0&I_{n}\\ I_{n}&0 \end{array}\right]. $$

Setting \(Q:=U\cdot \operatorname{diag}(I_{n},iI_{n})\), we obtain that

$$ Q^{-1}AQ=\left[\begin{array}{cc} B&iC\\ 0&-B^{H} \end{array}\right] \quad\text{and}\quad Q^{H}JQ=\left[ \begin{array}{cc} 0&I_{n}\\ -I_{n}&0 \end{array}\right]=J, $$

i.e., Q is unitary and symplectic. This implies 2).
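For completeness, the second identity can be verified directly: from \(U^{H}(iJ)U\) as above we get \(U^{H}JU=-iU^{H}(iJ)U\), and therefore

$$ Q^{H}JQ=-i\left[\begin{array}{cc} I_{n}&0\\ 0&-iI_{n} \end{array}\right] \left[\begin{array}{cc} 0&I_{n}\\ I_{n}&0 \end{array}\right] \left[\begin{array}{cc} I_{n}&0\\ 0&iI_{n} \end{array}\right] =\left[\begin{array}{cc} 0&I_{n}\\ -I_{n}&0 \end{array}\right]=J. $$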

“2) ⇒ 3)” is trivial, and “3) ⇒ 4)” and “4) ⇒ 1)” follow immediately from Theorem 2, taking into account that the eigenvalues satisfying \(\overline {\lambda }=p(\lambda )=-\lambda \) are exactly the purely imaginary eigenvalues of A, and that signsum(λ) = 0 is equivalent to the statement that the number of odd partial multiplicities corresponding to λ with sign + 1 is equal to the number of odd partial multiplicities corresponding to λ with sign − 1. □

The equivalence of 2), 3) and 4) was proved in [6], while the equivalence of 1) and 4) was proved in [13]. Clearly, the equivalence of 1) and 2) - or 1) and 3) - follows immediately from those two results, and it has since been implicitly known to many researchers dealing with Hamiltonian matrices. Nevertheless, it seems that a theorem combining all four equivalent conditions into a single result was explicitly formulated only as late as [9].

Remark 2

The two results in Corollaries 1 and 2 represent the two extreme cases k = 0 and m = 0 in Theorem 2. Whenever \(k,m\neq 0\), as is the case for Gram matrices of the form \(\operatorname{diag}(I_{p},-I_{q})\) with \(p\neq q\), the corresponding Schur-like form will have triangular and diagonal parts on the block diagonal as indicated in (7).

We highlight that the transformation in Theorem 2 changes the inner product, but keeps the symmetry structure of the matrix \(U^{-1}AU\) linked to the transformed Gram matrix \(U^{H}HU\) in the sense of Remark 1. For the practical use of Theorem 2, we advise first transforming the pair (A,H) to a form \((A^{\prime },H^{\prime })\), where \(H^{\prime }\) already has the form

$$ H^{\prime}=\left[ \begin{array}{ccc} 0&I_{k}&0\\ I_{k}&0&0\\ 0&0&sI_{m} \end{array}\right] $$

with s ∈{± 1}. When Theorem 2 is then applied to the pair \((A^{\prime },H^{\prime })\), it yields the existence of a unitary matrix U such that \(U^{-1}A^{\prime }U\) is in the Schur-like form (7) while \(U^{H}H^{\prime }U=H^{\prime }\). The latter condition just means that the matrix U is not only unitary, but also \(H^{\prime }\)-unitary.
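One such preprocessing congruence can be constructed explicitly. The following NumPy sketch (the function and its name are ours, not from the references) exploits that a Hermitian unitary H has only the eigenvalues ±1 and pairs positive with negative eigenvectors to form the hyperbolic pairs that make up the off-diagonal identity blocks of \(H^{\prime}\); the returned T is unitary as well:

```python
import numpy as np

def canonical_gram_transform(H, tol=1e-8):
    """For Hermitian unitary H, return a unitary T with
    T^H H T = [[0, I_k, 0], [I_k, 0, 0], [0, 0, s*I_m]]."""
    w, V = np.linalg.eigh(H)                 # eigenvalues of H are +1 and -1
    pos, neg = V[:, w > tol], V[:, w < -tol]
    p, nu = pos.shape[1], neg.shape[1]
    k, s = min(p, nu), (1 if p >= nu else -1)
    r2 = 1.0 / np.sqrt(2.0)
    # Hyperbolic pairs: x^H H x = y^H H y = 0 and x^H H y = 1.
    X = r2 * (pos[:, :k] + neg[:, :k])
    Y = r2 * (pos[:, :k] - neg[:, :k])
    rest = pos[:, k:] if p >= nu else neg[:, k:]
    return np.hstack([X, Y, rest])

# Example: H = iJ for n = 2 (inertia (2, 2, 0), so k = 2 and m = 0).
J = np.block([[np.zeros((2, 2)), np.eye(2)], [-np.eye(2), np.zeros((2, 2))]])
T = canonical_gram_transform(1j * J)
target = np.block([[np.zeros((2, 2)), np.eye(2)], [np.eye(2), np.zeros((2, 2))]])
assert np.allclose(T.conj().T @ (1j * J) @ T, target)
```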

4 Proof of the Main Result

Before we prove Theorem 2, we start with a technical lemma that will be used frequently in the following.

Lemma 2

Let \(A_{1},H_{1}\in \mathbb {C}^{n_{1},n_{1}}\) and \(A_{2},H_{2}\in \mathbb {C}^{n_{2},n_{2}}\), where H1 and H2 are Hermitian and invertible, and let

$$ A=\left[ \begin{array}{cc} A_{1}&0\\ 0&A_{2} \end{array}\right],\quad H = \left[ \begin{array}{cc} H_{1}&0\\ 0&H_{2} \end{array}\right]. $$

If A1 has an invariant H1-neutral subspace of dimension k1 and A2 has an invariant H2-neutral subspace of dimension k2, then A has an invariant H-neutral subspace of dimension k1 + k2.

Proof

Let the vectors \(v_{1},\dots ,v_{k_{1}}\in \mathbb {C}^{n_{1}}\) and \(w_{1},\dots ,w_{k_{2}}\in \mathbb {C}^{n_{2}}\) form bases of the Ai-invariant Hi-neutral subspaces for i = 1,2, respectively. Then it is straightforward to verify that the vectors

$$ \left[ \begin{array}{c} v_{1}\\ 0 \end{array}\right],\dots,\left[ \begin{array}{c} v_{k_{1}}\\ 0 \end{array}\right],\left[ \begin{array}{c} 0\\ w_{1} \end{array}\right],\dots,\left[ \begin{array}{c} 0\\ w_{k_{2}} \end{array}\right]\in\mathbb{C}^{n_{1}+n_{2}} $$

form a basis of an A-invariant H-neutral subspace which obviously has dimension k1 + k2. □

Proof of Theorem 2

“1) ⇒ 2)”: By switching to an orthonormal basis whose first k vectors span the A-invariant H-neutral subspace, we can assume that A and H have the forms

$$ A=\left[ \begin{array}{cc} A_{11}&A_{12}\\ 0&A_{22} \end{array}\right]\quad\text{and}\quad H=\left[ \begin{array}{cc} 0&H_{12}\\ H_{12}^{H}&H_{22} \end{array}\right], $$

where \(A_{22},H_{22}\in \mathbb {C}^{n-k,n-k}\). Since H is unitary, the rows of \(H_{12}\in \mathbb {C}^{k,n-k}\) are orthonormal and consequently its singular value decomposition takes the form

$$ H_{12}=U_{2}\left[ \begin{array}{cc} I_{k} & 0 \end{array}\right] V_{2}^{H}, $$

where \(U_{2}\in \mathbb {C}^{k,k}\) and \(V_{2}\in \mathbb {C}^{n-k,n-k}\) are unitary. Setting Q := diag(U2,V2) we obtain

$$ Q^{-1}AQ=\left[ \begin{array}{ccc} B_{11}&B_{12}&B_{13}\\ 0&B_{22}&B_{23}\\ 0&B_{32}&B_{33} \end{array}\right] \quad\text{and}\quad Q^{H}HQ=\left[ \begin{array}{ccc} 0&I_{k}&0\\ I_{k}&0&0\\ 0&0&H_{33} \end{array}\right], $$

where B11,B22 are k × k. The zeros in the block positions (2,2), (2,3), and (3,2) are due to the fact that \(Q^{H}HQ\) is still unitary and consequently has orthonormal columns. On the other hand, \(Q^{H}HQ\) is also Hermitian, and keeping in mind that its inertia index is (π,ν,0) and that \(k=\min \{\pi ,\nu \}\), the block structure implies that H33 is positive or negative definite, depending on π − ν being positive or negative, respectively. (If π = ν, then n = 2k and the (3,3)-blocks in \(Q^{-1}AQ\) and \(Q^{H}HQ\) are void.) Since H33 is also unitary, it follows that \(H_{33}=I_{m}\) or \(H_{33}=-I_{m}\). Applying a unitary transformation of the form diag(U1,U1,U3) with \(U_{1}\in \mathbb {C}^{k,k}\), \(U_{3}\in \mathbb {C}^{m,m}\) if necessary, we can assume without loss of generality that B11 and B33 are upper triangular. (Note that this transformation will not change \(Q^{H}HQ\).) Finally, we exploit the fact that A is polynomially H-normal with H-normality polynomial p. This implies

$$ \left[\begin{array}{ccc} B_{22}^{H}&B_{12}^{H}&B_{32}^{H}\\ 0&B_{11}^{H}&0\\ B_{23}^{H}&B_{13}^{H}&B_{33}^{H} \end{array}\right] = Q^{H}H^{-1}A^{H}HQ=p(Q^{-1}AQ)=\left[ \begin{array}{cc} p(B_{11})&\ast\\ 0&p\left( \left[ \begin{array}{cc} B_{22}&B_{23}\\ B_{32}&B_{33} \end{array}\right]\right) \end{array}\right] $$

and hence \(B_{22}=p(B_{11})^{H}\) and B23 = 0. But then we find that \(B_{33}^{H}=p(B_{33})\), which implies that B33 is normal (i.e., with respect to the Euclidean inner product) and hence diagonal, as upper triangular normal matrices are diagonal.
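The singular value claim used in this step - that a matrix with orthonormal rows has all singular values equal to 1 - is easy to confirm numerically; in the following sketch all sizes are hypothetical:

```python
import numpy as np

# A matrix with orthonormal rows has all singular values equal to 1,
# so its full SVD has the form H12 = U2 [I_k 0] V2^H.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2)))
H12 = Q.conj().T                     # 2 x 5 with orthonormal rows
U2, svals, V2h = np.linalg.svd(H12)
assert np.allclose(svals, 1.0)
```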

“2) ⇒ 3)”: This is trivial.

“3) ⇒ 4)”: The proof proceeds by induction on the number r of eigenvalues satisfying \(\overline {\lambda }=p(\lambda )\). If r = 0, then the block B33 in (8) is void, because it is diagonal and satisfies \(B_{33}^{H}=p(B_{33})\), so all its eigenvalues satisfy \(\overline {\lambda }=p(\lambda )\). Thus, we have m = 0 and 4) is satisfied by the definition of the empty sum.

Now assume that r > 0. Let λ := λ1, i.e., we have \(\overline {\lambda }=p(\lambda )\). Starting from 3) we can assume without loss of generality that A and H are in the form (8), where the eigenvalues on the diagonal of B11 and B33 are ordered in such a way that all occurrences of λ come first, i.e., A has the form

$$ A=\left[\begin{array}{cccccc} A_{11}&A_{12}&A_{13}&A_{14}&A_{15}&A_{16}\\ 0&A_{22}&A_{23}&A_{24}&A_{25}&A_{26}\\ 0&0&p(A_{11})^{H}&0&0&0\\ 0&0&A_{43}&p(A_{22})^{H}&0&0\\ 0&0&A_{53}&A_{54}&A_{55}&0\\ 0&0&A_{63}&A_{64}&0&A_{66} \end{array}\right], $$

where the diagonal blocks have the sizes \(k_{1},k_{2},k_{1},k_{2},m_{1},m_{2}\) with \(k_{1}+k_{2}=k\), \(m_{1}+m_{2}=m\), and \(k_{1},m_{1}\geq 0\), and where \(\sigma (A_{11}),\sigma (p({A}_{11})^{H}),\sigma (A_{55})\subseteq \{\lambda \}\) while λ is not contained in σ(A22), \(\sigma (p(A_{22})^{H})\), or σ(A66). If k1 > 0, then letting \(X\in \mathbb {C}^{k_{1},k_{2}}\) be the unique solution of the Sylvester equation \(XA_{22}-A_{11}X=A_{12}\) (which exists, because the spectra of A11 and A22 are disjoint) and defining \(\widehat {S}\) as the matrix that differs from the block-partitioned identity \(\mathcal {I}:=I_{k_{1}}\oplus I_{k_{2}}\oplus I_{k_{1}}\oplus I_{k_{2}}\oplus I_{m_{1}}\oplus I_{m_{2}}\) by adding X in the (1,2)-block position, we obtain that a similarity transformation on A with \(\widehat {S}\) annihilates the block entry A12 of A. The corresponding congruence transformation creates the entry X in the (3,2)-block position and the entry \(X^{H}\) in the (2,3)-block position of H. We can restore H by a congruence transformation with the matrix T which differs from the block-partitioned identity \(\mathcal {I}\) by having \(-X^{H}\) in the (4,3)-block position. The corresponding similarity applied to \(\widehat {S}^{-1}A \widehat {S}\) will change the block entries A13, A23, A43, A53, A63, but not the already established zero in the (1,2)-block position. For ease of notation, let us rename \(T^{-1}\widehat {S}^{-1}A\widehat {S}T\) by A and \(T^{H}\widehat {S}^{H}H\widehat {S}T\) by H. (Similar renaming steps will occur after each of the following steps without further notice.) We illustrate the A12-elimination step in the following diagram, where nonzero block entries that were affected by the current transformation are marked as bullets. In each substep, the pair (i,j) denotes the block entry of the transformation matrix that differs from the one in the identity matrix. Observe that the similarity transformation with such a matrix adds the i-th block column to the j-th block column, but the j-th block row to the i-th block row, while in the corresponding congruence transformation the i-th block column and i-th block row are added to the j-th block column or j-th block row, respectively:

$$ A:~\left[\begin{array}{cccccc} \ast&\ast&\ast&\ast&\ast&\ast\\ 0&\ast&\ast&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&\ast&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right] \underset{\leadsto}{(1,2)} \left[\begin{array}{cccccc} \ast&0&\bullet&\bullet&\bullet&\bullet\\ 0&\ast&\ast&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&\ast&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right] \underset{\leadsto}{(4,3)} \left[\begin{array}{cccccc} \ast&0&\bullet&\ast&\ast&\ast\\ 0&\ast&\bullet&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\bullet&\ast&0&0\\ 0&0&\bullet&\ast&\ast&0\\ 0&0&\bullet&\ast&0&\ast \end{array}\right], $$
$$ H:~\left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&0&0&0&0\\ 0&\ast&0&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right] \underset{\leadsto}{(1,2)} \left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&\bullet&\ast&0&0\\ \ast&\bullet&0&0&0&0\\ 0&\ast&0&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right] \underset{\leadsto}{(4,3)} \left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&0&0&0&0\\ 0&\ast&0&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right]. $$
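Numerically, such an elimination step can be mirrored with SciPy's Sylvester solver; the following sketch (block sizes and entries are hypothetical) solves \(XA_{22}-A_{11}X=A_{12}\) and verifies that the similarity annihilates the (1,2) block:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(1)
A11 = np.diag([1.0, 1.0])            # spectrum {1}
A22 = np.diag([2.0, 3.0])            # spectrum disjoint from that of A11
A12 = rng.standard_normal((2, 2))

# X A22 - A11 X = A12 is (-A11) X + X A22 = A12 in SciPy's AX + XB = Q convention.
X = solve_sylvester(-A11, A22, A12)

# Similarity with S = [[I, X], [0, I]] annihilates the (1,2) block of A.
S = np.block([[np.eye(2), X], [np.zeros((2, 2)), np.eye(2)]])
A = np.block([[A11, A12], [np.zeros((2, 2)), A22]])
B = np.linalg.solve(S, A @ S)        # B = S^{-1} A S
assert np.allclose(B[:2, 2:], 0)
```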

In the next step, we eliminate A16 by applying a similarity transformation on A that is obtained from \(\mathcal {I}\) by changing the (1,6)-block to the solution X of the Sylvester equation \(XA_{66}-A_{11}X=A_{16}\). The corresponding congruence transformation on H introduces the matrix X in the (3,6)-block position and \(X^{H}\) in the (6,3)-block position. We can restore H as follows: first, we apply a congruence with a matrix that differs from \(\mathcal {I}\) by \(-sX^{H}\) in the (6,3)-block position. This will annihilate the (3,6)- and (6,3)-block entries, but introduce the block \(-sXX^{H}\) in the (3,3)-block entry. The corresponding similarity transformation on A only affects the block entries A23 and A63. Then, we eliminate the (3,3)-block entry of H by a congruence transformation with the matrix that coincides with \(\mathcal {I}\) except for having \(\frac {s}{2}XX^{H}\) as its (1,3)-block. The corresponding similarity transformation on A only changes the block A13. As before, we illustrate this elimination step with the help of a diagram:

$$ A:~\left[\begin{array}{cccccc} \ast&0&\ast&\ast&\ast&\ast\\ 0&\ast&\ast&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&\ast&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right] \underset{\leadsto}{(1,6)} \left[\begin{array}{cccccc} \ast&0&\bullet&\bullet&\ast&0\\ 0&\ast&\ast&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&\ast&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right] \underset{\leadsto}{(6,3)} \left[\begin{array}{cccccc} \ast&0&\ast&\ast&\ast&0\\ 0&\ast&\bullet&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&\ast&\ast&0\\ 0&0&\bullet&\ast&0&\ast \end{array}\right] \underset{\leadsto}{(1,3)} \left[\begin{array}{cccccc} \ast&0&\bullet&\ast&\ast&0\\ 0&\ast&\ast&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&\ast&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right], $$
$$ H:\left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&0&0&0&0\\ 0&\ast&0&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right] \underset{\leadsto}{(1,6)} \left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&0&0&0&\bullet\\ 0&\ast&0&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&\bullet&0&0&\ast \end{array}\right] \underset{\leadsto}{(6,3)} \left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&\bullet&0&0&0\\ 0&\ast&0&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right] \underset{\leadsto}{(1,3)} \left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&0&0&0&0\\ 0&\ast&0&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right]. $$

We continue by eliminating A54 with a similarity transformation that differs from \(\mathcal {I}\) by the solution of the Sylvester equation \(Xp(A_{22})^{H}-A_{55}X=A_{54}\) in the (5,4)-block position, followed by transformations that restore H. Since this step is analogous to the A12-elimination step, we restrict ourselves to the illustration via a diagram:

$$ A:~\left[\begin{array}{cccccc} \ast&0&\ast&\ast&\ast&0\\ 0&\ast&\ast&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&\ast&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right] \underset{\leadsto}{(5,4)} \left[\begin{array}{cccccc} \ast&0&\ast&\bullet&\ast&0\\ 0&\ast&\ast&\bullet&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\bullet&0&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right] \underset{\leadsto}{(2,5)} \left[\begin{array}{cccccc} \ast&0&\ast&\ast&\ast&0\\ 0&\ast&\bullet&\ast&\bullet&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&0&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right], $$
$$ H:~\left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&0&0&0&0\\ 0&\ast&0&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right] \underset{\leadsto}{(5,4)} \left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&0&0&0&0\\ 0&\ast&0&0&\bullet&0\\ 0&0&0&\bullet&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right] \underset{\leadsto}{(2,5)} \left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&0&0&0&0\\ 0&\ast&0&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right]. $$

Finally, we eliminate A14 with a similarity transformation whose relevant entry, the solution of the Sylvester equation \(Xp(A_{22})^{H}-A_{11}X=A_{14}\), is in the (1,4)-block position. Restoring H imposes another similarity transformation on A with the relevant entry in the (2,3)-block position, which affects the entry A23 only. This step is depicted as follows:

$$ A:~\left[\begin{array}{cccccc} \ast&0&\ast&\ast&\ast&0\\ 0&\ast&\ast&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&0&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right] \underset{\leadsto}{(1,4)} \left[\begin{array}{cccccc} \ast&0&\bullet&0&\ast&0\\ 0&\ast&\ast&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&0&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right] \underset{\leadsto}{(2,3)} \left[\begin{array}{cccccc} \ast&0&\ast&0&\ast&0\\ 0&\ast&\bullet&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&0&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right], $$
$$ H:~\left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&0&0&0&0\\ 0&\ast&0&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right] \underset{\leadsto}{(1,4)} \left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&0&\bullet&0&0\\ 0&\ast&\bullet&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right] \underset{\leadsto}{(2,3)} \left[\begin{array}{cccccc} 0&0&\ast&0&0&0\\ 0&0&0&\ast&0&0\\ \ast&0&0&0&0&0\\ 0&\ast&0&0&0&0\\ 0&0&0&0&\ast&0\\ 0&0&0&0&0&\ast \end{array}\right]. $$

The key observation is now that the block zero pattern obtained for A is invariant under multiplication and thus, all powers of A as well as p(A) will have the same block zero pattern. But then, the fact that A is polynomially H-normal with H-normality polynomial p implies

$$ \left[\begin{array}{cccccc} p(A_{11})&A_{43}^{H}&A_{13}^{H}&A_{23}^{H}&A_{53}^{H}&A_{63}^{H}\\ 0&p(A_{22})&0&A_{24}^{H}&0&A_{64}^{H}\\ 0&0&A_{11}^{H}&0&0&0\\ 0&0&0&A_{22}^{H}&0&0\\ 0&0&A_{15}^{H}&A_{25}^{H}&A_{55}^{H}&0\\ 0&0&0&A_{26}^{H}&0&A_{66}^{H} \end{array}\right] = H^{-1}A^{H}H=p(A)= \left[\begin{array}{cccccc} \ast&0&\ast&0&\ast&0\\ 0&\ast&\ast&\ast&\ast&\ast\\ 0&0&\ast&0&0&0\\ 0&0&\ast&\ast&0&0\\ 0&0&\ast&0&\ast&0\\ 0&0&\ast&\ast&0&\ast \end{array}\right], $$

and we obtain that A23 = 0, A25 = 0, A43 = 0, and A63 = 0. But then, after applying a block permutation, we may assume that A and H have the forms \(A=A_{1}\oplus A_{2}\) and \(H=H_{1}\oplus H_{2}\) with

$$ A_{1}=\left[\begin{array}{ccc} A_{11}&A_{13}&A_{15}\\ 0&p(A_{11})^{H}&0\\ 0&A_{53}&A_{55} \end{array}\right], \quad H_{1}= \left[\begin{array}{ccc} 0&I_{k_{1}}&0\\ I_{k_{1}}&0&0\\ 0&0&sI_{m_{1}} \end{array}\right] $$

and

$$ A_{2}=\left[\begin{array}{ccc} A_{22}&A_{24}&A_{26}\\ 0&p(A_{22})^{H}&0\\ 0&A_{64}&A_{66} \end{array}\right], \quad H_{2}= \left[\begin{array}{ccc} 0&I_{k_{2}}&0\\ I_{k_{2}}&0&0\\ 0&0&sI_{m_{2}} \end{array}\right]. $$

Note that A1 has only one eigenvalue, namely λ1 = λ. It immediately follows from Lemma 1 that \(m_{1}s=\operatorname{signsum}(\lambda_{1})\), hence \(s\cdot\operatorname{signsum}(\lambda_{1})=m_{1}\geq 0\). On the other hand, the matrix A2 now has precisely r − 1 eigenvalues \(\lambda _{2},\dots ,\lambda _{r}\) satisfying \(\overline {\lambda _{i}}=p(\lambda _{i})\) and we can apply the induction hypothesis to obtain that s ⋅signsum(λi) ≥ 0 for \(i=2,\dots ,r\) and

$$ sm=sm_{1}+sm_{2}=\text{signsum}(\lambda_{1})+{\sum}_{j=2}^{r}\text{signsum}(\lambda_{j}) $$

which proves 4).

“4) ⇒ 1)”: We may assume without loss of generality that A and H are in the forms of Theorem 1. Thus, by Lemma 2, we may consider some of the corresponding diagonal blocks of A and H separately in order to construct an A-invariant H-neutral subspace. We will do this by individually investigating each block in the canonical form of Theorem 1 that is of type ii) or of type i) having even dimension. Concerning the blocks of type i) having odd dimension, we will have to consider all of them together to obtain the desired dimension of the A-invariant H-neutral subspace. We proceed by first considering the following three special cases, before discussing the general case.

Special Case 1: A is a block of type ii). In this case, we have m = 0, \(k=\frac {n}{2}\) and

$$ A=\left[\begin{array}{cc} \mathcal{J}_{k}(\lambda)&0\\ 0&p(\mathcal{J}_{k}(\lambda))^{H} \end{array}\right], \quad H=\left[\begin{array}{cc} 0&I_{k}\\ I_{k}&0 \end{array}\right], $$

where \(p(\lambda )\neq \overline {\lambda }\). Obviously, the first k standard basis vectors of \(\mathbb {C}^{n}\) span a k-dimensional A-invariant subspace that is H-neutral.

Special Case 2: A is a block of type i) of even dimension. In this case, we again have m = 0, \(k=\frac {n}{2}\) and

$$ A=\lambda I_{n}+e^{i\theta}T(0,1,ir_{2},\dots,ir_{n-1}),\quad H=\sigma R_{n}, $$

where \(p(\lambda )=\overline {\lambda }\), σ ∈{1,− 1}, and where \(\theta ,r_{2},\dots ,r_{n-1}\) are as specified in Theorem 1. Again, it is obvious that the first k standard basis vectors of \(\mathbb {C}^{n}\) span a k-dimensional A-invariant subspace that is H-neutral.

Special Case 3: A only consists of blocks of type i) with odd dimension. In this case, we have

$$ A=(\lambda I_{n_{1}}+e^{i\theta_{1}}T(0,1,ir_{1,2},\dots,ir_{1,n_{1}-1}))\oplus\cdots\oplus (\lambda I_{n_{\ell}}+e^{i\theta_{\ell}}T(0,1,ir_{\ell,2},\dots,ir_{\ell,n_{\ell}-1})) $$
$$ \text{and} \quad H=\sigma_{1}R_{n_{1}}\oplus\cdots\oplus\sigma_{\ell} R_{n_{\ell}}, $$

where nj = 2kj + 1 for some nonnegative integers \(k_{1},\dots ,k_{\ell }\). By 4), we have |signsum(λ)| = m. Without loss of generality, we may assume that s = 1, considering − H instead of H otherwise, i.e., we have that signsum(λ) = m. It follows that \(\ell\geq m\). More precisely, there exists a nonnegative integer α such that \(\ell=m+2\alpha \) and such that m + α signs among \(\sigma _{1},\dots ,\sigma _{\ell }\) are positive and α are negative. Without loss of generality, we may assume that among the diagonal blocks the first 2α blocks have alternating signs starting with σ1 = 1, so that the last m blocks all have positive sign. Let us first consider a group of two blocks with signs + 1,− 1, say

$$ A_{j}=\left[\begin{array}{cc} T_{2j-1}&0\\ 0&T_{2j} \end{array}\right], \quad H_{j}=\left[\begin{array}{cc} R_{n_{2j-1}}&0\\ 0&-R_{n_{2j}} \end{array}\right], $$

where \(j\in \{1,\dots ,\alpha \}\) and where \(T_{i}\in \mathbb {C}^{n_{i},n_{i}}\) is an upper triangular matrix having \(\lambda I_{n_{i}}\) as its diagonal. Then it is easy to check that the vectors

$$ e_{1},\dots,e_{k_{2j-1}},e_{n_{2j-1}+1},\dots,e_{n_{2j-1}+k_{2j}},e_{k_{2j-1}+1}+e_{n_{2j-1}+k_{2j}+1} $$
(9)

form a basis of an Aj-invariant Hj-neutral subspace. Indeed, partitioning

$$ T_{2j-1}=\left[\begin{array}{ccc} T_{11}&T_{12}&T_{13}\\ 0&\lambda&T_{23}\\ 0&0&T_{33} \end{array}\right],\quad T_{2j}=\left[\begin{array}{ccc} \widetilde T_{11}&\widetilde T_{12}&\widetilde T_{13}\\ 0&\lambda&\widetilde T_{23}\\ 0&0&\widetilde T_{33} \end{array}\right] $$

with diagonal blocks of the sizes \(k_{2j-1},1,k_{2j-1}\) and \(k_{2j},1,k_{2j}\), respectively,

a simultaneous block permutation on Aj and Hj results in

$$ \left[\begin{array}{cccccc} T_{11} & 0 & T_{12} & 0 & T_{13} & 0\\ 0 & \widetilde T_{11} & 0 & \widetilde T_{12} & 0 & \widetilde T_{13}\\ 0&0&\lambda & 0 & T_{23} & 0\\ 0&0&0&\lambda & 0 & \widetilde T_{23}\\ 0&0&0&0&T_{33}&0\\ 0&0&0&0&0&\widetilde T_{33} \end{array}\right] \quad\text{and}\quad \left[\begin{array}{cccccc} 0&0&0&0&R_{k_{2j-1}}&0\\ 0&0&0&0&0&-R_{k_{2j}}\\ 0&0&1&0&0&0\\ 0&0&0&-1&0&0\\ R_{k_{2j-1}}&0&0&0&0&0\\ 0&-R_{k_{2j}}&0&0&0&0 \end{array}\right]. $$

Then transforming the middle 2 × 2 blocks with the transformation given by the matrix

$$ \left[\begin{array}{cc} 1&1\\ 1&-1 \end{array}\right] $$

produces matrices of the form

$$ \left[\begin{array}{cc} \widehat T_{1}&\widehat T_{2}\\ 0&\widehat T_{3} \end{array}\right] \quad\text{ and }\quad \left[\begin{array}{cc} 0&\widehat H\\ \widehat H^{H}&0 \end{array}\right], $$
(10)

where all blocks have size \(\frac {n_{2j-1}+n_{2j}}{2}\). Observe that the first \(\frac {n_{2j-1}+n_{2j}}{2}\) columns of the transformation matrix that transforms (Aj,Hj) to the pair of matrices in (10) coincide with the vectors in (9), possibly up to scalar multiples.

Next, we consider a block for \(j\in \{2\alpha +1,\dots ,\ell \}\), i.e.,

$$ A_{j}=\lambda I_{n_{j}}+e^{i\theta_{j}}T(0,1,ir_{j,2},\dots,ir_{j,n_{j}-1}),\quad H_{j}=R_{n_{j}}. $$

Here, the first kj standard basis vectors span an Aj-invariant Hj-neutral subspace.

In view of Lemma 2, we obtain the existence of an A-invariant H-neutral subspace of dimension

$$ \sum\limits_{j=1}^{\alpha}(k_{2j-1}+k_{2j}+1)+\sum\limits_{j=2\alpha+1}^{\ell} k_{j}= \sum\limits_{j=1}^{\alpha} \frac{n_{2j-1}+n_{2j}}{2}+\sum\limits_{j=2\alpha+1}^{\ell} \frac{n_{j}-1}{2}=\frac{n-m}{2}, $$

where we used that \(\ell-2\alpha=m\).

The general case: Now let A be general having r eigenvalues \(\lambda _{1},\dots ,\lambda _{r}\in \mathbb {C}\) satisfying \(\overline {\lambda _{i}}=p(\lambda _{i})\), \(i=1,\dots ,r\). Putting together the special cases above in view of our observation, we obtain that there exists an A-invariant H-neutral subspace of dimension

$$ \frac{1}{2}\left( n-{\sum}_{j=1}^{r}|\operatorname{signsum}(\lambda_{j})|\right), $$

and by 4) this dimension is equal to \(\frac {n-m}{2}=k\) as desired. This finishes the proof. □

5 Conclusions

We have developed a Schur-like form for polynomially H-normal matrices, where H is Hermitian and unitary, and we have characterized under which conditions this form can be obtained via structure-preserving (and unitary) similarity transformations. In particular, the result can be applied to all matrices that are selfadjoint, skew-adjoint, or unitary with respect to an indefinite inner product that has a unitary Hermitian Gram matrix. As two extreme special cases, the unitary diagonalizability of normal matrices and equivalent conditions for the existence of the Hamiltonian Schur form of Hamiltonian matrices have been recovered. While structure-preserving and numerically backward stable algorithms for the numerical computation of these forms are well known in the two special cases mentioned, it remains to develop such algorithms for the general case.