On the consistency of the matrix equation $X^\top A X = B$ when $B$ is symmetric: the case where CFC($A$) includes skew-symmetric blocks

In this paper, which is a follow-up to Borobia et al. (Mediterr J Math, 18:40, 2021), we provide a necessary and sufficient condition for the matrix equation $X^\top A X = B$ to be consistent when $B$ is symmetric. The condition depends on the canonical form for congruence of the matrix $A$, and is proved to be necessary for all matrices $A$, and sufficient for most of them. This result improves the main one in the previous paper, since the condition is stronger than the one in that reference, and the sufficiency is guaranteed for a larger set of matrices (namely, those whose canonical form for congruence, CFC($A$), includes skew-symmetric blocks).


Introduction
Let $A \in \mathbb{C}^{n\times n}$ and let $B \in \mathbb{C}^{m\times m}$ be a symmetric matrix. We are interested in the consistency of the matrix equation $$X^\top A X = B, \tag{1}$$ where $(\cdot)^\top$ denotes the transpose.
To be more precise, we want to obtain necessary and sufficient conditions for (1) to be consistent. The main tool to get these conditions is the canonical form for congruence, CFC (see Theorem 1), because (1) is consistent if and only if the equation that we obtain after replacing the matrices $A$ and/or $B$ by their CFCs is consistent. The CFC is a direct sum of three kinds of blocks of different sizes, named Type-0, Type-I, and Type-II, and the idea is to take advantage of this structure to analyze Eq. (1). In particular, the only symmetric canonical blocks are $I_1 = [1]$ and $0_1 = [0]$, so the CFC of the symmetric matrix $B$ is of the form $\mathrm{CFC}(B) = I_m \oplus 0_k$ (where $I_m$ and $0_k$ are, respectively, a direct sum of $m$ and $k$ copies of $I_1$ and $0_1$). With the help of Lemma 2 we can get rid of the null block $0_k$, so the equation we are interested in is $$X^\top A X = I_m, \tag{2}$$ with $m \ge 1$.
In [4] we introduced $\tau(A)$, a quantity that depends on the number of certain Type-0, Type-I, and Type-II blocks appearing in the CFC of $A$, and we proved in [4, Th. 2] that if Eq. (2) is consistent then $m \le \tau(A)$. Moreover, the main result of that paper, [4, Th. 8], establishes that if the CFC of $A$ contains neither $H_2(-1)$ nor $H_4(1)$ blocks (which are specific Type-II blocks) then Eq. (2) is consistent if and only if $m \le \tau(A)$. This is not necessarily true if we allow the CFC of $A$ to contain blocks $H_2(-1)$ and/or $H_4(1)$ (for instance, it is not true for $A = H_2(-1)$ or for $A = H_4(1)$).
In the present work we introduce a new quantity $\upsilon(A)$, which also depends on the number of certain Type-0, Type-I, and Type-II blocks appearing in the CFC of $A$. In Theorem 7 we will prove that if Eq. (2) is consistent then $m \le \min\{\tau(A), \upsilon(A)\}$, and in Theorem 12 that this condition is also sufficient when the CFC of $A$ does not contain blocks $H_4(1)$. Note that the main result of this paper improves the main one in [4] in two senses: (i) the condition here is stronger than the one there; and (ii) the characterization is guaranteed for a larger set of matrices.
In the title we have referred to "the case where CFC($A$) includes skew-symmetric blocks". This highlights the fact that, compared to [4], in the present work the main result is applied to matrices whose CFC contains $H_2(-1)$ blocks, which are the only nonzero skew-symmetric blocks in a CFC.
The interest in Eq. (1) goes back to, at least, the 1920s [16], and it has been mainly devoted to describing the solution, $X$, for matrices $A$, $B$ over finite fields and when $A$ and/or $B$ have some specific structure [6-9, 11, 13, 17]. More recently, some related equations have been analyzed [15] and, in particular, in connection with applications [1-3]. In [5] we have addressed the consistency of Eq. (1) when $B$ is skew-symmetric, emphasizing the connection between the consistency of (1) and the dimension of the largest subspace of $\mathbb{C}^n$ for which the bilinear form represented by $A$ is skew-symmetric and non-degenerate. The same connection holds after replacing skew-symmetric by symmetric, which is the structure considered in the present work.
The paper is organized as follows. In Sect. 2 we introduce the basic notation and definitions (like the CFC), and we also recall some basic results that are used later. In Sect. 3 the quantities $\tau(A)$ and $\upsilon(A)$ are introduced. Section 4 presents the necessary condition for Eq. (2) to be consistent (Theorem 7), whereas in Sect. 6 we show that when the CFC of $A$ does not contain blocks $H_4(1)$ this condition is sufficient as well (Theorem 12). In between these two sections, Sect. 5 is devoted to introducing the tools (by means of several technical lemmas) that are used to prove the sufficiency of the condition. Finally, in Sect. 7 we summarize the main contributions of this work and indicate the main related open question.

Basic approach and definitions
Throughout the manuscript, $I_n$ and $0_n$ denote, respectively, the identity and the null matrix of size $n \times n$. By $0_{m\times n}$ we denote the null matrix of size $m \times n$. By $\mathrm{i}$ we denote the imaginary unit (namely, $\mathrm{i}^2 = -1$), and by $e_j$ we denote the $j$th canonical vector (namely, the $j$th column of the identity matrix) of the appropriate size. The notation $M^{\oplus k}$ stands for a direct sum of $k$ copies of the matrix $M$.
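As a small illustration of this notation, the following Python sketch builds direct sums and canonical vectors with plain nested lists. The helper names (`direct_sum`, `direct_power`, `canonical_vector`) are hypothetical and introduced only for this example, and the block $H_2(-1)$ is written assuming the standard form with rows $(0, 1)$ and $(-1, 0)$.

```python
def direct_sum(*blocks):
    """Block-diagonal matrix (list of rows) built from square blocks."""
    n = sum(len(b) for b in blocks)
    out = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                out[offset + i][offset + j] = entry
        offset += len(b)
    return out


def direct_power(block, k):
    """Direct sum of k copies of `block`, i.e. block^{(+)k}."""
    return direct_sum(*([block] * k))


def canonical_vector(j, n):
    """j-th canonical vector e_j of size n (j is 1-based, as in the text)."""
    return [1 if i == j - 1 else 0 for i in range(n)]


# Assumed standard form of the skew-symmetric block H_2(-1).
H2_minus1 = [[0, 1], [-1, 0]]

# The 4 x 4 matrix I_1^{(+)2} (+) H_2(-1).
A = direct_sum(direct_power([[1]], 2), H2_minus1)
```

With these helpers, `direct_power([[1]], m)` is just the identity $I_m$, which matches the way direct sums of $1 \times 1$ blocks are used in the text.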
Following the approach in [4] and [5], a key tool in our developments is the canonical form for congruence (CFC). For ease of reading we first recall the CFC, which is a direct sum of Type-0 blocks $J_k(0)$, Type-I blocks $\Gamma_k$, and Type-II blocks $H_{2k}(\mu)$. By the Canonical reduction law, in Eq. (1) we will assume without loss of generality that $A$ and $B$ are given in CFC.
When $B$ is symmetric, the CFC of $B$ is $I_{m_1} \oplus 0_{m_2}$. Then, as a consequence of the Canonical reduction law, we may restrict ourselves to the case where the right-hand side of (1) is of this form. Moreover, as a consequence of the $J_1(0)$-law, in our developments we will consider $B = I_m$ in Eq. (1) (leading to Eq. (2)). Therefore, our goal is to characterize those matrices $A$ such that $A \rightsquigarrow I_m$ (namely, such that $X^\top A X = I_m$ is consistent), for a fixed $m \ge 1$. This will be done by concatenating several equations $A \rightsquigarrow A_1 \rightsquigarrow \cdots \rightsquigarrow A_k \rightsquigarrow I_m$, since the Transitivity law allows us to conclude that $A \rightsquigarrow I_m$. For this reason, we will use the word "transformation" for a single equation $A \rightsquigarrow B$. One way to determine the CFC of an invertible matrix $A$ is by means of its cosquare, $A^{-\top}A$ (see [14]), where $(\cdot)^{-\top}$ denotes the transpose of the inverse. Moreover, the cosquare will be used to determine whether two given invertible matrices are congruent, using the following result.

Lemma 3 Two invertible matrices are congruent if and only if their cosquares are similar.
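To make the notion of cosquare concrete, here is a minimal sketch that computes $A^{-\top}A$ in exact rational arithmetic. It is an illustration under assumptions, not part of the original development: the helper names are ours, and the block $H_2(\mu)$ is taken in the standard form with rows $(0, 1)$ and $(\mu, 0)$.

```python
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def inverse(M):
    """Gauss-Jordan inverse over the rationals; ValueError if M is singular."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def cosquare(A):
    """Cosquare A^{-T} A of an invertible matrix A."""
    return matmul(transpose(inverse(A)), A)


def H2(mu):
    """Assumed standard form of the 2 x 2 Type-II block H_2(mu)."""
    return [[0, 1], [mu, 0]]


C = cosquare(H2(2))    # diag(2, 1/2)
D = cosquare(H2(-1))   # -I_2, since H_2(-1) is skew-symmetric
```

In this small case the cosquare of $H_2(\mu)$ is $\operatorname{diag}(\mu, 1/\mu)$, so its eigenvalues can be read off directly, which is the kind of information that the similarity test for cosquares relies on.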

The matrices $\widetilde{\Gamma}_k$ and $\widetilde{H}_{2k}(\mu)$
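Instead of the blocks $\Gamma_k$ and $H_{2k}(\mu)$ we will use, for $k \ge 1$, tridiagonal versions of them, denoted by $\widetilde{\Gamma}_k$ and $\widetilde{H}_{2k}(\mu)$. We claim that $\widetilde{\Gamma}_k$ and $\widetilde{H}_{2k}(\mu)$ are congruent to, respectively, $\Gamma_k$ and $H_{2k}(\mu)$.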
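In order to prove that $\widetilde{\Gamma}_k$ and $\Gamma_k$ are congruent, we give an indirect proof. Two matrix pairs $(M_1, N_1)$ and $(M_2, N_2)$ are strictly equivalent if $R M_1 S = M_2$ and $R N_1 S = N_2$ for some invertible matrices $R$, $S$, and two matrices $M$ and $N$ are congruent if and only if the pairs $(M, M^\top)$ and $(N, N^\top)$ are strictly equivalent. Since $(\Gamma_k, \Gamma_k^\top)$ and $(J_k((-1)^{k+1}), I_k)$ are strictly equivalent (see [10, Th. 4]) and $(J_k((-1)^{k+1}), I_k)$ and $(\widetilde{\Gamma}_k, \widetilde{\Gamma}_k^\top)$ are strictly equivalent as well (see Eq. (5) in [12]), the pairs $(\Gamma_k, \Gamma_k^\top)$ and $(\widetilde{\Gamma}_k, \widetilde{\Gamma}_k^\top)$ are strictly equivalent, so $\Gamma_k$ and $\widetilde{\Gamma}_k$ are congruent. An alternative way to show that $\Gamma_k$ and $\widetilde{\Gamma}_k$ are congruent is to check that their cosquares are both similar to $J_k((-1)^{k+1})$ and then use Lemma 3.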
For the rest of the manuscript, we will replace the blocks $\Gamma_k$ by $\widetilde{\Gamma}_k$ and $H_{2k}(\mu)$ by $\widetilde{H}_{2k}(\mu)$, so, in particular, we will assume that the CFC is a direct sum of blocks $J_k(0)$, $\widetilde{\Gamma}_k$, and $\widetilde{H}_{2k}(\mu)$. The only exceptions to this rule are $\widetilde{\Gamma}_1$, which is equal to $\Gamma_1$, and $\widetilde{H}_2(-1)$, which is equal to $H_2(-1)$.

The quantities $\tau(A)$ and $\upsilon(A)$
The main result of this work (Theorem 12) depends on two intrinsic quantities of the matrix $A$, which we denote by $\tau(A)$ and $\upsilon(A)$. In this section, we introduce them and present some basic properties that will be used later.

Definition 4 Let $A$ be a complex $n \times n$ matrix and consider its CFC, where (i) $j_1$ is the number of Type-0 blocks with size 1; (ii) $j_O$ is the number of Type-0 blocks with odd size at least 3; (iii) $\gamma_O$ is the number of Type-I blocks with odd size; (iv) $\gamma_\varepsilon$ is the number of Type-I blocks with even size; (v) $h^-_{2O}$ is the number of Type-II blocks $H_{4k-2}(-1)$ for any $k \ge 1$; (vi) $h^+_{2\varepsilon}$ is the number of Type-II blocks $H_{4k}(1)$ for any $k \ge 1$; and (vii) it has an arbitrary number of other Type-0 and Type-II blocks.
Then we define the quantities $\tau(A)$ and $\upsilon(A)$ in terms of these numbers of blocks. The quantities $\tau$ and $\upsilon$ are additive with respect to direct sums (the proof is straightforward); we refer to these additive properties as (4). The notation for the quantities in Definition 4 follows the one in [5]. In particular, the letters used for the number of blocks in parts (i)-(vi) resemble the notation for the corresponding blocks (see [5, Rem. 6]). In [4] we had not yet adopted this notation, and the values $\gamma_\varepsilon$ and $h^-_{2O}$ played no role in [4]. Table 1 contains the values of $\tau(A)$ and $\upsilon(A)$ for $A$ being a single canonical block in the CFC. We have displayed the values in three categories, from top to bottom. Notice that $\tau(A) \le \upsilon(A)$ whenever the CFC of $A$ consists of just a single canonical block, except for $H_2(-1)$. This, together with (4), implies the following result.

Lemma 5 If the CFC of $A$ has no blocks of type $H_2(-1)$ then $\tau(A) \le \upsilon(A)$.
In order for the condition that we obtain (in Theorem 7) to be sufficient, the following notion is key.

Definition 6
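The transformation $A \rightsquigarrow B$ is $(\tau, \upsilon)$-invariant if the following three conditions are satisfied: (i) $X^\top A X = B$ is consistent; (ii) $\tau(A) = \tau(B)$; and (iii) $\upsilon(A) = \upsilon(B)$.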

A necessary condition
In this section, we introduce a necessary condition on the matrix $A$ for $A \rightsquigarrow I_m$ (namely, for Eq. (1) to be consistent when $B$ is symmetric and invertible). This condition improves the one provided in [4, Th. 2], namely $m \le \tau(A)$.

Theorem 7 If $A$ is a complex square matrix such that $X^\top A X = I_m$ is consistent, then $m \le \min\{\tau(A), \upsilon(A)\}$.
Proof In [4, Th. 2] it was proved that $m \le \tau(A)$ (though the notation $\tau$ was not used there). Let us see that $m \le \upsilon(A)$ as well. Assuming that the CFC of $A$ is as in Definition 4, we rely on an identity shown in the proof of Lemma 4.1 of [5], which we refer to as (5). By hypothesis, there exists some $X_0 \in \mathbb{C}^{n\times m}$ such that $X_0^\top A X_0 = I_m$. Now, transposing this equation and adding it up, we get $X_0^\top (A + A^\top) X_0 = 2I_m$. From this identity, which gives $m = \operatorname{rank}(2I_m) \le \operatorname{rank}(A + A^\top)$, and using (5), we obtain $m \le \upsilon(A)$, as claimed.
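The symmetrization step in this proof can be checked on small examples. The sketch below only illustrates the elementary consequence that $X_0^\top (A + A^\top) X_0 = 2I_m$ forces $m \le \operatorname{rank}(A + A^\top)$; it makes no claim about the formula for $\upsilon(A)$. The function names are hypothetical, and $H_2(-1)$ is written assuming its standard form with rows $(0, 1)$ and $(-1, 0)$.

```python
from fractions import Fraction


def rank(M):
    """Rank of a rational matrix via Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def symmetric_part_rank(A):
    """rank(A + A^T), an upper bound for m whenever X^T A X = I_m is consistent."""
    n = len(A)
    S = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]
    return rank(S)


# For the skew-symmetric block H_2(-1) we have A + A^T = 0, so no m >= 1 is possible.
bound_skew = symmetric_part_rank([[0, 1], [-1, 0]])

# For A = I_2 the bound is 2, which is attained by X = I_2.
bound_identity = symmetric_part_rank([[1, 0], [0, 1]])
```

The first example is consistent with the remark in the Introduction that the condition $m \le \tau(A)$ alone is not sufficient for $A = H_2(-1)$.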

Absorbing the H 2 (−1) blocks
The main goal in the rest of the manuscript is to prove that the necessary condition presented in Theorem 7 is also sufficient when the CFC of $A$ does not contain $H_4(1)$ blocks. If the CFC of $A$ contains neither $H_2(-1)$ nor $H_4(1)$ blocks, this is already known [4, Th. 8]. In that case, as a consequence of Lemma 5, the condition for $A \rightsquigarrow I_m$ reduces to $m \le \tau(A)$. When the CFC of $A$ does not contain blocks $H_4(1)$ but contains blocks $H_2(-1)$, this is no longer true (see, for instance, Example 1 in [4]), and then the quantity $\upsilon(A)$ comes into play. This is an indication that the presence of blocks $H_2(-1)$ in the CFC of $A$ deserves a particular treatment. In this section, we show how to deal with this type of block. To be more precise, we see that some blocks $H_2(-1)$ can be combined with other types of blocks in order to "eliminate" them by means of a $(\tau, \upsilon)$-invariant transformation. In this case, we say that the block $H_2(-1)$ has been "absorbed". We will consider separately the cases of Type-0, Type-I, and Type-II blocks, in Sects. 5.1, 5.2, and 5.3, respectively.
The following notation is used in the proofs of this section: $E_{\alpha\times\beta}$ denotes the $\alpha \times \beta$ matrix whose $(\alpha, 1)$ entry is equal to 1 and the remaining entries are zero.

The case of Type-0 blocks
In Lemma 8, we show how to "absorb" a block $H_2(-1)$ with a Type-0 block, $J_k(0)$, with $k \ne 3$. In the statement, $J_0(0)$ stands for an empty block.
Proof By considering separately the cases where $k$ in (6) is odd ($k = 2t + 1$) and even ($k = 2t$), using (4) and looking at Table 1, we obtain that both sides of the transformation in (6) have the same $\tau$ and $\upsilon$. Now let us prove the consistency. The result is true for $k = 2$.
We will also use the following result, whose proof is straightforward.

The case of Type-I blocks
Lemma 10 is the counterpart of Lemma 8 for Type-I blocks, where $\Gamma_k$ is replaced by $\widetilde{\Gamma}_k$.

Lemma 10
The following transformation is $(\tau, \upsilon)$-invariant:
Proof Considering again separately the cases where $k$ in (7) is odd ($k = 2t + 1$) and even ($k = 2t$), using (4) and looking at Table 1, we obtain that both sides of the transformation in (7) have the same $\tau$ and $\upsilon$. Now let us prove the consistency.
For $k = 3$ we have a chain of two transformations, where the first transformation is due to the Permutation law and the second one can be directly checked. For $k \ge 4$ we are going to prove a similar chain of two transformations, where the first transformation is again due to the Permutation law. So for the rest of the proof we will focus on the second transformation. We use the following notation: $A(i : j)$ is the principal submatrix of $A$ containing the rows and columns from the $i$th to the $j$th ones.
To prove it we will use several identities involving $\Gamma_1 \oplus \widetilde{\Gamma}_{k-2}$, where in the last-but-one equality we use that $\widetilde{\Gamma}_k(5 : k) = \widetilde{\Gamma}_{k-2}(3 : k - 2)$.

The case of Type-II blocks
Finally, Lemma 11 is the counterpart of Lemmas 8 and 10 for Type-II blocks. Again, instead of the blocks $H_{2k}(\mu)$ we use the tridiagonal version, $\widetilde{H}_{2k}(\mu)$. In the statement, $\widetilde{H}_0(\mu)$ stands for an empty block.
(ii) Let us prove the transformation in the statement for $k \ge 1$. Finally, let us see that $\widetilde{H}_{4k}(-1)$ is congruent to $\widetilde{\Gamma}_{2k}^{\oplus 2}$ or, equivalently, that $H_{4k}(-1)$ is congruent to $\Gamma_{2k}^{\oplus 2}$. In order to do this, we are going to prove that the cosquares of $H_{4k}(-1)$ and $\Gamma_{2k}^{\oplus 2}$ are similar, and this immediately implies that $H_{4k}(-1)$ and $\Gamma_{2k}^{\oplus 2}$ are congruent, by Lemma 3. The cosquares of $H_{4k}(-1)$ and $\Gamma_{2k}^{\oplus 2}$ (see [10, p. 13]) are similar, since the Jordan canonical form of both of them is $J_{2k}(-1)^{\oplus 2}$.
(iii) Let us prove the transformation in the statement for $k \ge 1$. For $k = 1$ the solution matrix is $X_1 = C$, as can be directly checked. Let us now see it for $k \ge 2$. It remains to see that $\widetilde{H}_{4k-2}(1)$ is congruent to $\widetilde{\Gamma}_{2k-1}^{\oplus 2}$ or, equivalently, that $H_{4k-2}(1)$ is congruent to $\Gamma_{2k-1}^{\oplus 2}$. To prove this, we can proceed as before, by showing that the cosquares of $H_{4k-2}(1)$ and $\Gamma_{2k-1}^{\oplus 2}$ are similar (in this case, their Jordan canonical form is $J_{2k-1}(1)^{\oplus 2}$), and this implies that $H_{4k-2}(1)$ and $\Gamma_{2k-1}^{\oplus 2}$ are congruent, again by Lemma 3.

The main result
The following result, which is the main result in this work, improves the main result in [4] (namely, Theorem 8 in that reference) by including the case where the CFC of $A$ contains blocks of type $H_2(-1)$, which were excluded in [4, Th. 8].

Theorem 12 Let $A$ be a complex square matrix whose CFC does not contain blocks $H_4(1)$, and let $m \ge 1$. Then $X^\top A X = I_m$ is consistent if and only if $m \le \min\{\tau(A), \upsilon(A)\}$.
Proof The necessity of the condition is already stated in Theorem 7. We are going to prove that it is also sufficient.
By the $J_1(0)$-law and the Canonical reduction law, we may assume that both $A$ and $B$ are given in CFC and that neither $A$ nor $B$ has blocks of type $J_1(0)$. This implies, in particular, that $B = I_m$, for some $m$, and $j_1 = 0$ (where $j_1$ is the index associated to $A$ given in Definition 4). We also assume that all blocks $\Gamma_k$ and $H_{2k}(\mu)$ in $A$, if present, have been replaced by $\widetilde{\Gamma}_k$ and $\widetilde{H}_{2k}(\mu)$, respectively.
Let us recall that $\Gamma_1^{\oplus m} = I_m$. Throughout the proof, we mainly use the first notation, to emphasize that we are dealing with canonical blocks.
If the CFC of $A$ does not contain blocks $H_2(-1)$, then the result is provided in [4, Th. 8]. Otherwise, we are going to see that it is possible, by means of $(\tau, \upsilon)$-invariant transformations, to either "absorb" all blocks $H_2(-1)$ or to end up with a direct sum of blocks $H_2(-1)$, together with, possibly, other blocks, which are quite specific. More precisely, we can end up with a direct sum of blocks satisfying one of the following conditions:
(C0) There are no blocks $H_2(-1)$.
(C1) There are some blocks $H_2(-1)$ together with, possibly, a direct sum of blocks $J_3(0)$, $\widetilde{\Gamma}_2$, and/or $\Gamma_1$.
We are first going to see that, indeed, we can arrive at one of the situations described in cases (C0)-(C1). In the procedure, we may need to permute the canonical blocks, in order to use Lemmas 8, 10, and 11. By Theorem 1, this provides a congruent matrix which has, in particular, the same $\tau$ and $\upsilon$, so these permutations do not affect the consistency. Then, we will prove that in both cases (C0) and (C1) the statement holds.
So let us assume that the CFC of $A$ contains a direct sum of blocks $H_2(-1)$, together with some other Type-0, Type-I, and Type-II blocks (except $H_4(1)$).
Using Lemma 8, for each block $J_k(0)$ (with $k \ne 3$) we can "absorb" a block $H_2(-1)$ by means of a $(\tau, \upsilon)$-invariant transformation, and we end up with a direct sum of a block $J_{k-2}(0)$ together with two blocks $\Gamma_1$. We can keep reducing the size of the Type-0 blocks until either all $H_2(-1)$ blocks have been absorbed (so we end up in case (C0)) or there are no more Type-0 blocks, except maybe blocks $J_3(0)$. Now, we can proceed in the same way with Type-I blocks using Lemma 10. Again, we end up either with a direct sum containing no $H_2(-1)$ blocks (case (C0) again) or no Type-I blocks, except maybe blocks $\Gamma_1$ and/or $\widetilde{\Gamma}_2$. Next, we do the same with Type-II blocks using Lemma 11. Note that the reductions in parts (ii) and (iii) in the statement of Lemma 11 produce as an output some Type-I blocks $\widetilde{\Gamma}_k$, with $k \ge 1$. In the case when $k > 1$, we can use again Lemma 10, provided that there are still blocks $H_2(-1)$. Therefore, after these reductions, either we have absorbed all blocks $H_2(-1)$ (case (C0) again), or there are blocks $H_2(-1)$, together with, possibly, a direct sum of other blocks that cannot absorb them, namely $J_3(0)$, $\widetilde{\Gamma}_2$, and/or $\Gamma_1$ (case (C1)).
Now, it remains to prove that in both cases (C0) and (C1) the statement holds, namely that $A \rightsquigarrow \Gamma_1^{\oplus m}$, for any $m \le \min\{\tau(A), \upsilon(A)\}$, in these two cases. Let $\widetilde{A}$ be the matrix obtained after applying to $A$ all the transformations explained in the previous paragraph. By the Transitivity law, $A \rightsquigarrow \widetilde{A}$. Moreover, since all these transformations are $(\tau, \upsilon)$-invariant, (4) implies that $\tau(A) = \tau(\widetilde{A})$ and $\upsilon(A) = \upsilon(\widetilde{A})$. Therefore, it is enough to prove that $\widetilde{A} \rightsquigarrow \Gamma_1^{\oplus m}$ for any $m \le \min\{\tau(\widetilde{A}), \upsilon(\widetilde{A})\}$. By the Elimination law, $\Gamma_1^{\oplus a} \rightsquigarrow \Gamma_1^{\oplus b}$ for any $b < a$, so it will be enough to prove that $\widetilde{A} \rightsquigarrow \Gamma_1^{\oplus \min\{\tau(\widetilde{A}), \upsilon(\widetilde{A})\}}$.
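The way the Transitivity and Elimination laws are used in this argument can be illustrated with explicit solution matrices. The following is only a sketch on toy data: the helper names and the $3 \times 3$ matrix `A` are hypothetical choices made for this example. It checks that $X = \begin{bmatrix} I_b \\ 0 \end{bmatrix}$ solves $X^\top I_a X = I_b$, and that if $X_1$ solves $A \rightsquigarrow A_1$ and $X_2$ solves $A_1 \rightsquigarrow B$, then $X_1 X_2$ solves $A \rightsquigarrow B$.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def congruence(X, A):
    """Return X^T A X."""
    return matmul(matmul(transpose(X), A), X)


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def elimination_solution(a, b):
    """a x b matrix with I_b on top of a zero block (requires 1 <= b <= a)."""
    return [[int(i == j) for j in range(b)] for i in range(a)]


# Elimination law: I_5 ~> I_3 with an explicit solution.
X = elimination_solution(5, 3)
elimination_ok = congruence(X, identity(5)) == identity(3)

# Transitivity law on a toy example: A ~> A1 ~> B implies A ~> B.
A = [[1, 1, 0], [-1, 1, 0], [0, 0, 1]]
X1 = elimination_solution(3, 2)
A1 = congruence(X1, A)            # leading 2 x 2 principal submatrix of A
X2 = [[1], [0]]
B = congruence(X2, A1)            # the 1 x 1 matrix [1]
transitivity_ok = congruence(matmul(X1, X2), A) == B
```

The composition $X_1 X_2$ is exactly what allows a chain of transformations to be collapsed into a single solution of the original equation.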
Related to this, Theorem 12 can be slightly improved, allowing the CFC of $A$ to contain blocks $H_4(1)$ provided that the number of these blocks is not larger than the number of blocks $H_2(-1)$. In this case, we can start the reduction procedure described in the proof of Theorem 12 by "absorbing" the blocks $H_4(1)$ with the blocks $H_2(-1)$. More precisely, we can gather each block $H_4(1)$ with a block $H_2(-1)$, and use the $(\tau, \upsilon)$-invariant transformation $H_4(1) \oplus H_2(-1) \rightsquigarrow \Gamma_1^{\oplus 4}$. Once we have absorbed all blocks $H_4(1)$ we can continue with the reduction as explained in the proof of Theorem 12.

Conclusions and open questions
In this paper, we have obtained a necessary condition for the equation $X^\top A X = B$ to be consistent, with $A$, $B$ being complex square matrices and $B$ being symmetric. This condition improves the one obtained in [4, Th. 2]. Moreover, we have proved that the condition is sufficient when the CFC of $A$ does not contain blocks $H_4(1)$. This result also improves the one in [4, Th. 8], where the case in which the CFC has blocks $H_2(-1)$ was excluded.
As a natural continuation of this work it remains to address the case where the CFC of $A$ contains blocks $H_4(1)$, in order to fully characterize the consistency of $X^\top A X = B$, with $B$ symmetric, for any matrix $A$. We have seen that the condition mentioned above is no longer sufficient in this case, so a different characterization is needed. So far, we have been unable to find such a characterization.