Abstract
A theorem on the number of distinct eigenvalues of diagonalizable matrices is obtained. Some applications related to the separation of close eigenvalues, triangular defective matrices, and the adjacency and walk matrices of graphs are discussed. Further ideas and examples are provided.
Introduction
Most of the work done in recent years on the number of distinct eigenvalues of matrices revolves around the relationships between graphs and matrices. In particular, the minimum number of distinct eigenvalues of graphs has been studied in articles such as [1, 3, 6, 8, 14, 17]. Other recent articles on the number of distinct eigenvalues concern low-rank perturbations of matrices; see [11, 15, 19, 20].
In this paper, the question about distinct eigenvalues is formulated as follows.
Is there a way to find the number of distinct eigenvalues of a given complex matrix without actually calculating the eigenvalues?
In the next section, we provide an affirmative answer to this question in the general case of diagonalizable matrices. Then we present some applications in the third section.
Main result
We start this section by stating and proving the following lemma which will be used in the proof of our main result.
Lemma 2.1
Let V be a vector space of dimension \(n \ge 2\) and let H be a subspace of V of dimension k with \(2\le k \le n\). For every vector \(v \in V\), there exists an orthonormal basis \(B = \{w_1, \dots , w_k\}\) of H such that at least \(k-1\) elements of B are orthogonal to v.
Proof
We prove the lemma for \({\mathbb {C}}^n\) and the proof extends naturally to every vector space endowed with an inner product and having dimension \(\ge 2\).
The trivial case is where v is orthogonal to H. If \(v \in H\), then the existence of B is ensured by the Gram–Schmidt process. A description of this process can be found in [13, Sect. 0.6.4]. Suppose that \(v \notin H\) and v is not orthogonal to H. Then there exists a vector \(v_1 \in H\) such that \(v_1^*v \ne 0\). Let \(\{v_1, \dots , v_k\}\) be a basis of H.
For \(i=2, 3, \dots , k\), we use \(v, v_1\) and \(v_i\) to form the vector
\(u_i = v_i - \dfrac{v^* v_i}{v^* v_1}\, v_1,\)
which is orthogonal to v. Moreover, \(u_2, u_3, \dots , u_k\) are linearly independent, belong to H and span a subspace \(G \subset H\) of dimension \(k-1\). The Gram–Schmidt process is then applied to obtain an orthonormal basis \(\{w_2, \dots , w_k\}\) of G from \(u_2, u_3, \dots , u_k\). Note that the vectors \(w_2, \dots , w_k\) are orthogonal to v since G itself is orthogonal to v. Let
\(w_1 = \dfrac{v_1 - \sum _{j=2}^{k} (w_j^* v_1)\, w_j}{\left\Vert v_1 - \sum _{j=2}^{k} (w_j^* v_1)\, w_j \right\Vert }.\)
Then \(w_1 \in H, ~\Vert w_1\Vert =1\) and \(\{w_1, \dots , w_k\}\) forms an orthonormal basis of H. \(\square \)
Now we state our main theorem.
Theorem 2.2
Let M be an \(n \times n~\) complex diagonalizable matrix with \(n\ge 2\) and let \(\lambda _1, \dots , \lambda _r\) be the distinct eigenvalues of M with \(r\in \{1, \dots , n\}\).
For every nonzero vector \(v \in {\mathbb {C}}^n\), let \(R_M(v)\) be the subspace of \({\mathbb {C}}^n\) defined by
\(R_M(v) = \operatorname {span}\{M^k v : k = 0, 1, 2, \dots \}.\)
If v is not orthogonal to t left eigenspaces of M but is orthogonal to the \(r-t\) remaining ones for some \(t\in \{1, \dots , r\}\), then the vectors \(v,~ Mv,~ \dots ,~ M^{t-1}v\) are linearly independent and span \(R_M(v)\).
Proof
Let \(\beta _1, \dots , \beta _n\) be the eigenvalues of M (not necessarily distinct). Since M is diagonalizable, it has a spectral decomposition of the form
\(M = \sum _{i=1}^{n} \beta _i\, x_i y_i^*, \qquad (1)\)
where \(x_i\) and \(y_i\) are, respectively, right and left eigenvectors of M associated with \(\beta _i\) and satisfying
\(y_i^* x_j = \delta _{ij}, \quad i, j = 1, \dots , n. \qquad (2)\)
To take into account the multiplicities of the eigenvalues of M,
we use the following notation:

1. \(\lambda _1, \dots , \lambda _r\) are the distinct eigenvalues of M for some \(r\le n\).
2. \(g_i\) is the multiplicity of \(\lambda _i\).
3. In (1), the right and left eigenvectors of M associated with \(\lambda _i\) are denoted, respectively, by \(x_{i_1}, \dots , x_{i_{g_i}}\) and \(y_{i_1}, \dots , y_{i_{g_i}}\).
4. The eigenspace of M associated with \(\lambda _i\) is denoted by \(H_i\). Note that \(x_{i_1}, \dots , x_{i_{g_i}}\) form a basis of \(H_i\).
Then (1) can be written as
\(M = \sum _{i=1}^{r} \lambda _i \sum _{j=1}^{g_i} x_{i_j} y_{i_j}^*, \qquad (3)\)
and (2) as
\(y_{i_j}^* x_{k_l} = 1 ~\text { if }~ (i, j) = (k, l), \quad y_{i_j}^* x_{k_l} = 0 ~\text { otherwise}. \qquad (4)\)
Note also that
\(\sum _{i=1}^{r} \sum _{j=1}^{g_i} x_{i_j} y_{i_j}^* = I_n. \qquad (5)\)
Using (3), (4), (5) and the notation \(M^0=I_n\), we have
\(M^k = \sum _{i=1}^{r} \lambda _i^k \sum _{j=1}^{g_i} x_{i_j} y_{i_j}^*, \quad k = 0, 1, 2, \dots \qquad (6)\)
Let \(t \in \{1, \dots , r\}\) and v be a nonzero vector in \({\mathbb {C}}^n\) not orthogonal to \(H_i\) for \(i=1, \dots , t\) and orthogonal to \(H_i\) for \(i=t+1, \dots , r\). Then (6) implies
\(v^* M^k = \sum _{i=1}^{t} \lambda _i^k \sum _{j=1}^{g_i} \left( v^* x_{i_j}\right) y_{i_j}^*, \quad k = 0, 1, 2, \dots \qquad (7)\)
By Lemma 2.1, for each \(i = 1, \dots , t\), the eigenvectors \(x_{i_1}, \dots , x_{i_{g_i}}\) associated with \(\lambda _i\) can be chosen to form an orthonormal set in which \(v^*x_{i_1} \ne 0\) and \(v^*x_{i_j} = 0\) for \(j=2, \dots , g_i\) (if \(g_i \ge 2\)). It follows from (7) that
\(v^* M^k = \sum _{i=1}^{t} \lambda _i^k \left( v^* x_{i_1}\right) y_{i_1}^*, \quad k = 0, 1, 2, \dots \qquad (8)\)
Now, let \(a_0, \dots , a_{t-1}\) be complex numbers such that
\(\sum _{k=0}^{t-1} a_k\, v^* M^k = 0, \qquad (9)\)
or equivalently, by (8),
\(\sum _{i=1}^{t} \left( \sum _{k=0}^{t-1} a_k \lambda _i^k \right) \left( v^* x_{i_1}\right) y_{i_1}^* = 0. \qquad (10)\)
Since \(v^*x_{i_1} \ne 0\) for \(i=1, \dots , t\) and the left eigenvectors \(\{y_{i_1} : 1\le i \le t\}\) are linearly independent, (10) implies
\(\sum _{k=0}^{t-1} a_k \lambda _i^k = 0, \quad i = 1, \dots , t. \qquad (11)\)
If at least one of the coefficients \(a_k\) is different from zero, then (11) implies that the polynomial \(f(z) =\sum _{k=0}^{t-1} a_kz^k\) has t distinct roots \(\lambda _1, \dots , \lambda _t\), while its degree does not exceed \(t-1\). This is a contradiction. Hence, \(a_0=a_1=\dots =a_{t-1}=0\). Then we deduce from (9) that the vectors \(v,~ M^*v,~ \dots ,~ (M^*)^{t-1}v\) are linearly independent. Since there are t of them, these vectors span the subspace spanned by \(\{y_{i_1} : 1\le i \le t\}\), as can be seen from (8). This subspace contains all the vectors of the form \((M^*)^kv,~ k\in {\mathbb {N}}\cup \{0\}\), as can also be seen from (8). Now we complete the proof using the fact that the eigenvalues of \(M^*\) are the complex conjugates of those of M and the eigenspace of \(M^*\) associated with \(\lambda _i^*\) is the left eigenspace of M associated with \(\lambda _i\). \(\square \)
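To make the statement of Theorem 2.2 concrete, here is a small numerical sketch (an illustration of ours, not from the paper; the diagonal matrix and the vector are hypothetical choices). For a diagonal M, the left eigenspaces are spanned by the canonical vectors, so a v orthogonal to exactly one of them gives \(t = 2\):

```python
import numpy as np

# Diagonal M: left eigenspaces are span{e1}, span{e2}, span{e3}.
M = np.diag([1.0, 2.0, 3.0])
v = np.array([1.0, 1.0, 0.0])   # orthogonal to span{e3} only, so t = 2
K = np.column_stack([v, M @ v, M @ M @ v])
print(np.linalg.matrix_rank(K))  # 2 = t, as the theorem predicts
```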
The practical aspect of Theorem 2.2 is reflected in the following two corollaries.
Corollary 2.3
Let M be an \(n \times n~\) complex diagonalizable matrix and q(M) be the number of its distinct eigenvalues. Then
\(q(M) = \max _{0 \ne v \in {\mathbb {C}}^n} \operatorname {rank}\left[ v ~~ Mv ~~ M^2v ~~ \dots ~~ M^{n-1}v \right] . \qquad (12)\)
Proof
It is always possible to find a vector \(v_0\) that is not orthogonal to any left eigenspace of M, since the vectors that are orthogonal to at least one left eigenspace form a finite union of proper subspaces of \({\mathbb {C}}^n\), which cannot cover \({\mathbb {C}}^n\). Then the corollary follows from Theorem 2.2. \(\square \)
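A brief numerical sketch of Corollary 2.3 (our own illustration, under the stated assumptions; the test matrix is hypothetical): a randomly drawn v is, with probability 1, not orthogonal to any left eigenspace, so the Krylov rank recovers \(q(M)\), while a badly aligned v undercounts.

```python
import numpy as np

def krylov_rank(M, v):
    """Rank of the Krylov matrix [v, Mv, ..., M^{n-1} v]."""
    n = M.shape[0]
    cols = [v]
    for _ in range(n - 1):
        cols.append(M @ cols[-1])
    return np.linalg.matrix_rank(np.column_stack(cols))

M = np.diag([1.0, 1.0, 2.0])               # q(M) = 2
rng = np.random.default_rng(0)
v = rng.standard_normal(3)                  # generic v: not orthogonal to any left eigenspace
print(krylov_rank(M, v))                    # 2
print(krylov_rank(M, np.array([0.0, 0.0, 1.0])))  # 1: e3 is orthogonal to the eigenspace of 1
```

The maximum in (12) is attained by almost every v, which is what makes the corollary usable in practice.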
Corollary 2.4
An \(n \times n~\) complex diagonalizable matrix M has at least k distinct eigenvalues if and only if there exists a nonzero vector \(v \in {\mathbb {C}}^n\) such that the matrix
\(A = \left[ v ~~ Mv ~~ M^2v ~~ \dots ~~ M^{n-1}v \right] \)
has rank k.
Proof
Suppose that M has at least k distinct eigenvalues. Following the same notation as in Theorem 2.2, we can use (5) to verify that the vector \(v = y_{1_1}+y_{2_1}+\dots +y_{k_1}\) is not orthogonal to any of the k subspaces of M associated with \(\lambda _1, \dots , \lambda _k\), but orthogonal to each of its subspaces associated with \(\lambda _{k+1}, \dots , \lambda _r\). This tells us that it is always possible to find a vector v that is not orthogonal to exactly k left eigenspaces of M and orthogonal to the remaining ones (if any). It follows from Theorem 2.2 that A has rank k. Conversely, suppose that the rank of A is k. Then by Corollary 2.3, M cannot have less than k distinct eigenvalues. \(\square \)
Remark 2.5
Theorem 2.2 and its corollaries do not hold in the general case of defective (nondiagonalizable) matrices. Here is a counterexample.
Example 2.6
Consider the following matrix:
If we choose the canonical vector \(v = [1 ~~ 0 ~~ 0]^T\), then the matrix
is nonsingular since its determinant equals 1. According to Corollary 3.1, the matrix M should then have 3 distinct eigenvalues. However, this is not the case, since M is defective, as can be seen from its Jordan form
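The same failure can be reproduced numerically (a sketch of ours; the Jordan-block matrix below is a hypothetical stand-in for the paper's example, which is not reproduced here): for a defective matrix the Krylov matrix can have full rank even though there is only one distinct eigenvalue.

```python
import numpy as np

# Hypothetical defective matrix: a single 3x3 Jordan block with eigenvalue 2,
# so q(M) = 1 but M is not diagonalizable.
M = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])
v = np.array([0.0, 0.0, 1.0])   # e3 is a cyclic vector for this Jordan block
K = np.column_stack([v, M @ v, M @ M @ v])
print(np.linalg.matrix_rank(K))  # 3, although M has only one distinct eigenvalue
```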
Remark 2.7
The inequality
holds for every \(n \times n~\) complex matrix M. If, in addition, M is diagonalizable, then it follows from (12) and (13) that
Remark 2.8
Theorem 2.2 extends to the general case of diagonalizable matrices over an algebraically closed field \({\mathbb {K}}\). In fact, every diagonalizable matrix M in \(M_n({\mathbb {K}})\), the set of \(n \times n~\) matrices with elements in \({\mathbb {K}}\), has the form
\(M = \sum _{i=1}^{n} \beta _i\, x_i y_i^T,\)
where \(\{x_i\}\) and \(\{y_i\}\) satisfy
\(y_i^T x_j = \delta _{ij}, \quad i, j = 1, \dots , n.\)
Replacing \(y_i^*\) by \(y_i^T\) in the proof of Theorem 2.2, we can see that this theorem applies to M.
Applications
Diagonalizable matrices with distinct eigenvalues
It is possible to check that all the eigenvalues of a given diagonalizable matrix are simple using the following corollary.
Corollary 3.1
Let M be an \(n \times n~\) complex diagonalizable matrix. Then M has n distinct eigenvalues if and only if there exists a vector \(v \in {\mathbb {C}}^n\) such that \(v,~ Mv,~ \dots ,~ M^{n1}v\) are linearly independent.
Proof
Follows immediately from Corollary 2.4. \(\square \)
Example 3.2
Consider the \(3\times 3\) diagonalizable matrix
Let us choose v to be the canonical vector \(v = [0, ~1, ~0]^T\) and form the matrix
Since \(\det (A) = 5 \ne 0\), Corollary 3.1 implies that the eigenvalues of M are distinct.
Separation of close eigenvalues
Example 3.3
Consider the diagonalizable matrix
Let \(v = [ 0~~ 0~~ 3]^T\) and
If we use approximation to one decimal digit, the eigenvalues of M all look equal to each other: \(\lambda _1 \approx \lambda _2 \approx \lambda _3 \approx 1.0\). However, the determinant of A is clearly different from zero, \(\det (A) = 5.514\), which implies by Corollary 3.1 that the eigenvalues of M are surely distinct. In fact, the eigenvalues of M are exactly \(\lambda _1 = 1, ~~ \lambda _2 =1.01~\) and \(~\lambda _3 = 1.02\). This example shows that Corollary 3.1 can serve as a supporting test to check whether the close eigenvalues of large diagonalizable matrices are distinct.
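A numerical sketch of this separation idea (our own illustration; since the paper's matrix is not reproduced here, we use a hypothetical diagonal matrix with the same close eigenvalues 1, 1.01, 1.02):

```python
import numpy as np

# Hypothetical diagonalizable matrix with close eigenvalues 1, 1.01, 1.02
# (any similarity transform of this diagonal matrix would work as well).
M = np.diag([1.0, 1.01, 1.02])
v = np.ones(3)                              # not orthogonal to any left eigenspace here
A = np.column_stack([v, M @ v, M @ M @ v])
print(np.round(np.linalg.eigvalsh(M), 1))   # all look like 1.0 at one decimal digit
print(np.linalg.det(A))                     # about 2e-06: tiny but nonzero
print(np.linalg.matrix_rank(A))             # 3 => the eigenvalues are distinct
```

Here \(\det (A)\) is a Vandermonde-type determinant, \((1.01-1)(1.02-1)(1.02-1.01) = 2\times 10^{-6}\), so distinctness is detected even though the eigenvalues agree to one decimal digit.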
Criterion for a triangular matrix to be defective
Corollary 3.4
Let M be an \(n \times n~\) triangular complex matrix and suppose that some of the diagonal elements of M repeat. If there exists a vector \(v \in {\mathbb {C}}^n\) such that the matrix \(\left[ v ~~ Mv ~~ M^2v ~~ \dots ~~ M^{n1}v \right] \) is nonsingular, then M is defective.
Proof
Follows from Corollary 3.1 and the fact that the diagonal elements of a triangular matrix are its eigenvalues. \(\square \)
Example 3.5
Consider the upper-triangular matrix
The eigenvalue \(\lambda =2\) has algebraic multiplicity 2, but its geometric multiplicity is not apparent, so we do not know whether M is diagonalizable. We choose, for example, \(v = [0 ~~ 0 ~~ 0 ~~ 1]^T\) to form the matrix
which is nonsingular since its determinant is nonzero, det\((A) =504\). It follows from Corollary 3.4 that M is defective and the geometric multiplicity of the eigenvalue \(\lambda = 2\) is equal to 1. In fact,
where the matrix in the middle is a Jordan canonical form of M.
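The defectiveness test of Corollary 3.4 is easy to run numerically (a sketch of ours; the paper's \(4\times 4\) matrix is not reproduced here, so the upper-triangular matrix below, with the diagonal entry 2 repeated, is a hypothetical substitute):

```python
import numpy as np

# Hypothetical upper-triangular matrix; the diagonal entry 2 repeats.
M = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 1.0, 0.0],
              [0.0, 0.0, 3.0, 1.0],
              [0.0, 0.0, 0.0, 4.0]])
v = np.array([0.0, 0.0, 0.0, 1.0])
cols, w = [], v
for _ in range(4):                  # build [v, Mv, M^2 v, M^3 v]
    cols.append(w)
    w = M @ w
A = np.column_stack(cols)
# A is nonsingular while the diagonal of M repeats, so by Corollary 3.4 M is defective.
print(np.linalg.matrix_rank(A))     # 4
```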
In the next two subsections, we show how two existing theorems in graph theory become consequences of our main result.
Application to the adjacency matrix of an undirected graph
The following theorem is a well-known result on the adjacency matrix of an undirected graph. It has an elegant proof based on the fact that the roots of the minimal polynomial of a symmetric matrix are distinct, see [4, Theorem 2.2.1].
Theorem 3.6
Let A be the adjacency matrix of an undirected graph G. The number of distinct eigenvalues q(A) of A is at least one more than the diameter d of G. That is,
\(q(A) \ge d + 1.\)
We prove this theorem based on Corollary 2.3.
Proof
Let \(a_1, \dots , a_n\) be the vertices of G. Without loss of generality, we assume that the diameter of G occurs between \(a_1\) and \(a_{d+1}\) for some \(d\in \{1, 2, \dots , n-1\}\) and consists of the edges \(a_1a_2, a_2a_3, \dots \), \(a_da_{d+1}\). Hence, for \(i=2, \dots , d+1\), the shortest path between \(a_i\) and \(a_1\) has length \(i-1\). It is known that the (i, j)th entry of \(A^t\) is equal to the number of walks of length t between \(a_i\) and \(a_j\) for \(t=1, 2, \dots \). Therefore,
\(\left( A^t \right) _{i1} = 0 ~\text { for }~ t < i-1, \quad \text {and} \quad \left( A^{i-1} \right) _{i1} \ne 0, \quad i = 2, \dots , d+1. \qquad (14)\)
Denote by \((A^t)_1\) the first column of the matrix \(A^t\) and let
\(B = \left[ e_1 ~~ (A)_1 ~~ (A^2)_1 ~~ \dots ~~ (A^d)_1 \right] ,\)
where \(e_1 = [1, 0, \dots , 0]^T\). Then by (14), the submatrix formed by the first \(d+1\) rows of B is lower triangular with nonzero diagonal entries, which implies that the columns of B are linearly independent.
Since
\(B = \left[ e_1 ~~ Ae_1 ~~ A^2e_1 ~~ \dots ~~ A^de_1 \right] ,\)
it follows from Corollary 2.3 that the matrix A has at least \(d+1\) distinct eigenvalues. \(\square \)
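The proof above can be checked on a small example (our own illustration, not from the paper): a path graph on 4 vertices has diameter \(d = 3\), so Theorem 3.6 forces all 4 eigenvalues of its adjacency matrix to be distinct.

```python
import numpy as np

# Path graph on 4 vertices (1-2-3-4): diameter d = 3, so q(A) >= d + 1 = 4.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
e1 = np.array([1.0, 0.0, 0.0, 0.0])
B = np.column_stack([e1, A @ e1, A @ A @ e1, A @ A @ A @ e1])
print(np.linalg.matrix_rank(B))                       # 4: the columns are independent
print(len(set(np.round(np.linalg.eigvalsh(A), 8))))   # 4 distinct eigenvalues
```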
It is known that Theorem 3.6 extends to all nonnegative symmetric matrices associated with G (matrices in which the (i, j)th off-diagonal entry is zero if and only if there is no edge between \(a_i\) and \(a_j\)). Theorem 3.2 in [1] is another extension of Theorem 3.6. Both of these extensions can be proved based on Corollary 2.3, as is done above in the proof of Theorem 3.6.
If we let S(G) be the set of all symmetric matrices associated with G and q(G) be the minimum number of distinct eigenvalues a matrix in S(G) can have, then it follows from (12) that
\(q(G) = \min _{B \in S(G)} ~ \max _{0 \ne v \in {\mathbb {C}}^n} \operatorname {rank}\left[ v ~~ Bv ~~ \dots ~~ B^{n-1}v \right] .\)
The question that arises now is the following: for a given graph G with adjacency matrix A, how can we find a vector v that leads to the best possible lower bound for q(A) or q(G)?
Extensive research has been done on graph eigenvalues, their number and their multiplicities. Besides the articles cited in the introduction, we invite the reader to look at [5, 9] and [10].
The walk matrix and main eigenvalues of a graph
Let G be an undirected graph with n vertices \(a_1, \dots , a_n\) and A be its adjacency matrix. Denote by q(A) the number of distinct eigenvalues of A. Let \(e = [1, \dots , 1]^T\), the all-ones vector of n components. An eigenvalue \(\lambda \) of A is said to be a main eigenvalue of G if it is associated with at least one eigenvector v of A that is not orthogonal to e. Some work done on this matter can be found in [2, 7, 12, 16, 18] and [21]. The walk matrix W of the graph G is given by
\(W = \left[ e ~~ Ae ~~ A^2e ~~ \dots ~~ A^{n-1}e \right] .\)
The walk matrix acquires its importance from the fact that \(w_{ij}\) is equal to the number of walks of length \(j-1\) that start at vertex \(a_i\), with \(1\le i \le n\) and \(2\le j \le n\). It follows from Corollary 2.4 that
\(q(A) \ge \operatorname {rank}(W).\)
The following theorem that relates the walk matrix and the main eigenvalues is obtained in [12].
Theorem 3.7
[12, Theorem 2.1] The rank of the walk matrix of a graph G is equal to the number of its main eigenvalues.
In this theorem, the graph G is simple, as explained in the introduction of [12]. Saying that \(\lambda \) is associated with an eigenvector v that is not orthogonal to e is the same as saying that e is not orthogonal to the eigenspace of A associated with \(\lambda \). Since A is symmetric, the right and left eigenspaces of A associated with the same eigenvalue are equal. Therefore, the above theorem is a consequence of Theorem 2.2.
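The relationship between the walk matrix and the main eigenvalues can be seen on a small graph (our own illustration, not from the cited papers): in the star \(K_{1,2}\), the eigenvalue 0 has eigenvector \((0, 1, -1)\), which is orthogonal to e, so only \(\pm \sqrt{2}\) are main eigenvalues.

```python
import numpy as np

# Star K_{1,2} (centre listed first): eigenvalues sqrt(2), -sqrt(2), 0.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
n = A.shape[0]
e = np.ones(n)
W = np.column_stack([np.linalg.matrix_power(A, k) @ e for k in range(n)])
# rank(W) = 2 = number of main eigenvalues: the eigenvector (0, 1, -1)
# of the eigenvalue 0 is orthogonal to e.
print(np.linalg.matrix_rank(W))     # 2
```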
Conclusion
We have shown how the rank of the matrix \(\left[ v~~ Mv~~ \dots ~~ M^{n-1}v\right] \), \(v \in {\mathbb {C}}^n\), can give information about the number of distinct eigenvalues of a diagonalizable matrix M. Some applications have been discussed briefly throughout the paper, and it seems that this idea has further applications in linear algebra, combinatorics and numerical analysis. Therefore, it deserves further exploration.
References
Ahmadi, B.; et al.: Minimum number of distinct eigenvalues of graphs. Electron. J. Linear Algebra 26, 673–691 (2013)
Andelic, M.; et al.: Some new considerations about double nested graphs. Linear Algebra Appl. 483, 323–341 (2015)
Barrett, W.; et al.: Generalization of the strong Arnold property and the minimum number of distinct eigenvalues of a graph. Electron. J. Combin. 24, 2–40 (2017)
Brualdi, R.; Ryser, H.J.: Combinatorial Matrix Theory. Cambridge University Press, Cambridge (1991)
Bu, C.; Zhang, X.; Zhou, J.: A note on the multiplicities of graph eigenvalues. Linear Algebra Appl. 422, 69–74 (2014)
da Fonseca, C.M.: A lower bound for the number of distinct eigenvalues of some real symmetric matrices. Electron. J. Linear Algebra 21, 3–11 (2010)
Deng, H.; Huang, H.: On the main signless Laplacian eigenvalues of a graph. Electron. J. Linear Algebra 26, 381–393 (2013)
Duarte, A.L.; Johnson, C.R.: On the minimum number of distinct eigenvalues for a symmetric matrix whose graph is a given tree. Math. Inequal. Appl. 5, 175–180 (2002)
Erić, A.; da Fonseca, C.M.: Some consequences of an inequality on the spectral multiplicity of graphs. Filomat 27, 1455–1461 (2013)
Erić, A.; da Fonseca, C.M.: Unordered multiplicity lists of wide double paths. Ars Math. Contemp. 6, 279–288 (2013)
Farrell, P.E.: The number of distinct eigenvalues of a matrix after perturbation. SIAM J. Matrix Anal. Appl. 37, 572–576 (2016)
Hagos, E.M.: Some results on graph spectra. Linear Algebra Appl. 356, 103–111 (2002)
Horn, R.A.; Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, Cambridge (2013)
Levene, R.H.; Oblak, P.; Šmigoc, H.: A Nordhaus–Gaddum conjecture for the minimum number of distinct eigenvalues of a graph. Linear Algebra Appl. 564, 236–263 (2019)
Moon, S.; Park, S.: Upper bound for the number of distinct eigenvalues of a perturbed matrix. Electron. J. Linear Algebra 34, 115–124 (2018)
Rowlinson, P.: The main eigenvalues of a graph: a survey. Appl. Anal. Disc. Math. 1, 455–471 (2007)
Saiago, C.M.: Diagonalizable matrices whose graph is a tree: The minimum number of distinct eigenvalues and the feasibility of eigenvalue assignment. Spec. Matrices 7, 316–326 (2019)
Stanic, Z.: Main eigenvalues of real symmetric matrices with application to signed graphs. Czechoslovak Math. J. 70, 1091–1102 (2020)
Wang, Y.; Wu, G.: Refined bound on the number of distinct eigenvalues of a matrix after perturbation. Linear Multilinear Algebra 68, 903–914 (2020)
Xu, X.: An improved upper bound for the number of distinct eigenvalues of a matrix after perturbation. Linear Algebra Appl. 523, 109–117 (2017)
Zhou, H.: The main eigenvalues of the Seidel matrix. Math. Moravica 12, 111–116 (2008)
Acknowledgements
The author very much appreciates the comments and suggestions made by Professor Frank J. Hall.
Funding
No external funds or grants were received for this work.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
I declare that I have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Marsli, R. A theorem on the number of distinct eigenvalues. Arab. J. Math. (2022). https://doi.org/10.1007/s40065-022-00377-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s40065-022-00377-x
Mathematics Subject Classification
 15A18
 15A42
 15B51