Appendix 1: Proof of Theorem 1
To prove Theorem 1, we require the following two lemmas.
Lemma 1
Consider matrices \(A\) (\(p, q\)) and \(B\) (\(p, m\)). Then
$$\begin{aligned} {\mathcal {M}} (A)\subset {\mathcal {M}} (B) \end{aligned}$$
(11)
if and only if
$$\begin{aligned} r(B) = r(B , A) \end{aligned}$$
(12)
Proof
Clearly, (11) is equivalent to the existence of an \((m \times q)\) matrix \(H\) with
$$\begin{aligned} A = BH . \end{aligned}$$
(13)
Whenever (13) holds, we have
$$\begin{aligned} r(B) = r(B,A) \end{aligned}$$
since
$$\begin{aligned} r(B) \le r(B,A) = r(B,BH) = r(B(I,H) ) \le r(B), \end{aligned}$$
where (16) below was used to establish the inequality chain. \(\square \)
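As an informal numerical illustration of Lemma 1 (not part of the formal argument), the equivalence of (11) and (12) can be checked with random matrices; the dimensions and helper functions below are ad hoc choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def col_space_included(A, B, tol=1e-10):
    """M(A) in M(B): projecting the columns of A onto M(B) leaves them unchanged."""
    P = B @ np.linalg.pinv(B)  # orthogonal projector onto the column space of B
    return np.allclose(P @ A, A, atol=tol)

def rank_criterion(A, B):
    """Condition (12): r(B) = r(B, A)."""
    return bool(np.linalg.matrix_rank(B) == np.linalg.matrix_rank(np.hstack([B, A])))

p, q, m = 5, 3, 2
B = rng.standard_normal((p, m))
A = B @ rng.standard_normal((m, q))       # A = BH, so M(A) in M(B) holds
print(col_space_included(A, B), rank_criterion(A, B))    # True True

A2 = rng.standard_normal((p, q))          # generic A: the inclusion fails
print(col_space_included(A2, B), rank_criterion(A2, B))  # False False
```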
Lemma 2
Consider the matrices \(H \, (n, m)\) and \(X \, (m, m)\), where \(H\) is of full column rank and \(n \ge m\). Then
$$\begin{aligned} r(H X H^\prime ) = r(X). \end{aligned}$$
(14)
Proof
The result follows from the inequality chain
$$\begin{aligned} r(X) \ge r(H X H^\prime ) \ge r(H^\prime H X H^\prime H ), \end{aligned}$$
(15)
where we used
$$\begin{aligned} r(HT) \le \text{ min }\, [ r(H), r(T)] , \end{aligned}$$
(16)
(see, e.g., Magnus & Neudecker, 1999, Chap. 1.7) and
$$\begin{aligned} r(H^\prime H X H^\prime H ) = r(X) . \end{aligned}$$
(17)
To prove (17), we used
$$\begin{aligned} r(UBU^\prime ) = r(B) \end{aligned}$$
(18)
whenever \(U\) is non-singular, and
$$\begin{aligned} r(H^\prime H) = r(H) = m. \end{aligned}$$
(19)
\(\square \)
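Lemma 2 can likewise be confirmed numerically (an informal check only); the dimensions below are arbitrary, and a generic Gaussian draw of \(H\) has full column rank with probability one:

```python
import numpy as np

rng = np.random.default_rng(1)
r = np.linalg.matrix_rank

n, m = 6, 4
H = rng.standard_normal((n, m))   # generic draw: full column rank m
G = rng.standard_normal((m, 2))
X = G @ G.T                       # symmetric X of rank 2 (rank-deficient on purpose)

print(r(X), r(H @ X @ H.T))       # both equal 2, illustrating (14)
```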
We now move to the proof of Theorem 1.
Proof (Theorem 1)
When needed, we use the equivalence of (4) and (5), due to Lemma 1. With fixed \(A\), equation (2) has as general solution
$$\begin{aligned} C^\prime = Q M \end{aligned}$$
with symmetric idempotent \(M = I_p - AA^+\) and arbitrary \((m,p)\, Q\); see, e.g., Magnus and Neudecker (1999, p. 38, Exercise 4). It is clear that \(r(M) = \text{ tr }\, M = p - r(A) = h\), say. Clearly, \(MA = 0\).
Let us write \(M = L L^\prime \) with \((p,h)\, L\) and \(L^\prime L = I_h\). It follows that \(C^\prime VC = QMVMQ^\prime = (QL) L^\prime VL (L^\prime Q^\prime )\) with \((m,h)\, QL\). As given, \(r(A) + r(C) = p\), hence \(r(C) = p - r(A) = h\). From the (in)equality chain
$$\begin{aligned} h = r(C) = r(C^\prime ) = r(QM) = r(QLL^\prime ) \le r(QL) \le r(L) = h \end{aligned}$$
we get \(h \le m\) and
$$\begin{aligned} r(QL) = h \end{aligned}$$
(20)
Hence \(QL\) is of full column rank. We used the (in)equality \(r(FG) \le \text{ min }\, [ r(F), r(G) ] \) and the property that the rank of a matrix cannot exceed its lowest dimension. Using Lemma 2, we conclude that
$$\begin{aligned} r(C^\prime V C) = r(L^\prime V L) . \end{aligned}$$
(21)
We now start from \(r(C^\prime V C) = r(V) - r(A)\). On the strength of (21), this can be rephrased as
$$\begin{aligned} r(L^\prime V L) = r(V) - r(A) . \end{aligned}$$
(22)
We shall then show that \({\mathcal {M}} (A)\subset {\mathcal {M}} (V)\), and consequently (4).
Subproof: (1) \(MVM = LL^\prime V LL^\prime \), hence \(r(MVM) = r(LL^\prime V LL^\prime ) = r(L^\prime VL)\) as \(L\) has full column rank. Rewrite (22) in turn as
$$\begin{aligned} r(MVM) + r(A) = r(V) \end{aligned}$$
(23)
Clearly,
$$\begin{aligned} r(MV^{\frac{1}{2}},A)&= r[ (MV^{\frac{1}{2}},A)^\prime (MV^{\frac{1}{2}},A) ]\\&= r\left[ \left( \begin{array}{c} V^{\frac{1}{2}}M \\ A^{\prime } \end{array} \right) \left( \begin{array}{cc} M V^{\frac{1}{2}}, A \end{array} \right) \right] = r\left( \left[ \begin{array}{cc} \ V^{\frac{1}{2}}MV^{\frac{1}{2}}\ &{} V^{\frac{1}{2}}M A \\ A^\prime M V^{\frac{1}{2}} &{} A^{\prime }A \end{array} \right] \right) \\&= r\left( \left[ \begin{array}{cc} \ V^{\frac{1}{2}}MV^{\frac{1}{2}}\ &{} 0 \\ 0 &{} A^{\prime }A \end{array} \right] \right) =r(V^{\frac{1}{2}}M^{2}V^{\frac{1}{2}})+r(A^{\prime }A)\\&= r(MVM)+r(A). \end{aligned}$$
We used the idempotence of \(M\) and the definitional equality \(MA=0\).
So we have shown that
$$\begin{aligned} r(MV^{\frac{1}{2}},A)= r(V) . \end{aligned}$$
(24)
(2) Consider now an arbitrary vector \(x = V^{\frac{1}{2}}a \). Hence \(Mx = MV^{\frac{1}{2}}a . \) This is a consistent equation with general solution
$$\begin{aligned} x&= M^+ M V^{\frac{1}{2}} a + (I_p - M^+ M)b \quad (b \text{ arbitrary})\\&= M V^{\frac{1}{2}} a + (I_p - M)b\\&= M V^{\frac{1}{2}} a + AA^+b\\&= (MV^{\frac{1}{2}},A) \left( \begin{array}{c} a \\ A^+ b \end{array} \right) , \end{aligned}$$
see, e.g., Magnus and Neudecker (1999, p. 37, Theorem 12). Note that \(M\) idempotent implies \(M^+ = M\). Hence \(x \in {\mathcal {M}} (MV^{\frac{1}{2}},A) \). Consequently
$$\begin{aligned} {\mathcal {M}} ( V^{\frac{1}{2}}) = {\mathcal {M}} (MV^{\frac{1}{2}},A) \end{aligned}$$
and
$$\begin{aligned} {\mathcal {M}} (A) \subset {\mathcal {M}} (V^{\frac{1}{2}}) = {\mathcal {M}} ( V ) . \end{aligned}$$
(25)
So we have established
$$\begin{aligned} {\mathcal {M}} (A) \subset {\mathcal {M}} (V), \end{aligned}$$
consequently (4).
Finally, we shall start from (5) (i.e., from (4), in view of Lemma 1) and prove (3). Consider the partitioned \((p, p+q)\) matrix \((MV^{\frac{1}{2}},A) \). Then write \(A = V^{\frac{1}{2}}P\), which expresses (5). So
$$\begin{aligned} MV^{\frac{1}{2}} = ( I_p - AA^+) V^{\frac{1}{2}} = V^{\frac{1}{2}} - V^{\frac{1}{2}} P A^+ V^{\frac{1}{2}} = V^{\frac{1}{2}} (I_p - PA^+V^{\frac{1}{2}}) \end{aligned}$$
and
$$\begin{aligned} (MV^{\frac{1}{2}},A) \left( \begin{array}{c} I \\ A^+V^{\frac{1}{2}} \end{array} \right) = M V^{\frac{1}{2}} + A A^+ V^{\frac{1}{2}} = (M + A A^+) V^{\frac{1}{2}} = V^{\frac{1}{2}} . \end{aligned}$$
(We used \(M = I_p - A A^+\).) From this follows
$$\begin{aligned} r(V) = r ( V^{\frac{1}{2}}) \le r(MV^{\frac{1}{2}}, A) \end{aligned}$$
(26)
Further
$$\begin{aligned} (MV^{\frac{1}{2}},A) = ( V^{\frac{1}{2}} - V^{\frac{1}{2}} P A^+ V^{\frac{1}{2}}, V^{\frac{1}{2}}P ) = V^{\frac{1}{2}} (I_p - P A^+ V^{\frac{1}{2}}, P). \end{aligned}$$
Hence
$$\begin{aligned} r(MV^{\frac{1}{2}}, A) \le r(V^{\frac{1}{2}}) = r(V) . \end{aligned}$$
(27)
So by (26) and (27) we find
$$\begin{aligned} r(MV^{\frac{1}{2}},A) = r(V) . \end{aligned}$$
(28)
As shown earlier,
$$\begin{aligned} r(MV^{\frac{1}{2}}, A) = r(MVM) + r(A) . \end{aligned}$$
So,
$$\begin{aligned} r(MV^{\frac{1}{2}}, A) = r(L^\prime V L) + r(A) = r(C^\prime V C) + r(A), \end{aligned}$$
and
$$\begin{aligned} r(C^\prime V C) = r(V) - r(A). \end{aligned}$$
Hence (3) has now been proved.\(\square \)
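The equivalence just established can be illustrated numerically (an informal check, not part of the proof): with \(C^\prime = QM\) as in the proof, the rank identity (3) holds or fails together with the column-space condition \(r(V) = r(V,A)\) of Lemma 1. The dimensions and random draws below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
r = np.linalg.matrix_rank

p, q, m = 6, 2, 6
A = rng.standard_normal((p, q))
M = np.eye(p) - A @ np.linalg.pinv(A)   # M from the proof
Q = rng.standard_normal((m, p))
C = M @ Q.T                             # C' = QM, so C'A = 0 and r(C) = p - r(A)

def both_sides(V):
    rank_identity = bool(r(C.T @ V @ C) == r(V) - r(A))
    inclusion = bool(r(V) == r(np.hstack([V, A])))   # M(A) in M(V), by Lemma 1
    return rank_identity, inclusion

B = rng.standard_normal((p, 2))
V1 = np.hstack([A, B]) @ np.hstack([A, B]).T   # M(A) in M(V1) by construction
print(both_sides(V1))                           # (True, True)

W = rng.standard_normal((p, 4))
V2 = W @ W.T                                    # same rank, but M(A) not in M(V2)
print(both_sides(V2))                           # (False, False)
```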
Appendix 2: Proof of Theorem 2
To make the dependency on the sample size \(n\) more explicit, throughout this proof we rewrite \(s, \hat{\sigma }, \hat{\theta }, \hat{V}, \hat{A}, T\) as \(s_n, \sigma _n, \hat{\theta }_n, V_n, A_n, T_n\), respectively. We use “\(\mathop {\rightarrow }\limits ^{p} \)” and “\(\mathop {\rightarrow }\limits ^{d} \)” to denote convergence in probability and in distribution, respectively, and “\(\mathop {=}\limits ^{a} \)” to denote asymptotic equality (which implies that the difference between the left- and right-hand sides converges to zero in probability). To prove Theorem 2, we require the following lemma.
Lemma 3
Assume:

1. \(\sigma = \sigma (\theta )\) is a continuously differentiable \(p\)-valued function of \(\theta \in \Theta \subset R^q\), where \(\Theta \) is open and compact;
2. \(\hat{\theta }_n\in \Theta \) is a sequence of random vectors with \(\hat{\theta }_n \mathop {\rightarrow }\limits ^{p} \theta _0\) and \(\sqrt{n}(\hat{\theta }_n- \theta _0) = O_p(1)\) (i.e., bounded in probability, as in Mann and Wald (1943));
3. \( s_n \in R^p\) is a sequence of random vectors with \( s_n \mathop {\rightarrow }\limits ^{p} \sigma _0 \);
4. \(\sqrt{n} (s_n - \sigma _0) \mathop {\rightarrow }\limits ^{d} {\mathcal {N}}\, (0,V)\), where \(V\) is a \(p \times p\) positive semi-definite matrix;
5. \( V_n\) are random \(p \times p\) matrices with \( V_n \mathop {\rightarrow }\limits ^{p} V\);
6. \(\sigma _0 = \sigma (\theta _0)\), \(\theta _0 \in \Theta \);
7. \(A(\theta ) = \frac{ \partial \sigma (\theta )}{\partial \theta ^\prime }\) is regular at \(\theta _0\) (i.e., the matrix has constant rank for \(\theta \) in a neighborhood of \(\theta _0\));
8. \(C_n = I_p -A_n( A_n^\prime A_n)^+ A_n^\prime \) and \(C_0 = I_p -A_0(A_0^\prime A_0)^+ A_0^\prime \), where \(A_n\) and \(A_0\) are \(A(\theta ) \) evaluated, respectively, at \(\hat{\theta }_n\) and \(\theta _0\).
Then
$$\begin{aligned}&C_n^\prime V_n C_n \mathop {\rightarrow }\limits ^{p} C_0^\prime VC_0\end{aligned}$$
(29)
$$\begin{aligned}&\sqrt{n}C_n(s_n - \sigma _n ) \mathop {=}\limits ^{a} \sqrt{n}C_0 (s_n - \sigma _0) \end{aligned}$$
(30)
and
$$\begin{aligned} \sqrt{n} C_n^\prime (s_n - \sigma _n ) \mathop {\rightarrow }\limits ^{d} {\mathcal {N}}\, (0,C_0^\prime VC_0) \end{aligned}$$
(31)
Proof
Using 1, 2, 3, and 7, we obtain
$$\begin{aligned} \sqrt{n} (s_n - \sigma _n ) = \sqrt{n} [(s_n - \sigma _0) - (\sigma _n - \sigma _0 ) ] \end{aligned}$$
and, by a first-order Taylor expansion of \(\sigma (\hat{\theta }_n)\) around \(\theta _0\),
$$\begin{aligned} \sqrt{n} (s_n - \sigma _n ) \mathop {=}\limits ^{a} \sqrt{n} [(s_n - \sigma _0) - A_0(\hat{\theta }_n- \theta _0)] \end{aligned}$$
(32)
To proceed with the proof, we need a result on the continuity of the Moore-Penrose inverse.
Result S (Stewart, 1969): Let \(M_n\) be a sequence of square matrices. Then \(M_n^+ \rightarrow M^+\) provided that

1. \(M_n \rightarrow M \) and
2. there exists an \(n^\prime \in N\) such that \(r(M_n) = r(M)\) when \(n > n ^\prime \).
We need the following adaptation to the stochastic case, which we infer from Andrews (1987).
Result A (Andrews, 1987, p. 355): Let \(M_n\) be a sequence of square random matrices. Then \( M_n^+ \mathop {\rightarrow }\limits ^{p} M^+\) provided that

1. \(M_n \mathop {\rightarrow }\limits ^{p} M \) and
2. \(P[r(M_n) = r(M) ]\rightarrow 1\) as \(n \rightarrow \infty \).
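A concrete deterministic example shows why the rank condition in Results S and A cannot be dropped; the diagonal matrices below are illustrative only:

```python
import numpy as np

# Without the rank condition, pseudo-inverses need not converge:
M = np.diag([1.0, 0.0])                    # limit matrix, rank 1
for n in (10, 100, 1000):
    Mn = np.diag([1.0, 1.0 / n])           # Mn -> M, but r(Mn) = 2 > r(M) = 1
    print(np.linalg.pinv(Mn)[1, 1])        # ~ n: the pseudo-inverse diverges

# With ranks matching along the sequence, continuity holds:
for n in (10, 100, 1000):
    Mn = np.diag([1.0 + 1.0 / n, 0.0])     # r(Mn) = r(M) = 1
    print(np.linalg.pinv(Mn)[0, 0])        # -> 1 = (M^+)[0, 0]
```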
The continuity of \(A(\theta )\) and 2 imply \(A_n \mathop {\rightarrow }\limits ^{p} A_0\), thus \(A_n^\prime A_n \mathop {\rightarrow }\limits ^{p} A_0^\prime A_0\). Since \(A(\theta )\) is regular at \(\theta _0\), we obtain \(P[r(A_n) = r(A_0)] \rightarrow 1\) and hence
$$\begin{aligned} P[ r(A_n^\prime A_n) = r(A_n) = r(A_0) = r(A_0^\prime A_0) ] \rightarrow 1. \end{aligned}$$
This and Result A imply
$$\begin{aligned} (A_n^\prime A_n)^+ \mathop {\rightarrow }\limits ^{p} ( A_0^\prime A_0)^+ \end{aligned}$$
and hence
$$\begin{aligned} C_n \mathop {\rightarrow }\limits ^{p} C_0 , \end{aligned}$$
(33)
where \(C_n\) and \(C_0\) are defined in 8. By (33), and the convergence of \(V_n\) to \(V\), we obtain (29) of the lemma.
Combining (32), (33), assumptions 2 and 5, and \(C_0^\prime A_0 = 0\), we obtain (30). The result (31) follows directly from (30) and assumption 4. \(\square \)
We now move to the proof of Theorem 2.
Proof (Theorem 2)
Using Theorem 1, assumption 8 of Lemma 3 (which implies \(C_n^\prime A_n = 0\) and \(r(C_n) + r(A_n) =p\)), and assumption 2, we find that for any \(\epsilon >0\) there is an integer \(n^\prime \) such that for \(n >n^\prime \)
$$\begin{aligned} P( r( C_n^\prime V_n C_n ) = r(V_n) - r(A_n) ) > 1 - \epsilon /2; \end{aligned}$$
(34)
in addition, from assumption 1, there is an integer \(n^{\prime \prime }\) such that for \(n > n^{\prime \prime }\)
$$\begin{aligned} P( r(V_n) = r(V) ) > 1 - \epsilon /2. \end{aligned}$$
(35)
Thus, when \(n > \text{ max }\, (n^\prime , n^{\prime \prime })\),
$$\begin{aligned} P( r( C_n^\prime V_n C_n ) = r(V_n) - r(A_n) = r(V) - r(A_0) ) > 1 - \epsilon \end{aligned}$$
(36)
Here we used
$$\begin{aligned} P({\mathcal {E}} \cap {\mathcal {U}} ) \ge P({\mathcal {E}} ) + P({\mathcal {U}} ) -1, \end{aligned}$$
(37)
where \({\mathcal {E}} \) and \({\mathcal {U}} \) denote events in the same probability space (in the present case, \({\mathcal {E}} \) and \({\mathcal {U}} \) are the arguments of the probabilities in (34) and (35), respectively). Now, using again Theorem 1, when \(n > \text{ max }\, (n^\prime , n^{\prime \prime })\), the result (36) expands to
$$\begin{aligned} P( r( C_n^\prime V_n C_n ) = r(V_n) - r(A_n) = r(V) - r(A_0) = r( C_0^\prime V C_0) ) > 1 - \epsilon \end{aligned}$$
(38)
Since (38) holds for any \(\epsilon >0\), we conclude
$$\begin{aligned} P( r( C_n^\prime V_n C_n ) = r( C_0^\prime V C_0) ) \rightarrow 1 \end{aligned}$$
(39)
From (29), (39), and Result A, we obtain the key result
$$\begin{aligned} ( C_n^\prime V_n C_n )^+ \mathop {\rightarrow }\limits ^{p} ( C_0^\prime V C_0 )^+ \end{aligned}$$
(40)
Clearly, using 4 of Lemma 3, we obtain
$$\begin{aligned} \sqrt{n} C_0^\prime (s - \sigma _0) \mathop {\rightarrow }\limits ^{d} {\mathcal {N}}\, (0,C_0^\prime VC_0) \end{aligned}$$
(41)
Now, combining (40) and (41) yields (using Theorem 2 of Moore, 1977)
$$\begin{aligned} n (s_n - \sigma _0)^\prime C_0 ( C_0^\prime V C_0 )^+ C_0^\prime (s_n - \sigma _0) \mathop {\rightarrow }\limits ^{d} \chi ^2_{r( C_0^\prime V C_0) } \end{aligned}$$
(42)
The correspondence with Moore’s Theorem 2 is as follows: Moore’s convergence in probability of \(B_n\) is our (40); Moore’s \(\Sigma \) and \(\sqrt{n}(t_n - \tau _0)\) are respectively \((C_0^\prime V C_0 )\) and \(\sqrt{n} C_0^\prime (s_n - \sigma _0) \); finally, Moore’s degrees of freedom \(k\) is our \(r(C_0^\prime V C_0 )\).
The asymptotic chi-square result (8) follows by noting that the quadratic form value in (42) is asymptotically equal to \(T\) of (7), due to (30) and (40). Finally, using again Theorem 1 and \(r(V) = r(V,A_0)\), we obtain (9), concluding the proof of the theorem. \(\square \)
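The limiting chi-square in (42) can be illustrated by a small Monte Carlo experiment (an informal check only), replacing \(\sqrt{n}(s_n - \sigma _0)\) by an exact draw from \({\mathcal {N}}(0,V)\) for a singular \(V\) satisfying \({\mathcal {M}} (A_0)\subset {\mathcal {M}} (V)\); all dimensions and draws below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
r = np.linalg.matrix_rank

p = 5
A0 = rng.standard_normal((p, 2))
B = rng.standard_normal((p, 2))
W = np.hstack([A0, B])
V = W @ W.T                                # singular V with M(A0) in M(V)

C0 = np.eye(p) - A0 @ np.linalg.pinv(A0)   # C0 = I_p - A0 (A0'A0)^+ A0'
Omega = np.linalg.pinv(C0.T @ V @ C0)      # (C0' V C0)^+
df = int(r(C0.T @ V @ C0))
print(df, int(r(V) - r(A0)))               # degrees of freedom r(V) - r(A0), by Theorem 1

# z plays the role of sqrt(n) (s_n - sigma_0) ~ N(0, V)
stats = []
for _ in range(2000):
    z = W @ rng.standard_normal(W.shape[1])
    stats.append(z @ C0 @ Omega @ C0.T @ z)
print(round(float(np.mean(stats)), 2))     # close to df, the chi-square mean
```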
Note 1 (the non-central case) When 4 of Lemma 3 is replaced by
$$\begin{aligned} \sqrt{n} (s_n - \sigma _0) \mathop {\rightarrow }\limits ^{d} {\mathcal {N}}\, (\delta ,V) \end{aligned}$$
(43)
where \(\delta \) is a \(p \times 1\) vector, then the results of Lemma 3 apply except that the right-hand side of (31) would now have mean \(C_0^\prime \delta \) instead of zero.
When in addition \(\delta \in {\mathcal {M}} (V) \), the asymptotic distribution of \(T_n\) of Theorem 2 is \(\chi ^2_r(\lambda )\), with \(\lambda = \delta ^\prime C_0(C_0^\prime V C_0)^+C_0^\prime \delta \) and the same value \(r=r(C_0^\prime V C_0)\). This follows by noting that \(\delta \in {\mathcal {M}} (V) [= {\mathcal {M}} ( V^{1/2} ) ]\) implies \( C_0^\prime \delta \in {\mathcal {M}} (C_0^\prime VC_0 ) [ = {\mathcal {M}} (C_0^\prime V^{1/2} ) ] \); consequently, the assumptions for part b. of Theorem 2 of Moore (1977) hold, yielding a non-central chi-square distribution. Note that we used \( {\mathcal {M}} ( UU^\prime ) = {\mathcal {M}} ( U ) \) for any matrix \(U\) (see A4 of Satorra & Neudecker, 2003, p. 77). Assumption (43) with \(\delta \ne 0\) could be rephrased as the assumption of a sequence of local alternatives (as in, e.g., Satorra, 1989).