A Theorem on the Rank of a Product of Matrices with Illustration of Its Use in Goodness of Fit Testing

Abstract

This paper develops a theorem that facilitates computing the degrees of freedom of Wald-type chi-square tests for moment restrictions when key matrices involved in the definition of the test are rank deficient. An if and only if (iff) condition is developed for a simple difference-of-ranks rule to be used when computing the desired degrees of freedom of the test. The theorem is developed using basic tools of matrix algebra. The theorem is shown to play a key role in proving the asymptotic chi-squaredness of a goodness of fit test in moment structure analysis, and in finding the degrees of freedom of this chi-square statistic.

Notes

  1. The algebraic equality of these alternative tests could be proven using Theorem 1 of Section 2 of Satorra and Neudecker (2003).

  2. A more general case of population drift can be handled by assuming that the asymptotic mean in assumption 4 of Lemma 3 is \(\delta \ne 0\). See Note 1 in the proof of Theorem 2.

References

  • Andrews, D. W. K. (1987). Asymptotic results for generalized Wald tests. Econometric Theory, 3, 348–358.

  • Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical & Statistical Psychology, 37, 62–83.

  • Jennrich, R. I., & Satorra, A. (2013a). Continuous orthogonal complement functions and distribution-free goodness of fit tests in moment structure analysis. Psychometrika, 78, 545–552.

  • Jennrich, R. I., & Satorra, A. (2013b). The nonsingularity of gamma in covariance structure analysis of nonnormal data. Psychometrika, 79, 51–59.

  • Magnus, J. R., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics (Revised ed.). New York: Wiley.

  • Mann, H. B., & Wald, A. (1943). On stochastic limit and order relationships. The Annals of Mathematical Statistics, 14, 217–226.

  • Moore, D. S. (1977). Generalized inverses, Wald’s method, and the construction of chi-square tests of fit. Journal of the American Statistical Association, 72, 131–137.

  • Puntanen, S., Styan, G. P. H., & Werner, H. J. (2003). On the rank of a matrix useful in goodness-of-fit testing of structural equation models (Solution). Econometric Theory, 19, 704–705.

  • Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified approach. Psychometrika, 54, 131–151.

  • Satorra, A. (1992). Asymptotic robust inferences in the analysis of mean and covariance structures. Sociological Methodology, 22, 249–278.

  • Satorra, A., & Neudecker, H. (2002). On the rank of a matrix useful in goodness-of-fit testing of structural equation models. Econometric Theory, 18, 1008–1009.

  • Satorra, A., & Neudecker, H. (2003). A matrix equality useful in goodness-of-fit testing of structural equation models. Journal of Statistical Planning and Inference, 114, 63–80.

  • Stewart, G. W. (1969). On the continuity of the generalized inverse. SIAM Journal on Applied Mathematics, 17, 33–45.

Acknowledgments

The authors would like to thank a reviewer for useful comments and the AE, Dr. Alberto Maydeu-Olivares, for suggestions that considerably improved the presentation and content of this paper. AS would especially like to honor the memory of Roger Millsap, a much-missed colleague who left us too soon and unexpectedly, and whose initiative as Editor made this paper possible. Research of AS is supported by Grant EC02011-28875 from the Spanish Ministry of Science and Innovation.

Author information

Correspondence to Albert Satorra.

Appendices

Appendix 1: Proof of Theorem 1

To prove Theorem 1, we require the following two lemmas.

Lemma 1

Consider matrices \(A\) (\(p, q\)) and \(B\) (\(p, m\)). Then

$$\begin{aligned} {\mathcal {M}} (A)\subset {\mathcal {M}} (B) \end{aligned}$$
(11)

iff

$$\begin{aligned} r(B) = r(B , A) \end{aligned}$$
(12)

Proof

Clearly, (11) is equivalent to the existence of an \((m \times q)\) matrix \(H\) with

$$\begin{aligned} A = BH . \end{aligned}$$
(13)

Whenever (13) holds, we have

$$\begin{aligned} r(B) = r(B,A) \end{aligned}$$

since

$$\begin{aligned} r(B) \le r(B,A) = r(B,BH) = r(B(I,H) ) \le r(B), \end{aligned}$$

where (16) below was used in the chain of inequalities. Conversely, if (12) holds, then \({\mathcal {M}} (B,A)\) and \({\mathcal {M}} (B)\) have the same dimension; since \({\mathcal {M}} (B)\subset {\mathcal {M}} (B,A)\), the two column spaces coincide, and hence \({\mathcal {M}} (A)\subset {\mathcal {M}} (B)\), which is (11). \(\square \)
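Lemma 1 is easy to verify numerically. The following is a minimal NumPy sketch of ours (not part of the paper) checking both directions of the iff on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
r = np.linalg.matrix_rank

# A = B H lies in M(B) by construction, as in (13).
B = rng.standard_normal((5, 3))
A = B @ rng.standard_normal((3, 2))
print(r(B), r(np.hstack([B, A])))        # 3 3: (12) holds

# Appending a column (almost surely) outside M(B) breaks the equality.
A_bad = np.hstack([A, rng.standard_normal((5, 1))])
print(r(B), r(np.hstack([B, A_bad])))    # 3 4: (12) fails, and so does (11)
```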

Lemma 2

Consider the matrices \(H \, (n, m)\) and \(X \, (m, m)\), where \(H\) is of full column rank and \(n \ge m\). Then

$$\begin{aligned} r(H X H^\prime ) = r(X). \end{aligned}$$
(14)

Proof

The result follows from the inequality chain

$$\begin{aligned} r(X) \ge r(H X H^\prime ) \ge r(H^\prime H X H^\prime H ), \end{aligned}$$
(15)

where we used

$$\begin{aligned} r(HT) \le \text{ min }\, [ r(H), r(T)] , \end{aligned}$$
(16)

(see, e.g., Magnus & Neudecker, 1999, Chap. 1.7) and

$$\begin{aligned} r(H^\prime H X H^\prime H ) = r(X) . \end{aligned}$$
(17)

To prove (17), we used

$$\begin{aligned} r(UBU^\prime ) = r(B) \end{aligned}$$
(18)

whenever \(U\) is non-singular, and

$$\begin{aligned} r(H^\prime H) = r(H) = m. \end{aligned}$$
(19)

\(\square \)
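Lemma 2 can likewise be checked numerically; a small sketch of ours with a rank-deficient \(X\):

```python
import numpy as np

rng = np.random.default_rng(1)
r = np.linalg.matrix_rank

H = rng.standard_normal((6, 3))    # n = 6 >= m = 3; full column rank a.s.
X = np.diag([1.0, -2.0, 0.0])      # r(X) = 2
print(r(H @ X @ H.T), r(X))        # 2 2: equation (14)
```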

We now move to the proof of Theorem 1.

Proof (Theorem 1)

When needed we use the equivalence of (4) and (5), due to Lemma 1. With fixed \(A\), equation (2) has as general solution

$$\begin{aligned} C^\prime = Q M \end{aligned}$$

with symmetric idempotent \(M = I_p - AA^+\) and arbitrary \((m,p)\, Q\); see, e.g., Magnus and Neudecker (1999, p. 38, Exercise 4). It is clear that \(r(M) = \text{ tr }\, M = p - r(A) = h\), say, and that \(MA = 0\).
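The properties of \(M\) just listed (symmetry, idempotence, rank, and \(MA = 0\)) can be confirmed directly; a short NumPy sketch of ours:

```python
import numpy as np

rng = np.random.default_rng(2)
r = np.linalg.matrix_rank

p, q = 6, 2
A = rng.standard_normal((p, q))
M = np.eye(p) - A @ np.linalg.pinv(A)              # M = I_p - A A^+

print(np.allclose(M, M.T), np.allclose(M @ M, M))  # True True: symmetric idempotent
print(r(M), round(np.trace(M)), p - r(A))          # 4 4 4: r(M) = tr M = p - r(A) = h
print(np.allclose(M @ A, 0))                       # True: M A = 0
```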

Let us write \(M = L L^\prime \) with \((p,h)\, L\) and \(L^\prime L = I_h\). It follows that \(C^\prime VC = QMVMQ^\prime = (QL) L^\prime VL (L^\prime Q^\prime )\) with \((m,h)\, QL\). As given, \(r(A) + r(C) = p\), hence \(r(C) = p - r(A) = h\). From the (in)equality chain

$$\begin{aligned} h = r(C) = r(C^\prime ) = r(QM) = r(QLL^\prime ) \le r(QL) \le r(L) = h \end{aligned}$$

we get \(h \le m\) and

$$\begin{aligned} r(QL) = h \end{aligned}$$
(20)

Hence \(QL\) is of full column rank. We used the (in)equality \(r(FG) \le \text{ min }\, [ r(F), r(G) ] \) and the property that the rank of a matrix cannot exceed its lowest dimension. Using Lemma 2, we conclude that

$$\begin{aligned} r(C^\prime V C) = r(L^\prime V L) . \end{aligned}$$
(21)
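Equation (21) can be checked on random matrices; in the sketch below (ours), \(L\) is obtained from the unit eigenvectors of \(M\):

```python
import numpy as np

rng = np.random.default_rng(3)
r = np.linalg.matrix_rank

p, q, m = 6, 2, 5
A = rng.standard_normal((p, q))
M = np.eye(p) - A @ np.linalg.pinv(A)
C = (rng.standard_normal((m, p)) @ M).T    # C' = Q M, so C'A = 0; r(C) = h a.s.

w, U = np.linalg.eigh(M)                   # M = L L' with L'L = I_h
L = U[:, w > 0.5]                          # eigenvectors with eigenvalue 1

G = rng.standard_normal((p, 3))
V = G @ G.T                                # a p.s.d. V of rank 3
print(r(C.T @ V @ C), r(L.T @ V @ L))      # 3 3: equation (21)
```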

Let us now start from \(r(C^\prime V C) = r(V) - r(A)\), i.e., from (3). By (21), this can be rephrased as

$$\begin{aligned} r(L^\prime V L) = r(V) - r(A) . \end{aligned}$$
(22)

We shall then show that \({\mathcal {M}} (A)\subset {\mathcal {M}} (V)\), and consequently (4).

Subproof: (1) \(MVM = LL^\prime V LL^\prime \), hence \(r(MVM) = r(LL^\prime V LL^\prime ) = r(L^\prime VL)\) as \(L\) has full column rank. Rewrite (22) in turn as

$$\begin{aligned} r(MVM) + r(A) = r(V) \end{aligned}$$
(23)

Clearly,

$$\begin{aligned} r(MV^{\frac{1}{2}},A)&= r[ (MV^{\frac{1}{2}},A)^\prime (MV^{\frac{1}{2}},A) ]\\&= r\left[ \left( \begin{array}{c} V^{\frac{1}{2}}M \\ A^{\prime } \end{array} \right) \left( \begin{array}{cc} M V^{\frac{1}{2}}, A \end{array} \right) \right] = r\left( \left[ \begin{array}{cc} \ V^{\frac{1}{2}}MV^{\frac{1}{2}}\ &{} V^{\frac{1}{2}}M A \\ A^\prime M V^{\frac{1}{2}} &{} A^{\prime }A \end{array} \right] \right) \\&= r\left( \left[ \begin{array}{cc} \ V^{\frac{1}{2}}MV^{\frac{1}{2}}\ &{} 0 \\ 0 &{} A^{\prime }A \end{array} \right] \right) =r(V^{\frac{1}{2}}M^{2}V^{\frac{1}{2}})+r(A^{\prime }A)\\&= r(MVM)+r(A). \end{aligned}$$

We used the idempotence of \(M\) and the definitional equality \(MA=0\).

So we have shown that

$$\begin{aligned} r(MV^{\frac{1}{2}},A)= r(V) . \end{aligned}$$
(24)
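The block computation above in fact gives the unconditional identity \(r(MV^{\frac{1}{2}},A) = r(MVM) + r(A)\), valid for any positive semi-definite \(V\) once \(MA = 0\); a quick numerical check (ours):

```python
import numpy as np

rng = np.random.default_rng(4)
r = np.linalg.matrix_rank

p, q = 6, 2
A = rng.standard_normal((p, q))
M = np.eye(p) - A @ np.linalg.pinv(A)

G = rng.standard_normal((p, 4))
V = G @ G.T                                                 # p.s.d. of rank 4
w, U = np.linalg.eigh(V)
V_half = U @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ U.T  # symmetric square root

print(r(np.hstack([M @ V_half, A])), r(M @ V @ M) + r(A))   # 6 6: ranks agree
```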

(2) Consider now an arbitrary vector \(x = V^{\frac{1}{2}}a \). Hence \(Mx = MV^{\frac{1}{2}}a . \) This is a consistent equation with general solution

$$\begin{aligned} x = M^+ M V^{\frac{1}{2}} a + (I_p - M^+ M)b \end{aligned}$$

(\(b\) arbitrary)

$$\begin{aligned}&= M V^{\frac{1}{2}} a + (I_p - M)b\\&= M V^{\frac{1}{2}} a + AA^+b\\&= (MV^{\frac{1}{2}},A) \left( \begin{array}{c} a \\ A^+ b \end{array} \right) , \end{aligned}$$

see, e.g., Magnus and Neudecker (1999, p. 37, Theorem 12). Note that \(M\) symmetric idempotent implies \(M^+ = M\). Hence \(x \in {\mathcal {M}} (MV^{\frac{1}{2}},A) \). Consequently, using (24),

$$\begin{aligned} {\mathcal {M}} ( V^{\frac{1}{2}}) = {\mathcal {M}} (MV^{\frac{1}{2}},A) \end{aligned}$$

and

$$\begin{aligned} {\mathcal {M}} (A) \subset {\mathcal {M}} (V^{\frac{1}{2}}) = {\mathcal {M}} ( V ) . \end{aligned}$$
(25)

So we have established

$$\begin{aligned} {\mathcal {M}} (A) \subset {\mathcal {M}} (V), \end{aligned}$$

consequently (4).

Finally, we shall start from (5) (i.e., from (4), in view of Lemma 1) and prove (3). Consider the partitioned \((p, p+q)\) matrix \((MV^{\frac{1}{2}},A) \). To express (5), write \(A = V^{\frac{1}{2}}P\) for some matrix \(P\). So

$$\begin{aligned} MV^{\frac{1}{2}} = ( I_p - AA^+) V^{\frac{1}{2}} = V^{\frac{1}{2}} - V^{\frac{1}{2}} P A^+ V^{\frac{1}{2}} = V^{\frac{1}{2}} (I_p - PA^+V^{\frac{1}{2}}) \end{aligned}$$

and

$$\begin{aligned} (MV^{\frac{1}{2}},A) \left( \begin{array}{c} I \\ A^+V^{\frac{1}{2}} \end{array} \right) = M V^{\frac{1}{2}} + A A^+ V^{\frac{1}{2}} = (M + A A^+) V^{\frac{1}{2}} = V^{\frac{1}{2}} . \end{aligned}$$

(We used \(M = I_p - A A^+\).) From this follows

$$\begin{aligned} r(V) = r ( V^{\frac{1}{2}}) \le r(MV^{\frac{1}{2}}, A) \end{aligned}$$
(26)

Further

$$\begin{aligned} (MV^{\frac{1}{2}},A) = ( V^{\frac{1}{2}} - V^{\frac{1}{2}} P A^+ V^{\frac{1}{2}}, V^{\frac{1}{2}}P ) = V^{\frac{1}{2}} (I_p - P A^+ V^{\frac{1}{2}}, P). \end{aligned}$$

Hence

$$\begin{aligned} r(MV^{\frac{1}{2}}, A) \le r(V^{\frac{1}{2}}) = r(V) . \end{aligned}$$
(27)

So by (26) and (27) we find

$$\begin{aligned} r(MV^{\frac{1}{2}},A) = r(V) . \end{aligned}$$
(28)

As shown earlier,

$$\begin{aligned} r(MV^{\frac{1}{2}}, A) = r(MVM) + r(A) . \end{aligned}$$

So,

$$\begin{aligned} r(MV^{\frac{1}{2}}, A) = r(L^\prime V L) + r(A) = r(C^\prime V C) + r(A), \end{aligned}$$

and

$$\begin{aligned} r(C^\prime V C) = r(V) - r(A). \end{aligned}$$

Hence (3) has now been proved.\(\square \)
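To illustrate the theorem just proved, here is a NumPy sketch of ours contrasting the two cases of the iff condition: the difference-of-ranks rule holds exactly when \({\mathcal {M}} (A)\subset {\mathcal {M}} (V)\), i.e., when \(r(V) = r(V,A)\):

```python
import numpy as np

rng = np.random.default_rng(5)
r = np.linalg.matrix_rank

p, q = 6, 2
G = rng.standard_normal((p, 4))
V = G @ G.T                                # p.s.d., r(V) = 4, M(V) = M(G)

# Case 1: M(A) inside M(V), so r(V) = r(V, A): the rank rule holds.
A = G @ rng.standard_normal((4, q))
C = np.eye(p) - A @ np.linalg.pinv(A)      # C'A = 0 and r(C) + r(A) = p
print(r(C.T @ V @ C), r(V) - r(A))         # 2 2

# Case 2: a column of A outside M(V): the rule fails.
A2 = np.hstack([G[:, :1], rng.standard_normal((p, 1))])
C2 = np.eye(p) - A2 @ np.linalg.pinv(A2)
print(r(np.hstack([V, A2])) == r(V))       # False: r(V, A2) > r(V)
print(r(C2.T @ V @ C2), r(V) - r(A2))      # 3 2: difference of ranks is wrong
```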

Appendix 2: Proof of Theorem 2

To make the dependency on sample size \(n\) more explicit, throughout this proof we rewrite \(s, \hat{\sigma }, \hat{\theta }, \hat{V}, \hat{A}, T\) as \(s_n, \sigma _n, \hat{\theta }_n, V_n, A_n, T_n\), respectively. We use “\(\mathop {\rightarrow }\limits ^{p} \)” and “\(\mathop {\rightarrow }\limits ^{d} \)” to denote convergence in probability and in distribution, respectively, and “\(\mathop {=}\limits ^{a} \)” to denote asymptotic equality (which implies that the difference between the left- and right-hand sides converges to zero in probability). To prove Theorem 2, we require the following lemma.

Lemma 3

Assume

  1. \(\sigma = \sigma (\theta )\) is a continuously differentiable \(p\)-valued function of \(\theta \in \Theta \subset R^q\), where \(\Theta \) is open and compact;

  2. \(\hat{\theta }_n\in \Theta \) is a sequence of random vectors with \(\hat{\theta }_n \mathop {\rightarrow }\limits ^{p} \theta _0\) and \(\sqrt{n}(\hat{\theta }_n- \theta _0) = O_p(1)\) {i.e., bounded in probability, as in Mann and Wald (1943)};

  3. \( s_n \in R^p\) is a sequence of random vectors with \( s_n \mathop {\rightarrow }\limits ^{p} \sigma _0 \);

  4. \(\sqrt{n} (s_n - \sigma _0) \mathop {\rightarrow }\limits ^{d} {\mathcal {N}}\, (0,V)\), where \(V\) is a \(p \times p\) positive semi-definite matrix;

  5. \( V_n\) are random \(p \times p\) matrices with \( V_n \mathop {\rightarrow }\limits ^{p} V\);

  6. \(\sigma _0 = \sigma (\theta _0), \, \theta _0 \in \Theta \) (see Footnote 2);

  7. \(A(\theta ) = \frac{ \partial \sigma (\theta )}{\partial \theta ^\prime }\) is regular at \(\theta _0\) {i.e., the matrix has constant rank for \(\theta \) in a neighborhood of \(\theta _0\)};

  8. \(C_n = I_p -A_n( A_n^\prime A_n)^+ A_n^\prime \) and \(C_0 = I_p -A_0(A_0^\prime A_0)^+ A_0^\prime \), where \(A_n\) and \(A_0\) are \(A(\theta ) \) evaluated, respectively, at \(\hat{\theta }_n\) and \(\theta _0\).

Then

$$\begin{aligned}&C_n^\prime V_n C_n \mathop {\rightarrow }\limits ^{p} C_0^\prime VC_0\end{aligned}$$
(29)
$$\begin{aligned}&\sqrt{n}C_n(s_n - \sigma _n ) \mathop {=}\limits ^{a} \sqrt{n}C_0 (s_n - \sigma _0) \end{aligned}$$
(30)

and

$$\begin{aligned} \sqrt{n} C_n^\prime (s_n - \sigma _n ) \mathop {\rightarrow }\limits ^{d} {\mathcal {N}}\, (0,C_0^\prime VC_0) \end{aligned}$$
(31)

Proof

Using 1, 2, 3, and 7, we obtain

$$\begin{aligned} \sqrt{n} (s_n - \sigma _n ) = \sqrt{n} [(s_n - \sigma _0) - (\sigma _n - \sigma _0 ) ] \end{aligned}$$

and

$$\begin{aligned} \sqrt{n} (s_n - \sigma _n ) \mathop {=}\limits ^{a} \sqrt{n} [(s_n - \sigma _0) - A_0(\hat{\theta }_n- \theta _0)] \end{aligned}$$
(32)

To proceed with the proof, we need a result on the continuity of the Moore-Penrose inverse.

Result S (Stewart, 1969): Let \(M_n\) be a sequence of square matrices. Then \(M_n^+ \rightarrow M^+\) provided that

  1. \(M_n \rightarrow M \), and

  2. there exists an \(n^\prime \in N\) such that \(r(M_n) = r(M)\) for \(n > n ^\prime \).

We need the following adaptation to the stochastic case, which we infer from Andrews (1987); a numerical illustration of the role of the rank condition follows the statement of Result A.

Result A (Andrews, 1987, p. 355): Let \(M_n\) be a sequence of square random matrices. Then \( M_n^+ \mathop {\rightarrow }\limits ^{p} M^+\) provided that

  1. \(M_n \mathop {\rightarrow }\limits ^{p} M \), and

  2. \(P[r(M_n) = r(M) ]\rightarrow 1\) as \(n \rightarrow \infty \).
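The rank condition in Results S and A is not cosmetic: without it, the Moore-Penrose inverse is discontinuous. A small numerical illustration (ours):

```python
import numpy as np

# Without rank stability, M_n -> M does not give pinv(M_n) -> pinv(M).
for n in (10, 100, 1000):
    M_n = np.diag([1.0, 1.0 / n])          # rank 2 for every finite n
    print(np.linalg.pinv(M_n)[1, 1])       # equals n: diverges, yet M_n -> diag(1, 0)

# With r(M_n) = r(M) along the sequence, continuity holds (Result S).
M = np.diag([1.0, 0.0])
M_n = np.diag([1.0 + 1e-3, 0.0])           # rank 1, matching r(M)
print(np.linalg.pinv(M_n)[0, 0])           # ~1.0, close to pinv(M)[0, 0] = 1.0
```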

The continuity of \(A(\theta )\) and assumption 2 imply \(A_n \mathop {\rightarrow }\limits ^{p} A_0\), and thus \(A_n^\prime A_n \mathop {\rightarrow }\limits ^{p} A_0^\prime A_0\). Since \(A(\theta )\) is regular at \(\theta _0\), we obtain \(P(r(A_n) = r(A_0)) \rightarrow 1\) and hence

$$\begin{aligned} P[ r(A_n^\prime A_n) = r(A_n) = r(A_0) = r(A_0^\prime A_0) ] \rightarrow 1. \end{aligned}$$

This and Result A imply

$$\begin{aligned} (A_n^\prime A_n)^+ \mathop {\rightarrow }\limits ^{p} ( A_0^\prime A_0)^+ \end{aligned}$$

and hence

$$\begin{aligned} C_n \mathop {\rightarrow }\limits ^{p} C_0 , \end{aligned}$$
(33)

where \(C_n\) and \(C_0\) are defined in 8. By (33), and the convergence of \(V_n\) to \(V\), we obtain (29) of the lemma.

Combining (32), (33), assumptions 2 and 5, and \(C_0^\prime A_0 = 0\), we obtain (30). The result (31) follows directly from (30) and assumption 4. \(\square \)

We now move to the proof of Theorem 2.

Proof (Theorem 2)

By Theorem 1, assumption 8 of Lemma 3 (which implies \(C_n^\prime A_n = 0\) and \(r(C_n) + r(A_n) =p\)), and assumption 2, for any \(\epsilon >0\) there is an integer \(n^\prime \) such that for \(n >n^\prime \)

$$\begin{aligned} P( r( C_n^\prime V_n C_n ) = r(V_n) - r(A_n) ) > 1 - \epsilon /2; \end{aligned}$$
(34)

in addition, from assumption 1, there is an integer \(n^{\prime \prime }\) such that for \(n > n^{\prime \prime }\)

$$\begin{aligned} P( r(V_n) = r(V) ) > 1 - \epsilon /2. \end{aligned}$$
(35)

Thus, when \(n > \text{ max }\, (n^\prime , n^{\prime \prime })\),

$$\begin{aligned} P( r( C_n^\prime V_n C_n ) = r(V_n) - r(A_n) = r(V) - r(A_0) ) > 1 - \epsilon \end{aligned}$$
(36)

Here we used

$$\begin{aligned} P({\mathcal {E}} \cap {\mathcal {U}} ) \ge P({\mathcal {E}} ) + P({\mathcal {U}} ) -1, \end{aligned}$$
(37)

where \({\mathcal {E}} \) and \({\mathcal {U}} \) denote events in the same probability space (in the present case, \({\mathcal {E}} \) and \({\mathcal {U}} \) are the arguments of the probabilities in (34) and (35), respectively). Now, using again Theorem 1, when \(n > \text{ max }\, (n^\prime , n^{\prime \prime })\), the result (36) expands to

$$\begin{aligned} P( r( C_n^\prime V_n C_n ) = r(V_n) - r(A_n) = r(V) - r(A_0) = r( C_0^\prime V C_0) ) > 1 - \epsilon \end{aligned}$$
(38)

Since (38) holds for any \(\epsilon >0\), we conclude

$$\begin{aligned} P( r( C_n^\prime V_n C_n ) = r( C_0^\prime V C_0) ) \rightarrow 1 \end{aligned}$$
(39)

From (29), (39), and Result A, we obtain the key result

$$\begin{aligned} ( C_n^\prime V_n C_n )^+ \mathop {\rightarrow }\limits ^{p} ( C_0^\prime V C_0 )^+ \end{aligned}$$
(40)

Clearly, using 4 of Lemma 3, we obtain

$$\begin{aligned} \sqrt{n} C_0^\prime (s_n - \sigma _0) \mathop {\rightarrow }\limits ^{d} {\mathcal {N}}\, (0,C_0^\prime VC_0) \end{aligned}$$
(41)

Now, combining (40) and (41) yields (using Theorem 2 of Moore, 1977)

$$\begin{aligned} n (s_n - \sigma _0)^\prime C_0 ( C_0^\prime V C_0 )^+ C_0^\prime (s_n - \sigma _0) \mathop {\rightarrow }\limits ^{d} \chi ^2_{r( C_0^\prime V C_0) } \end{aligned}$$
(42)

The correspondence with Moore’s Theorem 2 is as follows: Moore’s convergence in probability of \(B_n\) is our (40); Moore’s \(\Sigma \) and \(\sqrt{n}(t_n - \tau _0)\) are respectively \((C_0^\prime V C_0 )\) and \(\sqrt{n} C_0^\prime (s_n - \sigma _0) \); finally, Moore’s degrees of freedom \(k\) is our \(r(C_0^\prime V C_0 )\).

The asymptotic chi-square result (8) follows by noting that the quadratic form value in (42) is asymptotically equal to \(T\) of (7), due to (30) and (40). Finally, using again Theorem 1 and \(r(V) = r(V,A_0)\), we obtain (9), concluding the proof of the theorem. \(\square \)
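As a numerical companion to Theorem 2, the following Monte Carlo sketch (our construction, not the paper's) uses a toy moment structure \(\sigma (\theta ) = \theta a\) with a singular \(V\) whose column space contains \(a\); the statistic mirrors the quadratic form in (42), with \(V_n\) the sample covariance. The simulated statistic behaves like a chi-square with \(r(V) - r(A) = 1\) degree of freedom:

```python
import numpy as np

rng = np.random.default_rng(6)
r = np.linalg.matrix_rank

# Toy moment structure sigma(theta) = theta * a, scalar theta (p = 3, q = 1).
a = np.ones(3)
A = a[:, None]                             # A = d sigma / d theta', r(A) = 1
B = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
V = B @ B.T                                # singular: r(V) = 2, and a lies in M(V)
theta0, n, reps = 2.0, 500, 2000

C = np.eye(3) - A @ np.linalg.pinv(A)      # C'A = 0; constant in theta here
T = np.empty(reps)
for k in range(reps):
    x = theta0 * a + rng.standard_normal((n, 2)) @ B.T  # E x_i = sigma(theta0), cov V
    s = x.mean(axis=0)                                  # s_n
    theta_hat = a @ s / (a @ a)                         # least-squares estimate
    Vn = np.cov(x, rowvar=False)
    res = s - theta_hat * a
    T[k] = n * res @ C @ np.linalg.pinv(C.T @ Vn @ C) @ C.T @ res

print(r(V) - r(A), T.mean(), (T > 3.84).mean())  # ~ (1, 1.0, 0.05): chi-square(1)
```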

Note 1 (the non-central case) When assumption 4 of Lemma 3 is replaced by

$$\begin{aligned} \sqrt{n} (s_n - \sigma _0) \mathop {\rightarrow }\limits ^{d} {\mathcal {N}}\, (\delta ,V) \end{aligned}$$
(43)

where \(\delta \) is a \(p \times 1\) vector, then the results of Lemma 3 apply except that the right-hand side of (31) would now have mean \(C_0^\prime \delta \) instead of zero.

When in addition \(\delta \in {\mathcal {M}} (V) \), the asymptotic distribution of \(T_n\) of Theorem 2 is \(\chi ^2_r(\lambda )\), with \(\lambda = \delta ^\prime C_0(C_0^\prime V C_0)^+C_0^\prime \delta \) and the same \(r=r(C_0^\prime V C_0)\). This follows by noting that \(\delta \in {\mathcal {M}} (V) [= {\mathcal {M}} ( V^{1/2} ) ]\) implies \( C_0^\prime \delta \in {\mathcal {M}} (C_0^\prime VC_0 ) [ = {\mathcal {M}} (C_0^\prime V^{1/2} ) ] \); consequently, the assumptions for part b. of Theorem 2 of Moore (1977) hold, yielding a non-central chi-square distribution. Note that we used \( {\mathcal {M}} ( UU^\prime ) = {\mathcal {M}} ( U ) \) for any matrix \(U\) (see A4 of Satorra & Neudecker, 2003, p. 77). Assumption (43) with \(\delta \ne 0\) can be rephrased as a sequence of local alternatives (as in, e.g., Satorra, 1989).
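Continuing the sketch following the proof of Theorem 2, the noncentrality parameter \(\lambda \) can be computed directly for a (hypothetical) drift direction \(\delta \in {\mathcal {M}} (V)\):

```python
import numpy as np

# Same toy quantities as in the sketch following the proof of Theorem 2.
B = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
V = B @ B.T
A = np.ones(3)[:, None]
C0 = np.eye(3) - A @ np.linalg.pinv(A)

delta = B @ np.array([0.5, -0.3])          # delta in M(V) by construction
lam = delta @ C0 @ np.linalg.pinv(C0.T @ V @ C0) @ C0.T @ delta
print(lam)                                 # noncentrality of the limiting chi-square(1)
```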

Cite this article

Satorra, A., Neudecker, H. A Theorem on the Rank of a Product of Matrices with Illustration of Its Use in Goodness of Fit Testing. Psychometrika 80, 938–948 (2015). https://doi.org/10.1007/s11336-014-9438-5

Keywords

  • matrix algebra
  • rank deficiency
  • Wald test
  • chi-square goodness of fit test
  • augmented moment structures
  • structural equation modeling