
An Extended GFfit Statistic Defined on Orthogonal Components of Pearson’s Chi-Square

Abstract

The Pearson and likelihood ratio statistics are commonly used to test goodness of fit for models applied to data from a multinomial distribution. The goodness-of-fit test based on Pearson's chi-square statistic is sometimes considered to be a global test that gives little guidance to the source of poor fit when the null hypothesis is rejected, and it has also been recognized that the global test can often be outperformed in terms of power by focused or directional tests. For the cross-classification of a large number of manifest variables, the GFfit statistic focused on second-order marginals for variable pairs ij has been proposed as a diagnostic to aid in finding the source of lack of fit after the model has been rejected based on a more global test. When data come from a table formed by the cross-classification of a large number of variables, the common global statistics may also have low power and an inaccurate Type I error level due to sparseness in the cells of the table. The sparseness problem is rarely encountered with the GFfit statistic because it is focused on the lower-order marginals. In this paper, a new and extended version of the GFfit statistic is proposed by decomposing the Pearson statistic from the full table into orthogonal components defined on marginal distributions and then defining the new version, \(GFfit_{\perp }^{(ij)}\), as a partial sum of these orthogonal components. While the emphasis is on lower-order marginals, the new version of \(GFfit_{\perp }^{(ij)}\) is also extended to higher-order tables so that the \(GFfit_{\perp }\) statistics sum to the Pearson statistic. As orthogonal components of the Pearson \(X^2\) statistic, \(GFfit_{\perp }^{(ij)}\) statistics have advantages over other lack-of-fit diagnostics currently available for cross-classified tables. Simulation results show that the \(GFfit_{\perp }^{(ij)}\) statistics generally have higher power to detect lack of fit while maintaining good Type I error control even when the joint frequencies are very sparse. Theoretical results establish that \(GFfit_{\perp }^{(ij)}\) statistics have known degrees of freedom and are asymptotically independent with a known joint distribution, a property that facilitates less conservative control of the false discovery rate (FDR) or familywise error rate (FWER) in a high-dimensional table, which would produce a large number of bivariate lack-of-fit diagnostics. Computation of the \(GFfit_{\perp }^{(ij)}\) statistics is also numerically stable. The extended \(GFfit_{\perp }^{(ij)}\) statistic can be applied to a variety of models for cross-classified tables. An application of the new GFfit statistic as a diagnostic for a latent variable model is presented.


References

  • Afifi, A. A., & Clark, V. (1984). Computer-aided multivariate analysis. Lifetime Learning Publications.

  • Agresti, A., & Yang, M. C. (1987). An empirical investigation of some effects of sparseness in contingency tables. Computational Statistics & Data Analysis, 5, 9–21.


  • Asparouhov, T., & Muthén, B. (2010). Simple second order chi-square correction. Mplus technical report. https://www.statmodel.com/download/WLSMV_new_chi21.pdf. Accessed 18 Feb 2018

  • Bartholomew, D. J. (1987). Latent variable models and factor analysis. Oxford University Press.

  • Bartholomew, D. J., & Leung, S. O. (2002). A goodness-of-fit test for sparse \(2^p\) contingency tables. British Journal of Mathematical and Statistical Psychology, 55, 1–15.


  • Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300.


  • Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29(4), 1165–1188.


  • Birch, M. W. (1964). A new proof of the Pearson-Fisher theorem. Annals of Mathematical Statistics, 35, 818–824.

  • Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.


  • Breinegaard, N., Rabe-Hesketh, S. & Skrondal, A. (2018). Pairwise residuals and diagnostic tests for misspecified dependence structures in models for binary longitudinal data. Statistics in Medicine, 37(3), 343–356.

  • Cagnone, S., & Mignani, S. (2007). Assessing the goodness of fit for a latent variable model for ordinal data. Metron, LXV, 337–361.


  • Cai, L., & Hansen, M. (2013). Limited-information goodness-of-fit testing of hierarchical item factor models. British Journal of Mathematical and Statistical Psychology, 66, 245–276.

  • Cai, L., Maydeu-Olivares, A., Coffman, D., & Thissen, D. (2006). Limited information goodness-of-fit testing of item response theory models for sparse \(2^p\) tables. British Journal of Mathematical and Statistical Psychology, 59, 173–194.


  • Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5–32.


  • Dassanayake, M., Reiser, M., & Zhu, J. (2016). Power calculations for statistics based on orthogonal components of Pearson’s chi-square. In JSM proceedings, biometrics section (pp. 1079–1093). Alexandria, VA: American Statistical Association.

  • Eubank, R. L. (1997). Testing goodness of fit with multinomial data. Journal of the American Statistical Association, 92(439), 1084–1093.


  • Glas, C. A. (1988). The derivation of some tests for the Rasch model from the multinomial distribution. Psychometrika, 53, 525–546.


  • Glas, C. A. (1999). Modification indices for the 2-PL and the nominal response model. Psychometrika, 64(3), 273–294.


  • Glas, C. A., & Suárez Falcón, J. D. (2003). A comparison of item fit statistics for the three-parameter logistic model. Applied Psychological Measurement, 27, 265–289.


  • Glas, C. A., & Verhelst, N. D. (1995). Testing the Rasch model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models (pp. 69–95). Springer. https://doi.org/10.1007/978-1-4612-4230-7_5.

  • Haberman, S. J. (1973). The analysis of residuals in cross-classified tables. Biometrics, 29, 205–220.


  • Houseman, E. A., Ryan, L. M., & Coull, B. A. (2004). Cholesky residuals for assessing normal errors in a linear model with correlated outcomes. Journal of the American Statistical Association, 99(486), 383–394.


  • Jacqmin-Gadda, H., Sibillot, S., Proust, C., Molina, J.-M., & Thiebaut, R. (2007). Robustness of the linear mixed model to misspecified error distribution. Computational Statistics & Data Analysis, 51(10), 5142–5154.


  • Jöreskog, K. G., & Moustaki, I. (2001). Factor analysis of ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36, 347–387.

  • Koehler, K. J. (1986). Goodness-of-fit tests for log-linear models in sparse contingency tables. Journal of the American Statistical Association, 81(394), 483–493.


  • Koehler, K. J., & Larntz, K. (1980). An empirical investigation of goodness-of-fit statistics for sparse multinomials. Journal of the American Statistical Association, 75, 336–344.


  • Lancaster, H. O. (1969). The chi-squared distribution. Wiley.

  • Liu, Y., & Maydeu-Olivares, A. (2012). Local dependence diagnostics in IRT modeling of binary data. Educational and Psychological Measurement, 73(2), 254–274.


  • Liu, Y., & Maydeu-Olivares, A. (2014). Identifying the source of misfit in item response theory models. Multivariate Behavioral Research, 49, 354–371.


  • Magnus, J. R., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics. Wiley.

  • Mavridis, D., Moustaki, I., & Knott, M. (2007). Goodness-of-fit measures for latent variable models for binary data. In S.-Y. Lee (Ed.), Handbook of Latent Variable and Related Models (pp. 135–161). Amsterdam, The Netherlands: Elsevier.


  • Maydeu-Olivares, A., & Joe, H. (2005). Limited- and full-information estimation and goodness-of-fit testing in \(2^{n}\) contingency tables: A unified framework. Journal of the American Statistical Association, 100(471), 1009–1020.


  • Maydeu-Olivares, A., & Joe, H. (2006). Limited and full information estimation and goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713–732.



  • Maydeu-Olivares, A., & Montaño, R. (2013). How should we assess the fit of Rasch-type models? Approximating the power of goodness-of-fit statistics in categorical data analysis. Psychometrika, 78, 116–133.


  • Mirvaliev, M. (1987). The components of chi-squared statistics for goodness-of-fit tests. Journal of Soviet Mathematics, 38, 2357–2363. https://doi.org/10.1007/BF01095078


  • National Institute of Mental Health (NIMH). (2019). Results from the 2017 National Survey on Drug Use and Health. https://www.samhsa.gov/data/sites/default/files/cbhsq-reports/NSDUHDetailedTabs2017/NSDUHDetailedTabs2017.htm#tab8-56A. Accessed 15 June 2020

  • Rayner, J. C. W., & Best, D. J. (1989). Smooth tests of goodness of fit. Oxford University Press.

  • Reiser, M. (1989). An application of the item response model to psychiatric epidemiology. Sociological Methods & Research, 18, 66–103.


  • Reiser, M. (1996). Analysis of residuals for the multinomial item response model. Psychometrika, 61, 509–528.


  • Reiser, M. (2008). Goodness-of-fit testing using components based on marginal frequencies of multinomial data. British Journal of Mathematical and Statistical Psychology, 61(2), 331–360.


  • Reiser, M. (2019). Goodness-of-fit testing in sparse contingency tables when the number of variables is large. WIRES Computational Statistics, 11(6), e1470.


  • Reiser, M., & Dassanayake, M. (2021). A study of lack-of-fit diagnostics for models fit to cross-classified binary variables. In G. Porzio, C. Rampichini, & C. Bocci (Eds.), CLADAG 2021 book of abstracts and short papers (pp. 191–194). Firenze University Press. https://doi.org/10.36253/978-88-5518-340-6

  • Salomaa, H. (1990). Factor analysis of dichotomous data. Statistical Society.

  • Schabenberger, O. (2005). Mixed model influence diagnostics. In SAS users group international conference (SUGI), 189-29.

  • Sharma, S. (1995). Applied multivariate techniques. Wiley.

  • Simonoff, J. S. (1986). Jackknifing and bootstrapping goodness-of-fit statistics in sparse multinomials. Journal of the American Statistical Association, 81(396), 1005–1011.

  • Tollenaar, N., & Mooijaart, A. (2003). Type I errors and power of the parametric bootstrap goodness-of-fit test: Full and limited information. British Journal of Mathematical and Statistical Psychology, 56, 271–288.

  • Verbeke, G., & Molenberghs, G. (2009). Linear mixed models for longitudinal data. Springer.

  • Yen, W. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5(2), 245–262.



Author information

Corresponding author

Correspondence to Mark Reiser.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 195 KB)

Supplementary file 2 (zip 19 KB)

Appendix A

First- and Second-Order Marginals

Define \(\mathbf{H}_{[1]} = {\varvec{V}}'\). Then, under the model \(\varvec{\pi }= \varvec{\pi }({\varvec{\beta }})\), the first-order marginal proportion for variable \(Y_i\) can be defined as

$$\begin{aligned}&{\pi }^{(i)}(a; {\varvec{\beta }})= \mathrm{Prob}(Y_i=a \vert {\varvec{\beta }}) =\sum _s h_{\ell s} \pi _s({\varvec{\beta }})={\varvec{h}}^{'}_{\ell }{\varvec{\pi }}({\varvec{\beta }}), \nonumber \\&a=2,\ldots ,c; \ \ell =(c-1)(i-1)+a-1; \ s=1,\ldots ,T, \end{aligned}$$
(9.1)

where \(h_{\ell s}\) is an element of the \(q(c-1)\) by T matrix \(\mathbf{H}_{[1]}\), and where \({\varvec{h}}^{'}_{\ell }\) is row \(\ell \) of matrix \(\mathbf{H}_{[1]}\). The true first-order marginal proportion is given by

$$\begin{aligned} {\pi }^{(i)}(a)=\mathrm{Prob}(Y_i=a)=\sum _s h_{\ell s}{\pi }_s ={\varvec{h}}^{'}_{\ell }{\varvec{\pi }} \ . \end{aligned}$$
(9.2)

The second-order marginal proportion for variables \(Y_i\) and \(Y_j\) under the model can be defined as

$$\begin{aligned} {\pi }^{(ij)}(a,b; {\varvec{\beta }})= \mathrm{Prob}(Y_i=a, Y_j=b | {\varvec{\beta }}) =\sum _s h_{m s}h_{\ell s} \pi _s({\varvec{\beta }})=({\varvec{h}}^{'}_m \circ {\varvec{h}}^{'}_{\ell }){\varvec{\pi }}({\varvec{\beta }}), \end{aligned}$$
(9.3)

where \(i=1,\ldots ,q-1\); \(j=i+1,\ldots ,q\); \(m=(c-1)(i-1)+a-1 \); \(\ell =(c-1)(j-1)+b-1\); \(a=2,\ldots ,c\); \(b=2,\ldots ,c\); and \({\varvec{h}}^{'}_m \circ {\varvec{h}}^{'}_{\ell }\) represents the Hadamard product (Magnus & Neudecker, 1999) of rows m and \(\ell \) from matrix \(\mathbf{H}_{[1]}\). Then the true second-order marginal proportion is given by

$$\begin{aligned} {\pi }^{(ij)}(a,b)=\mathrm{Prob}(Y_i=a, Y_j=b)=\sum _s h_{ms}h_{\ell s}{\pi }_s=({\varvec{h}}^{'}_m \circ {\varvec{h}}^{'}_{\ell }){\varvec{\pi }} \ . \end{aligned}$$
(9.4)
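To make the marginal operators concrete, the numpy fragment below is a minimal sketch of our own (not code from the paper) that evaluates Eqs. (9.2) and (9.4) for \(q=3\) binary variables (\(c=2\)); the rows of \(\mathbf{H}_{[1]}\) are written out from the \({\varvec{V}}\) of Eq. (9.6) below, and the uniform \({\varvec{\pi }}\) is chosen purely for illustration.

```python
import numpy as np

# H_[1] = V' for q = 3 binary variables (c = 2); T = 8 cells in lexicographic
# order, and row i of H_[1] indicates the cells in which Y_i takes its
# second category (compare Eq. (9.6) below).
H1 = np.array([[0, 0, 0, 0, 1, 1, 1, 1],   # f_1 (x) 1_2 (x) 1_2
               [0, 0, 1, 1, 0, 0, 1, 1],   # 1_2 (x) f_1 (x) 1_2
               [0, 1, 0, 1, 0, 1, 0, 1]])  # (1_2 (x) 1_2) (x) f_1

pi = np.full(8, 1 / 8)            # illustrative cell probabilities (uniform)

first_order = H1 @ pi             # Eq. (9.2): pi^{(i)} = h_l' pi
pair_12 = (H1[0] * H1[1]) @ pi    # Eq. (9.4): Hadamard product of two rows
print(first_order)                # [0.5 0.5 0.5]
print(pair_12)                    # 0.25
```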

\(\varvec{V}\) Matrix

The matrix \(\varvec{V}\) has \((c-1)\) kernel patterns, each of dimension c. For \(c=2\), the kernel pattern is \({\varvec{f}}_1 = (0,\ 1)^{'}\), and for \(c=3\), the kernel patterns are \({\varvec{f}}_1=(0,\ 0,\ 1)^{'}\) and \({\varvec{f}}_2=(0, \ 1,\ 0)^{'}\). In general, the kernel patterns, as columns, form a \((c-1)\) by \((c-1)\) matrix \({\varvec{J}} - {\varvec{I}}\) adjoined beneath a row of zeros. The matrix \({\varvec{V}}\) can be generated by Kronecker products of the kernel patterns with the vector \({\varvec{1}}_c\), a vector of length c in which every element is 1. The pattern of columns is

$$\begin{aligned} {\varvec{V}}&=({\varvec{f}}_1 \otimes ({\varvec{1}}_c \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c), \ {\varvec{f}}_2 \otimes ({\varvec{1}}_c \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c) \ldots {\varvec{f}}_{c-1} \otimes ({\varvec{1}}_c \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c),\nonumber \\&\qquad {\varvec{1}}_c \otimes ({\varvec{f}}_1 \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c), \ {\varvec{1}}_c \otimes ({\varvec{f}}_2 \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c), \ldots {\varvec{1}}_c \otimes ({\varvec{f}}_{c-1} \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c),\ldots \nonumber \\&\qquad {\varvec{1}}_c \otimes ({\varvec{1}}_c \ldots \otimes {\varvec{1}}_c \otimes {\varvec{f}}_1), \ {\varvec{1}}_c \otimes ({\varvec{1}}_c \ldots \otimes {\varvec{1}}_c \otimes {\varvec{f}}_2), \ldots {\varvec{1}}_c \otimes ({\varvec{1}}_c \ldots \otimes {\varvec{1}}_c \otimes {\varvec{f}}_{c-1})) \end{aligned}$$
(9.5)

With \(q=3\) and \(c=2\), \(\varvec{V}\) is generated as

$$\begin{aligned} {\varvec{V}} = \begin{pmatrix} {\varvec{f}}_1 \otimes ({\varvec{1}}_2 \otimes {\varvec{1}}_2), \ {\varvec{1}}_2 \otimes ({\varvec{f}}_1 \otimes {\varvec{1}}_2), \ ({\varvec{1}}_2 \otimes {\varvec{1}}_2) \otimes {\varvec{f}}_1 \ . \end{pmatrix} \end{aligned}$$
(9.6)

For \(q=3\) and \(c=3\), \(\varvec{V}\) is generated as

$$\begin{aligned} {\varvec{V}}&= ({\varvec{f}}_1 \otimes ({\varvec{1}}_3 \otimes {\varvec{1}}_3), \ {\varvec{f}}_2 \otimes ({\varvec{1}}_3 \otimes {\varvec{1}}_3), \ {\varvec{1}}_3 \otimes ({\varvec{f}}_1 \otimes {\varvec{1}}_3), \nonumber \\&\qquad {\varvec{1}}_3 \otimes ({\varvec{f}}_2 \otimes {\varvec{1}}_3), \ ({\varvec{1}}_3 \otimes {\varvec{1}}_3) \otimes {\varvec{f}}_1, \ ({\varvec{1}}_3 \otimes {\varvec{1}}_3) \otimes {\varvec{f}}_2), \end{aligned}$$
(9.7)

and for \(q=4\) and \(c=4\), \(\varvec{V}\) is generated as

$$\begin{aligned} {\varvec{V}}&=({\varvec{f}}_1 \otimes ({\varvec{1}}_4 \otimes {\varvec{1}}_4 \otimes {\varvec{1}}_4), \ {\varvec{f}}_2 \otimes ({\varvec{1}}_4 \otimes {\varvec{1}}_4 \otimes {\varvec{1}}_4), \ {\varvec{f}}_3 \otimes ({\varvec{1}}_4 \otimes {\varvec{1}}_4 \otimes {\varvec{1}}_4),\nonumber \\&\qquad {\varvec{1}}_4 \otimes ({\varvec{f}}_1 \otimes {\varvec{1}}_4 \otimes {\varvec{1}}_4), \ {\varvec{1}}_4 \otimes ({\varvec{f}}_2 \otimes {\varvec{1}}_4 \otimes {\varvec{1}}_4), \ {\varvec{1}}_4 \otimes ({\varvec{f}}_3 \otimes {\varvec{1}}_4 \otimes {\varvec{1}}_4), \nonumber \\&\qquad {\varvec{1}}_4 \otimes ({\varvec{1}}_4 \otimes {\varvec{f}}_1 \otimes {\varvec{1}}_4), \ {\varvec{1}}_4 \otimes ({\varvec{1}}_4 \otimes {\varvec{f}}_2 \otimes {\varvec{1}}_4), \ {\varvec{1}}_4 \otimes ({\varvec{1}}_4 \otimes {\varvec{f}}_3 \otimes {\varvec{1}}_4), \nonumber \\&\qquad {\varvec{1}}_4 \otimes ({\varvec{1}}_4 \otimes {\varvec{1}}_4 \otimes {\varvec{f}}_1), \ {\varvec{1}}_4 \otimes ({\varvec{1}}_4 \otimes {\varvec{1}}_4 \otimes {\varvec{f}}_2), \ {\varvec{1}}_4 \otimes ({\varvec{1}}_4 \otimes {\varvec{1}}_4 \otimes {\varvec{f}}_3)) \end{aligned}$$
(9.8)
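The Kronecker construction of Eq. (9.5) is straightforward to mechanize. The sketch below is a hedged illustration under our reading of the kernel patterns, namely that \({\varvec{f}}_a\) is a unit vector with its 1 in position \(c-a+1\), which matches the \(c=2\) and \(c=3\) patterns displayed above; the helper name make_V is ours.

```python
import numpy as np

def make_V(q: int, c: int) -> np.ndarray:
    """Generate V (Eq. 9.5): one column per variable/category combination."""
    # Kernel patterns f_1, ..., f_{c-1}: f_a has a single 1 in position c-a+1
    # (an assumption consistent with the c = 2 and c = 3 examples above).
    F = np.zeros((c, c - 1))
    F[np.arange(c - 1, 0, -1), np.arange(c - 1)] = 1.0
    cols = []
    for i in range(q):                 # slot i of the q-fold Kronecker product
        for a in range(c - 1):
            col = np.ones(1)
            for pos in range(q):
                col = np.kron(col, F[:, a] if pos == i else np.ones(c))
            cols.append(col)
    return np.column_stack(cols)       # c**q rows, q(c-1) columns

V = make_V(3, 2)                       # reproduces Eq. (9.6): an 8 x 3 matrix
print(V.T.astype(int))                 # the rows of H_[1] = V'
```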

\(\varvec{H}\) Matrix

For second-order marginals, a \((c-1)^2q(q-1)/2\) by \(c^q\) matrix \(\mathbf{H}_{[2]}\) can be defined by forming Hadamard products among the columns of \({\varvec{V}}\):

$$\begin{aligned} {\varvec{H}}_{[2]} = \begin{pmatrix} ({\varvec{v}}_1 \circ {\varvec{v}}_{c})' \\ ({\varvec{v}}_1 \circ {\varvec{v}}_{c+1})' \\ \vdots \\ ({\varvec{v}}_1 \circ {\varvec{v}}_{q(c-1)})' \\ ({\varvec{v}}_2 \circ {\varvec{v}}_{c})' \\ ({\varvec{v}}_2 \circ {\varvec{v}}_{c+1})' \\ \vdots \\ ({\varvec{v}}_2 \circ {\varvec{v}}_{q(c-1)})' \\ \vdots \\ ({\varvec{v}}_{c-1} \circ {\varvec{v}}_{c})' \\ ({\varvec{v}}_{c-1} \circ {\varvec{v}}_{c+1})' \\ \vdots \\ ({\varvec{v}}_{c-1} \circ {\varvec{v}}_{q(c-1)})' \\ \vdots \\ ({\varvec{v}}_{c} \circ {\varvec{v}}_{(q-1)(c-1)})' \\ \vdots \\ ({\varvec{v}}_{c} \circ {\varvec{v}}_{q(c-1)})' \\ \vdots \\ ({\varvec{v}}_{(q-1)(c-1)} \circ {\varvec{v}}_{(q-1)(c-1)+1})' \\ \vdots \\ ({\varvec{v}}_{(q-1)(c-1)} \circ {\varvec{v}}_{q(c-1)})' \end{pmatrix} \end{aligned}$$
(9.9)

where \({\varvec{v}}_{\ell }\) represents column \(\ell \) of matrix \({\varvec{V}}\). To place the marginals in a convenient order, the rows of \(\mathbf{H}_{[2]}\) formed from the products \(({\varvec{v}}_m \circ {\varvec{v}}_{\ell })'\) are arranged in lexicographical order. If \(c=2\),

$$\begin{aligned} \mathbf{H}_{[2]} = \begin{pmatrix} ({\varvec{v}}_1 \circ {\varvec{v}}_2)' \\ ({\varvec{v}}_1 \circ {\varvec{v}}_3)' \\ \vdots \\ ({\varvec{v}}_1 \circ {\varvec{v}}_q)' \\ ({\varvec{v}}_2 \circ {\varvec{v}}_3)' \\ \vdots \\ ({\varvec{v}}_2 \circ {\varvec{v}}_q)' \\ \vdots \\ ({\varvec{v}}_{q-1} \circ {\varvec{v}}_q)' \end{pmatrix}. \end{aligned}$$
(9.10)

If \(q=3\) with \(c=4\) categories, \(\mathbf{H}_{[2]}\) is a 27 by 64 matrix:

$$\begin{aligned} \mathbf{H}_{[2]} = \begin{pmatrix} ({\varvec{v}}_1 \circ {\varvec{v}}_4)' \\ ({\varvec{v}}_1 \circ {\varvec{v}}_5)' \\ ({\varvec{v}}_1 \circ {\varvec{v}}_6)' \\ ({\varvec{v}}_2 \circ {\varvec{v}}_4)' \\ ({\varvec{v}}_2 \circ {\varvec{v}}_5)' \\ ({\varvec{v}}_2 \circ {\varvec{v}}_6)' \\ ({\varvec{v}}_3 \circ {\varvec{v}}_4)' \\ ({\varvec{v}}_3 \circ {\varvec{v}}_5)' \\ ({\varvec{v}}_3 \circ {\varvec{v}}_6)' \\ \vdots \\ ({\varvec{v}}_4 \circ {\varvec{v}}_7)' \\ ({\varvec{v}}_4 \circ {\varvec{v}}_8)' \\ ({\varvec{v}}_4 \circ {\varvec{v}}_9)' \\ ({\varvec{v}}_5 \circ {\varvec{v}}_7)' \\ ({\varvec{v}}_5 \circ {\varvec{v}}_8)' \\ ({\varvec{v}}_5 \circ {\varvec{v}}_9)' \\ ({\varvec{v}}_6 \circ {\varvec{v}}_7)' \\ ({\varvec{v}}_6 \circ {\varvec{v}}_8)' \\ ({\varvec{v}}_6 \circ {\varvec{v}}_9)' \end{pmatrix} \end{aligned}$$
(9.11)
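The same device yields \(\mathbf{H}_{[2]}\) programmatically: each row is the elementwise (Hadamard) product of two columns of \({\varvec{V}}\) taken from different variables. The sketch below is our illustration (make_V as in the previous sketch; make_H2 is our name), with rows grouped by variable pair in the order displayed in Eq. (9.11).

```python
import numpy as np

def make_V(q, c):
    F = np.zeros((c, c - 1))               # kernel patterns, as sketched above
    F[np.arange(c - 1, 0, -1), np.arange(c - 1)] = 1.0
    cols = []
    for i in range(q):
        for a in range(c - 1):
            col = np.ones(1)
            for pos in range(q):
                col = np.kron(col, F[:, a] if pos == i else np.ones(c))
            cols.append(col)
    return np.column_stack(cols)

def make_H2(V, q, c):
    # Rows of H_[2]: Hadamard products of V columns across distinct variable
    # pairs (i, j), i < j, grouped by pair as in Eq. (9.11).
    k = c - 1
    rows = [V[:, i * k + a] * V[:, j * k + b]
            for i in range(q - 1) for j in range(i + 1, q)
            for a in range(k) for b in range(k)]
    return np.array(rows)

q, c = 3, 4
H2 = make_H2(make_V(q, c), q, c)
print(H2.shape)    # (27, 64): (c-1)^2 q(q-1)/2 rows by c^q columns
```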

\(\varvec{M}\) Matrix

Consider c kernel patterns \({\varvec{f}}_{\ell }\), \(\ell =1,2,\ldots ,c\), that form, as columns, a c by c matrix \({\varvec{J}}-{\varvec{I}}\), and consider the T by cq matrix \(\varvec{U}\) given by

$$\begin{aligned} {\varvec{U}}&= ({\varvec{f}}_1 \otimes ({\varvec{1}}_c \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c), \ {\varvec{f}}_2 \otimes ({\varvec{1}}_c \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c) \ldots {\varvec{f}}_c \otimes ({\varvec{1}}_c \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c),\nonumber \\&\quad {\varvec{1}}_c \otimes ({\varvec{f}}_1 \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c), \ {\varvec{1}}_c \otimes ({\varvec{f}}_2 \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c), \ldots {\varvec{1}}_c \otimes ({\varvec{f}}_c \otimes {\varvec{1}}_c \ldots \otimes {\varvec{1}}_c),\ldots \nonumber \\&\quad {\varvec{1}}_c \otimes ({\varvec{1}}_c \ldots \otimes {\varvec{1}}_c \otimes {\varvec{f}}_1), \ {\varvec{1}}_c \otimes ({\varvec{1}}_c \ldots \otimes {\varvec{1}}_c \otimes {\varvec{f}}_2), \ldots {\varvec{1}}_c \otimes ({\varvec{1}}_c \ldots \otimes {\varvec{1}}_c \otimes {\varvec{f}}_c)) \end{aligned}$$
(9.12)

Then a \(c^2q(q-1)/2\) by T matrix \({\varvec{M}}_{[2]}\) is defined using Hadamard products among the columns of \(\varvec{U}\):

$$\begin{aligned} {\varvec{M}}_{[2]} = \begin{pmatrix} ({\varvec{u}}_1 \circ {\varvec{u}}_{c+1})' \\ ({\varvec{u}}_1 \circ {\varvec{u}}_{c+2})' \\ \vdots \\ ({\varvec{u}}_1 \circ {\varvec{u}}_{qc})' \\ ({\varvec{u}}_2 \circ {\varvec{u}}_{c+1})' \\ ({\varvec{u}}_2 \circ {\varvec{u}}_{c+2})' \\ \vdots \\ ({\varvec{u}}_2 \circ {\varvec{u}}_{qc})' \\ \vdots \\ ({\varvec{u}}_c \circ {\varvec{u}}_{c+1})' \\ ({\varvec{u}}_c \circ {\varvec{u}}_{c+2})' \\ \vdots \\ ({\varvec{u}}_c \circ {\varvec{u}}_{qc})' \\ \vdots \\ ({\varvec{u}}_{c+1} \circ {\varvec{u}}_{2c+1})' \\ \vdots \\ ({\varvec{u}}_{c+1} \circ {\varvec{u}}_{qc})' \\ \vdots \\ ({\varvec{u}}_{(q-2)c+1} \circ {\varvec{u}}_{(q-1)c+1})' \\ \vdots \\ ({\varvec{u}}_{(q-1)c} \circ {\varvec{u}}_{qc})' \end{pmatrix} \end{aligned}$$
(9.13)
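Equations (9.12) and (9.13) admit the same treatment with c kernel patterns per variable, here the columns of the \(c \times c\) matrix \({\varvec{J}}-{\varvec{I}}\). In the sketch below (our illustration; make_U and make_M2 are our names), the rows of \({\varvec{M}}_{[2]}\) are grouped by variable pair, which rearranges, but does not change, the raw row order displayed in Eq. (9.13).

```python
import numpy as np

def make_U(q, c):
    # c kernel patterns per variable: the columns of the c x c matrix J - I
    F = np.ones((c, c)) - np.eye(c)
    cols = []
    for i in range(q):
        for a in range(c):
            col = np.ones(1)
            for pos in range(q):
                col = np.kron(col, F[:, a] if pos == i else np.ones(c))
            cols.append(col)
    return np.column_stack(cols)          # T = c**q rows, qc columns

def make_M2(U, q, c):
    # Rows of M_[2]: Hadamard products of U columns across distinct variable
    # pairs, grouped here by pair (i, j), i < j.
    rows = [U[:, i * c + a] * U[:, j * c + b]
            for i in range(q - 1) for j in range(i + 1, q)
            for a in range(c) for b in range(c)]
    return np.array(rows)

q, c = 3, 2
M2 = make_M2(make_U(q, c), q, c)
print(M2.shape)    # (12, 8): c^2 q(q-1)/2 rows, T = c^q columns
```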

Linear dependencies exist among the columns of \({\varvec{U}}\); \({\varvec{V}}\) from Sect. 2 consists of the linearly independent columns of \({\varvec{U}}\) such that \({\varvec{V}}={\varvec{U}}{\varvec{A}}\), where \({\varvec{A}}={\varvec{I}} \otimes ({\varvec{t}}_1, {\varvec{t}}_2, \ldots ,{\varvec{t}}_c)\).

The \((c-1)^2q(q-1)/2\) by \(c^2q(q-1)/2\) matrix \({\varvec{A}}\) is given by

$$\begin{aligned} {\varvec{A}}=\begin{pmatrix} {\varvec{I}}_{\ell } \otimes {\varvec{A}}^{(1)} & {\varvec{0}}_{q(c-1)\ \times \ qc} & {\varvec{0}} & {\varvec{0}} & {\varvec{0}} & \ldots & {\varvec{0}}\\ {\varvec{0}} & {\varvec{0}} & {\varvec{I}}_{\ell } \otimes {\varvec{A}}^{(1)} & {\varvec{0}}_{q(c-1)\ \times \ (q-1)c} & {\varvec{0}} & \ldots & {\varvec{0}} \\ {\varvec{0}} & {\varvec{0}} & {\varvec{0}} & {\varvec{0}} & \ddots & {\varvec{0}} & {\varvec{0}} \\ \vdots & \vdots & \vdots & \vdots & \ddots & {\varvec{0}} & {\varvec{0}} \\ {\varvec{0}} & {\varvec{0}} & {\varvec{0}} & {\varvec{0}} & \ldots & {\varvec{I}}_{2 \times 2} \otimes {\varvec{A}}^{(1)} & {\varvec{0}}_{2(c-1)\ \times \ c} \end{pmatrix} \end{aligned}$$
(9.14)

where \(\ell =(q-d)(c-1)\) for column d of \({\varvec{A}}\), and \({\varvec{A}}^{(1)} = ({\varvec{I}}_{(c-1) } \ \vdots \ {\varvec{0}}).\)


About this article

Cite this article

Reiser, M., Cagnone, S. & Zhu, J. An Extended GFfit Statistic Defined on Orthogonal Components of Pearson’s Chi-Square. Psychometrika (2022). https://doi.org/10.1007/s11336-022-09866-6


Keywords

  • multivariate discrete distribution
  • overlapping cells
  • orthogonal components
  • composite null hypothesis