Likelihood-based tests for a class of misspecified finite mixture models for ordinal categorical data

Abstract

The main purpose of this paper is to apply likelihood-based hypothesis testing procedures to a class of latent variable models for ordinal responses that allow for uncertain answers (Colombi et al. in Scand J Stat, 2018. https://doi.org/10.1111/sjos.12366). As these models rely on assumptions needed to describe different respondent behaviors, it is essential to discuss inferential issues without assuming that the tested model is correctly specified. By adapting the works of White (Econometrica 50(1):1–25, 1982) and Vuong (Econometrica 57(2):307–333, 1989), we are able to compare nested models under misspecification and then contrast the limiting distributions of the Wald, Lagrange multiplier/score and likelihood ratio statistics with the classical asymptotic Chi-square distribution, showing the consequences of ignoring misspecification.


References

  1. Amemiya T (1985) Advanced econometrics. Harvard University Press, Cambridge
  2. Bandura A (1986) Social foundations of thought and action: a social cognitive theory. Prentice-Hall, Englewood Cliffs
  3. Bartolucci F, Colombi R, Forcina A (2007) An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Stat Sin 17:691–711
  4. Baumgartner H, Steenkamp JBE (2001) Response styles in marketing research: a cross-national investigation. J Market Res 38(2):143–156
  5. Bergsma WP, Rudas T (2002) Marginal models for categorical data. Ann Stat 30:140–159
  6. Boos DD, Stefanski LA (2013) Essential statistical inference: theory and methods. Springer, Berlin
  7. Bowden RJ (1973) The theory of parametric identification. Econometrica 41(6):1069–1074
  8. Colombi R, Giordano S, Cazzaro M (2014) hmmm: an R package for hierarchical multinomial marginal models. J Stat Softw 59:1–25
  9. Colombi R, Giordano S, Gottard A, Iannario M (2018) A hierarchical marginal model with latent uncertainty. Scand J Stat. https://doi.org/10.1111/sjos.12366
  10. de Micheaux PL (2017) CompQuadForm: distribution function of quadratic forms in normal variables. R package version 1.4.3
  11. de Leeuw ED, Dillman D (2008) International handbook of survey methodology. Lawrence Erlbaum Associates, Hillsdale
  12. Duchesne P, de Micheaux PL (2010) Computing the distribution of quadratic forms: further comparisons between the Liu–Tang–Zhang approximation and exact methods. Comput Stat Data Anal 54:858–862
  13. Forcina A (2008) Identifiability of extended latent class models with individual covariates. Comput Stat Data Anal 52:5263–5268
  14. Glonek GF, McCullagh P (1995) Multivariate logistic models. J R Stat Soc Ser B (Methodological) 57:533–546
  15. Gottard A, Iannario M, Piccolo D (2016) Varying uncertainty in cub models. Adv Data Anal Classif 10:225–244
  16. Iannario M, Monti AC, Piccolo D (2016) Robustness issues for cub models. Test 25:731–750
  17. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
  18. Magnus JR (1988) Linear structures. Oxford University Press, Oxford
  19. Magnus JR, Neudecker H (2007) Matrix differential calculus with applications in statistics and econometrics, 3rd edn. Wiley, London
  20. Mathai AM, Provost SB (1992) Quadratic forms in random variables: theory and applications. Statistics: a series of textbooks and monographs. CRC Press, Boca Raton
  21. Rothenberg T (1971) Identification in parametric models. Econometrica 39:577–591
  22. Sagone E, De Caroli ME (2013) Personality factors and civic moral disengagement in law and psychology university students. Proc Soc Behav Sci 93:158–163
  23. Simone R, Tutz G (2018) Modelling uncertainty and response styles in ordinal data. Stat Neerl 72:224–245
  24. Studeny M (2005) Probabilistic conditional independence structures. Springer, London
  25. Tourangeau R, Rips LJ, Rasinski K (2000) The psychology of survey response. Cambridge University Press, New York
  26. Tutz G, Schneider M (2017) Mixture models for ordinal responses with a flexible uncertainty component. Technical Report Number 203
  27. Vuong QH (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57(2):307–333
  28. White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50(1):1–25


Acknowledgements

We would like to thank Rocco Servidio of the Department of Languages and Educational Sciences (University of Calabria, Italy) for providing the real data analyzed in Sect. 7. Moreover, we acknowledge two referees for their useful comments that improved the initial version of the paper.

Author information

Corresponding author

Correspondence to Sabrina Giordano.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

A useful result of matrix algebra (Magnus 1988, Definition 7.1) is recalled here for easy reference.

Lemma 1

Let \(w({\varvec{X}})\) be the vector containing the diagonal elements of a square matrix \({\varvec{X}}\). If \({\varvec{X}}\) is an \(n \times n\) diagonal matrix, then there exists an \(n \times n^2\) matrix \({\varvec{\varPsi }}_n\) with the property

$$\begin{aligned} \mathrm{vec} \, {\varvec{X}} = {\varvec{\varPsi }}^{\prime }_n w({\varvec{X}}). \end{aligned}$$
(17)
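As an illustration of Lemma 1, the selection matrix \({\varvec{\varPsi }}_n\) can be constructed explicitly and identity (17) checked numerically. The Python sketch below is only an illustrative translation (the paper's computations use R), and the function name `Psi` is ours; it uses the column-major vec operator.

```python
import numpy as np

def Psi(n):
    """n x n^2 matrix with vec(X) = Psi(n).T @ w(X) for diagonal X."""
    P = np.zeros((n, n * n))
    for i in range(n):
        P[i, i * (n + 1)] = 1.0  # column-major position of entry (i, i)
    return P

n = 4
X = np.diag([1.0, 2.0, 3.0, 4.0])
w = np.diag(X)                     # w(X): vector of diagonal elements
vecX = X.reshape(-1, order="F")    # column-major vectorization
print(np.allclose(vecX, Psi(n).T @ w))  # True
```

The single unit entry per row of `Psi(n)` simply marks where each diagonal element sits inside the vectorized matrix.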

In the remainder of this Appendix, the matrices \({\varvec{D}}_h=\frac{\partial \,{\varvec{\gamma }}_h}{\partial \,{\varvec{\beta }}^{\prime }}\) and \(\frac{\partial \, \mathrm{vec} \ {\varvec{D}}_h}{\partial \ {\varvec{\beta }}^{\prime }}\) are computed. To obtain \({\varvec{D}}_h\), we rely on Forcina (2008). The saturated log-linear model for the vector \({\varvec{p}}_h\) of the joint probabilities of the v observable responses and the v latent variables in the \(h\mathrm{th}\) stratum is denoted by

$$\begin{aligned} {\varvec{p}}_h=\frac{\exp ({\varvec{Z}} \ {\varvec{\theta }}_h)}{{\varvec{1}}^{\prime }\exp ({\varvec{Z}} \ {\varvec{\theta }}_h)}, \end{aligned}$$

where \({\varvec{Z}}\) is the design matrix of the log-linear model. As shown by Bartolucci et al. (2007), the transformation from the log-linear parameters \({\varvec{\theta }}_h\) to the generalized interactions \({\varvec{\eta }}_h={\varvec{C}} \ln {\varvec{M}} {\varvec{p}}_h\) is a diffeomorphism and

$$\begin{aligned} {\varvec{R}}_h=\frac{\partial {\varvec{\eta }}_h}{\partial {\varvec{\theta }}_h^{\prime }}= {\varvec{C}} \; {\mathrm{Diag}}^{-1}({\varvec{M}} {\varvec{p}}_h) \; {\varvec{M}} \, {\varvec{\varOmega }}_h {\varvec{Z}} = {\varvec{C}} \; {\mathrm{Diag}}^{-1}({\varvec{M}} {\varvec{p}}_h) \; {\varvec{M}} \, \mathrm{Diag}({\varvec{p}}_h) {\varvec{Z}} , \end{aligned}$$

with \({\varvec{\varOmega }}_h= {\mathrm{Diag}}({\varvec{p}}_h)- {\varvec{p}}_h {\varvec{p}}_h^{\prime }\). The second equality follows from the fact that \({\varvec{C}} \ {\mathrm{Diag}}^{-1}({\varvec{M}} {\varvec{p}}_h) \ {\varvec{M}} {\varvec{p}}_h = {\varvec{0}}\), since the sum of every row of \({\varvec{C}}\) is zero.

From the chain rule of matrix differential calculus (Magnus and Neudecker 2007), we get

$$\begin{aligned} {\varvec{D}}_h=\frac{\partial \,{\varvec{\gamma }}_h}{\partial \,{\varvec{\beta }}^{\prime }}=\frac{\partial \, {\varvec{\gamma }}_h}{\partial \, {\varvec{\theta }}_h^{\prime }} \frac{\partial \, {\varvec{\theta }}_h}{\partial \, {\varvec{\eta }}_h^{\prime }}\frac{\partial \, {\varvec{\eta }}_h}{\partial \, {\varvec{\beta }}^{\prime }}={\varvec{Q}}_h \,{\varvec{R}}_h^{-1}\, {\varvec{X}}_h, \end{aligned}$$
(18)

where

$$\begin{aligned} {\varvec{Q}}_h=\frac{\partial \,{\varvec{\gamma }}_h}{\partial \,{\varvec{\theta }}_h ^{\prime }}= & {} {\varvec{K}} \,{\mathrm{Diag}}^{-1}({\varvec{q}}_h) \; {\varvec{L}} \, {\varvec{\varOmega }}_h {\varvec{Z}} ={\varvec{K}} \,{\mathrm{Diag}}^{-1}({\varvec{q}}_h) \; {\varvec{L}} \, \mathrm{Diag}({\varvec{p}}_h) {\varvec{Z}}. \end{aligned}$$

To compute the Hessian for stratum h, it is necessary to calculate the derivative of the matrix \({\varvec{D}}_h\) defined in (18). Thus, from (18), we deduce

$$\begin{aligned} \frac{\partial \, \mathrm{vec} \ {\varvec{D}}_h}{\partial \ {\varvec{\beta }}^{\prime }} = ({\varvec{X}}_h^{\prime }{\varvec{R}}_h^{^{\prime }-1} \otimes \ {\varvec{I}}_{m-1}) \ \frac{\partial \, \mathrm{vec} \ {\varvec{Q}}_h}{\partial \, {\varvec{\beta }}^{\prime }} + ({\varvec{X}}_h^{\prime }\ \otimes \ {\varvec{Q}}_h) \ \frac{\partial \, \mathrm{vec} \ {\varvec{R}}_h^{-1}}{\partial \, {\varvec{\beta }}^{\prime }}, \end{aligned}$$
(19)

and, to complete the formula, the derivatives \(\frac{\partial \, \mathrm{vec} \ {\varvec{Q}}_h}{\partial \, {\varvec{\beta }}^{\prime }}\) and \(\frac{\partial \, \mathrm{vec} \ {\varvec{R}}_h^{-1}}{\partial \, {\varvec{\beta }}^{\prime }}\) have to be computed.

In light of (17), matrix \({\varvec{R}}_h\) of Eq. (18) can be vectorized as

$$\begin{aligned} \mathrm{vec} \, {\varvec{R}}_h= & {} [{\varvec{Z}}^{\prime }\otimes {\varvec{C}} \ {\mathrm{Diag}}^{-1}({\varvec{M}}{\varvec{p}}_h )\ {\varvec{M}}]\ {\varvec{\varPsi }}_t ^{\prime }\ {\varvec{p}}_h = [{\varvec{Z}}^{\prime }\ {\mathrm{Diag}({\varvec{p}}_h)} \ {\varvec{M}}^{\prime }\otimes {\varvec{C}}] \ {\varvec{\varPsi }}_s ^{\prime }\ {\varvec{\mu }}_h, \end{aligned}$$

where t and s are the lengths of the vectors \({\varvec{p}}_h\) and \({\varvec{M}} {\varvec{p}}_h\), respectively, and \({\varvec{\mu }}_h\) is the vector of the reciprocal values of \({\varvec{M}} {\varvec{p}}_h\).

Thus, we obtain

$$\begin{aligned} \frac{\partial \ \mathrm{vec} \ {\varvec{R}}_h}{\partial \, {\varvec{\beta }}^{\prime }}= & {} [{\varvec{Z}}^{\prime }\ {\mathrm{Diag}({\varvec{p}}_h)} \ {\varvec{M}}^{\prime }\otimes {\varvec{C}}] \ {\varvec{\varPsi }}_s^{\prime }\ \frac{ \partial \, {\varvec{\mu }}_h}{\partial \, {\varvec{p}}_h ^{\prime }} \ \frac{\partial \, {\varvec{p}}_h}{\partial \, {\varvec{\beta }}^{\prime }} \nonumber \\&+\, [{\varvec{Z}}^{\prime }\otimes {\varvec{C}} \ {\mathrm{Diag}}^{-1}({\varvec{M}}{\varvec{p}}_h) \ {\varvec{M}}] \ {\varvec{\varPsi }}_t ^{\prime }\ \frac{\partial \, {\varvec{p}}_h}{\partial \, {\varvec{\beta }}^{\prime }}\nonumber \\= & {} \{[{\varvec{Z}}^{\prime }\ {\mathrm{Diag}({\varvec{p}}_h)} \ {\varvec{M}}^{\prime }\ \otimes \ {\varvec{C}}] \ {\varvec{\varPsi }}_s^{\prime }\ [-{\mathrm{Diag}}^{-2}({\varvec{M}} {\varvec{p}}_h) \ {\varvec{M}}] \nonumber \\&+\, [{\varvec{Z}}^{\prime }\ \otimes \ {\varvec{C}} \ {\mathrm{Diag}}^{-1}({\varvec{M}}{\varvec{p}}_h ) \ {\varvec{M}}] \ {\varvec{\varPsi }}_t ^{\prime }\} \ [\mathrm{Diag}({\varvec{p}}_h)-{\varvec{p}}_h {\varvec{p}}_h^{\prime }] \ {\varvec{Z}} \ {\varvec{R}}_h^{-1} \ {\varvec{X}}_h.\nonumber \\ \end{aligned}$$
(20)

Finally, Magnus and Neudecker (2007, Theorem 3, Sect. 4) leads to

$$\begin{aligned} \frac{\partial \ \mathrm{vec} \ {\varvec{R}}_h^{-1}}{\partial \ {\varvec{\beta }}^{\prime }}= & {} - \ \left( {\varvec{R}}_h^{-1'}\otimes {\varvec{R}}_h^{-1}\right) \ \frac{\partial \ \mathrm{vec} \ {\varvec{R}}_h}{\partial \ {\varvec{\beta }}^{\prime }}. \end{aligned}$$
(21)
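Identity (21) is the standard differential of a matrix inverse. A quick finite-difference check confirms the Kronecker form; the Python sketch below is illustrative only (randomly generated matrices, not quantities from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
R = rng.standard_normal((n, n)) + 5.0 * np.eye(n)  # well-conditioned matrix
dR = rng.standard_normal((n, n))                   # perturbation direction
eps = 1e-6

Rinv = np.linalg.inv(R)
# finite-difference approximation of d vec(R^{-1}) in direction dR
lhs = (np.linalg.inv(R + eps * dR) - Rinv).reshape(-1, order="F") / eps
# identity (21): d vec(R^{-1}) = -(R^{-1}' kron R^{-1}) d vec(R)
rhs = -np.kron(Rinv.T, Rinv) @ dR.reshape(-1, order="F")
print(np.allclose(lhs, rhs, atol=1e-4))  # True
```

The Kronecker factor \(({\varvec{R}}^{-1\prime }\otimes {\varvec{R}}^{-1})\) comes from \(\mathrm{vec}({\varvec{A}}{\varvec{X}}{\varvec{B}})=({\varvec{B}}^{\prime }\otimes {\varvec{A}})\,\mathrm{vec}\,{\varvec{X}}\) applied to \(-{\varvec{R}}^{-1}(\mathrm{d}{\varvec{R}}){\varvec{R}}^{-1}\).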

Analogously, denoting by \(\bar{{\varvec{\mu }}}_h\) the vector of the reciprocal values of \({\varvec{q}}_h={\varvec{L}} {\varvec{p}}_h\), we determine

$$\begin{aligned} \frac{\partial \ \mathrm{vec} \ {\varvec{Q}}_h}{\partial \, {\varvec{\beta }}^{\prime }}= & {} [{\varvec{Z}}^{\prime }\ {\mathrm{Diag}({\varvec{p}}_h)} \ {\varvec{L}}^{\prime }\otimes {\varvec{K}}] \ {\varvec{\varPsi }}_o^{\prime }\ \frac{ \partial \, \bar{{\varvec{\mu }}}_h}{\partial \, {\varvec{p}}_h ^{\prime }} \ \frac{\partial \, {\varvec{p}}_h}{\partial \, {\varvec{\beta }}^{\prime }} \nonumber \\&+\, [{\varvec{Z}}^{\prime }\otimes {\varvec{K}} \ {\mathrm{Diag}}^{-1}({\varvec{L}}{\varvec{p}}_h) \ {\varvec{L}}] \ {\varvec{\varPsi }}_t ^{\prime }\ \frac{\partial \, {\varvec{p}}_h}{\partial \, {\varvec{\beta }}^{\prime }}\nonumber \\= & {} \{[{\varvec{Z}}^{\prime }\ {\mathrm{Diag}({\varvec{p}}_h)} \ {\varvec{L}}^{\prime }\ \otimes \ {\varvec{K}}] \ {\varvec{\varPsi }}_o^{\prime }\ [-{\mathrm{Diag}}^{-2}({\varvec{L}} {\varvec{p}}_h) \ {\varvec{L}}] \nonumber \\&+\, [{\varvec{Z}}^{\prime }\ \otimes \ {\varvec{K}} \ {\mathrm{Diag}}^{-1}({\varvec{L}}{\varvec{p}}_h ) \ {\varvec{L}}] \ {\varvec{\varPsi }}_t ^{\prime }\} \ [\mathrm{Diag}({\varvec{p}}_h)-{\varvec{p}}_h {\varvec{p}}_h^{\prime }] \ {\varvec{Z}} \ {\varvec{R}}_h^{-1} \ {\varvec{X}}_h,\nonumber \\ \end{aligned}$$
(22)

where o is the size of the vector \({\varvec{q}}_h={\varvec{L}} {\varvec{p}}_h\) and \(\mathrm{vec}\ (\mathrm{Diag}^{-1}({\varvec{L}} {\varvec{p}}_h))= \ {\varvec{\varPsi }}_o ^{\prime }\ \bar{{\varvec{\mu }}}_h\).

Plugging the results (21) and (22) into the expression (19), we complete the description of \(\frac{\partial \, \mathrm{vec} \ {\varvec{D}}_h}{\partial \ {\varvec{\beta }}^{\prime }}\).

Appendix B

Here, the theorems introduced in Sect. 5.2.1 are proved.

Proof of Theorem 1:

Let us choose a compact subset \({\mathcal {K}}\) of \({\mathcal {N}}\) containing \({\varvec{\beta }}^*\), where the open neighborhood \({\mathcal {N}}\) is defined by assumption A1. From A1 and Theorem 2.2 of White (1982), it follows that the estimator \({\varvec{b}}_n\), which maximizes \(L_n({\varvec{\beta }})\) on the compact set \({\mathcal {K}}\), converges in probability to \({\varvec{\beta }}^*\). Moreover, as \({\varvec{b}}_n={\varvec{\beta }}^*+ o_p(1)\) and \({\varvec{\beta }}^*\) is interior to the parameter space, with probability tending to one, \({\varvec{b}}_n\) is interior to the parameter space and satisfies the first-order conditions \({\varvec{s}}_n({\varvec{b}}_n)={\varvec{0}}\). This proves (i).

From the mean value theorem, we have \({\varvec{s}}_n({\varvec{b}}_n)={\varvec{s}}_n({\varvec{\beta }}^*)+\bar{{\varvec{H}}}_n({\varvec{b}}_n- {\varvec{\beta }}^*),\) where every row of \(\bar{{\varvec{H}}}_n\) is computed at a different \({\varvec{\beta }}\) that lies between \({\varvec{b}}_n\) and \({\varvec{\beta }}^*\). Since \({\varvec{s}}_n({\varvec{b}}_n)=o_p(1)\), we obtain \(\frac{1}{\sqrt{n}}{\varvec{s}}_n({\varvec{\beta }}^*)=-\frac{1}{n}\bar{{\varvec{H}}}_n\sqrt{n}({\varvec{b}}_n- {\varvec{\beta }}^*)+o_p(1).\) Knowing that

$$\begin{aligned}&\left| -\frac{1}{n}\bar{{\varvec{H}}}_n-{\varvec{A}}({\varvec{\beta }}^*)\right| \le \left| -\frac{1}{n}\bar{{\varvec{H}}}_n-{\varvec{A}} (\bar{{\varvec{\beta }}}) \right| +\left| {\varvec{A}} (\bar{{\varvec{\beta }}})- {\varvec{A}}({\varvec{\beta }}^*)\right| \\&\quad \le \sup _{{\varvec{\beta }}\in {\mathcal {N}}}\left| -\frac{1}{n}{\varvec{H}}_n({\varvec{\beta }})-{\varvec{A}}({\varvec{\beta }}) \right| +\left| {\varvec{A}} (\bar{{\varvec{\beta }}})- {\varvec{A}}({\varvec{\beta }}^*)\right| , \end{aligned}$$

where \(\Vert {\varvec{A}} (\bar{{\varvec{\beta }}})- {\varvec{A}}({\varvec{\beta }}^*)\Vert =o_p(1)\) and \(\sup _{{\varvec{\beta }}\in {\mathcal {N}}}\Vert -\frac{1}{n}{\varvec{H}}_n({\varvec{\beta }})-{\varvec{A}}({\varvec{\beta }}) \Vert \) converges in probability to zero on the compact set \({\mathcal {K}}\), it follows that \(\sqrt{n}({\varvec{b}}_n- {\varvec{\beta }}^*)={\varvec{A}}^{-1}({\varvec{\beta }}^*)\frac{1}{\sqrt{n}}{\varvec{s}}_n({\varvec{\beta }}^*)+o_p(1).\) Point (ii) is proved by considering that \(\frac{1}{\sqrt{n}}{\varvec{s}}_n({\varvec{\beta }}^*)\) is asymptotically distributed as a multivariate Normal variable with null expectation and covariance matrix \({\varvec{B}}({\varvec{\beta }}^*)\). \(\square \)
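The sandwich covariance \({\varvec{A}}^{-1}({\varvec{\beta }}^*){\varvec{B}}({\varvec{\beta }}^*){\varvec{A}}^{-1}({\varvec{\beta }}^*)\) implied by point (ii) can be illustrated with a deliberately misspecified toy model. In the hypothetical Python sketch below (not taken from the paper), a \(N(\mu ,1)\) working model is fitted to data whose true variance is 4: the naive inverse-information variance of the quasi-MLE is \(1/n\), while the sandwich estimate recovers the correct \(4/n\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(0.0, 2.0, size=n)   # true variance is 4

# Working model N(mu, 1): the quasi-MLE of mu is the sample mean
mu_hat = x.mean()
scores = x - mu_hat                # per-observation scores at mu_hat
A_hat = 1.0                        # A = -(1/n) * Hessian of the log-likelihood
B_hat = np.mean(scores**2)         # B = (1/n) * sum of squared scores

naive_var = 1.0 / (A_hat * n)          # inverse information: 1/n
sandwich_var = B_hat / (A_hat**2 * n)  # A^{-1} B A^{-1} / n: about 4/n
```

Under correct specification \({\varvec{A}}={\varvec{B}}\) and the two variance estimates coincide; here they differ by a factor of about 4, which is exactly the phenomenon the misspecification-robust theory accounts for.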

Proof of Theorem 2:

Under the null hypothesis, from a Taylor expansion of \(\mathrm{LR}_n\) around \(({\varvec{b}}_{n1}, {\varvec{b}}_{n2})\), it follows that

$$\begin{aligned} \mathrm{LR}_n=\frac{n}{2}({\varvec{b}}_{n1}-{\varvec{\beta }}_1^*)^{\prime }{\varvec{A}}_1^*({\varvec{b}}_{n1}-{\varvec{\beta }}_1^*)-\frac{n}{2}({\varvec{b}}_{n2}-{\varvec{\beta }}_2^*)^{\prime }{\varvec{A}}_2^*({\varvec{b}}_{n2}-{\varvec{\beta }}_2^*)+o_p(1). \end{aligned}$$

From a simple extension of point (ii) of Theorem 1, it holds that \(\sqrt{n}[({\varvec{b}}_{n2}-{\varvec{\beta }}_2^*)^{\prime },({\varvec{b}}_{n1}-{\varvec{\beta }}_1^*)^{\prime }]^{\prime }\) is asymptotically Normal with null expected value and covariance matrix

$$\begin{aligned} {\varvec{\varSigma }}=\left( \begin{array}{cc} {\varvec{A}}_2^{*-1}{\varvec{B}}_2^*{\varvec{A}}_2^{*-1} &{}{\varvec{A}}_2^{*-1}{\varvec{B}}_{21}^*{\varvec{A}}_1^{*-1} \\ {\varvec{A}}_1^{*-1}{\varvec{B}}_{12}^*{\varvec{A}}_2^{*-1} &{} {\varvec{A}}_1^{*-1}{\varvec{B}}_1^*{\varvec{A}}_1^{*-1} \\ \end{array} \right) . \end{aligned}$$
(23)

\(\square \)

Consequently, according to Mathai and Provost (1992, page 29) or Boos and Stefanski (2013, Theorem 8.1), the LR statistic (12) is asymptotically distributed as a weighted sum \(\sum \lambda _i Z_i^2\) of squared independent standard Normal random variables, where the weights \(\lambda _i\) are the eigenvalues of matrix \({\varvec{Q}} {\varvec{\varSigma }}\) with the block-diagonal \({\varvec{Q}}\) defined as \( {\varvec{Q}} =\left( \begin{array}{cc} -{\varvec{A}}_2^* &{} {\varvec{0}} \\ {\varvec{0}} &{} {\varvec{A}}_1^* \\ \end{array} \right) . \)
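The distribution of such a weighted sum of Chi-squares can be evaluated with the CompQuadForm R package cited in the references; alternatively, quantiles of \(\sum \lambda _i Z_i^2\) are easy to approximate by Monte Carlo. The Python sketch below is illustrative only and uses hypothetical weights \(\lambda =(2,1,0.5)\), not eigenvalues computed from the paper's models.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = np.array([2.0, 1.0, 0.5])    # hypothetical eigenvalues of Q * Sigma
n_sim = 500_000

# Monte Carlo draws of sum_i lambda_i * Z_i^2, with Z_i iid standard Normal
stat = rng.chisquare(1, size=(n_sim, lam.size)) @ lam
crit = np.quantile(stat, 0.95)     # misspecification-robust 95% critical value

# For comparison: the naive chi-square(3) 95% quantile is about 7.81;
# with these unequal weights the correct critical value is larger.
```

Using the classical Chi-square critical value when the weights differ from one would therefore distort the test's size, which is the consequence of ignoring misspecification discussed in the paper.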

Now, consider the matrix

$$\begin{aligned} {\varvec{G}}=\left( \begin{array}{cc} {\varvec{I}}_{d_2} &{} {\varvec{0}} \\ {\varvec{B}}_{12}^* {\varvec{B}}_2^{*-1} &{} {\varvec{I}}_{d_1} \\ \end{array} \right) , \quad {\varvec{G}}^{-1}=\left( \begin{array}{cc} {\varvec{I}}_{d_2} &{} {\varvec{0}} \\ -{\varvec{B}}_{12}^* {\varvec{B}}_2^{*-1} &{} {\varvec{I}}_{d_1} \\ \end{array} \right) . \end{aligned}$$
(24)

Since for nested models the following equalities (Vuong 1989, Lemma B)

$$\begin{aligned}&{\varvec{B}}_1^*={\varvec{B}}_{12}^* {\varvec{B}}_2^{*-1}{\varvec{B}}_{21}^*,\quad&{\varvec{A}}_1^*={\varvec{B}}_{12}^* {\varvec{B}}_2^{*-1}{\varvec{A}}_2^* {\varvec{B}}_2^{*-1}{\varvec{B}}_{21}^*, \end{aligned}$$

hold under \(H_0\), it is easy to see that \({\varvec{G}} {\varvec{Q}} {\varvec{\varSigma }}{\varvec{G}}^{-1}=\left( \begin{array}{cc} {\varvec{\varLambda }}&{} -{\varvec{B}}_{21}^*{\varvec{A}}_1^{*-1} \\ {\varvec{0}} &{} {\varvec{0}} \\ \end{array} \right) ,\) where the matrix \({\varvec{\varLambda }}\) is given in Eq. (13). The last equality ensures that the non-null eigenvalues of \({\varvec{Q}} {\varvec{\varSigma }}\) are the non-null eigenvalues of \({\varvec{\varLambda }}\). To show that there are \(d_2-d_1\) non-null eigenvalues, note that the matrix \({\varvec{P}}^*= {\varvec{B}}^{*-1}_2 {\varvec{B}}^{^{\prime }*}_{12} ( {\varvec{B}}^{*}_{12}{\varvec{B}}^{*-1}_2 {\varvec{A}}^{*}_2 {\varvec{B}}^{*-1}_2 {\varvec{B}}^{^{\prime }*}_{12})^{-1} {\varvec{B}}^{*}_{12}{\varvec{B}}^{*-1}_2{\varvec{A}}^{*}_2\) is idempotent and has rank \(d_1\) since, in line with Lemma B of Vuong (1989), it holds that \({\varvec{B}}^{*-1}_2 {\varvec{B}}^{^{\prime }*}_{12}={\varvec{\varPhi }}^*\), where \({\varvec{\varPhi }}^*\) has rank \(d_1\) according to the last assumption of Definition 1. Consequently,

$$\begin{aligned} \mathrm{rank}({\varvec{\varLambda }})= & {} \mathrm{rank}({\varvec{B}}_2^*({\varvec{P}}^*- {\varvec{I}}_{d_2}){\varvec{A}}_2^{*-1})\\ {}= & {} \mathrm{rank}({\varvec{I}}_{d_2}-{\varvec{P}}^*)=\mathrm{trace}({\varvec{I}}_{d_2}-{\varvec{P}}^*)=d_2-d_1. \end{aligned}$$

Here, the second equality follows from the non-singularity of matrices \({\varvec{B}}_2^*\) and \({\varvec{A}}_2^{*-1}\), while the third one from the idempotency of \({\varvec{I}}_{d_2}-{\varvec{P}}^*\). The previous result implies that \({\varvec{\varLambda }}\) has \(d_2-d_1\) non-null eigenvalues.
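The rank-equals-trace step for the idempotent matrix \({\varvec{I}}_{d_2}-{\varvec{P}}^*\) can be checked numerically for a generic projector. In the Python sketch below, an idempotent matrix of rank \(d_1\) is built from a random full-column-rank matrix as an illustrative stand-in for \({\varvec{P}}^*\).

```python
import numpy as np

rng = np.random.default_rng(3)
d2, d1 = 6, 2
X = rng.standard_normal((d2, d1))        # full column rank almost surely
P = X @ np.linalg.inv(X.T @ X) @ X.T     # idempotent projector of rank d1

print(np.allclose(P @ P, P))                              # True
print(int(round(np.trace(np.eye(d2) - P))) == d2 - d1)    # True
print(np.linalg.matrix_rank(np.eye(d2) - P) == d2 - d1)   # True
```

For any idempotent matrix the eigenvalues are 0 or 1, so its rank equals its trace, which is the fact used to count the \(d_2-d_1\) non-null eigenvalues of \({\varvec{\varLambda }}\).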

Proof of Theorem 3:

Under the null hypothesis of equivalence of the two models, we have \({\varvec{\beta }}_2^*= {\varvec{d}}({\varvec{\beta }}_1^*)\) and \({\varvec{\varDelta }}^*{\varvec{\varPhi }}^*={\varvec{0}}\), where \({\varvec{\varPhi }}^*={\varvec{\varPhi }}({\varvec{\beta }}_1^*)\) is introduced in Definition 1. From Lemma B of Vuong (1989), it holds that \({\varvec{\varPhi }}^*={\varvec{B}}^{*-1}_2 {\varvec{B}}^{^{\prime }*}_{12}\). After some algebra, the thesis follows from (13). \(\square \)

Proof of Theorem 4:

In line with Magnus and Neudecker (2007, Theorem 5, Ch. 1), the eigenvalues of \({\varvec{\varLambda }}\) are the eigenvalues of \(\bar{{\varvec{\varLambda }}}=-{\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{B}}_2^*{\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{\varDelta }}^{*^{\prime }}({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-1}{\varvec{\varDelta }}^*{\varvec{A}}_2^{*-\frac{1}{2}}.\) \(\square \)

As \({\varvec{P}}={\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{\varDelta }}^{*^{\prime }}({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-1}{\varvec{\varDelta }}^*{\varvec{A}}_2^{*-\frac{1}{2}}\) is idempotent, Magnus and Neudecker (2007, Theorem 9, Ch. 1) implies that the eigenvalues of \(\bar{{\varvec{\varLambda }}}\) are also the eigenvalues of \({\varvec{P}} \bar{{\varvec{\varLambda }}}\). The \(d_2\times d_2\) matrix

$$\begin{aligned} {\varvec{K}}=\left( \begin{array}{c} ({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-\frac{1}{2}}{\varvec{\varDelta }}^*{\varvec{A}}_2^{*-\frac{1}{2}} \\ ({\varvec{\varPhi }}^{*^{\prime }}{\varvec{A}}_2^{*}{\varvec{\varPhi }}^*)^{-\frac{1}{2}}{\varvec{\varPhi }}^{*^{\prime }}{\varvec{A}}_2^{*\frac{1}{2}} \\ \end{array} \right) , \end{aligned}$$

is such that \({\varvec{K}} {\varvec{K}} ^{\prime }={\varvec{I}}\), thus (Magnus and Neudecker 2007, Theorem 5, Ch. 1) the eigenvalues of \({\varvec{P}} \bar{{\varvec{\varLambda }}}\) are also the eigenvalues of

$$\begin{aligned} {\varvec{K}} {\varvec{P}} \bar{{\varvec{\varLambda }}} {\varvec{K}} ^{\prime }= \left( \begin{array}{cc} -({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-\frac{1}{2}}{\varvec{\varDelta }}^*{\varvec{S}}_2^*{\varvec{\varDelta }}^{*^{\prime }}({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-\frac{1}{2}} &{} {\varvec{0}} \\ {\varvec{0}} &{} {\varvec{0}} \\ \end{array} \right) . \end{aligned}$$

The hypotheses of Theorem 3 ensure that the \((d_2-d_1)\times (d_2-d_1)\) matrix \(-({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-\frac{1}{2}}{\varvec{\varDelta }}^*{\varvec{S}}_2^*{\varvec{\varDelta }}^{*^{\prime }} ({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-\frac{1}{2}}\) has strictly positive eigenvalues. The statement of the theorem follows by applying again Theorem 5, Ch. 1, of Magnus and Neudecker (2007).

Appendix C

For the models used in Sect. 6, we prove that when the true probabilities \(\tau _h(ij)\), elements of the vector \({\varvec{\tau }}_h\), satisfy the independence condition \(\tau _h(ij)=\tau _h(i \cdot )\tau _h(\cdot j),\) the equality \({\varvec{q}}_{h1}^*={\varvec{q}}_{h2}^*\) holds. According to Theorem 2 of Colombi et al. (2018), we have \(q_{h1}^*(ij)=q_{h1}^*(i \cdot )q_{h1}^*(\cdot j)\) for the elements of \({\varvec{q}}_{h1}^*.\)

It follows that \( K_1=\sum _h\sum _i\sum _j\tau _h(ij)\ln q_{h1}^*(ij)=\sum _h\sum _i\tau _h(i\cdot )\ln q_{h1}^*(i\cdot )+\sum _h\sum _j\tau _h(\cdot j)\ln q_{h1}^*(\cdot j),\) and for any other \({\varvec{q}}_{h1}\) belonging to \({\mathcal {M}}_1\) we have

$$\begin{aligned} \sum _h\sum _i\tau _h(i\cdot )\ln q_{h1}^*(i\cdot )< & {} \sum _h\sum _i\tau _h(i\cdot )\ln q_{h1}(i\cdot ),\\ \sum _h\sum _j\tau _h(\cdot j)\ln q_{h1}^*(\cdot j)< & {} \sum _h\sum _j\tau _h(\cdot j)\ln q_{h1}(\cdot j). \end{aligned}$$

From the previous results, it is easy to deduce that

$$\begin{aligned} K_2= & {} \sum _h\sum _i\sum _j\tau _h(ij)\ln q_{h2}^*(ij)\\= & {} \sum _h\sum _i\tau _h(i\cdot )\ln q_{h2}^*(i\cdot )+\sum _h\sum _i\tau _h(i\cdot )\sum _j\tau _h(\cdot j)\ln \frac{q_{h2}^*(ij)}{q_{h2}^*(i\cdot )}\\= & {} \sum _h\sum _i\tau _h(i\cdot )\ln q_{h2}^*(i\cdot )+\sum _j\tau _h(\cdot j)\ln \tilde{q}_{h2}(j) \ge K_1. \end{aligned}$$

The final equality and inequality follow by noting that there is a unique best approximating function \(\tilde{{\varvec{q}}}_{h2}\) of the marginal distribution with probabilities \(\tau _h(\cdot j)\). Moreover, in the case of independence, model \({\mathcal {M}}_2\) reduces to \({\mathcal {M}}_1\), so that \(K_1 \ge K_2\) must also hold. Consequently, the equality \({\varvec{q}}_{h1}^*={\varvec{q}}_{h2}^*\) is valid.
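The fact that the product of the marginals is the best independent approximation in the Kullback–Leibler sense, which underlies the factorization of \(K_1\) above, can be checked numerically. The Python sketch below uses a randomly generated joint distribution and randomly generated competing independent distributions; it is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
tau = rng.random((3, 4))
tau /= tau.sum()                              # a generic joint distribution
row, col = tau.sum(axis=1), tau.sum(axis=0)   # its marginals
best = np.outer(row, col)                     # product of the marginals

def cross(q):
    # the term sum_ij tau(ij) log q(ij), maximized by the best approximation
    return np.sum(tau * np.log(q))

ok = True
for _ in range(200):
    a = rng.random(3); a /= a.sum()
    b = rng.random(4); b /= b.sum()
    ok &= cross(best) >= cross(np.outer(a, b)) - 1e-12
print(ok)  # True
```

Maximizing \(\sum _{ij}\tau (ij)\ln (a_i b_j)=\sum _i\tau (i\cdot )\ln a_i+\sum _j\tau (\cdot j)\ln b_j\) term by term via Gibbs' inequality yields exactly the marginals of \(\tau \), mirroring the factorization of \(K_1\) in the proof.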

About this article


Cite this article

Colombi, R., Giordano, S. Likelihood-based tests for a class of misspecified finite mixture models for ordinal categorical data. TEST 28, 1175–1202 (2019). https://doi.org/10.1007/s11749-019-00626-w


Keywords

  • Misspecified models
  • Marginal models
  • Likelihood ratio tests
  • Weighted sum of Chi-squares

Mathematics Subject Classification

  • 62F03
  • 62H15