Likelihood-based tests for a class of misspecified finite mixture models for ordinal categorical data

Colombi, Roberto; Giordano, Sabrina

doi:10.1007/s11749-019-00626-w

Original Paper
Published: 18 January 2019

Volume 28, pages 1175–1202, (2019)

Roberto Colombi¹ &
Sabrina Giordano²

Abstract

The main purpose of this paper is to apply likelihood-based hypothesis testing procedures to a class of latent variable models for ordinal responses that allow for uncertain answers (Colombi et al. in Scand J Stat, 2018. https://doi.org/10.1111/sjos.12366). As these models are based on some assumptions, needed to describe different respondent behaviors, it is essential to discuss inferential issues without assuming that the tested model is correctly specified. By adapting the works of White (Econometrica 50(1):1–25, 1982) and Vuong (Econometrica 57(2):307–333, 1989), we are able to compare nested models under misspecification and then contrast the limiting distributions of Wald, Lagrange multiplier/score and likelihood ratio statistics with the classical asymptotic Chi-square to show the consequences of ignoring misspecification.

References

Amemiya T (1985) Advanced econometrics. Harvard University Press, Cambridge
Bandura A (1986) Social foundations of thought and action: a social cognitive theory. Prentice-Hall, Englewood Cliffs
Bartolucci F, Colombi R, Forcina A (2007) An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Stat Sin 17:691–711
Baumgartner H, Steenkamp JBE (2001) Response styles in marketing research: a cross-national investigation. J Market Res 38(2):143–156
Bergsma WP, Rudas T (2002) Marginal models for categorical data. Ann Stat 30:140–159
Boos DD, Stefanski LA (2013) Essential statistical inference: theory and methods. Springer, Berlin
Bowden RJ (1973) The theory of parametric identification. Econometrica 41(6):1069–74
Colombi R, Giordano S, Cazzaro M (2014) hmmm: an R package for hierarchical multinomial marginal models. J Stat Softw 59:1–25
Colombi R, Giordano S, Gottard A, Iannario M (2018) A hierarchical marginal model with latent uncertainty. Scand J Stat. https://doi.org/10.1111/sjos.12366
de Micheaux PL (2017) CompQuadForm: distribution function of quadratic forms in normal variables. R package version 1.4.3
de Leeuw ED, Dillman D (2008) International handbook of survey methodology. Lawrence Erlbaum Associates, Hillsdale
Duchesne P, de Micheaux PL (2010) Computing the distribution of quadratic forms: Further comparisons between the Liu–Tang–Zhang approximation and exact methods. Comput Stat Data Anal 54:858–862
Forcina A (2008) Identifiability of extended latent class models with individual covariates. Comput Stat Data Anal 52:5263–5268
Glonek GF, McCullagh P (1995) Multivariate logistic models. J R Stat Soc Ser B (Methodological) 57:533–546
Gottard A, Iannario M, Piccolo D (2016) Varying uncertainty in cub models. Adv Data Anal Classif 10:225–244
Iannario M, Monti AC, Piccolo D (2016) Robustness issues for cub models. Test 25:731–750
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
Magnus JR (1988) Linear structures. Oxford University Press, Oxford
Magnus JR, Neudecker H (2007) Matrix differential calculus with applications in statistics and econometrics, 3rd edn. Wiley, London
Mathai AM, Provost SB (1992) Quadratic forms in random variables: theory and applications. Statistics: a series of textbooks and monographs. CRC Press, Boca Raton
Rothenberg T (1971) Identification in parametric models. Econometrica 39:577–591
Sagone E, De Caroli ME (2013) Personality factors and civic moral disengagement in law and psychology university students. Proc Soc Behav Sci 93:158–163
Simone R, Tutz G (2018) Modelling uncertainty and response styles in ordinal data. Stat Neerl 72:224–245
Studeny M (2005) Probabilistic conditional independence structures. Springer, London
Tourangeau R, Rips LJ, Rasinski K (2000) The psychology of survey response. Cambridge University Press, New York
Tutz G, Schneider M (2017) Mixture models for ordinal responses with a flexible uncertainty component. Technical Report Number 203
Vuong QH (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57(2):307–333
White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50(1):1–25

Acknowledgements

We would like to thank Rocco Servidio of the Department of Languages and Educational Sciences (University of Calabria, Italy) for providing the real data analyzed in Sect. 7. Moreover, we acknowledge two referees for their useful comments that improved the initial version of the paper.

Author information

Authors and Affiliations

Department of Management, Information and Production Engineering, University of Bergamo, Bergamo, Italy
Roberto Colombi
Department of Economics, Statistics and Finance “Giovanni Anania”, University of Calabria, Cosenza, Italy
Sabrina Giordano

Corresponding author

Correspondence to Sabrina Giordano.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

A useful result of matrix algebra (Magnus 1988, Definition 7.1) is recalled here for easy reference.

Lemma 1

Let $w({\varvec{X}})$ be a vector containing the diagonal elements of a square matrix ${\varvec{X}}$. If ${\varvec{X}}$ is an $n \times n$ diagonal matrix, then there exists an $n \times n^2$ matrix ${\varvec{\varPsi }}_n$ with the property

$$\begin{aligned} \mathrm{vec} \, {\varvec{X}} = {\varvec{\varPsi }}^{\prime }_n w({\varvec{X}}). \end{aligned}$$

(17)
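As an illustration of Lemma 1, the following minimal sketch builds one matrix that satisfies (17) and checks the identity numerically. It assumes the column-stacking convention for $\mathrm{vec}$ and takes the $i$th row of ${\varvec{\varPsi }}_n$ to be $({\varvec{e}}_i \otimes {\varvec{e}}_i)^{\prime }$; the helper names `psi` and `vec` are hypothetical and not taken from the paper.

```python
import numpy as np

def psi(n: int) -> np.ndarray:
    """Return an n x n^2 matrix whose i-th row is (e_i kron e_i)'."""
    eye = np.eye(n)
    return np.stack([np.kron(eye[i], eye[i]) for i in range(n)])

def vec(x: np.ndarray) -> np.ndarray:
    """Stack the columns of x (column-major vec operator)."""
    return x.reshape(-1, order="F")

w = np.array([2.0, -1.0, 0.5])   # diagonal elements w(X)
X = np.diag(w)                   # the diagonal matrix X
Psi = psi(3)
lhs = vec(X)                     # vec X
rhs = Psi.T @ w                  # Psi_n' w(X)
print(np.allclose(lhs, rhs))
```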

In the remainder of this appendix, the matrices ${\varvec{D}}_h=\frac{\partial \,{\varvec{\gamma }}_h}{\partial \,{\varvec{\beta }}^{\prime }}$ and $\frac{\partial \, \mathrm{vec} \ {\varvec{D}}_h}{\partial \ {\varvec{\beta }}^{\prime }}$ are computed. To obtain ${\varvec{D}}_h$, we rely on Forcina (2008). The saturated log-linear model for the vector ${\varvec{p}}_h$ of the joint probabilities of the $v$ observable responses and the $v$ latent variables in the $h\mathrm{th}$ stratum is given by

$$\begin{aligned} {\varvec{p}}_h=\frac{\exp ({\varvec{Z}} \ {\varvec{\theta }}_h)}{{\varvec{1}}^{\prime }\exp ({\varvec{Z}} \ {\varvec{\theta }}_h)}, \end{aligned}$$

where ${\varvec{Z}}$ is the design matrix of the log-linear model. As shown by Bartolucci et al. (2007), the transformation from the log-linear parameters ${\varvec{\theta }}_h$ to the generalized interactions ${\varvec{\eta }}_h={\varvec{C}} \ln {\varvec{M}} {\varvec{p}}_h$ is a diffeomorphism and

$$\begin{aligned} {\varvec{R}}_h=\frac{\partial {\varvec{\eta }}_h}{\partial {\varvec{\theta }}_h^{\prime }}= {\varvec{C}} \; {\mathrm{Diag}}^{-1}({\varvec{M}} {\varvec{p}}_h) \; {\varvec{M}} \, {\varvec{\varOmega }}_h {\varvec{Z}} = {\varvec{C}} \; {\mathrm{Diag}}^{-1}({\varvec{M}} {\varvec{p}}_h) \; {\varvec{M}} \, \mathrm{Diag}({\varvec{p}}_h) {\varvec{Z}} , \end{aligned}$$

with ${\varvec{\varOmega }}_h= {\mathrm{Diag}}({\varvec{p}}_h)- {\varvec{p}}_h {\varvec{p}}_h^{\prime }$. The second equality in the expression for ${\varvec{R}}_h$ follows from the fact that ${\varvec{C}} \ {\mathrm{Diag}}^{-1}({\varvec{M}} {\varvec{p}}_h) \ {\varvec{M}} {\varvec{p}}_h = {\varvec{0}}$: the product ${\mathrm{Diag}}^{-1}({\varvec{M}} {\varvec{p}}_h) \ {\varvec{M}} {\varvec{p}}_h$ is a vector of ones and the sum of every row of ${\varvec{C}}$ is zero.
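The two facts used here, namely that $\partial {\varvec{p}}_h / \partial {\varvec{\theta }}_h^{\prime } = {\varvec{\varOmega }}_h {\varvec{Z}}$ and that the term ${\varvec{p}}_h {\varvec{p}}_h^{\prime }$ drops out when the rows of ${\varvec{C}}$ sum to zero, can be checked numerically. The sketch below is only illustrative: the sizes and the randomly generated `Z`, `M` and `C` are stand-ins, not the matrices of the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)
t, k, s = 6, 4, 5                 # hypothetical sizes: cells, parameters, marginal cells
Z = rng.normal(size=(t, k))       # stand-in design matrix
theta = rng.normal(size=k)

def probs(th):
    """Saturated log-linear probabilities exp(Z th) / 1'exp(Z th)."""
    e = np.exp(Z @ th)
    return e / e.sum()

p = probs(theta)
Omega = np.diag(p) - np.outer(p, p)

# Central finite differences for dp/dtheta', compared with Omega Z.
eps = 1e-6
J = np.column_stack([(probs(theta + eps * e) - probs(theta - eps * e)) / (2 * eps)
                     for e in np.eye(k)])
jac_err = np.abs(J - Omega @ Z).max()

# Stand-in 0/1 matrix M and contrast matrix C whose rows sum to zero.
M = rng.integers(0, 2, size=(s, t)).astype(float)
M[:, 0] = 1.0                     # guarantees every entry of M p is positive
C = rng.normal(size=(3, s))
C -= C.mean(axis=1, keepdims=True)

A = C @ np.diag(1.0 / (M @ p)) @ M
eq_err = np.abs(A @ Omega @ Z - A @ np.diag(p) @ Z).max()
print(jac_err, eq_err)
```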

From the chain rule of matrix differential calculus (Magnus and Neudecker 2007), we get

$$\begin{aligned} {\varvec{D}}_h=\frac{\partial \,{\varvec{\gamma }}_h}{\partial \,{\varvec{\beta }}^{\prime }}=\frac{\partial \, {\varvec{\gamma }}_h}{\partial \, {\varvec{\theta }}_h^{\prime }} \frac{\partial \, {\varvec{\theta }}_h}{\partial \, {\varvec{\eta }}_h^{\prime }}\frac{\partial \, {\varvec{\eta }}_h}{\partial \, {\varvec{\beta }}^{\prime }}={\varvec{Q}}_h \,{\varvec{R}}_h^{-1}\, {\varvec{X}}_h, \end{aligned}$$

(18)

where

$$\begin{aligned} {\varvec{Q}}_h=\frac{\partial \,{\varvec{\gamma }}_h}{\partial \,{\varvec{\theta }}_h ^{\prime }}= & {} {\varvec{K}} \,{\mathrm{Diag}}^{-1}({\varvec{q}}_h) \; {\varvec{L}} \, {\varvec{\varOmega }}_h {\varvec{Z}} ={\varvec{K}} \,{\mathrm{Diag}}^{-1}({\varvec{q}}_h) \; {\varvec{L}} \, \mathrm{Diag}({\varvec{p}}_h) {\varvec{Z}}. \end{aligned}$$

To compute the Hessian for stratum $h$, it is necessary to calculate the derivative of the matrix ${\varvec{D}}_h$ defined in (18). Applying the product rule to (18), we deduce

$$\begin{aligned} \frac{\partial \, \mathrm{vec} \ {\varvec{D}}_h}{\partial \ {\varvec{\beta }}^{\prime }} = ({\varvec{X}}_h^{\prime }{\varvec{R}}_h^{^{\prime }-1} \otimes \ {\varvec{I}}_{m-1}) \ \frac{\partial \, \mathrm{vec} \ {\varvec{Q}}_h}{\partial \, {\varvec{\beta }}^{\prime }} + ({\varvec{X}}_h^{\prime }\ \otimes \ {\varvec{Q}}_h) \ \frac{\partial \, \mathrm{vec} \ {\varvec{R}}_h^{-1}}{\partial \, {\varvec{\beta }}^{\prime }}, \end{aligned}$$

(19)

and to complete the formula the derivatives $\frac{\partial \, \mathrm{vec} \ {\varvec{Q}}_h}{\partial \, {\varvec{\beta }}^{\prime }}$, $\frac{\partial \, \mathrm{vec} \ {\varvec{R}}_h^{-1}}{\partial \, {\varvec{\beta }}^{\prime }}$ have to be computed.
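The two Kronecker factors in (19) come from the identity $\mathrm{vec}({\varvec{A}}{\varvec{B}}{\varvec{C}})=({\varvec{C}}^{\prime }\otimes {\varvec{A}})\,\mathrm{vec}\,{\varvec{B}}$ applied to the product ${\varvec{Q}}_h {\varvec{R}}_h^{-1} {\varvec{X}}_h$. A minimal numerical check, with hypothetical sizes and random stand-in matrices, could look as follows.

```python
import numpy as np

def vec(a):
    """Column-stacking vec operator."""
    return a.reshape(-1, order="F")

rng = np.random.default_rng(1)
m1, r, d = 3, 4, 2                    # hypothetical sizes: Q is m1 x r, R is r x r, X is r x d
Q = rng.normal(size=(m1, r))
R = rng.normal(size=(r, r)) + 4 * np.eye(r)
X = rng.normal(size=(r, d))
R_inv = np.linalg.inv(R)

# Arbitrary perturbation directions for Q and for R^{-1}.
dQ = rng.normal(size=(m1, r))
dR_inv = rng.normal(size=(r, r))

# First and second term on the right-hand side of (19), applied to the perturbations.
term_q = np.kron(X.T @ R_inv.T, np.eye(m1)) @ vec(dQ)
term_r = np.kron(X.T, Q) @ vec(dR_inv)

# Differential of D = Q R^{-1} X computed directly from the product rule.
direct = vec(dQ @ R_inv @ X + Q @ dR_inv @ X)
err = np.abs(term_q + term_r - direct).max()
print(err)
```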

In light of (17), matrix ${\varvec{R}}_h$ of Eq. (18) can be vectorized as

$$\begin{aligned} \mathrm{vec} \, {\varvec{R}}_h= & {} [{\varvec{Z}}^{\prime }\otimes {\varvec{C}} \ {\mathrm{Diag}}^{-1}({\varvec{M}}{\varvec{p}}_h )\ {\varvec{M}}]\ {\varvec{\varPsi }}_t ^{\prime }\ {\varvec{p}}_h = [{\varvec{Z}}^{\prime }\ {\mathrm{Diag}({\varvec{p}}_h)} \ {\varvec{M}}^{\prime }\otimes {\varvec{C}}] \ {\varvec{\varPsi }}_s ^{\prime }\ {\varvec{\mu }}_h, \end{aligned}$$

where t and s are the lengths of the vectors ${\varvec{p}}_h$ and ${\varvec{M}} {\varvec{p}}_h$, respectively, and ${\varvec{\mu }}_h$ is the vector of the reciprocal values of ${\varvec{M}} {\varvec{p}}_h$.
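Both vectorized forms of ${\varvec{R}}_h$ can be verified numerically. The sketch below assumes that ${\varvec{\varPsi }}_n$ has rows $({\varvec{e}}_i \otimes {\varvec{e}}_i)^{\prime }$ and uses random stand-ins for ${\varvec{Z}}$, ${\varvec{M}}$, ${\varvec{C}}$ and ${\varvec{p}}_h$; none of these values come from the paper.

```python
import numpy as np

def vec(a):
    return a.reshape(-1, order="F")

def psi(n):
    """n x n^2 matrix with rows (e_i kron e_i)', so that vec Diag(w) = psi(n).T @ w."""
    eye = np.eye(n)
    return np.stack([np.kron(eye[i], eye[i]) for i in range(n)])

rng = np.random.default_rng(2)
t, s, k, c = 6, 5, 4, 3               # hypothetical dimensions
Z = rng.normal(size=(t, k))
M = rng.uniform(0.1, 1.0, size=(s, t))
C = rng.normal(size=(c, s))
C -= C.mean(axis=1, keepdims=True)    # rows of C sum to zero
p = rng.dirichlet(np.ones(t))         # a probability vector p_h
mu = 1.0 / (M @ p)                    # reciprocals of M p_h

R = C @ np.diag(mu) @ M @ np.diag(p) @ Z
form_p = np.kron(Z.T, C @ np.diag(mu) @ M) @ psi(t).T @ p
form_mu = np.kron(Z.T @ np.diag(p) @ M.T, C) @ psi(s).T @ mu
print(np.allclose(vec(R), form_p), np.allclose(vec(R), form_mu))
```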

Thus, we obtain

$$\begin{aligned} \frac{\partial \ \mathrm{vec} \ {\varvec{R}}_h}{\partial \, {\varvec{\beta }}^{\prime }}= & {} [{\varvec{Z}}^{\prime }\ {\mathrm{Diag}({\varvec{p}}_h)} \ {\varvec{M}}^{\prime }\otimes {\varvec{C}}] \ {\varvec{\varPsi }}_s^{\prime }\ \frac{ \partial \, {\varvec{\mu }}_h}{\partial \, {\varvec{p}}_h ^{\prime }} \ \frac{\partial \, {\varvec{p}}_h}{\partial \, {\varvec{\beta }}^{\prime }} \nonumber \\&+\, [{\varvec{Z}}^{\prime }\otimes {\varvec{C}} \ {\mathrm{Diag}}^{-1}({\varvec{M}}{\varvec{p}}_h) \ {\varvec{M}}] \ {\varvec{\varPsi }}_t ^{\prime }\ \frac{\partial \, {\varvec{p}}_h}{\partial \, {\varvec{\beta }}^{\prime }}\nonumber \\= & {} \{[{\varvec{Z}}^{\prime }\ {\mathrm{Diag}({\varvec{p}}_h)} \ {\varvec{M}}^{\prime }\ \otimes \ {\varvec{C}}] \ {\varvec{\varPsi }}_s^{\prime }\ [-{\mathrm{Diag}}^{-2}({\varvec{M}} {\varvec{p}}_h) \ {\varvec{M}}] \nonumber \\&+\, [{\varvec{Z}}^{\prime }\ \otimes \ {\varvec{C}} \ {\mathrm{Diag}}^{-1}({\varvec{M}}{\varvec{p}}_h ) \ {\varvec{M}}] \ {\varvec{\varPsi }}_t ^{\prime }\} \ [\mathrm{Diag}({\varvec{p}}_h)-{\varvec{p}}_h {\varvec{p}}_h^{\prime }] \ {\varvec{Z}} \ {\varvec{R}}_h^{-1} \ {\varvec{X}}_h.\nonumber \\ \end{aligned}$$

(20)

Finally, Magnus and Neudecker (2007, Theorem 3, Sect. 4) leads to

$$\begin{aligned} \frac{\partial \ \mathrm{vec} \ {\varvec{R}}_h^{-1}}{\partial \ {\varvec{\beta }}^{\prime }}= & {} - \ \left( {\varvec{R}}_h^{-1'}\otimes {\varvec{R}}_h^{-1}\right) \ \frac{\partial \ \mathrm{vec} \ {\varvec{R}}_h}{\partial \ {\varvec{\beta }}^{\prime }}. \end{aligned}$$

(21)
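Formula (21) is the vectorized form of $\mathrm{d}{\varvec{R}}^{-1}=-{\varvec{R}}^{-1}(\mathrm{d}{\varvec{R}}){\varvec{R}}^{-1}$. As a sanity check, the following sketch compares it with a central finite difference along an arbitrary direction `dR`; the matrix `R` is a random, well-conditioned stand-in.

```python
import numpy as np

def vec(a):
    return a.reshape(-1, order="F")

rng = np.random.default_rng(3)
r = 4
R = rng.normal(size=(r, r)) + 5 * np.eye(r)   # stand-in invertible matrix
dR = rng.normal(size=(r, r))                  # arbitrary perturbation direction
R_inv = np.linalg.inv(R)

eps = 1e-6
numeric = vec(np.linalg.inv(R + eps * dR) - np.linalg.inv(R - eps * dR)) / (2 * eps)
analytic = -np.kron(R_inv.T, R_inv) @ vec(dR)
err = np.abs(numeric - analytic).max()
print(err)
```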

Analogously, denoting by $\bar{{\varvec{\mu }}}_h$ the vector of the reciprocal values of ${\varvec{q}}_h={\varvec{L}} {\varvec{p}}_h$, we determine

$$\begin{aligned} \frac{\partial \ \mathrm{vec} \ {\varvec{Q}}_h}{\partial \, {\varvec{\beta }}^{\prime }}= & {} [{\varvec{Z}}^{\prime }\ {\mathrm{Diag}({\varvec{p}}_h)} \ {\varvec{L}}^{\prime }\otimes {\varvec{K}}] \ {\varvec{\varPsi }}_o^{\prime }\ \frac{ \partial \, \bar{{\varvec{\mu }}}_h}{\partial \, {\varvec{p}}_h ^{\prime }} \ \frac{\partial \, {\varvec{p}}_h}{\partial \, {\varvec{\beta }}^{\prime }} \nonumber \\&+\, [{\varvec{Z}}^{\prime }\otimes {\varvec{K}} \ {\mathrm{Diag}}^{-1}({\varvec{L}}{\varvec{p}}_h) \ {\varvec{L}}] \ {\varvec{\varPsi }}_t ^{\prime }\ \frac{\partial \, {\varvec{p}}_h}{\partial \, {\varvec{\beta }}^{\prime }}\nonumber \\= & {} \{[{\varvec{Z}}^{\prime }\ {\mathrm{Diag}({\varvec{p}}_h)} \ {\varvec{L}}^{\prime }\ \otimes \ {\varvec{K}}] \ {\varvec{\varPsi }}_o^{\prime }\ [-{\mathrm{Diag}}^{-2}({\varvec{L}} {\varvec{p}}_h) \ {\varvec{L}}] \nonumber \\&+\, [{\varvec{Z}}^{\prime }\ \otimes \ {\varvec{K}} \ {\mathrm{Diag}}^{-1}({\varvec{L}}{\varvec{p}}_h ) \ {\varvec{L}}] \ {\varvec{\varPsi }}_t ^{\prime }\} \ [\mathrm{Diag}({\varvec{p}}_h)-{\varvec{p}}_h {\varvec{p}}_h^{\prime }] \ {\varvec{Z}} \ {\varvec{R}}_h^{-1} \ {\varvec{X}}_h,\nonumber \\ \end{aligned}$$

(22)

where o is the size of the vector ${\varvec{q}}_h={\varvec{L}} {\varvec{p}}_h$ and $\mathrm{vec}\ (\mathrm{Diag}^{-1}({\varvec{L}} {\varvec{p}}_h))= \ {\varvec{\varPsi }}_o ^{\prime }\ \bar{{\varvec{\mu }}}_h$.

Plugging the results (21) and (22) into the expression (19), we complete the description of $\frac{\partial \, \mathrm{vec} \ {\varvec{D}}_h}{\partial \ {\varvec{\beta }}^{\prime }}$.

Appendix B

Here, the theorems introduced in Sect. 5.2.1 are proved.

Proof of Theorem 1:

Let us choose a compact subset ${\mathcal {K}}$ of ${\mathcal {N}}$ containing ${\varvec{\beta }}^*$, where the open neighborhood ${\mathcal {N}}$ is defined by assumption A1. From A1 and Theorem 2.2 of White (1982), it follows that the estimator ${\varvec{b}}_n$, which maximizes $L_n({\varvec{\beta }})$ on the compact set ${\mathcal {K}}$, converges in probability to ${\varvec{\beta }}^*$. Moreover, as ${\varvec{b}}_n={\varvec{\beta }}^*+ o_p(1)$ and ${\varvec{\beta }}^*$ is interior to the parameter space, with probability tending to one ${\varvec{b}}_n$ is interior to the parameter space and satisfies the first-order conditions ${\varvec{s}}_n({\varvec{\beta }})={\varvec{0}}$. This proves (i).

From the mean value theorem, we have ${\varvec{s}}_n({\varvec{b}}_n)={\varvec{s}}_n({\varvec{\beta }}^*)+\bar{{\varvec{H}}}_n({\varvec{b}}_n- {\varvec{\beta }}^*),$ where every row of $\bar{{\varvec{H}}}_n$ is computed at a different ${\varvec{\beta }}$ that lies between ${\varvec{b}}_n$ and ${\varvec{\beta }}^*$. Since ${\varvec{s}}_n({\varvec{b}}_n)=o_p(1)$, we obtain $\frac{1}{\sqrt{n}}{\varvec{s}}_n({\varvec{\beta }}^*)=-\frac{1}{n}\bar{{\varvec{H}}}_n\sqrt{n}({\varvec{b}}_n- {\varvec{\beta }}^*)+o_p(1).$ Knowing that

$$\begin{aligned}&\left\| -\frac{1}{n}\bar{{\varvec{H}}}_n-{\varvec{A}}({\varvec{\beta }}^*)\right\| \le \left\| -\frac{1}{n}\bar{{\varvec{H}}}_n-{\varvec{A}} (\bar{{\varvec{\beta }}}) \right\| +\left\| {\varvec{A}} (\bar{{\varvec{\beta }}})- {\varvec{A}}({\varvec{\beta }}^*)\right\| \\&\quad \le \sup _{{\varvec{\beta }}\in {\mathcal {N}}}\left\| -\frac{1}{n}{\varvec{H}}_n({\varvec{\beta }})-{\varvec{A}}({\varvec{\beta }}) \right\| +\left\| {\varvec{A}} (\bar{{\varvec{\beta }}})- {\varvec{A}}({\varvec{\beta }}^*)\right\| , \end{aligned}$$

where $\Vert {\varvec{A}} (\bar{{\varvec{\beta }}})- {\varvec{A}}({\varvec{\beta }}^*)\Vert =o_p(1)$ and $\sup _{{\varvec{\beta }}\in {\mathcal {N}}}\Vert -\frac{1}{n}{\varvec{H}}_n({\varvec{\beta }})-{\varvec{A}}({\varvec{\beta }}) \Vert $ converges in probability to zero on the compact set ${\mathcal {K}}$, it follows $\sqrt{n}({\varvec{b}}_n- {\varvec{\beta }}^*)={\varvec{A}}^{-1}({\varvec{\beta }}^*)\frac{1}{\sqrt{n}}{\varvec{s}}_n({\varvec{\beta }}^*)+o_p(1).$ Point (ii) is proved by considering that $\frac{1}{\sqrt{n}}{\varvec{s}}_n({\varvec{\beta }}^*)$ is asymptotically distributed as a multivariate Normal variable with null expectation and covariance matrix ${\varvec{B}}({\varvec{\beta }}^*)$. $\square $
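Point (ii) implies that $\sqrt{n}({\varvec{b}}_n- {\varvec{\beta }}^*)$ has asymptotic covariance ${\varvec{A}}^{-1}{\varvec{B}}{\varvec{A}}^{-1}$, which reduces to ${\varvec{A}}^{-1}$ only when ${\varvec{A}}={\varvec{B}}$. The toy sketch below illustrates the gap between the two in a scalar case that is not one of the paper's models: overdispersed counts fitted with a deliberately misspecified Poisson likelihood. All names and numerical settings are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
# Overdispersed counts (negative binomial) fitted with a misspecified Poisson model.
y = rng.negative_binomial(n=2, p=0.4, size=5000).astype(float)

lam_hat = y.mean()                  # maximizer of the Poisson log-likelihood
scores = y / lam_hat - 1.0          # per-observation score at lam_hat
hessians = -y / lam_hat**2          # per-observation second derivative at lam_hat

A_hat = -hessians.mean()            # estimate of A: minus the average Hessian
B_hat = np.mean(scores**2)          # estimate of B: average squared score
var_sandwich = B_hat / A_hat**2     # A^{-1} B A^{-1} in the scalar case
var_naive = 1.0 / A_hat             # appropriate only if the model were correct
print(var_naive, var_sandwich)
```

In this example the sandwich variance coincides with the sample variance of the counts, while the naive value equals their sample mean, so ignoring misspecification understates the variability of the estimator.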

Proof of Theorem 2:

Under the null hypothesis, from a Taylor expansion of $\mathrm{LR}_n$ around $({\varvec{b}}_{n1}, {\varvec{b}}_{n2})$, it follows that

$$\begin{aligned} \mathrm{LR}_n=\frac{n}{2}({\varvec{b}}_{n1}-{\varvec{\beta }}_1^*)^{\prime }{\varvec{A}}_1^*({\varvec{b}}_{n1}-{\varvec{\beta }}_1^*)-\frac{n}{2}({\varvec{b}}_{n2}-{\varvec{\beta }}_2^*)^{\prime }{\varvec{A}}_2^*({\varvec{b}}_{n2}-{\varvec{\beta }}_2^*)+o_p(1). \end{aligned}$$

From a simple extension of point (ii) of Theorem 1, it holds that $\sqrt{n}[({\varvec{b}}_{n2}-{\varvec{\beta }}_2^*)^{\prime },({\varvec{b}}_{n1}-{\varvec{\beta }}_1^*)^{\prime }]^{\prime }$ is asymptotically Normal with null expected value and covariance matrix

$$\begin{aligned} {\varvec{\varSigma }}=\left( \begin{array}{cc} {\varvec{A}}_2^{*-1}{\varvec{B}}_2^*{\varvec{A}}_2^{*-1} &{}{\varvec{A}}_2^{*-1}{\varvec{B}}_{21}^*{\varvec{A}}_1^{*-1} \\ {\varvec{A}}_1^{*-1}{\varvec{B}}_{12}^*{\varvec{A}}_2^{*-1} &{} {\varvec{A}}_1^{*-1}{\varvec{B}}_1^*{\varvec{A}}_1^{*-1} \\ \end{array} \right) . \end{aligned}$$

(23)

$\square $

Consequently, according to Mathai and Provost (1992, page 29) or Boos and Stefanski (2013, Theorem 8.1), the LR statistic (12) is asymptotically distributed as a weighted sum $\sum \lambda _i Z_i^2$ of squared independent standard Normal random variables, where the weights $\lambda _i$ are the eigenvalues of matrix ${\varvec{Q}} {\varvec{\varSigma }}$ with the block-diagonal ${\varvec{Q}}$ defined as $ {\varvec{Q}} =\left( \begin{array}{cc} -{\varvec{A}}_2^* &{} {\varvec{0}} \\ {\varvec{0}} &{} {\varvec{A}}_1^* \\ \end{array} \right) . $
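Tail probabilities of a weighted sum $\sum \lambda _i Z_i^2$ have no simple closed form in general. The reference list points to exact and approximate methods (Duchesne and de Micheaux 2010); the sketch below instead uses plain Monte Carlo simulation, a swapped-in and cruder technique chosen only because it is short. The function name `weighted_chi2_pvalue` and its defaults are hypothetical.

```python
import numpy as np

def weighted_chi2_pvalue(stat, weights, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(sum_i weights[i] * Z_i^2 >= stat), Z_i iid N(0, 1)."""
    rng = np.random.default_rng(seed)
    z2 = rng.chisquare(df=1, size=(n_draws, len(weights)))
    draws = z2 @ np.asarray(weights, dtype=float)
    return float(np.mean(draws >= stat))

# With a single unit weight the statistic is an ordinary Chi-square with 1 df.
p_classic = weighted_chi2_pvalue(3.841, [1.0])
# A weight larger than one makes the same observed value less extreme.
p_inflated = weighted_chi2_pvalue(3.841, [2.0])
print(p_classic, p_inflated)
```

With a unit weight the estimate is close to the nominal 0.05 level, whereas a weight of 2 gives a clearly larger tail probability: referring the statistic to the classical Chi-square when the weights differ from one distorts the size of the test.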

Now, consider the matrix

$$\begin{aligned} {\varvec{G}}=\left( \begin{array}{cc} {\varvec{I}}_{d_2} &{} {\varvec{0}} \\ {\varvec{B}}_{12}^* {\varvec{B}}_2^{*-1} &{} {\varvec{I}}_{d_1} \\ \end{array} \right) , \quad {\varvec{G}}^{-1}=\left( \begin{array}{cc} {\varvec{I}}_{d_2} &{} {\varvec{0}} \\ -{\varvec{B}}_{12}^* {\varvec{B}}_2^{*-1} &{} {\varvec{I}}_{d_1} \\ \end{array} \right) . \end{aligned}$$

(24)

Since for nested models the following equalities (Vuong 1989, Lemma B)

$$\begin{aligned}&{\varvec{B}}_1^*={\varvec{B}}_{12}^* {\varvec{B}}_2^{*-1}{\varvec{B}}_{21}^*,\quad&{\varvec{A}}_1^*={\varvec{B}}_{12}^* {\varvec{B}}_2^{*-1}{\varvec{A}}_2^* {\varvec{B}}_2^{*-1}{\varvec{B}}_{21}^*, \end{aligned}$$

hold under $H_0$, it is easy to see that ${\varvec{G}} {\varvec{Q}} {\varvec{\varSigma }}{\varvec{G}}^{-1}=\left( \begin{array}{cc} {\varvec{\varLambda }}&{} -{\varvec{B}}_{21}^*{\varvec{A}}_1^{*-1} \\ {\varvec{0}} &{} {\varvec{0}} \\ \end{array} \right) ,$ where the matrix ${\varvec{\varLambda }}$ is given in Eq. (13). The last equality ensures that the non-null eigenvalues of ${\varvec{Q}} {\varvec{\varSigma }}$ are the non-null eigenvalues of ${\varvec{\varLambda }}$. To show that there are $d_2-d_1$ non-null eigenvalues, note that the matrix ${\varvec{P}}^*= {\varvec{B}}^{*-1}_2 {\varvec{B}}^{^{\prime }*}_{12} ( {\varvec{B}}^{*}_{12}{\varvec{B}}^{*-1}_2 {\varvec{A}}^{*}_2 {\varvec{B}}^{*-1}_2 {\varvec{B}}^{^{\prime }*}_{12})^{-1} {\varvec{B}}^{*}_{12}{\varvec{B}}^{*-1}_2{\varvec{A}}^{*}_2$ is idempotent and has rank $d_1$ as, in line with Lemma B by Vuong (1989), it holds that ${\varvec{B}}^{*-1}_2 {\varvec{B}}^{^{\prime }*}_{12}={\varvec{\varPhi }}^*$ where ${\varvec{\varPhi }}^*$ has rank $d_1$ according to the last assumption of Definition 1. Consequently,

$$\begin{aligned} \mathrm{rank}({\varvec{\varLambda }})= & {} \mathrm{rank}({\varvec{B}}_2^*({\varvec{P}}^*- {\varvec{I}}_{d_2}){\varvec{A}}_2^{*-1})\\ {}= & {} \mathrm{rank}({\varvec{I}}_{d_2}-{\varvec{P}}^*)=\mathrm{trace}({\varvec{I}}_{d_2}-{\varvec{P}}^*)=d_2-d_1. \end{aligned}$$

Here, the second equality follows from the non-singularity of matrices ${\varvec{B}}_2^*$ and ${\varvec{A}}_2^{*-1}$, while the third one from the idempotency of ${\varvec{I}}_{d_2}-{\varvec{P}}^*$. The previous result implies that ${\varvec{\varLambda }}$ has $d_2-d_1$ non-null eigenvalues.
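The rank argument can be illustrated numerically. The sketch below draws random symmetric positive definite stand-ins for ${\varvec{A}}_2^*$ and ${\varvec{B}}_2^*$ and a full-rank ${\varvec{\varPhi }}^*$, sets ${\varvec{B}}_{12}^*={\varvec{\varPhi }}^{*\prime }{\varvec{B}}_2^*$ so that ${\varvec{B}}^{*-1}_2 {\varvec{B}}^{^{\prime }*}_{12}={\varvec{\varPhi }}^*$, and takes ${\varvec{\varLambda }}$ in the form ${\varvec{B}}_2^*({\varvec{P}}^*- {\varvec{I}}_{d_2}){\varvec{A}}_2^{*-1}$ used in the rank display. The dimensions and the helper `random_spd` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(5)
d1, d2 = 2, 5                          # hypothetical dimensions with d1 < d2

def random_spd(d):
    """Random symmetric positive definite d x d matrix."""
    a = rng.normal(size=(d, d))
    return a @ a.T + d * np.eye(d)

A2, B2 = random_spd(d2), random_spd(d2)
Phi = rng.normal(size=(d2, d1))        # full column rank with probability one
B12 = Phi.T @ B2                       # ensures inv(B2) @ B12.T equals Phi

B2_inv = np.linalg.inv(B2)
A1 = B12 @ B2_inv @ A2 @ B2_inv @ B12.T
P = B2_inv @ B12.T @ np.linalg.inv(A1) @ B12 @ B2_inv @ A2
Lam = B2 @ (P - np.eye(d2)) @ np.linalg.inv(A2)

idempotent_err = np.abs(P @ P - P).max()
trace_complement = np.trace(np.eye(d2) - P)
rank_lam = np.linalg.matrix_rank(Lam)
print(idempotent_err, trace_complement, rank_lam)
```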

Proof of Theorem 3:

Under the null hypothesis of equivalence of the two models, we have ${\varvec{\beta }}_2^*= {\varvec{d}}({\varvec{\beta }}_1^*)$ and ${\varvec{\varDelta }}^*{\varvec{\varPhi }}^*={\varvec{0}}$, where ${\varvec{\varPhi }}^*={\varvec{\varPhi }}({\varvec{\beta }}_1^*)$ is introduced by Definition 1. From Lemma B by Vuong (1989), it holds that ${\varvec{\varPhi }}^*={\varvec{B}}^{*-1}_2 {\varvec{B}}^{^{\prime }*}_{12}$. After some algebra, the claim follows from (13). $\square $

Proof of Theorem 4:

In line with Magnus and Neudecker (2007, Theorem 5, Ch. 1), the eigenvalues of ${\varvec{\varLambda }}$ are the eigenvalues of $\bar{{\varvec{\varLambda }}}=-{\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{B}}_2^*{\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{\varDelta }}^{*^{\prime }}({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-1}{\varvec{\varDelta }}^*{\varvec{A}}_2^{*-\frac{1}{2}}.$ $\square $

As ${\varvec{P}}={\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{\varDelta }}^{*^{\prime }}({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-1}{\varvec{\varDelta }}^*{\varvec{A}}_2^{*-\frac{1}{2}}$ is idempotent, Magnus and Neudecker (2007, Theorem 9, Ch. 1) implies that the eigenvalues of $\bar{{\varvec{\varLambda }}}$ are also the eigenvalues of ${\varvec{P}} \bar{{\varvec{\varLambda }}}$. The $d_2\times d_2$ matrix

$$\begin{aligned} {\varvec{K}}=\left( \begin{array}{c} ({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-\frac{1}{2}}{\varvec{\varDelta }}^*{\varvec{A}}_2^{*-\frac{1}{2}} \\ ({\varvec{\varPhi }}^{*^{\prime }}{\varvec{A}}_2^{*}{\varvec{\varPhi }}^*)^{-\frac{1}{2}}{\varvec{\varPhi }}^{*^{\prime }}{\varvec{A}}_2^{*\frac{1}{2}} \\ \end{array} \right) , \end{aligned}$$

is such that ${\varvec{K}} {\varvec{K}} ^{\prime }={\varvec{I}}$, thus (Magnus and Neudecker 2007, Theorem 5, Ch. 1) the eigenvalues of ${\varvec{P}} \bar{{\varvec{\varLambda }}}$ are also the eigenvalues of

$$\begin{aligned} {\varvec{K}} {\varvec{P}} \bar{{\varvec{\varLambda }}} {\varvec{K}} ^{\prime }= \left( \begin{array}{cc} -({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-\frac{1}{2}}{\varvec{\varDelta }}^*{\varvec{S}}_2^*{\varvec{\varDelta }}^{*^{\prime }}({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-\frac{1}{2}} &{} {\varvec{0}} \\ {\varvec{0}} &{} {\varvec{0}} \\ \end{array} \right) . \end{aligned}$$

The hypotheses of Theorem 3 ensure that the $(d_2-d_1)\times (d_2-d_1)$ matrix $-({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-\frac{1}{2}}{\varvec{\varDelta }}^* {\varvec{S}}_2^*{\varvec{\varDelta }}^{*^{\prime }} ({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*^{\prime }})^{-\frac{1}{2}}$ has strictly positive eigenvalues. The statement of the theorem follows by applying again Theorem 5, Ch. 1 by Magnus and Neudecker (2007).
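The orthogonality ${\varvec{K}} {\varvec{K}} ^{\prime }={\varvec{I}}$ used in the proof of Theorem 4 relies on ${\varvec{\varDelta }}^*{\varvec{\varPhi }}^*={\varvec{0}}$. A small numerical check, with a random positive definite stand-in for ${\varvec{A}}_2^*$ and a ${\varvec{\varDelta }}^*$ built to annihilate a random ${\varvec{\varPhi }}^*$, is sketched below; `sym_power` is a hypothetical helper for powers of symmetric positive definite matrices.

```python
import numpy as np

def sym_power(a, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * vals**power) @ vecs.T

rng = np.random.default_rng(6)
d1, d2 = 2, 5                          # hypothetical dimensions
g = rng.normal(size=(d2, d2))
A2 = g @ g.T + d2 * np.eye(d2)         # stand-in for A_2^*
Phi = rng.normal(size=(d2, d1))        # stand-in for Phi^*

# Rows of Delta span the orthogonal complement of the columns of Phi,
# so that Delta @ Phi = 0 and Delta has d2 - d1 rows.
q, _ = np.linalg.qr(Phi, mode="complete")
Delta = q[:, d1:].T

top = sym_power(Delta @ np.linalg.inv(A2) @ Delta.T, -0.5) @ Delta @ sym_power(A2, -0.5)
bottom = sym_power(Phi.T @ A2 @ Phi, -0.5) @ Phi.T @ sym_power(A2, 0.5)
K = np.vstack([top, bottom])
orth_err = np.abs(K @ K.T - np.eye(d2)).max()
print(orth_err)
```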

Appendix C

For the models used in Sect. 6, we prove that when the true probabilities $\tau _h(ij)$, the elements of the vector ${\varvec{\tau }}_h$, satisfy the independence condition $\tau _h(ij)=\tau _h(i \cdot )\tau _h(\cdot j)$, the equality ${\varvec{q}}_{h1}^*={\varvec{q}}_{h2}^*$ holds. According to Theorem 2 by Colombi et al. (2018), $q_{h1}^*(ij)=q_{h1}^*(i \cdot )q_{h1}^*(\cdot j)$ holds for the elements of ${\varvec{q}}_{h1}^*.$

It follows that $ K_1=\sum _h\sum _i\sum _j\tau _h(ij)\ln q_{h1}^*(ij)=\sum _h\sum _i\tau _h(i\cdot )\ln q_{h1}^*(i\cdot )+\sum _h\sum _j\tau _h(\cdot j)\ln q_{h1}^*(\cdot j),$ and for any other ${\varvec{q}}_{h1}$ belonging to ${\mathcal {M}}_1$ it is

$$\begin{aligned} \sum _h\sum _i\tau _h(i\cdot )\ln q_{h1}^*(i\cdot )< & {} \sum _h\sum _i\tau _h(i\cdot )\ln q_{h1}(i\cdot ),\\ \sum _h\sum _j\tau _h(\cdot j)\ln q_{h1}^*(\cdot j)< & {} \sum _h\sum _j\tau _h(\cdot j)\ln q_{h1}(\cdot j). \end{aligned}$$

From the previous results, it is easy to deduce that

$$\begin{aligned} K_2= & {} \sum _h\sum _i\sum _j\tau _h(ij)\ln q_{h2}^*(ij)\\= & {} \sum _h\sum _i\tau _h(i\cdot )\ln q_{h2}^*(i\cdot )+\sum _h\sum _i\tau _h(i\cdot )\sum _j\tau _h(\cdot j)\ln \frac{q_{h2}^*(ij)}{q_{h2}^*(i\cdot )}\\= & {} \sum _h\sum _i\tau _h(i\cdot )\ln q_{h2}^*(i\cdot )+\sum _h\sum _j\tau _h(\cdot j)\ln \tilde{q}_{h2}(j) \ge K_1. \end{aligned}$$

The final equality and inequality follow by noting that there is a unique best approximating function $\tilde{{\varvec{q}}}_{h2}$ of the marginal distribution with probabilities $\tau _h(\cdot j)$ and that in the case of independence, model ${\mathcal {M}}_2$ reduces to ${\mathcal {M}}_1$, but as ${\mathcal {M}}_1$ is nested in ${\mathcal {M}}_2$, $K_1 \ge K_2$ must also follow. Consequently, equality ${\varvec{q}}_{h1}^*={\varvec{q}}_{h2}^*$ is valid.

About this article

Cite this article

Colombi, R., Giordano, S. Likelihood-based tests for a class of misspecified finite mixture models for ordinal categorical data. TEST 28, 1175–1202 (2019). https://doi.org/10.1007/s11749-019-00626-w

Received: 09 March 2018
Accepted: 08 January 2019
Published: 18 January 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s11749-019-00626-w
