Abstract
The main purpose of this paper is to apply likelihood-based hypothesis testing procedures to a class of latent variable models for ordinal responses that allow for uncertain answers (Colombi et al. in Scand J Stat, 2018. https://doi.org/10.1111/sjos.12366). Because these models rely on assumptions needed to describe different respondent behaviors, inferential issues must be discussed without assuming that the tested model is correctly specified. By adapting the works of White (Econometrica 50(1):1–25, 1982) and Vuong (Econometrica 57(2):307–333, 1989), we compare nested models under misspecification and contrast the limiting distributions of the Wald, Lagrange multiplier/score and likelihood ratio statistics with the classical asymptotic chi-square distribution, to show the consequences of ignoring misspecification.
References
Amemiya T (1985) Advanced econometrics. Harvard University Press, Cambridge
Bandura A (1986) Social foundations of thought and action: a social cognitive theory. Prentice-Hall, Englewood Cliffs
Bartolucci F, Colombi R, Forcina A (2007) An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Stat Sin 17:691–711
Baumgartner H, Steenkamp JBE (2001) Response styles in marketing research: a cross-national investigation. J Market Res 38(2):143–156
Bergsma WP, Rudas T (2002) Marginal models for categorical data. Ann Stat 30:140–159
Boos DD, Stefanski LA (2013) Essential statistical inference: theory and methods. Springer, Berlin
Bowden RJ (1973) The theory of parametric identification. Econometrica 41(6):1069–1074
Colombi R, Giordano S, Cazzaro M (2014) hmmm: an R package for hierarchical multinomial marginal models. J Stat Softw 59:1–25
Colombi R, Giordano S, Gottard A, Iannario M (2018) A hierarchical marginal model with latent uncertainty. Scand J Stat. https://doi.org/10.1111/sjos.12366
de Micheaux PL (2017) CompQuadForm: distribution function of quadratic forms in normal variables. R package version 1.4.3
de Leeuw ED, Dillman D (2008) International handbook of survey methodology. Lawrence Erlbaum Associates, Hillsdale
Duchesne P, de Micheaux PL (2010) Computing the distribution of quadratic forms: further comparisons between the Liu–Tang–Zhang approximation and exact methods. Comput Stat Data Anal 54:858–862
Forcina A (2008) Identifiability of extended latent class models with individual covariates. Comput Stat Data Anal 52:5263–5268
Glonek GF, McCullagh P (1995) Multivariate logistic models. J R Stat Soc Ser B (Methodological) 57:533–546
Gottard A, Iannario M, Piccolo D (2016) Varying uncertainty in CUB models. Adv Data Anal Classif 10:225–244
Iannario M, Monti AC, Piccolo D (2016) Robustness issues for CUB models. Test 25:731–750
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
Magnus JR (1988) Linear structures. Oxford University Press, Oxford
Magnus JR, Neudecker H (2007) Matrix differential calculus with applications in statistics and econometrics, 3rd edn. Wiley, London
Mathai AM, Provost SB (1992) Quadratic forms in random variables: theory and applications. Statistics: a series of textbooks and monographs. CRC Press, Boca Raton
Rothenberg T (1971) Identification in parametric models. Econometrica 39:577–591
Sagone E, De Caroli ME (2013) Personality factors and civic moral disengagement in law and psychology university students. Procedia Soc Behav Sci 93:158–163
Simone R, Tutz G (2018) Modelling uncertainty and response styles in ordinal data. Stat Neerl 72:224–245
Studeny M (2005) Probabilistic conditional independence structures. Springer, London
Tourangeau R, Rips LJ, Rasinski K (2000) The psychology of survey response. Cambridge University Press, New York
Tutz G, Schneider M (2017) Mixture models for ordinal responses with a flexible uncertainty component. Technical Report Number 203
Vuong QH (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57(2):307–333
White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50(1):1–25
Acknowledgements
We would like to thank Rocco Servidio of the Department of Languages and Educational Sciences (University of Calabria, Italy) for providing the real data analyzed in Sect. 7. We also thank two referees for their useful comments, which improved the initial version of the paper.
Appendices
Appendix A
A useful result of matrix algebra (Magnus 1988, Definition 7.1) is recalled here for easy reference.
Lemma 1
Let \(w({\varvec{X}})\) be the vector containing the diagonal elements of a square matrix \({\varvec{X}}\). If \({\varvec{X}}\) is an \(n \times n\) diagonal matrix, then there exists an \(n \times n^2\) matrix \({\varvec{\varPsi }}_n\) with the property
\[\mathrm{vec}\ {\varvec{X}} = {\varvec{\varPsi }}_n^{\prime }\, w({\varvec{X}}).\]
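A small NumPy sketch of such a matrix, for readers who want to check the vectorization steps numerically. The explicit construction \({\varvec{\varPsi }}_n=\sum _i {\varvec{e}}_i({\varvec{e}}_i\otimes {\varvec{e}}_i)^{\prime }\) and the column-stacking vec are our assumptions (Magnus 1988 may present \({\varvec{\varPsi }}_n\) differently); the sketch only verifies that \({\varvec{\varPsi }}_n\) extracts the diagonal and that, for a diagonal matrix, \({\varvec{\varPsi }}_n^{\prime }\) maps the diagonal back to the vec, which is the form used later in \(\mathrm{vec}\,(\mathrm{Diag}^{-1}({\varvec{L}} {\varvec{p}}_h))={\varvec{\varPsi }}_o^{\prime }\bar{{\varvec{\mu }}}_h\).

```python
import numpy as np

def vec(X):
    """Column-stacking vectorization vec(X)."""
    return X.reshape(-1, order="F")

def psi(n):
    """Hypothetical explicit Psi_n = sum_i e_i (e_i kron e_i)', an n x n^2 matrix."""
    I = np.eye(n)
    return sum(np.outer(I[i], np.kron(I[i], I[i])) for i in range(n))

rng = np.random.default_rng(0)
n = 4
X = rng.normal(size=(n, n))

# Psi_n picks out the diagonal w(X) of any square X
print(np.allclose(psi(n) @ vec(X), np.diag(X)))

# For a diagonal matrix, vec(X) = Psi_n' w(X); here X = Diag^{-1}(m)
m = rng.uniform(1.0, 2.0, size=n)
print(np.allclose(vec(np.diag(1.0 / m)), psi(n).T @ (1.0 / m)))
```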
In the remainder of this appendix, the matrices \({\varvec{D}}_h=\frac{\partial \,{\varvec{\gamma }}_h}{\partial \,{\varvec{\beta }}^{\prime }}\) and \(\frac{\partial \, \mathrm{vec} \ {\varvec{D}}_h}{\partial \ {\varvec{\beta }}^{\prime }}\) are computed. To obtain \({\varvec{D}}_h\), we rely on Forcina (2008). The saturated log-linear model for the vector \({\varvec{p}}_h\) of the joint probabilities of the v observable responses and the v latent variables in the \(h\mathrm{th}\) stratum is denoted by
where \({\varvec{Z}}\) is the design matrix of the log-linear model. As shown by Bartolucci et al. (2007), the transformation from the log-linear parameters \({\varvec{\theta }}_h\) to the generalized interactions \({\varvec{\eta }}_h={\varvec{C}} \ln {\varvec{M}} {\varvec{p}}_h\) is a diffeomorphism and
with \({\varvec{\varOmega }}_h= {\mathrm{Diag}}({\varvec{p}}_h)- {\varvec{p}}_h {\varvec{p}}_h^{\prime }\). The second equality in (18) follows from \({\varvec{C}} \ {\mathrm{Diag}}^{-1}({\varvec{M}} {\varvec{p}}_h) \ {\varvec{M}} {\varvec{p}}_h = {\varvec{C}}{\varvec{1}} = {\varvec{0}}\): the vector \({\mathrm{Diag}}^{-1}({\varvec{M}} {\varvec{p}}_h) \ {\varvec{M}} {\varvec{p}}_h\) is a vector of ones, and every row of \({\varvec{C}}\) sums to zero.
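The second equality in (18) can be checked numerically. The sketch assumes that the saturated log-linear model has the usual multinomial-logit form \({\varvec{p}}_h=\exp ({\varvec{Z}}{\varvec{\theta }}_h)/{\varvec{1}}^{\prime }\exp ({\varvec{Z}}{\varvec{\theta }}_h)\), so that \(\partial {\varvec{p}}_h/\partial {\varvec{\theta }}_h^{\prime }={\varvec{\varOmega }}_h{\varvec{Z}}\); the matrices Z, M, C and all sizes are random placeholders, not the paper's design and marginalization matrices. It compares both expressions for \(\partial {\varvec{\eta }}_h/\partial {\varvec{\theta }}_h^{\prime }\) with a central finite-difference Jacobian.

```python
import numpy as np

rng = np.random.default_rng(1)
t, s, k, r = 6, 5, 4, 3                  # sizes of p, M p, theta, eta (illustrative)
Z = rng.normal(size=(t, k))              # hypothetical log-linear design matrix
M = rng.uniform(0.1, 1.0, size=(s, t))   # hypothetical positive matrix M
C = rng.normal(size=(r, s))
C -= C.mean(axis=1, keepdims=True)       # every row of C sums to zero
theta = rng.normal(size=k)

def probs(theta):
    # assumed multinomial-logit form of the saturated log-linear model
    e = np.exp(Z @ theta)
    return e / e.sum()

def eta(theta):
    # generalized interactions eta = C ln(M p)
    return C @ np.log(M @ probs(theta))

p = probs(theta)
Omega = np.diag(p) - np.outer(p, p)
R1 = C @ np.diag(1 / (M @ p)) @ M @ Omega @ Z        # chain rule with Omega
R2 = C @ np.diag(1 / (M @ p)) @ M @ np.diag(p) @ Z   # simplified form

# central finite-difference Jacobian of eta with respect to theta'
h = 1e-6
J = np.column_stack([(eta(theta + h * e) - eta(theta - h * e)) / (2 * h)
                     for e in np.eye(k)])
print(np.allclose(R1, R2), np.allclose(R1, J, atol=1e-6))
```

The term dropped in the simplified form is \(-{\varvec{C}}\,\mathrm{Diag}^{-1}({\varvec{M}}{\varvec{p}}_h){\varvec{M}}{\varvec{p}}_h{\varvec{p}}_h^{\prime }{\varvec{Z}}\), which vanishes exactly because the rows of C sum to zero.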
From the chain rule of matrix differential calculus (Magnus and Neudecker 2007), we get
where
To compute the Hessian for stratum h, the derivative of the matrix \({\varvec{D}}_h\) defined in (18) is needed. From (18), we deduce
and, to complete the formula, the derivatives \(\frac{\partial \, \mathrm{vec} \ {\varvec{Q}}_h}{\partial \, {\varvec{\beta }}^{\prime }}\) and \(\frac{\partial \, \mathrm{vec} \ {\varvec{R}}_h^{-1}}{\partial \, {\varvec{\beta }}^{\prime }}\) have to be computed.
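Since \(\partial \,\mathrm{vec}\,{\varvec{D}}_h/\partial {\varvec{\beta }}^{\prime }\) is expressed through the derivatives of \(\mathrm{vec}\,{\varvec{Q}}_h\) and \(\mathrm{vec}\,{\varvec{R}}_h^{-1}\), the computation relies on the product rule of matrix differential calculus (Magnus and Neudecker 2007): for an \(m\times n\) matrix X and an \(n\times q\) matrix Y, \(d\,\mathrm{vec}(XY)=(Y^{\prime }\otimes I_m)\,d\,\mathrm{vec}X+(I_q\otimes X)\,d\,\mathrm{vec}Y\). The sketch below checks this rule by central finite differences; X(β) and Y(β) are hypothetical smooth matrix functions, not the paper's matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, q, k = 3, 4, 2, 5            # X is m x n, Y is n x q, beta has k entries
A = rng.normal(size=(m * n, k))
B = rng.normal(size=(n * q, k))

def vec(X):
    """Column-stacking vectorization."""
    return X.reshape(-1, order="F")

# Hypothetical matrix functions of beta and their Jacobians d vec / d beta'
def X_of(beta):
    return np.sin(A @ beta).reshape(m, n, order="F")

def dX(beta):
    return np.cos(A @ beta)[:, None] * A

def Y_of(beta):
    return np.exp(0.1 * (B @ beta)).reshape(n, q, order="F")

def dY(beta):
    return (0.1 * np.exp(0.1 * (B @ beta)))[:, None] * B

beta = rng.normal(size=k)
X, Y = X_of(beta), Y_of(beta)

# Product rule: d vec(XY) = (Y' kron I_m) d vec X + (I_q kron X) d vec Y
J_rule = np.kron(Y.T, np.eye(m)) @ dX(beta) + np.kron(np.eye(q), X) @ dY(beta)

h = 1e-6
J_fd = np.column_stack([
    (vec(X_of(beta + h * e) @ Y_of(beta + h * e))
     - vec(X_of(beta - h * e) @ Y_of(beta - h * e))) / (2 * h)
    for e in np.eye(k)
])
print(np.allclose(J_rule, J_fd, atol=1e-6))
```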
In light of (17), matrix \({\varvec{R}}_h\) of Eq. (18) can be vectorized as
where t and s are the lengths of the vectors \({\varvec{p}}_h\) and \({\varvec{M}} {\varvec{p}}_h\), respectively, and \({\varvec{\mu }}_h\) is the vector of the reciprocal values of \({\varvec{M}} {\varvec{p}}_h\).
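Combining Lemma 1 with the chain rule gives the derivative of the diagonal factor in \({\varvec{R}}_h\): as the elements of \({\varvec{\mu }}_h\) are \(1/({\varvec{M}}{\varvec{p}}_h)_i\), we have \(\partial {\varvec{\mu }}_h/\partial {\varvec{p}}_h^{\prime }=-\mathrm{Diag}({\varvec{\mu }}_h)^2{\varvec{M}}\), hence \(\partial \,\mathrm{vec}\,\mathrm{Diag}^{-1}({\varvec{M}}{\varvec{p}}_h)/\partial {\varvec{p}}_h^{\prime }=-{\varvec{\varPsi }}_s^{\prime }\mathrm{Diag}({\varvec{\mu }}_h)^2{\varvec{M}}\), and a right multiplication by \(\partial {\varvec{p}}_h/\partial {\varvec{\beta }}^{\prime }\) gives the derivative with respect to \({\varvec{\beta }}^{\prime }\). The finite-difference check below uses a random positive M as a stand-in for the paper's matrix and the explicit construction \({\varvec{\varPsi }}_s=\sum _i {\varvec{e}}_i({\varvec{e}}_i\otimes {\varvec{e}}_i)^{\prime }\), which is our assumption.

```python
import numpy as np

def psi(n):
    """Hypothetical explicit Psi_n = sum_i e_i (e_i kron e_i)'."""
    I = np.eye(n)
    return sum(np.outer(I[i], np.kron(I[i], I[i])) for i in range(n))

rng = np.random.default_rng(3)
t, s = 6, 4
M = rng.uniform(0.1, 1.0, size=(s, t))   # placeholder positive matrix, so M p > 0
p = rng.dirichlet(np.ones(t))

def vec_diag_inv(p):
    # vec(Diag^{-1}(M p)), computed directly
    return np.diag(1.0 / (M @ p)).reshape(-1, order="F")

mu = 1.0 / (M @ p)
# Lemma 1: vec(Diag^{-1}(M p)) = Psi_s' mu
print(np.allclose(vec_diag_inv(p), psi(s).T @ mu))

# Chain rule: d vec(Diag^{-1}(M p)) / d p' = -Psi_s' Diag(mu)^2 M
J_rule = -psi(s).T @ np.diag(mu**2) @ M

h = 1e-7
J_fd = np.column_stack([(vec_diag_inv(p + h * e) - vec_diag_inv(p - h * e)) / (2 * h)
                        for e in np.eye(t)])
print(np.allclose(J_rule, J_fd, atol=1e-5))
```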
Thus, we obtain
Finally, applying Magnus and Neudecker (2007, Theorem 3, Sect. 4), we obtain
Analogously, denoting by \(\bar{{\varvec{\mu }}}_h\) the vector of the reciprocal values of \({\varvec{q}}_h={\varvec{L}} {\varvec{p}}_h\), we obtain
where o is the size of the vector \({\varvec{q}}_h={\varvec{L}} {\varvec{p}}_h\) and \(\mathrm{vec}\ (\mathrm{Diag}^{-1}({\varvec{L}} {\varvec{p}}_h))= \ {\varvec{\varPsi }}_o ^{\prime }\ \bar{{\varvec{\mu }}}_h\).
Plugging results (21) and (22) into expression (19) completes the computation of \(\frac{\partial \, \mathrm{vec} \ {\varvec{D}}_h}{\partial \ {\varvec{\beta }}^{\prime }}\).
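The derivative of \(\mathrm{vec}\,{\varvec{R}}_h^{-1}\) rests on the standard differential of a matrix inverse, \(d({\varvec{R}}^{-1})=-{\varvec{R}}^{-1}(d{\varvec{R}}){\varvec{R}}^{-1}\), that is \(d\,\mathrm{vec}({\varvec{R}}^{-1})=-(({\varvec{R}}^{-1})^{\prime }\otimes {\varvec{R}}^{-1})\,d\,\mathrm{vec}\,{\varvec{R}}\) (Magnus and Neudecker 2007). A finite-difference check, with a hypothetical near-identity matrix function R(β) whose vec is linear in β:

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 4, 3
R0 = np.eye(d) + 0.1 * rng.normal(size=(d, d))   # well-conditioned base matrix
G = 0.1 * rng.normal(size=(d * d, k))            # hypothetical d vec R / d beta'

def vec(X):
    return X.reshape(-1, order="F")

def R_of(beta):
    # hypothetical matrix function with vec R(beta) = vec R0 + G beta
    return R0 + (G @ beta).reshape(d, d, order="F")

beta = 0.1 * rng.normal(size=k)
Rinv = np.linalg.inv(R_of(beta))

# d vec(R^{-1}) / d beta' = -((R^{-1})' kron R^{-1}) d vec R / d beta'
J_rule = -np.kron(Rinv.T, Rinv) @ G

h = 1e-6
J_fd = np.column_stack([
    (vec(np.linalg.inv(R_of(beta + h * e))) - vec(np.linalg.inv(R_of(beta - h * e)))) / (2 * h)
    for e in np.eye(k)
])
print(np.allclose(J_rule, J_fd, atol=1e-6))
```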
Appendix B
This appendix contains the proofs of the theorems introduced in Sect. 5.2.1.
Proof of Theorem 1:
Let us choose a compact subset \({\mathcal {K}}\) of \({\mathcal {N}}\) containing \({\varvec{\beta }}^*\), where the open neighborhood \({\mathcal {N}}\) is defined by assumption A1. From A1 and Theorem 2.2 of White (1982), it follows that the estimator \({\varvec{b}}_n\), which maximizes \(L_n({\varvec{\beta }})\) on the compact set \({\mathcal {K}}\), converges in probability to \({\varvec{\beta }}^*\). Moreover, as \({\varvec{b}}_n={\varvec{\beta }}^*+ o_p(1)\) and \({\varvec{\beta }}^*\) is interior to the parameter space, with probability tending to one \({\varvec{b}}_n\) is interior to the parameter space and satisfies the first-order conditions \({\varvec{s}}_n({\varvec{b}}_n)={\varvec{0}}\). This proves (i).
From the mean value theorem, we have \({\varvec{s}}_n({\varvec{b}}_n)={\varvec{s}}_n({\varvec{\beta }}^*)+\bar{{\varvec{H}}}_n({\varvec{b}}_n- {\varvec{\beta }}^*),\) where every row of \(\bar{{\varvec{H}}}_n\) is computed at a different \({\varvec{\beta }}\) that lies between \({\varvec{b}}_n\) and \({\varvec{\beta }}^*\). Since \({\varvec{s}}_n({\varvec{b}}_n)=o_p(1)\), we obtain \(\frac{1}{\sqrt{n}}{\varvec{s}}_n({\varvec{\beta }}^*)=-\frac{1}{n}\bar{{\varvec{H}}}_n\sqrt{n}({\varvec{b}}_n- {\varvec{\beta }}^*)+o_p(1).\) Knowing that
where \(\Vert {\varvec{A}} (\bar{{\varvec{\beta }}})- {\varvec{A}}({\varvec{\beta }}^*)\Vert =o_p(1)\) and \(\sup _{{\varvec{\beta }}\in {\mathcal {N}}}\Vert -\frac{1}{n}{\varvec{H}}_n({\varvec{\beta }})-{\varvec{A}}({\varvec{\beta }}) \Vert \) converges in probability to zero on the compact set \({\mathcal {K}}\), it follows \(\sqrt{n}({\varvec{b}}_n- {\varvec{\beta }}^*)={\varvec{A}}^{-1}({\varvec{\beta }}^*)\frac{1}{\sqrt{n}}{\varvec{s}}_n({\varvec{\beta }}^*)+o_p(1).\) Point (ii) is proved by considering that \(\frac{1}{\sqrt{n}}{\varvec{s}}_n({\varvec{\beta }}^*)\) is asymptotically distributed as a multivariate Normal variable with null expectation and covariance matrix \({\varvec{B}}({\varvec{\beta }}^*)\). \(\square \)
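Theorem 1(ii) implies that \(\sqrt{n}({\varvec{b}}_n-{\varvec{\beta }}^*)\) has asymptotic covariance \({\varvec{A}}^{-1}({\varvec{\beta }}^*){\varvec{B}}({\varvec{\beta }}^*){\varvec{A}}^{-1}({\varvec{\beta }}^*)\), which reduces to \({\varvec{A}}^{-1}({\varvec{\beta }}^*)\) only when the information equality \({\varvec{A}}={\varvec{B}}\) holds, as under correct specification. The toy sketch below, unrelated to the paper's ordinal models, fits an exponential model to Gamma(shape 2, scale 1) data: the pseudo-true value is \(\lambda ^*=1/E(x)=0.5\), with \(A=1/\lambda ^{*2}=4\) and \(B=\mathrm{Var}(x)=2\), so the sandwich variance is 0.125 while the naive \(A^{-1}\) is 0.25.

```python
import numpy as np

# Data from Gamma(shape=2, scale=1), fitted with the exponential model
# f(x; lam) = lam * exp(-lam * x): a deliberately misspecified toy example.
rng = np.random.default_rng(5)
x = rng.gamma(shape=2.0, scale=1.0, size=200_000)

lam_hat = 1.0 / x.mean()              # MLE b_n, solves s_n(b_n) = 0
score_i = 1.0 / lam_hat - x           # per-observation scores at b_n
A_n = 1.0 / lam_hat**2                # -(1/n) H_n(b_n), since the Hessian is -1/lam^2
B_n = np.mean(score_i**2)             # (1/n) sum of squared scores

naive = 1.0 / A_n                     # A^{-1}: valid only if the model is correct
sandwich = B_n / A_n**2               # A^{-1} B A^{-1}, robust to misspecification
print(lam_hat, naive, sandwich)       # population values: 0.5, 0.25, 0.125
```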
Proof of Theorem 2:
Under the null hypothesis, from a Taylor expansion of \(\mathrm{LR}_n\) around \(({\varvec{b}}_{n1}, {\varvec{b}}_{n2})\), it follows that
From a simple extension of point (ii) of Theorem 1, it holds that \(\sqrt{n}[({\varvec{b}}_{n2}-{\varvec{\beta }}_2^*)^{\prime },({\varvec{b}}_{n1}-{\varvec{\beta }}_1^*)^{\prime }]^{\prime }\) is asymptotically Normal with null expected value and covariance matrix
Consequently, according to Mathai and Provost (1992, page 29) or Boos and Stefanski (2013, Theorem 8.1), the LR statistic (12) is asymptotically distributed as a weighted sum \(\sum \lambda _i Z_i^2\) of squared independent standard Normal random variables, where the weights \(\lambda _i\) are the eigenvalues of the matrix \({\varvec{Q}} {\varvec{\varSigma }}\), with the block-diagonal matrix \({\varvec{Q}}\) defined as \( {\varvec{Q}} =\left( \begin{array}{cc} -{\varvec{A}}_2^* &{} {\varvec{0}} \\ {\varvec{0}} &{} {\varvec{A}}_1^* \\ \end{array} \right) . \)
Now, consider the matrix
Since for nested models the following equalities (Vuong 1989, Lemma B)
hold under \(H_0\), it is easy to see that \({\varvec{G}} {\varvec{Q}} {\varvec{\varSigma }}{\varvec{G}}^{-1}=\left( \begin{array}{cc} {\varvec{\varLambda }}&{} -{\varvec{B}}_{21}^*{\varvec{A}}_1^{*-1} \\ {\varvec{0}} &{} {\varvec{0}} \\ \end{array} \right) ,\) where the matrix \({\varvec{\varLambda }}\) is given in Eq. (13). Since \({\varvec{G}} {\varvec{Q}} {\varvec{\varSigma }}{\varvec{G}}^{-1}\) is similar to \({\varvec{Q}} {\varvec{\varSigma }}\) and block upper triangular, the non-null eigenvalues of \({\varvec{Q}} {\varvec{\varSigma }}\) are the non-null eigenvalues of \({\varvec{\varLambda }}\). To show that there are \(d_2-d_1\) non-null eigenvalues, note that the matrix \({\varvec{P}}^*= {\varvec{B}}^{*-1}_2 {\varvec{B}}^{*\prime }_{12} ( {\varvec{B}}^{*}_{12}{\varvec{B}}^{*-1}_2 {\varvec{A}}^{*}_2{\varvec{B}}^{*-1}_2 {\varvec{B}}^{*\prime }_{12})^{-1} {\varvec{B}}^{*}_{12}{\varvec{B}}^{*-1}_2{\varvec{A}}^{*}_2\) is idempotent and has rank \(d_1\) because, in line with Lemma B of Vuong (1989), \({\varvec{B}}^{*-1}_2 {\varvec{B}}^{*\prime }_{12}={\varvec{\varPhi }}^*\), where \({\varvec{\varPhi }}^*\) has rank \(d_1\) according to the last assumption of Definition 1. Consequently,
Here, the second equality follows from the non-singularity of the matrices \({\varvec{B}}_2^*\) and \({\varvec{A}}_2^{*-1}\), and the third from the idempotency of \({\varvec{I}}_{d_2}-{\varvec{P}}^*\). This result implies that \({\varvec{\varLambda }}\) has \(d_2-d_1\) non-null eigenvalues. \(\square \)
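The limiting law in Theorem 2 is not a chi-square unless all non-null weights equal one, so p-values require the distribution of \(\sum \lambda _i Z_i^2\). The reference list includes the R package CompQuadForm, which computes such distributions with exact methods; the Python sketch below swaps in plain Monte Carlo simulation instead, so its accuracy is limited by the number of draws. The weights and the observed statistic are hypothetical values chosen for illustration.

```python
import numpy as np

def weighted_chisq_pvalue(stat, weights, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(sum_i w_i Z_i^2 > stat), Z_i iid N(0, 1)."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    draws = (rng.standard_normal((n_draws, w.size)) ** 2) @ w
    return float(np.mean(draws > stat))

# Hypothetical non-null eigenvalues of Q Sigma and a hypothetical LR value
lam = [1.8, 0.6, 0.3]
stat = 4.2

p_robust = weighted_chisq_pvalue(stat, lam)
# The classical chi-square(3) reference corresponds to all weights equal to one
p_naive = weighted_chisq_pvalue(stat, [1.0] * len(lam))
print(p_robust, p_naive)
```

With weights \(\lambda _i\) in hand (e.g. from an eigendecomposition of a consistent estimate of \({\varvec{Q}}{\varvec{\varSigma }}\)), comparing the two p-values shows directly how much ignoring misspecification changes the conclusion.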
Proof of Theorem 3:
Under the null hypothesis of equivalence of the two models, we have \({\varvec{\beta }}_2^*= {\varvec{d}}({\varvec{\beta }}_1^*)\) and \({\varvec{\varDelta }}^*{\varvec{\varPhi }}^*={\varvec{0}}\), where \({\varvec{\varPhi }}^*={\varvec{\varPhi }}({\varvec{\beta }}_1^*)\) is introduced in Definition 1. From Lemma B of Vuong (1989), \({\varvec{\varPhi }}^*={\varvec{B}}^{*-1}_2 {\varvec{B}}^{*\prime }_{12}\). After some algebra, the statement follows from (13). \(\square \)
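The identity \({\varvec{\varPhi }}^*={\varvec{B}}^{*-1}_2 {\varvec{B}}^{*\prime }_{12}\) also makes the claim in the proof of Theorem 2, that \({\varvec{P}}^*\) is idempotent with rank \(d_1\), a short computation. Assuming that \({\varvec{B}}_2^*\) is symmetric (it is a covariance matrix of scores), so that \({\varvec{B}}^{*}_{12}{\varvec{B}}^{*-1}_2={\varvec{\varPhi }}^{*\prime }\), and that \({\varvec{A}}_2^*\) is positive definite, so that \({\varvec{\varPhi }}^{*\prime }{\varvec{A}}_2^*{\varvec{\varPhi }}^*\) is invertible when \({\varvec{\varPhi }}^*\) has full column rank \(d_1\):

```latex
\begin{aligned}
\boldsymbol{P}^* &= \boldsymbol{\Phi}^*\left(\boldsymbol{\Phi}^{*\prime}\boldsymbol{A}_2^*\boldsymbol{\Phi}^*\right)^{-1}\boldsymbol{\Phi}^{*\prime}\boldsymbol{A}_2^*,\\
\boldsymbol{P}^*\boldsymbol{P}^* &= \boldsymbol{\Phi}^*\left(\boldsymbol{\Phi}^{*\prime}\boldsymbol{A}_2^*\boldsymbol{\Phi}^*\right)^{-1}
  \left(\boldsymbol{\Phi}^{*\prime}\boldsymbol{A}_2^*\boldsymbol{\Phi}^*\right)
  \left(\boldsymbol{\Phi}^{*\prime}\boldsymbol{A}_2^*\boldsymbol{\Phi}^*\right)^{-1}\boldsymbol{\Phi}^{*\prime}\boldsymbol{A}_2^* = \boldsymbol{P}^*,\\
\operatorname{rank}(\boldsymbol{P}^*) &= \operatorname{tr}(\boldsymbol{P}^*)
  = \operatorname{tr}\!\left[\left(\boldsymbol{\Phi}^{*\prime}\boldsymbol{A}_2^*\boldsymbol{\Phi}^*\right)^{-1}\boldsymbol{\Phi}^{*\prime}\boldsymbol{A}_2^*\boldsymbol{\Phi}^*\right]
  = \operatorname{tr}(\boldsymbol{I}_{d_1}) = d_1 .
\end{aligned}
```

Hence \({\varvec{I}}_{d_2}-{\varvec{P}}^*\) is idempotent with rank \(d_2-d_1\).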
Proof of Theorem 4:
In line with Magnus and Neudecker (2007, Theorem 5, Ch. 1), the eigenvalues of \({\varvec{\varLambda }}\) are the eigenvalues of \(\bar{{\varvec{\varLambda }}}=-{\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{B}}_2^*{\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{\varDelta }}^{*\prime }({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*\prime })^{-1}{\varvec{\varDelta }}^*{\varvec{A}}_2^{*-\frac{1}{2}}.\)
Since \({\varvec{P}}={\varvec{A}}_2^{*-\frac{1}{2}}{\varvec{\varDelta }}^{*\prime }({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*\prime })^{-1}{\varvec{\varDelta }}^*{\varvec{A}}_2^{*-\frac{1}{2}}\) is idempotent, Magnus and Neudecker (2007, Theorem 9, Ch. 1) implies that the eigenvalues of \(\bar{{\varvec{\varLambda }}}\) are also the eigenvalues of \({\varvec{P}} \bar{{\varvec{\varLambda }}}\). The \(d_2\times d_2\) matrix
is such that \({\varvec{K}} {\varvec{K}} ^{\prime }={\varvec{I}}\); thus (Magnus and Neudecker 2007, Theorem 5, Ch. 1), the eigenvalues of \({\varvec{P}} \bar{{\varvec{\varLambda }}}\) are also the eigenvalues of
The hypotheses of Theorem 3 ensure that the \((d_2-d_1)\times (d_2-d_1)\) matrix \(-({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*\prime })^{-\frac{1}{2}}{\varvec{\varDelta }}^*{\varvec{S}}_2^*{\varvec{\varDelta }}^{*\prime } ({\varvec{\varDelta }}^*{\varvec{A}}_2^{*-1}{\varvec{\varDelta }}^{*\prime })^{-\frac{1}{2}}\) has strictly positive eigenvalues. The statement of the theorem follows by applying Theorem 5, Ch. 1 of Magnus and Neudecker (2007) once more. \(\square \)
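The eigenvalue reduction in the proof of Theorem 4 can be checked numerically: the non-null eigenvalues of the \(d_2\times d_2\) matrix \(\bar{{\varvec{\varLambda }}}\) coincide with the eigenvalues of a \((d_2-d_1)\times (d_2-d_1)\) matrix, and the remaining \(d_1\) eigenvalues are zero. The sketch uses random positive definite stand-ins for \({\varvec{A}}_2^*\) and \({\varvec{B}}_2^*\), a random full-rank \({\varvec{\varDelta }}^*\), and takes \({\varvec{S}}_2^*={\varvec{A}}_2^{*-1}{\varvec{B}}_2^*{\varvec{A}}_2^{*-1}\) as an assumption for illustration. It checks only the reduction; the sign of the eigenvalues depends on the conventions for \({\varvec{A}}_2^*\) and \({\varvec{S}}_2^*\) fixed in the main text, and under the choices made here they come out negative.

```python
import numpy as np

def sym_power(S, a):
    """S**a for a symmetric positive definite S via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**a) @ V.T

rng = np.random.default_rng(6)
d2, r = 5, 2                       # r plays the role of d2 - d1 (illustrative)
G1, G2 = rng.normal(size=(2, d2, d2))
A = G1 @ G1.T + d2 * np.eye(d2)    # hypothetical positive definite A_2^*
B = G2 @ G2.T + d2 * np.eye(d2)    # hypothetical positive definite B_2^*
Delta = rng.normal(size=(r, d2))   # hypothetical full-rank Delta^*

Ah = sym_power(A, -0.5)
W = Delta @ np.linalg.inv(A) @ Delta.T
P = Ah @ Delta.T @ np.linalg.inv(W) @ Delta @ Ah   # idempotent, rank r
Lam_bar = -Ah @ B @ Ah @ P

# Reduced r x r matrix with S = A^{-1} B A^{-1} (assumption for illustration)
S = np.linalg.inv(A) @ B @ np.linalg.inv(A)
Wh = sym_power(W, -0.5)
small = -Wh @ Delta @ S @ Delta.T @ Wh

ev_full = np.sort(np.linalg.eigvals(Lam_bar).real)
ev_small = np.sort(np.linalg.eigvalsh(small))
print(ev_small)
print(ev_full)   # the last d2 - r entries are numerically zero
```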
Appendix C
For the models used in Sect. 6, we prove that when the true probabilities \(\tau _h(ij)\), the elements of the vector \({\varvec{\tau }}_h\), satisfy the independence condition \(\tau _h(ij)=\tau _h(i \cdot )\tau _h(\cdot j)\), the equality \({\varvec{q}}_{h1}^*={\varvec{q}}_{h2}^*\) holds. According to Theorem 2 of Colombi et al. (2018), the elements of \({\varvec{q}}_{h1}^*\) satisfy \(q_{h1}^*(ij)=q_{h1}^*(i \cdot )q_{h1}^*(\cdot j)\).
It follows that \( K_1=\sum _h\sum _i\sum _j\tau _h(ij)\ln q_{h1}^*(ij)=\sum _h\sum _i\tau _h(i\cdot )\ln q_{h1}^*(i\cdot )+\sum _h\sum _j\tau _h(\cdot j)\ln q_{h1}^*(\cdot j),\) and, for any other \({\varvec{q}}_{h1}\) belonging to \({\mathcal {M}}_1\), we have
From the previous results, it is easy to deduce that
The final equality and inequality follow from the uniqueness of the best approximating function \(\tilde{{\varvec{q}}}_{h2}\) of the marginal distribution with probabilities \(\tau _h(\cdot j)\) and from the fact that, under independence, model \({\mathcal {M}}_2\) reduces to \({\mathcal {M}}_1\); since \({\mathcal {M}}_1\) is nested in \({\mathcal {M}}_2\), \(K_1 \ge K_2\) must also hold. Consequently, the equality \({\varvec{q}}_{h1}^*={\varvec{q}}_{h2}^*\) is valid.
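A toy numerical illustration of the decomposition of \(K_1\) and of why, under independence, the best product-form approximation is the product of the true marginals. It uses a single stratum and product-form tables with unrestricted marginals, a simplification of the models of Sect. 6, which also restrict the marginals; it shows only the mechanism, namely Gibbs' inequality applied separately to each marginal term. The table sizes and probabilities are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
row = rng.dirichlet(np.ones(3))          # tau(i.), hypothetical
col = rng.dirichlet(np.ones(4))          # tau(.j), hypothetical
tau = np.outer(row, col)                 # independent true table tau(ij)

def K(a, b):
    """sum_ij tau(ij) ln(a_i b_j) for a product-form approximating table."""
    return np.sum(tau * np.log(np.outer(a, b)))

# K splits into a row term and a column term, as in Appendix C
decomp = np.sum(row * np.log(row)) + np.sum(col * np.log(col))
print(np.isclose(K(row, col), decomp))

# The product of the true marginals beats other product-form tables
others = [K(rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(4))) for _ in range(1000)]
print(K(row, col) >= max(others))
```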
Cite this article
Colombi, R., Giordano, S. Likelihood-based tests for a class of misspecified finite mixture models for ordinal categorical data. TEST 28, 1175–1202 (2019). https://doi.org/10.1007/s11749-019-00626-w
DOI: https://doi.org/10.1007/s11749-019-00626-w