Abstract
The Health and Retirement Study (HRS) is funded by the US National Institute on Aging to investigate the health, social and economic implications of the aging of the American population. Study participants receive a thorough in-home clinical and neuropsychological assessment leading to a diagnosis of normal, cognitively impaired but not demented, or demented. Because the participants are thus divided into three known classes, we analyze a set of overall cognitive functioning responses through a factor mixture analysis model. The model extends recent proposals developed for binary and continuous data to general mixed data and to the situation of observed heterogeneity, which is typical of the HRS.
References
Baek, J., McLachlan, G.J., Flack, L.: Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1298–1309 (2010)
Bartholomew, D.J.: The sensitivity of latent trait analysis to choice of prior distribution. Br. J. Math. Stat. Psychol. 41, 101–107 (1988)
Bartholomew, D.J., Knott, M., Moustaki, I.: Latent Variable Models and Factor Analysis: A Unified Approach. Hodder Arnold, London (2011)
Bock, R.D., Aitkin, M.: Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika 46, 443–459 (1981)
Cagnone, S., Viroli, C.: A factor mixture analysis model for multivariate binary data. Stat. Model. 12, 257–277 (2012)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. Ser. B 39, 1–38 (1977)
Dennis, J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs (1983)
Heeringa, S.G., Fisher, G.G., Hurd, M.D., Langa, K.M., Ofstedal, M.B., Plassman, B.L., Rodgers, W., Weir, D.R.: Aging, demographics and memory study (ADAMS). Sample design, weights, and analysis for ADAMS (2007). http://hrsonline.isr.umich.edu/meta/adams/desc/AdamsSampleWeights.pdf
Jöreskog, K.G.: A general approach to confirmatory maximum likelihood factor analysis. Psychometrika 34, 183–202 (1969)
Juster, F.T., Suzman, R.: An overview of the health and retirement study. J. Hum. Resour. 30, 135–145 (1995)
Knott, M., Tzamourani, P.: Bootstrapping the estimated latent distribution of the two-parameter latent trait model. Br. J. Math. Stat. Psychol. 60, 175–191 (2007)
Ma, Y., Genton, M.G.: Explicit estimating equations for semiparametric generalized linear latent variable models. J. R. Stat. Soc. Ser. B 72, 475–495 (2010)
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, New York (2000)
McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, Hoboken (2008)
Meredith, W.: Measurement invariance, factor analysis, and factorial invariance. Psychometrika 58, 525–543 (1993)
Montanari, A., Viroli, C.: Heteroscedastic factor mixture analysis. Stat. Model. 10, 441–460 (2010a)
Montanari, A., Viroli, C.: A skew-normal factor model for the analysis of student satisfaction towards university courses. J. Appl. Stat. 37, 473–487 (2010b)
Moustaki, I., Knott, M.: Generalized latent trait models. Psychometrika 65, 391–411 (2000)
Moustaki, I.: A general class of latent variable models for ordinal manifest variables with covariate effects on manifest and latent variables. Br. J. Math. Stat. Psychol. 56, 337–357 (2003)
Muthén, B., Asparouhov, T.: Item response mixture modeling: application to tobacco dependence criteria. Addict. Behav. 31, 1050–1066 (2006)
Muthén, B., Shedden, K.: Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics 55, 463–469 (1999)
Muthén, B., Lubke, G.H.: Investigating population heterogeneity with factor mixture models. Psychol. Methods 10, 21–39 (2005)
Rizopoulos, D., Moustaki, I.: Generalized latent variable models with non-linear effect. Br. J. Math. Stat. Psychol. 61, 415–438 (2008)
Skrondal, A., Rabe-Hesketh, S.: Generalized Latent Variable Modelling: Multilevel, Longitudinal, and Structural Equation Models. Chapman & Hall, London (2004)
Sörbom, D.: A general method for studying differences in factor means and factor structures between groups. Br. J. Math. Stat. Psychol. 27, 229–239 (1974)
Stroud, A.H., Secrest, D.: Gaussian Quadrature Formulas. Prentice Hall, Englewood Cliffs (1966)
Vermunt, J.K., Magidson, J.: Latent class models for classification. Comput. Stat. Data Anal. 41, 531–537 (2003)
Yung, Y.F.: Finite mixtures in confirmatory factor-analysis models. Psychometrika 62, 297–330 (1997)
Acknowledgments
We are grateful to the Center for the Study of Aging of US (http://www.rand.org) for the data use agreement. The HRS Diabetes Study is sponsored by the National Institute on Aging (Grant Number NIA U01AG009740) and was conducted by the University of Michigan.
Appendix A: The EM-algorithm
Consider first the conventional unsupervised framework. The two steps of the algorithm are as follows.
A.1 E-step
In order to compute the conditional expected value of the complete log-likelihood given the observed data, we need to determine the conditional distribution of the latent variables given the observed data on the basis of provisional estimates of the parameters, \(\tilde{\varvec{\tau }}\):
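Read together with the two terms discussed in the paragraphs that follow, this joint conditional distribution can be written as the product below. The display is a reconstruction from the surrounding text (it is assumed to be the expression referred to as (15)), not a verbatim copy of the original:

\[
f(\mathbf{z},\mathbf{s}|\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }})
= f(\mathbf{z}|\mathbf{y},\mathbf{x},\mathbf{s};\tilde{\varvec{\tau }})\,
  f(\mathbf{s}|\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }}). \tag{15}
\]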
Using Bayes’ rule, the first term of the previous expression is given by
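Written with the three densities named in the next paragraph, Bayes' rule takes the form below. This is a reconstruction under two assumptions suggested by that paragraph: within a component the factors \(\mathbf{z}\) do not depend on \(\mathbf{x}\), and \(\mathbf{y}\) is independent of \(\mathbf{s}\) given \(\mathbf{z}\) and \(\mathbf{x}\). The equation number is likewise assumed.

\[
f(\mathbf{z}|\mathbf{y},\mathbf{x},s_i=1;\tilde{\varvec{\tau }})
= \frac{f(\mathbf{z}|s_i=1;\tilde{\varvec{\tau }})\,
        f\left(\mathbf{y}|\mathbf{z},\mathbf{x};\tilde{\varvec{\tau }}\right)}
       {f(\mathbf{y}|\mathbf{x},s_i=1;\tilde{\varvec{\tau }})}, \tag{16}
\]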
where \(f(\mathbf{z}|s_i=1;\tilde{\varvec{\tau }})\) is the multivariate Gaussian density with mean vector \({\varvec{\xi }}_i^{\prime }\) and covariance matrix \({\varvec{\Sigma }}_i^{\prime }\), and \(f\left(\mathbf{y}|\mathbf{z},\mathbf{x}; \tilde{\varvec{\tau }}\right)\) is given in expression (2) and evaluated at \({\varvec{\alpha }}^{\prime }.\) However, \(f(\mathbf{y}|\mathbf{x},s_i=1;\tilde{\varvec{\tau }})\) cannot be expressed in closed form and must be approximated numerically. Among the possible approximation methods, Gauss–Hermite quadrature is used here:
where \(\omega _{t_1},\ldots ,\omega _{t_q}\) and \(\mathbf{z}_t=(z_{t_1},\ldots ,z_{t_q})^\top \) represent the weights and the points of the quadrature, respectively.
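As a minimal sketch of this kind of approximation, the Python snippet below evaluates \(E[g(\mathbf{z})]\) for \(\mathbf{z}\sim N({\varvec{\xi }}_i,{\varvec{\Sigma }}_i)\) on a product Gauss–Hermite grid, using the change of variable \(\mathbf{z}_t^*=\sqrt{2}{\varvec{\Sigma }}_i^{1/2}\mathbf{z}_t+{\varvec{\xi }}_i\). The function name `gh_expectation`, the use of a Cholesky factor as the matrix square root, and the item parameters in the example are illustrative assumptions, not part of the original text.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite import hermgauss


def gh_expectation(g, xi, Sigma, n_points=15):
    """Approximate E[g(z)], z ~ N(xi, Sigma), by product Gauss-Hermite quadrature."""
    xi = np.atleast_1d(np.asarray(xi, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    q = xi.size
    nodes, weights = hermgauss(n_points)   # rule for the weight function exp(-t^2)
    root = np.linalg.cholesky(Sigma)       # assumed choice of square root of Sigma
    total = 0.0
    # Loop over all q-dimensional combinations of one-dimensional nodes.
    for idx in itertools.product(range(n_points), repeat=q):
        z_t = nodes[list(idx)]
        omega = np.prod(weights[list(idx)])
        z_star = np.sqrt(2.0) * root @ z_t + xi   # shifted and scaled node
        total += omega * g(z_star)
    return total / np.pi ** (q / 2)


# Hypothetical one-factor example: three binary items with a logistic link.
lam0 = np.array([-0.5, 0.0, 0.5])   # intercepts (made-up values)
lam = np.array([1.0, 1.5, 0.8])     # loadings (made-up values)
y = np.array([1, 0, 1])             # one observed response pattern


def bernoulli_lik(z):
    """Conditional likelihood f(y | z) of the response pattern y."""
    p = 1.0 / (1.0 + np.exp(-(lam0 + lam * z[0])))
    return np.prod(p ** y * (1.0 - p) ** (1 - y))


# Approximation of f(y | s_i = 1) for a component with mean 0.3 and variance 0.8.
f_y_given_component = gh_expectation(bernoulli_lik, xi=[0.3], Sigma=[[0.8]])
```

The grid has `n_points ** q` nodes, so this direct product rule is only practical for a small number of factors.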
The second density of expression (15) is the posterior distribution of the allocation variable \(\mathbf{s}\) given the observed data, which can be computed as a posterior probability:
A.2 M-step
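In the usual mixture form, this posterior probability is \(w_i f(\mathbf{y}|\mathbf{x},s_i=1)/\sum _{h=1}^k w_h f(\mathbf{y}|\mathbf{x},s_h=1)\); that this matches the original display exactly is an assumption. The sketch below computes it on the log scale for numerical stability. The function name and the array layout are hypothetical.

```python
import numpy as np


def posterior_membership(log_f, w):
    """Posterior component probabilities.

    log_f : (n, k) array with log f(y_n | x_n, s_i = 1), e.g. from quadrature.
    w     : (k,) array of mixture weights.
    """
    log_f = np.asarray(log_f, dtype=float)
    log_num = log_f + np.log(np.asarray(w, dtype=float))
    log_num -= log_num.max(axis=1, keepdims=True)   # stabilise before exponentiating
    num = np.exp(log_num)
    return num / num.sum(axis=1, keepdims=True)


# One observation, two components with equal weights.
post = posterior_membership(np.log([[0.2, 0.1]]), [0.5, 0.5])
```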
The optimization step (a) of the algorithm consists in evaluating and maximizing:
with respect to \({\varvec{\alpha }}_{j}^{\top }=(\lambda _{j0}, {\varvec{\lambda }}_{j}^{\top },{\varvec{\beta }}_{j}^{\top })\) with \(j=1,\ldots ,p,\) where \(f(\mathbf{z}|\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }}) =\sum _{i=1}^k f(s_i=1|\mathbf{y};\tilde{\varvec{\tau }})f \left(\mathbf{z}|\mathbf{y},\mathbf{x}, s_i=1;\tilde{\varvec{\tau }}\right).\) Let \(S_0({\varvec{\alpha }}_j)\) be the derivative with respect to \({\varvec{\alpha }}_j\) of the log-density in (19):
In the case of binary and count data we obtain
where
With binary data \(N_{j}=1.\)
When the data are ordinal \(S_0({\varvec{\alpha }}_j)\) has the following expression
In this case the derivatives \(\theta ^{\prime }_{j(l)}\) and \(b_{j}^{\prime }(\theta _{j(l)})\) have to be computed with respect to \({\varvec{\alpha }}_{j}^{\top }=(\lambda _{j0(1)},\ldots , \lambda _{j0(d_{j})},{\varvec{\lambda }}^{\top }_{j}, {\varvec{\beta }}^{\top }_{j})\). Their expressions are given in Moustaki (2003).
Thus the expected score function with respect to the parameter vector \({\varvec{\alpha }}_j\)
can be evaluated by approximating the integrals with Gauss–Hermite quadrature points:
where \(\mathbf{z}_t^*=\sqrt{2}{\varvec{\Sigma }}_i^{1/2}\mathbf{z}_t+{\varvec{\xi }}_i.\) Setting the approximate gradient to zero does not yield a closed-form solution for the non-null elements of the parameter vector \({\varvec{\alpha }}_j.\) The estimation problem can be solved by nonlinear optimization methods such as Newton-type algorithms (Dennis and Schnabel 1983).
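To illustrate a Newton-type solution, the sketch below treats only the simplest case: one binary item with a logistic link, a single factor and no covariates. It assumes the posterior weights have already been aggregated over observations and components at each quadrature node, giving an expected number of positive responses `r` and an expected number of observations `N` per node. These names, and the aggregation itself, are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def newton_item_update(alpha, nodes, r, N, n_iter=25, tol=1e-10):
    """Solve sum_t (r_t - N_t * pi_t(alpha)) * (1, z_t) = 0 by Newton's method.

    alpha : starting values (intercept, loading).
    nodes : (T,) quadrature nodes z_t (already shifted and scaled).
    r, N  : (T,) expected positive responses and expected counts at each node.
    """
    D = np.column_stack([np.ones_like(nodes), nodes])   # rows (1, z_t)
    alpha = np.asarray(alpha, dtype=float).copy()
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-D @ alpha))
        score = D.T @ (r - N * pi)                       # approximate expected score
        hess = -(D * (N * pi * (1.0 - pi))[:, None]).T @ D
        step = np.linalg.solve(hess, score)
        alpha = alpha - step                             # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return alpha


# Synthetic check: build r from known parameters and recover them.
nodes = np.linspace(-2.0, 2.0, 7)
N = np.full(7, 10.0)
alpha_true = np.array([0.3, 1.2])
r = N / (1.0 + np.exp(-(alpha_true[0] + alpha_true[1] * nodes)))
alpha_hat = newton_item_update(np.zeros(2), nodes, r, N)
```

A production implementation would normally add step-size control; plain Newton steps are used here only to keep the example short.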
The optimization step (b) of the algorithm consists in optimizing:
with respect to \({\varvec{\xi }}_i\) and \({\varvec{\Sigma }}_i.\) By substituting \(f(\mathbf{z},\mathbf{s}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }})=f(s_i=1|\mathbf{y}, \mathbf{x};\tilde{\varvec{\tau }})f(\mathbf{z}| \mathbf{y},\mathbf{x},s_i=1;\tilde{\varvec{\tau }})\) in the previous expression, the estimates of the new Gaussian mixture parameters in terms of previous parameters \(\tilde{\varvec{\tau }}\) are
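In the standard weighted-moment form for Gaussian mixtures, these updates read as follows. The display is a reconstruction, with the observation index \(h=1,\ldots ,n\) introduced here for illustration and the equation numbers assumed from the later references to (26) and (27):

\[
\hat{\varvec{\xi }}_i=\frac{\sum _{h=1}^n f(s_i=1|\mathbf{y}_h,\mathbf{x}_h;\tilde{\varvec{\tau }})\,E[\mathbf{z}|s_i=1,\mathbf{y}_h,\mathbf{x}_h;\tilde{\varvec{\tau }}]}{\sum _{h=1}^n f(s_i=1|\mathbf{y}_h,\mathbf{x}_h;\tilde{\varvec{\tau }})}, \tag{26}
\]

\[
\hat{\varvec{\Sigma }}_i=\frac{\sum _{h=1}^n f(s_i=1|\mathbf{y}_h,\mathbf{x}_h;\tilde{\varvec{\tau }})\,E[\mathbf{zz}^\top |s_i=1,\mathbf{y}_h,\mathbf{x}_h;\tilde{\varvec{\tau }}]}{\sum _{h=1}^n f(s_i=1|\mathbf{y}_h,\mathbf{x}_h;\tilde{\varvec{\tau }})}-\hat{\varvec{\xi }}_i\hat{\varvec{\xi }}_i^\top , \tag{27}
\]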
where the first and second conditional moments, \(E[\mathbf{z}|s_i=1,\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }}]\) and \(E[\mathbf{{zz}}^\top |s_i=1,\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }}],\) can be computed through the Gauss–Hermite quadrature points, similar to (17) and (23). In order to take into account the identifiability conditions given in (13) and (14), the following scaling (28) and centering (29) transformations are performed at each iteration of the EM algorithm:
where \(\mathbf{A}\) is the Cholesky factor of \(\text{Var}(\mathbf{z}).\)
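The component updates and the subsequent rescaling can be sketched in Python as follows. The sketch assumes that the identifiability conditions amount to \(E(\mathbf{z})=\mathbf{0}\) and \(\text{Var}(\mathbf{z})=\mathbf{I}\), that the updates take the usual weighted-moment form, and that \(\mathbf{A}\) is the lower-triangular factor with \(\text{Var}(\mathbf{z})=\mathbf{A}\mathbf{A}^\top\). The function names and array layouts are hypothetical.

```python
import numpy as np


def update_component_moments(post, Ez, Ezz):
    """Weighted-moment updates of the component means and covariances.

    post : (n, k) posterior membership probabilities.
    Ez   : (n, k, q) conditional first moments E[z | s_i = 1, y_n, x_n].
    Ezz  : (n, k, q, q) conditional second moments E[z z^T | s_i = 1, y_n, x_n].
    """
    post = np.asarray(post, dtype=float)
    denom = post.sum(axis=0)
    xi = np.einsum('nk,nkq->kq', post, Ez) / denom[:, None]
    second = np.einsum('nk,nkqr->kqr', post, Ezz) / denom[:, None, None]
    Sigma = second - np.einsum('kq,kr->kqr', xi, xi)
    return xi, Sigma


def standardize_mixture(w, xi, Sigma):
    """Centre and scale the mixture so that E(z) = 0 and Var(z) = I."""
    w = np.asarray(w, dtype=float)
    xi = np.asarray(xi, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    mean = w @ xi                                         # overall mean of z
    second = np.einsum('k,kqr->qr', w, Sigma + np.einsum('kq,kr->kqr', xi, xi))
    var = second - np.outer(mean, mean)                   # overall Var(z)
    A = np.linalg.cholesky(var)                           # var = A @ A.T (assumed convention)
    A_inv = np.linalg.inv(A)
    xi_new = (xi - mean) @ A_inv.T                        # centred and scaled means
    Sigma_new = A_inv @ Sigma @ A_inv.T                   # scaled covariances
    return xi_new, Sigma_new


# Two components in two dimensions (made-up values).
w = np.array([0.3, 0.7])
xi = np.array([[1.0, 0.0], [-1.0, 2.0]])
Sigma = np.array([np.eye(2), [[2.0, 0.5], [0.5, 1.0]]])
xi_std, Sigma_std = standardize_mixture(w, xi, Sigma)
```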
Finally, the estimates for the weights of the mixture in step (c) can be computed by evaluating the score function of \(E_{\mathbf{z},\mathbf{s}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }}}[\log f(\mathbf{s};\varvec{\tau })]\), from which \(w_i=f \left( s_i=1|\mathbf{y};\tilde{\varvec{\tau }} \right)\) for the generic observation; over a sample, these posterior probabilities are averaged across observations.
The variant of the algorithm for the fully supervised mixture model can be easily derived by observing that only maximization step (b) changes, because, as previously observed, step (c) is not necessary (the weights being the observed group proportions) and for step (a) we have \(E_{\mathbf{z},\mathbf{s}|\mathbf{y},\mathbf{x}, \tilde{\varvec{\tau }}}\left[\log f(\mathbf{y}|\mathbf{z},\mathbf{x};\varvec{\tau }) \right]=E_{\mathbf{z}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }}}\left[ \log f(\mathbf{y}|\mathbf{z},\mathbf{x};{\varvec{\tau }})\right].\) With regard to step (b), we need to maximize
with respect to \({\varvec{\xi }}_i\) and \({\varvec{\Sigma }}_i.\) Since \(f(\mathbf{s})\) in the denominator of the previous expression is known and does not depend on the parameters, we obtain expressions (26) and (27) in the M-step of the algorithm, where the posterior membership probabilities, \(f(\mathbf{s}|\mathbf{y},\mathbf{x}),\) are known and fixed at each iteration. For the generic observation, the posterior probability is a vector of length \(k\) whose \(i\)th element (with \(1\le i \le k\)) is 1 if the observation comes from the \(i\)th subpopulation and 0 otherwise.
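In the supervised case the membership matrix is therefore just an indicator coding of the observed groups, and the weights are the group proportions. A minimal sketch follows, with a hypothetical zero-based coding of the three diagnostic classes.

```python
import numpy as np


def supervised_posteriors(labels, k):
    """Indicator membership matrix and observed group proportions.

    labels : (n,) integer group codes in 0, ..., k-1 (hypothetical coding,
             e.g. 0 = normal, 1 = impaired but not demented, 2 = demented).
    """
    labels = np.asarray(labels)
    post = np.zeros((labels.size, k))
    post[np.arange(labels.size), labels] = 1.0   # 1 for the observed group, 0 elsewhere
    weights = post.mean(axis=0)                  # observed group proportions
    return post, weights


post, weights = supervised_posteriors([0, 2, 2, 1], k=3)
```

This matrix is computed once and kept fixed across iterations, since it does not depend on the parameters.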
Cite this article
Cagnone, S., Viroli, C. A factor mixture model for analyzing heterogeneity and cognitive structure of dementia. AStA Adv Stat Anal 98, 1–20 (2014). https://doi.org/10.1007/s10182-012-0206-5
DOI: https://doi.org/10.1007/s10182-012-0206-5