A factor mixture model for analyzing heterogeneity and cognitive structure of dementia

  • Original Paper
  • Published in AStA Advances in Statistical Analysis

Abstract

The Health and Retirement Study (HRS) is funded by the US National Institute on Aging with the aim of investigating the health, social and economic implications of the aging of the American population. The participants of the study receive a thorough in-home clinical and neuropsychological assessment leading to a diagnosis of normal cognition, cognitive impairment but not dementia, or dementia. Because of the heterogeneity of the participants across these three classes, we analyze some overall cognitive functioning responses through a factor mixture analysis model. The model extends recent proposals developed for binary and continuous data to general mixed data and to the situation of observed heterogeneity that is typical of the HRS study.


References

  • Baek, J., McLachlan, G.J., Flack, L.: Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1298–1309 (2010)

  • Bartholomew, D.J.: The sensitivity of latent trait analysis to choice of prior distribution. Br. J. Math. Stat. Psychol. 41, 101–107 (1988)

  • Bartholomew, D.J., Knott, M., Moustaki, I.: Latent Variable Models and Factor Analysis: A Unified Approach. Hodder Arnold, London (2011)

  • Bock, R.D., Aitkin, M.: Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika 46, 443–459 (1981)

  • Cagnone, S., Viroli, C.: A factor mixture analysis model for multivariate binary data. Stat. Model. 12, 257–277 (2012)

  • Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. Ser. B 39, 1–38 (1977)

  • Dennis, J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs (1983)

  • Heeringa, S.G., Fisher, G.G., Hurd, M.D., Langa, K.M., Ofstedal, M.B., Plassman, B.L., Rodgers, W., Weir, D.R.: Aging, demographics and memory study (ADAMS). Sample design, weights, and analysis for ADAMS (2007). http://hrsonline.isr.umich.edu/meta/adams/desc/AdamsSampleWeights.pdf

  • Jöreskog, K.G.: A general approach to confirmatory maximum likelihood factor analysis. Psychometrika 34, 183–202 (1969)

  • Juster, F.T., Suzman, R.: An overview of the health and retirement study. J. Hum. Resour. 30, 135–145 (1995)

  • Knott, M., Tzamourani, P.: Bootstrapping the estimated latent distribution of the two-parameter latent trait model. Br. J. Math. Stat. Psychol. 60, 175–191 (2007)

  • Ma, Y., Genton, M.G.: Explicit estimating equations for semiparametric generalized linear latent variable models. J. R. Stat. Soc. Ser. B 72, 475–495 (2010)

  • McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, New York (2000)

  • McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, Hoboken (2008)

  • Meredith, W.: Measurement invariance, factor analysis, and factorial invariance. Psychometrika 58, 525–543 (1993)

  • Montanari, A., Viroli, C.: Heteroscedastic factor mixture analysis. Stat. Model. 10, 441–460 (2010a)

  • Montanari, A., Viroli, C.: A skew-normal factor model for the analysis of student satisfaction towards university courses. J. Appl. Stat. 37, 473–487 (2010b)

  • Moustaki, I., Knott, M.: Generalized latent trait models. Psychometrika 65, 391–411 (2000)

  • Moustaki, I.: A general class of latent variable models for ordinal manifest variables with covariate effects on manifest and latent variables. Br. J. Math. Stat. Psychol. 56, 337–357 (2003)

  • Muthén, B., Asparouhov, T.: Item response mixture modeling: application to tobacco dependence criteria. Addict. Behav. 31, 1050–1066 (2006)

  • Muthén, B., Shedden, K.: Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics 55, 463–469 (1999)

  • Muthén, B., Lubke, G.H.: Investigating population heterogeneity with factor mixture models. Psychol. Methods 10, 21–39 (2005)

  • Rizopoulos, D., Moustaki, I.: Generalized latent variable models with non-linear effects. Br. J. Math. Stat. Psychol. 61, 415–438 (2008)

  • Skrondal, A., Rabe-Hesketh, S.: Generalized Latent Variable Modelling: Multilevel, Longitudinal, and Structural Equation Models. Chapman & Hall, London (2004)

  • Sörbom, D.: A general method for studying differences in factor means and factor structures between groups. Br. J. Math. Stat. Psychol. 27, 229–239 (1974)

  • Stroud, A.H., Secrest, D.: Gaussian Quadrature Formulas. Prentice Hall, Englewood Cliffs (1966)

  • Vermunt, J.K., Magidson, J.: Latent class models for classification. Comput. Stat. Data Anal. 41, 531–537 (2003)

  • Yung, Y.F.: Finite mixtures in confirmatory factor-analysis models. Psychometrika 62, 297–330 (1997)


Acknowledgments

We are grateful to the US Center for the Study of Aging (http://www.rand.org) for the data use agreement. The HRS Diabetes Study is sponsored by the National Institute on Aging (grant number NIA U01AG009740) and was conducted by the University of Michigan.

Author information

Correspondence to Cinzia Viroli.

Appendix A: The EM-algorithm

Consider first the conventional unsupervised framework. The two steps of the algorithm are as follows.

1.1 A.1 E-step

In order to compute the conditional expected value of the complete log-likelihood given the observed data, we need to determine the conditional distribution of the latent variables given the observed data on the basis of provisional estimates of the parameters, \(\tilde{\varvec{\tau }}\):

$$\begin{aligned} f\left(\mathbf{z},\mathbf{s}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }}\right)=f\left(\mathbf{z}| \mathbf{y},\mathbf{x},\mathbf{s}; \tilde{\varvec{\tau }}\right) f\left(\mathbf{s}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }}\right)\!. \end{aligned}$$
(15)

Using Bayes’ rule, the first term of the previous expression is given by

$$\begin{aligned} f\left(\mathbf{z}|\mathbf{y},\mathbf{x}, s_i=1;\tilde{\varvec{\tau }}\right)= \frac{f\left(\mathbf{z}|s_i=1;\tilde{\varvec{\tau }}\right) f(\mathbf{y}|\mathbf{z}, \mathbf{x};\tilde{\varvec{\tau }})}{f\left(\mathbf{y}|\mathbf{x},s_i=1;\tilde{\varvec{\tau }}\right)}, \end{aligned}$$
(16)

where \(f(\mathbf{z}|s_i=1;\tilde{\varvec{\tau }})\) is the multivariate Gaussian density with mean vector \({\varvec{\xi }}_i^{\prime }\) and covariance matrix \({\varvec{\Sigma }}_i^{\prime }\), and \(f\left(\mathbf{y}|\mathbf{z},\mathbf{x}; \tilde{\varvec{\tau }}\right)\) is given in expression (2) and evaluated at \({\varvec{\alpha }}^{\prime }.\) However, \(f(\mathbf{y}|\mathbf{x},s_i=1;\tilde{\varvec{\tau }})\) cannot be expressed in closed form and must be approximated in some way. Among the possible approximation methods, Gauss–Hermite quadrature points are used here:

$$\begin{aligned} f(\mathbf{y}|\mathbf{x},s_i=1;\tilde{\varvec{\tau }})&= \int f(\mathbf{z}|s_i=1;\tilde{\varvec{\tau }}) f(\mathbf{y}|\mathbf{z},\mathbf{x};\tilde{\varvec{\tau }}) ~\text{ d}\mathbf{z}\nonumber \\&\cong \sum _{t_1=1}^{T_1}\cdots \sum _{t_q=1}^{T_q}\omega _{t_1}\cdots \omega _{t_q}f\left(\mathbf{y}|\sqrt{2}{\varvec{\Sigma }}_i^{1/2} \mathbf{z}_t+{\varvec{\xi }}_i,\mathbf{x}; \tilde{\varvec{\tau }}\right) \end{aligned}$$
(17)

where \(\omega _{t_1},\ldots ,\omega _{t_q}\) and \(\mathbf{z}_t=(z_{t_1},\ldots ,z_{t_q})^\top \) represent the weights and the points of the quadrature, respectively.
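
As a concrete illustration, the quadrature approximation (17) can be coded in a few lines. The sketch below is illustrative only: it assumes a user-supplied function cond_density(y, z, x) evaluating \(f(\mathbf{y}|\mathbf{z},\mathbf{x};\tilde{\varvec{\tau }})\) from expression (2), uses a Cholesky factor for \({\varvec{\Sigma }}_i^{1/2}\), and writes out explicitly the constant \(\pi ^{-q/2}\) arising from the change of variables, which (17) leaves implicit in the weights.

```python
import itertools
import numpy as np

def marginal_density_gh(y, x, xi, Sigma, cond_density, n_points=10):
    """Approximate f(y | x, s_i = 1) of (17) by tensor-product
    Gauss-Hermite quadrature over the q latent factors.

    cond_density(y, z, x) is a user-supplied evaluation of f(y | z, x),
    i.e. the product of the exponential-family item densities in (2).
    """
    q = len(xi)
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)   # 1-d rule
    L = np.linalg.cholesky(Sigma)                                 # a square root of Sigma

    total = 0.0
    # tensor product over the q latent dimensions
    for idx in itertools.product(range(n_points), repeat=q):
        z_t = nodes[list(idx)]                                    # quadrature point
        w_t = np.prod(weights[list(idx)])                         # product of weights
        z_star = np.sqrt(2.0) * L @ z_t + xi                      # change of variable
        total += w_t * cond_density(y, z_star, x)

    # pi^{-q/2} comes from rewriting the Gaussian integral in Gauss-Hermite form
    return total * np.pi ** (-q / 2)
```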

The second density in expression (15) is the posterior distribution of the allocation variable \(\mathbf{s}\) given the observed data, which can be computed as a posterior probability:

$$\begin{aligned} f(s_i=1|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }})=\frac{f(s_i=1;\tilde{\varvec{\tau }}) f(\mathbf{y}|\mathbf{x},s_i=1;\tilde{\varvec{\tau }})}{\sum _{l=1}^k f(s_l=1;\tilde{\varvec{\tau }}) f(\mathbf{y}|\mathbf{x},s_l=1;\tilde{\varvec{\tau }})}. \end{aligned}$$
(18)
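
The posterior probabilities (18) then follow by weighting the approximated marginals by the class priors and renormalizing; a minimal sketch, reusing the hypothetical marginal_density_gh above:

```python
def posterior_class_probs(y, x, priors, xis, Sigmas, cond_density):
    """Compute f(s_i = 1 | y, x) of (18) for i = 1, ..., k."""
    marg = np.array([
        marginal_density_gh(y, x, xis[i], Sigmas[i], cond_density)
        for i in range(len(priors))
    ])
    unnorm = np.asarray(priors) * marg     # prior times approximated marginal
    return unnorm / unnorm.sum()           # renormalize over the k components
```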

1.2 A.2 M-step

The optimization step (a) of the algorithm consists of evaluating and maximizing:

$$\begin{aligned} E_{\mathbf{z},\mathbf{s}|\mathbf{y},\mathbf{x}, \tilde{\varvec{\tau }}}\left[ \log f(\mathbf{y}|\mathbf{z},\mathbf{x};{\varvec{\tau }}) \right]&= E_{\mathbf{z}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }}}\left[ \log f(\mathbf{y}|\mathbf{z},\mathbf{x};{\varvec{\tau }}) \right] \nonumber \\&= \int \log f(\mathbf{y}|\mathbf{z},\mathbf{x};{\varvec{\tau }}) f(\mathbf{z}|\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }})~ \text{ d}\mathbf{z} \end{aligned}$$
(19)

with respect to \({\varvec{\alpha }}_{j}^{\top }=(\lambda _{j0}, {\varvec{\lambda }}_{j}^{\top },{\varvec{\beta }}_{j}^{\top })\) with \(j=1,\ldots ,p,\) where \(f(\mathbf{z}|\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }}) =\sum _{i=1}^k f(s_i=1|\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }})f \left(\mathbf{z}|\mathbf{y},\mathbf{x}, s_i=1;\tilde{\varvec{\tau }}\right).\) Let \(S_0({\varvec{\alpha }}_j)\) be the derivative with respect to \({\varvec{\alpha }}_j\) of the log-density in (19):

$$\begin{aligned} S_0({\varvec{\alpha }}_j)= \frac{\partial \log f({\mathbf{y}}|\mathbf{z},\mathbf{x};{\varvec{\tau }})}{\partial {{\varvec{\alpha }}_j}} \quad j=1,\ldots ,p \end{aligned}$$
(20)

In the case of binary and count data we obtain

$$\begin{aligned} S_0({\varvec{\alpha }}_j)=[y_{j}\theta ^{\prime }_{j}-b_{j}^{\prime }(\theta _{j})] \end{aligned}$$
(21)

where

$$\begin{aligned} \theta ^{\prime }_{j}=\left\{ \begin{array}{ll} 1,&\quad \text{ if} {\varvec{\alpha }}_{j}=\lambda _{j0}; \\ \mathbf{z},&\quad \text{ if} {\varvec{\alpha }}_{j}= {\varvec{\lambda }}_{j}; \\ \mathbf{x},&\quad \text{ if} {\varvec{\alpha }}_{j}= {\varvec{\beta }}_{j}. \end{array} \right.\quad \text{ and}\quad b_{j}^{\prime }(\theta _{j})=\left\{ \begin{array}{ll} N_{j}\pi _{j},&\quad \text{ if} {\varvec{\alpha }}_{j}= \lambda _{j0}; \\ N_{j}\pi _{j}\mathbf{z},&\quad \text{ if} {\varvec{\alpha }}_{j}= {\varvec{\lambda }}_{j}; \\ N_{j}\pi _{j}\mathbf{x},&\quad \text{ if} {\varvec{\alpha }}_{j}={\varvec{\beta }}_{j}. \end{array} \right. \end{aligned}$$

With binary data \(N_{j}=1.\)
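
For instance, for a binary item with logit link, (21) reduces to the Bernoulli score with \(N_j=1\); a small sketch, with the hypothetical argument layout \((\lambda _{j0},{\varvec{\lambda }}_j,{\varvec{\beta }}_j)\):

```python
import numpy as np
from scipy.special import expit

def score_binary_item(y_j, z, x, lambda_j0, lambda_j, beta_j):
    """Score S_0(alpha_j) of (21) for one binary item (N_j = 1, logit link).

    Returns the components with respect to the intercept lambda_j0,
    the loadings lambda_j and the covariate effects beta_j.
    """
    theta_j = lambda_j0 + lambda_j @ z + beta_j @ x   # canonical parameter
    pi_j = expit(theta_j)                             # b_j'(theta_j) for a Bernoulli item
    resid = y_j - pi_j                                # y_j - N_j * pi_j with N_j = 1
    return np.concatenate(([resid], resid * z, resid * x))
```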

When the data are ordinal, \(S_0({\varvec{\alpha }}_j)\) has the following expression:

$$\begin{aligned} S_0({\varvec{\alpha }}_j)=\sum _{l=1}^{d_{j}-1} \left[y^{*}_{j(l)}\theta ^{\prime }_{j(l)}-y^{*}_{j(l+1)} b_{j}^{\prime }(\theta _{j(l)})\right] \end{aligned}$$
(22)

In this case the derivatives \(\theta ^{\prime }_{j(l)}\) and \(b_{j}^{\prime }(\theta _{j(l)})\) have to be computed with respect to \({\varvec{\alpha }}_{j}^{\top }=(\lambda _{j0(1)},\ldots , \lambda _{j0(d_{j})},{\varvec{\lambda }}^{\top }_{j}, {\varvec{\beta }}^{\top }_{j})\). Their expressions are given in Moustaki (2003).

Thus the expected score equation with respect to the parameter vector \({\varvec{\alpha }}_j\),

$$\begin{aligned} \sum _{i=1}^k f(s_i=1|\mathbf{y},\mathbf{x};{\tilde{\varvec{\tau }}})\int S_0({\varvec{\alpha }}_j) f\left(\mathbf{z}|\mathbf{y},\mathbf{x},s_i=1; \tilde{\varvec{\tau }}\right)~\text{ d}\mathbf{z}=0 \end{aligned}$$

can be evaluated by approximating the integrals with Gauss–Hermite quadrature points:

$$\begin{aligned}&\int S_0({\varvec{\alpha }}_j,\mathbf{y}| \mathbf{z},\mathbf{x})f\left(\mathbf{z}|\mathbf{y}, \mathbf{x},s_i=1;\tilde{\varvec{\tau }}\right)~ \text{ d}\mathbf{z}\end{aligned}$$
(23)
$$\begin{aligned}&\quad \cong \frac{1}{f(\mathbf{y}|\mathbf{x},s_i=1;\tilde{\varvec{\tau }})} \sum _{t_1=1}^{T_1}\cdots \sum _{t_q=1}^{T_q}\omega _{t_1}\cdots \omega _{t_q}S_0({\varvec{\alpha }}_j, \mathbf{y}|\mathbf{z}_t^*,\mathbf{x})f \left(\mathbf{y}|\mathbf{z}_t^*,\mathbf{x}; \tilde{\varvec{\tau }}\right) \end{aligned}$$
(24)

where \(\mathbf{z}_t^*=\sqrt{2}{\varvec{\Sigma }}_i^{1/2}\mathbf{z}_t+{\varvec{\xi }}_i.\) The approximate gradient does not yield an explicit solution for the non-null elements of the parameter vector \({\varvec{\alpha }}_j,\) so the estimation problem must be solved by nonlinear optimization methods such as Newton-type algorithms (Dennis and Schnabel 1983).
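
In practice, the approximated expected score equation can be handed to a general nonlinear root-finder; the sketch below stands in for the Newton-type algorithms cited above and assumes a hypothetical function expected_score(alpha_j) that returns the quadrature-approximated left-hand side of the equation for a given parameter vector.

```python
import numpy as np
from scipy import optimize

def solve_item_parameters(expected_score, alpha_j_start):
    """Solve the approximated expected score equation expected_score(alpha_j) = 0.

    expected_score(alpha_j) should return the posterior-weighted, quadrature-
    approximated score as a vector of the same length as alpha_j.  A quasi-Newton
    root-finder stands in for the methods of Dennis and Schnabel (1983).
    """
    sol = optimize.root(expected_score, x0=np.asarray(alpha_j_start), method="hybr")
    if not sol.success:
        raise RuntimeError(f"Score equation did not converge: {sol.message}")
    return sol.x
```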

The optimization step (b) of the algorithm consists of maximizing:

$$\begin{aligned} E_{\mathbf{z},\mathbf{s}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }}}\left[\log f(\mathbf{z}|\mathbf{s};{\varvec{\tau }}) \right]=\sum _{i=1}^k\int \log f(\mathbf{z}|\mathbf{s};{\varvec{\tau }}) f(\mathbf{z},\mathbf{s}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }})~\text{ d}\mathbf{z} \end{aligned}$$
(25)

with respect to \({\varvec{\xi }}_i\) and \({\varvec{\Sigma }}_i.\) By substituting \(f(\mathbf{z},\mathbf{s}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }})=f(s_i=1|\mathbf{y}, \mathbf{x};\tilde{\varvec{\tau }})f(\mathbf{z}| \mathbf{y},\mathbf{x},s_i=1;\tilde{\varvec{\tau }})\) in the previous expression, the estimates of the new Gaussian mixture parameters in terms of the previous parameters \(\tilde{\varvec{\tau }}\) are

$$\begin{aligned} {\varvec{\xi }}_i&= \frac{f\left(s_i=1|\mathbf{y}, \mathbf{x};\tilde{\varvec{\tau }}\right) E[\mathbf{z}|s_i=1,\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }}]}{f \left( s_i=1|\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }}\right) }, \end{aligned}$$
(26)
$$\begin{aligned} {\varvec{\Sigma }}_i&= \frac{f\left(s_i=1|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }}\right)\left(E[\mathbf{{zz}}^\top | s_i=1,\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }}] -{\varvec{\xi }}_i{\varvec{\xi }}_i^\top \right)}{f\left(s_i=1|\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }}\right)} , \end{aligned}$$
(27)

where the first and second conditional moments, \(E[\mathbf{z}|s_i=1,\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }}]\) and \(E[\mathbf{{zz}}^\top |s_i=1,\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }}],\) can be computed through the Gauss–Hermite quadrature points, similar to (17) and (23). In order to take into account the identifiability conditions given in (13) and (14), the following scaling (28) and centering (29) transformations are performed at each iteration of the EM algorithm:

$$\begin{aligned}&{\varvec{\Sigma }}_i \rightarrow (\mathbf{A}^{-1})^\top {\varvec{\Sigma }}_i\mathbf{A}^{-1},\quad \varvec{\xi }_i \rightarrow (\mathbf{A}^{-1})^\top {\varvec{\xi }}_i \end{aligned}$$
(28)
$$\begin{aligned}&{\varvec{\xi }}_i \rightarrow {\varvec{\xi }}_i-\sum _{i=1}^k w_i{\varvec{\xi }}_i, \end{aligned}$$
(29)

where \(\mathbf{A}\) is the Cholesky decomposition matrix of \(\text{ Var}(\mathbf{z}).\)
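
The scaling (28) and centering (29) are straightforward to implement. The sketch below is a possible reading, not the authors' code: it assumes that \(\text{ Var}(\mathbf{z})\) denotes the latent variance implied by the current mixture parameters, and it takes \(\mathbf{A}\) such that \(\text{ Var}(\mathbf{z})=\mathbf{A}^\top \mathbf{A}\), which matches the \((\mathbf{A}^{-1})^\top \cdot \mathbf{A}^{-1}\) form of (28).

```python
import numpy as np

def identifiability_transform(xis, Sigmas, weights):
    """Apply the scaling (28) and centering (29) at the end of an EM iteration.

    xis     : component means xi_i (length k, each of length q)
    Sigmas  : component covariances Sigma_i (each q x q)
    weights : mixture weights w_i

    Assumes Var(z) is the latent variance implied by the mixture:
    Var(z) = sum_i w_i (Sigma_i + xi_i xi_i^T) - mu mu^T, with mu = sum_i w_i xi_i.
    """
    xis = [np.asarray(xi, dtype=float) for xi in xis]
    Sigmas = [np.asarray(S, dtype=float) for S in Sigmas]
    w = np.asarray(weights, dtype=float)

    mu = sum(wi * xi for wi, xi in zip(w, xis))
    var_z = sum(wi * (S + np.outer(xi, xi)) for wi, xi, S in zip(w, xis, Sigmas))
    var_z -= np.outer(mu, mu)

    # A such that Var(z) = A^T A, as required by the (A^{-1})^T ... A^{-1} form of (28)
    A = np.linalg.cholesky(var_z).T
    A_inv = np.linalg.inv(A)

    Sigmas_new = [A_inv.T @ S @ A_inv for S in Sigmas]            # scaling (28)
    xis_new = [A_inv.T @ xi for xi in xis]
    mean_new = sum(wi * xi for wi, xi in zip(w, xis_new))
    xis_new = [xi - mean_new for xi in xis_new]                   # centering (29)
    return xis_new, Sigmas_new
```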

Finally, the estimates of the mixture weights in step (c) can be computed by evaluating the score function of \(E_{\mathbf{s}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }}}[\log f(\mathbf{s};{\varvec{\tau }})],\) from which \(w_i=f \left( s_i=1|\mathbf{y},\mathbf{x};\tilde{\varvec{\tau }} \right)\!.\)
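
Read over a whole sample, the standard mixture-EM interpretation of this update is to average the responsibilities of (18) across observations; a minimal sketch under that assumption, with an n-by-k matrix of posterior probabilities:

```python
import numpy as np

def update_weights(posteriors):
    """Step (c): new mixture weights w_i as the average of the posterior
    class probabilities f(s_i = 1 | y, x) over the n observations."""
    return np.asarray(posteriors).mean(axis=0)
```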

The variant of the algorithm for the fully supervised mixture model is easily derived by observing that only maximization step (b) is involved: as previously observed, step (c) is not necessary (the weights being the observed group proportions), and for step (a) it holds that \(E_{\mathbf{z},\mathbf{s}|\mathbf{y},\mathbf{x}, \tilde{\varvec{\tau }}}\left[\log f(\mathbf{y}|\mathbf{z},\mathbf{x};\varvec{\tau }) \right]=E_{\mathbf{z}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }}}\left[ \log f(\mathbf{y}|\mathbf{z},\mathbf{x};{\varvec{\tau }})\right].\) With regard to step (b), we need to maximize

$$\begin{aligned} E_{\mathbf{z}|\mathbf{y},\mathbf{x},\mathbf{s}; \tilde{\varvec{\tau }}}\left[ \log f(\mathbf{z}|\mathbf{s};\varvec{\tau }) \right]=\sum _{i=1}^k\int \log f(\mathbf{z}|\mathbf{s};{\varvec{\tau }}) \frac{f(\mathbf{z},\mathbf{s}|\mathbf{y},\mathbf{x}; \tilde{\varvec{\tau }})}{f(\mathbf{s})}~\text{ d}\mathbf{z} \end{aligned}$$
(30)

with respect to \({\varvec{\xi }}_i\) and \({\varvec{\Sigma }}_i.\) Since \(f(\mathbf{s})\) in the denominator of the previous expression is known and does not depend on the parameters, we obtain expressions (26) and (27) in the M-step of the algorithm, where the posterior membership probabilities, \(f(\mathbf{s}|\mathbf{y},\mathbf{x}),\) are known and fixed at each iteration. For a generic observation, the posterior probability is a vector of length \(k\) whose \(i\)th element is 1 (with \(1\le i \le k\)) if the observation comes from the \(i\)th subpopulation and 0 otherwise.
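
In code, these fixed posteriors are simply a one-hot indicator matrix built from the observed class labels; a trivial sketch, assuming labels coded \(0,\ldots ,k-1\) (the function name is hypothetical):

```python
import numpy as np

def fixed_posteriors(labels, k):
    """One-hot posterior membership matrix for the supervised variant:
    row n has a 1 in the column of the observed class of unit n, 0 elsewhere."""
    P = np.zeros((len(labels), k))
    P[np.arange(len(labels)), np.asarray(labels)] = 1.0
    return P
```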


About this article

Cite this article

Cagnone, S., Viroli, C. A factor mixture model for analyzing heterogeneity and cognitive structure of dementia. AStA Adv Stat Anal 98, 1–20 (2014). https://doi.org/10.1007/s10182-012-0206-5
