Volume 76, Issue 4, pp 584–611

Optimization-Based Model Fitting for Latent Class and Latent Profile Analyses



Statisticians typically estimate the parameters of latent class and latent profile models with the Expectation-Maximization (EM) algorithm. This paper proposes an alternative, two-stage approach to model fitting. The first stage uses modified k-means and hierarchical clustering algorithms to identify the latent classes that best satisfy the conditional independence assumption underlying the latent variable model. The second stage then fits a mixture model, treating class membership as known. The proposed approach is theoretically justifiable, checks the conditional independence assumption directly, and converges much faster than the full-likelihood approach when analyzing high-dimensional data. This paper also develops a new classification rule based on latent variable models. The proposed classification procedure reduces the dimensionality of the measured data and explicitly recognizes the heterogeneous nature of complex diseases, which makes it well suited to analyzing high-throughput genomic data. Simulation studies and a real data analysis demonstrate the advantages of the proposed method.
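The two-stage idea can be illustrated with a small Python sketch. It assumes a latent profile model whose continuous indicators are normal and conditionally independent within each class. The stage-1 objective used here (a size-weighted sum of squared off-diagonal within-class correlations, minimized by relocating single observations from a k-means start) is a hypothetical stand-in for the paper's modified k-means criterion, and the hierarchical clustering step is omitted. Stage 2 computes the closed-form maximum likelihood estimates with memberships held fixed, and `posterior` applies ordinary Bayes-rule assignment, which is not necessarily the classification rule the paper develops. All function names (`ci_violation`, `kmeans`, `stage1_partition`, `stage2_fit`, `posterior`) are illustrative.

```python
import numpy as np


def ci_violation(X, labels, K):
    """Hypothetical stage-1 criterion: size-weighted sum of squared off-diagonal
    within-class correlations. It equals 0 when the indicators are uncorrelated
    inside every class, as conditional independence implies."""
    n, total = len(X), 0.0
    for k in range(K):
        Xk = X[labels == k]
        if len(Xk) < 3:  # too few members to estimate correlations
            return np.inf
        R = np.nan_to_num(np.corrcoef(Xk, rowvar=False))  # constant columns -> 0
        off_diag = R - np.diag(np.diag(R))
        total += len(Xk) / n * np.sum(off_diag ** 2)
    return total


def kmeans(X, K, rng, n_iter=100):
    """Plain Lloyd's k-means, used here only to initialise the partition."""
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def stage1_partition(X, K, seed=0, max_sweeps=10):
    """Stage 1 sketch (assumed criterion): start from k-means, then move single
    observations between classes whenever the move lowers ci_violation."""
    labels = kmeans(X, K, np.random.default_rng(seed))
    best = ci_violation(X, labels, K)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(X)):
            current = labels[i]
            for k in range(K):
                if k == current:
                    continue
                labels[i] = k  # tentatively relocate observation i
                crit = ci_violation(X, labels, K)
                if crit < best:
                    best, current, improved = crit, k, True
                else:
                    labels[i] = current  # undo the move
        if not improved:
            break
    return labels, best


def stage2_fit(X, labels, K):
    """Stage 2: with memberships treated as known, the MLEs of a conditionally
    independent normal model are class proportions, means and variances."""
    weights = np.array([np.mean(labels == k) for k in range(K)])
    mu = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    var = np.array([X[labels == k].var(axis=0) for k in range(K)])  # ddof=0 MLE
    return weights, mu, np.maximum(var, 1e-12)  # floor avoids division by zero


def posterior(X, weights, mu, var):
    """Bayes-rule class probabilities under the fitted model, in log space."""
    log_dens = -0.5 * (np.log(2 * np.pi * var)[None, :, :]
                       + (X[:, None, :] - mu[None, :, :]) ** 2 / var[None, :, :]).sum(axis=2)
    log_post = np.log(weights)[None, :] + log_dens
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

For example, `labels, crit = stage1_partition(X, K=2)` followed by `posterior(X, *stage2_fit(X, labels, 2))` returns an n × K matrix of class probabilities. Because memberships are fixed, stage 2 needs no EM iterations; in this sketch the costly part is the stage-1 search.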


Keywords: classification; finite mixture; hierarchical clustering; high-dimensional data; k-means; microarray; two-stage approach




  1. Albert, P.S., McShane, L.M., & Shih, J.H. (2001). Latent class modeling approaches for assessing diagnostic error without a gold standard: with applications to p53 immunohistochemical assays in bladder tumors. Biometrics, 57, 610–619. PubMedCrossRefGoogle Scholar
  2. Bandeen-Roche, K., Miglioretti, D.L., Zeger, S.L., & Rathouz, P.J. (1997). Latent variable regression for multiple outcomes. Journal of the American Statistical Association, 92, 1375–1386. CrossRefGoogle Scholar
  3. Brusco, M.J., & Cradit, J.D. (2001). A variable selection heuristic for k-means clustering. Psychometrika, 66, 249–270. CrossRefGoogle Scholar
  4. Bryant, P., & Williamson, J.A. (1978). Asymptotic behavior of classification maximum likelihood estimates. Biometrika, 65, 273–281. CrossRefGoogle Scholar
  5. Celeux, G., & Govaert, G. (1992). A classification EM algorithm for clustering and two stochastic versions. Computational Statistics and Data Analysis, 14, 315–332. CrossRefGoogle Scholar
  6. Chang, C.J., Chen, W.J., Liu, S.K., Cheng, J.J., Ou Yang, W.C., Chang, H.J., Lane, H.Y., Lin, S.K., Yang, T.W., & Hwu, H.G. (2002). Morbidity risk of psychiatric disorders among the first degree relatives of schizophrenia patients in Taiwan. Schizophrenia Bulletin, 28, 379–392. PubMedGoogle Scholar
  7. Chen, W.J., Liu, S.K., Chang, C.J., Lien, Y.J., Chang, Y.H., & Hwu, H.G. (1998). Sustained attention deficit and schizotypal personality features in nonpsychotic relatives of schizophrenic patients. American Journal of Psychiatry, 155, 1214–1220. PubMedGoogle Scholar
  8. Cheng, J.J., Ho, H., Chang, C.J., Lane, S.Y., & Hwu, H.G. (1996). Positive and Negative Syndrome Scale (PANSS): establishment and reliability study of a Mandarin Chinese language version. Taiwanese Journal Psychiatry, 10, 251–258. Google Scholar
  9. Clogg, C.C. (1995). Latent class models. In Arminger, G., Clogg, C.C., & Sobel, M.E. (Eds.) Handbook of statistical modeling for the social and behavioral sciences (pp. 311–360). New York: Plenum. Google Scholar
  10. Cook, R.D., & Weisberg, S. (1982). Residuals and influence in regression. London: Chapman Hall. Google Scholar
  11. Dayton, C.M., & Macready, G.B. (1998). Concomitant-variable latent-class models. Journal of the American Statistical Association, 83, 173–178. CrossRefGoogle Scholar
  12. Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B, 39, 1–38. Google Scholar
  13. Dudoit, S., Fridlyand, J., & Speed, T.P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97, 77–87. CrossRefGoogle Scholar
  14. Friedman, J.H., & Meulman, J.J. (2004). Clustering objects on subsets of attributes. Journal of the Royal Statistical Society. Series B, 66, 815–849. CrossRefGoogle Scholar
  15. Goodman, L.A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231. CrossRefGoogle Scholar
  16. Huang, G.H. (2005). Selecting the number of classes under latent class regression: a factor analytic analogue. Psychometrika, 70, 325–345. CrossRefGoogle Scholar
  17. Huang, G.H., & Bandeen-Roche, K. (2004). Building an identifiable latent class model with covariate effects on underlying and measured variables. Psychometrika, 69, 5–32. CrossRefGoogle Scholar
  18. Hughes, T.R., Mao, M., Jones, A.R., Burchard, J., Marton, M.J., Shannon, K.W., Lefkowitz, S.M., Ziman, M., Schelter, J.M., Meyer, M.R., Kobayashi, S., Davis, C., Dai, H., He, Y.D., Stephaniants, S.B., Cavet, G., Walker, W.L., West, A., Coffey, E., Shoemaker, D.D., Stoughton, R., Blanchard, A.P., Friend, S.H., & Linsley, P.S. (2001). Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nature Biotechnology, 19, 342–347. PubMedCrossRefGoogle Scholar
  19. Landwehr, J.M., Pregibon, D., & Shoemaker, C. (1984). Graphical methods for assessing logistic regression models. Journal of the American Statistical Association, 79, 61–71. CrossRefGoogle Scholar
  20. Lazarsfeld, P.F., & Henry, N.W. (1968). Latent structure analysis. New York: Houghton-Mifflin. Google Scholar
  21. Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88, 365–411. CrossRefGoogle Scholar
  22. Liu, S.K., Hwu, H.G., & Chen, W.J. (1997). Clinical symptom dimensions and deficits on the continuous performance test in schizophrenia. Schizophrenia Research, 25, 211–219. PubMedCrossRefGoogle Scholar
  23. Lubke, G.H., Carey, G., Lessem, J., & Hewitt, J. (2008). Using observed genetic variables to predict latent class membership: a comparison of two methods. Behavior Genetics, 38, 612–653. CrossRefGoogle Scholar
  24. Lux, V., & Kendler, K.S. (2010). Deconstructing major depression: a validation study of the DSM-IV symptomatic criteria. Psychological Medicine, 40, 1679–1690. PubMedCrossRefGoogle Scholar
  25. Marriott, F.H.C. (1975). Separating mixtures of normal distributions. Biometrics, 31, 767–769. CrossRefGoogle Scholar
  26. McCullagh, P., & Nelder, J.A. (1989). Generalized linear models (2nd ed.). London: Chapman and Hall. Google Scholar
  27. Melton, B., Liang, K.Y., & Pulver, A.E. (1994). Extended latent class approach to the study of familial/sporadic forms of a disease: its application to the study of the heterogeneity of schizophrenia. Genetic Epidemiology, 11, 311–327. PubMedCrossRefGoogle Scholar
  28. Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525–543. CrossRefGoogle Scholar
  29. Mohr, P.E., Cheng, C.M., Claxton, K., Conley, R.R., Feldman, J.J., Hargreaves, W.A., Lehman, A.F., Lenert, L.A., Mahmoud, R., Marder, S.R., & Neumann, P. (2004). The heterogeneity of schizophrenia in disease states. Schizophrenia Research, 71, 83–95. PubMedCrossRefGoogle Scholar
  30. Moustaki, I. (1996). A latent trait and a latent class model for mixed observed variables. British Journal of Mathematical and Statistical Psychology, 49, 313–334. CrossRefGoogle Scholar
  31. Muthén, L.K., & Muthén, B.O. (2007). Mplus user’s guide (5th ed.). Los Angeles: Muthén & Muthén. Google Scholar
  32. Qu, Y., Tan, M., & Kunter, M.H. (1996). Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics, 52, 797–810. PubMedCrossRefGoogle Scholar
  33. Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Kopenhagen: Nielsen & Lydiche. Google Scholar
  34. Rosvold, H.E., Mirsk, A.F., Sarason, I., Bransome, E.D. Jr., & Bech, L.H. (1956). A continuous performance test of brain damage. Journal of Consulting Psychology, 20, 343–350. PubMedCrossRefGoogle Scholar
  35. Titterington, D.M., Smith, A.F., & Makov, U.E. (1985). Statistical analysis of finite mixture distributions. New York: Wiley. Google Scholar
  36. van’t Veer, L.J., Dai, H., van de Vijver, M.J., He, Y.D., Hart, A.A., Mao, M., Peterse, H.L., van der Kooy, K., Marton, M.J., Witteveen, A.T., Schreiber, G.J., Kerkhoven, R.M., Roberts, C., Linsley, P.S., Bernards, R., & Friend, S.H. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536. CrossRefGoogle Scholar

Copyright information

© The Psychometric Society 2011

Authors and Affiliations

  1. Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
