Latent variable mixture models: a promising approach for the validation of patient reported outcomes



A fundamental assumption of patient-reported outcome (PRO) measurement is that all individuals interpret questions about their health status in a consistent manner, so that a single measurement model applies equally to everyone in the target population. The related assumption of sample homogeneity has been assessed in various ways, including the many approaches to differential item functioning (DIF) analysis.


This expository paper describes the use of latent variable mixture modeling (LVMM), in conjunction with item response theory (IRT), to examine (a) whether a sample is homogeneous with respect to a unidimensional measurement model, (b) the implications of sample heterogeneity for model-predicted scores (θ), and (c) the sources of sample heterogeneity. An example is provided using the 10 items of the physical functioning subscale of the 36-item Short-Form Health Survey (SF-36®), with data from the 2003 Canadian Community Health Survey (N = 7,030 adults in Manitoba).
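
To make aim (a) more concrete, here is a minimal sketch of how a two-class mixture graded response model assigns a response pattern to latent classes. It assumes three-category items (as in the SF-36 physical functioning items), a standard normal θ within every class, and a simple equally spaced quadrature grid. All item parameters, class proportions, and function names (`grm_category_probs`, `posterior_class_probs`, and so on) are hypothetical illustrations. They are not the authors' estimates or the estimation routine used in the study, and parameter estimation itself (e.g., by EM) is not shown.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima graded response model: probabilities of categories 0..K for one item.

    `thresholds` (b_1 < b_2 < ... < b_K) must be strictly increasing.
    """
    # Cumulative "at or above category k" curves, padded with 1 (k = 0) and 0 (k = K + 1).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def normal_grid(n_points=81, lo=-6.0, hi=6.0):
    """Equally spaced theta grid with normalized N(0, 1) weights (a simple quadrature)."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + i * step for i in range(n_points)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def class_likelihood(responses, items, nodes, weights):
    """P(x | class c): integrate the product of item probabilities over theta."""
    like = 0.0
    for t, w in zip(nodes, weights):
        p = 1.0
        for x, (a, thresholds) in zip(responses, items):
            p *= grm_category_probs(t, a, thresholds)[x]
        like += w * p
    return like

def posterior_class_probs(responses, class_items, class_props):
    """Bayes' rule over latent classes; returns (posteriors, marginal likelihood)."""
    nodes, weights = normal_grid()
    joint = [pi * class_likelihood(responses, items, nodes, weights)
             for pi, items in zip(class_props, class_items)]
    marginal = sum(joint)  # mixture likelihood: sum over c of pi_c * P(x | c)
    return [j / marginal for j in joint], marginal

# Hypothetical parameters (a, [b1, b2]) for three 3-category items in two classes;
# only item 3 differs, i.e., it functions differently across the classes.
CLASS_1 = [(1.8, [-1.0, 0.0]), (2.0, [-0.5, 0.5]), (1.5, [0.0, 1.0])]
CLASS_2 = [(1.8, [-1.0, 0.0]), (2.0, [-0.5, 0.5]), (1.5, [-1.5, -0.5])]

post, marginal = posterior_class_probs([0, 0, 2], [CLASS_1, CLASS_2], [0.7, 0.3])
print([round(p, 3) for p in post], round(marginal, 5))
```

With a single class, this reduces to an ordinary unidimensional graded response model. Adding classes lets item parameters differ across unobserved subgroups, and that is how the mixture captures sample heterogeneity.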


The sample was not homogeneous with respect to a unidimensional measurement structure. Specification of three latent classes, to account for sample heterogeneity, resulted in significantly improved model fit. The latent classes were partially explained by demographic and health-related variables.
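
Improvement in fit across one-, two-, and three-class solutions is usually judged partly with relative fit indices such as the Akaike, Bayesian, and sample-size-adjusted Bayesian information criteria, where smaller values indicate better fit. The sketch below uses the standard formulas. The parameter-count helper is a hypothetical simplification: it assumes every item's slope and thresholds are class-specific and θ is fixed to N(0, 1) in each class. The log-likelihoods in the usage loop are invented placeholders, not results from this study.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """AIC, BIC, and sample-size-adjusted BIC (smaller is better for all three)."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        # Sclove (1987) replaces n with (n + 2) / 24 in the BIC penalty.
        "SABIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }

def mixture_grm_param_count(n_classes, n_items, n_categories):
    """Hypothetical count: class-specific slope and (K - 1) thresholds per item,
    theta fixed to N(0, 1) within each class, plus (C - 1) free class proportions."""
    per_class = n_items * (1 + (n_categories - 1))
    return n_classes * per_class + (n_classes - 1)

# Placeholder log-likelihoods for 1-, 2-, and 3-class solutions (NOT study results).
placeholder_logliks = {1: -21000.0, 2: -20700.0, 3: -20550.0}
for n_classes, loglik in placeholder_logliks.items():
    k = mixture_grm_param_count(n_classes, n_items=10, n_categories=3)
    print(n_classes, k, information_criteria(loglik, k, n_obs=7030))
```

Information criteria are typically read alongside likelihood ratio tests (such as the Vuong-Lo-Mendell-Rubin and bootstrapped tests), because the indices do not always agree on the number of classes.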


The illustrative analyses demonstrate the value of LVMM in revealing the potential implications of sample heterogeneity in the measurement of PROs.
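
One way to see the implications of heterogeneity for scores (aim b) is to score the same response pattern with each class's item parameters. The sketch below computes expected a posteriori (EAP) θ estimates under a standard normal prior for two hypothetical classes that differ only in one item's thresholds. The parameters and function names are invented for illustration. The sketch also assumes both classes share one θ metric. In a real analysis, that common metric has to be established (for example, through linking or anchor-item constraints) before class-specific scores can be compared.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded response model category probabilities for one item (categories 0..K)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def eap_theta(responses, items, n_points=121, lo=-6.0, hi=6.0):
    """Expected a posteriori theta with a N(0, 1) prior on an equally spaced grid."""
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for i in range(n_points):
        t = lo + i * step
        w = math.exp(-0.5 * t * t)  # unnormalized prior density; the constant cancels
        for x, (a, thresholds) in zip(responses, items):
            w *= grm_category_probs(t, a, thresholds)[x]
        num += t * w
        den += w
    return num / den

# Hypothetical (a, [b1, b2]) parameters; item 3 is easier to endorse in class 2.
CLASS_1 = [(1.8, [-1.0, 0.0]), (2.0, [-0.5, 0.5]), (1.5, [0.0, 1.0])]
CLASS_2 = [(1.8, [-1.0, 0.0]), (2.0, [-0.5, 0.5]), (1.5, [-1.5, -0.5])]

pattern = [0, 0, 2]
theta_1 = eap_theta(pattern, CLASS_1)
theta_2 = eap_theta(pattern, CLASS_2)
print(round(theta_1, 3), round(theta_2, 3), round(theta_1 - theta_2, 3))
```

Item 3 is easier to endorse in the second class, so its top category is weaker evidence of a high θ there. As a result, the class-2 estimate is lower than the class-1 estimate for identical answers.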


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5



Akaike information criterion


Bootstrapped likelihood ratio test


Bayesian information criterion


Sample-size-adjusted Bayesian information criterion


Canadian Community Health Survey


Differential item functioning


Item response theory


Graded response model


Latent variable mixture model


Odds ratio


Patient-reported outcomes


Vuong-Lo-Mendell-Rubin likelihood ratio test




This research was completed with support from the Michael Smith Foundation for Health Research, the Arthritis Research Centre of Canada, and the Canadian Arthritis Network. The research and analysis are based on data from Statistics Canada, and the opinions expressed do not represent the views of Statistics Canada.

Author information



Corresponding author

Correspondence to Richard Sawatzky.

Electronic supplementary material


Supplementary material 1 (DOCX 57 kb)

About this article

Cite this article

Sawatzky, R., Ratner, P.A., Kopec, J.A. et al. Latent variable mixture models: a promising approach for the validation of patient reported outcomes. Qual Life Res 21, 637–650 (2012).



  • Self-report measurement
  • Psychometrics
  • Measurement validity
  • Physical function