Quality of Life Research, Volume 21, Issue 4, pp 637–650

Latent variable mixture models: a promising approach for the validation of patient reported outcomes

  • Richard Sawatzky
  • Pamela A. Ratner
  • Jacek A. Kopec
  • Bruno D. Zumbo



A fundamental assumption of patient-reported outcomes (PRO) measurement is that all individuals interpret questions about their health status in a consistent manner, such that a measurement model can be constructed that is equivalently applicable to all people in the target population. The related assumption of sample homogeneity has been assessed in various ways, including the many approaches to differential item functioning analysis.


This expository paper describes the use of latent variable mixture modeling (LVMM), in conjunction with item response theory (IRT), to examine: (a) whether a sample is homogeneous with respect to a unidimensional measurement model, (b) implications of sample heterogeneity with respect to model-predicted scores (theta), and (c) sources of sample heterogeneity. An example is provided using the 10 items of the Short-Form Health Status (SF-36®) physical functioning subscale with data from the Canadian Community Health Survey (2003) (N = 7,030 adults in Manitoba).
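To make the idea of class-specific measurement parameters concrete, the following is a minimal sketch of how a graded response model assigns response-category probabilities, and how those probabilities can differ between latent classes for the same theta. It is not the authors' implementation: the function name, the two classes, and every parameter value are hypothetical and chosen only for illustration.

```python
import math


def grm_category_probs(theta, discrimination, thresholds):
    """Graded response model: probability of each ordered response category.

    `thresholds` must be increasing; K thresholds define K + 1 categories.
    """
    # Cumulative probabilities P(Y >= k | theta) for k = 1..K (logistic form).
    cumulative = [
        1.0 / (1.0 + math.exp(-discrimination * (theta - b))) for b in thresholds
    ]
    # Pad with P(Y >= 0) = 1 and P(Y >= K + 1) = 0, then difference neighbours.
    bounds = [1.0] + cumulative + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


# Hypothetical class-specific parameters for a single three-category item.
# These values are invented for illustration and are not estimates from the paper.
class_params = {
    "class_1": {"discrimination": 2.0, "thresholds": [-1.0, 0.5]},
    "class_2": {"discrimination": 0.8, "thresholds": [-2.0, -0.5]},
}

theta = 0.0  # the same latent trait level evaluated in both classes
probs_by_class = {
    name: grm_category_probs(theta, **params) for name, params in class_params.items()
}

for name, probs in probs_by_class.items():
    print(name, [round(p, 3) for p in probs])
```

If a sample were homogeneous, a single set of item parameters would suffice; in this sketch, two people with the same theta have different expected response patterns depending on class, which is the kind of heterogeneity a mixture model is meant to detect.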


The sample was not homogeneous with respect to a unidimensional measurement structure. Specification of three latent classes, to account for sample heterogeneity, resulted in significantly improved model fit. The latent classes were partially explained by demographic and health-related variables.
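Comparisons of models with different numbers of latent classes are commonly summarized with information criteria. The sketch below computes the AIC, the BIC, and the sample-size-adjusted BIC from a log-likelihood and a parameter count. Only the sample size (N = 7,030) comes from the abstract; the log-likelihoods and parameter counts are hypothetical placeholders, not results from the paper, and the helper name is an assumption.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, BIC and sample-size-adjusted BIC (lower values indicate better fit)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        # Sample-size-adjusted BIC replaces n with (n + 2) / 24 in the penalty.
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


N_OBS = 7030  # sample size reported in the abstract

# Hypothetical (log-likelihood, number of free parameters) for 1-, 2- and
# 3-class models. These numbers are invented and are NOT the paper's results.
candidate_models = {
    1: (-30000.0, 30),
    2: (-29500.0, 61),
    3: (-29300.0, 92),
}

fit = {
    n_classes: information_criteria(ll, k, N_OBS)
    for n_classes, (ll, k) in candidate_models.items()
}
best_by_bic = min(fit, key=lambda n_classes: fit[n_classes]["BIC"])

for n_classes, criteria in fit.items():
    print(n_classes, {name: round(value, 1) for name, value in criteria.items()})
print("Lowest BIC:", best_by_bic, "classes")
```

Information criteria rank models but do not test them; likelihood ratio tests designed for mixtures, such as the bootstrapped and Vuong-Lo-Mendell-Rubin tests, are typically obtained from the estimation software rather than computed by hand.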


The illustrative analyses demonstrate the value of LVMM in revealing the potential implications of sample heterogeneity in the measurement of PROs.


Self-report measurement; Psychometrics; Measurement validity; Physical function



  • Akaike information criterion
  • Bootstrapped likelihood ratio test
  • Bayesian information criterion
  • Sample-adjusted Bayesian information criterion
  • Canadian Community Health Survey
  • Differential item functioning
  • Item response theory
  • Graded response model
  • Latent variable mixture model
  • Odds ratio
  • Patient reported outcomes
  • Vuong-Lo-Mendell-Rubin likelihood ratio test



This research was completed with support from the Michael Smith Foundation for Health Research, the Arthritis Research Centre of Canada, and the Canadian Arthritis Network. The research and analysis are based on data from Statistics Canada, and the opinions expressed do not represent the views of Statistics Canada.

Supplementary material

Supplementary material 1: 11136_2011_9976_MOESM1_ESM.docx (DOCX, 57 kb)



Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Richard Sawatzky (1)
  • Pamela A. Ratner (2)
  • Jacek A. Kopec (3)
  • Bruno D. Zumbo (4)

  1. School of Nursing, Trinity Western University, Langley, Canada
  2. School of Nursing, University of British Columbia, Vancouver, Canada
  3. School of Population and Public Health, University of British Columbia; Arthritis Research Centre of Canada, Vancouver, Canada
  4. ECPS, Measurement, Evaluation and Research Methodology, University of British Columbia, Vancouver, Canada
