A fundamental assumption of patient-reported outcome (PRO) measurement is that all individuals interpret questions about their health status consistently, so that a single measurement model can be constructed that applies equally to all people in the target population. The related assumption of sample homogeneity has been assessed in various ways, including the many approaches to differential item functioning (DIF) analysis.
This expository paper describes the use of latent variable mixture modeling (LVMM), in conjunction with item response theory (IRT), to examine (a) whether a sample is homogeneous with respect to a unidimensional measurement model, (b) the implications of sample heterogeneity for model-predicted scores (theta), and (c) the sources of sample heterogeneity. An example is provided using the 10 items of the physical functioning subscale of the 36-item Short-Form Health Survey (SF-36®), with data from the 2003 Canadian Community Health Survey (N = 7,030 adults in Manitoba).
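As a rough illustration of how LVMM and IRT fit together, the sketch below evaluates a mixture graded response model (GRM) for a single response pattern. It assumes that each latent class has its own item discriminations and thresholds and its own normal distribution for theta; it computes each class's marginal likelihood and then the posterior probability of class membership by Bayes' rule. The function names, the three-item setup, and every parameter value are hypothetical; the three response categories only mirror the SF-36 physical functioning response options (limited a lot, limited a little, not limited). Integration uses a simple equally spaced grid rather than the Gauss-Hermite or adaptive quadrature that estimation software typically uses, and parameter estimation itself is not shown.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Samejima's graded response model: P(Y = k | theta) for k = 0..m-1.

    `a` is the item discrimination; `thresholds` must be strictly increasing.
    """
    # Cumulative curves P(Y >= k), with P(Y >= 0) = 1 and P(Y >= m) = 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Category probability = difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def class_marginal_likelihood(responses, items, mean=0.0, sd=1.0, n_points=121, span=6.0):
    """Integrate one pattern's GRM likelihood over theta ~ N(mean, sd^2).

    Rectangle rule on an equally spaced grid of z = (theta - mean) / sd in [-span, span].
    """
    step = 2.0 * span / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        z = -span + i * step
        theta = mean + sd * z
        weight = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * step
        lik = 1.0
        # Local independence: multiply the item probabilities given theta.
        for y, (a, thresholds) in zip(responses, items):
            lik *= grm_category_probs(theta, a, thresholds)[y]
        total += weight * lik
    return total


def posterior_class_probs(responses, classes):
    """Bayes' rule over classes; `classes` is a list of (proportion, items, mean, sd)."""
    joint = [p * class_marginal_likelihood(responses, items, m, s) for p, items, m, s in classes]
    total = sum(joint)  # probability of the pattern under the whole mixture
    return [j / total for j in joint]


# Hypothetical parameters: (discrimination, [threshold_1, threshold_2]) per item,
# responses coded 0 = limited a lot, 1 = limited a little, 2 = not limited.
items_class1 = [(2.0, [-1.5, -0.5]), (1.8, [-1.0, 0.0]), (1.5, [-0.5, 0.5])]
items_class2 = [(2.0, [-1.5, -0.5]), (1.8, [-1.0, 0.0]), (1.5, [0.5, 1.5])]  # item 3 shifted
classes = [(0.6, items_class1, 0.0, 1.0), (0.4, items_class2, 0.0, 1.0)]

print(posterior_class_probs([2, 2, 0], classes))
```

In this toy setup, item 3 requires a higher theta to reach each response category in class 2, so a respondent reporting no limitation on items 1 and 2 but severe limitation on item 3 receives a posterior probability for class 2 above its prior proportion of 0.4. Summing the log of each respondent's mixture probability (`total` above) gives the log-likelihood that model comparisons are based on.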
The sample was not homogeneous with respect to a unidimensional measurement structure. Specification of three latent classes, to account for sample heterogeneity, resulted in significantly improved model fit. The latent classes were partially explained by demographic and health-related variables.
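Choosing the number of latent classes is commonly guided by relative fit statistics such as the Akaike, Bayesian, and sample-size adjusted Bayesian information criteria. A minimal sketch, assuming each candidate model's maximized log-likelihood, number of free parameters, and the sample size are already available; lower values indicate better fit after penalizing model complexity. The function names and the "SABIC" label are illustrative choices, and the likelihood-ratio tests (VLMR LRT, BLRT), which require comparing or refitting models, are not shown.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """AIC, BIC and sample-size adjusted BIC for a fitted model (lower is better)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        # Sclove (1987) adjustment: n in the BIC penalty is replaced by (n + 2) / 24.
        "SABIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }


def best_by(fits, n_obs, criterion="BIC"):
    """`fits` maps number of classes -> (log_likelihood, n_params); returns the minimizer."""
    return min(fits, key=lambda k: information_criteria(*fits[k], n_obs)[criterion])


# Toy numbers only, to show the calls; they are not results from any study.
toy_fits = {1: (-100.0, 5), 2: (-90.0, 11)}
print(information_criteria(*toy_fits[1], n_obs=100))
print(best_by(toy_fits, n_obs=100, criterion="AIC"), best_by(toy_fits, n_obs=100, criterion="BIC"))
```

The criteria need not agree: with the toy values, AIC prefers two classes while BIC prefers one. With N = 7,030, as in the example above, both BIC and SABIC penalize each additional parameter more heavily than AIC does (ln 7,030 ≈ 8.86 and ln(7,032/24) ≈ 5.68, versus 2).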
The illustrative analyses demonstrate the value of LVMM in revealing the potential implications of sample heterogeneity in the measurement of PROs.
Abbreviations
- AIC: Akaike information criterion
- BLRT: Bootstrapped likelihood ratio test
- BIC: Bayesian information criterion
- SABIC: Sample-adjusted Bayesian information criterion
- CCHS: Canadian Community Health Survey
- DIF: Differential item functioning
- IRT: Item response theory
- GRM: Graded response model
- LVMM: Latent variable mixture model
- PRO: Patient-reported outcomes
- VLMR LRT: Vuong-Lo-Mendell-Rubin likelihood ratio test
This research was completed with support from the Michael Smith Foundation for Health Research, the Arthritis Research Centre of Canada, and the Canadian Arthritis Network. The research and analysis are based on data from Statistics Canada, and the opinions expressed do not represent the views of Statistics Canada.
Cite this article
Sawatzky, R., Ratner, P.A., Kopec, J.A. et al. Latent variable mixture models: a promising approach for the validation of patient reported outcomes. Qual Life Res 21, 637–650 (2012). https://doi.org/10.1007/s11136-011-9976-6