Volume 78, Issue 1, pp 59–82

Tests of Measurement Invariance Without Subgroups: A Generalization of Classical Methods

  • Edgar C. Merkle
  • Achim Zeileis


The issue of measurement invariance commonly arises in factor-analytic contexts, where it is typically assessed with likelihood ratio tests, Lagrange multiplier tests, or Wald tests. All of these tests require advance definition of the number of groups, group membership, and the offending model parameters. In this paper, we study tests of measurement invariance based on stochastic processes of casewise derivatives of the likelihood function. These tests can be viewed as generalizations of the Lagrange multiplier test, and they are especially useful for (i) identifying subgroups of individuals that violate measurement invariance along a continuous auxiliary variable without prespecified thresholds, and (ii) identifying specific parameters impacted by measurement invariance violations. The tests are presented and illustrated in detail, including an application to a study of stereotype threat and simulations examining the tests’ abilities in controlled conditions.
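To make the idea of a "stochastic process of casewise derivatives" concrete, here is a minimal NumPy sketch of one statistic of this kind, a double-max test on the decorrelated cumulative score process. It assumes a deliberately simple stand-in model (a univariate normal with parameters μ and σ²) instead of a factor-analysis model, an outer-product-of-gradients estimate of the information matrix, and hypothetical simulated data; the function names are illustrative and are not the authors' code or any package's API. Under the null hypothesis of parameter stability, each decorrelated component behaves asymptotically like an independent Brownian bridge, so the p-value is taken from the Kolmogorov distribution.

```python
import numpy as np


def kolmogorov_cdf(x: float, terms: int = 100) -> float:
    """P(sup_t |B(t)| <= x) for a standard Brownian bridge B."""
    if x <= 0:
        return 0.0
    j = np.arange(1, terms + 1)
    s = np.sum((-1.0) ** (j - 1) * np.exp(-2.0 * j**2 * x**2))
    return float(np.clip(1.0 - 2.0 * s, 0.0, 1.0))


def normal_scores(y: np.ndarray) -> np.ndarray:
    """Casewise derivatives of the N(mu, sigma^2) log-likelihood at the MLE.

    Stand-in model (assumption): column 0 is d/d mu, column 1 is d/d sigma^2.
    Each column sums to (numerically) zero at the MLE.
    """
    mu = y.mean()
    s2 = np.mean((y - mu) ** 2)  # ML (not unbiased) variance estimate
    d_mu = (y - mu) / s2
    d_s2 = -0.5 / s2 + 0.5 * (y - mu) ** 2 / s2**2
    return np.column_stack([d_mu, d_s2])


def double_max_test(scores: np.ndarray, aux: np.ndarray):
    """Score-based double-max test along an auxiliary variable (sketch)."""
    n, k = scores.shape
    order = np.argsort(aux, kind="stable")  # sort cases by the auxiliary variable
    psi = scores[order]
    # Outer-product-of-gradients estimate of the information matrix
    J = psi.T @ psi / n
    # Symmetric inverse square root J^{-1/2} decorrelates the score components
    vals, vecs = np.linalg.eigh(J)
    J_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
    # Cumulative score process W(i/n) = n^{-1/2} J^{-1/2} sum_{l <= i} psi_l
    W = np.cumsum(psi, axis=0) @ J_inv_half / np.sqrt(n)
    per_param = np.abs(W).max(axis=0)  # sup_t |W_j(t)| for each parameter j
    stat = float(per_param.max())
    # Components are asymptotically independent Brownian bridges under H0
    p_value = 1.0 - kolmogorov_cdf(stat) ** k
    j_star = int(per_param.argmax())  # parameter whose process fluctuates most
    i_star = int(np.abs(W[:, j_star]).argmax())
    split = float(aux[order][i_star])  # auxiliary value where that process peaks
    return stat, p_value, per_param, j_star, split


rng = np.random.default_rng(1)
n = 500
age = rng.uniform(20, 60, n)
# Hypothetical data: the mean shifts by 0.6 for age > 40; the variance is constant
y = rng.normal(0.0, 1.0, n) + 0.6 * (age > 40)
stat, p, per_param, j, split = double_max_test(normal_scores(y), age)
print(f"double max = {stat:.2f}, p = {p:.4g}")
print("per-parameter maxima (mu, sigma^2):", np.round(per_param, 2))
print(f"parameter flagged: {['mu', 'sigma^2'][j]}, peak near age {split:.1f}")
```

The per-parameter maxima indicate which parameter is affected (point ii above), and the location of the peak suggests where along the auxiliary variable the change occurs, without a prespecified threshold (point i). In a factor-analysis application, the matrix returned by `normal_scores` would be replaced by casewise scores from the fitted model; ties in the auxiliary variable are simply ordered as they appear in this sketch.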

Key words

measurement invariance · parameter stability · factor analysis · structural equation models



This work was supported by National Science Foundation grant SES-1061334. The authors thank Jelte Wicherts, who generously shared data for the stereotype threat application; Yves Rosseel, who provided feedback and code for performing the tests with the lavaan package; Kris Preacher, who provided helpful comments on the manuscript; and the participants of the Psychoco 2012 workshop on psychometric computing for helpful discussion.



Copyright information

© The Psychometric Society 2012

Authors and Affiliations

  1. Department of Psychological Sciences, University of Missouri, Columbia, USA
  2. Universität Innsbruck, Innsbruck, Austria
