Abstract
Borsboom (Psychometrika, 71:425–440, 2006) noted that recent work on measurement invariance (MI) and predictive invariance (PI) has had little impact on the practice of measurement in psychology. To understand this contention, the definitions of MI and PI are reviewed, followed by results on the consistency between the two forms of invariance in the general case. The special parametric cases of factor analysis (strict factorial invariance) and linear regression analyses (strong regression invariance) are then described, along with findings on the inconsistency between the two forms of invariance in this context. Two numerical examples of inconsistency are reviewed in detail. The impact of violations of MI on accuracy of selection is illustrated. Finally, reasons for the slow dissemination of work on invariance are discussed, and the prospects for altering this situation are weighed.
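The inconsistency mentioned above can be illustrated with a minimal sketch. It is not one of the paper's two numerical examples, and every parameter value is hypothetical. It assumes a single common factor W underlying both a predictor Z and a criterion Y, with identical intercepts, loadings, and unique variances in two groups (so the factor model is invariant) and groups differing only in the factor mean. The function name `regression_line` and its arguments are introduced here for illustration only.

```python
def regression_line(tau_y, lam_y, tau_z, lam_z, theta_z, kappa, phi):
    """Population regression of criterion Y on predictor Z in one group.

    Assumed single-factor model (hypothetical notation):
        Z = tau_z + lam_z * W + e_z
        Y = tau_y + lam_y * W + e_y
    with E[W] = kappa, Var(W) = phi, Var(e_z) = theta_z, and unique
    factors uncorrelated with W and with each other. The unique variance
    of Y does not affect the regression line, so it is omitted.
    """
    var_z = lam_z ** 2 * phi + theta_z      # variance of the predictor
    cov_yz = lam_y * lam_z * phi            # covariance induced by W
    slope = cov_yz / var_z
    mean_y = tau_y + lam_y * kappa
    mean_z = tau_z + lam_z * kappa
    intercept = mean_y - slope * mean_z
    return intercept, slope


# Measurement parameters are identical across groups (hypothetical values).
params = dict(tau_y=0.0, lam_y=1.0, tau_z=0.0, lam_z=1.0, theta_z=1.0, phi=1.0)

# The groups differ only in the latent mean kappa.
reference = regression_line(kappa=0.0, **params)
focal = regression_line(kappa=-0.5, **params)

print("reference (intercept, slope):", reference)
print("focal     (intercept, slope):", focal)
```

Under these assumptions the slopes agree across groups but the intercepts do not, even though the measurement parameters are the same in both groups. The intercept gap shrinks to zero as `theta_z` approaches zero, that is, as the predictor becomes perfectly reliable.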
References
Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67–91.
Ahmavaara, Y. (1954). The mathematical theory of factorial invariance under selection. Psychometrika, 19, 27–38.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education Joint Committee on Standards for Educational and Psychological Testing (1999). Standards for educational and psychological testing. Washington: AERA.
Birnbaum, M.H. (1979). Procedures for the detection and correction of salary inequities. In T.R. Pezzullo & B.E. Brittingham (Eds.), Salary equity (pp. 121–144). Lexington: Lexington Books.
Bloxom, B. (1972). Alternative approaches to factorial invariance. Psychometrika, 37, 425–440.
Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71, 425–440.
Bridgeman, B., & Lewis, C. (1996). Gender differences in college mathematics grades and SAT-M scores: A reanalysis of Wainer and Steinberg. Journal of Educational Measurement, 33, 257–270.
Brown, C.H., & Liao, J. (1999). Principles for designing randomized preventive trials in mental health: An emerging developmental epidemiology paradigm. American Journal of Community Psychology, 27, 673–710.
Byrne, B.M. (1994). Testing for factorial validity, replication, and invariance of a measuring instrument: A paradigmatic application based on the Maslach Burnout Inventory. Multivariate Behavioral Research, 29, 289–311.
Clark, L.A. (2006). When a psychometric advance falls in the forest. Psychometrika, 71, 447–450.
Cleary, T.A. (1968). Test bias: Prediction of grades of Negro and white students in integrated colleges. Journal of Educational Measurement, 5, 115–124.
Drasgow, F., & Probst, T.A. (2004). The psychometrics of adaptation: Evaluating measurement equivalence across languages and cultures. In R.K. Hambleton, P.F. Merenda, & C.D. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 265–296). Hillsdale: Lawrence Erlbaum.
Gottfredson, L.S. (1994). The science and politics of race-norming. American Psychologist, 49, 955–963.
Hambleton, R.K., Merenda, P.F., & Spielberger, C.D. (2006). Adapting educational and psychological tests for cross-cultural assessment. Hillsdale: Lawrence Erlbaum.
Hartigan, J.A., & Wigdor, A.K. (1989). Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery. Washington: National Academy Press.
Hofer, S.M., Horn, J.L., & Eber, H.W. (1997). A robust five-factor structure of the 16PF: Strong evidence from independent rotation and confirmatory factorial invariance procedures. Personality and Individual Differences, 23, 247–269.
Horn, J.L., & McArdle, J.J. (1992). A practical guide to measurement invariance in research on aging. Experimental Aging Research, 18, 117–144.
Humphreys, L.G. (1952). Individual differences. Annual Review of Psychology, 3, 131–150.
Humphreys, L.G. (1986). An analysis and evaluation of test and item bias in the prediction context. Journal of Applied Psychology, 71, 327–333.
Hunter, J.E., & Schmidt, F.L. (2000). Racial and gender bias in ability and achievement tests: Resolving the apparent paradox. Psychology, Public Policy, and Law, 6, 151–158.
Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.
Kok, F. (1988). Item bias and test multidimensionality. In R. Langeheine & J. Rost (Eds.), Latent trait and latent class models (pp. 263–275). New York: Plenum.
Krakowski, M., & Czobor, P. (2004). Gender differences in violent behaviors: Relationship to clinical symptoms and psychosocial factors. American Journal of Psychiatry, 161, 459–465.
Lehmann, E.L. (1986). Testing statistical hypotheses. New York: Wiley.
Linn, R.L. (1984). Selection bias: Multiple meanings. Journal of Educational Measurement, 21, 33–47.
Linn, R.L., & Werts, C.E. (1971). Considerations for studies of test bias. Journal of Educational Measurement, 8, 1–4.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale: Lawrence Erlbaum.
Mellenbergh, G.J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.
Meredith, W. (1964a). Notes on factorial invariance. Psychometrika, 29, 177–185.
Meredith, W. (1964b). Rotation to achieve factorial invariance. Psychometrika, 29, 187–206.
Meredith, W. (1993). Measurement invariance, factor analysis, and factorial invariance. Psychometrika, 58, 525–543.
Meredith, W., & Millsap, R.E. (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57, 289–311.
Millsap, R.E. (1995). Measurement invariance, predictive invariance, and the duality paradox. Multivariate Behavioral Research, 30, 577–605.
Millsap, R.E. (1997). Invariance in measurement and prediction: Their relationship in the single-factor case. Psychological Methods, 2, 248–260.
Millsap, R.E. (1998). Group differences in regression intercepts: Implications for factorial invariance. Multivariate Behavioral Research, 33, 403–424.
Millsap, R.E., & Hartog, S.B. (1988). Alpha, beta, and gamma change in evaluation research: A structural equation approach. Journal of Applied Psychology, 73, 574–584.
Millsap, R.E., & Kwok, O.M. (2004). Evaluating the impact of partial factorial invariance on selection in two populations. Psychological Methods, 9, 93–115.
Millsap, R.E., & Meredith, W. (1992). Inferential conditions in the statistical detection of measurement bias. Applied Psychological Measurement, 16, 389–402.
Neisser, U., Boodoo, G., Bouchard, T.J., Boykin, A.W., Brody, N., Ceci, S.J., Halpern, D.F., Loehlin, J.C., Perloff, R., Sternberg, R.J., & Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77–101.
Pentz, M.A., & Chou, C. (1994). Measurement invariance in longitudinal clinical research assuming change from development and intervention. Journal of Consulting and Clinical Psychology, 62, 450–462.
Potthoff, R.F. (1966). Statistical aspects of the problem of biases in psychological tests (Institute of Statistics Mimeo Series No. 479). Chapel Hill, NC: Department of Statistics, University of North Carolina.
Riordan, C.R., Richardson, H.A., Schaffer, B.S., & Vandenberg, R.J. (2001). Alpha, beta, and gamma change: A review of past research with recommendations for new directions. In L.L. Neider & C. Schriesheim (Eds.), Equivalence in measurement (pp. 51–98). Greenwich: Information Age Publishing.
Sackett, P.R., & Wilk, S.L. (1994). Within-group norming and other forms of score adjustment in preemployment testing. American Psychologist, 49, 929–954.
Sackett, P.R., Schmitt, N., Ellingson, J.E., & Kabin, M.B. (2001). High-stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative action world. American Psychologist, 56, 302–318.
Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of over 85 years of research findings. Psychological Bulletin, 124, 262–274.
Schmidt, F.L., Pearlman, K., & Hunter, J.E. (1980). The validity and fairness of employment and educational tests for Hispanic Americans: A review and analysis. Personnel Psychology, 33, 705–724.
Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159–194.
Society for Industrial/Organizational Psychology (2003). Principles for the application and use of personnel selection procedures. Bowling Green: Society for Industrial Organizational Psychology.
Stout, W. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 147–169). Hillsdale: Lawrence Erlbaum.
Thomson, G.H., & Ledermann, W. (1939). The influence of multivariate selection on the factorial analysis of ability. British Journal of Psychology, 29, 288–305.
Thurstone, L.L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.
Zwick, R. (1990). When do item response function and Mantel–Haenszel definitions of differential item functioning coincide? Journal of Educational Statistics, 15, 185–197.
Additional information
This paper is based on the Presidential Address given at the International Meeting of the Psychometric Society in Tokyo, Japan, on July 11, 2007. This research was supported by National Institute of Mental Health grants 1P30 MH 068685-01A1 and RO1 MH64707-01.
Cite this article
Millsap, R.E. Invariance in Measurement and Prediction Revisited. Psychometrika 72, 461–473 (2007). https://doi.org/10.1007/s11336-007-9039-7
DOI: https://doi.org/10.1007/s11336-007-9039-7