Invariance in Measurement and Prediction Revisited

  • Presidential Address
Psychometrika

Abstract

Borsboom (Psychometrika, 71:425–440, 2006) noted that recent work on measurement invariance (MI) and predictive invariance (PI) has had little impact on the practice of measurement in psychology. To understand this contention, the definitions of MI and PI are reviewed, followed by results on the consistency between the two forms of invariance in the general case. The special parametric cases of factor analysis (strict factorial invariance) and linear regression analyses (strong regression invariance) are then described, along with findings on the inconsistency between the two forms of invariance in this context. Two numerical examples of inconsistency are reviewed in detail. The impact of violations of MI on accuracy of selection is illustrated. Finally, reasons for the slow dissemination of work on invariance are discussed, and the prospects for altering this situation are weighed.

References

  • Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67–91.

  • Ahmavaara, Y. (1954). The mathematical theory of factorial invariance under selection. Psychometrika, 19, 27–38.

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education Joint Committee on Standards for Educational and Psychological Testing (1999). Standards for educational and psychological testing. Washington: AERA.

  • Birnbaum, M.H. (1979). Procedures for the detection and correction of salary inequities. In T.R. Pezzullo & B.E. Brittingham (Eds.), Salary equity (pp. 121–144). Lexington: Lexington Books.

  • Bloxom, B. (1972). Alternative approaches to factorial invariance. Psychometrika, 37, 425–440.

  • Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71, 425–440.

  • Bridgeman, B., & Lewis, C. (1996). Gender differences in college mathematics grades and SAT-M scores: A reanalysis of Wainer and Steinberg. Journal of Educational Measurement, 33, 257–270.

  • Brown, C.H., & Liao, J. (1999). Principles for designing randomized preventive trials in mental health: An emerging developmental epidemiology paradigm. American Journal of Community Psychology, 27, 673–710.

  • Byrne, B.M. (1994). Testing for factorial validity, replication, and invariance of a measuring instrument: A paradigmatic application based on the Maslach Burnout Inventory. Multivariate Behavioral Research, 29, 289–311.

  • Clark, L.A. (2006). When a psychometric advance falls in the forest. Psychometrika, 71, 447–450.

  • Cleary, T.A. (1968). Test bias: Prediction of grades of Negro and white students in integrated colleges. Journal of Educational Measurement, 5, 115–124.

  • Drasgow, F., & Probst, T.A. (2004). The psychometrics of adaptation: Evaluating measurement equivalence across languages and cultures. In R.K. Hambleton, P.F. Merenda, & C.D. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 265–296). Hillsdale: Lawrence Erlbaum.

  • Gottfredson, L.S. (1994). The science and politics of race-norming. American Psychologist, 49, 955–963.

  • Hambleton, R.K., Merenda, P.F., & Spielberger, C.D. (2006). Adapting educational and psychological tests for cross-cultural assessment. Hillsdale: Lawrence Erlbaum.

  • Hartigan, J.A., & Wigdor, A.K. (1989). Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery. Washington: National Academy Press.

  • Hofer, S.M., Horn, J.L., & Eber, H.W. (1997). A robust five-factor structure of the 16PF: Strong evidence from independent rotation and confirmatory factorial invariance procedures. Personality and Individual Differences, 23, 247–269.

  • Horn, J.L., & McArdle, J.J. (1992). A practical guide to measurement invariance in research on aging. Experimental Aging Research, 18, 117–144.

  • Humphreys, L.G. (1952). Individual differences. Annual Review of Psychology, 3, 131–150.

  • Humphreys, L.G. (1986). An analysis and evaluation of test and item bias in the prediction context. Psychological Bulletin, 71, 327–333.

  • Hunter, J.E., & Schmidt, F.L. (2000). Racial and gender bias in ability and achievement tests: Resolving the apparent paradox. Psychology, Public Policy, and Law, 6, 151–158.

  • Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.

  • Kok, F. (1988). Item bias and test multidimensionality. In R. Langeheine & J. Rost (Eds.), Latent trait and latent class models (pp. 263–275). New York: Plenum.

  • Krakowski, M., & Czobor, P. (2004). Gender differences in violent behaviors: Relationship to clinical symptoms and psychosocial factors. American Journal of Psychiatry, 161, 459–465.

  • Lehmann, E.L. (1986). Testing statistical hypotheses. New York: Wiley.

  • Linn, R.L. (1984). Selection bias: Multiple meanings. Journal of Educational Measurement, 21, 33–47.

  • Linn, R.L., & Werts, C.E. (1971). Considerations for studies of test bias. Journal of Educational Measurement, 8, 1–4.

  • Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale: Lawrence Erlbaum.

  • Mellenbergh, G.J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.

  • Meredith, W. (1964a). Notes on factorial invariance. Psychometrika, 29, 177–185.

  • Meredith, W. (1964b). Rotation to achieve factorial invariance. Psychometrika, 29, 187–206.

  • Meredith, W. (1993). Measurement invariance, factor analysis, and factorial invariance. Psychometrika, 58, 525–543.

  • Meredith, W., & Millsap, R.E. (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57, 289–311.

  • Millsap, R.E. (1995). Measurement invariance, predictive invariance, and the duality paradox. Multivariate Behavioral Research, 30, 577–605.

  • Millsap, R.E. (1997). Invariance in measurement and prediction: Their relationship in the single-factor case. Psychological Methods, 2, 248–260.

  • Millsap, R.E. (1998). Group differences in regression intercepts: Implications for factorial invariance. Multivariate Behavioral Research, 33, 403–424.

  • Millsap, R.E., & Hartog, S.B. (1988). Alpha, beta, and gamma change in evaluation research: A structural equation approach. Journal of Applied Psychology, 73, 574–584.

  • Millsap, R.E., & Kwok, O.M. (2004). Evaluating the impact of partial factorial invariance on selection in two populations. Psychological Methods, 9, 93–115.

  • Millsap, R.E., & Meredith, W. (1992). Inferential conditions in the statistical detection of measurement bias. Applied Psychological Measurement, 16, 389–402.

  • Neisser, U., Boodoo, G., Bouchard, T.J., Boykin, A.W., Brody, N., Ceci, S.J., Halpern, D.F., Loehlin, J.C., Perloff, R., Sternberg, R.J., & Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77–101.

  • Pentz, M.A., & Chou, C. (1994). Measurement invariance in longitudinal clinical research assuming change from development and intervention. Journal of Consulting and Clinical Psychology, 62, 450–462.

  • Potthoff, R.F. (1966). Statistical aspects of the problem of biases in psychological tests (Institute of Statistics Mimeo Series No. 479). Chapel Hill, NC: Department of Statistics, University of North Carolina.

  • Riordan, C.R., Richardson, H.A., Schaffer, B.S., & Vandenberg, R.J. (2001). Alpha, beta, and gamma change: A review of past research with recommendations for new directions. In L.L. Neider & C. Schriesheim (Eds.), Equivalence in measurement (pp. 51–98). Greenwich: Information Age Publishing.

  • Sackett, P.R., & Wilk, S.L. (1994). Within-group norming and other forms of score adjustment in preemployment testing. American Psychologist, 49, 929–954.

  • Sackett, P.R., Schmitt, N., Ellingson, J.E., & Kabin, M.B. (2001). High-stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative action world. American Psychologist, 56, 302–318.

  • Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of over 85 years of research findings. Psychological Bulletin, 124, 262–274.

  • Schmidt, F.L., Pearlman, K., & Hunter, J.E. (1980). The validity and fairness of employment and educational tests for Hispanic Americans: A review and analysis. Personnel Psychology, 33, 705–724.

  • Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159–194.

  • Society for Industrial/Organizational Psychology (2003). Principles for the application and use of personnel selection procedures. Bowling Green: Society for Industrial Organizational Psychology.

  • Stout, W. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.

  • Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 147–169). Hillsdale: Lawrence Erlbaum.

  • Thomson, G.H., & Ledermann, W. (1939). The influence of multivariate selection on the factorial analysis of ability. British Journal of Psychology, 29, 288–305.

  • Thurstone, L.L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.

  • Zwick, R. (1990). When do item response function and Mantel–Haenszel definitions of differential item functioning coincide? Journal of Educational Statistics, 15, 185–197.

Author information

Corresponding author

Correspondence to Roger E. Millsap.

Additional information

This paper is based on the Presidential Address given at the International Meeting of the Psychometric Society in Tokyo, Japan, on July 11, 2007. This research was supported by National Institute of Mental Health grants 1P30 MH 068685-01A1 and R01 MH64707-01.

About this article

Cite this article

Millsap, R.E. Invariance in Measurement and Prediction Revisited. Psychometrika 72, 461–473 (2007). https://doi.org/10.1007/s11336-007-9039-7

  • DOI: https://doi.org/10.1007/s11336-007-9039-7
