Psychometrika, Volume 72, Issue 4, pp. 461–473

Invariance in Measurement and Prediction Revisited

Roger E. Millsap
Presidential Address


Borsboom (Psychometrika, 71:425–440, 2006) noted that recent work on measurement invariance (MI) and predictive invariance (PI) has had little impact on the practice of measurement in psychology. To examine this contention, the definitions of MI and PI are reviewed, followed by results on the consistency between the two forms of invariance in the general case. The special parametric cases of factor analysis (strict factorial invariance) and linear regression (strong regression invariance) are then described, along with findings on the inconsistency between the two forms of invariance in this setting. Two numerical examples of inconsistency are reviewed in detail. The impact of violations of MI on the accuracy of selection is illustrated. Finally, reasons for the slow dissemination of work on invariance are discussed, and the prospects for changing this situation are weighed.
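The inconsistency between factorial and regression invariance can be seen in a simple case. The sketch below assumes a single common factor ξ underlying both a predictor X and a criterion Y, with unique parts uncorrelated with ξ and with each other, and computes the population regression of Y on X in each of two groups. The groups share every measurement parameter (intercepts, loadings, unique variance of X) and the factor variance, so strict factorial invariance holds, but their factor means differ. Under these assumptions the slopes agree while the intercepts differ by (κ₂ − κ₁)·λ_y·(1 − ρ), where ρ is the reliability of X. The class and function names and all parameter values are hypothetical and are not the article's numerical examples.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class OneFactorGroup:
    """Hypothetical single-factor model for predictor X and criterion Y in one group.

    X = tau_x + lam_x * xi + delta
    Y = tau_y + lam_y * xi + eps
    xi has mean kappa and variance phi; delta and eps are assumed
    uncorrelated with xi and with each other.
    """
    tau_x: float
    lam_x: float
    theta_x: float  # unique (error) variance of X
    tau_y: float
    lam_y: float
    kappa: float    # factor mean in this group
    phi: float      # factor variance in this group


def regress_y_on_x(g: OneFactorGroup) -> Tuple[float, float]:
    """Population (intercept, slope) of the linear regression of Y on X."""
    var_x = g.lam_x ** 2 * g.phi + g.theta_x
    cov_xy = g.lam_x * g.lam_y * g.phi
    slope = cov_xy / var_x
    mean_x = g.tau_x + g.lam_x * g.kappa
    mean_y = g.tau_y + g.lam_y * g.kappa
    return mean_y - slope * mean_x, slope


def predicted_intercept_gap(ref: OneFactorGroup, foc: OneFactorGroup) -> float:
    """Closed-form gap (kappa_foc - kappa_ref) * lam_y * (1 - rho).

    rho is the reliability of X; valid only when the two groups share all
    measurement parameters and the factor variance phi.
    """
    rho = ref.lam_x ** 2 * ref.phi / (ref.lam_x ** 2 * ref.phi + ref.theta_x)
    return (foc.kappa - ref.kappa) * ref.lam_y * (1 - rho)


# Illustrative values: every measurement parameter is shared, so strict
# factorial invariance holds; only the factor mean kappa differs.
reference = OneFactorGroup(tau_x=0.0, lam_x=1.0, theta_x=0.5,
                           tau_y=0.0, lam_y=1.0, kappa=0.0, phi=1.0)
focal = replace(reference, kappa=1.0)

a_ref, b_ref = regress_y_on_x(reference)
a_foc, b_foc = regress_y_on_x(focal)
print(f"reference: intercept={a_ref:.3f}, slope={b_ref:.3f}")
print(f"focal:     intercept={a_foc:.3f}, slope={b_foc:.3f}")
print(f"predicted gap: {predicted_intercept_gap(reference, focal):.3f}")
```

With these values both slopes are 0.667 while the intercepts are 0.000 and 0.333, matching the closed-form gap. In this setup the gap vanishes only if X is perfectly reliable (theta_x = 0), Y does not load on the factor, or the factor means are equal.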


Keywords: measurement invariance; predictive invariance; factorial invariance; test bias; selection accuracy




References

  1. Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67–91.
  2. Ahmavaara, Y. (1954). The mathematical theory of factorial invariance under selection. Psychometrika, 19, 27–38.
  3. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education Joint Committee on Standards for Educational and Psychological Testing (1999). Standards for educational and psychological testing. Washington: AERA.
  4. Birnbaum, M.H. (1979). Procedures for the detection and correction of salary inequities. In T.R. Pezzullo & B.E. Brittingham (Eds.), Salary equity (pp. 121–144). Lexington: Lexington Books.
  5. Bloxom, B. (1972). Alternative approaches to factorial invariance. Psychometrika, 37, 425–440.
  6. Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71, 425–440.
  7. Bridgeman, M.H., & Lewis, C. (1996). Gender differences in college mathematics grades and SAT-M scores: A reanalysis of Wainer and Steinberg. Journal of Educational Measurement, 33, 257–270.
  8. Brown, C.H., & Liao, J. (1999). Principles for designing randomized preventive trials in mental health: An emerging developmental epidemiology paradigm. American Journal of Community Psychology, 27, 673–710.
  9. Byrne, B.M. (1994). Testing for factorial validity, replication, and invariance of a measuring instrument: A paradigmatic application based on the Maslach Burnout Inventory. Multivariate Behavioral Research, 29, 289–311.
  10. Clark, L.E. (2006). When a psychometric advance falls in the forest. Psychometrika, 71, 447–450.
  11. Cleary, T.A. (1968). Test bias: Prediction of grades of Negro and white students in integrated colleges. Journal of Educational Measurement, 5, 115–124.
  12. Drasgow, F., & Probst, T.A. (2004). The psychometrics of adaptation: Evaluating measurement equivalence across languages and cultures. In R.K. Hambleton, P.F. Merenda, & C.D. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 265–296). Hillsdale: Lawrence Erlbaum.
  13. Gottfredson, L.S. (1994). The science and politics of race-norming. American Psychologist, 49, 955–963.
  14. Hambleton, R.K., Merenda, P.F., & Spielberger, C.D. (2006). Adapting educational and psychological tests for cross-cultural assessment. Hillsdale: Lawrence Erlbaum.
  15. Hartigan, J.A., & Wigdor, A.K. (1989). Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery. Washington: National Academy Press.
  16. Hofer, S.M., Horn, J.L., & Eber, H.W. (1997). A robust five-factor structure of the 16PF: Strong evidence from independent rotation and confirmatory factorial invariance procedures. Personality and Individual Differences, 23, 247–269.
  17. Horn, J.L., & McArdle, J.J. (1992). A practical guide to measurement invariance in research on aging. Experimental Aging Research, 18, 117–144.
  18. Humphreys, L.G. (1952). Individual differences. Annual Review of Psychology, 3, 131–150.
  19. Humphreys, L.G. (1986). An analysis and evaluation of test and item bias in the prediction context. Psychological Bulletin, 71, 327–333.
  20. Hunter, J.E., & Schmidt, F.L. (2000). Racial and gender bias in ability and achievement tests: Resolving the apparent paradox. Psychology, Public Policy, and Law, 6, 151–158.
  21. Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.
  22. Kok, F. (1988). Item bias and test multidimensionality. In R. Langeheine & J. Rost (Eds.), Latent trait and latent class models (pp. 263–275). New York: Plenum.
  23. Krakowski, M., & Czobor, P. (2004). Gender differences in violent behaviors: Relationship to clinical symptoms and psychosocial factors. American Journal of Psychiatry, 161, 459–465.
  24. Lehmann, E.L. (1986). Testing statistical hypotheses. New York: Wiley.
  25. Linn, R.L. (1984). Selection bias: Multiple meanings. Journal of Educational Measurement, 21, 33–47.
  26. Linn, R.L., & Werts, C.E. (1971). Considerations for studies of test bias. Journal of Educational Measurement, 8, 1–4.
  27. Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale: Lawrence Erlbaum.
  28. Mellenbergh, G.J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.
  29. Meredith, W. (1964a). Notes on factorial invariance. Psychometrika, 29, 177–185.
  30. Meredith, W. (1964b). Rotation to achieve factorial invariance. Psychometrika, 29, 187–206.
  31. Meredith, W. (1993). Measurement invariance, factor analysis, and factorial invariance. Psychometrika, 58, 525–543.
  32. Meredith, W., & Millsap, R.E. (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57, 289–311.
  33. Millsap, R.E. (1995). Measurement invariance, predictive invariance, and the duality paradox. Multivariate Behavioral Research, 30, 577–605.
  34. Millsap, R.E. (1997). Invariance in measurement and prediction: Their relationship in the single-factor case. Psychological Methods, 2, 248–260.
  35. Millsap, R.E. (1998). Group differences in regression intercepts: Implications for factorial invariance. Multivariate Behavioral Research, 33, 403–424.
  36. Millsap, R.E., & Hartog, S.B. (1988). Alpha, beta, and gamma change in evaluation research: A structural equation approach. Journal of Applied Psychology, 73, 574–584.
  37. Millsap, R.E., & Kwok, O.M. (2004). Evaluating the impact of partial factorial invariance on selection in two populations. Psychological Methods, 9, 93–115.
  38. Millsap, R.E., & Meredith, W. (1992). Inferential conditions in the statistical detection of measurement bias. Applied Psychological Measurement, 16, 389–402.
  39. Neisser, U., Boodoo, G., Bouchard, T.J., Boykin, A.W., Brody, N., Ceci, S.J., Halpern, D.F., Loehlin, J.C., Perloff, R., Sternberg, R.J., & Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77–101.
  40. Pentz, M.A., & Chou, C. (1994). Measurement invariance in longitudinal clinical research assuming change from development and intervention. Journal of Consulting and Clinical Psychology, 62, 450–462.
  41. Potthoff, R.F. (1966). Statistical aspects of the problem of biases in psychological tests (Institute of Statistics Mimeo Series No. 479). Chapel Hill, NC: Department of Statistics, University of North Carolina.
  42. Riordan, C.R., Richardson, H.A., Schaffer, B.S., & Vandenberg, R.J. (2001). Alpha, beta, and gamma change: A review of past research with recommendations for new directions. In L.L. Neider & C. Schriesheim (Eds.), Equivalence in measurement (pp. 51–98). Greenwich: Information Age Publishing.
  43. Sackett, P.R., & Wilk, S.L. (1994). Within-group norming and other forms of score adjustment in preemployment testing. American Psychologist, 49, 929–954.
  44. Sackett, P.R., Schmitt, N., Ellington, J.E., & Kabin, M.B. (2001). High-stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative action world. American Psychologist, 56, 302–318.
  45. Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of over 85 years of research findings. Psychological Bulletin, 124, 262–274.
  46. Schmidt, F.L., Pearlman, K., & Hunter, J.E. (1980). The validity and fairness of employment and educational tests for Hispanic Americans: A review and analysis. Personnel Psychology, 33, 705–724.
  47. Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159–194.
  48. Society for Industrial/Organizational Psychology (2003). Principles for the application and use of personnel selection procedures. Bowling Green: Society for Industrial Organizational Psychology.
  49. Stout, W. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.
  50. Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 147–169). Hillsdale: Lawrence Erlbaum.
  51. Thomson, G.H., & Ledermann, W. (1939). The influence of multivariate selection on the factorial analysis of ability. British Journal of Psychology, 29, 288–305.
  52. Thurstone, L.L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.
  53. Zwick, R. (1990). When do item response function and Mantel–Haenszel definitions of differential item functioning coincide? Journal of Educational Statistics, 15, 185–197.

Copyright information

© The Psychometric Society 2007

Authors and Affiliations

  1. Department of Psychology, Arizona State University, Tempe, USA
