Invariance in Measurement and Prediction Revisited

Millsap, Roger E.

doi:10.1007/s11336-007-9039-7

Presidential Address
Published: 16 November 2007

Psychometrika, Volume 72, pages 461–473 (2007)

Roger E. Millsap¹

Abstract

Borsboom (Psychometrika, 71:425–440, 2006) noted that recent work on measurement invariance (MI) and predictive invariance (PI) has had little impact on the practice of measurement in psychology. To understand this contention, the definitions of MI and PI are reviewed, followed by results on the consistency between the two forms of invariance in the general case. The special parametric cases of factor analysis (strict factorial invariance) and linear regression analyses (strong regression invariance) are then described, along with findings on the inconsistency between the two forms of invariance in this context. Two numerical examples of inconsistency are reviewed in detail. The impact of violations of MI on accuracy of selection is illustrated. Finally, reasons for the slow dissemination of work on invariance are discussed, and the prospects for altering this situation are weighed.

References

Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67–91.
Ahmavaara, Y. (1954). The mathematical theory of factorial invariance under selection. Psychometrika, 19, 27–38.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education Joint Committee on Standards for Educational and Psychological Testing (1999). Standards for educational and psychological testing. Washington: AERA.
Birnbaum, M.H. (1979). Procedures for the detection and correction of salary inequities. In T.R. Pezzullo & B.E. Brittingham (Eds.), Salary equity (pp. 121–144). Lexington: Lexington Books.
Bloxom, B. (1972). Alternative approaches to factorial invariance. Psychometrika, 37, 425–440.
Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71, 425–440.
Bridgeman, B., & Lewis, C. (1996). Gender differences in college mathematics grades and SAT-M scores: A reanalysis of Wainer and Steinberg. Journal of Educational Measurement, 33, 257–270.
Brown, C.H., & Liao, J. (1999). Principles for designing randomized preventive trials in mental health: An emerging developmental epidemiology paradigm. American Journal of Community Psychology, 27, 673–710.
Byrne, B.M. (1994). Testing for factorial validity, replication, and invariance of a measuring instrument: A paradigmatic application based on the Maslach Burnout Inventory. Multivariate Behavioral Research, 29, 289–311.
Clark, L.A. (2006). When a psychometric advance falls in the forest. Psychometrika, 71, 447–450.
Cleary, T.A. (1968). Test bias: Prediction of grades of Negro and white students in integrated colleges. Journal of Educational Measurement, 5, 115–124.
Drasgow, F., & Probst, T.A. (2004). The psychometrics of adaptation: Evaluating measurement equivalence across languages and cultures. In R.K. Hambleton, P.F. Merenda, & C.D. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 265–296). Hillsdale: Lawrence Erlbaum.
Gottfredson, L.S. (1994). The science and politics of race-norming. American Psychologist, 49, 955–963.
Hambleton, R.K., Merenda, P.F., & Spielberger, C.D. (2006). Adapting educational and psychological tests for cross-cultural assessment. Hillsdale: Lawrence Erlbaum.
Hartigan, J.A., & Wigdor, A.K. (1989). Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery. Washington: National Academy Press.
Hofer, S.M., Horn, J.L., & Eber, H.W. (1997). A robust five-factor structure of the 16PF: Strong evidence from independent rotation and confirmatory factorial invariance procedures. Personality and Individual Differences, 23, 247–269.
Horn, J.L., & McArdle, J.J. (1992). A practical guide to measurement invariance in research on aging. Experimental Aging Research, 18, 117–144.
Humphreys, L.G. (1952). Individual differences. Annual Review of Psychology, 3, 131–150.
Humphreys, L.G. (1986). An analysis and evaluation of test and item bias in the prediction context. Journal of Applied Psychology, 71, 327–333.
Hunter, J.E., & Schmidt, F.L. (2000). Racial and gender bias in ability and achievement tests: Resolving the apparent paradox. Psychology, Public Policy, and Law, 6, 151–158.
Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.
Kok, F. (1988). Item bias and test multidimensionality. In R. Langeheine & J. Rost (Eds.), Latent trait and latent class models (pp. 263–275). New York: Plenum.
Krakowski, M., & Czobor, P. (2004). Gender differences in violent behaviors: Relationship to clinical symptoms and psychosocial factors. American Journal of Psychiatry, 161, 459–465.
Lehmann, E.L. (1986). Testing statistical hypotheses. New York: Wiley.
Linn, R.L. (1984). Selection bias: Multiple meanings. Journal of Educational Measurement, 21, 33–47.
Linn, R.L., & Werts, C.E. (1971). Considerations for studies of test bias. Journal of Educational Measurement, 8, 1–4.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale: Lawrence Erlbaum.
Mellenbergh, G.J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.
Meredith, W. (1964a). Notes on factorial invariance. Psychometrika, 29, 177–185.
Meredith, W. (1964b). Rotation to achieve factorial invariance. Psychometrika, 29, 187–206.
Meredith, W. (1993). Measurement invariance, factor analysis, and factorial invariance. Psychometrika, 58, 525–543.
Meredith, W., & Millsap, R.E. (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57, 289–311.
Millsap, R.E. (1995). Measurement invariance, predictive invariance, and the duality paradox. Multivariate Behavioral Research, 30, 577–605.
Millsap, R.E. (1997). Invariance in measurement and prediction: Their relationship in the single-factor case. Psychological Methods, 2, 248–260.
Millsap, R.E. (1998). Group differences in regression intercepts: Implications for factorial invariance. Multivariate Behavioral Research, 33, 403–424.
Millsap, R.E., & Hartog, S.B. (1988). Alpha, beta, and gamma change in evaluation research: A structural equation approach. Journal of Applied Psychology, 73, 574–584.
Millsap, R.E., & Kwok, O.M. (2004). Evaluating the impact of partial factorial invariance on selection in two populations. Psychological Methods, 9, 93–115.
Millsap, R.E., & Meredith, W. (1992). Inferential conditions in the statistical detection of measurement bias. Applied Psychological Measurement, 16, 389–402.
Neisser, U., Boodoo, G., Bouchard, T.J., Boykin, A.W., Brody, N., Ceci, S.J., Halpern, D.F., Loehlin, J.C., Perloff, R., Sternberg, R.J., & Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77–101.
Pentz, M.A., & Chou, C. (1994). Measurement invariance in longitudinal clinical research assuming change from development and intervention. Journal of Consulting and Clinical Psychology, 62, 450–462.
Potthoff, R.F. (1966). Statistical aspects of the problem of biases in psychological tests (Institute of Statistics Mimeo Series No. 479). Chapel Hill, NC: Department of Statistics, University of North Carolina.
Riordan, C.R., Richardson, H.A., Schaffer, B.S., & Vandenberg, R.J. (2001). Alpha, beta, and gamma change: A review of past research with recommendations for new directions. In L.L. Neider & C. Schriesheim (Eds.), Equivalence in measurement (pp. 51–98). Greenwich: Information Age Publishing.
Sackett, P.R., & Wilk, S.L. (1994). Within-group norming and other forms of score adjustment in preemployment testing. American Psychologist, 49, 929–954.
Sackett, P.R., Schmitt, N., Ellingson, J.E., & Kabin, M.B. (2001). High-stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative action world. American Psychologist, 56, 302–318.
Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of over 85 years of research findings. Psychological Bulletin, 124, 262–274.
Schmidt, F.L., Pearlman, K., & Hunter, J.E. (1980). The validity and fairness of employment and educational tests for Hispanic Americans: A review and analysis. Personnel Psychology, 33, 705–724.
Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159–194.
Society for Industrial/Organizational Psychology (2003). Principles for the application and use of personnel selection procedures. Bowling Green: Society for Industrial Organizational Psychology.
Stout, W. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 147–169). Hillsdale: Lawrence Erlbaum.
Thomson, G.H., & Ledermann, W. (1939). The influence of multivariate selection on the factorial analysis of ability. British Journal of Psychology, 29, 288–305.
Thurstone, L.L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.
Zwick, R. (1990). When do item response function and Mantel–Haenszel definitions of differential item functioning coincide? Journal of Educational Statistics, 15, 185–197.

Author information

Authors and Affiliations

Department of Psychology, Arizona State University, Box 871104, Tempe, AZ 85287-1104, USA
Roger E. Millsap

Corresponding author

Correspondence to Roger E. Millsap.

Additional information

This paper is based on the Presidential Address given at the International Meeting of the Psychometric Society in Tokyo, Japan, on July 11, 2007. This research was supported by National Institute of Mental Health grants 1P30 MH 068685-01A1 and RO1 MH64707-01.

About this article

Cite this article

Millsap, R.E. Invariance in Measurement and Prediction Revisited. Psychometrika 72, 461–473 (2007). https://doi.org/10.1007/s11336-007-9039-7

Received: 30 August 2007
Revised: 30 August 2007
Published: 16 November 2007
Issue Date: December 2007
DOI: https://doi.org/10.1007/s11336-007-9039-7
