Social Indicators Research, 103:219

Validity and the Consequences of Test Interpretation and Use

  • Anita M. Hubley
  • Bruno D. Zumbo


The vast majority of measures have, at their core, a purpose of personal and social change. If test developers and users want measures to have personal and social consequences and impact, then it is critical to consider the consequences and side effects of measurement in the validation process itself. The consequential basis of test interpretation and use, as introduced in Messick’s (1989) progressive matrix model of unified validity theory, has been misunderstood by many measurement experts, test developers, researchers, and practitioners. The purposes of this paper were to (a) review Messick’s unified view of validity and clarify his consequential basis of test interpretation and use, (b) discuss the kinds of questions evoked by value implications and social consequences and their role in construct validity and score meaning, (c) present a reframing of Messick’s model and a new model of unified validity and validation, (d) bring the concept of multilevel measures under the same validation umbrella as individual differences measures, and (e) offer some thoughts and directions for more explicit consideration of value implications, intended social consequences, and unintended side effects of legitimate test interpretation and use. This paper has implications for the interpretation, use, and validation of both individual differences and multilevel measures in education, psychology, and health contexts.


Consequential validity; Early development instrument; Educational achievement; Psychological assessment; Testing; Multilevel measures; Social consequences; Test interpretation; Validity; Value implications; Values


  1. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
  2. Anastasi, A. (1986). Evolving concepts of test validation. Annual Review of Psychology, 37, 1–15.
  3. Brennan, R. L. (2006). Perspectives on the evolution and future of educational measurement. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 1–16). Westport, CT: American Council on Education/Praeger.
  4. Cizek, G. J., Rosenberg, S., & Koons, H. (2008). Sources of validity evidence for educational and psychological tests. Educational and Psychological Measurement, 68, 397–412.
  5. Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443–507). Washington, DC: American Council on Education.
  6. Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. Braun (Eds.), Test validity (pp. 3–17). Hillsdale, NJ: Lawrence Erlbaum.
  7. Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
  8. Forer, B., & Zumbo, B. D. (2011). Validation of multilevel constructs: Validation methods and empirical findings for the EDI. Social Indicators Research. doi: 10.1007/s11205-011-9844-3.
  9. Hubley, A. M., & Zumbo, B. D. (1996). A dialectic on validity: Where we have been and where we are going. The Journal of General Psychology, 123, 207–215.
  10. Janus, M. (2006). Early Development Instrument: An indicator of developmental health at school entry. Monograph from the proceedings of the International Conference on Measuring Early Child Development, Vaudreuil Quebec.
  11. Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Washington, DC: American Council on Education and National Council on Measurement in Education.
  12. Linn, R. L. (1997). Evaluating the validity of assessments: The consequences of use. Educational Measurement: Issues and Practice, 16, 14–16.
  13. Linn, R. L. (2006). Validity of inferences from test-based educational accountability systems. Journal of Personnel Evaluation in Education, 19, 5–15.
  14. Linn, R. L. (2008). Validation of uses and interpretations of state assessments. Washington, DC: Council of Chief State School Officers.
  15. Linn, R. L. (2009). The concept of validity in the context of NCLB. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 195–212). Charlotte, NC: IAP—Information Age Publishing, Inc.
  16. Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports (Monograph Supplement), 3, 635–694.
  17. Mehrens, W. A. (1997). The consequences of consequential validity. Educational Measurement: Issues and Practice, 16, 16–18.
  18. Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012–1027.
  19. Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York, NY: Macmillan.
  20. Messick, S. (1995). Validity of psychological assessment. Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
  21. Messick, S. (1998). Test validity: A matter of consequences. Social Indicators Research, 45, 35–44.
  22. Messick, S. (2000). Consequences of test interpretation and use: The fusion of validity and values in psychological assessment. In R. D. Goffin & E. Helmes (Eds.), Problems and solutions in human assessment: Honoring Douglas N. Jackson at seventy (pp. 3–20). Boston: Kluwer Academic Publishers.
  23. Popham, W. J. (1997). Consequential validity: Right concern–wrong concept. Educational Measurement: Issues and Practice, 16, 9–13.
  24. Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401.
  25. Shepard, L. A. (1997). The centrality of test use and consequences for test validity. Educational Measurement: Issues and Practice, 16, 5–8, 13, 24.
  26. Willingham, W. W. (2002). Seeking fair alternatives in construct design. In H. I. Braun, D. N. Jackson, D. E. Wiley, & S. Messick (Eds.), The role of constructs in psychological and educational measurement. Mahwah, NJ: Lawrence Erlbaum.
  27. Willingham, W. W., & Cole, N. J. (1997). Gender and fair assessment. Mahwah, NJ: Lawrence Erlbaum.
  28. Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, vol. 26: Psychometrics (pp. 45–79). The Netherlands: Elsevier Science B.V.
  29. Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 65–82). Charlotte, NC: IAP—Information Age Publishing, Inc.
  30. Zumbo, B. D., & Forer, B. (2011). Testing and measurement from a multilevel view: Psychometrics and validation. In J. A. Bovaird, K. Geisinger, & C. Buckendahl (Eds.), High stakes testing in education: Science and practice in K-12 settings [Festschrift to Barbara Plake]. Washington, DC: American Psychological Association Press (in press).

Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  1. Department of ECPS, University of British Columbia, Vancouver, Canada
