Evaluating the Psychometric Qualities of the National Board for Professional Teaching Standards' Assessments: A Methodological Accounting

  • Richard M. Jaeger

Abstract

In 1991, the National Board for Professional Teaching Standards established a Technical Analysis Group (TAG) responsible for conducting research on the measurement quality of its innovative performance assessments of classroom teachers. The TAG's measurement research agenda focused on four principal areas of inquiry and development: (1) validating the Board's assessments, (2) characterizing their reliability, (3) establishing performance standards for awarding candidate teachers National Board Certification, and (4) investigating the presence and degree of adverse impact and bias in the assessments. Because the National Board's assessments differed materially from the conventional tests previously used to assess teachers' knowledge and skills (for example, the National Teacher Examinations), textbook approaches to evaluating their measurement properties were largely inapplicable; new measurement methodology was required. This article summarizes the measurement strategies developed and employed by the TAG. Because investigations of adverse impact and bias in the National Board's assessments are described in another contribution to this journal issue, the article considers only the first three issues listed above. The article begins with a brief description of the structure of the National Board's assessments; a final section identifies some remaining measurement dilemmas and offers suggestions for further inquiry.

Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Richard M. Jaeger
  1. Center for Educational Research and Evaluation, University of North Carolina at Greensboro, Greensboro, NC
