
Evaluating the Psychometric Qualities of the National Board for Professional Teaching Standards' Assessments: A Methodological Accounting

Journal of Personnel Evaluation in Education

Abstract

In 1991 the National Board for Professional Teaching Standards established a Technical Analysis Group (TAG) with responsibility for conducting research on the measurement quality of its innovative performance assessments of classroom teachers. The TAG's measurement research agenda focused on four principal areas of inquiry and development: (1) validation of the Board's assessments, (2) characterization of the reliability of the Board's assessments, (3) establishment of performance standards for awarding candidate teachers National Board Certification, and (4) investigation of the presence and degree of adverse impact and bias in the Board's assessments. Because the National Board's assessments differed materially from the conventional tests previously used to assess teachers' knowledge and skills (for example, the National Teacher Examinations), textbook approaches to evaluating their measurement properties were largely inapplicable, and new measurement methodology was required. This article summarizes the measurement strategies the TAG developed and employed. Because investigations of adverse impact and bias in the National Board's assessments are described in another contribution to this journal issue, this article considers only the first three issues. It begins with a brief description of the structure of the National Board's assessments; a final section identifies some remaining measurement dilemmas and suggests directions for further inquiry.
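The abstract describes the Board's certification decision only in outline; the sketch below is a purely illustrative Python fragment, not the Board's actual procedure. It assumes a hypothetical set of exercise weights, a 1-to-4 rubric scale, a cut score, and a standard error of measurement, and it shows how two of the issues above interact: a weighted composite of exercise scores is compared against a performance standard, and a simple error simulation estimates how consistently the resulting pass/fail decision would be reproduced under remeasurement.

```python
import random

# Hypothetical exercise weights and cut score; the abstract does not give
# the values actually used for National Board Certification.
WEIGHTS = {
    "portfolio_1": 0.20,
    "portfolio_2": 0.20,
    "assessment_center_1": 0.30,
    "assessment_center_2": 0.30,
}
CUT_SCORE = 2.75  # illustrative performance standard on an assumed 1-4 scale


def composite(scores):
    """Weighted composite of a candidate's exercise scores."""
    return sum(WEIGHTS[ex] * s for ex, s in scores.items())


def certify(scores):
    """Pass/fail decision: composite meets or exceeds the standard."""
    return composite(scores) >= CUT_SCORE


def decision_consistency(scores, sem=0.25, n_reps=10_000, seed=1):
    """Crude simulation of decision consistency: perturb each exercise
    score with normal measurement error (assumed SEM) and report the
    proportion of replications reproducing the observed decision."""
    rng = random.Random(seed)
    observed = certify(scores)
    agree = sum(
        certify({ex: s + rng.gauss(0.0, sem) for ex, s in scores.items()}) == observed
        for _ in range(n_reps)
    )
    return agree / n_reps


if __name__ == "__main__":
    candidate = {
        "portfolio_1": 3.00,
        "portfolio_2": 2.50,
        "assessment_center_1": 3.25,
        "assessment_center_2": 2.75,
    }
    print(f"composite = {composite(candidate):.3f}, certified = {certify(candidate)}")
    print(f"estimated decision consistency = {decision_consistency(candidate):.3f}")
```

A candidate whose composite sits near the cut score will show much lower simulated consistency than one far from it, which is one reason the reliability of the classification decision, rather than of the scores alone, matters for certification programs.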




Cite this article

Jaeger, R.M. Evaluating the Psychometric Qualities of the National Board for Professional Teaching Standards' Assessments: A Methodological Accounting. Journal of Personnel Evaluation in Education 12, 189–210 (1998). https://doi.org/10.1023/A:1008085128230
