Abstract
In 1991 the National Board for Professional Teaching Standards established a Technical Analysis Group (TAG) with responsibility for conducting research on the measurement quality of its innovative performance assessments of classroom teachers. The TAG's measurement research agenda focused on four principal areas of inquiry and development: (1) validating the Board's assessments, (2) characterizing the reliability of those assessments, (3) establishing standards of performance for awarding National Board Certification to candidate teachers, and (4) investigating the presence and degree of adverse impact and bias in the assessments. Because the National Board's assessments differed materially from the conventional tests previously used to assess teachers' knowledge and skills (for example, the National Teacher Examinations), textbook approaches to evaluating their measurement properties were largely inapplicable. New measurement methodology was thus required. This article summarizes the measurement strategies developed and employed by the TAG. Because investigations of adverse impact and bias in the National Board's assessments are described in another contribution to this journal issue, the article considers only the first three areas listed above. It begins with a brief description of the structure of the National Board's assessments. A final section identifies some remaining measurement dilemmas and offers suggestions for additional inquiry.
Cite this article
Jaeger, R.M. Evaluating the Psychometric Qualities of the National Board for Professional Teaching Standards' Assessments: A Methodological Accounting. Journal of Personnel Evaluation in Education 12, 189–210 (1998). https://doi.org/10.1023/A:1008085128230
DOI: https://doi.org/10.1023/A:1008085128230