Angoff, W.H. (1971). Scales, norms, and equivalent scores. In R.L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.
Coombs, C. (1964). A theory of data. New York: Wiley.
Crocker, L. (1997). Assessing the content representativeness of performance assessment exercises. Applied Measurement in Education, 10, 83–95.
Cronbach, L.J. (1995). Personal communication.
Ebel, R.L. (1972). Essentials of educational measurement. Englewood Cliffs, NJ: Prentice-Hall.
Edwards, W. (1977). How to use multiattribute utility measurements for social decision-making. IEEE Transactions on Systems, Man and Cybernetics, SMC-7, 326–340.
Edwards, W., & Newman, J.R. (1982). Multiattribute evaluation. Beverly Hills, CA: Sage Publications.
Haertel, E.H., Harnishfeger, A., Pifer, R.E., Wiley, D.E., & Woods, E.M. (1977). Achievement measures as Title I eligibility criteria: Concepts, methods, and eligibility estimation. Technical Report of the M-L Group for Policy Studies in Education, Chicago: CEMREL.
Hattie, J.A. (1996). Validating the specification of a complex content domain. Paper presented at the Annual Conference of the American Educational Research Association, New York.
Jaeger, R.M. (1982). An iterative structured judgment process for establishing standards on competency tests: Theory and application. Educational Evaluation and Policy Analysis, 4, 461–475.
Jaeger, R.M. (1994a). On the cognitive construction of standard-setting judgments: The case of configural scoring. Paper presented before the NCES/NAGB Conference on Standard-Setting Methodology, Washington, DC, October.
Jaeger, R.M. (1994b). Setting standards for complex performances: An iterative judgmental policy capturing strategy. Paper presented at the annual meeting of the American Psychological Society, Washington, DC, June.
Jaeger, R.M. (1995a). Setting standards for complex performances: An iterative judgmental policy capturing strategy. Educational Measurement: Issues and Practice, 14, 16–20.
Jaeger, R.M. (1995b). Setting performance standards through two-stage judgmental policy capturing. Applied Measurement in Education, 8, 15–40.
Jaeger, R.M., Hambleton, R.K., & Plake, B.S. (1995, April). Eliciting configural performance standards through a sequenced application of complementary methods. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
Linn, R.L., & Miller, M.D. (1986). Review of test validation procedures and results. In R.M. Jaeger, J.C. Busch, L. Bond, R.L. Linn, M.D. Miller, J. Millman, R.G. O'Sullivan & R. Traub, An evaluation of the Georgia teacher certification testing program. Greensboro, NC: Center for Educational Research and Evaluation, University of North Carolina at Greensboro.
Livingston, S.A., & Lewis, C. (1995). Estimating the consistency and accuracy of classifications based on test scores. Journal of Educational Measurement, 32, 179–197.
Lord, F.M. (1965). A strong true-score theory, with applications. Psychometrika, 30, 239–270.
Nedelsky, L. (1954). Absolute grading standards for objective tests. Educational and Psychological Measurement, 14, 3–19.
Pearlman, M.A. (1997, March). What technology cannot offer for setting standards on complex performance examinations. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
Pitz, G.F., & Sachs, N.J. (1984). Judgment and decision: Theory and application. Annual Review of Psychology, 35, 139–163.
Plake, B.S., Hambleton, R.K., & Jaeger, R.M. (1997). A new standard-setting method for performance assessments: The dominant profile judgment method and some field test results. Educational and Psychological Measurement, 57, 400–411.
Putnam, S.E., Pence, P., & Jaeger, R.M. (1995). A multi-stage dominant profile method for setting standards on complex performance assessments. Applied Measurement in Education, 8, 57–83.
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72–101.
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 271–295.
Standards for Educational and Psychological Testing. (1985). Washington, DC: American Psychological Association.
Traub, R.E., Haertel, E.H., & Shavelson, R. (1996, April). The effects of measurement error on the trustworthiness of examinee classifications. Paper presented at the annual meeting of the American Educational Research Association, New York.
U.S. Department of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, August 25, 1978.