Part of the book series: Evaluation in Education and Human Services Series ((EEHS,volume 28))

Abstract

One of the major changes in the testing field over the last 20 years has been the increased interest in and use of criterion-referenced tests (CRTs). Criterion-referenced tests provide a basis for assessing the performance of examinees in relation to well-defined domains of content rather than in relation to other examinees, as with norm-referenced tests. Criterion-referenced tests are now widely used (1) in the armed services, to assess the competencies of servicemen; (2) in industry, to assess the job skills of employees and to evaluate the results of training programs; (3) in the licensing and certification fields, to distinguish “masters” from “nonmasters” in over 900 professions in the United States alone; and (4) in educational settings such as schools, colleges, and universities, to assess the performance levels of students on competencies of interest.
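The distinction drawn above can be illustrated with a small sketch: a criterion-referenced interpretation compares each examinee to a fixed domain cut score, while a norm-referenced interpretation reports standing relative to the other examinees. The function names, the 20-item test, and the 80% cut score below are invented for illustration only.

```python
# Hypothetical sketch of the two score interpretations described above.
# The 80% cut score and the 20-item test are illustrative assumptions.

def criterion_referenced(score: int, n_items: int, cut: float = 0.80) -> str:
    """Classify an examinee against a fixed domain cut score."""
    return "master" if score / n_items >= cut else "nonmaster"

def norm_referenced(score: int, all_scores: list) -> float:
    """Report standing relative to other examinees (percentile rank)."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)

scores = [12, 15, 17, 18, 19]  # five examinees on a 20-item test
print(criterion_referenced(17, 20))        # -> master (17/20 = 0.85 >= 0.80)
print(norm_referenced(17, scores))         # -> 40.0 (two of five scored lower)
```

Note that the criterion-referenced classification of an examinee is unchanged by how the rest of the group performs, whereas the percentile rank depends entirely on it.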



Editor information

Ronald K. Hambleton, Jac N. Zaal


Copyright information

© 1991 Springer Science+Business Media New York

About this chapter

Cite this chapter

Hambleton, R.K., Rogers, H.J. (1991). Advances in Criterion-Referenced Measurement. In: Hambleton, R.K., Zaal, J.N. (eds) Advances in Educational and Psychological Testing: Theory and Applications. Evaluation in Education and Human Services Series, vol 28. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-2195-5_1

  • DOI: https://doi.org/10.1007/978-94-009-2195-5_1

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-7484-1

  • Online ISBN: 978-94-009-2195-5

  • eBook Packages: Springer Book Archive
