American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508–600). Washington, DC: American Council on Education.
Berk, R. A. (1986). A consumer’s guide to setting performance standards on criterion-referenced tests. Review of Educational Research, 56, 137–172.
Busch, J. C., & Jaeger, R. M. (1990). Influence of type of judge, normative information, and discussion on standards recommended for the National Teacher Examinations. Journal of Educational Measurement, 27, 145–163.
Cizek, G. J. (2001). Conjectures on the rise and fall of standard setting: An introduction to context and practice. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 3–17). Mahwah: Lawrence Erlbaum.
Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks: Sage Publications Ltd.
Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge: Cambridge University Press. http://www.coe.int/T/DG4/Linguistic/Default_en.asp. Retrieved Nov 2013.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah: Erlbaum.
Downing, S. M., & Haladyna, T. M. (2006). Handbook of test development. Mahwah: Erlbaum.
Feskens, R., Keuning, J., Van Til, A., & Verheyen, R. (2014). Performance standards for the CEFR in Dutch secondary education: An international standard setting study. Arnhem: Cito.
Finn, R. H. (1970). A note on estimating the reliability of categorical data. Educational and Psychological Measurement, 30, 71–76.
Goodwin, L. D. (1999). Relations between observed item difficulty levels and Angoff minimum passing levels for a group of borderline candidates. Applied Measurement in Education, 12(1), 13–28.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27, 857–871.
Hambleton, R. K., & Plake, B. S. (1995). Using an extended Angoff procedure to set standards on complex performance assessments. Applied Measurement in Education, 8, 41–55.
Hambleton, R. K., & Pitoniak, M. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 433–470). Westport: Praeger.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park: Sage.
Hambleton, R. K., Jaeger, R. M., Plake, B. S., & Mills, C. N. (2000). Handbook for setting standards on performance assessments. Washington, DC: Council of Chief State School Officers.
Impara, J. C., & Plake, B. S. (1997). Standard setting: An alternative approach. Journal of Educational Measurement, 34, 353–366.
Jaeger, R. M. (1978). A proposal for setting a standard on the North Carolina High School competency test. Paper presented at the 1978 spring meeting of the North Carolina Association for Research in Education, Chapel Hill.
Jaeger, R. (1989). Certification of student competence. In R. Linn (Ed.), Educational measurement (pp. 485–511). Washington, DC: American Council on Education.
Kaftandjieva, F. (2004). Methods for setting cut scores in criterion-referenced achievement tests. A comparative analysis of six recent methods with an application to tests of reading in EFL. Arnhem: Cito.
Kane, M. (1998). Choosing between examinee-centered and test-centered standard-setting methods. Educational Assessment, 5, 129–145.
Karantonis, A., & Sireci, S. G. (2006). The bookmark standard-setting method: A literature review. Educational Measurement: Issues and Practice, 25(1), 4–12.
Landis, J., & Koch, G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Lewis, D. M., Mitzel, H. C., & Green, D. R. (1996). Standard setting: A bookmark approach. In D. R. Green (Chair), IRT-based standard setting procedures utilizing behavioural anchoring. Symposium conducted at the Council of Chief State School Officers National Conference on Large-scale Assessment, Phoenix, AZ.
Lewis, D. M., Mitzel, H. C., Green, D. R., & Patz, R. J. (1999). The bookmark standard setting procedure. Monterey: McGraw-Hill.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–16.
Pitoniak, M. J., Hambleton, R. K., & Sireci, S. G. (2002). Advances in standard setting for professional licensure examinations. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA, April 2002.
Reckase, M. D. (2006). A conceptual framework for a psychometric theory for standard setting with examples of its use for evaluating the functioning of two standard setting methods. Educational Measurement: Issues and Practice, 25(2), 4–18.
Sireci, S. G., Hambleton, R. K., Huff, K. L., & Jodoin, M. G. (2000). Setting and validating standards on Microsoft certified professional examinations (Laboratory of Psychometric and Evaluative Research Report No. 395). Amherst: University of Massachusetts, School of Education.
Sireci, S. G., Hambleton, R. K., & Pitoniak, M. J. (2004). Setting passing scores on licensure exams using direct consensus. CLEAR Exam Review, 15(1), 21–25.
Van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York: Springer.
Verhelst, N. D., & Glas, C. A. W. (1995). The generalized one parameter model: OPLM. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Their foundations, recent developments and applications (pp. 215–238). New York: Springer.
Woehr, D. J., Arthur, W., & Fehrmann, M. L. (1991). An empirical comparison of cut-off score methods for content-related and criterion-related validity settings. Educational and Psychological Measurement, 51, 1029–1039.
Zieky, M. J., Perie, M., & Livingston, S. (2008). Cutscores: A manual for setting standards of performance on educational and occupational tests. http://www.amazon.com/Cutscores-Standards-Performance-Educational Occupational/dp/1438250304/