Abstract
Recent legislation and federal regulations in education have heightened attention to issues of inclusion, fairness, equity, and access in achievement testing, resulting in a growing literature on item writing for accessibility. This chapter provides an overview of the fundamentals of item development, especially as they pertain to accessible assessments. Constructed-response and selected-response items are first introduced and compared, with examples. Next, the item development process and guidelines for effective item writing are presented. Empirical research examining the item development process is then reviewed for general education items and items modified for accessibility. Methods for evaluating item quality with regard to accessibility are summarized. Finally, recent innovations and technological enhancements in item development, administration, and scoring are discussed.
Copyright information
© 2018 Springer International Publishing AG
Cite this chapter
Albano, A.D., Rodriguez, M.C. (2018). Item Development Research and Practice. In: Elliott, S., Kettler, R., Beddow, P., Kurz, A. (eds) Handbook of Accessible Instruction and Testing Practices. Springer, Cham. https://doi.org/10.1007/978-3-319-71126-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71125-6
Online ISBN: 978-3-319-71126-3
eBook Packages: Behavioral Science and Psychology (R0)