
Item Development Research and Practice

  • Anthony D. Albano
  • Michael C. Rodriguez
Chapter

Abstract

Recent legislation and federal regulations in education have increased attention to issues of inclusion, fairness, equity, and access in achievement testing, resulting in a growing literature on item writing for accessibility. This chapter provides an overview of the fundamentals of item development, especially as they pertain to accessible assessments. Constructed-response and selected-response items are first introduced and compared, with examples. Next, the item development process and guidelines for effective item writing are presented. Empirical research examining the item development process is then reviewed, both for general education items and for items modified for accessibility. Methods for evaluating item quality with regard to accessibility are summarized. Finally, recent innovations and technological enhancements in item development, administration, and scoring are discussed.

Keywords

Item writing · Constructed-response items · Selected-response items · Item development process · Modified items


Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. University of Nebraska-Lincoln, Lincoln, USA
  2. University of Minnesota-Twin Cities, Minneapolis, USA
