Abstract
Recent legislation and federal regulations in education have heightened attention to issues of inclusion, fairness, equity, and access in achievement testing, resulting in a growing literature on item writing for accessibility. This chapter provides an overview of the fundamentals of item development, especially as they pertain to accessible assessments. Constructed-response and selected-response items are first introduced and compared, with examples. Next, the item development process and guidelines for effective item writing are presented. Empirical research examining the item development process is then reviewed for general education items and items modified for accessibility. Methods for evaluating item quality with regard to accessibility are summarized. Finally, recent innovations and technological enhancements in item development, administration, and scoring are discussed.
Copyright information
© 2018 Springer International Publishing AG
Cite this chapter
Albano, A.D., Rodriguez, M.C. (2018). Item Development Research and Practice. In: Elliott, S., Kettler, R., Beddow, P., Kurz, A. (eds) Handbook of Accessible Instruction and Testing Practices. Springer, Cham. https://doi.org/10.1007/978-3-319-71126-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71125-6
Online ISBN: 978-3-319-71126-3
eBook Packages: Behavioral Science and Psychology (R0)