Abstract
This paper contrasts three methods of item analysis for multiple-choice items, based respectively on classical test theory, generalized linear modeling, and item response theory. The methods are illustrated with contrived and real data. Specifically, the classical test theory method uses a cross-classification table, the generalized linear modeling method uses a new baseline-category logit model, and the item response theory method uses a multiple-choice model. The advantages and disadvantages of each method are discussed.
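The three kinds of computation named above can be sketched briefly. The Python below is a minimal, hedged illustration and not the authors' implementation. All response data and parameter values are hypothetical, and the split into a low and a high score group at a total score of 6 is arbitrary. The second function uses the standard baseline-category logit form described by Agresti (2007), log(π_j/π_baseline) = α_j + β_j·x, with total score x as the predictor. The paper's new baseline-category logit model may be specified differently. The third function assumes that the multiple-choice model is the one of Thissen and Steinberg (1984), in which a latent "don't know" category 0 is spread over the observed options by proportions d_k. Function names are illustrative only.

```python
import math
from collections import Counter

# Hypothetical data for one four-option item: (total test score, option chosen).
RESPONSES = [
    (3, "A"), (4, "C"), (5, "B"), (6, "C"), (7, "B"),
    (8, "B"), (9, "D"), (10, "B"), (11, "B"), (12, "B"),
]
OPTIONS = ["A", "B", "C", "D"]


def score_group(score, cuts):
    """Return the score-group index for ascending cut points.

    Group i holds scores <= cuts[i] not placed in an earlier group; the last
    group holds scores above the final cut.
    """
    for i, cut in enumerate(cuts):
        if score <= cut:
            return i
    return len(cuts)


def cross_classify(responses, options, cuts):
    """Option-by-score-group counts: one row per group, one column per option."""
    counts = Counter((score_group(s, cuts), opt) for s, opt in responses)
    return [[counts[(g, opt)] for opt in options] for g in range(len(cuts) + 1)]


def baseline_category_probs(x, params, baseline):
    """Option probabilities under a baseline-category logit model.

    params maps each non-baseline option j to (alpha_j, beta_j), so that
    log(pi_j / pi_baseline) = alpha_j + beta_j * x.
    """
    eta = {opt: a + b * x for opt, (a, b) in params.items()}
    denom = 1.0 + sum(math.exp(v) for v in eta.values())
    probs = {opt: math.exp(v) / denom for opt, v in eta.items()}
    probs[baseline] = 1.0 / denom
    return probs


def multiple_choice_probs(theta, a, c, d):
    """Option probabilities under the Thissen-Steinberg multiple-choice model.

    a, c: slopes and intercepts for categories 0..m, where category 0 is the
    latent "don't know" category; d: proportions d_1..d_m (summing to 1) with
    which "don't know" examinees are spread over the observed options 1..m.
    """
    z = [ak * theta + ck for ak, ck in zip(a, c)]
    denom = sum(math.exp(v) for v in z)
    dont_know = math.exp(z[0])
    return [(math.exp(z[k]) + d[k - 1] * dont_know) / denom for k in range(1, len(z))]


if __name__ == "__main__":
    # Low group: scores <= 6; high group: scores >= 7 (an arbitrary split).
    table = cross_classify(RESPONSES, OPTIONS, cuts=[6])
    for label, row in zip(["low", "high"], table):
        print(label, dict(zip(OPTIONS, row)))

    # Hypothetical coefficients, with option B (assumed keyed) as the baseline.
    params = {"A": (1.0, -0.4), "C": (1.5, -0.3), "D": (-1.0, -0.1)}
    for score in (4, 8, 12):
        probs = baseline_category_probs(score, params, baseline="B")
        print(score, {opt: round(probs[opt], 3) for opt in OPTIONS})

    # Hypothetical multiple-choice-model parameters for categories 0..4.
    a = [-1.0, -0.5, 1.2, 0.1, 0.2]
    c = [0.5, 0.2, 0.0, -0.3, -0.4]
    d = [0.25, 0.25, 0.25, 0.25]
    for theta in (-2.0, 0.0, 2.0):
        probs = multiple_choice_probs(theta, a, c, d)
        print(theta, dict(zip(OPTIONS, (round(p, 3) for p in probs))))
```

For these made-up responses, the cross-classification gives counts of 1, 1, 2, 0 (low group) and 0, 5, 0, 1 (high group) for options A to D. The example slopes and intercepts for the multiple-choice model each sum to zero, which is one common identification constraint for that model.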
Change history
03 April 2021
A Correction to this paper has been published: https://doi.org/10.1007/s41237-021-00137-9
References
Agresti A (2007) An introduction to categorical data analysis, 2nd edn. John Wiley & Sons, Hoboken, NJ
Allen MJ, Yen WM (1999) Introduction to measurement theory. Waveland Press, Prospect Heights, IL
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014) Standards for educational and psychological testing. American Educational Research Association, Washington, DC
Baker FB (1964) An intersection of test score interpretation and item analysis. J Educational Measurement 1:23–28
Bandalos DL (2018) Measurement theory and applications for the social sciences. The Guilford Press, New York, NY
Bloom BS (ed) (1956) Taxonomy of educational objectives–Handbook 1: Cognitive domain. Longman, New York, NY
Bloom BS, Madaus GF, Hastings JT (1981) Evaluation to improve learning. McGraw-Hill, New York, NY
Bock RD (1972) Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika 37:29–51
Cangelosi JS (1990) Designing tests for evaluating student achievement. Longman, White Plains, NY
Chase CI (1999) Contemporary assessment for educators. Longman, New York, NY
Coleman JS et al (1966) Equality of educational opportunity. U.S. Government Printing Office, Washington, DC
Crocker L, Algina J (1987) Introduction to classical and modern test theory. Holt, Rinehart and Winston, New York, NY
de Ayala RJ (2009) The theory and practice of item response theory. The Guilford Press, New York, NY
Dodge Y (ed) (2003) The Oxford dictionary of statistical terms. Oxford University Press, Oxford, Great Britain
Domino G (2000) Psychological testing: An introduction. Prentice Hall, Upper Saddle River, NJ
DuBois PH (1970) A history of psychological testing. Allyn and Bacon, Boston, MA
Ebel RL (1972) Essentials of educational measurement. Prentice-Hall, Englewood Cliffs, NJ
Gregory RJ (2000) Psychological testing: History, principles, and applications, 3rd edn. Allyn and Bacon, Boston, MA
Haladyna TM (1994) Developing and validating multiple-choice test items. Lawrence Erlbaum Associates, Hillsdale, NJ
Haladyna TM, Downing SM (1989) A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education 2:37–50
Henrysson S (1971) Gathering, analyzing, and using data on test items. In: Thorndike RL (ed) Educational measurement, 2nd edn. American Council on Education, Washington, DC
Holland P, Wainer H (eds) (2012) Differential item functioning. Lawrence Erlbaum Associates, Hillsdale, NJ
Kelly FJ (1915) The Kansas Silent Reading Test. Kansas State Printing Plant, Topeka, KS
Kelly FJ (1916) The Kansas Silent Reading Tests. J Educational Psychology 7:63–80
Lane S, Raymond MR, Haladyna TM (eds) (2016) Handbook of test development, 2nd edn. Routledge, New York, NY
Linn RL, Miller MD (2005) Measurement and assessment in teaching, 9th edn. Pearson Education, Upper Saddle River, NJ
McMillan JH (2011) Classroom assessment: Principles and practice for effective standards-based instruction, 5th edn. Pearson Education, Boston, MA
Nelson D (2004) The Penguin dictionary of statistics. Penguin Books, London, England
Nitko AJ (2004) Educational assessment of students, 4th edn. Pearson Education, Upper Saddle River, NJ
Nitko AJ, Brookhart SM (2007) Educational assessment of students, 5th edn. Pearson Education, Upper Saddle River, NJ
Oosterhof A (2003) Developing and using classroom assessments, 3rd edn. Pearson Education, Upper Saddle River, NJ
Ostini R, Finkelman M, Nering M (2015) Selecting among polytomous IRT models. In: Reise SP, Revicki DA (eds) Handbook of item response theory modeling: Applications to typical performance assessment. Routledge, New York, NY, pp 285–304
Payne DA (2002) Applied educational assessment, 2nd edn. Wadsworth, Belmont, CA
Preston KSJ, Reise SP (2015) Detecting faulty within-item category functioning with the nominal response model. In: Reise SP, Revicki DA (eds) Handbook of item response theory modeling: Applications to typical performance assessment. Routledge, New York, NY, pp 386–405
Rasch G (1960) Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research, Copenhagen, Denmark (Expanded edn, 1980, University of Chicago Press, Chicago, IL)
Samejima F (1979) A new family of models for the multiple-choice item (Research Report No. 79-4). Department of Psychology, University of Tennessee, Knoxville, TN
Samelson F (1990) Was early mental testing (a) racist inspired, (b) objective science, (c) a technology for democracy, (d) the origin of multiple-choice exams, (e) none of the above? (Mark the RIGHT answer). In: Sokal MM (ed) Psychological testing and American society, 1890–1930. Rutgers University Press, New Brunswick, NJ, pp 115–127
Smith EV Jr, Smith RM (2004) Introduction to Rasch measurement: Theory, models and applications. JAM Press, Maple Grove, MN
Stiggins RJ (2005) Student-involved assessment for learning, 4th edn. Pearson Education, Upper Saddle River, NJ
Swaminathan H, Gifford JA (1990) Detecting differential item functioning using logistic regression procedures. J Educational Measurement 27:361–370
Taylor CS, Nolen SB (2003) Classroom assessment: Supporting teaching and learning in real classrooms. Pearson Education, Upper Saddle River, NJ
Thissen D, Cai L (2016) Nominal categories models. In: van der Linden WJ (ed) Handbook of item response theory, vol 1. Models. CRC Press, Boca Raton, FL, pp 31–73
Thissen D, Cai L, Bock RD (2010) The nominal categories item response model. In: Nering ML, Ostini R (eds) Handbook of polytomous item response theory models. Routledge, New York, NY, pp 43–75
Thissen D, Steinberg L (1984) A response model for multiple-choice items. Psychometrika 49:501–519
Wainer H (1983) Pyramid power: Searching for an error in test scoring with 830,000 helpers. American Statistician 37:87–91
Wainer H (1989) The future of item analysis. J Educational Measurement 26:191–208
Yerkes RM (ed) (1921) Psychological examining in the United States Army. Memoirs of the National Academy of Sciences, vol 15. Government Printing Office, Washington, DC
Zimowski MF, Muraki E, Mislevy RJ, Bock RD (2002) BILOG-MG [Computer software]. Scientific Software International, Lincolnwood, IL
Additional information
Communicated by Maomi Ueno
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The authors would like to thank the editor and anonymous reviewers for providing valuable suggestions.
The original online version of this article was revised because the appendix text was missing; the text has been restored in this version.
Appendix A
A hypothetical Item 44 based on Wainer (1983).
About this article
Cite this article
Kim, SH., Cohen, A.S. & Eom, H.J. A note on the three methods of item analysis. Behaviormetrika 48, 345–367 (2021). https://doi.org/10.1007/s41237-021-00131-1
DOI: https://doi.org/10.1007/s41237-021-00131-1