A note on the three methods of item analysis

  • Note
Behaviormetrika

A Correction to this article was published on 03 April 2021

This article has been updated

Abstract

This paper contrasts three methods of item analysis for multiple-choice items, based on classical test theory, generalized linear modeling, and item response theory. The classical approach uses a cross-classification table, the generalized linear modeling approach uses a new baseline-category logit model, and the item response theory approach uses a multiple-choice model. The methods are illustrated with contrived and real data, and the advantages and disadvantages of each are discussed.
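
As a rough illustration of the first two approaches named in the abstract, the sketch below builds a cross-classification table of option choice by score group and evaluates a generic baseline-category logit model of the kind described in Agresti (2007). It is not the authors' implementation: the function names, the three-group split, the data, and the parameter values are all hypothetical, and the paper's "new" baseline-category logit model may differ from the textbook form used here.

```python
from collections import Counter
import math


def cross_classify(choices, totals, n_groups=3):
    """Count option choices within score groups ordered from low to high.

    choices: selected option per examinee, e.g. "A".."D" (hypothetical data)
    totals:  total test score per examinee, in the same order as choices
    """
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    size = len(order) / n_groups
    table = [Counter() for _ in range(n_groups)]
    for rank, i in enumerate(order):
        group = min(int(rank / size), n_groups - 1)
        table[group][choices[i]] += 1
    return table


def baseline_category_probs(x, intercepts, slopes):
    """Textbook baseline-category logit: log(P_k / P_base) = a_k + b_k * x.

    The baseline option is the one whose intercept and slope are both 0.
    """
    logits = {k: intercepts[k] + slopes[k] * x for k in intercepts}
    top = max(logits.values())  # subtract the maximum for numerical stability
    expd = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(expd.values())
    return {k: v / total for k, v in expd.items()}


# Contrived responses to one item whose keyed answer is assumed to be "C".
choices = ["B", "C", "A", "B", "D", "C", "C", "C", "C"]
totals = [2, 3, 4, 5, 6, 7, 8, 9, 10]
table = cross_classify(choices, totals)
for label, counts in zip(("low", "middle", "high"), table):
    print(label, dict(counts))

# Hypothetical parameters with the keyed option "C" as the baseline category.
intercepts = {"A": -0.5, "B": -0.2, "C": 0.0, "D": -1.0}
slopes = {"A": -0.8, "B": -0.6, "C": 0.0, "D": -1.0}
for score in (-2.0, 0.0, 2.0):
    print(score, baseline_category_probs(score, intercepts, slopes))
```

In the table, a well-functioning item would show the keyed option concentrated in the high group and the distractors in the lower groups. In the logit model, negative distractor slopes relative to the keyed baseline carry the same information as a function of the score.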

References

  • Agresti A (2007) An introduction to categorical data analysis, 2nd edn. John Wiley & Sons, Hoboken, NJ

  • Allen MJ, Yen WM (1999) Introduction to measurement theory. Waveland Press, Prospect Heights, IL

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014) Standards for educational and psychological testing. American Educational Research Association, Washington, DC

  • Baker FB (1964) An intersection of test score interpretation and item analysis. J Educational Measurement 1:23–28

  • Bandalos DL (2018) Measurement theory and applications for the social sciences. The Guilford Press, New York, NY

  • Bloom BS (ed) (1956) Taxonomy of educational objectives–Handbook 1: Cognitive domain. Longman, New York, NY

  • Bloom BS, Madaus GF, Hastings JT (1981) Evaluation to improve learning. McGraw-Hill, New York, NY

  • Bock RD (1972) Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika 37:29–51

  • Cangelosi JS (1990) Designing tests for evaluating student achievement. Longman, White Plains, NY

  • Chase CI (1999) Contemporary assessment for educators. Longman, New York, NY

  • Coleman JS et al (1966) Equality of educational opportunity. U.S. Government Printing Office, Washington, DC

  • Crocker L, Algina J (1987) Introduction to classical and modern test theory. Holt, Rinehart and Winston, New York, NY

  • de Ayala RJ (2009) The theory and practice of item response theory. The Guilford Press, New York, NY

  • Dodge Y (ed) (2003) The Oxford dictionary of statistical terms. Oxford University Press, Oxford, Great Britain

  • Domino G (2000) Psychological testing: An introduction. Prentice Hall, Upper Saddle River, NJ

  • DuBois PH (1970) A history of psychological testing. Allyn and Bacon, Boston, MA

  • Ebel RL (1972) Essentials of educational measurement. Prentice-Hall, Englewood Cliffs, NJ

  • Gregory RJ (2000) Psychological testing: History, principles, and applications, 3rd edn. Allyn and Bacon, Boston, MA

  • Haladyna TM (1994) Developing and validating multiple-choice test items. Lawrence Erlbaum Associates, Hillsdale, NJ

  • Haladyna TM, Downing SM (1989) A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education 2:37–50

  • Henrysson S (1971) Gathering, analyzing, and using data on test items. In: Thorndike RL (ed) Educational measurement, 2nd edn. American Council on Education, Washington, DC

  • Holland P, Wainer H (eds) (2012) Differential item functioning. Lawrence Erlbaum Associates, Hillsdale, NJ

  • Kelly FJ (1915) The Kansas Silent Reading Test. Kansas State Printing Plant, Topeka, KS

  • Kelly FJ (1916) The Kansas Silent Reading Tests. J Educational Psychology 7:63–80

  • Lane S, Raymond MR, Haladyna TM (eds) (2016) Handbook of test development, 2nd edn. Routledge, New York, NY

  • Linn RL, Miller MD (2005) Measurement and assessment in teaching, 9th edn. Pearson Education, Upper Saddle River, NJ

  • McMillan JH (2011) Classroom assessment: Principles and practice for effective standards-based instruction, 5th edn. Pearson Education, Boston, MA

  • Nelson D (2004) The Penguin dictionary of statistics. Penguin Books, London, England

  • Nitko AJ (2004) Educational assessment of students, 4th edn. Pearson Education, Upper Saddle River, NJ

  • Nitko AJ, Brookhart SM (2007) Educational assessment of students, 5th edn. Pearson Education, Upper Saddle River, NJ

  • Oosterhof A (2003) Developing and using classroom assessments, 3rd edn. Pearson Education, Upper Saddle River, NJ

  • Preston KSJ, Reise SP (2015) Detecting faulty within-item category functioning with the nominal response model. In: Reise SP, Revicki DA (eds) Handbook of item response theory modeling: Applications to typical performance assessment. Routledge, New York, NY, pp 386–405

  • Payne DA (2002) Applied educational assessment, 2nd edn. Wadsworth, Belmont, CA

  • Ostini R, Finkelman M, Nering M (2015) Selecting among polytomous IRT models. In: Reise SP, Revicki DA (eds) Handbook of item response theory modeling: Applications to typical performance assessment. Routledge, New York, NY, pp 285–304

  • Rasch G (1960) Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research, Copenhagen, Denmark (Expanded edn, 1980, University of Chicago Press, Chicago, IL)

  • Samejima F (1979) A new family of models for the multiple-choice item (Res Rep No 79-4). University of Tennessee, Department of Psychology, Knoxville, TN

  • Samelson F (1990) Was Early Mental Testing (a) Racist Inspired, (b) Objective Science, (c) A Technology for Democracy, (d) The Origin of Multiple-Choice Exams, (e) None of the Above? (Mark the RIGHT Answer). In: Sokal MM (ed) Psychological testing and American society, 1890–1930. Rutgers University Press, New Brunswick, NJ, pp 115–127

  • Smith EV Jr, Smith RM (2004) Introduction to Rasch measurement: Theory, models and applications. JAM Press, Maple Grove, MN

  • Stiggins RJ (2005) Student-involved assessment for learning, 4th edn. Pearson Education, Upper Saddle River, NJ

  • Swaminathan H, Gifford JA (1990) Detecting differential item functioning using logistic regression procedures. J Educational Measurement 27:361–370

  • Taylor CS, Nolen SB (2003) Classroom assessment: Supporting teaching and learning in real classrooms. Pearson Education, Upper Saddle River, NJ

  • Thissen D, Cai L (2016) Nominal categories models. In: van der Linden WJ (ed) Handbook of item response theory, vol 1. Models. CRC Press, Boca Raton, FL, pp 31–73

  • Thissen D, Cai L, Bock RD (2010) The nominal categories item response model. In: Nering ML, Ostini R (eds) Handbook of polytomous item response theory models. Routledge, New York, NY, pp 43–75

  • Thissen D, Steinberg L (1984) A response model for multiple-choice items. Psychometrika 49:501–519

  • Wainer H (1983) Pyramid power: Searching for an error in test scoring with 830,000 helpers. American Statistician 37:87–91

  • Wainer H (1989) The future of item analysis. J Educational Measurement 26:191–208

  • Yerkes RM (ed) (1921) Psychological examining in the United States Army. Memoirs of the National Academy of Sciences, vol 15. Government Printing Office, Washington, DC

  • Zimowski MF, Muraki E, Mislevy RJ, Bock RD (2002) BILOG-MG [Computer software]. Scientific Software International, Lincolnwood, IL


Author information

Corresponding author

Correspondence to Seock-Ho Kim.

Additional information

Communicated by Maomi Ueno

The authors would like to thank the editor and anonymous reviewers for providing valuable suggestions.

The original online version of this article was revised: the appendix text was missing and has been restored in this version.

Appendix A

A hypothetical Item 44 based on Wainer (1983).

About this article

Cite this article

Kim, SH., Cohen, A.S. & Eom, H.J. A note on the three methods of item analysis. Behaviormetrika 48, 345–367 (2021). https://doi.org/10.1007/s41237-021-00131-1

  • DOI: https://doi.org/10.1007/s41237-021-00131-1
