Assessing Item Difficulty and Discrimination Indices of Teacher-Developed Multiple-Choice Tests

  • Conference paper
  • In: Assessment for Learning Within and Beyond the Classroom

Abstract

Item analysis is an important procedure for determining the quality of test items. The purpose of this study is to assess two important indices in the item analysis procedure, namely (1) item difficulty (p) and (2) item discrimination (D), as well as the correlation between them. The study involves ten 40-item multiple-choice mathematics tests. A total of 1243 Form 2 students from public schools in Penang, Kedah, Perak, and Pahang serve as the sample for this study. Both indices are calculated within the classical test theory framework because of the practical advantages it offers teachers. On average, only 67% (minimum = 50%, maximum = 87.5%) of the items are of sufficiently good quality to be retained for future testing. The findings on the correlation between the two indices are inconclusive. Implications for teachers' competency in test development are also discussed.
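
The chapter does not reproduce the computation here, but as a rough illustration of how the two indices are commonly obtained under classical test theory, the sketch below assumes the standard definitions: p as the proportion of examinees answering an item correctly, and D as the difference in that proportion between the upper and lower scoring groups (27% each, a common convention). The function name item_analysis, the group fraction, and the sample data are hypothetical and are not taken from the study.

```python
import numpy as np

def item_analysis(responses, group_fraction=0.27):
    """Classical test theory item analysis for a 0/1 scored response matrix.

    responses: array of shape (n_students, n_items), 1 = correct, 0 = incorrect.
    Returns (p, D): item difficulty (proportion correct) and item discrimination
    (upper-group proportion correct minus lower-group proportion correct).
    """
    responses = np.asarray(responses)
    n_students = responses.shape[0]

    # Item difficulty: proportion of examinees answering each item correctly.
    p = responses.mean(axis=0)

    # Rank students by total score and form upper/lower groups (27% each).
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    n_group = max(1, int(round(group_fraction * n_students)))
    lower = responses[order[:n_group]]
    upper = responses[order[-n_group:]]

    # Item discrimination: difference in proportion correct between the groups.
    D = upper.mean(axis=0) - lower.mean(axis=0)
    return p, D

# Hypothetical example: 6 students, 4 items.
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
p, D = item_analysis(scores)
print("difficulty p:", p)
print("discrimination D:", D)
print("correlation between p and D:", np.corrcoef(p, D)[0, 1])
```

In practice, items are then screened against cut-offs on p and D (for example, retaining items with moderate difficulty and positive discrimination); the specific criteria used in the study are those reported in the chapter, not the ones sketched here.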

Author information

Corresponding author

Correspondence to Ahmad Zamri Khairani.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

Khairani, A.Z., Shamsuddin, H. (2016). Assessing Item Difficulty and Discrimination Indices of Teacher-Developed Multiple-Choice Tests. In: Tang, S., Logonnathan, L. (eds) Assessment for Learning Within and Beyond the Classroom. Springer, Singapore. https://doi.org/10.1007/978-981-10-0908-2_35

  • DOI: https://doi.org/10.1007/978-981-10-0908-2_35

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0906-8

  • Online ISBN: 978-981-10-0908-2

  • eBook Packages: Education, Education (R0)
