Abstract
Item analysis is an important procedure for determining the quality of test items. The purpose of this study is to assess two key indices in the item analysis procedure, namely (1) item difficulty (p) and (2) item discrimination (D), as well as the correlation between them. The study involves ten 40-item multiple-choice mathematics tests administered to a sample of 1,243 Form 2 students from public schools in Penang, Kedah, Perak, and Pahang. Both indices are calculated within the classical test theory framework because of the practical advantages this framework offers teachers. On average, only 67% (min = 50%, max = 87.5%) of the items are of sufficient quality to be retained for future testing. The results concerning the correlation between the two indices are inconclusive. Implications for teachers' competency in test development are also discussed.
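The two indices named above have standard definitions under classical test theory: difficulty p is the proportion of examinees who answer an item correctly, and discrimination D is the difference in p between high- and low-scoring examinee groups, commonly the top and bottom 27% by total score. The following Python sketch is a minimal illustration of these definitions under that assumed 27% split, with made-up data; it is not code or data from the chapter.

import numpy as np

def item_difficulty(item_scores):
    # Difficulty index p: proportion of examinees answering the item correctly.
    # item_scores: 1-D array of 0/1 scores for one item.
    return item_scores.mean()

def item_discrimination(item_scores, total_scores, frac=0.27):
    # Discrimination index D: p in the upper group minus p in the lower group,
    # where groups are the top and bottom `frac` of examinees by total score.
    n = len(total_scores)
    k = max(1, int(round(frac * n)))
    order = np.argsort(total_scores)      # examinee indices, ascending by total score
    lower, upper = order[:k], order[-k:]
    return item_scores[upper].mean() - item_scores[lower].mean()

# Toy data: 10 examinees x 3 items (0 = wrong, 1 = correct).
rng = np.random.default_rng(0)
scores = (rng.random((10, 3)) > 0.4).astype(int)
totals = scores.sum(axis=1)
for j in range(scores.shape[1]):
    p = item_difficulty(scores[:, j])
    D = item_discrimination(scores[:, j], totals)
    print(f"Item {j + 1}: p = {p:.2f}, D = {D:.2f}")

Under commonly cited rules of thumb in the classical test theory literature (e.g., Ebel's criteria), items with D of at least 0.40 discriminate very well, values between 0.20 and 0.39 are acceptable, and values below 0.20 flag an item for revision; a p near 0.5 allows the widest possible spread of D.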
Cite this paper
Khairani, A. Z., & Shamsuddin, H. (2016). Assessing item difficulty and discrimination indices of teacher-developed multiple-choice tests. In S. Tang & L. Logonnathan (Eds.), Assessment for learning within and beyond the classroom. Singapore: Springer. https://doi.org/10.1007/978-981-10-0908-2_35