Abstract
Item analysis is an important procedure for determining the quality of test items. The purpose of this study is to assess two key indices in the item analysis procedure, namely (1) item difficulty (p) and (2) item discrimination (D), as well as the correlation between them. The study involves ten 40-item multiple-choice mathematics tests administered to a sample of 1,243 Form 2 students from public schools in Penang, Kedah, Perak, and Pahang. Both indices are calculated within the classical test theory framework because of the practical advantages this framework offers teachers. On average, only 67% (min = 50%, max = 87.5%) of the items are of sufficient quality to be retained for future testing. The results concerning the correlation between the two indices are inconclusive. Implications for teachers' competency in test development are also discussed.
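The two indices named above have standard definitions under classical test theory: difficulty p is the proportion of examinees who answer an item correctly, and discrimination D is the difference in p between high- and low-scoring examinee groups, commonly the top and bottom 27% by total score. The following Python sketch is a minimal illustration of these definitions under that assumed 27% split, with made-up data; it is not code or data from the chapter.

import numpy as np

def item_difficulty(item_scores):
    # Difficulty index p: proportion of examinees answering the item correctly.
    # item_scores: 1-D array of 0/1 scores for one item.
    return item_scores.mean()

def item_discrimination(item_scores, total_scores, frac=0.27):
    # Discrimination index D: p in the upper group minus p in the lower group,
    # where groups are the top and bottom `frac` of examinees by total score.
    n = len(total_scores)
    k = max(1, int(round(frac * n)))
    order = np.argsort(total_scores)      # examinee indices, ascending by total score
    lower, upper = order[:k], order[-k:]
    return item_scores[upper].mean() - item_scores[lower].mean()

# Toy data: 10 examinees x 3 items (0 = wrong, 1 = correct).
rng = np.random.default_rng(0)
scores = (rng.random((10, 3)) > 0.4).astype(int)
totals = scores.sum(axis=1)
for j in range(scores.shape[1]):
    p = item_difficulty(scores[:, j])
    D = item_discrimination(scores[:, j], totals)
    print(f"Item {j + 1}: p = {p:.2f}, D = {D:.2f}")

Under commonly cited rules of thumb in the classical test theory literature (e.g., Ebel's criteria), items with D of at least 0.40 discriminate very well, values between 0.20 and 0.39 are acceptable, and values below 0.20 flag an item for revision; a p near 0.5 allows the widest possible spread of D.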
Cite this paper
Khairani, A. Z., & Shamsuddin, H. (2016). Assessing item difficulty and discrimination indices of teacher-developed multiple-choice tests. In S. Tang & L. Logonnathan (Eds.), Assessment for learning within and beyond the classroom. Singapore: Springer. https://doi.org/10.1007/978-981-10-0908-2_35