
A Systematic Review of Data-Driven Approaches to Item Difficulty Prediction

Conference paper in: Artificial Intelligence in Education (AIED 2021)

Abstract

The quality and validity of an assessment are heavily reliant on the quality of the items it contains. Difficulty is an essential factor in determining the overall quality of items and tests; item difficulty prediction is therefore extremely important in any pedagogical learning environment. Data-driven approaches to item difficulty prediction have gained increasing prominence in the recent literature. In this paper, we provide a systematic review of such approaches. Of the 148 papers identified as covering item difficulty prediction, 38 were selected for the final analysis. We present a classification of the different approaches used to predict item difficulty, together with current practices for item difficulty prediction with respect to the learning algorithms used and the most influential difficulty features investigated.
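To make the reviewed setting concrete: a typical data-driven approach extracts features from an item's text and fits a supervised model that maps those features to a calibrated difficulty value. The sketch below is purely illustrative and not taken from any of the reviewed papers; the item texts, the two shallow features (token count and mean token length), and the toy difficulty labels are all invented assumptions. It fits an ordinary least-squares linear model from scratch.

```python
# Illustrative sketch (hypothetical data): predict item difficulty from
# shallow text features with a linear model fit by ordinary least squares.

def features(item_text):
    """Map an item's text to [1 (bias), token count, mean token length]."""
    tokens = item_text.split()
    mean_len = sum(len(t) for t in tokens) / len(tokens)
    return [1.0, float(len(tokens)), mean_len]

def fit_ols(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination."""
    n = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(n)]
         for i in range(n)]
    b = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(n)]
    for col in range(n):                       # forward elimination
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]        # partial pivoting
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n                              # back substitution
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

# Hypothetical calibration data: (item text, observed difficulty in [0, 1]).
items = [
    ("What is two plus two", 0.10),
    ("Define the derivative of a polynomial function", 0.45),
    ("Prove that every bounded monotone sequence converges", 0.80),
    ("State the capital of France", 0.05),
]
X = [features(text) for text, _ in items]
y = [diff for _, diff in items]
w = fit_ols(X, y)

def predict(item_text):
    """Predicted difficulty for a new, unseen item."""
    return sum(wi * xi for wi, xi in zip(w, features(item_text)))
```

Real systems in the reviewed literature use much richer features (readability indices, embeddings, IRT parameters) and stronger learners, but the pipeline shape — featurize items, fit against calibrated difficulties, predict for new items — is the same.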





Author information

Correspondence to Samah AlKhuzaey.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

AlKhuzaey, S., Grasso, F., Payne, T.R., Tamma, V. (2021). A Systematic Review of Data-Driven Approaches to Item Difficulty Prediction. In: Roll, I., McNamara, D., Sosnovsky, S., Luckin, R., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2021. Lecture Notes in Computer Science, vol 12748. Springer, Cham. https://doi.org/10.1007/978-3-030-78292-4_3


  • DOI: https://doi.org/10.1007/978-3-030-78292-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78291-7

  • Online ISBN: 978-3-030-78292-4

  • eBook Packages: Computer Science, Computer Science (R0)
