
A Systematic Review of Data-Driven Approaches to Item Difficulty Prediction

Conference paper in: Artificial Intelligence in Education (AIED 2021)

Abstract

The quality and validity of an assessment are heavily reliant on the quality of the items it contains. Difficulty is an essential factor in determining the overall quality of items and tests; item difficulty prediction is therefore extremely important in any pedagogical learning environment. Data-driven approaches to item difficulty prediction have gained increasing prominence in the recent literature. In this paper, we provide a systematic review of such approaches. Of the 148 papers identified as covering item difficulty prediction, 38 were selected for the final analysis. We present a classification of the different approaches used to predict item difficulty, together with current practices for item difficulty prediction with respect to the learning algorithms used and the most influential difficulty features investigated.
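To make the reviewed setting concrete: a typical data-driven approach extracts features from an item's text and fits a supervised model that maps those features to a calibrated difficulty value. The sketch below is purely illustrative and not taken from any of the reviewed papers; the item texts, the two shallow features (token count and mean token length), and the toy difficulty labels are all invented assumptions. It fits an ordinary least-squares linear model from scratch.

```python
# Illustrative sketch (hypothetical data): predict item difficulty from
# shallow text features with a linear model fit by ordinary least squares.

def features(item_text):
    """Map an item's text to [1 (bias), token count, mean token length]."""
    tokens = item_text.split()
    mean_len = sum(len(t) for t in tokens) / len(tokens)
    return [1.0, float(len(tokens)), mean_len]

def fit_ols(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination."""
    n = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(n)]
         for i in range(n)]
    b = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(n)]
    for col in range(n):                       # forward elimination
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]        # partial pivoting
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n                              # back substitution
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

# Hypothetical calibration data: (item text, observed difficulty in [0, 1]).
items = [
    ("What is two plus two", 0.10),
    ("Define the derivative of a polynomial function", 0.45),
    ("Prove that every bounded monotone sequence converges", 0.80),
    ("State the capital of France", 0.05),
]
X = [features(text) for text, _ in items]
y = [diff for _, diff in items]
w = fit_ols(X, y)

def predict(item_text):
    """Predicted difficulty for a new, unseen item."""
    return sum(wi * xi for wi, xi in zip(w, features(item_text)))
```

Real systems in the reviewed literature use much richer features (readability indices, embeddings, IRT parameters) and stronger learners, but the pipeline shape — featurize items, fit against calibrated difficulties, predict for new items — is the same.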





Author information

Correspondence to Samah AlKhuzaey.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

AlKhuzaey, S., Grasso, F., Payne, T.R., Tamma, V. (2021). A Systematic Review of Data-Driven Approaches to Item Difficulty Prediction. In: Roll, I., McNamara, D., Sosnovsky, S., Luckin, R., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2021. Lecture Notes in Computer Science, vol 12748. Springer, Cham. https://doi.org/10.1007/978-3-030-78292-4_3


  • DOI: https://doi.org/10.1007/978-3-030-78292-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78291-7

  • Online ISBN: 978-3-030-78292-4

  • eBook Packages: Computer Science, Computer Science (R0)
