
Machine Learning and Hebrew NLP for Automated Assessment of Open-Ended Questions in Biology

Article · International Journal of Artificial Intelligence in Education

Abstract

Machine learning algorithms that automatically score scientific explanations can be used to measure students’ conceptual understanding, identify gaps in their reasoning, and provide them with timely and individualized feedback. This paper presents the results of a study that uses Hebrew NLP to automatically score student explanations in Biology according to fine-grained analytic grading rubrics that were developed for formative assessment. The experimental results show that our algorithms achieve a high level of agreement with human experts, on par with previous work on automated assessment of scientific explanations in English, and that ~500 examples are typically enough to build reliable scoring models. The main contribution is twofold. First, we present a conceptual framework for constructing analytic grading rubrics for scientific explanations, which are composed of dichotomous categories that generalize across items. These categories are designed to support automated guidance, but can also be used to provide a composite score. Second, we apply this approach in a new context: Hebrew, which belongs to the group of languages known as Morphologically-Rich. In languages of this group, which also includes Arabic and Turkish, each input token may consist of multiple lexical and functional units, making them particularly challenging for NLP. This is the first study on automatic assessment of scientific explanations (and, more generally, of open-ended questions) in Hebrew, and among the first to do so in a Morphologically-Rich Language.
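The abstract describes training a separate scoring model for each dichotomous rubric category from roughly 500 human-coded responses. The sketch below is a hedged illustration of that setup, not the authors’ actual pipeline (which is described in the full text): a character n-gram baseline classifier per category, evaluated against the human codes with Cohen’s kappa. The rubric category and the data loading are hypothetical.

```python
# Hedged illustration only: one binary classifier per dichotomous rubric
# category, trained on ~500 human-coded responses. Not the authors' actual
# pipeline; the rubric category and data loading here are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline


def train_and_evaluate(responses, labels):
    """responses: list of Hebrew answer strings; labels: 0/1 human codes
    for a single rubric category (e.g., 'names the causal mechanism')."""
    model = make_pipeline(
        # Character n-grams are one common workaround for morphologically
        # rich languages, where a single word token packs several units.
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
        LogisticRegression(max_iter=1000),
    )
    # Agreement with the human coder, estimated by 5-fold cross-validation.
    predicted = cross_val_predict(model, responses, labels, cv=5)
    kappa = cohen_kappa_score(labels, predicted)
    model.fit(responses, labels)  # final model trained on all examples
    return model, kappa


# Usage (with a real coded dataset of ~500 responses per category):
# model, kappa = train_and_evaluate(responses, labels)
# print(f"Cohen's kappa vs. human coder: {kappa:.2f}")
```

Because each category is scored independently, a composite score of the kind mentioned in the abstract can then be obtained by aggregating the per-category predictions.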



Acknowledgements

The authors thank Cipy Hofman for her contribution. The research of GA and MA was supported by the Willner Family Leadership Institute for the Weizmann Institute of Science and the Iancovici-Fallmann Memorial Fund, established by Ruth and Henry Yancovich. TN is grateful to the Azrieli Foundation for the award of an Azrieli Fellowship.

Author information


Corresponding author

Correspondence to Moriah Ariely.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Moriah Ariely and Tanya Nazaretsky contributed equally to the paper.

Appendix

Fig. 4: Examples of students’ responses to the “Anemia”, “Smoking”, and “Height” items, in their original form in Hebrew (Hebrew is a right-to-left language)

Table 7 Performance of the item-level models, colored according to the interpretation of their Kappa value: Good or very good (light gray), moderate (gray), and fair or less (dark gray)
Table 8 Performance of the between-items models, colored according to the interpretation of their Kappa value: Good or very good (light gray), moderate (gray), and fair or less (dark gray)
Table 9 Performance of the instrument-level (C-M) and item-level (A-B) models, colored according to the interpretation of their Kappa value: Good or very good (light gray), moderate (gray), and fair or less (dark gray)
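Tables 7–9 report agreement as Cohen’s kappa, shaded by whether it falls in the “good or very good”, “moderate”, or “fair or less” band. As a hedged sketch, assuming conventional Landis-and-Koch-style cut-offs of 0.60 and 0.40 (the exact thresholds used for the shading are given in the full text), the value can be computed and bucketed as follows; the example ratings are made up.

```python
# Hedged sketch: compute Cohen's kappa between two raters (e.g., model vs.
# human coder) and bucket it into the bands used to shade Tables 7-9. The
# cut-offs below follow the common Landis-and-Koch convention and are an
# assumption; the exact thresholds are stated in the full text.
from sklearn.metrics import cohen_kappa_score


def kappa_band(kappa: float) -> str:
    if kappa > 0.60:
        return "good or very good"   # light gray in the tables
    if kappa > 0.40:
        return "moderate"            # gray
    return "fair or less"            # dark gray


# Hypothetical example: dichotomous codes from a human rater and a model.
human = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
model = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa_score(human, model)
print(f"kappa = {kappa:.2f} -> {kappa_band(kappa)}")
```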


Cite this article

Ariely, M., Nazaretsky, T. & Alexandron, G. Machine Learning and Hebrew NLP for Automated Assessment of Open-Ended Questions in Biology. Int J Artif Intell Educ 33, 1–34 (2023). https://doi.org/10.1007/s40593-021-00283-x

