Abstract
Machine learning algorithms that automatically score scientific explanations can be used to measure students’ conceptual understanding, identify gaps in their reasoning, and provide them with timely and individualized feedback. This paper presents the results of a study that uses Hebrew NLP to automatically score student explanations in Biology according to fine-grained analytic grading rubrics that were developed for formative assessment. The experimental results show that our algorithms achieve a high level of agreement with human experts, on par with previous work on automated assessment of scientific explanations in English, and that ~500 examples are typically enough to build reliable scoring models. The main contribution is twofold. First, we present a conceptual framework for constructing analytic grading rubrics for scientific explanations, which are composed of dichotomous categories that generalize across items. These categories are designed to support automated guidance, but can also be used to provide a composite score. Second, we apply this approach in a new context – Hebrew, which belongs to a group of languages known as Morphologically Rich. In languages of this group, among them also Arabic and Turkish, each input token may consist of multiple lexical and functional units, making them particularly challenging for NLP. This is the first study on automatic assessment of scientific explanations (and more generally, of open-ended questions) in Hebrew, and among the first to do so in Morphologically Rich Languages.
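The abstract's two uses of the analytic rubric (per-category guidance and a composite score) can be illustrated with a minimal sketch. This is not the authors' implementation; the category names and equal weights are hypothetical examples. The point is only that each rubric category yields an independent dichotomous (0/1) decision, and a composite score is derived by aggregating those decisions.

```python
# Illustrative sketch (hypothetical categories and weights, not the paper's
# actual rubric): combine dichotomous rubric-category decisions into a
# composite score for one student explanation.

# Each analytic-rubric category is scored as present (1) or absent (0)
# by its own classifier; weights here are all 1 for simplicity.
RUBRIC_WEIGHTS = {
    "mentions_mechanism": 1,
    "links_structure_to_function": 1,
    "uses_scientific_terminology": 1,
}

def composite_score(category_predictions: dict) -> int:
    """Sum the weighted binary category decisions into one composite score."""
    return sum(
        RUBRIC_WEIGHTS[category] * int(present)
        for category, present in category_predictions.items()
    )

# Example: per-category classifiers produced these binary labels
# for a single student response.
predictions = {
    "mentions_mechanism": True,
    "links_structure_to_function": False,
    "uses_scientific_terminology": True,
}
print(composite_score(predictions))  # -> 2
```

The per-category decisions can drive targeted formative feedback (e.g., flagging the missing structure-function link), while the summed value serves as the composite score.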
Acknowledgements
The authors thank Cipy Hofman for her contribution. The research of GA and MA was supported by the Willner Family Leadership Institute for the Weizmann Institute of Science and the Iancovici-Fallmann Memorial Fund, established by Ruth and Henry Yancovich. TN is grateful to the Azrieli Foundation for the award of an Azrieli Fellowship.
Moriah Ariely and Tanya Nazaretsky contributed equally to the paper.
Appendix
Cite this article
Ariely, M., Nazaretsky, T. & Alexandron, G. Machine Learning and Hebrew NLP for Automated Assessment of Open-Ended Questions in Biology. Int J Artif Intell Educ 33, 1–34 (2023). https://doi.org/10.1007/s40593-021-00283-x