Automated Pipeline for Multi-lingual Automated Essay Scoring with ReaderBench

Ruseti, Stefan; Paraschiv, Ionut; Dascalu, Mihai; McNamara, Danielle S.

doi:10.1007/s40593-024-00402-4

Abstract

Automated Essay Scoring (AES) is a well-studied problem in Natural Language Processing applied in education. Solutions vary from handcrafted linguistic features to large Transformer-based models, implying a significant effort in feature extraction and model implementation. We introduce a novel Automated Machine Learning (AutoML) pipeline integrated into the ReaderBench platform designed to simplify the process of training AES models by automating both feature extraction and architecture tuning for any multilingual dataset uploaded by the user. The dataset must contain a list of texts, each with potentially multiple annotations, either scores or labels. The platform includes traditional ML models relying on linguistic features and a hybrid approach combining Transformer-based architectures with the previous features. Our method was evaluated on three publicly available datasets in three different languages (English, Portuguese, and French) and compared with the best currently published results on these datasets. Our automated approach achieved comparable results to state-of-the-art models on two datasets, while it obtained the best performance on the third corpus in Portuguese.
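The dataset requirement above (a list of texts, each with potentially multiple annotations that are either scores or labels) can be illustrated with a minimal sketch. The `Essay` class, the `infer_task_types` helper, and the rule that numeric annotations are treated as regression targets while all others are treated as classification targets are assumptions made for illustration; they are not taken from the ReaderBench code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# An annotation is either a numeric score or a categorical label.
Annotation = Union[float, int, str]


@dataclass
class Essay:
    """One text with any number of named annotations (hypothetical structure)."""
    text: str
    annotations: Dict[str, Annotation] = field(default_factory=dict)


def infer_task_types(essays: List[Essay]) -> Dict[str, str]:
    """Map each annotation name to 'regression' or 'classification'.

    Assumption: an annotation whose values are all numeric is a score
    (regression); anything else is a label (classification).
    """
    values_by_name: Dict[str, List[Annotation]] = {}
    for essay in essays:
        for name, value in essay.annotations.items():
            values_by_name.setdefault(name, []).append(value)

    tasks: Dict[str, str] = {}
    for name, values in values_by_name.items():
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        tasks[name] = "regression" if numeric else "classification"
    return tasks


# Toy data, invented for this example only.
essays = [
    Essay("First essay text.", {"score": 4, "label": "on-topic"}),
    Essay("Second essay text.", {"score": 2.5, "label": "off-topic"}),
]
tasks = infer_task_types(essays)
```

In this sketch each annotation name becomes a separate prediction target, so one uploaded dataset could yield one regression model for `score` and one classifier for `label`.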

Data Availability

All three datasets used for evaluation are publicly available: ASAP - https://www.kaggle.com/c/asap-aes; Essay-BR - https://github.com/lplnufpi/essay-br; French FakeNews - https://huggingface.co/datasets/readerbench/fakenews-climate-fr.

Code Availability

The code repository is publicly available on GitHub at https://github.com/readerbench/ReaderBenchAPI. The platform can be accessed at https://readerbench.com.

Acknowledgements

We would like to thank Emanuel Tertes and Pavel Betiu, who supported us in developing the new interface for ReaderBench.

Funding

This work was supported by a grant from the Ministry of Research, Innovation and Digitalization, project CloudPrecis “Increasing UPB’s research capacity in Cloud technologies and massive data processing”, Contract Number 344/390020/06.09.2021, MySMIS code: 124812, within POC. The research reported here was also supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A180261 to Arizona State University. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.

Author information

Authors and Affiliations

Computer Science and Engineering Department, University Politehnica of Bucharest, Bucharest, Romania
Stefan Ruseti, Ionut Paraschiv & Mihai Dascalu
Academy of Romanian Scientists, Bucharest, Romania
Mihai Dascalu
Department of Psychology, Arizona State University, Tempe, AZ, USA
Danielle S. McNamara

Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by Stefan Ruseti. The first draft of the manuscript was written by Ionut Paraschiv and Stefan Ruseti, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mihai Dascalu.

Ethics declarations

Ethics Approval

Not applicable.

Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Conflict of Interest/Competing Interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ruseti, S., Paraschiv, I., Dascalu, M. et al. Automated Pipeline for Multi-lingual Automated Essay Scoring with ReaderBench. Int J Artif Intell Educ (2024). https://doi.org/10.1007/s40593-024-00402-4

Accepted: 13 March 2024
Published: 01 April 2024
DOI: https://doi.org/10.1007/s40593-024-00402-4
