
Improving Automated Evaluation of Formative Assessments with Text Data Augmentation

  • Conference paper
Artificial Intelligence in Education (AIED 2022)


Formative assessments are an important component of instruction and pedagogy, as they provide students and teachers with insights into how students are progressing in their learning and problem-solving tasks. Most formative assessments are currently coded and graded manually, which impedes the timely interventions that help students overcome difficulties. Automated evaluation of these assessments can facilitate more effective and timely interventions by teachers, allowing them to dynamically discern individual and class trends that they might otherwise miss. State-of-the-art BERT-based models dominate the NLP landscape but require large amounts of training data to attain sufficient classification accuracy and robustness. Unfortunately, educational data sets are often small and unbalanced, limiting the benefits that BERT-like approaches might provide. In this paper, we examine methods for balancing and augmenting training data consisting of students’ textual answers from formative assessments, and we analyze their impact on the accuracy of BERT-based automated evaluations. Our empirical studies show that models trained with these techniques consistently outperform models trained on unbalanced, unaugmented data.
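The balance-and-augment idea the abstract describes can be sketched in a few lines. The snippet below is an illustrative example only, not the authors' actual pipeline: it applies two operations in the style of Easy Data Augmentation (EDA) by Wei and Zou (random swap and random deletion) and oversamples the minority class with augmented copies until the classes match. All function and variable names here are hypothetical.

```python
import random

def eda_augment(sentence, n_aug=4, p=0.1, seed=0):
    """Generate augmented variants of a sentence via random swap and
    random deletion (two of the four EDA operations)."""
    rng = random.Random(seed)
    words = sentence.split()
    variants = []
    for _ in range(n_aug):
        w = words[:]
        # random swap: exchange two word positions
        if len(w) >= 2 and rng.random() < 0.5:
            i, j = rng.sample(range(len(w)), 2)
            w[i], w[j] = w[j], w[i]
        # random deletion: drop each word with probability p,
        # keeping at least one word
        w = [t for t in w if rng.random() > p] or [rng.choice(words)]
        variants.append(" ".join(w))
    return variants

# toy unbalanced data set of student answers (hypothetical)
answers = {"correct": ["the plant uses sunlight to make food"] * 8,
           "incorrect": ["the plant eats soil"] * 2}

# oversample the minority class with augmented copies until balanced
target = max(len(v) for v in answers.values())
for label, sents in answers.items():
    i = 0
    while len(sents) < target:
        sents.extend(eda_augment(sents[i % len(sents)], n_aug=1,
                                 seed=len(sents)))
        i += 1

print({k: len(v) for k, v in answers.items()})  # both classes now equal
```

In a real pipeline the balanced set would then be used to fine-tune a BERT classifier; augmentation is applied only to the training split so that evaluation data remains untouched.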

The assessment project described in this article was funded, in part, by the NSF Award # 2017000. The opinions expressed are those of the authors and do not represent views of NSF.



Footnotes

  1. While there were 99 students in the study, not all students answered each question.

  2. Here, Maj and Min refer to the number of available sentences from the majority and minority classes.
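Given the majority (Maj) and minority (Min) sentence counts mentioned in the footnote above, the number of augmented copies needed per minority sentence to reach class balance is a simple ceiling division. This is a minimal illustrative calculation, not taken from the paper; the function name is hypothetical.

```python
import math

def copies_per_minority(maj, minority):
    """Augmented copies each minority sentence needs so the minority
    class reaches at least the majority count."""
    return math.ceil(maj / minority) - 1

# e.g. 80 majority vs. 20 minority answers -> 3 augmented copies each
print(copies_per_minority(80, 20))
```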



Author information

Corresponding author

Correspondence to Keith Cochran.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Cochran, K., Cohn, C., Hutchins, N., Biswas, G., Hastings, P. (2022). Improving Automated Evaluation of Formative Assessments with Text Data Augmentation. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11643-8

  • Online ISBN: 978-3-031-11644-5

  • eBook Packages: Computer Science (R0)
