Deep Learning in Automated Essay Scoring

  • David Boulanger
  • Vivekanandan Kumar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10858)


This paper explores the application of deep learning to automated essay scoring (AES). It uses essay dataset #8 from the Automated Student Assessment Prize competition, hosted on the Kaggle platform, and the state-of-the-art Suite of Automatic Linguistic Analysis Tools (SALAT) to extract 1,463 writing features. A non-linear regressor deep neural network is trained to predict holistic scores on a scale of 10–60. The study shows that deep learning holds promise to significantly improve the accuracy of AES systems, but that the current dataset, like most essay datasets, falls short of providing such systems with enough expertise (hand-graded essays) to exploit that potential. After tuning different sets of hyperparameters, the levels of agreement, as measured by the quadratic weighted kappa metric, obtained on the training, validation, and testing sets are 0.84, 0.63, and 0.58, respectively, while an ensemble (bagging) produced a kappa value of 0.80 on the testing set. Finally, the paper maintains that more than 1,000 hand-graded essays per writing construct would be necessary to adequately train predictive student models for automated essay scoring, provided that all score categories are equally or fairly represented in the sample dataset.
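
The agreement figures above are quadratic weighted kappa values. As a rough illustration of how that metric is commonly computed, here is a minimal pure-Python sketch. It is not the authors' implementation: the function name, its parameters, and the assumption that both score lists hold integers within the stated range (so a regressor's continuous outputs would first need rounding) are all illustrative choices.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Agreement between two lists of integer scores.

    Hypothetical helper for illustration only. Returns 1.0 for perfect
    agreement, about 0.0 for chance-level agreement, and negative values
    for agreement worse than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")

    n_cat = max_score - min_score + 1
    n = len(rater_a)

    # Observed confusion matrix and each rater's score histogram.
    observed = [[0] * n_cat for _ in range(n_cat)]
    hist_a = [0] * n_cat
    hist_b = [0] * n_cat
    for a, b in zip(rater_a, rater_b):
        i, j = a - min_score, b - min_score
        observed[i][j] += 1
        hist_a[i] += 1
        hist_b[j] += 1

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(n_cat):
        for j in range(n_cat):
            # Quadratic penalty: grows with the squared score distance.
            weight = (i - j) ** 2 / (n_cat - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        # Both raters assigned one and the same score to every essay.
        return 1.0
    return 1.0 - numerator / denominator
```

Under these assumptions, a call such as `quadratic_weighted_kappa(human_scores, predicted_scores, 10, 60)` would compare hand-assigned and predicted scores on the 10–60 scale; the two list names are placeholders.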


Keywords: Deep learning; Automated essay scoring; Writing analytics



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Athabasca University, Edmonton, Canada
