N-Gram Based Approach for Automatic Prediction of Essay Rubric Marks

  • Magdalena Jankowska
  • Colin Conrad
  • Jabez Harris
  • Vlado Kešelj
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10832)


Automatic Essay Scoring, applied to the prediction of grades for the dimensions of a scoring rubric, can provide detailed automatic feedback on students’ written assignments. We apply the Common N-Gram (CNG) classifier, a character and word n-gram based technique originally proposed for authorship identification, to this task. We report promising results for rubric mark prediction by CNG and analyze the suitability of different types of n-grams for the task.
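As a rough illustration of how a CNG classifier can assign a rubric mark, here is a minimal Python sketch of the general scheme introduced for authorship attribution by Kešelj et al. (2003): each class is represented by a profile of its most frequent n-grams with normalized frequencies, and a document receives the class whose profile is least dissimilar to the document's own profile. The setup of one profile per rubric mark built from concatenated training essays, the defaults n = 4 and profile length 2000, whitespace tokenization for word n-grams, and all function names (`ngrams`, `profile`, `dissimilarity`, `train`, `predict`) are illustrative assumptions, not the settings or code used in the paper.

```python
from collections import Counter


def ngrams(text: str, n: int, level: str = "char") -> list[str]:
    """Overlapping character n-grams (level="char") or word n-grams (level="word")."""
    if level == "word":
        words = text.split()  # naive whitespace tokenization (an assumption)
        return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def profile(text: str, n: int = 4, length: int = 2000, level: str = "char") -> dict[str, float]:
    """CNG profile: the `length` most frequent n-grams, each with its frequency
    normalized by the total number of n-grams in the text."""
    grams = ngrams(text, n, level)
    if not grams:
        return {}
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).most_common(length)}


def dissimilarity(p1: dict[str, float], p2: dict[str, float]) -> float:
    """Sum over the union of both profiles of ((f1 - f2) / ((f1 + f2) / 2)) ** 2.
    An n-gram present in only one profile contributes exactly 4."""
    d = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        d += ((f1 - f2) / ((f1 + f2) / 2)) ** 2  # f1 + f2 > 0 for every g in the union
    return d


def train(essays_by_mark: dict[str, list[str]], n: int = 4, length: int = 2000,
          level: str = "char") -> dict[str, dict[str, float]]:
    """Hypothetical setup: one profile per rubric mark, built from the
    concatenation of all training essays that received that mark."""
    return {mark: profile(" ".join(texts), n, length, level)
            for mark, texts in essays_by_mark.items()}


def predict(essay: str, class_profiles: dict[str, dict[str, float]], n: int = 4,
            length: int = 2000, level: str = "char") -> str:
    """Assign the rubric mark whose profile is least dissimilar to the essay's profile."""
    p = profile(essay, n, length, level)
    return min(class_profiles, key=lambda mark: dissimilarity(p, class_profiles[mark]))
```

Calling `predict(essay, train(essays_by_mark))` returns the key of the closest rubric mark. Passing `level="word"` (with, say, n = 2) runs the same pipeline on word n-grams, so character and word n-gram variants can be compared on identical data splits.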


Keywords: Automatic Essay Scoring, Text classification, Character n-grams



The project was supported by the NSERC Engage grant EGP/507291-2016 with industry partner D2L Corporation. The authors would like to thank D2L members Brian Cepuran, VP, D2L Labs, and Rose Kocher, Director, Grant & Research Programs, for their guidance in the project and their feedback on the paper. The authors would also like to acknowledge support from the Killam Predoctoral Scholarship.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Magdalena Jankowska (1)
  • Colin Conrad (1)
  • Jabez Harris (1)
  • Vlado Kešelj (1)

  1. Faculty of Computer Science, Dalhousie University, Halifax, Canada
