Investigating the Validity of Using Automated Writing Evaluation in EFL Writing Assessment

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11284)


This study takes an argument-based approach to validating the use of automated writing evaluation (AWE) systems in English as a Foreign Language (EFL) writing assessment in China, using Pigai, a Chinese AWE program, as the example. First, an interpretive argument was developed for its use in the College English course. Second, three sub-studies were conducted to seek evidence for claims related to score evaluation, score generalization, score explanation, score extrapolation and feedback utilization. The major findings are: (1) Pigai yields scores that are accurate indicators of the quality of a test performance sample; (2) its scores are consistent across tasks in the same form; (3) its scoring features represent the construct of interest to some extent, yet problems of construct under-representation and construct-irrelevant features still exist; (4) its scores are consistent with teachers’ judgments of students’ writing ability; (5) its feedback has a positive impact on students’ development of writing ability, but only to a limited extent. These results indicate that AWE can serve only as a supplement to human evaluation and cannot replace it.


Keywords: Pigai; Automated essay evaluation; Writing assessments



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Foreign Languages, South China University of Technology, Guangzhou, China
