1 Overview

The definition commonly attributed to automated evaluation of writing is “the ability of computer technology to evaluate and score written prose” (Shermis & Burstein, 2003, p. xiii). Digital writing environments that provide automated feedback on writing are known as Automated Writing Evaluation (AWE) tools. AWE tools originated from automated essay scoring (AES) (chapter “Automated Scoring of Writing”). It must be noted, however, that AWE is not interchangeable with AES or its sister term, automated essay evaluation (AEE). Unlike AES/AEE, whose focus is on summative assessment, AWE tools support the process of writing by providing formative feedback that is typically displayed on an engaging graphic interface. Moreover, AWE is a more encompassing term, in which writing covers any genre and evaluation extends to uses beyond scoring.

As AES derivatives, AWE tools employ computational engines that rely on natural language processing (NLP), artificial intelligence, and statistical modelling approaches (chapters “Automated Text Generation and Summarization for Academic Writing” to “Analytic Techniques for Automated Analysis of Writing”; also Burstein et al., 2003a) to analyze lexical, syntactic, semantic, and discourse traits in written texts. Therefore, their design, development, and implementation are grounded in multi-disciplinary perspectives including Applied Linguistics, Educational Measurement, Computer and Information Sciences, Psychometrics and Quantitative Psychology, Cognitive Psychology and Psycholinguistics, and Writing Studies in first and second languages.

Noting that a “comprehensive history of AWE has yet to be written,” Hazelton et al. (2021) delineate AWE tools into three generations based on how technological capabilities developed over time (p. 43). The first-generation exemplar, in their view, is Project Essay Grade (PEG), introduced in the 1960s. While PEG is indeed the pioneer that spearheaded AWE, it aimed to address the challenge of time-intensive grading of student writing and thus essentially falls within the purview of AES (chapter “Automated Scoring of Writing”). Second-generation AWE, which emerged in the 1980s, also primarily as an efficiency-driven technology, includes tools that provide immediate individualized feedback, aiming to alleviate the labour-intensive task of responding to student writing in formative ways. The Writer’s Workbench was among the first tools of this kind, providing feedback on aspects of writing such as errors and topic sentences; it was followed by Criterion, MY Access!, WriteToLearn, and others. It is worth noting that, while AWE tools initially hardly accounted for the needs of second and foreign language learners, language learning theories began to gain a steady influence on AWE research and development in the 2000s (Xi, 2010). The third generation of AWE has taken a “left turn,” expanding the ability of this technology to analyze student writing across academic disciplines and writing genres (Burstein et al., 2016a, p. 6). The most recent third-generation tools (e.g., the freely available Writing Mentor app, installed from the Google Docs add-on store) are approaching the functionality of intelligent tutoring systems (ITSs), since they provide guided activities to complement the feedback. The Writing Pal is the only representative ITS that has an AWE component (McCarthy et al., 2022). The Writing Pal is modular, and its AWE component can be used on its own for feedback as well as for instruction (chapter “The Future of Intelligent Tutoring Systems for Writing”).

2 Core Idea of the Technology

AWE tools serve the purpose of formative assessment and provide practice for writing development. They have been promoted and largely implemented as enhancements for process writing instruction, emphasizing the value of multiple drafting fostered by feedback and other forms of scaffolding. Aligned with the move towards individualized teaching and assessment, AWE is deemed to enhance the dynamics of classroom instruction and to also ensure cross-curricular consistency of writing evaluation. For students, automated feedback is intended as a motivational factor that can guide revision and sustain learner autonomy.

Considering that feedback is at the core of AWE, a two-pronged categorization of AWE, alternative to Hazelton et al.’s (2021), can be conceptualized based on the origin of the automated feedback. As mentioned above, most existing AWE tools are descendants of traditional AES used to assess writing performance on constructed-response writing tasks. Such tools can be categorized as assessment-driven. Their feedback is corrective in nature, flagging writing traits that may need to be addressed. Most assessment-driven AWE tools are asynchronous and attempt to address grammatical errors as well as more global discourse traits. There are also a few tools, such as Grammarly and CyWrite, that deliver the feedback synchronously. The second category comprises genre-based AWE, whose design is guided by discourse analysis studies of the target domain, learning theories, and pedagogical principles (Cotos, 2022). The first genre-based automated analysis tool, Mover, was introduced by Anthony and Lashkia (2003); the Research Writing Tutor (RWT) and AcaWriter are more recent. What sets them apart is that their asynchronous feedback is operationalized to reflect the rhetorical conventions of specific genres rather than to facilitate error correction. The development of genre-based AWE requires large-scale corpus-based research on particular genres, which is why there are still very few such tools. This is perhaps the reason why they were not explicitly noted within Hazelton et al.’s (2021) third generation.

Both assessment-driven and genre-based tools have been used by teachers as a complement to instruction and by writers as aids for independent, self-paced, and self-regulated writing and revision. Assessment-driven AWE has been widely implemented at all levels of formal instruction, from elementary to higher education and to non-traditional adult learning environments. Higher education has witnessed the most implementations in English composition courses at the undergraduate level as well as in university academic writing courses for English as a second or foreign language. There is hardly a ‘prescribed’ use. Rather, teachers make decisions regarding the uses of AWE based on instructional needs and learning goals or based on their level of familiarity with the tool. Some teachers encourage students to process and respond to AWE feedback on lower-level concerns and complement that with their own feedback on more global aspects of writing. Others prefer to incentivize students’ revision by directing them to the summative, scoring-based feedback on specific writing traits. Yet others tend to disregard automated formative feedback and resort to scoring capabilities only for assessment or test preparation purposes (Stevenson, 2016).

3 Functional Specifications

AWE tools are user-facing systems powered by back-end engines used to generate feedback. For assessment-driven tools, these are scoring engines; for example, Criterion as well as Turnitin’s Revision Assistant and Draft Coach use e-rater, WriteToLearn uses Intelligent Essay Assessor, and MY Access! uses IntelliMetric (chapter “Automated Scoring of Writing”). For genre-based tools, the engines are analytic, trained to ‘learn’ the rhetorical traits of the genre from a representative annotated corpus and then apply the ‘learned’ information to identify those traits in new texts. These analytic engines use different text classification approaches (chapter “Analytic Techniques for Automated Analysis of Writing”). For example, AntMover uses a Naïve Bayes classifier, RWT uses support vector machine classifiers, and AcaWriter uses a rule-based parser. Distinct from its counterparts, whose classifiers adopt models of consecutive words, AcaWriter’s parser identifies words or expressions and syntactic dependencies that may instantiate rhetorical concepts.
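To make the classifier-based approach more concrete, the following minimal sketch (in Python, using scikit-learn) shows how a Naïve Bayes classifier of the kind mentioned above might be trained on move-annotated sentences and applied to a new draft. The sentences, move labels, and features below are invented for illustration and do not reproduce any tool’s actual training data or engine.

```python
# A minimal, hypothetical sketch of a sentence-level rhetorical-move classifier,
# loosely in the spirit of the Naive Bayes approach mentioned above.
# The training sentences and move labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set: sentences annotated with move labels.
train_sentences = [
    "Recent studies have increasingly focused on automated feedback.",
    "Little research, however, has examined genre-based feedback.",
    "This paper investigates how writers revise with automated feedback.",
]
train_moves = ["Background", "Gap", "Purpose"]

# Bag-of-words and bigram features feed a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(train_sentences, train_moves)

# Label each sentence of a new draft with its most probable move.
draft = [
    "Prior work has explored scoring engines for student essays.",
    "It remains unclear how such feedback shapes revision.",
]
for sentence, move in zip(draft, model.predict(draft)):
    print(f"{move}: {sentence}")
```

In operational engines, the annotated corpora are far larger and the feature sets richer, but the underlying principle of learning rhetorical categories from labelled sentences and applying them to new texts is the same.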

Given that the scoring engines are trained to detect numerous characteristics of texts, assessment-driven tools’ feedback is manifold, targeting grammatical forms, syntactic complexity, lexical complexity, style, organization, topical content, idea development, redundancy, relevance, deviance, semantic coherence, mechanics, etc. The formative feedback is commonly embedded in the student’s draft. Some tools flag errors and suggest corrections, which are mostly based on how the scoring engine was trained to evaluate writing but can also draw on individual students’ error correction history (e.g., TechWriter). Summative feedback can also be offered as a performance summary containing a holistic score, a quantification of errors based on the analyzed traits of writing, and hyperlinks to detailed descriptive feedback on each error category. While most AWE tools are for writing in English, some generate multilingual feedback for second language writers (e.g., Criterion and MY Access!).
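As a rough illustration of how such feedback might be represented internally, the hypothetical data structures below model inline error flags embedded in the draft alongside a summative performance summary. The class and field names are assumptions made for illustration, not any tool’s actual schema.

```python
# Hypothetical structures for assessment-driven feedback: inline error flags
# plus a summative performance summary. Names are illustrative only.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ErrorFlag:
    trait: str                  # e.g., "Grammar", "Usage", "Mechanics", "Style"
    error_type: str             # e.g., "Fragment", "Missing comma"
    start: int                  # character offset of the flagged span in the draft
    end: int
    comment: str                # formative message shown to the student
    suggestion: Optional[str] = None  # optional suggested correction

@dataclass
class PerformanceSummary:
    holistic_score: int                                         # e.g., 1-6
    error_counts: Dict[str, int] = field(default_factory=dict)  # errors per trait
    detail_links: Dict[str, str] = field(default_factory=dict)  # trait -> descriptive feedback page

@dataclass
class FeedbackReport:
    flags: List[ErrorFlag]          # feedback embedded in the draft
    summary: PerformanceSummary     # summative performance summary
```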

Genre-based exemplars address higher-order concerns related to rhetorical effectiveness as expected by target discourse communities. Their feedback is operationalized per Swales’ (1981) theorizing of genre conventions in terms of communicative goals called ‘moves’ and functional strategies called ‘steps’. Swales’ Create-A-Research-Space (CARS) model, comprising three moves (Establishing a Territory, Identifying a Niche, Addressing the Niche) and their respective steps (e.g., Claiming Centrality, Highlighting a Problem, Stating the Value), is to some extent at the core of all existing genre-based tools’ analytic engines. While different tools articulate and present their feedback in different ways, essentially writers receive feedback indicating what the sentences in their text are doing communicatively. AntMover, trained to analyze research article abstracts, displays the text split into sentences that are labeled with CARS categories. IADE’s feedback visualized the rhetorical composition of research article introductions by color-coding all the sentences in a text for moves, and its successor RWT has expanded this feature with step-level, move-level, and discipline-specific comparative feedback on all the sections of research articles—Introduction-Methods-Results-Discussion/Conclusion (IMRD/C). AcaWriter, on the other hand, gives feedback only for sentences where its rule-based parser identifies concepts indicative of moves (e.g., summarizing issues, describing an open question).
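The rule-based strategy can be illustrated with a minimal, hypothetical sketch in which only sentences matching a cue pattern receive a move label, mirroring the behaviour of providing feedback only where indicative concepts are found. The cue patterns and example sentences below are invented for illustration and are not AcaWriter’s actual rules.

```python
# A hypothetical illustration of rule-based move detection: only sentences
# that match a cue pattern receive a rhetorical label. The regular expressions
# below are invented for illustration, not any tool's actual rules.
import re

MOVE_CUES = {
    "Identifying a Niche": [
        r"\bhowever\b.*\b(little|few|no)\b.*\b(research|studies)\b",
        r"\bremains?\s+(unclear|unknown|an open question)\b",
    ],
    "Addressing the Niche": [
        r"\bthis (paper|study|article)\b.*\b(investigates?|examines?|proposes?)\b",
    ],
}

def label_sentences(sentences):
    """Return (sentence, move-or-None) pairs; unmatched sentences get no feedback."""
    labelled = []
    for sentence in sentences:
        move_found = None
        for move, patterns in MOVE_CUES.items():
            if any(re.search(p, sentence, re.IGNORECASE) for p in patterns):
                move_found = move
                break
        labelled.append((sentence, move_found))
    return labelled

draft = [
    "However, little research has examined feedback on rhetorical structure.",
    "This study examines how writers respond to genre-based feedback.",
    "Participants were recruited from three universities.",
]
for sentence, move in label_sentences(draft):
    print(f"[{move or 'no feedback'}] {sentence}")
```

In practice, such rules are derived from discourse analysis of the target genre and are considerably richer, drawing on syntactic dependencies rather than surface patterns alone.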

Regardless of the origin and nature of the feedback, AWE tools incorporate a vast array of additional scaffolding features for students. In the interest of brevity, I will only mention select examples here. First, automated feedback may be accompanied by interface features enabling students to solicit feedback from their instructor, who can point to more subtle and more global issues not identifiable automatically. There are also features designed to facilitate guided practice and to help foster the core activities of pre-writing, drafting, and revision. Criterion, for instance, contains a Make a Plan feature with a number of templates for planning strategies. MY Access! offers graphical pre-writing tools to assist students with the formulation and organization of their ideas, a word bank for appropriate vocabulary use, a checklist of scoring rubrics for self-assessment, a so-called ‘writing coach’ suggesting revision goals and remediation activities, and an ‘editor’ that supplies suggestions for editing. In addition to such features, WriteToLearn uses text-to-speech technologies so that students can hear the text and see the definitions of words in on-demand pop-up windows. MI Write and MI Tutor, the legacy of PEG, offer students graphic organizers, peer review options for giving and receiving peer feedback, and portfolios that allow them to chart their progress toward grade-level proficiency. The WritingRoadmap embeds model sentence diagrams, tutorials on grammar and syntax, a thesaurus, and tips for essay improvement. RWT provides video tutorials for all IMRD/C moves and steps, a move/step annotated multi-disciplinary corpus of published research articles, and a concordancer searchable for examples of all the steps in all the IMRD/C texts in the corpus. Being an ITS, the Writing Pal provides the most tailored scaffolding, focused on writing strategies during the pre-writing, drafting, and revising stages of the writing process.

Apart from this variety of student-focused features, most tools integrate features for teachers. Perhaps most popular are features like chat or electronic sticky notes that bring teachers’ comments into the feedback loop for the student. Writing prompts, whether ready-made or created by teachers based on stimulus reading materials pre-packaged in the system, enable teachers to customize writing assignments for better alignment with learning objectives. Additionally, there are options for monitoring students’ use of available scaffolding features and for tracking student progress, as well as for generating proficiency reports for individual students, for full classes, or across demographic groups.

4 Main Products

While there are a number of AWE tools that can be considered main products, this section reviews one representative assessment-driven tool and one genre-based tool. Among the former, Criterion is perhaps the most researched and widely implemented commercial product, with features similar to most such tools. Genre-based AWE is well represented by RWT. This non-commercial tool can be considered paradigmatic because it is truly genre-specific, with features most comprehensively covering the rhetorical traits characteristic of the research article genre.

4.1 Criterion

The Educational Testing Service developed Criterion, formally called The Criterion Online Writing Evaluation service, for writers of various age groups in primary, secondary, and higher education settings. The developer describes it as an instructor-led system aimed at helping teachers assess student writing performance and progress, and at providing students with self-paced, independent writing practice guided by immediate automated feedback. Criterion’s technical capabilities are based on two complementary applications: e-rater and Critique. The former is a scoring engine that assigns a holistic score based on statistical modelling of how linguistic and text features are related to overall writing quality; the latter contains a suite of programs that generate feedback (Burstein et al., 2003b). The feedback covers five major traits: grammar, usage, mechanics, style, and organization and development, detailing specific types of errors within each trait (see Table 1).
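As a rough, hypothetical illustration of the kind of statistical modelling just described, a scoring engine can be thought of as a regression from automatically extracted text features to a holistic score. The features, training values, and resulting weights below are invented and do not reflect e-rater’s actual feature set or model.

```python
# A hypothetical sketch of feature-based holistic scoring: a regression from
# text features to a score on a 1-6 scale. The features and values are
# invented for illustration and do not reflect e-rater's actual model.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row holds invented features for one training essay:
# [error rate, mean sentence length, type-token ratio, discourse units]
X_train = np.array([
    [0.12, 11.0, 0.38, 3],
    [0.06, 15.5, 0.45, 5],
    [0.02, 18.2, 0.52, 7],
    [0.09, 13.0, 0.41, 4],
])
y_train = np.array([2, 4, 6, 3])  # human-assigned holistic scores

model = LinearRegression().fit(X_train, y_train)

# Score a new essay and clamp the prediction to the 1-6 scale.
new_essay_features = np.array([[0.04, 16.8, 0.48, 6]])
raw_score = model.predict(new_essay_features)[0]
holistic_score = int(round(min(max(raw_score, 1), 6)))
print(f"Predicted holistic score: {holistic_score}")
```

In operational engines the feature set is far richer and the model is trained on large pools of human-scored essays, but the principle of mapping measurable text traits onto a score scale is the same.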

Table 1 Criterion’s feedback traits and error types

It takes Criterion less than twenty seconds to assess a submitted text and generate a performance summary presenting a holistic score, the number of errors, and feedback comments corresponding to each error. Note that it does not display the errors of all types at the same time; rather, students can view the feedback selectively by clicking on one of the tabs of the Trait Feedback Analysis Menu, which opens a trait-specific feedback screen. Figure 1 is a screenshot of the feedback screen for Style (Repetition of words). A roll-over message appears when moving the cursor over a highlighted word, expression, or stretch of text, presenting formative feedback on the identified type of error; e.g.:

Fig. 1 Example screenshot of the feedback screen for Style (Repetition of words) in Criterion: the right panel, labeled Repetition of Words, highlights a repeated word in the essay, while the left panel lists the summary of style comments and details such as the number of words and sentences (from ETS, 2007)

  • Grammar—Fragment or missing comma: This sentence may be a fragment or may have incorrect punctuation. Proofread the sentence to be sure that it has correct punctuation and that it has an independent clause with a complete subject and predicate.

  • Usage—Missing comma: You may need to place a comma after this word.

  • Style—Passive voice: You have used the passive voice in this sentence. Depending upon what you wish to emphasize in the sentence, you may wish to revise it using the active voice.

Another form of feedback is provided along with the holistic score, summarizing the trait feedback analysis to reflect the overall quality of the text and the number of errors (per trait and per error type). To help students understand the meaning of their score, Criterion makes available a score guide with descriptions for basic, proficient, and advanced levels. According to the First Year 6pt Scale—Criterion Scoring Guide (n.d.), an author whose essay scores 2 out of 6, for instance, would receive feedback specifying the following weaknesses of the essay:

You have work to do to improve your writing skills. You probably have not addressed the topic or communicated your ideas effectively. Your writing may be difficult to understand. In one or more of the following areas, your essay:

  • Misunderstands the topic or neglects important parts of the task

  • Does not coherently focus or communicate your ideas

  • Is organized very weakly or doesn’t develop ideas enough

  • Generalizes and does not provide examples or support to make your points clear

  • Uses sentences and vocabulary without control, which sometimes confuses rather than clarifies your meaning.

Criterion’s feedback, multiple revision, and unlimited resubmission features are meant to support revising and editing. Like other AWE tools, Criterion has additional features to support students’ planning and writing, offering planning templates editable while completing the writing assignment, a catalogue of well-written essays, and a thesaurus. Its online Writer’s Handbook can be tailored to different levels of English language proficiency, to a certain first language (Spanish, Simplified Chinese, Japanese, Korean), and to elementary, middle school, high school, or college educational levels. Students’ communication and access are supported by features that facilitate dialogue and the development of online portfolios. Teachers, in turn, can enable available pre-writing features, designate a particular planning template, adjust assignments to target specific abilities, and select resources appropriate for the development of those abilities. They can also draw on a library of more than 400 essay topics at various skill levels and pertaining to different kinds of essays (narrative, expository, persuasive). When designing a writing assignment, teachers can select options most suitable for the writing task (e.g., time allocated, number of allowed submissions of revised text). Importantly, they can set the type of automated feedback to be displayed and can also comment on their students’ work through different modalities. For a description of how teachers and students can engage with this tool procedurally, see Lim and Kahng (2012).

4.2 Research Writing Tutor (RWT)

RWT was developed for advanced academic writers needing to learn how to produce publishable-quality research articles responsive to the expectations of their socio-disciplinary discourse communities (Cotos, 2014). This tool comprises three standalone yet interconnected modules. ‘Understand Writing Goals’ is a learning module, which contains multimodal content explaining the communicative purposes of the moves and the functions of the steps (see the IMRD/C move-step framework in Cotos et al., 2015), as well as the patterns of language use characteristic of those rhetorical traits. ‘Explore Published Writing’ serves as a demonstration module with IMRD/C Section Structure, Move/Step Examples, and original Research Articles components, which expose students to different forms of a move/step annotated corpus of 960 published articles representative of authentic discourse in 32 disciplines. ‘Analyze My Writing’ is the AWE feedback module, providing different forms of individualized automated feedback designed for scaffolded revision.

A notable strength of this tool is its integrative theoretical grounding in socio-disciplinary and cognitive dimensions of scientific writing that are important for the development of genre knowledge and research writing competence. From a socio-disciplinary standpoint, the features in the feedback module are designed to render the rhetorical composition of research articles (informed by Swalesian genre theory) and the language choices that instantiate functional meaning (informed by systemic functional linguistics). From a cognitive standpoint, it operationalizes tenets from writing, language learning, and skill acquisition theories. With this grounding, RWT’s features depicted in Fig. 2 are designed to create the learning affordances summarized in Table 2. Taken together, its features and affordances create conditions for scaffolded writing practice, during which students are able to detect and address discourse-level shortcomings in their drafts, whether related to rhetorical structure, intended mental representation of ideas, or language choices needed to convey specific functional meanings (Cotos, 2017; Cotos et al., 2017, 2020).

Fig. 2 Screenshot of the features of the ‘Analyze My Writing’ feedback module of RWT: options for editing the Introduction and adding the Methods, Results, and Discussion sections appear at the top, above a textbox containing the draft and an Analyze button used for iterative revision and submission

Table 2 The features and affordances of the ‘Analyze My Writing’ feedback module of RWT

RWT is used in various contexts, including credit-bearing writing courses employing data-driven learning pedagogy, hands-on workshops, peer review group activities, individual tutoring with writing consultants, and independent revision. The feedback and scaffolding features provide writers with exposure to authentic disciplinary discourse, directions for how to discern the writing norms of their discourse community, guided writing practice, and productive interaction.

5 Research

Over the last decade, the fields of AES/AEE and AWE have emerged as distinct areas of scholarship. Both areas still adjoin under the validity argument framework (Kane, 1992), which consists of a chain of inferences that guide research. While describing the framework is beyond the scope of this chapter, it is necessary to highlight it as an increasingly prolific heuristic adopted in AWE studies. It has enabled researchers to consolidate various types of empirically supported claims into a systematic progression of inferences about the effectiveness of AWE tools, thus strengthening the defensibility of decisions regarding their uses. For Criterion and RWT reviewed above, claims systematized under this framework can be found in Chapelle et al. (2015) and in Cotos (forthcoming), respectively. Unlike more recent studies, earlier works, many of which were reviewed in meta-analyses (Graham et al., 2015; Nunes et al., 2022; Stevenson & Phakiti, 2014), are not explicitly positioned within the validity argument framework but still address different inferences. Table 3 synthesizes the findings from example studies to show that there is substantial positive evidence for the successful application of AWE across these key areas.

Table 3 AWE validity argument inferences and claims

As with other educational technologies, some studies unveil rebuttal evidence, or issues that weaken the strength of the claims one would like to make about AWE. For instance, Extrapolation cannot be confidently claimed because AWE feedback may not always be as good as teacher or peer feedback (Dikli & Bleyle, 2014). Impact may be affected because assessment-driven AWE feedback tends to promote surface-level revisions, may have no or low uptake on some writing traits, and can inhibit revising of propositional content (Li et al., 2015; Ranalli, 2021; Ware, 2014).

Such variability in outcomes is not surprising because it depends not so much on the tools themselves as on how they are implemented. Moreover, the research methods adopted stem from different disciplinary paradigms. Mixed methods have gained ground, but there is a clear need for longitudinal studies examining the effects of AWE feedback over an extended period of time. Variability in findings is also due to differing assumptions about what constitutes effectiveness (e.g., engagement, motivation, affect, writing improvement, skill development in first and second language) and how it is measured. Measures like error frequency and error reduction, for instance, are confined to impact on revised texts and do not extrapolate well to new compositions. In future research, revision quantity should be reported along with large-scale analyses of specific qualitative changes in writing performance. To avoid overemphasizing writing products, these should be examined vis-à-vis the process of writing with AWE feedback, and interaction behaviours should be scrutinized to reveal the metacognitive processes activated by writers along with the strategies they develop when drafting and revising.

6 Implications of This Technology for Writing Theory and Practice

Considering the snapshots of the research and the representative tools above, it can be argued that AWE technology appears to have reached significant milestones toward its specific goals of addressing the challenges inherent in writing development and the teaching of writing. However, this does not mean that AWE has arrived at a standard solution. First, assessment-driven and genre-based strands have been developing in parallel. In the future, it is likely that a fourth generation of AWE will emerge, drawing on the features and affordances of both assessment-driven and genre-based tools. The AWE evolution will also leverage the capabilities of ITSs with animated agents (such as those of the Writing Pal, chapter “The Future of Intelligent Tutoring Systems for Writing”) and biometric technology (chapter “Investigating Writing Processes with Keystroke Logging”) to personify the feedback and generate interactive, strategic, and data-driven feedback fit for particular stages of the writing process.

To materialize these envisioned directions, it is of utmost importance for research to enrich existing writing theories. One possible scenario falls under the framework of cognitive writing models, where theoretical understanding could be deepened in terms of whether and how cognitive writing modelling applies to the revision process when it is assisted by AWE tools. Empirical investigations of the effects of cognitive mechanisms activated during AWE-assisted revision will have direct implications for writing theory, as the results will inform an enhanced cognitive model of writing that incorporates the role of technology as the digital writing environment.

This, in turn, will have ramifications for the next generation of AWE, as it will enable developers to efficiently map metacognitive participatory engagement and to design an AWE-assisted writing conceptual ‘corridor’ linkable to different realizations of cognitive activities. In other words, when developing writers appear to drift away from critical cognitive and metacognitive paths, advanced artificial intelligence-enabled features might steer them through successful AWE-interaction trajectories with feedback and scaffolding that would facilitate the activation of appropriate aspects of metacognition at appropriate stages of drafting and revision (see Banawan et al., chapter “The Future of Intelligent Tutoring Systems for Writing”).

Furthermore, research conducted in different instructional settings with different learner characteristics and targeting different genres will address the relationship between the cognitive processes activated during AWE-facilitated writing and the instructional practices brought into play by teachers. This intersection with practice will yield potentially generalizable insights informing principles for creating optimal digital conditions for AWE-supported writing skill development and implementation guidelines for effective broader use and integration. Those principles and guidelines would be developed to support possible variations in enactment and to allow practitioners to create AWE-facilitated instructional ecosystems that would be appropriate for different types of learners, contexts, and writing tasks.

Before this and other theory-research-practice concatenation scenarios become reality, teachers are encouraged to begin developing what Argyris (1997) terms theory-in-use models for the educational effectiveness of an innovation. Hazelton et al. (2021) demonstrate two theory-of-action models based on instructors’ standpoints for using an AWE tool (Writing Mentor) with non-traditional adult learners and with two-year college students. Their models account for the features of the tool (as instances of digital-technology mediation of the writing construct), demonstrated and hypothesized pedagogical actions (as defined teaching objectives), and intended and unintended consequences (positive and negative, intermediate and long-term effects). With all these model components maintaining a constant focus on learners, Hazelton et al. (2021) argue that the pedagogical future for formative AWE “may best be charted by standpoint theory of action” (p. 81).

7 AWE Tools

See Table 4.

Table 4 Select AWE tools