To validate TAACO and to investigate how local, global, and overall text cohesion indices can be used to predict expert judgments of text coherence and essay quality, we examined the relations between the indices provided by TAACO (outlined in greater detail below) and the human scores assigned to a corpus of student essays. The corpus comprised independent essays that were written within a 25-min time frame and scored by expert raters for essay coherence and overall essay quality.
We selected the corpus of essays used in Crossley and McNamara (2011) in order to assess global cohesion. This corpus comprises 313 timed essays written on SAT prompts. The essays were written by undergraduate freshmen composition students at Mississippi State University. The students were given 25 min to write an essay, during which no outside referencing was allowed. Two SAT prompts were used in the data collection, with students being randomly assigned to either prompt. All of the students were native speakers of English.
Each essay was read and scored by two trained raters on both overall quality (i.e., a holistic score) and specific textual elements (i.e., analytic scores). Eight raters in total took part. The holistic grading scale was based on a standardized rubric commonly used in assessing SAT essays. The analytic rubric included sections related to the essay purpose, the essay plan, the use of topic sentences, the use of paragraph transitions, essay organization, writer conviction, and grammar and mechanics. Of interest for this analysis was the analytic feature relating to organization (i.e., coherence), which evaluated semantic-based, global cohesion (i.e., that the body paragraphs followed the plan set up in the introduction). Such structural elements promote overall text comprehension through the increase of global cohesion.
The trained raters who evaluated the essays had either master’s or doctoral degrees in English, and each rater had at least 3 years of experience teaching university-level composition classes. We thus consider these raters to be high-knowledge readers. The raters were informed that the distances between scores were equal. The raters were first trained to use the rubric with 20 practice essays. After they had reached interrater reliabilities of at least r = .50 for the analytic scores and at least r = .70 for the holistic score, they scored the 313 essays independently. After scoring was completed, the differences between raters were calculated. If the difference in ratings on a feature was less than 2 points, an average score was computed for that essay feature. If the difference was greater than 2 points, a third expert rater adjudicated the final rating. Before adjudication, the correlations between the raters were r = .79 for the holistic score and r = .69 for the organization score.
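The score-resolution rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure's actual implementation: the function name `resolve_score` and the `adjudicate` callback are hypothetical, and because the text does not say what happened when two ratings differed by exactly 2 points, the sketch assumes that such cases also went to the third rater.

```python
def resolve_score(rater_a, rater_b, adjudicate):
    """Combine two ratings of one essay feature into a final score.

    Ratings that differ by less than 2 points are averaged. Otherwise the
    `adjudicate` callback (a stand-in for the third expert rater) supplies
    the final rating. Treating a difference of exactly 2 as an adjudication
    case is an assumption of this sketch.
    """
    if abs(rater_a - rater_b) < 2:
        return (rater_a + rater_b) / 2
    return adjudicate()


# Ratings of 4 and 5 are averaged; ratings of 2 and 5 go to the third rater.
averaged = resolve_score(4, 5, adjudicate=lambda: 0)
adjudicated = resolve_score(2, 5, adjudicate=lambda: 3)
```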
TAACO is a freely available text analysis tool that is written in Python, but it is implemented in a way that requires little to no knowledge of programming, since it can be started by double-clicking the TAACO icon. The TAACO interface is an easy to use and intuitive graphical user interface (GUI) that requires the user to select an input folder containing the files of interest (in .txt format). The user then selects an output folder for the output file and enters a name for a .csv file that TAACO will write the results for each text into (the default name is results.csv). The user then selects to process the texts, and a program status box informs the user of how many texts have been processed (see Fig. 1 for the TAACO GUI). Instructions and explanations for using TAACO, and the program itself, are available at www.kristopherkyle.com/taaco.html.
For a number of indices, the tool incorporates a part-of-speech (POS) tagger from the Natural Language Tool Kit (Bird, Klein, & Loper, 2009) and synonym sets from the WordNet lexical database (Miller, 1995). TAACO differs from other automatic tools that assess cohesion (e.g., Coh-Metrix; Graesser et al., 2004; McNamara et al., 2014) in that it reports on a greater number and variety of local, global, and overall text cohesion markers (see Table 1 for an overview). Additionally, TAACO is housed on the user’s hard drive, so users can work independently of outside servers and process sensitive data securely.
TAACO calculates a number of sentence overlap indices that assess local and global cohesion. These indices compute lemma overlap (e.g., the lemma for the words human, humans, humanly, and inhumane is human) between two adjacent sentences or paragraphs and between three adjacent sentences or paragraphs. TAACO calculates average overlap scores across sentences and paragraphs for all lemmas, for content word lemmas, and for lemmas with specific POS tags, such as nouns, verbs, adjectives, adverbs, and pronouns. TAACO also calculates binary overlap scores for these features, which indicate whether there is any overlap between adjacent sentences or paragraphs. Local cohesion overlap indices have demonstrated positive relations with measures of cohesion in previous studies (McNamara et al., 2010a, b), but generally they demonstrate no significant relations with measures of coherence (Crossley & McNamara, 2010, 2011). Paragraph overlap indices have demonstrated positive relations with measures of text coherence in previous studies (Crossley & McNamara, 2011).
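A minimal Python sketch of an adjacent-sentence overlap index may make the average and binary scores concrete. It is not TAACO's code. It makes several simplifying assumptions: lowercased word forms stand in for lemmas, no POS filtering is applied, and the average is taken over sentence comparisons without further normalization. The names `word_types` and `adjacent_overlap` are hypothetical.

```python
import re


def word_types(sentence):
    """Lowercased word forms; a stand-in for real lemmatization."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def adjacent_overlap(sentences, span=1):
    """Overlap between each sentence and the `span` sentences that follow it.

    span=1 compares two adjacent sentences; span=2 covers three.
    Returns (average number of shared types per comparison,
             proportion of comparisons with any shared type).
    """
    type_sets = [word_types(s) for s in sentences]
    counts = []
    for i in range(len(type_sets) - span):
        following = set().union(*type_sets[i + 1 : i + 1 + span])
        counts.append(len(type_sets[i] & following))
    if not counts:
        return 0.0, 0.0
    average = sum(counts) / len(counts)
    binary = sum(1 for c in counts if c > 0) / len(counts)
    return average, binary


sentences = ["The dog barked loudly.", "A dog ran away.", "Cats sleep all day."]
two_sentence = adjacent_overlap(sentences, span=1)
three_sentence = adjacent_overlap(sentences, span=2)
```

The same function applies to paragraphs if each list element holds a paragraph rather than a sentence.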
Using the WordNet database, TAACO calculates overlap between words and sets of word synonyms (synsets) between sentences and between paragraphs. Unlike strict overlap indices, these indices measure overlap between semantically related words (e.g., the synset for jump contains the related words leap, bound, and spring, among others). TAACO calculates semantic overlap between sentences (local cohesion) and paragraphs (global cohesion) for nouns and for verbs. Semantic overlap has demonstrated positive relations with measures of cohesion in previous studies (McNamara et al., 2010a, b), but generally it has demonstrated no significant relations with measures of coherence (Crossley & McNamara, 2010, 2011).
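The difference between strict and semantic overlap can be illustrated with a small Python sketch. To keep the example self-contained, it uses a hand-made dictionary (`TOY_SYNSETS`, hypothetical) in place of WordNet, and it ignores the restriction to nouns and verbs; it should be read as an illustration of the idea rather than as TAACO's procedure.

```python
import re

# Hypothetical hand-made synonym sets standing in for WordNet synsets.
TOY_SYNSETS = {
    "jump": {"jump", "leap", "bound", "spring"},
    "car": {"car", "auto", "automobile"},
}


def expand(word):
    """Return the word plus every member of any toy synset containing it."""
    related = {word}
    for members in TOY_SYNSETS.values():
        if word in members:
            related |= members
    return related


def semantic_overlap(first, second):
    """Count word types in `first` that, or whose synonyms, occur in `second`."""
    types_a = set(re.findall(r"[a-z']+", first.lower()))
    types_b = set(re.findall(r"[a-z']+", second.lower()))
    return sum(1 for word in types_a if expand(word) & types_b)


# No word form is shared, but "jump" and "leap" belong to the same toy synset.
score = semantic_overlap("The cat will jump.", "One leap was enough.")
```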
Givenness is an important element of measuring cohesion and reflects the amount of information that is recoverable from the preceding discourse. To assess givenness, TAACO calculates the incidence of a variety of pronoun types, including first (e.g., I, me, us), second (e.g., you), and third (e.g., he, she, him, them) person pronouns, subject pronouns (i.e., I, you, she, he, but not me, him, and her), and quantity pronouns (e.g., many), under the presumption that pronouns are used when information is given (Crossley, Allen, Kyle, & McNamara, 2014). Following a similar presumption, TAACO calculates the ratio of nouns to pronouns. TAACO also counts the incidence of definite articles (i.e., the) and demonstratives (i.e., this, those, that, and these), under the presumption that definiteness is used for given information. Lastly, TAACO calculates the number and proportion of single content lemmas (i.e., content lemmas that occur only once in a text). Givenness indices have demonstrated positive relations with measures of text coherence in previous studies (Crossley & McNamara, 2011). These indices are calculated at the text level.
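Several of these text-level counts can be approximated with simple word lists, as in the following Python sketch. The word lists contain only the examples given in the text, not TAACO's full lists. The sketch counts word forms rather than content lemmas, and it does not use a POS tagger, so a form such as that is counted as a demonstrative even when it functions as a complementizer. The function name `givenness_counts` is hypothetical.

```python
import re
from collections import Counter

# Illustrative lists built from the examples in the text (not TAACO's lists).
THIRD_PERSON = {"he", "she", "him", "them"}
DEMONSTRATIVES = {"this", "those", "that", "these"}


def givenness_counts(text):
    """Raw text-level counts for a few givenness markers."""
    tokens = re.findall(r"[a-z']+", text.lower())
    freq = Counter(tokens)
    return {
        "tokens": len(tokens),
        "third_person_pronouns": sum(freq[w] for w in THIRD_PERSON),
        "definite_articles": freq["the"],
        "demonstratives": sum(freq[w] for w in DEMONSTRATIVES),
        # Word forms that occur exactly once (a proxy for single lemmas).
        "single_occurrence_types": sum(1 for c in freq.values() if c == 1),
    }


counts = givenness_counts("She saw the dog. She liked this dog.")
```

Dividing each count by `counts["tokens"]` gives one possible incidence score; the normalization that TAACO applies is not specified here.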
Type–token ratio (TTR)
TTR measures the repetition of words in a text by dividing the number of unique words (types) by the total number of words (tokens). Thus, it likely taps into the amount of given information in a text. TAACO calculates a number of different TTR indices. These include simple TTR (the ratio of types to tokens), content word TTR (TTR using only content words such as nouns, verbs, adjectives, and adverbs), lemma TTR, and content lemma TTR. In addition to traditional word-based TTR indices, TAACO also calculates TTR for bigrams (i.e., two-word strings) and for trigrams (i.e., three-word strings). TTR indices have demonstrated positive relations with measures of cohesion in previous studies (Crossley & McNamara, 2014; McCarthy & Jarvis, 2010), but generally they demonstrate negative relations with measures of text coherence (Crossley & McNamara, 2010; McNamara et al., 2010a, b). TTR indices are calculated at the text level.
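The word, bigram, and trigram versions of TTR share one definition, the number of unique items divided by the total number of items, which the following Python sketch makes explicit. The tokenizer is a simple regular expression chosen for illustration, and the function names are hypothetical; content word and lemma TTR would additionally require a POS tagger and a lemmatizer.

```python
import re


def ngrams(tokens, n):
    """All contiguous n-token strings, as tuples."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def ttr(items):
    """Type-token ratio: unique items divided by total items."""
    items = list(items)
    return len(set(items)) / len(items) if items else 0.0


def ttr_indices(text):
    """Simple, bigram, and trigram TTR for one text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "simple_ttr": ttr(tokens),
        "bigram_ttr": ttr(ngrams(tokens, 2)),
        "trigram_ttr": ttr(ngrams(tokens, 3)),
    }


# 5 tokens and 3 types; 4 bigrams, 3 of them unique; 3 unique trigrams.
indices = ttr_indices("the cat saw the cat")
```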
TAACO contains a number of connective indices that measure local cohesion. Many of the connective indices are similar to those found in Coh-Metrix (McNamara et al., 2014) and are theoretically based on two dimensions. The first dimension contrasts positive versus negative connectives, and the second dimension is associated with the particular classes of cohesion identified by Halliday and Hasan (1976) and Louwerse (2001), such as temporal, additive, and causative connectives. These theoretically based indices have demonstrated negligible or negative correlations with essay quality and essay coherence (Crossley & McNamara, 2010, 2011). A number of new connective indices were also included in TAACO, based on considerations of how connectives operate rhetorically in written texts rather than on theoretical grounds. These connective classes are summarized, with examples, in Table 2. The lists were collected through reference searches and consultation with experts. Some connective indices have demonstrated positive relations with measures of cohesion in previous studies (McNamara et al., 2010a, b), but generally they demonstrate no significant relations with measures of coherence (Crossley & McNamara, 2010, 2011).
For the essay analyses, the TAACO indices were the predictor variables, and the human scores (for both coherence and overall essay quality) were the criterion variables. Indices reported by TAACO that lacked normal distributions were removed. The corpus was first divided into training and test sets using a 67/33 split (Witten, Frank, & Hall, 2011). Using the training set, correlations were then calculated to determine whether there was a statistically significant (p < .05) and meaningful (of at least a small effect size, r > .10) relation between each TAACO index and the human scores for coherence and for holistic quality. When indices were highly collinear (r > .90), the index with the strongest correlation with the human scores was retained and the other indices were removed. The remaining indices were included as predictor variables in a stepwise multiple regression to explain the variance in the human scores of both coherence and overall essay quality. The model from the stepwise regression was then used to predict the human scores for the essays in the test set. We predicted that the global cohesion indices would correlate positively with the human ratings of coherence and essay quality, and that the local cohesion indices would correlate negatively.
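The index-selection steps that precede the regression can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the thresholds are applied to the absolute value of r, the significance test (p < .05), the normality screening, and the stepwise regression itself are omitted, and the function names and the random seed are hypothetical.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def split_indices(n_items, train_share=0.67, seed=0):
    """Shuffle essay positions and split them into training and test sets."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    cut = round(n_items * train_share)
    return order[:cut], order[cut:]


def select_predictors(features, scores, min_r=0.10, max_collinear=0.90):
    """Keep indices with |r| > min_r, dropping the weaker of collinear pairs.

    `features` maps an index name to its values on the training essays.
    Using absolute correlations is an assumption of this sketch.
    """
    strength = {name: abs(pearson_r(vals, scores)) for name, vals in features.items()}
    candidates = sorted(
        (name for name in strength if strength[name] > min_r),
        key=lambda name: -strength[name],
    )
    kept = []
    for name in candidates:  # strongest first, so the stronger index wins
        if all(abs(pearson_r(features[name], features[k])) <= max_collinear for k in kept):
            kept.append(name)
    return kept


train, test = split_indices(313)
toy_scores = [1, 2, 3, 4, 5, 6]
toy_features = {
    "a": [1, 2, 3, 4, 5, 6],  # strongest relation with the scores
    "b": [1, 2, 3, 4, 5, 7],  # collinear with "a" and weaker, so removed
    "c": [2, 1, 4, 3, 6, 5],  # related to the scores, not collinear with "a"
    "d": [1, 2, 3, 3, 2, 1],  # no linear relation with the scores
}
selected = select_predictors(toy_features, toy_scores)
```

With 313 essays, a 67/33 split under this rounding rule yields 210 training essays and 103 test essays; the exact set sizes used in the study are not stated in this section.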