Collaborative augmentation and simplification of text (CoAST): pedagogical applications of natural language processing in digital learning environments

The digitisation of higher education is raising significant questions about the impact of artificial intelligence and automation on teaching and learning environments, highlighting the need to investigate how teachers and students can work with new educational technologies in complementary ways. This paper reports results from a pilot study of the collaborative augmentation and simplification of text (CoAST) system, which is online software designed to facilitate the engagement of university students with theoretically-sophisticated academic texts. CoAST offers a digital learning interface that uses natural language processing algorithms to identify words that can be difficult to understand for readers at different ability levels. Course lecturers use their pedagogical content knowledge to add brief annotations to identified words. The software was trialed using a quasi-experimental design with (1) 23 undergraduate Education Studies students and (2) 23 digital and technology solutions students. Results suggest that CoAST offers a digital learning environment that can effectively mediate and enhance pedagogical relationships between teachers, students, and complex theoretical texts.


Introduction
In recent years, the disciplines of computer science, media studies, and education have found new points of convergence in the development and study of digital learning environments enhanced by artificial intelligence (AI). Having historically integrated qualitative and quantitative studies that situate the psychosocial, technological, and architectural elements of learning environments within dynamic frameworks of multimodal analysis and evaluation (Imms et al., 2016; Tobin, 1998; Tobin & Fraser, 1998; Walker & Fraser, 2005; Zandvliet, 2014), the field of learning environments research offers a fertile space for engaging with this interdisciplinary work.
Digital learning spaces now occupy a significant trajectory of research into learning environments. Work in this field has widely described digital learning environments as offering significant opportunities for more immersive (Bacca, 2014), mobile (Baran, 2014), blended (Casquero et al., 2016), asynchronous (Walker and Fraser, 2005), collaborative (Ho et al., 2011), adaptive (Freigang et al., 2018) and environmentally-distributed (Rousell, 2019) teaching and learning experiences. The widespread shift toward digital and blended learning environments has specifically transformed the landscape of higher education over the last two decades, with the digitisation, decentralisation, and massification of the university often leading to increased automation and instrumentalisation of educational provision (Peters & Besley, 2013). In some cases, the educational relationships between students and teachers in higher education are now mediated almost entirely through digital environments, including environments which are increasingly embedded with artificial intelligence and 'smart learning environment' technologies. This introduction of AI and smart interfaces into higher education is opening up new questions and sub-fields of inquiry within the broader field of learning environments research (Freigang et al., 2018;Song & Wang, 2020). These emerging technologies have demonstrated the capacity both to mobilise and instrumentalise learning in complex and unpredictable ways, while also demanding new technological literacies of both teachers and students (Oliver, 2011).
Collaboration has also been a central theme for research into digital learning environments over the last two decades (Nistor et al., 2015; Tobin, 1998). Peters and Besley (2013) argue that the increasing digitisation and decentralisation of the university holds significant opportunities for more democratic and imaginative forms of educational practice which emphasise "theories of collaboration, collective intelligence, commons-based peer production and mass participation in conceptions of open development" (p. x). However, twenty-first-century learning environments increasingly blur conventional boundaries between human and machine learning, raising questions about what it means to 'collaborate' with AI-driven algorithms, while also generating new sets of ethical and pragmatic problems regarding surveillance, control, and the automation of pedagogical work (de Freitas et al., 2019). Koper (2014) further cautions against the tendency to separate digital interfaces from the complex physical locations through which they are accessed, noting that elements of the physical environment continuously influence learners' engagement, attention, memory cueing, affective arousal, and encoding ability when interacting with digital learning interfaces.
Machine learning applications associated with text simplification and natural language processing (NLP) offer one possibility for developing and implementing collaborative learning environments that combine both digital and physical elements (Gasperin et al., 2009; Leroy et al., 2013). While text simplification has an active community of researchers experimenting with different methods of text-based machine learning, little is known about the specific applications of these experiments to enhance the practical work of teaching and learning in higher education. Moreover, the practical applications of text simplification in naturalistic online learning environments remain scarcely reported (Litman, 2016). To date, educational applications of NLP have primarily focused on complex word identification in scientific or medical texts (Gala et al., 2015) and the replacement of discipline-specific terminology with more-common vernacular language (Aluisio et al., 2010). While the technical capabilities of NLP software advance at an accelerating rate (Hervas et al., 2014; Shardlow & Nawaz, 2019), the broad pedagogical applications of these technologies in digital learning environments remain relatively unexplored and under-theorised.
This article responds to the need for innovative pedagogical approaches and educational technologies that address the affordances of natural language processing within online collaborative learning environments (Ho et al., 2011). Specifically, we investigate the potentials for NLP to support and enhance elements of reading comprehension and pedagogy in higher education through the collaborative augmentation of academic texts. There is a commonly-reported 'gap' in the theoretical literacy of undergraduate students who struggle to read, comprehend, and engage productively with advanced theoretical texts (Bhat, 2012;Brabazon, 2011). Although theoretical literacy is widely cited as one of the key challenges and measures of success for university students, it has only rarely been the subject of research and pedagogical interventions in higher education (Alamon, 2003;Deng, 2004;Ivanic et al., 2009).
The article reports results from a pilot study of Collaborative Augmentation and Simplification of Text (CoAST) software developed to facilitate and enhance student engagement with theoretically-sophisticated academic texts. This work builds on prior research in the field of text simplification to develop new methods for collaborative engagement and augmentation of online texts. The CoAST software automatically identifies words that can be difficult to understand for readers at different ability levels and enables course lecturers to add brief annotations to the identified words. The software is designed to combine the machinic automation of complex word identification with the contextual sensitivity of the lecturer's pedagogical content knowledge (PCK). Importantly, the design of CoAST involves machine/human collaboration throughout the process, leading to high-quality simplifications and making it distinct from previous simplification efforts. Our initial trials and evaluation of the software suggest that the CoAST learning environment effectively enhances student engagement with complex theoretical texts, while also establishing innovative modes of digital mediation and collaboration between students and lecturers.
We situate the development and evaluation of this innovative piece of software within broader theoretical discussions in learning environments research (Mäkelä & Helfenstein, 2016; Roth, 2000; Rousell, 2016) and critical studies in education, new media, and technology (Dakers, 2019; Sellar & Cole, 2017). Our project was undertaken using an interdisciplinary approach that involved a synthesis of two or more disciplines to establish a new level of discourse and integration of knowledge (Choi & Pak, 2006). Drawing on computer science and philosophy of education provided us with a novel perspective on our research problem: the gap in theoretical literacy for undergraduate university students. We treated this gap as an opportunity to conduct a design experiment with human/machine collaborative pedagogies, rather than as an individual deficit susceptible to a technical or conceptual solution imposed from the outside.
In the first part of this article, we begin by situating the development and evaluation of the CoAST software within a conceptual design framework that integrates elements from the philosophy of technology and relevant literature in economics and education studies. Drawing on the philosophical work of Gilbert Simondon, we briefly outline a theoretical framework that focuses on human-technical relations as inherently cultural, collaborative, and pedagogical. We then discuss several critical concerns regarding the automation of labour through digitisation and artificial intelligence, foregrounding the need to identify and develop complementarities between humans and machines. The second part of the article introduces the design, development and trial of the CoAST platform, including detailed descriptions of our methods and initial results from trials of the software with cohorts of first-year Education Studies (n = 23) and Digital Technology Solutions (n = 23) students. A discussion of our findings is followed by a brief conclusion.

Gilbert Simondon's philosophy of technology
Writing in the mid-twentieth century, the French philosopher of technology, Gilbert Simondon (1958/2017), argued for more open and permeable relationships between humans and technical objects. Simondon noted that technical objects had historically been relegated to mere utility, essentially working as slaves to serve instrumental human needs. He argued that technology was having an alienating effect on human civilisation precisely because machines had been alienated from human culture. Modern culture failed to realise the individuations crystallised in machines that require ongoing support and connection, and therefore treated machines as simply things that either 'worked' or were discarded (Dakers, 2019). Simondon (1958/2017) advocated the cultivation of a 'culture of technics' and the design of 'open machines' that enable more-collaborative interconnections between machines and human intelligence. This requires a return to what Simondon calls 'originary technicity', in which cultural and technical realities are considered inseparable, as in the ritual and architectural constructions of ancient societies. Pragmatically, Simondon supported the development of what he calls 'technical ensembles' that bring technics and human culture together through a relationship of orchestration, rather than enslavement. The role of the human in this ensemble is that of inventor, conductor, interpreter, and organiser of technical objects and machines. The role of the technical object in the ensemble depends upon its sensitivity to external information, as well as its capacity to operate within a margin of indeterminacy in mutual relation with human coordinators.
Simondon's philosophy of technology has profound implications for the conceptualisation and design of educational technologies and learning environments, particularly in an age when artificial intelligence is becoming increasingly ubiquitous. At the practical level, Simondon's concepts of a 'culture of technics' and 'open machines' helped to inform the design of CoAST as a technical ensemble which integrates both machine and human intelligence. This also pertains to the integration of technical and cultural work into the process of teaching and learning within educational systems and environments. These ideas directly informed our design of CoAST as a collaborative learning environment for enhancing theoretical literacy in several ways. First, we wanted to create a system that blurred the conventional boundaries between the cultural labour of teaching and the technical labour of text simplification and augmentation. Second, we designed the system in ways that resisted the automation of teachers' labour and, instead, enhanced teachers' cultural work through new modes of human-machine interaction with online texts. Finally, we designed CoAST as an open technical ensemble that allows new modes of human-machine orchestration and mediation between teachers, learners, and texts.

Technology and skills
The argument that Simondon made from a philosophical perspective is supported by more recent work in economics that examines relationships between humans and machines, particularly with regard to the impact of automation, including artificial intelligence, on the professions. Over the past five years, there has been renewed anxiety about automation and technological unemployment, linked to rapid developments in AI and the automation of tasks. Frey and Osborne (2017) predicted that 47% of US jobs would be susceptible to automation, while others have used different approaches to argue that the risks of automation are not so great. For example, Arntz et al. (2016) found that, across 21 OECD countries, an average of 9% of jobs are automatable, and Nedelkoska and Quintini (2018) found that 14% of jobs across 32 OECD countries are highly automatable (equating to 66 million workers), while a further 32% have a high risk of automation (50-70%). Despite these differences, there is consensus that automation will be disruptive to a greater or lesser extent, and there are widespread efforts to address this disruption through education and skills policy.
Most technological change in the twentieth century was skill-biased (Katz & Autor, 1999), likely because of the increased supply of skilled workers accelerating the development of certain technologies (Acemoglu, 2002). As a result, demand for, and the earnings of, college-educated workers increased from 1970 to 1998 and this trend has underpinned predictions that education and training can accommodate changing skills requirements in future labour markets. Anxieties about machine substitution of human labour might be overstated, because routine tasks susceptible to automation often cannot be easily separated from non-routine tasks that require "interpersonal interaction, flexibility, adaptability, and problem solving" (Autor, 2015, p. 5). Autor et al. (2003) have shown that "[c]omputer technology substitutes for workers in performing routine tasks that can be readily described with programmed rules, while complementing workers in executing non-routine tasks demanding flexibility, creativity, generalized problem-solving capabilities, and complex communications" (p. 1322). Based on Autor's (2015) analysis, "… the issue is not that middle-class workers are doomed by automation and technology, but instead that human capital investment must be at the heart of any long-term strategy for producing skills that are complemented by rather than substituted for by technological change" (p. 27).
There is consensus that automation will substitute for some tasks and workers, although the nature and extent of this substitution varies according to the assumptions and methodologies used to model the risks. The focus on complementarity in Autor's work highlights the need to develop new kinds of skills for working with machines, or to nurture existing non-routine skills, in order to ensure that machines do not replace large numbers of workers, including teachers.

Pedagogy
Our design of CoAST sought to address a gap in existing technical and cultural modes of mediation between teachers, learners, and texts through digital interfaces embedded in HE learning environments. In conventional models of teaching and learning, the relations between teachers, learners, and texts are often seen as mutually-exclusive interactions in the construction of reading comprehension (Alexander & Fox, 2004; see Fig. 1). Consider the example of discussing a text in an HE class. The teacher reads, the student reads and the teacher and student talk, or the teacher reads, the teacher and student talk, and the student reads. While these varied engagements with texts are pervasively conditioned by wider physical, social, and psychological elements of the learning environment, the pedagogical interactions with the text are limited to 1-to-1 engagements between the teacher/text, teacher/student, and student/text. Teachers' understanding of students' engagement with the text must be formed through their interpersonal relation to students, and it is difficult for (a) the teacher to influence the student's reading process as it occurs and (b) the student to provide detailed and specific information about this process. This is particularly the case in HE learning environments where reading is typically undertaken as a solitary activity outside tutorial sessions. Arguably, the relationships are more integrated in primary-school classrooms where, for instance, texts are read aloud and comprehension is collectively engaged at the phrase, sentence, and paragraph levels (see, for instance, Scharlach, 2008). Lusted (1986, p. 3) has argued that pedagogy involves "the transformation of consciousness that takes place in the intersection of three agencies – the teacher, the learner and the knowledge they together produce".
A constructivist pedagogy, as opposed to the transmission of knowledge, "foregrounds exchange between and over the categories, it recognises the productivity of the relations, and it renders the parties within them as active, changing and changeable agencies" (p. 3). In Fig. 2, we show how the CoAST system is designed to mediate between the three agencies, operating as a technical ensemble that enables new modes of pedagogical collaboration amongst teachers, learners, and texts (e.g. Dascalu et al., 2015;Litman, 2016). Interaction with the system is still conditioned by pervasive elements of the wider learning environment (e.g. physical arrangement of furniture and hardware, social milieu, individual mental attitudes towards learning), but the system opens up different possibilities for collaborative engagement with the text through the redistribution of the teacher's specialised knowledge (Vitanova, 2004). Within the pedagogical relationship mediated by CoAST, the teacher selects texts and augments difficult words and phrases identified by the technical system. In many ways, this role is natural for teachers who possess knowledge and skills relevant to the particular disciplines and contexts in which they are teaching. The CoAST system is thus designed as a mediating agent that enables a more powerful use of teachers' pedagogical content knowledge (PCK).

Fig. 1 The student and teacher interact with the text and each other as mediated by elements of the wider HE learning environment. While acknowledging sociocultural and other environmental influences that pervade these interactions, direct relationships are often separated into 1-to-1 interactions between teacher/text, teacher/student, and student/text
The concept of PCK was popularised by Shulman (1986, 1987) and describes "that special amalgam of content and pedagogy that is uniquely the province of teachers, their own special form of professional understanding" (Shulman, 1987, p. 8). PCK "represents the blending of content and pedagogy into an understanding of how particular topics, problems, or issues are organized, represented, and adapted to the diverse interests and abilities of learners, and presented for instruction" (p. 8). CoAST provides (a) teachers with information about texts (difficult words) and students (interactions with the system) and (b) students with information about texts (definitions) provided by teachers (PCK). The text itself is also modified by its clean presentation within the CoAST learning environment, which enables configuration of font, font size, colours and other variables that can enhance readability.

Text simplification
Our theoretical framework also addresses the text itself as the third term in the pedagogical relationship, as mediated by pervasive physical, social, and psychological elements of the situated learning environment. Our design builds on recent technical work in the emerging field of text simplification. Early work focused on learning rules to adapt grammatical structures (Chandrasekar & Srinivas, 1997) and directly replacing difficult items of vocabulary with simpler alternatives for people with autism (Devlin & Tait, 1998). Although these two strands of research continued throughout the 2000s (Shardlow, 2014; Siddharthan, 2014), they remained disconnected, with efforts focusing on either simplifying the grammar or the vocabulary, but never both at the same time. More recent approaches have leveraged statistical machine translation (Xu et al., 2016) and neural machine translation (Nisioi et al., 2017) techniques for text simplification, which allow both syntax and lexicon to be simplified at the same time. However, because this is not without error, we have chosen in our work not to use fully-automated simplification, but instead to enable the teacher to provide the simplifications. This creates an opportunity for teachers to apply their PCK within CoAST to create a pedagogical relationship with students at the point of reading.
Fig. 2 The introduction of the CoAST digital interface recentralises the relationships, while remaining influenced by pervasive physical, social, and psychological elements of the situation. The teacher and the student are now able to interact with the text, and each other, through the system, establishing a new collaborative interface for sharing the teacher's pedagogical content knowledge in order to build students' theoretical literacy
An important pre-processing step in lexical simplification is complex word identification (Shardlow, 2013), which involves each word being evaluated to determine if it requires simplification or not. Typical approaches include simplifying every word with an easier alternative (Devlin & Tait, 1998), using a frequency threshold to simplify only less-frequent words (Bott et al., 2012), or using machine learning to identify words which require simplification (Shardlow, 2013). Complex word identification is a naturally subjective task, and words identified as complex depend on the genre of the text, the intended audience and the situated context in which the word is used. We apply complex word identification as a stand-alone task to identify words which students might not understand. We used frequency thresholding with the Google Web1T frequencies, which have proven to be good predictors of lexical complexity in many cases (Brants & Franz, 2006).
Using a frequency threshold allows the CoAST system to modify the words that are identified as difficult depending on the student's abilities, as well as the teacher's expertise. Teachers can (a) choose to annotate the words identified by the frequency threshold or (b) select their own words for annotation based on their knowledge of the reading, their coursework and their students. The system then stores the annotations, thereby building up a lexical corpus of complex words and annotations that can be deployed across any number of texts.

Design and functionality of CoAST
The CoAST system is delivered via a web application, is compatible with all modern web browsers, and was developed using a JavaScript framework incorporating current web technologies such as AngularJS and Node.js. CoAST is backed by a MongoDB database, which stores information such as the user's registration details, the text documents and monitoring information. CoAST is currently hosted on an Amazon EC2 instance and the database is hosted via mLab. At the back-end, the CoAST database is structured as shown in Fig. 3. There are 5 tables in the database corresponding to the objects in our system: Users, Posts, Words, Clicks and Activity (Table 1).
The database structure allows recording and capturing complex information about how our system is used. We can easily see how many times a specific annotation has been clicked by a specific user, because each individual click is registered against the user's ID. We can also see how engaged users are with the system by observing how frequently they have interacted with the various posts that have been made available to them. This information can be shared with teachers to allow them to monitor their students' interaction with the system, thereby increasing their understanding of words and annotations that are accessed most frequently. This information is also valuable for research purposes because we can use it to perform analytics on the users' interactions with the system.
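The kind of click analytics described above can be sketched as follows. This is a minimal illustration, not the CoAST implementation: the record fields (`user_id`, `post_id`, `word`) and the function names are hypothetical, and the sample records are invented.

```python
from collections import Counter

# Hypothetical click records. The field names are illustrative only and
# are not taken from the actual CoAST database schema.
CLICKS = [
    {"user_id": "u1", "post_id": "p1", "word": "hegemony"},
    {"user_id": "u1", "post_id": "p1", "word": "hegemony"},
    {"user_id": "u2", "post_id": "p1", "word": "hegemony"},
    {"user_id": "u2", "post_id": "p1", "word": "praxis"},
    {"user_id": "u2", "post_id": "p2", "word": "praxis"},
    {"user_id": "u1", "post_id": "p2", "word": "episteme"},
]


def clicks_per_user_word(records):
    """Count how often each user clicked each annotated word."""
    return Counter((r["user_id"], r["word"]) for r in records)


def most_accessed_words(records, top_n=2):
    """Return the top_n most frequently clicked words with their counts."""
    return Counter(r["word"] for r in records).most_common(top_n)


def posts_opened_per_user(records):
    """Map each user to the number of distinct posts they interacted with."""
    seen = {}
    for r in records:
        seen.setdefault(r["user_id"], set()).add(r["post_id"])
    return {user: len(posts) for user, posts in seen.items()}
```

Because every click carries the user's ID, the same records support both per-student views (for teachers) and aggregate views (for research analytics).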
The website is laid out in a simple format as shown in the site-map (Fig. 4). Users must first enter their credentials at the login screen before they can select their course and documents from their course that have been annotated. There is also a page for analytics, which is only available to teachers. The teacher uploads a text document and the system highlights words that might need annotation, based on an analysis of word frequency thresholds. The highlights can be produced at three levels of student ability (beginner, intermediate and advanced), with more words highlighted for beginners than for advanced students.
To identify words which can be difficult for a student, we analyse word frequency according to the Google Web1T resource (LDC), whose frequencies were counted from 1 trillion words of text collected from the internet and give a good representation of how frequently a word occurs. This is a good general indicator of complexity, following the hypothesis that words which have been seen by a reader more frequently will be more familiar to them and hence easier to understand. We set three thresholds, at 150,000, 75,000 and 20,000 for the beginner, intermediate and advanced levels respectively, using test documents to tune the thresholds. A word is identified as difficult when its frequency falls below the threshold, so the larger the threshold, the more words are identified.
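Frequency thresholding of this kind can be sketched in a few lines. The sketch below rests on stated assumptions: the mapping of the three thresholds to ability levels is inferred from the text (larger threshold, more words, beginner level), the frequency table is a toy stand-in invented for illustration rather than real Web1T counts, and the function name and tokenisation rule are hypothetical.

```python
import re

# Thresholds reported in the text. Their assignment to levels is an
# assumption inferred from "the larger the threshold, the more words".
THRESHOLDS = {"beginner": 150_000, "intermediate": 75_000, "advanced": 20_000}

# Toy frequency table invented for illustration; NOT real Web1T counts.
TOY_FREQUENCIES = {
    "the": 5_000_000,
    "and": 4_000_000,
    "school": 400_000,
    "reflects": 160_000,
    "policy": 120_000,
    "pedagogy": 50_000,
    "hegemony": 10_000,
}

SAMPLE = "The school policy reflects pedagogy and hegemony"


def find_difficult_words(text, level, frequencies=TOY_FREQUENCIES):
    """Return words whose corpus frequency falls below the level's threshold.

    Words are lower-cased and returned once each, in order of appearance.
    Words missing from the table are treated as frequency 0 (an assumption),
    so they are always flagged.
    """
    threshold = THRESHOLDS[level]
    flagged = []
    for token in re.findall(r"[A-Za-z]+", text.lower()):
        if frequencies.get(token, 0) < threshold and token not in flagged:
            flagged.append(token)
    return flagged
```

With the toy table, `find_difficult_words(SAMPLE, "beginner")` flags three words, the intermediate level flags two, and the advanced level flags only the rarest one, mirroring the behaviour described for the three highlighting levels.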

User story 1: a typical interaction between a teacher and the system
To explain how the CoAST system can be used within differing contexts of a broader learning environment, we present two user stories and a series of images showing the usage of the system. In User Story 1, the teacher wants the class to read a subject-specific text which contains terminology beyond the expected ability of the students. The teacher uploads the text to the CoAST system (Fig. 5) and runs the word finding algorithm to identify potentially-difficult words (Fig. 6). The teacher then annotates the words that have been highlighted (Fig. 7), as well as any other words of their choice. If the word has previously been annotated in another document, earlier annotations are shown to the teacher who can either select an existing annotation or write a new one if the context or purpose differs. Once a word is annotated in a document, it appears for every instance of that word in the document to reduce the teacher's workload, following the one sense per discourse hypothesis (Gale et al., 1992). All annotations are saved as the teacher makes them. When teachers are finished, they exit the application and the document is immediately available to students.
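The two labour-saving behaviours in this user story (propagating one annotation to every instance of a word, and offering earlier annotations for reuse) can be sketched as below. This is an illustrative sketch under assumptions: the class and function names are hypothetical, matching is done on whole words case-insensitively, and annotations are held in memory rather than in the CoAST database.

```python
import re


def apply_annotation(text, word, note):
    """Return (start, end, note) spans for every whole-word occurrence of
    `word` in `text`, so a single annotation covers all instances
    (following the one-sense-per-discourse idea)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [(m.start(), m.end(), note) for m in pattern.finditer(text)]


class AnnotationStore:
    """In-memory stand-in for a shared store of earlier annotations."""

    def __init__(self):
        self._notes = {}  # lower-cased word -> list of distinct notes

    def add(self, word, note):
        notes = self._notes.setdefault(word.lower(), [])
        if note not in notes:
            notes.append(note)

    def suggestions(self, word):
        """Earlier annotations the teacher may reuse or replace."""
        return list(self._notes.get(word.lower(), []))


TEXT = "Hegemony shapes policy; hegemony persists."
STORE = AnnotationStore()
STORE.add("hegemony", "dominance of one group over others")
SPANS = apply_annotation(TEXT, "hegemony", STORE.suggestions("Hegemony")[0])
```

Here `SPANS` contains one entry per occurrence of the word, each carrying the same note, while `STORE.suggestions` returns any notes written for that word in earlier documents.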

User story 2: a typical interaction between a student and the system
Students log into the system from home, where they are presented with the available documents (Fig. 8). Students cannot edit the documents, only view them. When students open a document, they are presented with the original text with annotations (Fig. 9). The annotated words are highlighted, which minimises the disruption to the visual flow of the text.
To view an annotation, students must click on a highlighted word, revealing the annotation (Fig. 10). Students can read the document at their own pace, reviewing the annotations for words that they do not understand.

Material and methods
The project employed a design-based methodology (Gutierrez, 2016) to generate insights and build theory about learning through the development, prototyping, and evaluation of a technological, social, and conceptual intervention into HE teaching and learning (Kelly et al., 2008). The project was driven by the following aims:
1. To develop software for the collaborative augmentation and simplification of text to support student engagement with advanced theoretical texts
2. To pilot and evaluate the CoAST system with teachers and undergraduate students
Within this broader design methodology, we employed two experiments to test the efficacy of the CoAST learning environment. An initial experiment explored how well the automated complex word identification process compared with a teacher's assessment of which words a student would have difficulty understanding. We first gave each lecturer (n = 2) the test document offline from the system and asked them to highlight any words that students might not understand. This allowed us to assess both the agreement between teachers about which words were complex and their agreement with the system's automated complex word identification function.
For the principal experiment, our research question was: Does teacher and student use of the CoAST platform increase students' ability to comprehend key words in theoretical texts? We used a quasi-experimental design to trial the CoAST system on two separate occasions. Three abstracts from papers recently published in the journal Globalisation, Societies and Education (GSE) were uploaded to CoAST for the trial and annotated by two lecturers in the Education Studies unit. Articles from GSE aligned with the curriculum of the Education Studies students with whom CoAST was first trialed. GSE also publishes articles that draw on a wide range of humanities and social science disciplines, including theoretically-sophisticated articles. We selected three abstracts for the trial to provide students with a variety of texts that they could read in the time available for the experiment.
The first trial was conducted with two classes from a first-year undergraduate Education Studies unit. The second trial was conducted with one class of first-year Digital Technology Solutions (DTS) students, who were less familiar with the disciplinary knowledge and technical terms in the texts. This design enabled two types of comparison: (1) experimental versus control; and (2) Education Studies versus DTS students. All students were native English speakers enrolled in a first-year course at a UK university.
The trials began with a standard introduction to the task and division of students into experimental and control groups. Students were randomly selected into control and experimental groups but, because of their selection of particular units and their decision to attend the class on this particular day, the assignment of students to control and experimental groups was not truly random. Both groups of students then undertook the trial of the system simultaneously in adjacent study laboratories containing desktop computers arranged in parallel rows with standard office chairs. Students began by completing a basic 10-item synonym matching task. Ten words identified as difficult in the selected texts were included in one column and students were asked to match these to synonyms in a second column. This task was used to establish students' prior understanding of key words that were annotated for the control group. Students then read through the annotated texts, with the experimental group having access to the annotations. After reading the texts, students completed a 9-item reading comprehension task that included three items testing their vocabulary, three items testing their capacity to retrieve appropriate information from the text, and three items testing their capacity to draw appropriate inferences from the text.
After the trials with Education Studies students, we conducted focus-group discussions about students' experience of using the tool and their perspectives on its design and potential to support their learning. We analysed focus-group data to answer the following question: How do students experience CoAST as users? We conducted four focus-groups (two with the control groups and two with the experimental groups). The structure of the class with the DTS students did not enable us to conduct structured focus-group conversations, but we did debrief with the students.
The pretest and posttests of reading comprehension were analysed statistically to identify changes in understanding of key words and concepts after reading the texts with and without annotations. Focus-group data were transcribed and thematically evaluated. Interaction data from the experimental groups were analysed to identify interaction patterns in student use of the CoAST learning environment, particularly the numbers of clicks on annotated words.

Results
In our initial experiment, we compared the words highlighted by each teacher using the F1 metric and, although they both agreed on many words, they also disagreed in some cases. Both teachers selected words that the other teacher had not selected. This gave an overall F1 score between the teachers of 0.579, indicating moderate agreement. This is indicative of the subjective nature of the task of identifying complex vocabulary. Each teacher selects a slightly-different subset of words to annotate for students. Our system does not prevent this and, instead, only suggests words that the teacher might not have considered for their annotation, which we hope leads to better coverage of the text (Table 2).
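The agreement score described above can be sketched as a set comparison. This is a minimal illustration, assuming each teacher's highlights are treated as a set of words and that one set is scored against the other as a reference; the function name and the example words are hypothetical and are not the study's data.

```python
def precision_recall_f1(reference, candidate):
    """Score the `candidate` word selection against the `reference` selection."""
    reference, candidate = set(reference), set(candidate)
    overlap = len(reference & candidate)
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical highlights, NOT the words selected in the study.
teacher_a = {"hegemonic", "neoliberal", "discourse", "epistemology"}
teacher_b = {"hegemonic", "discourse", "pedagogy"}

p, r, f1 = precision_recall_f1(teacher_a, teacher_b)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Swapping the two arguments exchanges precision and recall but leaves F1 unchanged, so F1 can serve as a symmetric agreement measure between two annotators.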
After teachers had completed their own highlighting, we asked them to work together to come up with a resolved list of words that required annotating and could be used with both groups to test our system. We compared the resolved list of annotations to those returned by the beginner, intermediate and advanced metrics, respectively. We found that the F1 score between the resolved list and each automated list increased from advanced to beginner, indicating that the resolved list was more suitable for lower-ability students. Students in the Education Studies program have some of the lowest admissions scores of all students across the university. The F1 score between the resolved list and the beginner's automated list of suggestions was 0.623, which is higher than the agreement between the two teachers in the pre-annotation. Further, this received a precision score of 1.0, indicating that all words from the beginner's list were contained in the resolved list. Perhaps the threshold used to create the beginner's list could be reduced in order to better capture the words that were suggested by the teachers.
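As a consistency check on these scores, and assuming that F1 is the usual harmonic mean of precision P and recall R computed with the resolved list as the reference (the text does not spell this out), the reported precision and F1 imply a recall for the beginner's list:

```latex
\[
F_1 = \frac{2PR}{P + R}
\quad\Longrightarrow\quad
R = \frac{F_1 \, P}{2P - F_1}
  = \frac{0.623 \times 1.0}{2 \times 1.0 - 0.623}
  \approx 0.45
\]
```

Under that assumption, the beginner's list would cover a little under half of the resolved list, which fits the suggestion that a lower threshold might capture more of the words chosen by the teachers.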
We do not expect the words suggested by our system to form the final list of annotations that a teacher chooses to make, but we do anticipate that, by using the system's suggestions, teachers will save themselves time and effort by having some of the difficult vocabulary highlighted for them. As subject experts, teachers might not realise that words that are common in their field are not well understood by their students. Our system is domain specific and suggests words that are difficult for a lay person. After using the suggested words feature, a teacher would still need to look through the rest of the document to see if any further words require explaining for their students (Table 3).
In our principal experiment, we asked students to use the system to record their understanding of the concepts in the text before and after the interaction. Each cohort was subdivided into an experimental group that received all the annotations on the text and a control group that did not receive annotations. The results from each group are presented in Table 4. The mean scores on the pretest are shown in row 1, the mean scores on the posttest are shown in row 2, and the difference between these scores (delta in raw percentage points) is shown in row 3. The overall improvement is shown in row 4 as the difference in delta, in percentage points, between the control and experimental groups. In each case, the experimental group improved more between pretest and posttest than the control group. We selected students at random according to seating positions in the class, but it is not possible to tell whether this inadvertently biased the sample by enabling students to self-select into peer groups that might have been more or less able than other groups. The students were not told which group they were in until the end of the session.
The pretest results help to interpret the posttest results. Although students in the experimental group performed worse on the posttest (a small and non-significant difference, p = 0.34), it is important to note that students in the experimental group started with less understanding of the difficult words in the texts, as shown by the pretest. This becomes clear when observing the delta between the cohorts: when we control for the pretest, students using our system improved on average by 7.59 percentage points more than those who did not.
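One way to read the delta discussed here, assuming that it is computed from group mean scores (the notation is introduced only for illustration and does not appear in the tables):

```latex
\[
\Delta_g = \bar{x}^{\text{post}}_g - \bar{x}^{\text{pre}}_g,
\qquad
\text{improvement} = \Delta_{\text{exp}} - \Delta_{\text{ctrl}}
\]
```

On this reading, a group can score lower on the posttest and still show a positive improvement, provided that it started from a sufficiently lower pretest score.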
We used an unpaired t-test to analyse the significance of this result and found that the p-value for the delta between the experimental and control group was 0.12 (t = 1.58). We calculated the effect size to be 0.35, indicating a moderate effect. A power analysis showed that a sample size of N = 102 (2.5 times our current sample size) would be required to yield a significant result. Although our p-value is not below the well-known threshold of 0.05, it is still informative to see that our results have a low p-value and a moderate effect (Betensky, 2019). We anticipate that future work with increased sample sizes would improve statistical significance.
Table 3 Results of experiments with students, demonstrating a positive improvement in performance between control (without annotations) and experimental (with annotations) groups (Results are presented for each cohort: ED = Education Studies, DTS = Digital and Technology Solutions, ALL = both cohorts combined)
There is some difference in the performance of our two cohorts. The Education Studies students were more familiar with the text genre, whereas the DTS students were less familiar. DTS students outperformed Education Studies students across the pretest and posttests, with DTS students performing on average 24 percentage points better on the pretest and 9 percentage points better on the posttest. Additionally, the magnitude of the effect differed between the two cohorts, with DTS students improving by 12.15 percentage points, whereas the Education Studies students improved by 4.33 percentage points. It is unclear why DTS students outperformed Education Studies students, but this could be because many DTS students enter the course as mature students from a professional setting.
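The unpaired t-test and effect size reported in this section can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether a pooled-variance (Student) or Welch test was used, nor which effect-size measure was calculated, so the sketch assumes a pooled-variance test and Cohen's d. The per-student deltas are hypothetical and are not the study's data.

```python
import math
import statistics


def unpaired_t_and_cohens_d(group_a, group_b):
    """Return (t, d, df) for two independent samples using a pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / df)
    # Cohen's d: mean difference in units of the pooled standard deviation.
    d = (mean_a - mean_b) / pooled_sd
    # Student's t: mean difference divided by its standard error.
    t = (mean_a - mean_b) / (pooled_sd * math.sqrt(1 / n_a + 1 / n_b))
    return t, d, df


# Hypothetical pretest-to-posttest deltas in percentage points.
experimental_deltas = [10, 20, 0, 30, 10, 20]
control_deltas = [0, 10, 10, 20, -10, 0]

t, d, df = unpaired_t_and_cohens_d(experimental_deltas, control_deltas)
print(f"t = {t:.2f}, Cohen's d = {d:.2f}, df = {df}")
```

Turning t into a p-value additionally requires the cumulative distribution function of the t distribution, which is not in the Python standard library; a statistics package such as SciPy (`scipy.stats.ttest_ind`) provides it.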
After the main experiment, we analysed the click data for the annotations to understand which definitions were most frequently viewed. The total number of clicks was similar for both groups (369 clicks for DTS, 302 for ED), and Spearman's rank correlation between the two groups' most clicked words was 0.51. It is clear from the two lists in Table 5 that the two cohorts clicked frequently on similar words, with 8 of the top 10 appearing on both lists. This suggests that both lecturers and the system are highlighting words for annotation that are difficult for students across different programs and with different levels of academic and professional experience.
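The click comparison can be sketched in the same spirit. The text does not state exactly which words entered the correlation, so this sketch assumes per-word click counts for a shared set of annotated words; the counts, words, and function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (norm_x * norm_y)


def top_k_overlap(counts_a, counts_b, k):
    """Count the words that appear in both cohorts' k most clicked words."""
    top_a = set(sorted(counts_a, key=counts_a.get, reverse=True)[:k])
    top_b = set(sorted(counts_b, key=counts_b.get, reverse=True)[:k])
    return len(top_a & top_b)


# Hypothetical click counts per annotated word, NOT the study's data.
dts_clicks = {"hegemonic": 30, "neoliberal": 25, "praxis": 12, "agency": 8, "discourse": 5}
ed_clicks = {"hegemonic": 28, "neoliberal": 10, "praxis": 20, "agency": 9, "discourse": 2}

words = sorted(dts_clicks)
rho = spearman([dts_clicks[w] for w in words], [ed_clicks[w] for w in words])
shared = top_k_overlap(dts_clicks, ed_clicks, k=3)
print(f"rho = {rho:.2f}, shared top-3 words = {shared}")
```

Rank correlation and top-k overlap answer slightly different questions: the first reflects the ordering of every word compared, whereas the second only asks whether the same words reach the top of both lists.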

Focus-group data
Contributions to the focus-group discussions with Education Studies students are coded using the speaker's group assignment (experimental group 1 or 2 are EG1 and EG2; control groups 1 and 2 are CG1 and CG2) and are not attributed consistently to unique individuals. Clear themes emerged from the focus groups. Students largely found the CoAST learning environment easy to use and they appreciated the organisation of the text into an accessible format, including the use of a message to frame the text for readers: It explained at the top what the key points of the articles are like globalisation and stuff, and how the title and the authors are there, so then when you need to like cite something it's easier to like pick out the titles and the author. [EG2]. I think it's pretty straightforward in terms of when you go onto the page you kind of know what you're doing, I don't think it's complicated at all. And again I think the main thing is the highlighted key words, it does kind of make it stand out from just reading an abstract without anything highlighted. [EG2]. I think the software itself, it was quite easy to use. I found the highlighted bits quite helpful. [EG1].
However, some concerns were raised about the difficulty of reading text on screen, the size of the font and accessibility for students with dyslexia, who could find the black-and-white text hard to read on screen.
Across both experimental and control groups, there was consensus that the annotations helped students to make sense of the text: When you're using the software always have the keywords highlighted like that. Yeah I think that's very useful to understand like meanings of words. [EG2]. Some of the words it was just like I've seen it before but wouldn't be able to just come up with a definition for it off the top of my head, like I'd need something there to make sense of the reading. [CG2].
Students explained very clearly how they make sense at a sentence level and that one or more difficult or unknown words in a sentence can thwart this strategy: Sometimes … we kind of like make up the word because we see it next to other words in the sentence, we try to make it make sense … but some of them you just can't, especially when you've got three or four in one sentence, so [annotations] would help a lot more to do the readings. [CG1].
Currently, students rely on Google when confronted with unknown or difficult words, but this: (a) requires them to interrupt their reading to switch between windows or screens; and (b) provides definitions that might be unclear or inappropriate to the context of their reading. Lecturers working with these students have used glossaries to aid reading comprehension, but CoAST provides a degree of embeddedness and context specificity that students felt improved on this strategy: Last year we had like a page of terms which was helpful … literally we had a page of terms, like 'hegemonic' was one of them … but again it could mean a different context for different sentence in different abstracts. So I still think that's more useful than just having a page of terms because like you said you could have the meaning and purpose to that part … it depends on what you're talking about in that context if you know what I mean. [CG1].
Students raised questions about (a) the length of definitions, with some wanting longer or shorter definitions and (b) whether the correct words were highlighted: Yeah some of the words are highlighted … I feel like some of them didn't need to be in there, others in the textbook did, so it may need to be like more keywords, like just words that some people may not … like the key terms … 'globalisation' isn't highlighted and some people may not know what that means. [EG2]. Some of them were long enough, some of them weren't. Like some of them you only had like a few words. I'd like them to kind of all be similar, so then you could get like … I feel like you'd get the same amount of information about each one. [EG2].
Finally, students made a number of suggestions for improvement, including linked sound files that provide guidance on pronunciation: I think really it could read it out to you, because sometimes you don't read it in the right way, so where you read it in your head, it's not the right word, but when you hear it out loud you think 'Oh yeah I know that word' do you know what I mean? [CG1].

The value of this feature was apparent in another focus-group discussion when a student explained that "for example, … 'emanc-' I don't know how to pronounce it" [EG2]. We believe that a pronunciation feature would increase students' confidence in using difficult terms in classroom discussions. Students also asked whether we could enable them to save definitions in a personal glossary: It would be useful if you could like save them, so like the ones that you've clicked, if you could like save them to like a word bank or like a personal dictionary or something. Cos like obviously for us like they're all the kinds of words that we'd come back to. Because a lot of them like I'd seen before, but I'd forgotten the definition of them. So if you could like just save them and be able to come back to them and be like 'Oh yeah, that's what that one meant'. [CG2].
Overall, the tool was widely appreciated by students and lecturers were approached after the session with requests to make it available to them in their everyday studies. CoAST was clearly felt to address a gap in the support that students desired in relation to reading texts and was seen as a valuable tool that could be integrated into virtual learning environments: It's kind of like getting the support without having to ask for it, if that makes sense. Cos I know I would probably struggle just a little bit, and it kind of saves time from needing to research what each word means because it's just there, so it's like the support's already there. [EG2].

Discussion
On the basis of our results, we can answer our research question in the affirmative. Teacher and student use of the CoAST platform produced a measurable increase in students' ability to comprehend key words in theoretical texts in our experiments. CoAST mediated the relationship between text and student to support comprehension of advanced theoretical texts. The students who had access to the annotated texts improved more than their peers between pretests and posttests, and students in both experimental and control groups clearly felt that the annotations would support their reading of academic texts.
CoAST also entered into the relationship between lecturers and texts by providing suggestions regarding difficult words. The system provides an external perspective on the potential difficulty of texts for particular student cohorts and reduces the time required for lecturers to identify and plan to teach difficult words that relate to potentially unfamiliar contexts. The two lecturers also discussed the challenges that arose when trying to create brief, context-specific annotations. While this presented an intellectual challenge in some cases, we see this as a benefit because it encouraged lecturers to settle on clear definitions that could be used in other teaching situations and encouraged reflection about different possible definitions. Such reflection potentially improves lecturers' abilities to communicate their knowledge to students and thus can be seen as a professional development opportunity.
Finally, CoAST mediated the relationship between lecturers and students by creating two additional flows of information in relation to the reading process. The annotation function creates an opportunity for students to benefit from teachers' PCK at the point of reading, rather than simply in preparation for reading or when discussing readings. The click analysis provides lecturers with an indication of which words students feel they most need support to understand. While some students queried the length of the annotations and indicated that additional words could be annotated, these suggestions are opportunities to enrich classroom discussion, rather than shortcomings of the system. Consider the example described by many students when asked to explain what they do when they encounter an unknown word in an academic text: resorting to Google. In this example, the student must switch media (paper to digital device) or switch windows or applications. Students are then confronted with the task of identifying a useful definition based on their search and then retaining this definition when they return to their reading. There are many opportunities for misunderstanding. For example, a student might find an entirely inappropriate definition yet feel confident in their understanding of the concept, thus reducing opportunities for the lecturer to teach to the initial misunderstanding.
In our experience of using the CoAST learning environment, we found that it creates multiple opportunities for pedagogical conversations: conversations between lecturers about key concepts and how to teach them; and conversations between lecturers and students about the words that they find difficult and how to explain them. While we have only initial evidence to suggest that the system improves comprehension for student users, we are confident that it mediates pedagogical relationships in ways that maximise the benefits of complementary capacities of the humans and machines involved in a technical ensemble (Simondon, 1958/2017).

Conclusion
This article has described the trial and evaluation of the online collaborative augmentation and simplification of text (CoAST) software, designed to enhance student engagement with advanced theoretical texts in higher-education contexts. In outlining our interdisciplinary approach, we situated the development and implementation of the software within a conceptual design framework informed by theories of machine-human complementarity drawn from philosophy of technology and contemporary economic theory. We then described the implementation and evaluation of CoAST through quasi-experimental trials with two cohorts of higher-education students. The results of the experimental trial suggest that using our system leads to an improvement in a student's ability to recall and comprehend information from a text, when controlling for their prior abilities. In the best case, we observed that DTS students using the annotations (experimental group) experienced an increase of 12 percentage points between pretest and posttest compared to those who used the system without annotations (control group).
This study contributes to knowledge regarding the impact of text simplification software on student engagement with texts in naturalistic settings. It also makes a secondary contribution to the broader field of learning environments research in developing a conceptual design framework that emphasises human-machine complementarity within a constructivist pedagogy. Our framework conceptualises design-based interventions, such as CoAST, as socio-technical ensembles that mediate and enhance the pedagogical relationships between teachers, students, and texts. Through considerations of human-machine complementarity within these pedagogical relationships, the article introduces novel theoretical and practical trajectories into ongoing discussions and developments of digital learning environments to enhance literacy, collaboration, and engagement (Freigang et al., 2018; Okan, 2008). Specifically, it places questions of human-machine complementarity and pedagogical content knowledge at the centre of digital learning environment design, with digital algorithms, languages, and processes being considered powerful agencies within dynamic assemblages of human and nonhuman elements. Our study thus raises new questions about how socio-technical ensembles 'teach', 'learn', 'collaborate', and 'construct knowledge' through the mutual imbrication of pedagogical elements and relations within learning environments. By emphasising the role of human-machine complementarity in the development of educational technologies such as CoAST, we hope to open new trajectories of inquiry into the design and implementation of text simplification software in HE learning environments.