Introduction

From a cognitive view, language production, whether spoken (Bock, 1982; Caramazza, 1997; Fromkin, 1973; Levelt, 1989) or written (Bereiter & Scardamalia, 2013; Hayes, 1996; Hayes & Berninger, 2014; Hayes & Flower, 1980; Kellogg, 1996), involves three basic levels of processing: conceptual, linguistic, and motor processes. A particular aspect of written language production is that it leaves a “visible” trace which provides writers with continuous access to a visual representation of their emerging texts, i.e., the words, sentences and paragraphs they have written so far. Most writers spend a significant proportion of time looking at certain parts of their emerging texts (e.g., Johansson et al., 2010). Such gaze behaviour is likely to serve a multitude of functions, and these functions are interesting per se. But since computer writing offers writers considerable freedom to gaze at locations other than the inscription point, and keystroke logging and eye tracking provide tools to measure these processes separately and in interaction, gaze behaviour during writing can also constitute a test ground for recent theories on parallel processing in writing. However, despite the introduction of eye tracking in writing research some 20 years ago, methods for studying the complex interaction between gaze behaviour, visual attention and written language production are still in their infancy, and this interaction therefore remains poorly understood. In the present paper we attempt to add a piece to this puzzle by presenting an approach to examine writers’ visual attention concurrent with their text typing. More specifically, we introduce a method to capture writers’ typing during their fixations, and through this outline how writers’ typing activity can be examined in relation to concurrent visual behaviour.

Visual attention to their emerging texts allows writers to assess whether the texts they are writing optimally fulfil their communicative goals and, if necessary, revise them. Therefore, in addition to the above-mentioned three basic levels, most models of writing processes also include components of reading/reviewing and revision. These processes are, however, as argued by Olive (2014), not truly essential when producing a piece of written language: “Not only can revising the text be postponed until long after it has been written, but it can even be optional” (p. 175). This is true, but most skilled writers typically spend a certain proportion of their text production time looking back into their texts, and re-reading one’s own text during text production is often thought to be associated with text quality. There is little empirical research to substantiate this notion, however. Breetvelt et al. (1994) carried out a think-aloud study which indicated that rereading was positively correlated with text quality in 9th-grade students, and that the strength of this relationship changed over the course of writing. In contrast, Torrance et al. (2016b), who used eye tracking methods, found no quality differences between texts produced by 16–17-year-old students in a condition where they could see what they were writing and an experimental condition where they had no visible access to the text; in the latter case each letter appeared on the screen as an X. There are of course many possible explanations for these contradictory results, and most likely the relation between rereading and text quality varies with individual characteristics of the writer as well as with contextual factors, such as task or writing duration.

For our purpose, the most interesting implication of Breetvelt et al.’s study is that rereading was not only associated with evaluation and revision of the text produced so far but also strongly correlated with text generation processes. This supports the proposition by Pianko (1979) that writers look at prior text segments to reorient themselves to what they have just written for the purpose of deciding what to write next. That is, the emerging text could function as a visual external storage, which could potentially be used to trigger the generation of new ideas, to create and maintain coherence, and/or to decrease the cognitive load of the writer more generally (cf. Galbraith, 2009; Hayes, 1996; Kellogg, 2008; Perl, 1979). A keyword here is “looking back”. Writers’ lookbacks to previous parts of their emerging text do not necessarily follow the prototypical reading patterns of static texts (eyes moving relatively linearly across the lines). How visual feedback is used by writers and in what way such information affects their ongoing text production is still to a large extent unexplored. One of the reasons for this is that, in contrast to the static texts that are used in traditional reading research (for overviews, see Engbert et al., 2002; Rayner, 1998), the texts that emerge over time in writing are dynamic and typically comprise information that is constantly growing or changing, more or less unpredictably, as the writing task proceeds (cf. Alamargot et al., 2011; Wengelin et al., 2009). Thus, it is challenging for writing researchers to create one-to-one relations between the writer’s gaze direction and relevant information in the emerging text (e.g., a word), and this is particularly true when the text exceeds one screen and requires scrolling (e.g., de Smet et al., 2018; Torrance et al., 2016a; Wengelin et al., 2009).
Therefore, the established method in traditional reading research, where gaze behaviour is analysed in respect to fixed areas of interest on a static text image presented on the computer screen, is not optimal for corresponding purposes in writing research.

To solve this, researchers have developed and gradually refined software that combines eye tracking with keystroke logging or handwriting capture to allow unobtrusive collection of detailed real-time temporal data on writers’ eye movements, which enables them to link writers’ visual behaviour to cognitive processes. Examples of such software include ScriptLog + TimeLine (Andersson et al., 2006; Wengelin et al., 2009), EyeWrite (Simpson & Torrance, 2007), Eye and Pen (Alamargot et al., 2006), Inputlog’s merging function (Leijten & Van Waes, 2013), New ScriptLog (Wengelin et al., 2019), and CyWrite (Chukharev-Hudilainen et al., 2019). Synchronizing keystroke logging, or handwriting capture, with eye tracking enables researchers to observe how writers fixate and move their eyes between elements of their own emerging texts, and through that gain understanding of how gaze behaviour and typing interact during composition. See, for example, Anson and Schwegler (2012) for an overview of what eye tracking can add to writing research, and de Smet et al. (2018) for an overview of recent empirical studies. Of particular interest for this paper are studies of visual attention to already-written text during composition for writers who are not hindered by unautomatized transcription processes, as distinguished from single-word experiments (Alamargot et al., 2015), sentence-level experiments (de Smet et al., 2018; Nottbusch, 2010; Torrance & Nottbusch, 2012; Van Waes et al., 2010), translation studies (Alves et al., 2014; Dragsted & Carl, 2013; Krüger, 2016), or studies focussing on specific groups, such as young children (Drijbooms et al., 2020), L2 writers (Chukharev-Hudilainen et al., 2019), or dyslexic writers (Wengelin et al., 2014).

Beers et al. (2010) and Johansson et al. (2010) both took their point of departure in “traditional” definitions of reading. They operationalised reading (during writing) as has been done in reading research: sequences of linear reading patterns such as a series of three or more consecutive saccades. Beers et al. were interested in younger writers and focused on adolescents (≈ 13 years old). To remove potentially confounding factors, such as spelling and keyboarding skills, in these young writers, they let them dictate their texts to a typist who transcribed them. Their results may not necessarily be comparable to those of adult typists, but their analytical categories are nevertheless of interest for this article. They analysed reading behaviour according to how far from the inscription point it occurred: at the inscription point (on the word currently being written or the one before), locally (upon the most recently composed 1–2 lines), globally (at least two lines above the cursor position), on the prompt, or off-text. Only reading at the inscription point and local reading could be related to text quality, possibly because of input modality; during dictation writers could constantly read at the inscription point and may have had less need for global reading. Beers et al. suggested that a function of “lookbacks” could be to create cohesion and coherence.

That transcription skill is indeed important was confirmed by Johansson et al. (2010). They explored reading patterns during text composition in a group of university students in an attempt to understand how being able to look at the monitor rather than the keyboard during typing influences the writing process. A cluster analysis categorised writers as monitor gazers, keyboard gazers or “between”. Monitor gazers are particularly interesting in that they, with a certain similarity to Beers et al.’s dictating adolescents, have minimized the cognitive load associated with transcription. Based on a two-second pause criterion, the authors compared reading during pauses with reading during writing for both groups. They made two observations. Firstly, only a limited proportion of the writers’ eye movements could be classified as “prototypical reading”. Secondly, as could be expected, monitor gazers were less dependent on pauses for reading. In fact, they even appeared to be able to read previously written text segments while they were typing at the leading edge, and such behaviour was associated with longer fixation durations when compared to reading during pauses. The authors suggested that not having to shift attention between the keyboard and the monitor probably enabled monitor gazers to devote a larger share of their working-memory resources to higher-level processes than keyboard gazers, and possibly to parallel processing.

Torrance et al. (2016a) set out to investigate the first of these issues: to what extent visual behaviour during writing resembles that of prototypical reading of a static unknown text. In two experiments they collected synchronized typing and gaze behaviour data, which they contrasted with tasks where participants read and evaluated researcher-provided texts. To the best of our knowledge this is the most extensive description of typists’ gaze behaviour during text composition so far. Torrance et al. made two principal distinctions: a) whether a fixation was part of a “sustained reading sequence” in accordance with the prototypical reading patterns mentioned above, and b) whether the fixation was local or distal. While their participants did indeed engage in gaze behaviour that could be categorized as prototypical reading behaviour, a larger proportion of their eye movements consisted of “irregular patterns”, in which writers moved their gaze back and forth over various parts of the texts. This was not the case for their reading of the researcher-provided texts. Furthermore, fixation durations in the writing task were significantly longer for local fixations than for fixations on more distal parts of the text. Despite certain differences in the definitions of local and global, these results support those of Beers et al. (2010). Torrance et al. suggest that fixations which are more “local” to the inscription point are likely to be associated with more extensive processing and thus likely to reflect planning and/or monitoring processes above the lexical level.

The second, and more tentative, observation made by Johansson et al. (2010), about potential parallel processing, has to our knowledge not been further investigated in the context of computer writing. This is somewhat surprising. Language production models are moving away from previous sequential or serial models (e.g., Levelt, 1989) towards models in which linguistic processes can operate in parallel (e.g., Kello et al., 2000)—as long as their accumulated demands do not exceed the total amount of cognitive resources (Just & Carpenter, 1992). For written language production, Olive (2014) suggested a parallel cascading model in which conceptual, formulation and handwriting levels of processing can operate concurrently during fluent phases of writing. He argued that such a model can constitute a heuristic for further understanding of the coordination of the different levels of processing involved in writing. The empirical evidence is limited, but a handful of studies (e.g., Alamargot et al., 2007; Chanquoy et al., 1990; Olive & Kellogg, 2002) support the idea that higher-level processing can take place concurrently with handwriting. For instance, Alamargot et al. (2007) investigated events where writers were able to continue transcribing while visually attending to a part of the emerging text that was distant (> 4 cm) from where the pen was being moved. On average these events appeared in 33.5% of the words. Whether studies of typing will yield similar results is an empirical question because of the different constraints of the two input modalities. While in handwriting the writer typically needs to fixate the point of inscription most of the time, keyboarding allows more freedom for skilled typists (monitor gazers) to move their gaze. Furthermore, while computer writing provides the writer with almost endless possibilities to revise the text, global revisions are not feasible during handwriting. 
In principle, it could also be possible that typing can be automatized to a larger extent than handwriting and thus allow more cognitive resources for reviewing different parts of the text, but we are not aware of any empirical investigation to support this.

To our knowledge, nobody has carried out investigations of parallel events in typing that correspond to the handwriting study of Alamargot et al. (2007), or suggested methods for doing so. However, playbacks of our synchronized keystroke-logging and eye tracking data (collected with ScriptLog; Wengelin et al., 2019) suggest that such parallel events also take place during typing. We illustrate this in examples 1 (a–b) and 2 (a–f). Both are extracts from an experimental writing session in which a male Swedish adult was asked to write for 30 min discussing why cheating occurs in educational contexts, and how this problem could be remediated. For lack of space, only parts of the text that are relevant for the examples are included.

Example 1a occurs during text production, in a context where the writer has just outlined the problem, described its possible negative consequences, and is about to write a concluding sentence. The example starts with the sentence that includes the word he is currently fixating (indicated here with a little black ring). This sentence is located in the same paragraph as the inscription point, but six lines above it, and seven sentences behind.


Example 1a) Attempting to formulate the concluding sentence

Då det inte är social accepterat att fuska, kan fusk leda till diverse negativa sociala \(\text{k}\textcircled{on}\text{sekvenser}\) för en påkommen fuskare. Utstötning är en av dessa. Detta leder oss in i frågeställningen om varför fusk trots allt existerar. En del av orsaken kan vara kravet på att vara lyckad. Detta krav kan komma från flera håll, externt såsom från föräldrar, kamrater och samhället i stort, och internt, krav på sig själv. Ofta är det interna trycket värre än det externa. Då alla inte hr samma förutsättningar för att bli lyckade, är det lockande att bättre på sina chanser genom att ta till fusk. Dessutom är risken för upptäckt relativt liten, särskilt för en god fuskare, någon som fuskar smart.

Genom at


Since it isn’t socially accepted to cheat, cheating may lead to a range of negative social \(\text{c}\textcircled{on}\text{sequences}\) for a detected cheater. Exclusion is one of these. This leads us to the question of why, despite this, cheating still exists. One of the reasons may be the demand to be successful. This demand can have several origins, external, such as from parents, friends and society in general, and internal, high standards. Frequently, the internal pressure is worse than the external. Since not everyone has the same opportunities to be successful, it is tempting to improve your chances by cheating. The risk of detection is relatively small, particularly for a good cheater, who cheats cleverly.

By


He starts writing the phrase “Genom att” (‘by’). When he has written “Genom”, he moves his gaze backwards in the text and fixates the first part of the word “konsekvenser” (‘consequences’) while pressing the spacebar. This fixation gives him easy access to the whole phrase “sociala konsekvenser” (‘social consequences’). It lasts 525 ms, during which he types the first two letters (‘a’ and ‘t’) of the word “att” (the infinitive marker). Evidently, he can type more than one character during a single fixation. The characters typed during the fixation are shown in bold underlined font. After having typed these characters, he moves his gaze back to the inscription point, finishes the word by typing the last ‘t’, and then continues the sentence by describing positive consequences that could be achieved by improving people’s self-esteem and self-confidence. The whole sentence is shown in example 1b.


Example 1b) The finalized concluding sentence


Genom att bättra på folks självkänsla och självförtroende, få människor att känna sig sedda samt öka acceptansen för våra olikheter tror jag att man kan reducera fusket till ett minimum.


By increasing people’s self-esteem and self-confidence, make people feel noticed and increase the acceptance for our differences I think that one can reduce cheating to a minimum.


Example 2 originates from the same paragraph as example 1 but, in contrast to the text insertion context there, it shows a text deletion context. Example 2a shows the whole paragraph, including the concluding sentence that was formulated in example 1, with the sentence in question highlighted in bold italic font.


Example 2a)

Då det inte är social accepterat att fuska, kan fusk leda till diverse negativa sociala konsekvenser för en påkommen fuskare. Utstötning är en av dessa. Detta leder oss in i frågeställningen om varför fusk trots allt existerar. En del av orsaken kan vara kravet på att vara lyckad. Detta krav kan komma från flera håll, externt såsom från föräldrar, kamrater och samhället i stort, och internt, krav på sig själv. Ofta är det interna trycket värre än det externa. Då alla inte hr samma förutsättningar för att bli lyckade, är det lockande att bättre på sina chanser genom att ta till fusk. Dessutom är risken för upptäckt relativt liten, särskilt för en god fuskare, någon som fuskar smart.

Genom att bättra på folks självkänsla och självförtroende, få människor att känna sig sedda samt öka acceptansen för våra olikheter tror jag att man kan reducera fusket till ett minimum.


The highlighted sentence questions why cheating exists (‘This leads us to the question of why cheating despite this exists’), but when the writer first formulated it, he hesitated between cheating and cheaters. In the first version he chose cheaters and a slightly different word order with the same meaning. Example 2b shows the moment when he has just finished this sentence and is fixating its last word, “fuskare” (‘cheaters’).


Example 2b) Initiating a revision

Detta leder oss in i

frågeställningen om varför det då trots allt existerar \({\textcircled{fus}}{\text{kare}}.\)

This leads us to the

question of why then, despite this, there exist cheaters.


He then changed his mind, however, and started deleting the word “fuskare” (‘cheaters’) by repeatedly pressing the backspace key while moving his gaze backwards in the already written sentence. He finally stopped at the adverbial subjunctive “varför” (‘why’), which is the last word kept from his original construction. Examples 2c–2e show step by step how he moves his gaze and deletes characters accordingly. To increase readability, we have not deleted anything from the examples but instead chosen to mark the deleted words typographically.


In example 2c he has moved his gaze backwards to the word “det” (‘it’/formal subject), which he fixated for 330 ms. During this period, he deleted the full stop and the word “fuskare” (‘cheaters’). He then moved his gaze forward again and fixated the space between the words “då” (‘then’) and “trots” (‘despite’) for 300 ms, during which he deleted the word “existerar” (‘exist’), as shown in example 2d.


Example 2c) Moving backwards

Detta leder oss in i


frågeställningen om varför \(\text{d}\textcircled{et}\) då trots allt existerar


Example 2d) Continuing the deleting process


Detta leder oss in i


frågeställningen om varför det trots allt


In 2e he has finally moved his gaze back to the word “varför” (‘why’), which seems to be the starting point for the reformulation, and deleted all the words in the sentence that followed it. While keeping his gaze there, he used the backspace key to navigate back to where his gaze was. He then rewrote the clause after “varför” (‘why’), now questioning why cheating exists instead of why cheaters exist. In addition, he chose a slightly more compact syntactic structure (with the same semantic content). The final version is shown without context in example 2f with the new formulation in bold underlined font.


Example 2e) Continuing backwards towards the “point of reformulation”


Detta leder oss in i


frågeställningen om \(\textcircled{va}\text{rför}\)


Example 2f) The reformulated sentence


Detta leder oss in i


frågeställningen om varför fusk trots allt existerar.


It certainly looks as if this writer is visually attending to words at one location of the text while inserting text, as in example 1, or deleting characters, as in example 2, at another. In some deletion cases, for example in 2e, he may just use his gaze as a navigation point for where the deletion presses should stop (i.e., when they are in his visual focus), but in 2c and 2d he fixates words other than (although admittedly quite close to) those where he intends to stop the text deletion. The million-dollar question is, however, whether and to what extent we can be certain that such seemingly concurrent events are, in fact, happening in parallel. In the above-mentioned handwriting study by Alamargot et al. (2007), the authors described parallel processing as “encoding visual information that is physically distant from where the pen is being moved” (p. 18). While this definition is intuitive and straightforward, it can unfortunately not be applied to typing. In handwriting (particularly if cursive), researchers can identify periods when the pen is moving on the page at fixation onset while the fixation is located away from the pen tip. In typing, each keystroke is a discrete unit, and thus each transition between two keystrokes is a potential pause. One way to approach this problem could be to take the keystrokes as the point of departure and identify “eye movements during typing”. We would then only consider fixations that occur between key-down and key-up or exactly simultaneously with key-down. While these fixations would truly be in parallel with typing, they would be infrequent and short, and hence not of particular interest. Instead, we suggest starting from the visually attended information and studying to what extent typing takes place during individual fixations. In other words, we look for instances where fixations are located away from the inscription point, at already-written text segments, and where keypresses occur between fixation onset and fixation offset.
From a cascading point of view, this is also more theoretically motivated. The relationship between the visually attended and the buffered information that is executed on the keyboard is most likely fundamental to understanding the temporal dynamics between subprocesses, such as formulation and transcription.

The aim of this paper is therefore to (1) introduce and describe a methodological approach to capture and examine typing during fixations, (2) outline how the visual focus of attention can be examined in relation to the typing activity, and (3) put this approach into practice and explore typing during fixations in an expository text composition task.

Capturing typing during fixations with ScriptLog

To capture typing behaviour that occurs during a fixation it is necessary to co-register writers’ keystrokes and gaze behaviour, and to synchronize those two data streams for further processing. Here, we demonstrate how this can be achieved using New ScriptLog (Wengelin et al., 2019), hereafter ScriptLog, which was (partly) developed for this exact purpose. A fundamental feature of ScriptLog is that keystrokes are output in relation to the fixation data (and not the other way around), which is crucial for capturing typing behaviour during individual fixations. In the following, we outline how ScriptLog registers eye tracking and keystroke data, how those data streams are synchronized, and how the two data sources are output in relation to each other.

ScriptLog is implemented in the Java programming language and keeps a record of all events that affect the writer’s emerging text in its input window. This includes all keyboard- and mouse-related activity: ScriptLog records which keys are pressed, when they are pressed, and when they are released. Critically, ScriptLog also keeps track of the “effects” those key presses have in the writer’s emerging text. Such effects include text insertions (the addition of new text), text deletions (the deletion of previously written text), text replacements (e.g., copy–paste operations) and cursor movements within the text. To capture these effects, the position of the cursor is also logged. Moreover, if a range of text is highlighted, the start and end positions of the selection are logged, which, for instance, is critical for logging the effects of a copy–paste operation. ScriptLog also logs mouse clicks and can thus keep track of mouse events related to highlighting and manipulating text, as well as when relevant mouse functions are used (e.g., selecting, copy/paste, moving the location of text segments). Finally, ScriptLog keeps track of any scrolling in the editor window. As scrolling involves a change in the vertical position of the visibly present text on the computer screen, this is a critical feature when relating writers’ typing to their visual focus of attention.
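The idea of logging the effects of key presses, so that the text can later be rebuilt from the log, can be illustrated with a minimal Python sketch. The `EditEvent` record and its fields are hypothetical and chosen for illustration only; they are not ScriptLog's log format, and only insertions and backspace deletions are handled.

```python
from dataclasses import dataclass


@dataclass
class EditEvent:
    """One logged effect on the text (hypothetical layout, not ScriptLog's own)."""
    time_ms: int    # timestamp of the key press
    kind: str       # "insert" or "delete"
    cursor: int     # 0-based cursor index before the event takes effect
    text: str = ""  # characters added by an insertion; unused for deletions


def replay(events):
    """Rebuild the text by applying the logged effects in temporal order."""
    chars = []
    for ev in sorted(events, key=lambda e: e.time_ms):
        if ev.kind == "insert":
            # Insert the new characters at the cursor position.
            chars[ev.cursor:ev.cursor] = list(ev.text)
        elif ev.kind == "delete":
            # Backspace removes the character immediately left of the cursor.
            if ev.cursor > 0:
                del chars[ev.cursor - 1]
        else:
            raise ValueError(f"unknown event kind: {ev.kind!r}")
    return "".join(chars)


# Invented mini-session: the writer types "Gem", deletes the "m", then types "n".
log = [
    EditEvent(0, "insert", 0, "G"),
    EditEvent(120, "insert", 1, "e"),
    EditEvent(250, "insert", 2, "m"),
    EditEvent(400, "delete", 3),
    EditEvent(520, "insert", 2, "n"),
]
print(replay(log))
```

Replaying the log up to a given timestamp, rather than to the end, would give the state of the text at that moment, which is what is needed when a fixation has to be related to the text as it looked when the fixation occurred.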

Critically for identifying typing behaviour in relation to writers’ gaze fixations, ScriptLog offers online integration of eye tracking data with keystroke logging data with high spatial and temporal resolution. This allows eye tracking data to be registered simultaneously and synchronously with the keyboard and text logging data. The eye tracking data that is streamed into ScriptLog differs from the writing data in that it arrives as a continuous stream of data samples rather than as discrete and well-defined events. However, to functionally understand continuous gaze data in perceptual and cognitive tasks (e.g., during reading), the stream is typically segmented into saccades and fixations. Fixations are the periods when the eyes are relatively stable, and saccades are the rapid movements from one fixation point to another. During saccades we are virtually blind and cannot process any visual information, whereas visual information, such as letters and words, is processed during fixations (e.g., Rayner, 1998). Thus, when studying processes related to writers’ visual attention towards their emerging texts, it is fundamental to convert the eye movement data into discrete events of saccades and fixations. Keystroke events can then be temporally aligned with the eye movement events.

ScriptLog automatically segments and outputs writers’ gaze data into fixations and saccades by using the event detection algorithm developed by Nyström and Holmqvist (2010). This algorithm is velocity-based (cf. Smeets & Hooge, 2003) and is particularly useful for capturing typing during fixations for two reasons. First, instead of having the user define the velocity threshold for deciding whether a velocity sample belongs to a saccade or a fixation, this algorithm dynamically and automatically adapts to the present circumstances, making it robust to variations in noise (for implementation details, see Nyström & Holmqvist, 2010). This is a very useful feature, since studies of computer typing are known to challenge the quality of gaze data (Wengelin et al., 2019; Johansson et al., forthcoming): writers engage in extensive head movements and frequently shift their visual focus of attention between the keyboard and the monitor. Second, other commonly used event detectors are prone to erroneously identify smooth pursuit gaze behaviour as fixations, but this event detector has been specifically developed to handle such issues (for implementation details, see Nyström & Holmqvist, 2010). Smooth pursuit-like gaze behaviour is common when writers follow the cursor or the emerging text around the inscription point. Thus, this event detector is less sensitive to “noise” from such dynamic gaze behaviour, and reliably outputs writers’ fixation data.
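To make the notion of a data-driven velocity threshold concrete, the following Python sketch re-estimates the threshold from the samples that currently fall below it until the estimate stabilises. It is a deliberately simplified illustration and not the published algorithm or ScriptLog's implementation: the starting value, the multiplier of six standard deviations, the convergence tolerance, and the use of raw screen units instead of degrees of visual angle are all assumptions of this sketch, and filtering, onset/offset detection and the handling of smooth pursuit are omitted.

```python
import math
import statistics


def velocities(xs, ys, sample_rate_hz):
    """Sample-to-sample gaze speed in position units per second."""
    points = list(zip(xs, ys))
    return [
        math.hypot(x1 - x0, y1 - y0) * sample_rate_hz
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


def adaptive_threshold(speeds, start=100.0, k=6.0, tol=1.0, max_iter=50):
    """Iteratively re-estimate the threshold from the sub-threshold samples."""
    threshold = start
    for _ in range(max_iter):
        below = [v for v in speeds if v < threshold]
        if len(below) < 2:
            break  # too few samples to estimate the noise level
        new = statistics.mean(below) + k * statistics.pstdev(below)
        converged = abs(new - threshold) < tol
        threshold = new
        if converged:
            break
    return threshold


def label_samples(speeds, threshold):
    """Mark each velocity sample as belonging to a fixation or a saccade."""
    return ["saccade" if v >= threshold else "fixation" for v in speeds]


# Invented gaze trace at 100 Hz: jitter around x = 100, a jump, jitter around x = 300.
xs = [100.0, 100.1, 100.0, 100.2, 100.1, 200.0, 300.0, 300.1, 300.0, 300.2, 300.1]
ys = [50.0] * len(xs)
speeds = velocities(xs, ys, sample_rate_hz=100)
threshold = adaptive_threshold(speeds)
labels = label_samples(speeds, threshold)
```

Because the threshold is derived from the noise level of the recording itself, a noisier trace yields a higher threshold, which is the property that makes this family of detectors attractive for typing data.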

Critically, ScriptLog also includes routines for relating a position on the screen to a position within the text being written. At each timestamp (sample), ScriptLog logs the x- and y-screen coordinates of the writer’s eyes (separate data streams for the left and right eye). The integration of the two types of data (keystrokes, fixations) allows the user to determine the timestamp of writers’ keystrokes in relation to the onset of their fixations with high temporal resolution (the rate at which eye tracking data is registered is determined by the sample rate of the eye tracker). ScriptLog’s implementation rationale for capturing these dynamics is the following. After running the event detection algorithm to identify the writer’s fixations, ScriptLog calculates a list of fixations with associated screen position coordinates as well as temporal information from the separate event sources. ScriptLog then “reconstructs” the text using the keystroke logging data and, for each fixation, converts the screen position to a text position. This conversion relies on functionality built into Java, in the class TextUI. The text editor in ScriptLog includes a component called JTextComponent, which contains a model of the text within it, in essence the sequence of characters it currently contains. Another Java class, Point, represents a position on the screen. To convert a gaze position on the screen to a position in the text, the aptly named method viewToModel in the TextUI class is used. To associate writers’ fixations with particular words in their emerging texts, word boundaries to the left and right of the fixated text position are then identified, and the word between those boundaries is considered the fixated word. If the writer is currently writing at the end of the text and is fixating on that text segment, the end of the text is used as the right boundary.
Furthermore, the cursor position is also associated with particular words, using the same rationale, and it is thus possible to output the word currently being produced as well as words overlapping with the cursor when orienting in the text. From ScriptLog’s data output, writers’ keystrokes can then be analysed in relation to their fixations. See Fig. 1 for an example of the data output that is provided by ScriptLog. This example shows data output for two consecutive fixations in a writing session. During the first fixation (first column) the writer has paused after writing “ho” and fixates the word “annorlunda” (‘different’), which was written one row prior to where the cursor is currently located (FixRow = 18, CursorRow = 19) and 111 characters prior to the cursor location (FixIndex = 1663, CursorIndex = 1774). During the second fixation (columns 2–5) the writer is inserting text and finishes writing the word “hoppas” (‘wish’) while fixating around the inscription point. Thus, one and the same fixation is here represented over several keyboard events (in this example, columns 2–5), which is a very handy data format when analysing typing during fixations.
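The word-boundary step described above can be sketched in Python as follows. The sketch starts from a text index, i.e., it assumes that the conversion from screen position to text position has already been done, and it treats whitespace as the only word boundary, which is an assumption of this illustration rather than a documented property of ScriptLog. The function name `word_at` and the sample sentence are invented.

```python
def word_at(text, index):
    """Return (start, end, word) for the word covering a 0-based text index.

    Boundaries are found by scanning left and right for whitespace; the end
    of the text serves as the right boundary when no whitespace follows.
    Returns None if the index is outside the text or falls on whitespace.
    """
    if index < 0 or index >= len(text) or text[index].isspace():
        return None
    start = index
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    end = index
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end, text[start:end]


# Invented emerging text whose last word is still being typed.
emerging_text = "Jag tycker annorlunda och ho"
fixated = word_at(emerging_text, 13)                      # a fixation inside "annorlunda"
at_cursor = word_at(emerging_text, len(emerging_text) - 1)  # the word under production
```

Applying the same function to the cursor index gives the word currently being produced, mirroring the way the same rationale is used for fixations and for the cursor.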

Fig. 1

Example of ScriptLog output. Each column in this table represents a new “event” in the writing session. Such events can be either eye movement events (fixations, saccades) or writing events (e.g., insertions, deletions). Each row represents relevant information for the relationship between the eye tracking data and the keystroke data. Time represents when an event occurred in the writing session. EventType represents the type of the event. EventState represents which state the eye tracking data is in (fixation or saccade). FixStart represents fixation onset. FixStop represents fixation offset. FixDur represents fixation duration (ms). FixX represents fixation location in horizontal screen coordinates. FixY represents fixation location in vertical screen coordinates. FixRow represents which row in the text the fixation is located on. FixIndex represents which character the fixation is currently on (a 1 would represent the first character inserted in the writing session). FixWordLength represents the length of the word currently being fixated. FixWordText represents the text of the word currently being fixated. CursorX represents cursor location in horizontal screen coordinates. CursorY represents cursor location in vertical screen coordinates. CursorRow represents the row in the text where the cursor is currently located. CursorIndex represents which character the cursor is currently associated with (a 1 would represent the first character inserted in the writing session). CursorWordLength represents the length of the word currently associated with the cursor. CursorWordText represents the text of the word where the cursor is currently located.

Defining typing during fixations

By using eye tracking together with ScriptLog, it is thus possible to capture writers’ keystrokes and the text information currently being fixated. In the present paper, we focus on keyboard activity occurring during a fixation. For a keystroke to be considered occurring during a fixation it needs to be pressed (key down) after the onset of a fixation and be released (key up) before the offset of that fixation. Several such keystrokes can then occur during one and the same fixation. In the present paper we further distinguish between two types of keyboard events: “text insertion” and “text deletion”. Text insertion concerns all keyboard activities where text information is added into the emerging text and text deletion concerns keyboard activities where text information is removed from the text by means of the backspace key.
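The definition above can be expressed as a short sketch. The record types Fixation and Keystroke and the function keystrokes_during are hypothetical names introduced for illustration, not part of ScriptLog; treating both boundaries as strict inequalities is an assumption, since the text only says “after” onset and “before” offset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fixation:
    start: float  # fixation onset (ms)
    stop: float   # fixation offset (ms)


@dataclass
class Keystroke:
    key: str
    down: float  # key-down time (ms)
    up: float    # key-up time (ms)


def keystrokes_during(fix: Fixation, keys: List[Keystroke]) -> List[Keystroke]:
    """Keep keystrokes pressed after fixation onset and released before offset."""
    return [k for k in keys if fix.start < k.down and k.up < fix.stop]


if __name__ == "__main__":
    fixation = Fixation(start=1000, stop=1600)
    typed = [
        Keystroke("h", 950, 1020),   # pressed before onset: excluded
        Keystroke("o", 1100, 1180),  # fully inside the fixation
        Keystroke("p", 1300, 1390),  # fully inside the fixation
        Keystroke("p", 1550, 1650),  # released after offset: excluded
    ]
    print([k.key for k in keystrokes_during(fixation, typed)])
```

Several keystrokes can be returned for one fixation, which mirrors the multi-column representation of a single fixation in the ScriptLog output.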

Examining typing in relation to gaze location

The relationship between a keystroke and a fixation is critically defined by where in the emerging text the writer is looking in relation to where the inscription point (cursor) is currently located. Thus, when analysing typing during fixations a fundamental aspect is to define relevant units with respect to the distance between the inscription point and the fixation location (cf. Torrance et al., 2016a). Depending on the research question, the definition of such units may vary. For instance, they could be based on pre-defined linguistic characteristics of the texts or on the number of linguistic units between the fixation point and the inscription point (e.g., number of clauses or sentences away from the inscription point). In the current version of ScriptLog, such mapping of functional units to fixations would require a certain amount of manual coding in the data output.

In the present paper we instead examine the relationship between the inscription point and fixation locations with respect to visuospatial characteristics of the emerging texts, and define different distance levels based on the number of characters and rows between the inscription point and the fixation point. The motivation for this approach is threefold. First, indices for the number of characters and rows between inscription point and fixation point are automatically provided by the ScriptLog output and are therefore readily available when putting our approach into practice (see Fig. 1). Second, the purpose of the present paper is not to make claims about how to select meaningful linguistic units when analysing typing during fixations, but to show the possibility and feasibility of examining different distances between the fixation point and the inscription point. Third, fixations more “local” or “distal” to the inscription point have in previous studies been defined both in terms of visuospatial properties of the text (Beers et al., 2010; Torrance et al., 2016a, Experiment 1) and in terms of sentence units (Torrance et al., 2016a, Experiment 2), but have, irrespective of those two types of definitions, shown similar effects upon writers’ fixation durations (Torrance et al., 2016a, Experiment 1 vs Experiment 2). The overall finding is that fixations more local to the inscription point have longer durations than more distal fixations, which has been argued to indicate different levels of processing (Torrance et al., 2016a).

Based on the above-mentioned reasons, we used four different ‘distance levels’ between the fixation point and the inscription point to examine typing during fixations:

  1. Fixations around the inscription point. This distance level was defined as fixations 10 or fewer characters away from the inscription point. The rationale for this threshold was primarily based on basic mechanisms of the human visual system, where it is established that the visual span covers about 10 characters (Frey & Bosse, 2018). Also, the average word length in the analysed data set (see below) was found to be about five letters. Thus, the chosen threshold is about twice the average word length, which corresponds to previous definitions of the point of inscription (Beers et al., 2010). By-participant mean distances in the present data set were 3.3 characters (SD = 1.7) for text insertions and 3.1 characters (SD = 2.1) for text deletions.

  2. Fixations on the same line as the inscription point. This distance level was defined as fixations more than 10 characters away from the inscription point on the same line as the inscription point. The rationale for this distance level was based on how previous literature has defined reading that was local to the most recently typed word (Beers et al., 2010; Torrance et al., 2016a, Experiment 1). By-participant mean distances in the present data set were 19.2 characters (SD = 8.9) for text insertions and 23.3 characters (SD = 11.7) for text deletions.

  3. Fixations on lines adjacent to the inscription point. This distance level was defined as fixations on the line above or below the line of the inscription point (fixations on the blank space below the last written line are thus not included). The rationale for this distance level was based on previously reported differences when reading more local or distal parts of the ongoing text (cf. Beers et al., 2010; Torrance et al., 2016a). We deemed adjacent lines to be somewhere in between on the local-distal spectrum and thus feasible to analyse as a separate distance level. By-participant mean distances in the present data set were 83.8 characters (SD = 14.7) for text insertions and 89.1 characters (SD = 11.2) for text deletions.

  4. Fixations to more distal parts of the texts. This distance level was defined as fixations on more than 2 lines above or below the line of the inscription point. The rationale for this distance level was based on previous studies of reading behaviour in more distal parts than the word currently being composed (cf. Beers et al., 2010; Torrance et al., 2016a). By-participant mean distances between inscription point and fixation point in the present data set were 3.0 lines (SD = 1.0) for text insertions and 3.4 lines (SD = 1.9) for text deletions. Corresponding measures in characters were 258 (SD = 102) for text insertions and 288 (SD = 171) for text deletions.

Also, for typing during fixations to be categorized in distance levels 2–4, it was further required that none of the keystrokes that occurred during the fixation moved the point of inscription to within range of the fixation. For example, if the writer is inserting text in a previously written text segment and is fixating a word 11 characters to the right of the inscription point, on the same line, each inserted character reduces the distance between the inscription point and the fixation point, so that the fixation would now fall within the range defined as around the inscription point.
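The four distance levels and the additional requirement above can be sketched as a small classifier. All names are hypothetical, and several choices are assumptions not settled by the definitions as stated: the character threshold is checked before the row difference, level 2 starts at 11 characters, a fixation exactly two rows away is left unclassified, and a fixation that comes within range of the inscription point during typing is reassigned to level 1 rather than excluded.

```python
from typing import List, Optional, Tuple

NEAR_CHARS = 10  # threshold for "around the inscription point"


def distance_level(char_dist: int, row_dist: int) -> Optional[int]:
    """Map a character and row distance to distance level 1-4."""
    char_dist, row_dist = abs(char_dist), abs(row_dist)
    if char_dist <= NEAR_CHARS:
        return 1  # around the inscription point (assumed to take priority)
    if row_dist == 0:
        return 2  # same line, more than 10 characters away
    if row_dist == 1:
        return 3  # line directly above or below
    if row_dist > 2:
        return 4  # more distal parts of the text
    return None   # exactly two rows away: not covered by the stated definitions


def fixation_level(fix_index: int, fix_row: int,
                   cursor_states: List[Tuple[int, int]]) -> Optional[int]:
    """Classify one fixation from the (CursorIndex, CursorRow) at each keystroke."""
    levels = [distance_level(fix_index - ci, fix_row - cr)
              for ci, cr in cursor_states]
    if not levels:
        return None
    if 1 in levels:
        return 1  # a keystroke moved the inscription point within range
    return levels[0]


if __name__ == "__main__":
    # Values from the first fixation in Fig. 1.
    print(distance_level(1663 - 1774, 18 - 19))
    # Inserting text while fixating 11 characters to the right on the same line.
    print(fixation_level(111, 0, [(100, 0), (101, 0), (102, 0)]))
```

Under these assumptions, the first fixation in Fig. 1 (111 characters and one row from the cursor) falls on an adjacent line.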

Putting our approach into practice

To put our approach into practice and explore writers’ typing during their fixations, we used data from a previous study, where writers’ keystrokes and gaze data were collected with ScriptLog in a text composition task. The goals of this exercise were to: (1) showcase how ScriptLog can capture these instances in a real data set; (2) provide a general description of writers’ ‘typing-during-fixation-behaviour’ in a text composition task; and (3) explore how this behaviour is influenced by the distance between the fixation point and the inscription point.

Participants

In our data set, fourteen competent adult writers (university students, mean age = 24 years, SD = 1.2, 9 females) participated in a writing experiment. Participants were recruited from a pool of students who self-reported as fairly automatized (but not professional) typists. None had language or linguistics as a major subject. All participants had at least two years of university studies and they all had Swedish as their first language and normal or corrected-to-normal vision. None of them reported any reading or writing difficulties. In our experiment they wrote an expository text (~ 30 min) discussing possible reasons for and solutions to problems associated with cheating and bullying in school. Findings from the experiment verified that they were fairly automatized typists, in that they mainly looked at the monitor concurrently with their typing (M = 90.3%, SD = 8.5%) and typed relatively rapidly (mean inter-key interval within words = 0.13 s, SD = 0.02).

Apparatus and stimuli

The text-production task was performed on a personal computer equipped with ScriptLog, using a 22″ monitor with the resolution set to 1024 × 768 pixels. The texts were written in a text window occupying 579 × 585 pixels of the screen, and 10 pt Arial font was used with double line spacing (corresponding to 35 pixels). Eye-tracking data were recorded using a SensoMotoric Instruments (SMI RED) eye tracker, running iView X 2.7 software and sampling at 250 Hz. In this setup the eye tracking camera is located centrally under the presentation monitor at a distance of about 70 cm from the participant. The eye tracker is developed for contact-free measurement with automatic head-movement compensation, in a range of 40 × 20 cm at a distance of 70 ± 10 cm. In effect, smaller head movements toward and away from the camera do not confound the measured eye-tracking data. This set-up has a gaze position accuracy of about 0.4°. Calibration and validation of gaze data were conducted prior to each participant’s experimental session.

Descriptive data

The data set was drawn from a corpus of 60,980 keystrokes in total, out of which 53,596 involved text insertion activity and 16,665 text deletion activity. Collapsed over all participants, 31.6% of the text insertion keystrokes occurred during fixations on the emerging text (only keystrokes corresponding to letters, numbers and the space bar were considered, i.e., delimiters were excluded), and 22.5% of the text deletion keystrokes occurred during fixations on the emerging text (only keystrokes corresponding to backspace were considered, i.e., delete-keys and replacements made with the mouse and/or cut and paste commands were excluded). Collapsed over all participants, 31,121 fixations occurred on the emerging text. Out of those, 80.2% occurred without concurrent typing activity, 15.4% with concurrent text insertion keystrokes, and 4.4% with concurrent text deletion keystrokes.

Thus, in total, the present data set is based on 6,168 fixations (text insertion: 4,793; text deletion: 1,375) and 20,659 keystroke events (text insertion: 16,911; text deletion: 3,748).
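As a worked check, the counts above can be related to the percentages reported for fixations on the emerging text. The sketch uses only figures stated in this section; the variable names and the helper pct are hypothetical, and rounding to one decimal is an assumption about how the percentages were reported.

```python
# Counts reported in the text.
total_fixations = 31_121                  # fixations on the emerging text
fix_insert, fix_delete = 4_793, 1_375     # fixations with concurrent typing
keys_insert, keys_delete = 16_911, 3_748  # keystrokes during those fixations


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


fix_typing = fix_insert + fix_delete      # all fixations with concurrent typing
keys_typing = keys_insert + keys_delete   # all keystrokes during fixations

if __name__ == "__main__":
    print("fixations with typing:", fix_typing)
    print("keystrokes during fixations:", keys_typing)
    print("insertion %:", pct(fix_insert, total_fixations))
    print("deletion %:", pct(fix_delete, total_fixations))
    print("no typing %:", pct(total_fixations - fix_typing, total_fixations))
```

The sums reproduce the totals of 6,168 fixations and 20,659 keystroke events, and the proportions match the reported 15.4%, 4.4% and 80.2%.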

By participant, on average, 342 fixations (SD = 168) involved text insertion keystrokes (Proportion: M = 17.4%, SD = 10.9%), and, on average, 98 fixations (SD = 43) involved text deletion keystrokes (Proportion: M = 4.7%, SD = 2.0%). The proportion of fixations at each of the four distance levels (by participant) is reported in Table 1, which demonstrates that about 40–50% of writers’ fixations with concurrent keystrokes were located on text information that was not considered to be around the inscription point (according to our definitions above). This highlights that typing during fixations to a significant degree involves visual attention towards previously written text segments and is not simply associated with monitoring the inscription point of the emerging text.

Table 1 Proportion fixations to the four distance levels

Fixations on different text segments in the emerging text

To further scrutinize this relationship, we next examined gaze behaviour at the four distance levels with respect to (1) fixation duration and (2) number of keystrokes. Linear mixed effects models (LMEM; GAMLj module in Jamovi; Gallucci, 2019) were fitted using Jamovi version 1.6.23 (The Jamovi project, 2019) for each of the two keystroke events (text insertion, text deletion), with distance level (inscription point, same line, adjacent line, distal parts) as the independent variable and participants modelled as random effects (intercepts). To describe the model fit of an independent variable, the deviance of the proposed model was contrasted against an unconditional null model, including only the intercept and the random factor. We then used likelihood-ratio tests to compare models. Models were fitted with restricted maximum likelihood (REML) and Satterthwaite approximations were used to assess the significance of individual predictors.
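The model structure described above can be written out as follows. This is a sketch under stated assumptions: treatment (dummy) coding with the inscription point as the reference level is assumed, since the coding scheme used in GAMLj is not reported here, and the symbols are introduced for illustration only. Here y_{ij} is the outcome (fixation duration or number of keystrokes) for fixation i of participant j, and D_{k,ij} indicates whether that fixation belongs to distance level k.

```latex
% Proposed model: distance level as fixed effect, random intercept per participant
y_{ij} = \beta_0 + \sum_{k=2}^{4} \beta_k D_{k,ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

% Unconditional null model: intercept and random factor only
y_{ij} = \beta_0 + u_j + \varepsilon_{ij}

% Likelihood-ratio statistic comparing the two models (three extra parameters)
\chi^2 = -2\,(\ell_{\text{null}} - \ell_{\text{model}}), \qquad df = 3
```

The three degrees of freedom correspond to the three distance-level coefficients that the proposed model adds to the null model.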

For fixation durations, results revealed that the four distance levels predicted fixation durations for both text insertion, F(3,4583) = 17.4, p < 0.001, and text deletion, F(3,1346) = 7.98, p < 0.001. For text insertion, post hoc tests (on model parameters) revealed that the difference was significant between all pairs of distance levels (p < 0.002), except between adjacent lines and distal parts. For text deletion, post hoc tests (on model parameters) revealed that the difference was significant between the inscription point and the same line (p < 0.001) and between the inscription point and adjacent lines (p < 0.001), but not for any of the other comparisons (p > 0.129). See Fig. 2A.

Fig. 2

Typing during fixations for text insertion and text deletion in relation to different distances between the fixation point and the inscription point. The top panel A shows average fixation duration (estimated marginal means) over the four distance levels. The bottom panel B shows average number of keystrokes (estimated marginal means) over the four distance levels. Note that results for text insertion and text deletion have different scales. Error bars denote 95% confidence intervals.

For number of keystrokes, results revealed that the four distance levels predicted the number of keystrokes during a fixation for text insertion, F(3,4532) = 26.1, p < 0.001, but not for text deletion, F(3,1300) = 0.195, p = 0.9. For text insertion, post hoc tests (on model parameters) revealed that the difference was significant between all pairs of distance levels (p < 0.014), except between adjacent lines and distal parts. See Fig. 2B.

Summary

Results for fixation duration during text insertion revealed that fixations closer to the inscription point had longer durations than more distal fixations, and that fixations away from the inscription point, but on the same line, had the longest durations. This is consistent with previous results on gaze behaviour during composition (Torrance et al., 2016a). For text deletion, this pattern was less consistent, and here fixations around the inscription point were associated with the longest durations. Interestingly, for text insertions, results for the cumulative number of keystrokes during a fixation revealed a pattern that was completely analogous to the fixation duration results. Here, fixations closer to the inscription point were associated with more keystrokes than the more distal fixations, and keystrokes were most numerous during fixations on the same line as (but away from) the inscription point. For text deletion, no clear patterns emerged.

Finally, to compare the two keystroke events (text insertion, text deletion) we included this factor as an independent variable in the same model, which revealed that fixation durations were significantly longer during text deletions, F(1,6158) = 29.0, β = 202, p < 0.001, and that significantly more keystrokes occurred during text insertions, F(1,6160) = 17.6, β = 0.7, p < 0.001.

Discussion

In this article, we (1) introduced and described a methodological approach to capture and examine sequences of writing while visually attending to the emerging text, (2) outlined how typing and visual focus of attention could be examined in relation to each other, and (3) tested this approach by exploring typing during fixations in a text composition task. To reach our first goal, we capitalized on ScriptLog’s feature to link gaze with typing across different functional units in the writing task, while allowing for scrolling. Like solutions by Simpson and Torrance (2007) and Chukharev-Hudilainen et al. (2019), ScriptLog offers the possibility to automatically identify which word the writer is fixating at any given moment, which makes it possible to make detailed descriptions of the relationships between the word being fixated and the word being typed.

The novelty of our approach is that we focused specifically on parallel events and that we took our point of departure in the visual rather than in the keyboard data. Consequently, we conceptualised parallel events as “typing during fixations”. Our focus responds to the call by Alamargot et al. (2007) for empirical studies of parallel events also in a typewriting situation, with the specific constraints of computer writing, as well as to a general need for empirical research that supports or refutes suggestions of parallel/cascading processes in writing (Olive, 2014). Our approach will by no means capture all parallel processing during writing because cognitive processes will not always correspond to measurable behaviour, and not all processing is related to visual behaviour. Planning and formulation can, for example, occur in cascades without any external signs. The approach does, however, offer a method to capture measurable parallel events during typing with high precision and reason about them in the context of a cascading model.

Our choice to analyse the distance between the inscription point and fixation locations with respect to visuospatial characteristics of the emerging texts and pre-defined distance levels in terms of characters and lines, rather than linguistic structures, could of course be debated, and in addition to the arguments already given in the methods section, we elaborate further on that choice here. Following previous research on relations between writing and visual attention (Alamargot et al., 2007; Beers et al., 2010; Torrance et al., 2016a), we were interested in the relation between processing demands (as indicated by fixation durations) and fixation location, ultimately to increase understanding of the functions of lookbacks during composition in general and parallel processing in particular. To analyse this relationship, certain thresholds need to be considered when defining the functional units. Our threshold of ten letters for “fixations around the inscription point” may seem arbitrary at first sight, but since the word-length distribution of our data agrees with previous knowledge of Swedish (cf. Sigurd et al., 2004), we can with reasonable certainty assume that, as mentioned in the methods section, our operationalisation agrees with how previous literature has operationalised writers’ visual attention as being in the vicinity of the inscription point. Choosing lines (like Beers et al., 2010) instead of sentences (like Torrance et al., 2016a) as units for our other thresholds does, however, warrant some discussion. The limitation of this choice is that the results will change with, for example, window and font size. On the other hand, a problem with sentence analysis is that writers can express the same semantic content in one sentence that consists of, for example, three coordinated clauses, or in three sentences each of which represents one of those clauses. To create cohesion between them, similar devices and cognitive processes may be needed. 
T-units (Hunt, 1965), or possibly C-units (Loban, 1976) in the case of children or L2-learners, may therefore be more useful analytical units, but due to the dynamic nature of computer writing, where the writer can make changes anytime and anywhere (as illustrated in examples 1 and 2 above), identifying linguistic units that are not signalled by orthography (e.g., spaces or punctuation marks) is inherently challenging. To our knowledge no software with full synchronisation between eye tracking and keystroke logging data can offer automatic detection of syntactic units between the word and the sentence levels in linear production. We argue that both visuospatial and linguistic thresholds involve a certain level of arbitrariness, but since our results are consistent with previous research (longer fixation durations when looking at text closer to the inscription point than when looking at words at more distal parts of the text), both approaches seem to support the notion that fixations more “local” to the inscription point tend to be associated with more extensive processing than fixations to more distal parts (cf. Torrance et al., 2016a). For text deletion, these patterns were somewhat different, with overall longer fixation durations than during text insertion and with less distinct distance effects, indicating that different processes are active during these two activities.

Focussing on the sequences of typing during fixations, about 20% of the writers’ fixations on their emerging text involved concurrent typing behaviour (text insertion and text deletion). Furthermore, in several cases one fixation outlasted at least 2–3 keystrokes. In a large proportion of these instances the fixation was within the proximity of the cursor, but in approximately 40% of both the text insertion cases and the text deletion cases writers fixated words further away from the inscription point, indicating that they were not only pursuing the cursor, but also used their gaze behaviour for other purposes. Interestingly, fixations in our parallel sequences were longer than those in previous literature, which has either focused on fixations that took place in pauses or not distinguished between fixations in pauses and fixations concurrent with typing. For instance, in comparison to Torrance et al. (2016a), fixation durations in the present study were about twice as long. This is in line with the results of Alamargot et al. (2007) and may indicate more cognitive processing when fixating text information concurrent with typing, at least for fixations on text segments not located around the inscription point. As longer fixation durations typically signal increased cognitive processing (e.g., Rayner, 1998), it is conceivable that these effects can be attributed to the assumed parallel processing of producing new text while monitoring recently typed characters in the proximity of the inscription point or while visually processing prior text segments (on the same line or on more distal parts of the text). Furthermore, and perhaps not surprisingly, fixation durations showed a high correlation with the number of keystrokes pressed during the fixation. A possible interpretation here is that fixation duration is limited to the time it takes to execute the content currently stored in the graphemic buffer. 
This could be explained in terms of Olive’s (2014) suggestion that operations at a particular level may interact with a subsequent level of processing and that operations at low levels of processing may affect higher levels. Another interpretation is that the fixation duration is dependent on the function of the lookback and that the number of characters that can be produced during a fixation is dependent on how long it is.

For text deletion, the patterns were somewhat different, with overall longer fixation durations than during text insertion. This indicates that different processes are active in these two types of keyboarding. It is likely that the linguistic complexity of revision (e.g., identifying the problem, reformulation, integrating this with the current text, etc.) causes higher demands that lead to longer fixation durations. It is also possible that visual processing that occurs concurrently with deletion places increased demands on cognitive resources. However, as the eyes frequently move from right to left during text deletions (i.e., in the opposite direction to that of regular reading behaviour), fixation durations should be affected by the increased control demands that are recruited during such “un-automatized” ways of attending to coherent text information (cf. Rayner, 1998). Also, as the task is likely to involve a behaviour more similar to visual search than regular reading when monitoring and evaluating what information to remove (and what not to), it is conceivable that fixation durations will increase when compared to regular reading behaviour (cf. Rayner, 1998; Rayner & Raney, 1996).

To conclude, we suggest that parallel processing warrants more theoretical and methodological interest from writing researchers. Our approach offers extended opportunities for researchers to unravel the complex dynamics of writers’ composing and eye movements, for example, to test the suggestion by Breetvelt et al. (1994) and Pianko (1973) that rereading is associated not only with revision but also with text generation processes. The results from our limited empirical study support the idea that higher-level processing is taking place in parallel with graphomotor action and that this fills an important function for skilled writers. Despite being cognitively costly, looking back and monitoring the inscription point seem to be perceived as beneficial by the writer. Lookbacks could, for example, be used to generate ideas, decide on formulation, or create cohesion. In our introductory example, the writer clearly used the information from his distal fixation (with a fixation duration well above the threshold for visual processing of the text content) on “social consequences” to generate his concluding sentence, which, like the one he fixated, focused on consequences. It is possible that during deletion eye movements merely function as a type of navigation help, but the fixation durations indicate a somewhat higher level of cognitive processes. Based on the sentence revision in our example, it is conceivable that the writer was reformulating the sentence while simultaneously deleting text and moving backwards towards a feasible “reformulation point”.

Understanding the functions of parallel events currently requires manual coding and qualitative interpretation, but providing a robust automatic method for identifying parallel events is an important step in facilitating systematic quantitative analysis. As we have shown, our current setup with ScriptLog’s integrated eye-tracking module offers the possibility to automatically identify which word the writer is visually attending to at any given moment, and this allows for future analyses of, e.g., semantic correspondence between the word that is in focus and the inscription point, for the instances where these differ. Future research should focus on how our rationale can be used for more hypothesis-driven investigations of how higher-level processes, such as planning, evaluation, and error monitoring, can be more closely related to linguistic units and boundaries, but also of how external factors such as font and window size influence and/or limit parallel processing. An exciting prospect would be to combine our approach with newly developed methods for extraction and categorisation of revisions from keystroke logs. See Conijn et al. (2021), who recently developed a machine-learning system that identifies and categorises insertions (that take place away from the inscription point), deletions and substitutions on subword, word and above-word level. Even though revision by no means is the only function of lookbacks/rereading in general, or parallel processing in particular, it does constitute a substantial proportion of the instances of typing during fixation in our little “test drilling”.