Introduction

Sarcasm is a form of verbal irony that is often used to criticize someone (Attardo, 2000; Kreuz & Glucksberg, 1989; cf. positive sarcasm, e.g., Clark & Gerrig, 1984; Gibbs, 2000). Along with ironic language in general, it is an integral part of everyday communication. Ironic language is found in 7.4% of e-mails sent between friends and 72.8% of personal blog entries (Whalen, Pexman, & Gill, 2009; Whalen, Pexman, Gill, & Nowson, 2013), and it is used eight times per hour in American television shows (Schwoebel, Dews, Winner, & Srinivas, 2000). Although it carries a higher risk of miscommunication, it has been suggested that ironic language such as sarcasm is used more frequently in written, computer-mediated communication than in face-to-face conversations (Hancock, 2004). Misunderstanding sarcasm may have dramatic consequences: misinterpreted sarcastic and ironic messages on Twitter have caused people to be fired from their jobs and to be arrested (Fallows, 2016; Robbins, 2010; Ronson, 2015). In the present study, we examined how readers resolve the meaning of written sarcasm and explored individual differences in how readers process sarcastic utterances.

Resolving the meaning of sarcastic utterances

According to current theories of sarcasm comprehension, it is particularly difficult to process and comprehend sarcasm when the utterance is not typically used in sarcastic meaning and the context does not provide advance cues for a sarcastic interpretation (Gibbs, 1994; Giora, 2003; Grice, 1975; Pexman, 2008). In this type of setting, a reader must search for an alternative interpretation of the utterance and incorporate this into the memory representation of the text. In support of these theoretical views, recent eye-tracking studies have shown that written sarcastic utterances take more time to process and are harder to comprehend than their literal counterparts (Au-Yeung, Kaakinen, Liversedge, & Benson, 2015; Filik, Leuthold, Wallington, & Page, 2014; Filik & Moxey, 2010; Kaakinen, Olkoniemi, Kinnari, & Hyönä, 2014; Olkoniemi, Ranta, & Kaakinen, 2016; Olkoniemi, Strömberg, & Kaakinen, in press; Turcan & Filik, 2016).

The exact time-course of resolving sarcasm can be studied with eye tracking, which is particularly useful for examining how readers comprehend connected discourse (Hyönä, Lorch, & Rinck, 2003; Raney, Campbell, & Bovee, 2014). Eye fixations can be categorized into those done during the first-pass reading of a sentence and those done during later look-backs initiated by subsequent parts of the text. First-pass sentence reading can be further divided into progressive fixations, which land on unread parts of the sentence, and first-pass re-readings, which are fixations that return to earlier parts of the sentence (Liversedge, Paterson, & Pickering, 1998). Sentence-level analyses are informative when the “area of interest” is not a single word (cf. Rayner, 1998) but instead consists of a phrase or a sentence (Hyönä et al., 2003). This is often the case with sarcastic utterances.

What previous eye-tracking studies show about the processing of sarcasm is that readers tend to do more re-reading of sarcastic than literal utterances during first-pass reading (Kaakinen et al., 2014; Olkoniemi et al., 2016, Olkoniemi et al., in press). Moreover, readers tend to look back to the sarcastic utterance from subsequent parts of the text (Kaakinen et al., 2014; Olkoniemi et al., 2016; Olkoniemi et al., in press) and to initiate a look-back from the sarcastic utterance to the preceding context (Olkoniemi et al., 2016; Olkoniemi et al., in press). From a theoretical point of view, these look-backs may reflect that readers are trying to integrate the sarcastic utterance with the developing text representation, which requires a reanalysis of the utterance (e.g., Grice, 1975). The findings resemble those of studies on the processing of syntactically ambiguous sentences (i.e., garden-path sentences), showing that readers typically regress back to the syntactically ambiguous sentence region from subsequent areas of the text (e.g., Frazier & Rayner, 1982; Meseguer, Carreiras, & Clifton, 2002; Mitchell, Shen, Green, & Hodgson, 2008) and sometimes re-read the preceding sentence context (von der Malsburg & Vasishth, 2013). However, it should be noted that these garden-path studies have examined the processing of complex single sentences, not entire passages.

Even though previous studies suggest that readers tend to do more first-pass re-reading and later looking back to the sarcastic utterance and, occasionally, to the preceding context (Kaakinen et al., 2014; Olkoniemi et al., 2016; Olkoniemi et al., in press), the time-course of resolving sarcasm is still unclear. If readers already tend to do more re-reading of the sarcastic utterance during the first-pass, why do they still look back to it from subsequent parts of the text?

The role of re-inspections in text comprehension

An obvious reason for re-inspections is that readers want to refresh the text information in their memory, as this supports comprehension (e.g., Booth and Weger, 2013; Schotter, Tran, & Rayner, 2014; Hyönä & Nurminen, 2006; Hyönä, Lorch, & Kaakinen, 2002; Raney et al., 2014; Rayner, Chace, Slattery, & Ashby, 2006; White, Lantz, & Paterson, 2016). Studies that have examined the function of regressive eye movements (i.e., a backward-directed eye movement) have found that fixations made during first-pass re-reading are crucial for comprehending the sentence (Booth & Weger, 2013; Schotter et al., 2014; White et al., 2016). For example, readers made more comprehension errors when re-reading of the already fixated words was eliminated due to masking the words after the reader had moved on in the sentence (Booth & Weger, 2013; Schotter et al., 2014; White et al., 2016, Exp1). Moreover, Booth and Weger (2013) showed that readers return to previous words in a sentence to re-access their meanings. In their study, participants read single sentences containing a predefined target word. In cases where the reader made a regression back to the target word, a display change was performed during the returning saccade, and the target word was changed to another word that influenced the interpretation of the sentence. After reading, participants’ comprehension of the sentence was measured. The results showed that when the target word was changed, readers were likely to assign a new meaning to the sentence. This suggests that information extracted during re-reading overwrites the information gathered during initial reading (Booth & Weger, 2013, Exp. 3).

Previous research suggests that look-backs (i.e., returning to previous sentences) reflect a conscious effort (Hyönä & Nurminen, 2006) to build a comprehensive mental representation of the text contents (Hyönä et al., 2002). First of all, readers are able to report after reading whether they looked back in the text, and where (Hyönä & Nurminen, 2006). Second, readers who initiate look-backs to important parts of the text gain better comprehension than readers who look back more randomly (e.g., Hyönä et al., 2002).

Re-inspecting parts of sentences or a text means that the reader makes a regression to a previously read word or sentence. Previous research suggests that readers are accurate in making saccades to the part of the text that caused them comprehension difficulty (e.g., Frazier & Rayner, 1982; Meseguer et al., 2002), thus indicating that they build a representation of the text’s spatial layout while reading (e.g., Baccino & Pynte, 1994, 1998; Kennedy, Brooks, Flynn, & Prophet, 2003; Murray & Kennedy, 1988). The representation of the spatial layout allows readers to selectively re-inspect the text, which is more efficient than simply re-reading the whole text again (Kennedy et al., 2003).

An interesting suggestion is that looking back in a text does not necessarily indicate a need to re-inspect the information itself, but rather a need to focus attention on what appeared there (Meseguer et al., 2002). Eye movements to previously viewed locations are sometimes triggered by reactivation of the memory representation, and this enhances subsequent memory retrieval (Ferreira, Apel, & Henderson, 2008). Therefore, look-backs may support the refreshing of text information in memory in two ways: by focusing attention on a certain spatial location – which helps in retrieving the contents of the text from memory – and/or by providing a review of the text content itself.

Even though previous studies suggest that look-back fixations play an important role in text comprehension (Hyönä & Nurminen, 2006; Hyönä et al., 2002; Raney et al., 2006, 2014), very little is known about their actual function in reading comprehension. Studies that experimentally manipulated the availability of previously read information to examine the role of regressive eye movements (e.g., Booth & Weger, 2013; Schotter et al., 2014; White et al., 2016) used only single sentences, thus providing information about the function of first-pass re-readings in sentence comprehension. In the present study, we modified the “trailing mask” paradigm used in these previous studies (Schotter et al., 2014) to be suitable for examining the reading of passages. Our particular interest was the role of look-back fixations in processing sarcastic utterances embedded in passages.

Individual differences in the processing of sarcasm

One recent theory on sarcasm comprehension, the parallel constraint-satisfaction framework, explicitly states that there are individual differences in sarcasm comprehension (Pexman, 2008). According to this framework, reader-related factors – such as how frequently a person uses sarcasm – influence the likelihood that different interpretations (literal, sarcasm, or white lie) are active in the reader’s mind. In line with this, recent eye-tracking studies have suggested that individual differences in working memory capacity (WMC) and the ability to process emotional information impact the likelihood of re-reading sarcastic texts (Kaakinen et al., 2014; Olkoniemi et al., 2016; Olkoniemi et al., in press).

Working memory refers to the mental process of maintaining information in an active state for later recall and manipulating information during the execution of ongoing tasks (e.g., Cowan, 2010; Daneman & Carpenter, 1980; Engle, 2010). There are individual differences in the capacity of working memory, and these are related to the ability to control attentional resources so that relevant information is quickly activated and irrelevant information is inhibited (e.g., Engle, 2010). WMC thus plays a crucial role in text comprehension, as it is needed to direct attention to, and maintain attention on, relevant information (see also Gernsbacher, 1993).

Recent eye-tracking studies have shown that individual differences in WMC are related to the processing of sarcasm (Kaakinen et al., 2014; Olkoniemi et al., 2016). Readers with high WMC demonstrate an increase in first-pass re-reading of sarcastic sentences (Kaakinen et al., 2014; Olkoniemi et al., 2016), whereas low WMC readers are more likely to initiate look-backs to the sarcastic sentences (Olkoniemi et al., 2016). Hence, the time-course of resolving sarcasm seems to depend on WMC, such that high WMC readers detect sarcasm faster and/or resolve it earlier than low WMC readers, who show mainly delayed effects. One possible explanation for these results is that low WMC readers may have trouble in inhibiting the initial literal interpretation of the utterance, which is why they need to engage in later reprocessing to validate the sarcastic meaning (e.g., Giora, 1999). For low WMC readers, then, look-backs should be important in forming sarcastic interpretations.

In the present study, we examined the roles of both verbal (Daneman & Carpenter, 1980) and visuo-spatial (Redick et al., 2012) WMC in sarcasm comprehension. There is an ongoing theoretical debate on whether WMC is domain specific or domain general (e.g., Oswald, McAbee, Redick, and Hambrick, 2015; Fougnie, Zughni, Godwin, & Marois, 2015). The present study does not aim to resolve this debate for good, but we thought it was important to include both types of measures, as look-backs may be driven by a memory representation of the text’s spatial layout. If look-backs are related to episodic memory processes that launch a reader’s eyes toward a spatial location where relevant information was previously identified, readers with higher spatial WMC may be better able to use this strategy to resolve sarcasm than low spatial WMC readers. Using more than one measure of WMC also provides a more reliable view of the role of working memory in comprehending sarcasm (e.g., Conway et al., 2005; Walczyk & Taylor, 1996).

Another important factor in sarcasm comprehension is the ability to process emotional information. Sarcastic statements are typically meant to sting and are usually perceived as insulting (e.g., Akimoto et al., 2014; Bowes & Katz, 2011; Olkoniemi et al., in press). Sarcasm can also be used as a form of humor, and sarcastic comments might be perceived as funny (e.g., Akimoto et al., 2014; Olkoniemi et al., in press). Previous studies have shown that sensitivity to the emotional message delivered by the speaker or by the sarcastic protagonist in a story is important in sarcasm comprehension (Amenta, Noël, Verbanck, & Campanella, 2013; Nicholson, Whalen, & Pexman, 2013; Olkoniemi et al., 2016; Shamay-Tsoory, Tomer, & Aharon-Peretz, 2005). Readers who can rapidly assign an emotional tone to the utterance should thus be able to interpret sarcasm more quickly (Jacob, Kreifelts, Nizielski, Shütz & Wildgruber, 2016; Olkoniemi et al., 2016; Olkoniemi et al., in press). In their eye-tracking study, Olkoniemi et al. (2016) showed that a poor ability to make use of emotional information (as measured with the Iowa Gambling Task; Bechara, Damasio, Damasio, & Anderson, 1994) was reflected in eye-movement records as an increased probability of looking back from the sarcastic target utterance to earlier parts of the text. In another study, readers with a poor ability to recognize emotions (as measured by the 20-item Toronto Alexithymia Scale; Bagby, Parker, & Taylor, 1994) showed increased first-pass reading time of the utterance following a sarcastic statement that validated the sarcastic interpretation (Olkoniemi et al., in press). These findings suggest that a poor ability to process emotional information is related to greater confusion when encountering sarcastic utterances and that this is reflected as an increased need to reprocess the sarcastic utterance and the passage context.

In previous studies, the Iowa Gambling Task (IGT; Bechara et al., 1994) and the Toronto Alexithymia Scale (TAS-20; Bagby et al., 1994) have been used to study individual differences in the processing of written sarcasm (Olkoniemi et al., 2016; Olkoniemi et al., in press). Although they should gage different abilities, both measures have been related to similar effects on the processing of sarcasm. IGT is thought to measure an individual’s sensitivity to emotional responses to reward and/or punishment in a decision-making task. The effect of the emotional response measured by IGT is thought to be, at least to some extent, unconscious (e.g., Buelow & Suhr, 2009). On the other hand, TAS-20 measures individuals’ ability to recognize and name emotions (i.e., alexithymix traits). TAS-20 should thus reflect more conscious emotional processing ability. In terms of written sarcasm comprehension, IGT should be related to the immediate emotional response evoked after a reader has processed the sarcastic utterance and starts to build an inference. TAS-20 should be related to a reader’s ability to name and recognize that the protagonist in the story intends to insult another person, and this should help in inferring the meaning of the utterance. In the present study, both measures were used to explore the role of emotion processing ability in the comprehension of written sarcasm.

Overview of the present experiment

The goal of the present study was to examine the role of look-backs in resolving the meaning of sarcastic utterances embedded in story context. We modified the “trailing mask” paradigm used in previous sentence reading studies (Schotter et al., 2014) to be suitable for passage reading. In the mask condition, readers revealed the text one sentence at the time, and the previously read sentence was always replaced with x’s as soon as the reader moved to the next sentence. This type of masking allowed readers to perform normal first-pass re-reading of the sentence but prevented re-examination of the text content during later look-backs (see Fig. 1). In the no-mask condition, readers revealed initially masked text one sentence at the time, and the sentences remained visible after the reader had moved on.

Fig. 1
figure 1

Examples of the no-mask and mask conditions

Participants read short passages that contained either sarcastic or literal utterances embedded in sentences (e.g., “What a clown you are!” Zachary shouted.). After each passage, participants responded to a question about their interpretation of the target utterance and a question that tested their memory about the passage content. To get a detailed picture of the time-course of processing, we computed sentence-level fixation measures (Hyönä et al., 2003) separately for the target utterance (“What a clown you are!”) and the spillover region (“Zachary shouted.”).

Our primary interest was the impact of masking manipulation on the processing and comprehension of sarcastic versus literal utterances. We assumed that if an increase in look-backs to sarcastic utterances and passage context reflects a need to re-examine the text content to resolve the meaning of the utterance (e.g., Grice, 1975; Olkoniemi et al., 2016), we should observe a decrease in look-backs to and from the sarcastic utterances in the mask condition. It is also possible that readers develop compensatory strategies to deal with the masking of the previous sentences and start spending longer first-pass reading times, and especially re-reading times (Walczyk & Taylor, 1996; White et al., 2016), on sarcastic utterances. As the spillover region is within the same sentence as the target utterance, increased first-pass reading times and look-backs initiated from the spillover region to the target utterance may also occur. However, if such compensatory strategies are not used, we should observe poorer comprehension of the sarcastic utterances in the mask condition.

We also explored individual differences in how readers process sarcastic utterances. Previous studies suggest that WMC (Kaakinen et al., 2014; Olkoniemi et al., 2016) and the ability to process emotional information (e.g., Amenta et al., 2013; Nicholson et al., 2013; Olkoniemi et al., 2016) may be crucial in determining how quickly readers can resolve sarcasm. Assuming that look-backs reflect the need to re-examine the text content, and thus to resolve the meaning of the utterance, we expected that readers with low WMC and readers with relatively poor emotion processing abilities are more influenced by the mask than readers with relatively high WMC and good emotion processing abilities. Thus, the eye-movement measures of these readers should reflect a greater impact of masking. Both verbal and spatial WMC tasks were used to examine the contributions of the different aspects of WMC to resolving sarcasm. Moreover, a test of the ability to name and recognize emotions and a test of the ability to make use of emotional information in higher-order cognitive tasks were used to explore how different aspects of emotion processing relate to the comprehension of sarcasm.

Method

Participants

Sixty-two University of Turku (Finland) students (53 women, Mage = 23.57 years, SDAge = 6.06) participated in the study to fulfill a course requirement. All were native speakers of Finnish (the language studied here) and had normal or corrected-to-normal vision. All participants provided written informed consent before the experiment.

Based on effect size measures from the Olkoniemi et al. (2016) eye-tracking study, a power analysis was conducted using G*Power 3 (Faul, Erdfelder, Lang, & Buchner, 2007) to estimate the sample size needed to detect an interaction between text type and an individual differences measure. The aimed statistical power was set to .80 and the significance level to α = .05. This analysis showed that at least 49 participants were needed.

Apparatus

Eye movements were recorded monocularly using EyeLink 1000 (SR Research Ltd., Ontario, Canada) at 1,000 Hz sampling frequency. The stimuli were presented on a 21-in. CRT screen (resolution: 1,024 × 768, refresh rate: 120 Hz). Participants were seated 70 cm from the screen, and a chin rest was used to stabilize the head.

Text materials

Each participant read a total of 42 text paragraphs on a computer screen (font: Courier New, font size: 14 pt, line height: 3 pt) while their eye movements were recorded. Fourteen of the paragraphs included a sarcastic utterance, 14 included a literal utterance, and there were 14 filler paragraphs. There were two versions of each experimental paragraph: a sarcastic and a literal (28 paragraphs × 2 text types, resulting in 56 experimental paragraphs). The paragraphs used were the same as in the Olkoniemi et al. (2016) study. They were pre-tested for the utterances’ familiarity as sarcasm, naturality in the story context, whether the emotional state of the speaker was apparent in the sarcastic statement, and whether the context supported a sarcastic interpretation. All sarcastic utterances were rated as unfamiliar; literal and sarcastic target utterances were evaluated to be natural in the story context; sarcastic utterances were evaluated to be less positive and more negative than their literal counterparts; and the story contexts did not provide cues about sarcastic interpretation for the utterance. English translations of paragraphs used in the experiment are available via the Open Science Framework (https://osf.io/9maz8). The filler passages were similar to the experimental items, but they did not contain indirect language (Kaakinen et al., 2014).

Each of the participants saw half of the paragraphs in the mask condition and the other half in the no-mask condition. In the no-mask condition, text was first masked with x’s and readers revealed the text one sentence at a time by pressing the spacebar. Once the sentences were read, they remained visible on the screen as the reader progressed. In the mask condition, the text was also initially masked and the readers revealed sentences one at a time by pressing a spacebar. However, after the spacebar was pressed to reveal a new sentence, the previous sentence was masked with x’s (cf. moving window paradigm, McConkie & Rayner, 1975). An example text is shown in Table 1.

Table 1 An example of experimental text (translated from Finnish)
Table 2 Descriptive statistics of the reading time measures

After reading, participants responded to two questions that required a yes/no response. The first tapped into the meaning of the target utterance (e.g., Did Zachary think that Paul’s show was a success?), and the second tested memory for the text information (e.g., Does Paul work at the circus as a clown?). Participants answered the questions by pressing “yes” or “no” buttons on the keyboard. A correct answer was rewarded with one point. The percentage of correct answers was computed separately for different question types. Reaction times to the inference questions were also recorded.

Individual differences measures

Toronto Alexithymia Scale (TAS-20)

The ability to recognize emotions was measured with the TAS-20 (Bagby et al., 1994; Joukamaa et al., 2001). TAS-20 is a paper-and-pencil self-report scale that includes short claims (e.g., “I am often confused about what emotion I am feeling.”). Participants responded to the items on a 5-point Likert scale from 1 (strongly disagree) to 5 (strongly agree). Five of the items are negatively keyed. The scale was scored by summing up all responses. The scores can vary between 20 and 100 points, with higher scores indicating a poorer ability to recognize emotions (i.e., higher alexithymia). The internal reliability (Cronbach’s alpha) of the TAS total score was α = .74.

Iowa Gambling Task (IGT)

A computerized version (Psychology Experiment Building Language Version 0.13; Mueller, 2012) of the IGT (Bechara et al., 1994) was used to measure the ability to use emotional information in decision-making. In the task, participants are given four decks of cards (referred to here as A, B, C, and D) and an imaginary capital of $2,000. They are instructed to use the capital in a game of cards so that they win more money than they lose. In the game, turning some cards (by clicking on the deck with a mouse) leads to a reward ($100 from decks A and B, $50 from decks C and D) and turning others leads to a penalty (net outcome of -$250/10 cards in decks A and B and net outcome of +$250/10 cards in decks C and D). Playing mostly from decks A and B leads to an overall loss, and playing mostly from decks C and D results in an overall gain.

Because the cards in the decks are shuffled, the occurrence of the penalty cannot be predicted. In the long run, half of the decks (decks A and B) cost the most. This losing trend should activate a negative emotional signal that these disadvantageous decks should be avoided. If emotional markers are absent or cannot be used properly in the decision-making process, then disadvantageous decks are not avoided. This causes a net loss or minimal gains in the task. The task ends when 100 cards have been drawn, though this is not revealed to the participants beforehand. After a few draws, people tend to choose the advantageous decks (C and D; Bechara et al., 1994). The task is scored by summing the number of draws from the advantageous decks, and scores can vary between 0 and 100. To estimate the internal consistency of the task, the split half correlation was calculated for the choice of advantageous decks (between the odd- and even-numbered trials to avoid a learning effect). The correlation was r = .88, p <.001, 95% CI [.87–.89].

Reading span task (RSPAN)

The reading span task (Daneman & Carpenter, 1980; Kaakinen & Hyönä, 2007) was used to measure verbal WMC. In this task, participants read aloud sets of unrelated sentences presented on a computer screen. After every set, they are asked to recall the last word of each sentence in the set. The task begins with sets of two sentences. The set size increases as long as the participant is able to recall the final words of the sentences. Each set size is repeated three times. The task ends when the participant fails to recall the final words of a sentence of a particular set size for its three repetitions. The task was preceded by a practice session that included reading three sets of two sentences. It was scored for the total number of correctly recalled final words (Friedman & Miyake, 2005). The task scores can vary between 0 and 81 points. The internal reliability (Cronbach’s alpha) of the RSPAN varies between .70 and .90 (see, e.g., Conway et al., 2005).

Symmetry span task (SSPAN)

The symmetry span task (Redick et al., 2012) was used to measure spatial WMC. In the task, participants view an 8 × 8 matrix of white and black squares and determine whether the pattern is symmetrical along its vertical axis. After the judgment, participants are presented with a 4 × 4 matrix of squares in which one cell is filled in red. After the series of matrix presentations, participants must recall the serial order of the positions of the red cells (2–5 spatial locations). The task was scored for the total number of items recalled in the memory trials (the partial span score; Conway et al., 2005). Scores can vary between 0 and 42 points. The internal reliability (Cronbach’s alpha) of the SSPAN varies between .76 and .81 when partial scoring is used (see, e.g., Redick et al., 2012).

Procedure

Participants were tested individually. They were naïve to the purpose of the experiment: upon arrival, they were only informed that the experiment was about reading. The specific nature of the task was explained to the participants only after the experiment.

First, each participant signed an informed consent form and the eye-tracking system was introduced. The experimental procedure was then explained. The eye-tracker was set up and calibrated using a nine-point calibration screen. Participants were instructed to read each paragraph at their own pace. Each paragraph was presented on one screen. Participants were told to press the spacebar on the keyboard when they wanted to move from one sentence to another within a paragraph. After every paragraph, two questions concerning the previously read paragraph were presented, one at a time. After the participant had answered the second question, the next paragraph was presented.

The reading task was followed by SSPAN, IGT, RSPAN, and TAS-20. Each experimental session lasted for about 90 min.

Results

Data preparation

Fixations shorter than 50 ms were either merged with a nearby fixation (if the distance between the fixations was < 1°) or removed from the data. Sentence-level measures for the target utterance and the spillover region were computed from the eye-movement data (see Hyönä et al., 2003). The reading time measures were divided into first-pass fixation times and later look-backs. First-pass reading time was the summed duration of fixations falling within the sentence during first-pass reading. First-pass reading time was further divided into forward-fixation time and re-reading time. Forward-fixation time is the summed duration of fixations that land on unread parts of the sentence during first-pass reading, and first-pass re-reading time is the summed duration of fixations made during re-inspection of the target utterance before moving on in the text. The look-back fixation time is the summed duration of fixations returning to the sentence from other parts of the text after the first-pass reading. Look-froms are look-back fixations that were initiated from the sentence.

For the re-reading and look-back measures (first-pass re-reading, look-back fixations, and look-from fixations), we computed separately the probability of re-reading or a look-back (binomial measure). The summed fixation times were computed on the condition that re-reading or a look-back was done. Because readers saw the target utterance and the spillover region at the same time, look-back times to the target utterance were computed separately for look-backs made from the spillover region and look-backs made from other text regions to the target utterance.

The reading time measures were skewed and were consequently transformed. The best-fitting transformation was selected to normalize the measures. The first-pass reading times on the target utterance and the spillover region and the forward-fixation time on the target utterance were square root transformed. The first-pass re-reading time measures were logarithmically transformed before the analyses. All measures were analyzed for the target utterances; first-pass reading time, probability to look back to the target utterance, and look-back time to the target utterance were analyzed for the spillover region. Observed means and standard deviations of the different eye-movement measures are presented in Table 2.

Statistical analyses

Data were analyzed with linear mixed-effects models (LMM) using the lme4 package (Bates, Mächler, Bolker, & Walker, 2015) in the R statistical software (Version 3.4.3; R Core Team, 2017). The models were estimated using Maximum Likelihood estimation. Separate models were built for each eye-movement measure for the target utterance and the spillover region. In addition, separate models were built for reaction times and correct answers to text memory and inference questions (descriptive statistics are presented in Table 3).

Participants and items were entered into the models as crossed random factors (Baayen, Davidson, & Bates, 2008). The influences of the text type and mask were tested by fitting models with sum coded fixed effects variables. The individual differences variables (RSPAN, SSPAN, IGT, and TAS-20; see Table 4 for descriptive statistics) were added to the models as centered fixed effects variables. The correlations between the individual differences measures were low or modest (see Table 4). The RSPAN and TAS-20 scores were skewed; consequently, the RSPAN score was square root transformed and the TAS-20 was logarithmically transformed. To examine the potential effects of presentation order on the observed effects (e.g., Olkoniemi et al., 2016), a trial order was added as a sum coded fixed effect to the models (first half of the experiment = -1, end half of the experiment = 1).

Table 3 Reaction times (RTs) and percentage of correct answers (%) to text memory and inference questions
Table 4 Descriptive statistics of the individual differences measures and correlations between the measures

Model fitting was performed in a step-wise fashion, starting with the most complex model and including text type, mask condition, trial order, and the individual differences measures and their interactions as fixed effects. At this point, only intercepts for participants and texts were fitted as random effects. The fixed effect associated with the smallest t-value was removed from the model, beginning with the interaction terms, and the reduced model was compared to the former model using the anova function in the lme4 package (Bates et al., 2015) to compare the fit of the models. Fixed effects were removed one at a time until nothing else could be removed without significantly reducing the fit of the model. The fixed effects of the primary interests (text type, mask condition, and their interaction term) were always retained in the model, as they were crucial with respect to our hypotheses. Finally, a full random structure was fitted to the model (Barr, Levy, Scheepers, & Tily, 2013), and fixed and random effects were removed if further changes did not significantly reduce the fit of the model. If the model failed to converge after fitting the full random structure, the random structure was trimmed top-down, starting with correlations between factors.

The exact degrees of freedom are difficult to determine for the t-statistics estimated by LMMs, leading to problems in determining exact p-values (Baayen et al., 2008). Statistical significance at the .05 level is indicated by the values of |t and z| > 1.96.

In the text, we always report the main effects of text type and masking condition and their interactions. As for the other interaction effects, only significant effects involving text type are reported for the sake of brevity. Interactions were further examined by computing the simple effects of text type; the estimates and their 95% CIs are illustrated in the figures. All of the final models are reported in Appendix A.

Reading of the target utterance

First-pass reading

The analysis of the first-pass reading times, the forward-fixation times, or the probability of first-pass re-reading of the target utterance did not show effects of text type, mask condition, or their interaction.

The analysis of the first-pass re-reading times on the target utterance did not show main effects of text type, mask condition, or their interaction. However, there was a three-way interaction between text type, mask, and SSPAN. This indicates that the effect of masking was different in literal and sarcastic texts – and that this difference depended on the level of SSPAN. Estimates of the effects of text type at low (-1 SD) and high (+ 1 SD) levels of SSPAN and their 95% CIs are illustrated in Fig. 2. As is apparent in Fig. 2, readers scoring relatively low in SSPAN (i.e., lower spatial working memory capacity) showed shorter first-pass re-reading times for sarcastic than for literal target utterances in the no-mask condition, whereas in the mask condition, low SSPAN readers showed longer first-pass re-reading times on sarcastic target utterances when compared to literal. In other words, low SSPAN readers’ first-pass re-reading times on sarcastic target utterances got longer in the masked condition. Readers scoring relatively high in SSPAN, on the other hand, did not show reliable sarcasm effects in either condition. The analysis also showed an interaction between text type and TAS-20 (see Fig. 3). Readers scoring relatively low in TAS-20 (i.e., fewer alexithymic traits) showed shorter re-reading times for sarcastic than for literal target utterances, whereas readers scoring relatively high showed higher re-reading times for sarcastic than for literal target utterances.

Fig. 2
figure 2

Model estimates for the first-pass re-reading time on the target utterance. Y-axis represents the sarcasm effect, which is the difference in the reading times between sarcastic and literal texts. For illustration purposes SSPAN score is divided to High and Low (± 1 SD), and the model means and confidence intervals are back-transformed from log values. Error bars represent 95% CI

Fig. 3
figure 3

Model estimates for the first-pass re-reading time on the target utterance. Y-axis represents the sarcasm effect, which is the difference in the reading times between sarcastic and literal texts. For illustration purposes TAS-20 score is divided to High and Low (± 1 SD), and the model means and confidence intervals are back-transformed from log values. Error bars represent 95% CI

Look-backs

The analysis of the probability to initiate a look-back to the target utterance showed a main effect of text type, indicating that readers were more likely to initiate look-backs to sarcastic than to literal target utterances. There also was a main effect of mask condition, indicating that masking reduced the likelihood of look-backs to the target utterance in general. There was no indication of a Text type × Mask condition interaction. However, the analysis revealed a three-way interaction between text type, mask, and SSPAN, indicating that the effect of masking was different in sarcastic and literal texts – and that this difference depended on the level of SSPAN (see Fig. 4). Readers scoring relatively low in SSPAN were more likely to initiate a look-back to sarcastic than to literal target utterances, especially in the no-mask condition. In the masked condition, the difference between text types was reduced. On the other hand, readers scoring relatively high in SSPAN showed an opposite pattern: they were more likely to look back to sarcastic than to literal target utterances in the mask condition, and the difference between text types was reduced in the no-mask condition.

Fig. 4
figure 4

Model estimates for the probability to initiate a look-back to the target utterance from the other parts of text than spillover region. Y-axis represents the sarcasm effect, which is the difference in the reading times between sarcastic and literal texts. For illustration purposes SSPAN score is divided to High and Low (± 1 SD), and the model means and confidence intervals are back-transformed from log values. Error bars represent 95% CI

The analysis of the look-back time on the target utterance showed a main effect of masking, indicating that overall, look-back times were shorter in the mask than in the no-mask condition. There was no evidence for a main effect of text type or for a Text type × Mask condition interaction.

As for the probability to initiate a look-from from the target utterance, there was a main effect of text type. This indicated that readers were more likely to look to the other parts of the text from the sarcastic than from the literal target utterance. There also was a main effect of mask condition, indicating that masking reduced the likelihood of looking back in the text in general. There was no indication of an interaction between text type and mask condition.

Reading of the spillover region

The analysis of first-pass reading times on the spillover region showed a main effect of mask condition, indicating that the first-pass reading time in the spillover region was longer in the mask than in the no-mask condition. There was no main effect of text type or an interaction between text type and mask condition.

The analysis of the probability to initiate a look-back to the target utterance from the spillover region did not show effects of text type, mask condition, or their interaction. Furthermore, the analysis of the look-back time to the target utterance from the spillover region did not show main effects of text type, mask condition, or their interaction. There was only an interaction between text type and trial order (see Fig. 5). The result indicates that look-back times were longer for sarcastic than for literal texts in the beginning of the experiment. However, look-back times to sarcastic target utterances decreased toward the end of the experiment – during the second half of the experiment, look-back times were shorter for sarcastic than for literal target utterances.

Fig. 5
figure 5

Model estimates for the look-back time from the spillover region to the target utterance. Y-axis represents the sarcasm effect, which is the difference in the reading times between sarcastic and literal texts. The model means and confidence intervals are back-transformed from log values. Error bars represent 95% CI

Text memory and inference questions

The analysis of the correct answers to the text memory questions revealed a main effect of text type, indicating that readers were better at responding to the text memory questions after sarcastic paragraphs than after literal paragraphs. There were no effects of masking or an interaction between text type and mask condition. The analysis of the reaction times to the text memory questions did not show effects of text type, mask, or their interaction.

The analysis of the correct answers to the inference questions showed a main effect of text type, indicating that readers made more correct responses after literal than after sarcastic passages. There were no effects of masking or an interaction between text type and mask condition. The analysis of reaction times to the inference questions showed a main effect of text type, indicating that it took longer to answer inference questions after sarcastic than after literal passages. There was also an interaction between text type and trial order, indicating that readers were slower in answering inference questions after sarcastic paragraphs than after literal paragraphs in the first half of the experiment. This effect wore off toward the end of the experiment, however (see Fig. 6). There was no indication of a main effect of masking or a Text type × Mask condition interaction.

Fig. 6
figure 6

Model estimates for the reaction times to the inference questions. Y-axis represents the sarcasm effect, which is the difference in the reaction times between sarcastic and literal texts. The model means and confidence intervals are back-transformed from log values. Error bars represent 95% CI

Relationship between reading times and sarcasm comprehension

Based on the suggestions of the reviewers, a new set of analyses was conducted to explore the correlation between reading times and sarcasm comprehension. Separate models were built for each eye-movement measure shown to be related to the processing of sarcasm in the previous analyses (first-pass re-reading time, probability to initiate a look-back, look-back time from the spillover region, and the probability to initiate a look-from). Because literal items showed a roof effect with limited variance, the analyses were conducted only for the sarcasm comprehension scores. The reading time measures were fitted to the models as centered fixed effects variables, and probability measures were fitted as sum coded fixed effects variables. To explore the possible effects of mask manipulation, separate models were built for mask versus no-mask conditions. Model coefficients for mask and no-mask conditions did not differ from each other across the models; consequently, mask was not fitted as a predictor to the final models. All of the final models are reported in Appendix B.

The analysis showed that first-pass re-reading time on sarcastic utterance, probability to initiate a look-back to the sarcastic utterance (from parts of the text other than the spillover region), or look-back time to the sarcastic utterance from the spillover region did not have a statistically significant effect on correct responses to comprehension questions after sarcastic passages. However, the analysis of the probability to initiate a look-from the sarcastic utterance showed that look-froms were associated with poorer comprehension of the sarcastic utterance.

Discussion

In the present experiment, we examined the role of look-backs in the processing of written sarcasm by using a version of the trailing mask paradigm (Schotter et al., 2014) modified to be suitable for studying the reading of texts. The results showed some effects of the masking manipulation. Readers were less likely to look back to and from the target utterance, showed shorter look-back times, and demonstrated increased first-pass reading time in the spillover region in the mask versus the no-mask condition. The increased first-pass reading time in the spillover region can be taken as an indicator of increased wrap-up processing at the sentence end (e.g., Rayner, Kambe, & Duffy, 2000), suggesting that when look-backs do not provide an opportunity to re-examine the text content, readers invest extra effort into sentence integration before they move on to the next sentence. In other words, look-backs do seem to provide an opportunity to re-examine the text contents. Even though look-backs are likely to be important in forming a comprehensive memory representation of the text contents, masking did not hamper memory or comprehension of the target utterances in the present study. This is probably because readers compensated by increasing sentence wrap-up processing during first-pass reading.

We had expected that if resolving the meaning of a sarcastic utterance requires re-examining the text content (see, e.g., Olkoniemi et al., 2016; Olkoniemi et al., in press), masking should have greater effects on the processing of passages containing sarcastic utterances. However, we obtained no evidence of a critical interaction between text type and mask condition in any of the dependent measures. Interactions only appeared in combination with individual differences measures – specifically, the spatial working memory task.

Processing of sarcasm and sarcasm comprehension

We replicated the previous findings that sarcastic utterances are more likely to attract look-backs and that readers tend to initiate more look-backs to other parts of text from sarcastic than from literal utterances (Au-Yeung et al., 2015; Filik et al., 2014; Filik & Moxey, 2010; Kaakinen et al., 2014; Olkoniemi et al., 2016; Olkoniemi et al., in press; Turcan & Filik, 2016). In the beginning of the experiment, sarcastic utterances also received longer look-backs from the spillover region, implying increased sentence wrap-up processing (Rayner et al., 2000).

The finding that the sentence wrap-up effect (i.e., look-back time to the target utterance from the spillover region) observed for sarcastic utterances diminished across the experiment is in line with previous studies that have reported trial effects (Olkoniemi et al., 2016; Olkoniemi et al., in press). It seems that after encountering several sarcastic utterances during the course of the experiment, readers formed an expectation for sarcasm (Olkoniemi et al., 2016). This, in turn, facilitated the processing of sarcastic utterances. This interpretation is supported by the finding that response times to inference questions about sarcastic utterances decreased across the experiment. The results are in line with theoretical views that assume that context may bias the interpretation of an utterance toward the non-salient indirect meaning (Gibbs, 1994; Giora, 2003; Pexman, 2008).

Sarcastic utterances were also more difficult to comprehend than literal utterances, as evidenced by lower accuracy scores and slower response times to the inference questions (Au-Yeung et al., 2015; Kaakinen et al., 2014; Olkoniemi et al., 2016; Olkoniemi et al., in press). These results show that despite the extra processing effort readers invest in reading sarcastic utterances, they do not always understand the intended meaning. Moreover, the results show that the extra processing effort invested in the processing of sarcasm sometimes reflects problems in comprehension: look-backs initiated from the sarcastic utterance to previous text parts were related to poorer comprehension of the utterance. The result indicates that readers’ need to return to previous text parts after reading the sarcastic utterance may reflect confusion about a possible interpretation of the utterance. The result is in line with Gibbs and Colston’s (2012) suggestion that a failure to integrate the utterance with the context is one reason why a reader might not understand a sarcastic comment.

However, readers were more accurate in answering text memory questions related to sarcastic than to literal texts (Olkoniemi et al., in press). The extra processing effort invested in the processing of sarcasm may help readers to recall the text content.

Individual differences in the processing of sarcasm

The results showed that SSPAN score was related to compensatory strategies utilized during the reading of sarcastic passages in the mask condition. A lower SSPAN score correlated to longer first-pass re-reading time of sarcastic versus literal utterances in the mask condition. Readers with a relatively low SSPAN score also showed a lower likelihood of initiating look-backs to the sarcastic utterances in the mask condition. These results suggest that spatial working memory is associated with the processing of sarcasm, such that look-backs provide an opportunity for lower spatial WMC readers to re-examine the target utterance (Walczyk & Taylor, 1996). It seems that for higher visuospatial WMC readers, look-backs work as cues to the text content (e.g., Ferreira et al., 2008; Meseguer et al., 2002), as these readers still made look-backs to the sarcastic target utterance even in the mask condition. It is possible that high-SSPAN readers are better at maintaining an episodic memory representation that contains both the text content and the spatial information and that look-backs to previous parts of the text help them to retrieve the content information from memory.

We did not find any relation between RSPAN and the processing of sarcasm, as in our previous studies (Kaakinen et al., 2014; Olkoniemi et al., 2016). A possible explanation is that even though the correlation between RSPAN and SSPAN scores was relatively low, SSPAN captured most of the variance in WMC, while RSPAN had no explanatory power over and above it. This would be understandable if SSPAN captured the task-specific variance in spatial WMC as well as domain-general variations in WMC.

In accordance with previous research (Olkoniemi et al., 2016), the present results suggest that the ability to recognize emotions is related to the processing of sarcastic utterances. Readers who were better able to recognize emotions (i.e., low TAS-20 score) showed shorter first-pass re-reading times of sarcastic utterances. Sarcasm conveys an emotional message that differs from the meaning of a literal interpretation (Akimoto et al., 2014). Sarcasm is typically used to criticize someone (e.g., Bowes & Katz, 2011), and it may be used as a form of humor and experienced as funny (e.g., Gibbs, Bryant, & Colston, 2014). The intended emotional message was apparent in the sarcastic utterances used in the present experiment: they were evaluated as more negative and less positive than their literal counterparts (Olkoniemi et al., 2016). The result implies that those who are better able to recognize this emotional component are able to resolve sarcasm more quickly. The finding is in line with theoretical views stating that sarcasm comprehension is, to some extent, dependent on emotional inferences (Pexman, 2008). In contrast to our hypothesis, we failed to observe interactions between the effects of masking and the ability to recognize emotions. One possible explanation is that readers slowed down their reading on the spillover region after the sarcastic statements in the mask condition, indicating that readers with a poorer ability to recognize emotions might have had enough time to do extra processing that they would normally need.

The lack of correlation between IGT and processing times for sarcastic texts (cf. Olkoniemi et al., 2016) suggests that the ability to name and recognize the emotions the sarcastic protagonist wants to deliver may be a stronger predictor of sarcasm comprehension than the automatic activation of emotion. Future studies are needed to examine in more detail the role of emotional processing in the comprehension of emotionally complex language such as sarcasm.

Limitations of the study

Naturally, there are some limitations to the present study. First, we failed to observe positive correlations between processing time and the comprehension of sarcastic utterances. We would like to note that exploring correlations between these measures is tricky, as the comprehension scores were relatively high, even for sarcastic passages. The analyses are further complicated by the experimental manipulation employed in the present study (i.e., masking), which had an overall impact on the number and duration of re-inspections. Even though these results should be considered with some caution, the analyses do complement the planned analyses of the data.

Moreover, the gender distribution of the present sample was skewed (53 of the 62 participants were women). Previous studies suggest that men might be more likely to use sarcasm than women (Colston & Lee, 2004; Gibbs, 2000; Rockwell & Theriot, 2001; cf. Taylor, 2017). However, only a few studies have examined gender differences in the comprehension of sarcasm, and most have failed to find evidence for gender differences (Baptista, Macedo, & Boggio, 2015; see also Holtgraves, 1997). However, it has been suggested that the processing strategies used to understand irony might differ between men and women (Baptista et al., 2015) and that women might be better in recognizing sarcasm (Rothermich & Pell, 2015). Thus, the potential gender effects should be taken into account in the future.

Conclusions

Even though look-backs provide an opportunity to re-examine the text contents when resolving sarcasm, the present results indicate that they are not necessary for the successful comprehension of sarcasm. Readers seem to compensate for their inability to retrieve text content with look-backs by investing extra effort during first-pass reading (see also White et al., 2016). This result suggests that readers are already aware of sarcasm during first-pass reading and that re-inspecting the sarcastic utterance, either during first-pass reading or during look-backs, is important in forming the sarcastic interpretation. Furthermore, the results showed that the need to re-access the sarcastic utterance was mediated by spatial WMC. Readers with lower spatial WMC were more bound to re-accessing the text content, whereas high spatial WMC readers seemed able to use look-backs to the utterance location as cues about the text content. These finding are in line with the Compensatory-Encoding Model (Walczyk & Taylor, 1996), suggesting that low WMC readers use text as external memory and employ compensatory strategies (i.e., looking back or slowing down their reading). However, as this is the first study to experimentally examine the role of look-backs in the processing of sarcasm, it is clear that further studies are needed to fully understand how different readers react to and process sarcasm.

The results also showed that the ability to recognize emotions is related to the efficiency of resolving sarcasm. Readers who are better able to recognize emotions invest less processing effort to form a sarcastic interpretation. As sarcasm is typically used to criticize someone (Attardo, 2000; Kreuz & Glucksberg, 1989), it delivers an emotional message (i.e., negative emotions; Olkoniemi et al., 2016). The finding suggests that readers who are better able to recognize emotions are faster to categorize the utterance as sarcastic.

Finally, the results demonstrated that longer sentence wrap-up processing time and slower responses to inference questions observed for sarcastic texts wore off toward the end of the experiment. This suggests that after encountering several sarcastic utterances during the course of the experiment, readers form an expectation of sarcasm, which facilitates its processing.

Of the current theories of sarcasm comprehension, the direct access view (Gibbs, 1994) and the parallel constraint-satisfaction framework (Pexman, 2008) can best accommodate the present results, as they make the general assumption that various reader-, text-, and context-related factors constrain the accessibility of different interpretations during reading. The present study shows that while comprehending unfamiliar sarcastic utterances is more difficult and takes more time than comprehending literal utterances (Giora, 2003; Grice, 1975), the effort required and the strategies used to resolve the meaning of the sarcastic utterances depends on the reading context and reader characteristics.