Short-term memory (STM) comprises multiple mechanisms that maintain different kinds of information. Since the proposal of the influential multicomponent model of working memory (Baddeley, 2003; Baddeley & Hitch, 1974), theorists have distinguished between the phonological loop, responsible for subvocal rehearsal of verbal information, and the visuospatial sketchpad, responsible for maintenance of visual information over short delays. Subsequent research has suggested further fractionation of verbal STM into other resources beyond the phonological loop. Although phonological STM (pSTM) is critical for the maintenance of arbitrary information such as digit strings and nonwords, semantic mechanisms can complement pSTM to support the maintenance of more meaningful information. For example, neuropsychological patients with impairments of the phonological loop fail on short-term recall of arbitrary lists, but they can often produce reasonable paraphrases of meaningful sentences (Baldo, Klostermann, & Dronkers, 2008; Butterworth, Shallice, & Watson, 1990; McCarthy & Warrington, 1984, 1987). Healthy participants can perfectly recall sentences that greatly exceed their estimated span for word lists in length (Brener, 1940; Miller & Selfridge, 1950). Although chunking and prediction may play roles in sentence recall, semantic factors also affect the recall of word lists, with superior recall for words versus nonwords (Hulme, Maughan, & Brown, 1991), for high- versus low-frequency words (Hulme et al., 1997), and for words of higher imageability (Bourassa & Besner, 1994).

Although experimental tasks can be designed to stress phonological or semantic maintenance, both kinds of information likely contribute to verbal STM in everyday life. A classic task involving both is sentence repetition. Although early viewpoints tended to view phonological mechanisms as the primary contributor to sentence repetition (e.g., Clark & Clark, 1977), a radically different view was supported by Potter and Lombardi (1990), who proposed that short-term sentence recall depends mainly on regeneration from a conceptual code. This was evidenced by participants’ tendency to make semantic substitutions in their sentence repetitions when semantically related lure words were contained within word lists used in secondary tasks performed immediately before or after the sentence presentation. Subsequent studies using similar paradigms have shown that both semantically and phonologically related lure words tend to intrude in sentence recall attempts, supporting a more mixed viewpoint in which both semantic and phonological codes contribute to sentence recall performance (Alloway, 2007; Rummer & Engelkamp, 2001, 2003; Schweppe, Rummer, Bormann, & Martin, 2011).

The parallel engagement of phonological and semantic mechanisms suggests that the two processes may interact with each other. One key question is whether phonological and semantic maintenance are competitive with each other. In the present study, we hypothesized that suppressing pSTM during a short-term sentence repetition task would encourage participants to increase their employment of semantic maintenance strategies, and that this manipulation would have consequences for the long-term retention of sentence content when it was tested in a subsequent cued-recall test. The logic of the present experiment rests on a close link between semantic processing and the encoding of verbal information into long-term memory (LTM). It is well known that making semantic decisions about verbal stimuli promotes their encoding into LTM, relative to perceptual or phonological decisions, a finding known as the “levels-of-processing” effect (Craik & Lockhart, 1972). The close link between semantic processing and LTM encoding, along with other evidence, has led some theorists to propose that a single mechanism can account for both short-term and long-term retention of verbal information, and that semantic STM represents the temporary activation of representations stored in LTM (Cameron, Haarmann, Grafman, & Ruchkin, 2005; Ruchkin, Grafman, Cameron, & Berndt, 2003). This viewpoint is implied by Baddeley’s (2000) use of the term “episodic buffer” to refer to the short-term storage of verbal information by mechanisms other than the phonological loop. Such information may be temporarily stored via LTM mechanisms but ultimately fails to be consolidated for longer-term retention, such that the episodic buffer serves as both an STM store and a gateway into LTM. Other theorists have posited the existence of a dedicated storage buffer independent of both pSTM and LTM, labeled variously as “semantic STM” (R. C. Martin & He, 2004) or “conceptual STM” (cSTM; Potter & Lombardi, 1990). 
Experiments have identified specific effects that seem to call for a buffer independent of LTM (Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann, & Usher, 2005; Davelaar, Haarmann, Goshen-Gottstein, & Usher, 2006; Haarmann, Davelaar, & Usher, 2003; Haarmann & Usher, 2001; Shivde & Anderson, 2011), although a contribution of LTM to short-term sentence recall is not ruled out by such findings.

In the present study, we do not aim to distinguish between dual- and single-mechanism accounts of semantic maintenance, but rather seek to clarify the relative contributions of pSTM and semantic mechanisms toward the encoding of sentence content into LTM. We evaluated memory for sentence content by having participants recall the same sentences twice. In Task 1 (short-term repetition), participants heard a lengthy sentence at the beginning of each trial and were asked to attempt to repeat it verbatim after a 14-s delay. During the delay, they were asked to either tap their fingers or perform a phonologically demanding distractor task, with both actions being externally paced by a visual cue. The more demanding distractor task involved counting backward by threes in Experiment 1, and repeating a nonword in Experiment 2. The distractor conditions in both experiments involved articulatory suppression (AS) of pSTM and were expected to produce poorer performance in short-term repetition. For effective short-term recall of sentences following AS, participants must rely on alternative mechanisms either to maintain the sentence content in a buffer during the delay or to recall the sentence from LTM after the delay is over. We hypothesized that this engagement of alternative mechanisms would have consequences for the degree to which the sentence content was ultimately retained in LTM, as tested on a second task.

Task 2 (long-term cued recall) involved a somewhat novel paradigm. On each trial, participants were presented with two words (the subject and main verb) from a sentence they had previously encountered in Task 1. They were asked to recall as much of the sentence as possible on the basis of this two-word cue. Sentences were scored according to raw recall performance (how many words from the original sentence were recalled), but also, critically, for conditional recall (the proportion of words recalled in Task 2 relative to the number recalled for the same sentence on Task 1). The conditional recall score specifically assessed the degree to which the sentence was forgotten following its initial short-term recall, before it was cued in the second task. The raw recall score reflects forgetting (or failure to encode) at two stages—during the initial delay in Task 1, and between the two tasks.

On the basis of the principle of levels of processing, we predicted in the present experiments that short-term repetition of sentences under conditions of AS would result in higher conditional recall scores, which would indicate less forgetting between repetition on Task 1 and cued recall on Task 2. Although we expected that AS would produce lower raw recall scores than the tapping condition on both tasks, due to the words forgotten during the initial filled delay period, we expected participants to forget fewer words between the two recall attempts when the first attempt was performed following AS. This prediction stems from the expected use of semantically based STM and LTM strategies during the initial recall attempt, which should promote encoding into LTM. This prediction can be contrasted with an opposite prediction that would view pSTM-based repetition as being more beneficial for encoding information into LTM. Even acknowledging the existence of levels-of-processing effects, one might predict that the benefit of rehearsing a sentence in the phonological loop without interference might outweigh the advantage afforded by being forced to recall the sentence using semantic resources, especially if semantic processing happens inadvertently in the course of rehearsal. This would be in line with numerous studies that have demonstrated an LTM advantage for verbal stimuli that are rehearsed for longer periods of time (e.g., Aldridge & Crisp, 1982; Dark & Loftus, 1976; Rundus, 1977). Thus, a levels-of-processing perspective may be contrasted with a more intuitive “rehearsal advantage” perspective.

We hypothesized that for the AS condition, the increased effort exerted to maintain the sentence in cSTM, or to recall it from LTM after the delay, might engage semantic mechanisms that would actually promote the encoding of the sentence into LTM. During the easier finger-tapping condition, participants could freely engage in subvocal rehearsal, which yields superior short-term recall but may promote shallower encoding, and therefore more forgetting of the sentence content between the two tasks. This assumes that AS is more disruptive of pSTM than is finger tapping, such that the finger-tapping condition afforded the participants the luxury of rehearsing without much interference. To ensure that this was the case, we used a simple version in which participants only had to tap one finger, synchronized to periodic visual cues on the screen. This condition matched the AS conditions in timing and the amount of visual information, but was cognitively very undemanding. Although finger-tapping tasks are occasionally used to cause dual-task interference with memory processing (see, e.g., Kane & Engle, 2000), such tasks typically require endogenous timing and complex sequences. The assertion that finger tapping minimally interfered with pSTM is supported by the high performance of participants on short-term repetition (Task 1) in the tapping condition (as compared to the much poorer repetition following AS; see the Results), and also by the responses to a debriefing questionnaire administered to the participants on the strategies used (see Footnote 1).

In contrast to the intuitive “rehearsal advantage account,” we hypothesized here that AS would have different effects on short-term and long-term retention of sentences. Although sentence recall had not to our knowledge previously been tested at both short-term and long-term stages, as in this experiment, studies have shown that increased demands for semantic processing during short-term retention result in improved subsequent memory for single words as assessed by recognition (Rose & Craik, 2012; Rose, Myerson, Roediger, & Hale, 2010) or free recall (McCabe, 2008; Rose, 2013; Rose, Buchsbaum, & Craik, 2014). On the basis of these findings, we expected that the AS conditions would bring about increased processing of sentence content in cSTM or LTM, resulting in less forgetting (better conditional recall) of sentences initially recalled after AS.

In addition to testing the effects of distraction on short-term repetition and long-term cued recall, the experiments incorporated a second manipulation intended to further probe the selective engagement of semantic mechanisms in short-term retention. Sentences were designed in two conditions: abstract and concrete. Abstract sentences were designed to be relatively devoid of sensory information, whereas concrete sentences described scenes that were rich in visual information. The greater imageability of the concrete sentences would support semantic memory for their content, making them less dependent on rote rehearsal for successful recall. We therefore predicted better recall for the concrete sentences in both short-term and long-term recall tasks, in line with prior experiments (Paivio, Clark, & Khan, 1988). Furthermore, we predicted an interaction between imageability and distraction condition. Studies on single-word recall have shown that concrete words benefit more from semantic relative to phonological processing in incidental encoding than abstract words do, due to their stronger links to amodal semantic representations (D’Agostino, O’Neill, & Paivio, 1977). In the present experiments, we expected to find distinct interaction effects in the two tasks. For short-term repetition, we expected that participants would be more dependent on rote rehearsal to support the maintenance of sentence content for abstract sentences, since the lack of imageability would make the semantic support weaker. Therefore, abstract sentences should suffer more degradation in performance from AS than would concrete sentences. But for conditional performance on long-term cued recall, we expected AS to have a greater positive impact on abstract sentences (i.e., we should see less forgetting for abstract sentences initially recalled after AS), due to participants having been forced to process these sentences semantically when they otherwise would have relied chiefly on pSTM. 
For concrete sentences, we also expected a beneficial effect of AS on conditional recall, but it might be smaller than the effect on abstract sentences, since concrete sentences already benefit from stronger semantic support even in the absence of AS, such as in the finger-tapping condition.

In addition to quantifying accuracy, we also analyzed sentence recall performance for the frequencies of distinct kinds of errors. Recall of sentences based on meaning rather than a phonological trace predicts a higher frequency of certain kinds of errors, particularly semantic substitutions. We observed differences in the patterns of errors induced by the two experimental factors in the stages of short-term repetition and long-term cued recall, further elucidating the complementary interactions between the phonological and semantic memory systems.

Method

Materials: sentence construction

All sentences were written by the authors to be suitable for both tasks used in the experiment: short-term repetition and long-term cued recall. For the cued-recall paradigm, the subject and verb of the sentence’s main clause served as the retrieval cue. Therefore, sentences were deliberately constructed to put these two words at the beginning of the sentence, slightly constraining the syntactic structures used. The sentences were intended to fall into two conditions, “abstract” and “concrete,” although independent raters were subsequently used to verify the distinction and select the final stimulus set. All of the concrete sentences contained rich visual imagery. Traditionally, “concreteness” is defined as the extent to which a concept can be experienced through the senses, whereas “imageability” is specific to the visual modality (Richardson, 1975). Although some authors dissociate these concepts, they are highly correlated, and subjective concreteness ratings are dominated by visual experience (Brysbaert, Warriner, & Kuperman, 2014). Thus, in the present experiments, we, like many other researchers (e.g., Reilly & Kean, 2007) considered the terms “concrete” and “imageable” to be interchangeable.

We composed 198 sentences ranging from 10 to 16 words in length (median = 13), corresponding to previous estimates of sentence memory span (Baddeley, Vallar, & Wilson, 1987; Brener, 1940). To confirm our intuitions about the distinction between abstract and concrete sentences, we recruited three raters naïve to the purpose of the experiment. The raters were instructed to rate the sentences according to imageability, on a scale from 1 (least) to 5 (most). See Appendix A for the instructions given to the raters. Preliminary analysis of the average ratings revealed a bimodal distribution, with abstract sentences consistently being rated 1–2 and concrete sentences being rated 3–5. We next eliminated all sentences with an average rating falling between 2 and 3, as well as all sentences with a standard deviation of the ratings greater than 1.145, such that ratings of [3, 5, 5] were acceptable, but [2, 5, 5] were not. Additional sentences were excluded on the basis of perceived semantic overlap and other subjective criteria, leaving 162 sentences that were considered acceptable for the experiment. Of these, the concrete sentences had a mean imageability rating of 4.54, and the abstract sentences of 1.17 [t(115) = 44.3, p < 10^{-15}].
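The two rating-based exclusion rules can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, "between 2 and 3" is read as strictly between, and the population standard deviation is assumed because it is the variant consistent with the worked example (the sample standard deviation of [3, 5, 5] is about 1.155, which would exceed the 1.145 cutoff).

```python
from statistics import mean, pstdev

SD_CUTOFF = 1.145  # threshold quoted in the text


def keep_sentence(ratings, sd_cutoff=SD_CUTOFF):
    """Return True if a sentence's imageability ratings pass both exclusion rules."""
    avg = mean(ratings)
    # Rule 1: drop sentences whose mean rating falls between 2 and 3
    # (ASSUMPTION: strictly between; the text does not say how ties are handled).
    if 2 < avg < 3:
        return False
    # Rule 2: drop sentences on which the raters disagree too much.
    # ASSUMPTION: population SD (pstdev), which reproduces the example in the text.
    return pstdev(ratings) <= sd_cutoff


def label(ratings):
    """Hypothetical helper: classify a retained sentence by its mean rating."""
    return "concrete" if mean(ratings) >= 3 else "abstract"


if __name__ == "__main__":
    for ratings in ([3, 5, 5], [2, 5, 5], [1, 1, 2], [2, 3, 3]):
        print(ratings, keep_sentence(ratings))
```

Under these assumptions, [3, 5, 5] is retained and [2, 5, 5] is rejected, matching the example given above.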

Of the remaining 162 sentences, we then selected four sets of 25 sentences, two sets each for concrete and abstract. The sets were matched on a variety of quantitative criteria computed using readily available tools from the computational-linguistics literature (details can be found in Appendix B). The full set of 100 selected sentences is given in Appendix D. The matched sets were randomly assigned to the counting and tapping conditions, counterbalanced across participants. Sentences were recorded by a theatrically trained female speaker at a natural speaking rate (175 words, or 281 syllables, per minute).

Experiment 1: short-term repetition and long-term cued recall with an arithmetic distraction task

Participants

Twenty right-handed young adult participants (mean age = 21.9, SD = 2.87; 12 females, eight males) were recruited from local universities. All had spoken English fluently from age 5 or earlier and reported no history of neurological conditions or speech/hearing difficulties. All procedures were approved by the Research Ethics Board of Baycrest Hospital, and participants were compensated financially.

Procedure

Participants were tested individually in a private room, seated in front of a computer monitor. The experiment was implemented using the Presentation software (Neurobehavioral Systems, Albany, CA, USA). The experimenter was present but did not interact with the participants except to instruct them on the task and verify compliance. Auditory sentence stimuli were presented through headphones with an attached microphone, adjusted by the participant to a comfortable listening level. All verbal responses were recorded for subsequent analysis.

Task 1: short-term repetition

Participants completed two brief practice sessions before the experiment. First, they practiced the counting task (described below) without the concurrent sentence recall task. All participants were able to comfortably count out loud, paced with the visual cues, after five to ten practice trials. Next, the participants practiced four trials of the full task, to ensure that they understood the instructions.

This experiment featured a 2×2 factorial design, for a total of four conditions. One factor was Sentence Type, abstract or concrete (see Materials: sentence construction), but this factor was not disclosed to the participants. The second factor was the Delay Period Task, either counting (a task involving AS) or finger tapping (a nondemanding control task). The trial structure is diagrammed in Fig. 1. The counting task involved counting backward by threes from a random number between 50 and 150 that was presented on the screen. At the start of each trial, participants heard a sentence, followed by a 2-s delay. On counting trials, a visual text cue then appeared with the initial random number, for example, “COUNT FROM 115.” This cue was displayed for 1,000 ms, followed by a 1,000-ms delay. Next, the visual cue “-3” appeared regularly (500 ms on, 1,000 ms off) a total of seven times over 10.5 s. Participants were instructed to say the next number in the series out loud, paced to the cue. For the tapping task, after an initial “TAP” cue (1,000 ms on, 1,000 ms off), the letter “T” appeared regularly on the screen 14 times (500 ms on, 250 ms off) over 10.5 s. Participants were instructed to tap their fingers on the table in front of them, paced to the cue. After the delay period task (tapping or counting), an empty word balloon appeared on the monitor, cueing the participant to recall the sentence that he or she had just heard. The instruction “Press space bar when finished” appeared below. Participants were instructed to attempt to reproduce the sentence verbatim. When finished, they pressed the space bar, at which point the visual cue disappeared, a 2-s delay occurred, and the next trial began. A maximum of 20 s was allowed for each verbal response, after which the visual cue would disappear and the next trial would begin, but participants almost always pressed the space bar well before the time limit. After every 20 trials, they were given the opportunity to rest briefly. 
In total, 100 trials were presented—25 in each condition, intermixed in a random order. Sentences were presented in the same fixed random order for each participant, but the assignments of sentences to the tapping and counting conditions were counterbalanced.
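The pacing of the two delay-period tasks can be checked with a short sketch. The timing values come from the description above; the function and constant names are hypothetical, and the phases are assumed to run back to back with no overlap.

```python
def cue_onsets(n_cues, on_ms, off_ms, start_ms=0):
    """Onset times (ms) of n regularly repeating cues, each shown on_ms then blanked off_ms."""
    period = on_ms + off_ms
    return [start_ms + i * period for i in range(n_cues)]


# Phases shared by both trial types (values taken from the text).
POST_SENTENCE_DELAY_MS = 2000   # gap after the sentence ends
INSTRUCTION_MS = 1000 + 1000    # "COUNT FROM n" or "TAP" cue: 1,000 ms on, 1,000 ms off

PACING = {
    "counting": dict(n_cues=7, on_ms=500, off_ms=1000),   # "-3" cue
    "tapping": dict(n_cues=14, on_ms=500, off_ms=250),    # "T" cue
}


def paced_duration_ms(task):
    """Length of the externally paced portion of the delay."""
    p = PACING[task]
    return p["n_cues"] * (p["on_ms"] + p["off_ms"])


def total_delay_ms(task):
    """ASSUMPTION: the three phases are strictly sequential."""
    return POST_SENTENCE_DELAY_MS + INSTRUCTION_MS + paced_duration_ms(task)


if __name__ == "__main__":
    for task in PACING:
        print(task, paced_duration_ms(task), total_delay_ms(task))
```

Both tasks yield a 10.5-s paced period, so the counting and tapping conditions are matched in duration even though the tapping cue repeats twice as often. Under the sequential-phase assumption, the interval between sentence offset and the recall prompt sums to 14.5 s.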

Fig. 1. Schematic of the trial structure for the short-term recall task. The top line shows the structure of tapping trials in both experiments. The middle and bottom lines show the articulatory suppression conditions in Experiment 1 (counting aloud backward by threes) and Experiment 2 (nonword repetition), respectively.

Task 2: long-term cued recall

Immediately after completion of Task 1, participants were told that they would undergo a second, separate memory test, consisting of cued sentence recall. Participants were then instructed on the second task and given another practice session of four trials. The conditions for the cued-recall test involved the same 2×2 design as the immediate-recall task, but the task procedure was exactly the same for all trials (i.e., there was no distinction between “tap” and “count” trials in the task demands for cued recall—only differences in the conditions under which the sentences had previously been recalled). The sentences were the same 100 used for Task 1, presented in randomized order. They were either abstract or concrete, and had previously been recalled in Task 1 following tapping or counting. At the start of each trial, a word balloon appeared on the screen containing two words as a retrieval cue. Again, the instruction “Press space bar when finished” appeared at the bottom of the screen. Participants were asked to attempt to recall verbatim the sentence from Task 1 that had contained the two cue words as subject and verb. Again, a maximum of 20 s was given for each response, but the experiment was otherwise self-paced.

Experiment 2: short-term repetition and long-term cued recall with simple AS

In Experiment 1, we had used a backward-counting task to disrupt phonological rehearsal during the delay period in the short-term recall task. This task is somewhat more difficult and attentionally demanding than more traditional articulatory distraction tasks, which typically involve repetition of a single word or phrase. We chose this task in order to more completely disrupt processes that might aid in retention of the phonological form of sentences over the delay period, including articulatory rehearsal and attentional refreshing or covert retrieval (Rose et al., 2014). However, we were concerned that the results may have been influenced by the suppression of additional cognitive processes, beyond phonological rehearsal, so we chose to replicate the experiment using a more traditional AS task as the more challenging distractor condition. The experimental design was otherwise identical to that of Experiment 1 (see Fig. 1). Fifteen participants (mean age = 21.5, SD = 4.53; nine females, six males) were recruited, meeting the same criteria as in Experiment 1. The finger-tapping condition was identical. On AS trials for Task 1 (short-term repetition), the nonword “BABATAKA” appeared on the screen for 1,000 ms, followed by a 1,000-ms delay. Next, the visual cue “B” appeared regularly (500 ms on, 1,000 ms off) a total of seven times over 10.5 s. Participants were instructed to pronounce “BABATAKA” out loud, paced to the cue. Thus, the visual pacing was the same as in the first experiment, but participants only had to repeat a simple four-syllable sequence instead of performing mental arithmetic. Task 2 (long-term cued recall) was identical to that in Experiment 1.

Analysis of recall accuracy

Verbatim

Full details of the analysis procedure are given in Appendix C, with examples. Verbal responses were manually transcribed. The procedures for scoring short-term repetition and long-term cued-recall trials were the same. We computed a strict “verbatim” score on the basis of recall of exact word forms, and also a more liberal “gist” score accounting for paraphrases of the sentence content.

The primary measure of interest was the proportion of words in the sentence recalled correctly verbatim, ranging from 0 to 1. For a word to be scored as correctly recalled, it had to be identical to the target word, including grammatical inflections such as tense and plurality. Credit was given for open-class words that were recalled correctly, even if their serial order was changed in the response. The transcribed response was then compared with the target sentence to produce a “condensed” transcription consisting only of the correctly recalled words. For both immediate and delayed recall, the score on each trial was computed as follows:

$$ \mathrm{Recall}=\#\mathrm{words}\;\mathrm{in}\;\mathrm{condensed}\;\mathrm{transcription}/\#\mathrm{words}\;\mathrm{in}\;\mathrm{original}\;\mathrm{target}\;\mathrm{sentence}. $$

For the long-term cued-recall task, the raw accuracy was not of primary interest. Failure to recall words in this task could result from two stages of forgetting: over the initial filled delay (also influenced by failure to encode initially), and during the intervening time between the short-term repetition trial and the subsequent cued-recall trial. In general, for long-term recall, participants would not be expected to correctly recall words of a sentence that they did not recall upon short-term repetition of the same sentence. Although such “reminiscences” do occur occasionally (Erdelyi, 2010), words that are forgotten during the immediate delay interval (the counting/tapping period) are unlikely to be recalled successfully in the subsequent cued-recall test. To assess forgetting that occurred after short-term repetition, we analyzed conditional delayed recall. This measure indicates how much of each sentence was forgotten between the immediate and delayed recall tests, controlling for performance at immediate recall. Conditional recall was calculated simply as follows:

$$ \mathrm{Conditional}\;\mathrm{recall}=\#\mathrm{words}\;\mathrm{in}\;\mathrm{delayed}\;\mathrm{condensed}\;\mathrm{transcription}/\#\mathrm{words}\;\mathrm{in}\;\mathrm{immediate}\;\mathrm{condensed}\;\mathrm{transcription}. $$

For the rare trials in which more words were recalled correctly in delayed than in immediate recall, we capped this value at 1.
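The two formulas above can be sketched in code. This is only an illustration: the actual scoring was done manually (Appendix C), the tokenizer and function names are hypothetical, order-free matching is applied to every word rather than to open-class words only, and the example sentence is invented rather than taken from the stimulus set.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def condensed_count(target, response):
    """Number of target words reproduced verbatim, ignoring serial order."""
    overlap = Counter(tokenize(target)) & Counter(tokenize(response))
    return sum(overlap.values())


def recall(target, response):
    """Proportion of the target sentence's words present in the response."""
    return condensed_count(target, response) / len(tokenize(target))


def conditional_recall(target, immediate, delayed):
    """Delayed recall relative to immediate recall, capped at 1."""
    n_immediate = condensed_count(target, immediate)
    if n_immediate == 0:
        return None  # ASSUMPTION: undefined when nothing was recalled immediately
    return min(1.0, condensed_count(target, delayed) / n_immediate)


if __name__ == "__main__":
    # Invented example sentence, not one of the experimental items.
    target = "The old sailor repaired the torn net beside the harbor wall"
    immediate = "The old sailor repaired the net beside the harbor"
    delayed = "The sailor repaired a net"
    print(recall(target, immediate))                       # 9 of 11 words
    print(conditional_recall(target, immediate, delayed))  # 4 of 9 words
```

The multiset intersection gives each target word credit at most as many times as it occurs in the target, so repeating a word in the response does not inflate the score.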

Accuracy scores were averaged within subjects by condition and were subjected to a subject-wise repeated measures analysis of variance (ANOVA; F₁), with Sentence Type and Distraction Condition as within-subjects factors. Accuracy scores were also averaged across subjects for each experimental sentence and subjected to an item-wise ANOVA (F₂; Clark, 1973), with Distraction Condition as a repeated factor and Sentence Type as a between-items factor.

Gist

To account for successful recall of information from the sentences that may have been missed by verbatim scoring, we also counted the number of “gist words” for which credit could be given. Acceptable gist words included synonyms or close semantic substitutions, morphological substitutions (e.g., verb tense changes), direct determiner substitutions, semantically close prepositional changes, and order changes, including active–passive voice alternations. Gist accuracy scores were analyzed statistically in the same manner as the verbatim scores.

Analysis of error types in recall

In addition to examining verbal responses for overall accuracy, we also measured the frequency of occurrence of the various kinds of errors. We defined six categories of errors: major order changes (correct words in the wrong order), unrelated additions, semantic substitutions, grammatical substitutions, phonological substitutions, and open-class omissions (open-class being defined as all words except closed-class words, which included prepositions, conjunctions, determiners, auxiliary verbs, and pronouns). See Appendix C for more details. The raw number of errors made by each participant in each category was submitted to a repeated measures ANOVA (F₁), with Sentence Type and Distraction Condition as within-subjects factors.

Assessment of interrater reliability

The majority of transcripts were analyzed by one rater for Experiment 1, and by two different raters for Experiment 2. Because some subjectivity was involved, particularly for the assignment of gist points, we assessed interrater reliability for both the verbatim and gist scoring procedures. Three raters were trained in the scoring procedure, and each independently scored transcripts for three participants chosen randomly from Experiment 2. To assess reliability, we computed the intraclass correlation coefficient (ICC) by comparing the three raters’ scores for each individual sentence in the short-term repetition and long-term recall tasks, pooling sentences across the three participants assessed, for a total of 300 sentences rated at both short-term and long-term recall. The specific algorithm used was “absolute two-way single measures ICC” (Hallgren, 2012). The ICC values were as follows: for short-term repetition, verbatim = .992, gist = .976; for raw long-term cued recall, verbatim = .980, gist = .960; and for conditional long-term cued recall, verbatim = .917, gist = .905. ICC values above .75 are generally considered “excellent” (Cicchetti, 1994).
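For readers unfamiliar with the statistic, the following sketch computes a two-way, absolute-agreement, single-measures ICC from the standard mean-squares formula. It is not the software used in the study, the function name is hypothetical, and the example scores are invented for illustration.

```python
def icc_a1(scores):
    """Two-way, absolute-agreement, single-measures ICC.

    scores: list of rows, one per rated item (sentence), each row holding
    one score per rater. Assumes a complete design with no missing ratings.
    """
    n = len(scores)       # number of items
    k = len(scores[0])    # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-items mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))


if __name__ == "__main__":
    # Invented scores: three sentences rated by three raters, with each
    # rater offset from the previous one by a constant 0.1.
    offset = [[0.2, 0.3, 0.4], [0.5, 0.6, 0.7], [0.8, 0.9, 1.0]]
    print(icc_a1(offset))
```

Because the coefficient indexes absolute agreement, a constant offset between raters lowers it (to 0.9 in the invented example) even though the raters rank the sentences identically; a consistency-type ICC would ignore such an offset.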

Results

Experiment 1: short-term repetition and long-term cued recall with arithmetic distraction task

Short-term repetition: accuracy

Both sentence type and distraction condition influenced participants’ ability to repeat sentences following a filled delay. Recall performance across the four conditions is plotted in Fig. 2a (verbatim) and 2b (gist). The patterns of results were essentially identical using both verbatim and gist measures of recall for both the subject-wise and item-wise analyses. We found a main effect of sentence type [verbatim: F₁(1, 19) = 35.33, p < .001, ηₚ² = .65; F₂(1, 98) = 13.01, p < .001, ηₚ² = .12; gist: F₁(1, 19) = 40.88, p < .001, ηₚ² = .68; F₂(1, 98) = 15.33, p < .001, ηₚ² = .14], with participants achieving higher accuracy for concrete than for abstract sentences. We also observed a main effect of distraction condition [verbatim: F₁(1, 19) = 71.09, p < .001, ηₚ² = .79; F₂(1, 98) = 202.77, p < .001, ηₚ² = .67; gist: F₁(1, 19) = 52.57, p < .001, ηₚ² = .73; F₂(1, 98) = 160.52, p < .001, ηₚ² = .62], with lower performance following counting than following finger tapping. Notably, the size of the effect for distraction condition was larger than that for sentence type; that is, in the counting condition, AS had a larger impact on performance than did sentence abstractness. Finally, a significant interaction was apparent between the two factors [verbatim: F₁(1, 19) = 7.59, p = .012, ηₚ² = .29; F₂(1, 98) = 6.64, p = .011, ηₚ² = .06; gist: F₁(1, 19) = 6.40, p = .020, ηₚ² = .25; F₂(1, 98) = 7.17, p = .008, ηₚ² = .07]. The form of the interaction was a stronger effect of distraction condition for abstract than for concrete sentences; that is, the AS caused by counting backward disrupted participants’ ability to recall all of the sentences, but more strongly for abstract ones. Recall of concrete sentences was more resilient to the detrimental effects of AS, suggesting greater support from short-term maintenance of semantic information.
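The partial eta squared values reported with each F statistic follow directly from the F value and its degrees of freedom, which gives a quick consistency check on the reported effect sizes. The sketch below uses the standard identity; the function name is hypothetical, and the inputs are the verbatim-score statistics quoted above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic.

    Uses the identity SS_effect / (SS_effect + SS_error)
    = (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # Verbatim scores, short-term repetition (values quoted in the text).
    reported = [
        ("sentence type, subject-wise", 35.33, 1, 19),
        ("sentence type, item-wise", 13.01, 1, 98),
        ("distraction, subject-wise", 71.09, 1, 19),
        ("distraction, item-wise", 202.77, 1, 98),
    ]
    for name, f_value, df1, df2 in reported:
        print(name, round(partial_eta_squared(f_value, df1, df2), 2))
```

Because the identity depends only on F and the two degrees of freedom, the same check applies to any other effect reported in this format.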

Fig. 2 Experiment 1: Recall accuracy. The proportions of words recalled accurately (# words recalled / # words in target sentence) in Task 1 in all four conditions are shown, for both the verbatim and gist scoring schemes. Error bars indicate one-sided 95% confidence intervals adjusted for repeated measures using the method of Morey (2008). (A–B) Recall accuracy in short-term (ST) repetition (Task 1). (C–D) Raw accuracy in long-term (LT) cued recall (Task 2). (E–F) Conditional (Cond.) recall in Task 2.
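For readers unfamiliar with the repeated measures adjustment cited in the caption, the following is a minimal sketch of the general approach: each participant's scores are normalized by removing that participant's mean and adding the grand mean, and the resulting standard errors are scaled by the Morey (2008) correction factor sqrt(M / (M − 1)), where M is the number of within-subject conditions. The data, the function name, and the critical t value are all hypothetical. The critical value is left to the caller because it depends on the sample size and on whether a one- or two-sided interval is wanted, and this sketch is not the authors' own script.

```python
import math
from statistics import mean, stdev


def morey_ci_halfwidths(scores, t_crit):
    """Half-widths of within-subject confidence intervals, one per condition.

    scores: one list per participant, holding one value per condition.
    t_crit: critical t value chosen by the caller for n - 1 degrees of freedom.
    """
    n_subjects = len(scores)
    n_conditions = len(scores[0])
    grand_mean = mean(v for row in scores for v in row)
    # Remove between-subject variability (Cousineau normalization).
    normalized = [[v - mean(row) + grand_mean for v in row] for row in scores]
    # Morey (2008) correction for the bias introduced by the normalization.
    correction = math.sqrt(n_conditions / (n_conditions - 1))
    halfwidths = []
    for j in range(n_conditions):
        column = [row[j] for row in normalized]
        sem = stdev(column) / math.sqrt(n_subjects)
        halfwidths.append(t_crit * sem * correction)
    return halfwidths


# Hypothetical proportions recalled by three participants in four conditions.
example_scores = [
    [0.90, 0.70, 0.80, 0.50],
    [0.85, 0.60, 0.75, 0.45],
    [0.95, 0.75, 0.70, 0.55],
]
# 4.303 is the two-sided 95% critical t for 2 degrees of freedom (illustrative).
example_halfwidths = morey_ci_halfwidths(example_scores, t_crit=4.303)
```

Because between-subject differences are removed before the standard errors are computed, participants who differ only by a constant offset contribute no width to these intervals.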

Long-term cued recall: raw accuracy

Raw accuracy for the long-term cued-recall task is plotted in Fig. 2c (verbatim) and 2d (gist). We observed a main effect of sentence type [verbatim: F 1(1, 19) = 100.13, p < .001, η p 2 = .84; F 2(1, 98) = 57.38, p < .001, η p 2 = .37; gist: F 1(1, 19) = 97.54, p < .001, η p 2 = .84; F 2(1, 98) = 53.63, p < .001, η p 2 = .35], with participants recalling more words from concrete than from abstract sentences. There was also a main effect of distraction condition [verbatim: F 1(1, 19) = 42.03, p < .001, η p 2 = .69; F 2(1, 98) = 15.67, p < .001, η p 2 = .14; gist: F 1(1, 19) = 29.19, p < .001, η p 2 = .61; F 2(1, 98) = 15.20, p < .001, η p 2 = .13], with fewer words being remembered from sentences previously recalled following counting. No significant interaction emerged between the two factors [verbatim: F 1(1, 19) = 1.04, p = .320, η p 2 = .05; F 2(1, 98) = 0.71, p = .40, η p 2 = .01; gist: F 1(1, 19) = 0.86, p = .36, η p 2 = .04; F 2(1, 98) = 0.86, p = .35, η p 2 = .01].

Long-term cued recall: conditional accuracy

Conditional accuracy for the long-term cued-recall task is plotted in Fig. 2e (verbatim) and 2f (gist). As in the immediate repetition task, we found a main effect of sentence type [verbatim: F 1(1, 19) = 76.68, p < .001, η p 2 = .80; F 2(1, 98) = 36.36, p < .001, η p 2 = .27; gist: F 1(1, 19) = 82.80, p < .001, η p 2 = .81; F 2(1, 98) = 39.18, p < .001, η p 2 = .29], with better recall for concrete sentences regardless of distraction condition. Not only were concrete sentences recalled better in short-term repetition, but they were also more resistant to being forgotten between immediate and long-term recall.

We also observed a main effect of distraction condition [verbatim: F 1(1, 19) = 10.88, p = .003, η p 2 = .36; F 2(1, 98) = 12.64, p < .001, η p 2 = .11; gist: F 1(1, 19) = 4.51, p = .047, η p 2 = .19; F 2(1, 98) = 5.23, p = .024, η p 2 = .05]. Interestingly, this effect was in the opposite direction of the effect upon immediate recall. Here, sentences that were previously recalled under conditions of AS were relatively more preserved upon long-term recall; that is, they were forgotten less. This effect was stronger for the abstract sentences, as reflected by a significant interaction effect [verbatim: F 1(1, 19) = 8.14, p = .010, η p 2 = .30; F 2(1, 98) = 4.27, p = .041, η p 2 = .04; gist: F 1(1, 19) = 6.17, p = .022, η p 2 = .25; F 2(1, 98) = 4.54, p = .036, η p 2 = .04]. Considering the two sentence types alone, the effect of distraction condition was significant within abstract sentences [subject-wise paired t test: verbatim, t(19) = 3.42, p = .003; gist, t(19) = 2.72, p = .014], but not within concrete sentences [verbatim, t(19) = 1.90, p = .073; gist, t(19) = 0.29, p = .77]. These results indicate that AS prior to short-term repetition resulted in relatively better preservation of sentence content between short-term and long-term recall for abstract sentences, but had a lesser impact on subsequent recall for concrete sentences.
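To make the distinction between raw and conditional accuracy concrete, the sketch below scores a single hypothetical ten-word sentence under both schemes. It assumes that conditional accuracy is the number of words recalled in Task 2 divided by the number recalled for the same sentence in Task 1. The exact scoring rule, the function names, and the word counts are assumptions made for illustration, not details taken from the authors' procedure.

```python
from typing import Optional


def raw_accuracy(n_recalled: int, n_target_words: int) -> float:
    """Proportion of the target sentence's words that were recalled."""
    return n_recalled / n_target_words


def conditional_accuracy(n_task2: int, n_task1: int) -> Optional[float]:
    """Task 2 recall relative to Task 1 recall for the same sentence.

    Returns None when nothing was recalled in Task 1, because the ratio
    is undefined in that case (an assumption of this sketch).
    """
    if n_task1 == 0:
        return None
    return n_task2 / n_task1


# Hypothetical sentence of 10 words, recalled equally well in Task 2 (4 words)
# after either distraction condition, but differently in Task 1.
tapping_raw = raw_accuracy(4, 10)          # Task 2 raw accuracy after tapping
as_raw = raw_accuracy(4, 10)               # Task 2 raw accuracy after AS
tapping_cond = conditional_accuracy(4, 9)  # 9 words recalled in Task 1
as_cond = conditional_accuracy(4, 6)       # 6 words recalled in Task 1
```

In this toy case the raw scores are equal, yet the conditional score is higher for the sentence first repeated after AS, because fewer of the words that survived Task 1 were lost before Task 2.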

Short-term repetition: error type analysis

The task conditions induced differential rates of errors within subjects, and the effects were different in the short-term and long-term recall tasks. For each participant, we counted the total number of errors in each category within each condition and subjected the totals to 2 × 2 repeated measures ANOVAs (subject-wise, not item-wise). Both major order errors and phonological errors were extremely rare and were not affected by any experimental factors, so they will not be discussed further. Figure 3 shows error rates across conditions in short-term repetition and long-term recall for three of the other error categories. Figure 3a shows open-class omissions during immediate recall. Omissions, defined as words that were neither recalled successfully nor substituted, were by far the most common form of error. We found a main effect of sentence type [F 1(1, 19) = 34.55, p < .001, η p 2 = .65] in which concrete sentences had fewer omissions, contributing to their higher overall accuracy. There was also a main effect of distraction condition [F 1(1, 19) = 51.92, p < .001, η p 2 = .73], since sentences recalled after counting had more omissions. No interaction was present [F 1(1, 19) = 2.02, p = .172, η p 2 = .10].
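The subject-wise 2 × 2 repeated measures ANOVA described above can be sketched using a standard equivalence: in a fully within-subject 2 × 2 design, each effect has one degree of freedom, so its F(1, n − 1) equals the squared one-sample t statistic of the corresponding per-participant contrast. The code below is an illustration under that equivalence with hypothetical error counts. The cell ordering and function names are assumptions, and a contrast with zero variance across participants would need separate handling.

```python
import math
from statistics import mean, stdev


def contrast_f(contrast_scores):
    """Return (F, error df) for a one-df within-subject contrast, with F = t**2."""
    n = len(contrast_scores)
    t = mean(contrast_scores) / (stdev(contrast_scores) / math.sqrt(n))
    return t * t, n - 1


def rm_anova_2x2(cells):
    """cells: per-participant tuples ordered as
    (concrete_tapping, concrete_as, abstract_tapping, abstract_as)."""
    sentence_type = [(a_t + a_s) - (c_t + c_s) for c_t, c_s, a_t, a_s in cells]
    distraction = [(c_s + a_s) - (c_t + a_t) for c_t, c_s, a_t, a_s in cells]
    interaction = [(a_s - a_t) - (c_s - c_t) for c_t, c_s, a_t, a_s in cells]
    return {
        "sentence_type": contrast_f(sentence_type),
        "distraction": contrast_f(distraction),
        "interaction": contrast_f(interaction),
    }


# Hypothetical omission counts for three participants (illustration only).
example_cells = [(2, 4, 3, 7), (1, 3, 3, 6), (2, 5, 4, 7)]
example_results = rm_anova_2x2(example_cells)
```

Each entry of the returned dictionary holds an F ratio and its error degrees of freedom, from which a p value and partial eta squared can be derived.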

Fig. 3 Experiment 1: Occurrence of error types. These panels show the average numbers of errors of distinct types made by participants in the two tasks (total errors / # participants) for each condition. (A) Open-class omissions in short-term repetition. (B) Unrelated additions in short-term repetition. (C) Semantic substitutions in short-term repetition. (D) Open-class omissions in long-term cued recall. (E) Unrelated additions in long-term cued recall. (F) Semantic substitutions in long-term cued recall.

The same pattern of error rates in short-term repetition (both main effects significant and no interaction) was also observed for unrelated additions [Fig. 3b; sentence type, F 1(1, 19) = 91.63, p < .001, η p 2 = .83; distraction condition, F 1(1, 19) = 27.56, p < .001, η p 2 = .59; interaction, F 1(1, 19) = 0.13, p = .721, η p 2 = .01]. For semantic substitutions (Fig. 3c), no main effect of sentence type emerged [F 1(1, 19) = 1.55, p = .228, η p 2 = .08], but we did find an effect of distraction condition [F 1(1, 19) = 31.62, p < .001, η p 2 = .62] and no interaction [F 1(1, 19) = 0.60, p = .428, η p 2 = .03]. For grammatical substitutions (data not shown), the pattern of a significant main effect for distraction condition only was also present [sentence type, F 1(1, 19) = 1.08, p = .312, η p 2 = .05; distraction condition, F 1(1, 19) = 38.93, p < .001, η p 2 = .67; interaction, F 1(1, 19) = 0.52, p = .478, η p 2 = .03].

In summary, the pattern of errors in short-term repetition for most categories resembled that of accuracy in general. The manipulation of distraction condition had a stronger effect, with AS increasing the error count in four categories, while sentence type had a more modest effect, in that abstract sentences induced more errors in only two categories. However, a more varied pattern of error occurrence was present in the long-term cued-recall test.

Long-term cued recall: error type analysis

In long-term cued recall, sentence type was a stronger modulator of error rates than was distraction condition, and the differences were not all in the same direction as they were in the immediate task. For open-class omissions (Fig. 3d), there were main effects of both sentence type [F 1(1, 19) = 108.13, p < .001, η p 2 = .85] and distraction condition [F 1(1, 19) = 32.96, p < .001, η p 2 = .63], but no interaction [F 1(1, 19) = 0.53, p = .474, η p 2 = .03]. For unrelated additions (Fig. 3e), we found a main effect of sentence type [F 1(1, 19) = 40.32, p < .001, η p 2 = .68] and a marginal effect of distraction condition [F 1(1, 19) = 4.25, p = .053, η p 2 = .18], with no interaction [F 1(1, 19) = 0.03, p = .865, η p 2 = .00]. For both of these error types, errors were more frequent for abstract sentences and for sentences that had previously been recalled following AS. These effects resembled those seen for short-term repetition.

In contrast, semantic substitutions (Fig. 3f) had an opposite pattern. These errors were more common for concrete sentences [sentence type, F 1(1, 19) = 26.20, p < .001, η p 2 = .58; distraction condition, F 1(1, 19) = 0.53, p = .478, η p 2 = .03; interaction, F 1(1, 19) = 0.57, p = .461, η p 2 = .03]. This pattern reflects the tendency for participants to maintain a gist meaning for concrete sentences more easily, which led to them generating words similar to the intended ones instead of omitting the target words altogether. This tendency was also reflected in the accuracy data, since greater accuracy was consistently seen for concrete sentences, especially in the gist criteria that gave credit for semantic substitutions. For grammatical substitutions (not shown), no significant effects were found [sentence type, F 1(1, 19) = 0.95, p = .343, η p 2 = .05; distraction condition, F 1(1, 19) = 3.34, p = .083, η p 2 = .15; interaction, F 1(1, 19) = 0.02, p = .884, η p 2 = .00].

Experiment 2: short-term repetition and long-term cued recall with simple AS

Short-term repetition: accuracy

Recall performance across the four conditions is plotted in Fig. 4a (verbatim) and 4b (gist). We found a main effect of sentence type [verbatim: F 1(1, 14) = 18.36, p < .001, η p 2 = .57; F 2(1, 98) = 6.38, p = .013, η p 2 = .06; gist: F 1(1, 14) = 16.47, p < .001, η p 2 = .54; F 2(1, 98) = 7.37, p = .008, η p 2 = .07] and a main effect of distraction condition [verbatim: F 1(1, 14) = 97.68, p < .001, η p 2 = .87; F 2(1, 98) = 137.00, p < .001, η p 2 = .58; gist: F 1(1, 14) = 68.59, p < .001, η p 2 = .83; F 2(1, 98) = 121.00, p < .001, η p 2 = .55], but no interaction between the two factors [verbatim: F 1(1, 14) = 0.17, p = .687, η p 2 = .01; F 2(1, 98) = 0.09, p = .760, η p 2 = .00; gist: F 1(1, 14) = 1.21, p = .291, η p 2 = .08; F 2(1, 98) = 0.92, p = .340, η p 2 = .01].

Fig. 4 Experiment 2: Recall accuracy. The interpretation of the panels is identical to that in Fig. 2.

These results indicated that, as in Experiment 1, short-term repetition performance was better for concrete than for abstract sentences, and for sentences repeated after finger tapping rather than after AS. However, unlike in Experiment 1, the two factors did not interact: The detrimental effect of AS was approximately the same for both abstract and concrete sentences, whereas in Experiment 1, abstract sentences had been more vulnerable to disruption by AS than were concrete sentences.

Long-term cued recall: raw accuracy

Raw accuracy for the long-term cued-recall task is plotted in Fig. 4c (verbatim) and 4d (gist). There was a main effect of sentence type [verbatim: F 1(1, 14) = 44.33, p < .001, η p 2 = .76; F 2(1, 98) = 41.02, p < .001, η p 2 = .30; gist: F 1(1, 14) = 45.54, p < .001, η p 2 = .76; F 2(1, 98) = 39.50, p < .001, η p 2 = .29], with concrete sentences being recalled more successfully than abstract ones. The effect of distraction condition was only marginal, reaching the p < .05 criterion for verbatim but not for gist scoring [verbatim: F 1(1, 14) = 5.15, p = .040, η p 2 = .27; F 2(1, 98) = 4.35, p = .040, η p 2 = .04; gist: F 1(1, 14) = 3.21, p = .095, η p 2 = .19; F 2(1, 98) = 2.92, p = .091, η p 2 = .03], reflecting a slight decrement for sentences that were initially recalled following AS. Similarly, the interaction effect was marginal [verbatim: F 1(1, 14) = 5.36, p = .036, η p 2 = .28; F 2(1, 98) = 2.62, p = .109, η p 2 = .03; gist: F 1(1, 14) = 5.36, p = .080, η p 2 = .20; F 2(1, 98) = 2.00, p = .160, η p 2 = .02].

Long-term cued recall: conditional accuracy

Conditional accuracy for the long-term cued-recall task is plotted in Fig. 4e (verbatim) and 4f (gist). We found a main effect of sentence type [verbatim: F 1(1, 14) = 20.37, p < .001, η p 2 = .59; F 2(1, 98) = 33.42, p < .001, η p 2 = .25; gist: F 1(1, 14) = 26.56, p < .001, η p 2 = .65; F 2(1, 98) = 36.16, p < .001, η p 2 = .27] and a main effect of distraction condition [verbatim: F 1(1, 14) = 38.06, p < .001, η p 2 = .73; F 2(1, 98) = 21.42, p < .001, η p 2 = .18; gist: F 1(1, 14) = 45.77, p < .001, η p 2 = .77; F 2(1, 98) = 10.89, p = .001, η p 2 = .10], but no interaction [verbatim: F 1(1, 14) = 0.82, p = .381, η p 2 = .06; F 2(1, 98) = 2.34, p = .129, η p 2 = .02; gist: F 1(1, 14) = 1.36, p = .264, η p 2 = .09; F 2(1, 98) = 3.56, p = .062, η p 2 = .04].

These results indicated that, as in Experiment 1, concrete sentences were better preserved (i.e., less forgotten) than abstract sentences between short-term repetition and long-term cued recall for the same sentences. Similarly, sentences repeated after AS in the short-term repetition task were better preserved than those repeated after finger tapping, when tested in long-term cued recall. Unlike in Experiment 1, the two factors did not interact: The better preservation of sentences initially repeated after AS was equivalent in size for both concrete and abstract sentences.

Short-term repetition: error type analysis

Figure 5a–c shows error rates across conditions in short-term repetition for three of the error categories. The overall pattern was similar to that in Experiment 1. We observed very few major order or phonological errors, and no significant effects of the experimental factors on these types. The remaining four error types were all more common following abstract sentences and following AS, and no interactions were present except for grammatical substitutions. The ANOVA results were as follows: Open-class omissions (Fig. 5a): sentence type, F 1(1, 14) = 13.19, p = .003, η p 2 = .49; distraction condition, F 1(1, 14) = 32.43, p < .001, η p 2 = .70; interaction, F 1(1, 14) = 0.02, p = .881, η p 2 = .00; unrelated additions (Fig. 5b): sentence type, F 1(1, 14) = 29.73, p < .001, η p 2 = .68; distraction condition, F 1(1, 14) = 18.13, p < .001, η p 2 = .56; interaction, F 1(1, 14) = 0.38, p = .549, η p 2 = .03; semantic substitutions (Fig. 5c): sentence type, F 1(1, 14) = 5.06, p = .041, η p 2 = .27; distraction condition, F 1(1, 14) = 16.46, p < .001, η p 2 = .54; interaction, F 1(1, 14) = 2.23, p = .158, η p 2 = .14; grammatical substitutions (data not shown): sentence type, F 1(1, 14) = 13.31, p = .003, η p 2 = .49; distraction condition, F 1(1, 14) = 14.03, p = .002, η p 2 = .50; interaction, F 1(1, 14) = 5.47, p = .035, η p 2 = .28.

Fig. 5 Experiment 2: Occurrence of error types. The interpretation of the panels is identical to that in Fig. 3.

Long-term cued recall: error type analysis

Figure 5d–f shows error rates in long-term cued recall for Experiment 2. Again, no significant effects emerged for major order and phonological errors (data not shown). Sentence type exerted an effect, with more errors in three categories (open-class omissions, unrelated additions, and grammatical substitutions) for abstract sentences. No significant effects of distraction condition were present, however; all error categories were equally frequent for sentences previously repeated following both finger tapping and AS. As in Experiment 1, semantic substitutions patterned differently from the other common error types: In this case, there were no significant effects of either factor on the occurrence of semantic substitutions (whereas in Exp. 1, semantic substitutions were more common for concrete sentences, and other error types were less common for concrete sentences).

The ANOVA results were as follows: Open-class omissions (Fig. 5d): sentence type, F 1(1, 14) = 39.52, p < .001, η p 2 = .74; distraction condition, F 1(1, 14) = 0.82, p = .381, η p 2 = .06; interaction, F 1(1, 14) = 4.95, p = .043, η p 2 = .26; unrelated additions (Fig. 5e): sentence type, F 1(1, 14) = 29.47, p < .001, η p 2 = .68; distraction condition, F 1(1, 14) = 3.41, p = .086, η p 2 = .20; interaction, F 1(1, 14) = 1.14, p = .304, η p 2 = .08; semantic substitutions (Fig. 5f): sentence type, F 1(1, 14) = 0.24, p = .634, η p 2 = .02; distraction condition, F 1(1, 14) = 0.25, p = .625, η p 2 = .02; interaction, F 1(1, 14) = 0.06, p = .806, η p 2 = .00; grammatical substitutions (data not shown): sentence type, F 1(1, 14) = 53.37, p < .001, η p 2 = .79; distraction condition, F 1(1, 14) = 0.93, p = .352, η p 2 = .06; interaction, F 1(1, 14) = 0.93, p = .352, η p 2 = .06.

Discussion

This study introduced a novel paradigm for assessing sentence recall at both short-term and long-term stages. We selectively interfered with subvocal rehearsal on some trials by using AS tasks. Experiment 1 was based on a relatively demanding backward-counting task, whereas Experiment 2 was based on a simpler nonword articulation task. The predictions for short-term repetition were straightforward and were confirmed for both experimental factors. Concrete sentences were repeated more accurately than abstract sentences, reflecting the greater support available from the semantic mechanisms that complement pSTM. The AS tasks reduced accuracy for short-term repetition, as compared to finger tapping, during the short-term delay period. This reduction in accuracy is attributable to interference with pSTM, which ordinarily supports relatively high performance in short-term sentence repetition in the absence of distraction. Finally, AS tasks caused a larger performance decrement for abstract than for concrete sentences in Experiment 1, reflecting the lesser semantic support available for these sentences and their increased reliance on pSTM to support accurate repetition. These effects illustrate that semantic mechanisms play a role in short-term repetition that complements pSTM, especially following a brief delay.

Of greater interest for the present study are the effects of the experimental manipulations on subsequent cued recall of the same sentences in Task 2. This experiment tested whether subvocal rehearsal during a short-term maintenance period promotes the retention of sentence content in LTM when it is tested later, relative to conditions in which rehearsal is prevented and in which semantic mechanisms may be selectively engaged in Task 1 to support repetition in the absence of rehearsal. According to the rehearsal advantage account, sentences initially repeated after AS could be subject to greater rates of forgetting between the two tasks. Alternatively, according to a levels-of-processing framework, the engagement of semantic mechanisms under conditions of AS could result in more effective encoding into LTM, thus protecting those sentences from forgetting. Critically, the two positions diverge in their predictions for conditional recall, or the number of words recalled in Task 2 relative to the number recalled for the same sentence in Task 1. Because we did not expect words forgotten in Task 1 to be recalled in Task 2, we focused on conditional recall as a measure of forgetting after Task 1.

The results clearly support the levels-of-processing prediction. In both experiments, conditional recall was better for sentences that were initially recalled following a challenging distractor task that interfered with subvocal rehearsal. That is, AS before short-term repetition led to sentences being forgotten less between their first recall attempt (short-term repetition, Task 1) and their second recall attempt (long-term cued recall, Task 2). Additionally, concrete sentences were remembered better than abstract sentences at both stages: They were recalled more accurately in Task 1, and they were forgotten less in Task 2.

These results show a dissociation between the two experimental factors. Whereas sentence concreteness had beneficial effects on recall performance in both short-term repetition (Task 1) and conditional long-term cued recall (Task 2), the effects of distraction condition were reversed between the two tasks: AS resulted in sentences being forgotten more in Task 1, but being forgotten less between the two tasks. Despite this dissociation, all of these effects are plausibly attributable to the engagement of semantic mechanisms in short-term repetition that complement pSTM. Concrete sentences can more easily be maintained in memory through semantic resources, including visual imagery and schema construction. Although pSTM supports verbatim maintenance of sentences under ideal conditions (without distraction), retrieval of sentences’ meanings from cSTM or LTM can support the regeneration, or “redintegration,” of the phonological form, resulting in relatively good short-term recall following distraction.

In the present study, we hypothesized that the engagement of semantic mechanisms in the retrieval of sentence content would be greater when pSTM was blocked by AS, resulting in a benefit for LTM. The reversal of the distraction effect between Tasks 1 and 2 supports this hypothesis. Although blocking pSTM reduces performance in short-term repetition, the redintegration of the sentence following the distraction results in deeper processing of the sentence’s meaning, such that it is less likely to have been forgotten when it is cued in Task 2. This result may be somewhat surprising, since rehearsal of a sentence’s form in pSTM would ordinarily be expected to have a beneficial effect on subsequent memory relative to the absence of rehearsal. However, the present results suggest that phonological rehearsal, despite its effectiveness for short-term repetition, may be a “shallower” form of maintenance that does not contribute much to LTM encoding. Semantic elaboration results in better LTM encoding, and semantic elaboration is enhanced when pSTM is blocked but the task demands nonetheless require the sentence’s retrieval for short-term repetition.

One potential concern about these findings is that a long-term advantage for the AS condition was found only for conditional delayed recall, and not for raw delayed recall. We did not expect to find an effect for raw recall, because words that are already forgotten in Task 1 are unlikely to be recalled in Task 2. However, one might argue that the beneficial effect of AS on Task 2 is an artifact of the conditional-recall scoring procedure. Since accuracy on Task 2 was compared with accuracy for the same sentence in Task 1, any manipulation that decreased performance in Task 1 (such as AS) might be expected to increase the conditional recall score on Task 2 even if it actually had no effect on the degree to which the sentence was encoded into LTM. However, the present results are unlikely to be attributable to such an effect alone, because the reversal was seen only for the effect of distraction condition. Sentence abstractness resulted in decreased accuracy in short-term repetition and also decreased accuracy in conditional recall. That is, abstract sentences were forgotten more than concrete ones during the brief delay period before their first recall attempt, and then further forgotten before their second recall attempt. This is the opposite of the pattern seen for AS, suggesting that decreased conditional recall is not an inevitable consequence of increased short-term recall. Rather, both effects are more easily explained by the increased engagement of semantic resources for short-term repetition when pSTM is blocked. This finding indicates that high performance in immediate recall, when driven by phonological rehearsal, is not necessarily predictive of good encoding into LTM, which is consistent with the popular notion that “rote” rehearsal may not be the most effective technique for the memorization of verbal information.

Besides the findings about accuracy, the patterns of recall errors observed in these experiments support the idea that semantic engagement in short-term repetition supports subsequent accuracy in long-term cued recall. In short-term recall, all error types (except those that were completely unaffected by the experimental manipulations) were more frequent in conditions that reduced overall accuracy: counting versus tapping and abstract versus concrete sentences. In long-term recall, omissions and unrelated additions were more frequent for abstract sentences (patterning with overall accuracy), but semantic substitutions showed a different pattern. In Experiment 1, the effect of sentence type was actually reversed for semantic substitutions relative to other error types: They were significantly more frequent for concrete than for abstract sentences. In Experiment 2, the effect did not quite reverse as compared to other error types, but it was neutralized: Semantic substitution errors were equally frequent for both abstract and concrete sentences.

Because concrete sentences are thought to be more amenable to semantic encoding (and therefore less dependent on phonological rehearsal to support immediate recall), redintegration of a sentence’s phonological form on the basis of its meaning is likely to result in semantic substitutions. At the delayed stage, the phonological form is redintegrated solely from the retained meaning, and thus semantic effects dominate: an increase in semantic substitutions for better-remembered sentences, and a decrease in other error types. In short-term repetition, in contrast, all error types are decreased for concrete sentences. This suggests that the phonological trace still contributes to recall performance in short-term repetition, even in the face of AS, although semantic encoding does contribute.

Differences related to the type of AS

In the two experiments, we used different tasks for AS. In Experiment 1, we used a fairly demanding cognitive task, counting backward by threes. We chose this task because we not only wanted to block subvocal rehearsal; we also wanted to prevent participants from covertly refreshing the phonological form of the sentences between utterances, since an insufficiently demanding distractor task might allow for the latter (Rose et al., 2014). However, this task might also have interfered with other cognitive resources that could contribute to memory encoding, including attention and executive processes. Thus, the beneficial effects of distraction on conditional long-term recall seen in Experiment 1 might be expected to be eliminated if a simpler AS task were used, because refreshing might suffice to keep the phonological form of sentences in active memory. Alternatively, the effects might be enhanced if the relative preservation of attention and executive functions makes a strong contribution to encoding in this case. In fact, the results of both experiments were very similar: In both experiments, AS reduced accuracy in short-term repetition but improved conditional accuracy in long-term cued recall. One notable difference between the two experiments was the interaction between imageability and distraction condition. Significant interactions were apparent for both tasks in Experiment 1: Abstract sentences suffered more than concrete sentences from AS on short-term repetition, but benefited more from it on delayed recall, as predicted. In Experiment 2, however, these interactions were not present; AS reduced accuracy equivalently for both kinds of sentences on short-term repetition, and equivalently improved accuracy for long-term cued recall. An additional difference between the two experiments was the relative frequencies of semantic substitutions for concrete versus abstract sentences in delayed recall, as we noted in the previous section.

The similar results seen in both experiments suggest that traditional AS is sufficient to disrupt pSTM and cause participants to rely more on semantic resources to support recall in short-term repetition. The additional cognitive load caused by verbal calculation in Experiment 1 did not appear to interfere with the engagement of those semantic resources. In fact, it seemed to enhance the effects, bringing about the expected interaction between imageability and distraction condition in Experiment 1, and reversing the frequency of semantic substitutions between the two tasks. These enhanced effects may be attributable to a more demanding distraction task blocking covert retrieval of a sentence’s phonological form, which participants may have been able to do occasionally between repetitions of the nonword “Babataka” in Experiment 2.

Implications for neuropsychology

The findings in this study suggest that rehearsal of verbal information in the phonological loop is not necessarily optimal for encoding information into LTM. Although unrestricted pSTM supports very high performance in short-term repetition, we have found that interfering with pSTM through AS tasks results in less forgetting of sentence content between the short-term repetition and long-term cued-recall tests.

The precise nature of the mechanisms that supplement pSTM for sentence repetition remains controversial. Although some behavioral data do suggest that there are dedicated mechanisms for the short-term maintenance of semantic information (e.g., Romani, McAlpine, & Martin, 2008; Shivde & Anderson, 2011), some theorists maintain that short-term maintenance of verbal information is attributable to the encoding and retrieval of information into LTM on a short time scale (see the introduction). The neuropsychological implication of such a view would be that both short- and long-term verbal memory depend on the same brain structures, most likely the medial temporal lobe (MTL). On the other hand, dedicated maintenance of semantic information is more likely to be related to brain activity in cortical areas previously linked to semantic processing (Binder, Desai, Graves, & Conant, 2009), especially the anterior temporal lobes (Patterson, Nestor, & Rogers, 2007).

Some evidence for a distinct semantic STM buffer has come from neuropsychological cases demonstrating a double dissociation between deficits in phonological and semantic STM (Belleville, Caza, & Peretz, 2003; N. Martin & Saffran, 1997; R. C. Martin & He, 2004; R. C. Martin, Shelton, & Yaffee, 1994). Although these findings support a strong dissociation between phonological and semantic resources for sentence repetition, they are compatible with a critical role for the MTL memory systems in supporting the use of semantic information in STM for sentence content. However, more recent neuropsychological evidence from MTL amnesic patients suggests that semantic support in verbal STM is also independent from MTL-mediated episodic memory function (Race, Palombo, Cadden, Burke, & Verfaellie, 2015; Rose, Olsen, Craik, & Rosenbaum, 2012). These findings suggest that neocortical regions distinct from those underlying pSTM may support specialized resources that are particularly important for semantic STM. Characterizing the relevant brain networks remains an important task for future studies.

Implications for neuroimaging

A few neuroimaging studies have explored the brain mechanisms supporting semantic STM. Shivde and Thompson-Schill (2004) specifically implicated the bilateral inferior frontal and left middle temporal gyri in semantic maintenance, whereas Hamilton, Martin, and Burton (2009) had partially overlapping findings implicating the left inferior and middle frontal gyri. These findings of cortical activation are consistent with the existence of specialized mechanisms for semantic maintenance, although they may also be attributable to increased semantic processing rather than to maintenance per se. Recently, Rose, Craik, and Buchsbaum (2015) found that short-term recall accuracy for single words was predicted by the level of activity in the left inferior frontal gyrus at encoding, in the left anterior temporal lobe during maintenance of the word in a distractor-filled delay period, and in the left hippocampus following distraction (i.e., during the recall phase itself). Critically, the recruitment of these areas that are commonly involved in semantic processing and episodic recall depended on the nature of the distraction; rehearsal-filled delays recruited a very different network of frontal, temporal, and parietal areas associated with the default-mode network. To investigate the question of whether semantic maintenance is driven by the engagement of LTM systems operating over short time scales, or by dedicated short-term mechanisms, it would be desirable to examine both short-term and long-term sentence recall using neuroimaging techniques.

The cued sentence recall paradigm developed here may also prove useful in neuroimaging studies of verbal memory, particularly regarding the distinctions between pSTM, cSTM, and LTM. Despite the hundreds of fMRI studies that have been conducted in the past two decades, a major debate persists between single-mechanism and dual-mechanism accounts of STM and LTM (Ranganath & Blumenfeld, 2005; Surprenant & Neath, 2008). Some studies have attempted to address the question by administering both working memory and long-term encoding or retrieval tasks to the same participants, and have found both overlapping and distinct activations (Braver et al., 2001; Cabeza, Dolcos, Graham, & Nyberg, 2002).

One of the more powerful techniques for assessing neural correlates of the encoding of information into LTM is the subsequent memory technique. Stimuli are initially presented for encoding (either intentional or incidental) during neural data collection, and then either recognition or recall for the same stimuli is assessed afterward. Neural activity at encoding is then compared between stimuli that were subsequently remembered and those that were subsequently forgotten. Since its initial use in event-related fMRI (Brewer, Zhao, Desmond, Glover, & Gabrieli, 1998; Wagner, Koutstaal, & Schacter, 1999), the paradigm has been used in dozens of fMRI experiments. A meta-analytic review (Kim, 2011) revealed that activation predicting subsequent memory for verbal stimuli tended to be highly left-lateralized, occurring most commonly in the medial temporal lobe, inferior temporal gyrus, and inferior frontal gyrus, all regions that are intimately linked with semantic processing. Negative effects (activity predicting subsequent forgetting) occurred in default-mode structures.
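As a purely illustrative sketch of the contrast at the core of the subsequent memory technique, the following Python fragment computes the difference in mean encoding activity between later-remembered and later-forgotten trials for a single region. The function name, the per-trial values, and the memory outcomes are hypothetical and are not taken from any of the studies cited; a real fMRI analysis would estimate this contrast within a general linear model and test it statistically across participants.

```python
from statistics import mean


def subsequent_memory_effect(activity, remembered):
    """Mean encoding activity for later-remembered trials minus
    mean encoding activity for later-forgotten trials.

    A positive value means the region was more active for items that
    were later remembered; a negative value means it was more active
    for items that were later forgotten.
    """
    hits = [a for a, r in zip(activity, remembered) if r]
    misses = [a for a, r in zip(activity, remembered) if not r]
    if not hits or not misses:
        raise ValueError("need at least one remembered and one forgotten trial")
    return mean(hits) - mean(misses)


# Hypothetical per-trial encoding responses (arbitrary units) for one region,
# paired with whether each item was recalled on the later memory test.
activity = [1.2, 0.4, 0.9, 0.3, 1.1, 0.5]
remembered = [True, False, True, False, True, False]

effect = subsequent_memory_effect(activity, remembered)
print(f"subsequent-memory effect: {effect:+.2f}")
```

With these made-up values the effect is positive, the pattern typically reported for left-lateralized regions linked with semantic processing; reversing the outcome labels would yield a negative effect of the kind reported for default-mode structures.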

Despite the power of this technique to elucidate the underpinnings of LTM encoding, we know of no published studies that have used it to evaluate the relationships between verbal STM and LTM. This may be due to the lack of an appropriate task. The studies reviewed have tended to use single words or word pairs for encoding, and either recognition or free recall for assessing memory performance. Single words or word pairs are not very challenging as stimuli for STM, and combining them into a longer list makes it difficult to assess subsequent memory for any particular item in a neuroimaging study of STM. We suggest that the sentence is a natural unit for assessing both STM and LTM, and the cued-recall paradigm developed here could be ideal for neuroimaging investigations of the relationship between memory on these two time scales. Furthermore, we expect that such studies would not simply replicate prior subsequent memory studies, because they would allow researchers to manipulate effects of phonological rehearsal and AS. In the present study, we have seen a competitive interaction between phonological rehearsal and semantic processing, such that interfering with phonological rehearsal resulted in less forgetting of sentences at the delayed stage. Thus, one might predict that areas implementing phonological rehearsal would show a negative subsequent-memory effect: they would be more active during STM for sentences that are subsequently forgotten, reflecting their role in relatively shallow “rote” memory that does not involve deep processing of semantic content. Conversely, areas involved in semantic maintenance may show a positive subsequent-memory effect. Future experiments examining both STM and LTM for sentence stimuli may therefore be able to dissociate the distinct roles of specific brain regions in the phonological and semantic short-term memory of verbal information.