The Atkinson and Shiffrin (1968) model of memory has influenced almost all the research questions we have attempted to address regarding memory. The model is probably most well known in the UK as a highly influential early model that proposed a structural distinction between short-term and long-term memory stores (STS and LTS, respectively). Even 50 years on from its publication, it pleasingly combines the intuitiveness of a good psychological theory with the explicit precision of a mathematically-defined model.

Within the Atkinson and Shiffrin (1968, 1971) model, a distinction is made between structural and control processes. The structural distinction between STS and LTS is most commonly evidenced by the serial position curve in immediate free recall (IFR; e.g., Glanzer, 1972; Glanzer & Cunitz, 1966; Murdock, 1962; Postman & Phillips, 1965), a task in which participants are presented with a list of words, one at a time, and are asked to recall as many of the list items as possible, in any order they like. In this task, participants tend to recall more words from the start and end of the list than from the middle of the list (recall advantages known as the primacy effect and the recency effect, respectively), and the Atkinson and Shiffrin (1968, 1971) model proposes that the recency effect reflects participants directly outputting the contents of the STS at test, whereas the primacy effect reflects the greater number of rehearsals afforded to the early list items (e.g., Rundus, 1971), resulting in stronger LTS traces. Consistent with this dual-store interpretation of the serial position curve, variables such as the list length (Murdock, 1962), presentation rate (Glanzer & Cunitz, 1966), and word frequency (Sumby, 1963) are assumed to selectively affect the LTS component (the primacy and middle portions) of the serial position curve, whereas other variables, such as the effect of a filled delay (Glanzer & Cunitz, 1966; Postman & Phillips, 1965), are assumed to selectively affect the STS component (the recency portion) of the serial position curve (for a review, see Glanzer, 1972; but for an alternative interpretation, see Tan & Ward, 2000; Ward, 2002).

The present article focuses on the control processes in immediate memory (or STS). Atkinson and Shiffrin (1968, 1971) proposed that participants can flexibly allocate some STS capacity to rehearsal and some to other control processes, including hypothesis testing, recoding, organizing, chunking, and grouping. Although participants might under some circumstances seek to recode or reorganize the list items, Atkinson and Shiffrin (1968, 1971) argued that it would often be advantageous for participants to devote their resources to maximizing the capacity of their rehearsal buffer, and they hypothesized that participants can exert some control over how the STS buffer is used. When the participants’ task is to maintain (for later recall) every item within a short list of items (such as in immediate serial recall, ISR), they argued that participants might make use of ordered rehearsal, which would be the optimal strategy to lengthen the stay of all the items in STS, by refreshing and offsetting decaying items in turn. By contrast, when participants must try to remember a greater number of items than the capacity of STS, such as is often the case in IFR of longer lists, the authors hypothesized that participants might engage in a different strategy, of replacing one of the items being rehearsed (those that could be said to be within the rehearsal buffer) with a new input, so that every list item would receive at least some rehearsals.

Atkinson and Shiffrin (1968) discussed a number of different possible rules for displacing old items with new items. Items within the buffer might be displaced at random (as was later assumed by, e.g., Raaijmakers & Shiffrin, 1981); participants might displace items that have resided in the buffer for longer durations, rather than more recent entries (as was later assumed by Gillund & Shiffrin, 1984; Lehman & Malmberg, 2013); or items within the buffer might be displaced or intentionally dropped from the buffer when participants decide that an item is no longer needed (as was later assumed by Lehman & Malmberg, 2009, 2011, 2013). In some circumstances, Atkinson and Shiffrin (1968) argued that it might even be preferable for presented list items not to be incorporated into the buffer.
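Two of the displacement rules described above can be illustrated with a minimal sketch. This is not the Atkinson and Shiffrin (1968) model itself: the function names, the fixed buffer capacity, and the `rule` labels are hypothetical choices made for illustration, and the third rule (intentionally dropping items that are no longer needed) is not modeled.

```python
import random
from typing import List, Optional


def present_item(buffer: List[str], item: str, capacity: int,
                 rule: str, rng: random.Random) -> Optional[str]:
    """Add `item` to the rehearsal buffer and return any displaced item.

    `buffer` is kept in order of entry, so index 0 is the oldest resident.
    `rule` is a hypothetical label: "random" or "oldest".
    """
    displaced = None
    if len(buffer) >= capacity:
        if rule == "random":
            # Every resident item is equally likely to be displaced.
            displaced = buffer.pop(rng.randrange(len(buffer)))
        elif rule == "oldest":
            # The item that has resided in the buffer longest is displaced.
            displaced = buffer.pop(0)
        else:
            raise ValueError(f"unknown rule: {rule}")
    buffer.append(item)
    return displaced


def final_buffer(items, capacity, rule, seed=0):
    """Return the buffer contents after presenting every item in `items`."""
    rng = random.Random(seed)
    buffer: List[str] = []
    for item in items:
        present_item(buffer, item, capacity, rule, rng)
    return buffer


# Six-item list, assumed buffer capacity of four items.
words = ["window", "penny", "jacket", "kitten", "garden", "bottle"]
print(final_buffer(words, capacity=4, rule="oldest"))
print(final_buffer(words, capacity=4, rule="random"))
```

Under the "oldest" rule the buffer always ends up holding the last four items, whereas under the "random" rule early list items can survive to the end of the list on some trials.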

For the purposes of the present article, we argue that these control processes in STS are important because they allow participants to vary the order of rehearsals at encoding, such that the contents of STS are most consistent with the output requirements of different tasks. If one wanted to try to recall all the items in a short list, one might try to rehearse and recall in order, starting with the first list item, but if presented with a longer list, one might distribute rehearsals more evenly across the list items by allowing each new item to enter the buffer, thereby displacing a previously rehearsed item. Although it was acknowledged that participants could perform a variety of recall tasks that might necessitate different output orders (e.g., free recall, recall in a forward or backward direction, and serial probed recall), the degree of flexibility and the degree of control that participants may exert at retrieval over the output order from STS were not formally specified in the Atkinson and Shiffrin (1968) model. Although there have been considerable data and theorizing about the output orders and retrieval processes in free recall from longer lists (e.g., Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann & Usher, 2005; Howard & Kahana, 1999; Lehman & Malmberg, 2013; Raaijmakers & Shiffrin, 1981), and some dual-store accounts continue to be used to explain a wide range of different short-term and working memory tasks (e.g., Unsworth & Engle, 2007), an outstanding issue remains the extent to which participants can accurately retrieve items flexibly from STS in any order they like (Lewandowsky, Brown, & Thomas, 2009).

In recent years, much of our own research has examined how participants’ order of recall varies with list length in a range of immediate memory tasks (e.g., Ward, Tan, & Grenfell-Essam, 2010). When participants are presented with a short list of, say, four items and are asked to recall as many as they can, in any order, they show a strong tendency to initiate IFR in an “ISR-like” manner. That is, when presented with window, penny, jacket, kitten, they tend to initiate recall with “window” and often then continue in a forward order, even though the free recall instructions do not necessitate serial recall. In addition, when participants are asked to recall as many items as they can from a long list, they show an increased tendency to initiate recall with one of the last few items (in an “IFR-like” manner), even when the task is ISR and the experimental instructions are to try to initiate recall with the first-presented word.

This tendency to initiate recall of short lists with the first item is remarkably robust. The finding is obtained under articulatory suppression and at fast presentation rates (Grenfell-Essam, Ward, & Tan, 2013), suggesting that it is not due to rehearsal. It is unaffected by the presentation modality, even though the modality influences the serial position curves (Grenfell-Essam, Ward, & Tan, 2017). It is present, although somewhat attenuated, with free recall under continual-distractor conditions and delayed free-recall conditions (Spurgeon, Ward, & Matthews, 2014b), suggesting that it is not entirely due to the output of a short-term buffer store. It is also present with visual presentation under articulatory suppression (Spurgeon, Ward, & Matthews, 2014a), suggesting that it is not due to the proposed function of the phonological store (Baddeley, 1986, 2000); and it is even present with visual–spatial dots as stimuli (Cortis, Dent, Kennett, & Ward, 2015; Cortis Mack, Dent, & Ward, 2018) and with tactile stimulations to the face (Cortis et al., 2015), suggesting that it is not the result of an exclusively verbal mechanism. It should be noted that the tendency to initiate recall with the first list item was attenuated under certain of these conditions, but it was nonetheless always the modal tendency in conditions with short lists.

The Ward et al. (2010) findings have been replicated and extended by Lehman and Malmberg (2013), who varied the list length in IFR by presenting series of single items and series of pairs of items. They proposed that the first item is most closely associated with the temporal context of the list and that it is recalled first with a probability that is inversely proportional to the list length. When the first item is not recalled, participants recall from the buffer, and the contents of the buffer most commonly contain recency items. The model correctly assumes that participants’ modal response is to initiate recall with the very last list item when the list is long, but it also correctly predicts a novel finding: that participants tend to initiate recall of very long lists with the penultimate list item when the items are presented in pairs. When participants were presented with series of pairs of items, participants tended to initiate recall with the left-hand item of the most recent pair.

The Ward et al. (2010) findings have also been modeled by Farrell (2012), who argued that participants segment long sequences of lists into multiple groups of items. Farrell argued that participants tend to initiate recall with either the first list item or the first item in the most recent (or current) group. He assumed that the segmented groups were of varying sizes, and he was able to successfully model the tendency to initiate short lists of words with the first list item—with short lists, the first list item is also the first item of the most recent group. He was also able to successfully model the tendency to initiate long lists of words with one of the last few items (the variable-sized group structure meant that recall initiated most often with the last item, but could also be initiated with other recency items), and he showed that where a participant initiates a trial affects the subsequent items that the participant recalls (as participants seek to continue their recall with successive items within the recalled group). Consistent with the grouping account, Spurgeon, Ward, Matthews, and Farrell (2015) showed that when the grouping structure in IFR and ISR was regularized by introducing consistent temporal gaps after every third item, participants consistently initiated recall of long lists with the first item of the most recent (or current) group.

Our present research most closely follows the recent work of Tan, Ward, Paulauskaite, and Markou (2016), who reported the only manipulation to date that has shifted the modal tendency to initiate recall of a short list of words away from the first list item. In their experiments, Tan et al. presented participants with short lists of four, five, or six words, and in different blocks of trials, they required participants to recall one, two, three, or all the words on the list. Just as had been anticipated almost 50 years earlier, by Atkinson and Shiffrin (1968), when participants were presented with a short list of words and told in advance to recall all the words, they demonstrated the now well-established tendency to initiate recall with the first item (Ward et al., 2010). However, when they were presented with a short list of words and were told in advance to recall only a single item, they typically showed a different first response, and tended to recall the last item instead. Moreover, participants also showed a slight preference to initiate recall of two items with the penultimate list item. The Tan et al. findings suggest that participants can exert considerable (but not total) control over the output order in immediate recall, and the study provides an informative method to examine participants’ preferred strategies for recalling different numbers of items under a variety of task instructions.

However, it is not possible to determine from the Tan et al. (2016) findings whether the change in output order based on the different numbers of items to be recalled reflected different encoding (or rehearsal) strategies during the presentation of the list, or whether participants could adopt a range of different retrieval strategies and could flexibly select strategies to recall different items, depending on the number of items to be recalled. This is because in Tan et al.’s study, the number of items to be recalled was always precued: Participants always knew the number of words to be recalled in advance of list presentation, and so were free to selectively encode lists of items in different ways, depending on the number of items to be recalled.

There is a growing body of evidence that participants can use different retrieval strategies when recalling lists of six to eight items. At these list lengths, participants tend to initiate recall with recency items when they are free to do so, but can initiate recall with the first list item when this is required. For example, Bhatarah, Ward, and Tan (2008) showed that a group of participants who were precued to perform IFR and a group of participants who were precued to perform ISR produced serial position curves that were characteristic of their respective tasks: Participants precued to perform IFR produced U-shaped serial position curves, and participants who were precued to perform ISR produced serial position curves with extended primacy effects. Critically, a third group of participants encoded the lists not knowing which of the two tasks they were to perform, and were only told the task immediately prior to recall. When this third group were postcued to perform IFR, they performed like the precued IFR group, whereas when this third group were postcued to perform ISR, they performed like the precued ISR group. (Other examples of flexibility in recall in different immediate memory tasks include, e.g., Bhatarah, Ward, Smith, & Hayes, 2009; Grenfell-Essam & Ward, 2012; Lewandowsky et al., 2009; and Tan & Ward, 2007.)

These studies that manipulated test expectancy showed that participants can exert at least some control over their output order at retrieval, such that they can initiate recall with the first list item when this is required, or initiate recall with one of the last few items when they are free to do so. However, the extent to which participants can exert control at retrieval remains uncertain. Control may be limited to the choice of two retrieval strategies (privileged access to the first list item or privileged access to one of the last list items), or participants may be able to exert far greater flexibility and control in accessing and ordering the list items. Moreover, it remains uncertain whether the strategy changes observed by the Tan et al. (2016) study, based on the number of items to be recalled, would be replicated under postcued conditions. If participants’ preferred recall orders were affected by the number of words to be recalled (as in Tan et al., 2016) even when this information was provided after the words had been encoded, this would indicate that participants possessed a degree of flexibility in their choice of retrieval strategies and could choose to use different retrieval strategies when recalling different numbers of items. By contrast, if participants no longer changed their output order when asked to recall different numbers of words from a list, this would suggest that the Tan et al. findings should be interpreted as highlighting the importance of different encoding strategies or different encoding and storage control processes in determining recall order.

Experiment 1

In Experiment 1, participants were presented with short lists of four, five, or six words for IFR. Short lists of words were used because these list lengths are typically associated with many short-term memory tasks. Depending on the proposed capacity of a hypothetical STS buffer, the addition of a fourth, fifth, or sixth item might be expected to displace items from the STS buffer. Following the last word of each list, participants were presented with a screen informing them of the number of words contained in the list they had just seen and the number of words from the list that they should recall. These two factors (list length and recall requirement) were randomized. Participants recalled the required number of words by writing them down on a response grid and saying the recalled words aloud as they wrote them down.

The advantage of randomizing list lengths 4–6 is that, while participants would be able to encode with certainty the list position based on the start of the list, they would not be able to accurately encode the list position with respect to the end of the list (at least for Serial Positions 1–5). Given a list of uncertain length, n, there would be convincing evidence for flexibility in retrieval strategies based on the recency of the list item if we could show an increased tendency to initiate recall selectively with the last (n), the penultimate (n–1), or the antepenultimate (n–2) item when participants were postcued to recall one, two, or three items, respectively.

Method

Participants

Twenty-five psychology students from City, University of London, participated in this experiment in exchange for course credits. All participants were fluent in English and had normal or corrected-to-normal vision.

Materials and apparatus

The words chosen were those used by Tan et al. (2016). Six hundred monosyllabic words with frequencies of occurrence of ten per million and above, based on the Kučera and Francis (1967) norms, were randomly selected from the MRC Psycholinguistic Database (Coltheart, 1981). From this pool of words, 120 experimental lists were constructed, 40 for each of the list lengths of four, five, or six words. The words for each list were selected randomly for each participant. No participant saw the same word twice. A response booklet with 120 text boxes, each with six numbered lines, was provided to the participants for free recall. The words were presented in 24-point Courier New bold font on a computer monitor using the E-Prime application.

Design

A within-subjects design was used. There were three within-subjects independent variables: recall requirement, with four levels (recall 1, recall 2, recall 3, and recall all); list length, with three levels (4, 5, and 6); and serial position, with up to six levels (1–6). The main dependent variable was the probability of first recall (PFR).

Procedure

Participants were tested individually. They were presented with two practice trials, the first of five words, and the second of four words, followed by 120 experimental word lists. List length and recall requirement were randomized, with ten trials for each combination of these two variables. On each trial, a series of four, five, or six words was presented one at a time in the center of the screen. Each word was displayed for 2 s. Participants read each word aloud as it was presented. At the end of each list, an empty grid containing four, five, or six numbered rows appeared on the screen, informing participants of the number of words contained in the list they had just seen. They were also instructed to recall either all the words (recall all) or only one, two, or three words from the list (recall 1, recall 2, and recall 3, respectively), in any order they wished. Participants wrote down their responses in a paper response booklet provided and recalled their answers out loud as they wrote. Recall was self-paced.
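The randomized schedule described above (three list lengths crossed with four recall requirements, ten trials per combination) could be generated along the following lines. This is a sketch only: the original experiment was run in E-Prime, the source says only that the two factors were "randomized", and the single full shuffle and all names used here are assumptions.

```python
import random
from collections import Counter
from itertools import product

LIST_LENGTHS = (4, 5, 6)
RECALL_REQUIREMENTS = ("recall 1", "recall 2", "recall 3", "recall all")
TRIALS_PER_CELL = 10


def build_schedule(seed=None):
    """Return a shuffled list of (list_length, recall_requirement) trials.

    Assumption: one shuffle over all 120 trials, with no blocking or
    constraints on repeats.
    """
    rng = random.Random(seed)
    schedule = [
        cell
        for cell in product(LIST_LENGTHS, RECALL_REQUIREMENTS)
        for _ in range(TRIALS_PER_CELL)
    ]
    rng.shuffle(schedule)
    return schedule


schedule = build_schedule(seed=1)
cell_counts = Counter(schedule)
print(len(schedule))               # 3 x 4 x 10 = 120 trials
print(set(cell_counts.values()))   # every combination occurs 10 times
```

Because neither factor is blocked, a participant cannot know the list length or the number of words to be recalled until the postcue appears.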

Results

The PFRs for each list length, recall requirement, and serial position are presented in Fig. 1. The PFR refers to the proportion of trials on which the first word recalled was from a particular serial position.
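As an illustration of the definition above, the PFR for one cell of the design (one list length and one recall requirement) can be computed from the serial position of the first word recalled on each trial. The function name and the example data are hypothetical, and it is an assumption that trials whose first response was not a list item stay in the denominator.

```python
from collections import Counter


def probability_of_first_recall(first_recalls, list_length):
    """Return {serial position: proportion of trials starting there}.

    `first_recalls` holds one entry per trial: the 1-based serial position
    of the first word recalled, or None if the first response was not a
    list item (assumed here to count as a trial with no valid first recall).
    """
    n_trials = len(first_recalls)
    if n_trials == 0:
        raise ValueError("at least one trial is required")
    counts = Counter(sp for sp in first_recalls if sp is not None)
    return {sp: counts[sp] / n_trials for sp in range(1, list_length + 1)}


# Hypothetical data: ten trials of a four-item list in one condition.
example = [1, 1, 4, 4, 4, 3, 1, 4, 2, 4]
print(probability_of_first_recall(example, list_length=4))
```

In this made-up example, half of the trials begin with the last item (Serial Position 4), so the PFR for that position is .5.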

Fig. 1 Data from Experiment 1 (immediate free recall), showing the probability of first recall as a function of serial position (SP) and list length (LL: 4, 5, or 6) when participants were required to recall one word (A, upper left), two words (B, upper right), three words (C, lower left), and all the words (D, lower right). Note that neither the list length nor the number of words to be recalled was known to participants in advance of the list presentation

Inspection of Figs. 1A, B, and C suggests that the tendency to initiate recall with the first list item increases with the number of words to be recalled. Participants were most likely to initiate their recall with the first list item when asked to recall all the items in the list. However, this tendency decreased as list length increased. Participants were most likely to initiate recall with the last list item when asked to recall only one item; this tendency remained relatively constant across the three list lengths.

We have behaved like “pragmatic researchers” (Wagenmakers et al., 2018) by adopting an inclusive statistical approach to the analyses reported in this article. The PFR data were first analyzed by performing separate 3 (list length: 4, 5, 6) × 4 (recall requirement: recall 1, recall 2, recall 3, and recall all) within-subjects analysis of variance (ANOVA) tests for the first, final, penultimate, and antepenultimate serial positions, using the Greenhouse–Geisser correction whenever the assumption of sphericity was violated.

These same data were then analyzed using Bayesian repeated measures ANOVA (BANOVA; Wagenmakers et al., 2018) tests with the independent variables (i.e., list length and recall requirement) as fixed effects and participant as a random effect, using the JASP software package (JASP Team, 2018). This method of analysis allows for comparison of the likelihood of the data given one model (e.g., the null model assuming only a random effect of participant, M0) with the likelihood of the data given another model (e.g., an alternative model assuming an effect of list length, M1). The ratio of these likelihoods is the Bayes factor (BF), which expresses the relative evidence for the alternative model (BF10) or the null model (BF01). One can also compare the relative evidence between models by examining the ratio between the BFs associated with one model (e.g., a model including an effect of list length) and another model (e.g., a model including effects of both list length and recall requirement). The raw data from all three experiments can be found in the supplemental material accompanying this article.
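The relationships just described can be written compactly as follows. The symbol D for the data, the label M2 for a second alternative model, and the subscript in BF12 are notation introduced here for illustration; p(D | M) denotes the marginal likelihood of the data under model M.

```latex
\mathrm{BF}_{10} = \frac{p(D \mid M_1)}{p(D \mid M_0)}, \qquad
\mathrm{BF}_{01} = \frac{1}{\mathrm{BF}_{10}}, \qquad
\mathrm{BF}_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)}
                 = \frac{\mathrm{BF}_{10}}{\mathrm{BF}_{20}}
```

Because both BF10 and BF20 share the null model as their denominator, their ratio compares M1 with M2 directly. For example, in the first serial position analysis the full model and the main-effects-only model have BF10 values of 1.71 × 10^20 and 6.76 × 10^19, and their ratio is approximately 2.5, in line with the reported model comparison of BF = 2.52 (the small difference presumably reflects rounding of the reported values).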

Figure 2 replots our PFR data, showing the proportions of trials starting with a specified serial position, for trials requiring different numbers of words to be recalled and for each list length.

Fig. 2 Data from Experiment 1 (immediate free recall), showing the probability of first recall as a function of list length (4, 5, or 6) and number of words to be recalled (1, 2, 3, or all) for the words presented in the (A) first serial position (SP 1, upper left), (B) final serial position (SP N, upper right), (C) penultimate serial position (SP N–1, lower left), and (D) antepenultimate serial position (SP N–2, lower right)

First serial position

Using a conventional ANOVA, we found a significant main effect of list length, F(2, 48) = 22.71, p < .001, ηp² = .486; a significant main effect of recall requirement, F(2.02, 48.46) = 19.24, p < .001, ηp² = .445; and a significant interaction between list length and recall requirement, F(6, 144) = 4.51, p < .001, ηp² = .158.

Simple main effects revealed that for list length 4, the “recall 1” condition was significantly different from the “recall 3” and “recall all” conditions. In addition, the “recall all” condition was significantly different from the “recall 2” condition (all ps at least < .05). For list length 5, the “recall all” condition was significantly different from the “recall 1” and “recall 2” conditions (ps at least < .001); the difference between the “recall all” and “recall 3” conditions just failed to achieve significance (p = .05). For list length 6, the “recall 1” condition was significantly different from all other recall conditions (all ps at least < .05). Simple main effects also revealed that for the “recall 1” and “recall 2” conditions, list length 4 was significantly different from list lengths 6 and 5, respectively (ps < .05). For the “recall 3” condition, list length 4 was significantly different from the other list lengths (ps at least < .05). Finally, for the “recall all” condition, all three list lengths were significantly different from one another (all ps at least < .05).

Using a BANOVA, there was strong evidence for a model including effects of list length, recall requirement, and the two-way interaction (BF10 = 1.71 × 10²⁰), but this model was not substantially preferred (BF = 2.52) to the simpler model containing only the two main effects (BF10 = 6.76 × 10¹⁹). Post-hoc comparisons of list length revealed strong evidence for differences between list lengths 4 and 5 (BF10,U = 2,402), between list lengths 4 and 6 (BF10,U = 3.611 × 10⁸), and between list lengths 5 and 6 (BF10,U = 12.0). Post-hoc comparisons of recall requirement revealed evidence for differences between all different levels of recall requirement. Thus, post-hoc comparisons revealed moderate evidence for a difference between “recall 1” and “recall 2” (BF10,U = 6.47), and strong evidence for differences between “recall 1” and “recall 3” (BF10,U = 7,361) and between “recall 1” and “recall all” (BF10,U = 5.658 × 10⁸). We also observed strong evidence for differences between “recall 2” and “recall 3” (BF10,U = 11.17), between “recall 2” and “recall all” (BF10,U = 476,474), and between “recall 3” and “recall all” (BF10,U = 254).

Final serial position

There was a nonsignificant main effect of list length, F(2, 48) = 1.78, p > .05, ηp² = .069; a significant main effect of recall requirement, F(1.95, 46.81) = 11.39, p < .001, ηp² = .322; and a nonsignificant interaction between list length and recall requirement, F(6, 144) = 1.18, p > .05, ηp² = .047. Bonferroni post-hoc comparisons revealed that the “recall 1” condition was significantly different from all other recall conditions (all ps at least < .05).

Using a BANOVA, we found strong evidence for a best model including effects of recall requirement (BF10 = 1.27 × 10¹¹); this best model was preferred (BF = 11.48) to the model with both main effects (BF10 = 1.05 × 10¹⁰) and also (BF = 319.2) to the model with both main effects and their interaction (BF10 = 3.96 × 10⁸). Post-hoc comparisons of recall requirement revealed evidence for differences between “recall 1” and all other levels of recall requirements. Thus, post-hoc comparisons revealed strong evidence for differences between “recall 1” and “recall 2” (BF10,U = 61,681), between “recall 1” and “recall 3” (BF10,U = 854), and between “recall 1” and “recall all” (BF10,U = 2.400 × 10⁶). There was also moderate evidence for a difference between “recall 3” and “recall all” (BF10,U = 3.76). There was strong evidence against a difference between “recall 2” and “recall 3” (BF10,U = 0.190), and moderate evidence against a difference between “recall 2” and “recall all” (BF10,U = 0.342).

Penultimate serial position

We found a nonsignificant main effect of list length, F(2, 48) = 1.76, p > .05, ηp² = .068; a significant main effect of recall requirement, F(3, 72) = 7.59, p < .001, ηp² = .240; and a significant interaction between list length and recall requirement, F(6, 144) = 2.70, p < .05, ηp² = .101. Simple main effects revealed that for list length 4, the “recall 2” condition was significantly different from the “recall 3” and “recall all” conditions (ps at least < .05). For list length 5, the “recall 2” condition was significantly different from all other recall conditions (all ps < .01). Finally, simple main effects also revealed that for the “recall 1” condition, list lengths 5 and 6 were significantly different from each other (p < .01).

In the BANOVA, there was strong evidence for a best model including effects of recall requirement (BF10 = 51,580); this model was preferred (BF = 4.59) to the model with both main effects (BF10 = 11,250), and also preferred (BF = 14.06) to the model with both main effects and the interaction (BF10 = 3,668). Post-hoc comparisons of recall requirement revealed evidence for a difference between “recall 1” and “recall 2” (BF10,U = 15.95), but moderate evidence against a difference between “recall 1” and “recall 3” (BF10,U = 0.269), and no substantial evidence for a difference between “recall 1” and “recall all” (BF10,U = 0.988). There was, however, strong evidence for a difference between “recall 2” and both “recall 3” (BF10,U = 15,334) and “recall all” (BF10,U = 3,065). There was moderate evidence against a difference between “recall 3” and “recall all” (BF10,U = 0.180).

Antepenultimate serial position

We observed a nonsignificant main effect of list length, F(2, 48) = 1.98, p > .05, ηp² = .076; a significant main effect of recall requirement, F(3, 72) = 3.73, p < .05, ηp² = .135; and a nonsignificant interaction between list length and recall requirement, F(6, 144) = 0.83, p > .05, ηp² = .033. Bonferroni post-hoc comparisons revealed that the “recall 1” and “recall 2” conditions were significantly different from each other (p < .05).

Using a BANOVA, we found evidence for a best model including the effect of recall requirement (BF10 = 5.78), and this best model was moderately preferred (BF = 4.38) to the model with both main effects (BF10 = 1.32), and strongly preferred (BF = 113.3) to the model with both main effects and the interaction (BF10 = 0.051). Post-hoc comparisons of recall requirement revealed evidence for differences between “recall 1” and “recall 2” (BF10,U = 13.32), “recall 1” and “recall 3” (BF10,U = 10.47), but not between “recall 1” and “recall all” (BF10,U = 0.293). We found moderate evidence against a difference between “recall 2” and “recall 3” (BF10,U = 0.149), but no substantial difference between “recall 2” and “recall all” (BF10,U = 0.377) or between “recall 3” and “recall all” (BF10,U = 0.654).

The recall of subsequent words

Although the emphasis in this manuscript is on the first word recalled, it is still informative to consider the complete patterns of output order on trials in which participants were asked to recall two, three, or all the list items. We provide two tables showing the patterns of recalls in Experiment 1. Table 1 shows the distribution of recalls in Experiment 1 (IFR) as a function of input serial position and output position for each recall requirement and list length.

Table 1 Output order data from Experiment 1 (immediate free recall)

In Table 1, the values in Output Position 1 represent the first words that are recalled, which have been the data in the preceding analyses. We have already seen that when only one word is to be recalled, there is a heightened tendency to say the last word; when only two words are to be recalled, there is a heightened tendency to say the penultimate word; but when three or more items are to be recalled, there is a tendency to start with the first word. When one considers the later output positions, there is an indication that if participants are asked to recall three or more items, they tend to output early list items in the output position corresponding to their input position. By contrast, items presented at later serial positions are often recalled at any output position, and they are the most commonly output words at later output positions. Finally, it is clear that participants are not always able to recall a third word in the “recall 3” condition, and the numbers of empty cells increase from the fourth output position onward in the “recall all” conditions.

In Table 2, we consider the patterns of transitions in the output sequences from words of different list lengths and recall requirements. The larger values in the leading diagonals provide further evidence of forward-ordered recall in IFR: Words that had been presented at serial position n tended to immediately precede words that had been presented at serial position n+1. This pattern was observed when only two or three words were to be recalled, as well as when participants were required to recall all the list items.

Table 2 Transition data from Experiment 1 (immediate free recall)

Table 2 also shows a tendency for the participants to transition from the last list item to the penultimate list item, and that there was not a strong tendency to “wrap around” from serial position n to Serial Position 1. Finally, participants tended to terminate their recall prematurely more often (i.e., they transitioned more often to “end” responses) following recall of the last item in the list. This could reflect the fact that participants tend to recall in forward order, and so have already recalled all they can remember prior to recall of the last list item, but it could also reflect the fact that participants cannot benefit from a forward-ordered transition from the last list item, leaving them more prone to not recalling a further list item.
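A transition table of this kind can be tallied directly from the output sequences. The following is a minimal sketch rather than the scoring procedure actually used: the function name `transition_counts`, the string encoding of sequences (e.g., "124"), and the rule that an "end" transition is logged only when fewer items were recalled than were requested are all assumptions made for illustration.

```python
from collections import Counter


def transition_counts(sequences, n_required):
    """Tally transitions between successively recalled serial positions.

    sequences  : iterable of strings such as "124", giving the input serial
                 positions in the order they were output (hypothetical encoding).
    n_required : number of items the participant was asked to recall.

    Assumption: a transition to "end" is recorded only when recall stopped
    before n_required items had been output.
    """
    counts = Counter()
    for seq in sequences:
        positions = [int(ch) for ch in seq]
        # Item-to-item transitions: (from serial position, to serial position).
        for current, following in zip(positions, positions[1:]):
            counts[(current, following)] += 1
        # Premature termination after the last recalled item.
        if positions and len(positions) < n_required:
            counts[(positions[-1], "end")] += 1
    return counts


def forward_adjacent_proportion(counts):
    """Proportion of item-to-item transitions that go from n to n+1."""
    item_transitions = {k: v for k, v in counts.items() if k[1] != "end"}
    total = sum(item_transitions.values())
    forward = sum(v for (a, b), v in item_transitions.items() if b == a + 1)
    return forward / total if total else 0.0
```

Under these assumptions, the cells with `to == from + 1` correspond to the leading diagonal discussed above, and `forward_adjacent_proportion` summarizes how strongly that diagonal dominates.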

These tables provide important information concerning output order and pairwise transitions, but they do not make explicit the whole sequences of output in the “recall all” and “recall 3” conditions. Following Lewandowsky et al. (2009), we provide a short list of the most frequently output complete sequences at each list length and condition (only sequences with ten or more instances are reported, with the observed frequencies following the sequences in parentheses). For list length 4, participants in the “recall all” condition most frequently output the sequences “1234” (64), “124” (11), “342” (11), “1324” (10), “134” (10), and “432” (10), whereas participants in the “recall 3” condition most frequently output the sequences “123” (43), “432” (21), “134” (20), “234” (18), “124” (13), “423” (11), “431” (11), “412” (10), and “43” (10). For list length 5, participants in the “recall all” condition most frequently output the sequences “12345” (19) and “54” (15), whereas participants in the “recall 3” condition most frequently output the sequences “543” (19), “123” (18), “345” (16), “124” (14), “54” (13), “125” (12), and “542” (12). Finally, for list length 6, participants in the “recall all” condition most frequently output the sequences “654” (12) and “456” (9), whereas participants in the “recall 3” condition most frequently output the sequences “654” (20), “123” (15), “456” (15), “564” (12), “65” (11), and “563” (10). When one considers these sequences together, one sees the transition from more forward-ordered recall of sequences for shorter lists to more recency-based strategies for longer lists.
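Tallying complete output sequences against a reporting threshold is straightforward to sketch. The function name `frequent_sequences` and the string encoding of each trial's output are hypothetical; only the threshold of ten instances comes from the text.

```python
from collections import Counter


def frequent_sequences(sequences, min_count=10):
    """Return (sequence, frequency) pairs with at least min_count instances,
    ordered from most to least frequent.

    sequences : iterable of strings, one per trial, giving the serial
                positions in output order (hypothetical encoding).
    """
    tally = Counter(sequences)
    return [(seq, n) for seq, n in tally.most_common() if n >= min_count]
```

Applied separately to each list length and recall requirement, a tally like this would yield lists in the form reported above.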

Finally, we briefly examined the individual differences within our data to see whether participants’ tendencies to initiate recall with a particular serial position at one list length and condition correlated with their tendencies to initiate recall with that serial position at other list lengths and/or conditions. Since there were 12 different experimental conditions in Experiment 1, this produced 66 pairwise comparisons between frequencies of trials in which participants initiated recall with Serial Position 1. All 66 of these pairwise correlations were significantly positive (.45 < r < .90, all ps < .05). Similarly, there were 66 pairwise comparisons between frequencies of initiating recall with the last item, serial position n. All 66 of these pairwise correlations were positive (.24 < r < .89), and 57 were significantly positive (.40 < r < .89, ps < .05). We found far greater variation in the correlations (– .23 < r < .67) among the 66 pairwise frequencies of initiating recall with the penultimate list items. Similarly, there was considerable variation in the correlations (– .30 < r < .64) among the 66 pairwise frequencies of initiating recall with the antepenultimate list items. Thus, we observed considerable consistency in participants’ tendency to initiate recall with the first and last items, but the strategic behavior to initiate recall with middle-list items was more variable.
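The count of 66 follows from pairing the 12 conditions (3 list lengths × 4 recall requirements): 12 × 11 / 2 = 66. The sketch below shows one way such pairwise correlations could be computed. The data layout (a dictionary mapping each condition to per-participant frequencies) and the function names are assumptions for illustration, not the analysis code used here.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(freqs):
    """Correlate every pair of conditions.

    freqs : dict mapping a condition label, e.g. (list_length, requirement),
            to a list of per-participant frequencies of initiating recall
            with a given serial position (same participant order throughout).
    """
    return {
        (a, b): pearson_r(freqs[a], freqs[b])
        for a, b in combinations(sorted(freqs), 2)
    }


# The 12 conditions: 3 list lengths x 4 recall requirements.
CONDITIONS = [(ll, req) for ll in (4, 5, 6) for req in ("1", "2", "3", "all")]
N_PAIRS = len(list(combinations(CONDITIONS, 2)))  # 12 choose 2
```

Running `pairwise_correlations` once per serial position of interest (first, last, penultimate, antepenultimate) would give the four sets of 66 coefficients summarized above.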

Discussion

The results from Experiment 1 showed that, even when the number of words to be recalled was unpredictable (postcued), participants were more likely to initiate recall with the first list item when asked to recall all the list items (particularly for the shorter lists) than when asked to recall fewer items. By contrast, they were most likely to initiate recall with the last list item when asked to recall only a single item. Additionally, at least for the shorter lists, participants were most likely to recall the penultimate list item first when cued to recall only two items. Given that both the list length and the recall requirement were postcued, these recall patterns suggest that participants were able to select, at retrieval, the items with which they should initiate their recalls (although this was less apparent for list length 6).

Taken together, our findings suggest that the patterns of output orders vary with the number of items to be recalled, in a similar manner to that observed by Tan et al. (2016). This suggests that participants can flexibly begin retrieval from STS with the first item (if they are to recall many items), with the last item (if they are to recall one item), and to a lesser extent with the penultimate list item (if they are asked to recall two items). Participants, however, appear to be limited in terms of how far back from the end of the list they can go to retrieve their first item, since there was little evidence that they initiated their recalls with the antepenultimate item when asked to recall three items.

Experiment 2

The recall requirement and the list length manipulations of Experiment 1 were repeated in Experiment 2, using the ISR task. One motivation in our recent work (e.g., Bhatarah et al., 2008; Ward et al., 2010) has been to encourage theorists to consider applying memory models to a wider range of related tasks (for earlier debate on this issue, see Brown, Chater, & Neath, 2008; Murdock, 2008). Although STS buffer models of memory are typically proposed as models of IFR, the original conception of STS in the Atkinson and Shiffrin (1968) model assumed that the STS rehearsal buffer might be used to perform a wide variety of immediate memory tasks, including immediate and delayed serial recall (in the form of the Brown–Peterson task). Indeed, Atkinson and Shiffrin (1968) hypothesized that the STS rehearsal buffer might consist of ordered slots, and they proposed that ordered rehearsal was not only possible but efficient in maximizing recall. Furthermore, in the case of Atkinson and Shiffrin (1968, Exp. 8), it was assumed that participants could keep the presented items in consecutive order in the rehearsal buffer (modeled with a buffer capacity of five) in order to perform serial probed recall. Some 50 years on, it is worth examining the retrieval strategies that might be used to perform immediate recall in a range of related immediate memory tasks.

Method

Participants

Twenty-seven psychology students from City, University of London, participated in this experiment in exchange for course credits. All participants were fluent in English and had normal or corrected-to-normal vision. None had taken part in Experiment 1.

Materials and apparatus

The materials and apparatus were identical to those used in Experiment 1.

Design

We investigated three within-subjects independent variables: recall requirement, with four levels (recall 1, recall 2, recall 3, and recall all); list length, with three levels (4, 5, and 6); and serial position, with up to six levels (1–6). The main dependent variable was the PFR for each serial position.
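As an illustration of the dependent variable, the probability of first recall (PFR) at each serial position can be computed as the proportion of trials on which the first word output came from that position. This is a minimal sketch: the function name, the input encoding, and the choice to keep trials with no recall in the denominator are assumptions, not details taken from the method.

```python
def probability_of_first_recall(first_positions, list_length):
    """Proportion of trials on which each serial position was recalled first.

    first_positions : one entry per trial, giving the input serial position
                      (1-indexed) of the first word output, or None if
                      nothing was recalled (hypothetical encoding).
    list_length     : number of items presented on these trials.

    Assumption: trials with no recall stay in the denominator, so the
    returned proportions may sum to less than 1.
    """
    n_trials = len(first_positions)
    return [
        sum(1 for p in first_positions if p == sp) / n_trials
        for sp in range(1, list_length + 1)
    ]
```

For example, `probability_of_first_recall([4, 4, 1, 3], 4)` gives one PFR value per serial position for four trials of a four-item list.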

Procedure

The procedure was identical to that used in Experiment 1, with the exception that participants carried out ISR instead of free recall at the end of each list. They were required to write down their responses in strict forward serial order, working down the response grid and writing each word in the row that corresponded to its serial position at presentation. Participants were told to leave a blank for any words they did not recall. Participants spoke their recalls aloud as they wrote their responses in the grids, so that we could determine both the output order (based on spoken recall) and the participants’ judgments of serial position (based on the written grid position).

Results

The PFRs for each list length, recall requirement, and serial position are presented in Fig. 3.

Fig. 3 Data from Experiment 2 (immediate serial recall), showing the probability of first recall as a function of serial position (SP) and list length (LL: 4, 5, or 6) when participants were required to recall one word (A, upper left), two words (B, upper right), three words (C, lower left), and all the words (D, lower right). Note that neither the list length nor the number of words to be recalled was known to participants in advance of the list presentation

The recall patterns illustrated in Figs. 3A, B, and C are clear and consistent. Unsurprisingly, given the ISR instructions, participants were most likely to recall the first list item when they were asked to recall all the items, and were most likely to recall the last list item when they were asked for only one item. Participants also frequently began their recall with the penultimate list item when they were asked to recall two items.

As in Experiment 1, the PFR data were analyzed by performing separate 3 (list length: 4, 5, 6) × 4 (recall requirement: recall 1, recall 2, recall 3, and recall all) within-subjects ANOVAs (and repeated measures BANOVAs) for the first, final, penultimate, and antepenultimate serial positions. Figure 4 shows these PFR data for each list length and recall condition.

Fig. 4 Data from Experiment 2 (immediate serial recall), showing the probability of first recall as a function of list length (4, 5, or 6) and number of words to be recalled (1, 2, 3, or all) for the words presented in the (A) first serial position (SP 1, upper left), (B) final serial position (SP N, upper right), (C) penultimate serial position (SP N–1, lower left), and (D) antepenultimate serial position (SP N–2, lower right)

First serial position

We found a significant main effect of list length, F(1.46, 1.2) = 82.96, p < .001, ηp2 = .761; a significant main effect of recall requirement, F(2.31, 3.10) = 67.61, p < .001, ηp2 = .722; and a nonsignificant interaction between list length and recall requirement, F(6, 156) = 2.10, p > .05, ηp2 = .075. Bonferroni post-hoc comparisons revealed that all the recall conditions were significantly different from all other recall conditions (all ps at least < .01). In addition, all the list lengths were significantly different from one another (all ps < .001).

Using a BANOVA, we uncovered strong evidence for a best model including the effects of list length and recall requirement (BF10 = 9.27 × 10⁵⁵), and this model was moderately preferred (BF = 5.69) to the model including both effects and the interaction (BF10 = 1.61 × 10⁵⁵). Post-hoc comparisons of list length revealed strong evidence for differences between list lengths 4 and 5 (BF10,U = 2.74 × 10¹³), between list lengths 4 and 6 (BF10,U = 9.606 × 10¹⁷), and between list lengths 5 and 6 (BF10,U = 15,356). Post-hoc comparisons of recall requirement revealed evidence for differences between all different levels of recall requirements. Thus, post-hoc comparisons revealed strong evidence for differences between “recall 1” and “recall 2” (BF10,U = 128,864), between “recall 1” and “recall 3” (BF10,U = 4.08 × 10¹⁴), and between “recall 1” and “recall all” (BF10,U = 3.727 × 10²¹). There was also strong evidence for differences between “recall 2” and “recall 3” (BF10,U = 2.45 × 10⁸), between “recall 2” and “recall all” (BF10,U = 1.80 × 10¹⁶), and between “recall 3” and “recall all” (BF10,U = 2,399).

Final serial position

This analysis revealed a significant main effect of list length, F(2, 52) = 7.31, p < .01, ηp2 = .219; a significant main effect of recall requirement, F(1.11, 28.84) = 115.93, p < .001, ηp2 = .817; and a significant interaction between list length and recall requirement, F(3.43, 89.28) = 7.74, p < .001, ηp2 = .229. Simple main effects revealed that for all list lengths, the “recall 1” condition was significantly different from all other recall conditions (all ps < .001). In addition, for the “recall 1” condition, list length 4 was significantly different from the other two list lengths (ps < .01).

A BANOVA produced strong evidence for a best model including effects of recall requirement, list length, and their interaction (BF10 = 1.88 × 10⁸³); this best model was moderately preferred (BF = 5.60) to the model with both main effects (BF10 = 3.36 × 10⁸²), and was also preferred (BF = 7.48) to the model with only recall requirement (BF10 = 2.514 × 10⁸²). Post-hoc comparisons of the effects of list length revealed evidence for differences between list length 4 and list length 5 (BF10,U = 40.1) and between list length 4 and list length 6 (BF10,U = 7.30), but evidence against a difference between list length 5 and list length 6 (BF10,U = 0.137).
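The preference of one model over another can be read as the ratio of the two models' Bayes factors against the null. This is the standard relation between Bayes factors sharing a common null model; that the reported preference values were obtained this way is an assumption, although the figures for this analysis agree with it to rounding. The function name below is hypothetical.

```python
def relative_bayes_factor(bf10_preferred, bf10_other):
    """Bayes factor for one model over another, given each model's BF10
    against the same null model (the null cancels in the ratio)."""
    return bf10_preferred / bf10_other


# BF10 values for the final serial position in Experiment 2.
bf_full = 1.88e83          # both main effects plus the interaction
bf_main_effects = 3.36e82  # both main effects only
bf_requirement = 2.514e82  # recall requirement only

full_vs_main = relative_bayes_factor(bf_full, bf_main_effects)
full_vs_requirement = relative_bayes_factor(bf_full, bf_requirement)
```

Rounded to two decimal places, these ratios come to 5.60 and 7.48, matching the preference values reported for this analysis.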

Post-hoc comparisons of the effects of recall requirement revealed evidence for differences between “recall 1” and all other levels of recall requirements. Thus, post-hoc comparisons revealed strong evidence for differences between “recall 1” and “recall 2” (BF10,U = 6.19 × 10²⁴), between “recall 1” and “recall 3” (BF10,U = 1.27 × 10²⁴), and between “recall 1” and “recall all” (BF10,U = 2.89 × 10²⁴). There was moderate evidence against a difference between “recall 2” and “recall 3” (BF10,U = 0.331), but no substantial evidence for a difference between “recall 2” and “recall all” (BF10,U = 0.460). We also found moderate evidence against a difference between “recall 3” and “recall all” (BF10,U = 0.131).

Penultimate serial position

There was a significant main effect of list length, F(2, 52) = 12.30, p < .001, ηp2 = .321; a significant main effect of recall requirement, F(1.72, 44.83) = 77.13, p < .001, ηp2 = .748; and a significant interaction between list length and recall requirement, F(4.09, 106.29) = 5.22, p < .01, ηp2 = .167. Simple main effects revealed that for all list lengths, the “recall 2” condition was significantly different from all other recall conditions (all ps at least < .01). In addition, for the “recall 2” condition, list length 6 was significantly different from the other two list lengths (ps < .01). Finally, for the “recall all” condition, list lengths 4 and 6 were significantly different from each other (p < .05).

Using a BANOVA, we found strong evidence for a best model including both main effects and the interaction (BF10 = 7.323 × 10⁴⁹); this model was preferred (BF = 28.1) to the model with both main effects (BF10 = 2.577 × 10⁴⁸) and preferred (BF = 5,942) to the model with only recall requirement (BF10 = 1.217 × 10⁴⁶). Post-hoc comparisons of the effects of list length revealed no substantial evidence for differences between list length 4 and list length 5 (BF10,U = 0.949), but there was strong evidence for differences between list length 4 and list length 6 (BF10,U = 359.8) and between list length 5 and list length 6 (BF10,U = 20.69). Post-hoc comparisons of the effects of recall requirement revealed evidence for differences between “recall 2” and all other recall requirements. Thus, strong evidence emerged for differences between “recall 2” and “recall 1” (BF10,U = 2.55 × 10¹⁵), between “recall 2” and “recall 3” (BF10,U = 1.17 × 10¹⁹), and between “recall 2” and “recall all” (BF10,U = 5.16 × 10¹⁷). However, there was moderate evidence against differences between “recall 1” and “recall 3” (BF10,U = 0.148), between “recall 1” and “recall all” (BF10,U = 0.124), and between “recall 3” and “recall all” (BF10,U = 0.155).

Antepenultimate serial position

There was a significant main effect of list length, F(2, 52) = 7.08, p < .01, ηp2 = .214; a significant main effect of recall requirement, F(1.51, 39.19) = 18.83, p < .001, ηp2 = .420; and a nonsignificant interaction between list length and recall requirement, F(3.72, 96.63) = 1.47, p > .05, ηp2 = .054. Bonferroni post-hoc comparisons revealed that the “recall 3” condition was significantly different from all other recall conditions (all ps at least < .01). The “recall 1” and “recall 2” conditions were also significantly different from each other (p < .05). Finally, list length 4 was significantly different from the other two list lengths (ps at least < .05).

A BANOVA showed evidence for a best model including the effects of recall requirement and list length (BF10 = 5.23 × 10¹⁴), but this best model was not substantially preferred (BF = 1.16) to the model with only recall requirement (BF10 = 4.52 × 10¹⁴); the best model was strongly preferred (BF = 18.3), however, to the model with both main effects and the interaction (BF10 = 2.86 × 10¹³). Post-hoc comparisons of recall requirement revealed evidence for differences between “recall 1” and “recall 3” (BF10,U = 3.87 × 10⁷), “recall 2” and “recall 3” (BF10,U = 16,692), and “recall 3” and “recall all” (BF10,U = 15,821). There was also strong evidence for a difference between “recall 1” and “recall 2” (BF10,U = 16.641), but no substantial evidence for a difference between “recall 1” and “recall all” (BF10,U = 2.45), and moderate evidence against a difference between “recall 2” and “recall all” (BF10,U = 0.143).

Recall of subsequent words

We again provide two tables showing the patterns of recalls in Experiment 2. Table 3 shows the distribution of recalls in Experiment 2 (ISR) as a function of input serial position and output position for each recall requirement and list length.

Table 3 Output order data from Experiment 2 (immediate serial recall)

In Table 3, the values in Output Position 1 represent the first words that were recalled, which were the data in the preceding analyses. We have already seen that when only one word is to be recalled, there is a heightened tendency to say the last word; when two or three words are to be recalled, there are heightened tendencies to initiate recall with the penultimate and antepenultimate words, respectively; and when all the words are to be recalled, there is a heightened tendency to start with the first word. Not surprisingly, given the ISR instructions, participants tended to initiate recall of three or more list items with the first list item and to proceed in forward order. When participants output a word in the wrong position, they were far more likely to output it earlier than later relative to its correct position. Finally, it is again clear that participants were not always able to recall a third word in the “recall 3” condition, and the numbers of empty cells increase from the fourth output position onward in the “recall all” conditions.

In Table 4, we examine the patterns of transitions in the output sequences for each list length and recall requirement. The larger values on the leading diagonals provide evidence of greater forward-ordered recall in ISR than in IFR: Words that had been presented at serial position n tended almost always to precede words that had been presented at serial position n+1. This pattern was observed when only two or three words needed to be recalled, as well as when participants were required to recall all the list items.

Table 4 Transition data from Experiment 2 (immediate serial recall)

These tables provide important information concerning output order and pairwise transitions, but they do not make explicit the whole sequences of output in the “recall 3” and “recall all” conditions. Following Lewandowsky et al. (2009), we again provide a short list of the most frequently output sequences at each list length and condition (only sequences with ten or more instances are reported, with the observed frequencies in parentheses). For list length 4, participants in the “recall all” condition most frequently output the sequences “1234” (170), “134” (16), “34” (16), “124” (15), “123” (12), and “4” (12), whereas participants in the “recall 3” condition most frequently output the sequences “123” (113), “124” (38), “234” (37), “134” (34), and “34” (14). For list length 5, participants in the “recall all” condition most frequently output the sequences “12345” (57), “45” (21), “345” (19), “1245” (18), “1235” (16), “1234” (14), “145” (14), “1345” (12), “125” (11), and “15” (11), whereas participants in the “recall 3” condition most frequently output the sequences “123” (60), “345” (52), “125” (35), “145” (26), “45” (19), “5” (12), and “124” (11). Finally, for list length 6, participants in the “recall all” condition most frequently output the sequences “56” (28), “456” (22), “456” (9), “6” (16), “1234” (15), “126” (14), “1456” (13), “123456” (11), “156” (11), and “3456” (10), whereas participants in the “recall 3” condition most frequently output the sequences “456” (55), “156” (36), “123” (35), “56” (26), “126” (24), and “125” (11).

Finally, we examined the individual differences within our ISR data by computing the correlations between participants’ tendencies to initiate recall with a particular serial position at one list length and condition and their tendencies to initiate recall with that serial position at other list lengths and/or conditions. Since there were 12 different experimental conditions, 66 pairwise comparisons were possible between the frequencies of trials in which participants initiated recall with Serial Position 1. Of these 66 individual pairwise correlations, all were positively correlated (.27 < r < .85), and 59 were significantly positively correlated (rs > .38, ps < .05). Similarly, there were 66 pairwise comparisons between frequencies of initiating recall with the last item, serial position n. Of these 66 individual pairwise correlations, all were positively correlated (– .20 < r < .92), but only 20 were significantly positively correlated (.39 < r < .92, ps < .05). We also observed wide variation in the correlations (– .26 < r < .70) between the 66 pairwise frequencies of initiating recall with the penultimate list items. Similarly, there was wide variation in the correlations (– .47 < r < .77) between the 66 pairwise frequencies of initiating recall with the antepenultimate list items. Thus, we found considerable consistency in participants’ tendencies to initiate recall with the first items, but the strategic behavior to initiate recall with middle and last list items was far more variable.

Discussion

The findings from Experiment 2 were similar to those from Experiment 1 and revealed that the number of words to be recalled had a large effect on the probability of first recall of an item. Participants showed enhanced tendencies to initiate recall of the last item, penultimate item, antepenultimate item, and first list item when they were postcued to recall one, two, three, or all the list items, respectively.

Experiment 2 showed again that participants could exert considerable control in their retrieval strategy in an immediate memory task. It is noteworthy that we observed similarities in the preferred recall orders in IFR (Exp. 1) and ISR (Exp. 2). These common patterns of PFR data suggest that there may be more similarities than differences between the memory representations underpinning ISR and IFR, and that it would be fruitful to explore integrative accounts of the two tasks.

Experiment 3

In Experiment 3, we repeated the above list length and recall requirement manipulations using the “ISR-free” task employed by Tan et al. (2016; see also Tan & Ward, 2007; Ward et al., 2010). In this variant of serial recall, participants are required to write each of the recalled items in the row in the response grid corresponding to its serial position (i.e., at recall, the item presented at Serial Position 2 should be written on the second line of the response grid, etc.). However, in contrast with strict ISR (Exp. 2), the participants in ISR-free were free to fill in the grid in any temporal order that they wished (i.e., they were permitted to write down later list items in later grid positions before they wrote down earlier items in earlier grid positions, if they so wished). The advantage of this method is that it provides an informative measure of the relative accuracy and accessibility of serial recall information at different serial positions at the time of test, when the participant is free to output that information in any order desired.

Method

Participants

Twenty-five psychology students from City, University of London, participated in this experiment in exchange for course credits. All participants were fluent in English and had normal or corrected-to-normal vision. None had taken part in the previous experiments.

Materials and apparatus

The materials and apparatus were identical to those used in Experiments 1 and 2.

Design

We examined three within-subjects independent variables: recall requirement, with four levels (recall 1, recall 2, recall 3, and recall all); list length, with three levels (4, 5, and 6); and serial position, with up to six levels (1–6). The main dependent variable was the PFR for each serial position.

Procedure

The procedure was identical to that used in Experiments 1 and 2, with the exception that participants performed the ISR-free task instead of IFR at the end of each list. In this method, participants were free to write down their responses on the response grid in any temporal order they wished, but they had to ensure that each word was written on a row that corresponded to its serial position at presentation. Participants spoke their recalls aloud as they wrote their responses in the grids, so that we could determine both the output order (based on spoken recall) and the participants’ judgments of serial position (based on the written grid position).
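To make the two measures concrete, the sketch below scores a single ISR-free trial, keeping the spoken output order separate from the grid row in which each word was written. The function name, the input format, and the treatment of words that were not on the list are hypothetical and do not describe the scoring actually applied.

```python
def score_isr_free_trial(presented, responses):
    """Score one ISR-free trial.

    presented : list of words in presentation order.
    responses : list of (word, grid_row) pairs in spoken output order,
                with grid_row 1-indexed (hypothetical encoding).

    Returns one record per response with its output position, the word's
    true serial position (None for a word not on the list), and whether
    the word was written in the row matching its serial position.
    """
    scored = []
    for output_position, (word, grid_row) in enumerate(responses, start=1):
        serial_position = presented.index(word) + 1 if word in presented else None
        scored.append({
            "output_position": output_position,      # from spoken recall
            "word": word,
            "grid_row": grid_row,                    # judged serial position
            "serial_position": serial_position,      # actual serial position
            "correct_in_position": serial_position == grid_row,
        })
    return scored
```

In this representation, the serial position of the record with `output_position == 1` is the value that would enter the PFR analysis, and `correct_in_position` captures the accuracy of the serial position judgment.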

Results

The PFRs for each list length, recall requirement, and serial position are presented in Fig. 5.

Fig. 5 Data from Experiment 3 (ISR-free), showing the probability of first recall as a function of serial position (SP) and list length (LL: 4, 5, or 6) when participants were required to recall one word (A, upper left), two words (B, upper right), three words (C, lower left), and all the words (D, lower right). Note that neither the list length nor the number of words to be recalled was known to participants in advance of the list presentation

The recall patterns in Figs. 5A, B, and C are highly similar across all list lengths, and again indicate that the PFR for Serial Position 1 is greatest in the “recall all” condition, and that the PFR for the final serial position is greatest in the “recall 1” condition. In addition, the penultimate item tends to be the first recalled item in the “recall 2” condition.

As in the previous two experiments, the PFR data were analyzed by performing separate 3 (list length: 4, 5, 6) × 4 (recall requirement: recall 1, recall 2, recall 3, and recall all) within-subjects ANOVAs for the first, final, penultimate, and antepenultimate serial positions. Figure 6 shows these PFR data for each list length and recall condition.

Fig. 6 Data from Experiment 3 (ISR-free), showing the probability of first recall as a function of list length (4, 5, or 6) and number of words to be recalled (1, 2, 3, or all) for the words presented in the (A) first serial position (SP 1, upper left), (B) final serial position (SP N, upper right), (C) penultimate serial position (SP N–1, lower left), and (D) antepenultimate serial position (SP N–2, lower right)

First serial position

There was a significant main effect of list length, F(2, 48) = 42.24, p < .001, ηp2 = .638; a significant main effect of recall requirement, F(2.26, 54.16) = 37.05, p < .001, ηp2 = .607; and a significant interaction effect between list length and recall requirement, F(6, 144) = 2.88, p < .05, ηp2 = .107. Simple main effects revealed that for list length 4, the “recall 3” and “recall all” conditions were significantly different from each other and from the other recall conditions (all ps at least < .05). For list length 5, the “recall 1” condition was significantly different from all other recall conditions (all ps at least < .05), and the “recall 2” condition was significantly different from the “recall all” condition (p < .05). For list length 6, the “recall all” condition was significantly different from all other recall conditions (all ps at least < .05). Simple main effects also revealed that for the “recall 1” condition, list length 4 was significantly different from the other two list lengths (ps < .01). For the “recall 2” condition, list length 6 was significantly different from the other two list lengths (ps at least < .05). For the “recall 3” condition, all three list lengths were significantly different from one another (all ps at least < .01). Finally, for the “recall all” condition, list length 4 was significantly different from the other two list lengths (ps < .001).

Using a BANOVA, we found strong evidence for a best model including the effects of list length and recall requirement (BF10 = 1.104 × 10³⁴), and this model was not substantially preferred (BF = 2.27) to the model including both effects and the interaction (BF10 = 4.86 × 10³³). Post-hoc comparisons of list length revealed strong evidence for differences between list lengths 4 and 5 (BF10,U = 8.45 × 10⁷), between list lengths 4 and 6 (BF10,U = 6.89 × 10¹⁶), and between list lengths 5 and 6 (BF10,U = 7,732). Post-hoc comparisons of recall requirement revealed strong evidence for differences between all different levels of recall requirement. Thus, post-hoc comparisons revealed strong evidence for differences between “recall 1” and “recall 2” (BF10,U = 35.54), between “recall 1” and “recall 3” (BF10,U = 4.51 × 10⁷), and between “recall 1” and “recall all” (BF10,U = 1.02 × 10¹²). There was also strong evidence for differences between “recall 2” and “recall 3” (BF10,U = 443), between “recall 2” and “recall all” (BF10,U = 4.63 × 10⁶), and between “recall 3” and “recall all” (BF10,U = 1,133).

Final serial position

There was a significant main effect of list length, F(2, 48) = 10.65, p < .001, ηp2 = .307; a significant main effect of recall requirement, F(1.56, 37.34) = 53.81, p < .001, ηp2 = .692; and a nonsignificant interaction effect between list length and recall requirement, F(6, 144) = 0.96, p > .05, ηp2 = .038. Bonferroni post-hoc comparisons revealed that the “recall 1” condition was significantly different from all other recall conditions (ps < .001). In addition, list length 4 was significantly different from the two other list lengths (ps < .01).

A BANOVA revealed strong evidence for a best model including the effects of recall requirement and list length (BF10 = 2.58 × 10⁴⁴); this best model was strongly preferred (BF = 34.81) to the model with both main effects and the interaction (BF10 = 7.42 × 10⁴²) and also strongly preferred (BF = 1,253) to the model with only recall requirement (BF10 = 2.061 × 10⁴¹). Post-hoc comparisons of the effects of list length revealed evidence for differences between list length 4 and list length 5 (BF10,U = 23,985) and between list length 4 and list length 6 (BF10,U = 32,753), but evidence against a difference between list length 5 and list length 6 (BF10,U = 0.111).

Post-hoc comparisons of the effects of recall requirement revealed strong evidence for differences between “recall 1” and all other levels of recall requirements. Thus, post-hoc comparisons revealed strong evidence for differences between “recall 1” and “recall 2” (BF10,U = 1.83 × 10¹⁶), between “recall 1” and “recall 3” (BF10,U = 2.32 × 10¹⁶), and between “recall 1” and “recall all” (BF10,U = 2.22 × 10¹⁵). No substantial evidence was apparent for a difference between “recall 2” and “recall 3” (BF10,U = 1.133) or for a difference between “recall 2” and “recall all” (BF10,U = 0.428), and there was moderate evidence against a difference between “recall 3” and “recall all” (BF10,U = 0.153).

Penultimate serial position

We found a significant main effect of list length, F(2, 48) = 11.51, p < .001, ηp² = .324; a significant main effect of recall requirement, F(1.74, 41.81) = 24.69, p < .001, ηp² = .507; and a nonsignificant interaction effect between list length and recall requirement, F(6, 144) = .30, p > .05, ηp² = .012. Bonferroni post-hoc pairwise comparisons revealed that the “recall 2” condition was significantly different from all other recall conditions (ps < .001). In addition, list length 6 was significantly different from the other two list lengths (ps < .01).

Using a BANOVA, we observed strong evidence for a best model including the effects of recall requirement and list length (BF10 = 5.19 × 10²²), and this model was strongly preferred (BF = 50.95) to the model with both main effects and an interaction (BF10 = 1.02 × 10²²). The best model was also strongly preferred (BF = 78.71) to the model with only the main effect of recall requirement (BF10 = 6.60 × 10²⁰). Post-hoc comparisons of the effects of list length revealed evidence against differences between list length 4 and list length 5 (BF10,U = 0.128), but strong evidence of differences between list length 4 and list length 6 (BF10,U = 481.6) and list length 5 and list length 6 (BF10,U = 100.2).

Post-hoc comparisons of recall requirement revealed evidence for differences between “recall 1” and “recall 2” (BF10,U = 6.46 × 10⁹) and between “recall 1” and “recall 3” (BF10,U = 26.45). There was only moderate evidence for a difference between “recall 1” and “recall all” (BF10,U = 3.014). We found strong evidence, however, for differences between “recall 2” and “recall 3” (BF10,U = 5.71 × 10⁷) and between “recall 2” and “recall all” (BF10,U = 6.76 × 10⁸). There was moderate evidence against a difference between “recall 3” and “recall all” (BF10,U = 0.208).

Antepenultimate serial position

There was a nonsignificant effect of list length, F(1.53, 36.82) = .18, p > .05, ηp² = .008; a significant main effect of recall requirement, F(2.12, 50.81) = 15.53, p < .001, ηp² = .393; and a nonsignificant interaction effect between list length and recall requirement, F(6, 144) = 1.04, p > .05, ηp² = .042. Bonferroni post-hoc pairwise comparisons revealed that the “recall 1” and “recall 3” conditions were significantly different from each other and from the other recall conditions (ps at least < .05).

Using a BANOVA, we found evidence for a best model including effects of recall requirement (BF10 = 8.55 × 10⁹); this best model was strongly preferred (BF = 20.48) to the model with both main effects (BF10 = 4.17 × 10⁸), and also strongly preferred (BF = 284) to the model with both main effects and the interaction (BF10 = 3.01 × 10⁷). Post-hoc comparisons of the effects of list length revealed evidence against differences between list length 4 and list length 5 (BF10,U = 0.149), differences between list length 4 and list length 6 (BF10,U = 0.126), and differences between list length 5 and list length 6 (BF10,U = 0.118).

Post-hoc comparisons of recall requirement revealed evidence for differences between “recall 1” and “recall 3” (BF10,U = 7.32 × 10⁶), between “recall 2” and “recall 3” (BF10,U = 91.97), and between “recall 3” and “recall all” (BF10,U = 651). We also uncovered strong evidence for a difference between “recall 1” and “recall 2” (BF10,U = 694), and strong evidence for a difference between “recall 1” and “recall all” (BF10,U = 246). There was moderate evidence against a difference between “recall 2” and “recall all” (BF10,U = 0.128).

Recall of subsequent words

Although the emphasis in this article is on the first word recalled, it is still informative to consider the complete patterns of output order on trials in which participants were asked to recall two, three, or all the list items. We provide two tables showing the patterns of recalls in Experiment 3. Table 5 shows the distribution of recalls in Experiment 3 (ISR-free) as a function of input serial position and output position for each recall requirement and list length.

Table 5 Output order data from Experiment 3 (immediate serial recall–free)

In Table 5, the values in Output Position 1 again represent the first words that were recalled, which were the data in the preceding analyses. We have already seen that when only one word was to be recalled, there was a heightened tendency to initiate recall with the last word; when two or three words were to be recalled, there were heightened tendencies to initiate recall with the penultimate or antepenultimate item, respectively; and when all the items were to be recalled, there was a tendency to initiate recall with the first word. We also saw an indication that if participants were asked to recall three or more items, they tended to output early list items in the output position corresponding to the items’ input position. By contrast, items presented at later serial positions were often recalled at any output position, and they were the most commonly output words at later output positions. Finally, it is clear that participants were not always able to recall a third word in the “recall 3” condition, and the numbers of empty cells increase from the fourth output position onward in the “recall all” conditions.

In Table 6, we examine the patterns of transitions in the output sequences from words of different list lengths and recall requirements. The larger values in the leading diagonals provide further evidence of a forward-ordered recall in ISR-free: Words that had been presented at serial position n tended to precede words that had been presented at serial position n+1. The most frequent transitions with only two words to recall were from Serial Positions 1 to 2 and from serial position n–1 to serial position n. Table 6 also shows a slight tendency for participants to “wrap around” from serial position n to Serial Position 1, but participants also transitioned from the last list item to the penultimate item.
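
To make the transition analysis concrete, the sketch below shows one way such a transition table could be tallied from output sequences. It is an illustration only: the function name and the three example sequences are hypothetical and are not taken from the data of Experiment 3.

```python
from collections import Counter


def transition_counts(outputs):
    """Count transitions between the input serial positions of successive recalls.

    Each output sequence is a list of input serial positions in the order
    in which the words were recalled.
    """
    counts = Counter()
    for seq in outputs:
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1
    return counts


# Hypothetical output sequences for a four-item list (not real data).
outputs = [[1, 2, 3, 4], [3, 4], [4, 1, 2]]
counts = transition_counts(outputs)

# Transitions on the leading diagonal go from position n to position n + 1.
forward_adjacent = sum(n for (a, b), n in counts.items() if b == a + 1)
total = sum(counts.values())
print(forward_adjacent, total)  # prints "5 6"
```

In this toy example, the single transition that is not forward-adjacent is the "wrap around" from Serial Position 4 to Serial Position 1.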

Table 6 Transition data from Experiment 3 (immediate serial recall–free)

These tables provide important information concerning output order and pairwise transitions, but they do not make explicit the whole sequences of output in the “recall 3” and “recall all” conditions. Following Lewandowsky et al. (2009), we again provide a short list of the most frequently output sequences at each list length and condition (only sequences with ten or more instances are reported, with the observed frequencies in parentheses). For list length 4, participants in the “recall all” condition most frequently output the sequences “1234” (105), “124” (18), “134” (17), “123” (12), and “234” (11), whereas participants in the “recall 3” condition most frequently output the sequences “123” (84), “124” (20), “234” (17), “134” (13), “34” (13), and “412” (11). For list length 5, participants in the “recall all” condition most frequently output the sequences “12345” (37) and “543” (10), whereas participants in the “recall 3” condition most frequently output the sequences “123” (43), “345” (34), “125” (24), “45” (15), “451” (10), “54” (10), and “542” (10). Finally, for list length 6, participants in the “recall all” condition most frequently output the sequences “56” (10), “564” (10), and “65” (10), whereas participants in the “recall 3” condition most frequently output the sequences “456” (35), “126” (16), “561” (16), “56” (14), “65” (14), “654” (14), “123” (10), and “564” (10).
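
A tally of this kind can be sketched as follows. The function name, the threshold parameter, and the example trials are hypothetical and serve only to illustrate the counting rule (sequences with ten or more instances); they are not the data reported above.

```python
from collections import Counter


def frequent_sequences(trials, min_count=10):
    """Return (sequence, count) pairs with at least min_count instances.

    Each trial is a list of input serial positions in output order. Positions
    are joined into a string such as "123", which is unambiguous only for
    list lengths below 10. Results are ordered from most to least frequent.
    """
    counts = Counter("".join(str(p) for p in trial) for trial in trials)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


# Hypothetical trials (not real data): 12 x "123", 10 x "456", 3 x "65".
trials = [[1, 2, 3]] * 12 + [[4, 5, 6]] * 10 + [[6, 5]] * 3
print(frequent_sequences(trials))  # prints [('123', 12), ('456', 10)]
```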

Finally, we briefly examined the individual differences within our data by examining the correlations between participants’ tendencies to initiate recall with a particular serial position at one list length and condition and their tendencies to initiate recall with other serial positions at other list lengths and/or conditions. Since there were 12 different experimental conditions (three list lengths crossed with four recall requirements), 66 pairwise comparisons were possible between frequencies of trials in which participants initiated recall with Serial Position 1. These 66 pairwise correlations were all significantly positive (.40 < r < .85, all ps < .05). Similarly, there were 66 pairwise comparisons between frequencies of initiating recall with the last item, serial position n. All of these 66 pairwise correlations were positive (.14 < r < .85), and 50 were significantly positive (.396 < r < .85, ps < .05). We found quite wide variation in the correlations (– .36 < r < .74) between the 66 pairwise frequencies of initiating recall with the penultimate list items. Similarly, there was wide variation in the correlations (– .22 < r < .70) between the 66 pairwise frequencies of initiating recall with the antepenultimate list items. Thus, we observed considerable consistency in participants’ tendencies to initiate recall with the first and last items, but the strategic behavior to initiate recall with middle list items was more variable.
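
The count of 66 follows from choosing 2 of the 12 conditions, 12 × 11 / 2 = 66. The sketch below illustrates the bookkeeping under stated assumptions: the per-participant frequencies are randomly generated placeholders, the number of participants is arbitrary, and the helper function is our own rather than the analysis code used for these data.

```python
import itertools
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists (nan if either is constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# 3 list lengths x 4 recall requirements = 12 experimental conditions.
conditions = list(itertools.product(
    [4, 5, 6], ["recall 1", "recall 2", "recall 3", "recall all"]))

# Placeholder data: for each condition, one frequency per participant of
# initiating recall with Serial Position 1 (hypothetical values only).
rng = random.Random(0)
n_participants = 10  # arbitrary, for illustration
freq = {c: [rng.randint(0, 10) for _ in range(n_participants)] for c in conditions}

# One correlation for every unordered pair of conditions.
correlations = {
    (a, b): pearson_r(freq[a], freq[b])
    for a, b in itertools.combinations(conditions, 2)
}
print(len(conditions), len(correlations))  # prints "12 66"
```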

Discussion

Experiment 3 showed that once again, across all three list lengths, the probabilities of first recall were greatly affected by the recall demands: Participants tended to initiate recall with the first item when required to recall all the list items, but tended to initiate recall with the last item when required to recall only one item. Moreover, they showed a tendency to initiate recall with the penultimate item when asked for two items, and with the antepenultimate item when asked for three items. The similarities between the ISR-free data from Experiment 3 and the IFR and ISR data from Experiments 1 and 2, respectively, suggest that the very same types of models should be able to accommodate all three tasks, with little modification, particularly if one assumes that a considerable degree of flexibility and control can be exerted in the output order, depending on the number of items to be recalled and the recall instructions of the task.

General discussion

In three experiments examining IFR, ISR, and ISR-free, participants were more likely to initiate their recall with the first list item when they were instructed to recall all the items in the list, but were more likely to initiate recall with the last or with the penultimate item when they were instructed to recall only one or two items, respectively (cf. Tan et al., 2016). Since participants were only informed of the number of words to be recalled immediately after the list had been presented, we believe that the differences in recall order found in our data must reflect the use of different retrieval strategies, and that participants choose to vary their retrieval strategy as the number of items to be recalled changes.

Our findings add to the growing body of studies that have shown that participants can exert some control at retrieval over which words they are to recall first in immediate tests such as IFR and ISR, and in variants of ISR such as reconstruction of order and ISR-free (e.g., Bhatarah et al., 2009; Bhatarah et al., 2008; Grenfell-Essam & Ward, 2012; Lewandowsky et al., 2009; Tan & Ward, 2007). Unlike those in previous studies, the participants in the present experiments varied their first word recalled not because they were instructed to do so, but on the basis of the instruction to recall different numbers of items, and in so doing, they showed a greater flexibility in retrieval than has previously been demonstrated—showing enhanced access to the first, the last, the penultimate, and sometimes even the antepenultimate items, when asked to recall “all,” “one,” “two,” and “three” items, respectively.

Our findings further demonstrate that we need theories of immediate memory that predict privileged access to the first and the last few items (including the capabilities for enhanced access to items n, n–1, and n–2). A wide range of mechanisms have been proposed that could provide privileged access to the first list item in theories of IFR and ISR. In IFR, possible mechanisms include a start-of-list context cue (e.g., Davelaar et al., 2005; Farrell, 2012; Metcalfe & Murdock, 1981), increased temporal distinctiveness of the first item (e.g., Brown, Neath, & Chater, 2007), increased attention (e.g., Lohnas, Polyn, & Kahana, 2015), or a “Get Ready” warning signal (e.g., Laming, 1999, 2010). In ISR, possible mechanisms include that the first item may be encoded with the greatest strength (e.g., Page & Norris, 1998), may be associated with a start-list cue (e.g., Farrell, 2012; Henson, 1998), or may be associated with early context positions (e.g., Burgess & Hitch, 1992, 1999, 2006). A wide range of mechanisms have also been proposed that could provide privileged access to the last few list items in theories of IFR and ISR. In theories of IFR, the privileged access to the recency items may reflect the output of a short-term store (Anderson, Bothell, Lebiere, & Matessa, 1998; Atkinson & Shiffrin, 1968, 1971; Davelaar et al., 2005; Raaijmakers & Shiffrin, 1981), the result of greater temporal distinctiveness (Brown et al., 2007), the heightened accessibility to the first item of the most recent group (Farrell, 2012), or a greater match with the end-of-list temporal context (e.g., Howard & Kahana, 2002; Polyn, Norman, & Kahana, 2009; Sederberg, Howard, & Kahana, 2008; Tan & Ward, 2000). In theories of ISR, possible mechanisms include that the last item may retain greater modality-dependent features (e.g., Nairne, 1990), may be associated with an end-list cue (e.g., Henson, 1998) or associated with later context positions (e.g., Burgess & Hitch, 1992, 1999, 2006). 
It should be noted that these accounts of recency rarely specify how participants might have privileged access to list items n–1 or n–2.

A satisfying explanation of our data would further provide some theoretical principle as to why participants naturally prefer to output with different retrieval strategies as the list length and the number of items to be recalled are varied. Our preferred interpretation of our findings is that participants use retrieval strategies that are based on the common principles of (1) extended recency, where participants have greater accessibility to the end-of-list items than they have to earlier list items; (2) one-item primacy, where participants have privileged access to the first list item, and this accessibility decreases with increasing list length; (3) output interference, where each item recalled generates output interference and increases the functional retention interval, reducing accessibility to subsequent list items, but each item recalled also helps cue the next list item, so that these constraints lead to participants expressing a preference for forward-ordered sequences of recalls; and (4) performance maximization, where participants initiate their recall so as to maximize performance based on the recall requirements.

We believe that combining these principles gives rise to subtle differences in the recency-based strategies primarily used to recall one, two, and (to a lesser extent) three list items. We believe that even in a very short list of, say, four to six items, an extended recency function (Principle 1) exists: At test, the most accessible list items are the most recent ones, and the accessibility of these recency items varies little with increasing list length. There is also heightened accessibility of the first list item (relative to other early list items), which decreases with increasing list length (i.e., with decreasing recency). These first two principles are readily evidenced by participants’ preferred recalls. When they are required to recall just one list item, then, regardless of the list length, they tend to recall the most recent item, as it is most accessible. Also, on a substantial minority of trials, participants initiate recall with the very first list item, but the heightened accessibility of this item decreases as the list length increases (e.g., Lehman & Malmberg, 2013; Ward et al., 2010). The heightened accessibility to recency items is consistent with the serial position curves in immediate memory tasks in which participants are free to recall in any order: for instance, ISR-free (Tan & Ward, 2007; Ward et al., 2010) and unconstrained reconstruction of order (Lewandowsky et al., 2009; Ward et al., 2010). Extended recency effects have also been shown in other immediate memory tasks, such as the digit probe task (Waugh & Norman, 1965), running memory span (Hockey, 1973), and, of course, IFR of longer lists (Murdock, 1962; Ward et al., 2010). The heightened accessibility of the first item is of course consistent with ISR and is consistent with IFR for very short lists (Ward et al., 2010).

When the number of words to be recalled increases, there are subtle shifts in exactly where to initiate recall, related to Principles 3 and 4. As was predicted by Atkinson and Shiffrin (1968), if participants wish to maximize recall of a small set of list items, items tend to be rehearsed and recalled in forward serial order. Since the more recent items are more accessible, it makes sense to output the less accessible items first, because output interference (or the increased retention interval) would hinder their access at later output positions. Thus, if a recency strategy is maintained, participants will tend to initiate recall with a recency item that allows a sequence length that is consistent with the recall requirements (i.e., initiating recall with item n–1 or n–2 when required to recall two or three list items, respectively). A consideration of the patterns of transitions in our data shows that when participants initiate recall with these recency items, they tend to recall sequences in forward serial order: When recalling two items, they recall items n–1 then n, and when recalling three items, they recall items n–2, n–1, and n.

As the number of words to be recalled increases to recalling three words or to recalling all the list items, participants increasingly tend to initiate recall with the first list item. The advantage of retrieving the first list item first is that recall can be self-propagating (Roediger, 1978), and a forward-ordered recall strategy can facilitate the retrieval of multiple responses: As Lewandowsky et al. (2009) have argued, this strategy allows participants to travel economically through memory space. As each new item is recalled, there may be successive, subtle shifts in the retrieval cues that can be used to cue the next item, and the use of different cues might help attenuate the self-limiting nature of recall (Roediger, 1978)—that is, the negative effects of output interference caused, at least in part, by the repeated retrieval of already-recalled items (e.g., Beaman, 2002; Bunting, Cowan, & Saults, 2006; Cowan, Saults, Elliott, & Moreno, 2002; Laming, 2009; Oberauer, 2003; Roediger, 1973, 1974; Tan & Ward, 2007).

As was discussed by Tan et al. (2016), similar tendencies to recall in forward order have been seen in other data sets, in free recall and serial recall (e.g., Bhatarah et al., 2008; Klein, Addis, & Kahana, 2005; Ward et al., 2010), ISR-free (Tan & Ward, 2007; Ward et al., 2010), and reconstruction-of-order tasks (Lewandowsky et al., 2009; Ward et al., 2010). Forward-ordered transitions are, in general, more successful than backward or more remote transitions (e.g., Howard & Kahana, 1999; Kahana, 1996; Lohnas & Kahana, 2014; Nairne, Ceo, & Reysen, 2007).

The final principle (4) is that the changes in retrieval strategy shown by our participants are influenced by the desire to maximize performance in line with the postcued task requirements. Our preferred interpretation is supported by Table 7, which shows the mean proportions of to-be-remembered words recalled when participants initiated recall with different serial positions. To avoid ceiling effects, we show only the performance when participants were required to “recall all” the words. Table 7 shows that participants tended to recall more words when they initiated recall with the first word (using both free-recall and serial-recall scoring) than when they initiated recall with one of the last words. Statistical analyses are complicated, because different participants contribute to different cells. Nevertheless, for each individual (averaged across the three list lengths) for the “recall all” trials, we found significant positive correlations between initiating recall with the first list item and the mean proportion of words recalled using free-recall scoring for Experiment 1 (r = .561, p < .01), Experiment 2 (r = .859, p < .001), and Experiment 3 (r = .691, p < .001); and also significant positive correlations between initiating recall with the first list item and the mean proportion of words recalled using serial-recall scoring for Experiment 1 (r = .938, p < .001), Experiment 2 (r = .865, p < .001), and Experiment 3 (r = .651, p < .01). 
By contrast, we found significant negative correlations between initiating recall with the last list item and mean proportion of words recalled using free-recall scoring for Experiment 1 (r = – .496, p < .05), Experiment 2 (r = – .765, p < .001), and Experiment 3 (r = – .497, p < .05); and also significant negative correlations between initiating recall with the last list item and mean proportion of words recalled using serial-recall scoring for Experiment 1 (r = – .726, p < .001), Experiment 2 (r = – .731, p < .001), and Experiment 3 (r = – .516, p < .01).

Table 7 Mean proportions of words recalled as a function of serial position of the initial recall

One final implication of our data is that participants appear to have a greater degree of knowledge about which items were presented in which serial position than they are often credited with in theories of IFR. When asked to recall only two items, participants must know which item was presented in serial position n–1 in order to use a retrieval strategy to deliberately initiate recall with that item. It is unclear how this could be achieved in many theories of IFR. By contrast, it is common in ISR for participants to be asked to recall sequences of five to nine list items in the correct serial order, and a number of experiments have shown that participants are quite capable of assigning items to serial positions even when the output order differs from the input order (e.g., Beaman, 2002; Bunting et al., 2006; Cowan et al., 2002; Laming, 2009; Oberauer, 2003). It may be fruitful, therefore, to consider whether recall of short lists benefits from position-based as well as temporal-context cues (e.g., Brown et al., 2007; Lewandowsky et al., 2009), and it is timely to consider that, in at least one instantiation, Atkinson and Shiffrin (1968, Exp. 8) modeled STS as consisting of an ordered set of five slots associated with their serial positions in order to facilitate serial probed recall. Moreover, prior research has shown that participants who are postcued can perform ISR and IFR similarly to those who are precued to perform these tasks (e.g., Bhatarah et al., 2009; Bhatarah et al., 2008; Grenfell-Essam & Ward, 2012). These postcued experiments suggest that participants who are postcued to perform IFR must possess serial position information, because they are able to allocate items to specific serial positions when postcued to perform ISR.

In summary, participants who were postcued to recall different numbers of words could modify their retrieval strategy and output order depending on the number of words they were required to recall. In all three studied tasks, participants showed a tendency to initiate recall of short lists with the first list item when postcued to recall all the list items, but showed enhanced tendencies to initiate recall with the last or penultimate items when cued to recall one or two items, respectively. Our findings show that participants can demonstrate considerable flexibility in their choice of retrieval strategy, and we suggest that similar memory processes operate across a range of different immediate memory tasks. Some 50 years on, it appears that the Atkinson and Shiffrin (1968) model remains relevant to inspiring the integration of a wide range of memory tasks and acknowledging the existence of different control processes that can act on to-be-presented material. We believe that an account of IFR that delivers flexibility in retrieval strategies may also be well-placed to account for a wider range of immediate memory tasks.

Author note

We thank Anika Asfaque, Monthaha Awlia, and Esra Demir for their assistance in collecting the data.