This issue commemorates the 50th anniversary of the publication of the chapter by Richard Atkinson and Richard Shiffrin titled “Human Memory: A Proposed System and Its Control Processes.” Many scientists have been introduced to what is often called “the modal model” in an Introductory Psychology course. Many have cited the chapter in their publications, usually in reference to the proposed distinction between short-term memory and long-term memory. However, the focus of the chapter’s 100 print pages was an investigation of the role of control processes in all memory systems for both storage and retrieval. The chapter contained many studies of rehearsal in particular and used careful modeling to demonstrate the validity of the concepts. A review of those modeling efforts reveals them to be state-of-the-art today, uncovering, testing, and verifying fundamental processes of rehearsal, storage, and retrieval.

In the first part of this article we describe the historical context for Atkinson and Shiffrin’s chapter, summarize its main concepts, and review briefly the data and the quantitative models that gave support to the theory. In the second part we summarize some of the subsequent developments that in some instances refined and developed the concepts and theory and in other instances led researchers and theorists to pose alternatives.

Background/Context

In the late 1950s and the early 1960s there was a period of tremendous activity in experimental psychology, and many of these developments contributed to what would become the Atkinson-Shiffrin theory. Here we will highlight two of these developments. First, there was what is nowadays commonly referred to as the cognitive revolution, with its emphasis on attentional and decisional processes. The cognitive revolution developed hand-in-hand with the flourishing of mathematical modeling that allowed learning and memory findings to be explained elegantly using very simple assumptions (for example, beginning perhaps with Bower, 1961, a number of very precise yet simple mathematical models were developed to explain paired-associate learning results).

The cognitive revolution

In the second half of the 1950s researchers in auditory and visual perception began to formulate the results of their research in terms such as attention, short-term memory, and stages of information processing. An important milestone was Broadbent’s Perception and Communication (1958), which summarized a large body of research in (especially) auditory perception. Broadbent reintroduced concepts such as primary and secondary memory and emphasized the notion of attention as a filtering process. Broadbent also proposed rehearsal as a means of reactivating information in primary or short-term memory (Broadbent, 1958, pp. 225-242). The view that items were displaced (and hence forgotten) from short-term memory by new incoming items (rather than by a time-based process of decay) received support from the experiments by Waugh and Norman (1965). In these experiments a probe-digit recall task was used in which participants were presented with a long list of digits in which some items were repeated. Whenever a repeated item (the probe) was presented, participants had to recall the item that had immediately followed the probe's earlier presentation. Critically, the items were presented at a rate of either one or four items per second, decoupling the retention time from the number of intervening items. For example, an interval of 2 s could be filled with either two items or with eight items. Recall was determined almost completely by the number of intervening items rather than by the number of seconds, strongly supporting a replacement and interference account of forgetting in short-term memory.

While such results might suggest that only a small amount of information is available at any one time, experiments such as those from Sperling (1960, now a textbook classic) showed that a much larger amount of information is briefly available but is lost very quickly. Sperling went on to show that this information was transferred to and stored in higher level visual short-term memories. Atkinson and Shiffrin (1968) used the term “sensory registers” to describe a variety of low-level sensory systems that can hold large amounts of information temporarily but from which only a few items are transferred to higher-level short-term memories.

More generally, humans came to be viewed as complex information processing systems, and a “computer metaphor” was often used to describe this new direction. The Atkinson-Shiffrin theory could be viewed as a culmination of these various themes, presenting a much more complete framework for learning and memory processes, one that still figures quite prominently in textbooks as the “modal model of memory.”

Developments in mathematical modeling

A second important development was the progress that was made in mathematical modeling, especially the mathematical modeling of learning and memory processes. Estes (1960) had shown that in simple tasks learning might proceed in an all-or-none fashion. This result implied that in Estes’ Stimulus Sampling Theory the learning could be described as involving a single to-be-conditioned element (the one-element model). An important advantage of such models with just a few elements was that the learning process was a simple Markov chain, a mathematical process that is relatively easy to analyze. The fundamental property of a Markov chain is that the future steps of the process are determined only by the current state and not by how the process got there. In the simplest case, the Markov chain had only two states: learned and not-learned. Bower (1961) applied this model to a paired-associate learning task and showed that the model quite precisely accounted for a large number of statistics (number of errors, trial of last error, number of runs of errors, etc.). These results set a very high standard for future modeling efforts.
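
To make the flavor of these models concrete, the all-or-none (one-element) model can be sketched in a simplified form. The shorthand below is ours rather than Bower's exact notation: an item resides in an unlearned state U or a learned state L, moves from U to L with probability c on each study trial, and is guessed correctly with probability g while still unlearned.

```latex
% Simplified sketch of the one-element (all-or-none) model; the notation is ours, not Bower's.
% c = probability of learning on a study trial; g = probability of a correct guess while unlearned.
\[
P(U \to L \text{ on a trial}) = c, \qquad
P(\text{correct} \mid L) = 1, \qquad
P(\text{correct} \mid U) = g
\]
% If learning takes effect after the trial's response, the number of trials spent in U is
% geometric with mean 1/c, so the expected number of errors before learning is
\[
E[\text{errors}] = \frac{1-g}{c}
\]
```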

Initial applications of Markov modeling referred to external elements that were or were not “conditioned,” but the states of the model soon came to reflect memory or learning in short-term or long-term states. For example, Atkinson and Crothers (1964) described a model with four states: a long-term state L (reflecting that the item is in long-term memory), a short-term state S (reflecting that the item is in short-term memory), a state F (reflecting that the item has been forgotten from short-term memory), and an initial state U (reflecting that nothing has been learned about the item). In some versions forgetting from the short-term state was assumed to be a function of the number of other items presented between two presentations of an item. These ideas led fairly directly to the Atkinson-Shiffrin model in which there were short-term states (one of which was the rehearsal buffer) from which items were lost when replaced by subsequent items. Whereas early Markov models described the transitions through states until a state of permanent storage was reached, the Atkinson and Shiffrin model placed much more emphasis on causes of forgetting, and upon failures of retrieval from all states of memory, including the “learned” state. This emphasis was seen in the detailing of strategies of retrieval. At the same time, Atkinson and Shiffrin placed emphasis upon strategies of storage.
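
For illustration, one hypothetical per-trial transition structure in the spirit of such four-state models (not Atkinson and Crothers' exact parameterization, and with the parameters a, b, and f introduced here only for the example) might be written as follows.

```latex
% Hypothetical per-trial transition matrix over the states (L, S, F, U); rows give the
% current state, columns the next state. The parameters a, b, and f are illustrative only.
% In the actual models, forgetting from S occurred between presentations of an item rather
% than on the study trial itself; responses from L and S are correct, whereas responses
% from F and U are correct only with a guessing probability.
\[
\mathbf{P} =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
a & (1-a)(1-f) & (1-a)f & 0 \\
a & (1-a)b & (1-a)(1-b) & 0 \\
a & (1-a)b & 0 & (1-a)(1-b)
\end{pmatrix}
\]
```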

Origins of the Atkinson-Shiffrin model

Atkinson and Shiffrin formalized concepts that date to the first days of psychology; many of the ideas fundamental to the modal model, such as the distinction between short-term memories and long-term memories, can be found in early writings such as James (1890), and the modal model was informed by numerous findings from the early pursuit of experimental psychology, such as those of Ebbinghaus (1885). Further, it used and built on results and modeling from Atkinson, Shiffrin, and colleagues in the years 1964–1968. A technical report in 1965 by Atkinson and Shiffrin previewed the later chapter by introducing a rehearsal buffer for storage and retrieval, search processes for long-term retrieval, and an emphasis on control processes.

Key concepts of the theory

One might wonder why the chapter has had such a long-lasting impact. Most of the current citations pertain to the memory structures of the theory. However, the chapter’s main focus was investigations of the control processes that operate to store information in, and retrieve it from, the various memories. The Atkinson and Shiffrin chapter took a large step down the road of the cognitive revolution by formally implementing several control processes involved in the modal model and manipulating them empirically in order to test these new assumptions.

There are a number of key elements that characterized the chapter. First, Atkinson and Shiffrin did not present one quantitative model to explain one memory task, but instead presented a general framework within which specific models for specific tasks could be formulated, a theme that has continued in the developments of the model since. The distinction between a general framework and task-specific models was mandated by the chapter’s emphasis on flexible control processes that were adaptive to the current task demands. This is seen clearly in the second half of the chapter, which presented a number of empirical studies in which task characteristics were varied and modeled by corresponding changes in the assumptions regarding control processes such as the rehearsal buffer.

A second major element of the Atkinson-Shiffrin theory was the distinction between a temporary short-term memory and a relatively permanent long-term memory. In their view, short-term memory was not simply a storage structure; it was also the part of the system where active control processes had their effects, so that the system could be termed “working memory.” Short-term memory was recognized to be a system of multiple memories with differing modalities and characteristics. These were partitioned into very short-term memories, termed “sensory registers,” and a longer lasting “short-term store” with multiple modalities and a higher degree of control. Thus, the idea was that there were multiple stages of processing through various short-term memories with increasingly abstract coding of the information. Although this description seems to imply a forward flow of information from the sensory registers to short-term memory and then to long-term memory, Atkinson and Shiffrin made it clear that information flowed both ways; for example, when the word “cow” is presented, semantic and associative information related to the concept of “cow” is activated in long-term memory and joins the information already in short-term memory. Thus, there is a constant flow of information between short- and long-term memory, producing the momentary contents of short-term memory that in turn determine what is stored in long-term memory.

The third and probably most critical component was the emphasis on active control processes, strategies used to encode and store information and to retrieve information from the various memory stores. A key control process that was extensively investigated was rehearsal, a process that was assumed to be critical for the maintenance of information in short-term memory as well as the transfer to long-term memory. The ease of verbal rehearsal was noted, as well as the likelihood that rehearsal in modes other than the verbal one was much more difficult, and the strong possibility that there could be recoding of non-verbal stimuli. Rehearsal may be viewed as “low-hanging fruit,” given its conscious availability and ease of manipulation. However, just the opposite has proved the case: In different forms, variants, and extensions, investigation of rehearsal has dominated the field of memory ever since. A few examples suffice to make this point: Baddeley refined the concepts in different forms of short-term or working memory such as the “phonological loop” (e.g., Baddeley & Hitch, 1974). Working memory with rehearsal in different forms (say visual, auditory, phonological, verbal, and so on) has become a field in its own right, and is presently used in tests of intelligence and in clinical assessment (e.g., Engle, 2018). The presence and/or absence of rehearsal is likely the explanation for the different results and models of short-term recall, the presence of rehearsal likely producing the results leading Sternberg (1966) to propose serial exhaustive search, and the absence of rehearsal likely producing the quite different result first obtained by McElree and Dosher (1989) and then obtained and modeled by Nosofsky (e.g., Nosofsky, Little, Donkin, & Fific, 2011). Storage of information in long-term memory has been tied directly to rehearsal processes, both behaviorally and neurally (e.g., Polyn & Kahana, 2008). Studies explore rehearsal directly through overt rehearsal paradigms (Ward & Tan, 2004). Capacity limits of short-term and working memory have been tied to limits on control processes such as rehearsal and “attentional refreshing” (e.g., Barrouillet, Portrat, & Camos, 2011) and such limits have been incorporated in cognitive architectures such as ACT-R (e.g., Anderson, 1990), SOAR (e.g., Laird, 2012), and EPIC (e.g., Meyer & Kieras, 1997a, b). These examples are just the tip of a very large iceberg, but serve to illustrate the impact that careful study and modeling of control processes can have on progress in understanding cognition.

The chapter made it clear that the rehearsal buffer was just one component of a much larger system of short-term and working memories, a system with a great deal of flexibility. For example, different tasks could induce rehearsal of single items, pairs of items, or other types and modalities of information. When the focus was on rehearsal capacity, this was defined by the ability to maintain n items over time, a capacity resulting from the interaction of rehearsal rate and decay. It was noted and shown that optimal capacity would be seen with ordered rehearsal. However, rehearsal was a control strategy, so which items are rehearsed and which are “dropped” from rehearsal are choices of the subject: Perhaps the oldest item is chosen to leave rehearsal (and subsequently lost), or perhaps a more “random” choice is made.

According to the model, rehearsal has two main functions, maintenance and coding. Maintenance rehearsal is the primary use of the buffer when there is a goal to maximize the number of items held in short-term memory. Coding refers to the transfer of information from short-term memory to long-term memory. Such transfer was assumed always to involve a mixture of automatic transfer (as seen for example in incidental learning tasks) and controlled processes (such as rehearsal and elaborative encoding).
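
The mechanics of these two functions can be illustrated with a minimal simulation sketch, assuming a buffer of fixed capacity, random displacement of an occupant when a new item enters, and long-term strength that accrues for each second an item spends in the buffer. The function and parameter names (r, theta, enter_prob) and their values are ours, chosen for illustration rather than taken from the chapter.

```python
import random

def simulate_buffer_trial(list_length, r=4, theta=0.4, item_time=1.0, enter_prob=1.0):
    """Minimal sketch of a rehearsal-buffer trial in the spirit of the buffer model
    described above. Parameter names are illustrative, not the chapter's notation:
    r = buffer capacity, theta = long-term strength accrued per second an item spends
    in the buffer, enter_prob = probability that a newly presented item enters the buffer."""
    buffer = []                                  # items currently being rehearsed
    lts_strength = [0.0] * list_length           # long-term strength per serial position
    for item in range(list_length):
        if random.random() < enter_prob:
            if len(buffer) == r:
                buffer.pop(random.randrange(r))  # displace a randomly chosen occupant
            buffer.append(item)
        for occupant in buffer:                  # everything in the buffer is rehearsed
            lts_strength[occupant] += theta * item_time
    # buffer contents support immediate recall; lts_strength supports delayed recall
    return buffer, lts_strength

random.seed(1)
final_buffer, strengths = simulate_buffer_trial(list_length=20)
print("still in buffer at test:", final_buffer)
print("long-term strength by position:", [round(s, 1) for s in strengths])
```

Even this toy version reproduces the classic buffer signatures: early-list items accrue extra long-term strength because they enter a not-yet-full buffer (primacy), and end-of-list items are likely still resident in the buffer at test (recency).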

Atkinson and Shiffrin also discussed in detail the mechanisms of storage in and retrieval from long-term memory. They assumed that there might be multiple traces of the same “item,” each partial or mostly complete. Long-term traces are constantly evolving and changing as new information is added to them. Retrieval from long-term memory was modeled as a search process, governed by cues used to probe memory: Memory traces are sampled with a probability that is related to their strength. A sampled trace is then examined for relevance through a recovery process, a reconstructive process in which activated features of the trace are used to retrieve other stored features, both to judge the trace's relevance and, when it is judged relevant, to generate an answer to the question that was asked. Since sampling is a probabilistic process, successful recall is not guaranteed even if the information is available in long-term memory. Thus traces may appear to be forgotten at one moment, but can subsequently be recalled. This notion that long-term forgetting is mainly due to interference and search failure rather than decay has been a guiding principle in much of the work that was performed in later years by Shiffrin and his collaborators.

Quite a variety of control processes were discussed and studied in the chapter: The chapter covered strategies of rehearsal, storage, and retrieval of very short-term visual memory (e.g., the visual icon; Sperling, 1960, etc.), and had extensive treatment of search processes as the main process allowing recall from long-term memory (subsequently elaborated in great detail by Raaijmakers & Shiffrin, 1980, 1981, in their SAM model). Similar search models of long-term retrieval remain the gold standard to the present day. The distinction between sampling and recovery is sometimes used today as a distinction between exploration and exploitation (e.g., Hills, Todd, Lazer, Redish, Couzin, and the Cognitive Search Research Group, 2015). Thus the delineation, modeling, and testing of control processes have not only permeated the field, but have remained a key component of subsequent development of the Atkinson and Shiffrin theory by the authors of this article, and by our collaborators, colleagues, and students (as indicated later in this article).

The various control processes were embedded in and operated upon the various structural components of the memory system: the many forms of temporary short-term memories, such as the very brief memories termed sensory registers (largely for low-level, less abstract information), various forms of short-term and working memories, and the relatively permanent long-term memories. Such a characterization remains the standard approach, behaviorally and neurally, to this day.

In addition, many issues were discussed as unresolved that remain unresolved today, for reasons given in the chapter that still hold true. These include all-or-none storage and forgetting, the causes of short-term decay/forgetting and the difficulty of controlling rehearsal, the difficulty of interpreting results from clinical cases with damage to the hippocampal region, whether transfer from short-term to long-term memory is continuous, all-or-none, or a mixture, and many others.

Finally, and this may now be seen as a rather obvious point, they emphasized that the properties of the short-term memory system could not be derived from the results of simple “short-term memory” tasks since performance in all tasks is bound to be a mixture of retrieval from short-term and long-term memory. This idea was worked out in detail in the tasks and models that were discussed in the second half of their chapter.

This listing does not exhaust the concepts laid out and discussed by Atkinson and Shiffrin in their first 35 pages, but this introduction must stop short of repeating them. The remaining 65 pages of the chapter presented empirical studies and careful modeling designed to test, verify, extend, and amplify these concepts. There is a great deal of value in those remaining pages because the empirical designs used for assessment, and the testing carried out with detailed modeling, go far beyond the conceptual discussion in the first part of the chapter. Unfortunately we cannot review these because the details of the designs, results, and models would require space in this introductory article approaching that in the original chapter.

All in all, the Atkinson-Shiffrin chapter was a major step forward in comparison to less comprehensive and simpler models that prevailed at that time. Pertinent today is the fact that it is far more complex and worked out than the simple portrayals of the “modal model of memory” found in most textbooks.

Developments since the 1968 model

The Atkinson-Shiffrin chapter had a strong influence on many prominent memory models developed since the 1970s. Of course, Shiffrin and his colleagues are among those who pursued such developments and refined and extended the model. Here we will briefly highlight several of these models, the SAM model of Raaijmakers and Shiffrin (1980, 1981), the recognition model of Gillund and Shiffrin (1984), the REM model (Shiffrin & Steyvers, 1997), the One-Shot-of-Context model proposed by Malmberg and Shiffrin (2005), the SARKAE model of Nelson and Shiffrin (2013), and the dynamic model of Cox and Shiffrin (2017). These models build in a cumulative way on the underlying structure derived from the 1968 model.

The SAM model

The development of the SAM theory (Search of Associative Memory) started in 1978. The initial goal was to develop an extension of the search model proposed in the 1968 Atkinson-Shiffrin chapter and more extensively described in Shiffrin (1970). It was, however, quickly realized that the potential of the model was far greater and that the same architecture could be used to model other paradigms such as cued or paired-associate recall and recognition. Specific models based on the SAM theory were later developed for recognition (Gillund & Shiffrin, 1984), interference and forgetting (Mensink & Raaijmakers, 1988, 1989), and spacing and repetition effects (Raaijmakers, 2003).

The SAM model shares a number of assumptions with the Atkinson-Shiffrin model, including the notion of an STS buffer as a model for rehearsal processes and the assumption that storage in LTS is a function of the nature and duration of rehearsal in STS (Raaijmakers, 2008). The most important innovation was the explicit introduction of the notion of retrieval cues and the specification of how these were used to direct memory search. Rather than using a single strength value as in the original model, SAM assumes that different types of information are stored in the memory trace. A critical aspect of the model was the notion that not just item and inter-item information are stored but also context information. The simple strength value that was used in the Atkinson-Shiffrin model was replaced in SAM by an activation value that was equal to the product of the association strengths between the cues used during a specific memory search and the stored memory traces. This rule implies that the search set for a probe using cues X and Y is formed by the intersection of the search sets for each cue separately. The activation value so defined was then used in the same sampling and recovery equations that were the cornerstone of the Atkinson-Shiffrin model for free recall (see also Shiffrin, 1970).
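
In slightly simplified notation, the sampling and recovery rules at the heart of SAM can be written as follows, where S(Q_j, I_i) denotes the strength of association between retrieval cue Q_j (context, category, or a just-recalled item) and stored trace I_i, and W_j is an attention weight given to cue j; the notation here is our paraphrase rather than the published equations verbatim.

```latex
% SAM's sampling and recovery rules, in slightly simplified notation.
% S(Q_j, I_i) = strength of association between cue Q_j and trace I_i; W_j = cue weight.
\[
P_S(I_i \mid Q_1,\ldots,Q_m) =
\frac{\prod_{j=1}^{m} S(Q_j, I_i)^{W_j}}
     {\sum_{k} \prod_{j=1}^{m} S(Q_j, I_k)^{W_j}}
\qquad\text{(sampling)}
\]
\[
P_R(I_i \mid Q_1,\ldots,Q_m) = 1 - \exp\!\Big(-\sum_{j=1}^{m} W_j\, S(Q_j, I_i)\Big)
\qquad\text{(recovery)}
\]
```

The multiplicative numerator is what restricts the search set to traces associated with all of the cues, and the denominator, the summed activation across traces, reappears below as the familiarity signal in the Gillund and Shiffrin (1984) recognition model.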

A property of both the initial memory search model (Shiffrin, 1970) and the SAM model is the proposal that retrieval is a function of both the relative and the absolute strength of the target item. That is, the probability of a successful retrieval decreases as the number of items on the list increases (decreasing the relative strength of the target) and increases as the target item is studied longer or in a more elaborative way (increasing its absolute strength). Mensink and Raaijmakers (1988) showed that this property enabled SAM to predict a number of otherwise hard-to-explain findings in the interference literature. For example, the finding that in an A-B, A-C interference paradigm there is an advantage for the interference condition compared to the control condition in the latency of recall even when the two conditions are equated in terms of probability of recall (Anderson, 1981) can be easily explained by a model in which the latency of correct recalls is a function of the relative strength only (as it is in Shiffrin’s memory search model as well as in SAM; see Mensink & Raaijmakers, 1988, p. 450). Impressive support for this memory search model was obtained by Rohrer and Wixted and their colleagues (Rohrer & Wixted, 1994; Wixted & Rohrer, 1994; Rohrer, 1996; Wixted, Ghadisha, & Vera, 1997). Rohrer and Wixted (1994) showed that the characteristics of the cumulative recall curves that they had observed closely matched those predicted by the SAM model more than a decade earlier. They also demonstrated that in free recall the mean latency for a list of n strong items is equal to that of a list of n weak items, even though the probability of recall will be much higher for the stronger list. Wixted, Ghadisha, and Vera (1997) replicated these results and also showed that in mixed-strength lists, the stronger items will be recalled faster than in pure-strength lists and the weaker items will be recalled slower than in pure (all-weak) lists. All of these properties can be derived mathematically from a simplified random sampling model based on relative strengths, but Rohrer (1996) showed that they also hold for more complex random sampling models with variable strengths and a recovery threshold (as in SAM).
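
The distinction can be seen directly in a one-cue version of the sampling and recovery rules given above: uniformly strengthening every trace leaves the sampling probabilities, and hence the order and latency of recall, unchanged, while recovery (and thus the probability of recall) improves. The factor c below is introduced only for this illustration.

```latex
% Multiply every strength S_k by a constant factor c > 1 (uniform strengthening):
\[
P_S(I_i) = \frac{c\,S_i}{\sum_{k} c\,S_k} = \frac{S_i}{\sum_{k} S_k}
\quad\text{(relative strength, and hence sampling, unchanged)}
\]
\[
P_R(I_i) = 1 - e^{-c\,S_i} > 1 - e^{-S_i}
\quad\text{(absolute strength, and hence recovery, improved)}
\]
```

This is the sense in which the mean latency of free recall can be the same for pure lists of strong and weak items even though the probability of recall differs between them.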

The SAM model elevated context to a central and critical role in storage and retrieval, a role that has only grown in empirical and theoretical importance in the years since (particularly as seen in the modeling of Kahana, Howard, and their students and colleagues, for example in Polyn, Norman, & Kahana, 2009; see also Klein, Shiffrin, & Criss, 2007). The SAM model specifically separated out context and content information and used them multiplicatively in retrieval. There are many reasons for treating context and content as having differentiable roles, but a key difference lies in the degree to which context and content are stored and used in retrieval implicitly or explicitly. Context often consists of “background” information that is not experimentally varied and is not the focus of the task. For example, the task might require words to be remembered, but the font in which the words appear, the color of the computer monitor background, and the ambient noise in the experimental setting (and much more along these lines) are usually not varied and not the focus of the task; yet some of this information is stored in memory and used as retrieval cues.

The proposal in SAM of contextual retrieval cues was initially made to ensure that the memory search focused on the most recently studied list (as in most memory experiments). The basic idea of context as a retrieval cue was very much “in the air” at the time when SAM was being developed (Bower, Monteiro, & Gilligan, 1978; Smith, Glenberg, & Bjork, 1978; Smith, 1979), but this was the first time that the idea was integrated into a formal model of memory. Whereas in the Raaijmakers and Shiffrin (1981) analysis of free recall a constant context was assumed during presentation and testing of a single list, Howard and Kahana (1999) made the reasonable assumption that context varies even within a single list and that upon retrieval of a specific trace not just the item information would be retrieved but also the stored context information. They showed how such a model could account for a number of detailed aspects of recall processes.

Later studies showed this to be a highly useful approach. For example, Mensink and Raaijmakers (1988) showed that it provided a mechanism that allowed the model to deal with many classical interference and forgetting phenomena. The same model was also used to provide an explanation for spacing and repetition effects (see Raaijmakers, 2003). While these models focused on gradual and more or less automatic contextual changes within an experimental session, the same framework could be used to model situations where the changes in context are more abrupt. A common hypothesis is that participants are able to construct a new mental context if the situation makes it necessary to separate the currently studied items from previously studied ones. Such an idea was used by Sahakyan and Kelley (2002) to account for directed forgetting phenomena. Malmberg, Lehman, and Sahakyan (2006) implemented this hypothesis in a model based on SAM (and REM, see below), and showed that the model accounted well for the existing data. Similarly, Jonker, Seli, and MacLeod (2013) proposed an explanation for retrieval-induced forgetting based on the idea that items presented in different phases of the experiment get associated to different contexts. In sum, the SAM model and its variants greatly extended the explanatory power of the original Atkinson and Shiffrin model.

Initial research on recognition memory

Around the same time that Shiffrin was creating the search model of retrieval during recall, Atkinson began to describe a model of recognition memory (Atkinson, Herrmann, & Wescourt, 1974; Atkinson & Juola, 1973, 1974; Juola, Fischler, Wood, & Atkinson, 1971). For a number of years, recognition was viewed as a simpler task than recall because it does not necessarily require the generation of episodic details from memory, and hence some of the problems encountered by the verbal-learning theorists were empirically addressed by extending that research to recognition memory testing (see Crowder, 1976, for a review). Specifically, signal-detection models seemed sufficient, whereby the response generated by the subject was based on the strength or familiarity of the stimulus (Egan, 1958; Parks, 1966). However, Mandler, Pearlstone, and Koopmans (1969) proposed that recognition could be performed either by assessing stimulus familiarity or by recollecting episodic details associated with the stimulus. These models became known as dual-process models. According to Atkinson and Juola’s model, recognition decisions are made quickly when the familiarity of the stimulus either falls below a low decision criterion or exceeds a high decision criterion; when familiarity falls between the two criteria and is thus insufficient for making a decision, a slower search of memory is conducted for episodic details that can further inform the decision. This decision model influenced not only subsequent models of recognition memory (cf. Malmberg, 2008; Ratcliff & Murdock, 1976) but also models of categorization (Smith, Shoben, & Rips, 1974).

A shortcoming of the signal-detection model was that it did not describe how the familiarity of the stimulus was obtained. In the 1980s, a new wave of formal models of familiarity was developed, marking a highpoint for mathematical psychology (Hintzman, 1988; Humphreys, Bain, & Pike, 1989; Murdock, 1982). Collectively, these models became known as global-matching models. Like Atkinson and colleagues, Gillund and Shiffrin (1984) assumed that the familiarity of the stimulus is generated via a parallel activation of traces, and they wed this model of recognition to the model of retrieval described by Raaijmakers and Shiffrin within the SAM framework. The elegant blend of search and familiarity generation was achieved by assuming that the two processes were not independent but intimately linked: For search, the probability of sampling a given trace was defined by the ratio of that trace's activation to the sum of such activations across all (above-threshold) traces. As described above, activation was due to a match of the content and context cues in the probe to the content and context information stored in the trace. Recognition was assumed to be a mixture of “familiarity” and search processes, the degree of each varying due to different task demands. Critically, the familiarity component was assumed to be the denominator of the sampling rule.
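
In the notation of the SAM equations given earlier, this amounts to basing the recognition decision on the global (summed) activation that serves as the denominator of the sampling rule; the following is our paraphrase of that assumption rather than Gillund and Shiffrin's exact notation.

```latex
% Familiarity as global matching: the summed activation across all traces, i.e., the
% denominator of SAM's sampling rule, is compared to a decision criterion.
\[
F(Q_1,\ldots,Q_m) = \sum_{k} \prod_{j=1}^{m} S(Q_j, I_k)^{W_j},
\qquad \text{respond ``old'' if } F > \text{criterion}
\]
```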

The conceptual basis of the Atkinson and Shiffrin model and the SAM model requires that recall based on a memory search be an option during a test of recognition. Gillund and Shiffrin were unable to identify data that required the dual-process assumption, partly because Gillund and Shiffrin focused their modeling on the accuracy of recognition, whereas Atkinson and colleagues focused on the speed of recognition. In addition, Gillund and Shiffrin focused on standard item recognition, whereas Mandler et al. focused on more complex recognition tasks. The question concerning the extent to which recall and familiarity affect recognition has been the subject of intense debate ever since, and the relationship between the speed and accuracy of recognition proved to be critical in subsequent developments of the dual-process approach. We will return to this topic after introducing the retrieving effectively from memory family of models.

The Retrieving Effectively from Memory (REM) model

The assumption that recognition was based on global familiarity (the summed activation across all memory traces) proved highly useful and, as described below, seems to describe performance in a wide range of simple recognition tasks. However, Ratcliff, Clark, and Shiffrin (1990) failed to find a critical effect predicted by the model, the so-called list-strength effect, a null result at variance with almost the whole family of extant global-matching models. The list-strength effect refers to the prediction that recognition performance should decrease if the strength of the other items on the list increases, just as performance decreases with increases in the number of other items. Both of these should increase noise (or variability) and hence should decrease the signal-to-noise ratio. This misprediction was resolved in a plausible and elegant fashion by assuming that as items get stronger they become less similar to other traces; that is, they become differentiated from them. The drop in similarity causes the activation of competing non-target traces to decrease (Shiffrin, Ratcliff, & Clark, 1990).

While this assumption was conceptually plausible, Shiffrin and Steyvers (1997) developed a more principled solution (the REM model) that leads to this prediction. Their solution was based on the idea that the system makes an optimal decision, optimal in the sense of taking into account the information stored in memory and the rules that govern such storage (the assumptions that the model makes about memory storage). To be more specific, the REM model incorporated this idea by assuming that recognition decisions are based on a rational, Bayesian decision process. In order to implement this approach, REM adopted multidimensional traces to represent past events and knowledge. The assumption was not novel; models of categorization and indeed almost all the other global-matching models assumed multidimensional representations. However, new theoretical power was created when they were combined with the Bayesian/rational architecture of REM. The basic idea is straightforward: Trace activation rises when more features match between memory probe and trace, and drops when more features mismatch. If a trace is stored more strongly and it differs from a probe (a test of a different item), then mismatching features increase and activation decreases (see also Criss, 2006; Criss & McClelland, 2006). For a recent application of the differentiation mechanism to understanding the consequences of testing memory, see Kılıç, Criss, Malmberg, and Shiffrin (2017).
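
In simplified form, and using our paraphrase of the published notation, REM's recognition decision can be written as follows: each stored trace j is assigned a likelihood ratio lambda_j reflecting how likely the observed pattern of matching and mismatching features would be if trace j were a (noisy, incomplete) copy of the probe rather than a trace of some other item, and the decision is based on the average of these ratios.

```latex
% Simplified sketch of REM's Bayesian recognition rule (our paraphrase of the published notation).
% g = base rate of the geometric feature distribution; c = probability that a stored feature
% was copied correctly; n_jq = number of stored features of trace j that mismatch the probe;
% the product runs over matching features with value v.
\[
\lambda_j = (1-c)^{\,n_{jq}} \prod_{\text{matches}}
\frac{c + (1-c)\,g(1-g)^{\,v-1}}{g(1-g)^{\,v-1}},
\qquad
\Phi = \frac{1}{n}\sum_{j=1}^{n} \lambda_j,
\qquad \text{respond ``old'' if } \Phi > 1
\]
```

Differentiation then falls out of the decision rule rather than being added to it: a more completely stored trace of a different item contributes more mismatching features (a larger n_jq), so its likelihood ratio, and hence its contribution to the noise in the global odds, decreases.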

Recognition and recall in REM

Interestingly, this Bayesian solution leads to a global-matching model for recognition in which the decisions are based on the average likelihood ratio of all memory traces (each trace's likelihood ratio reflecting how well it matches the test probe). Note that this is similar to the SAM model if one substitutes likelihood ratios for SAM's strength values, although SAM's strength values were a function only of the overlap of features, the number of matching features. The similarity between the REM and SAM models for recognition (both based on a global sum over all memory traces) also suggested that the REM model might be generalized to recall if one substitutes REM's likelihood ratios for SAM's activation values. This approach was first investigated by Diller, Nobel, and Shiffrin (2001) and greatly extended by Malmberg and his colleagues (e.g., Lehman & Malmberg, 2009, 2013; Malmberg & Shiffrin, 2005). They applied this model to unintentional and intentional forgetting (directed forgetting) and also formulated a new version of the buffer model. This SAM-REM model represents the most sophisticated model that grew out of the framework set forth by the Atkinson-Shiffrin 1968 model. This version of the model has become more detailed and more rigorously specified, but nonetheless shares many of the basic assumptions with the original Atkinson-Shiffrin model; to name a few, a buffer model for rehearsal in short-term memory, experiences stored as separate memory traces, and recall based on a probabilistic sampling process that in turn is based on the activation values of the individual memory traces.

With the development of the REM models of familiarity and search, research once again turned to the influence of these processes on recognition. Indeed, new methods for testing recognition memory and for analyzing the data led to a resurgence in research on the topic. Many researchers endorsed models assuming that recognition is mostly due to familiarity alone (Dunn, 2004; Wixted, 2007), while others endorsed dual-process models (Reder et al., 2000; Tulving, 1983; Yonelinas, 1994, 2002). Atkinson and Shiffrin suggested in 1968 that the focus should not be on which recognition process is correct, but rather what mix of these processes is best suited for a given task, an approach still likely optimal today.

A good example of this approach occurred when Malmberg and Shiffrin published two articles featuring extensive formal modeling in REM in the same issue of JEP:LMC. The Malmberg, Zeelenberg, and Shiffrin (2004) article rebutted research suggesting that use of the benzodiazepine midazolam selectively impaired the search component of the dual-process model by showing that the REM model of familiarity predicted the complex set of observations on the assumption that midazolam impaired encoding of traces by introducing noise to memory traces (Hirshman et al., 2002). On the other hand, Malmberg, Holden, and Shiffrin (2004) found that a dual-process model was required to account for the registration-without-learning phenomenon (Hintzman, Curran, & Oppy, 1992). The upshot was concrete evidence within the framework of the Atkinson and Shiffrin theory that at times recognition was driven by the familiarity of the stimulus, while at other times the full outcome of the search process was more important.

To reconcile the different conclusions and to relate the different recognition models within a coherent framework, Malmberg, Holden, and Shiffrin (2004) noted that recognition paradigms in which foils produce about as much familiarity as targets likely would require an additional recall process, as when the task uses words and requires decisions concerning the plurality of a studied word. This approach was extended to associative recognition, which is another task that requires the discrimination of targets from otherwise familiar foils (Xu & Malmberg, 2007), and was investigated by various means of testing memory in different contexts (Malmberg & Xu, 2007). For instance, the search process seemed to play a more important role when all foils were similar to a target versus when testing only involved a few similar foils, and when the recognition decision required a confidence rating in addition to a yes-no decision. In addition, speed-accuracy trade-off functions show that under conditions when targets and foils are similar, accuracy improves in a non-monotonic fashion, suggesting that the search component of retrieval requires additional time to provide the episodic details required to reject familiar foils (Dosher, 1984; Gronlund & Ratcliff, 1989). Malmberg (2008) synthesized these findings in an integrated dual-process framework in which it is assumed that a control process governs the contribution of familiarity and search in a manner that makes recognition performance most efficient with respect to the goals of the subject and the conditions of testing. It is interesting to note that the initial name of REM was in fact the retrieving efficiently from memory theory.

Integration of implicit and explicit memory in REM

Soon after the REM model was developed, it was realized that it could be generalized to a number of other memory paradigms, including paradigms such as lexical decision and implicit memory. Schooler, Shiffrin, and Raaijmakers (2001) showed how the model could account in a simple way for priming effects (i.e., implicit memory) in perceptual identification, and Wagenmakers et al. (2004) developed a model based on REM for lexical decision. A key innovation of these models was the concrete description of episodic and lexical/semantic memory traces. In prior models, the focus was on incomplete and error-prone episodic traces associated with a single learning context (or a small number). The models of implicit memory and lexical decision focused on access to lexical/semantic traces, which were assumed to be relatively complete and accurate representations of knowledge. Importantly, lexical/semantic traces represent not only knowledge about an item but all the contexts in which that item has been encountered or used. In this sense, knowledge is assumed to be decontextualized and therefore readily available for use. Priming is predicted on the assumption that each time a word is encountered new contextual elements are added to its lexical/semantic trace, making it more available than it would be if the word had not been recently used.

Malmberg and Shiffrin (2005) reviewed a number of tasks in which testing was either explicit or implicit, and, in general, explicit memory performance increased if items were given immediate repetitions (or longer study times) or spaced repetitions at study. Yet massed and spaced study often produced quite different patterns of results in the performance of implicit memory tasks. Specifically, whereas spaced repetitions enhanced priming, massed repetitions or increases in study time did not. Interestingly, a similar pattern of results had been observed in the context-dependent memory literature by Murnane and Phelps (1995), a finding that at the time was not considered important. However, when taken together with the results from the implicit memory literature, it suggested that a fixed amount of context is stored each time a word is encountered, in both the newly formed episodic trace and the existing lexical/semantic trace. Hence, spaced repetitions increase the amount of context stored in memory, whereas other strengthening operations do not. This became known as the “one-shot” hypothesis.

The one-shot hypothesis could explain differences between massed and spaced study on priming, but needed validation in some other setting. To test the one-shot hypothesis, Malmberg and Shiffrin carried out a series of studies based on the list-strength findings of Ratcliff, Clark, and Shiffrin (1990) and the theory in Shiffrin, Ratcliff, and Clark (1990). The new studies were designed to test the hypothesis that in tasks requiring memory for content, content would be given explicit encoding, and hence storage in memory, during the entire period the item was available for study, but that context would be stored only for a brief period of time (a second or two) for each separate spaced presentation. The results provided compelling evidence that context was indeed stored differently than content, getting “one shot” of automatic storage (for a second or two) upon each presentation of an item. Thus, longer study, massed repetitions, and elaborative rehearsal would lead to additional encoding of content that would continue during the period of the presentation, whereas context would be stored only for the first second or two of a given presentation, though it would be stored again at each subsequent presentation.
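
The contrast can be illustrated with a minimal sketch in which content features accrue throughout each presentation while context features are stored only during a brief window at the onset of each separate presentation; the function name, parameters, and values below are ours, chosen only to illustrate the logic rather than taken from the published model.

```python
def stored_features(presentations, seconds_per_presentation,
                    content_rate=3.0, context_shot=2.0, shot_window=1.0):
    """Toy illustration of the 'one-shot of context' idea described above; parameter
    names and values are illustrative, not the published model's. Content features
    accrue throughout each presentation; context features are stored only during a
    brief window (shot_window seconds) at the start of each separate presentation."""
    content = content_rate * presentations * seconds_per_presentation
    context = context_shot * presentations * min(seconds_per_presentation, shot_window) / shot_window
    return content, context

# Two spaced 2-s presentations vs. one massed 4-s presentation:
print(stored_features(presentations=2, seconds_per_presentation=2.0))  # same content, more context
print(stored_features(presentations=1, seconds_per_presentation=4.0))
```

Under these illustrative assumptions, two spaced 2-s presentations store the same amount of content as one massed 4-s presentation but twice the context, which is the pattern the one-shot hypothesis uses to explain why spacing, but not massing or longer study, enhances priming.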

SARKAE

The relationship between experience and knowledge continued to be the subject of intense research. Nelson and Shiffrin (2013) “closed the loop” between encoding and retrieval, and between episodic short-term traces and weak and strong long-term traces in an article titled “The Co-Evolution of Knowledge and Event Memory.” The model was termed Storage and Retrieval of Knowledge and Events, with acronym SARKAE. It started with the assumption that events are stored individually, as contextually defined episodic traces, in both short-term memory and long-term memory. It then described the way that knowledge is formed through accrual of individual events that are sufficiently similar (as when an item is repeated). On the other hand it described how an event occurrence accesses and retrieves knowledge and thereby produces coding of the features that represent the event in short-term memory.

The theory was supported by two studies in which novel items, Chinese characters, were learned over the course of several weeks, with individual characters learned to different degrees. The first study used a visual search task for training. In this study the effects of frequency could have been due to increased similarity among high-frequency characters arising from their increased co-occurrence. The second study trained by having subjects make perceptual matching decisions for the same character in slightly differing physical forms, eliminating co-occurrence. The training in both studies was followed by tests of episodic recognition memory (a traditional episodic memory task), pseudo-lexical decision (tapping access to knowledge), and forced-choice perceptual identification (a form of perception). The large effects of training frequency in both studies demonstrated an important role of pure frequency in addition to differential context and differential similarity. The SARKAE model was implemented quantitatively and applied to all three transfer tasks, bridging the usual research and theory gap between perception, short-term memory, and long-term retrieval.

Cox and Shiffrin (2017): A dynamic approach to recognition memory

Space only allows a hint of this research. For most of the 50 years since Atkinson and Shiffrin, the “micro-structure” of encoding of presented events, and the behavioral consequences of encoding that evolves over short time periods (say 1 s or less), have been ignored in memory modeling. Even in models that jointly predict accuracy and response times, the typical approach has ignored the differential time course of perception of individual features and groups of features (e.g., Ratcliff, 1978). Cox and Shiffrin assumed that features arrive over time, with certain types of features arriving more slowly than others. These features arrive on the basis of retrieval from knowledge. To model recognition memory, the model assumes that at each moment in time the then-current features, including context, are compared to event traces that had been stored previously. This comparison produces a current value of “familiarity” that changes as new and different features are encoded and join the probe in short-term memory. The resultant value of familiarity “saturates” as all the features in the current event become encoded. Thus there are decision boundaries for “old” and “new” responses that converge in proportion to the degree of expected saturation. The resultant model predicts a variety of findings that had not been explained previously, especially findings from signal-to-respond experiments. A key to the success of the predictions is the idea that certain features are encoded earlier than others. Thus physical features like shape are encoded before higher-level features such as meaning and associations. This research produced novel insights regarding word frequency, speeded responding, context reinstatement, short-term priming, similarity, source memory, and associative recognition, revealing how the same set of core dynamic principles can help unify otherwise disparate phenomena in the study of memory. Yet this model builds in cumulative fashion on the sequence of models and the core assumptions originating with the Atkinson and Shiffrin chapter in 1968.
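
The general flavor of these dynamics, though not the published model itself, can be conveyed by a toy sketch in which probe features become available one at a time, momentary familiarity is a running tally of matches against a stored trace, and the response bounds shrink as encoding approaches saturation; all names and parameter values below are ours and purely illustrative.

```python
import random

def dynamic_familiarity_toy(match_prob, n_features=20, arrival_prob=0.25,
                            start_bound=4.0, steps=30, seed=0):
    """Toy sketch of the dynamic ideas described above (illustrative only; not the
    published Cox & Shiffrin model). Probe features become available one at a time;
    each newly available feature matches the stored trace with probability match_prob;
    momentary familiarity is the running count of matches minus mismatches; and the
    'old'/'new' bounds collapse as the proportion of still-unencoded features shrinks."""
    rng = random.Random(seed)
    available, familiarity = 0, 0.0
    for step in range(steps):
        if available < n_features and rng.random() < arrival_prob:
            available += 1
            familiarity += 1.0 if rng.random() < match_prob else -1.0
        bound = 1.0 + start_bound * (1.0 - available / n_features)  # collapsing bound
        if familiarity >= bound:
            return "old", step
        if familiarity <= -bound:
            return "new", step
    return ("old" if familiarity > 0 else "new"), steps

print(dynamic_familiarity_toy(match_prob=0.9))   # probe matching a studied trace
print(dynamic_familiarity_toy(match_prob=0.3))   # probe matching no studied trace well
```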

Challenges and alternatives to the theory

We have no desire to defend a theory that is now fifty years old against changes sparked by new findings and new ideas. All theories and models in all domains of science are wrong, but the good ones are useful. Due to the inherent variability in human behavior, theories and models of behavior are particularly crude approximations to reality. Thus, as we consider challenges and alternatives to Atkinson and Shiffrin, they should be judged by the degree to which they add to or subtract from the usefulness of the theory.

Working memory

“The short-term store is the subject’s working memory; it receives selected input from the sensory register and long-term store” (Atkinson and Shiffrin, 1968, p. 90)

Conventional wisdom pits Atkinson and Shiffrin’s modal model against Baddeley and Hitch’s (1974) working memory model. However, we regard this debate as one more of perspective than reality. The working memory model is an instance of the modal model’s short-term memory store and control processes. Although there are differences, the similarity between the auditory-verbal-linguistic store and the phonological loop, and the similarity between the concept of the central executive and the concept of control processes, make it clear that it will be difficult to empirically distinguish the two versions. Of course, the experimentation generated by the debate concerning working memory was very useful, leading to a better understanding of the different forms of short-term memory, capacity limitations, and control processes. An example is found in the specification of a visual short-term store. Atkinson and Shiffrin considered the then extant evidence for such a store, but at the time the evidence was indeterminate. Following the research by Shepard and Metzler (1971) and others (e.g., Jonides, Smith, Awh, Minoshima, & Mintun, 1993), it became more and more apparent that something like a visuo-spatial sketchpad was operating as a working memory.

Four outstanding issues: Rehearsal, the short-term store, continuous distraction, and contiguity

On many occasions the Atkinson and Shiffrin chapter has served as a benchmark against which new ideas are measured, and in place of which alternatives have been proposed. Part of the influence of the chapter was due to the use of formal models consistent with the overall framework to test the ideas within settings from a wide variety of experiments. This was a somewhat novel approach at the time, and set a standard for publications in the best journals that lasted for many years. Despite its influence, or perhaps due to it, the Atkinson and Shiffrin framework has been routinely criticized. This criticism could itself be viewed as a success, given that the goal of science should be progress, and everyone should want to see old ideas be refined or replaced. Thus, the fact that most introductory textbooks published in the past couple of decades present the dual-store model as disconfirmed in one way or another testifies to the utility of the theory. At the same time, comprehensive theories of memory, be they cognitive (e.g., ACT-R, TODAM, or connectionist models) or neuroscientific (O’Reilly, 2006), adopt the central tenets of the approach, such as a dual-store framework and the fundamental importance of control processes. In addition, some of the criticisms are based on confusions and these deserve clarification (see Raaijmakers, 1993, for a discussion of the sources of confusion).

Rehearsal

A fundamental characteristic of the Atkinson and Shiffrin model is that the contents of STM are under the control of the subject. The results of Rundus (1971) clearly linked rehearsal with the ability to freely recall from long-term versus short-term memory. The fact that there are substantial benefits of one sort of rehearsal, often termed elaborative, and that the chapter placed greatest emphasis on a type of maintenance and rote rehearsal, has led some to reject the Atkinson and Shiffrin framework (Craik & Lockhart, 1972; Craik & Tulving, 1975). Yet this very point was made in the chapter:

“When the subject is concentrating on rehearsal, the information transferred would be in a relatively weak state and easily subject to interference. On the other hand, the subject may divert his effort from rehearsal to various encoding operations which will increase the strength of the stored information.” (p. 115)

This prediction, that there are modest benefits of maintenance rehearsal and added benefits of elaborative rehearsal, has of course been confirmed many times. In addition to the original experiments reported by Atkinson and Shiffrin, Nelson (1977) directly tested the effects of maintenance rehearsal and levels of processing on free recall, cued recall, and item recognition. For all three memory tasks, both a semantically oriented encoding task and the amount of time the subject spent encoding during study benefited memory. Such findings notwithstanding, the results of several item recognition experiments, using manipulations very similar to those used by Craik and Tulving for free recall, showed that recognition accuracy improves with increases in maintenance rehearsal (e.g., Glenberg, Smith, & Green, 1977; see also Darley & Glass, 1975). This led researchers to ask why maintenance rehearsal benefits recognition and not free recall. However, this was the wrong question. Lehman and Malmberg (2013) reanalyzed Craik and Tulving’s results and found that increases in maintenance rehearsal actually improved free recall, confirming Atkinson and Shiffrin’s speculation in contrast to Craik and Tulving’s original conclusion. Other results thought to provide evidence against the role of maintenance rehearsal in storage are actually predicted by the model. For instance, Wixted and McDowell (1989) found that extending rehearsal was beneficial to free recall only when it was provided in the beginning or middle of a study list. These results suggested to them that additional rehearsal given to end-of-list items does not affect their long-term storage. However, the results are actually quite consistent with the buffer model accounts of classical results such as those of Rundus (1971) and Murdock (1962). That is, buffer models predict that additional rehearsal of end-of-list items would have no effect on immediate recall, because recall would be initiated by retrieving the items still being rehearsed; increases in the amount of time devoted to rehearsing an item strengthen the encoding of its long-term episodic trace, but additional rehearsal does not affect the representations of the items currently held in the buffer.

Malmberg and Shiffrin (2005) accounted for levels of processing effects on the assumption that maintenance rehearsal tends to produce storage of information about the physical form of the stimulus, but elaborative rehearsal produces greater storage of meaning. When combined with the plausible idea that retrieval cues tend to be dominated by meaning information, the joint effects of maintenance and elaborative rehearsal are no surprise. Additional findings consistent with this hypothesis, showing the joint effects of form and meaning encoding, were reported by Criss and Malmberg (2008).

Retrieval from short-term store

The Wixted and McDowell results mentioned in the previous paragraphs highlight another assumption of the Atkinson and Shiffrin chapter, that the traces in STM are in a privileged state, allowing them to be retrieved easily, with little interference from traces in LTM. A criticism of this assumption is found in Cowan (1998, following Shiffrin, 1973), who proposed that items in the focus of attention are simply in a relatively active state in LTM and not immune from interference from other traces in LTM. Lehman and Malmberg (2013) provided a direct test of the privileged-state assumption. In their experiment, the length of the study list was varied over an extensive range, and memory was tested via immediate free recall. The critical data concerned the probabilities of first recall as a function of serial position. As usual, items from the recency portion of the serial position curve were most likely to be recalled, but they were also most likely to be recalled first, and this first-recall probability was unaffected by the length of the study list. Moreover, the amount of time it took subjects to recall the first item was unaffected by the length of the study list. Given that more items were studied on long lists than short lists, these findings indicate that the long-term traces stored during study did not interfere with the retrieval of traces from STM.

Continuous distraction

For over 40 years, the effect of continuous distraction has been widely believed to be problematic for the Atkinson and Shiffrin model: Interpolated, attention-demanding tasks during the study of a list of items produce a normal-looking free-recall serial-position curve, with pronounced recency and reduced probability of recall for positions prior to the recency portion (Baddeley & Hitch, 1974). In addition, there were findings that with a distracting task, the recency portion of the serial position curve remains even after a delay that would normally eliminate the recency effect (Bjork & Whitten, 1974). This result casts doubt on the explanation given by Atkinson and Shiffrin that the lack of recency in delayed free recall is due to the removal of items from the rehearsal loop, and Bjork and Whitten suggested that the combined results were better explained by a temporally based retrieval process. However, a different explanation is also available: The long-term recency observed in the presence of continuous distraction could be due to changes in context during study coupled with a test probe that uses primarily recent context. Such an assumption can also explain the dissociations that have been observed between the short-term and long-term recency effects (see Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann, & Usher, 2005; Raaijmakers, 1993). Lehman and Malmberg (2013) demonstrated that the SAM-REM model indeed accurately accounts for both the short-term and long-term recency effects.

Contiguity

In 1996, Kahana reported a robust tendency to recall items from adjacent serial positions during free recall, which he referred to as the lag-recency effect. Interestingly and reminiscent of Raaijmakers and Shiffrin’s work on part-list cuing, Kahana worked within the SAM framework to show that lag recency could be explained by a model that assumes information retrieved from memory on prior trials is used on subsequent trials as a retrieval cue. Hence, items from adjacent and nearby serial positions tend to be recalled in proximity because they were co-rehearsed and inter-item associations were created between them. However, this model was abandoned in preference for a model that accounted not only for short-term recency and lag-recency effects but also long-term recency and lag-recency effects (Howard & Kahana, 2002). The long-term lag-recency effect may be accounted for within the SAM framework by assuming that upon retrieval of an item the subject not only uses that item as a subsequent retrieval cue (as assumed in SAM), but also recovers context information from the retrieved memory trace and uses that context information as the new context cue. There is one aspect of the data presented by Kahana and his colleagues that cannot be so easily explained within the SAM model, namely the clear tendency for both recency and non-recency items to be recalled in a forward order. For example, in a typical free recall paradigm, recall of item N is more often followed by recall of item N+1 than by recall of item N-1. Such a tendency was not predicted by the Atkinson and Shiffrin model, nor by SAM or REM. However, there are plausible accounts other than the context account. For example, items (such as pairs) that are presented sequentially may be stored as an associated group with a forward coding, a coding that is used for retrieval order when the trace of the group is sampled. Some evidence favoring such an account over the context account was obtained by Lehman and Malmberg (2013), who showed that the usual lag-recency effect was eliminated when studied items were broken into two-item chunks.

Inhibition

Another type of criticism has been raised against the Atkinson-Shiffrin framework, a criticism that applies to all theories that attribute a major part of forgetting from long-term memory to interference due to competitive retrieval processes. This criticism comes from proponents of the inhibition account of forgetting (e.g., Anderson, 2003; Bäuml, 2008). According to this view, forgetting can be due to the active suppression of incorrect memory traces that are activated when one tries to retrieve the correct target trace. This active suppression hypothesis has been claimed to uniquely account for a large number of findings from various experimental paradigms, such as directed forgetting, part-list cuing, retrieval-induced forgetting, and the think/no-think paradigm. However, these claims have themselves been criticized by several researchers, including Lehman and Malmberg (2009), Verde (2012), and Raaijmakers and Jakab (2013), who point out alternative explanations for the findings that are consistent with the modal model accounts and more generally with accounts that attribute retrieval failure to competition and interference. Raaijmakers (2018) presents a review of the issues, concluding that what the field needs is a more formalized account of the inhibition hypothesis, echoing the strategy that Atkinson and Shiffrin advocated in their chapter.

Concluding comments

Much, probably most, of the Atkinson and Shiffrin model remains in regular use today, albeit sometimes under alternative terminology. Its success can be measured both by the alternative accounts it has sparked and by the research that has extended and refined the original.

In this article we have reviewed a number of the developments that have extended the Atkinson and Shiffrin framework. We and others have adapted the framework to accommodate new findings that have been discovered since 1968 (mirror effects in recognition, context effects, implicit memory, and many others). In addition, the use of mathematical and computational models has increased, owing not only to increased computational power, but also to the persuasive case made for the utility of such modeling in the chapter. The fact that research and modeling have moved on over the intervening 50 years is a statement of progress in science rather than a critique of the model. Because the 1968 framework captured many of the main processes of cognition, it is perhaps unsurprising that, despite all of the modifications and extensions, the models that evolved from the original still share most of the same basic elements, including the importance of control processes, the distinction between short-term and long-term memory, and the emphasis on memory search failures as a cause of forgetting.