Introduction

Some 50 years after its first publication, the paper by Atkinson and Shiffrin (1968) has been cited over 10,000 times (as of October 2018, source: Google Scholar) and continues to be influential in the development of cognitive psychology. We reflect on why this is the case, and what lessons can be learned regarding theory development in our field. As Atkinson and Shiffrin point out, their paper falls into two parts, the first of which comprises “a fairly comprehensive theoretical framework for memory which emphasises the role of control processes – processes under the voluntary control of the subject such as rehearsal, coding and search strategies” (pp. 190-191). The second part describes a series of models developed using this general approach.

Two aspects of their framework have influenced our own views and will form the bulk of our discussion. The first is their postulation of a short-term store of limited capacity, and the second is their proposal that this store acts as a ‘working memory’, playing a crucial role in a wide range of cognitive activities. Their initial section is followed by a detailed account of the development and testing of a series of models concerned with the role of a rehearsal buffer in long-term learning. Importantly, they describe this work not as a general theory but as exploring “a sub class of possible models that can be generated by the framework proposed”, emphasising that a range of other approaches is feasible within the general framework (Atkinson & Shiffrin, 1968, p. 191). The resulting system is comprehensive enough to provide a good general account of research on human memory in terms of a framework that is simple and coherent, yet open to more detailed exploration and subsequent modification without needing to be abandoned when unexpected results emerge.
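For readers unfamiliar with the rehearsal-buffer idea, the following is a rough sketch of the kind of model A & S explored, not their published equations. It assumes a fixed buffer capacity (`buffer_size`), random displacement of one resident item whenever a new item arrives at a full buffer, a constant gain in LTS strength (`theta`) for each presentation an item spends in the buffer, and recall from LTS with probability 1 - exp(-strength). The function name, parameter names and retrieval rule are hypothetical simplifications chosen for illustration.

```python
import math
import random


def simulate_free_recall(list_length=20, buffer_size=4, theta=0.2,
                         trials=5000, seed=1):
    """Toy rehearsal-buffer model: proportion of trials each serial position is recalled.

    Assumptions (illustrative only): a fixed-capacity buffer, random
    displacement of one resident item when a new item arrives at a full
    buffer, a constant LTS strength increment per presentation spent in the
    buffer, and recall from LTS with probability 1 - exp(-strength).
    """
    rng = random.Random(seed)
    recalled = [0] * list_length
    for _ in range(trials):
        buffer = []                      # items currently held in the rehearsal buffer
        strength = [0.0] * list_length   # accumulated LTS strength for each item
        for item in range(list_length):
            if len(buffer) == buffer_size:
                # Buffer full: knock out one resident item chosen at random.
                buffer.pop(rng.randrange(buffer_size))
            buffer.append(item)
            # Every item resident during this presentation gains LTS strength.
            for resident in buffer:
                strength[resident] += theta
        for item in range(list_length):
            if item in buffer:
                recalled[item] += 1      # still in the buffer at test: reported directly
            elif rng.random() < 1 - math.exp(-strength[item]):
                recalled[item] += 1      # otherwise retrieved from LTS, if at all
    return [count / trials for count in recalled]


curve = simulate_free_recall()
print([round(p, 2) for p in curve])
```

In this toy version, the first few items enter a partly empty buffer and so accumulate more LTS strength (a primacy effect), while the last few items are usually still in the buffer at test (a recency effect).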

The Atkinson and Shiffrin (A & S) framework became known as the modal model, although this term appears to have been originally proposed by Murdock (1967) in a paper that summarises a range of memory results and interprets them within a less developed information-processing model than that proposed by A & S. A & S summarise the many advances made in the study of memory over the previous decade, presenting them within a coherent broad framework that, we will argue, has stood the test of time. An important feature of the framework is its differentiation between memory structures and fluid ‘control processes’ that manipulate information within those structures. Finally, by proposing that the short-term store (STS) within their model acts as a ‘working memory’, the framework attempts to link the model to the world beyond the laboratory, although more by implication than by empirical investigation. As they acknowledge, this link was at the heart of Broadbent’s (1958) attempt to relate attention and short-term memory, a tradition that we ourselves have attempted to carry on, as have many others.

Assumptions of the Atkinson and Shiffrin (1968) framework

In thinking about this 50-year-old model, it is tempting to limit consideration to the simplified representation that has appeared in textbooks ever since, and to ignore the many underlying assumptions that have proved robust and important, allowing the framework to remain productive. We discuss these basic assumptions before considering aspects of the model that were less successful, observing that, rather than leading to the model's abandonment, as some approaches to theorisation might suggest, they proved to be growing points that allowed further extension and enrichment of the basic framework.

In an article that is highly critical of the lack of theory in current psychology, Gigerenzer (2010) stresses the importance of being aware of the assumptions underpinning theoretical development, contrasting psychology unfavourably with physics and economics. The latter is perhaps an unfortunate choice given the fallibility of its complex theoretical structures based on assumptions such as human rationality and the perfection of the market. As Keynes remarked, “it is better to be roughly right than precisely wrong”. Gigerenzer’s criticism cannot, however, be levelled at Atkinson and Shiffrin, who explicitly list the basic assumptions of their research framework, together with the evidence on which they are based. Some 50 years later, we can revisit them and see how well they have withstood the test of time. They are broadly as follows:

  • Atkinson and Shiffrin (A & S) propose a “general theoretical framework for human memory”.

  • Their system distinguishes between permanent structural features and readily modifiable, programmable control processes. We regard this as an important distinction, sometimes lost in later approaches that theorise memory as a by-product of ‘processing’ and object to the term ‘store’ as implying passive maintenance of the original experience (e.g. Craik & Lockhart, 1972). We ourselves suggest the need for both storage and processing: processes are certainly important, but they require some form of continuing maintenance over time, for which the term ‘storage’ is helpful.

  • A & S assume three structural components: a bank of sensory registers, a short-term store (STS) and a long-term store (LTS). They defend this on the basis of earlier research, notably including evidence from neuropsychological single-case studies. This assumption has subsequently been contested, particularly on the basis of neuroimaging studies; we return to this issue later. Our own view, however, is that the separation has continued to be well supported, although subsequent work has led to further fractionation of the three systems (see Baddeley, Eysenck, & Anderson, 2015). The sensory registers are assumed to differ across modalities, and have prompted further investigation of the role of both storage and processing within the relevant perceptual systems. The STS concept has been elaborated into a more complex working memory system (see below), while long-term memory has also been fractionated into semantic/episodic and implicit/explicit systems.

A & S accept that memory is likely to operate across a number of modalities, but focus on what they term the ‘audio-visual-linguistic system’, linking it directly to their proposed STS. This emphasis on verbal memory is understandable given that the vast bulk of experimental and theoretical work on human memory has involved such material. We would argue, however, that it is unfortunate that more effort has not, over the years, been made to explore the generality of the results of verbal studies, rather than simply regarding nonverbal memory as providing further potentially helpful features, as in Paivio’s (1971) dual coding hypothesis. This imbalance has recently begun to change, principally through the work of investigators interested in vision, often influenced by attempts to develop automatic object recognition systems (Brady, Konkle, & Alvarez, 2011; Isola, Xiao, Parikh, Torralba, & Oliva, 2014). Such theorisation has, however, tended to focus on stimulus characteristics rather than the activities of the rememberer, although recent work has attempted to combine research from the verbal and the visual memory traditions (see, e.g., Baddeley & Hitch, 2017; Evans & Baddeley, 2018).

  • A & S’s proposed framework assumes pathways from the sensory registers to STS and between STS and LTS, and emphasises the importance of control processes in modifying the flow of information through them, stressing their potential complexity and dependence on LTS. In practice, however, A & S focused on one particular control process, verbal rehearsal. While rehearsal can readily be demonstrated using an appropriate paradigm, it is far from optimal as a mechanism for long-term learning (Hyde & Jenkins, 1969), and a focus on it underestimates the role of more complex encoding strategies such as those demonstrated in levels of processing studies (Craik & Lockhart, 1972; Craik & Tulving, 1975).

  • It is important to note, however, that more complex methods of rehearsal remain entirely plausible within their system, which emphasises the flexibility and importance of the strategies adopted by participants, as exemplified by subsequent models within this tradition (e.g. Lehman & Malmberg, 2013; Raaijmakers & Shiffrin, 1981). Although strategy has not been extensively studied, it has continued to be accepted as potentially important within cognitive psychology, and is typically controlled through a sequence of experiments that carefully constrain potential processing strategies. Unfortunately, this practice has been much less common in neuroimaging studies, where the trouble and expense of running a series of experiments has tended to encourage reliance on the simplistic assumption that a single task reflects a single underlying concept, whereas few if any tasks are sufficiently process-pure to justify this assumption. As A & S (1968, p. 101) point out, “Both STS and LTS are active in both STS and LTS experiments”.

  • This issue is reflected in A & S’s distinction between the concepts of short-term memory (STM) and their proposed short-term store (STS). In their account, STM refers to a range of paradigms whereby small amounts of information are maintained over a limited period, whereas the term STS refers to a hypothetical storage system that may be involved to a greater or lesser extent in such STM paradigms. Hence, as Keppel and Underwood (1962) showed, the Peterson and Peterson (1959) task involving the retention of consonant triplets over delays of up to 18 s, initially regarded as a classic STM task, does in fact depend heavily on LTS, although the STS is also involved (Baddeley & Scott, 1971). Similarly, recency effects, initially regarded as a hallmark of the STS (Glanzer & Cunitz, 1966), can be found across a range of LTM and STM paradigms, and can better be seen as reflecting the application of a recency-based retrieval control strategy to primed representations within a range of different storage systems (Baddeley & Hitch, 1993; see also Lehman & Malmberg, 2013).

It is, of course, entirely valid to ask a time-based question, such as what is happening to information stored over a brief interval, as for example in the analysis of ongoing processing in speech comprehension. In doing so, however, it is important to accept that this is likely to involve a number of potentially separable processes, and that a tendency to conflate STM and STS is likely to lead to theoretical confusion (Jeneson & Squire, 2012; Waugh & Norman, 1965). LTS does influence storage of information over the first few seconds, and theories of LTS such as those based on Estes (1950) are likely to help account for processing over this interval (e.g. Nairne, 2002). They are not, however, the whole story, and effects of LTS need to be carefully controlled if a system such as Atkinson and Shiffrin’s STS is to be investigated.

  • The assumption made by A & S that is most central to our own work is that “the short-term store is the subject’s working memory; it receives selected input from the sensory register and also from long-term memory” (A & S, 1968, p. 97). They propose further that it yields hypotheses that are linked to thinking, problem solving and a range of other complex cognitive activities, while accepting that “the framework raises more questions than it answers” (A & S, 1968, p. 97). The multicomponent model of working memory stemmed from an attempt to use the STS component to answer some of these questions, resulting in the need to extend and elaborate this aspect of the modal model. This provides the focus of what follows.

STS as a working memory

We began our first grant at a time when the intense interest in STM was beginning to fade. The gold rush days, when everyone seemed to have their own paradigm and a mathematical model to fit, were over, overtaken by interest in semantic memory and levels of processing. This proved fortunate, since instead of worrying about how our work fitted in with everyone else’s, we could focus on the model produced by A & S, a ‘modal model’ in the sense that it encompassed and reflected much of the work of the previous decade and presented it in a manner that invited further exploration. Both Baddeley and Hitch had completed their graduate training at the MRC Applied Psychology Unit in Cambridge (now the Cognition and Brain Sciences Unit) under Broadbent’s directorship, and were influenced by the Unit’s remit of combining basic and applied psychology (Baddeley, 2018). We decided that the first question to ask was whether the STS did indeed serve as a general working memory. We did so by attempting to manipulate its available storage capacity and observing the effect on three different cognitive activities: reasoning, comprehension and learning. We used a concurrent task method, requiring participants to perform the relevant cognitive activity while repeating random digit sequences of varying length. Performance on all three activities declined as the length of the concurrent sequence increased, suggesting that the STS did indeed serve as some kind of working memory. However, the decrements were far smaller than anticipated: the STS was relevant, but not nearly as important as the modal model would seem to suggest. We therefore decided to modify the modal model, taking into account both our own results and some neuropsychological evidence that had just been published (Shallice & Warrington, 1970). This study reported a newly discovered patient who appeared to have a grossly impaired STS, with a digit span of two, together with an apparently normal LTS and no evidence of the very general cognitive disruption that would be expected if the STS served as a working memory. How could both the neuropsychological data and our own be reconciled with the modal model?

Our new model comprised three components. One of these, the phonological loop, was a verbal/acoustic system, similar in nature to A & S’s STS, in which material could be maintained and, if necessary, transferred to LTM via subvocal rehearsal. We also postulated a broadly equivalent visuo-spatial system, although this was mentioned only briefly in our original paper (Baddeley & Hitch, 1974). We were already beginning to investigate visual STM (Baddeley, Grant, Wight, & Thomson, 1975; Baddeley & Lieberman, 1980; Phillips & Baddeley, 1971), but although the visuo-spatial sketchpad was included in the 1974 proposal, we only began actively incorporating it into the overall model some years later. The most marked difference from the modal model, however, was the combination of an explicit, structurally defined short-term verbal/acoustic store with a separate attentional control system, the central executive. We initially termed the verbal/acoustic store the articulatory loop, emphasizing its function as a control process, as did A & S. We later decided, however, that this term did not do justice to its basic storage function, and adopted the term phonological loop, although without wishing to be precise about the linguistic processes underpinning it. We return later to the structure versus processing distinction.

We started by focusing on the phonological loop, since we regarded it as the simplest and most tractable subcomponent of the system. This proved to be the case, allowing us to separate and analyse both the storage system, principally using phonological similarity as a marker, and the subvocal rehearsal system, principally using articulatory suppression to disrupt rehearsal (e.g. Allen, Baddeley & Hitch, 2006; Baddeley, Chincotta & Adlam, 2001). The precise nature of forgetting within the phonological store is, however, still controversial. We presented evidence that we felt suggested time-based trace decay (Baddeley et al., 1975), while others have presented both counter-evidence (Lovatt, Avons & Masterson, 2000) and supporting evidence (Mueller, Seymour, Kieras & Meyer, 2003). The issue remains hotly disputed (Barrouillet & Camos, 2014; Hulme, Surprenant, Bireta, Stuart, & Neath, 2004); the nature of short-term forgetting is an important question but, fortunately, not crucial to the overall concept of a phonological loop.
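To make the time-based decay account concrete (it is only one side of the dispute just described), here is a toy sketch of how decay plus subvocal rehearsal could produce a word length effect. It assumes, purely for illustration, that a trace is lost if not refreshed within `decay_time` seconds, that the 2 s default is an arbitrary illustrative value rather than a measured constant, and that rehearsal cycles through the list one item at a time. The names `list_survives` and `predicted_span` are hypothetical.

```python
def list_survives(n_items, word_duration, decay_time, cycles=3, eps=1e-9):
    """Return True if cyclic rehearsal refreshes every trace before it decays.

    Assumptions (illustrative only): all items are encoded at time 0, rehearsal
    articulates one item at a time in a fixed order, each articulation takes
    `word_duration` seconds and refreshes that item's trace on completion, and
    a trace is lost if it goes unrefreshed for longer than `decay_time` seconds.
    """
    last_refresh = [0.0] * n_items
    clock = 0.0
    for _ in range(cycles):
        for i in range(n_items):
            clock += word_duration
            # eps absorbs floating-point drift in the accumulated clock.
            if clock - last_refresh[i] > decay_time + eps:
                return False             # this trace decayed before it could be refreshed
            last_refresh[i] = clock
    return True


def predicted_span(word_duration, decay_time=2.0, max_items=20):
    """Largest list length that cyclic rehearsal can maintain under the assumptions above."""
    span = 0
    for n in range(1, max_items + 1):
        if not list_survives(n, word_duration, decay_time):
            break
        span = n
    return span


for duration in (0.25, 0.4, 0.5):
    print(duration, predicted_span(duration))
```

Under these assumptions span is simply the number of items that can be articulated within the decay window, so longer words yield shorter spans. An interference-based account would need a different mechanism altogether.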

Although the three-component system could account for a wide range of experimental results, it had difficulty handling data based on prose recall, as used, for example, in the working memory span task. Daneman and Carpenter (1980) had shown this task to be a good predictor of individual differences in prose comprehension, as originally proposed, and it has also proved to predict performance on a wide range of other complex cognitive tasks, including reasoning and standard intelligence tests (Conway et al., 2008). In the face of these and other related problems, a fourth component was proposed, the episodic buffer (Baddeley, 2000): a multidimensional interface assumed to be capable of binding information, either within or between systems, into episodes that were then available for conscious awareness. As such, it provided an essential component of our revised working memory system.

We have spent much of the last decade attempting to use the concept of an episodic buffer productively, so avoiding the danger that it might simply become a convenient way of explaining unwanted anomalies. Our initial assumption was that the binding of features such as colour and shape into objects, or of words into meaningful phrases, depended directly on the buffer. However, a series of studies systematically manipulating the various components of working memory consistently argued against this hypothesis. Syntactic and semantic binding appears to occur relatively automatically, based on language skills within LTM (Baddeley, Hitch, & Allen, 2009), while the binding of visual features into objects appears to occur at a level prior to accessing the episodic buffer (Allen et al., 2006). We concluded, therefore, that the buffer is essentially a passive system for combining information from a range of dimensions and cognitive subsystems and making it available to conscious awareness, but that it does not itself serve a binding function (see Baddeley, 2012; Baddeley, Allen, & Hitch, 2011), although maintaining such representations against trace decay or interference does appear to be attentionally dependent (Allen, Baddeley, & Hitch, 2014).

In recent years there has been a dramatic increase in interest in visual short-term and working memory, principally from investigators interested in visual perception and visual attention. We ourselves have become involved in the area, focusing principally on supplementing the initially rather narrow range of methodologies applied to visual working memory with methods that had already proved theoretically productive in the study of verbal working memory. These include manipulating attentional capacity by concurrent tasks (Allen et al., 2006; Baddeley et al., 2009), investigating the role of strategy by instructing participants to focus on subsamples of the visual stimuli (Atkinson, Baddeley, & Allen, 2018; Hu, Hitch, Baddeley, Zhang, & Allen, 2014) and moving from simultaneous presentation of an array of visual stimuli to sequential presentation of individual items (Allen et al., 2006, 2014). This also allowed us to study effects of visual suffixes, finding that their capacity to disrupt STM depended not only on their visual characteristics but also on whether they could plausibly have formed part of the relevant test set, as opposed to coming from a different set of broadly similar items (Hu et al., 2014; Ueno, Allen, Baddeley, Hitch, & Saito, 2011).

By pursuing these lines of research and combining them, we found ourselves focusing on the nature of attention and its control, an issue we had initially avoided as being too difficult. Our current results suggest that visual working memory depends on two pools of attentional capacity, both of limited extent. One is concerned with attentional control and can broadly be seen as an aspect of our proposed central executive. It is sensitive to concurrent attentional load, regardless of modality. The other is concerned with the intake of perceptual information rather than executive control (Allen et al., 2014; Hu et al., 2014; Hu, Allen, Baddeley, & Hitch, 2016). Our conclusions have turned out to be broadly similar to those of colleagues approaching the same issue often using different methods from within the attentional field (Chun, Golomb & Turk-Browne, 2011; Lavie et al., 2004; Posner, 1980; Yantis, 2000).

Much of this work, including our own, is limited to studying the retention of simple stimuli such as colored shapes. Such an approach has the advantage of allowing methods from the study of visual attention and its neurobiological basis to be applied directly, and precise, detailed models to be developed. A good example is the controversy as to whether the limitation in visual STM is best modeled in terms of a limited number of storage locations or of a limited but flexibly allocated storage capacity (e.g. Bays, Catalao & Husain, 2009; Ma, Husain, & Bays, 2014; Zhang & Luck, 2008). This in turn has led to the development of new continuous response measures based on precision rather than categorical error rate. Such detailed modeling occurs, explicitly or implicitly, within a broader framework, and it is encouraging to see this in the case of visual working memory, as in the recent proposal by Van der Stigchel and Hollingworth (2018) that visuo-spatial working memory plays a fundamental role in the operation of the eye movement control system.
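To make the contrast between the two accounts concrete, here is a minimal simulation of a continuous-report task under each, with errors measured in degrees on a 360° feature circle. It assumes Gaussian noise wrapped onto the circle in place of the von Mises distributions usually fitted, a fixed capacity of three items with uniform guessing for the slot-style model, and a fixed pool of precision shared equally among all items for the resource-style model. All function names and parameter values are hypothetical and illustrative.

```python
import math
import random


def wrap(error):
    """Map an angular error in degrees onto the interval [-180, 180)."""
    return (error + 180.0) % 360.0 - 180.0


def slot_errors(set_size, capacity=3, sd=15.0, trials=4000, seed=0):
    """Hypothetical slot model: up to `capacity` items are stored with fixed noise;
    a probed item that was not stored is reported by guessing at random."""
    rng = random.Random(seed)
    p_stored = min(1.0, capacity / set_size)
    errors = []
    for _ in range(trials):
        if rng.random() < p_stored:
            errors.append(wrap(rng.gauss(0.0, sd)))    # stored: same precision at any set size
        else:
            errors.append(rng.uniform(-180.0, 180.0))  # not stored: uniform guess
    return errors


def resource_errors(set_size, sd_one=15.0, trials=4000, seed=0):
    """Hypothetical resource model: a fixed pool of precision (1/variance) is
    shared equally, so every item is stored but noise grows with set size."""
    rng = random.Random(seed)
    sd = sd_one * math.sqrt(set_size)  # variance proportional to set size
    return [wrap(rng.gauss(0.0, sd)) for _ in range(trials)]


def large_error_rate(errors, threshold=90.0):
    """Proportion of reports that miss the target by more than `threshold` degrees."""
    return sum(abs(e) > threshold for e in errors) / len(errors)


for n in (1, 3, 6):
    print(n, round(large_error_rate(slot_errors(n)), 3),
          round(large_error_rate(resource_errors(n)), 3))
```

In this sketch the proportion of very large errors separates the two accounts: beyond the assumed capacity, the slot-style model produces a flat tail of guesses, whereas the resource-style model's errors simply spread out. Models fitted in the literature are considerably richer than this.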

We thus regard our own work as part of an attempt to explain the way in which attention and memory interact in allowing us to perform a wide range of cognitive activities. We see our work as part of an ongoing enterprise that extends from Broadbent (1958), through the Atkinson and Shiffrin modal model to a very wide range of studies of working memory across both cognitive psychology and cognitive neuroscience. It is of course important to bear in mind that studies using the concept of working memory reflect many different approaches to the topic, with studies in neuroscience in particular often applying the term ‘working memory’ to simple STM tasks.

STS as activated LTM?

However, while the broad framework produced by A & S, with its emphasis on separate structures for STS and LTS, has been very influential for some 50 years, it has recently been seriously challenged by the claim that short-term storage is simply activated LTM. This is perhaps the most substantial objection to our own multicomponent model and as such merits careful consideration. We should begin by stressing that we do not suggest that LTM plays no role in working memory. Even a basic digit span task will depend on knowledge of digit names and the frequency of digit sequences (Jones & Macken, 2015), and span will be much reduced when the digits come from a non-native language; if the digits are presented visually, span will also depend on the familiarity of the spatial configuration (Darling, Allen, & Havelka, 2017) and on the learned capacity to turn the visual symbols into sounds. This and many other tasks will also be influenced by strategy, with reliance on phonological coding tending to be abandoned as sequences become longer (Hall, Wilson, Humphreys, Tinzmann, & Bowyer, 1983; Salame & Baddeley, 1986) or when semantic coding proves feasible, as in sentence span (Baddeley, Hitch, & Allen, 2009). As material becomes more complex, its inter-relation with LTM is itself likely to increase in complexity.

We suggest therefore that the crucial question is not whether working memory depends on LTM, but how long-term and working memory interact and indeed whether it is necessary to assume separate long-term and temporary storage systems. The strongest evidence for this, tentatively accepted by A & S, comes from neuropsychology, with some patients showing grossly impaired LTM but preserved STM (Baddeley & Warrington, 1970; Milner, 1966), while others show the opposite pattern of preserved LTM and grossly impaired STM (Shallice & Warrington, 1970; Vallar & Baddeley, 1984).

Cowan (1988, p. 182) has offered an alternative view of the neuropsychological evidence, suggesting that the patient described by Shallice and Warrington may have had “a deficiency in one or more of the control processes used to enhance short-term storage (e.g. overt articulation)”. There is, however, no evidence for this; such patients can have excellent language production skills combined with a substantial verbal STM deficit (Vallar & Baddeley, 1984). It could be argued that overt articulation is only one possibility, but proposing a model with a range of potential but unspecified control processes that might explain the result does not seem to offer a clear way forward when compared with a well-supported and specified alternative.

A more direct criticism of the neuropsychological evidence for separate visual and verbal STM systems is provided by Morey’s (2018) proposal that the concept of a separate short-term visual store is unnecessary. Morey’s case rests principally on questioning two sources of evidence for a short-term visual system. The first concerns the neuropsychological evidence, and in particular case ELD, initially identified as a case of long-term learning deficit for faces (Hanley, Pearson, & Young, 1990; Hanley, Young, & Pearson, 1991), but which subsequently proved to offer a visual analogue to the type of verbal STM deficit first reported by Shallice and Warrington (1970) that formed the basis for the concept of a phonological loop (Vallar & Baddeley, 1984). Morey (2018) criticizes both of these studies, but this criticism appears to depend on a number of misreports and/or misinterpretations of the original studies, as pointed out by Hanley and Young (in press). In particular, Morey reports ELD’s face memory as normal when this applies only to already familiar faces, whereas her retention of unfamiliar faces was grossly impaired, a pattern resembling verbal STM patient PV’s good retention of words but impaired STM for new phonological information in the form of nonwords (Baddeley, Papagno, & Vallar, 1988). The suggestion of a specific visual STM deficit led to a number of new hypotheses, including the prediction that ELD’s pattern of deficits would extend beyond faces, with impaired performance on a range of visual STM tasks including the Corsi block-tapping task, together with normal digit span, and impaired performance on the visual but not the verbal components of the Brooks tasks (Brooks, 1967). These results, together with her difficulty in remembering new but not familiar faces, provide a clear double dissociation when combined with the equivalent pattern for patients with verbal STM deficits (see Baddeley & Hitch, 2018). Such a dissociation is not open to Morey’s claim that one type of task is simply harder than the other.

As Morey points out, a double dissociation, in which one patient shows a deficit in A but not B and a second shows the opposite pattern, provides stronger evidence for two separate systems than a single dissociation, but is not conclusive, especially in a system with more than two components (Baddeley, 2003; Dunn & Kirsner, 2003). A triple or quadruple dissociation for a three- or four-component system, however, quickly becomes impractical, forcing the investigator to rely on the method of converging operations, whereby the same question is asked using a range of different methods and different populations, and the result is accepted only when there is extensive agreement (Garner, Hake, & Eriksen, 1956). This is the approach we have consistently taken in developing the multicomponent model.

The second major theme of Morey’s review is a rejection of the hypothesis of separate visual and verbal short-term stores, based on an extensive meta-analysis of studies in which visual and verbal tasks must be performed simultaneously. The meta-analysis found clear evidence of costs beyond those incurred when such tasks are performed alone, which Morey takes as evidence against the assumption of separate visual and verbal STM. The absence of such costs is not, however, a valid prediction from our multicomponent model, which would assume at least two additional central executive costs. The first comes from the role of the central executive in maintaining information over the short term, even under single-task conditions. This cost would be expected to be reduced for verbal information, for which articulatory subvocalisation provides a method of maintaining small amounts of information at a relatively low attentional cost, although this cost is likely to increase with longer sequences. In the case of visual STM, we assume that even small loads will require some form of rehearsal by refreshing (Barrouillet & Camos, 2014), an attentionally demanding process. Second, there is clear evidence that dual- or multi-tasking places a specific additional demand on the central executive (Baddeley, Logie, Bressi, Della Sala, & Spinnler, 1986; Logie, Cocchini, Della Sala, & Baddeley, 2004). We would therefore predict some cost of performing visual and verbal tasks simultaneously, although less than when combining two tasks that both involve visual, or both involve verbal, short-term storage. The degree of interference is likely to depend on precisely which tasks are combined, leading to the pattern of results that Morey observed.

We would argue that, although no single experiment can lead to an unequivocal conclusion, the balance of evidence across studies favors our proposal of separate visual and verbal storage maintained by a common executive control system. Demonstrating this within a single experimental study is very demanding, as shown by Klauer and Zhao’s (2004) attempt to rule out all potential objections to the proposal of separate visual and spatial contributions to STM: they reviewed the literature, found none of the existing studies totally convincing, and tested each possible objection across multiple experiments before concluding that the distinction is valid.

One advantage of attempting to apply a model such as our own across a wide range of differing situations is that it does provide potentially converging ways of attempting to conceptualize such a model. The best example of this is provided by the concept of the phonological loop. As we have already mentioned, we began by assuming that the loop was based purely on the process of articulation, as Cowan suggested but moved gradually to a more nuanced approach that assumes separate contributions from both storage and from an optional articulatory rehearsal strategy. Fortunately, it is possible to disrupt rehearsal by articulatory suppression, repeatedly uttering an irrelevant word such as ‘the – the – the’ (Baddeley, Lewis & Vallar, 1984; Murray, 1968). This impairs span, eliminates the word length effect and interferes with long-term learning of new phonological material while leaving semantically-based learning unaffected (Baddeley, Gathercole, & Papagno, 1998), experimentally induced effects that resemble those typically shown by STM-deficit patients. These effects are, however, substantially reduced in magnitude, relative to those shown by patients. Thus, suppression reduces span by about two items, leaving performance well above the 1- to 2-item span in patients (Vallar & Shallice, 1990), suggesting that span depends on substantially more than the capacity of the rehearsal system. Dyslexia and related developmental reading problems tend also to be associated with reduced span, a finding that Shankweiler, Liberman, Mark, Fowler and Fischer (1979) attributed to failure to use the articulatory loop, since they observed an apparent absence of phonological coding in their poor readers. However, people tend to abandon phonological coding strategy when sequence lengths begin to exceed span and error rates build up (Salame & Baddeley, 1986). 
This appears to be what happens when poor readers with reduced spans are tested at list lengths sufficient to tax the capacity of normally reading control children. When tested at appropriately shorter lengths, the poor readers showed typical phonological similarity effects, suggesting that the absence of phonological coding in poor readers is strategic rather than structural (Hall, Wilson, Humphreys, Tinzmann, & Bowyer, 1983). Converging evidence comes from other groups selected as being more severely dyslexic who, when tested at appropriate lengths, show both phonological similarity and word length effects, together with patterns of memory errors that resemble those of younger children. This is consistent with interpreting the Shankweiler et al. (1979) results as a strategic response to limited storage capacity (Baddeley, Logie, & Ellis, 1988).

An attempt to study the role of the phonological loop in reading comprehension using lexical decision suggested that the store itself is best considered as reflecting two components: an articulatory component, which allows the continued maintenance and manipulation of material, and an acoustic component, which allows simple judgements to be made under suppression but does not allow manipulation (Baddeley & Lewis, 1981; Besner, 1987; Besner & Davelaar, 1982). This conclusion has been extended in a recent study by Norris, Butterfield, Hall and Page (2018).

The assumption that rehearsal is an optional strategy does not, of course, deny the interest and importance of this process. As Cowan has shown, rehearsal in children can be divided into the time needed to retrieve the items and the time needed to articulate them, suggesting a two-stage process (Cowan et al., 2003; Jarrold, Hewes, & Baddeley, 2000). Unfortunately, separating these two stages depends on measuring inter-item gaps in the stream of overtly spoken rehearsal, which is possible in children but not in fluent adults, for whom retrieval and articulation appear to overlap (Mattys, Baddeley, & Trenkic, 2018).

There is evidence, furthermore, that articulation need not involve overt speech movements. A locked-in patient who had lost all capacity for peripheral muscle control, including that of speech, nevertheless showed good STM capacity and clear evidence of both phonological similarity and word length effects (Baddeley & Wilson, 1985), implying a preserved capacity for internal subvocal rehearsal. These studies are only a sample of the relevant literature, but they illustrate the way in which the initially simple phonological loop model has been used to investigate a wide range of situations and populations. It is not clear that the more general and less constrained concept of subvocal rehearsal as one of an unspecified number of control processes has been, or promises to be, nearly so fruitful.

The purpose of the previous discussion was not to refute Cowan’s reasonable speculation, but rather to point out the value of having a relatively specified and simple system that can be tested by applying it across a wide range of differing situations. We do indeed assume that our views have much in common with those of Cowan, noting that Cowan and Chen (2008) propose that “although the mechanisms of short-term memory are separate from those of long-term memory they are closely related” (p. 104), going on to suggest that a “phonologically-based storage and rehearsal mechanism such as the phonological loop mechanism (Baddeley, 1986) may come into play primarily when items have to [be] recalled in the correct serial order” (p. 94). We also agree with Cowan’s suggestion that “Baddeley’s (2000) episodic buffer is possibly the same as the information saved in Cowan’s focus of attention or at least is a closely similar concept” (Cowan, 2005, p. 11). We regard our concepts of a central executive interacting with an episodic buffer as essentially equivalent to Cowan’s more intensively studied attentional approach. We see ourselves as differing principally in our greater emphasis on a more detailed analysis of the processes and systems involved in visual and auditory short-term storage. Our principal point of disagreement thus concerns the way in which long-term and working memory interact, and in particular whether it is helpful to assume separate short-term systems.

The case for the importance of temporary storage systems has been made recently by Norris (2017), who combined evidence from behavioral studies, neuropsychology, neuroimaging and computational modeling to question the claim that activated LTM provides an adequate basis for working memory. He criticized in particular the tendency of brain imaging studies to conclude that, because working memory tasks are typically associated with brain areas that are also linked to LTM, activated LTM is sufficient to account for short-term storage (e.g. Acheson, Hamidi, Binder, & Postle, 2011; Cameron, Haarmann, Grafman & Ruchkin, 2005; Lewis-Peacock & Postle, 2008). The latter claim in their abstract that: “This result implies that activated long-term memory provides a representational basis for semantic verbal short-term memory, and hence supports theories that postulate that short-term and long-term stores are not separate”. Similarly, Öztekin, Davachi and McElree (2010) state in their abstract that “these findings support single store accounts that assume there are similar operating principles across WM and LTM representations” (Öztekin et al., 2010, p. 1123). However, as we have already noted, A & S (1968, p. 101) point out that “both STS and LTS are active in both STS and LTS experiments”. The modal model and many other models of memory assume close links between WM and LTM; hence, demonstrating a positive association between them cannot decide whether one or two systems are involved.

Norris goes on to argue that models that rely on activation of existing representations in LTM, with no temporary short-term component, may founder on the ‘problem of two’ (Norris, 2017, p. 1003). This refers to the long-standing issue in serial recall that an item may occur in a sequence more than once, or may need to be recalled more than once, as with the digit 1 in the sequence 971312. If such a sequence does not already have a specific representation in LTM, a separate representation will need to be created in some other store. Given that we can handle a limitless number of novel sequences containing repeated items, it is implausible to assume that all of these already exist in LTM. A temporary STS of some kind solves this problem. Such a store could indeed contain pointers rather than copies of the original items, but although “STM would indeed depend on LTM representations, all of the heavy lifting would be done by processes outside LTM itself” (Norris, 2017, p. 1003). Cowan (1999) accepts this problem but proposes that it can be handled by the rapid formation of new LTM representations. However, Norris argues that whereas extensive research has produced adequate models of the storage and retrieval of serial order that incorporate a separate short-term store, detailed models of how this might be achieved without such temporary storage are currently lacking. Given the importance of the capacity to create and maintain serial order, this is a major omission.
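The ‘problem of two’ can be made concrete with a toy illustration. The sketch below is our own simplification, not a model proposed by Norris, Cowan or any other author cited here, and every function and variable name in it is hypothetical. It contrasts an assumed activation-only store, in which each digit has a single LTM node whose activation codes its position by a simple primacy gradient, with an assumed short-term store holding one position-indexed pointer per list position into those same LTM nodes.

```python
from typing import Dict, List

# Hypothetical LTM lexicon: exactly one node per digit type.
LTM_LEXICON: List[str] = list("0123456789")


def recall_from_ltm_activation(sequence: str) -> List[str]:
    """Assumed activation-only store: order is coded by a primacy gradient
    (earlier items receive higher activation). Each LTM node can hold only one
    activation level, so a repeated item overwrites its earlier value rather
    than gaining a second token."""
    activation: Dict[str, float] = {}
    n = len(sequence)
    for position, item in enumerate(sequence):
        activation[item] = float(n - position)  # later presentation overwrites
    # Recall: report the active nodes from most to least active.
    return sorted(activation, key=lambda item: activation[item], reverse=True)


def recall_from_pointer_store(sequence: str) -> List[str]:
    """Assumed short-term store: one slot per list position, each slot holding
    a pointer (an index) to an LTM node, so repeated items occupy separate
    slots while the item representations themselves stay in LTM."""
    pointers = [LTM_LEXICON.index(item) for item in sequence]
    return [LTM_LEXICON[p] for p in pointers]


if __name__ == "__main__":
    seq = "971312"
    print("".join(recall_from_ltm_activation(seq)))  # 97312
    print("".join(recall_from_pointer_store(seq)))   # 971312
```

For the sequence 971312, the activation-only store returns 97312: one token of the repeated 1 is lost and the surviving 1 is reported after the 3. The pointer store reproduces the sequence while still relying on LTM for the item representations, in the spirit of the pointer-based STS described above. A different activation rule (for instance, keeping the higher of the two values) would change which 1 survives, but a single node per item still cannot carry two tokens.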

Of course, the question of how serial order is stored also arises within working memory, for example in the phonological loop. This has, however, been recognised and has led to extensive and detailed modelling, with a range of different approaches (some, though not all, based on the multicomponent model), both for verbal recall (e.g. Burgess & Hitch, 1999, 2006; Page & Norris, 1998, 2009) and for visuospatial STM (Hurlstone & Hitch, 2015, 2018). Happily, a coherent set of principles appears to be emerging from the literature, with a growing degree of agreement (Hurlstone et al., 2013).

Approaches to theorizing in psychology

It is relevant at this point to discuss briefly what the success and longevity of the modal model imply for the wider issue of theorizing within psychology. Although there is currently justifiable concern with methodological issues such as transparency and replicability, it appears no longer to be fashionable within cognitive psychology to discuss the philosophy of science; we should instead simply concentrate on getting our papers into high-citation journals, preferably with a neuroscience flavor. In this connection it is perhaps worth noting another quotation from the great economist John Maynard Keynes, who observed that: “Practical men, who believe themselves to be quite exempt from any intellectual influences, are usually the slaves of some defunct economist.” Could that also be true of science? If so, what might be the implicit theories within experimental psychology, for example?

In the middle years of the last century, the philosophy of science was a topic of some general interest, with the dominant view probably being that of Popper (1959), who was part of a general movement originating in Vienna sometimes termed ‘falsificationism’. This approach was applied to both philosophy and science, and proposed that for a theory to be useful, it had to make clear and falsifiable predictions; if these were not supported, the theory should be abandoned. This tended to be backed up by reference to Newtonian physics with its clear postulates and precise predictions (Braithwaite, 1953). Its clearest instantiation in psychology was through Clark Hull’s (1943) Principles of Behavior, which attempted to explain learning in the white rat, and by implication more generally, in terms of a series of postulates linked by precise equations. An alternative view was that proposed by Toulmin (1953), who viewed theories as resembling maps, useful as far as they represent what is known, as accurately and elegantly as possible, providing a tool for further exploration. The outcome of such exploration was then likely to involve elaboration of the earlier map rather than its total abandonment, unless of course a different and better map was produced.

Observations of how scientists actually behave, however, suggest yet another approach, that presented by Kuhn (1962) with his concept of scientific paradigms. These reflect the dominant questions and methods operating in a particular science at a given moment. ‘Normal science’ involves operating within the current paradigm, leading occasionally to a paradigm shift when the old paradigm is abandoned and a new one taken up. This certainly captures the extent to which science responds to fashions, very reasonably in the sense that an exciting new technique or finding will attract people from areas that were showing little progress. Unfortunately, in the hands of philosophers and sociologists it has sometimes been interpreted as suggesting that science is simply a matter of what is fashionable (Dawkins, 1998; Sokal, 1996).

A rather more constructive development came with the proposal by Popper’s colleague Lakatos (1976) that theories should be judged not on the success or otherwise of precise predictions, but on how productive they are. This does not refer simply to the number of subsequent papers and citations, as fashionable questions are by no means always theoretically or practically productive, but rather to how effective a theory is in creating a framework that captures existing knowledge in a way that leads to further questions, which in turn generate new findings or extend existing findings to new fields. He distinguishes such theories from those principally concerned with protecting themselves against further evidence, which he describes as ‘degenerative’. The broad framework proposed by A & S clearly fits more comfortably into the approach advocated by Lakatos, as indeed does our own theoretical approach.

That is not, of course, to say that more precise theories are unnecessary. In its original form, our model had no means of storing information in serial order, a problem raised by Lashley (1951). A number of mechanisms have been suggested, but deciding between them has required much more precise models and carefully focused empirical studies concerned, for example, not only with how serial order is maintained but also with whether it differs from one modality to another, or whether a common ordering mechanism applies across modalities (see Hurlstone et al., 2013, for further discussion).

As in the case of geographical maps, the most appropriate form of theorizing will depend on the scale of the enterprise. We need broadly-based maps of countries and regions, together with more detailed maps of towns and cities, and yet more detail when precisely delineating each individual’s property. It is also important to accept that we need different maps for different purposes; a map of the London tube system is not very helpful in finding your way when walking, although it will broadly mirror the street map. Similarly, theories based on behavior and on neuroscience are likely to have different emphases but ultimately to be broadly compatible. Furthermore, a cognitive framework that is based on well-controlled laboratory experiments becomes more productive if it can also be applied beyond the laboratory. This criterion of generality is by no means the only criterion of a productive theory. Equally important is its capacity to generate questions that then allow the framework to be extended or remodeled, a process that ideally should be combined with more precise attempts to cover individual areas within the broad model. It would of course be very nice to have a model that did both, and this, we assume, is behind the recent attempt to provide ‘benchmarks’ across the various phenomena that are agreed to be characteristic of working memory (Oberauer et al., 2018), presumably with the aim of creating a broad but also precise model of working memory. However, with a total of 20 ‘major’ and 31 ‘minor’ phenomena to fit, we suspect that attempting to model them all may be somewhat premature, and may have the undesirable effect of discouraging further exploration, since new findings would further complicate an already daunting task (see Logie, 2018).

Evaluating the concept’s productivity

So how should we evaluate the concept proposed by A & S of a working memory? One approach is that proposed by Lakatos, in terms of productivity. A simple estimate might come from the frequency of the term ‘working memory’ in journal titles. Within psychology or psychology-related fields, this has increased from six in the year 1980 to 40 in 1990, 306 in 2000, 604 in 2010 and 845 for the year 2016 (Source: Web of Science). Of course, this refers to a wide range of different uses of the term and an overall increase in the range and number of publications, and should therefore be interpreted with some caution. Nevertheless, when calculated as a percentage of the number of articles with the broader term ‘memory’ in the title, a clear increase can be observed across this same time period (1% in 1980; 4% in 1990; 16% in 2000; 21% in 2010; and 24% in 2016). However, as mentioned earlier, the simple popularity of a concept does not necessarily mean that it is scientifically fruitful; it could simply reflect unproductive controversy.

A more informative way of evaluating the productivity of the working memory concept is to consider concrete examples of its use. We ourselves share Broadbent’s original commitment to link theory with its application beyond the laboratory, and have been pleased to see the working memory concept applied across an increasingly wide range of fields. One major development has been its application to the field of individual differences by Kyllonen and Christal (1990), who related it to the earlier concept of general intelligence, an approach that has been further developed by a range of groups (Barrouillet & Camos, 2014; Conway, Cowan, & Bunting, 2001; Engle, Tuholski, Laughlin, & Conway, 1999; Miyake, Friedman, Emerson, Witzki, Howerter, & Wager, 2000), with extensive studies of the development of working memory in childhood (Cowan et al., 2003; Hitch, Towse, & Hutton, 2001). The concept of working memory as a mental workspace has extended beyond psychology, a good example being its extension to paleoarcheology by Coolidge and Wynn (2005), who propose that working memory may have provided the crucial advantage held by Homo sapiens over Neanderthal man. This suggestion was based on the study of surviving artefacts and their implications for the cognitive abilities they reflect, a claim taken sufficiently seriously within the field to merit extensive discussion in the journal Science (see Balter, 2010).

An advantage of the multicomponent model over the hypothesis of a single unitary attentional workspace is that it allows more detailed but constrained hypotheses to be proposed and tested. While A & S focus on one particular control process, verbal rehearsal, it is not clear to us that this has led to fruitful extension to other control processes or to practical applications. This has, however, proved possible with the fractionation of the A & S STS into the three-component working memory. One important feature of our early model is that, like the original A & S model, it could readily be understood without a very precise knowledge of cognitive psychology. This, together with a series of relatively simple tools for identifying and separating the three components, has led to its being widely adopted as a means of investigating the role of working memory across a range of populations and situations. An obvious application is within the field of education (Pickering, 2006), typified by the work of Gathercole and colleagues in developing measures of the components of working memory across the school years (Gathercole & Pickering, 2000a, b; Gathercole, Pickering, Knight, & Stegmann, 2004), identifying different components associated principally with vocabulary (Gathercole & Baddeley, 1989), reading (Swanson & Berninger, 1995) and language development more generally (Baddeley, Gathercole, & Papagno, 1998). A somewhat different pattern emerges in the study of mathematics, where a visuo-spatial rather than a phonological component tends to dominate (Bull, Johnston, & Roy, 1999; Hitch & McAuley, 1991).
The multicomponent model has also begun to be used widely within the field of second language learning (Wen, Mota, & McNeill, 2015), and a recent meta-analysis of individual differences in the rate of second language acquisition, drawing on a wide range of studies with a total of 3,707 learners, showed clear, substantial and separable contributions from the central executive and the phonological loop (Linck, Osthus, Koeth, & Bunting, 2014).

Application of the model to special populations has also been fruitful, with Morris (1984) reporting a central executive deficit in Alzheimer’s disease, followed by a demonstration that this group has a particular problem with dual-task performance, a capacity proposed by Baddeley (1996) as one component of the executive (Baddeley et al., 1986; Logie et al., 2004). Dual-task performance has also proved to be a sensitive marker of a familial, genetically determined form of early-onset Alzheimer’s disease, allowing family members carrying the gene to be identified before the onset of other major symptoms (Parra, Abrahams, Logie, Mendez Lopera, & Della Sala, 2010). The multicomponent model has also been applied successfully in a twin study of language disorder by Bishop, North, and Donlan (1996), who found evidence for the heritability of an underlying phonological loop component. Clear genetically-based differences have also been shown between people with Down syndrome, who tend to have a phonological loop deficit, and those with Williams syndrome, for whom the sketchpad appears to be clearly impaired (Jarrold, Baddeley, & Hewes, 1999; Wang & Bellugi, 1994). The model also proved useful in further analysis of familial cognitive deficit, with Schulze, Vargha-Khadem and Mishkin (2018) identifying a phonological loop deficit as crucial in a family showing marked impairment in normal language development. These are simply some examples of the application of the concept of a multicomponent working memory across a range of fields that operate well beyond the bounds of the psychological laboratory. It is hard to see the concept of working memory as simply activated LTM proving equally productive.

Conclusions

So, why has the modal model been so influential? We suggest, first of all, that it has survived while many other models have been forgotten because it attempted to provide a broad framework within which further detail could be developed. The separation between structure and processing has also stood the test of time. Less successful was the modal model’s reliance on the most widely studied memory tasks of the time, which were based largely on the short-term retention of acoustic/linguistic material. As a result, the need to account for remembering and processing visuo-spatial information was comparatively neglected. A further consequence of the over-reliance on verbal materials was an initial oversimplification of the processes whereby information is transferred from STS to LTS. The model’s emphasis on simple maintenance was called into question shortly afterwards by evidence for the importance of deeper and more elaborative processing for long-term retention (Craik & Lockhart, 1972). Relatedly, the assumption that the STS serves as the gateway to LTS was challenged by the existence of neuropsychological patients with impaired STM but normal LTM.

We suggest that our own multicomponent working memory concept forms an extension and elaboration of the STS component of the modal model that has avoided these latter difficulties. In particular, we suggest that it has proved fruitful to separate the attentional control processes that we termed the central executive from temporary storage, and to suggest that more than one storage modality is likely to be involved. In addition, by postulating the concept of an episodic buffer, we explicitly link the system to hypotheses about conscious awareness. We suggest that our broad framework is compatible with a range of other more detailed proposals regarding specific components of the system. Such development and elaboration are of course essential if the overall framework is to continue to be fruitful.

The original A & S model’s well-deserved longevity stems in part from its capacity to crystallize the major advances made in the previous decade in the understanding of human memory and to combine them within a well-justified theoretical framework, one broad enough to accommodate modifications and additions in the face of new evidence. We see the multicomponent model of working memory as an extension of this approach, exploring further the nature of their proposed STS by focusing on its capacity to function as part of a more general working memory system.