Memory from nonsense syllables to novels: A survey of retention

Radvansky, Gabriel A.; Parra, Dani; Doolen, Abigail C.

doi:10.3758/s13423-024-02514-3

Memory from nonsense syllables to novels: A survey of retention

Theoretical/Review
Open access
Published: 07 May 2024

(2024)
Cite this article

Download PDF

You have full access to this open access article

Psychonomic Bulletin & Review Aims and scope Submit manuscript

Memory from nonsense syllables to novels: A survey of retention

Download PDF

707 Accesses
2 Altmetric
Explore all metrics

Abstract

Memory has been the subject of scientific study for nearly 150 years. Because a broad range of studies have been done, we can now assess how effective memory is for a range of materials, from simple nonsense syllables to complex materials such as novels. Moreover, we can assess memory effectiveness for a variety of durations, anywhere from a few seconds up to decades later. Our aim here is to assess a range of factors that contribute to the patterns of retention and forgetting under various circumstances. This was done by taking a meta-analytic approach that assesses performance across a broad assortment of studies. Specifically, we assessed memory across 256 papers, involving 916 data sets (e.g., experiments and conditions). The results revealed that exponential-power, logarithmic, and linear functions best captured the widest range of data compared with power and hyperbolic-power functions. Given previous research on this topic, it was surprising that the power function was not the best-fitting function most often. Contrary to what would be expected, a substantial amount of data also revealed either stable memory over time or improvement. These findings can be used to improve our ability to model and predict the amount of information retained in memory. In addition, this analysis of a large set of memory data provides a foundation for expanding behavioral and neuroimaging research to better target areas of study that can inform the effectiveness of memory.

The effect of working memory maintenance on long-term memory

Article 09 May 2019

Short-term retention of words as a function of encoding depth

Article 12 March 2024

Recognition memory decisions made with short- and long-term retrieval

Article 09 February 2024

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

An important aim of cognitive psychology is to understand the progress of memories over time (Ebbinghaus, 1885). One issue that memory researchers have struggled with for well over a century is how much is remembered and forgotten as time passes. This is important because understanding the progress of memory would allow us to predict memories at different points in the future. Ebbinghaus (1885) proposed a logarithmic function to capture memory retention over time. Other researchers have suggested other functions, including power (e.g., J. R. Anderson & Schooler, 1991; Averell & Heathcote, 2011; Rubin & Wenzel, 1996a, 1996b; Wickelgren, 1974; Wixted & Carpenter, 2007; Wixted & Ebbesen, 1991a, 1991b), exponential (e.g., Loftus, 1985a, 1985b), and linear functions (Fisher & Radvansky, 2019). The aim of this study was to analyze a very large corpus of data sets from well over a century of memory research to better understand which pattern(s) best captures the progress of memory retention and forgetting over time, and to assess whether the type of pattern observed may be influenced by other factors, such as the retention delay, the type of materials tested, the memory test used, and so on.

An earlier attempt at this was reported by Rubin and Wenzel (1996a, 1996b). They assessed how well 210 published data sets fit 105 two-parameter functions. Their criteria for data inclusion were that (a) there were five or more retention intervals, (b) the dependent measure conveyed how much was remembered, and (c) each included data set fit at least one of their 105 two-parameter functions with an r² of .90 or better. Their assessment largely involved counting the number of times each of the 105 functions was one of the ten best for a given data set. These two-parameter functions were selected because they have (a) a measure of the rate of change over time and (b) a scaling measure. Using this approach, Rubin and Wenzel concluded that the four best-fitting two-parameter functions were (a) logarithmic, (b) power, (c) exponential in the square root of time (exponential-power), and (d) hyperbolic in the square root of time (hyperbolic-power).

Rubin and Wenzel (1996a, 1996b) classified and sorted the data based on the labs that the data came from (e.g., Wickelgren and Bahrick) and study types (e.g., autobiographical memory). Our study expands their assessment by considering the amount of data in a study (e.g., number of participants and observations). Our aim was to build upon the impressive effort of Rubin and Wenzel to systematically assess and compare different retention and forgetting patterns for various independent and dependent variables, such as retention delay and memory test types, and to examine why the pattern of retention and forgetting might vary as a function of such factors.

Like Rubin and Wenzel (1996a, 1996b), we take a largely atheoretical, exploratory approach to our analyses. That said, we do consider some theoretical issues that our analysis can inform and adjudicate. This assessment of patterns of retention and forgetting is motivated by recent work in our lab. One line of work showed that there are changes in the rate of forgetting for different periods of time (Radvansky et al., 2017a, 2017b, 2022). Another reports that some patterns of retention and forgetting do not conform to a negatively accelerating pattern but are better captured by a linear function (Fisher & Radvansky, 2019). This is important because with a negatively accelerating function, there is often a constant proportional loss over some unit of time (e.g., log time for a power function). In contrast, for a linear function there is a constant amount of loss over a given unit of time. Thus, there would be an increasing proportion of information lost with longer and longer periods of time.

For this study, we added more recent memory studies and used somewhat different criteria than Rubin and Wenzel (1996a, 1996b). This larger dataset allows for a more systematic assessment of a range of variables that may contribute to the observed memory performance. In what follows, we detail the inclusion criteria as well as the independent and dependent variables used for our analysis. Our analysis is composed of two phases. We first assess the goodness-of-fit of various functions to see which function(s) emerge as best descriptors of memory, similar to the procedure used by Rubin and Wenzel (1996a, 1996b). We then assess various factors that contribute to the emergence of those functions.

Memory functions

Our assessment of memory loss was done for five target functions (Table 1): logarithmic, power, hyperbolic-power, exponential-power, and linear. These are all two-parameter^{Footnote 1} functions in which memory, M, across time, t, is captured by a constant scale parameter, a, and a rate of change, b. Each of these functions is now considered in turn.

Table 1 Function equations considered in our analyses

Full size table

Logarithmic

Logarithmic functions have been attributed to Ebbinghaus^{Footnote 2} (1885; see also Woodworth, 1938) and were among the best-fitting functions in Rubin and Wenzel’s (1996a, 1996b) assessment. They suggested that these functions may be best for non-autobiographical memories. For this function, a boundary condition is necessary. Logarithmic loss functions predict that at some point people will remember less than nothing, which is nonsensical. Thus, there should be a constraint restricting values to be positive for the appropriate application of logarithmic functions to memory data.

Power

Power functions were also among the best-fitting functions reported by Rubin and Wenzel (1996a, 1996b) and are often preferred by many researchers studying patterns of retention and forgetting (e.g., Wixted & Ebbesen, 1991a, 1991b). As noted by Rubin and Wenzel, with a power function, there are equal ratios of retention time that are accompanied by unequal ratios of memory retrieval. It has been suggested that power function occur as a result of memory consolidation processes (Wixted, 2004a, 2004b) or to mirror the occurrence of events in the environment (Anderson & Schooler, 1991).

Exponential-power

The exponential in the square root of time, which is a special case of the Weibull function, was first suggested as a description of memory change by Wickelgren (1972). This function was also among the best-fitting functions reported by Rubin and Wenzel (1996a, 1996b).

Hyperbolic-power

The fourth function that did well in Rubin and Wenzel’s (1996a, 1996b) analysis was the hyperbola in the square root of time. However, it is rarely considered as a means of capturing data from human participants outside of their study. It has, as suggested by Rubin and Wenzel, been more popular in research with animals (e.g., Harnett et al., 1984a, 1984b; Staddon, 1983a, 1983b).

Linear

The last function that we consider is the linear function. Although it did not make Rubin and Wenzel’s (1996a, 1996b) top four list, recent work suggests that it may be observed under some circumstances (Fisher & Radvansky, 2019). Thus, we include it here to better understand the circumstances under which it would appear. Like logarithmic functions, linear functions also predict that at some point, memory becomes negative. Thus, these functions also need to be used with a boundary constraint so that they cannot drop below zero. That said, it has been suggested that a linear forgetting pattern may be reflective of curvilinear patterns “by assuming that it simply reflects a scaling/measurement artifact” (Wixted, 2022, p. 1785), and as such, linear patterns of forgetting could easily be dismissed.

These five functions describe memory loss over time. We also consider cases in which memory remains stable over extended retention intervals. For our purposes, we define data sets as falling into this category when the net proportion change across a data set’s retention intervals is between −.01 and .01. Although these data sets may not contribute to understanding the function of best fit (because the data may be so flat), it is useful to understand the situations in which stable memory arises. Following this logic, we also separately consider cases in which memory increases over time. Again, examining such cases helps identify the circumstances under which improvement rather than loss might be observed with the aim of contributing to a more robust understanding of memory retention.

Previous reviews of retention and forgetting, such as Rubin and Wenzel’s (1996a, 1996b) classic assessment, have concluded that different functions capture different patterns of loss that have been observed in the literature. Different retention and forgetting functions are likely to occur because different memory representations and processes are involved, leading to the observation of different functions. These can then be used to better explore and understand how memory retention, loss, and retrieval operate. As one example, Fisher and Radvansky (2022a, 2022b) found that less well-known information was better captured by a power function, whereas better-learned information was better captured by a linear function. This was explained using a computational model of memory retrieval, the RAFT model. In brief, the explanation is that better-learned information allows for more reconstruction of partial knowledge in memory, which can produce a linear forgetting function, even if the individual elements making up a representation are lost in a way that follows a power function. This is similar to the idea that the more components of an event that can be used as retrieval cues, the better memory performance will be (Jones, 1976). In comparison, with less well-learned information, reconstruction is more difficult, and a power function emerges more readily. Thus, with knowledge such as this, knowing which function best fits the data can provide some understanding of how information is represented and processed in memory.

Inclusion criteria

There are a number of criteria for each data set to meet to be included in our corpus and analyses. These are listed in Table 2 and are described here, along with a justification for each.

Table 2 Inclusion criteria for data sets to be included in our corpus

Full size table

Study characteristics

English

We limited our search to studies that are either published in or translated into English to ensure an accurate understanding of the methods and results of each study. The one exception to this is a study by Radosavljevich (1907a, 1907b), which is included here because of its historical significance.

Number of retention intervals

For our analyses, we look at memory retention and forgetting over time, so the studies needed to measure memory after different retention delays. We used studies with three or more retention intervals because three data points are needed to fit the two-parameter functions currently considered. Rubin and Wenzel (1996a, 1996b) limited themselves to studies with five or more retention intervals to increase data stability. While including studies with three or four retention intervals does introduce some potential instability, we include them to increase the size of our corpus. The number of retention intervals is included as a factor in the analyses to account for any potential influences of it.

Measurement

Each included data set had a clear measure of the amount of information retained in memory, such as the amount recalled, recognized, or degree of savings. For our analyses this was uniformly represented as proportion correct.^{Footnote 3} We further restrict ourselves to studies in which declarative memory was measured. Procedural memory contains information that cannot be consciously recalled, such as how to ride a bicycle, and is thought to involve different neural mechanisms (e.g., Cohen et al., 1985; Squire, 1986). Moreover, it is much less clear how to quantify the proportion remembered for such data. Thus, studies of procedural memory retention were excluded.

We also exclude studies in which people made free memory associations to individual words, and the data are based on what people produced initially (e.g., Crovitz & Shiffman, 1974; Rubin & Wenzel, 1996a, 1996b). With this method, there is no targeted assessment of memories at different times. These were simply the first memories that come to mind when people hear those words. This is not an assessment of the proportion of memories accessible at different time periods. Thus, we do not know how much people do and do not remember from different periods of time.

Sample characteristics

For our analyses, we include studies using human adults, with no known psychopathologies.

Humans

Unlike Rubin and Wenzel (1996a, 1996b), who included animal studies, we confine our analyses to data sets that involve humans. Memory research with non-humans has been invaluable, as there are many points of convergence between the two (e.g., Squire, 2004). However, there are also notable dissimilarities that could complicate our analyses (e.g., Premack, 2007).

Age

We limit our analyses to data that do not involve children (younger than 18). There are both neurological developments and behavioral changes that occur throughout childhood that could complicate our analyses. Thus, we chose to be conservative and not include data from children here.

Psychopathology and cognitive state

We excluded samples that assessed people with a known psychopathology, such as amnesia or schizophrenia. We also excluded samples in which the participant’s cognitive state was altered by a substance such as alcohol or caffeine. Again, memory in these groups varies from normal (e.g., Aleman et al., 1999; Wickelgren, 1975). If there was a neurotypical control group in a study, we did include that data.

A note on fit characteristics and data reduction

Our analyses examine the influence of several factors on the goodness-of-fit, as indicated by the coefficient of determination, r², of several functions to the retention data of each study. Rubin and Wenzel (1996a, 1996b) used a strict criterion of r² ≥ .90 for at least one of their functions to focus on well-behaved data sets. We elected not to use this criterion because, although the fit of individual studies may not be as high, the various factors included in our analysis may account for such variance.

However, because our emphasis is on assessing which function(s) best fit the data, we removed any data sets that were so variable that none of the functions captured well the nature of changes in performance. After assembling our corpus, we took some steps to reduce the amount of data used. We first dropped any data sets, in cases of memory loss, in which the fit of any of the five functions was poor. Rather than dropping anything below r² = .90, as Rubin and Wenzel (1996a, 1996b) had done, we took a more inclusive approach. We instead took the best-fitting r² for each study, across the five retention functions, in addition to keeping any data showing no net change or improvement. We then chose to drop those data sets for which the best-fitting function was less than r² = .5. This is around two standard deviations from the mean of the five best-fitting functions (M = .880, SD = .192), again excluding no net change and improving data sets. This resulted in the loss of 53 data sets, which had a low average best fit (M = .31, SD = .14). The reduced data set used for our analyses can be found in our online Supplement A (all supplements are available at https://osf.io/wq9ty). For interested readers, the removed data are provided in our online Supplement B. After dropping the poorly-fitting data, the data sets showing memory loss over time were better described by the five mathematical functions (M = .922, SD = .112).

In terms of prominent differences, relative to the retained data sets, the sets that were removed had slightly smaller average sample sizes (M = 98.2 vs. M = 110.0), smaller average groups sizes (M = 24.3 vs. M = 30.3), and fewer observations per person (M = 44.3 vs. M = 120.1). They also had a higher proportion of multiple study opportunity data sets (M = .71 vs. M = .44). Moreover, the trimmed data had more retention intervals (M = 5.5 and M = 4.7) and covered longer average periods of time (M = 18.7 years and M = 8.3 years). The trimmed data also had an average lower initial memory (M = .65 and M = .73), and importantly, the trimmed data sets were more likely to involve less change from one retention interval to the next (M = −.02 vs. M = −.07). Thus, it seems likely that the data sets that were trimmed out had poor retention function fits because they had less data overall, for information that was less well-learned, over longer periods of time, and were less likely to show much change over time.

Literature search

Many studies of memory with multiple retention intervals are not labeled as such, so it is far too limiting to do a literature search merely using key terms. We have found published work examining forgetting over time (e.g., Rubin & Wenzel, 1996a, 1996b) and have sifted through the references of each to find additional papers. We also included any papers and data sets that we have happened upon during this process. We hope that the reports that are included in this analysis are representative of the population of studies that fit our inclusion criteria. From this set of criteria, we have developed our corpus. This includes data from 256 papers, involving 916 data sets (e.g., multiple experiments and/or conditions within articles).^{Footnote 4} That said, this corpus is almost certainly incomplete.

Data coding

There were a range of variables considered for our analyses (Table 3). These are about general characteristics of a study, characteristics of the materials, learning characteristics, nature of the retrieval tasks, and aspects of retention. The summary statistics for these variables are provided in Table 4. The data used here as well as the syntax for analysis are publicly available on the Open Science Framework (https://osf.io/wq9ty).

Table 3 Independent variables considered in our analyses

Full size table

Table 4 Summary statistics of the numeric independent variables for our corpus

Full size table

General characteristics

Publication year

Research practices in the study of human memory have changed over the decades. These changes may contribute to the observed data in ways that are not captured by our other factors. To allow for this, we include the year of publication as a dependent variable. The description of the various years in our corpus is provided in Table 4.

Sample size/observations per participant/amount of data

All else being equal, studies with larger sample size, more observations per participant, and more data in general, are likely to have more stable data that more accurately reflect the population (Cohen, 1992a, 1992b; Cronbach et al., 1972; Marcoulides, 1993; VanVoorhis & Morgan, 2007). Studies with fewer data points may overestimate true effect sizes and, as such, may be less replicable (e.g., Button et al., 2013). Specifically, if certain functions are better fit by small data sets, this would bring into question the accuracy of such functions. The description of the sample sizes, observations per participant, and amount of data in our data set is provided in Table 4.

Material characteristics

Material type

There are different types of memory for different types of information (e.g., nonsense syllables and novels). Thus, we also coded for the types of materials used. We identified many different material types. These are listed in Table 5.

Table 5 Levels of material complexity used for our assessment

Full size table

Material complexity

The level of complexity of the material was coded to convey the degree to which the materials likely activate prior world knowledge and invite inferences. The levels of complexity are shown alongside the material types in Table 5. Level 1 (n = 124 data sets) includes materials that have very little to no semantic meaning, such as nonsense syllables, and are presented in isolation with no reference to other items. Level 2 (n = 252) includes materials that have semantic meaning, either through prior knowledge or individual experiences. However, the items are presented in isolation with little to no relation to one another. Thus, while elaboration is possible, it is likely to be limited in scope.

Level 3 (n = 37) includes materials in which there is some type of association. However, while one of the items may be meaningful, the other is not, at least from the perspective of the participant. Again, while elaboration is possible, it is likely to be limited. Level 4 (n = 145) includes materials for which there is an association between two or more meaningful items, however, not enough to form a complete proposition. Some elaboration might be possible, but it would likely be subjective and initiated by the person.

Level 5 (n = 91) includes materials that convey at least one complete idea or proposition. These materials are much more likely to encourage some elaborative processing. Level 6 (n = 24) includes materials that clearly go beyond a single propositional idea unit and involve multiple ideas. Thus, there is more opportunities for elaborative processing. However, more complex information is less likely to convey a coherent situation or event, or a collection of situations or events. The faces and events material types are placed here because they involve elements of simpler and more complex materials. Poems are placed here because they typically do not convey elaborative descriptions of events as prose does. Level 7 (n = 243) includes materials that clearly involve an understanding of situations and events, that often span across time, with many different elements and inter-relations, such as novels. Given recent findings of linear forgetting in complex materials, it is expected that material complexity may be related to linear function fit (Fisher & Radvansky, 2019).

Given that our complexity measure provides an index that can be more readily used to compare with other measures, and the different memoranda were sorted into the seven complexity classifications, we elected not to use any coding of the specific memoranda in any of our analyses. This information is available for any readers wishing to explore such issues.

Learning characteristics

Multiple study opportunities

Memory can vary depending on whether information was presented once, or if there were multiple study opportunities (e.g., Ebbinghaus, 1885). This was coded in our corpus as 0 for single (n = 471 data sets) and 1 for multiple (n = 445).

Degree of learning

Another factor that can influence the consolidation of information into memory is the degree of learning. The better the information is learned, the more likely that it has been stored in memory (e.g., Craik & Lockhart, 1972). This could have consequences for the nature of the pattern of retention and forgetting that is observed. To quantify this for the purposes of our analyses, we came up with a rough measure to code these studies by identifying four levels. Level 1 are materials from Complexity Levels 1 to 4 that likely involved rote rehearsal and were only explicitly processed once (n = 310). Level 2 are materials from Levels 1 to 4 and were explicitly processed more than once (n = 248). Level 3 are materials from Complexity Levels 5 to 7 that likely involved elaborative rehearsal and were explicitly processed once (n = 178). Finally, Level 4 are materials from Levels 5 to 7 that were explicitly processed more than once (n = 180).

Distractor task

Studies vary in whether there was an experimenter-imposed distractor task used to encourage forgetting. Memory retention requires that traces go through a process of consolidation (e.g., McGaugh, 1966, 2000). Prior to this, traces might be disrupted through distraction, leading to faster forgetting. Thus, it is possible that the pattern of forgetting would differ in studies in which a distractor task was present (n = 150) versus those in which it was not (n = 766).

Test characteristics

Assessment type

This is the type of memory test used. In our data set, these include free recall (n = 432 data sets), cued recall (n = 121), yes–no recognition (n = 141), forced choice recognition (n = 160), savings (n = 14), stem completion (n = 22), fragment completion (n = 12), anagram solution (n = 1), matching (n = 7), problem solutions (n = 4), and source monitoring (n = 2).

Number of retention intervals

We coded for the number of retention intervals in each data set. Data with more retention intervals may have more stable patterns than data with fewer, and this needs to be considered. This variable allows us to assess whether certain retention functions are more likely with different numbers of intervals, perhaps because of the stability (or lack thereof) of the data. The descriptive information about the number of retention intervals is shown in Table 4.

Study design

Studies of retention and forgetting use either a between-subjects design (with a different group of participants at each retention interval), or a within-subjects design (measuring memory in all people at all retention intervals). This is important to account for because practice effects can occur with within-subjects’ designs, although it does reduce some error variance (Greenwald, 1976). In our corpus, within participant designs were coded as 0 and between participant designs as 1. There were more within-participant designs (n = 489) than between (n = 294).

Number of observations

We also include the number of observations per participant. This includes the number of trials. The more observations there were per person, the more stable the data are likely to be. Thus, more observations are more likely to capture retention, and provide more replicable and stable results. The number of observations per person can compensate for low levels on other factors, such as the number of participants (Smith & Little, 2018).

Amount of data

We combined information about sample size and the number of observations per person to calculate the overall amount of data in each study. This is the number of participants in the study times the number of observations per person. Descriptive information about both the number of observations per person and the amount of data are provided in Table 4. However, for all further analyses, number of participants and number of observations per participant are not considered, as the amount of data measure was derived from these.

Retention characteristics

Retention intervals

Memory exhibits different properties at different periods of time. Traditionally, there is a distinction between short-term/working memory and long-term memory (e.g., Atkinson & Shiffrin, 1968; Cowan, 2008). There have also been suggestions of other divisions, such as between long-term memory and long-lasting memory (McGaugh, 2000). These phases of memory may have different neurological and behavioral properties. For example, there is a shift in neurological processes over the span of retention, where the hippocampus is more active in early memory consolidation but becomes less active as memories are transferred to the neocortex (e.g., Squire & Alvarez, 1995). The neurological shift is reflected in changing speeds of forgetting (e.g., Radvansky et al., 2022), where forgetting speed decreases up to about a day, increases between 1 and 9 days, and remains somewhat stable thereafter.

Because there are shifts in the neural mechanisms supporting memory retention, it is possible that forgetting can be best described by different functions over the course of retention. As a hypothetical example, it could be that retention is better captured by a logarithmic function prior to one day, but a power function at longer delays. Thus, we include delay to allow us to discover any such differences. The issue of retention delay, while seemingly straight-forward, becomes somewhat thorny when considering the fit of data to retention and forgetting functions. There are many ways to quantify delay in studies with multiple delay intervals, so we capture delay in four ways: (1) the shortest retention interval, (2) the longest retention interval, (3) the average retention interval, and (4) the retention range (longest–shortest). For consistency, all studies are coded into the nearest reasonable number of seconds. Descriptive data on each of these is provided in Table 4.

Initial memory

Somewhat related to degree of learning is initial memory. Initial memory could affect the fit and rate of forgetting (Slamecka & McElree, 1983a, 1883b; Wixted, 2022). For example, Anderson and Schooler (1991) suggested that a logarithmic function has a steeper slope when there is a higher initial memory, but the power function does not depend on the initial memory. Thus, the best-fitting function may be related to initial memory levels. The descriptive data for initial memory levels are provided in Table 4.

Note that for some studies, the first memory assessment is reported as being made “immediately” or after 0 s. Rather than setting an initial delay time as 0 in these cases, which would literally mean that the information and the memory test were perfectly coexistent in time, and because some functions require a non-zero value to calculate, we took two approaches. The first was the acknowledgement that an immediate memory test was not actually instantaneous. Instead, there was some nominal delay, even if it is something like the refresh rate of a computer screen. To provide an estimate of what the actual delay time was we either used an estimate based on information about actual display durations or provided an educated estimate of how long instructions for a memory test would take (e.g., 30 s). We note in our corpus how these initial estimates were made in applicable cases. In some cases, a 0-second delay is more in line with what actually happened, such as in cases of short-term memory testing. This is a problem for some memory functions, such as the power function, because it is mathematically undefined at 0 seconds. To address this, we set the number of seconds for this initial memory at a small value (e.g., .01 s) that is not likely to be psychologically meaningful in the context of these data sets.

Confirmatory and exploratory analyses

While our approach is largely atheoretical, there are some theoretical implications for what we find. Our analyses are part confirmatory and part exploratory. The confirmatory analyses address issues or findings that are already reported in the literature. In comparison, the exploratory analyses are those for which there is no strong, a priori, theoretical expectation or prediction about the outcome. However, these are of interest because they have the potential to provide insight to guide future research.

Confirmatory analyses

Best-fitting functions

Of most interest to us here, we consider predictions for which of the retention and forgetting functions will best capture the data. Previously, Rubin and Wenzel (1996a, 1996b), in their extensive analysis, reported that logarithmic functions fit the data best most often, followed by power, exponential-power, and hyperbolic power, and with linear patterns doing the worst by far. This would be in line with some of the earliest work on memory retention and forgetting (i.e., Ebbinghaus, 1885). Moreover, they also suggest that this will be more likely to be the case for simpler materials, compared with complex memories, such as autobiographical memories.

A competing prediction, based on a report by Wixted and Ebbesen (1991a, 1991b), is that the data will be best fit by a power function. This is almost a default assumption of researchers studying retention and forgetting (e.g., Anderson & Schooler, 1991; Averell & Heathcote, 2011; Carpenter et al., 2008a, 2008b; Wickelgren, 1974; Wixted & Carpenter, 2007). An attraction of the power function is the idea that forgetting functions “almost invariably exhibit a decreasing relative rate of forgetting, as noted long ago by Jost (1897)” (Wixted, 2022, p. 1779). This invariability is assessed here.

Material characteristics

Rubin and Wenzel’s (1996a, 1996b) report predicts that autobiographical memories will be better fit by power functions than other types of materials. We can tentatively expand this to memory for any type of complex set of materials (e.g., stories). Rubin and Wenzel arrived at this conclusion largely based on a consideration of autobiographical memory studies using the Galton–Crovitz technique of eliciting memory reports (Crovitz & Quina-Holland, 1976). We exclude these here because this approach does not assess how much is remembered from a given time period, but only provide information about the memories initially retrieved in response to a cue.

A competing prediction is that autobiographical memories will be more likely to be well-fit by linear functions (Linton, 1982a, 1982b). This is based on the pattern of data observed for long-term memory for an extensive autobiographical memory study. A more general prediction is that as material complexity increases, the pattern of forgetting will be more linear (Fisher & Radvansky, 2019). This is based on research that has found clear and stable evidence of linear forgetting. Fisher and Radvansky’s (2019) report highlighted the fact that when a linear pattern of retention and forgetting is observed, the studies involved used more complex materials (e.g., narratives).

Learning characteristics

Another observed methodological factor involved in the pattern of retention and forgetting is the degree of learning. One prediction is that when there is a higher degree of learning, the pattern of forgetting will most likely be linear (Fisher & Radvansky, 2019). A review of the literature showed that when clear linear patterns of forgetting are observed, it is not unusual for such studies to involve a higher degree of learning, as with overlearning (e.g., Burtt & Dobell, 1925a, 1925b).

Having said all of this, we explicitly note here that we are aware that there may be other reports or accounts in the literature that provide a basis for confirmatory predictions that we have missed.

Exploratory analyses

Year of publication

There are no strong a priori expectations that the year a study was published will influence the pattern of data. However, we do include this as a factor in our analysis to address the possibility that something, perhaps methodologically, has changed over the years to yield different patterns of memory retention and forgetting.

Sample size

There is no question that sample sizes can influence the patterns of data observed in psychological studies. Studies with small sample sizes may provide distorted views of the mind. Thus, it is reasonable to expect that differences in sample size may lead to different patterns of results, some of which may be more distorted than others. For example, if some memory functions are only seen with smaller sample sizes, then this would be an indication that that function is a result of more random fluctuations in the observed pattern of data, and not reflective of underlying memory mechanisms. Our analyses allow for this assessment.

Memory test characteristics

While there has been some suggestion that memory test types may influence the pattern of observed retention and forgetting (e.g., Haist et al., 1992), there may be aspects of the memory testing process itself that influence this pattern of which we are not aware. These aspects may include study design (such as whether it is a within- or between-participants design) and the number of observations per person.

Retention characteristics

The influence of various aspects of retention, such as the number of retention intervals, the shortest and longest intervals, the average interval, and the range of the retention interval, are largely unknown. Our analysis will provide some insight into this.

Analyses

We have two basic analyses. The first is in line with the approach taken by Rubin and Wenzel (1996a, 1996b). Specifically, we compare the fits of the top four functions from Rubin and Wenzel’s work (logarithmic, power, hyperbolic-power, and exponential-power), along with the linear function in cases in which there is no net change over time or which memory is increasing, to assess how often each of these fit the data better than the others. Our second major set of analyses was to assess which of a wide range of factors are more likely to produce one pattern of data over another. That is, what aspects of the sample, the materials, the memory assessment, and so on, lead to different patterns of retention and forgetting.

Best-fitting functions analysis

We first consider the assessment of how often each of our functions was the best fit for each of the data sets. As a reminder, Rubin and Wenzel (1996a, 1996b) fit all the data sets in their corpus to 105 functions as part of their primary analysis. They then tallied how often a given function was among the best 10 fitting functions for a given data set. From this, they concluded that the best-fitting four functions were logarithmic, power, exponential-power, and hyperbolic-power. We take a similar approach here, with some changes. Again, we deviate from Rubin and Wenzel in that we are not comparing 105 functions, as they did (they rejected many of them). We also deviated from Rubin and Wenzel in that while they only considered data when r² ≥ .90 for at least one function, we allow for poorer fits. Second, while they treated each data set equivalently, we adjusted for the number of observations in a data set by using weighted means. It is important to account for the size of the study, in terms of the number of participants and the number of observations. Otherwise, studies with few participants/observations can place undue weight on the results and skew our conclusions. For those readers interested in the patterns of results when data sets were limited to r² ≥ .90 and/or the data were not adjusted for the number of observations, these are available in our online Supplement C.

The results of this analysis are in Fig. 1. As can be seen, the logarithmic function was the most successful because it best fit the largest proportion of data sets, followed closely by the linear and exponential-power functions. The hyperbolic-power function did worse, and the power function again did the worst. Finally, the relative proportion of studies that conformed to either no net change or increasing was substantial.

Overall, this analysis failed to support two confirmatory predictions. The first, based on work by Rubin and Wenzel (1996a, 1996b), was that logarithmic functions would do best, followed by power, exponential-power, and hyperbolic power, and with linear patterns doing the worst. Instead, we observed very different patterns of effectiveness of the various functions at capturing memory retention and forgetting.

The second unsupported prediction was that, based on a report by Wixted and Ebbesen (1991a, 1991b), as well as others, the dominant equation would be a power function. They might have observed those results for no other reason than because the averaging of various other underlying functions can be well fit by a power function (e.g., Anderson, 2001), particularly if there is greater variability among the individual forgetting functions due to variability in participants and memory for different material items. Based on this alone, one might expect power functions to do very well, even if the underlying pattern of memory loss is not a power function. However, with our corpus, the power function was the best to the smallest degree. Thus, prior implications of the dominance of the power function as the best description of memory retention and forgetting was not supported. At a minimum, given that power functions can be observed from an averaging of other functions with higher levels of variability in their loss rates, within studies, there is relatively little variability across participants and items.^{Footnote 5}

One other point to note is that goodness of fit values for the loss functions are correlated with one another, suggesting that some of them produce patterns that are difficult to distinguish with some memory data. The correlation matrix is shown in Table 6. As can be seen, the power and logarithmic function fits were highly correlated, as were the exponential-power and hyperbolic-power functions. The linear function differed the most from the other functions. Thus, there seem to be three families of functions here: (1) logarithmic/power, (2) exponential-power/hyperbolic-power, and (3) linear. One could drop consideration of the power and hyperbolic-power functions and still provide a fairly accurate characterization of most forgetting curves.

Table 6 Correlation of the fits for the five functions

Full size table

After assessing which functions fit the data most often, we also assessed how well the various functions did when they were the best fit for a data set. First, the exponential-power (M = .968, SE = .004; min = .764, max = 1.00) and hyperbolic-power functions (M = .966, SE = .006; min = .522, max = 1.00) fits did the best. This was followed by the power (M = .934, SE = .008; min = .651, max = 1.00), linear (M = .888, SE = .010; min = .510, max = 1.00), and logarithmic functions (M = .883, SE = .010; min = .514, max = 1.00).

Typical functions

To get a feel for the different categories of memory retention and forgetting for our five functions, as well as no net change and increasing data sets, we plotted each of these using the median value for the a and b formula values. For the no net change data sets, we used the median initial performance level, and then extended that throughout. For the increasing data sets, we used the median a and b values for the best-fitting linear functions. The plots for these typical patterns are shown in Fig. 2. As can be seen, the logarithmic and power functions both show more dramatic loss over log time, especially for earlier retention intervals, with the power function showing more rapid loss. Also, the exponential-power and hyperbolic-power functions closely resembled each other, and the linear pattern was closer to these as well. Finally, the increasing data sets started out at a lower initial level of performance, overall, and there was very little change over time for the typical data set. Thus, many of these data sets could be grouped with the no net change data sets, without much loss in the accuracy of the characterization of the data.

Memory factors analysis

Our next aim was to determine which factors contribute to different patterns of retention and forgetting. The first step was to assess whether any variables were strongly correlated with one another, and drop those that were to reduce redundancy. The next step was to assess the characteristics of the data sets for our seven patterns of performance over time. This may provide some insight into which study characteristics, when present, are likely to lead to a particular retention and forgetting pattern.

Correlation analysis

Our first step was to assess whether any of our 13 numeric variables are strongly correlated with one another. The results of a correlation matrix are shown in Table 7. First, the correlation between Longest Retention Interval and Range was nearly perfect (r = .99). We elected to drop Range because, conceptually, we are interested in how long people remember things.

Table 7 Correlations among the various independent variables

Full size table

Moreover, we elected to identify variables that are correlated .70 or greater as collinear. To reduce collinearity, we also reduced the number of variables that met this standard criterion. Not surprisingly, the remaining three variables related to the length of the retention interval (Shortest, Longest, and Average Retention Interval), were highly inter-correlated (r = .72 to .98) and collinear. We elected to keep Longest Retention Interval. Another indicator of collinearity was between Complexity and Degree of Learning (r = .77). To address this, we elected to drop the Degree of Learning because it was also highly correlated with Multiple Study Opportunities (r = .52).

Finally, there was still one large correlation, between Amount of Data with Number of Retention intervals (r = .55). This is sensible. In general, the more retention intervals that were tested, the more data there was. This is largely unavoidable, and we retain these variables for our analyses but with an awareness of this relationship. Overall, we reduced the number of independent variables from 13 to 9.

Curve category characteristics

Our next step was to assess how our study characteristics differed across our seven categories. We did this in two ways. We first tested for any differences for each of the individual factors using ANOVA and Tukey tests. We then did logistic regressions in which we assessed the ability of our factors to predict when a given function would be the best at capturing the data sets.^{Footnote 6} The factor means and standard errors for each of the memory patterns is shown in Table 8. Here, we do not report data from studies using anagram solution, matching, problem solving, and source monitoring measures because there were so few of them. Moreover, we collapsed stem and fragment completion studies because they were methodologically and conceptually so similar.

Table 8 Characteristic means of the data subsets (standard error in parentheses)

Full size table

For the data in Table 8, we performed an ANOVA comparing performance on each of the factors (Table 9). If the ANOVA was significant (at least marginally), we also report any significant pairwise Tukey comparisons. One concern may be that five of these patterns are loss functions, and the other two (No net change and Increasing) are not. It might be that there are qualitative differences that result in memory loss versus not. Thus, we did each of our analyses twice, first including the No net Loss and Increasing pattern data sets, and then without. We report the first here, and the results are shown in Table 9. The second are available in our on-line Supplement D for interested readers. There were no major differences between the two analysis approaches.

Table 9 Individual characteristics results of overall ANOVA and Tukey test comparisons (comparisons that did not reach significance are not shown)

Full size table

The results of the regression analyses are shown in Table 10.

Table 10 Results of logistic regressions for each of the characteristics

Full size table