Does sleep benefit source memory? Investigating 12-h retention intervals with a multinomial modeling approach

Berres, Sabrina; Erdfelder, Edgar; Kuhlmann, Beatrice G.

doi:10.3758/s13421-024-01579-8

Published: 03 June 2024

Memory & Cognition (2024)

Abstract

For retention intervals of up to 12 h, the active systems consolidation hypothesis predicts that sleep compared to wakefulness strengthens the context binding of memories previously established during encoding. Sleep should thus improve source memory. By comparing retention intervals filled with natural night sleep versus daytime wakefulness, we tested this prediction in two online source-monitoring experiments using intentionally learned pictures as items and incidentally learned screen positions and frame colors as source dimensions. In Experiment 1, we examined source memory by varying the spatial position of pictures on the computer screen. Multinomial modeling analyses revealed a significant sleep benefit in source memory. In Experiment 2, we manipulated both the spatial position and the frame color of pictures orthogonally to investigate source memory for two different source dimensions at the same time, also allowing exploration of bound memory for both source dimensions. The sleep benefit on spatial source memory replicated. In contrast, no source memory sleep benefit was observed for either frame color or bound memory of both source dimensions, probably as a consequence of a floor effect in incidental encoding of color associations. In sum, the results of both experiments show that sleep within a 12-h retention interval improves source memory for spatial positions, supporting the prediction of the active systems consolidation hypothesis. However, additional research is required to clarify the impact of sleep on source memory for other context features and bound memories of multiple source dimensions.

Introduction

Episodic memory refers to memory for past events, experiences, or the source (context)^{Footnote 1} of information (e.g., location, time; Tulving, 2002). Empirical evidence from neuroimaging techniques such as functional magnetic resonance imaging (fMRI) points to a crucial role of the hippocampus in episodic memory (for reviews, see Eichenbaum et al., 2007; Mitchell & Johnson, 2009). Specifically, the hippocampus appears to bind the content of memories (i.e., item memory) to its unique context (i.e., source memory) during encoding (e.g., Dudai et al., 2015; Mitchell & Johnson, 2009).

Our present research addresses the role of sleep in these source-binding processes. Almost a century of research in neuroscience and psychology has impressively shown that episodic memory is supported by sleep (for a recent meta-analysis, see Berres & Erdfelder, 2021). One mechanism assumed to underlie the sleep benefit in episodic memory is memory consolidation. On this view, memory consolidation during sleep increases episodic memory storage by converting recently encoded and therefore labile memories into more stable long-term memory representations (Buzsáki, 1998; Diekelmann & Born, 2010; Dudai, 2004, 2012; Dudai et al., 2015; Klinzing et al., 2019; Rasch & Born, 2013). There are various theories that explain sleep benefits in episodic memory by memory consolidation, such as the sequential hypothesis^{Footnote 2} (Ambrosini & Giuditta, 2001; Giuditta, 2014; Giuditta et al., 1995) and the synaptic homeostasis hypothesis^{Footnote 3} (Cirelli & Tononi, 2015; Tononi & Cirelli, 2003, 2006, 2014, 2020). In the current work, we focus on memory consolidation as proposed by the active systems consolidation hypothesis (Born & Wilhelm, 2012; Diekelmann & Born, 2010; Feld & Born, 2017; Inostroza & Born, 2013; Klinzing et al., 2019; Rasch & Born, 2013). This hypothesis is arguably “the currently most integrative account of sleep-dependent memory consolidation” (Klinzing et al., 2019, p. 1598), because it incorporates aspects of various consolidation theories – including the sequential and synaptic homeostasis hypotheses. Specifically, the active systems consolidation hypothesis states that during wakefulness, components of a memory representation (e.g., color, texture, odor of a fruit) are formed and distributed across various neocortical brain areas. In parallel, the hippocampus binds these components to a unique memory representation (e.g., Feld & Born, 2017; Klinzing et al., 2019).
During subsequent sleep, especially during slow-wave sleep (SWS), the hippocampal memory representation is replayed by reactivating specific neuronal firing patterns (Klinzing et al., 2019; Lewis & Durrant, 2011; O’Neill et al., 2010; Pfeiffer, 2020; Wilson & McNaughton, 1994). These local synaptic upscaling processes strengthen not only synaptic connections in the hippocampus, and thus stabilize the hippocampal memory representation, but also strengthen the separate components of the memory representation by triggering replay in various neocortical brain areas. Simultaneously, global synaptic downscaling renormalizes the strength of synaptic connections across all cortical and subcortical areas by diminishing neuronal firing rates (Feld & Born, 2017; Klinzing et al., 2019). It is assumed that the combination of local synaptic upscaling and global synaptic downscaling in the hippocampus and neocortex results in a net strengthening of episodic context-bound hippocampal memory representations for relatively short retention intervals (e.g., 12 h) and more gist-like decontextualized neocortical memory representations for longer retention intervals (e.g., 3 days; Klinzing et al., 2019). This assumption is supported by studies indicating a strengthening but no decontextualization of episodic memories within 10–12 h after learning (e.g., Jurewicz et al., 2016; Lutz et al., 2017). In brief, according to the active systems consolidation hypothesis, sleep compared to wakefulness within a 12-h retention interval should strengthen associations between the components of a memory representation that were previously established during encoding.

To investigate the sleep benefit in episodic memory, researchers have often used item-item associations such as word pairs as stimulus material (Diekelmann et al., 2009; Klinzing et al., 2019; for a meta-analysis on single words and word pairs, see Berres & Erdfelder, 2021). By contrast, only a few studies investigated the sleep benefit using item-source associations (for a discussion of functional differences between item-item and item-source associations, see Mayes et al., 2007). In the following section, we review the rather mixed outcomes of studies addressing sleep benefits in memory for item-source associations.

Overview of research on sleep benefits in source memory

Using a split-night design, Rauchs et al. (2004) found better free recall performance for spatial positions (i.e., top vs. bottom) of words in a what-where-when task after sleep in the second half of the night (predominantly characterized by rapid eye movement (REM) sleep) compared to wakefulness. In contrast, sleep–wake comparisons in the second half of the night for word-list associations (i.e., temporal source memory, “when” dimension) showed no significant differences. Correspondingly, the authors found no significant differences for sleep–wake comparisons in the first half of the night (predominantly characterized by SWS) for spatial positions and lists. Furthermore, all sleep–wake comparisons for spatial positions and lists in the subsequent recognition test were not significant. When comparing sleep deprivation in the first versus the second half of the night, the authors found better free recall performance for word positions after SWS deprivation than after REM sleep deprivation (Rauchs et al., 2004). These results suggest that REM sleep contributes to the sleep benefit in item-position associations, thereby conflicting with the active systems consolidation hypothesis, which considers SWS to be more important for memory consolidation. However, in line with the consolidation hypothesis, other split-night studies showed worse memory of the frame color and spatial position for neutral pictures after REM-rich late sleep than after SWS-rich early sleep, pointing to a pivotal role of SWS for memory performance (see Groch et al., 2015; Sopp et al., 2017).

The results were also mixed for studies comparing naps versus wakefulness during the day or early evening: Wang and Fu (2009) as well as Köster et al. (2017) found no significant differences between naps and wakefulness for picture-background color associations, contradicting the active systems consolidation hypothesis. By contrast, van der Helm et al. (2011) found a significant sleep benefit in source memory for word-context associations after naps in line with the active systems consolidation hypothesis. Further support is provided by Lewis et al. (2011, Experiment 2), who observed significantly less forgetting after naps compared to wakefulness in source memory for object-background photo associations.

Classical sleep study designs that compared night-time sleep and daytime wakefulness using retention intervals up to 12 h resulted in somewhat stronger evidence for sleep-induced context memory improvements as predicted by the active systems consolidation hypothesis. Lewis et al. (2011) made use of such a design in their first experiment and found significantly less forgetting of encoding contexts after night-time sleep than daytime wakefulness, very similar to their nap study results in Experiment 2. Also using a retention interval of 12 h filled with either sleep or wakefulness, Mawdsley et al. (2014) observed a significant sleep benefit in source memory for word-position associations. Wang et al. (2017) investigated the sleep benefit for word pair-temporal context associations in children. Specifically, children learned two lists of word pairs separated by a 1-h delay between learning of the first and second list (temporal context). After a retention interval of 11 h, memory for word pairs was tested with a cued recall task. In addition, children were asked to indicate the list of the respective word pair. Interpolated sleep compared to wakefulness improved memory for word pairs and the temporal context but not for word pair-temporal context associations (Wang et al., 2017). Hence, this result provides no support for the prediction of the active systems consolidation hypothesis that sleep compared to wakefulness within a 12-h retention interval improves source memory for word-pair-context associations.

Overall, the empirical evidence concerning sleep benefits in source memory is thus quite mixed. The reviewed studies differ in several aspects that may explain the mixed results observed with respect to the sleep benefit in item-context associations. For example, researchers have not only used a wide variety of sleep study designs (i.e., split-night designs, daytime naps, night-time naps, natural sleep and wakefulness), but also different item materials (i.e., single words, word pairs, pictures) and sources (i.e., spatial positions, frame colors, background colors, background photos, posters, lists), next to different encoding instructions (i.e., intentional learning of item-context associations, incidental learning of item-context associations, intentional learning of items but incidental learning of contexts), participant populations, sample sizes, and experimental designs (i.e., within-subjects design, between-subjects design; see Appendix Table 5 for an overview of study characteristics).

Furthermore, the variety of source memory measures used likely contributed to the inconsistent results. According to the source-monitoring framework, multiple cognitive processes such as memory, decision making, guessing, and response biases are involved in making judgments about the origin of a memory (Johnson et al., 1993). These cognitive processes are confounded in frequently used standard measures of source memory (cf. Batchelder & Riefer, 1990). Source memory is often measured by simply counting the number of correct source attributions (e.g., Groch et al., 2015; Lewis et al., 2011; Mawdsley et al., 2014; Wang et al., 2017) or by using the source-identification measure (SIM; e.g., van der Helm et al., 2011), defined as the proportion of correct source attributions for all target items, irrespective of whether they were identified as “old” or “new.” Another frequently used measure for source memory is the average conditional source identification measure (ACSIM; Rauchs et al., 2004; Sopp et al., 2017; Wang & Fu, 2009), defined as the proportion of correct source attributions for all target items correctly identified as “old,” averaged across the two sources (e.g., left, right). Although item and source memory are somewhat less confounded in ACSIM than in SIM, all these source memory measures confound item memory, source memory, and guessing to some degree (Bröder & Meiser, 2007; Murnane & Bayen, 1996). We therefore argue that more rigorous and less contaminated measures of source memory are required to test whether sleep compared to wakefulness strengthens the context binding of episodic memories for retention intervals up to 12 h, as predicted by the active systems consolidation hypothesis (Inostroza & Born, 2013; Klinzing et al., 2019). Multinomial processing tree (MPT) models of source monitoring (Batchelder & Riefer, 1990; Bayen et al., 1996; Meiser & Bröder, 2002) provide an appropriate framework to achieve this goal. 
However, to our knowledge, sleep benefits in source memory have not yet been investigated using such models. In the current work, we aim to fill this gap by testing the sleep-strengthens-source-memory hypothesis using validated MPT measures of source memory tailored to two different source-monitoring tasks. To ensure comparability with previous research, we additionally report traditional measures of item and source memory.

The current experiments

In Experiment 1, we manipulated the spatial position of pictures on a computer screen in a standard source-monitoring task (e.g., Bayen et al., 1996; Murnane & Bayen, 1996) to investigate source memory for item-context associations after a 12-h retention interval filled with either a period of night-time sleep or daytime wakefulness. We conducted a second experiment with the main purpose of conceptually replicating the results for spatial position memory of Experiment 1. In Experiment 2, we additionally manipulated the frame color orthogonally to the spatial position of pictures. This allowed us to explore two additional research questions: First, does the result for spatial position memory generalize to other source dimensions (i.e., frame color memory)? Second, does sleep compared to wakefulness benefit memory for context-context associations (i.e., bound source memory for spatial position and frame color)?

In both experiments, we explicitly instructed participants to study pictures for a later recognition test (i.e., intentional learning of items), whereas no such instruction was provided for their sources (i.e., sources were learned incidentally). To counteract possible floor effects in source memory, participants performed an orienting task during the learning phase that requires attending to the relevant source information but involves no rehearsal (i.e., indicating spatial positions using response keys during stimulus presentation on the screen; cf. Boywitt & Meiser, 2012). By preventing participants from using explicit rehearsal strategies for item-context and context-context associations, this approach creates a more realistic setting for examining everyday source monitoring. Note that most previous studies on the sleep benefit in context-binding employed intentional learning of item-context associations (for incidental learning, e.g., see Mawdsley et al., 2014; Wang et al., 2017).

To allow comparisons with previous studies, we report hit rates and false-alarm rates in addition to the sensitivity index d’ and response bias c for item memory. Whereas sensitivity and response bias are confounded in hit rates (i.e., proportion of target items correctly identified as “old”) and false-alarm rates (i.e., proportion of distractor items falsely identified as “old”), sensitivity and response bias are separated in d’ and c as derived from the signal detection theory (SDT; Stanislaw & Todorov, 1999; e.g., van der Helm et al., 2011). Specifically, larger positive values of d’ indicate better discrimination between target and distractor items. Response bias c denotes the general response tendency, with larger negative values indicating a stronger “old”-response bias, values close to zero no response bias, and larger positive values a stronger “new”-response bias (Stanislaw & Todorov, 1999).
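The two signal detection measures described above can be illustrated with a minimal sketch. It assumes the standard equal-variance formulas, d’ = z(H) − z(F) and c = −[z(H) + z(F)]/2, where H and F are the hit and false-alarm rates. The function name and the example rates are hypothetical, and rates of exactly 0 or 1 would need a correction that is not shown here.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, fa_rate):
    """Return (d_prime, c) for hit and false-alarm rates strictly between 0 and 1."""
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa            # sensitivity: larger means better discrimination
    c = -0.5 * (z_hit + z_fa)         # bias: negative means an "old"-response tendency
    return d_prime, c


# Hypothetical rates, for illustration only
d_prime, c = sdt_measures(0.80, 0.20)
print(round(d_prime, 2), round(abs(c), 2))
```

With symmetric rates such as these, c is essentially zero, which corresponds to the "no response bias" case in the text.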

For source memory, we report the average conditional source identification measure (ACSIM), defined as the proportion of correct source attributions for all target items correctly identified as “old,” averaged across the two sources (e.g., left, right) of a source dimension (e.g., spatial position; Murnane & Bayen, 1996). Because ACSIM is not defined when all target items correctly identified as “old” are assigned to the same source (e.g., right) of a source dimension (e.g., spatial position), we report the conditional source identification measure (CSIM) in these cases. This measure is defined as the averaged proportion of correct source attributions for all target items correctly identified as “old.” For ACSIM and CSIM, larger positive values indicate better source memory. Note, however, that both measures confound source memory with item memory in some circumstances, for example, when targets are identified as “old” based on guessing (Bayen et al., 1996; for a detailed discussion, see Murnane & Bayen, 1996).

In contrast to ACSIM and CSIM, MPT models allow us to disentangle source memory from item memory and guessing (for reviews on this model class and an MPT tutorial, see Batchelder & Riefer, 1999; Erdfelder et al., 2009; Schmidt et al., 2023). MPT models have therefore gained considerable popularity in source memory research in general (e.g., Arnold et al., 2019; Bell et al., 2012, 2017; Boywitt & Meiser, 2012; Kuhlmann et al., 2016; Nadarevic & Erdfelder, 2013, 2019), albeit with the exception of sleep-related research.

There are several options for fitting MPT models to empirical data (e.g., Heck et al., 2018; Moshagen, 2010; Nestler & Erdfelder, 2023), with complete and partial pooling being the two most often used methods. Specifically, in the complete pooling approach, observed category frequencies are aggregated across participants, and the maximum likelihood (ML) method is used to obtain MPT-parameter estimates (aggregated model-based analysis). In contrast to complete pooling, the partial pooling approach explicitly accounts for individual differences between participants by combining information on the individual and group level (hierarchical model-based analysis). For individual and group-level parameter estimation, partial pooling often relies on a Bayesian approach employing Markov-chain Monte Carlo (MCMC) methods (Heck et al., 2018). Here we performed both aggregated and Bayesian hierarchical model-based analyses to check whether our results are robust against the different distributional assumptions involved in complete and partial pooling. For complete pooling in the aggregated model-based analysis, we used the software multiTree (Moshagen, 2010). The latent-trait approach (Klauer, 2010) as implemented in the R package TreeBUGS (Heck et al., 2018) was used for partial pooling in the hierarchical model-based analysis.
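As a rough illustration of complete pooling, the following sketch sums category frequencies across participants and evaluates the likelihood-ratio statistic G² for one tree under given model probabilities. ML estimation amounts to finding the parameters that minimize this statistic. It is not the multiTree implementation; the function names and toy counts are hypothetical.

```python
import math


def aggregate(counts_per_participant):
    """Complete pooling: sum each response category over participants."""
    return [sum(column) for column in zip(*counts_per_participant)]


def g_squared(observed, probs):
    """G^2 = 2 * sum(n_i * ln(n_i / (N * p_i))) for one tree.

    Empty cells contribute zero. For a model with several trees,
    the per-tree values are added up.
    """
    n_total = sum(observed)
    return 2.0 * sum(
        n * math.log(n / (n_total * p))
        for n, p in zip(observed, probs)
        if n > 0
    )


# Toy data: three participants, three response categories (hypothetical)
pooled = aggregate([[20, 6, 4], [18, 8, 4], [22, 6, 2]])
print(pooled, g_squared(pooled, [0.65, 0.25, 0.10]))
```

Partial pooling, by contrast, would keep the participant-level rows separate and estimate individual parameters jointly with group-level distributions.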

Hypotheses, study design, sample size, and analysis plan were preregistered for Experiment 1 (https://osf.io/gctzn) and Experiment 2 (https://osf.io/a6z4u). For both experiments, the data and stimulus materials are available via the Open Science Framework (OSF; https://osf.io/8rmj2/?view_only=02e5eec5c3e54fd4aff3d55eedebffa7). In the respective Method sections, we provide detailed information about the MPT models used, sample size determination, and data exclusions.

Experiment 1

To reiterate, according to the active systems consolidation hypothesis, the hippocampus binds the content (i.e., item memory) and its unique context (i.e., source memory) to a unique memory representation during encoding. This memory representation is replayed during subsequent sleep, which should result in better item and source memory compared to wakefulness (Feld & Born, 2017; Klinzing et al., 2019). For a 12-h retention interval, the active systems consolidation hypothesis thus predicts that both item memory and source memory should benefit from sleep. To test these two hypotheses, we used the two-high-threshold MPT model of source monitoring (2HTSM) shown in Fig. 1. The 2HTSM model performed best in a comparative validation study of source-monitoring models (Bayen et al., 1996). The model is based on a standard source-monitoring task in which participants study items from two sources and are subsequently asked whether the item was previously presented, and if so, in which source (e.g., Bayen et al., 1996; Murnane & Bayen, 1996). The 2HTSM provides separate parameters for item memory, source memory, and guessing. Specifically, participants correctly recognize a target item presented in source A or B as “old” or a distractor item as “new” with probability D. Conditionally on correct item recognition, participants correctly identify the source with probability d. However, if item memory (1 − D) or source memory (1 − d) fails, participants are assumed to guess. In case of successful item memory but failing source memory, participants correctly guess the source of a target item with probability a. If item memory fails, participants guess “old” with probability b. Finally, if both item and source memory fail, participants correctly guess the source with probability g (Bayen et al., 1996).
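The processing tree just described can be written out as response probabilities. The sketch below assumes the restricted version with a single D and a single d, and it assumes the usual parameterization in which a and g denote the probability of guessing source A (so that a guess is correct for source A items and incorrect for source B items). The function name and parameter values are hypothetical and are not estimates from this study.

```python
def two_htsm(D, d, a, b, g):
    """Response probabilities of a restricted 2HTSM, keyed by item type then response."""
    guess_a = (1 - D) * b * g          # item not detected, guess "old", guess source A
    guess_b = (1 - D) * b * (1 - g)    # item not detected, guess "old", guess source B
    guess_new = (1 - D) * (1 - b)      # item not detected, guess "new"
    return {
        "A": {
            "A": D * d + D * (1 - d) * a + guess_a,
            "B": D * (1 - d) * (1 - a) + guess_b,
            "new": guess_new,
        },
        "B": {
            "A": D * (1 - d) * a + guess_a,
            "B": D * d + D * (1 - d) * (1 - a) + guess_b,
            "new": guess_new,
        },
        "new": {
            "A": guess_a,
            "B": guess_b,
            "new": D + guess_new,      # distractor detected as new, or guessed "new"
        },
    }


# Hypothetical parameter values, for illustration only
probs = two_htsm(D=0.6, d=0.5, a=0.5, b=0.3, g=0.5)
for item_type, row in probs.items():
    print(item_type, {k: round(v, 3) for k, v in row.items()})
```

Each row sums to 1, and raising d while holding the other parameters fixed shifts probability from the wrong to the correct source response only, which is why d can be read as a source memory measure that is separated from item memory and guessing.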

In the most general version of the 2HTSM, item memory, source memory, and source guessing may vary between item types and sources as illustrated in Fig. 1. To arrive at an identifiable and most parsimonious 2HTSM submodel that still fits the data, we first tested invariance of item memory with respect to item types and sources, followed by invariance tests of source memory, and finally guessing. By using this principled strategy, we aimed at identifying a submodel with a minimum of precisely estimable parameters (see Bayen et al., 1996).

According to the active systems consolidation hypothesis, the corresponding item memory (D) and source memory (d) parameters should both be larger when participants sleep during the 12-h retention interval than when they stay awake.

Method

In this experiment, we compared participants randomly assigned to a wake versus sleep condition. Whereas participants in the wake condition learned the material in the morning and were tested in the evening after a 12-h retention interval of daytime wakefulness, this was reversed for participants in the sleep condition, who were tested after a period of night-time sleep. Crucially, previous research using the same sleep study design found comparable performance in learning as well as testing parameters, suggesting that circadian effects are not a serious confound in this design (e.g., Abel & Bäuml, 2012, 2013a, 2013b, 2014; Bäuml et al., 2014; Erdfelder et al., 2024; Fenn & Hambrick, 2013).

Participants

We determined the necessary sample size a priori by conducting two power analyses: First, despite our directional predictions, we conservatively performed an a priori power analysis for a two-tailed t test with two independent groups using G*Power 3.1 (Faul et al., 2007). Given a medium effect size (Cohen’s d = 0.50), a conventional α-level of 0.05, and a target power of 1 − β = 0.80, the analysis resulted in a total sample size of 128 participants. Second, we determined the necessary sample size for the model-based analysis using multiTree (Moshagen, 2010). Assuming a sleep–wake difference of 0.10 in the crucial parameter (D or d, depending on the hypothesis), an analysis based on 130 participants, 60 target items, and 30 distractor items resulted in a power larger than 0.99 for the item memory parameter D and a power of 0.96 for the source memory parameter d (for more detailed information, see the preregistration on the OSF, https://osf.io/gctzn). Thus, we strove for a sample of 130 participants. Data collection took place from fall 2020 to spring 2021. Note that we extended the data collection phase until we reached the desired number of participants because data collection was slow and only a fraction of the targeted sample size was collected within the preregistered 3 months.
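The first power analysis can be approximated by hand. The sketch below is not the G*Power computation, which relies on the noncentral t distribution. It uses the normal approximation n = 2(z_{1−α/2} + z_{1−β})² / d² per group, plus a small-sample correction term of z_{1−α/2}² / 4. Both the correction term and the function name are assumptions introduced here for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80, small_sample_correction=True):
    """Approximate per-group n for a two-tailed, two-sample t test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)        # critical value for the two-tailed test
    z_power = z(power)                # quantile matching the target power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    if small_sample_correction:
        n += z_alpha ** 2 / 4         # assumed correction for using t instead of z
    return math.ceil(n)


print(2 * n_per_group(0.50))                                  # with correction
print(2 * n_per_group(0.50, small_sample_correction=False))   # plain normal approximation
```

With the correction, the approximation arrives at the same total of 128 participants as reported above; the plain normal approximation falls slightly short of it, as expected when the t distribution is replaced by the normal.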

In total, 174 participants recruited via mailing lists of the University of Mannheim, social media, personal contacts, and the online research platform Prolific (https://www.prolific.co; Palan & Schitter, 2018; Peer et al., 2017) took part in the online experiment. To participate in the experiment, participants had to be between 18 and 35 years old, speak German fluently, and have no neurological disorders (see the preregistration on the OSF, https://osf.io/gctzn). After successful completion of the experiment, 103 participants recruited via Prolific (59.20%) were paid a flat fee of £4.50, whereas 71 participants recruited through other channels (40.80%) either received corresponding course credits or were eligible to win vouchers. Due to random assignment to the wake versus sleep condition, the numbers of participants who were paid (n_wake = 50, n_sleep = 53) and of those who received course credits or were eligible to win vouchers (n_wake = 40, n_sleep = 31) were approximately balanced across the experimental conditions.^{Footnote 4} Note that the experiment was successfully completed only if the following two conditions were met: First, all parts of the experiment had to be completed within the set time frames (i.e., registration, learning, and testing session). Second, more than 50% of the responses in the orienting task had to be correct.

Following the preregistered exclusion criteria, 23 participants were excluded from the analysis, because they indicated that they were distracted or interrupted during the experiment. Another four participants had to be excluded because the retention interval was not within 11–13 h. Furthermore, seven participants of the wake condition were excluded because they napped during the retention interval, and two participants were excluded because they reported having neurological disorders. We also excluded two participants because of substantial alcohol consumption during the retention interval (i.e., females were excluded if they consumed more than 20 g alcohol, males were excluded if they consumed more than 40 g alcohol), and one participant with a larger false-alarm rate than hit rate. Three additional participants were excluded for unforeseen reasons not included in the preregistration: One participant reported using memory aids (e.g., notes, screenshots), one participant reported technical problems, and another participant assigned to the wake condition delayed the start of the experiment so that it started in the evening instead of the morning. In sum, we excluded 42 participants, leaving 132 participants (n_wake = 65, n_sleep = 67) for analysis, all of them fluent in German. The 132 participants were between 18 and 35 years of age (M = 26.77 years, SD = 4.48), 84 (63.64%) were female. For all participants, many more than the minimally required 50% of the responses in the orienting task were correct (M_total = 98%, M_wake = 97%, M_sleep = 98%; see Table S1 in the Online Supplemental Materials (OSM) for more detailed sample characteristics), confirming that they paid attention to the source (i.e., spatial position) at encoding.

Materials

We selected 160 colored object photos from the bank of standardized stimuli (BOSS; Brodeur et al., 2010) of which 60 randomly chosen target pictures were displayed on either the left or the right side of the screen (i.e., 30 pictures each were displayed at the 10% and 90% position on the x-axis). Thus, spatial positions of pictures (left vs. right) served as the two sources of interest. Another 30 pictures were randomly selected as distractors, and four additional pictures were randomly selected as buffer items that were included at the start of the learning phase to prevent primacy effects. Note that we decided against including a recency buffer because of the 12-h retention interval. A list of the 160 pictures and detailed information about the selection criteria are available via the OSF (https://osf.io/8rmj2/?view_only=02e5eec5c3e54fd4aff3d55eedebffa7).

Procedure

The online experiment was conducted with SoSci Survey (Leiner, 2020), using lab.js (Henninger et al., 2022) for stimulus presentation during the study phase, and consisted of three parts: registration, learning, and testing session (see Fig. 2 for an illustration). In the registration session, participants gave informed consent before being randomly assigned to either the sleep or wake condition. They were asked to pick a date and time for the first session in line with their randomly predetermined condition (i.e., wake condition: 7 a.m. to 10 a.m.; sleep condition: 7 p.m. to 10 p.m.) and were informed that the second session starts 12 h later. Participants received the access link via email or Prolific notification 15 min before the start of the learning session. During the study phase, 64 randomly selected pictures (i.e., four buffer and 60 target items) were sequentially presented on the left or right side of the screen for 4 s each with an interstimulus interval of 1 s (i.e., blank white screen for 500 ms followed by a fixation cross for 500 ms). While a picture was presented on the screen, participants performed the orienting task, which entailed pressing the correct button for the spatial position within the 4-s picture-presentation time. The two buttons labeled “left” and “right” were arranged next to each other and were displayed below the picture. Only participants who answered with the correct spatial position for more than 50% of the 64 pictures completed the learning session and were invited to the testing session 12 h later. Again, participants received the access link via email or Prolific notification 15 min before the session started. For the testing session, the 60 target items were intermixed with 30 distractor items and presented in the middle of the screen with two buttons labeled “old” and “new” below. Note that we varied the spatial position of the labels “old” and “new” randomly between participants but kept it constant within participants. 
By pressing one of the two buttons, participants indicated whether the picture was presented during the study phase (“old”) or not (“new”). If participants answered “old,” they were asked whether the picture was presented left or right and to respond with the corresponding button. This task was followed by control and demographic questions, which also included the exclusion criteria mentioned before (e.g., distraction, alcohol consumption, use of memory aids, technical problems; for details, see the preregistration on the OSF, https://osf.io/gctzn). Finally, participants were thanked and debriefed.

Results

We set a significance level of α = 0.05 for all analyses. For hit and false alarm rates as well as the d’ and c measures of item recognition, we report means, standard errors, and t-test results in Table 1.^{Footnote 5} Regarding item memory, all two-tailed t tests for two independent groups showed no statistically significant differences between the sleep and the wake condition, t(130) ≤ 1.62, p ≥ 0.107 (see Table 1). In contrast, source memory as measured by ACSIM significantly benefitted from sleep, t(130) = 3.46, p = 0.001, estimated Cohen’s d = 0.59 (sleep condition: M = 0.77, SE = 0.01; wake condition: M = 0.69, SE = 0.01; Table 1).^{Footnote 6} Taken together, using commonly applied measures of item and source memory, we found statistically significant evidence for a sleep benefit in source memory but not in item memory.
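For readers less familiar with the signal detection indices reported in Table 1, the following minimal sketch shows how d’ and c are conventionally computed from a hit rate and a false alarm rate under the equal-variance Gaussian model. The function name and the input rates are illustrative assumptions, not values from Table 1, and the sketch does not reproduce any correction for extreme rates that may have been applied in the reported analyses.

```python
from statistics import NormalDist


def dprime_and_c(hit_rate, fa_rate):
    """Equal-variance signal detection indices (hypothetical helper).

    hit_rate: proportion of targets called "old"
    fa_rate:  proportion of distractors called "old"
    Rates of exactly 0 or 1 are not handled here; they would need a
    correction before the z-transformation.
    """
    z = NormalDist().inv_cdf          # inverse standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa            # sensitivity
    c = -0.5 * (z_hit + z_fa)         # response criterion
    return d_prime, c


# Illustrative rates only (not taken from the article).
d_prime, c = dprime_and_c(0.80, 0.20)
print(f"d' = {d_prime:.2f}, c = {c:.2f}")
```

Positive values of c indicate a conservative tendency to respond "new", negative values a liberal tendency to respond "old".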

Table 1 Results of item- and source-memory analyses in Experiment 1


The most parsimonious model we originally aimed at – Submodel 4 of the 2HTSM with parameter D for item memory, parameter d for source memory, and parameters b and g for guessing (Bayen et al., 1996; see the preregistration on the OSF, https://osf.io/gctzn) – produced considerable misfit for the aggregated data, G²(4) = 10.21, p = 0.037. Whereas invariance of the item and source memory parameters across item types and sources turned out to be unproblematic, additionally assuming invariance of the source guessing parameters a and g produced the observed misfit. We therefore applied Submodel 5a of the 2HTSM (Bayen et al., 1996), which has a single parameter D for item memory, a single parameter d for source memory, and three parameters a, b, and g for guessing. This model fit the data well, G²(2) = 1.78, p = 0.411. The ML parameter estimates, standard errors, and 95% confidence intervals of Submodel 5a for the wake and sleep condition are summarized in Table 2. We found a statistically significant difference between the sleep and wake conditions in the item memory parameter D, ΔG²(1) = 13.66, p < 0.001. The item memory parameter estimate for the sleep condition was almost 5% larger than for the wake condition. Similarly, the source memory parameter d also differed significantly between conditions, ΔG²(1) = 31.30, p < 0.001, with about 15% higher source memory estimates after sleep than after wakefulness. Concerning the guessing parameters, we found significantly more “old”-guessing in the wake than in the sleep condition (parameter b), ΔG²(1) = 6.09, p = 0.014, and a significantly stronger “left” guessing bias for unrecognized items after sleep than after wakefulness (parameter g), ΔG²(1) = 10.13, p = 0.001. By contrast, there was no statistically significant difference between the sleep and wake condition in source guessing for recognized items (parameter a), ΔG²(1) = 1.36, p = 0.243.
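The p values above follow from referring each G² or ΔG² statistic to a chi-square distribution with the stated degrees of freedom. As a minimal sketch of that step, the code below evaluates the upper-tail probability in closed form for df = 1 and for even df, which covers all tests reported in this paragraph. The function name is a hypothetical helper; because the inputs are the rounded statistics from the text, the results should match the reported p values only up to rounding.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-square variate (hypothetical helper).

    Closed forms are used, so only df = 1 and even df are supported.
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        series = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * series
    raise ValueError("this sketch handles only df = 1 or even df")


# Test statistics as reported in the text (label, G2, df).
reported = [
    ("Submodel 4 fit", 10.21, 4),
    ("Submodel 5a fit", 1.78, 2),
    ("D: sleep vs. wake", 13.66, 1),
    ("d: sleep vs. wake", 31.30, 1),
    ("b: sleep vs. wake", 6.09, 1),
    ("g: sleep vs. wake", 10.13, 1),
    ("a: sleep vs. wake", 1.36, 1),
]

for label, g2, df in reported:
    print(f"{label}: G2({df}) = {g2:.2f}, p = {chi2_upper_tail(g2, df):.4f}")
```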

Table 2 Aggregated parameter estimates of the two-high-threshold multinomial model of source monitoring (2HTSM) for Experiment 1


To check the robustness of our results, we reanalyzed the same data by performing a hierarchical model-based analysis in the framework of Klauer’s (2010) latent-trait model as implemented in TreeBUGS (Heck et al., 2018) for partial pooling. As can be seen in the Appendix (see Table 6), the estimated group-level means resembled those reported in Table 2. We thus conclude that the basic result pattern does not depend on whether complete or partial pooling approaches are used for data analysis.

Discussion

Both the ACSIM-based and the model-based results suggest that sleep compared to wakefulness benefits source memory. This is in line with a core prediction of the active systems consolidation hypothesis that sleep benefits source memory for retention intervals of up to 12 h.

For item memory, the descriptive result patterns of d’ and the aggregated as well as hierarchical model-based analyses suggest that sleep compared to wakefulness might benefit item recognition. Whereas item memory was descriptively higher after sleep versus wakefulness in all three analyses, the sleep benefit was significant only for complete pooling. This discrepancy is likely due to the different levels of analysis (i.e., complete pooling, partial pooling, no pooling), which account for potential individual differences to a varying extent. Specifically, the complete pooling approach (aggregated analysis) assumes that the data are independently and identically distributed for all participants, thereby ignoring potential individual differences. By contrast, the partial pooling approach (hierarchical analysis) accounts for individual differences. The same applies to d’, which is calculated for each participant separately (i.e., no pooling). Thus, the sleep benefit in item memory probably reached significance only under complete pooling because this approach ignores individual differences, whereas the partial and no pooling approaches account for them. Importantly, our mixed results concerning item memory are in line with previous research that used recognition tasks to assess item memory and likewise yielded mixed evidence for the active systems consolidation hypothesis: Some studies found a significant sleep benefit in item memory (e.g., Köster et al., 2017; Mawdsley et al., 2014), whereas others did not (e.g., van der Helm et al., 2011; Wang & Fu, 2009). In fact, a recent meta-analysis showed that the sleep benefit for word materials is largest in free recall (Hedges’ g = 0.49), followed by cued recall (Hedges’ g = 0.45), and lastly recognition tasks (Hedges’ g = 0.38; Berres & Erdfelder, 2021). 
This suggests that item recognition benefits from sleep only slightly, which makes such small positive sleep effects difficult to detect in item recognition tasks (e.g., Rauchs et al., 2004; Wang & Fu, 2009).

In sum, Experiment 1 indicates that sleep improves source memory within a 12-h retention interval as predicted by the active systems consolidation hypothesis. However, to establish the validity of this conclusion more rigorously, our results require an experimental follow-up evaluation. We therefore conducted a second experiment with the aim of conceptually replicating the results for spatial position memory. In addition, by manipulating frame color orthogonally to the spatial position of pictures in Experiment 2, we were able to explore whether the results for spatial position memory generalize to a second source dimension (i.e., frame color). Furthermore, we explored whether sleep within a 12-h retention interval also strengthens bound memory for spatial position and frame color.

Experiment 2

As in Experiment 1, we predict that both item memory and source memory should benefit from sleep compared to wakefulness in a 12-h retention interval. Because hippocampal memory representations include not only item-context but also context-context associations, we also explored whether sleep improves bound memory for two source dimensions. We tested these predictions using a reparameterized variant of the MPT model of multidimensional source monitoring (Meiser, 2014), shown in Fig. 3. Like the 2HTSM, this model is based on a source-monitoring task that is, however, extended to two source dimensions (e.g., a position dimension with sources “left” and “right,” and a color dimension with sources “blue” and “yellow”; Meiser, 2014).

The multinomial model of multidimensional source monitoring provides separate parameter estimates for item memory, bound source memory (i.e., spatial position plus frame color), unbound source memory (e.g., spatial position only), and guessing. Specifically, participants correctly recognize a target item presented by source i of the first source dimension (e.g., “left” or “right” on source dimension “spatial position”) and source j of the second source dimension (e.g., “blue” or “yellow” on source dimension “frame color”) as “old” with probability D_ij or detect a distractor item as “new” with probability D_New. Conditional on correct item recognition, participants correctly identify the source combination (e.g., left and blue, left and yellow, right and blue, right and yellow) of recognized items with bound source probability d_ij. In contrast, if bound source memory fails for recognized items (i.e., with probability 1 − d_ij), participants can still correctly identify the sources i (e.g., “left” or “right” on source dimension “spatial position”) and j (e.g., “blue” or “yellow” on source dimension “frame color”) of either or both source dimensions independently with probabilities e_ij^Position and e_ij^Color, respectively. However, if item memory (1 − D_ij), bound source memory (1 − d_ij), and unbound source memory (1 − e_ij^Position, 1 − e_ij^Color) fail, participants are assumed to guess. In case of successful item memory but bound-source-memory and unbound-source-memory failure for either or both source dimensions, participants guess source A of the first source dimension (e.g., “left” on source dimension “spatial position”) for a target item with probability a^Position. 
They also guess source X of the second source dimension (e.g., “blue” on source dimension “frame color”) for a target item assigned to source A (e.g., left) or B (e.g., right) of the first source dimension (e.g., spatial position) with probability a_|left^Color or a_|right^Color, respectively. If item memory fails, participants guess “old” with probability b. For unrecognized target or distractor items identified as “old,” participants guess source A of the first source dimension (e.g., “left” on source dimension “spatial position”) with probability g^Position. In addition, they guess source X of the second source dimension (e.g., “blue” on source dimension “frame color”) for unrecognized target or distractor items assigned to source A (e.g., left) or B (e.g., right) of the first source dimension (e.g., spatial position) with probability g_|left^Color or g_|right^Color, respectively (Meiser, 2014).
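To make the branching structure just described more concrete, the following sketch computes the predicted response probabilities for a single target item. It is our reading of the verbal description only and should not be taken as the authoritative model specification, which is given in Fig. 3 and in Meiser (2014). All dictionary keys (e.g., `a_col_left` for a_|left^Color) and parameter values are illustrative assumptions, a single value per parameter type is used for simplicity, and color guessing is assumed to be conditional on the position that is responded.

```python
from itertools import product

POSITIONS = ("left", "right")
COLORS = ("blue", "yellow")


def pick(prob_first, options, choice):
    """Probability of `choice` if the first option is chosen with prob_first."""
    return prob_first if choice == options[0] else 1.0 - prob_first


def target_probs(position, color, p):
    """Response probabilities for a target studied at (position, color)."""
    probs = {combo: 0.0 for combo in product(POSITIONS, COLORS)}

    def spread(weight, known_pos, known_col, prefix):
        # Distribute `weight` over the four source responses. A dimension
        # that is not remembered (None) is guessed using the parameters
        # with the given prefix ("a": recognized, "g": unrecognized items).
        for pos, col in probs:
            if known_pos is None:
                p_pos = pick(p[prefix + "_pos"], POSITIONS, pos)
            else:
                p_pos = 1.0 if pos == known_pos else 0.0
            if known_col is None:
                p_col = pick(p[prefix + "_col_" + pos], COLORS, col)
            else:
                p_col = 1.0 if col == known_col else 0.0
            probs[(pos, col)] += weight * p_pos * p_col

    D, d, e_pos, e_col, b = p["D"], p["d"], p["e_pos"], p["e_col"], p["b"]
    spread(D * d, position, color, "a")                # bound source memory
    rest = D * (1 - d)                                 # bound memory fails
    spread(rest * e_pos * e_col, position, color, "a")
    spread(rest * e_pos * (1 - e_col), position, None, "a")
    spread(rest * (1 - e_pos) * e_col, None, color, "a")
    spread(rest * (1 - e_pos) * (1 - e_col), None, None, "a")
    spread((1 - D) * b, None, None, "g")               # guessed "old"

    result = dict(probs)
    result["new"] = (1 - D) * (1 - b)                  # guessed "new"
    return result


# Illustrative parameter values only (not estimates from the article).
params = {
    "D": 0.7, "d": 0.2, "e_pos": 0.5, "e_col": 0.1, "b": 0.3,
    "a_pos": 0.5, "a_col_left": 0.5, "a_col_right": 0.5,
    "g_pos": 0.5, "g_col_left": 0.5, "g_col_right": 0.5,
}

for response, prob in target_probs("left", "blue", params).items():
    print(response, round(prob, 4))
```

Distractor items would be handled analogously, with detection as “new” governed by D_New and the same guessing branches for undetected items.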

In its most general version, the multidimensional source memory model allows for parameters that may differ between item types and sources, as illustrated in Fig. 3. To simplify this model and ensure identifiability of parameters, we employed basically the same principled strategy as previously used for the 2HTSM in Experiment 1. Specifically, we successively imposed the following constraints on the parameters (cf. Meiser, 2014; Meiser & Bröder, 2002): First, the item memory parameters D_ij were equated across the source dimensions “spatial position” and “frame color,” and D_New was constrained to be equal to the resulting item memory parameter D. Second, the bound source memory parameters d_ij were also equated across the source dimensions “spatial position” and “frame color” (parameter d). Next, the unbound source memory parameters for spatial position e_ij^Position and frame color e_ij^Color were equated across the source dimension “frame color” (parameter e^Position) and “spatial position” (parameter e^Color), respectively (Meiser, 2014; Meiser & Bröder, 2002). Finally, additional equality constraints were imposed on the source guessing parameters (i.e., a^Position = g^Position, a_|left^Color = g_|left^Color, a_|right^Color = g_|right^Color).
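Written compactly, and only restating the constraints listed above in LaTeX notation (for all sources i and j), the restricted model assumes:

```latex
\begin{align*}
D_{ij} &= D_{\text{New}} = D, \\
d_{ij} &= d, \\
e_{ij}^{\text{Position}} &= e^{\text{Position}}, \qquad
e_{ij}^{\text{Color}} = e^{\text{Color}}, \\
a^{\text{Position}} &= g^{\text{Position}}, \\
a_{|\text{left}}^{\text{Color}} &= g_{|\text{left}}^{\text{Color}}, \qquad
a_{|\text{right}}^{\text{Color}} = g_{|\text{right}}^{\text{Color}}.
\end{align*}
```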

Drawing on the active systems consolidation hypothesis, we predict for a 12-h retention interval that the corresponding item memory parameters, bound source memory parameters, and unbound source memory parameters e^Position and e^Color should be larger after sleep than after wakefulness.