Maintenance of to-be-remembered information is one of the defining functions of working memory (Baddeley, 2012; Camos, 2015). Working memory for verbal materials—both auditory speech sounds and visual orthographic materials—has been studied at length, particularly in the context of the phonological loop mechanism of Baddeley’s multicomponent working memory (M-WM) model. Research on working memory mechanisms for nonverbal sounds—including music and common environmental sounds—has lagged behind the extensive literature on verbal working memory.

The M-WM model specified an articulatory, subvocal rehearsal mechanism that works in tandem with a passive store to maintain verbal information in working memory. Research has demonstrated several characteristic effects of subvocal maintenance (e.g., the phonological similarity effect and the word length effect; see Baddeley, 2012; Camos, 2015). Additionally, studies have shown a detrimental impact on verbal memory when tasks are introduced to interfere with the articulatory rehearsal mechanism (e.g., Baddeley, Lewis, & Vallar, 1984). The disruption of subvocal rehearsal, dubbed articulatory suppression (AS), often has consisted of the rote repetition of irrelevant verbal stimuli such as a string of numbers or a single word or syllable. The negative impact of AS on verbal working memory tasks offered evidence for the central role of articulation in working memory for verbal stimuli. Articulation, especially covert subvocalization, became a hallmark of verbal mental representations (for a review, see Baddeley, 2012).

Berz (1995) examined the literature on working memory for music in an attempt to draw comparisons with the M-WM model. He concluded that musical stimuli do not exhibit the same pattern of effects as verbal stimuli in working memory. For example, Berz noted Salame and Baddeley’s (1987) finding that lyrical music interfered with concurrent verbal working memory tasks to a greater extent than instrumental music—a finding that Berz found difficult to reconcile with a shared working memory mechanism. As a result, Berz proposed a “musical memory loop” for rehearsing musical stimuli (tonal melodies, etc.) that was independent from the phonological loop of verbal working memory.

If the articulatory rehearsal mechanism of verbal working memory does not play a role in working memory for nonverbal sounds, then AS should not interfere with concurrent working memory tasks for nonverbal sounds. Yet the studies that have looked for effects of AS on nonverbal sounds have produced conflicting results. Logie and Edworthy (1986, Exp. 1) showed that AS (repeating “the”) interfered with same–different comparisons of melodies relative to a control condition without suppression. Similarly, Schendel and Palmer (2007, Exp. 1) compared same–different judgments for melodies versus strings of random digits across auditory or visual presentations of the stimuli. A control group was compared to groups experiencing either verbal (repeating “the”) or “musical” (repeatedly singing “la” in a constant pitch) suppression. Both types of suppression showed large and equal interference effects for both digits and melodies that persisted across presentation modalities.

Similar results have been found with various combinations of musical and AS tasks. Smith, Wilson, and Reisberg (1995, Exp. 3) showed that AS interfered with judgments about changes in the pitch of notes in familiar melodies (e.g., “Take Me Out to the Ball Game”). Brodsky, Kessler, Rubinstein, Ginsborg, and Henik (2008) showed that comparing musical notation to a heard melody was impaired by articulation (singing/humming an unrelated tune) in highly trained musicians. Koelsch et al. (2009) used stimuli embedded with both verbal (sung syllable) and tonal (pitch) content. AS (covert singing) impaired memory for both the verbal and tonal aspects of heard stimuli.

In other reports, however, AS did not impair working memory for nonverbal sounds. Logie and Edworthy (1986, Exp. 2) found that AS did not interfere with a two-note pitch change detection task as compared to a control condition. More recently, Soemer and Saito (2015, Exp. 1) used sounds that were abstract and discriminable primarily by their timbre. The primary task involved the presentation of two, three, or four to-be-remembered sounds, followed by a retention interval. Participants then heard a single probe sound and indicated whether it was in the initial set. Relative to a control condition, AS during the retention interval impaired performance only for the two-item sets. McKeown, Mills, and Mercer (2011) reported a study in which reading lists of words aloud—an activity akin to AS—did not impair the discrimination of sounds with subtle timbre differences across retention durations from 5 to 30 s. These studies suggested that articulatory rehearsal may not fully explain memory for nonverbal sounds.

Relatively recent research has provided evidence that, in addition to articulatory rehearsal, an attentional refreshing mechanism operates to maintain verbal information in working memory. Refreshing involves periodically “thinking of” (see Raye, Johnson, Mitchell, Greene, & Johnson, 2007) an item or representation in memory, which facilitates its maintenance. Apart from not involving subvocalization, refreshing differs from articulatory rehearsal primarily in that it consumes attentional resources. Whereas articulatory rehearsal proceeds as a mostly automatic (i.e., resource-free) process after the articulatory program has been constructed (Naveh-Benjamin & Jonides, 1984), attentional refreshing draws upon central, resource-limited processes to maintain information in working memory. These two mechanisms have been posited to “operate jointly and independently” (Camos, Lagner, & Barrouillet, 2009, p. 459).

Attentional refreshing may offer one possible explanation for the inconsistent effects of AS on memory for nonverbal sounds (see Siedenburg & McAdams, 2017). Perhaps in some circumstances involving AS, attentional refreshing is an additional mechanism that can support the maintenance of nonverbal sounds in working memory, whereas in other circumstances this mechanism is not available or is not used. In the present experiments, we examined the effects of both AS and the suppression of attentional refreshing on memory for four-note melodies. Participants heard a four-note standard melody. Following an 8-s retention interval, they heard a comparison melody and decided whether it was the same as or different from the standard. The stimuli and the melody memory task were similar to those used by Schendel and Palmer (2007).

The suppression tasks were modeled after the interference tasks of Camos et al. (2009, Exp. 1). In their study, participants attempted to remember a string of consonants. During a retention interval, the participants were exposed to simple math problems. In the AS condition, participants viewed a solved math problem (e.g., “5 – 3 = 2”) and read the problem aloud (e.g., “five minus three equals two”). Another condition featured both AS and attentional refreshing suppression (ARS) concurrently; participants viewed an unsolved problem (e.g., “7 – 4 = ?”) and were required to solve and then read the problem aloud (e.g., “seven minus four equals three”). In both conditions, articulatory rehearsal was blocked by the reading-aloud component, but the latter condition also was hypothesized to block attentional refreshing rehearsal due to the additional load that solving the simple math problems imposed on central attentional resources. Accordingly, Camos et al. found that participants’ verbal memory spans were significantly reduced when solving the math problems as compared to only reading them, which the researchers interpreted as evidence that both articulatory rehearsal and attentional refreshing operate to maintain information in verbal working memory. Their study was one of the first to establish the independent effects of AS and ARS on memory for verbal materials. Furthermore, their methodology was amenable to translation to nonverbal auditory stimuli—unlike, for example, some of their later studies (see Camos, 2015) that examined phonological similarity and word length, two verbal working memory effects that have no clear parallel for melodic stimuli.

We used four experimental conditions in our experiments. During a control condition, participants experienced no suppression tasks during an 8-s retention interval between the to-be-compared melodies. To parallel the method of Camos et al. (2009), during an AS condition, participants read aloud already solved math problems. A concurrent AS and ARS condition required them to solve the math problems while reading them aloud. We added a fourth condition that only featured ARS without AS; participants in this condition viewed the unsolved math problems and solved them by responding manually on the computer keyboard without articulating anything during the retention interval. If attentional refreshing and articulation both play roles in the maintenance of melodies (as seems to be the case with verbal working memory), we predicted that the AS and ARS conditions should both exhibit worse memory performance than the control condition. Also, the concurrent AS and ARS condition should result in worse performance than all of the other conditions.

Experiment 1

Participants

The participants (N = 40, 30 females, 10 males; M age = 19.58 years, SD = 1.22) were volunteers from undergraduate psychology courses. Those who were enrolled in eligible courses were compensated with course credit; some volunteered without compensation. Participants reported means of 4.25 years of experience playing a musical instrument (SD = 4.89, mdn = 2.50, mode = 0), 3.90 years of experience reading musical notation (SD = 4.76, mdn = 2.50, mode = 0), and 4.00 years of formal musical training (SD = 4.15, mdn = 3.00, mode = 0). On a scale from 1 (I have no musical ability at all) to 7 (I am a professional musician), participants’ average self-reported musical ability was 2.85 (SD = 1.44, mdn = 3.00, mode = 2).

Stimuli

Melodies

The melody stimuli were modeled after the stimuli used by Schendel and Palmer (2007). Sixty-four different four-note standard melodies were created. Individual notes used the MIDI piano timbre and were 500 ms in duration, including 10-ms onset and offset ramps. To generate the melodies, notes were randomly drawn from the nine notes (D4 to E5) of the C major diatonic scale. Adjustments were made to the randomly generated note sequences (i.e., a note was shifted one note up or down) as necessary to prevent the same note from repeating successively, and no successive intervals of an octave or more were permitted. Each monaural melody had a total duration of 2,000 ms. From the 64 standard melodies, 32 additional comparison melodies were created. The comparison and standard melodies differed by a one-note interval, either increasing or decreasing in pitch, on either the second or the third note in the melody. The directions and positions of pitch changes were distributed uniformly across conditions of the study.

Math problems

The math problems used in the experiment were generated from a website (see Footnote 1). The problems involved only addition or subtraction, and all problem constituents and solutions were constrained to the digits 1–9.

Apparatus

The experiment was conducted on a 21.5-in. iMac. Sounds were presented through desktop speakers positioned approximately 18 in. to each side of the computer display.

Task and experimental conditions

For each study trial, participants listened to a melody that was followed by an 8-s delay interval, then listened to a second melody. Their task was to decide whether the two melodies were the same or different. Four experimental conditions were interposed during the 8-s interval. Figure 1 shows a schematic representation of the structure of trials in each of the four conditions.

Fig. 1 Schematic representation of trials in the four experimental conditions

For control trials, a blank screen filled the 8-s delay. For articulatory suppression (AS) trials, participants read two different solved math problems aloud during the 8-s interval. Each solved problem appeared in its entirety for 3,200 ms and was followed by an 800-ms blank screen. Participants spoke the problem aloud as it appeared on the screen. Articulatory suppression plus attentional refreshing suppression (AS + ARS) trials were the same as the AS trials, except the math problems were presented in an unsolved format and had to be solved by the participants. For attentional refreshing suppression (ARS) trials, participants viewed unsolved math problems and had to solve them. Instead of reading and solving the problems aloud, however, participants silently viewed and solved the problems and responded using the computer keyboard. Therefore, this condition suppressed attentional refreshing rehearsal without interfering with articulatory rehearsal.
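The timing of the retention interval can be checked with a short sketch. The 3,200-ms display and 800-ms blank are stated above for the AS trials. Applying the same timing to the ARS trials is an assumption, since the text does not give it separately. The function name and event labels are hypothetical.

```python
PROBLEM_MS = 3200     # each math problem stays on screen this long
BLANK_MS = 800        # blank screen after each problem
RETENTION_MS = 8000   # retention interval between the two melodies
PROBLEMS_PER_TRIAL = 2

# Problem format per condition; "control" shows no problems.
FORMATS = {"AS": "solved", "AS+ARS": "unsolved", "ARS": "unsolved"}


def retention_schedule(condition):
    """Return (event, onset_ms, duration_ms) tuples for one retention interval."""
    if condition == "control":
        return [("blank", 0, RETENTION_MS)]
    label = FORMATS[condition] + " problem"
    events = []
    onset = 0
    for _ in range(PROBLEMS_PER_TRIAL):
        events.append((label, onset, PROBLEM_MS))
        onset += PROBLEM_MS
        events.append(("blank", onset, BLANK_MS))
        onset += BLANK_MS
    return events


def total_ms(events):
    """Sum of event durations in a schedule."""
    return sum(duration for _, _, duration in events)
```

Two problems at 3,200 ms plus 800 ms each fill the 8-s interval exactly, so every condition has a retention interval of the same length.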

Procedure

Following informed consent, participants were seated in front of the computer in an individual testing room. An experimenter sat behind each participant at a slight angle for the duration of the study to observe and confirm that participants were following all instructions. This experimenter also noted incorrect responses to the math problems in the AS + ARS and ARS conditions. A computer program written in Adobe Flash embedded in the Qualtrics platform presented all stimuli and collected all primary task data. From the initial pool of 64 melodies, 16 were randomly allocated to each of the four conditions. From the initial pool of 48 math problems, 16 were randomly allocated to each of the three conditions that used math problems. The conditions were presented in a random order to each participant. Sixteen trials (eight “same” and eight “different” trials) were presented in each condition in a random order. Participants registered their responses (“same” or “different”) with the computer mouse after the conclusion of the comparison stimulus. There were mandatory 5-s minimum pauses between trials (i.e., after a response was registered and before the participant could click a button to initiate the next trial). Following the experimental procedures, participants answered demographic questions. Finally, as an attentional check item, participants responded to the statement “I followed the study instructions to the best of my ability for this entire experiment” on a rating scale from 1 (strongly disagree) to 7 (strongly agree).
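The random allocation described above can be sketched as follows. This is an illustration, not the authors' Flash program. It assigns "same" and "different" trial types to melodies at random, which is an assumption, and it does not model how the allocated math problems were distributed across trials, which the text does not specify. All names are hypothetical.

```python
import random

CONDITIONS = ["control", "AS", "AS+ARS", "ARS"]
MATH_CONDITIONS = ["AS", "AS+ARS", "ARS"]


def allocate(items, groups, rng):
    """Shuffle items and deal them into equally sized groups."""
    pool = list(items)
    if len(pool) % len(groups) != 0:
        raise ValueError("items cannot be split evenly across groups")
    rng.shuffle(pool)
    size = len(pool) // len(groups)
    return {g: pool[i * size:(i + 1) * size] for i, g in enumerate(groups)}


def build_session(rng):
    """Return a list of (condition, trials, problem_ids) blocks in random order."""
    melodies = allocate(range(64), CONDITIONS, rng)      # 16 melodies per condition
    problems = allocate(range(48), MATH_CONDITIONS, rng)  # 16 problems per math condition
    order = list(CONDITIONS)
    rng.shuffle(order)
    session = []
    for cond in order:
        # Eight "same" and eight "different" trials, in random order.
        trial_types = ["same"] * 8 + ["different"] * 8
        rng.shuffle(trial_types)
        trials = list(zip(melodies[cond], trial_types))
        session.append((cond, trials, problems.get(cond, [])))
    return session
```

Passing a seeded `random.Random` instance makes a participant's session reproducible.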

Preliminary power analysis

In Camos et al. (2009, p. 461), the effect size for the difference between AS and AS + ARS was ηp² = .35. An a priori power analysis using the G*Power software indicated that our design (four repeated measures conditions, a correlation of r = .50 among the repeated measures, α = .05, and N = 40) would detect an effect of ηp² = .055 with power = .95.
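G*Power takes Cohen's f rather than partial eta squared as its effect size, so the values above need converting. The sketch below shows the standard conversion, f = sqrt(η² / (1 − η²)). The noncentrality formula is an assumption about how G*Power treats a within-subjects factor (λ = f² · N · m / (1 − ρ), with m measurements and correlation ρ), and it is included only to show how the design parameters enter the calculation. Function names are hypothetical.

```python
import math


def eta_p2_to_f(eta_p2):
    """Convert partial eta squared to Cohen's f."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must be in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


def noncentrality_within(f, n, m, rho):
    """Noncentrality parameter for a within-subjects effect.

    Assumption: lambda = f^2 * n * m / (1 - rho), believed to match
    G*Power's repeated measures, within-factors procedure.
    """
    return (f ** 2) * n * m / (1.0 - rho)


f_detectable = eta_p2_to_f(0.055)   # smallest effect the design was powered for
f_reported = eta_p2_to_f(0.35)      # effect reported by Camos et al. (2009)
lam = noncentrality_within(f_detectable, n=40, m=4, rho=0.50)
```

The detectable effect converts to an f of about 0.24, well below the f of about 0.73 implied by the Camos et al. effect.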

Results and discussion

The data from five participants were considered for elimination from the analyses. The first four participants completed the task with a mandatory 10-s pause between trials; for practical reasons, this pause was halved for the remaining 36 participants. Another participant self-reported a score of 4 on the attentional-check item; all other participants responded with ratings of 6 or 7 (M = 6.77, SD = 0.58, mdn = 7, mode = 7). All analyses showed the same pattern of significant effects with or without these five participants included in the sample, so the data from all 40 participants were included in all analyses, for completeness.

Measures of sensitivity (d') were calculated for each participant in each condition. Hit rates and false alarm rates of 0 and 1.00 in the data were replaced with rates of .0625 (0.5 ÷ n, where n was the number of possible hits or false alarms) and .9375 [(n – 0.5) ÷ n], respectively (for a complete discussion of rates of 0 and 1.00 in signal detection calculations, see Stanislaw & Todorov, 1999). Visual inspection of the histograms confirmed that all variables were approximately normally distributed. The assumption of sphericity was met for all analyses. All post hoc comparisons used Fisher’s LSD test.
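The sensitivity calculation and the correction for extreme rates can be sketched as follows. The sketch assumes eight signal and eight noise trials per condition, as described in the Procedure. Which trial type counts as the signal is not stated in the text, so treating "different" trials as signal trials is an assumption. Function names are hypothetical.

```python
from statistics import NormalDist


def corrected_rate(count, n):
    """Proportion of responses, with rates of 0 and 1 replaced.

    A rate of 0 becomes 0.5 / n and a rate of 1 becomes (n - 0.5) / n,
    following the correction described in the text.
    """
    if not 0 <= count <= n:
        raise ValueError("count must be between 0 and n")
    if count == 0:
        return 0.5 / n
    if count == n:
        return (n - 0.5) / n
    return count / n


def d_prime(hits, false_alarms, n_signal=8, n_noise=8):
    """d' = z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = corrected_rate(hits, n_signal)
    fa_rate = corrected_rate(false_alarms, n_noise)
    return z(hit_rate) - z(fa_rate)
```

With n = 8, the corrected rates are .0625 and .9375, so the largest d' a participant could obtain in one condition is about 3.07.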

A 1 × 4 repeated measures analysis of variance (ANOVA) showed a significant effect of condition on d' scores, F(3, 117) = 9.97, p < .001, ηp² = .20. Pairwise comparisons showed that, as compared to the control condition (M = .89, SE = .15), participants performed worse in both the AS (M = .32, SE = .12), p < .001, and AS + ARS (M = .42, SE = .12), p = .002, conditions, which were not different from each other, p = .51. Both the AS and AS + ARS conditions also resulted in worse performance than the ARS (M = .97, SE = .15) condition, p < .001 and p = .001, respectively. The control and ARS conditions were not different from each other, p = .53. The results for the d' dependent variable are shown in Fig. 2.

Fig. 2 Sensitivity scores as a function of condition in Experiment 1. Error bars represent standard errors

Incorrect responses to the math problems in the AS + ARS (M = 0.35, SD = 0.80, mdn = 0, mode = 0) and the ARS (M = 0.95, SD = 1.01, mdn = 1.00, mode = 0) conditions were rare. There was no significant relationship (which would indicate a performance trade-off) between math errors and d' scores for either the AS + ARS condition, rs(38) = .28, p = .08, or the ARS condition, rs(38) = .13, p = .43.

The results of Experiment 1 showed that AS interfered with memory for melodies, which suggested that articulatory rehearsal is involved in the maintenance of melodies in working memory. An alternate interpretation is that the results obtained in Experiment 1 were due to perceptual (acoustic) interference. The two AS conditions were the only conditions that also featured an auditory stimulus—the sound of participants’ own voices—during the retention interval. In Experiment 2, we sought to rule out this explanation by using silent, mouthed AS.

Experiment 2

Participants

The recruitment of participants (N = 36, 28 females, 8 males; M age = 19.53 years, SD = 0.91) followed the same procedures as in Experiment 1. Participants reported means of 4.24 years of experience playing a musical instrument (SD = 4.43, mdn = 3.00, mode = 0), 3.33 years of experience reading musical notation (SD = 4.43, mdn = 0.50, mode = 0), and 3.99 years of formal musical training (SD = 4.35, mdn = 2.50, mode = 0). On a scale from 1 (I have no musical ability at all) to 7 (I am a professional musician), participants’ average self-reported musical ability was 2.78 (SD = 1.44, mdn = 3.00, mode = 1).

Stimuli, apparatus, task, and procedure

The methods of Experiment 2 were the same as those of Experiment 1 in all regards except that articulatory suppression was silent. Participants were instructed to “move your mouth as if reading each math problem aloud as you see it on the screen and solve the problem, but do not actually make any sound.” Participants were told to move their mouths such that the movement was observable, and the experimenter observed to confirm compliance with these instructions.

Results and discussion

The data from one participant who self-reported a score of 3 on the attentional check item were considered for elimination from the analyses. All other participants responded with ratings of 5, 6, or 7 (M = 6.64, SD = 0.80, mdn = 7, mode = 7). All analyses showed the same pattern of significant effects with or without this participant included in the sample, so the data from all 36 participants were included in all analyses, for completeness.

A 1 × 4 repeated measures ANOVA showed a significant effect of condition on d' scores, F(3, 105) = 7.26, p < .001, ηp² = .17. Pairwise comparisons showed that, as compared to the control condition (M = .58, SE = .11), participants performed worse in both the AS (M = .28, SE = .08), p = .006, and AS + ARS (M = .26, SE = .08), p = .006, conditions, which were not different from each other, p = .82. Both the AS and AS + ARS conditions also resulted in worse performance than the ARS (M = .63, SE = .09) condition, ps = .001. These results are shown in Fig. 3.
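The reported effect sizes can be recovered from the F ratios and their degrees of freedom with the standard identity for partial eta squared, (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the omnibus tests of both experiments. The error degrees of freedom assume sphericity, which the text reports was met. Function names are hypothetical.

```python
def rm_anova_df(n_participants, n_conditions):
    """Degrees of freedom for a one-way repeated measures ANOVA (sphericity assumed)."""
    df_effect = n_conditions - 1
    df_error = (n_participants - 1) * (n_conditions - 1)
    return df_effect, df_error


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Experiment 1: N = 40, F(3, 117) = 9.97
df1 = rm_anova_df(40, 4)
eta_exp1 = partial_eta_squared(9.97, *df1)

# Experiment 2: N = 36, F(3, 105) = 7.26
df2 = rm_anova_df(36, 4)
eta_exp2 = partial_eta_squared(7.26, *df2)
```

Rounded to two decimals, the two values match the reported effect sizes of .20 and .17.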

Fig. 3 Sensitivity scores as a function of condition in Experiment 2. Error bars represent standard errors

Incorrect responses to the math problems in the AS + ARS condition were not recordable, since the participants were not speaking aloud. Incorrect responses in the ARS condition (M = 0.64, SD = 1.09, mdn = 0, mode = 0) were rare. There was no significant relationship (which would indicate a performance trade-off) between math errors and d' scores in the ARS condition, rs(34) = .21, p = .22.

The results of Experiment 1 were replicated and extended in Experiment 2 using silent AS. This suggested that the locus of interference in the AS conditions was articulatory rather than auditory in nature. Macken and Jones (1995) also provided evidence that working memory interference from AS is articulatory rather than auditory in nature.

General discussion

Across two experiments, both conditions that featured AS during the retention interval showed worse performance than the control condition. We found no evidence, however, that attentional refreshing—a central, attention-based resource—operated to maintain melodies in memory. ARS alone did not impair memory performance as compared to the control condition, and ARS in addition to AS did not result in worse performance than did AS alone (as had been demonstrated in past research on memory for verbal information; see Camos et al., 2009). Our results suggested that the articulatory mechanism of verbal working memory is important in the maintenance of memory for melodies.

Considering that the hypothetical attentional refreshing mechanism requires central (i.e., general) attentional resources, one possible interpretation of the lack of impact of ARS on memory is that the ARS math task was not difficult (i.e., resource-demanding) enough to impact d' scores in our experiments (see, e.g., Navon & Gopher, 1979). Camos et al. (2009) found an effect of the same ARS task on the retention of lists of consonants. Though some of the lists used by Camos et al. were longer (up to seven consonants) than our four-note melodies, research has suggested that memory for tones is inferior to memory for verbal stimuli (Li, Cowan, & Saults, 2013). Pechmann and Mohr (1992) suggested that “the articulatory loop is assumed to work rather automatically . . . [but] tonal rehearsal may be a much more controlled process that requires additional allocation of attention” (p. 315). Since our modal participant had little to no musical training, the retention of melodies with attentional refreshing should have been especially demanding for them, and therefore particularly sensitive to interference from additional attentional load, yet we found no evidence of this.

Another possible interpretation of our findings requires that we reconsider the notion of attentional resources as a central, general (i.e., all-purpose) construct. Indeed, recent literature on the attentional refreshing mechanism of verbal working memory has assumed that attentional resources are not fractionated; that is, solving a math problem and remembering verbal material both require the same central resource pool, according to this account (see Camos, 2015; Camos et al., 2009). Other theoretical orientations, however, have attributed cognitive processes to multiple, relatively independent pools of resources (e.g., Wickens, 2002), but these perspectives have not specified how nonverbal sounds are rehearsed in memory. It remains possible that a mechanism in addition to articulatory rehearsal might support the maintenance of nonverbal sounds in memory. Our results suggested, however, that any nonarticulatory, attentional resource explanation of the maintenance of melodies in working memory may rely on a more specialized attentional mechanism than the general one described in the verbal working memory literature (e.g., Camos, 2015). Another possibility is that the recall task used by Camos and colleagues is more sensitive to attentional refreshing interference than the two-stimulus comparison task used here. Unfortunately, there is no established recall task procedure with nonverbal auditory stimuli, because an appropriate response modality is not apparent.

Some researchers (e.g., Soemer & Saito, 2015) have suggested that, whereas the pitch content of nonverbal sounds can be rehearsed with articulation, memory for the timbral properties of sounds must use a different mechanism. Indeed, of the numerous studies (reviewed above) that have shown that AS impairs memory for nonverbal sounds, most (including the present study) used stimuli whose primary distinguishing property was pitch, or perhaps more specifically, pitch change or melodic contour. The possibility remains that a mechanism like attentional refreshing could operate independently from articulatory rehearsal to maintain nonpitch properties (e.g., timbre) of nonverbal sounds.

Given the results of the present study, however, it is unclear why a nonarticulatory mechanism that could maintain acoustic timbre information could not also be deployed to maintain acoustic pitch information when articulatory rehearsal was obstructed. The adaptive value of a rehearsal mechanism dedicated exclusively to timbre that cannot access pitch information is unclear. Future research should examine the stimulus characteristics, task demands, and individual differences in maintenance strategies that do or do not permit articulatory rehearsal of sounds, since these variables may explain the discrepant effects of AS on memory for nonverbal auditory stimuli without appeal to an additional mechanism. More research is needed to clarify how nonspeech sounds are maintained in memory.