A Hebbian Model to Account for Musical Expertise Differences in a Working Memory Task

Lörch, Lucas; Lemaire, Benoît; Portrat, Sophie

doi:10.1007/s12559-023-10138-3

Original Article
Open access
Published: 19 May 2023

Volume 15, pages 1620–1639, (2023)
Cognitive Computation

Abstract

The TBRS*C computational model provides a mathematical implementation of the cognitive processes involved in complex span tasks. The logic of the core processes, i.e., encoding, refreshing/time-based decay, and chunking, is based on Hebbian learning, synaptic facilitation, and long-term neural plasticity, respectively. The modeling, however, takes place on a cognitive rather than a physiological level. Chunking is implemented as a process of searching for sequences of memoranda in long-term memory and recoding them as a single unit, which increases the efficacy of memory maintenance. Using TBRS*C simulations, the present study investigated how chunking and central working memory processes change with expertise. Hobby musicians and music students completed a complex span task in which sequences of twelve note symbols were presented for serial recall of pitch. After the presentation of each memorandum, participants performed an unknown, notated melody on an electric piano. To manipulate the potential for chunking, we varied whether sequences of memoranda formed meaningful tonal structures (major triads) or arbitrary trichords. Hobby musicians and music students were each split into a higher-expertise and a lower-expertise group, and TBRS*C simulations were performed for each group individually. In the simulations, higher-expertise hobby musicians encoded memoranda more rapidly, invested less time in chunk search, and recognized chunks with a higher probability than lower-expertise hobby musicians. Parameter estimates for music students showed only marginal expertise differences. We conclude that expertise in the TBRS model can be conceptualized by rapid access to long-term memory and by chunking, which leads to an increase in the opportunity for and efficacy of refreshing.

Introduction

Working memory (WM) is generally defined as a set of memory processes that enable the maintenance of information during concurrent processing of other information [1]. The Time-Based Resource Sharing (TBRS) theory [2] assumes that this is achieved by a rapid switching between processing new stimuli and refreshing already encoded information. According to the theory, any information that is not in the focus of attention suffers from time-based decay. Hence, there is a need for frequent refreshing of to-be-maintained information. Due to the central attentional bottleneck [3], though, attention can be devoted to only one central process at a time. Thus, the sharing of attentional resources between processing and refreshing needs to be time-based [2].

Being a general theory of WM, TBRS is not concerned with expertise. However, other theories of WM such as template theory [4] or long-term WM theory (LT-WM) [5] have conceptualized WM as being inherently influenced by expertise. Experts’ WM has been repeatedly found to be better [6] and this advantage is commonly explained with both the concept of chunking [7,8,9] and the rapid access to long-term memory (LTM) [10, 11]. The main idea of chunking is that experts’ memory system detects known structures in processed stimuli and recodes them as single, meaningful units [12].

Besides these theoretical explanations, there is biological evidence for expertise differences in WM. James and colleagues [13] compared gray matter density between participants of three levels of musical expertise. Their analysis revealed that a higher level of expertise is associated with an increase in gray matter density in areas involved in higher-order cognitive processing, including the left inferior frontal gyrus which is involved in working memory processes.

In the context of the TBRS theory, an account of how central WM processes and chunking change with expertise is lacking. The present work addressed this issue by analyzing expertise differences in WM and chunking in the processing of musical note symbols. To this end, we created a musical complex span task. In complex span tasks, memoranda have to be maintained while a secondary distractor task is performed [14]. In the most classical example, the reading span task, numerous sentences have to be read aloud and the last word of each sentence has to be memorized [15]. Analogously, in the present task, single note symbols were presented for later serial recall. In between the presentation of each of these to-be-remembered notes, participants had to perform a short, unknown, notated melody on an electric piano. Hobby musicians and music students completed this task and its procedure was slightly adapted to match the skill level of the two sub-samples.

To manipulate the possibility for chunking, we varied the meaningfulness of the tonal structure of the sequences of to-be-remembered notes. In meaningful sequences, the to-be-remembered notes formed major triads, which can be considered meaningful units in tonal music. In stimuli that were not meaningful, to-be-remembered notes formed arbitrary trichords, i.e., tonal structures that were at odds with the common rules of tonal music. We generally expected that more musically experienced participants would gain an additional benefit in recalling sequences of major triads.

In addition to identifying the interplay of expertise groups and tonal structure conditions, our goal was to uncover the underlying cognitive mechanisms. One method to pursue this goal was to design a computational model that can perform the same experiment as the participants by expressing the involved cognitive processes within a computational framework. This framework was the TBRS*C computational model [16], which is a model of WM supplemented with a chunking mechanism. The TBRS*C model performed the experimental task, and we analyzed which parameter estimates best reflected expertise differences in the human data. This provided insights into how WM processes and chunking might change with expertise.

The TBRS*C Computational Model

TBRS*C [16] uses the same functional core as TBRS*, which was developed by Oberauer and Lewandowsky [17] as a computational implementation of the TBRS verbal theory. TBRS* simulates serial recall in complex span tasks. Because recall is serial in such tasks, associations between items and their position in the sequence have to be built and maintained. For instance, if participants are presented with the items A, B, and C, it is supposed that they have to create associations between item A and position 1, item B and position 2, etc. This is described in TBRS* by a Hebbian learning mechanism which has both a cognitive and a computational modeling basis. The Hebbian learning rule [18] describes the modification of neural network connections as a result of the firing of output neurons [19]. More specifically, if cell assemblies, i.e., networks of interconnected neurons that form a functional unit [20], are activated simultaneously, they become associated. The unsupervised learning in neural networks has been described based on this rule [21, 22].

The specific decay/refresh mechanisms in TBRS also have a neural equivalent. Although WM is generally assumed to be biologically implemented by persistent spiking activity, another line of research considers that WM can be explained by short-term synaptic plasticity mediated by increased residual calcium levels [23]. In this kind of model, memory maintenance is directly achieved through short-term synaptic facilitation. However, this facilitation decays over time [24].

In line with these explanations, TBRS* stores associations between items and positions in a network with two fully interconnected layers, i.e., a position layer and an item layer. Each item is represented by a node in the item layer that is connected with a set of position markers in the position layer. Adjacent positions share a certain proportion P (30% by default) of these markers in order to represent the fact that people are more likely to confound a position with the previous or the next one, but less likely with others. TBRS* reproduces the basic operations of a complex span task, namely encoding, refreshing, distractor processing, and recall. Following the assumptions of the underlying TBRS model, only one of these processes can be performed at a time and all items that are not in the focus of attention suffer from time-based decay. Figure 1 presents the architecture that is the basis of TBRS*.

Table 1 provides an overview of the parameters of the computational model. These parameters will now be explained in detail. Encoding of items is performed by a Hebbian mechanism that strengthens the association between the item node and the markers representing its position. Each connection weight w_ip between an item i and a position unit p is increased by ∆w_ip = η · (L − w_ip), where L is an asymptotic value set to 1/9 because there are 9 position markers coding each position. This way, the total strength of the item-position association that can be reached during encoding is bounded by 1. The rate of increase of the association strength is defined as η = 1 − e^(−Rt), i.e., it follows an exponential curve. It is influenced by the time during which the association is strengthened (t) and the parameter R. With increasing R, the association strength increases faster and hence the maximum is reached more rapidly. So, R affects the time that is needed to encode a memorandum. For example, with the default value R = 6 and a duration of t = 0.5 s, η = 0.95, i.e., the association between an item being encoded and its position reaches 95% of the maximum value. To model some variability, it is not the value R itself that is used but rather the outcome of a random draw from a normal distribution centered at R, with a standard deviation of s (1 by default).
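The encoding rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, all weights are assumed to start at zero, and the random draw of R around its mean is omitted for clarity.

```python
import math

L_ASYMPTOTE = 1 / 9  # asymptote per connection (9 position markers per position)


def learning_rate(R: float, t: float) -> float:
    """Rate of increase eta = 1 - exp(-R * t), with t in seconds."""
    return 1.0 - math.exp(-R * t)


def encode(weights: list[float], R: float, t: float) -> list[float]:
    """Apply delta_w = eta * (L - w) to every item-position weight."""
    eta = learning_rate(R, t)
    return [w + eta * (L_ASYMPTOTE - w) for w in weights]


# Default R = 6 and t = 0.5 s, starting from nine zero weights (assumption).
weights = encode([0.0] * 9, R=6.0, t=0.5)
total_strength = sum(weights)  # roughly 0.95 of the maximum total of 1
```

Summing the nine weights shows why L is set to 1/9: the total item-position strength approaches 1 as the encoding time grows.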

Table 1 Parameters of the TBRS*C computational model

Refreshing in TBRS* occurs during any free time, usually right after encoding items or processing distractors. During refreshing, previous positions are considered in turn and for each one, an item is retrieved and the association with its position markers is strengthened, using the same mechanism as during the initial encoding of an item, presented previously, except that the duration is much shorter. As the duration of refreshing Tr is fixed (80 ms by default), though, a larger R does not result in more rapid refreshing, but in a larger activation reached during refreshing.

Retrieval at a given position is performed by selecting the item whose sum of association strengths to the respective position markers is maximal. To mimic retrieval errors, zero-centered Gaussian noise with standard deviation σ (0.02 by default) is added to each sum of association strengths. More precisely, the selected item is defined by argmax_i(∑_p w_ip + noise), where noise ~ N(0, σ) and w_ip is the association weight between item i and position marker p. However, if that best value is lower than a retrieval threshold ϴ (0.05 by default), no item is recalled, as if it had been forgotten.
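A minimal Python sketch of this retrieval rule follows. The data layout (a dictionary mapping each item to its marker weights) and the function name are assumptions made for illustration; only σ and ϴ and their defaults come from the text.

```python
import random


def retrieve(weights, position_markers, sigma=0.02, theta=0.05, rng=random):
    """Return the item with the largest noisy summed weight, or None.

    weights: dict mapping item -> dict mapping marker -> weight (assumed layout).
    position_markers: markers that code the probed position.
    """
    best_item, best_value = None, float("-inf")
    for item, marker_weights in weights.items():
        # Sum of association strengths plus zero-centered Gaussian noise.
        value = sum(marker_weights.get(p, 0.0) for p in position_markers)
        value += rng.gauss(0.0, sigma)
        if value > best_value:
            best_item, best_value = item, value
    # Below the retrieval threshold, nothing is recalled.
    return best_item if best_value >= theta else None
```

Setting `sigma=0.0` makes the function deterministic, which is convenient for checking the threshold behavior in isolation.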

Distractor processing is not simulated per se, but its effect is reproduced by applying a decay function to the item-position associations during processing of distractors. The Ta parameter indicates the time used for the attentional capture of a distractor. During that time, all association weights w decay and become w_new = w · e^(−D · Ta), where D is a decay parameter usually set to 0.5.
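As a minimal sketch, the decay step could be written as follows in Python. The function name is hypothetical, and Ta is assumed to be expressed in seconds.

```python
import math


def decay(weights: list[float], Ta: float, D: float = 0.5) -> list[float]:
    """Apply w_new = w * exp(-D * Ta) to all association weights."""
    factor = math.exp(-D * Ta)
    return [w * factor for w in weights]


# Two seconds of attentional capture with the default D = 0.5.
decayed = decay([1.0, 0.5], Ta=2.0)
```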

Recall in the model involves the retrieval and output of the most activated item associated with the markers representing a given position, following the mechanism presented previously. Once an item i is recalled, its associations with the current position p are suppressed by Hebbian anti-learning (∆w_ip = − ηL) in order to minimize repetition of the same item at a subsequent position. Further details of TBRS* mechanisms and parameters are described in the seminal article [17] or in derived models [26].

TBRS*C [16] extended TBRS* with a chunking mechanism which accounts for the fact that humans may recode known sequences of items as single units to increase recall performance. TBRS*C assumes that there is a period right after encoding an item during which long-term memory is searched for the previous sequence of items. If a chunk is successfully recognized, the items are chained and the known group is associated with the position of the first item in the sequence. This is advantageous, as fewer elements need to be refreshed. So, chunking in TBRS*C denotes a process of searching LTM for sequences of encoded items and recoding them; a chunk denotes a known sequence of items in LTM.

For instance, if the letter sequence X-P-D were presented, the model would search for XPD in LTM and would not recognize a chunk. However, if the next letter were F, the model would recognize the chunk PDF and would associate it with the position of the first letter of the acronym. Consequently, only one unit (PDF) would have to be refreshed in position two instead of three letters in positions two, three, and four. To search for a known sequence in LTM, all its constitutive elements need to be simultaneously present within the focus of attention. Thus, as opposed to TBRS*, TBRS*C has an attentional focus size of up to four elements [27], meaning that up to four items are refreshed in parallel during each refreshing period. The duration Tr is not modified, but the strength is divided by the number of items N that is considered: ∆w_ip = η · (L − w_ip)/N. Items are thus refreshed in groups of 4 instead of individually, but the strength of refreshing is 4 times weaker. N is not always 4, however, because at the beginning of the task there are fewer than 4 items to be refreshed.
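The parallel refreshing rule can be sketched in Python as follows. This is an illustration only: the function name and the list-of-lists layout are assumptions, and the sketch simply takes the first items in the list as the content of the attentional focus.

```python
import math


def refresh_step(item_weights, R=6.0, Tr=0.08, L=1 / 9, focus_size=4):
    """Refresh up to `focus_size` items in parallel for Tr seconds.

    item_weights: list of per-item weight lists (assumed layout).
    Each weight grows by eta * (L - w) / N, with N the number of items in focus.
    """
    group = item_weights[:focus_size]
    n = len(group)
    eta = 1.0 - math.exp(-R * Tr)
    return [[w + eta * (L - w) / n for w in ws] for ws in group]


single = refresh_step([[0.0]])[0][0]      # one item in focus: full strength
shared = refresh_step([[0.0]] * 4)[0][0]  # four items in focus: a quarter of it
```

Comparing `single` and `shared` makes the trade-off explicit: more items are covered per refreshing period, but each one gains proportionally less.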

Chunking is implemented in the model by two parameters, namely the time invested in searching for known sequences (chunk search duration, cSD) and the likelihood of recognizing an item as a chunk, given it exists in LTM (probability of chunk retrieval, PCR). Both parameters are separate and independent. With reference to the architecture presented previously, cSD represents an additional amount of time right after encoding an item, during which there is no refresh and all association weights decay, exactly like during the attentional capture of a distractor. PCR, however, does not change the time course of processes in the model as it only controls the probability of recognizing the previous sequence of encoded items as a chunk.
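The separate roles of cSD and PCR can be sketched as follows in Python. The representation of LTM as a set of tuples, the function name, and the use of seconds for cSD are assumptions; the recoding of a recognized chunk as a single unit is not modeled here.

```python
import math
import random


def chunk_search(weights, last_items, ltm_chunks, cSD, PCR, D=0.5, rng=random):
    """Spend cSD seconds searching LTM for the most recent item sequence.

    weights: dict mapping a connection key -> weight (assumed layout).
    Returns the decayed weights and whether a chunk was recognized.
    """
    # No refreshing happens during the search, so every weight decays.
    factor = math.exp(-D * cSD)
    decayed = {key: w * factor for key, w in weights.items()}
    # PCR only gates recognition; it does not change the time course.
    in_ltm = tuple(last_items) in ltm_chunks
    recognized = in_ltm and rng.random() < PCR
    return decayed, recognized


ltm = {("P", "D", "F")}
w_after, found = chunk_search({"a": 1.0}, ["P", "D", "F"], ltm, cSD=0.3, PCR=1.0)
```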

In the initial study on TBRS*C, Portrat and colleagues [16] employed a complex span task in which seven letters were presented as memoranda. Between the presentation of memoranda, participants had to complete spatial judgment tasks. Known letter sequences, namely French three-letter acronyms, were either absent or present, starting at the first, third, or fifth serial position. Participants’ recall data was simulated with TBRS*C leading to the conclusion that chunking is “an attentional time-based mechanism that certainly enhances WM performance but also competes with other processes at hand in WM” [16, p. 430].

Expertise Differences in Working Memory Functioning

In the present work, we assumed two expert advantages in WM functioning, namely chunking and rapid access to LTM. These advantages are assumed by other theories of WM, such as LT-WM [5] and template theory [4]. In addition, they are biologically founded on long-term neural plasticity. When someone practices to become an expert, Hebbian learning takes place [20]. As a consequence, functional units of neurons (so-called cell assemblies) form new associations, thereby creating chunks. For example, if the notes C-E-G are repeatedly activated together with the verbal label “C major,” the notes and the label form a functional unit through Hebbian learning. Rapid access to LTM, however, is based on another mechanism: nerve myelination. Myelin is found in the brain’s white matter. It is a white, fatty tissue that encloses axons and increases the speed of the passing nerve impulses [28]. Myelination is a process that persists for the first three decades of human development and is affected by experience [29]. Specifically, piano practice in certain critical developmental periods was found to be associated with plasticity in myelinating tracts [30]. As a consequence of musical training, myelin cells around nerve fibers have been found to increase in size, contributing to the velocity of electrical impulses [13].

Based on these biological mechanisms, the present study sought to unravel expertise differences in WM functioning in greater detail. To this end, we collected data from a complex span task with musical notation. To ensure variation in musical expertise, the task was completed by two sub-samples, namely music students and hobby musicians. The complex span task required the performance of notated melodies at first sight, which is highly demanding for hobby musicians. Thus, the task procedure was slightly adapted to match hobby musicians’ skill level. Using a median split on the general musical sophistication scale of the Gold Musical Sophistication Index [31], both sub-samples were split into a higher-expertise and a lower-expertise group (threshold for hobby musicians: 69.5; threshold for music students: 85.5).

The complex span task was additionally performed by the TBRS*C computational model. Separately for both sub-samples, we analyzed which parameter values best reflected the differences in task performance between the higher-expertise and the lower-expertise group. The parameters for this analysis were chosen based on the expected expertise differences in WM: the parameters cSD and R were chosen to investigate experts’ rapid access to LTM; the parameter PCR was chosen to investigate experts’ chunking processes. In addition, we were interested in whether changes in WM and chunking processes would be associated with changes in the way resources were shared between the two task components. We wondered whether the same amount of time would be used for the processing of distractors despite changes in the timing of encoding and chunking. Thus, we explored expertise differences in the parameter that represents the time used for the processing of distractors (Ta). In the analysis, we checked which combination of values for these four parameters (cSD, R, PCR, Ta) would provide the best fit to expertise differences in the human data. Due to the difference in the experimental procedure, music students and hobby musicians were not directly contrasted; instead, higher-expertise hobby musicians were compared to lower-expertise hobby musicians and higher-expertise music students were compared to lower-expertise music students.

Hobby Musicians

Method

Sample

The hobby musician sub-sample (n = 80) was recruited at the University of Mannheim. It contained 53 female and 25 male students, with two participants not providing information on their sex. Participation was restricted to students of any subject except music who considered themselves able to perform musical notes on an instrument. Hobby musicians’ mean age was 21.36 years (SD = 2.55; Min = 18; Max = 31) and most of them studied psychology (27) or teacher education (43). All participants received either 5 € or course credit for their participation.

The general musical sophistication scale of the Gold Musical Sophistication Index (Gold-MSI) [31] revealed that the level of musical expertise of hobby musicians was comparable to the Gold-MSI norm sample (norm sample: M = 70.41; SD = 19.94; hobby musicians: M = 69.75; SD = 13.37). From single items of the Gold-MSI, it became apparent that the average hobby musician in the present study had practiced an instrument regularly and daily for 4 to 5 years (Item 32: M = 4.86; SD = 1.81, means and standard deviations refer to the answering scale of the Gold-MSI), had practiced for one hour daily at the height of musical interest (Item 33: M = 3.24; SD = 1.37), and was able to play two musical instruments (Item 37: M = 2.76; SD = 1.05). For all analyses, we created a group of higher-expertise hobby musicians and a group of lower-expertise hobby musicians by performing a median split on the Gold-MSI score with the threshold of 69.5.

Procedure

The general procedure of one trial of the complex span task is presented in Fig. 2. In the task, twelve single quarter note symbols were presented as memoranda. The pitch of these notes had to be recalled at the correct serial position. Between the presentation of these to-be-remembered notes, participants had to perform an unknown, notated melody on an electric piano. In other words, participants saw a single note they had to memorize, then had to perform a notated melody, saw another note they had to memorize, performed another melody, and so on. After the performance of the twelfth melody, the recall task followed. Participants completed four of these trials, i.e., had to complete four recall tasks. The reason for using such a small number of trials was that one trial took about 6 min and the whole experiment in its present form already took about 1 h. Using a larger number of trials would probably have led to fatigue and hence invalid data. Eye tracking and MIDI data were recorded during the musical performance. They were used to analyze the association of the number and duration of fixations with the accuracy of performing the melodies [32]. These analyses, however, were unrelated to the present work and hence will not be reported in any further detail.

The complex span task started after participants gave informed consent and received instructions. Each of the four experimental trials began with a preparatory phase with three steps: (1) positioning the hand on the piano keyboard, (2) performing a preparatory melody, (3) calibrating the eye tracker. The tones of the melodies that had to be performed were drawn from a set of five adjacent tones. Participants were informed which tones they would have to play in a given trial and how to position their hand on the keyboard to play them. Thus, participants did not have to move their hands on the piano during one trial. Then, a preparatory melody was provided prior to the experimental task. Participants were allowed to play it for as long as they wanted in order to practice the mapping between note symbols and piano keys. The tones of this melody were drawn from the five tones of the given trial, but apart from that, the preparatory melodies did not resemble the experimental melodies in any further aspect. Subsequently, the eye tracker was calibrated and the complex span task started.

The complex span task consisted of two alternating phases: (1) the presentation of memoranda and (2) the performance of melodies (see Table 2, the mapping to TBRS*C processes will be explained below). The former phase comprised the presentation of a fixation cross (2000 ms) and of a single quarter note symbol in treble clef (2500 ms). The latter comprised a two-bar count-in (6856 ms), the musical performance (13,714 ms), and the saving of eye tracking data. During count-in and performance, a digital metronome provided the tempo of 70 beats per minute via speakers. During the count-in, a preview of the first bar of the melody was provided. The whole melody appeared on the screen when the performance started. After each musical performance, the eye tracking data which had been collected during this performance was saved. Saving times varied marginally (M = 6124 ms; SD = 390 ms). In one trial of the task, the procedure depicted in Table 2 was repeated twelve times until participants had to recall all memoranda and write them in the correct serial order on a sheet of paper with an empty staff.
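The count-in and performance durations follow from the metronome tempo. The short Python check below assumes four beats per bar, which is not stated explicitly but is consistent with the reported values; the function name is hypothetical.

```python
def bars_to_ms(bars: int, bpm: float = 70.0, beats_per_bar: int = 4) -> float:
    """Duration of a number of bars in milliseconds at a given tempo."""
    ms_per_beat = 60_000.0 / bpm
    return bars * beats_per_bar * ms_per_beat


count_in_ms = bars_to_ms(2)     # two-bar count-in
performance_ms = bars_to_ms(4)  # four-bar melody
```

Under this assumption, the computed values agree with the reported 6856 ms and 13,714 ms to within about one millisecond.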

Table 2. The phases of the musical complex span task and how they were mapped on TBRS*C processes

Prior to the first experimental trial, participants performed a warm-up trial. It was similar to the experimental trials. However, only three instead of twelve notes had to be recalled and there were some additional instructions (“memorize the following note,” “play the following melody”) prior to each stimulus. The purpose of this warm-up was that participants could get used to the experimental procedure and that any misunderstandings could be clarified prior to data collection.

At the end of the experiment, participants completed the global scale of the Gold-MSI [31], termed general musical sophistication, and answered questions on demographics. The experimental procedure was in full agreement with APA’s Ethical Principles of Psychologists and Code of Conduct [33] and with German data privacy regulations. ^{Footnote 1}

Design and Material

The design of the experiment was defined by the between-participants factor expertise group (higher-expertise vs. lower-expertise hobby musicians) and the within-participants factor tonal structure (major triads vs. arbitrary trichords). Sequences of memoranda consisted of four three-note melodic cells. Depending on the level of the factor tonal structure, these cells were either major triads or arbitrary trichords. Major triads can be considered meaningful tonal structures in western music. They consist of a base note called root (e.g., C) and two further notes that have a fixed distance of four and seven semitones to the root (e.g., E and G). The name of a specific major triad is defined by its root (e.g., C major triad). Arbitrary trichords in the present study were defined as consisting of a root note followed by two notes that had a distance of eight and nine semitones to it (e.g., C, G#, A). We assumed that major triads would be beneficial for chunking processes. Participants performed two trials in each condition and were not informed about the regularities within the sequences of notes. The order of trials was randomized. Table 3 shows the twelve memoranda of the four trials.
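The two cell types differ only in their semitone offsets from the root, which the following Python sketch makes explicit. MIDI note numbers with C4 = 60 and sharp-based note names are conventions assumed here for illustration; they are not taken from the study materials.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets from the root, as described in the text.
OFFSETS = {
    "major_triad": (0, 4, 7),
    "arbitrary_trichord": (0, 8, 9),
}


def melodic_cell(root_midi: int, structure: str) -> list[int]:
    """Return the three MIDI note numbers of a cell built on the given root."""
    return [root_midi + offset for offset in OFFSETS[structure]]


def note_name(midi: int) -> str:
    """Name of a MIDI note number, assuming C4 = 60."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


triad = [note_name(n) for n in melodic_cell(60, "major_triad")]
trichord = [note_name(n) for n in melodic_cell(60, "arbitrary_trichord")]
```

With a root of C4, the two calls reproduce the examples from the text: C, E, G for the major triad and C, G#, A for the arbitrary trichord.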

Table 3. Each staff shows the twelve to-be-remembered notes that were presented successively in one trial of the complex span task

To select the memoranda, the roots of each triad, i.e., the notes at positions one, four, seven, and ten, were randomly selected from the notes between C4 and Eb5. As the three-note cells in both conditions had a fixed structure, the remaining notes were derived based on the root notes and experimental conditions. The melodies that had to be performed between the presentation of the memoranda each consisted of four bars, with three notes in each bar (see Fig. 3). They contained only eighth and quarter notes and rests. They were created by arranging four one-bar rhythmic phrases in a random order and assigning a pitch to each note that was randomly drawn from a set of five adjacent pitches from the C major scale. All melodies had a similar structure, which guaranteed that the amount and rate of information that had to be processed were constant across all melodies. At the same time, as the order of rhythmic phrases and pitches was varied randomly, it was impossible for participants to anticipate how the melody would progress. Thus, they were forced to process the notes.

Note images were created with the program Forte 7 Basic (https://www.fortenotation.com/en/) and then altered with the graphic-editing software Gimp (https://www.gimp.org/). To create the images containing the to-be-remembered notes, the indication of meter was removed, and a single quarter note symbol (0.3 × 1.2 cm) was positioned in the center of a short staff (6.0 × 1.3 cm) with a treble clef symbol. To-be-remembered notes as well as distractor melodies (24.0 × 0.6 cm) were positioned in the center of a white image (49.92 × 28.08 cm, 1920 × 1080 pixels, 60 cm viewing distance). The experiment was programmed with the software ePrime (https://pstnet.com/products/e-prime/); responses for the recall task and for the questionnaires were indicated on a sheet of paper.

Analyses

To obtain a measure of recall accuracy, the names of the note symbols that participants wrote as their response were transferred manually to a spreadsheet and then recoded as correct (1) or wrong (0) for each serial position. Only notes that were recalled at the correct serial position were judged as correctly recalled. In addition, a melodic cell (i.e., a major triad or an arbitrary trichord) was defined as being recalled correctly if all three of its notes were recalled correctly. The MIDI data of the musical performance was analyzed with the algorithm MidiAnalyze [32] to derive how accurately the distractor melodies were performed.
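The scoring rule can be expressed compactly in Python. This is a sketch rather than the analysis code of the study: the function name is hypothetical, and responses are assumed to be lists of note names of the same length as the presented sequence.

```python
def score_recall(presented, recalled, cell_size=3):
    """Score notes per serial position and melodic cells of `cell_size` notes.

    A note counts as correct only at its correct serial position; a cell
    counts as correct only if all of its notes are correct.
    """
    notes = [int(p == r) for p, r in zip(presented, recalled)]
    cells = [
        int(all(notes[i:i + cell_size]))
        for i in range(0, len(notes), cell_size)
    ]
    return notes, cells


# The last two notes of the second cell are swapped in the response.
notes, cells = score_recall(
    ["C", "E", "G", "D", "F#", "A"],
    ["C", "E", "G", "D", "A", "F#"],
)
```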

In the following, we will first present descriptive plots and then results of a Bayesian mixed logistic regression model in which recall accuracy (0/1) was predicted by serial position and an interaction of expertise group (higher-expertise vs. lower-expertise) and tonal structure (major triads vs. arbitrary trichords). This inferential analysis provided insights into the data structure that was the basis for the TBRS*C simulations. Generally, TBRS*C simulations provide serial position accuracies, i.e., mean recall accuracies per serial position. In the present study, simulations were performed separately for expertise groups and experimental conditions. Thus, to interpret the simulations correctly, it is important to understand how recall accuracy varied with serial position, expertise group, and tonal structure.

Prior to analysis, six hobby musicians were excluded as they either had a very low recall accuracy of below 5% or did not adhere to experimental instructions. Experimental materials, data sets, and analysis code of this study can be accessed via https://doi.org/10.17605/OSF.IO/6UKEV.

Results

Figure 4 shows mean values by group and condition for the proportion of correctly recalled notes, of correctly recalled melodic cells, and of correctly performed distractor notes. The left plot shows that recall was more accurate in the higher-expertise group and in the major triads condition; the advantage in the major triads condition was larger in the higher-expertise group.

To test this pattern statistically, we performed a Bayesian mixed logistic regression. The model was created in Stan (http://mc-stan.org/), accessed via the R package brms [34]. Recall accuracy (0/1) was predicted by serial position and by an interaction of expertise group (higher-expertise vs. lower-expertise) and tonal structure (major triads vs. arbitrary trichords). We implemented a full random structure, i.e., the effect of all predictors as well as the intercept varied across participants. This controlled for any differences between participants that were not accounted for by the predictors, such as domain-general WM capacity. The predictor serial position was an integer variable with values from zero to eleven. Thus, the intercept in the model represented the expected log-odds of the recall accuracy of a lower-expertise hobby musician in the major triads condition at the first serial position. Leave-one-out (loo) analyses were performed but revealed no influential data points.

We used highly informative priors, which were defined a priori based on the expected pattern of results using the conditional means prior approach [35]. Table 4 shows priors and posterior distributions of the regression coefficients. The main effect of tonal structure (posterior mean regression coefficient: 0.71; 95% credibility interval: [0.35, 1.09]) and the interaction of expertise group and tonal structure (posterior mean regression coefficient: 0.63; 95% credibility interval: [0.09, 1.17]) were most pronounced; the main effect of expertise group was of marginal extent (posterior mean regression coefficient: 0.41; 95% credibility interval: [−0.09, 0.91]). To clarify the pattern of interaction, Fig. 5 provides a conditional effects plot. It shows the predicted interaction of expertise group and tonal structure on the accuracy scale for the recall of the sixth note. The sixth note was chosen as it is in the center of the sequence of memoranda, i.e., it might neither be affected by recency nor by primacy effects [36]. The pattern of interaction was as expected: the difference in recall accuracy between sequences of major triads and sequences of arbitrary trichords was larger in the higher-expertise group. This pattern of effects did not change using weakly informative priors.
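Because the coefficients are reported on the log-odds scale, it can help to convert them into odds ratios. The following minimal sketch does so for the posterior means and credibility bounds given above. It assumes dummy coding in which the tonal structure coefficient is the contrast of major triads against arbitrary trichords in the lower-expertise group, so that adding the interaction coefficient gives the same contrast in the higher-expertise group; the variable names are illustrative and not taken from the analysis code of the study.

```python
import math

# Posterior means and 95% credibility bounds from the text (log-odds scale).
coefs = {
    "tonal_structure": (0.71, 0.35, 1.09),
    "group_x_tonal_structure": (0.63, 0.09, 1.17),
    "expertise_group": (0.41, -0.09, 0.91),
}

def odds_ratio(log_odds):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(log_odds)

for name, (mean, lower, upper) in coefs.items():
    print(f"{name}: OR = {odds_ratio(mean):.2f} "
          f"[{odds_ratio(lower):.2f}, {odds_ratio(upper):.2f}]")

# Assumed coding: major-triads advantage in the higher-expertise group is the
# sum of the main effect and the interaction (point estimate only; a proper
# interval would require the joint posterior draws).
or_major_lower = odds_ratio(0.71)
or_major_higher = odds_ratio(0.71 + 0.63)
print(f"lower-expertise: {or_major_lower:.2f}, higher-expertise: {or_major_higher:.2f}")
```

Under these assumptions, the odds of correct recall roughly double for major triads in the lower-expertise group and nearly quadruple in the higher-expertise group.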

Table 4 Priors and posterior distribution of regression coefficients of the Bayesian mixed logistic regression for the hobby musician sub-sample


Moreover, another regression analysis revealed that the accuracy of performing distractor notes (right panel of Fig. 4) did not differ by group, condition, or their interaction. Using these predictors in a mixed linear model with the package lme4 [37] did not increase the fit to the data compared to a null model without any predictors (χ² (3) = 4.38; p = 0.22).
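The reported p value can be checked from the test statistic alone. The sketch below uses the closed-form upper-tail probability of a chi-square distribution with three degrees of freedom, so only the Python standard library is needed; the function name is our own and the snippet is not part of the original analysis, which was run in R.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    For df = 3 the survival function has the closed form
    erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Likelihood-ratio statistic reported in the text.
p_value = chi2_sf_df3(4.38)
print(round(p_value, 2))
```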

TBRS*C Simulations

Serial recall in the musical complex span task was then simulated with TBRS*C to investigate expertise differences in WM and in chunking processes. Table 2 depicts how the different phases of the experimental task were mapped onto TBRS*C processes. The number of memoranda was set to twelve, as participants had to recall twelve notes in each of the four trials. From the second repetition onwards, the presentation of the fixation cross was simulated as being used for refreshing already encoded notes. The time span in which the to-be-remembered note was presented was modeled as being used first for the encoding of the currently presented note, then for chunking, and finally for the refreshing of already encoded notes. However, during the encoding of the first two notes, chunking was not initiated, as chunks consisted of three notes. The duration of encoding the currently presented note depended on the value of the R parameter (6 at default, which corresponds to 500 ms). During chunking, the simulation searched in long-term memory for a chunk consisting of the last three notes. Long-term memory was defined to contain major triads, i.e., the simulation could recognize a chunk only in sequences of major triads. Refreshing took place during any free time prior to the distractor task.
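The correspondence between R = 6 and an encoding duration of about 500 ms can be illustrated with a short calculation. The sketch assumes the exponential strengthening function of TBRS* [17], in which memory strength approaches its asymptote at rate R and encoding ends once the criterion τ_E = 0.95 (the default value of the encoding criterion) is reached; the function name is illustrative.

```python
import math

TAU_E = 0.95  # encoding criterion (default value of the TBRS* parameter tau_E)

def encoding_time(rate, tau_e=TAU_E):
    """Time in seconds until strength 1 - exp(-rate * t) reaches tau_e.

    Assumes exponential strengthening; solves 1 - exp(-rate * t) = tau_e for t.
    """
    return -math.log(1 - tau_e) / rate

for r in (6, 8):
    print(f"R = {r}: {encoding_time(r) * 1000:.0f} ms")
```

Under this assumption, a larger R shortens encoding and thereby leaves more of the presentation interval for chunk search and refreshing.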

The distractor task, i.e., the performance of the notated melodies, started with a two-bar count-in. During this count-in, the first bar of the melody was presented. After the count-in, the four-bar musical performance started. Figure 6 shows how the distractor task was modeled in TBRS*C. Both the preview and the performance were modeled as alternating between processing distractor notes and refreshing to-be-remembered notes. Ta indicated the time used to process a distractor note. Any free time between the processing of subsequent distractor notes was modeled as being used for refreshing. This means that the preview and the musical performance were identical in the simulations. This approach was based on the assumption that, to prepare the performance, participants would process the distractor notes during the preview in a manner comparable to playing them. In summary, as each bar contained three distractor notes and as participants read six bars (two-bar preview, four-bar performance), the musical performance was modeled as processing eighteen distractor notes interspersed with refreshing the to-be-remembered notes. At the end of the distractor task, there was a short time interval in which the eye tracking data had to be saved. This time interval was modeled as being used for refreshing.
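The structure of the modeled distractor phase can be sketched as a simple event schedule. The bar and note counts come from the description above; the time available per distractor note (note_interval) and the length of the final saving interval (save_interval) are hypothetical placeholders, since their actual values depend on the tempo and the recording setup and are not specified here.

```python
BARS_PREVIEW = 2      # two-bar count-in with preview
BARS_PERFORMANCE = 4  # four-bar musical performance
NOTES_PER_BAR = 3     # distractor notes per bar

def distractor_schedule(ta, note_interval, save_interval):
    """Return a list of (process, duration) events for one distractor phase.

    ta: time in seconds spent processing each distractor note.
    note_interval, save_interval: hypothetical durations in seconds.
    """
    if ta > note_interval:
        raise ValueError("ta must not exceed the time available per note")
    n_notes = (BARS_PREVIEW + BARS_PERFORMANCE) * NOTES_PER_BAR
    events = []
    for _ in range(n_notes):
        events.append(("process_distractor", ta))
        free_time = note_interval - ta
        if free_time > 0:
            # Any free time between distractor notes is used for refreshing.
            events.append(("refresh", free_time))
    # The data-saving interval at the end is also modeled as refreshing.
    events.append(("refresh", save_interval))
    return events

# Placeholder timing values for illustration only.
schedule = distractor_schedule(ta=0.4, note_interval=1.0, save_interval=0.5)
n_processed = sum(1 for name, _ in schedule if name == "process_distractor")
refresh_time = sum(duration for name, duration in schedule if name == "refresh")
print(n_processed, round(refresh_time, 2))
```

The sketch makes explicit that a larger Ta reduces the total time available for refreshing while the number of distractor notes stays fixed at eighteen.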

TBRS*C simulations were performed on higher-expertise and lower-expertise hobby musicians independently. Moreover, the simulation of the data was performed in two steps: First, only the data of the arbitrary trichords condition was simulated to obtain a baseline of the parameters Ta, R, and cSD. Second, using these initial parameter estimates, the data of the major triads condition was used to estimate PCR.

This two-step procedure was based on the assumption that the memory system cannot know beforehand if it would recognize a chunk in a certain sequence of memoranda. Hence, chunking is initiated in the arbitrary trichords condition as well. This means that all parameters except for the probability of chunk retrieval (PCR) are the same in both conditions. The PCR parameter, however, can only be estimated if it is possible to recognize chunks, i.e., in the major triads condition.

Table 5 shows the details of this two-step analysis as well as the results. The column “Values” indicates the parameter values that were considered in the analysis. If initial simulations showed that the best-fitting value was one of the extremes of the tested values, additional smaller or larger values were included. We performed a 3000-run simulation with each combination of these values and computed the root mean square error (RMSE) to investigate the fit between the simulated and the human data for each serial position. The TBRS* basic parameters were set to the default values reported in Oberauer and Lewandowsky [17]: P = 0.3; τ_E = 0.95; s = 1; D = 0.5; Tr = 0.08; θ = 0.1; σ = 0.02. The column “Optimum” in Table 5 indicates the combination of parameter values for which the best fit was obtained.
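The fitting procedure amounts to a grid search that minimizes the RMSE between simulated and observed serial position accuracies. The sketch below illustrates the logic only. The function toy_simulate is a hypothetical, deterministic stand-in and not TBRS*C (which averages over 3000 stochastic runs per parameter combination), the grid values are illustrative rather than those of Table 5, and the "observed" curve is generated by the toy function itself so that the search can be checked by parameter recovery.

```python
import itertools
import math

def rmse(simulated, observed):
    """Root mean square error across serial positions."""
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))

def toy_simulate(ta, r, csd, n_positions=12):
    """Hypothetical stand-in for a simulation: NOT the TBRS*C model.

    Returns one accuracy per serial position from an arbitrary smooth function.
    """
    base = 1 - math.exp(-r / 3)
    curve = [base - 0.05 * ta * pos - 0.002 * csd * pos ** 2
             for pos in range(n_positions)]
    return [min(1.0, max(0.0, acc)) for acc in curve]

def grid_search(observed, grid):
    """Return (best_rmse, best_params) over all parameter combinations."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = rmse(toy_simulate(**params), observed)
        if best is None or error < best[0]:
            best = (error, params)
    return best

# Illustrative grid (seconds for ta and csd); not the values of Table 5.
grid = {"ta": [0.4, 0.6], "r": [6, 8], "csd": [0.8, 1.2]}

# Parameter-recovery check: the target curve comes from known toy parameters.
observed = toy_simulate(ta=0.4, r=6, csd=1.2)
best_rmse, best_params = grid_search(observed, grid)
print(best_rmse, best_params)
```

In the two-step procedure described above, such a search would first be run on the arbitrary trichords data for Ta, R, and cSD, and then repeated on the major triads data with these values fixed and PCR as the only free parameter.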

Table 5 Results of TBRS*C simulations for hobby musicians. A 3000-run simulation was performed for each combination of parameter values. The optimum was the combination of parameter values with the best fit to the experimental data


In the simulation of the arbitrary trichords condition, the best fit was obtained with parameters that characterized higher-expertise hobby musicians by a shorter chunk search duration (cSD = 800 ms), a faster encoding (R = 8), and a longer time of attentional capture by distractor notes (Ta = 600 ms; RMSE = 0.070) compared to lower-expertise hobby musicians (cSD = 1,200 ms; R = 6; Ta = 400 ms; RMSE = 0.091). Using these values, the simulation of the major triads condition suggested that the probability of recognizing a chunk was larger for higher-expertise hobby musicians (PCR = 50%; RMSE = 0.061) than for lower-expertise hobby musicians (PCR = 30%; RMSE = 0.076).

Following these analyses, we were interested in whether chunking in the simulations was beneficial for recall even though chunks were recognized with a probability of only 30 to 50%. Thus, we performed a “no chunking” simulation in which cSD and PCR were set to zero and all other parameters were kept constant. Figure 7 shows serial position curves of human and simulated data, separately for expertise groups and tonal structure conditions. The curves of the “no chunking” simulation in the top plots suggest that chunking was indeed beneficial for both expertise groups.