Ethical approval for the study was obtained from the Ethics Committee at the Psychology Department, Goldsmiths, University of London. Informed consent was obtained from all participants tested.
A power analysis was conducted a priori to determine the number of participants required. Given that our testing tool is a novel instrument and our primary interest is its correlations with a questionnaire and related tests, we set .30 as the minimum effect size of interest. G*Power (Faul, Erdfelder, Buchner, & Lang, 2009) calculated that 84 participants would be required to achieve 80% power in a two-tailed correlational design with α = .05.
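The sample-size calculation can be approximated by hand. The following is a minimal sketch using the Fisher z approximation, not G*Power's own routine, so it returns a value close to, but not necessarily identical with, the figure reported above. The function name and defaults are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation r (two-tailed test).

    Uses the Fisher z approximation; this is an assumption of the sketch,
    not the exact bivariate-normal computation offered by G*Power.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-tailed
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    effect = math.atanh(r)                         # Fisher z transform of r
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


required_n = n_for_correlation(0.30)
print(required_n)
```

Larger minimum effect sizes need fewer participants; for example, `n_for_correlation(0.50)` returns a much smaller value than `n_for_correlation(0.30)`.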
A total of 104 participants (69 females) with a mean age of 25.21 years (SD = 9.26) were recruited from the student population of Goldsmiths, University of London. To achieve a heterogeneous sample of participants with diverse musical backgrounds, the study was advertised to students in the music department and the psychology department. The overall sample mean of the Musical Training subscale was 26.96 (SD = 12.34) on a scale bounded at 7 and 49, which was comparable to the mean (M = 26.52, SD = 11.44) reported by Müllensiefen et al. (2014) from a large UK sample. Specifically, 26.5% reported having more than 10 years of formal musical training, and an equal proportion (26.5%) reported having no training. The remaining percentages were: 3 to 5 years (14.5%), 6 to 9 years (11.1%), 1 year (9.4%), and 2 years (6.8%).
Seven participants did not move the slider on more than half of the items of the TPT, and two participants’ data were missing for all items. These nine participants were excluded from the analysis, leaving 95 sets of data for the final analysis. Participants were compensated for their time with either course credits or a small monetary reward.
Development of the Timbre Perception Test (TPT)
The TPT aims to assess individuals’ perceptual ability to distinguish fine-grained timbral qualities in sound by assessing three important dimensions of timbre: the amplitude envelope, spectral flux, and spectral centroid. The TPT was programmed in the MaxMSP software environment (Version 7.3.4, 64-bit, Cycling ’74, San Francisco, CA) as a standalone application that runs on both Microsoft Windows and Mac OS operating systems (download available at www.osf.io/9c8qz). In the testing environment, these three dimensions were labelled as Blocks 1, 2, and 3, respectively.
Eight sine oscillators were used to produce sets of complex tone stimuli with one fundamental frequency (f0) and seven overtones. The overtones were integer multiples of the f0, from two to eight (i.e., first harmonic = f0 × 2, second harmonic = f0 × 3, etc.). The stimulus tones were repeated three times, indicated by a flickering blue light, at intervals of 800 ms. This repetition ensured that participants heard the stimuli during memory trials, in which playback was limited.
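The construction of such a complex tone can be sketched as simple additive synthesis. The TPT itself was built in MaxMSP, so the following is only an illustration; the sample rate, the 0.5-s duration, and the equal partial amplitudes are assumptions that are not taken from the test.

```python
import math

SAMPLE_RATE = 44100  # Hz; assumed for this sketch


def complex_tone(f0, duration=0.5, n_partials=8, sample_rate=SAMPLE_RATE):
    """Sum n_partials sine waves at f0, 2*f0, ..., n_partials*f0.

    Equal amplitudes and the duration are illustrative assumptions.
    Returns the partial frequencies and the normalised sample values.
    """
    n_samples = int(duration * sample_rate)
    freqs = [f0 * k for k in range(1, n_partials + 1)]
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = sum(math.sin(2 * math.pi * f * t) for f in freqs)
        samples.append(value / n_partials)  # keep the sum within [-1, 1]
    return freqs, samples


G3 = 196.0  # approximate frequency of G3 in Hz (equal temperament, A4 = 440 Hz)
partials, tone = complex_tone(G3)
print(partials)
```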
Five pitches were employed, with notes ranging across two octaves (from G3 to A#4) to cover a wide range of the frequency spectrum. Moreover, five acoustic values each of attack/decay, spectral flux, and spectral centroid were mixed and matched to produce five unique parameter sets. These sets were mapped onto the stimuli and systematically organized to ensure that all five sets were presented for every testing dimension (in a varied order). The full acoustic range of each testing dimension and the parameter values used for the stimuli are reported in Appendix C.
Unlike the stimulus tones, the reproduction tone could be manipulated by the participants, who moved an interactive slider (range 0–100) to change the sound profile along the dimension being tested; the other two dimensions had profiles identical to those of the stimulus. For instance, when participants performed a trial manipulating attack/decay (here collectively termed ‘Envelope’), moving the slider affected only the envelope of the reproduction tone, whereas the parameters of spectral flux and spectral centroid remained unchanged (i.e., identical to those of the corresponding stimulus tone).
Ultimately, the participants’ task was to manipulate the reproduction tone by moving the slider to replicate the stimulus tone as accurately as possible. Figure 1 illustrates the layout of the TPT software and gives a graphical representation of how the sound profiles in each subtask change with the movement of the slider.
In the Envelope subtask (Block 1), the slider altered the log attack time, which also inversely influenced the decay time of the reproduction tone. Log attack time has been shown to be a salient attribute for timbre identification, whereas decay time is salient to a lesser extent. Nonetheless, we included the decay time to keep the total duration of the stimulus approximately constant and to allow listeners to focus on the interplay between the two parameters. We reasoned that if only the attack time were manipulated, participants might judge the stimulus merely by the total tone duration instead of by the dynamics of the rise and fall in amplitude. Hence, moving the slider to the left (i.e., closer to zero) gave the reproduction tone a shorter attack with a longer decay time, whereas moving the slider to the right (i.e., closer to 100) resulted in a longer attack with a shorter decay time, so that the two were always inversely related. The full acoustic range covered by the slider in each subtask is reported in Table 1.
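The slider-to-envelope mapping described above can be sketched as follows. This is a hedged illustration only: the attack range (10 to 400 ms), the 500-ms total duration, and the exact trade-off rule are hypothetical placeholders, not the values from Table 1, and the sketch holds the total duration exactly constant whereas the test kept it approximately constant.

```python
def envelope_times(slider, attack_min=0.01, attack_max=0.40, total=0.50):
    """Map a slider position (0-100) to (attack, decay) times in seconds.

    Attack time is interpolated on a logarithmic scale, and the decay takes
    up the rest of a fixed total duration. All default values are
    hypothetical and chosen only for illustration.
    """
    if not 0 <= slider <= 100:
        raise ValueError("slider must lie between 0 and 100")
    attack = attack_min * (attack_max / attack_min) ** (slider / 100)
    decay = total - attack  # longer attack implies shorter decay
    return attack, decay


for position in (0, 50, 100):
    attack, decay = envelope_times(position)
    print(position, round(attack, 3), round(decay, 3))
```

Because the interpolation is logarithmic, equal slider steps multiply the attack time by a constant factor rather than adding a constant number of milliseconds.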
In the Spectral Flux subtask (Block 2), the ratios of harmonics to the fundamental frequency were altered to introduce dissonance caused by beating between frequencies, characterized as ‘roughness’. To achieve this effect, four harmonics were manipulated with the movement of the slider. The manipulation altered the ratio between the harmonics and their whole-number multiples (i.e., when the slider was moved from left to right, the ratios of the 4th and 6th harmonics were increased and those of the 5th and 7th harmonics were decreased). As in the Envelope subtask, the opposing changes in the two pairs of harmonics were intended to prevent participants from making judgments based merely on the rise or fall in global pitch. Moving the slider to the left aligned the harmonics more closely with the whole-number multiples and therefore made the tone more consonant, whereas moving it to the right introduced more dissonance as the number of beating frequencies increased.
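The opposing detuning of the two harmonic pairs can be sketched as below. Several points are assumptions of the sketch rather than details of the test: the harmonics are numbered as in the stimulus description (nth harmonic = f0 × (n + 1)), the detuning grows linearly with the slider, and the 3% maximum is a hypothetical placeholder.

```python
def flux_ratios(slider, max_detune=0.03):
    """Return the frequency ratio (relative to f0) of each of the seven overtones.

    Overtones 4 and 6 are shifted upwards and overtones 5 and 7 downwards
    as the slider moves right. The linear rule and max_detune are
    hypothetical choices made for illustration.
    """
    amount = max_detune * slider / 100
    ratios = {}
    for n in range(1, 8):
        ratio = float(n + 1)        # whole-number ratio at slider = 0
        if n in (4, 6):
            ratio *= 1 + amount     # raised pair
        elif n in (5, 7):
            ratio *= 1 - amount     # lowered pair
        ratios[n] = ratio
    return ratios


print(flux_ratios(0))
print(flux_ratios(100))
```

At slider position 0 all ratios are whole numbers (fully harmonic); at 100 the two pairs have moved in opposite directions, so there is no uniform upward or downward shift in pitch to rely on.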
In the Spectral Centroid subtask (Block 3), a bandpass filter was applied to the source sound to alter its spectral centroid, which has been shown to be a good predictor of the perceptual ‘brightness’ of a sound. The bandpass filter is characterized by two main components: the ‘centre frequency’ (also known as ‘resonant frequency’), which is the frequency of peak response, and the quality factor ‘Q’, which describes the ratio of the centre frequency to the bandwidth. A higher Q value passes a narrower band of frequencies, resulting in a more sharply peaked bell-shaped curve when viewed on an audio equalizer display. For this subtask, Q remained constant at 1.8 and only the centre frequency was manipulated. Moving the slider from left to right moved the centre frequency from low to high on the frequency spectrum, with brighter sounds located on the right. The filter responded to the slider following a logarithmic relationship, in agreement with the basic principle of human frequency perception (Moore & Glasberg, 2007).
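The relationship between slider position, centre frequency, and bandwidth can be illustrated with a short sketch. Q = 1.8 is taken from the description above; the frequency limits (200 Hz and 8000 Hz) are hypothetical placeholders rather than the values in Table 1, and the exact logarithmic rule used by the test is assumed.

```python
Q = 1.8  # quality factor, held constant in the subtask


def centre_frequency(slider, f_low=200.0, f_high=8000.0):
    """Map a slider position (0-100) to a centre frequency on a log scale.

    f_low and f_high are hypothetical limits used only for illustration.
    """
    return f_low * (f_high / f_low) ** (slider / 100)


def bandwidth(fc, q=Q):
    """Bandwidth implied by Q = centre frequency / bandwidth."""
    return fc / q


for position in (0, 50, 100):
    fc = centre_frequency(position)
    print(position, round(fc, 1), round(bandwidth(fc), 1))
```

With Q fixed, the passband widens in proportion to the centre frequency, so it spans the same musical interval at every slider position.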
To establish suitable parameter ranges for the subtasks, pilot testing (N = 15, 10 females; age M = 27 years, SD = 6.8) was conducted to assess the level of difficulty of the items. In the first instance, we tried testing a few participants on a version of the task that combined all three dimensions of timbre (i.e., simultaneous manipulation of three sliders). However, almost all participants found it very difficult to get a good understanding of the task, and we could not judge whether they were attending to the changes produced by each slider. Hence, we decided subsequently to simplify the interface by splitting the full experiment into three subtasks, with each subtask only presenting one slider (i.e., manipulating only a single timbre dimension at a time). The pilot test consisted of four trials per subtask without restricting the playback of the stimuli. Judging by the absolute distance of participants’ slider position from the target value, the results indicated that the Envelope and Spectral Centroid subtasks were relatively easy compared with the Spectral Flux subtask. Therefore, the parameters were adjusted to balance the level of difficulty across the subtasks.
Subsequently, a second pilot test was conducted by reinviting six of the participants from the first pilot test. The distribution of responses confirmed that the difficulty of the three tasks roughly matched in terms of the absolute distance to the target value of the stimulus, with Envelope (Mean absolute slider distance from target = 15.0 points, SD = 11.9), Spectral Flux (M = 20.1 points, SD = 17.7), and Spectral Centroid (M = 15.7 points, SD = 15.7). These new parameter ranges as given in Table 1 were used for the main experiment.
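The difficulty measure used in the pilot tests, the absolute distance between the participant's slider position and the target value, can be computed as in the following sketch. The response and target values are hypothetical and are not pilot data.

```python
from statistics import mean, stdev


def absolute_distances(responses, targets):
    """Absolute slider distance (in slider points) for each trial."""
    return [abs(r - t) for r, t in zip(responses, targets)]


# Hypothetical slider responses and target positions for four trials
responses = [40, 72, 15, 90]
targets = [50, 60, 20, 80]

distances = absolute_distances(responses, targets)
print(distances, mean(distances), round(stdev(distances), 2))
```

Smaller mean distances indicate an easier subtask, which is how the three subtasks were compared when the parameter ranges were adjusted.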
Participants took approximately 5 minutes to complete the full pilot test. Given such a short duration, an extra item was added to each subtask, as well as ten trials with limited playback (memory trials). The memory trials differed from the match trials in that the stimulus sound could be played back only once at the beginning of a trial. The participants had to retrieve the heard attributes of the timbre and adjust the slider entirely from memory. Thus, the final version of the TPT for the main experiment comprised five match trials and ten memory trials for each of the three subtasks (Envelope, Spectral Flux, and Spectral Centroid) presented in blocks, totalling 45 items. In addition, a training item was included before each subtask so that participants could become familiar with the changes that the slider produced. The final version for the experiment lasted about 10–15 minutes.
Materials for testing validity
Pitch discrimination of complex tones (Soranzo & Grassi, 2014)
This test is part of the PSYCHOACOUSTICS toolbox for MATLAB and is designed to examine a listener’s threshold for detecting a difference between two pitches. It employs a three-alternative forced-choice (3AFC) response paradigm in which three complex tones are presented to the listener in quick succession. Two of the complex tones are played back with a base frequency of 330 Hz, while one is higher in pitch (starting frequency at 390.01 Hz). Participants have to identify which of the three sounds is highest by pressing 1, 2, or 3 on the keyboard. In our experiment, participants performed the task using the maximum likelihood procedure (MLP; Shen & Richards, 2012) with two blocks and 30 trials per procedure (blocks averaged for analysis), taking about 4 minutes. The MLP method has been employed extensively in auditory threshold testing for clinical trials (e.g., Benoit et al., 2014; Flaugnacco et al., 2014) and in validating newly developed listening tests (e.g., Larrouy-Maestri, Harrison, & Müllensiefen, 2019).
Duration discrimination of complex tones (Soranzo & Grassi, 2014)
The test is part of the PSYCHOACOUSTICS toolbox and measures the listener’s perceptual threshold for detecting differences in the duration of musical notes. Three complex tones are presented to the listener, two with a length of 250 ms and one with a longer length (starting length at 450 ms). Listeners have to identify the longest tone. The test followed the same procedure as the pitch discrimination test and took about 4 minutes.
Profile analysis (Soranzo & Grassi, 2014)
The test is part of the PSYCHOACOUSTICS toolbox and measures the listener’s perceptual threshold for detecting amplitude variation of harmonics in a complex tone. Three complex tones are presented to the listener, two having 5 harmonics with a fixed amplitude of −4.0 dB and one having a higher amplitude for the 3rd harmonic (starting amplitude at 20 dB). Listeners have to identify the odd-sounding tone. Because the MLP option was faulty for this particular task, the test was run with the Staircase stimulus selection method for a single block (3AFC, two-down-one-up, 8 reversals), taking about 6 minutes.
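The two-down-one-up rule with eight reversals can be sketched generically as below. This is not the PSYCHOACOUSTICS toolbox implementation: the additive step size, the deterministic simulated listener, and the use of the mean reversal level as the threshold estimate are all assumptions made for illustration. A two-down-one-up rule of this kind is generally taken to converge on the stimulus level yielding about 70.7% correct responses.

```python
def two_down_one_up(respond, start, step, n_reversals=8, max_trials=1000):
    """Run a two-down-one-up staircase and return the reversal levels.

    respond(level) must return True for a correct response. The level is
    lowered after two consecutive correct responses and raised after each
    incorrect response. A reversal is recorded whenever the direction of
    change flips. Step rule and stopping guard are illustrative assumptions.
    """
    level = start
    correct_in_row = 0
    direction = 0  # 0 = no change yet, -1 = level decreasing, +1 = increasing
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue  # wait for a second consecutive correct response
            new_direction = -1
        else:
            new_direction = +1
        correct_in_row = 0
        if direction != 0 and new_direction != direction:
            reversals.append(level)
        direction = new_direction
        level += new_direction * step
    return reversals


# Hypothetical listener who is correct whenever the level exceeds 5 units
reversal_levels = two_down_one_up(lambda level: level > 5, start=20, step=2)
estimate = sum(reversal_levels) / len(reversal_levels)
print(reversal_levels, estimate)
```

With a real listener the responses are probabilistic, so the reversal levels scatter around the threshold instead of alternating between two fixed values.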
Timbre subtest from the Profile of Music Perception Skills battery (PROMS; Law & Zentner, 2012)
In this test, stimuli are generated using a virtual sound sample library and consist of four-note chords (C4, E4, G4, C5) lasting 1.5 s. The test comprises 18 trials and takes about 8 minutes to complete. Participants judge whether the stimuli are played by identical instruments by responding on a scale from 1 (definitely different) to 5 (definitely same). In the easy trials at the beginning of the test, nonidentical instruments come from different families (e.g., horn vs. strings). Trial by trial, however, the test gradually becomes more difficult as comparisons are made between similar instruments or instruments within the same family (e.g., the most difficult trial compares four violas with three violas and a violin). Each individual’s score is calculated by assigning 1 for a correct response, 0.5 for a partially correct response (i.e., probably different or probably the same), and 0 for an incorrect response. These scores are summed, with the highest possible score being 18. The original study (N = 56) for the Timbre subtest reported a mean raw score of 11.92 (SD = 3.12), internal consistency of α = .77 and ω = .73, and test–retest reliability of r = .69 (with a subsample of n = 20).
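The scoring rule can be sketched as follows. The sketch assumes that ratings 2 and 4 correspond to "probably different" and "probably the same" and that the midpoint rating of 3 earns no credit; neither detail is stated above, and the ratings and answer key shown are hypothetical.

```python
def score_trial(rating, same_instrument):
    """Score one trial: 1 for correct, 0.5 for partially correct, else 0.

    rating: 1 (definitely different) to 5 (definitely same).
    Treating 2 and 4 as the 'probably' responses and giving 0 for a
    rating of 3 are assumptions of this sketch.
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    points = {5: 1.0, 4: 0.5} if same_instrument else {1: 1.0, 2: 0.5}
    return points.get(rating, 0.0)


def total_score(ratings, answer_key):
    """Sum the trial scores; with 18 trials the maximum is 18."""
    return sum(score_trial(r, same) for r, same in zip(ratings, answer_key))


# Hypothetical ratings and answer key for four trials
ratings = [5, 2, 3, 1]
answer_key = [True, False, True, True]
print(total_score(ratings, answer_key))
```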
Gold-MSI self-report questionnaire (Müllensiefen et al., 2014)
This short questionnaire addresses several aspects of musical expertise and engagement, comprising 39 items on five subscales (Active Engagement, Emotions, Musical Training, Perceptual Abilities, and Singing Abilities) and a General Musical Sophistication score. From the original study, comparison data is available from a very large sample (N = 147,663) representing the general, nonspecialist population.
Testing took place in isolated cubicles on computers running Windows 10, and the stimuli were presented using Behringer HPM-1000 headphones (Behringer GmbH, Willich, Germany). MATLAB (Version R2018a) was used to run the tests from the PSYCHOACOUSTICS toolbox (Soranzo & Grassi, 2014).
The test battery consisted of six assessments and progressed in the following order: hearing assessment, TPT, Pitch Discrimination, Duration Discrimination, Profile Analysis, Timbre subtest from PROMS, and Gold-MSI self-report. After participants signed the informed consent form, a short online hearing assessment (Footnote 1) based on a speech-in-noise hearing test was conducted to screen out participants with impaired hearing. None of the participants in our sample fell below the clinical threshold of a 70% correct-response rate. Subsequently, participants received verbal instructions on how to perform the TPT, along with interactive speech bubbles that appeared on the screen during the first training trial.
Participants completed each trial by first listening to the stimulus tone and then moving the slider to adjust the reproduction tone to replicate the stimulus tone as closely as possible. For ease of playback, keyboard shortcuts were used to play the stimulus (keypad ‘1’) and reproduction (keypad ‘2’) tones. Participants were encouraged to compare the two sounds as many times as necessary during the match trials, whereas they were informed that the stimulus could be played only once in the memory trials (if participants clicked the stimulus sound again during a memory trial, a speech bubble appeared stating “Remember you can play back the blue sound only once during the memory task!”).
Participants were also informed at the beginning that they would proceed through three separate blocks of tasks, with each block consisting of a training trial, five match trials, and ten memory trials. The overall progress could be tracked with the progress bar, but participants were not given any information about how the sounds and the meaning of the slider changed for each block.
Subsequently, participants performed the three tests from the PSYCHOACOUSTICS toolbox within the MATLAB environment and the Timbre subtest from the PROMS test battery online. Lastly, they were asked to fill in the Gold-MSI self-report questionnaire online and were thanked for their contribution. The full test battery lasted about 1 hour.