2.1 Participants
Fifteen subjects (mean age 24.8 years; 8 female; education level undergraduate or above) participated in the main study in return for monetary compensation. All subjects reported normal hearing and no history of psychiatric disorders.
2.2 Experimental Setup
Subjects were seated in front of a screen, with access to a response box, in a sound-attenuated booth (Industrial Acoustics Company GmbH). Acoustic stimulus presentation and behavioural control were performed using a custom software package written in MATLAB (BAPHY, NSL, University of Maryland). The acoustic stimulus was sampled at 100 kHz and converted to an analog signal using an IO board (National Instruments NI PCIe-6353) before diotic presentation over headphones (Sennheiser HD380, calibrated flat, i.e. ±5 dB within 100 Hz–20,000 Hz). Reaction times were measured via a custom-built response box and collected by the same IO board at a sampling rate of 1 kHz.
2.3 Stimulus Design
We used a simplified sound texture model that retained the property of being predictable only from the statistics of its complex spectrotemporal structure (Fig. 1a). The texture was a tone cloud, i.e. a sequence of temporally overlapping 30-ms pure tones whose frequencies covered a range of 2.2 octaves (400–1840 Hz), divided into 8 frequency bins. The frequency resolution of the tone distribution was 12 semitones per octave, starting at 400 Hz. This allowed 26 tone frequencies in total, with 3–4 frequency values in each frequency bin.
The minimal temporal unit of the stimulus was a 30-ms chord in which the number of tones in each frequency bin was drawn according to a marginal distribution for that bin. On average, the number of tones per octave in each chord was 2. The marginal distribution of occurrence probability was obtained by modifying a uniform distribution over the 8 frequency bins, in which each bin has probability $P_{uniform} = 1/8 = 0.125$. To generate different initial frequency marginals, we perturbed the marginals randomly by adding or subtracting a fixed value Δ equal to 50 % of the uniform probability (Δ = $P_{uniform}/2$ = 0.0625). The resulting marginal distribution was thus pseudo-random, with 3 bins at $P_{uniform}$ + Δ, 3 bins at $P_{uniform}$ − Δ, and 2 bins left intact at $P_{uniform}$. Since each bin thus carried a probability of Δ, 2Δ, or 3Δ, the average initial probability across the two bins subjected to the change could take 5 different values, namely Δ, 3Δ/2, 2Δ, 5Δ/2, and 3Δ.
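The construction of the initial marginals can be sketched as follows (a minimal Python illustration, not the authors' MATLAB code; function and variable names are our own):

```python
import numpy as np

def perturbed_marginal(n_bins=8, rng=None):
    """Build an initial marginal by perturbing a uniform distribution:
    3 bins raised by delta, 3 lowered by delta, 2 left unchanged."""
    rng = np.random.default_rng(rng)
    p_uniform = 1.0 / n_bins            # 0.125 for 8 bins
    delta = p_uniform / 2.0             # 0.0625
    marginal = np.full(n_bins, p_uniform)
    bins = rng.permutation(n_bins)      # random assignment of bins
    marginal[bins[:3]] += delta         # 3 bins at P_uniform + delta
    marginal[bins[3:6]] -= delta        # 3 bins at P_uniform - delta
    return marginal                     # still sums to 1
```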
The change consisted of an increment of the marginal distribution in the selected frequency bins at a random point in time (referred to as change time) during stimulus presentation. We chose an increment of the marginal distribution in two adjacent frequency bins because the appearance of a component in a complex acoustic stimulus is more salient than its disappearance (Constantino et al. 2012). After the change, the stimulus continued for up to 2 s or until the subject made an earlier decision, whichever happened first.
The increment size, referred to as change size, was drawn from a set of discrete values [50, 80, 110, 140] % relative to the single-bin probability of a uniform distribution (for 8 bins, the single-bin uniform probability is 1/8, so a 50 % change size corresponds to 1/16). To maintain the overall number of tones per chord before and after the change, the occurrence probability in the six remaining frequency bins was decreased accordingly.
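The renormalization that keeps the expected number of tones per chord constant can be sketched as follows (a hypothetical helper, not the authors' code, operating on a marginal such as the one described above):

```python
import numpy as np

def apply_change(marginal, change_bins, change_size_pct):
    """Increment two adjacent bins by change_size_pct of the uniform
    single-bin probability; lower the remaining bins so that the total
    probability (and hence the mean tone count) is unchanged."""
    n_bins = len(marginal)
    inc = (change_size_pct / 100.0) / n_bins   # 50 % -> 1/16 for 8 bins
    changed = marginal.copy()
    changed[list(change_bins)] += inc
    others = [i for i in range(n_bins) if i not in change_bins]
    changed[others] -= 2 * inc / len(others)   # redistribute the added mass
    return changed
```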
The time at which the change occurred (change time) was drawn randomly from an exponential distribution (mean: 3.2 s) limited to the interval [0, 8] s. This choice of distribution prevents subjects from developing a timing strategy, as the instantaneous probability of a change in the next time step is constant.
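Such a truncated exponential can be drawn by rejection sampling (an illustrative sketch; parameter names are our own):

```python
import numpy as np

def sample_change_time(mean=3.2, t_max=8.0, rng=None):
    """Draw a change time from an exponential distribution (mean 3.2 s)
    restricted to [0, 8] s by redrawing values beyond the limit."""
    rng = np.random.default_rng(rng)
    while True:
        t = rng.exponential(mean)
        if t <= t_max:
            return t
```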
2.4 Procedure
The experiment was separated into three phases: instruction, training, and main experiment. After reading the instructions, subjects went through 10 min of training (60 trials), in which they were required to reach a performance of at least 40 %. The training comprised only stimuli of the two largest change sizes (110 %, 140 %). Three subjects did not attain the criterion level of performance and were not tested further.
The main experiment consisted of two sessions of about 70 min each, comprising a total of 930 trials, corresponding to 60 repetitions of each condition. The two sessions were never more than 2 days apart. From the instructions, subjects knew that the change could arise at any moment on each trial and that their task was to detect it within the 2 s window.
Visual feedback was always displayed on the screen in front of them: a red square when the button was pressed before the change (false alarm) or not pressed within the 2 s window after the change (miss), and a green square when the button was pressed after the change and within the 2 s window (hit).
In addition, the sound level was roved from trial to trial, chosen randomly between 60 and 80 dB SPL (sound pressure level). This procedure is commonly used to prevent subjects from adopting an absolute-level strategy. The inter-trial interval was ~1 s, with a small random jitter (<0.1 s) depending on computer load.
2.5 Data Analysis
We quantified the subjects' ability to detect the change in stimulus statistics using two measures, hit rate and d-prime (d’). Reaction times were also analyzed, as they were found to depend on task difficulty.
These measures were computed as a function of change size and change time. Since change times were drawn from a continuous, exponential distribution, the set of change times was binned with approximately exponentially increasing bin sizes, in order to achieve comparable numbers of trials in each bin.
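Bin edges with approximately exponentially increasing widths can be generated, for instance, with geometric spacing (a sketch only; the exact edges used in the analysis are not restated here, and `t_min` is an assumed lower edge for the first non-zero bin):

```python
import numpy as np

def exp_bin_edges(t_min=0.5, t_max=8.0, n_bins=6):
    """Geometrically spaced edges: bin widths grow exponentially, so an
    exponential distribution of change times fills the bins more evenly
    than equal-width bins would."""
    return np.geomspace(t_min, t_max, n_bins + 1)
```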
To control for inattentive subjects, we set a 35 % threshold on the total false alarm rate. Two subjects were discarded according to this criterion, leaving a total of 10 subjects for the data analysis, with false alarm rates around 25 %.
2.5.1 Hit Rate and Reaction Times
We computed a subject’s hit rate as the fraction of successful detections (hits) out of all trials in which the change occurred before the subject’s response (hits + misses). False alarms were excluded from the hit rate computation, since they occurred before the subject was exposed to the change (see the d’ analysis below for an inclusion of false alarms). We obtained reaction times by subtracting the change time from the response time in each trial.
2.5.2 d’ Analysis
We computed d’ values to assess the ability to detect changes while taking the false alarm rate into account. Given the task structure, d’ was computed as a function of time from stimulus onset (see Fig. 1e for an illustration), approximated as d’(t) = Z(HR(t)) − Z(FAR(t)), where Z(p) is the inverse of the cumulative Gaussian distribution. HR(t) is the hit rate as a function of time t since stimulus onset, computed as the fraction of correct change detections relative to the number of trials with changes occurring at t. Similarly, the false alarm rate FAR(t) was computed as the fraction of false alarms over all 2 s windows starting at t in which no change in statistics occurred. The window of 2 s was chosen to be compatible with the 2 s decision window used for the hit rates. d’ was computed separately for different times and change sizes, yielding only a limited number of trials per condition. To avoid degenerate cases (d’ would be infinite for perfect scores), the analysis was not performed separately by subject but over the pooled data, and confidence bounds (95 %) were estimated on the data grouped across all subjects. The analysis was verified on surrogate data from a random responder (binomial with very low p at each point in time), which yielded d’ very close to 0 on a comparable number of trials.
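The core of this computation is the standard d’ formula; a minimal sketch using SciPy’s probit function:

```python
from scipy.stats import norm

def dprime(hit_rate, fa_rate):
    """d'(t) = Z(HR(t)) - Z(FAR(t)), with Z the inverse cumulative
    Gaussian (probit). Rates of exactly 0 or 1 yield infinite d',
    which is why trials are pooled across subjects."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)
```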
2.5.3 Hit Rate Dynamics
In order to compare the hit rate dynamics for different change sizes, we fitted (least-squares non-linear minimization) a cumulative Erlang distribution to the data according to:
$$P(\Delta_c, t_c) = P_0(\Delta_c) + P_{max}(\Delta_c)\,\frac{\gamma\!\left(k, t_c/\tau(\Delta_c)\right)}{(k-1)!}$$
where $P_0$ is the minimal hit rate, $P_{max}$ the maximal hit rate, $t_c$ the change time, $\Delta_c$ the change size, $\gamma$ the lower incomplete gamma function, $\tau$ the rate parameter, and $k$ the shape parameter. k was kept constant across subjects and change sizes, assuming the shape of the hit rate curves is invariant, which appeared to be the case in our sample.
2.6 Statistical Analysis
In the statistical analysis, only non-parametric tests were used. One-way analyses of variance were computed with the Kruskal-Wallis test; two-way analyses were computed using the Friedman test. Unless indicated otherwise, error bars correspond to twice the standard error of the mean (SEM). All statistical analyses were performed using MATLAB (The MathWorks, Natick, MA).