Materials
Stimuli were created with PRAAT (Boersma & Weenink, 2011) in two steps. First, a 90-ms white noise token was generated; the duration was chosen on the basis of previous studies (Goudbeek et al., 2008; Goudbeek et al., 2009). Second, the white noise was filtered in two frequency bands that approximated the locations of the first and second formant frequencies of naturally produced vowels (F1 and F2, respectively). The target filter frequencies are referred to as spectral filter frequencies, S1 and S2, throughout this article, and are normalized to Bark (Zwicker, 1961). The Bark conversion is commonly applied in acoustic-phonetic research and accounts for the nonlinear frequency resolution of the human auditory system.
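The Hz-to-Bark conversion can be sketched as follows; the analytic approximation below (Zwicker & Terhardt, 1980) is an assumption, since the article cites Zwicker (1961) without giving a formula:

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Approximate Hz-to-Bark conversion (Zwicker & Terhardt, 1980).

    The Bark scale compresses high frequencies, mirroring the nonlinear
    frequency resolution of the human auditory system: equal Bark steps
    correspond to roughly equal critical-band distances.
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)
```

For example, 1000 Hz maps to roughly 8.5 Bark on this scale.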
The same original white noise token was used as the basis for all 1,000 stimuli generated for each category (A and B) and each distribution (falling and rising), with different S1 and S2 filter frequencies in each case. Filter frequencies were drawn from the distributions shown in Fig. 1a. To create the stretched distributions along the falling and rising diagonals of the S1/S2 space (with slopes of −1 and +1, respectively), the linear equation for the diagonal running through the distribution center was calculated. Then individual bivariate normal distributions were generated with means, μ, at 40 (x, y) locations along the diagonal and equal standard deviations, σ (see Table 1 for details). Twenty-five tokens were randomly generated from each of these 40 bivariate normal distributions, yielding a total of 1,000 stimuli per category distribution.
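The sampling scheme can be sketched as follows, assuming numpy; the diagonal endpoints and standard deviation are illustrative placeholders, not the Table 1 values:

```python
import numpy as np

def sample_category(start, end, sigma, n_means=40, n_per_mean=25, seed=0):
    """Generate (S1, S2) filter frequencies (in Bark) for one category:
    40 bivariate-normal means are spaced evenly along the diagonal from
    `start` to `end`, and 25 tokens are drawn around each mean with an
    equal standard deviation in both dimensions (40 x 25 = 1,000)."""
    rng = np.random.default_rng(seed)
    means = np.linspace(start, end, n_means)     # (40, 2) points on the diagonal
    locs = np.repeat(means, n_per_mean, axis=0)  # (1000, 2) token means
    return rng.normal(loc=locs, scale=sigma)     # (1000, 2) sampled tokens

# Illustrative falling-diagonal endpoints (slope -1) and SD; not the Table 1 values.
stim_a = sample_category(start=(5.0, 13.0), end=(7.0, 11.0), sigma=0.3)
```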
Table 1 Stimulus details in Experiment 1: Distribution ranges (first value, starting point; second value, end point), means (μ), and standard deviations (σ) are given for both spectral filter frequencies (S1, S2) in Bark. Physical δ′ is calculated on the basis of the distribution means and standard deviations. For rule-based (RB) optimal boundaries, δ′ can be calculated for both dimensions, while for information integration (II) optimal boundaries, δ′ can be calculated only for the integrated dimension and is based on the Euclidean distance between the distribution means in the two-dimensional space. A and B are arbitrary labels and refer to stimulus sets in the S1/S2 space, where the leftmost set (with low S1) is A and the rightmost (with high S1) is B
Noises were filtered with fast IIR filters comprising two recursive filter coefficients. Filter bandwidths were 0.2 times the target filter frequency. Stimuli were normalized to an equal average intensity that approximated 60 dB SPL (Boersma & Weenink, 2011). To eliminate acoustic artifacts, onsets were multiplied by the first half period of a [1 − cos(x)] * 0.5 function, and offsets by the first half period of a [1 + cos(x)] * 0.5 function, over a duration of 10 ms in each case.
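The onset/offset windowing can be sketched as follows (the numpy implementation and the 44.1-kHz sampling rate are assumptions; the article does not report a sampling rate):

```python
import numpy as np

def apply_ramps(signal, fs, ramp_ms=10.0):
    """Multiply the onset by [1 - cos(x)] * 0.5 and the offset by
    [1 + cos(x)] * 0.5, with x running over the first half period
    (0 to pi) across 10 ms, to suppress onset/offset clicks."""
    n = int(round(fs * ramp_ms / 1000.0))
    x = np.linspace(0.0, np.pi, n)
    out = np.asarray(signal, dtype=float).copy()
    out[:n] *= (1.0 - np.cos(x)) * 0.5   # rises smoothly from 0 to 1
    out[-n:] *= (1.0 + np.cos(x)) * 0.5  # falls smoothly from 1 to 0
    return out

# A 90-ms white noise at an assumed 44.1-kHz sampling rate:
noise = np.random.default_rng(1).standard_normal(int(0.090 * 44100))
ramped = apply_ramps(noise, fs=44100)
```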
In order to arrive at a stimulus-based measure of likely strategy preference, we determined the normalized distance between the means of the category A and category B distributions, referred to here as δ′, along both the S1 and S2 dimensions. The information integration δ′ (the Euclidean distance between the distribution means) was comparable in size to the rule-based δ′ values (the difference between the means for each dimension, S1 and S2). We expected larger category distances to lead to better categorization performance, such that if participants used optimal strategies, they ought to prefer those for which δ′ is largest. In Experiment 1, the similarity of the distances therefore suggested that neither strategy should be preferred over the other.
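The two δ′ measures can be sketched as follows; the assumption that both are normalized by a common standard deviation follows the usual d′-style definition, and the means and SD below are illustrative, not the Table 1 values:

```python
import math

def delta_prime_rb(mu_a, mu_b, sigma, dim):
    """Rule-based delta': normalized separation of the category means
    along a single dimension (0 = S1, 1 = S2)."""
    return abs(mu_a[dim] - mu_b[dim]) / sigma

def delta_prime_ii(mu_a, mu_b, sigma):
    """Information integration delta': Euclidean distance between the
    category means in the two-dimensional S1/S2 space, normalized by
    the common standard deviation."""
    return math.hypot(mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]) / sigma

# Illustrative means (Bark) and SD -- not the Table 1 values:
mu_a, mu_b, sigma = (6.0, 12.0), (8.0, 10.0), 1.0
```

Note that for equal per-dimension separations, the integrated (Euclidean) distance exceeds each unidimensional distance by a factor of √2, so comparable δ′ values across strategies require an appropriately chosen distribution geometry.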
Stimuli for the nonfeedback maintenance phase were arranged in an equidistantly spaced grid with step sizes of 2/3 Bark in either dimension (S1, 5–9 Bark; S2, 9–13 Bark). They thus described a 6 × 6 grid that evenly covered the critical region of the original stimulus space (Fig. 2a).
Participants and procedure
Thirty-three native speakers of German (all right-handed) participated in Experiment 1 (16 males; mean age 25.76, SD 2.42). They were drawn from the participant pool of the Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, and received monetary compensation for their participation. None of them reported a history of hearing problems.
Participants were randomly assigned to either the rising (n = 17; 7 males; mean age 25.29, SD 2.08) or the falling (n = 16; 9 males, mean age 26.25, SD 2.72) distribution condition. Participants first completed eight blocks (36 trials each) during the learning phase. On each trial, a single stimulus was randomly selected from category A (1,000 exemplars) or B (1,000 exemplars), with the following restrictions: (1) No stimulus could be selected more than once for a given participant in the learning phase of the experiment; (2) within each block, category A and B stimuli were equally probable (p = .5). After stimulus presentation, participants indicated whether it belonged to category A or category B by pressing one of two keys on a computer keyboard (button assignments for the two categories were counterbalanced across participants).
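The two trial-selection constraints can be sketched as follows (a hypothetical implementation; the function name and seed are illustrative):

```python
import random

def build_learning_blocks(n_blocks=8, trials_per_block=36, n_exemplars=1000, seed=0):
    """Build learning-phase trial lists under the two constraints:
    (1) no exemplar is selected more than once across the phase, and
    (2) within each block, categories A and B are equally probable."""
    rng = random.Random(seed)
    half = trials_per_block // 2                             # 18 A + 18 B per block
    idx_a = rng.sample(range(n_exemplars), n_blocks * half)  # without replacement
    idx_b = rng.sample(range(n_exemplars), n_blocks * half)
    blocks = []
    for b in range(n_blocks):
        trials = [("A", i) for i in idx_a[b * half:(b + 1) * half]]
        trials += [("B", i) for i in idx_b[b * half:(b + 1) * half]]
        rng.shuffle(trials)                                  # random A/B order within block
        blocks.append(trials)
    return blocks

blocks = build_learning_blocks()
```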
Following the response, participants received corrective feedback, which was displayed for 1 s in the middle of a CRT screen (Sony Multiscan E430). Correct feedback was given in bold green font (24 points), while incorrect feedback was given in bold red font (24 points). Participants were allowed a short break following each block.
Participants then completed two maintenance blocks (also 36 trials each). On each trial, participants were presented with a stimulus sampled from the equidistantly spaced grid described above. Critically, during the maintenance phase, participants did not receive feedback about their responses. The entire experiment lasted for about 20 min.
Stimuli were presented on a Windows-based PC, using the stimulation software PRESENTATION (Neurobehavioral Systems, Inc., version 13.9), and were delivered through a Creative Labs Audigy II sound card over Sennheiser HD 201 headphones.
Results
Accuracy results
Accuracy in the learning phase was assessed by d′, a signal detection measure of perceptual sensitivity that is independent of response bias (Macmillan & Creelman, 2005). Overall, performance differed significantly from chance (d′ = 1.35, SD = 0.43), t(32) = 32.94, p < .001. Figure 2 shows d′ as a function of block separately for the rising and falling distributions. In order to assess learning, d′ values were entered into a mixed-measures analysis of variance (ANOVA) with block as a within-subjects and distribution (rising vs. falling) as a between-subjects variable. For all ANOVAs, we report partial eta squared (\( \eta_{\mathrm{p}}^2 \)) as a measure of effect size and Greenhouse–Geisser-corrected p-values and degrees of freedom in cases of sphericity violations. There were no significant main effects [block, F(6.1, 188.9) = 1.14, \( \eta_{\mathrm{p}}^2=.04 \), p = .34; distribution, F(1, 31) = 0.002, \( \eta_{\mathrm{p}}^2=.00001 \), p = .97] and no block × distribution interaction, F(6.1, 188.9) = 0.43, \( \eta_{\mathrm{p}}^2=.04 \), p = .89. Hence, performance did not differ across blocks, nor did it differ as a function of distribution condition.
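A d′ computation along these lines might look as follows; the log-linear correction (0.5 added to each cell) for extreme response rates is an assumption, since the article does not state how ceiling rates were handled:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (0.5 added to each cell) keeps the z-scores
    finite when a rate is 0 or 1; this particular correction is an
    assumption, not taken from the article."""
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return z(hit_rate) - z(fa_rate)
```

Treating "category A" responses to A stimuli as hits and "A" responses to B stimuli as false alarms makes the measure independent of any overall bias toward one response key.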
Logistic regression
In order to assess the degree to which category membership judgments (i.e., A vs. B) depended on the acoustic dimensions under investigation (i.e., S1, S2), logistic regressions were calculated to predict category A responses from S1, S2, and their interaction. Note that a significant β-weight indicates that the dimension contributed to determining category membership. Separate logistic regression models were calculated for the learning and the maintenance phases.
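A minimal sketch of such a model on synthetic data (plain gradient descent stands in for a standard GLM fit and computes no z-tests; all parameter values are illustrative):

```python
import numpy as np

def fit_logistic(X, y, n_iter=2000, lr=0.1):
    """Minimal logistic regression via gradient descent -- an
    illustrative stand-in for a standard GLM fit (no standard
    errors or z-tests are computed)."""
    X = np.column_stack([np.ones(len(X)), X])    # prepend intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # predicted P(category A)
        beta += lr * X.T @ (y - p) / len(y)      # average log-likelihood gradient
    return beta                                  # [intercept, b_S1, b_S2, b_S1xS2]

# Synthetic category judgments: P(A) falls with S1; S2 is irrelevant.
rng = np.random.default_rng(2)
s1 = rng.uniform(5.0, 9.0, 2000)
s2 = rng.uniform(9.0, 13.0, 2000)
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(1.5 * (s1 - 7.0)))).astype(float)
s1c, s2c = s1 - s1.mean(), s2 - s2.mean()        # center the predictors
beta = fit_logistic(np.column_stack([s1c, s2c, s1c * s2c]), y)
```

In this toy fit, the recovered S1 weight is reliably negative (near the generating value of −1.5) while the S2 and interaction weights hover near zero, mirroring how a significant β-weight flags a dimension that drives category judgments.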
Learning phase
The model comprised the regressors S1 and S2 and the factors block and distribution (rising, falling). The following interactions were also included in the model: S1 × S2, S1 × block, S2 × block, S1 × distribution, and S2 × distribution. S1 values per trial significantly predicted category judgments, β = −1.27, z = −4.82, p < .001, but there was no interaction with block, z = −0.02, p = .24, indicating that participants weighted S1 information similarly in making category judgments over the course of the learning phase. The S1 × distribution interaction reached significance, β = −0.35, z = −2.21, p < .05, indicating that category A responses were better predicted by S1 in the falling than in the rising distribution. None of the other factors or interactions was significant, all |z|s < 2, n.s.
Maintenance phase
The model included the same predictor variables listed above for the learning phase model, with the exception of block, which was not included here. There was no significant S1 effect, even though there was a trend for more category A responses at lower S1 values, β = −1.82, z = −1.38, p < .16; the S1 effect reached significance if the S1 × S2 interaction was removed from the model, β = −2.34, z = −13.08, p < .001. In the full model, there was a further significant effect of distribution, β = 3.97, z = 2.06, p < .05, reflecting that category A responses were better predicted in the falling than in the rising condition, and this effect was qualified by the significant S1 × distribution interaction, β = −0.53, z = −1.92, p = .05.
Learning versus maintenance phases
Cue utilization during the learning versus maintenance phases was compared directly by entering the absolute values of β-weights for S1 and S2 into a mixed-measures ANOVA with the within-subjects factors phase (learning, maintenance) and filter (S1, S2) and the between-subjects factor distribution (rising, falling). Individual β-weights stemmed from the learning and maintenance models reported above, except that we did not include block in the learning phase model (parallel to the maintenance phase model). Note that for these analyses, only the magnitude (but not the sign) of the β-weights provided interesting information regarding the importance of each dimension to category judgments. There were significant main effects of filter, F(1, 93) = 217.99, \( \eta_{\mathrm{p}}^2=.701 \), p < .001, and phase, F(1, 93) = 43.52, \( \eta_{\mathrm{p}}^2=.319 \), p < .01. Higher βs were observed for S1 (2.55, SD = 1.20) than for S2 (0.62, SD = 0.53), and βs were higher in the maintenance (2.02, SD = 1.57) than in the learning (1.15, SD = 0.90) phase. Furthermore, there was a trend for a distribution × filter interaction, F(1, 93) = 2.03, \( \eta_{\mathrm{p}}^2=.021 \), p = .14, reflecting that within the falling distribution, the difference between βs for S1 and S2 (2.70, SD = 1.18, vs. 0.55, SD = 0.55) was greater than within the rising distribution (2.43, SD = 1.24, vs. 0.67, SD = 0.52). There was also a filter × phase interaction, F(1, 93) = 16.50, \( \eta_{\mathrm{p}}^2=.151 \), p < .01, indicating that βs for S1 and S2 differed more in the maintenance (3.25, SD = 1.20, vs. 0.78, SD = 0.61) than in the learning (1.86, SD = 0.70, vs. 0.45, SD = 0.39) phase.
In order to visualize the degree to which participants relied on the individual dimensions, S1 and S2, we plotted βs for S2 (ordinate) against βs for S1 (abscissa) in the learning and in the maintenance phases. In these scatterplots, participants are coded according to whether they significantly used S1 and S2 (S1 + S2; blue diamonds), S1 only (S1; red squares), S2 only (S2; green triangles), or neither dimension (purple circles) for categorization. Significant usage was determined by βs that significantly differed from zero on the basis of the single-subject logistic regression models (α = .05). In these plots, participants who used both dimensions tended to fall on a diagonal. Participants with a preference for S2 are clustered near the ordinate, and participants with a preference for S1 are clustered near the abscissa (Fig. 3a). It can be seen from the figure that most participants relied on S1 in both the learning and the maintenance phases. The percentages of significant βs did not significantly differ between the falling and rising distributions (all χ²s < 2.5, n.s.), even though we observed a trend for participants to be more likely to rely on both S1 and S2 in the falling, as compared with the rising, stimulus distribution.
In sum, the logistic regressions indicated that most participants relied on the first filter frequency, S1, during categorization and more strongly in the maintenance phase (without feedback) than in the learning phase (with feedback). On the other hand, some participants used information from both S1 and S2, while only very few exclusively used S2 for categorization.
Furthermore, categorization differed somewhat between the falling and the rising distributions: reliance on S1 was greater in the falling than in the rising stimulus distribution, and more participants tended to use both S1 and S2 in the falling than in the rising distribution.
Modeling results
Three families of decision bound models (e.g., Ashby & Gott, 1988; Maddox & Ashby, 1993) were fit to the data of each individual participant on a block-by-block basis to determine the decision strategy that best accounted for performance (Fig. 3b; see the Appendix for details): unidimensional rule-based, information integration, and random-response models. The two rule-based models assumed that listeners used unidimensional rules based on either S1 or S2. Two information integration models either assumed an optimal decision bound or allowed the decision bound slope and intercept to be free parameters; they are summarized as one model for the remainder of this article. Finally, the random-response model presumes that participants guessed randomly on every trial. In order to assess whether the decision bound models provided substantial evidence, we transformed the respective Bayesian information criterion (BIC) scores into Bayes factors (Kass & Raftery, 1995; Raftery, 1986; see the Appendix) and applied Jeffreys' scale of evidence. According to this scale, Bayes factors greater than 3 indicate substantial evidence for model use.
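One common BIC-to-Bayes-factor approximation in the spirit of Kass and Raftery (1995) can be sketched as follows; the exact transformation used here is given in the article's Appendix, so treat this as an assumption:

```python
import math

def bayes_factor_from_bic(bic_model, bic_competitor):
    """Approximate the Bayes factor favoring `bic_model` over
    `bic_competitor` from BIC scores (Kass & Raftery, 1995):
    BF ~= exp(dBIC / 2), where dBIC = BIC_competitor - BIC_model
    (lower BIC means a better fit)."""
    return math.exp((bic_competitor - bic_model) / 2.0)

def substantial_evidence(bic_model, bic_competitor, threshold=3.0):
    """On Jeffreys' scale, a Bayes factor above 3 counts as
    substantial evidence for the model."""
    return bayes_factor_from_bic(bic_model, bic_competitor) > threshold
```

Under this approximation, a BIC advantage of about 2.2 points over the nearest competitor is enough to cross the Bayes-factor-of-3 threshold.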
Almost all model fits (per participant and block) for the rule-based S1 and S2 models provided substantial evidence, while only 10 % of the model fits for information integration exceeded this threshold (Fig. 3b, right). The percentages did not differ between distributions (i.e., falling vs. rising; all χ²s < 2, n.s.). Figure 3b (left) gives the proportion of listeners in the rising and falling distributions whose data were best fit by each of the tested models across blocks. All winning models had Bayes factors greater than 3.
Consistent with the results of the logistic regression analysis, participants in both the rising and falling distribution conditions made almost exclusive use of unidimensional rules, and participants were more likely to use a rule based on S1 than one based on S2. Chi-squared tests, corrected for multiple comparisons with the false-discovery-rate (FDR) method (Benjamini & Hochberg, 1995; FDR-corrected α-level = .05), indicated that, overall, more participants relied on a unidimensional S1 rule than on a unidimensional S2 rule in six of the eight blocks. Taking the distribution conditions separately, participants in the falling distribution condition exhibited this pattern more strongly, using a unidimensional S1 rule more often than a unidimensional S2 rule in four of the eight blocks (ps < .05), whereas this difference was not significant in any block for the rising distribution condition.
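The Benjamini–Hochberg FDR procedure applied here can be sketched as follows (the p-values are illustrative, not the actual test results):

```python
def fdr_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg FDR procedure: sort the m p-values, find the
    largest rank k with p_(k) <= (k / m) * alpha, and reject the
    hypotheses with the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):   # rank 1..m in ascending p
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# Eight per-block p-values (illustrative only, not the reported results):
flags = fdr_reject([0.001, 0.004, 0.02, 0.03, 0.20, 0.45, 0.60, 0.90])
```

Unlike a Bonferroni correction, the threshold grows with the rank, so the procedure controls the expected proportion of false discoveries rather than the familywise error rate.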
Convergence of logistic regression and decision bound models
To our knowledge, no study has assessed the degree to which the two approaches, logistic regressions and decision bound models, converge. For this reason, we explored the relationship between block-averaged β-weights and goodness-of-fit measures (i.e., BICs) separately for the unidimensional S1 and S2 decision bounds in two ANOVAs with block-averaged BIC scores as dependent variables. We were effectively asking to what degree BIC scores supporting a rule-based S1 or S2 strategy could be predicted from the β-weights of the S1 or S2 logistic regressions. Note that information integration models were not included in these analyses, since the proportion of participants using information integration in Experiment 1 was too small for meaningful comparisons.
Both models included the between-subjects factor distribution (rising, falling), the regressor β-weight, and the β-weight × distribution interaction. The S1 model (with S1 BIC score as the dependent variable) showed a significant effect of S1 β-weight, F(1, 29) = 28.16, \( \eta_{\mathrm{p}}^2=.492 \), p < .001, reflecting a negative correlation between β-weights and BIC scores (i.e., higher β-weights for lower BIC scores). However, the correlation was not modulated by distribution, as evidenced by no other significant main effects or interactions (all Fs < 3, all ps > .15). In the S2 model (with S2 BIC scores as the dependent variable), the negative correlation between β-weights and BIC scores was not significant overall, F(1, 29) = 2.33, \( \eta_{\mathrm{p}}^2=.074 \), p < .15, but depended on distribution [β-weight × distribution: F(1, 29) = 3.07, \( \eta_{\mathrm{p}}^2=.113 \), p = .05]. The β-weight/BIC score correlation was significant for the falling, F(1, 15) = 4.88, \( \eta_{\mathrm{p}}^2=.245 \), p < .05, but not for the rising, F(1, 14) = 0.39, \( \eta_{\mathrm{p}}^2=.027 \), p = .54, distribution.
Overall, BIC scores supporting either S1 or S2 rule-based strategies negatively correlated with the corresponding absolute β-weights from the S1 and S2 logistic regression effects. Thus, β-weights and decision bound model BIC scores converged.
Prediction of performance by decision bound models
Finally, we tried to predict performance from strategy use; that is, we tested whether the likelihood of using a rule-based S1 or S2 categorization strategy was associated with better performance. To this end, we ran two separate mixed-measures ANOVAs with d′ as the dependent variable and with the proportion of rule-based S1 or S2 strategy use and distribution (falling, rising) as independent variables. Since the proportions of rule-based S1 and S2 strategy use are necessarily highly correlated, the two factors were investigated in separate ANOVAs.
Neither ANOVA showed significant main effects or interactions (all Fs > 1, n.s.). Thus, performance in Experiment 1 did not depend on either rule-based S1 or S2 strategy use.