The speed and accuracy of perceptual decisions in a random-tone pitch task

Research in perceptual decision making is dominated by paradigms that tap the visual system, such as the random-dot motion (RDM) paradigm. In this study, we investigated whether the behavioral signature of perceptual decisions in the auditory domain is similar to that observed in the visual domain. We developed an auditory version of the RDM task, in which tones correspond to dots and pitch corresponds to motion (the random-tone pitch task, RTP). In this task, participants have to decide quickly whether the pitch of a “sound cloud” of tones is moving up or down. Stimulus strength and speed–accuracy trade-off were manipulated. To describe the relationship between stimulus strength and performance, we fitted the proportional-rate diffusion model to the data. The results showed a close coupling between stimulus strength and the speed and accuracy of perceptual decisions in both tasks. Additionally, we fitted the full drift diffusion model (DDM) to the data and showed that three of the four participants had similar speed–accuracy trade-offs in both tasks. However, for the RTP task, drift rates were smaller and nondecision times longer, suggesting that some DDM parameters might depend on stimulus modality (drift rate and nondecision time), whereas others might not (decision bound). The results illustrate that the RTP task is suitable for investigating the dynamics of auditory perceptual choices. Future studies using the task might help to investigate modality-specific effects on decision making at both the behavioral and neuronal levels.

The field of perceptual decision making investigates how decisions are made on the basis of noisy sensory information (see Summerfield & Tsetsos, 2012, for a review). When people make perceptual decisions, it is generally assumed that sensory evidence for each of the alternatives is collected until a boundary is reached and a course of action is chosen. Typically, experiments designed to study perceptual decision making involve a two-alternative forced choice paradigm. A popular example of such a paradigm is the random-dot motion (RDM) task (e.g., Britten, Shadlen, Newsome, & Movshon, 1992; Forstmann et al., 2008; Gold, 2003; Gold & Shadlen, 2000; Hanks, Ditterich, & Shadlen, 2006; Heekeren, Marrett, Ruff, Bandettini, & Ungerleider, 2006; Morgan & Ward, 1980; Mulder et al., 2010; Mulder, Wagenmakers, Ratcliff, Boekel, & Forstmann, 2012; Newsome & Paré, 1988; Palmer, Huk, & Shadlen, 2005; Roitman & Shadlen, 2002; van Ravenzwaaij, Mulder, Tuerlinckx, & Wagenmakers, 2012; Watamaniuk & Sekuler, 1992). In this task, participants have to decide as quickly and accurately as possible whether a "cloud" of dots appears to move to the left or the right. The paradigm has proven extremely useful for the study of perceptual decisions, because manipulations of the decision process are easy to implement. For example, difficulty can be manipulated on a continuous scale by simply changing the number of coherently moving dots (e.g., Palmer et al., 2005). Another example is a manipulation of the time the participant is allowed for deciding. Such a manipulation produces the typical signatures of changes in the response time (RT) and accuracy of the perceptual choice: When speed is stressed, choices become faster but more prone to errors (a speed-accuracy trade-off; e.g., Forstmann et al., 2008; Palmer et al., 2005).
Sequential-sampling models can describe the underlying mechanism of the decision process. These models, including the drift diffusion model (DDM; Ratcliff, 1978), are based on the assumption that noisy sensory evidence accumulates until a boundary is reached (Fig. 1A; for reviews, see Bogacz, 2007; Gold & Shadlen, 2007; Wagenmakers, 2009). The model permits data to be decomposed into parameters that map onto latent psychological processes. For example, the rate of evidence accumulation (drift rate) depends on the quality of the stimulus and reflects the difficulty of a choice, whereas the boundary separation reflects the trade-off between the speed and accuracy of the perceptual decision. Furthermore, the DDM has proven to be neurobiologically plausible, as studies with human and nonhuman primates have shown neural correlates of the different components of the model (see Gold & Shadlen, 2007; Heekeren, Marrett, & Ungerleider, 2008). For instance, the firing rates of neurons in the monkey lateral intraparietal area reflect the accumulation process prior to the actual choice, which is initiated when the neurons reach a critical firing rate (Gold & Shadlen, 2007).
In this study, we developed an auditory random-tone pitch (RTP) task to investigate the behavioral signature of auditory perceptual choices. In this task, participants were asked to decide whether a mixture of randomly changing tones moved up or down a pitch scale.
Our main goal was to develop an auditory version of the RDM paradigm that would allow for flexible adaptation of stimulus strength on a continuous scale. Accordingly, the psychophysical features of the auditory stimulus closely matched those of the RDM task. In addition, we applied a speed manipulation to illustrate how the task could be used to study changes in the decision process that may or may not be modality specific. The results of the auditory RTP task are described relative to performance on the visual RDM task. To this end, we fitted two versions of the DDM to the behavioral data. First, we fitted the proportional-rate diffusion model (Palmer et al., 2005) to the data, to show that the descriptive results (mean RT and accuracy) could be captured by a model with strong theoretical constraints. Second, we fitted the full DDM (Ratcliff, 1978) to the data and measured the effects of the experimental conditions on the model parameters.
For both tasks, we expected participants to show a smaller boundary separation in the speed than in the accuracy condition (i.e., a speed-accuracy trade-off). Furthermore, we expected higher drift rates for easier stimuli. In addition, we explored the extent to which task modality moderates the speed-accuracy trade-off and the effects of stimulus difficulty.

Procedure
Four of the authors [MK, WB, LM, and MM; mean (SD) age = 29.5 (6.8) years] participated in the experiment. All four participants had already undergone substantial training in both paradigms, minimizing contamination from practice effects. In the RDM paradigm, the participants were asked to indicate the direction of motion of a cloud of randomly moving white dots on a black background (Fig. 1B). In the RTP paradigm, they were asked to indicate the direction of a "sound cloud" of random tones on a pitch scale (Fig. 1C). All four participants completed an additional practice session of ∼1 h on both tasks before participating in the experiment. In both the RDM and RTP tasks, both difficulty and speed-accuracy trade-off (SAT) were manipulated. Difficulty was manipulated by changing the quality of the perceptual stimulus: for the RDM task, by manipulating the number of coherently moving dots, and for the RTP task, by manipulating the number of coherently changing tones on a pitch scale. For both tasks, we used six levels of difficulty, represented by six levels of coherence (0 %, 5 %, 10 %, 20 %, 40 %, and 80 %). For the SAT manipulation, each participant first performed three blocks of each task in which they were instructed to be as accurate as possible.
Fig. 1 (A) The drift diffusion model assumes that noisy information is accumulated until a boundary (a) is reached. Drift rate (v) reflects the quality of the sensory evidence, and nondecision time (Ter) reflects the time other than the decision time (e.g., processing of sensory information and/or execution of a motor response). (B) The classic version of the random-dot motion (RDM) task, in which participants have to decide whether a cloud of dots appears to move to the left or to the right. (C) The auditory random-tone pitch (RTP) task, in which participants have to decide whether a sound cloud of random tones moves up or down a pitch scale.
To determine a participant-specific time limit, we fitted the proportional-rate diffusion model to the data of the accuracy session and estimated the halfway time threshold from the chronometric curve predicted by the model (Palmer et al., 2005). This threshold was then used as the time limit in the subsequent speed session (three blocks). Across all sessions, the participants received one point for each correct choice; no points were given for incorrect choices. In the speed session, the feedback "too slow" was given when a response exceeded the time limit; in the auditory version, the participants additionally heard a buzz sound. Each block contained 600 trials (100 trials per difficulty level), so that each participant completed 1,800 trials per modality in the accuracy condition and 1,800 trials per modality in the speed condition (7,200 trials in total, lasting about 5 h). The order of the modalities was interleaved within sessions (i.e., a block of one modality was followed by a block of the other modality) and counterbalanced across participants (i.e., for two participants, the session started with an auditory block, whereas for the other two participants, the session started with a visual block).
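The time-limit procedure can be sketched in Python. The chronometric function below follows the proportional-rate diffusion model as given by Palmer et al. (2005), mean RT(c) = (A / (kc)) tanh(Akc) + t_R, with bound A, sensitivity k, coherence c, and residual time t_R. Because the text does not spell out how the "halfway" threshold was computed, the code assumes, for illustration only, a time limit halfway between the predicted mean RT at 0 % coherence (t_R + A²) and the asymptotic mean RT t_R, and locates by bisection the coherence at which the curve crosses that limit. All parameter values are hypothetical.

```python
import math


def chronometric(c: float, A: float, k: float, t_r: float) -> float:
    """Predicted mean RT (s) at coherence c under the proportional-rate model."""
    x = A * k * c
    if x == 0.0:
        # tanh(x) / x -> 1 as x -> 0, so the decision time tends to A**2
        return A**2 + t_r
    return (A / (k * c)) * math.tanh(x) + t_r


def halfway_time_limit(A: float, t_r: float) -> float:
    """Assumed limit: halfway between RT at 0 % coherence (A**2 + t_r) and the asymptote t_r."""
    return t_r + 0.5 * A**2


def coherence_at_time(target: float, A: float, k: float, t_r: float,
                      lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Bisection on the (monotonically decreasing) chronometric curve."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chronometric(mid, A, k, t_r) > target:
            lo = mid  # predicted RT still slower than the target: move to higher coherence
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    A, k, t_r = 0.8, 15.0, 0.35  # hypothetical fitted values
    limit = halfway_time_limit(A, t_r)
    c_half = coherence_at_time(limit, A, k, t_r)
    print(f"time limit = {limit:.3f} s (curve reaches it at coherence {c_half:.3f})")
```

Other definitions of the threshold (e.g., a fixed fraction of the decision-time range) would only change `halfway_time_limit`; the bisection step works for any target between the curve's endpoints.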

Visual stimuli
In the RDM paradigm, participants were asked to decide whether a cloud of white dots on a black background appeared to move to the left or the right. Responses were given by pressing a left (the letter "Z") or right (the letter "M") key on a keyboard. The motion stimuli were similar to those described elsewhere (e.g., Gold & Shadlen, 2000; Mulder et al., 2010; Mulder et al., 2012; Palmer et al., 2005) and were created by changing the locations of dots at each successive video frame (monitor refresh rate = 60 Hz, i.e., 16.7 ms per frame). On the first three frames of the motion stimulus, dots were located in random positions. The dots of each of these frames were repositioned three frames later (i.e., the dots in Frame 1 were repositioned in Frame 4, the dots in Frame 2 were repositioned in Frame 5, etc.). For each dot, the new location was either random or at a fixed distance from its former location, in the current direction of the motion stimulus. The probability that a dot would move in line with the motion direction was defined as the coherence (see also Britten et al., 1992; Gold & Shadlen, 2003; Palmer et al., 2005).
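A minimal Python sketch of this interleaved update scheme follows. It assumes, for simplicity, that dots live in a unit square with horizontal wrap-around instead of a circular aperture, and that the step size (`step`) and the number of dots are free, hypothetical settings; only the three-set interleaving and the per-dot coherence rule are taken from the description above.

```python
import random
from typing import List, Tuple

Dot = Tuple[float, float]
FRAME_MS = 1000 / 60  # 60-Hz refresh: about 16.7 ms per frame


def update_set(dots: List[Dot], coherence: float, direction: int,
               step: float, rng: random.Random) -> List[Dot]:
    """Reposition one dot set: coherent dots shift by `step`, the rest jump to random spots."""
    new: List[Dot] = []
    for x, y in dots:
        if rng.random() < coherence:
            new.append(((x + direction * step) % 1.0, y))  # wrap around horizontally
        else:
            new.append((rng.random(), rng.random()))
    return new


def rdm_frames(n_frames: int, n_dots: int, coherence: float, direction: int,
               step: float = 0.05, seed: int = 0) -> List[List[Dot]]:
    """Dot positions per frame; direction is +1 (right) or -1 (left)."""
    rng = random.Random(seed)
    # Frames 1-3: three independent sets of randomly placed dots.
    sets = [[(rng.random(), rng.random()) for _ in range(n_dots)] for _ in range(3)]
    frames = [list(s) for s in sets]
    for f in range(3, n_frames):
        i = f % 3  # frame f redraws the set last shown at frame f - 3
        sets[i] = update_set(sets[i], coherence, direction, step, rng)
        frames.append(sets[i])
    return frames[:n_frames]
```

Because each set is only updated every third frame, a coherent dot is displaced once per three frames, which is why the signal is defined per set rather than between adjacent frames.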

Auditory stimuli
In the RTP paradigm, participants were asked to decide whether a "sound cloud" of random tones moved up or down a pitch scale. Responses were given by pressing a left key (the letter "Z") for "up" or a right key (the letter "M") for "down" on a keyboard. To keep the psychophysical features of the stimuli similar to those of their visual counterparts, we used the locations of the moving dots to calculate the locations on the pitch scale: The x position of each dot was translated to a location on a pitch scale, and both the x and y positions were used to determine the volume on a ramping scale, from 0 outside the aperture of the motion stimulus to 1 in its center (see Fig. 2). The rationale for using both the x and y positions to set the volume was to mimic the circular border of the RDM stimulus: The closer a dot was to the edge of the aperture, the smaller its contribution to the sound. On the first frame and on every third frame thereafter, the tones were summed and played for a duration of three frames (∼50 ms). The resulting "sound cloud" was a series of bleeps that either increased or decreased in pitch. Stimulus strength again depended on coherence, that is, the probability that a tone moved coherently with the pitch direction (see Fig. 2; for examples, go to http://martijnmulder.wordpress.com/stuff/). The pitch of each tone could vary between 261.63 and 16,744 Hz (the frequency range from C4 to C10). Each successive step on the x-axis resulted in a pitch change of 200 cents, which is equal to a whole tone on a musical scale (C4, D4, E4, F#4 . . . C10).
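The dot-to-tone mapping can be sketched as follows. The C4 base frequency, the 200-cent (whole-tone) step, the C4 to C10 range, and the three-frame (∼50 ms) chunk come from the text; the linear mapping of x onto 37 whole-tone positions, the linear radial volume ramp, the 44.1-kHz sampling rate, and the equal-weight sine mixing are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

C4_HZ = 261.63          # lowest pitch (C4), from the text
N_STEPS = 36            # whole tones from C4 to C10: 6 octaves x 6 steps
SAMPLE_RATE = 44100     # assumed audio sampling rate (Hz)
CHUNK_S = 3 / 60        # three 60-Hz video frames, about 50 ms


def pitch_hz(x: float, radius: float = 1.0) -> float:
    """Map x in [-radius, radius] linearly onto the whole-tone (200-cent) scale C4..C10."""
    frac = min(max((x + radius) / (2 * radius), 0.0), 1.0)
    step = round(frac * N_STEPS)
    return C4_HZ * 2 ** (200 * step / 1200)


def volume(x: float, y: float, radius: float = 1.0) -> float:
    """Assumed linear ramp: 1 at the aperture centre, 0 at and beyond its edge."""
    return max(0.0, 1.0 - math.hypot(x, y) / radius)


def sound_chunk(dots: Sequence[Tuple[float, float]]) -> List[float]:
    """Mix one sine tone per dot into a ~50 ms mono chunk with samples in [-1, 1]."""
    n_samples = int(round(SAMPLE_RATE * CHUNK_S))
    tones = [(pitch_hz(x), volume(x, y)) for x, y in dots]
    scale = 1.0 / max(len(tones), 1)  # each tone has amplitude <= 1, so the sum stays bounded
    return [
        scale * sum(amp * math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f, amp in tones)
        for t in range(n_samples)
    ]
```

Since rightward dot motion increases x, coherent rightward motion raises the pitch of the mixture, matching the mapping described for Fig. 2.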

Behavioral analyses
First, we fitted the proportional-rate diffusion model (Palmer et al., 2005) to the mean RT and accuracy data for each task separately. This model has strong theoretically motivated constraints and is able to adequately describe the relationship between coherence and behavioral performance on a continuous scale. The proportional-rate model predicts (1) a close coupling between mean RT, accuracy, and stimulus strength; (2) a linear scaling between stimulus strength and mean drift rate; and (3) equal RTs for correct and incorrect choices (Palmer et al., 2005). These restrictions result in a model that is less complex than the full DDM, while showing an intuitive relationship between stimulus strength and performance. Psychometric and chronometric functions of stimulus strength were fitted to the mean RT and accuracy data using a maximum likelihood procedure (see Palmer et al., 2005, for details).
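The psychometric and chronometric functions, together with one plausible form of the maximum likelihood objective, can be sketched in Python. The functional forms follow Palmer et al. (2005): P(correct) = 1 / (1 + exp(-2Akc)) and mean RT = (A / (kc)) tanh(Akc) + t_R. The likelihood is an assumption made here (binomial accuracy plus a Gaussian term for mean RT with a known standard error), and a coarse grid search stands in for a proper optimizer to keep the sketch dependency-free; the grid bounds are arbitrary.

```python
import math
from itertools import product
from typing import Sequence, Tuple

Row = Tuple[float, int, float, float, float]  # coherence, n trials, n correct, mean RT, SE of mean RT


def p_correct(c: float, A: float, k: float) -> float:
    """Psychometric function: P(correct) = 1 / (1 + exp(-2 A k c))."""
    return 1.0 / (1.0 + math.exp(-2.0 * A * k * c))


def mean_rt(c: float, A: float, k: float, t_r: float) -> float:
    """Chronometric function: (A / (k c)) tanh(A k c) + t_r, tending to A**2 + t_r at c = 0."""
    x = A * k * c
    return A**2 + t_r if x == 0.0 else (A / (k * c)) * math.tanh(x) + t_r


def neg_log_lik(params: Tuple[float, float, float], data: Sequence[Row]) -> float:
    """Assumed objective: binomial term for accuracy plus Gaussian term for mean RT."""
    A, k, t_r = params
    nll = 0.0
    for c, n, n_corr, rt, se in data:
        p = min(max(p_correct(c, A, k), 1e-9), 1.0 - 1e-9)
        nll -= n_corr * math.log(p) + (n - n_corr) * math.log(1.0 - p)
        nll += (rt - mean_rt(c, A, k, t_r)) ** 2 / (2.0 * se**2)
    return nll


A_GRID = [round(0.3 + 0.05 * i, 3) for i in range(19)]   # 0.30 ... 1.20
K_GRID = [float(k) for k in range(5, 31)]                # 5 ... 30
TR_GRID = [round(0.2 + 0.01 * i, 3) for i in range(31)]  # 0.20 ... 0.50 s


def fit_grid(data: Sequence[Row]) -> Tuple[float, float, float]:
    """Coarse grid search for the (A, k, t_r) minimizing the negative log-likelihood."""
    return min(product(A_GRID, K_GRID, TR_GRID), key=lambda p: neg_log_lik(p, data))
```

With noise-free synthetic data generated from a grid point, the search should recover that point; with real data one would refine the grid or switch to a continuous optimizer.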
Fig. 2 The psychophysical features of the auditory stimuli were kept similar to those of the visual stimuli. For each visual stimulus and each moving dot, the position on the horizontal axis (x) was translated to a position on a pitch scale. In addition, both the x and y positions were used to determine the volume of each tone. This resulted in a "sound cloud" of tones that moved coherently or randomly on the pitch scale. When dots moved to the right, the pitch of the sound cloud went up; when dots moved to the left, it went down.
Additionally, we fitted the full DDM to the data, in order to obtain parameter values for each condition separately, without the theoretical restrictions of the proportional-rate diffusion model. The full DDM consists of seven parameters: three for the decision process (i.e., boundary separation a, mean starting point z, and mean drift rate v), one for nondecision processes (i.e., nondecision time Ter), and three for across-trial variability (i.e., variability in starting point sz, variability in nondecision time st, and variability in drift rate η, reflecting variability in stimulus quality; Ratcliff, 1978; Ratcliff & Tuerlinckx, 2002). The DDM assumes that sensory evidence in favor of one or the other alternative accumulates at drift rate v from starting point z until a boundary is reached. We assumed that the effects of difficulty (i.e., stimulus quality or coherence) are reflected by changes in drift rate v, whereas the speed-accuracy trade-off is controlled by the boundary separation a. However, the nondecision time Ter might be affected by the speed manipulation as well (Mulder et al., 2010; Osman et al., 2000; Rinkenauer, Osman, Ulrich, Muller-Gethmann, & Mattes, 2004; Voss, Rothermund, & Voss, 2004).
Accordingly, when fitting the DDM to the data, we allowed boundary separation and nondecision time to vary across the speed and accuracy conditions, and drift rate to vary across difficulty conditions. All other parameters were held fixed across conditions. We used the Diffusion Model Analysis Toolbox (DMAT) to fit the DDM to the individual data (Vandekerckhove & Tuerlinckx, 2007). For this analysis, the easiest coherence level (80 % coherence) was discarded, because a ceiling effect left too few incorrect trials for the DMAT fit (see Fig. 3).
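The parameter constraints described above can be illustrated with a small simulation. The Python sketch approximates the diffusion process with Euler steps, lets boundary separation a and nondecision time Ter depend on the speed/accuracy instruction and drift rate v on coherence, and keeps the variability parameters fixed. All numerical values are hypothetical, the within-trial noise s = 0.1 is a common scaling convention, treating the upper boundary as the correct response is a labeling choice, and placing the starting point at a/2 (no bias) is a simplifying assumption; this is not DMAT and not the fitted model.

```python
import math
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DDMParams:
    a: float           # boundary separation
    z: float           # mean starting point, 0 < z < a
    v: float           # mean drift rate
    ter: float         # mean nondecision time (s)
    eta: float = 0.0   # across-trial SD of drift rate
    sz: float = 0.0    # range of a uniform starting-point distribution
    st: float = 0.0    # range of a uniform nondecision-time distribution
    s: float = 0.1     # within-trial noise SD (scaling convention)


def simulate_trial(p: DDMParams, rng: random.Random, dt: float = 1e-3) -> Tuple[int, float]:
    """Euler approximation of one trial; returns (1 = upper/correct, 0 = lower/error) and RT (s)."""
    v = rng.gauss(p.v, p.eta) if p.eta > 0 else p.v
    x = p.z + rng.uniform(-p.sz / 2, p.sz / 2)
    t, noise_sd = 0.0, p.s * math.sqrt(dt)
    while 0.0 < x < p.a:
        x += v * dt + rng.gauss(0.0, noise_sd)
        t += dt
    return (1 if x >= p.a else 0), t + p.ter + rng.uniform(-p.st / 2, p.st / 2)


# Constraint scheme from the text: a and Ter vary with instruction, v with coherence,
# everything else is shared. Numbers are hypothetical, not fitted estimates.
BOUND = {"speed": 0.08, "accuracy": 0.14}
TER = {"speed": 0.30, "accuracy": 0.35}
DRIFT = {0.0: 0.0, 0.05: 0.05, 0.10: 0.10, 0.20: 0.20, 0.40: 0.35}  # 80 % level excluded


def condition_params(instruction: str, coherence: float) -> DDMParams:
    a = BOUND[instruction]
    return DDMParams(a=a, z=a / 2, v=DRIFT[coherence], ter=TER[instruction],
                     eta=0.05, sz=0.02, st=0.05)


def summarize(instruction: str, coherence: float, n: int = 1000, seed: int = 1):
    """Proportion correct and mean RT over n simulated trials."""
    rng = random.Random(seed)
    p = condition_params(instruction, coherence)
    trials = [simulate_trial(p, rng) for _ in range(n)]
    accuracy = sum(choice for choice, _ in trials) / n
    mean_rt = sum(rt for _, rt in trials) / n
    return accuracy, mean_rt
```

Comparing `summarize("speed", 0.4)` with `summarize("accuracy", 0.4)` should show the speed-accuracy signature: a narrower boundary gives faster but less accurate simulated choices.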
Furthermore, for each condition we excluded outlier trials, defined as trials whose RT exceeded the condition's mean RT plus three standard deviations. We used the DMAT implementation of the Nelder-Mead simplex optimization algorithm (Nelder & Mead, 1965) to maximize the likelihood of the observed proportions of responses falling into the RT bins defined by the 10th, 30th, 50th, 70th, and 90th percentiles. Quantile probability plots were generated to display the quality of the model fits to the data (see Fig. 4).
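The trial-exclusion rule and the quantile-based likelihood can be sketched in Python. The cutoff (mean + 3 SD per condition) and the 10th, 30th, 50th, 70th, and 90th percentiles come from the text; the bin probabilities `probs` would in practice be predicted by the fitted diffusion model, which is not computed here, and the multinomial form (up to a constant) is an assumption about how such a likelihood is typically written, not a description of DMAT internals.

```python
import bisect
import math
import statistics
from typing import List, Sequence


def exclude_slow_outliers(rts: Sequence[float], n_sd: float = 3.0) -> List[float]:
    """Drop trials whose RT exceeds the condition mean RT plus n_sd standard deviations."""
    cutoff = statistics.mean(rts) + n_sd * statistics.stdev(rts)
    return [rt for rt in rts if rt <= cutoff]


def quantile_cuts(rts: Sequence[float]) -> List[float]:
    """10th, 30th, 50th, 70th, and 90th percentiles of an RT distribution."""
    deciles = statistics.quantiles(rts, n=10)  # 9 cut points: 10th, 20th, ..., 90th
    return deciles[0::2]                        # keep the 10th, 30th, 50th, 70th, 90th


def bin_counts(rts: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Number of RTs in each of the len(cuts) + 1 inter-quantile bins."""
    counts = [0] * (len(cuts) + 1)
    for rt in rts:
        counts[bisect.bisect_left(cuts, rt)] += 1
    return counts


def multinomial_log_lik(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Multinomial log-likelihood (up to a constant) of observed bin counts."""
    return sum(n * math.log(max(p, 1e-10)) for n, p in zip(counts, probs))
```

For correct responses and errors alike, roughly 10 %, 20 %, 20 %, 20 %, 20 %, and 10 % of the trials fall into the six bins by construction; the fit then asks how well the model's predicted probabilities reproduce those counts.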

Parameter permutations
To obtain confidence intervals for the DDM parameters, we used the nonparametric bootstrap (Efron & Tibshirani, 1993). The DDM was fitted to 1,000 resampled data sets, and the resulting distributions of best-fitting parameter values were used to obtain confidence intervals for testing specific differences between conditions.
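A percentile-bootstrap sketch in Python. Refitting the DDM on each resample is beyond a short example, so the `statistic` argument stands in for "fit the model and return one parameter"; the example uses mean RT as a cheap placeholder. Resampling each condition independently and checking whether the interval for the difference excludes zero is one common way to test a condition difference, assumed here for illustration.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Statistic = Callable[[List[float]], float]


def _percentile_interval(values: List[float], alpha: float) -> Tuple[float, float]:
    values = sorted(values)
    m = len(values)
    return values[int(alpha / 2 * m)], values[int((1 - alpha / 2) * m) - 1]


def bootstrap_ci(data: Sequence[float], statistic: Statistic, n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile CI: resample trials with replacement and recompute the statistic."""
    rng = random.Random(seed)
    data = list(data)
    estimates = [statistic(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    return _percentile_interval(estimates, alpha)


def bootstrap_diff_ci(data_a: Sequence[float], data_b: Sequence[float], statistic: Statistic,
                      n_boot: int = 1000, alpha: float = 0.05,
                      seed: int = 0) -> Tuple[float, float]:
    """CI for statistic(A) - statistic(B), resampling each condition independently."""
    rng = random.Random(seed)
    a, b = list(data_a), list(data_b)
    diffs = [
        statistic(rng.choices(a, k=len(a))) - statistic(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    ]
    return _percentile_interval(diffs, alpha)
```

In use, `statistic` would be a (hypothetical) function that refits the model to the resampled trials and returns, for example, the nondecision time; two conditions would be judged to differ when the interval for their difference excludes zero, e.g. `bootstrap_diff_ci(auditory_rts, visual_rts, statistics.fmean)`.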

Results
Below, we will first describe the effects of stimulus difficulty and SAT for both the RTP and RDM tasks. Then we will show the results of the full-DDM analyses, comparing parameter changes across different modalities.

Descriptive results
Fitting the proportional-rate diffusion model to the data clearly showed a coupling between stimulus difficulty and the speed and accuracy of perceptual choices in both the RTP and RDM tasks (see Tables S1, S2, and S3 in the supplementary materials for the parameter values). More specifically, for each participant, RTs decreased and accuracy increased as a function of coherence (see Fig. 3). Furthermore, for both modalities we observed decreases in accuracy and RT at all difficulty levels when speed was stressed. This effect on accuracy was less apparent for participant MM in the RTP task, whose accuracy levels in the speed and accuracy conditions seemed to overlap (see Fig. 3). Interestingly, when comparing RTs between the visual and auditory tasks, most of the participants showed slower RTs for the auditory stimuli, reflected in the upward shifts of the chronometric curves in Fig. 3. In addition, for participants LM and MM, the upper asymptotes of the psychometric curves for the auditory task did not reach 100 % accuracy, possibly indicating a higher lapse rate for auditory than for visual stimuli in these participants. Taken together, both the RTP and RDM tasks showed a close relationship between stimulus strength and performance on a continuous scale. However, the differences in the chronometric and psychometric curves reflected subtle differences in RTs and accuracy between the two tasks. To examine these differences further, we fitted the full DDM to the data and investigated the differences in the underlying decision parameters between the auditory and visual task domains.
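A higher lapse rate would lower the upper asymptote of the psychometric curve. The Python sketch below adds a lapse parameter to the proportional-rate psychometric function, assuming that on lapsed trials the participant guesses at chance; this extension is illustrative only and was not part of the fitted model.

```python
import math


def p_correct_with_lapse(c: float, A: float, k: float, lapse: float) -> float:
    """Proportional-rate psychometric function mixed with guessing on lapsed trials.

    On a fraction `lapse` of trials the observer is assumed to guess (50 % correct),
    so accuracy saturates at 1 - lapse / 2 instead of 1.
    """
    p_model = 1.0 / (1.0 + math.exp(-2.0 * A * k * c))
    return 0.5 * lapse + (1.0 - lapse) * p_model
```

With a lapse rate of 0.1, for example, the curve levels off near 95 % correct even for the strongest stimuli, while chance performance at 0 % coherence is unchanged.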

DDM fits
For each modality (visual vs. auditory), the proportions of correct choices and the RT distributions are presented in quantile probability plots for the speed and accuracy conditions separately (see Fig. 4). These plots show the empirical data together with the quantile probability functions predicted by the diffusion model. Overall, the quantile probability functions describe the data adequately, although they deviate from the data at some points. Specifically, the fits are worse for the higher quantiles in the accuracy sessions (see participants WB and MM), possibly because of the relatively low number of incorrect trials in these sessions. The misfit was most apparent for participant WB, for whom the model deviated from the data for most quantiles in the accuracy session. A possible explanation for this deviation is that this participant's RTs were unusually long for the visual accuracy trials (see Fig. 3).
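Quantile probability plots place each response type at its response proportion on the x-axis and its RT quantiles on the y-axis. The Python sketch below computes those coordinates for one condition from raw RTs; it skips a response type with fewer than two trials, which mirrors the problem of sparse errors noted above. The function and constant names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

LEVELS = (0.1, 0.3, 0.5, 0.7, 0.9)


def qp_points(correct_rts: Sequence[float],
              error_rts: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Quantile-probability coordinates for one condition.

    Returns (response proportion, quantile level, RT quantile) triples, first for
    correct responses and then for errors.
    """
    n_total = len(correct_rts) + len(error_rts)
    points: List[Tuple[float, float, float]] = []
    for rts in (correct_rts, error_rts):
        if len(rts) < 2:
            continue  # too few responses of this type to estimate quantiles
        proportion = len(rts) / n_total
        deciles = statistics.quantiles(rts, n=10)  # 10th, 20th, ..., 90th percentiles
        for level, rt_q in zip(LEVELS, deciles[0::2]):
            points.append((proportion, level, rt_q))
    return points
```

Plotting these triples for every coherence level, and overlaying the same coordinates computed from model predictions, gives the kind of display shown in Fig. 4.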

DDM parameters
Fitting the DDM to the data revealed that, for both tasks, the speed-accuracy manipulation affected both boundary separation and nondecision time (see Fig. 5 and Tables 1 and 2). In addition, we found a difference between the visual and auditory modalities, with longer nondecision times for the auditory task (Fig. 6). Finally, for both tasks we found an effect of difficulty on drift rate, with larger drift rates for trials with higher coherence. However, for three of the four participants (MK, WB, and LM), drift rates were higher for visual than for auditory stimuli, especially as stimulus strength increased (see Figs. 3 and 5). These results suggest that, for these participants, the quality of the sensory evidence was lower for easy auditory than for easy visual stimuli, possibly reflecting individual differences in modality-specific processing.

Discussion
Research in perceptual decision making is dominated by paradigms that tap the visual system. Here, we investigated whether the behavioral signature of perceptual decisions in the auditory domain is similar to that observed in the visual domain. We developed an auditory version of the RDM task, in which tones correspond to dots and pitch corresponds to motion (the random-tone pitch task, RTP), and manipulated stimulus difficulty and speed-accuracy trade-off. We showed that the relationship between stimulus strength and performance on the RTP task followed a pattern similar to the one observed for the RDM task. By fitting the full drift diffusion model to the data, we showed that modality mainly affected drift rate and nondecision time, with lower drift rates and longer nondecision times for the auditory task. In contrast, boundary separation seems to be less sensitive to modality-specific effects, as shown by the similar boundary separation values across both tasks for three of the four participants. Similar results were found for the proportional-rate model parameter values (see Table S3 in the supplementary materials). These results may indicate that some components of the decision process are inherent to the participant rather than to the task. Remarkably, the boundary separations for participant WB in the accuracy conditions were much higher than those for the other three participants. One explanation might be that WB was overly cautious (e.g., Bogacz, Wagenmakers, Forstmann, & Nieuwenhuis, 2010; Forstmann et al., 2008; Wagenmakers, Ratcliff, Gomez, & McKoon, 2008). Indeed, this participant reported that he had focused specifically on performing accurately and had therefore been deliberately very cautious.
This explanation is in line with the RTs for the accuracy session, which were relatively slow, also in comparison with those from other studies using the RDM task (Forstmann et al., 2008; Mulder et al., 2010; Mulder et al., 2012; Palmer et al., 2005; van Maanen, Grasman, Forstmann, Keuken, et al., 2012a; van Maanen, Grasman, Forstmann, & Wagenmakers, 2012b).
In contrast to boundary separation, drift rate and nondecision time showed considerable variability across participants and modalities. Overall, we found that easy auditory stimuli had lower drift rates than did easy visual stimuli (see Fig. 6). Drift rate reflects the speed of the accumulation process and therefore indexes the quality of the sensory evidence. Hence, the drift rates suggest that the RTP task is more difficult than the RDM task. This interpretation is supported by the behavioral results: Accuracy levels for the easy trials (20 % and 40 % coherence) were higher for the visual than for the auditory task (note that the 80 %-coherence trials were discarded from the DDM analyses, as participants made very few incorrect choices at this stimulus strength). Furthermore, RTs seem to be longer for the auditory than for the visual modality. Together, these results suggest that discriminating the auditory stimuli was harder than discriminating the visual stimuli. The differences in auditory performance across participants might be explained by differences in auditory experience, such as musical development (Foster & Zatorre, 2010; Kishon-Rabin, Amir, Vexler, & Zaltz, 2001; Micheyl, Delhommeau, Perrot, & Oxenham, 2006; Schön, Magne, & Besson, 2004; Spiegel & Watson, 1984; Tervaniemi, Just, Koelsch, Widmann, & Schröger, 2005). Musical experience might enhance the processing of the auditory stimuli used in the RTP task, which in turn could increase drift rates.
In addition to drift rate, nondecision time can account for modality-specific differences in RTs as well. As is shown in Fig. 6, three of the four participants had longer nondecision times for the auditory than for the visual decisions. Typically, nondecision time is associated with sensory-encoding and/or motor processes (Vandekerckhove & Tuerlinckx, 2008; Voss et al., 2004; Zylberberg, Ouellette, Sigman, & Roelfsema, 2012). However, it has been shown that auditory information is usually encoded faster than visual information (Brebner & Welford, 1980; Green & Vongierke, 1984). This suggests that the difference in nondecision times may be due primarily to effects at the motor level. One explanation is that the stimulus-response mapping might differ between the auditory and visual tasks. For the visual task, the direction of the motion stimulus is spatially compatible with the response, resulting in preparatory effects in favor of that particular response (Buetti & Kerzel, 2008; Lien & Proctor, 2002). For the auditory task, however, the "vertical" direction of the tones might result in longer RTs, as the stimulus first has to be mapped onto the appropriate response. Such stimulus-response mapping effects might cause a delay (Zhang & Kornblum, 1998), resulting in prolonged nondecision times for auditory stimuli. In addition to the effect of sensory modality, some participants also showed effects of the speed manipulation on nondecision time. Other studies have also shown effects of speed instructions on nondecision times (Mulder et al., 2010; Osman et al., 2000; Rinkenauer et al., 2004; Voss et al., 2004), suggesting that speed instructions affect (pre)motor processes that occur after the decision process (Rinkenauer et al., 2004).
Taken together, the differences in drift rates and nondecision times between the two tasks suggest that sensory and motor processes are modality specific, whereas the boundary separation might be less sensitive to the type of information that is accumulated.
This study demonstrated how both the RTP and RDM tasks can be used to study perceptual decision making in different sensory modalities. Furthermore, the tasks illustrate the convenience of using stimuli on a continuous scale. For example, by fitting both the psychometric and chronometric functions (e.g., through the proportional-rate diffusion model), one can interpolate participant-specific speed and accuracy levels. This may be useful in experiments in which one wants to keep task difficulty similar across participants (e.g., Mulder et al., 2010; Mulder et al., 2012). Instead of using a fixed level of stimulus strength, one could interpolate the stimulus strength from the psychometric curve for a specific performance level (e.g., 80 % correct responses). Similarly, for a speed-accuracy trade-off paradigm, one may interpolate a participant-specific time limit from the chronometric curve, to ensure that the effort required to speed up is similar across participants. Such experimental controls might be particularly useful for neuroimaging studies in which one is interested in measuring the brain correlates of a specific decision mechanism. Furthermore, combining the RDM with the RTP task might be especially useful for investigating the neural correlates of perceptual decision making within and across sensory modalities.
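For the psychometric function of the proportional-rate model, P = 1 / (1 + exp(-2Akc)), this interpolation has a closed form, c = ln(P / (1 - P)) / (2Ak). A Python sketch, with hypothetical fitted values for A and k:

```python
import math


def coherence_for_accuracy(target_p: float, A: float, k: float) -> float:
    """Invert P = 1 / (1 + exp(-2 A k c)): c = ln(P / (1 - P)) / (2 A k)."""
    if not 0.5 < target_p < 1.0:
        raise ValueError("target accuracy must lie strictly between 0.5 and 1")
    return math.log(target_p / (1.0 - target_p)) / (2.0 * A * k)


if __name__ == "__main__":
    A, k = 0.8, 15.0  # hypothetical fitted values for one participant
    c80 = coherence_for_accuracy(0.80, A, k)
    print(f"coherence for 80 % correct: {c80:.3f}")
```

Because A and k are estimated per participant, the same target accuracy generally maps onto a different coherence for each participant, which is exactly what equating difficulty across participants requires.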

Conclusion
We developed the RTP task, an auditory version of the random-dot motion task, to investigate the dynamics of auditory perceptual choices on a continuous stimulus scale. Manipulations of difficulty and the speed-accuracy trade-off illustrated that some parameters might be independent of stimulus modality (i.e., boundary separation), whereas others are not (i.e., drift rate and nondecision time). Future studies on perceptual decision making using the RTP task will allow a more systematic investigation of these modality-specific effects at both the behavioral and neuronal levels.
Author note This study was supported by the Dutch Organization for Scientific Research (NWO).
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.