Introduction

When we measure the reaction to the onset of a stimulus, we usually define it as a reaction to a single point in time. For this reason models for auditory simple reaction time (RT) often assume an abrupt onset at the beginning of a sound. However, when presenting any sound except broadband noise, for example a pure tone, it is essential to increase amplitude gradually over several milliseconds in order to avoid a clicking noise and thus preserve the intended spectrum of the sound. Even for broadband noise, it is unlikely that RT is entirely determined by the signal onset because a sound cannot be defined by a single point in time. The question arises to which portion of the sound we actually react.

For stationary stimuli, i.e. sounds with a flat envelope, which have a long or constant duration, it has been well-known since the early times of psychophysics that RT is determined by the strength of sensation (Wundt, 1874; Cattell, 1886; Piéron, 1920) and that reaction is faster to more intense sounds. In psychoacoustics, it has been established that RT is a correlate of loudness rather than physical intensity. For tones of different frequencies, RT can be different despite equal physical level, to some degree following the patterns of equal loudness contours (Chocholle, 1940; Marshall & Brandt, 1980; Kohfeld et al. 1981; Epstein & Florentine, 2006). Loudness also depends on the bandwidth of a sound, with broadband noise being louder than narrowband noise, and so does RT (Wagner et al. 2004). Finally, loudness is increased by binaural summation. Numerous studies have shown that RT to the onset of a tone presented to both ears is slightly shorter than to a tone presented to one ear only (Chocholle, 1944; Lentz et al. 2014; Schröter et al. 2007; Simon, 1967) and that the level difference needed to obtain equal RT comes close to the one required for equal loudness (Schlittenlacher et al. 2014).

A more difficult issue for the relation between loudness and RT is the duration of a sound. Munson (1947) reported that “full loudness is not reached until over 0.2 second has elapsed” and much subsequent research by and large agrees with this finding (Miller, 1948; Pollack, 1958; Port, 1959, 1963; Ekman et al. 1966; Stevens and Hall, 1966; Zwislocki, 1969; Pedersen et al. 1977; Scharf, 1978; Poulsen, 1981), though the critical durations (CDs) reported range from 50 msec to 500 msec, mostly falling between 100 and 200 msec and somewhat depending on level (Florentine et al. 1996; Buus et al. 1997). The only exception with a significantly smaller CD of only 15 msec is reported by Small et al. (1962). The CD for loudness is defined as the duration beyond which loudness remains constant and is independent of duration. In some of the studies, the loudness-duration function is approximated by an exponential curve whose time constant was reported. In these cases, the CD is approximately three times the time constant. The process underlying the increase in loudness until the CD is reached is called temporal integration.

RT also depends on duration, with longer RT for short-duration sounds. However, it seems plausible that the CD for RT is shorter than that for loudness, as mean RT itself is in the range of the CD for loudness. Raab (1962) studied RT to the onset of a monaural broadband noise at two levels and durations of 2, 5, 10, 20, 50 and 100 msec. For both levels, he found RT to decrease for durations from 2 to 20 msec but to remain constant for 50 and 100 msec. Ulrich et al. (1998) also investigated the influence of duration, using a binaural 1-kHz pure tone with an abrupt onset at three levels and durations of 5, 10, 20, 40, 80, 160 and 320 msec. They found RT to decrease up to a duration of 40 msec and “almost equivalent” RT afterwards. A closer look at their curves, however, indicates that RT slightly increases again for longer durations. Such an effect was reported by Gregg and Brogden (1950), with RT to a sound with a duration of 2400 msec being 30 msec longer than that to a duration of 100 msec. This leads to discrepancies with the loudness studies, since loudness remains constant for sounds longer than the CD and, above all, the CD for RT appears to be considerably shorter than that for loudness.

That is why Heil et al. (2006), focusing on RT near the threshold in quiet, suggested that two necessary conditions for RT to be a correlate of loudness are that stimulus durations and temporal envelopes are the same across the conditions being studied. The present work will investigate these two stimulus features in more detail; however, it shall focus on sounds well above threshold. Experiment 1 focuses on duration; Experiment 2 varies the rise of the stimulus. By doing so, conclusions about the nature of the detectors determining reaction time (see Luce, 1986, p. 159 ff.) may be derived. Level detectors are triggered by the intensity of the stimulus whereas change detectors primarily depend on the temporal envelope, e.g. represented by the steepness of the rise. Finally, Experiment 3 combines all the issues mentioned above by using amplitude-modulated sounds as an instance of time-varying sounds.

If RT draws on different mechanisms than loudness does, the question arises how they can be explained. There are many models for RT as a function of intensity (see Luce, 1986, p. 122 ff. for a review), but most of them assume stationary stimuli. A rather new and interesting model for the present purpose is the parallel grains model (PGM) of J. Miller and Ulrich (2003, see also section “Evaluation and development of the models” for details). In their model, a stimulus activates a number of “grains”, e.g. neural information channels. The time needed for a certain grain to be detected at the decision center depends on its intensity and is a probabilistic function of time. After a defined number of grains have arrived at the decision center, the reaction is triggered. Since the reaction is already a function of intensity and time, it is straightforward to extend the model so that it can deal with intensity as a function of time.
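The criterion idea described above (a reaction is triggered once a fixed number of grains have arrived) can be illustrated with a deliberately simplified sketch. It is not the formulation of Miller and Ulrich (2003): it assumes that every grain is activated and that arrival times are independent and exponentially distributed, and all parameter values are hypothetical. Under these assumptions, the expected time of the c-th arrival among G grains has a closed form.

```python
def mean_cth_arrival(c: int, n_grains: int, mean_arrival_ms: float) -> float:
    """Expected time of the c-th arrival among n_grains independent
    exponential arrival times sharing the same mean (assumed distribution)."""
    if not 1 <= c <= n_grains:
        raise ValueError("criterion c must lie between 1 and n_grains")
    # Waiting time between successive arrivals shrinks as fewer grains remain.
    return mean_arrival_ms * sum(1.0 / (n_grains - i) for i in range(c))


def mean_rt(c: int, n_grains: int, mean_arrival_ms: float, motor_ms: float) -> float:
    """Mean RT = time until the criterion is reached + residual (motor) time."""
    return mean_cth_arrival(c, n_grains, mean_arrival_ms) + motor_ms


# Hypothetical values: a more intense stimulus is assumed to shorten
# the mean arrival time of each grain.
rt_intense = mean_rt(c=2, n_grains=32, mean_arrival_ms=100.0, motor_ms=165.0)
rt_weak = mean_rt(c=2, n_grains=32, mean_arrival_ms=200.0, motor_ms=165.0)
print(f"intense: {rt_intense:.1f} msec, weak: {rt_weak:.1f} msec")
```

In this toy version, shorter grain arrival times translate directly into shorter mean RT, which is the qualitative behaviour the model is meant to capture.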

An alternative approach to explain temporal integration for RT would be to deduce it from the temporal integration of loudness, but using a different critical duration. Namba et al. (2008) introduce a rather simple model for temporal integration of loudness by illustrating that sound exposure level (\(L_{AE}\)) correlates highly with the loudness of short and impulsive sounds. The \(L_{AE}\) is a summation of energy within a predefined interval of one second, independent of the stimulus duration. Despite its simplicity, the \(L_{AE}\) predicts the same effect of stimulus duration as more complex models like DIN 45631/A1 (2010) do, i.e. that a sound with 100 msec duration sounds 10 phon louder than a sound with 10 msec duration. It will be discussed how well the summation of energy within a shorter interval, much closer to the critical duration of RT, could serve as a correlate of RT.
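The 10-phon difference quoted above follows directly from energy summation. A minimal sketch, assuming a steady tone with a rectangular envelope (rise and fall ignored) and an unweighted level, so that the sound exposure level is the level plus ten times the logarithm of the duration relative to one second; the function name is illustrative only:

```python
import math

REFERENCE_DURATION_S = 1.0  # reference interval of the sound exposure level


def sound_exposure_level(level_db: float, duration_s: float) -> float:
    """Exposure level of a steady sound with rectangular envelope (assumption)."""
    return level_db + 10.0 * math.log10(duration_s / REFERENCE_DURATION_S)


lae_100ms = sound_exposure_level(70.0, 0.100)
lae_10ms = sound_exposure_level(70.0, 0.010)
difference_db = lae_100ms - lae_10ms
print(f"100 msec: {lae_100ms:.1f} dB, 10 msec: {lae_10ms:.1f} dB, "
      f"difference: {difference_db:.1f} dB")
```

For a 1-kHz tone, a level difference in dB corresponds to the same difference in phon, so a tenfold duration yields the 10-phon step mentioned in the text.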

We will start out by presenting three reaction-time experiments using stimuli which vary over or in time. The first one will focus on critical duration as Raab (1962) and Ulrich et al. (1998) did. However, the durations shall be varied in smaller steps than in the previous studies, i.e. using more than one condition per doubling of duration. After pinpointing the CD that way, it shall be studied whether temporal integration of RT shows the same behaviour as temporal integration of loudness with the only difference of a shorter CD, or whether it is based on an entirely different mechanism. This will be done by systematically modifying the temporal envelope and testing whether an increase in RT is accompanied by an equivalent decrease in predicted loudness. For this reason, Experiment 2 will vary the rise time, i.e. the increase of the envelope after stimulus onset. Experiment 3 will go one step further and use stimuli whose envelope is never constant. In particular, the envelope can also decrease shortly after onset. These stimuli will consist of amplitude-modulated sounds, which have a time-varying envelope whose variation is controlled by two parameters, i.e. modulation frequency and modulation phase.

Finally, the combined results of all experiments are used to evaluate the predictions for mean RT of a slightly extended parallel grains model (Miller and Ulrich, 2003) and for comparison with the temporal integration of loudness. A major difference between these two concepts is that temporal summation does not care about how energy is distributed in time, whereas probability density functions as used in the PGM give higher weight to intensity closer to the onset.

General method

Participants

Twelve listeners aged 19 to 44 years (median 22), three of them males and nine females, participated in Experiment 1, twenty in Experiment 2 (6 males, 14 females, 19 to 30 years, median 21) and twenty in Experiment 3 (3 males, 17 females, 20 to 34 years, median 23). All of them had thresholds in quiet better than 20 dB HL for frequencies between 250 Hz and 8 kHz, measured in octave steps. They participated for course credit.

Apparatus

The sounds were D/A converted by an RME Hammerfall DSP Multiface II audio interface, amplified by a Behringer HA8000 Powerplay Pro-8 and presented via Beyerdynamic DT-990 250 Ω headphones to the participants, who were seated in a double-walled sound-proof chamber manufactured by the Industrial Acoustics Company.

The signals were calibrated using a Brüel & Kjær 4153 coupler with a DB 0843 adapter. It was determined that 1 V produces 100 dB SPL which is 2 dB less than the manufacturer’s data say. Discrepancies between the left and the right channel were corrected via the software settings of the audio interface though they amounted to less than 1 dB.

The telegraph key used was custom-made and offered a resistance comparable to that of a computer mouse. It was connected to a custom-made external electronic timer constructed according to the prototype of Kerber (2008). Its high-precision counter has a resolution of 1 msec.

Stimuli

In all experiments, a diotic 1-kHz pure tone, i.e. a tone presented identically to both ears, was used as the carrier. Different properties were varied: the duration in Experiment 1, rise time and its shape in Experiment 2, modulation frequency and point of onset in Experiment 3. Details will be described in the appropriate subsections. To avoid confusion about different methods to measure sound pressure level, all reported levels refer to the peak amplitude of the sinusoid. The levels given are the equivalent sound pressure levels which would be obtained if the pure tone was held at its peak amplitude, without considering its duration, rise and fall time or amplitude modulation. For a precise definition of the stimulus’ temporal shape, i.e. its envelope after onset, Gaussian rise and fall times are specified as follows: They consist of a Gauss curve extending over three standard deviations, and the rise or fall time specified in this paper is the duration which covers these three standard deviations. Thus, the Gaussian rise covers a range of sound pressure levels spanning 39 dB. The duration between the 10 % and 90 % points is 56 % of the rise time reported here, and the equivalent rectangular duration of the rise is 29 % of the rise time reported here. Linear rise times cover the range from the onset at 0 to the maximum.
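The figures quoted for the Gaussian rise can be checked numerically. The sketch below assumes the envelope is the rising flank of a Gauss curve from three standard deviations below its peak up to the peak; it further assumes that the equivalent rectangular duration is energy based (integral of the squared envelope), which is a guess about the definition used. All names are illustrative.

```python
import math

N_SIGMA = 3.0  # the rise spans three standard deviations of the Gauss curve


def gaussian_rise(x: float) -> float:
    """Relative amplitude at fraction x (0..1) of the rise time."""
    z = N_SIGMA * (1.0 - x)  # distance from the peak in standard deviations
    return math.exp(-0.5 * z * z)


def fraction_reaching(amplitude: float) -> float:
    """Fraction of the rise time at which a relative amplitude is reached."""
    return 1.0 - math.sqrt(-2.0 * math.log(amplitude)) / N_SIGMA


# Level range covered between onset and peak, in dB.
level_span_db = -20.0 * math.log10(gaussian_rise(0.0))

# Duration between the 10 % and 90 % amplitude points, relative to the rise time.
rise_10_90 = fraction_reaching(0.9) - fraction_reaching(0.1)

# Integral of the squared envelope over the rise, relative to the rise time
# (assumed, energy-based definition of the equivalent rectangular duration).
energy_erd = math.sqrt(math.pi) / 2.0 * math.erf(N_SIGMA) / N_SIGMA

print(f"span: {level_span_db:.1f} dB, 10-90 %: {rise_10_90:.2f}, "
      f"energy-based ERD: {energy_erd:.3f}")
```

Under these assumptions the level span and the 10 % to 90 % duration agree with the 39 dB and 56 % given in the text, and the energy-based equivalent rectangular duration comes out just under 30 % of the rise time, in line with the roughly 29 % quoted.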

Procedure

The participants were instructed to press a telegraph key with their preferred finger of the dominant hand as soon as they heard any sound through their headphones. Each trial was started by presenting a red square on the screen for 200 msec. The foreperiod between the warning signal and the onset of the sound consisted of two parts: A fixed part of 500 msec and a randomly varying part drawn from an exponential distribution having a mean value of 1 sec. The entire waiting time was limited to 5 sec. If the value drawn exceeded this limit, it was recalculated before the trial. After pressing the telegraph key, the participants received visual feedback on the screen through a depiction of a button being pressed thus telling them the response had been registered. The inter-trial interval between a reaction and the start of the next warning signal was 1.5 sec. Trials resulting in reaction times of less than 100 msec or more than 1 sec were repeated at a random position in the same block. With latencies of less than 100 msec, we assumed that the listener had anticipated the sound, whereas latencies greater than 1 sec were taken to mean he or she had missed its onset.
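The foreperiod scheme described above can be sketched as a small rejection sampler. This is an illustration rather than the software used in the experiments; in particular, it assumes that the 5-sec limit applies to the sum of the fixed and the random part, which the text leaves open.

```python
import random

FIXED_PART_S = 0.5        # fixed part of the foreperiod
MEAN_RANDOM_PART_S = 1.0  # mean of the exponentially distributed part
MAX_WAIT_S = 5.0          # limit, assumed to apply to the total waiting time


def draw_foreperiod(rng: random.Random) -> float:
    """Draw a foreperiod, redrawing the random part until the limit is met."""
    while True:
        random_part = rng.expovariate(1.0 / MEAN_RANDOM_PART_S)
        foreperiod = FIXED_PART_S + random_part
        if foreperiod <= MAX_WAIT_S:
            return foreperiod


rng = random.Random(1)  # fixed seed only to make the illustration repeatable
foreperiods = [draw_foreperiod(rng) for _ in range(10_000)]
mean_foreperiod = sum(foreperiods) / len(foreperiods)
print(f"mean foreperiod: {mean_foreperiod:.2f} sec")
```

An exponential distribution keeps the conditional probability of the sound occurring in the next instant roughly constant, so waiting longer does not make the onset more predictable; the truncation at the limit weakens this property only slightly.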

In order to prevent fatigue, the participants were allowed to take breaks after blocks of about 100 trials. Within each block, the conditions occurred equally often and in random order. Before beginning with an experiment, the participants completed a short training. Furthermore, each block started with two additional training trials.

Experiment 1: Duration

Stimuli and procedure

The aim of Experiment 1 was to determine the critical duration for RT more precisely than previous studies did. For this reason only a single sound pressure level (SPL), 70 dB, was used in favor of more steps for duration. This SPL was chosen to represent a medium loudness level for signals well above threshold. The most interesting range for varying the duration was determined to be 32 msec to 200 msec, representing the range of CD for RT and loudness, and 3 steps were used per doubling of duration. In addition, two shorter and a longer duration were used, resulting in a set of durations with 10, 20, 32, 40, 50, 63, 80, 100, 125, 160, 200 and 500 msec. Gaussian rise and fall times of 5 msec were used. They are included in the durations specified. Thus, the shortest stimulus of 10 msec consists of rise and fall only. In addition, a 1-kHz pure tone with a duration of 200 msec and 50 dB was added in order to be able to draw a link between RT and SPL. In our previous study (Schlittenlacher et al. 2014) it turned out to be a linear relation within that range of levels.
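The spacing of three steps per doubling can be made explicit with a short sketch. It assumes that the nominal durations are rounded values of a geometric series anchored at 100 msec; the anchor and the rounding are assumptions made here for illustration.

```python
NOMINAL_MS = [32, 40, 50, 63, 80, 100, 125, 160, 200]  # durations from the text
STEPS_PER_DOUBLING = 3


def exact_series(anchor_ms: float = 100.0, first_step: int = -5, last_step: int = 3):
    """Geometric series with three steps per doubling around an assumed anchor."""
    return [anchor_ms * 2 ** (k / STEPS_PER_DOUBLING)
            for k in range(first_step, last_step + 1)]


exact_ms = exact_series()
relative_errors = [abs(e - n) / n for e, n in zip(exact_ms, NOMINAL_MS)]

for nominal, exact in zip(NOMINAL_MS, exact_ms):
    print(f"nominal {nominal:>3} msec  exact {exact:6.1f} msec")
```

Each nominal duration lies within two percent of the exact series value, so successive conditions differ by a factor of about 1.26.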

Each participant accumulated 80 trials in each of the 13 conditions by completing eight blocks. These blocks were run in two separate sessions.

Results

When linking RT to a psychological or physical measure, for example loudness, it is necessary to average across all trials collected for a condition to find one representative value. For this purpose, the overall geometric mean was computed across participants and trials. Compared to other measures of central tendency, it is less prone to longer RTs regarded as outliers. The results are shown in Fig. 1 as a function of tone duration, error bars indicate the standard error of the mean. It can be seen that mean RT decreases with increasing duration up to a duration of 40 msec and remains virtually constant for longer durations. Employing a within-subjects one-way analysis of variance, the main effect of duration is statistically significant, F(11,121)=22.4, p < .001, \({\eta ^{2}_{p}} = .671\).
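The robustness of the geometric mean against occasional long RTs can be seen in a small example. The reaction times below are hypothetical and serve only to contrast the two measures.

```python
import math


def geometric_mean(values):
    """Geometric mean: exponential of the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical RTs in msec, including one slow response.
rts_ms = [180, 190, 200, 210, 220, 600]

arithmetic = sum(rts_ms) / len(rts_ms)
geometric = geometric_mean(rts_ms)
print(f"arithmetic mean: {arithmetic:.1f} msec, geometric mean: {geometric:.1f} msec")
```

The single slow response pulls the arithmetic mean further away from the bulk of the data than the geometric mean, because averaging is done on a logarithmic scale.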

Fig. 1

Geometric mean reaction times of 12 listeners to the onsets of 1-kHz pure tones as a function of duration. Circles indicate conditions with 70 dB SPL, the square the condition with 50 dB SPL, error bars the standard errors of the mean in each direction. The black solid line interpolates the 70-dB conditions by showing predictions of the parallel grains model (Miller and Ulrich, 2003) with parameters \(c = 2\), \(G = 32\), \(\mu_X = 44\) msec, \(\mu_Y = 180\) msec and \(M = 165\) msec

One might wonder whether RTs at 200 and 500 msec duration (geometric mean of the two: 201.3 msec) are a little longer than those at 40 to 160 msec duration (geometric mean of the seven conditions: 199.5 msec). A paired t-test using each participant’s geometric means of the two sets of stimulus durations, however, is not statistically significant, t(11) = 1.63, p = .131. Post-hoc t-tests comparing each duration to the 40-msec condition, after which RT seems to remain constant, are statistically significant for 10 msec, t(11) = 6.94, p < .001, and for 20 msec, t(11) = 3.37, p < .01, and might show a tendency for 32 msec, t(11) = 1.89, p = .085 (two-tailed, no correction for multiple tests). For all longer durations, p-values are greater than .20, suggesting that their RTs do not differ from the RT at 40 msec duration.

Inspecting the two conditions for the duration of 200 msec, it turns out that RT at 50 dB SPL is 16.9 msec slower than that at 70 dB SPL, corresponding to 0.85 msec per dB. Interpolating the data for 70 dB by fitting them to the parallel grains model (PGM) of J. Miller and Ulrich (2003, see also section “Evaluation and development of models” for details) allows us to compare the control condition of 50 dB SPL and 200 msec duration not only in terms of a difference in RT, but also in terms of the duration at 70 dB SPL which evokes the same RT. The RT for 50 dB SPL and 200 msec duration equals the RT calculated for 70 dB SPL and 14 msec duration. By comparison, a pure integration of physical energy would accumulate the same energy in one hundredth of the time, i.e. 2 msec, if the level is raised by 20 dB.
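The arithmetic behind this comparison is short enough to spell out. The sketch uses only the values given in the text; the variable names are illustrative.

```python
rt_difference_ms = 16.9     # RT at 50 dB SPL minus RT at 70 dB SPL (200 msec tones)
level_difference_db = 20.0  # 70 dB SPL versus 50 dB SPL

# Trade-off between RT and level.
slope_ms_per_db = rt_difference_ms / level_difference_db

# A 20-dB increase in level corresponds to a hundredfold increase in intensity,
# so equal energy is reached in one hundredth of the duration.
intensity_ratio = 10 ** (level_difference_db / 10.0)
equal_energy_duration_ms = 200.0 / intensity_ratio

print(f"slope: {slope_ms_per_db:.3f} msec/dB, "
      f"equal-energy duration: {equal_energy_duration_ms:.0f} msec")
```

The equal-energy duration of 2 msec is much shorter than the 14 msec obtained from the fitted model, which illustrates that the RT data are not described by a simple summation of physical energy.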

Ratcliff (1979) described a procedure for estimating distributions of reaction times, which is meant to represent the distribution of the average participant. The procedure itself is a form of quantile averaging called “Vincentizing” after its inventor (Vincent, 1912). For the present data set, each participant’s 80 RTs for a given condition were sorted in ascending order, and for each of the thus sorted 80 trials (i.e. quantiles) the geometric mean was calculated across participants. Figure 2 represents these distributions as cumulative distribution functions (CDFs). The CDF for the 200 msec duration at 70 dB SPL (black dashed line) is shown as a reference in each graph. The difference between the 200-msec and 10-msec condition (top left) is apparent throughout the whole range, whereas the difference between the 200-msec and 20-msec (top center) or 32-msec (top right) conditions emerges between the 10th and the 70th to 80th percentile. The remaining distributions in the region where mean RT asymptotes, i.e. from 40 to 500 msec duration, are almost indistinguishable. For the 200-msec 50-dB SPL tone (bottom right), showing the second longest RT, the condition with 70 dB SPL and 10 msec duration is depicted to provide a second reference (dotted line).
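The quantile averaging described above can be sketched in a few lines. The data are hypothetical, and the function assumes that every participant contributes the same number of trials, as was the case here with 80 trials per condition.

```python
import math


def vincentize(rts_by_participant):
    """Rank-wise geometric mean across participants (equal trial counts assumed)."""
    n_trials = len(rts_by_participant[0])
    if any(len(rts) != n_trials for rts in rts_by_participant):
        raise ValueError("all participants need the same number of trials")
    sorted_rts = [sorted(rts) for rts in rts_by_participant]
    group_quantiles = []
    for rank in range(n_trials):
        logs = [math.log(rts[rank]) for rts in sorted_rts]
        group_quantiles.append(math.exp(sum(logs) / len(logs)))
    return group_quantiles


def empirical_cdf(quantiles):
    """Pair each averaged quantile with its cumulative proportion."""
    n = len(quantiles)
    return [(q, (i + 1) / n) for i, q in enumerate(quantiles)]


# Hypothetical RTs in msec for two participants and three trials each.
example = [[200, 100, 400], [400, 100, 1600]]
group = vincentize(example)
for rt, proportion in empirical_cdf(group):
    print(f"{rt:7.1f} msec  {proportion:.2f}")
```

Because averaging is done rank by rank, the shape of the individual distributions is retained in the group distribution instead of being smeared by differences in the participants' overall speed.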

Fig. 2

Cumulative distribution functions obtained through “Vincentizing” the data (Ratcliff, 1979). The black dashed line shows the 70-dB-SPL, 200-msec condition as a reference, the 12 panels depict the 12 other conditions. The panel at the bottom right for the 50-dB-SPL condition shows a second reference (70 dB SPL, 10 msec, dotted line). Kolmogorov-Smirnov tests yield significant differences between the 10-msec and 200-msec distribution (p < .01, top left) and between the 50-dB 200-msec and 70-dB 200-msec distribution (p < .01, bottom right). The tests fail to become significant for all other panels

Discussion

The present results narrow down the critical duration for RT to the onset of a 1-kHz pure tone with a sound pressure level of 70 dB SPL to between 32 and 40 msec. This lies in the larger ranges already obtained by Raab (1962) and Ulrich et al. (1998). It is considerably shorter than the CD for loudness and underscores the requirement of a constant duration if RT is taken as a correlate of loudness, as Heil et al. (2006) pointed out. An explanation why the CDs for loudness and RT are different might be difficult. However, a line of reasoning is suggested by Heil et al., who propose that RT is not a direct correlate of loudness, but that underlying mechanisms are closely related.

When scrutinizing the CDFs of the intermediate durations (40 to 160 msec) with regard to a potentially shorter RT than for the longer durations (200 and 500 msec), some of the panels in the second and third row of Fig. 2 suggest a slight advantage in the region of the high percentiles or reaction times around 200 to 300 msec, respectively. This might indicate that the offset of the stimulus provides additional information to trigger the reaction. However, it must be kept in mind that this potential advantage of roughly 2 msec did not reach statistical significance in mean RT. Thus, the present experiment does not provide evidence for elevated reaction times at long stimulus durations clearly exceeding the critical duration, which occasional observations in the literature had suggested (Gregg and Brogden, 1950).

Experiment 2: Rise time and shape

When discussing the temporal characteristics of an auditory stimulus influencing RT, another important aspect other than duration is how exactly the sound is switched on. Even though most psychoacoustic experiments use a rise time in order to preserve the frequency characteristics of a tone and to avoid a click noise or spectral splatter, most models for RT completely ignore this fact by assuming an abrupt onset. In doing so, an interesting issue gets missed, whether RT is determined by detectors for intensity (or level, respectively) or change detectors, i.e. depending on the steepness of the rise. Although higher intensity is related to a steeper rise, the two types of detectors might be distinguishable based on the magnitude of stimulus change they respond to. For example, if the intensity eventually reached is kept constant and two linear rise times differing by a factor of 10 are used, the final intensity of course is the same and physical intensity levels integrated during any interval exceeding the longer rise time differ by less than 5 dB. The difference in steepness by a factor of 10, however, leads to an increase in amplitude by 20 dB if the shorter rise is extended in time to have the same duration as the longer one. Therefore a difference in RT generated by change detectors should be comparatively large and correspond to a 20-dB effect. Although a level detector is not necessarily an integrator of physical energy (it could refer to a neural correlate, for example, and might make up more than 5 dB), smaller differences in RT are indicators for a dominance of level detectors. The following experiment shall determine RT as a function of rise time and thus allow conclusions on the influence of the steepness or shape of the rise compared to the final intensity of the sound.
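The two predictions contrasted above can be checked with a short calculation. The sketch assumes a linear amplitude ramp followed by a constant plateau and integrates the squared envelope from onset; the rise times of 5 and 50 msec are taken from the conditions of the experiment, while the function name and the integration windows are illustrative.

```python
import math


def ramp_energy(rise_ms: float, window_ms: float) -> float:
    """Integral of the squared envelope (peak amplitude 1) from onset to window_ms.

    Linear ramp of length rise_ms followed by a plateau; window_ms >= rise_ms.
    """
    if window_ms < rise_ms:
        raise ValueError("window must not be shorter than the rise time")
    return rise_ms / 3.0 + (window_ms - rise_ms)


short_rise_ms, long_rise_ms = 5.0, 50.0

# Level detector integrating physical energy over the longer rise time.
energy_difference_db = 10.0 * math.log10(
    ramp_energy(short_rise_ms, long_rise_ms) / ramp_energy(long_rise_ms, long_rise_ms)
)

# The same comparison over a longer window of 100 msec.
energy_difference_100_db = 10.0 * math.log10(
    ramp_energy(short_rise_ms, 100.0) / ramp_energy(long_rise_ms, 100.0)
)

# Change detector: a tenfold steeper slope expressed as an amplitude ratio in dB.
slope_difference_db = 20.0 * math.log10(long_rise_ms / short_rise_ms)

print(f"energy: {energy_difference_db:.2f} dB, "
      f"energy (100 msec): {energy_difference_100_db:.2f} dB, "
      f"slope: {slope_difference_db:.0f} dB")
```

The energy difference is largest when the window equals the longer rise time and stays below 5 dB, shrinking for longer windows, whereas the difference in steepness corresponds to 20 dB.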

Stimuli and procedure

All stimuli were 1-kHz pure tones with a sound pressure level of 70 dB and a duration of 250 msec. This duration includes the rise time and a Gaussian fall time of 5 msec. The rise time was 2, 5, 10, 20 or 50 msec and two rise shapes, Gaussian and linear, were used. Each of these 10 conditions was presented 60 times per participant throughout the experiment. The trials were arranged in five blocks.

Detecting differences between the two shapes by listening to the sounds is rather difficult and may even be impossible for naive listeners. Nevertheless both shapes were included into the experiment because the Gauss curve starts out flatter, which could lead to longer RT.

Results

Geometric mean RTs are depicted in Fig. 3 as a function of rise time. Qualitatively, it can be seen that RT increases monotonically with rise time. This main effect of rise time is confirmed by a 5 × 2 (rise time × rise shape) within-subjects analysis of variance, F(4,76) = 9.94, p < .001, \({\eta ^{2}_{p}} = .344\). Neither the main effect of rise shape, F(1,19) = 1.07, p = .320, \({\eta ^{2}_{p}} = .053\), nor the interaction between rise time and shape, F(4,76) = 0.81, p = .520, \({\eta ^{2}_{p}} = .041\), is statistically significant. This allows us to collapse data across the two types of rise shape and to inspect their (geometric) means (connected diamonds in Fig. 3). They show a convex increase of RT on the logarithmic abscissa for rise times up to 20 msec or maybe the critical duration.
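Partial eta squared can be recovered from an F value and its degrees of freedom, which offers a quick consistency check of the reported effect sizes. The sketch uses the standard identity and the F values given above; since those are rounded, agreement is expected only to within about one unit in the third decimal.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


reported = {
    "rise time": (9.94, 4, 76, 0.344),
    "rise shape": (1.07, 1, 19, 0.053),
    "interaction": (0.81, 4, 76, 0.041),
}

recomputed = {
    name: partial_eta_squared(f, df1, df2)
    for name, (f, df1, df2, _) in reported.items()
}

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.3f}, reported {reported[name][3]:.3f}")
```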

Fig. 3

Geometric mean reaction time is shown as a function of rise time. Circles illustrate a Gaussian rise, squares a linear rise, diamonds the geometric mean between the two. Error bars indicate the standard error of the means

The difference in RT between rise times of 2 and 20 msec is 7 msec, and the difference in RT between rise times of 5 and 50 msec is 8 msec. Since both Experiment 1 and earlier measurements using our setup (Schlittenlacher et al. 2014) suggest a tradeoff between RT and SPL of around 0.8 msec per dB, the differences in RT for a tenfold increase in rise time correspond in magnitude to those obtained when changing level by roughly 10 dB.

Discussion

Experiment 2 demonstrated the influence of rise time on RT to the onset of a tone. Although the shapes of a linear and Gaussian rise differ, i.e. the Gaussian starts and ends more gradually but is steeper in its middle, we did not find a difference in RT between the two. Looking at any pair of data points in Fig. 3, their difference in RT on the ordinate is smaller than their difference in rise time. This is clear evidence for the unsurprising fact that the reaction is at least partly triggered before the rise time has elapsed and the final intensity is reached.

Regarding the initial question whether RT is caused by level detectors or change detectors, it can be excluded that RT depends solely on change detectors, because differences in RT are considerably smaller than would be expected if the steepness of the rise was their sole determinant. Level detectors can be of various nature. Obviously, the final level reached is not the crucial factor either as it was kept constant across all conditions. An integration of physical energy over 50 msec or more would lead to smaller differences than the experimental results and their conversion to decibels suggest. Note that the CD for RT is not much shorter than these 50 msec. However, it seems possible that RT is determined by a different kind of level detector or a combination of level and change detectors. One of the examined models (see section “Evaluation and development of models” and Fig. 8), which is a probabilistic account modelling RT as a function of intensity over time and thus uses level detectors, predicts RTs quite close to the present results.

Experiment 3: Amplitude-modulated sounds

In order to investigate the influence of intensity as a function of time in greater detail, a third experiment using amplitude-modulated sounds with several modulation frequencies and points of stimulus onset was conducted.

Stimuli and procedure

The stimuli were constructed by multiplying a 70-dB-SPL 1-kHz pure tone with a sinusoidal envelope which corresponds to a modulation degree of 0.999. Thus the level of the amplitude-modulated sounds varied between 10 and 70 dB SPL. Three different modulation frequencies (\(f_m\)) were used: 2, 4 and 8 Hz. The sound was started at four different points of the envelope, referred to as modulation phases (\(\varphi_m\)) of 0, 1/2 π, π and 3/2 π. 1/2 π stands for an onset at the maximum of the envelope, 3/2 π for an onset at its minimum. 0 and π are modulation phases which start 6 dB below the maximum, with the envelope increasing after onset for \(\varphi_m = 0\) and decreasing after onset for \(\varphi_m = \pi\) (see the upper abscissa in Fig. 4). The duration of the stimulus was 250 msec, implying it consisted of one period for \(f_m = 4\) Hz, two periods for \(f_m = 8\) Hz and half a period for \(f_m = 2\) Hz. A Gaussian rise and fall time of 5 msec preceded and followed the stimulus, resulting in a total stimulus duration of 260 msec. Appending the rise time in this way, instead of modifying the first few milliseconds, ensured that the rise was exactly the same for the three modulation frequencies at a given modulation phase. An exemplary stimulus (\(f_m = 4\) Hz, \(\varphi_m = 0\)) is shown in Fig. 4. A thirteenth condition with the maximum amplitude and no modulation was added as a control and reference.
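Some of the stimulus properties listed above can be reproduced with a small sketch. The envelope form used here, one plus the modulation degree times a sine, normalized to a peak of one, is an assumption about how the stimuli were generated; the level at the envelope minimum is sensitive to the exact definition of the modulation degree and is therefore not computed.

```python
import math

MODULATION_DEGREE = 0.999
PEAK_LEVEL_DB_SPL = 70.0
DURATION_S = 0.250  # modulated part, excluding rise and fall


def relative_envelope(phase_rad: float, m: float = MODULATION_DEGREE) -> float:
    """Envelope relative to its maximum (assumed form: 1 + m * sin(phase))."""
    return (1.0 + m * math.sin(phase_rad)) / (1.0 + m)


def onset_level_db_spl(phase_rad: float) -> float:
    """Level at the given modulation phase, relative to the 70-dB peak."""
    return PEAK_LEVEL_DB_SPL + 20.0 * math.log10(relative_envelope(phase_rad))


onset_levels = {
    "0": onset_level_db_spl(0.0),
    "1/2 pi": onset_level_db_spl(math.pi / 2.0),
    "pi": onset_level_db_spl(math.pi),
}

# Number of modulation periods within the 250-msec stimulus.
periods = {fm_hz: fm_hz * DURATION_S for fm_hz in (2, 4, 8)}

# A quarter of the modulation period at 8 Hz, in msec.
quarter_period_8hz_ms = 1000.0 / (4 * 8)

print(onset_levels, periods, quarter_period_8hz_ms)
```

With this envelope, onsets at phases 0 and π start about 6 dB below the maximum, i.e. at roughly 64 dB SPL, and a quarter period at 8 Hz lasts about 31 msec.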

Fig. 4

Example of a stimulus (\(f_m = 4\) Hz, \(\varphi_m = 0\)); its amplitude is shown as a function of time. The top abscissa shows the corresponding modulation phase. The modulation frequency of 4 Hz results in one period during the 250 msec duration, which is preceded and followed by a 5 msec rise and fall time. Accordingly, the 8-Hz stimuli consist of two periods, the 2-Hz stimuli of half a period

80 trials were collected for each condition and participant. They were conducted in eight blocks distributed over two sessions.

Results

Figure 5 shows the results as a function of modulation phase. The modulation frequencies are depicted by blue circles (2 Hz), red squares (4 Hz), black diamonds (8 Hz) or a purple triangle (no modulation). Typical standard errors of the mean are shown for the 8-Hz conditions; they amount to approximately 2 msec for the other conditions as well. Geometric mean reaction times range from 217 msec to 257 msec, with the longer RTs occurring at \(\varphi_m = \pi\) and in particular at \(\varphi_m = 3/2\,\pi\). The modulation frequencies cause a spread of RT, especially at these latter two modulation phases. Furthermore, \(f_m = 8\) Hz results in the longest RT among the three modulation frequencies for \(\varphi_m = \pi\) but the shortest for \(\varphi_m = 3/2\,\pi\), and the opposite occurs for \(f_m = 2\) Hz. All of these effects are confirmed by a 4 × 3 (\(\varphi_m\) × \(f_m\)) within-subjects analysis of variance, disregarding the control condition with no modulation. The main effect of modulation frequency is statistically significant, F(2,38) = 4.54, p < .05, \({\eta ^{2}_{p}} = .193\), as is the main effect of modulation phase, F(3,57) = 170, p < .001, \({\eta ^{2}_{p}} = .900\). The same is true for the interaction between the two, F(6,114) = 19.3, p < .001, \({\eta ^{2}_{p}} = .504\). Post-hoc paired t-tests using the Holm correction for p-values (\(p_{Holm}\)) were calculated based on the geometric means across trials for each combination of participant, modulation phase and modulation frequency, forming groups by either modulation frequency, modulation phase or the combination of the two. Applied to the four groups of modulation phases, the tests yield significant differences between all of them except for \(\varphi_m = 0\) compared to \(\varphi_m = 1/2\,\pi\), which shows a tendency, t(59) = 1.73, \(p_{Holm} = .090\). Comparing the three modulation frequencies the same way, the smallest p-value is obtained for 2 and 4 Hz, t(79) = 2.18, \(p_{Holm} = .097\).
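The Holm correction applied to the post-hoc tests can be sketched as follows. This is a generic implementation of the step-down procedure with hypothetical p-values, not the analysis code used for the present data.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Adjusted values must not decrease along the sorted sequence.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


# Hypothetical uncorrected p-values from three pairwise tests.
raw_p = [0.01, 0.04, 0.03]
adjusted_p = holm_adjust(raw_p)
print(adjusted_p)
```

Because adjusted values are capped at 1, a clearly non-significant comparison can end up with an adjusted p-value of exactly 1.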

Fig. 5

Geometric mean reaction times to amplitude-modulated 1-kHz pure tones. The abscissa shows the points of onset in terms of modulation phase. Blue circles indicate a modulation frequency of 2 Hz, red squares 4 Hz, black diamonds 8 Hz and error bars standard errors of the mean. The dashed curve depicts the envelope as a function of modulation frequency, not referring to the ordinate

At \(\varphi_m = 1/2\,\pi\), the shortest mean RTs are observed and differences among modulation-frequency conditions amount to less than 4 msec here. For this modulation phase, all sounds start at 70 dB SPL and decay in amplitude rather slowly, less than 6 dB in the first 31 msec (a quarter of the period for \(f_m = 8\) Hz). Although the RTs are sorted in the expected order as they lose energy, no significant difference between any of these conditions could be found using the pairwise paired t-tests for the twelve groups obtained by combining modulation frequency and phase. The same is true for \(\varphi_m = 0\): no significant differences can be seen for the RTs differing by less than 3 msec. For \(\varphi_m = \pi\), meaning that the sound starts at 64 dB SPL but decreases rapidly in level, RTs are ordered as would be expected by the amount of decrease in amplitude (\(RT_{8\,Hz} > RT_{4\,Hz} > RT_{2\,Hz}\)). Though the test between 2 and 4 Hz shows no significant difference here, t(19) = 1.18, \(p_{Holm} = 1\), t-tests between 2 and 8 Hz or 4 and 8 Hz both are statistically significant, \(p_{Holm} < .001\). The RTs for modulation frequencies starting at \(\varphi_m = 3/2\,\pi\) are shorter the faster sound pressure level increases after onset, all pairwise tests between them being significant.

Two further interesting data points for a direct comparison are the conditions for \(f_m = 8\) Hz with \(\varphi_m = \pi\) and \(\varphi_m = 3/2\,\pi\), which show hardly any difference in mean RT. The first starts at 64 dB SPL and reaches 10 dB SPL 31 msec later, whereas the second does just the opposite. When looking at the cumulative distribution functions (right panel in Fig. 6), the low percentiles show faster RTs for the condition which starts at the higher SPL. However, the two CDFs do not show a difference for the longer RTs and even an advantage for the other condition at the highest percentiles. All other conditions differ throughout most of the percentiles if they differ in mean RT, or they are hardly distinguishable from each other if they do not differ in mean RT.

Fig. 6

Cumulative distribution functions obtained through “Vincentizing” (Ratcliff, 1979) the data of Experiment 3. The left panel shows all conditions for a modulation frequency of 2 Hz, the center panel 4 Hz and the right panel 8 Hz. The dashed black lines show a modulation phase (φ_m) of 0, dotted black lines φ_m = π/2, solid red lines φ_m = π and blue solid lines with dots φ_m = 3π/2. φ_m = π/2 stands for an onset at the maximum of the envelope. Kolmogorov-Smirnov tests yield significant differences between the 3π/2 distribution and every other distribution in the left panel (all p < .001), and in the center panel between the 3π/2 and π/2 distributions (p < .001) and between the 3π/2 and 0 distributions (p < .001). All other Kolmogorov-Smirnov tests for the curves shown do not reach statistical significance.

Discussion

In the present experiment, the sound pressure level at onset was determined by the modulation phase, which turned out to play an important role for RT. Furthermore, when the modulation phase is fixed and modulation frequency is varied, plausible differences in RT can be observed. In this case, the modulation frequency determines how fast the intensity decreases or increases as a function of time. When SPL decreases more rapidly, RT is longer. When it increases faster due to a higher modulation frequency, RT is shorter. This ordering can be seen very well for modulation phases of π and 3π/2. Comparing the modulation phases of 0 and π provides further evidence that RT does not depend on the onset alone. Both have the same level at onset, 64 dB SPL, and the first five milliseconds during the rise time are identical. However, RT is longer for φ_m = π, because level decreases after onset whereas it increases for φ_m = 0.

For modulation phases of 0 and π/2, the modulation frequency does not have a significant effect. For these conditions, the stimulus starts at a rather high level and reaches lower levels more slowly. Even for a modulation frequency of 8 Hz and onset at the maximum amplitude, it takes 31 msec plus 5 msec rise time until the level drops by 6 dB. This duration of 36 msec falls within the range of the critical duration determined in Experiment 1 (32 to 40 msec), meaning that these stimuli do not differ much in intensity during the CD. The important role of the CD is underlined when comparing f_m = 8 Hz at φ_m = π and φ_m = 3π/2. RT is almost the same for the two although their intensities at onset differ enormously. However, during the 31 msec after rise time, intensity integrates to the same amount. Altogether, Experiment 3 provided a closer look at the role of the growth of intensity in the temporal envelope during the initial integration time critical for RT.
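The statement that intensity integrates to the same amount for the two 8-Hz conditions can be checked numerically. The following is only a sketch under assumptions that are not taken from the text: a sinusoidal envelope of the form 1 + depth · sin(2π f_m t + φ_m), a hypothetical modulation depth of 1, no rise time, and an integration window of exactly a quarter period (31.25 msec). The function name `integrated_intensity` is introduced here for illustration.

```python
import numpy as np

def integrated_intensity(phase, f_mod=8.0, depth=1.0, window_ms=31.25, n=2001):
    """Relative intensity integrated over the first window_ms of an assumed AM envelope."""
    t = np.linspace(0.0, window_ms / 1000.0, n)      # time axis in seconds
    # Assumed envelope shape (not specified in the text): 1 + depth * sin(...)
    envelope = 1.0 + depth * np.sin(2.0 * np.pi * f_mod * t + phase)
    intensity = envelope ** 2                        # intensity ~ squared amplitude
    return intensity.mean() * window_ms              # mean intensity times duration

falling = integrated_intensity(np.pi)        # starts at the carrier level, then decays
rising = integrated_intensity(1.5 * np.pi)   # starts at the minimum, then grows
print(falling, rising)
```

Under these assumptions the two envelopes are mirror images of each other within the quarter period, so the two integrals coincide regardless of the chosen depth.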

Evaluation and development of models

The present results show the importance of considering the initial buildup of stimulus strength or intensity as a function of time for simple reaction time. If models disregard this fact, they can only be valid for stationary sounds additionally having an abrupt onset, which is rarely true for sounds outside the laboratory. As there are plenty of models for reaction time, it seems reasonable to adapt an existing sophisticated model to the present domain. In the following, the parallel grains model (Miller & Ulrich, 2003) will be slightly extended to account for time-varying sounds based on the results of Experiment 1 and on data from the literature. The focus will be on (geometric) mean reaction time in order to compare model predictions to the simple concept of integrating intensity over time, as it is known from work on loudness. After developing these two alternatives, both will be evaluated by comparing their predictions to the results of Experiments 2 and 3 which used time-varying sounds.

Parallel grains RT model

The parallel grains model (PGM, J. Miller & Ulrich, 2003) is a probabilistic model which predicts RT as a function of intensity and duration. It consists of G parallel grains. Such a grain could be regarded as some kind of information entity travelling through a channel to the decision center. The time at which a single grain is activated depends on the intensity of the stimulus and is modelled as an exponentially distributed random variable with mean activation time μ_x. After activation, each grain is transmitted to the decision center, with the transmission time being another exponentially distributed random variable with mean μ_y, which is independent of the stimulus. A reaction is triggered when c grains have reached the decision center. It takes another M milliseconds until the reaction is recorded, M being a constant denoting the additional time required for the finger movement and key press. When fitting the model to experimental data, M is also the parameter which represents the dependence of mean RT on the set of participants and the particular apparatus used.
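The structure of the model as described above can be illustrated with a small Monte Carlo simulation. This is a minimal sketch for a stationary sound that lasts long enough for every grain to be activated, not the procedure used for the fits reported here; the function name `simulate_pgm_rt`, the number of trials and the seed are assumptions made for illustration.

```python
import numpy as np

def simulate_pgm_rt(mu_x, mu_y, G=32, c=2, M=165.0, n_trials=20000, seed=0):
    """Simulate simple RTs (msec) from the parallel grains model for a long stationary sound."""
    rng = np.random.default_rng(seed)
    # Activation time of each grain: exponential with mean mu_x (depends on intensity)
    activation = rng.exponential(mu_x, size=(n_trials, G))
    # Transmission time to the decision center: exponential with mean mu_y
    transmission = rng.exponential(mu_y, size=(n_trials, G))
    arrival = activation + transmission
    # A reaction is triggered when the c-th grain arrives (c-th smallest arrival time)
    decision = np.partition(arrival, c - 1, axis=1)[:, c - 1]
    return decision + M  # add the constant motor time M

rts = simulate_pgm_rt(mu_x=44.0, mu_y=180.0)
print(rts.mean())
```

Lowering `mu_x`, which corresponds to a more intense sound, shifts the simulated RT distribution towards shorter values, while `M` only adds a constant.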

Miller and Ulrich (2003) emphasize that the PGM is rather simple: a general framework applicable to all senses and suited to explaining several aspects of simple reaction time, but not covering every eventuality. However, extending the PGM to be suitable for time-varying sounds is straightforward. Compared to stationary sounds, time-varying sounds differ in intensity over time. The part of the PGM which depends on intensity is the activation time. Recall that it is an exponentially distributed random variable. This implies that its hazard function, the probability of activation within the next millisecond given that it has not happened yet, is constant with rate λ_x = 1/μ_x. If intensity is not stationary but varies over time, the hazard function should not be a function of a single intensity I, but rather of the intensity at time t, I(t). The only modification with respect to the original model is in the hazard function of the activation time: the probability that a grain is activated within the next millisecond at time t is determined by I(t).
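In continuous time, this modification can be written compactly. The notation below is a sketch introduced here for illustration (S(t) for the probability that a grain has not yet been activated, f_X(t) for the resulting activation density); the model itself is described in millisecond steps.

```latex
\lambda_x(t) = \frac{1}{\mu_x\bigl(I(t)\bigr)}, \qquad
S(t) = \exp\!\left(-\int_0^{t} \lambda_x(u)\,\mathrm{d}u\right), \qquad
f_X(t) = \lambda_x(t)\,S(t)
```

For a stationary sound, \lambda_x(t) = \lambda_x is constant and f_X(t) reduces to the exponential density \lambda_x e^{-\lambda_x t} of the original model.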

An algorithmic implementation of the modified model is straightforward. A density function for a grain’s activation is calculated recursively using the hazard function or rate λ_x(t), respectively, starting from the first millisecond. If the sound does not last infinitely long, the obtained function cannot integrate to 1 and thus is not a valid probability density function. However, integrating over it yields the activation probability for a grain. The obtained density function divided by the activation probability yields the probability density function for the activation time. For stationary sounds, the model does not change and makes the same predictions as before.
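The recursion described above might be sketched as follows. This is an illustration under assumptions, not the original implementation: time is discretized in 1-msec steps, the rate is taken as constant within each step, the per-step activation probability is computed as 1 − exp(−λ), and the function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(rate_per_ms):
    """Discrete activation density of one grain.

    rate_per_ms[k] is the hazard rate (1/msec) during millisecond k;
    it is 0 once the sound has ended.
    """
    rate = np.asarray(rate_per_ms, dtype=float)
    density = np.zeros_like(rate)
    survival = 1.0  # probability that the grain has not been activated yet
    for k, lam in enumerate(rate):
        # Assumption: P(activation within this msec | not yet activated) = 1 - exp(-lam)
        p_step = 1.0 - np.exp(-lam)
        density[k] = survival * p_step
        survival *= 1.0 - p_step
    p_activation = density.sum()  # < 1 for sounds of finite duration
    pdf = density / p_activation if p_activation > 0 else density
    return density, p_activation, pdf

# Example: a 40-msec stationary tone with mu_x = 44 msec, followed by silence
rate = np.concatenate([np.full(40, 1.0 / 44.0), np.zeros(160)])
density, p_activation, pdf = activation_density(rate)
print(p_activation)
```

For a rate that stays constant indefinitely, the recursion reproduces a (discretized) exponential distribution, in line with the original model.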

For a quantitative analysis of the model, it is necessary to determine its parameters, G, c, μ_x, μ_y and M. For modeling the results of Experiment 1, the original PGM can be used as the stimuli do not vary in intensity over time. As our primary focus is on mean reaction time, the twelve geometric means of the 70-dB-SPL conditions were fitted by the model. An exhaustive search varying G from 8 to 128 (two values per doubling), c from 1 to 16 (one value per doubling), μ_x from 10 to 50 msec (step size of 2 msec) and μ_y from 10 to 500 msec (step size of 10 msec up to 100, 20 msec up to 300 and 50 msec above) found a minimum root mean square error (RMSE) of 0.87 msec for G = 32, c = 2, μ_x = 44 msec, μ_y = 180 msec and M = 165 msec. M was set by calculation, not by a numerical approach, to minimize the RMSE for a given set of the other four parameters. Altogether, 22 combinations resulted in an RMSE of less than 1 msec. Nine of them showed a G of 32 and six a G of 23; 16 showed a c of 2. μ_x and μ_y varied throughout the whole range investigated. In the following, only the best fit is taken.
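The search grid and the analytic choice of M can be sketched as follows. The grid values are reconstructed from the description above; rounding the "two values per doubling" to the nearest integer is an assumption (it reproduces the values 23 and 32 mentioned in the text). Since M is a pure additive offset, the value minimizing the RMSE for fixed G, c, μ_x and μ_y is the mean difference between observed RTs and the predictions without M. The function name `best_motor_offset` is hypothetical.

```python
import numpy as np

# Parameter grids as described in the text (rounding of G is an assumption)
G_grid = np.round(8 * 2.0 ** (np.arange(9) / 2)).astype(int)   # 8 ... 128
c_grid = 2 ** np.arange(5)                                      # 1, 2, 4, 8, 16
mu_x_grid = np.arange(10, 51, 2)                                # msec
mu_y_grid = np.concatenate([np.arange(10, 101, 10),             # msec
                            np.arange(120, 301, 20),
                            np.arange(350, 501, 50)])

def best_motor_offset(observed_rt, predicted_without_m):
    """Return the offset M that minimizes the RMSE, and that minimal RMSE."""
    residual = np.asarray(observed_rt, float) - np.asarray(predicted_without_m, float)
    M = residual.mean()                      # least-squares solution for an additive constant
    rmse = np.sqrt(np.mean((residual - M) ** 2))
    return M, rmse

print(G_grid.tolist())
```

In a full search, `best_motor_offset` would be called once per combination of the four grids, with `predicted_without_m` holding the model's mean decision times for the twelve conditions.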

In order to determine μ_x as a function of intensity, further data on RT as a function of the sound pressure level of a stationary sound are needed. Chocholle (1944, Table 4) shows mean RT to the onset of a binaural 1-kHz pure tone for one participant, but over a range from -2 to 90 dB HL. The studies of Schlittenlacher et al. (2014) contain data for the same stimulus for 20 participants in 5-dB steps from 45 to 85 dB SPL. Figure 7 (left panel) shows further unpublished data from our laboratory. They are geometric mean RTs to a 1-kHz pure tone measured on 21 participants contributing 60 trials each. Thus, they provide further valuable data to determine the function linking intensity and reaction time for our setup. Although the data of Chocholle (1944) rely on one participant only and it is not exactly defined how to convert the reported HL to SPL, they were included because they are very much in line with our results and, above all, contain data for very low intensities. To allow a better comparison between the experiments, reaction times are normalized and shown as the difference relative to the RT at 70 dB SPL.

Fig. 7

Left panel: RTs to the onset of a 1-kHz pure tone as a function of sound pressure level. The ordinate shows the difference in RT with respect to that obtained for 70 dB. Red circles show the results of Chocholle (1944, one participant), blue squares results of Schlittenlacher et al. (2014, 20 participants), black triangles data from our laboratory for 21 further participants. The solid line marks the mean between the three experiments. Right panel: Fit of μ_x to the parallel grains model (Miller & Ulrich, 2003) with parameters c = 2, G = 32, μ_x = 44 msec for SPL = 70 dB, μ_y = 180 msec, duration >> critical duration.

The data for SPLs greater than 40 dB could be approximated by a straight line, and the entire set could be fitted by functions proposed by Piéron (1920) or Wagner et al. (2004). To avoid a discussion about the best-fitting function, a rather conservative approach is chosen by simply interpolating linearly. First, the mean across the experiments is calculated for each SPL given. Subsequently, values at SPLs in between are interpolated linearly.

Predicting the mean results of the left panel of Fig. 7 determines μ_x as a function of intensity for the given set of parameters, G = 32, c = 2, μ_y = 180 msec, and the boundary condition that μ_x = 44 msec for an SPL of 70 dB. This mapping of intensity or SPL to μ_x is shown in the right panel of Fig. 7. These parameters will be used for the modified PGM, whereby I(t) determines λ_x(t) = 1/μ_x(t) and the only parameter to be estimated for each experiment is M. Changing M only accounts for differences between groups of participants or experimental setups; it does not affect the differences between the conditions of a single experiment, which we are interested in.

Linking RT to modelling of loudness

The concept of using L_AE as an indicator of temporal integration of loudness (Namba et al. 2008) simply means that intensity integrated over time correlates with loudness. In the case of reaction time, it should be a negative correlation, with more intense sounds causing shorter RT. Experiment 1 has shown that an integration over several hundred milliseconds might be inappropriate for RT as a sound with a duration of 40 msec produces the same RT as longer sounds. However, L_eq,40msec, i.e. the intensity integrated over the first 40 msec of the sound, could be a correlate of reaction time. If not only a rank order is to be predicted, but also a difference in RT between two conditions, the decibels of L_eq,40msec may be converted to RT using the left panel of Fig. 7.
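Computing such an equivalent level amounts to averaging intensity, not decibels, over the window. The sketch below assumes a level-versus-time track sampled once per millisecond and treats any part of the 40-msec window not covered by the sound as silence; both choices, as well as the function name `leq_db`, are assumptions made for illustration. The conversion to RT via Fig. 7 is not included.

```python
import numpy as np

def leq_db(level_db, window_ms=40, step_ms=1.0):
    """Equivalent level (dB) over the first window_ms of a level track sampled every step_ms."""
    n = int(round(window_ms / step_ms))
    levels = np.full(n, -np.inf)                  # -inf dB stands for silence (zero intensity)
    track = np.asarray(level_db, dtype=float)[:n]
    levels[:len(track)] = track
    intensity = 10.0 ** (levels / 10.0)           # relative intensity for each sample
    return 10.0 * np.log10(intensity.mean())      # average intensity, then back to dB

print(leq_db(np.full(100, 70.0)))   # stationary tone that outlasts the window
print(leq_db(np.full(20, 70.0)))    # tone that fills only half of the window
```

A tone that fills only half of the window yields an equivalent level about 3 dB below that of a tone covering the whole window, because its integrated intensity is halved.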

When interpreting the results, it must be noted that the L_eq,40msec is rather simple and that more complex models (e.g. Chalupper & Fastl, 2002; Glasberg & Moore, 2002) might make more accurate predictions. However, all of them, including the L_eq, predict a gain of about 10 phon when duration is increased by a factor of ten. Furthermore, and in contrast to the more complex models, the concept of L_eq can be adapted very easily to the critical duration of RT, 40 msec.

Evaluation of the models

The parameters of the modified PGM and L_eq,40msec were determined from Experiment 1, which varies duration but not level over time, and several other experiments linking the level of a stationary 1-kHz pure tone to RT. Thus they are independent of the results of Experiments 2 and 3. The data from these two experiments, using time-varying sounds, will be used for evaluating the two concepts.

Figure 8 shows the predictions made by the two models for Experiment 2 compared to the actual data. The modified PGM (left panel) predicts RT as a function of rise time rather well; the RMSE between predictions and measured RTs is 1.6 msec. This amount might not be negligible for an effect size of about 10 msec. However, it is smaller than the standard error of the mean of the RTs measured. Furthermore, the Gauss curve starts out flatter than a linear rise, which leads to a prediction of slightly longer RTs, while no significant difference was found in the experimental data. However, as can be seen in Fig. 8 or from the RMSE, the predicted discrepancy is not very large.

Fig. 8

Model predictions for Experiment 2. Left panel: Predictions made by the PGM. The abscissa shows the rise time, the ordinate reaction time. Red circles indicate the predictions for a Gaussian rise, blue squares for a linear rise, red downwards triangles experimental results for a Gaussian rise, blue triangles pointing upwards experimental results for a linear rise. Right panel: Black triangles indicate the geometric mean of the two rise shapes for the experimental results. For the predictions made by L_eq,40msec (red circles for Gaussian rise and blue squares for linear rise), the left ordinate shows the intensity in dB SPL. The right ordinate indicates a conversion of these SPLs to RT increments using the data from the left panel of Fig. 7.

Looking at the predictions made by the L_eq,40msec (right panel), there are several discrepancies. L_eq,40msec differs by less than 2 dB for rise times between 2 and 20 msec. However, the data show a significant increment in reaction time. Furthermore, there are discrepancies when converting SPLs to RT. In general, the L_eq,40msec greatly underestimates the slowing of RT with longer rise times, resulting in an RMSE of 4.1 msec. The difference is especially large for a rise time of 20 msec, where the prediction deviates from the actual mean RT by 7 msec. The right axis is somewhat irregularly spaced because of the simple interpolation used to convert SPL to RT. However, the difference of 7 msec would be the same when using a linear regression. This departure is large when considering the range of RTs, which is just 10 msec. Expressed in decibels, the difference between prediction and measurement for a rise time of 20 msec amounts to 10 dB. Choosing a different duration for the integration of intensity would not be much more successful. In order to predict significantly different equivalent levels for rise times of 2, 5, 10 and 20 msec, as demanded by the experimental results, the time for integration would have to be smaller than 5 msec. That is far away from the critical duration.

Predictions for Experiment 3 are shown in Fig. 9. The modified PGM (left panel) predicts the results with an RMSE of 3.0 msec. Relative to the range of experimental results, roughly 40 msec, the RMSE is less than 10%. Furthermore, the pattern of the predictions is virtually the same as that of the actual data, the only difference being a somewhat narrower spread.

Fig. 9

Model predictions for Experiment 3. Left panel: Predictions made by the PGM. The abscissa shows the modulation phase, with π/2 being an onset at the maximum of the envelope. The ordinate shows the reaction time. Blue circles represent a modulation frequency of 2 Hz, red squares 4 Hz, black diamonds 8 Hz and triangles the experimental results (2 Hz: blue downwards; 4 Hz: red rightwards; 8 Hz: black upwards). Right panel: Predictions made by L_eq,40msec are shown. The left ordinate depicts the intensity in dB SPL. The right ordinate indicates a conversion of these SPLs to RT using the left panel of Fig. 7. The coding of colors and symbols is equivalent to the left panel.

The model derived from loudness, L_eq,40msec (right panel), also predicts the effects of modulation phase, modulation frequency and their interaction rather well. There is hardly any difference in L_eq,40msec between the conditions with a modulation phase of 0 or π/2, as was the case for their mean RTs in the experiment. The ordering of the L_eq,40msec predictions also agrees with that of the data for the other modulation phases. The spread between the conditions is less pronounced in the L_eq,40msec predictions, leading to an RMSE of 7.3 msec when converted to RT. However, L_eq,40msec qualitatively predicts the results of Experiment 3 quite well.

General Discussion

The present work consists of three experiments investigating the influence of a stimulus’ temporal characteristics on reaction time to its onset. Experiment 1 pinpoints the critical duration for a 1-kHz pure tone with a level of 70 dB SPL to fall somewhere between 32 and 40 msec. Raab (1962) found the CD to depend on SPL and to be slightly longer for lower SPLs. Furthermore, his results suggest the CD to be shorter than 50 msec at both 60 and 40 dB SPL. Combining his conclusions with ours, the CD at 70 dB SPL is supposed to be longer than 32 msec and the CD at 40 dB SPL is still rather close to this value, i.e. not longer than 50 msec.

Varying the rise time in Experiment 2 highlighted that the steepness of the rise alone cannot explain RT. Since RT definitely also depends on the rise part, the entire distribution of intensity over time during the CD has an effect on RT. One might also want to compare the effects of rise time and duration. Increasing rise time from 2 to 20 msec increased RT by about 7 msec. The effect of duration is considerably larger, with a decrease in RT of about 20 msec when duration is increased from 10 to 20 msec.

Experiment 3 went a step further and employed amplitude-modulated sounds to study the role of temporal envelope changes in the first few milliseconds after onset. Despite having the same rise in the first five milliseconds, these stimuli produced different RTs to their onsets because of the temporal variations immediately afterwards. Both intensity close to the onset and its envelope during the critical duration seem to have an effect on RT.

One possibility to explain these effects is to base reaction time on the integration of intensity, as is known from work on loudness (Namba et al. 2008). This was done successfully for Experiment 3. However, integrated intensity alone cannot explain RTs quantitatively as a function of rise time. This cannot be fixed by changing the duration for the integration. A similar problem occurs for Experiment 1, when comparing the RT of the 50-dB tone to the curve for the 70-dB tones. The difference of 20 dB corresponds to a hundredth of the intensity. However, in order to obtain the same reaction time as in the 50-dB condition, the critical duration of a 70-dB tone must not be divided by 100 but rather by 2 to 3 (see Fig. 1). If the duration of a stimulus was defined excluding rise and fall time, this ratio still would not exceed 10. A reason for the difficulty of explaining RT by temporal integration of intensity could be that the integration gives the same weight to each part of the sound within its duration. For RT, though, earlier segments might have a higher weight.

By contrast, the parallel grains model (Miller and Ulrich, 2003) predicted the experimental results quite well. It was slightly extended in a straightforward manner in order to be able to deal with time-varying sounds. That was accomplished by making the activation rate λ_x a function of time, λ_x(t), through making it depend on I(t) instead of I. Because the hazard function depends on intensity over time, the probability density function inherently gives higher weight to intensities closer to the beginning of the sound. Experiment 1 was used to estimate the parameters of the model. When comparing these parameters to a fit made by Miller and Ulrich (2003) for the data of Raab (1962), they are quite similar: c equals 2 in both cases, G is of the same order of magnitude (32 compared to 20), and μ_x differs but is also in the same range (44 msec in the present work for 70 dB SPL compared to 46 msec at 60 dB HL). Only the mean transmission time μ_y (180 compared to 21 msec) and the motor component M (165 compared to 101 msec) are significantly longer in the present estimation, maybe representing a difference between well-trained and rather naive participants or differences in the procedure, e.g. the type of randomization.

When using the parameters for predicting the results of Experiments 2 and 3, the predictions of the modified PGM come close to the RTs measured. This is remarkable because the model was not fit using these data. Regarding the stimuli and results of Experiment 3, it is important that the PGM gives higher weight to the earlier segments. Doing so implies that ramped sounds having rising intensity should evoke longer RTs than damped sounds having falling intensity. However, loudness studies show that ramped sounds are perceived louder than damped sounds (e.g. Ries, Schlauch, & DiGiovanni, 2008). This contrast underscores that RT might not be an unconditional correlate of loudness.

To sum up, the present set of experiments investigated reaction times to sounds varying in level during the first few milliseconds after onset. The effects observed are difficult to explain by what is known about temporal integration of loudness. It appears that, in order for RT to be a correlate of loudness, all stimuli must have the same normalized envelope (see Heil et al., 2006). When using stationary sounds which do not contain very low frequencies, and when keeping the rise time constant, this is not a problem. RT to the time-varying sounds studied in the present experiments, however, can be explained by an extended probabilistic race model with many parallel channels, the parallel grains model of Miller and Ulrich (2003). The extension inherently implements a kind of weighting for the input intensities as a function of time, being higher the closer they are to the onset. The model correctly predicts the critical duration, determined to be between 32 and 40 msec in Experiment 1, accounts for slightly longer RTs in the case of flatter rises after onset (Experiment 2) and, above all, it can predict the effects of arbitrary distributions of intensity over time (Experiment 3) quite accurately.