Introduction

While most psychoacoustical research has been primarily concerned with the perception of stationary signals, a number of studies in the past decade have started investigating more complex, time-varying stimuli (e.g., Chi, Ru, & Shamma, 2005; Lu, Liang, & Wang, 2001; Neuhoff, 1998; Oberfeld, Heeren, Rennies, & Verhey, 2012). Undoubtedly, time variation is essential to completely understand the perception of environmental or musical sounds, for which spectral and energetic characteristics constantly change over time. Processing such changes involves particular attentional and cognitive mechanisms, and recruits specific neural circuits. As pointed out recently by Schutz and Vaisberg (2014), although too often neglected in the past, a sound’s temporal shape plays a crucial role in its perception. In particular, perceptual differences in loudness, timbre, or perceived duration have been observed between tones that have identical energy and long-term spectrum, but opposite temporal profiles. Because they occur between temporally asymmetric stimuli (the term “temporal asymmetry” was used by Patterson, 1994, to describe sounds with different attack and decay times), these perceptual differences have often been referred to as perceptual “asymmetries” (e.g., see Grassi & Pavan, 2012; Meunier, Vannier, Chatron, & Susini, 2014; Ries, Schlauch, & DiGiovanni, 2008; Susini, McAdams, & Smith, 2007). This term is therefore also used throughout the rest of this paper.

The present work is specifically concerned with asymmetries in global loudness. Listeners asked to make global loudness judgments of rising- and falling-intensity sounds of a few seconds, i.e. “to evaluate the overall loudness of the sound over its entire duration,” give greater estimates to rising sounds compared to their time-reversed versions, falling sounds (Ponsot, Susini, Saint Pierre, & Meunier, 2013; Susini et al., 2007). The causes and mechanisms underlying these asymmetries remain unclear.

Most previous studies on auditory perceptual asymmetries have relied on short sounds (from 10–250 ms) with temporally asymmetric amplitude envelopes. This concerns the domains of physiology (e.g., Neuert, Pressnitzer, Patterson, & Winter, 2001), psychophysics (e.g., Meunier et al., 2014; Schlauch, Ries, & DiGiovanni, 2001; Stecker & Hafter, 2000) as well as neurosciences (Wang, Qin, Chimoto, Tazunoki, & Sato, 2014). Temporally asymmetric stimuli were used to model the amplitude envelope of environmental and musical sounds. Percussive-like, also called damped, stimuli have been modeled using envelopes with fast attacks and slow decays. Similarly, bowing-like, also called ramped, stimuli (with slow attacks and fast decays) were mimicked using temporally reversed versions of the percussive-like sound envelopes. Psychophysical studies have shown strong perceptual asymmetries between these two types of stimuli for timbre (Irino & Patterson, 1996), perceived duration (e.g., Meunier et al., 2014; Schlauch et al., 2001), and loudness (e.g., Stecker & Hafter, 2000). Physiological measures have also supported these psychological findings. Investigating the response characteristics of primary auditory cortex neurons to ramped and damped 80-ms stimuli in awake cats, Wang et al. (2014) showed that distinct cells were involved in the processing of ramped and damped sounds, and that cells activated by ramped stimuli had longer, more persistent responses than those stimulated by damped stimuli. That ramped stimuli are associated with a relative “persistence of excitation” was taken to corroborate the theory that “persistence of perception” at the behavioral level (DiGiovanni & Schlauch, 2007; Ries et al., 2008) explains why ramped sounds are perceived as longer and louder than damped sounds. Cognitive factors were also found to play a role in these perceptual asymmetries between short sounds. 
Stecker and Hafter (2000) proposed the existence of a “decay suppression” mechanism, whereby listeners would ignore the decay part of damped stimuli because it represents reverberation from the acoustical environment, resulting in sounds judged softer than ramped stimuli. The same authors, observing that the loudness asymmetry between 250-ms ramped and damped sounds was significantly reduced and even removed when ramped stimuli were presented as priors, proposed that this “local context” effect was based on a cognitive mechanism that led subjects to process the sounds in a particular representation mode. Damped prior stimuli would lead the subjects to process in a “constancy mode” and to eliminate the decay portion (reverberation part) of the following stimulus. Later, DiGiovanni and Schlauch (2007) demonstrated that the size of the asymmetry in perceived duration between ramped and damped stimuli was modulated by the instructions given to the participants. When the participants were explicitly asked to consider all aspects of the sounds, and not to ignore decay portions, the asymmetry was found to be significantly reduced. These results demonstrate that cognitive aspects of sound evaluation can modulate the perceptual asymmetries that occur with temporally asymmetric sounds of short duration.

Although less documented, the perception of non-stationary sounds of longer durations has also been explored, in particular with rising- and falling-intensity sounds lasting a few seconds. While asymmetries in perceived duration appear to fade out after 1 s (Grassi & Darwin, 2006; Grassi & Pavan, 2012; Meunier et al., 2014), asymmetries in loudness are still observed with longer sounds. In particular, Neuhoff (1998) observed that 1.8-s rising-intensity tones were perceived with a greater loudness change than falling-intensity tones. He argued that the overestimation of rising sound sources would provide a “selective advantage” to an organism’s preparation for approaching objects, characterized by their rising-intensity profiles, compared to receding objects that produce falling-intensity profiles. This effect was also demonstrated with tone, chord, and vowel stimuli lasting 1.8 and 3.6 s (Olsen, Stevens, & Tardieu, 2010; Olsen & Stevens, 2010). Neuroscience and psychophysiological experiments on humans showed that rising-intensity tonal stimuli from 750 ms to 2 s produce neural activity in the amygdala, recruit specific attentional and physiological resources and induce higher emotional ratings compared to falling-intensity sounds (Bach et al., 2008; Bach, Neuhoff, Perrig, & Seifritz, 2009; Seifritz et al., 2002; Tajadura-Jiménez, Väljamäe, Asutay, & Västfjäll, 2010). Similar results are observed in non-human primates, for whom behavioral and neural differences are also found between the two types of stimuli (Ghazanfar, Neuhoff, & Logothetis, 2002; Maier & Ghazanfar, 2007). This bias in primates is considered adaptive, serving as a warning cue to process looming sound sources with higher priority. Furthermore, recent behavioral studies reported a perceptual asymmetry between 2-s rising- and falling-intensity tones when participants were instructed to evaluate the global loudness of these sounds (Ponsot et al., 2013; Susini et al., 2007).
Although less studied than the loudness change, the global loudness is of particular interest in many fields of applications concerned with loudness measurement, e.g., loudness normalization of audio data in the media (Glasberg & Moore, 2002) or loudness assessment of vehicle and road-traffic noise in industry (e.g. Kaczmarek & Preis, 2010). However, the magnitude and robustness of the global loudness asymmetry between rising- and falling-intensity sounds of a few seconds have never been systematically assessed. In particular, it remains unclear how much this asymmetry could rely on contextual factors, as previously demonstrated for shorter duration stimuli.

The present study was designed for that purpose. The magnitude of the asymmetry was estimated from loudness measurements of rising- and falling-intensity sounds obtained in several experimental contexts and using different psychophysical methods. We examined both “global context” effects, by manipulating the type of stimuli presented within an experimental block, and “local context” effects, by manipulating the type of stimuli that immediately precedes a sound. In addition, we measured asymmetries for ramps presented at different intensity regions, to explore potential level-dependent effects.

The “global context” of stimuli presentation was rarely considered as a factor in previous studies. In most of them, measuring either the loudness change or the global loudness of rising- and falling-intensity sounds, the two types of sounds were either presented in a random order in one-interval paradigms (Neuhoff, 1998; Susini et al., 2007; Teghtsoonian, Teghtsoonian, & Canévet, 2005) or compared one with another in two-interval paradigms (Neuhoff, 2001; Olsen et al., 2010; Olsen & Stevens, 2010; Ponsot et al., 2013). However, it is now recognized that the range, the spacing, and the distribution of a set of stimuli that only vary in intensity influence the loudness evaluation of each stimulus (Arieh & Marks, 2011; Marks, 1993). Thus, the way in which different types of stimuli are presented plausibly also affects their evaluation. In a recent study investigating the perceived duration of damped and flat-intensity stimuli, Vallet, Shore, and Schutz (2014) showed that the strategies underlying duration estimation for the two types of sounds can be modulated by the “blocking procedure,” namely whether the two types of stimuli are presented in “separate” blocks or “mixed” within a same block. In “separate” presentations where each type of stimulus is presented in a specific block, participants can guess the type of the stimulus that follows with 100 % certainty, and thus can switch from one evaluation strategy to another every time a new block starts. On the contrary, in “mixed” presentations where stimuli types are randomly interleaved, participants cannot predict the type of the following stimulus. In this case, listeners seem to adopt a common, low-cost evaluation strategy for both profiles (Vallet et al., 2014). In addition, the magnitude of the asymmetry in perceived duration measured between ramped and damped stimuli in Grassi and Pavan (2012) was found to be larger in a “mixed” context compared to a “separate” context. 
The authors argued that the “mixed context” emphasizes the perceptual distinctiveness of the stimuli, resulting in larger perceptual asymmetries (Grassi & Pavan, 2012, p. 1329). We can thus reasonably assume that a similar phenomenon might impact the loudness asymmetry between rising- and falling-intensity sounds. This hypothesis was tested in a first experiment using a magnitude estimation procedure (Experiment 1): asymmetries in loudness between rising and falling stimuli were compared between two “global contexts,” using magnitude estimates obtained for two groups of participants, performing the experiment in either a “mixed” or in a “separate” context of presentation.

Besides “global context,” “local context” effects on the loudness asymmetry have also rarely been considered. However, several such “sequential” effects have been documented for constant-intensity stimuli (for a review, see Arieh & Marks, 2011), and two of them are worth mentioning. First, with “loudness enhancement,” the loudness of a tone is increased when a preceding tone is presented within a short temporal window under certain level conditions (for details, see Oberfeld, 2007). Second, with “Induced Loudness Reduction” (ILR), the opposite effect is obtained when the preceding tone is relatively loud (Epstein, 2007). Although they support the view that the estimation of a stimulus not only reflects its perception at a given time but also depends on the acoustical characteristics of previous stimuli (Lockhead, 1992), such “local context” effects have received little attention regarding non-stationary sounds. With temporally asymmetric sounds, the mechanism of decay suppression (Ries et al., 2008; Stecker & Hafter, 2000) has only been observed for short-duration sounds. The influence of prior stimulus direction remains to be explored with longer sounds and no specific assumption can be made concerning the potential mechanisms that could modulate their loudness perception and evaluation. The influence of “local context” was thus explored in Experiment 1 by considering the influence of the prior stimulus direction on the loudness estimates given by the “mixed context” group. In addition, a second experiment was conducted (Experiment 2) to examine whether the type of comparison stimulus, in a two-interval paradigm, affects the loudness asymmetry between rising- and falling-intensity tones. This was done by comparing the asymmetry assessed with two groups of participants who were asked (i) to match the loudness of rising- and falling-intensity sounds presented in the same pair (Exp. 2A), or (ii) to match the loudness of each ramp relative to constant-intensity tones (Exp. 2B). Note that this latter condition did not allow subjects to make any direct opposition when comparing the two “objects” (rising vs. falling), contrary to what was done in previous studies. If the global loudness judgments of the ramps inferred from these different matching configurations provide similar asymmetries, it would indicate that listeners use a common and robust loudness evaluation strategy. Moreover, by ruling out simple explanations of the effect based on judgment bias only, this would suggest that rising and falling tones contain an intrinsic characteristic, which results in direction-specific loudness judgments.

Finally, the robustness of the loudness asymmetry is discussed with respect to the quantitative psychophysical estimates obtained in the present experiments (Exps. 1, 2A and 2B) and the magnitude of the effect is compared with predictions from current loudness models.

Experiment 1

In this experiment, the loudness of 2-s rising- and falling-intensity tones with 15-dB dynamics (changing in level over 15 dB) was measured in a one-interval paradigm using an absolute magnitude estimation procedure (AME). The influence of “global context” was examined using a between-subject experimental design. In one group of participants (the “mixed context” group), rising and falling ramps were interleaved and presented in the same blocks; in the other group (the “separate context” group), they were presented in two distinct blocks. The influence of “local context” was examined as a within-subject factor in the “mixed context” group only, by considering the influence of prior stimuli direction (rising/falling) on loudness estimates.

Method

Participants

A total of thirty-two participants (aged between 20 and 34 years) were recruited for this experiment. They were assigned to one of the two groups (N = 16 and sex ratio = 1 in each group), which performed the experiment under different conditions, as described below. All reported normal hearing. The participants were paid for their participation. They were all naïve with respect to the hypotheses being tested.

Apparatus

The stimuli were generated at a sampling rate of 44.1 kHz with 16-bit resolution using Matlab. Sounds were converted using an RME Fireface 800 soundcard, amplified using a Lake People G-95 Phoneamp amplifier and presented diotically through headphones (Beyerdynamic DT 770 PRO). The levels were calibrated using a Brüel & Kjær artificial ear (type 4153) coupled with the mounting plate provided for circumaural headphones. Each participant was tested in a double-walled IAC sound-insulated booth.

Stimuli

All the stimuli were 1-kHz pure tones with 10-ms linear rise and fall times (on amplitude). Ten constant-intensity tones of 500 ms (between 45 and 90 dB SPL) were used to measure the individual loudness functions. Otherwise, linear (in decibels) rising- and falling-intensity ramps with 15-dB dynamics and 2-s duration were employed in the experiment. These ramps were presented at seven different intensity regions: [45–60], [50–65], [55–70], [60–75], [65–80], [70–85] and [75–90 dB SPL] (in what follows, the ramps varying, e.g., in the [45–60 dB SPL] region are simply referred to as 60-dB SPL ramps).
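As an illustration of the stimulus description above, the following Python sketch builds the amplitude envelope and waveform of one ramp. It is not the authors' Matlab code: the function names (ramp_envelope, make_ramp), the digital convention that a peak amplitude of 1 corresponds to the maximum level of the ramp, and the exact shape of the linear gates are assumptions made here. Absolute calibration in dB SPL depends on the playback chain and is not modeled.

```python
import math

FS = 44100        # sampling rate in Hz (from the Apparatus section)
DURATION_S = 2.0  # ramp duration in seconds
GATE_S = 0.010    # linear rise/fall time applied to the amplitude
FREQ_HZ = 1000.0  # pure-tone frequency in Hz


def ramp_envelope(start_db, end_db, n, n_gate):
    """Amplitude envelope of a ramp whose level changes linearly in dB.

    Amplitudes are expressed relative to the louder endpoint (assumption),
    so the envelope never exceeds 1.
    """
    peak_db = max(start_db, end_db)
    envelope = []
    for i in range(n):
        # Level interpolated linearly (in dB) between the two endpoints.
        level_db = start_db + (end_db - start_db) * i / (n - 1)
        amplitude = 10.0 ** ((level_db - peak_db) / 20.0)
        # Linear onset and offset gates on the amplitude.
        gate = min(1.0, (i + 1) / n_gate, (n - i) / n_gate)
        envelope.append(amplitude * gate)
    return envelope


def make_ramp(start_db, end_db):
    """Samples of a 1-kHz tone shaped by the ramp envelope."""
    n = int(round(FS * DURATION_S))
    n_gate = int(round(FS * GATE_S))
    envelope = ramp_envelope(start_db, end_db, n, n_gate)
    return [a * math.sin(2.0 * math.pi * FREQ_HZ * i / FS)
            for i, a in enumerate(envelope)]


# The [45-60 dB SPL] region: a rising ramp and its falling counterpart.
rising = make_ramp(45.0, 60.0)
falling = make_ramp(60.0, 45.0)
```

In this sketch the falling envelope is the mirror image of the rising one, which matches the description of falling sounds as time-reversed versions of rising sounds.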

Procedure

The Absolute Magnitude Estimation (AME) procedure was used for this experiment, based on the instructions of Hellman (1982). No standard was given to the participants, whose task was simply to give a number proportional to the global loudness of each sound regardless of the numbers assigned to previous stimuli. We asked participants to judge the global loudness of the sounds, as in previous studies (Ponsot et al., 2013; Susini et al., 2007; Susini, Meunier, Trapeau, & Chatron, 2010), defined as “the loudness of the sound over its entire duration.” For each group of participants, the experiment was divided into two similar sessions scheduled on different days. Each session started with the measurement of the loudness function. As in Canévet, Teghtsoonian, and Teghtsoonian (2003), the ten constant-intensity tones were presented in a “quasi-random” order to reduce assimilation effects, as proposed by Cross (1973). These tones were evaluated 18 times each in a session. The loudness function measurement was followed by the main experiment, which was composed of the 14 rising and falling ramps presented 13 times each. For one group of participants (the “separate context” group), rising and falling ramps were presented in two distinct blocks whereas for the other group (the “mixed context” group), rising and falling ramps were randomly mixed in the same experimental block. Thus, each participant performed the magnitude estimation task in the same global context (“mixed” or “separate”) over the two experimental sessions. A between-subject design was adopted in order to avoid any “asymmetrical transfer effect” between the two conditions (see Poulton & Freeman, 1966; Poulton 1982). In each block, the ramps were presented in random order. For the “separate context” group, the order of presentation of the two blocks (rising/falling) was counterbalanced between subjects. 
Before each session, participants were instructed not to change their scale between the preliminary measurement of the loudness function, where they rated the loudness of constant-intensity tones, and the following blocks that contained rising and falling-intensity sounds. Subjects became familiar with the procedure by giving their estimates for 20 constant-intensity sounds (between 45 and 90 dB SPL) before the beginning of each session, which lasted approximately 1 hour.

Normalization and data analysis

For each subject, the average perceived loudness of each stimulus was computed using the geometric mean of the ratings. Mean loudness estimates were then normalized individually for each session so that all estimates fell within the same range and could be expressed in loudness units (sones). All the ratings given by a subject to constant tones and ramps in a daily session were divided by the mean rating assigned to the 50-dB SPL constant-intensity tone and multiplied by two, to match the loudness of a 50-dB SPL, 1-kHz pure tone, which corresponds to two sones (see Susini et al., 2010). Power loudness functions were then fitted to the constant-intensity sound estimates to evaluate the exponent of the loudness function of each subject in each session.
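The normalization described above can be sketched as follows in Python. This is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the ratings are synthetic, the ten tone levels are assumed to be spaced 5 dB apart, and the power function is fitted by ordinary least squares on log-transformed loudness (the fitting method is not specified in the text). The exponent is defined with respect to sound pressure, so that log10(S) grows by e × L / 20.

```python
import math


def geometric_mean(ratings):
    """Geometric mean of strictly positive magnitude estimates."""
    return math.exp(sum(math.log(r) for r in ratings) / len(ratings))


def normalize_to_sones(mean_ratings, ref_level=50, ref_sones=2.0):
    """Rescale ratings so the 50-dB SPL tone maps to two sones.

    mean_ratings maps a level in dB SPL to a geometric-mean rating.
    """
    scale = ref_sones / mean_ratings[ref_level]
    return {level: rating * scale for level, rating in mean_ratings.items()}


def fit_loudness_exponent(sones_by_level):
    """Slope of log10(loudness) against level / 20 (least squares).

    Least-squares fitting in the log domain is an assumption of this sketch.
    """
    xs = [level / 20.0 for level in sones_by_level]
    ys = [math.log10(s) for s in sones_by_level.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic listener (hypothetical data): arbitrary rating scale,
# true exponent of 0.6, ten tones from 45 to 90 dB SPL.
raw = {level: 3.0 * 10.0 ** (0.6 * (level - 50) / 20.0)
       for level in range(45, 95, 5)}
sones = normalize_to_sones(raw)
exponent = fit_loudness_exponent(sones)
```

Because the normalization is a multiplicative rescaling, it changes the intercept of the fitted function but leaves the exponent unchanged.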

A mixed-ANOVA [1-between (Context) and 3-within (Direction × Session × Level) factors] using a univariate approach was performed on the logarithm of the normalized loudness ratings accorded to rising- and falling-intensity stimuli. Since no significant differences between experimental sessions and no interactions with other experimental factors were found (p > .05), the data from the two experimental sessions were then pooled together and the analysis was rerun without the factor “Session.” All the statistical analyses presented in this study were conducted using R (R Core Team, 2013). Unless otherwise specified, all the tests were two-tailed and used a probability level of .05 to test for significance. The Huynh–Feldt correction for degrees of freedom was used where appropriate and \( \tilde{\varepsilon} \) correction factors are reported. Effect sizes are reported using generalized eta-squared \( \eta_g^2 \) (Bakeman, 2005).

Results

Loudness asymmetry

The mean loudness ratings given to rising and falling ramps are plotted as a function of the maximum level of their range of intensity variation (between 60 and 90 dB SPL) in Fig. 1 (panel a). The black symbols correspond to the estimates of the “mixed context” group and the grey symbols to the estimates of the “separate context” group. To evaluate quantitatively the asymmetry in loudness between rising- and falling-intensity ramps for the two groups of participants, we converted the loudness ratings given by each subject into phon units by using his or her own loudness exponent. For a given maximum level, the loudness estimated in sones (Fig. 1, panel a) was converted into a loudness level in phons by finding the corresponding dB SPL value on the loudness function for both rising and falling tones. The difference between the loudness levels (in phons) of rising and falling tones provides a measure, in decibels, of the asymmetry. This calculation is made for each listener, at each intensity region. Mathematically, it corresponds to the use of the following index:

$$ \mathrm{Asymmetry}\left(\mathrm{dB}\right)=\frac{20}{e}\times \log\left(\frac{S_{rising}}{S_{falling}}\right) \qquad (1) $$

where \( e \) is the individual exponent of the loudness function and \( S_{rising} \) and \( S_{falling} \) are the mean loudness ratings measured in the experiment for rising and falling tones, respectively. The asymmetries computed with this index are presented in panels (b) and (c) of Fig. 1.
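A short Python sketch may help to see why index (1) equals the difference in loudness level between the two ramps. The function names and the numerical values are hypothetical, and the logarithm is taken as base 10, which is an assumption consistent with a result expressed in decibels. The loudness function is anchored at two sones for 50 dB SPL, as in the normalization described earlier, but the anchor cancels out in the difference.

```python
import math


def asymmetry_db(s_rising, s_falling, exponent):
    """Index (1): loudness ratio converted to a level difference in dB."""
    return (20.0 / exponent) * math.log10(s_rising / s_falling)


def sones_to_level(sones, exponent, ref_level=50.0, ref_sones=2.0):
    """Invert a power loudness function anchored at (50 dB SPL, 2 sones)."""
    return ref_level + (20.0 / exponent) * math.log10(sones / ref_sones)


# Hypothetical listener: exponent 0.6, rising ramp rated 8 sones,
# falling ramp rated 6 sones.
e = 0.6
index = asymmetry_db(8.0, 6.0, e)
level_difference = sones_to_level(8.0, e) - sones_to_level(6.0, e)
```

With these illustrative numbers, both computations give an asymmetry of roughly 4.2 dB, and a loudness ratio of two would correspond to about 10 dB.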

Fig. 1

Results of Experiment 1. (a) Mean normalized loudness ratings, in sones, of the “mixed context” (in black) and the “separate context” (in grey) groups plotted as a function of the maximum level of the ramps. (b) Loudness asymmetries between these rising and falling ramps assessed for the different levels, using the index (1), for each group (black bars, “mixed context;” grey bars, “separate context”). (c) Asymmetries averaged over the different intensity regions for each group. Note that positive values indicate that rising-intensity tones were perceived as louder than falling-intensity tones. Error bars indicate standard errors of the mean (SEM)

As can be seen in panel a of Fig. 1, rising ramps were perceived as louder than falling ramps in the two “global” contexts. This claim was supported by the statistical analysis, which revealed a significant effect of stimulus Direction (rising/falling) [F (1, 30) = 34.390, p < .001, \( \eta_g^2 \) = .01]. A small but significant Level × Direction interaction was also found [F (6, 180) = 4.657, p < .001, \( \tilde{\varepsilon} \) = .51, \( \eta_g^2 \) = .0005], indicating that the size of the asymmetry is not the same at all levels. However, the same mixed-ANOVA without the estimates for the 90-dB SPL ramps gives no significant Level × Direction interaction (p > .05), which shows that the asymmetry is in fact similar over the different regions from 60 to 85 dB SPL and that the observed interaction is due to the estimates given for the 90-dB SPL ramps.

These asymmetries were assessed quantitatively using the index (1) and were averaged over the different intensity regions to evaluate the overall magnitude of the effect (see Fig. 1, panel c). One-tailed t-tests conducted on these data show that the mean asymmetries are significantly positive; 5.3 dB (SD = 5.2) for the “separate context” group [t(15) = 4.017, p < .001] and 3.1 dB (SD = 3.7) for the “mixed context” group [t(15) = 3.4215, p < .01].

Global context effect

The influence of the “global context” can be observed directly in the ANOVA by looking at the interaction Context × Level. This interaction turned out to be not significant (p > .05), which does not support the hypothesis that a “mixed context” would have caused a greater asymmetry. A second mixed-ANOVA [1-between (Context) and 1-within (Level) factors] was also conducted on the asymmetries computed with the proposed index to support this observation (i.e., based on the asymmetries plotted in Fig. 1, panel b). Again, no influence of the context was found (p > .05) and a significant effect of the Level was obtained [F (6, 180) = 4.667, p < .001, \( \tilde{\varepsilon} \) = .42, \( \eta_g^2 \) = .031], which was caused by a smaller asymmetry at 90 dB SPL compared to other levels, as confirmed by a post-hoc mixed-ANOVA conducted on the data but without the estimates for the 90-dB ramps, where the effect of the Level became non-significant (p > .05).

Local context effect

To examine potential “local context” effects, another ANOVA was conducted on the results of the “mixed context” group only, taking into account the direction of the preceding stimulus (PrecDir) on each rating [i.e., a three-factor within-subject repeated-measures ANOVA (Direction × PrecDir × Level) was conducted]. The ratings of the “mixed context” group are plotted as a function of the maximum level of the ramp in the left panel of Fig. 2, with the preceding stimulus direction as a parameter. A small but significant effect of the preceding stimulus direction was found [F (1, 15) = 11.661, p < .01, \( \eta_g^2 \) = .0014], as well as significant two-way and three-way interactions [PrecDir × Direction: F (1, 15) = 8.945, p < .01, \( \eta_g^2 \) = .0003; PrecDir × Direction × Level: F (6, 90) = 5.126, p < .001, \( \tilde{\varepsilon} \) = .94, \( \eta_g^2 \) = .0019]. As can be observed in the left panel of Fig. 2, the estimates given to falling tones (grey and black downward triangles) were virtually unaffected by the preceding stimulus direction, whereas the estimates given to rising tones were slightly increased when they were presented at low levels and preceded by rising tones (black upward triangles), compared to falling tones (grey upward triangles). Asymmetries computed for the different levels and priors with the index (1) are presented in the right panel of Fig. 2. Greater asymmetries are obtained at 60 and 65 dB SPL when the preceding stimulus is a rising ramp compared to a falling ramp.

Fig. 2

Left panel: Mean normalized loudness ratings, in sones, of the “mixed context” group specifically. Upward and downward triangles correspond to estimates given to rising and falling ramps, respectively, when the preceding tone was either a rising ramp (black triangles) or a falling ramp (grey triangles). Right panel: Asymmetries computed for each level and each configuration (rising prior with black bars, falling prior with grey bars). Error bars show standard errors of the mean (SEM)

Discussion

In this experiment, the loudness of rising- and falling-intensity tones was measured using an AME procedure in two different experimental contexts. The present results confirm the existence of loudness asymmetries between rising and falling tones in both contexts.

Our first research question was whether the global context of stimulus presentation influences the magnitude of the loudness asymmetry. The results show no significant difference between the asymmetries measured in “mixed” and “separate” contexts. Our hypothesis that a “mixed context” would reinforce the loudness asymmetry, compared to a “separate context,” is therefore not supported. These results indicate that listeners do not need to compare the two types of stimuli, and have access to some intrinsic information that leads them to produce different global loudness estimates. This provides evidence for the robustness of their evaluation strategy.

Our second research question concerned the influence of the local context of stimulus presentation. We found a significant effect of prior stimulus direction, but the size of this effect was about ten times smaller than the effect of interest (namely, the direction of the ramp). The estimates given to rising stimuli were slightly higher with rising priors than with falling priors, specifically at low sound levels (see Fig. 2, left panel). This effect is different from that observed by Stecker and Hafter (2000) with short-duration stimuli, for which damped stimuli were judged lower when preceded by a damped stimulus than by a ramped stimulus. Because the stimuli of the present experiment were separated by silent intervals of several seconds, it is very unlikely that this level-dependent effect has sensory origins. Whether it relies on some cognitive or judgment-deviation phenomenon, induced by the end of rising ramps on the following, low-level, rising ramps, still has to be determined. As a result, the asymmetry was found to be slightly greater at low levels (60 and 65 dB SPL) when assessed with rising priors than with falling priors, but remained similar at other levels. This small effect of “local context,” occurring only at low levels, suggests that the evaluation strategy was not specifically affected by the direction of prior stimuli.

Finally, the magnitude of this asymmetry was similar at all intensity regions, except for ramps culminating at 90 dB SPL, where it turned out to be significantly smaller. The reasons for this decrease only at the highest level remain undetermined.

In summary, the results of this experiment reveal that the context of stimuli presentation does not particularly modulate the global loudness asymmetry between 2-s rising- and falling-intensity tones. Overall, the present data show that the global loudness of falling stimuli is around four phons lower than the global loudness of rising stimuli.

Experiment 2

The second experiment was designed to examine the loudness asymmetry in a two-interval paradigm. The loudness of 2-s stimuli with 15-dB dynamics was measured using an interleaved adaptive 2I, 2AFC procedure. Rising and falling ramps were compared in loudness to one another by one group of subjects (Experiment 2A), while another group compared rising and falling stimuli to constant-intensity stimuli (Experiment 2B). Loudness estimates were compared between the two groups, providing new insights into the sensitivity of this perceptual asymmetry to the type of comparison tone.

Method

Participants

Ten subjects (aged 21–44 years, sex ratio = 1) participated in Experiment 2A and 12 different subjects (aged 18–33 years, sex ratio = 1) took part in Experiment 2B. None reported having hearing problems. They provided informed consent prior to the experiment and were paid for their participation. The participants were naïve with respect to the hypotheses being tested.

Apparatus

The apparatus used was the same as described in Experiment 1.

Stimuli

The stimuli were 1-kHz pure tones with a duration of 2 s. All the stimuli had 10-ms linear rise and fall times. In both experiments, the experimental design was developed to measure the loudness matches between rising and falling ramps having 15-dB dynamics, or between ramps and constant-intensity tones. For the ramps, the five lowest intensity regions tested in Experiment 1 were reemployed in this experiment: [45–60 dB SPL], [50–65 dB SPL], [55–70 dB SPL], [60–75 dB SPL] and [65–80 dB SPL]. The fixed-level test tones were rising or falling ramps (for details, see upper panels of Figs. 3 and 4) and the variable comparison tones were either ramps (in Exp. 2A) or constant-intensity tones (in Exp. 2B). The maximum level of the comparison tones was limited to 95 dB SPL.

Fig. 3

Upper panel: Experimental configurations used in Experiment 2A. The two types of standards (rising or falling ramps, filled in grey) were matched in loudness with their opposite ramps. Double arrows indicate that the levels of the comparison tones (unfilled) were varied by the adaptive procedure. Lower panel: Measured level differences between the second (L2) and the first tone (L1) of the pair at equal-loudness in the different configurations, derived from the obtained loudness matches, plotted as a function of the maximum level of the standards. Error bars show SEM

Fig. 4

Upper panel: Experimental configurations used in Experiment 2B. The two types of standards (rising or falling ramps, filled in grey) were matched in loudness with constant tones. Lower panel: Measured level differences between the second (L2) and the first tone (L1) of the pair at equal-loudness in the different configurations, derived from the obtained loudness matches, plotted as a function of the maximum level of the standards. Error bars show SEM

Procedure

In both Experiments 2A and 2B, a loudness matching task was employed to determine the loudness matches of the ramps in the 20 conditions [four Configurations × five Levels] as presented in the upper panels of Figs. 3 and 4. The stimuli were presented in a 2I, 2AFC paradigm based on an interleaved adaptive procedure (Florentine & Poulsen, 1996; Jesteadt, 1980). In each experiment, the loudness matches were obtained by using 20 interleaved tracks, i.e., one for each condition. On each trial, listeners heard two tones separated by a 500-ms silent interval. Their task was to indicate which sound was louder by pressing a button (labeled “first” or “second”) on the interface. The response initiated the next trial after a 1-s delay. The level of the comparison tone for each track was adjusted according to a one-up, one-down procedure (Levitt, 1971). If the listener indicated that the comparison tone was louder than the standard, its level was decreased; otherwise, its level was increased. Different starting values and step sizes were chosen to make the difficulty of the task more homogeneous across the different tracks. First, the starting levels of the comparison tones were determined in a preliminary experiment so as to be perceived as either clearly louder or clearly softer than the standard (Nieder, Buus, Florentine, & Scharf, 2003). The comparison tone started above or below the expected equal-loudness level, with equal a priori probability. Second, since the near miss to Weber’s law predicts an increase of sensitivity with sound level for 1-kHz tones (Rabinowitz, Lim, Braida, & Durlach, 1976), the initial step size was linearly adjusted (across the five intensity regions under study) from 3.6 dB for the ramps in the highest intensity region [65–80 dB SPL] to 5 dB for the ramps in the lowest intensity region [45–60 dB SPL].
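The linear adjustment of the initial step size can be made explicit with a short sketch. It assumes that the interpolation is carried out on the maximum level of each intensity region; the function name `initial_step_sizes` is hypothetical.

```python
def initial_step_sizes(max_levels, step_lowest=5.0, step_highest=3.6):
    """Interpolate the initial step size (dB) linearly across intensity regions."""
    lo, hi = min(max_levels), max(max_levels)
    return {
        level: step_lowest + (step_highest - step_lowest) * (level - lo) / (hi - lo)
        for level in max_levels
    }

# Maximum levels (dB SPL) of the five intensity regions used in Experiment 2
steps = initial_step_sizes([60, 65, 70, 75, 80])
for level, step in sorted(steps.items()):
    print(f"ramp peaking at {level} dB SPL: initial step {step:.2f} dB")
```

Under this assumption, the three intermediate regions receive initial steps of about 4.65, 4.30, and 3.95 dB.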

To ensure that the tracks converge at roughly the same time, we used the same rule as in Grimm, Hohmann, and Verhey (2002); each trial was randomly assigned a track, but this choice was restricted by requiring that each track be selected once before any track could be reselected. For each track, the step size was reduced by a ratio of 1.5 after every two reversals; and after six reversals, it was held constant. This procedure ended when eight reversals were achieved for each of the 20 tracks. The level of the comparison tone thus converged towards the 50 % point of the psychometric function (Levitt, 1971) for each track (i.e., each condition). The loudness matches were then defined as the average of the maximum level of the comparison tone in the last two reversals of each track. An experimental session lasted approximately 1 h and required about 500 trials to end (i.e., 25 trials per track).
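The adaptive rules described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the listener is replaced by a logistic psychometric function with an arbitrary slope, the step size is assumed to be divided by 1.5 at the second, fourth, and sixth reversals, and the function name `run_interleaved` and the track labels are hypothetical.

```python
import math
import random

def run_interleaved(pse, start, step, slope=1.0, max_level=95.0, seed=0):
    """Simulate interleaved one-up, one-down tracks with a logistic listener."""
    rng = random.Random(seed)
    tracks = {k: {"level": start[k], "step": step[k], "last_dir": 0, "reversals": []}
              for k in pse}
    n_trials = 0
    while any(len(t["reversals"]) < 8 for t in tracks.values()):
        # Each unfinished track is visited once, in random order, before any is revisited
        active = [k for k, t in tracks.items() if len(t["reversals"]) < 8]
        rng.shuffle(active)
        for k in active:
            t = tracks[k]
            # Assumed listener model: probability of judging the comparison louder
            p_louder = 1.0 / (1.0 + math.exp(-slope * (t["level"] - pse[k])))
            direction = -1 if rng.random() < p_louder else 1   # one-up, one-down
            if t["last_dir"] != 0 and direction != t["last_dir"]:
                t["reversals"].append(t["level"])
                n_rev = len(t["reversals"])
                if n_rev <= 6 and n_rev % 2 == 0:
                    t["step"] /= 1.5        # step held constant after six reversals
            t["last_dir"] = direction
            t["level"] = min(t["level"] + direction * t["step"], max_level)
            n_trials += 1
    # Loudness match: mean comparison level at the last two reversals
    matches = {k: sum(t["reversals"][-2:]) / 2.0 for k, t in tracks.items()}
    return matches, n_trials

pse = {"track_1": 72.0, "track_2": 68.0}      # hypothetical equal-loudness levels
start = {"track_1": 82.0, "track_2": 58.0}    # clearly louder / clearly softer starts
step = {"track_1": 4.3, "track_2": 4.3}
matches, n_trials = run_interleaved(pse, start, step)
```

With a symmetric psychometric function, each simulated track hovers around the level judged louder on half of the trials, which is the 50 % point targeted by the one-up, one-down rule.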

Results

The level differences between the second tone and the first tone of the pair (L2 – L1) at equal loudness measured in the different configurations of Experiments 2A and 2B are presented in Figs. 3 and 4, respectively, as a function of the maximum level of the ramps. Statistical analyses of each experiment were conducted on these level differences. In Experiment 2A, three factors were considered: the pair order (rising-falling/falling-rising), the position of the comparison tone in the pair (first/second), and the level of the ramp (five levels, see above). This was done using a within three-factor repeated-measure ANOVA [Order × PosComp × Level] with a univariate approach. A similar analysis was conducted on the data obtained in Experiment 2B, where the influence of the position, the direction, and the level of the standard was examined using a within three-factor repeated-measure ANOVA [PosStand × Direction × Level].

The results of Experiment 2A (Fig. 3, lower panel) show that the level differences (L2 – L1) are positive in the rising-falling configurations – i.e., (a) and (b) – and negative in the opposite falling-rising configurations – i.e., (c) and (d). In other words, the level of the falling tone was matched higher than the level of the rising tone to produce equal loudness. This asymmetry in loudness is supported by statistical analysis, where a significant effect of the Order is found [F (1, 9) = 40.26, p < .001, \( \eta_g^2 \) = .35]. However, as can be observed in this panel, these differences depend strongly on the level considered; a significant effect of the Level is obtained [F (4, 36) = 8.86, p < .001, \( \tilde{\varepsilon} \) = .68, \( \eta_g^2 \) = .16]. Moreover, this level effect also depends on the pair order, as indicated by a significant Order × Level interaction [F (4, 36) = 11.49, p < .001, \( \tilde{\varepsilon} \) = .90, \( \eta_g^2 \) = .10]. Finally, level differences measured in configurations (a) and (b) were similar, as were those in configurations (c) and (d), as revealed by a non-significant effect of the position of the comparison tone and no significant interactions with other factors (all Ps > .05). This is due to the interleaved procedure, which ensured that subjects would not notice which configuration they were presented with, and could thus not use different strategies to compare the stimuli based on whether the first or the second tone was defined as the standard or the comparison tone.

The results of Experiment 2B, plotted in Fig. 4 (lower panel), show negative differences in the [ramp-constant] configurations – i.e., (e) and (f) – and positive differences in the [constant-ramp] configurations – i.e., (g) and (h). The analysis reveals a strong and significant effect of the position of the standard [F (1, 11) = 124.65, p < .001, \( \eta_g^2 \) = .69]. Thus, the level of the constant-intensity tone was matched lower than the maximum level of the ramp to produce equal loudness, i.e., the ramps were perceived as louder than constant-intensity tones presented at their maximum level. As in Experiment 2A, a strong effect of the Level can also be observed [F (4, 44) = 39.52, p < .001, \( \tilde{\varepsilon} \) = .62, \( \eta_g^2 \) = .33]. Finally, the absolute magnitude of the level differences between ramps and constant tones is larger in the conditions where the standard was falling – (f) and (h) – than where the standard was rising – (e) and (g); a significant interaction between the position of the standard and its direction (rising/falling) is found [F (1, 11) = 30.48, p < .001, \( \eta_g^2 \) = .33]. Rising tones were thus matched at a higher level than falling tones, which provides clear evidence for a loudness asymmetry.

Time order errors

Before looking in closer detail at these loudness asymmetries and their relative magnitudes, one may first ask why the level differences were strongly influenced by the region of intensity in both Experiments 2A and 2B. In fact, the decreasing patterns observed in all the configurations investigated in these two experiments (see lower panels of Figs. 3 and 4) indicate that the first stimulus tended to be perceived as louder than the second stimulus more often when the levels of the two tones fell in low intensity regions (dark grey and dark colors) than when they fell in high intensity regions (white and light grey colors), where the opposite phenomenon is observed. Such level effects are common in experiments where the levels of the stimuli are roved trial-by-trial, or when different levels are tested – as in the present experiments – and are due to time-order errors (TOEs; Berliner, Durlach, & Braida, 1977; Hellström, 1979, 2003; Macmillan & Creelman, 2004). In a roving-level paired comparison of loudness where two identical stimuli S1 and S2 are presented, they are almost never perceived as equal. One of the suggestions proposed to explain TOEs is that the two stimuli are weighted differently, depending on their intensity region (Hellström, 1979). In this sense, at short ISIs, a greater weight is generally accorded to the first stimulus when the levels of the two tones are in low intensity regions, and a greater weight is accorded to the second stimulus when the levels of the two tones are in high intensity regions (Hellström, 2003). This particular perceptual weighting would be used by subjects to increase the detectability of level changes between the stimuli over the course of an experiment where the level of the standard is roved or varied (for more details, see Patching, Englund, & Hellström, 2012).
Thus, when the TOE is defined as the level difference between S2 and S1 at equal loudness, with short ISIs, negative TOEs are generally observed in high intensity regions and positive TOEs in low intensity regions (Hellström, 2003). In both Experiments 2A and 2B, the level differences reported in the different configurations are thus undoubtedly affected by this phenomenon, which explains why decreasing profiles of the level differences as a function of the maximum level of the ramps were observed (see Figs. 3 and 4, lower panels).

Loudness asymmetries

Since the initial aim of this study concerned the potential loudness asymmetries between rising and falling tones and how they depend on the pair configuration and the level of presentation, a specific treatment of the data is proposed below to compensate for the TOEs. The only way to obtain data free of TOE effects was to average the results over the four different configurations of each experiment. This still allowed the magnitude of the loudness asymmetries to be assessed for the different levels investigated in the two experiments, but a specific evaluation of the influence of pair order on the asymmetry was not possible. The mean loudness asymmetries were then calculated for each intensity region by averaging the signed level difference between the second and the first stimulus of a pair matched in loudness in the different configurations, as follows:

  • In Experiment 2A:

    $$ \mathrm{Asymmetry}\left(\mathrm{dB}\right)=\frac{a+b-c-d}{4} $$
    (2)
  • In Experiment 2B:

    $$ \mathrm{Asymmetry}\left(\mathrm{dB}\right)=\frac{e-f-g+h}{2} $$
    (3)
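To see why this averaging removes the TOE, the following minimal sketch works through Eq. 2 with made-up numbers. It assumes, for illustration only, that the TOE adds the same offset to the level difference (L2 – L1) in every configuration at a given level; the values of `true_asym` and `toe` are hypothetical.

```python
def asymmetry_2a(a, b, c, d):
    """Eq. 2: mean asymmetry (dB) from the four configurations of Experiment 2A."""
    return (a + b - c - d) / 4.0

def simulated_differences(true_asym, toe):
    """L2 - L1 at equal loudness under an additive TOE (assumption)."""
    a = b = true_asym + toe      # rising-falling pairs: falling tone matched higher
    c = d = -true_asym + toe     # falling-rising pairs: rising tone matched lower
    return a, b, c, d

# The same underlying asymmetry is recovered whatever the sign of the TOE
low_region = asymmetry_2a(*simulated_differences(true_asym=3.0, toe=2.0))
high_region = asymmetry_2a(*simulated_differences(true_asym=3.0, toe=-2.0))
print(low_region, high_region)
```

Because the offset enters the rising-falling and falling-rising terms with the same sign, it cancels in the difference, leaving only the asymmetry.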

The results of these averaged asymmetries are presented in Fig. 5. The influence of the type of comparison stimulus (TypeComp) and of the level of the ramps was assessed by comparing these mean asymmetries in a mixed ANOVA [1-between (TypeComp) and 1-within (Level) factors]. No influence of the type of comparison tone was found, as revealed by a non-significant difference between the two groups (p > .05). However, the analysis showed a marginally significant effect of the Level [F (4, 80) = 2.697, p < .05, \( \tilde{\varepsilon} \) = .49, \( \eta_g^2 \) = .06]. This effect was caused by the significantly smaller size of the asymmetry at 80 dB SPL compared to the lower levels. Indeed, a post-hoc mixed ANOVA conducted on the data of Experiments 2A and 2B without the estimates for the ramps at the 80-dB level shows that the effect of the Level becomes non-significant. In addition, as in the first experiment, one-tailed t-tests were conducted on the asymmetries averaged over levels to demonstrate that these values were overall significantly higher than zero in both Experiments 2A [Mean = 3.4 dB (SD = 2.3); t(9) = 4.826, p < .001] and 2B [Mean = 3.2 dB (SD = 2.0); t(11) = 5.521, p < .001].
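As a rough plausibility check, the one-sample t statistics can be recomputed from the rounded means and standard deviations reported above. The sketch below uses only those rounded summary values, so it can reproduce the reported statistics only approximately; the function name `one_sample_t` is hypothetical.

```python
import math

def one_sample_t(mean, sd, n):
    """t statistic for testing a sample mean against zero."""
    return mean / (sd / math.sqrt(n))

t_2a = one_sample_t(mean=3.4, sd=2.3, n=10)   # Experiment 2A, df = 9
t_2b = one_sample_t(mean=3.2, sd=2.0, n=12)   # Experiment 2B, df = 11
print(f"t(9) = {t_2a:.2f}, t(11) = {t_2b:.2f}")
```

The recomputed values come out close to the reported ones, at roughly 4.7 and 5.5; the small differences are what one would expect from the rounding of the means and standard deviations.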

Fig. 5

Loudness asymmetries measured when rising and falling ramps were compared within the same pair (Experiment 2A, black bars), or compared against constant-intensity tones (Experiment 2B, grey bars), presented as a function of their maximum level (left panel) and averaged over the different intensity levels (right panel). Error bars correspond to SEM

Discussion

Asymmetries between 2-s rising and falling tones were assessed from direct loudness matching tasks in Experiment 2A and from indirect loudness matching tasks in Experiment 2B, where constant-intensity stimuli were employed as comparison tones. Strong TOEs were found in both tasks and were compensated for by averaging over the different configurations. Overall, the results show that the magnitude of these asymmetries was similar between the two experiments, providing clear evidence against any significant influence of the type of comparison stimulus on the asymmetry. If the asymmetry were (partly) due to some particular judgment bias, different values would likely have been found when the ramps were matched together compared to when the ramps were matched with constant-intensity tones. Because the data do not show such differences, this supports the results of Experiment 1, suggesting that rising- and falling-intensity sounds contain intrinsic information that leads to their asymmetry. As in Experiment 1, similar asymmetries were found between rising and falling ramps at lower levels (with peak levels between 60 and 75 dB SPL), but significantly smaller asymmetries were found between rising and falling ramps culminating at 80 dB SPL. The origins of this decrease at high levels remain to be examined.

Furthermore, one might wonder whether Induced Loudness Reduction (ILR) could have impacted the present results. In configuration (g) of Experiment 2B only, that is, where the rising ramp is preceded by a constant tone (see Fig. 4), the conditions required for ILR to occur are met: (1) the two tones are identical in frequency, (2) the first tone is 10 to 20 dB louder than the second tone, and (3) the two tones are separated by several hundred ms (Epstein, 2007, 2013; Oberfeld, 2007). As a result, the loudness of the rising ramp (starting at a low level) might have been reduced by the preceding constant tone. If this were true, one direct consequence of this effect would be a greater asymmetry when the constant comparison tone was the second stimulus compared to when it was the first stimulus, especially at high levels. To test this, a comparison of the asymmetries between the conditions where the constant tone was presented first (calculated using the index A1 = (h) – (g)) or second (using A2 = (e) – (f)) was made. No significant effect or interactions with the level of the ramp were found (all Ps > .05), suggesting that there was no particular influence of ILR on the present results.

General discussion

In this study, we measured the global loudness of 2-s rising- and falling-intensity tones using different measurement methods and contexts of presentation. The influence of “global” and “local” contexts on the perceptual asymmetry was examined in the first experiment (Experiment 1) using a one-interval paradigm. We found that “global context” of presentation had no significant influence on the loudness asymmetry, and that “local context” (i.e., prior stimuli direction) only had a small effect. This shows that no particular strong “representation mode” is involved in the processing of long stimuli, which contrasts with previous results obtained for short duration stimuli where a particular cognitive mechanism – the parsing of sounds into “source” and “reverberation” – was found to cancel the asymmetry under certain conditions (Stecker & Hafter, 2000). Based on two-interval loudness matching paradigms, Experiments 2A and 2B extended the findings of Experiment 1. We found no influence of the type of comparison stimulus on the magnitude of the asymmetry. In addition, over all three experiments, we measured asymmetries for ramps at different intensity regions, culminating at between 60 and 90 dB SPL. The size of the asymmetry was virtually unaffected by the level of presentation, except for the loudest ramps (at 90 dB SPL in Experiment 1 and at 80 dB SPL in Experiment 2), where a significant decrease was observed. Whether this peculiarity for high-level stimuli relies on any contextual edge effect or has sensory origins remains to be determined in a future study. Overall, the mean size of the asymmetry was similar when the loudness was measured by AME (Experiment 1), when rising and falling sounds were compared to each other (Experiment 2A) and compared against constant-intensity tones (Experiment 2B); it was quantified and found to fall within the range 3–5 dB. 
Durlach, Braida, and colleagues (e.g., see Braida, Lim, Berliner, Durlach, Rabinowitz, & Purks, 1984; Durlach & Braida, 1969) demonstrated that in one-interval paradigms such as in magnitude estimation tasks, subjects primarily judge the stimulus intensity by operating in a “context-coding” mode (that is by comparing the sensory representation to the stimulus context), whereas in two-interval paradigms like in roving-level discrimination tasks, an optimum combination of the “context-coding” mode and the “trace-mode” (the latter based on the direct sensory representation of the stimulus) is used. Taken in this framework, our results suggest that the perceptual asymmetry between 2-s asymmetrical stimuli is not constrained by any particular coding mode. Since two different measurement methods and various contexts of stimuli presentation yielded, on average, very similar values, our study provides strong evidence for the robustness of the mechanism responsible for the asymmetry in global loudness between 2-s symmetrical stimuli.

Predictions from current loudness models

It is interesting to compare the magnitude of this asymmetry with predictions from two loudness models applicable to time-varying stimuli (dynamic loudness model (DLM), Chalupper & Fastl, 2002; time-varying loudness model (TVL), Glasberg & Moore, 2002). Peak values of both short-term (proposed in DLM and TVL) and long-term (in TVL only) loudness time-series predicted by these models were used to evaluate the global loudness of our stimuli. Thereafter, loudness ratios were used to compute the asymmetries at four different intensity regions ([45–60 dB SPL], [55–70 dB SPL], [65–80 dB SPL], and [75–90 dB SPL]) that cover the ranges examined in the present experiments. Asymmetries obtained using these short-term and long-term loudness predictions are reported in Table 1. Asymmetries evaluated with DLM and TVL short-term loudness (STL) lie between 0.4 dB and 0.5 dB, which predicts very little or almost no difference in loudness between rising and falling tones. Asymmetries predicted with TVL long-term loudness (LTL) are slightly greater (around 1.3 dB) but still fall well below the loudness differences assessed psychophysically. These asymmetries can be seen with the TVL time-series in Fig. 6, where both the STL and the LTL of a rising-intensity tone reach higher peaks than those of a falling-intensity tone. Note that these predictions are the consequence of the temporal integration stages employed in the models. Indeed, instantaneous loudness patterns are symmetrical for the two profiles. For example, the Glasberg and Moore model (2002) uses two successive Automatic Gain Control (AGC) circuits to convert the instantaneous loudness to a short-term loudness pattern first, and then to a long-term loudness pattern. Moreover, while smaller perceptual asymmetries were found for the ramps in the highest regions of our experiments, the models give virtually identical predictions for the different levels (cf. Table 1).
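The way a temporal integration stage alone can produce a peak asymmetry from symmetrical instantaneous patterns can be illustrated with a toy example. The sketch below is not the DLM or the TVL model: it assumes a single first-order low-pass integrator with an arbitrary 100-ms time constant, the textbook sone scale at 1 kHz (loudness doubling for every 10 dB above 40 dB SPL) as a stand-in for instantaneous loudness, and a conversion of loudness ratios to dB at 10 dB per doubling. All function names are hypothetical.

```python
import math

def sones(level_db):
    """Textbook sone scale at 1 kHz: loudness doubles every 10 dB above 40 dB SPL."""
    return 2.0 ** ((level_db - 40.0) / 10.0)

def peak_integrated_loudness(start_db, end_db, dur=2.0, tau=0.1, fs=1000):
    """Peak of a first-order low-pass integration of the instantaneous loudness."""
    n = int(dur * fs)
    alpha = 1.0 - math.exp(-1.0 / (tau * fs))   # one-pole smoothing coefficient
    y, peak = 0.0, 0.0                          # integrator starts from silence
    for i in range(n):
        level = start_db + (end_db - start_db) * i / (n - 1)
        y += alpha * (sones(level) - y)
        peak = max(peak, y)
    return peak

peak_rising = peak_integrated_loudness(65.0, 80.0)
peak_falling = peak_integrated_loudness(80.0, 65.0)
# Assumed conversion of a loudness ratio to dB (10 dB per doubling of loudness)
asymmetry_db = 10.0 * math.log2(peak_rising / peak_falling)
```

In this toy integrator, the rising ramp ends near its maximum with the integrator almost caught up, whereas the falling ramp has already decayed by the time the integrator has charged. The rising tone therefore reaches the higher peak, in the same direction as the model predictions discussed above.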

Table 1 Mean loudness asymmetries (in dB) calculated with the dynamic loudness model (DLM; Chalupper & Fastl, 2002) and the time-varying loudness model (TVL; short-term loudness (STL) and long-term loudness (LTL); Glasberg & Moore, 2002), both applicable to non-stationary sounds, for ramps with 15-dB dynamics at different intensity regions. These asymmetries are based on the ratio of loudness maxima obtained between rising- and falling-intensity tones. The right column indicates the asymmetry averaged over these regions
Fig. 6

Loudness (in sones) of two symmetrical time-varying tones predicted by the time-varying loudness (TVL) model (Glasberg & Moore, 2002). The left panel shows short-term loudness (STL, black lines) and long-term loudness (LTL, grey lines) of a 2-s [65–80 dB SPL] rising-intensity 1-kHz tone (with 10-ms linear rise and fall times) predicted by the TVL model. The right panel shows predictions for the time-reversed falling-intensity tone. Asymmetries between rising- and falling-intensity loudness patterns can be observed considering either STL or LTL maxima (as indicated by dashed lines). A greater asymmetry is found with LTL compared to STL, due to the use of another stage of temporal integration

In these models, the time constants employed at the first integration stage (i.e., to compute the STL) were based only on data from psychophysical experiments on temporal masking and temporal integration of loudness (Glasberg & Moore, 2002). Although some studies showed that these constants, surprisingly, lead to fairly accurate predictions of loudness asymmetries between short-duration rising and falling stimuli (Moore, 2013; Rennies, Verhey, & Fastl, 2010; Ries et al., 2008), our results demonstrate that they are not sufficient to predict loudness asymmetries that occur between stimuli of a few seconds. In view of our data, though it still underestimates the magnitude of the effect, the LTL of Glasberg and Moore’s model seems to be the best candidate to evaluate these asymmetries. Increasing the time constants at the latter integration stage in LTL would be the only option to improve the predictions of the asymmetries. However, the downside would be a poorer loudness prediction for amplitude-modulated sounds, for which LTL was initially designed. Future investigations are thus needed to create new and appropriate descriptors that correctly predict the loudness asymmetry between long-duration sounds, for example by using top-down modulated integration windows, as proposed recently to predict the pitch of short asymmetrical stimuli (Tabas, Balaguer-Ballester, Pressnitzer, Siebert, & Rupp, 2014).

Conclusions and perspectives

The global loudness judgment reflects a listener’s overall evaluation of “a sound over its entire duration” (Ponsot et al., 2013). While the peak value of the long-term loudness proposed in Glasberg and Moore’s model (2002) is typically used to evaluate the global loudness of time-varying sounds longer than 1 s (e.g., Rennies, Verhey, Appell, & Kollmeier, 2013; Rennies, Holube, & Verhey, 2013; Ries et al., 2008), our results clearly point out the limits of that model. The fact that the loudness asymmetry between 2-s rising- and falling-intensity sounds was consistent across different contexts of stimulus presentation and different measurement methods suggests that a strong underlying mechanism is involved. Further work is needed to determine the causes of the effect. In particular, the extent to which it relies on the “bias for rising tones” specified by Neuhoff (1998) or on the “end level bias” specified by Susini et al. (2010) remains to be specifically addressed. Loudness models would certainly benefit from future research on the perceptual and cognitive processes involved in such asymmetries in order to improve the global loudness prediction of long non-stationary sounds.