What happens when we compare two successively presented stimuli, X and Y? According to the commonly assumed difference model, the comparison response (e.g., first greater, second greater) is determined by the difference between the momentary subjective magnitudes of the stimuli (Thurstone, 1927a, b). Furthermore, these magnitudes are only dependent on the stimuli, and are independent of such factors as their presentation order (X–Y or Y–X) within the pair. Hence, no systematic under- or overestimation of the first or the second stimulus, relative to the other, should occur. The time-order error (TOE), first noted by Gustav Fechner (1860), violates this prediction but has often been explained, within the framework of the difference model, as being due to some kind of response bias (see, e.g., Alcalá-Quintana & García-Pérez, 2011). However, plenty of experimental evidence, in different sensory continua and using different paradigms, has invalidated the latter explanation of the TOE (e.g., Hellström, 1977, 1978; Jamieson & Petrusic, 1975).

Most often, participants’ abilities to tell the difference between two stimuli are studied by methods in which the stimuli are presented in succession, with one stimulus being held constant (the standard stimulus, St) and the other (the comparison stimulus, Co) varying around it. Discrimination sensitivity is then usually measured by the difference limen (DL), often defined as one-half of the change in either stimulus that is needed to change the response distribution from, for instance, 75% “first greater” to 75% “second greater.” The difference model predicts that discrimination sensitivity, as measured by the DL, should be independent of whether the first stimulus is fixed and the second varied (St – Co) or vice versa (Co – St). This prediction is contradicted by what will here be called the standard-position effect (SPE): DLs, as well as the percentages of correct responses, typically differ systematically between the orders St – Co and Co – St. This effect was noted for lifted weights by Martin and Müller (1890), who found that more correct answers were given when the order was St–Co than when it was Co – St. The effect was also noted for durations by Rammsayer and Wittkowski (1990), who named it the “constant-position effect.” Grondin and McAuley (2009) and Ulrich and Vorberg (2009) likewise found this effect for duration comparison. Ulrich and Vorberg called it the “Type-B effect” to distinguish it from the TOE, which they called the “Type-A effect.”
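The following minimal sketch makes the DL definition concrete. It assumes, purely for illustration, that the proportion of “second greater” responses follows a logistic function of the comparison duration. The logistic form and the parameters `pse` and `scale` are hypothetical and are not taken from the studies cited. Under this assumption, the DL is half the distance between the 25% and 75% “second greater” points (that is, between 75% “first greater” and 75% “second greater”). It equals `scale * ln 3` regardless of the point of subjective equality.

```python
import math


def logistic(x: float, pse: float, scale: float) -> float:
    """Hypothetical psychometric function: P("second greater") at comparison value x."""
    return 1.0 / (1.0 + math.exp(-(x - pse) / scale))


def inverse_logistic(p: float, pse: float, scale: float) -> float:
    """Stimulus value at which the logistic function reaches probability p."""
    return pse + scale * math.log(p / (1.0 - p))


def difference_limen(pse: float, scale: float) -> float:
    """DL: half the distance between the 25% and 75% "second greater" points."""
    x25 = inverse_logistic(0.25, pse, scale)  # 75% "first greater"
    x75 = inverse_logistic(0.75, pse, scale)  # 75% "second greater"
    return (x75 - x25) / 2.0


# Hypothetical values for a 100-ms standard (ms units).
print(round(difference_limen(pse=102.0, scale=8.0), 3))  # 8.789, i.e., 8 * ln 3
```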

The essence of the SPE is that changes in the first- and in the second-presented stimulus do not exert equal impacts on the outcome of the stimulus comparison. Dyjas and Ulrich (2014) defined this effect to be negative in the common case in which the best discrimination is obtained in the order St–Co, so that the first stimulus has less impact on the response than the second. This definition is in analogy with the standard definition of the TOE (Fechner, 1860), according to which the TOE is negative when the first stimulus is perceived as having a lower magnitude than the physically identical second stimulus. The unequal impacts of the compared stimuli can also be noted in experimental designs in which none of the stimuli is held constant: Hellström (1979) found, in comparisons of tone loudness, an effect equivalent to the SPE, which varied in its sign and magnitude across temporal conditions (stimulus duration and interstimulus interval). For stimulus durations and interstimulus intervals of at least 1,000 ms, the impact of the second stimulus on the response (first louder, equal, or second louder) was larger than that of the first, whereas for brief stimulus durations and interstimulus intervals (tone durations of 100 – 200 ms and interstimulus intervals of 100 – 700 ms), the first stimulus had a much greater impact than the second. The impact of each stimulus was quantified as a weight coefficient, described below, and (although this was not done in Hellström, 1979) each weight can be translated into a measure of the DL, which is inversely related to the weight.

The existence of the SPE and the TOE leads to the inevitable conclusion that the difference model, which models the subjective difference between two compared stimuli as a simple difference between their subjective stimulus magnitudes (possibly with an added response bias), is unrealistic as a model of stimulus comparison. Still, there is no need to abandon the general notion behind the difference model: that stimulus comparison is based on the perceived difference between monotonic functions of the magnitudes of the two stimuli, and that the buildup of this difference is what the theorist should attempt to model in order to understand the comparison process. One useful and straightforward approach is to replace the difference model with a model analogous to a standard model for data analysis in general, linear regression, with possibly different weight coefficients for the two subjective stimulus magnitudes. One model of this kind, based on Harry Helson’s adaptation-level theory, was introduced by Michels and Helson (1954; see also Helson, 1964). A generalized version of this model was successfully fitted to the results of experiments on comparisons of duration (Hellström, 1977) and loudness (Hellström, 1978). As mentioned above, in data from loudness comparison in a set of widely varying temporal conditions, Hellström (1979) noted a strong relationship, across conditions, between the intercept and the two weight coefficients. The analysis of this relationship led to further development of the stimulus comparison model into the sensation-weighting model (Hellström, 1979, 1985, 2000, 2003), which has been used successfully in many studies and is able to account for both the TOE and the SPE.

The basic form of the sensation-weighting model is

$$ d_{12} = \left[s_1\psi_1 + \left(1 - s_1\right)\psi_{\mathrm{r}1}\right] - \left[s_2\psi_2 + \left(1 - s_2\right)\psi_{\mathrm{r}2}\right] + b, $$
(1)

where $d_{12}$ is the scaled subjective difference between the first and second stimuli, $\psi_1$ and $\psi_2$ are the sensation magnitudes of the stimuli, $s_1$ and $s_2$ are weighting coefficients, and $\psi_{\mathrm{r}1}$ and $\psi_{\mathrm{r}2}$ are the subjective magnitudes that correspond to the current reference levels for the respective stimuli. The reference level is formed by the pooling of stimulus information. It is normally near the center of the stimulus range and may differ somewhat between the two stimuli (see Hellström, 1979). The constant term $b$ captures a possible contribution to $d_{12}$ that is independent of the weighting mechanism, for instance, a response bias. Notably, the sensation-weighting model thereby includes the difference model, with and without bias, as special cases ($s_1 = s_2 = 1$).
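Eq. 1 translates directly into a small function. The sketch below is a hypothetical illustration, and the function and argument names are ours. It also shows that setting $s_1 = s_2 = 1$ removes the reference terms and recovers the difference model, with $b$ acting as a bias.

```python
def sensation_weighting_d12(psi1, psi2, s1, s2, psi_r1, psi_r2, b=0.0):
    """Subjective difference d12 between the first and second stimulus, following Eq. 1."""
    first = s1 * psi1 + (1.0 - s1) * psi_r1    # first stimulus, partly weighted toward its reference level
    second = s2 * psi2 + (1.0 - s2) * psi_r2   # second stimulus, partly weighted toward its reference level
    return first - second + b


def difference_model_d12(psi1, psi2, b=0.0):
    """Difference model as a special case: s1 = s2 = 1, so the reference levels drop out."""
    return sensation_weighting_d12(psi1, psi2, s1=1.0, s2=1.0, psi_r1=0.0, psi_r2=0.0, b=b)


# Hypothetical values: identical stimuli below a common reference level, with s1 < s2.
print(round(sensation_weighting_d12(100.0, 100.0, s1=0.8, s2=1.0, psi_r1=110.0, psi_r2=110.0), 6))  # 2.0
```

The positive result means that the first of two identical stimuli is judged greater when both lie below the reference level and $s_1 < s_2$.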

The sensation-weighting model implies that the SPE and the TOE are two sides of the same coin. With $s_1 < s_2$, not only is the SPE negative, but in suitably designed experiments (varying both stimuli or using at least two standards), the familiar inverse relation between the TOE and stimulus magnitude (see, e.g., Needham, 1935) is observed, yielding TOEs that are positive below and negative above the reference levels. With the less common relation $s_1 > s_2$, this relation is reversed, and the SPE also reverses and becomes positive. A mean TOE different from zero may arise due to, for instance, extraneous stimulation that enters into the reference levels.
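The sign pattern follows directly from Eq. 1. As a worked check, assume for illustration equal reference levels, $\psi_{\mathrm{r}1} = \psi_{\mathrm{r}2} = \psi_{\mathrm{r}}$, and two physically identical stimuli, $\psi_1 = \psi_2 = \psi$:

$$ d_{12} = s_1\psi + \left(1 - s_1\right)\psi_{\mathrm{r}} - s_2\psi - \left(1 - s_2\right)\psi_{\mathrm{r}} + b = \left(s_2 - s_1\right)\left(\psi_{\mathrm{r}} - \psi\right) + b. $$

With $s_1 < s_2$ and $b = 0$, $d_{12}$ is positive (the first stimulus appears greater, a positive TOE) when $\psi$ lies below $\psi_{\mathrm{r}}$, and negative when it lies above. With $s_1 > s_2$, the pattern reverses. A nonzero $b$ shifts all TOEs by the same amount.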

Temporal intervals have often been used in research on stimulus judgment, comparison, and discrimination (see Eisler, Eisler, & Hellström, 2008). Timing and time perception are important in everyday life, and their shortcomings, in the form of failures to keep time and gross misperceptions of the duration of activities and events, are obvious. Unlike other sensory continua, time has no dedicated sense organ, and the processes underlying time perception remain largely a scientific challenge. The processing of short (in the range of milliseconds) and longer temporal intervals is likely governed, at least partly, by different mechanisms (see, e.g., Lewis & Miall, 2003a, b; Rammsayer, 1999, 2008; Rammsayer & Troche, 2014). For longer intervals, temporal processing demands cognitive resources, whereas for brief intervals, processing is highly sensory in nature and beyond cognitive control. For example, when a nontemporal, cognitive secondary task was added (Rammsayer & Lima, 1991; Rammsayer & Ulrich, 2011), temporal discrimination of intervals in the 1-s range was markedly impaired, whereas discrimination of intervals from 50 to 100 ms was not affected (see Mattes & Ulrich, 1998). Neuropharmacological studies have also suggested that intervals in the range of milliseconds are processed by a subcortical automatic system controlled by mesostriatal dopaminergic activity, and that intervals in the 1-s range are processed by a prefrontal cognitive system (for a review, see Rammsayer, 2008).

Karl von Vierordt (1868) performed basic studies and discovered systematic errors in the reproduction of time intervals. According to Vierordt’s law, short intervals are overreproduced and long intervals are underreproduced. Fechner and his followers also noted idiosyncrasies of temporal discrimination, such as possible violations of Weber’s law (see Eisler et al., 2008). More recent research has found that TOEs occur with time intervals as well as with other kinds of stimuli (e.g., Allan, 1977; Hellström, 1977, 2003). Temporal intervals differ from other stimuli in that their (temporal) magnitude is the same as their duration. Duration and magnitude, two important factors behind TOEs, are therefore inseparable. Furthermore, two consecutive time intervals can be perceptually concatenated, which enables alternative strategies for comparing a pair of successive durations. This possibility is addressed by the parallel-clock model (Eisler, 1975, 1981; Eisler et al., 2008). According to this model, the comparison is not between the two durations, but between the second duration and one-half of the total duration.

A time interval can be represented and demarcated using stimuli of different modalities (e.g., auditory, visual, or tactile), and it can be either filled with sensory content or empty. One effect of the modality factor is well known: Discrimination is worse for filled visual than for filled auditory intervals (Eisler et al., 2008; Grondin & McAuley, 2009; Hellström, 2003; Rammsayer, 2014).

Rammsayer (2014) studied Weber fractions (WFs) as a function of sensory modality (auditory, visual), standard duration, type of interval (filled, empty), and type of task (2AFC, reminder [explained below]). That study was not designed to examine the SPE and the TOE. The purpose of the present Experiment 1 was to follow it up by assessing the SPE (i.e., discrimination performance as a function of the presentation order of the standard and comparison stimuli), as well as the TOEs, for four different types of temporal intervals and two interval durations. Another particularly important background study was that of Hellström and Rammsayer (2004), in which discrimination performance was studied for filled auditory (white-noise) intervals with standard durations of 50 and 1,000 ms. That study found that with the 1,000-ms standard, discrimination was better (i.e., WFs were smaller) for the presentation order St–Co than for Co–St, whereas the reverse was true with the 50-ms standard. This is an effect of the standard’s position in the pair, that is, an SPE. According to the definition of Dyjas and Ulrich (2014), it was negative with the 1,000-ms standard but positive with the 50-ms standard.

According to Jamieson and Petrusic (1976, 1978), providing correctness feedback leads to response bias that may be one factor behind the TOE. Hellström and Rammsayer (2004) omitted feedback in some conditions and found that the main effect of omitting it was to greatly impair discrimination for some participants. We therefore considered it important to investigate the possible impact of feedback further by making its presence or absence an additional factor.

Negative SPEs were highlighted by Dyjas and Ulrich (2014), whose internal reference model for stimulus comparison contends that the second stimulus is compared with an internal reference, successively built up as a weighted average (weights g and 1 – g, 0 < g < 1) of the current stimulus and the previous value of the reference. The modeling of Dyjas and Ulrich is based on methods in which one stimulus, the first or the second, is always at the standard level and the other varied around it. For this kind of situation, the internal reference model predicts a negative SPE and would have an explanation neither for a positive SPE nor for any TOE.
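The reference-updating idea can be sketched in a few lines. This is an illustrative sketch, not Dyjas and Ulrich’s own implementation. It assumes that $g$ weights the current (first) stimulus and $1 - g$ weights the previous reference, and the helper names are hypothetical.

```python
def update_reference(previous_reference, current_stimulus, g):
    """Weighted average of the current stimulus (weight g) and the previous reference (weight 1 - g).

    Assumption: g attaches to the current stimulus; the model description above
    does not state which of the two terms receives g.
    """
    if not 0.0 < g < 1.0:
        raise ValueError("g must lie strictly between 0 and 1")
    return g * current_stimulus + (1.0 - g) * previous_reference


def second_minus_reference(first, second, previous_reference, g):
    """Difference on which the comparison is based: the second stimulus minus the updated reference."""
    reference = update_reference(previous_reference, first, g)
    return second - reference, reference


# Hypothetical trial: a previous reference of 100 ms, then a 110-ms first and a 100-ms second stimulus.
diff, ref = second_minus_reference(first=110.0, second=100.0, previous_reference=100.0, g=0.25)
print(diff, ref)  # -2.5 102.5
```

In this sketch, a 10-ms change in the first stimulus shifts the comparison by only $g$ times that change (here 2.5 ms), whereas a change in the second stimulus enters in full. Varying the first stimulus (order Co–St) should therefore yield poorer discrimination than varying the second, which corresponds to a negative SPE. With $0 < g < 1$, the sketch cannot produce the reverse.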

One problem seems to be that most researchers, whether for convenience or by tradition, concentrate on a few experimental conditions (in particular, avoiding brief interstimulus intervals) in which the internal reference model and similar models are usually able to describe the results. One exception is Hellström and Rammsayer (2004), who used a 50-ms standard and interstimulus intervals of 100 through 2,700 ms and obtained consistently positive SPEs. Another exception is the recent study by Bausenhart, Dyjas, and Ulrich (2015). Using an adaptive method, they found negative SPEs for filled auditory intervals with 100-ms and 1,000-ms standards and an interstimulus interval of 1,000 ms. With a 100-ms standard, the SPE vanished when the interstimulus interval was shortened to 300 ms, but it became positive for only a few participants.

A model related to the internal reference model is that of Raviv, Ahissar, and Loewenstein (2012). On the basis of the observer employing Bayesian inference, they suggested that a heuristic is used in which “decision in the two tone [frequency] discrimination task is based on a comparison between the second tone and an exponentially decaying average of the first tone and past tones” (p. 1). With this kind of model, the authors accounted for the “contraction bias” in their experiment, that is, TOEs that are positive for low and negative for high magnitudes. (In their experiment, unlike those of Bausenhart et al., 2015, both stimuli were varied, which made the TOEs visible.) This explanation may be plausible in most experimental conditions. It is not plausible, however, in the conditions of Hellström (1979, 2003), which used brief, but by no means extreme, stimulus durations and interstimulus intervals. In such conditions, a straightforward application of Bayesian inference regarding the first-presented stimulus does not seem fruitful. The sensation-weighting model is built on the conception of the comparison process as “a two-way, interactive process that involves the memory representations of both stimuli” (Hellström, 1979, p. 469). It was instead developed to account for these kinds of results as well. Its development was based on a wide range of conditions, and on a design more suitable for quantitative modeling than designs in which one of the stimuli is always at the standard level.

SPEs are known from earlier research: The psychometric functions and DLs for the orders St–Co and Co–St often differ, as measured with adaptive as well as nonadaptive psychophysical methods. Also, as shown by, for instance, Rammsayer (2014), using a random order of standard and comparison stimuli (“2AFC task”) yields larger DLs than using only the St–Co order (“reminder task”). Ulrich and Vorberg (2009) demonstrated that DL measures obtained with a random order become biased in the presence of TOEs and SPEs. They therefore strongly recommended determining DLs separately for the two stimulus orders. That estimates of the DL can differ considerably between the two orders was also demonstrated by Hellström (2000) and by Hellström and Rammsayer (2004) (see Eisler et al., 2008).

Against this background, we considered it important to further explore the comparison and discrimination of time intervals, and in particular how the SPEs might depend on duration range (around 100 ms and around 1,000 ms), sensory modality (auditory, visual), and interval type (filled, empty). Of particular theoretical importance was attempting to confirm the previous (Hellström & Rammsayer, 2004) observations of positive SPEs as well as of TOEs, because these are excluded by both the internal reference model and the difference model.

As in the study by Hellström and Rammsayer (2004), a quantitative modeling approach based on the sensation-weighting model was used. This approach by no means excluded the possibility of ending up selecting the difference model (with or without bias), the internal reference model, or some variant of the latter, including the Michels–Helson model. All of these models can be seen as special cases of the more general sensation-weighting model.

Experiment 1

In this experiment, auditory as well as visual interval durations were compared using an adaptive psychophysical method. Standard durations of 100 and 1,000 ms were used, with an interstimulus interval of 900 ms. The between-groups factors were Interval Type (filled, empty) and Correctness Feedback (with, without). The within-participants (blocked) factors were Interval Modality (auditory, visual), Standard Duration (100 ms, 1,000 ms), and the Magnitude Relation of the standard and comparison stimuli, as described under Procedure.

Method

Participants

The participants were 21 male and 91 female undergraduate psychology students at the University of Bern, ranging in age from 18 to 47 years (M = 22.8, SD = 4.59). They received course credit for participating in this study. All participants were naïve as to the purpose of this study and reported normal hearing and normal or corrected-to-normal vision. The study was approved by the local ethics committee.

Apparatus and stimuli

The time intervals were marked by auditory or visual stimuli. The presentation of the intervals and the recording of participants’ responses were computer controlled. The timing accuracy of stimulus presentation was better than ±1 ms. The filled-auditory stimuli were white-noise bursts presented binaurally through headphones (Sony CD 450) at an intensity of 66 dBA. The empty-auditory intervals were bounded by 3-ms white-noise bursts at an intensity of 88 dBA. These different intensity levels were chosen to equate loudness in the two conditions. The choice was based on a pilot experiment in which 12 participants adjusted the loudness of a 3-ms white-noise burst until it matched that of a 100-ms burst. Visual stimuli were generated by a red light-emitting diode (LED; diameter 0.38°, viewing distance 60 cm, luminance 68 cd/m²) positioned at the participant’s eye level. The intensity of the LED was clearly above threshold, but not dazzling. Empty visual intervals were marked by 3-ms LED flashes.

Procedure

Using a weighted up–down procedure (Kaernbach, 1991; see Hellström & Rammsayer, 2004), the participants compared the durations of two successive intervals. The between-groups factors were Interval Type (filled, empty) and Feedback (with, without). The within-participants factors were Modality (auditory, visual) and Standard Duration (100 ms, 1,000 ms). Twenty-eight participants were randomly assigned to each of the four experimental conditions: filled–feedback, filled–no feedback, empty–feedback, and empty–no feedback. Each participant took part in one session consisting of eight blocks, with a 1-min break between blocks. A block was composed of 64 trials. Four consecutive blocks were presented with auditory stimuli, and four with visual stimuli. Half of the participants started with the four auditory blocks, and the other half with the four visual blocks. The auditory and visual parts of the session were preceded by six practice trials, to ensure that the participants understood the instructions and to familiarize them with the stimuli.

In two blocks for each sensory modality, the standard was 100 ms, and in two, 1,000 ms. For each combination of modality and standard, in one block the comparison stimulus was longer than the standard (L-block), and in the other it was shorter (S-block). The order of these block types was balanced across participants. Presentation orders of the standard and comparison stimuli were randomized and balanced within each block; each order was used in half of the trials. Using a notation in which the order of standard (St) and comparison (Co) stimuli indicates their presentation order, and “<” and “>” indicate their magnitude relation, an L-block was, equally often, St < Co or Co > St, and an S-block was, equally often, St > Co or Co < St.

For empty auditory intervals, only the 100-ms standard was used. This was done because in a pilot study, many participants had difficulties with empty auditory intervals in the 1,000-ms range. They reported primarily perceiving a sequence of four auditory clicks rather than two empty intervals separated by an interstimulus interval. This was probably due to confusion of the 1,000-ms stimulus interval with the 900-ms interstimulus interval (cf. Grondin, Meilleur-Wells, Ouellette, & Macar, 1998), transforming the task into one of rhythm perception.

The participant was seated at a table with a keyboard and a computer monitor in a sound-attenuated and dimly lit room. To initiate the first trial, the participant pressed the space bar; the first stimulus was then presented after 900 ms. After the second stimulus, the participant responded by pressing one of two designated keys on a computer keyboard, labeled “first interval longer” and “second interval longer.” Accuracy, not speed, was emphasized in the instructions. In the feedback conditions, “+” (correct) or “–” (false) was displayed on the monitor screen for 1,500 ms after the response. The next trial started 900 ms after presentation of the feedback, and 900 ms after the participant’s response in the no-feedback conditions.

When the standard was 100 (or 1,000) ms, the initial duration of the comparison stimulus was 30 (or 400) ms below or above that of the standard. The comparison stimulus was then changed using an adaptive rule, the weighted up–down method (Kaernbach, 1991), to estimate the upper and lower 75% difference thresholds. When the comparison stimulus was shorter than the standard, it was increased by 6 (or 25) ms when it was judged to be shorter than the standard, and decreased by 18 (or 75) ms when it was judged to be longer than the standard; its longest possible value was 1 ms below the standard. When the comparison stimulus was longer than the standard, it was decreased by 6 (or 25) ms after it was judged to be longer than the standard, and increased by 18 (or 75) ms after it was judged to be shorter than the standard; its shortest possible value was 1 ms above the standard.
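The step rules above can be summarized as a single update function. The sketch below is a hypothetical helper, and its name and arguments are ours. It applies the 6/18-ms or 25/75-ms steps and the 1-ms bounds described in the text, and it assumes that each response has already been coded as whether the comparison was judged longer than the standard. The text states no lower bound for comparison stimuli shorter than the standard, so the sketch imposes none.

```python
def next_comparison(comparison, standard, judged_longer, small_step, large_step):
    """Return the next comparison duration (ms) under the weighted up-down rules.

    Hypothetical helper. small_step/large_step are 6/18 ms for the 100-ms standard
    and 25/75 ms for the 1,000-ms standard (a 1:3 ratio).
    judged_longer: True if the comparison was judged longer than the standard.
    """
    if comparison == standard:
        raise ValueError("the comparison never equals the standard in this procedure")
    if comparison < standard:                 # S-block: comparison shorter than the standard
        if judged_longer:                     # incorrect response: move away from the standard
            comparison -= large_step
        else:                                 # correct response: move toward the standard
            comparison += small_step
        return min(comparison, standard - 1)  # longest allowed value: 1 ms below the standard
    if judged_longer:                         # L-block, correct response: move toward the standard
        comparison -= small_step
    else:                                     # L-block, incorrect response: move away from the standard
        comparison += large_step
    return max(comparison, standard + 1)      # shortest allowed value: 1 ms above the standard


c = 70  # initial S-block value for the 100-ms standard (30 ms below it)
for judged_longer in [False, False, True]:
    c = next_comparison(c, 100, judged_longer, 6, 18)
print(c)  # 64
```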

Data analysis

Discrimination performance

With the weighted up–down procedure of Kaernbach (1991), using a 3:1 ratio between the sizes of upward and downward steps, performance settles at 75% correct responses. [This happens because, given the current absolute duration difference between the comparison stimulus and the standard, $D_i$, its expected value on the next trial is $p(D_i - x) + (1 - p)(D_i + 3x) = D_i + 4x(.75 - p)$, where $x$ and $3x$ are the step sizes and $p$ is the probability of a correct response. The expected change is zero only when $p = .75$.] The corresponding duration difference ($D_{75}$) between the comparison stimulus and the standard was computed, for each block and presentation order, as the mean absolute difference in milliseconds (i.e., the mean of the upper and lower 75% thresholds) between the comparison stimulus and the standard across the last 20 trials. To measure discrimination uncontaminated by order effects, each participant’s arithmetic means of $D_{75}$ for St < Co and St > Co, and for Co < St and Co > St, were then computed. These averages indicate the DLs, $\mathrm{DL}_{\text{St–Co}}$ and $\mathrm{DL}_{\text{Co–St}}$.
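The convergence argument can also be checked by simulation. The sketch below tracks the absolute comparison–standard difference for a hypothetical observer whose probability of a correct response rises with that difference. The exponential psychometric function, its `scale`, the 2-ms step (smaller than the 6-ms step used in the experiment, so that the 1-ms floor rarely interferes), and the starting value are illustrative assumptions, not values from the experiment. Whatever psychometric function is assumed, the long-run proportion of correct responses approaches .75. This holds because each error must be offset by three correct responses for the difference to stay bounded.

```python
import math
import random


def expected_next_difference(d, p_correct, x):
    """Expected absolute difference after one trial: correct -> d - x, error -> d + 3x."""
    return p_correct * (d - x) + (1.0 - p_correct) * (d + 3.0 * x)


def hypothetical_p_correct(d, scale=30.0):
    """Hypothetical psychometric function of the absolute difference d (ms): .5 at d = 0, rising toward 1."""
    return 0.5 + 0.5 * (1.0 - math.exp(-d / scale))


def simulate_staircase(n_trials=20000, d0=30.0, x=2.0, floor=1.0, seed=1):
    """Simulate the 1:3 weighted up-down rule on the absolute comparison-standard difference."""
    rng = random.Random(seed)
    d = d0
    n_correct = 0
    trajectory = []
    for _ in range(n_trials):
        correct = rng.random() < hypothetical_p_correct(d)
        n_correct += correct
        d = d - x if correct else d + 3.0 * x
        d = max(d, floor)  # the comparison never comes closer than 1 ms to the standard
        trajectory.append(d)
    return n_correct / n_trials, trajectory


prop_correct, trajectory = simulate_staircase()
d75_estimate = sum(trajectory[-20:]) / 20   # mean over the last 20 trials, as in the procedure
print(round(prop_correct, 3))               # should be close to .75
print(round(d75_estimate, 1), round(30.0 * math.log(2.0), 1))  # estimate vs. the observer's true 75% point
```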

Modeling approach

To model the participants’ comparison behavior, Hellström’s (1979, 1985, 2000, 2003) sensation-weighting model was applied and adapted to the experimental situation. It was assumed that the psychophysical function was linear, $\psi = \phi$, over the range of comparison stimuli for each standard duration. No assumption was made concerning the shape of the psychophysical function across standard durations. It was also assumed that for each block type, the two reference levels were equal, $\phi_{\mathrm{r}1} = \phi_{\mathrm{r}2} = \phi_{\mathrm{r}}$ (see Hellström, 2000), but that $\phi_{\mathrm{r}}$ might differ between L- and S-blocks. (In L-blocks, the mean duration of the comparison stimulus was about two DL units longer than in S-blocks. Because the standard was presented on every trial, the mean stimulus duration was thereby about one DL unit longer.) Substituting these assumptions into Eq. 1 yielded

$$ d_{12} = s_1\phi_1 - s_2\phi_2 + \left(s_2 - s_1\right)\phi_{\mathrm{r}T} + b, $$
(2)

where T = L or S, according to block type.
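Eq. 2 follows from Eq. 1 in one step. With $\psi = \phi$ and a common reference level $\phi_{\mathrm{r}T}$ for both stimuli, the two reference terms combine, because $(1 - s_1) - (1 - s_2) = s_2 - s_1$:

$$ d_{12} = \left[s_1\phi_1 + \left(1 - s_1\right)\phi_{\mathrm{r}T}\right] - \left[s_2\phi_2 + \left(1 - s_2\right)\phi_{\mathrm{r}T}\right] + b = s_1\phi_1 - s_2\phi_2 + \left(s_2 - s_1\right)\phi_{\mathrm{r}T} + b. $$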

It was assumed that each measured DL corresponds to the same absolute value on the scale of the subjective duration difference $d_{12}$. Given the assumption $\psi = \phi$, it thereby also corresponds to the same value on the scale of the equivalent physical duration difference between the comparison stimulus and the standard. This physical difference, ∆ϕ, then represents the “true” DL, unaffected by order effects. Expressing each comparison stimulus in Eq. 2 as a function of the standard and the pertinent DL yields:

$$ \text{L-block, St} < \text{Co}: \quad -\Delta\phi = s_1\,\mathrm{St} - s_2\left(\mathrm{St} + \mathrm{DL}_{\mathrm{St}<\mathrm{Co}}\right) + \left(s_2 - s_1\right)\phi_{\mathrm{r}L} + b, $$
(3)
$$ \text{L-block, Co} > \text{St}: \quad \Delta\phi = s_1\left(\mathrm{St} + \mathrm{DL}_{\mathrm{Co}>\mathrm{St}}\right) - s_2\,\mathrm{St} + \left(s_2 - s_1\right)\phi_{\mathrm{r}L} + b, $$
(4)
$$ \text{S-block, St} > \text{Co}: \quad \Delta\phi = s_1\,\mathrm{St} - s_2\left(\mathrm{St} - \mathrm{DL}_{\mathrm{St}>\mathrm{Co}}\right) + \left(s_2 - s_1\right)\phi_{\mathrm{r}S} + b, $$
(5)
$$ \text{S-block, Co} < \text{St}: \quad -\Delta\phi = s_1\left(\mathrm{St} - \mathrm{DL}_{\mathrm{Co}<\mathrm{St}}\right) - s_2\,\mathrm{St} + \left(s_2 - s_1\right)\phi_{\mathrm{r}S} + b. $$
(6)

Setting $s_1 = s_2 = 1$ and $b = 0$ shows that, in the absence of weighting and bias, each DL would equal ∆ϕ.
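Eqs. 3–6 can be solved for the four DLs, which makes explicit how the weights, the reference levels, and the bias shape each one. The following sketch does this algebra numerically. The helper names are hypothetical, and all parameter values in the usage example are illustrative.

```python
def predicted_dls(delta_phi, st, s1, s2, phi_rL, phi_rS, b=0.0):
    """Solve Eqs. 3-6 for the four DLs (hypothetical helper)."""
    w = s2 - s1
    return {
        "St<Co": (delta_phi + w * (phi_rL - st) + b) / s2,  # Eq. 3, L-block
        "Co>St": (delta_phi - w * (phi_rL - st) - b) / s1,  # Eq. 4, L-block
        "St>Co": (delta_phi - w * (phi_rS - st) - b) / s2,  # Eq. 5, S-block
        "Co<St": (delta_phi + w * (phi_rS - st) + b) / s1,  # Eq. 6, S-block
    }


def order_dls(dls):
    """Average the L- and S-block DLs within each presentation order."""
    dl_st_co = (dls["St<Co"] + dls["St>Co"]) / 2.0
    dl_co_st = (dls["Co<St"] + dls["Co>St"]) / 2.0
    return dl_st_co, dl_co_st


# Illustrative parameters: 100-ms standard, true DL of 10 ms, s1 < s2.
dls = predicted_dls(delta_phi=10.0, st=100.0, s1=0.8, s2=1.0, phi_rL=100.0, phi_rS=100.0)
print(order_dls(dls))  # the St-Co DL is smaller than the Co-St DL
```

When the L- and S-block DLs are averaged within each order, $b$ cancels. This gives $\mathrm{DL}_{\text{St–Co}} = [2\Delta\phi + (s_2 - s_1)(\phi_{\mathrm{r}L} - \phi_{\mathrm{r}S})]/(2s_2)$ and $\mathrm{DL}_{\text{Co–St}} = [2\Delta\phi - (s_2 - s_1)(\phi_{\mathrm{r}L} - \phi_{\mathrm{r}S})]/(2s_1)$. When the two reference levels are close, the ratio of the two DLs approaches $s_1/s_2$. Thus, $s_1 < s_2$ implies a smaller DL for St–Co, that is, a negative SPE.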

Difference limens

For each participant, the DLs for the orders standard–comparison (St – Co) and comparison–standard (Co – St) were calculated as follows:

$$ {\mathrm{DL}}_{\mathrm{St}\hbox{--} \mathrm{C}\mathrm{o}} = \left({\mathrm{DL}}_{\mathrm{St}<\mathrm{C}\mathrm{o}} + {\mathrm{DL}}_{\mathrm{St}>\mathrm{C}\mathrm{o}}\right)/2, $$
(7)
$$ {\mathrm{DL}}_{\mathrm{Co}\hbox{--} \mathrm{St}} = \left({\mathrm{DL}}_{\mathrm{Co}<\mathrm{St}} + {\mathrm{DL}}_{\mathrm{Co}>\mathrm{St}}\right)/2. $$
(8)

Because the DL distributions across participants were positively skewed, and for comparability of the effects between the two standard durations, the Weber fraction (WF) was computed as DL/St, and lnWF was used to assess discriminability. To assess differential discriminability in the two orders, the SPE was quantified as $100 \cdot \ln(\mathrm{DL}_{\text{St–Co}}/\mathrm{DL}_{\text{Co–St}})$, which can be interpreted as a symmetric percent difference (Graff, 2014). This measure assigns the correct sign to the SPE: It is negative when discrimination is better in the order St–Co.
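The two derived measures are simple transformations of the DLs. The sketch below computes lnWF and the SPE; the function names are hypothetical. Because the standard is the same in both orders, the SPE equals 100 times the difference between the two lnWFs. Swapping the orders flips the sign of the SPE without changing its size, which is what makes it a symmetric percent difference.

```python
import math


def ln_weber_fraction(dl, standard):
    """Natural log of the Weber fraction DL/St."""
    return math.log(dl / standard)


def spe_percent(dl_st_co, dl_co_st):
    """SPE as a symmetric percent difference: 100 * ln(DL_St-Co / DL_Co-St)."""
    return 100.0 * math.log(dl_st_co / dl_co_st)


# Hypothetical DLs (ms) for a 100-ms standard, with better discrimination in the order St-Co.
print(round(ln_weber_fraction(8.0, 100.0), 3), round(ln_weber_fraction(10.0, 100.0), 3))
print(round(spe_percent(8.0, 10.0), 2))   # -22.31: negative, as defined by Dyjas and Ulrich (2014)
print(round(spe_percent(10.0, 8.0), 2))   # 22.31: same size, opposite sign
```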

Results

Natural log of the Weber fraction (lnWF)

The means of lnWF are shown in Figs. 1 and 2. Because values were missing for the 1,000-ms standard in the empty-auditory condition, neither an overarching analysis of variance (ANOVA) for all conditions nor an ANOVA for all auditory conditions could be conducted. For the visual conditions, the values of lnWF for each type of interval (filled, empty) were submitted to a mixed ANOVA with Standard Duration (100 ms, 1,000 ms) and Order (St–Co, Co–St) as within-participants factors and Feedback (with, without) as a between-participants factor.

Fig. 1

Experiment 1. Empty visual and filled auditory intervals: Mean of natural log Weber fraction versus standard duration. Error bars indicate standard errors of the means. The right axis shows the corresponding geometric means of the Weber fractions. The plots illustrate positive standard-position effects (SPEs) for the 100-ms standard duration and negative SPEs for the 1,000-ms standard duration

Fig. 2

Experiment 1. Filled visual and empty auditory intervals: Mean of natural log Weber fraction versus standard duration. Error bars indicate standard errors of the means. The right axis shows the corresponding geometric means of the Weber fractions. The plots illustrate negative standard-position effects (SPEs)

For the filled auditory intervals (see Fig. 1), the effect of feedback was significant, F(1, 54) = 5.355, p = .025, ηp² = .090. With feedback, lnWFs were lower (M = −2.022, SEM = 0.063) than without feedback (M = −1.815, SEM = 0.064), corresponding to a DL reduction of 18.7%. The Duration × Order interaction was also significant, F(1, 54) = 7.515, p = .008, ηp² = .122. The other interactions and the main effects of duration and order were not significant. Separate mixed ANOVAs (with the factors Order and Feedback) were conducted for St = 100 ms and St = 1,000 ms. For St = 100 ms, the effect of order was significant, F(1, 54) = 5.101, p = .028, ηp² = .086: lnWFs were higher for St–Co (M = −1.826, SEM = 0.060) than for Co–St (M = −1.935, SEM = 0.065), yielding a DL ratio of 1.115 and an SPE of +10.91% (SE = 4.83%). For St = 1,000 ms, the effect of order was not significant, p = .167.
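Because the analyses were run on lnWF, a difference between two mean lnWFs back-transforms into a ratio of geometric-mean WFs (and, at a fixed standard, of DLs). As an arithmetic check rather than a reanalysis, the sketch below recomputes two of the figures reported in the preceding paragraph from the reported means. The helper name is hypothetical. The SPE itself (+10.91%) was averaged over participants, so it need not match 100 times the difference of the means exactly.

```python
import math


def ratio_from_mean_lnwf(mean_a, mean_b):
    """Ratio of geometric-mean Weber fractions implied by two mean lnWFs."""
    return math.exp(mean_a - mean_b)


# Mean lnWFs reported above for filled auditory intervals.
feedback_ratio = ratio_from_mean_lnwf(-2.022, -1.815)     # with vs. without feedback
dl_reduction_percent = 100.0 * (1.0 - feedback_ratio)      # reported as 18.7%
order_ratio_100ms = ratio_from_mean_lnwf(-1.826, -1.935)   # St-Co vs. Co-St at St = 100 ms, reported as 1.115
print(round(dl_reduction_percent, 1), round(order_ratio_100ms, 3))  # 18.7 1.115
```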

For the empty auditory intervals (see Fig. 2), only a 100-ms standard was used. The lnWFs were submitted to a mixed ANOVA, with Order (St – Co, Co – St) as a within-participants factor and Feedback (with, without) as a between-participants factor. No significant effects were found (all ps > .05).

For the filled visual intervals (see Fig. 2), the effect of feedback was significant, F(1, 54) = 9.443, p = .003, ηp² = .149. With feedback, lnWFs were lower (M = −1.416, SEM = 0.051) than without feedback (M = −1.195, SEM = 0.051), corresponding to a DL reduction of 19.8%. We observed a significant effect of duration, F(1, 54) = 147.929, p < .001, ηp² = .733. For St = 100 ms, lnWFs were higher (M = −1.043, SEM = 0.041) than for St = 1,000 ms (M = −1.567, SEM = 0.043), corresponding to a WF ratio of 1.69. We also found a significant effect of order, F(1, 54) = 10.651, p = .002, ηp² = .165. For St–Co, lnWFs were lower (M = −1.378, SEM = 0.043) than for Co–St (M = −1.232, SEM = 0.042). None of the interactions were significant.

For the empty visual intervals (see Fig. 1), the effect of feedback was nonsignificant. A significant effect of duration emerged, F(1, 54) = 93.375, p < .001, ηp² = .634: For St = 100 ms, lnWFs were higher (M = −1.038, SEM = 0.057) than for St = 1,000 ms (M = −1.585, SEM = 0.064). There was also a significant effect of order, F(1, 54) = 6.163, p = .016, ηp² = .102. The interactions were nonsignificant, except for Duration × Order, F(1, 54) = 54.306, p < .001, ηp² = .501. Separate mixed ANOVAs (with the factors Order and Feedback) were conducted for St = 100 ms and St = 1,000 ms. For St = 100 ms, the effect of order was significant, F(1, 54) = 34.818, p < .001, ηp² = .392: lnWFs were higher for St–Co (M = −0.772, SEM = 0.050) than for Co–St (M = −1.304, SEM = 0.089), yielding a DL ratio of 1.703 and an SPE of +53.24% (SE = 9.02%). For St = 1,000 ms, the effect of order was likewise significant, F(1, 54) = 21.653, p < .001, ηp² = .286: lnWFs were lower for St–Co (M = −1.720, SEM = 0.067) than for Co–St (M = −1.450, SEM = 0.074), yielding a DL ratio of 0.763 and an SPE of −27.01% (SE = 5.81%).

For all of the auditory conditions with standard duration 100 ms, the lnWF values were submitted to a mixed ANOVA with Order (St – Co, Co – St) as a within-participants factor and Interval Type (filled, empty) and Feedback (with, without) as between-participants factors. The significant effects were those of order, F(1, 220) = 11.691, p < .001, ηp² = .050; interval type, F(1, 220) = 18.684, p < .001, ηp² = .078; and feedback, F(1, 220) = 6.375, p = .012, ηp² = .028; as well as the Order × Interval Type interaction, F(1, 220) = 16.209, p < .001, ηp² = .069.

For all of the visual conditions, the lnWF values were submitted to a mixed ANOVA with Standard Duration (100 ms, 1,000 ms) and Order (St – Co, Co – St) as within-participants factors and Interval Type (filled, empty) and Feedback (with, without) as between-participants factors. The significant effects were those of feedback, F(1, 108) = 8.966, p = .003, ηp² = .077, and standard duration, F(1, 108) = 226.722, p < .001, ηp² = .677; the significant interactions were Order × Interval Type, F(1, 108) = 16.037, p < .001, ηp² = .129; Standard Duration × Order, F(1, 108) = 33.564, p < .001, ηp² = .237; and Standard Duration × Order × Interval Type, F(1, 108) = 35.408, p < .001, ηp² = .247. The three-way interaction confirmed that standard duration affected the SPE differently for filled and empty visual intervals.

For all of the filled conditions, the lnWF values were submitted to a mixed ANOVA with Modality (auditory, visual), Standard Duration (100 ms, 1,000 ms), and Order (St – Co, Co – St) as within-participants factors and Feedback (with, without) as a between-participants factor. The significant effects were those of feedback, F(1, 54) = 8.321, p = .006, ηp² = .135; modality, F(1, 54) = 348.352, p < .001, ηp² = .865; duration, F(1, 54) = 56.630, p < .001, ηp² = .512; and order, F(1, 54) = 4.279, p = .043, ηp² = .073; along with the Modality × Duration, F(1, 54) = 64.467, p < .001, ηp² = .544, and Modality × Order, F(1, 54) = 7.197, p = .0097, ηp² = .118, interactions. The Modality × Duration × Order interaction did not reach significance (p = .088).

For all of the conditions with St = 100 ms, the lnWF values were submitted to a mixed ANOVA with Modality (auditory, visual) and Order (St – Co, Co – St) as within-participants factors and Interval Type (filled, empty) and Feedback (with, without) as between-participants factors. The significant effects were those of modality, F(1, 108) = 185.039, p < .001, ηp² = .631; order, F(1, 108) = 16.123, p < .001, ηp² = .130; interval type, F(1, 108) = 18.542, p < .001, ηp² = .147; and feedback, F(1, 108) = 6.326, p = .013, ηp² = .055; the significant interactions were Modality × Interval Type, F(1, 108) = 57.231, p < .001, ηp² = .346; Order × Interval Type, F(1, 108) = 22.353, p < .001, ηp² = .171; Modality × Order, F(1, 108) = 4.157, p = .044, ηp² = .037; and Modality × Order × Interval Type, F(1, 108) = 33.940, p < .001, ηp² = .239.
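Each ηp² reported in these analyses can be recovered from its F ratio and degrees of freedom through the standard identity ηp² = F·df_effect / (F·df_effect + df_error), which gives a quick consistency check on the reported statistics. The Python sketch below is such a check, not the authors' analysis code; the function name is hypothetical. For the three effects listed, the recomputed values agree with the reported ones to three decimals.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# (F, df_effect, df_error, reported eta_p^2) for a few effects reported above
reported = [
    (9.443, 1, 54, 0.149),     # feedback, filled visual intervals
    (54.306, 1, 54, 0.501),    # Duration x Order, empty visual intervals
    (185.039, 1, 108, 0.631),  # modality, all conditions with St = 100 ms
]
for f, d1, d2, eta in reported:
    print(f"F({d1}, {d2}) = {f}: eta_p^2 = {partial_eta_squared(f, d1, d2):.3f} (reported {eta})")
```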

Time-order error

The TOE was defined as the difference in physical stimulus magnitudes (ϕ2 − ϕ1) that yielded d = 0. It was estimated as follows for the two presentation orders:

$$ \mathrm{TOE}_{\text{St–Co}} = \left(\mathrm{DL}_{\mathrm{St}<\mathrm{Co}} - \mathrm{DL}_{\mathrm{St}>\mathrm{Co}}\right)/2, $$
(9)
$$ \mathrm{TOE}_{\text{Co–St}} = \left(\mathrm{DL}_{\mathrm{Co}<\mathrm{St}} - \mathrm{DL}_{\mathrm{Co}>\mathrm{St}}\right)/2. $$
(10)

The mean of $\mathrm{TOE}_{\text{St–Co}}$ and $\mathrm{TOE}_{\text{Co–St}}$, expressed as a percentage of the standard, was termed TOE%. The means of TOE% are plotted in Figs. 3–5. For all interval types, the mean TOE% was negative for St = 1,000 ms and positive for St = 100 ms. The values of TOE% were analyzed by mixed ANOVAs with Duration and Order as within-participants factors and Feedback as a between-participants factor.
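Eqs. 9 and 10 and the definition of TOE% amount to a few lines of arithmetic. The Python sketch below implements them as stated; the function names and the one-sided DL values are hypothetical, chosen only to illustrate the computation, and are not data from the experiments.

```python
def toe_st_co(dl_st_lt_co: float, dl_st_gt_co: float) -> float:
    """Eq. 9: TOE for the order St-Co from its two one-sided DLs."""
    return (dl_st_lt_co - dl_st_gt_co) / 2


def toe_co_st(dl_co_lt_st: float, dl_co_gt_st: float) -> float:
    """Eq. 10: TOE for the order Co-St from its two one-sided DLs."""
    return (dl_co_lt_st - dl_co_gt_st) / 2


def toe_percent(toe_a: float, toe_b: float, standard: float) -> float:
    """TOE%: mean of the two order-specific TOEs as a percentage of the standard."""
    return 100 * (toe_a + toe_b) / 2 / standard


# Hypothetical one-sided DLs (ms) for a 100-ms standard, for illustration only
t1 = toe_st_co(dl_st_lt_co=30.0, dl_st_gt_co=22.0)
t2 = toe_co_st(dl_co_lt_st=26.0, dl_co_gt_st=24.0)
print(t1, t2, toe_percent(t1, t2, standard=100.0))  # 4.0 1.0 2.5
```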

Fig. 3. Experiment 1. Filled and empty auditory intervals: Mean of TOE% versus standard duration. Error bars indicate standard errors of the means.

Fig. 4. Experiment 1. Filled visual intervals: Mean of TOE% versus standard duration. Error bars indicate standard errors of the means.

For the filled auditory intervals (see Fig. 3), the effect of duration was significant, F(1, 54) = 23.781, p < .001, ηp² = .306. The effect of order was also significant, F(1, 54) = 4.137, p = .047, ηp² = .071. Of the interactions, only Duration × Order was significant, F(1, 54) = 4.160, p = .046, ηp² = .072. Separate mixed ANOVAs (with the factors Order and Feedback) were therefore conducted for St = 100 ms and St = 1,000 ms. For St = 100 ms, only the effect of order was significant, F(1, 54) = 5.126, p = .028, ηp² = .087 ($\mathrm{TOE\%}_{\text{St–Co}} > \mathrm{TOE\%}_{\text{Co–St}}$). For St = 1,000 ms, no significant effects were observed.

For the empty auditory intervals (see Fig. 3; St = 100 ms only), only the effect of order was significant, F(1, 54) = 6.765, p = .012, ηp² = .111. For the filled visual intervals (see Fig. 4), only the effect of duration was significant, F(1, 54) = 22.827, p < .001, ηp² = .297. For the empty visual intervals (see Fig. 5), the effects of duration, F(1, 54) = 28.724, p < .001, ηp² = .347, and of order, F(1, 54) = 7.943, p = .007, ηp² = .128, were significant, but none of the interactions were.

Fig. 5. Experiment 1. Empty visual intervals: Mean of TOE% versus standard duration. Error bars indicate standard errors of the means.

For all of the auditory conditions with standard duration 100 ms, the TOE% values were submitted to a mixed ANOVA with Order (St – Co, Co – St) as a within-participants factor and Interval Type (filled, empty) and Feedback (with, without) as between-participants factors. The only significant effect was that of order, F(1, 220) = 10.308, p = .002, ηp² = .045.

For all of the visual conditions, the values of TOE% were submitted to a mixed ANOVA with Standard Duration (100 ms, 1,000 ms) and Order (St – Co, Co – St) as within-participants factors and Interval Type (filled, empty) and Feedback (with, without) as between-participants factors. The significant effects were those of standard duration, F(1, 108) = 51.486, p < .001, ηp² = .323, and order, F(1, 108) = 7.209, p = .008, ηp² = .063, as well as the Order × Interval Type interaction, F(1, 108) = 4.495, p = .036, ηp² = .040.

For all of the filled conditions, the TOE% values were submitted to a mixed ANOVA with Modality (auditory, visual), Standard Duration (100 ms, 1,000 ms), and Order (St – Co, Co – St) as within-participants factors and Feedback (with, without) as a between-participants factor. The duration effect was significant, F(1, 54) = 33.659, p < .001, ηp² = .384, as were the Modality × Duration, F(1, 54) = 7.631, p = .008, ηp² = .124, and Modality × Duration × Order, F(1, 54) = 5.947, p = .018, ηp² = .099, interactions.

For all of the conditions with St = 100 ms, the TOE% values were submitted to a mixed ANOVA with Modality (auditory, visual) and Order (St – Co, Co – St) as within-participants factors and Interval Type (filled, empty) and Feedback (with, without) as between-participants factors. The significant effects were those of modality, F(1, 108) = 3.968, p = .049, ηp² = .035, and order, F(1, 108) = 8.243, p = .005, ηp² = .071.

Estimation of parameters in the sensation-weighting model

Estimation of s1/s2 from TOEs

From Eqs. 3 through 6, it follows that

$$ \mathrm{TOE}_{\text{St–Co}} = \left(1/s_2\right)\left\{\left(s_1 - s_2\right)\left[\mathrm{St} - \left(\phi_{\mathrm{rL}} + \phi_{\mathrm{rS}}\right)/2\right] + b\right\} $$
(11)

and

$$ \mathrm{TOE}_{\text{Co–St}} = \left(1/s_1\right)\left\{\left(s_1 - s_2\right)\left[\mathrm{St} - \left(\phi_{\mathrm{rL}} + \phi_{\mathrm{rS}}\right)/2\right] + b\right\}, $$
(12)

so that

$$ \mathrm{TOE}_{\text{St–Co}}/\mathrm{TOE}_{\text{Co–St}} = s_1/s_2, $$
(13)

which in principle yields an unbiased estimate of s1/s2, regardless of the relation between ϕrL and ϕrS and of a possible judgment bias (b). This estimate is, however, of limited practical use, especially for individual participants, because the two TOE values are highly variable, are often close to zero, and may even have opposite signs. The ratio of the group means of $\mathrm{TOE}_{\text{St–Co}}$ and $\mathrm{TOE}_{\text{Co–St}}$ may still be used to roughly estimate s1/s2 in conditions in which these means are sufficiently reliable and reasonably far from zero. This is illustrated below.
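Eqs. 11–13 can be checked numerically. The Python sketch below assumes a sensation-weighting form in which each stimulus is pulled toward a single reference level ϕr by its weight, d = s1ϕ1 + (1 − s1)ϕr − [s2ϕ2 + (1 − s2)ϕr] + b, with ϕr standing in for (ϕrL + ϕrS)/2. This form is our assumption (Eqs. 3–6 are not restated here), chosen because it reproduces Eqs. 11 and 12. The sketch finds the TOE for each order by bisection on d = 0, using hypothetical parameter values, and confirms that the TOE ratio equals s1/s2 even with a nonzero bias b.

```python
def weighted_difference(phi1, phi2, s1, s2, phi_r, b=0.0):
    """Subjective difference d under the assumed weighting form:
    each stimulus is combined with the reference level phi_r by weight s_i."""
    psi1 = s1 * phi1 + (1 - s1) * phi_r
    psi2 = s2 * phi2 + (1 - s2) * phi_r
    return psi1 - psi2 + b


def solve_toe(order, st, s1, s2, phi_r, b=0.0, lo=-500.0, hi=500.0):
    """TOE = phi2 - phi1 that yields d = 0, found by bisection.
    'St-Co': phi1 = St, phi2 = St + TOE; 'Co-St': phi1 = St - TOE, phi2 = St."""
    def d(toe):
        if order == "St-Co":
            return weighted_difference(st, st + toe, s1, s2, phi_r, b)
        return weighted_difference(st - toe, st, s1, s2, phi_r, b)

    # For positive weights, d decreases monotonically as the TOE increases
    for _ in range(200):
        mid = (lo + hi) / 2
        if d(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


s1, s2, st, phi_r, b = 0.9, 0.6, 100.0, 96.0, 2.0  # hypothetical values
toe_sc = solve_toe("St-Co", st, s1, s2, phi_r, b)
toe_cs = solve_toe("Co-St", st, s1, s2, phi_r, b)
print(round(toe_sc, 3), round(toe_cs, 3), round(toe_sc / toe_cs, 3))  # 5.333 3.556 1.5
```

Bisection is used here only so that the check does not rely on the closed-form Eqs. 11 and 12 it is meant to confirm.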

Estimation of s1/s2 from DLs

It follows from the sensation-weighting model that unless s1 = s2, the DLs differ between the presentation orders St – Co and Co – St, which means that an SPE occurs. Combining Eqs. 3 through 6 yields

$$ \mathrm{DL}_{\text{St–Co}} = \left(s_1/s_2\right)\mathrm{DL}_{\text{Co–St}} + \left(1 - s_1/s_2\right)\left(\phi_{\mathrm{rL}} - \phi_{\mathrm{rS}}\right), $$
(14)

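where $\mathrm{DL}_{\text{St–Co}} = (\mathrm{DL}_{\mathrm{St}>\mathrm{Co}} + \mathrm{DL}_{\mathrm{St}<\mathrm{Co}})/2$ and $\mathrm{DL}_{\text{Co–St}} = (\mathrm{DL}_{\mathrm{Co}>\mathrm{St}} + \mathrm{DL}_{\mathrm{Co}<\mathrm{St}})/2$.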

A blocked design was used, so that the reference level ϕr should be determined almost entirely by the stimuli in the current block. As noted earlier, with the present design the mean stimulus duration in L-blocks is about one DL unit longer than in S-blocks, which gives reason to hypothesize that ϕrL exceeds ϕrS by some proportion, u, of this amount.

We now express ϕrL − ϕrS as $u(\mathrm{DL}_{\text{St–Co}} + \mathrm{DL}_{\text{Co–St}})/2$. This yields $\mathrm{DL}_{\text{St–Co}} = \mathrm{DL}_{\text{Co–St}}\,[2k + u(1 - k)]/[2 - u(1 - k)]$, where k = s1/s2. For u = 1, this yields $\mathrm{DL}_{\text{St–Co}}/\mathrm{DL}_{\text{Co–St}} = 1$, and for u = 0 (i.e., for ϕrL = ϕrS),

$$ \mathrm{DL}_{\text{St–Co}}/\mathrm{DL}_{\text{Co–St}} = s_1/s_2. $$
(15)

If 0 < u < 1, this DL ratio lies between s1/s2 and 1.
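The DL ratio derived above can be tabulated directly to see how u moves it from s1/s2 (at u = 0) to 1 (at u = 1). A minimal Python sketch; the function name and the weight ratio k = 3.0 are hypothetical illustration values.

```python
def dl_ratio(k: float, u: float) -> float:
    """DL_St-Co / DL_Co-St when phi_rL - phi_rS = u * (DL_St-Co + DL_Co-St) / 2,
    with k = s1/s2. The denominator stays positive for k > 0 and 0 <= u <= 1."""
    return (2 * k + u * (1 - k)) / (2 - u * (1 - k))


for u in (0.0, 0.5, 1.0):
    print(u, round(dl_ratio(3.0, u), 3))
# 0.0 3.0
# 0.5 1.667
# 1.0 1.0
```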

In conditions that display sizable and reliable effects of stimulus order, Eqs. 11–14 are very useful for interpreting the WF and TOE data and for estimating the relations (i) of s1 to s2, (ii) of ϕrL to ϕrS, and (iii) of St to ϕr. This estimation is illustrated here only for the empty visual condition, in which s1/s2 was estimated from the TOE ratio (Eq. 13) as 3.074 and 0.323 for St = 100 ms and St = 1,000 ms, respectively. Using Eq. 14, together with the DL estimates for the orders St – Co and Co – St, ϕrL − ϕrS was estimated as 17.9 and 152.5 ms for St = 100 ms and St = 1,000 ms, respectively. These values can be compared with the corresponding geometric means of the DLs (35.4 and 205.0 ms) and indicate u values of .51 and .74. Finally, Eq. 11, together with the estimate of $\mathrm{TOE}_{\text{St–Co}}$, was used to estimate (ϕrL + ϕrS)/2 − b/(s1 − s2) as 96.2 and 961.3 ms. Assuming b = 0, we then arrive at these estimates: for St = 100 ms, ϕrL = 105.1 ms and ϕrS = 87.2 ms; for St = 1,000 ms, ϕrL = 1,037.6 ms and ϕrS = 885.0 ms. The resulting estimates of (ϕrL + ϕrS)/2, 96.2 and 961.3 ms, are about 4% below the corresponding standard durations. Interestingly, the TOEs in this case resemble the usual pattern in TOE studies (positive for St = 100 ms, negative for St = 1,000 ms), but according to the sensation-weighting model this pattern has a different cause: namely, a positive, nearly constant relative difference between the standard duration and the reference level, combined with differing weight ratios (s1/s2).
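The Eq. 14 step of this worked example can be retraced in a short Python sketch. It assumes that the DLs entering Eq. 14 are geometric-mean DLs, St·exp(mean lnWF), computed from the order-specific lnWF means reported for the empty visual condition, and that u is the spread ϕrL − ϕrS divided by the geometric mean of the two DLs, as in the comparison made above. The midpoints 96.2 and 961.3 ms are taken as given from the Eq. 11 step with b = 0. Because the published means are rounded, the sketch reproduces the reported values only to within rounding (for example, a spread of about 18.0 rather than 17.9 ms at St = 100 ms). The function name `reference_spread` is hypothetical.

```python
import math


def reference_spread(ln_wf_st_co, ln_wf_co_st, k, st):
    """Solve Eq. 14 for phi_rL - phi_rS (requires k = s1/s2 != 1).

    ln_wf_*: group-mean lnWFs for the two orders; st: standard in ms.
    Returns (phi_rL - phi_rS, u), with u = spread / geometric mean of the DLs.
    """
    dl_st_co = st * math.exp(ln_wf_st_co)  # geometric-mean DL, order St-Co
    dl_co_st = st * math.exp(ln_wf_co_st)  # geometric-mean DL, order Co-St
    spread = (dl_st_co - k * dl_co_st) / (1 - k)
    gm_dl = math.sqrt(dl_st_co * dl_co_st)
    return spread, spread / gm_dl


# Empty visual intervals, Experiment 1: (St, lnWF St-Co, lnWF Co-St, s1/s2, midpoint)
conditions = [(100, -0.772, -1.304, 3.074, 96.2),
              (1000, -1.720, -1.450, 0.323, 961.3)]
for st, ln1, ln2, k, midpoint in conditions:
    spread, u = reference_spread(ln1, ln2, k, st)
    # With b = 0, the midpoint is (phi_rL + phi_rS)/2 as estimated from Eq. 11
    phi_rl, phi_rs = midpoint + spread / 2, midpoint - spread / 2
    print(f"St = {st} ms: spread = {spread:.1f} ms, u = {u:.2f}, "
          f"phi_rL = {phi_rl:.1f} ms, phi_rS = {phi_rs:.1f} ms")
```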

“True” DL obscured by weighting

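By adding the expressions for Δϕ, the “true” DL, in Eqs. 3–6, we obtain $4\Delta\phi = s_1(\mathrm{DL}_{\mathrm{Co}<\mathrm{St}} + \mathrm{DL}_{\mathrm{Co}>\mathrm{St}}) + s_2(\mathrm{DL}_{\mathrm{St}<\mathrm{Co}} + \mathrm{DL}_{\mathrm{St}>\mathrm{Co}})$. This yields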

$$ \Delta\phi = \left(s_1\mathrm{DL}_{\text{Co–St}} + s_2\mathrm{DL}_{\text{St–Co}}\right)/2, $$
(16)

which means that each obtained DL depends not only on its underlying “true” value but also on the comparison procedure. For instance, in stimulus comparison with a constant interstimulus interval, as in the present experiments, the time interval between the onsets of the first and second stimuli (i.e., the stimulus onset interval) shortens as the stimuli become briefer. As earlier research has suggested (Hellström, 1979), this may lead to lower weights (s1 and s2) for the briefer stimuli. Because, by Eq. 16, a given Δϕ then corresponds to larger obtained DLs, the DLs increase, so that if Weber’s law holds for the “true” DLs, the obtained WFs would no longer be constant but would be larger for the briefer stimuli (see Hellström & Rammsayer, 2004).
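Eqs. 14 and 16 together determine both obtained DLs from the “true” DL Δϕ, the weights, and ϕrL − ϕrS. Solving them gives $\mathrm{DL}_{\text{St–Co}} = [\Delta\phi - (s_1 - s_2)(\phi_{\mathrm{rL}} - \phi_{\mathrm{rS}})/2]/s_2$ and $\mathrm{DL}_{\text{Co–St}} = [\Delta\phi + (s_1 - s_2)(\phi_{\mathrm{rL}} - \phi_{\mathrm{rS}})/2]/s_1$. The Python sketch below (function names and parameter values hypothetical) shows that for a fixed Δϕ, lowering both weights inflates both obtained DLs, which is the mechanism invoked here for briefer stimuli.

```python
def obtained_dls(true_dl, s1, s2, spread=0.0):
    """Order-specific DLs implied jointly by Eqs. 14 and 16.

    true_dl: the "true" DL (delta phi); spread: phi_rL - phi_rS.
    Returns (DL_St-Co, DL_Co-St).
    """
    half = (s1 - s2) * spread / 2
    return (true_dl - half) / s2, (true_dl + half) / s1


def true_dl_from_obtained(dl_st_co, dl_co_st, s1, s2):
    """Eq. 16: recover delta phi from the two obtained DLs and the weights."""
    return (s1 * dl_co_st + s2 * dl_st_co) / 2


# Same hypothetical true DL (10 ms) with higher vs. lower equal weights
for s in (0.9, 0.6):
    dl_sc, dl_cs = obtained_dls(10.0, s, s)
    print(s, round(dl_sc, 2), round(dl_cs, 2))
# 0.9 11.11 11.11
# 0.6 16.67 16.67
```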

Discussion

For TOEs, the results confirm the commonly found effect of stimulus level: negative TOEs for long durations but positive TOEs for short ones. However, this effect occurred even though the 100-ms and 1,000-ms standards were presented in separate blocks. For the empty visual condition, the detailed analysis above indicates that the effect was due to different weighting patterns for the 100-ms and 1,000-ms durations (which are also reflected in SPEs of opposite sign), rather than to the position of the standard duration relative to a common reference level. Whatever the detailed reason for the TOE in each condition, it clearly rules out an explanation in terms of simple judgment bias (see Hellström, 1978).

For the filled and empty visual intervals, but not for the filled auditory intervals, lnWF was markedly higher for the 100-ms than for the 1,000-ms standard. This suggests that although the mechanisms behind the temporal perception of auditory and of visual intervals may be the same (Grondin, 1993), those behind their comparison and discrimination may differ.

For filled auditory intervals, the TOE% and lnWF data confirmed earlier results (Hellström & Rammsayer, 2004). Feedback lowered WFs by 15%–20% but did not change the pattern of results, and all interactions involving feedback were nonsignificant. WFs were affected by whether the first or the second stimulus was varied: For filled auditory and empty visual intervals with the 100-ms standard, varying the first stimulus (i.e., Co – St) had a greater impact on the discriminative response, as shown by smaller WFs, than varying the second stimulus (i.e., St – Co). This means that a positive SPE was found. Negative SPEs were obtained as well: For filled auditory and, in particular, empty visual intervals, WFs were higher with the presentation order Co – St than with St – Co for the 1,000-ms standard. Correspondingly, the Order × Duration interactions were significant (p = .008 and p < .001, respectively). For empty auditory intervals (for which only the 100-ms standard was used), the WFs did not differ between orders.

The results of Experiment 1 are incompatible with the classic difference model, which is built on simple subtraction, thereby assuming that variation of each of the stimuli exerts an equal impact on the response, and therefore allows no SPEs. The results are also incompatible with the internal reference model, which (like the quite similar model of Michels & Helson, 1954) specifies a lower impact for the first stimulus, and therefore predicts only negative SPEs. Instead, the results suggest a more flexible ratio of impacts (i.e., weights) for the first and second stimuli, as is described by the sensation-weighting model (Hellström, 1979, 2003). The latter model is able to account for the positive as well as the negative SPEs. In addition, it accounts for the variation of TOEs across conditions, whereas the difference model and the internal reference model would have to invoke some kind of response bias as the basis for the TOE, and seek an explanation for the variation of this bias.

Experiment 2

In Experiment 1, for filled auditory and empty visual intervals, we found a marked difference in results between the 100-ms and 1,000-ms standards, in that an interaction between standard duration and stimulus order emerged for lnWF. These results confirmed those of Hellström and Rammsayer (2004), suggesting a transition between different weighting patterns somewhere between the duration levels of 100 and 1,000 ms. Therefore, Experiment 2 was designed to investigate the weighting patterns for four logarithmically equally spaced filled auditory intervals between 100 and 1,000 ms, using no feedback in order to exclude its possibly biasing influence (Jamieson & Petrusic, 1975, 1976). The number of participants was expected to give a level of measurement precision similar to that in Experiment 1.

Method

Participants

The participants were 14 male and 42 female undergraduate psychology students ranging in age from 19 to 40 years (M = 22.25, SD = 3.44). They received course credit for participating in this study. All participants were naïve about the purpose of this study and reported normal hearing and normal or corrected-to-normal vision. The study was approved by the local ethics committee.

Apparatus and stimuli

The same apparatus was used as in Experiment 1. Filled auditory intervals (white-noise bursts at 66 dBA) were presented with standard durations of 100, 215, 464, and 1,000 ms. No correctness feedback was given. The durations were logarithmically equally spaced: log10(duration in ms) = 2.00, 2.33, 2.67, and 3.00.
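The four standards follow from equal steps of 1/3 on the log10 scale between 100 and 1,000 ms. A short Python check, rounding to whole milliseconds as in the stimulus set:

```python
import math

# Four standards equally spaced on a log10 scale from 100 to 1,000 ms
durations = [10 ** (2 + i / 3) for i in range(4)]
print([round(d) for d in durations])                  # [100, 215, 464, 1000]
print([round(math.log10(d), 2) for d in durations])   # [2.0, 2.33, 2.67, 3.0]
```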

Procedure

The basic blocked procedure for stimulus presentation and threshold measurement was the same as in Experiment 1. Here, however, there were four block pairs, one for each of the four durations, with the order of durations balanced across participants. Each block pair consisted of one L-block and one S-block, as defined for Experiment 1. For half of the participants, each block pair always started with an L-block, and for the other half, always with an S-block. The session began with six practice trials.

Results

The basic calculations were the same as for Experiment 1. Two participants were excluded: one due to failure to follow the instructions, the other due to extremely poor discrimination for the 100-ms standard duration (upper threshold > 200 ms).

Natural log Weber fractions

The mean lnWFs are displayed in Fig. 6. The lnWFs were submitted to a repeated measures ANOVA, using the multivariate approach with Pillai’s tests, with Duration (100, 215, 464, 1,000 ms) and Order (St – Co, Co – St) as within-participants factors. The main effect of duration was significant, F(3, 51) = 11.386, p < .001, ηp² = .401, as were its linear trend, F(1, 53) = 22.363, p < .001, ηp² = .297, and its quadratic trend, F(1, 53) = 6.811, p = .012, ηp² = .114. The main effect of order was not significant. The Duration × Order interaction was significant, F(3, 51) = 5.852, p = .002, ηp² = .256; notably, the Quadratic × Linear trend component, reflected in the double intersection of the curves, was significant, F(1, 53) = 17.471, p < .001, ηp² = .248. Separate ANOVAs were therefore conducted for the two presentation orders. For the order St – Co, the effect of duration was significant, F(3, 51) = 6.915, p < .001, ηp² = .289; only the linear trend was significant, F(1, 53) = 18.286, p < .001, ηp² = .257. For the order Co – St, the effect of duration was likewise significant, F(3, 51) = 10.136, p < .001, ηp² = .374; the linear trend was significant, F(1, 53) = 8.541, p = .005, ηp² = .139, as was the quadratic trend, F(1, 53) = 17.952, p < .001, ηp² = .253.
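The single-df trend tests reported here (F(1, 53) with 54 participants) can be understood through the standard contrast approach: each participant's four scores, ordered by duration, are weighted by orthogonal polynomial coefficients for four equally spaced levels, and the trend F equals the squared one-sample t on these contrast scores. Applying the same contrast to per-participant differences between the two orders gives the corresponding Duration × Order trend component. The Python sketch below illustrates this approach; it is not the authors' analysis code, the omnibus Pillai tests are not reproduced, and the function name and example data are hypothetical.

```python
import math
from statistics import mean, stdev

# Orthogonal polynomial coefficients for four equally spaced levels
CONTRASTS = {
    "linear": (-3, -1, 1, 3),
    "quadratic": (1, -1, -1, 1),
    "cubic": (-1, 3, -3, 1),
}


def trend_test(scores, contrast):
    """Single-df within-participants trend test.

    scores: one 4-tuple per participant (e.g., lnWFs), ordered by duration.
    Returns (F, df_error), where F is the squared one-sample t on the
    per-participant contrast scores.
    """
    weights = CONTRASTS[contrast]
    contrast_scores = [sum(w * x for w, x in zip(weights, row)) for row in scores]
    n = len(contrast_scores)
    t = mean(contrast_scores) / (stdev(contrast_scores) / math.sqrt(n))
    return t * t, n - 1


# Hypothetical lnWFs for three participants at 100, 215, 464, and 1,000 ms
data = [(-1.0, -1.2, -1.4, -1.5), (-0.9, -1.3, -1.3, -1.6), (-1.1, -1.1, -1.5, -1.4)]
f_lin, df_err = trend_test(data, "linear")
print(f"linear trend: F(1, {df_err}) = {f_lin:.2f}")
```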

Fig. 6. Experiment 2. Filled auditory intervals: Mean of natural log Weber fraction versus standard duration (durations are logarithmically equally spaced). Error bars indicate standard errors of the means. The right axis shows the corresponding geometric means of the Weber fractions.

With pairwise t tests, the effect of order reached significance only for St = 1,000 ms, t(53) = −3.246, p = .002, and only approached significance for St = 100 ms, t(53) = 1.936, p = .058 (in both cases, $\mathrm{lnWF}_{\text{St–Co}} < \mathrm{lnWF}_{\text{Co–St}}$, indicating negative SPEs).

Considering only the durations 100 and 1,000 ms, as in Experiment 1, the effect of duration was significant, F(1, 53) = 26.502, p < .001, ηp² = .333 ($\mathrm{lnWF}_{1{,}000\,\mathrm{ms}} < \mathrm{lnWF}_{100\,\mathrm{ms}}$). The effect of order was also significant, F(1, 53) = 13.247, p < .001, ηp² = .200 ($\mathrm{lnWF}_{\text{St–Co}} < \mathrm{lnWF}_{\text{Co–St}}$), but the Duration × Order interaction was not, p = .334. Considering only the durations 215 and 464 ms, only the effect of order was significant, F(1, 53) = 4.193, p = .046, ηp² = .073 ($\mathrm{lnWF}_{\text{St–Co}} > \mathrm{lnWF}_{\text{Co–St}}$, indicating positive SPEs).

Time-order error

The means of TOE% are shown in Fig. 7. All obtained mean values were positive. The TOE% values were analyzed with ANOVAs in the same way as the lnWFs. Only the effect of duration was significant, F(3, 51) = 11.030, p < .001, ηp² = .394; its linear trend was also significant, F(1, 53) = 22.442, p < .001, ηp² = .297, and its cubic trend, reflected in the S-shape of the plots, reached significance as well, F(1, 53) = 5.840, p = .019, ηp² = .099. Considering only the durations of 100 and 1,000 ms (for comparison with Experiment 1), again only the effect of duration on TOE% was significant, F(1, 53) = 27.693, p < .001, ηp² = .343.

Fig. 7. Experiment 2. Filled auditory intervals: Mean of TOE% versus standard duration (durations are logarithmically equally spaced). Error bars indicate standard errors of the means.

For a further comparison between Experiments 1 and 2, an ANOVA was conducted on the lnWF values (only for the no-feedback condition) with Standard Duration (100 ms, 1,000 ms) and Order (St – Co, Co – St) as within-participants factors and Experiment (1, 2) as a between-participants factor. The effect of duration was significant, F(1, 80) = 11.472, p = .001, ηp² = .125. The effect of experiment was not significant, F(1, 80) = 0.635, p = .428, ηp² = .008. The Order × Experiment interaction was significant, F(1, 80) = 6.006, p = .016, ηp² = .070, as was the Duration × Experiment interaction, F(1, 80) = 4.204, p = .044, ηp² = .050, but the three-way Duration × Order × Experiment interaction was not, F(1, 80) = 0.198, p = .658, ηp² = .002. Neither the effect of order nor the Duration × Order interaction reached the 5% level of significance.

For TOE%s, the results agree quite well between the experiments, in that the values were less positive for the order Co – St than for the order St – Co, and more so for St = 1,000 ms than for St = 100 ms. Unlike in Experiment 1, TOE% stayed on the positive side.

An ANOVA analogous to the cross-experiment analysis of the lnWF values was conducted on the TOE% values. Only the effect of order was significant, F(1, 80) = 4.993, p = .028, ηp² = .059; the Duration × Order interaction failed to reach significance.

Discussion

For WFs, the results of Experiment 2 differed from those of Experiment 1, in that the effect of the order of the standard and comparison stimuli on lnWF was the same for standard durations of 100 and 1,000 ms ($\mathrm{lnWF}_{\text{St–Co}} < \mathrm{lnWF}_{\text{Co–St}}$), whereas for the intermediate durations of 215 and 464 ms, the relation was the opposite ($\mathrm{lnWF}_{\text{St–Co}} > \mathrm{lnWF}_{\text{Co–St}}$). In terms of SPEs, we observed a transition from a negative effect at 1,000 ms to a positive effect at 464 and 215 ms, which unexpectedly reversed again at 100 ms. We cannot yet offer an explanation for this difference in the effects of standard duration between Experiments 1 and 2. It may reflect an unknown effect of the differences in the sets of interval types (Exp. 1, auditory and visual; Exp. 2, only auditory) and standard durations encountered by the participants. Still, the body of results confirms the occurrence of both types of SPE (positive and negative) and shows that this effect interacts with stimulus duration: a pattern of results that is compatible with the sensation-weighting model but not with the difference model or the internal reference model.

General discussion

The aim of the present experiments was to investigate the comparison and discrimination of intervals with different durations and in different representation forms, as an extension of the studies by Hellström and Rammsayer (2004) and Rammsayer (2014). For this purpose, we employed two different types of intervals (filled and empty ones) presented in the auditory and visual sensory modalities. The sensation-weighting model was used as an analytical and descriptive tool. It should be noted, however, that this by no means excluded the possibility of ultimately favoring the difference model or the internal reference model. Duration discrimination sensitivity was shown to depend on the order of the standard and comparison stimuli (the SPE), in a fashion that depended on the type of interval as well as on the standard duration. Although they are incompatible with the difference model and the internal reference model, these results are compatible with the sensation-weighting model, which suggests a flexible strategic weighting of the two stimuli within the comparison process. With the employed interstimulus interval of 900 ms, one common strategy was to give a lower weight to the first-presented stimulus, reflecting the partial retention loss of its duration information and causing a negative SPE. For some types of brief intervals, however, the opposite strategy was used. This might reflect the fact that with a constant interstimulus interval, the interval between the onsets of the two stimuli decreases with the stimulus duration and may become brief enough that proactive stimulus interference sets in (see Hellström, 1979). Such flexibility of stimulus weighting is in line with the recent finding by Dyjas, Bausenhart, and Ulrich (2014) that indicating to the participant which stimulus was the comparison eliminated the SPE, thereby showing that the weighting of stimulus information can be put under cognitive control.

It seems pertinent, at this point, to quote Helson (1964):

In classical psychophysics and for many modern workers effects of order of stimulation are merely “errors” of judgment which should be eradicated; but for us TOE and SOE [space-order effect] have deeper significance. So-called time-order and space-order errors arise from the decentered position of adaptation level and are therefore manifestations of lawful underlying mechanisms. . . . The systematic nature of such effects rules out the “error” view of their origin. (p. 159)

The sensation-weighting model and the internal reference model (Dyjas et al., 2014) both agree with Helson (1964) in rejecting the difference model. What distinguishes their theoretical positions is that the internal reference model assumes that only the first stimulus is replaced by an internal reference (as was also assumed by Michels & Helson, 1954), whereas the sensation-weighting model is more general, since it assumes an analogous process for the second stimulus as well. This generalization (Hellström, 1979) rests on the notion that, at the moment of comparison, both stimuli are in memory, not just the first one. Many of the results of, for example, Hellström (1979) could not be explained without the notion that weighting also affects the second stimulus. Furthermore, within the conceptual framework of the sensation-weighting model, stimulus weights > 1 (indicating contrastive rather than assimilative integration of stimulus and reference level) are allowed, and indeed seem to occur in some stimulus conditions (see Fig. 3 of Hellström, 1979). The model is thereby able to account for a wide range of phenomena in stimulus comparison, notably the TOE, for which the sensation-weighting model offers an explanation similar to Helson’s (1964), whereas the internal reference model offers none. As the results of the present experiments suggest, factors such as sensory modality (auditory, visual), type of interval (filled, empty), and standard duration modulate the parameters of the sensation-weighting model.

TOEs need not always appear in an experiment with only one standard per block (see Hellström, 2000). According to the sensation-weighting model, this is because, apart from bias effects, the TOE is proportional to the distance between the standard and the reference level. It is important to note that the terminologies in the models differ in a crucial way: In the internal reference model, the internal reference represents the current stimulus weighted together with all the previous experimental stimulation (with a constant weight across trials). Extra-experimental stimulation is ignored in the internal reference model, which is a troublesome limitation, in that it excludes any TOEs. Apart from this, the internal reference model is quite similar to Michels and Helson’s (1954; see also Helson, 1964, pp. 209–215) TOE model, in which “s and t are the relative weights of S [the first-presented stimulus] and SAL [the series adaptation level] in determining A1 [the comparative adaptation level; i.e., the internal reference] and are subject to the condition: t = 1 − s” (p. 329). (Note that Michels & Helson, 1954, applied the s and t weights to the logarithms of the stimulus magnitudes.) In the internal reference model, this kind of weighting is applied at every trial, specifying how the experimental stimuli build up the internal reference in the course of the experiment. In the sensation-weighting model, which arose as a generalization of the Michels–Helson model (see Hellström, 1979), the reference level is similar to what Michels and Helson called the “series adaptation level,” in that it represents all of the influential stimulation, which may be seen as a weighted compound of experimental, background, and residual stimulation (see Helson, 1964).

The sensation-weighting model makes no detailed assumptions about the buildup processes. Rather, the subjective magnitude of each stimulus is combined by weights with that of the pertinent reference level, and the difference between the two resulting weighted magnitudes produces the subjective difference underlying comparison and discrimination. As was discussed in detail by Patching, Englund, and Hellström (2012), this weighting mechanism seems to be optimized to serve as an adaptive strategy aimed at maximizing the discriminability of small relative changes in the compared stimuli; TOEs arise as by-products. According to this view, SPEs and TOEs originate from the same process, although this becomes clearer in designs in which both stimuli are varied (e.g., Hellström, 1979; Patching et al., 2012).

Raviv, Lieder, Loewenstein, and Ahissar (2014) suggested that “sensory processes cannot be studied in isolation, or ‘out of context’: even in the simplest discriminations they involve complex statistical learning that affects participants’ performance” (p. 2, Author Summary). This statistical learning is incorporated, albeit differently, in contemporary models such as the sensation-weighting model and the internal reference model, as well as in the older Michels–Helson model. The insight that sensory processes are dependent on their context may help reduce the frustration due to diverging results in seemingly similar experiments, and may provide clues to a deeper understanding of these results.

For temporal stimuli, once they have generated temporal sensations, these seem to be handled by essentially the same comparison mechanisms as do sensations in other modalities—resulting in TOEs and SPEs being obtained in temporal comparison studies such as the present one. Still, the detailed functioning of these mechanisms is dependent on the stimulus conditions, as the present experiments illustrate, but as is also known from studies in other modalities (see Hellström, 1979, 2003). In this respect, time may not be so special after all.