Independent-channels models of temporal-order judgment revisited: A model comparison

The perception of temporal order or simultaneity of stimuli is almost always explained in terms of independent-channels models, such as perceptual-moment, triggered-moment, and attention-switching models. Independent-channels models generally posit that stimuli are processed in separate peripheral channels and that their arrival-time difference at a central location is translated into an internal state of order (simultaneity) if it reaches (misses) a certain threshold. Non-monotonic and non-parallel psychometric functions in a ternary-response task provided critical evidence against a wide range of independent-channels models. However, two independent-channels models have been introduced in recent decades that can account for such shapes, either by considering misreports of internal states (response-error model) or by assuming that simultaneity and order judgments rely on distinct sensory and decisional processes (two-stage model). Building on previous ideas, we also consider a two-threshold model, according to which the same arrival-time difference may need to reach a higher threshold for order detection than for successiveness detection. All three models were fitted to various data sets collected over a period of more than a century. The two-threshold model provided the best balance between goodness of fit and parsimony. This preference for the two-threshold model over the two-stage model and the response-error model aligns well with several lines of evidence from cognitive modeling, psychophysics, mental chronometry, and psychophysiology. We conclude that the seemingly deviant shapes of psychometric functions can be explained within the framework of independent-channels models in a simpler way than previously assumed.
Supplementary Information: The online version contains supplementary material available at 10.3758/s13414-024-02915-5.


Introduction
A fundamental and classical question in psychophysics is how humans perceive temporal relations between different stimuli. This question arises for stimuli in the same sensory modality (e.g., two lights) as well as for stimuli in different sensory modalities (e.g., a light and a sound). Although accurate temporal integration is essential for many cognitive operations such as speech recognition and motor coordination (see e.g., Buonomano & Karmarkar, 2002; Lashley, 1951), the perceived and the actual temporal relation of physical events often mismatch (e.g., Dennett & Kinsbourne, 1992; Shore et al., 2002; Stone, 1926; Zampini et al., 2003). Astronomers of the 18th and 19th centuries already became aware of this when they noticed considerable discrepancies between their judgments of the times of star transits (Mollon & Perkins, 1996). The astronomer Friedrich Bessel therefore introduced the concept of the "personal equation" to take into account the dependence of temporal judgments on the observer's perceptual processes. These events marked a starting point of experimental psychology (Boring, 1950), as the astronomers' research was taken up by pioneering psychologists such as Wilhelm Wundt. Since then, psychologists have studied related questions, such as whether an attended stimulus gains faster access to consciousness than an unattended stimulus ("prior entry"; Titchener, 1908). Indeed, a wealth of research suggests that attended stimuli tend to be perceived earlier than concurrently presented unattended stimuli (e.g., Jaśkowski, 1993; Schneider & Bavelier, 2003; Sternberg et al., 1971; Shore et al., 2001; Spence et al., 2001; Tünnermann et al., 2015).

Correspondence: Paul Kelber (paul.kelber@uni-tuebingen.de), Department of Psychology, University of Tübingen, Schleichstraße 4, Tübingen 72076, Germany

Judgment of temporal order and simultaneity
The two most widespread tasks used to study such phenomena are temporal-order judgment (TOJ) and simultaneity judgment (SJ) tasks.¹ In each of these binary-response tasks, the presentation of two stimuli $S_x$ and $S_y$ (e.g., a light and a tone) starts at times $t_x$ and $t_y$, where the stimulus-onset difference $d = t_y - t_x$ is manipulated across trials.² Since $t_y = t_x + d$, positive, zero, and negative values of $d$ indicate that $S_x$ is presented first ($t_x < t_y$), that $S_x$ and $S_y$ are presented simultaneously ($t_x = t_y$), and that $S_y$ is presented first ($t_x > t_y$), respectively. In the TOJ task, subjects respond whether $S_x$ appeared first ($R_{xy}$) or $S_y$ appeared first ($R_{yx}$). In the SJ task, subjects respond whether $S_x$ and $S_y$ appeared simultaneous ($R_{si}$) or successive ($R_{su}$). The TOJ task yields the psychometric function $P(R_{xy} \mid d) = 1 - P(R_{yx} \mid d)$, which tends to increase with $d$. The SJ task yields the psychometric function $P(R_{si} \mid d) = 1 - P(R_{su} \mid d)$, which tends to be bell-shaped over $d$.
TOJ and SJ tasks are commonly used as tools to infer effects on perceptual latency (see e.g., Badde et al., 2020; Vroomen et al., 2004; Zampini et al., 2005). Nowadays, this usually involves fitting a somewhat arbitrary function to the data (e.g., logistic psychometric functions for TOJs, Gaussian functions for SJs), from which indices such as the "point of subjective simultaneity" (PSS) are derived. A shift of the PSS is then often used as a proxy to support statements about perceptual latencies, without further specifying the underlying mechanisms.

Independent-channels models
If explanations of performance in TOJ and SJ tasks are offered at all, they usually fall within the class of independent-channels models (Sternberg & Knoll, 1973). Their basic form is as follows: The stimuli $S_x$ and $S_y$ are processed in separate peripheral channels until each stimulus is registered at a central location in the brain. The arrival latencies of $S_x$ and $S_y$ are represented by the random variables $L_x$ and $L_y$. Thus, $S_x$ is registered at arrival time $A_x = L_x + t_x$ and $S_y$ at $A_y = L_y + t_y = L_y + t_x + d$. A central decision mechanism determines the temporal order of $S_x$ and $S_y$ according to a decision rule that operates on the arrival-time difference

$$\Delta A = A_y - A_x = (L_y - L_x) + d = \Delta L + d.$$

This architecture of independent-channels models implies that processing in one peripheral channel ($L_x$) is unaffected by processing in the other peripheral channel ($L_y$), and vice versa. It also implies that the arrival-latency difference $\Delta L$ is independent of the stimulus-onset difference $d$. Further, the decision rule implemented by the central decision mechanism is assumed to be unaffected by the arrival latencies generated in the peripheral channels, and vice versa. Sternberg and Knoll (1973) showed that a general independent-channels model encompasses many models of timing judgment, such as perceptual-latency models (Baron, 1969; Gibbon & Rutschmann, 1969), perceptual-moment models (Gibbon & Rutschmann, 1969; Stroud, 1955), triggered-moment models (Baron, 1971; Venables, 1960), and attention-switching models (Allan & Kristofferson, 1974; Allan, 1975a; Kristofferson, 1967a, b; Schmidt & Kristofferson, 1963). All of these models are special cases of the general independent-channels model, since each results from a particular specification of the decision rule.
According to the deterministic decision rule assumed by perceptual-latency models, the stimulus that arrives first appears first. However, perceptual-latency models have been invalidated by several findings (e.g., Allan, 1975a; Heath, 1984). Unlike perceptual-latency models, other independent-channels models (perceptual-moment, triggered-moment, and attention-switching models) posit that the temporal resolution of the arrival-time comparator is limited by a central timing mechanism. Specifically, according to perceptual-moment models, stimuli are perceived as ordered only if they arrive in different "moments" (the assumed discrete building blocks of psychological time), where the timing of moments is unaffected by the central arrival of stimuli. Triggered-moment models follow perceptual-moment models, except that the first central arrival of a stimulus is assumed to initiate a new moment. Finally, according to attention-switching models, attention can be directed to only one channel at a time and can be switched from one channel to another only at certain time points separated by one "time quantum". Further, an arrival is registered only if the respective channel is attended. Thus, two stimuli are perceived as ordered in time only if the temporal order of their registrations can be determined.
Differences aside, perceptual-moment, triggered-moment, and attention-switching models hypothesize a central oscillatory process whose period (i.e., the duration of a moment or time quantum) is formally reflected in a threshold (or criterion) $c$ that the magnitude of the arrival-time difference, $|\Delta A|$, must reach for temporal order to be perceived. Therefore, these independent-channels models may also be referred to as threshold models (Ulrich, 1987).
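To make the threshold idea concrete, the following Python sketch simulates single trials of a basic threshold model in a ternary-response setting. It assumes a normally distributed arrival-latency difference $\Delta L = L_y - L_x$ with mean `mu_L` and standard deviation `sigma_L` and a single symmetric threshold `c`, so that $|\Delta A| \ge c$ yields an order response and $|\Delta A| < c$ a simultaneity response. The function names and parameter values are hypothetical and chosen for illustration only.

```python
import random


def simulate_trial(d, mu_L, sigma_L, c, rng):
    """Simulate one trial of a basic threshold (independent-channels) model.

    Assumption: Delta L = L_y - L_x ~ Normal(mu_L, sigma_L), so the
    arrival-time difference is Delta A = Delta L + d (all values in ms).
    """
    delta_A = rng.gauss(mu_L, sigma_L) + d
    if delta_A >= c:
        return "R_xy"  # S_x registered at least c ms before S_y
    if delta_A <= -c:
        return "R_yx"  # S_y registered at least c ms before S_x
    return "R_si"      # |Delta A| < c: simultaneity


def simulate_proportions(d, mu_L, sigma_L, c, n_trials=20000, seed=1):
    """Relative response frequencies over many simulated trials at one d."""
    rng = random.Random(seed)
    counts = {"R_xy": 0, "R_si": 0, "R_yx": 0}
    for _ in range(n_trials):
        counts[simulate_trial(d, mu_L, sigma_L, c, rng)] += 1
    return {r: k / n_trials for r, k in counts.items()}


# Hypothetical parameters: mu_L = 0 ms, sigma_L = 30 ms, c = 40 ms.
for d in (-100, -50, 0, 50, 100):
    print(d, simulate_proportions(d, mu_L=0.0, sigma_L=30.0, c=40.0))
```

The response proportions depend on $d$ only through the distribution of $\Delta A = \Delta L + d$, which reflects the assumption that $\Delta L$ does not depend on $d$.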
Of theoretical importance, Sternberg and Knoll (1973) demonstrated that psychometric functions are shaped not only by the specific assumptions about the central decision mechanism but also by the distribution of the arrival-latency difference $\Delta L$. As a result, the central decision mechanism cannot be examined in isolation. Instead, empirical tests of hypotheses about the central decision mechanism require subsidiary assumptions about the arrival-latency difference.
In general, it seems realistic that stimulus detection requires several successive neural pulses rather than only a single one. Therefore, one might assume that the arrival latencies $L_x$ and $L_y$ each reflect the sum of a fixed number of independent, exponentially distributed latencies, which in turn reflect the times between two successive neural pulses. In that case, $\Delta L$ is the difference of two Erlang-distributed variables. As shown in Appendix B, this difference converges to a normal distribution as the number of required neural pulses increases. Similarly, one might assume that $L_x$ and $L_y$ each reflect the duration of a noisy accumulation process drifting towards a single absorbing boundary (cf. Ratcliff, 1978). The first-passage times of such a Wiener process follow a Wald (inverse Gaussian) distribution (see e.g., Luce, 1991). Thus, in this case, $\Delta L$ is the difference of two Wald-distributed variables. This difference is again reasonably well approximated by a normal distribution for a range of plausible mean values of $L_x$ and $L_y$ (see Appendix B). Overall, these theoretical considerations favor the subsidiary assumption of a normally distributed (rather than Laplace-distributed) arrival-latency difference $\Delta L$.
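The convergence claim for the Erlang case can be illustrated with a short calculation. Assuming, for simplicity, that $L_x$ and $L_y$ involve the same number of pulses $k$ at the same rate (a restriction beyond the text), cumulants of independent variables add for a difference in their even orders, so the excess kurtosis of $\Delta L$ equals $3/k$. For $k = 1$, the difference of two equal-rate exponentials is exactly Laplace-distributed (excess kurtosis 3), and the value shrinks towards the normal value 0 as $k$ grows. The Python sketch below compares this formula with a Monte Carlo estimate; the 10-ms mean inter-pulse interval is a hypothetical value, and the sketch is not the analysis of Appendix B.

```python
import random


def excess_kurtosis_theory(k):
    """Excess kurtosis of Delta L = L_y - L_x for independent Erlang(k) latencies
    with a common scale.

    Gamma/Erlang cumulants: kappa_n = k * scale**n * (n - 1)!.
    For a difference, even cumulants add: kappa_2 = 2k*scale^2, kappa_4 = 12k*scale^4,
    so kappa_4 / kappa_2**2 = 3 / k (the scale cancels).
    """
    return 3.0 / k


def sample_excess_kurtosis(xs):
    """Moment-based estimate m4 / m2^2 - 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def simulate_delta_L(k, n=50000, mean_interval=10.0, seed=0):
    """Each latency is a sum of k exponential inter-pulse intervals,
    i.e. Gamma(shape=k, scale=mean_interval) in random.gammavariate terms."""
    rng = random.Random(seed)
    return [rng.gammavariate(k, mean_interval) - rng.gammavariate(k, mean_interval)
            for _ in range(n)]


for k in (1, 5, 20):
    print(k, excess_kurtosis_theory(k), round(sample_excess_kurtosis(simulate_delta_L(k)), 2))
```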

Shape of psychometric functions in the ternary-response task
To empirically distinguish between competing accounts of the central decision mechanism, timing-judgment tasks with more than two response alternatives, which thus yield multiple psychometric functions, have proved more informative than the binary TOJ and SJ tasks (Allan, 1975a; Sternberg et al., 1975, 2023). In the simplest extension of the TOJ and SJ tasks, the ternary-response task (Ulrich, 1987), subjects choose among the three response alternatives $R_{xy}$, $R_{si}$, and $R_{yx}$. Thus, this task yields the three psychometric functions $P(R_{xy} \mid d)$, $P(R_{si} \mid d)$, and $P(R_{yx} \mid d)$, which can also be represented by the two psychometric functions

$$P(R_{xy} \mid d) \quad \text{and} \quad 1 - P(R_{yx} \mid d) = P(R_{xy} \mid d) + P(R_{si} \mid d).$$

Specific independent-channels models constrain the relationship between the two psychometric functions $P(R_{xy} \mid d)$ and $1 - P(R_{yx} \mid d)$ (Allan, 1975a; Ulrich, 1987; Sternberg et al., 1975, 2023). In particular, many independent-channels models require that $P(R_{xy} \mid d)$ and $1 - P(R_{yx} \mid d)$ are monotonically increasing functions (monotonicity) that can be transformed into each other solely by horizontal translation (parallelism). First, the functions must be monotonically increasing according to perceptual-moment, triggered-moment, and attention-switching models, because these functions are conceptualized as cumulative distribution functions in the general independent-channels model. Second, the functions must be parallel according to several perceptual-moment, triggered-moment, and attention-switching models, because the distributions that are assumed to govern the observed psychometric functions are identical in shape (see Ulrich, 1987).
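For the simplest threshold model (normally distributed $\Delta L$ and one symmetric threshold $c$), both properties can be checked numerically: $P(R_{xy} \mid d) = 1 - \Phi((c - \mu_L - d)/\sigma_L)$ and $1 - P(R_{yx} \mid d) = 1 - \Phi((-c - \mu_L - d)/\sigma_L)$ are increasing in $d$, and shifting the second curve to the right by $2c$ reproduces the first. The Python sketch below verifies this on a grid of $d$ values; the parameter values and helper names are hypothetical.

```python
import math


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lower_curve(d, mu_L, sigma_L, c):
    """P(R_xy | d) = P(Delta A >= c), with Delta A ~ Normal(mu_L + d, sigma_L)."""
    return 1.0 - phi((c - mu_L - d) / sigma_L)


def upper_curve(d, mu_L, sigma_L, c):
    """1 - P(R_yx | d) = 1 - P(Delta A <= -c)."""
    return 1.0 - phi((-c - mu_L - d) / sigma_L)


def is_monotonic_increasing(values, tol=1e-12):
    return all(b >= a - tol for a, b in zip(values, values[1:]))


mu_L, sigma_L, c = 10.0, 25.0, 30.0  # hypothetical parameter values (ms)
ds = list(range(-150, 155, 5))
lower = [lower_curve(d, mu_L, sigma_L, c) for d in ds]
upper = [upper_curve(d, mu_L, sigma_L, c) for d in ds]

# Parallelism: the upper curve shifted to the right by 2c coincides with the lower curve.
shift_error = max(abs(lower_curve(d, mu_L, sigma_L, c) - upper_curve(d - 2 * c, mu_L, sigma_L, c))
                  for d in ds)
print(is_monotonic_increasing(lower), is_monotonic_increasing(upper), shift_error)
```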

Explanations for non-monotonic and non-parallel psychometric functions
In recent decades, different independent-channels models have been developed to account for observed non-monotonic and non-parallel psychometric functions (e.g., Allan, 1975a; Ulrich, 1987; Jaśkowski, 1991a). To this end, the three independent-channels models described below elaborate on the mapping from unobservable internal (perceptual) states $I$ to observable responses $R$. In other words, these models assume that there is no one-to-one relation between internal states and responses in the ternary-response task. Critically, the models differ in how many internal states they posit.
First, the response-error model (REM; García-Pérez & Alcalá-Quintana, 2012b) assumes that each presentation of $S_x$ and $S_y$ results in one of three internal states: (a) the state $I_{xy}$ that $S_x$ appears before $S_y$, (b) the state $I_{si}$ that $S_x$ and $S_y$ appear simultaneous, or (c) the state $I_{yx}$ that $S_y$ appears before $S_x$. Crucially, the REM allows for mistranslations of these internal states $I \in \{I_{xy}, I_{si}, I_{yx}\}$ into the responses $R \in \{R_{xy}, R_{si}, R_{yx}\}$: Each internal state $I$ (e.g., $I_{xy}$) is misreported with some probability, and given a misreport of $I$, each non-corresponding response $R$ (e.g., $R_{si}$ and $R_{yx}$) is given with a certain probability. This resembles modeling approaches in visual psychophysics that account for finger errors and attentional lapses (e.g., Swanson & Birch, 1992; Wichmann & Hill, 2001).
Second, in addition to the internal states $I_{xy}$, $I_{si}$, and $I_{yx}$, the two-stage model (TSM; Jaśkowski, 1991b) postulates the internal state $I_{su}$ that $S_x$ and $S_y$ appear successive, as well as the internal state $I_?$ that no order of $S_x$ and $S_y$ is perceived. Each of these five internal states is elicited by one of two distinct processes, which are assumed to govern successiveness and order detection: The arrival-time difference in the "successiveness center" (or "simultaneity center") is translated into an internal state of simultaneity ($I_{si}$) or successiveness ($I_{su}$). Independently of this, a distinct arrival-time difference in the "order center" is translated into one of three internal states: precedence of $x$ ($I_{xy}$), precedence of $y$ ($I_{yx}$), or imperceptible order ($I_?$). In the ternary-response task, $I_{si}$ leads to $R_{si}$ irrespective of the internal order-detection state. Moreover, $I_{su} \cap I_{xy}$ leads to $R_{xy}$, $I_{su} \cap I_{yx}$ to $R_{yx}$, and $I_{su} \cap I_?$ to $R_{xy}$ or $R_{yx}$ according to a certain guessing probability.
Third, the two-threshold model (TTM) starts from the assumption that successiveness detection need not suffice for order detection. Specifically, observers might simply set a higher threshold on $|\Delta A|$ for order detection than for successiveness detection, which implies the possibility of detecting successiveness without detecting order.⁴ This idea was initially discussed by Sternberg et al. (1975, 2023) and more recently by García-Pérez and Alcalá-Quintana (2015b, 2018). According to the TTM, one of the four internal states $I_{xy}$, $I_{si}$, $I_{su}$, and $I_{yx}$ results, depending on the size of the arrival-time difference $\Delta A$. If $|\Delta A|$ does not reach the threshold for successiveness detection, $c_{su}$, simultaneity is perceived ($I_{si}$). If $|\Delta A|$ reaches the threshold for order detection, $c_o$ (with $c_o \ge c_{su}$), order is perceived ($I_{xy}$ and $I_{yx}$ for positive and negative values of $\Delta A$, respectively). Finally, if $c_{su} \le |\Delta A| < c_o$, $I_{su}$ results. In this case, $R_{xy}$ is given with guessing probability $g$ and $R_{yx}$ with the complementary probability $1 - g$. The TTM thus allows for "order-uncertainty bands" at the two response boundaries in the ternary-response task (see Fig. 1). Theoretically, this implies a further perceptual state besides order perception and simultaneity perception, namely the perception of "unordered successiveness". In this respect, the TTM resembles other two-threshold models of signal detection (Krantz, 1969) and pitch discrimination (Wickelgren, 1969).

Present study
The goal of the present research was to systematically evaluate the explanatory power of the three independent-channels models (REM, TSM, TTM) informally introduced above. To this end, the models were compared in terms of their balance between goodness of fit and parsimony on many data sets collected over a span of more than a century. Moreover, the model assumptions are discussed in relation to empirical findings and theoretical questions, such as whether successiveness detection is necessary, or even necessary and sufficient, for order detection. In the next section, the three models to be compared are formalized.

⁴ The name of each of the three considered models points to the characteristic that distinguishes it from a basic independent-channels model assuming a variable (e.g., normally distributed) arrival-latency difference $\Delta L$ and a constant threshold $c$: (1) the response-error model (REM) allows for erroneous translations of internal states into responses, (2) the two-stage model (TSM) posits that the perception of simultaneity and the perception of temporal order are governed by two distinct processes (i.e., two independently distributed arrival-latency differences and two independent thresholds), and (3) the two-threshold model (TTM) assumes that the threshold for order detection may exceed the threshold for successiveness detection. Note that the TSM also assumes two thresholds, but unlike the TTM, it is not limited to this, as it assumes two entirely distinct SJ and TOJ processes. We use the name TSM instead of the possibly more precise name "two-process model" to be consistent with the terminology introduced by Jaśkowski (1991b).

Formalization of the models
The specifications of the REM and the TSM provided by García-Pérez and Alcalá-Quintana (2012b) and Jaśkowski (1991b) differ in several respects that are not the main focus of the present study. In particular, García-Pérez and Alcalá-Quintana (2012b) assumed that the arrival-latency difference $\Delta L$ follows a Laplace (bilateral exponential) distribution, whereas Jaśkowski (1991b) assumed $\Delta L$ to be normally distributed. To attribute the model-selection results to the differences of interest here (role of response errors, separate successiveness and order processes, or unequal thresholds), all models had to be brought onto a common basis. Therefore, we assumed that $\Delta L$ is normally distributed with mean $\mu_L$ and variance $\sigma_L^2$. A normal distribution was assumed because it provides a simple and, in many realistic scenarios, satisfactory approximation of the difference between two Erlang-distributed or Wald-distributed arrival latencies (see Appendix B). However, to check whether our conclusions are robust to different distributional assumptions, we repeated the model comparison under the alternative assumption that $\Delta L$ is Laplace-distributed with location $\mu_L$ and scale $b_L$. As described below, this led to the selection of the same model, which suggests that the results of the present model comparison are not specific to our particular distributional assumptions.
Furthermore, we assumed that the threshold for successiveness detection, $c_{su}$, is constant and symmetric around the simultaneity band. The three parameters $\mu_L$, $\sigma_L$, and $c_{su}$ yield a basic independent-channels model, according to which

$$I = \begin{cases} I_{yx} & \text{if } \Delta A \le -c_{su},\\ I_{si} & \text{if } -c_{su} < \Delta A < c_{su},\\ I_{xy} & \text{if } \Delta A \ge c_{su}. \end{cases}$$

This basic model can, for instance, be conceived of as a triggered-moment model, in which the first central arrival of a peripheral signal elicits the start of a new moment. Because each of the three models (REM, TSM, TTM) comprises the three parameters $\mu_L$, $\sigma_L$, and $c_{su}$, they all represent extensions of this basic independent-channels model. These extended models are described below and illustrated in Fig. 2. Appendix A includes the derivations of the formulas for the internal-state probabilities according to the three models.
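Because $\Delta A = \Delta L + d$ is normally distributed with mean $\mu_L + d$ and variance $\sigma_L^2$ under the assumptions above, the internal-state probabilities of the basic model follow directly from the standard normal distribution function $\Phi$: $P(I_{xy} \mid d) = 1 - \Phi((c_{su} - \mu_L - d)/\sigma_L)$, $P(I_{yx} \mid d) = \Phi((-c_{su} - \mu_L - d)/\sigma_L)$, and $P(I_{si} \mid d)$ is the remainder. A minimal Python sketch follows; the function names and example parameters are illustrative and not taken from the original analysis code.

```python
import math


def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def basic_state_probs(d, mu_L, sigma_L, c_su):
    """Internal-state probabilities of the basic three-parameter model."""
    below_upper = phi((c_su - mu_L - d) / sigma_L)   # P(Delta A < c_su)
    below_lower = phi((-c_su - mu_L - d) / sigma_L)  # P(Delta A <= -c_su)
    return {
        "I_xy": 1.0 - below_upper,
        "I_si": below_upper - below_lower,
        "I_yx": below_lower,
    }


# Hypothetical parameters (ms): mu_L = 15, sigma_L = 30, c_su = 40.
for d in (-100, -15, 0, 100):
    print(d, {k: round(v, 3) for k, v in basic_state_probs(d, 15.0, 30.0, 40.0).items()})
```

At $d = -\mu_L$ the distribution of $\Delta A$ is centered on zero, so $P(I_{xy} \mid d)$ and $P(I_{yx} \mid d)$ coincide there.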

Response-error model (REM)
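Unlike the TSM and the TTM, the REM requires that successiveness and order are detected if $|\Delta A|$ reaches a single threshold $c = c_{su} = c_o$. Importantly, the REM introduces the response-error rates $\varepsilon_{xy}$, $\varepsilon_{si}$, and $\varepsilon_{yx}$, which reflect the probabilities of misreporting $I_{xy}$, $I_{si}$, and $I_{yx}$, respectively. Moreover, $\kappa_{xy-yx}$, $\kappa_{si-xy}$, and $\kappa_{yx-xy}$ denote the conditional probabilities of misreporting $I_{xy}$ as $R_{yx}$, $I_{si}$ as $R_{xy}$, and $I_{yx}$ as $R_{xy}$, given that the respective internal state is misreported. It follows that the response probabilities conditional on each internal state are

$$\begin{aligned}
P(R_{xy} \mid I_{xy}) &= 1 - \varepsilon_{xy}, & P(R_{si} \mid I_{xy}) &= \varepsilon_{xy}(1 - \kappa_{xy-yx}), & P(R_{yx} \mid I_{xy}) &= \varepsilon_{xy}\kappa_{xy-yx},\\
P(R_{xy} \mid I_{si}) &= \varepsilon_{si}\kappa_{si-xy}, & P(R_{si} \mid I_{si}) &= 1 - \varepsilon_{si}, & P(R_{yx} \mid I_{si}) &= \varepsilon_{si}(1 - \kappa_{si-xy}),\\
P(R_{xy} \mid I_{yx}) &= \varepsilon_{yx}\kappa_{yx-xy}, & P(R_{si} \mid I_{yx}) &= \varepsilon_{yx}(1 - \kappa_{yx-xy}), & P(R_{yx} \mid I_{yx}) &= 1 - \varepsilon_{yx}.
\end{aligned}$$

Unlike García-Pérez and Alcalá-Quintana (2012a, b), we did not include an additional parameter $\tau$ reflecting a further processing delay for one channel.⁵ The resulting nine-parameter model predicts that

$$P(R \mid d) = \sum_{I \in \{I_{xy}, I_{si}, I_{yx}\}} P(R \mid I)\,P(I \mid d), \qquad R \in \{R_{xy}, R_{si}, R_{yx}\},$$

with $P(I_{xy} \mid d) = 1 - \Phi\big((c - \mu_L - d)/\sigma_L\big)$, $P(I_{yx} \mid d) = \Phi\big((-c - \mu_L - d)/\sigma_L\big)$, and $P(I_{si} \mid d) = 1 - P(I_{xy} \mid d) - P(I_{yx} \mid d)$, where $\Phi$ denotes the standard normal distribution function.

The frequency and direction of response errors may differ across studies depending on how the response buttons are arranged (García-Pérez & Alcalá-Quintana, 2012a, b). Therefore, following García-Pérez and Alcalá-Quintana (2012a) and Alcalá-Quintana and García-Pérez (2013), we fitted seven reduced versions of the REM to the data from each study: One three-parameter "zero model", $\mathrm{REM}_\emptyset$, which forms the basis of the REM, TSM, and TTM alike, has no response-error or response-bias parameter at all. Three five-parameter models ($\mathrm{REM}_{xy}$, $\mathrm{REM}_{si}$, $\mathrm{REM}_{yx}$) have an error parameter and a bias parameter for one response. Finally, three seven-parameter models ($\mathrm{REM}_{xy,si}$, $\mathrm{REM}_{xy,yx}$, $\mathrm{REM}_{si,yx}$) have an error parameter and a bias parameter for two responses. The full nine-parameter $\mathrm{REM} = \mathrm{REM}_{xy,si,yx}$, which includes an error parameter and a bias parameter for each of the three responses in the ternary-response task, was superior to all reduced REMs for seven out of eight studies (see Table S24 in the Supplementary Material). It is thus not surprising that $\mathrm{REM}_{min}$, which takes the form of the best REM variant for each study, performs the same as the full REM when compared with the TSM and the TTM. Therefore, we consider only the nine-parameter REM in the following.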
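The REM's mapping from internal states to responses can be written as a 3 × 3 confusion matrix whose rows are weighted by the internal-state probabilities of the single-threshold model. The Python sketch below implements a nine-parameter REM of this kind under the normal-$\Delta L$ assumption; the dictionary keys for the error rates $\varepsilon$ and the conditional misreport probabilities $\kappa$, as well as all example values, are hypothetical.

```python
import math


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


RESPONSES = ("R_xy", "R_si", "R_yx")


def rem_response_probs(d, mu_L, sigma_L, c, eps, kappa):
    """Nine-parameter REM: single threshold c plus response errors.

    eps:   misreport probabilities {"xy", "si", "yx"} of I_xy, I_si, I_yx
    kappa: conditional probabilities {"xy-yx", "si-xy", "yx-xy"} of the named
           misreport, given that the internal state is misreported
    """
    below_upper = phi((c - mu_L - d) / sigma_L)
    below_lower = phi((-c - mu_L - d) / sigma_L)
    state = {"xy": 1.0 - below_upper, "si": below_upper - below_lower, "yx": below_lower}
    # Each row holds P(R_xy | I), P(R_si | I), P(R_yx | I); every row sums to 1.
    confusion = {
        "xy": (1 - eps["xy"], eps["xy"] * (1 - kappa["xy-yx"]), eps["xy"] * kappa["xy-yx"]),
        "si": (eps["si"] * kappa["si-xy"], 1 - eps["si"], eps["si"] * (1 - kappa["si-xy"])),
        "yx": (eps["yx"] * kappa["yx-xy"], eps["yx"] * (1 - kappa["yx-xy"]), 1 - eps["yx"]),
    }
    return {r: sum(state[s] * confusion[s][j] for s in state) for j, r in enumerate(RESPONSES)}


eps = {"xy": 0.1, "si": 0.2, "yx": 0.05}            # hypothetical values
kappa = {"xy-yx": 0.3, "si-xy": 0.6, "yx-xy": 0.5}  # hypothetical values
print(rem_response_probs(0.0, 0.0, 30.0, 40.0, eps, kappa))
```

Far from simultaneity, the curves level off at asymptotes set by the error rates (e.g., $P(R_{xy} \mid d) \to 1 - \varepsilon_{xy}$ for large $d$) rather than at 0 or 1.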

Two-stage model (TSM)
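The TSM includes separate arrival-latency differences, $\Delta L_{su} \sim N(\mu_{L_{su}}, \sigma_{L_{su}}^2)$ and $\Delta L_o \sim N(\mu_{L_o}, \sigma_{L_o}^2)$, as well as separate thresholds, $c_{su}$ and $c_o$, for successiveness and order detection. It is posited that $I_{si}$ in the successiveness center always leads to $R_{si}$, irrespective of the internal state in the order center ($I_{xy}$, $I_{yx}$, or $I_?$). For $I_{su}$, however, $I_{xy}$ leads to $R_{xy}$, $I_{yx}$ to $R_{yx}$, and $I_?$ to $R_{xy}$ ($R_{yx}$) with probability $g$ ($1 - g$).⁶ The seven-parameter model thus predicts that

$$\begin{aligned}
P(R_{si} \mid d) &= \Phi\!\left(\frac{c_{su} - \mu_{L_{su}} - d}{\sigma_{L_{su}}}\right) - \Phi\!\left(\frac{-c_{su} - \mu_{L_{su}} - d}{\sigma_{L_{su}}}\right),\\
P(R_{xy} \mid d) &= \left[1 - P(R_{si} \mid d)\right]\left[P(I_{xy} \mid d) + g\,P(I_? \mid d)\right],\\
P(R_{yx} \mid d) &= \left[1 - P(R_{si} \mid d)\right]\left[P(I_{yx} \mid d) + (1 - g)\,P(I_? \mid d)\right],
\end{aligned}$$

where the order-center probabilities are $P(I_{xy} \mid d) = 1 - \Phi\big((c_o - \mu_{L_o} - d)/\sigma_{L_o}\big)$, $P(I_{yx} \mid d) = \Phi\big((-c_o - \mu_{L_o} - d)/\sigma_{L_o}\big)$, and $P(I_? \mid d) = 1 - P(I_{xy} \mid d) - P(I_{yx} \mid d)$.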
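Because the two centers are assumed to operate independently, the TSM's response probabilities are products of a successiveness-center term and an order-center term. The Python sketch below computes them under the normal-$\Delta L$ assumption used throughout; the function name, argument order, and example values are hypothetical.

```python
import math


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tsm_response_probs(d, mu_su, sigma_su, c_su, mu_o, sigma_o, c_o, g):
    """Seven-parameter TSM with independent successiveness and order centers."""
    # Successiveness center: Delta A_su ~ Normal(mu_su + d, sigma_su)
    p_si = phi((c_su - mu_su - d) / sigma_su) - phi((-c_su - mu_su - d) / sigma_su)
    p_su = 1.0 - p_si
    # Order center: Delta A_o ~ Normal(mu_o + d, sigma_o), independent of the above
    p_o_xy = 1.0 - phi((c_o - mu_o - d) / sigma_o)
    p_o_yx = phi((-c_o - mu_o - d) / sigma_o)
    p_o_unknown = 1.0 - p_o_xy - p_o_yx  # I_?: no order perceived
    return {
        "R_xy": p_su * (p_o_xy + g * p_o_unknown),
        "R_si": p_si,  # I_si always yields R_si
        "R_yx": p_su * (p_o_yx + (1.0 - g) * p_o_unknown),
    }


# Hypothetical parameters (ms); g = 0.5 means unbiased guessing under I_?.
params = dict(mu_su=0.0, sigma_su=20.0, c_su=30.0, mu_o=0.0, sigma_o=35.0, c_o=50.0, g=0.5)
for d in (-60, -25, 0, 25, 60):
    print(d, {k: round(v, 3) for k, v in tsm_response_probs(d, **params).items()})
```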

Two-threshold model (TTM)
Similar to the TSM, the TTM includes separate thresholds for the detection of successiveness ($c_{su}$) and order ($c_o$). However, while the TSM allows $c_{su}$ and $c_o$ to vary independently, the TTM imposes the constraint $c_o \ge c_{su}$. Furthermore, $g$ ($1 - g$) here reflects the probability of reporting $I_{su}$ as $R_{xy}$ ($R_{yx}$).⁷ Unlike the TSM, but like the REM, the TTM assumes that successiveness and order detection are based on the same arrival-latency difference $\Delta L \sim N(\mu_L, \sigma_L^2)$. Like the TSM, but unlike the REM, the TTM assumes that observers do not produce response errors, or at least that such errors are so rare that they can be ignored for the sake of parsimony. As a result, the TTM adds only two parameters to a basic independent-channels model with the three parameters $\mu_L$, $\sigma_L^2$, and $c_{su}$. Each of the two additional parameters is intended to address a shortcoming of many independent-channels models: By assuming that the threshold for order detection can exceed the threshold for successiveness detection (i.e., $c_o \ge c_{su}$), one can account for non-monotonic psychometric functions $P(R_{xy} \mid d)$ and $1 - P(R_{yx} \mid d)$ (even if $R_{xy}$ and $R_{yx}$ were given at random for $I_{su}$). By further adding the response bias $g$, one can also account for differently shaped, non-parallel psychometric functions $P(R_{xy} \mid d)$ and $1 - P(R_{yx} \mid d)$ by capturing the subject's response tendency when uncertain about the temporal order of stimuli perceived as successive. Thus, both additional parameters of the TTM seem justified. According to the resulting five-parameter model, the predicted response probabilities are

$$\begin{aligned}
P(R_{si} \mid d) &= \Phi\!\left(\frac{c_{su} - \mu_L - d}{\sigma_L}\right) - \Phi\!\left(\frac{-c_{su} - \mu_L - d}{\sigma_L}\right),\\
P(R_{xy} \mid d) &= 1 - \Phi\!\left(\frac{c_o - \mu_L - d}{\sigma_L}\right) + g\,P(I_{su} \mid d),\\
P(R_{yx} \mid d) &= \Phi\!\left(\frac{-c_o - \mu_L - d}{\sigma_L}\right) + (1 - g)\,P(I_{su} \mid d),
\end{aligned}$$

with

$$P(I_{su} \mid d) = \Phi\!\left(\frac{c_o - \mu_L - d}{\sigma_L}\right) - \Phi\!\left(\frac{c_{su} - \mu_L - d}{\sigma_L}\right) + \Phi\!\left(\frac{-c_{su} - \mu_L - d}{\sigma_L}\right) - \Phi\!\left(\frac{-c_o - \mu_L - d}{\sigma_L}\right).$$
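The following Python sketch computes the TTM's response probabilities under the normal-$\Delta L$ assumption and illustrates how an order-uncertainty band produces a "bump" in $P(R_{xy} \mid d)$: when $\Delta A$ mostly falls into the band $[-c_o, -c_{su})$, a proportion $g$ of responses is $R_{xy}$ even though $S_y$ led. The function name and parameter values are hypothetical and chosen only to make the bump easy to see.

```python
import math


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ttm_response_probs(d, mu_L, sigma_L, c_su, c_o, g):
    """Five-parameter TTM: one Delta A, two thresholds with c_o >= c_su."""
    if c_o < c_su:
        raise ValueError("the TTM requires c_o >= c_su")

    def below(c):
        # P(Delta A < c), with Delta A ~ Normal(mu_L + d, sigma_L)
        return phi((c - mu_L - d) / sigma_L)

    p_xy_state = 1.0 - below(c_o)      # Delta A >= c_o
    p_yx_state = below(-c_o)           # Delta A <= -c_o
    p_si = below(c_su) - below(-c_su)  # |Delta A| < c_su
    # Two order-uncertainty bands: c_su <= |Delta A| < c_o
    p_su = (below(c_o) - below(c_su)) + (below(-c_su) - below(-c_o))
    return {
        "R_xy": p_xy_state + g * p_su,
        "R_si": p_si,
        "R_yx": p_yx_state + (1.0 - g) * p_su,
    }


# Hypothetical parameters: narrow Delta L distribution, wide uncertainty bands.
for d in (-100, -40, 0, 40, 100):
    print(d, round(ttm_response_probs(d, 0.0, 5.0, 20.0, 60.0, 0.5)["R_xy"], 3))
```

With $g = 0.5$ both bumps have the same height; moving $g$ away from 0.5 gives bumps of unequal height ($g$ versus $1 - g$), which is how the model produces non-parallel curves. Setting $c_o = c_{su}$ removes the bands and recovers the basic single-threshold model.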

Studies
We considered data from ternary-response tasks as well as data from other tasks that could plausibly be converted into the responses $R_{xy}$, $R_{si}$, and $R_{yx}$. As a result, data from eight studies were included in the present model comparison. These studies differed in several respects (e.g., stimulus modality, stimulus type, response format). Most notably, some studies used intramodal stimuli (i.e., two visual stimuli), whereas others used intermodal stimuli (i.e., a visual stimulus and an auditory stimulus). However, the models considered here are sufficiently general to be applicable to all these experimental settings. A brief description of the eight studies follows.

Benussi (1913, pp. 363-369) conducted three experiments using a ternary-response task. The onset difference of two horizontally displaced light flashes was manipulated with $d = -150, -120, \ldots, 150$ ms in Experiment 1 and $d = -90, -60, \ldots, 90$ ms in Experiments 2 and 3 (the nominal values contain small rounding errors because the actual step size was 29.7 ms). For each experiment and each value of $d$, Benussi (1913) summed the number of trials in which a particular response was given across all 32 subjects. Dividing this sum by the total number of trials for each value of $d$ yielded an average psychometric function across all subjects for each of the three experiments.

Allan (1975a) asked three subjects to give an SJ ($R_{si}$ or $R_{su}$) followed by a TOJ ($R_{xy}$ or $R_{yx}$) with regard to the stimulus-offset difference ($d = -100, -75, \ldots, 100$ ms) of a light and a tone (see also Yarrow et al., 2011). In accordance with Ulrich (1987) and Jaśkowski (1991b), we coded $R_{su}$ followed by $R_{xy}$ ($R_{yx}$) as $R_{xy}$ ($R_{yx}$), and $R_{si}$ followed by $R_{xy}$ or $R_{yx}$ as $R_{si}$.

Allan (1975b) supplemented an audio-visual TOJ task (stimulus-offset difference $d = -100, -75, \ldots, 100$ ms; six subjects) with a confidence rating, yielding the four response alternatives $R_{xy} \cap \text{certain}$, $R_{xy} \cap \text{uncertain}$, $R_{yx} \cap \text{uncertain}$, and $R_{yx} \cap \text{certain}$ (see also Arnold et al., 2020; Keane et al., 2015). Following Ulrich (1987) and Jaśkowski (1991b), we coded $R_{xy} \cap \text{certain}$ as $R_{xy}$, both $R_{xy} \cap \text{uncertain}$ and $R_{yx} \cap \text{uncertain}$ as $R_{si}$, and $R_{yx} \cap \text{certain}$ as $R_{yx}$.
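Both recoding schemes for the Allan data amount to simple lookup tables. The Python sketch below assumes a hypothetical trial-level format in which each Allan (1975a) trial is a pair (SJ, TOJ) and each Allan (1975b) trial is a pair (TOJ, confidence); this format and the table names are assumptions for illustration only.

```python
from collections import Counter

# Allan (1975a): SJ followed by TOJ  ->  ternary response
CODING_SJ_TOJ = {
    ("su", "xy"): "R_xy",
    ("su", "yx"): "R_yx",
    ("si", "xy"): "R_si",
    ("si", "yx"): "R_si",
}

# Allan (1975b): TOJ plus confidence rating  ->  ternary response
CODING_TOJ_CONFIDENCE = {
    ("xy", "certain"): "R_xy",
    ("xy", "uncertain"): "R_si",
    ("yx", "uncertain"): "R_si",
    ("yx", "certain"): "R_yx",
}


def recode(trials, coding):
    """Count ternary responses for trials given as (first, second) tuples."""
    return Counter(coding[trial] for trial in trials)


sj_toj_trials = [("su", "xy"), ("si", "yx"), ("su", "yx"), ("si", "xy"), ("su", "xy")]
print(recode(sj_toj_trials, CODING_SJ_TOJ))
```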
Ulrich (1987) used a ternary-response task with regard to the onset difference of two vertically displaced light flashes ($d = -100, -75, \ldots, 100$ ms). For each of the three subjects, stimulus intensity was manipulated so that both visual stimuli were equally bright in the high-intensity condition and equally dim in the low-intensity condition.
Jaśkowski (1991a, Experiment 3) also used a ternary-response task with regard to the onset difference of two vertically displaced light flashes ($d = -75, -60, \ldots, 75$ ms). The task-irrelevant stimulus duration was manipulated within subjects (three subjects; bottom stimulus shorter than, equal in duration to, or longer than the top stimulus). Note that the data from this study had to be read from a figure, which resulted in minimal deviations from the observed data.
Van Eijk et al. (2008) also used an audio-visual ternary-response task ($d = -350, -300, \ldots, 350$ ms). Whether the stimuli were simple (light flash and click) or complex (bouncing ball and corresponding impact sound) was manipulated across sessions. Twelve subjects were tested with the complex stimuli, eleven of whom were also tested with the simple stimuli.
García-Pérez and Alcalá-Quintana (2018) asked 19 subjects to perform a quaternary-response task with the four response alternatives $R_{xy}$, $R_{yx}$, $R_{si}$, and $R_{su}$ (see also Weiß & Scharlau, 2010, 2011). The response $R_{su}$ here indicated that the stimuli appeared successive but in an unclear temporal order. The onset difference of two horizontally displaced visual stimuli was manipulated in steps of 17 ms according to an adaptive procedure (García-Pérez, 2014). As predicted by the TTM and consistent with the TSM, $P(R_{su} \mid d)$ tended to be bimodal over $d$, with the highest response probabilities to the left and right of the peak of $P(R_{si} \mid d)$. However, many subjects gave $R_{su}$ rarely or not at all, possibly because they considered this response undesirable and therefore resorted to other response alternatives (Weiß & Scharlau, 2011). As noted by García-Pérez and Alcalá-Quintana (2018, p. 283), the psychometric functions of those subjects often exhibited non-monotonicities around the simultaneity band (i.e., at the locations where the two peaks of $R_{su}$ are expected under the TTM). However, because $R_{su}$ was used rather infrequently, and because it is unclear how the REM would have to be modified to fit the distribution of $R_{su}$ without invoking some form of TTM, trials with the response $R_{su}$ were excluded from the present analysis, and the three models were fitted to the distributions of the responses $R_{xy}$, $R_{si}$, and $R_{yx}$.
Finally, Lahkar et al. (2023) conducted a ternary-response task with two horizontally displaced light flashes ($d = -65, -60, \ldots, 65$ ms). Responses were summed across the 24 subjects for each of the 27 values of $d$. Accordingly, we fitted the models to the data aggregated across all subjects.

Model fitting
Each model was fitted to the observed frequencies of $R_{xy}$, $R_{si}$, and $R_{yx}$ as a function of $d$ by minimizing the likelihood-ratio goodness-of-fit statistic

$$G^2 = 2 \sum_{R} \sum_{d} O_{Rd} \ln\frac{O_{Rd}}{E_{Rd}},$$

where $O_{Rd}$ ($E_{Rd}$) denotes the observed (expected) frequency of response $R \in \{R_{xy}, R_{si}, R_{yx}\}$ for a particular value of $d$ (e.g., $d \in \{-100, -75, \ldots, 100\}$ ms). It is well known that $G^2$ is inaccurate for small expected frequencies (see e.g., Cochran, 1952, 1954). Therefore, a somewhat arbitrary lower bound of 0.1 was set for expected frequencies to reduce the high leverage of very small expected frequencies. This improved the fits of all models without affecting the outcome of the model selection.
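A minimal Python sketch of the fit statistic with the lower bound on expected frequencies follows. One detail is an assumption rather than a statement from the text: cells with $O_{Rd} = 0$ contribute zero, which is the usual limit of $O \ln O$. The data layout (a dictionary keyed by response and $d$) and the example counts are hypothetical.

```python
import math


def g_squared(observed, expected, floor=0.1):
    """Likelihood-ratio statistic G^2 = 2 * sum O * ln(O / E).

    observed, expected: dicts mapping (response, d) -> frequency.
    Expected frequencies below `floor` are raised to `floor`.
    """
    total = 0.0
    for cell, o in observed.items():
        if o > 0:  # O * ln(O / E) -> 0 as O -> 0
            e = max(expected[cell], floor)
            total += o * math.log(o / e)
    return 2.0 * total


observed = {("R_xy", 0): 10, ("R_si", 0): 20, ("R_yx", 0): 10}
expected = {("R_xy", 0): 12.0, ("R_si", 0): 16.0, ("R_yx", 0): 12.0}
print(round(g_squared(observed, expected), 4))
```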
The statistic $G^2$ was minimized using the quasi-Newton optimization algorithm L-BFGS-B. The parameter space was constrained as follows (initial parameter values in parentheses): mean arrival-latency differences ($\mu_L$, $\mu_{L_{su}}$, $\mu_{L_o}$) could vary from −100 to 100 ms (−33, 34 ms), standard deviations of arrival-latency differences ($\sigma_L$, $\sigma_{L_{su}}$, $\sigma_{L_o}$) from 1 to 100 ms (33, 67 ms), thresholds ($c$, $c_{su}$, $c_o$) from 1 to 150 ms (50, 100 ms), response errors ($\varepsilon_{xy}$, $\varepsilon_{si}$, $\varepsilon_{yx}$) from 0 to 0.8 (0.27, 0.53), and response biases ($\kappa_{xy-yx}$, $\kappa_{si-xy}$, $\kappa_{yx-xy}$, $g$) from 0 to 1 (0.33, 0.67). For each parameter, the two initial values were evenly distributed in the search space. Parameters were estimated for each combination of initial values, yielding $2^9 = 512$ optimization runs for the REM, $2^7 = 128$ runs for the TSM, and $2^5 = 32$ runs for the TTM. Notably, estimation from such a multitude of initial values was necessary to identify the best-fitting parameter values of the REM and the TSM, but turned out to be unnecessary for the TTM. That is, the minimized $G^2$ values varied as a function of the initial values for the REM and the TSM, but not for the TTM.
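The multi-start design can be made explicit by enumerating the starting points. The Python sketch below builds the Cartesian product of the two reported initial values per parameter and the reported bounds; the parameter-type labels and their ordering within each model are hypothetical.

```python
import itertools

# Initial values as reported in the text: two per parameter type.
INITIAL_VALUES = {
    "mu": (-33.0, 34.0),    # mean arrival-latency difference (ms)
    "sigma": (33.0, 67.0),  # SD of arrival-latency difference (ms)
    "c": (50.0, 100.0),     # thresholds (ms)
    "eps": (0.27, 0.53),    # response-error rates
    "bias": (0.33, 0.67),   # kappa parameters and g
}

# Bounds as reported in the text, as (low, high) pairs.
BOUNDS = {
    "mu": (-100.0, 100.0), "sigma": (1.0, 100.0), "c": (1.0, 150.0),
    "eps": (0.0, 0.8), "bias": (0.0, 1.0),
}

# Parameter types per model (ordering is illustrative).
MODEL_PARAMS = {
    "REM": ["mu", "sigma", "c", "eps", "eps", "eps", "bias", "bias", "bias"],
    "TSM": ["mu", "sigma", "c", "mu", "sigma", "c", "bias"],  # su center, order center, g
    "TTM": ["mu", "sigma", "c", "c", "bias"],                 # mu_L, sigma_L, c_su, c_o, g
}


def start_points(model):
    """All combinations of the two initial values per parameter (2**k starts)."""
    return list(itertools.product(*(INITIAL_VALUES[t] for t in MODEL_PARAMS[model])))


def bounds_for(model):
    return [BOUNDS[t] for t in MODEL_PARAMS[model]]


for model in MODEL_PARAMS:
    print(model, len(start_points(model)))
```

Each start point could then be handed to a bounded quasi-Newton optimizer, for example SciPy's `scipy.optimize.minimize` with `method="L-BFGS-B"` and `bounds=bounds_for(model)`, keeping the run with the lowest $G^2$. Note that box bounds alone do not enforce the TTM constraint $c_o \ge c_{su}$ (some of the 32 TTM starts violate it), so that constraint needs separate handling, for example through a reparameterization.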
Because $G^2$ is only asymptotically $\chi^2$-distributed, chi-square goodness-of-fit tests yield an inflated rate of significant results for small expected frequencies (García-Pérez, 1994; García-Pérez & Núñez-Antón, 2001, 2004), which were prevalent in many of the data sets considered here (e.g., Lahkar et al., 2023). Furthermore, estimating confidence intervals via parametric bootstrapping (see Alcalá-Quintana & García-Pérez, 2013; García-Pérez & Alcalá-Quintana, 2012b; Wichmann & Hill, 2001) led to large differences in the width of the confidence intervals for $G^2$ across the three models, which exhibited considerable differences in their best-fitting variability parameter values. It therefore seemed most sensible not to subject $G^2$ to a chi-square or parametric-bootstrap goodness-of-fit test, but instead to first assess goodness of fit qualitatively by visual inspection and then to weigh goodness of fit ($G^2$) against model complexity in a quantitative model comparison. Note that each individual fit on which the model comparison is based is presented either below or in the Supplementary Material.
Following previous TOJ model comparisons (e.g., García-Pérez & Alcalá-Quintana, 2012a; Lahkar et al., 2023; Tünnermann & Scharlau, 2018b; Wen et al., 2020), model selection was based on the Bayesian information criterion (BIC; Schwarz, 1978):

$$\mathrm{BIC} = G^2 + k \ln(n),$$

where $G^2$ rewards goodness of fit and $k \ln(n)$, with number of parameters $k$ and number of observations $n$, penalizes complexity. Thus, the model with the lowest overall BIC value across the eight studies was considered best. The BIC was used because it is consistent; that is, the probability of selecting the true (data-generating) model approaches 1 as $n$ increases (e.g., Claeskens & Hjort, 2006). Unlike other information criteria such as the Akaike information criterion, the BIC therefore seemed appropriate for the objective of the present study, namely to compare the explanatory power of several candidate models. Additionally, to quantify the specific ability of the models to fit non-monotonicities in the observed psychometric functions, a non-monotonicity index (NMI) was computed, where $R \in \{R_{xy}, R_{xy} \cup R_{si}\}$ and $i$ runs from the first to the penultimate value of $d$. The higher the NMI (where NMI ≥ 0), the stronger the non-monotonicity in the functions.
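A minimal sketch of the BIC-based selection, assuming that each individual fit is summarized by its $G^2$, its number of free parameters $k$, and its number of observations $n$, and that BIC values are summed over fits. The $G^2$ values below are made up for illustration and are not those in Table 1.

```python
import math


def bic(g2, k, n):
    """BIC = G^2 + k * ln(n): fit term plus complexity penalty."""
    return g2 + k * math.log(n)


def total_bic(fits):
    """Sum BIC over individual fits, each given as a (G^2, k, n) tuple."""
    return sum(bic(g2, k, n) for g2, k, n in fits)


def select_model(fits_by_model):
    """Return the name of the model with the lowest total BIC, plus all totals."""
    totals = {name: total_bic(fits) for name, fits in fits_by_model.items()}
    return min(totals, key=totals.get), totals


# Made-up G^2 values for a single fit with n = 1000 observations (not from Table 1).
# k = 9, 7, and 5 free parameters for the REM, TSM, and TTM, matching the
# 2^9, 2^7, and 2^5 starting combinations described in the text.
toy_fits = {
    "REM": [(40.0, 9, 1000)],
    "TSM": [(42.0, 7, 1000)],
    "TTM": [(45.0, 5, 1000)],
}
best_model, totals = select_model(toy_fits)
```

In this toy case the TTM wins despite having the largest $G^2$, because each additional parameter costs $\ln(1000) \approx 6.91$ BIC units.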
The NMI was calculated for both observed and fitted psychometric functions to substantiate our inference from visual inspection that the models differed in how successfully they fitted the observed non-monotonic curves.

Results
In general, visual inspection revealed that all three models fitted the observed psychometric functions $P(R_{xy} \mid d)$, $P(R_{si} \mid d)$, and $P(R_{yx} \mid d)$ well. As an example, Fig. 3 shows the model fits to the individual data from Ulrich (1987). Individual fits for all other studies are depicted in Figs. S1-13 in the Supplementary Material. In addition, the observed and fitted psychometric functions $P(R_{xy} \mid d)$ and $1 - P(R_{yx} \mid d)$ are displayed in Fig. 4 for Ulrich (1987) and in Figs. S14-26 in the Supplementary Material for the other studies. These figures allow a visual assessment, both in the observed data and in the fits of the three independent-channels models considered here (REM, TSM, TTM), of the shape characteristics (monotonicity, parallelism) that are predicted by many independent-channels models (perceptual-moment, triggered-moment, and attention-switching models). First, the observed psychometric functions were often not monotonic (see e.g., subject GU, high-intensity condition in Fig. 4). In fact, non-monotonic curves were found in each of the eight studies considered. Notably, the fits of the TSM and the TTM tended to follow non-monotonic trends in the data, whereas the REM fits rarely did so (see e.g., Fig. 4 as well as Figs. S17-25 in the Supplementary Material). Specifically, in the REM fits, the height of the asymptotes was often adjusted without taking into account the observed "bumps". This does not imply that the REM cannot account for non-monotonic psychometric functions at all, since (1) its fits were non-monotonic in some cases (e.g., for subject 2 from García-Pérez and Alcalá-Quintana, 2018; see Fig. S23 in the Supplementary Material), and, more generally, (2) mathematical models can typically generate various curves, not just the ones produced by their best-fitting parameter values (Roberts & Pashler, 2000). Nevertheless, the TSM and the TTM appeared more sensitive to non-monotonic curves than the REM. Quantitatively, this is supported by the fact that the NMIs of the TSM and TTM fits were much closer to the NMIs of the observed psychometric functions than were the NMIs of the REM fits (see Table S25 in the Supplementary Material). Second, the observed psychometric functions $P(R_{xy} \mid d)$ and $1 - P(R_{yx} \mid d)$ were often not parallel (see e.g., subject UD, high-intensity condition in Fig. 4). The fits of all three models followed such non-parallel curves. Crucially, the total BIC value summed across all studies was lowest for the TTM, second lowest for the TSM, and highest for the REM (see Table 1). Accordingly, the TTM provided the most satisfactory account of the three models. The selection of the TTM appeared to be fairly robust: For six of the eight studies considered, the TTM provided the lowest BIC value, with the TSM yielding the lowest value for the study by Allan (1975b) and the REM for the study by van Eijk et al. (2008).
When comparing the models separately for studies with intramodal (i.e., two visual) stimuli (Benussi, 1913; Ulrich, 1987; Jaśkowski, 1991a; García-Pérez & Alcalá-Quintana, 2018; Lahkar et al., 2023) and for studies with intermodal (i.e., visual and auditory) stimuli (Allan, 1975a, b; van Eijk et al., 2008), the TTM provided the lowest overall BIC value in both cases. However, while the TTM yielded the lowest BIC value for each of the five studies with intramodal stimuli, the preference for the TTM was less pronounced and not consistent across the three studies with intermodal stimuli (see Table 1). Moreover, the BIC value of the TTM was lower than the minimum BIC value across a hierarchy of eight REMs with different types and numbers of response-error and response-bias parameters. This suggests that the REM is inferior to the TTM even when taking into account that the frequency and direction of response errors might depend on the setup of a particular study (e.g., the arrangement of response buttons).
Finally, the TTM also yielded the lowest overall BIC value when comparing (otherwise equivalent) models with the alternative assumption of Laplace-distributed arrival-latency differences (see Table 2 and Figs.S27-54 in the Supplementary Material).When comparing the individual models assuming a normal distribution and a Laplace distribution, the normal distribution provided the better fits in the REM and in the TTM, whereas the Laplace distribution provided the better fits in the TSM.Thus, taken together, the TTM with the normal distribution assumption performed best.
The mean best-fitting parameter values of the REM, TSM, and TTM can be found in Tables 3, 4, and 5, respectively. Best-fitting mean arrival-latency differences were similar across models, with the exception of $\mu_{L_o}$ in the TSM, which sometimes differed greatly. Surprisingly, the mean best-fitting TSM parameters $\mu_{L_{su}}$ and $\mu_{L_o}$ had different signs in three of the eight studies. In these cases, the TSM fits seem to suggest a shorter mean arrival latency of one stimulus (e.g., $S_x$) in the successiveness center but of the other stimulus (e.g., $S_y$) in the order center. The standard deviation of the arrival-latency difference was estimated to be lowest for the REM, intermediate for the TTM and for the successiveness center of the TSM, and highest for the order center of the TSM.
Furthermore, the threshold was estimated to be lowest for detecting successiveness (TSM, TTM), intermediate for the REM, and highest for detecting order (TSM, TTM). In particular, in seven of the eight studies, the mean best-fitting TSM parameter values conformed to the constraint of the TTM that the order threshold is at least as high as the successiveness threshold. The exception was the study by van Eijk et al. (2008), for which the estimated thresholds were generally much higher than in the other studies.
The REM estimated response errors to occur quite frequently. This is particularly the case for $I_{si}$, which was estimated to be accidentally reported as $R_{xy}$ or $R_{yx}$ in an average of 24% of all trials. For Ulrich (1987), this value was as high as 38%, despite five practice sessions and ten experimental sessions. According to the mean best-fitting response-bias parameters of the REM, $I_{xy}$ ($I_{yx}$) was less likely to be reported as $R_{si}$ than as $R_{yx}$ ($R_{xy}$). This also hints at a region in the decision space where stimuli appear successive but their order cannot be resolved. Finally, the estimates of the response-bias parameter $g$ in the TSM and the TTM differed substantially across subjects and studies, suggesting that this parameter is needed in both models to account for different individual and study-related response tendencies. The trends in the best-fitting parameters reported here were similarly evident in the fits of the models assuming Laplace-distributed arrival-latency differences (see Tables S21-23 in the Supplementary Material).

Discussion
The present study compared three independent-channels models: the REM (García-Pérez & Alcalá-Quintana, 2012b), the TSM (Jaśkowski, 1991b), and the TTM (inspired by previous ideas from Sternberg et al., 1975, 2023, and García-Pérez & Alcalá-Quintana, 2015b, 2018). In general, independent-channels models assume that stimuli are processed in separate peripheral channels. Temporal order (simultaneity) is perceived if the arrival-time difference of the stimuli at a central location reaches (misses) a certain threshold, which may reflect the duration of a perceptual moment or time quantum. More specifically, the three independent-channels models considered here were developed to explain non-monotonic and non-parallel psychometric functions generated in the ternary-response task.
The REM attributes such shapes to finger errors, the TSM to independent simultaneity and order processing, and the TTM to a higher threshold for order detection than for successiveness detection.These three models were fitted to data from eight previous studies.In sum, the TTM provided the lowest BIC value and was thus considered superior to the TSM and the REM.A more detailed evaluation of each model follows.

Response-error model (REM)
One issue with the REM is that, while it seemed to provide a good fit to non-parallel psychometric functions, it often did not fit non-monotonic curves. The REM is well-equipped to shift the asymptotes of the fitted psychometric functions up and down, and thus often provided a good fit to the data points at the ends of the curves. However, non-monotonicities were regularly due to a U-shaped curvature lateral to the simultaneity peak, which the REM cannot easily account for. Another issue with the REM is that its estimates of the response-error rate often seemed implausibly high, especially considering the amount of practice in some of the studies. Therefore, as argued by Tünnermann and Scharlau (2018b, p. 8), response-error parameters appear unwarranted for highly trained subjects and might even lead to invalid estimates of the response-error rates. In any event, we commend García-Pérez and Alcalá-Quintana (2012b) for expanding the TOJ modeling repertoire with methods to account for response errors and attentional lapses.
The idea of explaining non-monotonic and non-parallel psychometric functions in terms of response errors also impresses with its conceptual simplicity: The basic independent-

Two-stage model (TSM)
The TSM provided a good fit to non-monotonic and non-parallel psychometric functions. To achieve these fits, however, the TSM sometimes relied on implausible assumptions. Specifically, the TSM fits frequently indicated that one stimulus (e.g., a sound) arrived earlier in the successiveness center, whereas the other stimulus (e.g., a light) arrived earlier in the order center. More generally, the TSM departs considerably from other independent-channels models: The latter proceed from the parsimonious assumption that SJs and TOJs are generated by the same internal process, whereas the TSM posits two distinct processes. In this respect, the TSM is conceptually closer to other dual-route models (Horsfall et al., 2021; Mitrani et al., 1986; Parise & Ernst, 2016, 2023; Stelmach & Herdman, 1991). For example, the temporal-profile model of Stelmach and Herdman (1991) assumes that temporal order (simultaneity) is determined by computing the difference (overlap) between the internal response functions elicited by the two stimuli. Similarly, in the multisensory correlation detector proposed by Parise and Ernst (2016, 2023), pre-processed signals are subtracted (multiplied) to compute the lag (correlation) and thereby determine the temporal order (simultaneity) of stimuli (but see Yarrow et al., 2023).
The TSM offers a straightforward explanation for the sometimes uncorrelated PSS estimates from TOJ and SJ tasks (Linares & Holcombe, 2014; Love et al., 2013; van Eijk et al., 2008; Vatakis et al., 2008; but see Machulla et al., 2016; Maier et al., 2011; Sanders et al., 2011). This partial lack of correlation might be taken as evidence that TOJs and SJs are based on distinct processes. However, note that these PSS indices, which are typically obtained by fitting arbitrary functions to the observed data, at best reflect the interplay of sensory, decisional, and motor processes (e.g., perceptual latencies, decision thresholds, and response errors). Therefore, García-Pérez and Alcalá-Quintana (2012a, 2015a, b, c) compared TOJ and SJ task performance in more fine-grained model-based analyses. Their extensive analyses of many data sets pointed to invariant sensory parameters (e.g., arrival latencies) between the two tasks. This is consistent with the REM and the TTM, but casts doubt on the TSM. Moreover, the TSM struggles to account for within-trial relations of SJ and TOJ task performance (Allan, 1975a), as discussed in more detail below.

Two-threshold model (TTM)
Like the TSM, the TTM provided a good fit to non-monotonic and non-parallel psychometric functions. Unlike the TSM, the TTM is committed to a specific origin of such shapes, namely a higher threshold for order detection than for successiveness detection. Put the other way around, compared to the TSM, the TTM imposes the additional constraints that SJs and TOJs are based on the same arrival-latency difference and that the threshold for order detection cannot be smaller than the threshold for successiveness detection.
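These two constraints can be made concrete with a short sketch of the TTM's internal states. It assumes, as an illustration rather than the paper's exact parameterization, that the single arrival-time difference $D$ is normal with mean $d + \mu_L$ and standard deviation $\sigma_L$, with $D > 0$ meaning that x arrives first; the threshold values in the usage lines are arbitrary.

```python
import math


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ttm_state_probs(d, mu, sigma, c_su, c_o):
    """Hypothetical TTM internal-state probabilities at stimulus-onset difference d.

    Assumption: one arrival-time difference feeds both judgments,
    D ~ Normal(d + mu, sigma^2), with D > 0 meaning x arrives first.
    """
    if c_o < c_su:
        raise ValueError("TTM constraint violated: c_o must be >= c_su")
    m = d + mu
    a_o_hi = phi((c_o - m) / sigma)  # P(D < c_o)
    a_su_hi = phi((c_su - m) / sigma)  # P(D < c_su)
    a_su_lo = phi((-c_su - m) / sigma)  # P(D < -c_su)
    a_o_lo = phi((-c_o - m) / sigma)  # P(D < -c_o)
    return {
        "xy": 1.0 - a_o_hi,  # order detected, x first
        "su": (a_o_hi - a_su_hi) + (a_su_lo - a_o_lo),  # successive, order unresolved
        "si": a_su_hi - a_su_lo,  # simultaneous
        "yx": a_o_lo,  # order detected, y first
    }


# Illustrative values in ms (mu and sigma borrowed from the REM means; thresholds arbitrary).
states = ttm_state_probs(d=20.0, mu=-7.0, sigma=22.0, c_su=25.0, c_o=45.0)
single = ttm_state_probs(d=20.0, mu=-7.0, sigma=22.0, c_su=35.0, c_o=35.0)
```

Setting $c_o = c_{su}$ makes the "successive, order unresolved" state vanish, so in this sketch the TTM collapses to a single-threshold model; the `ValueError` enforces the constraint that $c_o$ cannot fall below $c_{su}$.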
These constraints are mostly in line with the modeling results of García-Pérez and Alcalá-Quintana (2012a, 2015a, b, c). As already mentioned, their analyses strongly suggested identical perceptual latencies in the two tasks. Further, in line with the TTM, they obtained larger threshold estimates for the TOJ task than for the SJ task in three studies (Capa et al., 2014; Li & Cai, 2014; Matthews & Welch, 2015). Interestingly, however, they found the reverse pattern in two other studies (Linares & Holcombe, 2014; van Eijk et al., 2008). This is reminiscent of the TSM fits in the present model comparison, which yielded a numerically larger threshold for order (vs. successiveness) detection in seven studies, but the reverse pattern for the results of van Eijk et al. (2008). An exploration of differences across all the studies considered here revealed that the two "deviant" studies (Linares & Holcombe, 2014; van Eijk et al., 2008) (1) covered a substantially wider range of $d$ values and (2) yielded generally higher threshold estimates than all other studies, except for the study of Matthews and Welch (2015). Thus, one might speculate that the presence of large stimulus-onset differences led subjects to set sub-optimally high thresholds, such that threshold differences between SJs and TOJs no longer reflected differences in optimal task performance.
In addition, the TTM is in line with human performance in ternary- and quaternary-response tasks. In ternary-response tasks, non-monotonicities in the psychometric functions tended to occur where the TTM allows them to occur, namely to the left and right of the simultaneity peak. In quaternary-response tasks, $R_{su}$ tended to be most frequent in precisely these regions of $d$, thus yielding a bimodal distribution (García-Pérez & Alcalá-Quintana, 2018; Weiß & Scharlau, 2011). Finally, the TTM also appears to be in line with psychophysiological data, which pointed to brain areas that are commonly activated during SJs and TOJs, additional regions that are more strongly activated during TOJs, but few to no regions that are more strongly activated during SJs (Binder, 2015; Love et al., 2018; Matsuzaki et al., 2014; Miyazaki et al., 2016). Taken together, the constraints imposed by the TTM seem to be largely supported by cognitive modeling, neural, and behavioral evidence.
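Why the TTM lets $R_{su}$ peak on both sides of the simultaneity peak can be checked numerically. The sketch below assumes a normal arrival-time difference $D \sim N(d + \mu, \sigma^2)$ and, hypothetically, that the quaternary response $R_{su}$ simply reports the state $c_{su} \le |D| < c_o$ without response errors; all parameter values are invented for illustration.

```python
import math


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_successive_unresolved(d, mu, sigma, c_su, c_o):
    """Hypothetical TTM probability of the state 'successive, order unresolved'.

    Assumes D ~ Normal(d + mu, sigma^2); the state occurs when c_su <= |D| < c_o.
    """
    m = d + mu
    upper_band = phi((c_o - m) / sigma) - phi((c_su - m) / sigma)
    lower_band = phi((-c_su - m) / sigma) - phi((-c_o - m) / sigma)
    return upper_band + lower_band


def local_maxima(xs, ys):
    """Interior grid points whose value exceeds both neighbours."""
    return [xs[i] for i in range(1, len(xs) - 1) if ys[i - 1] < ys[i] > ys[i + 1]]


# Hypothetical parameters in ms: no bias, sigma = 10, c_su = 20, c_o = 60.
d_grid = list(range(-120, 121, 5))
curve = [p_successive_unresolved(d, 0.0, 10.0, 20.0, 60.0) for d in d_grid]
peaks = local_maxima(d_grid, curve)
```

With these values the probability is about 0.05 at $d = 0$ but about 0.95 at $d = \pm 40$ ms, the midpoint between the two thresholds, which yields the bimodal shape.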
The TTM is conceptually simpler than the TSM, and computationally simpler than both the TSM and the REM because its fits do not depend on the initial values of the numerical search procedure. However, the TTM does seem to require a re-interpretation of a "threshold". In the traditional independent-channels models, the threshold is defined relative to the duration of a psychological moment (or a time quantum). According to the TTM, however, the threshold depends on whether observers are detecting order (as in the TOJ task) or successiveness (as in the SJ task). Thus, under the classic moment conception of the threshold (e.g., Stroud, 1955), moments would be prolonged in the TOJ (vs. SJ) task. Furthermore, in the ternary-response task, two psychological moments of different duration would need to elapse in parallel. We are not convinced that the psychological moment, so conceived, still represents a sensible concept. Instead, the TTM appeals to an interpretation of the threshold as reflecting the amount of evidence an observer needs to determine successiveness or order. Under the plausible assumption that evidence accrues over time rather than instantaneously, responses should be slower for higher thresholds. Consistent with the TTM, responses have indeed been found to be slower in the TOJ task than in the SJ task (Matsuzaki et al., 2014; Matthews et al., 2016; Pan & Huang, 2023).
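The timing argument can be illustrated with a toy accumulator. This is a hypothetical random-walk sketch, not a model proposed in the paper: evidence grows by a constant drift plus Gaussian noise in each time step, and a response is issued once the accumulated evidence reaches a threshold. Thresholds, drift, and noise are in arbitrary units.

```python
import random


def mean_steps_to_threshold(threshold, drift=1.0, noise=1.0, n_trials=2000, seed=1):
    """Hypothetical accumulator: average number of time steps until summed noisy
    evidence first reaches `threshold` (a discrete random walk with positive drift)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        evidence, steps = 0.0, 0
        while evidence < threshold:
            evidence += drift + noise * rng.gauss(0.0, 1.0)
            steps += 1
        total += steps
    return total / n_trials


# Hypothetical thresholds: lower for successiveness (SJ), higher for order (TOJ).
t_successiveness = mean_steps_to_threshold(threshold=10.0)
t_order = mean_steps_to_threshold(threshold=20.0)
```

In this sketch the mean number of steps grows roughly in proportion to the threshold (about threshold divided by drift, plus a small overshoot), so a higher order threshold predicts slower TOJs than SJs.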

Relation of successiveness and order detection
For many decades now, a lingering question has been whether successiveness detection is necessary or even sufficient for order detection. Specifically, successiveness detection may be (1) necessary and sufficient (Allan & Kristofferson, 1974; Baron, 1969), (2) neither necessary nor sufficient (cf. Mitrani et al., 1986), or (3) necessary yet insufficient (Hirsh, 1959; Hirsh & Sherrick, 1961) for order detection. Each of these three positions is represented by one of the three independent-channels models considered here: (1) The REM requires that successiveness detection be necessary and sufficient for order detection, because SJs and TOJs are thought to be governed by the same sensory and decisional processes. (2) According to the TSM, successiveness detection is neither necessary nor sufficient for order detection, because SJs and TOJs are thought to be governed by independent sensory and decisional processes. (3) The TTM demands that successiveness detection be necessary but not sufficient for order detection, because SJs and TOJs are thought to be governed by the same sensory and decisional processes, except that the decisional threshold may be higher for TOJs than for SJs.
Many pieces of (psychophysical, cognitive modeling, mental chronometric, and psychophysiological) evidence described earlier indicate that successiveness detection does not suffice for order detection. We will therefore now turn to the necessity assumption. Although successiveness detection may be regarded as a "logical prerequisite" (Arrouet et al., 2023, p. 364) for order detection, independent TOJ and SJ processes would make it possible to detect order without detecting successiveness. Contrary to the independence assumption, however, Allan (1975a) observed that the conditional probability of a correct order response, given a preceding correct successiveness response to the same stimulus pair, was significantly higher than the unconditional probability of a correct order response in 19 out of 24 cases (three subjects × eight values of $d \neq 0$).
Critically, the data of Allan (1975a) also allow a more direct test of the necessity assumption: If order detection necessitates successiveness detection, order detection performance should be at chance level when stimuli are judged to be simultaneous. To test this prediction, we re-analyzed the data of Allan (1975a) and found that the probability of a correct order response after an incorrect simultaneity response $R_{si}$ differed significantly from 0.50 in only three out of 24 cases (12.5%). Even when summing responses across all values of $d$, a binomial test did not yield a significant deviation from 0.50 for any of the three subjects (AJ: 102/180 = 0.57, p = .086; TM: 141/252 = 0.56, p = .068; BP: 139/275 = 0.51, p = .904). This somewhat surprising outcome suggests that successiveness detection is indeed necessary for order detection, as entailed by the TTM and the REM.
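The summed binomial tests can be recomputed from the counts given above. The sketch assumes an exact two-sided test against 0.5; the text does not state which variant was used, so small deviations from the reported p values would be expected if another variant was applied.

```python
from math import comb


def binom_two_sided_p(k, n):
    """Exact two-sided binomial test of H0: p = 0.5.

    Under p = 0.5 the null distribution is symmetric, so the two-sided p value
    is twice the smaller tail probability (capped at 1).
    """
    lower = sum(comb(n, i) for i in range(k + 1))  # number of outcomes <= k
    upper = sum(comb(n, i) for i in range(k, n + 1))  # number of outcomes >= k
    return min(1.0, 2 * min(lower, upper) / 2**n)


# Correct order responses after R_si, summed over d (counts taken from the text).
subjects = {"AJ": (102, 180), "TM": (141, 252), "BP": (139, 275)}
results = {name: (k / n, binom_two_sided_p(k, n)) for name, (k, n) in subjects.items()}
```

The resulting proportions and p values should closely match those reported above (.086, .068, and .904), none of which falls below .05.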

Limitations of present study
In the present study, three independent-channels models (REM, TSM, TTM) were compared based on their fit to the psychometric functions observed in eight previous studies using more than two response alternatives.The preference for the TTM across the three studies using intermodal stimuli was less pronounced compared to the consistent preference for the TTM in the five studies using intramodal stimuli.Therefore, future studies with intermodal stimuli should ascertain whether the preference for the TTM holds for the judgment of temporal relations across sensory modalities.
Another limitation of the present study was that the model selection using the BIC rests on the simplifying rule that the models receive identical penalties for each parameter, irrespective of how much a particular parameter within a particular model can alter the predictions of that model.This approach appears unwarranted in some cases.For example, the order threshold allows for more flexibility in the TSM than in the TTM, since it is allowed to vary independently of the successiveness threshold in the TSM, whereas it is prohibited to fall below the successiveness threshold in the TTM.While this instance seems to strengthen our selection of the TTM, the overall impact of this simplifying rule remains unclear.Therefore, future model comparisons may go beyond enumerating the number of free parameters when taking into account how flexible the models are in fitting data.
Moreover, it would also be desirable to obtain a more precise specification of the shapes of the psychometric functions predicted by different independent-channels models. Such an approach is pursued in a current study by Sternberg et al. (2023), which evaluates several predictions concerning the cumulants of the psychometric functions made by models closely related to the two-threshold model considered here.
Furthermore, independent-channels models have been subject to other critical tests apart from their predictions about the shape of psychometric functions. Perhaps most notably, independent-channels models have also been tested with respect to RT data (Heath, 1984) and to their assumptions of independent peripheral channels and independent peripheral and central processing (Sternberg & Knoll, 1973; Ulrich, 1987). An evaluation of independent-channels models based on such tests was also beyond the scope of the present study. Finally, the theoretical conclusions of the present study have yet to be experimentally validated with new data.

Conclusion
Many independent-channels models have been challenged by observations of non-monotonic and non-parallel psychometric functions in ternary-response tasks. Three independent-channels models attribute such shapes to response errors (REM), to distinct processes underlying SJs and TOJs (TSM), or to a larger threshold for order (vs. successiveness) detection (TTM). Of these three models, the TTM provided the best balance between goodness of fit and parsimony in the present model comparison. The TTM is also in line with many other results from different methods (cognitive modeling, psychophysics, mental chronometry, psychophysiology), which together seem to suggest that successiveness detection is necessary but not sufficient for order detection. Independent-channels models can thus be reconciled with non-monotonic and non-parallel psychometric functions in a simple and plausible way.

The parameter $k$ can be conceptualized as the number of neural pulses required for stimulus detection (e.g., Luce & Green, 1972). For simplicity, assume that $\lambda_x = \lambda_y = \lambda$. In this case, the distribution of the difference of the Erlang-distributed arrival latencies $L_x$ and $L_y$ has the mean and the variance […]. Rearranging yields […]. Let the number of required neural pulses be $k \in \{1, 2, 3, 4\}$ and $\sigma_L^2 = 484\ \mathrm{ms}^2$ (see above). For the special case of a single required pulse ($k = 1$), the Erlang distributions of $L_x$ and $L_y$ reduce to exponential distributions, and their difference $L$ reduces to a Laplace distribution. In the first row of Fig. 5, the PDFs of the Erlang-distributed arrival latencies $L_x$ and $L_y$ are shown for the four cases. In the second row, the PDF of their difference is compared with the PDF of the normally distributed arrival-latency difference $L$.
The distribution of the difference between the two Erlang-distributed latencies quickly approaches the normal distribution as the number of required neural pulses $k$ increases. Despite residual discrepancies in kurtosis, it is doubtful whether the difference of two Erlang-distributed latencies with $k > 1$ could be empirically distinguished from a normal distribution.
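The Erlang comparison can be quantified in a few lines of Python. The sketch assumes, beyond the text, that both channels share the same $k$ as well as the same rate $\lambda$ and that $L_x$ and $L_y$ are independent; the difference then has variance $2k/\lambda^2$, which fixes $\lambda$ for $\sigma_L^2 = 484\ \mathrm{ms}^2$. Its excess kurtosis $3/k$ follows from the Erlang cumulants ($k = 1$ gives the Laplace value of 3), and a seeded Monte Carlo check is included.

```python
import math
import random
import statistics

SIGMA2_L = 484.0  # ms^2, variance of the arrival-latency difference (22^2)


def erlang_rate(k, var_diff=SIGMA2_L):
    """Rate lambda such that L_x - L_y, with L_x, L_y iid Erlang(k, lambda), has variance var_diff.

    Var(L_x - L_y) = 2k / lambda^2, hence lambda = sqrt(2k / var_diff).
    """
    return math.sqrt(2.0 * k / var_diff)


def excess_kurtosis_of_difference(k):
    """Excess kurtosis of L_x - L_y for iid Erlang(k, lambda) latencies.

    Erlang cumulants are kappa_n = k (n-1)! / lambda^n; in the difference the even
    cumulants double, so kappa_4 / kappa_2^2 = (12k / lambda^4) / (2k / lambda^2)^2 = 3 / k.
    """
    return 3.0 / k


def simulate_difference(k, lam, n, seed=0):
    """Monte Carlo draws of L_x - L_y; gammavariate takes shape k and scale 1 / lambda."""
    rng = random.Random(seed)
    scale = 1.0 / lam
    return [rng.gammavariate(k, scale) - rng.gammavariate(k, scale) for _ in range(n)]


for k in (1, 2, 3, 4):
    print(f"k={k}: lambda={erlang_rate(k):.4f} per ms, "
          f"excess kurtosis={excess_kurtosis_of_difference(k):.2f}")

# Sanity check by simulation for k = 2: the sample variance should be close to 484 ms^2.
sample = simulate_difference(2, erlang_rate(2), 50_000)
sample_var = statistics.variance(sample)
```

Under these assumptions the excess kurtosis falls from 3 (Laplace) to 0.75 at $k = 4$; this is the residual discrepancy from the normal distribution (excess kurtosis 0) referred to above.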

Difference of two Wald distributions
Assume that the arrival-latency difference $L$ is normally distributed with mean $\mu_L = -7$ ms and variance $\sigma_L^2 = 22^2\ \mathrm{ms}^2 = 484\ \mathrm{ms}^2$ (following the mean best-fitting REM parameter values from Table 3). Further, assume that $\sigma_{L_x}^2 = \sigma_{L_y}^2$. Thus, for independent latencies,

$$\sigma_L^2 = \sigma_{L_x}^2 + \sigma_{L_y}^2 = 2\sigma_{L_x}^2.$$

Substituting $\sigma_L^2 = 484\ \mathrm{ms}^2$ yields $\sigma_{L_x}^2 = \sigma_{L_y}^2 = 242\ \mathrm{ms}^2$. The PDF of the Wald distribution reads

$$f(t) = \sqrt{\frac{\lambda}{2\pi t^3}}\, \exp\!\left(-\frac{\lambda (t - \mu)^2}{2\mu^2 t}\right),$$

with $t > 0$, $\mu > 0$, and $\lambda = \mu^3/\sigma^2 > 0$. Let $\mu_{L_x} \in \{50, 100, 150, 200\}$ ms and $\mu_{L_y} = \mu_{L_x} - 7$ ms, yielding $\mu_L = -7$ ms in each of these four cases. In the first row of Fig. 6, the PDFs of the Wald-distributed arrival latencies $L_x$ and $L_y$ are shown for each case. In the second row, the PDF of their difference is compared with the PDF of the normally distributed arrival-latency difference $L$. The distribution of the difference between two Wald-distributed latencies quickly approaches the normal distribution with increasing latencies. In fact, the two distributions are virtually indistinguishable from the second case onward, where $\mu_{L_x} = 100$ ms and $\mu_{L_y} = 93$ ms.
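The convergence of the Wald difference toward normality can be checked through its cumulants. The sketch assumes independent channels, the per-channel variance of 242 ms² derived above, and $L = L_y - L_x$ so that $\mu_L = -7$ ms; it uses the standard Wald cumulants $\mu^3/\lambda$, $3\mu^5/\lambda^2$, and $15\mu^7/\lambda^3$, plus a numerical check that the density integrates to about 1.

```python
import math

SIGMA2_CHANNEL = 242.0  # ms^2 per channel, so that Var(L) = 2 * 242 = 484 ms^2


def wald_pdf(t, mu, lam):
    """Wald (inverse Gaussian) density with mean mu and shape lam."""
    if t <= 0.0:
        return 0.0
    return math.sqrt(lam / (2.0 * math.pi * t**3)) * math.exp(
        -lam * (t - mu) ** 2 / (2.0 * mu**2 * t))


def wald_shape_param(mu, var=SIGMA2_CHANNEL):
    """lambda = mu^3 / sigma^2."""
    return mu**3 / var


def wald_cumulants(mu, var=SIGMA2_CHANNEL):
    """Second to fourth cumulants of a Wald latency: mu^3/lam, 3 mu^5/lam^2, 15 mu^7/lam^3."""
    lam = wald_shape_param(mu, var)
    return mu**3 / lam, 3.0 * mu**5 / lam**2, 15.0 * mu**7 / lam**3


def difference_shape(mu_x, mu_y):
    """Variance, skewness, and excess kurtosis of L = L_y - L_x for independent Wald latencies."""
    k2x, k3x, k4x = wald_cumulants(mu_x)
    k2y, k3y, k4y = wald_cumulants(mu_y)
    k2 = k2x + k2y  # variances add
    k3 = k3y - k3x  # odd cumulants of -L_x change sign
    k4 = k4y + k4x  # even cumulants add
    return k2, k3 / k2**1.5, k4 / k2**2


shapes = {mu_x: difference_shape(mu_x, mu_x - 7.0) for mu_x in (50.0, 100.0, 150.0, 200.0)}
skews = [shapes[m][1] for m in sorted(shapes)]
kurts = [shapes[m][2] for m in sorted(shapes)]

# Midpoint-rule check that the density integrates to about 1 for mu = 100 ms.
lam_100 = wald_shape_param(100.0)
area = sum(wald_pdf(0.05 + 0.1 * i, 100.0, lam_100) * 0.1 for i in range(4000))
```

For $\mu_{L_x} = 100$ ms the skewness of the difference is about 0.01 and its excess kurtosis about 0.2, both small relative to what a reader could detect in psychometric data, and both shrink further for longer latencies.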

Fig. 1 Mapping from the arrival-time difference A via the internal state I to the response R in the ternary-response task according to the two-threshold model (TTM). Note. A: arrival time; x: one channel (e.g.,

Fig. 5 Theoretical comparison between the difference $L$ of two Erlang-distributed arrival latencies $L_x$ and $L_y$ and the normally distributed arrival-latency difference $L$

Table 1
Sum of $G^2$ and BIC values for the three independent-channels models with the assumption of normally distributed arrival-latency differences across subjects and conditions within and across studies

Table 2
Sum of $G^2$ and BIC values for the three independent-channels models with the alternative assumption of Laplace-distributed arrival-latency differences across subjects and conditions within and across studies

Table 4
Mean best-fitting TSM parameter values across subjects and conditions within and across studies

Table 5
Mean best-fitting TTM parameter values across subjects and conditions within and across studies