Introduction

The visual search paradigm is one of the most popular paradigms in the study of visual attention as it mimics real search tasks we perform every day (for reviews, see Eckstein, 2011; Humphreys, 2016). In each trial of a standard visual search task, a display is presented that contains a spatially arranged set of objects, and participants are asked to press one of two buttons to indicate whether the target (e.g., a red vertical bar) is present or not. The so-called search functions relating the number of items in the display (set size) to the mean correct search response time (RT) are close to linear for both target-present (TP) and target-absent (TA) trials, and their slopes seem to vary on a continuum depending on the difficulty of the search task (Cheal & Lyon, 1992; Liesefeld, Moran, Usher, Müller, & Zehetleitner, 2016). For example, search for a red vertical target among green vertical distractors (feature search; Fig. 1a, left) is efficient, and results in search functions with slopes close to 0 ms/item (Fig. 1b, left). In contrast, searching for a 2 among 5s (spatial configuration search; Fig. 1a, right) is inefficient, and results in search functions with large positive slopes (Fig. 1b, right). Finally, searching for a red vertical target among green vertical and red horizontal distractors (conjunction search; Fig. 1a, middle) is of intermediate efficiency, and results in search functions with intermediate slopes (Fig. 1b, middle).

Fig. 1
figure 1

Benchmark visual search data set from Wolfe, Palmer, and Horowitz (2010). a Example visual search displays for three search tasks (copy of Fig. 1 in Wolfe et al., 2010). Participants search for the red vertical bar in the feature (left) and conjunction (middle) search tasks, and for a digital 2 among digital 5s in the spatial configuration search task (right). b Mean correct RT for target-present (solid lines) and target-absent trials (dashed lines). Lighter lines show data for individual observers and darker lines show mean data (copy of Fig. 2 in Wolfe et al., 2010). c Empirical RT distributions for one observer in the spatial configuration search task. Set size is coded by lightness from the lightest lines, set size 3, through set sizes 6 and 12 to the darkest, set size 18 (copy of the lower panel in Fig. 4 in Wolfe et al., 2010). d Simulated RT distributions from a serial, self-terminating search model for target-present (solid) and target-absent (dashed) trials. Lighter lines represent smaller set sizes (copy of Fig. 7 in Wolfe et al., 2010)

To explain visual search behavior, researchers have mainly focused on devising different accounts of the attentional selection process. According to serial search accounts, a high-speed attentional spotlight scans each object one by one in order to bind its surface features and to recognize it as a target or distractor (Treisman & Gelade, 1980; Wolfe, Cave, & Franzel, 1989). When the target is so salient that it is always scanned first – for example, when the target and distractors are very dissimilar in a single surface dimension such as color – flat search slopes will result. Scanning continues until the target is found or all items are identified as distractors – a serial self-terminating search model (Wolfe, 1994). More recent developments have added grouping processes and feature inhibition processes (Treisman & Sato, 1990; Wolfe, 2007).

According to parallel search accounts, all items in the display are attended and identified in parallel. While some parallel search models are based on signal detection theory (Palmer, 1995), others are based on biased competition (Heinke & Backhaus, 2011; Heinke & Humphreys, 2003), similarity, grouping, and recursive rejection (Humphreys & Müller, 1993; Müller, Humphreys, & Donnelly, 1994), neural synchronization (Kazanovich & Borisyuk, 2017), or neurodynamical approaches (Deco & Zihl, 2006; Fix, Rougier, & Alexandre, 2011; Grieben, Tekülve, Zibner, Schneegans, & Schöner, 2018). However, mean correct RT and slopes are not sufficient to distinguish between serial versus parallel processing because both search mechanisms are able to generate efficient and inefficient searches (Townsend, 1990a).

Because mean performance measures such as overall error rate and mean correct RT can be accounted for by different computational models – the problem of model mimicry – Wolfe, Palmer, and Horowitz (2010) focused on the shape of the RT distributions. They collected very large data sets for three search tasks to set a benchmark: a feature search for a color, a spatial configuration search for a digit 2 among digit 5s, and a color-by-orientation conjunction search (see Fig. 1a). Target presence (present, TP, or absent, TA) and set size (3, 6, 12, 18) were manipulated for each task, yielding eight within-subject trial types (TP3, TA3, TP6, …, TA18). About 500 trials were administered to each participant for each trial type. In each trial, the search display remained visible until the observer pressed one of two keys to indicate target present or target absent.

Balota and Yap (2011) distinguish three general approaches for understanding the influences of variables on RT distributions. The first approach is to plot the shape of the RT distribution to determine how a manipulation changes the different regions of the distribution (e.g., histograms, quantile plots, delta plots, hazard plots). For example, Wolfe et al. (2010) plotted histograms by tabulating the RTs in 50 ms-wide bins (see Fig. 1c). They found that all distributions were positively skewed, and that variance tracks mean RT (i.e., all distributions broaden as they shift rightward). Furthermore, for the conjunction and spatial configuration tasks, target-absent distributions are generally to the right of target-present distributions, and the variance of the target-absent trials is greater than that of the target-present trials. They concluded that these distributional shapes reject classic, serial self-terminating search models including the Guided Search 2.0 model (Wolfe, 1994) as shown in Fig. 1d.

The second approach is to fit a mathematical function to an RT distribution to assess how different parameters of the function are modulated by experimental manipulations (Balota & Yap, 2011). For example, Palmer, Horowitz, Torralba, and Wolfe (2011) fitted four psychologically motivated functions to these benchmark data sets (ex-Gaussian, ex-Wald, Gamma, and Weibull). They found that the three functions with an exponential component were all more successful at modeling the RT distributions than the Weibull. They proposed that these exponential components either reflect residual (non-decision) processes in the generation of response times, or encode something important about the search process itself. However, they were unable to distinguish between these two options.

The third and ultimately the preferred approach discussed by Balota and Yap (2011) is to use a computationally explicit process model that makes specific predictions about the characteristics of RT distributions. For example, Moran, Zehetleitner, Müller, and Usher (2013) developed the Competitive Guided Search (CGS) model as an extension of the Guided Search 2.0 model (Wolfe, 1994). The main addition was a mechanism to quit searches prematurely in order to explain the large overlap between the empirical distributions in the target-absent conditions (see Fig. 1c). Based on several model comparisons using the benchmark data sets from Wolfe et al. (2010), Moran et al. (2013) concluded that the CGS model meets the challenge of accounting for the shape of the RT distributions in the three benchmark search data sets. Furthermore, Moran, Zehetleitner, Liesefeld, Müller, and Usher (2016) found that CGS indeed fits the benchmark data sets better than a flexible, competitive parallel race model.

However, based on another benchmark search data set, Narbutas, Lin, Kristan, and Heinke (2017) concluded that the CGS model suffers from a failure to generalize across all display sizes, as did a parallel search model developed by Heinke and Humphreys (2003). Indeed, Cheal and Lyon (1992) already concluded that none of the standard theories of visual search are completely adequate (see also Dutilh et al., 2018).

The structure of this paper is as follows. First, we will discuss event history analysis, the standard longitudinal technique to analyze time-to-event data in many scientific disciplines. Event history analysis allows one to study how the effect of an experimental manipulation (here: set size and target presence) on performance changes with the passage of time on one or more time scales. We end the Introduction section by listing our objectives. In the Methods section, we explain how we applied the descriptive and inferential statistics from event history analysis to the benchmark data sets of Wolfe et al. (2010). In the Results section, we show descriptive plots of the empirical distributions and compare different individuals. Because Balota and Yap (2011) do not discuss the statistical analysis of RT distributions, we also illustrate how to fit a statistical model to RT distributions and what this reveals about behavioral dynamics. We discuss several new findings in light of existing visual search theories in the Discussion section.

Event history analysis

Event history analysis, a.k.a. survival, hazard, duration, transition, and failure-time analysis, is the standard set of statistical methods for studying the occurrence and timing of events in many scientific disciplines (Allison, 2010; Singer & Willett, 2003). Examples of time-to-event data include RT data, saccade latencies, fixation durations, time-to-force-threshold data, perceptual dominance durations, neural inter-spike durations, etc. To apply event history analysis, one must be able to define the event-of-interest (any qualitative change that can be situated in time; here: a button-press response), to define time point zero (here: search display onset), and to measure the passage of time between time zero and event occurrence.

Continuous-time hazard rate function

Luce (1986) mentions that there are several different, but mathematically equivalent, ways to present the information about a continuous random variable T denoting a particular person's response time in a particular experiment: the cumulative distribution function F(t) = P(T≤t), its derivative F'(t) known as the probability density function f(t), the survivor function S(t) = 1-F(t) = P(T>t), and the hazard rate function λ(t) = f(t) / [1-F(t)] = f(t) / S(t). “In principle, we may present the data as estimates of any of these functions and it should not matter which we use. In practice, it matters a great deal, although that fact does not seem to have been as widely recognized by psychologists as it might be” (Luce, 1986, p. 17).

Event history analysis (EHA) has been developed to describe and model the hazard function of response occurrence. Hazard quantifies the instantaneous risk that a response will occur at time t, conditional on its nonoccurrence until time t. In other words, it quantifies the likelihood that a response we are still waiting for at time t will occur within the next instant. There are at least five reasons why statisticians and mathematical psychologists recommend focusing on the hazard function in practice, when dealing with a finite sample.

First, “the hazard function itself is one of the most revealing plots because it displays what is going on locally without favoring either short or long times, and it can be strikingly different for f's that seem little different.” (Luce, 1986, p. 19). To illustrate this, Fig. 2 shows the F(t), f(t), S(t), and λ(t) for four theoretical waiting-time distributions. In contrast to λ(t), all F(t) and S(t) distributions look vaguely alike, and one cannot easily describe salient features other than the mean and standard deviation. The problem with the density function f(t) is that it conceals what is happening in the right tail of the distribution (Luce, 1986). As discussed by Holden, Van Orden, and Turvey (2009), "Probability density functions can appear nearly identical, both statistically and to the naked eye, and yet are clearly different on the basis of their hazard functions (but not vice versa). Hazard functions are thus more diagnostic than density functions" (p. 331).

Fig. 2
figure 2

Four views on waiting-time distributions. The cumulative distribution function (top left), the density function (top right), the survivor function (bottom left) and the hazard rate function (bottom right) are shown for each of four theoretical probability distributions (exponential, Weibull, gamma, log-normal). While the hazard function for the exponential is flat, it keeps increasing for the Weibull, it increases to an asymptote for the gamma, and it reaches a peak and then gradually decreases to an asymptote for the log-normal
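The four hazard shapes described in the caption can be reproduced numerically from the identity λ(t) = f(t) / S(t). Below is a minimal Python sketch; the distribution parameters are illustrative choices of ours, not necessarily those used for Fig. 2.

```python
import numpy as np
from scipy import stats

def hazard(dist, t):
    # lambda(t) = f(t) / S(t): density divided by survivor function
    return dist.pdf(t) / dist.sf(t)

t = np.linspace(0.05, 3.0, 60)

dists = {
    "exponential": stats.expon(scale=1.0),       # flat hazard
    "weibull":     stats.weibull_min(1.5),       # increasing hazard (shape > 1)
    "gamma":       stats.gamma(2.0, scale=0.5),  # rises toward an asymptote
    "log-normal":  stats.lognorm(0.5),           # rises to a peak, then declines
}

hazards = {name: hazard(d, t) for name, d in dists.items()}
```

Plotting each entry of `hazards` against `t` reproduces the qualitative orderings described above, while the corresponding `cdf` and `sf` curves look far more similar to one another.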

Second, because RT distributions may differ from one another in multiple ways, Townsend (1990b) developed a dominance hierarchy of statistical differences between two arbitrary distributions A and B. For example, if FA(t) > FB(t) for all t, then both cumulative distribution functions are said to show a complete ordering. Townsend (1990b) showed that a complete ordering on the hazard functions – λA(t) > λB(t) for all t – implies a complete ordering on both the cumulative distribution and survivor functions – FA(t) > FB(t) and SA(t) < SB(t) – which in turn implies an ordering on the mean latencies – mean A < mean B. In contrast, an ordering on two means does not imply a complete ordering on the corresponding F(t) and S(t) functions, and a complete ordering on these latter functions does not imply a complete ordering on the corresponding hazard functions. This means that stronger conclusions can be drawn from data when comparing the RT hazard functions using event history analysis. For example, when mean A < mean B, the hazard functions might show a complete ordering (i.e., for all t) or only a partial ordering (e.g., only for t < 600 ms).
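Townsend's dominance hierarchy can be checked numerically for a simple case. In the sketch below (our own illustration, using arbitrary rate values), two exponential distributions have constant hazards of 2 and 1, so the hazard functions are completely ordered; the orderings on F(t), S(t), and the means then follow.

```python
import numpy as np
from scipy import stats

t = np.linspace(0.01, 5.0, 200)
A = stats.expon(scale=0.5)   # constant hazard 2
B = stats.expon(scale=1.0)   # constant hazard 1

haz_A = A.pdf(t) / A.sf(t)
haz_B = B.pdf(t) / B.sf(t)

complete_hazard_ordering = bool(np.all(haz_A > haz_B))
# the orderings implied further down the hierarchy:
F_ordering    = bool(np.all(A.cdf(t) > B.cdf(t)))   # F_A(t) > F_B(t)
S_ordering    = bool(np.all(A.sf(t) < B.sf(t)))     # S_A(t) < S_B(t)
mean_ordering = A.mean() < B.mean()                 # mean A < mean B
```

Note that the implications only run downward: an ordering on the means would not have let us infer the hazard ordering.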

Third, EHA does not discard right-censored observations (trials for which we do not observe a response during the data collection period, so that we only know that the RT must be larger than some value). Discarding such trials and/or trials with very long RTs will introduce a sampling bias that results in underestimation of the mean. In fact, EHA includes the data from all trials when estimating the descriptive statistics. For inferential statistics, one might exclude some early bins with rarely occurring fast responses (see Methods).

Fourth, hazard modeling allows incorporating time-varying explanatory covariates such as heart rate, EEG signal amplitude, gaze location, etc. (Allison, 2010), which is useful for cognitive psychophysiology (Meyer, Osman, Irwin, & Yantis, 1988).

Fifth, hazard is well suited as a measure of the concept of processing capacity, i.e., the amount of work the observer is capable of performing within some unit of time (Wenger & Gibson, 2004). In the context of research on attention, the hazard function can capture the notion of the instantaneous capacity of the observer for completing the task in the next instant, given that the observer has not yet completed the task.

Discrete-time hazard probability function

Unfortunately, estimating the shape of the continuous-time hazard rate function for one observer in one experimental condition is not straightforward, because one needs a very large number of trials, e.g., at least 1,000 (Bloxom, 1984; Luce, 1986; Van Zandt, 2000). Furthermore, statistical modeling of continuous time-to-event data requires specialized software to either fit parametric hazard models that are rather restrictive in the shapes they allow (e.g., a Weibull hazard model), or semi-parametric hazard models that completely ignore the shape of the hazard function (e.g., Cox regression). Therefore, we promote the general application of discrete-time hazard analysis to RT data, which is straightforward and intuitive, and allows for flexible statistical modeling by logistic regression, which is highly familiar to psychologists (Allison, 1982, 2010; Singer & Willett, 1991, 2003; Willett & Singer, 1993, 1995).

To calculate the descriptive statistics – functions of discrete time – one has to set up a life table. A life table summarizes the history of event occurrences for a combination of subject and experimental condition. For illustrative purposes, we present in Table 1 a life table for the 530 trials of one participant in the feature search task for the trial type TP3 (target present and set size 3).

Table 1 A life table for the 530 trials of subject 1 for condition TP3 in the feature task. The censoring time and bin size equal 1000 ms and 100 ms, respectively

First, the first second after search display onset is divided into ten contiguous bins of 100 ms (column 1). Then, after counting the number of observed responses in each bin (column 4) the risk set must be determined for each bin (column 5). The risk set is equal to the number of trials that have not yet experienced a response at the start of the bin in question. The sample-based hazard estimate in bin t, or h(t) (column 6), is then simply obtained by dividing the number of observed responses in bin t (column 4) by the risk set of bin t (column 5). In discrete time, hazard is defined as the conditional probability of a response occurring in time bin t given it has not yet occurred before, h(t) = P(T=t|T≥t). The discrete-time hazard function thus tells us the probability that a response we are still waiting for will actually occur in bin t.

Next to the hazard function, EHA also focuses on the survivor function S(t) = P(T>t) = 1 - F(t), because S(t) provides a context for h(t+1) as it shows the proportion of trials not having experienced the response by the end of bin t. For completeness, Table 1 also tabulates the corresponding probability mass function P(t) = P(T=t).
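Setting up such a life table requires only counting and two recursions: P(t) = S(t−1)·h(t) and S(t) = S(t−1)·[1 − h(t)]. The Python sketch below (the gamma-distributed RTs are simulated stand-ins, not the actual data behind Table 1) builds one life table per subject/condition.

```python
import numpy as np

def life_table(rt_ms, censor_ms=1000, bin_ms=100):
    """Life table for one subject/condition: per bin, the number of observed
    responses, the risk set, and the estimates h(t), P(t), and S(t)."""
    rt = np.asarray(rt_ms, dtype=float)
    at_risk = len(rt)            # risk set of the first bin = all trials
    surv = 1.0                   # S(0) = 1
    rows = []
    for lo in range(0, censor_ms, bin_ms):
        hi = lo + bin_ms
        events = int(np.sum((rt > lo) & (rt <= hi)))
        h = events / at_risk if at_risk > 0 else 0.0
        p = surv * h             # P(T = t) = S(t - 1) * h(t)
        surv *= (1.0 - h)        # S(t) = S(t - 1) * [1 - h(t)]
        rows.append({"bin": (lo, hi), "events": events,
                     "risk_set": at_risk, "h": h, "P": p, "S": surv})
        at_risk -= events        # trials with RT > censor_ms stay in the risk set
    return rows

# hypothetical RTs for illustration (530 trials, as in Table 1)
rng = np.random.default_rng(2023)
rts = rng.gamma(shape=5.0, scale=60.0, size=530)
table = life_table(rts, censor_ms=1000, bin_ms=100)
```

Because of the recursions, the P(t) column plus the final S(t) always sums to 1, and right-censored trials (RT beyond the censoring time) contribute to every risk set without ever being counted as events.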

For choice RT data such as the benchmark search data, we want to distinguish different types of events: correct versus incorrect responses. One approach is to assume that each event type has its own hazard function that describes the occurrence and timing of events of that type. One would thus model the h(t) of correct response occurrence separately from the h(t) of error response occurrence. Another approach is to first study the timing of events without distinguishing among event types, and then to study which type of event occurs while restricting the analysis to those cases that experienced an event. A major attraction of this latter approach is that there is no need to assume that the different kinds of events are uninformative about one another (Allison, 2010).

We therefore take the latter, so-called conditional-processes approach (Allison, 2010, pp. 227-229) by extending the h(t) analysis of response occurrence with an analysis of the conditional accuracy function ca(t) = P(correct|RT = t). The ca(t) is estimated by dividing the number of correct responses in bin t by the total number of observed responses in bin t (Table 1). Note that P(t) provides a context for ca(t) as P(t) shows on which percentage of trials the ca(t) estimate is based.
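The ca(t) estimates can be computed from the same binned data. A minimal sketch (the RTs and accuracy codes below are hypothetical, chosen so that each bin's estimate is easy to verify by hand):

```python
import numpy as np

def conditional_accuracy(rt_ms, correct, censor_ms=1000, bin_ms=100):
    """ca(t) = P(correct | response in bin t): correct responses in bin t
    divided by all observed responses in bin t (NaN if the bin is empty)."""
    rt = np.asarray(rt_ms, dtype=float)
    acc = np.asarray(correct, dtype=float)
    ca = []
    for lo in range(0, censor_ms, bin_ms):
        in_bin = (rt > lo) & (rt <= lo + bin_ms)
        ca.append(float(acc[in_bin].mean()) if in_bin.any() else float("nan"))
    return ca

# hypothetical data: the fastest response is an error, the slowest is correct
rts = [150, 250, 250, 350]
acc = [0,   0,   1,   1]
ca = conditional_accuracy(rts, acc)
```

Empty bins yield NaN rather than 0, which matches the idea that ca(t) is undefined where no responses were observed; P(t) then shows how much data each ca(t) estimate rests on.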

Thus, by using h(t) functions of response occurrence in combination with ca(t) functions one can provide an unbiased, time-varying, and probabilistic description of the latency and accuracy of responses based on all trials of any RT data set. Statistical models for h(t) and ca(t) can each be implemented as generalized linear mixed regression models predicting event occurrence (1/0) and response accuracy (1/0) in each bin of a selected time range, respectively (Panis & Schmidt, 2016).

There are also possible disadvantages of discrete-time event history analysis. First, the person-trial-bin oriented data set (see section Methods) can become very large. Second, one needs to explore a few bin sizes. The optimal bin size will depend on the censoring time, the rarity of event occurrence, and the number of repeated measures or trials. Note that the time bins do not have to be of equal size. Third, remember that in hierarchical data like ours, there are two sources of noise: within and between participants. For a distributional analysis it is important to have enough repeated measures per participant and condition (preferably at least 100) to minimize the influence of within-subject noise. Between-subjects variation is a different matter: it can be due to noise, but also due to characteristic differences between individuals (e.g., in speed, capacity, or strategy). Again, high measurement precision in single participants is the only way to deal with this. The analysis of single participants should be regarded as a safeguard against interpreting spurious effects in the pooled data that are actually only generated by a small minority of participants, while at the same time refraining from overinterpreting the individual data patterns. Note that systematic effects will be visible for a majority of participants, while occurrences due to noise will not.

Objectives

The current study is motivated by three goals. First, using a freely available data set, we want to illustrate the descriptive and inferential statistics used by discrete-time EHA, and what we can learn from this. As discussed by Whelan (2008) the use of a more advanced analysis method can maximize the return from the obtained data, which is important in view of the time and costs required to run an experiment. Second, using an exploratory approach, we want to see if the shapes of the diagnostic h(t) and ca(t) functions for the three benchmark data sets will reveal certain, as yet unknown, features of the time-dispersed behavior of searchers. Third, to collect the benchmark data set, Wolfe et al. (2010) used a small-N design in which a large number of observations are made on a relatively small number of experimental participants. Smith and Little (2018) argue that, “if psychology is to be a mature quantitative science, then its primary theoretical aim should be to investigate systematic functional relationships as they are manifested at the individual participant level” (p. 2083). Therefore, we will pay attention to individual differences in the time-dispersed search behavior. As discussed below, our results reveal new features of visual search behavior, many – if not all – of which are not considered by current cognitive models of visual search, but can be helpful to inform future models.

Methods

Data sets

We reanalyzed the benchmark data sets provided by Wolfe et al. (2010; http://search.bwh.harvard.edu/new/data_set_files.html). This group collected data in three tasks. In the feature search task (nine participants), participants searched for a red vertical rectangle among green vertical rectangles. In the conjunction search task (ten participants), they searched for a red vertical rectangle among green vertical and red horizontal rectangles. Finally, in the spatial configuration task (nine participants), they searched for a numerical digit 2 among 5s (see Fig. 1). Four different set sizes (SS; distractors plus target, either 3, 6, 12, or 18) were randomly intermixed. Participants were young adults with normal or corrected acuity; their color vision was ascertained by Ishihara tests. They pressed one key if the target was present (which was the case in 50% of trials) and another if the target was absent. They were instructed to respond as quickly and correctly as possible and received feedback after each trial. Accuracy and RT in ms were recorded. Each participant provided approximately ten blocks of 400 trials, leading to about 500 trials per participant and search condition. For further information, see the website.

For the descriptive statistics we downloaded the raw data from the website via the section named “Fitting Functions”. The raw feature, conjunction, and spatial configuration search data sets contained 35,941, 39,958, and 35,862 rows, respectively. For hazard model fitting, we actually used the downloadable text files (via the section below the one named “Fitting Functions”) because these also contain information about trial and block numbers.

Mean correct RT and percent error

We used the same outlier criteria as Wolfe et al. (2010) in order to calculate the sample mean RT, mean correct RT, and error percentage for each combination of subject, target presence, and set size. Specifically, we excluded all trials (N = 80) with RT < 200 ms or > 4,000 ms for the feature and conjunction search tasks, and with RT < 200 ms or > 8,000 ms in the spatial configuration search task.

Event history analysis: descriptive statistics

Starting from the raw data sets without removing outliers, life tables were constructed using software package R (R Core Team, 2014) for each combination of subject, target presence, and set size (see Table 1). We used a bin size of 40 ms for the feature and conjunction search tasks, and a bin size of 80 ms for the spatial configuration search task, to provide high temporal resolution when visually studying the shape of the empirical distributions (and to still have an acceptable level of stability in the estimates). We used a censoring time of 1600, 2400, and 3600 ms for the feature, conjunction, and spatial configuration search data sets, respectively, since most events had occurred by this time in all search conditions. Standard errors for h(t), P(t), and ca(t) can be estimated using the formula for a proportion p – the square root of {p(1-p) / N} – where N equals the risk set for bin t, the total number of trials, and the number of observed responses in bin t, respectively. The standard errors for S(t) were estimated using the recurrent formula on page 350 of Singer and Willett (2003).
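The standard errors for h(t), P(t), and ca(t) follow the usual binomial formula for a proportion. A small sketch (the example values are illustrative, not estimates from the data):

```python
import math

def se_proportion(p, n):
    # standard error of a proportion: sqrt(p * (1 - p) / n)
    return math.sqrt(p * (1.0 - p) / n)

# e.g., a hazard estimate h(t) = 0.42 with a risk set of 500 trials,
# and a ca(t) estimate of 0.95 based on 40 observed responses in bin t
se_h  = se_proportion(0.42, 500)
se_ca = se_proportion(0.95, 40)
```

Note that the relevant N differs per statistic: the risk set for h(t), the total number of trials for P(t), and the number of observed responses in bin t for ca(t), so late bins with few responses yield noisy ca(t) estimates.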

Event history analysis: Inferential statistics

To test whether and when the main and interaction effects involving target presence and set size are significant across participants, we fitted discrete-time hazard models to aggregated data by implementing generalized linear mixed-effects regression models in R (R Core Team, 2014; function glmmPQL of package MASS) using the complementary log-log (cloglog) link function (Allison, 2010). An example discrete-time hazard model with three predictors can be written as follows: cloglog[h(t)] = ln(−ln[1 − h(t)]) = [α₀ONE + α₁(TIME − 1) + α₂(TIME − 1)² + α₃(TIME − 1)³] + [β₁X₁ + β₂X₂ + β₃X₂(TIME − 1)]. The main predictor variable TIME is the time bin index t (see Table 1), which is centered on value 1 in this example. The first set of terms within brackets – the alpha parameters multiplied by their polynomial specifications of (centered) time – represents the shape of the baseline cloglog-hazard function (i.e., when all predictors Xᵢ take on a value of zero). The second set of terms (the beta parameters) represents the vertical shift in the baseline cloglog-hazard for a 1-unit increase in the respective predictor. For example, the effect of a 1-unit increase in X₁ is to vertically shift the whole baseline cloglog-hazard function by β₁ cloglog-hazard units. However, if a predictor interacts linearly with time (see X₂ in the example), then the effect of a 1-unit increase in X₂ is to vertically shift the predicted cloglog-hazard by β₂ cloglog-hazard units in bin 1 (when TIME − 1 = 0), by β₂ + β₃ cloglog-hazard units in bin 2 (when TIME − 1 = 1), and so on. To interpret the effects of the predictors, the parameter estimates are anti-logged, resulting in a hazard ratio.
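The link function and the hazard-ratio interpretation can be made concrete in a few lines. The actual models were fitted in R with glmmPQL; the Python sketch below only illustrates the transform, using an arbitrary baseline hazard of 0.30 in the reference bin and an arbitrary coefficient β₁ = 0.40 (illustrative values, not fitted estimates):

```python
import numpy as np

def cloglog(h):
    # complementary log-log link: ln(-ln(1 - h))
    return np.log(-np.log(1.0 - h))

def inv_cloglog(eta):
    # inverse link: h = 1 - exp(-exp(eta))
    return 1.0 - np.exp(-np.exp(eta))

eta0 = cloglog(0.30)      # baseline cloglog-hazard in the reference bin
beta1 = 0.40              # vertical shift for a 1-unit increase in X1
h_baseline = inv_cloglog(eta0)
h_x1 = inv_cloglog(eta0 + beta1)

# anti-logging the coefficient yields a hazard ratio
hazard_ratio = np.exp(beta1)
```

Because the shift is applied on the cloglog scale, a constant β translates into a multiplicative effect on the underlying hazard, which is why exp(β) is read as a hazard ratio.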

We proceeded as follows. First, for each search task we aggregated the raw data across all subjects, except for the conjunction search task where one observer was ignored (see section Results).

Second, for each task we selected a time range where all subjects provide enough data in each condition, and created between 10 and 20 bins for modeling purposes. For the feature search data, we therefore used a censoring time of 800 ms and a bin size of 40 ms. After ignoring the first 240 ms (i.e., the first six 40-ms bins) in which no (or only few) responses occurred, we end up with 14 bins to model. For the conjunction search data, we used a censoring time of 1000 ms and a bin size of 50 ms. After ignoring the first 350 ms (i.e., the first seven 50-ms bins) in which no (or only few) responses occurred, we end up with 13 bins to model. For the spatial configuration data, we used a censoring time of 2000 ms and a bin size of 80 ms. After ignoring the first 400 ms (i.e., the first five 80-ms bins) in which no (or only few) responses occurred, we end up with 20 bins to model.

Third, trial type TP3 was chosen as the baseline condition in each search task. The main predictor variable TIME was centered on value 10 or bin (360,400], 10 or bin (450,500], and 8 or bin (560,640] for the feature, conjunction, and spatial configuration search task, respectively. For each task, the intercept and the linear effect of TIME were treated as random effects to deal with the correlated data resulting from the repeated measures on the same subjects. Next to dummy-coding the levels of our experimental factors (target presence and set size), we also included TRIAL and BLOCK number as continuous predictors to model across-trial and across-block learning effects in the speed of responses. TRIAL was centered on value 350 (and rescaled by dividing by 10) and BLOCK on value 8 for each task. Thus, for the feature search task for example, with all effects set to zero, the h(t) model's intercept refers to the estimated cloglog[h(t)] for bin (360,400] in trial 350 of block 8 when the target is present and set size equals three (see Table 2, column 5, effect nr. 1).

Table 2 Parameter estimates and test statistics for the selected hazard model in the feature search task. The selected model was refitted three times with TIME centered on bin 280, bin 520, and bin 640, respectively

Fourth, to estimate the parameters of the h(t) model, we must create a dataset where each row corresponds to a time bin of a trial of a participant (a person-trial-bin oriented data set). Specifically, each time bin that was at risk for event occurrence in a trial was scored on the dependent variable EVENT (0 = no response occurred; 1 = response occurred), the centered covariates TIME, TRIAL, and BLOCK, the variable SUBJECT, and the dummy-coded dichotomous experimental predictor variables (TA, SS6, SS12, SS18). Thus, for the feature search task for example, all trials with observed RTs > 800 ms were treated as right-censored observations; they provide the information that the response did not occur during the first 800 ms or 20 bins after search display onset (i.e., each of these trials contributes 20 rows, and each row has a value 0 for EVENT).
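The trial-to-rows expansion can be sketched as follows. The data set was actually built in R; this illustrative Python function (its name is our own) uses the feature-task values of 800 ms censoring time and 40-ms bins:

```python
def expand_trial(rt_ms, censor_ms=800, bin_ms=40):
    """One trial -> person-trial-bin rows. EVENT = 1 in the bin containing
    the response; trials with RT > censor_ms are right-censored and
    contribute censor_ms / bin_ms rows that are all EVENT = 0."""
    rows = []
    for b in range(1, censor_ms // bin_ms + 1):
        event = int(rt_ms <= b * bin_ms)   # response occurred in this bin
        rows.append({"TIME": b, "EVENT": event})
        if event:
            break                          # later bins are no longer at risk
    return rows

observed = expand_trial(430)   # RT of 430 ms -> EVENT = 1 in bin (400,440]
censored = expand_trial(950)   # RT > 800 ms  -> 20 rows, all EVENT = 0
```

In the full data set, each such row would additionally carry the centered TIME, TRIAL, and BLOCK covariates, the SUBJECT identifier, and the dummy-coded predictors (TA, SS6, SS12, SS18).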

Just before running glmmPQL, we deleted the rows corresponding to the first 6, 7, and 5 bins for the feature, conjunction and spatial configuration search task, respectively, as mentioned above under step 2. The resulting person-trial-bin oriented data set contained 168,996, 219,762, and 355,261 rows for the feature, conjunction and spatial configuration search task, respectively.

Fifth, for each task, we started with a full multi-level EHA model (46 parameters; with bins at level 1 nested within observers at level 2) encompassing the following effects at level 1: (a) a 7th-order polynomial for the shape of the baseline cloglog-hazard function (eight parameters), (b) the effects of target absence (TA), SS6, SS12 and SS18 were allowed to interact with time in a quartic fashion (20 parameters), (c) the interaction effects between TA and each of the three set sizes could vary over time in a cubic fashion (12 parameters), and (d) the linear effects of trial and block were allowed to interact with time in a quadratic fashion (six parameters). For each task, we used an automatic backward selection procedure to select a final model. Specifically, during each iteration, the effect with the largest p value that was not part of any higher-order effect was deleted, and the model refitted. This continued until each of the remaining effects that was not part of any higher-order effect had a p < .05 (see highlighted p values in Tables 2, 3, and 4).

Table 3 Parameter estimates and test statistics for the selected hazard model in the conjunction search task. The selected model was refitted three times with TIME centered on bin 650, bin 800, and bin 950, respectively. Same conventions as in Table 2.
Table 4 Parameter estimates and test statistics for the selected hazard model in the spatial configuration search task. The selected model was refitted three times with TIME centered on bin 960, bin 1280, and bin 1600, respectively. Same conventions as in Table 2.

Finally, after model selection, we refitted the final model several times, each time with TIME centered on a different bin, to see explicitly what values the parameter estimates take in those bins according to the final model, and whether they represent significant effects (see Tables 2, 3, and 4).

Results

Feature search: Descriptive statistics

In Fig. 3, we present the data from one participant in the feature search task. In each experimental condition, the hazard function (top row) rises to a peak and then declines before reaching the value 1. We denote time bins by the endpoint of the interval they span, so that “bin 280” refers to bin (240,280]. For example, in condition SS3 the estimated h(280) equals 0.42 for TP and 0.014 for TA for this participant. In other words, if no response has occurred by 240 ms after display onset, the (conditional) probability that the (first) response occurs somewhere in bin (240,280] equals 0.42 for TP but only 0.014 for TA; similarly, if no response has occurred by 320 ms, the estimated h(360) equals .76 for TP and .74 for TA. The effect of target presence on h(t) is clearly visible for each set size in the left tail of the distribution (i.e., while hazard is rising), and this effect decreases somewhat with increasing set size. The ca(t) functions show that the fastest responses in the target-absent conditions tend to be errors (false alarms), whereas the slowest responses are error-free. In contrast, in the target-present conditions most emitted responses tend to be correct, except for a small dip in the ca(t) functions that reveals a temporarily increased miss rate around the time when the ca(t) functions in the target-absent conditions reach 1. This particular participant, however, was the only one in the sample who emitted a response before 800 ms in every trial of every condition.
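As a concrete illustration of how such descriptive estimates are obtained, the following sketch (a hypothetical helper, not the authors' code) computes life-table estimates of h(t) and ca(t) from trial-level RTs, using the 40-ms bins of the text: h(t) is the proportion of at-risk trials that respond in bin t, and ca(t) is the proportion correct among the responses emitted in bin t.

```python
# Life-table estimates of the discrete-time hazard h(t) and the conditional
# accuracy ca(t) from trial-level data; "bin 280" = the interval (240, 280].
import math

BIN_MS = 40

def life_table(trials, n_bins):
    """trials: list of (rt_ms, correct). Returns (hazard, cond_accuracy) lists."""
    hazard, ca = [], []
    for b in range(1, n_bins + 1):
        lo, hi = (b - 1) * BIN_MS, b * BIN_MS
        at_risk = [t for t in trials if t[0] > lo]       # no response yet at lo
        responded = [t for t in at_risk if t[0] <= hi]   # responded in (lo, hi]
        hazard.append(len(responded) / len(at_risk) if at_risk else math.nan)
        ca.append(sum(c for _, c in responded) / len(responded)
                  if responded else math.nan)
    return hazard, ca

# five toy trials: (RT in ms, response correct?)
trials = [(250, 0), (270, 1), (300, 1), (330, 1), (380, 1)]
h, ca = life_table(trials, n_bins=10)
# bin 7 = (240, 280]: all 5 trials at risk, 2 respond -> h = 0.4, ca = 0.5
```

Note how the risk set shrinks from bin to bin, which is why, as discussed below, the standard errors of the h(t) estimates grow for later bins.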

Fig. 3
figure 3

Descriptive statistics for subject number 3 in the feature search task. Top to bottom: h(t), S(t), P(t), and ca(t) for each set size (columns) and target presence (green = target present, red = target absent). Vertical lines in the h(t), S(t), and P(t) plots represent the sample mean RT, the estimated median RT or S(t).50, and the sample mean correct RT, respectively. Horizontal lines in the ca(t) plots represent overall accuracy

In Fig. 4, we compare the h(t) and ca(t) estimates between this and three other participants. Comparing individuals reveals two subgroups of observers with qualitatively different ca(t) behavior. Three observers (2, 3, and 7; see Fig. 4, top eight panels) show early false alarms when responses are emitted around 240 ms after search display onset. We define early false alarms as “ca(t) ≤ .50 for the earliest emitted responses, for at least two set sizes when the target is absent”. At the same time, these observers show small dips in early ca(t) for target-present trials, that is, small temporary increases in the miss rate (early misses) at the time when ca(t) for target-absent trials reaches 1. The remaining observers show no systematic errors (see Fig. 4, bottom eight panels). Note that the latter observers emit their fastest responses somewhat later than the individuals who do show early errors. Interestingly, eight out of nine subjects showed a small but systematic effect of set size (i.e., SS3 > SS6 > SS12 > SS18) on h(t) for target-present trials in one or more bins before or around the time when hazard reaches its peak (see Fig. 4, left h(t) panels). Finally, for those subjects who were not as fast as subject number 3, the hazard functions peaked, declined, and then hovered for some time around a non-zero value. Note that as time passes, the standard errors of the h(t) estimates necessarily increase because the risk set becomes smaller and smaller.

Fig. 4
figure 4

Inter-individual differences in feature search. Estimates of h(t) and ca(t) for four participants in the feature search task, for target-present (left column) and target-absent (right column) trials and each set size (green = SS3, red = SS6, black = SS12, blue = SS18). Vertical lines in the h(t) plots represent the sample mean RT. a Subject 2. b Subject 3. c Subject 4. d Subject 8

Feature search: Inferential statistics

Table 2 shows the selected hazard model for the feature task (columns 5 to 8). Figure 5 presents the predicted (i.e., model-based) hazard functions (first column), cloglog-hazard functions (second column), and the corresponding survivor (third column) and probability mass functions (fourth column), for each set size in target-present (top row) and -absent trials (bottom row) for trial 350 in block 8.

Fig. 5
figure 5

Hazard model predictions for feature search. Predicted h(t) functions for trial 350 in block 8 (first column) and the corresponding cloglog-hazard functions (second column), S(t) (third column) and P(t) (right column) functions, for target-present (top row) and target-absent (bottom row) trials. Vertical lines in the S(t) plots represent the estimated median RT or S(t).50

Because TRIAL and BLOCK are centered, the first six parameter estimates (PE) in Table 2 model the shape of the cloglog[h(t)] function for TP3 (the chosen baseline condition) in trial 350 of block 8 using a 5th-order polynomial function of TIME (Fig. 5, first row, second column, green line). Because TIME is centered on bin 400, the intercept of our regression model is the predicted cloglog[h(400)] value for TP3 in trial 350 of block 8. Converting back from cloglog to hazard units, h(400) = .42 (= 1 − exp[−exp(−0.61)]), as shown in Fig. 5 (top left). Parameters 2–6 show significant linear, quadratic, cubic, quartic, and quintic effects of TIME on this intercept estimate, such that the predicted response hazard first increases quickly with increasing waiting time until around 440 ms after display onset, and then decreases toward a non-zero value: h(280) = 0.04, h(400) = 0.42, h(520) = 0.39, and h(640) = 0.16. This shows that the hazard of response occurrence changes in a particular fashion on the across-bin/within-trial time scale.
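The inverse complementary log-log link used here is easy to verify numerically:

```python
# Inverse complementary log-log link: h = 1 - exp(-exp(eta)).
import math

def cloglog_to_hazard(eta):
    return 1.0 - math.exp(-math.exp(eta))

# intercept of the feature-search model (TIME centered on bin 400)
h400 = cloglog_to_hazard(-0.61)   # ~0.42, matching the reported h(400)
```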

With respect to the manipulations of interest, we see that in bin 400, relative to the reference condition TP3, there is a main effect of removing the target (parameter 7, column 5, PE = −0.2483, p < .0001). A measure of effect size for a discrete-time cloglog-hazard model can be obtained by exponentiating the parameter estimates, which yields hazard ratios (HR; Allison, 2010, p. 242). Thus, compared to the cloglog[h(400)] estimate in the reference condition, removing the target decreases the estimated cloglog[h(400)] by 0.2483 units, which corresponds to a decrease in response hazard by a factor of 0.78 (HR(400) = exp[−0.2483] = 0.78). There are also main effects in bin 400 of changing the set size to 6 (parameter 12, PE = −0.11, HR = 0.90, p < .0001), to 12 (parameter 14, PE = −0.14, HR = 0.87, p < .0001), and to 18 (parameter 18, PE = −0.2, HR = 0.83, p < .0001). The fact that all these effects are negative indicates that response occurrence slows down.
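The hazard ratios reported above follow directly from exponentiating the parameter estimates:

```python
# Hazard ratios for a cloglog-hazard model (Allison, 2010): HR = exp(PE).
import math

def hazard_ratio(pe):
    return math.exp(pe)

hr_ta   = hazard_ratio(-0.2483)  # target absent vs. present in bin 400, ~0.78
hr_ss18 = hazard_ratio(-0.2)     # set size 18 vs. 3 in bin 400; ~0.82 from the
                                 # rounded PE (Table 2's 0.83 presumably comes
                                 # from the unrounded estimate)
```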

All these main effects change significantly with TIME (parameters 7 to 20). For example, the effect of target absence changes in a quartic fashion (parameters 7 to 11), so that it equals −1.70 in bin 280 (HR = 0.18, p < .0001), −0.25 in bin 400 (HR = 0.78, p < .0001), −0.21 in bin 520 (HR = 0.81, p < .0001), and only 0.09 in bin 640 (HR = 1.09, p = .173). The effect of SS6 changes in a linear fashion that is marginally significant (parameter 13, p = .0506), the effect of SS12 changes in a linear and cubic fashion (parameters 15 to 17), and the effect of SS18 changes in a linear fashion (parameters 19 to 20). Increasing the set size leads to a systematic decrease in the estimated h(t) in bins < 500 ms when the target is present (see Fig. 5, top left panel); bins after 500 ms show no significant effects of set size in the target-present conditions. Thus, the h(t) functions show a partial ordering with respect to the systematic effects of target presence (i.e., only for t < 600 ms) and of set size when the target is present (i.e., only for t < 500 ms). In other words, once 600 ms have passed after display onset without a response, set size and target presence no longer influence the hazard of response occurrence.

As expected, there are also interaction effects in bin 400 between target absence and each of the three set sizes, which change over TIME (parameters 21–28). These positive interaction effects counteract the negative main effects of set size when the target is present. Note that for each set size (SS6, SS12, and SS18) the interaction effect with target absence is larger in absolute value than the main effect of that set size (i.e., parameter 21 versus 12, 23 versus 14, and 26 versus 18), both in bin 280 and in bin 400. Thus, increasing the set size up to 12 leads to a systematic increase in the estimated h(t) in bins < 500 ms when the target is absent; this can be seen most clearly for bin 280 of the cloglog-hazard functions in Fig. 5 (second row and column). Bins after 500 ms show no significant interaction effects involving set size.

One advantage of a discrete-time hazard model is that it can incorporate multiple time scales. Parameters 29 to 32 show that hazard also varies on the across-trial/within-block time scale and on the across-block/within-experiment time scale. First, in bin 400, each additional series of 10 trials changes the estimated cloglog[h(t)] value by −0.0023 units (parameter 29, column 5, p < .0001), and this effect increases linearly with TIME (parameter 30, PE = .0009, p < .0001). Thus, while the effect of trial is negative in the left tail of the distribution (see Table 2, row 29), it is positive in the right tail (e.g., in bin 640). Second, each additional block increases the estimated cloglog[h(t)] value by 0.0275 units in bin 400 (parameter 31, column 5, p < .0001), and this effect decreases linearly with TIME (parameter 32, PE = −.0057, p < .0001). Figure 6a shows how learning effects operating on the block-wide and experiment-wide time scales affect the shape of the hazard function in the baseline condition TP3.
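Assuming TIME is coded in 40-ms bins centered on bin 400 (the exact coding is not spelled out in this excerpt), the sign flip of the trial effect across the RT distribution can be illustrated as follows:

```python
# Sketch: how a linear effect-by-TIME interaction lets the trial effect change
# sign across the RT distribution. The 40-ms-bin coding of TIME is an
# assumption for this illustration.

PE_TRIAL = -0.0023        # effect of +10 trials on cloglog[h(t)] in bin 400
PE_TRIAL_X_TIME = 0.0009  # linear change of that effect per bin of TIME

def trial_effect(bin_ms, center_ms=400, bin_width=40):
    time = (bin_ms - center_ms) / bin_width   # centered TIME in bins
    return PE_TRIAL + PE_TRIAL_X_TIME * time

# negative in the left tail, positive in the right tail (e.g., bin 640)
effect_400 = trial_effect(400)
effect_640 = trial_effect(640)
```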

Fig. 6
figure 6

Effect of practice on event occurrence. The model-based effects of Trial (T) and Block (B) are shown for the baseline condition (TP3) for the feature (A), conjunction (B) and spatial configuration (C) search tasks

Conjunction search: Descriptive statistics

In Fig. 7, we present the data from one participant in the conjunction search task. In each condition the hazard function rises to a peak, then declines, and finally hovers around a non-zero value for some time. The effect of target presence on h(t) is clearly present in the left tail of the distributions (i.e., while hazard is rising), and this effect now clearly increases with increasing set size. The ca(t) functions show that responses emitted before 400 ms in the target-absent condition tend to be false alarms. At the same time, they show small dips in early ca(t) for target-present trials, that is, small temporary increases in the miss rate (early misses) at the time when ca(t) for target-absent trials reaches 1.

Fig. 7
figure 7

Descriptive statistics for subject number 5 in the conjunction search task. Same conventions as in Fig. 3

Inspection of the descriptive functions h(t) and ca(t) showed that individuals can differ in at least three respects (see Fig. 8). First, three observers show many early false alarms (subjects 4, 5, and 6), while the remaining observers show few to no early false alarms. Those subjects who show early false alarms also show an early, temporary increase in the miss rate (early misses; i.e., a small dip in ca(t) for TP) at the time when ca(t) for target-absent trials reaches 1. Second, three observers (subjects 3, 6, and 10) show a very large effect of target presence on h(t) and S(t) for set sizes 12 and 18, while the remaining observers show a smaller effect (compare subjects 10, 2, 4, and 5 in Fig. 8). Third, those subjects who show few to no early errors tend to emit their earliest responses somewhat later than those who do show early false alarms. Note that subject 4 was very fast overall. Regardless of these individual differences, for each participant target-present responses were faster on average than target-absent responses, and this difference increased with set size. Also, for many subjects misses started to emerge in the later bins for the larger set sizes, while there were few to no late false alarms. Finally, the effect of set size on h(t) was visible only in the left tail of the distribution, not in the flat right tail. Note that the location of the sample means is not systematically related to any feature of the shape of the RT distributions.

Fig. 8
figure 8

Inter-individual differences in conjunction search. Estimates of h(t) and ca(t) for four participants in the conjunction search task. a Subject 4. b Subject 5. c Subject 2. d Subject 10. Same conventions as in Fig. 4

Conjunction search: Inferential statistics

Table 3 (columns 3 to 6) shows the selected hazard model for the conjunction task based on the aggregated data of nine subjects (subject number 4 was excluded because substantial missing data in the later bins led to model-fitting failures). Figure 9 presents the predicted (i.e., model-based) hazard functions (top row), cloglog-hazard functions (second row), and the corresponding survivor (third row) and probability mass functions (bottom row), for each set size in target-present and -absent trials for trial 350 in block 8. The first seven parameter estimates in Table 3 model the shape of the cloglog[h(t)] function for TP3 (the reference condition) in trial 350 of block 8 (Fig. 9, top left, green line) using a 6th-order polynomial function of TIME. Parameters 2–7 show significant linear, quadratic, cubic, quartic, quintic, and sextic effects of TIME on this intercept estimate, such that the predicted response hazard first increases with increasing waiting time, and then decreases toward a non-zero asymptote (Fig. 9).

Fig. 9
figure 9

Hazard model predictions for conjunction search. Predicted h(t) functions for trial 350 in block 8 (top row) and the corresponding cloglog-hazard functions (second row), S(t) (third row) and P(t) (bottom row) functions, for target-present (left column) and target-absent (right column) trials. Vertical lines in the S(t) plots represent the estimated median RT or S(t).50

With respect to the manipulations of interest, we see that in bin 500, relative to the reference condition TP3, there is a main effect of removing the target (parameter 8, PE = −0.44, HR = 0.64, p < .0001), and main effects of changing the set size to 6 (parameter 13, PE = −0.25, HR = 0.78, p < .0001), to 12 (parameter 17, PE = −0.74, HR = 0.48, p < .0001), and to 18 (parameter 21, PE = −1.13, HR = 0.32, p < .0001). All these main effects change significantly with TIME (parameters 8 to 23). For example, the effect of target absence changes in a quartic fashion (parameters 9–12), so that it equals −0.44 in bin 500 (HR = 0.64, p < .0001), 0.11 in bin 650 (HR = 1.12, p < .001), 0.18 in bin 800 (HR = 1.20, p < .005), and only 0.01 in bin 950 (HR = 1.01, p = .92). The effect of SS6 changes in a cubic fashion (parameters 14 to 16), the effect of SS12 changes in a cubic fashion (parameters 18–20), and the effect of SS18 changes in a quadratic fashion (parameters 22–23).

There are also interaction effects between target absence and each of the three set sizes, which change over TIME (parameters 24–33). In contrast to the feature search data, these interaction effects are now negative, and their absolute size in each bin increases with increasing set size. Furthermore, both these interaction effects and the three main effects of set size (a) are negative in bin 500, (b) decrease in absolute size over time, (c) are larger for larger set sizes, and (d) remain significant for a longer time after display onset for larger set sizes. In other words, the h(t) functions show a partial ordering with respect to the systematic effects of set size and target presence (in general, only for t < 1000 ms).

Finally, hazard also varies on the across-trial/within-block and across-block/within-experiment time scales. First, each additional series of ten trials increases the estimated cloglog[h(t)] value by 0.0021 units (parameter 34, column 3, p < .0001) in each bin. Second, each additional block increases the estimated cloglog[h(t)] value by 0.048 units in bin 500 (parameter 35, column 3, p < .0001), and this effect decreases linearly with TIME (parameter 36, PE = −.0021, p < .0001). Figure 6b shows how the effect of trial affects the shape of the hazard function in the baseline condition within blocks 1 and 8 when changing from trial 10 to trial 350.

Spatial configuration search: Descriptive statistics

In Fig. 10, we present the data from one participant in the spatial configuration search task. Instead of only peaked hazard functions, we now also see monotonically increasing hazard functions for the larger set sizes. The effect of target presence on h(t) is clearly present in the left tail of the distributions, and the difference between the target-present and -absent hazard functions lasts longer for larger set sizes. The ca(t) functions show that many responses emitted before 1000 ms in the target-absent condition tend to be false alarms. At the same time, they show small dips in early ca(t) for target-present trials. Furthermore, for the larger set sizes 12 and 18, the miss rate starts to increase around 1500 ms after display onset.

Fig. 10
figure 10

Descriptive statistics for subject number 8 in the spatial configuration search task. Same conventions as in Fig. 3

Comparing individuals (Fig. 11) shows that four of them display many early false alarms coupled with early misses (subjects 1, 7, 8, and 9). The remaining subjects show few to no false alarms, and their fastest responses appear somewhat later than those of the other subjects. Also, subjects 3 and 7 showed very slow behavior when the target is absent for set sizes 12 and 18 (i.e., hazard functions that start to rise late and at a very low rate). Regardless of these individual differences, for each subject target-present responses were on average faster than target-absent responses, and this difference increased with set size. Finally, all subjects show late misses for the larger set sizes, appearing around 1500 ms after display onset.

Fig. 11
figure 11

Inter-individual differences in spatial configuration search. Estimates of h(t) and ca(t) for four participants in the spatial configuration search task. a Subject 1. b Subject 8. c Subject 3. d Subject 4. Same conventions as in Fig. 4

Spatial configuration search: Inferential statistics

Table 4 (columns 3–6) shows the selected hazard model for the spatial configuration task. Figure 12 presents the predicted (i.e., model-based) hazard functions (top row), cloglog-hazard functions (second row), and the corresponding survivor (third row) and probability mass functions (bottom row), for each set size in target-present and -absent trials for trial 350 in block 8. In the baseline condition (TP3) the predicted response hazard first increases with increasing waiting time, and then decreases to a non-zero value (Fig. 12).

Fig. 12
figure 12

Hazard model predictions for spatial configuration search. Same conventions as in Fig. 9

With respect to the manipulations of interest, we see that in bin 640 and relative to the reference condition TP3, there is a main effect of removing the target (parameter 8, PE = – 0.55, HR = 0.58), and main effects of changing the set size to 6 (parameter 13, PE = – 0.76, HR = 0.47), to 12 (parameter 18, PE = – 1.52, HR = 0.22), and to 18 (parameter 23, PE = – 2.05, HR = 0.13) and interaction effects between target absent and set size 6 (parameter 28, PE = – 0.83, HR = 0.43), set size 12 (parameter 31, PE = – 2.06, HR = 0.13), and set size 18 (parameter 34, PE = – 1.99, HR = 0.14), with all p < .0001.

Furthermore, all these effects interact with TIME in a significant linear, quadratic, cubic, and/or quartic fashion (see Table 4). As a result, the effect of target presence and the systematic effect of set size on h(t) in the target-present condition (i.e., SS3 > SS6 > SS12 > SS18) are gone by around 1500 ms after search display onset (parameter rows 8, 13, 18, and 23 in Table 4). In contrast to the conjunction search task, the interaction effects between target absence and each set size (parameter rows 28, 31, and 34) do not quickly decrease in absolute size over time (before 1600 ms). In sum, the h(t) functions show a partial ordering with respect to the systematic effects of set size and target presence.

Finally, as shown in Fig. 6c, hazard again varies on both practice time scales. First, each additional series of ten trials increases the estimated cloglog[h(t)] value in bin 640 by only 0.00001 units (parameter 38, column 3, p = .98), but this effect increases linearly with TIME (parameter 39, PE = .00028, p < .01), so that each additional series of ten trials increases the estimated cloglog[h(t)] in bin 1600 by 0.00342 units (parameter 38, column 11, p < .001). Second, each additional block increases the estimated cloglog[h(t)] value by 0.054 units in bin 640 (parameter 40, column 3, p < .0001), and this effect decreases linearly with TIME (parameter 41, PE = −.00145, p < .0001).

Discussion

To study the temporal dynamics of visual search behavior, we applied descriptive and inferential discrete-time event history analyses to published benchmark RT data from three search tasks. To study whether correct or error responses occur, we also plotted the ca(t) or micro-level speed–accuracy tradeoff functions next to the discrete-time h(t) or hazard functions of response occurrence.

Based on the results, we draw four conclusions. First, event history analysis is a useful statistical technique for analyzing RT data, as it can detect differences that remain hidden when comparing mean RTs, such as the systematic but temporary effect of set size on h(t) in the feature search task. It is now clear that many, if not all, experimental manipulations lead to effects that change over time, whether in the context of masked response priming (Panis & Schmidt, 2016), simultaneous masking (Panis & Hermens, 2014), or object recognition (Panis, Torfs, Gillebert, Wagemans, & Humphreys, 2017; Panis & Wagemans, 2009; Torfs, Panis, & Wagemans, 2010). While many assume that RTs reflect the cumulative duration of all time-consuming cognitive operations involved in a task (e.g., Liesefeld, 2018; Song & Nakayama, 2009), our results show that fast, medium, and slow RTs can actually index different sets of cognitive operations. Given the advantages of this method illustrated in the current work, we recommend that it be used more often in future empirical and simulated RT studies (see Footnote 7). Second, there are clear individual differences in the presence of a systematic pattern of early false alarms and early misses. Third, the hazard modeling results suggest differences between the underlying processes in the three search tasks and provide strong constraints for future cognitive modeling efforts. Fourth, there is only a partial ordering of the hazard functions with respect to the effects of set size and target presence, and the hazard functions are relatively flat in the right tail of the RT distributions in all three search tasks.

No pop-out in h(t) for the feature search task

Why is there a systematic but temporary effect of set size (i.e., SS3 > SS6 > SS12 > SS18) on early h(t) for feature search when the target is present (Fig. 5), although there is no effect of set size on mean correct RT (Fig. 1)? At least three factors related to object recognition that were not controlled by Wolfe et al. (2010) might be at play. First, the eccentricity of the target varies from trial to trial, and peripheral targets take longer to recognize than foveal ones. Second, differences in set size are confounded with differences in density: the receptive field of a single high-level visual neuron might contain only one or two objects at set size 3, but many more objects at set size 18. As color sensitivity is lower in the periphery, visual crowding of the eccentric target likely occurred with large set sizes in many target-present trials. Third, because the search display was presented until response, more eye movements could have been made with larger set sizes. If so, the distance between the target location and the gaze location will have varied across the within-trial time (i.e., gaze-to-target distance is a time-varying covariate).

A small trend toward the reversed effect of set size (i.e., SS3 < SS6 < SS12 = SS18) on early h(t) for feature search was found when the target is absent. This finding is consistent with the proposal that distractor-distractor feature similarity, next to target-distractor feature similarity, plays a role in visual search (Duncan & Humphreys, 1989). Because homogeneous distractors tend to group perceptually on the basis of their high feature similarity, they can be rejected together, which can explain why target-absent mean RTs sometimes decrease with increasing set size (Cheal & Lyon, 1992; Duncan & Humphreys, 1989; Humphreys & Müller, 1993).

Attentional capture and cognitive control processes in visual search

We noted that a subset of the observers in each task (those who tended to respond very early on some trials) showed early false alarms coupled with early misses. More specifically, we can distinguish at least three states in the ca(t) behavior of these fast-onset responders, as can be seen clearly in the lower panels of Figs. 3, 7, and 10. First, the very fast responses show false alarms (ca(t) ≤ .50) when the target is absent, coupled with perfect performance (ca(t) = 1) when the target is present. In other words, these very fast responses display a strong yes-bias, independent of target presence. Second, after this initial ca(t) state, the slower (but still relatively fast) responses show perfect accuracy when the target is absent and a small but temporary increase in the miss rate when the target is present. Third, after this second ca(t) state, responses with intermediate latencies show high accuracy for both target-present and target-absent trials. In the conjunction and spatial configuration search tasks, the slower responses in a fourth ca(t) state display a developing “no”-response bias, especially for the larger set sizes. In other words, when the search task is difficult, the slower responses show virtually no false alarms and a gradual increase of the miss rate over time.

The results of Kiss, Grubert, and Eimer (2012) provide a likely explanation for the initial yes-bias in the first ca(t) state (see also Lee, Leonard, Luck, & Geng, 2018). They concluded that the attentional selection of targets that are defined by a combination of features – here: “red” and “vertical” in the feature and conjunction search tasks – is a two-stage process: Attention is initially captured by all target-matching features, but is then rapidly withdrawn from distractor objects that share some but not all features with the current target. This means that at the end of the feedforward sweep of the initial neural responses along the ventral and dorsal pathways right after display onset, all elements in the search display will have captured attention to some extent, each signaling the presence of target feature(s) such as red and vertical in the feature and conjunction search tasks, or combinations of left and right curvature in the spatial configuration task. This explains the presence of the early “yes”-response bias in the first ca(t) state of the fast-onset responders.

But why are these early false alarms followed by early temporary misses in the second ca(t) state? If we assume that online error-monitoring processes can detect the task-interfering “yes”-response bias in the earliest response tendencies, then reactive cognitive control processes can kick in (Braver, 2012; see Footnote 8). Panis and Schmidt (2016) used EHA to show that RT and accuracy distributions are shaped by active and selective response inhibition of premature response tendencies. Thus, for those participants who display early overt false alarms, the premature “yes”-response tendency seems to be actively (i.e., top-down) and selectively inhibited, resulting in a temporary disinhibition of the competing “no”-response (which leads to an overt no-response if a momentary threshold is crossed). This explains the observed small, early, and temporary increase in the miss rate in target-present trials, and the concurrent near-complete absence of false alarms in the target-absent trials during the second ca(t) state. Crucially, the early difference in h(t) between target-present and -absent conditions might then be caused partly by a response competition process, because both responses will be activated in target-absent trials, and not solely by the fact that target absence is confirmed more slowly on average than target presence, as assumed in serial exhaustive search models.

In other words, at any point in time the hazard of response occurrence and the conditional accuracy are determined not only by information from the search process but also by cognitive control processes (see Panis & Schmidt, 2016). As time passes without a response, the chance that target presence is correctly confirmed or rejected increases, and this search information additionally influences the ongoing decision process (Cisek & Kalaska, 2010). Responses during the third ca(t) state are therefore dominated by information from the search process (i.e., selective response inhibition signals are overridden by response activation signals from the search outcome), and thus show high accuracy in both target-present and target-absent trials. Finally, when time passes without a response and target presence is not yet confirmed or rejected, search is aborted and a no-bias develops for the slower responses during a fourth ca(t) state in the conjunction and spatial configuration tasks.

Those observers who show no early errors probably have better proactive control in terms of global (or aselective) response inhibition (Panis & Schmidt, 2016; see Footnote 8). In other words, these observers proactively and globally inhibit both the correct and incorrect response channels until reliable information about the search outcome is available. This hypothesis is consistent with the observation that the earliest responses of these observers are emitted somewhat later in time than the earliest responses of the observers who show early errors.

Serial versus parallel selection

While there is general consensus that the current color feature task relies on parallel selection and that the current spatial configuration task relies on serial selection, this is not the case for the color-orientation conjunction task. According to feature integration theory (Treisman & Gelade, 1980), attentional selection is serial because of the need to bind both surface features for recognition. However, many studies suggest that certain feature conjunctions can actually be detected in parallel (Eckstein, 1998; McElree & Carrasco, 1999; Mordkoff, Yantis, & Egeth, 1990; Pashler, 1987; Sung, 2008). Although our hazard modeling results do not settle this issue, they do show task differences. First, the effect of trial number on hazard was similar for the conjunction and spatial configuration tasks, and different for the feature search task. Second, the interactions involving set size and time became more complex with task difficulty. These observations argue against the proposal that differences between search tasks are due to purely quantitative differences in target discriminability (Haslam, Porter, & Rothschild, 2001; Liesefeld et al., 2016; Wolfe, 1998).

Perhaps the question of whether search is parallel or serial is ill-posed. Search may actually involve parallel selection early in time, as reflected in fast responses (< ~500 ms), and serial selection later in time, as reflected in slower responses (> ~500 ms). Indeed, Li, Kadohisa, Kusunoki, Duncan, Bundesen, and Ditlevsen (2018) found that neurons show parallel processing early after search display onset (related to the initial feedforward sweep of neural activity after display onset), whereas they show serial processing later on (related to attentional effects in recurrent feedback connections, where all processing capacity is focused on the attended object; see also Gabroi & Lisman, 2003). It is also possible that sequences of discrete attentional shifts emerge automatically from a parallel neural dynamic architecture that operates in continuous time (Grieben et al., 2018).

Effects of set size due to recurrent object recognition and cognitive control processes

According to Reverse Hierarchy Theory (Hochstein & Ahissar, 2002), feature search "pop-out" is attributed to high-level areas where large receptive fields underlie spread attention that detects categorical differences, whereas search for conjunctions or fine discriminations depends on reentry to low-level, feature-specific receptive fields using focused attention. Similarly, Nakayama and Martini (2011) proposed that visual search relies on object recognition processes, with high-level processing occurring very rapidly and often unconsciously. They treat object recognition as a problem of linear classification in which high-level areas must disentangle the representations of different object classes by extracting diagnostic feature dimensions, and they propose that search tasks vary on a continuum depending on the computational tradeoff between detail of description (number of feature dimensions) and scope (number of objects). Feature search can be performed in a single glance for many objects (a large attentional window) because only one feature dimension is relevant. Spatial configuration search takes time because many feature dimensions have to be extracted for each display element separately, so a small attentional window moves serially from element to element; for example, to solve the current spatial configuration task, spatial attention has to be focused on each stimulus to extract the object-centered spatial reference frame information that distinguishes a digit 2 from a digit 5. For conjunction search a few feature dimensions are relevant, and an intermediate-sized attentional window is therefore used. For example, solving the current conjunction task may require a time-consuming attention-based coupling between two neuronal populations (one sensitive to color-position, the other to orientation-position), whereas only one population is necessary for the feature task (color-position; Grieben et al., 2018).
In addition to theories based on a strategically modifiable attentional window (Humphreys & Müller, 1993; Theeuwes, 1994; Treisman & Souther, 1985), others have proposed that the size of the attentional window is determined by inherent limitations of the system (Engel, 1977; Geisler & Chou, 1995; Hulleman & Olivers, 2017).

Palmer (1995) distinguished between four causes of a set size effect: (a) preselection factors such as target eccentricity and display density, (b) selection factors such as whether only one object or a group of objects can be selected, (c) postselection factors, and (d) decision processes (see also Liesefeld & Müller, 2019). We can add a fifth cause: increases in set size might result in stronger automatic activation of the yes-response and stronger selective response inhibition due to reactive cognitive control. Similarly, if target recognition during search depends on reentry to lower-level populations, then set size will affect performance through the link between the complexity of the feature that distinguishes the target from the distractors and the receptive field size of the neurons coding for that feature (VanRullen, Reddy, & Koch, 2004). Future studies can use event history analysis to study when and how these different factors affect the shape of the hazard function of response occurrence. Temporally distinguishing the contributions of these factors to h(t) can be done by adding relevant predictors (target eccentricity, display density, gaze-to-target distance, target-distractor similarity, feature complexity, working memory capacity, etc.) to a hazard model. In other words, by adding the necessary predictors to a hazard model one can control for variation due to variables irrelevant to the research question.
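To make the modeling step concrete, a discrete-time hazard model is typically fit by expanding each trial into one row per time bin survived and regressing the binary event indicator on bin and predictor terms. The following is a minimal sketch of that person-period expansion, not the authors' actual model; the predictor (set size), the deadline, and the bin width are illustrative values.

```python
import math

def expand_to_person_period(rts, censor_time, bin_width, set_sizes):
    """Expand trial-level RTs into one row per (trial, time-bin) at risk.

    Trials slower than censor_time are right-censored: they contribute
    rows with event = 0 for every bin up to the deadline.
    """
    rows = []
    for i, rt in enumerate(rts):
        t_obs = min(rt, censor_time)                  # right-censor at deadline
        n_bins = math.ceil(round(t_obs / bin_width, 9))
        for b in range(n_bins):
            # event = 1 only in the bin where an uncensored response occurred
            event = int(rt <= censor_time and b == n_bins - 1)
            rows.append({"trial": i, "bin": b, "event": event,
                         "set_size": set_sizes[i]})
    return rows

# Toy data: three trials, 1.0-s deadline, 0.2-s bins;
# the third trial (1.30 s) exceeds the deadline and is censored.
rts = [0.45, 0.75, 1.30]
rows = expand_to_person_period(rts, censor_time=1.0, bin_width=0.2,
                               set_sizes=[3, 6, 12])
print(len(rows))   # 3 + 4 + 5 = 12 rows at risk
```

A logistic regression of `event` on bin dummies plus predictors such as `set_size` (e.g., with statsmodels' `Logit`) then yields the estimated h(t) per condition.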

Search is aborted rather early

The systematic effect of set size (i.e., SS3 > SS6 > SS12 > SS18) on response hazards lasted longer for the more difficult search tasks. However, the systematic effects of both target presence and set size on hazard were rather limited in time. That is, we observed a partial ordering of the hazard functions (i.e., set size and target presence affected only the left tail of the distributions). For example, for the feature, conjunction, and spatial configuration search tasks, the systematic effects of set size and target presence had disappeared by about 500 ms, 1 s, and 2 s after search display onset, respectively.

Thereafter, the system transitions to a state with flat hazard functions without systematic effects of set size and target presence (see Figs. 4, 8, and 11). Flat (horizontal) hazard functions point to exponentially distributed RTs (see Fig. 2). Because this is observed for every search task, including feature search, it suggests that the constant hazards in the right tail of the RT distributions are not related to the visual search process per se but to decision making in general (Palmer et al., 2011). Hazard functions showing a peak followed by a flat right tail have been observed before (Holden et al., 2009). Based on the findings of Shenoy, Sahani, and Churchland (2013), we assume that these flat right tails reflect RT outliers during decision making. Shenoy et al. (2013) described neuronal motor activity from a dynamical systems perspective by studying single-trial neural trajectories in a state space. They found that in RT-outlier trials the neural state wandered before falling back on track, so that the monkey hesitated for an abnormally long time before movement onset. Interestingly, Thompson, Hanes, Bichot, and Schall (1996) found that much of the RT variance in search tasks is due to postperceptual motor processing, perhaps providing the adaptive advantage of allowing subsequent visual processing and cognitive factors to alter the response choice (e.g., explicitly comparing the presumed target with a few surrounding distractors to confirm target presence) before an irrevocable commitment is made.
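The link between flat hazards and exponential RTs can be checked numerically: for exponentially distributed waiting times with rate λ, the discrete-time hazard in every bin of width Δ equals 1 − exp(−λΔ). This small simulation (illustrative only; the rate and bin width are arbitrary choices, not values from the data set) estimates h(t) from simulated exponential RTs and shows that it is constant across bins.

```python
import math
import random

random.seed(42)
rate, bin_width = 2.0, 0.1          # hypothetical rate (events/s) and bin width (s)
rts = [random.expovariate(rate) for _ in range(100_000)]

def discrete_hazard(rts, bin_width, n_bins):
    """h(t) = P(response in bin t | no response before bin t)."""
    h = []
    for b in range(n_bins):
        at_risk = sum(rt >= b * bin_width for rt in rts)
        events = sum(b * bin_width <= rt < (b + 1) * bin_width for rt in rts)
        h.append(events / at_risk)
    return h

h = discrete_hazard(rts, bin_width, n_bins=8)
expected = 1 - math.exp(-rate * bin_width)   # constant hazard, ~0.18 per bin
print([round(x, 3) for x in h])
```

Every estimated bin hazard lies close to the constant 1 − exp(−λΔ), which is exactly the horizontal right tail observed in the empirical hazard functions.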

Recommendations for experimental design of RT and other time-to-event data studies

Two general recommendations can be made from the viewpoint of event history analysis when designing RT studies. First, always use the same fixed response deadline in each trial, for example, 500 ms for single-button detection and 800 ms for an easy two-button discrimination task. Because hazard analysis deals with right-censored observations, there is no need to wait for very slow responses that are considered meaningless and would be trimmed anyway. As a consequence, event history analysis also allows analyzing RT data in masking paradigms, attentional blink paradigms, and so on, that is, in paradigms for which RT is typically not measured, let alone analyzed and reported (e.g., because no differences in mean RT are typically found). Also, using rather short, fixed response deadlines will lead to individual distributions that overlap in time, which is important for h(t) and ca(t) modeling (Panis & Schmidt, 2016). Furthermore, if you wait for a response in each trial and let the overt response end the trial, you give participants control over trial (and experiment) duration, which should be avoided unless it is part of the research question. Second, design as many trials as possible per condition, because this allows using small bins while still obtaining stable h(t) and ca(t) estimates (i.e., use a small-N design; Smith & Little, 2018). Designing 100 trials per condition, for example, will not greatly increase experiment duration, since the response deadline and thus trial duration can be kept short (see Panis & Schmidt, 2016).
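The first recommendation rests on the fact that right-censored trials still contribute to the risk set, whereas simply trimming slow trials biases hazard estimates upward. This sketch (a toy demonstration under an assumed exponential RT model, not an analysis of the present data) contrasts the two approaches for the first time bin.

```python
import random

random.seed(7)
deadline, bin_width = 0.8, 0.2      # hypothetical deadline and bin width (s)
true_rate = 1.5                     # assumed event rate (events/s)
rts = [random.expovariate(true_rate) for _ in range(100_000)]

def hazard_bin0(sample, censor=None):
    """Estimated hazard of the first bin [0, bin_width)."""
    if censor is not None:
        # Right-censoring: trials slower than the deadline stay in the
        # risk set of the early bins, so the estimate is unbiased.
        at_risk = len(sample)
        events = sum(rt < bin_width for rt in sample)
    else:
        # "Trimming": slow trials are discarded before analysis, which
        # conditions on fast responses and inflates the estimate.
        kept = [rt for rt in sample if rt <= deadline]
        at_risk = len(kept)
        events = sum(rt < bin_width for rt in kept)
    return events / at_risk

print(hazard_bin0(rts, censor=deadline))   # close to 1 - exp(-0.3) ~ 0.26
print(hazard_bin0(rts))                    # inflated by dropping slow trials
```

Because censoring handles the slow tail correctly, nothing is lost by cutting each trial off at a short fixed deadline, which is what makes the recommendation practical.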

Conclusions

RT and accuracy distributions are a rich source of information on the time course of cognitive processing. The changing effects of our experimental manipulations with increases in waiting time become strikingly clear when looking at response hazards and micro-level speed–accuracy tradeoff functions. An event history analysis of time-to-event data can strongly constrain the choice between cognitive models of the same phenomenon. We suggest that future inclusion of recurrent object recognition, learning, and cognitive control processes in computational models of visual search will improve the ability of such models to account for RT distributions and to explain the differences in the time-dispersed behavior of individual searchers.