# Beyond trial types

## Abstract

Conventional wisdom on psychological experiments has held that when one or more independent variables are manipulated, it is essential that all other conditions be kept constant so that confounding factors can be assumed negligible (Woodworth, 1938). In practice, the latter assumption is often questionable because it is generally difficult to guarantee that all other conditions are constant between any two trials. The most common way to check for confounding violations of this assumption is to split the experimental conditions into “trial types” to simulate a reduction of unintended trial-by-trial variation. Here, we propose a method that is more general than the use of trial types: the use of mathematical models that treat measures of potentially confounding factors and manipulated variables as equals on the single-trial level. We show how the method can be applied with models that fall under generalized linear item response theory (GLIRT), which covers most of the well-known psychometric models (Mellenbergh, 1994). As an example, we provide a new analysis of a single-letter recognition experiment using a nested likelihood ratio test that treats manipulated and measured variables equally (i.e., in exactly the same way) on the single-trial level. The test detects a confounding interaction with time-on-task as a single-trial measure and yields a substantially better estimate of the effect size of the main manipulation than an analysis in terms of trial types.

## Beyond trial types

Common wisdom has implied a restrictive conception of psychological experiments. In the words of one of the fathers of modern experimental psychology, Robert S. Woodworth, “an experimenter is said to control the conditions in which an event occurs” (Woodworth, 1938). By manipulating the experimental conditions (changing trial types), one or more independent variables are varied, and the associated variations in the participants’ performance or reported experience (the dependent variables) are observed. According to Woodworth, “whether one or more independent variables are used, it remains essential that all other conditions be constant. Otherwise you cannot connect the effect observed with any definite cause” (Woodworth, 1938).

Notwithstanding this claim, cognitive neuroscientists have recently begun to use physiological measures that fluctuate from trial to trial as explanatory variables along with manipulated variables (see Cavanagh et al., 2011; O’Doherty, Hampton, & Kim, 2007). We further this development by proposing that, with the help of mathematical models, single-trial measures and manipulated variables can be treated as equals in statistical tests. The method is readily applicable to models that fall under generalized linear item response theory (GLIRT), which covers most of the well-known psychometric models (Mellenbergh, 1994). In GLIRT, a linear combination of latent and observed explanatory variables is used as a predictor of the expected response of a participant to a stimulus item in a specified format. We show that the special case of the Theory of Visual Attention (TVA; Bundesen, 1990) used for modeling single-stimulus recognition (e.g., Bundesen & Harms, 1999; Vangkilde, Coull, & Bundesen, 2012) is also a special case of GLIRT, and we present a new analysis of a single-letter recognition experiment based on this theory (Vangkilde et al., 2012, Experiment 3). The new analysis shows that the expected response of a participant on a particular trial depends strongly on the time-on-task associated with that trial. This confound is grossly underestimated by a traditional analysis in terms of trial types (early vs. late trials), and it even goes undetected in a standard post hoc check.^{1}

## Single-letter recognition under GLIRT

TVA is often used to describe an observer’s recognition accuracy as a function of exposure duration $t$. In its most commonly applied form, TVA provides estimates of the following perceptual parameters: visual short-term memory (VSTM) capacity $K$ (in units of elements), processing speed $C$ (rate of categorization in elements per second), a temporal threshold $t_0$ (seconds), attentional weights $\{w_x\}$ (unitless) for a fixed set of display positions $\{x\}$, and a measure of the efficiency of top-down control, $\alpha$ (the unitless ratio of the attentional weight of a distractor to the weight of a target). This particular parameterization has been widely applied in studies of partial report, whole report, and change detection (Bundesen & Habekost, 2008; Duncan et al., 1999; Gillebert et al., 2012; Habekost & Starrfelt, 2009; Hung, Driver, & Walsh, 2005, 2011; Kyllingsbæk & Bundesen, 2009; Shibuya & Bundesen, 1988). The parameters have traditionally been assumed to be nearly constant within each trial type (Kyllingsbæk, 2006), but recent advances have shown that this assumption leads to systematic errors (Dyrholm, Kyllingsbæk, Espeseth, & Bundesen, 2011). Here, we estimate parameters on individual trials (the $v$ values in Eqs. 1 and 2 below on every trial $n$) using a linear predictor (the right-hand side of Eq. 2) that can vary between any two trials. For related work on single-trial inference in partial report, in which the number of correctly reported targets on a given trial is used to infer the number of distractors in VSTM on the same trial, see Dyrholm, Kyllingsbæk, Vangkilde, et al. (2011).

Each trial in a single-stimulus recognition task yields a dichotomous response (correct vs. incorrect) whose probability of being correct depends on the exposure duration $t$. Summed across $N$ Bernoulli trials with the same exposure duration $t$, the number of correct responses follows a binomial distribution with parameters $N$ and $p$, where the probability $p$ that a given item is correctly reported defines the expected value of the participant’s response on each trial (Mellenbergh, 1994). In the single-stimulus case (Bundesen & Harms, 1999; Dyrholm, Kyllingsbæk, Espeseth, & Bundesen, 2011), TVA implies that $p = 1 - \exp(-\tau v)$, where $\tau = t - t_0$ is the effective exposure duration if $t$ exceeds the temporal threshold $t_0$, whereas $p = 0$ if $t \le t_0$. The parameter $v$ is the conventional single-stimulus equivalent of the $C$ parameter of TVA. From this, we derive a function of the expected item response $p_n$ on a given trial,

$$
\ln v_n = \ln\left[-\ln\left(1 - p_n\right)\right] - \ln\left(t_n - t_0\right), \qquad t_n > t_0,
\tag{1}
$$

where $n$ is the trial number. This function is monotonic and differentiable, as required for a link function under GLIRT. Inserting a linear predictor of the logarithm of $v_n$,

$$
\ln v_n = \sum_j a_j x_{j,n},
\tag{2}
$$

where $x_{j,n}$ is the value of the $j$th explanatory variable on trial $n$, yields a model that falls under GLIRT^{2}: The responses are modeled as independently distributed across trials given the values of the explanatory variables; the responses follow a distribution determined by the item format (here, a dichotomous format: correct vs. incorrect); and the expected item responses $p_n$ are explained by a continuous latent variable $v_n$ (Mellenbergh, 1994). In other words, this model has the structure of a generalized linear model (Knoblauch & Maloney, 2012; McCulloch & Searle, 2001) with a highly specialized link function that allows for nonlinear regression of item responses in a single-stimulus recognition task. The specialized link function is exactly such that the stimulus exposure duration $t$ and the participant’s perceptual threshold $t_0$ are both taken into account in accordance with TVA.
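The following Python sketch shows how the single-stimulus TVA probability $p = 1 - \exp(-\tau v)$, the link function obtained by solving it for $\ln v$, and a linear predictor of $\ln v_n$ combine into a Bernoulli log-likelihood for correct/incorrect reports. The function names and the list-based data layout are illustrative assumptions; this is not the estimation code used in the study, which relied on Newton steps (Dyrholm, Kyllingsbæk, Espeseth, et al., 2011).

```python
import math

def p_correct(v: float, t: float, t0: float) -> float:
    """Single-stimulus TVA: p = 1 - exp(-(t - t0) * v) if t > t0, else 0."""
    tau = t - t0  # effective exposure duration (s)
    if tau <= 0:
        return 0.0
    return -math.expm1(-tau * v)  # equals 1 - exp(-tau * v), computed stably

def link(p: float, t: float, t0: float) -> float:
    """Link function: ln v = ln[-ln(1 - p)] - ln(t - t0), for 0 < p < 1 and t > t0."""
    return math.log(-math.log1p(-p)) - math.log(t - t0)

def log_likelihood(coefs, rows, correct, t, t0):
    """Bernoulli log-likelihood of correct/incorrect reports.

    coefs:   coefficients a_j of the linear predictor ln v_n = sum_j a_j x_{j,n}
    rows:    one list of explanatory variables x_{j,n} per trial
    correct: 1 (correct) or 0 (incorrect) per trial
    t:       exposure duration per trial (s); t0: temporal threshold (s)
    """
    ll = 0.0
    for x, y, t_n in zip(rows, correct, t):
        v_n = math.exp(sum(a * xj for a, xj in zip(coefs, x)))
        tau = t_n - t0
        if tau <= 0:
            # p = 0: an incorrect report is certain, a correct one impossible
            ll += 0.0 if y == 0 else -math.inf
        elif y == 1:
            ll += math.log(-math.expm1(-tau * v_n))  # ln p
        else:
            ll += -tau * v_n  # ln(1 - p) = -tau * v_n
    return ll
```

In principle, maximizing `log_likelihood` over the coefficients gives maximum-likelihood estimates of the $a_j$; the specialized Newton-based procedure of the original analysis is not reproduced here.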

It was recently found that perceptual processing speed *v* is modulated by the observer’s expectation regarding the foreperiod between a cue and a subsequent target letter occurrence (Vangkilde et al., 2012; Vangkilde, Petersen, & Bundesen, 2013). Specifically, in a single-letter recognition experiment (Vangkilde et al., 2012), two levels of expectancy were induced in the participants by two types of trials, one type with a higher hazard rate of stimulus presentation than the other. Across all participants, perceptual processing was 40 % faster in the high expectancy condition than in the low expectancy condition. This finding was interpreted as suggesting that higher expectations speed up perceptual processing.

However, it is well known that maintaining attention over a prolonged period of time may negatively affect attentional efficiency (Robertson et al., 1997). Even though such effects of “time-on-task” could hinder optimal performance, they are rarely taken into account in studies that do not focus explicitly on sustained attention. Thus, an alternative explanation of the finding by Vangkilde et al. (2012) could be that low-expectancy trials are substantially more susceptible to time-on-task effects, leading to a rapid decline in processing speed across a test session that is not seen in the high-expectancy trials.

To exemplify the explanatory power of the model expressed in Eqs. 1 and 2, we present a new analysis of the same experiment (Vangkilde et al., 2012, Experiment 3), this time including “time-on-task” as a potentially explanatory variable which is tested in the same way as variables represented in terms of trial types.

## Method

### Participants

Each of eight young female participants completed eight sessions of 480 trials each.

### Procedure

On each trial, the exposure duration *t* of the target letter was randomly sampled from the set {10 ms, 20 ms, 50 ms, 80 ms} such that all exposure durations were used equally often over the course of a session.

The hazard rate (high vs. low) alternated between blocks of 60 trials. The foreperiod between the cue and the target letter was chosen at random from the set {0.5 s, 1.0 s, 1.5 s,…} according to one of two geometric distributions, which are shown in Fig. 1b. The distributions were defined such that in the high hazard rate condition, the expected foreperiod was 0.75 s (a hazard rate of 1.33 Hz), whereas in the low hazard rate condition, it was 4.5 s (a hazard rate of 0.22 Hz).
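To see how two geometric foreperiod distributions can map onto the stated expected foreperiods and hazard rates, here is a small Python sketch. It assumes that the foreperiod equals 0.5 s × *k* with *k* ≥ 1 and P(*k*) = *q*(1 − *q*)^(*k* − 1); this parameterization is our assumption, not a reported detail of the experiment. Under it, the mean foreperiod is 0.5/*q* s and the hazard is *q* per 0.5 s, so the hazard rate in Hz is the reciprocal of the expected foreperiod.

```python
STEP = 0.5  # foreperiod grid spacing in seconds: {0.5 s, 1.0 s, 1.5 s, ...}

def foreperiod_pmf(q: float, k_max: int):
    """Geometric pmf over foreperiods STEP * k, k = 1..k_max: P = q * (1 - q)**(k - 1)."""
    return [(STEP * k, q * (1.0 - q) ** (k - 1)) for k in range(1, k_max + 1)]

def q_for_mean(mean_fp: float) -> float:
    """Per-step probability giving the requested expected foreperiod (mean = STEP / q)."""
    return STEP / mean_fp

def hazard_rate_hz(q: float) -> float:
    """Constant hazard: probability q per STEP seconds, i.e., q / STEP per second."""
    return q / STEP

for label, mean_fp in [("high", 0.75), ("low", 4.5)]:
    q = q_for_mean(mean_fp)
    pmf = foreperiod_pmf(q, k_max=400)
    approx_mean = sum(fp * p for fp, p in pmf)  # truncated sum, very close to STEP / q
    print(label, round(q, 4), round(hazard_rate_hz(q), 2), round(approx_mean, 3))
```

With these assumptions, the high condition needs *q* = 2/3 per 0.5-s step and the low condition *q* = 1/9.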

### Computational model

For the computational GLIRT TVA model, the cue–target foreperiod (*FP*) of 0.5 s was chosen as the reference, so all other foreperiod coefficients are relative to it. For the hazard rate (*HR*), the low condition was chosen as the reference. A time-on-task variable (*T*) was defined on the single-trial level by translating and scaling the stimulus-onset time within the session such that the value of *T* increased monotonically from 0.0 on the first trial of the session to 1.0 on the last trial of the session (the 480th trial; the first trial was the reference trial).
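The time-on-task variable is an affine rescaling of trial onset times within a session. A minimal Python sketch, assuming onset times are available as a list of seconds for one session (the function name and example onsets are hypothetical):

```python
def time_on_task(onsets):
    """T_n = (A_n - A_1) / (A_N - A_1) for the onset times A_1..A_N of one session.

    Gives T = 0.0 on the first trial and T = 1.0 on the last trial.
    """
    first, last = onsets[0], onsets[-1]
    if last <= first:
        raise ValueError("onset times must increase over the session")
    return [(a - first) / (last - first) for a in onsets]

# Hypothetical onset times (s) of a five-trial session
print(time_on_task([3.0, 9.5, 15.0, 22.0, 35.0]))
```

Because *T* is rescaled per session, a coefficient on *T* describes the change over one whole session regardless of the session's absolute duration.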

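For any proposition $q$, let $\{q\}$ be the binary truth value (0 or 1) of $q$. In the first model (Model 1), the natural logarithm of the perceptual processing speed of the correct categorization of the stimulus letter shown on trial $n$ is given by

$$
\begin{aligned}
\ln v_n = {} & a_1 + a_2\{FP_n = 1.0\ \text{s}\} + a_3\{FP_n = 1.5\ \text{s}\} + a_4\{FP_n \ge 2.0\ \text{s}\} \\
& + a_5\{HR_n = \text{high}\} + a_6 T_n\{HR_n = \text{high}\} + a_7 T_n\{HR_n = \text{low}\},
\end{aligned}
\tag{3}
$$

where $a_1 = \ln(v_{\text{base}})$, and $T_n = (A_n - A_1)/(A_{480} - A_1)$ is the time-on-task variable, $A_n$ being the onset time of trial $n$, for $n = 1, 2, \ldots, 480$. Parameter $v_{\text{base}}$ is the value of $v$ in the reference condition (i.e., when $FP = 0.5$ s, $T = 0.0$, and $HR$ = low). By exponentiating both sides of Eq. 3, a simple multiplicative structure is obtained,

$$
v_n = v_{\text{base}}\, e^{a_2\{FP_n = 1.0\ \text{s}\}}\, e^{a_3\{FP_n = 1.5\ \text{s}\}}\, e^{a_4\{FP_n \ge 2.0\ \text{s}\}}\, e^{a_5\{HR_n = \text{high}\}}\, e^{a_6 T_n\{HR_n = \text{high}\}}\, e^{a_7 T_n\{HR_n = \text{low}\}}.
$$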
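The following Python sketch assembles one trial's explanatory variables for Model 1 and checks that the log-linear and multiplicative forms give the same $v_n$. It assumes the terms listed for Model 1 in Table 1 (foreperiod indicators for 1.0 s, 1.5 s, and ≥2.0 s against the 0.5 s reference; a high hazard rate indicator; and separate time-on-task slopes for the two hazard rates). The term order, function names, and coefficient values are hypothetical, not estimates from the study.

```python
import math

def indicator(q: bool) -> int:
    """{q}: binary truth value (0 or 1) of proposition q."""
    return 1 if q else 0

def model1_row(fp: float, hr: str, T: float):
    """Explanatory variables x_{j,n} for one trial, in an assumed order:
    intercept, {FP=1.0}, {FP=1.5}, {FP>=2.0}, {HR=high}, T{HR=high}, T{HR=low}.
    Reference condition: FP = 0.5 s, HR = low, T = 0.
    """
    high = indicator(hr == "high")
    return [
        1,
        indicator(math.isclose(fp, 1.0)),
        indicator(math.isclose(fp, 1.5)),
        indicator(fp >= 2.0),
        high,
        T * high,
        T * (1 - high),
    ]

def log_v(coefs, row):
    """Linear predictor: ln v_n = sum_j a_j x_{j,n}."""
    return sum(a * x for a, x in zip(coefs, row))

def v_multiplicative(coefs, row):
    """Same v_n as exp(log_v): v_base times one factor exp(a_j x_{j,n}) per term."""
    v = math.exp(coefs[0])  # v_base = exp(a_1)
    for a, x in zip(coefs[1:], row[1:]):
        v *= math.exp(a * x)
    return v

# Hypothetical coefficients a_1..a_7 (illustrative values only)
coefs = [math.log(50.0), 0.07, -0.02, -0.045, 0.25, -0.04, -0.31]
row = model1_row(fp=1.5, hr="high", T=0.8)
print(math.exp(log_v(coefs, row)), v_multiplicative(coefs, row))  # agree up to rounding
```

Each factor $e^{a_j}$ is the multiplicative change in $v$ per unit of the corresponding variable, which is why the coefficients are reported as percentage differences.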

A sequential likelihood ratio test was designed to test Models 1–4 (i.e., effects of the foreperiod and hazard rate, as well as time-on-task effects including a possible interaction with the hazard rate). Maximum-likelihood estimates of the model coefficients $a_j$ in Eq. 2 were obtained via chain rules extending the Newton step for estimating $v_n$ (Dyrholm, Kyllingsbæk, Espeseth, et al., 2011). Estimated model coefficients $a_j$ were mapped to $[\exp(a_j) - 1] \times 100\,\%$ to represent the percentage difference in perceptual processing speed per unit increase of the corresponding explanatory variable $x_j$. For each of the four models, the individual coefficients were tested at the group level against the null hypothesis that the percentage difference was zero. For each model coefficient, this was done by summing the corresponding 64 likelihood ratio test statistics (one per participant per session). Significance levels were then derived from a chi-square distribution with 64 degrees of freedom.
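The coefficient mapping and the group-level test can be sketched in Python as follows. Under the null hypothesis, each per-session likelihood ratio statistic for a single coefficient is asymptotically chi-square with 1 degree of freedom, so the sum of 64 independent statistics is referred to a chi-square distribution with 64 degrees of freedom. To stay dependency-free, the sketch uses the closed-form upper tail that holds for even degrees of freedom; a statistics library would normally be used instead. The function names are illustrative.

```python
import math

def percent_difference(a: float) -> float:
    """Map a log-scale coefficient a_j to [exp(a_j) - 1] * 100 %."""
    return (math.exp(a) - 1.0) * 100.0

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!,
    valid for even degrees of freedom such as 64.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # i = 0 term of the sum
    for i in range(1, df // 2):
        term *= half / i  # (x/2)**i / i!, built incrementally
        total += term
    return math.exp(-half) * total

def group_test(lr_stats):
    """Sum per-session LR statistics (1 df each) and refer the sum to
    chi-square with len(lr_stats) degrees of freedom."""
    total = sum(lr_stats)
    return total, chi2_sf_even_df(total, len(lr_stats))
```

For example, `group_test(stats)` with a list of 64 per-session statistics returns the summed statistic and its p value.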

## Results

Table 1 lists the estimated coefficients of Models 1–4 for *v*. Averaged across participants and sessions, the final model (Model 3) is summarized as follows (cf. Table 1): *v* increased by 7 % when the foreperiod was 1.0 s as compared with the other foreperiods; *v* increased by 28 % when the hazard rate was high compared with when it was low; and *v* decreased gradually over the course of a session, by 4 % in the high hazard rate condition and by 27 % in the low hazard rate condition. That is, perceptual processing speed declined over time at significantly different rates in the two hazard rate conditions (see Fig. 2). This interaction was detected when Model 4, which has a single time-on-task slope for both hazard rate conditions, was rejected in favor of Model 3. Modeling this interaction with time-on-task as a single-trial measure strongly reduced the estimated magnitude of the temporal expectation effect (compare Models 3 and 4 in Table 1): from an estimated 46 % increase in *v* down to an estimated 28 % increase in *v* in the high hazard rate condition as compared with the low hazard rate condition.
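Because the model is multiplicative in *v*, the reported percentages can be turned back into coefficients and combined. The Python sketch below plugs the Model 3 group averages from Table 1 into the model structure to show how the difference between the hazard rate conditions grows over a session. Treating group-averaged percentages as a single set of coefficients is a simplifying assumption for illustration only; the foreperiod is held at a level other than 1.0 s, and the function names are hypothetical.

```python
import math

def coef_from_percent(pct: float) -> float:
    """Invert pct = [exp(a) - 1] * 100 to recover a = ln(1 + pct / 100)."""
    return math.log1p(pct / 100.0)

# Model 3 group averages from Table 1, converted back to log-scale coefficients
A_HR_HIGH = coef_from_percent(28.38)   # hazard rate = high
A_T_HIGH = coef_from_percent(-3.73)    # time-on-task slope, high hazard rate
A_T_LOW = coef_from_percent(-26.68)    # time-on-task slope, low hazard rate

def relative_speed(hr: str, T: float) -> float:
    """v relative to its value at HR = low and T = 0, under Model 3's structure."""
    if hr == "high":
        return math.exp(A_HR_HIGH + A_T_HIGH * T)
    return math.exp(A_T_LOW * T)

for T in (0.0, 0.5, 1.0):
    high, low = relative_speed("high", T), relative_speed("low", T)
    print(f"T={T:.1f}  high={high:.3f}  low={low:.3f}  high/low={high / low:.3f}")
```

The ratio in the last column changes with *T* because the two hazard rate conditions have different time-on-task slopes, which is the interaction that a single trial-type split can only approximate.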

Table 1. Testing with a single-trial measure of time-on-task (coefficients given as % difference in processing speed *v*)

| Variable | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| *In terms of trial types* | | | | |
| Foreperiod = 1.0 s | 5.24*** | 4.91*** | 7.28*** | 7.46** |
| Foreperiod = 1.5 s | −2.21 | | | |
| Foreperiod ≥ 1.5 s | | −5.21 | | |
| Foreperiod ≥ 2.0 s | −4.46 | | | |
| Hazard rate = high | 25.30* | 24.17* | 28.38*** | 45.59*** |
| *Beyond trial types* | | | | |
| Time-on-task (*T*) | | | | −16.52*** |
| *T* × {*HR* = high} | −3.93* | −4.01* | −3.73* | |
| *T* × {*HR* = low} | −26.46*** | −26.64*** | −26.68*** | |

Table 2. Testing time-on-task in terms of trial types (coefficients given as % difference in processing speed *v*)

| Variable | Model 5 | Model 6 | Model 7† | Model 8 |
|---|---|---|---|---|
| *In terms of trial types* | | | | |
| Foreperiod = 1.0 s | 5.25*** | 4.91*** | 7.16*** | 7.22** |
| Foreperiod = 1.5 s | −2.42 | | | |
| Foreperiod ≥ 1.5 s | | −4.97 | | |
| Foreperiod ≥ 2.0 s | −4.04 | | | |
| Hazard rate = high | 37.79*** | 36.41*** | 40.75*** | 49.19*** |
| Time-on-task trial type | | | | −7.35 |
| Time-on-task trial type × {*HR* = high} | −1.24** | −1.12* | −0.97** | |
| Time-on-task trial type × {*HR* = low} | −12.49* | −12.56* | −12.72* | |

Compare the effect size of 41 % obtained in terms of trial types (Model 7) with the effect size of 28 % found using time-on-task as a single-trial measure (Model 3). This raises a model selection problem: Which one is the better estimate? To answer this question, we computed the Bayes factor for each session as the ratio between the marginal likelihoods of the two models, derived analytically and implemented for the single-stimulus TVA by Dyrholm, Kyllingsbæk, Espeseth, et al. (2011). With an average Bayes factor of 6.97 to one against the trial type model, the single-trial model was substantially better than the trial type model (see, e.g., Rouder et al., 2012, for a contemporary description of Bayes factors).
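Per-session Bayes factors are best computed from log marginal likelihoods, because the marginal likelihoods themselves can underflow over hundreds of trials. A minimal Python sketch; the function names, the example values, and the averaging rule (an arithmetic mean across sessions) are assumptions, since the averaging rule is not stated above.

```python
import math

def bayes_factor(log_ml_a: float, log_ml_b: float) -> float:
    """Bayes factor of model A over model B from log marginal likelihoods.

    Works in log space: BF = exp(log m_A - log m_B), so the individual
    marginal likelihoods never need to be represented directly.
    """
    return math.exp(log_ml_a - log_ml_b)

def mean_bayes_factor(log_ml_pairs):
    """Arithmetic mean of per-session Bayes factors (assumed averaging rule)."""
    factors = [bayes_factor(a, b) for a, b in log_ml_pairs]
    return sum(factors) / len(factors)

# Hypothetical (single-trial model, trial type model) log marginal likelihoods
sessions = [(-310.2, -312.0), (-298.7, -299.9), (-305.4, -307.1)]
print(mean_bayes_factor(sessions))
```

Here model A would be the single-trial model and model B the trial type model, so values above one favor the single-trial model.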

An even worse result than the 41 % could have been obtained if one had waited to introduce the time-on-task trial types until making a post hoc check for confounding variables. This is evident from Model 8 in Table 2 where the time-on-task trial type variable is found to be insignificant. At this point a naive experimenter could have concluded incorrectly that time-on-task effects were negligible. Estimating the GLIRT model that comes out of Model 8 with the time-on-task trial type variable removed yields a main effect size of 49 % increase from the low to the high hazard rate condition—an effect size estimate which is 1.75 times higher than our current best estimate of 28 %.

## Discussion

We have presented a general method for the analysis of experimental data through the use of mathematical models that treat measures of potentially confounding factors and manipulated variables as equals on the single-trial level. We have also shown how the method can be applied with models that fall under GLIRT. Specifically, we showed that the special case of TVA that is commonly used in single-item recognition is also a special case of GLIRT, and we presented a thorough reanalysis of a single-letter recognition experiment (Vangkilde et al., 2012, Experiment 3) based on TVA. Our exemplary analysis incorporated a single-trial measure of time-on-task although this variable was neither manipulated nor assumed constant. Formal model selection favored this approach over estimation based on early and late trial types. Qualitatively speaking, the model selection showed that the confounding interaction was gradual rather than reflecting a sudden change from early to late trials. Note that the gradual model is more general than the trial type model: a trivial scalar function of the gradual time-on-task measure (a step function) yields the equivalent of the trial type model, but not the other way round. Naturally, one may try other nonlinear transformations of explanatory variables that go beyond trial types, thereby finding quantitatively better mathematical models of behavior (Cavanagh et al., 2011; Dyrholm et al., 2012). Our method differs from generalist data mining methods (e.g., Hinton & Salakhutdinov, 2006) by predicting through cognitive parameters. The method also differs from cognitive model-based functional neuroimaging (O’Doherty et al., 2007) by having behavioral response predictability as the explicit objective. In situations with limited data, the method should be extended to a mixed/random effects framework.

In summary, we have presented a method for checking the extent to which something measurable has an effect on observed behavioral responses. The method is readily applicable with models that fall under GLIRT by including the potentially confounding measured variables along with the manipulated variables on the single-trial level using standard tests (Mellenbergh, 1994). Our detailed example of this incorporated a measure of time-on-task in a single-letter identification response model. A measure of time-on-task will almost always be available, but a wealth of other measures may also be available depending on the paradigm, including measures of previous stimuli and responses, and physiological measures.

## Footnotes

1. In this article, a "confounder" means a variable that is a source of systematic error because it co-varies with one or more independent variables (the most traditional meaning of the word) or because it modifies the effect of some of the independent variables of interest. Note that confounders are present in almost any study.
2. Or, at least, under a modified version of GLIRT in which the link function may vary between stimuli (cf. stimulus parameter $t$) and subjects (cf. subject parameter $t_0$).

## Notes

### Acknowledgments

This research was supported in part by grants from the Center of Excellence Program of the University of Copenhagen and from the Sapere Aude Program of the Danish Council for Independent Research. The authors would like to thank John Duncan and Gordon D. Logan for useful comments on an earlier version of this manuscript.

### References

- Bundesen, C. (1990). A theory of visual attention. *Psychological Review, 97*, 523–547.
- Bundesen, C., & Habekost, T. (2008). *Principles of visual attention: Linking mind and brain*. New York: Oxford University Press.
- Bundesen, C., & Harms, L. (1999). Single-letter recognition as a function of exposure duration. *Psychological Research, 62*, 275–279.
- Cavanagh, J. F., Wiecki, T. V., Cohen, M. X., Figueroa, C. M., Samanta, J., Sherman, S. J., et al. (2011). Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. *Nature Neuroscience, 14*, 1462–1467. doi:10.1038/nn.2925
- Duncan, J., Bundesen, C., Olson, A., Humphreys, G., Chavda, S., & Shibuya, H. (1999). Systematic analysis of deficits in visual attention. *Journal of Experimental Psychology: General, 128*, 450–478.
- Dyrholm, M., Kyllingsbæk, S., Espeseth, T., & Bundesen, C. (2011). Generalizing parametric models by introducing trial-by-trial parameter variability: The case of TVA. *Journal of Mathematical Psychology, 55*, 416–429.
- Dyrholm, M., Kyllingsbæk, S., Vangkilde, S., Habekost, T., & Bundesen, C. (2011). Single-trial inference on visual attention. *American Institute of Physics Conference Proceedings, 1371*, 37–43.
- Dyrholm, M., Nordfang, M., & Bundesen, C. (2012). Mining the brain with a Theory of Visual Attention. In *Proceedings of the 2nd NIPS Workshop on Machine Learning and Interpretation in Neuroimaging 2012*.
- Gillebert, C. R., Dyrholm, M., Vangkilde, S., Kyllingsbæk, S., Peeters, R., & Vandenberghe, R. (2012). Attentional priorities and access to short-term memory: Parietal interactions. *NeuroImage, 62*, 1551–1562.
- Habekost, T., & Starrfelt, R. (2009). Visual attention capacity: A review of TVA-based patient studies. *Scandinavian Journal of Psychology, 50*, 23–32.
- Hinton, G., & Salakhutdinov, R. (2006). Reducing the dimensionality of data with neural networks. *Science, 313*, 504–507.
- Hung, J., Driver, J., & Walsh, V. (2005). Visual selection and posterior parietal cortex: Effects of repetitive transcranial magnetic stimulation on partial report analyzed by Bundesen’s theory of visual attention. *Journal of Neuroscience, 25*, 9602–9612.
- Hung, J., Driver, J., & Walsh, V. (2011). Visual selection and the human frontal eye fields: Effects of frontal transcranial magnetic stimulation on partial report analyzed by Bundesen’s theory of visual attention. *Journal of Neuroscience, 31*, 15904–15913.
- Knoblauch, K., & Maloney, L. (2012). *Modeling psychophysical data in R*. London: Springer.
- Kyllingsbæk, S. (2006). Modeling visual attention. *Behavior Research Methods, 38*, 123–133.
- Kyllingsbæk, S., & Bundesen, C. (2009). Changing change detection: Improving the reliability of measures of visual short-term memory capacity. *Psychonomic Bulletin & Review, 16*, 1000–1010.
- McCulloch, C. E., & Searle, S. R. (2001). *Generalized, linear, and mixed models*. Toronto: Wiley.
- Mellenbergh, G. J. (1994). Generalized linear item response theory. *Psychological Bulletin, 115*, 300–307.
- O’Doherty, J. P., Hampton, A., & Kim, H. (2007). Model-based fMRI and its application to reward learning and decision making. *Annals of the New York Academy of Sciences, 1104*, 35–53.
- Robertson, I. H., Manly, T., Andrade, J., Baddeley, B. T., & Yiend, J. (1997). “Oops!”: Performance correlates of everyday attentional failures in traumatic brain injured and normal subjects. *Neuropsychologia, 35*(6), 747–758.
- Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. *Journal of Mathematical Psychology, 56*, 356–374.
- Shibuya, H., & Bundesen, C. (1988). Visual selection from multielement displays: Measuring and modeling effects of exposure duration. *Journal of Experimental Psychology: Human Perception and Performance, 14*, 591–600.
- Vangkilde, S., Coull, J. T., & Bundesen, C. (2012). Great expectations: Temporal expectation modulates perceptual processing speed. *Journal of Experimental Psychology: Human Perception and Performance, 38*, 1183–1191.
- Vangkilde, S., Petersen, A., & Bundesen, C. (2013). Temporal expectancy in the context of a theory of visual attention. *Philosophical Transactions of the Royal Society B: Biological Sciences, 368*(1628), 20130054.
- Woodworth, R. S. (1938). *Experimental psychology*. New York: Holt.

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.