Appendix A: Non-Decision Time in the ABDM
In our ABDM, we assume that lapses in attention affect decision processes, decreasing the effective rate of information accumulation. Our conceptualisation of these attentional lapses can be most easily understood by contrasting the random walk approximation of the classic DM with the random walk approximation of our ABDM. We begin by showing that the diffusion process in the classic DM is the weak limit of a random walk in which the walker moves either up or down at each time step. We base our exposition on Schilling and Partzsch (2014).
We discretise space into units of size \(\Delta x\) and consider discrete time steps of size \(\Delta t\) in an interval [0, T]. This means that there are \(N = \lfloor {T}/{\Delta t}\rfloor \) time steps in total, where \(\lfloor \cdot \rfloor \) denotes rounding down to the nearest integer. Let \(\mathscr {E}_n \sim \text {Bern}(p)\), \(n = 1, \ldots , N\) be independent Bernoulli random variables with parameter \(p \in [0,1]\) and define \(X_n = 2\mathscr {E}_n-1\). At each time step n, the walker moves \(X_n \Delta x\) units up or down, depending on the sign of \(X_n\). The random walk process is defined as:
$$\begin{aligned} X(T) = \sum _{n=1}^N X_n \Delta x. \end{aligned}$$
(A1)
The mean and variance of X(T) are:
$$\begin{aligned} \mathbb {E}[X(T)]&= N\Delta x (2p-1)\nonumber \\ \mathbb {V}[X(T)]&= N(\Delta x)^2 4p(1-p). \end{aligned}$$
(A2)
We now note that (i) \(X(0) = 0\). If we let \(T' = M\Delta t\), \(M < N\), we may write \(X(T) = (X(T)-X(T')) + (X(T')-X(0))\). Since the \(\mathscr {E}_n\) are independent, identically distributed random variables, the increments \(X(T)-X(T')\) and \(X(T')-X(0)\) are also (ii) independent and (iii) identically distributed random variables. These are three of the four properties we require for X(T) to approximate a diffusion process as \(N\rightarrow \infty \). The last property we require is that increments must be normally distributed with variance \((T-T')\sigma ^2\), where the constant \(\sigma \) is the diffusion coefficient. To show this, we first note that independence of the increments implies that \(\mathbb {V}[X(T)]=\mathbb {V}[X(T)-X(T')]+\mathbb {V}[X(T')-X(0)]\), that is, the variance of X(T) is linear in T:
$$\begin{aligned} \mathbb {V}[X(T)] = T\sigma ^2. \end{aligned}$$
From Eq. (A2), we conclude that:
$$\begin{aligned} \sigma ^2 = \frac{(\Delta x)^2}{\Delta t}4p(1-p). \end{aligned}$$
We now write:
$$\begin{aligned} X(T) = \frac{X(T) - N\Delta x (2p-1)}{\sqrt{T\sigma ^2}}\sqrt{T\sigma ^2} + N\Delta x (2p-1), \end{aligned}$$
(A3)
and let \(p = {1}/{2} \left( \mu \left( {\Delta t}/{\Delta x}\right) +1 \right) \), where the constant \(\mu \) is the drift rate. Taking \(N\rightarrow \infty \) and applying the central limit theorem to the first term shows that (iv) the limiting process has normally distributed increments with mean \(T \mu \) and variance \(T\sigma ^2\). This shows that the process X(T) approximates a diffusion process.
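The moment calculations above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names and the particular values of \(\mu\), \(\Delta x\) and \(\Delta t\) are illustrative assumptions. It evaluates Eq. (A2) under the stated choice of p and confirms that the mean equals \(T\mu\) and the variance equals \(T\sigma^2\) whenever \(N\Delta t = T\).

```python
import math
import random


def classic_walk_moments(mu, dx, dt, T):
    """Mean and variance of X(T) from Eq. (A2), with p = (mu * dt / dx + 1) / 2.

    Also returns sigma^2 = (dx^2 / dt) * 4p(1 - p), as defined in the text.
    """
    n_steps = math.floor(T / dt)
    p = 0.5 * (mu * dt / dx + 1.0)
    if not 0.0 <= p <= 1.0:
        raise ValueError("need |mu| * dt / dx <= 1 so that p is a probability")
    mean = n_steps * dx * (2.0 * p - 1.0)
    var = n_steps * dx**2 * 4.0 * p * (1.0 - p)
    sigma2 = dx**2 / dt * 4.0 * p * (1.0 - p)
    return mean, var, sigma2


def simulate_classic_walk(mu, dx, dt, T, rng):
    """Draw one realisation of X(T): each step is +dx with probability p, else -dx."""
    n_steps = math.floor(T / dt)
    p = 0.5 * (mu * dt / dx + 1.0)
    return sum(dx if rng.random() < p else -dx for _ in range(n_steps))


def simulated_mean(mu, dx, dt, T, n_walks, seed=1):
    """Monte Carlo estimate of E[X(T)] (illustrative helper, fixed seed)."""
    rng = random.Random(seed)
    draws = [simulate_classic_walk(mu, dx, dt, T, rng) for _ in range(n_walks)]
    return sum(draws) / n_walks
```

Choosing \(\Delta t\) and \(\Delta x\) as powers of two (for example \(\Delta t = 2^{-10}\), \(\Delta x = 2^{-5}\)) keeps \(N\Delta t = T\) exact in floating point, which makes the comparison with \(T\mu\) and \(T\sigma^2\) clean.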
In contrast to the classic DM, the random walk approximation to our ABDM allows the random walker to remain at its current position with a certain probability at each time step. Nevertheless, the weak limit of this random walk is again a diffusion process.
Using the same discretisation of space and time as above, we now consider two sets of independent Bernoulli random variables \(\mathscr {E}_{+,n} \sim \text {Bern}(p_+)\) and \(\mathscr {E}_{-,n} \sim \text {Bern}(p_-)\) with \(n = 1, \ldots , N\) and \(p_{\pm } \in [0,1]\). We now define \(Y_n = \Delta x (\mathscr {E}_{+,n} - \mathscr {E}_{-,n})\) and note that:
$$\begin{aligned} \mathbb {P}(Y_n = \Delta x)&= p_+(1-p_-) \\ \mathbb {P}(Y_n = 0)&= p_+p_-+(1-p_+)(1-p_-) \\ \mathbb {P}(Y_n = -\Delta x)&= p_-(1-p_+). \end{aligned}$$
That is, at each time step n the walker moves \(\Delta x\) units up, \(\Delta x\) units down, or remains at its current position, depending on the value of \(Y_n\). The corresponding random walk process is defined as:
$$\begin{aligned} Y(T) = \sum _{n=1}^N Y_n. \end{aligned}$$
(A4)
The mean and variance of Y(T) are:
$$\begin{aligned} \mathbb {E}[Y(T)]&= N\Delta x (p_{+}-p_{-})\nonumber \\ \mathbb {V}[Y(T)]&= N(\Delta x)^2 \left( p_{+}+p_{-}-(p_{+}^2+p_{-}^2)\right) . \end{aligned}$$
(A5)
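The step probabilities and Eq. (A5) can be cross-checked by computing the moments directly from the three-point distribution of \(Y_n\). This is a small sketch with hypothetical function names and example values; it only restates the formulas above.

```python
def lazy_step_distribution(p_plus, p_minus, dx):
    """Distribution of one step Y_n = dx * (E_plus - E_minus) for independent Bernoullis."""
    return {
        dx: p_plus * (1.0 - p_minus),
        0.0: p_plus * p_minus + (1.0 - p_plus) * (1.0 - p_minus),
        -dx: p_minus * (1.0 - p_plus),
    }


def moments_from_distribution(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(y * pr for y, pr in dist.items())
    var = sum((y - mean) ** 2 * pr for y, pr in dist.items())
    return mean, var


def lazy_walk_moments(p_plus, p_minus, dx, n_steps):
    """Mean and variance of Y(T) as given in Eq. (A5)."""
    mean = n_steps * dx * (p_plus - p_minus)
    var = n_steps * dx**2 * (p_plus + p_minus - (p_plus**2 + p_minus**2))
    return mean, var
```

Because the steps are independent and identically distributed, multiplying the single-step mean and variance by N should reproduce Eq. (A5).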
Clearly, (i) \(Y(0) = 0\), and the independence of \(\mathscr {E}_{+,n}\) and \(\mathscr {E}_{-,n}\) across time steps means that Y(T) has (ii) independent and (iii) identically distributed increments. An argument similar to the one above furthermore shows that the variance of Y(T) is linear in T and:
$$\begin{aligned} \sigma ^2 = \frac{(\Delta x)^2}{\Delta t} \left( p_++p_--(p_+^2+p_-^2)\right) . \end{aligned}$$
We now write:
$$\begin{aligned} Y(T) = \frac{Y(T) - N\Delta x (p_+-p_-)}{\sqrt{T\sigma ^2}}\sqrt{T\sigma ^2} + N\Delta x (p_+-p_-), \end{aligned}$$
(A6)
and let \(p_{\pm } = {1}/{2}\left( (1-p_0)\pm \mu \left( {\Delta t}/{\Delta x}\right) \right) \), where the constant \(\mu \) is the drift rate and \(p_0=p_+p_-+(1-p_+)(1-p_-)\) is the probability that the random walker does not move at some time step. Taking \(N\rightarrow \infty \) and applying the central limit theorem to the first term shows that (iv) the limiting process has normally distributed increments with mean \(T\mu \) and variance \(T\sigma ^2\). This shows that the process Y(T) approximates a diffusion process. Moreover, the expression for the expectation in Eq. (A5) shows that information is accumulated at a lower rate if non-decision time is larger (i.e. for larger \(p_0\)).
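The substitution behind the stated mean can be written out explicitly. As a worked step, and treating \(p_0\) as a fixed constant in the parametrisation of \(p_{\pm }\) (an assumption made here only for illustration), we have \(p_+-p_- = \mu \left( {\Delta t}/{\Delta x}\right) \) and hence:
$$\begin{aligned} \mathbb {E}[Y(T)] = N\Delta x (p_+-p_-) = N\Delta x \, \mu \frac{\Delta t}{\Delta x} = N\Delta t \, \mu , \end{aligned}$$
which tends to \(T\mu \) because \(N\Delta t \rightarrow T\) as \(\Delta t \rightarrow 0\). Requiring \(p_{\pm } \ge 0\) in the same parametrisation gives:
$$\begin{aligned} |\mu | \le (1-p_0)\frac{\Delta x}{\Delta t}, \end{aligned}$$
so, for a given discretisation, the range of drift rates that the walk can represent shrinks as \(p_0\) grows.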
Appendix B: Relationship Between the Attention-Based and Classic Diffusion Model
Here, we explore how the parameters of the classic DM are related to the parameters of our ABDM. From the expression for the mean RT and the probability of choosing option A given in Eqs. (2) and (3), we can derive the corresponding expressions in our ABDM for the mean RT, \(M_{RT~ABDM}\):
$$\begin{aligned} M_{RT~ABDM} = \mathbb {E}(t) = \frac{s_{ABDM}}{2v_{ABDM}c} \frac{1-\exp {\left( -\frac{v_{ABDM}}{cs_{ABDM}}\right) }}{1 + \exp {\left( -\frac{v_{ABDM}}{cs_{ABDM}}\right) }}, \end{aligned}$$
(B1)
and for the probability of choosing option A, \(P_{A~ABDM}\):
$$\begin{aligned} P_{A~ABDM} = \mathbb {P}(x=1) = \frac{1}{1+\exp {\left( -\frac{v_{ABDM}}{cs_{ABDM}}\right) }}. \end{aligned}$$
(B2)
These latter expressions provide a simple way to study the relationship between our ABDM and the classic DM. Given a fixed value for c, say \(c=1/8\), the ABDM includes only two free parameters, which, for the purpose of this comparison, are denoted \(v_{ABDM}\) and \(s_{ABDM}\). As can be seen from Eqs. (B1) and (B2), in the absence of further constraints on the model parameters, \(M_{RT~ABDM}\) and \(P_{A~ABDM}\) depend on \(v_{ABDM}\) and \(s_{ABDM}\) only through their ratio, so the two parameters are not separately identifiable. For our comparison with the classic DM, we will, therefore, assume that \(s_{ABDM}=1\). Similarly, for this comparison we denote the parameters of the DM as \(v_{DM}\), \(s_{DM}\), \(z_{DM}\), \(a_{DM}\), and \(t_{0_{DM}}\). We adopt the usual assumption that the diffusion coefficient \(s_{DM}=1\), and we furthermore assume that the decision process is unbiased, that is, \(z_{DM}=\frac{a_{DM}}{2}\) (this corresponds to the EZ-diffusion model; Wagenmakers et al., 2007). Thus, the DM has three free parameters, \(v_{DM}\), \(a_{DM}\), and \(t_{0_{DM}}\). The resulting expression in the DM for the mean response time, \(M_{RT~DM}\), is:
$$\begin{aligned} M_{RT~DM} = \mathbb {E}(t) = t_{0_{DM}}+\frac{a_{DM}}{2v_{DM}} \frac{1-\exp {\left( -v_{DM}a_{DM}\right) }}{1 + \exp {\left( -v_{DM}a_{DM}\right) }}, \end{aligned}$$
(B3)
and the expression for the probability of choosing option A, \(P_{A~DM}\), is:
$$\begin{aligned} P_{A~DM} = \mathbb {P}(x=1) = \frac{1}{1+\exp {\left( -v_{DM}a_{DM}\right) }}. \end{aligned}$$
(B4)
We can now study the relationship between the two models by setting \(M_{RT}=M_{RT~ABDM}=M_{RT~DM}\) to a fixed value, and numerically solving Eq. (B1) for \(v_{ABDM}\) and numerically solving Eq. (B3) for \(v_{DM}\) whilst \(a_{DM}\) and \(t_{0_{DM}}\) are held constant at different values.
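The numerical solution described here can be sketched as follows. This is an illustrative implementation only: the source does not state which root-finding method was used, so bisection is an assumption, as are the function names and the search interval. It relies on the mean RT in Eqs. (B1) and (B3) being decreasing in the drift rate for positive drift rates.

```python
import math


def ratio_term(x):
    """(1 - exp(-x)) / (1 + exp(-x)), using expm1 for accuracy at small x."""
    return -math.expm1(-x) / (1.0 + math.exp(-x))


def mean_rt_abdm(v, c=1.0 / 8.0, s=1.0):
    """Eq. (B1): mean RT in the ABDM."""
    return s / (2.0 * v * c) * ratio_term(v / (c * s))


def mean_rt_dm(v, a, t0=0.0):
    """Eq. (B3): mean RT in the DM with s_DM = 1 and z_DM = a_DM / 2."""
    return t0 + a / (2.0 * v) * ratio_term(v * a)


def solve_drift(mean_rt, target, lo=1e-6, hi=50.0, iters=200):
    """Bisection for the drift rate v such that mean_rt(v) equals target.

    Assumes mean_rt is decreasing in v on [lo, hi] (search interval is illustrative).
    """
    if not (mean_rt(hi) < target < mean_rt(lo)):
        raise ValueError("target mean RT is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_rt(mid) > target:
            lo = mid  # mean RT still too long: a larger drift rate is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `solve_drift(mean_rt_abdm, 6.0)` gives the ABDM drift rate for \(M_{RT}=6\) s at \(c=1/8\), and `solve_drift(lambda v: mean_rt_dm(v, a=12.0), 6.0)` gives the corresponding DM drift rate for \(a_{DM}=12\) and \(t_{0_{DM}}=0\) s.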
Figure 8 shows the relationship between the DM parameters (x-axis) and the ABDM parameters (y-axis). The grey curves show how \(v_{ABDM}\) relates to \(v_{DM}\) as \(M_{RT}\) decreases from 15 s (bottom left on each curve) to \(t_{0_{DM}}\) s (top right on each curve). Each grey curve shows the relationship between \(v_{ABDM}\) and \(v_{DM}\) for a different fixed value of \(t_{0_{DM}}\). Black dots show the values of \(v_{ABDM}\) and \(v_{DM}\) that produce a mean response time of \(M_{RT}=6\) s. The left panel shows the relationship when \(a_{DM}=8\); the right panel shows the relationship when \(a_{DM}=12\). As can be seen, the qualitative pattern is the same for both values of \(a_{DM}\): \(v_{ABDM}\) and \(v_{DM}\) increase monotonically as \(M_{RT}\) decreases from left to right along any of the grey curves. Higher values of boundary separation in the DM merely scale up the x-axis, which means that larger values of \(v_{DM}\) are required to match the \(M_{RT}\) produced by the ABDM. For instance, if \(a_{DM}=8\) and \(t_{0_{DM}}=0\) s (leftmost straight line in the left panel), a mean response time of \(M_{RT}=6\) s is produced by the same drift rate of \(v_{DM}=v_{ABDM}=0.66\) in the DM and in the ABDM. If \(a_{DM}=12\) and \(t_{0_{DM}}=0\) s (leftmost straight line in the right panel), a mean response time of \(M_{RT}=6\) s is produced by a drift rate of \(v_{DM}=1.00\) in the DM but only requires a drift rate of \(v_{ABDM}=0.66\) in the ABDM.
The effect of \(t_{0_{DM}}\) for a fixed value of \(a_{DM}\) is to modulate the slope and curvature of the grey curves. For larger values of \(t_{0_{DM}}\), the slope of the grey curves becomes shallower and small mean response times are produced by much smaller values of \(v_{ABDM}\) compared to \(v_{DM}\). For instance, if \(a_{DM}=8\) (left panel) and \(t_{0_{DM}}=0\) s, a mean response time of \(M_{RT}=6\) s is produced by \(v_{ABDM}=v_{DM}=0.66\). As the non-decision time increases to \(t_{0_{DM}}=5\) s, the DM requires \(v_{DM}=4\) to produce a mean response time of \(M_{RT}=6\) s, whilst the ABDM produces the same mean response time with \(v_{ABDM}=0.66\).
Taken together, when the non-decision time is close to 0 s, the drift rate parameters in both models are approximately linearly related. Hence, for small non-decision times the drift rate parameter in the ABDM can be interpreted analogously to the drift rate parameter in the classic DM. For higher values of boundary separation in the classic DM, the drift rate in the classic DM corresponds to a lower ratio of boundary separation and diffusion coefficient c in the ABDM. For larger values of non-decision time, the relation between the drift rate parameters in both models becomes increasingly nonlinear as the drift rate parameter in the ABDM accounts for the combined effects of non-decision and decision processes. However, as already observed in our simulation studies, the relationship between the drift rate parameters in both models remains monotonic, which means that comparisons based on the ordering of different drift rate values in the two models remain valid.
Appendix C: Analysis of Data Set II From Molenaar (2015)
Data set II from Molenaar (2015) consists of RT and accuracy data of 121 test takers answering ten items of a mental rotation test. Each item had a response deadline of 7.5 s (Borst et al., 2011). Twenty-six persons failed to complete all items, and we excluded these persons’ data from our analysis, leaving 95 persons.
Figure 9 shows the marginal RT quantiles (0, 0.2, 0.4, 0.6, 0.8, and 1.0) for each of the ten items and for each of the 95 persons, ordered by mean RT. As can be seen, marginal RT distributions for the items (left plot) had a relatively shallow leading edge and a typical long tail. Compared to Van Rijn and Ali’s (2017) data, the minimum RTs in this data set were longer relative to the median RT; the ratio of the minimum observed RT to the median RT ranged between 0.25 and 0.45, with a mean of 0.35. Hence, item-specific additive non-decision components might have a more pronounced effect on the RT data.
Marginal RT distributions for the persons (right plot) again had a shallow leading edge and exhibited a typical long tail. However, due to the small number of items, the exact shape of the marginal distributions is insufficiently constrained by the data. The ratio of the minimum observed RT to the median RT ranged between 0.28 and 0.93, with a mean of 0.58. This again means that person-specific additive non-decision components might significantly affect the RT data.
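The ratio of minimum to median RT used in these descriptions is simple to compute per item and per person. The snippet below is a sketch on made-up numbers; the data layout (one row per person, one column per item) and the function names are assumptions, and the toy values are not from the data set.

```python
from statistics import median


def min_to_median_ratio(rts):
    """Ratio of the smallest RT to the median RT in a collection of RTs."""
    rts = list(rts)
    return min(rts) / median(rts)


def marginal_ratios(rt_matrix):
    """Per-person (row) and per-item (column) min-to-median RT ratios."""
    per_person = [min_to_median_ratio(row) for row in rt_matrix]
    per_item = [min_to_median_ratio(col) for col in zip(*rt_matrix)]
    return per_person, per_item


# Hypothetical RTs in seconds: 3 persons (rows) by 3 items (columns).
toy_rts = [
    [2.0, 3.0, 4.0],
    [1.0, 2.0, 6.0],
    [3.0, 5.0, 7.0],
]
```

A ratio close to 1 means that even the fastest response is not much quicker than a typical one, which is the pattern that would be expected if an additive non-decision component makes up a large share of the RT.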
We compared the relative and absolute fit of our ABDM to Van der Linden’s (2009) hierarchical model and Van der Maas et al.’s (2011) Q-diffusion model in the same way as in the main text. For the comparison with the hierarchical model, we again used the golden section algorithm to estimate the boundary-to-volatility ratio c with the same termination criteria as before. We subsequently obtained 12,000 posterior samples from 3 MCMC chains for both models and discarded the first 2,000 samples as burn-in. All chains showed good convergence, with \(\hat{R} < 1.01\) for both models.
We assessed relative model fit by means of AIC, BIC and DIC. Due to the long computing times required for the evaluation of the full ABDM likelihood function, we based the computation of the model selection criteria on the first 1,000 posterior samples after burn-in from a single MCMC chain. Table 4 shows the relative fit of the two models. As can be seen, the log-likelihood was smaller for the ABDM. AIC and BIC indicated that the hierarchical model fitted the data better than the ABDM, whereas DIC preferred the ABDM over the hierarchical model.
Table 4 Relative model fits.
We assessed the absolute model fit by means of the posterior predictive mean RTs and accuracies for persons and items. We generated 500 posterior predictive samples for each item and person. Figure 10 shows the comparison of the data and the posterior predictive samples generated by the two models, ordered by the mean RT in the data. As can be seen, the ABDM generated wider posterior predictive intervals for RTs (left column in the left panel) than the hierarchical model (left column in the right panel). Whereas the ABDM systematically underpredicted the mean item RTs (top left plot in the left panel), the hierarchical model’s posterior predictives matched the mean item RTs relatively closely (top left plot in the right panel). Both models’ posterior predictives covered the mean person RTs well (bottom left plot in both panels). The results for the mean accuracies showed the opposite pattern. Whereas the ABDM matched the item and person accuracies well (right column in the left panel), the hierarchical model underpredicted both the mean item and person accuracies (right column in the right panel). Taken together, the hierarchical model provided a better fit to the RT data, whereas the ABDM provided a better fit to the accuracy data.
We compared our ABDM to the Q-diffusion model based on maximum-likelihood fits. We again used the estimate of the boundary-to-volatility ratio c from the Bayesian implementation of the ABDM to avoid the formidable computational costs associated with evaluating the complete ABDM likelihood.
We assessed relative model fit by means of AIC and BIC. Table 5 shows the relative fit of the two models. As can be seen, the log-likelihood was smaller for the ABDM than for the Q-diffusion model. AIC preferred the Q-diffusion model over the ABDM, whereas BIC preferred the ABDM over the Q-diffusion model. This difference between the two criteria is due to the higher penalty for model complexity imposed by BIC, which aims to select simpler models at small sample sizes. These results thus suggest that the ABDM provided a good account of the data at such a small sample size, despite being much simpler than the Q-diffusion model.
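How AIC and BIC can disagree is easy to see from their standard definitions, \(\text {AIC} = 2k - 2\log L\) and \(\text {BIC} = k\log n - 2\log L\), where k is the number of free parameters and n the number of observations. The sketch below uses entirely hypothetical log-likelihoods, parameter counts and sample size; none of these numbers are taken from Table 5.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion; smaller values indicate the preferred model."""
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion; smaller values indicate the preferred model."""
    return k * math.log(n) - 2.0 * log_lik


# Hypothetical values for illustration only (not results from the paper).
n_obs = 950  # assumed number of observations
simple_model = {"log_lik": -1050.0, "k": 20}
complex_model = {"log_lik": -1000.0, "k": 60}

aic_prefers_complex = aic(**complex_model) < aic(**simple_model)
bic_prefers_simple = bic(n=n_obs, **simple_model) < bic(n=n_obs, **complex_model)
```

With these made-up numbers the more complex model has the higher log-likelihood and the lower AIC, but its extra parameters cost \(\log n\) each under BIC rather than 2, so BIC favours the simpler model.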
Table 5 Relative model fits.
Table 6 Relative model fits.
We used fivefold cross-validation to assess the absolute model fit. Firstly, we divided the data into five folds of 19 persons each, removed one fold from the data and fitted both models to the remaining data (person-based fit). Secondly, we divided the data into five folds of two items each, removed one fold from the data and fitted both models to the remaining data (item-based fit). We used the results of the person-based fit to compute the item parameters for the item fold removed in the item-based fit, and we used the results of the item-based fit to compute the person parameters for the person fold removed in the person-based fit. We subsequently combined the person and item parameters obtained in the two steps and used expressions (2) and (3) to predict RTs and accuracies for the person-by-item combinations contained in the removed folds. This was repeated for all 25 person-by-item folds.
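The fold structure described above can be sketched as follows. The contiguous assignment of persons and items to folds is an assumption made for illustration (the source does not say how fold membership was chosen), and model fitting itself is omitted.

```python
from itertools import product


def make_folds(ids, n_folds):
    """Split ids into n_folds equally sized, contiguous folds (illustrative choice)."""
    ids = list(ids)
    if len(ids) % n_folds != 0:
        raise ValueError("number of ids must be divisible by n_folds")
    size = len(ids) // n_folds
    return [ids[i * size:(i + 1) * size] for i in range(n_folds)]


person_folds = make_folds(range(95), 5)  # five folds of 19 persons
item_folds = make_folds(range(10), 5)    # five folds of two items

# Each person fold paired with each item fold gives one held-out block of
# person-by-item cells whose RTs and accuracies are to be predicted.
held_out_blocks = [
    [(person, item) for person in p_fold for item in i_fold]
    for p_fold, i_fold in product(person_folds, item_folds)
]
```

Each of the 25 held-out blocks contains 19 × 2 = 38 person-by-item cells, and together the blocks cover every cell of the 95 × 10 data matrix exactly once.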
Figure 11 shows the deviation between observed and predicted RTs (top row of plots) and accuracies (bottom row of plots) for the ABDM (left column of plots) and the Q-diffusion model (right column of plots). Each column of dots shows the results for the 19 persons in one person fold, each cluster of five columns of dots shows the results for all five person folds on one item, and each group of two items shows the results for all five person folds on one item fold. As can be seen, the ABDM tended to slightly overpredict RTs and produced more extreme outliers than the Q-diffusion model. For the Q-diffusion model, prediction errors clustered symmetrically around 0 with fewer outliers.
The results for the predicted accuracies show the opposite pattern to that for the response times. For the ABDM, prediction errors clustered closely around 0 with a few negative outliers. For the Q-diffusion model, prediction errors were larger and the model tended to underpredict accuracy.
Taken together, the comparison of the relative fits was ambiguous in that some criteria preferred the ABDM, whilst others preferred the hierarchical model or the Q-diffusion model. In terms of absolute fit, the ABDM tended to capture the accuracy data better than its competitor models, whilst the hierarchical model and the Q-diffusion model provided a better account of the RT data. A possible explanation for the worse performance of the ABDM on the RT data is the very small number of items, which might have compromised the estimation of the person parameters. This explanation is also supported by the fact that the posterior predictive ranges for the person RTs and accuracies in Fig. 10 are much wider than the posterior predictive ranges for the item RTs and accuracies.
Appendix D: Further Cross-Validation Results for the Comparison of the ABDM and the Q-Diffusion Model
The comparison of the relative model fit by means of AIC and BIC is shown in Table 6. As reported in the main text, the log-likelihood was smaller for the ABDM than for the Q-diffusion model. Whilst AIC preferred the Q-diffusion model over the ABDM, BIC preferred the ABDM over the Q-diffusion model. This difference between the two criteria is due to the higher penalty for model complexity imposed by BIC.
The comparison of the absolute fit is presented in Fig. 12. As can be seen, the prediction errors for RTs for the ABDM clustered symmetrically around 0, with a few large positive outliers. This means that the ABDM predicted RTs were largely unbiased, except for very long RTs, in which case the ABDM tended to underpredict observed RTs. For the Q-diffusion model, prediction errors also clustered symmetrically around 0. The spread of the prediction error for the Q-diffusion model was comparable to that for the ABDM. Finally, the Q-diffusion model showed a similar tendency to underpredict long RTs as the ABDM.
The results for the predicted accuracies are again similar between the two models. For the ABDM, prediction errors tend to be positive, which indicates a tendency to underpredict accuracies. Nevertheless, most prediction errors are close to 0, which means that the ABDM generally captured the probability of a correct response well. The Q-diffusion model showed a similar tendency to underpredict accuracies but, in general, predicted the probability of a correct response well, with most prediction errors being slightly larger than 0. The spread of the prediction errors is also comparable to that observed for the ABDM. Taken together, both models provided a comparable absolute fit to the data.