Modeling across-trial variability in the Wald drift rate parameter

The shifted-Wald model is a popular analysis tool for one-choice reaction-time tasks. In its simplest version, the shifted-Wald model assumes a constant trial-independent drift rate parameter. However, the presence of endogenous processes—fluctuation in attention and motivation, fatigue and boredom—suggest that drift rate might vary across experimental trials. Here we show how across-trial variability in drift rate can be accounted for by assuming a trial-specific drift rate parameter that is governed by a positive-valued distribution. We consider two candidate distributions: the truncated normal distribution and the gamma distribution. For the resulting distributions of first-arrival times, we derive analytical and sampling-based solutions, and implement the models in a Bayesian framework. Recovery studies and an application to a data set comprised of 1469 participants suggest that (1) both mixture distributions yield similar results; (2) all model parameters can be recovered accurately except for the drift variance parameter; (3) despite poor recovery, the presence of the drift variance parameter facilitates accurate recovery of the remaining parameters; (4) shift, threshold, and drift mean parameters are correlated.

Gaussian distribution) which represents the density of the first-arrival times of a Wiener diffusion process toward a single absorbing boundary. The basic model has three parameters that correspond closely to the three parameters of the Ratcliff diffusion model (Ratcliff, 1978;Forstmann et al., 2016;Ratcliff et al., 2016): (1) the decision threshold α, that is, the distance from the starting point (which we arbitrarily set at 0) to the absorbing barrier; (2) the drift rate ξ of the diffusion process; and (3) the shift parameter θ that quantifies the time required for nondecision processes (see below for a more detailed description of the model).
The vanilla SW model assumes a constant drift rate across trials. However, the presence of endogenous processes, such as fluctuation in attention and motivation, fatigue and boredom, suggest that drift rate might vary across trials (Ratcliff & Strayer, 2014;Ratcliff & Tuerlinckx, 2002;Ratcliff & Van Dongen, 2011;Ratcliff & Van Zandt, 1999): "Parameters may also change from day to day or from one block of trials to the next. Evidence of such drift can be found in the variability of block mean reaction times" (p. 121; Burbeck & Luce, 1982). In order to incorporate across-trial variability in the Wald drift rate parameter ξ , the standard approach is to assume that its value on any specific trial is governed by a certain positivevalued distribution h with parameters γ , that is, ξ ∼ h(γ ). Subsequently, in order to derive the distribution of the first-arrival times t assuming a trial-dependent drift parameter, one has to integrate out drift rate, that is, f (t; α, θ, γ ) = ∞ 0 SW(t; α, θ, ξ ) · h(ξ ; γ )dξ .
How to choose an appropriate distribution h(ξ ; γ ) for the drift rate parameter? Ideally, such a distribution should be (a) plausible (e.g., allowing only positive drift rates); and (b) allow for a closed-form expression f (t; α, θ, γ ).
Here we explore two candidate distributions for h(ξ ; γ ): the truncated normal distribution and the gamma distribution. This general approach to incorporate across-trial variability in the Wald drift rate parameter has been explored in earlier work. Specifically, Desmond and Yang (2011) showed that a closed-form expression for f (t; α, θ, γ ) exists when drift rate follows a truncated normal distribution, and when the SW model has decision threshold α fixed to one and shift parameter θ fixed to zero (see Weiss, 2012, for an application). This scenario deviates from ours since we want threshold and shift parameters to be free, but the general approach is the same (see Whitmore, 1986, for a scenario that deviates even more from ours, but that employs the same general approach). Logan et al. (2014) also use this general approach, but their goal was to incorporate across-trial variability in the threshold parameter.
In this article, we derive the distribution of the firstarrival times for the SW model assuming a trial-dependent drift rate parameter. We also explore an alternative, sampling-based approach. We present Bayesian implementations of both approaches. The adequacy of these implementations is confirmed in a series of parameter recovery studies. Finally, the practicality of our methodology is demonstrated by fitting the extended SW models to a data set comprised of 1469 participants (Woods et al., 2015).

The shifted-Wald model
The shifted-Wald model gives the density of the firstarrival times of a Wiener diffusion process toward a single absorbing boundary. Figure 1 shows how the model conceptualizes the decision-making process as a single-boundary diffusion process. In particular, the model assumes that evidence is accumulated with a drift rate ξ until an absorbing barrier α is reached. Additional delay time is captured by θ -the time required for nondecision processes.
The density of the first-arrival times is known as the Wald distribution or the inverse Gaussian distribution (Anders et al., in press;Donkin et al., 2009;Wald, 1947), and it is given by: with t the first-arrival time. The Wald distribution has a positively skewed unimodal shape as shown in the top of Fig. 1. The extension of the SW model that we consider here assumes across-trial variability in the Wald drift rate parameter either according to a truncated normal distribution or a gamma distribution.

Across-trial variability in the Wald drift rate parameter
Across-trial variability in Wald drift rate parameter ξ can be incorporated by assuming that drift rate follows a positive-valued distribution, that is, ξ ∼ h(γ ). Two candidate distributions for h(ξ ; γ ) are the truncated normal distribution and the gamma distribution. In order to implement the resulting distribution of the first-arrival times, we use two different approaches. Firstly, we derive the analytical solution for the distribution of the first-arrival times by integrating out drift rate, that is, f (t; α, θ, γ ) = ∞ 0 SW(t; α, θ, ξ ) · h(ξ ; γ )dξ . Secondly, we explain a sampling-based implementation, that is, on each trial, we assume that RT follows a SW distribution that depends in part on a trial-dependent drift rate that is itself a draw from a positive-valued distribution.

SW-TN mixture
The first candidate distribution for across-trial variability in the Wald drift rate parameter ξ is the truncated normal (TN) distribution. The TN distribution is a normal distribution that can be bounded below and above (Robert, 1995). Since we are interested in a positive-valued distribution on drift rate, we consider here a TN distribution that is bounded below by zero and unbounded above. The probability density function of the drift rate ξ is then given by: with μ ξ ∈ R and σ ξ > 0. The free parameters are the location parameter μ ξ ∈ R, and the scale parameter σ ξ ∈ R + . Parameters a and b are the lower and upper bounds of the TN distribution, respectively, and φ(·) is the probability density function of the standard normal distribution. Thus, φ(x) = 1 √ 2π exp − 1 2 x 2 , and in The analytical solution for the distribution of the firstarrival times can be derived by "completing the square" and some algebraic manipulations. The result is what we will term the SW-TN mixture: 1 To the best of our knowledge, the distribution given in Eq. 3 has not been derived before; however, the method for obtaining the result followed the approach by Desmond & Yang (2011;see also Logan et al., 2014;and Whitmore, 1986). In contrast to Desmond and Yang, we let the decision threshold and shift parameter free to vary.

SW-GAM mixture
A disadvantage of the TN distribution is that it restricts the values of drift rate ξ to a specified interval [a, b] artificially and abruptly. An alternative candidate distribution that is naturally restricted to positive values is the gamma distribution. For this reason, it has been argued that the gamma distribution is conceptually more attractive (Terry et al., 2015). The probability density function of the gamma 1 A more complete derivation is available on the Open Science Framework (OSF; http://osf.io/av4qn). distribution parameterized by the shape parameter κ ∈ R >0 and the rate parameter τ ∈ R >0 is given by: with κ > 0, θ > 0 and where (κ) is the gamma function evaluated at κ. Analogous to the SW-TN mixture, we assume that RT follows a SW distribution with a drift rate parameter that varies across trials according to a gamma distribution. The next step is to integrate out drift rate resulting in the analytical solution of f (t; α, θ, κ, τ ) = ∞ 0 SW(t; α, θ, ξ ) · GAM(ξ ; κ, τ )dξ . To obtain this solution, we tried standard integration techniques, and also used the computer software Maple and Mathematica (Maple, 2015;Wolfram Research, 2010). The main analytical solution is displayed in Fig. 12 of the appendix. 2 It is evident that the analytical solution is extremely complicated, that is, lengthy and containing the Laguerre polynomial (chapter 3 in Bayin, 2006;Koepf, 1997; more details can be found in the appendix).
For the practical implementation of the analytical solution, we could use the lengthy equation shown in Fig. 12. However, it appears more insightful and efficient to use a for-loop implementation in a probabilistic programming language. This means that on each trial, we assume that RT is a draw from a SW distribution that depends in part on a trial-dependent drift rate parametera parameter that is itself a draw from a gamma distribution with shape parameter κ and rate parameter τ . We refer to this solution as the SW-GAM mixture.
Note that the mean of the gamma distribution is given by μ ξ = κ/τ and the variance by σ 2 ξ = κ/τ 2 . This is equivalent to κ = μ 2 ξ /σ 2 ξ and τ = μ ξ /σ 2 ξ . The advantage of expressing the parameters of the gamma distribution in terms of the mean and the variance is that this allows us to easier compare results of the SW-TN and SW-GAM mixture.

Bayesian implementation of the two mixtures
In order to be able to apply the models to data and draw inferences about their parameters, we implemented the SW-TN and SW-GAM mixtures in a Bayesian framework using the software JAGS (Plummer, 2003; for related Bayesian work see Banerjee & Bhattacharyya, 1979;Betro & Rotondi, 1991). In the case of the SW-TN mixture, we added the analytical solution (Eq. 3) to JAGS as described in the tutorial by Wabersich and Vandekerckhove (2013). For the implementation of the SW-GAM mixture, we added the constant-drift SW distribution (Eq. 1) to JAGS-again following the tutorial by Wabersich and Vandekerckhove (2013)-and then used a for-loop construction (the model file is presented in the Appendix). For both mixtures, the prior distributions for the model parameters were inspired by Jepma et al. (2012;second experiment). These prior distributions are uninformative within a range that is plausible for data from one-choice RT tasks. In particular, we chose the following prior distributions: ). For all model applications reported in this article, we used random starting values, and simultaneously ran three Markov chain Monte Carlo (MCMC) chains. To assess whether the MCMC chains of all parameters had converged successfully from their starting values to their stationary distributions, we visually inspected the chains. In addition, we used theR statistic (Gelman & Rubin, 1992), a formal diagnostic measure of convergence that compares the between-chain variability to the within-chain variability. As a rule of thumb, values ofR close to 1.0 indicate adequate convergence of the chains from their starting values to their stationary distributions, whereas values greater than 1.1 indicate inadequate convergence. All relevant code is available on GitHub (https://github.com/HelenSteingroever/ jags-wald/releases).

Methods
Before fitting the SW-TN and SW-GAM mixtures to real data, it is important to confirm that the two models can accurately recover parameters . An informative parameter recovery study uses a representative number of synthetic participants, a representative number of trials, and representative parameter values. We obtained these representative values by considering the performance of 1469 participants who each contributed 120 trials (Woods et al., 2015; see below for more details on the data set). In our simulation study, we use two generated data sets, one containing the-what we will call in the remainder-SW-TN synthetic participants, and the other containing the SW-GAM synthetic participants. We generated 30 synthetic SW-TN participants and 30 synthetic SW-GAM participants each contributing 120 trials (as in Woods et al., 2015). These synthetic participants were generated with representative parameter values that were obtained as follows: First, we selected 30 participants from Woods' data set who spanned a wide range of task performance. Specifically, we chose the 30 participants who corresponded to the 30 quantiles of the mean RT of all participants in the data set. Second, we fit the SW-TN and SW-GAM mixtures to the data of the 30 selected participants. To fit the data, we used the Bayesian framework outlined in the last section. For each participant, we collected 4000 samples of each chain after discarding the first 1000 samples as burn-in. Whenever this resulted in inadequate convergence of the chains (i.e.,R values larger than 1.05), we fit the data again with 1000 additional samples. We continued this process until allR values for a given subject were below 1.05. Third, for each participant we determined the mode of the posterior distribution of each model parameter. The modes that were obtained from fitting the SW-TN mixture were used to generate 30 synthetic participants with the SW-TN mixture (i.e., the synthetic SW-TN participants). We used the analogous procedure to obtain 30 synthetic SW-GAM participants, that is, we used the SW-GAM model as data-generating model, and as data-generating values we used the modes that were obtained from fitting the SW-GAM model to the data of the 30 participants selected from the data set of Woods et al. (2015).
After having generated the two synthetic data sets each containing 30 participants each contributing 120 trials, we fit the SW-TN mixture to the 30 synthetic SW-TN participants and the SW-GAM mixture to the 30 synthetic SW-GAM participants. We collected 2000 samples of each chain after discarding the first 1000 samples of each chain as burn-in. Whenever this resulted in aR values larger than 1.05, we fit the data again with 1000 additional samples. We continued this process until allR values of a given synthetic participant were below 1.05. Finally, we compared whether the modes of the posterior distributions of the synthetic participants correspond to the data-generating values, and we considered the interquartile ranges of the posterior distributions to assess the uncertainty about the parameter values.
In addition, we extended the recovery study in two ways. First, we repeated the recovery study using 1200 trials instead of 120 trials. More precisely, we fit the SW-TN mixture to 30 SW-TN participants each contributing 1200 trials, and analogously for the SW-GAM mixture. Second, we investigated the impact of the drift variance parameter σ 2 ξ . To this aim, we used the four synthetic data sets of the first two recovery studies, but this time the models were fit with drift variance parameter σ 2 ξ fixed to zero. These models thus assume, falsely, that there is no across-trial variability in the Wald drift rate parameter. Thus, in total we conducted eight model fitting exercises; for each mixture we had two synthetic data sets that differed in whether 120 or 1200 trials were generated, and for each synthetic data set, we fit the full and the restricted model (e.g., drift variance parameter σ 2 ξ fixed to zero).

Results
Visual inspection of all chains and confirmation that all parameters hadR values below 1.05 suggested that the collected samples of all four recovery studies provided a valid approximation to the joint posterior parameter distributions (i.e., adequate convergence of the chains from their starting values to their stationary distributions).
Recovery study 1: 30 synthetic participants each contributing 120 trials Figure 2 shows the correlation between the data-generating values (x-axis) and the modes of the posterior distributions (y-axis) obtained from fitting the SW-TN and SW-GAM mixtures to the data of 30 synthetic SW-TN and SW-GAM participants, respectively, each contributing 120 trials. The error bars represent the interquartile range of the posterior distributions. Under perfect parameter recovery, all dots would lie on the main diagonal, and all error bars would be narrow, indicating little uncertainty about the recovered parameter values. From the figure it is evident that all parameters except for the drift variance parameter can be recovered accurately; overall, there is not much discrepancy between the data-generating values and the modes of the posterior distributions, and only little uncertainty about the true parameter values. However, from Fig. 2 it is also evident that, in the case of the drift variance parameter, many dots strongly deviate from the main diagonal indicating that the drift variance parameter cannot be recovered accurately, and the wide error bars signal high uncertainty. Note that when the posterior distribution is highly skewed, it is possible that the error bars (i.e., the interquartile ranges of the posterior distributions) do not cover the dots (i.e., the modes of the posterior distributions).
To assess whether parameter inferences coincide across the two models, Fig. 3 shows the correlation between the posterior modes of the SW-GAM (x-axis) and SW-TN (y-axis) model, respectively, together with the error bars representing the interquartile ranges of the posterior distributions. First of all, Fig. 3 suggests that the uncertainty about the true parameter values is comparable in magnitude for both models. Second, it is evident that for all parameters except for the drift variance parameter, the modes of both models are similar since most modes lie close to the main diagonal. However, for the drift variance parameter, the posterior modes of the two models deviate from each other in an unsystematic manner; some are higher estimated by the SW-TN mixture, whereas others are higher estimated by the SW-GAM mixture. Finally, note that Fig. 3 underestimates the true accordance of the two models because the models were fit to different synthetic data sets, that is, the SW-TN and SW-GAM mixture were fit to a data set that was generated with the respective mixture. To assess the impact of the number of trials on the parameter recovery, we repeated the same recovery study as presented in the last section, but this time with 1200 trials instead of 120 trials. Figure 4 shows the results. From the figure it is evident that the parameter recovery has improved: The modes of the posterior distribution of the threshold, shift, and drift mean parameter cluster tightly around the main diagonal indicating a close correspondence between the true parameter values and their recovered values. In addition, the narrow error bars signal high certainty about the parameter values. The recovery has also improved in the case of the drift variance parameter, yet systematic deviances and large uncertainty remain. Figure 4 thus suggests that, with a large number of trials, all parameters except for the drift variance parameter can be recovered to a high degree of accuracy.
To assess whether parameter inferences coincide across the two models, Fig. 5 shows the correlation between the modes obtained from fitting the SW-GAM (x-axis) and SW-TN model (y-axis), respectively, together with the error bars representing the interquartile ranges of the posterior distributions. As was the case for the previous recovery study with 120 trials, Fig. 5 suggests that the uncertainty about the true parameter values is comparable in magnitude for both models, and that the parameter inferences for threshold, shift and drift mean are similar for both models. However, in the case of the drift variance parameter, just as when considering 120 trials, the posterior modes of the two models do not match.

Recovery study 3: 30 synthetic participants each contributing 120 trials, fixed drift variance parameter
In order to investigate the impact of the drift variance parameter on parameter recovery, we used the synthetic data sets from the last two sections, but now the two models were fit with a drift variance parameter fixed to zero (i.e., the vanilla SW model with constant drift). Figure 6 shows the correlation between the data-generating values (x-axis) and the modes of the posterior distributions (y-axis) obtained from fitting the SW-TN and SW-GAM mixture, respectively, to the data of 30 synthetic participants contributing 120 trials. The filled dots show the posterior modes of the model with a free drift variance parameter (obtained from recovery study 1), whereas the unfilled dots show the posterior modes of the model with a drift variance parameter fixed to zero. For a given synthetic participant, the impact of fixing the drift variance parameter to zero can be assessed by choosing a certain value on the x-axis; the difference between the corresponding filled and unfilled dot expresses the disagreement between the two models (i.e., drift variance free to vary vs. drift variance fixed to zero). Figure 6 suggests that, in the case of the threshold and the shift parameter, fixing the drift variance parameter to zero does not harm recovery. However, in the case of the drift mean parameter, fixing the drift variance parameter results in a slight underestimation of the drift mean parameter. Figure 6 also suggests that this pattern is present for both the SW-TN and the SW-GAM mixture.

Recovery study 4: 30 synthetic participants each contributing 1,200 trials, fixed drift variance parameter
To investigate whether the impact of fixing the drift variance parameter becomes more apparent when using a larger number of RTs, we repeated the recovery study presented in the last section with 1200 trials instead of 120 trials. Figure 7 shows the results. The underestimation of the drift mean parameter when fixing the drift variance parameter is now more systematic. In addition, fixing the drift variance parameter seems to result in a slight underestimation of the threshold and a slight overestimation of the shift parameter. These results suggest that even though the drift variance parameter cannot be recovered accurately (see Figs. 2 and 4), including it to the models does improve recovery of the remaining parameters. This data set consists of 1469 participants, 40.1% men, 10.8% left-handed by self-report, and all between 18 and 65 years. All participants completed a 120-trial simple RT task that required participants to press a response button on a computer gaming mouse as fast as possible once upon detecting a stimulus appearing either in the left or in the right hemifield. As was done by Woods et al. (2015), we excluded RTs less than 110 ms and greater than 1000 ms. More details about the data set and the experiment can be found in Woods et al. (2015; Experiment 1). The data set can be downloaded from the OSF at http://osf.io/av4qn.

Cognitive modeling analyses
We fit both models (i.e., the SW-TN and SW-GAM mixture) to the data set provided by Woods et al. (2015) using the Bayesian framework outlined above. We collected 4000 samples of each chain after discarding the first 1000 samples of each chain as burn-in. Whenever this resulted in inadequate convergence (i.e.,R values larger than 1.05), we fit the data again with 1000 additional samples. We continued this process until allR values for a given subject were below 1.05.

Results
Visual inspection of the chains and confirmation that all parameters hadR values below 1.05 suggested that the collected samples provided a valid approximation to the joint posterior parameter distribution. To assess each model's goodness-of-fit, we compare for each participant the observed .1, .3, .5, .7., .9 quantiles to the predicted quantiles (see also Ratcliff and Van Dongen (2011)). 3 The predicted quantiles were obtained by generating 120 RTs using the modes of the posterior distributions, for both mixtures separately. Figure 8 suggests that both mixtures provide a good account of the data as the predicted quantiles closely correspond to the observed quantiles. Figure 9 shows the histograms for the modes of the posterior distributions of all participants for the SW-TN and SW-GAM mixture. First of all, it is evident that the histograms look very similar for both mixtures. Second, the mode of the threshold parameter is around 1, the mode of the shift parameter around .13, and the mode of the drift mean parameter is around 10. Third, the mode of the drift variance parameter has a bimodal distribution and should not be interpreted due to poor parameter recovery and high uncertainty. Figure 10 shows contour plots that represent the correlation between the parameters of the SW-TN and SW-GAM mixture for a representative participant. 4 A comparison between the left and right columns suggests that the correlation pattern is consistent across both mixtures. Second, there is no substantial correlation between drift variance and the remaining parameters (bottom three rows of Fig. 10)-a foreseeable finding given the large uncertainty about the drift variance parameter observed in the recovery studies. Third, the top row of Fig. 10 shows a strong negative correlation between the threshold and shift parameter suggesting that a higher threshold can be compensated by lower shift. Fourth, the second row of Fig. 10 shows a strong positive correlation between the threshold and drift mean parameter suggesting that a higher threshold can be compensated by a higher drift mean. Finally, a-at first glance-counterintuitive correlation is shown in the third row: The shift parameter is negatively correlated with the drift mean parameter. This finding is driven by the correlations shown in the first two rows of Fig. 10. It is evident that increasing the shift parameter results in a lower threshold (first row), and a lower threshold results in a lower drift mean (second row).

Discussion
This article illustrated how the shifted-Wald model can be extended to incorporate across-trial variability in the Wald 3 Note that we did not compare each model's performance to the performance of a benchmark model (e.g., a model without random drift) because our purpose is parameter estimation and not model selection. 4 The correlations for all remaining participants can be examined using the data and R code provided on the OSF at http://osf.io/av4qn. This also involves fitting the model using the R code provided on GitHub at https://github.com/HelenSteingroever/jags-wald/releases. drift rate parameter. More specifically, we assumed that the trial-dependent drift rate is governed by either a truncated normal distribution or a gamma distribution. We showed that analytical solutions exist for the resulting distributions of the first-arrival times, and explicitly derived them. In addition, we explained how they can be implemented in a Bayesian framework. Due to the complexity of the analytical solution for the SW-GAM mixture, we only implemented the analytical solution for the SW-TN mixture, but provided a sampling-based implementation for the SW-GAM mixture. We confirmed the adequacy of both implementations in four recovery studies, and demonstrated their practicality on the example of a data set comprised of 1469 participants (Woods et al., 2015).
Our recovery studies showed that (1) all parameters except for the drift variability parameter can be recovered accurately; (2) parameter recovery of the drift variability parameter is problematic, yet including this parameter results in more accurate parameter recovery of the remaining model parameters; and (3) parameter inferences are consistent across both the SW-TN and the SW-GAM mixtures. Our application to real data also suggests that the results are consistent across both mixtures. In addition, we showed that both mixtures provide a good account of the data, but that the shift, threshold, and drift mean parameters are correlated.
Our models can be readily applied to a large variety of one-choice RT tasks-tasks that require participants to "initiate a simple, preprogrammed response to a simple triggering signal" (p. 49; Luce (1986)). Examples of such tasks are simple RT tasks, go/no-go tasks, and temporal-cueing tasks (Heathcote, 2004;Jepma et al., 2012;Kamienkowski et al., 2011;Luce, 1986;Schwarz, 2001;Smith, 2000). Next to one-choice RT tasks, our models can also be applied to tasks from other domains involving duration phenomena. Examples are job completion times of employees in economics (Desmond & Chapman, 1993), and organ transit time distributions of vascular markers (so called washout curves) in biology (Weiss, 2012). In addition, our implementations can be extended to account for contaminant processes by assuming that the observed RT originates from two processes-the Wiener diffusion process toward a single absorbing boundary and a contaminant process that can be modeled as a uniform distribution between a lower and upper RT bound (Jepma et al., 2012;Ratcliff & Tuerlinckx, 2002;Zeigenfuse & Lee, 2010). Yet another possible extension of our implementations is to also account for commonalities and differences across participants using a Bayesian hierarchical framework (Anders et al., in press;Jepma et al., 2012;Shiffrin et al., 2008).
Our results suggest that including variability in the Wald drift rate parameter may be worthwhile since it improves the parameter recovery of the remaining model parameters (e.g., Burbeck & Luce, 1982;Ratcliff & Strayer, 2014;Ratcliff & Tuerlinckx, 2002;Ratcliff & Van Dongen, 2011;Ratcliff & Van Zandt, 1999; for an alternative perspective see van Ravenzwaaij et al. (2017)). A crucial advantage of our implementation is that it avoids having to fit the data by simulation (e.g., Ratcliff and Van Dongen, 2011;Ratcliff & Strayer, 2014). In addition, our results show that the parameter inferences are not affected by whether we use the analytical solution for the SW-TN mixture or the samplingbased solution for the SW-GAM mixture. This suggests that the inference is robust to the type of distribution on the drift rate parameter and to the type of implementation. Our recommendation to include across-trial variability in the Wald drift rate parameter relied on a recovery study with generated data from a shifted-Wald model that incorporates across-trial variability. In real data, across-trial variability in the drift rate may be absent or negligible. In order to assess the extent to which the data support the inclusion of the across-trial variability in the drift rate, we recommend researchers to use the Bayes factor (Etz & Wagenmakers, 2017;Jeffreys, 1961;Vandekerckhove et al., 2015); this can be accomplished using bridge sampling and the data pool reported in this article (Gronau et al., in press). As a side note, even though it would theoretically be possible to add across-trial variability in the shift parameter and the threshold parameter as well, we believe that for realistic sample sizes the Bayes factor will indicate that the model without theses variabilities shows better predictive performance. Here we focus on drift rate variability because it is the most consequential and the most interesting from a psychological perspective (see also Ratcliff, 1978). Finally, our results can be used to create informative prior distributions for the SW-TN and SW-GAM mixtures. Since we fit both mixtures to a data set comprised of 1469 participants, the histograms for the posterior modes (Fig. 9) offer a good description of which parameter values can be expected when modeling one-choice RT data. One way to create informative prior distributions based on Fig. 9 is to use uniform distributions with a lower and upper bound corresponding to the bounds shown in that figure (see Matzke & Wagenmakers, 2009). Even more informative prior distributions can be used by fitting a positive-valued distribution to the distributions of the modes, and then using the resulting distributions as prior distributions. 5 To conclude, we have extended the SW model to incorporate across-trial variability in the Wald drift rate parameter, and derived distributions of the first-arrival times under the assumption that individual-trial drift rates are governed either by a truncated normal distribution or a gamma distribution. Our extended model can be applied to a broad range of data sets involving duration phenomena and can easily be extended to account for contaminant processes and hierarchical structure.
12 show the results. The results are presented for several scenarios that differ in the number of parameters fixed to one. To obtain the results we used standard integration techniques, but also the computer software Maple (2015). We also tried to derive the results using Mathematica (Wolfram Research, 2010). The results were either identical to the ones provided by Maple, or else Mathematica failed to provide any results.
From the table and the figures it is evident that assuming a decision threshold α fixed to one results in relatively short solutions that are straightforward to implement and computationally inexpensive; however, letting decision threshold α free to vary results in lengthy solutions involving the Laguerre polynomial (https://www.maplesoft.com/support/help/maple/view. aspx?path=LaguerreL; retrieved 2016, July 14; chapter 3 in Bayin, 2006;Koepf, 1997). Note that LaguerreL(n, a, x) is a function implemented in Maple that computes the n th Laguerre polynomial.
In principle, it is possible to add the solution displayed in Fig. 12-the solution where all relevant parameters are free to vary-to JAGS following the procedure described by Wabersich and Vandekerckhove (2013). Since the LaguerreL function is not contained in JAGS, this would involve programming the LaguerreL function in C++, a programming language that is used for the JAGS implementation. In order to program the Laguerre polynomial one can use its definition as the product of a generalized binomial coefficient and the Kummer confluent hypergeometric function of the first kind (also known as Kummer's function of the first kind; Equation 10 in Georgiev & Georgieva-Grosse, 2010): LaguerreL(n, a, x) = L a n (x) = n + a n · KummerM(−n, a + 1, x). (5) The binomial coefficient in Eq. 5 can be rewritten using: n + a n = (n + a + 1) (n + 1) · (a + 1) .
The Kummer confluent hypergeometric function of the first kind is already implemented in C++. Note that the analytical solutions presented in Figs. 11 and 12 involve Laguerre functions with negative inputs, resulting in negative inputs for the gamma function in Eq. 6. Negative inputs to the gamma function are not allowed in C++ that is used for the JAGS implementation. One therefore needs to use the following definition of the gamma function for negative inputs excluding integers (otherwise the sine is zero): SW(t; α, θ, ξ ) GAM(ξ ; κ, τ = 1) See Fig. 11 GAM(ξ ; κ, τ ) See Fig. 12 The analytical solutions were derived with standard integration techniques and Maple. Each row differs depending on which parameter of the SW and gamma distribution were fixed to one (http://algolist.manual.ru/maths/count fast/gamma function.php; retrieved July 14, 2016). The classical identities (x) = (x − 1)! and (x) = (x+1) x suggest that the gamma function is not defined for negative integers including zero (i.e., x / ∈ Z − 0 ). However, Fisher and Kılıcman (2012) proved that the gamma function can be defined for negative integers as follows:

JAGS model files
This section presents the JAGS model files to implement the SW-TN and SW-GAM mixture. All relevant code can be downloaded from GitHub at https://github.com/ HelenSteingroever/jags-wald/releases.
The JAGS model code for the SW-TN mixture is as follows: The JAGS model code for the SW-GAM mixture: Note that the T(0,) argument ensures positive values and that the second argument of the normal distribution refers to the precision being defined as the reciprocal of the variance (i.e., λ = 1/σ 2 ). In addition to the dunif(0, 5) prior (i.e., a uniform prior ranging from 0 to 5, that is, U (0, 5)) on the drift variance, we also tried a dunif(0, 2) prior for our recovery studies. The problem with this parameter is that, in the recovery studies, a lot of mass of many posterior distributions often lies close to the upper bound even though the true values did not-a finding that did not affect the remaining parameter inferences, but that we observed independently of whether an upper bound of 2 or 5 was used. We assume that this pattern will remain even if we use a yet higher upper limit (i.e., the principle of stable estimation; Edwards et al., 1963). We therefore decided to use a dunif(0, 5) prior on the drift variance for all analyses presented in this article. Fig. 11 Solution of ∞ 0 SW(t; α, θ, ξ ) · GAM(ξ ; κ, τ = 1) dξ , with κ the shape parameter and τ the rate parameter, derived by Maple Fig. 12 Solution of ∞ 0 SW(t; α, θ, ξ )·GAM(ξ ; shape = κ, rate = τ ) dξ , with κ the shape parameter and τ the rate parameter, derived by Maple