Are there jumps in evidence accumulation, and what, if anything, do they reflect psychologically? An analysis of Lévy Flights models of decision-making

According to existing theories of simple decision-making, decisions are initiated by continuously sampling and accumulating perceptual evidence until a threshold value has been reached. Many models, such as the diffusion decision model, assume a noisy accumulation process, described mathematically as a stochastic Wiener process with Gaussian distributed noise. Recently, an alternative account of decision-making has been proposed in the Lévy Flights (LF) model, in which accumulation noise is characterized by a heavy-tailed power-law distribution, controlled by a parameter, α. The LF model produces sudden large “jumps” in evidence accumulation that are not produced by the standard Wiener diffusion model, which some have argued provide better fits to data. It remains unclear, however, whether jumps in evidence accumulation have any real psychological meaning. Here, we investigate the conjecture by Voss et al. (Psychonomic Bulletin & Review, 26(3), 813–832, 2019) that jumps might reflect sudden shifts in the source of evidence people rely on to make decisions. We reason that if jumps are psychologically real, we should observe systematic reductions in jumps as people become more practiced with a task (i.e., as people converge on a stable decision strategy with experience). We fitted five versions of the LF model to behavioral data from a study by Evans and Brown (Psychonomic Bulletin & Review, 24(2), 597–606, 2017), using a five-layer deep inference neural network for parameter estimation. The analysis revealed systematic reductions in jumps as a function of practice, such that the LF model more closely approximated the standard Wiener model over time.
This trend could not be attributed to other sources of parameter variability, speaking against the possibility of trade-offs with other model parameters. Our analysis suggests that jumps in the LF model might be capturing strategy instability exhibited by relatively inexperienced observers early on in task performance. We conclude that further investigation of a potential psychological interpretation of jumps in evidence accumulation is warranted.

Supplementary Information: The online version contains supplementary material available at 10.3758/s13423-023-02284-4.

The most popular and well-studied model of simple decision-making is the diffusion decision model (DDM) of Ratcliff (Ratcliff, 1978; Ratcliff & McKoon, 2008; Ratcliff & Rouder, 1998). The DDM represents evidence as a single signed value, X(t), that is accumulated from a starting point, z, toward one of two absorbing boundaries, located at zero and a. Psychologically, the boundary separation parameter, a, characterizes response caution, with more widely separated boundaries reflecting more cautious responding (i.e., a greater quantity of evidence is required to trigger a response). The start-point parameter, z, is interpreted psychologically as characterizing response bias, and varies uniformly with mean z and range s_zr. An unbiased decision-maker will begin evidence accumulation from z = a/2. For a biased decision-maker, the accumulation process will begin closer to the boundary favored by the bias. The rate of evidence accumulation, the drift rate of the diffusion process, has a mean v and standard deviation s_v across trials. Psychologically, the drift rate is determined by the quality of the stimulus. The sign of the drift rate determines which response alternative evidence tends to accumulate towards. Drift rates with absolute values that deviate further from zero correspond to stimuli with higher-quality information that more readily discriminates between response alternatives. Mathematically, evidence accumulation in the DDM can be described as

X(t + Δt) = X(t) + vΔt + √Δt ε,  ε ∼ N(0, 1),  (1)

where N(0, 1) denotes a standard normal distribution and Δt represents the time step. The accumulation dynamics of Eq. 1 determine the decision time, but behavioral response times also consist of the time required to encode the stimulus and execute a response. In the DDM, these components of non-decision time are described by the parameter t_0 (also written as T_er). Non-decision time is assumed to vary uniformly with mean t_0 and range s_t. Components of the DDM (i.e., without variability parameters) are illustrated in Fig. 1.
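As a concrete illustration, the discretized accumulation process of Eq. 1 can be simulated in a few lines of Python. This is a minimal sketch of the simplified process (no between-trial variability in drift rate, start point, or non-decision time); the function name and default settings are our own, not those of any published implementation.

```python
import numpy as np

def simulate_ddm(v, a, z, t0, dt=0.001, max_t=10.0, rng=None):
    """Simulate one DDM trial by Euler discretization of Eq. 1.

    Evidence starts at z and accumulates with drift v plus Gaussian noise
    until it crosses one of the absorbing boundaries at 0 and a.
    Returns (response_time, choice): choice is 1 for the upper boundary,
    0 for the lower boundary.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, t = z, 0.0
    while 0.0 < x < a and t < max_t:
        # Eq. 1 increment: drift term plus sqrt(dt)-scaled Gaussian noise
        x += v * dt + np.sqrt(dt) * rng.standard_normal()
        t += dt
    return t0 + t, int(x >= a)
```

With a strong positive drift rate, most simulated trials terminate at the upper boundary, and decision times shrink as the boundary separation a is reduced.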
A number of alternatives to the DDM have been developed. These include "reduced" forms of the standard DDM, incorporating simplifying assumptions that remove between-trial parameter variability (Wagenmakers et al., 2007), as well as models that make different assumptions about how evidence is represented in the model. For example, multi-accumulator models (e.g., Brown & Heathcote, 2005, 2008; Hawkins & Heathcote, 2021; Smith, 2022; Smith & Vickers, 1988; Tillman et al., 2020; Usher & McClelland, 2001) represent separate absolute evidence totals for each response alternative that race against each other, rather than a single signed relative evidence total. More recently, Voss and colleagues (Voss et al., 2019; Wieschen et al., 2020) have proposed the Lévy Flights (LF) model as another alternative, which includes the DDM as a special case. The LF model differs from the DDM in assuming that there are random "jumps" in evidence accumulation that do not conform to the standard Gaussian noise process described in Eq. 1. To allow for these large sudden changes in evidence accumulation, the noise in the accumulation process is instead characterized by a heavy-tailed α-stable distribution with long-tailed, power-law asymptote λ(x) ∼ |x|^(−1−α) (0 < α ≤ 2) (Padash et al., 2019). Evidence accumulation in the LF model can therefore be written as

X(t + Δt) = X(t) + vΔt + (Δt)^(1/α) ε,  ε ∼ S(α, β, γ, δ).  (2)

In Eq. 2, α, β, γ, and δ are the parameters of an α-stable distribution, where α ∈ (0, 2] is the stability index (Lévy index), β ∈ [−1, 1] is the skewness parameter, γ > 0 is the scale parameter, and δ is the shift parameter, which can be any real number. In the LF model, the distribution of accumulation noise is α-stable with fixed parameters β = 0, γ = 1/√2, and δ = 0 (Gikhman & Skorokhod, 1975; Samorodnitsky & Taqqu, 1994), and α is the only free parameter, constrained to 1 ≤ α ≤ 2. When α = 2, the accumulation noise is Gaussian, and the model is equivalent to the DDM (Eq. 1). When α = 1, the accumulation noise is Cauchy distributed. Accumulation dynamics for the standard DDM and the LF model with different α values are shown in Fig. 2.
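To make the contrast with the Gaussian process concrete, the LF accumulation process can be simulated by swapping the Gaussian increments for α-stable ones. A minimal sketch using SciPy's `levy_stable` distribution follows; the function name and defaults are our own, and note that the noise increment scales with dt**(1/alpha) rather than sqrt(dt).

```python
import numpy as np
from scipy.stats import levy_stable

def simulate_levy_flight(v, a, z, alpha, t0=0.0, dt=0.01, max_t=10.0, rng=None):
    """Simulate one Lévy Flights trial per Eq. 2.

    Noise increments are drawn from a symmetric alpha-stable distribution
    with beta = 0, scale gamma = 1/sqrt(2), and shift delta = 0. Setting
    alpha = 2 recovers the Gaussian (Wiener) diffusion of the DDM.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, t = z, 0.0
    while 0.0 < x < a and t < max_t:
        eps = levy_stable.rvs(alpha, 0.0, loc=0.0, scale=1 / np.sqrt(2),
                              random_state=rng)
        # Eq. 2 increment: drift term plus dt**(1/alpha)-scaled stable noise
        x += v * dt + dt ** (1.0 / alpha) * eps
        t += dt
    return t0 + t, int(x >= a)
```

Running this at progressively smaller values of alpha produces trajectories with increasingly frequent large jumps, of the kind illustrated in Fig. 2.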
To date, the LF model has only been applied sparingly in psychological research. The domains in which the model has proved useful, however, are quite broad, having been applied to memory search and retrieval processes (Patten et al., 2020), semantic memory search processes (Montez et al., 2015; Rhodes & Turvey, 2007), spatial memory (Kerster et al., 2016), perception in typically developing children and children with autism spectrum disorder (Liberati et al., 2017), and animal foraging (Reynolds, 2012). There are also neurological motivations for considering the LF model as a model of decision-making (e.g., Wardak & Gong). Within psychology, the LF model has been advanced by Voss and colleagues (Voss et al., 2019; Wieschen et al., 2020), who argue for several potential benefits over the DDM, which we now summarize.
First, Voss et al. (2019) showed that LF models can provide highly accurate accounts of fast error patterns in data. In the DDM, fast errors have been previously explained in terms of start-point variability (Ratcliff & Rouder, 1998; Smith et al., 2014). While start-point variability has good psychological plausibility (it can be related theoretically to sequential effects in stimulus presentation and/or responding, e.g., Rabbitt, 1969, an account that has received support from cognitive modeling, e.g., Bode et al., 2012), reliance on between-trial variability parameters has been criticized due to relatively poor parameter recovery properties (Lerche & Voss, 2016; Voss et al., 2019). The LF model can account for errors that are, on average, faster than correct responses without between-trial variability in start-point, attributing them to large jumps early in evidence accumulation that quickly move the process toward the incorrect absorbing boundary. This account can be investigated in more detail by examining correlations between the α parameter and overall error rate (Wieschen et al., 2020).
Second, extending the applications to data exhibiting fast errors, the LF model may be useful for examining time-pressured decisions, where fast decisions are more important than accurate ones. In these paradigms, a decision-maker can choose among three types of strategies: (1) maintaining a low fixed evidence threshold, (2) implementing a collapsing evidence threshold (Cisek et al., 2009; Ditterich, 2006; Drugowitsch et al., 2012; Hawkins et al., 2015; Thura et al., 2012; Zhang et al., 2014), or (3) capitalizing on sudden jumps during evidence accumulation. A low fixed evidence threshold leads to many incorrect decisions in difficult conditions and is therefore not a good option (Evans et al., 2020). A collapsing threshold overcomes the problem of an unacceptably high error rate in difficult conditions, but does not produce fast errors (Evans et al., 2020). Taking advantage of sudden jumps in evidence allows the participant to reach an acceptable level of overall accuracy in this type of time-pressured environment while also accounting for fast errors. Wieschen et al. (2020) found high positive correlations between response time (RT) and the α parameter, as well as between overall accuracy and the α parameter: the shorter the response time, the smaller the α parameter, which implies that sudden jumps in evidence accumulation contributed to these faster decisions.
Third, jumps in evidence accumulation can be related to the "jumping to a conclusion" phenomenon (McKay et al., 2006), which is not readily achieved within a DDM framework. The heavy tails in the accumulation process outlined by the LF model are controlled by the value of the α parameter. Psychologically, lower values of α could correspond to a higher probability of suddenly accumulating an extreme amount of evidence (Wieschen et al., 2020). This phenomenon may occur in groups with high impulsivity, whose decision strategies may be more naturally modeled as an LF process due to a lack of stability. Sudden accumulation of evidence might also be interpretable as a consequence of changes in the allocation of attentional resources that arise through the use of a hypothesis testing strategy (e.g., toggling between different decision strategies when judging a multi-attribute stimulus; cf. Lamberts, 1995).
Notwithstanding the potential advantages of using the LF model as a generalized form of the DDM, it is unclear whether situations where the LF model outperforms the DDM are best attributed to the LF model providing a more accurate characterization of within-trial noise, or simply to increased model flexibility. Although Voss and colleagues have argued for the former (Voss et al., 2019; Wieschen et al., 2020), support for the LF model as a decision-making model has come primarily from assessing goodness of fit against other competing models (e.g., the standard DDM and various collapsing threshold models), and not from a more detailed theoretical analysis of the critical α parameter.
We address this issue by exploring Voss et al.'s (2019) conjecture that jumps in evidence accumulation may be due to sudden on-the-fly shifts of attention that may reflect instability in the decision strategy (or the capacity to execute the strategy) adopted by the observer. We argue that if the jumps in evidence accumulation assumed by the LF model can be psychologically interpreted in this way, we should be able to observe systematic changes in α as people become more experienced in performing a task or become more familiar with the stimuli they are presented with. That is, the prevalence of sudden jumps in evidence accumulation should progressively reduce as people settle on a consistent decision strategy for performing a task and/or grow more adept at parsing and encoding stimuli in a way that supports effective task performance. This means that people's early decision performance might best be modeled using an α value that is less than 2, but that approaches a value of 2 with increasing experience over the course of an experiment, approximating the behavior of the standard DDM. If, on the other hand, α cannot be psychologically interpreted in this way (if it enables better fits to data by simply increasing flexibility), we should not observe any systematic changes in the parameter with experience. To further explore the issue of model flexibility, we also consider whether the α parameter trades off with either mean drift rate or other between-trial variability parameters in the standard DDM (e.g., variability in start-point).
We structure the rest of the article as follows. We first provide a more detailed mathematical overview of the LF model. We then report a re-analysis of data from Evans and Brown (2017) that enables tracking of the α parameter (and other LF model parameters) as a function of task experience. In their study, Evans and Brown (2017) analyzed people's performance in a motion discrimination task using the DDM, showing that people set overly cautious decision thresholds at the beginning of the task, but relax their thresholds with experience in a way that approaches an optimal setting (i.e., one that maximizes the rate of reward on the task). Since reductions in decision threshold over time reflect systematic changes in the way people approach the task (i.e., in how the quality of information extracted from the stimulus is reconciled with the need to respond quickly and accurately), one might predict that if the α parameter reflects instability in executing a decision strategy (or shifting attention among multiple competing strategies), its value should systematically change in a way that reflects the gradual refinement of (or convergence toward) a stable encoding and decision strategy. Comparing changes in α over time with corresponding changes in other between-trial variability parameters (i.e., s_v, s_zr, and s_t) allows us to judge whether α is simply acting as a surrogate for one of these parameters, which would suggest that improvements in fit afforded by the LF model might reflect greater model flexibility rather than a more accurate description of psychological processing.

Lévy Flights Model
Mathematically, the Lévy Flights model is a generalized continuous-time random walk process, a sequential sampling model (Voss et al., 2019) that uses an α-stable jump length distribution (or Lévy distribution; Gnedenko & Kolmogorov, 1954) with a long-tailed, power-law asymptote λ(x) ∼ |x|^(−1−α) (0 < α ≤ 2) to describe noise in the accumulation process (Hadian Rasanan et al., 2022; Padash et al., 2019, 2020). Therefore, information accumulation in the LF model can be formulated as follows:

X(t + Δt) = X(t) + vΔt + (Δt)^(1/α) ε,  ε ∼ S(α, β = 0, γ = 1/√2, δ = 0),  (3)

where ε is drawn from the α-stable distribution S(α, β = 0, γ = 1/√2, δ = 0), v is the drift rate, and the process terminates whenever X(t) ≥ a or X(t) ≤ 0. A technical point concerns the different ranges of values α can take with respect to psychological plausibility. The value of the α parameter ranges between 0 and 2 in the α-stable distribution. However, when α is lower than 1, the resulting accumulation process becomes an anomalous sub-diffusion process that has not been considered in the decision-making literature. In keeping with theoretical precedent within psychology, we restrict the range of α in Eq. 3 to lie between 1 and 2. By considering Eq. 3, the process can be considered as a continuous-time process vt + Z_t, in which:

X(t) = z + vt + Z_t,  (4)

where Z_t is an α-stable Lévy process whose increments over an interval Δt are distributed as S(α, 0, (Δt)^(1/α)/√2, 0). Thus, the Fourier Transform of the probability density function of the location of the accumulated evidence total is obtained as follows:

p̂(k, t) = exp( t[ −ivk + D( r(ik)^α + q(−ik)^α ) ] ),  (5)

where r represents the probability of jumping upward, q is the probability of jumping downward (r + q = 1, with r = q = 1/2 for the symmetric LF model), and D is a generalized diffusion coefficient. Moreover, it can be concluded that p̂(k, t) satisfies the following differential equation (Meerschaert & Sikorskii, 2011):

∂p̂(k, t)/∂t = [ −ivk + D( r(ik)^α + q(−ik)^α ) ] p̂(k, t).  (6)

By applying the inverse Fourier Transform to both sides of Eq. 6, the following space-fractional partial differential equation is obtained (Meerschaert & Sikorskii, 2011; Padash et al., 2019; Hadian Rasanan et al., 2022):

∂p(x, t)/∂t = −v ∂p(x, t)/∂x + D D_x^α p(x, t),  (7)

and by regarding the starting and termination conditions of the process, the following initial and boundary conditions are obtained:

p(x, 0) = δ(x − z),  p(0, t) = p(a, t) = 0.  (8)

The fractional derivative in Eq. 7 (i.e., D_x^α) is defined as follows:

D_x^α = r · ₋∞D_x^α + q · ₓD₊∞^α,  (9)

where ₋∞D_x^α and ₓD₊∞^α are the left and right Riemann–Liouville fractional derivatives, with the following definitions for 1 < α ≤ 2 (Ding & Li, 2017; Hadian Rasanan et al., 2020):

₋∞D_x^α f(x) = (1/Γ(2 − α)) d²/dx² ∫ from −∞ to x of (x − s)^(1−α) f(s) ds,  (10)

ₓD₊∞^α f(x) = (1/Γ(2 − α)) d²/dx² ∫ from x to ∞ of (s − x)^(1−α) f(s) ds.  (11)

By considering Eq. 7, the probability of the location of the accumulated evidence total at time t can be obtained by solving the space-fractional partial differential equation. Thus, by approximating the solution of this equation, the survival probability of the decision process can be obtained as

S(t) = ∫ from 0 to a of p(x, t) dx,  (12)

which determines whether the location of the accumulated evidence total is still somewhere between the absorbing boundaries at time t, or whether the process has already terminated at one of them.
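Solving the space-fractional equation requires specialized numerical methods. As a sanity check on any such solver, the survival probability can also be estimated by brute-force Monte Carlo simulation of the process itself. The following rough sketch is our own construction, not the approximation scheme used in the literature cited above:

```python
import numpy as np
from scipy.stats import levy_stable

def survival_probability(t, v, a, z, alpha, dt=0.005, n_sims=200, seed=1):
    """Monte Carlo estimate of the survival probability S(t).

    S(t) is the probability that an LF accumulator started at z with drift v
    is still strictly between the absorbing boundaries 0 and a at time t.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t / dt))
    alive = 0
    for _ in range(n_sims):
        x = z
        for _ in range(n_steps):
            eps = levy_stable.rvs(alpha, 0.0, scale=1 / np.sqrt(2),
                                  random_state=rng)
            x += v * dt + dt ** (1.0 / alpha) * eps
            if not (0.0 < x < a):
                break  # process absorbed before time t
        else:
            alive += 1
    return alive / n_sims
```

S(t) starts near 1 and decays toward 0 as simulated trials terminate, tying the trajectory-level description of Eq. 3 to the density-level description of Eq. 7.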

Evans & Brown (2017) Data Reanalysis
We now present a reanalysis of data from a study by Evans and Brown (2017), using the LF model. Their study examined reward rate optimality in perceptual decision-making when people had either a fixed amount of time to complete the task, or a fixed number of trials to complete. This data set was selected because it allows us to examine whether the α parameter is sensitive to refinement in people's decision-making strategy over time. The task also varied the level of performance feedback provided to participants, providing more fine-grained information on factors that influence the rate at which people's response strategy becomes tuned to the task. For simplicity, our analysis focuses on the fixed-trial conditions from their study, in which participants completed a fixed number of trials in a motion discrimination task using random dot kinematogram stimuli (Roitman & Shadlen, 2002). On each trial, participants viewed an array of moving dots, a proportion of which moved coherently in one direction. Participants were randomly assigned to one of three conditions that differed in terms of the level of feedback provided at the end of each trial block. In the 'low information' condition, participants were only informed that the block had been completed. In the 'medium information' condition, participants were informed about their performance in that block (i.e., how many points they accrued in the block, the time taken to complete the block, and the rate at which they accrued points). In the 'high information' condition, some hints on how participants could improve their performance were presented in addition to the summary provided in the medium information condition (i.e., participants were advised that responding faster/slower could adjust their performance in a way that increased the rate at which they accrued points).
The key finding from the study was that people tend to perform non-optimally by being overly cautious.However, with highly informative feedback and practice on the task, people's performance rapidly approaches optimality.

Fits of the Lévy flights model to human data
There were 85 total participants in the original data set, of whom 39 were assigned to one of the fixed-trial conditions that we focus on. Following Evans and Brown (2017), we removed nine participants with accuracy less than 70%, yielding 30 participants with data that we modeled (i.e., 10 participants in the 'low information' condition, 9 in the 'medium information' condition, and 11 in the 'high information' condition). Evans and Brown analyzed the data from each of the 24 blocks of trials using the DDM, yielding parameter estimates for each individual block. For our analysis, the relatively low number of trials per block (N = 40) presented difficulties in parameter recovery for the LF model, and so we elected to collapse the data into four larger trial epochs. Each epoch consisted of six blocks of the original design (e.g., Epoch 1 consisted of data from trial blocks 1-6 of the Evans and Brown study). Each epoch in our analysis comprised N = 240 trials.
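The collapsing scheme is simple to state: consecutive runs of six original blocks map onto a single analysis epoch. As a one-line illustration (the helper name is ours):

```python
def block_to_epoch(block):
    """Map the original 24 trial blocks (1-indexed) of the Evans and Brown
    (2017) design onto the four analysis epochs of six blocks each."""
    if not 1 <= block <= 24:
        raise ValueError("block must be between 1 and 24")
    return (block - 1) // 6 + 1
```

So blocks 1-6 form Epoch 1, blocks 7-12 form Epoch 2, and so on.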
Our main interest was to examine changes in the LF jump parameter, α, over the course of the task. If this parameter reflects instability in selecting or executing a decision strategy, as conjectured by Voss et al. (2019), we would expect to see systematic changes in α in the practice data of Evans and Brown (2017). To this end, we compared five versions of the LF model, each instantiating different assumptions about the latent processes involved in decision making. If people's stability in strategy selection or execution increases with practice, we would expect a model that allows α to vary over time to successfully account for the data while also showing reductions in the rate of jumps in evidence accumulation. If, on the other hand, α serves to simply increase the flexibility of the model (improving fit, but without demonstrating any clear systematic mapping with performance or task parameters), we would not expect to see any clear changes in α as people become more experienced with the task. We can further consider whether α adds flexibility by duplicating the effects of other DDM parameters that describe across-trial variability in different processing components. Specifically, we investigate whether the effects of allowing α to vary over time can be captured by across-trial variability in drift rate, the start-point of evidence accumulation, and non-decision time. Finally, because refinement of a decision strategy might be described more simply in terms of increases in drift rate (a product of improved stimulus encoding, potentially driven by selective attention mechanisms that more effectively facilitate focus on the most task-relevant properties of the stimulus), we also consider whether the effects of α can be explained in terms of changes in mean drift rate.
The models were fitted to the data from the high, medium, and low information conditions such that all model parameters were allowed to vary across conditions (i.e., different parameter estimates across different conditions). For each model, we estimated a decision threshold for each trial epoch (a_1, a_2, a_3, a_4). Following Evans and Brown (2017), we also estimated a single drift rate (v) for the entire experiment in the first four models. To examine whether their practice effects can be captured by the drift rate parameter, we estimated four drift rates alongside epoch-wise estimates of α in the fifth model. For all models, we estimated a single relative starting point bias (z_r = z/a), and a single non-decision time (t_0) for the entire experiment. In the model of primary interest, the α varying model, we estimated a separate α parameter for each trial epoch. We did not estimate any sources of between-trial parameter variability for this model (i.e., s_v, s_zr, and s_t were all fixed to 0). We also considered several control models that all assumed a fixed level of α across the experiment, but allowed a different between-trial variability parameter to differ across trial epochs. The rationale for the control models was to determine whether the jump parameter in the LF model is simply mimicking the effects of one of the between-trial variability parameters, which confer some additional flexibility to the standard DDM. Specifically, we fitted three such control models, the drift rate variability model, the start-point variability model, and the non-decision time variability model, which each allowed their namesake parameter, s_v, s_zr, or s_t, respectively, to vary across trial epochs. We also fitted another control model, the drift & α varying model, in which both mean drift rate and α parameters were free to vary across trial epochs.
To the extent that any systematic changes in α are mirrored in the other between-trial variability parameters or drift rate, we can conclude that α primarily contributes flexibility to the LF model without having a clear psychological interpretation that is distinct from existing model parameters. Table 1 summarizes the free parameters for the five models fitted to the data. While there have been recent attempts to approximate the likelihood function of the LF model using partial differential equations (Hadian Rasanan et al., 2022), the lack of a closed-form likelihood function typically impedes Bayesian inference methods (Fengler et al., 2021; Voss et al., 2019). A notable paradigm for approximating the likelihood function is approximate Bayesian computation (ABC; Turner & Van Zandt, 2012). There are various ABC algorithms for likelihood approximation, but recently, some researchers have combined neural network approaches with traditional ABC algorithms and improved upon them (Fengler et al., 2021; Radev et al., 2020). Previously, a convolutional neural network (CNN) was developed by Radev et al. (2020) to approximate the likelihood function. This network can approximate the parameters of a stochastic process by learning from a large number of simulations of that process. We fit the five variants of the LF model described above using a deep inference algorithm based on the CNN described by Radev et al.
(2020). This neural network approach learns approximate likelihoods for the LF models, allowing fast posterior sampling with only a one-off cost for model simulations that is amortized across future inference. Parameter estimation for each of the five LF models was achieved by training its own dedicated CNN. For example, for the non-decision time variability model, we generated 28,000 datasets with 240 trials of the LF model, and trained the network on these datasets. Each network had five layers, and the number of filters in each layer was 64, 64, 128, 128, and 128, respectively. Additionally, each network consisted of one channel for each trial epoch (i.e., four channels in total). The architecture of the network is presented in Fig. 3. For the α varying model, the network estimates a drift rate, a non-decision time, and a bias for the experiment, plus four α and four decision threshold parameters (one for each trial epoch).
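Although we cannot reproduce the full training pipeline here, the epoch-per-channel input layout can be illustrated with a small helper. This is an illustrative encoding of our own; the exact per-trial encoding fed to the network in the study may differ.

```python
import numpy as np

def to_network_input(rts, choices, epoch_ids, n_epochs=4, trials_per_epoch=240):
    """Arrange per-trial behavior into one channel per trial epoch.

    Each trial is collapsed to a single signed value: +RT for upper-boundary
    responses, -RT for lower-boundary responses. The result has shape
    (n_epochs, trials_per_epoch), matching a four-channel network input;
    missing trials in an epoch are left as zero padding.
    """
    rts = np.asarray(rts, dtype=float)
    choices = np.asarray(choices)
    epoch_ids = np.asarray(epoch_ids)
    out = np.zeros((n_epochs, trials_per_epoch))
    for e in range(n_epochs):
        mask = epoch_ids == e
        signed = np.where(choices[mask] == 1, rts[mask], -rts[mask])
        out[e, :signed.size] = signed
    return out
```

The same shape convention is used for both the simulated training datasets and the empirical data handed to the trained network.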

Modeling results
We now report the results of fitting the LF models to the individual data from each condition of the Evans and Brown (2017) study.We report summaries of the parameter estimates in the main text, and more detailed information about parameter estimates at the individual level in the Supplementary Materials.
We first consider the quality of fit for the α varying model to the data from the three information conditions. Figure 4 plots model predictions against observed data for the RT distributions for correct responses (top panels), error responses (middle panels), and accuracy (bottom panel). The model provides a reasonable account of the data, though there is a general tendency for the model to predict slower correct RTs than are observed empirically, and to overestimate accuracy. The overestimation of the fastest correct RTs and accuracy appears most pronounced for the high information condition.

[Fig. 3 caption: The architecture of the Deep Inference neural network we used for parameter estimation, in which m denotes the number of parameters that the network is estimating, k denotes the number of epochs, and n denotes the size of the input data.]

[Fig. 4 caption: Predictions of the α varying model against observed data in different conditions for correct RTs in seconds (top panels), error RTs (middle panels), and accuracy (bottom panels). Across columns, panels in the top two rows show response times for the 0.1, 0.3, 0.5, 0.7, and 0.9 distribution quantiles. Each point shows data for one participant.]
Figure 5 shows how the jump parameter, α, varies across trial epochs in each condition. In the 'low' and 'medium' information conditions, changes in α are not clear-cut across trial epochs. In the 'high' information condition, however, there are general increases in α (i.e., a 17% increase in the last epoch with respect to the first epoch) alongside lower levels of variability across trial epochs. Although there is substantial overlap in the distributions of α, particularly in the earliest epochs, the overall pattern suggests greater stability in strategy selection and/or execution with increasing practice. Indeed, the pattern of increasing α values in this condition means that the behavior of the model more closely approximates the standard DDM with increasing practice. If α tends to increase with practice, it is perhaps unsurprising that the DDM has proved so successful: in highly practiced participants, whose data are often used to test the standard model, within-trial accumulation noise will be well characterized by a Gaussian process (or equivalently, an α-stable distribution, where α = 2). We note, however, that the estimates of α remain lower than 2, suggesting that decision stability has yet to plateau after a single session of practice on a task. This leaves open the possibility that analysis with the LF model may be beneficial even in more highly practiced individuals.
We note that the changes in α did not come at the expense of the changes in the decision threshold reported by Evans and Brown (2017). Figure 6 shows systematic reductions in decision threshold across trial epochs, replicating their core result.
We next considered the performance of the four control models that allowed either drift rate variability (s_v), start-point variability (s_zr), non-decision time variability (s_t), or both mean drift rate (v) and α to differ across epochs. Changes in each model's namesake parameter across trial epochs are shown in Fig. 7. It is clear from the figure that there are no systematic changes in non-decision time variability across trial epochs (third row of panels). This suggests that there is no tradeoff between α and non-decision time variability. However, there appear to be systematic changes in drift rate variability (top panels) as well as start-point variability (second row of panels) as a function of practice, which raises concerns about whether the changes in α reported for the α varying model are simply mimicking changes in these between-trial variability parameters that are available to the standard DDM.

[Figure caption fragment: Similarly, for the α & drift varying model, averaged across participants, α in the final trial epoch is increased by 9.7%, 0.5%, and 3.5% relative to the first, for the High, Medium, and Low information conditions, respectively.]
Turning first to the drift variability model, there is a decreasing trend in drift rate variability, s_v, across trial epochs in both the 'medium' and 'high' information conditions. This decreasing pattern is only significant in the 'medium' condition. One interpretation of this trend is that it reflects increased consistency in how the stimulus is encoded as people become more practiced with the task. This interpretation aligns well with the interpretation of α proposed by Voss et al. (2019) (i.e., that it reflects consistent strategy use and/or the capacity to reliably execute a given decision strategy). It is possible, then, that the changes in α seen with the α varying model are actually mimicking reductions in drift rate variability. Based on the model selection results reported in Table 2, and the relatively poor performance of the drift variability model, it appears that the degree of mimicry between α and s_v is only partial. Given that changes in s_v are insufficient to explain the effects of gradual increases in α with practice, we tentatively conclude that changes in α may reflect more stable reliance on a well-learned decision strategy, consistent with the claims of Voss et al. (2019), and that these changes are not restricted to consistency in strategy selection or application across trials.
Turning to the start-point variability model, there is a clear increasing trend in start-point variability, s_zr, across trial epochs in all of the 'low', 'medium', and 'high' information conditions. The overall pattern of change, especially in the high information condition, is similar to the changes observed in α for the α varying model. The effect of increasing start-point variability, however, runs counter to the effect of increasing α in the LF model, and so we think it is unlikely that the changes in these model parameters observed here reflect any meaningful tradeoff. Specifically, start-point variability is important for allowing the standard DDM to fit patterns of data where error responses are, on average, faster than correct responses (Ratcliff & Rouder, 1998). While fits to this fast-error pattern have been used to support the LF model (Voss et al., 2019), these fits require lower values of the α parameter. The increases in α shown by the α varying model imply a progressively lower rate of fast errors with practice, whereas the increasing pattern of s_zr shown by the start-point variability model implies an increasing rate of fast errors with practice. On balance, we conclude that changes in start-point variability do not provide a viable explanation of changes in the α parameter.
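The fast-error logic just described can be illustrated with a small simulation: with substantial start-point variability, errors arise disproportionately from trials that begin near the error boundary and therefore terminate quickly. The sketch below is purely illustrative (it is not the model code used in our fits), and every parameter value in it is an assumption chosen to make the effect visible.

```python
# Illustrative sketch: Wiener diffusion with uniform start-point variability
# produces fast errors. All parameter values are arbitrary assumptions.
import numpy as np

def simulate_ddm(n_trials=4000, v=2.0, a=2.0, s_zr=1.2, s=1.0,
                 dt=1e-3, max_t=5.0, seed=1):
    """Simulate trials of a Wiener diffusion between boundaries 0 and a,
    with start points drawn uniformly from a range of width s_zr centred
    on a/2. Returns (rt, correct) for trials that terminated in time."""
    rng = np.random.default_rng(seed)
    z = rng.uniform((a - s_zr) / 2, (a + s_zr) / 2, n_trials)  # start points
    x = z.copy()
    rt = np.full(n_trials, np.nan)
    correct = np.zeros(n_trials, dtype=bool)
    active = np.ones(n_trials, dtype=bool)
    t = 0.0
    while active.any() and t < max_t:
        t += dt
        # Euler step for active trials: drift plus Gaussian diffusion noise
        x[active] += v * dt + s * np.sqrt(dt) * rng.standard_normal(active.sum())
        hit_upper = active & (x >= a)   # correct boundary
        hit_lower = active & (x <= 0)   # error boundary
        rt[hit_upper | hit_lower] = t
        correct[hit_upper] = True
        active &= ~(hit_upper | hit_lower)
    done = ~np.isnan(rt)
    return rt[done], correct[done]

rt, correct = simulate_ddm()
print("mean correct RT:", rt[correct].mean())
print("mean error RT:  ", rt[~correct].mean())
```

With these settings, mean error RT falls well below mean correct RT: errors come mostly from trials starting close to the error boundary, which is the signature pattern that start-point variability buys the standard DDM.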
Finally, for the drift and α varying model, we see the same pattern of changes in the α parameter across trial epochs in the high information condition (see lower panels of Fig. 5). However, there were no systematic changes in mean drift rate for any of the information conditions (see bottom panels of Fig. 7). Taken together, this suggests that the effects of α are not duplicating effects better attributed to variation in drift rate. Of note, for these data, Evans and Brown (2017) also found little support for changes in drift rate over epochs. Overall, we conclude that the α varying model was the best performing model we considered. Model selection based on both AIC and BIC further supports this conclusion (Table 2). Parameter estimates for each model are shown in Table 3.
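For readers unfamiliar with the fit indices in Table 2, AIC and BIC are simple penalized-likelihood scores (lower is better). The log-likelihood values, parameter counts, and trial count in the sketch below are hypothetical placeholders, not values from our analysis.

```python
# Standard AIC/BIC computation used in model selection.
# The model names and numbers here are hypothetical placeholders.
import math

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical example: two models fitted to n = 1000 trials.
models = {
    "alpha_varying": {"log_lik": -1520.0, "k": 9},
    "drift_variability": {"log_lik": -1531.0, "k": 9},
}
n = 1000
for name, m in models.items():
    print(name, "AIC:", aic(m["log_lik"], m["k"]),
          "BIC:", bic(m["log_lik"], m["k"], n))
```

Note that BIC penalizes parameters more heavily than AIC once n exceeds about eight observations, which is why the two indices can occasionally disagree on the preferred model.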

Discussion
Our primary aim with the current study was to investigate the assumptions made by the LF model regarding within-trial evidence accumulation dynamics. Specifically, we provided a critical investigation of the psychological plausibility of the sudden large jumps in evidence accumulation that the model allows via its α parameter. While previous work has shown that the LF model provides a good fit to data exhibiting patterns of fast errors (Voss et al., 2019), it is not clear whether this is simply due to the increased flexibility afforded by α, or if the LF model provides a more accurate characterization of the underlying psychological process. Voss et al. (2019) proposed that α might be interpreted as reflecting the stability or consistency with which an individual selects or executes a given decision strategy (e.g., jumps in evidence accumulation, due to α, may reflect sudden shifts in the source of evidence when people are less adept at parsing the stimulus). This conjecture was the subject of our investigation.
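To make the within-trial dynamics concrete, a single LF trial can be sketched as follows. This is an illustrative simulation rather than the implementation used in our analyses; the drift rate, boundary separation, and time step are assumed values, and the symmetric alpha-stable increments are generated with the Chambers-Mallows-Stuck transformation (valid for α ≠ 1). With α < 2, occasional increments are far larger than any Gaussian step, producing the "jumps" at issue here; at α = 2 the process reduces to Wiener diffusion.

```python
# Illustrative sketch (not the authors' implementation) of Lévy Flights
# evidence accumulation with symmetric alpha-stable within-trial noise.
import numpy as np

def stable_noise(alpha, size, rng):
    """Symmetric alpha-stable variates (beta = 0, alpha != 1) via the
    Chambers-Mallows-Stuck transformation."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

def lf_trial(alpha, v=1.0, a=2.0, dt=1e-3, max_t=5.0, seed=None):
    """One LF trial: accumulate v*dt plus alpha-stable noise (scaled by
    dt**(1/alpha), the stable analogue of sqrt(dt) scaling) until a
    boundary at +a/2 or -a/2 is crossed. Returns (rt, choice), with
    choice 1 for the upper boundary and 0 for the lower."""
    rng = np.random.default_rng(seed)
    x, t = 0.0, 0.0
    while t < max_t:
        t += dt
        x += v * dt + dt ** (1 / alpha) * stable_noise(alpha, 1, rng)[0]
        if x >= a / 2:
            return t, 1
        if x <= -a / 2:
            return t, 0
    return max_t, int(x > 0)  # rare non-terminated trial

rt, choice = lf_trial(alpha=1.5, seed=7)
print("RT:", rt, "choice:", choice)
```

A direct comparison of `stable_noise(1.3, ...)` against Gaussian draws shows the heavy tail immediately: the largest alpha-stable values in a moderately sized sample dwarf the largest Gaussian ones.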
We reasoned that if α could be interpreted psychologically along the lines proposed by Voss et al. (2019), we should be able to observe systematic changes in this parameter as people refine a decision strategy and become more experienced in executing it. That is, we might expect systematic increases in the α parameter as a function of practice. We reanalyzed practice data from a study by Evans and Brown (2017), whose participants showed different patterns of improvement as a function of how informative the performance feedback they received was. Our analysis of these data with the LF model showed that when α was free to vary as a function of practice, as in the α varying model, we saw systematic increases in α with practice. This increasing pattern was most evident in the high information condition, where participants received the most guidance on how to refine their decision strategy over time. Our result coheres well with the conjecture of Voss et al. (2019), as the high information condition would be where one would expect participants to have refined their decision strategy to the greatest degree.
We also considered whether the changes in α we observed could simply reflect a tradeoff with some other model parameter that is available to the standard DDM (viz., mean drift rate, or between-trial variability in drift rate, starting point, or non-decision time). Our analysis showed that while there is some overlap between changes in α and changes in trial-to-trial variability in drift rate, s_v, the changes in the drift rate variability parameter alone are insufficient to explain the benefits of allowing α to vary with practice. Notably, we found no systematic changes in the mean drift rate across trial epochs. The stability of drift rates was observed alongside changes in α in the high information condition, suggesting that α does not simply trade off with other diffusion model parameters when fit to choice data. It is worth noting some limitations of our study, such as computational limitations (i.e., having to aggregate trial blocks into epochs to avoid problems with parameter recovery) and, more importantly, a relatively low number of participants in the condition demonstrating variation in α as a function of practice. Specifically, there were only 11 participants in the high-information condition, necessitating caution in the interpretation of our result. Replicating these results with a larger sample will be important for establishing the robustness of interpreting α as a potential marker of decision heterogeneity within an individual. Another important point of caution relates to the variability in epoch-on-epoch changes in the α parameter. In both models that allowed α to vary across epochs (i.e., Model 1 and Model 5), there was considerable overlap in the distributions across epochs, particularly in the earlier parts of the task. The increases we observed were mostly restricted to the final trial epoch. The relatively noisy changes in α, including non-monotonic changes across the first two epochs, necessitate some caution in interpretation, though we do note that such changes are consistent with the notion of a greater variety of decision strategies/hypotheses an observer might pursue in the earlier parts of the task.

Lévy flights or Wiener diffusion?
Our analysis provides further preliminary support for using LFs as a model of human decision-making. Given the similarities between the LF model and the standard Wiener diffusion model (Ratcliff, 1978; Ratcliff & McKoon, 2008; Ratcliff & Rouder, 1998), it is reasonable to ask whether the LF model can tell us anything more about decision-making than what can be achieved through a standard diffusion model analysis. The answer to this question depends heavily on the extent to which accumulation noise in the decision process deviates from standard Gaussian assumptions. This will likely depend on the level of analysis one adopts when modeling the decision process. For example, at the lower level of spiking neurons, the noise will likely be more heavy-tailed than assumed by a Gaussian (e.g., Poisson noise provides a good characterization at this level; Wang, 2002). However, the aggregate properties of low-level models with non-Gaussian noise can be shown to be approximated well by diffusion models that assume Gaussian noise in evidence accumulation (Smith, 2010). It follows that at the level of overt behavior, standard diffusion models may be sufficient. A potential caveat, though, concerns situations where responses are collected from relatively novice observers or those who are inexperienced with a task. Here, the conjecture of Voss et al. (2019), that heavy-tailed accumulation noise might provide a better account, may provide additional insight. If observers who are novice at or inexperienced with a task are more variable in the response strategies they adopt, a LF analysis should show increased variability in accumulation noise via estimates of α < 2.
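The aggregation point above can be demonstrated directly: centred Poisson increments are visibly skewed, but summing them over even a modest window yields noise that is close to Gaussian, which is why a Wiener diffusion can describe behavior even when low-level neural noise is non-Gaussian. The spike rate and window size below are illustrative assumptions, not values from any neural model.

```python
# Illustrative demonstration (after the logic of Smith, 2010): aggregated
# centred Poisson noise approaches Gaussian. Rate and window are assumptions.
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                          # mean count per time bin
raw = rng.poisson(lam, 100_000) - lam              # centred Poisson noise
agg = (rng.poisson(lam, (100_000, 50)) - lam).sum(axis=1)  # 50-bin aggregate

def skewness(x):
    """Sample skewness: third central moment over variance**1.5."""
    x = x - x.mean()
    return (x ** 3).mean() / (x ** 2).mean() ** 1.5

# Poisson skewness is 1/sqrt(lam); aggregating n bins shrinks it by sqrt(n)
print("raw skewness:       ", skewness(raw))   # ~ 0.71
print("aggregated skewness:", skewness(agg))   # ~ 0.10
```

The same shrinkage applies to all higher cumulants, which is the central-limit mechanism that makes Gaussian-noise diffusion models adequate at the behavioral level.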
Indeed, our reanalysis of the Evans and Brown (2017) data support this. We suspect that as observers become more experienced with a task (or become more single-minded in the execution of a preferred response strategy), their performance will be better approximated by a standard Wiener diffusion model. With less experienced observers, a LF model may provide a good way of indirectly assessing variability in response strategy; something that typically requires analysis of verbal reports collected after the fact from participants. This sort of analysis opens the door to developing more explicit theories of the variety of decision strategies people may rely on when attempting a new task (e.g., by assuming a mixture of differently configured Wiener processes). It may also provide a principled first step towards developing mechanistic models of how one chooses among these different candidate strategies on a trial-by-trial basis (e.g., Rieskamp & Otto, 2006).
If it is the case that the LF model has the potential to provide insights that are not as directly attained through a standard diffusion model analysis, another question is whether LF analyses cast doubt on conclusions based on existing diffusion model analyses. That is, is there reason to mistrust processing insights gained by existing diffusion model analyses? To this, we think the answer is firmly no. Like other existing sequential sampling models, the LF model decomposes behavioral performance into latent variables corresponding to quality of evidence (drift rate), caution (boundary separation), and non-decision time. Donkin et al. (2011) found that application of two quite different model architectures, Ratcliff's diffusion model (Ratcliff & McKoon, 2008) and the linear ballistic accumulator (Brown & Heathcote, 2008), would lead to similar conclusions regarding the latent variables responsible for generating effects in data. The similarity of the LF model to the standard DDM leads us to believe that it is highly likely that inferences about whether an effect is driven by (say) quality of evidence versus (say) response caution will not depend strongly on which model is applied to data.
It is also important to consider the implications of the α parameter as an indicator of a mixture of decision strategies. Unless reliance on different strategies varies on a moment-to-moment basis, or, alternatively, the outputs of all strategies simultaneously feed into the decision stage, resulting in superimposed evidence (e.g., Lee & Sewell, 2023; Ulrich et al., 2015), the LF model might mischaracterize the properties of the component decision strategies. For example, the singular drift rate summarized by the LF model may not reflect the distribution of drift rates of different (say) Wiener diffusion processes associated with different strategies. Since α affects within-trial evidence accumulation dynamics as opposed to between-trial variability in strategy selection, the mixture of strategies is necessarily modeled as if it were a single aggregate strategy. We think it is reasonable to view selection of a decision strategy primarily as a between-trial phenomenon, rather than something that can vary on a moment-to-moment timescale, and so a LF analysis will still require further investigation into how different decision strategies ought to be characterized and selected and/or combined to fit a given data set. This issue similarly besets any DDM analysis of a data set where responses reflect a mixture of strategies. On balance, we view the potential insights that can be gained by a LF analysis as complementing insights already obtained through existing DDM analyses. In both cases, the results provide guidance on further theoretical work that needs to occur to provide a fuller characterization of psychological processing. For example, both DDM and LF analyses can shed light on how "theories of drift rates" need to be articulated. The LF model can potentially go further in demonstrating a need to develop a set of theories of drift rates to characterize different competing decision strategies that novices on a task might be selecting from.

Lévy flights models: current status and future directions
How then should we view the LF model, and the assumptions it makes regarding the nature of evidence accumulation? On the one hand, our application of the model to the practice data of Evans and Brown (2017) suggests that systematic changes in α unfold in a manner that is sensible and theoretically consistent with our understanding of how task performance improves as a function of practice. It is therefore tempting to conclude that the jumps in evidence accumulation implied by the LF model are indeed real, and that the α parameter indexes something akin to the "stability" of a decision strategy. We believe such a conclusion would be premature, as, on the other hand, there is a broader question of whether the stability in applying, selecting, or executing a decision strategy over the course of an experiment is best represented as a within-trial noise parameter controlling moment-to-moment perturbations in evidence accumulation, or if it could be better characterized in some other way. If refinements of decision strategy serve to limit the noise that enters the decision process (perhaps by allowing observers to focus attention on the most relevant or diagnostic parts of a stimulus, or by rendering the observer less vulnerable to momentary distraction), then it is perhaps appropriate to interpret α as an index of decision stability. If, however, the evolution of decision strategies is more consistent with a process of selecting and testing candidate decision hypotheses, it would be more sensible to consider decision stability as a phenomenon that is best understood as occurring across trials, rather than varying over the course of a single decision within a trial. If decision instability is more akin to discrete selection of candidate response strategies across trials, then the early stages of practice, where estimates of α are low, may be better characterized by a probability mixture of Wiener diffusion processes, characterized by drift rate distributions with different means and standard deviations. These more detailed comparisons await future research. A logical next step from the current research would be to examine other data sets that have been used to assess learning- or experience-related changes in diffusion model parameters over time (e.g., Dutilh et al., 2009; Lerche & Voss, 2017) to see if similar changes in α arise there as well. One particularly fruitful direction for future research is to better characterize the relationship between the LF α parameter and existing parameters in the DDM. Our comparison of different model variations above suggests that the effects of the α parameter are not duplicating the effects of other between-trial variability parameters that are commonly included in the DDM (viz., trial-to-trial variability in drift rates, starting point, and non-decision time). A cross-fitting exercise similar to the one conducted by Donkin et al. (2011) would clarify the relationship between α and other more established model parameters, while also providing suggestions about the kinds of signatures in data that α might be especially sensitive to (e.g., patterns of fast errors, as described by Voss et al., 2019).
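The probability-mixture idea above can be sketched with a toy generative model in which each trial draws its drift rate from one of two strategy-specific distributions and then runs an ordinary Wiener diffusion within the trial. The mixture weight and the two drift distributions below are arbitrary assumptions chosen for illustration only, not a fitted account of the Evans and Brown (2017) data.

```python
# Toy sketch of an across-trial mixture of Wiener processes: strategy
# selection varies between trials, accumulation within a trial is Gaussian.
# Mixture weight and drift distributions are arbitrary assumptions.
import numpy as np

def mixture_wiener(n_trials=200, a=2.0, s=1.0, dt=1e-3, max_t=5.0, seed=3):
    """Each trial samples a 'strategy': with prob .7 a well-tuned strategy
    (drift ~ N(1.5, .3)), with prob .3 a poorly tuned one (drift ~ N(0, 1)).
    Returns response times and upper-boundary choices."""
    rng = np.random.default_rng(seed)
    well_tuned = rng.random(n_trials) < 0.7
    v = np.where(well_tuned,
                 rng.normal(1.5, 0.3, n_trials),   # dominant strategy
                 rng.normal(0.0, 1.0, n_trials))   # exploratory strategies
    rts, upper = [], []
    for vi in v:
        x, t = 0.0, 0.0
        while abs(x) < a / 2 and t < max_t:
            t += dt
            x += vi * dt + s * np.sqrt(dt) * rng.standard_normal()
        rts.append(t)
        upper.append(x >= a / 2)
    return np.array(rts), np.array(upper)

rts, upper = mixture_wiener()
print("mean RT:", rts.mean(), "P(upper boundary):", upper.mean())
```

Fitting a single LF (or DDM) process to data generated this way is exactly the aggregation scenario discussed above: the estimated drift rate summarizes a mixture, and low α estimates may partly absorb the between-trial heterogeneity.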
Given the open question of whether the heavy-tailed accumulation noise described by α best reflects between- or within-trial processing dynamics, there is an urgent need to understand moment-to-moment volatility of people's decision strategies and/or the extent to which people can maintain focus on a single source of evidence while making a decision. Attentional selection paradigms might present an ideal setting for exploring this question, as both relevant and irrelevant information is presented, requiring the observer to filter out the irrelevant information in order to respond correctly. Andersen et al. (2009) used a dot motion task where dots of two different colors were spatially overlaid with one another. In order to report the correct direction of motion, observers needed to selectively ignore the irrelevant colored dots. If α is sensitive to people's capacity to focus attention on a single source of perceptual evidence, individual differences in this parameter should correlate with other assays of attentional control.
Another future direction for research will be to examine how multiple sources of information might influence decision-making in non-perceptual domains. Wang et al. (2022) have suggested that jumps in evidence accumulation may reflect parallel activation of multiple information sources from episodic memory. This idea can be investigated in more detail, for example, in learning tasks where information in memory about multiple stimuli must be combined to form a decision. In these contexts, the LF model might be used to identify within-trial dynamics in the influence of multiple memory-based sources of evidence on decision-making (e.g., information specific to the stimulus vs. information about the task structure as a whole). The extent to which different parallel streams of information are being accumulated will highlight the utility of the LF model as a tool for assessing task-related processing.

Conclusion
In this article, we investigated the conjecture of Voss et al. (2019) that the LF α parameter might provide a measure of decision stability. We observed systematic changes in α as a function of practice in a data set previously published by Evans and Brown (2017). While the question of exactly what α corresponds to psychologically remains open, for now, we think it is prudent to conclude that the LF α parameter reflects a psychologically meaningful construct. However, future research will need to carefully investigate whether the construct is best represented in formal models of decision-making as a between- or within-trial phenomenon. Our analysis of practice data using the LF model suggests that more diagnostic tests of the model may require participants who are less practiced on tasks, as individuals in the early stages of practice will be most likely to be trying to discern the best way to perform and will be the most open to trying a number of different approaches.

Fig. 1
Fig. 1 A schematic view of the diffusion decision model. The jagged lines depict evidence accumulation trajectories driven by different stimuli across different trials. The darker lines correspond to higher-quality stimuli that produce drift rates that deviate further from zero.

Fig. 2
Fig. 2 Plot of sample paths for the Lévy Flights model for different values of α. In the left panel, the noise distribution of the accumulation process is presented for different values of α. The right panels show sample accumulation paths under different values of α. When α = 2, accumulation noise is Gaussian, and the model is equivalent to the standard Wiener diffusion model.

Fig. 5
Fig. 5 Box plots depicting changes in the α parameter across trial epochs for the α varying model (top panels) and α & drift varying model (bottom panels) in the High (left panels), Medium (middle panels), and Low Information (right panels) conditions. For the α varying model, averaged across participants, the α parameter increases by 17%, 17.5%, and 2.2% in the last trial epoch relative to the first. Similarly, for the α & drift varying model, averaged across participants, α in the final trial epoch is increased by 9.7%, 0.5%, and 3.5% relative to the first, for the High, Medium, and Low Information conditions respectively.

Fig. 6
Fig. 6 Box plots depicting changes in the decision threshold (a) parameter across trial epochs for the α varying model (top panels) and the drift and α varying model (bottom panels) in the High (left panel), Medium (middle panel), and Low Information (right panel) conditions. For the α varying model, averaged across participants, the decision threshold decreases by 60.7%, 33.4%, and 20.7% in the last trial epoch relative to the first.

Fig. 7
Fig. 7 Box plots depicting changes in between-trial variability parameters of the drift variability model (top panels), start-point variability model (second row of panels), non-decision time variability model (third row of panels), and the drift and α varying model (bottom panels) for the High (left column of panels), Medium (middle column of panels), and Low Information (right column of panels) conditions across trial epochs.

Table 2
Fitting performance of all models. The best performing model for each fit index (log likelihood, AIC, and BIC) is shown in bold. Note: Model 1 = α Varying Model; Model 2 = Drift Variability Model; Model 3 = Start-Point Variability Model; Model 4 = Non-Decision Time Variability Model; Model 5 = Drift & α Varying Model.

Table 3
Estimated parameters of the models for each condition. Mean parameter estimates across participants are reported, along with the variance in estimates in parentheses.