Mindfulness is a psychological mode characterized by full attention to present-moment experience without conceptual elaboration or emotional reactivity. Mindfulness training (MT), which can involve engaging in daily mindfulness exercises, taking a multiweek course, or participating in an intensive retreat, may be used to cultivate this mental mode. MT has been well-studied in clinical and health settings (see, e.g., Brown & Ryan, 2003), and there is growing evidence that MT is helpful for stress reduction, as well as improving mood and well-being (see Baer, Smith, Hopkins, Krietemeyer & Toney, 2006). Since exercises that engage attention are central to most MT protocols, a prominent hypothesis is that MT may improve aspects of attention (see Lutz, Slagter, Dunne & Davidson, 2008). Recent studies have further suggested that affective improvements, which are well-reported with MT, may be mediated by improvements in nonaffective core cognitive-control operations of attention and working memory (see Goldin & Gross, 2010; Jha, Stanley, Kiyonaga, Wong & Gelfand, 2010).

A growing number of studies manipulating aspects of attention have reported results consistent with the hypothesis that MT improves attentional control (see Lutz et al., 2008, for a review). Several studies have employed speeded response time (RT) tasks, such as the attention network test (Jha, Krompinger & Baime, 2007; Tang et al., 2007; van den Hurk, Giommi, Gielen, Speckens & Barendregt, 2010; see also Lutz et al., 2008). On subsets of trial types, RTs are faster in those who receive MT relative to control participants, and these RT improvements have been taken as evidence for improved attentional orienting (Jha et al., 2007; van den Hurk et al., 2010), executive control (Chan & Woollacott, 2007; Tang et al., 2007; van den Hurk et al., 2010; Wenk-Sormaz, 2005), and alerting (Jha et al., 2007). Given the well-established interrelationship between attention and working memory (see Jha, 2002), it is perhaps not surprising that recent studies have reported MT-related improvements in working memory as well (Chambers, Lo & Allen, 2008; Jha et al., 2010; Kozhevnikov, Louchakova, Josipovic & Motes, 2009; Zeidan, Johnson, Diamond, David & Goolkasian, 2010).

While there is growing evidence of MT-related improvements in behavioral tasks manipulating attention and working memory, it is not clear what aspects of information processing and decisional processes might be most sensitive to improvements in attentional control. For example, increased attentional control could reduce one’s tendency for response conservativeness. Alternatively, greater attentional control may promote a more deliberative quality to each decision that could improve accuracy and reduce variability but make overall RTs slower. Thus, RT and accuracy measures alone are insufficient for clarifying how MT may alter information processing. In the present study, we employ a mathematical modeling analysis strategy to more fully investigate the hypothesis of improved attentional control with MT.

Performance in any cognitive task in which the participant has to make simple judgments can be described by the drift diffusion model (DDM; Ratcliff, 1978), which models decisions in terms of evidence accumulation. Evidence for each decision alternative is accumulated over time, where the speed of evidence accumulation is determined by the quality of information present in the stimuli, and the individual’s capacity to extract that information to inform appropriate response execution. This latent variable is referred to as the “drift rate.” As soon as the accumulated evidence exceeds a decision threshold, the participant emits the corresponding response. In the case of a delayed-recognition task, as used herein to investigate working memory, this would be “yes” when the probe item is identical to a study list item, and “no” otherwise. Thus, the larger the model-derived drift rate parameter, the higher the quality of the information that is accumulated; in other words, if probe and study items are easily distinguishable as either identical or different, the drift rate will be high. If the information is more ambiguous, the drift rate will be low.

Whereas the drift rate corresponds to the quality of participants’ information accumulation (e.g., modulated by stimulus quality and attention), the “decision boundary” parameter of the model captures the participant’s speed–accuracy trade-off. A participant who favors accuracy continues to accumulate evidence for a long time before deciding, which corresponds to a high decision boundary. Conversely, a participant who favors speed has a lower decision boundary, which causes RTs to be short and task accuracy to be potentially compromised. The remaining factors, including postdecisional motor latencies, are captured by the nondecision time parameter.
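The roles of the three parameters can be illustrated with a small simulation. The following is a minimal sketch, not the procedure used in this study: it assumes an unbiased starting point halfway between the boundaries, a within-trial noise scale of 0.1, and a simple Euler discretization with a 1-ms step. The function names and default values are hypothetical.

```python
import math
import random

def simulate_ddm_trial(drift, boundary, nondecision, rng,
                       noise_sd=0.1, dt=0.001, max_time=10.0):
    """Simulate one two-choice trial; return (correct, rt_in_seconds).

    Assumptions (not from the source): evidence starts at boundary / 2,
    the upper boundary is the correct response, and noise_sd = 0.1.
    """
    evidence = boundary / 2.0
    elapsed = 0.0
    step_sd = noise_sd * math.sqrt(dt)
    # Accumulate noisy evidence until a boundary is crossed (or time runs out).
    while 0.0 < evidence < boundary and elapsed < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
    correct = evidence >= boundary  # a timeout is scored as an error
    return correct, elapsed + nondecision

def simulate_ddm(drift, boundary, nondecision, n_trials, seed=0):
    """Return (accuracy, mean RT) over n_trials simulated trials."""
    rng = random.Random(seed)
    trials = [simulate_ddm_trial(drift, boundary, nondecision, rng)
              for _ in range(n_trials)]
    accuracy = sum(1 for correct, _ in trials if correct) / n_trials
    mean_rt = sum(rt for _, rt in trials) / n_trials
    return accuracy, mean_rt

if __name__ == "__main__":
    print("high drift:", simulate_ddm(0.3, 0.12, 0.3, 300))
    print("zero drift:", simulate_ddm(0.0, 0.12, 0.3, 300))
    print("high boundary:", simulate_ddm(0.1, 0.20, 0.3, 200))
    print("low boundary:", simulate_ddm(0.1, 0.08, 0.3, 200))
```

In such a simulation, raising the drift rate should raise accuracy, raising the boundary should lengthen RTs, and the nondecision time simply shifts every RT by a constant.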

While many studies have examined the influence of specific experimental manipulations on the drift rate (e.g., changing perceptual salience; Ratcliff, 2002) or decisional processes (e.g., emphasizing speed vs. accuracy; Ratcliff, 2002), only a few studies have investigated individual differences in diffusion model parameters while holding task parameters constant. For example, Madden et al. (2009) and Spaniol, Madden and Voss (2006) found that aging preferentially decreases drift rate estimates, while Green, Pouget and Bavelier (2010) found that intensive action-game playing increased drift rates. Other studies have demonstrated age-related differences in decision boundaries to account for performance differences between younger and older adults (e.g., Ratcliff, Thapar, Gomez & McKoon, 2004a; Ratcliff, Thapar & McKoon, 2001; 2004). In addition to aging, Ratcliff, Thapar and McKoon (2006) examined the impact of task-specific training on diffusion model parameters. They found that training improved participants’ ability to perceive fine details of the task stimuli, consistent with changes in drift rate. Training did not alter decisional processes, as indicated by comparable decision boundary estimates before and after training.

Thus, individual differences and training-related effects are tractable with mathematical modeling, warranting its further use to examine mechanisms of action of MT. Perhaps most relevant to an investigation of MT using a modeling approach is the theoretical account by Smith and Ratcliff (2009), who proposed that attention increases the efficiency with which a working memory trace is formed, either by increasing the rate at which memory-related information is transferred to a temporary store or by reducing the delay before the memory trace formation begins. In line with this view, they demonstrated that modeling of several manipulations of attention produced increases in the drift rate for information that was attended versus unattended.

Herein, we use the DDM in combination with a model based on stimulus identities to investigate the mechanisms of action of MT in the context of a visual working memory task. If MT improves attentional control and the efficiency with which attentional allocation can be used in the service of working memory, the theoretical account of Smith and Ratcliff (2009) predicts that drift rates may increase with MT, without any changes in nondecision time.

We examined drift rate, decision boundary, and nondecision time using the EZ-diffusion model (Wagenmakers et al., 2007). Our participants were an MT-experienced cohort who participated in an intensive one-month MT retreat and an age- and education-matched control group who had no prior experience with MT. Both groups performed a delayed recognition working memory task for complex and highly confusable faces at two time points, corresponding to the beginning and end of the MT retreat.

We investigated three main issues. First, would participation in the retreat improve task accuracy, RTs, or response variability? Given past research reporting decreases in RT and its variability during attention tasks following intensive MT retreats (Jha et al., 2007; Lutz et al., 2009; Slagter et al., 2007), the strong links between attention and working memory (see Jha, 2002), and recent research demonstrating MT-related improvements in working memory (Chambers et al., 2008; Jha et al., 2010), we predicted that the mean and variance of RTs would decrease for the MT group on this working memory task.

Second, we predicted that this decrease in RT mean and variance might be due to an improvement in perceptual evidence accumulation with MT. Stimulus-focused attention, which may be bolstered with MT (see Jha et al., 2007; van den Hurk et al., 2010), increases the quality of the perceived stimulus (e.g., Downing, 1988) and its remembered representation (e.g., Palmer & Ames, 1992). In addition, previous modeling studies reported that manipulations of attention during working memory tasks increase drift rate (Smith & Ratcliff, 2009). We therefore predicted that drift rate parameters would increase with MT, corresponding to the improved distinguishability of memory representations. Furthermore, because of the negative correlation between drift rate and decision boundaries (Bogacz, Brown, Moehlis, Holmes & Cohen, 2006), we predicted that decision boundary decreases would be observed with MT. Given past research that attentional allocation does not alter nondecision time, we predicted that there would be no change in nondecision time (i.e., pre- and postdecisional processes).

Third, we examined the origin of increased drift rates more closely with a similarity-based model of working memory (Kahana & Sekuler, 2002), the noisy exemplar model (NEMo). This model describes how people decide whether they have seen a stimulus before on the basis of the similarities of the mental representations of probe items to all study-list items. If the sum of these similarities exceeds a decision threshold, participants will respond that they have seen the item before, otherwise they will say that they have not. The sum of similarities is weighted depending on their serial position, such that early list items contribute less than later items do, an effect that is captured by a forgetting parameter. Furthermore, the decision threshold is adjusted by the overall homogeneity of interitem similarities within the list. Distances between items are scaled by a generalization gradient that, when large, indicates that a participant will still be able to distinguish stimuli even when they are very similar. According to NEMo, the mental representations of the studied list stimuli are corrupted by noise, the magnitude of which is tuned by another model parameter.

In this model, performance could improve through (1) a decrease in forgetting of early list items, (2) optimization of the influence of list homogeneity on performance, (3) a decrease in the generalization gradient, and (4) a decrease in encoding noise. If drift rate increases and RT and its variability decrease as a result of MT, we predicted that perceptual noise would decrease, but that there would be no change in the rate of forgetting stimuli. As participants in MT learn to focus more on the task, they have fewer lapses of attention, which are one of the main sources of encoding noise. This improvement in attention with MT may lead to improved perceptual abilities, which is also claimed in the traditional Buddhist literature. We therefore predicted that long-term MT practitioners would show a decrease in the generalization gradient, but that these changes might not be tractable with only short-term intensive training and might instead be observable only after many years of MT experience. We did not envisage changes in the forgetting parameters, because MT involves focusing on the present moment, rather than remembering the past, which is what the forgetting parameters reflect.


MT intervention

We examined the effects of MT by comparing delayed-recognition task performance before and after a month-long intensive training program that was based largely on the Four Foundations of Mindfulness (the Satipatthana Sutta in the Buddhist canon). Training took place at Shambhala Mountain Center. The participants engaged in MT exercises that aimed to cultivate concentrative and receptive attention for 10–12 h a day. In the first 2 weeks, the participants focused mainly on their breath, and in the second 2 weeks, they opened up their attention and added practices that cultivated compassion and loving kindness.


A group of 29 participants (12 female, 17 male, ages 21–70) participated in the retreat. They had practiced MT on average 1,792 h (SEM = 261) during their lives, ranging from 100 to more than 5,000 h. To account for test–retest effects such as improvements due to task-specific training (see Ratcliff et al., 2006), we tested a group of 29 age-, gender-, and education-matched controls at two time points separated by one month. All procedures were in accord with the requirements of the Institutional Review Board of the University of Pennsylvania.


Each participant performed a delayed-recognition task (Sternberg, 1966) with synthetic face stimuli. These stimuli were created so they occupied known positions in an abstract similarity space (Wilson, Loffler & Wilkinson, 2002) and are described in more detail elsewhere (Pantelis, van Vugt, Sekuler, Wilson & Kahana, 2008; van Vugt, Sekuler, Wilson & Kahana, submitted). Each of the three face stimuli in the list was shown for 700 ms, with a 300-ms interstimulus interval. The three study-list items were followed by a 3,000-ms delay, after which a probe item appeared (see Fig. 1). Participants had to indicate whether the probe was identical to one of the list items by pressing a “yes” button with their left index finger or a “no” button with their right index finger. They were instructed to be both fast and accurate. The experiment was programmed in E-Prime (Psychology Software Tools, Pittsburgh, PA). Task performance was quantified by accuracy, median RT, and RT variability measures, which were all statistically evaluated with nonparametric ANOVAs (Brunner & Puri, 2001). We used nonparametric ANOVAs because task performance was heterogeneous across participants, and the normality assumptions for a regular ANOVA were violated because of the wide range in age, education, and computer experience of the participants. To investigate the possible contribution of lifetime hours of meditation experience to our results, we conducted a regression on Time 1 (T1) and Time 2 (T2) performance data (accuracy and RT), correcting for age. Trials with RTs more than two standard deviations above the mean were removed from the analysis, leaving on average 144 trials per time point per participant for modeling and analysis.
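The trial-exclusion rule and the summary measures described above can be sketched as follows. This is an illustrative sketch only: it assumes the cutoff is computed per participant and time point from the sample standard deviation, which the text does not specify, and the function names are hypothetical.

```python
import statistics

def trim_slow_trials(rts, n_sd=2.0):
    """Drop RTs more than n_sd standard deviations above the mean.

    Assumption: mean and SD are taken over the same list of RTs
    (e.g., one participant at one time point).
    """
    cutoff = statistics.mean(rts) + n_sd * statistics.stdev(rts)
    return [rt for rt in rts if rt <= cutoff]

def summarize_rts(rts):
    """Return trial count, median RT, and RT variance after trimming."""
    kept = trim_slow_trials(rts)
    return {
        "n_trials": len(kept),
        "median_rt": statistics.median(kept),
        "rt_variance": statistics.variance(kept),
    }

if __name__ == "__main__":
    example_rts = [0.50, 0.60, 0.55, 0.65, 0.60, 0.58, 0.62, 0.57, 3.00]
    print(summarize_rts(example_rts))
```

Only the slow tail is trimmed here, matching the one-sided rule in the text; fast outliers are left in place.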

Fig. 1 Schematic of the visual working memory paradigm

Drift diffusion model

We estimated each individual’s drift rate, decision boundary, and nondecision time using the EZ-diffusion model. The EZ-diffusion model (Wagenmakers et al., 2007) is a simplified version of Ratcliff’s (1978) diffusion model that is particularly well-suited for modeling individual differences (van Ravenzwaaij & Oberauer, 2009). The EZ-diffusion model reduces the seven-parameter drift diffusion model of decision making to three parameters: drift rate, which is the rate at which information accumulates; decision boundary, the point at which enough information is accumulated to make a decision; and nondecision time.
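For readers unfamiliar with the EZ-diffusion model, the sketch below implements the closed-form equations published by Wagenmakers et al. (2007), which map the proportion correct, the RT variance, and the mean RT onto the three parameters. It is an illustration rather than the analysis code used here: it assumes RTs in seconds, the conventional scaling parameter s = 0.1, and no edge correction for accuracies of exactly 0, .5, or 1. The function name is hypothetical.

```python
import math

def ez_diffusion(prop_correct, rt_variance, rt_mean, s=0.1):
    """Return (drift rate, boundary separation, nondecision time).

    Closed-form EZ-diffusion equations (Wagenmakers et al., 2007).
    RT variance and mean are assumed to be in seconds; s is the scaling
    parameter (0.1 by convention).
    """
    if not 0.0 < prop_correct < 1.0 or prop_correct == 0.5:
        raise ValueError("prop_correct must lie in (0, 1) and differ from 0.5; "
                         "apply an edge correction first")
    logit = math.log(prop_correct / (1.0 - prop_correct))
    x = logit * (logit * prop_correct ** 2 - logit * prop_correct
                 + prop_correct - 0.5) / rt_variance
    drift = math.copysign(1.0, prop_correct - 0.5) * s * x ** 0.25
    boundary = s ** 2 * logit / drift
    y = -drift * boundary / s ** 2
    mean_decision_time = (boundary / (2.0 * drift)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))
    nondecision = rt_mean - mean_decision_time
    return drift, boundary, nondecision

if __name__ == "__main__":
    # Worked example: 80.2% correct, RT variance 0.112 s^2, mean RT 0.723 s.
    print(ez_diffusion(0.802, 0.112, 0.723))
```

With these inputs the equations give a drift rate of about 0.100, a boundary separation of about 0.140, and a nondecision time of about 0.300 s.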

Noisy exemplar model

We further probed into the nature of the mental representations of the to-be-remembered items by estimating model parameters for NEMo (Kahana & Sekuler, 2002). This model predicts accuracy in delayed-recognition tasks on the basis of the similarity coordinates of the stimuli. The similarity coordinates of our stimuli had been obtained in a previous study (Pantelis et al., 2008). NEMo posits that a participant bases his or her decision on the sum of the similarities (S) between the probe (p) and all of the list items (s_i) for a study list of list length LL (in our case, LL = 3):

$$ S = \sum_{i=1}^{LL} \alpha_i e^{-\tau \left| p - \left( s_i + \sigma \right) \right|} + \beta \sum_{i=1}^{LL-1} \sum_{j=i+1}^{LL} e^{-\tau \left| \left( s_i + \sigma \right) - \left( s_j + \sigma \right) \right|} $$

In this equation, S is the summed similarity, α_i is a forgetting parameter, as described below, and τ determines the form of the exponential generalization gradient (Shepard, 1987). The exponential generalization gradient quantifies how different two stimuli have to be for the participant to see them as different. A larger value of τ indicates that stimuli need to differ less in their similarity for the participant to see them as different. The list items s_i and the probe item p are represented by vector coordinates in an abstract similarity space, and |x − y| denotes the Euclidean distance between any two stimuli x and y.

NEMo assumes that every stimulus is encoded with some amount of noise, implemented by a zero-mean Gaussian with standard deviation σ. This parameter is referred to as “encoding noise.” To simulate forgetting, NEMo assumes that the most recent stimulus contributes the most to the summed similarity, and that earlier items contribute less and less. This is simulated by the parameter α_i, which decreases with lag (i = 1 indicating the most recently studied item). Finally, β determines how strongly a person adapts their recognition decision threshold based on the homogeneity of the similarities between the stimuli comprising the study list. When S exceeds a decision threshold determined by maximum likelihood, the simulated participant responds yes on that trial, and otherwise responds no.
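The summed-similarity computation described above can be sketched for a single simulated trial as follows. This is a minimal illustration, not the fitting code used in the study: it assumes that each study item receives one independent Gaussian noise draw per coordinate, which is then used in both terms of the equation, and that the weights in `alphas` are supplied in the same order as the study items. All function and argument names are hypothetical.

```python
import math
import random

def euclidean(x, y):
    """Euclidean distance between two points in similarity space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def summed_similarity(probe, study_items, alphas, tau, beta, sigma, rng=None):
    """Summed similarity S for one simulated trial.

    Assumption: encoding noise is drawn once per study item and coordinate
    from a zero-mean Gaussian with standard deviation sigma.
    """
    rng = rng or random.Random()
    noisy = [tuple(c + rng.gauss(0.0, sigma) for c in item) for item in study_items]
    # Probe-to-item similarities, weighted by the forgetting parameters.
    probe_term = sum(alpha * math.exp(-tau * euclidean(probe, item))
                     for alpha, item in zip(alphas, noisy))
    # Pairwise inter-item similarities (list homogeneity), scaled by beta.
    homogeneity = sum(math.exp(-tau * euclidean(noisy[i], noisy[j]))
                      for i in range(len(noisy))
                      for j in range(i + 1, len(noisy)))
    return probe_term + beta * homogeneity

def responds_yes(summed, threshold):
    """Simulated response: yes when S exceeds the decision threshold."""
    return summed > threshold

if __name__ == "__main__":
    study = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    s_value = summed_similarity((0.0, 0.0), study, [1.0, 0.8, 0.6],
                                tau=1.0, beta=-0.5, sigma=0.1,
                                rng=random.Random(0))
    print(s_value, responds_yes(s_value, threshold=1.0))
```

Averaging the yes/no outcome over many noise draws for the same trial gives a predicted response probability that can be compared with observed accuracy.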

This simplified version of NEMo, incorporating only five free parameters (van Vugt et al., submitted), was fit to the behavioral data using a genetic algorithm (Mitchell, 1996), which was run for 10 generations of 1,000 individuals (inspection of the generation in which the best fit was found indicated that 10 generations were sufficient). The genetic algorithm tries to minimize the root-mean square difference (RMSD) between observed and predicted accuracy, in conjunction with the difference between the observed and predicted d' (both accuracy and d' contributed equally to the RMSD). We used both d' and RMSD as error measures in order to force the algorithm to focus on solutions in which both targets and lures were predicted well. If either targets or lures are overpredicted, then the predicted d' will be very far from the empirical d', even if the RMSD is still relatively low.
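One way to combine the two error measures is sketched below. The text states only that accuracy and d' contributed equally, so the equal-weight average used here is an assumption, as are the function names and the clamping of extreme hit and false-alarm rates.

```python
import math
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate, eps=1e-3):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates are clamped to [eps, 1 - eps] so the z-transform stays finite
    (an assumption; the source does not describe a correction).
    """
    z = NormalDist().inv_cdf
    hit = min(max(hit_rate, eps), 1.0 - eps)
    fa = min(max(false_alarm_rate, eps), 1.0 - eps)
    return z(hit) - z(fa)

def rmsd(observed, predicted):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))

def fit_error(observed_acc, predicted_acc, observed_dprime, predicted_dprime):
    """Equal-weight combination of accuracy RMSD and the d' discrepancy."""
    return (0.5 * rmsd(observed_acc, predicted_acc)
            + 0.5 * abs(observed_dprime - predicted_dprime))

if __name__ == "__main__":
    observed = [0.80, 0.70, 0.90]
    predicted = [0.75, 0.72, 0.88]
    print(fit_error(observed, predicted, d_prime(0.8, 0.2), d_prime(0.75, 0.25)))
```

Because d' depends on both the hit rate and the false-alarm rate, a parameter set that fits targets well but lures poorly is penalized even when the accuracy RMSD alone looks acceptable.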

Because of the inherent stochasticity of the algorithm, a genetic algorithm will not easily get stuck in a local optimum. This stochasticity therefore obviates the need to run the algorithm multiple times with different starting points, since there will always be individuals in the population who are away from the current minimum.
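The general structure of such a search can be sketched with a small, generic genetic algorithm. This is not the implementation used in the study: the selection scheme (keeping an elite fraction), uniform crossover, Gaussian mutation, and all default values are assumptions made for illustration, and the sketch presumes a deterministic error function.

```python
import random

def genetic_minimize(error_fn, bounds, pop_size=200, generations=10,
                     elite_frac=0.1, mutation_sd=0.1, seed=0):
    """Minimize error_fn over a box given by bounds = [(lo, hi), ...].

    Returns (best_parameters, history), where history holds the best error
    at the start of each generation plus the final best error.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    n_elite = max(1, int(elite_frac * pop_size))
    history = []
    for _ in range(generations):
        population.sort(key=error_fn)
        history.append(error_fn(population[0]))
        elite = population[:n_elite]  # survivors are carried over unchanged
        children = []
        while len(children) < pop_size - n_elite:
            mother, father = rng.choice(elite), rng.choice(elite)
            child = []
            for gene_m, gene_f, (lo, hi) in zip(mother, father, bounds):
                gene = rng.choice((gene_m, gene_f))             # uniform crossover
                gene += rng.gauss(0.0, mutation_sd * (hi - lo))  # mutation
                child.append(max(lo, min(hi, gene)))             # stay in bounds
            children.append(child)
        population = elite + children
    best = min(population, key=error_fn)
    history.append(error_fn(best))
    return best, history

if __name__ == "__main__":
    # Toy error surface with its minimum at (1, -2).
    best, history = genetic_minimize(
        lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
        bounds=[(-5.0, 5.0), (-5.0, 5.0)])
    print(best, history[-1])
```

Inspecting `history` corresponds to the check mentioned above: if the best error stops improving well before the last generation, the chosen number of generations was probably sufficient.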


Median accuracy and response times

We first examined whether basic task performance changed as a function of intensive MT. Median RTs and accuracy are reported in Table 1. There was no significant difference between the MT and control groups in either median accuracy or RTs at T1 (ranksum test z = 0.82 for accuracy, z = 0.51 for RT, n.s.). Median accuracy did not differ between groups at T2, the main effects of time and group were not significant, and there was no interaction between time and group (p = .13). As we had predicted, there was a main effect of time on median RT, which decreased between T1 and T2 (main effect of time, p = .0001; main effect of group, p = .30; see Fig. 2a). There was a trend toward a significant interaction between time and group (p = .069), indicating that RTs decreased more for the MT than for the control group. If MT results in a reduction in RTs, then we predict that median RTs should be faster in participants with more MT experience upon entering the retreat. Indeed, median RTs at T1 were faster with more lifetime hours of MT practice, after being adjusted for age (partial regression coefficient for regression of median RT on MT practice: −.048, p = .05). There was no significant relationship between the change in median RTs as a function of MT and the lifetime hours of MT practice (partial regression coefficient for regression of difference in median RT between T1 and T2 and lifetime MT practice: −.027, n.s.), indicating that lifetime MT experience did not predict how much effect the retreat would have on RT. There was also no significant relationship between lifetime MT practice and either median accuracy at T1 or retreat effects on accuracy.

Table 1 Behavioral performance
Fig. 2 (a and b) Changes in the means (a) and variance (b) of z-transformed response times (RTs) for the mindfulness training (MT) and control groups. Error bars reflect the standard errors of the means. (c) Correlation between variance in RTs and hours of MT experience at T1, the start of the retreat

Response time variability

There was no difference between groups in the variance of RTs at T1 (ranksum z = 0.61, n.s.). We found that the variance in RTs was reduced from T1 to T2 (main effect of time, p = .0041) and that this effect was greater for the MT group than for controls (significant interaction between time and group, p = .0065; see Fig. 2b), but there was no main effect of group. Thus, the MT group had a lower RT variance at T2 relative to the control group. Investigation of lifetime hours of MT on RT variance revealed that the variance in RTs at T1 also showed a trend toward being lower with greater lifetime hours of MT practice, when adjusted for age (slope for regression of RT variability on meditation experience: −.039, p = .090; see also Fig. 2c). To determine the impact of lifetime hours of practice on the magnitude of retreat-related changes in RT variability for the MT group, a similar regression analysis was performed. There was a trend toward a relation between the change in RT variability between T2 and T1 and lifetime hours of MT practice (slope when corrected for age: −.0299, p = .086). In other words, prior experience with MT and the effects of the retreat interacted with each other. With more lifetime hours of MT, RT variability was lower, and the degree of benefit from attending the retreat for lowering RT variability was diminished with greater lifetime hours of MT. Thus, the retreat was more effective in reducing RT variability in MT practitioners with few lifetime hours of preretreat MT experience. We observed a correlation of .82 (p = 10^−30) between RT variability and its mean, similar to what has been observed in previous studies (Wagenmakers & Brown, 2007).

Modeling results

We used the EZ-diffusion model (Wagenmakers et al., 2007) to assess MT-related changes in nondecision time, drift rate, and decision boundary. First, as we predicted, we found no change in nondecision time between T1 and T2 (median at T1: MT, 694 ms; control, 791 ms; median at T2: MT, 726 ms; control, 762 ms), as revealed by a nonparametric ANOVA that did not show any significant main effects (group, p = .30; time, p = .31) or interactions (p = .76). Our main prediction was that drift rate would increase with MT. Indeed, we found an increase in the drift rate (the model parameter corresponding to information quality or stimulus distinguishability) for the MT group (T1, 0.0473; T2, 0.0570) but not for the control group (T1, 0.0479; T2, 0.0459; Fig. 3b). A nonparametric ANOVA indicated no main effect of time or group (time, p = .22; group, p = .94), but did show a significant interaction between time and group (p = .037). There was no difference in drift rates between groups at T1 (ranksum z = −0.73, n.s.), nor was there a significant relation between drift rate and lifetime hours of experience with MT.

Fig. 3 Overall changes in the (a) boundary separation and (b) drift rate parameters of the EZ-diffusion model for the MT and control groups. Error bars reflect the standard errors of the means

Previous reports have shown that in many cases, increases in drift rate are accompanied by decreases in decision boundary (Bogacz et al., 2006). This is particularly true in tasks in which the participants are allotted a fixed amount of time in which they can maximize the number of trials. They adjust their boundary depending on their drift rate, because as drift rate increases the same amount of reward can be obtained in a shorter time (Bogacz et al., 2006). Although in our study the task was self-paced, participants might still have had internal time pressure, since they wanted to finish the task. We indeed found a negative correlation between drift rate and decision boundary across participants [r(115) = −.48, p = 4.2 × 10^−8].

While there was no difference between groups in boundary separation at T1 (ranksum z = 0.47, n.s.; Fig. 3a), a nonparametric ANOVA showed that there was a main effect of time (p = .0033) but not of group. There was also an interaction between time and group (p = .0063), such that decision boundary separation decreased more for the MT than for the control group at T2. This indicated that the MT group moved toward a greater emphasis on speed.

If MT was a significant causal contributor to the changes in decision boundary and drift rate observed at T2, then perhaps lifetime hours of MT should show a similar effect on decision boundary, with lower preretreat decision boundaries in those with many versus few hours of lifetime MT experience. The decision boundary separation at T1 did indeed decrease with increasing lifetime MT practice, when adjusted for age (regression slope for boundary separation on meditation experience: −7.85 × 10^−5, p = .041). The change in boundary separation between T2 and T1 was negatively correlated with lifetime MT practice (regression slope: −6.9 × 10^−5, p = .027), indicating that for participants who already had a large amount of MT experience, the effect of the retreat on the decision boundary was smaller than for more novice participants.

We then sought to investigate what caused the observed changes in drift rate. Our delayed-recognition paradigm with highly confusable stimuli was amenable to using the NEMo (Kahana & Sekuler, 2002), which uses stimulus structure to predict participants’ accuracies. NEMo computes the likelihood of a particular decision for every trial on the basis of the summed similarity between the memory probe and all study items (see the Method section).

We asked which mechanism (i.e., model parameter) led to the observed improvement in task performance. Our hypothesis was that task improvement would specifically be related to decreases in encoding noise, and not to changes in the generalization gradient, the forgetting parameter, or a change in the influence of list homogeneity on the decision threshold. Table 2 reveals that the MT group showed a decrease in encoding noise over time that the control group did not (ranksum z = −1.8, p = .04, one-tailed). The two groups did not differ in their change over time in any other NEMo parameter, apart from β, which reflects the relative contributions of list homogeneity and probe-item similarity to the decision but has no clear implications for improved task performance. MT can thus be said to provide a specific effect on encoding noise. Although a drift rate increase could be related to many factors, NEMo shows that in this case, it is specifically related to a decrease in encoding noise.

Table 2 Average values of the noisy exemplar model parameters and significance of the interaction between time (effect of retreat) and group (mindfulness training [MT] vs. control)

General discussion

We investigated the impact of intensive MT on performance of a delayed-recognition working memory task with face stimuli. After a month of intensive MT, participants were faster and their RTs were less variable than those of an age- and education-matched control group who received no training. This change in RT performance, in the absence of group- or time-related changes in accuracy, corresponded to changes in the parameters of a mathematical model of decision making. The EZ-diffusion model revealed that the drift rate increased (and, consequently, the decision boundary decreased) with MT. When the experimental task was held constant in our participants, MT corresponded to salutary changes in model parameters from NEMo representing the quality of memory items but not forgetting. Thus, the modeling results support the hypothesis that MT may improve working memory via attentional improvements, particularly on information quality and consequent decisional processes.

While we attribute our effects to functional improvement in information processing achieved by MT participants, several alternative explanations must be acknowledged. One potential confound in our study is that control participants might not be as motivated as the MT group to do well in the task. Participants in MT spent a full month of their lives in this retreat, and therefore expected benefits, which motivated them to do particularly well at T2. It is important to note that changing motivation levels with monetary incentives may have effects that look similar to those we observed with MT. In a task-switching paradigm, participants in conditions with high prospective gains are less distractible than those in conditions with low prospective gains (Müller et al., 2009). This processing shift may correspond to changes in drift rate in a diffusion model (Voss, Rothermund & Brandtstädter, 2008) and to a decrease in encoding noise or a decrease in forgetting. Yet, arguing against a motivational interpretation of our results is the fact that we found that most of our results, such as the decrease in RT, scaled with the number of hours of lifetime meditation practice. In addition, the effect of a retreat on RT and DDM parameters decreased as a function of prior experience with MT. This indicates that MT creates lasting changes in cognition.

Future studies should, nevertheless, rule out that the effects observed in the MT group were simply motivational. They should also investigate what aspect of MT, the focus on the breath or compassion practices, underlies the results. This could be investigated by using a participant population that only practices a single type of MT.

Another potential confound we must consider is that the MT group was not only engaged in MT during the retreat but they were also in a relatively low-stress environment, surrounded by nature. Thus, the environmental context of the training, as opposed to the training itself, might have bolstered attentional functioning (see Tang & Posner, 2009). Arguing against this interpretation is the fact that our MT-related changes were very specific and restricted to RT variability and to only a subset of our modeling parameters. A final important confound to consider is that our results might have been driven by preexisting individual differences between the MT and control groups, as opposed to training-related changes. Yet, the groups did not differ in performance at T1 on most of our measures of interest. Nevertheless, a much stronger test of our claims should involve conducting a randomized control design with an active non-MT comparison intervention in novices with no prior experience with MT.

We also acknowledge several modeling-related issues. First, one may be concerned that the nondecision times we observed (on the order of 700 ms) were much greater than those typically reported in the literature on the diffusion model (where the values tend to be on the order of 400 ms). These relatively high nondecision times were likely created by the conjunction of two circumstances: (1) the relative difficulty of the task as compared to other tasks used in DDM modeling (e.g., lexical decision tasks) and (2) the unfamiliarity of our participant sample with cognitive tasks and/or with computers. In addition, older participants tend to have slower motor processes, which might have further increased nondecision time (cf. Ratcliff et al., 2004a). Indeed, there was a significant correlation between nondecision time and age [r(57) = .41, p = .0016], indicating that age was a significant contributor to long nondecision times.

Another model-related issue is whether our data warrant application of EZ diffusion. On average, 47.5 of 58 participants (82%) passed all of the criteria for EZ-diffusion application put forward by Wagenmakers et al. (2007). Of the participants who failed tests, most of them (8.5/58) failed to show an interaction between the correctness of the response and whether the response was a target or a lure (third criterion). Two participants failed to show a right skew in their RT distribution (first criterion; 2/58 participants), and 0.5 participant showed a difference between correct and error response times (second criterion). A failure to satisfy all criteria might lead to unreliable fits of the DDM. Given that this should increase the noise in our data, we would expect that it would decrease the chances of finding any differences between our participant groups.

Although the confounds we mentioned above must be carefully addressed in subsequent studies, the present project is noteworthy in its novel use of mathematical modeling to capture functional changes that might occur via mental training. Importantly, this training shared almost no features with the transfer task, and thus satisfies a strong test of generalizability over contexts (see Green & Bavelier, 2008). Further, the community of researchers who investigate the cognitive, affective, and neural consequences of MT and other types of training have not, as yet, used mathematical modeling as a route by which to capture the specific functional changes that MT may engender (but see Green et al., 2010, for an example of using the DDM to investigate the effect of video game training on perceptual decision making).

Mathematical models allow for an examination of precise functional processes that are not revealed by analysis of RT or accuracy data alone. These processes can be examined while simultaneously accounting for other confounding factors. In addition to behavioral studies, studies of MT and other forms of mental training with functional magnetic resonance imaging and other neural recordings should consider including modeling components (see, e.g., van Vugt et al., 2009). Since activation of distinct neural loci does not disambiguate between the functional processes subserved by the activated regions, the MT-related sensitivity of specific model parameters might help to better understand the functional involvement of specific brain regions in the effects of MT on cognition.

In summary, we have demonstrated that intensive MT can benefit performance in a visual working memory task. After a month of intensive MT, participants are faster and their RTs are less variable. This change in performance reflects an increase in the distinguishability of the information that is accumulated when a decision has to be made about whether the probe is or is not a member of the study list, and a concomitant decrease in the decision threshold. Thus, this study suggests that the influence of MT on information processing is both specific and tractable.