Investigating the impact of mindfulness meditation training on working memory: A mathematical modeling approach
We investigated whether mindfulness training (MT) influences information processing in a working memory task with complex visual stimuli. Participants were tested before (T1) and after (T2) participation in an intensive one-month MT retreat, and their performance was compared with that of an age- and education-matched control group. Accuracy did not differ across groups at either time point. Response times were faster and significantly less variable in the MT versus the control group at T2. Since these results could be due to changes in mnemonic processes, speed–accuracy trade-off, or nondecisional factors (e.g., motor execution), we used a mathematical modeling approach to disentangle these factors. The EZ-diffusion model (Wagenmakers, van der Maas, & Grasman, Psychonomic Bulletin & Review 14:(1), 3–22, 2007) suggested that MT leads to improved information quality and reduced response conservativeness, with no changes in nondecisional factors. The noisy exemplar model further suggested that the increase in information quality reflected a decrease in encoding noise and not an increase in forgetting. Thus, mathematical modeling may help clarify the mechanisms by which MT produces salutary effects on performance.
Keywords: Computational model, Decision making, Working memory, Attention
Mindfulness is a psychological mode characterized by full attention to present-moment experience without conceptual elaboration or emotional reactivity. Mindfulness training (MT), which can involve engaging in daily mindfulness exercises, taking a multiweek course, or participating in an intensive retreat, may be used to cultivate this mental mode. MT has been well-studied in clinical and health settings (see, e.g., Brown & Ryan, 2003), and there is growing evidence that MT is helpful for stress reduction, as well as improving mood and well-being (see Baer, Smith, Hopkins, Krietemeyer & Toney, 2006). Since exercises that engage attention are central to most MT protocols, a prominent hypothesis is that MT may improve aspects of attention (see Lutz, Slagter, Dunne & Davidson, 2008). Recent studies have further suggested that affective improvements, which are well-reported with MT, may be mediated by improvements in nonaffective core cognitive-control operations of attention and working memory (see Goldin & Gross, 2010; Jha, Stanley, Kiyonaga, Wong & Gelfand, 2010).
A growing number of studies manipulating aspects of attention have reported results consistent with the hypothesis that MT improves attentional control (see Lutz et al., 2008, for a review). Several studies have employed speeded response time (RT) tasks, such as the attention network test (Jha, Krompinger & Baime, 2007; Tang et al., 2007; van den Hurk, Giommi, Gielen, Speckens & Barendregt, 2010; see also Lutz et al., 2008). On subsets of trial types, RTs are faster in those who receive MT relative to control participants, and these RT improvements have been taken as evidence for improved attentional orienting (Jha et al., 2007; van den Hurk et al., 2010), executive control (Chan & Woollacott, 2007; Tang et al., 2007; van den Hurk et al., 2010; Wenk-Sormaz, 2005), and alerting (Jha et al., 2007). Given the well-established interrelationship between attention and working memory (see Jha, 2002), it is perhaps not surprising that recent studies have reported MT-related improvements in working memory as well (Chambers, Lo & Allen, 2008; Jha et al., 2010; Kozhevnikov, Louchakova, Josipovic & Motes, 2009; Zeidan, Johnson, Diamond, David & Goolkasian, 2010).
While there is growing evidence of MT-related improvements in behavioral tasks manipulating attention and working memory, it is not clear what aspects of information processing and decisional processes might be most sensitive to improvements in attentional control. For example, increased attentional control could reduce one’s tendency for response conservativeness. Alternatively, greater attentional control may promote a more deliberative quality to each decision that could improve accuracy and reduce variability but make overall RTs slower. Thus, RT and accuracy measures alone are insufficient for clarifying how MT may alter information processing. In the present study, we employ a mathematical modeling analysis strategy to more fully investigate the hypothesis of improved attentional control with MT.
Performance in any cognitive task in which the participant has to make simple judgments can be described by the drift diffusion model (DDM; Ratcliff, 1978), which models decisions in terms of evidence accumulation. Evidence for each decision alternative is accumulated over time, where the speed of evidence accumulation is determined by the quality of information present in the stimuli, and the individual’s capacity to extract that information to inform appropriate response execution. This latent variable is referred to as the “drift rate.” As soon as the accumulated evidence exceeds a decision threshold, the participant emits the corresponding response. In the case of a delayed-recognition task, as used herein to investigate working memory, this would be “yes” when the probe item is identical to a study list item, and “no” otherwise. Thus, the larger the model-derived drift rate parameter, the higher the quality of the information that is accumulated; in other words, if probe and study items are easily distinguishable as either identical or different, the drift rate will be high. If the information is more ambiguous, the drift rate will be low.
Whereas the drift rate corresponds to the quality of participants’ information accumulation (e.g., modulated by stimulus quality and attention), the “decision boundary” parameter of the model captures her/his speed–accuracy trade-off. A participant who favors accuracy continues to accumulate evidence for a long time before deciding, which corresponds to a high decision boundary. Conversely, a participant who favors speed has a lower decision boundary, which causes RTs to be short and task accuracy to be potentially compromised. The remaining factors, including postdecisional motor latencies, are captured by the nondecision time parameter.
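To make the roles of the three parameters concrete, the following is a minimal simulation sketch of an unbiased two-boundary diffusion process. It is an illustration only and not the fitting procedure used in this study: the function name `simulate_ddm`, the Euler time step `dt`, the trial count, and the fixed seed are assumptions introduced here, and the noise scale `s = 0.1` is only a common scaling convention.

```python
import math
import random


def simulate_ddm(drift, boundary, ndt, n_trials=500, s=0.1, dt=0.001, seed=0):
    """Simulate an unbiased two-boundary diffusion process (illustrative sketch).

    drift    : drift rate (quality of the accumulated evidence)
    boundary : boundary separation (speed-accuracy setting)
    ndt      : nondecision time in seconds (encoding, motor execution)
    Returns (accuracy, mean_rt), with RTs in seconds.
    """
    rng = random.Random(seed)
    step_sd = s * math.sqrt(dt)  # standard deviation of the noise per time step
    n_correct = 0
    total_rt = 0.0
    for _ in range(n_trials):
        x = boundary / 2.0  # unbiased starting point, midway between boundaries
        t = 0.0
        # Accumulate noisy evidence until one of the two boundaries is crossed.
        while 0.0 < x < boundary:
            x += drift * dt + rng.gauss(0.0, step_sd)
            t += dt
        n_correct += x >= boundary  # upper boundary = correct response
        total_rt += t + ndt         # observed RT = decision time + nondecision time
    return n_correct / n_trials, total_rt / n_trials
```

In this sketch, raising `drift` while holding the other parameters fixed should increase accuracy and shorten RTs, raising `boundary` should lengthen RTs in exchange for accuracy, and `ndt` shifts all RTs by a constant without affecting accuracy.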
While many studies have examined the influence of specific experimental manipulations on the drift rate (e.g., changing perceptual salience; Ratcliff, 2002) or decisional processes (e.g., emphasizing speed vs. accuracy; Ratcliff, 2002), only a few studies have investigated individual differences in diffusion model parameters while holding task parameters constant. For example, Madden et al. (2009) and Spaniol, Madden and Voss (2006) found that aging preferentially decreases drift rate estimates, while Green, Pouget and Bavelier (2010) found that intensive action-game playing increased drift rates. Other studies have demonstrated age-related differences in decision boundaries to account for performance differences between younger and older adults (e.g., Ratcliff, Thapar, Gomez & McKoon, 2004a; Ratcliff, Thapar & McKoon, 2001; 2004). In addition to aging, Ratcliff, Thapar and McKoon (2006) examined the impact of task-specific training on diffusion model parameters. They found that training improved participants’ ability to perceive fine details of the task stimuli, consistent with changes in drift rate. Training did not alter decisional processes, as indicated by comparable decision boundary estimates before and after training.
Thus, individual differences and training-related effects are tractable with mathematical modeling, warranting its further use to examine mechanisms of action of MT. Perhaps most relevant to an investigation of MT using a modeling approach is the theoretical account by Smith and Ratcliff (2009), who proposed that attention increases the efficiency with which a working memory trace is formed, either by increasing the rate at which memory-related information is transferred to a temporary store or by reducing the delay before the memory trace formation begins. In line with this view, they demonstrated that modeling of several manipulations of attention produced increases in the drift rate for information that was attended versus unattended.
Herein, we use the DDM in combination with a model based on stimulus identities to investigate the mechanisms of action of MT in the context of a visual working memory task. If MT improves attentional control and the efficiency with which attentional allocation can be used in the service of working memory, the theoretical account of Smith and Ratcliff (2009) predicts that drift rates may increase with MT, without any changes in nondecision time.
We examined drift rate, decision boundary, and nondecision time using the EZ-diffusion model (Wagenmakers et al., 2007). Our participants were an MT-experienced cohort who participated in an intensive one-month MT retreat and an age- and education-matched control group who had no prior experience with MT. Both groups performed a delayed recognition working memory task for complex and highly confusable faces at two time points, corresponding to the beginning and end of the MT retreat.
We investigated three main issues. First, would participation in the retreat improve task accuracy, RTs, or response variability? Given past research reporting decreases in RT and its variability during attention tasks following intensive MT retreats (Jha et al., 2007; Lutz et al., 2009; Slagter et al., 2007), the strong links between attention and working memory (see Jha, 2002), and recent research demonstrating MT-related improvements in working memory (Chambers et al., 2008; Jha et al., 2010), we predicted that the mean and variance of RTs would decrease for the MT group on this working memory task.
Second, we predicted that this decrease in RT mean and variance might be due to an improvement in perceptual evidence accumulation with MT. Stimulus-focused attention, which may be bolstered with MT (see Jha et al., 2007; van den Hurk et al., 2010), increases the quality of the perceived stimulus (e.g., Downing, 1988) and its remembered representation (e.g., Palmer & Ames, 1992). In addition, previous modeling studies reported that manipulations of attention during working memory tasks increase drift rate (Smith & Ratcliff, 2009). We therefore predicted that drift rate parameters would increase with MT, corresponding to the improved distinguishability of memory representations. Furthermore, because of the negative correlation between drift rate and decision boundaries (Bogacz, Brown, Moehlis, Holmes & Cohen, 2006), we predicted that decision boundary decreases would be observed with MT. Given past research showing that attentional allocation does not alter nondecision time, we predicted that there would be no change in nondecision time (i.e., pre- and postdecisional processes).
Third, we examined the origin of increased drift rates more closely with a similarity-based model of working memory (Kahana & Sekuler, 2002), the noisy exemplar model (NEMo). This model describes how people decide whether they have seen a stimulus before on the basis of the similarities of the mental representations of probe items to all study-list items. If the sum of these similarities exceeds a decision threshold, participants will respond that they have seen the item before, otherwise they will say that they have not. The sum of similarities is weighted depending on their serial position, such that early list items contribute less than later items do, an effect that is captured by a forgetting parameter. Furthermore, the decision threshold is adjusted by the overall homogeneity of interitem similarities within the list. Distances between items are scaled by a generalization gradient that, when large, indicates that a participant will still be able to distinguish stimuli even when they are very similar. According to NEMo, the mental representations of the studied list stimuli are corrupted by noise, the magnitude of which is tuned by another model parameter.
In this model, performance could improve through (1) a decrease in forgetting of early list items, (2) optimization of the influence of list homogeneity on performance, (3) a decrease in the generalization gradient, and (4) a decrease in encoding noise (see footnote 1). If drift rate increases and RT and its variability decrease as a result of MT, we predicted that perceptual noise would decrease, but that there would be no change in the rate of forgetting stimuli. As participants in MT learn to focus more on the task, they have fewer lapses of attention, which are one of the main sources of encoding noise. This improvement in attention with MT may lead to improved perceptual abilities, which is also claimed in the traditional Buddhist literature. We therefore expected that long-term MT practitioners would show a decrease in the generalization gradient, but that such changes might not be detectable after only short-term intensive training and might be observable only after many years of MT experience. We did not envisage changes in the forgetting parameters, because MT involves focusing on the present moment, rather than remembering the past, which is what the forgetting parameters reflect.
We examined the effects of MT by comparing delayed-recognition task performance before and after a month-long intensive training program that was based largely on the Four Foundations of Mindfulness—Sathipattana Sutra in the Buddhist canon. Training took place at Shambhala Mountain Center. The participants engaged in MT exercises that aimed to cultivate concentrative and receptive attention for 10–12 h a day. In the first 2 weeks, the participants focused mainly on their breath, and in the second 2 weeks, they opened up their attention and added practices that cultivated compassion and loving kindness.
A group of 29 participants (12 female, 17 male, ages 21–70) participated in the retreat. They had practiced MT on average 1,792 h (SEM = 261) during their lives, ranging from 100 to more than 5,000 h. To account for test–retest effects such as improvements due to task-specific training (see Ratcliff et al., 2006), we tested a group of 29 age-, gender-, and education-matched controls at two time points separated by one month. All procedures were in accord with the requirements of the Institutional Review Board of the University of Pennsylvania.
Drift diffusion model
We estimated each individual’s drift rate, decision boundary, and nondecision time using the EZ-diffusion model. The EZ-diffusion model (Wagenmakers et al., 2007) is a simplified version of Ratcliff’s (1978) diffusion model that is particularly well-suited for modeling individual differences (van Ravenzwaaij & Oberauer, 2009). The EZ-diffusion model reduces the seven-parameter drift diffusion model of decision making to three parameters: drift rate, which is the rate at which information accumulates; decision boundary, the point at which enough information is accumulated to make a decision; and nondecision time.
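As an illustration of how the three EZ-diffusion parameters follow from summary statistics, the sketch below implements the closed-form equations published by Wagenmakers et al. (2007). The function name `ez_diffusion` and its argument names are assumptions introduced here. The scaling parameter `s = 0.1` follows the convention in that paper, and the edge correction needed when accuracy is exactly 0, .5, or 1 is deliberately left out.

```python
import math


def ez_diffusion(pc, vrt, mrt, s=0.1):
    """Closed-form EZ-diffusion estimates (after Wagenmakers et al., 2007).

    pc  : proportion of correct responses (must not be 0, 0.5, or 1)
    vrt : variance of RTs for correct responses, in seconds squared
    mrt : mean RT for correct responses, in seconds
    s   : scaling parameter (0.1 by convention)
    Returns (drift_rate, boundary_separation, nondecision_time).
    """
    if pc <= 0.0 or pc >= 1.0 or pc == 0.5:
        raise ValueError("pc must lie strictly between 0 and 1 and differ from 0.5")
    logit = math.log(pc / (1.0 - pc))
    x = logit * (logit * pc ** 2 - logit * pc + pc - 0.5) / vrt
    drift = math.copysign(1.0, pc - 0.5) * s * x ** 0.25
    boundary = s ** 2 * logit / drift
    y = -drift * boundary / s ** 2
    # Mean decision time implied by the drift rate and boundary separation.
    mean_decision_time = (boundary / (2.0 * drift)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))
    nondecision = mrt - mean_decision_time
    return drift, boundary, nondecision
```

For example, `ez_diffusion(0.802, 0.112, 0.723)` returns a drift rate of about 0.10, a boundary separation of about 0.14, and a nondecision time of about 0.30 s. Because the estimates depend only on accuracy and on the mean and variance of correct RTs, they can be computed separately for each participant and time point.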
Noisy exemplar model
In the summed-similarity equation of NEMo (Kahana & Sekuler, 2002), S is the summed similarity, αi is a forgetting parameter, as described below, and τ determines the form of the exponential generalization gradient (Shepard, 1987). The exponential generalization gradient quantifies how different two stimuli have to be for the participant to see them as different. A larger value of τ indicates that stimuli need to differ less for the participant to see them as different. The list items si and the probe item p are represented by vector coordinates in an abstract similarity space, and |x − y| denotes the Euclidean distance between any two stimuli x and y.
NEMo assumes that every stimulus is encoded with some amount of noise, implemented by a zero-mean Gaussian with standard deviation σ. This parameter is referred to as “encoding noise.” To simulate forgetting, NEMo assumes that the most recent stimulus contributes the most to the summed similarity, and that earlier items contribute less and less. This is simulated by the parameter αi, which decreases with lag (i = 1 indicating the most recently studied item). Finally, β determines how strongly a person adapts their recognition decision threshold based on the homogeneity of the similarities between the stimuli comprising the study list. When S exceeds a decision threshold determined by maximum likelihood, the simulated participant responds yes on that trial, and otherwise responds no.
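The verbal description above can be sketched in code. The following is a hedged reconstruction and not the authors' implementation: the function names, the argument layout, and in particular the exact form of the homogeneity term (β times the mean pairwise similarity among the noisy study items, following the general form in Kahana & Sekuler, 2002) are assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def nemo_summed_similarity(study_items, probe, alpha, tau, beta, sigma, rng=None):
    """Summed similarity for one trial of a NEMo-like model (illustrative sketch).

    study_items : list of coordinate vectors in presentation order
    probe       : coordinate vector of the probe item
    alpha       : forgetting weights; alpha[0] applies to the most recent item
    tau         : steepness of the exponential generalization gradient
    beta        : weight of the list-homogeneity term (assumed form, see lead-in)
    sigma       : standard deviation of the zero-mean Gaussian encoding noise
    """
    rng = rng or random.Random()
    # Each study item is stored with independent Gaussian noise on every coordinate.
    noisy = [[c + rng.gauss(0.0, sigma) for c in item] for item in study_items]
    # Probe-to-item similarities, weighted by lag (lag 0 = most recently studied).
    probe_term = 0.0
    for lag, item in enumerate(reversed(noisy)):
        probe_term += alpha[lag] * math.exp(-tau * math.dist(item, probe))
    # Homogeneity: mean similarity over all pairs of study items.
    pairs = list(combinations(noisy, 2))
    if pairs:
        homogeneity = sum(math.exp(-tau * math.dist(x, y)) for x, y in pairs) / len(pairs)
    else:
        homogeneity = 0.0
    return probe_term + beta * homogeneity


def respond_yes(summed_similarity, criterion):
    """The simulated participant responds yes when S exceeds the criterion."""
    return summed_similarity > criterion
```

With `sigma` set to zero the function is deterministic, which is convenient for checking the arithmetic. With `sigma` above zero, repeated calls on the same trial yield a distribution of S values and therefore a probability of responding yes.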
This simplified version of NEMo, incorporating only five free parameters (van Vugt et al., submitted), was fit to the behavioral data using a genetic algorithm (Mitchell, 1996), which was run for 10 generations of 1,000 individuals (inspection of the generation in which the best fit was found indicated that 10 generations were sufficient). The genetic algorithm tries to minimize the root-mean square difference (RMSD) between observed and predicted accuracy, in conjunction with the difference between the observed and predicted d' (both accuracy and d' contributed equally to the RMSD). We used both d' and RMSD as error measures in order to force the algorithm to focus on solutions in which both targets and lures were predicted well. If either targets or lures are overpredicted, then the predicted d' will be very far from the empirical d', even if the RMSD is still relatively low.
Because of the inherent stochasticity of the algorithm, a genetic algorithm will not easily get stuck in a local optimum. This stochasticity therefore obviates the need to run the algorithm multiple times with different starting points, since there will always be individuals in the population who are away from the current minimum.
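The error measure that the genetic algorithm minimizes can be sketched as follows. The text states only that accuracy RMSD and the d' difference contributed equally, so the unweighted sum used here is an assumption, as are the function names and argument layout. Hit and false-alarm rates must lie strictly between 0 and 1 for d' to be defined.

```python
import math
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def fit_error(obs_acc, pred_acc, obs_hit, obs_fa, pred_hit, pred_fa):
    """Combined misfit: accuracy RMSD plus absolute d' difference (assumed equal weights).

    obs_acc, pred_acc : equal-length sequences of observed and predicted accuracies
    obs_hit, obs_fa   : observed hit and false-alarm rates
    pred_hit, pred_fa : model-predicted hit and false-alarm rates
    """
    squared = [(o - p) ** 2 for o, p in zip(obs_acc, pred_acc)]
    rmsd = math.sqrt(sum(squared) / len(squared))
    dprime_gap = abs(d_prime(obs_hit, obs_fa) - d_prime(pred_hit, pred_fa))
    return rmsd + dprime_gap
```

A candidate parameter set that predicts accuracy well on average but overpredicts either targets or lures receives a large `dprime_gap`, which is the property the combined measure is meant to enforce.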
Median accuracy and response times
Table: Median RT (ms) and median accuracy (%)
Response time variability
There was no difference between groups in the variance of RTs at T1 (ranksum z = 0.61, n.s.). We found that the variance in RTs was reduced from T1 to T2 (main effect of time, p = .0041) and that this effect was greater for the MT group than for controls (significant interaction between time and group, p = .0065; see Fig. 2b), but there was no main effect of group. Thus, the MT group had a lower RT variance at T2 relative to the control group. Investigation of lifetime hours of MT on RT variance revealed that the variance in RTs at T1 also showed a trend toward being lower with greater lifetime hours of MT practice, when adjusted for age (slope for regression of RT variability on meditation experience: –.039, p = .090; see also Fig. 2c). To determine the impact of lifetime hours of practice on the magnitude of retreat-related changes in RT variability for the MT group, a similar regression analysis was performed. There was a trend toward a relation between the change in RT variability between T2 and T1 and lifetime hours of MT practice (slope when corrected for age: –.0299, p = .086). In other words, prior experience with MT and the effects of the retreat interacted with each other. With more lifetime hours of MT, RT variability was lower, and the degree of benefit from attending the retreat for lowering RT variability was diminished with greater lifetime hours of MT. Thus, the retreat was more effective in reducing RT variability in MT practitioners with few lifetime hours of preretreat MT experience. We observed a correlation of .82 (p = 10^−30) between RT variability and its mean, similar to what has been observed in previous studies (Wagenmakers & Brown, 2007).
Previous reports have shown that in many cases, increases in drift rate are accompanied by decreases in decision boundary (Bogacz et al., 2006). This is particularly true in tasks in which the participants are allotted a fixed amount of time in which they can maximize the number of trials. They adjust their boundary depending on their drift rate, because as drift rate increases the same amount of reward can be obtained in a shorter time (Bogacz et al., 2006). Although in our study the task was self-paced, participants might still have had internal time pressure, since they wanted to finish the task. We indeed found a negative correlation between drift rate and decision boundary across participants [r(115) = −.48, p = 4.2 × 10^−8].
While there was no difference between groups in boundary separation at T1 (ranksum z = 0.47, n.s.; Fig. 3a), a nonparametric ANOVA showed that there was a main effect of time (p = .0033) but not of group. There was also an interaction between time and group (p = .0063), such that decision boundary separation decreased more for the MT than for the control group at T2. This indicated that the MT group moved toward a greater emphasis on speed.
If MT was a significant causal contributor to the changes in decision boundary and drift rate observed at T2, then perhaps lifetime hours of MT should show a similar effect on decision boundary, with lower preretreat decision boundaries in those with many versus few hours of lifetime MT experience. The decision boundary separation at T1 did indeed decrease with increasing lifetime MT practice, when adjusted for age (regression slope for boundary separation on meditation experience: −7.85 × 10^−5, p = .041). The change in boundary separation between T2 and T1 was negatively correlated with lifetime MT practice (regression slope: −6.9 × 10^−5, p = .027), indicating that for participants who already had a large amount of MT experience, the effect of the retreat on the decision boundary was smaller than for more novice participants.
We then sought to investigate what caused the observed changes in drift rate. Our delayed-recognition paradigm with highly confusable stimuli was amenable to using the NEMo (Kahana & Sekuler, 2002), which uses stimulus structure to predict participants’ accuracies. NEMo computes the likelihood of a particular decision for every trial on the basis of the summed similarity between the memory probe and all study items (see the Method section).
Table: Average values of the noisy exemplar model parameters and significance of the interaction between time (effect of retreat) and group (mindfulness training [MT] vs. control). Surviving row labels: Forgetting lag 2†, Forgetting lag 3
We investigated the impact of intensive MT on performance of a delayed-recognition working memory task with face stimuli. After a month of intensive MT, participants were faster and their RTs were less variable than those of an age- and education-matched control group who received no training. This change in RT performance, in the absence of group- or time-related changes in accuracy, corresponded to changes in the parameters of a mathematical model of decision making. The EZ-diffusion model revealed that the drift rate increased (and, consequently, the decision boundary decreased) with MT. When the experimental task was held constant in our participants, MT corresponded to salutary changes in model parameters from NEMo representing the quality of memory items but not forgetting. Thus, the modeling results support the hypothesis that MT may improve working memory via attentional improvements, particularly on information quality and consequent decisional processes.
While we attribute our effects to functional improvement in information processing achieved by MT participants, several alternative explanations must be acknowledged. One potential confound in our study is that control participants might not be as motivated as the MT group to do well in the task. Participants in MT spent a full month of their lives in this retreat, and therefore expected benefits, which motivated them to do particularly well at T2. It is important to note that changing motivation levels with monetary incentives may have effects that look similar to those we observed with MT. In a task-switching paradigm, participants in conditions with high prospective gains are less distractible than those in conditions with low prospective gains (Müller et al., 2009). This processing shift may correspond to changes in drift rate in a diffusion model (Voss, Rothermund & Brandtstädter, 2008) and to a decrease in encoding noise or a decrease in forgetting. Yet, arguing against a motivational interpretation of our results is the fact that we found that most of our results, such as the decrease in RT, scaled with the number of hours of lifetime meditation practice. In addition, the effect of a retreat on RT and DDM parameters decreased as a function of prior experience with MT. This indicates that MT creates lasting changes in cognition.
Future studies should, nevertheless, rule out that the effects observed in the MT group were simply motivational. They should also investigate what aspect of MT, the focus on the breath or compassion practices, underlies the results. This could be investigated by using a participant population that only practices a single type of MT.
Another potential confound we must consider is that the MT group was not only engaged in MT during the retreat but was also in a relatively low-stress environment, surrounded by nature. Thus, the environmental context of the training, as opposed to the training itself, might have bolstered attentional functioning (see Tang & Posner, 2009). Arguing against this interpretation is the fact that our MT-related changes were very specific and restricted to RT variability and to only a subset of our modeling parameters. A final important confound to consider is that our results might have been driven by preexisting individual differences between the MT and control groups, as opposed to training-related changes. Yet, the groups did not differ in performance at T1 on most of our measures of interest. Nevertheless, a much stronger test of our claims would involve a randomized controlled design with an active non-MT comparison intervention in novices with no prior experience with MT.
We also acknowledge several modeling-related issues. First, one may be concerned that the nondecision times we observed (on the order of 700 ms) were much greater than those typically reported in the literature on the diffusion model (where the values tend to be on the order of 400 ms). These relatively high nondecision times were likely created by the conjunction of two circumstances: (1) the relative difficulty of the task as compared to other tasks used in DDM modeling (e.g., lexical decision tasks) and (2) the unfamiliarity of our participant sample with cognitive tasks and/or with computers. In addition, older participants tend to have slower motor processes, which might have further increased nondecision time (cf. Ratcliff et al., 2004a). Indeed, there was a significant correlation between nondecision time and age [r(57) = .41, p = .0016], indicating that age was a significant contributor to long nondecision times.
Another model-related issue is whether our data warrant application of EZ diffusion. On average, 47.5 of 58 participants (82%; see footnote 3) passed all of the criteria for EZ-diffusion application put forward by Wagenmakers et al. (2007). Of the participants who failed tests, most of them (8.5/58) failed to show an interaction between the correctness of the response and whether the response was a target or a lure (third criterion). Two participants failed to show a right skew in their RT distribution (first criterion; 2/58 participants), and 0.5 participant showed a difference between correct and error response times (second criterion). A failure to satisfy all criteria might lead to unreliable fits of the DDM. Given that this should increase the noise in our data, we would expect that it would decrease the chances of finding any differences between our participant groups.
Although the confounds we mentioned above must be carefully addressed in subsequent studies, the present project is noteworthy in its novel use of mathematical modeling to capture functional changes that might occur via mental training. Importantly, this training shared almost no features with the transfer task, and thus satisfies a strong test of generalizability over contexts (see Green & Bavelier, 2008). Further, the community of researchers who investigate the cognitive, affective, and neural consequences of MT and other types of training have not, as yet, used mathematical modeling as a route by which to capture the specific functional changes that MT may engender (but see Green et al., 2010, for an example of using the DDM to investigate the effect of video game training on perceptual decision making).
Mathematical models allow for an examination of precise functional processes that are not revealed by analysis of RT or accuracy data alone. These processes can be examined while simultaneously accounting for other confounding factors. In addition to behavioral studies, studies of MT and other forms of mental training with functional magnetic resonance imaging and other neural recordings should consider including modeling components (see, e.g., van Vugt et al., 2009). Since activation of distinct neural loci does not disambiguate between the functional processes subserved by the activated regions, the MT-related sensitivity of specific model parameters might help to better understand the functional involvement of specific brain regions in the effects of MT on cognition.
In summary, we have demonstrated that intensive MT can benefit performance in a visual working memory task. After a month of intensive MT, participants are faster and their RTs are less variable. This change in performance reflects an increase in the distinguishability of the information that is accumulated when a decision has to be made about whether the probe is or is not a member of the study list, and a concomitant decrease in the decision threshold. Thus, this study suggests that the influence of MT on information processing is both specific and tractable.
Footnote 1. These predictions are based on simulations of the similarity-based model, measuring its ideal-observer d' for different parameter sets. We started the simulation with the average parameters obtained from the combined data in the present study, and then varied each parameter individually while recording the associated root-mean square deviation.
Footnote 2. ANOVAs with d' as the dependent variable revealed results similar to those for the ANOVAs with accuracy as the dependent variable.
Footnote 3. We report fractional participants when they failed a test at one but not the other of the two time points.
The authors gratefully acknowledge support from a Varela Grant from the Mind and Life Institute to M.K.v.V. They also express their thanks to Jane Carpenter and Kell Delaney of Naropa University and all the participants and the staff of Shambhala Mountain Center.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- Mitchell, M. (1996). An introduction to genetic algorithms. Cambridge: MIT Press.
- Müller, J., Dreisbach, G., Goschke, T., Hensch, T., Lesch, K.-P., & Brocke, B. (2009). Dopamine and cognitive control: the prospect of monetary gains influences the balance between flexibility and stability in a set-shifting paradigm. European Journal of Neuroscience, 26, 3661–3668.
- Tang, Y.-Y., Ma, Y., Wang, J., Fan, Y., Feng, S., Lu, Q., et al. (2007). Short-term meditation training improves attention and self-regulation. Proceedings of the National Academy of Sciences, 104, 17152–17156. doi:10.1073/pnas.0707678104
- van Vugt, M. K., Sekuler, R., Wilson, H. R., & Kahana, M. J. (2011). Electrophysiological correlates of similarity-based interference in visual working memory (submitted).