Empirical validation of the diffusion model for recognition memory and a comparison of parameter-estimation methods

Arnold, Nina R.; Bröder, Arndt; Bayen, Ute J.

doi:10.1007/s00426-014-0608-y

Original Article
Open access
Published: 04 October 2014

Volume 79, pages 882–898, (2015)
Psychological Research

Abstract

The diffusion model introduced by Ratcliff (Psychol Rev 85:59–108, 1978) has been applied to many binary decision tasks including recognition memory. It describes dynamic evidence accumulation unfolding over time and models choice accuracy as well as response-time distributions. Various parameters describe aspects of decision quality and response bias. In three recognition-memory experiments, the validity of the model was tested experimentally and analyzed with three different programs: fast-dm, EZ, and DMAT. Each of three central model parameters was targeted via specific experimental manipulations. All manipulations affected mainly the corresponding parameters, thus supporting the convergent validity of the measures. There were, however, smaller effects on other parameters, showing some limitations in discriminant validity.

Introduction

Recognition tests are a widely used method to assess episodic memory performance. Previously presented (old) items must be distinguished from items that were not presented before (new items). It was recognized early on that in this paradigm it is not trivial to derive good measures of memory from the correct responses (hits and correct rejections) and the erroneous ones (misses and false alarms; see, e.g., Schulze, 1909). Model-based measures derived from signal detection theory (SDT; e.g., Snodgrass & Corwin, 1988) or from various threshold models disentangle memory performance from response biases (see Kellen, Klauer, & Bröder, 2013, for a discussion and comparison). These approaches, however, model only the outcome of cognitive processes, ignoring how these processes unfold over time. Ratcliff (1978) went a step further with his diffusion model, which describes the memory process as an accumulation of evidence until a threshold is reached. The model further decomposes the memory measure into two aspects that reflect objective processing (drift rate v) and a subjective achievement level (threshold parameter a and bias parameter z/a) (Wagenmakers, 2009). Accuracy data as well as reaction-time distributions of correct and error responses are used to estimate the model parameters, so that speed–accuracy trade-offs are modelled as well.

The diffusion model (also Ratcliff diffusion model) was originally formulated for recognition memory and has been applied often in this domain (e.g., Ratcliff, 1978, 2006; Ratcliff, Thapar, & McKoon, 2004; Spaniol, Madden, & Voss, 2006; Thapar, Ratcliff, & McKoon, 2003). However, it has also been applied to many other binary-choice tasks, for example, in the areas of perception (Liu & Watanabe, 2012; Ratcliff, Thapar, & McKoon, 2001, 2003, 2006b), prospective memory (Boywitt & Rummel, 2012; Horn, Bayen, & Smith, 2011, 2013; Rummel, Kuhlmann, & Touron, 2013), cognitive aging (McKoon & Ratcliff, 2012, 2013; Ratcliff, Thapar, & McKoon, 2001, 2003, 2004, 2006a, b, 2007; Spaniol et al., 2006; Spaniol, Voss, & Grady, 2008), post-error slowing (Dutilh, Forstmann, Vandekerckhove, & Wagenmakers, 2013; Dutilh et al., 2012), and in experiments involving response signals (Ratcliff & McKoon, 2008), a go/no-go task (Gomez, Ratcliff, & Perea, 2007), temporal-expectation effects on reaction time (Jepma, Wagenmakers, & Nieuwenhuis, 2012), task switching (Schmitz & Voss, 2012), priming (Voss, Rothermund, Gast, & Wentura, 2013), and the Implicit Association Test (Klauer, Voss, Schmitz, & Teige-Mocigemba, 2007). It has also been applied to clinical problems such as aphasia and dyslexia (Ratcliff, Perea, Colangelo, & Buchanan, 2004), depression (Pe, Vandekerckhove, & Kuppens, 2013), and to the impact of sleep deprivation on cognitive performance (Ratcliff & Van Dongen, 2009). Thus, the diffusion model has a wide range of applications. For an overview, see Ratcliff and McKoon (2008).

Although there is some existing evidence supporting the diffusion model’s validity as discussed below, a systematic experimental validation of model parameters in the recognition-memory domain has not been performed to date. We conducted validity tests in three recognition experiments each targeting one of the core model parameters. Additionally, we analyzed our experimental data with three computer programs to compare different methods for parameter estimation: fast-dm (Voss & Voss, 2007), EZ (Wagenmakers, Van der Maas, & Grasman, 2007), and DMAT (Vandekerckhove & Tuerlinckx, 2008). We will first describe the diffusion model and its parameters in detail as well as approaches for parameter estimation.

The diffusion model

The diffusion model is designed for fast binary choices (with mean reaction times shorter than about 1,500 ms). It makes full use of the information available in the participant’s responses. That is, it considers not only mean reaction times and accuracy, but also the relative speed of correct and error responses and the shape of the reaction-time distributions (Ratcliff et al., 2004).

The main idea underlying the diffusion model is shown in Fig. 1. Confronted with a binary choice task like old–new recognition, a participant will start accumulating internal evidence for the decision. Depending on the relative amounts or quality of information favoring one of the options, the evidence will drift to one of two decision boundaries, and the process will terminate in a decision for one option when one of the boundaries is crossed. The drift towards a boundary is modelled as a diffusion process, which is the continuous generalization of a random walk. There are slight differences in parameter labels in the literature. We use the labels of Voss and Voss (2007, 2008), which are easily translated into other notations.

The drift rate v represents the quality of the information extracted from the stimuli (Ratcliff et al., 2004). It depends on the degree of match between a memory probe and the information stored in memory (Ratcliff, 1978; Ratcliff & Starns, 2009). It describes the amount of information accumulated per time unit and is, therefore, the average gradient, that is, the mean rate of approach to one of the thresholds. Positive values indicate an approach to the upper threshold, whereas negative values indicate an approach to the lower threshold. The absolute value describes the speed of information accumulation: The higher the absolute value, the faster the corresponding threshold is reached, and the less likely it is that the response opposite to the drift rate (which is often the wrong one) is chosen (Voss, Rothermund, & Voss, 2004). Every item in a recognition test has its own drift rate. The drift rate is assumed to be normally distributed with mean v and standard deviation s_v (Ratcliff et al., 2004).

The distance between the decision boundaries (threshold parameter a), on the other hand, defines how much information a participant needs before making a decision (Voss et al., 2004). The upper threshold a is the criterion for responding old in a recognition-memory test. Conventionally, the lower boundary is set to zero. Therefore, the value of the upper threshold a equals the distance between the thresholds. The decision boundaries also affect accuracy, because wider distances reduce the probability of the process diffusing across the “wrong” boundary by chance. Thus, v and a both contribute to performance, and a is believed to be affected by speed–accuracy trade-offs (Ratcliff, 1978). A higher threshold parameter a indicates that a person needs more information to make a decision. This leads to a higher rate of correct, but on average slower, responses (Ratcliff et al., 2004).
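To make the speed–accuracy trade-off concrete, the sketch below evaluates two closed-form expressions that also underlie the EZ method (Wagenmakers et al., 2007): the probability that the process ends at the boundary favored by the drift, and the mean decision time (without t₀). It assumes an unbiased starting point (z = a/2), no across-trial variability, s = 0.1, and time in seconds; the drift and threshold values are arbitrary illustrations, not estimates from any data set.

```python
import math


def p_drift_response(v: float, a: float, s: float = 0.1) -> float:
    """Probability of ending at the boundary favored by the drift.

    Assumes an unbiased start (z = a/2) and no across-trial variability.
    """
    return 1.0 / (1.0 + math.exp(-a * abs(v) / s**2))


def mean_decision_time(v: float, a: float, s: float = 0.1) -> float:
    """Mean first-passage time in seconds (excluding t0) for z = a/2 and v != 0."""
    y = -abs(v) * a / s**2
    return (a / (2.0 * abs(v))) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))


if __name__ == "__main__":
    v = 0.2  # illustrative drift rate
    for a in (0.08, 0.12, 0.16):  # illustrative thresholds
        print(f"a = {a:.2f}: P = {p_drift_response(v, a):.3f}, "
              f"mean decision time = {mean_decision_time(v, a):.3f} s")
```

Raising a increases both the probability of the drift-favored response and the mean decision time, which is the trade-off described above.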

The starting point z of information accumulation describes possible asymmetries in the amount of information that is needed to exceed the response criteria for old versus new responses. If z equals a/2, there is no bias towards one response or the other. If z differs from a/2, the reaction times for old versus new responses will differ: The smaller the distance between starting point and threshold, the shorter the reaction times. If z > a/2, less information is needed to exceed the upper threshold a; thus, there is a bias towards the old response. If z < a/2, there is a bias towards the new response. The starting point is assumed to vary between trials according to a uniform distribution with mean z and range s_z (Ratcliff, 1978).
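The effect of the starting point on choice probabilities can be written in closed form for a process without across-trial variability. With drift v, diffusion constant s, boundaries at 0 and a, and starting point z, the probability of absorption at the upper ("old") boundary is P(old) = (1 − exp(−2vz/s²)) / (1 − exp(−2va/s²)), which reduces to z/a when v = 0. The sketch below evaluates this expression; the parameter values are illustrative assumptions.

```python
import math


def p_upper(v: float, a: float, z: float, s: float = 0.1) -> float:
    """Probability of absorption at the upper ("old") boundary.

    Boundaries at 0 and a, starting point 0 < z < a, constant drift v,
    diffusion constant s, no across-trial variability.
    """
    if not 0.0 < z < a:
        raise ValueError("starting point must lie strictly between 0 and a")
    if v == 0.0:
        return z / a  # pure noise: the outcome depends on position alone
    # expm1(x) = exp(x) - 1 keeps the ratio accurate for small |v|
    return math.expm1(-2.0 * v * z / s**2) / math.expm1(-2.0 * v * a / s**2)


if __name__ == "__main__":
    a, v = 0.12, 0.05  # illustrative threshold and weak drift towards "old"
    for zr in (0.4, 0.5, 0.6):  # relative starting point z/a
        print(f"z/a = {zr:.1f}: P(old) = {p_upper(v, a, zr * a):.3f}")
```

Holding v and a constant, moving z/a above 0.5 raises the proportion of old responses, and moving it below 0.5 lowers it, which is the pattern a response-bias manipulation is expected to produce.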

The model does not assume a linear, deterministic process, but takes into account random influences that add to the constant influence of the drift rate. This explains why processes with the same drift rate can have different reaction times or even lead to opposite responses (Ratcliff & Rouder, 1998). Random influences at time t are described by a normal distribution with mean 0 whose variance increases linearly with time. The rate of this increase is given by the diffusion constant s (the variance after time t is s²t). The diffusion constant is a scaling parameter that can be fixed to any positive value.

In addition, other processes contribute to reaction time, such as stimulus encoding and motor processes. In the model, their total duration is estimated as the response-time constant t₀, which captures the non-decisional part of the reaction time (Ratcliff, 1978). The total reaction time is thus RT = RT_decision + t₀. Like drift rate v and starting point z, this parameter varies across trials: t₀ is uniformly distributed with range s_t (Ratcliff et al., 2004). Ratcliff (2013) showed that, in most cases, these standard assumptions about the distributions of drift rate, starting point, and response-time constant lead to the same predictions as alternative distributional assumptions.
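The assumptions described so far (noisy accumulation between 0 and a, trial-to-trial variability in drift rate, starting point, and non-decisional time, and RT = decision time + t₀) can be made concrete with a crude Monte Carlo simulation. The sketch below uses a simple Euler discretization with step dt, which is our own illustrative choice and not how fast-dm, EZ, or DMAT compute predictions. It assumes that the uniform distributions for z and t₀ are centered on their means with ranges s_z and s_t (both in absolute units); all parameter values are hypothetical.

```python
import math
import random
from typing import Optional, Tuple


def simulate_trial(v: float, a: float, zr: float, t0: float, s: float = 0.1,
                   sv: float = 0.0, sz: float = 0.0, st: float = 0.0,
                   dt: float = 0.001,
                   rng: Optional[random.Random] = None) -> Tuple[int, float]:
    """Simulate one trial; returns (response, rt).

    response 1 = upper boundary ("old"), 0 = lower boundary ("new");
    rt is in seconds and includes the non-decisional component.
    """
    if rng is None:
        rng = random.Random()
    z_mean = zr * a
    if sz / 2 >= min(z_mean, a - z_mean):
        raise ValueError("s_z too large: starting point could leave (0, a)")
    drift = rng.gauss(v, sv) if sv > 0 else v          # drift ~ N(v, s_v)
    x = rng.uniform(z_mean - sz / 2, z_mean + sz / 2)  # z ~ U, range s_z
    ter = rng.uniform(t0 - st / 2, t0 + st / 2)        # t0 ~ U, range s_t
    step_sd = s * math.sqrt(dt)                        # noise variance s^2 * dt
    t = 0.0
    while 0.0 < x < a:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return (1 if x >= a else 0), t + ter


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed for reproducibility
    sims = [simulate_trial(0.25, 0.12, 0.5, 0.3, sv=0.05, sz=0.02, st=0.1, rng=rng)
            for _ in range(1000)]
    p_old = sum(resp for resp, _ in sims) / len(sims)
    mean_rt = sum(rt for _, rt in sims) / len(sims)
    print(f"P(old) = {p_old:.3f}, mean RT = {mean_rt:.3f} s")
```

Because boundary crossings are only checked at discrete steps, the sketch slightly overestimates decision times; smaller dt reduces this bias at the cost of speed.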

Since 1978, when the diffusion model was introduced, there have been some modifications in the use of the model. Ratcliff (1978) postulated that the process is self-terminating for matches, but exhaustive for non-matches. This implies that there are two different processes for the two boundaries. The upper boundary is reached when a match is found, and all other processes are then terminated. For the lower boundary, all processes must result in a non-match. According to Ratcliff (1978), recognition is best described by parallel processes. For each item in the search set, a comparison with the memory probe is running. The observed reaction time reflects only the maximum (for non-matches) or the minimum (for matches) of the diffusion processes (Ratcliff, 1978, 1988).

The model has also been used in paradigms other than recognition memory, where the assumption of a difference between the two boundaries is unnecessary because only a single comparison is assumed. Later descriptions of the diffusion model do not differentiate between its use in recognition-memory experiments and its use in other tasks. According to these descriptions, the response is initiated as soon as a boundary is reached (Ratcliff et al., 2004, 2007; Spaniol et al., 2006, 2008; White et al., 2009). Recently, the drift criterion has attracted some attention. The drift criterion can be seen as the zero point of the drift rate. It specifies the amount of evidence above which evidence accumulates towards the upper threshold and below which it accumulates towards the lower threshold (Criss, 2010; Ratcliff, 1978, 1981, 1985, 1987; Ratcliff, Van Zandt, & McKoon, 1999). However, this parameter is not implemented in the available programs.

Data analysis and parameter estimation with the diffusion model

The aim of parameter estimation is to find the optimal fit between theoretical and empirical reaction-time distributions and accuracy data. This requires formulas for the probability density function (PDF) or the cumulative distribution function (CDF) of the first-passage times at both thresholds. For a detailed description and discussion of this topic, see Tuerlinckx, Maris, Ratcliff, and De Boeck (2001), Tuerlinckx (2004), and Ratcliff and Tuerlinckx (2002). To estimate the parameters, a goodness-of-fit criterion is also needed. For a discussion of different criteria, see Read and Cressie (1988), Ratcliff and Tuerlinckx (2002), and Voss et al. (2004). Parameter estimation for the diffusion model has no analytical solution. Therefore, numerical integration procedures are used to find the best fit (Wagenmakers et al., 2007). Parameter estimation is quite complex and is a research topic of its own (Diederich & Busemeyer, 2003; Ratcliff & Tuerlinckx, 2002; Tuerlinckx, 2004; Wagenmakers et al., 2007; Vandekerckhove & Tuerlinckx, 2007). In recent years, several programs have been developed to make the diffusion model easy to use: the EZ-diffusion model (Wagenmakers et al., 2007), DMAT (Vandekerckhove & Tuerlinckx, 2008), and fast-dm (Voss & Voss, 2007). Vandekerckhove et al. (2011) developed a hierarchical extension of the diffusion model.
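One classical way to write the first-passage-time density is as an infinite series. The sketch below implements the large-time series for the lower boundary of a process without across-trial variability: with unit diffusion constant, f(t) = (π/a²) exp(−v·a·w − v²t/2) Σₖ k exp(−k²π²t/(2a²)) sin(kπw), where w = z/a. Dividing v and a by s reduces the general case to unit noise, and the upper-boundary density follows by symmetry (replace v by −v and w by 1 − w). Truncating after a fixed number of terms is an assumption of this illustration; the series converges slowly for very small t, which is one reason why dedicated programs use other methods (fast-dm, for example, uses a PDE method).

```python
import math


def density_lower(t: float, v: float, a: float, zr: float, s: float = 0.1,
                  n_terms: int = 200) -> float:
    """First-passage-time density at the lower boundary (large-time series).

    t: decision time in seconds; zr: relative starting point z/a.
    No across-trial variability; the series is truncated after n_terms.
    """
    if t <= 0.0:
        return 0.0
    v, a = v / s, a / s  # rescale to a unit diffusion constant
    total = 0.0
    for k in range(1, n_terms + 1):
        total += (k * math.exp(-(k * math.pi) ** 2 * t / (2.0 * a * a))
                  * math.sin(k * math.pi * zr))
    return (math.pi / (a * a)) * math.exp(-v * a * zr - v * v * t / 2.0) * total


def density_upper(t: float, v: float, a: float, zr: float, s: float = 0.1,
                  n_terms: int = 200) -> float:
    """Upper-boundary density via the mirror symmetry v -> -v, z/a -> 1 - z/a."""
    return density_lower(t, -v, a, 1.0 - zr, s, n_terms)
```

Integrated over t, the two densities give the choice probabilities, so a likelihood-based fit can evaluate both response probabilities and reaction-time shapes from the same functions.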

Van Ravenzwaaij and Oberauer (2009) tested the parameter recovery of fast-dm, DMAT, and EZ with simulated data. They calculated correlations between the true values and the estimated parameter values. All methods were able to estimate the parameters with reasonable accuracy. Fast-dm seemed to be the least robust method for parameter estimation. This was due to an inability to recover individual differences in the dispersion parameters s_v and s_z and to a tendency to yield smaller differences between conditions, especially for the drift rate, when the number of trials was small.

DMAT requires a large number of trials. In contrast, EZ and fast-dm provide useful estimates with about 80 trials per condition (Van Ravenzwaaij & Oberauer, 2009). In our experiments, we had a relatively small number of trials per condition because we wanted to mimic standard conditions of a recognition-memory experiment. For a small number of trials, Van Ravenzwaaij and Oberauer (2009) found that EZ was most robust. However, the parameter z is fixed in this model, and since we also wanted to validate this bias parameter, we used fast-dm and DMAT to estimate the parameters and cross-checked the results with EZ for the two experiments not targeting the bias parameter (Experiments 2 and 3).

The EZ-diffusion model (Wagenmakers et al., 2007) is an algorithm that was developed to make data analyses with the diffusion model as easy as possible. It transforms accuracy and the mean and variance of the reaction times of correct responses into drift rate v, threshold parameter a, and response-time constant t₀ via three equations. As an advantage, these equations do not require any parameter fitting and can be used even if the error rate is very small. To achieve this, the model makes some simplifications: (1) It assumes that there is no between-trial variability; thus, s_v, s_z, and s_t are set to zero. (2) The starting point is assumed to be unbiased; thus, z/a is set to 0.5.
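The three EZ equations are short enough to write out. The sketch below follows the closed-form expressions published by Wagenmakers et al. (2007), with s = 0.1 and reaction times in seconds; the function name and the error handling are our own choices. It does not include an edge correction for accuracies of exactly 0 or 1, and Pc = .5 implies a zero drift rate for which a is undefined, so such inputs are rejected rather than silently corrected.

```python
import math
from typing import Tuple


def ez_diffusion(pc: float, vrt: float, mrt: float,
                 s: float = 0.1) -> Tuple[float, float, float]:
    """Return (v, a, t0) from proportion correct, variance and mean of correct RTs.

    vrt is in seconds squared, mrt in seconds; z/a is fixed at 0.5 and
    all across-trial variabilities are assumed to be zero.
    """
    if not 0.0 < pc < 1.0 or pc == 0.5:
        raise ValueError("EZ needs 0 < Pc < 1 and Pc != 0.5 (apply an edge correction)")
    logit = math.log(pc / (1.0 - pc))
    x = logit * (logit * pc**2 - logit * pc + pc - 0.5) / vrt
    v = math.copysign(s * x**0.25, pc - 0.5)   # drift rate
    a = s**2 * logit / v                       # boundary separation
    y = -v * a / s**2
    mdt = (a / (2.0 * v)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))  # mean decision time
    return v, a, mrt - mdt                     # t0 = MRT - MDT
```

Because no fitting is involved, the function returns immediately and always gives the same result for the same three summary statistics, which is the robustness advantage of EZ; the price is that z/a and the variability parameters cannot be estimated.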

Fast-dm (Voss & Voss, 2007) uses the partial differential equation (PDE) method to compute the CDF (Voss & Voss, 2008) and the Kolmogorov–Smirnov test (KS test; Kolmogorov, 1941) to estimate the parameters and determine the model fit. The PDE method avoids infinite sums and has the advantage of evaluating all starting points at the same time, thus reducing computing time (Voss & Voss, 2008). The KS test uses the test statistic T as the optimization criterion, and parameters are chosen such that T is minimized. The reaction-time distributions of both thresholds are estimated together by giving the reaction times of the lower threshold a negative sign. The parameter space is searched via the simplex method (Nelder & Mead, 1965) to obtain the best model fit. Starting values for v, a, and t₀ are provided by the EZ model (Wagenmakers et al., 2007); realistic values are chosen as starting values for the other parameters.
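The sign trick for combining both thresholds can be sketched as follows: reaction times of "new" responses (lower threshold) receive a negative sign, so that one empirical CDF covers both thresholds, and the KS statistic is the largest vertical distance between that empirical CDF and the model CDF. In this sketch, the model CDF is passed in as a function and is assumed to be computed elsewhere (fast-dm obtains it with the PDE method); the function names are hypothetical, and the simplex search over parameters is not shown.

```python
from typing import Callable, Iterable, List


def signed_rts(rts: Iterable[float], responses: Iterable[int]) -> List[float]:
    """Give lower-threshold ("new", coded 0) reaction times a negative sign."""
    return [rt if resp == 1 else -rt for rt, resp in zip(rts, responses)]


def ks_statistic(sample: List[float], model_cdf: Callable[[float], float]) -> float:
    """One-sample KS distance between the empirical CDF of `sample` and `model_cdf`.

    For signed RTs, model_cdf(x) should equal P("new" and RT >= -x) for x < 0,
    and P("new") + P("old" and RT <= x) for x >= 0.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = model_cdf(x)
        # compare the model CDF with the empirical CDF just below and at x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d
```

A fitting routine would call `ks_statistic` repeatedly with model CDFs for different parameter sets and keep the set with the smallest distance.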

The diffusion model analysis toolbox (DMAT; Vandekerckhove & Tuerlinckx, 2008) is a Matlab toolbox with a graphical user interface. It uses design matrices to obtain parameter estimates. Chi-square and maximum-likelihood methods are available for parameter estimation and goodness-of-fit tests.

As described above, fast-dm and DMAT use different test statistics. Each statistic has several advantages and disadvantages, and the authors of the programs motivated the choice of their statistics differently. Voss and Voss (2007) chose the KS test because it does not aggregate data and, thus, does not lose information. Additionally, it is not affected by outliers as much as the maximum-likelihood and the Chi-square statistic. The Chi-square statistic is more robust and faster than the maximum-likelihood method (Ratcliff & Tuerlinckx, 2002). Chi-square and maximum-likelihood methods are commonly used for parameter estimation.
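To contrast the aggregation used by the Chi-square approach with the KS approach, the sketch below shows the general logic of a quantile-based Chi-square statistic of the kind discussed by Ratcliff and Tuerlinckx (2002): the reaction times of each response are sorted into bins bounded by empirical quantiles (.1, .3, .5, .7, and .9 are a common choice), and the observed bin counts are compared with the counts implied by the model's predicted bin probabilities. This is not DMAT's code; the predicted probabilities are assumed to come from a model evaluated elsewhere, and the helper names are hypothetical.

```python
import bisect
import math
import statistics
from typing import List, Sequence


def quantile_cutpoints(rts: Sequence[float]) -> List[float]:
    """Empirical .1, .3, .5, .7, and .9 quantiles (every second decile)."""
    deciles = statistics.quantiles(rts, n=10, method="inclusive")
    return deciles[0::2]


def bin_counts(rts: Sequence[float], cutpoints: Sequence[float]) -> List[int]:
    """Count reaction times in the bins delimited by the sorted cut points."""
    counts = [0] * (len(cutpoints) + 1)
    for rt in rts:
        counts[bisect.bisect_right(cutpoints, rt)] += 1
    return counts


def chi_square(observed: Sequence[int], predicted_probs: Sequence[float],
               n_total: int) -> float:
    """Pearson Chi-square between observed counts and n_total * predicted probabilities.

    For a fit, pass the bins of both responses so that the predicted
    probabilities sum to 1 across all bins.
    """
    chi2 = 0.0
    for o, p in zip(observed, predicted_probs):
        expected = n_total * p
        if expected <= 0.0:
            if o > 0:
                return math.inf  # observed data in a bin the model rules out
            continue
        chi2 += (o - expected) ** 2 / expected
    return chi2
```

Only the bin counts enter the statistic, so the exact position of each reaction time within a bin is lost; this is the information loss that motivated Voss and Voss (2007) to prefer the KS statistic.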

In applications of the diffusion model reported by Ratcliff and colleagues, s was usually set to 0.1 (e.g., Ratcliff, 1978, 1988, 2002; Ratcliff & Rouder, 1998, 2000; Ratcliff et al., 2001, 2003, 2004, 2006a, b, 2007). DMAT and EZ set s = 0.1 by default. In applications of the model reported by Voss and colleagues, s was usually set to 1 (Voss et al., 2004; Spaniol et al., 2006, 2008). The fast-dm program (Voss & Voss, 2007) also uses a diffusion constant of 1. However, parameters obtained with one diffusion constant can simply be transformed to another by multiplying all parameters except the time parameters (t₀ and s_t) by the ratio of the desired to the original diffusion constant; relative measures such as z/a are unaffected. We converted the fast-dm results to s = 0.1 (i.e., multiplied them by 0.1) to make the results more comparable.
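Multiplying the whole accumulation process by a constant multiplies drift, boundaries, starting point, and their variabilities by that constant while leaving the time axis untouched, so the conversion amounts to one multiplication by the ratio of the target to the original diffusion constant. A minimal sketch; the parameter names are hypothetical, s_z is assumed to be an absolute (not relative) range, and time parameters pass through unchanged.

```python
from typing import Dict

# Parameters on the evidence scale; these scale with s.
# (Hypothetical key names; t0 and st are in seconds and are not rescaled.)
EVIDENCE_SCALE_PARAMS = {"v", "v_old", "v_new", "a", "z", "sv", "sz"}


def rescale(params: Dict[str, float], s_from: float = 1.0,
            s_to: float = 0.1) -> Dict[str, float]:
    """Convert parameters estimated with diffusion constant s_from to the s_to scale."""
    factor = s_to / s_from
    return {name: value * factor if name in EVIDENCE_SCALE_PARAMS else value
            for name, value in params.items()}
```

For example, `rescale({"v_old": 2.0, "a": 1.2, "z": 0.6, "t0": 0.35})` multiplies the first three values by 0.1 and leaves t0 as it is, so z/a stays at 0.5.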

The validity of the model

When first publishing the diffusion model, Ratcliff applied it to several recognition-memory paradigms including the old–new paradigm used here. He showed that the drift rate accounted for primacy and recency effects (Ratcliff, 1978). Since 1978, the model has been applied in many studies of recognition memory, yielding insights into the underlying dynamics of the process (e.g., Ratcliff, 1978, 2006; Ratcliff et al., 2004; Spaniol et al., 2006; Thapar et al., 2003) and having far-reaching implications, for example falsifying the global slowing hypothesis of cognitive aging (Wagenmakers, 2009).

There are also neuroscientific studies that support the model’s fit to data. Ratcliff, Cherian, and Segraves (2003) examined macaques via the moving-dot paradigm. In this paradigm, there are several dots moving randomly. Among them, however, are some dots that move simultaneously. The task is to identify the dots that move simultaneously. Ratcliff et al. (2003) showed that the macaques’ behavior as well as their neuronal activity could be fitted by the diffusion model. The fit of behavioral data from the moving-dot paradigm (Julesz, 1971) was also shown for humans (Ratcliff & McKoon, 2008).

These studies supported the model because the model fitted the data well, and they were able to explain a range of phenomena. However, interpreting parameter estimates as measures of cognitive processes requires construct validity of the measurement model in the sense of Cronbach and Meehl (1955). That is, the measures must show convergent as well as discriminant validity. Convergent validity is assessed by a measure’s covariation with related constructs, whereas discriminant validity refers to the lack of covariation with unrelated constructs. Measures are “process-pure” to the extent they show both types of validity. A systematic experimental validation assessing both types of validity is lacking in the realm of recognition memory. Parameter estimates are mathematical abstractions, and a systematic empirical justification of their psychological interpretation is indispensable.

In the perceptual domain, a systematic experimental validation of the diffusion model was conducted by Voss et al. (2004), using a color-discrimination task. In a first experiment, Voss et al. manipulated variables to affect the drift rate v, the threshold parameter a, and the response-time constant t₀. Their participants had to decide whether a dot stimulus was dominated by orange or by blue dots. There were four conditions, namely one standard condition and three other conditions that each targeted one specific model parameter. Task difficulty was increased to decrease the drift rate (difficult condition). An instruction to be very accurate was aimed at increasing only the threshold parameter a (accuracy condition). Finally, by allowing participants to press the response keys with one finger only, the authors strove to increase the response-time constant t₀ (handicap condition). They found the predicted pattern. That is, higher task difficulty decreased the drift rate, accuracy instructions led to a higher threshold parameter, and the handicap condition led to an increased response-time constant t₀. However, the authors also found unexpected results. In the accuracy condition, the t₀ parameter was higher than in the standard condition. In the handicap condition, the drift rate for blue-dominated stimuli v_blue and the starting point z/a differed significantly from those in the standard condition. The increased t₀ parameter in the accuracy condition was easily explained: If participants have more time to respond, they execute their responses more slowly. The differences in drift rate and starting point in the handicap condition, however, could not be explained as easily. Nevertheless, all individual models showed good fit as assessed via the goodness-of-fit statistic T (see Voss et al., 2004, for a detailed description). In a second experiment, Voss et al. (2004) manipulated the starting point by promoting one response over the other. They found that the starting point was biased towards the promoted response. Overall, the models described the empirical data well. The authors concluded that the parameters of the diffusion model represent the process components of the perceptual task well. The study supported the convergent and partly the discriminant validity of the diffusion-model parameters in the perceptual domain.

Additional support for the model’s validity came from Ratcliff and Rouder (1998) for psychophysical tasks and from Wagenmakers, Ratcliff, Gomez, and McKoon (2008) for the lexical-decision task. They showed that accuracy instructions increased the threshold parameter a, and that easier stimuli have higher drift rates. Wagenmakers et al. (2008) showed that unequal presentation proportions affected not only the starting point but also the boundary separation.

The aim of the present study was to provide a similar test of the model’s validity in the recognition domain. In this article, we present three recognition-memory experiments each targeting one central model parameter. In Experiment 1, we manipulated the ratio of old to new items in the test (targeting bias parameter z). In Experiment 2, we manipulated the instructions for accuracy versus speed (targeting threshold parameter a). In Experiment 3, we used a manipulation to affect the quality of encoding (targeting drift-rate parameter v).

If each manipulation affects the predicted parameter in the expected direction without influencing other parameters, this would be strong support for the validity of the model. Therefore, we tested if experimental manipulations targeting the process components of the diffusion model affected the corresponding parameters (convergent validity) and only these (discriminant validity).

Ratcliff (1978) advised against between-subject designs because in such designs, differences in reaction times may be due to between-group differences in speed–accuracy criteria (threshold parameter a). However, some variables, such as age in studies of cognitive aging, cannot be experimentally manipulated within participants (e.g., Spaniol et al., 2006). Hence, it is useful to know if the model is valid for both types of design. We, therefore, tested model validity using within-subject designs (Experiment 3) as well as between-subject designs (Experiments 1 and 2).

We analyzed the data with three different methods: fast-dm (Voss & Voss, 2007), EZ (Wagenmakers et al., 2007), and DMAT (Vandekerckhove & Tuerlinckx, 2008). Van Ravenzwaaij and Oberauer (2009) compared these methods with simulated data. For individual differences, they found that EZ did better than fast-dm and DMAT, and that there was no consistent difference between fast-dm and DMAT regarding the correlation with the true values that generated the data. Fast-dm and DMAT both had difficulties with the dispersion parameters, which are not estimated by EZ. Regarding parameter means, EZ showed a small bias to underestimate drift rate and non-decision time and to overestimate the threshold parameter. However, it captured the mean structure of the data and showed mean parameter differences between conditions in the expected direction. DMAT showed the smallest bias, but underestimated the response-time constant t₀ and overestimated drift rate and boundary separation. It captured group differences well. Fast-dm showed the largest bias and yielded smaller group differences than there were in the simulated data sets. Van Ravenzwaaij and Oberauer concluded that all three methods show reasonable accuracy when they have sufficient data points. DMAT required a large number of data points, whereas EZ and fast-dm needed only about 80 data points to produce reasonable estimates. EZ and DMAT proved better at detecting group differences. Thus, it is not easy to decide which toolbox to use. EZ seems to be very accurate but cannot detect differences in the bias parameter. DMAT is better than fast-dm at detecting group differences but needs more trials to yield reasonable estimates.

The aim of our study is similar to that of Voss et al. (2004) in that we experimentally evaluated the validity of the diffusion model. While Voss and colleagues validated the model in the perceptual domain, we evaluated its validity for recognition-memory experiments. There is no a priori reason to believe that perceptual evidence accumulation and retrieval from memory follow the same laws. Hence, an assessment of construct validity is necessary in both domains. Additionally, our work is similar to that of Van Ravenzwaaij and Oberauer (2009) in that it compares different methods for estimating diffusion-model parameters. Unlike Van Ravenzwaaij and Oberauer, we did not simulate data; instead, we analyzed our experimental data with all three toolboxes to compare the three methods systematically. Our experiments were typical recognition experiments and hence did not provide ideal conditions for analysis with the diffusion model. For example, we used relatively few trials (resulting in relatively few error responses) compared with a lexical-decision task or a perceptual task. Hence, we examined the performance of the three methods in compromised fitting situations.

Experiment 1

The first experiment tested the validity of the starting-point parameter z, using a standard response-bias manipulation, namely the manipulation of the ratio of old to new items in the test (e.g., Bröder & Schütz, 2009; Criss, 2010; Macmillan & Creelman, 2005; Starns, Ratcliff, & McKoon, 2012). Participants were informed about this ratio. Words were used as stimuli. We expected the ratio manipulation to affect only the bias parameter z/a. If there are more old words than new words in the test (and participants are aware of this), the starting point is expected to be biased towards the threshold for the old response. Accordingly, if there are more new words than old words in the test, the starting point is expected to be biased towards the threshold for the new response. If this manipulation specifically affects the bias parameter and not the other parameters, this would provide strong support for the diffusion model.

This response-bias manipulation was used by Rotello et al. (2006), for example. They found that participants adopted a lenient signal detection criterion when they were informed that the majority of the test items were old. The signal detection criterion resembles the bias parameter z/a of the diffusion model (Ratcliff & McKoon, 2008). Hit rate and false-alarm rate both increase as the proportion of old items increases (Criss, 2010; Rotello et al., 2006). Bröder and Schütz (2009) showed that this manipulation affected the bias parameters in SDT and a two high-threshold model in a similar fashion.

Methods

Participants

Sixty participants (53 female) took part in the experiment. They were students at the University of Düsseldorf (mean age = 22.3 years, range 18–35 years) who received course credit or monetary payment.

Design

We manipulated the ratio of old to new items between participants with two levels (1:2 versus 2:1).

Materials

Items were drawn from a pool of 285 nouns that we selected from a collection of German nouns normed for concreteness (Hager & Hasselhorn, 1994). The ratings vary between −20 (very abstract) and +20 (very concrete). Our pool included 285 concrete nouns (mean ratings >+5) of 4–9 letters.

Procedure

There were one or two participants in each session, seated in individual computer booths. Stimulus presentation and response recordings were computer directed. For each participant, 140 nouns were randomly drawn from the pool for the study list. They were presented one at a time for 2 s each in the center of the screen, preceded by a primacy buffer of five items that were the same for all participants. Participants were instructed to concentrate on the words and to memorize them. After a three-minute filler task (mental rotation), the test phase followed. In the old-bias condition, there were 140 old nouns (i.e., all nouns from the study list) and 70 new nouns (randomly drawn from the remaining items in the pool). In the new-bias condition, there were 70 old nouns (randomly drawn from the study list) and 140 new nouns. Participants were informed about the number of old and new words before the test phase started. To ensure understanding of the instructions, participants were asked if there were more old words or more new words in the test. All participants could answer this question correctly. Two marked keys on the keyboard (C and M) were used for the responses in the test. The assignment of the keys to the response options old and new was counterbalanced across participants. Three seconds after response selection, the next item appeared on the screen. If the latency of a response exceeded 4 s, a reminder appeared on the screen prompting the participant to respond faster. After completion of the recognition test, participants were debriefed. The average length of a session was approximately 45 min.

Results

Performance measures

Mean hit rates were 0.61 (SD = 0.14) in the new-bias condition and 0.71 (SD = 0.14) in the old-bias condition, a significant difference, t(58) = −2.45, p = 0.02, d = 0.63. False-alarm rates were 0.17 (SD = 0.10) in the new-bias condition and 0.29 (SD = 0.15) in the old-bias condition, also a significant difference, t(58) = −3.65, p < 0.01, d = 0.94. The two groups did not differ in SDT’s sensitivity parameter d′ (new-bias: M = 1.35, SD = 0.54; old-bias: M = 1.22, SD = 0.64) but differed significantly in the response criterion c (new-bias: M = 0.38, SD = 0.41; old-bias: M = 0.02, SD = 0.35), t(58) = 3.65, p < 0.01, d = 0.94. Mean reaction times did not differ significantly: They were 945 ms (SD = 0.14) in the new-bias condition and 978 ms (SD = 0.14) in the old-bias condition.
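For reference, the SDT measures reported above are computed per participant from hit rate H and false-alarm rate F as d′ = Φ⁻¹(H) − Φ⁻¹(F) and c = −[Φ⁻¹(H) + Φ⁻¹(F)]/2, where Φ⁻¹ is the inverse standard-normal CDF. A minimal sketch, assuming rates strictly between 0 and 1 (rates of 0 or 1 need a correction, which is not shown). Because group means of d′ and c are averages of individual values, plugging the mean hit and false-alarm rates into it will not, in general, reproduce the group means above.

```python
from statistics import NormalDist
from typing import Tuple


def sdt_measures(hit_rate: float, fa_rate: float) -> Tuple[float, float]:
    """Return (d_prime, c) for one participant; rates must lie in (0, 1).

    Positive c indicates a conservative criterion (bias towards "new").
    """
    if not (0.0 < hit_rate < 1.0 and 0.0 < fa_rate < 1.0):
        raise ValueError("rates of 0 or 1 need a correction before conversion")
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)
```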

Parameter estimation and model fit

First, we performed parameter estimation and goodness-of-fit tests with the fast-dm program (Voss & Voss, 2007) and with DMAT (Vandekerckhove & Tuerlinckx, 2008). For each participant, we estimated one model with two drift rates, one for old items and one for new items. Each model was based on 210 trials; depending on the condition, one drift rate was based on 140 trials and the other on 70. Following Voss et al. (2004), we excluded trials with reaction times below 300 ms or above 4,000 ms from the analyses, because Ratcliff and Tuerlinckx (2002) showed that outliers can strongly affect parameter estimation, and because participants were reminded to respond faster after 4,000 ms. We excluded a total of 37 trials (< 1 %). The upper threshold was associated with the old response; the lower threshold was associated with the new response and was set to 0. Thus, negative drift rates indicate an approach toward the new response, whereas positive drift rates indicate an approach toward the old response.
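The exclusion rule and the threshold coding can be expressed as a small preprocessing step. This is a sketch under our own assumptions: the `Trial` record, the tuple layout, and the helper names are hypothetical and do not reflect the input format that fast-dm or DMAT actually expects.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    rt_ms: float
    response: str       # "old" or "new"
    item_is_old: bool


MIN_RT_MS, MAX_RT_MS = 300, 4000  # exclusion bounds from the text


def trim(trials):
    """Drop trials faster than 300 ms or slower than 4,000 ms."""
    return [t for t in trials if MIN_RT_MS <= t.rt_ms <= MAX_RT_MS]


def code_for_model(trial):
    """Map a trial to (rt_seconds, threshold, item type).

    threshold 1 = upper ('old') threshold, 0 = lower ('new') threshold.
    The tuple layout is a hypothetical choice, not a program's input format.
    """
    threshold = 1 if trial.response == "old" else 0
    item_type = "old_item" if trial.item_is_old else "new_item"
    return trial.rt_ms / 1000.0, threshold, item_type


# Made-up trials: the first (too fast) and last (too slow) are excluded.
trials = [Trial(250, "old", True), Trial(812, "new", False),
          Trial(1430, "old", True), Trial(4210, "new", True)]
kept = trim(trials)
coded = [code_for_model(t) for t in kept]
```

Note that item type (old/new) and response (old/new threshold) are coded separately: the item type selects which drift rate applies, and the response determines which threshold was reached.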

We estimated eight parameters per participant: the mean bias parameter z, the mean upper threshold a, the mean drift rate for old items v_old, the mean drift rate for new items v_new, the mean response-time constant t_0, the range of the bias parameter s_z, the range of the response-time constant s_t, and the standard deviation of the drift rates s_v. Like Voss et al. (2004), we present z/a instead of z because z/a is easier to interpret. A bias parameter of z/a = 0.5 represents an unbiased starting point. Values greater than 0.5 indicate a bias towards the old response; values less than 0.5 indicate a bias towards the new response.
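To show what the eight parameters control, the following Python sketch simulates the diffusion process with a simple Euler scheme. It draws the starting point and the response-time constant from uniform distributions with ranges s_z and s_t and the drift rate from a normal distribution with SD s_v. The within-trial noise s = 0.1, the 1-ms step, and all parameter values in the usage example are assumptions; the simulator is only an illustration, not the estimation routine of fast-dm or DMAT.

```python
import math
import random


def simulate_trial(rng, a, zr, v, t0, sz=0.0, st=0.0, sv=0.0, s=0.1, dt=0.001):
    """Simulate one diffusion-model trial; return (response, rt_seconds).

    a: threshold separation; zr: relative starting point z/a; v: mean drift;
    t0: mean response-time constant (s); sz, st: ranges of uniform variability
    in starting point and t0; sv: SD of normal drift variability.
    Assumed: within-trial noise s = 0.1 and Euler step dt = 1 ms.
    """
    z = zr * a
    if sz / 2 >= min(z, a - z):
        raise ValueError("starting-point range must stay inside (0, a)")
    x = z + rng.uniform(-sz / 2, sz / 2)     # trial-specific starting point
    drift = rng.gauss(v, sv)                  # trial-specific drift rate
    noise_sd = s * math.sqrt(dt)
    t = 0.0
    while 0.0 < x < a:                        # accumulate until a threshold is hit
        x += drift * dt + rng.gauss(0.0, noise_sd)
        t += dt
    response = "old" if x >= a else "new"     # upper threshold = "old"
    return response, t + t0 + rng.uniform(-st / 2, st / 2)


def simulate(n, rng, **params):
    """Return (proportion of 'old' responses, mean RT in s) over n trials."""
    results = [simulate_trial(rng, **params) for _ in range(n)]
    p_old = sum(r == "old" for r, _ in results) / n
    mean_rt = sum(rt for _, rt in results) / n
    return p_old, mean_rt


# Illustrative parameters with zero mean drift; only z/a differs.
rng = random.Random(1)
base = dict(a=0.14, v=0.0, t0=0.4, sz=0.02, st=0.1, sv=0.05)
p_high, rt_high = simulate(400, rng, zr=0.8, **base)
p_low, rt_low = simulate(400, rng, zr=0.2, **base)
```

With zero mean drift, the proportion of "old" responses roughly tracks z/a, so moving the starting point toward the upper threshold increases "old" responses without any change in evidence quality.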

For fast-dm, the KS test indicated good fit for all individual models (p > 0.05). For DMAT, we used the chi-square method with default bins to estimate parameters and to assess model fit. The chi-square test indicated good fit for 57 models and poor fit for the remaining three individual models. We included only models with sufficient fit (i.e., p > 0.05). Because some participants made very few errors, DMAT issued several warnings; nevertheless, we included these parameter estimates in the analysis when the fit was acceptable. As this experiment targeted the bias parameter z/a, we did not analyze the data with the EZ method, which fixes z/a at 0.5.
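fast-dm's KS criterion compares the observed RT distribution with the model's predicted distribution; as we understand the program's approach (Voss & Voss, 2007), RTs at the lower threshold are sign-flipped so that both responses form a single distribution. The sketch below shows only the KS distance on such sign-coded data and, as a simplification of our own, compares the observed data with a sample of model-generated RTs (a two-sample comparison) rather than with the predicted CDF. The data values are made up, and no p value is computed.

```python
def signed_rts(rts, responses):
    """Code 'new' (lower-threshold) RTs as negative and 'old' RTs as positive."""
    return [rt if r == "old" else -rt for rt, r in zip(rts, responses)]


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: max distance between empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    nx, ny = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < nx and j < ny:
        cut = min(xs[i], ys[j])
        while i < nx and xs[i] <= cut:   # advance past ties in x
            i += 1
        while j < ny and ys[j] <= cut:   # advance past ties in y
            j += 1
        d = max(d, abs(i / nx - j / ny))
    return d


# Made-up observed and model-generated trials (RTs in seconds).
observed = signed_rts([0.61, 0.75, 0.92, 1.10], ["old", "new", "old", "old"])
model_sample = signed_rts([0.58, 0.70, 0.88, 1.30], ["old", "old", "new", "old"])
d = ks_statistic(observed, model_sample)
```

Sign-coding means that a model predicting the wrong proportion of "new" responses is penalized by the same statistic that penalizes a wrongly shaped RT distribution.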

Parameter analyses with fast-dm

The significance level was set to 0.05 for all tests. Drift rates were significantly steeper for new items than for old items in both conditions (new-bias: M (old) = 0.05, SD (old) = 0.08, M (new) = 0.14, SD (new) = 0.06, t(29) = −5.60, p < 0.01, d = 1.02; old-bias: M (old) = 0.04, SD (old) = 0.05, M (new) = 0.14, SD (new) = 0.07, t(29) = −6.41, p < 0.01, d = 1.17). To test the influence of the manipulation, we conducted independent-samples t tests for each parameter. As predicted, the bias parameter z/a was significantly higher in the old-bias condition than in the new-bias condition, M (old-bias) = 0.66, SD (old-bias) = 0.09, M (new-bias) = 0.49, SD (new-bias) = 0.10, t(58) = −7.11, p < 0.01, d = 1.84. Because z/a = 0.5 represents an unbiased starting point, both conditions were expected to differ significantly from this neutral point. The bias parameter in the new-bias condition did not differ significantly from 0.5, t(29) = −0.69, p = 0.49, d = 0.01, suggesting that, contrary to prediction, the starting point was not biased in this condition. In the old-bias condition, the bias parameter was significantly higher than 0.5, t(29) = −9.86, p < 0.01, d = 1.78, as predicted.

Contrary to predictions, the threshold parameter a also differed significantly between conditions, M (old-bias) = 0.14, SD (old-bias) = 0.02, M (new-bias) = 0.13, SD (new-bias) = 0.02, t(58) = −2.13, p = 0.04, d = 0.55. Participants in the old-bias condition showed a larger value of the threshold parameter than participants in the new-bias condition. Thus, the former were more conservative. The effect size (measured by Cohen’s d), however, was only about one-third of that of the bias parameter. No other comparison yielded significance (all p > 0.05). Averaged mean parameter estimates are shown in Fig. 2.
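The reported effect sizes appear consistent with standard conversions from t: d = |t|·√(1/n₁ + 1/n₂) for the between-groups tests (n = 30 per group) and d_z = |t|/√n for the within-subject drift comparisons. The sketch below reproduces several values from the fast-dm analyses; that the authors used exactly these formulas is our inference, not something stated in this section.

```python
import math


def d_independent_from_t(t, n1, n2):
    """Cohen's d for two independent groups, recovered from Student's t."""
    return abs(t) * math.sqrt(1 / n1 + 1 / n2)


def dz_from_t(t, n):
    """d_z for a one-sample or paired-samples t test: |t| / sqrt(n)."""
    return abs(t) / math.sqrt(n)


def d_one_sample(mean, sd, mu0):
    """Standardized distance of a sample mean from a reference value mu0."""
    return (mean - mu0) / sd


d_bias = d_independent_from_t(-7.11, 30, 30)        # z/a, old- vs. new-bias
d_threshold = d_independent_from_t(-2.13, 30, 30)   # threshold parameter a
dz_drift = dz_from_t(-5.60, 30)                     # drift rates, new-bias group
d_bias_old = d_one_sample(0.66, 0.09, 0.5)          # z/a vs. 0.5, old-bias group
print(round(d_bias, 2), round(d_threshold, 2), round(d_threshold / d_bias, 2))
```

The ratio d_threshold / d_bias is about 0.30, in line with the statement that the effect on a was about one-third of the effect on z/a.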

Parameter analyses with DMAT

There was no significant difference between the absolute values of the drift rates for old and new items in either condition (all p > 0.05). Again, the bias parameter z/a was significantly higher in the old-bias condition than in the new-bias condition, M (old-bias) = 0.60, SD (old-bias) = 0.16, M (new-bias) = 0.46, SD (new-bias) = 0.13, t(58) = −3.70, p < 0.01, d = 0.99. The bias parameter in the old-bias condition differed significantly from 0.5, p < 0.01, d = 0.65, but the bias parameter in the new-bias condition did not, p = 0.12, d = 0.31. No other comparison yielded significance (all p > 0.05). Averaged mean parameter estimates are shown in Fig. 3.
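The DMAT analysis compared the absolute drift rates for old and new items; absolute values are needed because, with the new response at the lower threshold, drift rates for new items are negative. A minimal sketch, assuming a paired-samples t test within each condition (as for the fast-dm drift comparisons with df = 29) and using hypothetical per-participant values:

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x - y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / math.sqrt(len(diffs))   # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical drift rates for five participants; new-item drifts are
# negative because the 'new' response is the lower threshold.
v_old = [0.06, 0.02, 0.09, 0.04, 0.07]
v_new = [-0.12, -0.05, -0.10, -0.08, -0.03]
t, df = paired_t([abs(v) for v in v_old], [abs(v) for v in v_new])
```

Comparing the raw signed values instead would test mainly the sign of the drift rates rather than the difference in evidence quality.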

Discussion

We conducted this experiment to validate the interpretation of the bias parameter z/a of the diffusion model. We manipulated the proportion of old to new items at test, which should affect the bias parameter but no other parameter. In line with this hypothesis, both estimation methods showed the strongest effect of the manipulation on the bias parameter. According to Cohen's (1988) conventions, the effect size d was large to very large in each case. However, the manipulation also had a medium-sized effect on the threshold parameter as estimated with fast-dm.

The bias parameter z/a is the starting point of the diffusion process. Together with the threshold parameter a, it defines the amount of information that is needed to decide whether an item is old or new. When there were more old items in the test, the starting point moved towards the upper threshold, but at the same time the thresholds moved apart. Whether the effect on parameter a reflects a genuine psychological effect of stricter criteria or a lack of discriminant validity of the parameters estimated with fast-dm cannot be decided at this point. If it reflects the former, it would underline Ratcliff's (1978) warning against between-subjects designs, which may lead to different criteria in the experimental conditions, although in this case for unknown reasons.
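The interplay of starting point and thresholds can be made explicit with the standard closed-form probability that a Wiener process starting at z = (z/a)·a, with drift v and noise s, is absorbed at the upper threshold: P(old) = (1 − exp(−2vz/s²)) / (1 − exp(−2va/s²)), which reduces to z/a when v = 0. The Python sketch below implements this expression; it ignores the variability parameters s_z and s_v, assumes s = 0.1, and its inputs are illustrative rather than fitted values.

```python
import math


def p_upper(a, zr, v, s=0.1):
    """Probability of absorption at the upper ('old') threshold a before 0.

    Wiener process starting at z = zr * a with drift v and noise s (assumed
    0.1); no trial-to-trial variability. Extreme inputs can overflow.
    """
    if v == 0:
        return zr                       # limit of the formula as v -> 0
    k = -2.0 * v / s ** 2
    # expm1 keeps (exp(k z) - 1) / (exp(k a) - 1) accurate for small k
    return math.expm1(k * zr * a) / math.expm1(k * a)


# Illustrative drifts for old items (positive) and new items (negative).
for zr in (0.49, 0.66):
    hit = p_upper(0.14, zr, 0.05)
    false_alarm = p_upper(0.14, zr, -0.14)
    print(zr, round(hit, 3), round(false_alarm, 3))
```

Raising z/a increases P(old) for both positive (old items) and negative (new items) drift rates, which is how a starting-point shift raises hit and false-alarm rates together; with a fixed positive drift, widening a instead increases the probability of reaching the correct (upper) threshold.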

To summarize, in Experiment 1, both estimation methods showed convergent validity and detected the predicted difference in the starting point. However, only DMAT showed satisfactory discriminant validity: fast-dm also found an unpredicted difference in one other parameter, the threshold parameter a, although this effect was considerably smaller.

Experiment 2

In Experiment 2, we used the same materials and similar procedures as in Experiment 1. The aim of this experiment was to test the validity of the threshold parameter a. Participants received different kinds of feedback depending on their experimental condition. In the accuracy condition, participants received negative feedback if they made a mistake. In the speed condition, participants received negative feedback if they took longer than 1,000 ms to respond. This manipulation was expected to lead to an adjustment of the thresholds: participants in the accuracy condition should adopt more conservative criteria and thus have a higher threshold parameter than participants in the speed condition. Ratcliff et al. (2004) used a similar manipulation within subjects, but they held the other model parameters fixed across conditions and compared the results of young and older adults. They showed that the model captured the effect of speed and accuracy instructions with only the threshold parameter a changing.
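The feedback contingencies can be stated as a simple rule. This sketch assumes that a response slower than 1,000 ms counts as too slow, that speed-condition participants received no error feedback and accuracy-condition participants no speed feedback (the text mentions only the stated contingencies), and that the messages are placeholders.

```python
def feedback(condition, correct, rt_ms, deadline_ms=1000):
    """Return a negative-feedback message for one test trial, or None.

    Hypothetical rule: 'accuracy' penalizes errors only; 'speed' penalizes
    responses slower than the deadline only. Message texts are placeholders.
    """
    if condition == "accuracy":
        return None if correct else "Error!"
    if condition == "speed":
        return "Too slow!" if rt_ms > deadline_ms else None
    raise ValueError(f"unknown condition: {condition!r}")


# A fast error is penalized only under accuracy feedback,
# a slow correct response only under speed feedback.
print(feedback("accuracy", False, 650), feedback("speed", False, 650))
print(feedback("accuracy", True, 1400), feedback("speed", True, 1400))
```

Because each condition penalizes only one kind of failure, the diffusion model predicts the difference to show up mainly in the threshold parameter a.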