Memory & Cognition, Volume 42, Issue 8, pp 1357–1372

Using response time modeling to distinguish memory and decision processes in recognition and source tasks


DOI: 10.3758/s13421-014-0432-z


Abstract

Receiver operating characteristic (ROC) functions are often used to make inferences about memory processes, such as claiming that memory strength is more variable for studied versus nonstudied items. However, decision processes can produce the ROC patterns that are usually attributed to memory, so independent forms of data are needed to support strong conclusions. The present experiments tested ROC-based claims about the variability of memory evidence by modeling response time (RT) data with the diffusion model. To ensure that the model can correctly discriminate equal- and unequal-variance distributions, Experiment 1 used a numerousity discrimination task that had a direct manipulation of evidence variability. Fits of the model produced correct conclusions about evidence variability in all cases. Experiments 2 and 3 explored the effect of repeated learning trials on evidence variability in recognition and source memory tasks, respectively. Fits of the diffusion model supported the same conclusions about variability as the ROC literature. For recognition, evidence variability was higher for targets than for lures, but it did not differ on the basis of the number of learning trials for target items. For source memory, evidence variability was roughly equal for source 1 and source 2 items, and variability increased for items with additional learning attempts. These results demonstrate that RT modeling can help resolve ambiguities regarding the processes that produce different patterns in ROC data. The results strengthen the evidence that memory strength distributions have unequal variability across item types in recognition and source memory tasks.

Keywords

Diffusion model · Recognition memory · Source memory · Unequal variance assumption

When we look back on an event, what kind of information do we get from memory? Subjectively, we reexperience a somewhat degraded version of the perceptions, actions, thoughts, and emotions that characterized the event, but how is this information translated into explicit decisions about what we have and have not experienced in the past? Theorists from a variety of perspectives propose that all of the different types of remembered information vary on a continuum of strength, and the total evidence that an event was experienced is defined by combining these strength values (Johnson, Hashtroudi, & Lindsay, 1993; Ratcliff, 1978; Wixted, 2007).

Researchers have often attempted to test hypotheses about memory evidence using receiver operating characteristic (ROC) functions, which plot the proportion of times a response is made correctly on the proportion of times it is made incorrectly across different levels of bias (Egan, 1958; Wixted, 2007). The different bias levels are almost always created by having participants respond on a confidence scale. Conclusions about memory rely on the assumption that properties of the ROC function—such as asymmetry in the function or the degree of curvature—are a consequence of underlying memory processes. However, ROC properties can also be influenced by decision processes. For example, ROC asymmetry and curvature can be affected by differences in decision criteria for the different response alternatives in sequential sampling models (Ratcliff & Starns, 2009, 2013; Van Zandt, 2000), changes in the decision criteria on one evidence dimension across different levels of another dimension (Starns, Pazzaglia, Rotello, Hautus, & Macmillan, 2013; Starns, Rotello, & Hautus, 2014), and variability in the position of decision criteria across trials (Benjamin, Diaz, & Wee, 2009; Mueller & Weidemann, 2008).

A critical goal for advancing memory research is distinguishing which ROC results are properly interpreted in terms of memory evidence and which are produced by decision mechanisms. My primary goal was to model response time (RT) distributions from memory tasks to determine whether RT data support the same conclusions about memory as ROC analyses. The RT model was applied to two-choice responding, so decision biases that affect confidence ratings cannot distort the results. Another goal was to model results from a task with known evidence distributions to ensure that the RT model supported correct conclusions.

Evidence variability and ROC functions

The view that memory evidence is a continuous strength value can be formalized using the standard univariate signal detection model (Egan, 1958; Lockhart & Murdock, 1970). Signal detection models have been broadly applied in studies of both recognition memory, in which participants are asked to discriminate items that were studied on a previous list (targets) from items that were not (lures), and source memory, in which participants are asked to specify the context or presentation format that was paired with an item at study. In the recognition memory version of the model, memory evidence is higher for targets than lures on average, but varies from trial to trial. For example, targets can sometimes produce weak evidence (perhaps because they were not well attended at encoding), and lures can sometimes produce strong evidence (perhaps because they are similar to one or more of the studied words). Some theorists contend that strength is more variable for targets than for lures, given that encoding and retrieval processes are more successful for some items than for others (Wixted, 2007). Indeed, a number of computational models of memory produce unequal-variance distributions (Gillund & Shiffrin, 1984; Hintzman, 1986; McClelland & Chappell, 1998; Ratcliff, Sheu, & Gronlund, 1992; Shiffrin & Steyvers, 1997). However, some theorists have challenged the claim that variability in learning success produces unequal-variance distributions (e.g., Koen & Yonelinas, 2010), and some models do produce equal-variance strength distributions (e.g., Murdock, 1982).

ROC data have played a key role in testing assumptions about the variability of memory evidence distributions. Recognition ROCs are almost always poorly fit by an equal-variance model and closely fit by a model in which evidence is more variable for targets than for lures (e.g., Egan, 1958; Wixted, 2007). The unequal-variance model performs better because it can match the asymmetry in the empirical functions. Providing extra learning trials generally has little or no influence on the asymmetry of the recognition ROC, suggesting that increasing the average memory strength does not affect the across-trial variability in strength (Dube, Starns, Rotello, & Ratcliff, 2012; Glanzer, Kim, Hilford, & Adams, 1999; Ratcliff, McKoon, & Tindall, 1994; Ratcliff et al., 1992; but see Heathcote, 2003). For source memory tasks, the ROC function is typically symmetrical when the two alternative sources are equal in memory strength (Hilford, Glanzer, Kim, & DeCarlo, 2002). If one source is learned more effectively, the function becomes asymmetrical and indicates higher evidence variability for the strong source than the weak source (Starns et al., 2013; Yonelinas & Parks, 2007).
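To make the signal detection framing concrete, the sketch below (illustrative parameter values only, not code or values from any of the cited studies) generates ROC points from an equal-variance and an unequal-variance Gaussian model; the zROC slope it recovers equals the ratio of the lure and target standard deviations.

```python
# Illustrative sketch: ROC points from equal- and unequal-variance Gaussian
# signal detection models. Parameter values are arbitrary.
import numpy as np
from scipy.stats import norm

def sdt_roc(d_prime, sigma_target, criteria):
    """Hit and false-alarm rates: lures ~ N(0, 1), targets ~ N(d_prime, sigma_target)."""
    far = norm.sf(criteria, loc=0.0, scale=1.0)               # false-alarm rates
    hr = norm.sf(criteria, loc=d_prime, scale=sigma_target)   # hit rates
    return far, hr

criteria = np.linspace(-1.5, 3.0, 25)
far_ev, hr_ev = sdt_roc(1.0, 1.00, criteria)   # equal variance: symmetric ROC
far_uv, hr_uv = sdt_roc(1.0, 1.25, criteria)   # unequal variance: asymmetric ROC

# The zROC slope equals sigma_lure / sigma_target (1.0 for EV, 0.8 for UV).
print(np.polyfit(norm.ppf(far_ev), norm.ppf(hr_ev), 1)[0])
print(np.polyfit(norm.ppf(far_uv), norm.ppf(hr_uv), 1)[0])
```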

Starns et al. (2013) reported evidence that the effect of strength on the asymmetry of source memory ROC functions (or equivalently, the slope of zROC functions) is produced by decision processes. By evaluating source confidence ratings for nonstudied items incorrectly called “old,” these researchers demonstrated that participants were more willing to make high-confidence source ratings when they were more confident that the item was studied. This pattern was observed in 14 data sets, and the decision bias was often quite dramatic. For example, participants in one data set made high-confidence source ratings for fewer than 1 % of nonstudied items that they rated as “maybe old,” as compared with 64 % of nonstudied items that they rated as “definitely old.” Critically, when the confidence ratings were used to construct ROC functions, this decision bias had the same influence on ROC asymmetry as increasing the evidence variability for the stronger source. Therefore, Starns et al. (2013) suggested that learning strength might have little or no effect on evidence variability for source memory, with the apparent effect produced by decision processes.

Experiments 2 and 3 herein explored the effect of additional learning on evidence variability in recognition and source tasks by analyzing accuracy and RT data with the diffusion model (Ratcliff, 1978). These experiments used similar methods, and the primary goal was to determine whether RT modeling—like ROC modeling—shows that additional learning increases evidence variability for source tasks but not for recognition tasks. Starns and Ratcliff (2014) applied the diffusion model to nine recognition memory data sets, and the results consistently showed that memory evidence was more variable for studied items than for nonstudied items. Learning strength did not affect evidence variability for studied words, consistent with the findings of ROC studies. I expect to replicate these results in Experiment 2, and Experiment 3 will reveal whether RT and ROC modeling also support the same conclusions for source memory.

RT modeling can inform whether the strength effect on the asymmetry of source memory ROC functions is based on decision processes alone or whether changes in memory evidence also play a role. The diffusion model can estimate evidence variability from a two-choice task without confidence ratings. If the strength effect is produced by confidence-scale decision biases, then the diffusion model estimates should show that additional learning has little or no effect on evidence variability for source memory. If increasing source performance does affect the overall variability of source evidence, then the diffusion model variability estimates should be higher for stronger items, matching the ROC results.

Unequal variance in RT modeling

The diffusion model predicts response proportions and RT distributions from two-choice decision tasks in terms of a sequential sampling decision process (Ratcliff, 1978). Figure 1 shows an example application of this model to a source memory task with male and female sources. For a given word on the test, samples of memory evidence are taken from moment to moment. If the evidence in the sample is more consistent with the male source than the female source, then the evidence accumulation process moves toward the top boundary (and vice versa). Variability in evidence across time results in wandering paths such as the one shown in the first panel of Fig. 1. However, most of the evidence samples should be consistent with the true source of the item, so the process tends to approach the correct boundary. When the process reaches one of the boundaries, the participant makes the associated response. Wider boundaries lead to slower responses (more accumulation time), but also more accurate responses because the within-trial variability has more time to average out.
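The accumulation process just described can be illustrated with a few lines of code. The sketch below is not fitted or experiment code; the parameter values are round numbers chosen only to be in a plausible range, and the within-trial noise is approximated with Euler steps.

```python
# Minimal sketch of a single diffusion-model trial: evidence starts at z and
# accumulates with drift v plus within-trial Gaussian noise (s) until it hits
# the bottom (0) or top (a) boundary. Illustrative parameter values only.
import numpy as np

def simulate_trial(v, a=0.10, z=0.05, s=0.1, ter=0.45, dt=0.001, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    x, t = z, 0.0
    while 0.0 < x < a:
        x += v * dt + s * np.sqrt(dt) * rng.standard_normal()   # Euler step
        t += dt
    response = "top" if x >= a else "bottom"   # e.g., "male" vs. "female"
    return response, t + ter                   # decision time plus nondecision time

rng = np.random.default_rng(1)
print([simulate_trial(v=0.2, rng=rng) for _ in range(3)])
```

Wider boundaries (larger a) make the loop run longer on average but give the noise more time to average out, which is the speed–accuracy tradeoff described above.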
Fig. 1

The diffusion model applied to a source memory task with male and female sources. Panel 1 shows an example drift rate and accumulation path for a single trial. a is the boundary width, z is the starting point, and sZ is the range in starting point across trials. Panel 2 shows across-trial distributions of drift rates for strong and weak items from both sources. The top set of distributions displays an equal-variance model, and the bottom set displays an unequal-variance model. Panel 3 shows predictions from the diffusion model with either the equal-variance (EV) or unequal-variance (UV) distributions displayed in Panel 2. Predictions for response proportions and the .1, .5, and .9 quantiles of the response time (RT) distributions are displayed

The critical model parameters are the separation between the decision boundaries (a), the starting point of evidence accumulation (z), the duration of nondecision processes (Ter), and the drift rate toward a response boundary (v). The amount of within-trial variability in the accumulation process (s) is treated as a scaling parameter, and I follow the standard practice of setting it to .1 (e.g., Ratcliff & Rouder, 1998).

The model accommodates between-trial variability in many of the parameters, including a uniform distribution of starting points with range sZ, a uniform distribution of nondecision times with range sT, and a normal distribution of drift rates with standard deviation η (Ratcliff & McKoon, 2008; Ratcliff, Van Zandt, & McKoon, 1999). The between-trial variation in drift is similar to the variation in strength assumed by signal detection models. Panel 2 of Fig. 1 shows between-trial drift distributions for a male/female source task such as the one used in Experiment 3. Female items usually have drift rates that approach the bottom boundary (i.e., the distribution means are below zero), and male items usually have drift rates that approach the top boundary (i.e., the distribution means are above zero). However, some items have drift rates that approach the incorrect boundary.

The displayed distributions produce an accuracy of about .65 for weak items and .8 for strong items, and they represent two possible mechanisms for this increase in accuracy. In the set of distributions on the top, strengthening items shifts the means of the drift distributions farther from zero with no effect on variability (η = .1 for all distributions). In the set on the bottom, strengthening items both shifts the means of the distributions and increases the variability (η = .1 for weak items and .2 for strong items). The drift distribution means were set to match the desired accuracy levels, and the other model parameters were similar to the average values from Experiment 3 below.

Figure 1 also shows model predictions to demonstrate how the equal- and unequal-variance alternatives can be discriminated with RT distributions (Panel 3). Predictions are shown for the proportion of “male” and “female” responses, as well as the .1, .5, and .9 quantiles of the RT distribution associated with each response. These quantiles show the leading edge, the median, and the tail of the RT distributions, respectively. All of the drift distributions are the same for weak items, so the predictions are identical as well. For strong items, the RT quantiles show that the equal- and unequal-variance versions of the model produce slightly different RT distributions for the same accuracy level. Specifically, the unequal-variance version of the model produces faster correct responses, with a very small effect on the leading edge, a slightly larger effect on the median, and a still larger effect on the tail. The error RTs show a hint of the same pattern, but the differences are smaller.
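As an illustration of how such predictions can be generated, the sketch below (a reconstruction with assumed parameter values, not the code behind Fig. 1) simulates strong-item trials with across-trial variability in drift, starting point, and nondecision time, and returns the predicted accuracy and correct-RT quantiles under equal-variance and unequal-variance drift distributions.

```python
# Sketch with assumed parameter values (not the code behind Fig. 1): predicted
# accuracy and .1/.5/.9 correct-RT quantiles for strong items under an
# equal-variance (eta = .1) and an unequal-variance (eta = .2) drift
# distribution. Drift varies normally across trials, starting point and
# nondecision time vary uniformly, and within-trial noise is s = .1.
import numpy as np

def simulate_condition(mean_v, eta, n=20000, a=0.10, z=0.05, sz=0.02,
                       ter=0.45, st=0.20, s=0.1, dt=0.001, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.normal(mean_v, eta, n)                      # across-trial drift variability
    x = rng.uniform(z - sz / 2, z + sz / 2, n)          # starting-point variability (sZ)
    t_nd = rng.uniform(ter - st / 2, ter + st / 2, n)   # nondecision variability (sT)
    t = np.zeros(n)
    active = np.ones(n, dtype=bool)
    while active.any():                                 # Euler steps until absorption
        n_act = active.sum()
        x[active] += v[active] * dt + s * np.sqrt(dt) * rng.standard_normal(n_act)
        t[active] += dt
        active &= (x > 0.0) & (x < a)
    correct = x >= a                                    # top boundary is the correct response here
    rt = t + t_nd
    return correct.mean(), np.quantile(rt[correct], [.1, .5, .9])

# Drift means are illustrative; in the article they were set to match the
# desired accuracy levels for each condition.
print("EV strong:", simulate_condition(mean_v=0.20, eta=0.1))
print("UV strong:", simulate_condition(mean_v=0.22, eta=0.2))
```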

Parameter validation experiments

Figure 1 shows that changing evidence variability in the diffusion model has only a small effect on predictions. The goal of Experiment 1 was to ensure that model fits can discriminate equal- and unequal-variance distributions despite this small effect. Participants saw arrays of varying numbers of asterisks, and they decided whether a low (under 50) or high (over 50) number of asterisks was displayed. The across-trial variability in the number of asterisks was directly manipulated to produce equal- and unequal-variance evidence distributions. Different blocks of trials were designed to produce evidence distributions similar to those assumed in both the recognition task (Experiment 2) and the source task (Experiment 3). Thus, Experiment 1 will show whether or not conclusions for the memory experiments rest on sound methods. Previous investigators have performed similar validation studies for the diffusion model, and the model shows appropriate parameter estimates for response caution, response bias, evidence strength, and nondecision processes (e.g., Ratcliff & McKoon, 2008; Voss, Rothermund, & Voss, 2004). Experiment 1 will be the first experiment to extend this validation approach to the drift-rate variability parameter.

Experiment 1: Numerousity discrimination

Method

Participants

Thirty-seven University of Massachusetts undergraduates participated to earn extra credit in their psychology courses.

Materials

Each stimulus had 100 locations arranged in 10 rows and 10 columns. Some of the locations displayed an asterisk, and some were blank. The stimuli were created by randomly selecting a value from a uniform distribution between 0 and 1 for each location and displaying an asterisk for any location with a value less than p. Thus, the number of asterisks displayed across trials followed a binomial distribution with a probability parameter equal to p and an N of 100. The asterisk distributions were designed to simulate evidence distributions in either a recognition or source memory task. Throughout the “Method” and “Results” sections, the conditions will be labeled with the corresponding conditions in the memory task that they represent.
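A minimal reconstruction of this stimulus-generation scheme (not the original experiment code) is sketched below.

```python
# Reconstruction of the stimulus generation described above: each of the 100
# cells shows an asterisk when a uniform(0, 1) draw falls below p, so the
# asterisk count follows a Binomial(100, p) distribution.
import numpy as np

def make_display(p, rng, rows=10, cols=10):
    return rng.random((rows, cols)) < p        # True = asterisk, False = blank

rng = np.random.default_rng(0)
display = make_display(p=0.54, rng=rng)        # e.g., a weak-target trial
print(display.sum(), "asterisks")              # count ~ Binomial(100, .54)
print("\n".join("".join("*" if cell else " " for cell in row) for row in display))
```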

For the equal-variance recognition blocks, three stimulus categories were created to correspond to lures, weak targets, and strong targets in a recognition task. The number-of-asterisks dimension was used as an analog for a dimension of memory strength. The number of asterisks within each stimulus class followed a binomial distribution with the probability parameter (p) set to .46, .54, and .58 for lures, weak targets, and strong targets, respectively (see Fig. 2, Panel 1). All of the distributions crossed the criterion that participants were asked to apply in making their decisions (50 asterisks), analogous to a recognition task in which the memory strength for some of the targets will fall below and some lures will fall above the criterion for making an “old” response.
Fig. 2

Asterisk distributions and evidence variability estimates for the numerousity discrimination task in Experiment 1. The block conditions and item types were named based on the memory task that the evidence distributions were designed to represent. Panels 1 and 2 show distributions and variability estimates for lures (L), weak targets (WT), and strong targets (ST) in the equal-variance and unequal-variance recognition blocks, respectively. Panels 3 and 4 show distributions and variability estimates for strong female (SF), weak female (WF), weak male (WM), and strong male (SM) trials in the equal-variance and unequal-variance source blocks, respectively. The error bars are 95 % confidence intervals

For the unequal-variance recognition blocks, the trial structure and the distributions for lures and weak targets were the same as those in the equal-variance blocks. To introduce additional variability for strong targets, the probability parameter of the binomial distribution was sampled from a beta distribution (instead of being fixed across trials). The beta distribution had shape parameters α = 12 and β = 5, which produces a mean probability of .706 that a space will be filled with an asterisk. The combination of the beta and binomial distributions produces a distribution for the number of asterisks with higher variability and a slight negative skew (Fig. 2, Panel 2).

The remaining blocks were based on the source task in Experiment 3, in which participants studied words paired with a picture of either a male or a female face. Here, the number-of-asterisks dimension is analogous to a dimension ranging from evidence that is strongly characteristic of the female face to evidence that is strongly characteristic of the male face. The number of asterisks for the equal-variance source blocks followed binomial distributions with probability parameters of .42, .46, .54, and .58 for strong female, weak female, weak male, and strong male items, respectively (Fig. 2, Panel 3). The unequal variance source blocks matched the equal-variance source blocks, except that the probability parameters for strong male and strong female items were sampled from a beta distribution, with α = 5 and β = 12 (mean = .294) for female items and α = 12 and β = 5 (mean = .706) for male items (Fig. 2, Panel 4).
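The sampling scheme for the unequal-variance blocks can be sketched as follows (a reconstruction using the distribution parameters given above, not the original experiment code); drawing the binomial probability from a beta distribution on each trial inflates the across-trial variance of the asterisk counts for strong items.

```python
# Reconstruction of the asterisk-count distributions for the unequal-variance
# source blocks: weak items use a fixed binomial probability, strong items draw
# the probability from a beta distribution on every trial.
import numpy as np

rng = np.random.default_rng(0)
N_CELLS = 100

def asterisk_count(item_type, rng):
    if item_type == "weak_female":
        p = 0.46                      # fixed p: plain Binomial(100, .46)
    elif item_type == "weak_male":
        p = 0.54
    elif item_type == "strong_female":
        p = rng.beta(5, 12)           # mean .294, adds across-trial variability
    elif item_type == "strong_male":
        p = rng.beta(12, 5)           # mean .706, adds across-trial variability
    return rng.binomial(N_CELLS, p)

for item in ("strong_female", "weak_female", "weak_male", "strong_male"):
    counts = np.array([asterisk_count(item, rng) for _ in range(5000)])
    print(f"{item:14s} mean = {counts.mean():5.1f}  sd = {counts.std():4.1f}")
```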

Design and procedures

Block type and item type were both manipulated within subjects. At the beginning of the experiment, participants were informed that they would complete multiple blocks of a numerousity task. They were told that they would see displays with varying numbers of asterisks on the screen and that they should hit the “z” key to respond “low” (below 50 asterisks) or the “/” key to respond “high” (above 50 asterisks). Participants were asked to balance speed and accuracy in their responding. Any trial with an RT below 200 ms or above 1,400 ms was immediately followed by a “too fast” or “too slow” message, respectively. The RT feedback messages were required for fewer than 1 % of the responses. Participants also saw the proportion of correct responses that they achieved on each block after all the trials were completed.

Participants first completed a practice block with the equal-variance source design, and then they completed 16 blocks that contributed data for the analyses below (four from each block condition). The order of the blocks was randomized uniquely for each participant. Recognition blocks comprised 50 lure trials, 25 weak-target trials, and 25 strong-target trials. Source blocks comprised 17 trials each for the strong-male, strong-female, weak-male, and weak-female stimuli. The order of trials within each block was random. Participants were allowed to take brief breaks between blocks if they wished.

Results

All statistical tests were conducted with α = .05. Trials with RTs shorter than 250 ms or longer than 3,500 ms were excluded from analyses, which eliminated less than 1 % of the data. Table 1 shows the proportion and RT data from all conditions. The results were as expected, with higher accuracy and shorter correct RTs for strong than for weak items. Errors were generally slower than correct responses.
Table 1

Response proportions and correct and error response time (RT) medians from all experiments

Experiment and Condition    p(“top”)      Correct RT Median    Error RT Median

Experiment 1
 EV Recog.
  ST                        .73 (.02)     612 (14)             677 (19)
  WT                        .64 (.02)     634 (15)             676 (17)
  L                         .37 (.02)     652 (12)             642 (16)
 UV Recog.
  ST                        .86 (.01)     564 (10)             658 (19)
  WT                        .56 (.02)     644 (15)             660 (15)
  L                         .32 (.02)     648 (12)             645 (16)
 EV Source
  SM                        .77 (.02)     603 (11)             653 (19)
  SF                        .28 (.02)     639 (11)             627 (17)
  WM                        .66 (.02)     623 (12)             667 (16)
  WF                        .39 (.02)     650 (12)             649 (16)
 UV Source
  SM                        .90 (.01)     557 (10)             608 (34)
  SF                        .13 (.01)     585 (10)             620 (28)
  WM                        .65 (.02)     627 (14)             673 (17)
  WF                        .40 (.02)     651 (12)             643 (16)
Experiment 2
 ST                         .84 (.02)     673 (13)             758 (23)
 WT                         .61 (.02)     713 (13)             769 (19)
 L                          .15 (.02)     727 (16)             796 (24)
Experiment 3
 SM                         .82 (.02)     801 (18)             916 (40)
 SF                         .19 (.02)     790 (18)             879 (37)
 WM                         .68 (.02)     844 (19)             922 (31)
 WF                         .33 (.02)     848 (23)             915 (22)

Note. The experimental task was numerousity discrimination for Experiment 1, recognition for Experiment 2, and source memory for Experiment 3. p(“top”) is the probability of making the response associated with the top boundary: “high” for Experiment 1, “old” for Experiment 2, and “male” for Experiment 3. The condition names for Experiment 1 refer to the type of task the asterisk distributions were designed to represent. EV = equal variance; UV = unequal variance; Recog. = recognition; ST = strong target; WT = weak target; L = lure; SM = strong male; SF = strong female; WM = weak male; WF = weak female. Standard errors are in parentheses

Modeling procedures

The diffusion model was fit to the data from each participant using the χ2 method described by Ratcliff and Tuerlinckx (2002) and the same parameter ranges reported by Starns and Ratcliff (2014). Separate model fits were performed for the four block types. The data comprised 12 response frequencies for each condition: 6 frequencies each for the number of “high” and “low” responses in the six RT bins segregated by the .1, .3, .5, .7, and .9 quantiles of the RT distribution. Thus, to fit the data, the model had to match both the proportion of each response and the RT distributions within each category. One degree of freedom (df) is lost for each item type because the proportions in the bins must sum to 1.
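A sketch of this fit statistic is given below; `model_bin_probs` is a hypothetical stand-in for the diffusion model's predicted probability mass in each bin, and the code is meant only to make the bin structure explicit rather than to reproduce the fitting routine.

```python
# Sketch of the chi-square fit statistic: observed RTs for each response are
# cut at their .1, .3, .5, .7, and .9 quantiles (6 bins per response), and the
# observed bin counts are compared with the counts the model predicts.
import numpy as np

QUANTILES = [.1, .3, .5, .7, .9]

def chi_square(rts_by_response, model_bin_probs):
    """rts_by_response: dict mapping 'top'/'bottom' to arrays of observed RTs.
    model_bin_probs: dict mapping the same keys to length-6 arrays of predicted
    bin probabilities (the 12 values sum to 1 for the item type)."""
    n_total = sum(len(rts) for rts in rts_by_response.values())
    chi2 = 0.0
    for resp, rts in rts_by_response.items():
        cuts = np.quantile(rts, QUANTILES)
        edges = np.concatenate(([0.0], cuts, [np.inf]))
        observed = np.histogram(rts, bins=edges)[0]              # 6 observed counts
        expected = n_total * np.asarray(model_bin_probs[resp])   # 6 expected counts
        chi2 += np.sum((observed - expected) ** 2 / expected)
    return chi2
```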

For all of the reported fits, parameters for decision criteria (a, z, and sZ) and parameters for nondecision components (Ter and sT) were held constant across different stimulus types, such as displays with many or few asterisks in Experiment 1 or studied versus nonstudied items in Experiment 2. This is a nearly universal practice in diffusion research, and fixing these parameters has a strong theoretical justification (e.g., Ratcliff & McKoon, 2008). Fixing the decision parameters is analogous to using the same response criterion across stimulus types in signal detection theory. If the only thing identifying the stimulus type is the evidence available from the stimulus, then participants have no basis for changing decision standards across stimulus types (e.g., they can’t decide to be more liberal for targets, because once they have identified a stimulus as a target, the decision has already been made). Fixing the nondecision parameters is justified by the fact that all of the stimulus types are randomly mixed into the same test with the same basic stimulus processing and response output requirements; thus, the factors affecting nondecision time are controlled.1

For the blocks simulating equal- and unequal-variance recognition distributions, the data for each fit had 33 df (11 each for strong targets, weak targets, and lures). The recognition model had 12 free parameters: the width of the response boundaries (a), the mean starting point of evidence accumulation (z), the range of variation in starting point across trials (sZ), the mean nondecision time (Ter), the range of variation in nondecision time across trials (sT), the proportion of trials affected by RT contaminants (pO; see Ratcliff & Tuerlinckx, 2002), three drift rates (v) for strong targets, weak targets, and lures, and three between-trial evidence variability parameters (η) for strong targets, weak targets, and lures. Thus, the fits for the recognition blocks were associated with 21 df (33 df in the data minus 12 free parameters).

For the blocks designed to simulate a source memory task, the data had 44 df (11 each from the four stimulus types). The source model had 14 free parameters. The parameters were the same as the fits to the recognition blocks, except that there were four drift rates (v) and four evidence variability (η) parameters for strong-male, strong-female, weak-male, and weak-female items. Thus, fits to the source blocks had 30 df (44 df in the data minus 14 free parameters).

Modeling results

The diffusion model fit the numerousity data closely. The mean of the χ2 fit statistic was 33.43 for equal-variance recognition blocks (quartiles: 24.34, 30.89, 42.15), 30.80 for unequal-variance recognition blocks (quartiles: 23.62, 26.85, 35.24), 36.41 for equal-variance source blocks (quartiles: 28.68, 35.52, 42.04), and 34.66 for unequal-variance source blocks (quartiles: 22.41, 33.69, 43.33). Even slight systematic discrepancies between the model and the data can produce large increases in χ2 (Ratcliff, Thapar, Gomez, & McKoon, 2004, p. 285), so the fact that the subject averages are relatively close to the degrees of freedom indicates a close fit. Reinforcing this conclusion, Figs. 3 and 4 show fits to the group data from the equal-variance recognition and source conditions, respectively (the unequal-variance fits were very similar). The figures display response proportion and the leading edge, median, and tail of the RT distribution (the .1, .5, and .9 quantiles, respectively).2 The theoretical values fell within the 95 % confidence intervals around the observed values in all cases.
Fig. 3

Diffusion model fits to the equal-variance recognition blocks from Experiment 1. “High” and “Low” responses signify that the participant thought there were above or below 50 asterisks displayed, respectively. Item types are named for the corresponding items in the recognition memory task that the asterisk distributions were designed to represent (see Fig. 2). Response proportions and the .1, .5, and .9 quantiles of the response time (RT) distributions are displayed. The error bars are 95 % confidence intervals

Fig. 4

Diffusion model fits to the equal-variance source blocks from Experiment 1. “High” and “Low” responses signify that the participant thought there were above or below 50 asterisks displayed, respectively. Item types are named for the corresponding items in the source memory task that the asterisk distributions were designed to represent (see Fig. 2). Response proportions and the .1, .5, and .9 quantiles of the response time (RT) distributions are displayed. The error bars are 95 % confidence intervals

Tables 2 and 3 show the average drift rate and drift variability parameters across subjects, and Table 4 reports the other parameters. The main parameter of interest is the between-trial variability in drift rates (η), but I will first briefly discuss the other parameters. Across all experiments, starting points (z) were consistently close to the midpoint of the boundary width (a), indicating that responding was relatively unbiased. The drift rates (v) showed the expected pattern for all experiments: Stimuli requiring a top-boundary response had positive drift rates, stimuli requiring a bottom-boundary response had negative drift rates, and the drift rates were farther from zero for strong items than for weak items. Nondecision times were longer for the memory tasks (Experiments 2 and 3) than for the numerousity task (Experiment 1), perhaps reflecting the extra time needed to form a memory probe before evidence accumulation could begin. The proportion of trials affected by RT delays based on lapses of attention was always very low (pO < .001 for all data sets).
Table 2

Average drift rate (v) and drift variability (η) parameter values from individual-participant fits to the recognition-simulation blocks in Experiment 1 and the recognition task in Experiment 2

Experiment and Condition    Strong Target    Weak Target    Lure

Drift Rate (v)
 Experiment 1
  EV Recog.                 .161 (.016)      .082 (.017)    −.093 (.012)
  UV Recog.                 .352 (.025)      .021 (.010)    −.126 (.014)
 Experiment 2               .343 (.032)      .074 (.019)    −.283 (.020)
Drift Distribution Standard Deviation (η)
 Experiment 1
  EV Recog.                 .190 (.020)      .155 (.019)    .146 (.021)
  UV Recog.                 .278 (.018)      .142 (.015)    .121 (.017)
 Experiment 2               .315 (.016)      .281 (.019)    .173 (.014)

Note. The experiment task was numerousity discrimination for Experiment 1 and recognition for Experiment 2. The condition names for Experiment 1 refer to the type of task the asterisk distributions were designed to represent. EV Recog. = equal variance recognition; UV Recog. = unequal variance recognition. Standard errors are in parentheses

Table 3

Average drift rate (v) and drift variability (η) parameter values from individual-participant fits to the source-simulation blocks in Experiment 1 and the source task in Experiment 3

Experiment and Condition    Strong Male    Strong Female    Weak Male     Weak Female

Drift Rate (v)
 Experiment 1
  EV Source                 .192 (.019)    −.182 (.022)     .084 (.012)   −.094 (.017)
  UV Source                 .381 (.022)    −.367 (.023)     .069 (.010)   −.076 (.010)
 Experiment 3               .295 (.025)    −.322 (.037)     .123 (.026)   −.125 (.018)
Drift Distribution Standard Deviation (η)
 Experiment 1
  EV Source                 .181 (.021)    .176 (.021)      .188 (.016)   .173 (.019)
  UV Source                 .269 (.022)    .242 (.020)      .162 (.015)   .142 (.017)
 Experiment 3               .273 (.024)    .302 (.021)      .210 (.023)   .223 (.025)

Note. The experiment task was numerousity discrimination for Experiment 1 and source memory for Experiment 3. The condition names for Experiment 1 refer to the type of task the asterisk distributions were designed to represent. EV = equal variance; UV = unequal variance. Standard errors are in parentheses

Table 4

Average parameter values for parameters fixed across item types in the individual-participant fits

Experiment and Condition    a             z             sZ            Ter        sT

Experiment 1
 EV Recog.                  .104 (.002)   .055 (.002)   .024 (.004)   474 (13)   228 (14)
 UV Recog.                  .106 (.002)   .057 (.002)   .024 (.004)   463 (11)   203 (11)
 EV Source                  .103 (.003)   .056 (.002)   .026 (.004)   476 (11)   242 (14)
 UV Source                  .109 (.003)   .060 (.002)   .021 (.004)   456 (10)   168 (9)
Experiment 2                .110 (.004)   .061 (.003)   .010 (.003)   575 (14)   171 (13)
Experiment 3                .121 (.004)   .060 (.003)   .007 (.003)   661 (17)   269 (23)

Note. The experiment task was numerousity discrimination for Experiment 1, recognition for Experiment 2, and source memory for Experiment 3. The condition names for Experiment 1 refer to the type of task the asterisk distributions were designed to represent. EV = equal variance; UV = unequal variance; Recog. = recognition. The average proportion of trials with RT contaminants (pO) was less than .001 for every fit. Standard errors are in parentheses.

Figure 2 shows the average η estimates from each of the four conditions in Experiment 1. The variability estimates had similar values across the different item types in the equal-variance conditions (Panels 1 and 3) but showed clear differences in the unequal-variance conditions (Panels 2 and 4). Data from the recognition blocks were submitted to a 3 (item type) × 2 (equal- vs. unequal-variance blocks) ANOVA. If the diffusion model successfully discriminated the equal- and unequal-variance distributions, the different blocks should have different variability estimates for strong targets but no differences for lures and weak targets. The ANOVA revealed a significant interaction consistent with this pattern, F(2, 72) = 7.90, p < .001, MSE = .009. Follow-up analyses showed a significant difference between equal- and unequal-variance blocks for strong targets, t(36) = 3.47, p < .001, but not weak targets or lures (lowest p = .352).
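As an illustration only, a repeated-measures ANOVA of this form can be run with statsmodels as sketched below; `eta_long` and the file name are hypothetical stand-ins for a long-format table of the fitted η estimates (one row per participant, item type, and block condition).

```python
# Sketch (assumed data layout, not the analysis script used here) of the
# 3 (item type) x 2 (block variance) repeated-measures ANOVA on eta estimates.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical file with columns: subject, item_type (L/WT/ST), block (EV/UV), eta
eta_long = pd.read_csv("exp1_recognition_eta.csv")

result = AnovaRM(eta_long, depvar="eta", subject="subject",
                 within=["item_type", "block"]).fit()
print(result)   # F tests for both main effects and the item type x block interaction
```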

The source-block data were submitted to a 2 (gender) × 2 (strength) × 2 (variance block) ANOVA. On the basis of the distributions in Fig. 2, the estimates should show no effect of gender on evidence variability. Consistent with this expectation, η values were very similar for male (.20) and female (.18) items. The ANOVA found no main effect of gender, F(1, 36) = 1.83, n.s., MSE = .012, and no significant interactions involving the gender variable (lowest p = .583). The distributions in Fig. 2 also suggest that η estimates should be higher for strong than for weak items, but only in the unequal-variance blocks. Validating this pattern, the results showed a significant interaction between strength and block type, F(1, 36) = 18.63, p < .001, MSE = .011. Follow-up 2 (gender) × 2 (strength) ANOVAs showed a difference between strong (.26) and weak (.15) items in the unequal-variance blocks, F(1, 36) = 34.17, p < .001, MSE = .012, but very similar values for strong (.18) and weak (.18) items in the equal-variance blocks, F(1, 36) = 0.02, n.s., MSE = .008.

The reported results demonstrate that the model successfully discriminated equal- and unequal-variance evidence distributions. However, performance was higher in the strong conditions with high- versus low-variability distributions, which might suggest that η estimates were sensitive to the overall performance level, as opposed to the actual variability in evidence. To investigate this possibility, I collected data from an additional 21 participants with asterisk distributions that were highly variable for low-performance conditions and less variable for high-performance conditions. All experimental details were held constant, except that participants only completed unequal-variance source blocks and the asterisk distributions were changed to the ones shown in Fig. 5. Specifically, the numbers of asterisks for strong male and female items were drawn from binomial distributions with probability parameters equal to .62 and .38, respectively. Weak-male items were drawn from binomial distributions with the probability parameter varying across trials according to a beta distribution with α = 12 and β = 9, and the shape parameters were switched for weak-female items. The proportion of correct responses was .86, .83, .68, and .65 for strong-male, strong-female, weak-male, and weak-female items, respectively. Nevertheless, Fig. 5 shows that η estimates were higher for weak (.24) than for strong (.14) items. A 2 (gender) × 2 (strength) ANOVA showed a significant effect of strength, F(1, 20) = 169.46, p < .001, MSE = .001. Neither the effect of gender nor the interaction was significant (lowest p = .13). These results provide strong evidence that the estimates correctly measure variability differences regardless of the overall performance level.
Fig. 5

Asterisk distributions and evidence variability estimates for the additional sample of 21 participants collected to ensure that η estimates are not influenced by performance level. The item types were named on the basis of the memory task that the evidence distributions were designed to represent—that is, strong female (SF), weak female (WF), weak male (WM), and strong male (SM) items in a source memory task. The error bars are 95 % confidence intervals

Experiment 1 conclusions

Experiment 1 showed that the diffusion model can accurately discriminate equal- and unequal-variance distributions based on response proportion and RT data. Thus, RT modeling provides an opportunity to double-check conclusions about evidence variability based on ROC analyses. The following experiments address this goal.

Experiment 2: Recognition memory

Experiment 2 examines the effect of learning strength on evidence variability in a recognition memory task. Results are expected to replicate Starns and Ratcliff (2014); that is, evidence should be more variable for targets than for lures, but additional learning should have little or no effect on the evidence variability parameter. This experiment will provide a comparison data set with materials and methods very similar to those of the source memory experiment below.

Method

Participants

Thirty-three University of Massachusetts undergraduates participated to earn extra credit in their psychology courses.

Materials

The stimuli for each participant were randomly drawn from a pool of 680 low-frequency nouns, verbs, and adjectives (Kučera & Francis, 1967). For the practice cycle, the study list contained 20 words, with half presented once and half presented three times. The test comprised these 20 targets, along with 20 lure words that were not on the study list. For the true experiment cycles, the study list had 25 targets studied once and 25 targets studied three times. The tests contained 50 targets and 50 lures. No words appeared in more than one study/test cycle. The order of all lists was randomized anew for each participant, with the constraint that at least two other items intervened before the same strong target was repeated at study.

Design and procedures

Item type was manipulated within subjects. At the beginning of the experiment, participants were informed that they would study lists of words and take a memory test following each list. They were told that some of the words on each study list would be repeated and that they should try to remember all of the words as best they could. They were also correctly informed that words from previous study/test cycles would never appear in the current cycle, so they only had to remember the most recent study list. For each test, they were asked to press the “/” key if they thought the test word was studied or the “z” key if they thought the word was new. They were asked to balance speed and accuracy in their decisions. After the instructions, participants completed the practice cycle, and then they began the true experiment cycles.

Each studied word remained on the screen for 1,400 ms, followed by 100 ms of blank screen. Each test word remained on the screen until a response was made. Participants saw a “too slow” message following any RT longer than 1,400 ms and a “too fast” message following any RT shorter than 300 ms. Only 1 % of the trials required these messages. Accuracy feedback was not provided on a trial-by-trial basis, but participants saw a message reporting their proportion of correct responses after each test was completed.

Each participant completed three study/test cycles after the practice cycle. Participants initiated each cycle by pressing the space bar, and they were informed that they could take short breaks before starting a cycle.

Results and discussion

Responses made faster than 300 ms or slower than 3,500 ms were excluded from analyses, which eliminated less than 1 % of the data. Table 1 shows the proportion of “old” responses and the median RT for correct and incorrect responses in Experiment 2. As was expected, participants made more “old” responses for strong (.84) than for weak (.61) targets, with relatively few “old” responses for lure items (.15). Participants also made “old” responses more quickly for strong versus weak targets (673 vs. 713 ms, respectively). Error RTs were longer than correct RTs for all three item types.

Modeling results

The same model that was used for the recognition blocks in Experiment 1 was applied to the data from this experiment. Figure 6 shows the model fit for the group data, and the model values fell within the 95 % confidence intervals for the data in all cases except the .9 quantile for “new” responses to strong targets. Misses in the .9 quantiles generally have a small influence on fit indices such as χ2, and “new” responses were relatively rare for strong targets.3 Thus, the model did not miss any critical aspects of the data. The fit for each participant was associated with 21 df, and the distribution of χ2 values had a mean of 30.14, with quartiles of 22.62, 29.25, and 34.81. This mean value is in the range of the best diffusion model fits reported for recognition memory data (Starns & Ratcliff, 2014) and is similar to the fits for the recognition blocks of the numerousity task in Experiment 1.
Fig. 6

Diffusion model fits to the recognition memory data in Experiment 2. Response proportions and the .1, .5, and .9 quantiles of the response time (RT) distributions are displayed. The error bars are 95 % confidence intervals

Table 2 reports the average evidence-variability parameters across participants for lures, weak targets, and strong targets. Evidence was substantially more variable for weak and strong targets (.281 and .315) than for lures (.173) but was relatively similar across the two target strengths. An ANOVA showed a significant effect of item type, F(2, 64) = 26.66, p < .001, MSE = .007. Bonferroni-corrected follow-up analyses revealed a significant difference between lures and both types of targets (corrected p < .001 for both comparisons), but no difference between strong and weak targets (corrected p = .48). Even an uncorrected comparison of strong and weak targets failed to reach significance (p = .16). Therefore, the results matched the RT estimates reported by Starns and Ratcliff (2014), as well as past work with ROC functions (e.g., Ratcliff et al., 1994; Ratcliff et al., 1992).

Experiment 3: Source memory

Experiment 3 explored the effect of learning strength on evidence variability in a source memory task. If the strength effect on the asymmetry of source ROCs is based on decision strategies for using the confidence scale (Starns et al., 2013), then the fits to the two-response source task should show no differences in evidence variability based on strength. If additional learning does affect the overall variability of source evidence, then the variability estimates from the diffusion model should be higher for strong than for weak items.

Method

Participants

Twenty-six University of Massachusetts undergraduates participated to earn extra credit in their psychology courses.

Materials

Stimuli for each participant were sampled from the same word pool that was used in Experiment 2. In addition, four faces were randomly selected from a pool of eight alternatives within each gender. All face pictures were taken from Maner et al. (2003). A different pair of male and female faces was used for each study/test cycle (including the practice cycle) to minimize interference. The practice study list contained 20 items, half studied with a male picture and half studied with a female picture. Half of the items within each gender were studied once, and half were studied three times. Study lists for the true experiment cycles had the same organization, but the total number of items was increased to 48. The order of the study lists was independently randomized for each participant, with the constraint that at least two words intervened before the same word was repeated. No words were repeated across study/test cycles. Each test list contained all of the studied items in a random order.

Design and procedure

The independent variables were source (male versus female) and strength (one versus three learning trials), and both were manipulated within subjects. The initial instructions were similar to those in Experiment 2, except that participants were informed that they would have to remember whether each word was studied with a male or a female face. At study, each word–picture combination remained on the screen for 1,900 ms, followed by 100 ms of blank screen. At test, participants were asked to press the “/” key or the “z” key to indicate that the word was studied with a male or a female face, respectively. They were again asked to balance speed and accuracy. “Too fast” and “too slow” messages were displayed following trials with RTs <300 ms and >1,600 ms, respectively. Only 1 % of the trials had RTs outside of these boundaries. Participants received feedback on their proportion of correct responses following each test list. Each participant completed the practice cycle and three regular cycles.

Results

Trials with RTs shorter than 300 ms or longer than 3,500 ms were excluded from analyses, which eliminated less than 1 % of the data. Table 1 reports the proportion of “male” responses and the RT medians for the four conditions. As was expected, extra learning trials increased the proportion of “male” responses for male items (.68 vs. .82) and decreased the proportion of “male” responses for female items (.33 vs. .19). Correct responses were faster for strong than for weak items, and error responses were consistently slower than correct responses.

Modeling results

The same model that was used for the source blocks in Experiment 1 was fit to the source memory data from this experiment. Figure 7 shows the fit to the group data, and the model values fell within the 95 % confidence interval for all of the observations except the .9 RT quantiles for weak-female items. The individual participant fits were each associated with 30 df, and the distribution of χ2 values had a mean of 40.70, with quartiles of 29.00, 41.14, and 50.35. These values are similar to the fits to the source memory blocks in Experiment 1.
Fig. 7

Diffusion model fits to the source memory data in Experiment 3. Response proportions and the .1, .5, and .9 quantiles of the response time (RT) distributions are displayed. The error bars are 95 % confidence intervals

Table 3 reports the average evidence variability (η) parameters for the four conditions. A 2 (source) × 2 (strength) ANOVA showed that η estimates were higher for strong (.29) than for weak (.22) items, F(1, 25) = 8.77, p < .01, MSE = .015. There was no main effect of source, F(1, 25) = 2.04, n.s., MSE = .005, and no interaction, F(1, 25) = 0.31, n.s., MSE = .006.

The variability estimates from the RT model showed the same pattern as the ROC literature: Source evidence was more variable for items with strong learning than for items with weak learning (Starns et al., 2013; Yonelinas & Parks, 2007). Although decision strategies for using a confidence rating scale do play a role in producing the ROC strength effect (Starns et al., 2013), the present results suggest that confidence-rating artifacts do not provide a complete account.

Comparing recognition and source memory

Considering Experiments 2 and 3 together, RT modeling supports the same conclusion as ROC modeling; that is, additional learning increases the variability of source evidence, but not recognition evidence. However, this conclusion requires accepting a nonsignificant hypothesis test as evidence for the lack of an effect in recognition memory. In this section, I use a Bayesian approach to evaluate the possibility of a selective effect on source memory. Bayesian statistics are particularly well-suited to the question because they can quantify support for the null hypothesis and they provide a principled method for combining the present recognition results with the previous diffusion model fits to recognition data. I applied the Bayesian t-test outlined by Rouder, Speckman, Sun, Morey, and Iverson (2009). The likelihood for a given effect size was the probability density of the observed t value on a t distribution with a noncentrality parameter determined by the effect size (all tests were with paired-samples, so the df was N – 1). When multiple experiments were considered, the likelihoods were multiplied across observed t values. The null model assumed an effect size of zero, and the alternative model had a standard normal distribution as the prior on effect size. Reported Bayes factors give the probability of the observed data under the null divided by the probability under the alternative.
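The calculation just described can be sketched as follows; this is a reconstruction of the procedure as stated (noncentral t likelihood, standard normal prior on effect size), not the author's analysis code, and the values in the example call are made up.

```python
# Reconstruction of the Bayes factor calculation: the likelihood of an observed
# paired-samples t value given effect size delta is the noncentral t density
# with df = N - 1 and noncentrality delta * sqrt(N). The null fixes delta = 0;
# the alternative integrates over a standard normal prior on delta.
import numpy as np
from scipy.integrate import quad
from scipy.stats import nct, norm, t as t_dist

def bayes_factor_01(t_obs, n):
    """Bayes factor in favor of the null for a single paired-samples t value."""
    df = n - 1
    like_null = t_dist.pdf(t_obs, df)                       # delta = 0
    like_alt, _ = quad(lambda d: nct.pdf(t_obs, df, d * np.sqrt(n)) * norm.pdf(d),
                       -8, 8)                               # average over the prior
    return like_null / like_alt

def combined_bf_01(t_values, ns):
    """Multiply likelihoods across independent data sets (shared effect size)."""
    null = np.prod([t_dist.pdf(t, n - 1) for t, n in zip(t_values, ns)])
    alt, _ = quad(lambda d: norm.pdf(d) * np.prod(
        [nct.pdf(t, n - 1, d * np.sqrt(n)) for t, n in zip(t_values, ns)]), -8, 8)
    return null / alt

# Made-up example values, not the statistics reported in the text:
print(bayes_factor_01(t_obs=1.4, n=33))
```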

Echoing the standard analyses, the Bayesian test showed a strong effect of additional learning on η parameters for source memory. The probability of the data was nearly 12 times higher for the alternative than for the null hypothesis (BF = .084). In contrast, the recognition data in Experiment 2 were more consistent with the null than with the alternative hypothesis, although the results were more equivocal (BF = 2.797). A single study will almost never provide strong support for the null hypothesis, so I considered the present recognition results together with five similar data sets that Starns and Ratcliff (2014) identified as having sufficient observations to estimate separate η parameters for each item type.4 With all of the available studies combined, the probability of the data was over 15 times higher for the null hypothesis than for the alternative (BF = 15.309). Thus, the currently available diffusion model results strongly support the contention that additional learning increases evidence variability for source memory but not recognition.

Model comparison

My primary goal was to approach the variability question as a parameter estimation issue, but the tests for variability differences could also be framed in terms of model selection. In this section, I evaluate whether differences in model fit support the same conclusions as the analyses on parameter values. For Experiment 2, I tested the difference in η values between targets and lures by comparing the fit of a model with a single η parameter across all item types (one-η) with that of a model with one η for lures and another for both types of target (two-η). To test for a difference between strong and weak targets, I compared the fit of the two-η model with that of the three-η model originally fit to the data. For Experiment 3, I tested for strength effects on η by comparing the fit of a model with a single η value for all item types with that of a model with one η parameter for weak male and female items and another for strong male and female items. Under the null hypothesis of equal η parameters, the difference in fit for each participant should theoretically follow a χ2 distribution with one df, and the summed χ2 across the N participants should follow a χ2 distribution with N df. For each comparison, I performed a single χ2 test on the summed difference in fit across participants.5
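A minimal sketch of the summed fit-difference test follows; the per-participant fit statistics are assumed inputs rather than values from the reported fits.

```python
# Sketch of the nested-model comparison: sum the chi-square fit difference
# (constrained minus free-eta model, 1 df per participant) over N participants
# and compare the sum with a chi-square distribution with N df.
import numpy as np
from scipy.stats import chi2

def summed_chi2_test(fit_constrained, fit_free):
    """fit_constrained, fit_free: per-participant chi-square fit statistics."""
    diff = np.asarray(fit_constrained) - np.asarray(fit_free)
    n = len(diff)
    stat = diff.sum()
    return stat, chi2.sf(stat, df=n)

# Made-up fit values for 33 hypothetical participants:
rng = np.random.default_rng(0)
free = rng.chisquare(21, size=33)
constrained = free + rng.chisquare(1, size=33)
print(summed_chi2_test(constrained, free))
```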

For Experiment 2, the fit comparisons indicated a significant difference between a model with a single η value for all item types and a model with different η values for targets and lures, χ2 (33) = 113.72, p < .001. In contrast, results showed no evidence of a difference between the two-η model and a model with separate η parameters for strong targets, weak targets, and lures, χ2 (33) = 45.34, n.s. For Experiment 3, fit statistics showed a significant difference between a single-η model and a model with different η parameters for strong and weak items, χ2 (26) = 56.71, p < .001. Thus, the fit comparisons supported the same conclusions as the analyses on η estimates across participants: Evidence variability differed between targets and lures in recognition memory but did not change on the basis of learning strength, whereas evidence variability increased with learning strength for source memory.

General discussion

The diffusion model accurately distinguished equal- and unequal-variance evidence distributions in fits to the numerousity discrimination task (Experiment 1). These results extend the parameter validation literature to a new parameter (η) and provide further confirmation that the model appropriately measures the psychological processes involved in decision making. For the recognition memory data (Experiment 2), applying the model revealed that memory evidence was more variable for targets than for lures, but additional learning had little or no effect on evidence variability. For source memory (Experiment 3), variability estimates did not differ between male and female items within a strength class, but they were higher for strong items than for weak items. Overall, the diffusion results support the same conclusions about evidence variability as previous experiments using ROC analyses.

Starns et al. (2013) showed that participants are more willing to make high-confidence source responses when they are more confident that an item was studied, and this decision strategy has the same influence on source memory ROC functions as an increase in evidence variability with stronger learning. As a result, Starns et al. (2013) suggested that additional learning might not actually affect the variability of source evidence. The present results provide evidence against this suggestion. Even when confidence ratings were eliminated and evidence variability was estimated from RT distributions, results showed an increase in variability from weak to strong items. As such, the present study provides a good example of how RT modeling can be used to disambiguate memory and decision processes.

Alternatives to unequal variance

In ROC research, one well-known competitor for the unequal-variance approach is Yonelinas’s (1994) dual-process model. This model assumes that recollection succeeds for a proportion of the target items equal to R, and for these items, decisions are based on retrieving qualitative contextual information from the learning event. Decisions for the remaining targets (and all of the lures) are based on familiarity, a continuous strength signal that tends to be higher for items that were recently encountered. Familiarity values are assumed to be equally variable for targets and lures, so the model predicts symmetrical ROC functions if none of the studied items are recollected. When recollection succeeds for some items, the predicted ROC function becomes asymmetrical. Thus, mixing decisions based on recollection and familiarity produces an ROC shape that is very similar to the shape predicted by the unequal-variance signal detection model (Wixted, 2007; Yonelinas & Parks, 2007).

Although the models show nearly complete mimicry for ROC data, the results reported here highlight one unique advantage of the unequal-variance approach. Namely, this approach has been successfully implemented in models that accommodate RT data in addition to response proportions. A number of studies have simultaneously modeled both RT and ROC data, and the successful models in all of these studies used unequal-variance evidence distributions (Dube et al., 2012; Ratcliff & Starns, 2009, 2013; Starns, Ratcliff, & McKoon, 2012). RT modeling also offers a chance to validate conclusions from ROC analyses with a completely independent form of data. Together with Starns and Ratcliff (2014), the present results show an impressive level of agreement in conclusions about evidence variability from RT and ROC modeling.

At this point, there is no version of the dual-process model that is capable of estimating recollection and familiarity using RT distributions. Therefore, the dual-process account cannot be tested by evaluating the consistency of RT and ROC modeling. Although the present results do not directly refute the dual-process approach, they do offer a form of support for the unequal-variance account that is currently not available for the dual-process account. The same can be said for pure threshold models of ROC data (Batchelder & Riefer, 1999; Bröder & Schütz, 2009), since this approach has not been systematically applied to modeling RT distributions (Dube et al., 2012).

A variety of mixture models have been proposed to accommodate ROC data (e.g., DeCarlo, 2002; Onyper, Zhang, & Howard, 2010). These models all assume that studied items fall into two or more latent classes that differ in strength, such as attended versus unattended items. Mixing these latent classes functionally increases the variability of memory evidence for studied items. Both ROC analyses and diffusion model fits are relatively insensitive to the functional form of the underlying distributions (Ratcliff, 2013), so the currently available data cannot discriminate models in which the evidence distributions are Gaussian distributions versus mixtures of Gaussian distributions. Although I chose to implement the unequal-variance assumption using Gaussian distributions, the present results are fully consistent with the predictions of mixture models.

Hierarchical Bayesian modeling

Recently, theorists have implemented the diffusion model in a hierarchical Bayesian format (e.g., Vandekerckhove, Tuerlinckx, & Lee, 2011). This technique offers many advantages that are especially relevant when estimating parameters that can have subtle influences on the data, such as η. For example, parameter estimates for each participant can be influenced by the data from other participants via a participant-level distribution that is simultaneously estimated by the model. Future research should address whether hierarchical modeling also shows differences in η values for memory tasks. Although the traditional modeling procedures used herein are not ideal, Experiment 1 provides strong evidence that they appropriately discriminate equal- and unequal-variance evidence distributions.

Conclusion

Researchers often draw conclusions about the nature of memory evidence on the basis of the properties of ROC functions, but these properties can also be influenced by decision mechanisms (Benjamin et al., 2009; Mueller & Weidemann, 2008; Ratcliff & Starns, 2009, 2013; Starns et al., 2013; Van Zandt, 2000). Thus, conclusions about memory must be corroborated with independent forms of evidence. The present experiments show that ROC and RT modeling support the same conclusions about the effect of additional learning on evidence variability in recognition and source tasks, providing further evidence that the unequal-variance assumption is critical in accounting for memory decisions. Together with past work, the present results demonstrate that RT modeling provides a unique and rigorous test of claims about memory processes (Criss, 2010; Dube et al., 2012; Ratcliff & Starns, 2009, 2013; Ratcliff, Thapar, & McKoon, 2004; Starns, Ratcliff, & McKoon, 2012; Starns, Ratcliff, & White, 2012; Van Zandt, 2000).

Footnotes
1

Acting alone, no other model parameter can produce the same effects on the data as changes in the across-trial drift variability parameter (η), not even the other across-trial variability parameters (sZ and sT). However, allowing all of the parameters to vary across stimulus levels produces a very flexible model that can closely mimic the effects of changes in the η parameter even if η is constant across stimulus types. For this reason, η values cannot be accurately recovered in a completely unconstrained model. Given that fixing decision criteria and nondecision parameters across stimulus levels is an extremely common practice with strong theoretical justifications, my goal was to explore the effects of variables on η estimates under these constraints.

 
2

The .3 and .7 quantiles were also fit but are not displayed, to reduce clutter.

 
3

As was mentioned, the model is actually fitting response frequencies in the 6 RT bins separated by the .1, .3, .5, .7, and .9 quantiles. The .9 quantiles are in the low-density tail of the distribution, so large shifts in this quantile have relatively small effects on the number of counts in the 5th and 6th RT bins. As a result, the model can make what appears to be a large miss in the .9 quantile, but this actually translates to a more subtle miss in the bin counts that are actually being used to calculate the fit statistic.

 
4

Results are from data sets 2, 6, 7, 8, and 9 in Starns and Ratcliff (2014). Tests were conducted on η values from fits with separate η parameters for lures, weak (nonrepeated) targets, and strong (repeated) targets within each word frequency condition. The present experiments used low-frequency words as stimuli, so I used only the low-frequency conditions from previous studies. Results were very similar if both high- and low-frequency words were considered.

 
5

To ensure that the fit differences truly followed a χ2 distribution (or at least conformed closely enough to avoid incorrect conclusions), I performed Monte Carlo simulations in which I simulated data sets from the diffusion model with each participant’s best-fitting equal-η parameters and fit the simulated data with both equal-η and free-η models. I performed 50 replications including all participants for each of the χ2 tests reported in the text, and I summed the fit differences across participants for each replication. The .95 quantile of the distribution of fit differences across replications was always close to the theoretical χ2 critical value with α = .05, so there is no reason to expect that deviations from the χ2 distribution affected the outcome of any of the reported χ2 tests.
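The bookkeeping for this check could look like the skeleton below; `simulate_from_params` and `fit_chi2` are hypothetical stand-ins for a diffusion-model simulator and fitting routine, so only the replication logic is shown.

```python
# Skeleton of the parametric-bootstrap check: simulate from each participant's
# best-fitting equal-eta parameters, refit with equal-eta and free-eta models,
# sum the fit differences across participants, and compare the .95 quantile of
# the summed differences with the theoretical chi-square critical value.
import numpy as np
from scipy.stats import chi2

def bootstrap_critical_value(participant_params, simulate_from_params, fit_chi2,
                             n_reps=50, seed=0):
    rng = np.random.default_rng(seed)
    summed_diffs = []
    for _ in range(n_reps):
        total = 0.0
        for params in participant_params:
            data = simulate_from_params(params, rng)
            total += fit_chi2(data, model="equal_eta") - fit_chi2(data, model="free_eta")
        summed_diffs.append(total)
    empirical = np.quantile(summed_diffs, 0.95)
    theoretical = chi2.ppf(0.95, df=len(participant_params))
    return empirical, theoretical
```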

 

Copyright information

© Psychonomic Society, Inc. 2014

Authors and Affiliations

1. Department of Psychology, University of Massachusetts – Amherst, Amherst, USA
