Introduction

In remembering everyday information, such as a telephone number, a route, or a sequence of events, order is central (Lashley, 1951). A relatively simple test of memory for order is the judgement of relative order (JOR) procedure (Butters, Kaszniak, Glisky, Eslinger, & Schacter, 1994; Chan, Ross, Earle, & Caplan, 2009; Fozard, 1970; Hacker, 1980; Hockley, 1984; Hurst & Volpe, 1982; Klein, Shiffrin, & Criss, 2007; McElree & Dosher, 1993; Milner, 1971; Muter, 1979; Naveh-Benjamin, 1990; Wolff, 1966; Yntema & Trask, 1963). Illustrated in Fig. 1, the JOR procedure tests memory for relative order without requiring participants to produce the items from memory. The wording of a JOR question typically takes a form like, “Which of two people left the party more recently?” A logically equivalent form of this question could be “Which of two people left the party earlier?” Because, formally, all that has changed is that the target became the nontarget and vice versa, one might presume that these “earlier” and “later” instructions test the same information in memory. Perhaps this is why few studies have compared these instructions. The vast majority have used a recency instruction—hence, the term, judgement of relative recency (the origin of the acronym, JOR). However, instructions do influence JOR performance on both supra- and subspan lists. Flexser and Bower (1974) found that their distant instruction had worse overall accuracy than their recency instruction. More specifically, Chan et al. (2009) found that participants’ behavior on subspan lists resembled backward, self-terminating search for a later instruction, consistent with previous findings (Hacker, 1980; Muter, 1979), but forward, self-terminating search for an earlier instruction. Here, we ask whether this congruity effect is confined to subspan lists or generalizes to longer, supraspan lists.

Fig. 1

Time course of one example experimental trial in Experiment 1 (list length = four nouns) with both instructions. At test, two nouns from the list are presented in random order, and the participant is asked to respond to the probe stimulus that occurred earlier (“earlier” instruction) or later (“later” instruction) in the just-presented list. The correct response item is depicted on a dark background in this figure only, not in the experiment itself. The keyboard key that the participant would press to select each probe item is depicted underneath the probe items

Figure 2c illustrates how hypothetical response time data would look for a forward, self-terminating search strategy. The vertical axis plots the behavioral measure; for illustration purposes, we label it “error rate” or “response time,” because speed–accuracy trade-offs notwithstanding (and we found none in our data), one would expect response time and error rates to vary in the same direction as one another. The left horizontal axis plots the serial position of the earlier probe item, and the right horizontal axis plots the serial position of the later probe item. Note that the later-item serial position is plotted in descending order to minimize the bars occluding one another. In forward, self-terminating search, response time/error rate increases as a function of the earlier probe serial position, whereas the later probe serial position has no influence on response time/error rate. The opposite pattern is expected for backward, self-terminating search, where response time/error rate increases when the later probe serial position decreases (Fig. 2d). The effect of instruction can be most clearly visualized if we plot the difference between the earlier and later instruction data (Fig. 2e).
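To make these schematic predictions concrete, the following minimal R sketch (ours, not from the original article; the parameter values are arbitrary) generates the idealized patterns of Fig. 2c–e, assuming a single fixed time cost per item searched and perfect item availability:

```r
# Idealized self-terminating search: forward search stops at the earlier
# probe item; backward search starts at the end of the list and stops at
# the later probe item. Hypothetical parameters: base = overhead time (ms),
# per_item = search/comparison time per item (ms).
predict_rt <- function(sp_early, sp_late, n = 8,
                       direction = c("forward", "backward"),
                       base = 600, per_item = 150) {
  direction <- match.arg(direction)
  steps <- if (direction == "forward") sp_early else n - sp_late + 1
  base + per_item * steps
}

# All valid probe pairs for a list of n = 8 items
pairs <- subset(expand.grid(sp_early = 1:8, sp_late = 1:8),
                sp_early < sp_late)
pairs$rt_forward  <- predict_rt(pairs$sp_early, pairs$sp_late, direction = "forward")
pairs$rt_backward <- predict_rt(pairs$sp_early, pairs$sp_late, direction = "backward")
# Their difference reproduces the crossed pattern of Fig. 2e
pairs$difference  <- pairs$rt_forward - pairs$rt_backward
```

As in the text, forward search yields a response time that depends only on the earlier probe's serial position (Fig. 2c), and backward search yields one that depends only on the later probe's serial position (Fig. 2d).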

Fig. 2

Schematic depictions of hypothesized serial position effects. The dependent measure (error rate or response time) is plotted as a function of both the earlier probe item’s serial position (“Earlier Item”) and the later probe-item’s serial position (“Later Item”). a Serial position effects expected due to the distance effect. b Serial position effects expected due to the primacy and recency effects. c Serial position effects for forward, self-terminating search, as was found in subspan lists using the “earlier” instruction (Chan, Ross, Earle, & Caplan, 2009). d Serial position effects for backward, self-terminating search, as was found in subspan lists using the “later” instruction (Chan et al., 2009). e The difference between (c) and (d), which we use to isolate the congruity effect. f Our hypothesized serial position effects for the “earlier” instruction for supraspan lists: an average of recency, distance, and instruction-based bias across the list. g Our hypothesized serial position effects for the “later” instruction, as an average of recency, distance, and instruction-based bias across the list. Note that the hypothesis for the difference between instructions for supraspan lists remains as in (e), except that edge effects are expected to produce bow-shaped, rather than linear, congruity effects

We already know that JORs for supraspan lists are qualitatively quite different from those for subspan lists, and two important findings may suggest that we would not find a congruity effect at longer list lengths: (1) a distance effect (Fig. 2a), whereby judgements are better (faster and more accurate) as the difference in serial positions (distance) of the two probe items increases (e.g., Bower, 1971; Yntema & Trask, 1963), similar to the symbolic distance effect (e.g., Banks, 1977; Holyoak, 1977; Moyer & Landauer, 1967), and (2) an inverted U-shaped serial position effect, made up of a primacy and recency effect (Fig. 2b) (e.g., Hacker, 1980; Jou, 2003; Muter, 1979; Yntema & Trask, 1963). Chan et al.’s (2009) congruity effect was found for response times, suggesting that instruction influenced access speed as a function of serial position. For supraspan JORs, error rate is also a useful dependent measure. As list length increases above span, error rate increases; in an extreme case, with a list length of 90 words, accuracy approached chance levels, rising to 60 % accuracy only for very large lags (distance of 36 words; Klein et al., 2007). Primacy and recency effects may seem at odds with self-terminating search models that are reasonable accounts of subspan data (Chan et al., 2009). However, Hacker (1980) suggested that, in the case of imperfect item memory, U-shaped serial position effects due to item memory might distort self-terminating search patterns in JORs, an idea he incorporated into his self-terminating search model. The distance effect is also incompatible with self-terminating search, because the position of the unreached probe item should not affect the outcome of the JOR decision. These arguments might lead one to expect no congruity effect in long lists.

On the other hand, there are reasons to expect a congruity effect at long list lengths. Evidence suggests there is no clear distinction between short- and long-term order memory (McElree, 2006). Moreover, Muter (1979) found a backward self-terminating search pattern extending to lists of 10 items (supraspan). Hacker’s (1980) data did not show obvious break points in his “availability” parameter (representing item memory) that could have distinguished working memory from long-term memory. This is consistent with extensive evidence suggesting that memory is scale invariant (Brown, Neath, & Chater, 2007; Crowder, 1982; Howard & Kahana, 1999; Nairne, 2002). It is thus possible that both long and short list lengths are governed by the same memory mechanisms and that the congruity effect will generalize from short to longer list lengths.

In addition, the self-terminating search model has been fitted to long-list JOR data with success (Hacker, 1980; McElree & Dosher, 1993). It is possible that a self-terminating search model operating in the forward, rather than the backward, direction could explain the earlier instruction data and thus account for the congruity effect; on this view, the earlier instruction might induce a dominant primacy effect even for longer lists. In serial-recall procedures, forward recall shows a dominant primacy effect, whereas backward recall shows a dominant recency effect (Beaman, 2002; Hulme et al., 1997; Li et al., 2010; Li & Lewandowsky, 1993, 1995; Madigan, 1971; Richardson, 2007; Rosen & Engle, 1997; Thomas, Milner, & Haberlandt, 2003), suggesting that if forward search is based on serial recall, this kind of mechanism might be applicable even for longer lists. At present, published studies of supraspan JORs have mainly used a recency instruction to look at serial-position effects, similar to our later instruction (Butters et al., 1994; Chan et al., 2009; Fozard, 1970; Hacker, 1980; Hockley, 1984; Hurst & Volpe, 1982; Klein et al., 2007; McElree & Dosher, 1993; Milner, 1971; Muter, 1979; Naveh-Benjamin, 1990; Wolff, 1966; Yntema & Trask, 1963). Wyer, Shoben, Fuhrman, and Bodenhausen (1985) used both sooner and later instructions with probes derived from a social-action script (e.g., going to a restaurant) and found a response time congruity effect, but not for events that were specific to the example story. A similar response time congruity effect was found for personal life events in a subset of experimental conditions (Fuhrman & Wyer, 1988). These congruity effects for action scripts and personal life events may reflect supraspan phenomena, but both types of material are arguably tapping into semantic, not episodic, temporal order. We wondered if the JOR congruity effect would generalize above span, with response time as the measure.

Since we expected error rate to be an informative dependent measure for these lists, we wondered whether instruction would affect the quality of information in memory (availability), measured by error rate, or just accessibility, measured by response time. An error rate congruity effect has been found in autobiographical order tasks with yes/no judgements (Skowronski et al., 2007; Skowronski, Walker, & Betz, 2003); however, participants’ confirmation bias (toward selecting “yes” rather than “no”) might underlie that result. We found no clear published error rate congruity effect for temporal-order memory, although error rate congruity effects have occasionally been found for perceptual comparative judgements (Petrusic, 1992). We therefore hypothesized that a similar congruity effect would be observed in supraspan JOR data, with both response time and error rate as measures, but with the addition of recency, primacy, and distance effects. If we assume that the primacy, recency, and distance effects are approximately constant between instructions, we can isolate the congruity effect by analyzing the difference between instructions (Fig. 2e), which should then look similar to the effect observed in subspan response time data (Chan et al., 2009). We test these hypotheses in two experiments, always manipulating instruction between subjects. Experiment 1 used lists of nouns and manipulated list length (4, 6, 8, and 10) within subjects. Experiment 2 used consonant lists (the same materials and presentation rate as Chan et al.'s experiment) and manipulated list length (4 and 8) between subjects. The experiments produced similar results, suggesting broad boundary conditions for the congruity effect.

To broaden the theoretical implications of our results, we evaluated our findings with respect to Hacker’s (1980) self-terminating search model. Hacker developed this model specifically to explain JORs, but it has not been tested on the congruity effect. We hypothesize that the congruity effect can be explained by a difference in the direction of search associated with each instruction. Participants may perform forward, self-terminating search with the earlier instruction, and backward, self-terminating search with the later instruction, and we test this with fits of models based on Hacker’s model after presenting the results of both experiments. We also discuss whether other existing memory models for the JOR paradigm could account for the congruity effect in their current form or could be easily adapted to do so.

Experiment 1

Method

Participants

Fourteen participants were recruited from the University of Alberta community. Participants gave informed consent and were paid at a rate of $12 for each of five 1-h sessions, conducted on 5 consecutive days. All had normal or corrected-to-normal vision and had learned English before the age of 6. Participants were assigned alternately, in order of testing, to the earlier or later instruction group. One participant in the later instruction group did not attend the last session, so for that participant, only the first four sessions were included in the analyses.

Materials

The stimuli were 1,316 nouns drawn from the MRC Psycholinguistic Database (Wilson, 1988), restricted to words of three to eight letters, two syllables, and Kučera–Francis written frequency above 6 per million, displayed in all capital letters. Nouns that we subjectively determined might be confused with verbs were removed manually. The list length of each trial was drawn randomly from 4, 6, 8, and 10, counterbalanced within session. There was no within-session repetition of words, but words were reused across sessions. All participants were tested using an A1207 iMac computer with an Apple Macintosh A1048 Pro keyboard.

Procedure

The experiment was implemented with the Python Experiment-Programming Library (PyEPL; Geller, Schleifer, Sederberg, Jacobs, & Kahana, 2007) and modified from Chan et al.’s (2009) experiment (Fig. 1). Probes were pairs of items drawn from the just-presented list; all possible combinations were equally probable and counterbalanced within subjects and within list length. Participants in the two groups received slightly different instructions. (1) Excerpt from the earlier instruction: “. . . judge which of the two nouns came earlier on the list you just studied. Press the ‘/’ key if the earlier item is presented on the right side of the screen and the ‘.’ key if the earlier item is on the left side of the screen. . . ” (2) Excerpt from the later instruction: “. . . judge which of the two nouns came later on the list you just studied. Press the ‘/’ key if the later item is presented on the right side of the screen and the ‘.’ key if the later item is on the left side of the screen. . . .” Participants were instructed to respond as quickly as they could without compromising accuracy. A session consisted of nine blocks: the first was a practice block of 8 trials, excluded from analyses, to familiarize (or refamiliarize) participants with the task; the remaining eight experimental blocks contained 20 trials each. The computer provided immediate accuracy feedback after each trial in the practice block (“correct” or “incorrect”), and average response time (in milliseconds) and accuracy (percentage correct) at the end of each experimental block. Each trial began with a fixation asterisk, “*,” in the center of the screen, followed by a word list presented sequentially in the center of the screen. Items were presented for 1,500 ms each with an interstimulus interval (ISI) of 175 ms. This is slower than the rate Chan et al. used (575-ms presentation time and 175-ms ISI), due to the greater stimulus complexity of nouns relative to consonants (e.g., Sternberg, 1975). After a 2,500-ms delay, participants were presented with a probe consisting of two words from the just-presented list and were asked which item had been presented earlier or later, depending on group, by pressing the “.” key (for the left-hand probe item) or the “/” key (for the right-hand probe item). After a 500-ms delay, participants could press a key to start the next trial.

Data analysis

Trials with response times less than 200 ms or more than three standard deviations from a participant’s mean response time were removed from the data (1.3 % of responses). A linear mixed effects (LME) model (Baayen, Davidson, & Bates, 2008; Bates, 2005) was used to analyze our data. We adopted LME analysis because, as compared with ANOVA, LME handles unbalanced designs, can fit individual responses without the need to average the data, and protects against Type II errors by virtue of its increased power (Baayen et al., 2008; Baayen & Milin, 2010). LME analyses were conducted in R (Bates, 2005), using the lme4 (Bates & Sarkar, 2007), languageR (Baayen, 2007), and LMERConvenienceFunctions (Tremblay, 2013) libraries. The “lmer” function was used to fit the LME model, and the “pamer.fnc” function was used to calculate the p values of model parameters. Eight fixed factors were used as predictors: instruction (earlier, later), the linear and quadratic components of later probe serial position (the serial position of the probe item that appeared later in the presented list), distance (the absolute difference between the two probes’ serial positions), intact/reverse (whether probe order was consistent or inconsistent, respectively, with presentation order), trial number, session number, and list length. The linear and quadratic components of later probe serial position are orthogonal to each other, generated with the “poly” function in R; we included the quadratic term to account for expected primacy and recency effects. Participant was included as a random effect on the intercept. Instruction and intact/reverse were treated as categorical factors; all other factors were scaled and centered before being entered into the model. Response time was analyzed for correct trials only and was log-transformed to reduce skewness. The error rate data were fitted with logistic regression, because accuracy is a binary variable (“correct” vs. “incorrect”). LME estimates random effects first, followed by fixed effects. In the results tables, the “Estimate” column reports the corresponding regression coefficients, along with their standard errors. For the purposes of reporting the LME results, the intact condition and the earlier instruction were set as the reference levels for the intact/reverse and instruction factors, respectively. The best-fitting LME models were obtained by conducting a series of iterative tests comparing progressively simpler models with more complex models using the Bayesian information criterion (BIC). We used BIC because it penalizes free parameters more heavily than the Akaike information criterion (AIC), making it conservative and resistant to overfitting (Motulsky & Christopoulos, 2004; Zuur, Ieno, Walker, Saveliev, & Smith, 2009). This approach removes interactions and variables that do not explain a significant amount of variance (Baayen et al., 2008). We used the LMERConvenienceFunctions (Tremblay, 2013) library to conduct the fitting of fixed effects systematically. In this approach, for each condition, we started with a model that included all factor combinations and interactions, with two exceptions. (1) The quadratic component of later probe serial position was not allowed to interact with the linear component, because both were derived from the later probe serial position. (2) Any interaction term for which one or more levels had no data was not included.
Starting with the complete model, the highest-order terms are considered first, progressing to the lowest-order terms. At each stage, considering a given order of interaction, the term with the lowest p value is identified, and a model without this term is compared with the original model using BIC. The term is kept if it improves BIC based on a threshold of 2 or if the term is also contained within a higher-order interaction. When all terms are tested for the highest-order interaction, the comparison process continues to the term with lowest p value in the next highest-order interaction, and so on. The process iterates until all interaction terms have been tested, ending with main effects (Tremblay, 2013).
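As a concrete illustration of this pipeline, here is a simplified R sketch (a reconstruction for exposition, not the authors’ script; the data frame `d` and its column names are assumed, and the initial formula shows only a representative subset of the factor combinations that the full model would include):

```r
library(lme4)  # lmer/glmer; languageR and LMERConvenienceFunctions omitted here

# Trim responses faster than 200 ms or beyond 3 SDs of a participant's mean
d$rt_mean <- ave(d$rt, d$participant, FUN = mean)
d$rt_sd   <- ave(d$rt, d$participant, FUN = sd)
d <- subset(d, rt >= 200 & rt <= rt_mean + 3 * rt_sd)

# Orthogonal linear and quadratic components of later probe serial position
sp <- poly(d$later_sp, 2)
d$sp_lin  <- sp[, 1]
d$sp_quad <- sp[, 2]

# Response time: correct trials only, log-transformed, random intercept for
# participant; continuous predictors scaled and centered via scale()
m_rt <- lmer(log(rt) ~ instruction * (sp_lin + scale(distance) +
               intact_reverse + scale(trial) + scale(session) +
               scale(list_length)) + sp_quad + (1 | participant),
             data = subset(d, correct == 1))

# Error rate: logistic regression on the binary accuracy variable
m_err <- glmer(correct ~ instruction * (sp_lin + scale(distance)) +
                 sp_quad + (1 | participant),
               family = binomial, data = d)

# One backfitting step: drop a candidate interaction and compare by BIC;
# the term is kept only if keeping it improves BIC by the threshold of 2
m_drop <- update(m_rt, . ~ . - instruction:scale(distance))
BIC(m_drop) - BIC(m_rt)
```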

Results and discussion

Error rate and response time, averaged across participants, are plotted as functions of the serial positions of the earlier and later probe items in Figs. 3 and 4. We isolated the congruity effect by plotting the difference between the earlier and later instructions (right-hand columns), after first removing the overall mean for each participant to correct for the mean difference between instructions and to better visualize the pattern of serial-position effects. The best-fitting LME models are reported in Tables 1 and 2.

Fig. 3

Error rate (Experiment 1) as a function of both probe items’ serial position (earlier item and later item, respectively), broken down by list length in rows and instruction (earlier, later, and the difference, earlier − later, corrected for mean error rate) in columns

Fig. 4

Response time (Experiment 1) as a function of both probe items’ serial position (earlier item and later item, respectively), broken down by list length in rows and instruction (earlier, later, and the difference, earlier − later, corrected for mean response time) in columns

Table 1 Best-fitting LME model for Experiment 1 error rate
Table 2 Best-fitting LME model for Experiment 1 response time

Error rates

First, we replicated the known bow-shaped serial-position and distance effects. At all list lengths and for both instructions, the error rate data (Fig. 3) showed a distance effect (Fig. 2a), supported by a significant main effect of distance, and a bow-shaped serial-position effect involving both primacy and recency (Fig. 2b), supported by a significant quadratic component of the later probe serial position in the best-fitting LME model (Table 1). The later instruction (Fig. 3, middle column) broadly resembled the earlier instruction (Fig. 3, left-hand column), except that the recency effect was more pronounced for the later instruction.

We next asked whether, despite the presence of distance and serial-position effects, there might also be a congruity effect. The difference bar graph (Fig. 3, right-hand column) shows that instruction indeed interacted with probe serial positions, supported in the LME analysis by interactions between instruction and the linear component of later probe serial position (Table 1). This interaction was due to the earlier instruction producing better performance at earlier serial positions and the later instruction producing better performance at later serial positions, in line with our predicted congruity effect (Fig. 2e).

Additional findings of interest that emerged from the best-fitting LME model were main effects of list length, intact/reverse, trial, and session: error rates were higher with greater list length, reverse probe presentation order, lower trial number, and lower session number.

Importantly, list length did not interact with the congruity effect, suggesting that the congruity effect on error rate was present at all list lengths and does not change substantially across our four list lengths. We found a significant trial × session interaction, consistent with learning-to-learn effects: larger trial numbers had fewer errors, and this effect was reduced in later sessions. Importantly, neither trial nor session interacted with the congruity effect, showing that the congruity effect generalizes across these factors.

Finally, a significant interaction was found for instruction × intact/reverse. This is a second kind of congruity effect between instruction and reading order: Intact probes were judged better for the earlier instruction and worse for the later instruction. Reverse probes had the opposite relationship to instruction. If participants read from left to right, this would indicate better performance when the target was read first.

Response times

First, as with error rate, for all list lengths and both instructions, the response time data (Fig. 4) showed significant distance and bow-shaped serial-position effects (Fig. 2a, b), supported by a significant main effect of distance and a significant quadratic component of later probe serial position, respectively, in the best-fitting LME model (Table 2).

Turning to the congruity effect, as with error rate, the difference bar graph (Fig. 4, right-hand column) shows the predicted congruity effect, supported in the LME analysis by a significant interaction between instruction and the linear component of later probe serial position (Table 2). Again, in line with our predicted congruity effect (Fig. 2e), the earlier instruction produced better performance at earlier serial positions, and vice versa for the later instruction.

We further checked whether the congruity effect was qualified by significant three-way interactions in the best-fitting LME model. The three-way instruction × linear component of later probe serial position × distance interaction showed that increasing distance was associated with a decrease in the slope of the linear component of later probe serial position for both instructions (see Fig. 1 in the supplementary materials). However, this slope decrease was steeper for the earlier instruction than for the later instruction; the differential rate of slope decrease, thus, does not contradict the congruity effect. The instruction × quadratic component of later probe serial position × list length interaction showed that, as list length increased, the slope of the quadratic component of later probe serial position decreased for the later instruction and increased for the earlier instruction (see Fig. 2 in the supplementary materials). This interaction suggests that the difference in the primacy and recency effects between instructions decreases as list length increases.

Similar to the error rate results, we found trial × session and instruction × intact/reverse interactions. Instruction also interacted with trial, session, and distance: response time in the later instruction improved more with practice than in the earlier instruction, and the later instruction had a smaller distance effect than the earlier instruction. List length interacted with instruction, session, and later probe serial position; to summarize, the increase in response time with list length was larger for the later instruction, at higher session numbers, and at larger later probe serial positions.

In sum, Experiment 1 replicated the typical primacy, recency, and distance effects (Hacker, 1980; Jou, 2003; Muter, 1979; Yntema & Trask, 1963) and extended Chan et al.’s (2009) congruity effect finding from subspan (e.g., list length 4) to supraspan (up to list length 10) data. The congruity effect appeared in both error rate and response time measures.

Experiment 2

One potential confound in Experiment 1 is that participants were given four list lengths, intermixed. It is possible that the congruity effect is, in fact, a subspan, not supraspan, phenomenon but that the inclusion of some subspan lists (list length 4) influenced participants to apply a subspan strategy to supraspan lists; perhaps our congruity effect in supraspan lists is a special case. To address this, list length was a between-subjects factor in Experiment 2. In addition, to test for boundary conditions of the congruity effect, we switched from nouns to consonants and to a faster presentation rate (similar to the one used by Chan et al., 2009). If the congruity effect were found regardless of practice effects, stimulus type, and presentation rate, the generality of the congruity effect would be further supported.

Method

Participants

A total of 385 undergraduate students from introductory psychology courses at the University of Alberta participated in exchange for partial course credit. Participants gave informed consent, had normal or corrected-to-normal vision, and had learned English before age 6. We included two between-subjects factors: list length (4, 8) × instruction (earlier, later). Participants were run in groups of about 10–15, with all participants within a testing group being assigned to a single experimental group; experimental group cycled across testing groups. Forty-four participants were excluded because their error rate was close to chance (≥40 %). The number of excluded versus included participants in each condition is summarized in Table 5.

Materials

The materials were the same as those used by Chan et al. (2009). The stimuli were 16 consonants (excluding S, W, X, and Z) from the English alphabet, displayed in capital letters. Each list comprised 4 or 8 (depending on group) consonants drawn at random without replacement from the stimulus pool, with the restriction that they had not appeared in the two preceding lists. Probability was equal for each consonant/serial-position combination. All participants were tested using a group of 15 computers (custom-built PCs) with identical hardware, identical Samsung SyncMaster B2440 monitors, and Logitech K200 keyboards. Both instruction groups were therefore exposed to the same hardware timing variability (Plant & Turner, 2009), so we do not expect any hardware-related bias in our between-subjects design.

Procedure

The experiment was again created and run using the Python Experiment-Programming Library (Geller et al., 2007). A single session lasted approximately 1 h. The session started with a practice block of 8 trials, followed by nine blocks of 20 trials each for list length 4, or six blocks of 20 trials each for list length 8; the different numbers of blocks ensured that all participants could finish within 1 h. The computer provided online correctness feedback after each trial in the practice block (“correct” or “incorrect”) and average response time (in milliseconds) and accuracy (percentage correct) at the end of each block. The instructions were the same as in Experiment 1, except that the word “nouns” was replaced with “consonants.” For each trial, participants were first presented with a fixation asterisk, “*,” in the center of the screen, followed by a consonant list presented sequentially in the center of the screen, with list items presented for 575 ms each and an ISI of 175 ms. After a 2,500-ms delay, participants were presented with a probe consisting of two consonants from the just-presented list and were asked which item had been presented earlier/later in the list by pressing the “.” key (for the left-hand item) or the “/” key (for the right-hand item). Each response was followed by a 500-ms delay before participants could press any key to start the next trial.

Data analysis

Trials with response times less than 200 ms or more than three standard deviations from a participant’s mean response time were removed from the data (1.35 % of all trials). We adopt the same data representation as in Experiment 1. Error rate and response time (correct trials) data were analyzed at each list length separately.

Results and discussion

Error rates

First, because performance was near ceiling, we could not analyze error rates at list length 4 (Fig. 5, top row) in any meaningful way: of the 171 participants across the list length 4 earlier and later instruction groups, 89 had overall accuracy greater than 95 %, and only 18 scored below 90 %. We therefore restrict our error rate analyses to list length 8.

Fig. 5

Error rate (Experiment 2) as a function of both probe items’ serial position (earlier item and later item, respectively), broken down by list length in rows and instruction (earlier, later, and the difference, earlier − later, corrected for mean error rate) in columns

The list length 8 data (Fig. 5, bottom row) showed a congruity effect consistent with the pattern observed in Experiment 1 (Fig. 2e), with the earlier instruction producing more errors than the later instruction as later probe serial position increased, supported by a significant instruction × later probe serial position (linear component) interaction in the best-fitting LME model (Table 4). For this model selection, BIC could not differentiate the lowest-BIC model, which included the instruction × later probe serial position term (the congruity effect), from the same model without that term, because ΔBIC < 2. However, because the model that included the congruity effect was nominally better by the BIC, we further compared the two models using other fit criteria: the model that included the congruity effect was reliably selected on the basis of both AIC and log-likelihood (Table 3). For this reason, we report the model including the congruity effect. Importantly, the congruity effect did not interact significantly with trial, distance, or intact/reverse, suggesting that it generalizes across these factors (Tables 4 and 5).
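In R, this comparison can be sketched as follows (hypothetical model objects: `m_bic` is the best-BIC model without the congruity term, and `sp_lin` is the linear component of later probe serial position):

```r
# Add the congruity (instruction x linear serial position) term back in
m_cong <- update(m_bic, . ~ . + instruction:sp_lin)

BIC(m_bic) - BIC(m_cong)        # magnitude < 2: BIC alone is inconclusive
AIC(m_bic) - AIC(m_cong)        # positive: AIC favors the congruity model
logLik(m_cong) > logLik(m_bic)  # TRUE: higher log-likelihood with the term
```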

Table 3 Model comparison of best BIC model with best BIC model plus instruction × linear component of later probe serial position
Table 4 Best-fitting LME model for Experiment 2 list length 8 error rates
Table 5 Number of participants rejected for analysis (error rate ≥ 40 %) versus total number of participants in each condition

One can observe an overall recency effect at both list lengths (Fig. 5), supported by a significant main effect of later probe serial position in the LME model, showing that error rate decreased as later probe serial position increased. The distance effect (Fig. 2a) was also found, supported by a significant main effect of distance in the best-fitting LME model. There were also significant main effects of intact/reverse and of instruction: intact probes were judged better than reverse probes, again suggesting a reading-order effect, and probes in the earlier instruction were judged better than in the later instruction. The latter holds despite more poor performers having been excluded from the later instruction group (Table 5), indicating an overall advantage of the earlier instruction over the later instruction. Replicating Experiment 1, the intact/reverse × instruction congruity effect was also significant: intact probes were judged better for the earlier instruction and worse for the later instruction, and reverse probes had the opposite relationship to instruction.

Response time

First, as with the Experiment 1 error rate and response time results, visual inspection of the list length 4 earlier instruction data revealed a pattern consistent with forward, self-terminating search (Fig. 2c), and the list length 4 later instruction data revealed a pattern consistent with backward, self-terminating search, in line with Chan et al.’s (2009) results. For list length 8, the earlier instruction pattern resembled a distance effect with an overall primacy and recency effect (Fig. 2f), and the later instruction resembled a backward, self-terminating pattern combined with distance, primacy, and recency effects (Fig. 2g). The distance, primacy, and recency effects at both list lengths are supported by significant main effects of distance and significant quadratic components of later probe serial position in the best-fitting LME models.

Again, replicating the Experiment 1 results, the response time data for both list length 4 and list length 8 (Fig. 6) showed a congruity effect (Fig. 2e), supported in the best-fitting model by a significant instruction × later probe serial position (linear component) interaction (Table 6). The two-way interaction was qualified by a significant four-way list length × instruction × later probe serial position × intact/reverse interaction. We therefore conducted additional analyses on four subgroups of the data: list length 4 intact, list length 4 reverse, list length 8 intact, and list length 8 reverse (see Tables 2, 3, 4, and 5 in the supplementary materials). The two-way instruction × later probe serial position (linear component) interaction was significant for all four groups, and the effects were consistent in direction. In addition to the four-way interaction, the congruity effect also interacted with distance and trial. These three-way interactions can be understood as increasing trial number and increasing distance both selectively facilitating later instruction response times at later probe serial positions while having the opposite effect on earlier instruction response times at later probe serial positions. In other words, the linear later probe serial position function associated with the earlier instruction was less affected by reverse presentation order, practice, and increasing distance.

Fig. 6

Response time (Experiment 2) as a function of both probe items’ serial position (earlier item and later item, respectively), broken down by list length in rows and instruction (earlier, later, and the difference, earlier − later, corrected for mean response time) in columns

Table 6 Best-fitting LME model for Experiment 2 response time

Replicating the Experiment 1 response time results, the best-fitting LME model also revealed other effects not readily visible in the data plots, including main effects of list length, instruction, trial, and intact/reverse: longer list length, the later instruction, reverse presentation order, and larger trial number all corresponded with longer response times. The two-way instruction × intact/reverse interaction was also significant, suggesting a reading-order effect.

In sum, we found a congruity effect on error rate in list length 8 and a response time congruity effect at both list lengths. This challenges the argument that the findings in Experiment 1 were a consequence of mixing subspan lists in with supraspan lists within subjects. Thus, the congruity effect in JORs persists in supraspan lists, despite differences between Experiments 1 and 2, including presentation rate, stimulus materials, and varied versus fixed list lengths.

Hacker’s backward self-terminating search model

The congruity effect may present a new challenge to mathematical models of serial-order memory. Only a few models have been fit to JOR data (e.g., Brown, Preece, & Hulme, 2000; Hacker, 1980; Lockhart, 1969; McElree & Dosher, 1993). Hacker’s model was designed to explain JOR data with a recency instruction and makes predictions about both response time and error rate. We ask whether Hacker’s model can already explain the congruity effect in its currently published form. If not, we ask whether the model can be modestly modified to explain the congruity effect.

Hacker (1980) proposed that JOR performance is driven by the loss of some items from memory and backward, self-terminating search of the remaining, available items. The serial comparison process was assumed to start at the end of the list, progressing toward the beginning (hence, backward), ending when a match to a probe item was found (hence, self-terminating). If an item were “unavailable” due to item loss, the item would not be encountered during search. The probability of a correct JOR (1 − error rate), P_ij, can be computed as:

$$ P_{ij} = \alpha_i + \frac{1}{2}\left(1 - \alpha_i\right)\left(1 - \alpha_j\right), $$
(1)

where i and j are the study–test lags of the more recent and less recent probe items, respectively, and α_i is the probability that item i is available in memory; Hacker treated the α_i as free parameters. The first term reflects the case in which the later item is available (a correct response), and the second term represents the case in which both probe items are unavailable and the response is made by guessing (probability correct = .5). Hacker went on to model response times on correct trials as follows, assuming that if an item is unavailable, it does not add to the response time (Footnote 1):

$$ \mathrm{response\;time}_{ij} = b + \left\{ \alpha_i \left[ \left( \sum_{k=1}^{i-1} \alpha_k + 1 \right) s \right] + \frac{1}{2}\left(1 - \alpha_i\right)\left(1 - \alpha_j\right) \left[ \left( \sum_{k=1}^{n} \alpha_k - \alpha_i - \alpha_j \right) s \right] \right\} / P_{ij}, $$
(2)

where b is a base-level response time for “overhead” processes unrelated to memory and s is the rate at which each available item is searched and compared. The term in the leftmost square brackets represents the expected response time when search ends in a correct match, equal to the summed availability of the items at lags less than i, which must be compared at a rate of s ms/item; the sum is incremented by 1 because item i itself must be available to make a correct response (if not a guess). The other term is for the condition in which both probes are unavailable, in which case search is exhaustive, summing the availability of all serial positions, excluding the probe lags i and j (because they are unavailable), at a rate of s ms/item. The matches and guesses are normalized by the P_ij for that comparison.
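In code, Equations 1 and 2 admit a direct transcription (our sketch in R; `alpha` is the vector of availabilities indexed by study–test lag, with i < j, and b and s as defined above):

```r
# Equation 1: probability of a correct JOR for probe lags i (more recent)
# and j (less recent)
p_correct <- function(i, j, alpha) {
  alpha[i] + 0.5 * (1 - alpha[i]) * (1 - alpha[j])
}

# Equation 2: predicted response time on correct trials
rt_predicted <- function(i, j, alpha, b, s) {
  # Search time when the more recent probe (lag i) is available: all
  # available items at lags 1..(i-1) are compared, plus item i itself
  match_term <- alpha[i] * (sum(alpha[seq_len(i - 1)]) + 1) * s
  # Exhaustive search when both probes are unavailable: all available
  # items except the two (unavailable) probe lags
  guess_term <- 0.5 * (1 - alpha[i]) * (1 - alpha[j]) *
    (sum(alpha) - alpha[i] - alpha[j]) * s
  b + (match_term + guess_term) / p_correct(i, j, alpha)
}
```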

Note that the same α_i values are used to calculate error rate and response time. For the parameter search, we wanted to avoid finding a model that fit the earlier and later instructions individually while failing to capture the difference due to instruction. We therefore opted for a fitness measure that weighted the earlier data, the later data, and the difference pattern equally. Thus, we fitted Hacker’s (1980) model by minimizing the summed BIC of the earlier instruction, the later instruction, and the difference between the earlier and later instructions (Footnote 2), for both error rate and response time. To compare models from different parameter searches, we recalculated BIC without the redundant earlier − later terms. We follow the rule of thumb that a change in BIC (ΔBIC) of less than 2 is considered a nonsignificant difference between models. For error rate, we used the variant of BIC that applies to the special case of least-squares estimation with normally distributed errors on mean performance (Anderson & Burnham, 2004; Burnham & Anderson, 2002).

Fitting was done in MATLAB (The MathWorks, Inc., Natick, MA) with the SIMPLEX algorithm (Nelder & Mead, 1965). For all model fits presented here, the initial parameters were randomly chosen from a range of 0–1 for the α values and 0–2,000 for b and s, and the best-fitting model was the best of 500 executions of the SIMPLEX with different random starting values.
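A sketch of this procedure in R (the authors used MATLAB; R’s optim with method "Nelder-Mead" implements the same simplex algorithm of Nelder and Mead, 1965). The least-squares BIC mentioned above is n·log(RSS/n) + k·log(n) for n observations and k free parameters (Burnham & Anderson, 2002); the objective function `summed_bic`, which would total this quantity over the earlier data, the later data, and their difference, is assumed rather than shown:

```r
# Least-squares BIC for normally distributed errors on mean performance
bic_ls <- function(observed, predicted, k) {
  n <- length(observed)
  n * log(sum((observed - predicted)^2) / n) + k * log(n)
}

# One SIMPLEX run from random starting values: availabilities in 0-1,
# then b and s in 0-2,000
fit_once <- function(summed_bic, n_alpha) {
  start <- c(runif(n_alpha, 0, 1), runif(2, 0, 2000))
  optim(start, summed_bic, method = "Nelder-Mead",
        control = list(maxit = 10000))
}

# Keep the best of 500 runs with different random starting values
fits <- replicate(500, fit_once(summed_bic, n_alpha = 8), simplify = FALSE)
best <- fits[[which.min(vapply(fits, function(f) f$value, numeric(1)))]]
```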

Both list lengths were fit separately. Visual inspection of the simulated data produced by the best-fitting models (Fig. 7; cf. Figs. 5 and 6) suggests that although the model can reproduce some important features of the data, it does not capture the list length 4 error rate pattern well, producing a ceiling error rate for the later instruction. The model also cannot account for the earlier instruction response time pattern at either list length; in particular, it had trouble producing the primacy-dominant pattern in the response time measure. However, the model produced differences between instructions that resemble the empirical congruity effect qualitatively and with approximately the same magnitude (cf. Figs. 5 and 6).

Fig. 7

Hacker’s (1980) model error rate (top half) and response time (bottom half), fit to Experiment 2, as a function of both probe items’ serial position (earlier item and later item, respectively), broken down by list length in rows and instruction (earlier, later, and the difference, earlier − later, corrected for the mean) in columns. *Note: The list length 4 error rate later instruction is plotted on a different scale than the earlier instruction because this model produced very high values; it could not simultaneously account for both instructions’ empirical pattern and their difference pattern

In summary, Hacker’s (1980) backward self-terminating search model ran into problems fitting serial-position effects that have been suggested to reflect forward search, particularly for the list length 4, earlier data. Therefore, we next considered whether a forward self-terminating search model would address this limitation.

A forward-directed variant of Hacker’s self-terminating search model

To implement forward, self-terminating search, for error rate (Equation 1), we changed the first α_i to α_j:

$$ P_{ij} = \alpha_j + \frac{1}{2}\left(1 - \alpha_i\right)\left(1 - \alpha_j\right) $$
(3)

Similarly, for response time (Equation 2), we changed the first α_i term to α_j and changed the limits of the summation over k. We first asked whether this forward search model would account better for the earlier instruction data than would the backward search model. The parameters of the best-fitting models are summarized in Table 7, along with ΔBIC values comparing the forward and backward models.
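For concreteness, the forward-search counterparts of the functions above might look as follows. This is our reconstruction: the α_i → α_j substitution follows Equation 3, but the modified response time formula is not printed in full in the text, so the new summation limits (searching from the start of the list down to the less recent probe at lag j) reflect our reading of “changed the limits of the summation over k”:

```r
# Equation 3: forward-search probability correct; search reaches the less
# recent probe (lag j) first
p_correct_fwd <- function(i, j, alpha) {
  alpha[j] + 0.5 * (1 - alpha[i]) * (1 - alpha[j])
}

# Forward-search response time: search begins at the list start (lag n,
# where n = length(alpha)) and proceeds toward smaller lags
rt_predicted_fwd <- function(i, j, alpha, b, s) {
  n <- length(alpha)
  searched <- if (j < n) sum(alpha[(j + 1):n]) else 0  # items before lag j
  match_term <- alpha[j] * (searched + 1) * s
  guess_term <- 0.5 * (1 - alpha[i]) * (1 - alpha[j]) *
    (sum(alpha) - alpha[i] - alpha[j]) * s
  b + (match_term + guess_term) / p_correct_fwd(i, j, alpha)
}
```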

Table 7 Parameter summary of the Hacker forward versus backward self-terminating search model fitted for the earlier instruction

The forward model fit the earlier data better than did the backward model for list length 4, capturing the early-serial-position advantage that presented a problem for the backward model; for list length 8, however, the backward model fit better (lower BIC; Fig. 8). Fitting the earlier data with the forward model and the later data with the backward model also improved the qualitative fit of the congruity effect (cf. Figs. 5 and 6).

Fig. 8

Plots generated by the best-fitting Hacker’s (1980) model using forward-direction search for the earlier instruction (a, d) and backward-direction search for the earlier instruction (b, e). The right-hand columns (c, f) show the model-generated earlier − later difference pattern when the earlier instruction is fit with forward-directed search and the later instruction with backward-directed search

For more insight, note that for the forward model, the earlier instruction was fit by α_i values that decreased over serial position (Fig. 9a), whereas the later instruction was fit by α_i values that increased over serial position (Fig. 9b). When both the earlier and later instructions were fit by the backward model, the α_i values were less steeply sloped for the earlier than for the later instruction. It may seem surprising that certain values of α_i were near zero. We understand this as follows. In the earlier instruction, the last item of the list can never be a target. Because participants have very good memory of this last item (McElree, 2006), they may easily rule it out as the target and respond correctly. Because Hacker’s (1980) model selects the item it terminates on as its target, if the item at the last serial position were “available,” then, paradoxically, the response would be incorrect. Thus, it appears that in fitting the model, α_ListLength took on a near-zero value as a means of producing very high accuracy for this kind of probe (and likewise for the backward model).

Fig. 9

Availability (α_i) parameter values plotted as functions of serial position

In summary, Hacker’s (1980) model can fit shorter lists using forward self-terminating search for the earlier instruction and backward search for the later instruction. This reversal of search direction does not appear to extend to longer list lengths. For the longer list lengths, direction of search had to be backward for both instructions, but the degrees of freedom contained within the backward, self-terminating search model were sufficient to produce a qualitatively and quantitatively reasonable congruity effect. We discuss alternative model accounts in the General discussion section.

General discussion

In Experiment 1, we found that the congruity effect in the JOR task generalizes to supraspan noun lists, along with the usual distance, primacy, and recency effects and an intact/reverse congruity effect. The presence of a congruity effect in error rate suggests that instruction affects not only order memory retrieval speed, but also the quality of order information that can be retrieved from memory. Experiment 2 replicated the Experiment 1 findings, but with consonants and a between-subjects manipulation of list length, suggesting that presentation of varied list lengths within subjects does not explain the congruity effect. The fits of Hacker’s (1980) model and the forward-directed variant suggested that the congruity effect may arise for different reasons at different list lengths; at short list lengths, the earlier instruction might, in fact, reverse the direction of self-terminating search, but at longer list lengths, if search is in any sense directional, our model fits suggest that search is backward for both instructions.

Congruity effect across list length

Our results differ from the list length 4 data reported by Chan et al. (2009) in several ways. Chan et al. did not find a distance effect or an intact/reverse effect, both of which we found in Experiment 2, presumably due to our higher power and the LME analyses. The finding of long-list-like features such as a distance effect may not be surprising, since McElree and Dosher (1993) also found signs of a distance effect in relatively short lists using a similar JOR response-signal speed–accuracy trade-off (SAT) procedure. Thus, our findings replicate and extend the congruity effect in subspan lists reported by Chan et al. (2009).

Extrapolating, one might expect that a congruity effect will always be present, even at extremely long list lengths. Alternatively, the congruity effect might become vanishingly small as list length increases. Visual inspection of the data suggests that the overall difference in response time remained relatively constant across list lengths. Confirming the visual inspection, the LME analysis found that the congruity effect did not interact with list length in either the response time or the error rate data in Experiment 1. This suggests that the congruity effect is a general phenomenon that may apply to arbitrarily long lists.

JORs as comparative judgements

Congruity effects similar to ours have been found in closely related paradigms, known as comparative judgements (for reviews, see Birnbaum & Jou, 1990; Petrusic, 1992; Petrusic, Shaki, & Leth-Steensen, 2008), in which a pairwise comparison is made on any of a broad range of stimulus dimensions, including perceptual judgements (e.g., brightness, loudness) and symbolic judgements (e.g., comparing animal size on the basis of animal name). Distance effects, bowed serial-position effects, and congruity effects were found in our temporal-order judgement data and have been commonly found in comparative judgement studies (Banks, 1977). This suggests that JORs may be viewed as a specific instance of comparative judgements, supporting Brown et al.’s (2007) suggestion that temporal order information is processed like magnitude-order information in humans. Thus, congruity effects in JORs may occur for the same reason as they do in other comparative tasks.

Despite the similarities, evidence suggests that episodic (temporal order) and semantic judgements of order are not identical. In one study (Jou, 2003), the first nine letters of the English alphabet served as the list, and participants were asked to choose the letter that appears either “earlier” or “later” in the alphabet. The nine-item alphabet condition is very similar to our list length 8 JOR task in Experiment 2, both using short lists of letters and both contrasting earlier versus later instructions. Jou found a main effect of instruction, with earlier response times shorter than later response times, but no congruity effect. These results, inconsistent with our findings, could be attributed to overlearning of the alphabet, whose highly practiced forward order may be hard to overcome.

One further reason for caution in relating the memory JOR congruity effect to congruity effects in comparative judgements is that our subspan results are consistent with sequential, self-terminating search, but to our knowledge, sequential self-terminating search accounts have not been considered for comparative judgements.

Comparison with forward and backward serial recall

The most common procedure used to investigate memory for order is serial recall, in which both item and order memory are tested (Kahana, 2012; Murdock, 1974). Could serial recall be the basis of the self-terminating search strategy thought to support JORs? In forward serial recall, participants recall from the beginning toward the end of a list, whereas backward recall starts from the end of the list. At first blush, backward serial recall seems approximately like a mirror image of forward serial recall, with forward serial recall being dominated by a primacy effect and backward serial recall being dominated by a recency effect (Madigan, 1971; Manning & Pacifici, 1983). Our JOR congruity effect suggests a similar mirroring of serial-position effects to that seen in forward versus backward serial recall: The earlier instruction produced better judgements at earlier serial positions (primacy effect), whereas the later instruction produced better judgements at later serial positions (recency effect). However, several empirical dissociations suggest that forward and backward serial recall may rely on different cognitive mechanisms (see Richardson, 2007, for a review). Backward serial recall may rely on more visuospatial processing than forward serial recall (Li & Lewandowsky, 1993, 1995; Reynolds, 1997). Thomas et al. (2003) found a response time pattern that suggested simple sequential search of the items in forward recall, but for backward recall, a U-shaped response time curve suggested that participants may have used multiple forward recalls when recalling backward.

Another interesting set of findings that may inform our results comes from a comparison of free recall with forward serial recall (Ward, Tan, & Grenfell-Essam, 2010). Because free recall does not dictate order of report, participants are free to initiate recall at any serial position. Ward et al. found that for shorter list lengths, the free-recall order resembled their forward serial-recall results; thus, participants prefer to recall short lists in the forward direction. In contrast, at long list lengths, participants chose to initiate recall with one of the last four items, which, although not identical, is more like backward than forward serial recall. This may indicate that a forward search strategy is available and convenient for JORs, but more so for short than for long lists, which is consistent with our model fits. Thus, JORs might be carried out using a covert serial-recall-like strategy, especially at shorter list lengths. This hypothesis leads to interesting, testable predictions. If JORs rely on serial recall, the manipulations that previously dissociated forward from backward serial recall (Beaman, 2002; Li & Lewandowsky, 1993, 1995; Madigan, 1971; Manning & Pacifici, 1983; Reynolds, 1997; Thomas et al., 2003) should produce analogous dissociative effects on JOR behavior comparing the earlier versus later instructions.

Models of order memory and the congruity effect

Although a full consideration of the implications of our findings for models of order memory is beyond the scope of this article, there are some points we can make clearly that speak to the inadequacies of current models and possible future directions for model development in light of our findings.

We first consider Hacker’s (1980) model, an implementation of sequential, self-terminating search. We considered this model in depth because it has been successfully applied, several times, to JOR data. We asked whether this preexisting model could already produce a congruity effect. Although it could not, an adaptation of Hacker’s model could capture the congruity effect in subspan lists, namely, by assuming forward directional search for the earlier instruction and backward directional search for the later instruction. For short lists, then, there may be no effect of instruction on the underlying processes generating the behavior, apart from a reversal of search direction. However, the forward directional search model was not compatible with the earlier instruction data of the supraspan lists, despite this model’s large number of degrees of freedom, which grows with list length. This may indicate that a single explanation of the congruity effect is not possible for both short and long lists. Rather, the mechanism may shift at some critical list length; if so, it remains to be determined what principle governs that switch in search direction. Finally, it is important to note that because we fit only a single model to our data, the model is not thereby confirmed. It is quite plausible that a different model (possibly variants of the models we review in this section) would produce a better fit, both quantitatively and qualitatively. The level of success of this model, therefore, should not be taken as support for this particular model over other models.

At first glance, the self-terminating search mechanism presented in Hacker’s (1980) model could be compatible with other models of order memory applied to serial recall, for example, associative chaining models, in which each item is associated with the previous item in the list to form a chain (e.g., Kleinfeld, 1986; Lewandowsky & Murdock, 1989; Riedel, Kühn, & van Hemmen, 1988; Sompolinsky & Kanter, 1986; Wickelgren, 1966), and positional coding models, in which item position is used to probe each item (e.g., Burgess & Hitch, 1999; Henson, 1998). Both chaining and positional coding mechanisms could be used to model self-terminating search. However, a key assumption of Hacker’s model differs from chaining and positional coding models: that an item can be skipped without any impact on response time, which is how Hacker’s model produces a distance effect. To our knowledge, neither chaining nor positional coding models have been implemented in such a way that they save processing time for a missed item. Chaining models may handle a missed item by probing with the previously retrieved vector even if the correct response could not be made (e.g., Lewandowsky & Murdock, 1989); positional coding models continue to probe with the subsequent position, regardless of the accuracy of the previous recall (e.g., Burgess & Hitch, 1999; Henson, 1998). Thus, current models of serial-order memory would need to be modified to incorporate Hacker’s mechanism.

Even if an account based on Hacker’s (1980) model is correct, this model was developed only to explain the JOR task; in its current formulation, it does not perform other order memory tasks, like serial recall. Rather than starting with a model of JORs and figuring out how to develop it into a full-fledged memory model, one could consider models that were designed to explain serial-recall data and ask how such models might handle the JOR task. OSCillator-based Associative Recall (OSCAR; Brown, Preece, & Hulme, 2000) is a model of serial recall that has actually been fit to JOR data with some success. In this model, items are assumed to be associated with the state of an internal context signal (the activation values of a bank of sine-wave oscillators), and retrieval of items requires reinstatement of the context. The authors applied OSCAR to the JOR task (Hacker’s, 1980, data) by probing with the end-of-list context vector; more recent items tend to be more similar to the end-of-list context. The most strongly activated list item was compared with the probe items; if a match was found, the search terminated; if no match was found, the next most strongly activated item was considered, and so on. It is not obvious to us how the congruity effect could be explained with this approach. At the very least, to explain the subspan earlier data, the model might need to be able to substitute the start-of-list context, and the congruity effect in supraspan lists, dominated by an overall recency effect, would still remain to be explained.

TODAM is another model that has been fit to JOR data (Murdock, Smith, & Bai, 2001). In this version of the model (TODAM2), recency was judged on the basis of the strength of the item-memory terms (not the association terms that are used in serial recall), and more recent items had greater strength. This could explain serial-position effects that are dominated by recency, such as we found in supraspan lists, but it is not obvious how this mechanism could be adapted to produce the primacy-dominant pattern found for list length 4. Furthermore, the congruity effect in supraspan lists would still need to be explained. Finally, TODAM was implemented only for error rates and not response times, so additional modifications would be necessary to explain the response time data.

SIMPLE, a scale-invariant model that assumes that memory is driven by discriminability of presentation times of items (Brown et al., 2007), produces bow-shaped serial-position effects and a distance effect, but it remains unclear how the model might account for the congruity effect. One might assume that different instructions can systematically distort the representation of time either directly or by influencing judgements on a separate, serial-position dimension. An interesting possibility is that the congruity effect might be produced by participants encoding list position differently, depending on instruction (Neath & Crowder, 1996)—for example, with the first item first for the earlier instruction and the last item first for the later instruction. Although promising, the current version of SIMPLE does not model response time data, which means more work is required to adapt SIMPLE to explain the full pattern of JOR data reported here.

In short, to our knowledge, no model of serial recall in its current form is sufficient to explain the JOR congruity effect across list lengths.

Conclusion

In sum, the pattern of both speed and errors depends on how the order judgement question is asked. If the target is the earlier item, judgements are better at earlier serial positions, whereas if the target is the later item, judgements are better at later serial positions, reminiscent of congruity effects found in comparative judgements. A self-terminating search model could account for subspan data by a reversal of search direction between instructions, but longer-list data demanded a different account (backward search for both instructions). Direct-access accounts hold promise, but it is unclear how they could capture the full pattern of serial-position effects in both error rate and response time measures across list lengths. Thus, although instruction has a similar effect across list lengths, either the underlying mechanisms driving the congruity effect change with list length, or a unified account may need to combine elements of both types of model.