Context effects in recognition memory have been studied for over 40 years (Footnote 1). The role of context in the match between a test probe and memory featured prominently in the development and testing of the Global Matching Models (Clark & Gronlund, 1996; Humphreys, Pike, Bain, & Tehan, 1989). Context effects continue to be of interest in dual process theories of recognition, with contextual information potentially playing a role in both the familiarity and recollection components of the theories (Hockley, 2008). However, context effects are still not well understood.

The present paper seeks to contribute further to understanding these effects. This is significant not only for theory but also for real-world applications, because memory can be important to the determination of legal questions. One example commonly explored in the literature is eyewitness identification evidence. In this paper, however, we establish a significant applied need for understanding the impact of context effects because of their implications for trade mark law and other areas of the law that are concerned with enabling consumers to reliably identify desired products and avoid confusion when acquiring products and services.

Consumers are frequently challenged to discriminate between the brand they are looking for and a look-alike product. This can be difficult, especially when the look-alike product has things in common with the sought-for brand – that is, has common context. For the purpose of understanding how consumers recognize a product that is displayed on a shelf or depicted on a website, we consider context to include the physical location, the shoppers’ understanding of the physical location (e.g., an upmarket retail outlet), and any aspects of the product or display such as aspects of the packaging, brand claims, and celebrity endorsements. Some, but not all, of these aspects of context can be controlled by trade mark and related laws aimed at preventing consumer confusion (Footnote 2). The present experiments focus on a quintessential example of a trade mark – a product name – and one aspect of context, namely brand claims, for example, “Yoplait makes dairy fun.”

The role of context in recognition has not always been recognized in trade mark law, where there is an active debate as to how cognitive psychology and experimental methods can better contribute to how we assess or predict consumer responses for legal purposes (Dinwoodie & Gangjee, 2015; Weatherall, 2017). Courts determine trade mark infringement by reference to whether a “hypothetical reasonable consumer” (Footnote 3) would have been confused into thinking a competing product might have been produced by the same manufacturer. As imagined by the courts, this hypothetical consumer tends to be analytical when drawing such inferences, disregarding a great deal of context (especially context common to the trade), and focusing on essential differentiating features of trade marks and rival packaging (Footnote 4). For example, in an Australian trade mark case, the court ruled that consumers would not confuse “Rain Master” and “Rain King” as names of lawn sprinklers because “rain” was a common term used in marketing lawn sprinklers, and “Master” and “King” were sufficiently different to avoid the risk of confusion (Cooper Engineering Co Pty Ltd v Sigmund Pumps Ltd (1952) 86 CLR 536). The risk of consumer confusion might be remote if consumers were deliberately comparing the brands. However, if they are attempting to recognize the name of a brand that they had seen in an advertisement for sprinklers, they may be determining whether they have a memory for the joint occurrence of “rain” and the brand name. To foreshadow the current results, we would expect confusion in this example because of (a) the moderate similarity between master and king, and (b) the common contextual component rain.

We set out to consider how experimental methods could be used to test more general assumptions made in cases like Cooper Engineering. Experimental proof undermining courts’ characterization of the hypothetical consumer does not necessarily mean that results in cases like Rain King/Rain Master are wrong as a matter of policy; however, as discussed below, it might at least make for more informed discussion of what trade mark law is trying to achieve.

In the memory laboratory, we know that even when participants are instructed to recognize a target and ignore the context, the recognition behavior can look like associative recognition (Humphreys & Chalmers, 2016; Ratcliff & McKoon, 1981). It thus seems possible that when consumers are trying to recognize brands in a familiar context they will either explicitly attempt to recognize the brand-context association, or may implicitly give some weight to this association in their recognition decision. This raises the question as to what happens to associative recognition when a brand-context association has been formed and an unstudied similar brand is tested in the old context. There is a straightforward prediction from the Global Matching models (Clark & Gronlund, 1996; Humphreys, Pike, et al., 1989). We can represent the study pair as AB where A is the context (in our case a brand claim), B is the studied brand and B’ is an unstudied brand that is similar to the studied brand. In these models the similarity of A to itself is multiplied by the similarity of B’ to B. If the similarity of B and B’ is low then the probability of falsely recognizing the AB’ pair is no greater than the probability of falsely recognizing a YX pair where Y is an unstudied claim and X is an unstudied brand unrelated to any studied brand. However, as the similarity of B’ to B increases, the probability of falsely recognizing the AB’ pair will increase. Thus it seems possible that a moderate increase in brand similarity will have a disproportionate (multiplicative) effect on false recognitions when the new brand name is encountered in a familiar context.
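The multiplicative prediction described above can be illustrated with a deliberately simplified sketch. It assumes a single studied pair, similarity values between 0 and 1, and an idealized similarity of zero for unrelated items; the function name and all numeric values are hypothetical and are not taken from the Global Matching Models literature or from the data.

```python
# Idealized illustration of a multiplicative context-by-item match.
# All similarity values are hypothetical placeholders, not estimates.

def pair_match(context_similarity: float, brand_similarity: float) -> float:
    """Match of a test pair to one studied claim-brand pair (AB)."""
    return context_similarity * brand_similarity

SAME = 1.0       # a studied element compared with itself (A with A, B with B)
SIMILAR = 0.5    # hypothetical similarity of B' to B
UNRELATED = 0.0  # idealized similarity of an unrelated claim (Y) or brand (X)

matches = {
    "AB": pair_match(SAME, SAME),
    "AB'": pair_match(SAME, SIMILAR),
    "AX": pair_match(SAME, UNRELATED),
    "YB'": pair_match(UNRELATED, SIMILAR),
    "YX": pair_match(UNRELATED, UNRELATED),
}

for pair, strength in matches.items():
    print(f"{pair:4s} {strength:.2f}")
```

Under these assumptions, an old claim alone (AX) and a similar brand alone (YB') yield the same match as the unrelated pair (YX), and only their combination (AB') raises it. The full models sum matches over all stored traces and add noise, which this sketch omits.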

In order to establish that a multiplicative effect could be obtained in associative recognition, we had participants study brand names (B) in the presence of a context consisting of a brand claim (A). Participants were then tested on five different types of pairs. The target pairs were studied brand names tested with the study claim paired with that brand at study (AB) (see Fig. 1 for examples of brands and brand claims). The distractors consisted of non-studied claims with a non-studied brand that was dissimilar to any studied brand (YX), a studied brand claim with an unstudied brand that was dissimilar to any studied brand (AX; Footnote 5), an unstudied brand claim paired with a brand that was similar to a studied brand (YB’), and a studied brand claim with an unstudied brand that was similar to the brand that had been studied with that claim (AB’). We used membership in a product category such as flavored milk as a proxy for brand similarity (see Humphreys et al., 2010). Support for a multiplicative combination would occur if the oldness of the brand claim on its own (AX pairing) and the use of a similar name on its own (YB’ pairing) failed to increase the false alarm rate, whereas the combination of the two significantly increased the false alarm rate (AB’ pairing). The global matching models have inspired research on the role of context in recognition and in associative recognition (Bain & Humphreys, 1988; Clark & Gronlund, 1996; Humphreys, Bain, & Pike, 1989), but we do not believe that a test of this specific prediction has been published.

Fig. 1 Graphic depiction of the experimental design. Note that Eveready and Duracell are members of the same product category (batteries), and that Colgate is a toothpaste brand, whereas S-26 is a baby formula brand, so they belong to different product categories

Method

Participants

Sixty-one participants (33 male and 28 female) in Experiment 1A and 59 participants (29 male and 30 female) in Experiment 1B were recruited using Qualtrics online panels. Participants were required to be over 18 years of age, and reside in the states of New South Wales or Victoria in Australia. Sixty participants per experiment were requested from Qualtrics on the basis of pilot studies with student samples and taking into account the anticipated noise of an online sample, with replacements provided by Qualtrics for any participant who responded all old or all new during the recognition test phase. As a result, 22 participants (eight in 1A and 14 in 1B) were replaced. Demographic details for participants can be found in Table 2 in Appendix A.

Materials and design

Thirty-six product categories were selected. For each product category two familiar brands were chosen, a target brand (B) for the study list and an alternative (B’), and a brand claim of 4–9 words was written for each brand pair. For the test phase, an additional familiar product (X) was chosen from a different product category (not already in use), subject to the constraint that the brand claim would be applicable for that category. See Table 3 in Appendix B for the brands and brand claims used.

A digital sound recording of each spoken brand claim with the three brand completions (B, B’, and X) was made by an Australian female speaker. Attempts were made to retain similar speech inflections for a given brand claim over the three brand completions.

In both experiments the study list contained 24 brand claims with their brand names. The test list contained 36 pairs. Twelve were intact (AB). Twelve were old-new: six with same product category brands (AB’) and six with the brands from different product categories (AX). Finally, 12 were new-new: six with same category product brands (YB’) and six with different category product brands (YX). See Fig. 1 for a graphical depiction of the experimental design.

In order to develop the three counterbalanced stimulus sets, the 36 brand claims were randomly divided into three subsets of 12. In the first list, the first subset of brand claims was assigned to be tested as intact, the second subset was assigned to be tested as old-new, and the third subset was assigned to be tested as new-new. The subsets were cycled through the conditions for the other stimulus sets, resulting in all brand claims occurring in the intact, old-new, and new-new conditions at test. The only difference between Experiment 1A and 1B was a re-assignment of items to the three counterbalancing subsets. In both experiments the order of the claims in the study and test lists was randomized for each participant.
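The counterbalancing scheme described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to build the lists: the function names, the integer claim identifiers, and the fixed random seed are assumptions. The further split of old-new and new-new claims into six same-category and six different-category tests is omitted because the text does not describe how it was assigned.

```python
import random

CONDITIONS = ["intact", "old-new", "new-new"]

def build_stimulus_sets(claim_ids, seed=0):
    """Split claims into three subsets and rotate them through the conditions."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    claims = list(claim_ids)
    rng.shuffle(claims)
    size = len(claims) // 3
    subsets = [claims[i * size:(i + 1) * size] for i in range(3)]

    stimulus_sets = []
    for shift in range(3):
        assignment = {}
        for index, subset in enumerate(subsets):
            # shift 0: subset 0 intact, subset 1 old-new, subset 2 new-new
            condition = CONDITIONS[(index + shift) % 3]
            for claim in subset:
                assignment[claim] = condition
        stimulus_sets.append(assignment)
    return stimulus_sets

def study_list(assignment):
    """Claims presented at study: those later tested as intact or old-new."""
    return [claim for claim, cond in assignment.items() if cond != "new-new"]

sets = build_stimulus_sets(range(36))
print(len(study_list(sets[0])))  # 24 studied claims per stimulus set
```

Rotating the subsets in this way means that every claim is tested once in each condition across the three stimulus sets, while each set still contains 24 studied claims and 12 unstudied ones.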

Procedure

Both experiments were conducted online using the Qualtrics platform. Before beginning the experiment, participants were required to provide consent, and indicate their age and state of residence.

There were four phases to the experiment all of which occurred during a single session. The first phase ensured that participants were familiar with the brand names in each product category. The pair of brand names from each product category was presented vertically on the page (B and B’, X and another brand from the X category that was not used during the experiment test phase) for a familiarity rating on a five-point scale ranging from not at all familiar to extremely familiar. To make sure that participants realized that both brands belonged to a single product category, the familiarity question presented above for each pair included (in capital letters) the name of the category to which the brands belonged.

The study phase was then administered. Each brand claim was presented both auditorily via an auto-playing SoundCloud file and in writing (black font in the center of a white screen) above the SoundCloud file for 6 s. Participants were instructed to read and listen to the brand claims carefully for a later memory test. A self-paced retention interval followed in which participants answered demographic questions which were designed to keep the participants focused on a shopping context. A list of the questions can be found in Table 2 of Appendix A.

The final phase was the recognition memory test of brand claim-brand pairs. Each pair was presented auditorily only via auto-playing SoundCloud files. Participants were instructed to respond old only if an old brand was presented in its old claim. They were to respond new to new claims with new brands, or old claims with a different brand from that presented with the claim at study. After a response participants pressed a button to advance to the next claim.

Results

Preliminary analyses revealed no differences between counterbalanced conditions in either experiment so we collapsed over the counterbalancing conditions. Table 1 presents the mean hit rate (HR) and false alarm rate (FAR) for the recognition tests as a function of experiment (1A or 1B), claim type (old or new), and brand type (B, B’, or X).

Table 1 Hit rates (HRs) for Experiments 1A and 1B to old brand claims containing target brands (B), and false alarm rates (FARs) with 95% confidence intervals (CIs) to old and new brand claims containing previously unstudied same category brands (B’) and different category brands (X)

Two 2 × 2 within-subjects ANOVAs were conducted on Experiment 1A and 1B FARs, with context type (old claim vs. new claim) and brand type (same category vs. different category) included as factors. In both Experiment 1A and Experiment 1B, the main effect of context type was significant, F(1, 60) = 7.00, MSE = .02, p = .010, \( \eta_p^2 = .11 \) and F(1, 58) = 6.33, MSE = .03, p = .015, \( \eta_p^2 = .10 \), respectively, with higher FARs for old claims (.28 [1A], .32 [1B]) than new claims (.23 [1A], .26 [1B]). A main effect of brand type was also observed for Experiment 1A, F(1, 60) = 8.68, MSE = .03, p = .005, \( \eta_p^2 = .13 \), with higher FARs for same category brands (.28) than different category brands (.22). Although the trend for higher FARs for same category brands (.31) than different category brands (.27) was observed in Experiment 1B as well, the effect did not remain significant, F(1, 58) = 3.60, p = .063.

The interaction between context type and brand type was significant in Experiment 1A, F(1, 60) = 4.75, MSE = .03, p = .033, \( \eta_p^2 = .07 \), but not in Experiment 1B, F(1, 58) = 2.35, p = .131. However, an additional 2 × 2 × 2 mixed-model ANOVA was conducted on the combined results, with experiment (1A vs. 1B) included as the between-subjects factor and context type (old claim vs. new claim) and brand type (same category vs. different category) included as within-subjects factors. As expected, the effect of context type was significant, F(1, 118) = 13.29, MSE = .02, p < .001, \( \eta_p^2 = .10 \), as was brand type, F(1, 118) = 11.22, MSE = .03, p = .001, \( \eta_p^2 = .09 \), and no effect of experiment was observed, F < 1. Importantly, the two-way interaction between context type and brand type was significant, F(1, 118) = 6.80, MSE = .03, p = .010, \( \eta_p^2 = .05 \).
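For readers who want to see what the 2 × 2 within-subjects interaction test computes, the sketch below uses the standard equivalence between the interaction F(1, n − 1) and the squared one-sample t statistic on each participant's difference of differences. It is not the analysis software used here, the function names are assumptions, and the four-participant data set is a made-up toy example rather than data from either experiment.

```python
from math import sqrt
from statistics import mean, stdev

def interaction_f(old_same, old_diff, new_same, new_diff):
    """Interaction F and error df for a 2 x 2 within-subjects design.

    Each argument is a list of per-participant FARs for one cell.
    """
    contrast = [
        (o_s - o_d) - (n_s - n_d)
        for o_s, o_d, n_s, n_d in zip(old_same, old_diff, new_same, new_diff)
    ]
    n = len(contrast)
    t = mean(contrast) / (stdev(contrast) / sqrt(n))
    return t ** 2, n - 1

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Toy FARs for four hypothetical participants (NOT data from the experiments).
old_same = [0.50, 0.33, 0.50, 0.17]
old_diff = [0.17, 0.17, 0.33, 0.17]
new_same = [0.17, 0.17, 0.33, 0.17]
new_diff = [0.17, 0.17, 0.33, 0.00]

f_value, df_error = interaction_f(old_same, old_diff, new_same, new_diff)
print(f"F(1, {df_error}) = {f_value:.2f}")
print(round(partial_eta_squared(8.68, 1, 60), 2))
```

The last line applies the partial eta squared formula to the reported brand type effect in Experiment 1A, F(1, 60) = 8.68, and gives approximately .13, in line with the reported value.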

An examination of 95% confidence intervals as described by Loftus and Masson (1994) revealed significantly higher FARs for old claims with a same category brand than new claims with a different category brand, with the difference between means (.11) in Experiment 1A and (.10) in Experiment 1B larger than \( \sqrt{2}\times 95\%\;\mathrm{C}\mathrm{I} \) (.06 in both 1A and 1B). No difference in FARs was observed between a different category brand with an old context and a different category brand without an old context and between a same category brand without an old context and a different category brand without an old context. That is, neither the presence of an old context nor the presence of a same category brand on their own had a significant effect.
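The comparison rule used above can be written out as a short check. This is a sketch under the assumption that the criterion is simply \( \sqrt{2} \) times the within-subject 95% CI half-width; the function names are hypothetical, and the only numbers used are the differences (.11 and .10) and the criterion (.06) reported in the text.

```python
from math import sqrt

def criterion_from_ci(ci_half_width: float) -> float:
    """Smallest difference between two means treated as reliable."""
    return sqrt(2) * ci_half_width

def differs(mean_difference: float, criterion: float) -> bool:
    """True when the difference between two means exceeds the criterion."""
    return abs(mean_difference) > criterion

REPORTED_CRITERION = 0.06  # sqrt(2) x 95% CI, as reported for both experiments

for label, difference in [("1A", 0.11), ("1B", 0.10)]:
    print(label, differs(difference, REPORTED_CRITERION))
# prints "1A True" then "1B True"
```

A criterion of .06 implies a CI half-width of roughly .04 (that is, .06 divided by \( \sqrt{2} \)), which is a derived figure rather than one stated in the text.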

Discussion

We conducted an experiment and a replication testing whether a moderate increase in the similarity of a target that is tested in a familiar context can significantly increase the probability of a false alarm. Brands were studied with brand claims, and names from the same product category were tested with old claims or new claims. Likewise, names from a different product category were tested with old claims or new claims.

The instructions directed the participants to look for the joint occurrence of an old claim and the brand name that had been studied with that claim. With these instructions we assume that the retrieval cue on at least some of the test trials would involve both the claim and the brand name. There was almost no evidence that a similar brand name on its own or a familiar context on its own produced an increase in the FAR. However, the similar brand name in conjunction with the familiar context produced a significant increase in the FAR in both experiments. Although the two-way interaction between brand similarity and the familiarity of the context was not significant in Experiment 1B, it was significant in Experiment 1A and in the combined analysis.

These results confirm the prediction from Global Matching Models that shared context can multiplicatively combine with information about the relationship between a target and probe to substantially increase the FAR when a new recognition probe is only modestly similar to the studied target. However, it will be important to examine the conditions under which the multiplicative effect occurs. For example, we did not observe a multiplicative effect in a pilot study with only visual presentation and a lack of participant familiarity with some brands.

Our experiments employ a low-fidelity simulation of a shopping situation and were designed to test a theoretical prediction from the Global Matching models. These experiments do not directly indicate where consumer confusion will occur and by how much. Nevertheless, our findings, especially if replicated in other experimental designs reflecting other possible permutations of the shopping context, have implications for both trade mark law and the way evidence is presented in trade mark disputes. Specifically, they cast doubt on general assumptions made by trade mark law and they tell us where to look to find evidence of confusion. From a legal perspective, acknowledgment of a multiplier effect arising from a common context challenges how trade mark examiners and courts assess the risk of confusion. Courts commonly assume, particularly in “trade dress” cases where an alleged infringer has adopted multiple elements of a rival’s marketing or packaging, that consumers ignore generic or descriptive content (like common packaging colors, or common descriptive words like “crunchy” for cereals) and focus on distinctive aspects of branding (such as invented brand names) when identifying products they are looking for (Burrell & Handler, 2016, pp. 216-217). Similarly, trade mark examiners must assess whether proposed trade marks are likely to cause confusion with existing marks. Trade mark examiners currently tend to allow the co-existence of marks with words or images which are common in the trade, or descriptive (like rain for sprinklers), provided there are other, sufficiently differentiated aspects to the mark.

If experimental evidence shows that these assumptions in the law are incorrect, there are potential implications for trade mark law around the world. We would not suggest that the results of these experiments alone undermine trade mark law: any single experimental design has limitations. And even if this were not the case, experimental proof that consumers can be confused would not necessarily mean that we should stop allowing traders to use common descriptive terms like rain, because preventing consumer confusion is not trade mark law’s only job. Courts and legislators also take into account other policy concerns, such as a desire to allow competing traders to use common colors and descriptive words. But trade mark lawyers might need to consider, at least, whether trade mark and consumer law are doing their job of preventing consumer confusion, or whether we need to be more explicit about the policy choices that are being made.

Another implication relates to evidence in trade mark cases. Courts commonly accept testimony from consumers who report confusing one trade mark for another. Our findings suggest that false alarms, and presumably mistaken purchases, can occur above the unrelated pair FAR if a brand from the same product category as the to-be-purchased brand occurs in the same context (in our case a brand claim). The implication is that if the test of infringement requires marks to be confusingly similar, consumer testimony should be treated with caution.

It would also have implications for trade mark law if it could be established that there was a substantial number of memory-based confusion errors when several contextual components and the brand name were similar, even when no component on its own was judged to be deceptively similar. A finding of this kind would support traditional approaches to trade mark registration, which allow firms to register their marks in product categories they operate in, and mostly sue for infringement only for uses that are in the same or closely related product categories. But equally such a finding would put a question mark over current trade mark practices, where firms register a series of marks (words, logos, aspects of packaging). When confronted by copycat packaging incorporating a range of similar elements, the trade mark owner must sue for infringement of each individual mark, and the similarity of each pair of related marks (word-word, logo-logo; color scheme-color scheme) is separately assessed. It is possible that this legal approach underestimates the interaction between different aspects of packaging.