Introduction

Synesthesia is known as a union of sensations (e.g., Cohen Kadosh, Cohen Kadosh, & Henik, 2007) or as a union or joining of the senses (e.g., Howes, 2006; Melara & O’Brien, 1987; Wicker, 1968). Over the decades, the term has inspired a long series of definitions and controversies (see Jewanski, Simner, Day, & Ward, 2011; Marks, 2011; Simner, 2012), suggesting that the word itself acts as something of a placeholder with which to characterize the process (or processes) that underlie surprising reports of associations between two apparently disjoint sensations, categories, or sensory dimensions. The term has been used to describe disparate cases, from those where individuals give consistent reports of seeing colors when hearing sounds (e.g., Howells, 1944; Mudge, 1920; Neufeld et al., 2012; Ward, Huckstep, & Tsakanikos, 2006; Zigler, 1930) or feeling shapes when tasting foods (Cytowic & Wood, 1982) to more singular cases of famous writers and composers who appear to have created exceptional musical combinations or metaphors (see Harrison, 2001, for a review; although see also Dann, 1998). It has also been used to explain the more general connections that the majority of people appear to make between “unlike stimuli” (Collier, 1996, p. 4)—say, high-pitched sounds and bright colors, speech sounds and shapes (Köhler, 1929), or the flavor of certain foods and the notion of sharpness (see Gal, Wheeler, & Shiv, 2007; Marks, 1982; Rader & Tellegen, 1987; Spence, Ngo, Percival, & Smith, 2013; Williams, 1976). However, whereas the first series of cases mentioned above are nowadays unanimously labeled as synesthesia, the latter series of cases have not always been. 
Even those who, like Martino and Marks, have argued that all cases should count as synesthesia agree that the most common form is less “strongly” synesthetic than the striking joint experiences first labeled under that heading: “Strong synaesthesia is characterized by a vivid image in one sensory modality in response to stimulation in another one. Weak synaesthesia is characterized by cross-sensory correspondences expressed through language, perceptual similarity and perceptual interactions during information processing” (Martino & Marks, 2001, p. 61). Others have preferred to talk more cautiously about synesthetic associations (Parise & Spence, 2008; K. Wagner & Dobkins, 2011) or synesthetic correspondences (Braaten, 1993; Martino & Marks, 2000; Melara & O’Brien, 1987; Parise & Spence, 2009; Walker et al., 2010). However, the same phenomena have been described using a variety of different headings that make no mention of synesthesia, including terms such as crossmodal correspondences (e.g., Deroy & Valentin, 2011; Gilbert, Martin, & Kemp, 1996; Spence, 2011; Zellner, McGarry, Mattern-McClory, & Abreu, 2008), intermodal correspondences (Walker-Andrews, 1994), correspondences (Mondloch & Maurer, 2004), cross-modal equivalences (Lewkowicz & Turkewitz, 1980), natural crossmodal mappings (Evans & Treisman, 2010), metaphorical mappings (S. Wagner, Winner, Cicchetti, & Gardner, 1981), and crossmodal associations (Crisinel & Spence, 2011; Seigneuric, Durand, Jiang, Baudouin, & Schaal, 2010).

Since what is observed in these cases is only a tendency for a sensory feature, or attribute, in one modality (either physically present or merely imagined) to be matched with a sensory feature in another modality (Spence, 2011), it can, at first, seem odd to assimilate them to synesthesia. These cases are not only typically devoid of any conscious perceptual concurrent, but also frequent (and perhaps universal) in the population, whereas synesthesia has been defined as “a conscious experience of systematically induced sensory attributes that are not experienced by most people under comparable conditions” (Grossenbacher & Lovelace, 2001, p. 36; Ward & Mattingley, 2006). The reintegration of cases without a conscious concurrent into the synesthetic category can be seen as representing something of a regression, given that the scientific respectability of synesthesia was hard won precisely thanks to the distinction between those individuals having genuine atypical conscious concurrents and those individuals simply exhibiting a tendency to associate certain sensory features or concepts. One could, of course, argue for the contrary: that the extension of the term to more common cases brings it a new form of legitimacy (as advocated, for instance, by Karwoski, Odbert, & Osgood, 1942, or Osgood, Suci, & Tannenbaum, 1957). It is fair to underscore that the historical roots of the notion should not constrain what the term covers and that, insofar as it can be extended to other cases where the existence of a conscious concurrent is not crucial, such as nonsensory cases (e.g., grapheme-personification; Amin et al., 2011), or hypnotically induced cases (Cohen Kadosh, Henik, Catena, Walsh, & Fuentes, 2009), the relevance of its extension to not-necessarily-conscious surprising crossmodal matchings is indeed an open question. 
This said, most cases that have generated substantial interest in both the scientific literature and the wider public arena continue to be of the conscious, rare, and frequently crossmodal kind (what we have called canonical synesthesia; see Deroy & Spence, submitted): Cases where people taste shapes, see sounds, or experience blue when thinking of Wednesday (e.g., Cytowic, 1998; Cytowic & Eagleman, 2009; Seaberg, 2011; Ward, 2008). The tension here is palpable: What makes synesthesia intriguing to both researchers and the general public alike is precisely those atypical conscious experiences, and this fascinating or intriguing aspect is the starting point explaining why everyone feels inclined to call the sort of crossmodal associations experienced by us all synesthetic.

However, since these associations are increasingly coming to be studied by researchers (e.g., Chen & Spence, 2011; Crisinel & Spence, 2010, 2011; Gallace, Boschin, & Spence, 2011; Hirata, Ukita, & Kita, 2011; Ludwig, Adachi, & Matsuzawa, 2011; Walker et al., 2010), the question of their relation to synesthesia has to be examined more systematically. Methodologically, it is important not to let a variety of labels subsist in the field, since this is likely to confuse researchers and to prevent studies that document similar phenomena from being referenced together. Scientifically, it is important to ask when and how the presence of a conscious experience makes, or at least suggests, a difference in kind between two processes. Should cases where the presentation or representation of one sensory attribute in a given modality is sufficient to elicit a conscious experience in another distinct modality, such as canonical synesthetic cases of colored–hearing, be seen as continuous with cases where it is not sufficient, such as with the general tendency to associate higher-pitched sounds with brighter colors? The problem of distinguishing between conscious synesthesia and not-necessarily-conscious crossmodal correspondences can have a wider importance, besides putting intriguing cases in the right categories: It might well tell us something about whether consciousness really distinguishes two kinds of mental states or processes, how it does so (e.g., Marcel, 1983), and whether, then, consciousness comes as a continuous or discrete phenomenon (Baars, 1996; Hubbard, 1996a).

These questions stress the need to make sure that we are cutting mental phenomena into the appropriate kinds. A common strategy here is to turn toward neurology or development, in order to see whether observed phenomena share the same neural underpinnings or develop from the same roots. However, neither of these strategies is, as yet, able to deliver a clear verdict about whether to count canonical synesthesia and crossmodal correspondences as different manifestations of the same condition. Although some researchers hope that synesthesia will eventually be individuated in terms of its distinctive neurological bases (e.g., Simner, 2012; see also Hubbard, Arman, Ramachandran, & Boynton, 2005), there is still no agreement on the type of neurological profile (or profiles) that characterizes developmental synesthesia, not to mention the problem of reconciling these profiles with those found in various forms of acquired synesthesia (e.g., following brain damage, Jacobs, Karpik, Bozian, & Gothgen, 1981, and Vike, Jabbari, & Maitland, 1984; following hypnosis, Cohen Kadosh, Henik, Catena, et al., 2009; or possibly even after having taken hallucinogenic drugs, Grossenbacher & Lovelace, 2001). What is more, of all the neural accounts of synesthesia that are currently available, namely disinhibition of feedback (Grossenbacher & Lovelace, 2001), a breakdown of modularity (Baron-Cohen, Harrison, Goldstein, & Wyke, 1993), enhanced neural connectivity (Bargary & Mitchell, 2008; Hänggi, Beeli, Oechslin, & Jäncke, 2008; Rouw & Scholte, 2007), and neural cross-talk (Hubbard, Brang, & Ramachandran, 2011; Hubbard & Ramachandran, 2005; Ramachandran & Hubbard, 2001), none makes a strong prediction as to when and why these neural underpinnings give rise to a conscious sensory concurrent.

Thinking in terms of the developmental trajectories of these phenomena does not deliver a verdict that is any clearer here. Despite long-standing claims that synesthesia and associative crossmodal tendencies might have a common origin in an initial state of modal indistinction or sensory confusion characteristic of neonates (Maurer, 1993; Maurer & Mondloch, 2005b; K. Wagner & Dobkins, 2011), there is little evidence to support this hypothesis. Different degrees of pruning in development have been proposed to explain the existence of both conscious and not-necessarily-conscious kinds of crossmodal associations, which, in turn, supposes that they both originate in the same initial neonatal synesthetic state of confused experiences. In some rare cases, of the sort we call canonical synesthesia, this initial confusion persists as it is, while in more frequent cases, pruning will keep the connection intact but suppress the vivid conscious coexperience. If the “neonatal synaesthesia” (Maurer, 1993) or, as it is better called, “monoaesthetic” (Marks & Odgaard, 2005) account were established, crossmodal associations would be just another manifestation of the kind of state that is kept intact in canonical synesthetes. However, for the moment, nothing privileges this highly speculative hypothesis over the alternative interpretation that neonates are sensitive to crossmodal correspondences and that synesthesia follows another pattern of development.

The latter point once again stresses why the question of the right labeling to apply to widespread surprising crossmodal associative tendencies is not merely terminological. Labeling them as synesthetic not only goes beyond the available evidence and misrepresents a much more complicated neurological picture and a fragile developmental picture, it also strongly biases research in favor of looking for similarities between the two phenomena, to the detriment of hypotheses looking for differences and possible distinctions.

The decision as to whether to group the two phenomena under the same heading should hinge, then, on a more thorough comparison of their characteristic psychophysical features. Here, we want to go against the widespread idea that canonical, conscious synesthesia and general crossmodal associative tendencies are, in most relevant respects, comparable, or comparable enough, to be said to be continuous. In the Crossmodal sensory correspondences and crossmodal sensory synesthesia: A comparison section, we stress that a fair overview of the superficial characteristics of these two phenomena reveals as many similarities as differences, which should, at most, lead one to suspend one’s judgment on their relation and recognize their assimilation as premature. In the Contrasting crossmodal sensory correspondences and sensory synesthesia section, we demonstrate why accepting the assimilation can be done only at a cost, that is, by making concessions on the definition of synesthesia, which are, we believe, unsatisfactory. A more detailed investigation of the differences between the two phenomena leads us, in the Interim summary section, to suggest that they actually represent distinct kinds and that crossmodal correspondences should be studied as a category in their own right until the right kind of evidence for their assimilation to synesthesia is available. In the Conclusions section, we come back to what we consider to be both a methodological recommendation and a substantive proposal.

Crossmodal sensory correspondences and crossmodal sensory synesthesia: a comparison

Similarities versus differences

Let us, for the sake of convenience, use the term synesthesia to refer simply to the canonical synesthetic cases where the presentation or representation of one sensory feature in one modality elicits a conscious experience in another, nonstimulated modality, as in colored–hearing, or an extra-experience in the same modality, as in grapheme–color synesthesia. The term crossmodal correspondences can, then, refer to tendencies to match sensory features or dimensions across sensory modalities, which are observed in many individuals; this does not mean, however, that the presentation of one sensory feature necessarily gives rise to the conscious experience of the second matching feature.

It is undoubtedly the case that crossmodal correspondences and synesthesia present, in certain superficial respects at least, some similarities. So, for example, both often involve the pairing (or matching) of seemingly (or at least at first glance) unrelated features in different sensory modalities (e.g., Marks, 1978; Marks, Hammeal, & Bornstein, 1987). In both cases, it seems as though the physical presentation (or possibly even just the imagination; e.g., Cytowic, 1989, p. 41; Dixon, Smilek, Cudahy, & Merikle, 2000) of features in one sensory modality (say audition) leads to, or elicits, the representation of a feature in another sensory modality (a color, say, or a surface of a certain brightness). Both crossmodal correspondences and synesthesia, as they become better studied, suggest that some precise dimension (or dimensions) of the stimulus presented is (are) responsible for the mapping. This, of course, does not mean that these dimensions are not sometimes hard to identify, at least initially: For instance, what precisely induces the color concurrent in color–grapheme synesthesia (e.g., Beeli, Esslen, & Jäncke, 2007; Brang, Coulson, & Ramachandran, 2011; Simner & Ward, 2008) and what drives the mappings between sounds and shapes in the case of phonetic symbolism are still topics of intense investigation (see Hinton, Nichols, & Ohala, 1994; Nielsen & Rendall, 2011; Robson, 2011).

What is more, and probably the main source of the connection between the two phenomena, is the observed consistency of these mappings. Consistency has become formalized in the case of canonical synesthesia in terms of the test of genuineness (ToG; Baron-Cohen, Wyke, & Binnie, 1987; see also Asher, Aitken, Farooqi, Kurmani, & Baron-Cohen, 2006). It turns out that only a small percentage of the population (i.e., some proportion of genuine synesthetes) give consistent reports of associations between apparently disconnected sensory features such as, say, colors and graphemes. For example, in Baron-Cohen et al.’s (1993) study, those participants identified as synesthetes were consistent 92% of the time when replying, given an unexpected retest a year later. This compared with responses that were 38% consistent for control participants retested with warning after only 1 week (see also Asher et al., 2006; Thornley Head, 2006). Now, consistency is also granted to nonsynesthetes when it comes to crossmodal correspondences (e.g., Gilbert et al., 1996). For instance, individuals consistently match higher-pitched sounds to brighter surfaces and lower-pitched sounds to darker surfaces (although the matchings remain relative and the exact value of each corresponding stimulus might vary; e.g., Gallace & Spence, 2006; Marks, 1991).
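The consistency measure described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each response is recorded as a single label and that a retest response counts as consistent only when it matches the original exactly; published tests of genuineness use their own scoring procedures, and the function name and data below are hypothetical.

```python
def consistency(test, retest):
    """Percentage of shared items given the same response in both sessions.

    Assumption: responses are plain labels and only exact matches count.
    """
    shared = test.keys() & retest.keys()
    if not shared:
        raise ValueError("no items in common between sessions")
    same = sum(1 for item in shared if test[item] == retest[item])
    return 100.0 * same / len(shared)


# Hypothetical grapheme-color reports (illustrative data, not from any study)
session_1 = {"A": "red", "B": "blue", "C": "yellow", "D": "green"}
session_2 = {"A": "red", "B": "blue", "C": "yellow", "D": "brown"}

print(consistency(session_1, session_2))  # 3 of 4 items match: 75.0
```

On this kind of score, the participants identified as synesthetes in Baron-Cohen et al.’s (1993) study would sit near 92 and the controls near 38, although the exact scoring rule used there may differ from the exact-match rule assumed here.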

Looking in more detail, one also finds similarities in content: Some of the same crossmodal mappings have now been documented in both synesthetes and nonsynesthetes (Cohen Kadosh, Henik, & Walsh, 2007; Simner, Gärtner, & Taylor, 2011; Simner et al., 2005). So, for example, over 60% of number–form synesthetes, as well as nonsynesthetes, map smaller numbers to the left and larger numbers to the right (for 1 to 10 in the synesthetes and the usual SNARC effect in nonsynesthetes, see Cohen-Kadosh & Henik, 2007; Sagiv, Simner, Collins, Butterworth, & Ward, 2006). Other surprising similarities hold in terms of the colors that both synesthetes and nonsynesthetes tend to ascribe to graphemes: “A” tends to be red, “B,” blue, and “C” yellow in both color–grapheme synesthetes and controls (see Simner et al., 2005). Darker colors are also matched with lower-pitched sounds and lighter colors with higher-pitched sounds for synesthetes (Marks, 1975; Riggs & Karwoski, 1934; Ward et al., 2006; Zigler, 1930). Once again, the same trend has also been reported in nonsynesthetes (Hubbard, 1996b; Marks, 1974; Simpson, Quinn, & Ausubel, 1956).

There are also similarities in terms of the effects on behavior: Both crossmodal correspondences and synesthesia can give rise to interference effects in speeded discrimination tasks (such as in the Stroop color-naming task) when participants are simultaneously presented with pairs of features that happen to be incongruent with their underlying synesthetic pairings or crossmodal associations (e.g., Dixon et al., 2000; Meier & Rothen, 2009; Mills, Boteler, & Oliver, 1999; Walker & Smith, 1985, 1986; see Marks, 2004, for a review). There is even evidence to suggest that both types of mapping can modulate a participant’s performance on visual search tasks (e.g., Klapetek, Ngo, & Spence, 2012; Laeng, Svartdal, & Oelmann, 2004; Nijboer, Satris, & Van der Stigchel, 2010; Palmeri, Blake, Marois, Flanery, & Whetsell, 2002; Ramachandran & Hubbard, 2001; Rothen & Meier, 2009).
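As a rough illustration of what an interference effect amounts to in such tasks, the sketch below computes the difference in mean response time between incongruent and congruent trials. This is only one simple way of quantifying interference, the function name is hypothetical, and the response times are invented for the example.

```python
from statistics import mean


def interference_ms(congruent_rts, incongruent_rts):
    """Mean response-time cost (ms) of incongruent relative to congruent trials.

    Assumption: interference is summarized as a simple difference of means.
    """
    return mean(incongruent_rts) - mean(congruent_rts)


# Hypothetical response times in milliseconds (illustrative data only)
congruent = [520, 540, 510, 530]
incongruent = [600, 620, 590, 610]

print(interference_ms(congruent, incongruent))
```

A positive value indicates slower responses when the presented pairing conflicts with the individual’s synesthetic pairing or crossmodal association.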

Extending these behavioral similarities, both synesthetes and nonsynesthetes also “prefer” objects, events, or situations that respect their crossmodal mapping. It has been reported that synesthetes react negatively to stimulus combinations that are incongruent with their inducer–concurrent pairing (Cytowic, 1989). There are now also many examples of studies in which putatively normal participants rate combinations of stimulus attributes as being more pleasant when they follow the rule of their intuitive crossmodal correspondences than when they do not (see Sakai, Imada, Saito, Kobayakawa, & Deguchi, 2005; Zellner, Bartoli, & Eckard, 1991). This effect has been termed a crossmodal congruency effect and is currently attracting increasing amounts of research interest (e.g., Holt-Hansen, 1976; Krishna, Elder, & Caldara, 2010; Seo & Hummel, 2011), given other hypothetical connections between synesthesia, emotions, and sometimes even esthetic pleasure or creativity (e.g., Cytowic, 1997). Following this, “nonsynesthetes” have also been shown to prefer those crossmodal associations that have been created by synesthetes, even for crossmodal associations that they themselves do not report (see, e.g., Ward, Moore, Thompson-Lake, Salih, & Beck, 2008).

However, beyond these overt similarities, the two phenomena also present some striking differences, even setting aside the fact that one is necessarily conscious and the other not.

Focusing on content again, but this time on the details of the concurrents, it appears as though synesthetic mappings are largely, if not entirely, idiosyncratic (e.g., Martino & Marks, 2001; Meier & Rothen, 2009; Simner et al., 2006). By contrast, crossmodal correspondences appear to be much more regular, even if initially the source of the mapping might not always be apparent. Note, though, that this claim has recently been challenged (or at the very least qualified): That is, certain regularities have now been observed across groups of synesthetes in terms of the specific mapping between inducer and concurrent and are used to show that synesthesia is not as individually specific as once thought (e.g., Rich, Bradshaw, & Mattingley, 2005; Witthoft & Winawer, 2006, 2010). Synesthesia and crossmodal correspondences are both then considered to be, in some sense, regular mappings, common to several or even many individuals. A further step in the direction of the assimilation of the two phenomena is accomplished by the similarities in content that seem to reveal that both synesthetic pairings and correspondences follow the same rule, for instance, that of having brighter colors or surfaces for higher-pitched notes or sounds.

However, the claims of commonalities, both among synesthetes and across synesthetes and nonsynesthetes, need to be strongly nuanced. Even if certain types of synesthesia seem to follow some common general rule (see Cohen Kadosh, Henik, & Walsh, 2007) or if specific pairings, such as between A and red or between B and blue, appear to be frequent among synesthetes, the overall repertoire of inducers and their color concurrents in the case of color–grapheme or colored–hearing synesthesia remains idiosyncratic, even within the same family or between monozygotic twins (Barnett et al., 2008; Duffy, 2001; Ortmann, 1933; Smilek, Dixon, & Merikle, 2001, 2005). What is more, similarities between nonsynesthetes and synesthetes have not, at least for the moment, been demonstrated across all cases of crossmodal relations, and especially not between nonsynesthetes and individuals with rarer forms of synesthesia (such as sound–taste synesthesia, as documented by Beeli, Esslen, & Jäncke, 2005).

A second aspect of the idiosyncratic claim is that there is no obvious (i.e., immediately explainable) relationship between the inducer and the concurrent; the mapping appears, in some sense, arbitrary (or surprising) to most (see Auvray & Deroy, in press). The perceptual experience of the synesthete is said to be “anomalous” (Asher et al., 2009, p. 279; Rich & Mattingley, 2002; Simner et al., 2006). Although crossmodal correspondences can also sometimes be surprising, they can often seem “natural” or explainable: Calkins (1893) is, to our knowledge, one of the few researchers to have tested for this, showing that 24 out of 45 participants thought that the associations between colors and words or notes that her participants experienced could be explained (although, that said, most of them could not venture a specific explanation). Most crossmodal correspondences certainly call for an explanation in terms of exposure: For instance, whereas it is unlikely that middle C goes with orange for a specific synesthete because he or she has been exposed more often to these two features by association, it is entirely possible to relate the pairing of lower-pitched sounds and larger objects to their frequent co-occurrence in nature (Marks et al., 1987; see Spence, 2011, for a review).

A further difference comes from the respective incidence of the two phenomena. Synesthesia is frequently defined in terms of its rarity in the general population (see Grossenbacher & Lovelace, 2001). Now, while it is certainly true that the estimated incidence of synesthesia has been increasing in recent years (from 1 in 25,000 according to Cytowic [2002, p. 54] to 1 in 20 in Sagiv & Ward, 2006; see Footnote 1), as researchers have discovered and/or started to include more phenomena and varieties within the canon of synesthesia, virtually no one has suggested that synesthesia is universal (although see Stevenson & Tomiczek, 2007, for an exception).

By contrast, the incidence of crossmodal correspondences can go all the way from rare (presumably experienced in the extreme case by no more than a single individual) through to universal (e.g., Bremner et al., 2013; Spence, 2011): It simply depends, at least in the case of certain “statistical” correspondences, on the proportion of individuals that have been exposed to a particular statistical relationship between covarying stimulus dimensions in their environments (see Spence, 2011; see also Footnote 2). Hence, in terms of their incidence in the general population, synesthesia and crossmodal correspondences can differ or be similar, depending on the specific cases that are being considered. That said, given that the frequency of occurrence would appear to be more of a descriptive than a constitutive feature of synesthesia (see Deroy & Spence, submitted, on this distinction), and since incidence is neither a constitutive nor a descriptive feature of crossmodal correspondences, this difference between the two phenomena is perhaps somewhat less relevant, or telling, than other ways in which they differ.

At first glance, then, a review of the characteristic features of crossmodal correspondences shows them to share certain, but by no means all, characteristics of so-called canonical synesthesia.

Emphasizing the similarities: the continuum hypothesis

Labeling crossmodal correspondences as synesthetic (even in the sense of weak synaesthesia; Martino & Marks, 2001), if it is not just a question of fashion or ease, means that the similarities (i.e., surprising consistent crossmodal mappings, behavioral similarities) prevail, while the differences (i.e., conscious concurrent vs. no conscious concurrent, idiosyncrasy vs. regularity, rare vs. frequent in the population) are treated as merely secondary. This, as we have argued elsewhere (see Deroy & Spence, submitted), goes with the progressive tendency toward thinking of synesthesia only through the lens of the test of consistency (also known as ToG, or test of genuineness) and the results of behavioral tests of Stroop interference (e.g., Beeli et al., 2005; Mills et al., 1999; see Footnote 3). The inclusion of crossmodal correspondences within the category of synesthesia therefore accompanies the progressive relaxation of constitutive criteria for this, once upon a time, more specific condition.

We have argued against such a relaxation of the definition of synesthesia, believing that doing so is likely to be detrimental when it comes to accounting for cases of vivid crossmodal imagery (Spence & Deroy, 2013a) and other borderline cases of crossmodally induced experiences (Deroy & Spence, submitted). Here, we would like to suggest that the same is true for crossmodal correspondences, and this even if their inclusion within the broad canon of synesthesia might seem more motivated by some overt similarities between the two phenomena. We consider this inclusion as relying on a fundamental error of attribution, where the first attribution, even if not justified, tends to stick to the following cases, however the evidence subsequently turns out. The investigation of crossmodal matchings seems to exemplify this biased attribution: Cases of crossmodal correspondences have been investigated after, or rediscovered somewhat later than, canonical cases of sensory synesthesia, and after those had won their current scientific respectability and success. There has been a steady flow of publications on the topic of synesthesia from the 1890s to the present day (albeit with something of a waning of interest [or perhaps ability of researchers to publish papers] during the reign of behaviorism; see Harrison, 2001; Marks, 1975). By contrast, the crossmodal correspondences that were initially documented in the same studies as the cases with conscious concurrents (e.g., Flournoy, 1893) or else reported by themselves (e.g., Külpe, 1893; Stumpf, 1883) disappeared, only to start reappearing with any regularity in the journals from the late 1970s onward (e.g., Marks, 1975; Rader & Tellegen, 1987; Rudmin & Cappelli, 1983). When they were “rediscovered,” it was under the heading of synesthesia (e.g., see the title of Rudmin & Cappelli’s, 1983, article: “Tone–Taste Synesthesia: A Replication”). 
As a result, studies of these cases have remained marginal, except, perhaps, as an appendix to the literature on synesthesia: Welch and Warren (1986), in one of the more influential reviews of multisensory interactions to have been published over the last 25 years, do not even mention the topic of crossmodal correspondences. The very recent resurgence of interest in the phenomenon over the last 3 years or so also embraces the “synesthetic label,” either because it is popular (especially to the wider public and press) or because, as was said earlier, the definition of synesthesia is also becoming more and more inclusive, making the integration of various phenomena all the easier.

An error of attribution is especially detrimental when it generates, as here, an assimilationist bias that may mislead researchers who look for similar mechanistic explanations for the two phenomena, no matter whether or not that is necessarily the most appropriate thing to do. Note here that Martino and Marks (2001, p. 61), in the abstract to their paper, state, when talking about strong and weak synesthesia, the following: “we maintain the two forms draw on similar underlying mechanisms.” If this is true, one would expect, at least, specific hypotheses about what these similar underlying mechanisms could be to be ventured. But that is something they simply do not do. Instead of explanations, one finds the following:

  • Hesitations between the idea that there is a common mechanism for synesthesia and correspondences and the idea that there might not yet be a common mechanism for synesthesia. (Take the following quote from Simner et al. [2005, p. 1070]: “Synaesthetes may differ from non-synaesthetes in terms of the consistency of their responses, their automaticity, and their reported phenomenology, but the mechanisms that guide the choice of cross-modal associations appear to be common to both synaesthetes and non-synaesthetes.” However, no sooner have the authors made this claim than they proceed to question it: “To what extent, however, might this be true of all forms of synaesthesia?”)

  • Hypotheses for possible common explanations never being tested and/or being ignored. (For example, the idea that canonical synesthesia is due to emotion, advanced by Cytowic [1997, 2002], and the similar explanation provided as the basis for crossmodal correspondences [i.e., that they are mediated by common emotional responses to specific pairs of stimuli; Collier, 1996; Schifferstein & Tanudjaja, 2004] are never explicitly linked. Meanwhile, “the semantic coding hypothesis” [Martino & Marks, 1999, 2000, 2001], advanced as a putative explanation for the existence of at least certain crossmodal correspondences, is not applied to synesthesia, be it sensory or more conceptual cases such as linguistic personification.)

This lack of explanation is also exhibited by a lack of concrete predictions: If it were true that synesthesia and crossmodal sensory correspondences constitute the two ends of a continuum, what would that continuum look like? Would one expect synesthetes to show enhanced crossmodal correspondences and, if so, for all stimuli, or just for those dimensions on which they experience a synesthetic relation? Here, we would like to suggest that it is, in some sense, incumbent on those who wish to propose the continuum account linking crossmodal sensory correspondences to canonical cases of crossmodal synesthesia to be clear about what sort of result would contradict their view (see Platt, 1964).

What we would like to insist upon here, then, is the fact that crossmodal correspondences can and, to our way of thinking, certainly should, be studied as a distinct set of empirical phenomena. (Whether they are of one and the same kind or require further fine-grain distinctions is another question, and one that their separation from synesthesia can perhaps at least make possible, or easier, to investigate and raise.) Taking the opposite side to the assimilationist bias that minimizes differences, we would like to stress that differences matter. We examine a number of them below.

Contrasting crossmodal sensory correspondences and sensory synesthesia

Why consciousness (sometimes) makes a difference

The assimilation of crossmodal correspondences to synesthesia certainly obliges one to weaken, or to give up on, certain of the classical criteria for synesthesia and, in the first instance, the necessity of it leading to an atypical conscious experience. According to the majority of researchers, synesthesia involves the elicitation of a conscious sensory concurrent (for reviews, see Auvray & Deroy, in press; Deroy & Spence, submitted). At least, this is the way in which the term has been restricted historically, after a more general usage of the term in the early days of its study (e.g., Flournoy, 1893).

Now it is certainly true that a number of researchers have recently started to suggest that the concurrent need not be conscious in the case of synesthesia and, as such, that it can be studied only indirectly (e.g., Cohen Kadosh, Cohen Kadosh, & Henik, 2007; Cohen Kadosh & Henik, 2006; Gebuis, Nijboer, & Van der Smagt, 2009; Knoch, Gianotti, Mohr, & Brugger, 2005). The progressive relaxation of the consciousness criterion can also be seen in the resurgence of studies based solely on questionnaires (so, e.g., the participants in Day’s, 2005, study “reporting” on their unusual associations may not necessarily all have had a conscious or vivid experience). Furthermore, “novel” cases of synesthesia are increasingly being documented, where the conscious concurrent is not central (do those individuals who attribute personality traits to numbers, for instance [e.g., Simner & Holenstein, 2007], experience an emotional reaction toward the number, or are they just capable of making a repeatable judgment?).

This progressive relaxation from the position that synesthesia necessitates a conscious concurrent to the idea that it can, in certain circumstances, be conscious has made a growing number of researchers ready to include crossmodal correspondences in one and the same category and, thereafter, to lend their support, either explicitly or implicitly, to the continuum thesis (e.g., Marks, 1987; Martino & Marks, 2001; Rader & Tellegen, 1987).

The move is all the easier to make, given that crossmodal correspondences might also, at times, give rise to a conscious experience (what, in the case of synesthesia, would be called the concurrent) when a specific inducer is presented. This is especially likely to be true in those individuals who possess more vivid imagery ability (see Spence & Deroy, 2013a) and, perhaps, when the mapping goes from a less to a more accurate or habitual sensory dimension (for instance, the imagining of visual shapes in response to complex flavors or of colors for smells; e.g., Deroy & Spence, submitted; Deroy & Valentin, 2011). This, combined with the general claim that synesthetes are more likely to experience mental images vividly (e.g., Barnett & Newell, 2008), leads most researchers to infer that there must be a continuum of individuals, from those who never have a conscious concurrent to those who always enjoy a vivid conscious concurrent, with various grades of frequency in between (see Fig. 1).

Fig. 1 The continuum model of synesthetic tendencies. This is the view that appears to be held, explicitly or implicitly, by the majority of researchers in the field

Certain researchers make a further argument here, moving from the fact that even those individuals who are not synesthetic may, on occasion, have strange crossmodal experiences under the influence of psychotropic drugs, experiences that are then reported to resemble those of “congenital” synesthetes (e.g., Cytowic, 1989; MacDougal, 1898; Simpson & McKellar, 1955), to the idea that “the potential to experience synaesthetically may lay latent within everyone” (Marks et al., 1987, p. 4; see also Grossenbacher & Lovelace, 2001). Note here that, given the problematic character of the evidence reported (i.e., people might not find it possible to identify the nature of their drug-induced experience; see Hubbard & Ramachandran, 2003), calling such drug-induced experiences synesthetic provides rather a nice illustration of what we call an assimilationist bias.

Now, pushing such difficult cases to one side, why argue that the assimilationist view is widely incorrect? Why not simply accept the idea that more or less conscious cases of crossmodal mappings therefore fall on a continuum from the more to the less synesthetic (see Fig. 1), granting that consciousness can be thought of as a continuum anyway (e.g., Baars, 1996; Hubbard, 1996a)?

First, the continuum model comes with an empirical bet; that is, it predicts that the phenomena will also give rise to results that are distributed in a continuous way across the population. This awaits empirical demonstration. Consistency of conscious experience across time, for instance, seems to vary, but more systematic large scale studies are needed in order to determine whether the distribution is indeed continuous or whether, instead, there are clusters of conditions with characteristic kinds of consistency (see Simner et al., 2011).

The continuum model might not just be premature; it also faces clear difficulties. The first thing to note here is the finer differences in behavior introduced by the presence of a conscious synesthetic concurrent. Notably, there are differences in the kinds of interference introduced by the joint presentation of two given sensory dimensions, depending on whether they happen to be part of a synesthetic experience or of a crossmodal correspondence. For a given case (for instance, colors and letters), Stroop interference between specific colors and graphemes (say, red and the letter C) occurs only for synesthetes who consciously experience graphemes as colored (or for those who have explicitly been trained with specific pairings; see Meier & Rothen, 2009), and not for nonsynesthetes who also naturally match letters and colors (see Simner et al., 2005). What’s more, synesthetes have been shown to be sensitive to interference when it comes to briefly presented inducers. In one study, Mattingley, Rich, Yelland, and Bradshaw (2001) used masked achromatic alphanumeric characters, which they presented for 500 ms, and had participants name the color of a target patch presented just after the masked prime. The color of the patch was either congruent or incongruent with respect to the synesthetic color induced by the numeral. Only synesthetes were significantly affected by prime–target congruency, showing slower responses in the incongruent than in the congruent condition. Again, if we expect nonsynesthetes to have number–color correspondences, or to be on a continuum with color–grapheme and color–number synesthetes because of some general association between shapes and brightness, then some of them should exhibit interference on at least some of these trials.
This said, we reckon that substantial additional research is needed to investigate the differences in the pattern or the underlying basis of interferences independently studied in synesthetes and nonsynesthetes (e.g., Elias, Saucier, Hardie, & Sarty, 2003).
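
The congruency effect discussed above is conventionally expressed as the difference in mean response time between incongruent and congruent trials. The following is a minimal sketch under stated assumptions: the function name, the trial layout, and the response times are all hypothetical and serve illustration only; they are not data from any of the studies cited.

```python
from statistics import mean

def congruency_effect(trials):
    """Mean RT (ms) on incongruent trials minus mean RT on congruent trials.

    `trials` is a list of (condition, rt_ms) pairs; this layout is an
    assumption made for illustration.
    """
    congruent = [rt for cond, rt in trials if cond == "congruent"]
    incongruent = [rt for cond, rt in trials if cond == "incongruent"]
    if not congruent or not incongruent:
        raise ValueError("need at least one trial in each condition")
    return mean(incongruent) - mean(congruent)

# Made-up response times, for illustration only.
example_trials = [
    ("congruent", 600), ("congruent", 620),
    ("incongruent", 700), ("incongruent", 720),
]
effect = congruency_effect(example_trials)
```

On the continuum account, one would expect at least some nonsynesthetes to show a reliably positive value on such a measure; whether a given value is reliable is a separate statistical question that this sketch does not address.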

The conscious concurrent introduces similar differences when it comes to the emotional consequences of presenting crossmodally corresponding (or congruent) versus noncorresponding (or incongruent) pairs of stimuli. Incongruent combinations are often reported as being very unpleasant for synesthetes, since the conscious concurrent conflicts then with the inducing feature (Cytowic, 1997). By contrast, noncorresponding stimulus pairings give rise, at most, to confusion, but without any strong negative reaction in the case of crossmodal correspondences (see Piqueras-Fiszman & Spence, 2011)—at least, as far as we are aware. More systematic comparative studies have not been conducted, for instance, to determine whether supposed effects of synesthesia on memory also hold (at least to a lesser degree) with crossmodal correspondences, given their difference in conscious manifestation. Here, the prediction would be that they do not, since the concurrent has to be conscious in order to be encoded with the inducing stimulus and generate a memory advantage.

A second difficulty comes from the fact that the continuity claims regarding consciousness are ambivalent. Individual variation in the vividness of experience is a documented feature of mental imagery but is rarely ever talked about for the case of perception. [Footnote 4] Now, by thinking of concurrents as being more vivid in some individuals than in others, the continuum thesis starts to make synesthetic concurrents resemble a form of mental imagery, varying in degrees among individuals, as well as modalities. This, however, seems to be a form of assimilation that few researchers are nowadays ready to accept: Most current definitions of synesthesia stress, on the contrary, that the concurrents are “perceptually real” and, contrary to the case with mental images, subjectively indistinguishable from real percepts (see Rader & Tellegen, 1987, for an exception, and Spence & Deroy, 2013a, for a discussion; note that the assimilation between synesthesia and mental imagery was once common; see Galton, 1880; Vernon, 1937). So, if there are crucial behavioral differences hinging on whether or not there is a conscious sensory concurrent, and if there are good reasons to think that, even when there are two conscious experiences, the two kinds of experience might remain in different categories (imagery here, perceptual-like there), then we really ought to distinguish two kinds of phenomena and two different headings, and not put them on a single scale or continuum.

Pushing things further, there are also methodological and theoretical reasons to adopt a separatist line. As was said earlier with regard to content, it is also misleading to say that the same mapping relation is at stake: As one moves toward the “rare” end of the synesthetic continuum, the content (or concurrent, once it becomes conscious) onto which the stimulus is mapped becomes more specific or richer, going, for instance, from a mere approximate brightness level associated with a specific sound to specific colors, with hue, brightness, and very often texture (e.g., Eagleman & Goodale, 2009).

Last but not least, methodologically, unless one clings to the tenets of behaviorism (see Graham, 2010, for a review), a similarity in effects (such as exists here, in some respects, between certain crossmodal correspondences and certain canonical forms of synesthesia) in the presence of a difference in conscious experience has been sufficient to motivate a distinction, and the search for a deeper explanation (cf. the case of mental imagery; Reisberg, Pearson, & Kosslyn, 2003). It seems to us totally unsatisfactory to claim that one and the same phenomenon can here be conscious (as in the case of “strong” synesthetes) and there not (as in the case of “weak” synesthetes) without proffering additional explanation of the difference. Furthermore, since the terms themselves suggest that the “weak” synesthetes might be lacking something that the “strong” synesthetes have, one is entitled to ask for a good explanation of that superiority to be given, which has not yet been the case.

What about the tentative explanation that has been advanced by some claiming that the difference comes from the fact that synesthetes are possibly more “creative” (see Dailey, Martindale, & Borkum, 1997; Domino, 1989; Harrison, 2001; Mulvenna, 2007; Ramachandran & Hubbard, 2001; Ward, Thompson-Lake, Ely, & Kaminski, 2008)? One could easily argue here that this does not constitute an explanation for the existence of the conscious concurrent and simply seems to describe the consequences of synesthesia: Creative individuals are those who experience, and think, “differently” and are capable of making connections between different ideas/concepts. The other tentative explanation in terms of a difference in imagery ability, although certainly worth pursuing, has not been tested across all varieties of crossmodal synesthesia. What is more, it remains based on self-report measures (Barnett & Newell, 2008) and does not account for the increase in specificity and richness of the sensory feature or concurrent with which the inducer is associated. Neither does it fit with more recent neurological accounts illustrating that those brain areas that are more active during mental imagery are no more active during synesthetic episodes (see Nunn et al., 2002; Rich et al., 2006). Differences in mental imagery ability might explain the difference in vividness obtained specifically for crossmodal correspondences. Rader and Tellegen (1987), for instance, argue that people differ in that some, but not all, have conscious experiences (and of various degrees) of colors when asked to match colors with sounds. But they also rightly point out that a difference in mental imagery ability does not explain another crucial difference, namely, that some people have strong and specific associations while others have weaker and more general ones.
We take their demonstration to represent an interesting intermediate version of the continuum hypothesis, whereby synesthetes and nonsynesthetes differ only in degree, but with regard to two independent dimensions—one being the vividness of the concurrent, the other being the strength and specificity of the relation between the inducer and the concurrent (see Fig. 2).

Fig. 2 Second model of the relations between crossmodal correspondences and canonical synesthesia based on Rader and Tellegen (1987). Cases of crossmodal mappings between sensory features fall into four types (weak, or nonspecific, mapping and low imagery = typical crossmodal correspondences; strong mappings and high imagery = canonical synesthesia; plus weak mapping with high imagery and strong mapping with low imagery). Note, therefore, that according to this schema, canonical synesthesia and typical correspondence do not overlap
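
The four types distinguished in this second model can be made explicit with a small sketch. It assumes, purely for illustration, that both dimensions are scored on a hypothetical 0-to-1 scale with an arbitrary cut-off of 0.5; neither the scale nor the cut-off comes from Rader and Tellegen (1987), and the function name is our own.

```python
def classify_mapping(mapping_strength, imagery_vividness, threshold=0.5):
    """Place a case in one of the four cells of the two-dimensional model.

    Both scores are assumed to lie in [0, 1]; the scale and the 0.5 cut-off
    are hypothetical and only serve to make the four cells explicit.
    """
    strong = mapping_strength >= threshold
    vivid = imagery_vividness >= threshold
    if strong and vivid:
        return "canonical synesthesia"
    if not strong and not vivid:
        return "typical crossmodal correspondence"
    if strong:
        return "strong mapping, low imagery"
    return "weak mapping, high imagery"
```

For example, `classify_mapping(0.9, 0.2)` falls in the cell of strong mapping with low imagery, one of the two intermediate types that a single-dimension continuum cannot represent.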

What is still left unexplained by the intermediate or initial versions of the continuum hypothesis is, then, why there would be more people on the nonconscious (“weak”) than on the conscious (“strong”) end of the continuum. If being a strong or canonical synesthete were indeed advantageous, one might wonder why more people do not have synesthesia, given natural selection and the heritability of the strong, or canonical, form of the condition (Barnett et al., 2008). For instance, if there is indeed such a thing as “mirror-touch” synesthesia (Banissy, Cohen Kadosh, Maus, Walsh, & Ward, 2009; Fitzgibbon et al., 2011), and if it constitutes a genuine road to empathy (Banissy & Ward, 2007), or if synesthesia does indeed lead to memory advantages (e.g., Gross, Neargarder, Caldwell-Harris, & Cronin-Golomb, 2011; Luria, 1968; Yaro & Ward, 2007), why don't more of us have it? Or again, if crossmodal correspondences exist across all sensory modalities (see Spence, 2011, for a review), why should cases with color concurrents be so dominant when it comes to the rare end of conscious synesthesia?

Why being frequent matters

This leads us to reexamine the prevalence criterion. As we said earlier, accepting crossmodal correspondences as constituting (or residing at) the weak end of a synesthetic continuum forces one to abandon the idea that synesthesia is, overall, constitutively rare. Again, this progressive abandonment is illustrated by the recent argument that certain forms of synesthesia (i.e., the induction of tastes or flavor attributes by smell) may be universal (Stevenson & Tomiczek, 2007; although see Auvray & Spence, 2008).

Now, just as for variations in consciousness, even if cases seem to transition smoothly from one extreme to the other, we need to look in more detail at the way in which those cases are distributed. One thing worth mentioning here is that the frequent-to-rare continuum does not seem to align perfectly with the never-to-always conscious continuum. Although most cases are shared across individuals, certain crossmodal correspondences might be less frequent (and why not, in principle, as rare as, or rarer than, some canonical cases of synesthesia, i.e., 0.5%–1% of the population; Bargary & Mitchell, 2008; Grossenbacher & Lovelace, 2001). Such cases can, however, very often be explained in terms of exposure to the same environmental stimulation, or regularities (e.g., as in the case of crossmodal correspondences between colors and flavors; see Shankar, Levitan, & Spence, 2010; Spence, Levitan, Shankar, & Zampini, 2010). This is not the case for synesthetic mappings, which are distinctively nonreducible to what one has been exposed to previously (Hubbard & Ramachandran, 2003; Rich & Mattingley, 2002; although see Witthoft & Winawer, 2006, for an exception). As has been argued elsewhere (Auvray & Deroy, in press), this difference makes it likely that crossmodal correspondences have a representational value (that is, that they carry some information about independent regularities in the environment), while synesthesia does not inform the synesthete about an objective regularity. This, in turn, is consistent with the fact that crossmodal correspondences play a role in shaping multisensory perception and possibly helping solve the “correspondence problem” (Klapetek et al., 2012; Spence, 2011); for instance, the crossmodal correspondence between higher-pitched sounds and smaller objects has been shown to influence the multisensory integration of audiovisual events (for reviews, see Bien, ten Oever, Goebel, & Sack, 2012; Parise & Spence, 2009).

This is also consistent with the fact that certain crossmodal correspondences are universal, since they can be explained in terms of the natural statistics of the environment, which are encoded in terms of learned probabilities and help an organism to predict (Bar, 2007) what will occur (note that other crossmodal correspondences likely have a different explanation; see Spence, 2011, for a review). [Footnote 5] Finally, the same hypothesis is congruent with the growing body of empirical evidence that animals are also sensitive to a variety of different crossmodal correspondences (see, e.g., Ludwig et al., 2011; Schiller, 1935; see also Faragó et al., 2010; Premack & Premack, 2003). [Footnote 6]

Interim summary

Let's summarize where we have got to so far. We have stressed that crossmodal sensory correspondences and canonical sensory synesthesia present certain similarities, as well as significant differences. This makes their assimilation tempting, but by no means costless. Indeed, considering cases of crossmodal sensory correspondences as continuous with cases of sensory synesthesia obliges one to relax two features once thought to be constitutive of synesthesia: first, the necessity of a conscious concurrent and, second, the rarity of the condition. We have demonstrated, first, that there is no good reason to give up on these criteria and, second, that it is also wrong to infer from an apparent linear spread of differences (from more to less conscious and from more to less frequent) that the phenomena should therefore receive the same explanation, are necessarily underpinned by the same process (or similar ones), and must therefore lie on the same continuum. The difference in vividness and frequency for crossmodal correspondences can receive different explanations so that even the rare and vivid cases of crossmodal correspondences have nothing to do with rare and vivid cases of canonical synesthesia. This said, we now turn to further evidence that the two phenomena are not based on the same processes and must, therefore, receive a different explanation (and hence, presumably, also a different label). We review this evidence from the less to the more robust in terms of demonstrating the difference. But when taken together, we would argue that this series of to-be-explored differences makes a strong case for the distinction between the two phenomena. These differences also point at crucial aspects where further investigation is needed in order to provide an integrated framework for the study of crossmodal correspondences.

Unveiling critical differences

Why automaticity matters

In synesthesia, the elicitation of the conscious concurrent occurs automatically or, at least, involuntarily. If automaticity means purely stimulus driven, goal independent, free from dual-task interference, and independent of attentional load (see Moors & De Houwer, 2006), it is very unlikely that most cases of synesthesia, being also dependent on recognition and selective attention, will count as genuinely automatic (e.g., Mattingley, Payne, & Rich, 2006; Rich & Mattingley, 2003, 2010; Sagiv, Heer, & Robertson, 2006). Most of the recently published evidence still indicates that, after the inducer has been attended to, the subsequent process that elicits the synesthetic concurrent is relatively unintentional and unaffected by voluntary control. What is more, both the specific content of the concurrent and its vividness are fixed. That is, they are precisely determined by the inducer, although it should be noted that, on occasion, some synesthetes have reported that they have a certain degree of conscious control over their concurrents (Rich et al., 2005; Rich & Mattingley, 2003; Sagiv & Robertson, 2005).

It is, however, less clear how the automaticity claim should be interpreted in the case of crossmodal correspondences (Marks, 2004; Spence & Deroy, 2013; see also Treisman, 2005; cf. Santangelo & Spence, 2008). Crossmodal correspondences are certainly automatic in the sense that when tested in behavioral paradigms, they appear to give rise to behavioral effects (facilitation on congruent trials and/or interference on incongruent trials) that cannot be overridden at will by the average participant even when detrimental to their behavioral performance. However, there is another sense in which perhaps we cannot ascertain whether crossmodal correspondences are genuinely automatic. The reason for this is that since there is no necessary conscious concurrent in the case of crossmodal correspondences, the only way, at present, in which cognitive psychologists/psychophysicists can demonstrate their presence is by means of behavioral testing. It is, then, at least possible that the behavioral probing elicits the correspondence in the first place (that is, by setting up the relevant contrast/dimension). [Footnote 7] Here, we want to argue that the characteristics of crossmodal correspondences are at least as likely to resemble the automaticity of other nonreflexive processes and to be of many kinds (for instance, the judgments they lead to resemble intuitions; e.g., Sperber, 1997), while their effects on crossmodal binding make them akin to coupling priors (Ernst, 2007), instead of their exhibiting the same automaticity as canonical synesthesia.

Differences in mappings: unidirectionality and relativity

Until recently, one of the defining features of the synesthetic relation between the inducer and concurrent has been its apparent unidirectionality (e.g., Mills et al., 1999). While the presentation of a given inducer would reliably give rise to a certain specific concurrent, the presentation of the concurrent does not, typically, give rise to the inducer. What is more, the inducer normally tends to be somewhat more complex (often conceptual) than the concurrent (Grossenbacher & Lovelace, 2001, p. 38; Martino & Marks, 2001, p. 62; Simner, 2007).

Over the last few years, some researchers have, however, started to suggest that certain synesthetes may exhibit bidirectionality in the relation between inducer and concurrent (see Cohen Kadosh, Cohen Kadosh, & Henik, 2007; Cohen Kadosh & Henik, 2006; Cohen Kadosh, Tzelgov, & Henik, 2011; Gebuis et al., 2009; Johnson, Jepma, & de Jong, 2007; Knoch et al., 2005; Richer, Beaufils, & Poirier, 2011). That said, closer inspection of these cases reveals that a conscious concurrent occurs (i.e., is elicited, or at least automatically elicited) in only one direction. Several studies have now provided evidence for implicit bidirectionality (Cohen Kadosh et al., 2005; Johnson et al., 2007; Knoch et al., 2005) where a synesthete only had a conscious concurrent in one direction. Meanwhile, in a very small number of other cases, explicit bidirectionality has been reported (Cohen Kadosh, Cohen Kadosh, & Henik, 2007; Cohen Kadosh & Henik, 2006; Cohen Kadosh et al., 2011). The latter finding, if robust, could well be important in terms of theorizing in this area. However, what is also worth noting here is that the case remains isolated and the evidence needs to be strengthened; most papers are single-case studies and based on subjective reports, and several of the studies apparently refer to the same individual, I.S., in whom colors evoke digits and digits colors. [Footnote 8] Finally, bidirectionality has been shown for the intramodal but not crossmodal variety of synesthesia, which makes the comparison with crossmodal correspondences less straightforward. The only exception for the moment comes from P.S., reported by Richer et al. (2011), who occasionally experiences conscious bidirectional lexical–gustatory synesthesia, but further study is needed here as well to confirm this report.

Now, it is already debatable whether an individual should be characterized as having bidirectional synesthesia if the concurrent is not conscious in both directions. Clearly, the answer to this question depends on whether or not one thinks that the concurrent must be conscious in order to count as synesthesia. This highlights, to our way of thinking, the kind of confirmation bias generated by the continuum model of synesthesia: Once one sees synesthesia as ranging from not at all to always conscious, then one is likely to incorporate many more phenomena into synesthesia (such as individuals exhibiting bidirectional mappings) and miss things that might stress a difference between correspondences and the necessarily conscious synesthesia.

So, one might ask, what is it, if not synesthesia, that describes what synesthetes (or at least these two synesthetes) experience in the reverse direction (i.e., from the concurrent to the inducer)? Here, we would like to argue that perhaps it is an unusual form of correspondence, but one that is driven by the statistics of the synesthetes’ own idiosyncratic perceptual experience, and not by the statistics of the environment, as is more commonly the case in nonsynesthetes (Spence, 2011). After all, one might ask, why shouldn’t synesthetes develop correspondences for their synesthetic relations, given the high statistical regularity of the pairing between the inducer and the concurrent in their own perceptual experience? The prediction, if such a suggestion were to be correct, is obviously that many more synesthetes should have bidirectionality when it is tested for appropriately. [Footnote 9]

Leaving the few isolated cases of putatively bidirectional synesthesia to one side, it is, by contrast, the case that the majority of researchers appear to believe (at least implicitly) that crossmodal correspondences are bidirectional (e.g., Martino & Marks, 2001). That is, they believe that the presentation of a larger object will prime a lower-pitched sound just as robustly as the presentation of a lower-pitched sound will prime a larger object. Now, of course, sounds have been shown to correspond to a wide variety of attributes in other sensory modalities (e.g., vision; see Table 1). This multiplicity of crossmodal correspondences between two modalities raises interesting questions regarding the connection between synesthesia and correspondences. What happens when several inducers are simultaneously presented in the case of crossmodal synesthesia (for instance, what sort of concurrent experience is triggered when both a sound and a tactile stimulus are presented to a synesthete with multiple synesthesia) has, to our knowledge, not been systematically studied. A few reports (e.g., Ortmann, 1933), together with an analogy with the reports given by color–grapheme synesthetes when presented with a complex name, suggest that such cases can lead either to the systematic dominance of one inducer (e.g., the first letter of the word will determine the color of the word) or to a complex combination of concurrents (e.g., Duffy, 2001). By contrast, it is unclear here whether all possible crossmodal correspondences are ever primed simultaneously when a sound is presented; the minimal evidence pertinent to this question suggests that only the subset of crossmodal correspondences that are relevant, or primed, by the setting of the experiment are effective (see the earlier discussion on this point in Footnote 8).
The case of simultaneous inducers is likely to introduce further interesting differences between synesthesia and crossmodal correspondences, with more contextually driven hierarchies or dominance for certain crossmodal correspondences (or of certain directions) over others.

Table 1 Summary of visual crossmodal correspondences elicited by the presentation of sounds of different pitch

Asymmetries in perceptual salience are also worth considering here: If one happens to have more vivid imagery abilities in vision, say, than in the other modalities, one might be more likely to experience a visual image primed by an auditory inducer in the case of a crossmodal correspondence than vice versa. If correspondences are indeed perfectly bidirectional but mental imagery is stronger in certain modalities than in others, this might explain the dissociation we already stressed between strength and vividness of correspondence (see also Rader & Tellegen, 1987).

There seems to be a belief that the bidirectionality observed in the case of crossmodal correspondences also means that (at least some) crossmodal correspondences might be transitive. If we know how brightness is related to pitch (i.e., that larger objects make sounds that have a lower pitch), say, and how pitch is related to density, we should be able to predict, at least roughly, how brightness is related to density. This area of research has, to our knowledge, not been investigated in any detail yet, except for this precise example given by Boring and Stevens (1936) and other work on triangulation (or transitivity) by von Hornbostel (1931; see also Schiller, 1935). However, by combining several recent studies, it would seem as if the transitivity hypothesis can hold more widely, with sharp shapes being correlated with the sound of the word kiki and kiki with dark chocolate (Gallace et al., 2011), while sharp also corresponds to acidity and trigeminally stimulating substances (Deroy & Valentin, 2011; Hanson-Vaux et al., 2013; Spence, 2012; Spence & Gallace, 2011). This said, given both the multidimensionality of the percepts at stake and the effects of undoubted differences in salience, transitivity should not be expected in every case. For instance, given the crossmodal correspondence between loudness and size (where louder corresponds to larger) and the crossmodal correspondence between pitch and size (where lower corresponds to larger), simple transitivity would predict that louder corresponds to lower pitch, and yet this is not what is actually observed (Evans & Treisman, 2010; Spence, 2011). The relations between crossmodal correspondences certainly then require more complex explanations, but transitivity might at least be a component of these explanations, which does not appear to be the case for synesthesia (see Fig. 3).

Fig. 3 Transitivity of crossmodal correspondences (see Boring & Stevens, 1936; Deroy & Valentin, 2011; Gallace, Boschin, & Spence, 2011; Spence & Gallace, 2011). Transitivity is one of the ways in which crossmodal correspondences differ from the vast majority of cases of synesthesia (which tend to be unidirectional)
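
The prediction of simple transitivity for the loudness, pitch, and size example can be spelled out as a composition of signed relations. This is only a sketch: coding each correspondence as +1 (the "more" ends of two dimensions go together) or -1 (they are opposed) is our own simplifying assumption, as are the names used.

```python
# +1: the "more" ends of the two dimensions correspond; -1: they are opposed.
LOUDNESS_SIZE = +1   # louder corresponds to larger
PITCH_SIZE = -1      # higher pitch corresponds to smaller (lower to larger)

def predicted_by_transitivity(a_to_c, b_to_c):
    """Sign of the A-B correspondence implied by A-C and B-C under simple transitivity."""
    return a_to_c * b_to_c

# -1 means that louder would be predicted to correspond to lower pitch.
loudness_pitch = predicted_by_transitivity(LOUDNESS_SIZE, PITCH_SIZE)
```

The composed sign is the prediction only; as noted in the text, it is not borne out empirically for loudness and pitch, which is why transitivity can at best be one component of the explanation.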

Finally, the concurrent in the case of synesthesia also seems to depend solely on the nature of the inducer, and not on the other objects alongside which that inducer happens to be presented (see Martino & Marks, 2001). There is some limited evidence for contextual effects in synesthesia (Dixon, Smilek, Duffy, Zanna, & Merikle, 2006; Rich & Mattingley, 2003; Sagiv, Heer, & Robertson, 2006), but really only in the case of local/global grouping or when the immediate context in which the inducer is presented changes the way in which it is interpreted (i.e., its meaning; “2” or “z” study). Thus, although one might consider that contextual effects affect the interpretation of the meaning of the inducer in synesthesia, many researchers hold that synesthesia remains absolute, on the grounds that the inducer is not the physical stimulus, but the interpreted object (e.g., Simner, 2007). In that case, the same (interpreted) inducer always leads to the same concurrent.

By contrast, the evidence concerning crossmodal correspondences suggests that they are very often relative. It is, for example, the relative, not the absolute, size that matters in the case of size–pitch correspondences: The larger of two visual targets is matched to the lower pitched sound, where being lower is determined by either the actual comparison between two sounds or an implicit referent. In fact, at least certain crossmodal correspondences have been shown to disappear altogether if all congruent and incongruent trials are presented in separate blocks of experimental trials, rather than intermingled randomly on a trial-by-trial basis (see, e.g., Gallace & Spence, 2006; Melara & O’Brien, 1987). What’s more, this means that the same inducer, presented in one context or another, might then be matched to different dimensions or features in the other modality (Marks, 1991). This said, some documented cases of crossmodal correspondences might qualify as absolute (e.g., Guzman-Martinez, Ortega, Grabowecky, Mossbridge, & Suzuki, 2012; Pedley & Harper, 1959; Smith, Grabowecky, & Suzuki, 2007), but perhaps due to semantic or linguistic reinforcements. When synesthesia presents a relative component (as, for instance, in Cytowic & Wood, 1982, albeit for a very small participant sample), the issue is complicated by the evidence that some relativistic effects in perceptual judgments appear to reflect sensory adaptations (for a review, see Marks & Arieh, 2006). The rarity and the opacity of the evidence are here not sufficient to mask the contrast between the overall absolute character of the synesthetic elicitation, once the inducer is perceived, and the relativity of crossmodal correspondences.

Now, why would transitivity, bidirectionality, and relativity demonstrate that crossmodal correspondences form a distinctive kind, separate from synesthetic cases? Why not, in other words, predict or argue that there is a continuum of cases, from more bidirectional, transitive, and relative (at the crossmodal correspondences end) to more unidirectional, intransitive, and absolute (as one moves toward the synesthetic end of the continuum)? The first thing to stress is that, contrary to conscious or frequent, which could be treated as gradable terms, these criteria are binary and do not have intermediate degrees. Transitivity is a property of relations that is either possessed or not possessed but cannot be half-possessed: Everything that is not fully transitive is therefore intransitive and cannot be “semitransitive.” Relativity and absolute character are the same: Something that is not absolute is not half-absolute; it is relative. Bidirectionality is also not a criterion that comes in degrees, unless one manages to find cases that are never, sometimes, often, and always bidirectional.

So, from a strictly logical point of view, these binary divisions cannot but go with a strong partition (perhaps more than a bipartition, since some phenomena could be, for instance, relative and transitive, whereas others could be relative and intransitive). From an empirical point of view, there may well be different neural underpinnings for the bidirectional and unidirectional processes.

This adds to what we have already stressed—that is, the fact that the continuity model has difficulty aligning aspects such as strength and frequency with the otherwise continuous series of cases from more to less often conscious. The strength of the mapping seems to go with either end and to be orthogonal to the targeted synesthetic continuity (see Fig. 4). Note here that the same might be true of what we have called specificity: After all, whereas correspondences seem mostly to involve the mapping of whole dimensions (where mappings can be elicited even between pairs of stimuli that themselves might not have been preexposed) and have been interpreted as such (e.g., Collier, 1996; Harvey, 1973), it is possible that some more specific cases, involving full discrete stimuli, also exist.

Fig. 4

Limits of the continuum hypothesis (discontinuous features and orthogonal variations)

The nonspecific, bidirectional, and relative nature of most crossmodal correspondences, as well as their prevalence in the general population and their role in multisensory binding, single them out for an altogether quite different role than synesthetic experiences. They help keep track of statistical regularities in our experience of the environment, where multisensory objects and events present very varied features but are likely to correlate in a regular way (see den Ouden, Daunizeau, Roiser, Friston, & Stephan, 2010). For instance, visual size and auditory loudness, or size and pitch, are often related dimensions in the environment: Bigger objects tend to emit louder and lower pitched sounds than do smaller objects. (Although, in the latter case, basilar mechanics and neural coding of sounds might also explain why the apparent spatial extent of sounds themselves, or auditory volume, varies inversely with acoustic frequency. At lower sound frequencies, the basilar membrane tends to vibrate as a whole, producing synchronous activation across large numbers of peripheral neurons. As sound frequency increases, there is greater spatial resolution of basilar-membrane activation; see Boring, 1926; Marks, 1978; Stevens, 1934.) These related dimensions might be an aspect of our cognitive apparatus that has been internalized, either because of these very objective regularities in the environment (Spence, 2011) or because of a common coding, for instance, of magnitude. This leads us to examine crucial differences regarding acquisition between crossmodal correspondences and canonical synesthesia.

Differences in acquisition

In adulthood, in the case of developmental synesthesia at least, the relation between the inducer and the concurrent would appear to be fixed. Here, we will not discuss the other, much rarer types of synesthesia (e.g., brain damage or blindness-induced synesthesia, or, for that matter, drug-induced synesthesia, if such there be; Jacobs et al., 1981; Steven & Blakemore, 2004; Vike et al., 1984). While there is certainly some evidence that these mappings may at one time (i.e., in childhood) have picked up on the statistics of the environment (e.g., the colors of the letters used in fridge magnets; Witthoft & Winawer, 2006; see also Beeli et al., 2007; Smilek, Carriere, Dixon, & Merikle, 2007; although see Cohen Kadosh, Henik, & Walsh, 2009, p. 485), in adulthood, it appears that existing synesthetic mappings between inducer and concurrent cannot be modified. Nor, for that matter, can new synesthetic mappings be learned (see Mroczko, Metzinger, Singer, & Nikolic, 2009, for a limited exception). Similarly, repeatedly pairing arbitrary combinations of auditory and visual stimuli, even when presented for several tens of thousands of trials (and even when the participants are given drugs such as Mescal), does not give rise to a conscious sensory concurrent when either one of the paired stimuli is subsequently presented in isolation (see Howells, 1944; Kelly, 1934; Meier & Rothen, 2009). After training people with specific, but arbitrary, associations between graphemes and colors, Meier and Rothen (2009) observed a robust Stroop-like effect behaviorally. Since this effect was documented in the absence of any vivid conscious synesthetic concurrents, one might argue that what was established was merely a novel crossmodal correspondence, or coupling prior (cf. Ernst, 2007; Flanagan, Bittner, & Johansson, 2008; Spence, 2011).

Now, by contrast, a subset of crossmodal correspondences (i.e., semantic ones) can be learned very rapidly, in a matter of trials. The same is true of coexposures for novel odorants paired with a specific tastant (e.g., Stevenson, 2012; Stevenson, Boakes, & Prescott, 1998) and within an hour of arbitrary pairings of visual and tactile stimulus dimensions (Ernst, 2007). In the end, both internalized, universal correspondences like the one holding between pitch and brightness and more individual learned correspondences (like the one that holds between colors and flavors or odors and tastants) suggest that crossmodal correspondences remain malleable throughout the lifetime of an individual. No such systematic environment-driven change has been observed for synesthesia.

Differences in neural underpinnings

Given the differences stressed above, one would expect synesthetic and nonsynesthetic cases of crossmodal mappings to present different, or at least not completely overlapping, patterns of brain activation, therefore revealing a specific neurological profile or different explanations for each condition.

The task is complicated by the fact, stressed in recent research (Rouw, Scholte, & Colizoli, 2011), that synesthesia involves a network of brain areas rather than a single one. In auditory and/or visual cases, this network includes the left parietal lobe (Jäncke & Langer, 2011) and medial structures (i.e., the anterior cingulate cortex, thalamus, precuneus, and insular cortex; Specht & Laeng, 2011). Studies still show that the brains of synesthetes reporting consistent conscious experiences differ from the brains of other individuals (Hänggi et al., 2008; Jäncke, Beeli, Eulig, & Hänggi, 2009; Rouw & Scholte, 2007, 2010). A second complication here comes from the lack of agreement regarding what this difference consists of. Researchers have proposed over the years a number of different accounts of the neural mechanisms underlying synesthesia: disinhibition of feedback (Grossenbacher & Lovelace, 2001), the breakdown of modularity (Baron-Cohen et al., 1993), enhanced neural connectivity (Bargary & Mitchell, 2008; Rouw & Scholte, 2007), neural cross-talk (Hubbard et al., 2011; Hubbard & Ramachandran, 2005; Ramachandran & Hubbard, 2001), and so forth (see Rouw, 2011, for a recent review).

By contrast, here it is worth noting that until recently, there has been little research (or even published suggestions) concerning the neural underpinnings of crossmodal correspondences (cf. Martino & Marks, 2001). The results of an electrophysiological study by Bien et al. (2012) revealed the activation of intraparietal areas; similarly, work by other researchers (e.g., Kovic, Plunkett, & Westermann, 2010; Peiffer-Smadja, 2010; Seo et al., 2010) has been widely taken to suggest that such correspondences can influence information processing at a relatively early (i.e., perceptual) level. Given the variety of crossmodal correspondences that have been demonstrated—semantic, statistical, and neural, for a start (see Spence, 2011)—it is likely that there may be various different neural substrates for different kinds of crossmodal correspondence, having various effects, for instance, on visual processing (Sadaghiani, Maier, & Noppeney, 2009).

Neurological data are not delivering a clear verdict in terms of whether a solution of continuity can be found between synesthesia and nonsynesthetic cases. We want to stress here that, given the differences noted above, it is up to the defenders of the continuum hypothesis to say which actual or forthcoming data would constitute a confirmation of their view. Evidence of overlapping areas, for instance, largely underdetermines their claims. Researchers have now started to investigate whether they can eliminate or, at the very least, interfere with the elicitation of synesthetic concurrents with the targeted use of transcranial magnetic stimulation (TMS; see Esterman, Verstynen, Ivry, & Robertson, 2006; Muggleton, Tsakanikos, Walsh, & Ward, 2007). Synesthetic Stroop-type interference effects have been shown to be modulated when TMS is applied over right parietal areas, specifically, over the intraparietal sulcus and the temporal-occipital area. Bien et al. (2012) have also demonstrated that they could interfere with the crossmodal correspondence between pitch and size by applying TMS over the right intraparietal area. Here, though, the modulation of the Stroop effect should not be confused with, or necessarily taken to demonstrate, a modulation of synesthesia (i.e., the modulation of a conscious sensory concurrent; see Deroy & Spence, submitted, on that point; see also Meier & Rothen, 2009). What’s more, in their study of synesthetes, Muggleton et al. (2007) examined the right and left parieto-occipital (PO) junction and left and right intraparietal sulcus (IPS) and found the effect only for the right PO, whereas Bien et al.’s results concern the IPS alone. It seems then that the idea of a neurological continuity or even overlap between canonical synesthesia and crossmodal correspondences needs stronger evidence.

Finally, an important question here for future research concerns the fact, noted above, that most crossmodal correspondences seem to rely on bidirectional processes. By contrast, several researchers in the field have noted that finding an explanation for synesthesia is likely to benefit from looking for an explanation for its unidirectionality (e.g., Grossenbacher & Lovelace, 2001; Rich & Mattingley, 2002). Therefore, and although no conclusion can yet be advanced, the two research agendas, we contend, come apart.

Conclusions

The evidence reviewed here should have made clear that, despite their superficial similarities, crossmodal correspondences are not to be considered as a form of “weak synaesthesia.” In fact, in most regards, they are qualitatively different phenomena, with likely (although not necessarily) different neural underpinnings. As such, we would like to take issue with the increasing tendency to conflate these two phenomena (e.g., Bien et al., 2012; Eagleman, 2009; Esterman et al., 2006; Ludwig et al., 2011; Mulvenna & Walsh, 2006; Sagiv & Ward, 2006).

By so doing, these researchers implicitly seem to see the expression of crossmodal correspondences as but one end of a continuum with synesthesia at the other end, siding with what Marks (2011) has labeled a form of “synaesthetic monism.” Note here that calling this continuum synesthetic is, then, a mere matter of convention, since one could as well decide to call it a crossmodal correspondence continuum (which few, but some, have done; see Harrison & Baron-Cohen, 1996).

There are strong objections to the trend that consists in downplaying or ignoring the occurrence of a conscious concurrent when documenting cases as “synesthetic,” even “weakly synesthetic” (e.g., Martino & Marks, 2001; see also Rader & Tellegen, 1987; Zellner et al., 2008). Crucially, this debate is much more than merely terminological. Progress in understanding both synesthesia and crossmodal correspondences may be held up by what we see as an inappropriate attempt to link these two kinds of phenomena. Until some meaningful similarity can be established, it is methodologically and conceptually advisable to use distinct terms to refer to these two phenomena. Everyone might agree with this pragmatic recommendation. The same might not be true of what we consider to be a more substantial conclusion: that synesthesia and crossmodal correspondences are fundamentally different. This leads us to insist that the concurrent in the case of synesthesia needs to be conscious and obliges us to revisit or resist some popular accounts of consciousness.

If the conscious concurrent is key to the definition of synesthesia, it is going to become impossible to say confidently whether neonates and infants are, for example, more synesthetic than adults or not. Maybe neonates are just more sensitive to crossmodal correspondences than are adults. This line of argument leads one to challenge the growing body of research suggesting that neonates are all synesthetic, and also more synesthetic than adults (Maurer, 1993; Maurer & Mondloch, 2005a, 2005b; Mondloch & Maurer, 2004; Simner, Harrold, Creed, Monro, & Foulkes, 2008; Spector & Maurer, 2009; Walker et al., 2010; see also Marks & Odgaard, 2005). Further investigation of this new hypothesis of a high sensitivity to crossmodal correspondences is likely to help in distinguishing innate (Shepherd, 2012) or neurologically internalized correspondences, thereby shedding light on mechanisms underlying universal correspondences or on how these first correspondences are updated through exposure (Lewkowicz & Lickliter, 1994). It also opens up an interesting series of studies if some correspondences are shown to exist in infants and disappear through the course of development.

From a more philosophical point of view, the need to distinguish conscious synesthesia from crossmodal correspondences raises new questions regarding the differences between conscious and unconscious processing. Contemporary signal detection theory, in a sense, has freed us from the need to draw a line between the two, since the powers of detection and discrimination increase continuously without consciousness seemingly introducing a key difference. From there, it is also possible to think that there can be continuity in consciousness, as once defended by Leibniz with his theory of “les petites perceptions” (see Marks, 2011). However, the distinction between synesthesia and crossmodal correspondences invites us to consider the qualitative differences being introduced by the necessity of a conscious concurrent or binding.

In the end, we contend, freeing correspondences from the domination of synesthetic interpretations is likely to open up a whole new domain for investigation. In this sense, then, we do indeed believe that the evidence for weak synesthesia is, at present, weak and that the need for more systematic study of crossmodal correspondences is strong (Table 2).

Table 2 Summary of the differences between canonical cases of synaesthesia and crossmodal correspondences, which justify their distinction