Perfumers often talk about the composition of a fragrance in terms that look strikingly similar to those used by musical composers: That is, they search for the right balance and harmony between high and low notes, they think about tones, and so forth. Ruth Rosenbaum, reporting on the work of flavorists working on strawberry flavor, noted, for instance, that

there are about 2,000 raw materials for the flavorist to choose from, each one of them with characteristic notes. As the flavorist composes at his console, he must be able to envision how the notes of each material he adds will “sound” with every other note in the final flavour composition,

and considers that “all chemicals have to be chemically atuned as well” (Rosenbaum, 1979, p. 83). Famous perfume expert Piesse (1857, p. 25) pushed the connection between odors and music even further, noting that

Scents, like sounds, appear to influence the olfactory nerve in certain definite degrees. There is, as it were, an octave of odours like an octave in music; certain odours coincide, like the keys of an instrument. Such as almond, heliotrope, vanilla, and orange-blossoms blend together, each producing different degrees of a nearly similar impression. Again, we have citron, lemon, orange-peel, and verbena, forming a higher octave of smells, which blend in a similar manner.

It is, however, not only perfumers who have made this crossmodal connection: Professional wine and food writers have also been known to describe certain wines or dishes in terms of specific musical notes, musical styles, or pieces of music (see, e.g., Prescott, 2012, p. 21; see Spence, 2011b, for a review). The connections between auditory and olfactory dimensions could be nothing more than an anecdotal phenomenon, or else they might be specific to a given community of experts. It could be that, as many artists do when looking for ways in which to think about and create complex and novel sensory experiences, perfumers rely on the use of language that is highly metaphorical. Alternatively, though, these connections might come from special olfactory abilities that separate expert perfumers from the general population. So, for instance, it has been reported that many perfumers have superior olfactory mental imagery abilities, and that these abilities are correlated with reduced activity in the piriform cortex and increased activity in the parahippocampal gyrus when compared to nonexperts (Plailly, Delon-Martin, & Royet, 2012). The hypothesis concerning the superiority of experts, however, still fails to explain the origin of the connection that they make between perfumes and pieces of music and, in the first place, whether the connections are consistent between individuals. A way to address these two questions might come from the results of several studies that have highlighted the facts that the descriptors used for perfumes are shared across individuals (see Manetta, Urdapilleta, Houix, Montet, & Richard, 2007, for a review) and, more specifically, that the comparison between certain olfactory stimuli and specific musical notes or attributes has an intuitive and direct appeal that extends well beyond the realm of artists or expert perfumers. The majority of people, for instance (or at least a majority of those tested in the research to date; see Henrich, Heine, & Norenzayan, 2010), reliably match certain smells (e.g., caramel) with lower-pitched notes than those with which they match other smells (e.g., bergamot; see Belkin, Martin, Kemp, & Gilbert, 1997; Crisinel & Spence, 2012b). The crossmodal associations that have been tested to date exhibit a degree of convergence that is far greater than would be expected by chance: In this respect, they look like many other crossmodal associations (what are often called “correspondences”; see Deroy & Spence, in press).

A point worth stressing here is that, in contrast with the spatial mapping of pitch, music (or specific auditory attributes) is not necessarily part of the spontaneous sensory repertoire that most people use to describe olfactory stimuli. Perhaps unsurprisingly, they often pair food smells with the typical taste of their sources (e.g., “sweetness” for the smell of vanilla; Stevenson & Boakes, 2004) or with the thermal attributes attributable to their trigeminal characteristics (as in the case of “cool” smells, such as menthol; e.g., Laska, 2001). One can also find olfactory experiences that are associated with apparently more arbitrary features, such as the visual or amodal dimension of “brightness” (N. E. Cohen, 1934; von Hornbostel, 1931; Kemp & Gilbert, 1994, 1997; Schiller, 1935) (see Footnote 1), the colors or shapes of three-dimensional objects (Smets & Overbeeke, 1989), and the notion of “thickness,” “heaviness,” or “density,” borrowed from the tactile domain and typical of solid objects (Stevenson & Mahmut, 2011). Finally, when prompted, or when performing some sort of forced choice task, people associate odors not only with certain auditory features, such as auditory pitch (Belkin et al., 1997; Crisinel & Spence, 2012b), but also with certain visual features, such as angularity and roundness (Hanson-Vaux, Crisinel, & Spence, 2013; Seo, Arshamian, et al., 2010).

Olfactory experiences, then, represent a domain to which attributes from all other sensory modalities (i.e., speaking from a commonsensical viewpoint: taste, touch, hearing, and vision) can be easily and naturally applied (see Table 1). In other words, they represent a domain that is particularly rich in crossmodal correspondences. Crossmodal correspondences are defined as tendencies for a certain sensory feature (or dimension) to be associated or matched with another feature (or dimension) in a distinct sensory modality (see Spence, 2011a, for a review). By extension, the term applies to many surprising matchings between actual stimuli or imagined objects across sensory modalities. Such tendencies appear to exist across all combinations of sensory modalities, at least those that have been tested to date (although, importantly, not necessarily between all pairings of dimensions; see Bernstein & Edelstein, 1971; Evans & Treisman, 2010), and canonical examples include higher-pitch sounds being mapped to brighter surfaces and smaller objects, and lower-pitch sounds to darker, larger objects. Despite its being a rich illustration of the variety and importance of crossmodal correspondences, olfaction has been seemingly ignored in both previous and recent studies of these phenomena (see Kenneth, 1923, for an exception). The primary goal of the present study was to correct this oversight, with a specific focus on the most surprising and least investigated cases, such as odor–musical notes or odor–angularity correspondences.

Table 1 Varieties of crossmodal associations related to olfaction

A first question to address is whether all of these mappings between olfaction and the other senses really do constitute a single phenomenon, and would, then, benefit from a joint explanation. Or do they instead reveal a variety of distinct empirical phenomena and underlying processes? This first question is closely linked to a second one, concerning the rules that determine how the specific olfactory stimuli are associated with other sensory attributes. Do the same rules apply to explain, for instance, the pairing of odors and taste or trigeminal attributes and the pairing between odors and musical attributes? In other words, can we explain all of the mappings in terms of learned associations? Why should it otherwise be that the orthonasal smell of caramel is mapped onto “lower notes” than those associated with the smell of bergamot (Belkin et al., 1997)?

As we will show in the following section, it is difficult to explain sound/music–smell or shape–smell matchings simply in terms of other cases of crossmodal correspondences, that is, as coming from the internalization of statistical regularities in the environment. This lack of any obvious statistical explanation, as well as additional differences, leads one to question the relevance of assimilating these matchings between contingent features to crossmodal correspondences in the first place. This, perhaps, explains why, previously, such matchings have been considered to be mere metaphorical transfers. The next section, however, details what we take to be the reasons why the idea that crossmodal associations come from linguistic or conceptual mappings does not provide an adequate account. After this, we explore and critically examine other explanations that are compatible with these associations being internalized crossmodal correspondences, that is, explanations in terms of amodal, indirect, and transitive mappings across sensory modalities. The final section concludes with two arguments related to seeking a better understanding of the crossmodal correspondences holding between contingent features, such as those holding between smells and auditory or shape features: First, researchers should resist the idea that the associations are merely anecdotal, but rather should subject them to systematic investigation (see also Stevenson et al., 2012). Second, we propose an alternative framework to investigate the most surprising cases of crossmodal associations, one that contrasts with their being seen as either synesthetic connections (an interpretation that was suggested for smell–taste associations by Stevenson & Tomiczek, 2007, and is easily extended to crossmodal associations between smell and musical notes) or merely grounded in episodic memories (as has been proposed to explain “Proustian” associations between odors and visual images; see Chu & Downes, 2002; Willander & Larsson, 2006, 2008).

Specificities of contingent crossmodal associations

Beyond their superficial resemblance as surprising connections between sensory features or dimensions, the various associations between olfaction and audition appear to differ from other correspondences (between, for example, olfaction and taste, as in “sweet smells,” or pitch and shape) in at least three respects: first, in terms of intra- and interindividual consistency; second, in terms of their grounding in the statistics of the environment; and third, in terms of their behavioral effects. Although all of these differences await further experimental confirmation, they stand at the core of the puzzle currently raised about what we call “contingent crossmodal associations.”

Less robust associations

A series of experiments has highlighted that people converge when asked to match olfactory stimuli to either auditory stimuli (Belkin et al., 1997; Crisinel & Spence, 2012b; Crisinel, Jacquier, Deroy, & Spence, in press) or symbolic/geometrical shapes (Hanson-Vaux et al., 2013; Seo, Arshamian, et al., 2010). This, in itself, would appear to be sufficient to suggest that a real phenomenon is at stake, and that these associations are not random.
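The claim that such matchings are "not random" can be made concrete with an exact binomial test against chance responding. The following is a minimal sketch only: the participant counts are hypothetical and are not taken from any of the studies cited above, and `binomial_tail` is a helper name introduced here for illustration.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical two-alternative forced choice: each participant decides
# whether a caramel odor goes with the lower or the higher of two notes.
# These counts are invented for illustration, not reported data.
n_participants = 30
n_choosing_lower = 24
chance_level = 0.5  # probability of picking "lower" if answers were random

# One-sided probability of seeing at least this much agreement by chance.
p_value = binomial_tail(n_choosing_lower, n_participants, chance_level)
print(f"P(at least {n_choosing_lower}/{n_participants} by chance) = {p_value:.6f}")
```

A small tail probability indicates only that responses converge more than chance would predict. It says nothing about why they converge, which is the question taken up in the remainder of this section.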

However, one might object that a closed question format or the authority of the scientist asking the question might just encourage some people to answer an apparently nonsensical question about pairing smells and musical features as if it were meaningful. In turn, this might end up convincing the experimenter involved that some hidden truth lies behind such answers. This is what happens with children, who it has been shown are ready to answer apparently arbitrary questions posed by adults—for instance, whether jumpers are angrier than trees, or whether red is heavier than yellow (see, e.g., Hughes & Grieve, 1980; C. Pratt, 1990; Waterman, Blades, & Spencer, 2000, 2001). The closed format of the question encourages the participant to pick one of the two suggested solutions. However, not all nonsensical questions are equivalent: While the question about the emotional states of trees may not lead to a statistically significant general answer, most people, including many adults, will happily agree that red is indeed heavier than yellow (Alexander & Shansky, 1976) or, in the present case, that bergamot is higher-pitched than caramel. Not all matchings, then, seem to be equally arbitrary, and some can be shown to be more frequent and common than others. Note that these judgments can remain subjectively arbitrary, and that the individuals who make them often cannot justify the reason or give the origin of their inclination (see, e.g., Rudmin & Cappelli, 1983).

Now, even if crossmodal associations between smells and apparently unrelated features satisfy the criterion of giving rise to converging answers, it is not clear that they hold through time or are shared among individuals exposed to the same environmental cues. In this sense, it might seem uncertain that they satisfy the two other criteria of consistency and universality, which seem to be generally granted to crossmodal correspondences (but which are less often tested; though see Bremner et al., 2013, for a recent exception). A higher degree of interindividual variability in smell–music or smell–shape associations should, however, not deter one from thinking about them as varieties of crossmodal correspondences. For instance, although the crossmodal association between certain speech sounds and visual shapes is reported to be universal (see Spence & Deroy, 2012b, for a review), it is still less frequent in nonliterate cultures (Bremner et al., 2013). Crossmodal associations between flavors, shapes, and speech sounds appear to vary from culture to culture, as do the crossmodal associations between tastes and smells (Spence, 2008), all of which still count as crossmodal correspondences. Regarding consistency (or intraindividual variability), it is fair to say that many of the associations concerning smells have not been subject to a test–retest paradigm covering long periods of time (e.g., in Belkin et al., 1997, participants were retested after only a week), as is usually done with synesthetic associations. This said, the fact that the criterion of consistency through time might be important for synesthesia (see Deroy & Spence, in press, for a discussion, and Simner, 2012, for a different view) does not mean that it is important for crossmodal correspondences, which one would expect to change following exposure to an environment with different statistics.

In the absence of further testing, then, it is impossible to say firmly whether crossmodal associations between smells and auditory features or geometrical visual shapes are really less robust—that is, whether they present more inter- and intraindividual variability—than other crossmodal associations that have been documented to date. This said, even if this were demonstrated, a lack of robustness would be a characteristic to explain, rather than a reason to reject these associations from the category of crossmodal correspondences.

Lack of a straightforward statistical explanation

What about the acquisition of these crossmodal associations? It is unclear whether olfactory–auditory crossmodal associations could have been learned through associative learning, as has been proposed to explain other examples of crossmodal correspondence (Parise & Spence, in press; Spence, 2011a; Spence & Deroy, 2012a, 2012b). Although it might, for instance, seem equally arbitrary at first that people match vanilla to sweet tastes and higher-pitched sounds to bergamot rather than to caramel, the former association can be explained in terms of a regular joint co-exposure, as vanilla tends (at least in Western countries) to be present in sweet, rather than salty or sour, foods. Although certain odors might indeed have been encountered while listening to certain pieces of music or seeing certain rounded shapes, these encounters are very unlikely to have been statistically more relevant than other associations.

In a study reported by Crisinel and Spence (2012b), consistent crossmodal matches were documented between a variety of synthetic fruit odors and high-pitched notes (regardless of the instrument playing the note). Such results accord well with previous findings demonstrating that sour and sweet tastes, two prominent taste qualities present in fruits, are both associated with high-pitched sounds (Crisinel & Spence, 2010). The fact that the tastes paired with high-pitched sounds and the smells associated with high-pitched sounds are themselves strongly connected (at least in the case of fruits, where they have been tested previously) still goes against the associative-learning explanation put forward for taste attributes and smells: It is not clear that any good explanation can be produced in the case of fruits being regularly experienced in conjunction with, or shortly after, sounds having a higher pitch. Sounds and smells do not naturally co-occur, at least not in a way that would explain the music–smell or pitch–smell pairings that have now been documented. Let us put aside those crossmodal associations that may have been formed between a certain perfume and an arbitrary piece of music, which might certainly happen as a result of branding a certain association (see also Nelson & Hitchon, 1995, 1999). In only a few cases do a smell and a characteristic sound belong to the same object (e.g., animals have characteristic smells and make particular recognizable kinds of sounds), and this is certainly not the case for fruits or flowers. For most floral, fruity, and other environmental smells to which the music–smell mapping applies, as far as we can tell, no straightforward associative-learning story can be told to explain the high probability of their being presented together, beyond a few examples (the smell of the forest and the noise of the wind in the leaves, women wearing perfumes different from those for men and having higher-pitched voices, etc.).

The same limits regarding the associative explanation can be illustrated in the case of the crossmodal relations that have been documented between olfaction and vision. Not surprisingly, Seigneuric, Durand, Jiang, Baudouin, and Schaal (2010), for example, demonstrated that learned associations between a smell (e.g., orange) and a picture of its source (e.g., a picture of an orange) affect visual exploration: In the presence of their corresponding odors, objects are explored more rapidly, and for a shorter time, than other objects in the scene, and this occurs even if participants are not aware of the smells having been presented (see also Demattè, Sanabria, & Spence, 2009; Seo, Roidl, Müller, & Negoias, 2010). As in the case of the crossmodal association between taste and smell, this case can be explained by the regular co-experiencing of two features and, more specifically, of features that normally originate from, and are attributed to, the same environmental object.

But things also start to look more complicated when one turns to examine the crossmodal associations that have been documented between olfactory stimuli and geometrical or symbolic shapes, rather than the shapes of objects (Hanson-Vaux et al., 2013; Seo, Arshamian, et al., 2010). So, for example, the participants in a recent study by Seo, Arshamian, et al. had to judge whether eight food odors (guava, honey melon, mint, parmesan cheese, pepper, truffle, vanilla, and violet) fitted 19 different geometrical shapes or symbols (see Fig. 1). The shapes could be described as varying from more organic to more angular. The intriguing result emerging from this study was that the participants reliably matched certain shapes to particular odors.

Fig. 1

Abstract symbols/shapes used in Seo, Arshamian, et al.’s (2010) study of the crossmodal correspondence between odor and shape. Participants were presented with eight odors and viewed a series of 19 abstract symbols/shapes, one by one. They were instructed to answer the question, “Does this odor fit this symbol?” via “yes” or “no” answers. Pairings of symbols/shapes with odors obtained through the correspondence analysis are shown here: The first two dimensions (axes X and Y) accounted for 93.3 % of the total variance, and in particular, the first dimension (axis X) explained a large portion of the total variance (88.2 %)

In the second part of their study, Seo, Arshamian, et al. (2010) went on to test for differences in the odor pleasantness and intensity ratings of two of the stimuli (the odors associated with violet and parmesan cheese) when the stimuli were presented together with an abstract shape that had been judged as being congruent versus when they were presented with an incongruent shape (or else with no visual stimulus). The behavioral responses (i.e., the psychophysical data) showed significant differences between the congruent and incongruent conditions. The neural responses of participants in Seo, Arshamian, et al.’s study were also measured using olfactory event-related potentials (ERPs). Using the latter technique, the researchers were able to demonstrate that this particular crossmodal correspondence between abstract visual shapes and odors influenced the magnitude and latency of the early N1 component, occurring around 400 ms after stimulus onset. These results were taken to suggest that the presentation of the abstract shape was affecting participants’ perception of the odor, rather than their rating of that olfactory experience.

Here, the use of geometrical or symbolic shapes showed that smells were not matched to shapes in a manner that could be explained simply by suggesting that participants were matching to the shape properties of their typical source (again, this is similar to what has been noticed for crossmodal associations between tastes/flavors and shapes; see Deroy & Valentin, 2011; Spence, 2012; Spence & Gallace, 2011). Although other semantic associations between the smells that participants were able to identify and the concepts associated with the shapes could have been at play, they cannot explain all of the associations, as recognition of the smells was not necessary for the matching to be done. For instance, in Crisinel and Spence (2012b), participants were able to correctly identify the olfactory stimulus in only 17.7 % of the cases, while in a further 17.3 % of the cases, they identified the general category of the stimulus correctly. Notably, nobody identified the blackberry odor, and yet it was strongly associated with the piano.

Underevidenced behavioral effects

A final difference between the smell–sound and smell–arbitrary shape associations considered here and the more obviously statistically grounded crossmodal correspondences needs to be highlighted. Crossmodal mappings of olfactory stimuli to auditory features or arbitrary shapes are not expected to be relevant for multisensory interactions, especially when it comes to helping solve the binding problem (one can think of similar mappings as performing the role of coupling priors in Bayesian models of multisensory perception; see Ernst, 2007; Spence, 2011a). This contrasts with what has been observed for odor–taste correspondence (as well as for other crossmodal correspondences, notably audiovisual ones; see Parise & Spence, in press, for a review), which has been shown to affect participants’ performance on a variety of detection, speeded classification, and multisensory integration tasks (see Gallace & Spence, 2006; see also Evans & Treisman, 2010, and Spence, 2011a, for reviews). Certain odorants (again, such as vanilla) can enhance the perceived sweetness of certain solutions (see, e.g., Djordjevic, Zatorre, & Jones-Gotman, 2004). The same phenomenon also exists in terms of the effect of “bitter” smells on perceived bitterness and of “salty” smells on perceived saltiness (Lawrence, Salles, Septier, Busch, & Thomas-Danguin, 2009; Seo, Iannilli, et al., 2011). There is also evidence that statistical crossmodal correspondences can modulate the neural response relatively early during information processing (that is, 220 ms after stimulus onset; see Bien, ten Oever, Goebel, & Sack, 2012; Spence & Parise, 2012; see Spence & Deroy, 2013, for a discussion). Such results should, of course, not be taken to imply that statistical correspondences do not also (or sometimes only) activate other loci later in information processing, as well (Sadaghiani, Maier, & Noppeney, 2009).
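To illustrate what it means for a correspondence to act as a coupling prior, the sketch below works through a simple Gaussian case in the spirit of the Bayesian models cited above. It rests on assumptions introduced here and not taken from the source: two noisy sensory measurements expressed on a common scale, independent Gaussian noise, a flat prior on the overall level, and a zero-mean Gaussian prior on the difference between the two underlying signals. The function name and all numerical values are hypothetical.

```python
def coupled_estimates(x1, x2, var1, var2, var_coupling):
    """MAP estimates of two signals under a Gaussian coupling prior.

    x1, x2        noisy measurements of the two signals (common scale)
    var1, var2    noise variances of the two measurements
    var_coupling  variance of the prior on (s1 - s2); small values mean
                  the two signals are expected to go together
    """
    total = var1 + var2 + var_coupling
    shift = (x2 - x1) / total
    # Each estimate is pulled toward the other measurement, in proportion
    # to its own noise variance and inversely to the coupling variance.
    s1 = x1 + var1 * shift
    s2 = x2 - var2 * shift
    return s1, s2


# Hypothetical measurements that disagree by 10 units.
x1, x2, var1, var2 = 0.0, 10.0, 1.0, 1.0

strong = coupled_estimates(x1, x2, var1, var2, var_coupling=0.0)   # full fusion
weak = coupled_estimates(x1, x2, var1, var2, var_coupling=1e12)    # near independence
print("strong coupling:", strong)
print("weak coupling:  ", weak)
```

With a very tight coupling prior the two estimates collapse onto the reliability-weighted average of the measurements, whereas with a very broad one each estimate stays close to its own measurement. On this reading, a correspondence that is not grounded in environmental statistics would be expected to behave like the broad-prior case.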

Some behavioral effects of crossmodal cues from vision or audition on olfaction have been documented, but mostly for semantically congruent or regularly associated pairs of stimuli. Now, this might be attributable to researchers having looked preferentially at those crossmodal influences that have been formed by associating various cues relative to the same object (e.g., following the unity assumption, considered as a crucial factor in multisensory research since Welch & Warren, 1986; see also Vatakis & Spence, 2007). However, effects that are not limited to these cases have been demonstrated in those broader crossmodal correspondences between sometimes distinctly presented objects whose dimensions happen to feel congruent: Higher-pitched sounds can facilitate the detection of brighter objects, even without coming from or being attributed to the same source or object (e.g., Klapetek, Ngo, & Spence, 2012; Ludwig, Adachi, & Matsuzawa, 2011). As we detail below, this draws attention to the need for further investigation in the case of associations holding between olfaction and musical features, and it indicates a few of the directions along which research could profitably be pursued (see Table 2 and below).

Table 2 Effects of visual and auditory stimuli on olfactory perception and evaluation

Effects of the joint presentation of olfactory and visual cues

In the case of olfaction and vision, for instance, it is well established that semantically congruent visual shapes (but also colors in general) facilitate olfactory identification (see, e.g., Demattè et al., 2009; see also Spence, Levitan, Shankar, & Zampini, 2010, for a review of orthonasal and retronasal olfaction). Interestingly, Zellner and Whitten (1999) reported that while semantically congruent colors can enhance the intensity of a perceived odor (green for mint odor, red for strawberry odor), the effect was mainly due to the intensity of the color, while hue had little effect: The presence or absence of color in the concurrently presented solution seems to be an important factor (see also Zellner, Bartoli, & Eckard, 1991; Zellner & Kautz, 1990; but see Spence et al., 2010), in addition to whether it was red or green. Going one step farther, recent imagery studies have suggested that vision contributes more generally to olfactory processing, as the mere stimulation of the visual cortex improves performance at the discrimination of different odor qualities (Jadauji, Djordjevic, Lundström, & Pack, 2012). In Seo, Arshamian, et al.’s (2010) study, the presentation of pairs of visual shapes and odors that had been rated as corresponding crossmodally in a previous task increased both the pleasantness (or unpleasantness) and intensity of the odors. The influence of visual cues on olfaction therefore seems to operate in many ways: Besides the well-documented and specific influence of learned semantic associations, a more general “encouragement” (as the authors call it) has been observed at the neurological level, and room also remains for more general crossmodal correspondence effects between odor perception and some broader range of visual variations, such as shape or angularity.

Effects of the joint presentation of olfactory and auditory cues

Turning now to crossmodal pairings between audition and olfaction, the documented effects concern olfactory evaluation, rather than discrimination. Olfactory stimuli are, for instance, rated as being more pleasant when paired with a sound that happens to be congruent with the smell, rather than incongruent with it (Seo & Hummel, 2011)—for example, when the sound of someone else eating potato chips makes the smell of chips more pleasant (or, likewise, the sound of drinking coffee interacts with the smell of coffee). Arbitrary pairings of pleasant and unpleasant sounds with olfactory stimuli have also been shown to make the latter smell more pleasant or unpleasant, but no more intense, a finding that might simply demonstrate an emotional “halo” effect (i.e., the overall attractiveness of an object or situation biases the rating of some unrelated characteristic; Thorndike, 1920). In the absence of more systematic testing of the emotional effects of congruent pairings of olfactory and musical stimuli, one can at least think of some artistic attempts to exploit the power of joint olfactory and auditory stimulation (as in the case of Huysmans’s, 1884, “mouth organ” or Huxley’s, 1932, “scent organ” displaying symphonies of aromas, or in Scriabin’s olfactory symphonies, although it has to be said that the latter’s creative associations raised little enthusiasm among his audience/critics; see, e.g., Runciman, 1915).

The attested effectiveness of congruencies that respect crossmodal correspondences in the food domain (Piqueras-Fiszman & Spence, 2012; Spence, 2011a) and the effects of audition on food perception (Spence, 2012; Spence & Shankar, 2010) do, at least, suggest that such congruency effects might also occur for music and olfactory stimuli. Further encouragement is to be found in recent evidence (albeit not yet extended to humans) of single units in the olfactory tubercle responding to the simultaneous presentation of odors and tones (Wesson & Wilson, 2010; see also L. Cohen, Rotschild, & Mizrahi, 2011).

Further effects

Crossmodal correspondences between auditory and olfactory cues might have other effects on perception and might play a role, for instance, in crossmodal perceptual grouping (Spence, in press; Spence & Chen, 2012). One benefit of an explanation in terms of crossmodal correspondences is that it generates testable hypotheses regarding such effects. Take, for instance, the fact that higher- and lower-pitched sounds correspond, respectively, to higher and lower locations in space, which has been documented in both the visual and the tactile modalities (Bernstein & Edelstein, 1971; Occelli, Spence, & Zampini, 2009; Pedley & Harper, 1959; C. C. Pratt, 1930). One can then wonder whether such a compatibility effect exists for odors that correspond to higher or lower pitch. Can we predict that the “high notes” in a perfume or the odors paired with higher-pitched sounds, such as bergamot and fruit odors (Belkin et al., 1997; Crisinel & Spence, 2012b), will also lead to facilitation of participants’ performance in a visual-target elevation discrimination task? (See Footnote 2.) Support for this hypothesis has come from the observation that terms such as “high notes” and “low notes” are said to correlate with properties of the chemicals that they are used to talk about, notably the compounds’ relative volatility: That is, “high” notes typically correspond to the most volatile compounds in a perfume, those that arrive first at the nasal epithelium, whereas the “lower” or “middle” notes correspond to the less volatile compounds, that is, those that will likely arrive later at the nose. This, at least, is the kind of claim that one finds in the literature (Teixeira, Rodríguez, Mata, & Rodrigues, 2009; Turin, 2007; see Hettinger, 2008, for a review), but one that perhaps awaits empirical confirmation before being endorsed whole-heartedly.

Besides looking more systematically for the behavioral effects of pairing olfactory stimuli with their corresponding pitch, timbre, or geometrical shape, another important test for their inclusion as crossmodal correspondences comes from their being bidirectional. Other crossmodal correspondences have already been shown to operate bidirectionally—with, for instance, brightness corresponding to higher pitch, and higher pitch to brightness (which, by the way, constitutes yet another reason to distinguish between such correspondences and synesthesia; see Deroy & Spence, in press; Martino & Marks, 2001; Spence, 2011a). Not much in the way of research has been conducted in laboratory settings to try and explore crossmodal matchings from audition to olfaction, but some artistic examples exist, with certain musicians looking for the right concert of smells to accompany their musical compositions (see Bungey, 2012). If the mapping that exists from smells to musical features—for instance, from unpleasant smells to brass—also holds in the other direction—that is, from the sound of brass instruments to specific unpleasant smells—this would give some further argument for the existence of a crossmodal correspondence. Of course, the many dimensions of variation in both music and perfumes are unlikely to reveal their intriguing connections so easily.

To summarize, the cases of matchings between odors and musical notes or geometrical/symbolic visual shapes are (a) convergent, even if not demonstrably consistent over time or universal; (b) not straightforwardly (or obviously) learned by being associated in experience, either in the same external object or as part of any kind of statistical co-occurrence that might be present in the environment; and (c) capable of affecting evaluation (and perhaps behavior).

The two latter points come together in the present accounts of crossmodal associations between olfactory and auditory stimuli (or visual angularity): As they have not yet been linked to robust behavioral consequences or crossmodal effects, and do not seem to come from perceived regularities, it has been suggested that these cases need to be thought of as examples of metaphorical mappings: That is, they should be explained not in terms of perceptual processes or associations, but in terms of linguistic or conceptual mappings instead. Several researchers have even suggested that this may be the right way to understand all surprising matchings across sensory modalities (Shen & Eisenman, 2008; L. Walker, Walker, & Francis, 2012a). This conclusion, however, is not necessarily warranted. As was noted by Spence (2011a), the alternative is not between crossmodal correspondences being either statistical (e.g., like odor–taste pairings) or otherwise linguistic (or conceptual); notably, we need to make room for structural determinants for crossmodal pairings. But before we turn to this alternative, let us stress why metaphorical accounts are also not sufficient.

Why metaphorical and conceptual accounts are not sufficient

Using words that pertain to one sensory modality (e.g., audition) to characterize experiences in another sensory modality (e.g., olfaction) is not specific to a particular pairing of sensory modalities, but here the example of olfaction versus hearing seems to fulfill a specifically useful role. The olfactory domain is one that is notoriously difficult to verbalize (see, e.g., Ackerman, 1990), at least in the sense that most of the terms that we use to describe smells do not refer to qualities of the olfactory experience that those smells give rise to. Rather, olfactory descriptors individuate families of odors by reference to their typical sources (see Berglund, Berglund, Engen, & Ekman, 1973; Dubois & Rouby, 2002; Engen, 1987; Lawless & Engen, 1977): People happily talk about the smell of a violet, but have almost no direct linguistic means to say exactly what it is like to smell such a flower. When it comes to capturing the conscious experiences (the olfactory qualia, as philosophers would say), people are most often inclined to discriminate them in terms of their intensity and whether they are pleasant or unpleasant (see Koulakov, Kolterman, Enikolopov, & Rinberg, 2011; Yeshurun & Sobel, 2010). Crossmodal attributes applied to smells can therefore be thought of as a case of metaphorical transfer, or as a means to “understand and experience one kind of thing in terms of another” (Lakoff & Johnson, 2003, p. 5). In this sense, crossmodal transfer is something that occurs between concepts.

Related to this, Peter Walker and his colleagues in Lancaster have suggested that all crossmodal mappings could be explained by a form of conceptual mapping (see L. Walker et al., 2012, for a recent statement). These conceptual mappings hold between what they have called, following on from Karwoski, Odbert, and Osgood (1942), “dimensions of connotative meaning.” In an early version of the hypothesis, P. Walker and Smith (1985) suggested that a certain concept, such as the concept of brightness elicited by visual stimuli, and literally denoting visual luminance, also connotes a “suprasensory” concept that encompasses olfactory brightness, auditory brightness, and so forth. Why connotative meanings might be suprasensory was, however, not made clear in Walker and Smith’s original article.

More recently, L. Walker et al. (2012) proposed the existence of a conceptual process by which every polar sensory dimension (e.g., bright/dark, fast/slow, light/heavy, small/big, etc.), and even less clearly sensory polar dimensions (such as active/passive, taken from the literature on the semantic differential technique: Osgood, Suci, & Tannenbaum, 1957) are mapped onto one another in ways that respect polarities (the brighter goes with the thinner, smaller, lighter, etc.). This conceptual process is meant to explain the ubiquity and bidirectionality of all crossmodal transfers, but it has mostly been tested and built around auditory, visual, and tactile dimensions of experience. It is, however, supposed to extend to all crossmodal associations, including olfactory ones: In one of Walker et al.’s footnotes on the topic of olfaction (and flavor), they stated that olfactory brightness is likely to correspond to the weaker intensities on the polar scale (see L. Walker, Walker, & Francis, 2012b, note 2). Going one step farther, the hypothesis seems to be that brighter smells will also be thought of as small, higher in pitch, light, active, weak, and so forth.

Although they are attractive, such metaphorical/conceptual accounts face two main difficulties. First, as we detail in the next subsection, the two versions that have been given of this account both fail to provide a genuine explanation for the mechanisms that underlie crossmodal mapping; they offer, rather, a description of certain of its verbal manifestations. Second, as we then detail, such a description is not sufficient to account for the nonlinguistically manifest or mediated effects that are being progressively documented and should be investigated further.

A limited explanation of specific mappings

First, it is not clear how the metaphorical/conceptual account can explain specific crossmodal associations—notably, why certain odors are mapped onto characteristics of musical stimuli such as timbre or pitch. If one follows the polarity-based principle, it seems difficult to understand the associations between olfactory dimensions or odors and timbre (e.g., piano vs. brass), as timbre is a qualitative or discrete feature that does not fit onto a polar scale. The preferential matchings of certain odors with certain instruments, such as piano or brass (Crisinel & Spence, 2012a), therefore need to be explained by other rules. Going one step further, even crossmodal associations between odors and high versus low pitch do not simply follow the principle of polarity matching: The choice of pitch for a certain odor has been shown not to be governed simply by a mapping with its intensity (e.g., the more intense the odor, the higher the pitch) or its pleasantness (e.g., the less pleasant the odor, the lower the pitch). As the qualities of odors are also likely to play a role (Belkin et al., 1997; Crisinel & Spence, 2012b), the principle of a polar mapping also fails to explain the process by which odors and pitch come to be associated. (It is important to note here that L. Walker et al., 2012a, did not test whether the same mappings as those generated through touch, vision, and audition would also apply to the case of olfaction.) Turning to the literature on metaphor, another hypothesis that has been advanced to explain the basis on which a certain crossmodal transfer occurs across sensory modalities has been to suggest that metaphorical mappings result from a tendency to use better-known experiences from the lower, more concrete, sensory domains to understand higher senses and more abstract domains (Lakoff & Johnson, 2003; Shen & Eisenman, 2008). 
The use of crossmodal adjectives is said to follow a law of “semantic transfer,” according to which the senses of touch, taste, and smell (i.e., what are also known as “proximal senses”) provide a source of attributes with which to characterize our experiences in the “higher senses” (i.e., the distal senses of audition and vision). A crucial source of support for this theory comes from the evidence that higher, more abstract senses are not used (metaphorically) to understand the lower senses (see Fig. 2). For example, people often talk about sweet sounds and warm colors, but not about loud tastes or colorful warmth. More concrete gustatory and tactile descriptors are applied to auditory and visual features, but the reverse is not true, or at least does not occur frequently.

Fig. 2

Schematic diagram illustrating the prominent direction in which the crossmodal transfer of sensory adjectives occurs (according to Williams, 1976). The directions of the crossmodal transfers in Williams’s original article are shown using solid lines, while the “transfers” that we suggest need to be added are displayed using dashed lines

However, the law of semantic transfer seems to have been particularly unfair to the sense of smell, which indeed occupies an intermediate position between the proximal and distal senses. Williams’s (1976) famous conceptualization accounts only for a regular transfer from taste to smell (as in the phrase a “sweet smell”), but ignores altogether the application of auditory, but also tactile and visual, attributes to smells, as in expressions such as “heavy smells,” “bright smells,” and “high notes” in a perfume.

Of course, Williams’s (1976) followers working in the field of “cognitive linguistics” (e.g., Shen & Eisenman, 2008) could, on this point, argue that the “lower-to-higher” rule captures a general tendency and not a systematic rule (the sort of thing that is not to be expected in language, anyway). The argument would hold if the evidence that metaphorical mappings tend to operate from lower to higher senses were based on large-scale observations across languages, rather than on the study of written or poetic corpuses, in which between 65 % and 80 % of the adjectives that are transferred from one sensory domain to another follow this direction (see Dombi, 1974, for Romanian; Shen, 1997, for Hebrew; Ullman, 1957, for French, English, and Hungarian; and Yu, 1992, for Chinese; similar results for Serbo-Croatian, Arabic, and Russian are also mentioned by Shen & Eisenman, 2008). Although better evidence and methods in the field of experimental or cognitive linguistics could strengthen the “lower-to-higher” rule, we believe that focusing on linguistic manifestations already encourages a confirmation bias in favor of the linguistic accounts.

The need to account for nonlinguistic manifestations and origins of crossmodal correspondences

This leads to the second reason not to be satisfied with the “cognitive linguistics” (or sometimes only linguistic) treatment of the crossmodal transfer of sensory attributes. Authors like Williams have certainly been open to ideas that “connections might exist among ontogeny, phylogeny, the neurophysiology of sensation, cognition, and naming” (Williams, 1976, p. 473), and what is more, that such connections might underlie crossmodal transfer, but their accounts have relied primarily on linguistic data. Methodologically, then, if, as Lawrence Marks has put it, “even if some perceptual metaphors might end up being mediated linguistically, their origins appear to be wholly in perception itself, starting with perceptual processes before being overlaid and dominated by linguistic ones” (Marks, 1996, p. 59), then the explanation should not just focus on those crossmodal transfers that emerge in language.

One cannot even assume that the sorts of crossmodal mappings that are observed at the linguistic level are necessarily similar to the more fundamental perceptual mappings in which, according to both Williams and Marks, crossmodal metaphors originate. The experimental investigation of the perceptual associations relevant to olfaction has grown in recent years, especially of the semantic or episodic-memory relations between olfactory stimuli and visual attributes (Demattè, Sanabria, & Spence, 2006, 2009) or between (orthonasal) smells and taste/flavor attributes (Stevenson & Boakes, 2004), both of which can have behavioral consequences in terms of odor discrimination or chemosensory detection. More associations have been demonstrated between smells and tactile perception—leading to, for example, the crossmodal effects of olfactory stimuli on the perceived textures of various materials that have been evidenced in recent years (Churchill, Meyners, Griffiths, & Bailey, 2009; Demattè, Sanabria, Sugarman, & Spence, 2006; Krishna, Elder, & Caldara, 2010; Stalmans, 2008). Associations with audition, in this sense, have not been investigated in anything like as much detail, with the exception of the work of Belkin et al. (1997) and Crisinel and Spence (2010, 2012b) on those associations that exist between pitch, timbre, and odor.

Explaining crossmodal correspondences between contingently related features

If neither metaphorical nor associative-learning accounts are capable of explaining cases of music–odor or geometrical shape–odor matching, we are left with a puzzle. Where do these associations come from, and what explains the fact that people converge on these associations? Taking up a suggestion made by Marks (1978) and Spence (2011a), we want to suggest that these associations might arise naturally from the organization of the perceptual system. Some crossmodal pairings are also grounded in facts about the structure of our perceptual system, such as innate neural connections (Maurer & Mondloch, 2005; see also Marks, 1978, 1987; Wagner & Dobkins, 2011) or common coding (e.g., magnitude-related dimensions such as loudness and size correspond, given that magnitude appears to be represented in the same way by the brain, regardless of the particular dimension under consideration; see L. B. Smith & Sera, 1992, and Walsh, 2003). This is where we consider that contingent associations between odors and notes or geometrical shapes turn out to be relevant—as they extend the category of crossmodal correspondences to associations having structural, and not just statistical, bases. In this sense, then, they help one to investigate some possible ways in which crossmodal correspondences are generated in the mind/brain, because of more internal determinants (Spence, 2011a; see also Marks, 1978). In light of the existing evidence, three hypotheses are worth pursuing. According to one interpretation, olfactory and visual or auditory sensory experiences can share certain dimensions—for instance, they have the same dimension of “elevation” in common (a hypothesis that can be called amodal, by reference to space being called an “amodal feature/dimension”; Giudice, Klatzky, & Loomis, 2009). To keep this hypothesis distinct from the conceptual–metaphorical interpretation, it is also important to stress that the amodal character does not come from a conceptual remapping or translation.

According to a second interpretation, some other indirect commonality (e.g., of pleasantness, fastness, etc.) explains why all sensory experiences that share this dimension correspond (we can call this the mediated hypothesis). Finally, a third interpretation is that correspondences are transitive—that is, if a feature or a dimension A corresponds to a feature or dimension B in another sensory modality, and B, in turn, corresponds to a feature or dimension C in a third sensory modality, our brains will generate a crossmodal correspondence between A and C (see Fig. 3).

Fig. 3

Three hypothetical explanations for the mapping of sensory attributes across sensory modalities, such as in the music–olfactory case: (a) The olfactory and auditory perceptions share some common amodal dimension (e.g., space), or (b) they have the same effect or connection to a third supramodal/independent dimension (e.g., emotional effect, or basic taste), or (c) a network of crossmodal correspondences between sensory features or dimensions establishes direct links based on statistical co-occurrence; if the correspondences are transitive, two non-statistically-corresponding dimensions can nevertheless be linked through crossmodal correspondences
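
The transitive interpretation can be made concrete with a minimal sketch. The Python below is purely illustrative and is not drawn from any of the studies cited here: the function name `transitive_links` is our own, and the features A, B, and C are the abstract placeholders used in the text, not empirical findings. It assumes that correspondences are bidirectional, treats each documented correspondence as an undirected link, and lists the pairs of features that are not directly linked but would become linked if correspondences were transitive.

```python
from collections import deque


def transitive_links(direct):
    """Return pairs linked only indirectly, via chains of direct correspondences."""
    # Build an undirected adjacency map (assumption: correspondences are bidirectional).
    graph = {}
    for a, b in direct:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    inferred = set()
    for start in graph:
        # Breadth-first search collects every feature reachable from `start`.
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # Keep only features that are reachable but not directly linked to `start`.
        for other in seen - {start} - graph[start]:
            inferred.add(frozenset((start, other)))
    return inferred


# Hypothetical input: A corresponds to B, and B corresponds to C.
direct = [("A", "B"), ("B", "C")]
print(transitive_links(direct))
```

With the two direct links A to B and B to C, the only inferred pair is A and C, which is the case described by the third interpretation. Whether the perceptual system actually computes anything like this closure is exactly what remains to be tested.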

In what follows, we will examine these three hypotheses in turn. Each of them, as we will show below, illuminates some important aspects of the underinvestigated phenomena of chemosensory crossmodal correspondences and suggests some more precise directions for further research.

The amodal hypothesis

It is common in scientific circles to refer to certain dimensions or aspects of sensory experiences as “amodal,” meaning by this term that the coding of such dimensions is common across various sensory modalities. This interpretation has been proposed for magnitude (see Walsh, 2003), space (Giudice et al., 2009), and time (van Wassenhove, 2009), but also, by earlier researchers, for sensory brightness (N. E. Cohen, 1934; von Hornbostel, 1931) and intensity (Lewkowicz & Turkewitz, 1980). This said, calling a dimension “amodal” might have to meet more exacting standards than just being perceptible by several senses. Currently, it seems to include the stronger requirement of having identified cortical areas where that dimension is represented in common among several sensory modalities (see, e.g., Walsh, 2003). Here, it is worth noting that in the case of olfaction and audition, nothing has been directly documented in humans, despite some prophetic claims about the neural basis for “smound” (that is smell–sound shared representations or connections) soon being discovered (Peeples, 2010, p. 28; see also L. Cohen et al., 2011; Wesson & Wilson, 2010, for evidence in mice).

Another idea that usually comes with amodal hypotheses is that the common representation will lead to interactions or integration between two sensory estimates coding for the same amodal dimension, usually with the dominance given to the most accurate or precise estimate (i.e., vision for space, audition for time, etc.). It is certainly difficult to bring to mind examples in which the spatial (or temporal, etc.) information provided by audition interacts with the spatial (or temporal, etc.) information provided by olfaction. Besides the cases that can be found in the animal literature (Skals, Anderson, Kanneworff, Löftstedt, & Surlykke, 2005), the main example would come from a correlation between smell and sound intensity when their common source gets closer to or farther from the perceiver (see Wright & Thomson, 2005). This relation gets expressed in crossmodal correspondences between olfaction and audition: Sounds of high volume are associated with a high concentration of an odor, whereas a low volume is associated with a low concentration of an odor (Persson, 2011).

Otherwise, though, the spatiotemporal structure of the olfactory experience is not a particularly well-investigated area of research, and therefore it would benefit from the generation of testable hypotheses. However, we can find arguments for a common spatiotemporal structure shared by olfaction and audition, which supports the amodal hypothesis.

Neither olfactory nor auditory experiences are spatially structured in the same way that visual experiences are. Indeed, some philosophers (Batty, 2009, 2010) have even claimed that olfactory experiences are not spatially structured at all, so that only a general smell, with a general “around my nose” location, can be experienced. Other researchers, however, have stressed that olfactory experiences, especially those coordinated with sniffing and body/head movement (Porter, Anand, Johnson, Khan, & Sobel, 2005; Porter et al., 2007), contain some spatially rich information (see also von Békésy, 1964). Both the egocentric frame of spatial reference and the fact that changes of spatial information are tightly connected with head movements that are required, given the fixed locations of the relevant sensory receptor organs (nose and ears) on the head (contrary to what happens with eye movements, which are to some degree independent of head movements), are traits that are shared by olfaction and audition. This might make a spatial mapping of olfactory experiences onto auditory space more appropriate than a mapping onto allocentric visual space.

There are also interesting similarities between the spatial/temporal experiences of perfumes and music. Olfactory and auditory stimuli are time-varying and necessarily (or most often, in the case of sound) evolve over time in a way that vision does not (or at least does not need to), as vision is sometimes static and unchanging. Another similarity comes from the fact that a musical piece can be experienced as complex and unified at one and the same time—that is, as a harmony of different instruments, all partaking in a single unified sequence. The fact that sounds come from different spatial locations (as when listening to music in a concert hall) and/or at slightly different times constitutes one of the main cues on which the distinction between various components is achieved. Differences in modally specific auditory quality (timbre, pitch, etc.) will also help the listener distinguish between, say, the various instruments in an orchestra. The same is also true in the case of perfume, in which different “notes” can be experienced as a single unified whole (perfume or bouquet), but also—in certain cases at least—as a mixture of various components.

This said, researchers have shown that people might not be so reliable when it comes to analyzing component smells in a mixture; at best, they only seem capable of distinguishing two or three odorants in mixtures (Jinks & Laing, 2001; Laing & Francis, 1989; Lawless, 1999). In the case of complex perfumes, the distinction between several components is likely to be achieved on the basis of a combination of spatial and temporal cues, as in the case of audition: Individuals track which odor components come first or second, and eventually from where, while also being helped by modally specific aspects in terms of odor qualities. The discriminatory strategy, then, starts to look analogous to the one used for pieces of music.

Does this explain why the first notes of a perfume would be “high” and the ones that follow lower? One possibility here comes from the combination of a temporal structure (the fact that certain odors or notes arrive first and might or might not persist, as new component odors come to be perceived) and a spatial dimension (as all smells enter the nasal cavity from below). Although it is not often highlighted in the literature, the latter fact is crucial to olfaction: Smelling a perfume is often done with the bottle held below the nostrils, grounding a perception (then helped by tactile, proprioceptive, and visual cues) of smells as also coming from below. More generally, and even if odors are not phenomenologically experienced as coming from below, humans have certainly evolved with a “natural constraint” to expect smells to come from the ground, as they expect, for instance, the illuminant to come from above (Adams, Graf, & Ernst, 2004). The persistent notes that are experienced first are progressively complemented by other notes, also emanating from below—thus perhaps encouraging the impression that smells “pile up” in the nose, with first notes being “pushed up” when new notes also come from below. Although admittedly speculative, this hypothesis gives rise to a number of interesting questions as to what would happen to the high/low differences if the source of the olfactory stimulus were to be visually or haptically presented from above the nose/head of the perceiver, or if perfumes were to be smelled by someone who was suspended upside down (or bent over with his/her head between his or her legs) and feeling smells coming from “above.” Here we have been talking mostly of space as a common external localization (with sounds coming from different localizations and smells coming from below, at different times and with effects of persistence). Another hypothesis emerges once one starts to think about the localizations of the sensory receptors themselves.

Research by Lapid et al. (2011) has suggested that the organization of the olfactory receptor surface itself might give rise to the axes of olfactory perception, as is the case for both vision and audition. By inserting an electrode into the human olfactory epithelium, these researchers directly measured odorant-induced evoked responses and found that locations that responded maximally to a pleasant odorant were likely to respond strongly to other pleasant odorants (and vice versa, for unpleasant odorants). This said, the experiment conducted by Lapid et al. was only performed for the pleasant–unpleasant dimension of smell, and would obviously need to be generalized to other dimensions in humans (for other animal studies demonstrating topological organization in the olfactory epithelium, see Johnson & Leon, 2007; Le Gros Clark, 1951). In this case, it could be argued that odors for which receptive surfaces partly overlap are recognized as being more similar (e.g., higher/lower). Once we consider this hypothesis, another question emerges as to the extent to which people are also sensitive to differences in localization when inhaling—analogous, for instance, to differences in localization on the tongue. This question also needs further exploration, as previous studies have suggested that those odors having a trigeminal component could be differently localized by subjects within the olfactory cavity (see von Skramlik, 1924, and Fig. 4). Just think, for example, of the distinctive localization of the wasabi hit that one sometimes gets when eating sushi; this experience is very much localized to the nose, while the flavor of the sushi itself is firmly localized within the oral cavity itself.

Fig. 4

Localization of various odorants in the nasal cavity. Odors with a trigeminal component are felt in a number of locations: for cold/fresh odors, in (2); for painful ones, in (3); and for warm odors in (4). Reproduced from “Über die Lokalisation der Empfindungen bei den niederen Sinnen [On the Localization of the Sensations From the Lower Senses],” by E. von Skramlik, 1924, Zeitschrift für Sinnesphysiologie, 56, pp. 69–140

The indirect hypothesis

An alternative hypothesis here involves a consideration of whether the connection between perfumes and music might be grounded in their emotional similarities. The notions of “high” and “low notes” become here causally efficacious when it comes to creating perfumes: Some of the most successful perfumers seem to be guided by this distinction when they choose which odorants to mix with which in order to create harmonious combinations:

Creating a fragrance is similar to composing music, because there is also a similarity in finding the ‘proper’ accords. You don’t want anything being overpowering. You want it to be harmonious. One of the most important parts of putting a creation together is harmony. You could have layers of notes coming through the fragrance, but yet you still feel it’s pleasing. (Sophia Grojsman, interviewed by Diane Ackerman, 1990)

So, the suggestion here is that perfumers at least are getting at some “emotional” prediction with the connection between music and smell.

In the audio–visual domain, Lyman (1979) investigated this emotional hypothesis using Köhler’s (1929, 1947) round (maluma) and angular (takete) figures. He demonstrated that people reliably assigned emotional terms to each shape (23 out of the 40 terms used by the participants were classed as emotional) but also associated the positive terms with the round figure 83 % of the time, and the negative terms with the angular figure 75 % of the time. This hypothesis has also been pursued by Collier (1996) and, more recently, by Palmer and Schloss (2012) in an attempt to explain the robust crossmodal associations that they were able to document between colors and music (see also Schifferstein & Tanudjaja, 2004). Kenneth (1923) has been among the few to defend a version of this hypothesis in the olfactory domain, suggesting that what he called “indirect associations” between smells and music might come from “the affect produced by smell [being] similar to the affect produced by some other stimulus” (Kenneth, 1923, p. 77). Associations between, say, “vanillin and Chopin” (ibid.) would then not result from a joint encoding in episodic memory, as was postulated in the case of olfactory–visual associations documented as “Proust phenomena,” but would instead be mediated by their emotional similarity. The difference here matters, in terms of the prevalence of these phenomena: Interpreted as indirect associations, odor–music pairings are likely to be shared among individuals (or at least to make sense to several individuals, if odor and music pleasantness are shared), whereas associations grounded in episodic memory are more likely to be idiosyncratic.

Seo, Arshamian, et al. (2010, p. 176) suggested that the explanation for the convergent olfactory–shape mappings documented in their study might come from similarities between the emotions elicited by the two stimuli:

The odors generally regarded as being pleasant (e.g., vanilla, banana, violet, honey melon, and mint) were paired with circle- or curve-shaped symbols. Whereas, the odors judged generally as being unpleasant (e.g., parmesan cheese, truffle, and pepper) were paired with square- or angular-shaped symbols.

(see also Crisinel et al., in press, for other examples with music and smells). This said, the very characterization of truffle or pepper smells as unpleasant can be questioned, as these smells can be appreciated (in fact, they are highly valued) when presented in the appropriate food-related situations. What most of these odors perhaps have in common is a higher trigeminal component, which explains their shared association to angularity (although less clearly, as the smell of mint typically also contains a trigeminal component).

The hypothesis of an indirect, emotional mapping can illuminate the fact that the joint presentation of a smell and congruent sounds leads to an increased liking for the smell (Seo & Hummel, 2011). Schifferstein and Tanudjaja (2004) also found similar results for color–smell mappings: Participants in their study demonstrated nonrandom matchings between odors and colors, with the colors differing mainly in terms of their brightness, and less on the dimensions of saturation and hue. Here, though, a consistent negative relationship between odor–color degree-of-fit ratings and differences in the pleasantness of odors and colors also suggested a role for emotional associations. This shows why the present emotional hypothesis is altogether different from Lakoff and Johnson’s (2003) idea that the application of “high” to certain notes derives from a general metaphorical connection between good moods and upward direction: High notes in perfumes are not necessarily any more pleasant than the lower ones, and various mappings between high or low smells and musical notes are pleasant. This also suggests the need to explore the possible role of other emotional connections besides pleasantness—for instance, the relaxing or exciting dimensions of both smells and music. What seems to drive the connection further is the idea that both complex perfumes and complex musical pieces give rise to a sense of pleasant harmony with respect to certain rules of balance.

However, a series of problems also arise from the emotional account. First, from a practical point of view, it is not easy to distinguish between the crossmodal correspondences relating smells and auditory features and conditioned associations, which are also grounded in emotional reactions and could lead to very specific pairings between, say, a certain aversive smell and sound. A second problem with the emotional account is that it still fails to explain the already-mentioned correlation between high notes in a perfume and the chemical, objective properties of volatile odiferous compounds. Furthermore, Belkin et al. (1997) did not find any correlation between the pleasantness of smells and the chosen matching musical notes, suggesting that emotion was not the main factor underlying the crossmodal correspondences that were documented in their study. True, in Crisinel and Spence’s (2012b) study, emotions explained part of the mapping, as the sounds of a brass instrument were preferred for unpleasant olfactory stimuli, while the sound of a piano was associated with pleasant odors. However, not all such crossmodal associations are driven by the matching of the pleasantness of the component stimuli: For instance, while they differed in their choices of musical instrument, dark chocolate likers and dislikers gave similar responses for the pitch that they associated with the dark chocolate samples that they tasted (Crisinel & Spence, 2012c; see also Belkin et al., 1997, for similar results with odors). The piano was typically not chosen by participants for odors rated as more complex, and higher intensity ratings led to higher proportions of participants choosing brass instruments. 
This said, it would be interesting to test the importance of pleasantness in determining olfactory–musical note or olfactory–timbre associations, by giving people the very same smell under two different labels that would change its perceived pleasantness (e.g., iso-valeric + butyric acid as “body odor/vomit” or “cheddar/parmesan”; see De Araujo, Rolls, Velazco, Margot, & Cayeux, 2005; Herz & von Clef, 2001).

Osgood et al. (1957) suggested, for instance, that other dimensions besides emotional hedonicity (or valence) explained the crossmodal mapping between auditory and visual features, and also highlighted that it was at least necessary to add activity (fast–slow) and potency (strong–weak) to hedonic value in order to adequately account for most people’s responses. Auditory–olfactory matchings should deliver interesting evidence as to the generalizability of Osgood et al.’s model here: Do fast auditory rhythms correspond better to the most volatile, and hence the more rapidly detected, olfactory notes, and slow rhythms to the less volatile notes that are perceived more slowly and subsequently? What matters more here is that this model relies on supramodal (good–bad) as well as amodal dimensions (time), suggesting that the explanation of what on the surface seems to be a unitary crossmodal correspondence could be due to a mixture of various underlying phenomena that are not all of the same sort.

A second hypothesis can be introduced here, which is based on stimulus intensity. Loudness, for instance, is associated with the intensity of a perfume (as was famously done by John Donne: “A loud perfume, which at my entrance cried / Even at thy father’s nose, so were we spied”; Donne, 1633/1971, p. 98). One possibility here could be that more intense odors are mapped onto louder sounds. Another hypothesis, and one that remains to be tested, is that “high notes,” being the most volatile and being detected first, are also more clearly and intensely perceived, as the other smells do not interfere with them (one could consider this a kind of “prior entry”; Spence & Parise, 2010). The more intense initial olfactory stimuli could, then, be mapped onto greater height. One way to test this hypothesis more systematically might be to see whether the same smell, delivered in one case first or more intensely, and in another case second and/or less intensely, would be differentially matched to high notes or low notes. Note that such an experiment would need to be carefully controlled, given that certain odors are perceived as being qualitatively different when presented in different concentrations (see Gross-Isseroff & Lancet, 1988). Moreover, Belkin et al.’s (1997) results suggest that, at least in the case of single odorants (and not complex mixtures), the mapping of a smell to a higher–lower pitch is indifferent to the order in which various odorants are presented.

The transitivity hypothesis

The review of the previous hypotheses might legitimately lead one to wonder whether the process underlying the perceptual mapping of smells to musical attributes can ever receive a simple answer. What seems to be true is that a series of intuitive or underlying similarities can all explain part of the mapping or some specific results obtained in distinct studies, but no unique or specific rule seems to emerge from these various studies. This might help us to draw a more nuanced lesson—which will at least prevent bold claims about all crossmodal transfers being governed by single or simple rules that could be read directly from the dominant transfers from most concrete to less concrete (see Lakoff & Johnson, 2003; Williams, 1976). However, the complexities that characterize the crossmodal associations noted between olfactory, and apparently arbitrary, auditory or visual attributes can reveal the existence of an underlying property of correspondences—that is, their transitivity. Transitivity is an extensively investigated feature of associative learning, which is demonstrated when individuals trained to independently associate one stimulus to two distinct stimuli (A to B and A to C) then respond differently to the pairing of these two stimuli (B and C). In other words, participants respond to the indirect relation between stimuli through their direct relation to a common stimulus (see Fields, Verhave, & Fath, 1984, for a review). Although transitivity has mostly been used to explain equivalences between two stimuli or objects, there would seem to be no reason why it could not also be applied to explain crossmodal correspondences between specific dimensions of these objects or stimuli. Transitivity can explain why certain associations come to generalize and be established beyond what has been experienced. 
In the case of olfaction, the matches between pitch and angularity, between angularity and tastes (but also complex flavors: Deroy & Valentin, 2011; Spence & Gallace, 2011; see Spence & Ngo, 2012, for a review), and, in turn, between these tastes and a certain smell can explain why pitch is ultimately matched to tastes/flavors and smells. The explanatory burden is then to account for each of these steps (some of which can be explained in terms of correlated co-occurrence of features in the environment; e.g., pitch and angularity in the case of speech sounds, or sweetness and roundness through the frequency of round fruits), and not for what seems to be a gap between pitch and smells.

A benefit of the transitivity hypothesis is that it predicts that the “mediated” correspondence will probably be weaker than the crossmodal correspondences acquired via direct co-exposure, and not have the same behavioral consequences. This hypothesis is also more true to the phenomena than the prediction derived from the semantic hypotheses (Martino & Marks, 1999; P. Walker & Walker, 2012) that all dimensions might ultimately be placed in some sort of correspondence via a general conceptual mapping. The transitivity hypothesis does not predict a consistent alignment from one dimension/modality to another: Many examples simply do not follow this rule; for example, high pitch corresponds to “small” but high elevation is “more,” and low pitch is “large,” but decreasing pitch “shrinks” (see Eitan, Schupak, Gotler, & Marks, 2013; Eitan & Timmers, 2010). In the olfactory domain, we need to explain why the correspondences between fruity odors and high pitch, and fruity odors and round shapes, do not map onto the correspondence between high-pitched speech sounds and angular shapes, noted in the audio–visual domain, while the two domains probably match regarding brightness (as higher pitch corresponds to both bright smells and bright visual surfaces).


Olfaction remains perhaps the least well understood of the senses (Keller & Vosshall, 2004), and researchers still need to investigate further how it interacts with the other senses. The crossmodal mapping between odors and the sensory attributes belonging to other modalities is a widespread and long-standing tendency in many Western cultures, and probably beyond. While some of these mappings can receive an explanation in terms of associative learning (e.g., “sweet smells”; Stevenson & Boakes, 2004), the more surprising among them are difficult to explain (see Stevenson et al., 2012). Here we have focused on mappings between odors and auditory attributes such as pitch or timbre, as well as on recent results highlighting the existence of crossmodal mappings between odors and geometrical or symbolic shapes. Thinking about these phenomena in terms of crossmodal correspondences extends the consideration of crossmodal correspondences that directly pick up on statistical regularities that are present in the environment (Ernst, 2007; Spence, 2011a) to contingent correspondences established via amodal, indirect, or transitive connections, as well as to the consideration of matchings involving whole objects (as some insist that olfactory objects are; see Wilson & Stevenson, 2006), and not specific dimensions or features of those objects.

The difficulty of providing a single or straightforward explanation for the problematic matchings here should not encourage one to go back to the idea that the “high notes” in perfume necessarily originate in people’s metaphorical or conceptual tendencies. In terms of explanation, we reckon that multiple roots likely underlie this particular class of crossmodal correspondences, not in the sense of competing general explanations, but in the sense of cumulative and mutually reinforcing effects. One possibility here is that amodal, indirect, and transitive factors shape these correspondences, with no easy way to study how these various factors interact.

A benefit of the present account comes, then, from proposing that attributions of “sweetness” (Stevenson & Boakes, 2004), pitch elevation (Belkin et al., 1997; Crisinel & Spence, 2012b), hue (Gilbert, Martin, & Kemp, 1996), brightness (Schiller, 1935), or shape (Hanson-Vaux et al., 2013; Seo & Hummel, 2011) can be explained in terms of perceptual processes grounded in exposure and structural determinants, instead of being divided between “synesthetic” connections, on the one hand (Stevenson & Tomiczek, 2007), and conceptual or metaphorical transfers, on the other. It also crucially expands consideration of the crossmodal associations elicited by olfactory cues beyond the idiosyncratic evocations studied as the “Proust phenomenon” (e.g., Chu & Downes, 2002; Willander & Larsson, 2006, 2008).

The next stage of research, beyond the continuing investigation of individual crossmodal correspondences between odors and other sensory attributes and the study of their underlying mechanism(s), should probably be to try to understand how these correspondences combine, or which comes to dominate/be activated in a specific context, and how they might modulate or interact with other crossmodal relations, such as the Proustian elicitation of visual images by olfactory stimuli (e.g., Chu & Downes, 2002; Saive, Ravel, Thévenet, Royet, & Plailly, 2013). Neuroimaging studies and transcranial magnetic stimulation (TMS) might help in future research to determine whether all of these correspondences are indeed on a par, neurologically speaking (see Bien et al., 2012; Sadaghiani et al., 2009; Spence & Parise, 2012).

Besides these theoretical aspects, crossmodal correspondences constitute unexpected but promising paths to explore in terms of correcting for the partial or total loss of smell that has been observed in the aging population and beyond. Cooke and Myin (2011), for instance, considered that sounds could be used as interesting substitutes for smells in sensory substitution, with sounds being made more or less intense, depending on sniffing rate, and timbres corresponding to odor qualities and pleasantness. Correspondences between smells and geometrical shapes (or auditory pitch) could also be used to improve olfactory training programs (Hummel et al., 2009) and product experience (Crisinel et al., in press; Spence, 2013).