1 Introduction

Perceptual similarity has been recognized as a crucial issue in Western philosophy since the time of the Ancient Greeks (Bryan 2012).Footnote 1 For Plato, for example, similarity was a key organizational factor of perceptual experience, one that grounded the ability of humans to categorize objects into different classes, that is, perceptual similarity is considered key to concept formation (e.g., see the Parmenides, Allen 1997, and Ryle 1939a, b, for authoritative readings; Williams 2002). In the Republic, Plato argued that many particular A-things are perceived as one if they are regarded as instantiating a universal A-ness (Republic, 596a, Reeve 2004). This idea assumes some sort of similarity between A-things and A-ness. For instance, Greyhounds and Basset hounds are themselves similar because each of them constitutes an instantiation of ‘Dog-ness’.Footnote 2 From Plato onwards, the topic of similarity has been debated throughout the history of Western thought, among others by Descartes (Gorham 1999), Leibniz (1923, A64 107/P 13), Goodman (1972), and Quine (1960, 1969, 2000; see also Churchland 1976).Footnote 3

The concept of similarity has also been investigated outside philosophy, often being regarded as a crucial prerequisite for human cognition (e.g., Gentner and Medina 1998; Goldstone and Barsalou 1998; Goldstone and Son 2012; Segundo-Ortin and Hutto 2021). Warning about the risks related to overriding philosophical issues in research on similarity,Footnote 4 the psychologist Linda Smith (1993) acknowledged that similarity is the source of learning, and is crucial even for the simplest of cognitive abstractions: “If we teach a girl to call her collie dog, she will call Labradors dog, and perhaps goats dog, but she will not call a motor scooter dog” (Smith 1993, p. 223, italics in the original). Much of the appeal that has been attributed to similarity relies on what the same psychologist suggested as essential to similarity, that is, being hybrid perceptual-conceptual in nature: “I propose that similarity is a complex and diverse set of processes that in their mutual interactions yield both a system of perceptual comparison that is inherently creative and a unitary concept of same that transcends specific perceptual features” (Smith 1993, p. 223). Stressing the “creativity” of perception, Linda Smith seemingly points to a crucial aspect concerning similarity comparisons, suggesting that the perceptual processes underlying sensory similarity aren’t completely constrained (e.g., they might admit of individual differences) and thus are, at least to some extent, open (we will deepen this aspect of the discussion in Section 4).

Many psychologists have stressed the phenomenological evidence that similarity is central to many different domains of human experience, from sensory perception through to abstract cognition (e.g., Goldstone and Barsalou 1998; Medin and Ortony 1989; Medin et al. 1990, 1993; Rosch and Mervis 1975; Smith and Medin 1981). Phenomenological similarity is important for the categorization and identification of objects themselves (e.g., think of shape similarity for vision, see Biederman and Ju 1988; Rosch et al. 1976; Tversky and Hemenway 1984). In his foundational paper, Amos Tversky (1977, p. 327) wrote that: “Similarity plays a fundamental role in theories of knowledge and behavior. It serves as an organizing principle by which individuals explain and classify objects, form concepts, and make generalizations. Indeed, the concept of similarity is ubiquitous in psychological theory. It underlies the accounts of stimulus and response generalization in learning, it is used to explain errors in memory and pattern recognition, and it is central to the analysis of connotative meaning.” The following year, at the opening of his book The unity of the senses, the influential experimental psychologist Lawrence Marks acknowledged the importance of perceptual similarity for the field of sensory psychology: “The theme of this book is similarity among the senses. While writing the book, I became aware (at first only dimly, but gradually more clearly) of the importance and scope of the very concept of similarity itself” (Marks 1978, pp. x-xi).

In the psychological literature, it is commonly accepted that similarity might exist between pairs of stimuli presented within the same sensory modality (e.g., Blank and Mattes 1990; Ekman 1954; Ekman et al. 1964; Shepard 1962, 1974; Tversky 1977). The majority of the studies in the literature on sensory similarity reference vision (e.g., Logothetis and Sheinberg 1996; though see Spence 2022a, for an isolated exception); this bias is not unexpected, given the well-known primacy of vision in Western culture (Classen 1997; Hutmacher 2019; Jenks 2002; see also Levin 1993). At the same time, however, talking about sensory similarity between pairs of stimuli presented in different sensory modalities would appear to be a much more controversial topic (e.g., Helmholtz 1878/1971; Marks 1978). As Marks (1978) pointedly observed: “There is no way that we can gradually modulate a chord played on a piano until it becomes indistinguishable from the fragrance of a rose, just as there is no way the sight of the pen can become the feel of it” (Marks 1978, p. 188). This quotation seems to complete the previous one, thus conveying the following idea: assuming that two stimuli presented to different senses cannot be conceived in terms of mere identity, some degree of similarity needs to be admitted to account for sensory experience. However, the question of how to explain similarity in the crossmodal domain has attracted, and continues to attract, a diverse range of theoretical responses (Spence 2022a, for a recent review on perceptual similarity in the chemical senses).

As will become clear, the issue of perceptual similarity raises a number of questions, which have both philosophical and psychological implications, such as: what are the origins and properties of perceptual similarity? Is perceptual similarity ‘of a single sort’ or are there multiple, intrinsically different kinds of perceptual similarity? If there are different kinds of perceptual similarity, what are they and how do they differ? In this paper, we try to address some of these questions by analyzing the notion of similarity with some insight from psychological works on crossmodal associations, a phenomenon which has seemingly received less attention than others in the literature on similarity (e.g., categorical perception, see Goldstone and Hendrickson 2010). The goal is to ascertain whether similarity can be perceived across the senses or rather is inferred as a result of some form of cognitive mediation and, relatedly, whether the similarity relationship can be understood in terms of the sharing of phenomenological or structural properties. To answer these questions, we will restrict our interests to similarity judgements for pairs of stimuli that happen to be presented in different sensory modalities, that is, in the crossmodal domain.

In a nutshell, we propose that what is generally referred to as “perceptual similarity” deals with a wide variety of processes that vary, depending on the context. In some cases, the similarity relationship is established on the basis of the contents of sensory apprehension; in others, the similarity process originates from perception but is finally accomplished at a cognitive level. To consider the effect of exposure and learning on the perception of similarity, we conceive this distinction in terms of a continuum, thus admitting that some similarities that are based on sensory apprehension might be altered by cognitive information as well as vice versa, that is, some cognitively established similarities might become perceptual following repeated exposure. At the same time, however, we also acknowledge that the continuum is not arbitrarily flexible, and we evoke the concept of (cognitive/perceptual) impenetrability to account for those phenomenal(/conceptual) similarities that would appear resistant to any cognitive(/perceptual) information.

2 Perceptual Similarity Across the Senses

The possibility of perceiving similarities across the senses has long been debated by psychologists/psychophysicists. For instance, in the nineteenth century, the eminent psychophysicist Hermann Ludwig von Helmholtz was skeptical in this regard, stating that: “The distinctions among sensations which belong to different modalities, such as the differences among blue, warm, sweet, and high-pitched, are so fundamental as to exclude any possible transition from one modality to another and any relationship of greater or less similarity. For example, one cannot ask whether sweet is more like red or more like blue. Comparisons are possible only within each modality; we can cross over from blue through violet and carmine to scarlet, for example, and we can say that yellow is more like orange than like blue!” (Helmholtz 1878/1971, p. 77).Footnote 5 Others, like Marks, quoted earlier, have expressed apparently opposite ideas: “Much as the color aqua is more similar phenomenologically to cerulean than to pink, the flavour of lime more similar to lemon than to banana, so too are low notes played on a bassoon or an organ more like dark colors such as brown or black than bright colors such as yellow or white, while the higher notes played on clavier or a flute resemble yellow or white more than brown or black” (Marks 2011, p. 52).Footnote 6

The two quotes would appear to have different implications for sensory processing. While Helmholtz suggests that different sensory information is mutually exclusive, meaning that the same information cannot be picked out from/processed by two sensory systems, Marks believes that the different senses work in close communication and are rather to be considered as a potentially unified, or unifying, processing system. Despite their opposite tone, however, both quotes converge on identifying the field of crossmodal associations – conceived of as the deliberate and consistent matching between sensory stimuli, attributes, or dimensions from different sensory domains that are observed in normal (i.e., non-synaestheticFootnote 7) people (see Marks 2004; Spence 2011 for a review) – as one of the natural domains in which to investigate the nature of perceptual similarity. Marks clearly sees the phenomenon of crossmodal matching as illuminating (cross-sensory) similarity. Commenting on some of his own findings on pitch-brightness associations, Marks writes: “In a cross-modality matching task, for example, virtually all subjects will set higher sound frequencies to match greater visual intensities (Marks 1974, 1978), thereby revealing a universal appreciation of similarity between the dimension of pitch on the one hand and that of brightness on the other” (Marks 1989, p. 58). The link between crossmodal matching and similarity is incidentally suggested by Goldstone and Barsalou (1998), where the authors stress the importance of people’s natural tendency to link distinct sensory domains, namely crossmodal matching, in the case of analogical reasoning (cf. Goldstone and Barsalou 1998, p. 253).

Going back to the earliest documented audiovisual associations (Köhler 1929, 1947), for example, the fact that the terms ‘baluba’ and ‘takete’ are associated with curved and angular lines, respectively, has been explained in terms of sound symbolism, which might, in turn, be related to the existence of some sort of sensory similarity between the pseudowords and the curvilinearity of the shapes (e.g., Bremner et al. 2013; Margiotoudi and Pulvermüller 2020; Passi and Arun 2022; Sidhu et al. 2021). Or, moving on to research on crossmodal associations between scent and sound, one might attempt to explain the observed associations between blackberry and piano, musk and brass, or fruit odours and high-pitched notes hypothesizing that the paired stimuli share some sensory characteristics (Crisinel and Spence 2012; Deroy et al. 2013; see also Piesse 1867 and Di Stefano et al. 2022, Spence 2022a, b). At the same time, however, the exact nature of similarity relationships in these associations remains unclear (see also Belkin et al. 1997; Cohen 1934; Hartshorne 1934).

Marks (1987) suggested that similarities can appear in diverse forms in cross-sensory perception, namely informationally, psychophysically, and phenomenologically. ‘Informationally’ refers to the fact that different sensory systems can provide information about the same quality, e.g., the shape of a cube through sight or touch (e.g., Gibson 1966). ‘Psychophysically’ refers to the functional similarities in the ways in which sensations and perceptions depend on how certain stimulus parameters scale (e.g., intensity, qualitative structure, and distribution in space and time, see Stevens 1961; Von Békésy 1967). For the time being, in line with previous researchers, we will refer to these stimulus parameters in terms of amodal qualities and briefly present them in §3.1 (though for a broader discussion of the concept of amodality see Spence and Di Stefano, submitted). Finally, ‘phenomenologically’ refers to those similarities that are directly perceived between qualities of perceptual experiences in different sensory modalities (Marks 1978; Spence 2022a).

Attempting to explain the wide range of observed associations between pairs of stimuli presented in different sensory modalities, psychologists have elaborated various hypotheses, each assuming, or conveying, a different view of similarity, especially regarding its relationship with sensory perception. For example, the idea of relative positioning is based on the concept of structural, rather than perceptual, similarity.Footnote 8 The structure, in this case, is that of the stimulus dimension rather than the stimulus itself. According to this view, perceivers can establish a connection between stimuli that are represented along sensory dimensions that share structural properties, such as their relative position along a prothetic (as compared to metathetic) sensory dimension (Stevens 1957).Footnote 9 However, perceivers can experience crossmodal correspondences between prothetic and metathetic dimensions too (e.g., as in the case of the frequently-studied size-pitch correspondence; Gallace and Spence 2006; Parise and Spence 2009). Moreover, crossmodal mappings have frequently been documented in the absence of any obvious phenomenological sense of perceptual similarity (e.g., Crisinel and Spence 2012), thus forcing researchers to search for hypotheses that are not based exclusively on (any sort of comparison between) sensory inputs. One of the most widely accepted accounts (of crossmodal correspondences), namely the emotional mediation hypothesis, holds that it is the similarity between the affective meanings associated with sensory stimuli that links the stimuli presented across sensory modalities (see Spence 2020a, b, for a review; Palmer et al. 2013, on colour-music association; Di Stefano et al. 2022 and Spence 2022a, b, on olfaction and literature/music).Footnote 10 In the next section, therefore, the various accounts of crossmodal associations that have been put forward in the psychological literature, and that have implications for the idea of perceptual similarity are briefly examined. As will become clear, several philosophical issues are at stake, such as the perceptual vs. cognitive nature of similarity relationships, the putative role of shared perceptual structures/intersensory qualities that might mediate similarity (e.g., isomorphism/amodal quality), and the role of affective factors (e.g., emotional meanings) in establishing similarity relationships.

3 Similarity in the Major Accounts of Crossmodal Associations/Correspondences

According to Mellers and Birnbaum (1982), there are two major accounts of cross-modality matching, namely mapping theory and relation theory. The former holds that psychophysical values of stimuli from different continua can be compared directly because they are mapped onto a common scale of sensation. For instance, a cross-modality match in the intensity domain likely occurs when sensations of equal perceived strength are elicited by stimuli on different continua. According to relation theory, relationships (e.g., ratios) between pairs of stimuli from different continua are compared. For example, while it would be difficult to compare pitch and brightness directly, it seems possible to compare the ratio between pitches with the ratio between brightnesses. While mapping theory assumes the existence of a perceptual, crossmodal link, relation theory holds that matching across senses is rooted in the way in which a stimulus dimension is perceived unimodally (with respect to scaling within each sensory modality), with no direct implications for the nature of crossmodality judgements.
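The contrast between the two accounts can be made concrete with a small numerical sketch. It assumes, purely for illustration, that sensation magnitude in each modality follows a power function of stimulus intensity; the function names, constants, and exponents are hypothetical placeholders and are not taken from Mellers and Birnbaum (1982).

```python
def sensation(phi, k, beta):
    """Sensation magnitude under an assumed power function (illustration only)."""
    return k * phi ** beta


def mapping_match(phi_a, k_a, beta_a, k_b, beta_b):
    """Mapping theory: return the intensity in modality B whose sensation
    equals that of phi_a on a common scale of sensation."""
    psi = sensation(phi_a, k_a, beta_a)
    # Invert psi = k_b * phi_b ** beta_b to recover phi_b.
    return (psi / k_b) ** (1.0 / beta_b)


def relation_match(phi_a, ref_a, ref_b, beta_a, beta_b):
    """Relation theory: return phi_b such that the sensation ratio of
    phi_b to ref_b (within B) equals that of phi_a to ref_a (within A).
    Only within-modality ratios are compared, so the scale constants cancel."""
    ratio = (phi_a / ref_a) ** beta_a
    return ref_b * ratio ** (1.0 / beta_b)
```

In this sketch, `mapping_match` needs the scale constants of both modalities, because it compares sensations on a shared scale, whereas `relation_match` needs only a reference pair and the two exponents. If the reference pair is itself a mapping match, the two functions return the same value, which is one way of seeing why matching data alone may not easily discriminate between the two accounts.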

Given that most experimental protocols investigating crossmodal perception have had participants match stimuli across different senses (e.g., first presenting a stimulus in one modality and then asking which of the various stimuli from a different sensory modality matches best), findings obtained in this way might be more straightforwardly interpreted in terms of mapping theories, rather than relation theories. Moreover, the latter seemingly work only for what Stevens (1957) defined as ‘prothetic dimensions’ (see footnote 9). However, in order to account for specific perceptual matches, such as those between odours and sounds, which involve metathetic dimensions (i.e., pitch), researchers have formulated additional hypotheses, such as analogical mapping or affective mediation (Spence 2020a). Given their allegedly universal nature, crossmodal correspondences have been conceived in terms of the compatibility between attributes or dimensions of a stimulus in different sensory modalities (see Spence 2011, for a review).Footnote 11 Evidence supporting this phenomenon comes from both speeded classification tasks (Brunel et al. 2015; Evans and Treisman 2010; Marks 1987) and unspeeded psychophysical tasks (Gallace and Spence 2006; Parise and Spence 2009).

In what follows, we briefly present five major accounts of crossmodal similarity, namely, amodal (or suprasensory), relative positioning, emotional mediation, analogical mapping, and statistical learning (see Table 1). It is worth noting that these accounts of crossmodal correspondences should not be considered as mutually exclusive and they may all have some degree of validity, or relevance, in terms of explaining consensual matches.

Table 1 The major accounts of crossmodal associations and their implications for similarity

3.1 Amodal, or Suprasensory, Qualities

Literally meaning “without” modality (see Bahrick 2009), the term ‘amodal’ is often taken to mean that the same information can be picked up regardless of the sensory source, or modality, by which that information was acquired (Walker-Andrews 1994; and see Spence and Di Stefano, submitted, for a critical review). Assuming the existence of amodal qualities, thus, the crossmodal association between A and B, with A and B being stimuli pertaining to two different sensory domains, might be straightforwardly explained by observing that the same sensory quality X is perceived in both stimuli/objects. Qualitatively distinct and dissimilar sorts of sensation, say of sight and sound, may thus link up because both share a perceptual aspect, for example, they are equally intense. This view suggests that the various sense modalities might share a few properties of sensation that might be called ‘suprasensory’, meaning that those categories or dimensions of experience are not limited to a single sensory modality, but rather, can be applied to most or to all modalities. Among those properties, duration is one of the most thoroughly investigated, with early studies demonstrating that observers' ratings of duration tend to be highly consistent across the senses (e.g., Loeb et al. 1966; Marks 1987). Moreover, magnitude, size (extension), and brightness have often been considered as amodal properties (see Spence and Di Stefano, submitted). In addition to properties, it might be worth mentioning here psychological principles or laws that are assumed to be valid for different sensory domains, such as Weber’s (1978) law of discriminability and Stevens’s (1957) law of sensory intensity.
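For reference, the two laws just mentioned are usually written in the following textbook forms. The symbols are introduced here for illustration only and are not taken from the sources cited: $I$ is stimulus intensity, $\Delta I$ the just-noticeable increment, $\psi$ the sensation magnitude, $k_W$ and $k_S$ constants, and $\beta$ an exponent that differs from one sensory continuum to another.

```latex
\[
  \frac{\Delta I}{I} = k_W
  \qquad \text{(Weber's law)}
\]
\[
  \psi = k_S \, I^{\beta}
  \qquad \text{(Stevens's power law)}
\]
```

What makes such laws relevant to the amodal account is that the same functional form is assumed to hold across modalities, with only the constants and the exponent varying.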

One of the earliest empirical studies of putatively amodal perceptual dimensions was published by Von Hornbostel (1931). He investigated brightness as ‘a universal dimension of sensory experience’, conducting a study in which three participants matched different sounds (different pitches) and different scents to greyscale values. The transitivity of the resulting matches led von Hornbostel to infer that ‘sensory brightness’ is common to all of the senses (i.e., ‘allen sinnlichen Erscheinungen gemeinsame Eigenschaft’, p. 519, ‘property common to all sensory phenomena’, or ‘eine intermodale Eigenschaft’, p. 537, ‘an intermodal property’), and hence a universal dimension of sensory experience. Given that neither the resemblance between pitch and brightness nor that between loudness and brightness rests on any property common to the objects that generate the experiences (i.e., sound emitting source and luminous objects), calling a high-pitched or a loud sound bright says nothing intrinsic about a connection between the source objects. In some fundamental sense, the similarities between pitch and brightness and between loudness and brightness reside in shared features of sensory processing. With respect to Marks’s (1987) above-mentioned distinction, therefore, the amodal account grounds crossmodal associations on psychophysical similarity.

The findings of a recent audiovisual study seemingly suggest that roughness might be considered as an amodal quality. Giannos et al. (2021) hypothesized that non-tonal and highly dissonant harmonic stimuli would be associated with rough images, while more consonant stimuli would be associated with images of low visual roughness. To test this hypothesis, the authors harmonized the same melody in seven styles (which varied in the use of consonances and dissonances) and asked their participants to associate the melodies with images of variable roughness. The latter were black and white 2D and 3D images that represented surfaces with different degrees of smoothness/roughness. Showing that participants tend to consistently associate auditory dissonance with visual roughness, the results provide support for the amodal nature of roughness perception, at least if defined in terms of the pick-up by two or more senses (see also Liew et al. 2017, 2018; Ludwig and Simner 2013; Slobodenyuk et al. 2015; and Di Stefano and Spence 2022 for a review).

3.2 Relative Positioning or Structural Alignment

The existence of amodal dimensions (‘intermodale Eigenschaft’) has been questioned since Von Hornbostel published his findings, and is still being debated (Spence and Di Stefano, submitted). The psychologist Cohen (1934), for instance, argued that crossmodal mappings were actually better conceptualized as relative/relational judgements (cf. Hartshorne 1934, for an extended discussion on this theme). That is, according to Cohen (1934), Von Hornbostel’s experimental stimuli were ‘analogous’ rather than ‘identical’. The psychologist explained as follows: “It would not be unreasonable then to suppose that cross-modality comparison should be based (physiologically, if not introspectively) upon relative positions within different ‘absolute’ scales. According to this view, equation with respect to brightness of two experiences of different modalities would involve nothing more than the identity of relative positions upon two wholly independent scales.” (Cohen 1934, p. 119). A similar view was put forth by Gombrich (1960), who suggested that crossmodal similarity might be better explained in terms of ‘structural relationships’ within sensory systems rather than the similarity of specific intersensory elements.Footnote 12

According to this view, which might be labelled as ‘relative positioning’ or ‘structural alignment’, what is being compared when someone makes a similarity judgement is the relative position of the two sensory features in their own scales, with no implications regarding their actually being similar across the scales/dimensions (cf. Cohen 1934; Marks et al. 1986; Mellers and Birnbaum 1982; Moul 1930; Simpson et al. 1956, p. 100; Stevens 1957). For example, dark colours can be matched to lower keys of the piano, based on the common position they have on their respective unisensory scales, but such a match would not necessarily imply that dark colours are in any intuitive way perceptually similar to low-pitched tones. Relatedly, certain mappings between different sensory stimuli have also been explained in terms of ‘polar dimensions’ (e.g., Proctor and Cho 2006). The suggestion here is that stimuli might be coded as having positive and negative polarity along several dimensions, and the similarity is established based on the polarity.
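The logic of relative positioning can be sketched in a few lines of code. The sketch assumes, for illustration only, that each unisensory scale can be treated as a bounded linear range; the function names and the example ranges (piano keys numbered 1 to 88, lightness from 0 to 100) are hypothetical and not drawn from the studies cited above.

```python
def relative_position(value, low, high):
    """Position of `value` within its own scale: 0.0 at `low`, 1.0 at `high`."""
    return (value - low) / (high - low)


def match_by_position(value, source_range, target_range):
    """Return the point on the target scale that occupies the same relative
    position as `value` does on the source scale. Nothing is compared across
    the scales except position."""
    p = relative_position(value, *source_range)
    t_low, t_high = target_range
    return t_low + p * (t_high - t_low)


# Lowest piano key aligned with the dark end of a lightness scale.
lowest_key_match = match_by_position(1, (1, 88), (0, 100))      # 0.0
# Reversing the target range flips which ends of the two scales line up.
reversed_match = match_by_position(1, (1, 88), (100, 0))        # 100.0
```

The match depends only on where each item sits within its own scale, and on which end of one scale is aligned with which end of the other. That second choice is the point at which polarity, or individual differences in alignment, would enter.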

Relative positioning thus holds that the concept of similarity makes little sense when referring to stimuli presented in different senses, while it can more properly describe the relationship that is recognized between the component parts of one stimulus that are mapped onto relative parts of the other stimulus. Therefore, structural similarity refers to the resemblance in the underlying systems of relations between the elements of the sources and the elements of the target. Structural similarity exists if the relations between the objects in the source are similar to the relations between the objects in the target, independently of the similarity between the objects themselves (Forbus et al. 1995).

Referring to the different ways in which stimulus dimensions are organized within each sensory modality, relative positioning or structural alignment requires that the organization of the stimulus dimensions be similar across the senses (Mellers and Birnbaum 1982, p. 600). As noted above, this account intuitively works for prothetic dimensions, which can be clearly ordered along a perceptual continuum. Moreover, this account could probably be applied to metathetic dimensions as well, considering that there might be individual differences in which ends of the two scales are lined up (e.g., see Marks 1974, on the alignment of loudness/darkness). For example, any association between a sound, for instance a low key of the piano, and a colour hue, such as dark blue, can be explained only by assuming that the two dimensions are commensurable and that the position of the sound along the frequency scale is similar to that of dark blue along the hue scale (see Spence 2020a and Spence and Di Stefano 2022, for reviews of colour-sound associations).Footnote 13

Marks (1987) talked in terms of ‘superimposable’ dimensions, where the attributes at stake stand in a one-to-one relation, with no need to hypothesize a third, amodal or suprasensory quality actually picked-up by the various senses.Footnote 14 An alternative approach along similar lines could be to investigate the similarities between the psychophysics of different senses. Several studies by von Békésy (e.g., von Békésy 1957) went in this direction, highlighting several similarities between the processing of vibratory stimuli presented to the skin and to the ear, with resulting similarities between sensations of vibratory touch and of hearing. Such robust psychophysical similarities might pave the way for cross-sensory similarity judgements based on similar processing dynamics or mechanisms, with no implication for the similarity of the stimuli in themselves.Footnote 15 However, one might still wonder whether the robust psychophysics (e.g., of transitivity) that is obtained when comparing judgements across various pairs of senses (Ellermeier et al. 2021, on the ratio-based crossmodal matching of visual brightness and sound intensity; cf. Luce et al. 2010), reflects anything more than merely the application of ratio-based mappings within qualitatively distinct unimodal dimensions (see Cohen 1934; Root and Ross 1965; Stevens 1957, 1971; Stevens and Guirao 1963).

3.3 Emotional Mediation

The emotional mediation account has been presented as one of the most powerful and general explanations for a wide range of crossmodal correspondences involving different stimuli (see Spence 2020a, b, for reviews). According to this hypothesis, two stimuli are perceived as similar because they are associated with a similar emotional or, more broadly, affective meaning. Similar to relative positioning, emotional mediation does not imply anything about the actual perceived similarity between the pairs of sensory stimuli that are matched, and thus might well explain the existence of crossmodal mappings that have been documented in the absence of any obvious phenomenological sense of perceptual similarity (e.g., Crisinel and Spence 2012).

The conceptual premises of the emotional mediation account can be traced back to the long history of research investigating the feeling value, or affective tone, of sensory stimuli (Spence 2020a). The work of Charles Osgood and colleagues on the semantic differentiation of implicit affective meanings is worth mentioning here (e.g., Osgood 1952; Osgood et al. 1957). According to this approach, it is possible to determine the meaning of any stimulus (simple or complex, semantic, conceptual, or sensory) by asking people to rate their feelings about it on three semantic differential scales, namely valence (good/bad), activity (active/passive), and potency (strong/weak).
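A minimal sketch of how such ratings could support an affect-based match is given below. It assumes that each stimulus is summarized by a triple of ratings (valence, activity, potency) on a common numerical scale, and that affective similarity is measured by Euclidean distance between triples. Both the distance measure and the ratings are assumptions made for illustration; they are invented placeholders, not data from Osgood and colleagues.

```python
import math


def affective_distance(a, b):
    """Euclidean distance between two (valence, activity, potency) triples."""
    return math.dist(a, b)


def closest_affective_match(target, candidates):
    """Return the name of the candidate whose affective profile lies
    nearest to that of the target stimulus."""
    return min(candidates, key=lambda name: affective_distance(target, candidates[name]))


# Hypothetical ratings on a -3..+3 scale (illustrative values only).
music_excerpt = (2.0, 2.0, 1.0)
colour_patches = {
    "patch_a": (2.0, 1.0, 0.0),
    "patch_b": (-1.0, -2.0, -1.0),
}
best = closest_affective_match(music_excerpt, colour_patches)  # "patch_a"
```

The sensory properties of the stimuli play no role in the computation: two stimuli are paired solely because their affective profiles are close, which is the sense in which the mediation is emotional rather than perceptual.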

A clear example of emotional mediation in crossmodal correspondence is represented by Palmer et al.'s (2013) study of audiovisual associations, in which participants had to associate musical excerpts to colour patches and rate both stimuli for their emotional valence using eight affective descriptors (happy, sad, angry, calm, strong, weak, lively, and dreary). The results demonstrated that participants consistently associated music excerpts to colours and that the matched stimuli have similar emotional characters (evidence of some role of ‘mood’ in mediating music-painting associations is provided by the early study of Cowles 1935; see Spence 2020a, b, for reviews).

Considering all of the evidence reviewed by Spence (2020a), it would seem reasonable to conclude that emotion offers a crucial explanatory concept underpinning audiovisual associations (at least when the sensory cues are not paired semantically, e.g., as when we pair the barking sound with the picture of the dog; Chen and Spence 2010).Footnote 16 In addition, further evidence suggests that emotion is also relevant for those crossmodal associations that involve olfaction (e.g., see Levitan et al. 2015, for odour-music associations; Schifferstein and Tanudjaja 2004; Spence 2020b, and Gilbert et al. 2016, for colour-taste/smell associations). The findings reported by Winter (2016b) might also indirectly support such a central role for emotion, explaining why it is that taste (gustation), in particular, is a common source domain in most of the crossmodal correspondences, as taste and smell are thought to be more strongly emotionally-valenced than the other senses (see Levinson and Majid 2014; Winter 2016a). Additionally, Guetta and Loui’s (2017) findings revealed that crossmodal associations between complex auditory and gustatory stimuli might be mediated by emotional valence. Given all the above, however, one major caveat to the emotional mediation account is the fact that, as Spence (2020a) and many others have pointed out, emotions remain a poorly defined concept since different experimental and theoretical contexts adopt very different definitions or paradigmatic examples.Footnote 17 For instance, besides referring to primary emotions such as happiness or anger, studies have used much less intuitive emotional descriptors such as ‘strong’, ‘dreary’, ‘complex’, ‘spicy’, ‘whimsical’, and ‘dissonant’ (see Palmer et al. 2013; Whiteford et al. 2018), with other studies using complex constructs such as peacefulness, transcendence, and tenderness (see Janowski and Chełkowska-Zacharewicz 2019; Juslin 2013; Vuoskoski and Eerola 2011).

3.4 Analogical Mapping

Several psychologists have suggested a cognitive view of similarity in crossmodal matching. Often proposed with the term ‘analogous mappings’, this idea can be traced back to the medieval notion of “analogy of imitation” (or participation), which was introduced to explain the alleged relation of likeness between an immaterial, and thus unperceivable, entity, namely God, and sensible creatures, namely humans. The similarity between God and human creatures could be grounded only in some sort of conceptual similarity relation (e.g., mediated by an abstract concept, such as goodness or justice, see Ashworth 2008). A secular example of analogous mapping is the Mercator projection, a technique for making a two-dimensional map of a spherical surface. Most world maps that we are used to are the result of a Mercator projection, which establishes a one-to-one link between each point on the earth and a related point on the projection. A Mercator projection is inherently perspectival, and it necessarily distorts the pattern of lines on the real surface of the globe. Once the projection criterion has been chosen, no other choices need to be made: everything follows from the criterion. Another famous example of similarity based on analogical mapping is Rutherford’s model of the atom. In this model, the atom is seen as analogous to a solar system, with the nucleus of the atom being associated with the sun and the electrons with the planets on the basis of the shared relation of attraction which causes the revolution of the latter around the former.Footnote 18
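
To make the Mercator example concrete, the mapping can be written out explicitly. The following is a sketch in standard cartographic notation, none of which appears in the discussion above: R is assumed to be the radius of the globe, λ the longitude, φ the latitude (both in radians), and λ0 an arbitrarily chosen central meridian.

```latex
\[
x = R\,(\lambda - \lambda_0), \qquad
y = R\,\ln \tan\!\left(\frac{\pi}{4} + \frac{\varphi}{2}\right)
\]
```

For latitudes strictly between the two poles, each point (λ, φ) on the globe receives exactly one point (x, y) on the map, and vice versa. The local scale factor of this mapping is sec φ, which grows without bound towards the poles; this is one way of expressing the distortion mentioned above, and it follows from the projection criterion alone.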

Analogical mappings can be established based on different factors which vary in the extent to which they have a perceptual rather than just a conceptual basis. For example, music notation establishes a conventional mapping between sound duration and the shape of notes on the musical score. However, there is no perceptual similarity between the duration of, for instance, quarter and eighth notes and the way these are represented in the score, and there would be no problem in notating the same sound duration differently, based on a different criterion.Footnote 19 Other mappings, however, maintain a perceptual basis, albeit a feeble one. The shadows series by the contemporary artists Tim Noble and Sue Webster represents an interesting example: these artists pile up waste and old objects and, through the skilful use of light, project shadow figures that represent real things – such as human beings, animals, or everyday objects – on the wall. Observers find it difficult to visually perceive in the raw materials the image that will be projected on the wall. By carefully choosing the angle of the light, the artists establish an analogical mapping between the pile of waste objects and the shadows resembling, for example, a human figure.

The findings of a recent study investigating analogical mapping across sensory modalities support the existence of a general analogy factor (Weinberger et al. 2022).Footnote 20 The authors tested relations between information presented in different modalities (e.g., words, sounds, lines). The participants were presented with two pairs of stimuli (A:B and C:D) and had to make binary true–false judgements about whether the relation conveyed in the first pair of stimuli (A:B) was analogous to the relation conveyed in the second pair (C:D). Stimuli included unimodal pairs, such as lines-to-lines and words-to-words, as well as crossmodal pairs, such as lines-to-sounds, lines-to-words, and words-to-sounds. For example, a sequence of vertical and horizontal short segments (A) and a solid and longer segment (B) were paired with the words ‘indecision’ (C) and ‘certainty’ (D). Participants were then asked to judge whether A:B::C:D is true or false. In many cases, the stimuli not only present an explicit structural relation, but involve mappings based on quantitatively specified ratios (e.g., the differences in pitch between a series of rising and falling tones are quantitatively comparable to the differences in distance between a series of lines on the screen). Weinberger et al.’s results demonstrated that participants performed well above chance in the identification of crossmodal second-order relations, providing robust evidence of analogy across modalities.
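
The idea of a mapping based on quantitatively specified ratios can be illustrated with a minimal sketch. It is not Weinberger et al.'s procedure or analysis: the function names, the stimulus values, the use of a log ratio to express the relation within a pair, and the tolerance are all assumptions introduced here for illustration only.

```python
import math


def log_ratio(first: float, second: float) -> float:
    """Express the relation within a pair of positive magnitudes as a log ratio."""
    return math.log(second / first)


def is_analogous(a: float, b: float, c: float, d: float,
                 tolerance: float = 0.05) -> bool:
    """Return True if the relation A:B matches the relation C:D.

    The two relations count as analogous when their log ratios differ
    by no more than `tolerance` (a hypothetical threshold).
    """
    return abs(log_ratio(a, b) - log_ratio(c, d)) <= tolerance


# Hypothetical stimuli: tone frequencies in Hz and line spacings in pixels.
tones = (220.0, 440.0)      # the pitch doubles
lines_same = (30.0, 60.0)   # the spacing doubles as well
lines_other = (30.0, 40.0)  # the spacing grows by a third

print(is_analogous(*tones, *lines_same))   # True
print(is_analogous(*tones, *lines_other))  # False
```

The comparison is second-order: the function never compares a tone with a line directly, only the relation within one pair with the relation within the other, which is why it can be applied across modalities with different units.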

The existence of a general analogy factor might be related to the way in which evolutionary forces have shaped human brains to detect sensory regularities in environmental stimuli. Evidence supporting this idea could be provided by studies on non-human primates. For instance, Ravignani and Sonnweber (2017) tested cross-modal isomorphisms – the ability to detect analogies between structural features across domains – in two chimpanzees (Pan troglodytes). After an initial unimodal training during which individuals learned to choose structurally ‘symmetric’ image sequences (two identical geometrical shapes separated by a different shape) presented beside ‘edge’ sequences (two identical shapes preceded or followed by a different one), individuals were involved in a multimodal task. In the experimental protocol, the presentation of visual stimuli was preceded by the playback of three sounds that could mimic the structure of either symmetric or edge visual stimuli. Individuals more readily chose symmetric sequences when presented with symmetric, rather than edge, sound triplets, thus leading researchers to conclude that the chimpanzees spontaneously detected a crossmodal, namely visual-auditory, isomorphism. Together with the findings that have been reported in infants (Lewkowicz and Turkewitz 1980), this study further indicates that basic cross-modal abstraction capacities transcend linguistic abilities and may involve evolutionary, low-level sensory mechanisms.

3.5 Statistical Associations

The statistical account of crossmodal association is only briefly presented here since, as will become clear, it has few, if any, meaningful implications regarding similarity. The approach starts from the observation that many physical properties of sensory stimuli are often correlated in nature. Therefore, the sensory system acquires, together with the stimuli, information on their regular binding, thus learning the statistical regularities of the environment and the correlations between multiple sensory cues (Ernst 2007; Glicksohn and Cohen 2013; Parise and Spence 2013). Reflecting the properties of the environment (and in some cases also the laws of physics), most statistical correspondences are likely to be universally perceived (at least by those subjects exposed to the same environment). However, and intriguingly, perceivers can also successfully learn novel associations that are not statistically frequent in the environment (e.g., Brunel et al. 2015). As far as similarity is concerned, neither the universally perceived nor the experimentally-induced statistical associations imply that the matched stimuli are, in any perceptually salient way, similar.

4 Accounting for Similarity Across the Senses

We now examine the implications of the reviewed accounts for the concept of similarity across the senses. The diverse accounts are grouped according to Alistair’s (2013) general distinction between property-based and isomorphism-based views of similarity. We then go on to explore some issues related to the hybrid perceptual, affective, and cognitive nature of similarity. Finally, we highlight the different key factors that allegedly contribute to providing a multidimensional definition of (cross-sensory) similarity.

Alistair (2013) distinguished between two possible strategies for conceiving of similarity: property-based and isomorphism-based. The property-based strategy treats two objects as similar when they share phenomenal qualities. The isomorphism-based strategy treats two objects as similar if a mapping between their component parts exists. Apples, blood, and traffic lights might be similar as they share “redness”. A Mercator projection of the world can be similar to the world as they can be related by a one-to-one mapping. Based on Alistair’s distinction, accounts of crossmodal correspondences in terms of amodal dimensions and emotional mediation are property-based, while structural alignment and analogical mapping are isomorphism-based (see Table 1). In what follows, we go deeper into property-based and isomorphism-based accounts of similarity, evidencing their implications for the perceptual, affective, or cognitive nature of similarity judgements.

In the unisensory domain, the idea of property-based similarity appears to be intuitively and straightforwardly applied. In this context, two perceptual experiences are similar when they present some shared phenomenological aspect (see also Leibniz 1923, A64 107/P 13). For instance, when I experience the blue sky and the blue ocean, the common experience of blueness guarantees that they are similar in some respect (i.e., colour). No mapping is needed to perceive the similarity between the sky and ocean, just the apprehension of a common intramodal property.Footnote 21 However, famous critics such as Goodman (1972) have pointed to the fact that any two objects share at least one phenomenal (intramodal or intermodal) quality and thus, on such a view, similarity would simply be a universal relation – ‘everything is similar to everything else’ – and similarity claims would therefore be uninformative (see also Rodriguez-Pereyra 2002).Footnote 22

Conceived of in terms of property sharing, similarity also raises issues in the cross-sensory context. For example, it has been observed that a funeral march (i.e., an auditory stimulus) is associated with a weeping willow (a visual stimulus) as they share a property, that is, they are both perceived as sad (e.g., see Davies 2017). Or, analogously, a happy song might be associated with a sunflower. Similar associations are often explained in terms of emotional expressiveness, noting that both the funeral march (/happy song) and the weeping willow (/sunflower) express the same emotional quality, namely sadness (/happiness), which therefore grounds the emotionally mediated similarity between funeral marches (/happy songs) and weeping willows (/sunflowers).

Two observations should be made here regarding the implications of emotionally mediated correspondences for the concept of property-based similarity. First, the fact that people tend to associate X more with A than with B, for instance, the song “happy birthday” with sunflowers more than with weeping willows, does not necessarily imply that X is phenomenologically similar to A. As has been observed elsewhere (see Spence and Levitan 2021; Spence et al. 2015), merely demonstrating a statistically significant crossmodal correspondence between stimuli only implies that the chosen match was the best of the options that happened to be available to participants at the time they were asked, and does not imply that the stimuli are perceptually similar in any respect (rather the association may simply reflect learned statistical correlations in the environment, see Section 3.5). If people were presented with the word ‘apple’ and with the images of an apple and of an apple tree and were asked to match the word with one of the two images, they would likely match the word to the image of the apple though, clearly, the word ‘apple’ is not in any intuitive way more similar to the apple than to the tree.Footnote 23

Second, emotional meanings can only metaphorically be considered as ‘perceived’ properties. In fact, according to the emotional mediation account, affective meanings are attributed to, or associated with, perceived stimuli, but are not directly perceived in themselves (as a property-based definition would require). This is in line with the cognitivist view of musical emotions (e.g., Kivy 1991; Radford 1989), which claims that listeners recognize emotional qualities of music owing to cognitive processes. These considerations stress the fact that emotions are either felt or cognitively superimposed on perceptual stimuli, leaving open the question as to whether the sadness of the weeping willow is actually perceived (e.g., Davies 2017).

Taken together, the above considerations would appear to suggest that property-based similarity can hardly be invoked to account for the broad range of emotionally mediated correspondences, which are, as noted, among the most widely documented correspondences (especially audiovisual). We are thus led to examine whether the second view of similarity presented by Alistair (2013), namely isomorphism-based, can more broadly account for the existence of crossmodal correspondences.

According to the isomorphism-based definition, similarity relationships are established by the perceiver based on an implicit or explicit cognitive criterion. Merely recognizing that two objects are perceptually similar has no meaning unless one specifies how they are the same (Goodman 1972; Smith 1993). An apple is more similar to a pear than to a carrot. But it is likely more similar to a carrot than to a train. Thus, we could generalize by saying that perceptual similarity always varies with the attributes that are attended to (Nosofsky 1984; Shepard 1964, 1987). For example, cats and lions are similar as they share features such as their shape, retractile claws, and locomotive behaviour. But humans might be similar to lions as well, when they show aggressive behaviours as lions do, or to sloths, when they are lazy and sleepy. In the first case, the association is based on the similarity of properties that are actually perceived (property-based). In the latter, it seems that we analogically infer similarity between humans and lions or sloths, based on non-perceptual characteristics.

The isomorphism-based similarity account denies that phenomenological similarity can provide a solid basis for establishing similarity relationships across the senses. Even accepting that perceptual or phenomenological similarity exists, the isomorphism-based view stresses the fact that similarity relationships based on property sharing are either universal or indeterminate and, in either case, there cannot be a solid link mediating the correspondence or association between stimuli across the senses. Such an objection clearly goes against the explanations of crossmodal correspondences in terms of amodal dimensions (even if such dimensions were to exist, they would not be capable of accounting for all demonstrated occurrences) and emotional mediation, for the reasons highlighted above. One is thus led to assume that cognition must play a role in crossmodal similarity, which naturally raises the question of what triggers the similarity process, when this is conceived of as cognitively mediated.

Armary et al. (2018) suggested that, depending on the specific context, different forms of salience emerge as possible heuristics driving the similarity process: sensory, categorical, and operational salience. Grounded in biological, developmental, and evolutionary explanations, sensory salience assumes that the processing and attentional mechanisms of the human perceptual system generate a hierarchy amongst sensory properties, with some appearing more salient than others. For instance, in humans, features such as colour, shape, orientation, and symmetry will likely be more salient than odours. By contrast, for animals with exceptional olfactory abilities, such as certain breeds of dog, odours will be prominent. Categorical salience reflects the hierarchy in the cognitive system and knowledge of the perceiver. Given a concept, several others will follow as more saliently related to it. For example, despite the different possible uses of a chair, the concept ‘sitting’ will be prominent, while ‘climbing’ will be less prominent. Finally, operational salience refers to the ongoing salience attributed to one property during the analogical process. It is the most flexible form of salience, influenced by sensory inputs, categorical knowledge, and goals.

Such a multidimensional concept of salience could allow us to explain why certain similarities are perceived more intuitively than others and why it is that we ascribe different weights to different properties of things. Even though ladybirds and tomatoes share the salient perceptual characteristic of red colour, and ladybirds and flies do not, flies may be judged as more similar to ladybirds (both being flying insects) than tomatoes are. Presumably, in this example, the property of being a flying insect overrides having the same colour. Referring to salience can also explain hierarchies in the way crossmodal similarities are established by the perceiver, with the sensory ones being used first, if available, while the most cognitive ones are triggered only when perceptual similarities are unavailable.Footnote 24
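
The way in which differently weighted properties can reverse a similarity ranking can be shown with a toy sketch of the ladybird example. It is not a model proposed by Armary et al. (2018): the feature names, the weights, and the scoring rule (the weighted share of matching features) are all hypothetical choices made here for illustration.

```python
def weighted_similarity(x: dict, y: dict, weights: dict) -> float:
    """Weighted share of features on which x and y agree (between 0 and 1)."""
    total = sum(weights.values())
    shared = sum(w for feature, w in weights.items() if x[feature] == y[feature])
    return shared / total


# Hypothetical feature descriptions of the three objects.
ladybird = {"colour": "red", "kind": "flying insect"}
tomato = {"colour": "red", "kind": "fruit"}
fly = {"colour": "black", "kind": "flying insect"}

# Hypothetical salience weights: here category membership outweighs colour.
category_first = {"colour": 1.0, "kind": 3.0}
# The reverse weighting, in which colour is the more salient property.
colour_first = {"colour": 3.0, "kind": 1.0}

print(weighted_similarity(ladybird, fly, category_first))     # 0.75
print(weighted_similarity(ladybird, tomato, category_first))  # 0.25
print(weighted_similarity(ladybird, fly, colour_first))       # 0.25
print(weighted_similarity(ladybird, tomato, colour_first))    # 0.75
```

With the first set of weights the fly comes out as more similar to the ladybird than the tomato does, whereas the second set reverses the ranking, even though the objects and their features are unchanged.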

Returning to the previous examples, when colours and sounds are presented to a perceiver, s/he might search for a perceptually meaningful mapping criterion (e.g., dark colours to low pitches). By contrast, when one cannot establish an immediate sensory connection, such as between sounds and the notes on the score, or between a pile of rubbish and a female figure, the perceiver might search for cognitively inferred mediating factors instead, such as analogies. Another way to put the same idea is by referring to the role of stimulus/subject in perceptual similarity. In some cases, similarity is based on the phenomenal qualities of the stimulus, and thus the similarity process is exogenous, or stimulus-driven (e.g., dark colours and low pitches); in other cases, the phenomenal qualities of the stimuli do not allow any perceptually meaningful similarity to be established (e.g., sounds and notes on the score), and similarity is rather established by the observer thanks to an explicit cognitive or abstract mediation, and is therefore ‘endogenous’ with respect to the participant.

Interestingly, hierarchies might also take place within conceptually-mediated similarities. The similarity process can, in fact, act as a sorting function that helps to determine the importance of properties in a given context, determining, in turn, which analogy best suits that context. For example, in the analogy mentioned between the atom and the solar system, the spherical shape of the planets and electrons is less relevant than is the causal relation of attraction between the centre and the peripheral elements. Similarly, in the analogy between a pile of rubbish and a human figure, shape similarity overrides other evident differences (e.g., matter, bi-dimensionality; and see Spiro et al. 1989, on the use of, or need for, multiple analogies when trying to explain/understand complex systems).

Taken together, the above observations suggest a first tentative answer to one of our initial questions, namely, whether similarity comparisons involving sensory qualities from different senses result from perception or are rather cognitively inferred. We suggest that similarity can be perceived across the senses and is therefore, at least in certain occurrences, stimulus-driven in our terminology. This conclusion stresses that similarity is a quintessential component of human perception (see also Quine 1960, 2000; Churchland 1976, for criticisms of Quine’s position). More specifically, we suggest that similarity judgements involving stimuli across the senses have their origins in perception, and perception places important constraints on them (see also Goldstone and Barsalou 1998, for perceptual similarity in general). For example, a tone of 400 Hz and a tone of 402 Hz sound similar, and nothing can be done to alter this sensory similarity. We therefore acknowledge that phenomenological similarity simply exists, and it refers to the phenomenal resemblance between the objects in the source and target and their properties (Keane et al. 1994). However, as shown, a stimulus-driven account of similarity could not explain several occurrences of similarities across the senses, thus suggesting that more cognitive factors might mediate similarity judgements when perceptually driven paths are unavailable or insufficient (e.g., in the case of analogical mappings). Therefore, the previous conclusion has to be completed by stating that similarity processes may, depending on the context, be perceptual or conceptual in nature. When perceptual, the similarity process follows an exogenous, or stimulus-driven, path, starting and ending with sensory apprehension; when conceptual, the similarity process follows an endogenous, or subject-driven, path, starting with perception but being finally accomplished at a cognitive level (see Fig. 1).Footnote 25

Fig. 1

Two paths leading to the establishment/recognition of crossmodal similarity. When two stimuli in different sensory domains (A and B) are perceived, similarity is directly perceived if they share phenomenal qualities. If not, conceptual or affective links might trigger similarity comparisons.

Several important observations should now be made in order to achieve the objective of this paper and, thus, to provide a more fine-grained account of perceptual-cognitive dynamics in the case of cross-sensory similarity judgements. First, if it is true that nothing can be done to alter phenomenal similarity, it is also true that this statement is valid only under certain circumstances. For instance, it is likely that individual differences play a role in the ability to differentiate between specific classes of stimuli, such as pitches or colours, impacting the ability to perceive similarities among those stimuli. For example, listeners with absolute pitch might perceive two tones at 400 Hz and 402 Hz as sounding different. This would lead us to consider sensory discrimination thresholds, a topic that falls beyond the scope of this paper. However, we would like to point out that discrimination thresholds are likely much more important for similarity comparisons in the same sensory modality than across the senses. Beyond individual differences, universal psychological mechanisms can influence perception, such as the functional unitization/differentiation of perceptual components due to categorization experience (Goldstone 2000). For example, in an early study by Katz (1963), two groups of children were required to associate four similar geometrical shapes with nonsense syllables. In the first group, the association was such that two shapes had the same name while, in the second group, each syllable was associated with a different shape. The results of a subsequent similarity rating task revealed that children from the first group judged the shapes that were named identically as being the same more often than children from the second group. Similar findings were obtained with adults by Livingston et al. (1998), who found that, when participants learned to classify objects (animal body parts or artificial cells) into a single category, these objects were rated as more similar than objects that belonged to different categories, or than objects that were not categorized at all (see also Kurtz 1996).

Second, a caveat should be mentioned here against the radicalization of the dichotomy between perceived vs. cognitively inferred similarities. In fact, in several cases, similarities that were once effortful and inferred based on conceptual means might become perceptual or, at least, less demanding cognitively. Especially when dealing with analogical mapping, perceivers might struggle to see the link between the paired entities at first, while gradually becoming so familiar with it that they do not need to recall the cognitive mediation anymore. Roughly speaking, this is the process of perceiving what was once a conceptual similarity. For instance, the student musician explicitly uses (cognitive) rules to decipher the link between sounds and signs on the score. Then, with time, reading music ceases to be effortful and rule-based, and becomes perceptual and phenomenologically direct. Importantly, when this occurs, the ‘acquired’ similarity can be used to trigger new similarities (as often occurs in the case of scientific discoveries, e.g., Gentner 2002).Footnote 26 Moreover, as observed by Marks (1987), crossmodal associations might be due to some kind of communality between parallel stages in processing attributes of different sensory modalities; such communality may not be explained in terms of phenomenological properties of the stimulus, but rather in terms of stimulus processing after sensory apprehension.

Third, and relatedly, it should also be noted that cognitive/conceptual information does not necessarily make perceptual stimuli seem any more similar (nor dissimilar). For example, studies in the unisensory context show that certain olfactory molecules called enantiomers, namely odorants which are identical except for their chirality, sometimes smell very different (e.g., Laska and Teubner 1999). Even though the perceiver might be cognitively aware of their similarity, such awareness does not alter the way in which the molecules are actually perceived. In the crossmodal context, the same vibratory stimulus at a frequency beyond 200 Hz is not perceived similarly when experienced as an auditory versus tactile vibration.Footnote 27

To recapitulate, phenomenologically and cognitively mediated similarities are wrongly conceived as mutually exclusive, and rather need to be conceived in terms of continuity and, at least to a certain extent, flexible categories. This implies that phenomenological similarity might be altered or influenced by cognitive information, and that cognitive similarity might become perceptual after exposure or training. At the same time, however, phenomenological similarity would occasionally appear resistant to any cognitive penetration, and conversely, cognitive information is impenetrable by perceptual information. We therefore now want to account for the difference between phenomenological and cognitive similarity using the concept of (cognitive/perceptual) impenetrability.

Perceptual contents might be considered to be cognitively impenetrable if they cannot be influenced or affected by the contents of higher cognitive states or factors such as beliefs, inferences, habits, or knowledge (Pylyshyn 1984; Siegel 2012; Stokes 2013; see also Macpherson 2012); conversely, perceptual contents are cognitively penetrable when cognitive information or knowledge alters how they are perceived. An illuminating example of the cognitive penetrability of perception comes from studies testing the use of inverted lenses. These lenses turn one’s visible world upside down, impacting the way we visually perceive and interact with the world. Typically, the lenses are initially experienced as radically disorienting for the wearer but, after a relatively short learning period of consistently using and acting with the lenses, participants quickly recover and perform perceptual and motor tasks normally (Harris 1965; Kottenhoff 1957; Stratton 1897). This example demonstrates that participants could readily adjust their perception according to their knowledge concerning how the real, non-inverted world is (see also the use of the pseudophone to induce sensory incongruity in audition, Spence 2022b).

These observations allow us to refine our cognitive-perceptual polarized model, suggesting that similarity can rather be represented as a continuum along two main dimensions, namely, associative strength and impenetrability (see Fig. 2; see also Deroy and Spence 2013, for a similar bidimensional model to account for crossmodal correspondences and canonical synaesthesia). Associative strength refers to the extent to which the link created by the similarity relationship between two stimuli is binding (i.e., low associative strength implies that the bond is weak, and hence their similarity does not constrain perception/cognition). Impenetrability refers to the extent to which a similarity relationship resists alteration by cognitive factors. In the present model, similarity judgements that involve stimuli from different sensory domains can be based on phenomenological, structural, emotional, or conceptual grounds. Although there might not be clear-cut distinctions amongst the different kinds, according to the model, phenomenological matchings are highly impenetrable and perceptual in nature, while conceptual, or analogy-mediated, matchings are penetrable and cognitive in nature. In between these two extreme cases, structural matchings might be considered as less penetrable than emotional ones, which are likely more cognitive and penetrable.

Fig. 2

The different kinds of cross-sensory similarities according to their associative strength and the extent to which they are impenetrable/penetrable

Before concluding, it might also be worth pointing out that these reflections could not have been made starting from the analysis of what has been often considered – though incorrectly (Deroy and Spence 2013) – a peculiar case of crossmodal matching, namely, synaesthesia (Ramachandran and Hubbard 2001). In fact, while it was possible to highlight a number of reasons why crossmodal associations might be putatively relevant for similarity, the latter concept seems not to be straightforwardly applicable to synaesthesia. With respect to the specific stimuli that are being linked/associated, in fact, synaesthesia seems to be more related to identity, rather than necessarily similarity since, for a synaesthete, the inducer (e.g., the sound of the trumpet) and the concurrent (colour scarlet) are simply part of one and the same, cognitively impenetrable, perceptual experience (that is, the inducer is always co-experienced with the concurrent).

5 Conclusions

Returning to our initial question about the perceptual vs. cognitive nature of similarity across the senses, a tentative answer can now be provided, by observing that crossmodal similarity is, in its allegedly fundamental manifestation, rooted in perception and can therefore be explained in terms of the sharing of phenomenal qualities. However, in other cases, explanations based on phenomenal qualities fail to account for the observed matchings, which can rather be explained in terms of cognitive or affective mediation, according to which, for example, similarity judgements are based on the sharing of similar emotional meanings. While perceptual-based matchings are cognitively impenetrable, conceptually/affectively mediated ones are cognitively penetrable.

At this point, an underlying question regarding the implications of the literature on crossmodal matchings for the notion of (cross-sensory) similarity seemingly remains open. In fact, as briefly observed earlier, according to some commentators (e.g., Spence et al. 2015; Spence and Levitan 2021), evidence of a statistically significant (or consensual) crossmodal correspondence between stimuli does not say anything about crossmodal similarity, being rather explainable in terms of, say, learned statistical correlations in the environment. At the same time, however, influential psychologists such as Marks (e.g., Marks 1989) are strongly convinced that investigating cross-sensory matchings sheds light on the very nature of similarity. From a psychological perspective, the point seemingly remains controversial, as no crossmodal matching protocol can generate results based solely on similarity, excluding the possible contribution of mechanisms other than similarity (e.g., statistical learning) in triggering the (eventually discovered) association. In other words, when testing primates or infants, how could we actually know whether it is analogy, emotional mediation, or perceptual similarity driving their behavioural responses? Is there some implicit assumption experimenters hold about the kinds of comparisons that can legitimately be based on perceptual similarity? The issue remains controversial from the philosophical side as well, and the controversy might be framed within the never-ending dispute on the nature of perception itself and its objects (universal vs. particular, conceptual vs. non-conceptual, propositional vs. sensory; see Foster 2000).

Several additional questions remain open for future investigation. First, one might be tempted to ask about the implications of our inquiry for similarity in the unisensory domain, for example, by asking whether the emotional mediation account can also work for similarity judgements between colours (e.g., do we judge certain pairs of colours as more similar because they are associated with the same emotion?). Relatedly, it could be worth investigating whether fundamental distinctions between stimuli in one sensory dimension, such as major/minor in sound perception, might have a correlate in a different modality, such as vision or touch. This would open up the investigation of the intriguing possibility of translating between the senses (see Spence and Di Stefano, in press), and the putative role of similarity criteria in mediating the process of sensory translation.