Converging evidence suggests that the perception of auditory pitch exhibits a characteristic spatial organization. A compelling demonstration of this is the Spatial Musical Association of Response Codes (SMARC) effect: responses to low-pitched tones are faster when the response keys are located in the lower part of space, and responses to high-pitched tones are faster when the keys are located in the upper part of space (Lidji, Kolinsky, Lochy, & Morais, 2007; Rusconi, Kwan, Giordano, Umiltà, & Butterworth, 2006). This is also the case when pitch is irrelevant to the task at hand (for instance, when the task requires discriminating between musical instruments; Lidji et al., 2007; Rusconi et al., 2006), leading researchers to suggest that the pitch–space association may occur automatically, that is, without requiring attentional control (Lidji et al., 2007). Hence, high tones seem to be mentally linked to upper locations and low tones to lower locations in space. Pitch can also be mapped along a left-to-right continuum, but this horizontal mapping is weaker than the vertical one, being more influenced by one’s musical and instrument-playing experience (Lidji et al., 2007; Rusconi et al., 2006; Timmers & Li, 2016). Interestingly, listening to sounds of different pitch may also affect the orienting of spatial attention in peripersonal space (e.g., Akiva-Kabiri, Linkovski, Gertner, & Henik, 2014; Bernardi et al., 2015; Bodak, Malhotra, Bernardi, Cocchini, & Stewart, 2014; Fernández-Prieto, Vera-Constán, García-Morera, & Navarra, 2012; Ishihara et al., 2013; Lega, Cattaneo, Merabet, Vecchi, & Cucchi, 2014).

The mapping between pitch height and vertical location is quite robust (Dolscheid, Shayan, Majid, & Casasanto, 2013; Roffler & Butler, 1968) and is already present in early infancy (Dolscheid, Hunnius, Casasanto, & Majid, 2014; Mondloch & Maurer, 2004; Walker et al., 2010). Although it may also rely on linguistic associations (Dolscheid et al., 2013), it has been reported in prelingual infants (Dolscheid et al., 2014; Walker et al., 2010) and in remote isolated populations who do not use linguistic spatial metaphors (i.e., “high” and “low”) to describe pitch (Parkinson, Kohler, Sievers, & Wheatley, 2012). Indeed, pitch–space mapping may be rooted in auditory scene statistics that reveal a clear association between frequency and elevation, with sounds coming from higher elevations tending to have higher frequencies than those coming from lower elevations (Parise, Knorre, & Ernst, 2014).

Sound sources are often visible, and the availability of visual information improves sound localization estimates (e.g., Tabry, Zatorre, & Voss, 2013). Indeed, when visual and auditory inputs providing spatial information are both available, the final multisensory estimate tends to be more precise than either unisensory estimate, with vision playing a major role in localization tasks (e.g., Alais & Burr, 2004; Shelton & Searle, 1980; see also Stein, Stanford, & Rowland, 2014). Visual experience may thus play a crucial role in associating the pitch of natural sounds with the locations from which they generally originate, and therefore in internalizing audio-spatial correlations in the environment (Deroy, Fasiello, Hayward, & Auvray, 2016). In this regard, it is interesting that the first evidence for a pitch–space correspondence came from experiments manipulating visual stimuli. In one of the earliest investigations of this topic, Bernstein and Edelstein (1971) found that a visual stimulus was classified as high or low faster when it was accompanied by a congruent rather than an incongruent tone (e.g., a high pitch with a high position rather than a low one). This result has been replicated in many subsequent studies (e.g., Ben-Artzi & Marks, 1995; Evans & Treisman, 2010; Melara & O’Brien, 1987; Patching & Quinlan, 2002), which have also investigated correspondences between pitch and other visual features (such as lightness, brightness, shape, and size).

If the pitch–space correspondence arises through frequent associations in everyday experience (where vision plays a critical role as the dominant modality for spatial localization; e.g., Alais & Burr, 2004; Shelton & Searle, 1980; Tabry et al., 2013), blind individuals, who lack prior visual experience, might not show the same learned perceptual association. Moreover, while the blind typically make superior use of spectral cues for localization in the horizontal plane, this compensatory behavior may come at the cost of a reduced ability to use these cues for localization in the vertical plane (Voss, Tabry, & Zatorre, 2015). This impaired ability to localize sounds in the vertical plane may also (indirectly) affect pitch–vertical space mapping in the blind.

Previous studies have demonstrated that the lack of prior visual experience may affect but not prevent the use of mental spatial representations (e.g., Afonso et al., 2010; Cattaneo et al., 2011; Cattaneo, Vecchi, Monegato, Pece, & Cornoldi, 2007; Postma, Zuidhoek, Noordzij, & Kappers, 2007; Röder, Kusmierek, Spence, & Schicke, 2007). For instance, blind individuals are likely to represent numerical information spatially along a left-to-right-oriented mental number line, as sighted individuals typically do (e.g., Castronovo & Seron, 2007; Cattaneo et al., 2011; Cattaneo, Fantino, Tinti, Silvanto, & Vecchi, 2010; Rinaldi, Vecchi, Fantino, Merabet, & Cattaneo, 2015; Szűcs & Csépe, 2005; but see Pasqualotto, Taya, & Proulx, 2014). Moreover, past and future events seem to be mapped onto a left (past) to right (future) spatial continuum regardless of visual experience (Bottini, Crepaldi, Casasanto, Crollen, & Collignon, 2015; Santiago, Lupiáñez, Pérez, & Funes, 2007), although blindness may affect the mapping of past and future on the sagittal plane (Rinaldi, Vecchi, Fantino, Merabet, & Cattaneo, 2017). Furthermore, in verbal memory tasks (in which sighted individuals tend to organize information spatially), blind individuals may preferentially use alternative, nonspatial strategies (Bottini, Mattioni, & Collignon, 2016). Specifically for pitch processing, a recent study reported that sighted individuals preferentially associated tones increasing in pitch with upward tactile movements and tones decreasing in pitch with downward tactile movements, supporting a pitch–space mapping, whereas blind individuals showed no preferential association (Deroy et al., 2016). These findings suggest that the normal development of vision may be critical in mediating this pitch–space correspondence (Deroy et al., 2016).

To shed further light on this issue, we carried out a study in which we presented a group of early blind and sighted individuals with an implicit (timbre judgment) SMARC-like task measuring the mapping of pitch height along a bottom-to-top vertical dimension. In accordance with the existing literature, we expected sighted participants to be faster at categorizing the timbre of various musical instruments when they responded to low pitches using a response key located in lower space and to high pitches using a response key located in upper space, even though pitch was irrelevant to the task (e.g., Lidji et al., 2007; Rusconi et al., 2006). Importantly, if the mapping between pitch height and spatial location is mediated by normal visual development, responses of early blind individuals to high- and low-pitched tones should not be affected by the vertical location of the response key. This would suggest that prior visual experience is needed to learn the pitch–vertical space correspondence. Alternatively, if the mapping of pitch in space depends more on auditory-motor experience, early blind individuals should also associate high and low pitches with different positions in vertical space, suggesting that the source of the pitch–space mapping can be accounted for by mechanisms other than visual binding.

Method

Participants

Twenty-three right-handed early blind participants (14 males; mean age = 42.22 years, SD = 13.00, range: 18–65; mean education: 14.39 years, SD = 3.87) and 23 right-handed sighted participants (12 males; mean age = 38.61 years, SD = 12.86, range: 22–65; mean education: 15.83 years, SD = 2.70) took part in the experiment. Participants had little musical training: beyond the basic music education obtained in primary school, blind participants had on average 2.60 years (SD = 2.72) of musical training and sighted participants 1.50 years (SD = 2.35), with no significant difference between the two groups, t(44) < 1, p = .31. No participant had any history of neurological disorders or motor dysfunction. All sighted participants had normal (or corrected-to-normal) vision. All blind participants were profoundly blind due to various ocular causes (see Table 1 for individual demographics), and all were proficient Braille readers.

Table 1 Characteristics of early blind participants

Stimuli

The stimuli consisted of two low-pitched tones (C2 and E2; 65.41 Hz and 82.41 Hz, respectively) and two high-pitched tones (F5 and A5; 698.46 Hz and 880.00 Hz, respectively). These pitches were chosen so that the distance in semitones between the two low tones equaled that between the two high tones (i.e., four semitones between C2 and E2 and between F5 and A5; see Lidji et al., 2007, for a similar design). Each tone was synthesized with piano and piano-keyboard timbre (belonging to the keyboard instrument family) and with clarinet and saxophone timbre (belonging to the wind instrument family), for a total of 16 different stimuli. All stimuli were presented for 500 ms and were normalized to 0 dB (Audacity software, http://audacity.sourceforge.net/). Although all sounds were normalized to 0 dB, auditory stimuli may still differ in perceived loudness. To rule out this possible confound, we conducted a preliminary experiment with 10 participants (who did not take part in the main experiment) and confirmed that all sounds were perceived as equally loud (for a similar procedure, see Rinaldi, Lega, Cattaneo, Girelli, & Bernardi, 2016).
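As a check on the interval arithmetic, the distance in equal-tempered semitones between two frequencies f1 and f2 is 12·log2(f2/f1). The following minimal Python sketch (our own illustration, not part of the original materials) confirms that both tone pairs span the same four-semitone interval:

```python
import math

def semitone_distance(f1: float, f2: float) -> float:
    """Distance in equal-tempered semitones between frequencies f1 and f2."""
    return 12 * math.log2(f2 / f1)

# Both tone pairs span a major third (four semitones):
print(round(semitone_distance(65.41, 82.41), 2))    # C2 -> E2: 4.0
print(round(semitone_distance(698.46, 880.00), 2))  # F5 -> A5: 4.0
```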

Procedure

Participants were seated comfortably in a dimly lit room. Sighted participants were blindfolded throughout the entire experiment, as is typically done in studies comparing blind and sighted performance on perceptual tasks; this was done to avoid potential effects of (even task-irrelevant) visual input on the performance of the sighted group (e.g., Tabry et al., 2013). Stimuli were delivered binaurally through professional headphones (Sennheiser HD 280 Pro). Figure 1 shows the experimental setting and procedure.

In each trial, the target auditory stimulus was presented for 500 ms. Participants were asked to judge whether the tone was played by a wind or a keyboard instrument (timbre judgment task) by pressing one of two vertically aligned response keys, one at the bottom (the space bar) and one at the top (the key corresponding to the number 6) of a standard keyboard (see Lidji et al., 2007; Rusconi et al., 2006, for a similar procedure). There was no time limit for responding, but task instructions emphasized both speed and accuracy. Although participants could respond during the 500-ms stimulus presentation, the sound was always played for its entire duration. After stimulus offset, a silent interval of 650 ms preceded the presentation of the next auditory stimulus.

Participants took part in two experimental blocks, in which the association between instrument category (wind vs. keyboard) and response key was kept constant. However, in one block participants used their right hand to press the top response key and the left hand to press the bottom response key, while in the other block the hand position was reversed (see Fig. 1b). Hand assignment to the top/bottom response keys was manipulated in light of prior findings suggesting that hand position may affect the SMARC effect in the vertical plane (Lidji et al., 2007; but see Rusconi et al., 2006) and of prior evidence showing that blind individuals tend to rely more on body- and hand-centered reference frames than sighted people do (e.g., Cattaneo et al., 2008; Crollen, Dormal, Seron, Lepore, & Collignon, 2013; Noordzij, Zuidhoek, & Postma, 2006, 2007; Pasqualotto et al., 2014; Rinaldi et al., 2015; for recent evidence in auditory localization, see Vercillo, Tonelli, & Gori, 2018). The order of the two experimental blocks and the response-key–timbre assignment were counterbalanced across participants.

Each of the 16 stimuli (i.e., four tones played by four different instruments; see above) was presented six times, for a total of 96 trials per block. Within each block, trials were presented in random order, with the only constraint that the same identical tone (same pitch played by the same instrument) never occurred twice in a row (see the sketch below). Before the first experimental block, participants listened to two tones from each instrument used during the task in order to familiarize themselves with the auditory stimuli, and performed eight practice trials with tones of instruments not used in the actual experiment (but still belonging to the keyboard or wind family). Participants did not train again before the second block, but were allowed a few minutes’ break. E-Prime 2.0 (Psychology Software Tools, Pittsburgh, PA) was used for stimulus presentation and data collection. The entire experiment lasted approximately 1 hour, including instructions, short breaks between the two blocks, and debriefing.
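For illustration, the constrained randomization just described (each of the 16 stimuli presented six times, with no immediate repetition of the identical tone) could be implemented along the following lines. The experiment itself was run in E-Prime; this Python sketch is our own reconstruction, and all names in it are hypothetical:

```python
import random

def build_block(stimuli, repeats=6, seed=None):
    """Return a shuffled trial list in which the identical stimulus
    (same pitch played by the same instrument) never occurs twice in a row."""
    rng = random.Random(seed)
    trials = [s for s in stimuli for _ in range(repeats)]
    while True:
        rng.shuffle(trials)
        # Rejection sampling: reshuffle until no adjacent pair is identical.
        # With 16 distinct stimuli spread over 96 trials, a valid order is
        # typically found within a few hundred reshuffles (milliseconds).
        if all(a != b for a, b in zip(trials, trials[1:])):
            return trials

# 4 pitches x 4 instruments = 16 stimuli, each presented 6 times -> 96 trials
pitches = ["C2", "E2", "F5", "A5"]
instruments = ["piano", "piano-keyboard", "clarinet", "saxophone"]
stimuli = [(p, i) for p in pitches for i in instruments]
block = build_block(stimuli, seed=1)
assert len(block) == 96
```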

Fig. 1

a Experimental timeline. A silent interval of 650 ms preceded the presentation of the 500-ms auditory stimulus. In each trial, participants were asked to judge whether the tone was played by a keyboard instrument or by a wind instrument (timbre judgment task). b Hand assignment. Participants performed two experimental blocks: in one block, they pressed the top key with their right hand and the bottom key with their left hand; in the other block, hand position was reversed

Data analysis

We first carried out repeated-measures analyses of variance (ANOVAs) on mean correct reaction times (RTs, recorded from the onset of the auditory stimulus) and on mean error rates to look for possible group and block effects on performance. The SMARC effect was then analyzed as in prior studies (Lidji et al., 2007; Weis, Estner, & Lachmann, 2016) by computing the differences in mean correct reaction times (dRTs) and in percentage error rates (dErrors) between top-key and bottom-key responses. Accordingly, positive values indicate faster responses and fewer errors for bottom-key responses, and negative values indicate faster responses and fewer errors for top-key responses. dRT and dError values were analyzed by means of a repeated-measures ANOVA with pitch (low vs. high) and hand assignment (right hand top/left hand bottom vs. right hand bottom/left hand top) as within-subjects variables and group (blind vs. sighted) as the between-subjects variable.
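In other words, for each participant and each pitch/hand-assignment cell, dRT is the mean correct RT for top-key responses minus the mean correct RT for bottom-key responses (dError is computed analogously on error percentages). A minimal pandas sketch of this computation, assuming a hypothetical trial-level data frame with columns subject, group, hand, pitch, key, and rt (names that are ours, not from the original materials), might look as follows:

```python
import pandas as pd

def smarc_drt(df: pd.DataFrame) -> pd.DataFrame:
    """dRT = mean correct RT (top key) - mean correct RT (bottom key),
    computed per subject, group, hand assignment, and pitch.
    Positive values indicate faster bottom-key responses."""
    cell_means = (
        df.groupby(["subject", "group", "hand", "pitch", "key"])["rt"]
          .mean()
          .unstack("key")  # columns: 'bottom', 'top'
    )
    return (cell_means["top"] - cell_means["bottom"]).rename("dRT").reset_index()
```

The resulting per-participant dRT values would then be submitted to the mixed-design ANOVA described above (e.g., via a dedicated statistics package).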

Results

RT analyses

Mean response latencies for correct responses are reported in Table 2. The ANOVA on mean correct RTs with block as a within-subjects variable and group as a between-subjects variable revealed that participants were overall faster in the second block, F(1, 44) = 15.89, p < .01, ηp² = .26, likely reflecting a practice effect (note that the timbre–key assignment was the same in the two blocks; only the hand–key assignment changed). Neither the main effect of group, F(1, 44) < 1, p = .49, ηp² = .01, nor the group by block interaction, F(1, 44) < 1, p = .44, ηp² = .01, was significant.

Table 2 Mean correct reaction times (ms) as a function of hand position (right-hand top, left-hand top) and pitch (high, low) in blind and sighted participants

The ANOVA on the dRTs with pitch and hand assignment (i.e., right hand top/left hand bottom vs. right hand bottom/left hand top) as within-subjects variables and group as the between-subjects variable yielded a significant main effect of pitch, F(1, 44) = 11.13, p = .002, ηp² = .20, with positive dRTs in response to low tones and negative dRTs in response to high tones, consistent with a SMARC effect (see Fig. 2). Neither the main effect of hand assignment, F(1, 44) < 1, p = .99, ηp² = .00, nor the main effect of group, F(1, 44) < 1, p = .97, ηp² = .00, was significant. None of the interactions reached significance (all ps > .20).
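As an aside, the reported effect sizes can be cross-checked from the F statistics and their degrees of freedom alone, since ηp² = F·df1 / (F·df1 + df2). A quick check (our own illustration, using only values reported above):

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta-squared recovered from an F statistic and its degrees of
    freedom: eta_p^2 = SS_effect / (SS_effect + SS_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of pitch on dRTs: F(1, 44) = 11.13
print(round(partial_eta_squared(11.13, 1, 44), 2))  # 0.2, i.e., the reported .20
```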

Fig. 2

Differences in reaction times (dRTs) between top-key and bottom-key responses as a function of pitch (high, low) and group (blind, sighted). Positive dRTs indicate faster bottom-key than top-key responses. A significant SMARC effect emerged in both early blind and sighted individuals. Error bars represent ±1 SEM

Error analyses

Mean error rates for sighted and blind participants in the different conditions are reported in Table 3. The ANOVA on mean error rates with block as the within-subjects variable and group as the between-subjects variable showed that participants improved with practice, making fewer errors in the second than in the first block, F(1, 44) = 7.96, p = .007, ηp² = .15. Neither the main effect of group, F(1, 44) < 1, p = .57, ηp² = .00, nor the group by block interaction was significant.

Table 3 Mean error rates (%) as a function of hand position (right-hand top, left-hand top) and pitch (high, low) in blind and sighted participants

The ANOVA on the dErrors with pitch and hand assignment (i.e., right hand top/left hand bottom vs. right hand bottom/left hand top) as within-subjects variables and group as the between-subjects variable showed no significant main effect of hand assignment, F(1, 44) = 1.25, p = .26, ηp² = .02, or of group, F(1, 44) < 1, p = .87, ηp² = .00. Although the pattern of dErrors was consistent with that of the dRTs (see Fig. 3), with participants performing better when responding to low tones with the bottom key and to high tones with the top key, the main effect of pitch failed to reach significance, F(1, 44) = 1.71, p = .19, ηp² = .03. None of the interactions reached significance (all ps > .07).

Fig. 3

Differences in percentage of errors (dErrors) between top-key and bottom-key responses as a function of pitch (high, low) and group (blind, sighted). Positive dErrors indicate fewer errors for bottom-key responses. Error bars represent ±1 SEM

Discussion

In this study, we tested the potential effect of normal visual development on the association between pitch and space by comparing sighted and early blind individuals on a stimulus–response compatibility task. We found a consistent SMARC effect in response latencies (see Lidji et al., 2007) in both blind and sighted participants: all participants responded faster to low tones when these were associated with a bottom-key press and to high tones when these were associated with a top-key press. Importantly, in neither group was the SMARC effect modulated by hand assignment (i.e., which hand, left or right, was used to press the bottom or the top key). Overall, these results extend previous findings suggesting that tones are associated with a vertical spatial continuum, demonstrating that the pitch–space association does not require normal visual experience to develop.

Our findings suggest that the spatial representation of auditory pitch can develop even in the absence of vision, likely deriving from other sensorimotor (and possibly also verbal) experiences. First, the auditory scene statistics whereby sounds coming from higher elevations tend to have higher frequencies than those coming from lower elevations (Parise et al., 2014) can be experienced even when visual input is not available. Moreover, the association between pitch and elevation in space may derive from anatomical features related to auditory processing and vocal emission. In particular, Parise et al. (2014) demonstrated that the anatomical properties of the outer ear evolved to mirror the statistical regularities of the external world, suggesting that the mapping between pitch and elevation is at least partially embodied and independent of audio-visual binding. Moreover, low tones resonate in a lower portion of the chest than high-pitched sounds do (Zbikowski, 1998; see also Shayan, Ozturk, & Sicoli, 2011), and when people produce higher voice frequencies the larynx moves upward in the throat, whereas it moves downward when they produce lower frequencies (see Connell, Cai, & Holler, 2013). Taken together, these observations suggest an important role of motor and bodily experience in reinforcing the spatial representation of pitch. On this view, normal visual experience may not be strictly necessary (but merely supportive) for experiencing the association between pitch and vertical space. Finally, it is worth mentioning that blind individuals use the same visuospatial linguistic metaphors as the sighted in describing pitch (e.g., “high” and “low” for high-pitched and low-pitched tones; see, for instance, Antović, Bennett, & Turner, 2013; Eitan, Ornoy, & Granot, 2012; Walker, 1985; Welch, 1991). Thus, the spatial mapping of pitch may also be learned (or reinforced) in the blind through the explicit (verbal) conceptualization of pitch in terms of spatial elevation.

Our results may appear at odds with those of Deroy et al. (2016), who failed to find a cross-modal correspondence between auditory pitch and the perceived direction of movement of a tactile stimulation in early blind participants. However, in Deroy et al. (2016), participants performed an implicit association task (Greenwald, McGhee, & Schwartz, 1998) in which they categorized either a sound or a tactile stimulus delivered to one hand (auditory and tactile stimuli presented in different blocks) by means of a left–right key press using two fingers of the unstimulated hand. In our task, the position of the response keys (and thus of the two hands) was directly related to the mapping of sounds along the vertical dimension, creating a direct (and possibly more detectable) conflict between the required motor response and the mental spatial mapping of pitch whenever low pitches called for a top-key response and high pitches for a bottom-key response. Moreover, whereas Deroy et al. (2016) used pure tones of linearly increasing and decreasing pitch, we asked participants to classify single tones of different timbres. It is possible that the type of stimuli used (i.e., rising/descending pure tones vs. single instrumental tones) led to different effects on the saliency of the critical dimension, that is, the perceived distance between low and high tones (Chiou & Rich, 2012; Fernández-Prieto et al., 2012; Mossbridge, Grabowecky, & Suzuki, 2011; see Spence & Deroy, 2013, for a discussion).

In our task, hand assignment (right hand on the top response key or vice versa) did not modulate performance in either blind or sighted participants. There is evidence that pitch may also be mapped (especially by musicians) along a horizontal, left-to-right-oriented continuum, with low pitches on the left and high pitches on the right (e.g., Lidji et al., 2007; Rusconi et al., 2006). Because of stimulus–response compatibility effects, which typically show an up-right/down-left advantage (Lippa & Adam, 2001; Weeks, Proctor, & Beyak, 1995), the hand used to respond to low and high pitches may then also affect performance beyond the position of the response keys per se. Accordingly, Lidji et al. (2007) found a tendency toward a stronger vertical SMARC effect in participants having the right hand on the top key and the left hand on the bottom key compared with participants with the reverse mapping (but see Rusconi et al., 2006). We hypothesized that, because blind individuals preferentially rely on body- and hand-centered over external spatial auditory representations (Vercillo et al., 2018), stimulus–response compatibility effects might be more evident in blind than in sighted participants. However, we found no hand position effect in either group, suggesting that the mapping between pitch and vertical space (at least with a paradigm such as the one used here) is not affected by the possible activation of hand-centered auditory representations (note also that the effect reported by Lidji et al., 2007, was mainly driven by participants with musical training). Finally, whether pitch is mapped along the horizontal dimension in the blind, and whether such a mapping is anchored to hand position, has not been experimentally investigated and deserves further consideration.

In conclusion, while the origin and underlying mechanisms of the association between pitch and spatial elevation are still a topic of debate, our findings show for the first time that early-onset profound blindness does not prevent the development of a pitch–space correspondence.