Introduction

Over the past years, accumulating evidence has supported the view that language and manual processes are tightly bound (for reviews, see Kendon, 2004; McNeill, 1992). In this vein, a line of inquiry has highlighted a peculiar link between vocalization and grasping. In a seminal work, Gentilucci et al. (2001) asked participants to open their mouth while grasping an object with their hand. The hand aperture increased with the size of the mouth opening during grasping. This result is in line with the neurophysiological discovery that some neurons of the monkey’s cortex (i.e., area F5) discharge both when the monkey grasps an object with its hand and when it grasps it with its mouth (Rizzolatti et al., 1988). Interestingly, this cortical area is considered the homologue of Broca’s area in the human brain. Accordingly, it has been proposed that this class of neurons was initially associated with ingestion behaviors that required fine coordination of the opening of the hand and mouth to properly grasp and eat food. Later, in humans, these neurons could have been used to control mouth opening for communicative purposes (Gentilucci & Corballis, 2006). This hypothesis is experimentally supported by Gentilucci and Campione (2011). In their study, participants were instructed to pronounce the vowel /a/ or /i/ while they reached for and grasped an object. These vowels were chosen because they are associated with a large and a small mouth opening, respectively. The data suggest that hand aperture during the reach-to-grasp action is larger when /a/ is pronounced and smaller when /i/ is pronounced. Therefore, it seems that vocalization and grasping share common neurocognitive processes, presumably related to the opening of the mouth and the hand.

This hypothesis is reinforced by a series of studies conducted by Lari Vainio and his colleagues (for a review, see Vainio, 2019). Vainio et al. (2013) employed a modified version of the visuo-motor priming paradigm developed by Ellis and Tucker (2000). In their experiment, the syllable KA or TI was presented in light gray for 600 ms and then immediately turned blue or green. The participants were instructed to read the syllable aloud when it changed color and, simultaneously, to perform a power or a precision grip according to the syllable’s color. Since the work of Napier (1956), the power grip is known to involve a large aperture of the whole hand to grasp relatively large objects (e.g., an apple), while the precision grip involves a small aperture between the thumb and the forefinger to grasp smaller objects (e.g., a cherry). Therefore, manipulating the kind of grip performed makes it possible to manipulate the aperture of the hand. Vainio et al. (2013) found that the power grip was carried out faster when the syllable KA was read than when TI was read, and conversely for the precision grip. Vainio et al. (2014) went further and found that silently reading the syllables KA and TI, or merely hearing them pronounced, is enough to produce the same effect (for converging data, see Komeilipoor et al., 2016; Vainio et al., 2017a; Vainio & Vainio, 2022). Besides, Tiainen et al. (2017) provided evidence for the reverse influence: the syllables /kɑ/ and /ti/ were pronounced faster when participants concomitantly performed a power or a precision grip, respectively. According to the hypothesis of Gentilucci and Campione (2012), this compatibility effect occurs because the syllables (i.e., KA and TI) and the grips (i.e., power grip and precision grip) share the same neurocognitive processes involved in planning hand and mouth opening.

In this article, we aimed to test an alternative account according to which a confounding variable may be present in the previous studies reporting the influence of syllables on power- and precision-grip responses. Indeed, to perform each kind of grip, the participants were instructed to use a response device first designed by Ellis and Tucker (2000). It was generally composed of a small switch grasped between the thumb and the forefinger (to mimic a precision grip) and a large switch grasped between the three remaining fingers and the palm of the same hand (to mimic a power grip). Unfortunately, it is now well documented that some compatibility effects observed with this device could be due to a compatibility with the size of the switches rather than with the kind of grip (Guerineau et al., 2021; Haddad et al., 2023; Harrak et al., 2022; Heurley et al., 2020, 2023). For instance, Ellis and Tucker (2000; see also Tucker & Ellis, 2001, 2004) originally reported an experiment showing that seeing objects usually grasped with a power or a precision grip (e.g., an apple vs. a cherry) facilitates power- and precision-grip responses, respectively, performed on their specific device. This result has been explained by a “simulationist” account (e.g., Barsalou, 2008, 2009) according to which seeing a graspable object leads to the simulation of the action usually performed to grasp it (e.g., a power grip for an apple). Such a simulation would in turn potentiate the compatible response (i.e., power grip). However, Heurley et al. (2020) found that the same kind of graspable objects can also facilitate mere keypress responses performed on a large and a small switch. Therefore, the effect originally reported by Ellis and Tucker (2000) could be explained by a compatibility of size codes.
The objects usually grasped with a power grip are generally larger than those usually grasped with a precision grip and would therefore be coded as large and small, respectively. Similarly, at the response level, the power-grip response would be coded as large and the precision-grip response as small because of the size of the pressed switches. Thus, when the two codes are compatible, responses should be facilitated compared with the incompatible condition. This explanation has been called the size coding account (Masson, 2015; Proctor & Miles, 2014).

Accordingly, it is possible to argue that the influence of syllables on power- and precision-grip responses could likewise be explained by the size coding account. More precisely, the syllables KA and TI would not specifically facilitate performing a power and a precision grip, respectively, but more broadly pressing a large or a small switch. If this were the case, the neurocognitive processes underlying interactions between vocalization and grasping could be less specific than originally envisioned. In other words, they might not be grasping specific but rather based on a compatibility of size codes. To test this possibility, we conducted a preregistered experiment using the same protocol as Vainio et al. (2014, Experiment 1), in which power- and precision-grip responses were influenced by silently reading the syllables KA and TI. However, the participants did not have to perform a grip to respond but had to press a large or a small switch. If the influence of these syllables on grasping is not grasping specific, silently reading the syllable KA should facilitate pressing the large rather than the small switch, and conversely for the syllable TI (i.e., a Syllable × Switch Size interaction).

Method

This experiment has been preregistered (https://osf.io/htw3f). The raw data, the analyses, the OpenSesame scripts and the material are available online (https://osf.io/d7cq8/).

Participants

To determine an appropriate sample size to detect the Syllable × Switch Size interaction, we conducted an a priori power analysis with G*Power 3 (Faul et al., 2007), using the F test family and “ANOVA repeated measures, within factors” as the statistical test. The results indicated that a minimum of 14 participants is required to detect an effect size of \({\hat{\eta}}_p^2\) = 0.15 in a within-subject design with 95% statistical power (α = 0.05). We selected this effect size because, in the analysis of their first experiment, Vainio et al. (2014) reported an effect size of \({\hat{\eta}}_p^2\) = 0.50 for the Syllable × Response interaction. To safeguard against a potential overestimation of the effect in their analysis (see Perugini et al., 2014), we used the lower limit of the 90% CI of their effect, \({\hat{\eta}}_p^2\) [0.15, 0.67].
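As a complement to the power analysis above, the following minimal sketch converts partial eta squared into Cohen's f, the effect-size metric that G*Power takes as input for this test family. The function name is hypothetical and introduced only for illustration. The sketch does not attempt to reproduce the reported sample size of 14, which also depends on G*Power settings that are not detailed here.

```python
import math


def cohens_f_from_partial_eta_squared(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f: f = sqrt(eta / (1 - eta))."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


# Lower limit of the 90% CI used for planning (value taken from the text).
f_planned = cohens_f_from_partial_eta_squared(0.15)
# Point estimate reported by Vainio et al. (2014), as cited in the text.
f_reported = cohens_f_from_partial_eta_squared(0.50)

print(f"planned f = {f_planned:.3f}, reported f = {f_reported:.3f}")
```

Planning on the lower confidence limit rather than the point estimate yields a markedly smaller f, which is what makes the sample size estimate conservative.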

To compensate for a possible decrease in sample size following the discarding of outliers and to properly equalize the number of participants between our four mapping groups, our sample was composed of 16 students at the University Paris Nanterre (10 females and six males; 15 right-handed and one left-handed; Mage = 21.6 years, SDage = 3.0; range: 18–27). They all reported having a normal or corrected-to-normal visual acuity and did not report color perception issues (e.g., color-blind). All participants were naïve to the goal of the experiment and were French native speakers. They were volunteers, provided their written informed consent to participate for course credit, and were debriefed at the end of the experiment. This study was in accordance with the ethical principles of the American Psychological Association (2017).

It is noteworthy that the data of four participants were not correctly saved by OpenSesame. Accordingly, we ran four additional participants in order to reach a minimum of 16 participants.

Material and apparatus

The stimuli were four syllables (KA, TI, PY, and MO) written in capital letters in the center of the screen. We designed three versions of each syllable: a light-gray, a green, and a blue version, always presented against a white background. Their coordinates in the RGB color model are, respectively, “R = 166, G = 166, B = 166”; “R = 0, G = 176, B = 80”; and “R = 0, G = 101, B = 176.” To avoid differences in discriminability between the two target colors (i.e., blue and green), we equalized their luminance (i.e., we set their luminance to 88/255 in the TSL model). The response device was an AZERTY keyboard from which all alphanumeric keys had been removed except the “D” and “L” keys. We constructed two switches: The small switch was a standard keyboard key cut to measure 0.5 cm × 0.5 cm, whereas the large switch was a standard keyboard key on which we fixed a plastic component measuring 5 cm × 4 cm. Depending on the experimental condition, we placed a blue or a green sticker on each switch. To match the size of the switches, one sticker was large and the other was small (see Fig. 1).
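The luminance matching of the two target colors can be checked with a short sketch. It assumes that the “TSL model” corresponds to the hue, saturation, and lightness (HSL) representation, in which lightness is the mean of the largest and smallest RGB components. This reading is an assumption, as are the dictionary and function names.

```python
import colorsys

# RGB coordinates as reported in the text.
STIMULUS_COLORS = {
    "light_gray": (166, 166, 166),
    "green": (0, 176, 80),
    "blue": (0, 101, 176),
}


def hsl_lightness_255(rgb):
    """Return HSL lightness on a 0-255 scale (assumed to match the 'TSL' value)."""
    r, g, b = (component / 255 for component in rgb)
    _hue, lightness, _saturation = colorsys.rgb_to_hls(r, g, b)
    return round(lightness * 255)


for name, rgb in STIMULUS_COLORS.items():
    print(name, hsl_lightness_255(rgb))
```

Under this assumption, both target colors come out at 88/255, in agreement with the value given above, while the light-gray version has a lightness of 166/255.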

Fig. 1. The device used during the experiment. White transparent squares have been added to better understand where switches were located, their size, and how participants were instructed to press them

Procedure

We designed a procedure as close as possible to the one used by Vainio et al. (2014, Experiment 1). The experiment was run in a quiet room where each participant sat facing a monitor (23-in.; refresh rate: 60 Hz; placed at 60 cm). On each trial, a blank screen was displayed for 2,000 ms, immediately followed by a syllable appearing in light gray for 600 ms. Then, the syllable turned either green or blue and remained in view for 2,000 ms or until the participant responded. Participants were instructed to respond as fast and as accurately as possible according to the color of the stimulus if the syllable was KA or TI (i.e., go trials), and to withhold their response if the syllable was MO or PY (i.e., no-go trials). This procedure ensured that participants actually read the syllable displayed.
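The trial timeline described above can be expressed in screen refreshes, which is a common sanity check when display durations must be whole multiples of the frame duration. The sketch below is illustrative only: the names are hypothetical, and it is not taken from the OpenSesame scripts of the experiment.

```python
REFRESH_HZ = 60  # monitor refresh rate reported in the text

# (event, maximum duration in ms), in presentation order.
TRIAL_TIMELINE = [
    ("blank screen", 2000),
    ("light-gray syllable", 600),
    ("colored syllable (or until response)", 2000),
]


def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Number of screen refreshes closest to the requested duration."""
    return round(duration_ms * refresh_hz / 1000)


frames = {event: ms_to_frames(ms) for event, ms in TRIAL_TIMELINE}
max_trial_ms = sum(ms for _event, ms in TRIAL_TIMELINE)

for event, n_frames in frames.items():
    print(f"{event}: {n_frames} frames")
print(f"maximum trial duration: {max_trial_ms} ms")
```

At 60 Hz, each of the three durations corresponds to a whole number of frames, so none of them needs to be rounded on this monitor.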

In go trials, the participants had to press either the large or the small switch according to the color of the syllable. The keyboard was placed in front of the participants, who were instructed to use only their dominant hand to respond (for a similar method, see Heurley et al., 2020). Participants had to press the “L” key with their index finger and the “D” key with their thumb (see Fig. 1). To properly counterbalance the keys (D vs. L), the colors (blue vs. green), and the size of the switches (large vs. small), we designed four groups to which participants were randomly assigned (i.e., four participants per group; see Footnote 1). For instance, one participant had to press the “L” key (e.g., large switch) with their index finger if the syllable was blue and the “D” key (i.e., small switch) with their thumb if the syllable was green. Both responses were recorded using OpenSesame (Version 3.3.9) on an HP-Probook-650G1 2.40 GHz computer. After familiarization trials (n = 24), there were 216 test trials: 96 no-go trials (48 with the PY syllable and 48 with the MO syllable) and 120 go trials (60 with the KA syllable and 60 with the TI syllable, all appearing in both the blue and the green version).
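The composition of the test block can be sketched as follows. The sketch assumes that every syllable, including the no-go syllables, appeared equally often in blue and in green, and that trials were presented in a fully random order; neither detail is confirmed by the description above. All names are hypothetical.

```python
import random
from collections import Counter

COLORS = ("blue", "green")
# Trials per syllable, as reported in the text.
TRIALS_PER_SYLLABLE = {"KA": 60, "TI": 60, "PY": 48, "MO": 48}
GO_SYLLABLES = {"KA", "TI"}


def build_test_trials(seed=None):
    """Build a shuffled list of the 216 test trials.

    Assumption: each syllable is split evenly between the two colors.
    """
    trials = []
    for syllable, n_trials in TRIALS_PER_SYLLABLE.items():
        trial_type = "go" if syllable in GO_SYLLABLES else "no-go"
        for color in COLORS:
            for _ in range(n_trials // len(COLORS)):
                trials.append(
                    {"syllable": syllable, "color": color, "type": trial_type}
                )
    random.Random(seed).shuffle(trials)  # assumed full randomization
    return trials


test_trials = build_test_trials(seed=1)
print(len(test_trials), Counter(t["type"] for t in test_trials))
```

Under the even-split assumption, each go syllable appears 30 times in each color, so every cell of the Syllable × Switch Size design contains 30 trials per participant before exclusions.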

It is noteworthy that the words “small” and “large” were never used to denote the response keys during the instructions; the experimenter referred to each switch by its associated color (i.e., the blue switch vs. the green switch). Additionally, the participants were not allowed to read the syllables aloud. Therefore, the experimenter checked that participants did not overtly articulate the syllables, as in Vainio et al. (2014, Experiment 1).

Finally, participants completed the four-item version of the Edinburgh Handedness Inventory (Veale, 2014) and a short questionnaire gathering complementary information about possible vision impairments, their native language, their feedback on the use of the device, and their guess about the aim of the experiment. The whole procedure was controlled by OpenSesame (Version 3.3.10; Mathôt et al., 2012).

Results

We only analyzed response times (RTs) on go trials (i.e., when the syllable was KA or TI). RTs were defined as the delay between the onset of the colored syllable and the moment the participant pressed the key to respond. We removed RTs from familiarization trials and incorrect test trials (1.6%), as well as RTs below 200 ms and above 1,200 ms (0.2%), as in Heurley et al. (2020; see Footnote 2). It is noteworthy that 14 responses were executed in no-go trials and 42 responses were withheld in go trials. We used a repeated-measures ANOVA with participants as a random variable and response type (large vs. small switch) and syllable (KA vs. TI) as fixed within-participants independent variables.
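The exclusion rules above can be summarized in a short sketch. The trial dictionaries and their field names are hypothetical and do not reflect the column names of the raw data files. Values of exactly 200 ms or 1,200 ms are kept, following the “below” and “above” wording.

```python
from collections import defaultdict
from statistics import mean


def keep_for_rt_analysis(trial, low_ms=200, high_ms=1200):
    """Keep correct go trials from the test phase with an RT within bounds."""
    return (
        trial["phase"] == "test"
        and trial["trial_type"] == "go"
        and trial["correct"]
        and low_ms <= trial["rt_ms"] <= high_ms
    )


def cell_means(trials):
    """Mean RT per (switch size, syllable) cell for one participant."""
    cells = defaultdict(list)
    for trial in filter(keep_for_rt_analysis, trials):
        cells[(trial["switch"], trial["syllable"])].append(trial["rt_ms"])
    return {cell: mean(rts) for cell, rts in cells.items()}


# Made-up trials for illustration only (not data from the experiment).
example_trials = [
    {"phase": "test", "trial_type": "go", "correct": True, "rt_ms": 431, "switch": "large", "syllable": "KA"},
    {"phase": "test", "trial_type": "go", "correct": True, "rt_ms": 441, "switch": "large", "syllable": "KA"},
    {"phase": "test", "trial_type": "go", "correct": True, "rt_ms": 150, "switch": "large", "syllable": "KA"},
    {"phase": "test", "trial_type": "go", "correct": True, "rt_ms": 1250, "switch": "small", "syllable": "TI"},
    {"phase": "test", "trial_type": "go", "correct": True, "rt_ms": 432, "switch": "small", "syllable": "TI"},
    {"phase": "test", "trial_type": "go", "correct": False, "rt_ms": 500, "switch": "small", "syllable": "KA"},
    {"phase": "practice", "trial_type": "go", "correct": True, "rt_ms": 400, "switch": "small", "syllable": "TI"},
    {"phase": "test", "trial_type": "no-go", "correct": True, "rt_ms": None, "switch": None, "syllable": "PY"},
]

print(cell_means(example_trials))
```

The per-participant cell means produced this way are the values that would enter the 2 × 2 repeated-measures ANOVA.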

The ANOVA revealed neither a main effect of response type, F(1, 15) = 0.002, p = .97, \({\hat{\eta}}_p^2\) = .00, nor a main effect of syllable, F(1, 15) = 0.11, p = .75, \({\hat{\eta}}_p^2\) = .00. It did, however, reveal a significant Response Type × Syllable interaction, F(1, 15) = 10.92, p < .005, \({\hat{\eta}}_p^2\) = .42. Planned comparisons showed that large-switch RTs were shorter when the syllable KA was read (M = 431 ms; SD = 67) than when the syllable TI was read (M = 448 ms; SD = 73), F(1, 15) = 7.28, p < .02, \({\hat{\eta}}_p^2\) = .33, and that small-switch RTs were shorter when the syllable TI was read (M = 432 ms; SD = 67) than when the syllable KA was read (M = 446 ms; SD = 76), F(1, 15) = 5.17, p = .04, \({\hat{\eta}}_p^2\) = .26.
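For readers who wish to relate the F ratios to the effect sizes, the sketch below uses the standard identity linking partial eta squared to F and its degrees of freedom, and derives the size of the compatibility effect from the four reported cell means. The function and variable names are hypothetical, and the computation is descriptive rather than a reanalysis of the raw data.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio: F * df1 / (F * df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios reported in the text, all with (1, 15) degrees of freedom.
REPORTED_F = {
    "interaction": 10.92,
    "large switch: KA vs. TI": 7.28,
    "small switch: TI vs. KA": 5.17,
}
effect_sizes = {
    label: round(partial_eta_squared(f, 1, 15), 2) for label, f in REPORTED_F.items()
}

# Cell means (ms) reported in the text, keyed by (switch size, syllable).
CELL_MEANS = {
    ("large", "KA"): 431,
    ("large", "TI"): 448,
    ("small", "TI"): 432,
    ("small", "KA"): 446,
}
compatible = (CELL_MEANS[("large", "KA")] + CELL_MEANS[("small", "TI")]) / 2
incompatible = (CELL_MEANS[("large", "TI")] + CELL_MEANS[("small", "KA")]) / 2
compatibility_effect_ms = incompatible - compatible

print(effect_sizes)
print(f"compatibility effect = {compatibility_effect_ms} ms")
```

With these inputs, the three effect sizes agree with the reported values (.42, .33, and .26), and the compatible pairings are on average 15.5 ms faster than the incompatible ones.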

Discussion

Our main goal was to investigate the nature of the close link between vocalization and grasping (Gentilucci & Campione, 2012; Vainio, 2019). It is possible that the syllables KA and TI do not facilitate a specific grip (e.g., Vainio et al., 2013) but, more generally, the pressing of a large versus a small switch. To test this alternative account, we designed a preregistered experiment based on Vainio et al. (2014, Experiment 1) in which participants were merely instructed to press a large versus a small switch. The data show that silently reading the syllables KA and TI facilitates pressing the large and the small switch, respectively. In the remainder of the discussion, we draw some methodological and theoretical conclusions.

The methodological counterpart

Our results invite researchers to be very careful when interpreting the link between vocalization and grasping. Indeed, they highlight the need to properly control the size of the switches used to respond before firmly concluding that grasping-related processes are involved. Without this precaution, it remains arguable that the interaction depends instead on the size of the switches. This methodological consideration is particularly relevant for studies using a response device such as the one originally designed by Ellis and Tucker (2000), in which power- and precision-grip responses are carried out on a large and a small switch, respectively. But this consideration can also be extended to studies using reach-to-grasp movements directed toward a small or a large target object (e.g., Gentilucci et al., 2009), whose size can drive the compatibility effect independently of the grasp used. Besides, it is interesting to note that this methodological precaution should also be extended to studies using the same kind of response device to support a close link between grasping and number processing (e.g., Lindemann et al., 2007; Moretto & Di Pellegrino, 2008). In this case as well, the facilitation of power- and precision-grip responses by large and small numbers, respectively, may occur only because of the size of the switches used.

The size coding account

One critical question is how to explain that silently reading the syllables KA and TI can facilitate pressing a large and a small switch, respectively. It seems that the influence of vocalization on manual gestures is less specific than originally thought and that it critically relies on size. One valuable explanation lies in the size coding account (Heurley et al., 2020, 2023; Masson, 2015; Proctor & Miles, 2014), initially developed to explain the potentiation effect of grasping behaviors reported by Ellis and Tucker (2000; see also Tucker & Ellis, 2001, 2004). Concretely, it is assumed that stimulus–response compatibility effects occur when the stimulus and the response share a similar size code, compared with the condition where the size codes mismatch. For instance, participants could be faster to respond on a large switch when they see a large object (e.g., an apple) rather than a small one (e.g., a cherry) because in the former case the stimulus (i.e., the object) and the response are both coded as “large,” while in the latter case the size codes differ (for supporting data, see Guerineau et al., 2021; Haddad et al., 2023; Harrak et al., 2022; Heurley et al., 2020, 2023). This account is a modified version of the spatial coding account developed by several authors (Nicoletti & Umiltà, 1984; Wallace, 1971) to explain the widely known Simon effect (Simon & Rudell, 1967).

In this vein, silently reading the syllables could lead to their automatic coding along the size dimension, with KA coded as “large” and TI coded as “small.” These codes can then match (or mismatch) the size code associated with the responses themselves. We hypothesize that the size coding of the syllables can occur for at least three, not mutually exclusive, reasons. First, KA could be coded as large because pronouncing this syllable implies a larger mouth opening than pronouncing TI. More precisely, vowels such as /i/ imply a narrow mouth shape, while vowels such as /ɑ/ imply a wide opening of the vocal tract (Vainio & Vainio, 2022). This difference in mouth opening can be easily perceived by participants through their proprioceptive receptors, which could favor the relative coding of the syllables/vowels as large and small. This hypothesis could therefore explain various results obtained with experimental protocols requiring participants to pronounce those syllables aloud (e.g., Gentilucci & Campione, 2011; Tiainen et al., 2017; Vainio et al., 2013). It is noteworthy that this explanation also helps to understand why some consonants could facilitate power- and precision-grip responses as well. Vainio and Vainio (2022), for instance, reported that the consonant k facilitates a power grip, while the consonants d, s, or t instead facilitate a precision grip. Like the vowel /i/, the consonants d, s, and t require a narrow mouth shape, while the consonant k, like the vowel /ɑ/, requires a larger opening. From this viewpoint, the interaction between vocalization and grasping would not be directly due to grasping-related processes but would be mediated by size codes that themselves depend on the opening of the mouth/hand.
Hence, this reasoning is not in agreement with the proposal of Vainio and Vainio (2022), who assumed that alveolar consonants (e.g., d, s, or t) should be associated with a precision grip as their vocalization entails a movement of the tip of the tongue similar to the pinching action performed by the thumb and index finger during a precision grip.

Second, it could be argued that when the syllable KA is pronounced, the sound produced has a lower pitch than the sound produced when the syllable TI is pronounced. Cross-modal studies have established that low and high tones are naturally and automatically associated with large and small objects, respectively (Parise & Spence, 2009; see also Guerineau et al., 2021, for converging evidence). Therefore, it is possible that KA and TI are coded as large and small because of cross-modal pitch/size associations. However, it is noteworthy that this hypothesis is less suited to explaining the compatibility effects between consonants and grasping (Vainio & Vainio, 2022). Further studies should nevertheless pay attention to the pitch of the sounds produced when consonants are pronounced. For instance, the vocalization of the consonant k may induce low-pitch tones while the consonants d or s may instead induce higher-pitch tones, which would explain why the former facilitates a power grip and the latter a precision grip (see Footnote 3).

A last possibility is that the syllable KA is coded as larger than the syllable TI because the former appears visually larger than the latter when both are written, as in our experiment. Indeed, the letter i is visually smaller than the three other letters composing the two syllables. This hypothesis is in line with the possibility of finding a size-based compatibility effect solely because of the visual size of the stimuli (e.g., Harrak et al., 2022; Heurley et al., 2023).

A possible involvement of grasping-specific processes

It is noteworthy that even if the size coding account can explain the compatibility effects between vocalization and grasping, it cannot be excluded that grasping-specific processes are also at work, as originally advocated (e.g., Gentilucci & Campione, 2012). This possibility is supported by various results. First, Gentilucci and Campione (2011) showed that pronouncing the syllables KA and TI modulates grip aperture during reach-to-grasp movements. Interestingly, in this experiment, the responses cannot be coded as large and small because participants always performed the same gesture directed toward the same target object. Therefore, it is possible that this effect is due to grasping-specific processes associated with syllable processing. Nonetheless, we cannot exclude the possibility that the influence of syllables on grip aperture is also non-grasping specific. Indeed, the size codes associated with the syllables KA and TI could alter hand opening even when the responses themselves are not coded along this dimension. Further studies should be designed to test this possibility.

In the same vein, Vainio, Rantala, et al. (2017a) reported that perceiving pictures of a hand carrying out a power or a precision grip facilitates pronouncing the syllables KA and TI, respectively, whereas this was not the case when the pictures depicted large and small abstract shapes (Vainio et al., 2017b). Similarly, Vainio et al. (2019) reported that large manipulable objects usually grasped with a power grip (e.g., an apple) and small manipulable objects usually grasped with a precision grip (e.g., a cherry) facilitate pronouncing the syllables KA and TI, respectively, whereas this is no longer the case when the large and small objects are ungraspable (e.g., an elephant). Altogether, these works point toward the possibility that size may not be sufficient to induce systematic interactions with vocalization. Grasping-specific processes seem necessary under some conditions, which need to be clarified by further studies. In addition, our current demonstration is restricted to the influence of the syllables KA and TI and, more specifically, to a task requiring silent reading of each syllable. However, various studies have also reported an influence of other kinds of syllables on grasping responses, using other kinds of tasks (e.g., Vainio et al., 2013). Other works also suggest a reverse influence of grasping on vocalization (e.g., Tiainen et al., 2017; Vainio et al., 2019). Further studies should therefore be conducted to test whether the size coding account also applies to these additional conditions or whether the involvement of grasping-specific processes is a more reasonable explanation.

It is interesting to note that Vainio and Vainio (2021, 2022) proposed a possible link between vocalization/grasping interactions and sound–magnitude symbolism effects. “Sound–magnitude symbolism” refers to the systematic mental associations between specific speech sounds and magnitude concepts (e.g., Sapir, 1929). For instance, high and front vowels are typically associated with small concepts, while low and back vowels are associated with large concepts (Vainio & Vainio, 2021). Vainio and Vainio (2021, 2022) suggest that these associations could be mediated by grasp-specific processes. Applied to our data, it could be argued that the syllables KA and TI are associated with a large and a small magnitude, respectively, because of shared connections between hand and mouth opening. This magnitude would in turn favor a response performed on a large or a small switch because those switches would be associated with similar magnitudes. In a nutshell, such a view, like the size coding account, can predict any association between vocalization and size, including our results. Its predictions are thus no longer grasping specific, in the sense that they are not restricted to a relation between vocalization and grasping. The main difference with the size coding account, however, lies in how the magnitude becomes associated with the syllables. While the size coding account assumes that there is no need for grasping-specific processes, the perspective developed by Vainio and Vainio (2021, 2022) assumes that such processes are necessary. Further studies should therefore be designed to disentangle these accounts.

Conclusions

To sum up, our current work supports the possibility that silently reading the syllables KA and TI can influence not only grasping behaviors but also mere keypress responses performed on a large and a small switch, respectively. This result is of primary importance because it suggests an alternative account of the link between vocalization and grasping that does not involve grasping-specific neurocognitive processes, as is usually advocated (Gentilucci & Campione, 2012; Vainio, 2019).