We used linear models to analyze participants’ responses in each dimension of the CIELuv space, adopting a mixed-effects approach to account for repeated measures, with random effects for participant and trial. Fixed effects or interactions that did not significantly improve the model fit were dropped, using the step() function from the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2018). All of the models reported below have significantly improved fit over a null model or simpler alternatives. The models were analyzed in R using the lme4 package (Bates, Maechler, Bolker, & Walker, 2013); F and p values were estimated using Satterthwaite approximations and the anova() function in the lmerTest package (Kuznetsova et al., 2018).
We examined the continuous predictors of F1 and F2 vowel formants, as well as the categorical predictors of vowel phoneme category and grapheme category, summarized in Table 2. We compared models with different predictors using the sem.model.fits function in the piecewiseSEM package (Lefcheck, 2016), which compares Akaike information criterion (AIC) values across models and estimates the marginal (fixed effect) and conditional (fixed + random effect) R2 values for each model. Although phonemic, graphemic, and acoustic factors all play roles, we found that vowel phoneme category is the best overall predictor of color choice, and that grapheme category accounts for more variation than acoustic factors. Details of the grapheme models are included in Appendix 2, but they will not be discussed at length here, since vowel category is a better predictor of responses. In the following analyses, we focus first on acoustic factors for a comparison with earlier work, followed by further analyses of vowel phoneme category.
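The AIC comparison performed by sem.model.fits rests on a simple formula, AIC = 2k − 2 ln L̂, where k is the number of parameters and L̂ the maximized likelihood. A minimal sketch, with made-up log-likelihoods rather than values from this study:

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L).
    Lower values indicate a better fit after penalizing model complexity."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical models (illustrative numbers only):
aic_a = aic(log_likelihood=-1200.0, n_params=8)  # e.g., a phoneme-category model
aic_b = aic(log_likelihood=-1230.0, n_params=6)  # e.g., an acoustic model
delta_aic = aic_b - aic_a  # positive: model a is preferred despite more parameters
```

Here the better-fitting model wins even though it has more parameters, because its likelihood gain outweighs the complexity penalty.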
Figure 5 shows the results in color space in the same way as in Moos et al. (2014). As in Moos et al.’s study, we found that synesthetes chose a generally wider range of colors, but the general shape of the mapping between vowels and colors was shared between synesthetes and nonsynesthetes. Figure 6 shows the exact color values chosen in each dimension for each item, plotted in F1–F2 space.
In the formant models described below, F1, F2, and synesthetic status were tested as fixed-effect predictors of L, u, and v. The models tested interactions with each acoustic predictor and synesthetic status (but not between acoustic predictors) as fixed effects, after Moos et al. (2014). Below we describe the results for L, u, and v, respectively.
For acoustic factors, more variance was accounted for in lightness choices (~15%, marginal R2 = .155) than in the u and v dimensions, where the fixed effects accounted for less than 10% of the variation (u, green–red, marginal R2 = .017; v, blue–yellow, marginal R2 = .08).
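The marginal and conditional R2 values reported here follow the variance-partitioning logic implemented in piecewiseSEM (after Nakagawa and Schielzeth): fixed-effect variance alone, or fixed plus random variance, as a share of the total. A minimal sketch with illustrative variance components, not this study's values:

```python
def mixed_model_r2(var_fixed, var_random, var_residual):
    """Marginal R2 (fixed effects only) and conditional R2 (fixed + random),
    each expressed as a share of the total outcome variance."""
    total = var_fixed + var_random + var_residual
    marginal = var_fixed / total
    conditional = (var_fixed + var_random) / total
    return marginal, conditional

# Illustrative components: equal fixed and random variance, larger residual.
marginal, conditional = mixed_model_r2(1.0, 1.0, 2.0)
```

The gap between the two values indicates how much of the explained variance is carried by the random effects (here, participants and trials) rather than the predictors of interest.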
Participants generally chose lighter colors for more front vowels and darker colors for more back vowels (Fig. 6, top left), reflected by a strong significant main effect of F2 in the model (Table 3). The model also showed that participants chose slightly lighter colors for lower vowels. Synesthetes were likely to choose slightly darker colors overall, but an interaction between F2 and synesthetic status showed that synesthetes tended to choose slightly lighter colors for front vowels than nonsynesthetes. On the other hand, the interaction between F1 and synesthetic status showed that synesthetes chose slightly darker colors for lower vowels (higher values of F1) than nonsynesthetes. Although the synesthete/nonsynesthete contrasts are strongly significant, they are difficult to detect in Fig. 6, since the color swatches reflect means of choices from a large, continuous color palette instead of from a predefined set of 16 colors, as in Moos et al. (2014). For the specific estimates and p values of the fixed effects, see Table 3. For examples of the individual response patterns, see Fig. 10 below.
Acoustic factors accounted for the least variation in the u (green–red) dimension (Fig. 6, top right), with less than 2% of the variation in responses being predicted by F1, F2, and synesthetic status. Participants generally chose redder colors for lower vowels (higher F1) and more back vowels (lower F2). There was no main effect of synesthetic status in this dimension; however, there were significant interactions with F1 and F2. Synesthetes chose colors that were greener for low and back vowels than did nonsynesthetes. The detailed estimates, along with F and p values, are outlined in Table 4.
On the blue–yellow (v) dimension (Fig. 6, bottom), the interaction between F1 and synesthetic status was dropped, since it did not improve model fit. Participants preferred yellower colors for high (low F1) and front (high F2) vowels. Synesthetes preferred bluer colors overall; however, they chose significantly yellower colors for front vowels than did nonsynesthetic participants, reflected in the significant interaction between F2 and synesthetic status. The detailed estimates, along with F and p values, are outlined in Table 5.
Phonemes and graphemes
In the categorical predictor models described in this section, phoneme (and grapheme) category and synesthetic status were fixed-effect predictors of L, u, and v. As with acoustic predictors, these models also tested interactions between phoneme (and grapheme) category and synesthetic status as fixed effects. For grapheme category, grapheme synesthete status was used as the synesthetic predictor instead of vowel synesthete status. Since vowel category was a better predictor of responses, we will not detail the grapheme results here, but they are provided in Appendix 2. The detailed results of the vowel models are provided below.
For vowel category, we observed significant main effects of this variable in all three color dimensions (L, F = 1,493, p < .001; u, F = 166, p < .001; v, F = 660, p < .001), as well as a significant main effect of synesthetic status in the L and v dimensions (L, F = 10.73, p = .001; u, F = 0.75, p = .39; v, F = 660, p < .001). These results echo those found for acoustic factors, with synesthetes choosing lighter (estimate = 1.34, SE = 0.49) and yellower (estimate = 6.92, SE = 2.71) colors than nonsynesthetes.
To assess specific effects for vowels, we calculated contrasts between all vowel categories in each dimension, with Bonferroni adjustments for all reported p values, using the lsmeans package (Lenth, 2016). The differences between vowel categories were mostly highly significant for all color dimensions and are fully listed in Appendix 3, with estimated means, confidence intervals, t ratios, and p values. Since the contrasts were mostly significant, Fig. 7 shows which vowels were not different from one another, using black lines for uncorrected nonsignificant contrasts (p > .05), a dotted line for a contrast in which p < .05, and a dashed line for a contrast in which p < .01. In other words, the more solid a line between two vowels is, the more similarly participants responded to them. For all unmarked contrasts, p < .001. Vowels are plotted with their canonical Dutch phoneme values from Adank et al. (2004). Figure 7 shows the results in each dimension.
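The Bonferroni adjustment applied to these contrasts simply scales each raw p value by the number of comparisons; a minimal sketch:

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw p value by the number of
    comparisons performed, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Three hypothetical raw p values from a family of three contrasts.
adjusted = bonferroni([0.01, 0.2, 0.9])
```

This controls the familywise error rate at the cost of power, which is why many of the surviving contrasts reported here are strongly significant.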
In all dimensions, the vowels /e, ɪ, ε, ø/ tended not to be significantly different, forming a “mid-front” grouping for which participants generally chose similar colors that were lighter, greener, and yellower than those of low back vowels. In both the L and v spaces, the vowel /i/ stood apart from this group, being even lighter and yellower. The vowels /u/ and // were also informally grouped in the u and v dimensions. In the u dimension, these also converged with /a/, although /ɑ/ was set apart as the reddest of all the vowels.
Contrasts between synesthetes and nonsynesthetes were not calculated for the u dimension, due to the lack of a main effect of synesthetic status. For the other dimensions, significant contrasts between synesthetes and nonsynesthetes are marked in Fig. 7 as follows: ***p < .001, **p < .01, *p < .05. The vowels in the “mid-front” grouping described above were generally lighter for synesthetes than for nonsynesthetes, and this was also true for the highest front vowel /i/. In the v dimension, synesthetes’ choices were generally yellower than nonsynesthetes’ for the same “mid-front” grouping, and also yellower for the high front vowel /i/. They were also slightly yellower for the low vowels /a/ and /ɑ/.
In a large sample of Dutch speakers, we found evidence of shared vowel–color associations. As in earlier work, our data showed that the acoustic factors F1 and F2 were predictive of color choices: Higher values of F1 (i.e., lower vowels) are darker, redder, and bluer; higher values of F2 (i.e., more front vowels) are lighter, greener, and yellower. These results echo those found for English speakers by Moos et al. (2014), and for Korean–English bilinguals in Kim et al. (2018). Although the general shape of associations is shared across synesthetes and nonsynesthetes (Fig. 5), synesthetes show more extreme color and lightness choices, selecting lighter and yellower colors for high values of F2 especially. The differences between synesthetes and nonsynesthetes in our results are not as marked as those reported by Moos et al.; we address several potential reasons for this in the General Discussion.
However, we also found that an approximation of phoneme category is a better predictor of color choice than the acoustic measures, indicating that categorical perception can shape the structure of cross-modal associations. As would be expected given the acoustic results, front vowels are lighter, greener, and yellower, whereas low and back vowels are darker, redder, and bluer. There were generally no significant differences within the “mid-front” group of /e/, /ɪ/, /ø/, and /ε/. However, particularly for this group, synesthetes chose slightly lighter and yellower colors than did nonsynesthetes. They also chose lighter and yellower colors for the high front vowel /i/, and darker colors for the high back vowel /u/.
Although the values from the phoneme model are very close to those from the grapheme model, comparisons of the two predictors showed that the phoneme model is significantly superior in every color dimension. It may be that grapheme category is only a good predictor of color choices insofar as it is a fairly good, though imperfect, predictor of vowel category. Although a rough mid-front vowel grouping emerged in the phoneme category analyses, this does not correspond to a larger grapheme grouping, since the four relevant phonemes map onto three different graphemes (/e/ and /ε/ to e, /ɪ/ to i, and /ø/ to u) in Dutch orthography.
So far, our analyses were largely concerned with the kind of questions asked in classic cross-modal association studies, linking specific colors or color dimensions to acoustic and phonemic features. We also looked at contrasts between synesthetes and nonsynesthetes, based largely on consistency across trials. Although the temporal consistency of mappings is rightly considered a benchmark of genuine synesthesia, and some earlier studies have considered how the mappings of synesthetes relate to those of nonsynesthetes, less consideration has been given to the internal structure of synesthetic and cross-modal mappings. The traditional approach tells us something about whether synesthetes choose colors for sounds that are different from or similar to the kinds chosen by nonsynesthetes, but it is less adept at detecting overall structure in cross-modal mappings or telling us whether the mappings of synesthetes are more internally structured than those of nonsynesthetes. Are there structural regularities in how we link one sensory domain to another? Does the shape of the vowel space map onto the color space more reliably for synesthetes than for nonsynesthetes?
We operationalized structure in this context by comparing paired distances across spaces, using a method borrowed from ecology (Mantel, 1967). This method has been used extensively in iterated artificial-language learning studies to detect structured mappings between form and meaning spaces (e.g., Kirby, Cornish, & Smith, 2008). The Mantel test in this context measures whether distances in form correlate with distances in meaning. To the extent that they do, we can say there is structure in the mappings between two spaces.
In the context of the present data, for example, a mapping would be structured when pairs of vowels that are similar in F1–F2 acoustic space map onto pairs of colors that are similar in three-dimensional color space, and when pairs of vowels that are dissimilar in F1–F2 acoustic space map onto pairs of colors that are dissimilar. Thus, structure implies a degree of isomorphism across (multidimensional) sensory spaces—in this case, acoustic and color spaces.
Whereas our earlier consistency measures had used CIELuv space to align with prior work (Rothen et al., 2013), for this measure we used the related CIELab space. In CIELab space, the L dimension is identical, whereas a corresponds to a green–red continuum (similar to u) and b corresponds to a blue–yellow continuum (similar to v). The benefit of CIELab space for the structure measure is that it allows us to use more perceptually realistic distances, specifically ∆E2000 (Sharma, Wu, & Dalal, 2005). The ∆E2000 distance takes into account that Euclidean distances have nonuniform perceptual effects, particularly at the edges of the color space. For example, as lightness increases toward the white point, perceptual differences in chroma shrink and eventually disappear, even though the plain Euclidean distance between such colors in CIELab or CIELuv space would be identical to that of two perceptually distinct colors elsewhere in the space. Therefore, our structure measure relies on ∆E2000 distances in CIELab space. For vowel distances, we used Euclidean distance in F1–F2 space, using the canonical phoneme values shown in Fig. 7 (see Footnote 4). Since Euclidean distance generalizes to any number of dimensions, this allowed us to use all three dimensions of a participant’s color response to an item at once.
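For reference, the ∆E2000 computation can be implemented compactly. The sketch below follows the published CIEDE2000 formulation (cf. Sharma et al., 2005); it is an independent illustration, not the code used for this study:

```python
from math import atan2, cos, sin, sqrt, exp, radians, degrees

def delta_e_2000(lab1, lab2):
    """CIEDE2000 color difference between two CIELab colors (L, a, b)."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    # Chroma-dependent rescaling of the a* axis
    C1, C2 = sqrt(a1**2 + b1**2), sqrt(a2**2 + b2**2)
    Cbar = (C1 + C2) / 2
    G = 0.5 * (1 - sqrt(Cbar**7 / (Cbar**7 + 25**7)))
    a1p, a2p = (1 + G) * a1, (1 + G) * a2
    C1p, C2p = sqrt(a1p**2 + b1**2), sqrt(a2p**2 + b2**2)
    h1p = degrees(atan2(b1, a1p)) % 360 if (a1p or b1) else 0.0
    h2p = degrees(atan2(b2, a2p)) % 360 if (a2p or b2) else 0.0
    # Differences in lightness, chroma, and hue
    dLp, dCp = L2 - L1, C2p - C1p
    if C1p * C2p == 0:
        dhp = 0.0
    elif abs(h2p - h1p) <= 180:
        dhp = h2p - h1p
    elif h2p - h1p > 180:
        dhp = h2p - h1p - 360
    else:
        dhp = h2p - h1p + 360
    dHp = 2 * sqrt(C1p * C2p) * sin(radians(dhp) / 2)
    # Weighting functions around the mean lightness, chroma, and hue
    Lbp, Cbp = (L1 + L2) / 2, (C1p + C2p) / 2
    if C1p * C2p == 0:
        hbp = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        hbp = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        hbp = (h1p + h2p + 360) / 2
    else:
        hbp = (h1p + h2p - 360) / 2
    T = (1 - 0.17 * cos(radians(hbp - 30)) + 0.24 * cos(radians(2 * hbp))
         + 0.32 * cos(radians(3 * hbp + 6)) - 0.20 * cos(radians(4 * hbp - 63)))
    d_theta = 30 * exp(-(((hbp - 275) / 25) ** 2))
    RC = 2 * sqrt(Cbp**7 / (Cbp**7 + 25**7))
    SL = 1 + 0.015 * (Lbp - 50) ** 2 / sqrt(20 + (Lbp - 50) ** 2)
    SC = 1 + 0.045 * Cbp
    SH = 1 + 0.015 * Cbp * T
    RT = -sin(radians(2 * d_theta)) * RC
    return sqrt((dLp / SL) ** 2 + (dCp / SC) ** 2 + (dHp / SH) ** 2
                + RT * (dCp / SC) * (dHp / SH))
```

On the standard test pair from Sharma et al. (L, a, b of (50, 2.6772, −79.7751) vs. (50, 0, −82.7485)), this yields ∆E2000 ≈ 2.04, roughly half the plain Euclidean CIELab distance of ≈ 4.0 for the same pair, illustrating the perceptual correction described above.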
Where there is structure, the pairwise distances within each space will be correlated with one another. Once we had pairwise distances for every mapping in each space from our participants’ data, we permuted the vowel–color mapping between the two spaces and then recalculated the pairwise distances, in order to obtain a distribution of potential correlations between the two spaces. Using this distribution, we obtained a z score indicating where a given participant’s veridical mapping fell in this distribution, and a p value indicating the likelihood that a random mapping of the vowel–color space would be more structured than the actual one (see Footnote 5). In other words, p < .05 in this case means that fewer than 5% of mappings generated in the simulation were more structured than the real one.
Pairwise distances were calculated between all sounds and all colors chosen by a particular participant, and a veridical correlation was calculated between these distance matrices. To account for multiple responses to the same stimuli, the responses were first shuffled within a particular item, and then across items. These matrices were then shuffled into 10,000 random permutations, each with its own r, allowing us to calculate a z score as described above. Python code for performing the Mantel structure analyses is available online at http://github.com/mdingemanse/colouredvowels. Figure 8 shows a density plot of z scores of synesthetes and nonsynesthetes in the vowel–color association task.
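The permutation logic can be sketched in a self-contained form as follows; this is a simplified illustration (it omits the within-item shuffling of repeated responses handled by the published code), with hypothetical two-dimensional points standing in for vowels and colors:

```python
import random
from statistics import mean, stdev

def pairwise_distances(points, dist):
    """Condensed vector of distances between all pairs of points."""
    n = len(points)
    return [dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]

def pearson_r(x, y):
    """Pearson correlation between two equal-length distance vectors."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def mantel_z(vowels, colors, dist_v, dist_c, n_perm=2000, seed=1):
    """Compare the veridical vowel-color distance correlation against
    correlations obtained under random permutations of the mapping."""
    dv = pairwise_distances(vowels, dist_v)
    veridical = pearson_r(dv, pairwise_distances(colors, dist_c))
    rng = random.Random(seed)
    permuted = []
    for _ in range(n_perm):
        shuffled = colors[:]
        rng.shuffle(shuffled)  # break the vowel -> color pairing
        permuted.append(pearson_r(dv, pairwise_distances(shuffled, dist_c)))
    z = (veridical - mean(permuted)) / stdev(permuted)
    p = sum(r >= veridical for r in permuted) / n_perm
    return z, p

# Example: two identical spaces form a perfectly structured mapping.
euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
pts = [(0, 0), (1, 0), (0, 1), (2, 2), (3, 1), (4, 0), (1, 3), (5, 2)]
z_score, p_value = mantel_z(pts, pts, euclid, euclid, n_perm=500)
```

In the example, the veridical correlation is maximal, so the z score is positive and very few random permutations match it, giving a small p.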
Three findings stand out. First, the mappings of synesthetes tend to be more structured than those of nonsynesthetes (t = –7.09, df = 660, p < .001). Second, the majority of participants’ mappings are more structured than would be expected by chance: All participants to the right of the vertical dotted line had correlations between the vowel and color spaces greater than 95% of the random permutations generated by the Mantel test. Third, there is a correlation between the structure score and CIELuv consistency scores: Participants with more consistent associations across trials (i.e., lower consistency scores) tended to have more structured mappings across the vowel and color spaces (r = –.313, t = –11.07, p < .001; Fig. 9).
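The t statistic for such a correlation follows from the standard transformation t = r·√df / √(1 − r²); a quick sketch with illustrative numbers rather than the study’s data:

```python
from math import sqrt

def t_from_r(r, df):
    """t statistic for testing a Pearson correlation against zero."""
    return r * sqrt(df) / sqrt(1 - r ** 2)

# Illustrative: a correlation of .5 with 16 degrees of freedom.
t_example = t_from_r(0.5, 16)
```

The sign of t follows the sign of r, so negative correlations (as between structure scores and consistency scores above) yield negative t values.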
This measure provides a new way to quantify the structure of cross-modal mappings and is a valuable quantitative complement to traditional unimodal consistency scores. Figure 10 shows individual participants who fall in specific parts of the consistency–structure space. The participant in panel a, who was classified as a synesthete according to consistency, appears to have achieved this status through high consistency across items (i.e., choosing the same color regardless of stimulus or trial), rather than through structured, temporally consistent associations. This indicates that participants with high consistency and low structure are less likely to be genuine synesthetes, perhaps explaining the slight peak of unstructured synesthetes in Fig. 8. The participant in panel b has both low structure and low consistency, having chosen idiosyncratic colors on each trial and across the space, sometimes mapping distant vowels (e.g., low-central and high-back/high-front vowels) to similar colors.
A nonsynesthete participant with middling consistency and significant but not especially high structure is shown in panel c. This participant shows structured mappings for some parts of the space (for instance, similar yellow/green choices for the cluster of high front vowels, and brownish choices for the cluster of high back vowels), resulting in significant structure. However, this participant was inconsistent across trials for the same item, which distinguishes them from highly structured synesthetes like the participant in panel e, who shows categorical, structured associations that are highly consistent across trials. Finally, panel d shows a participant with high structure but low consistency: This participant made structured mappings across the space, but seems to have done so differently on each trial, as indicated by the inversions of green/blue in mid-front vowels and red/blue in back vowels across trials.