We tested 32 grapheme–colour synaesthetes and 32 nonsynaesthetic controls yoked for age, gender, education, first language, and handedness. In both groups, mean age was 30 years (SD = 10 years, range of synaesthetes: 18–57 years, range of controls: 18–55 years), 22 participants were female, 28 were right-handed, and 28 were native English speakers. Synaesthetic experiences were confirmed by testing the consistency of grapheme–colour associations (mean score = 0.75, SD = .25) in our sample of synaesthetes (Eagleman, Kagan, Nelson, Sagaram, & Sarma, 2007; Rothen, Seth, Witzel, & Ward, 2013b). On this test, synaesthetes typically score <1 and controls score around 2. None of our controls reported experiencing grapheme–colour associations. Synaesthetes were recruited via our synaesthesia website hosted at the University of Sussex (www.sussex.ac.uk/synaesthesia). Controls were recruited through a University of Sussex participant database and advertisements on notice boards at the university. Participants were tested individually and paid at the rate of £5 per hour for their participation. The study was approved by the local ethics committee of the University of Sussex.
A total of 120 four-letter words were selected from the Medical Research Council Psycholinguistic database (Coltheart, 1981). The study phase used 70 words (10 primacy words, 50 midlist words, 10 recency words), and the test phase used 100 words (the 50 midlist ‘old’ items and 50 new items). The two lists of 50 words (Lists A and B) were counterbalanced across each yoked pair of participants, so Lists A and B were used as old/new equally often. The words of List A had a mean frequency of occurrence of 72 (SD = 59, range: 10–200; Kucera & Francis, 1967), a mean score of 422 on the imageability scale (SD = 54, range: 302–498), and a mean score of 383 on the concreteness scale (SD = 62, range: 255–500) in the database. The words of List B had a mean frequency of occurrence of 72 (SD = 59, range: 10–200; Kucera & Francis, 1967), a mean score of 423 on the imageability scale (SD = 53, range: 307–499), and a mean score of 375 on the concreteness scale (SD = 63, range: 244–481) in the database. The remaining 20 words, used in the primacy and recency trials, fell within the same ranges on these measures. All words consisted of lowercase letters. Four hash symbols in a row (####) served as the mask. Words were presented in black 20-pt Courier font, and the mask was presented in black 26-pt Courier font. All stimuli were presented against a grey background.
The experimental procedure was based on Berry et al. (2008a) and consisted of a study and a test phase (see Fig. 1). At the start of the study phase, participants were informed that they would be presented with words flashing on the screen for longer and longer durations, which would make them easier to identify over time. There was no indication of the upcoming test phase. They were instructed to press the space bar on the keyboard as soon as they were able to identify the word, and thereafter to say it aloud. They were advised to do this as fast as possible, but to avoid making errors. Individual trials always started with the presentation of the mask for 500 ms. The initial mask was followed by a 250-ms presentation block consisting of the word displayed for 16.7 ms and the mask for 233.3 ms (the screen refresh rate was set to 60 Hz, so one frame lasted approximately 16.7 ms). This was immediately followed by another 250-ms block, but with the word exposure duration increased by 16.7 ms (resulting in 33.3 ms) and the mask duration decreased by 16.7 ms (resulting in 216.7 ms). The procedure of increasing word exposure duration by 16.7 ms and decreasing mask duration by 16.7 ms was continued until the mask presentation was 0 ms (i.e., 15 blocks in total, or 3,750 ms from the onset of the word after the initial mask). However, when a response was made during this procedure by pressing the space bar on the keyboard, the mask was immediately presented for 2,000 ms, with the message ‘Say the word aloud’ displayed below it. Thereafter, the instruction ‘Press “C” to continue’ appeared on the screen to start the next trial. RTs were recorded from the onset of a word after the initial mask to the response. RTs longer than 3,750 ms were not registered; in such a case, the message ‘Try to be faster on the next trial’ was displayed. Words were presented in random order within their respective list (primacy, midlist, or recency).
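The timing of the masked presentation blocks can be made concrete with a short sketch. It assumes that each 250-ms block corresponds to 15 screen refreshes at 60 Hz and that word exposure grows by one refresh per block; the function and constant names are illustrative and are not taken from the original experiment software.

```python
# Sketch of the word/mask schedule within one trial (after the 500-ms initial mask).
# Assumption: a 250-ms block equals 15 frames at 60 Hz; names are hypothetical.

REFRESH_HZ = 60
FRAME_MS = 1000 / REFRESH_HZ      # about 16.7 ms per screen refresh
FRAMES_PER_BLOCK = 15             # 15 frames * 16.7 ms = 250 ms


def cid_schedule():
    """Return a list of (block, word_ms, mask_ms) tuples for one trial."""
    schedule = []
    for block in range(1, FRAMES_PER_BLOCK + 1):
        word_frames = block                        # one more frame of word per block
        mask_frames = FRAMES_PER_BLOCK - block     # one fewer frame of mask per block
        schedule.append((block, word_frames * FRAME_MS, mask_frames * FRAME_MS))
    return schedule


if __name__ == "__main__":
    for block, word_ms, mask_ms in cid_schedule():
        print(f"block {block:2d}: word {word_ms:6.1f} ms, mask {mask_ms:6.1f} ms")
```

Under these assumptions the schedule contains 15 blocks, the mask duration reaches 0 ms in the final block, and the block durations sum to 3,750 ms, matching the response deadline described above.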
The test phase began immediately after the study phase. Old and new words were presented in random order. The general procedure was the same as in the study phase; participants were required to press the space bar on the keyboard as soon as they were able to identify a word. After a word was identified, however, participants were additionally required to judge whether the word was old or new by pressing one of two designated keys. If a word was judged as old, participants were required to indicate by key press whether they thought it was old because they remembered something specific (remember), it just felt familiar (know), or they were guessing (guess). Similarly, if a word was judged as new, participants had to decide whether they thought it was new because they were sure, it felt unfamiliar, or they were guessing. Thereafter, the instruction ‘Press “C” to continue’ appeared on the screen to start the next trial.
Next, synaesthetes, but not controls, were presented again with all the words from the test phase, one at a time, in random order. Each word was accompanied, on the same screen, by a palette of 13 basic colours, the same each time, but randomly arranged on each trial. Participants were required to select the colour which best matched the colour elicited by the word. If a word did not elicit a colour, they were asked to choose black (for a similar method, see Rothen & Meier, 2010).
The data reported here are available at the Open Science Framework (https://osf.io/nyqbp/ or doi:10.17605/OSF.IO/NYQBP). For the analysis of the study phase, primacy and recency trials were not taken into consideration. All trials in the study and test phase which elapsed without key press, trials with delayed key press (where the word was articulated before the key press), misidentification trials, and trials with RTs less than 200 ms were regarded as errors and excluded from the analysis. Only trials that were correct in both phases in this respect entered the analysis of the test phase. The alpha level was set to .05 for all statistical analyses, and t tests were two-tailed. We applied the Greenhouse–Geisser correction where the assumption of sphericity was violated on tests involving repeated-measures factors with more than two levels.
Full details of the models which fit both RT and responses can be found in previous articles (Berry et al., 2012; Berry et al., 2014). The SS model is based on signal detection theory (Green & Swets, 1966); a core assumption is that each item at test is associated with a memory strength variable, f, which is a normally distributed, random variable, with mean μ and standard deviation σf (i.e., f ~ N(μ, σf)). Because of exposure during the study phase, the mean f of old items is assumed to be greater than that of new items (μold > μnew). To generate a recognition judgment for an item, its value of f is first added to er to give Jr, where er is an independent, normally distributed random variable with a mean fixed to zero and standard deviation of σr—that is, Jr = f + er, where er ~ N(0, σr), and er represents noise that is specific to the recognition task. As in signal detection theory, if an item’s value of Jr exceeds a criterion, C, it will be judged old, or else it will be judged new. For a given item, the same value of f that was used to generate Jr is also used to generate its identification RT in a CID-R task. An important difference, however, is that f is subjected to another independent source of noise, ep, and the identification RT is assumed to be a decreasing function of f—that is, RT = b − sf + ep, where ep is a normally distributed random variable with a mean fixed to zero and a standard deviation of σp (i.e., ep ~ N(0, σp)), and b and s are scaling parameters, which represent the RT intercept and slope, respectively. Thus, the greater the value of f of an item, the more likely it is to be judged old, and the more likely it is to have a relatively short identification RT. Old items are therefore more likely to be judged old than new items and show a priming effect. Furthermore, because σp is typically greater than σr, as μold increases, this will tend to have a larger effect on recognition than priming. 
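The generative assumptions of the SS model can be illustrated with a small simulation. This is a sketch rather than the authors' implementation: σf and σr are set to √0.5, as in the fitting procedure described for these models, whereas the values chosen for μold, b, s, σp, and C are invented purely for illustration.

```python
# Sketch: simulating recognition judgments and identification RTs under the SS model.
import math
import random

SIGMA_F = math.sqrt(0.5)   # SD of memory strength f
SIGMA_R = SIGMA_F          # SD of recognition noise e_r


def simulate_ss_item(rng, mu, b, s, sigma_p, criterion):
    """Simulate one item: returns (judged_old, rt)."""
    f = rng.gauss(mu, SIGMA_F)                      # memory strength
    j_r = f + rng.gauss(0.0, SIGMA_R)               # J_r = f + e_r
    rt = b - s * f + rng.gauss(0.0, sigma_p)        # RT = b - s*f + e_p
    return j_r > criterion, rt


def simulate_ss(n_items, mu_old, b, s, sigma_p, criterion, seed=0):
    """Simulate n_items old and n_items new items and summarise the outcome."""
    rng = random.Random(seed)
    old = [simulate_ss_item(rng, mu_old, b, s, sigma_p, criterion)
           for _ in range(n_items)]
    new = [simulate_ss_item(rng, 0.0, b, s, sigma_p, criterion)
           for _ in range(n_items)]
    return {
        "hit_rate": sum(judged for judged, _ in old) / n_items,
        "false_alarm_rate": sum(judged for judged, _ in new) / n_items,
        "mean_rt_old": sum(rt for _, rt in old) / n_items,
        "mean_rt_new": sum(rt for _, rt in new) / n_items,
    }


if __name__ == "__main__":
    # Illustrative parameter values only; they are not estimates from the study.
    summary = simulate_ss(20000, mu_old=1.0, b=1500.0, s=100.0,
                          sigma_p=300.0, criterion=0.5)
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

Because the same value of f feeds both Jr and RT, the simulated old items are judged old more often than new items and are identified faster on average, which is the joint recognition and priming pattern the model is meant to capture.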
The SS model thus represents the idea that the word recognition advantage in synaesthesia is driven by the same signal as word identification and is not based on a second, independent signal.
Under a dual coding account, colour information would be a factor that affects recognition and not priming, and this, in principle, should weaken the association between the two. It seems reasonable to ask whether colour information is simply a factor that affects the recognition noise parameter er, and, if so, whether changing its standard deviation σr would enable the SS model to capture the effects of dual coding. Although it is true that increasing σr would weaken the association between identification RTs and recognition decisions (all other parameters being held constant), this change would result in a lower predicted value of d′, and so the model would not simultaneously be able to predict the recognition enhancement in synaesthesia. Thus, the effects of dual coding cannot be captured by the er parameter.
The MS1 and MS2 models are modifications of the SS model. The MS1 model is the same as the SS model but includes distinct memory strength signals for the ‘explicit’ and ‘implicit’ parts of the memory task: fr drives recognition and fp drives priming. These signals are used analogously to f in the SS model to model Jr and RT, respectively. In the MS1 model, fr ~ N(μr, σf) and fp ~ N(μp, σf), where μr and μp are free parameters, and fr and fp are uncorrelated (i.e., r(fr, fp) = 0). This allows the MS1 model to produce independent effects of a variable upon recognition and priming and also conditional independence of the RT and judgment. As such, the idea that the advantage in word recognition memory in synaesthesia is based on a signal, independent of that which drives priming, is directly represented in this model.
The MS2 model is a weaker representation of the notion that colour information is driving the recognition advantage in synaesthesia (i.e., a ‘weaker’ version of the MS1 model). The model is identical to the MS1 model, except that the explicit and implicit memory strength signals can be positively correlated, with the correlation given by the parameter w (i.e., r(fr, fp) = w ≥ 0). Such a correlation could arise, for example, from distinctiveness: increased distinctiveness may increase encoding efficiency for both colour and word information. The MS2 model can produce any result that the other models, SS and MS1, can. When the average memory signal strengths μr and μp are kept equal and w is set to 1, the model reduces to the SS model. When μr and μp are allowed to vary independently of one another and w is set to 0, the model reduces to the MS1 model (cf. Berry et al., 2012).
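The relationship between the three models can be sketched by simulating the MS2 model and varying w. The construction below, which builds fp from a shared and an independent standard-normal component, is one standard way to obtain a correlation of w between two normal variables; it is an assumption made for illustration and not necessarily how the original model code generates the signals. All parameter values are invented.

```python
# Sketch: MS2 strength signals with correlation w (w = 0 gives MS1; w = 1 with
# equal means gives SS). Function names and parameter values are hypothetical.
import math
import random

SIGMA_F = math.sqrt(0.5)   # SD of f_r and f_p
SIGMA_R = SIGMA_F          # SD of recognition noise e_r


def simulate_ms2_item(rng, mu_r, mu_p, w, b, s, sigma_p, criterion):
    """Simulate one item: returns (f_r, f_p, judged_old, rt)."""
    z_shared = rng.gauss(0.0, 1.0)
    z_own = rng.gauss(0.0, 1.0)
    f_r = mu_r + SIGMA_F * z_shared
    # Mixing the shared and independent components yields r(f_r, f_p) = w.
    f_p = mu_p + SIGMA_F * (w * z_shared + math.sqrt(1.0 - w * w) * z_own)
    j_r = f_r + rng.gauss(0.0, SIGMA_R)             # recognition uses f_r
    rt = b - s * f_p + rng.gauss(0.0, sigma_p)      # identification uses f_p
    return f_r, f_p, j_r > criterion, rt


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strength_correlation(w, n_items=10000, seed=1):
    """Estimate r(f_r, f_p) from simulated old items for a given w."""
    rng = random.Random(seed)
    items = [simulate_ms2_item(rng, 1.0, 0.5, w, 1500.0, 100.0, 300.0, 0.5)
             for _ in range(n_items)]
    return pearson_r([it[0] for it in items], [it[1] for it in items])


if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        print(f"w = {w:.1f}: sample r(f_r, f_p) = {strength_correlation(w):.3f}")
```

With w = 0 the two signals are uncorrelated, as in the MS1 model; with w = 1 and μr = μp the two signals coincide item by item, which recovers the SS model.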
The SS, MS1, and MS2 models were fit to the data using maximum likelihood estimation (see Berry et al., 2012; Berry et al., 2014). A likelihood value can be obtained for every trial in the test phase, given particular parameter values. An automated search procedure was used to find the parameter values that maximized the summed log likelihood across trials. As in previous applications of the models (e.g., Berry et al., 2014), there were five free parameters in the SS model: μ, the mean f of old items; σp, the standard deviation of the noise associated with RT generation (ep); b, the RT intercept; s, the RT scaling parameter; and C, the decision criterion. The MS1 model also had five free parameters: b, σp, and C, as in the SS model, and also μr and μp, the means of the explicit and implicit item strengths, respectively. Finally, the MS2 model contained six free parameters: In addition to the five free parameters of the MS1 model, the parameter w, representing the correlation between fr and fp, was free to vary. As in previous studies, several parameter values were fixed: The means of er and ep, the noise variables for recognition and priming, were set to zero; σf, the standard deviation of f (in the SS model) and of fr and fp (in the MS1 and MS2 models), was set to √0.5; σr, the standard deviation of the recognition noise (er), was set equal to σf; the mean f (in the SS model) and the mean fr and fp (in the MS1 and MS2 models) of new items were fixed to zero; and finally, the value of s in the MS1 and MS2 models was fixed to the estimate of s in the SS model. Separate models were fit to the data from each individual, giving one set of parameter values per participant. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) were calculated for each model. The AIC and BIC are measures of the goodness of fit of a model that take into account the number of free parameters (model complexity); lower values indicate a better trade-off between fit and complexity.
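As an illustration of how a per-trial likelihood and the information criteria can be computed, the sketch below derives the SS-model likelihood directly from the model equations: RT and Jr are jointly normal, so the likelihood of an observed RT and judgment is the normal density of the RT multiplied by the probability that Jr falls on the observed side of C given that RT. This derivation is offered as an assumption-labelled sketch and is not the authors' code (see Berry et al., 2012, 2014, for the full specification). The AIC and BIC functions use the standard definitions, AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, where k is the number of free parameters and n the number of trials.

```python
# Sketch: SS-model log likelihood for one test trial, plus AIC and BIC.
import math

SIGMA_F = math.sqrt(0.5)   # fixed SD of f
SIGMA_R = SIGMA_F          # fixed SD of e_r


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ss_trial_loglik(rt, judged_old, mu, sigma_p, b, s, criterion):
    """Log likelihood of one (RT, old/new judgment) pair; use mu = 0 for new items."""
    var_j = SIGMA_F ** 2 + SIGMA_R ** 2            # Var(J_r)
    var_rt = (s * SIGMA_F) ** 2 + sigma_p ** 2     # Var(RT)
    cov = -s * SIGMA_F ** 2                        # Cov(J_r, RT), via shared f
    mean_rt = b - s * mu
    # Distribution of J_r conditional on the observed RT (bivariate normal).
    cond_mean = mu + cov / var_rt * (rt - mean_rt)
    cond_sd = math.sqrt(var_j - cov ** 2 / var_rt)
    p_old = 1.0 - normal_cdf((criterion - cond_mean) / cond_sd)
    p_judgment = p_old if judged_old else 1.0 - p_old
    p_judgment = max(p_judgment, 1e-300)           # guard against log(0)
    return math.log(normal_pdf(rt, mean_rt, math.sqrt(var_rt))) + math.log(p_judgment)


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_trials):
    return n_params * math.log(n_trials) - 2.0 * loglik


if __name__ == "__main__":
    # Invented parameter values and trials, for illustration only.
    params = dict(sigma_p=300.0, b=1500.0, s=100.0, criterion=0.5)
    trials = [(1350.0, True, 1.0), (1600.0, False, 0.0)]   # (rt, judged_old, mu)
    total = sum(ss_trial_loglik(rt, old, mu, **params) for rt, old, mu in trials)
    print(total, aic(total, 5), bic(total, 5, len(trials)))
```

In an actual fit, the summed log likelihood would be maximized over the free parameters with a numerical optimizer, separately for each participant, before the criteria are compared across models.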