Introduction

Although studies on the role of typographical and perceptual factors during visual-word recognition and reading have a long tradition (see Huey, 1908; Tinker, 1963), this area of research has been relatively neglected in the past decades, despite its obvious practical implications (Legge, Mansfield, & Chung, 2001; see also Moret-Tatay & Perea, 2011; Slattery & Rayner, 2010). In the present study, we focus on how variations of interletter spacing affect the recognition of visually presented words. Changes in interletter spacing can have either a beneficial or a deleterious effect in visual-word recognition, depending on their magnitude (see, e.g., Chung, 2002; Perea, Moret-Tatay, & Gómez, 2011). On the one hand, very large interletter spacings hinder the perceptual integrity of the whole word (e.g., as in [figure b]) and (unsurprisingly) produce longer word identification times; indeed, this manipulation has been employed as a way to degrade words (e.g., see Cohen, Dehaene, Vinckier, Jobert, & Montavont, 2008). On the other hand, and more importantly for the present purposes, small increases in interletter spacing (relative to the default settings; compare [figure c] vs. [figure n]) do not destroy the integrity of the written word but do produce two potential benefits: fewer “crowding” effects (i.e., less interference from the neighboring letters; see Bouma, 1970; O’Brien, Mansfield, & Legge, 2005) and a more accurate process of letter position coding (see Davis, 2010; Gomez, Ratcliff, & Perea, 2008).

In a recent study based on the lexical decision task, Perea et al. (2011) found faster identification times for words presented with a slightly wide interletter spacing (+1.2; e.g., [figure d]) than for words presented with the default spacing (0.0; e.g., [figure o]; see Latham & Whitaker, 1996, and McLeish, 2007, for similar findings with other paradigms and populations). (The interletter spacing levels were taken from the values provided by Microsoft Word; e.g., the value +1.2 refers to an expanded intercharacter spacing of 1.2 points in this application.) Perea et al. concluded that small increases in interletter spacing (relative to the default settings) produced a benefit for lexical access, but they acknowledged that “more research is needed to examine in greater detail the optimal interletter value using a large set of interletter spacing conditions” (p. 350). The present experiment aims to fill this gap. We employed a parametric approach with five levels of interletter spacing: condensed (–0.5), as in [figure e]; default (0.0), as in [figure p]; expanded (+0.5), as in [figure f]; expanded (+1.0), as in [figure g]; and expanded (+1.5), as in [figure h]. As in Perea et al.’s study, we used the most common word identification laboratory task: the lexical decision task (see Balota et al., 2007); note that the effects obtained with this task have typically been replicated in normal silent reading (Rayner, 1998; see also Davis, Perea, & Acha, 2009; Perea & Pollatsek, 1998). (We examine the potential implications of the present experiment for normal silent reading in the Discussion section.)

The second goal of the present experiment was to examine the nature of the effect of interletter spacing. To do that, we employed Ratcliff’s (1978) diffusion model for speeded two-choice decisions. This model has been quite successful at accounting for lexical decision data (e.g., Ratcliff, Gomez, & McKoon, 2004; see also Gomez, Ratcliff, & Perea, 2007; Ratcliff, Perea, Colangelo, & Buchanan, 2004; Wagenmakers, Ratcliff, Gomez, & McKoon, 2008). According to the diffusion model account of the lexical decision task, the visual stimulus is encoded so that the relevant stimulus features (e.g., lexical features) are utilized to accumulate evidence toward a “word” or “nonword” response. The accumulation of evidence is assumed to occur in a noisy manner. The two aforementioned processes (encoding and accumulation of evidence) are represented by two separate parameters in the model (Ter and drift rate, respectively). Importantly, in a diffusion model, changes in these two parameters produce different effects in qualitative aspects of the data. If the Ter parameter changes, there should be shifts in the response time (RT) distributions with no change in their shape (see Gomez et al., 2007), and in addition, there should not be any effect on error rates. On the other hand, changes in the drift rate produce greater effects in the tail than in the leading edge of the RT distributions (i.e., the .1 quantile) and also affect error rates (see Footnote 1). Therefore, if the effect of interletter spacing takes place in the early encoding, nondecisional stage, its effect should be a shift of the RT distribution with no effect on accuracy. Alternatively, if the impact of interletter spacing occurs in the word system, then one would expect some changes in the drift rate, with consequent changes in the RT distributions and error rates. We should note here that Perea et al. (2011) briefly discussed the RT distributions, with no fits of the diffusion model, and the effect of interletter spacing on words grew very slightly as a function of RT quantiles, while the changes in error rates were minimal. However, explicit fits are necessary to corroborate that observation, in particular by using a wider range of interletter spacing conditions. To obtain stable estimations for the diffusion model, we employed a large number of items per condition (60) in the experiment.
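The contrast between the two parameters can be illustrated with a small simulation. The sketch below is not the fitting procedure used in this study; it is a minimal random-walk approximation of a diffusion process, and the boundary separation (0.12), starting point (0.06), within-trial noise (0.1), step size, and the particular drift and Ter values are illustrative assumptions. Because the same random seed is reused, lowering Ter moves every correct-RT quantile by the same amount and leaves accuracy untouched, whereas lowering the drift rate would be expected to stretch the tail more than the leading edge and to reduce accuracy.

```python
import random
import statistics


def simulate_trial(drift, t_er, rng, boundary=0.12, start=0.06, s=0.1, dt=0.001):
    """Simulate one trial; return (RT in seconds, True if the upper boundary was hit)."""
    x, t = start, 0.0
    step_sd = s * dt ** 0.5  # noise added on each small time step
    while 0.0 < x < boundary:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    # Ter (encoding plus response execution) is simply added to the decision time.
    return t + t_er, x >= boundary


def simulate_condition(drift, t_er, n_trials=2000, seed=1):
    """Return the .1 and .9 quantiles of correct RTs and the accuracy for one condition."""
    rng = random.Random(seed)
    trials = [simulate_trial(drift, t_er, rng) for _ in range(n_trials)]
    correct = [rt for rt, hit_upper in trials if hit_upper]
    deciles = statistics.quantiles(correct, n=10)  # nine cut points: .1, .2, ..., .9
    return {"q10": deciles[0], "q90": deciles[8], "accuracy": len(correct) / n_trials}


# Illustrative (assumed) parameter values, not estimates from the experiment.
base = simulate_condition(drift=0.27, t_er=0.49)
shifted = simulate_condition(drift=0.27, t_er=0.47)  # only Ter changes
slowed = simulate_condition(drift=0.20, t_er=0.49)   # only drift rate changes

for label, result in (("base", base), ("Ter change", shifted), ("drift change", slowed)):
    print(label, result)
```

In this sketch the Ter manipulation reproduces a pure distributional shift by construction, which is the signature the encoding account predicts for interletter spacing.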

Method

Participants

A group of 25 students at the University of Valencia took part in the experiment voluntarily. They were native speakers of Spanish, and all had either normal or corrected-to-normal vision.

Materials

We selected a set of 300 Spanish words from the B-Pal lexical database (Davis & Perea, 2005). The mean written frequency of these words was 89 occurrences per million words (range: 24–690); the mean length was 5.6 letters (range: 5–6); and the mean number of substitution-letter neighbors was 1.53 (range: 1–4). For the purposes of the lexical decision task, 300 orthographically legal nonwords were also created (mean length: 5.6 letters; range: 5–6). These nonwords had been created by changing two letters from Spanish words that did not form part of the word list. The stimuli were presented in Times New Roman 14-pt font (i.e., the same font as in the Perea et al., 2011, experiments). Five lists of stimuli were created to counterbalance the materials across letter spacings, so that each target appeared only once in each list, but in a different condition in each list. The list of stimuli is available at www.uv.es/mperea/paramspacing.pdf. The participants were randomly assigned to the lists.
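The counterbalancing scheme can be sketched as a Latin-square rotation. This is a minimal illustration, not the authors' list-construction script: the item labels are hypothetical placeholders, and the rotation rule (item index plus list index, modulo 5) is one assumed way of satisfying the constraints stated in the text.

```python
from collections import Counter

SPACINGS = [-0.5, 0.0, 0.5, 1.0, 1.5]  # the five interletter spacing levels


def build_lists(items, conditions):
    """Rotate conditions over items so each item gets every condition once across lists."""
    n = len(conditions)
    return [
        {item: conditions[(i + list_idx) % n] for i, item in enumerate(items)}
        for list_idx in range(n)
    ]


# Hypothetical placeholder labels standing in for the 300 target words.
items = [f"item{i:03d}" for i in range(300)]
lists = build_lists(items, SPACINGS)

# Each list should hold 60 items per spacing condition.
print(Counter(lists[0].values()))
```

With 300 items and five conditions, every list contains 60 items per spacing level, which matches the number of items per condition reported for the experiment.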

Procedure

Participants were tested individually in a quiet room. Presentation of the stimuli and recording of latencies were controlled by a computer using DMDX (Forster & Forster, 2003). On each trial, a fixation point (+) was presented for 500 ms in the center of the monitor. Then, the stimulus item (in lowercase) was presented until the participant’s response. The letter strings were presented centered, in black, on a white background. The participants were instructed to push a button labeled “yes” if the letter string formed an existing Spanish word and a button labeled “no” if the letter string was not a word. Each participant received a different order of trials, and the whole experimental session lasted about 25 min.

Results

Incorrect responses (3.9% of the data) and RTs less than 250 ms or greater than 1,500 ms (less than 1.5% of the data) were excluded from the RT analyses. The mean correct RTs and error percentages from the participant analysis are presented in Table 1. ANOVAs based on the participant and item mean correct RTs were conducted according to a 5 (interletter spacing: condensed [–0.5], default [0.0], expanded [+0.5], expanded [+1.0], or expanded [+1.5]) × 5 (list: 1–5) design. List was included as a dummy factor in the statistical analyses to remove the error variance due to the counterbalancing lists.
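The exclusion rule for the RT analyses can be written out in a few lines. The sketch below assumes a hypothetical trial format (dictionaries with `rt_ms` and `correct` fields) and assumes that the error percentage is computed over all trials; the demo trials are invented purely to exercise the function and are not data from the experiment.

```python
def summarize_trials(trials, low_ms=250, high_ms=1500):
    """Return (mean correct RT after trimming, error percentage over all trials)."""
    n_errors = sum(1 for t in trials if not t["correct"])
    # Keep correct responses whose RT is neither below 250 ms nor above 1,500 ms.
    kept = [t["rt_ms"] for t in trials if t["correct"] and low_ms <= t["rt_ms"] <= high_ms]
    mean_rt = sum(kept) / len(kept) if kept else float("nan")
    error_pct = 100.0 * n_errors / len(trials)
    return mean_rt, error_pct


# Invented trials for illustration only.
demo_trials = [
    {"rt_ms": 612, "correct": True},
    {"rt_ms": 180, "correct": True},    # too fast: excluded from the RT mean
    {"rt_ms": 1720, "correct": True},   # too slow: excluded from the RT mean
    {"rt_ms": 540, "correct": False},   # error: excluded from the RT mean
    {"rt_ms": 588, "correct": True},
]
mean_rt, error_pct = summarize_trials(demo_trials)
print(mean_rt, error_pct)
```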

Table 1 Mean response times (in milliseconds) and percentages of errors (in parentheses) for words and pseudowords in our experiment

Word data

The ANOVA on the latency data showed an effect of interletter spacing, F1(4, 80) = 3.94, MSE = 934, p < .007, η² = .16; F2(4, 1180) = 7.81, MSE = 5,391, p < .001, η² = .03. This effect reflected a decreasing linear trend (see Table 1), F1(1, 80) = 6.59, MSE = 2,147, p < .02, η² = .25; F2(1, 295) = 29.50, MSE = 5,869, p < .001, η² = .09, while the quadratic/cubic/quartic components were not significant (all Fs < 1).

The ANOVA on the error data did not reveal any significant effects (both ps > .25).

Nonword data

The ANOVAs on the latency/error data failed to show any significant effects (all Fs < 1).

Diffusion model analysis

Within the diffusion model framework, different data patterns correspond to distinct parameter behavior. The behavior of the parameters can then be interpreted in terms of psychological processes. To this end, we present the fits to the grouped data that we obtained using the fitting routines described by Ratcliff and Tuerlinckx (2002). We calculated the accuracy and latency (i.e., the RTs at the .1, .3, .5, .7, and .9 quantiles) for “word” and “nonword” responses for all conditions and for all participants, and we obtained the group-level performance by averaging across subjects (i.e., vincentizing; Ratcliff, 1978; Vincent, 1912). Fitting averaged data is an appropriate procedure for fitting the diffusion model: In previous research (e.g., Ratcliff, Gomez, & McKoon, 2004; Ratcliff, Thapar, & McKoon, 2001), fits to averaged data provided parameter values similar to the values obtained by averaging across fits to individual participants. The averaged quantile RTs were used for the diffusion model fits as follows: The model generated for each response the predicted cumulative probability within the time frames bounded by the five empirical quantiles. Subtracting the cumulative probabilities for each successive quantile from those of the next higher quantile yields the proportions of responses between each pair of quantiles, which are the expected values for the chi-square computation. The observed values are the empirical proportions of responses that fall within a bin bounded by the 0, .1, .3, .5, .7, .9, and 1.0 quantiles, multiplied by the proportion of responses for that choice (e.g., if there is a .965 response proportion for the word alternative, the proportions would be .965*.1, .965*.2, .965*.2, .965*.2, .965*.2, and .965*.1).
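The quantile averaging and the observed and expected bin proportions described above can be sketched as follows. This is an illustration under stated assumptions rather than the Ratcliff and Tuerlinckx (2002) routines themselves: the quantile estimator uses simple linear interpolation, the last expected bin is taken as the model's predicted response proportion minus its cumulative probability at the .9 quantile, and the statistic is written in the usual Pearson form scaled by the number of observations. All function names are hypothetical.

```python
import statistics

QUANTILE_PROBS = (0.1, 0.3, 0.5, 0.7, 0.9)
BIN_MASS = (0.1, 0.2, 0.2, 0.2, 0.2, 0.1)  # mass between the 0, .1, .3, .5, .7, .9, 1.0 quantiles


def rt_quantiles(rts, probs=QUANTILE_PROBS):
    """Quantiles of one participant's RTs by linear interpolation (an assumed estimator)."""
    xs = sorted(rts)
    out = []
    for p in probs:
        pos = p * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        out.append(xs[lo] + (pos - lo) * (xs[hi] - xs[lo]))
    return out


def vincentize(per_participant_rts):
    """Average each quantile across participants to obtain group quantiles."""
    per_participant = [rt_quantiles(rts) for rts in per_participant_rts]
    return [statistics.fmean(column) for column in zip(*per_participant)]


def observed_proportions(response_prop):
    """Observed bin proportions: bin mass times the proportion of that response."""
    return [response_prop * mass for mass in BIN_MASS]


def expected_proportions(model_cdf_at_quantiles, predicted_response_prop):
    """Differences of the model's cumulative probabilities at the empirical quantiles."""
    edges = [0.0, *model_cdf_at_quantiles, predicted_response_prop]
    return [upper - lower for lower, upper in zip(edges, edges[1:])]


def chi_square(observed, expected, n_obs):
    """Pearson-type chi-square summed over the bins of one response."""
    return n_obs * sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# The worked example from the text: a .965 proportion of "word" responses.
print(observed_proportions(0.965))
```

In a full fit, the chi-square contributions of both responses in every condition would be summed and minimized over the model parameters; that optimization step is omitted here.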

In this article, we used the model as a tool to test specific hypotheses in the most principled way possible. Simply put, interletter spacing could affect the encoding time, the rate of accumulation of evidence, or a combination of these two parameters. Hence, we implemented three parameterizations of the diffusion model (see Ratcliff & McKoon, 2008, for a full description of the model and its parameters, and Table 2 for the parameter values here). The first parameterization is a fairly unconstrained implementation of the model in which the Ter parameter (i.e., encoding/response process) was allowed to vary for each interletter spacing (same Ter for words and nonwords, so five values of Ter), and the drift rates (i.e., quality of information) were allowed to vary for each interletter spacing and also for words and nonwords (creating ten values of drift rate). For the next two parameterizations, we removed free parameters and obtained the loss in the quality of the fits in terms of chi-square (in the second parameterization, we allowed only Ter to vary across spacing conditions, and in the third we allowed only the drift rate to vary). These chi-squares are based on group data, so they cannot properly be used as absolute measures of fit; however, they provide us with an estimate of the loss or gain in the quality of the fit relative to each of the other parameterizations of the model.

Table 2 Parameters of the diffusion model for the different scenarios in the experiment

The unconstrained parameterization yielded a chi-square value of 77.37. Interestingly, the value of the Ter parameter decreased as a function of interletter spacing, from .492 in the condensed condition to .473 in the +1.5 condition. The value of the drift rates for words increased very slightly from the condensed condition (.264) to the +1.5 condition (.278).

Ter parameterization

Pure distributional shifts (i.e., changes in the locations of distributions) are naturally accounted for by allowing the Ter parameter to vary (i.e., there were five values of Ter, one for each level of spacing, and two drift rates, one for words and one for nonwords). This model yielded a chi-square value of 104.55, which was 14% greater than the value from the unconstrained model (see Ratcliff & Smith, 2010, for a similar result in a perceptual task).

Drift rate parameterization

Across a large variety of manipulations, the mean RT and the variance are correlated; this is so because effects tend to be larger in the tail of the RT distribution than in the faster responses (e.g., the word-frequency effect affects both the mean and the variance of the RTs). These effects are naturally accounted for by allowing the drift rate to vary (i.e., one value of Ter and 10 values of drift rate: one for words and one for nonwords for each level of spacing); this model yielded a chi-square value of 126.63, which was 38% worse than the value for the unconstrained model.

To summarize, the Ter parameterization provides the best balance of parsimony and quality of fits. With five values of Ter and two drift rates, it has eight fewer parameters than the unconstrained model (five values of Ter and ten drift rates) and four fewer parameters than the drift rate model (one value of Ter and ten drift rates). Furthermore, it nicely fits the empirical patterns, with a shift in the RT distributions and also a null effect on error rates as a function of spacing (see Fig. 1). One feature of the data, however, does not seem to be accounted for by any of the parameterizations: Interletter spacing affected only the responses to words, not to nonwords, thus suggesting an interaction between lexicality and the encoding process. One can speculate on the reasons for this pattern of results. In previous applications of the diffusion model to the lexical decision task, the encoding time has been assumed to be unaffected by lexical status. Nonetheless, in Table 3 of Ratcliff, Gomez, and McKoon (2004), it can be seen that the diffusion model consistently underestimates the empirical .1 quantile for pseudowords by 5–15 ms. Clearly, more work is needed to understand the specific nature of word-versus-nonword decisions in lexical decision (see Davis, 2010, for a discussion).

Fig. 1 Group response time (RT) distributions in the five interletter spacing conditions for word (left panel) and nonword (right panel) stimuli. Each column of points represents the five RT quantiles (.1, .3, .5, .7, and .9) in each letter spacing condition. These values were obtained by computing the quantiles for individual participants and subsequently averaging the obtained values for each quantile over the participants (see Vincent, 1912). The proportions shown at the bottom of the figure are the accuracy rates for each condition. The plus signs joined by dotted lines represent the fits of the Ter parameterization of the diffusion model. The offset in the horizontal dimension represents the size of the model miss.

Discussion

The findings from the present experiment are clear. First, small increases of interletter spacing (relative to the default settings) lead to faster word identification times, extending the findings of Perea et al. (2011) to a wider range of interletter spacing conditions. Second, the effect of interletter spacing shows a decreasing linear trend (see Table 1). Third, the effect of interletter spacing occurs at the encoding level rather than at a decisional level, as deduced from the fits of the diffusion model (see Fig. 1).

What about the locus of the effect of interletter spacing for word stimuli? The locus of this effect is at an encoding level (rather than at the decision level), as deduced from the fits of the diffusion model: The fits of the model were very good when the encoding parameter (Ter) was allowed to vary freely across the spacing conditions, while they were rather poor when the drift rate (i.e., the quality of lexical information) was allowed to vary across conditions (see Fig. 1). To our knowledge, this is the first time in which a manipulation at the stimulus level has produced an effect on the encoding time rather than on the quality of information (i.e., drift rates); note that this finding undermines a common criticism of the diffusion model approach, that “everything goes to drift rate” (see Footnote 2). This encoding advantage for words with a slightly wide interletter spacing was presumably due to less “crowding” or a more accurate “letter position coding” process; the present experiment was not designed to disentangle these two accounts, however. Thus, increased letter spacing could be thought to enhance the perceptual normalization phase, which would affect Ter but not drift rates. This pattern of data is consistent with the experimental findings of Yap and Balota (2007), who found that degrading the lexical string led to a shift of the RT distribution, with little or no effect on the error data. Although Yap and Balota did not conduct any explicit modeling, this finding would be consistent with the view that stimulus degradation affects the encoding process (i.e., Ter in the diffusion model), namely, that the decision process would not begin until the appropriate information had been extracted from the stimulus.

Importantly, the presence of faster identification times for words presented with a slightly wider interletter spacing than the default one (i.e., [figure i] faster than [figure q]) has obvious practical implications. The “default” interletter settings in word-processing packages (and publishing companies) may not be optimal; keep in mind that this “default” setting was established on the basis of no empirical evidence (see McLeish, 2007). One fair question to ask is whether or not the present findings can be generalized from the recognition of single, isolated words (e.g., when reading the names of products, stores, or bus/subway stations) to the context of text reading. The vast majority of the effects obtained in visual-word recognition tasks have been generalized to normal reading experiments, with the advantage that word-identification tasks can be easily modeled in terms of the components of word processing. Nonetheless, the potential advantages of interletter spacing in the fovea during word identification (e.g., less crowding, more accurate letter position coding) may be canceled out by the fact that the N + 1 or N + 2 words in a sentence would be presented slightly farther away from fixation (i.e., with a decrease in acuity; compare the sentences “[figure j]” vs. “the cat is on the couch”). In this respect, in an unpublished study, Tai, Sheedy, and Hayes (2009) used nine conditions, from condensed interletter spacing (–1.75; e.g., [figure k]) through expanded interletter spacing (2.00; e.g., [figure l]), in a reading task in which participants had to read a novel while their eye movements were monitored. Tai et al. reported that fixation durations decreased with interletter spacing in a linear way, as occurred in the present experiment with lexical decision times. However, the number of fixations and the regression rate in the Tai et al. experiment also increased with interletter spacing, and the overall reading rate was not affected by interletter spacing. More research will be necessary to assess the role of interletter spacing in a normal reading scenario, not only with adult skilled readers, but also with other populations (e.g., low-vision individuals, young readers, or dyslexic readers).

In sum, the present experiment with adult skilled readers has revealed that small increases in interletter spacing (relative to the default settings) have a positive impact on lexical access, and that the locus of the effect is at an early encoding (nondecisional) stage. This finding opens a new window of opportunities to examine the role of interletter spacing not only in other well-known word identification paradigms, but also in more applied settings (i.e., normal silent reading).