Letter processing and font information during reading: Beyond distinctiveness, where vision meets design
- Cite this article as: Sanocki, T. & Dyson, M.C. Atten Percept Psychophys (2012) 74: 132. doi:10.3758/s13414-011-0220-9
Letter identification is a critical front end of the reading process. In general, conceptualizations of the identification process have emphasized arbitrary sets of distinctive features. However, a richer view of letter processing incorporates principles from the field of type design, including an emphasis on uniformities across letters within a font. The importance of uniformities is supported by a small body of research indicating that consistency of font increases letter identification efficiency. We review design concepts and the relevant literature, with the goal of stimulating further thinking about letter processing during reading.
Keywords: Letter identification · Letter perception · Font · Font tuning · Common features · Type design · Reading
Motivated by the increasing realization that letter perception is an important but overlooked stage in the reading process (e.g., Finkbeiner & Coltheart, 2009; Grainger, 2008; Massaro & Schmuller, 1975; Pelli, Burns, Farell, & Moore-Page, 2006), there has been a resurgence of interest in letter perception in relation to reading. It is now clear that letter perception provides a critical front end for reading because letters are functional units; they are independent pieces of the word code (e.g., McClelland, 1976; Oden, 1984; Pelli, Farell, & Moore, 2003). The visual forms of letters exist within a larger structural design, a family of objects known as the type font. This idea has implications for perceptual identification that we begin to develop here.
In previous research and theory, a core concept is distinctiveness—the properties that make one letter easy to discriminate from its alternatives in the alphabet. This has led to the central concept of feature detection in the literature (e.g., Fiset, Blais, Éthier-Majcher, Arguin, Bub, & Gosselin, 2008; Gibson, 1969; Massaro & Schmuller, 1975). Letters are defined by sets of features whose membership is determined by distinctiveness. If distinctiveness is indeed critical, increasing it through alphabet design should increase legibility. This logic has been recently advocated (e.g., Fiset et al., 2008; Gosselin & Tjan, 2008).1 However, if letter distinctiveness is an incomplete basis for understanding letter processing during reading, calls to redesign letters are premature.
A richer view of letter processing incorporates structural relations between letters and originates in the field of type design. Type designers have long been concerned with letter form and its impact on reading. Text fonts are designed for reading continuous paragraphs of text, and the main goal in their design is to produce optimally legible letter forms. Type designers recognize the importance of distinctiveness, but they also emphasize the uniformity of letters (e.g., Carter, Day, & Meggs, 1985; Cheng, 2005). The classical goal of type design is to achieve harmony and balance between individual forms. Within words, a letter should never stand out; it should cohere with neighboring letters, in order to better form a word unit and sublexical units as well.2
Thus, commonalities within and between letters are a design feature of high-quality text fonts. Type designers incorporate commonalities because they believe they are important for legibility, on the basis of their data. Their data are judgments refined through training, aimed at understanding the structural relations that constitute a legible font. Type design results from a design process in which design possibilities (variations in visual structure) are evaluated by the designer and other educated readers by intuitively monitoring their own reading experience.
The psychological evidence that commonalities contribute to efficiency comes from advantages found in identification of letters of consistent, regular fonts, relative to mixed or irregular fonts (Gauthier, Wong, Hayward, & Cheung, 2006; Sanocki, 1987, 1988, 1991b, 1991c). The interpretation of these results is that letter processing becomes more efficient because the perceptual-processing system tunes itself to exploit regularities of a font (see also Sanocki, 1991b, 1992; Walker, 2008). In mixed-font conditions, by contrast, although the number of alternative forms and letter identities is often the same, the forms are drawn from two or more fonts instead of a single font. Note that mixing fonts increases the distinctiveness of individual letters, because differences between fonts correspond with letter identities. However, spatial and other properties are not as regular in a mixed font, and the perceptual system cannot exploit them as well. As a result, letter identification efficiency is reduced relative to same-font conditions. These regularity effects imply that shared properties within a font (commonalities) are important in letter processing, in addition to distinctiveness. These commonalities are constraints that create a family of objects for identification. The ability to exploit commonalities is a hallmark of expertise with letters; nonexperts (i.e., those unfamiliar with the writing system) do not exploit regularities (Gauthier et al., 2006).
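The logic of these regularity effects can be made concrete with a toy computation (a hypothetical sketch; the letters, bit patterns, and 8-bit feature coding are invented purely for illustration). Letter forms are coded as bit vectors whose low bits carry letter shape and whose high bits carry font style; mixing two such fonts raises the average distance between forms of different identities:

```python
# Toy letter forms as 8-bit feature vectors: bits 0-4 carry letter
# shape, bits 5-7 carry font style (e.g., serif details). All values
# are invented for illustration.
font_a = {            # sans-style variant: style bits all 0
    "n": 0b00000111,
    "r": 0b00000100,
    "u": 0b00011100,
}
# Hypothetical serif variant: same shape bits, style bits set.
font_b = {ch: bits | 0b11100000 for ch, bits in font_a.items()}

def hamming(x, y):
    """Number of differing feature bits between two letter forms."""
    return bin(x ^ y).count("1")

def mean_diff_identity_distance(forms):
    """Mean Hamming distance over pairs of forms with different identities."""
    items = list(forms)
    ds = [hamming(b1, b2) for i, (c1, b1) in enumerate(items)
          for c2, b2 in items[i + 1:] if c1 != c2]
    return sum(ds) / len(ds)

pure = mean_diff_identity_distance(font_a.items())
mixed = mean_diff_identity_distance(
    [(c, b) for f in (font_a, font_b) for c, b in f.items()])
print(pure, mixed)  # the mixed set is the more distinctive of the two
```

Because the mixed set is, if anything, more distinctive than the pure set, a purely distinctiveness-based account would predict better identification under font mixing—opposite to the same-font advantage observed with human readers.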
Plan of this article
We begin with preliminary issues and the idea of distinctive features. Next, we review selected recent work and contrast it with our emphasis on commonalities. In the main section, we review the small body of research on font mixing and identify outstanding issues. We conclude with several further issues.
Observing and appreciating the complexity of letter processing during reading
Letter research can benefit from discussion of the perceptual problems (Marr, 1982) that must be solved in letter processing during reading. We begin, however, by commenting on how letter stimuli should be presented to maximize experimental sensitivity to visual processing.
Because words are highly meaningful units in reading, it is natural to use them to study reading. However, higher level units in language confer strong benefits on the processing of their constituents; in particular, word unitization in the brain strongly benefits letter processing. The benefits occur because information is integrated across letters in words (e.g., Massaro & Sanocki, 1993; McClelland & Rumelhart, 1981), making the perceptual system more robust to limitations of letter-level processing (e.g., Oden, Rueckl, & Sanocki, 1991). Sublexical units such as syllables and bigrams are also likely to cause top-down effects, as are wordlike nonwords (e.g., Grainger, 2008; McClelland & Rumelhart, 1981). To maximize sensitivity to visual processes, these top-down effects must be obviated. One way to do so is to present unrelated letter strings as stimuli. Such strings benefit much less from higher level processing, and, critically, they embody difficulties that occur for letters in words. There is evidence from a variety of paradigms of reductions in top-down benefits as higher level units are removed.4 Of course, if the top-down influences on letter processing are of interest, wordlike stimuli are appropriate.
When letters are presented in strings, there are at least three classes of problems that the perceptual system must solve. Commonalities within a font may contribute to solving these problems.
Location and position uncertainty
Uncertainty related to horizontal position is considerable during reading for at least two reasons. First, uncertainty is produced because the eyes move rapidly across text, landing briefly at various positions within words. Information must be registered somehow relative to eye position. Second, there is uncertainty produced by the letters themselves. Skilled readers almost always read proportionally spaced fonts, in which letter width is highly variable. As many as three narrow letters (e.g., ill) can fit within the space of one wide letter (e.g., m or w). Research has documented the location uncertainty of letters when words are processed (e.g., Davis & Bowers, 2006; Gomez, Ratcliff, & Perea, 2008; Mozer, 1983). These complexities necessitate that the reference system for letters in words include horizontal positional information. In designing type, horizontal parameters determine the width of individual letters and the positioning of letters (including space between letters and words). Vertical position is likely a separate but also important issue.5
Crowding

Crowding is caused by limitations in the visual system's ability to resolve features within particular regions, a limitation that increases with eccentricity (Pelli et al., 2007; Stuart & Burian, 1962). Although fixated letters may not be crowded, there is increasing crowding for letters away from fixation (Pelli, Palomares, & Majaj, 2004; Pelli et al., 2007), and crowding may be especially problematic for parafoveal processing of letter and word information (e.g., Juhasz, Pollatsek, Hyönä, Drieghe, & Rayner, 2009). Thus, crowding is a further problem for letter processing during reading. Interestingly, crowding can be reduced by systematic configuration of structures (Livne & Sagi, 2007). Crowding is also affected by whether a target blends with flankers (Saarela, Sayim, Westheimer, & Herzog, 2009). Font design may balance these effects, providing systematicity that reduces crowding, while framing visual distinctions that aid identification. For example, ascenders and descenders are likely to be easy to distinguish because they occupy the less-crowded upper and lower regions of letter space.
Composing a parallel cue
During reading, the desired result of letter processing is lexical access, via a code that accesses the word and its meaning. In most cases, each letter has a functional role in composing the word code; skilled readers process letters in a parallel manner (e.g., Spinelli et al., 2005) and are disrupted by missing letters (e.g., White, Johnson, Liversedge & Rayner 2008). Thus, the best word cue includes an appropriately ordered set of letter identities. This means that the relative timing with which letter information becomes active in the mind is critical; if letters within a word become active at different times, the cue is wrong and will access the wrong word(s). There is evidence of the negative effects of salient sublexical groups in research on identifying case-mixed words (Humphreys, Mayall, & Cooper, 2003; Mayall, Humphreys and Olson 1997) and direct evidence in size-mixed letter identification (Sanocki, 1991b), discussed below. Fonts are designed to ensure that particular letters are not more salient than others; this and other design features may help ensure that letters in words become active in mind in parallel and over a similar time course.
Finally, we argue that lowercase letters are the best stimuli to study in most instances. Lowercase is the regular practice in texts meant for reading. The variation in basic letter shapes is a potent cue to identity. Thus, lowercase letters are best for studying skilled letter processing in general. All-uppercase text has been found to be read more slowly than lowercase and is thought to be less legible (Tinker & Paterson, 1928).6
Time course of processing individual letters during reading
We briefly summarize the time course. Theories posit a hierarchy of processing in which letter identity is quickly abstracted from visual details (e.g., the letter “a” is abstracted from font and case [A, a]; e.g., Grainger, 2008; Massaro & Schmuller 1975). That is, letter identification begins with the processing of visual information such as features, but visual details are discarded as soon as an abstract identity is extracted. Visual processing may involve global-to-local or coarse-to-fine stages, as we describe later (e.g., Fiset et al., 2009; Navon, 1977; Sanocki, 1991a, 1993, 2001). Abstract letter identities (and associated phonology) are used to identify words. The importance of abstraction is supported by the rapidity of perceiving abstract letter identity, which is independent of visual structure (Friedman, 1980).
Our main emphasis will be on the processes leading up to the activation of letter identity. The general characteristics of these processes are widely agreed upon in the research community. There is also evidence that type information can sometimes linger in word processing, becoming part of the memory code (e.g., Goldinger, Azuma, Kleider, & Holmes, 2003); this is an interesting issue but is not treated here. In the review section, we emphasize processes leading up to identification, as opposed to decision processes that may occur after initial letter identification (cf. Neely, 1991; Sanocki, 1987, 1992).
Research on letter perception
How are letters perceived? This issue has been examined for many years, and the dominant framework is based on features—the idea that a letter is perceived by detecting independent features. The features are an arbitrary set whose combination serves to distinguish the letter, activating an abstract letter code (e.g., Fiset et al., 2009; Gibson, 1969; Massaro & Schmuller, 1975). Accordingly, research has sought to determine what the features of letters are. Much of this research has presented individual letters, under the assumption that the features used in recognizing isolated letters are also used for letters in context. We first summarize some core ideas in this research and then go on to more recent statements.
The traditional method for investigating letter features has been to present letters briefly for identification and explore confusions among the letters, under the assumption that similar letters will be confused with each other (e.g., Bouma, 1971; Gervais, Harvey, & Roberts, 1984; Gibson, 1969; Harris, 1973; Keren & Baggen, 1981). Similarity is assumed to be determined by shared letter features, and the pattern of interletter confusions has been used to determine what the features of letters are. However, after decades of research involving confusion matrices, little agreement has been reached on a definitive set of features. Summarizing more than 70 published studies on letter confusability, Grainger, Rey, and Dufau (2008) described the features as “mainly consisting of lines of different orientation and curvature” (p. 381). These are local features. In contrast, Bouma (1971) proposed global features, such as vertically ascending and descending parts, slenderness, and outer parts as features that might serve as perceptual cues. Thus, a very basic question remains open, as to whether features are global or local in nature.
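As a minimal sketch of the confusion-matrix method (the trials below are invented for illustration), identification responses can be tallied into conditional confusion probabilities and the off-diagonal entries symmetrized into the kind of interletter similarity measure that feature analyses take as input:

```python
from collections import Counter

letters = ["c", "e", "o"]
# Hypothetical identification trials: (presented, reported).
trials = [("c", "c"), ("c", "e"), ("c", "o"), ("e", "e"),
          ("e", "c"), ("o", "o"), ("o", "c"), ("c", "c"),
          ("e", "e"), ("o", "o")]

counts = Counter(trials)
n_presented = Counter(p for p, _ in trials)

# Confusion matrix: P(report r | presented p).
conf = {p: {r: counts[(p, r)] / n_presented[p] for r in letters}
        for p in letters}

# Symmetrized confusability, the usual input to feature-recovery
# analyses such as clustering or multidimensional scaling.
def similarity(a, b):
    return (conf[a][b] + conf[b][a]) / 2

print(similarity("c", "e"))
```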
The value and appropriateness of using confusion matrices to identify letter features has been questioned. Pelli et al. (2006) referred to limited success in identifying features, singling out just two global attributes as important: roundness and letter width. However, common manipulations (low contrast and rapid presentation) make low spatial frequencies appear more important and, hence, are more likely to reveal global features (Fiset et al., 2008). Grainger et al. (2008) and Fiset et al. (2009) noted that letters must be degraded to create confusions, and this can influence the nature of the confusions. In fact, support for this argument was obtained by Bouma (1971), who compared two different manipulations (long reading distance and eccentric vision) and found differences in the pattern of confusions. Differences have also been attributed to the particular fonts tested; Gilmore (1985) noted that fonts vary in the spatial frequency and phase spectra of the letters due to their shape, proportions, and other stylistic attributes.
Classification image technique
In two recent articles, Fiset et al. (2008, 2009) used classification image methods to discover the features used in letter identification. The method directly tests the effects of feature samples at multiple scales on letter identification. Thus, the method addresses the issue of feature scale (and spatial frequency) by examining different combinations of scaled features. On each trial in the experiments, a different combination of feature samples at multiple scales is generated, combined, and then presented as a stimulus (one stimulus centered on fixation) for subjects to identify. Identification probabilities are then analyzed with multiple regression to determine how strongly feature samples are associated with correct identification. For lowercase letters (Arial font), Fiset et al. found that line terminations (the ends of letter parts) were by far the most important feature for human observers and much more important than the second feature, horizontals.
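The regression step can be sketched in miniature (the simulated observer, sample indices, and parameter values below are invented; real classification-image studies use image-based samples at multiple spatial scales rather than a six-element profile). Each trial reveals a random subset of feature samples, and regressing accuracy on sample presence recovers which sample drives identification:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples = 4000, 6   # feature samples revealed per trial
diagnostic = 2                  # hypothetical index of a termination sample

# Each trial reveals a random subset of the feature samples.
revealed = rng.random((n_trials, n_samples)) < 0.5
# Toy observer: accuracy rises sharply when the diagnostic sample shows.
p_correct = 0.3 + 0.6 * revealed[:, diagnostic]
correct = rng.random(n_trials) < p_correct

# Multiple regression of accuracy on sample presence: the fitted
# weights form the classification image (here, a 6-sample profile).
X = np.column_stack([np.ones(n_trials), revealed.astype(float)])
beta, *_ = np.linalg.lstsq(X, correct.astype(float), rcond=None)
weights = beta[1:]
print(weights.argmax())  # the diagnostic sample dominates the profile
```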
Fiset et al. (2008, 2009) argued that terminations may be important because they represent critical identification information across most fonts—font-invariant information. We agree with this conclusion but suggest that the perceptual importance of terminations can be understood more deeply in terms of their role within the system of distinctions within a font; terminations distinguish between different basic letter parts or their combinations and, thus, between letters (e.g., Sanocki, 1987). For example, n and r both have a curved component (arch), but the curve of n continues further into a vertical before terminating. In type design, these distinctions are systematically constrained within and across fonts and often are marked with serifs (small lines at the end of strokes) or other details.
One potential problem with single-letter studies is that absolute locations within letters are likely to be overemphasized. Over trials, the letters are superimposed over each other (presented centered in the same position); this can make letter parts that do not overlap (or overlap less) with other letters especially salient. For example, p’s descender becomes a unique area of nonoverlap in the lower left of absolute letter space. In contrast, during reading, letter perception is complicated by spatial uncertainty that is resolved over time, and there is no such thing as absolute location or area of nonoverlap.
Efficiency of letter processing: 1. Letter templates
Letter processing is highly efficient in skilled reading; indeed, efficient letter processing is a requisite for comprehension (e.g., Laberge & Samuels, 1974). Therefore, the efficiency of letter processing is an important concern. Using psychophysical methods, Pelli et al. (2006) recently developed an empirical definition of efficiency that involves an ideal observer. The measure compares the efficiency of identification by human observers with a maximum level possible for the set of stimulus alternatives, defined by an ideal observer (Tanner & Birdsall, 1958). The ideal performance level is based on the assumption that maximal identification (optimal use of stimulus information) is achieved by comparing the stimulus representation on each trial with templates for the alternatives (in an absolute location space) and picking the alternative with maximum overlap.
Different fonts vary in the ideal level they allow, because the letters within the fonts can be more or less similar to each other. The ideal observer method factors out the overlap between letters and establishes the ideal level for each font. Human performance can then be compared with the ideal level possible with a font.
Pelli et al. (2006) established the ideal level for a number of different fonts by simulating, through a computer program, the ideal identification for single letters presented centrally but embedded in white noise. For comparison, they also calculated ideal performance for sets of alternatives consisting of simple shapes (subsets of squares in matrices—Checkers font), and sets of many words (treated as single characters). Pelli et al. (2006) then measured human performance with the same stimuli and presentation method and reported the ratio of human performance to ideal performance.7 Interestingly, overall efficiency was surprisingly low, ranging between 0.09 and 0.14 for typical text fonts (Bookman and Helvetica, respectively). Efficiency reached its highest level for bold fonts (Bookman Bold, 0.155) and was markedly higher for the fairly simple shapes (2 × 3 Checkers, 0.308).
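The ideal observer can be sketched under assumptions invented here for illustration (a two-letter 5 × 5 "font" in white Gaussian noise). The sketch shows the template-matching rule—pick the template with maximum overlap, with an energy correction—and how ideal accuracy degrades with noise; human thresholds measured with the same stimuli sit well above the ideal's, which is what yields efficiencies far below 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 5x5 binary "letters" as templates (hypothetical mini-font).
T = {"l": np.zeros((5, 5)), "o": np.zeros((5, 5))}
T["l"][:, 2] = 1                 # a vertical stroke
T["o"][1:4, 1:4] = 1             # a filled square...
T["o"][2, 2] = 0                 # ...hollowed into a ring

def ideal_identify(stim, templates):
    """Optimal rule in white Gaussian noise with equal priors:
    maximize overlap with the stimulus minus half the template energy."""
    def score(tpl):
        return (stim * tpl).sum() - 0.5 * (tpl ** 2).sum()
    return max(templates, key=lambda k: score(templates[k]))

def accuracy(sigma, n=2000):
    """Monte Carlo estimate of ideal accuracy at noise level sigma."""
    ok = 0
    for _ in range(n):
        letter = rng.choice(list(T))
        stim = T[letter] + rng.normal(0.0, sigma, (5, 5))
        ok += ideal_identify(stim, T) == letter
    return ok / n

print(accuracy(0.5), accuracy(3.0))  # accuracy falls as noise grows
```

Efficiency is then the ratio of the ideal observer's threshold contrast energy to the human observer's for the same criterion accuracy; the 0.09–0.15 values for text fonts mean humans need roughly 7–11 times the stimulus energy that the ideal requires.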
If efficiencies had approached 1 for some fonts, Pelli et al. (2006) would have argued that letter identification involves something like the template-matching scheme modeled by the ideal observer. Indeed, they had expected such a result. However, the low efficiencies imply, instead, that observers use an alternative and less efficient scheme. On the basis of further analyses, Pelli et al. (2006) concluded that letter identification follows an early feature detection stage involving multiple independent features. The separate feature decisions made at this stage produce the low levels of efficiency observed. The idea of an initial feature detection stage is consistent with the prior literature on feature detection as the basis of letter identification.
Pelli et al. (2006) have developed a measure that is formal, specific, and useful in certain ways. However, the Pelli et al. (2006) approach is limited as well. One general problem is that efficiency was low for high-quality, common text fonts; these results call into question the validity of applying this overall measure to reading. The Checkers font, made of randomly combined squares, had the highest efficiency.
Why might text font efficiency be so low? Simple problems include differences caused by brief stimulus presentations (in contrast to reading) and presenting individual letters centered at fixation, which may overemphasize exact spatial information. More important for this article, the concept of efficiency, as well as the ideal observer model, is based on the assumption that distinctiveness (template differences between letters in absolute space) is critical and that only distinctiveness is critical. The assumption is inconsistent with the basic assumption in type design that commonalities among letters are also important. Note that distinctiveness could be increased by creating a mixed font, where differences between fonts would increase interletter distinctiveness. For such a font, ideal observer efficiency should go up. Yet this would seriously contravene type designers’ intuitions as to what is a legible font. Furthermore, evidence indicates that letter perception would be slowed down by font mixing.
A related point is that the ideal observer approach ignores another possible basis of efficiency during reading. In almost all texts, the font remains the same within words and throughout the text. Commonalities of type could be exploited by the perceptual system. Letters within a string share spatial parameters (such as x-height or ascender height), and their reference frames may be resolved in parallel over time. And, because the parameters of a good font remain constant, much of the perceptual information used for one word could be applied to subsequent letter processing. Thus, letter-processing efficiency could be increased by tuning of the perceptual system.
Efficiency of letter processing: 2. Parts, relations, and font tuning
Independent features are an elegant way to decompose letters into psychologically functional subunits, but they are not the only analytic approach. An alternative approach to letter-processing efficiency that incorporates type commonalities (Sanocki, 1987) has been developed using structural network concepts, from the literatures on modeling visual knowledge about objects and scenes (schema theories; see, e.g., Oden, 1979; Palmer, 1975; Pinker, 1984; Sanocki, 1999), including letters (Knuth, 1982). In the network approach, object structure is modeled in terms of entities and relations between the entities that are made explicit within a network representation (e.g., Oden, 1979). In type, the major component entities of letters may be letter parts (Sanocki, 1987). Relations and commonalities among components are made explicit in the network, including spatial size parameters, part shape, and details of lines and terminations (Knuth, 1982; Sanocki, 1987).8
The efficiency of a consistent font
The possibility of tuning for font-specific details led Sanocki (1987, 1988) to create contrasting fonts and vary the consistency of the font within strings and between strings. Sanocki (1987, 1988) measured the effects of font consistency on general measures of letter identification efficiency. The importance of font consistency was implicated by prior reading research indicating more efficient perception when typographic variables such as size, case, and type style are consistent (e.g., Corcoran & Rouse, 1970; Rudnicky & Kolers, 1984; Tinker, 1963). However, the focus on letter identification by Sanocki was new.
Sanocki (1987) developed fonts that were similar in overall letter size (top of ascender to bottom of descender) but differed on several properties, including the extent and details of terminations, the basic shape of parts (Experiments 1–3), and line thickness and spatial dimensions (x-height, ascender/descender height; Experiment 2). Strings of letters were presented, and the font was either consistent within strings or mixed within strings. Sanocki (1987) used a letter–nonletter task; the letter strings sometimes contained a nonletter (a letter with a part added or deleted), and observers pressed a key to indicate their judgment. Responses were considerably faster for same-font strings over mixed strings (over 100 ms in most cases). Analyses based on additive-factors logic (Sternberg, 1969) implied that the same-font advantage arose from the speed of activating letter codes, rather than response or decision processes. The large font-mixing effect was additive with other large effects, presumably because it involved visual processes preceding checking or decision processes. The other effects on response time were produced by the factors string length and response (letter–nonletter); string length caused linear increases, while response caused a 2:1 slope pattern (Sanocki, 1987, Experiment 2). The conclusion that font mixing affected the efficiency of letter activation was supported more directly in the next set of studies.
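The additive-factors reasoning can be illustrated with a small worked example (the cell means below are invented and purely illustrative). If font mixing and a second factor affect different processing stages, their effects on mean response time add, and the interaction contrast is near zero:

```python
# Mean RTs (ms) in a hypothetical 2x2 design: font (same/mixed) x
# string length (short/long). The numbers are illustrative only.
rt = {("same", "short"): 520, ("same", "long"): 600,
      ("mixed", "short"): 630, ("mixed", "long"): 710}

# Main effect of each factor (averaged over the other factor).
font_effect = (rt[("mixed", "short")] + rt[("mixed", "long")]
               - rt[("same", "short")] - rt[("same", "long")]) / 2
length_effect = (rt[("same", "long")] + rt[("mixed", "long")]
                 - rt[("same", "short")] - rt[("mixed", "short")]) / 2

# Additive-factors logic (Sternberg, 1969): a zero interaction
# contrast is consistent with the factors acting on separate stages.
interaction = (rt[("mixed", "long")] - rt[("mixed", "short")]
               - rt[("same", "long")] + rt[("same", "short")])
print(font_effect, length_effect, interaction)
```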
The fonts in these studies were created by Sanocki, within dot matrices that were reasonable for the computer technology at that time (20 × 8 pixels, total height × width). The fonts are not as high in visual quality as typical text fonts in current use. Do the negative effects of font mixing generalize to higher quality fonts? More recent work with improved letter displays begins to address this issue (Gauthier et al., 2006; Walker, 2008). Perhaps most compelling is Gauthier et al.’s Experiment 3, which measured accuracy of identifying letters in strings of three letters each. This experiment involved sans serif letters displayed at a fairly high resolution but varying in a spatial size relation—the size of the ascenders and descenders relative to x-height (termed aspect ratio by Gauthier et al., 2006). Gauthier et al. found that when two fonts were mixed within strings, accuracy of identification was lower than when presented in pure strings. This result replicates negative effects of similar spatial properties found by Sanocki (1991b) with dot matrix letters (see below).
Gauthier et al.’s (2006) first two experiments were also conducted with high resolution letters. Gauthier et al. had observers search for a target letter through large matrices of well-spaced individual letters, with a search task that required serial, letter-by-letter processing. They found that search was faster and more accurate through entirely same-font matrices than when font varied from row to row. Search was even slower and less accurate when font varied within rows. The effects were found with English readers and (typical) Roman fonts (Experiment 1) and with Chinese readers and fonts of Chinese characters (Experiment 2). However, the search task requires a decision about each individual letter, and the overall reaction times may be influenced by decision-level effects that would accumulate across letters.
Walker (2008) used Bookman Bold and Palatino Italic and presented words and nonwords as stimuli. Font varied only between strings. Walker presented the strings in pairs (one above the other), and the observer’s task was to indicate whether there was a word present in the pair. Critically, responses were faster when the font was the same for the pair of strings, as compared with different-font pairs. Walker developed the idea that font tuning is the setting of parameters and explored the time course of tuning over trials (see also Sanocki, 1992). Walker’s results clearly implicate the processing of font-specific information, and Walker’s discussion of how font parameters may change over time is illuminating. However, the pattern of error rates could imply a speed–accuracy trade-off; error rates were somewhat (but not significantly) higher in same-font conditions. This could reflect the trading of accuracy for speed in same-font conditions; consequently, it is not clear that letter activation processes were more efficient here. The pattern of errors is consistent (but again, not significantly so) with decision-level influences in which “same-fontness” (perhaps uniformity of style across the pair of strings) is taken as evidence of wordness. This may be analogous to the decision-level effects found in letter–nonletter decisions (Sanocki, 1992) and well-studied in word–nonword decisions (see, e.g., Neely, 1991).
In summary, there is evidence of same-font over mixed-font advantages with letter strings, both with somewhat coarse dot matrix fonts (Sanocki, 1987, 1988, 1991a, 1991b, 1991c) and with higher resolution fonts (Gauthier et al., 2006), and evidence that font information is retained and used over time (Walker, 2008). There are a number of outstanding questions, however. Two are of particular importance.
What specific properties of fonts are critical for same-font advantages?
Is a constellation of differing font properties necessary for regularity effects (see, e.g., Fig. 4), or can specific properties be identified? Existing results have most clearly implicated spatial size parameters. In their Experiment 3, Gauthier et al. (2006) manipulated three different font properties and obtained same-font advantages for only one, the manipulation of aspect ratio mentioned (size of ascenders or descenders, relative to x-height). Two other manipulations had no effect in that experiment: a manipulation of letter slant (letters were tilted left or right with other properties constant) and fill (letters were outlines or filled). This suggests that spatial size parameters may be more important than these other possible font parameters. However, the range of possible font parameters has only been partially explored; remaining parameters include, but are not limited to, extents of terminations, shape of components, and line thickness. Also, spatial size parameters should be further defined and distinguished.
An additional result was that letter identification efficiency was not improved by an increase in the relative amount of distinctive information, in the small/large font. Even though the ascenders and descenders in this font were relatively large (see Fig. 5), and even though this font was presented as a consistent font throughout blocks of trials (to allow tuning), performance levels were similar to those for the normal fonts. This means that increasing the distinctiveness of the letters within the font did not increase letter-processing efficiency.
What is the relation between font-mixing effects and font typicality?
We begin this section with two perspectives on font mixing and font typicality. The first perspective is the possibility that font mixing has negligible effects on letter identification efficiency under important conditions—when highly legible text fonts of the same apparent size are mixed. We note that good text fonts are, in fact, quite similar to each other when critical size parameters are matched; the fonts may approximate a prototype. However, a second perspective is that the mixing of good fonts clearly has effects when the fonts vary in size relations or spatial relations. The implication is that, at the very least, the study of font tuning is a study of how spatial size relations are processed.
The first perspective arises because, despite the collection of font-mixing studies reviewed, we know of no compelling findings with typical text fonts that are similar in apparent size. Walker (2008) used high-quality fonts, but one was a bold font and the other a fairly readable italic font. Furthermore, there are some concerns about decision-level effects in those studies. Gauthier et al. (2006) found their most compelling effects by mixing spatial size relations and producing atypical fonts. The font parameters used by Sanocki (1987, 1988, 1991c) extend beyond the range of typical fonts in several ways. Thus, the extent of the mixing effect with typical, high-quality text fonts is an open issue.
In any case, further exploration of font-mixing effects with a range of fonts is in order. In this exploration process, measures in addition to letter identification efficiency should also be considered, as we suggest below. The similarity of good text fonts to each other, and to a prototypical form, has interesting implications, which are also explored below.
Although the existence of mixing effects for similarly sized text fonts is an open question, fonts are size-specific entities in the type world. They have a nominal point size, and there are, of course, real differences in size. The evidence is now clear that when different-size normal fonts are mixed, there are negative effects on letter and word processing. These effects include the negative effects of mixing spatial size relations within letters, studied by Gauthier et al. (2006) and Sanocki (1991b). As noted, the time course functions of Sanocki (1991b) indicate that these negative effects occur on the efficiency of letter activation. There is another negative effect involving size—the size of letters with normal spatial relations—to which we now turn.
Size variation between letters has produced robust negative effects; however, these may not result from changes in letter identification efficiency. We now discuss those results briefly. Rudnicky and Kolers (1984) mixed large and small letters in reading and found a number of negative effects on reading speed. With normal fonts differing in size, Sanocki (1991b) found that letter identification accuracy was reduced when large and small normal letters were mixed, but only for the smaller letters. A conservative explanation for these negative effects is differences in attentional priority, as will be explained. Finally, recent research suggests that the negative effects of mixing letter case (eXaMpLe; McClelland, 1976) may arise because the differences in the size of upper- and lowercase letters disrupt the grouping of letters into word units; uppercase letters look and function as larger letters and may combine to form competing sublexical groups (Humphreys et al., 2003; Mayall et al., 1997). The attentional priority explanation of these results is that, when large and small letters are mixed within strings, the larger letters are a higher priority for attentional processes that read out abstract letter identity as visual processing is completed (Sanocki, 1991b). The attentional priority explanation was developed in the literature on hierarchical letters—that is, larger letters composed of smaller letters (e.g., Miller, 1981; Navon, 1977; Ward, 1982)—to explain decision advantages for the larger (global) level. It can be applied to perceiving letters in strings (Sanocki, 1991b) and words. With mixed-case words, differences in size-based priority would disrupt the formation of word units but would encourage (inappropriate) grouping of same-size or same-case letters (see Humphreys et al., 2003).
With mixed-sized strings, the presence of large letters makes adjacent smaller letters a lower priority for encoding, especially at string positions that get less attention (rightward positions for English readers; Sanocki, 1991b, Experiment 3). Future research on size mixing should consider effects on letter-processing efficiency and the further possibility of effects on attention to large and small letters (Sanocki, 1991b).
The microgenesis of letter and word processing
Evidence of the importance of common letter information can also be found in the time dimension of identification. In most of the experiments discussed, the time course of visual letter processing has been treated as a unitary event. However, it is possible to manipulate phases within the visual-processing time of a letter—to study the microgenesis of letter processing. Such studies reveal an important further property of commonalities, which is that their importance can change over time.
Most theories of the microgenesis of visual perception assume that the earlier phases involve coarse, larger, or more global features and that the representation becomes refined (quickly) over time, with finer details being resolved late in processing (e.g., Broadbent, 1977; Fiset et al., 2009; Navon, 1977; Palmer, 1975; Sanocki, 1991a, 1993, 2001). As was suggested in the introduction, the early phases establish some type of reference frame, such as the spatial envelope of letters. Also, in early phases, larger scale features, such as large letter parts, are likely to be perceptually important. These larger features can function as a commonality that aids letter perception (Sanocki, 1991a). For example, the vertical stroke that is common to f and t is a major structure within the letter. If established early in processing, this structure could be useful in integrating smaller parts, such as the top or bottom curves that distinguish the letters. In a simple study of the microgenesis of letter processing, Sanocki (1991a) presented large common features early in processing as a prime (for 33–67 ms), followed by the entire letter for a similar amount of time and a mask. Observers chose between a pair of similar letters. The primes always involved features common to the forced-choice alternatives, providing no distinctive information. Yet they facilitated target identification, relative to no prime and various baseline primes. The results were extended to objects with common-feature primes, and temporally opposite results were obtained with small distinctive features: benefits for presentation late, but not early, in processing, in contrast to the large common features, which had benefits early but not late (Sanocki, 1993, 2001). These results indicate that the benefits of common information change over time.
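To make the paradigm concrete, the trial structure just described can be laid out as a display sequence. The following is only an illustrative sketch: the durations are taken from the description above, but the function name, condition labels, and data structure are our own, not part of the original studies.

```python
# Illustrative sketch of the microgenesis priming paradigm described above
# (after Sanocki, 1991a). Labels and structure are ours; durations follow
# the text (prime and target each shown for roughly 33-67 ms).

def make_trial(prime_type, prime_ms, target_ms):
    """Return the display sequence for one forced-choice trial.

    prime_type: 'common-features' (features shared by both response
                alternatives, e.g., the vertical stroke shared by f and t),
                'baseline', or 'none' (no prime shown).
    """
    sequence = []
    if prime_type != "none":
        sequence.append(("prime", prime_type, prime_ms))
    # The whole letter follows the prime for a similar duration, then a mask.
    sequence.append(("target", "whole letter (f or t)", target_ms))
    sequence.append(("mask", "pattern mask", None))
    return sequence

# Example: a common-features prime for 50 ms, then the letter for 50 ms.
trial = make_trial("common-features", 50, 50)
```

The key contrast in the study is then between trials built with a common-features prime and trials built with `'baseline'` or `'none'`, with accuracy of the forced choice as the dependent measure.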
Font typicality, type design, and language
The similarity of popular text fonts suggests that letter designers have developed, over centuries of research into design, prototype structures of individual letters. In the design field, Johnston described these as essential or structural forms: “the simplest forms that preserve the characteristic structure, distinctiveness, and proportions of each individual letter” (Johnston, 1945, p. 239). Similarity to prototype may be critical for acceptance or popularity of the font, and it may be critical for optimal legibility. Designers may have discovered, through their intuitive research methods, that fonts become less legible if they vary too much from the prototypical structure.
The basis of the prototypical structure of popular text fonts is an open and interesting question. Is the prototypical structure due to the nature of the human visual system? Have designers, by striving for legibility, developed a system that approaches optimality for the human visual system? These questions are complex and involve a number of separate issues, but headway is being made.
The parallels between the visual system and the structure of the world have been discussed in depth by theorists for some time (e.g., Lockhead & Pomerantz, 1991; Shepard, 2001). In alphabet design, Changizi, Zhang, Ye, and Shimojo (2006) argued that letters have shape distinctions analogous to objects in the natural environment, allowing readers to exploit general recognition mechanisms that evolved to efficiently perceive objects. Changizi et al. supported their thesis by finding commonalities in contour configurations across writing systems, nonlinguistic symbols, and natural scenes. In a further study of letter components, Changizi and Shimojo (2005) concluded that writing systems have evolved to balance distinctiveness and uniformity. Parallels between object perception and letter design were also studied by Lanthier, Risko, Stolz, and Besner (2009). In object perception, vertices are argued to be more important than midsegment contours, because of their utility in distinguishing object shapes (e.g., Biederman, 1987); Lanthier et al. showed that vertices are also more important than midsegments for letter identification.
The research just discussed emphasizes distinctiveness. However, the richer framework we argue for includes the translation from deeper or more essential levels to the details of letters and emphasizes that uniformity of this translation is also important (see also Walker, 2008). Similar translations may be involved in speech perception, where deeper phonetic relations are modified by the talker (e.g., Martin, Mullennix, Pisoni, & Summers, 1989). For example, within a single talker, the acoustic pattern of one vowel is distinct from other vowels, but one vowel of a talker overlaps with the acoustic pattern of a different vowel spoken by another talker. We are able to identify vowels correctly across different talkers because we process the systematic covariation in signal properties (talker characteristics) to separate phonetic categories (Nearey, 1989). The talker-dependent variation applies to all vowels almost uniformly. Talker regularity effects are also found in that mixing talkers within a block of trials leads to vowel identification that is worse than that for a single talker (Nearey, 1989).
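The claim that talker-dependent variation applies to all vowels almost uniformly can be illustrated with a small sketch in the spirit of Nearey's (1989) log-formant approach. Everything below is our own toy illustration: the function name and the formant values are invented, and real vowel data are far messier than a single multiplicative scaling.

```python
import math

def normalize(talker_vowels):
    """Subtract the talker's mean log-formant from every log-formant.

    A uniform talker-dependent scaling multiplies all formants by a
    constant, which adds the same constant to every log-formant and to
    their mean, so it cancels here, separating the phonetic pattern
    (vowel identity) from the talker characteristic.
    """
    logs = [[math.log(f) for f in vowel] for vowel in talker_vowels]
    vals = [x for vowel in logs for x in vowel]
    mean = sum(vals) / len(vals)
    return [[x - mean for x in vowel] for vowel in logs]

# Two hypothetical talkers: talker B's formants are all 1.2 times talker A's.
talker_a = [[300.0, 2300.0], [600.0, 1200.0]]  # two vowels, two formants each
talker_b = [[f * 1.2 for f in vowel] for vowel in talker_a]

# After normalization, the two talkers' vowel patterns coincide.
norm_a, norm_b = normalize(talker_a), normalize(talker_b)
```

The point of the sketch is structural: because the talker-dependent variation is (approximately) uniform across vowels, a listener who registers the covariation can factor it out, just as a reader tuned to a font's uniform spatial parameters could factor those out of letter identification.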
Given the importance of deep versus surface structure in letter perception and speech perception, could it also apply to object perception? In modern times, the details of manufactured objects often depend on the intended style. Would a chair of one style (e.g., early American) be perceived less efficiently in the context of another style (e.g., an otherwise modern Scandinavian room)? This is an open question; if such influences exist, models of object recognition that emphasize only distinctiveness would have to be modified.
Measuring the efficiency of the front end of reading: Judgments by the visually educated
We have argued that measuring the identification of letters in strings is a good way of assessing the efficiency of reading’s front end. However, there is a more sensitive and yet simpler measure that could be further explored—the “ease” or “enjoyment” of reading, assessed subjectively by a visually educated reader. This is a primary measure that type designers and typographers develop and refine as they build up design skill.
The reader is invited to consider the ease of reading the sentence in Fig. 7a, written in alternating letters from two fonts (i.e., a mixed font). While it is easy to ignore the modest “noise” and read the sentence, is the visual experience as pleasant as with pure fonts (Fig. 7b, c)? Would it be pleasant to read an entire book in mixed font? Measurements of subjective ease could be quite sensitive.
There may be more at stake here than ease of reading, however. Researchers have known for some time that readers allocate processing resources at multiple levels during reading and can alter their allocation strategy (e.g., LaBerge & Samuels, 1974). Mixed font may be like visual noise. When visual noise is present, readers can still read quickly, but at a cost of resources and comprehension at higher levels (e.g., Gao, Stine-Morrow, Noh, & Eskew, 2011). Measures of ease and enjoyment may be sensitive to visual noise and be more sensitive to font-mixing effects than are measures of letter identification efficiency. Such measures could be scientifically useful, in combination with objective measures.
Combined objective and subjective measurement could also be used to test a critical assumption motivating type design: that subjective measurements are informative about visual processing. We have argued that type designers' judgments appear to be valid in general, but this is an assumption that should be examined.
Our main goal in this article has been to stimulate further thinking about the front end of reading—the process of letter identification. Although distinctive features are psychologically important aspects of letter identification, we argue that their details are not arbitrary. A richer approach to letter identification involves distinctiveness together with the commonalities of letters—uniformities from letter to letter within a font, pertaining to spatial and size relations—and perhaps other stylistic details that characterize a font. Commonalities may be important for establishing spatial reference frames for letters and may help to define a family of objects for efficient identification. Font-mixing research begins to provide evidence that commonalities influence letter identification efficiency. However, critical questions remain to be answered. A variety of research approaches should be considered, including studies of the microgenesis of processing and, possibly, subjective measures such as those used by type designers.
The openness of these issues means that vision researchers should not be quick to propose changes in alphabet design to the larger world. Alphabet design is a complex topic, and it can be a fascinating topic for vision science, one that we can learn from. However, the topic should be reasonably understood before we can safely influence visual culture.
The idea of improving legibility through research and design is not new. Modifications to increase distinctiveness have been explored by Kolers (1969), Lockhead and Crist (1980), and, more recently, Beier and Larson (2010). A modification that was actually used (mainly to increase spelling regularity, although distinctiveness was also increased) was the Initial Teaching Alphabet (Pitman & St. John, 1969); this modification was not successful (e.g., Downing, 1967). Spencer (1968) cites proposals for new designs going back to 1881.
In general, type designers (mistakenly) assume that words are the functional units of reading. Psychological research, however, indicates that readers also form units at sublexical levels, such as letters, syllables, and perhaps bigrams (e.g., Grainger, 2008). Fortunately, designers work with letters and attend to how well letters form word units. We suggest that well-designed letters form functional sublexical units, while also forming good word units, which are most important for lexical access.
This is not done in a precise (mathematical) way. In fact, some shapes are given different heights in order to appear equivalent in height (e.g., to match the perceived height of a curved vs. a straight line, rounded letters such as lowercase a, c, e, and o are usually slightly taller than v, w, x, y, etc.). Also, designers may introduce some slight irregularities to reflect their particular design style.
These include paradigms studying the identification of letters in words and nonwords (e.g., McClelland & Rumelhart, 1981; Reicher, 1969; Wheeler, 1970), experiments on the effects on identification of visual feature information (e.g., Lanthier et al., 2009), and research on type factors such as case alternation (e.g., Mayall, Humphreys, & Olson, 1997).
For example, the space between lines (referred to as interlinear or line spacing, leading, or vertical word space) affects reading speed (Chung, 2004; Paterson & Tinker, 1940). Efficiency may be influenced by transitions between lines, because greater leading eases location of the next line (Paterson & Tinker, 1940), and efficiency may be influenced by crowding (Chung, 2004; see the next section).
In some cases, such as low vision, uppercase may be advantageous, perhaps because of its increased size relative to lowercase (Arditi & Cho, 2007). Also, familiar acronyms are processed more quickly in the familiar uppercase than in lowercase (Besner, Davelaar, Alcott, & Parry, 1984; Seymour & Jack, 1978); these results support the idea of using the most frequently encountered forms in studying normal letter processing.
In more detail, Pelli and colleagues reported a refined and advantageous version of this concept, high-noise efficiency (Pelli & Farell, 1999).
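To give the efficiency concept a concrete form: the sketch below is a simplification in which thresholds are expressed as contrast energies, with plain efficiency as the ratio of the ideal observer's threshold energy to the human threshold, and high-noise efficiency discounting the human threshold in zero noise. The function names and the numbers are ours, for illustration only.

```python
# Minimal sketch of efficiency measures in the spirit of Pelli & Farell (1999).
# Thresholds are treated as contrast energies; names and values are ours.

def efficiency(E_ideal, E_human):
    """Plain efficiency: ideal-observer threshold energy over human
    threshold energy (a value between 0 and 1 for a real observer)."""
    return E_ideal / E_human

def high_noise_efficiency(E_ideal, E_human_noise, E_human_zero):
    """High-noise efficiency: subtracts the human threshold in zero noise
    before taking the ratio, isolating the noise-limited portion of
    performance from the observer's intrinsic equivalent noise."""
    return E_ideal / (E_human_noise - E_human_zero)

# Hypothetical threshold energies, for illustration only.
eta = efficiency(1.0, 10.0)                         # 0.1
eta_plus = high_noise_efficiency(1.0, 10.0, 2.0)    # 0.125
```

On this formulation, the high-noise variant is "advantageous" in the sense noted above: by removing the zero-noise threshold, it is less contaminated by factors unrelated to the extraction of letter information from the stimulus.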
Network approaches differ in a critical way from traditional feature models. Traditional models assume that features are both the unit of extraction from the stimulus and a component of the letter representation. Network models emphasize the information in the representation; the units of extraction are a separate issue. (Although not a necessary assumption, the units of extraction could consist of information about edge pieces and relations between line pairs that are interpreted within the network, for example.)
The authors thank Gregg Oden for helping to inspire the initial research and for helpful comments on this manuscript, and Rosanna Traina for assisting with illustrations. MCD was supported by the University of Reading’s Research Endowment Trust Fund.