1 Reading Letters and Words in Serif and Sans Serif Typefaces

One of the earliest uses of computers in vision research was as tachistoscopes for the brief presentation of visual stimuli. One fundamental problem was that cathode-ray tubes (CRTs) suffered from the decay time or persistence of the phosphor in the individual pixels, which led to inaccurate measurements (see Hutner et al., 1999). (It should be noted that traditional tachistoscopes were by no means immune to such problems: Mollon & Polden, 1978.) Liquid crystal displays (LCDs) are generally regarded as more appropriate for vision research (Wang & Nikolić, 2011). Even so, it is considered good practice to use a backward-masking procedure in which a second stimulus or mask (perhaps only a random pattern) is presented after a brief interval to overwrite the visual trace of the original stimulus.

Suen and Komoda (1986) used this approach to compare the recognition of individual uppercase and lowercase letters that had been digitised using an optical scanner from print in a slab serif typeface (Courier), a sans serif typeface (Letter Gothic), and the output of a dot-matrix printer. All three typefaces were monospaced or non-proportional (that is, each character occupied the same width). The characters were presented on a high-resolution CRT screen controlled by an Apple microcomputer. (The actual duration of the presentation was not specified.) Each character was followed by a mask after 0 ms, 16.7 ms, or 33.3 ms. If the mask was presented immediately after the letters, performance was best with the sans serif typeface and worst with the dot-matrix style. The differences among the three styles were much reduced if the mask was presented after a delay, but this may have been a ceiling effect: with a delayed mask, performance was 80% or better with all three styles, presumably because of the additional time available for processing the letters.

As mentioned in Sect. 4.1, Korean is another language where the alphabet can be rendered in either serif (or Ming) typefaces or sans serif (or Gothic) typefaces. Hwang et al. (1997) presented Korean participants with CRT screens consisting of 12 windows, each containing letters in one of six different sizes. The participants’ task was to find and read aloud a particular target letter and then to report how much visual fatigue they had experienced while carrying out the task. The letters were presented using either an unspecified Ming typeface or an unspecified Gothic typeface. The participants’ response times were faster, their accuracy was higher, and their reported visual fatigue was lower when reading letters in a sans serif typeface than when reading letters in a serif typeface. Even so, different results were obtained by Kong et al. (2011) in a study that was described in Sect. 4.1. They asked Korean participants to read aloud sets of four one- or two-syllable letters of varying sizes and to rate how much discomfort they had experienced when reading each set. The letters were presented either on paper or on an LCD screen using either an unspecified Ming typeface or an unspecified Gothic typeface. When the letters were presented on an LCD screen, there was no difference between the serif typeface and the sans serif typeface either in the participants’ performance or in their reported discomfort.

As was mentioned in Sect. 5.1, Arditi (2004) devised software to generate typefaces with slab serifs of varying size. Arditi and Cho (2005) used this software to construct lowercase typefaces of uniform thickness with slab serifs extending 0% (sans serif), 5%, or 10% of the cap height (the height of capital or uppercase letters). In theory, the resulting typefaces should have varied only in the size of the serifs, but, as was mentioned in Sect. 10.1, Arditi and Cho found that an increase in the spacing between successive letters had been required to accommodate the serifs. They therefore used an inter-letter spacing of 0%, 10%, or 40% of the cap height, yielding a 3 × 3 design. Arditi and Cho measured size thresholds when random five-letter strings were presented on a CRT computer screen as black letters against a white background. Data were obtained from four participants with normal vision. There was a large effect of spacing such that closely spaced letters yielded higher thresholds (i.e., poorer performance). There was also a significant but small effect of serif size, such that serifs of 5% or 10% led to lower thresholds (i.e., better performance) than a sans serif typeface, which Arditi and Cho ascribed to the concomitant increase in spacing required to accommodate them.

After Microsoft introduced ClearType software with the aim of tackling the aliasing issue in text presented on LCDs (see Sect. 10.3), a range of new typefaces was commissioned to try to exploit this new technology. They included two serif typefaces (Cambria and Constantia) and four sans serif typefaces (Calibri, Candara, Corbel, and Consolas, the last for use mainly in programming). Chaparro et al. (2006a, b, 2010) evaluated the legibility of these new typefaces in comparison with that of the serif typeface Times New Roman and the sans serif typeface Verdana. Nine participants were presented with individual characters from all eight typefaces for just 34 ms each (but with no backward mask) using an LCD monitor with ClearType software enabled and were asked to say each character’s name aloud. The proportion of characters reported correctly was highest for Consolas, Cambria, and Verdana and lowest for Times New Roman, Candara, and Corbel. Taking Times New Roman as the reference, accuracy was significantly better for Consolas, Cambria, and Verdana but not for the other four typefaces (Chaparro et al., 2010). Despite these somewhat equivocal findings, Microsoft made Calibri the default typeface for all its Office applications in 2007 (and it remains the default at the time of writing).

Moret-Tatay and Perea (2011) used a lexical decision task in which Spanish students had to say whether or not stimuli were genuine words. Half the stimuli were Spanish words, and the other half were nonwords created by changing two letters in genuine Spanish words. Each was presented in the centre of an LCD screen in either a serif typeface (Lucida Bright) or a sans serif typeface (Lucida Sans). Once again, the typefaces should have differed only in the presence or absence of serifs, but, as mentioned in Sect. 10.1, Moret-Tatay and Perea noted that the serifs had occupied some of the space between the letters, and so removing the serifs led to a slight increase in the inter-letter spacing. They found that participants responded significantly more quickly to words presented in a sans serif typeface than to words presented in a serif typeface, and they suggested that this might have been due to the slight increase in inter-letter spacing.

2 The “Stripiness” of Words Displayed on Screens

Section 4.2 described research by Wilkins et al. (2007) which measured the vertical “stripiness” of a word by the height of the first peak of the autocorrelation between an image of the word and a second, horizontally displaced image of the same word. They had found that words with a higher first peak (i.e., more stripy words) were read more slowly than words with a lower first peak (i.e., less stripy words). However, it was not clear whether this led to variations in how quickly words in different typefaces were read.
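The measure described above can be illustrated with a minimal sketch. It is not the procedure used by Wilkins et al.: the normalisation (a Pearson correlation over the overlapping columns), the function names, and the synthetic stroke patterns are all assumptions made for illustration. The sketch correlates an image with horizontally displaced copies of itself and reports the lag and height of the first local maximum after zero displacement.

```python
import numpy as np

def horizontal_autocorrelation(image, max_lag):
    """Correlate an image with copies of itself displaced by 0..max_lag pixels.

    Assumption: Pearson correlation over the overlapping columns only.
    """
    img = np.asarray(image, dtype=float)
    values = []
    for lag in range(max_lag + 1):
        a = img[:, : img.shape[1] - lag].ravel()  # original, right edge trimmed
        b = img[:, lag:].ravel()                  # displaced copy, left edge trimmed
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        values.append(float((a * b).sum() / denom) if denom > 0 else 0.0)
    return np.array(values)

def first_peak(acf):
    """Return (lag, height) of the first local maximum after lag 0, or None."""
    for lag in range(1, len(acf) - 1):
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]:
            return lag, float(acf[lag])
    return None

def stroke_pattern(height, width, period, offsets, stroke=2):
    """Hypothetical stimulus: vertical strokes repeated every `period` pixels.

    `offsets` gives the left edge of each stroke within one period.
    """
    cols = np.zeros(width)
    for start in offsets:
        for k in range(stroke):
            cols[(start + k)::period] = 1.0
    return np.tile(cols, (height, 1))

# Evenly spaced strokes (every 8 px) versus unevenly spaced strokes
# (gaps alternating between 6 px and 10 px).
even = stroke_pattern(20, 128, period=8, offsets=[0])
uneven = stroke_pattern(20, 128, period=16, offsets=[0, 6])

even_lag, even_height = first_peak(horizontal_autocorrelation(even, max_lag=30))
uneven_lag, uneven_height = first_peak(horizontal_autocorrelation(uneven, max_lag=30))

print("even strokes:   first peak at lag", even_lag, "height", round(even_height, 3))
print("uneven strokes: first peak at lag", uneven_lag, "height", round(uneven_height, 3))
```

In this toy case the evenly spaced pattern gives a higher first peak than the unevenly spaced one, which is the sense in which a higher first peak corresponds to a more "stripy" image. Images of real words would need to be rendered and converted to greyscale arrays before being passed to `horizontal_autocorrelation`.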

Liversedge et al. (2006) had displayed sentences to 15 students as white letters on a black background using a CRT screen and monitored the movements of both their eyes. They found that in normal binocular reading the two eyes were often misaligned after a saccade, so that part of the duration of the subsequent fixation was taken up correcting this disparity in order to achieve binocular vergence. Jainta et al. (2010) presented 32 German participants with 120 unrelated sentences in blocks of 10 to read silently from a CRT screen. They found that the participants achieved better binocular vergence when the sentences contained words with a higher first peak, but that this took longer to achieve and led to a longer overall fixation duration. They argued that these findings explained the longer overall reading time for words with higher first peaks.

Wilkins et al. (2020) observed that different typefaces appeared to vary in the periodicity of their letters’ vertical strokes. In two serif typefaces, Times and Palatino, the letter strokes were relatively evenly spaced, whereas in two sans serif typefaces, Arial and Verdana, the spaces between the strokes within a letter were greater than the spaces between the letters, leading to low periodicity. Wilkins et al. determined the first peak of the horizontal autocorrelation for passages from two novels when they were displayed on the Retina (LCD) screen of an Apple MacBook Pro in each of nine serif typefaces and in each of 11 sans serif typefaces. They found that the first peak of the horizontal autocorrelation was significantly greater for the serif typefaces than for the sans serif typefaces. Wilkins et al. argued that this difference in the first peak of the horizontal autocorrelation was not due to the serifs themselves but to the effect of the serifs on the rhythm or periodicity of the typefaces. However, Wilkins et al. did not provide any evidence that these differences led to significant variations in how quickly words in different typefaces were read.

3 Confusions Among Letters in Serif and Sans Serif Typefaces

In their original study involving the presentation of uppercase and lowercase letters in either a slab serif typeface or a sans serif typeface (mentioned in Sect. 11.1), Suen and Komoda (1986) observed that with both typefaces errors often consisted of confusions between uppercase and lowercase forms of the same letter or confusions between letters that were visually similar in the same case. The total number of confusions was similar in the two typefaces, but there were certain differences in the pattern of confusions. For instance, with the sans serif typeface, lowercase letters tended to be mistaken as their uppercase counterparts rather than vice versa, but the reverse tended to be true for the slab serif typeface. Suen and Komoda ascribed these trends to the design of the characters in the relevant typefaces rather than to the presence or absence of serifs.

Using a laptop computer with an LCD screen, D. Fox, Chaparro, and Merkle (2007) presented individual characters (the 26 lowercase letters plus the digits 0–9 and 11 common mathematical and scientific symbols) to ten participants in ClearType rendering for just 34 ms (but with no backward mask) in each of 20 different typefaces. The participants were asked to say each character aloud and were scored on their accuracy. Fox et al. focused on errors for the letter e (which is confusable with the letter c and the number 0) and for the number 0 (which is confusable with the letters e and o). For the letter e, the most accurate performance was obtained with the sans serif typefaces Clearview Text and Verdana, and the least accurate performance was obtained with the serif typeface Garamond. For the number 0, the most accurate performance was obtained with the serif typefaces Centaur and Rockwell, and the least accurate performance was obtained with the serif typeface Constantia.

In further results from this study, Fox et al. (2008) compared the data for the numbers 0 and 1 (which is confusable with the letter l). For the number 1, the most accurate performance was obtained with the sans serif typeface Clearview Text, and the least accurate performance was obtained with the serif typeface Centaur. Fox et al. employed classification tree analysis to identify the physical features of different typefaces that might be responsible for variations in legibility for both numbers, although they did not include the presence or absence of serifs as a feature in these analyses. Taken together, the findings of this study suggest that some characters tend to be more legible when presented on-screen in sans serif typefaces than when presented on-screen in serif typefaces, whereas the opposite is true for some other characters.

Beier and Larson (2010) constructed three new artificial typefaces. In each case, they varied the presence or absence of slab serifs with no other changes to the letters themselves. In a pre-test, each of 34 participants viewed the letter d presented on the LCD screen of a laptop computer that had been placed on a podium at eye level at a distance of 10 m. They approached the podium until they could correctly identify the letter. The distance at which they did so was then used for the presentation of individual uppercase and lowercase letters in the main experiment. Beier and Larson found that serifs could serve either to enhance or to impair the relative differentiation of individual letters. For instance, a slab serif added to the top of the stem of the letter i led to improved identification, but this was not the case when slab serifs were added to both the top of the stem and the baseline. In Sect. 4.2, it was noted that similar findings had been obtained with the identification of individual characters when reading from print (Harris, 1973; Tinker, 1963, p. 36). However, Vernon’s (1929) findings, again based on reading from print, imply that such confusions would be much less likely if letters were presented in the context of meaningful text, as in normal reading.

4 Conclusions

As with reading from print, the earliest research on the legibility of different typefaces when reading from screens was concerned with recognising individual letters and words under different conditions. Studies that employed authentic typefaces showed at most that some sans serif typefaces are more legible than some serif typefaces. Research using artificial typefaces suffers from confounding between (a) the presence or absence of serifs and (b) variations in the width of the letters and the spacing among successive letters. The vertical stripiness of words, as measured by the first peak of their horizontal autocorrelation, tends to be greater in serif typefaces than in sans serif typefaces, but there is little evidence that this leads to variations in how quickly words in different typefaces are read. As with printed letters and words, identification errors are often the result of confusions among visually similar letters, but visual confusions are not more likely with sans serif typefaces than with serif typefaces, contradicting an old hypothesis that serifs make letters easier to discriminate (Legros, 1922, p. 11).