1 Reading Letters and Words in Serif and Sans Serif Typefaces

The earliest experiments on the legibility of printed material were concerned with the relative legibility of individual letters presented in isolation in either uppercase or lowercase in the same serif typeface using the short-exposure method or the distance method. Other research was concerned with the relative legibility of characters presented in different serif typefaces. However, some researchers included one or more sans serif typefaces together with a range of serif typefaces in investigations of the legibility of letters and words:

  • Griffing and Franz (1896) measured the “illumination threshold” of letters constituting from one to four words in a line. In this method, the distance between a faint light source and the material was progressively reduced until the letters could be correctly reported. They included uppercase letters in both thick and thin versions of the sans serif typeface Block.

  • Roethlein (1912) used the distance method to measure the legibility of individual letters presented in the same typeface and included the sans serif typefaces Franklin Gothic and News Gothic.

  • Pyke (1926) measured the legibility of both meaningful and meaningless strings of letters presented in the same typeface, including the sans serif typeface Lining Grotesque. He used a speed-of-reading test, a letter-cancellation task in which participants had to cross out all occurrences of the letters e and t in a page of nonsense material, and a task which involved reading aloud coherent text (pp. 47–58).

  • Paterson and Tinker (1932) used a speed-of-reading test in which the participants had to read short passages and in each case to identify a word that conflicted with the passage’s meaning. They used a number of different typefaces including the sans serif typeface Kabel Light.

  • Webster and Tinker (1935) employed the distance method to measure the legibility of individual words in different typefaces and also included Kabel Light.

  • Luckiesh and Moss (1937) measured the visibility threshold of individual lowercase letters and included a sans serif typeface in light, medium, and bold font. They did not identify this typeface, but Lund (1999, p. 116) suggested that it was Kabel.

  • Luckiesh and Moss (1942, pp. 159–162) carried out a similar experiment and included the sans serif typeface Metrolite No. 2.

None of these researchers focused upon the difference between serif and sans serif typefaces, but the results that they presented indicate that the legibility of sans serif typefaces was not markedly different from the legibility of serif typefaces that were in common use at the time.

Ovink (1938) carried out two experiments to investigate the typographical factors that might influence the legibility of printed letters. His first experiment used the short-exposure method, and he presented individual lowercase letters in isolation in both serif and sans serif typefaces (pp. 23–37). The second experiment used a version of the distance method in which the printed material was presented in a fixed location and the participants approached it in gradual steps until it could be perceived and reported correctly. The material consisted of individual uppercase and lowercase letters presented in isolation in one of several sans serif typefaces or in one of two serif typefaces (Lo and Poster Bodoni) (pp. 38–71). The results that Ovink obtained using both methods show that the legibility of the sans serif typefaces was not markedly different from that of the serif typefaces.

Korean is another language where the alphabet can be rendered in either serif (or Ming) typefaces or sans serif (or Gothic) typefaces. Two studies have compared the legibility of the two kinds of typeface, but they yielded contradictory findings. Kong et al. (2011) asked ten older and ten younger adults to read aloud sets of four one- or two-syllable letters of varying sizes and then to rate how much discomfort they had experienced when reading each set on a 4-point scale. The sets of letters were presented either on paper or on a computer screen using either an unspecified Ming typeface or an unspecified Gothic typeface. When the letters were presented on paper, the participants’ reading speed was faster and their discomfort was less with the sans serif typefaces than with the serif typefaces. Nevertheless, the relevant differences were small in magnitude and unlikely to be of practical importance. The results were similar in both age groups.

Kim et al. (2015) presented 14 Korean students with pairs of two-syllable words printed side by side in 26-point type. On each trial, one member of the pair had been designated as the target, while the other was a distractor, and the participants’ task was to read aloud the target in each pair. The pairs of words were presented either on paper or on the screen of a smartphone in one of two serif typefaces (Batung or Gungseo) or in one of two sans serif typefaces (Dodum or Gulim). Afterwards, the students were asked to rate the typefaces of the stimuli in terms of their ease of reading, their familiarity, and their comfort. When the words were presented on paper, the participants’ reading time was significantly shorter for the serif typefaces than for the sans serif typefaces. Despite this pattern of results for their reading times, they gave the highest ratings to the sans serif typeface Dodum and the lowest ratings to the serif typeface Gungseo.

Wilkins et al. (1996) had devised the Rate of Reading Test for children with reading difficulties. The child was presented with a display of 10 lines, each consisting of a random ordering of the same 15 common words. For instance, the first line of one such display was “come see the play look up is cat not my and dog for you to”, except that the spacing between successive words was only 0.36 mm. This made the display resemble horizontal stripes and thus rendered it visually stressful. (Horizontal stripes are known to induce eye strain, visual illusions, headache, and—in people with photosensitive epilepsy—seizures: A. Wilkins et al., 1984.) The researchers argued that the use of random ordering minimised the linguistic and semantic aspects of reading that tended to be emphasised in more conventional reading tests. Children were timed while they read aloud the words in each display and were scored on the number of words that they had read correctly per minute. The original version of the Rate of Reading Test only used the serif typeface Times. However, Svensson (2019) developed a Swedish version of the test and administered it to 45 adults aged between 22 and 83 years. The test was administered twice in Times New Roman and twice in Times Sans Serif, a sans serif variant designed by Mundo da Lua. The average reading speed was 168 words/min in both conditions, and the difference between them was not statistically significant (p = 0.54).
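The construction and scoring of such a display can be illustrated with a brief Python sketch. It is offered only as an illustration under stated assumptions: the function names are hypothetical, each line is produced by a simple random shuffle (the published test materials may constrain the orderings in ways not described here), and only the 15 words quoted above are taken from the source.

```python
import random

# The 15 common words quoted from one display of the test.
WORDS = "come see the play look up is cat not my and dog for you to".split()

def make_display(words, n_lines=10, seed=None):
    """Return n_lines lines, each a random ordering of the same words.

    Assumption: a plain random permutation per line; the real test
    materials may have been constructed differently.
    """
    rng = random.Random(seed)
    lines = []
    for _ in range(n_lines):
        line = list(words)   # copy so the original list is left unchanged
        rng.shuffle(line)    # random ordering of the same 15 words
        lines.append(line)
    return lines

def words_per_minute(n_correct, seconds):
    """Score: number of words read correctly per minute."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_correct * 60.0 / seconds

display = make_display(WORDS, seed=1)
for line in display:
    print(" ".join(line))
```

For example, a reader who read 84 words correctly in 30 seconds would be scored at `words_per_minute(84, 30)`, that is, 168 words/min.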

2 The “Stripiness” of Printed Words

Wilkins et al. (2007) suggested that the legibility of letters or words might depend upon their shape and, in particular, upon the extent to which letters’ vertical strokes were relatively evenly spaced, a phenomenon that typographers refer to as their rhythm but which Wilkins et al. referred to less formally as their “stripiness” (i.e., the extent to which an image of a word approximated a pattern of vertical stripes). They suggested that this could be measured by the height of the first peak of the autocorrelation between an image of a word and a second, horizontally displaced image of the same word. They explained this measure by asking readers to imagine two identical transparencies containing a single word placed on top of one another on an overhead projector.

When the transparencies are in register [i.e., exactly in line], a maximum amount of light will be transmitted through the combined transparencies. . . . If the top transparency is moved horizontally across the bottom transparency, the amount of light transmitted is initially reduced because the letter strokes in one version of the word block the spaces in the other version. As the displacement continues, however, and neighbouring letter strokes come into register, so the amount of light transmitted increases. As the top transparency is displaced still further, the amount of light transmitted once again decreases and then increases again. The light transmitted varies with horizontal position according to a function with peaks and troughs. This function is, in effect, the horizontal autocorrelation. (pp. 1788–1789)

As examples, Wilkins et al. gave the words “mum” and “over”. (In academic texts, these words would normally be rendered in an italic font. On this occasion, I have presented the words in a regular font with inverted commas to help readers to appreciate the differences in the words’ shape.) The former has fairly evenly spaced vertical strokes, high periodicity, and a relatively high first peak (high stripiness), but the latter has very few vertical line elements, low periodicity, and a relatively low first peak (low stripiness).
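The overhead-projector analogy can be made concrete with a short computation. The following Python sketch is illustrative only: it does not reproduce the procedure that Wilkins et al. used, and the function names, the mean-centred normalisation, and the two synthetic stroke patterns (standing in for a “mum”-like word and an “over”-like word) are all assumptions introduced here.

```python
import numpy as np

def horizontal_autocorrelation(image, max_lag):
    """Normalised horizontal autocorrelation of a 2-D ink image.

    image: array with 1 for ink and 0 for background.
    Returns values for lags 0..max_lag; the value at lag 0 is 1.
    Assumption: mean-centred, divided by the lag-0 sum of squares.
    """
    img = np.asarray(image, dtype=float)
    img = img - img.mean()
    denom = (img * img).sum()
    if denom == 0:
        raise ValueError("image is uniform; autocorrelation undefined")
    width = img.shape[1]
    acf = []
    for lag in range(max_lag + 1):
        # Overlap of the image with a copy displaced `lag` pixels sideways.
        overlap = (img[:, : width - lag] * img[:, lag:]).sum()
        acf.append(overlap / denom)
    return np.array(acf)

def first_peak(acf):
    """Return (lag, height) of the first local maximum after lag 0, or None."""
    for lag in range(1, len(acf) - 1):
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]:
            return lag, float(acf[lag])
    return None

def strokes(width, height, positions, stroke_width=2):
    """Synthetic 'word': vertical strokes starting at the given columns."""
    img = np.zeros((height, width), dtype=int)
    for x in positions:
        img[:, x : x + stroke_width] = 1
    return img

# Hypothetical stand-ins: evenly spaced strokes versus irregular spacing.
regular = strokes(60, 10, positions=range(4, 56, 8))
irregular = strokes(60, 10, positions=[4, 9, 23, 30, 47, 51, 55])

for name, img in [("regular", regular), ("irregular", irregular)]:
    lag, height = first_peak(horizontal_autocorrelation(img, max_lag=20))
    print(f"{name}: first peak at lag {lag}, height {height:.3f}")
```

Under these assumptions, the evenly spaced pattern yields a first peak at a lag equal to its stroke spacing, and this peak is higher than the first peak for the irregular pattern. This corresponds to the contrast between high and low stripiness.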

Wilkins et al. asked ten students to rate each of 40 common words printed in the serif typeface Times New Roman in terms of their stripiness on a scale from 0 (not at all stripy) to 10 (very stripy). They found that their mean rating for each word was highly correlated with the first peak in its horizontal autocorrelation (r = 0.688). In short, “words with a high first peak in the autocorrelation were rated as having a striped appearance” (p. 1791). Wilkins et al. then asked 32 university students and staff to read aloud 22 common monosyllabic words. The words were divided into those with high and low first peaks and were printed either in a single column or as a random paragraph of 18 lines in either Times New Roman or the sans serif typeface Arial. There was a large effect of autocorrelation, such that words with a high first peak were read more slowly than were words with a low first peak. Wilkins et al. confirmed this finding in two experiments using words with no ascenders or descenders that were printed in either Times New Roman or the sans serif typeface Geneva. (This indicated that the difference in reading speed was not due to the presence or absence of ascenders or descenders.) They also confirmed this finding in two experiments where participants silently scanned passages of randomly ordered words with the aim of finding pairs of target words.

In addition, Wilkins et al. compared the first peak in the horizontal autocorrelation of 1,000 words printed in different typefaces of similar x-heights. The value of the first peak in Times New Roman was very highly correlated with its value in the serif typeface Palatino (r = 0.95), but it was less highly correlated with its value in the sans serif typeface Arial (r = 0.68). The first peak tended to be highest in Times New Roman, somewhat less in the sans serif typeface Lucida Sans, and lowest in the serif typeface Palatino and the sans serif typeface Arial, although the differences were small in magnitude. Wilkins et al. used the same corpus of 1,000 words to compare the horizontal autocorrelation in the serif typeface Times New Roman, the sans serif typefaces Tahoma, Arial, and Verdana, and the slanting sans serif typeface Sassoon Primary (discussed in Sect. 7.4). They did not report the detailed findings, but Verdana had the lowest first peak.

In two of their experiments, Wilkins et al. directly compared the reading times for different typefaces. Random paragraphs were read significantly more quickly in the sans serif typeface Geneva than in the serif typeface Times New Roman. However, there was no significant difference between the reading speeds in Times New Roman and the sans serif typeface Arial, regardless of whether the words were presented in a single column or in random paragraphs. In short:

  • Different words printed in the same typeface vary in the first peak of their horizontal autocorrelation (or vertical stripiness).

  • The same words printed in different typefaces vary in the first peak of their horizontal autocorrelation.

  • Words of low vertical stripiness are read more quickly than are words of high vertical stripiness.

However, the results obtained by Wilkins et al. leave it uncertain whether these phenomena lead to variations in how quickly words in different typefaces are read. Subsequently, Wilkins and his colleagues carried out further research using words presented on computer monitors, and this is described in Sect. 11.2.

3 Confusions Among Letters in Serif and Sans Serif Typefaces

Many early studies found that errors in the tachistoscopic recognition of individual letters and words were the result of confusions among visually similar letters (for a review, see Vernon, 1931, pp. 114–120, 145–150, 158–159). In the light of such evidence, Legros (1922, p. 11) claimed that serifs made letters easier to discriminate and identify. However, Vernon (1929) noted that other studies had found that, in reading connected text, words tended to be perceived in a holistic manner rather than letter-by-letter, and hence confusions among individual letters should be much less important. She presented adults with different kinds of material using a tachistoscope. She found that the proportion of errors based upon similarity of appearance declined from 82% for groups of unrelated words to 14% for longer sentences, whereas the proportion of errors based upon similarity of meaning increased from 2% for groups of unrelated words to 57% for longer sentences.

Vernon argued that, “when the meaning of the material read was fully comprehended, typographical errors were few in tachistoscopic reading, and would be negligible in normal reading” (p. 35). Elsewhere, Vernon (1931, pp. 171–172) concluded that young children beginning to read might be liable to confuse visually similar letters but that this was of much less importance in normal adults’ reading. An implication of this is that, even if the presence or absence of serifs influences the discrimination or identification of individual letters, it should have little or no impact upon the reading of connected text by literate adults.

Tinker (1963, p. 36) argued that in practice serifs might serve either to enhance or impair the relative differentiation of individual lowercase letters. Harris (1973) presented individual lowercase letters tachistoscopically either to the left or to the right of the point of fixation in the sans serif typefaces Gill Medium and Univers Medium or in the serif typeface Baskerville. He found that different letters were more likely to be confused when presented in Baskerville. He suggested that serifs on letters with a single vertical stroke (such as i, j, and l) rendered them more distinctive and hence less likely to be confused with one another. On the other hand, he also suggested that serifs on letters with more than one vertical stroke (such as h, n, and u) rendered them less distinctive and hence more likely to be confused with one another. Beier and Dyson (2014) obtained analogous results using artificial typefaces with a version of the distance method. However, Vernon’s (1929) findings would imply that such confusions would be much less likely if letters were presented in the context of meaningful text, as in normal reading.

4 Measuring Visual Acuity

Some of the earliest charts for measuring visual acuity were developed by Snellen (1862) (see Fig. 1.4 in Sect. 1.3). These contained rows of uppercase letters and single digits based on a 5 × 5 grid; this yielded a slab serif style akin to a typeface that was then known as Egyptian Paragon, in which the width of the main strokes and the width of the serifs were one fifth the height of a letter. (Both were the size of the cells in the 5 × 5 grid.) Successive rows contained increasing numbers of symbols of decreasing size, and visual acuity was scored according to the smallest row that could be read accurately in each eye.

Over the next century, other researchers developed versions of these charts using different layouts and sequences of letters; some followed Snellen in using a slab serif style, whereas others adopted a sans serif style (see Bennett, 1965, for a review). Cowan (1928) asserted that “Gothic” (sans serif) letters were more easily distinguished than “block” (slab serif) letters (p. 290), but he provided no reference or any other source for this assertion. Hetherington (1954) tested the visual acuity of 100 boys aged 8–17 using an unspecified version of a Snellen chart; he noted that different letters of the same size varied in their legibility, but he concluded that the boys’ errors were generally the result of confusions among visually similar letters.

In fact, since the 1950s, charts with sans serif styles of lettering have been widely adopted for measuring visual acuity. Examples include those devised by Sloan (1959), the British Standards Institute (1968), Deederer (1968, 1970), and Bailey and Lovie (1976). The British Standards Institute (1968) commented that this development in measures of visual acuity was “in keeping with modern trends in typography” (p. ii), while Deederer (1968) remarked that it was appropriate for testing drivers, since they would frequently encounter sans serif lettering on traffic signs.

Richards (1965) compared the visual acuity of 103 volunteers who were tested using Sloan’s sans serif chart and a Snellen chart containing lettering with slab serifs; people who had slight uncorrected astigmatism found the letters with slab serifs more confusing under conditions of low luminance, but otherwise there was very little overall difference between the results obtained using the two styles. Richards (1978) subsequently replicated these findings with a sample of 175 volunteers stratified by age between 16–25 years and 66–75 years (that is, the sample contained 25 volunteers from each decade of the adult life span). Bailey and Lovie’s (1976) instrument is known as the LogMAR chart (the acronym stands for Logarithm of the Minimum Angle of Resolution), and nowadays it is generally regarded as the most accurate measure of visual acuity.
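The relationship between a conventional Snellen fraction and a LogMAR score can be shown with a small worked example. The Python sketch below rests on the standard definition that the minimum angle of resolution (in minutes of arc) is the reciprocal of the Snellen fraction; the function name is hypothetical and is introduced here purely for illustration.

```python
import math

def logmar_from_snellen(test_distance, letter_distance):
    """Convert a Snellen fraction (test_distance/letter_distance) to logMAR.

    The minimum angle of resolution (MAR), in minutes of arc, is the
    reciprocal of the Snellen fraction; logMAR is its base-10 logarithm.
    """
    if test_distance <= 0 or letter_distance <= 0:
        raise ValueError("distances must be positive")
    mar = letter_distance / test_distance
    return math.log10(mar)

# Both distances must use the same unit (metres or feet).
for numerator, denominator in [(6, 6), (6, 12), (6, 60), (20, 40)]:
    score = logmar_from_snellen(numerator, denominator)
    print(f"{numerator}/{denominator}: logMAR {score:.2f}")
```

On this scale, an acuity of 6/6 corresponds to a LogMAR score of 0 and an acuity of 6/60 corresponds to a score of 1.0, so that lower scores indicate better acuity.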

5 Conclusions

The earliest research on the legibility of different typefaces was concerned with recognising individual letters and words under different conditions. The vertical “stripiness” of individual words can be defined in terms of their horizontal autocorrelation. This seems to affect how quickly they can be read, but it is unclear whether this leads to differences among typefaces. There is a separate line of research concerned with evaluating visual acuity, going back to the construction of optical charts in the middle of the nineteenth century. In both fields of research, the most common finding—the modal finding—is that there are no differences in the legibility of letters and words printed in serif and sans serif typefaces. Confusions among visually similar letters were originally considered to be a primary determinant of legibility, but these appear to be less important when skilled readers are presented with meaningful text.