2.1 Concepts

Since the introduction of movable type in Western countries during the fifteenth century, many thousands of different typefaces have been designed for use in printed material. A typeface can be expressed in several different fonts (bold, italic, etc.) by varying the weight, width, and style of individual characters. Since the seventeenth century, font (and its variant fount) has also been used as a synonym for typeface, and this usage has become more common since the introduction of digital typography (Oxford University Press, n.d.). Nevertheless, for consistency, the words typeface and font will be used in their original senses in this book; thus, a typeface consists of a family of related fonts. In Sect. 1.2, for example, Fig. 1.1 showed eight different typefaces, and both the name of each typeface and the example sentence (the pangram) were shown in the typeface’s regular font.

Typefaces are designed to be read, and thus an obvious research question is whether different typefaces vary in how legible they are for readers. Readable can be used as a synonym for legible, although there are technical definitions of both legibility and readability that go beyond their everyday use. Some researchers have used “legibility” to refer to the recognition and identification of individual letters or words and “readability” to refer to the reading and understanding of connected prose. Others have devised “readability formulas” to measure the level of mental difficulty involved in reading specific material. Yet others have used “readability” to refer to the extent to which a typeface is subjectively appealing or comfortable to the reader. Moreover, as Chomsky (1970) pointed out, in many current varieties of English, readable is much more sharply restricted in meaning than “able to be read”: it is instead often used to mean how easy, enjoyable, or engaging a work is to read, as in “This is a most readable novel”. (Chomsky explained that this phenomenon was problematic for theories of transformational grammar.) Consequently, legible and legibility will be used throughout this book.

The legibility of typefaces is pertinent to a wide variety of everyday settings, but it is particularly relevant to the field of education. First, much of the information that students acquire is delivered in books, articles, or other printed documents, presented either on paper or on computer screens, or in displays projected using PowerPoint or other software. Second, students often submit their work to be evaluated by their teachers or other assessors in the form of word-processed documents. This raises the issue of the legibility of those documents for teachers and assessors, and it led to the question that gave rise to this project.

2.2 Objective Methods for Measuring the Legibility of Typefaces

Attempts to measure the legibility of printed material go back at least to the 1880s. Tinker (1963, pp. 5–7, 9–31) provided a useful summary of the relevant methods of investigation (see also Pyke, 1926, pp. 25–34; Reynolds, 1979; Zachrisson, 1965, pp. 44–69). The following list of methods is a paraphrase based mainly upon Tinker’s account, but it covers most of the techniques that have been used to measure the legibility of printed material. Most of them can be applied equally to measure the legibility of material presented on computer monitors or other screens.

  • Short-exposure method. Printed symbols are briefly presented (e.g., by means of a tachistoscope, which carefully controls the duration of a presentation using shutters and mirrors) to measure the speed or accuracy with which they can be perceived and reported.

  • Distance method. Printed symbols are presented in clear view but at a distance from the observer. The material is then moved towards the observer in gradual steps to measure the furthest distance at which they can be perceived and reported correctly. A variant is where the observer gradually approaches the stimulus. Similar techniques to compare the legibility of different typefaces have been employed since the eighteenth century (Kinross, 1992, pp. 23–24).

  • Perceptibility in peripheral vision. Printed symbols are presented to one side or the other of a central fixation point to measure the furthest horizontal distance at which they can be perceived and reported correctly. Similar effects can be obtained using the “focal variator” (Weiss, 1917), which uses a system of lenses to project a visual stimulus onto a ground glass screen to varying degrees out of focus.

  • Visibility threshold. Printed symbols are viewed through two photographic filters with precise circular gradients of density which are rotated until the material can be perceived and reported correctly. The filters reduce the apparent brightness of the material and also lower the contrast between the material and its background (Luckiesh & Moss, 1935, 1942, pp. 71–79).

  • Reflex blink method. The observer reads text, and the experimenter counts the number of involuntary eye blinks made during a standard observation interval. This assumes that the blink rate is reduced and the reader’s progress faster with more legible text.

  • Rate of work. This covers a variety of tasks including speed of reading, amount read in a specified time limit, the time taken to look up specific information in printed sources such as telephone numbers or functions in mathematical tables, and output in tasks involving visual discrimination.

  • Eye movements. The observer is asked to read continuous text, and the experimenter measures the number and duration of fixations and the number of jumps or saccades between successive fixations. This assumes that more legible text results in fewer and shorter fixations and fewer saccades.

  • Fatigue in reading. This approach is not concerned with visual fatigue in reading per se, which has proved consistently difficult to measure. Rather, it defines legibility in terms of the ease, accuracy, or efficiency of perceiving printed symbols while reading for understanding.
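As an illustration only, the stepwise logic of the distance method can be sketched in Python. The function name `distance_threshold`, the 0.5 m steps, and the simulated observer are hypothetical and are not taken from any of the studies cited; the sketch also assumes a perfectly consistent observer, whereas a real study would record an observer's actual reports, which vary from trial to trial.

```python
from typing import Callable, Optional, Sequence


def distance_threshold(
    distances_m: Sequence[float],
    reads_correctly: Callable[[float], bool],
) -> Optional[float]:
    """Move the material towards the observer, starting from the furthest
    distance, and return the first (i.e. furthest) distance at which the
    observer reports it correctly; return None if it is never reported
    correctly."""
    for distance in sorted(distances_m, reverse=True):
        if reads_correctly(distance):
            return distance
    return None


# Hypothetical simulated observer who can read the material at 3.2 m or less.
def simulated_observer(distance_m: float) -> bool:
    return distance_m <= 3.2


steps = [1.0 + 0.5 * i for i in range(9)]  # 1.0 m to 5.0 m in 0.5 m steps
print(distance_threshold(steps, simulated_observer))  # 3.0
```

The variant in which the observer approaches the stimulus follows the same logic, with the observer rather than the material changing position.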

Of course, new technologies for measuring legibility have been introduced over the years. For instance, to employ the short-exposure method, Cattell (1885) used a “gravity chronometer” in which printed material was obscured by a vertically sliding panel held in place by an electromagnet. On its release, the panel fell, and the material was visible for a brief period of time through a small window in the panel. Cattell found that both uppercase letters and lowercase letters varied considerably in their legibility. More sophisticated tachistoscopes became available in the early twentieth century. Since the 1970s, technology has included cathode-ray tubes and liquid crystal displays, and these will be discussed in Part II. Likewise, studies of eye movements in reading have become more popular with the use of computer-based eye-tracking devices, and these will also be discussed in Part II.

Not only have researchers adopted different methods for measuring the legibility of printed material, but they have presented their participants with different kinds of material: individual letters or other characters; letter strings that do not constitute words; individual words; sequences of unrelated words; strings of words that constitute grammatical sentences; or coherent grammatical prose. The materials towards the beginning of this list afford more opportunity to control the participants’ behaviour, whereas the materials towards the end of the list are more akin to those encountered in everyday reading situations. As in other kinds of educational and psychological research, there is a trade-off between experimental rigour and “ecological validity” (i.e., whether the findings can be generalised to real-life settings).

In particular, Kinross (1992, p. 32) noted that most of the research carried out before the end of the nineteenth century had tested the recognisability of individual letters rather than the legibility of words or passages of text. As he pointed out, it took a change in the theoretical climate around 1900 for legibility to be interpreted as the comprehension of meaning: “not recognition, but reading.” Tinker (1963) went so far as to propose the following conclusion: “Research dealing with individual letters or letters grouped in nonsense arrangement offers little that is important concerning the legibility of type faces. Satisfactory results are obtained by measuring speed of reading continuous, meaningful material” (pp. 65–66).

It is important to distinguish between the legibility of different typefaces and readers’ familiarity with these typefaces. This is illustrated by a study of binocular rivalry by Zachrisson (1965, pp. 128–131). Binocular rivalry occurs when one stimulus is presented to one eye and a different stimulus is presented to the other eye: instead of seeing the stimuli superimposed on each other, most observers report seeing images of the stimuli alternating with each other. Zachrisson presented 28 students with individual words. In each case, the word was presented in the serif typeface Imprint to one eye and in the sans serif typeface semi-bold Grotesk to the other eye. The participants pressed one of two buttons to report which typeface they were seeing over a period of 3 min. The results showed a strong dominance of the serif typeface over the sans serif typeface, regardless of which eye received which typeface. Zachrisson repeated his study with 9-year-old children and found that the dominance of the serif typeface was much weaker. He took this to reflect the children’s lesser familiarity with these letter forms. The implication is that the stronger dominance of the serif typeface in adults was mainly due to their more extensive experience of reading documents (such as newspapers, magazines, reports, and books) set in serif typefaces rather than to any inherent superiority in the legibility of those typefaces.

2.3 Subjective Methods for Measuring the Legibility of Typefaces

Researchers have also collected subjective reports from their participants concerning the legibility of different typefaces. Examples of different typefaces can be presented either individually or in groups of two or more for comparison. The self-reports can be collected either informally (for example, through interviews) or more formally (for example, through the use of rankings or rating scales). Pyke (1926, pp. 58–59) asked 60 participants to rank order eight typefaces (including one sans serif typeface, Lining Grotesque) in terms of their “relative merits”. He found that the participants gave various reasons for their choices and that many found it difficult to differentiate between the typefaces on this basis. Moreover, the rank order derived from these judgements showed a correlation of just +0.46 with the rank order of the typefaces’ performance in a speed-of-reading test, and Pyke concluded that the relationship between the two measures was unclear.

Tinker and Paterson (1942) asked a group of participants to arrange samples of ten different typefaces “in order from most legible to least legible” (p. 38). Tinker (1944) then obtained results on the legibility of the ten typefaces using the visibility threshold method, which he referred to as their “visibility”. He already had data on the legibility of the same typefaces using the distance method (which he referred to as their “perceptibility”), as well as data on their legibility as measured by speed of reading. He found a high positive correlation between the ranks of their visibility and their perceptibility, which suggested that the two measurements had much in common. Nevertheless, their ranked visibility and perceptibility both showed a modest negative relationship with their ranked speed of reading. In addition, their judged legibility showed a high positive correlation with their ranked visibility and perceptibility. However, consistent with the results obtained by Pyke (1926), their judged legibility showed a correlation of only +0.33 with their ranked speed of reading.
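The coefficients reported by Pyke (1926) and Tinker (1944) compare two rank orderings of the same typefaces. The sketch below shows one conventional statistic for this purpose, Spearman's rank correlation for rankings without ties, rho = 1 - 6Σd²/(n(n² - 1)), where d is the difference between the two ranks given to each typeface. This is an assumption about the kind of statistic involved, not a reconstruction of either author's calculation, and the rankings in the usage example are invented for illustration.

```python
from typing import Sequence


def spearman_rho(ranks_a: Sequence[int], ranks_b: Sequence[int]) -> float:
    """Spearman rank correlation for two rankings of the same n items,
    each a permutation of 1..n (i.e. no tied ranks)."""
    n = len(ranks_a)
    if n != len(ranks_b):
        raise ValueError("rankings must have the same length")
    if n < 2:
        raise ValueError("need at least two items")
    expected = list(range(1, n + 1))
    if sorted(ranks_a) != expected or sorted(ranks_b) != expected:
        raise ValueError("each ranking must be a permutation of 1..n (no ties)")
    # Sum of squared differences between the two ranks of each item.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented rankings of ten typefaces: judged legibility vs. speed of reading.
judged = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
speed = [3, 1, 2, 5, 4, 7, 6, 10, 8, 9]
print(f"{spearman_rho(judged, speed):.2f}")  # 0.90
```

A value of +1 means the two orderings agree exactly, and -1 means that one is the reverse of the other; values such as +0.46 or +0.33 indicate only partial agreement.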

These findings could be interpreted in a number of different ways. Tinker himself argued that both visibility and perceptibility at a distance represented abnormal and artificial reading situations, even though they were highly correlated with judged legibility. Instead, he argued that speed of reading was the best available measure of legibility, since it provided “measurement in a normal, ordinary reading situation” (Tinker, 1944, pp. 393–394). In addition, he claimed that subjective judgments of legibility “can only be considered as an expression of preference which may be employed to advantage in a practical way for the guidance of printers when there is a choice to be made between equally readable typographical arrangements” (pp. 394–395). More generally, Tinker’s results imply that the techniques listed in Sect. 2.2 do not simply constitute alternative measures of a single construct of legibility. Any research findings with regard to the legibility of serif and sans serif typefaces therefore need to be qualified by a clear explanation of the measure or measures of legibility on which they are based, and this practice will be adopted in this book.

Participants’ preferences might not be a reliable indicator of the objective legibility of different typefaces, but they may well have practical consequences. Song and Schwarz (2008b) carried out three studies in which the participants read instructions for performing a particular task, printed either in a plain sans serif typeface (Arial in all three studies) or in an elaborate cursive typeface (Brush455 BT or Mistral in different studies). In all three studies, the sans serif typeface was rated as easier to read than the cursive typeface, but there was no difference in the participants’ memory for particular details in the instructions. The participants who read the instructions in the cursive typeface reported that the task would take more time, would feel less fluent and natural, and would require more skill, and that they were less willing to engage in the task than were the participants who read the instructions in the sans serif typeface. Song and Schwarz concluded that the participants had mistaken the ease of processing the instructions as indicating the ease with which the relevant tasks could themselves be executed. Song and Schwarz (2008a) showed that the same manipulation affected how participants answered distorted and undistorted questions based on their general knowledge.

2.4 The Size of Typefaces

It might seem plausible that the legibility of different typefaces depends on their size. However, Legge and Bigelow (2011) showed that legibility was essentially constant across the range of type sizes that readers might encounter in books, magazines, and newspapers. Nevertheless, comparing the physical size of different typefaces is not a straightforward matter.

Traditionally, the overall height of typefaces (technically known as their body size) has been expressed in terms of points, where one point is approximately equal to 0.35 mm. However, the size of typefaces is also expressed in terms of the dimensions of lowercase letters. Some lowercase letters have features that extend above their main parts (e.g., b and d); these are called ascenders. Others have features that extend below their main parts (e.g., p and q); these are called descenders. The x-height of a typeface is the height of lowercase letters that do not have either ascenders or descenders (such as the letter x itself). Finally, the cap-height of a typeface is the height of capital (or uppercase) letters, which may or may not be the same as the height of ascenders. Key concepts in the measurement of typefaces are summarised in Fig. 2.1.
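Because "approximately 0.35 mm" covers more than one definition of the point, a small conversion sketch may be useful. It assumes two common definitions: the desktop-publishing (PostScript) point of exactly 1/72 inch, about 0.3528 mm, and the traditional Anglo-American printer's point of 0.013837 inch, about 0.3515 mm. Which definition a particular study used would need to be checked against that study; the constant and function names are illustrative only.

```python
MM_PER_INCH = 25.4
# Assumption: desktop-publishing (PostScript) point, exactly 1/72 inch.
DTP_POINT_MM = MM_PER_INCH / 72  # about 0.3528 mm
# Assumption: traditional Anglo-American printer's point, 0.013837 inch.
AMERICAN_POINT_MM = 0.013837 * MM_PER_INCH  # about 0.3515 mm


def points_to_mm(points: float, point_mm: float = DTP_POINT_MM) -> float:
    """Convert a size in points to millimetres."""
    return points * point_mm


def mm_to_points(mm: float, point_mm: float = DTP_POINT_MM) -> float:
    """Convert a length in millimetres to points."""
    return mm / point_mm


print(f"{points_to_mm(12):.2f} mm")                     # 4.23 mm
print(f"{points_to_mm(12, AMERICAN_POINT_MM):.2f} mm")  # 4.22 mm
```

At ordinary text sizes the two definitions differ by less than 0.02 mm, so the choice matters mainly when very small differences in size are being compared.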

Fig. 2.1

Key concepts in the measurement of typefaces. From Effects of printing types and formats on the comprehension of scientific journals (Applied Psychology Unit Report No. 346), by E. C. Poulton, 1959. UK Medical Research Council, Applied Psychology Unit. Used by kind permission of the Medical Research Council, as part of UK Research and Innovation

The body size of a typeface is thus made up of its x-height and the combined heights of the ascenders and descenders, plus small margins above the tops of ascenders and below the bottoms of descenders. (Any further vertical space added between successive lines of text is known as leading, pronounced “ledding”. This term originated in the practice of using thin strips of lead to separate lines of text in order to increase the vertical space between them. Such terminology was developed in the age of movable type, but it has been carried over into electronic printing, where leading is also known as interline spacing.) When comparing different styles of typeface, researchers have often matched them on the basis of their body size. However, Poulton (1972) noted that this does not equate the sizes of the individual letters, as measured by their x-height. In general, it is not possible to match pairs of typefaces simultaneously on the basis of both their body size and their x-height.
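Why matching body size does not equate x-height can be made concrete by expressing each typeface's x-height as a fraction of its body size. In the sketch below the two ratios are invented for illustration (real values would have to be measured from the typefaces in question), the point is assumed to be 1/72 inch, and the function names are hypothetical.

```python
POINT_MM = 25.4 / 72  # assumption: desktop-publishing point


def x_height_mm(body_pt: float, x_ratio: float) -> float:
    """x-height in mm for a body size in points and an x-height/body-size ratio."""
    return body_pt * x_ratio * POINT_MM


def body_to_match_x_height(body_pt_a: float, ratio_a: float, ratio_b: float) -> float:
    """Body size (pt) at which face B has the same x-height as face A set at body_pt_a."""
    return body_pt_a * ratio_a / ratio_b


# Hypothetical x-height ratios, not taken from any real typeface.
RATIO_A, RATIO_B = 0.45, 0.52

# Matched on body size (10 pt), the x-heights differ.
print(f"{x_height_mm(10, RATIO_A):.2f} mm vs {x_height_mm(10, RATIO_B):.2f} mm")  # 1.59 mm vs 1.83 mm
# Matched on x-height, the body sizes differ.
print(f"{body_to_match_x_height(10, RATIO_A, RATIO_B):.2f} pt")  # 8.65 pt
```

Only when the two ratios happen to be equal can both measures be matched at once.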

Poulton (1972) simulated the situation of a shopper looking for a particular item in the list of ingredients on a package of food. Because food containers are often quite small, the typefaces used for lists of ingredients are themselves usually quite small. Poulton tried to determine the minimum legible size of lowercase letters printed in one of two serif typefaces (Times New Roman and Perpetua) or in one sans serif typeface (Univers). He asked a total of 264 adult volunteers to find a designated target word within each of 15 lists of food ingredients, for which they were allowed 25 s. The number of target words found within this time limit was used as a measure of legibility.

Poulton found that performance declined markedly when the x-height of a typeface was less than 1.2 mm. He also found that Times New Roman and Perpetua yielded similar results, even though the latter’s body size was more than 30% greater than that of the former. He inferred that body size was not an important determinant of legibility. Performance was significantly poorer with Univers than with either Times New Roman or Perpetua, but not when the x-heights of Times New Roman and Perpetua were reduced photographically to match that of Univers. Poulton concluded that Univers was less legible than the other typefaces because of its smaller x-height and not because of the absence of serifs. In fact, typographers have long believed that the visual impact of lowercase letters is determined by their x-height rather than by their body size (Craig, 1971, p. 24; Williamson, 1966, p. 37).
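Poulton's 1.2 mm criterion can be turned round to ask how large a body size a given typeface needs before its x-height reaches that value. The sketch below assumes the 1/72-inch point and uses invented x-height ratios; the function name is hypothetical, and the code illustrates the arithmetic only, without reproducing Poulton's measurements.

```python
POINT_MM = 25.4 / 72   # assumption: desktop-publishing point
MIN_X_HEIGHT_MM = 1.2  # x-height below which Poulton (1972) found performance declined


def min_body_size_pt(x_ratio: float, min_x_mm: float = MIN_X_HEIGHT_MM) -> float:
    """Smallest body size (pt) whose x-height reaches min_x_mm,
    given the typeface's x-height/body-size ratio."""
    if not 0 < x_ratio < 1:
        raise ValueError("x_ratio must lie between 0 and 1")
    return min_x_mm / (x_ratio * POINT_MM)


# Hypothetical ratios for a small, a medium, and a large x-height.
for ratio in (0.40, 0.45, 0.52):
    print(f"ratio {ratio:.2f}: at least {min_body_size_pt(ratio):.1f} pt")
# ratio 0.40: at least 8.5 pt
# ratio 0.45: at least 7.6 pt
# ratio 0.52: at least 6.5 pt
```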

2.5 Conclusions

This chapter clarified the distinction between typefaces and fonts and that between legibility and readability. It described the variety of objective methods that have been used to measure the legibility of printed material and the different ways of collecting subjective reports from participants regarding the legibility and other properties of presented material. Most of these techniques have been carried over into research on reading from screens. Finally, this chapter described how typographers define the size of typefaces and discussed which aspects of the size of typefaces are likely to affect the legibility of material.