1.1 The Origins of this Book

Many interesting research projects begin with an apparently simple question. The question with which this project began arose in the context of managing an online course.

In 2008, a colleague and I at the UK Open University were charged with designing and implementing a postgraduate distance-learning course with the title Accessible Online Learning: Supporting Disabled Students. The course was to be taken entirely online and would contribute to the University’s master’s programme in Online and Distance Education. All the course material was available online, either from a dedicated website or through the University library’s online resources. In particular, the course textbook (Seale, 2006) was available free for students as an e-book. The course would run annually from September to January and would be equivalent to one quarter of a year’s full-time study.

We recruited four experienced associate lecturers to serve as tutors for the course. They were each assigned 10–20 students, and their duties were to moderate online discussion forums among the group of students and to assess the assignments submitted by each student. The assignments were to be submitted in Microsoft Word format through a dedicated online system, and the students were given advice and instructions with regard to essay structure and referencing. The tutors provided an overall evaluation and mark (out of 100%) on each of the assignments using a separate form, and they provided more specific feedback in the margins of the assignment using Word’s “comment” facility.

During the first presentation of the course in 2008–2009, one of the tutors suggested that students should be required to submit their assignments using a sans serif typeface, on the basis that “everybody knew” that sans serif typefaces were easier to read on screen and that this would make the task of evaluating the students’ assignments more efficient. This seemed to overlook a number of points regarding the process of learning and assessment:

  • Many students were using the default typeface in Microsoft Word to type their assignments. At the time, this was the serif typeface Times New Roman.

  • The idea that the appearance of written work could be changed to suit the potential readers’ abilities, skills, and preferences was introduced during the course.

  • A tutor who downloaded an assignment to provide feedback could change the appearance of the assignment to suit their own abilities, skills, and preferences.

  • Tutors might choose to evaluate assignments on screen, or they could instead choose to print off the assignments to read and evaluate them in hard copy.

The last point raised the question of the legibility of serif and sans serif typefaces when they were used to produce material intended to be read on paper or other hard surfaces. Here, a cursory review of the literature suggested that “everybody knew” that serif typefaces were easier to read on paper than were sans serif typefaces. Of course, the fact that everybody knows something is no guarantee that it is true. Until the time of the Pythagorean School in the sixth century BC, “everybody knew” that the earth was flat; and until the time of Copernicus in the sixteenth century AD, “everybody knew” that the sun revolved around the earth. Nowadays, we expect such matters to be determined by empirical evidence, not by majority opinion. This book is concerned with the empirical evidence concerning the relative legibility of serif typefaces and sans serif typefaces: Part I is concerned with their legibility in “hard copy” (i.e., when presented on paper or other hard surfaces), and Part II is concerned with their legibility when presented on computer monitors or other screens.

1.2 Serif Typefaces

There are many dimensions on which typefaces can vary, but this book is concerned with the legibility of typefaces with and without serifs. A serif is “a short, light line projecting from the main stroke of a letter” (Chicago Manual of Style, 2003, p. 837) that often takes the form of a small finishing stroke at the top or bottom of a letter. (In English-speaking countries, typographers have variously spelled the word ceref, ceriph, seriff, seriph, seryph, surriph, surripse, surryph, or syrif: Mosley, 1999, pp. 53, 55.) One feature of typefaces with serifs is that the main strokes constituting each letter are often of varying thickness. The left-hand panel of Fig. 1.1 shows some examples of common serif typefaces (Baskerville, Garamond, Palatino, and Times New Roman) as they are currently rendered in Microsoft Word. (The example sentence is a pangram—a single sentence that uses all 26 letters in the English alphabet—which is often used as a typing exercise.)

Fig. 1.1

Examples of common serif typefaces (Baskerville, Garamond, Palatino, and Times New Roman) and common sans serif typefaces (Arial, Comic Sans, Tahoma, and Verdana)

Serifs have been noted in Greek inscriptions from the fourth century BC, but they became widely adopted during the Roman Empire (from 27 BC to AD 476) (Mosley, 1999, p. 18), when they are generally thought to have resulted from the practices of Roman masons (Bringhurst, 2019, pp. 119–120). It is likely that the latter used a flat or square-edged brush to draft symbols on pieces of stone before carving them with chisels, and the serifs were left at the ends of the brushstrokes (Catich, 1991). By the beginning of the Common Era, Roman inscriptions used a standard alphabet of capital letters in which serifs were a characteristic feature, and many examples have survived in their original locations or else in museums to the present day. A commonly cited example is the inscription on the base of the column that was completed in AD 113 to commemorate the emperor Trajan’s victory in the Dacian Wars, shown in Fig. 1.2. The column itself survives largely intact in the otherwise ruined remains of Trajan’s forum near the Piazza Venezia in the centre of modern Rome.

Fig. 1.2

The surviving inscription on the base of Trajan’s Column. Licensed under the Creative Commons Attribution–Share Alike 3.0 Unported (CC BY-SA 3.0), https://commons.wikimedia.org/wiki/File:Base_columna_trajana.jpg

Smaller versions of these letters were used when writing books. However, towards the end of the eighth century AD, a simplified version of this alphabet, nowadays known as the “Carolingian minuscule,” was introduced across the Holy Roman Empire as a standard handwritten script for rendering the Vulgate Bible in Latin. This incorporated strokes that extended above or below the main body of each letter but retained the use of serifs. In the early fifteenth century, Italian calligraphers combined the Roman capitals with the Carolingian minuscule; these were used as the basis for the earliest Western typefaces and evolved into the combination of uppercase and lowercase alphabets that is used in Western countries today (see Bigelow, 1981, for a more detailed account, of which this is mainly a summary). Due to the origins of their capital letters in inscriptions from the Roman Empire, serif typefaces are sometimes referred to as Roman or roman. For instance, Times New Roman was developed by Stanley Morison, working for the Monotype foundry, for use in the London newspaper, The Times, in 1932. A rival printing company, Linotype, developed a similar typeface known as “Times Roman” (or even just as “Times”).

Several theories have been put forward regarding why serifs should have survived in modern typography:

  • Early researchers sometimes claimed that serifs provided additional visual cues to enable readers to direct their gaze at successive words in a line of text. This idea can be found even in some modern accounts. However, the work of Hering (1879) and Lamare (1892) showed that the eye movements of experienced readers consist not in a continuous horizontal gaze but in a series of discrete fixations separated by jumps or “saccades”. (This finding is often erroneously attributed to their colleague, Louis Émile Javal: see Wade & Tatler, 2008, for discussion of this issue.)

  • Other early researchers argued that serifs helped to overcome the harmful effects of “irradiation” (see Pyke, 1926, pp. 21, 99–101, for examples). The latter is a well-established optical illusion whereby a light figure that is presented against a dark visual field appears to be larger than an otherwise identical dark figure that is presented against a light visual field. Taylor (1934) claimed that irradiation explained why letters printed in serif typefaces were harder to read when shown in white print against a black background than when shown in black print against a white background. Nevertheless, the relevance of this phenomenon to the legibility of typefaces is otherwise unclear.

  • Robinson et al. (1971) hypothesised that serifs facilitated the operation of line detectors in the human visual system. They found evidence in support of this idea using a computer model of visual processing. Nevertheless, computational models which assume that specific and unique brain cells are dedicated to the detection of lines or other features have since been criticised in favour of connectionist models which assume that groups of brain cells function as a distributed network (e.g., Schiffman, 2000, pp. 83–85, 163–166).

1.3 Sans Serif Typefaces

Sans serif typefaces are presented without serifs. (In English-speaking countries, typographers have variously used the expressions sans-ceriph, sans-serif, sans-surryph, sanserif, and sansserif: Mosley, 1999, p. 53.) In contrast to serif typefaces, the strokes constituting each letter are often of constant thickness. The right-hand panel of Fig. 1.1 shows examples of common sans serif typefaces (Arial, Comic Sans, Tahoma, and Verdana) as they are currently rendered in Microsoft Word.

Early Greek and Etruscan inscriptions routinely employed a sans serif style (Mosley, 1999, p. 17), and this style was widely adopted during the Roman Republic (from 509 to 27 BC) (Bringhurst, 2019, p. 261). In contrast to serif inscriptions, relatively few examples survive today (Lightfoot, 2009), partly because of the poorer quality of the material in which they were inscribed (wood or local stone, rather than marble) and partly because from time to time the Republican authorities discouraged the construction of inscribed monuments. Nevertheless, the National Archeological Museum in Aquileia in north-eastern Italy contains many Republican inscriptions using sans serif capital letters (Clough, 2015), and about 150 of these are displayed in the online Lupa database (http://lupa.at). Figure 1.3 shows an inscription dating from the middle of the second century BC that was discovered in the area of the Roman forum in Aquileia in 1995. It came from a monument (now lost) to Titus Annius Luscus, who was elected as praetor in 156 and consul in 153 BC. Clough (2020) provided a more detailed discussion of this inscription. (I am grateful to James Clough for permission to reproduce his photograph of the inscription here.)

Fig. 1.3

An inscription in honour of Titus Annius Luscus. Reproduced by kind permission of James Clough from https://articles.c-a-s-t.com/letter-hunting-in-italy-2-e7b51cd821a6

Sans serif inscriptions had a brief revival in the fifteenth century, when they were used to decorate buildings and monuments in a number of Italian cities (Clough, 2020; Gray, 1960). The origins of the style are a matter of debate (Stiff, 2005). Nevertheless, these developments had little or no effect on the evolution of early typefaces. Instead, early typographers used serif typefaces modelled on surviving inscriptions from Imperial Rome, and these had become widely adopted by the end of the eighteenth century.

During the latter part of the eighteenth century and the early nineteenth century, sans serif inscriptions became popular on monuments, public buildings, and garden features in the United Kingdom, where they were seen as being more natural or primitive than serif inscriptions. Between 1800 and 1820, hand-etched sans serif letters were used in publicity materials printed for commercial signwriters or engravers and on the title pages of British catalogues of antiquities, while lowercase sans serif letters were engraved from around 1810. At this time, the style was described as “Egyptian”, and it was this term that was used to describe the first uppercase sans serif typeface produced in 1816. However, the first sans serif typeface produced in both uppercase and lowercase in 1832 was described as “sans-serif”, and variations of this term were used thereafter. Such typefaces were occasionally referred to as “antique” or “grotesque”, and the equivalent terms were adopted by typographers in France and Germany, respectively, later in the nineteenth century (see Mosley, 1999, for a more detailed account, of which this paragraph is mainly a summary; see also Mosley, 2007).

Possibly in reaction to the popularity of sans serif styles, some printers devised a variant of the serif style in which the serifs were of a similar thickness to the letters’ main strokes. These were used in wood blocks from 1810 and in metal type from 1817. Somewhat confusingly, these typefaces were referred to as “antique” in the United Kingdom and the United States. However, they were also occasionally referred to as “Egyptian”, and the equivalent terms were adopted by typographers in France and Germany. In English-speaking countries, they are nowadays referred to as “slab serif” typefaces (Mosley, 1999, pp. 42, 56; 2007). Figure 1.4 shows a typical example of a Snellen chart, used to assess visual acuity (to be discussed in Sect. 4.3). This uses slab serif symbols, each constructed within a 5 × 5 grid.

Fig. 1.4

A typical Snellen-type chart, used to evaluate visual acuity. Licensed under the Creative Commons Attribution–Share Alike 3.0 Unported (CC BY-SA 3.0), https://commons.wikimedia.org/w/index.php?curid=4262200

From the 1830s, sans serif typefaces were widely used in commercial printing (Mosley, 1999, p. 43). Initially, they were mainly used for display purposes (Kinross, 1992, pp. 28–29; McLean, 1980, p. 64). Nowadays, as Perea (2013) noted, sans serif typefaces are widely used in many countries for public direction signs, although it caused some controversy when they were introduced for a new motorway (freeway) system in the United Kingdom in 1959 (see Lund, 1999, pp. 126–147). Some foundries in the United Kingdom adopted the term “gothic” for sans serif typefaces around the middle of the nineteenth century, and this term became generally used in the United States (Mosley, 1999, pp. 55–56). There was much global interchange in the evolution of typography, and the distinction between serif and sans serif typefaces appears to be universal in countries that have adopted a Western alphabet (Ovink, 1938, pp. 188–228). More recently, this has been compounded by the hegemony of word-processing software originating in the United States.

The distinction between serif and sans serif typefaces applies to most typefaces that are designed for reading over long stretches of text. It is less applicable to other kinds of typeface, such as display typefaces that are designed to attract attention (for instance, in advertisements or logos) and cursive typefaces that are intended to mimic handwriting.

1.4 Review Methodology

This book reports the findings of a systematic review of research comparing the legibility of serif typefaces and sans serif typefaces. As Uman (2011, p. 57) explained, “narrative reviews” (in other words, conventional literature reviews)

can often involve an element of selection bias. They can also be confusing at times, particularly if similar studies have diverging results and conclusions. Systematic reviews, as the name implies, typically involve a detailed and comprehensive plan and search strategy derived a priori, with the goal of reducing bias by identifying, appraising, and synthesizing all relevant studies on a particular topic.

Readers may be aware of the systematic reviews from the Cochrane Collaboration, an international organisation created in 1993 to focus on health-related issues. A complementary organisation, the Campbell Collaboration, was established in 2000 to focus on social-related issues (see Noonan & Bjørndal, 2010).

The first stage in any systematic review is to identify a search strategy based upon key terms. This can be simple or complex and in clinical research can involve the specification of inclusion or exclusion criteria. In this case, the aim was to identify all previous studies which had endeavoured to compare the legibility of serif and sans serif typefaces. It was therefore decided to use the single key term serif. “Legibility” is intrinsically a psychological concept with educational applications, but it was recognised that the legibility of serif and sans serif typefaces might vary across particular clinical populations. Accordingly, the following online databases were deemed relevant:

  • APA PsycInfo (https://www.apa.org/pubs/databases/psycinfo; formerly PsycINFO) contains approximately 5 million records, mainly relating to peer-reviewed publications. It subsumes the journal Psychological Abstracts, which went back to 1894, but it also contains some earlier publications.

  • ERIC (https://eric.ed.gov/) is the bibliographic database of the US Education Resources Information Center. It contains more than 1.6 million records of education-related materials. The collection was initiated in 1966, although it contains some earlier material. In the past, authors could use it as a repository for their own material, and so a proportion of the records relates to “grey” literature that has not been peer-reviewed. However, with effect from January 2016, ERIC introduced a selection policy that limited new records to material that had undergone some kind of review process.

  • MEDLINE (https://www.nlm.nih.gov/medline/index.html) is the bibliographic database of the US National Library of Medicine. It contains more than 27 million records relating to journal articles in the life sciences with a focus on biomedicine.

All three databases are accessible through EBSCO Information Services, which means that they can be searched simultaneously to avoid duplicate results.

These three databases were searched repeatedly during the period 2019–2021 to find publications containing the term serif in their titles, abstracts, keywords, or metadata. This led to a high number of false positives mainly because “Serif” is a common first name and family name in Turkey. It was also suspected to lead to a high number of misses, because informally it was noted that some relevant sources were not covered by any of the three databases. The results were therefore used as a basis for the conventional procedures of backward searching and forward searching. The former refers to the examination of previously published sources cited by the obtained hits, while the latter refers to the examination of subsequently published sources that cite one or more of the obtained hits. This process was facilitated by employing the database Web of Science (https://clarivate.com/products/web-of-science/, formerly Web of Knowledge), which enables searching among 79 million cited or citing sources in the form of books, journals, and conference proceedings.
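The logic of backward and forward searching can be illustrated with a minimal sketch. It assumes that the citation links are available as a simple mapping from each source to the sources that it cites; the identifiers and the function names are hypothetical placeholders, and the sketch is not a description of how Web of Science itself operates.

```python
# Hypothetical citation data: each key cites the sources in its list.
# The identifiers are placeholders, not real search results.
cites = {
    "hit_A": ["old_1", "old_2"],
    "hit_B": ["old_2"],
    "new_1": ["hit_A"],
    "new_2": ["hit_A", "hit_B"],
}


def backward_search(hits, cites):
    """Return the earlier sources that are cited by any of the hits."""
    found = set()
    for hit in hits:
        found.update(cites.get(hit, []))
    return found - set(hits)


def forward_search(hits, cites):
    """Return the later sources that cite at least one of the hits."""
    hit_set = set(hits)
    citing = {src for src, refs in cites.items() if hit_set & set(refs)}
    return citing - hit_set


hits = ["hit_A", "hit_B"]
candidates = backward_search(hits, cites) | forward_search(hits, cites)
print(sorted(candidates))
```

Each candidate found in this way would still need to be screened by hand for relevance, and the procedure can be repeated with the newly accepted sources treated as hits.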

When there is sufficient commonality with regard to research methods, it is possible to integrate the quantitative findings of a systematic review using statistical techniques, thus yielding a single overall estimate of a difference, variation, or effect size. Such an approach is known as “meta-analysis”. A classic example is the analysis of differences between men and women in their performance on particular cognitive tests or similar tasks (for example, see Caplan et al., 1997). Nevertheless, despite the apparently simple nature of the research question, the present literature review yielded extremely diverse methods of data collection and analysis, and this ruled out any formal statistical meta-analysis to integrate the research findings from the wide variety of studies that will be described.

The alternative approach is to rely on “vote counting” (sometimes known as the “box score” approach). For present purposes, different studies are sorted according to whether their results are statistically significant favouring serif typefaces, statistically significant favouring sans serif typefaces, or not statistically significant, and I shall invite readers to plump for the majority outcome across these three categories in particular tasks, in particular contexts, and in particular subject populations; in other words, I shall focus readers’ attention on the most common finding or, as statisticians would say, the modal finding. Such an approach is not without its hazards (see Caplan, 1979; Maccoby & Jacklin, 1974, pp. 355–356), and it can in theory lead to misleading conclusions (see Hedges & Olkin, 1985, pp. 48–52). Nevertheless, it has the major advantage over the use of narrative review that the criteria and standards used for the selection and interpretation of individual studies have been made totally explicit, and hence the findings can be readily replicated. In fact, in most cases the results are sufficiently unambiguous that readers should have little difficulty sharing my conclusions.
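As an illustration only, the vote-counting procedure can be sketched in a few lines. The outcome labels, the counts, and the function name are invented for the example and do not correspond to any of the studies reviewed in this book.

```python
from collections import Counter

# Hypothetical outcomes for a single task, context, and population.
# "serif" = significant result favouring serif typefaces,
# "sans"  = significant result favouring sans serif typefaces,
# "none"  = no statistically significant difference.
outcomes = ["serif", "none", "none", "sans", "none", "serif"]


def vote_count(outcomes):
    """Tally the studies by outcome and return the tally with the modal categories."""
    tally = Counter(outcomes)
    if not tally:
        return tally, []
    top = max(tally.values())
    # More than one category is returned when there is a tie for the mode.
    modal = sorted(cat for cat, n in tally.items() if n == top)
    return tally, modal


tally, modal = vote_count(outcomes)
print(dict(tally), modal)
```

Returning every category that ties for the highest count makes an ambiguous outcome visible rather than hiding it behind an arbitrary choice.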

The final methodological point is that there is no reason to think that typographical features have the same consequences when people are reading from paper or other hard surfaces and when they are reading from computer monitors or other screens. It follows that reviews which indiscriminately combine research on reading from paper with research on reading from screens (e.g., Chung, 2020) are unlikely to be informative. Accordingly, Part I of this book reviews the research literature regarding the legibility of serif typefaces and sans serif typefaces when they are used to generate material that is printed on paper, and Part II reviews the research literature regarding the legibility of serif typefaces and sans serif typefaces when they are used to produce material that is to be viewed on display screens or by means of other kinds of technology.

1.5 Conclusions

This chapter has introduced the distinction between serif and sans serif typefaces. There seem to be widespread assumptions about their relative legibility both on paper and on screens. The chapter also described the methodology of systematic review employed to address this issue.