1 The Legibility of Serif and Sans Serif Typefaces in Internet Browsers

The research studies described thus far have been mainly concerned with material generated on local workstations using conventional word-processing software. However, many readers view material that has been saved in hypertext markup language (HTML) on remote sites on the internet to be viewed in web browsers. Even so, this is not a hard-and-fast distinction. First, word-processed documents can be uploaded to remote web sites and retrieved by other users to read on their smartphones or tablet computers as well as their own workstations. Second, how downloaded documents appear on-screen will depend on the browser settings and other software on the local device. Third, material can also be saved in HTML on local workstations and viewed through web browsers in order to mimic the retrieval of information from remote web sites. Nevertheless, the question arises whether serif and sans serif typefaces differ in legibility when used in documents saved in HTML and viewed through web browsers. As with material that is generated on local workstations, designers and design educators tend to recommend that sans serif typefaces should be used on web sites due to the poor legibility of serif typefaces on low-resolution monitors or with small type sizes (Davidow, 2002). However, others maintain that a preference for sans serif typefaces for websites simply reflects readers’ greater familiarity with sans serif typefaces when accessing sources of information on the internet (Redish, 2012, p. 62).

Gosse (1999) asked 200 participants to read stories selected from the websites of real newspapers published outside the immediate locality. Eight stories were edited to a standard length of 325 words and presented on a laptop computer with a liquid crystal display (LCD) screen using a web browser as if they were being viewed on the World Wide Web. Four stories were presented in different serif typefaces (Courier, New Century Schoolbook, Palatino, and Times), and four were presented in different sans serif typefaces (Avant Garde, Hallmarke Light, Helvetica, and Quick Type). The stories were assigned at random to four pairs, each containing a story in serif typeface and a story in sans serif typeface. Participants were timed while they read the two stories in a pair, and they were then asked a number of questions, including their preference between the two typefaces they had seen (pp. 67–75).

There was no significant difference in either reading time or preference: serif passages were read in an average of 92.0 s, and sans serif passages were read in an average of 95.8 s; 102 of the participants preferred the serif typefaces, and 98 preferred the sans serif typefaces (p. 81). Unfortunately, there were two problems with this study. First, the opening page on the web browser listed short titles of the eight stories, each presented in the appropriate typeface, and the participants were free to choose which of the stories they read first (p. 91). This then determined which story they read second and which typefaces the participant received. This contradicts the author’s assertion that the order of presentation of the different serif typefaces was “systematically randomized” (p. 69). Second, in the data analyses, the contrast between serif and sans serif typefaces was incorrectly treated as a between-subjects variable and not as a within-subjects variable (pp. 84–85), and this would have reduced the analyses’ statistical power.
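The loss of statistical power that results from treating a within-subjects contrast as a between-subjects contrast can be illustrated with a minimal sketch. The reading times below are hypothetical values chosen for illustration and are not Gosse's data; the function names are likewise my own. The sketch computes the t statistic for the same set of paired observations in two ways: as if the two sets of reading times came from independent groups, and as paired observations from the same readers.

```python
import math
import statistics

def independent_t(a, b):
    """Student's t for two independent samples of equal size (pooled variance)."""
    n = len(a)
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(2 * pooled_var / n)

def paired_t(a, b):
    """t statistic for paired observations obtained from the same participants."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical reading times in seconds for eight readers (NOT data from Gosse, 1999).
# Readers differ widely in overall speed, but each is a few seconds slower on sans serif.
serif = [80, 95, 110, 70, 120, 90, 100, 85]
sans = [84, 97, 115, 73, 122, 95, 103, 90]

print("between-subjects t:", round(independent_t(sans, serif), 2))
print("within-subjects t: ", round(paired_t(sans, serif), 2))
```

With these illustrative values, the paired statistic is many times larger than the independent-samples statistic, because the large differences in overall reading speed among readers are removed from the error term when each reader serves as his or her own control.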

Grant and Branch (2000) asked 21 undergraduate student teachers to participate in an online experiment using a specially constructed website. The stimuli were two passages of 164 words taken from the Graduate Record Examination and two test questions presented on the same screen as the passages. Students who used a Windows platform were shown stimuli in Times New Roman and Arial; those who used a Macintosh platform were shown stimuli in Times and Helvetica. Students who accessed the website alternately received a serif typeface first or a sans serif typeface first; in both cases, all the stimuli were shown as 11-point black text on a white background. The system recorded the reading time for each passage, and the students were given feedback on their answers to the questions, but the answers were not recorded. The reading data were converted to words per minute and showed that the serif typefaces were read significantly more quickly than sans serif typefaces, but there was no significant practice effect between the two passages. Grant and Branch acknowledged that the number of participants in their study was relatively small and that they had compared just two typefaces for each participant. They also had no control over the platforms used by the participants.
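The conversion of reading times to words per minute is simple arithmetic, sketched below. The passage length of 164 words is taken from the study; the reading time used in the example is a hypothetical value, and the function name is my own.

```python
def words_per_minute(word_count: int, reading_time_s: float) -> float:
    """Convert a reading time in seconds into a reading rate in words per minute."""
    if reading_time_s <= 0:
        raise ValueError("reading time must be positive")
    return word_count * 60.0 / reading_time_s

PASSAGE_WORDS = 164      # passage length reported by Grant and Branch (2000)
example_time_s = 41.0    # hypothetical reading time, for illustration only

print(words_per_minute(PASSAGE_WORDS, example_time_s))  # 240.0
```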

2 The Research of Bernard and Colleagues

Michael Bernard and his colleagues carried out a series of experiments to evaluate the legibility of different typefaces when reading online material. They identified passages of text (typically around 1,000 words for young adult readers) and in each case replaced 15 randomly selected words with substitutes. The latter rhymed with the original words but were semantically inappropriate to the context. The passages were saved in HTML and were viewed on a high-resolution LCD monitor using a browser as if they had been retrieved from the internet. The participants were asked to read each passage silently but to say the inappropriate words aloud. They were also asked to rate the different passages on several characteristics and to rank their overall preference among the different typefaces.

This research was initially published in Usability News, a biannual newsletter that was produced by the Software Usability Research Laboratory at Wichita State University. Dyson (2005) argued that these reports could not be relied upon because they had not been peer reviewed. In fact, some of the reports were also published as articles in academic journals or conference proceedings, where they will presumably have been exposed to independent peer review. In such cases, I will cite both versions of these reports so that readers can make meaningful comparisons for themselves.

Bernard and Mills (2000; Bernard et al., 2003) compared the presentation of material in either a serif typeface, Times New Roman, or a sans serif typeface, Arial, in either 10-point or 12-point type, and in either an aliased form or an anti-aliased form. There was no significant variation either in the participants’ detection of substitutes or in the time they had taken to read the different passages. However, the fastest reading time was obtained when the passages were presented in 12-point aliased Times New Roman, and the slowest time was obtained when they were presented in 10-point anti-aliased Arial. There was no significant difference in perceived legibility between the passages presented in Arial and those presented in Times New Roman, but the passages presented in a 12-point Arial typeface (whether aliased or anti-aliased) were rated as sharper than the passages presented in 10-point anti-aliased Times New Roman. The passages presented in 12-point Arial (whether aliased or anti-aliased) or in 12-point aliased Times New Roman were the most preferred.

Bernard, Mills, Peterson, and Storrer (2001e) compared five serif typefaces (Century Schoolbook, Courier New, Georgia, Goudy Old Style, and Times New Roman), five sans serif typefaces (Agency FB, Arial, Comic Sans MS, Tahoma, and Verdana), and two ornate typefaces (Bradley Hand ITC and Monotype Corsiva). The typefaces were matched in terms of their body height and were mainly 12-point. (Whether they were aliased or anti-aliased was not specified.) There was significant variation in the time taken to read the different passages, with Corsiva yielding the fastest reading time and Tahoma the slowest. However, there was no overall difference in reading time between the serif typefaces and the sans serif typefaces. Moreover, when Bernard et al. constructed a measure of “reading efficiency” by dividing the percentage of substitutions that were detected by the overall reading time, there was no significant variation in this measure among the 12 typefaces. This suggests that any variation in reading time represented a trade-off between speed and accuracy rather than any genuine differences in reading efficiency. The participants’ perceptions showed significant variation, but again there was no overall difference between the serif and sans serif typefaces. Finally, Arial, Comic Sans, Tahoma, Verdana, Courier New, Georgia, and Century Schoolbook were ranked higher than other typefaces in terms of the participants’ overall preference.

Bernard, Lida, Riley, Hackler, and Janzen (2002a) compared eight different typefaces. Of the four serif typefaces, two, Courier New and Times New Roman, had originally been designed for print applications; one, Century Schoolbook, had been designed for educational materials; and one, Georgia, had been designed to be displayed on computer screens. Of the four sans serif typefaces, two, Arial and Comic Sans MS, had originally been designed for print applications, and two, Tahoma and Verdana, had been designed to be displayed on computer screens. Different groups of participants were presented with passages in 10-point, 12-point, or 14-point type.

Bernard et al. found that passages presented in Times New Roman or Arial were read significantly more quickly than those presented in Courier New, Georgia, or Century Schoolbook. Again, however, there was no overall difference in reading time between the serif and sans serif typefaces. The passages presented in 12-point type were read significantly more quickly than those presented in 10-point type. The researchers noted that passages which were read more quickly tended to be read less accurately: in other words, there was a speed–accuracy trade-off. This time, they calculated a measure of reading efficiency by dividing the overall reading time by the percentage of substitutions detected (in other words, the reciprocal of their previous measure of reading efficiency). On this measure, there was no significant variation among the eight typefaces. The participants’ perceptions again showed significant variation, but there was no systematic difference between either the ratings or the rankings of the serif typefaces and the sans serif typefaces.
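The two versions of the reading efficiency measure, and the way in which a speed–accuracy trade-off can leave efficiency unchanged, can be made concrete with a short sketch. The function names and the example values are hypothetical and are not taken from the reports by Bernard and colleagues; only the two definitions (percentage detected divided by reading time, and its reciprocal) come from the studies described above.

```python
def detected_per_second(percent_detected: float, reading_time_s: float) -> float:
    """Earlier measure: percentage of substitutions detected divided by reading time.
    Higher scores imply higher efficiency."""
    return percent_detected / reading_time_s

def seconds_per_percent(percent_detected: float, reading_time_s: float) -> float:
    """Later measure: reading time divided by percentage of substitutions detected.
    Lower scores imply higher efficiency."""
    return reading_time_s / percent_detected

# Hypothetical readers (illustrative values only):
# one reads quickly but carelessly, the other slowly but carefully.
fast_careless = {"percent_detected": 60.0, "reading_time_s": 240.0}
slow_careful = {"percent_detected": 90.0, "reading_time_s": 360.0}

for label, reader in (("fast", fast_careless), ("slow", slow_careful)):
    print(label, detected_per_second(**reader), seconds_per_percent(**reader))
```

In this example the two readers differ in both speed and accuracy but obtain identical scores on either measure, which is the pattern that would be expected if differences in reading time merely reflected a trade-off between speed and accuracy.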

Bernard and colleagues carried out further experiments with both older and younger participants. Bernard, Liao, and Mills (2001b, c) tested 27 adults aged between 62 and 83. Passages of around 700 words were presented in two serif typefaces, Times New Roman and Georgia, or two sans serif typefaces, Arial and Verdana, in either 12-point or 14-point type. In each passage, ten randomly selected words had been replaced by substitutes that rhymed with the original words but were semantically inappropriate to the context. The participants were also asked to rate the different passages with regard to the perceived legibility of the typeface and to rank their overall preference of the typefaces.

The passages presented in 12-point serif typefaces were read significantly more slowly than the passages presented in either 14-point serif typefaces or 14-point sans serif typefaces. When Bernard et al. constructed a measure of reading efficiency by dividing the percentage of substitutions detected by the overall reading time, the 14-point passages yielded higher scores than the 12-point passages, but there was no significant variation in reading efficiency across the four typefaces. Similarly, the participants rated the 14-point passages as being more legible than the 12-point passages, but there was no significant variation in their ratings of the four typefaces. Finally, the 14-point sans serif typefaces were ranked higher than the 12-point sans serif typefaces and all of the serif typefaces in terms of the participants’ overall preference. No significant differences were found on any measure between the typefaces designed for printing on paper (Times New Roman and Arial) and those designed for screen display (Georgia and Verdana).

Bernard, Liao, Chaparro, and Chaparro (2001a) repeated this experiment with a new sample of 26 older adults in order to focus on their perceptions of different typefaces. After reading each passage, they rated it on 7-point scales in terms of its legibility, how easy it was to read, its sharpness and crispness, its attractiveness, and its personality. Finally, they ranked their overall preference of the typefaces. The 14-point passages obtained higher scores than the 12-point passages, although this was mainly true for men, not for women. No significant differences were found among the four typefaces on any of the aspects of the participants’ perceptions. Overall, the participants ranked the sans serif typefaces higher than the serif typefaces, but there was no systematic difference between the ranks of the typefaces designed for printing on paper and the ranks of those designed for screen display.

Bernard, Mills, Frank, and McKown (2001d; Bernard, Chaparro, Mills, & Halcomb, 2002b) tested 27 children aged between 9 and 11. Children’s short stories of about 580 words were presented in two serif typefaces, Times New Roman and Courier New, or two sans serif typefaces, Arial and Comic Sans MS, in either 12-point or 14-point type. In each story, 15 randomly selected words had been replaced by substitutes that rhymed with the original words but were semantically inappropriate to the context. The participants were also required to rate the different stories with regard to how easy they were to read, whether they enabled them to read faster, the attractiveness of the typeface, and whether they would like their schoolbooks to use the typeface. Finally, they ranked their overall preference of the eight combinations of typeface and type size.

There were no significant differences among the four typefaces and the two type sizes in terms of either the detection of substitutes or the speed of reading. Bernard et al. computed a measure of reading efficiency by dividing the reading time by the percentage of substitutes detected (so that lower scores implied higher efficiency). The only significant difference was that reading efficiency was less on stories presented in Courier New than on stories presented in the other typefaces. The stories presented in 14-point type were rated as significantly better than those presented in 12-point type in terms of their ease of reading, reading more quickly, their attractiveness, and their use in schoolbooks. Stories presented in Times New Roman were rated as being less easy to read than those presented in Arial or Comic Sans; stories presented in Times New Roman were rated as being less attractive than those presented in Comic Sans; and those presented in Times New Roman or Courier New were rated as less desirable for use in schoolbooks. Among the 14-point typefaces, Arial and Comic Sans were ranked higher in overall preference than Courier New or Times New Roman; among the 12-point typefaces, Comic Sans was ranked higher in overall preference than the other typefaces.

3 Subsequent Research

Myung (2003) presented 12 Korean students with newspaper stories of between 453 and 532 words using the internet browser installed on a personal computer. The stories were shown in three typefaces: one serif typeface (Batang) and two sans serif typefaces (Dodum and Gulim). The participants were asked to read the stories to themselves and then to rate their typographical appearance on a 7-point scale. Myung calculated a measure of reading speed by dividing the total number of characters in each story by the time taken to read it. There was no significant variation among the three typefaces in terms of the participants’ reading speed. However, the application of conjoint analysis to the participants’ preference ratings showed that the stories that had been presented in Dodum and Gulim were preferred to those presented in Batang.

Ling and van Schaik (2006) carried out two experiments to examine the influence of typeface and line length on students’ use of web pages. In both cases, the web pages were presented in either a serif typeface (12-point Times New Roman) or a sans serif typeface (10-point Arial) and in four different line lengths. In their first experiment, 72 participants had to say whether or not a mock web page presented in a browser contained a specified hyperlink. There were no significant differences between the students who saw web pages in the serif typeface and those who saw web pages in the sans serif typeface in either the accuracy or the response time for hits or in either the accuracy or the response time for correct rejections.

In their second experiment, 99 participants had to answer questions based upon the information contained in five mock web sites, each consisting of 30 pages, on various topics. The proportion of correct answers approached 100%. There were no significant differences between the students who saw web sites in the serif typeface and those who saw web sites in the sans serif typeface in either the time taken to carry out their task or the number of web pages that they visited to find the correct answers. Ling and van Schaik concluded that there was no difference between the serif typeface and the sans serif typeface either in visual search or in information retrieval.

After both experiments, the participants were asked to express a preference between the two typefaces and to rate their aesthetic value on a 10-point scale. In the first experiment, regardless of which typeface they had seen, the participants tended to prefer Arial rather than Times New Roman and to rate Arial more highly than Times New Roman in terms of aesthetic value, although the latter difference was small in magnitude and unlikely to be of any practical importance. In the second experiment, there were no significant differences in either the participants’ preference or in their ratings of aesthetic value.

Chernecky et al. (2006) recruited 22 cancer patients. The patients were assigned to workstations in groups of two or three but recorded their individual responses on a prepared form. The stimuli were presented using an internet browser, but the computers and monitors used were not specified. In one section of the test, the patients were presented with examples of text in varying sizes and in different typefaces with different backgrounds, two at a time. In each case, they indicated which of the two displays they preferred. The strongest preference was for the serif typeface Times New Roman in a ten-point font and in blue lettering on either a tan or white background. The next strongest preference was for the sans serif typeface Arial in a nine-point font and black lettering on a tan background. However, the sans serif typeface Verdana was not preferred. Chernecky et al. ascribed the preference for the serif typeface to the fact that it was widely used in books, newspapers, and magazines. They did not carry out any kind of statistical analysis of their results, and they did not include any comparison group, and so it is unclear whether their findings were peculiar to cancer patients or would generalise to other kinds of participant.

In a study mentioned in Sect. 12.5, Shaikh et al. (2006) obtained participants’ ratings of the appropriateness of different typefaces for various online purposes. Fox et al. (2007) selected three of these typefaces judged to be of high, medium, or low appropriateness for each of three purposes: a business document, an e-mail message, and a narrative for young people. A total of 120 participants were presented with an example of each document (a bank letter, an e-mail invitation to a company picnic, and an explanation of how fireworks work) in one of the three typefaces. Each was presented as an HTML web page and was viewed using a web browser. The participants were asked to rate the “personality” of the document using 15 bipolar scales from Shaikh et al.’s (2006) study and to rate its “ethos” (their perceptions of the author and the intended readership) using five scales.

The choice of typeface had no significant effect on the participants’ perceptions of the business document, except that its author was viewed as less mature if the least appropriate typeface was used. If the least appropriate typeface was used for the e-mail message, it was viewed as less stable, less practical, more rebellious, more youthful, and more feminine; its author was also viewed as less believable, less professional, less trustworthy, and less mature. The choice of typeface had no significant effect on perceptions of the narrative for young people, except that it was viewed as more youthful and more casual if the most appropriate typeface was used. Fox et al. concluded that in general on-screen documents were more likely to be perceived in a negative manner if they were presented using a less appropriate typeface.

Beymer et al. (2008) carried out a similar study. They presented 82 employees of a computer company with one-page stories taken from a science news website. The stories were presented on a computer screen as a series of web pages in a 12-point anti-aliased typeface: half of the participants saw the stories in the serif typeface Georgia, and half saw them in the sans serif typeface Helvetica. The participants’ eye movements were monitored while they read the stories, and a multiple-choice test was administered after each story to check their retention. There was no significant difference between the two subgroups in their reading speed, in a variety of statistics relating to their eye movements, or in their retention. Beymer et al. noted that around half of their participants reported having a first language other than English. Having found significant differences in eye movements related to the participants’ age, they focused on those aged 30–39. The participants for whom English was the first language produced shorter fixations and longer eye movements than those who had some other first language. However, neither group showed any differences between the two typefaces.

Ali et al. (2013) compared the legibility of serif and sans serif typefaces in 48 Malaysian students reading texts of moderately high difficulty containing 140 words in the Malay language. These were presented in a web browser on an LCD monitor in a 12-point typeface. For the first 24 participants, the texts were presented in two typefaces designed for screen presentation: the serif typeface Georgia and the sans serif typeface Verdana. For the second 24 participants, the texts were presented in two typefaces designed for printed media: the serif typeface Times New Roman and the sans serif typeface Arial. The participants were required to read the texts aloud as quickly and accurately as possible, and their performance was monitored by two research assistants. Their reading speed and their accuracy were both mapped onto scales from 1 to 5, and the results were added together to yield an overall score. There was no sign of any difference in performance either between Georgia and Verdana or between Times New Roman and Arial. Two problems with this study are that all the students read the texts in the same sequence and that all saw the two typefaces in the same sequence; hence, there was no control for transfer effects (such as the positive effect of practice or the negative effect of fatigue). The researchers also carried out independent-samples tests even though the observations had been obtained by repeated testing of the same participants, and this would once again have reduced the analyses’ statistical power.

Mátrai and Kosztyán (2014) devised web pages containing verbal comprehension tasks and compared text presented in the serif typeface Times New Roman and text presented in the sans serif typeface Arial. In addition, they manipulated the size of the text (three sizes for each typeface), the line length and spacing, the colour of the background, and the alignment of the text, which yielded a total of 144 conditions. Each of 125 university students was asked to solve the tasks on a sample of 40 web pages. (The computers and the monitors were not specified.) A regression analysis found no significant difference between the two typefaces in terms of the response latencies. Mátrai and Kosztyán stated that there was a significant difference in terms of the proportions of correct responses, but they did not provide any further information. In fact, the overall difference was relatively slight (Times New Roman, 84.0%; Arial, 83.3%) and unlikely to be of any practical importance. Finally, the students were asked to express a preference among the different displays, but there was no significant difference in their preference between the two typefaces. A fundamental problem with this study is that Mátrai and Kosztyán assigned different verbal comprehension tasks to different conditions, but they failed to evaluate whether the tasks were of equal difficulty. Consequently, even the small difference that they found between the two typefaces in terms of the proportions of correct responses might have been due to differences in the difficulty of the relevant tasks rather than to differences in the legibility of the typefaces.

Beyon and Cox-Boyd (2020) carried out a follow-up to the study mentioned in Sect. 6.4 by Gasser et al. (2005), who varied the typeface used when participants were reading text from paper. Beyon and Cox-Boyd used a text concerning spinal health. They presented this text in four different typefaces: a monospaced slab serif typeface (Courier New), a monospaced sans serif typeface (Lucida Console), a proportionally spaced serif typeface (Palatino Linotype), and a proportionally spaced sans serif typeface (Arial). Independent of this, they presented the text in black, blue, or red, yielding 12 different conditions. They recruited volunteers from a website (Amazon Mechanical Turk), who were asked to carry out the task as an online survey. The volunteers were asked to read the text, complete a questionnaire about their attitudes to spinal health as a distractor task, and then answer six questions to test their retention of the key information contained in the original document. The participants were randomly assigned to one of the 12 presentation conditions. Beyon and Cox-Boyd found no significant differences in performance among the four typefaces or the three type colours. They acknowledged that they had no control over the devices or platforms which the participants had used to carry out the task.
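The factorial structure of this design can be sketched as follows. The four typefaces and three colours are those reported in the study; the assignment procedure shown (simple random assignment with equal probability, using an arbitrary seed) is an assumption made for illustration, since the report states only that participants were randomly assigned.

```python
import itertools
import random

TYPEFACES = ["Courier New", "Lucida Console", "Palatino Linotype", "Arial"]
COLOURS = ["black", "blue", "red"]

# Crossing four typefaces with three colours yields the 12 presentation conditions.
CONDITIONS = list(itertools.product(TYPEFACES, COLOURS))

def assign_condition(rng: random.Random) -> tuple:
    """Assign a participant to one condition with equal probability
    (an assumed procedure, not necessarily the one the authors used)."""
    return rng.choice(CONDITIONS)

rng = random.Random(2020)  # arbitrary seed so that the sketch is reproducible
typeface, colour = assign_condition(rng)
print(len(CONDITIONS), typeface, colour)
```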

4 Conclusions

This chapter discussed whether serif and sans serif typefaces differ in their legibility when the material is saved in HTML and viewed on-screen through web browsers. This includes material saved on local workstations as well as material retrieved from the internet. In addition to a variety of individual studies, the chapter described a research programme that was carried out by Bernard and colleagues at Wichita State University. Further research has been carried out into the use of different typefaces for various online purposes. When reading material in internet browsers, by far the most common finding is that there is no significant difference between serif typefaces and sans serif typefaces in terms of the users’ reading comprehension, reading speed, or reading accuracy. There is also no consistent evidence that readers have a preference between serif typefaces and sans serif typefaces when reading material in internet browsers. Once again, both serif and sans serif typefaces are regarded as being broadly appropriate for internet sites, whereas display and cursive typefaces are regarded as being generally inappropriate for serious use.