1 The Importance of Context

Whittemore (1948) argued that the legibility of different typefaces depended upon the context in which they were used. Readers may develop expectations with regard to the kinds of context in which particular typefaces are appropriate. A problem with much of the research described thus far is that it did not provide readers with any sensible context for their reading (Schriver, 1997, p. 277). Some researchers have endeavoured to address this issue.

Zachrisson (1965, pp. 156–162) investigated the attitudes of experts and non-experts to whether material was printed in serif or sans serif typefaces. Typography experts and students from various subject areas were shown samples for each of six themes. A serif typeface was preferred for a wedding invitation, a perfume advertisement, and the title page for a book of lyrical verse, but a sans serif typeface was preferred for an invitation to an art exhibition, the title page for a book on modern architecture, and an advertisement for an oil stove. The rankings given by the experts and the non-experts were relatively similar.

Hvistendahl and Kahl (1975) prepared four newspaper stories of 250–400 words in four serif typefaces (Imperial, News #2, News Bold, and Royal) and four sans serif typefaces (Futura, Helvetica, News Sans, and Sans Heavy). Each of 200 participants was asked to read two stories in serif typefaces and two stories in sans serif typefaces at their normal reading pace; in each case, one story was printed in 10.5-point type, while the other was printed in 14-point type. For two stories, the serif typefaces yielded a significantly faster mean reading time than the sans serif typefaces; for the other two stories, there was no significant difference in the reading times. Hvistendahl and Kahl then showed each participant two out of eight stories set in both a serif typeface and a sans serif typeface; in each case, one story was printed in 10.5-point type, and the other was printed in 14-point type. The participants were asked to express their preference between the two typefaces in which each story was printed. Overall, the serif typefaces were preferred 68% of the time, while the sans serif typefaces were preferred only 32% of the time. Nevertheless, it should be noted that Hvistendahl and Kahl did not ask their participants to compare stories set in serif and sans serif typefaces of the same size.

Moriarty and Scheiner (1984) asked 260 college students to read a page of text from a sales brochure for stereo speakers. Half the students read the text in a serif typeface (Times Roman), and half read it in a sans serif typeface (Helvetica). Independent of this, half the students read the text in regular spacing, and half read it with an 18% reduction in spacing. They were given 105 s to read the text and marked the last word that they had read when the time limit was reached. The students given a close-set type read significantly more than those given the regular type. However, there was no significant difference between the students who read the text in a serif typeface and those who read it in a sans serif typeface and no significant interaction between the two variables. Moriarty and Scheiner concluded that there was no difference in reading speed between the serif and sans serif typefaces in their study.

2 Serif and Sans Serif Typefaces in Newspaper Headlines

The headlines placed above articles in the body of a newspaper are usually presented in a large typeface, often in a bold font, and they may extend over two or more lines. They may be presented (a) all in capitals, (b) with initial capitals for the principal words (“title case”), or (c) with initial capitals only for the first word and any proper nouns (sometimes known as “sentence case”). Research in the first half of the twentieth century showed that text presented in lowercase was read more quickly than text presented in uppercase and more specifically that headlines in title case were read more quickly than headlines all in capitals (see Tinker, 1963, pp. 186–190). Newspaper headlines are sometimes presented in sans serif typefaces. Arnold (1956) argued that “Sans Serif... is highly readable and, more than any other typographic device, conveys an impression of a newspaper that is alert and up to date” (p. 19).

English (1944) presented three-line headlines in title case tachistoscopically for 450 ms to 45 students of journalism and psychology. He used a serif typeface (Bodoni bold), a slab serif typeface (Karnak bold), and a sans serif typeface (Tempo medium) in three sizes, and he used the number of words reported correctly as a measure of performance. Headlines presented in Bodoni and Tempo yielded significantly better performance than those presented in Karnak but were not significantly different from each other. Variations in type size had no effect upon the participants’ performance. Another group of 50 students was shown pairs of headlines in different typefaces and asked to choose which member of the pair seemed easier to read. There were no significant differences among their preferences for the different typefaces.

Haskins (1958) presented 300 participants recruited through the Saturday Evening Post with ten different magazine articles. The subtitles and text of the articles were presented in their original form, but their headlines were presented to different participants in one of ten different typefaces, including two sans serif typefaces (Futura Light and Futura Bold). The participants were asked to judge how appropriate each headline was for the article to which it was attached using a 6-point scale. The variation in their ratings across the ten typefaces was highly significant; in particular, Futura Light was on average judged to be fairly appropriate, whereas Futura Bold was judged on average to be very appropriate for eight of the articles. These results imply that sans serif typefaces are at least as appropriate as serif typefaces for the headlines of such articles.

Click and Stempel (1968) carried out an experiment in which college students rated six newspaper front pages on 20 semantic differential scales (see Sect. 5.3). However, they had carried out a pilot study in which six front pages had used sans serif typefaces in their headlines and four had used serif typefaces. They found that sans serif typefaces and serif typefaces yielded similar ratings if they were used with similar front-page formats. As they noted, “The main source of variation in response was the format and not the typeface” (p. 130). Because of this, they only used sans serif headlines in their main experiment.

Haskins and Flynne (1974) investigated whether the choice of different typefaces for newspaper headlines affected readers’ interest in the accompanying stories. They carried out interviews with 150 female heads of household, of whom 100 were asked to read through a genuine local newspaper in which a mock women’s page had been inserted. This contained five articles drawn from various newspapers and magazines in which the headlines had been printed either in a serif typeface judged to be relatively “feminine” (Garamond Italic) or in a sans serif typeface judged to be relatively “masculine” (Spartan Black). The remaining 50 female heads of household read the newspaper without the women’s page, together with the headlines of the five articles printed on individual white cards. The participants were asked to rate the attractiveness and interest of each page of the newspaper on a scale from 0 to 100 and to rate their interest in each of the five articles. They were then shown the mock women’s page printed in ten different typefaces and were asked to rate each version using 12 semantic differential scales.

Consistent with the researchers’ assumptions, two typefaces often used on women’s pages of newspapers, Garamond Italic and the cursive script Coronet Light, were rated more highly on several supposedly feminine characteristics, whereas Spartan Black, which was often used on sports pages, was rated more highly on several supposedly masculine characteristics. The other typefaces were rated as being relatively neutral on these characteristics. Nevertheless, there was no significant difference in the ratings of overall reading interest given to the women’s pages with headlines printed in Garamond Italic and in Spartan Black. Only one of the five articles showed a significant difference in the ratings of reading interest, where the version with a headline printed in Garamond Italic was rated more highly than the version with a headline printed in Spartan Black. Haskins and Flynne concluded that, while some typefaces used in headlines were perceived as more feminine or more masculine, this had no effect on a woman’s interest in reading a women’s page.

In a study described in more detail in Sect. 6.3, Wheildon (1990, pp. 18–22; 2005, pp. 61–73) presented the same participants with articles containing headlines in different typefaces and evaluated their perceived legibility by asking the participants to say simply whether the headlines were easy to read or not. When the headline was presented in a lowercase serif typeface, 92% said that it was easy to read; when it was presented in a lowercase sans serif typeface, 90% said that it was easy to read. (He did not specify the exact typefaces that he had used, and he did not explain whether “lowercase” meant title case or sentence case.) Wheildon (1990, p. 22; 2005, p. 72) concluded that there was little to choose between serif and sans serif typefaces in headlines.

3 Wheildon’s Research

Colin Wheildon (1984) carried out a study of the impact of various typographical factors on the comprehension of newspaper copy that incorporated a comparison between serif and sans serif typefaces. He prepared revised versions of his original report in 1986 and 1990, and he also incorporated his account into a book on typography and design (Wheildon, 1995). This in turn went through several editions and revisions, and the final version was published in 2005. Wheildon’s research has been cited in support of the idea that serif typefaces are more legible than sans serif typefaces in continuous text (Kempson & Moore, 1994, pp. 52, 284; Schriver, 1997, p. 274). It has, however, proved to be extremely controversial (Poole, 2012).

Wheildon recruited 300 volunteers from among the inhabitants of Sydney, Australia, and he visited them in their homes on several occasions. Their educational level tended to be higher than that of the general population (79% had graduated from high school and 23% had obtained a university degree or a comparable qualification), but none was professionally involved in printing or publishing. On each visit, they were asked to read a mock newspaper article to a time limit and were then asked ten questions to test their comprehension of its content. For each visit, the participants were randomly divided into two equal subsamples who read the article in different forms, and they were classified into three groups depending on the number of questions that they had answered correctly: between seven and ten questions, good comprehension; between four and six questions, fair comprehension; and between zero and three questions, poor comprehension (Wheildon, 1990, p. 9; 2005, pp. 134–138).
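Wheildon’s three-way classification of comprehension scores can be expressed compactly. The following is a minimal sketch in Python; the function name and the string labels are illustrative assumptions rather than anything taken from Wheildon’s reports.

```python
def comprehension_band(correct_answers: int) -> str:
    """Map a score out of ten questions onto the three bands described above.

    The cut-offs (7-10 good, 4-6 fair, 0-3 poor) come from the text;
    the function name and the labels are hypothetical.
    """
    if not 0 <= correct_answers <= 10:
        raise ValueError("score must be between 0 and 10")
    if correct_answers >= 7:
        return "good"
    if correct_answers >= 4:
        return "fair"
    return "poor"


# Band for every possible score from 0 to 10.
bands = [comprehension_band(score) for score in range(11)]
```

The three bands are not of equal width: the fair band covers three possible scores, whereas the good and poor bands each cover four.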

At two of the visits, the bodies of the relevant articles were presented in either a serif typeface (Corona) or a sans serif typeface (Helvetica). The sequence of administration of the two typefaces was counterbalanced across different participants, and so the comprehension of the two typefaces was compared within the same individuals. The results were analysed for the 224 participants who had participated at all of the visits. When reading an article in serif typeface, comprehension was scored as good for 67%, fair for 19%, and poor for only 14%; however, when reading an article in sans serif typeface, comprehension was scored as good for only 12%, fair for 23%, and poor for 65% (Wheildon, 2005, p. 47).

Wheildon also asked the participants who had shown either poor comprehension or good comprehension “leading questions” about their attitudes to the articles and the layout of the pages. He commented that “these responses were collected for anecdotal rather than scientific value” (Wheildon, 1990, p. 9), but he felt that they helped to explain some of the objective results (Wheildon, 2005, p. 138). He summarised the comments made by the 112 participants who had read an article intended to be of direct interest that had been presented in the sans serif typeface. Many of their comments referred to their difficulty in concentrating on the reading task. However, when they were then asked to read another article presented in the serif typeface, they reported no physical difficulties (Wheildon, 1990, p. 17; 2005, p. 48).

In introducing his research, Wheildon (1990, p. 16; 2005, p. 46) had mentioned only one previous study on the legibility of serif and sans serif typefaces (Pyke, 1926), and he did not acknowledge that the sheer size of the effect that he had found linking serif typefaces to better comprehension was clearly anomalous when taken in the context of the findings of all other research carried out up to that point. Nor did he comment on the apparent disparity between these results and his findings regarding the legibility of serif and sans serif typefaces when used in newspaper headlines (described in the previous section). Instead, he argued: “The conclusion must be that body type must be set in serif type if the designer intends it to be read and understood” (Wheildon, 1990, p. 17; 2005, p. 48).

Poole (2012) argued that Wheildon’s account of his research lacked key information that would enable a sceptical reader to evaluate the study. The introduction to the expanded version of Wheildon’s (1990, pp. 9–10) report and an appendix to Wheildon’s (2005, pp. 133–140) book do provide additional information about his research methods, but some important details are unclear. For instance, the report states that each of the articles extended over several pages (Wheildon, 1990, p. 9). However, the book states that they were designed to fit in four columns 5 cm wide and 30 cm deep on a single page, while the examples that are provided in the book indicate that some space on the page was taken up by a headline, a by-line, and two illustrations (Wheildon, 2005, pp. 34–35). Neither account mentioned either the final number of words in each of the articles or the time allowed to read the articles and to answer the comprehension questions (Wheildon, 1990, p. 9; 2005, pp. 33–48, 137–138).

There are some additional issues with Wheildon’s research. First, his general account of the research methodology suggested that each participant was asked to read one article at each visit, and that comparisons were made between their comprehension of different articles at different visits (Wheildon, 2005, p. 137). Nevertheless, he also mentioned a group of 112 participants who were tested on an article of direct interest in a sans serif typeface but who were tested on “another article with a domestic theme” in a serif typeface immediately afterwards (Wheildon, 1990, p. 17; 2005, p. 48). The latter arrangement is clearly more vulnerable to transfer or carry-over effects (for instance, due to practice or fatigue) than repeated testing separated by an interval of weeks or months.

Wheildon (1990, p. 9; 2005, p. 136) had designed his materials to measure the effects of several different variables simultaneously. For example, half the mock newspaper articles were designed to be of direct or broad interest to the participants, whereas the other half were designed to be of limited or specific interest. This manipulation produced a difference of 10 percentage points in the participants’ level of comprehension (Wheildon, 2005, p. 36). Other variables included the use of capital letters in the headlines, the use of colour in the headlines or in the text, the use of justified versus unjustified text, and the use of italic font in the text. Wheildon (2005, p. 136) explained this by arguing that the different manipulations were logically separate from one another, and hence there was no need to change the samples of participants to measure their effects. However, this ignores the possibility that these effects were not empirically separate from one another. In other words, the apparent difference in comprehension between material printed in serif and sans serif typefaces might simply have been an artefact due to a confounding of this manipulation with one or more of the other variables.

One further possibility is that of researcher bias. Wheildon’s (2005, pp. 24, 103) own comments make it clear that he had always had a deep antipathy towards sans serif typefaces. He himself had interviewed and tested all the participants in their own homes (p. 137), and so it is possible that his underlying attitudes to serif and sans serif typefaces might have (either intentionally or unintentionally) influenced how they had set about their task and thus might have influenced their comprehension. This issue should have been addressed by employing assistants who were blind (i.e., uninformed) as to the specific research hypotheses to test the participants.

4 More Recent Research

Schriver (1997, pp. 289–303) obtained examples of texts that might be read for each of four common purposes and presented each in both a serif typeface and a sans serif typeface with similar x-heights. The four purposes or “genres” were: (a) reading to enjoy (a two-page spread from a short story, presented in Bauer Bodoni and Univers); (b) reading to assess (a one-page business letter from a bank, presented in Palatino and Futura); (c) reading to do (a two-page spread from an instruction manual, printed in Times Roman and Helvetica); and (d) reading to learn to do (a two-page spread from a manual to help people estimate their taxes, printed in Garamond Light and Optima). Schriver presented these texts to 67 volunteers using a within-subjects design that counterbalanced the order of the texts and the order of the typefaces. The participants were asked to say which version of each text they preferred and also to say why.

Similar proportions of participants chose the serif and the sans serif typefaces. More detailed examination showed that on balance participants tended to prefer the serif typefaces for the short story and the tax manual, but they tended to prefer the sans serif typefaces for the instruction manual and the business letter. Their preferences were influenced by various factors related to the rhetorical context of each text: the mood or tone of the text; the density of the text; the contrast among the parts of the text; the legibility of the text; and the quality of printing. Schriver concluded: “This study suggests that people find serif and sans serif typefaces equally pleasing but that the situation in which they are reading may lead them to prefer one style over the other” (p. 302).

McCarthy and Mothersbaugh (2002) showed a fictitious advertisement about Ontario to 265 business students in their regular classes. Three aspects were varied independently and at random: serif versus sans serif typefaces taken from a family of artificial typefaces; 8-point type versus 10-point type; and types with x-heights of 50% or 70% of the associated capital letters. The participants were asked to read the material to themselves for 1 min and to circle the word that they were reading when the time limit was reached. They were then asked to read a control advertisement following the same procedure and were classified into fast or slow readers using a median split on the number of words that they had read.
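A median split of the kind used to classify the readers can be sketched as follows. The word counts are invented purely for illustration, and the handling of readers who fall exactly on the median is an assumption, since the account above does not say how such ties were treated.

```python
from statistics import median


def median_split(words_read):
    """Label each reader 'fast' or 'slow' relative to the sample median.

    Assumption: a reader exactly at the median is labelled 'slow';
    the source does not specify how ties were handled.
    """
    cutoff = median(words_read)
    return ["fast" if count > cutoff else "slow" for count in words_read]


# Hypothetical numbers of words read from the control advertisement.
control_counts = [180, 220, 205, 260, 150, 240]
labels = median_split(control_counts)
```

A split of this kind reduces a continuous measure of reading speed to two categories, so that readers who fall just either side of the cut-off are assigned to different groups.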

The number of words that they had read from the first advertisement was analysed by a between-subjects analysis of variance with the independent variables of typeface, type size, x-height, and reading skill. The use of a serif typeface led to better performance than the use of a sans serif typeface, but only for fast readers reading small typefaces and only for fast readers reading typefaces with a large x-height. The contrast between serif and sans serif typefaces had no significant effect for slow readers, for fast readers reading large typefaces, or for fast readers reading typefaces with a small x-height. These results suggest that any differences in the legibility of serif and sans serif typefaces will only arise as a result of very specific interactions with the effects of other features of the typeface and of the readers themselves.

In the studies mentioned in Sect. 5.4, Bartram (1982) and Rowe (1982) had only studied the connotations of typefaces when used for individual words. Even so, both maintained that these connotations would influence readers’ interpretations when the typefaces were used for regular narrative. E. R. Brumberger (2003b) set out to test this idea by comparing the connotations of typefaces and the connotations of texts in which they were used. In her first study, she asked 80 students to rate how much each of 15 descriptors applied to each of 15 typefaces. A factor analysis of their responses yielded three broad dimensions, which she labelled “elegance,” “directness,” and “friendliness.” Multidimensional scaling yielded a similar grouping of the 15 typefaces. Brumberger noted that the resulting categories were based on the semantic qualities of the typefaces, not their physical characteristics. In particular, each category subsumed both serif and sans serif typefaces (see also E. Brumberger, 2004; Mackiewicz & Moeller, 2004).

In her second study, Brumberger (2003b) presented another 80 students with 15 different texts, each containing 375 words, drawn from a variety of published sources. The participants were asked to read each text and to rate it on the same 15 scales. A factor analysis yielded three broad dimensions that were very different from those found in the first study. She labelled these new dimensions “professionalism,” “violence,” and “friendliness.” Brumberger argued that she had demonstrated that readers consistently ascribe particular personality attributes to particular typefaces and text passages. However, the lack of concordance between the results of the two studies contradicts the idea that the connotations of typefaces affect readers’ interpretation of the texts in which they appear.

Brumberger (2003a) selected one typeface that represented each of the dimensions in her first study: the cursive typeface CounselorScript for elegance, the serif typeface Times New Roman for directness, and the sans serif typeface Bauhaus Md BT for friendliness. She also selected one text that represented each of the dimensions in her second study: a passage from a psychology textbook for professionalism, an excerpt from a novel for violence, and an excerpt from an article on snowboarding for friendliness. Different groups of students were presented with the texts in different typefaces in a between-subjects counterbalanced design and were asked to rate the relevant text on the 15 scales used in her earlier study. There were significant differences in the ratings given to the three texts, but no significant difference in the ratings given to texts in different typefaces. Brumberger suggested that the “persona” of the texts might have overridden that of the typefaces (p. 230).

Gasser et al. (2005) asked 149 psychology students to read an information sheet about tuberculosis that was being used at a local health-care facility. They were presented with the information sheet in one of four typefaces: a slab serif typeface with monospacing (Courier), a serif typeface with proportional spacing (Palatino), a sans serif typeface with monospacing (Monaco), or a sans serif typeface with proportional spacing (Helvetica). (With monospacing, each character occupies the same width, but with proportional spacing different characters take up different amounts of horizontal space.) They read the material silently at their own pace and were then given a short attitudinal questionnaire as a distractor task. Finally, they answered six open-ended questions on key points in the material. The students who read the material in serif typefaces answered more of the questions correctly than did those who read it in sans serif typefaces, although the difference only just attained statistical significance. Gasser et al. suggested that the students had found information printed in serif typefaces easier to read (and thus easier to remember) because they were more familiar with such typefaces in their regular educational material.

Juni and Gross (2008) asked 102 university students to read two satirical articles from the New York Times; one was concerned with government issues and the other with education policy. They then rated each article on 12 qualities. One of the articles was presented in the serif typeface Times New Roman, and the other in the sans serif typeface Arial. For the government article, the version presented in Times New Roman was rated as significantly more angry and as significantly less cheerful than the version presented in Arial. For the education article, the version presented in Times New Roman was rated as significantly more frivolous than the version presented in Arial. It should however be noted that Juni and Gross found just three significant differences out of a total of 24 comparisons between the two typefaces without controlling for the possibility of spurious results due to chance variation (i.e., Type I errors), suggesting that the choice of typeface generally made little difference to their participants’ perceptions of the two articles.
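The concern about chance findings can be made concrete with a little arithmetic. The sketch below assumes a conventional significance level of .05 for each comparison and treats the 24 comparisons as independent. Neither assumption is stated in the account above, and ratings of the same article on different qualities are unlikely to be fully independent, so the figures are indicative only.

```python
from math import comb


def prob_at_least(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha)."""
    return sum(
        comb(n, i) * alpha**i * (1 - alpha) ** (n - i)
        for i in range(k, n + 1)
    )


n_comparisons = 24   # 12 qualities x 2 articles, as described in the text
alpha = 0.05         # assumed per-comparison significance level

# Number of "significant" results expected if no true differences exist.
expected_false_positives = n_comparisons * alpha

# Chance of three or more such results under the same null assumption.
p_three_or_more = prob_at_least(3, n_comparisons, alpha)

# Per-comparison threshold under a Bonferroni correction.
bonferroni_threshold = alpha / n_comparisons
```

Under these assumptions, about one spurious significant result would be expected by chance alone, and obtaining three or more would not be unusual (a probability of roughly .12). A Bonferroni correction would require each comparison to reach a threshold of about .002.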

5 Conclusions

It has been argued that the context of reading is a primary determinant of the legibility of different typefaces and the readers’ expectations of the legibility of what they are reading. Newspaper headlines have been used as a specific context in which researchers have studied the legibility and connotations of different kinds of text. Wheildon (1990, 2005) presented an extensive programme of research on the legibility of different kinds of text. However, his research has come under extensive criticism and suffers from further issues that have not been noted in previous research. Several researchers have subsequently considered the effect of variations in typefaces and the expectations of readers in different kinds of situations. In general, research on reading in context provides no convincing evidence for any difference in the legibility of serif and sans serif typefaces. Nevertheless, there is a suggestion that readers’ preferences and the connotations of serif and sans serif typefaces might well vary between different contexts.