1 Introduction

We would like to thank two anonymous reviewers and particularly Ingo Plag for comments, questions and suggestions.

Criticism of English spelling, calls for its reform, and even attempts at reform go back to the beginning. In his Orrmulum, composed in the late 12th century, at the dawn of Middle English, the monk Orrmin complained of people’s mispronunciation of the language and used an orthography of his own devising in which short vowels were followed by double consonants and long vowels by single consonants so as to distinguish the two sorts, which ordinary spelling did not. He also used two distinct letters for /g/ and /dʒ/, where the writing system to this day uses one symbol <g>. Richard Mulcaster, in his Elementarie (1582), published barely a century after Caxton printed the first book in English, remarked on the irregularity of English spelling: ‘forenners and strangers do wonder at vs, both for the uncertaintie in our writing, and the inconstancie in our letters.’ He provided a list of some 8000 words and proposed a number of spelling rules, including the use of silent <e> to distinguish ‘short’ and ‘long’ vowels (by then distinct vowel qualities), revisiting Orrm’s problem four centuries later. Complaints and attempts at spelling reform continued well into the 20th century, whose most famous (and wealthiest) reform advocates were George Bernard Shaw and Robert McCormick (publisher of the Chicago Tribune).

While the general public continued to agree about the inadequacies of English spelling, a small group of linguists began a contrarian effort in the late 1960s. In their magnum opus, Chomsky and Halle (1968) claimed that “conventional orthography is [...] a near optimal system for the lexical representation of English words” (p. 49). Their claim was based largely on the observation that the system seems to favor having “one representation for each lexical entry,” so that predictable vowel alternations in cases like combine/combination and even more radical changes like electric/electricity/electrician or sign/signal are ignored in favor of lexical stem consistency.

On a more general level, their claim raised the question of the extent to which a writing system, that of English in particular, encodes linguistic information besides sound, especially morphological and lexical information. In this vein, Aronoff (1978) studied the final letter sequences <our>, <or>, and <er>. More recently, Berg et al. (2014) showed that final <s> marks an inflectional suffix (/s, z, əz/), which marks the plural for nouns or the third person singular present for verbs. Almost all other instances of final /s/ that could be interpreted as the plural morpheme are spelled differently. The possessive is written <’s> with an apostrophe, the only English suffix to use this diacritic. Berg and Aronoff (2017, 2018) investigated derivational suffixes, showing that a number of English derivational suffixes have unique spellings and that these became more consistent over the last 500 years, along with but separate from the general trend towards more regular spelling at all levels. For example, the final letter sequence <ous> is now reserved largely for adjectives, with other spellings of unstressed /əs/, such as <us>, used for other parts of speech (e.g. porous vs. bonus). In subsequent work, Ulicheva et al. (2018) have shown that English spellers use these regularities.

In our article, we focus on another morphological unit, the stem, but not on the consistency in spelling within differently pronounced morphologically related words like <electric/electricity/electrician>. We will deal instead with differences in spelling among individual members of a set of stems that have the same sound but are lexically unrelated, such as <pair/pare/pear>. We will call such sets ‘heterographic stems’, or simply ‘heterographs’. In a phonologically driven written language like Italian, such spelling differentiation is impossible: <legge> ‘law’ and <legge> ‘s/he reads’ must be spelled identically because they are pronounced identically. The existence of sets of heterographs in English suggests that its orthographic system is indeed sensitive to lexical information.

Investigating this issue tells us something not only about the principles that are at work in the spelling of a particular morphological domain in English, the stem, but also about the fundamental problem of morphology, the mapping of form and meaning. By investigating heterographic stems, the present study will provide an answer to the question of whether it is important for writers (or the language they are writing) to express lexical differences between stems in their written forms. We will show that at least for English, differentiating homophonous stems does not seem to be of great importance. Homography, much like homophony, is often tolerated – at least when it comes to the spelling of stems. The spelling of affixes is markedly different in that homophonous affixes often have a distinct graphemic form. As a consequence, the English writing system is both phonographic and lexical/morphological; it all depends on the morphological unit we are looking at.

It is important to distinguish systematic regularities in spelling from the accidental detritus of history. Written language is always more conservative than spoken language. Frequently, pairs of words that were previously distinct in both speech and writing will merge in speech but leave their spellings distinct. Good present-day examples are pairs of words that have participated in the ‘pin/pen merger’ in (mostly southern) American varieties, where /i/ and /ɛ/ have merged before nasals. Speakers of these varieties still distinguish <pin> and <pen> in spelling, because the spelling system has not caught up with the pronunciation and because national and international norms encourage such distinctions. Such cases are of no interest to us, because they can be explained as simple accidents of history. But if we can find examples in which two always homophonous items have become orthographically distinct, then we have evidence for systematic distinction of homophones over time. Such is the case with the suffixes that Berg and Aronoff (2017) studied. Here we apply the results of their study of affixes to stems.

First, though, one must ask the very basic question: how pervasive is the pattern of distinguishing homophonous stems in written English? Although the phenomenon has been noted many times, ours is the first systematic effort to answer the basic question: given two or more homophonous stems, how likely is it that they will have different spellings? In other words, to what degree does a single English stem have a written form that identifies this and only this stem?

In order to answer these questions, some terminological clarifications are needed. Uniqueness means that a particular spelling is reserved for a particular morpheme. If a form is unique, it refers to one and only one morpheme. As noted above, single final <s> practically always encodes the morpheme -s. It is only very rarely part of a lexeme (as in lens). That does not hold for morphologically simple stems. The letter sequences that are used to spell monosyllabic words like cat, pun, or fat often recur in longer words:

  (1) [figure a: example words not reproduced]

Monosyllabic words are regularly part of larger words, but just on the formal side – whatever the meaning of cat is, it certainly is no part of the meaning of catapult. In a similar vein, Hockett (1961:29) states that morphemes consist of phonemes, but that not every sequence of the same phonemes represents the same morpheme:

It is important to note that the ‘composed of’ assumption has never been seriously taken to imply that every occurrence of a specified arrangement of specific phonemes constitutes an occurrence of one and the same morpheme. Apart from ordinary homophony (beet and beat), it is permissible for some occurrences of an arrangement of phonemes to be a morpheme, other occurrences of the same arrangement to be less or more than a morpheme—in my speech, cat and the beginning of catalog.

Psycholinguistically, a word like catapult does indeed activate cat (cf. Bowers et al. 2005). Yet this activation of the monosyllabic word apparently does not lead to processing difficulties, and thus there is no pressure for a unique spelling of words like cat.

We will thus say that stems like beet/beat are distinct—they are homophonous but they differ orthographically, and neither stem is unique because the same strings of letters also occur as part of other stems, beetle and beatify (we follow Ryan 2016:196f. with this terminology).

We restrict our investigation to morphologically simple stems. Otherwise, we would have to deal with cases like pleas/please, which are described more adequately with reference to affix spelling (the distinctiveness of the morpheme -s corresponds to the <se> spelling of the simple morpheme please).

2 Heterography

Heterography is usually considered to play an important role in the spelling of English words. For example, Rutkowska and Rössler (2012:216) stipulate “[t]he principle of heterography”: “Homophones are visually distinguished by having different spellings”. There are different orthographic means for distinguishing homophones; the spellings can differ in their consonant letters, their vowel letters, or both (cf. e.g. Venezky 1999:17ff., who lists some common orthographic means for distinguishing homophonous lexemes). Heterographs with differences in consonant letters are relatively rare, according to Carney (1994:403); they mostly involve different spellings of /k/ (<c>, <k>, <ck>, <qu>) and /s/ (<s>, <c>, <ss>), as well as alternations like <wr>/<r>, <wh>/<w>, <kn>/<n> and single vs. double consonants. Heterographs with differences in their vowel letters are, on the other hand, “the most fruitful source of homophones” (Carney 1994:403). Within this group, variation that involves tense vowels or diphthongs is more frequent than variation that involves lax vowels, as Carney (1994:404f.) finds.

Yet while it is true that there are many cases of heterography in English (as witnessed e.g. by the extensive list in Carney 1994:401ff.), it is also obvious that for a ‘principle’, heterography has a lot of exceptions. Carney (1994:399f.) compiles a list of these exceptions, that is, of homophonous words that are not spelled distinctly. In this vein, Ryan (2016:75) observes:

Among lexical spellings, the distinctiveness principle is applied selectively, so while there exist pairs such as mettle and metal, or flour and flower, this kind of semantic specialisation is not systematically applied across the system. Hence there are countless spellings used to represent different words that sound alike, as in bank, nail, back etc.

As a consequence, Ryan (2016:195) notes, “[d]istinctive spelling has seldom been treated as a fundamental principle of English spelling”.

From a psycholinguistic vantage point, Treiman et al. (2015) arrive at a similar conclusion. They ask participants to spell novel words that are homophonous to existing words—e.g. /θif/, supposedly a rare English word that means “an abnormal rattling sound in unhealthy lungs” (Treiman et al. 2015:8), according to the definition the participants were given. Treiman et al. find that roughly two thirds of the time, participants chose a homographic spelling (in the example above, <thief>). If heterography were indeed a fundamental principle of English spelling, we would expect participants to pick heterographic spellings instead (e.g. <theef>, <theaf>).

It thus seems evident that distinctivity has clear limitations in English spelling. Yet the precise scope and limitations of heterographic spelling in English have never been empirically determined. That is what we set out to do in the following sections.

3 Methodology

To determine the amount of heterography among homophonous words, we need a list of phonological forms that belong to more than one stem. We also need to decide for each phonological form whether the respective stems have distinct graphemic forms.

We use the CELEX corpus (Baayen et al. 1995) to do that, but with substantial modifications. CELEX is a lexical database for Dutch, English and German word forms and lexemes. We used the lexeme part for our investigation, which contains 52,447 entries. Each lexeme comes with the orthographic and phonological forms of its head word and information about its morphological structure and its morpho-syntactic properties. For the current investigation, we compiled a list of 7,004 morphologically simple words from CELEX. We automatically extracted all homophonous forms (i.e., sets of phonologically identical distinct lexemes) from this list. That led to a much smaller list of 727 homophonous word forms in CELEX (e.g. /fɜ:/ - ‘tree’/‘animal skin’).
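The automatic extraction step amounts to grouping lexemes by their phonological form and keeping every form that is shared by more than one entry. The following is a minimal sketch in Python under stated assumptions: the record layout (orthography, phonology, word class), the function name `homophonous_forms`, and the toy entries are illustrative only and do not reproduce the actual CELEX file format.

```python
from collections import defaultdict

# Toy lexeme records: (orthographic form, phonological form, word class).
# Hypothetical layout for illustration; this is NOT the real CELEX format.
LEXEMES = [
    ("fir", "fɜ:", "N"),
    ("fur", "fɜ:", "N"),
    ("peek", "pi:k", "V"),
    ("peak", "pi:k", "N"),
    ("water", "wɔ:tə", "N"),
    ("water", "wɔ:tə", "V"),
    ("cat", "kæt", "N"),
]

def homophonous_forms(lexemes):
    """Map each phonological form shared by two or more entries to those entries."""
    by_phon = defaultdict(list)
    for orth, phon, word_class in lexemes:
        by_phon[phon].append((orth, word_class))
    # Keep only phonological forms that belong to more than one entry.
    return {phon: entries for phon, entries in by_phon.items() if len(entries) > 1}

homophones = homophonous_forms(LEXEMES)
```

In this toy data the conversion pair water.N/water.V is returned alongside fir/fur and peek/peak, which illustrates why a purely automatic extraction needs further filtering.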

This list of homophonous forms is unsatisfactory for our investigation in two regards. First, it contains conversions like water.N/water.V. Most of the conversions are noun/verb pairs. These homophonous forms are not homonymous but are instances of the same stem, and we would not expect distinctive spellings for each form. After all, heterography would obscure the tight semantic relationship between the forms.

Second, our list of homophonous forms so far does not include homographic homophones. For any given morphologically simple phonological form to get more than one CELEX entry, one of three conditions has to be met (cf. Burnage 1995:15ff):

  • the phonological form has two orthographic forms (peek/peak)

  • the phonological form belongs to two word classes (water.V/water.N)

  • the phonological form has two inflectional paradigms (antennas/antennae; this is clearly marginal in English)

That means that forms like /deit/ (‘fruit’/‘point in time’) only have one CELEX entry—they are spelled the same way and they are both nouns. But cases like date are actually relevant for our investigation: We need to know in how many cases homophonous words are not spelled distinctively, i.e. each spelled differently from the other, and date is a case in point: Theoretically, spellings like <deight>, <dait> or <daite> are conceivable; they are licenced by the regular phoneme-grapheme-correspondences of English.

To compensate for these shortcomings of CELEX with regard to our investigation, we chose the following two-step remedy. First, we used a large printed dictionary to determine which of the homophones in our list are semantically or morphologically related and thus share the same stem. If words like water.V and water.N are listed as sub-entries under the same head word in the dictionary, we take them to share the same stem and thus exclude the phonological form from the list of homophones. We used a large bilingual English/German dictionary with 120,000 head words (Langenscheidt 2005) for this task and excluded 271 polysemous forms. The size of the dictionary matters only insofar as it should be large enough to contain the CELEX forms.

And second, we used a much smaller dictionary to add homographic homophones to the list. We manually extracted homophonous forms from a dictionary about the size of CELEX’s English lexeme part (Langenscheidt 2002, 60,000 entries). Why didn’t we use the dictionary introduced above (Langenscheidt 2005), which contains twice as many entries? While that would raise the number of homographic homophones, the number of heterographic homophones would remain the same (there is no way to find these systematically in a printed dictionary). This would distort the data base towards not differentiating homophonous forms. By using a dictionary that is roughly the size of CELEX, we cautiously expand the current list of homophones with 166 homographic homophones like colon (‘intestine’/‘punctuation mark’).

Of course, this does not solve the notoriously difficult distinction between polysemes and homonyms; it simply relegates it to dictionaries. However, it makes the distinction operationalizable, and we can compare the overall results for English with those for German.

After these operations, our list of homophonous forms contains 622 entries, of which four are purely orthographic variants like <gingko>/<ginkgo>; they are excluded, and we are left with 618 homophonous forms. We annotated whether these homophones are homographic or heterographic. If two stems share a phonological form but are spelled differently, they are heterographic. If three or more stems share a phonological form, we take this form to be heterographic as soon as at least one stem has its own spelling: /riŋ/ (<wring>, <ring> ‘accessory’, <ring> ‘sound’) would be classified as heterographic. Heterographic stems are further annotated according to how they are differentiated. Are the vowel letters different (as in e.g. <beet>/<beat>), are the consonant letters different (e.g. <nap>/<knap>), or both (e.g. <air>/<heir>)?
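The annotation scheme just described can be approximated mechanically. The sketch below is one possible implementation, not the procedure used for the study: it assumes lowercase spellings, treats <y> as a vowel letter (a simplification), and compares the vowel-letter and consonant-letter skeletons of the spellings. The names `skeleton` and `classify` are hypothetical.

```python
VOWEL_LETTERS = set("aeiouy")  # assumption: <y> always counts as a vowel letter

def skeleton(spelling, keep_vowels):
    """Return only the vowel letters (or only the consonant letters) of a spelling."""
    return "".join(ch for ch in spelling if (ch in VOWEL_LETTERS) == keep_vowels)

def classify(spellings):
    """Classify the spellings of all stems that share one phonological form."""
    distinct = set(spellings)
    if len(distinct) == 1:
        return "homographic"
    # Heterographic as soon as at least one stem has its own spelling.
    vowels_differ = len({skeleton(s, True) for s in distinct}) > 1
    consonants_differ = len({skeleton(s, False) for s in distinct}) > 1
    if vowels_differ and consonants_differ:
        return "heterographic: both"
    if vowels_differ:
        return "heterographic: vowel letters"
    if consonants_differ:
        return "heterographic: consonant letters"
    # Same letters in both skeletons, but interleaved differently.
    return "heterographic: letter order only"

examples = {
    "beet/beat": classify(["beet", "beat"]),
    "nap/knap": classify(["nap", "knap"]),
    "air/heir": classify(["air", "heir"]),
    "wring/ring/ring": classify(["wring", "ring", "ring"]),
    "date/date": classify(["date", "date"]),
}
```

A letter-skeleton comparison is deliberately crude; it reproduces the three examples given in the text, but borderline cases would still need manual checking.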

The operationalization we chose leads to a list of homophones and heterographs; other operationalizations will lead to slightly different lists. A few heterographs we expected to see are missing from our list, e.g. sheer/shear. However, we have no reason to suspect that any other operationalization will yield results that are very different from the ones reported below.

4 Results

Our corpus contains 618 homophonous forms. Of these, 292 forms have more than one written form. However, among them are ten forms like (1):

  (1) /bi:/ (bee/be), /bai/ (by/buy), /in/ (inn/in), /nəʊ/ (no/know), /wi:/ (we/wee)

These forms are differentiated graphemically, but we can explain this difference on other grounds; we do not have to invoke a heterography principle. Minimal content words contain at least three letters (cf. e.g. Jespersen 1928:149, Venezky 1999:86), but this constraint does not hold for function words. In each of the pairs in (1), one member of the pair is a content word (which needs at least three letters), and the other is a function word (which can be shorter). Accordingly, the ten forms like the ones in (1) can be motivated on the grounds of minimal word constraints; they have to be subtracted from the total count. That leaves us with 608 homophonous forms, of which 281 are heterographic. This is a ratio of 46%. As noted above, many homophones are homophones only in r-free varieties like RP. As a consequence, the ratio of heterographic homophones is even lower for rhotic varieties.
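The bookkeeping behind these totals can be retraced in a few lines. The sketch below uses only counts reported in the text; the variable names are illustrative.

```python
# Counts as reported in the text; variable names are illustrative only.
celex_homophonous_forms = 727  # extracted automatically from CELEX
excluded_same_stem = 271       # forms excluded via the large dictionary
added_homographic = 166        # homographic homophones added manually
orthographic_variants = 4      # e.g. <gingko>/<ginkgo>
minimal_word_cases = 10        # e.g. bee/be, inn/in

after_dictionary_steps = celex_homophonous_forms - excluded_same_stem + added_homographic
corpus_size = after_dictionary_steps - orthographic_variants
remaining_forms = corpus_size - minimal_word_cases

heterographic_forms = 281      # heterographic count reported in the text
ratio_percent = round(100 * heterographic_forms / remaining_forms)
```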

Compared to German (where the same investigation was carried out in Berg 2019), there are many more homophones in English (608 vs. 222), and the ratio of heterographic homophones is also much higher in English than the equivalent ratio for German (46% vs. 18%). Heterography is more widespread (and arguably more important) in the English writing system than in the German one. Note that German comes closer to the perceived ideal of writing systems: In these ideal systems, one letter represents one sound, and one sound is represented by one letter (the degree of phonographic consistency is high), and, most importantly for the discussion at hand, heterographic homophones (like right and write) have no place. The German writing system is certainly not completely consistent phonographically, but it is more consistent than English.

87% of the heterographic forms are phonologically monosyllabic, 12% are bisyllabic, and only 1% (two forms) are trisyllabic (calendar/calender, gorilla/guerilla). This is not really surprising. Overall, just over 50% of morphologically simple lexemes in CELEX are monosyllabic. But of course the probability of homophony is negatively correlated with the length of the stem: the shorter the phonological form of a stem is, the more likely it is that there exists an unrelated stem with exactly this form.

How does this heterography come about? As mentioned above, Carney (1994:403) suggests that variation in the spelling of vowels is “the most fruitful source of homophones”. This is true for our corpus: Of the 281 homophones that are heterographic, only 20% have different consonant letters alone (e.g. cannon/canon, phase/faze). In an additional 31%, both consonant and vowel letters differ (medal/meddle, rain/reign). That means that in almost 50% of the cases, the difference is exclusively in the vowel letters (tear/tare, course/coarse).

We can now dig a little deeper and group certain differences together (Table 1) into patterns. We include only those with six or more heterographic pairs. Each line contains a difference, but the cases reported may involve other differences as well (for example, the pair curb/kerb involves both a vowel difference—<u> vs. <e>—and a consonant difference, <c> vs. <k>).

Table 1 Number of heterographs for different vowel and consonant patterns in English

The five vowel patterns cover 62% of all vowel alternations; the five consonant patterns cover only 39% of all consonant alternations, which are many fewer to begin with. Consonant alternations are thus much more idiosyncratic overall (Fig. 1).

Fig. 1 Proportion of major patterns for the differentiation of homophonous forms for vowel patterns (left) and consonant patterns (right)

We can now use the patterns in Table 1 to determine to what degree they are used in differentiating homophones: How many other homophones besides the ones counted in Table 1 could potentially be spelled heterographically using the patterns? For example, 11 homophones are differentiated in spelling using variation in vowel + <r> combinations (fir/fur). As it turns out, there are 11 homographic homophones that contain /ɜ:/. These could employ this pattern to differentiate stems, but they do not (e.g. stern/*stirn, firm/*furm). That means the <Vr> pattern is used in 50% of all possible cases.
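The degree of utilization used here is simply the share of heterographic homophones among all homophones that could in principle employ a pattern. A minimal sketch, with a hypothetical helper name `utilization` and the <Vr> counts taken from the text:

```python
def utilization(heterographic, homographic):
    """Share of eligible homophones that actually use a pattern (0.0 to 1.0)."""
    eligible = heterographic + homographic
    if eligible == 0:
        raise ValueError("no eligible homophones for this pattern")
    return heterographic / eligible

# <Vr> pattern: 11 heterographic forms (fir/fur) vs. 11 homographic forms
# containing /ɜ:/ that could use the pattern but do not (stern, firm).
vr_utilization = utilization(heterographic=11, homographic=11)
```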

Before we begin, a few words about the patterns are necessary.

  • For the <VV> alternations, we only look at spellings of /i:/ because they are the largest sub-group. They make up almost half of the cases reported in Table 1. The other patterns are much rarer, e.g. <ai>/<ei> (faint/feint) or <oo>/<ou> (wood/would). All of the heterographs with different /i:/ spellings are monosyllabic. Accordingly, the relevant environment consists of monosyllabic homophones that contain /i:/.

  • <VV> vs. <Ve>: All homophones with this alternation are monosyllabic, so that is what we will restrict our search to. Phonologically, /ei/ (lane), /i:/ (gene), /ai/ (kite), /əʊ/ (lone), /u:/ (cute), /ɔ:/ (chord), and /eə/ (flare) have more than one way of spelling. For /u:/, there is an additional constraint: only /u:/ after /j/, /l/ or /r/ can be spelled with <u> + final <e> (cute, lute, rude).

  • <V> in unstressed syllables: All homophones that contain a monophthong in an unstressed syllable.

  • <i> vs. <y>: Three quarters of the homophones contain the diphthong /ai/ (e.g. die/dye); we will restrict the search to this pattern (homophones with /ai/).

  • <C> vs. <CC>: Here we look at patterns of phonologically monosyllabic homophones with a lax vowel followed by a single consonant from the set {/b/, /d/, /g/, /p/, /t/, /l/, /m/, /n/, /r/, /s/, /z/, /f/} (e.g. but/butt; this combination of factors is one of the major environments for double consonants, cf. Berg 2016)

  • <w>/<wh>: All homophones with initial /w/

  • <n>/<kn>: All homophones with initial /n/

  • <c>/<k>: All homophones with /k/ in the onset; all homophones with final /ɔ:k/ or /ɜ:k/ or /ŋk/

  • <r>/<wr>: All homophones with initial /r/
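Several of the environments listed above are simple conditions on the phonological form and can be written as predicates. The sketch below covers only four of them (initial /w/, /n/, /r/, and forms containing /ai/); the remaining environments need syllable and stress information that plain strings do not carry. It assumes transcriptions in the notation used here, given without slashes or stress marks, and the names `ENVIRONMENTS` and `candidate_patterns` are hypothetical.

```python
# Each pattern maps to a predicate over a transcription string,
# e.g. "riŋ" or "dai" (assumption: no slashes, no stress marks).
ENVIRONMENTS = {
    "<w>/<wh>": lambda phon: phon.startswith("w"),
    "<n>/<kn>": lambda phon: phon.startswith("n"),
    "<r>/<wr>": lambda phon: phon.startswith("r"),
    "<i>/<y>": lambda phon: "ai" in phon,
}

def candidate_patterns(phon):
    """List the patterns whose environment a phonological form satisfies."""
    return [name for name, applies in ENVIRONMENTS.items() if applies(phon)]

ring_patterns = candidate_patterns("riŋ")
night_patterns = candidate_patterns("nait")
cat_patterns = candidate_patterns("kæt")
```

A form can fall into more than one environment, as /nait/ does here, so counts per pattern are not mutually exclusive.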

Table 2 lists the number of heterographs (stems that are differentiated in spelling) and the number of homographs (stems that are not) as well as an example for homographic stems.

Table 2 The proportions of heterographic and homographic homophones among all homophones for different consonant and vowel patterns with examples for potential (but not realized) heterographs. Bold numbers represent the majority of cases for each category

Just one (sub-)pattern is utilized consistently: whenever there are homophones that contain /eə/, they are distinguished in spelling with the <VV>/<V> + final <e> difference (fair/fare, stair/stare, wear/ware). Most of the patterns, however, are not even used in the majority of cases where they might be. For most patterns, there are more homophones that could potentially be differentiated than there are heterographic variants that use this pattern.

Heterography, or distinctivity of stems, is not a major spelling principle in English, we can conclude. If it were, we would expect a much higher degree of utilization for the patterns. Most homophones that could be spelled distinctively are actually not. That naturally leads to the question of why the English writing system is neither fully phonographic in this respect (all homophones are homographic) nor fully lexical/morphological (every stem has its own distinctive spelling), but rather stuck in this strange half-baked state of affairs. As usual in situations like these, a good hypothesis is that we are dealing with historical residues.

5 History

The heterographic spelling of homophonous stems can happen for (at least) two historical reasons. First, two homophonous and homographic stems can be differentiated over time. Stem A and stem B are formally indistinguishable at some point in time (they have identical phonological forms /P1/ and identical orthographic forms <O1>), but at a later stage, each stem has its own distinct orthographic representation (the phonological representations are still the same) (Fig. 2).

Fig. 2 Schematic representation of graphemic differentiation over time

This is the development that should be expected under a strong principle of heterography: Each stem should have a distinct spelling, and if that is not already the case at a given point in time, it is established.

A special case of this is stem differentiation: A stem has spelling variants, and one variant becomes associated with a new and distinct meaning. As Carney (1994:407–410) shows, there are some examples of this in the history of English, e.g. tire and tyre. Both started out as spelling variants of the stem meaning ‘dress, apparel’, but <tyre> became unfashionable. It was “revived in British English […] in the nineteenth century for the tyre (i.e. ‘clothing’) of a wheel” (Carney 1994:415). Cases like these are probably rare (Ryan 2016:195).

The second reason for heterographic stems is sound change. Two stems have different phonological and orthographic forms at T1. Sound change then leads to a merger of sounds and thus to homophonous stems. The spelling of these stems, though, is retained, which leads to the same constellation at T2 as above (homophonous and heterographic stems) (Fig. 3).

Fig. 3 Schematic representation of graphemic conservation over time

It turns out that most of the patterns that are used in distinguishing homophones (cf. Table 1 above) can actually be explained in this manner, as instances of an older phonological distinction that was lost through sound change but is retained in the spelling.

For the vowel patterns, this holds for the <VV>/<V> + <e> alternation (e.g. <loan>/<lone>). Final <e> represented a phonological vowel until the 14th century (Dobson 1968:879); these words were thus not homophonous. The same holds for the <ea>/<ee> alternations (like <leak>/<leek>). These too represented distinct vowel phonemes once, but had merged by the 18th century (Wells 1982:194ff.). Likewise, the <V> + <r> alternations (like <fir>/<fur>) were heterophonous once but had merged by the 17th century (Wells 1982:199f.).

Many consonant patterns are also the result of changes in the phonological system: <r> and <wr> (as in <right> and <write>) were once distinct phonologically (Dobson 1968:975, Ryan 2016:180), as were <w> and <wh> (as in <witch>/<which>, cf. Dobson 1968:974) and <n> and <kn> (as in <night>/<knight>). This latter distinction was neutralized relatively late; orthoepists still demanded a phonological difference until well into the 17th century, cf. Dobson (1968:976). Some varieties still maintain the preaspirated pronunciation of <wh>, but it has merged in all the standard varieties. In all these cases, we do not observe a differentiation of stems, but a conservation of older graphemic spellings that once very probably had a phonographic reality.

6 Discussion

Science begins with speculation but it can only reach a conclusion through thorough observation. 20th century linguists, in their desire to find beauty in even the most apparently disorderly phenomena, speculated that the notoriously difficult Modern English spelling was nearly optimal once one delved beneath the phonological surface. Previous researchers have looked in detail at the spelling of suffixes and have found that yes, the system encodes individual suffixes at the expense of phonological transparency. In this article, we have focused on another frequent speculation, that the orthography distinguishes homophonous monomorphemic stems systematically, and have found no evidence for anything more than historical accident. It seems as though for generations of English writers, typesetters and printers, it was not important to express lexical differences between stems in their written forms. Readers do not seem to need this extra help, just as listeners are able to deal with homophony. Languages in general tend to tolerate homonymy much more than they tolerate synonymy, and our paper adds further evidence to this observation. This state of affairs is markedly different from derivational suffixes, where we can actually observe processes of differentiation. In Berg and Aronoff (2018), we demonstrated one such case of the emergence of distinctivity over time. The diminutive suffix -y is gradually leaving the set of -y-suffixes (apart from diminutive -y: adjectival -y as in windy, and nominal -y as in harmony): These diminutives are increasingly often spelled with <ie> instead of <y> (e.g., <cabbie> instead of <cabby>). This shows us that stems and (derivational) affixes are different animals when it comes to naming them: While English writers have no problem with stem homonymy, they prefer their affixes distinct.

To return to the question that appears in the title: The English writing system is both phonographic and morphological/lexical; it all depends on which morphological unit we focus on. Homophonous affixes often have distinct spellings, and we can observe the differentiation of affix spellings over time. Homophonous stems, on the other hand, are often spelled alike; when they are not, this is rather because writing conserves historical phonological differences.