Introduction

The ability to recognize written representations of words is considered foundational for fluent reading acquisition and comprehension. As a pivotal process in literacy, word reading has been the focus of an extensive body of psycholinguistic research. For skilled adult readers, this research points to well-specified representations of words’ phonology, orthography, and meaning within the mental lexicon (Perfetti, 2007). While there is agreement that readers must learn to map between orthography (print) and phonology (speech) in order to acquire and master such decoding abilities (Verhoeven & Perfetti, 2017), the specific properties of writing systems that are most cognitively relevant to the reading brain have not been entirely spelled out.

Skilled readers of alphabetic languages are able to ‘cipher’ or decode known and unfamiliar words using acquired orthographic-phonological mappings (Ehri & Wilce, 1987), otherwise referred to as grapheme–phoneme correspondences (GPCs), where ‘graphemes’ refer to single letters or letter clusters that correspond to a single ‘phoneme’ or speech sound. Readers are also adept spellers, and so they have also acquired phoneme-grapheme correspondences (PGCs). To establish these mapping systems (GPCs and PGCs), beginning readers take into account the statistical regularities implicit in the written and spoken language, and the regularities of the correspondences between them.

Regularities come in multiple guises, for example in the way that phonemes are combined within spoken words—phonotactic regularities. For instance, the phoneme /ŋ/ appears only at the end of words in English, but at the beginning of words in Swahili. Such phonological regularities are often reflected in written words—as orthotactic and graphotactic regularities. For example, the letter sequence NG likewise appears at the end of English words but not at the beginning, and noticing this regularity can help the learner map it onto the phoneme /ŋ/.

However, orthographic systems are often compromise solutions between print and sound, as they are the historical product of layered adaptations, idiosyncratic habits handed down and conventionalized over centuries, and consequences of language contact. For example, the Roman alphabet, originally containing 23 letter symbols, was progressively adopted by several languages in Europe and beyond with fairly different phonemic systems and inventories. When the Anglo-Saxons, linguistic ancestors of English speakers, adopted the Roman alphabet to represent the sounds of their own language, they had to confront the fact that the alphabet contained only five graphemes to indicate vowels, while today’s English varieties contain at least 21 phonemic vowels. Because of multiple historical facts such as these, for any given natural language the print-sound mappings—and thus the underlying statistics upon which learning occurs—can be more or less regular. For instance, the grapheme NG also occurs in the middle of English words to map to a different set of phonemes /ndʒ/, as in the word ENGINEER. Or the grapheme CH can map onto three different phonemes: /k/ as in CHAOS, /ʃ/ as in MACHINE, and /tʃ/ as in CHINA. More consistent orthographies, like Finnish or Italian, exhibit fewer and more regular GPC and PGC patterns than English, and thus an overall more economical mapping between print and sound. For instance, the grapheme CH maps onto a single phoneme /k/ in Italian. Less consistent orthographies contain more quasi-regularities, where one grapheme can match more than one phoneme, or phonemes can have inconsistent spellings.

One direct consequence of varying degrees of consistency is that reading is acquired at a comparatively slower rate by readers of more inconsistent graphophonemic systems (Ellis & Yuan, 2004; Georgiou, Parrila, & Papadopoulos, 2008; Florit & Cain, 2011; Frith, Wimmer, & Landerl, 1998). Moreover, within any alphabetic language, more consistent words are read faster and more accurately (Jared, 2002), and this principle also applies to words within more consistent orthographies (Ventura, Morais, Pattamadilok, & Kolinsky, 2004). Thus, besides identifying scripts with more opacity and inconsistencies, it is important to better understand and identify the degree of consistency/inconsistency of words within a language’s script, and how it affects word recognition. In the present study, we examined to what extent the accuracy and latency of word recognition, estimated from a large collection of adult participant responses, are affected by various measures of print-speech consistency. While our method was applied to English and native speakers of English, we documented and share all procedures and computational pipelines, so that they can be readily applied to other alphabetic systems in future studies.

The current study

The first goal in this paper was to review several dimensions of word consistency proposed in the literature, and subsequently assess which best accounts empirically for the ease or difficulty of word reading by experienced adult readers. We quantified sublexical features that make English words more or less regular in orthography-to-phonology and phonology-to-orthography mappings.

Because these measures have mostly been studied individually, we asked whether a combined, word-level measure captures more systematic psycholinguistic behavior in word identification. Mapping print-sound regularities can occur at different levels of granularity, both from spelling-to-sound (e.g., Hino & Lupker, 1996; Stanovich & Bauer, 1978; Waters & Seidenberg, 1985) and in the opposite direction of sound-to-spelling (e.g., Balota, Cortese, Sergent-Marshall, Spieler, & Yap, 2004; Chee, Chow, Yap, & Goh, 2020; Ziegler et al., 1997b). We combed the literature for the various measures proposed and calculated them for thousands of words in a large and representative corpus of English.

The second goal of this paper was thus to ask whether the contributions of orthography-to-phonology and phonology-to-orthography mappings differ depending on the lexical task at hand, i.e., when it is based on visual processing (such as naming or recognizing a written word), and when it is based on auditory processing (such as recognizing a spoken word). To do so, we directly compared the degree of fit of orthography-to-phonology and phonology-to-orthography consistency measures in predicting behavioral visual response data from the English Lexicon Project (ELP; Balota et al., 2007; see Study 1) against data from a new large auditory and production dataset (the Massive Auditory Lexical Decision database, MALD; Tucker et al., 2019; see Study 2).

In particular, in Study 1, we analyzed behavioral data from the ELP, which contains response times and accuracy from a word naming task administered to readers of North American English. Based on previous findings, we hypothesized that consistency defined at different granularities shows only moderate overlap, and that a combined measure of consistency across granularity and mapping direction should explain more variance in visual word-recognition performance than individual components (Siegelman, Kearns, & Rueckl, 2020). We found that a composite measure of feedback consistency best accounted for word naming latencies.

In Study 2, we applied the same corpus-derived measures of word consistency to predicting word-recognition performance on a different word task—lexical decision—in both the visual and auditory modalities. Following prior studies, we hypothesized that feedforward consistency should facilitate visual lexical decision performance (Jared, 2002), while feedback consistency should facilitate auditory lexical decision (Grainger & Ziegler, 2011). However, we found that feedback consistency measures were also the best predictors of visual lexical decision times.

By the end of Study 2, two considerations became apparent, and we decided to tackle them in Study 3. One consideration is that several dimensions of statistical quasi-regularity between orthography and phonology embedded in the (English) lexicon may be subtle enough to go unaccounted for by the measures used in Studies 1 and 2, as in general they may be difficult to identify exhaustively in researcher-driven analyses. Such undetected patterns of sub-regularity may account for unexplained variance in lexical processing. We thus asked whether a data-driven, machine learning approach implemented in neural networks could contribute improved overall measures of GPC and PGC consistency for English words. Modeling reading processes with neural networks has an established tradition since the seminal work of Seidenberg and McClelland (1989), and dovetails with a growing body of empirical evidence that characterizes learning to decode printed words as a form of statistical learning. Because the neural networks we implemented incorporate algorithms of statistical learning and were not taught orthography-phonology mappings explicitly, they represent valid candidate models of what could be learned implicitly from printed words, and of how a data-driven approach resolves the mapping problem. In Study 3, we asked whether this data-driven approach to word consistency provides a better predictor of lexical decision performance than the corpus-derived measures of consistency.

A second consideration for modeling consistency using neural networks is of theoretical relevance and emerged from Studies 1 and 2. We found that processes of word identification may rely on resonant bidirectional flows of information relating print to sound and sound to print, perhaps more than has been acknowledged in the literature. This was evident in sound-to-print effects in the word naming task as well as in the lexical decision tasks, both visual and auditory.

Neural networks lend themselves naturally to modeling interactive effects directly, when forward and backward information flow is implemented explicitly in architectures that are bidirectionally connected. Therefore, we set out to train bidirectional neural networks on orthographic-to-phonological mappings (thus simulating reading aloud visually presented words) as well as on phonological-to-orthographic mappings (thus simulating spelling spoken words). The ease and accuracy of the models in solving the mapping problem after training provide a natural alternative metric of word consistency: that is, the closeness to the target phonological word when the network is prompted with an orthographic word as input, and vice versa, the closeness to the target orthographic word when the network is prompted with a phonological input word. In a final set of regression analyses aimed at predicting human behavioral performance in naming and lexical decision tasks, we compared the fit of our best research-driven consistency predictors (from Studies 1 and 2) with the data-driven, neural network consistency predictors obtained in Study 3. To the extent that these networks are bidirectionally connected, they should maximally extract latent quasi-regularities while learning to associate print to sound and vice versa. As a consequence, their performance on individual words could predict human lexical decision performance with greater sensitivity than the corpus-derived measures of consistency obtained in Studies 1 and 2. If neural networks indeed provide a better fit to the human data, we argue that the consistency metrics extracted from their training should be considered a valid holistic measure of individual words’ consistency in psycholinguistic research. The practical value of this approach should not go unnoticed, as training neural networks has become reasonably fast with modern computers. Therefore, obtaining word-level consistency measures across different languages would be conveniently less resource-intensive, at least compared to the manual procedure necessary to identify and extract hundreds of language-specific GPC and PGC mappings (as in Studies 1 and 2).

Finally, a third goal of this paper was to make available to the scientific community empirical measures of word consistency that can be adopted as a benchmark for future research, both experimental and computational, as well as for educational purposes. We share our data publicly in the hope that they can be incorporated into current and next-generation psycholinguistic datasets. From an educator’s standpoint, knowing which sets of words may be problematic to learn would allow one to order instruction in line with such challenges, and knowing which patterns of consistent sub-regularities can be capitalized on would likewise help reading instruction. Thus, educational researchers and educators may find the ranking of English words by statistical consistency, using the single composite metrics we obtained, useful when selecting words for experimental tests or for introducing them at different stages of the school curriculum. The resource we offer can thus have both scientific and educational value.

In sum, in the studies that follow we extracted from language corpora consistency measures defined across (a) different sublexical units and (b) different print-sound directions (feedforward, feedback), with the goal of finding which measure best predicts human performance in (c) three word-recognition tasks. The three studies combined contribute to characterizing the statistical structure of English words in relation to mapping print to sound and sound to print.

Corpus-derived estimates of reading consistency

In this section, we review dimensions of quasi-regularity that have been advanced in the literature, and empirically calculate corpus-derived measures of such regularities for a sizeable portion of English words. A common way of framing the concept of regularity is to consider alphabetic reading as involving identifying words that follow typical spelling-sound patterns, or rules, but also words that do not adhere to these rules. Therefore, to balance the two demands of alphabetic reading, the reader must generalize the rules to ‘consistent’ words, and also learn the exceptions of ‘inconsistent’ words. This has been extensively examined in the psycholinguistic literature (Fodor & Pylyshyn, 1988; Glushko, 1979; Taylor, Plunkett, & Nation, 2011). In one area of research, the distinction is made between categories of regular words that follow spelling rules (e.g., MIST) and irregular words that do not (e.g., YACHT; Castles & Coltheart, 1993). One theoretical approach proposes that each category is handled by a separate cognitive process—applying GPC rules to decode regular words, or using a mental lexical lookup table to identify irregular words (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001).

Other theoretical work considers consistency as a continuous dimension (Jared, 1997), whereby words can more or less follow similar pronunciations from similar spellings. For example, in English, the letter N often denotes the phoneme /n/, but letter combinations containing non-pronounced letters such as KN and GN also denote this phoneme, as in KNOW, KNEE, GNAT, SIGN, and so on. From the perspective of an implicit learner, such mappings are informative sub-regularities rather than random “exceptions” (Arciuli, 2018). Indeed, degrees of word consistency affect word naming and lexical decision times for adult readers, with faster responses for consistent words (Andrews, 1982; Jared, 1997, 2002). Children also show better accuracy for reading and spelling of consistent words (Alegria & Mousty, 1996; Lété, Peereman, & Fayol, 2008; Weekes, Castles, & Davies, 2006). Thus, consistency as a continuum is an important factor within a language, just as it is between shallow and deep alphabetic languages (the orthographic depth hypothesis; Katz & Feldman, 1983).

In the literature, consistency has been operationalized in different ways (Borleffs, Maassen, Lyytinen, & Zwarts, 2017). Here, we aim to review them separately and then consider them jointly to establish a combined measure of consistency for English words. In some cases, the consistency of words has been computed at the grapheme level (Berndt, Reggia, & Mitchum, 1987), whereby the various pronunciations of a grapheme are tabulated across a corpus of words. For example, graphemes often have more than one possible pronunciation (e.g., E → /ɛ/, E → /i/, E → /ə/), and consistency is defined by the variability of the pronunciations assigned to a particular graphemic unit (a single letter, A, or cluster of letters, AY). A word’s consistency can then be taken as an aggregate of the word’s grapheme consistency levels. Others have defined consistency at the subword level for rime spelling patterns (Jared, 1997), the rime being the vowel nucleus plus any ending consonants. In this case, there are “friends”, which are words with shared rime spellings and pronunciations (HINT, MINT, TINT), and “enemies”, which are words that have similar rime spellings but different pronunciations (PINT). A word’s consistency is then calculated as a ratio between friends and enemies (Jared, 1997). Still another proposed way to compute consistency involves all subword components, namely the onset (initial graphemes coming before the vowel), the vowel (nucleus), and the coda (ending graphemes coming after the vowel; Kessler & Treiman, 2001).

Thus, different psycholinguistic units have been postulated as the basis for determining word consistency: from grapheme units, to subsyllabic onset-vowel-coda units, to rime patterns (as shown in Fig. 1). For the beginning reader, decoding words from these print units to the mapped speech units first requires a segmentation process, which is non-trivial. Delineating subword patterns is complicated by the fact that units corresponding to a single phoneme also differ in granularity, or the number of letters contained in the graphemic unit. Subword patterns become unitized for experienced readers, as demonstrated when adults are slower to identify individual letters within a multi-letter grapheme (Rey, Ziegler, & Jacobs, 2000); another essential part of learning to read thus involves this process of unitization. The mapping process, in turn, involves pronunciation variability that may be affected by word context, such as non-sequential letter patterns, like the silent vowel E which can affect the pronunciation of the previous vowel (e.g., PLANE → /pleɪn/, instead of /plæni/). Both granularity and consistency, then, are important aspects of language structure that impact reading acquisition and performance.

Fig. 1 An illustration of the hierarchy of psycholinguistic units in printed words, and how they map to phonological units. Adapted from Ziegler and Goswami (2005)

As defined above, consistency may depend on the level of the units for which it is evaluated. For example, rime patterns are held to play an important role in the pronunciation of printed words (Treiman, Mullennix, Bijeljac-babic, & Richmond-Welty, 1995). Consider the word PINT (/paɪnt/). At the rime level, it is an inconsistent word because it is pronounced differently than other words sharing its rime spelling pattern, like MINT (/mɪnt/) and TINT (/tɪnt/), and these two mappings have different probabilities (INT → /aɪnt/, p = 0.04, versus INT → /ɪnt/, p = 0.91). Yet, at the grapheme level PINT (/paɪnt/) has an overall predictability across its graphemes of p = 0.87 (P → /p/, p = 1.00; I → /aɪ/, p = 0.49; N → /n/, p = 1.00; T → /t/, p = 1.00), calculated based on the average of the ratio of each GPC probability and the most probable correspondence for that combination (Berndt et al., 1987).

Even in cases where the rime pattern is consistently pronounced across words (such as AND → /ʌnd/, p = 0.92), its vowel is often inconsistent across words (A → /ʌ/, p < 0.01). Siegelman et al. (2020) address this important issue for operationalizing consistency and suggest alternative methods focused on uncertainty using information theory, as described below. Here, we compare the different methods previously used for deriving consistency.

We first apply these various definitions of consistency across a word corpus and examine their interrelations, along with a new integrated measure of consistency. We then examine how well the different measures of consistency predict recorded human response times for visual word processing (from the ELP database; Balota et al., 2007) and, in Study 2, additionally for auditory processing (from the MALD database, version 1.1; Tucker et al., 2019). The ELP contains behavioral data from 1260 participants across six different universities who responded to 40,000 words in a visual naming task and a visual lexical decision task, while the MALD database comprises response data for 26,793 words and 9592 pseudowords in an auditory lexical decision task from 231 unique monolingual English listeners.

Method

Corpus

For the present study, we selected only the monosyllabic words from the Massive Auditory Lexical Decision (MALD) database (Tucker et al., 2019; N = 4,347) to derive and compare their consistency. We used the subtitle-based SUBTLEX-US (Brysbaert, New, & Keuleers, 2012) frequency measure to compute frequency-weighted consistency measures. Tucker et al. (2019) previously found that the SUBTLEX-US frequency count best explains frequency effects on response times when compared to the Corpus of Contemporary American English (COCA; Davies, 2009) and the Google Books n-gram corpus.

The MALD database is a freely available auditory data set for psycholinguistic research, providing time-aligned stimulus recordings for 26,793 words and 9592 pseudowords, and response data for 227,179 auditory lexical decisions from 231 unique monolingual English listeners.

Consistency at different granularities

To capture multiple levels of consistency for each word more holistically, we computed four sub-level consistency measures proposed by Berndt et al. (1987), Jared (1997), Kessler and Treiman (2001), and Borgwaldt, Hellwig, and De Groot (2005), corresponding to the grapheme, rime, onset-vowel-coda (OVC), and onset levels, respectively (see Fig. 1).

Grapheme consistency

The first measure captures word consistency at the grapheme level (referred to as grapheme consistency from here onwards; Berndt et al., 1987), which requires the probabilities of grapheme–phoneme associations to first be computed as they occur in the corpus (e.g., the probability of the grapheme EW being pronounced as /o/ is p(/o/|EW) = 0.06). Using these probabilities, the overall consistency of a word’s pronunciation is defined as the average of the ratio of each probability (e.g., p(/o/|EW) = 0.06) and the most probable correspondence for that grapheme (e.g., pmax(EW) = 0.94). For example, the overall grapheme consistency for the word SEW is calculated by averaging the ratios for the graphemes S (p(/s/|S) = 0.63 / pmax(S) = 0.63) and EW (p(/oʊ/|EW) = 0.63 / pmax(EW) = 0.94), resulting in the value 0.83.
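
To make the computation concrete, the following minimal R sketch derives the Berndt et al. (1987) measure from a toy GPC probability table; the table, its values, and the helper function names are our own illustrative assumptions, not the actual corpus-derived quantities.

```r
# Toy grapheme-phoneme probability table (illustrative values only)
gpc <- data.frame(
  grapheme = c("S", "S", "EW", "EW", "EW"),
  phoneme  = c("s", "z", "u",  "oU", "o"),
  prob     = c(0.63, 0.37, 0.69, 0.25, 0.06)
)

# Word-level grapheme consistency: average, over a word's graphemes, of
# p(phoneme | grapheme) divided by the grapheme's most probable mapping
grapheme_consistency <- function(graphemes, phonemes, gpc) {
  ratios <- mapply(function(g, p) {
    rows <- gpc[gpc$grapheme == g, ]
    rows$prob[rows$phoneme == p] / max(rows$prob)
  }, graphemes, phonemes)
  mean(ratios)
}

# SEW parsed as S + EW and pronounced /s/ + /oU/
grapheme_consistency(c("S", "EW"), c("s", "oU"), gpc)
```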

Rime consistency

The second measure is at the orthographic rime level (referred to as rime consistency from here onwards; Jared, 1997). It is calculated as the proportion of friends among the words that are orthographically similar in that they share vowel and coda spellings (e.g., the neighborhood PINT, MINT, TINT). For example, for a word ending in INT, the rime consistency was defined as the number of friends relative to the total number of friends plus enemies—where a friend is a word with the same orthographic rime unit and the same pronunciation of that unit, and an enemy is a word with the same orthographic rime unit and a different pronunciation.
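
A sketch of the friends/enemies ratio in the same vein follows; the toy lexicon and column names are our own assumptions. A frequency-weighted (token) variant would follow by summing SUBTLEX-US frequencies instead of type counts.

```r
# Toy lexicon of orthographic rimes and their pronunciations
lexicon <- data.frame(
  word      = c("hint", "mint", "tint", "pint"),
  rime      = c("int",  "int",  "int",  "int"),
  rime_pron = c("Int",  "Int",  "Int",  "aInt")
)

# Rime consistency = friends / (friends + enemies), counted over
# the other words that share the target's rime spelling
rime_consistency <- function(w, lexicon) {
  target    <- lexicon[lexicon$word == w, ]
  neighbors <- lexicon[lexicon$rime == target$rime & lexicon$word != w, ]
  mean(neighbors$rime_pron == target$rime_pron)
}

rime_consistency("mint", lexicon)  # ~0.67: HINT and TINT are friends, PINT an enemy
rime_consistency("pint", lexicon)  # 0.00: every INT neighbor is an enemy
```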

OVC consistency

The third consistency measure considers the grapheme-to-phoneme consistencies of the onset, vowel, and coda of words (referred to as OVC consistency from here onwards; Kessler & Treiman, 2001). Kessler and Treiman (2001) proposed a measure termed conditional consistency, calculated on one part of the word while holding constant some other part of the word. For example, one could compute the reading consistency of the vowel letter I when the coda is NT. A total of nine probability values (three unconditional and six conditional probabilities) were computed for each word by taking into account the letter strings of each of the three parts (onset, vowel, coda) and the combinations of any two parts (e.g., onset-vowel, onset-coda) of the syllable.
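
As an illustration, one such conditional probability can be sketched as follows; the parsed data frame and its rows are hypothetical.

```r
# Toy onset-vowel-coda parse of a handful of words
syll <- data.frame(
  vowel      = c("i", "i", "i", "i"),
  vowel_pron = c("I", "I", "I", "aI"),
  coda       = c("nt", "nt", "nt", "nt")
)

# Conditional consistency of the vowel given the coda:
# p(vowel pronunciation | vowel spelling, coda spelling)
cond_consistency <- function(v, vp, cd, syll) {
  pool <- syll[syll$vowel == v & syll$coda == cd, ]
  mean(pool$vowel_pron == vp)
}

cond_consistency("i", "I",  "nt", syll)  # 0.75 for MINT-like readings
cond_consistency("i", "aI", "nt", syll)  # 0.25 for the PINT reading
```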

Onset consistency

The last measure focuses on the onsets of words and computes the consistency of word-initial letter-to-phoneme correspondences. Onset consistency has been found to influence reaction times in reading tasks (Glushko, 1979; Treiman et al., 1995) and plays an important role in lexical access tasks (Marslen-Wilson & Welsh, 1978; Marslen-Wilson & Zwitserlood, 1989). Here, we considered the different pronunciations of first letters as in Borgwaldt et al. (2005) and computed the extent to which words with the same first letter also have the same first phoneme. For example, English words that begin with the letter W may have a different first phoneme: /w/ as in WING, p(/w/|W) = 0.94; /r/ as in WRAP, p(/r/|W) = 0.05; and /h/ as in WHOM, p(/h/|W) = 0.06.

From probabilities to information-theoretic measures

The conditional probabilities described above were later converted to surprisal, entropy, and information gain (IG) bits—indices borrowed from information theory (see also Siegelman et al., 2020).

Surprisal captures the unpredictability of a given grapheme-to-phoneme correspondence and, unlike probability, makes fine distinctions between low and very low probabilities via a non-linear logarithmic transformation:

$$ S_{i} = -\log_{2} p(i) $$
(1)

where p(i) is the probability of an event i (e.g., p(/o/|EW)). Contrary to probability, higher surprisal values represent more surprising pronunciations, and surprisal has been found to predict behavioral indices of language processing difficulty better than probability (e.g., Smith & Levy, 2013).

Entropy captures the unpredictability in the distribution of possible pronunciations of an event (e.g., how unpredictable a grapheme is given all its possible pronunciations) and is computed by summing the surprisal of each event (Si) multiplied by the probability of the event’s occurrence, p(i):

$$ E = -\sum\limits_{i} p(i) \log_{2} p(i) $$
(2)

Entropy was first introduced in Shannon’s information theory (Shannon, 1948), and earlier psycholinguistic studies have used entropy to investigate processing difficulty in human sentence comprehension (e.g., Levy, 2008).

Lastly, IG was computed for each word as the difference between entropy and surprisal (E − S), which quantifies the predictability of a grapheme-to-phoneme correspondence given the unpredictability of the grapheme. All analyses were performed on IG bits from here onwards.
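
The conversion itself is a one-liner per index; a minimal R sketch over an assumed pronunciation distribution for a single grapheme:

```r
# Illustrative distribution p(phoneme | grapheme) for one grapheme
p <- c(u = 0.69, oU = 0.25, o = 0.06)

surprisal <- -log2(p)              # Eq. 1: S_i = -log2 p(i)
entropy   <- -sum(p * log2(p))     # Eq. 2: E = -sum_i p(i) log2 p(i)
info_gain <- entropy - surprisal   # IG = E - S_i, per correspondence

round(surprisal, 2)  # rarer pronunciations are more surprising
round(info_gain, 2)  # predictable correspondences receive higher IG
```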

Feedforward and feedback consistency

Typically, the mapping from pronunciation to spelling is less consistent than the mapping from spelling to pronunciation, and this may be one reason why spelling tasks are more difficult than reading tasks. Studies of word identification reveal that reading times are longer for words containing a sequence of phonemes that can be spelled in multiple ways. For example, it has been reported that adults are slowed when reading a word like HURL because other words that HURL rhymes with, such as GIRL and PEARL, have different spellings of the same rhyme (e.g., Lacruz & Folk, 2004; Stone, Vanhoy, & Van Orden, 1997; Ziegler, Montant, & Jacobs, 1997a; Perry, 2003). This form of inconsistency in the sound-to-letter direction, as opposed to letter-to-sound direction, is often referred to as the feedback consistency effect, which was first demonstrated by Stone, Vanhoy, and Van Orden (1997).

The theoretical implication of these findings suggests that reading words does not depend solely on converting an orthographic form into a phonological representation, but the process also involves a feedback mechanism from phonology to orthography to verify that the phonological representation can be spelled in that orthographic form. It is therefore believed that spelling and reading are intimately related and may influence each other during word processing. That is, both reading and spelling tasks can be affected by the combination of feedforward and feedback consistency.

The procedure used to compute the four-level consistency measures (i.e., grapheme, rime, OVC, onset) in the GPC direction was repeated using PGCs (for spelling). Separate GPC and PGC conditional probabilities were calculated using the same sound-letter components in the corpus. Taking the word PINT for example, its GPC conditional probability (INT → /aɪnt/, p = .04) and PGC conditional probability (/aɪnt/ → INT, p = 1.0) derived using the rime consistency method were based on the same rime and phonemes, differing only in the direction of correspondence.
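
Both directions fall out of the same corpus counts, differing only in which unit the counts are normalized over; a toy R sketch (the counts and spellings below are invented for illustration):

```r
# Toy rime-level counts over spelling-sound pairs
rimes <- data.frame(
  spelling = c("int", "int", "ynt"),
  sound    = c("Int", "aInt", "aInt"),
  n        = c(80, 4, 1)
)

# Feedforward (GPC): normalize within each spelling -> p(sound | spelling)
rimes$ff <- rimes$n / ave(rimes$n, rimes$spelling, FUN = sum)
# Feedback (PGC): normalize within each sound -> p(spelling | sound)
rimes$fb <- rimes$n / ave(rimes$n, rimes$sound, FUN = sum)

rimes
# INT -> /aInt/ is rare reading forward (ff = 0.05), but /aInt/ -> INT
# dominates in the backward, spelling direction (fb = 0.80)
```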

Word-level consistency

Once the sub-level consistency measures had been computed, we derived three word-level measures—a composite score, a principal component, and the least consistent unit—each taking all four sub-level measures into account, with a higher score representing higher overall word consistency.

Composite score

As mapping print-sound regularities can occur at different levels of granularity, consistency, too, has been defined differently in the literature, which has often resulted in inconsistent findings. It is therefore useful to combine the various unit-level measures into a single index of word consistency. One method is a simple (unweighted) composite score that averages across the four unit-level measures.

Principal component analysis

Second, we made use of principal component analysis (PCA) for dimensionality reduction and extracted the first principal component (PC1), which captures the maximal amount of total variance in the variables. Our results showed that the PC1 of feedforward consistency (FF_PC1) had an eigenvalue of 16 and extracted 73% of the variance, while the PC1 of feedback consistency (FB_PC1) had an eigenvalue of 76 and accounted for 84% of the variance. The PC1s were therefore sufficient to account for most of the variance in the data.

Least consistent unit

The previous two measures (composite and PC1) are susceptible to extreme values. This is especially pronounced when one unit (e.g., the rime) of a word is highly consistent or inconsistent while its consistency measured at the other units is less extreme. As such, it is important to determine whether an observed consistency effect is simply due to the word-level measure being skewed by the word’s most or least consistent unit. To test this possibility, we extracted the lowest value among all unit-level measures of each word as a word-level consistency measure in its own right.
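
All three word-level measures reduce to a few lines of R once the unit-level scores are arranged in a word-by-unit matrix; the matrix below is randomly generated purely to demonstrate the operations.

```r
# Placeholder matrix of scaled unit-level consistency scores
# (rows = words; columns = the four granularities)
set.seed(1)
units <- matrix(rnorm(4347 * 4), ncol = 4,
                dimnames = list(NULL, c("graph", "rime", "ovc", "onset")))

composite <- rowMeans(units)                  # unweighted composite score

pca <- prcomp(units, scale. = TRUE)           # PCA for dimensionality reduction
pc1 <- pca$x[, 1]                             # first principal component
summary(pca)$importance[2, 1]                 # proportion of variance in PC1

least <- do.call(pmin, as.data.frame(units))  # least consistent unit per word
```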

Corpus analyses

This section contains descriptive statistics of the MALD corpus (Tucker et al., 2019). To ascertain that these measures may in fact capture different aspects of consistency, we plotted a correlation matrix of each measure against the others (see Fig. 2) with a description of the labels provided in Table 1 and pre-scaling descriptive statistics presented in Table 2.

Fig. 2 Correlation matrix among the feedforward and feedback consistency measures computed at the unit and word levels. The darker blue color denotes a stronger positive correlation, and the darker red color denotes a stronger negative correlation. Numbers indicate correlation coefficients, and empty cells indicate no significant correlation was found (p > .05)

Table 1 Description of the variables used in the present study
Table 2 Descriptive statistics before scaling

Results show that while all of the consistency measures were significantly related, the correlation coefficients spanned a wide range. For example, forward OVC- and grapheme-level measures were moderately correlated [r(4345) = 0.554, p < .001], whereas the rime level showed a weaker correlation with grapheme-level consistency [r(4345) = 0.119, p < .001], suggesting that consistency measured at different sub-levels is related but not entirely identical. Expectedly, many of the feedforward consistency measures were only weakly to moderately related to the feedback consistency measures, as they were measured in different directions [at the rime level, r(4345) = 0.189, p < .001; OVC level, r(4345) = 0.330, p < .001; and grapheme level, r(4345) = 0.384, p < .001]; only the onset level showed a high correspondence between feedforward and feedback consistency [r(4345) = 0.699, p < .001].

Our results indicate that the different approaches to quantifying consistency are not closely aligned. With regard to the derived composite scores, all feedforward consistencies were positively correlated with the feedforward composite score, but to different degrees for the different levels of consistency. Such correlations were greatest when measured at the OVC level [r(4345) = 0.817, p < .001], followed by grapheme level [r(4345) = 0.715, p < .001], rime level [r(4345) = 0.585, p < .001], and onset level [r(4345) = 0.536, p < .001]. Interestingly, for the feedback composite score, the same ordering was observed with the strongest correlation found at the OVC level [r(4345) = 0.880, p < .001], followed by grapheme level [r(4345) = 0.726, p < .001], rime level [r(4345) = 0.722, p < .001], and onset level [r(4345) = 0.576, p < .001].

Study 1: Consistency effects on word naming

In order to establish the extent to which the different measures of consistency were differentially predictive of actual human reading behavior, we turned to a dataset of human word naming, the English Lexicon Project (ELP, available at http://elexicon.wustl.edu; Balota et al., 2007). From the ELP, we derived 119,214 unique naming reading times (RTs) by 457 different subjects, for the subset of 4207 words shared by the ELP and MALD datasets.

Procedure

Trial-level RT data were obtained from the ELP database, and trials with an incorrect response were first excluded. Responses with RTs more than three median absolute deviations (MADs) below the median were classified as “too fast”; likewise, slow outliers were defined as responses with RTs more than three MADs above the median. After excluding incorrect trials (\(\sim 3.63\%\) of all trials), “too fast” responses (\(\sim 0.76\%\)), and slow outliers (\(\sim 5.30\%\)), statistical analyses were performed on the remaining \(\sim 90.30\%\) of trials.

Item-level regression analyses (LM) were conducted on the mean RTs of the 4207 words in the ELP visual naming task. The dependent variable consisted of z-scored RTs, averaged across participants for each word. Each participant’s raw response times were first standardized using a z-score transformation, and the mean z-score across all participants presented with a particular word was then computed for that word (Balota et al., 2007). For the analyses of the ELP database, word frequency values were log-transformed to correct for skewness before analysis, as in Balota, Cortese, Sergent-Marshall, Spieler, and Yap (2004).
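
The screening and standardization steps above can be sketched in R as follows, assuming a trial-level data frame `trials` with subject, word, rt, and accuracy columns (the column names are ours, and `mad()` applies R’s default scaling constant):

```r
trials <- subset(trials, accuracy == 1)        # drop incorrect responses

med  <- median(trials$rt)
madv <- mad(trials$rt)                         # median absolute deviation
trials <- subset(trials,
                 rt > med - 3 * madv & rt < med + 3 * madv)  # outlier cut

# z-score RTs within participant, then average per word
trials$z_rt <- ave(trials$rt, trials$subject,
                   FUN = function(x) (x - mean(x)) / sd(x))
item_rt <- aggregate(z_rt ~ word, data = trials, FUN = mean)

# Log-transform word frequency to correct skewness before modelling
item_rt$log_freq <- log10(freq[item_rt$word])  # `freq` is an assumed lookup
```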

In addition to the lexical variables (e.g., OrthND, PhonND) introduced in our corpus analyses, two binary variables were added to code the initial phoneme of each word. These variables were based on features found to affect response times in Balota et al. (2004), but we coded them into two binary variables to reduce the number of predictors in the regression models. The variable Onset_Coding denotes whether the initial phoneme possesses (1) or lacks (0) any of the following phonological features: nasal, fricative, stop, affricate, or liquid, to control for the variance associated with voice-key biases in speeded pronunciation (Balota et al., 2004).

Across age groups and tasks (i.e., naming and lexical decision), Balota et al. (2004) showed that the effects of the 12 phonemic onset features on RTs were consistent, with the exception of voicelessness. Specifically, voicelessness was found to facilitate RTs in naming tasks, but to slow RTs in lexical decision tasks. To avoid introducing noise into the Onset_Coding binary variable, we coded voicing as a separate binary variable (Voice) that denotes whether the initial phoneme is voiced or unvoiced (1 = voiced, 0 = unvoiced).
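
The two binary onset variables can be sketched as simple set-membership tests; the phoneme sets below are partial and illustrative, standing in for the full feature classification, and `items` is an assumed item-level data frame.

```r
# Partial, illustrative feature sets for initial phonemes
onset_features <- c("m", "n",            # nasals
                    "f", "v", "s", "z",  # fricatives
                    "p", "b", "t", "d",  # stops
                    "tS", "dZ",          # affricates
                    "l", "r")            # liquids
voiced <- c("m", "n", "v", "z", "b", "d", "g", "dZ", "l", "r")

items$Onset_Coding <- as.integer(items$first_phone %in% onset_features)
items$Voice        <- as.integer(items$first_phone %in% voiced)
```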

Analytic approach

First, to compare all the combined and individual measures of consistency, we constructed 14 different predictive models with word naming RTs as the dependent variable, and one of the 14 measures of consistency included as an independent predictor in each model. All LM models included seven lexical variables (i.e., Frequency, Num_Phones, Onset_Coding, OrthND, PhonND, Voice, and Word_Length) and one of the derived consistency measures (i.e., feedback and feedforward consistency measures at the rime, onset, OVC, grapheme, and combined levels) as predictors. A baseline model that included only the lexical variables was also added. All predictor variables were standardized (mean = 0, SD = 1) prior to modelling.

Second, based on the model comparison results, we conducted a two-step hierarchical regression to determine whether the best word-level measures accounted for additional variance in the word naming RTs over conventional lexical variables. Prior to running the model, multicollinearity was examined using Variance Inflation Factor (VIF) statistics, with lower VIF values indicating lower correlations among variables. In Step 1 of the regression analysis, word frequency, number of phonemes, onset coding, number of orthographic neighbors, number of phonological neighbors, onset voicing, and word length (Frequency, Num_Phones, Onset_Coding, OrthND, PhonND, Voice, and Word_Length) were entered into the LM model. Depending on the model comparison results, either the word-level composites (FB_Composite and FF_Composite), the PC1s (FB_PC1 and FF_PC1), or the least consistent units (FB_Least and FF_Least) were entered into the LM model in Step 2, in addition to the previously entered variables.

Third, dominance analyses (DA) were utilized to directly compare the importance and unique contribution of the individual sub-level consistency measures as predictors in the same model, while eliminating the issue of multicollinearity. DA relies on computing R2 estimates for all possible subset models. Since our models contained a total of eight sub-level consistency measures (i.e., four each from the feedforward and feedback directions), we needed 255 different subset models to cover all combinations: 8 single-predictor models, 28 two-predictor models, 56 three-predictor models, 70 four-predictor models, 56 five-predictor models, 28 six-predictor models, eight seven-predictor models, and one eight-predictor model. A predictor achieves general dominance (Azen & Budescu, 2003) if its unique contribution, averaged across all subset models, is greater than that of a competitor predictor.

All statistical analyses were computed with R version 4.0.3 (R Core Team, 2020). The function lm in R was used to fit the models using ordinary least squares. Simultaneous information-theoretic model comparison was done using the model.sel function in the MuMIn package (Bartoń, 2015), which provides estimates of the corrected Akaike information criterion (AICc) that can be used to determine the best model. Dominance analyses were subsequently conducted using the R package dominanceanalysis (Navarrete & Soares, 2020).
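
Putting the pieces together, a condensed sketch of the analysis pipeline using the packages named above; the data frame `d` and its column names follow our variable labels and are assumptions for illustration.

```r
library(MuMIn)              # model.sel for AICc-based model ranking
library(dominanceanalysis)  # dominanceAnalysis for all-subsets R2

base <- z_rt ~ Frequency + Num_Phones + Onset_Coding + OrthND +
               PhonND + Voice + Word_Length
m0 <- lm(base, data = d)                                # baseline model
m1 <- lm(update(base, . ~ . + FB_Composite), data = d)  # + one consistency measure
m2 <- lm(update(base, . ~ . + FF_Composite), data = d)
# ...one model per consistency measure, 14 in total

model.sel(m0, m1, m2)       # ranks the candidate models by AICc

car::vif(m1)                # multicollinearity check (accepted threshold < 5)

# Dominance analysis over the eight sub-level measures
# (2^8 - 1 = 255 subset models)
da <- dominanceAnalysis(
  lm(z_rt ~ FF_Graph + FF_Rime + FF_OVC + FF_Onset +
            FB_Graph + FB_Rime + FB_OVC + FB_Onset, data = d))
averageContribution(da)     # general dominance estimates
```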

Results and discussion

The best-fitting model was found to be the one containing the composite predictor FB_Composite, which yielded the lowest AIC value (an information-theoretic measure of model fit that penalizes complexity) (Table 3). This finding suggests that expert readers utilize phoneme-to-grapheme consistency information to achieve fluent word reading, corroborating the feedback consistency effects found in previous word naming studies (Balota et al., 2004; Yap & Balota, 2009). Prior to regression analysis, we tested for multicollinearity among the independent variables with the VIF statistic and found no issues (Fig. 3). Generally, a VIF larger than 5 suggests moderate influence, and a value larger than 10 is seen as a strong indicator of multicollinearity (Fox & Weisberg, 2010).

Table 3 Comparison of regression models predicting visual naming performance
Fig. 3 VIF values of all predictors in the ELP dataset, where the accepted threshold is set at < 5

The two-step hierarchical analysis revealed that both feedforward and feedback composite scores were good predictors of human naming performance, although FB_Composite explained more variance in the data than FF_Composite (Table 4). This suggests that both feedforward and feedback consistency effects are present while reading words out loud. Because the 14 regression models were identical except for a single predictor (one of the 14 consistency measures derived from the corpus), we could directly compare the consistency measures across granularities with the composite measures and rank the models. Further dominance analysis showed that the consistency measures derived at the OVC and grapheme levels contributed the most to the feedback and feedforward consistency effects observed, respectively (Fig. 4).

Table 4 Results of hierarchical regression analyses for visual naming task performance
Fig. 4 Average variance accounted for in naming task performance by all subset models

The finding of a feedforward consistency effect is not surprising, as extensive findings have shown that spelling-to-sound correspondence plays a role in naming task performance (e.g., Hino & Lupker, 1998; Monsell, Doyle, & Haggard, 1989; Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg, 1992; Van Orden, Pennington, & Stone, 1990). It is notable that the composite measure of feedforward consistency explained more variance in word naming than any of the unit-level feedforward consistency measures, suggesting that these previously used metrics each capture human performance only partially. Feedback consistency effects, on the other hand, have been less systematically observed across studies. Studies have sometimes failed to replicate feedback consistency effects on naming latencies (e.g., Massaro & Jesse, 2005; Peereman, Content, & Bonin, 1998), likely due to uncontrolled variables. In a more recent megastudy, Cortese, Yates, Schock, and Vilks (2018), after controlling for surface and lexical variables, found a feedback consistency effect in naming but not lexical decision tasks. Their findings suggest that semantic information has a more critical role in generating lexical decision outputs than the phonological code. In tasks that rely on orthographic-to-phonological decoding, such as naming, initial orthographic inputs can trigger a resonance effect from the phonological to the orthographic level as a result of interactive activation, causing interference at the orthographic level for feedback-inconsistent words. Our results further demonstrated that feedback consistency has a reliable effect on human naming performance, with the strongest effect when derived at the OVC level, followed by the rime and onset levels. In terms of magnitude, it is worth noting that Cortese et al. (2018) also found a stronger rime- than onset-consistency effect in the feedback direction, similar to the one observed in our analyses.

Although many previous studies have found evidence that reading aloud involves phonological processing, such evidence mostly concerns low-frequency words, which does not explain why feedback consistency had a stronger effect than feedforward consistency in the present study. In line with the bi-modal interactive activation model (BIAM; Frost & Katz, 1989), which was initially designed to account for the automatic involvement of phonological information during visual word recognition, one explanation would be that initial orthographic inputs activate phonological representations, which in turn influence the course of visual word recognition via their interaction with orthographic representations.

Study 2: Consistency effects on lexical decision across modalities

While both word naming and lexical decision involve lexical access and word recognition, lexical decision tasks (LDT) do not overtly require phonological articulation. As such, it is informative to consider whether word consistency impacts mainly the lexical access phase or the phonological output phase of word processing. Thus, in Study 2, the same consistency measures derived in Study 1 were used to predict lexical decision performance. Comparing the results to those of Study 1 will clarify which processes are most impacted by word consistency.

In fact, while feedforward consistency plays a role in naming task performance, its role in lexical decision has been less well-defined, with the majority of findings suggesting that feedforward consistency has no effect on lexical decision (e.g., Hino & Lupker, 1996; Stanovich & Bauer, 1978; Waters & Seidenberg, 1985), except when phonological processing is emphasized by the task. More recently, however, when feedforward consistency was measured at the onset level, its effects were observed in both naming (e.g., Yap & Balota, 2009; Cortese & Schock, 2013) and lexical decision (e.g., Yap & Balota, 2009; Balota et al., 2004), albeit less consistently and more weakly than when measured at the rime level. These recent results suggest that consistency operationalized at different granularities can lead to different prediction outcomes.

Second, following many previous findings that the consistency of printed words holds cross-modal effects, we also compare lexical decision performance in the visual format (judgments of printed words and pseudowords) with the auditory format (judgments of spoken words and pseudowords). As consistency has been reported to affect auditory lexical decision (Pattamadilok, Morais, Ventura, & Kolinsky, 2007; Petrova, Gaskell, & Ferrand, 2011; Ventura, Morais, & Kolinsky, 2007; Ventura et al., 2004), we further examine whether such effects are isolated to feedback consistency (sound-to-spelling), which we expect given the sound-based input of the task.

To compare differential effects of directional consistency (feedforward vs feedback) on different modalities of word recognition (visual, auditory) we use our combined consistency metrics in each direction to predict the ELP visual lexical decision times on the one hand, and MALD auditory lexical decision times on the other. As noted above, we predicted that our feedforward consistency measure would explain the most variance in visual LDT, as found in previous studies (Kessler, Treiman, & Mullennix, 2007), whereas feedback consistency would explain most variance in the auditory LDT following reliable effects reported across studies (e.g., Chng et al., 2019).

Procedure

After excluding incorrect trials (\(\sim 8.84\%\) of all trials), “too fast” responses (\(\sim 0.36\%\)), and slow outliers (\(\sim 6.70\%\)), statistical analyses were performed on the remaining \(\sim 84.09\%\) of trials. Item-level regression analyses were conducted on the mean z-scored RTs for 4207 monosyllabic words for the visual lexical decision task that were obtained from the ELP.

Results and discussion

Entering each consistency measure one by one into individual regression models, we found a similar pattern to the previous results, with the FF_Composite and FB_Composite models performing best among the models in the same direction (Table 5). When both feedforward and feedback composite consistency measures were entered into the regression model in a two-step hierarchical analysis, only the feedback composite score was significant, with feedback-consistent items producing faster latencies (beta = − 0.09, 95% CI [− 0.12, − 0.06]) (Table 6). After controlling for lexical variables, adding feedback consistency still resulted in a small but significant increase in the variance accounted for (ΔR2 = .006**). Finally, unlike in Study 1, our dominance analysis revealed that FB_Rime contributed the most to the feedback composite score, followed by FB_OVC and FB_Onset (Fig. 5).

Table 5 Comparison of regression models predicting visual lexical decision performance
Table 6 Results of hierarchical regression analyses for visual lexical decision task performance
Fig. 5 Average variance accounted for in visual lexical decision task performance by all subset models

Previous equivocal findings have suggested that feedback consistency influences naming but not lexical decision (e.g., Balota et al., 2004; Cortese et al., 2018; Yap & Balota, 2009), while others have found such effects in lexical decision (e.g., Lacruz & Folk, 2004; Perry, 2003; Stone et al., 1997). This discrepancy in feedback consistency results may be due to studies having used different subsyllabic units to calculate consistency. The present study compared feedback consistency measured at different granularity levels and found supporting evidence that rime-level consistency effects are stronger than those measured at the onset level. When measured at a smaller granularity level, FB_Graph (i.e., feedback grapheme-level consistency) accounted for much less average variance than FB_Onset, as shown in the results of the dominance analysis. This is perhaps due to English readers becoming attuned at a young age to within-word contexts that disambiguate the abundant small-scale grapheme–phoneme inconsistencies in favor of larger-scale spelling-to-sound correspondences that provide greater consistency (Treiman et al., 1995). Our results thus suggest that consistency effects have to be examined by taking grain size into account.

Taken together with previous findings of feedback consistency effects in similar tasks (e.g., Lacruz & Folk, 2004; Perry, 2003; Stone et al., 1997), it is possible that visual lexical decision relies on both phonological and semantic information. In terms of the triangle model of reading (from parallel distributed processing, PDP, neural network models), the process of making a lexical decision may involve orthographic-to-semantic and phonological-to-semantic connections. However, because the relationships between orthography and semantics are more arbitrary than those between orthography and phonology (see Frost, 2005, for a discussion of writing systems), the phonological representations activated by orthographic input would also serve as input to the semantic system, forming an orthographic-phonological-semantic interaction. As in the naming task, the activation of the phonological code would, in turn, either facilitate or interfere with the orthographic representations depending on the word’s feedback consistency. This orthographic-phonological-orthographic resonance effect is thought to be less pronounced in lexical decision tasks, probably because the lexical decision is made on the basis of semantic information, unlike a naming response that is driven by phonological information. This is demonstrated in the two-step hierarchical regression results of Studies 1 (naming) and 2 (lexical decision), where the composite consistency measures contributed more unique variance in the former task (i.e., an increase in R2 of 0.042 vs. 0.006, respectively).

Predicting auditory lexical processing in the MALD dataset

Procedure

After excluding incorrect trials (\(\sim 9.18\%\) of all trials), “too fast” responses (\(\sim 0.69\%\)), and slow outliers (\(\sim 6.31\%\)), statistical analyses were performed on the remaining \(\sim 83.83\%\) of trials. Item-level regression analyses were conducted on the mean z-scored RTs for 4341 monosyllabic words for the auditory lexical decision task that were obtained from the MALD (Figs. 6 and 7).

Fig. 6 Average variance accounted for in auditory lexical decision task performance by all subset models

Fig. 7 VIF values of all predictors in the MALD dataset, where the accepted threshold is set at < 5


Results and discussion

Contrary to the equivocal findings with visual lexical decision tasks discussed earlier, feedback consistency effects have been consistently reported and replicated in the auditory version of the task (e.g., Chéreau, Gaskell, & Dumay, 2007; Miller & Swick, 2003; Pattamadilok et al., 2007; Perre & Ziegler, 2008; Slowiaczek, Soltano, Wieting, & Bishop, 2003; Taft, Castles, Davis, Lazendic, & Nguyen-Hoan, 2008; Ventura et al., 2007; Ziegler et al., 2004; Ziegler & Muneaux, 2007; Ziegler, Muneaux, & Grainger, 2003). It is commonly found that adults are faster and more accurate in auditory lexical decision tasks for feedback-consistent words. In the present study, we too found feedback consistency effects with our composite score (FB_Composite; beta = − 0.074, ΔAIC = 6.00) (Table 7). Note, however, that all feedback word-level models differ only slightly in their AIC values, and hence there is a lack of evidence to distinguish the best word-level predictor (FB_PC1 vs. FB_Composite, ΔAIC = 4.11). Among all the consistency measures, FB_OVC (beta = − 0.083) was found to be the best predictor of auditory lexical processing, likely because the OVC constituents are the most salient phonological units in English.

Table 7 Comparison of regression models predicting auditory lexical decision performance

In a developmental study, Ziegler and Muneaux (2007) showed that auditory lexical decision performance was not initially influenced by feedback consistency; however, as soon as literacy developed, feedback consistency effects were observed, with a magnitude predictable from the reading level of the child. In terms of neural network models of reading, this implies that the processing of visual and spoken words is tightly linked through a single network that connects the orthographic and phonological layers. Thus, in order for the network to process a spoken word via phonological code activation, the corresponding orthographic code has to be coactivated as well, due to the strong orthographic-phonological associations.

Perre and Ziegler (2008) explained that permanent orthographic-phonological connections are likely formed during literacy learning, and that competition occurs at the orthographic layer when a word has multiple spellings (i.e., feedback-inconsistent words). However, because the mapping between orthographic sub-units and semantic features is less systematic, phonology plays a more important role in accessing word meaning (e.g., Amenta, Marelli, & Sulpizio, 2017; Tyler, Voice, & Moss, 2000). When participants are presented with homophones and non-homophonic words in a lexical decision task, responses to homophones are typically slower than to non-homophonic words (e.g., Ferrand & Grainger, 2003; Pexman, Lupker, & Jared, 2001; Besner & Davelaar, 1983; Coltheart, Davelaar, Jonasson, & Besner, 1977; McCann, Besner, & Davelaar, 1988; Mcquade, 1981; Vanhoy & Van Orden, 2001; Ziegler, Jacobs, & Kluppel, 2001; Rubenstein, Lewis, & Rubenstein, 1971), which further suggests that phonological recoding of a printed word plays an important role in word recognition.

We also note that, across the different datasets modeled, the regression model for auditory lexical decision accounted for a relatively modest amount of variance even with the inclusion of the composite consistency measures (R2 = .032) (Table 8). This could be due to the lack of semantic variables in the model, as these have been found to account for incremental variance in lexical decision, consistent with the task’s emphasis on semantic information.

Table 8 Results of hierarchical regression analyses for auditory lexical decision task performance

Study 3: Data-driven measures of consistency

Systematic resonance between orthographic and phonological units in reading has been observed and put forward previously (e.g., Frost & Katz, 1989; McClelland & Rumelhart, 1981; Stone & Van Orden, 1994; Van Orden & Goldinger, 1994), suggesting that information does not flow in only one direction. In an explicitly interactive model of reading, words that are consistent in both the feedforward and feedback directions support more stable and faster learning; they also lead to faster activation, because consistent symmetrical relations can be resolved more quickly than asymmetrical ones—i.e., than words that are consistent in one direction but not the other (Tuller, Case, Ding, & Kelso, 1994; Van Orden, 2002; Van Orden, Jansen op de Haar, & Bosman, 1997; Van Orden, Pennington, & Stone, 1990; Ziegler, Van Orden, & Jacobs, 1997c).

Our findings thus far are consistent with such an interactive account. First, in Study 1, we found both feedforward and feedback consistency effects in a visual naming task, supporting the notion that phonology is involved in visual word recognition and that inconsistent mappings in either direction (orthography-to-phonology or phonology-to-orthography) can slow visual word recognition. Second, in Study 2, we found feedback consistency effects in both visual and auditory lexical decision tasks, implying that feedback consistency plays a role not only in reading but also in spoken word recognition. Taken together, phonological computation appears crucial for print processing and lexical access (for a review, see Frost, 1998), likely because of orthographic-phonological resonance and because phonological information is a primary means by which we retrieve meaning. Thus, the findings demonstrate that the orthographic and phonological systems are closely interconnected and that the flow of information is bidirectional, regardless of whether the input is visual or auditory.

In Study 3, we aimed to further test this hypothesis of bidirectional interaction between the orthographic and phonological systems by modelling it explicitly in a computational neural network that learned to read words. We expected that consistency effects would be detectable in the learning process of a reading/writing model, emerging from statistical regularities present in the language, in particular the correspondence between words' orthographic and phonological forms. To emulate this process, we employed a machine learning regime and derived a proxy for the difficulty of learning each word in our corpus.

A neural network model was trained on either an orthography-to-phonology or a phonology-to-orthography mapping task, corresponding to reading aloud visually presented words and to spelling spoken words, respectively. We adopted the PDP framework developed by Rumelhart, Hinton, and McClelland (1986), which provides a natural account of how multiple, simultaneous, and often mutual constraints are exploited. To examine the ease with which the model generates the target output for a word, we measured the closeness of the model's output to the target by calculating the mean squared error (MSE), which reflects how difficult it was for the model to learn the GPC/PGC mappings of each word.

Researchers have also used MSE as a measure of response time in PDP models (e.g., Seidenberg & McClelland, 1989; Monaghan & Pollmann, 2003), although this approach has since been supplemented by more direct response time measures, such as the amount of continuous time needed for output unit activations to settle (e.g., Monaghan, Shillcock, & McDonald, 2004; Zorzi, Houghton, & Butterworth, 1998; Seidenberg & Plaut, 1998).

MSE is a well-suited proxy measure for spelling-sound consistency because of its link to the concept of cross-entropy from information theory (Kullback & Leibler, 1951), which measures the similarity of two probability distributions. Since our modelling goal is to identify words with different levels of spelling-sound and sound-spelling consistency, the cross-entropy of consistent words is expected to be lower than that of inconsistent words, as the model can minimize the cross-entropy of consistent patterns faster (i.e., in fewer training epochs) than inconsistent patterns (e.g., Plaut et al., 1996). Here, we expect relatively fast and stable responses for consistent compared to inconsistent words, and, therefore, consistent words should exhibit a lower MSE than inconsistent words.
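For concreteness, for a word w with binary target pattern t and output pattern o over N output units, the two error measures take their standard forms:

\mathrm{MSE}(w) = \frac{1}{N}\sum_{i=1}^{N}\left(t_i - o_i\right)^2, \qquad \mathrm{CE}(w) = -\sum_{i=1}^{N}\left[t_i \log o_i + (1 - t_i)\log(1 - o_i)\right].

Both are minimized when the output matches the target exactly, so words whose mappings conflict with the dominant regularities of the training corpus settle at a higher residual error (whether the squared error is summed or averaged over units is a normalization choice that does not affect the relative ordering of words).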

Upon completion of network training, these MSE measures of individual word consistency were entered as independent variables in linear models predicting ELP visual LDT and MALD auditory LDT performance. Finally, we compared the linear models containing the data-driven neural network predictors with those containing the corpus-derived metrics of consistency, and ascertained which models fit the human data best.
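The AIC-based model comparisons reported below can be sketched as follows; the data file and column names are hypothetical placeholders for the item-level ELP/MALD data and the control variables listed in the Tables:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("item_level_data.csv")  # one row per word (hypothetical file)

controls = "LogFreq + Length + OrthND + PhonND"  # placeholder control variables
models = {
    "baseline": f"RT ~ {controls}",
    "FB_Composite": f"RT ~ {controls} + FB_Composite",
    "FB_MSE": f"RT ~ {controls} + FB_MSE",
}

fits = {name: smf.ols(formula, data=df).fit() for name, formula in models.items()}
best = min(fits, key=lambda name: fits[name].aic)
for name, fit in fits.items():
    # Delta AIC relative to the best model; differences > ~2 indicate meaningful support
    print(f"{name}: AIC = {fit.aic:.1f}, Delta AIC = {fit.aic - fits[best].aic:.2f}")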

Model architecture

The model’s architecture is most similar to the connectionist triangle model of Harm and Seidenberg (2004) with the addition of an orthographic attractor to encode information about the orthographic structure of English, as well as bidirectional connections between layers. A semantic layer, which is part of the original triangle model, was not included, as our task was to assess spelling-to-phonology and phonology-to-spelling consistency.

The model was built using the free software LENS (Rohde, 1999) and has four types of layers: orthographic, phonological, hidden, and clean-up/attractor units (see Fig. 8). The hidden layer mediated the computations between orthographic and phonological codes, allowing the network to encode more complex and latent mappings. In addition, the orthographic and phonological layers were each connected to clean-up layers, creating attractor networks that could settle into a stable pattern over training (Harm & Seidenberg, 1999). All connections between the connected layers were bidirectional.

Fig. 8 Architecture of the reading and spelling connectionist model implemented in Study 3

An attractor network can repair partial or degraded patterns of activity by pulling nearby points toward stable attractor points, turning noisy patterns into familiar representations (Harm & Seidenberg, 1999). Clean-up units were added to the orthographic and phonological layers so that the network could encode orthographic and phonological regularities. With them, the connections between the orthographic and phonological layers can be less precise, as the model can rely on the attractors to complete the pattern (Harm & Seidenberg, 2004). Some connectionist reading models trade off model stability for a higher sensitivity to new inputs (Hebb, 1949) by foregoing the attractor algorithm and clean-up units (e.g., Lambon Ralph & Ehsan, 2006; Ellis & Lambon Ralph, 2000). We opted to emphasize model stability, following similar connectionist models of reading.

We used a position-sensitive, slot-based, vowel-centered format for both the orthographic and phonological representations (e.g., Harm & Seidenberg, 1999; 2004). The orthographic layer was composed of 260 units, corresponding to ten letter-position slots × 26 possible letters. Words were coded as vowel-centered, such that the fourth slot was filled with the left-most vowel of a word (e.g., MINCE was coded as _ _ m i n c e _ _ _; Harm & Seidenberg, 2004; Monaghan, Chang, Welbourne, & Brysbaert, 2017). A word's phonology was represented with nodes coding phoneme features (eight phoneme-position slots × 28 possible phonological features = 224 units). Each phoneme was encoded by a binary vector of 28 phonological features (e.g., anterior, approximant, back, consonantal, etc.) taken from PHOIBLE (Moran & McCloy, 2019), an online repository of cross-linguistic phonological data. A value of 1 represented the presence of a feature and 0 its absence. A list of phonemes and their respective phonological features used in the present work can be found in the Open Science Framework (OSF) repository for this project (https://osf.io/wdzqc). Full documentation of the model architecture and source code can be found in the GitHub repository (https://github.com/alfred-lim/BiPDP).
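A minimal sketch of the vowel-centered orthographic coding, assuming lowercase entries whose left-most a/e/i/o/u vowel falls within the first four letters (true of monosyllabic English words with onsets of up to three consonants) and whose letters fit within the ten slots after alignment; function and variable names are ours, not those of the original scripts:

import numpy as np

def encode_orthography(word, n_slots=10, n_letters=26, vowel_slot=3):
    """One-hot slot coding (10 slots x 26 letters = 260 units), aligned so the
    left-most vowel of the word occupies the fourth slot (index 3)."""
    pattern = np.zeros((n_slots, n_letters))
    first_vowel = next(i for i, ch in enumerate(word) if ch in "aeiou")
    offset = vowel_slot - first_vowel
    for i, ch in enumerate(word):
        pattern[i + offset, ord(ch) - ord("a")] = 1.0
    return pattern.flatten()

vec = encode_orthography("mince")  # 'i' lands in slot 4: _ _ m i n c e _ _ _
assert vec.size == 260 and vec.sum() == 5  # one active unit per letter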

Training procedure

The network was trained to learn the mappings in one of two directions: print-to-sound (reading task) or sound-to-print (spelling task). Training was done separately and exclusively in one direction to ensure that the two effects were not confounded, as might occur with interleaved training. In addition, each of the reading and spelling models was trained using one of two word frequency weightings, one based on type frequency and the other on token frequency, resulting in a total of four trained models. We reasoned that the different frequency-weighted training approaches would produce MSEs analogous to the token and type consistencies derived from a corpus.
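The two training regimes can be sketched as word-presentation probabilities; whether the original models used raw or compressed token counts is an implementation detail we do not assume here:

import numpy as np

def presentation_probs(token_counts, regime="token"):
    """Return per-word presentation probabilities for a training epoch."""
    counts = np.asarray(token_counts, dtype=float)
    if regime == "type":
        weights = np.ones_like(counts)  # every word equally likely
    else:
        weights = counts  # presentation proportional to corpus token frequency
    return weights / weights.sum()

rng = np.random.default_rng(seed=1)
words = np.array(["pint", "mint", "hint"])
p = presentation_probs([52, 310, 498], regime="token")
sample = rng.choice(words, size=10, p=p)  # token-weighted training sample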

When a phonological word form was presented to the network's phonological layer (e.g., to simulate a word spelling task), its activation would spread to the hidden layer and then to the orthographic layer. Conversely, in the reading task, an orthographic word form was input to the orthographic layer, and its activation cascaded to the phonological layer via the same hidden units. The bidirectional connections between the orthographic, hidden, and phonological layers allow the output layer to influence the rise of activation of units in the input layer. For example, when the word PINT is presented to the network in the reading task, the orthographic nodes for PINT spread their activity to the corresponding hidden nodes, and then to the phonological nodes, through feedforward activation. However, the orthographic nodes for PINT also receive activation from the phonological nodes via the hidden layer as a result of the feedback connections, simulating the resonance effect described in the previous studies.

All models were trained with a learning rate of 0.05 using a back-propagation through time (BPTT) algorithm (Harm & Seidenberg, 1999; Plaut, McClelland, Seidenberg, & Patterson, 1996) with input integration and a time constant of 0.5. The weight connections were updated based on cross-entropy error computed between the target and the actual activation of the output units.
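With a time constant of τ = 0.5, input integration means that a unit's net input is a running average of its instantaneous input rather than an instantaneous quantity; one common formulation (a sketch of LENS-style continuous dynamics, not necessarily the exact variant used here) is

\mathrm{net}_j(t) = \tau \sum_i w_{ij}\, a_i(t-1) + (1-\tau)\,\mathrm{net}_j(t-1), \qquad a_j(t) = \sigma\!\left(\mathrm{net}_j(t)\right),

so that activations rise and settle smoothly across the time samples of a trial.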

The input pattern of each word in the corpus was clamped and presented for six time samples; in an additional six time samples, the model was required to reproduce the target pattern of the word. Both the orthography-to-phonology and phonology-to-orthography target mappings were taken from the MALD corpus. A node was considered activated if its output was at least 0.75 and deactivated if at most 0.25, while intermediate values were considered incorrect. In other words, an output was scored as correct when the target nodes were active (>= 0.75) and, concurrently, all other nodes were inactive (<= 0.25).
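This scoring criterion amounts to the following check (thresholds from the text; a sketch with illustrative names):

import numpy as np

def word_correct(output, target, on=0.75, off=0.25):
    """A word is correct iff all target-on nodes reach >= 0.75 and all
    target-off nodes stay <= 0.25; intermediate activations count as errors."""
    output, target = np.asarray(output), np.asarray(target)
    return bool(np.all(output[target == 1] >= on) and np.all(output[target == 0] <= off))

word_correct([0.91, 0.08, 0.80], [1, 0, 1])  # True
word_correct([0.91, 0.50, 0.80], [1, 0, 1])  # False: 0.50 falls in the incorrect band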

Results and discussion

The goal of the modeling was to assess the usefulness of the model's MSEs as a measure of consistency, which we refer to as data-driven consistency (in contrast to the corpus-derived measures of Studies 1 and 2). As we were interested in capturing the relative 'ease of learning' of each word in terms of MSE, the models were trained until performance reached a reasonable plateau for all tasks and training regimes, so as to avoid over-fitting. Further, we used a cut-off point of 100,000 epochs as the stopping criterion, rather than an accuracy criterion, to prevent lower-accuracy models from receiving more exposure to the stimuli.

Accuracy over the course of training is depicted in Fig. 9. Both the reading and spelling models trained using type frequency showed higher accuracy at the end of training (98.9% and 74.3% words correct, respectively) than those trained using token frequency (89.1% and 65.9%). This is likely because, in type-frequency training, all words have the same chance of being presented to the model, allowing the network to capture regularities among inputs better than in token-frequency training. In addition, the models learned the orthography-to-phonology mappings better than the phonology-to-orthography mappings, likely because there are many more ways to spell a given phoneme in English than there are ways to pronounce a given grapheme (e.g., Goswami & Bryant, 1990).

Fig. 9 Network accuracy on the reading (orthography-to-phonology) and spelling (phonology-to-orthography) tasks, trained using either type or token word frequency

To examine the impact of word consistency on token-weighted MSEs in the two tasks, we divided the words into two equal-sized groups based on either their feedforward or feedback composite scores, using the median as the cut-off value: inconsistent (N = 2173) and consistent (N = 2173). When compared using MSEs derived from the same direction as the task, reading (i.e., feedforward) MSEs were higher for feedforward-inconsistent words [M = .0039, SD = .0082] than for feedforward-consistent words [M = .0019, SD = .0043; t(4344) = 9.75, p < .001], and spelling (i.e., feedback) MSEs were higher for feedback-inconsistent words [M = .0047, SD = .0048] than for feedback-consistent words [M = .0015, SD = .0025; t(4344) = 27.47, p < .001], indicating that the models captured consistency effects in both directions.
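This comparison can be sketched with a median split and a pooled two-sample t-test (df = N1 + N2 − 2 = 4344); the data frame and column names are hypothetical, and we assume higher composite scores index greater consistency:

import pandas as pd
from scipy import stats

df = pd.read_csv("word_measures.csv")  # one row per word (hypothetical file)

median = df["FF_Composite"].median()
consistent = df.loc[df["FF_Composite"] >= median, "FF_MSE"]
inconsistent = df.loc[df["FF_Composite"] < median, "FF_MSE"]

t, p = stats.ttest_ind(inconsistent, consistent, equal_var=True)  # pooled t-test
print(f"t({len(consistent) + len(inconsistent) - 2}) = {t:.2f}, p = {p:.3g}")
# Expectation: inconsistent words carry a higher mean MSE than consistent words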

To further test whether the computed data-driven MSE is an appropriate proxy measure of print-speech consistency, we conducted a new set of regression analyses on the three sets of human performance data, adding the feedforward (FF_MSE) and feedback (FB_MSE) MSE measures. To keep the set of predictors parsimonious, only the previously best individual consistency measure (i.e., OVC) and the composite consistency measures (FF_Composite and FB_Composite) were included in the comparison.

Predicting visual naming latencies in the ELP dataset

After controlling for lexical variables, model selection based on AIC revealed that all single-predictor models performed better than the baseline (Table 9). Even though the models trained with type frequency reached higher network accuracy overall than those trained with token frequency, the MSE consistency measures obtained from token-frequency training emerged as better predictors of human word naming (FF_MSE, AIC = 10117; FB_MSE, AIC = 10130) than those from type-frequency training (FF_Type_MSE, AIC = 10155; FB_Type_MSE, AIC = 10211). This dovetails with previous findings whereby consistency weighted by token frequency is more predictive of human naming performance than consistency weighted by type frequency (Jared, McRae, & Seidenberg, 1990; Lee, Tsai, Su, Tzeng, & Hung, 2005). Furthermore, similar to the composite consistency effects observed in Study 1, the feedback MSE model had a lower AIC than its feedforward counterpart. However, the FB_Composite model from Study 1 still performed better than the MSE model in predicting visual word naming.

Table 9 Comparison of regression models predicting visual naming performance

To determine whether the new data-driven consistency measures accounted for additional variance in word naming RTs over conventional lexical variables and the corpus-derived composite measures, we conducted a three-step regression analysis in which lexical variables were entered in Step 1, the composite consistency measures (FB_Composite, FF_Composite) in Step 2, and the data-driven consistency measures (FB_MSE, FF_MSE) in Step 3. The final model significantly predicted naming latencies, accounting for 39% of the variance (R2 = .387, 95% CI [.36, .41]). As seen in Table 10, apart from the control variables, the final model contained three statistically significant predictors: feedback composite, feedback MSE, and feedforward MSE. The addition of the MSEs significantly improved the model (ΔR2 = .015, 95% CI [.01, .02]).

Table 10 Results of hierarchical regression analyses for visual naming task performance
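The three-step hierarchy can be sketched as nested OLS models, with ΔR2 computed as the step-to-step difference in R2; the control variable names are placeholders for those reported in the Tables, and df is the item-level data frame from the earlier sketch:

import statsmodels.formula.api as smf

step1 = "RT ~ LogFreq + Length + OrthND + PhonND"   # lexical controls
step2 = step1 + " + FB_Composite + FF_Composite"    # corpus-derived composites
step3 = step2 + " + FB_MSE + FF_MSE"                # data-driven MSEs

r2 = [smf.ols(f, data=df).fit().rsquared for f in (step1, step2, step3)]
print(f"Step 2 Delta R2 = {r2[1] - r2[0]:.3f}; Step 3 Delta R2 = {r2[2] - r2[1]:.3f}")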

Predicting visual lexical processing in the ELP dataset

Similar to the regression models above for visual word naming latency, all single-predictor models performed better than the baseline for visual lexical decision latency (Table 11). Again, MSEs derived from token-frequency-weighted training were better predictors (FF_MSE, AIC = 9957; FB_MSE, AIC = 9928) than those from type-frequency-weighted training (FF_Type_MSE, AIC = 9964; FB_Type_MSE, AIC = 9982), and feedback MSE again yielded a better model than feedforward MSE. The data-driven MSE measures also outperformed the corpus-derived consistency measures in both the feedforward and feedback directions. This differs from the prediction of visual word naming reported above, where the corpus-derived feedback consistency measure showed the best fit (Fig. 7).

Table 11 Comparison of regression models predicting visual lexical decision performance

For the three-step regression analysis of visual lexical decision RTs, the final model accounted for 39% of the variance (R2 = .387, 95% CI [.36, .41]), and the addition of the MSEs significantly improved the model (ΔR2 = .007, 95% CI [.00, .01]) (Table 12).

Table 12 Results of hierarchical regression analyses for visual lexical decision task performance

Predicting auditory lexical processing in the MALD dataset

A few differences were observed when comparing the results of the auditory and visual lexical decision tasks. First, both type-weighted MSE models performed worse than their token-weighted counterparts, further supporting the view that consistency should take token frequency into account (Table 13). Second, FB_MSE (AIC = 12131) ranked higher than FB_OVC (AIC = 12193, ΔAIC = 61.65), which had been the best-performing model in Study 2. Lastly, although the final three-step model accounted for only a modest 5% of the variance (R2 = .048, 95% CI [.01, .02]), the addition of the MSEs still improved the model significantly (ΔR2 = .016, 95% CI [.01, .02]) (Table 14). We note that at the second step, where the composite scores were added, the improvement for auditory lexical decision was a negligible 0.4%, lower than the improvement when the MSEs were added at the final step.

Table 13 Comparison of regression models predicting auditory lexical decision performance
Table 14 Results of hierarchical regression analyses for auditory lexical decision task performance

In sum, across all three datasets, token-weighted consistency measures continued to yield better predictive models than their type-weighted counterparts. This is an expected outcome: consistency effects should reflect the influence of statistical patterns across many similar parts of words, so the most difficult items, both in acquisition and in processing, are those with rare print-sound correspondences that are encountered infrequently (Jared, 2002; Lee et al., 2005). Through token-frequency-weighted training using the subtitle-based corpus counts, the connection weights came to reflect the appropriate relationships between orthography and phonology while taking into account how often readers and listeners encounter a particular word type when using the language.

Token-weighted MSEs from both the feedforward and feedback directions improved all three-step regression models, albeit to different extents, even when the corpus-derived composite consistency measures had already been included. The most marked improvement was observed in predicting auditory lexical decision performance (1.6%), followed by visual naming (1.5%) and visual lexical decision (0.7%). Although the improvement contributed by the MSEs was smallest for visual lexical decision, it still explained variance over and above the corpus-derived composite measures. Similar patterns of results across all three tasks indicate that MSE captures consistency effects in lexical decision tasks better than the conventional measures.

The auditory lexical decision task is relatively novel in the word-recognition literature, and findings suggest that visual and auditory lexical decision are based on different processes (e.g., Rodd, Gaskell, & Marslen-Wilson, 2002; Ernestus & Cutler, 2015; Brysbaert, Stevens, Mandera, & Keuleers, 2016; Segui, 1994; Ferrand et al., 2018). Indeed, our findings of word frequency and length effects in the visual lexical decision task are consistent with previous studies: faster responses are elicited in visual lexical decision by high-frequency words (e.g., Balota et al., 2007; Brysbaert et al., 2016; Cortese & Khanna, 2007; Keuleers, Lacey, Rastle, & Brysbaert, 2012; Yap & Balota, 2009) and by longer words (e.g., New, Ferrand, Pallier, & Brysbaert, 2006; Ferrand et al., 2010; Balota et al., 2007; Brysbaert et al., 2016; Keuleers, Diependaele, & Brysbaert, 2010). As seen in our stepwise regression analyses, however, these lexical variables did not contribute to response speed in the auditory lexical decision task. A similar pattern was found by Ferrand et al. (2018), who compared visual and auditory lexical decision times in a megastudy and found that the proportion of variance explained by word frequency was lower in the auditory (11%) than in the visual (45%) modality; the effect size of word length was also smaller in the auditory than in the visual modality. That the word frequency and length effects in our auditory lexical decision data were not merely smaller but absent is, we reason, likely due to the exclusion of multisyllabic words, which reduced statistical power (larger confidence intervals). This is a limitation of the present study, but it was necessary, as there is no reliable way to compute the different sub-level consistency measures for multisyllabic words without degrading the information that the composite scores provide.

General discussion

For reading science, the definition of consistency in terms of print-speech mappings is central to theorizing about reading across scripts (the orthographic depth hypothesis; Katz & Feldman, 1983) and reading acquisition (the psycholinguistic grain size theory; Ziegler & Goswami, 2005). In this paper, we defined consistency across different levels or unit sizes (granularity) for the quasi-regular orthography of English, and we compared these unit-level measures in terms of their interrelations, their combination, and their ability to predict human oral and silent reading, as well as auditory word recognition, which does not overtly require access to print information. Specifically, we investigated the role of print-to-speech (feedforward) and speech-to-print (feedback) word consistency, derived across levels of granularity, in word naming and in visual and auditory lexical decision. We further contrasted these corpus-based measures of consistency with the implicit learning of the same statistical regularities by neural network models, to determine which approach better accounts for human performance. Notably, the measures of consistency across the various unit sizes were only moderately correlated with each other, while a composite of these measures accounted for variance in task performance over and above traditional word characteristics, such as frequency and length. The main results can be summarized as follows: (1) robust feedforward and feedback consistency effects were obtained in word naming; (2) feedback consistency (but not feedforward consistency) effects were found in both visual and auditory lexical decision; (3) using a metric derived from neural network models (MSEs) as a proxy for consistency, both feedforward and feedback consistency effects were found across all three human tasks.

With regard to the first finding, from Study 1 on word naming, the present results align with previous studies of quasi-regular orthographies such as English, where the rime's consistency has been found to be a salient unit in reading monosyllabic words (De Cara & Goswami, 2002; Treiman & Kessler, 1995; Ziegler & Goswami, 2005) and onset consistency has been reported as a reliable predictor of word recognition (Balota et al., 2004; Treiman et al., 1995; Yap & Balota, 2009). While these sublexical units of the onset-rime structure are important in early reading development, being accessible to children before they can reliably access phonemes (Goswami & Bryant, 1990; Treiman, 1992), the present results show that other sublexical units contribute more to adult word reading. Specifically, the consistency of the onset-vowel-coda structure was shown in the dominance analysis to account for more variance in adult naming times, particularly for feedback consistency. Consistency measured at the rime and onset levels has thus far led to disagreement regarding the effects of feedback consistency on word naming and visual recognition. For example, Balota et al. (2004) reported that feedback consistency of both the onset and the rime affected naming latencies, with results more robust for naming than for lexical decision, while the opposite pattern was observed by Ziegler, Montant, and Jacobs (1997a). Later studies of visual lexical decision reported no feedback consistency effects (Kessler, Treiman, & Mullennix, 2008; Peereman, Content, & Bonin, 1998; Ziegler, Petrova, & Ferrand, 2008). One factor that may account for such conflicting results is how feedback consistency was defined or measured, as most of these studies treated feedback consistency as a binary measure: if the rime spelling of a word is pronounced differently in other words, the word is considered inconsistent (e.g., Balota et al., 2004; Lacruz & Folk, 2004; Peereman et al., 1998; Stone et al., 1997; Ziegler et al., 1997a). Another concern is that the rime may not be the only unit relevant to pronunciation (Jared et al., 1990), as previous studies have shown that the pronunciation of vowels can vary systematically with the identity of the preceding consonant (Treiman, Kessler, & Bick, 2003; Treiman, Kessler, Zevin, Bick, & Davis, 2006). By taking all unit sizes into account, the current studies show that feedback consistency reliably predicted word naming performance, as well as lexical decision performance in Study 2, albeit with smaller effects. Given the problems with defining consistency narrowly at one level of granularity, future studies should examine consistency at various grain sizes and treat it as a continuous variable with graded effects (e.g., Treiman et al., 1995; Jared et al., 1990).

With regard to the second main finding, from Study 2, feedforward consistency did not affect lexical decision times as it did word reading times. This contradicts our initial prediction that visual lexical decision, but not auditory lexical decision, would depend in part on orthography-to-phonology consistency, with slower responses to printed words that could be pronounced in multiple ways, even though the task requires only lexical confirmation and not articulation. While some investigators have argued that lexical decision should not show feedforward consistency effects because the task requires no pronunciation or reliance on phonology (Jared et al., 1990), this reasoning is at odds with the dynamic systems framework, which posits interactive connections between orthographic and phonological units (e.g., Stone & Van Orden, 1994; Van Orden & Goldinger, 1994). Moreover, studies have provided evidence of feedforward consistency effects in lexical decision tasks (e.g., Yap & Balota, 2009; Balota et al., 2004; Stone et al., 1997; Ziegler et al., 1997a). If information flows not only from spelling to sound but also from sound back to spelling, one would expect to find both feedback and feedforward consistency effects in lexical decision tasks. However, neither the composite measure of feedforward consistency (in the hierarchical regression models) nor any of the feedforward measures across grain sizes (in the dominance analysis) contributed significantly to lexical decision response times. Our prediction that feedback consistency would contribute exclusively to auditory lexical decision was, in turn, only partially supported: only the feedback composite, and not the feedforward composite, contributed to auditory lexical decision times, similar to the results for visual lexical decision. Thus, phonology-to-orthography feedback consistency comes into play whether adult readers hear or see a stimulus word/pseudoword. In contrast to these corpus-derived composite consistency measures, the consistency metric derived from the PDP neural network model in Study 3 revealed both feedforward and feedback consistency effects in both lexical decision tasks, regardless of modality.

Regarding the third finding, from Study 3, MSE-derived estimates of feedforward word consistency accounted for more variance than the corpus-derived measures of feedforward consistency in all three tasks. In particular, the effect of feedforward composite consistency observed in the naming task disappeared when the MSEs were added to the final regression model, suggesting that the print-sound information on which visual word naming relies is not fully captured by the composite consistency measure. For feedback consistency, the corpus-derived composite's main effect on auditory lexical decision times disappeared when the MSEs were added to the final model. This was not the case in the two visual tasks, naming and lexical decision, where the corpus-based feedback consistency effects remained significant, though weaker, when the MSEs were added. This suggests that the data-driven MSEs can fully account for the conventional consistency measures computed from parts of words, at least for auditory lexical decision. Together, the present results demonstrate that the data-driven MSE obtained from a bidirectional PDP model can be a more reliable estimator of print-sound and sound-print consistency than estimates based on the properties of a word's neighborhood. Moreover, token-based MSE estimates best predicted the performance of adult readers, likely because they take into account how frequently readers encounter a given word, again indicating that consistency should be weighted by token frequency.

To summarize, the present work demonstrated how consistency can be computed over different parts of the word as a continuous composite measure, and that this measure shows stable feedback consistency effects across naming, visual lexical decision, and auditory lexical decision tasks. The robust feedforward and feedback consistency effects observed across the three tasks in Study 3 indicate interactivity between a word's phonology and orthography in word-recognition tasks, consistent with hypotheses advanced in previous studies (e.g., Coltheart et al., 2001; Van Orden & Goldinger, 1994; Van Orden et al., 1990). These findings support several predictions made by interactive network models (e.g., the Interactive Activation model, McClelland & Rumelhart, 1981; the Parallel Distributed Processing model, Seidenberg & McClelland, 1989). First, feedback consistency effects can be found across naming and lexical decision tasks (e.g., Lacruz & Folk, 2004; Pecher, 2001), indicating that phonology is involved in word recognition. Second, consistency matters in both the orthography-to-phonology and phonology-to-orthography directions, supporting the cross-code consistency account proposed by Grainger et al. (2005). Third, feedback consistency effects occur in both visual (e.g., Stone et al., 1997; Ziegler et al., 1997a) and spoken word recognition (e.g., Ziegler & Ferrand, 1998; Ventura et al., 2004; Miller & Swick, 2003).

Implications for models of reading

The present results are novel with regard to quantifying the quasi-regularity of the orthography-phonology mapping for English words through the implicit learning process of a neural network. Moreover, the mechanisms involved in word recognition may be better elucidated by the current neural network models, which include fully bidirectional links among units in the three layers: orthographic, hidden, and phonological. This contrasts with previous recurrent networks (e.g., Plaut et al., 1996), which simulated the reading direction of information flow (orthography to phonology) and restricted feedback connections to phonology-to-hidden units. Our models encapsulate a functional 'reader' who is versed not only in reading but also in spelling and writing; information thus flows bidirectionally in both the reading and the spelling direction. We assume a close relation between reading and spelling processes, which mutually affect each other, such that naming a word via orthography-to-phonology links also involves feedback of the retrieved phonological representation to verify the word's orthographic form, or spelling.

Specifically, we utilized one of the two main classes of computational reading models, the PDP model (Harm & Seidenberg, 2004), which makes no distinction between lexical and sublexical processing, instead instantiating phonology-orthography mappings through largely emergent co-activated patterns across granularities. We used the back-propagation through time algorithm (BPTT; Werbos, 1990) to train our recurrent network, in which the states of units change smoothly over time in response to influences from other units. When the activity in the input layer at time t − 1 is propagated forward, all hidden units receive the corresponding input at time t through the feedforward orthography-hidden connections. In a similar fashion, when hidden-layer activity at time t − 1 is propagated through the connections between the hidden and output layers, all output units at time t are affected. Even when a model is trained only on a reading task, the feedback connections cause the activity in the output layer at time t − 1 to influence the hidden activity at time t. Once all time steps have elapsed, a single backward pass through all of the ticks is performed and error derivatives are propagated back to update the connection weights.

Without advocating for either of the two main classes of computational reading models (PDP or DRC, the dual-route cascaded model; Coltheart, Curtis, Atkins, & Haller, 1993; Coltheart et al., 2001), we draw distinctions between these architectures based on the granularity at which orthographic and phonological representations are mapped. In contrast to PDP, DRC models do distinguish lexical from sublexical processing, by instantiating a set of pre-determined print-to-speech correspondence rules at the sublexical level and interconnected lexicons (for orthographic and phonological representations) at the lexical level. This accounts for the reading of both irregularly and regularly spelled words. To account for the feedforward consistency effect on naming, an architecture combining the DRC with a PDP network for sublexical processing was developed as the Connectionist Dual Process model (CDP++; Perry et al., 2007; 2010). Its sublexical network involves two layers that are trained to associate graphemes with phonemes through exposure to real words, just as in PDP; however, the mapping process is feedforward, and there are conflicting results as to whether this mechanism accounts for feedback effects (Ziegler et al., 2008; Yap & Balota, 2009). This challenges the idea that a feedback mechanism in the sublexical route is necessary for feedback consistency effects to manifest.

Our PDP model was inspired by the resonance theory of word perception put forward by Van Orden and Goldinger (1994), whereby orthographic representations communicate bidirectionally with both phonological and semantic representations as the initial activation spreads across the network following presentation of a printed word stimulus. In such an interactive model, both feedforward and feedback consistency of an input determine how fast and stable activation propagates through the network (see also the cross-code consistency account proposed by Grainger et al., (2005)).

When our PDP model explicitly implemented bidirectional connections between orthography and phonology, the network's error during reading aloud (i.e., orthography-to-phonology) was higher for more feedback-inconsistent words. This suggests a resonance effect whereby word naming reorganizes both feedforward and feedback connections in a way that optimizes the subregularities between the orthographic and phonological layers in both directions. Such optimization must still contend with the quasi-regular nature of the mappings. When a feedback-inconsistent word is presented to the bidirectional reading model in a naming task during training, the activated phonological representations will, via the feedback connections, re-activate the orthographic representations of several word bodies. These orthographic representations constrain each other, and the resulting competition slows learning for feedback-inconsistent words, yielding a higher reading MSE. In the context of a lexical decision task, where semantic knowledge is necessary, the activated phonological and semantic representations in a triangle model (e.g., Plaut et al., 1996) would similarly re-activate orthographic representations via feedback connections. Although our current model lacked a semantic layer to capture such interactions between the orthography-semantics and phonology-semantics levels, the present results indicate that resonance between orthographic and phonological units plays a role in word recognition and that, across tasks, this bidirectional activation between orthography and phonology is likely distributed over several different grain sizes of representation that are difficult to measure as a single composite variable.

Implications for theories of reading and reading development

Our general finding of word consistency effects on adult word recognition suggests that these effects emerge over different levels of granularity and that they are bidirectional, from print to sound and vice versa. This has several implications for developing readers. First, because consistency effects are present in skilled adult readers, it is important to identify the degree of consistency of the words used in early literacy instruction. As young children have to acquire print-sound correspondences, in many cases through implicit learning, their exposure to printed words should facilitate this learning process. For ranking words by their degree of consistency, the extant literature has relied on different definitions of consistency, from rime patterns to single graphemes; an accounting of consistency across granularities would therefore be a more useful resource.

Second, despite a general consensus that reading and writing skills co-develop in young children, only a few theories of reading development directly address this dynamic, interactive process (Frith, 1985; Bosman & Van Orden, 1997; Lerkkanen, Rasku-Puttonen, Aunola, & Nurmi, 2004; Kim, Petscher, Wanzek, & Al Otaiba, 2018), and there have been recent calls for an integrated science of reading and writing (Graham, 2020). A better understanding of the joint development of these literacy skills may directly inform how teachers plan spelling lessons so that both letter(s)-sound and sound-letter(s) patterns are reinforced, along with higher-level literacy skills (Graham, 2020). Educators could use the consistency measures obtained in the current studies to identify specific sets of words that are more challenging to learn to read and spell, or to rank words by their consistency metrics as a basis for deciding when to introduce them into the literacy curriculum. Reading experts have long recognized that teaching spelling to early readers helps them develop more robust mental representations (Moats, 2005; Snow & Juel, 2005; Andrews, Veldre, & Clarke, 2020).

While the current study examined the effects of feedback and feedforward consistency and of bidirectional orthographic-phonological resonance on adult reading performance, future research could take a similar approach to developmental trajectories. A neural network modelling approach would be fruitful for understanding the related phenomena of consistency, word frequency, and age-of-acquisition effects in a developmental model in which lexical representations and neighborhood effects are dynamic. Capturing these effects in such a model could flesh out print-exposure mechanisms such as lexical tuning (Castles, Davis, Cavalot, & Forster, 2007) or lexical restructuring with increasing vocabulary (Goswami, 2000; Walley, Metsala, & Garlock, 2003). This work would have interdisciplinary relevance to fields at the intersection of cognitive science and education.

Supplementary information

Uni- vs. bi-directional models

We tested two sets of models to determine whether bidirectional connections are needed to capture orthography-to-phonology (O2P) or phonology-to-orthography (P2O) mapping consistency when a model is trained on a unidirectional task (i.e., reading or spelling). All models had the same architecture, so that their results are comparable, and differed only in whether the task-irrelevant weights were frozen at their initial values during training (Fig. 10). In the first set of unidirectional models, the reading and spelling models had their task-irrelevant weights (P2O and O2P, respectively) frozen, to simulate the dynamics of a unidirectional network. In the second set of bidirectional models, no weights were frozen, and the models were allowed to change the weights in both the O2P and P2O directions during training.
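The models themselves were implemented in LENS, but the freezing manipulation can be illustrated in PyTorch terms: the unidirectional controls simply exclude the task-irrelevant pathway's weights from gradient updates, leaving them at their random initial values (layer names and sizes here are illustrative, not those of the original scripts):

import torch

class BiPDPSketch(torch.nn.Module):
    """Skeleton of the bidirectional orthography <-> phonology pathways."""
    def __init__(self, n_orth=260, n_phon=224, n_hidden=300):
        super().__init__()
        self.o2h = torch.nn.Linear(n_orth, n_hidden)  # orthography -> hidden (O2P path)
        self.h2p = torch.nn.Linear(n_hidden, n_phon)  # hidden -> phonology (O2P path)
        self.p2h = torch.nn.Linear(n_phon, n_hidden)  # phonology -> hidden (P2O path)
        self.h2o = torch.nn.Linear(n_hidden, n_orth)  # hidden -> orthography (P2O path)

model = BiPDPSketch()

# Unidirectional reading control: freeze the P2O direction at its initial weights
for layer in (model.p2h, model.h2o):
    for param in layer.parameters():
        param.requires_grad = False

# The optimizer then updates only the O2P weights; the bidirectional model
# would instead pass all parameters.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.05)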

Fig. 10 Architecture of the reading and spelling connectionist models implemented. The top and bottom panels depict unidirectional and bidirectional models, respectively. Red solid lines indicate trainable weights; blue dashed lines indicate frozen weights

Across all three datasets, results showed that MSEs extracted from the bidirectional models yielded a lower AIC, and thus a better fit to human RTs, than those from the unidirectional models when compared in the same direction (i.e., feedback vs. feedback and feedforward vs. feedforward MSEs) (Tables 15, 16, and 17). This shows that bidirectional connections are necessary for PDP models to maximally extract the latent quasi-regularities in spelling-sound and sound-spelling mappings.

Table 15 Comparison of using uni- and bi-directional models’ MSEs to predict visual naming performance
Table 16 Comparison of using uni- and bi-directional models’ MSEs to predict visual lexical decision performance
Table 17 Comparison of using uni- and bi-directional models’ MSEs to predict auditory lexical decision performance

Individual hierarchical regression analyses for visual naming task

In Table 10, the regression analyses indicate that FF_MSE and FB_MSE contribute similarly, in terms of magnitude, to human naming performance. However, in the earlier regression analyses of the composite measures (see Table 4), FB_Composite accounted for a greater proportion of the variance than the FF_Composite score. To investigate the reason for this disparity between the MSE and composite measures, we conducted four additional hierarchical regression analyses for the visual naming task. These analyses used an identical regression model in Step 1 and added one of the four measures of interest (i.e., FF_MSE, FB_MSE, FF_Composite, FB_Composite) in Step 2.

Interestingly, when FB_Composite was included in the model, the previously significant OrthND effect became nonsignificant in Step 2 (Table 18). This change in the OrthND effect was not observed when FF_Composite, FB_MSE, or FF_MSE was added to the same Step-1 model (Tables 19, 20, and 21). These findings suggest that FB_Composite is associated with OrthND and that, in the Step-2 model, it reflects the combined influence of OrthND and its own effect. When FB_Composite was excluded, OrthND captured part of the effect of FB_Composite and therefore remained significant in Step 1.

Table 18 Complete hierarchical regression of FB_Composite predicting visual naming task performance
Table 19 Complete hierarchical regression of FF_Composite predicting visual naming task performance
Table 20 Complete hierarchical regression of FB_MSE predicting visual naming task performance
Table 21 Complete hierarchical regression of FF_MSE predicting visual naming task performance