A formal account of the interaction of orthography and perception
This study presents a formal generative model that integrates perception and reading, and uses English intervocalic consonants borrowed into Italian as either singletons or geminates to illustrate how the model works. Consisting of words borrowed in the 20th century, our data show that the quantity of the intervocalic consonant in an Italian loanword depends on its written representation in English, the source language. Thus only English intervocalic consonants that are written with two identical letters (for example, as in splatter) are borrowed as geminates. We provide a formalization of these orthographic adaptations with grapheme-to-phoneme mappings in the shape of Optimality-theoretic constraints that model the native reading process, and show how the output of these mappings is restricted by native phonotactic constraints. Furthermore, we illustrate that the native reading grammar proposed here complements the perceptual adaptation model by Boersma and Hamann (2009). This combined model is shown to be able to account for simultaneous orthographic and perceptual borrowings in Italian, as well as to hold for reading and perception outside the realm of loanword adaptation.
Keywords: Phonology, Loanwords, Orthography, Perception, Reading grammar, Italian
Several studies on loanword adaptations state the important role orthography can play in the adaptation process. Friesner (2009), for instance, illustrates that Romanian loans of French words with “final orthographic consonants that are not pronounced in French are occasionally realized in Romanian loans” (128), e.g. the French word boulevard [bul(ə)vaʀ], which is borrowed as [bulevard] in Romanian. Smith (2006) found that in Japanese, loan doublets (borrowings of the same words twice with different resulting forms) often stem from separate means of borrowing, one perceptual and the other orthographic. Further examples of real adaptations that show an influence of orthography are given e.g. by Kang (2009) for Korean and Miao (2005) for Mandarin. A number of experimental studies on second-language perception, which are often interpreted as imitations of online loanword adaptations, show that writing can have a positive influence on the correct perception and identification of L2 segments; see e.g. the studies by Vendelin and Peperkamp (2006), Detey and Nespoulous (2008), Escudero et al. (2008), Escudero and Wanrooij (2010), Porter (2010), and Daland et al. (2015) and the overview by Bassetti et al. (2015). What the literature lacks, however, is a formalization of such an orthographical influence; that is, how the written form must be incorporated into a formal grammar model to account for the observed effects.
In this article we provide a model that accounts for orthographic borrowings by analyzing loanword adaptations in Italian, a language with a relatively transparent grapheme-to-phoneme mapping. Following Coltheart et al. (1993), we use ‘grapheme’ to refer to any letter or group of letters that corresponds to a single phoneme. Italian has a singleton-geminate contrast in consonantal length in inter-sonorant position, which is reflected in the orthographic representation of the consonants. In loan adaptations into Italian, intervocalic consonants in words from languages that only have singletons, such as English, are often borrowed as geminates, e.g. hobby /ˈɔbbi/. Although accounts of Italian loanword adaptations abound, e.g. Rando (1970), Repetti (1993, 2009, 2012), Morandini (2007) and Passino (2008, 2013)—and most do mention an influence of orthography on the adaptation of consonants—none provides a grammar model that can account for this influence.
The present study shows that for Italian borrowings from English in the 20th century the quantity of the intervocalic consonant in the Italian loanword depends on its written representation in English. More precisely, only English intervocalic consonants that are written with two identical letters are borrowed as geminates. To account for such an orthographic influence on the borrowing process, we provide the formalization of a native reading grammar that maps a written form onto a phonological surface form, where the output is restricted by native phonotactic constraints. This native reading grammar alone is shown to be able to account for the attested orthographic effects.
In the following section of this article, we introduce Italian syllable and word structure. Section 3 provides the loanword data, illustrates the influence of orthography on the borrowing of intervocalic consonants, and shows an interaction of orthography and perception in the borrowing process for some of these words. Section 4 formalizes the native reading grammar and its possible interaction with speech perception in Optimality Theory, henceforth ‘OT’ (Prince and Smolensky 1993). Section 5 shows briefly what a reading grammar looks like for two identical consonant letters in German native and non-native words; further, it discusses the implications of the reading grammar for a larger grammar model of linguistic knowledge, the possible modelling of the writing process, and the possible modelling of reading of languages with a less transparent grapheme-phoneme mapping than Italian. Section 6 compares the present proposal to earlier formal accounts of reading, and to earlier accounts of orthographic loanword adaptation in Italian. In Sect. 7, we offer some conclusions.
Before introducing Italian phonology, a remark on the employed notation is in order. In this article we use pipes for underlying, lexical representations; slashes for surface, allophonic representations; square brackets for auditory forms; and angle brackets for written forms. Geminates in surface phonological representations are transcribed by two separate identical consonant symbols (rather than one symbol with an additional length sign) as this allows a positioning of the syllable boundary between the two. Auditory forms stand for concrete values along continuous auditory dimensions such as first formant, second formant, duration, etc. They are given in IPA transcription but should not be confused with abstract phonological categories such as allophones or phonemes, which are also transcribed with IPA symbols, albeit in either slashes or pipes.
2 Italian phonology in brief
Italian singleton consonants
The vowel system of Italian consists of seven phonemes | a e ε i o ɔ u |, all of which can occur in stressed position. In unstressed position, the lax vowels | ε ɔ | are prohibited. We assume in the following analysis that consonantal length contrasts are stored underlyingly and that vowel length contrasts are not (see e.g. Repetti 1993 or Krämer 2009; for an alternative proposal see e.g. Saltarelli 1984). We further assume that geminates are parsed as heterosyllabic (see Saltarelli 1983; Loporcaro 1990).
(2) Structural constraints relevant for the present account of Italian
a. Assign a violation mark to every stressed syllable that is not bimoraic.
b. Assign a violation mark to every word-final long vowel.
c. Assign a violation mark to every intervocalic singleton /ʃ/.
d. Assign a violation mark to every intervocalic geminate /z/.
e. Penult: Assign a violation mark to every word with non-penultimate stress.
Stress in non-derived nouns is most frequently on the penultimate syllable (for references, see Krämer 2009:161). We employ the constraint Penult in (2e) to assign this default stress (following Repetti’s 1993 rule 1). There are numerous exceptions, e.g. /.ʧit.ˈta./ ‘city’ or /.ˈpεː.ko.ra./ ‘sheep’, for which stress is assumed to be stored lexically.
The constraints in (2) are sufficient for the analysis of singletons and geminates in native and borrowed words in the present article but do not cover all of Italian phonology; for a complete picture, see e.g. Krämer (2009).
3 The data
The data analyzed in the present study all stem from the dictionary by Zingarelli et al. (2015). For accuracy, they were checked with six native speakers: three older (average age of 63) and three younger (including this article’s second author; average age of 25). The native speakers largely agreed with the pronunciations given in Zingarelli; for a full list of their responses, see the Appendix.
We focus on Italian loanwords borrowed from English in the last century (which makes an orthographic influence in the borrowing process more likely) that have an intervocalic consonant preceded by a short/lax vowel in English, as this can result in either a singleton or a geminate in the Italian loanword. Preceding long/tense vowels in the English form result in a borrowing with a singleton (see e.g. slogan /.ˈzlɔː.gan./ or speaker /.ˈspiː.ker./), for two reasons. First, a vowel with long duration in relation to a shorter following consonant is perceptually interpreted by Italian native speakers as a vowel followed by a singleton (Esposito and Di Benedetto 1999; Pickett et al. 1999). Second, long/tense interconsonantal vowels in English are always written with only one following consonantal letter, and this single consonantal letter does not cause an orthographic interpretation as geminate. Orthographic and perceptual information would therefore result in the same adaptation of a consonant preceded by a long/tense vowel, namely as a singleton.
We base our decision whether the preceding vowel is tense or lax on a Standard Southern British English variety, because we assume that this is the variety that native speakers of Italian living in Italy are more exposed to (or at least were in the first half of the 20th century). Where relevant, we discuss possible alternatives. For an account of the incorporation of American English words into the Italian of Italian immigrants, see e.g. Repetti (2009).
(3) Borrowings with geminate consonants
banner <nn>
hobby <bb>
horror <rr>
jolly <ll>
hippie <pp>
shopping <pp>
thriller <ll>
rally <ll>
buffer <ff>
splatter <tt>
jogging <gg>
newsletter <tt>
happening <pp>
baby sitter <tt>
no comment <mm>
account <cc>
attachment <tt>
commando <mm>
pullover <ll>
(4) Borrowings with singleton consonants
hockey <ck>
hacker <ck>
editor <d>
glamour <m>
cameraman <m>
monitor <n>
Orthography, on the other hand, is clearly correlated with the choice of consonantal length: all the consonants borrowed as geminates in (3) are written with two identical letters, while the consonants borrowed as singletons in (4) are written either with a single letter or two different letters (<ck>). Italian geminates are orthographically represented by two identical letters (or two identical letters followed by another letter, e.g. <cc(i)> for /ttʃ/). The only exceptions are the intrinsically long consonants /ɲː ʃː ʎː tːs dːz/, which are written with single consonant letters (though /tːs dːz/ can also be written as <zz>), and the intervocalic sequence <cqu> which is /k.kw/ (e.g. acqua /ˈak.kwa./). Singleton consonants are usually written with a single letter. Exceptions to this one letter-singleton generalization are <ch> for /k/ and <gi> for /dʒ/ before non-front vowels, and <gn> for /ɲ/, <gl(i)> for /ʎ/, and <sc(i)> for /ʃ/ word-initially; for further details see e.g. Hall (1944). We therefore propose that the borrowers apply their knowledge of Italian grapheme-to-phoneme correspondences to the English written form when adapting these words.
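The orthographic generalization can be stated as a small check. The following Python sketch is only an illustration of the descriptive pattern in (3) and (4), not the OT formalization developed in Sect. 4; the function name and the hand-annotated word lists (a subset of the data) are assumptions made for this example, and phonotactic restrictions and Italian-specific graphemes such as <cc(i)> or <cqu> are deliberately ignored.

```python
VOWEL_LETTERS = set("aeiou")


def predicted_quantity(consonant_spelling: str) -> str:
    """Predict consonant quantity from the English spelling of the consonant.

    Two identical consonant letters predict a geminate; a single letter or
    two different letters (e.g. <ck>) predict a singleton.
    """
    letters = consonant_spelling.lower()
    is_doubled = (
        len(letters) == 2
        and letters[0] == letters[1]
        and letters[0] not in VOWEL_LETTERS
    )
    return "geminate" if is_doubled else "singleton"


# Hand-annotated subset of the loanwords in (3) and (4): word -> spelling of
# the intervocalic consonant (annotation added for this illustration).
GEMINATE_LOANS = {"banner": "nn", "hobby": "bb", "splatter": "tt", "pullover": "ll"}
SINGLETON_LOANS = {"hockey": "ck", "hacker": "ck", "editor": "d", "glamour": "m"}

for word, spelling in {**GEMINATE_LOANS, **SINGLETON_LOANS}.items():
    print(f"{word}: <{spelling}> -> {predicted_quantity(spelling)}")
```

The check looks only at the written consonant, which is exactly why it cannot capture cases where native phonotactics override the spelling.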
(5) Borrowings that violate the orthographic prediction
The third exception in (5), /.ˈmɔː.bin./, is not due to phonotactic restrictions, as a form with a geminate (/.ˈmɔb.bin./) is not only possible but also the only one that our six native speakers used. According to these native speakers’ judgments, mobbing therefore forms no exception and is borrowed with a geminate, as the English orthography would predict.
For the words in (3) and (4) above we argued that the orthography determines the quantity of the consonant (and the correlating quantity of the preceding vowel). A further indicator for an orthographic influence in the adaptation of these words is the quality of the vowels preceding the singletons/geminates. Several of them reflect the Italian grapheme-phoneme mapping for vowels instead of the English pronunciation, see e.g. banner /.ˈban.ner./, splatter /.ˈsplat.ter./ and hacker /.ˈaː.ker./. In these cases the English [æ] has not been rendered with the perceptually closest Italian vowel /ε/ (see e.g. the results of the perception experiment by Flege et al. 1999), but with /a/, the vowel corresponding to the grapheme <a> in Italian. For words like these, we can therefore assume a purely orthographic borrowing process, which we formalize in Sect. 4.2 below.
The borrowed vowels in other words are clearly influenced by the perception of the English form, as e.g. the stressed vowels in buffer /.ˈbaf.fer./, glamour /.ˈglεː.mur./ and rally /.ˈrεl.li./. For cases like these, we have to assume an interaction of perceptual and orthographic borrowing strategies, where perceptual cues to vowel quality and orthographic mappings together determine the output (again restricted by native phonotactic constraints). This interaction of orthographic and perceptual mappings is formalized in Sect. 4.3 below.
4 Modelling orthographic and perceptual borrowings
In this section, we formalize the orthographic adaptation of English intervocalic consonants into Italian by introducing an Optimality-Theoretic reading grammar, i.e. the language-specific mapping of written forms onto surface phonological forms, which is used in the reading process.
The working of an Italian reading grammar is illustrated with three native Italian words in Sect. 4.1. In Sect. 4.2, we show how orthographic borrowings from English can be accounted for by applying this Italian reading grammar to English written forms. This is of course only possible because both native and source language employ a Roman alphabetic script. In Sect. 4.3, we illustrate that our reading grammar is compatible with a perception grammar as proposed by Boersma (2007) and applied by Boersma and Hamann (2009) to loan adaptations, and that the combined model can account for possible cases of simultaneous orthographic and perceptual borrowings.
4.1 Native reading: Orthographic mappings and phonotactic constraints
(6) General orthographic constraints for shallow orthographies
a. Assign a violation mark to every grapheme <γ> that is not mapped onto the phonological form /P/ and vice versa.
b. Assign a violation mark to every grapheme <γ> that is mapped onto an empty segment in the SF.
c. Assign a violation mark if the absence of a grapheme is mapped onto the phonological form /P/.
In principle, a universal set of such orthographic constraints can be assumed, mapping all possible written units (including an empty form) onto all possible phonological surface units (including an empty form). It seems more plausible to us, however, that the language learner postulates such constraints on the basis of the acquired phonological surface units and the encountered written units, as this drastically reduces the number of constraints the learner has to handle and reflects the fact that the acquisition of reading (and writing) depends on previously acquired phonological knowledge (further discussed below). Under this assumption, the constraints in (6) could be considered templates that learners employ to create language-specific Orth constraints.
Examples of Orth constraints of the shape (6a) that are relevant (i.e. high-ranked) in Italian are e.g. <f>/f/, <t>/t/, <u>/u/, <a>/a/, etc., but also e.g. <gn>/ɲ/, <gli>/ʎ/, i.e. constraints with graphemes that consist of two or more letters and therefore violate the “one letter–one sound” principle. Such constraints have to be higher ranked than the constraint against an “empty” mapping in (6b) and constraints mapping single letters onto phonemes, to ensure the retrieval of the correct SF in words such as e.g. <gnocchi> /.ˈɲɔk.ki./ or <aglio> /.ˈaʎ.ʎo./ ‘garlic’. Languages like Italian with so-called shallow alphabetic writing systems (Liberman et al. 1980; Katz and Feldman 1983), where the spelling is consistent, have mostly a one-to-one relationship between grapheme and phoneme, and mostly graphemes that consist of single letters. Languages like French and English with so-called deep alphabetic writing systems have more graphemes that consist of several letters, very often several graphemes for the same phoneme, and the same graphemes for different phonemes.
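The effect of ranking multi-letter Orth constraints above single-letter ones can be approximated procedurally by a longest-match parse. The Python sketch below is such an approximation and not an OT evaluation; the table `ORTH` is a hypothetical fragment containing the mappings named above plus an assumed <o>/o/, and the output is a bare segment sequence without syllabification, stress or length.

```python
# Hypothetical fragment of Italian grapheme-to-phoneme mappings. <o>/o/ is an
# assumption added so that the example word can be parsed completely.
ORTH = {
    "gli": "ʎ",
    "gn": "ɲ",
    "f": "f",
    "t": "t",
    "u": "u",
    "a": "a",
    "o": "o",
}


def parse_graphemes(written: str) -> list[tuple[str, str]]:
    """Split a written form into graphemes, preferring the longest match.

    Trying longer graphemes first mimics the ranking of multi-letter Orth
    constraints above single-letter ones.
    """
    max_len = max(len(grapheme) for grapheme in ORTH)
    result = []
    i = 0
    while i < len(written):
        for size in range(min(max_len, len(written) - i), 0, -1):
            chunk = written[i:i + size]
            if chunk in ORTH:
                result.append((chunk, ORTH[chunk]))
                i += size
                break
        else:
            # No grapheme in the table starts at this position.
            raise ValueError(f"no mapping for {written[i:]!r}")
    return result


print(parse_graphemes("aglio"))
```

For <aglio> the parse yields a single /ʎ/; that this segment surfaces as long in /.ˈaʎ.ʎo./ follows from its status as an intrinsically long consonant, which this sketch does not model.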
(7) Orthographic constraints relevant for the singleton-geminate contrast in Italian
a. Assign a violation mark if a grapheme of two identical consonantal letters is not mapped onto a surface geminate, and vice versa.
b. Assign a violation mark whenever a single vowel letter is mapped onto a long surface vowel.
c. Assign a violation mark to every vocalic letter with a grave accent that is not mapped onto a stressed vowel.
d. Assign a violation mark to every letter <h> that is not mapped onto an empty segment in SF.
The constraint in (7c) is included to illustrate how orthographic markings of non-default stress patterns are dealt with. In this case, the grave accent on the vowel letter has to be mapped onto a corresponding stressed vowel. Italian has further possibilities to mark irregular stress orthographically, which are not included in the present account. Constraint (7d) we employ to account for the fact that the letter <h> maps onto no SF in Italian.
In Sect. 2 above we provided evidence from Italian that the output of the grapheme-phoneme mapping is influenced by phonotactic restrictions: this captures the idea that readers only create phonological forms that are in line with the phonological structure of their language. It furthermore prevents orthographic mappings from reduplicating phonological knowledge that is already represented somewhere else in the readers’ grammar/brain. Neurolinguistic studies on reading alphabetic orthographies support our proposal: during the reading process, a cluster in the left inferior parietal gyrus is activated, which is usually also involved in non-reading related, sub-lexical phonological processes (see the meta-analysis of existing neuroimaging studies by Cattinelli et al. 2013). The assumed influence of phonological knowledge on the reading process furthermore predicts that a phonological deficit or difficulties in accessing phonological representations lead to problems in the acquisition of reading, which has been shown by studies on dyslexia (see e.g. Liberman and Shankweiler 1985; Ramus and Szenkovits 2008).
Section 5.3 below deals with possible orthographic mappings other than those onto surface phonological forms.
As for the lower part of Fig. 2, Orth constraints of the form <γ>/P/ referring to single letters, with <u>/u/ as an example, are assumed to be lower ranked than the constraint <βiβi>/Cː/ because the mapping of two or more letters overrides single letter mappings. The Orth constraint *<α>/Vː/ is low ranked, too, as mentioned above, because it is violated often in the Italian reading process since the length of the vowels is not expressed in Italian orthography. The exact ranking of the Orth constraint <h>/ / cannot be determined on the basis of our data, as we will see in tableau (12) below, and the constraint is therefore not included in Fig. 2.
We postulated the ranking in Fig. 2 based on the frequency of occurring forms and on logical considerations. This ranking is learnable with the help of the Gradual Learning Algorithm (GLA; Boersma 1997; Boersma and Hayes 2001), assuming an initial ranking of all constraints at the same height, and gradual demotion and promotion on the basis of natural input distributions.
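The learning scenario can be illustrated with a toy simulation. The following Python sketch assumes the symmetric update of the GLA as described by Boersma and Hayes (2001): evaluation noise is added to the ranking values, and after an error, constraints violated by the learner's wrong output are promoted while constraints violated by the observed form are demoted. The constraint labels, the hand-coded violation profiles for reading <fato>, and the values for plasticity, noise and number of trials are all assumptions chosen for this illustration.

```python
import random

# Shorthand labels (hypothetical) for (2a), (7a) and (7b).
CONSTRAINTS = ["StressBimoraic", "DoubleLetterGeminate", "NoLongVowel"]

# Hand-coded violation profiles of three candidates for the written form <fato>.
CANDIDATES = {
    "/.ˈfaː.to./": {"StressBimoraic": 0, "DoubleLetterGeminate": 0, "NoLongVowel": 1},
    "/.ˈfat.to./": {"StressBimoraic": 0, "DoubleLetterGeminate": 1, "NoLongVowel": 0},
    "/.ˈfa.to./": {"StressBimoraic": 1, "DoubleLetterGeminate": 0, "NoLongVowel": 0},
}
OBSERVED = "/.ˈfaː.to./"  # the form the learner encounters for <fato>


def evaluate(values, noise, rng):
    """Return the optimal candidate under a noisy, strictly ranked hierarchy."""
    noisy = {c: v + rng.gauss(0.0, noise) for c, v in values.items()}
    order = sorted(CONSTRAINTS, key=lambda c: noisy[c], reverse=True)
    return min(CANDIDATES, key=lambda cand: tuple(CANDIDATES[cand][c] for c in order))


def learn(trials=2000, plasticity=0.1, noise=2.0, seed=0):
    """Error-driven learning, starting with all constraints at the same height."""
    rng = random.Random(seed)
    values = {c: 100.0 for c in CONSTRAINTS}
    for _ in range(trials):
        guess = evaluate(values, noise, rng)
        if guess == OBSERVED:
            continue
        for c in CONSTRAINTS:
            diff = CANDIDATES[guess][c] - CANDIDATES[OBSERVED][c]
            if diff > 0:
                values[c] += plasticity  # violated by the learner's error: promote
            elif diff < 0:
                values[c] -= plasticity  # violated by the observed form: demote
    return values


ranking_values = learn()
print(ranking_values)
```

In this toy setting the constraint against long vowels can only be demoted, so it ends up below the other two, in line with its low ranking in the reading grammar; a realistic simulation would of course need the full constraint set and natural input distributions.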
(8) Reading native <fatto>
(9) Reading native <fato>
(10) Reading native <città>
The winning surface forms /.ˈfat.to./, /.ˈfaː.to./ and /.tʃit.ˈta./ from tableaux (8), (9) and (10) are mapped onto the underlying forms |fatːo|, |fato| and |tʃitːˈa|, respectively. In this shape they are assumed to be stored together with their meaning in the mental lexicon of Italian speakers. The mappings of surface onto underlying form are not relevant for the present argument and therefore not formalized, but see Sect. 5.2 below for the full grammar model.
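The evaluation behind the first two of these mappings can be sketched in a few lines of Python. The sketch is a simplification under stated assumptions: only three candidates are considered, stress is taken to fall on the first syllable, the function names are hypothetical labels for (2a), (7a) and (7b), and the ranking shown is one ranking compatible with the low position of the constraint against long vowels, not necessarily the complete ranking of Fig. 2.

```python
from dataclasses import dataclass

VOWEL_LETTERS = set("aeiou")


@dataclass(frozen=True)
class Candidate:
    form: str          # surface form
    geminate: bool     # intervocalic consonant is a geminate
    long_vowel: bool   # stressed vowel is long


def has_double_consonant(written: str) -> bool:
    """True if the written form contains two adjacent identical consonant letters."""
    return any(a == b and a not in VOWEL_LETTERS for a, b in zip(written, written[1:]))


def stress_bimoraic(written: str, c: Candidate) -> int:
    # (2a): the stressed syllable needs a coda (geminate) or a long vowel.
    return 0 if (c.geminate or c.long_vowel) else 1


def double_letter_geminate(written: str, c: Candidate) -> int:
    # (7a): double letters map onto a geminate, and vice versa.
    return int(has_double_consonant(written) != c.geminate)


def no_long_vowel(written: str, c: Candidate) -> int:
    # (7b): a single vowel letter should not map onto a long vowel.
    return int(c.long_vowel)


# Assumed ranking, highest first.
RANKING = [stress_bimoraic, double_letter_geminate, no_long_vowel]

CANDIDATES = [
    Candidate("/.ˈfat.to./", geminate=True, long_vowel=False),
    Candidate("/.ˈfaː.to./", geminate=False, long_vowel=True),
    Candidate("/.ˈfa.to./", geminate=False, long_vowel=False),
]


def read(written: str) -> Candidate:
    """Pick the candidate whose violation profile is best under strict ranking."""
    return min(CANDIDATES, key=lambda c: tuple(con(written, c) for con in RANKING))


for written in ("fatto", "fato"):
    print(f"<{written}> -> {read(written).form}")
```

Comparing violation profiles as tuples implements strict domination: a single violation of a higher-ranked constraint outweighs any number of violations lower down.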
4.2 Orthographic borrowings: Reading non-native forms
Orthographic adaptation of <banner>
Orthographic adaptation of <hacker>
Orthographic adaptation of <fashion>
Orthographic adaptation of <puzzle>
4.3 Formalizing simultaneous orthographic and perceptual adaptations
The quality of the stressed vowel in the adaptation of the words puzzle /.ˈpaː.zel./, fashion /ˈfεʃ.ʃon./ and in several other loanwords in (3) and (4) is obviously not due to Italian grapheme-phoneme mappings. Instead, the English original vowel quality is mapped onto its auditorily closest Italian equivalent. Hence, [pʰʌzl̩] turns into Italian /.ˈpaː.zel./, [fæʃən] into /ˈfεʃ.ʃon./, [ɹæli] into /.ˈrεl.li./, etc. In order to account for words like these, we have to model an interaction of orthographic influences (determining the quantity of the intervocalic consonant) and perceptual influences (determining the quality of the preceding vowel). For the latter, we assume Boersma’s (2007) perception grammar, which was applied by Boersma and Hamann (2009) to account for perceptual adaptations of loanwords. In the perception grammar, an incoming auditory form is mapped onto a SF with the help of Cue constraints of the form [A]/a/: “map the auditory form [A] onto the phonological surface form /a/.” The output of this perception is again influenced by phonotactic restrictions (the Structural constraints) and their language-specific ranking. As is obvious from this description, the OT reading grammar proposed in the present article is parallel to Boersma’s perception grammar. And similar to the presently proposed orthographic adaptation process via a native reading grammar, a perceptual adaptation is nothing other than perceiving a non-native auditory input with a native perception grammar (Boersma and Hamann 2009).
To illustrate an integrated orthographic-perceptual adaptation, the two Italian loanwords buffer /.ˈbaf.fer./ and rally /.ˈrεl.li./ are taken as representative examples, where the quality of the stressed vowels is obviously determined by the English auditory forms: Purely orthographic adaptations in the two example words would have rendered /.ˈbuf.fer./, due to the Orth constraint <u>/u/, and /.ˈral.li./, due to the constraint <a>/a/. These two Orth constraints are thus overridden by perceptual information, i.e. in OT terms, they are outranked by Cue constraints.
(15) Cue constraints of Italian relevant for the integrated adaptations
a. Assign a violation mark to every auditory form with mid second formant values that is not mapped onto the surface vowel /a/.
b. Assign a violation mark to every auditory form with high second formant values that is not mapped onto the surface vowel /ε/.
We have already determined that the Cue constraints (15a) and (15b) have to be ranked above the Orth constraints <u>/u/ and <a>/a/, respectively, to predict the correct output forms for the integrated orthographic-perceptual account. This ranking expresses the fact that the perceptual cues for vowels are more salient than the orthographic forms of the vowels, if both percept and writing are available to the borrower.
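The effect of this ranking can be made concrete for buffer. The Python sketch below is a minimal illustration under several assumptions: the candidate set is restricted to four forms, the Cue constraint is hard-wired for an auditory input whose stressed vowel has mid second formant values, Struct constraints are left out, and the function names are hypothetical labels. The relative ranking of the Cue constraint and the double-letter constraint is not fixed by this example; either order yields the same winner.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    form: str       # surface form
    vowel: str      # quality of the stressed vowel
    geminate: bool  # intervocalic consonant is a geminate


BUFFER = [
    Candidate("/.ˈbaf.fer./", "a", True),
    Candidate("/.ˈbuf.fer./", "u", True),
    Candidate("/.ˈbaː.fer./", "a", False),
    Candidate("/.ˈbuː.fer./", "u", False),
]


def cue_mid_f2_a(c: Candidate) -> int:
    # (15a), for an auditory stressed vowel with mid F2: map it onto /a/.
    return int(c.vowel != "a")


def orth_double_geminate(c: Candidate) -> int:
    # <ff> in <buffer> should be mapped onto a geminate.
    return int(not c.geminate)


def orth_u(c: Candidate) -> int:
    # <u>/u/: the letter <u> should be mapped onto /u/.
    return int(c.vowel != "u")


def winner(candidates, ranking) -> Candidate:
    """Best candidate under strict ranking (highest-ranked constraint first)."""
    return min(candidates, key=lambda c: tuple(con(c) for con in ranking))


COMBINED = [cue_mid_f2_a, orth_double_geminate, orth_u]  # Cue outranks <u>/u/
READING_ONLY = [orth_double_geminate, orth_u]            # no auditory form available

print("combined:    ", winner(BUFFER, COMBINED).form)
print("reading only:", winner(BUFFER, READING_ONLY).form)
```

With the Cue constraint active, the vowel follows perception and the consonant follows orthography; without an auditory form, the same Orth constraints alone yield the purely orthographic /.ˈbuf.fer./.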
Combined orthographic and auditory adaptation of buffer
Combined orthographic and auditory adaptation of rally
5 The bigger picture: Reading, comprehension and writing in Italian and other languages
So far we have shown that a formal modelling of the native Italian reading process with an Italian reading grammar, where Orth constraints interact with Struct constraints (Sect. 4.1), can also account for the borrowing of intervocalic consonants in loanwords from English (Sect. 4.2). We furthermore illustrated that this reading grammar can interact with the native Italian perception grammar (Cue and Struct constraints) to account not only for native reading and perception, but also for loanwords from English that show a simultaneous influence of orthography and perception (Sect. 4.3).
The presented reading grammar is thus not restricted to loanword adaptation, and neither is it restricted to Italian. It can be easily applied to other languages with an alphabetic script: Sect. 5.1 below very briefly illustrates how double consonantal letters are treated in native German and in German loanwords from Italian.
Section 5.2 then moves on to show how word recognition works for words that were read via a reading grammar, and how a reading grammar can simply be reversed to account for the process of writing. Section 5.3 discusses alternatives to a mapping from written onto a phonological surface form, and why and when we need to assume such alternative mappings.
5.1 Reading grammars in other languages: German double consonantal letters
German minimal pair illustrating vowel length contrast
German reading of native <Ratte>
German reading of native <Rate>
German reading of non-native <latte>
5.2 The processes of word recognition and writing
In Fig. 5, the arrows connecting all forms point both ways, indicating that the mappings are bidirectional. This bidirectionality is a main principle of the BiPhon model, where not only speech perception and recognition, but also the reverse processes of phonological production and phonetic implementation are modelled. This is done with the same constraints used in the other processing direction: Lexical and Faith constraints map meaning and underlying form onto the phonological surface form in phonological production, where the output is restricted by Struct constraints, and Cue constraints are responsible for the mapping from surface onto an auditory form in phonetic implementation.
Italian writing of /.ˈfat.to./
5.3 Reading as mapping onto a surface form or a higher-level representation?
Many languages with an alphabetic script have inconsistent mappings between graphemes and phonemes, as e.g. in English and French, where the same sound can be written in different ways (heterographs, as e.g. English <here> and <hear>), and the same written form can be pronounced differently (heteronyms, e.g. English <tear> for [teə] and [tɪə]). For cases of heterographs and heteronyms, a reading and writing grammar as proposed up to now is insufficient. In order to be able to write, the writer needs to know which of the possible written forms is the one representing the underlying form with the intended meaning. And in order to be able to read, the reader needs to know which of the possible underlying forms and meanings is the one associated with the given written form. To be able to account for such cases, we need an additional way of accessing lexical entries in the reading process. We follow the cognitive dual-route model of reading (henceforth: DR model; e.g. Coltheart et al. 1993, 2001) in assuming there are two possible mappings of the written form. One is the sub-lexical route, where graphemes are transformed into a so-called ‘sub-lexical phonological form.’ This form corresponds to the SF in our proposal, and we have already illustrated how this mapping is performed by Orth and Struct constraints. The second route is the lexical route, also known as direct access, where the written form is directly mapped onto a lexical entry in the shape of a whole-word phonology together with its meaning. The lexical route in the DR model is explicitly said to employ visual word recognition but no graphemic parsing (Coltheart et al. 1993:597). This means that the reader is mapping the written word in its entirety onto a stored UF word form and a connected meaning, and that phonology plays no role in this holistic access. Such a mapping therefore can also account for logographic writing systems.
A question that arises in this context (and was asked by one reviewer) is whether the mapping we propose in the present article for the sub-lexical route, i.e. from written form onto SF, is not better replaced by a mapping from written onto underlying form (UF). Instead of the Orth constraints proposed in this article, we would need very similar constraints mapping graphemes onto phonemes in the UF, e.g. <t>|t|: ‘Assign a violation mark to every letter <t> that is not mapped onto a |t| in the UF.’ An argument in favor of such a UF mapping is that most languages with an alphabetic script do not distinguish allophones in their writing system, referring to phonemes instead. One of the few counterexamples can be found in Dutch, where the devoiced allophones of voiced sibilants are encoded in the writing, e.g. <laars – laarzen> ‘boot sg. – pl.’ and <wijf – wijven> ‘woman sg. – pl.’.
In the Italian data presented here, we encountered one case of allophony, namely the two allophones /V/ and /V:/ of stressed vowels, which are not distinguished orthographically. For those, we need to assume that Italian readers map the same written form onto two different surface allophones, and subsequently onto one common underlying phoneme. As the above-mentioned reviewer correctly pointed out, it seems more straightforward to assume readers map a vowel grapheme directly onto its corresponding UF. However, we would not be able to restrict this mapping by the same phonotactic constraints that apply in the process of speech perception and phonological production (formalized as Struct constraints in the BiPhon model). We would therefore lose exactly what makes our proposal attractive, namely the non-reduplication of such phonological knowledge. Instead, we would have to postulate an additional set of restrictions, this time on the UF. Such morpheme structure constraints (MSC) were proposed in rule-based generative phonology (see e.g. Halle 1959; Stanley 1967), but are in conflict with OT’s concept of Richness of the Base because they pose restrictions on the input (e.g. McCarthy 1998). Further points of criticism against MSC are that they at least partly reduplicate phonotactic constraints (Struct), and seem to capture statistical tendencies rather than absolute constraints (see e.g. Booij 2011 for a full discussion). For these reasons we do not consider a grapheme-UF mapping a valid alternative to our proposal of the sub-lexical route in reading.
Returning to the distinction between lexical and sub-lexical routes, the present model is in agreement with the DR model that nonce words can only be read via the sub-lexical route, as they have no lexical entry. For the same reason, initial orthographic adaptations of loanwords can only occur via the sub-lexical route, as these new words initially lack a lexical form. In the process of reading and thus adaptation, a lexical entry consisting of an underlying phonological form and a meaning is created. In principle it would be possible for a borrower to create a new lexical entry from a written form and a meaning deduced from the context without passing the grapheme-to-phoneme mapping, thus by the lexical route. In that case, neither phonotactic restrictions nor phonological mapping constraints would guide this lexical entry, so there would be no restriction whatsoever on the UF.
When reading real words in alphabetic scripts, the lexical and the sub-lexical route are assumed here to be in competition with each other (easily modelled within OT). This competition is obvious for languages with deep orthographies, for instance English, where the irregularly written <sew> |səʊ|, accessible via the lexical route only, competes with a sub-lexical mapping of the grapheme <ew> onto /uː/ as in few, dew, etc. Competition between the lexical and the sub-lexical route is also commonly found in languages with shallow orthographies (see e.g. Katz and Frost 2001; Grainger et al. 2012).
6 Previous linguistic accounts of orthographic borrowings in Italian and of reading and writing in general
In this section, we compare the present account of singleton/geminate borrowings in Italian to earlier proposals on orthographic borrowings in Italian (Sect. 6.1), and to previous formalizations of the reading and writing process within a grammar theory (Sect. 6.2).
6.1 Earlier accounts of the (non-)gemination of consonants in Italian loanwords
In this subsection, we discuss the studies by Rando (1970), Repetti (1993) and Morandini (2007), as they all looked at borrowings of English intervocalic consonants into Italian, and their observations partly complement and partly challenge the present account. The studies by Passino (2008, 2013) and Repetti (2009, 2012) are not discussed as they predominantly deal with word-final consonants in Italian loans and the question whether these should be treated as singletons or geminates. Though this is an interesting question, it falls outside the scope of the present paper.
tu<nn>el, sho<pp>ing, te<nn>is
scoo<t>er, boo<m>erang, repo<rt>er
The difference between the orthographic and the perceptual borrowing strategy, Rando remarks, and its resulting variation in borrowing form is sometimes even reflected in two possible writings in Italian, e.g. pullover/pulover or bluff/bluf (130). For such cases, Rando observes a tendency to preserve the original written form, as it is more prestigious (134). In our data set, far less variation occurred, though we did find considerable variation for the word pullover (see Appendix). Rando did not discuss simultaneous orthographic and perceptual adaptations that we covered in Sect. 4.3.
Though the focus of Repetti’s (1993) analysis is on vowel length, she makes the same generalization with respect to writing as Rando (1970). And as in Rando’s dataset, Repetti’s examples illustrating short consonants (after long vowels) all involve originally long vowels (or diphthongs) in English, cf. (24a), and therefore could also be accounted for by perceptual vowel length instead of orthography.
If the written form has a single consonant following the stressed vowel, the vowel is pronounced long in accordance with the rules of the Italian writing system. If, instead, the stressed vowel is followed by two written consonants, that vowel is pronounced short. (Repetti 1993:192)
Loanwords from Repetti (1993:192) illustrating stressed penultimate vowels
Borrowings of nominalized phrasal verbs (from Zingarelli et al. 2015)
Orthographic borrowings with singletons from Morandini (2007:27)
As mentioned already in the introduction, none of the studies on English loanwords in Italian known to us provide a formalization of the role of orthography in a grammar model, a point we will focus on in the next section.
6.2 Previous linguistic models of reading and writing
The earliest linguistic accounts of orthography that employ an explicit formalization can be found in the tradition of rule-based Generative Grammar, and employ formal rewriting rules similar to those used for the description of phonological processes. Bierwisch (1972), for example, provides context-sensitive orthographic rewriting rules that take phonological feature matrices as input and generate written forms from them. These early rule-based approaches to orthography are concerned with the production, i.e. the writing process, only, and consider the written form to be solely derivable from phonology. This idea was taken to the extreme by Chomsky and Halle (1968), who propose that in English the underlying form very often is identical to the written form.
A newer and somewhat different rule-based approach to orthography is provided by Neef (2012), who formalizes the reading process (which he calls “recoding”). Neef distinguishes between individual correspondence rules (e.g. <m>→[m]) and constraints that capture general properties of the writing system. An example of such a general constraint for German is, e.g. “in a sequence of identical letters, all non-initial ones may be recoded as zero” (211). For the written input <mm>, this constraint provides an alternative output [m] in addition to [mm]. All outputs generated by correspondence rules and constraints are checked for their phonological well-formedness via a phonological filter, which discards ungrammatical outputs (e.g. [mm] at the end of a syllable). Though Neef’s principle that phonology “controls” the reading process is in line with the present account, his use of rules to create outputs makes additional machinery in the form of constraints and a phonological filter necessary, and also requires a derivation with at least two stages. In the present OT formalization, on the other hand, mapping constraints (Orth) and restricting constraints (Struct) simply interact in the choice of the best candidate. Furthermore, though Neef writes that his recoding system can also be used for the writing process, it is not clear how this is performed with the provided correspondence rules, especially those that include context, as they cannot be simply reversed, while we showed in Sect. 5.2 how the proposed Orth constraints work bidirectionally, i.e. are interpretable both in the reading and writing direction.
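The two-stage architecture attributed to Neef above (generation by correspondence rules and the zero-recoding constraint, followed by a phonological filter) can be sketched as follows. This is only a toy illustration under our own assumptions: the rule table, the function names `recode` and `phonological_filter`, the simplified filter, and the test input <kamm> (after German Kamm ‘comb’) are hypothetical and are not taken from Neef (2012).

```python
from itertools import product

# Hypothetical correspondence rules (letter -> phone); not Neef's actual rule set.
CORRESPONDENCES = {"k": "k", "a": "a", "m": "m"}
VOWELS = {"a"}


def recode(written):
    """Stage 1: generate candidate outputs from the correspondence rules
    plus the constraint that a non-initial letter in a sequence of
    identical letters may be recoded as zero."""
    options = []
    for i, letter in enumerate(written):
        phone = CORRESPONDENCES[letter]
        if i > 0 and written[i - 1] == letter:
            # Non-initial identical letter: either recoded normally or as zero.
            options.append((phone, ""))
        else:
            options.append((phone,))
    return {"".join(choice) for choice in product(*options)}


def phonological_filter(candidates):
    """Stage 2: discard outputs that end in a doubled consonant
    (a toy stand-in for the well-formedness check)."""
    def well_formed(form):
        ends_in_double = len(form) >= 2 and form[-1] == form[-2]
        return not (ends_in_double and form[-1] not in VOWELS)
    return {form for form in candidates if well_formed(form)}


candidates = recode("kamm")                 # both "kamm" and "kam" are generated
survivors = phonological_filter(candidates)  # only "kam" passes the filter
```

The sketch makes the point of the comparison concrete: generation and filtering are separate steps here, whereas in an OT evaluation the mapping and the restriction are evaluated together on one candidate set.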
A full-fledged OT account of writing is provided by Wiese (2004), who proposes sound-letter correspondence constraints that map underlying phonological input onto written form. The evaluation is performed with the help of correspondence constraints such as Max, Dep and Ident, which are usually employed in OT to evaluate the mapping between UF and SF (Prince and Smolensky 1993), and in Wiese’s account incorrectly imply a possible identity between written and phonological form. In the present account, the different nature of the two forms is made explicit by employing constraints that arbitrarily map one onto the other. Wiese furthermore distinguishes between predictable and unpredictable features of an orthographic system, where the former are dealt with by constraints, whereas the latter are stored in the lexicon. Such a distinction is, however, difficult to make in languages with irregular orthographies. Though Wiese’s account is restricted to writing, he points out that the “bidirectional nature of correspondence relations allows for constraints looking into both directions” (316). Wiese’s account is extended in the study by Song and Wiese (2010), who propose that the input to the OT orthographic evaluation can be either the UF (e.g. in Korean) or the SF (what they call “lexical outputs” 91; e.g. in German).
Baroni (2013) in his account of alternative writings for existing English words (e.g. <tonite> for tonight) proposes bidirectional constraints that map the SF onto written forms and vice versa. An example is his constraint <VCe>↔tenseV, which stands for “<V> in <VCe> sequences is bidirectionally mapped onto a tense vowel” (31). These constraints are then used to model writing only, where phonotactic restrictions do not play any role.
In terms of formalizing orthography especially for loanword adaptation, only the study by Dong (2012) deals with this topic. Dong, referring to the BiPhon model, proposes OT constraints of the type “<x>ORTH should not be mapped to /y/SF” (48) for the correlation between Pinyin written forms and Mandarin Chinese surface forms. However, Dong neither employs such constraints in her OT formalization of Mandarin borrowings, nor discusses their possible interactions with other (such as Struct) constraints.
In sum, none of the previous proposals make a principled distinction between orthography and phonology, or show how the two interact in the reading (or writing) process.
In this article we proposed that the borrowing of English intervocalic consonants after short/lax vowels into Italian is influenced by orthography in such a way that only consonants that are orthographically represented with two identical letters are borrowed as geminates, whereas those represented with a single letter or a sequence of two different consonantal letters are incorporated as singletons. We formalized the orthographic borrowing in an OT framework with the help of Orth constraints, responsible for grapheme-to-phoneme mappings, and argued that this mapping is influenced by phonological, i.e. structural restrictions (Struct constraints) on Italian. Together, the language-specific ranking of Orth and Struct constraints forms the native Italian reading grammar, which was shown to be able to handle the reading of native words. The application of this native reading grammar to non-native written forms from English was shown to account for the attested borrowed forms. Compared to earlier proposals, our study provides a first formal account of adaptations via orthography, and makes the role of phonotactic restrictions in the reading and writing grammar explicit without reduplicating these restrictions in the orthographic constraints. Such adaptation does not require any loanword-specific devices but rather makes use of the reading grammar necessary for reading and writing of native Italian. Furthermore, we showed with an example from German that the proposed model is not language-specific but rather applicable to all languages that employ an alphabetic script. Though not illustrated here, the proposed reading grammar can be easily extended to languages that use syllabaries as writing systems, such as Japanese kana, where Orth constraints map syllabograms onto syllables that are, in turn, restricted by Struct constraints.
For logographic scripts, such as Chinese characters (hanzi), we proposed direct mappings (the so-called lexical route) from written form onto pairs of underlying form and meaning. We leave elaboration of these reading grammars and corresponding writing grammars for future work.
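The interaction of Orth and Struct constraints in a language-specific ranking can be illustrated with a minimal evaluation sketch. It is a simplification under stated assumptions, not the tableaux of the analysis: `orth_double` paraphrases the collapsed constraint against reading double letters as singletons and single letters as geminates, `struct_no_geminate` is a generic no-geminate Struct constraint, and the two rankings labelled Italian-like and German-like are assumed here purely for illustration.

```python
# Toy OT evaluation: the winner is the candidate whose violation profile,
# read from the highest-ranked to the lowest-ranked constraint, is smallest.

def orth_double(letters, geminate):
    """Orth: two identical letters should be read as a geminate,
    anything else as a singleton (one violation otherwise)."""
    is_double = len(letters) == 2 and letters[0] == letters[1]
    return int(is_double != geminate)


def struct_no_geminate(letters, geminate):
    """Struct (assumed for illustration): no geminates in surface forms."""
    return int(geminate)


# Assumed rankings, highest-ranked constraint first.
ITALIAN_LIKE = [orth_double, struct_no_geminate]
GERMAN_LIKE = [struct_no_geminate, orth_double]


def evaluate(letters, ranking):
    """Return the optimal reading ('singleton' or 'geminate') of the
    intervocalic consonant letters under the given ranking."""
    candidates = {"singleton": False, "geminate": True}

    def profile(label):
        return tuple(con(letters, candidates[label]) for con in ranking)

    return min(candidates, key=profile)


print(evaluate("ff", ITALIAN_LIKE))  # geminate, as in buffer
print(evaluate("t", ITALIAN_LIKE))   # singleton, as in scooter
print(evaluate("ff", GERMAN_LIKE))   # singleton
```

The same two constraints yield different readings of a double letter depending only on their ranking, which is the sense in which the reading grammar is language-specific while requiring no loanword-specific devices.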
In fact, our study uncovers several topics still to be dealt with. First, as mentioned at the beginning of Sect. 4, English orthography can only influence Italian borrowings because both languages employ a Roman alphabetic script. English uses several graphemes that do not, or once did not, occur in Italian orthography such as <k>, <th> and <ck> (and, conversely, Italian has several graphemes that do not occur in English). A native reading grammar can only be applied to graphemes that are used in the native writing system. At the same time, new graphemes can be introduced via simultaneous perceptual and orthographical borrowings, as occurred with <k> in Italian, which has been substituting <ch> even in native words (Hall 1958), e.g. kilometro ‘kilometer’ as alternative to chilometro. How the introduction of a new grapheme proceeds is another topic we leave for future work.
Furthermore, we did not address the possible knowledge Italian native speakers might have of English orthography through L2 acquisition of English. The quality of the stressed vowel in a word such as buffer /.ˈbaf.fer./ that we explained as a perceptual effect, for instance, could also be explained by assuming that the borrower had some knowledge of English grapheme-to-phoneme mappings for the vowel. Since the borrowing of the two intervocalic consonantal letters as a geminate is clearly caused by Italian orthography, we would deal with an incomplete L2 reading grammar for English (e.g. a high-ranked constraint <u>/a/) complemented by the native Italian reading grammar, which could be formalized in our proposed reading grammar as an interaction of two language-specific rankings of universal Orth constraints (or, in an emergentist approach, as an interaction of language-specific Italian and English Orth constraints).
A further topic of interest we only briefly touched upon is the possible difference between orthographic and perceptual borrowings. In her study of loan doublets in Japanese, Smith (2006) found that orthographic borrowings lead to more epenthesis whereas perceptual borrowings were likelier to result in segmental deletion. In our case, orthography also led to epenthesis of phonological material, namely of a second mora for intervocalic consonants represented with two identical consonantal letters, despite these consonants being monomoraic in the phonological structure of the source language English. We did not find any cases of deletion for perceptual borrowings. In our data, there seemed to be a different asymmetry between the two borrowing strategies. We observed a bigger influence of auditory information on the borrowing of vowels, whereas the borrowing of consonants seemed more influenced by writing. This might be due to the larger perceptual salience of vowel cues compared to consonantal cues. Connected to this topic is the question of whether perception, orthography or both guide Italian speakers in their borrowing of two adjacent vowel letters, such as in acc<ou>nt /au/, hock<ey> /ei/ or hipp<ie> /i/ (all from our dataset in (3) and (4)). Diphthongs in stressed syllables seem to favor perceptual adaptation, such as in acc<ou>nt, but also m<ou>se /au/, m<ee>ting /iː/ and l<ea>der /iː/, while diphthongs in unstressed syllables favor orthographic adaptation, such as in <au>sterity /au/ and hock<ey> /ei/ (Zingarelli et al. 2015). This adaptation of <VV> sequences and its formalization are also left for analysis in future research.
/ʒ/ only occurs in loanwords, e.g. beige or garage.
We disregard lengthening of the penultimate vowel in the present article, which applies to both open and closed syllables (see measurements by D’Imperio and Rosenthall 1999:6) and adds extra length to a stressed penultimate vowel.
A faithfulness constraint referring to lexical stress would have to be included in the analysis and ranked above Penult to account for such irregular stress patterns.
We chose the dictionary by Zingarelli et al. (2015) as our source because it is updated frequently, contains many loanwords, and seems to reflect the actual pronunciation of these words quite well, as the comparison with the production of our six native speakers shows.
We adjusted Zingarelli et al.’s transcription by indicating vowel length in stressed syllables, and by changing schwas that Zingarelli et al. employed for some unstressed vowels into /e/, in accordance with the intuition of our native speakers.
English words ending in -ing show variation in the final sounds of the borrowed form between /-in/ and /-iŋg/ (with nasal place assimilation) (Zingarelli et al. 2015), where younger speakers seem to exclusively use the latter (as confirmed by our younger speakers). This variation is not included in our transcriptions.
Some consonants occur only in one of the two sets (e.g. /k/ as singleton or /p/ as geminate). We put this down to accidental gaps in the data sets and refrain from postulating idiosyncratic borrowing strategies such as “/VkV/ is always adapted as /Vː.kV/.”
As graphemes are mapped onto surface phonological forms, the latter can be allophones. We discuss the implications of this in Sect. 5.3 below. In the following, we employ the term grapheme-phoneme mappings to include mappings from graphemes onto allophones.
This constraint collapses two negatively formulated constraints, *<βiβi>/C/ and *<β>/Cː/, that do not need to be distinguished in the following analyses.
Candidates with a geminate that do not span two syllables are not included in the present tableaux. They would violate an additional constraint requiring geminates to be bisyllabic.
The form with an initial /h/ also violates a possible Struct constraint */h/ “Assign a violation mark to every surface /h/”, which would be ranked high as Italian does not have a glottal fricative in its phoneme inventory. This Struct constraint would attain the same result as the Orth constraint <h>/ /.
Purely orthographic adaptations show regular penultimate stress, as in the examples analyzed in the present section. Antepenultimate stress in loanwords like happening /.ˈεp.pe.nin./ in (3), monitor /.ˈmɔː.ni.tor./ in (4), karavan /.ˈkaː.ra.van./ or musical /.ˈmjuː.zi.kɔl./ (the last two from Bafile 1999:210) seems to mirror the stress placement of the original English words (as proposed also by Bafile ibid.), and therefore suggests (at least partially) perceptual borrowing as described in Sect. 4.3 below.
A more general constraint like <α(β)>/+long/ seems very low ranked in German, because word-final <αβ> sequences are often mapped onto /–long/, see e.g. hat /hat/ ‘had’, while long/tense vowels in final syllables usually are specially encoded in writing with two identical vowel letters, e.g. Saal /zaːl/ ‘hall’, or an additional letter <h>, e.g. Pfahl /pfaːl/ ‘stake’.
Anti-Faith constraints (Boersma and Hamann 2009:41) possibly also play a role.
A further mapping of this auditory form onto an articulatory form is assumed in the process of phonetic implementation. The constraints necessary for its formalization are described in e.g. Boersma (2011).
Direct access is opposed to mediated access via phonological representations; see e.g. Bradshaw (1975) for a review.
The transcriptions of the examples given in this section are not those employed in the original sources but our careful transference into the transcription system that we employ throughout the present article (IPA with additional phonological assumptions as elaborated in Sect. 2).
We would like to thank one of the three anonymous reviewers for pointing this out to us.
Morandini (2007:22) reports two forms for the loan weekend /.wi.ˈkεnd./∼/.wik.ˈkεnd./ (Zingarelli et al. 2015 gives only /.wi.ˈkεnd./), which might be due to the same reason, namely application of backwards raddoppiamento only if the word is interpreted as consisting of two separate syntactic words by the borrower.
Note that the second author of this article does not recognize access as a loanword.
This constraint captures the fact that German orthography uses two identical consonantal letters to indicate the shortness/laxness of the preceding vowel, which we encoded with the Orth constraint <α(βiβi)>/–long/ in Sect. 5.1.
To reiterate, the grapheme-phoneme correspondences in English are quite irregular. The grapheme <u> can represent, for example, /ʌ/, /ʊ/, /uː/, /ɜː/, /ε/ or /ə/, though it corresponds to /ʌ/ more often than to any of the others (see the corpus study by Berndt et al. 1987:8).
We would like to thank our participants for taking part in this study. Furthermore, we thank Paul Boersma, Robert Cloutier, Edoardo Cavirani, Jonathan Weinand, the audiences at the Manchester Linguistics and English Language Research Seminar, the Workshop on the occasion of Marko Simonović’s defence and the 23rd Manchester Phonology Meeting, and the three reviewers for helpful comments.
The first author is responsible for the analysis and writing up of the results, the second author for the data collection and for an initial analysis with orthographic constraints that contained phonological information, see Colombo (2014).
- Apoussidou, Diana. 2007. The learnability of metrical stress. PhD diss., University of Amsterdam.
- Bafile, Laura. 1999. Antepenultimate stress in Italian and some related dialects: Metrical and prosodic aspects. Rivista di Linguistica 11: 201–229.
- Bierwisch, Manfred. 1972. Schriftstruktur und Phonologie. Probleme und Ergebnisse der Psychologie 43: 21–44.
- Boersma, Paul. 1997. How we learn variation, optionality, and probability. Institute of Phonetic Sciences of the University of Amsterdam (IFA) 21: 43–58.
- Booij, Geert. 2011. Morpheme structure constraints. In The Blackwell Companion to Phonology. Volume 4: Interfaces, eds. Marc van Oostendorp, Colin Ewen, Elizabeth Hume, and Keren Rice, 2049–2070. Oxford: Blackwell.
- Chierchia, Gennaro. 1986. Length, syllabification and the phonological cycle in Italian. Journal of Italian Linguistics 8: 5–33.
- Chomsky, Noam, and Morris Halle. 1968. The sound pattern of English. New York: Harper and Row.
- Colombo, Ilaria. 2014. Italian loanword adaptation in OT: How orthographic representation affects perception. Ms., University of Amsterdam.
- Dong, Xiaoli. 2012. What borrowing buys us: A study of Mandarin Chinese loanword phonology. PhD diss., University of Utrecht.
- Hall, Robert A. Jr. 1958. Kappa pubblicitario. Lingua Nostra 19: 129.
- Halle, Morris. 1959. The sound pattern of Russian: A linguistic and acoustical investigation. The Hague: Mouton.
- Hamann, Silke. Submitted. One phonotactic restriction for speaking, listening and reading: The case of the no geminate constraint in German. Ms., University of Amsterdam.
- Hamann, Silke, and David W. L. Li. 2016. Adaptation of English onset clusters across time in Hong Kong Cantonese: The role of the perception grammar. Linguistics in Amsterdam 9: 56–76.
- Kang, Yoonjung. 2009. English /z/ in 1930s Korean. In 2nd International Conference on East Asian Linguistics, eds. David Potter and Dennis Storoshenko. Vol. 2 of Simon Fraser University Working Papers in Linguistics.
- Katz, Leonard, and Laurie Feldman. 1983. Relation between pronunciation and recognition of printed words in deep and shallow orthographies. Journal of Experimental Psychology 9: 157–166.
- Krämer, Martin. 2009. The phonology of Italian. Oxford: Oxford University Press.
- Liberman, Isabelle, Alvin Liberman, Ignatius Mattingly, and Donald Shankweiler. 1980. Orthography and the beginning reader. In Orthography, reading, and dyslexia, eds. James R. Kavanagh and Richard L. Venezky, 67–84. Baltimore: University Park Press.
- Loporcaro, Michele. 1990. On the analysis of geminates in Standard Italian and Italian dialects. In Natural phonology: The state of the art. Papers from the Bern Workshop on Natural Phonology, eds. Bernhard Hurch and Richard Rhodes, 149–174. Berlin: de Gruyter.
- Marotta, Giovanna. 1988. The Italian diphthongs and the autosegmental framework. Certamen Phonologicum 8: 389–420.
- McCarthy, John J. 1998. Morpheme structure constraints and paradigm occultation. In Annual meeting of the Chicago Linguistics Society (CLS), Vol. 32, 123–150.
- Miao, Ruiqin. 2005. Loanword adaptation in Mandarin Chinese: Perceptual, phonological and sociolinguistic factors. PhD diss., State University of New York, Stony Brook.
- Morandini, Diego. 2007. The phonology of loanwords into Italian. MA thesis, University College London.
- Passino, Diana. 2008. Aspects of consonantal lengthening in Italian. Padova: Unipress.
- Porter, Stacey. 2010. Orthographic influence on the perception and production of Spanish loans in English. MA thesis, University of California, Irvine.
- Prince, Alan, and Paul Smolensky. 1993. Optimality theory: Constraint interaction in generative grammar. Malden: Blackwell.
- Ramers, Karl Heinz. 1992. Ambisyllabische Konsonanten im Deutschen. In Silbenphonologie des Deutschen, eds. Peter Eisenberg, Karl H. Ramers, and Heinz Vater, 246–283. Tübingen: Narr.
- Repetti, Lori. 2012. Consonant-final loanwords and epenthetic vowels in Italian. Catalan Journal of Linguistics 11: 167–188.
- Saltarelli, Mario. 1984. Italian syllable structure. In Estudis Gramaticals: Working Papers in Linguistics, Vol. 1, 279–295. Barcelona: Universitat Autònoma de Barcelona.
- Smith, Jennifer. 2006. Loan phonology is not all perception: Evidence from Japanese loan doublets. In Japanese/Korean Linguistics 14, eds. Timothy Vance and Kimberly Jones, 63–74. Stanford: CSLI.
- Vogel, Irene. 1982. La sillaba come unità fonologica. Bologna: Zanichelli.
- Wiese, Richard. 1996. The phonology of German. Oxford: Oxford University Press.
- Wiese, Richard. 2004. How to optimize orthography. Written Language and Literacy 7: 305–331.
- Zingarelli, Nicola, Mario Cannella, and Beata Lazzarini. 2015. Lo zingarelli 2016: Vocabolario della lingua italiana. Bologna: Zanichelli.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.