Words and Morphemes

Hausser, Roland

doi:10.1007/978-3-642-41431-2_13

Abstract

Part I analyzed natural communication within the Slim theory of language. Part II presented formal language theory in terms of methodology, mathematical complexity, and computational implementation. With this background in mind, we turn in Part III to the morphological and syntactic analysis of natural language. This chapter begins with the basic notions of morphology, i.e., the linguistic analysis of word forms.

Notes

1.
For simplicity, we subsume agglutination under the term of inflection.
2.
These same tests were also used in the attempt to motivate syntactic constituent structures (8.4.6–8.4.8).
3.
See the distinction between free and bound morphemes in Sect. 13.3.
4.
In analogy to autobahning, coined by Americans stationed in Germany after World War II from ‘Autobahn’ = highway.
5.
Our terminology is in concord with Sinclair (1991):

Note that a word form is close to, but not identical to, the usual idea of a word. In particular, several different word forms may all be regarded as instances of the same word. So drive, drives, driving, drove, driven, and perhaps driver, drivers, driver’s, drivers’, drive’s, make up ten different word forms, all related to the word drive. It is usual in defining a word form to ignore the distinction between upper and lower case, so SHAPE, Shape, and shape, will all be taken as instances of the same word form.

Another term for our notion of a word is lexeme (see for example Matthews 1972, 1974). Terminologically, however, it is simpler to distinguish between word and word forms than between lexeme and word forms (or even lexeme forms).
6.
A clear distinction between the notions of word and word form is not only of theoretical, but also of practical relevance. For example, when a text is said to consist of ‘100 000 words’, it remains unclear whether the number refers to (i) the running word forms (tokens), (ii) the different word forms (i.e., the types, as in a word form lexicon), or (iii) the different words (i.e., the types, as in a base form lexicon). Depending on the interpretation, the number in question may be regarded as small, medium, or large.
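The three readings in note 6 can be told apart with a few lines of code. The following is a minimal sketch, not part of the original text: the table BASE_FORM and the sample string are hypothetical, and upper and lower case are ignored, as in Sinclair's definition of a word form.

```python
# Hypothetical mini lexicon mapping each word form to its word (base form).
BASE_FORM = {
    "drive": "drive", "drives": "drive", "driving": "drive",
    "drove": "drive", "driven": "drive",
    "wolf": "wolf", "wolves": "wolf",
}

def count_readings(text):
    """Return (tokens, word form types, word types) for a text."""
    tokens = text.lower().split()            # (i) running word forms
    form_types = set(tokens)                 # (ii) different word forms
    # (iii) different words; unknown forms are treated as their own base form
    word_types = {BASE_FORM.get(form, form) for form in form_types}
    return len(tokens), len(form_types), len(word_types)

sample = "Drive drives drove driven drive DRIVE wolves wolf wolves"
print(count_readings(sample))
```

For the sample string, the same text counts as nine tokens, six word form types, or two words, depending on the interpretation.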
7.
This holds only for English. In German, it is the adverbial in the positive which is used as the base form for naming adjectives. This is because German adnominals inflect (for example, langsam/er, langsam/e, langsam/es, etc.), whereas in English it is the adverbial which has an additional suffix (for example, slow/ly).
8.
Later work such as NLC and CLaTR uses the format of proplets (Sect. 3.4) instead.
9.
Semantic properties of specific word forms are coded in the third position by means of subscripts. For example, the German verb forms gibst and gabst have the same combinatorics (category), but differ semantically in their respective tense values. NLC and CLaTR use the alternative format of proplets and code such distinctions as values of the sem attribute.
10.
The category (sn) stands for ‘singular noun,’ (pn) stands for ‘plural noun,’ and (gn) stands for ‘genitive noun.’ The distinction between nongenitive singulars and plurals is important for the choice of the determiner, e.g., every vs. all (17.1.1). Because genitives in English serve only as prenominal modifiers, e.g., the wolf’s hunger, their number distinction need not be coded into the syntactic category.
11.
For a more detailed discussion see CLaTR, Sect. 3.5.
12.
Marty (1908), pp. 205 f.
13.
The morphology of English happens to be simple. For example, compared to French, Italian, or German, there is little inflection in English. Furthermore, much of compounding may be regarded as part of English syntax rather than morphology – in accordance with Francis and Kučera’s (1982) definition of a graphic word form cited above. For example, kitchen table or baby wolves are written as separate words, whereas the corresponding compounds in German are written as one word form, e.g., Küchentisch and Babywölfe. For this reason, some morphological phenomena will be illustrated in this and the following two chapters in languages other than English.
14.
An exhaustive categorization based on traditional paradigm tables would arrive at much higher numbers. For example, the adjectives of German have 18 inflectional forms per base form according to a distinctive categorization. In contrast, an exhaustive categorization, as presented in the Grammatik-Duden, p. 288, assigns more than eight times as many, namely 147 analyzed inflectional forms per base form, whereby the different analyses reflect distinctions of grammatical gender, number, case, definiteness, and degree, which in most cases are not concretely marked in the surface.
15.
Nouns of English ending in -lf, such as calf, shelf, self, etc. form their plural in general as -lves. One might prefer for practical purposes to treat forms like wolves, calves, or shelves as elementary allomorphic forms, rather than combining an allomorphic noun stem ending in -lv with the plural allomorph es. This, however, would prevent us from explaining the interaction of concatenation and allomorphy with an example from English. The category segments sn, pn, and sr stand for singular noun, plural noun, and semi-regular (14.1.7), respectively.
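The interaction of concatenation and allomorphy described in note 15 can be sketched as follows. This is an illustrative simplification, not the LA Morph implementation: the lexicon ALLOMORPHS, the assignment of the segments sn, sr, and pn to its entries, and the two combination patterns in analyze are assumptions made for this example.

```python
# Hypothetical allomorph lexicon: surface -> (category segment, base form).
ALLOMORPHS = {
    "wolf": ("sn", "wolf"),   # elementary singular noun
    "wolv": ("sr", "wolf"),   # semi-regular stem allomorph, assumed to need a suffix
    "es":   ("pn", None),     # plural allomorph
}

def segmentations(surface):
    """Return every left-to-right split of surface into known allomorphs."""
    if not surface:
        return [[]]
    results = []
    for end in range(1, len(surface) + 1):
        head = surface[:end]
        if head in ALLOMORPHS:
            for rest in segmentations(surface[end:]):
                results.append([head] + rest)
    return results

def analyze(surface):
    """Keep only segmentations whose category segments combine (assumed patterns)."""
    analyses = []
    for parts in segmentations(surface):
        cats = [ALLOMORPHS[p][0] for p in parts]
        base = ALLOMORPHS[parts[0]][1] if parts else None
        if cats == ["sn"]:
            analyses.append((parts, "sn", base))
        elif cats == ["sr", "pn"]:
            analyses.append((parts, "pn", base))
    return analyses

print(analyze("wolves"))
print(analyze("wolfes"))
```

In this sketch, wolves is segmented as wolv/es and categorized as a plural noun with the base form wolf, whereas wolfes can be segmented but is rejected because the category segments do not combine.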
16.
Depending on the modality, there is the distinction between allographs in written language and allophones in spoken language. Allographs are, for example, happy vs. happi-, and allophones are, for example, the present vs. past tense pronunciation of read.
17.
This allomorph is used in the progressive swimm/ing, avoiding the concatenative insertion of the gemination letter. A psychological argument for handling a particular form nonconcatenatively is frequency. Based on speech error data, Stemberger and MacWhinney (1986) provide evidence that the distinction between rote and combinatorial formation is based not only on regularity, but also on frequency, so that even regular word forms can be stored if they are sufficiently frequent.
18.
Practically, one may analyze good, better, best as basic allomorphs without concatenation.
19.
Depending on the approach, the basic elements of word forms are either the allomorphs or the morphemes.
20.
For simplicity the categories and meanings of the different word form starts and next morphemes are represented as CATn and MEAN-m.
21.
In lexicography, the term lemma refers to the base form surface, which is used as the key for storing or finding a lexical entry.
22.
For this, special programming languages like AWK (Aho et al. 1988) and Perl (Wall and Schwartz 1990) are available.
23.
See Aho and Ullman (1977), pp. 336–341.
24.
The fourth basic concept of morphology, the word, does not provide for a recognition method because words are not suitable keys for a multitude of word forms.
25.
Also known as the full-form method based on a full-form lexicon.
26.
It is possible to derive much of the word form lexicon automatically, using a base form lexicon and rules for inflection as well as – to a more limited degree – for derivation and compounding. These rules, however, must be written and implemented for the natural language in question, which is costly. The alternative of producing the whole word form lexicon by hand is even more costly.
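The automatic derivation mentioned in note 26 can be sketched for the simplest case, regular English nouns. The base forms, the function names, and the rules below are hypothetical and cover only regular inflection; the categories sn, pn, and gn are those of note 10.

```python
def derive_word_form_lexicon(base_nouns):
    """Build a word form lexicon from base forms using regular noun rules only."""
    lexicon = {}
    for base in base_nouns:
        plural = base + "s"
        forms = [
            (base, "sn"),            # singular noun
            (plural, "pn"),          # plural noun
            (base + "'s", "gn"),     # genitive of the singular
            (plural + "'", "gn"),    # genitive of the plural
        ]
        for surface, category in forms:
            lexicon.setdefault(surface, []).append((category, base))
    return lexicon

def recognize(surface, lexicon):
    """Word form method: the whole surface is the key; lookup yields
    the category (categorization) and the base form (lemmatization)."""
    return lexicon.get(surface.lower(), [])

lexicon = derive_word_form_lexicon(["table", "book"])
print(recognize("Tables", lexicon))
print(recognize("wolves", lexicon))
```

A form such as wolves is not recognized here, because no rule or entry covers it; this is the cost side of the method, since every form must be either generated by an implemented rule or entered by hand.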
27.
The discussion of German noun-noun compounds in Sect. 13.2 has shown that the size of a word form lexicon that attempts to be complete may easily exceed a trillion word forms, thus causing computational difficulties.
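The growth behind this estimate can be made concrete with a small worked calculation. It assumes, purely for illustration, a base form lexicon of n = 20 000 nouns (a figure chosen for this example) and counts compounds as ordered sequences of k nouns:

```latex
\[
  N_k = n^{k}, \qquad
  N_2 = (2 \times 10^{4})^{2} = 4 \times 10^{8}, \qquad
  N_3 = (2 \times 10^{4})^{3} = 8 \times 10^{12} > 10^{12}.
\]
```

Under this assumption, the three-part compounds alone already exceed a trillion entries.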
28.
A prototype is the KIMMO-system of two-level morphology (Koskenniemi 1983), based on finite state technology.
29.
In light of the morpheme definition 13.2.3, a morpheme lexicon consists strictly speaking of analyzed base form allomorphs.
30.
See Sect. 12.2. The inherent complexity of the morpheme method is shown in detail by Barton et al. (1987), pp. 115–186, using the analysis of spies/spy+s in the KIMMO system.
31.
The allomorph method was first presented in Hausser (1989b).
32.
In the case of irregular paradigms, the suppletive forms are also supplied (14.1.8).
33.
Thus, no bound morphemes (13.3.4) are being postulated.
34.
MacWhinney (1978) demonstrates the independent status of lexical allomorphs with language acquisition data in Hungarian, Finnish, German, English, Latvian, Russian, Spanish, Arabic, and Chinese.

References

Aho, A.V., and J.D. Ullman (1977) Principles of Compiler Design, Reading: Addison-Wesley
Aho, A.V., B.W. Kernighan, and P. Weinberger (1988) The AWK Programming Language, Reading: Addison-Wesley
Barton, G., R.C. Berwick, and E.S. Ristad (1987) Computational Complexity and Natural Language, Cambridge: MIT Press
Francis, W.N., and H. Kučera (1982) Frequency Analysis of English Usage: Lexicon and Grammar, Boston: Houghton Mifflin
Hausser, R. (1989b) “Principles of Computational Morphology,” Laboratory of Computational Linguistics, Carnegie Mellon University
Koskenniemi, K. (1983) Two-Level Morphology, University of Helsinki Publications, Vol. 11
MacWhinney, B. (1978) The Acquisition of Morphophonology, Monographs of the Society for Research in Child Development, No. 174, Vol. 43
Marty, A. (1908) Untersuchungen zur Grundlegung der allgemeinen Grammatik und Sprachphilosophie, Vol. 1, Halle: Niemeyer, reprint Hildesheim, New York: Olms (1976)
Matthews, P.H. (1972) Inflectional Morphology. A Theoretical Study Based on Aspects of Latin Verb Conjugation, Cambridge: Cambridge University Press
Matthews, P.H. (1974) Morphology. An Introduction to the Theory of Word Structure, Cambridge Textbooks in Linguistics, Cambridge: Cambridge University Press
Sapir, E. (1921) Language, an Introduction to the Study of Speech, New York: Harvest Books – Harcourt, Brace, and World
Sinclair, J. (1991) Corpus, Concordance, Collocation, Oxford: Oxford University Press
Stemberger, P.J., and B. MacWhinney (1986) “Frequency and Lexical Storage of Regularly Inflected Forms,” Memory & Cognition 14.1:17–26
Wall, L., and R.L. Schwartz (1990) Programming Perl, Sebastopol: O’Reilly and Associates

Author information

Authors and Affiliations

Abteilung für Computerlinguistik, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Roland Hausser

Exercises

Section 13.1

1.
Give the inflectional paradigms of man, power, learn, give, fast, and good. Generate new words from them by means of derivation and compounding.
2.
Call up LA Morph on your computer and have the above word forms analyzed.
3.
Explain the notions word, word form, paradigm, part of speech, and the difference between the open and the closed classes.
4.
Why is it relevant to distinguish between the notions word and word form?
5.
What is the role of the closed classes in derivation and compounding?
6.
Why are only the open classes a demanding task of computational morphology?
7.
How do the content and function words relate to the open and the closed classes?

Section 13.2

1.
Why is the number of word forms in German potentially infinite?
2.
Why is the number of noun-noun compounds n²?
3.
What do the formal definitions of word and morpheme have in common?
4.
What is a neologism?
5.
Describe the difference between morphemes and syllables.

Section 13.3

1.
What is suppletion?
2.
Why is a bound morpheme like -ing neither in the open nor the closed classes?
3.
What would argue against postulating bound morphemes?

Section 13.4

1.
Explain the three steps of a morphological analysis.
2.
Why does LA Grammar analyze allomorphs as ordered triples?
3.
What are the components of a system of automatic word form recognition?
4.
What is a lemma? Are there essential differences between the entries of a traditional dictionary and an online lexicon for automatic word form recognition?
5.
How does the surface function as a key in automatic word form recognition?
6.
Explain the purpose of categorization and lemmatization.
7.
What is an analysis lexicon?

Section 13.5

1.
Describe three different methods of automatic word form recognition.
2.
Why is there no word method of automatic word form recognition?
3.
How does the word form method handle categorization and lemmatization?
4.
Compare cost and benefit of the word form method.
5.
Would you classify the word form method as a smart or as a solid solution?
6.
Why is the morpheme method mathematically complex?
7.
Why does the morpheme method violate surface compositionality?
8.
Why does the morpheme method use surfaces only indirectly as the key?
9.
Why is the morpheme method conceptually related to transformational grammar?
10.
Why does the allomorph method satisfy the principle of surface compositionality?
11.
Why is the runtime behavior of the allomorph method faster than that of the morpheme method?

Cite this chapter

Hausser, R. (2014). Words and Morphemes. In: Foundations of Computational Linguistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41431-2_13

DOI: https://doi.org/10.1007/978-3-642-41431-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41430-5
Online ISBN: 978-3-642-41431-2
eBook Packages: Computer Science, Computer Science (R0)
