
Abstract

Part I analyzed natural communication within the SLIM theory of language. Part II presented formal language theory in terms of methodology, mathematical complexity, and computational implementation. With this background in mind, we turn in Part III to the morphological and syntactic analysis of natural language. This chapter begins with the basic notions of morphology, i.e., the linguistic analysis of word forms.

Notes

  1.

    For simplicity, we subsume agglutination under the term of inflection.

  2.

    These same tests were also used in the attempt to motivate syntactic constituent structures (8.4.6–8.4.8).

  3.

    See the distinction between free and bound morphemes in Sect. 13.3.

  4.

    In analogy to autobahning, coined by Americans stationed in Germany after World War II from ‘Autobahn’ = highway.

  5.

    Our terminology is in concord with Sinclair (1991):

    Note that a word form is close to, but not identical to, the usual idea of a word. In particular, several different word forms may all be regarded as instances of the same word. So drive, drives, driving, drove, driven, and perhaps driver, drivers, driver’s, drivers’, drive’s, make up ten different word forms, all related to the word drive. It is usual in defining a word form to ignore the distinction between upper and lower case, so SHAPE, Shape, and shape, will all be taken as instances of the same word form.

    Another term for our notion of a word is lexeme (see for example Matthews 1972, 1974). Terminologically, however, it is simpler to distinguish between word and word forms than between lexeme and word forms (or even lexeme forms).

  6.

    A clear distinction between the notions of word and word form is not only of theoretical, but also of practical relevance. For example, when a text is said to consist of ‘100 000 words’ it remains unclear whether the number is intended to refer to (i) the running word forms (tokens), (ii) the different word forms (i.e., the types, as in a word form lexicon), or (iii) the different words (i.e., the types, as in a base form lexicon). Depending on the interpretation, the number in question may be regarded as small, medium, or large.
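
The three readings distinguished in note 6 can be made concrete with a small sketch. It is only an illustration under stated assumptions: the toy text, the `BASE_FORM` table, and the function name `count_readings` are hypothetical and do not come from the chapter, and whitespace splitting stands in for real tokenization.

```python
# Hypothetical base form lexicon mapping word forms to their word (base form).
# Only the forms needed for the toy text are listed.
BASE_FORM = {
    "drive": "drive", "drives": "drive", "drove": "drive", "driven": "drive",
    "wolf": "wolf", "wolves": "wolf",
    "the": "the",
}

def count_readings(text):
    """Return (tokens, word form types, word types) for a whitespace-split text."""
    # Ignore the upper/lower case distinction, as in the Sinclair quotation.
    tokens = text.lower().split()
    form_types = set(tokens)
    # Forms missing from the table fall back to themselves as base form.
    word_types = {BASE_FORM.get(form, form) for form in form_types}
    return len(tokens), len(form_types), len(word_types)

print(count_readings("The wolf drives the wolves drove The WOLF driven"))
# (9, 6, 3)
```

The same nine-token text thus counts as nine, six, or three "words", depending on which of the readings (i) to (iii) is meant.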

  7.

    This holds only for English. In German, it is the adverbial in the positive which is used as the base form for naming adjectives. This is because German adnominals inflect (for example, langsam/er, langsam/e, langsam/es, etc.), whereas in English it is the adverbial which has an additional suffix (for example, slow/ly).

  8.

    Later work such as NLC and CLaTR uses the format of proplets (Sect. 3.4) instead.

  9.

    Semantic properties of specific word forms are coded in the third position by means of subscripts. For example, the German verb forms gibst and gabst have the same combinatorics (category), but differ semantically in their respective tense values. NLC and CLaTR use the alternative format of proplets and code such distinctions as values of the sem attribute.

  10.

    The category (sn) stands for ‘singular noun,’ (pn) stands for ‘plural noun,’ and (gn) stands for ‘genitive noun.’ The distinction between nongenitive singulars and plurals is important for the choice of the determiner, e.g., every vs. all (17.1.1). Because genitives in English serve only as prenominal modifiers, e.g., the wolf’s hunger, their number distinction need not be coded into the syntactic category.

  11.

    For a more detailed discussion see CLaTR, Sect. 3.5.

  12.

    Marty (1908), pp. 205 f.

  13.

    The morphology of English happens to be simple. For example, compared to French, Italian, or German, there is little inflection in English. Furthermore, much of compounding may be regarded as part of English syntax rather than morphology – in accordance with Francis and Kučera’s (1982) definition of a graphic word form cited above. For example, kitchen table or baby wolves are written as separate words, whereas the corresponding compounds in German are written as one word form, e.g., Küchentisch and Babywölfe. For this reason, some morphological phenomena will be illustrated in this and the following two chapters in languages other than English.

  14.

    An exhaustive categorization based on traditional paradigm tables would arrive at much higher numbers. For example, the adjectives of German have 18 inflectional forms per base form according to a distinctive categorization. In contrast, an exhaustive categorization, as presented in the Grammatik-Duden, p. 288, assigns more than eight times as many, namely 147 analyzed inflectional forms per base form, whereby the different analyses reflect distinctions of grammatical gender, number, case, definiteness, and degree, which in most cases are not concretely marked in the surface.

  15.

    Nouns of English ending in -lf, such as calf, shelf, self, etc., generally form their plural in -lves. For practical purposes, one might prefer to treat forms like wolves, calves, or shelves as elementary allomorphic forms, rather than combining an allomorphic noun stem ending in -lv with the plural allomorph es. This, however, would prevent us from explaining the interaction of concatenation and allomorphy with an example from English. The category segments sn, pn, and sr stand for singular noun, plural noun, and semi-regular (14.1.7), respectively.
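
The interaction of allomorphy and concatenation described in note 15 might be sketched as follows. The triples reuse the category segments sn, pn, and sr from the note, but the combination rule in `concatenate` is a simplifying assumption made for illustration; it is not the rule format of LA Morph, and all names are hypothetical.

```python
# Each allomorph is an ordered triple (surface, category, base form).
# How the category segments are combined below is an assumption.
STEMS = [
    ("wolf", ("sn",), "wolf"),   # elementary singular form
    ("wolv", ("sr",), "wolf"),   # semi-regular stem allomorph, needs a suffix
]
PLURAL_ES = ("es", ("pn",), "plural")

def concatenate(stem, suffix):
    """Combine a stem allomorph with a suffix allomorph, or return None."""
    stem_surface, stem_cat, base = stem
    suffix_surface, suffix_cat, _ = suffix
    # Hypothetical rule: only a semi-regular stem (sr) accepts the allomorph es.
    if "sr" not in stem_cat:
        return None
    return (stem_surface + suffix_surface, suffix_cat, base)

print([concatenate(stem, PLURAL_ES) for stem in STEMS])
# [None, ('wolves', ('pn',), 'wolf')]
```

Under this assumption, wolf + es is rejected while wolv + es yields the plural noun wolves with the base form wolf.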

  16.

    Depending on the modality, there is the distinction between allographs in written language and allophones in spoken language. Allographs are, for example, happy vs. happi-, and allophones are, for example, the present vs. past tense pronunciation of read.

  17.

    This allomorph is used in the progressive swimm/ing, avoiding the concatenative insertion of the gemination letter. A psychological argument for handling a particular form nonconcatenatively is frequency. Based on speech error data, Stemberger and MacWhinney (1986) provide evidence that the distinction between rote and combinatorial formation is based not only on regularity, but also on frequency, so that even regular word forms can be stored if they are sufficiently frequent.

  18.

    Practically, one may analyze good, better, best as basic allomorphs without concatenation.

  19.

    Depending on the approach, the basic elements of word forms are either the allomorphs or the morphemes.

  20.

    For simplicity the categories and meanings of the different word form starts and next morphemes are represented as CATn and MEAN-m.

  21.

    In lexicography, the term lemma refers to the base form surface, which is used as the key for storing or finding a lexical entry.

  22.

    For this, special programming languages like AWK (Aho et al. 1988) and Perl (Wall and Schwartz 1990) are available.

  23.

    See Aho and Ullman (1977), pp. 336–341.

  24.

    The fourth basic concept of morphology, the word, does not provide for a recognition method because words are not suitable keys for a multitude of word forms.

  25.

    Also known as the full-form method based on a full-form lexicon.

  26.

    It is possible to derive much of the word form lexicon automatically, using a base form lexicon and rules for inflection as well as – to a more limited degree – for derivation and compounding. These rules, however, must be written and implemented for the natural language in question, which is costly. The alternative of producing the whole word form lexicon by hand is even more costly.
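
As a rough illustration of notes 25 and 26, the sketch below derives a small full-form lexicon from a base form lexicon and then recognizes word forms by direct lookup. The inflection rules are deliberately simplified and hypothetical (regular English nouns only), the function names are invented for this example, and the categories sn, pn, and gn follow note 10.

```python
def noun_forms(base):
    """Return (surface, category) pairs for a regular English noun.

    Simplified, hypothetical rules: plural in -s (or -es after a sibilant
    spelling) and genitive in -'s. Irregular nouns are not covered.
    """
    plural = base + ("es" if base.endswith(("s", "x", "ch", "sh")) else "s")
    return [(base, "sn"), (plural, "pn"), (base + "'s", "gn")]

def build_word_form_lexicon(base_forms):
    """Expand a base form lexicon into a full-form lexicon keyed by surface."""
    lexicon = {}
    for base in base_forms:
        for surface, category in noun_forms(base):
            # One surface may have several analyses, so store a list.
            lexicon.setdefault(surface, []).append((category, base))
    return lexicon

LEXICON = build_word_form_lexicon(["table", "box", "kitchen"])

def recognize(surface):
    """Word form method: recognition is one lookup with the surface as key."""
    return LEXICON.get(surface.lower(), [])

print(recognize("boxes"))    # [('pn', 'box')]
print(recognize("wolves"))   # [] because the form was never generated
```

The second call shows the limitation of the approach: a form that is not in the lexicon, such as a neologism or an uncovered irregular form, is simply not recognized.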

  27.

    The discussion of German noun-noun compounds in Sect. 13.2 has shown that the size of a word form lexicon that is attempting to be complete may easily exceed a trillion word forms, thus causing computational difficulties.

  28.

    A prototype is the KIMMO-system of two-level morphology (Koskenniemi 1983), based on finite state technology.

  29.

    In light of the morpheme definition 13.2.3, a morpheme lexicon consists strictly speaking of analyzed base form allomorphs.

  30.

    See Sect. 12.2. The inherent complexity of the morpheme method is shown in detail by Barton et al. (1987), pp. 115–186, using the analysis of spies/spy+s in the KIMMO system.

  31.

    The allomorph method was first presented in Hausser (1989b).

  32.

    In the case of irregular paradigms, the suppletive forms are also supplied (14.1.8).

  33.

    Thus, no bound morphemes (13.3.4) are postulated.

  34.

    MacWhinney (1978) demonstrates the independent status of lexical allomorphs with language acquisition data in Hungarian, Finnish, German, English, Latvian, Russian, Spanish, Arabic, and Chinese.
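
The segmentation step of the allomorph method mentioned in notes 31 to 33 might be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Hausser (1989b): the `ALLOMORPHS` table and the function `segment` are hypothetical, and the combinatorial check of categories that would filter ungrammatical concatenations is omitted.

```python
# Hypothetical analyzed allomorph lexicon: surface -> what the allomorph realizes.
ALLOMORPHS = {
    "wolf": "wolf",
    "wolv": "wolf",
    "swim": "swim",
    "swimm": "swim",   # stored allomorph, so no gemination rule is needed
    "es": "-s",
    "s": "-s",
    "ing": "-ing",
}

def segment(surface):
    """Return every way of splitting surface completely into known allomorphs."""
    if not surface:
        return [[]]    # the empty remainder has exactly one (empty) segmentation
    analyses = []
    for end in range(1, len(surface) + 1):
        prefix = surface[:end]
        if prefix in ALLOMORPHS:
            # The matched substring serves directly as the lookup key;
            # the remainder is segmented recursively.
            for rest in segment(surface[end:]):
                analyses.append([prefix] + rest)
    return analyses

print(segment("wolves"))     # [['wolv', 'es']]
print(segment("swimming"))   # [['swimm', 'ing']]
```

Because every segment is a substring of the input that is looked up as is, no underlying form has to be reconstructed at runtime, which is the point of note 33.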

References

  • Aho, A.V., and J.D. Ullman (1977) Principles of Compiler Design, Reading: Addison-Wesley

  • Aho, A.V., B.W. Kernighan, and P. Weinberger (1988) The AWK Programming Language, Reading: Addison-Wesley

  • Barton, G., R.C. Berwick, and E.S. Ristad (1987) Computational Complexity and Natural Language, Cambridge: MIT Press

  • Francis, W.N., and H. Kučera (1982) Frequency Analysis of English Usage: Lexicon and Grammar, Boston: Houghton Mifflin

  • Hausser, R. (1989b) “Principles of Computational Morphology,” Laboratory of Computational Linguistics, Carnegie Mellon University

  • Koskenniemi, K. (1983) Two-Level Morphology, University of Helsinki Publications, Vol. 11

  • MacWhinney, B. (1978) The Acquisition of Morphophonology, Monographs of the Society for Research in Child Development, No. 174, Vol. 43

  • Marty, A. (1908) Untersuchungen zur Grundlegung der allgemeinen Grammatik und Sprachphilosophie, Vol. 1, Halle: Niemeyer, reprint Hildesheim, New York: Olms (1976)

  • Matthews, P.H. (1972) Inflectional Morphology. A Theoretical Study Based on Aspects of Latin Verb Conjugation, Cambridge: Cambridge University Press

  • Matthews, P.H. (1974) Morphology. An Introduction to the Theory of Word Structure, Cambridge Textbooks in Linguistics, Cambridge: Cambridge University Press

  • Sapir, E. (1921) Language, an Introduction to the Study of Speech, New York: Harvest Books – Harcourt, Brace, and World

  • Sinclair, J. (1991) Corpus, Concordance, Collocation, Oxford: Oxford University Press

  • Stemberger, P.J., and B. MacWhinney (1986) “Frequency and Lexical Storage of Regularly Inflected Forms,” Memory & Cognition 14.1:17–26

  • Wall, L., and R.L. Schwartz (1990) Programming Perl, Sebastopol: O’Reilly and Associates


Exercises

Section 13.1

  1.

    Give the inflectional paradigms of man, power, learn, give, fast, and good. Generate new words from them by means of derivation and compounding.

  2.

    Call up LA Morph on your computer and have the above word forms analyzed.

  3.

    Explain the notions word, word form, paradigm, part of speech, and the difference between the open and the closed classes.

  4.

    Why is it relevant to distinguish between the notions word and word form?

  5.

    What is the role of the closed classes in derivation and compounding?

  6.

    Why are only the open classes a demanding task of computational morphology?

  7.

    How do the content and function words relate to the open and the closed classes?

Section 13.2

  1.

    Why is the number of word forms in German potentially infinite?

  2.

    Why is the number of noun-noun compounds n²?

  3.

    What do the formal definitions of word and morpheme have in common?

  4.

    What is a neologism?

  5.

    Describe the difference between morphemes and syllables.

Section 13.3

  1.

    What is suppletion?

  2.

    Why is a bound morpheme like -ing neither in the open nor the closed classes?

  3.

    What would argue against postulating bound morphemes?

Section 13.4

  1.

    Explain the three steps of a morphological analysis.

  2.

    Why does LA Grammar analyze allomorphs as ordered triples?

  3.

    What are the components of a system of automatic word form recognition?

  4.

    What is a lemma? Are there essential differences between the entries of a traditional dictionary and an online lexicon for automatic word form recognition?

  5.

    How does the surface function as a key in automatic word form recognition?

  6.

    Explain the purpose of categorization and lemmatization.

  7.

    What is an analysis lexicon?

Section 13.5

  1.

    Describe three different methods of automatic word form recognition.

  2.

    Why is there no word method of automatic word form recognition?

  3.

    How does the word form method handle categorization and lemmatization?

  4.

    Compare cost and benefit of the word form method.

  5.

    Would you classify the word form method as a smart or as a solid solution?

  6.

    Why is the morpheme method mathematically complex?

  7.

    Why does the morpheme method violate surface compositionality?

  8.

    Why does the morpheme method use surfaces only indirectly as the key?

  9.

    Why is the morpheme method conceptually related to transformational grammar?

  10.

    Why does the allomorph method satisfy the principle of surface compositionality?

  11.

    Why is the runtime behavior of the allomorph method faster than that of the morpheme method?

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hausser, R. (2014). Words and Morphemes. In: Foundations of Computational Linguistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41431-2_13

  • DOI: https://doi.org/10.1007/978-3-642-41431-2_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41430-5

  • Online ISBN: 978-3-642-41431-2

  • eBook Packages: Computer Science, Computer Science (R0)
