
Wide-Coverage Parsing, Semantics, and Morphology

In: Turkish Natural Language Processing

Abstract

Wide-coverage parsing poses three demands: broad coverage, preferably over free text; depth of semantic representation, for purposes such as inference in question answering; and computational efficiency. We show for Turkish that these goals are not inherently contradictory when we assign categories to sub-lexical elements in the lexicon. The presumed computational burden of processing such lexicons does not arise when we work with automata-constrained formalisms that are trainable on word–meaning correspondences at the level of predicate-argument structures for any string, which is characteristic of radically lexicalizable grammars. This is helpful in morphologically simpler languages too, where word-based parsing has been shown to benefit from sub-lexical training.


Notes

  1.

    Lewis and Steedman (2014) describe what is at stake if we incorporate distributional semantics of content words but not compositional semantics coming out of such heads.

  2.

    The notion of morpheme is controversial in linguistics; Matthews (1974), Stump (2001), and Aronoff and Fudeman (2011) provide some discussion. Without delving into morphological theory, we shall adopt the computational view summarized by Roark and Sproat (2007): morphology can be characterized by finite-state mechanisms. Models of morphological processing differ in the way they handle lexical access. For example, two-level morphology is finite-state and linear-time in its morphotactics, but it incurs an exponential cost for surface form–lexical form pairings during morphological processing (see Koskenniemi 1983, Koskenniemi and Church 1988, and Barton et al. 1987 for extensive discussion). On the other hand, if we have lexical categories for sub-lexical items, then, given a string and its decomposition, we can check efficiently (in polynomial time) whether the category–string correspondences are parsable: the problem is in NP (nondeterministic polynomial time) not because of parsing but because of ambiguity. Lexical access could then use the same mechanism for words, morphemes, and morpholexical rules if desired.
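To make the parsability check concrete, here is a minimal sketch of CKY-style recognition over CCG categories, restricted to forward and backward application: once a decomposition and its categories are fixed, recognition is cubic in the number of lexical categories. The notation, the application-only rule set, and the toy lexical assignment for an SOV clause are our simplifying assumptions, not the chapter's parser.

```python
# Minimal CKY-style recognizer over CCG categories, restricted to forward
# and backward application. Illustrative sketch only; notation and toy
# lexical assignments are simplifying assumptions, not the chapter's parser.

def parse_cat(s):
    """Parse a category like '(S\\NP)\\NP' into nested (result, slash, arg) tuples."""
    s = s.strip()
    if s.startswith('(') and s.endswith(')'):
        depth = 0
        for i, c in enumerate(s):
            depth += (c == '(') - (c == ')')
            if depth == 0 and i < len(s) - 1:
                break                      # outer parens do not wrap the whole string
        else:
            return parse_cat(s[1:-1])      # strip one wrapping layer of parentheses
    depth = 0
    for i in range(len(s) - 1, -1, -1):    # rightmost slash at depth 0 is the main connective
        c = s[i]
        depth += (c == ')') - (c == '(')
        if c in '/\\' and depth == 0:
            return (parse_cat(s[:i]), c, parse_cat(s[i + 1:]))
    return s                               # atomic category, e.g. 'S' or 'NP'

def apply_rules(left, right):
    """Forward application X/Y Y => X and backward application Y X\\Y => X."""
    out = []
    if isinstance(left, tuple) and left[1] == '/' and left[2] == right:
        out.append(left[0])
    if isinstance(right, tuple) and right[1] == '\\' and right[2] == left:
        out.append(right[0])
    return out

def parsable(cats, goal='S'):
    """CKY recognition over a fixed sequence of lexical categories: O(n^3)."""
    cats = [parse_cat(c) for c in cats]
    n = len(cats)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, c in enumerate(cats):
        chart[i][i + 1].add(c)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for l in chart[i][k]:
                    for r in chart[k][j]:
                        chart[i][j].update(apply_rules(l, r))
    return parse_cat(goal) in chart[0][n]

# A toy SOV clause with hypothetical categories: subject, object, transitive verb.
print(parsable(['NP', 'NP', '(S\\NP)\\NP']))   # True
```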

  3.

    For example, for the word el-ler-im-de-ki “the ones in my hands,” with the morphological breakdown el-plu-poss.1s-loc-rel, where el is “hand,” the lexical ending is -plu-poss.1s-loc-rel.

  4.

    Notice that, left unconstrained, we face n × 2^164 ≈ n × 10^49.4 word-like forms in Turkish, from 164 morphemes and n lexemes. A much smaller search space is attested because of morphological, semantic, and lexical constraints, but 50,000 and counting is still an enormous search space.
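The order-of-magnitude figure is just arithmetic; the snippet below reproduces it.

```python
import math

MORPHEMES = 164
# Each morpheme freely present or absent gives 2**164 combinations per lexeme.
print(math.log10(2 ** MORPHEMES))   # 49.369..., i.e. 2**164 is about 10**49.4
```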

  5.

    Using complexity results in this way has sometimes been controversial; see, for example, Berwick and Weinberg (1982), Barton et al. (1987), and Koskenniemi and Church (1988). One view, which we do not follow, is to eliminate alternatives in the model by insisting on tractable algorithms, as in the Tractable Cognition thesis (van Rooij 2008). The view we follow treats complexity as a mixture of source and data that in the end allows efficient parsing, feasible and transparent training, and scalable performance. For example, Clark and Curran’s (2007) CCG parser is cubic time, whereas the A* CCG parser of Lewis and Steedman (2014) is exponential time in the worst case but, with training, shows superlinear parsing performance on long sentences. Another example of this view is the PAC learnability of Valiant (2013).

  6.

    Item-and-Arrangement (IA) morphology treats word structure as consisting of morphemes put one after another, like segments. Item-and-Process (IP) morphology uses lexemes and processes associated with them, which are not necessarily segmental. Another alternative is Word-and-Paradigm, which is similar to IP but takes the word as the basic unit. The terminology is due to Hockett (1959).
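The contrast can be caricatured in a few lines of code: IA stores segmental items and arranges them, while IP stores a lexeme and applies processes to it, which need not be segmental. The toy functions below are ours and ignore Turkish morphophonology such as vowel harmony.

```python
# Item-and-Arrangement: a word is an arrangement of stored segmental items.
def ia_word(morphemes):
    return ''.join(morphemes)

print(ia_word(['el', 'ler', 'im']))              # 'ellerim' ("my hands")

# Item-and-Process: a lexeme plus processes applied to it; a process need not
# concatenate a segment (it could be reduplication, ablaut, a template, ...).
def pluralize(stem):
    return stem + 'ler'

def possessive_1s(stem):
    return stem + 'im'

def ip_word(lexeme, processes):
    for process in processes:
        lexeme = process(lexeme)
    return lexeme

print(ip_word('el', [pluralize, possessive_1s])) # 'ellerim'
```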

  7.

    Nonconcatenative and nonsegmental morphological processes, which are not only characteristic of templatic languages but also abundant across diverse morphological typologies, for example in German, Tagalog, and Alabama, are painful reminders that IA cannot be a universal model for all lexicons.

  8.

    What this means is that, if “archer” in (4) were a quantified phrase, for example her okçu “every archer,” then the quantifier’s lexically value-raised category would lead to her okçu := NP∖(NP∖NP): λpλqλx.(∀x)pa′x → qx. Value-raising is the distribution of type-raising to arguments, as shown in the logical form.

  9.

    Here we pass over the mechanism that maintains lexical integrity, which has the effect of doing category combination of bound items before doing it across words. The idea was first stipulated in CCG by Bozşahin (2002) and revised for explanation in Steedman and Bozşahin (2018). In practical parser training the same effect has been achieved in various ways. For example, in a maximum entropy model of Turkish, a category feature for a word is decided based on whether it arises from a suffix of the word (Akkuş 2014). Wang et al. (2014) rely on a morphological analyzer before training, to keep category inference within a word. Ambati et al. (2013) rank intra-chunk (morphological) dependencies higher than inter-chunk (phrasal) dependencies in coming up with CCG categories, which has the same effect.

  10.

    Notice that the adverb kolayca is necessarily a VP modifier, unlike kolay of (7b), which is underspecified. We avoid ungrammatical coordination of parts of words, while allowing suspended affixation, by radically lexicalizing the conjunction category. For example, [target-ACC hit and bystander-ACC missed ]-REL archer is ungrammatical, and the coordination category carries the constraint (ω), which requires that phonological wordhood be satisfied by all Xs. The left conjunct in this hypothetical example could not project an X_ω because its right periphery, which projects X, would not be a phonological word, as Kabak (2007) showed. It is a forced move in CCG that such constraints on formally available combinations must be derived from information available at the perceptual interfaces.
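A rough rendering of the wordhood condition is sketched below, under our own simplified representation of a conjunct as a category paired with its right-periphery form; the is_phonological_word test is a stand-in for the phonology, not the chapter's mechanism.

```python
# Sketch of the (omega) constraint on the lexicalized coordination category:
# coordination of two Xs is licensed only if each conjunct's right periphery
# is a phonological word. The (category, right_edge) representation and the
# wordhood test below are our illustrative stand-ins.

def is_phonological_word(form):
    # Stand-in: bound forms are marked with a hyphen; a real grammar consults phonology.
    return not (form.startswith('-') or form.endswith('-'))

def coordinate(left, right):
    """License [left CONJ right] as category X only if both conjuncts project X_omega."""
    (lcat, ledge), (rcat, redge) = left, right
    if lcat != rcat:
        return None            # conjuncts must be of the same category X
    if not (is_phonological_word(ledge) and is_phonological_word(redge)):
        return None            # the omega (phonological wordhood) constraint fails
    return lcat                # coordination projects X, satisfying X_omega

print(coordinate(('NP', 'kitap'), ('NP', 'defter')))      # 'NP'  -> licensed
print(coordinate(('S\\NP', 'vur-'), ('S\\NP', 'vurdu')))  # None  -> left edge is not a word
```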

  11.

    We note that another wide-coverage parser for Turkish, Eryiğit et al. (2008), which uses dependency parsing, achieves its highest results in terms equivalent to a subset of our sub-lexical training (inflectional groups, in their case). Their comparison includes word-trained lexicons. CCG adds to this perspective a richer inventory of types to train with, and the benefit of naturally extending the coverage to long-range dependencies that are abundant in large corpora, once heads of syntactic constructions bear combinatory categories in the lexicon. We say more about these aspects subsequently.

  12.

    Honnibal and Curran (2009), Honnibal et al. (2010), and Honnibal (2010) have shown that English also benefits in parsing performance from sub-lexical training, although parsing in their case is word-based. One key ingredient appears to be lexicalizing the unary rules as “hat categories,” which makes such CCG categories true supertags, because they can be taken into account in training before the parser sees them, whereas the earlier use of “supertag” in CCG is equivalent to “combinatory lexical category.”

  13.

    In linguistics, the term “lexeme” can mean a base lexeme together with all of its paradigm forms, all receiving one and the same part of speech.

  14.

    The example is from Çakıcı (2008). The convention we follow in display of Turkish treebank data is: word|POS|Category–gloss.
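For readers handling the display format programmatically, a small helper suffices; the sample token below is invented for illustration.

```python
# Split a treebank display token of the form word|POS|Category–gloss.
# The sample token is invented for illustration only.

def split_token(token):
    word, pos, rest = token.split('|', 2)
    category, _, gloss = rest.partition('–')   # en dash separates category from gloss
    return {'word': word, 'pos': pos, 'category': category, 'gloss': gloss}

print(split_token('okçu|Noun|NP–archer'))
# {'word': 'okçu', 'pos': 'Noun', 'category': 'NP', 'gloss': 'archer'}
```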

  15.

    Figures are from Çakıcı (2008).

  16.

    In fact, both interpretations are possible, and type-shifting from NP to S would be preferable. For example, “Arabadaki Mehmet.” (car-loc-ki Mehmet) could mean “Mehmet, the one in the car” or “The one in the car is Mehmet,” with the given punctuation. The difference between the interpretations is clear in the following alternative continuations: Yarın gidiyormuş./Ahmet değil. “He is leaving tomorrow./Not Ahmet.” The first requires the NP reading for the initial example, and the second the S (propositional) reading. Going the other way, i.e., from a lexically specified S for a nominal predicate to an NP, is much more restricted in Turkish; such type-shifting is in fact headed by verbal inflection.

  17.

    Although the gold-standard CCG categories (supertags) are used, this number is slightly below 100%, possibly because of an implementation discrepancy.

References

  • Akkuş BK (2014) Supertagging with combinatory categorial grammar for dependency parsing. Master’s thesis, Middle East Technical University, Ankara
  • Ambati BR, Deoskar T, Steedman M (2013) Using CCG categories to improve Hindi dependency parsing. In: Proceedings of ACL, Sofia, pp 604–609
  • Ambati BR, Deoskar T, Steedman M (2014) Improving dependency parsers using combinatory categorial grammar. In: Proceedings of EACL, Gothenburg, pp 159–163
  • Aronoff M, Fudeman K (2011) What is morphology?, 2nd edn. Wiley-Blackwell, Chichester
  • Atalay NB, Oflazer K, Say B (2003) The annotation process in the Turkish treebank. In: Proceedings of the workshop on linguistically interpreted corpora, Budapest, pp 33–38
  • Bangalore S, Joshi AK (eds) (2010) Supertagging. MIT Press, Cambridge, MA
  • Barton G, Berwick R, Ristad E (1987) Computational complexity and natural language. MIT Press, Cambridge, MA
  • Berwick R, Weinberg A (1982) Parsing efficiency, computational complexity, and the evaluation of grammatical theories. Linguist Inquiry 13:165–192
  • Birch A, Osborne M, Koehn P (2007) CCG supertags in factored statistical machine translation. In: Proceedings of WMT, pp 9–16
  • Bos J, Bosco C, Mazzei A (2009) Converting a dependency treebank to a categorial grammar treebank for Italian. In: Proceedings of the international workshop on treebanks and linguistic theories, Milan, pp 27–38
  • Bozşahin C (2002) The combinatory morphemic lexicon. Comput Linguist 28(2):145–186
  • Bozşahin C (2012) Combinatory linguistics. Mouton De Gruyter, Berlin
  • Çakıcı R (2005) Automatic induction of a CCG grammar for Turkish. In: Proceedings of the ACL student research workshop, Ann Arbor, MI, pp 73–78
  • Çakıcı R (2008) Wide-coverage parsing for Turkish. PhD thesis, University of Edinburgh, Edinburgh
  • Çakıcı R, Steedman M (2009) A wide-coverage morphemic CCG lexicon for Turkish. In: Proceedings of the ESSLLI workshop on parsing with categorial grammars, Bordeaux, pp 11–15
  • Çakıcı R, Steedman M (2018) Wide coverage CCG parsing for Turkish (in preparation)
  • Cha J, Lee G, Lee J (2002) Korean combinatory categorial grammar and statistical parsing. Comput Hum 36(4):431–453
  • Clark S (2002) A supertagger for combinatory categorial grammar. In: Proceedings of the TAG+ workshop, Venice, pp 19–24
  • Clark S, Curran JR (2006) Partial training for a lexicalized grammar parser. In: Proceedings of NAACL-HLT, New York, NY, pp 144–151
  • Clark S, Curran JR (2007) Wide-coverage efficient statistical parsing with CCG and log-linear models. Comput Linguist 33:493–552
  • Çöltekin Ç, Bozşahin C (2007) Syllable-based and morpheme-based models of Bayesian word grammar learning from CHILDES database. In: Proceedings of the annual meeting of the cognitive science society, Nashville, TN, pp 880–886
  • Eryiğit G, Nivre J, Oflazer K (2008) Dependency parsing of Turkish. Comput Linguist 34(3):357–389
  • Göksel A (2006) Pronominal participles in Turkish and lexical integrity. Ling Linguaggio 5(1):105–125
  • Hall J, Nilsson J (2006) CoNLL-X shared task: multi-lingual dependency parsing. MSI Report 06060, School of Mathematics and Systems Engineering, Växjö University, Växjö
  • Hockenmaier J (2003) Data models for statistical parsing with combinatory categorial grammar. PhD thesis, University of Edinburgh, Edinburgh
  • Hockenmaier J (2006) Creating a CCGbank and a wide-coverage CCG lexicon for German. In: Proceedings of COLING-ACL, Sydney, pp 505–512
  • Hockenmaier J, Steedman M (2007) CCGbank: a corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Comput Linguist 33(3):356–396
  • Hockett CF (1959) Two models of grammatical description. Bobbs-Merrill, Indianapolis, IN
  • Hoeksema J, Janda RD (1988) Implications of process-morphology for categorial grammar. In: Oehrle RT, Bach E, Wheeler D (eds) Categorial grammars and natural language structures. D. Reidel, Dordrecht
  • Honnibal M (2010) Hat categories: representing form and function simultaneously in combinatory categorial grammar. PhD thesis, University of Sydney, Sydney
  • Honnibal M, Curran JR (2009) Fully lexicalising CCGbank with hat categories. In: Proceedings of EMNLP, Singapore, pp 1212–1221
  • Honnibal M, Kummerfeld JK, Curran JR (2010) Morphological analysis can improve a CCG parser for English. In: Proceedings of COLING, Beijing, pp 445–453
  • Kabak B (2007) Turkish suspended affixation. Linguistics 45:311–347
  • Koskenniemi K (1983) Two-level morphology: a general computational model for word-form recognition and production. PhD thesis, University of Helsinki, Helsinki
  • Koskenniemi K, Church KW (1988) Complexity, two-level morphology and Finnish. In: Proceedings of COLING, Budapest, pp 335–339
  • Lewis M, Steedman M (2014) A* CCG parsing with a supertag-factored model. In: Proceedings of EMNLP, Doha, pp 990–1000
  • Lieber R (1992) Deconstructing morphology: word formation in syntactic theory. The University of Chicago Press, Chicago, IL
  • MacWhinney B (2000) The CHILDES project: tools for analyzing talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah, NJ
  • Matthews P (1974) Morphology: an introduction to the theory of word-structure. Cambridge University Press, Cambridge
  • McConville M (2006) An inheritance-based theory of the lexicon in combinatory categorial grammar. PhD thesis, University of Edinburgh, Edinburgh
  • McDonald R, Crammer K, Pereira F (2005) Online large-margin training of dependency parsers. In: Proceedings of ACL, Ann Arbor, MI, pp 91–98
  • Nivre J, Hall J, Nilsson J, Chanev A, Eryiğit G, Kübler S, Marinov S, Marsi E (2007) MaltParser: a language-independent system for data-driven dependency parsing. Nat Lang Eng 13(2):95–135
  • Oflazer K (2003) Dependency parsing with an extended finite-state approach. Comput Linguist 29(4):515–544
  • Oflazer K, Göçmen E, Bozşahin C (1994) An outline of Turkish morphology. www.academia.edu/7331476/An_Outline_of_Turkish_Morphology (7 May 2018)
  • Oflazer K, Say B, Hakkani-Tür DZ, Tür G (2003) Building a Turkish treebank. In: Treebanks: building and using parsed corpora. Kluwer Academic Publishers, Berlin
  • Roark B, Sproat RW (2007) Computational approaches to morphology and syntax. Oxford University Press, Oxford
  • Sak H, Güngör T, Saraçlar M (2011) Resources for Turkish morphological processing. Lang Resour Eval 45(2):249–261
  • Schmerling S (1983) Two theories of syntactic categories. Linguist Philos 6(3):393–421
  • Sells P (1995) Korean and Japanese morphology from a lexical perspective. Linguist Inquiry 26(2):277–325
  • Steedman M (1996) Surface structure and interpretation. MIT Press, Cambridge, MA
  • Steedman M (2000) The syntactic process. MIT Press, Cambridge, MA
  • Steedman M (2011) Taking scope. MIT Press, Cambridge, MA
  • Steedman M, Baldridge J (2011) Combinatory categorial grammar. In: Borsley R, Börjars K (eds) Non-transformational syntax: formal and explicit models of grammar: a guide to current models. Wiley-Blackwell, West Sussex
  • Steedman M, Bozşahin C (2018) Projecting from the lexicon. MIT Press (submitted)
  • Stump GT (2001) Inflectional morphology: a theory of paradigm structure. Cambridge University Press, Cambridge
  • Tse D, Curran JR (2010) Chinese CCGbank: extracting CCG derivations from the Penn Chinese treebank. In: Proceedings of COLING, Beijing, pp 1083–1091
  • Valiant L (2013) Probably approximately correct: nature’s algorithms for learning and prospering in a complex world. Basic Books, New York, NY
  • van Rooij I (2008) The tractable cognition thesis. Cogn Sci 32(6):939–984
  • Wang A, Kwiatkowski T, Zettlemoyer L (2014) Morpho-syntactic lexical generalization for CCG semantic parsing. In: Proceedings of EMNLP, Doha, pp 1284–1295
  • Yuret D, Türe F (2006) Learning morphological disambiguation rules for Turkish. In: Proceedings of NAACL-HLT, New York, NY, pp 328–334


Author information

Correspondence to Ruket Çakıcı.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter


Cite this chapter

Çakıcı, R., Steedman, M., Bozşahin, C. (2018). Wide-Coverage Parsing, Semantics, and Morphology. In: Oflazer, K., Saraçlar, M. (eds) Turkish Natural Language Processing. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-90165-7_8


  • DOI: https://doi.org/10.1007/978-3-319-90165-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90163-3

  • Online ISBN: 978-3-319-90165-7

  • eBook Packages: Computer Science (R0)
