Dependency Syntax Analysis Using Grammar Induction and a Lexical Categories Precedence System

  • Hiram Calvo
  • Omar J. Gambino
  • Alexander Gelbukh
  • Kentaro Inui
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6608)


Unsupervised syntactic analysis tries to discover the structure of a text using only raw text. In this paper we explore this approach using grammar inference algorithms. Although there is still room for improvement, our approach tries to minimize the effect of the current limitations of some grammar inductors in two ways: by adding morphological information before the grammar induction process, and by introducing a novel system for converting a shallow parse to dependencies, which uses a lexical categories precedence system to reconstruct information about heads that the inductor did not discover. Our parser needs no syntactically tagged resources or rules and is trained with a small corpus. Its performance is 10% below that of commercial semi-supervised dependency analyzers for Spanish, and comparable to the state of the art for English.
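
The idea of recovering heads from a shallow parse through a precedence of lexical categories can be illustrated with a minimal sketch. It is not the authors' implementation: the precedence order, the tag names, the nested-list encoding of the shallow parse, and the function names head_of and to_dependencies are all assumptions made for illustration, since the abstract does not specify them.

```python
# Hypothetical precedence order, highest first. The abstract does not give
# the actual ordering used in the paper, so this list is an assumption.
PRECEDENCE = ["VERB", "NOUN", "ADJ", "ADV", "PREP", "DET"]
RANK = {category: i for i, category in enumerate(PRECEDENCE)}


def is_token(node):
    """A token is a (position, word, tag) tuple; a group is a list of nodes."""
    return isinstance(node, tuple)


def head_of(node):
    """Return the token chosen as head of a token or of a bracketed group."""
    if is_token(node):
        return node
    candidates = [head_of(child) for child in node]
    # The category with the lowest rank value wins; unknown tags rank last.
    # min() keeps the leftmost candidate when two categories tie.
    return min(candidates, key=lambda tok: RANK.get(tok[2], len(PRECEDENCE)))


def to_dependencies(node, deps=None):
    """Collect (dependent_word, head_word) pairs from a nested bracketing."""
    if deps is None:
        deps = []
    if is_token(node):
        return deps
    head = head_of(node)
    for child in node:
        child_head = head_of(child)
        if child_head != head:  # positions make every token unique
            deps.append((child_head[1], head[1]))
        to_dependencies(child, deps)
    return deps


# Assumed shallow parse of "el perro come carne" as nested groups.
sentence = [
    [(0, "el", "DET"), (1, "perro", "NOUN")],
    [(2, "come", "VERB"), [(3, "carne", "NOUN")]],
]
pairs = to_dependencies(sentence)
root = head_of(sentence)
```

In this sketch each bracketed group promotes its highest-precedence token as head, the heads of sibling groups attach to the head of the enclosing group, and the head of the outermost group acts as the root of the sentence.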





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Hiram Calvo (1, 2)
  • Omar J. Gambino (1)
  • Alexander Gelbukh (1, 3)
  • Kentaro Inui (2)

  1. Center for Computing Research, IPN, México, México
  2. Computational Linguistics, Nara Institute of Science and Technology, Takayama, Ikoma, Japan
  3. Faculty of Law, Waseda University, Shinjuku-ku, Japan