Abstract
The unsupervised approach to syntactic analysis tries to discover the structure of a text using only raw text. In this paper we explore this approach using grammar inference algorithms. Although there is still room for improvement, our approach tries to minimize the effect of the current limitations of some grammar inductors in two ways: by adding morphological information before the grammar induction process, and by using a novel system for converting a shallow parse to dependencies, which reconstructs information about the heads that the inductor left undiscovered by means of a lexical categories precedence system. Our parser needs no syntactically tagged resources or rules and is trained on a small corpus; its performance is 10% below that of commercial semi-supervised dependency analyzers for Spanish, and comparable to the state of the art for English.
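To make the idea of a lexical categories precedence system concrete, the following is a minimal sketch of how a head could be chosen inside each chunk of a shallow parse and how the chunk heads could then be attached to one another. It is an illustration under stated assumptions, not the system described in the paper: the precedence order in `PRECEDENCE`, the tag names, the leftmost tie-breaking rule, and the function names `chunk_head` and `chunks_to_dependencies` are all hypothetical.

```python
# Hypothetical precedence order: earlier categories are preferred as heads.
# This ordering is an assumption for illustration, not the paper's table.
PRECEDENCE = ["VERB", "NOUN", "ADJ", "ADV", "PREP", "DET"]


def rank(tag):
    """Position of a tag in the precedence list; unknown tags rank last."""
    return PRECEDENCE.index(tag) if tag in PRECEDENCE else len(PRECEDENCE)


def chunk_head(tokens):
    """Pick the token with the best-ranked category; ties go to the leftmost.

    Each token is a tuple (position, word, tag), with positions starting at 1.
    """
    return min(tokens, key=lambda tok: (rank(tok[2]), tok[0]))


def chunks_to_dependencies(chunks):
    """Map each token position to its head position (0 marks the root).

    Inside a chunk, every token depends on the chunk head. Across chunks,
    the best-ranked chunk head becomes the root and the others attach to it.
    """
    heads = {}
    chunk_heads = []
    for chunk in chunks:
        head = chunk_head(chunk)
        chunk_heads.append(head)
        for tok in chunk:
            if tok[0] != head[0]:
                heads[tok[0]] = head[0]
    root = chunk_head(chunk_heads)
    for head in chunk_heads:
        heads[head[0]] = 0 if head[0] == root[0] else root[0]
    return heads


# Shallow parse of "El perro come carne" as three flat chunks.
sentence = [
    [(1, "El", "DET"), (2, "perro", "NOUN")],
    [(3, "come", "VERB")],
    [(4, "carne", "NOUN")],
]
deps = chunks_to_dependencies(sentence)
print(deps)
```

In this toy sentence the determiner attaches to the noun of its chunk, and both nouns attach to the verb, which becomes the root. A flat, two-level attachment like this is a deliberate simplification; it only shows how a fixed category ordering can supply heads that an inductor did not mark.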
We acknowledge the support of SNI, SIP-IPN, COFAA-IPN, PIFI-IPN, CONACYT, and the Japanese Government. The first author is a JSPS fellow. The third author is a Visiting Scholar at Waseda University and specifically acknowledges the support of the SIP-20100773 grant, the CONACYT 50206-H grant, and a CONACYT scholarship for a sabbatical stay in 2010.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Calvo, H., Gambino, O.J., Gelbukh, A., Inui, K. (2011). Dependency Syntax Analysis Using Grammar Induction and a Lexical Categories Precedence System. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_9
DOI: https://doi.org/10.1007/978-3-642-19400-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19399-6
Online ISBN: 978-3-642-19400-9
eBook Packages: Computer Science (R0)