Abstract
Statistical parsing has made a great deal of progress in the past decade (Collins, 1996, 1997; Charniak, 2000). A common characteristic of these generative parsers is their use of lexical statistics. However, it was subsequently discovered that bi-lexical statistics (parameters that involve two words) play a much smaller role than previously believed. Gildea (2001) found that removing bi-lexical statistics from a state-of-the-art PCFG parser resulted in little change in its output. Bikel (2004) observes that only 1.49% of the bi-lexical statistics needed during parsing were found in the training corpus; when only the bigram statistics involved in the highest-probability parse are considered, this percentage rises to 28.8%. Moreover, even when bi-lexical statistics do get used, they are remarkably similar to their back-off values based on part-of-speech tags. The utility of bi-lexical statistics therefore becomes rather questionable. Klein and Manning (2003) present an unlexicalized parser that eliminates all lexical parameters, with performance close to that of state-of-the-art lexicalized parsers.
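As a toy illustration of the back-off effect the abstract describes, the sketch below interpolates a bi-lexical estimate P(dependent | head word) with its part-of-speech back-off P(dependent | head POS). The event list, the interpolation weight `lam`, and the function `p_dep` are hypothetical illustrations, not the parameterization of any parser cited here.

```python
from collections import Counter

# Hypothetical dependency events: (head_word, head_pos, dependent_word)
events = [
    ("ate", "VBD", "pizza"), ("ate", "VBD", "pasta"),
    ("bought", "VBD", "pizza"), ("saw", "VBD", "movie"),
]

biw = Counter((h, d) for h, _, d in events)   # bi-lexical counts
head_w = Counter(h for h, _, _ in events)     # head-word totals
bip = Counter((p, d) for _, p, d in events)   # back-off: POS-based counts
head_p = Counter(p for _, p, _ in events)     # head-POS totals

def p_dep(head, pos, dep, lam=0.5):
    """Interpolate the bi-lexical estimate with its POS back-off."""
    lex = biw[(head, dep)] / head_w[head] if head_w[head] else 0.0
    back = bip[(pos, dep)] / head_p[pos] if head_p[pos] else 0.0
    return lam * lex + (1 - lam) * back

# A bigram seen in training vs. one that must fall back on the POS estimate:
print(p_dep("ate", "VBD", "pizza"))   # 0.5: both terms contribute
print(p_dep("saw", "VBD", "pizza"))   # 0.25: bi-lexical count is zero
```

Because unseen head-dependent pairs dominate at parse time, the POS back-off term carries most of the weight in practice, which is one way to read Bikel's 1.49% figure.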
Notes
1. Although non-projective trees exist, the dependency trees used in our experiments are projective trees converted from the Penn Chinese Treebank.
2. We also computed directed dependency accuracy, defined as the percentage of words that have the correct head. We observed that directed dependency accuracy is only slightly lower than the undirected one.
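The metric in Note 2 can be sketched as follows. The head-array encoding (each word stores the index of its head, with -1 marking the root) and both function names are illustrative assumptions, not the evaluation code used in the experiments.

```python
def directed_accuracy(gold_heads, pred_heads):
    """Fraction of words whose predicted head equals the gold head."""
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

def undirected_accuracy(gold_heads, pred_heads):
    """A predicted arc counts if it links the right pair of words,
    regardless of which word is treated as the head."""
    correct = 0
    for i, p in enumerate(pred_heads):
        if gold_heads[i] == p or (0 <= p < len(gold_heads) and gold_heads[p] == i):
            correct += 1
    return correct / len(pred_heads)

# Toy sentence of three words: word 1 is the root, words 0 and 2 attach to it.
gold = [1, -1, 1]
pred = [1, 0, 1]   # word 1's arc to word 0 is predicted with reversed direction
print(directed_accuracy(gold, pred))    # 2/3
print(undirected_accuracy(gold, pred))  # 1.0
```

The reversed arc in the example is penalized by the directed score but not the undirected one, which is why the directed figure is the (slightly lower) stricter metric.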
References
Bikel, D.M. (2004). Intricacies of Collins’ parsing model. Computational Linguistics 30(4), 479–511.
Bikel, D.M. and D. Chiang (2000). Two statistical parsing models applied to the Chinese Treebank. In Proceedings of the 2nd Chinese Language Processing Workshop, Hong Kong.
Charniak, E. (2000). A maximum-entropy-inspired parser. In Proceedings of the North American Chapter of the Association for Computational Linguistics, Seattle, Washington, pp. 132–139.
Clark, S., J. Hockenmaier, and M. Steedman (2002). Building deep dependency structures with a wide-coverage CCG parser. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA.
Collins, M. (1996). A new statistical parser based on bigram lexical dependencies. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Santa Cruz, California, pp. 184–191.
Collins, M. (1997). Three generative, lexicalized models for statistical parsing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Madrid, pp. 16–23.
Collins, M. (1999). Head-driven statistical models for natural language parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia, PA.
Dagan, I., L. Lee, and F. Pereira (1999). Similarity-based models of word cooccurrence probabilities. Machine Learning 34(1–3), 43–69.
Eisner, J. (1996). Three new probabilistic models for dependency parsing: an exploration. In Proceedings of the International Conference on Computational Linguistics, Copenhagen.
Eisner, J. and G. Satta (1999). Efficient parsing for bilexical context-free grammars and head-automaton grammars. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Maryland.
Gildea, D. (2001). Corpus variation and parser performance. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Pittsburgh, PA.
Graff, D. (2003). English Gigaword. Philadelphia, PA: Linguistic Data Consortium.
Graff, D. and K. Chen (2003). Chinese Gigaword. Philadelphia, PA: Linguistic Data Consortium.
Grefenstette, G. (1994). Corpus-derived first, second and third-order word affinities. In Proceedings of Euralex, Amsterdam.
Harris, Z. (1968). Mathematical Structures of Language. New York, NY: Wiley.
Hindle, D. (1990). Noun classification from predicate-argument structures. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Pittsburgh, PA, pp. 268–275.
Jurafsky, D. and J. Martin (2000). Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall.
Klein, D. and C. Manning (2003). Accurate unlexicalized parsing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Sapporo.
Klein, D. and C. Manning (2004). Corpus-based induction of syntactic structure: models of dependency and constituency. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Barcelona.
Levy, R. and C.D. Manning (2003). Is it harder to parse Chinese, or the Chinese Treebank? In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Sapporo, Hokkaido.
Lin, D. (1995). A dependency-based method for evaluating broad-coverage parsers. In Proceedings of the International Joint Conference on Artificial Intelligence, Montreal, Quebec, pp. 1420–1425.
Lin, D. (1998). Automatic retrieval and clustering of similar words. In Proceedings of the International Conference on Computational Linguistics and the Annual Meeting of the Association for Computational Linguistics, Montreal, Quebec, pp. 768–774.
Manning, C. and H. Schütze (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
McDonald, R., K. Crammer, and F. Pereira (2005). Online large-margin training of dependency parsers. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Ann Arbor, Michigan.
Nivre, J. (2003). An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies, Nancy, pp. 149–160.
Nivre, J., J. Hall, J. Nilsson, A. Chanev, G. Eryiğit, S. Kübler, S. Marinov, and E. Marsi (2007). MaltParser: a language-independent system for data-driven dependency parsing. Natural Language Engineering 13, 95–135.
Pereira, F., N. Tishby, and L. Lee (1993). Distributional clustering of English words. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Columbus, Ohio, pp. 183–190.
Ratnaparkhi, A. (1999). Learning to parse natural language with maximum entropy models. Machine Learning 34(1–3), 151–175.
Yamada, H. and Y. Matsumoto (2003). Statistical dependency analysis with support vector machines. In Proceedings of the International Workshop on Parsing Technologies, Nancy.
Acknowledgements
We would like to thank Mark Steedman for suggesting the comparison with unlexicalized parsing in Section 7.6 and the anonymous reviewers for their useful comments. This work was supported by NSERC, the Alberta Ingenuity Centre for Machine Learning, and the Canada Research Chairs program. The first author was also supported by an iCORE scholarship.
© 2010 Springer Science+Business Media B.V.
Cite this chapter
Wang, Q.I., Schuurmans, D., Lin, D. (2010). Strictly Lexicalised Dependency Parsing. In: Bunt, H., Merlo, P., Nivre, J. (eds) Trends in Parsing Technology. Text, Speech and Language Technology, vol 43. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9352-3_7
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9351-6
Online ISBN: 978-90-481-9352-3