Dependency Parsing and Domain Adaptation with Data-Driven LR Models and Parser Ensembles

Sagae, Kenji; Tsujii, Jun-ichi

doi:10.1007/978-90-481-9352-3_4

Chapter
First Online: 01 January 2010

Part of the book series: Text, Speech and Language Technology (TLTB, volume 43)

Abstract

Natural language parsing with data-driven dependency-based frameworks has received an increasing amount of attention in recent years, as observed in the shared tasks hosted by the Conference on Computational Natural Language Learning (CoNLL) in 2006 (Buchholz and Marsi, 2006) and 2007 (Nivre et al., 2007).

Notes

1. We append a “virtual root” word to the beginning of every sentence, which is used as the head of every word in the dependency structure that does not have a head in the sentence.
2. The larger third set was not used.
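
The virtual-root convention in Note 1 can be illustrated with a minimal sketch. The token name `<ROOT>`, the function `add_virtual_root`, and the choice of 1-based head indices with `None` for headless words are assumptions made for this illustration and are not taken from the chapter.

```python
from typing import List, Optional, Tuple

ROOT = "<ROOT>"  # hypothetical name for the virtual root token


def add_virtual_root(
    words: List[str], heads: List[Optional[int]]
) -> Tuple[List[str], List[Optional[int]]]:
    """Place a virtual root at position 0 of the sentence.

    `heads[i]` is the 1-based position of the head of `words[i]`,
    or None if that word has no head in the sentence (assumed encoding).
    Because the root takes position 0, existing 1-based positions stay
    valid, and every headless word is attached to position 0.
    """
    new_words = [ROOT] + list(words)
    # The virtual root itself has no head; headless words point to it.
    new_heads = [None] + [0 if h is None else h for h in heads]
    return new_words, new_heads


# Example sentence: "had" is the only word without a head.
words = ["Economic", "news", "had", "little", "effect"]
heads = [2, 3, None, 5, 3]
rooted_words, rooted_heads = add_virtual_root(words, heads)
```

After this step every real word has exactly one head, so the structure is a single tree rooted at position 0.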

References

Abeillé, A. (Ed.) (2003). Treebanks: Building and Using Parsed Corpora. Dordrecht: Kluwer.
Aduriz, I., M.J. Aranzabe, J.M. Arriola, A. Atutxa, A.D. de Ilarraza, A. Garmendia, and M. Oronoz (2003). Construction of a Basque dependency treebank. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories (TLT), Växjö, Sweden, pp. 201–204.
Berger, A., S.A.D. Pietra, and V.J.D. Pietra (1996). A maximum entropy approach to natural language processing. Computational Linguistics 22(1), 39–71.
Böhmová, A., J. Hajič, E. Hajičová, and B. Hladká (2003). The PDT: a 3-level annotation scenario. In Abeillé (2003), Chapter 7, pp. 103–127.
Briscoe, E. and J. Carroll (1993). Generalized probabilistic LR parsing of natural language (corpora) with unification-based grammars. Computational Linguistics 19(1), 25–59.
Brown, R. (1973). A First Language: The Early Stages. Cambridge, MA: Harvard University Press.
Buchholz, S. and E. Marsi (2006). CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the CoNLL-X Shared Task. Tenth Conference on Computational Natural Language Learning (CoNLL-X), New York, NY, pp. 149–164.
Chen, K., C. Luo, M. Chang, F. Chen, C. Chen, C. Huang, and Z. Gao (2003). Sinica treebank: design criteria, representational issues and implementation. In Abeillé (2003), Chapter 13, pp. 231–248.
Csendes, D., J. Csirik, T. Gyimóthy, and A. Kocsor (2005). The Szeged Treebank. Berlin: Springer.
Daelemans, W. and A. van den Bosch (2005). Memory-Based Language Processing. Cambridge: Cambridge University Press.
Erkan, G., A. Ozgur, and D. Radev (2007). Semisupervised classification for extracting protein interaction sentences using dependency parsing. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp. 228–237.
Hajič, J., O. Smrž, P. Zemánek, J. Šnaidauf, and E. Beška (2004). Prague Arabic dependency treebank: development in data and tools. In Proceedings of the NEMLAR International Conference on Arabic Language Resources and Tools, Cairo, Egypt, pp. 110–117.
Johansson, R. and P. Nugues (2007). Extended constituent-to-dependency conversion for English. In Proceedings of the 16th Nordic Conference on Computational Linguistics (NODALIDA), Tartu, Estonia, pp. 105–112.
Kazama, J. and J. Tsujii (2003). Evaluation and extension of maximum entropy models with inequality constraints. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Sapporo, Japan, pp. 137–144.
Knuth, D. (1965). On the translation of languages from left to right. Information and Control 8, 607–639.
Koo, T., X. Carreras, and M. Collins (2008). Simple semi-supervised dependency parsing. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL), Columbus, OH, pp. 595–603.
Kulick, S., A. Bies, M. Liberman, M. Mandel, R. McDonald, M. Palmer, A. Schein, and L. Ungar (2004). Integrated annotation for biomedical information extraction. In Proceedings of BioLINK 2004: Linking Biological Literature, Ontologies and Databases, Boston, MA, pp. 61–68.
MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawrence Erlbaum.
Marcus, M., B. Santorini, and M. Marcinkiewicz (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19(2), 313–330.
Martí, M.A., M. Taulé, L. Màrquez, and M. Bertran (2007). CESS-ECE: A multilingual and multilevel annotated corpus. Available for download from: http://www.lsi.upc.edu/mbertran/cessece/
McClosky, D., E. Charniak, and M. Johnson (2006). Effective self-training for parsing. In Proceedings of the 2006 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), New York, NY, pp. 152–159.
McDonald, R., F. Pereira, K. Ribarov, and J. Hajič (2005). Non-projective dependency parsing using spanning tree algorithms. In Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), Vancouver, BC, pp. 523–530.
Montemagni, S., F. Barsotti, M. Battista, N. Calzolari, O. Corazzari, A. Lenci, A. Zampolli, F. Fanciulli, M. Massetani, R. Raffaelli, R. Basili, M.T. Pazienza, D. Saracino, F. Zanzotto, N. Nana, F. Pianesi, and R. Delmonte (2003). Building the Italian syntactic-semantic treebank. In Abeillé (2003), Chapter 11, pp. 189–210.
Nivre, J. (2003). An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies, Nancy, France, pp. 149–160.
Nivre, J. (2004). Incrementality in deterministic dependency parsing. In Proceedings of the ACL Workshop on Incremental Parsing: Bringing Engineering and Cognition Together (Workshop at ACL-2004), Barcelona, Spain, pp. 50–57.
Nivre, J. and J. Nilsson (2005). Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 99–106.
Nivre, J. and M. Scholz (2004). Deterministic dependency parsing of English text. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), Geneva, Switzerland, pp. 64–70.
Nivre, J., J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel, and D. Yuret (2007). The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL 2007 Shared Task. Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp. 915–932.
Oflazer, K., B. Say, D.Z. Hakkani-Tür, and G. Tür (2003). Building a Turkish treebank. In Abeillé (2003), Chapter 15, pp. 261–277.
Platt, J. (2000). Probabilities for SV machines. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans (Eds.), Advances in Large Margin Classifiers. Cambridge, MA: MIT Press, pp. 61–74.
Prokopidis, P., E. Desypri, M. Koutsombogera, H. Papageorgiou, and S. Piperidis (2005). Theoretical and practical issues in the construction of a Greek dependency treebank. In Proceedings of the 4th Workshop on Treebanks and Linguistic Theories (TLT), Barcelona, Spain, pp. 149–160.
Quirk, C. and S. Corston-Oliver (2006). The impact of parse quality on syntactically-informed statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Sydney, Australia, pp. 62–69.
Ratnaparkhi, A. (1997). A linear observed time statistical parser based on maximum entropy models. In Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing. Brown University, Providence, RI, pp. 1–10.
Saetre, R., K. Sagae, and J. Tsujii (2007). Syntactic features for protein–protein interaction extraction. In Short Paper Proceedings of the 2nd International Symposium on Languages in Biology and Medicine, Biopolis, Singapore, pp. 6.1–6.14.
Sagae, K. and A. Lavie (2006a). A best-first probabilistic shift-reduce parser. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics Main Conference Poster Session (COLING-ACL 2006), Sydney, Australia, pp. 691–698.
Sagae, K. and A. Lavie (2006b). Parser combination by reparsing. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, New York, NY, pp. 129–132.
Tomita, M. (1987). An efficient augmented context-free parsing algorithm. Computational Linguistics 13(1–2), 31–46.
Tomita, M. (1990). The generalized LR parser/compiler – version 8.4. In Proceedings of the International Conference on Computational Linguistics (COLING’90), Helsinki, pp. 59–63.
Vapnik, V.N. (1995). The Nature of Statistical Learning Theory. New York, NY: Springer.
Wang, M., N.A. Smith, and T. Mitamura (2007). What is the Jeopardy model? A quasi-synchronous grammar for QA. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp. 22–32.
Yamada, H. and Y. Matsumoto (2003). Statistical dependency analysis with support vector machines. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), Nancy, France, pp. 195–206.
Zeman, D. and Z. Žabokrtský (2005). Improving parsing accuracy by combining diverse dependency parsers. In Proceedings of the 9th International Workshop on Parsing Technologies (IWPT 2005), Vancouver, BC, pp. 171–178.

Acknowledgements

We thank the shared task organizers and treebank providers. We also thank the CoNLL 2007 shared task reviewers for their comments and suggestions, and Yusuke Miyao for insightful discussions. This work was supported in part by Grant-in-Aid for Specially Promoted Research 18002007.

Author information

Authors and Affiliations

Institute for Creative Technologies, University of Southern California, Marina del Rey, CA, 90292, USA
Kenji Sagae
Department of Computer Science, Faculty of Information Science and Technology, University of Tokyo, Tokyo, 113-0033, Japan
Jun-ichi Tsujii
School of Computer Science, University of Manchester, Manchester, UK
Jun-ichi Tsujii
National Center for Text Mining (NaCTeM), Manchester, UK
Jun-ichi Tsujii

Corresponding author

Correspondence to Kenji Sagae.

Editor information

Editors and Affiliations

Tilburg University, Warandelaan 2, Tilburg, 5000 LE, Netherlands
Harry Bunt
Dépt. Linguistique, Université de Genève, rue de Candolle 2, Genève, 1211, Switzerland
Paola Merlo
Pimpstensvägen 16, Uppsala, 752 67, Sweden
Joakim Nivre

About this chapter

Cite this chapter

Sagae, K., Tsujii, J. (2010). Dependency Parsing and Domain Adaptation with Data-Driven LR Models and Parser Ensembles. In: Bunt, H., Merlo, P., Nivre, J. (eds) Trends in Parsing Technology. Text, Speech and Language Technology, vol 43. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9352-3_4

DOI: https://doi.org/10.1007/978-90-481-9352-3_4
Published: 29 September 2010
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9351-6
Online ISBN: 978-90-481-9352-3
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)