Training with Lexical Information

Chen, Wenliang; Zhang, Min

doi:10.1007/978-981-287-552-5_5

Abstract

This chapter describes word-level approaches, which make use of information based on word surface forms. Lexical information is very important for resolving ambiguous relationships in dependency parsing, but lexicalized statistics are sparse and difficult to estimate directly from a limited training data set. It is therefore attractive to learn lexical information from large-scale unlabeled data, such as web data.

Notes

1. The clusters are those provided by Koo et al. (2008), whose clustering recovers at most 1,000 distinct bit strings.
2. http://w3.msi.vxu.se/~nivre/research/Penn2Malt.html
3. We ensure that the text used for extracting subtrees does not include the sentences of the Penn Treebank.

References

Charniak, E., Blaheta, D., Ge, N., Hall, K., Hale, J., & Johnson, M. (2000). BLLIP 1987–89 WSJ Corpus Release 1, LDC2000T43. Linguistic Data Consortium.
Chen, W., Zhang, M., & Zhang, Y. (2013). Semi-supervised feature transformation for dependency parsing. In Proceedings of EMNLP, Seattle (pp. 1303–1313). Association for Computational Linguistics. http://www.aclweb.org/anthology/D13-1129.
Koo, T., Carreras, X., & Collins, M. (2008). Simple semi-supervised dependency parsing. In Proceedings of ACL-08: HLT, Columbus.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
McDonald, R., & Nivre, J. (2007). Characterizing the errors of data-driven dependency parsing models. In Proceedings of EMNLP-CoNLL, Prague (pp. 122–131).
Miller, S., Guinness, J., & Zamanian, A. (2004). Name tagging with word clusters and discriminative training. In S. Dumais, D. Marcu, & S. Roukos (Eds.), HLT-NAACL 2004: Main proceedings, Boston (pp. 337–342). Association for Computational Linguistics.
Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. In Proceedings of EMNLP 1996, Philadelphia (pp. 133–142). Copenhagen: Denmark.
Brants, T., & Franz, A. (2006). Web 1T 5-gram Version 1, LDC2006T13. Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC2006T13.
Yamada, H., & Matsumoto, Y. (2003). Statistical dependency analysis with support vector machines. In Proceedings of IWPT, Nancy (pp. 195–206).
Zhou, G., Zhao, J., Liu, K., & Cai, L. (2011). Exploiting web-derived selectional preference to improve statistical dependency parsing. In Proceedings of ACL-HLT2011, Portland (pp. 1556–1565). Association for Computational Linguistics. http://www.aclweb.org/anthology/P11-1156.

Author information

Authors and Affiliations

Soochow University, Suzhou, Jiangsu, China
Wenliang Chen & Min Zhang

About this chapter

Cite this chapter

Chen, W., Zhang, M. (2015). Training with Lexical Information. In: Semi-Supervised Dependency Parsing. Springer, Singapore. https://doi.org/10.1007/978-981-287-552-5_5

DOI: https://doi.org/10.1007/978-981-287-552-5_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-287-551-8
Online ISBN: 978-981-287-552-5
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)
