Evaluating the Impact of Re-training a Lexical Disambiguation Model on Domain Adaptation of an HPSG Parser

Hara, Tadayoshi; Miyao, Yusuke; Tsujii, Jun-ichi

doi:10.1007/978-90-481-9352-3_15

Part of the book series: Text, Speech and Language Technology (TLTB, volume 43)

Abstract

This chapter describes an effective approach to adapting an HPSG parser trained on the Penn Treebank to a biomedical domain. In this approach, we train probabilities of lexical entry assignments to words in a target domain and then incorporate them into the original parser. Experimental results show that this method obtains higher parsing accuracy than previous work on domain adaptation for parsing the same data. Moreover, the results show that combining the proposed method with the existing method achieves parsing accuracy as high as that of an HPSG parser retrained from scratch, but at a much lower training cost. We also evaluated our method on the Brown corpus to show the portability of our approach to another domain.
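The core idea of the abstract, estimating lexical entry assignment probabilities on target-domain data and folding them into the scores of an existing parser, can be illustrated with a toy sketch. This is not the chapter's model: the relative-frequency estimate with add-alpha smoothing, the additive combination of log scores, and all names (`train_lexical_model`, `rescore`, the lexical entry labels) are assumptions made only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_lexical_model(pairs, alpha=1.0):
    """Estimate log P(entry | word) from target-domain (word, entry) pairs.

    Toy assumption: relative frequencies with add-alpha smoothing over the
    set of lexical entries seen in training. `pairs` must be non-empty.
    """
    counts = defaultdict(Counter)
    entries = set()
    for word, entry in pairs:
        counts[word][entry] += 1
        entries.add(entry)
    num_entries = len(entries)

    def log_prob(word, entry):
        word_counts = counts.get(word, Counter())
        total = sum(word_counts.values())
        return math.log((word_counts[entry] + alpha) / (total + alpha * num_entries))

    return log_prob


def rescore(candidates, log_prob):
    """Pick the best candidate parse after adding the lexical model's scores.

    Each candidate is (base_log_score, [(word, entry), ...]), where
    base_log_score stands in for the original parser's score (hypothetical).
    """
    def combined(candidate):
        base_log_score, assignment = candidate
        return base_log_score + sum(log_prob(w, e) for w, e in assignment)

    return max(candidates, key=combined)


# Hypothetical target-domain annotations: "binding" is mostly a noun here.
pairs = [("binding", "noun_entry"), ("binding", "noun_entry"), ("binding", "verb_entry")]
log_prob = train_lexical_model(pairs)

# The original parser slightly prefers the verb reading; the lexical model flips it.
candidates = [
    (-1.0, [("binding", "verb_entry")]),
    (-1.2, [("binding", "noun_entry")]),
]
best = rescore(candidates, log_prob)
print(best[1])
```

In this sketch only the small lexical model is trained on target-domain data, while the base parser's scores are reused unchanged, which mirrors the abstract's point that adaptation can be much cheaper than retraining the whole parser.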

Acknowledgements

This research was partially supported by Grant-in-Aid for Specially Promoted Research 18002007.

Author information

Authors and Affiliations

Department of Computer Science, Faculty of Information Science and Technology, University of Tokyo, Tokyo, 113-0033, Japan
Tadayoshi Hara, Yusuke Miyao & Jun-ichi Tsujii
School of Computer Science, University of Manchester, Manchester, UK
Jun-ichi Tsujii
National Center for Text Mining (NaCTeM), Manchester, UK
Jun-ichi Tsujii

Corresponding author

Correspondence to Tadayoshi Hara.

Editor information

Editors and Affiliations

Tilburg University, Warandelaan 2, Tilburg, 5000 LE, Netherlands
Harry Bunt
Dépt. Linguistique, Université de Genève, rue de Candolle 2, Genève, 1211, Switzerland
Paola Merlo
Pimpstensvägen 16, Uppsala, 752 67, Sweden
Joakim Nivre

About this chapter

Cite this chapter

Hara, T., Miyao, Y., Tsujii, Ji. (2010). Evaluating the Impact of Re-training a Lexical Disambiguation Model on Domain Adaptation of an HPSG Parser. In: Bunt, H., Merlo, P., Nivre, J. (eds) Trends in Parsing Technology. Text, Speech and Language Technology, vol 43. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9352-3_15

DOI: https://doi.org/10.1007/978-90-481-9352-3_15
Published: 29 September 2010
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9351-6
Online ISBN: 978-90-481-9352-3
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)
