Cross-Domain Effects on Parse Selection for Precision Grammars

  • Andrew MacKinlay
  • Rebecca Dridan
  • Dan Flickinger
  • Timothy Baldwin


We examine the impact of domain on parse selection accuracy in the context of precision HPSG parsing with the English Resource Grammar, using two training corpora and four test corpora, and evaluating with both exact tree match and dependency F-score. In addition to determining the relative impact of in-domain versus cross-domain parse selection training on parser performance, we propose strategies for avoiding the cross-domain performance penalty when only limited in-domain data is available. Our work supports previous research showing that in-domain training data significantly improves parse selection accuracy, and that it yields greater parser accuracy than an out-of-domain training corpus of the same size; we verify experimentally that this also holds for a handcrafted grammar, observing a 10–16% improvement in exact match and a 5–6% improvement in dependency F-score from using a domain-matched training corpus. We also find that parse selection accuracy can be improved considerably by constructing even small-scale in-domain treebanks and learning parse selection models over both in-domain and out-of-domain data. Naively adding an 11,000-token in-domain training corpus boosts dependency F-score by 2–3% over using solely out-of-domain data. We investigate more sophisticated strategies for combining data from these sources to train models: weighted linear interpolation between the single-domain models, and training a model from the combined data, optionally duplicating the smaller corpus to give it a higher weighting. The most successful strategy is training a monolithic model after duplicating the smaller corpus, which gives an improvement over a range of weightings; we also show that the optimal values for these parameters can be estimated on a case-by-case basis using a cross-validation strategy.
This domain-tuning strategy provides a further performance improvement of up to 2.3% for exact match and 0.9% for dependency F-score compared to the naive combination strategy using the same data.
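The combination strategies described in the abstract can be made concrete with a small sketch. This is not the authors' implementation and does not use the ERG, PET or [incr tsdb()] machinery: it assumes a hypothetical toy format in which each sentence carries a list of sparse feature dicts (one per candidate parse) plus the index of the gold parse, and it substitutes a plain batch-gradient conditional log-linear (MaxEnt) trainer. It illustrates monolithic training on out-of-domain data plus k copies of the in-domain data, one possible reading of linear interpolation (mixing the two single-domain models' conditional probabilities with weight `lam`, which is an assumption about how the interpolation is done), exact-match accuracy, an F-score over hypothetical dependency triples, and choosing k by cross-validation over the in-domain data with exact match as the (assumed) selection criterion. All function names and hyperparameters are illustrative.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy format (not the ERG/[incr tsdb()] one): each sentence is a
# pair (candidates, gold), where candidates is a list of sparse feature dicts,
# one per candidate parse, and gold is the index of the treebanked parse.

def score(weights, feats):
    """Linear score w . f(t) of a single candidate parse."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def probs(weights, candidates):
    """Conditional log-linear distribution P(t | s) over the candidates of s."""
    scores = [score(weights, feats) for feats in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for stability
    z = sum(exps)
    return [e / z for e in exps]

def train_maxent(data, epochs=50, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularised conditional log-likelihood
    of the gold parses (a stand-in for a real MaxEnt trainer)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for candidates, gold in data:
            p = probs(weights, candidates)
            for name, value in candidates[gold].items():  # observed counts
                grad[name] += value
            for p_t, feats in zip(p, candidates):  # minus expected counts
                for name, value in feats.items():
                    grad[name] -= p_t * value
        for name in set(grad) | set(weights):
            weights[name] += lr * (grad[name] - l2 * weights[name])
    return dict(weights)

def duplicate_and_combine(in_domain, out_domain, k):
    """Monolithic training set: out-of-domain data plus k copies of the
    smaller in-domain corpus, which upweights the in-domain sentences."""
    return list(out_domain) + list(in_domain) * k

def interpolated_probs(w_in, w_out, lam, candidates):
    """Assumed form of linear interpolation: mix the conditional
    probabilities of the two single-domain models with weight lam."""
    p_in = probs(w_in, candidates)
    p_out = probs(w_out, candidates)
    return [lam * a + (1 - lam) * b for a, b in zip(p_in, p_out)]

def exact_match(rank, data):
    """Fraction of sentences whose top-ranked parse is the gold parse;
    rank maps a candidate list to a list of probabilities."""
    hits = 0
    for candidates, gold in data:
        p = rank(candidates)
        hits += max(range(len(p)), key=p.__getitem__) == gold
    return hits / len(data)

def dependency_f1(gold_deps, pred_deps):
    """F-score over sets of hypothetical (head, relation, dependent) triples."""
    gold, pred = set(gold_deps), set(pred_deps)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(pred), correct / len(gold)
    return 2 * precision * recall / (precision + recall)

def choose_k_by_cv(in_domain, out_domain, ks, folds=4, seed=0):
    """Pick the duplication factor k by cross-validation over the in-domain
    data: train on out-of-domain data plus k copies of the other folds, test
    on the held-out fold, keep the k with the best mean exact match.
    Requires at least `folds` in-domain sentences so no fold is empty."""
    items = list(in_domain)
    random.Random(seed).shuffle(items)
    parts = [items[i::folds] for i in range(folds)]
    best_k, best_acc = None, -1.0
    for k in ks:
        accs = []
        for i in range(folds):
            train_in = [x for j, part in enumerate(parts) if j != i for x in part]
            w = train_maxent(duplicate_and_combine(train_in, out_domain, k))
            accs.append(exact_match(lambda c: probs(w, c), parts[i]))
        mean = sum(accs) / len(accs)
        if mean > best_acc:
            best_k, best_acc = k, mean
    return best_k, best_acc
```

For example, `choose_k_by_cv(in_domain, out_domain, ks=[1, 2, 4, 8])` returns the duplication factor with the best mean held-out exact match, after which a final model would be trained on `duplicate_and_combine(in_domain, out_domain, k)`. Duplication is simply the easiest way to upweight a corpus with an off-the-shelf trainer; a trainer that accepts per-instance weights could achieve the same effect, including non-integer weights, without copying data.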


Keywords: Parsing; Parse selection; Parse ranking; Domain adaptation; Precision parsing; HPSG; Treebanking



Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Andrew MacKinlay (1; corresponding author)
  • Rebecca Dridan (1)
  • Dan Flickinger (2)
  • Timothy Baldwin (1)

  1. University of Melbourne/NICTA, Melbourne, Australia
  2. Stanford University, Stanford, USA
