Dealing with Function Words in Unsupervised Dependency Parsing

  • David Mareček
  • Zdeněk Žabokrtský
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8403)

Abstract

In this paper, we examine properties of function words in dependency trees. Function words are grammatical words, such as articles, prepositions, pronouns, conjunctions, or auxiliary verbs. They are typically short and very frequent in texts, so many of them can be recognized easily. We hypothesize that function words tend to have a fixed number of dependents, and we verify this hypothesis on treebanks. Using this hypothesis, we improve unsupervised dependency parsing and outperform previously published state-of-the-art results for many languages.
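The two observations in the abstract (function words are short and frequent; they tend to have a fixed number of dependents) can be checked on any treebank in CoNLL-X format. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It reads the HEAD column (column 7) of CoNLL-X rows, counts how many dependents each token has, and groups those counts by lowercased word form. The helper names (`row`, `parse_conll`, `dependent_counts`, `function_word_candidates`, `fixed_valency_ratio`), the frequency and length thresholds (`min_freq`, `max_len`), and the "share of the most common dependent count" measure are all hypothetical choices made for illustration. Whether, for example, a preposition governs its noun depends on the treebank's annotation conventions.

```python
from collections import Counter, defaultdict


def row(i, form, head):
    """Hypothetical helper: build one CoNLL-X row (10 tab-separated columns)."""
    return f"{i}\t{form}\t_\t_\t_\t_\t{head}\t_\t_\t_"


def parse_conll(lines):
    """Yield sentences as lists of (lowercased form, head index); blank lines separate sentences."""
    sent = []
    for line in lines:
        line = line.strip()
        if not line:
            if sent:
                yield sent
                sent = []
            continue
        cols = line.split("\t")
        sent.append((cols[1].lower(), int(cols[6])))  # FORM = column 2, HEAD = column 7
    if sent:
        yield sent


def dependent_counts(sentences):
    """Map each word form to a Counter: number of dependents -> number of tokens."""
    dist = defaultdict(Counter)
    for sent in sentences:
        n_deps = [0] * (len(sent) + 1)  # index 0 is the artificial root
        for _, head in sent:
            n_deps[head] += 1
        for i, (form, _) in enumerate(sent, start=1):
            dist[form][n_deps[i]] += 1
    return dist


def function_word_candidates(dist, min_freq=2, max_len=3):
    """Assumed heuristic: a form is a candidate if it is frequent and short."""
    return [w for w, c in dist.items() if sum(c.values()) >= min_freq and len(w) <= max_len]


def fixed_valency_ratio(counter):
    """Share of a form's tokens that have its most common number of dependents."""
    return counter.most_common(1)[0][1] / sum(counter.values())


SAMPLE = [
    row(1, "The", 2), row(2, "dog", 3), row(3, "barks", 0), "",
    row(1, "The", 2), row(2, "cat", 3), row(3, "sleeps", 0),
    row(4, "on", 3), row(5, "the", 6), row(6, "mat", 4),
]

sentences = list(parse_conll(SAMPLE))
dist = dependent_counts(sentences)
for w in function_word_candidates(dist):
    print(w, dict(dist[w]), fixed_valency_ratio(dist[w]))  # the {0: 3} 1.0
```

On a real treebank, a ratio close to 1.0 for frequent short forms, compared with content words, would be consistent with the fixed-valency hypothesis. The toy sample only demonstrates the mechanics.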

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • David Mareček (1)
  • Zdeněk Žabokrtský (1)
  1. Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Charles University in Prague, Czech Republic