Skip to main content

Dealing with Function Words in Unsupervised Dependency Parsing

  • Conference paper
Book cover Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8403))

  • 2060 Accesses

Abstract

In this paper, we show some properties of function words in dependency trees. Function words are grammatical words, such as articles, prepositions, pronouns, conjunctions, or auxiliary verbs. These words are often short and very frequent in texts and therefore many of them can be easily recognized. We formulate a hypothesis that function words tend to have a fixed number of dependents and we prove this hypothesis on treebanks. Using this hypothesis, we are able to improve unsupervised dependency parsing and outperform previously published state-of-the-art results for many languages.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Tesnière, L.: Eléments de syntaxe structurale. Editions Klincksieck, Paris (1959)

    Google Scholar 

  2. Sgall, P., Hajičová, E., Panevová, J.: The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Reidel, Dordrecht (1986)

    Google Scholar 

  3. Menezes, A., Richardson, S.D.: A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora. In: Proceedings of the Workshop on Data-driven Methods in Machine Translation, vol. 14, pp. 1–8 (2001)

    Google Scholar 

  4. Banarescu, L., Bonial, C., Cai, S., Georgescu, M., Griffitt, K., Hermjakob, U., Knight, K., Koehn, P., Palmer, M., Schneider, N.: Abstract meaning representation for sembanking. In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 178–186. Association for Computational Linguistics, Sofia (August 2013)

    Google Scholar 

  5. Zipf, G.K.: The Psychobiology of Language. Houghton Mifflin, Boston (1935)

    Google Scholar 

  6. Zeman, D., Mareček, D., Popel, M., Ramasamy, L., Štěpánek, J., Žabokrtský, Z., Hajič, J.: HamleDT: To Parse or Not to Parse? In: Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC 2012). European Language Resources Association (ELRA), Istanbul (2012)

    Google Scholar 

  7. Buchholz, S., Marsi, E.: CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of the Tenth Conference on Computational Natural Language Learning, CoNLL-X 2006, pp. 149–164. Association for Computational Linguistics, Stroudsburg (2006)

    Google Scholar 

  8. Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S., Yuret, D.: The CoNLL 2007 Shared Task on Dependency Parsing. In: Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 915–932. Association for Computational Linguistics, Prague (June 2007)

    Google Scholar 

  9. Mareček, D., Straka, M.: Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol. 1: Long Papers, pp. 281–290. Association for Computational Linguistics, Sofia (August 2013)

    Google Scholar 

  10. Klein, D., Manning, C.D.: Corpus-based induction of syntactic structure: models of dependency and constituency. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL 2004. Association for Computational Linguistics, Stroudsburg (2004)

    Google Scholar 

  11. Headden III, W.P., Johnson, M., McClosky, D.: Improving unsupervised dependency parsing with richer contexts and smoothing. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2009, pp. 101–109. Association for Computational Linguistics, Stroudsburg (2009)

    Google Scholar 

  12. Spitkovsky, V.I., Alshawi, H., Jurafsky, D.: Punctuation: Making a point in unsupervised dependency parsing. In: Proceedings of the Fifteenth Conference on Computational Natural Language Learning, CoNLL 2011 (2011)

    Google Scholar 

  13. Spitkovsky, V.I., Alshawi, H., Jurafsky, D.: Three Dependency-and-Boundary Models for Grammar Induction. In: Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP CoNLL 2012 (2012)

    Google Scholar 

  14. Gilks, W.R., Richardson, S., Spiegelhalter, D.J.: Markov chain Monte Carlo in practice. Interdisciplinary statistics. Chapman & Hall (1996)

    Google Scholar 

  15. Mareček, D., Žabokrtský, Z.: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing. In: Proceedings of RANLP Workshop on Robust Unsupervised and Semisupervised Methods in Natural Language Processing, Hissar, Bulgaria, pp. 1–8 (2011)

    Google Scholar 

  16. Mareček, D., Žabokrtský, Z.: Exploiting reducibility in unsupervised dependency parsing. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, pp. 297–307. Association for Computational Linguistics, Stroudsburg (2012)

    Google Scholar 

  17. Majliš, M., Žabokrtský, Z.: Language richness of the web. In: Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC 2012). European Language Resources Association (ELRA), Istanbul (May 2012)

    Google Scholar 

  18. Brants, T.: TnT - A Statistical Part-of-Speech Tagger. In: Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 224–231 (2000)

    Google Scholar 

  19. Spitkovsky, V.I., Alshawi, H., Jurafsky, D.: Breaking out of local optima with count transforms and model recombination: A study in grammar induction. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1983–1995. Association for Computational Linguistics, Seattle (October 1995)

    Google Scholar 

  20. Abney, S.P.: The English Noun Phrase In Its Sentential Aspect. PhD thesis. MIT (1987)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mareček, D., Žabokrtský, Z. (2014). Dealing with Function Words in Unsupervised Dependency Parsing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_20

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-54906-9_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54905-2

  • Online ISBN: 978-3-642-54906-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics