Multilingual Unsupervised Dependency Parsing with Unsupervised POS Tags

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9413)

Abstract

In this paper, we present experiments with an unsupervised dependency parser that does not use any part-of-speech tags learned from manually annotated data. We use only unsupervised word classes and therefore propose a fully unsupervised approach to inducing sentence structure from raw text. We show that the results are not much worse than those obtained with supervised part-of-speech tags.

Keywords

Grammar induction, Unsupervised parsing, Word classes, Gibbs sampling

Notes

Acknowledgments

This research has been supported by the grant no. GPP406/14/06548P of the Grant Agency of the Czech Republic.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Charles University in Prague, Prague, Czech Republic
