Semantic Role Labeling for Portuguese – A Preliminary Approach –

  • João Sequeira
  • Teresa Gonçalves
  • Paulo Quaresma
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7243)

Abstract

A growing number of private and academic publications are available as digital content on the Internet, making it extremely difficult to extract and maintain their information manually. These tasks are normally addressed with approaches based on natural language processing. This paper presents a preliminary approach for building a semantic role labeler for Portuguese, a little-explored aspect of natural language processing for this language. The approach was evaluated on the three most frequent semantic roles (relation, subject and object) using a subset of the Bosque 8.0 corpus. The same approach was applied to an English corpus, the CONLL’2004 one, and its results were compared with those obtained in the CONLL’2004 shared task. The paper also presents BosqueUE, a Portuguese corpus for semantic role labeling that can serve as base material for future research in the area. This corpus has the same format as the CONLL’2004 corpus, which facilitates multi-language evaluations.

Keywords

  • Support Vector Machine
  • Hidden Markov Model
  • Natural Language Processing
  • Conditional Random Field
  • Semantic Role
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • João Sequeira (1)
  • Teresa Gonçalves (1)
  • Paulo Quaresma (1)
  1. Universidade de Évora, Portugal
