CRF+LG: A Hybrid Approach for the Portuguese Named Entity Recognition

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 736)

Abstract

Named Entity Recognition (NER) is an important and challenging Information Extraction task. Conditional Random Fields (CRF) are a probabilistic method for structured prediction that can be applied to NER. This paper presents the use of CRF for NER in Portuguese texts with an additional feature supplied by a Local Grammar. Local grammars are handcrafted rules that identify named entities within a text. We also present a study of the limits of CRF performance when the output of any other classifier is used as an additional feature. Two well-known Portuguese collections were used as the training and test sets, respectively. The results outperform those of state-of-the-art systems reported in the literature for Portuguese.
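
The idea of feeding a rule-based label into a CRF as one more token feature can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the trigger list, the `PER` tag, the function names, and the choice of lexical features are hypothetical, and the toy rule is only a stand-in for the paper's actual local grammars, which are not described in this text.

```python
from typing import Dict, List

# Hypothetical trigger words; the paper's real local grammars are not given here.
PERSON_TRIGGERS = {"Sr.", "Sra.", "Dr.", "Dra.", "Prof."}


def local_grammar_labels(tokens: List[str]) -> List[str]:
    """Toy stand-in for a local grammar: tag capitalized tokens after a title as PER."""
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        if tokens[i] in PERSON_TRIGGERS:
            j = i + 1
            # Extend the match over the run of capitalized tokens after the trigger.
            while j < len(tokens) and tokens[j][:1].isupper():
                labels[j] = "PER"
                j += 1
            i = j
        else:
            i += 1
    return labels


def token_features(tokens: List[str], lg: List[str], i: int) -> Dict[str, object]:
    """Feature dict for token i: simple lexical features plus the rule-based label."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        "lg_label": lg[i],  # the additional feature supplied by the rule-based component
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Run the toy rule once per sentence, then build one feature dict per token."""
    lg = local_grammar_labels(tokens)
    return [token_features(tokens, lg, i) for i in range(len(tokens))]


tokens = ["O", "Dr.", "João", "Silva", "mora", "em", "Vitória", "."]
features = sentence_features(tokens)
```

A sequence of per-token feature dictionaries like `features` is the kind of input a CRF trainer would consume. The CRF can then learn how much to trust `lg_label` relative to the other features, so the rule's output informs the final prediction without dictating it.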

Keywords

Named Entity Recognition, Conditional Random Fields, Local Grammar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Programa de Pós-Graduação em Informática, Universidade Federal do Espírito Santo (UFES), Vitória, Brazil
