Extraction of Relation Descriptors for Portuguese Using Conditional Random Fields

  • Sandra Collovini
  • Lucas Pugens
  • Aline A. Vanin
  • Renata Vieira
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8864)

Abstract

Relation Extraction (RE), an important task in Information Extraction, consists of detecting and characterizing the semantic relations between entities in a text. This work proposes a new process, based on the Conditional Random Fields (CRF) model, for extracting relation descriptors of any type between Named Entities (NEs) in the Organization domain for Portuguese. For example, from the sentence fragment “Microsoft headquartered in Redmond, […]” we can extract the relation descriptor “headquartered-in”, which relates the NEs “Microsoft” and “Redmond”. We evaluated different feature configurations for the CRF. The best results were obtained by including a semantic feature based on the NE category, because this feature better expresses the kind of relationship we want to identify between the pair of NEs. The proposed process achieved F-measures of 45% for complete matching and 53% for partial matching.
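
The abstract frames descriptor extraction as CRF-based sequence labeling over the tokens between and around an NE pair. The sketch below shows one way to organize the surrounding pieces. It rests on several assumptions that are not taken from the paper. The descriptor is encoded with BIO labels (B-REL, I-REL, O). Each token receives a feature dictionary that includes its NE category, and the category tags ORG and LOC, like every feature name, are hypothetical. A partial match means that a predicted and a gold descriptor share at least one token. The CRF itself is not trained here, since that step would be handled by a toolkit. For example, sklearn-crfsuite accepts lists of per-token feature dictionaries like these. The code covers only label encoding and decoding, feature extraction, and complete versus partial scoring.

```python
from typing import Dict, List, Optional, Tuple

# Half-open token span [start, end) covering a relation descriptor.
Span = Tuple[int, int]


def bio_encode(n_tokens: int, descriptor: Optional[Span]) -> List[str]:
    """Turn a gold descriptor span into BIO labels (assumed tag set B-REL/I-REL/O)."""
    labels = ["O"] * n_tokens
    if descriptor is not None:
        start, end = descriptor
        labels[start] = "B-REL"
        for i in range(start + 1, end):
            labels[i] = "I-REL"
    return labels


def bio_decode(labels: List[str]) -> List[Span]:
    """Recover descriptor spans from a (possibly predicted) BIO sequence."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, lab in enumerate(labels):
        if lab == "B-REL" or (lab == "I-REL" and start is None):
            if start is not None:  # a new B-REL closes the previous span
                spans.append((start, i))
            start = i
        elif lab == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


def token_features(tokens: List[str], ne_cats: List[str], i: int) -> Dict[str, str]:
    """Per-token feature dict; all feature names are hypothetical."""
    return {
        "word.lower": tokens[i].lower(),
        "word.istitle": str(tokens[i].istitle()),
        # Semantic feature: NE category of this token ("O" outside any NE).
        "ne.cat": ne_cats[i],
        # NE categories of the neighbouring tokens, with sentence-boundary markers.
        "prev.ne.cat": ne_cats[i - 1] if i > 0 else "BOS",
        "next.ne.cat": ne_cats[i + 1] if i + 1 < len(tokens) else "EOS",
    }


def prf(gold: List[Span], pred: List[Span], partial: bool) -> Tuple[float, float, float]:
    """Precision, recall and F-measure under complete or partial matching."""

    def match(p: Span, g: Span) -> bool:
        if partial:
            return p[0] < g[1] and g[0] < p[1]  # spans share at least one token
        return p == g  # identical boundaries

    hit_pred = sum(any(match(p, g) for g in gold) for p in pred)
    hit_gold = sum(any(match(p, g) for p in pred) for g in gold)
    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    tokens = ["Microsoft", "headquartered", "in", "Redmond", ","]
    ne_cats = ["ORG", "O", "O", "LOC", "O"]  # hypothetical category labels
    gold = bio_decode(bio_encode(len(tokens), (1, 3)))  # "headquartered in"
    pred = bio_decode(["O", "B-REL", "O", "O", "O"])    # hypothetical CRF output
    X = [token_features(tokens, ne_cats, i) for i in range(len(tokens))]
    print(X[0])
    print("complete:", prf(gold, pred, partial=False))
    print("partial: ", prf(gold, pred, partial=True))
```

In this example, the gold descriptor “headquartered in” spans tokens 1–2, and a hypothetical prediction covers only “headquartered”. That prediction scores 0 under complete matching and 1 under partial matching, which shows how far the two criteria can diverge on the same output.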

Keywords

Information extraction, Relation extraction, Named entity, Named entity recognition, Natural language processing, Conditional random fields

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Sandra Collovini (1)
  • Lucas Pugens (1)
  • Aline A. Vanin (1)
  • Renata Vieira (1)
  1. Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brazil