Extraction of Relation Descriptors for Portuguese Using Conditional Random Fields

Part of the Lecture Notes in Computer Science book series (LNAI, volume 8864)

Abstract

Relation Extraction (RE) is an important Information Extraction task: detecting and characterizing the semantic relations between entities in text. This work proposes a new process for extracting relation descriptors of any kind between Named Entities (NEs) in the Organization domain, for the Portuguese language, using the Conditional Random Fields (CRF) model. For example, from the sentence fragment “Microsoft headquartered in Redmond, […]”, we can extract the relation descriptor “headquartered-in”, which relates the NEs “Microsoft” and “Redmond”. We evaluated different feature configurations for the CRF; the best results were obtained by including a semantic feature based on the NE category, since this feature better expresses the kind of relationship between the pair of NEs we want to identify. The proposed process achieved F-measures of 45% and 53% for complete and partial matching, respectively.
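As a rough illustration of the idea, the sketch below frames descriptor extraction as sequence labeling over the abstract's example fragment. It is not the authors' implementation: the BIO-style tags (`B-REL`, `I-REL`, `O`), the feature names, the category labels `ORG` and `LOC`, and both helper functions are assumptions made for this example. The feature dictionaries are of the kind a CRF toolkit could take as input; no CRF is trained here.

```python
from typing import Dict, List


def token_features(words: List[str], i: int, ne1_cat: str, ne2_cat: str) -> Dict[str, str]:
    """Build a feature dict for the token at position i (hypothetical feature set)."""
    return {
        "word.lower": words[i].lower(),
        "prev.lower": words[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": words[i + 1].lower() if i < len(words) - 1 else "<EOS>",
        # Semantic feature: the categories of the NE pair being related.
        "ne_pair": f"{ne1_cat}-{ne2_cat}",
    }


def descriptor_from_tags(words: List[str], tags: List[str]) -> str:
    """Join the first contiguous B-REL/I-REL span into a descriptor string."""
    span: List[str] = []
    for word, tag in zip(words, tags):
        if tag == "B-REL" and not span:
            span.append(word)
        elif tag == "I-REL" and span:
            span.append(word)
        elif span:
            break  # the span has ended
    return "-".join(span)


# Example fragment from the abstract, with assumed gold tags.
words = ["Microsoft", "headquartered", "in", "Redmond"]
tags = ["O", "B-REL", "I-REL", "O"]

features = [token_features(words, i, "ORG", "LOC") for i in range(len(words))]
descriptor = descriptor_from_tags(words, tags)
print(descriptor)
```

In this framing, a trained CRF would predict the tag sequence from the feature dictionaries, and the descriptor would be read off the predicted tags in the same way.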

Keywords

  • Information extraction
  • Relation extraction
  • Named entity
  • Named entity recognition
  • Natural language processing
  • Conditional random fields

Author information

Corresponding author

Correspondence to Sandra Collovini.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Collovini, S., Pugens, L., Vanin, A.A., Vieira, R. (2014). Extraction of Relation Descriptors for Portuguese Using Conditional Random Fields. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_9

  • DOI: https://doi.org/10.1007/978-3-319-12027-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12026-3

  • Online ISBN: 978-3-319-12027-0

  • eBook Packages: Computer Science, Computer Science (R0)