Semantic Relation Extraction. Resources, Tools and Strategies

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9727)

Abstract

Relation extraction is a subtask of information extraction that aims to obtain instances of the semantic relations expressed in texts. This information can be arranged in machine-readable formats, which are useful for applications that need structured semantic knowledge. The work presented in this paper explores different strategies for automating the extraction of semantic relations from texts in Portuguese, Galician and Spanish. Both machine learning (distant-supervised and supervised) and rule-based techniques are investigated, and the impact of different levels of linguistic knowledge is analyzed for each approach. Regarding domains, the experiments focus on the extraction of encyclopedic knowledge, through the development of biographical relation classifiers (in a closed domain) and the evaluation of an open information extraction tool. To implement the extraction systems, several natural language processing tools have been built for the three languages, from sentence splitting and tokenization modules to part-of-speech taggers, named entity recognizers and coreference resolution systems. Furthermore, several lexica and corpora have been compiled and enriched with different levels of linguistic annotation, which are useful for both training and testing probabilistic and symbolic models. As a result of this work, new resources and tools are available for the automated processing of texts in Portuguese, Galician and Spanish.
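The distant-supervised strategy mentioned in the abstract can be illustrated with a minimal sketch. It is not the system described in the paper: the function `distant_label`, the `Example` type, the relation name `born_in` and the toy data are all hypothetical, and the matching is plain substring search rather than the named entity recognition and coreference resolution the paper relies on. The idea shown is only the general one: sentences that mention both members of a known (subject, object) pair are taken as noisy positive training examples for that relation.

```python
from typing import List, NamedTuple, Tuple


class Example(NamedTuple):
    """A sentence automatically labeled with a relation (hypothetical type)."""
    sentence: str
    subject: str
    obj: str
    relation: str


def distant_label(
    sentences: List[str],
    seed_pairs: List[Tuple[str, str]],
    relation: str,
) -> List[Example]:
    """Label each sentence that mentions both members of a seed pair.

    Assumption: simple case-insensitive substring matching stands in for
    real entity recognition, so the labels are noisy by construction.
    """
    examples = []
    for sentence in sentences:
        lowered = sentence.lower()
        for subject, obj in seed_pairs:
            if subject.lower() in lowered and obj.lower() in lowered:
                examples.append(Example(sentence, subject, obj, relation))
    return examples


# Hypothetical toy knowledge base and corpus (Spanish sentences).
SEED_PAIRS = [("Rosalía de Castro", "Santiago de Compostela")]
SENTENCES = [
    "Rosalía de Castro nació en Santiago de Compostela en 1837.",
    "Rosalía de Castro publicó Cantares gallegos en 1863.",
]

examples = distant_label(SENTENCES, SEED_PAIRS, "born_in")
for example in examples:
    print(example.relation, "|", example.subject, "|", example.obj)
```

Only the first sentence is labeled, because the second one does not mention the object of the seed pair. A classifier would then be trained on features extracted from such automatically labeled sentences, which is where the different levels of linguistic knowledge come into play.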

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness through the project FFI2014-51978-C2-1-R, and by a Juan de la Cierva formación grant, reference FJCI-2014-22853.

Notes

  1. A possible English translation could be: “John A. Garcia (born in 1949 in Galicia) is one of the pioneers of the modern American computer game industry and the current president of Novalogic.”

  2. All of them are freely available at http://gramatica.usc.es/~marcos/phd.html.

Author information

Correspondence to Marcos Garcia.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Garcia, M. (2016). Semantic Relation Extraction. Resources, Tools and Strategies. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science (LNAI), vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_15

  • DOI: https://doi.org/10.1007/978-3-319-41552-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41551-2

  • Online ISBN: 978-3-319-41552-9

  • eBook Packages: Computer Science (R0)
