IBERAMIA 2014: Advances in Artificial Intelligence, pp. 108–119
Extraction of Relation Descriptors for Portuguese Using Conditional Random Fields
Abstract
Relation Extraction (RE), an important task in Information Extraction, consists of detecting and characterizing the semantic relations between entities in a text. This work proposes a new process, based on the Conditional Random Fields (CRF) model, for extracting any relation descriptor between Named Entities (NEs) in the Organization domain for the Portuguese language. For example, from the sentence fragment “Microsoft headquartered in Redmond, […]”, we can extract the relation descriptor “headquartered-in”, which relates the NEs “Microsoft” and “Redmond”. We evaluated different feature configurations for the CRF; the best results were obtained by including a semantic feature based on the NE category, since this feature better expresses the kind of relationship we want to identify between the pair of NEs. The proposed process achieved F-measures of 45% and 53% for complete and partial matching of the extracted descriptors, respectively.
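The abstract frames descriptor extraction as labeling the tokens around an NE pair. The Python sketch below shows one plausible encoding, under stated assumptions: the descriptor is marked with BIO tags (`B-REL`/`I-REL`/`O`, hypothetical names), each token becomes a feature dictionary that includes the NE category as the semantic feature, and predicted spans are scored with exact ("complete") and overlap-based ("partial") matching. The feature set, tag names, category labels (`ORG`, `LOC`) and the token-overlap definition of partial matching are illustrative assumptions, not the paper's specification, and the CRF training step itself is omitted.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end), end exclusive


def token_features(tokens: List[str], ne_cats: List[str], i: int) -> Dict[str, object]:
    """Feature dict for token i (illustrative feature set, not the paper's)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        # Semantic feature: category of the NE this token belongs to ("O" if none).
        "ne.cat": ne_cats[i],
    }
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
        feats["-1:ne.cat"] = ne_cats[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
        feats["+1:ne.cat"] = ne_cats[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str], ne_cats: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token: the usual input shape for a linear-chain CRF."""
    return [token_features(tokens, ne_cats, i) for i in range(len(tokens))]


def bio_spans(labels: List[str]) -> List[Span]:
    """Collect descriptor spans from hypothetical B-REL / I-REL / O tags."""
    spans: List[Span] = []
    start = None
    for i, lab in enumerate(labels):
        if lab == "B-REL":
            if start is not None:
                spans.append((start, i))
            start = i
        elif lab == "I-REL":
            if start is None:  # tolerate an I-REL without a preceding B-REL
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


def prf(gold: List[Span], pred: List[Span], partial: bool = False) -> Tuple[float, float, float]:
    """Precision, recall, F-measure with exact or (assumed) overlap-based matching."""
    def match(g: Span, p: Span) -> bool:
        if partial:
            return g[0] < p[1] and p[0] < g[1]  # at least one shared token
        return g == p

    tp_pred = sum(1 for p in pred if any(match(g, p) for g in gold))
    tp_gold = sum(1 for g in gold if any(match(g, p) for p in pred))
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For the abstract's example, tagging only "headquartered" when the gold descriptor is "headquartered in" counts as a miss under complete matching but as a hit under partial matching:

```python
gold = bio_spans(["O", "B-REL", "I-REL", "O", "O"])  # "headquartered in"
pred = bio_spans(["O", "B-REL", "O", "O", "O"])      # "headquartered"
print(prf(gold, pred), prf(gold, pred, partial=True))
```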
Keywords
Information extraction; Relation extraction; Named entity; Named entity recognition; Natural language processing; Conditional random fields