Knowledge and Information Systems, Volume 56, Issue 1, pp 223–255

OntoILPER: an ontology- and inductive logic programming-based system to extract entities and relations from text

  • Rinaldo Lima
  • Bernard Espinasse
  • Fred Freitas
Regular Paper

Abstract

Named entity recognition (NER) and relation extraction (RE) are two important subtasks in information extraction (IE). Most current learning methods for NER and RE rely on supervised machine learning techniques, with more accurate results for NER than for RE. This paper presents OntoILPER, a system for extracting entity and relation instances from unstructured texts using an ontology and inductive logic programming (ILP), a symbolic machine learning technique. OntoILPER uses the domain ontology and takes advantage of a more expressive relational hypothesis space for representing examples whose structure is relevant to IE. It induces extraction rules that subsume examples of entity and relation instances from a specific graph-based model of sentence representation. Furthermore, OntoILPER enables the exploitation of the domain ontology and further background knowledge in the form of relational features. To evaluate OntoILPER, several experiments over the TREC corpus were conducted for both NER and RE tasks, and the results demonstrate its effectiveness in both. This paper also provides a comparative assessment of OntoILPER against other NER and RE systems, showing that OntoILPER is very competitive on NER and outperforms the selected systems on RE.
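
As a rough illustration of what a relational sentence representation and an extraction rule over it might look like, the toy Python sketch below encodes one sentence as a set of facts and applies a single hand-written rule. It is not OntoILPER's implementation: the predicate names (token, ner, dep_nsubj, dep_prep_for), the example sentence, and the work_for rule are all assumptions made for illustration, and the rule is written by hand rather than induced by ILP.

```python
from itertools import product

# Hypothetical relational facts for the sentence "Smith works for Acme in Paris."
# Each fact is a (predicate, arg1, arg2) triple; all predicate names are illustrative.
FACTS = {
    ("token", "t1", "Smith"),
    ("token", "t2", "works"),
    ("token", "t3", "Acme"),
    ("token", "t4", "Paris"),
    ("ner", "t1", "person"),
    ("ner", "t3", "organization"),
    ("ner", "t4", "location"),
    ("dep_nsubj", "t2", "t1"),     # "works" has subject "Smith"
    ("dep_prep_for", "t2", "t3"),  # "works" is linked to "Acme" via "for"
    ("dep_prep_in", "t2", "t4"),   # "works" is linked to "Paris" via "in"
}


def work_for(facts):
    """Apply one hand-written rule, shaped like a Horn clause:

    work_for(X, Y) :- ner(X, person), ner(Y, organization),
                      dep_nsubj(V, X), dep_prep_for(V, Y).

    Returns the (X, Y) token-id pairs for which every body literal holds.
    """
    tokens = sorted({arg1 for (pred, arg1, _) in facts if pred == "token"})
    matches = []
    for x, y, v in product(tokens, repeat=3):
        if (
            ("ner", x, "person") in facts
            and ("ner", y, "organization") in facts
            and ("dep_nsubj", v, x) in facts
            and ("dep_prep_for", v, y) in facts
        ):
            matches.append((x, y))
    return matches


def surface(facts, token_id):
    """Look up the surface word of a token id."""
    return next(w for (pred, t, w) in facts if pred == "token" and t == token_id)


if __name__ == "__main__":
    for x, y in work_for(FACTS):
        print(f"work_for({surface(FACTS, x)}, {surface(FACTS, y)})")
```

In an ILP setting, a clause of this shape would be searched for automatically from positive and negative examples plus background knowledge; here the clause is fixed so that only the matching step is shown.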

Keywords

Ontology-based information extraction; Named entity recognition; Relation extraction; Ontology population; Relational learning; Supervised machine learning

Notes

Acknowledgements

The authors are grateful to Hilário Oliveira for his help in the development of some of the OntoILPER components. We also thank the National Council for Scientific and Technological Development (CNPq/Brazil) for financial support (Grant No. 140791/2010-8).

Copyright information

© Springer-Verlag London Ltd. 2017

Authors and Affiliations

  1. Department of Statistics and Informatics, Federal Rural University of Pernambuco, Recife, Brazil
  2. LSIS-UMR CNRS, Aix-Marseille University, Marseille, France
  3. Informatics Center, Federal University of Pernambuco, Recife, Brazil