Abstract
Most knowledge is available in unstructured texts; however, it must be represented and handled automatically to become truly useful for the construction of knowledge-based systems. Ontologies are an approach to knowledge representation capable of expressing a set of entities and their relationships, constraints, axioms and the vocabulary of a given domain. Ontology population aims to identify instances of the concepts, relationships and properties of an ontology. Manual population by domain experts and knowledge engineers is an expensive and time-consuming task, so automatic or semi-automatic approaches are needed. This article proposes a process for the semi-automatic population of ontologies from text, focusing on the application of natural language processing and information extraction techniques to acquire and classify ontology instances. Experiments using a legal corpus were conducted to evaluate the process. Initial results are promising and indicate that our approach can extract instances with high effectiveness.
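To make the notion of ontology population concrete, the following is a minimal sketch and not the process proposed in the paper. It assumes a toy ontology with two concepts, a single hypothetical "such as" pattern for finding candidate instances, and a `confirm` callback that stands in for the domain expert in a semi-automatic setting. All names (`ontology`, `SURFACE_FORMS`, `extract_candidates`, `populate`) and the sample sentence are illustrative assumptions.

```python
import re
from typing import Callable

# Toy ontology: each concept maps to the set of instances found so far.
# Concept names and surface forms are illustrative assumptions.
ontology: dict[str, set[str]] = {"Statute": set(), "Court": set()}
SURFACE_FORMS = {"statutes": "Statute", "courts": "Court"}

# Hypothetical lexico-syntactic pattern: "<concept> such as <A>, <B> and <C>".
PATTERN = re.compile(r"(\w+) such as ([^.;]+)")
# Separators between listed items: ",", ", and" or " and ".
SEPARATOR = re.compile(r",\s*(?:and\s+)?|\s+and\s+")


def extract_candidates(text: str) -> list[tuple[str, str]]:
    """Return (concept, instance) pairs proposed by the pattern."""
    candidates = []
    for match in PATTERN.finditer(text):
        concept = SURFACE_FORMS.get(match.group(1).lower())
        if concept is None:
            continue  # the word before "such as" is not a known concept
        for item in SEPARATOR.split(match.group(2)):
            instance = item.strip().removeprefix("the ").strip()
            if instance:
                candidates.append((concept, instance))
    return candidates


def populate(text: str, confirm: Callable[[str, str], bool]) -> None:
    """Add only the candidates that the reviewer callback accepts."""
    for concept, instance in extract_candidates(text):
        if confirm(concept, instance):
            ontology[concept].add(instance)


sample = ("The brief cited statutes such as the Sherman Act and the Clayton Act. "
          "It was heard by courts such as the Supreme Court.")
populate(sample, confirm=lambda concept, instance: True)  # accept everything
```

Here the callback accepts every candidate; in a semi-automatic workflow it would instead present each pair to a reviewer, so that extraction is automated while the final classification decision remains with a person.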
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Faria, C., Girardi, R. (2011). An Information Extraction Process for Semi-automatic Ontology Population. In: Corchado, E., Snášel, V., Sedano, J., Hassanien, A.E., Calvo, J.L., Ślȩzak, D. (eds) Soft Computing Models in Industrial and Environmental Applications, 6th International Conference SOCO 2011. Advances in Intelligent and Soft Computing, vol 87. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19644-7_34
DOI: https://doi.org/10.1007/978-3-642-19644-7_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19643-0
Online ISBN: 978-3-642-19644-7
eBook Packages: Engineering, Engineering (R0)