Abstract
The conceptualization of knowledge required for efficient processing of textual data is usually represented as an ontology. Depending on the knowledge domain and the task, different types of ontologies are constructed: formal ontologies, which involve axioms and detailed relations between concepts; taxonomies, which are hierarchically organized sets of concepts; and informal ontologies, such as Internet encyclopedias created and maintained by user communities. Manual construction of ontologies is a time-consuming and costly process that requires the participation of experts; consequently, many systems have appeared in recent years that automate this process to a greater or lesser degree. This paper provides an overview of methods for the automatic construction and enrichment of ontologies, with a focus on informal ontologies.
Additional information
Original Russian Text © N.A. Astrakhantsev, D.Yu. Turdakov, 2013, published in Programmirovanie, 2013, Vol. 39, No. 1.
Cite this article
Astrakhantsev, N.A., Turdakov, D.Y. Automatic construction and enrichment of informal ontologies: A survey. Program Comput Soft 39, 34–42 (2013). https://doi.org/10.1134/S0361768813010039
DOI: https://doi.org/10.1134/S0361768813010039