Automatic construction and enrichment of informal ontologies: A survey

Abstract

Efficient processing of textual data requires a conceptualization of knowledge, which is usually represented as an ontology. Depending on the knowledge domain and the task, different types of ontologies are constructed: formal ontologies, which involve axioms and detailed relations between concepts; taxonomies, which are hierarchically organized sets of concepts; and informal ontologies, such as Internet encyclopedias created and maintained by user communities. Manual construction of ontologies is a time-consuming and costly process that requires the participation of experts; in recent years, therefore, many systems have appeared that automate this process to a greater or lesser degree. This paper surveys methods for the automatic construction and enrichment of ontologies, with a focus on informal ontologies.

Author information

Corresponding author

Correspondence to N. A. Astrakhantsev.

Additional information

Original Russian Text © N.A. Astrakhantsev, D.Yu. Turdakov, 2013, published in Programmirovanie, 2013, Vol. 39, No. 1.

About this article

Cite this article

Astrakhantsev, N.A., Turdakov, D.Y. Automatic construction and enrichment of informal ontologies: A survey. Program Comput Soft 39, 34–42 (2013). https://doi.org/10.1134/S0361768813010039

Keywords

  • Mutual Information
  • Semantic Similarity
  • Domain Ontology
  • Computational Linguistics
  • Formal Ontology