Creating Ontologies for Content Representation—The OntoSeed Suite

  • Elena Paslaru Bontas Simperl
  • David Schlangen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4601)


Because building ontologies manually is inherently difficult, knowledge acquisition approaches such as ontology reuse or ontology learning from texts are often seen as instruments that can make this tedious process easier. In this paper we present an NLP-based method to aid ontology design in a specific application scenario, namely that in which the resulting ontology is used to support the semantic annotation of text documents. The proposed method uses the World Wide Web in its analysis of the domain-specific documents, thereby greatly reducing the need for linguistic expertise and resources. It also suggests ways to specify domain ontologies in a “linguistics-friendly” format in order to improve subsequent ontology-based natural language processing tasks such as semantic annotation. We present a thorough evaluation of the method, using corpora from three diverse real-world settings (medical information, tourism, and recipes). Additionally, for the first scenario we compare the costs and benefits of the NLP-based ontology engineering approach against those of a similar, reuse-oriented experiment.



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Elena Paslaru Bontas Simperl (1)
  • David Schlangen (2)
  1. Freie Universität Berlin, Institut für Informatik, AG Netzbasierte Informationssysteme, Takustr. 9, 14195 Berlin, Germany
  2. Universität Potsdam, Institut für Linguistik, Angewandte Computerlinguistik, P.O. Box 601553, 14415 Potsdam, Germany
