Integrated Document Browsing and Data Acquisition for Building Large Ontologies

  • Felix Weigel
  • Klaus U. Schulz
  • Levin Brunner
  • Eduardo Torres-Schumann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4253)


Named entities (e.g., “Kofi Annan”, “Coca-Cola”, “Second World War”) are ubiquitous in web pages and other types of documents and often provide a simplified picture of a document’s content. We present an ontology currently containing 31,000 named entities in different languages from various domains such as history, geography, politics, sports, and the arts, which is being developed at the University of Munich (LMU). The underlying graph data model is simple yet versatile enough to serve different application scenarios. We demonstrate a prototype of a graphical interface to both the ontology and documents on the web or in a local document repository, with tight interaction in both directions. Occurrences of concepts from the ontology are highlighted and hyperlinked in the documents. Unrecognized entities can be added to the database and related to other concepts in a semiautomatic process. The entity database can also be used for extending full-text queries on the web or the repository to semantically close documents, and for indexing different kinds of named entities in the document repository. Much like a programming IDE, the system illustrates how integrated browsing, search and update functionality contributes to the construction of high-quality ontologies, which are fundamental to the vision of a truly “semantic” web.
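
The highlighting and hyperlinking of known entities described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary `ENTITIES`, the function `link_entities`, the concept identifiers and the `https://example.org/concept/` base URL are all hypothetical, and a plain regular expression stands in for whatever matching technique the actual system uses.

```python
import re
from html import escape

# Hypothetical toy entity database: surface form -> concept identifier.
ENTITIES = {
    "Kofi Annan": "person/kofi_annan",
    "Coca-Cola": "brand/coca_cola",
    "Second World War": "event/second_world_war",
}


def link_entities(text, entities, base_url="https://example.org/concept/"):
    """Return HTML in which every known entity occurrence is hyperlinked."""
    if not entities:
        return escape(text)
    # Try longer names first so that a long name wins over a shorter prefix.
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    parts, pos = [], 0
    for match in pattern.finditer(text):
        # Copy the unmatched text before this occurrence, HTML-escaped.
        parts.append(escape(text[pos:match.start()]))
        name = match.group(0)
        parts.append(
            '<a href="{}{}">{}</a>'.format(base_url, entities[name], escape(name))
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(link_entities("Kofi Annan spoke about the Second World War.", ENTITIES))
```

Text that contains no known entity passes through unchanged (apart from HTML escaping); in the system described here, such unrecognized names are the candidates for the semiautomatic addition to the database.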


Keywords: Resource Description Framework; Integrate Development Environment; Local Introduction; Document Repository; Java Server Pages
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Felix Weigel (1)
  • Klaus U. Schulz (1)
  • Levin Brunner (1)
  • Eduardo Torres-Schumann (1)
  1. Centre for Information and Language Processing (CIS), University of Munich (LMU), Germany
