Integrated Document Browsing and Data Acquisition for Building Large Ontologies
Named entities (e.g., “Kofi Annan”, “Coca-Cola”, “Second World War”) are ubiquitous in web pages and other types of documents and often provide a simplified picture of a document’s content. We present an ontology, under development at the University of Munich (LMU), that currently contains 31,000 named entities in different languages from domains such as history, geography, politics, sports, and the arts. The underlying graph data model is simple yet extremely versatile across different application scenarios. We demonstrate a prototype of a graphical interface both to the ontology and to documents on the web or in a local document repository, with tight interaction in both directions. Occurrences of concepts from the ontology are highlighted and hyperlinked in the documents. Unrecognized entities can be added to the database and related to other concepts in a semiautomatic process. The entity database can also be used to extend full-text queries on the web or in the repository to semantically close documents, and to index different kinds of named entities in the document repository. Similar to a programming IDE, the system illustrates how integrated browsing, search, and update functionality contributes to the construction of high-quality ontologies, which are fundamental to the vision of a truly “semantic” web.
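To make the highlighting and hyperlinking step concrete, here is a minimal sketch of dictionary-based entity linking over a toy graph. It is not the system's actual implementation: the entity identifiers, the edge triple, the `/concept/` URL prefix, and the function names are all hypothetical, and plain regular-expression matching stands in for whatever matching technique the prototype really uses. The input is assumed to be plain text without markup.

```python
import re

# Hypothetical toy ontology: surface name -> concept identifier.
ENTITIES = {
    "Kofi Annan": "person/kofi_annan",
    "Coca-Cola": "company/coca_cola",
    "Second World War": "event/second_world_war",
}

# Hypothetical labeled edges of a graph data model: (source, relation, target).
EDGES = [
    ("person/kofi_annan", "secretary_general_of", "organization/united_nations"),
]


def build_matcher(names):
    """Compile one regex that matches any known entity name as a whole word."""
    # Longest names first, so a longer name wins over a shorter one it contains.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # The lookarounds reject matches embedded inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")


def link_entities(text, entities, base_url="/concept/"):
    """Wrap each recognized entity name in a hyperlink to its concept page."""
    matcher = build_matcher(entities)

    def to_link(match):
        name = match.group(0)
        return '<a href="{}{}">{}</a>'.format(base_url, entities[name], name)

    return matcher.sub(to_link, text)


def neighbors(concept_id, edges):
    """Return the (relation, target) pairs leaving a concept in the graph."""
    return [(rel, dst) for src, rel, dst in edges if src == concept_id]


if __name__ == "__main__":
    print(link_entities("Kofi Annan drank Coca-Cola.", ENTITIES))
    print(neighbors("person/kofi_annan", EDGES))
```

A name missing from `ENTITIES` is left untouched by `link_entities`; in a workflow like the one described, such a name would be a candidate for semiautomatic addition to the database.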
Keywords: Resource Description Framework, Integrate Development Environment, Local Introduction, Document Repository, Java Server Pages