Semantic Annotation, Indexing, and Retrieval

  • Atanas Kiryakov
  • Borislav Popov
  • Damyan Ognyanoff
  • Dimitar Manov
  • Angel Kirilov
  • Miroslav Goranov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2870)


The realization of the Semantic Web depends on the availability of a critical mass of metadata for web content, linked to formal knowledge about the world. This paper presents our vision of a holistic system that allows annotation, indexing, and retrieval of documents with respect to real-world entities. A system called KIM, which partially implements this concept, is briefly presented and used for evaluation and demonstration.

Our view is that a system for semantic annotation should be based on specific knowledge about the world, rather than being indifferent to ontological commitments and general knowledge. To ensure the efficiency and reusability of the metadata, we introduce a simple upper-level ontology that starts with some basic philosophical distinctions and goes down to the most popular entity types (people, companies, cities, etc.), thus providing many of the common-sense concepts shared across domains and allowing easy domain-specific extensions. Based on this ontology, an extensive knowledge base of entity descriptions is maintained.
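The ontology-plus-knowledge-base design described above can be pictured with a small Python sketch. It assumes a single-inheritance class hierarchy stored as child-to-parent links and a flat list of entity descriptions. The class names (`Entity`, `Agent`, `Company`, `Bank`, ...), the `kb:` URIs, and the helper functions are all hypothetical illustrations, not the paper's actual ontology or KIM's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy upper-level ontology (child class -> parent class).
# The top levels encode basic distinctions; the leaves are common entity types.
# All class names are illustrative assumptions, not the paper's ontology.
SUBCLASS_OF: dict[str, str | None] = {
    "Entity": None,
    "Abstract": "Entity",
    "Happening": "Entity",
    "Object": "Entity",
    "Agent": "Object",
    "Person": "Agent",
    "Organization": "Agent",
    "Company": "Organization",
    "Location": "Object",
    "City": "Location",
}


def superclasses(cls: str) -> list[str]:
    """Return cls followed by all of its ancestors, up to the root."""
    chain: list[str] = []
    current: str | None = cls
    while current is not None:
        chain.append(current)
        current = SUBCLASS_OF[current]
    return chain


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    return ancestor in superclasses(cls)


@dataclass
class EntityDescription:
    """One knowledge-base entry: an instance URI, its class, and its names."""
    uri: str
    cls: str
    aliases: list[str] = field(default_factory=list)


# Hypothetical knowledge-base entries; the URIs are made up for illustration.
KB = [
    EntityDescription("kb:Sofia", "City", ["Sofia"]),
    EntityDescription("kb:Ontotext", "Company", ["Ontotext", "Ontotext Lab"]),
]


def instances_of(cls: str) -> list[str]:
    """All KB instances whose class is cls or one of its subclasses."""
    return [e.uri for e in KB if is_a(e.cls, cls)]


# A domain-specific extension only needs one link under an existing class.
SUBCLASS_OF["Bank"] = "Company"
```

Here `instances_of("Object")` returns both entries, because `City` and `Company` are each indirectly subclasses of `Object`. The last line shows why a shared upper level eases extension: a new domain class such as `Bank` inherits every upper-level distinction through a single link.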

We present a semantically enhanced information extraction system that provides automatic annotation with references to classes in the ontology and instances in the knowledge base. Based on these annotations, we perform IR-like indexing and retrieval, further extended by using the ontology and the knowledge about specific entities.
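The annotate, index, and retrieve pipeline described above can be sketched as follows, under simplifying assumptions: a naive exact-match gazetteer stands in for the real information extraction step, the index is an in-memory inverted index keyed by entity URI rather than by word, and the hierarchy, gazetteer entries, and all names (`Annotation`, `SemanticIndex`, `annotate`, `docs_mentioning_any`) are hypothetical rather than KIM's API.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Hypothetical toy class hierarchy (child class -> parent class).
SUBCLASS_OF: dict[str, str | None] = {
    "Entity": None,
    "Location": "Entity",
    "City": "Location",
    "Country": "Location",
    "Agent": "Entity",
    "Organization": "Agent",
    "Company": "Organization",
}

# Hypothetical gazetteer derived from a knowledge base: alias -> (class, instance URI).
GAZETTEER: dict[str, tuple[str, str]] = {
    "Sofia": ("City", "kb:Sofia"),
    "Bulgaria": ("Country", "kb:Bulgaria"),
    "Ontotext": ("Company", "kb:Ontotext"),
}


@dataclass(frozen=True)
class Annotation:
    """A text span linked to an ontology class and a knowledge-base instance."""
    doc_id: str
    start: int
    end: int
    cls: str
    instance: str


def is_a(cls: str | None, ancestor: str) -> bool:
    """Walk up the hierarchy from cls, looking for ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF[cls]
    return False


def annotate(doc_id: str, text: str) -> list[Annotation]:
    """Naive exact-match lookup of every gazetteer alias in the text."""
    anns: list[Annotation] = []
    for alias, (cls, inst) in GAZETTEER.items():
        start = text.find(alias)
        while start != -1:
            anns.append(Annotation(doc_id, start, start + len(alias), cls, inst))
            start = text.find(alias, start + len(alias))
    return anns


class SemanticIndex:
    """Inverted index from entity instances (not words) to documents."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[str]] = defaultdict(set)
        self.instance_class: dict[str, str] = {}

    def add(self, ann: Annotation) -> None:
        self.postings[ann.instance].add(ann.doc_id)
        self.instance_class[ann.instance] = ann.cls

    def docs_mentioning(self, instance: str) -> set[str]:
        """Documents that mention one specific entity."""
        return set(self.postings.get(instance, set()))

    def docs_mentioning_any(self, cls: str) -> set[str]:
        """Documents that mention any instance of cls or of its subclasses."""
        result: set[str] = set()
        for inst, inst_cls in self.instance_class.items():
            if is_a(inst_cls, cls):
                result |= self.postings[inst]
        return result


if __name__ == "__main__":
    docs = {
        "d1": "Ontotext is based in Sofia.",
        "d2": "Sofia is the capital of Bulgaria.",
        "d3": "The weather was mild.",
    }
    index = SemanticIndex()
    for doc_id, text in docs.items():
        for ann in annotate(doc_id, text):
            index.add(ann)
    print(sorted(index.docs_mentioning("kb:Sofia")))          # ['d1', 'd2']
    print(sorted(index.docs_mentioning_any("Location")))      # ['d1', 'd2']
    print(sorted(index.docs_mentioning_any("Organization")))  # ['d1']
```

The class-level query illustrates the ontology-based extension of plain IR: a search for `Location` finds the documents that mention Sofia or Bulgaria, although none of them contains the word "location".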


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Affiliation

  • Ontotext Lab, Sirma AI EOOD, Sofia, Bulgaria (all authors)