Automating the Generation of Semantic Annotation Tools Using a Clustering Technique

  • Vitór Souza
  • Nicola Zeni
  • Nadzeya Kiyavitskaya
  • Periklis Andritsos
  • Luisa Mich
  • John Mylopoulos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5039)

Abstract

In order to generate semantic annotations for a collection of documents, one needs an annotation schema consisting of a semantic model (a.k.a. ontology) along with lists of linguistic indicators (keywords and patterns) for each concept in the ontology. The focus of this paper is the automatic generation of the linguistic indicators for a given semantic model and a corpus of documents. Our approach needs a small number of user-defined seeds and bootstraps itself by exploiting a novel clustering technique. The baseline for this work is the Cerno project [8] and the clustering algorithm LIMBO [2]. We also present results that compare the output of the clustering algorithm with linguistic indicators created manually for two case studies.
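The seed-and-bootstrap idea in the abstract can be illustrated with a small sketch. It is not the LIMBO algorithm [2] and not the Cerno pipeline [8]. It is a simplified stand-in that assigns each non-seed word to the concept whose seed words share the most similar distribution over documents, measured by Jensen-Shannon divergence (the kind of information-theoretic distance used in Information Bottleneck style clustering [11]). The function names, the toy corpus, and the `threshold` parameter are all assumptions made for illustration.

```python
import math


def doc_distribution(word, docs):
    """Normalized distribution of a word's occurrences over the documents."""
    counts = [doc.count(word) for doc in docs]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(docs)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def expand_seeds(docs, seeds, threshold=0.5):
    """Grow each concept's keyword list from its user-defined seeds.

    docs:  list of tokenized documents (lists of words)
    seeds: dict mapping a concept name to its seed keywords
    threshold: hypothetical cut-off; words farther than this from every
               concept profile are left unassigned
    """
    vocab = {w for doc in docs for w in doc}
    seed_words = {w for ws in seeds.values() for w in ws}

    # Concept profile: average document distribution of its seed words.
    profiles = {}
    for concept, ws in seeds.items():
        dists = [doc_distribution(w, docs) for w in ws]
        profiles[concept] = [sum(col) / len(dists) for col in zip(*dists)]

    indicators = {c: set(ws) for c, ws in seeds.items()}
    for w in sorted(vocab - seed_words):
        d = doc_distribution(w, docs)
        best = min(profiles, key=lambda c: js_divergence(d, profiles[c]))
        if js_divergence(d, profiles[best]) < threshold:
            indicators[best].add(w)
    return indicators


if __name__ == "__main__":
    docs = [
        ["hotel", "room", "price"],
        ["hotel", "room", "breakfast"],
        ["train", "ticket", "station"],
        ["train", "station", "platform"],
    ]
    seeds = {"Accommodation": ["hotel"], "Transport": ["train"]}
    for concept, words in expand_seeds(docs, seeds).items():
        print(concept, sorted(words))
```

On this toy corpus, one seed per concept is enough to pull in the remaining words of each topic. A real corpus would need stop-word filtering, pattern indicators as well as keywords, and a proper clustering step, none of which this sketch attempts.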

References

  1. Amardeilh, F.: OntoPop or how to annotate documents and populate ontologies from texts. In: Proc. of the ESWC 2006 Workshop on Mastering the Gap: From Information Extraction to Semantic Representation, Budva, Montenegro (2006)
  2. Andritsos, P., Tsaparas, P., Miller, R.J., Sevcik, K.C.: LIMBO: Scalable Clustering of Categorical Data. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 123–146. Springer, Heidelberg (2004)
  3. Breaux, T.D., Vail, M.W., Antón, A.I.: Towards regulatory compliance: Extracting rights and obligations to align requirements with regulations. In: Proc. of RE 2006, Washington, DC, USA, pp. 46–55. IEEE Computer Society, Los Alamitos (2006)
  4. Celjuska, D., Vargas-Vera, M.: Ontosophie: A Semi-Automatic System for Ontology Population from Text. In: Proc. of ICON 2004, Hyderabad, India (2004)
  5. Cimiano, P., Völker, J.: Towards Large-Scale, Open-Domain and Ontology-Based Named Entity Classification. In: Proceedings of RANLP 2005, pp. 166–172 (2005)
  6. Hearst, M.: Automated Discovery of WordNet Relations. In: Fellbaum, C. (ed.) WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
  7. Jardine, N., Sibson, R.: The construction of hierarchic and non-hierarchic classifications. The Computer Journal 11, 117–184 (1968)
  8. Kiyavitskaya, N., Zeni, N., Mich, L., Cordy, J.R., Mylopoulos, J.: Text mining through semi automatic semantic annotation. In: Reimer, U., Karagiannis, D. (eds.) PAKM 2006. LNCS (LNAI), vol. 4333, pp. 143–154. Springer, Heidelberg (2006)
  9. Kiyavitskaya, N., Zeni, N., Mich, L., Cordy, J.R., Mylopoulos, J.: Annotating Accommodation Advertisements using CERNO. In: Proc. of ENTER 2007, pp. 389–400. Springer, Wien (2007)
  10. Tanev, H., Magnini, B.: Weakly Supervised Approaches for Ontology Population. In: Proc. of EACL 2006, Trento, Italy (2006)
  11. Tishby, N., Pereira, F.C., Bialek, W.: The Information Bottleneck Method. In: 37th Annual Allerton Conf. on Communication, Control and Computing (1999)
  12. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison Wesley, Reading (1999)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Vitór Souza (1)
  • Nicola Zeni (1)
  • Nadzeya Kiyavitskaya (1)
  • Periklis Andritsos (1)
  • Luisa Mich (2)
  • John Mylopoulos (1)

  1. Dept. of Information Engineering and Computer Science, University of Trento, Italy
  2. Dept. of Computer and Management Sciences, University of Trento, Italy
