Automating the Generation of Semantic Annotation Tools Using a Clustering Technique

  • Conference paper
Natural Language and Information Systems (NLDB 2008)

Abstract

In order to generate semantic annotations for a collection of documents, one needs an annotation schema consisting of a semantic model (a.k.a. ontology) along with lists of linguistic indicators (keywords and patterns) for each concept in the ontology. The focus of this paper is the automatic generation of the linguistic indicators for a given semantic model and a corpus of documents. Our approach needs a small number of user-defined seeds and bootstraps itself by exploiting a novel clustering technique. The baseline for this work is the Cerno project [8] and the clustering algorithm LIMBO [2]. We also present results that compare the output of the clustering algorithm with linguistic indicators created manually for two case studies.

This work has been partially funded by the EU Commission through the SERENITY and WEE-NET projects and by Provincia Autonoma di Trento through the STAMPS project.

References

  1. Amardeilh, F.: OntoPop or how to annotate documents and populate ontologies from texts. In: Proc. of the ESWC 2006 Workshop on Mastering the Gap: From Information Extraction to Semantic Representation, Budva, Montenegro (2006)

  2. Andritsos, P., Tsaparas, P., Miller, R.J., Sevcik, K.C.: LIMBO: Scalable Clustering of Categorical Data. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 123–146. Springer, Heidelberg (2004)

  3. Breaux, T.D., Vail, M.W., Antón, A.I.: Towards regulatory compliance: Extracting rights and obligations to align requirements with regulations. In: Proc. of RE 2006, Washington, DC, USA, pp. 46–55. IEEE Computer Society, Los Alamitos (2006)

  4. Celjuska, D., Vargas-Vera, M.: Ontosophie: A Semi-Automatic System for Ontology Population from Text. In: Proc. of ICON 2004, Hyderabad, India (2004)

  5. Cimiano, P., Völker, J.: Towards Large-Scale, Open-Domain and Ontology-Based Named Entity Classification. In: Proceedings of RANLP 2005, pp. 166–172 (2005)

  6. Hearst, M.: Automated Discovery of WordNet Relations. In: Fellbaum, C. (ed.) WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)

  7. Jardine, N., Sibson, R.: The construction of hierarchic and non-hierarchic classifications. The Computer Journal 11, 177–184 (1968)

  8. Kiyavitskaya, N., Zeni, N., Mich, L., Cordy, J.R., Mylopoulos, J.: Text mining through semi automatic semantic annotation. In: Reimer, U., Karagiannis, D. (eds.) PAKM 2006. LNCS (LNAI), vol. 4333, pp. 143–154. Springer, Heidelberg (2006)

  9. Kiyavitskaya, N., Zeni, N., Mich, L., Cordy, J.R., Mylopoulos, J.: Annotating Accommodation Advertisements using CERNO. In: Proc. of ENTER 2007, pp. 389–400. Springer, Wien (2007)

  10. Tanev, H., Magnini, B.: Weakly Supervised Approaches for Ontology Population. In: Proc. of EACL 2006, Trento, Italy (2006)

  11. Tishby, N., Pereira, F.C., Bialek, W.: The Information Bottleneck Method. In: 37th Annual Allerton Conf. on Communication, Control and Computing (1999)

  12. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison Wesley, Reading (1999)

Editor information

Epaminondas Kapetanios Vijayan Sugumaran Myra Spiliopoulou

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Souza, V., Zeni, N., Kiyavitskaya, N., Andritsos, P., Mich, L., Mylopoulos, J. (2008). Automating the Generation of Semantic Annotation Tools Using a Clustering Technique. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds) Natural Language and Information Systems. NLDB 2008. Lecture Notes in Computer Science, vol 5039. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69858-6_10

  • DOI: https://doi.org/10.1007/978-3-540-69858-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69857-9

  • Online ISBN: 978-3-540-69858-6

  • eBook Packages: Computer Science, Computer Science (R0)
