Instance-Based Mapping between Thesauri and Folksonomies

  • Christian Wartena
  • Rogier Brussee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5318)


The emergence of web-based systems in which users can annotate items raises the question of semantic interoperability between vocabularies that originate from collaborative annotation processes, often called folksonomies, and keywords assigned in a more traditional way. If collections are annotated according to two systems, e.g. with tags and keywords, the annotated data can be used for instance-based mapping between the vocabularies. The basis for this kind of matching is an appropriate similarity measure between concepts, based on their distribution as annotations. In this paper we propose a new similarity measure that can take advantage of some special properties of user-generated metadata. We have evaluated this measure with a set of articles from Wikipedia that are both classified according to the topic structure of Wikipedia and annotated by users of a bookmarking service. The results obtained with the new measure are significantly better than those obtained with standard similarity measures proposed for this task in the literature, i.e., the new measure correlates better with human judgments. We argue that the measure also has benefits for instance-based mapping of more traditionally developed vocabularies.
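
To make the idea of instance-based mapping concrete, the following is a minimal sketch of matching tags to keywords by the overlap of the items they annotate. It uses the Jaccard coefficient, a standard set-overlap measure, and is not the similarity measure proposed in the paper. The function names and the toy data are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def instances_by_concept(annotations):
    """Invert {item: set of concepts} into {concept: set of items}."""
    index = defaultdict(set)
    for item, concepts in annotations.items():
        for concept in concepts:
            index[concept].add(item)
    return index


def jaccard(a, b):
    """Jaccard coefficient of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(tag_annotations, keyword_annotations):
    """For each tag, return the keyword whose annotated items overlap most."""
    tags = instances_by_concept(tag_annotations)
    keywords = instances_by_concept(keyword_annotations)
    matches = {}
    for tag, tag_items in tags.items():
        scored = [(jaccard(tag_items, kw_items), kw)
                  for kw, kw_items in keywords.items()]
        if scored:
            score, kw = max(scored)
            matches[tag] = (kw, score)
    return matches


# Hypothetical toy collection annotated in two systems:
# user tags (folksonomy) and editor-assigned keywords.
tag_annotations = {
    "doc1": {"python", "code"},
    "doc2": {"python"},
    "doc3": {"cooking"},
}
keyword_annotations = {
    "doc1": {"Programming"},
    "doc2": {"Programming"},
    "doc3": {"Food"},
}

matches = best_matches(tag_annotations, keyword_annotations)
```

In this sketch each concept is represented only by the set of items it annotates, so a measure of this kind ignores how often a tag was assigned to an item by different users.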


Keywords (machine-generated, not assigned by the authors): Dissimilarity Measure, Formal Concept Analysis, Annotate Data, Ontology Match, Video Fragment



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Christian Wartena (1)
  • Rogier Brussee (1)
  1. Telematica Instituut, Enschede, The Netherlands
