Artificial Intelligence Review, Volume 40, Issue 2, pp 147–174

Sense induction in folksonomies: a review

  • Pierre Andrews
  • Juan Pane


Folksonomies, often known as tagging systems, such as the ones used on the popular Delicious or Flickr websites, use a very simple Knowledge Organisation System. Users have thus been quick to adopt this system and create extensive annotations on the Web. However, because of the simplicity of the folksonomy model, the semantics of the tags are not explicit and can only be inferred from their context of use. This is a barrier to the automatic use of such Knowledge Organisation Systems by computers, and new techniques have been developed to extract the semantics of the tags. In this article we discuss the drawbacks of some of these approaches and propose a generalization of the different approaches to detecting new senses of terms in a folksonomy. Another weak point of the current state of the art in the field is the lack of a formal evaluation methodology; we thus propose a novel evaluation framework. We introduce a dataset and evaluation methodology that enable the comparison of results between different approaches to sense induction in folksonomies. Finally, we discuss the performance of different approaches to the tasks of homonymous/polysemous tag detection and synonymous tag identification.


Folksonomy · Word sense disambiguation · Word sense induction · Tag semantic



This work has been supported by the INSEMTIVES project (FP7-231181). The authors would like to thank Ilya Zaihrayeu for his valuable contributions and feedback on this work. The dataset annotation system and evaluation framework code are freely available.



Copyright information

© Springer Science+Business Media Dordrecht 2012

Authors and Affiliations

  1. Departamento de Ciência da Computação, Universidade de São Paulo, São Paulo, Brazil
  2. Dipartimento di Ingegneria e Scienza dell’Informazione, The University of Trento, Trento, Italy
  3. Universidad Católica Nuestra Señora de la Asunción, DEI, Asunción, Paraguay
