Semantic grounding of social annotations for enhancing resource classification in folksonomies

Journal of Intelligent Information Systems

Abstract

User-generated annotations in tagging or bookmarking sites such as Flickr or Delicious are a promising source of information for tasks such as Web resource classification. However, the use of tags raises some challenges. Since there are no constraints on the terms that can be used for tagging, noise and ambiguity are introduced when users annotate resources. Moreover, traditional bag-of-words representations ignore connections between terms and, thus, are affected by synonymy and hyponymy. Although tag-based representations are a valuable source for classifying resources, the problems associated with the uncontrolled nature of tags may hinder classification results. This paper presents an approach for semantically analysing social annotations in order to obtain enriched concept-based representations of Web resources. Representations are enriched with concepts extracted from WordNet and Wikipedia to overcome problems caused by natural language and to enhance the quality of the information available for classifying resources effectively. Several strategies for tag pre-processing, concept disambiguation and incorporation of semantic entities into representations are discussed and evaluated. Experimental results showed that the proposed strategies for associating tags with conceptual entities improve resource classification results, outperforming traditional approaches based on bag-of-words representations.
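The contrast between a plain bag-of-tags representation and a concept-enriched one can be illustrated with a minimal Python sketch. It is not the method evaluated in the paper: the lexicon `TOY_CONCEPTS`, the `concept:` identifiers and the function names are hypothetical stand-ins for WordNet or Wikipedia lookups, and no disambiguation step is performed.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only): each tag
# maps to one concept identifier, standing in for a WordNet/Wikipedia lookup.
TOY_CONCEPTS = {
    "car": "concept:automobile",
    "auto": "concept:automobile",
    "automobile": "concept:automobile",
    "photo": "concept:photograph",
    "picture": "concept:photograph",
}


def bag_of_tags(tags):
    """Count lower-cased, whitespace-trimmed tags; empty tags are skipped."""
    return Counter(t.strip().lower() for t in tags if t.strip())


def enrich_with_concepts(tags, lexicon=TOY_CONCEPTS):
    """Return the tag counts plus one concept feature per tag found in the lexicon."""
    features = bag_of_tags(tags)
    # Iterate over a snapshot, since concept features are added while looping.
    for tag, count in list(features.items()):
        concept = lexicon.get(tag)
        if concept is not None:
            features[concept] += count
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two resources annotated with synonymous but non-identical tags.
resource_a = ["Car", "photo"]
resource_b = ["auto", "picture"]

plain_similarity = cosine(bag_of_tags(resource_a), bag_of_tags(resource_b))
enriched_similarity = cosine(
    enrich_with_concepts(resource_a), enrich_with_concepts(resource_b)
)
```

In this toy setting the two resources share no tags, so the plain representation gives them no overlap, whereas the shared concept features make them comparable. A realistic pipeline would also have to choose among several candidate senses for each tag, which this sketch deliberately leaves out.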

Notes

  1. http://delicious.com

  2. http://www.flickr.com/

  3. http://www.wikipedia.org/

  4. http://WordNet.princeton.edu/

  5. http://odur.let.rug.nl/vannoord/TextCat/

  6. http://nlp.uned.es/social-tagging/socialodp2k9/

  7. http://www.stumbleupon.com/

  8. http://lyle.smu.edu/~tspell/jaws/index.html/

  9. http://wikipedia-miner.cms.waikato.ac.nz/

  10. http://www.cs.waikato.ac.nz/ml/weka/

Acknowledgments

This work has been partially funded by ANPCyT (Argentina) under grant PICT-2011-0366 and by CONICET (Argentina) under grant PIP No. 112-201201-00185.

Author information

Corresponding author

Correspondence to Antonela Tommasel.

About this article

Cite this article

Tommasel, A., Godoy, D. Semantic grounding of social annotations for enhancing resource classification in folksonomies. J Intell Inf Syst 44, 415–446 (2015). https://doi.org/10.1007/s10844-014-0339-y

  • DOI: https://doi.org/10.1007/s10844-014-0339-y
