Semantic grounding of social annotations for enhancing resource classification in folksonomies

Tommasel, Antonela; Godoy, Daniela

doi:10.1007/s10844-014-0339-y

Published: 04 November 2014

Volume 44, pages 415–446, (2015)

Journal of Intelligent Information Systems

Antonela Tommasel¹ &
Daniela Godoy¹

Abstract

User-generated annotations on tagging and bookmarking sites such as Flickr and Delicious are a promising source of information for tasks such as Web resource classification. However, the use of tags raises several challenges. Because there are no constraints on the terms that can be used for tagging, users introduce noise and ambiguity when they annotate resources. Moreover, traditional bag-of-words representations ignore connections between terms and are therefore affected by synonymy and hyponymy. Although tag-based representations are a valuable source for classifying resources, the problems associated with the unsupervised nature of tags may hinder classification results. This paper presents an approach for semantically analysing social annotations in order to obtain enriched concept-based representations of Web resources. Representations are enriched with concepts extracted from WordNet and Wikipedia, both to overcome problems caused by natural language and to improve the quality of the information available for classifying resources effectively. Several strategies for tag pre-processing, concept disambiguation and the incorporation of semantic entities into representations are discussed and evaluated. Experimental results show that the proposed strategies for associating tags with conceptual entities improve resource classification, outperforming traditional approaches based on bag-of-words representations.
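As a rough illustration of the idea behind concept-based representations, the following Python sketch normalises tags, maps them to concepts, and uses the other tags on the same resource to disambiguate. It is not the authors' implementation: the lexicon `CONCEPTS`, the links `RELATED`, and all function names are hypothetical stand-ins for the WordNet and Wikipedia lookups mentioned in the abstract, and the overlap-based disambiguation is a simplification chosen for brevity.

```python
from collections import Counter

# Hypothetical toy lexicon: normalised tag -> candidate concepts.
# The abstract names WordNet and Wikipedia as concept sources; these
# entries are invented purely for illustration.
CONCEPTS = {
    "car": ["concept:automobile"],
    "auto": ["concept:automobile"],
    "automobile": ["concept:automobile"],
    "jaguar": ["concept:jaguar_animal", "concept:jaguar_car"],
    "cat": ["concept:feline"],
    "wildlife": ["concept:animal"],
}

# Hypothetical relatedness links used to disambiguate a tag by its context.
RELATED = {
    "concept:jaguar_animal": {"concept:feline", "concept:animal"},
    "concept:jaguar_car": {"concept:automobile"},
}


def normalise(tag):
    """Lower-case a tag and strip whitespace and a few punctuation marks."""
    return tag.strip().lower().strip("#_-.")


def disambiguate(candidates, context):
    """Pick the candidate that shares the most related concepts with the context."""
    return max(candidates, key=lambda c: len(RELATED.get(c, set()) & context))


def concept_representation(tags):
    """Map a resource's tags to a bag of concepts; unknown tags are kept as tags."""
    cleaned = [normalise(t) for t in tags]
    # Context: concepts of every unambiguous tag attached to the same resource.
    context = {CONCEPTS[t][0] for t in cleaned if len(CONCEPTS.get(t, [])) == 1}
    bag = Counter()
    for t in cleaned:
        candidates = CONCEPTS.get(t)
        if not candidates:
            bag["tag:" + t] += 1  # no concept found, fall back to the raw tag
        else:
            bag[disambiguate(candidates, context)] += 1
    return bag
```

In this sketch the synonymous tags "car", "auto" and "automobile" collapse into a single concept feature, whereas a plain bag-of-words representation would treat them as three unrelated terms. The ambiguous tag "jaguar" resolves differently depending on whether it co-occurs with "cat" or with "car".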

Acknowledgments

This work has been partially funded by ANPCyT (Argentina) under grant PICT-2011-0366 and by CONICET (Argentina) under grant PIP No. 112-201201-00185.

Author information

Authors and Affiliations

ISISTAN Research Institute, CONICET-UNCPBA, Paraje Arroyo Seco, Campus Universitario, Tandil, Buenos Aires, Argentina
Antonela Tommasel & Daniela Godoy

Corresponding author

Correspondence to Antonela Tommasel.

About this article

Cite this article

Tommasel, A., Godoy, D. Semantic grounding of social annotations for enhancing resource classification in folksonomies. J Intell Inf Syst 44, 415–446 (2015). https://doi.org/10.1007/s10844-014-0339-y

Received: 28 December 2013
Revised: 03 October 2014
Accepted: 07 October 2014
Published: 04 November 2014
Issue Date: June 2015
DOI: https://doi.org/10.1007/s10844-014-0339-y
