Social Network Analysis and Mining, Volume 3, Issue 4, pp 889–898

On the impact of text similarity functions on hashtag recommendations in microblogging environments

  • Eva Zangerle
  • Wolfgang Gassler
  • Günther Specht
Original Article


Microblogging applications such as Twitter are experiencing tremendous success. Microblog users utilize hashtags to categorize posted messages, aiming to bring order to the myriad of microblog messages. However, the percentage of messages that contain hashtags is small, and the hashtags used are very heterogeneous, as hashtags may be chosen freely and may consist of any arbitrary combination of characters. This heterogeneity and the sparse use of hashtags significantly hamper search, as messages are not categorized in a homogeneous way. In this paper, we present an approach for recommending hashtags suitable for the message a user is currently entering, with the aim of creating a more homogeneous set of hashtags. Furthermore, we present a detailed study of how the similarity measures used to compute recommendations influence the final set of recommended hashtags.
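The abstract does not spell out how recommendations are computed. As a hedged illustration only, the following Python sketch assumes one plausible content-based pipeline built from the measures named in the keywords: represent each stored tweet as a TF-IDF vector, retrieve the `k` tweets most cosine-similar to the message being typed, and rank the hashtags those tweets carry by summed similarity. The function names (`split_tweet`, `tfidf`, `cosine`, `recommend_hashtags`), the parameters `k` and `n`, the scoring rule, and the toy corpus are all hypothetical and are not the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict


def split_tweet(text):
    """Split a tweet into lowercased word tokens and lowercased hashtags (without '#')."""
    hashtags = [h.lower() for h in re.findall(r"#(\w+)", text)]
    words = [w.lower() for w in re.findall(r"\w+", re.sub(r"#\w+", " ", text))]
    return words, hashtags


def tfidf(words, idf):
    """Raw term frequency times IDF; words never seen in the corpus are dropped."""
    tf = Counter(words)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_hashtags(message, corpus, k=10, n=5):
    """Hypothetical recommender: rank the hashtags of the k most similar tweets."""
    parsed = [split_tweet(t) for t in corpus]
    # Document frequency and unsmoothed IDF, log(N / df): a word that
    # occurs in every tweet gets weight 0.
    df = Counter()
    for words, _ in parsed:
        df.update(set(words))
    idf = {t: math.log(len(parsed) / c) for t, c in df.items()}

    query = tfidf(split_tweet(message)[0], idf)
    sims = sorted(
        ((cosine(query, tfidf(words, idf)), i) for i, (words, _) in enumerate(parsed)),
        reverse=True,
    )[:k]

    # Each hashtag scores the sum of similarities of the retrieved tweets carrying it.
    scores = defaultdict(float)
    for sim, i in sims:
        if sim <= 0:
            continue
        for tag in parsed[i][1]:
            scores[tag] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:n]]


corpus = [
    "New python release is out #python #programming",
    "Learning python decorators today #python",
    "Great football match tonight #soccer",
    "Rust borrow checker explained #rust #programming",
]
print(recommend_hashtags("any tips on python decorators?", corpus))  # → ['python', 'programming']
```

Summing similarities rewards hashtags that appear in several close matches. Replacing `cosine` with a different similarity function changes which tweets are retrieved and therefore the final hashtag set, which is the kind of effect the abstract says the paper studies.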


Keywords: Recommender System, Cosine Similarity, Inverse Document Frequency, Twitter User, Levenshtein Distance
These keywords were added by machine and not by the authors.
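The keywords also list Levenshtein distance, a character-level edit distance that, unlike a bag-of-words measure, reacts to spelling variants such as the freely chosen, heterogeneous hashtags the abstract describes. The sketch below is the standard two-row dynamic program for edit distance; the helper `levenshtein_similarity` and its normalization by the longer string's length are assumptions for illustration and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (Wagner-Fischer DP, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row over the shorter string
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; dividing by the longer length is an assumed normalization."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("worldcup", "worldcup2010"))  # → 4
print(levenshtein_similarity("kitten", "sitting"))
```

Because it compares characters rather than words, this measure treats two hashtag variants that differ by a suffix as close, whereas a TF-IDF cosine over whole tokens would treat them as unrelated terms.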



Copyright information

© Springer-Verlag Wien 2013

Authors and Affiliations

  • Eva Zangerle (1)
  • Wolfgang Gassler (1)
  • Günther Specht (1)

  1. Databases and Information Systems, Institute of Computer Science, University of Innsbruck, Innsbruck, Austria
