Temporal Semantics: Time-Varying Hashtag Sense Clustering

  • Giovanni Stilo
  • Paola Velardi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8876)

Abstract

Hashtags are creative labels used in micro-blogs to characterize the topic of a message or discussion. However, since users create hashtags spontaneously, in a highly dynamic way, and in multiple languages, the same topic can be associated with different hashtags and, conversely, the same hashtag may refer to different topics in different time spans. Unlike common words, hashtags are hard to cluster by sense because no sense catalogues, such as Wikipedia or WordNet, are available for them, and hashtag labels are often obscure. In this paper we propose a sense clustering algorithm based on temporal mining. First, hashtag time series are converted into strings of symbols using Symbolic Aggregate ApproXimation (SAX); then, hashtags are clustered based on string similarity and temporal co-occurrence. Evaluation is performed on two reference datasets of semantically tagged hashtags. We also evaluate the complexity of our algorithm, since efficiency is a crucial performance factor when processing large-scale data streams such as Twitter.
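
To make the first step of the abstract concrete, the following is a minimal Python sketch of the standard SAX transform (z-normalisation, piecewise aggregate approximation, Gaussian breakpoints). It is an illustration under assumptions, not the authors' implementation: the function names `sax` and `same_symbol_ratio`, the alphabet, the segment count, and the position-wise similarity measure are all hypothetical choices made for this example, and the abstract does not specify the similarity function or parameters actually used.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abc"):
    """Convert a numeric time series into a SAX string.

    Standard SAX only; the parameters are illustrative assumptions,
    not values taken from the paper.
    """
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a flat series (sigma == 0) maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each equal-width segment
    # (assumes len(series) is divisible by n_segments)
    width = len(z) // n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]
    # Breakpoints split the standard normal into equiprobable regions
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # Each segment mean is mapped to the symbol of the region it falls in
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


def same_symbol_ratio(s1, s2):
    """Fraction of positions where two equal-length SAX strings agree.

    Hypothetical stand-in for the paper's string similarity.
    """
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)


# Hypothetical hourly counts for three hashtags
tag_a = sax([0, 0, 0, 0, 10, 10, 0, 0], n_segments=4, alphabet="ab")
tag_b = sax([1, 1, 1, 1, 9, 9, 1, 1], n_segments=4, alphabet="ab")
tag_c = sax([10, 10, 0, 0, 0, 0, 0, 0], n_segments=4, alphabet="ab")
print(tag_a, tag_b, tag_c)
print(same_symbol_ratio(tag_a, tag_b), same_symbol_ratio(tag_a, tag_c))
```

In this sketch, two hashtags whose usage peaks in the same time window yield identical strings even when their raw counts differ, because z-normalisation removes scale and offset before symbols are assigned. A hashtag that peaks in a different window yields a different string and a lower similarity score.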

Keywords

Sense Cluster; Story Detection; Temporal Semantic; Twitter Stream; Cluster Internal Validity
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Giovanni Stilo (1)
  • Paola Velardi (1)
  1. Dipartimento di Informatica, Roma, Italy
