EKAW 2014: Knowledge Engineering and Knowledge Management, pp. 563-578
Temporal Semantics: Time-Varying Hashtag Sense Clustering
Abstract
Hashtags are creative labels used in micro-blogs to characterize the topic of a message or discussion. However, since users create hashtags spontaneously and dynamically, in multiple languages, the same topic can be associated with different hashtags and, conversely, the same hashtag may denote different topics in different time spans. Unlike common words, hashtags have no sense catalogues such as Wikipedia or WordNet, and hashtag labels are often obscure, which complicates sense clustering. In this paper we propose a sense clustering algorithm based on temporal mining. First, hashtag time series are converted into strings of symbols using Symbolic Aggregate ApproXimation (SAX); then, hashtags are clustered based on string similarity and temporal co-occurrence. Evaluation is performed on two reference datasets of semantically tagged hashtags. We also evaluate the complexity of our algorithm, since efficiency is a crucial performance factor when processing large-scale data streams such as Twitter.
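As an illustration of the first step, the following is a minimal sketch of a textbook SAX conversion (z-normalization, piecewise aggregate approximation, then discretization with equiprobable Gaussian breakpoints). The abstract does not give the paper's parameters or its similarity measure, so the function names, the segment count, the alphabet, the sample counts, and the positional `match_ratio` below are all assumptions chosen for illustration, not the authors' implementation.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abc"):
    """Convert a numeric time series into a SAX string (illustrative sketch)."""
    if not 0 < n_segments <= len(series):
        raise ValueError("n_segments must be between 1 and len(series)")
    mu, sigma = mean(series), pstdev(series)
    # Z-normalize; a flat series (sigma == 0) maps to all zeros.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise aggregate approximation: mean of each contiguous chunk.
    n = len(z)
    paa = [
        mean(z[i * n // n_segments:(i + 1) * n // n_segments])
        for i in range(n_segments)
    ]
    # Breakpoints split the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # Each segment's symbol index is the number of breakpoints it exceeds.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


def match_ratio(s, t):
    """Hypothetical stand-in similarity: share of positions with equal symbols."""
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    return sum(a == b for a, b in zip(s, t)) / len(s)


# Made-up counts per time slot for two hashtags that peak together.
tag_a = [1, 1, 2, 8, 9, 2, 1, 1]
tag_b = [10, 10, 20, 80, 90, 20, 10, 10]
print(sax(tag_a, 4), sax(tag_b, 4), match_ratio(sax(tag_a, 4), sax(tag_b, 4)))
```

Because the series are z-normalized before discretization, two hashtags with the same temporal shape but different volumes map to the same string, which is what makes a string comparison usable as a proxy for temporal co-occurrence in this sketch.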
Keywords
Sense Cluster, Story Detection, Temporal Semantic, Twitter Stream, Cluster Internal Validity