World Wide Web, Volume 20, Issue 6, pp 1527–1549

Hashtag-based topic evolution in social media

  • Md. Hijbul Alam
  • Woo-Jong Ryu
  • SangKeun Lee


The rise of online social media has led to an explosion of user-generated content that carries metadata. Tracking how this metadata is distributed is essential to understanding social media. This paper presents two statistical models that detect interpretable topics over time along with their hashtag distributions. A topic is represented by a cluster of words that frequently occur together, and a context is represented by a cluster of hashtags, i.e., the hashtag distribution. The models associate a context with a related topic by jointly modeling words together with hashtags and time. Experiments on real-world datasets demonstrate that the proposed models effectively discover topics over time together with their related contexts.
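
To make the notion of a hashtag distribution per topic and time slice concrete, the following is a minimal sketch only. It is not the paper's statistical models, which infer topics jointly with hashtags and time; here the topic assignment of each post is assumed to be already known, and the function name `hashtag_distributions`, the tuple layout, and the toy posts are all hypothetical.

```python
from collections import Counter, defaultdict


def hashtag_distributions(posts):
    """Estimate an empirical hashtag distribution per (time slice, topic).

    posts: iterable of (time_slice, topic_id, hashtags) tuples, where the
    topic_id is ASSUMED to be given (the paper's models infer it instead).
    Returns {(time_slice, topic_id): {hashtag: probability}}.
    """
    counts = defaultdict(Counter)
    for time_slice, topic_id, hashtags in posts:
        # Lowercase so that "#WorldCup" and "#worldcup" are counted together.
        counts[(time_slice, topic_id)].update(h.lower() for h in hashtags)

    dists = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        if total:  # skip (time, topic) pairs that carry no hashtags at all
            dists[key] = {tag: n / total for tag, n in counter.items()}
    return dists


# Hypothetical toy data, invented purely for illustration.
posts = [
    ("2014-06", 0, ["#worldcup", "#brazil"]),
    ("2014-06", 0, ["#WorldCup"]),
    ("2014-07", 0, ["#worldcup", "#final"]),
    ("2014-07", 1, ["#election"]),
]
dists = hashtag_distributions(posts)
for key in sorted(dists):
    print(key, dists[key])
```

Comparing the distributions of the same topic across consecutive time slices gives a crude view of how its context shifts. A full model would estimate these quantities jointly with the word distributions rather than by counting over fixed assignments.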


Topic evolution, Hashtag distribution, Topic model, Social media



This research was supported by the Basic Science Research Program and the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (numbers 2015R1A2A1A10052665, 2015R1A2A1A15052701 and 2012M3C4A7033344).



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Department of Computer Science, Korea University, Seoul, Republic of Korea
