Using Structural Topic Modeling to Detect Events and Cluster Twitter Users in the Ukrainian Crisis

  • Alan Mishler
  • Erin Smith Crabb
  • Susannah Paletz
  • Brook Hefright
  • Ewa Golonka
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 528)


Structural topic modeling (STM) is a recently introduced technique for modeling how the content of a collection of documents changes as a function of variables such as author identity or time of writing. We present two proof-of-concept applications of STM using Russian social media data. In our first study, we model how topics change over time, showing that STM can be used to detect significant events such as the downing of Malaysia Airlines Flight 17. In our second study, we model how topical content varies across a set of authors, showing that STM can be used to cluster Twitter users by whether they are sympathetic to Ukraine or to Russia, as well as to cluster accounts that are suspected to belong to the same individual (so-called “sockpuppets”). Structural topic modeling shows promise as a tool for analyzing social media data, a domain that has been largely ignored in the topic modeling literature.
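As a rough illustration of the event-detection idea in the first study, the sketch below flags days on which a single topic's prevalence jumps well above its recent level. It is not the procedure used in the paper: it assumes the daily topic proportions have already been estimated elsewhere (for example, by a fitted topic model), and the function name `flag_spikes`, the window and threshold values, and the numbers in `prevalence` are all invented for illustration.

```python
from statistics import mean, stdev


def flag_spikes(daily_prevalence, window=5, threshold=3.0):
    """Return indices of days whose prevalence exceeds the trailing-window
    mean by more than `threshold` standard deviations."""
    spikes = []
    for day in range(window, len(daily_prevalence)):
        history = daily_prevalence[day - window:day]
        mu, sigma = mean(history), stdev(history)
        # Skip flat histories, where the standard deviation is zero.
        if sigma > 0 and (daily_prevalence[day] - mu) / sigma > threshold:
            spikes.append(day)
    return spikes


# Hypothetical daily proportions for one topic (invented numbers, not data
# from the paper). The jump at index 7 stands in for a sudden event.
prevalence = [0.05, 0.06, 0.05, 0.04, 0.06, 0.05, 0.05, 0.31, 0.28, 0.12]
spike_days = flag_spikes(prevalence)
```

A trailing window is used so that each day is compared only with what came before it. One consequence is that the days immediately after a jump are judged against an inflated baseline and are less likely to be flagged.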


Structural topic modeling · Event detection · Authorship attribution · Public opinion measurement · Social media



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Alan Mishler (1)
  • Erin Smith Crabb (1)
  • Susannah Paletz (1)
  • Brook Hefright (1)
  • Ewa Golonka (1)
  1. University of Maryland, College Park, USA
