Tracking Dengue Epidemics Using Twitter Content Classification and Topic Modelling

  • Paolo Missier (email author)
  • Alexander Romanovsky
  • Tudor Miu
  • Atinder Pal
  • Michael Daniilakis
  • Alessandro Garcia
  • Diego Cedrim
  • Leonardo da Silva Sousa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9881)


Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue and Zika in Brazil and other tropical regions has long been a priority for governments in affected areas. Streaming social media content, such as Twitter, is increasingly being used for health surveillance applications such as flu detection. However, previous work has not addressed the complexity of drastic seasonal changes in Twitter content across multiple epidemic outbreaks. To address this gap, this paper contrasts two complementary approaches to detecting Twitter content that is relevant for Dengue outbreak detection: supervised classification, and unsupervised clustering using topic modelling. Each approach has benefits and shortcomings. Our classifier achieves a prediction accuracy of about 80% based on a small training set of about 1,000 instances, but the need for manual annotation makes it hard to track seasonal changes in the nature of the epidemics, such as the emergence of new types of virus in certain geographical locations. In contrast, LDA-based topic modelling scales well, generating cohesive and well-separated clusters from larger samples. Although clusters can easily be regenerated following changes in epidemics, this approach makes it hard to clearly segregate relevant tweets into well-defined clusters.
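To make the supervised side of this contrast concrete, the following is a minimal sketch of a relevance classifier trained on a handful of manually labelled tweets. It assumes a multinomial Naive Bayes model with add-one smoothing, which is an illustrative choice and not necessarily the classifier used in the paper. The example tweets, the label names `relevant` and `noise`, and the helper names `tokenize`, `train` and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a real pipeline would also
    strip URLs, mentions and punctuation)."""
    return text.lower().split()


def train(labelled_tweets):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_tweets:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-labelled toy data standing in for manual annotation.
TRAINING = [
    ("I have dengue fever and my whole body aches", "relevant"),
    ("my brother is in hospital with dengue", "relevant"),
    ("so many mosquitoes in my street dengue is coming", "relevant"),
    ("that football match gave me dengue lol", "noise"),
    ("this traffic is worse than dengue lol", "noise"),
    ("new song is called dengue lol", "noise"),
]

model = train(TRAINING)
prediction = predict(model, "my sister has dengue fever")
joke = predict(model, "this exam is dengue lol")
print(prediction, joke)
```

The sketch also illustrates the limitation noted above: the model only knows the vocabulary present in its labelled examples, so when the language used around an outbreak shifts, new annotations are needed before it can follow.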


Topic Modelling; Supervised Classification; Manual Annotation; Cluster Scheme; Relevant Content
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work has been supported by MRC UK and FAPERJ Brazil within the Newton Fund Project entitled "A Software Infrastructure for Promoting Efficient Entomological Monitoring of Dengue Fever". The authors would like to thank Oswaldo G. Cruz (Fundação Oswaldo Cruz, Programa de Computação Científica) and Leonardo Frajhof (Unirio, Rio de Janeiro, Brazil) for their contributions to this paper, and Prof. Wagner Meira Jr. and his team for sharing their 2009–2011 Twitter datasets [GVM+11].


  1. [ALG+11]
    Achrekar, H., Lazarus, R., Gandhe, A., Yu, S., Liu, B.: Predicting flu trends using Twitter data. In: IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702–707. IEEE (2011)
  2. [AZ12]
    Aggarwal, C.C., Zhai, C.X.: A survey of text clustering algorithms. In: Aggarwal, C.C., Zhai, C.X. (eds.) Mining Text Data, pp. 77–128. Springer, New York (2012)
  3. [BNG11]
    Becker, H., Naaman, M., Gravano, L.: Beyond trending topics: real-world event identification on Twitter. In: Proceedings of ICWSM, pp. 1–17 (2011)
  4. [BNJ03]
    Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
  5. [CDC15]
    CDC. Centers for Disease Control and Prevention (2015). Accessed 15 Dec 2015
  6. [CW14]
    Cheng, T., Wicks, T.: Event detection using Twitter: a spatio-temporal approach. PLoS One 9(6), e97807 (2014)
  7. [DSL+11]
    Dela Rosa, K., Shah, R., Lin, B., Gershman, A., Frederking, R.: Topical clustering of tweets. In: SIGIR 3rd Workshop on Social Web Search and Mining (2011)
  8. [GBH09]
    Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. CS224N Project Rep. Stanford 1(12), 12 (2009)
  9. [GVM+11]
    Gomide, J., Veloso, A., Meira, W., Almeida, V., Benevenuto, F., Ferraz, F., Teixeira, M.: Dengue surveillance based on a computational model of spatio-temporal locality of Twitter. In: Proceedings of the ACM WebSci 2011, Koblenz, Germany, 14–17 June 2011, pp. 1–8 (2011)
  10. [LC10]
    Lampos, V., Cristianini, N.: Tracking the flu pandemic by monitoring the social web. In: 2nd International Workshop on Cognitive Information Processing, CIP 2010, pp. 411–416 (2010)
  11. [MPLC13]
    Morstatter, F., Pfeffer, J., Liu, H., Carley, K.: Is the sample good enough? Comparing data from Twitter’s streaming API with Twitter’s firehose. In: Proceedings of ICWSM, pp. 400–408 (2013)
  12. [NGS+09]
    Nagarajan, M., Gomadam, K., Sheth, A.P., Ranabahu, A., Mutharaju, R., Jadhav, A.: Spatio-temporal-thematic analysis of citizen sensor data: challenges and experiences. In: Vossen, G., Long, D.D.E., Yu, J.X. (eds.) WISE 2009. LNCS, vol. 5802, pp. 539–553. Springer, Heidelberg (2009)
  13. [PR15]
    PUC-Rio. Efficient Monitoring of Aedes Mosquito in Brazil (2015). Accessed 15 Dec 2015
  14. [RDL10]
    Ramage, D., Dumais, S.T., Liebling, D.J.: Characterizing microblogs with topic models. ICWSM 10, 1 (2010)
  15. [REC12]
    Ritter, A., Etzioni, O., Clark, S.: Open domain event extraction from Twitter. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2012, p. 1104 (2012)
  16. [Rou87]
    Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comp. Appl. Math. 20, 53–65 (1987)
  17. [SOM10]
    Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes Twitter users: real-time event detection by social sensors. In: Proceedings of WWW 2010, p. 851 (2010)
  18. [WLJH10]
    Weng, J., Lim, E., Jiang, J., He, Q.: Twitterrank: finding topic-sensitive influential Twitterers. In: Proceedings of WSDM 2010, pp. 261–270. ACM (2010)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Paolo Missier (1, email author)
  • Alexander Romanovsky (1)
  • Tudor Miu (1)
  • Atinder Pal (1)
  • Michael Daniilakis (1)
  • Alessandro Garcia (2)
  • Diego Cedrim (2)
  • Leonardo da Silva Sousa (2)

  1. School of Computing Science, Newcastle University, Newcastle upon Tyne, UK
  2. PUC-Rio, Rio de Janeiro, Brazil
