Information Systems Frontiers, Volume 20, Issue 5, pp 933–948

Classifying and Summarizing Information from Microblogs During Epidemics

  • Koustav Rudra
  • Ashish Sharma
  • Niloy Ganguly
  • Muhammad Imran


During a new disease outbreak, frustration and uncertainty among affected and vulnerable populations increase. Affected communities look for known symptoms, prevention measures, and treatment strategies. Health organizations, on the other hand, seek situational updates to assess the severity of the outbreak, known affected cases, and other details. The recent emergence of social media platforms such as Twitter provides a convenient and fast way to disseminate information to, and consume it from, a wider audience. Research studies have shown the potential of this online information to address the information needs of concerned authorities during outbreaks, epidemics, and pandemics. In this work, we target three types of end-users: (i) the vulnerable population, people who are not yet affected and are looking for prevention-related information; (ii) the affected population, people who are affected and are looking for treatment-related information; and (iii) health organizations, such as the WHO, that are interested in gaining situational awareness to make timely decisions. We use Twitter data from two recent outbreaks (Ebola and MERS) to build an automatic classification approach that categorizes tweets into different disease-related categories. Moreover, the classified messages are used to generate different kinds of summaries useful for affected and vulnerable communities as well as health organizations. Results obtained from extensive experimentation show the effectiveness of the proposed approach.
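The classify-then-summarize idea described above can be illustrated with a minimal sketch. This is not the approach proposed in the paper: it swaps in a plain bag-of-words Naive Bayes classifier and a greedy word-coverage summarizer purely for illustration. The category names (`symptom`, `prevention`, `treatment`), the function names, and the toy tweets are all assumptions and are not taken from the Ebola or MERS data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(labeled_tweets):
    """Collect per-category word counts, tweet counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_tweets:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the category with the highest Laplace-smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def summarize(tweets, max_tweets=2):
    """Greedily pick tweets that add the most not-yet-covered words."""
    covered, summary = set(), []
    remaining = list(tweets)
    while remaining and len(summary) < max_tweets:
        best = max(remaining, key=lambda t: len(set(tokenize(t)) - covered))
        if not set(tokenize(best)) - covered:
            break  # nothing new left to add
        summary.append(best)
        covered |= set(tokenize(best))
        remaining.remove(best)
    return summary


# Toy, made-up training tweets (hypothetical; not from the paper's datasets).
TRAIN = [
    ("fever and vomiting are early symptoms", "symptom"),
    ("symptoms include headache and fever", "symptom"),
    ("wash hands often to prevent infection", "prevention"),
    ("avoid contact with sick people to prevent spread", "prevention"),
    ("no approved treatment yet, supportive care helps", "treatment"),
    ("doctors test a new drug for treatment", "treatment"),
]
model = train_naive_bayes(TRAIN)

# Classify a new tweet, then summarize a small batch of tweets.
print(classify("wash your hands to prevent the virus", model))
print(summarize(["wash hands", "wash hands often", "avoid sick people"]))
```

In a pipeline of this shape, tweets would first be routed to a category by `classify`, and `summarize` would then be run separately on each category so that each end-user group receives a short digest relevant to its needs.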


Health crisis, Epidemic, Twitter, Classification, Summarization



K. Rudra was supported by a fellowship from Tata Consultancy Services.

Compliance with Ethical Standards

Competing interests

The authors have no competing interests in this paper.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. IIT Kharagpur, Kharagpur, India
  2. Qatar Computing Research Institute, HBKU, Doha, Qatar
