Classifying and Summarizing Information from Microblogs During Epidemics
During a new disease outbreak, frustration and uncertainty increase among affected and vulnerable populations. Affected communities look for known symptoms, prevention measures, and treatment strategies. Health organizations, on the other hand, seek situational updates to assess the severity of the outbreak, known affected cases, and other details. The recent emergence of social media platforms such as Twitter provides a fast and convenient way to disseminate information to, and consume it from, a wide audience. Research studies have shown the potential of this online information to address the information needs of concerned authorities during outbreaks, epidemics, and pandemics. In this work, we target three types of end-users: (i) the vulnerable population, i.e., people who are not yet affected and are looking for prevention-related information; (ii) the affected population, i.e., people who are affected and are looking for treatment-related information; and (iii) health organizations, such as WHO, that are interested in gaining situational awareness to make timely decisions. We use Twitter data from two recent outbreaks (Ebola and MERS) to build an automatic classification approach that categorizes tweets into different disease-related categories. Moreover, the classified messages are used to generate different kinds of summaries useful for affected and vulnerable communities as well as health organizations. Results obtained from extensive experimentation show the effectiveness of the proposed approach.
Keywords: Health crisis, Epidemic, Twitter, Classification, Summarization
K. Rudra was supported by a fellowship from Tata Consultancy Services.
Compliance with Ethical Standards
The authors have no competing interests related to this paper.