Canadian Journal of Public Health, Volume 109, Issue 3, pp 419–426

Comparing Twitter data to routine data sources in public health surveillance for the 2015 Pan/Parapan American Games: an ecological study

  • Yasmin Khan
  • Garvin J. Leung
  • Paul Belanger
  • Effie Gournis
  • David L. Buckeridge
  • Li Liu
  • Ye Li
  • Ian L. Johnson
Quantitative Research

Abstract

Objectives

This study examined Twitter for public health surveillance during a mass gathering in Canada with two objectives: to explore the feasibility of acquiring, categorizing, and using geolocated Twitter data, and to compare Twitter data against other data sources used for surveillance of the Pan/Parapan American Games (P/PAG).

Methods

Syndrome definitions were created using keyword categorization to extract posts from Twitter. Categories were developed iteratively for four relevant syndromes: respiratory, gastrointestinal, heat-related illness, and influenza-like illness (ILI). All data sources corresponded to Toronto, Canada. Twitter data were acquired from a publicly available stream representing a 1% random sample of tweets from June 26 to September 10, 2015. Cross-correlation analyses of time series data were conducted between Twitter and comparator surveillance data sources: emergency department visits, telephone helpline calls, laboratory test positivity rates, reportable disease data, and temperature.
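The keyword-based syndrome definitions described above can be sketched as a simple matcher that tags each tweet with zero or more syndromes and tallies tweets per syndrome per day. This is a minimal Python illustration, not the study's pipeline: the keyword lists, the `classify` and `daily_counts` helpers, and the tweet record layout (a dict with `date` and `text`) are all hypothetical, and the sketch assumes tweets have already been filtered to Toronto by geolocation.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Hypothetical keyword lists for illustration only; they are NOT the
# syndrome definitions developed in the study.
SYNDROME_KEYWORDS = {
    "respiratory": ["cough", "wheezing", "short of breath", "asthma"],
    "gastrointestinal": ["diarrhea", "vomiting", "stomach flu", "food poisoning"],
    "heat": ["heat stroke", "heatstroke", "heat exhaustion", "dehydrated"],
    "ili": ["flu", "fever", "chills", "body aches"],
}

# One case-insensitive regex per syndrome; the \b anchors keep "flu" from
# matching inside words such as "influence".
PATTERNS = {
    name: re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    for name, keywords in SYNDROME_KEYWORDS.items()
}


def classify(text):
    """Return the set of syndromes whose keywords occur in the tweet text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def daily_counts(tweets):
    """Tally tweets per syndrome per day; one tweet may count toward several syndromes."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        for syndrome in classify(tweet["text"]):
            counts[syndrome][tweet["date"]] += 1
    return counts


# Made-up example tweets (assumed already geolocated to Toronto).
tweets = [
    {"date": date(2015, 7, 20), "text": "Pretty sure this is heat exhaustion"},
    {"date": date(2015, 7, 20), "text": "Home with a fever and chills"},
    {"date": date(2015, 7, 21), "text": "Great race at the Games today!"},
]
counts = daily_counts(tweets)
print(dict(counts["heat"]))  # {datetime.date(2015, 7, 20): 1}
```

Word-boundary keyword matching keeps the definitions transparent and easy to revise iteratively, but it cannot distinguish a first-hand report of illness from news, jokes, or figurative use.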

Results

The frequency of daily tweets that were classified into syndromes was low, with the highest mean number of daily tweets being for ILI and respiratory syndromes (22.0 and 21.6, respectively) and the lowest, for the heat syndrome (4.1). Cross-correlation analyses of Twitter data demonstrated significant correlations for heat syndrome with two data sources: telephone helpline calls (r = 0.4) and temperature data (r = 0.5).
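The cross-correlation analysis pairs each day's syndrome tweet count with a comparator series (such as helpline calls or temperature) shifted over a range of lags, and reports the Pearson correlation at each lag. Below is a minimal, dependency-free Python sketch of that idea; the function names, the lag range, and the toy series are illustrative assumptions. The sketch computes coefficients only (no significance test) and does not reproduce the study's exact procedure or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def cross_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag in [-max_lag, max_lag].

    A positive lag means the x series (e.g. daily tweets) leads the
    y series (e.g. daily helpline calls) by `lag` days.
    """
    if len(x) != len(y):
        raise ValueError("series must be the same length")
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        result[lag] = pearson(xs, ys)
    return result


# Toy daily series (not study data): the second repeats the first two days later.
tweet_counts = [1, 3, 2, 5, 4, 6, 8, 7]
helpline_calls = [0, 0, 1, 3, 2, 5, 4, 6]
ccf = cross_correlation(tweet_counts, helpline_calls, max_lag=2)
best_lag = max(ccf, key=ccf.get)
print(best_lag, round(ccf[best_lag], 3))  # 2 1.0
```

Scanning a symmetric range of lags shows whether tweets lead, coincide with, or trail the comparator series; a full analysis would also test each coefficient for significance and account for autocorrelation in daily data.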

Conclusion

Using simple syndromes based on keyword classification of geolocated tweets, we found a correlation between tweets and two routine data sources for heat alerts, the only public health event detected during the P/PAG. Further research is needed to understand the role of Twitter in surveillance.

Keywords

Mass gatherings · Social media · Twitter · Surveillance · Emergency preparedness · Public health

Résumé (translated from French)

Objectives

This study examined Twitter for public health surveillance during a mass gathering in Canada with two objectives: to study the feasibility of acquiring, categorizing, and using geolocated Twitter data, and to compare these data with other data sources used for surveillance of the Pan American and Parapan American Games (P/PAG).

Methods

Syndrome definitions were created using keyword categories to extract posts from Twitter. Categories were developed iteratively for four relevant syndromes: respiratory, gastrointestinal, heat-related illness, and influenza-like illness (ILI). All data sources were located in Toronto, Canada. Twitter data were collected from a random sample representing 1% of posts published between June 26 and September 10, 2015. Cross-correlation analyses of time series data were conducted between Twitter and comparator surveillance data sources: emergency department visits, telephone helpline calls, laboratory test positivity rates, reportable disease data, and temperature.

Results

The frequency of daily posts classified into syndromes was low: the highest mean numbers of daily posts were for ILI and respiratory syndromes (22.0 and 21.6, respectively), and the lowest was for the heat syndrome (4.1). Cross-correlation analyses of Twitter data showed significant correlations for the heat syndrome with two data sources: telephone helpline calls (r = 0.4) and temperature data (r = 0.5).

Conclusion

Using simple syndromes based on keyword classification of geolocated posts, we found a correlation between posts and two routine data sources for heat alerts, the only public health event detected during the P/PAG. Further research is needed to understand the role of Twitter in surveillance.

Keywords

Mass gathering · Social media · Twitter · Surveillance · Emergency preparedness · Public health

Notes

Acknowledgements

We thank Tina Badiani, Karen Johnson, Alex Marchand-Austin, Kieran Moore, and Brian Schwartz.

Compliance with ethical standards

Ethics approval was obtained from the Public Health Ontario Ethics Review Board.

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

Supplementary Table 1 (41997_2018_59_MOESM1_ESM.docx, DOCX, 22 kb)
Supplementary Table 2 (41997_2018_59_MOESM2_ESM.docx, DOCX, 27 kb)

Copyright information

© The Canadian Public Health Association 2018

Authors and Affiliations

  • Yasmin Khan (1, 2, 3; corresponding author)
  • Garvin J. Leung (1, 4)
  • Paul Belanger (5, 6)
  • Effie Gournis (4, 7)
  • David L. Buckeridge (8, 9)
  • Li Liu (5)
  • Ye Li (1, 4)
  • Ian L. Johnson (1, 4)

  1. Public Health Ontario, Toronto, Canada
  2. Department of Medicine, University of Toronto, Toronto, Canada
  3. University Health Network, Toronto, Canada
  4. Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
  5. KFL&A Public Health, Kingston, Canada
  6. Department of Geography and Planning, Queen’s University, Kingston, Canada
  7. Toronto Public Health, Toronto, Canada
  8. Surveillance Lab, McGill Clinical and Health Informatics, Montreal, Canada
  9. Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Canada