Added-Value of Automatic Multilingual Text Analysis for Epidemic Surveillance

  • Gaël Lejeune
  • Romain Brixtel
  • Charlotte Lecluze
  • Antoine Doucet
  • Nadine Lucas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7885)


The early detection of disease outbreaks is an important objective of epidemic surveillance. Web news is one of the information sources for detecting epidemic events as soon as possible, but analyzing the tens of thousands of articles published daily is costly. Recently, automatic systems have been developed for epidemiological surveillance. The main challenge for these systems is to process more languages at a limited cost. However, existing systems mainly process major languages (English, French, Russian, Spanish…). Thus, when the first news report of a disease appears in a minor language, event detection is delayed. In this paper, we test an automatic style-based method designed to fill the gaps of existing automatic systems. It is parsimonious in resources and specifically designed for multilingual settings. The events detected by the human-moderated ProMED-mail between November 2011 and January 2012 are used as a reference dataset and compared to events detected in 17 languages by the DAnIEL2 system from web articles published in the same time window. We show how the ability to process press articles in less widely spoken languages allows quicker detection of epidemic events in some regions of the world.
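The comparison described above, events from a reference source set against events found by an automatic system over the same time window, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `lead_times`, the (disease, location) event key, and all dates below are hypothetical placeholders chosen only to show how a timeliness gap could be computed.

```python
from datetime import date


def lead_times(reference, detected):
    """Return {event_key: days} for events present in both sources.

    A positive value means the automatic system reported the event
    before the reference source; a negative value means it was later.
    Events missing from `detected` are left out of the result.
    """
    result = {}
    for key, ref_date in reference.items():
        if key in detected:
            result[key] = (ref_date - detected[key]).days
    return result


# Hypothetical toy data: keys are (disease, location) pairs, values are
# the date of the first report in each source. None of this is real data.
reference = {
    ("disease_a", "country_x"): date(2011, 12, 10),
    ("disease_b", "country_y"): date(2012, 1, 5),
    ("disease_c", "country_z"): date(2011, 11, 20),
}
detected = {
    ("disease_a", "country_x"): date(2011, 12, 7),
    ("disease_b", "country_y"): date(2012, 1, 8),
}

gaps = lead_times(reference, detected)
missed = sorted(set(reference) - set(detected))
```

With these placeholder values, `gaps` holds one entry per event found by both sources and `missed` lists the reference events the automatic system did not report. Keying events on a (disease, location) pair is a simplifying assumption; a real comparison would also need to decide how close in time two reports must be to count as the same event.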


Keywords

  • Natural Language Processing
  • Local Language
  • Reference Dataset
  • Main Language
  • Major Language





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Gaël Lejeune
    • 1
  • Romain Brixtel
    • 1
  • Charlotte Lecluze
    • 1
  • Antoine Doucet
    • 1
  • Nadine Lucas
    • 1
  1. UNICAEN, GREYC CNRS UMR-6072, Normandy University, Caen Cedex, France
