Abstract
The early detection of disease outbreaks is an important objective of epidemic surveillance. Web news is one of the information sources for detecting epidemic events as soon as possible, but analyzing the tens of thousands of articles published daily is costly. Recently, automatic systems have been devoted to epidemiological surveillance. The main issue for these systems is to process more languages at a limited cost. However, existing systems mainly process major languages (English, French, Russian, Spanish…). Thus, when the first news report of a disease is in a minor language, the timeliness of event detection suffers. In this paper, we test an automatic style-based method designed to fill the gaps in existing automatic systems. It is parsimonious in resources and specifically designed for multilingual settings. The events detected by the human-moderated ProMED-mail between November 2011 and January 2012 are used as a reference dataset and compared to events detected in 17 languages by the DAnIEL system from web articles published in the same time window. We show how being able to process press articles in less widely spoken languages allows quicker detection of epidemic events in some regions of the world.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lejeune, G., Brixtel, R., Lecluze, C., Doucet, A., Lucas, N. (2013). Added-Value of Automatic Multilingual Text Analysis for Epidemic Surveillance. In: Peek, N., Marín Morales, R., Peleg, M. (eds) Artificial Intelligence in Medicine. AIME 2013. Lecture Notes in Computer Science, vol 7885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38326-7_40
DOI: https://doi.org/10.1007/978-3-642-38326-7_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38325-0
Online ISBN: 978-3-642-38326-7
eBook Packages: Computer Science (R0)