Abstract
The Platform for Automated Extraction of Animal Disease Information from the Web (PADI-web) is a multilingual text mining tool for the automatic detection, classification, and extraction of disease outbreak information from online news articles. PADI-web currently monitors the Web for nine animal infectious diseases and eight syndromes in five animal hosts. The classification module uses a supervised machine learning approach to filter relevant news, with an overall accuracy of 0.94. Classifying the relevant news into five topic categories (confirmed outbreak, suspected outbreak, unknown outbreak, preparedness, and impact) reached an overall accuracy of 0.75. In the first six months of its implementation (January–June 2016), PADI-web detected 73% of the outbreaks of African swine fever, 20% of those of foot-and-mouth disease, 13% of those of bluetongue, and 62% of those of highly pathogenic avian influenza. The information extraction module of PADI-web obtained F-scores of 0.80 for locations, 0.85 for dates, 0.95 for diseases, 0.95 for hosts, and 0.85 for case numbers.
PADI-web allows complementary disease surveillance in the domain of animal health.
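The two kinds of figures reported above, accuracy for the classification module and F-scores for the extraction module, can be illustrated with a minimal sketch. It assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function names, labels, and counts below are hypothetical toy values, not PADI-web code or results.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one item is required")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract does not say which F-score variant is used.
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for the relevance filter, not PADI-web data.
gold_labels = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]
predicted_labels = ["relevant", "irrelevant", "irrelevant", "relevant", "relevant"]
print(accuracy(gold_labels, predicted_labels))  # 3 of 5 labels match

# Hypothetical extraction counts: 8 correct locations, 2 spurious, 2 missed.
print(f_score(8, 2, 2))
```

Accuracy suits the classification tasks because every article receives exactly one label. The F-score suits extraction because it penalises both spurious and missed entities.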
Notes
- 5. This definition is based only on the semantics of the news text and does not take into account official confirmation by a formal source.
Acknowledgements
We thank J. de Goër, B. Belot, C. Hemeury, M. Devaud, and T. Filiol for their contributions to the development of PADI-web. We also thank the members of the French Epidemic Intelligence Team in Animal Health for their constructive comments during the development of PADI-web. This work has been supported by the French General Directorate for Food (DGAL), the French Agricultural Research Centre for International Development (CIRAD), the SONGES Project (FEDER and Occitanie), and the French National Research Agency under the Investments for the Future Program, referred to as ANR-16-CONV-0004 (#DigitAg). This work has also been funded by the “Monitoring outbreak events for disease surveillance in a data science context” (MOOD) project from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 874850 (https://mood-h2020.eu/).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Valentin, S. et al. (2020). PADI-web: An Event-Based Surveillance System for Detecting, Classifying and Processing Online News. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2017. Lecture Notes in Computer Science, vol 12598. Springer, Cham. https://doi.org/10.1007/978-3-030-66527-2_7
DOI: https://doi.org/10.1007/978-3-030-66527-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66526-5
Online ISBN: 978-3-030-66527-2
eBook Packages: Computer Science (R0)