Improving the Fusion of Outbreak Detection Methods with Supervised Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12313)


Epidemiologists use a variety of statistical algorithms for the early detection of outbreaks. The practical usefulness of such methods depends heavily on the trade-off between the detection rate of outbreaks and the chance of raising a false alarm. Recent research has shown that using machine learning to fuse multiple statistical algorithms improves outbreak detection. Instead of relying only on the binary outputs (alarm or no alarm) of the statistical algorithms, we propose to use their p-values for training a fusion classifier. We also show that adding contextual features and adapting the labeling of an epidemic period may further improve performance. For comparison and evaluation, we introduce a new measure that captures the performance of an outbreak detection method at low false alarm rates more precisely than previous work. We performed experiments on synthetic data to evaluate the proposed approach and its adaptations in a controlled setting, and we used the cases of Salmonella and Campylobacter reported across Germany from 2001 to 2018 to evaluate on real data. The experimental results show a substantial improvement on the synthetic data when p-values are used for learning. The results on real data are less clear: inconsistencies that appear in the data under real conditions make it more challenging for the learning approach to identify valuable patterns for outbreak detection.
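The contrast between binary alarms and p-values as inputs to a fusion classifier can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two base detectors are simple Poisson tail tests over moving windows (the hypothetical `detector_pvalues`), the fusion classifier is a plain logistic regression, the data are simulated, and `partial_auc` is a generic normalized partial area under the ROC curve for low false alarm rates, which may differ from the measure the paper introduces.

```python
import math
import numpy as np

def poisson_pvalue(count, mean):
    """One-sided p-value P(X >= count) for X ~ Poisson(mean)."""
    cdf_below = sum(math.exp(-mean) * mean**i / math.factorial(i)
                    for i in range(int(count)))
    return max(0.0, 1.0 - cdf_below)

def detector_pvalues(counts, window):
    """Hypothetical base detector: test each week against the mean of the
    preceding `window` weeks. Weeks without a full baseline get p = 1."""
    pvals = np.ones(len(counts))
    for t in range(window, len(counts)):
        baseline = max(counts[t - window:t].mean(), 1e-9)
        pvals[t] = poisson_pvalue(counts[t], baseline)
    return pvals

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Logistic regression by batch gradient descent (stand-in fusion model)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_scores(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def partial_auc(y_true, scores, max_fpr=0.05):
    """Area under the empirical (staircase) ROC curve for FPR in [0, max_fpr],
    scaled to [0, 1]. Tied scores are resolved by input order."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    y = np.asarray(y_true, dtype=float)[order]
    tpr = np.concatenate([[0.0], np.cumsum(y) / y.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1.0 - y) / (1.0 - y).sum()])
    grid = np.linspace(0.0, max_fpr, 501)
    idx = np.searchsorted(fpr, grid, side="right") - 1  # last point with FPR <= g
    return float(tpr[idx].mean())

# Simulated weekly counts with injected outbreak weeks (illustrative only).
rng = np.random.default_rng(0)
n_weeks = 400
counts = rng.poisson(5, n_weeks)
y = np.zeros(n_weeks)
outbreak_weeks = rng.choice(np.arange(20, n_weeks), size=40, replace=False)
counts[outbreak_weeks] += rng.poisson(6, size=40)
y[outbreak_weeks] = 1.0

# Fusion features: raw p-values versus thresholded alarms of the same detectors.
X_p = np.column_stack([detector_pvalues(counts, w) for w in (4, 12)])
X_b = (X_p < 0.05).astype(float)

# Train on the first half of the series, evaluate on the second half.
train, test = slice(0, n_weeks // 2), slice(n_weeks // 2, n_weeks)
pauc_p = partial_auc(y[test], predict_scores(fit_logistic(X_p[train], y[train]), X_p[test]))
pauc_b = partial_auc(y[test], predict_scores(fit_logistic(X_b[train], y[train]), X_b[test]))
print(f"partial AUC, p-value features: {pauc_p:.3f}")
print(f"partial AUC, binary features:  {pauc_b:.3f}")
```

The binary features can only take four distinct combinations for two detectors, so the fusion model can produce at most four distinct scores, whereas the p-value features let it rank weeks on a continuous scale. Whether that translates into a higher partial AUC depends on the data, and the numbers printed by this toy setup say nothing about the results reported in the paper.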


Keywords: Outbreak detection · Fusion methods · Stacking · Syndromic surveillance



This work was supported by the Innovation Committee of the Federal Joint Committee (G-BA) [ESEG project, grant number 01VSF17034]. We thank our project partners, the Health Protection Authority of Frankfurt, the Hesse State Health Office and Centre for Health Protection, the Hesse Ministry of Social Affairs and Integration, the Robert Koch-Institut, the Epias GmbH and the Sana Klinikum Offenbach GmbH, who provided insight and expertise that greatly assisted the research. In particular, we thank Linus Grabenhenrich, Alexander Ulrich, Theresa Kocher, Madlen Schranz, Sonia Boender and Birte Wagner from the Robert Koch-Institut for their valuable feedback, which substantially improved the manuscript, and for providing us with the data for the evaluation.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Technische Universität Darmstadt, Darmstadt, Germany
  2. Johannes Kepler Universität Linz, Linz, Austria
