Data Analysis and Outbreak Detection

  • Hsinchun Chen
  • Daniel Zeng
  • Ping Yan
Part of the Integrated Series in Information Systems book series (ISIS, volume 21)


The analysis components of a syndromic surveillance system focus on detecting changes in public health status that may indicate disease outbreaks. At their core is the automated detection of aberrations, or data anomalies, in public health surveillance data, which often have prominent temporal and spatial elements, using statistical analysis or data mining techniques. These methods can also cope with problems common in epidemiological data, such as bias, delay, inaccuracy, and seasonality. These techniques are the focus of this chapter.
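
As a concrete illustration of automated temporal aberration detection, the sketch below applies a generic exponentially weighted moving average (EWMA) control chart to a series of daily syndrome counts. It is a minimal sketch, not the chapter's own algorithm: the smoothing weight `lam`, the length of the outbreak-free `baseline` window, and the control-limit multiplier `k` are illustrative assumptions, and the function name `ewma_alarms` is hypothetical.

```python
import math
from typing import List


def ewma_alarms(counts: List[float], lam: float = 0.3, baseline: int = 14,
                k: float = 3.0) -> List[int]:
    """Return indices of days whose EWMA statistic exceeds its upper control limit.

    The first `baseline` days are assumed to be outbreak-free and are used only
    to estimate the in-control mean and standard deviation.
    """
    if len(counts) <= baseline:
        return []  # not enough data beyond the baseline window
    ref = counts[:baseline]
    mu = sum(ref) / baseline
    sigma = math.sqrt(sum((x - mu) ** 2 for x in ref) / (baseline - 1))

    z = mu  # the EWMA statistic starts at the baseline mean
    alarms = []
    for t in range(baseline, len(counts)):
        z = lam * counts[t] + (1 - lam) * z
        i = t - baseline + 1  # number of EWMA updates so far
        # Exact variance of the EWMA after i updates (it approaches
        # sigma^2 * lam / (2 - lam) as i grows).
        var = sigma ** 2 * (lam / (2 - lam)) * (1 - (1 - lam) ** (2 * i))
        ucl = mu + k * math.sqrt(var)
        if z > ucl:
            alarms.append(t)
    return alarms


daily_counts = [10, 12, 9, 11, 10, 13, 12, 10, 11, 9, 12, 10, 11, 10,
                11, 12, 10, 25, 30, 28]
print(ewma_alarms(daily_counts))  # [17, 18, 19]
```

The EWMA gives more weight to recent days while still smoothing day-to-day noise, so a sustained rise in counts (days 17 to 19 above) crosses the control limit quickly. A smaller `lam` makes the chart more sensitive to small, gradual shifts; a larger one reacts faster to abrupt spikes.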

When processing public health surveillance data streams, it is often necessary to map the collected syndromic data into a small set of syndrome categories to facilitate follow-up analysis and outbreak detection. Section 4.1 discusses related syndrome classification approaches. In Sect. 4.2, we provide a taxonomy of anomaly analysis and outbreak detection methods used for biosurveillance. Sections 4.3–4.6 summarize specific detection methods, ranging from classic statistical methods to data mining approaches, that quantify the likelihood of an outbreak conditioned on surveillance data.
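
The mapping of free-text syndromic data (such as emergency department chief complaints) into a small set of syndrome categories can be sketched with a simple keyword matcher. This is an assumption-laden illustration, not one of the classifiers discussed in Sect. 4.1: the syndrome names and keyword lists in `SYNDROME_KEYWORDS` are hypothetical, and the helper names `normalize` and `classify` are invented for this example.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical keyword lists for illustration only; production systems rely on
# curated, validated vocabularies rather than a handful of hand-picked words.
SYNDROME_KEYWORDS: Dict[str, Set[str]] = {
    "respiratory": {"cough", "sob", "wheezing", "dyspnea", "congestion"},
    "gastrointestinal": {"vomiting", "vomit", "diarrhea", "nausea"},
    "fever": {"fever", "febrile", "chills"},
    "rash": {"rash", "hives", "blisters"},
}


def normalize(chief_complaint: str) -> List[str]:
    """Lower-case the free text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", chief_complaint.lower())


def classify(chief_complaint: str) -> List[str]:
    """Return every syndrome with at least one matching keyword (possibly several)."""
    tokens = set(normalize(chief_complaint))
    matched = [s for s, kws in SYNDROME_KEYWORDS.items() if tokens & kws]
    return sorted(matched) if matched else ["unclassified"]


def daily_syndrome_counts(complaints: List[str]) -> Counter:
    """Aggregate one day's complaints into per-syndrome counts for detection."""
    return Counter(s for cc in complaints for s in classify(cc))


print(classify("Fever and cough x3 days"))  # ['fever', 'respiratory']
print(classify("N/V/D"))                     # ['unclassified']
```

The second call shows a typical weakness of naive keyword matching: abbreviations such as "N/V/D" are not expanded, so the record falls through to "unclassified". The per-syndrome daily counts produced by `daily_syndrome_counts` are the kind of time series that the detection methods surveyed in this chapter take as input.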


Keywords: Anomaly Detection; Exponentially Weighted Moving Average; Recursive Least Squares; Unified Medical Language System; Syndromic Surveillance

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Hsinchun Chen (1)
  • Daniel Zeng (2, 3)
  • Ping Yan (2)

  1. Department of Management Information Systems, Eller College of Management, University of Arizona, Tucson, USA
  2. Department of Management Information Systems, Eller College of Management, University of Arizona, Tucson, USA
  3. Chinese Academy of Sciences, Beijing, China
