Infectious Disease Informatics pp 49-72

# Data Analysis and Outbreak Detection

## Abstract

The analysis components of a syndromic surveillance system focus on detecting changes in public health status that may indicate disease outbreaks. At the core of these components is the automated detection of aberrations, or data anomalies, in public health surveillance data, which often have prominent temporal and spatial elements. Detection relies on statistical analysis or data mining techniques. These methods can also handle common problems in epidemiological data such as bias, reporting delay, inaccuracy, and seasonality. These techniques are the focus of this chapter.
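To make "automated detection of aberrations" concrete, here is a minimal sketch of one classic temporal approach, an exponentially weighted moving average (EWMA) control chart applied to daily syndrome counts. It is an illustration, not the chapter's implementation. The function name `ewma_alarms` and its defaults are assumptions: a 7-day baseline for estimating the mean and standard deviation, a smoothing weight `lam = 0.3`, and an upper control limit 3 asymptotic EWMA standard deviations above the baseline mean.

```python
import math
from statistics import mean, stdev


def ewma_alarms(counts, baseline_days=7, lam=0.3, width=3.0):
    """Return indices of days whose EWMA statistic exceeds the upper control limit.

    Illustrative assumptions: the first `baseline_days` counts are outbreak-free
    and are used to estimate the mean and standard deviation; `lam` is the
    smoothing weight and `width` the number of EWMA standard deviations.
    """
    if baseline_days < 2 or len(counts) <= baseline_days:
        raise ValueError("need at least 2 baseline days and some days to monitor")
    base = counts[:baseline_days]
    mu, sigma = mean(base), stdev(base)
    # Asymptotic standard deviation of the EWMA statistic is sigma * sqrt(lam / (2 - lam)).
    ucl = mu + width * sigma * math.sqrt(lam / (2 - lam))
    z = mu  # start the smoothed statistic at the baseline mean
    alarms = []
    for day in range(baseline_days, len(counts)):
        # Blend today's count with the running smoothed value.
        z = lam * counts[day] + (1 - lam) * z
        if z > ucl:
            alarms.append(day)
    return alarms


daily_counts = [10, 12, 11, 9, 10, 11, 10, 10, 11, 25, 30, 28]
print(ewma_alarms(daily_counts))
```

With these synthetic counts, the jump on day 9 pushes the smoothed statistic above the limit, and it stays there while counts remain high. A smaller `lam` favors small, sustained increases. A larger `lam` reacts faster to sharp spikes.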

When processing public health surveillance data streams, it is often necessary to map the collected syndromic data into a small set of syndrome categories to facilitate follow-up analysis and outbreak detection. Section 4.1 discusses syndrome classification approaches. In Sect. 4.2, we provide a taxonomy of anomaly analysis and outbreak detection methods used for biosurveillance. Sections 4.3–4.6 summarize specific detection methods, ranging from classic statistical methods to data mining approaches, that quantify how likely an outbreak is given the observed surveillance data.
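Mapping free-text data into a small set of syndrome categories can be illustrated with the simplest possible approach: keyword matching on chief complaints. The sketch below is hypothetical throughout. The category names, the keyword sets in `SYNDROME_KEYWORDS`, and the function `classify_chief_complaint` are illustrative assumptions, not a validated syndrome definition. Real classifiers must also handle misspellings, abbreviations, and negation.

```python
import re

# Hypothetical, illustrative keyword lists; not a validated syndrome mapping.
SYNDROME_KEYWORDS = {
    "respiratory": {"cough", "sob", "wheezing", "pneumonia"},
    "gastrointestinal": {"vomiting", "diarrhea", "nausea"},
    "fever": {"fever", "febrile", "chills"},
}


def classify_chief_complaint(text):
    """Return the sorted syndrome categories whose keywords appear in the text."""
    # Lowercase and split into alphabetic tokens; digits and punctuation are dropped.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        syndrome for syndrome, keywords in SYNDROME_KEYWORDS.items() if tokens & keywords
    )


print(classify_chief_complaint("Fever and cough x3 days"))
```

A single complaint may fall into several categories, as in the example above. Daily counts per category then become the time series that detection methods such as the EWMA chart monitor.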

## Keywords

Anomaly Detection, Exponentially Weighted Moving Average, Recursive Least Squares, Unified Medical Language System, Syndromic Surveillance