Detection of Events In Multiple Streams of Surveillance Data

Multivariate, Multi-stream and Multi-dimensional Approaches
Chapter, part of the Integrated Series in Information Systems book series (ISIS, volume 27)

Chapter Overview

Simultaneous monitoring of multiple streams of data that carry corroborating evidence can benefit many event detection applications. This chapter reviews analytic approaches suited to such scenarios. We cover established statistical algorithms for multivariate time series baseline estimation and forecasting, which are relevant when multiple data streams can be modeled jointly. We then present more recent methods that do not rely on this assumption. We separately address techniques for data in the specific form of a record of transactions annotated with multiple descriptors, a form often encountered in health surveillance practice. Future event detection algorithms will benefit from incorporating machine learning methodology. This will enable adaptability, the use of human feedback, and the construction of reliable detectors from examples of events of interest, and will lead to highly scalable and economical multi-stream event detection systems.
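As a minimal sketch of the two families contrasted above (not the chapter's own algorithms), the Python code below scores one day of counts from several streams in two ways. `hotelling_t2` models the streams jointly: it estimates a baseline mean vector and cross-stream covariance from a window of historical days and computes Hotelling's T² distance of today's vector from that baseline. `upper_tail_p` and `fisher_combine` instead model each stream separately with a Gaussian baseline and merge the per-stream one-sided p-values with Fisher's method. The function names, the Gaussian baselines, and the simulated Poisson counts in the usage lines are illustrative assumptions.

```python
import math

import numpy as np

# Illustrative sketch only: function names and the Gaussian baseline
# assumptions are hypothetical, not taken from the chapter.


def hotelling_t2(history: np.ndarray, today: np.ndarray) -> float:
    """Joint model: Hotelling's T^2 distance of today's count vector from a
    baseline mean and cross-stream covariance estimated from history.

    history has shape (n_days, n_streams); today has shape (n_streams,).
    """
    mean = history.mean(axis=0)
    cov = np.cov(history, rowvar=False)  # sample covariance across streams
    diff = today - mean
    # Solve cov @ x = diff rather than inverting cov explicitly.
    return float(diff @ np.linalg.solve(cov, diff))


def upper_tail_p(history: np.ndarray, today: np.ndarray) -> np.ndarray:
    """Separate models: one-sided p-value per stream under a Gaussian
    baseline fitted to that stream's history alone."""
    z = (today - history.mean(axis=0)) / history.std(axis=0, ddof=1)
    return np.array([0.5 * math.erfc(zi / math.sqrt(2.0)) for zi in z])


def fisher_combine(pvalues) -> float:
    """Fisher's method: X = -2 * sum(log p) follows a chi-square distribution
    with 2k degrees of freedom if the k p-values are independent and uniform."""
    k = len(pvalues)
    # Clamp to avoid log(0) when a stream's p-value underflows.
    x = -2.0 * sum(math.log(max(p, 1e-300)) for p in pvalues)
    half = x / 2.0
    # Closed-form chi-square survival function for even degrees of freedom 2k.
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))


# Usage on simulated data: 90 days of history for three streams.
rng = np.random.default_rng(0)
history = rng.poisson(lam=[20, 35, 12], size=(90, 3)).astype(float)
today = np.array([28.0, 47.0, 19.0])
print("T^2:", hotelling_t2(history, today))
print("Fisher combined p:", fisher_combine(upper_tail_p(history, today)))
```

Fisher's method assumes the per-stream tests are independent; correlated surveillance streams violate that assumption and can make the combined p-value overly optimistic. The joint T² model accounts for cross-stream correlation, at the cost of having to estimate a full covariance matrix from the historical window.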

Keywords

Event detection, Analysis of multivariate time series, Health surveillance

Suggested Reading

  1. Burkom, H.S., Murphy, S., Coberly, J., and Hurt-Mullen, K. (2005). “Public health monitoring tools for multiple data streams.” Morbidity and Mortality Weekly Report (Supplement) 54:55–62.
  2. Rolka, H., Burkom, H., Cooper, G.F., Kulldorff, M., Madigan, D., and Wong, W.K. (2007). “Issues in applied statistics for public health bioterrorism surveillance using multiple data streams: research needs.” Statistics in Medicine 26(8):1834–1856.
  3. Wagner, M.M., Moore, A.W., and Aryel, R.M., editors (2006). Handbook of Biosurveillance. New York, NY: Academic Press.
  4. Wong, W.K., Cooper, G.F., Dash, D.H., Levander, J.D., Dowling, J.N., Hogan, W.R., and Wagner, M.M. (2005). “Bayesian biosurveillance using multiple data streams.” Morbidity and Mortality Weekly Report (Supplement) 54:63–69.

Online Resources

  1. Engineering Statistics Handbook (http://www.itl.nist.gov/div898/handbook/index.htm): This interactive handbook, developed and hosted by the National Institute of Standards and Technology, is a rich source of information about fundamental methods of statistical analysis. It includes comprehensive descriptions of multivariate control charts, which can be used to monitor multi-stream data.
  2. Dataplot (http://www.itl.nist.gov/div898/software/dataplot/homepage.htm): Dataplot is a free, public-domain, multi-platform software package for scientific visualization, statistical analysis, and modeling, developed at the National Institute of Standards and Technology. Its extensive functionality includes statistical process control and time series analysis methods that support process monitoring.
  3. SaTScan™ (http://www.satscan.org/): SaTScan™ is free software that analyzes spatial, temporal, and space-time data using spatial, temporal, or space-time scan statistics. It can also scan multiple datasets simultaneously to look for clusters that occur in one or more of them.
  4. WSARE and Fast Spatial Scan (http://www.autonlab.org/autonweb/downloads/software.html): Free downloadable implementations of WSARE and of the scalable spatial scan are available from the Carnegie Mellon University Auton Lab web site.
  5. T-Cube (http://www.autonlab.org/T-Cube/): A public demo version of the T-Cube prototype web interface is available at the CMU Auton Lab web site. It includes large-scale screening and detection functions that can be run on user-supplied or locally available example multidimensional data to demonstrate the efficiency of the underlying data structure.
  6. WEKA (http://www.cs.waikato.ac.nz/ml/weka/): WEKA is a free Java library of machine learning algorithms that can be used to implement data-driven approaches to event detection in multi-stream surveillance.
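Scan statistics such as those implemented in SaTScan™ search many candidate windows for the one most inconsistent with a baseline. A minimal Python sketch of a purely temporal, prospective scan over a single stream follows, assuming known daily baseline expectations and using an expectation-based Poisson likelihood ratio with a Monte Carlo p-value. The function names and data are hypothetical, and this is not SaTScan's implementation; SaTScan's Poisson model differs in detail, for example by conditioning on the total number of observed cases.

```python
import math

import numpy as np

# Illustrative sketch only: an expectation-based Poisson temporal scan with
# hypothetical function names; it is not SaTScan's implementation.


def poisson_llr(count: float, expected: float) -> float:
    """Log-likelihood ratio for an elevated Poisson rate in a window
    (expected must be > 0); zero unless the window exceeds its expectation."""
    if count <= expected:
        return 0.0
    return count * math.log(count / expected) + expected - count


def temporal_scan(counts, baseline, max_window: int = 7):
    """Prospective scan: score every window ending on the most recent day
    and return (best_llr, best_window_length)."""
    best, best_w = 0.0, 0
    for w in range(1, min(max_window, len(counts)) + 1):
        llr = poisson_llr(float(np.sum(counts[-w:])), float(np.sum(baseline[-w:])))
        if llr > best:
            best, best_w = llr, w
    return best, best_w


def monte_carlo_p(counts, baseline, max_window: int = 7, n_rep: int = 999, seed: int = 0):
    """Randomization p-value: rank the observed maximum LLR among maxima from
    replicates simulated under the baseline expectations (the null)."""
    rng = np.random.default_rng(seed)
    observed, _ = temporal_scan(counts, baseline, max_window)
    exceed = sum(
        temporal_scan(rng.poisson(baseline), baseline, max_window)[0] >= observed
        for _ in range(n_rep)
    )
    return (exceed + 1) / (n_rep + 1)


# Usage: 28 days with an expected 10 cases per day and a ramp-up at the end.
baseline = np.full(28, 10.0)
counts = np.array([10] * 24 + [14, 16, 18, 21])
llr, window = temporal_scan(counts, baseline)
print("best window (days):", window, "LLR:", round(llr, 2))
print("Monte Carlo p-value:", monte_carlo_p(counts, baseline))
```

Restricting candidate windows to those ending on the most recent day is what makes the scan prospective. Spatial and space-time scans extend the same likelihood-ratio search over geographic regions as well as time intervals.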

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Auton Lab, Carnegie Mellon University, Pittsburgh, USA
