Data Mining and Knowledge Discovery, Volume 22, Issue 1–2, pp 232–258

Identifying predictive multi-dimensional time series motifs: an application to severe weather prediction

  • Amy McGovern
  • Derek H. Rosendahl
  • Rodger A. Brown
  • Kelvin K. Droegemeier

Abstract

We introduce an efficient approach to mining multi-dimensional temporal streams of real-world data for ordered temporal motifs that can be used for prediction. Since many of the dimensions of the data are known or suspected to be irrelevant, our approach first identifies the salient dimensions of the data, then the key temporal motifs within each dimension, and finally the temporal ordering of the motifs necessary for prediction. For the prediction element, the data are assumed to be labeled. We tested the approach on two real-world data sets. To verify the generality of the approach, we validated the application on several subjects from the CMU Motion Capture database. Our main application uses several hundred numerically simulated supercell thunderstorms where the goal is to identify the most important features and feature interrelationships which herald the development of strong rotation in the lowest altitudes of a storm. We identified sets of precursors, in the form of meteorological quantities reaching extreme values in a particular temporal sequence, unique to storms producing strong low-altitude rotation. The eventual goal is to use this knowledge for future severe weather detection and prediction algorithms.

Keywords

Temporal data mining, Multi-dimensional, Severe weather

Copyright information

© The Author(s) 2010

Authors and Affiliations

  • Amy McGovern (1)
  • Derek H. Rosendahl (2)
  • Rodger A. Brown (3)
  • Kelvin K. Droegemeier (2)
  1. School of Computer Science, University of Oklahoma, Norman, USA
  2. School of Meteorology, University of Oklahoma, Norman, USA
  3. NOAA/National Severe Storms Laboratory, Norman, USA
