Data Mining and Knowledge Discovery, Volume 26, Issue 2, pp 398–433

Enhanced spatiotemporal relational probability trees and forests

  • Amy McGovern
  • Nathaniel Troutman
  • Rodger A. Brown
  • John K. Williams
  • Jennifer Abernethy
Open Access
Article

Abstract

Many real-world domains are inherently spatiotemporal. In this work, we introduce significant enhancements to two spatiotemporal relational learning methods, the spatiotemporal relational probability tree and the spatiotemporal relational random forest, that increase their ability to learn from spatiotemporal data. We enable the models to formulate questions about both objects and the scalar and vector fields within and around objects, allowing the models to differentiate based on gradient, divergence, and curl and to recognize the shape of point clouds defined by fields. This enables the models to ask questions about the change of a shape over time or about its orientation. We validate these additions on several real-world hazardous weather datasets and demonstrate that they enable the models to learn robust classifiers that outperform the versions without them. In addition, analysis of the learned models shows that the findings are consistent with current meteorological theories.
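
To make the field-based quantities named in the abstract concrete, the following is a minimal sketch of how gradient magnitude, divergence, and curl could be summarized over a gridded 2D field with NumPy finite differences. It is an illustration only, not the authors' implementation: the function names `field_summaries` and `mean_gradient_magnitude`, the `[y, x]` array layout, and the choice of the mean as the summary statistic are all assumptions made for this example.

```python
import numpy as np

def field_summaries(u, v, dx=1.0, dy=1.0):
    """Mean divergence and mean curl (z-component) of a 2D vector field.

    u, v are 2D arrays indexed [y, x] (an assumed layout); dx and dy are
    the grid spacings along x and y.
    """
    # np.gradient returns one derivative per axis: axis 0 (y), then axis 1 (x).
    du_dy, du_dx = np.gradient(u, dy, dx)
    dv_dy, dv_dx = np.gradient(v, dy, dx)
    divergence = du_dx + dv_dy
    curl = dv_dx - du_dy
    return float(divergence.mean()), float(curl.mean())

def mean_gradient_magnitude(s, dx=1.0, dy=1.0):
    """Mean magnitude of the gradient of a 2D scalar field indexed [y, x]."""
    ds_dy, ds_dx = np.gradient(s, dy, dx)
    return float(np.hypot(ds_dx, ds_dy).mean())

# Hypothetical test fields on a 5 x 5 grid with unit spacing.
y, x = np.mgrid[-2:3, -2:3].astype(float)

# Solid-body rotation (u = -y, v = x): no divergence, constant curl.
div, curl = field_summaries(-y, x)

# Linear scalar ramp s = 3x: constant gradient magnitude.
grad_mag = mean_gradient_magnitude(3.0 * x)
```

A tree node in a model of this kind could then, in principle, compare such a summary value against a threshold (for example, "is the mean curl within the object greater than some value?"). How the published models actually sample fields and choose thresholds is described in the article itself.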

Keywords

  • Spatiotemporal relational learning
  • Statistical relational learning
  • Hazardous weather

Copyright information

© The Author(s) 2012

Authors and Affiliations

  • Amy McGovern (1)
  • Nathaniel Troutman (1)
  • Rodger A. Brown (2)
  • John K. Williams (3)
  • Jennifer Abernethy (3)

  1. School of Computer Science, University of Oklahoma, Norman, USA
  2. NOAA/National Severe Storms Laboratory, Norman, USA
  3. Research Applications Laboratory, National Center for Atmospheric Research, Boulder, USA
