Abstract
Many real-world domains are inherently spatiotemporal. In this work, we introduce significant enhancements to two spatiotemporal relational learning methods, the spatiotemporal relational probability tree and the spatiotemporal relational random forest, that increase their ability to learn from spatiotemporal data. The enhanced models can formulate questions about both objects and the scalar and vector fields within and around those objects, which lets them differentiate based on the gradient, divergence, and curl of a field and recognize the shape of point clouds defined by fields. The models can therefore ask questions about how a shape changes over time or about its orientation. We validate these additions on several real-world hazardous weather datasets and demonstrate that they enable the models to learn robust classifiers that outperform the versions without the additions. In addition, analysis of the learned models shows that the findings are consistent with current meteorological theories.
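To make the field quantities named in the abstract concrete, the following is a minimal sketch of how gradient, divergence, and curl might be estimated on a regularly gridded two-dimensional field using central differences. It is an illustration only and not the authors' implementation: the function names, the grid layout (rows index y, columns index x), and the uniform spacing `h` are all assumptions made for this example.

```python
from typing import List, Tuple

Grid = List[List[float]]  # hypothetical layout: grid[i][j], i = y index, j = x index


def ddx(f: Grid, i: int, j: int, h: float = 1.0) -> float:
    """Central-difference estimate of df/dx at interior grid point (i, j)."""
    return (f[i][j + 1] - f[i][j - 1]) / (2.0 * h)


def ddy(f: Grid, i: int, j: int, h: float = 1.0) -> float:
    """Central-difference estimate of df/dy at interior grid point (i, j)."""
    return (f[i + 1][j] - f[i - 1][j]) / (2.0 * h)


def gradient(s: Grid, i: int, j: int, h: float = 1.0) -> Tuple[float, float]:
    """Gradient (ds/dx, ds/dy) of a scalar field s."""
    return ddx(s, i, j, h), ddy(s, i, j, h)


def divergence(u: Grid, v: Grid, i: int, j: int, h: float = 1.0) -> float:
    """Divergence du/dx + dv/dy of the vector field (u, v)."""
    return ddx(u, i, j, h) + ddy(v, i, j, h)


def curl_z(u: Grid, v: Grid, i: int, j: int, h: float = 1.0) -> float:
    """Vertical component of curl, dv/dx - du/dy, of the vector field (u, v)."""
    return ddx(v, i, j, h) - ddy(u, i, j, h)


# Toy data (not from the paper): solid-body rotation u = -y, v = x on a 3x3 grid,
# and a linear scalar field s = x + 2y.
n = 3
u = [[-float(i) for j in range(n)] for i in range(n)]
v = [[float(j) for j in range(n)] for i in range(n)]
s = [[float(j + 2 * i) for j in range(n)] for i in range(n)]

div_center = divergence(u, v, 1, 1)
curl_center = curl_z(u, v, 1, 1)
grad_center = gradient(s, 1, 1)
```

For the toy rotation field, the divergence at the centre point is zero and the curl is nonzero, which is the kind of distinction a tree split on these quantities could exploit. A real dataset would also need handling for grid boundaries, non-uniform spacing, and a third dimension.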
Acknowledgements
The authors thank Jason Craig, David J. Gagne II, Nathan Hiers, Gregory Meymaris, Timothy Supinie, and Derek Rosendahl for their work in generating some of the data used in this research. This research was supported by the National Science Foundation under Grant No. NSF/IIS/0746816 and by NASA under Grant No. NNX08AL89G. Much of the computing for this project was performed at the OU Supercomputing Center for Education & Research (OSCER) at the University of Oklahoma (OU).
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Additional information
Responsible editor: Eamonn Keogh.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
McGovern, A., Troutman, N., Brown, R.A. et al. Enhanced spatiotemporal relational probability trees and forests. Data Min Knowl Disc 26, 398–433 (2013). https://doi.org/10.1007/s10618-012-0261-2
DOI: https://doi.org/10.1007/s10618-012-0261-2