Correlation Analysis of Spatial Time Series Datasets: A Filter-and-Refine Approach
A spatial time series dataset is a collection of time series, each referencing a location in a common spatial framework. Correlation analysis is often used to identify pairs of potentially interacting elements from the cross product of two spatial time series datasets. However, the computational cost of correlation analysis is very high when the dimension of the time series and the number of locations in the spatial frameworks are large. The key contribution of this paper is the use of spatial autocorrelation among spatially neighboring time series to reduce this cost. We propose a filter-and-refine algorithm based on coning, i.e., grouping of locations, to reduce the cost of correlation analysis over a pair of spatial time series datasets. Cone-level correlation computation can eliminate (filter out) a large number of element pairs whose correlation is clearly below (or above) a given threshold. Element-pair correlations then need to be computed only for the remaining pairs. Using experimental studies with Earth science datasets, we show that the filter-and-refine approach can save a large fraction of the computational cost, particularly when the minimal correlation threshold is high.
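The filter-and-refine idea can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: each time series is centered and scaled to unit length so that correlation equals the cosine of the angle between two series; a cone is summarized by a center direction and a maximum angular span; and the grouping of locations is supplied by the caller as lists of row indices. All function names (`normalize`, `make_cone`, `cone_bounds`, `correlated_pairs`) and the `slack` parameter are hypothetical.

```python
import numpy as np

def normalize(series):
    """Center each row and scale it to unit length, so that the dot
    product of two rows equals their Pearson correlation."""
    centered = series - series.mean(axis=1, keepdims=True)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

def make_cone(unit_rows):
    """Summarize a group of unit vectors by a center direction and the
    largest angle between the center and any member (assumed cone model;
    assumes the group mean is not the zero vector)."""
    center = unit_rows.mean(axis=0)
    center = center / np.linalg.norm(center)
    span = np.arccos(np.clip(unit_rows @ center, -1.0, 1.0)).max()
    return center, span

def cone_bounds(cone_a, cone_b, slack=1e-6):
    """Lower and upper bounds on the correlation of any cross-cone pair.
    By the triangle inequality for angles, the angle between two members
    lies within the center angle minus/plus both spans. `slack` widens
    the interval slightly to absorb floating-point rounding."""
    (center_a, span_a), (center_b, span_b) = cone_a, cone_b
    gamma = np.arccos(np.clip(center_a @ center_b, -1.0, 1.0))
    min_angle = max(0.0, gamma - span_a - span_b - slack)
    max_angle = min(np.pi, gamma + span_a + span_b + slack)
    return np.cos(max_angle), np.cos(min_angle)

def correlated_pairs(data_a, groups_a, data_b, groups_b, threshold):
    """Return the set of (i, j) with correlation >= threshold, plus the
    number of element pairs that needed an exact computation."""
    unit_a, unit_b = normalize(data_a), normalize(data_b)
    cones_b = [make_cone(unit_b[g]) for g in groups_b]
    result, refined = set(), 0
    for ga in groups_a:
        cone_a = make_cone(unit_a[ga])
        for gb, cone_b in zip(groups_b, cones_b):
            low, high = cone_bounds(cone_a, cone_b)
            if high < threshold:
                continue  # filter: no pair in this cone pair can qualify
            if low >= threshold:
                # filter: every pair qualifies without exact computation
                result.update((int(i), int(j)) for i in ga for j in gb)
                continue
            # refine: compute exact correlations for the undecided pairs
            corr = unit_a[ga] @ unit_b[gb].T
            refined += corr.size
            for r, c in zip(*np.nonzero(corr >= threshold)):
                result.add((int(ga[r]), int(gb[c])))
    return result, refined
```

In this sketch the savings depend on how tight the cones are: when spatially neighboring series are strongly autocorrelated, the angular spans are small, the bounds are narrow, and most cone pairs are decided in the filter step. A high threshold also helps, since fewer cone pairs have an upper bound that reaches it.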