Spatial prediction and spatial dependence monitoring on georeferenced data streams

Abstract

This paper deals with the analysis of data streams recorded by georeferenced sensors. We focus on two problems: measuring the spatial dependence among the observations recorded over time, and predicting the data distribution at locations where no sensor record is available. The proposed strategy consists of two main steps: an online step summarizes the incoming data records by histograms; an offline step measures the spatial dependence and performs the spatial prediction. The main novelties are the introduction of the variogram and of kriging for histogram data. With these tools we can monitor the spatial dependence and perform the prediction starting from histogram data rather than from raw sensor records. The effectiveness of the proposal is evaluated on real and simulated data.
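The two-step strategy can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each sensor window is summarized by a vector of quantiles on a shared probability grid (a stand-in for a histogram), and that dissimilarity between two summaries is the squared L2 Wasserstein distance; the abstract does not name the metric, so this choice is an assumption. The function names `summarize_window`, `wasserstein2_sq`, and `empirical_variogram` are hypothetical. Kriging is not covered.

```python
import numpy as np

def summarize_window(values, probs):
    """Online step (sketch): reduce a window of raw readings to a quantile vector."""
    return np.quantile(values, probs)

def wasserstein2_sq(q_a, q_b):
    """Squared L2 Wasserstein distance, approximated on a shared probability grid.

    Assumption: the metric is not stated in the abstract; this is one common choice.
    """
    return float(np.mean((q_a - q_b) ** 2))

def empirical_variogram(coords, quantiles, bin_edges):
    """Offline step (sketch): Matheron-style estimator in which squared
    differences are replaced by squared distances between summaries."""
    n_bins = len(bin_edges) - 1
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins, dtype=int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = np.linalg.norm(coords[i] - coords[j])  # spatial lag
            k = int(np.searchsorted(bin_edges, h, side="right")) - 1
            if 0 <= k < n_bins:  # pairs outside the lag range are ignored
                sums[k] += wasserstein2_sq(quantiles[i], quantiles[j])
                counts[k] += 1
    gamma = np.full(n_bins, np.nan)  # empty lag bins stay NaN
    filled = counts > 0
    gamma[filled] = sums[filled] / (2 * counts[filled])
    return gamma, counts

# Usage on simulated data: 20 sensors whose mean level drifts with the x coordinate.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(20, 2))
probs = np.linspace(0.05, 0.95, 19)
quantiles = np.array(
    [summarize_window(rng.normal(loc=c[0], size=500), probs) for c in coords]
)
gamma, counts = empirical_variogram(coords, quantiles, np.linspace(0, 10, 6))
```

Working on quantile vectors keeps the offline step independent of the raw stream length: only one short vector per sensor and window needs to be stored.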

Author information

Corresponding author

Correspondence to Antonio Balzanella.

About this article

Cite this article

Balzanella, A., Irpino, A. Spatial prediction and spatial dependence monitoring on georeferenced data streams. Stat Methods Appl 29, 101–128 (2020). https://doi.org/10.1007/s10260-019-00462-0

Keywords

  • Data stream mining
  • Histogram data
  • Variogram
  • Kriging predictor