Skip to main content
Log in

Spatial prediction and spatial dependence monitoring on georeferenced data streams

  • Original Paper
  • Published:
Statistical Methods & Applications Aims and scope Submit manuscript

Abstract

This paper deals with the analysis of data streams recorded by georeferenced sensors. We focus on the problem of measuring the spatial dependence among the observations recorded over time and with the prediction of the data distribution, where no sensor record is available. The proposed strategy consists of two main steps: an online step summarizes the incoming data records by histograms; an offline step performs the measurement of the spatial dependence and the spatial prediction. The main novelties are the introduction of the variogram and the kriging for histogram data. Through these new tools we can monitor the spatial dependence and to perform the prediction starting from histogram data, rather than from sensor records. The effectiveness of the proposal is evaluated on real and simulated data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

References

  • Aggarwal CC, Han J, Wang J, Yu P (2003) CluStream: a framework for clustering evolving data streams. In: Very large data bases

  • Agueh M, Carlier G (2011) Barycenters in the Wasserstein space. Soc Ind Appl Math 43:904–924

    MathSciNet  MATH  Google Scholar 

  • Aitchison J (1982) The statistical analysis of compositional data. J R Stat Soc Ser B (Methodol) 44(2):139–77

    MathSciNet  MATH  Google Scholar 

  • Appice A, Ciampi A, Malerba D (2015) Summarizing numeric spatial data streams by trend cluster discovery. Data Min Knowl Discov 29(1):84–136

    Article  MathSciNet  Google Scholar 

  • Arroyo J, Maté C (2009) Forecasting histogram time series with k-nearest neighbours methods. Int J Forecast. https://doi.org/10.1016/j.ijforecast.2008.07.003

    Article  Google Scholar 

  • Balzanella A, Rivoli L, Verde R (2013) Data stream summarization by histograms clustering. In: Giudici P, Ingrassia S, Vichi M (eds) Statistical models for data analysis. Springer, Berlin, pp 27–35

    Chapter  Google Scholar 

  • Balzanella A, Romano E, Verde R (2017) Modified half-region depth for spatially dependent functional data. Stoch Environ Res Risk Assess 31:87. https://doi.org/10.1007/s00477-016-1291-x

    Article  MATH  Google Scholar 

  • Barnes RJ, Johnson TB (1984) Positive kriging. Verley G, David M, Journal AG, Marechal A(eds) Geostatistics for natural resources characterization. Springer, Berlin, pp 231–244

    Chapter  Google Scholar 

  • Bigot J, Gouet R, Klein T, López A (2017) Geodesic PCA in the Wasserstein space by convex PCA. Ann Inst Henri Poincare Probab Stat 53(1):1–26

    Article  MathSciNet  Google Scholar 

  • Billard L, Diday E (2003) From the statistics of data to the statistics of knowledge: symbolic data analysis. J Am Stat Assoc 98(462):470–487

    Article  MathSciNet  Google Scholar 

  • Bock HH, Diday E (2000) Analysis of symbolic data: exploratory methods for extracting statistical information from complex data. Springer, Berlin

    Book  Google Scholar 

  • Boissard E, Le Gouic T, Loubes JM (2015) Distribution’s template estimate with Wasserstein metrics. Bernoulli 21(2):740–759. https://doi.org/10.3150/13-BEJ585

    Article  MathSciNet  MATH  Google Scholar 

  • Boogaart KG, Egozcue JJ, Pawlowsky-Glahn V (2014) Bayes Hilbert spaces. Aust N Z J Stat 56(2):171–194

    Article  MathSciNet  Google Scholar 

  • Brito P (2014) Symbolic data analysis: another look at the interaction of data mining and statistics. WIREs Data Min Knowl Discov 4(4):281–295

    Article  Google Scholar 

  • Caballero W, Giraldo R, Mateu J (2013) A universal kriging approach for spatial functional data. Stoch Environ Res Risk Assess 27:1553–1563

    Article  Google Scholar 

  • Chiles JP, Delfiner P (2012) Geostatististics, modelling spatial uncertainty, 2nd edn. Wiley-Interscience, New York

    Book  Google Scholar 

  • Cressie N (1993) Statistics for spatial data. Wiley, Hoboken

    Book  Google Scholar 

  • Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, New York

    MATH  Google Scholar 

  • Cuturi M, Doucet A (2014) Fast computation of Wasserstein barycenters. In: Proceedings of the 31st international conference on machine learning, PMLR, vol 32(2), pp 685–693

  • Del Barrio E, Cuesta-Albertos JA, Matrán C, Mayo-Íscar A (2018) Robust clustering tools based on optimal transportation. Stat Comput. https://doi.org/10.1007/s11222-018-9800-z

    Article  MATH  Google Scholar 

  • Delicado P, Giraldo R, Comas C, Mateu J (2010) Statistics for spatial functional data: some recent contributions. Environmetrics 21(3–4):224–239

    MathSciNet  Google Scholar 

  • Dias S, Brito P (2013) Linear regression model with histogram-valued variables. Stat Anal Data Min 8(2):75–113

    Article  MathSciNet  Google Scholar 

  • Ding Q, Ding Q, Perrizo W (2002) Decision tree classification of spatial data streams using Peano count trees. In: Proceedings of the 2002 ACM symposium on applied computing. (SAC’02). ACM, New York, NY, USA, 413–417. https://doi.org/10.1145/508791.508870

  • Ganguly AR, Gama J, Omitaomu OA, Gaber M, Vatsavai RR (2008) Knowledge discovery from sensor data. CRC Press, Boca Raton

    Book  Google Scholar 

  • Giraldo R, Delicado P, Mateu J (2011) Ordinary kriging for function-valued spatial data. Environ Ecol Stat 18(3):411–426

    Article  MathSciNet  Google Scholar 

  • González-Rivera G, Arroyo J (2012) Time series modeling of histogram-valued data: the daily histogram time series of S&P500 intradaily returns. Int J Forecast 28(1):20–33

    Article  Google Scholar 

  • Gouet R, López A, Ortiz JM (2015) Geodesic kriging in the Wasserstein space. In: Schaeben H, Tolosana-Delgado R, van den Boogaart KG, van den Boogaart R (eds) Proceedings of the 17th annual Conference of the international association for mathematical geosciences IAMG 2015

  • Ignaccolo R, Mateu J, Giraldo R (2014) Kriging with external drift for functional data for air quality monitoring. Stoch Environ Res Risk Assess 28:1171–1186. https://doi.org/10.1007/s00477-013-0806-y

    Article  Google Scholar 

  • Irpino A, Romano E (2007) Optimal histogram representation of large data sets: Fisher vs piecewise linear approximation. In: Noirhomme-Fraiture M, Venturini G (eds) EGC, Revue des Nouvelles Technologies de lInformation, vol RNTI-E-9, pp 99–110

  • Irpino A, Verde R (2006) A new Wasserstein based distance for the hierarchical clustering of histogram symbolic data. In: Batagelj V, Bock HH, Ferligoj A, Žiberna A (eds) Data science and classification, proceedings of the IFCS 2006. Springer, Berlin, pp 185-192

  • Irpino A, Verde R (2015a) Basic statistics for distributional symbolic variables: a new metric-based approach. Adv Data Anal Classif 9(2):143–175

    Article  MathSciNet  Google Scholar 

  • Irpino A, Verde R (2015b) Regression for numeric symbolic variables: a least squares approach based on Wasserstein distance. Adv Data Anal Classif 9:81–106 ISSN: 1862-5347

  • Journel AG, Huijbregts CJ (2004) Mining geostatistics. The Blackburn Press, Caldwell

    Google Scholar 

  • Matheron G (1963) Principles of geostatistics. Econ Geol 58(8):1246

    Article  Google Scholar 

  • Menafoglio A, Petris G (2016) Kriging for Hilbert-space valued random fields: the operatorial point of view. J Multivar Anal 146(2016):84–94

    Article  MathSciNet  Google Scholar 

  • Menafoglio A, Secchi P (2017) Statistical analysis of complex and spatially dependent data: a review of Object Oriented Spatial Statistics. Eur J Oper Res 258(2):401–410

    Article  MathSciNet  Google Scholar 

  • Menafoglio A, Secchi P, Dalla Rosa M (2013) A universal kriging predictor for spatially dependent functional data of a Hilbert space. Electron J Stat 7:2209–2240

    Article  MathSciNet  Google Scholar 

  • Menafoglio A, Guadagnini A, Secchi P (2014) A kriging approach based on Aitchison geometry for the characterization of particle-size curves in heterogeneous aquifers. Stoch Environ Res Risk Assess 28:183–1851

    Article  Google Scholar 

  • Montero JM, Fernandez-Aviles G, Mateu J (2015) An introduction to functional geostatistics. In: Montero J, Fernández-Avilés G, Mateu J (eds) Spatial and spatio-temporal geostatistical modeling and kriging. Wiley, New York, pp 274–294

    Chapter  Google Scholar 

  • Panaretos VM, Zemel Y (2016) Amplitude and phase variation of point processes. Ann Stat 44(2):771–812

    Article  MathSciNet  Google Scholar 

  • Pigoli D, Menafoglio A, Secchi P (2016) Kriging prediction for manifold valued random field. J Multivar Anal 145:117–131

    Article  MathSciNet  Google Scholar 

  • Ramirez D, Via J, Santamaria I, Scharf LL (2010) Detection of spatially correlated Gaussian time series. IEEE Trans Signal Process 58(10):5006–5015

    Article  MathSciNet  Google Scholar 

  • Rubner Y, Tomasi C, Guibas LJ (2000) The Earth mover’s distance as a metric for image retrieval. Int J Comput Vis 40:99–121

    Article  Google Scholar 

  • Rushendorff L (2001) Wasserstein metric. In: Encyclopedia of mathematics. Springer, Berlin

  • Terrell GR, Scott DW (1985) Oversmoothed nonparametric density estimates. J Am Stat Assoc 80:209–214

    Article  MathSciNet  Google Scholar 

  • Tobler W (1970) A computer movie simulating urban growth in the Detroit region. Econ Geogr 46(2):234–240

    Article  Google Scholar 

  • Verde R, Irpino A (2007) Dynamic clustering of histogram data: using the right metric. In: Brito P, Cucumel G, Bertrand P, de Carvalho F (eds) Selected contributions in data analysis and classification. Springer, Berlin, pp 123–134

    Chapter  Google Scholar 

  • Villani C (2003) Topics in optimal transportation. Graduate Studies in Mathematics, vol 58. American Mathematical Society, Providence

  • Wackernagel H (2003) Multivariate geostatistics. Springer, Berlin

    Book  Google Scholar 

  • Wei LY, Peng WC (2013) An incremental algorithm for clustering spatial data streams: exploring temporal locality. Knowl Inf Syst 37(2):453–483

    Article  Google Scholar 

  • Zemel Y, Panaretos VM (2019) Fréchet means and procrustes analysis in Wasserstein space. Bernoulli 25(2):932–976. https://projecteuclid.org/euclid.bj/1551862840

  • Zhang P, Huang Y, Shekhar S, Kumar V (2003a) Correlation analysis of spatial time series datasets: a filter-and-refine approach. In: Proceedings of the 7th Pacific-Asia conference on knowledge discovery and data mining

  • Zhang P, Huang Y, Shekhar S, Kumar V, (2003b) Exploiting spatial autocorrelation to efficiently process correlation-based similarity queries. In: Hadzilacos T, Manolopoulos Y, Roddick J, Theodoridis Y (eds) Advances in spatial and temporal databases. SSTD, (2003) Lecture Notes in Computer Science, vol 2750. Springer, Berlin

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Antonio Balzanella.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Balzanella, A., Irpino, A. Spatial prediction and spatial dependence monitoring on georeferenced data streams. Stat Methods Appl 29, 101–128 (2020). https://doi.org/10.1007/s10260-019-00462-0

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10260-019-00462-0

Keywords

Navigation