Abstract
This paper deals with the analysis of data streams recorded by georeferenced sensors. We focus on two problems: measuring the spatial dependence among the observations recorded over time, and predicting the data distribution at locations where no sensor record is available. The proposed strategy consists of two main steps: an online step summarizes the incoming data records by histograms; an offline step measures the spatial dependence and performs the spatial prediction. The main novelties are the introduction of the variogram and of kriging for histogram data. With these new tools we can monitor the spatial dependence and perform the prediction starting from histogram data rather than from sensor records. The effectiveness of the proposal is evaluated on real and simulated data.
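The two-step strategy can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it covers only the summarization and the variogram, not the kriging predictor. The abstract does not say which dissimilarity between histograms is used; the sketch assumes the squared L2 Wasserstein distance, approximated on a common quantile grid. All function names (`summarize_window`, `sq_wasserstein`, `empirical_variogram`) and the choice of equi-depth histograms are hypothetical.

```python
import numpy as np


def summarize_window(records, n_bins=10):
    """Online step (sketch): summarize one window of sensor records by an
    equi-depth histogram, stored as its n_bins + 1 quantile values."""
    levels = np.linspace(0.0, 1.0, n_bins + 1)
    return np.quantile(np.asarray(records, dtype=float), levels)


def sq_wasserstein(q1, q2):
    """Approximate squared L2 Wasserstein distance between two histograms
    given on the same uniform quantile grid (trapezoidal rule on [0, 1])."""
    diff2 = (np.asarray(q1) - np.asarray(q2)) ** 2
    return float(np.mean((diff2[:-1] + diff2[1:]) / 2.0))


def empirical_variogram(coords, hists, bin_edges):
    """Offline step (sketch): for each lag bin [edge_k, edge_{k+1}), half the
    mean squared distance between the histograms of all sensor pairs whose
    spatial separation falls in that bin. Empty bins yield NaN."""
    coords = np.asarray(coords, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    sums = np.zeros(len(bin_edges) - 1)
    counts = np.zeros(len(bin_edges) - 1, dtype=int)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            lag = np.linalg.norm(coords[i] - coords[j])
            k = np.searchsorted(bin_edges, lag, side="right") - 1
            if 0 <= k < len(sums):
                sums[k] += sq_wasserstein(hists[i], hists[j])
                counts[k] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        gamma = np.where(counts > 0, sums / (2.0 * counts), np.nan)
    return gamma, counts
```

In this sketch the online step stores only `n_bins + 1` numbers per sensor and window, so the offline step never needs the original records. A variogram that increases with the lag would indicate that nearby sensors record more similar distributions than distant ones.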
Cite this article
Balzanella, A., Irpino, A. Spatial prediction and spatial dependence monitoring on georeferenced data streams. Stat Methods Appl 29, 101–128 (2020). https://doi.org/10.1007/s10260-019-00462-0
DOI: https://doi.org/10.1007/s10260-019-00462-0