Abstract
Spatial data are often sparse and may have been collected in the process of confirming the profitability of a mining venture or investigating a contaminated site. In such situations, it is common to have measurements preferentially taken in the most critical areas (sweet spots, allegedly contaminated areas), thus conditionally biasing the sample. While preferential sampling makes good practical sense, its direct use leads to distorted sample moments and percentiles. Spatial clustering is a problem that has been identified in the past and addressed with approaches ranging from ad hoc solutions to highly elaborate mathematical formulations, covering mostly the effect of clustering on the cumulative frequency distribution. The method proposed here is a form of resampling that is free of special assumptions, does not assign weights to the measurements, does not find solutions by successive approximation, and provides variability in the results. The new method is illustrated with a synthetic dataset that has an exponential semivariogram and was purposely generated to follow a lognormal distribution. The lognormal distribution is both difficult to work with and typical of many attributes of practical interest. Testing of the new solution shows that sample subsets derived from resampled datasets can closely approximate the true probability distribution and the semivariogram, clearly outperforming the original preferentially sampled data.
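The distortion of sample moments and percentiles under preferential sampling can be seen in a small numerical experiment. The sketch below is not the resampling method proposed in the paper. It only contrasts a uniform random sample with a preferential one drawn from the same lognormal population. The population size, the lognormal parameters, the sample size, and the rule that selection probability is proportional to the value are all assumptions made for illustration, and spatial correlation is ignored.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical exhaustive "population": 10,000 lognormal values
# (mu = 0, sigma = 1 on the log scale; illustrative parameters only).
population = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

# Uniform random sample: every value is equally likely to be measured.
uniform_sample = rng.sample(population, 200)

# Preferential sample: high values are more likely to be measured.
# Selection probability proportional to the value itself is an
# assumption of this sketch, not a rule taken from the paper.
preferential_sample = rng.choices(population, weights=population, k=200)

true_mean = statistics.fmean(population)
uniform_mean = statistics.fmean(uniform_sample)
preferential_mean = statistics.fmean(preferential_sample)

true_median = statistics.median(population)
preferential_median = statistics.median(preferential_sample)

print(f"mean:   true={true_mean:.2f}  uniform={uniform_mean:.2f}  "
      f"preferential={preferential_mean:.2f}")
print(f"median: true={true_median:.2f}  preferential={preferential_median:.2f}")
```

Under these assumptions the preferential sample overstates both the mean and the median of the population, which is the kind of bias that declustering or resampling is meant to correct.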
Acknowledgments
The author wishes to thank Mark Engle (U.S. Geological Survey), Michael Pyrcz (Chevron Energy Technology Company), John Schuenemeyer (Southwest Statistical Consulting) and an anonymous reviewer appointed by the journal for reviewing an earlier version of the manuscript and making suggestions to improve its contents.
Cite this article
Olea, R.A. Resampling of spatially correlated data with preferential sampling for the estimation of frequency distributions and semivariograms. Stoch Environ Res Risk Assess 31, 481–491 (2017). https://doi.org/10.1007/s00477-016-1289-4
DOI: https://doi.org/10.1007/s00477-016-1289-4