Resampling of spatially correlated data with preferential sampling for the estimation of frequency distributions and semivariograms

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Spatial datasets are often sparse and may have been collected while confirming the profitability of a mining venture or investigating a contaminated site. In such situations, measurements are commonly taken preferentially in the most critical areas (sweet spots, allegedly contaminated areas), which conditionally biases the sample. While preferential sampling makes good practical sense, using the data directly leads to distorted sample moments and percentiles. Spatial clustering is a long-recognized problem that has been addressed with approaches ranging from ad hoc solutions to highly elaborate mathematical formulations, most of which cover the effect of clustering on the cumulative frequency distribution. The method proposed here is a form of resampling that is free of special assumptions, does not weight the measurements, does not find solutions by successive approximation, and provides variability in the results. The new method is illustrated with a synthetic dataset that has an exponential semivariogram and was purposely generated to follow a lognormal distribution. The lognormal distribution is both difficult to work with and typical of many attributes of practical interest. Testing of the new solution shows that sample subsets derived from resampled datasets can closely approximate the true probability distribution and the semivariogram, clearly outperforming the original preferentially sampled data.
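The abstract does not spell out the resampling procedure, so the following Python sketch does not attempt to reproduce it. It only illustrates the problem the paper addresses: on a synthetic lognormal field with an exponential covariance, a preferentially drawn sample tends to overstate the mean. Everything specific here is an assumption made for illustration, not taken from the paper: the grid size, the range and variance parameters, the function names, and in particular the preferential design, which is modeled as sampling with probability proportional to the attribute value. The semivariogram function is the standard method-of-moments estimator.

```python
import numpy as np


def simulate_lognormal_field(n=30, range_param=5.0, sigma=1.0, seed=0):
    """Simulate a lognormal field on an n x n grid (illustrative parameters).

    A Gaussian field with exponential covariance
    C(h) = sigma^2 * exp(-h / range_param) is drawn by Cholesky
    factorisation and then exponentiated.
    """
    rng = np.random.default_rng(seed)
    xs, ys = np.meshgrid(np.arange(n), np.arange(n))
    coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = sigma**2 * np.exp(-dist / range_param)
    # Small jitter on the diagonal for numerical stability.
    chol = np.linalg.cholesky(cov + 1e-8 * np.eye(len(coords)))
    gauss = chol @ rng.standard_normal(len(coords))
    return coords, np.exp(gauss)


def draw_sample(values, n_samples, preferential, rng):
    """Return indices of sampled nodes, without replacement.

    Assumed preferential design: selection probability proportional to
    the attribute value. Otherwise, simple random sampling.
    """
    p = values / values.sum() if preferential else None
    return rng.choice(len(values), size=n_samples, replace=False, p=p)


def empirical_semivariogram(coords, values, bin_edges):
    """Method-of-moments semivariogram: mean of 0.5 * (z_i - z_j)^2 per lag bin."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    half_sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    upper = np.triu_indices(len(values), k=1)  # count each pair once
    dist, half_sq = dist[upper], half_sq[upper]
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        in_bin = (dist >= bin_edges[k]) & (dist < bin_edges[k + 1])
        if in_bin.any():
            gamma[k] = half_sq[in_bin].mean()
    return gamma


def compare_sample_means(n_samples=100, n_repeats=200, seed=1):
    """Average sample means over repeated random and preferential draws."""
    _, z = simulate_lognormal_field()
    rng = np.random.default_rng(seed)
    rand = [z[draw_sample(z, n_samples, False, rng)].mean() for _ in range(n_repeats)]
    pref = [z[draw_sample(z, n_samples, True, rng)].mean() for _ in range(n_repeats)]
    return float(z.mean()), float(np.mean(rand)), float(np.mean(pref))


if __name__ == "__main__":
    true_mean, rand_mean, pref_mean = compare_sample_means()
    print(f"exhaustive mean:          {true_mean:.3f}")
    print(f"random-sample mean:       {rand_mean:.3f}")
    print(f"preferential-sample mean: {pref_mean:.3f}")
```

Running the script prints the exhaustive mean of the field next to the average sample means under the two designs. Calling `empirical_semivariogram` on the coordinates and values of a preferential sample, and again on the full grid, gives a comparable view of how the sampling design affects the estimated spatial structure.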

Acknowledgments

The author wishes to thank Mark Engle (U.S. Geological Survey), Michael Pyrcz (Chevron Energy Technology Company), John Schuenemeyer (Southwest Statistical Consulting) and an anonymous reviewer appointed by the journal for reviewing an earlier version of the manuscript and making suggestions to improve its contents.

Author information

Corresponding author

Correspondence to Ricardo A. Olea.

About this article

Cite this article

Olea, R.A. Resampling of spatially correlated data with preferential sampling for the estimation of frequency distributions and semivariograms. Stoch Environ Res Risk Assess 31, 481–491 (2017). https://doi.org/10.1007/s00477-016-1289-4

  • DOI: https://doi.org/10.1007/s00477-016-1289-4
