Resampling of spatially correlated data with preferential sampling for the estimation of frequency distributions and semivariograms

Olea, Ricardo A.

doi:10.1007/s00477-016-1289-4

Original Paper
Published: 09 July 2016

Volume 31, pages 481–491, (2017)
Stochastic Environmental Research and Risk Assessment

Ricardo A. Olea¹

Abstract

Spatial data are often sparse and may have been collected in the process of confirming the profitability of a mining venture or investigating a contaminated site. In such situations, measurements are commonly taken preferentially in the most critical areas (sweet spots, allegedly contaminated areas), thus conditionally biasing the sample. While preferential sampling makes good practical sense, its direct use leads to distorted sample moments and percentiles. Spatial clustering is a problem that has been identified in the past and addressed with approaches ranging from ad hoc solutions to highly elaborate mathematical formulations, covering mostly the effect of clustering on the cumulative frequency distribution. The method proposed here is a form of resampling that is free of special assumptions, does not use weights to adjust the measurements, does not find solutions by successive approximation, and provides variability in the results. The new method is illustrated with a synthetic dataset that has an exponential semivariogram and was purposely generated to follow a lognormal distribution. The lognormal distribution is both difficult to work with and typical of many attributes of practical interest. Testing of the new solution shows that sample subsets derived from resampled datasets can closely approximate the true probability distribution and the semivariogram, clearly outperforming the original preferentially sampled data.
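The distortion of sample moments mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the resampling method proposed in the paper; it only shows the bias that such a method is meant to correct. Everything specific in it is an assumption made for illustration: a one-dimensional grid of 400 nodes, a unit-sill exponential covariance with range parameter 20 for the logarithm of the attribute, a sample of 60 locations, sampling probability proportional to the attribute value as a stand-in for preferential sampling, and the hypothetical function names `simulate_lognormal_field` and `compare_sampling`.

```python
import numpy as np


def simulate_lognormal_field(n=400, corr_range=20.0, seed=0):
    """Simulate a 1-D lognormal field whose logarithm has exponential covariance.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.arange(n, dtype=float)
    # Pairwise distances between grid nodes.
    h = np.abs(x[:, None] - x[None, :])
    # Unit-sill exponential covariance; the matching semivariogram is 1 - cov.
    cov = np.exp(-h / corr_range)
    # Small jitter on the diagonal keeps the factorization numerically stable.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n))
    gauss = chol @ rng.standard_normal(n)
    # Exponentiating a Gaussian field gives a lognormal field.
    return x, np.exp(gauss)


def compare_sampling(z, k=60, seed=1):
    """Return the exhaustive mean, a uniform-sample mean, and a preferential-sample mean."""
    rng = np.random.default_rng(seed)
    n = z.size
    # Uniform random sample: every location is equally likely.
    uniform_idx = rng.choice(n, size=k, replace=False)
    # Assumed preferential design: high-valued locations are more likely to be drawn.
    pref_idx = rng.choice(n, size=k, replace=False, p=z / z.sum())
    return z.mean(), z[uniform_idx].mean(), z[pref_idx].mean()


x, z = simulate_lognormal_field()
true_mean, uniform_mean, pref_mean = compare_sampling(z)
print(f"exhaustive mean:   {true_mean:.3f}")
print(f"uniform sample:    {uniform_mean:.3f}")
print(f"preferential mean: {pref_mean:.3f}")
```

Under these assumptions, the preferential sample over-represents high values, so its mean sits above the exhaustive mean, while the uniform sample has no such systematic bias. Real preferential designs are clustered in space rather than drawn independently, so the sketch understates how strongly the semivariogram can also be affected.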

Acknowledgments

The author wishes to thank Mark Engle (U.S. Geological Survey), Michael Pyrcz (Chevron Energy Technology Company), John Schuenemeyer (Southwest Statistical Consulting), and an anonymous reviewer appointed by the journal for reviewing an earlier version of the manuscript and making suggestions to improve its contents.

Author information

Authors and Affiliations

U.S. Geological Survey, 12201 Sunrise Valley Drive, Mail Stop 956, Reston, VA, 20192, USA
Ricardo A. Olea

Corresponding author

Correspondence to Ricardo A. Olea.

About this article

Cite this article

Olea, R.A. Resampling of spatially correlated data with preferential sampling for the estimation of frequency distributions and semivariograms. Stoch Environ Res Risk Assess 31, 481–491 (2017). https://doi.org/10.1007/s00477-016-1289-4

Published: 09 July 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s00477-016-1289-4
