Parallel Geostatistics for Sparse and Dense Datasets

Ingram, Ben; Cornford, Dan

doi:10.1007/978-90-481-2322-3_32

Chapter
First Online: 01 January 2010

Part of the book series: Quantitative Geology and Geostatistics (QGAG, volume 16)

Abstract

Very large spatially referenced datasets, for example those derived from satellite-based sensors that sample across the globe or from large monitoring networks of individual sensors, are becoming increasingly common and more widely available for use in environmental decision making. In large or dense sensor networks, huge quantities of data can be collected over short time periods. In many applications the generation of maps, or of predictions at specific locations, from the data in (near) real time is crucial. Geostatistical operations such as interpolation are vital in this map-generation process, and in emergency situations the resulting predictions need to be available almost instantly so that decision makers can make informed decisions and define risk and evacuation zones. It is also helpful in less time-critical applications, for example when interacting directly with the data for exploratory analysis, that the algorithms respond within a reasonable time frame. Performing geostatistical analysis on such large spatial datasets can present a number of problems, particularly where maximum likelihood methods are used. Although the storage requirements scale only linearly with the number of observations in the dataset, the memory and computation time required scale quadratically and cubically, respectively. Most modern commodity hardware has at least two processor cores. Other mechanisms for parallel computation, such as Grid-based systems, are also becoming increasingly available. However, there currently seems to be little interest in exploiting this extra processing power within the context of geostatistics. In this paper we review the existing parallel approaches for geostatistics.
By recognising that different natural parallelisms exist and can be exploited depending on whether the dataset is sparsely or densely sampled with respect to the range of variation, we introduce two contrasting novel implementations of parallel algorithms based on approximating the data likelihood, extending the methods of Vecchia (1988) and Tresp (2000). Using parallel maximum likelihood variogram estimation and parallel prediction algorithms, we show that computational time can be significantly reduced. We demonstrate this with both sparsely sampled and densely sampled data on a variety of architectures, ranging from the common dual-core processor found in many modern desktop computers to large multi-node supercomputers. To highlight the strengths and weaknesses of the different methods we employ synthetic datasets, and we go on to show how the methods allow maximum-likelihood-based inference on the exhaustive Walker Lake data set.
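
As an illustration of the kind of likelihood approximation referred to above, the following is a minimal sketch and not the authors' implementation. It assumes a zero-mean Gaussian process with an exponential covariance, a fixed nugget, and a simple ordering of the locations by their first coordinate. The exact log-likelihood requires a Cholesky factorisation of the full n x n covariance matrix (quadratic memory, cubic time). The approximation, in the spirit of block-wise variants of Vecchia (1988), replaces it with a sum of terms log p(y_k | y_{k-1}) over ordered blocks, each of which needs only small matrices and can be evaluated independently, here with a thread pool. All function names, parameter values, and the blocking scheme are hypothetical choices made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def exp_cov(xa, xb, sill=1.0, length=0.2):
    """Exponential covariance between two sets of 2-D locations (assumed model)."""
    dist = np.linalg.norm(xa[:, None, :] - xb[None, :, :], axis=-1)
    return sill * np.exp(-dist / length)


def gauss_loglik(y, cov):
    """Log density of a zero-mean Gaussian vector y; the Cholesky step is O(n^3)."""
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, y)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(chol)).sum()
            - 0.5 * len(y) * np.log(2.0 * np.pi))


def exact_loglik(x, y, nugget=1e-2):
    """Exact log-likelihood using the full n x n covariance matrix."""
    return gauss_loglik(y, exp_cov(x, x) + nugget * np.eye(len(y)))


def block_term(task):
    """log p(y_k | y_prev) for one block, conditioning on the previous block only."""
    x_k, y_k, x_prev, y_prev, nugget = task
    c_kk = exp_cov(x_k, x_k) + nugget * np.eye(len(y_k))
    if x_prev is None:  # first block has nothing to condition on
        return gauss_loglik(y_k, c_kk)
    c_pp = exp_cov(x_prev, x_prev) + nugget * np.eye(len(y_prev))
    c_kp = exp_cov(x_k, x_prev)
    weights = np.linalg.solve(c_pp, c_kp.T)   # C_pp^{-1} C_pk
    cond_mean = weights.T @ y_prev
    cond_cov = c_kk - c_kp @ weights          # Schur complement
    return gauss_loglik(y_k - cond_mean, cond_cov)


def approx_loglik(x, y, n_blocks, nugget=1e-2, workers=2):
    """Sum of block-conditional terms; the terms are independent, so they run in parallel."""
    order = np.argsort(x[:, 0])  # crude ordering so that adjacent blocks are neighbours
    xb = np.array_split(x[order], n_blocks)
    yb = np.array_split(y[order], n_blocks)
    tasks = [
        (xb[k], yb[k],
         xb[k - 1] if k else None,
         yb[k - 1] if k else None,
         nugget)
        for k in range(n_blocks)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(block_term, tasks))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 400
    x = rng.uniform(size=(n, 2))
    cov = exp_cov(x, x) + 1e-2 * np.eye(n)
    y = np.linalg.cholesky(cov) @ rng.standard_normal(n)  # synthetic sample
    print("exact      :", exact_loglik(x, y))
    print("8 blocks   :", approx_loglik(x, y, n_blocks=8))
```

With one or two blocks this sketch reproduces the exact log-likelihood, because p(y_1) p(y_2 | y_1) is the full joint density. With more blocks it trades accuracy for smaller matrices and more parallel work, and how much accuracy is lost depends on the block size relative to the range of variation.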

References

Cressie NAC (1993) Statistics for spatial data. Wiley, New York
David M (1976) The practice of kriging. Adv Geostatistics in the Min Ind, 31:461
Davis MW, Culhane PG (1984) Contouring very large data sets using kriging. Geostatistics for Nat Resour Characterization 2:599–619
Emery X, Lantuéjoul C (2006) TBSIM: a computer program for conditional simulation of three-dimensional Gaussian random fields via the turning bands method. Comput Geosci, 32(10):1615–1628
Fernández J, Anguita M, Mota S, Cañas A, Ortigosa E, Rojas FJ (2004) MPI toolbox for octave. In: Proceedings of 6th international conference on high performance computing for computational science, Valencia, Spain, 2004. Springer, Berlin Heidelberg
Gebhardt A (2003) PVM kriging with R. In: Proceedings of the 3rd international workshop on distributed statistical computing, Vienna
Haas TC (1995) Local prediction of a spatio-temporal process with an application to wet sulfate deposition. J Am Stat Assoc, 90(432):1189–1199
Isaaks EH, Srivastava RM (1989) An introduction to applied geostatistics. Oxford University Press, New York
Kerry KE, Hawick KA (1998) Kriging interpolation on high-performance computers. In: Proceedings of the international conference and exhibition on high-performance computing and networking. Springer Berlin, Heidelberg, pp 429–438
Pardo-Igúzquiza E, Dowd PA (1997) AMLE3D: a computer program for the inference of spatial covariance parameters by approximate maximum likelihood estimation. Comput Geosci, 23(7):793–805
Pedelty JA, Schnase JL, Smith JA (2003) High performance geostatistical modeling of biospheric resources in the Cerro Grande Wildfire Site, Los Alamos, New Mexico and Rocky Mountain National Park, Colorado. NASA Goddard Space Flight Center, Code 930
Schabenberger O, Gotway CA (2005) Statistical methods for spatial data analysis. CRC Press, Boca Raton, FL
Schwaighofer A, Tresp V (2003) Transductive and inductive methods for approximate Gaussian process regression. Adv Neural Inf Process Syst 15:953–960
Stein ML, Chi Z, Welty LJ (2004) Approximating likelihoods for large spatial data sets. J R Stat Soc B 66(2):275–296
Tresp V (2000) A Bayesian committee machine. Neural Comput 12(11):2719–2741
Tresp V (2001) Committee machines. In: Handbook for neural network signal processing, chapter 5. CRC Press, Boca Raton, FL, pp 1–18
Vecchia AV (1988) Estimation and model identification for continuous spatial processes. J R Stat Soc B Met 50(2):297–312

Acknowledgements

This work is funded by the European Commission under the Sixth Framework Programme, through Contract No. 033811 with DG INFSO, Action Line IST-2005-2.5.12 ICT for Environmental Risk Management. The views expressed herein are those of the authors and are not necessarily those of the European Commission.

Author information

Authors and Affiliations

Facultad de Ingeniería, Universidad de Talca, Camino Los Niches, Curicó, Chile
Ben Ingram
Neural Computing Research Group, Aston University, Aston Street, Birmingham, B4 7ET, United Kingdom
Dan Cornford

Corresponding author

Correspondence to Ben Ingram.

Editor information

Editors and Affiliations

School of Geography, University of Southampton, Highfield, Southampton, SO17 1BJ, United Kingdom
P. M. Atkinson
Fac. Science, School of Geosciences, Queen's University Belfast, Belfast, BT7 1NN, United Kingdom
C. D. Lloyd

About this chapter

Cite this chapter

Ingram, B., Cornford, D. (2010). Parallel Geostatistics for Sparse and Dense Datasets. In: Atkinson, P., Lloyd, C. (eds) geoENV VII – Geostatistics for Environmental Applications. Quantitative Geology and Geostatistics, vol 16. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-2322-3_32

DOI: https://doi.org/10.1007/978-90-481-2322-3_32
Published: 04 January 2010
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-2321-6
Online ISBN: 978-90-481-2322-3
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
