Mathematical Geosciences

Volume 47, Issue 7, pp 791–817

Multivariate Imputation of Unequally Sampled Geological Variables

  • Ryan M. Barnett
  • Clayton V. Deutsch


Unequally sampled data pose a practical and significant problem for geostatistical modeling. Multivariate transformations are frequently applied in modeling workflows to reproduce the multivariate relationships of geological data. Unfortunately, these transformations may only be applied to data observations that sample all of the variables. In the case of unequal sampling, practitioners must decide between excluding incomplete observations and imputing (inferring) the missing values. While imputation is recommended by missing data theorists, the use of deterministic methods such as regression is generally discouraged. Instead, techniques such as multiple imputation (MI) are advocated to increase the accuracy, decrease the bias, and capture the uncertainty of imputed values. As missing data theory has received little attention within geostatistical literature and practice, MI has not been adapted from its conventional form to be suitable for geological data. To address this, geostatistical algorithms are integrated within an MI framework to produce parametric and non-parametric methods. Synthetic and geometallurgical case studies are used to demonstrate the feasibility of each method, where techniques that use both spatial and colocated information are shown to outperform the alternatives.
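The contrast drawn above between deterministic regression and multiple imputation can be illustrated with a minimal sketch. This is not the authors' geostatistical method: it assumes a single colocated variable `x`, a linear relationship, and independent Gaussian residuals, and it ignores both spatial correlation and parameter uncertainty, which a full MI procedure would also account for. The function names and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def regression_impute(x, y):
    """Deterministic imputation: fill missing y with the fitted regression line."""
    obs = ~np.isnan(y)
    slope, intercept = np.polyfit(x[obs], y[obs], 1)
    filled = y.copy()
    filled[~obs] = intercept + slope * x[~obs]
    return filled


def multiple_impute(x, y, n_imputations=20, seed=0):
    """Simplified MI: regression prediction plus a random residual, repeated.

    Assumption: residuals are i.i.d. Gaussian and the fitted parameters are
    treated as known, so this understates the uncertainty of a full MI scheme.
    """
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    slope, intercept = np.polyfit(x[obs], y[obs], 1)
    resid = y[obs] - (intercept + slope * x[obs])
    sigma = resid.std(ddof=2)  # two fitted parameters
    n_missing = int((~obs).sum())
    realizations = np.tile(y, (n_imputations, 1))
    for k in range(n_imputations):
        noise = rng.normal(0.0, sigma, n_missing)
        realizations[k, ~obs] = intercept + slope * x[~obs] + noise
    return realizations


# Synthetic example (hypothetical data): y depends on x, 40% of y is missing.
rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.6, size=500)
missing = rng.random(500) < 0.4
y[missing] = np.nan

det = regression_impute(x, y)   # one smoothed data set
mi = multiple_impute(x, y)      # 20 alternative data sets
```

In this sketch the deterministic values all fall on the regression line, so their variance is lower than that of the data they replace, whereas each MI realization restores the residual scatter and the spread across realizations gives a measure of imputation uncertainty.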


Missing data analysis · Statistics · Geostatistics · Modeling



This research was supported by the Natural Sciences and Engineering Research Council of Canada and industry sponsors of the Centre for Computational Geostatistics. The authors wish to acknowledge and thank the anonymous reviewers whose comments improved the final manuscript.



Copyright information

© International Association for Mathematical Geosciences 2015

Authors and Affiliations

  1. Department of Civil and Environmental Engineering, Centre for Computational Geostatistics, University of Alberta, Edmonton, Canada
