Statistics and Computing, Volume 19, Issue 2, pp 113–128

Spatial model fitting for large datasets with applications to climate and microarray problems

Abstract

Many problems in the environmental and biological sciences involve the analysis of large quantities of data. Further, the data in these problems are often subject to various types of structure and, in particular, spatial dependence. Traditional model fitting often fails because of the size of the datasets: it is difficult not only to specify but also to compute with the full covariance matrix describing the spatial dependence. We propose a very general type of mixed model that has a random spatial component. Recognizing that spatial covariance matrices often exhibit a large number of zero or near-zero entries, we use covariance tapering to force near-zero entries to zero. Then, taking advantage of the sparse nature of such tapered covariance matrices, we use backfitting to estimate the fixed and random model parameters. The novelty of the paper is the combination of the two techniques, tapering and backfitting, to model and analyze spatial datasets several orders of magnitude larger than those typically analyzed with conventional approaches. Results are demonstrated with two datasets. The first consists of regional climate model output that is based on an experiment with two regional and two driver models arranged in a two-by-two layout. The second is microarray data used to build a profile of differentially expressed genes relating to cerebral vascular malformations, an important cause of hemorrhagic stroke and seizures.
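
The combination of tapering and backfitting described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the exponential covariance, the Wendland-type taper, all parameter values, the synthetic one-dimensional data, and the conjugate-gradient solver (standing in for a sparse matrix library) are assumptions chosen only to keep the example self-contained.

```python
import math
import random

def exp_cov(d, range_par=0.3):
    """Exponential covariance with unit sill (an assumed model for this sketch)."""
    return math.exp(-d / range_par)

def wendland_taper(d, theta=0.2):
    """Compactly supported Wendland-type taper: exactly zero for d >= theta."""
    r = d / theta
    return (1.0 - r) ** 4 * (1.0 + 4.0 * r) if r < 1.0 else 0.0

def tapered_cov(locs, theta=0.2):
    """Tapered covariance as a list of sparse rows {column: value}."""
    rows = []
    for si in locs:
        row = {}
        for j, sj in enumerate(locs):
            t = wendland_taper(abs(si - sj), theta)
            if t > 0.0:  # store only the non-zero entries
                row[j] = exp_cov(abs(si - sj)) * t
        rows.append(row)
    return rows

def matvec(rows, v):
    return [sum(a * v[j] for j, a in row.items()) for row in rows]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_solve(rows, b, shift, tol=1e-12, max_iter=200):
    """Conjugate gradients for (C + shift * I) x = b, using only sparse products."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(b)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs < tol * tol:
            break
        Ap = [a + shift * pi for a, pi in zip(matvec(rows, p), p)]
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def ols_line(x, z):
    """Least-squares intercept and slope of z on x."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    return zbar - slope * xbar, slope

def backfit(x, y, rows, sigma2, tol=1e-10, max_iter=1000):
    """Alternate between the fixed effects (b0, b1) and the spatial effect h."""
    h = [0.0] * len(y)
    b0, b1 = 0.0, 0.0
    for it in range(max_iter):
        new_b0, new_b1 = ols_line(x, [yi - hi for yi, hi in zip(y, h)])
        resid = [yi - new_b0 - new_b1 * xi for xi, yi in zip(x, y)]
        # h = C (C + sigma2 I)^{-1} resid, computed with sparse operations only
        h = matvec(rows, cg_solve(rows, resid, sigma2))
        change = max(abs(new_b0 - b0), abs(new_b1 - b1))
        b0, b1 = new_b0, new_b1
        if change < tol:
            return b0, b1, h, it + 1
    return b0, b1, h, max_iter

# Synthetic 1-D example (made-up data, for illustration only)
rng = random.Random(0)
n = 40
locs = [i / (n - 1) for i in range(n)]
x = [math.cos(7.0 * s) for s in locs]
y = [2.0 + 3.0 * xi + math.sin(6.0 * s) + rng.gauss(0.0, 0.3) for xi, s in zip(x, locs)]
rows = tapered_cov(locs)
b0, b1, h, n_iter = backfit(x, y, rows, sigma2=1.0)
nnz = sum(len(row) for row in rows)
print(f"non-zeros: {nnz} of {n * n}; b0={b0:.3f}, b1={b1:.3f}, iterations={n_iter}")
```

Because the taper is exactly zero beyond `theta`, each row stores only its near neighbours, and every step of the backfitting loop touches the covariance only through sparse matrix-vector products. At convergence the fixed effects coincide with the generalized least squares estimate under the tapered covariance. In practice one would likely replace the hand-written solver with a sparse Cholesky factorization from a dedicated sparse matrix package.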

Keywords

Mixed effects · Backfitting · Covariance · Tapering · Sparse matrices

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Mathematical and Computer Sciences, Colorado School of Mines, Golden, USA
  2. Geophysical Statistics Project, National Center for Atmospheric Research, Boulder, USA
