Spatial model fitting for large datasets with applications to climate and microarray problems
Abstract
Many problems in the environmental and biological sciences involve the analysis of large quantities of data. Further, the data in these problems are often subject to various types of structure and, in particular, spatial dependence. Traditional model fitting often fails because of the size of the datasets, since it is difficult not only to specify but also to compute with the full covariance matrix describing the spatial dependence. We propose a very general type of mixed model that has a random spatial component. Recognizing that spatial covariance matrices often exhibit a large number of zero or near-zero entries, we use covariance tapering to force near-zero entries to zero. Then, taking advantage of the sparse nature of such tapered covariance matrices, we use backfitting to estimate the fixed and random model parameters. The novelty of the paper is the combination of the two techniques, tapering and backfitting, to model and analyze spatial datasets several orders of magnitude larger than those typically analyzed with conventional approaches. Results are demonstrated with two datasets. The first consists of regional climate model output from an experiment with two regional and two driver models arranged in a two-by-two layout. The second is microarray data used to build a profile of differentially expressed genes relating to cerebral vascular malformations, an important cause of hemorrhagic stroke and seizures.
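To make the combination of tapering and backfitting concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a model of the form y = Xβ + h + ε, with h a zero-mean spatial effect and ε independent noise. The exponential covariance, the Wendland-type taper, and the function names `tapered_covariance` and `backfit` are illustrative assumptions. The variance parameters are treated as known here, which the abstract does not state.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu
from scipy.spatial import cKDTree


def tapered_covariance(coords, range_, taper_range, sill=1.0):
    """Sparse covariance: exponential model times a compactly supported taper.

    Both the exponential model and the Wendland-type taper are assumptions
    chosen for illustration. Pairs farther apart than taper_range are never
    stored, so they are exactly zero in the result.
    """
    n = len(coords)
    # Index pairs (i < j) whose distance is at most taper_range.
    pairs = cKDTree(coords).query_pairs(taper_range, output_type="ndarray")
    i, j = pairs[:, 0], pairs[:, 1]
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    r = d / taper_range
    taper = (1.0 - r) ** 4 * (4.0 * r + 1.0)  # vanishes at r = 1
    vals = sill * np.exp(-d / range_) * taper
    upper = sparse.coo_matrix((vals, (i, j)), shape=(n, n))
    # Symmetrize and add the diagonal (distance zero, taper equal to one).
    return (upper + upper.T + sill * sparse.identity(n)).tocsc()


def backfit(y, X, cov, noise_var, n_iter=200, tol=1e-8):
    """Alternate between the fixed effects beta and the spatial effect h."""
    n = len(y)
    # Factor the sparse matrix (cov + noise_var * I) once and reuse it.
    solver = splu((cov + noise_var * sparse.identity(n)).tocsc())
    beta = np.zeros(X.shape[1])
    h = np.zeros(n)
    for _ in range(n_iter):
        # Fixed effects: least squares on the data minus the spatial effect.
        beta_new = np.linalg.lstsq(X, y - h, rcond=None)[0]
        # Spatial effect: smooth the residuals with the tapered covariance.
        h = cov @ solver.solve(y - X @ beta_new)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, h
```

A call such as `backfit(y, X, tapered_covariance(coords, 0.1, 0.2), noise_var=1.0)` returns the fixed-effect estimate and the fitted spatial effect. At convergence the fixed-effect estimate coincides with the generalized least squares estimate under the tapered covariance, while every linear solve involves only the sparse matrix.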
Keywords
Mixed effects Backfitting Covariance Tapering Sparse matrices