Abstract
Gaussian process models have been used in applications ranging from machine learning to fisheries management. In the Bayesian framework, the Gaussian process is used as a prior for unknown functions, allowing the data to drive the relationship between inputs and outputs. In our research, we consider a scenario in which response and input data are available from several similar, but not necessarily identical, sources. When little information is known about one or more of the populations, it may be advantageous to model all populations together. We present a hierarchical Gaussian process model with a structure that allows distinct features for each source as well as shared underlying characteristics. Key features and properties of the model are discussed and demonstrated in a number of simulation examples. The model is then applied to a data set consisting of three populations of the rotifer Brachionus calyciflorus Pallas. Specifically, we model the log growth rate of the populations using a combination of lagged population sizes. The various lag combinations are formally compared to obtain the best model inputs. We then formally compare the leading hierarchical Gaussian process model with the inferential results obtained under the independent Gaussian process model.
References
Adler RJ (1981) The geometry of random fields. Wiley, New York
Banerjee S, Gelfand A, Finley AO, Sang H (2008) Gaussian predictive process models for large spatial data sets. J R Stat Soc Ser B 70(4):825–848
Blum M, Riedmiller M (2013) Optimization of Gaussian process hyperparameters using Rprop. In: ESANN 2013, European symposium on artificial neural networks, computational intelligence and machine learning, Belgium, pp 24–26
Denison DGT, Holmes CC, Mallick BK, Smith AFM (2002) Bayesian methods for nonlinear classification and regression. Wiley, New York
De Iorio M, Johnson WO, Müller P, Rosner GL (2009) Bayesian nonparametric nonproportional hazards survival modeling. Biometrics 65:762–771
Fronczyk K, Kottas A (2010) A Bayesian nonparametric modeling framework for developmental toxicity studies. Technical report, University of California, Santa Cruz
Gelfand AE, Ghosh SK (1998) Model choice: a minimum posterior predictive loss approach. Biometrika 85(1):1–11
Gelfand AE, Sahu S (1999) Identifiability, improper priors, and Gibbs sampling for generalized linear models. J Am Stat Assoc 94:247–253
Gomulkiewicz R, Kirkpatrick M (1992) Quantitative genetics and the evolution of reaction norms. Evolution 46(2):390–411
Halbach U (1973) Life table data and population dynamics of the rotifer Brachionus calyciflorus Pallas as influenced by periodically oscillating temperature. In: Effects of temperature on ectothermic organisms. Springer, pp 217–228
Hassan M, Penfei D, Iqbal W, Can W, Wei F, Ba W (2014) Temperature and precipitation climatology assessment over South Asia using the regional climate model (RegCM4.3): an evaluation of the model performance. J Earth Sci Clim Change 5:214. doi:10.4172/2157-7617.1000214
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer-Verlag, p 763
Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109
Hyndman RJ, Ullah MS (2007) Robust forecasting of mortality and fertility rates: a functional data approach. Comput Stat Data Anal 51(10):4942–4956
Kaufman CG, Sain SR (2010) Bayesian functional ANOVA modeling using Gaussian prior distributions. Bayesian Anal 5:123–150
Kottas A, Krnjajić M (2009) Bayesian semiparametric modeling in quantile regression. Scand J Stat 36:297–319
Kottas A, Behseta S, Moorman D, Poynor V, Olson C (2012) Bayesian nonparametric analysis of neuronal intensity rates. J Neurosci Methods 203(1):241–253
Munch SB, Kottas A, Mangel M (2005) Bayesian nonparametric analysis of stock-recruitment relationships. Can J Fish Aquat Sci 62:1808–1821
Neal RM (1996) Bayesian learning for neural networks. Springer, Berlin
NERC Centre for Population Biology, Imperial College (2010) The global population dynamics database version 2. http://www.sw.ic.ac.uk/cpb/cpb/gpdd.html
O’Hagan A (1978) Curve fitting and optimal design for prediction. J R Stat Soc Ser B 40(1):1–42
Poynor V, Kottas A (2015) Nonparametric Bayesian inference for mean residual life functions in survival analysis. arXiv:1411.7481 [stat.ME]
Ramsay JO, Dalzell CJ (1991) Some tools for functional data analysis. J R Stat Soc Ser B 53(3):539–572
Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge
Rodriguez A, ter Horst E (2008) Bayesian dynamic density estimation. Bayesian Anal 3:339–366
Shi J, Wang B, Will E, West R (2012) Mixed-effects Gaussian process functional regression models with application to dose–response curve prediction. Stat Med 31(26):3165–3177
Shi JQ, Choi T (2011) Gaussian process regression analysis for functional data. CRC Press, Taylor and Francis Group, Boca Raton
Shi JQ, Wang B (2008) Curve prediction and clustering with mixtures of Gaussian process functional regression models. Stat Comput 18:267–283
Shi JQ, Murray-Smith R, Titterington DM (2005) Hierarchical Gaussian process mixtures for regression. Stat Comput 15:31–41
Shi JQ, Murray-Smith R, Titterington DM (2007) Gaussian process functional regression modeling for batch data. Biometrics 63:714–723
Acknowledgements
The authors thank Marc Mangel and the Center for Stock Assessment Research for providing an enthusiastic and stimulating work environment as well as impeccable administrative support. S. Munch gratefully acknowledges the NMFS IAM program which provided financial support for Dr. Poynor while conducting this research. The authors also thank Marc Mangel and WhoSeung Lee for their helpful comments on this manuscript.
Additional information
Handling Editor: Pierre Dutilleul.
Appendices
Appendix 1: Rprop algorithm for HGP
We utilized the Rprop algorithm to establish the initial values of the parameters \({{\varvec{\theta }}} = (\tau ^2, \sigma ^2, \phi , \rho )'\) for the MCMC. Specifically, we adapt the algorithm described by Blum and Riedmiller (2013) for the standard Gaussian process regression model to the HGP. In general, the Rprop algorithm is a gradient-based optimization method. We use it to find the parameter set that maximizes the log likelihood of the HGP model with \(R_C = R_\varSigma \) being the squared exponential function. Note that we also assume the same variance parameter for the corresponding covariance functions. As described in Sect. 2.3, we marginalize the HGP over the unknown functions \(f_i(\cdot )\) to obtain the model stated in (4) and (5). The log likelihood of the model is then given by:
The derivative of the log likelihood with respect to the kth parameter, \(\theta _k\), is given by:
Specifically,
The sequential Rprop algorithm for time steps \(t = 1,\ldots ,T\) is as follows:
such that \(\eta ^+ > 1\) and \(0<\eta ^-<1\) are specified. If the sign of the partial derivative of the negative log likelihood with respect to parameter \(\theta _k\) remains unchanged, the step size grows to \(\eta ^+\varDelta _k^{(t-1)}\): \(\theta _k\) continues in the same direction if it is already heading towards the minimum, and otherwise reverses direction with this larger step. If the sign of the partial derivative changes, the step size shrinks to \(\eta ^-\varDelta _k^{(t-1)}\): \(\theta _k\) reverses direction if the minimum was overshot, and otherwise continues in the same direction with this smaller step. If a local minimum has been reached, no change is made. An initial value \(\varDelta _0\) is specified, along with bounds \(\varDelta _{ min }\) and \(\varDelta _{ max }\) that keep the step size bounded.
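To make the sign-based update rule concrete, here is a minimal sketch of a Rprop step. This is an illustrative assumption, not the authors' implementation: the function name, the toy quadratic objective standing in for the negative log likelihood, and the default values of \(\eta ^+\), \(\eta ^-\), \(\varDelta _{min}\), and \(\varDelta _{max}\) are all hypothetical choices.

```python
import numpy as np

def rprop_step(theta, grad, prev_grad, delta,
               eta_plus=1.2, eta_minus=0.5,
               delta_min=1e-6, delta_max=50.0):
    """One Rprop update for minimizing the negative log likelihood.

    Illustrative sketch: parameter names and defaults are assumptions.
    """
    theta, grad, prev_grad, delta = map(np.asarray, (theta, grad, prev_grad, delta))
    new_delta = delta.copy()
    same = grad * prev_grad > 0      # sign unchanged: grow the step size
    flip = grad * prev_grad < 0      # sign changed: shrink the step size
    new_delta[same] = np.minimum(delta[same] * eta_plus, delta_max)
    new_delta[flip] = np.maximum(delta[flip] * eta_minus, delta_min)
    # Each parameter moves opposite the sign of its partial derivative.
    step = -np.sign(grad) * new_delta
    # After a sign flip the stored gradient is zeroed, so the next
    # iteration takes neither the "grow" nor the "shrink" branch.
    next_grad = np.where(flip, 0.0, grad)
    return theta + step, new_delta, next_grad

# Toy objective f(theta) = sum((theta - 3)^2), a stand-in for the
# negative log likelihood of the HGP.
theta = np.array([0.0, 10.0])
delta = np.full(2, 0.1)
prev_grad = np.zeros(2)
for _ in range(200):
    grad = 2 * (theta - 3.0)
    theta, delta, prev_grad = rprop_step(theta, grad, prev_grad, delta)
```

Because only the sign of each partial derivative is used, the method is insensitive to the very different scales of \(\tau ^2\), \(\sigma ^2\), \(\phi \), and \(\rho \), which is the usual motivation for Rprop in this setting.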
Appendix 2: Posterior sampling and predictive algorithms for HGP
This appendix describes the MCMC algorithm used to obtain posterior samples of the model parameters, \((\tau ^2, \sigma ^2, \phi , \rho )\), as well as predictive inference for new responses, \({{\varvec{y}}}^*\), over the vector of new input values, \({{\varvec{x}}}^*_i\). Recall, that the model is given by,
The elements of \(\varPhi \) are given by:
where \(\delta _{ii'}\) is the Kronecker delta function, which is 1 if \(i=i'\) and 0 otherwise. We assume the same squared exponential covariance function for \(C(\cdot )\) and \(\varSigma (\cdot )\). We use a uniform prior for the correlation parameter \(\rho \), and inverse gamma priors, \(\varGamma ^{-1}(a,b)\), with mean \(b/(a-1)\), for the other three parameters: \(p(\tau ^2) \equiv \varGamma ^{-1}(2, b_\tau ), \ \ p(\sigma ^2) \equiv \varGamma ^{-1}(2, b_\sigma ), \ \ p(\phi ) \equiv \varGamma ^{-1}(2, b_\phi )\).
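As an illustration, the block covariance \(\varPhi \) implied by the \(\delta _{ii'}\) structure can be assembled as in the sketch below. Only the block structure (a shared component across populations, a population-specific component, and observation noise on the diagonal) is taken from the text; the specific split into \(\rho \tau ^2 k\) and \((1-\rho )\tau ^2 k\) with a common squared exponential kernel \(k\), and all function names, are assumptions for this sketch rather than the paper's exact parameterization.

```python
import numpy as np

def sq_exp(x, x2, tau2, phi):
    """Squared exponential kernel tau^2 * exp(-phi * (x - x')^2)."""
    d2 = (x[:, None] - x2[None, :]) ** 2
    return tau2 * np.exp(-phi * d2)

def build_phi(x_list, tau2, sigma2, phi, rho):
    """Marginal covariance of the stacked responses from all populations.

    Assumed split for illustration: C = rho*tau2*k (shared across
    populations) and Sigma = (1-rho)*tau2*k (population-specific);
    only the delta_{ii'} block structure comes from the text.
    """
    x_all = np.concatenate(x_list)
    pops = np.concatenate([np.full(len(x), i) for i, x in enumerate(x_list)])
    K = sq_exp(x_all, x_all, 1.0, phi)
    same_pop = (pops[:, None] == pops[None, :]).astype(float)
    Phi = rho * tau2 * K                       # shared component C
    Phi += same_pop * (1 - rho) * tau2 * K     # population-specific Sigma
    Phi += sigma2 * np.eye(len(x_all))         # observation noise
    return Phi

# Two toy populations with two and one observation, respectively.
x_list = [np.array([0.0, 1.0]), np.array([0.5])]
Phi = build_phi(x_list, tau2=2.0, sigma2=0.1, phi=1.0, rho=0.5)
```

Note that entries for pairs of points from different populations contain only the shared component, while within-population entries contain both components, matching the \(\delta _{ii'}\) factor above.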
For the parameter updates, we use the Metropolis–Hastings algorithm (Hastings 1970) as conjugacy is not available. Initial values for \(\tau ^2_0\), \(\sigma ^2_0\), \(\phi _0\), and \(\rho _0\) are obtained using Rprop (see “Appendix 1”), and posterior samples \(t = 1,\ldots ,T\) are obtained via the Metropolis–Hastings algorithm below.
Values of the parameters are proposed by a multivariate normal distribution:
The proposed values are accepted with probability,
Preliminary runs are made to establish the form of the covariance matrix of the proposal distribution, and \(c >0\) is specified to facilitate stability and good mixing of the posterior samples. A burn-in period is discarded and the chain is thinned to obtain approximately independent posterior samples.
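The sampling scheme above can be sketched as follows. This is a hedged illustration, not the authors' code: the toy log posterior stands in for the HGP posterior, and the scaling constant `c`, the proposal covariance, and the burn-in and thinning settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    """Toy stand-in target: independent standard normals.

    In the actual model this would be the HGP log likelihood plus
    the log priors on (tau^2, sigma^2, phi, rho).
    """
    return -0.5 * np.sum(theta ** 2)

def metropolis_hastings(theta0, prop_cov, c=1.0, n_iter=5000,
                        burn=1000, thin=5):
    """Random-walk Metropolis-Hastings with a multivariate normal proposal."""
    L = np.linalg.cholesky(c * prop_cov)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    draws = []
    for t in range(n_iter):
        # Symmetric MVN proposal centered at the current state.
        prop = theta + L @ rng.standard_normal(theta.size)
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        # Keep post-burn-in samples, thinned to reduce autocorrelation.
        if t >= burn and (t - burn) % thin == 0:
            draws.append(theta.copy())
    return np.array(draws)

samples = metropolis_hastings(np.zeros(4), np.eye(4) * 0.5)
```

Because the proposal is symmetric, the Hastings correction cancels and the acceptance probability reduces to the posterior ratio; in practice the positivity constraints on \(\tau ^2\), \(\sigma ^2\), and \(\phi \) would also need to be handled, e.g. by proposing on the log scale.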
Predictive values of new responses for the ith population, \({{\varvec{y}}}_i^*\), over a grid of new input values, \( {{\varvec{X}}}^*\) can easily be obtained by integrating over the posterior samples of the model parameters:
where \(A(x^*_{ij},x^*_{ij'}) = C(x^*_{ij},x^*_{ij'}) + \varSigma (x^*_{ij},x^*_{ij'}) + \delta _{jj'}\sigma ^2\) and \(B(x^*_{ij}, x_{i'j'}) = C(x^*_{ij}, x_{i'j'}) + \delta _{ii'}\varSigma (x^*_{ij}, x_{i'j'})\). Therefore, for each posterior sample of the model parameters, \({{\varvec{\theta }}}^{(t)}\), we can obtain a sample of the predicted response for the ith population, \({{\varvec{y}}}^*_i\), over the vector of new input values, \({{\varvec{x}}}^*_i\):
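For a single posterior sample \({{\varvec{\theta }}}^{(t)}\), the conditional Gaussian draw described above can be sketched as follows; the mean \(B'\varPhi ^{-1}{{\varvec{y}}}\) and covariance \(A - B'\varPhi ^{-1}B\) follow the standard Gaussian conditioning identity, but the kernel choice, noise level, and toy data below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def predictive_draw(y, k_obs, k_cross, k_new):
    """One draw y* ~ N(B' Phi^{-1} y, A - B' Phi^{-1} B).

    k_obs, k_cross, k_new play the roles of Phi, B, and A above.
    """
    chol = np.linalg.cholesky(k_obs)
    # alpha = Phi^{-1} y via two triangular solves (avoids explicit inverse).
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    mean = k_cross.T @ alpha
    v = np.linalg.solve(chol, k_cross)
    cov = k_new - v.T @ v
    # Small jitter keeps the Cholesky numerically stable.
    cov_chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(cov)))
    return mean + cov_chol @ rng.standard_normal(len(mean))

def sqexp(a, b, tau2=1.0, phi=1.0):
    return tau2 * np.exp(-phi * (a[:, None] - b[None, :]) ** 2)

# Toy data: three observed points and two prediction points.
x = np.array([0.0, 0.5, 1.0]); y = np.sin(x)
xs = np.array([0.25, 0.75])
Phi = sqexp(x, x) + 0.1 * np.eye(3)   # observed covariance with noise
B = sqexp(x, xs)                      # cross-covariance B
A = sqexp(xs, xs) + 0.1 * np.eye(2)   # new-point covariance A with noise
y_star = predictive_draw(y, Phi, B, A)
```

Repeating this draw once per posterior sample \({{\varvec{\theta }}}^{(t)}\) yields the posterior predictive distribution of \({{\varvec{y}}}^*_i\) over the new inputs.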
Poynor, V., Munch, S. Combining functional data with hierarchical Gaussian process models. Environ Ecol Stat 24, 175–199 (2017). https://doi.org/10.1007/s10651-017-0366-2