
Combining functional data with hierarchical Gaussian process models

Abstract

Gaussian process models have been used in applications ranging from machine learning to fisheries management. In the Bayesian framework, the Gaussian process is used as a prior for unknown functions, allowing the data to drive the relationship between inputs and outputs. In our research, we consider a scenario in which response and input data are available from several similar, but not necessarily identical, sources. When little is known about one or more of the populations, it may be advantageous to model all populations together. We present a hierarchical Gaussian process model with a structure that allows distinct features for each source as well as shared underlying characteristics. Key features and properties of the model are discussed and demonstrated in a number of simulation examples. The model is then applied to a data set consisting of three populations of the rotifer Brachionus calyciflorus Pallas. Specifically, we model the log growth rate of the populations using a combination of lagged population sizes. The various lag combinations are formally compared to obtain the best model inputs, and the leading hierarchical Gaussian process model is then compared with the independent Gaussian process model in terms of inferential results.

References

  • Adler RJ (1981) The geometry of random fields. Wiley, New York

  • Banerjee S, Gelfand A, Finley AO, Sang H (2008) Gaussian predictive process models for large spatial data sets. J R Stat Soc Ser B 70(4):825–848

  • Blum M, Riedmiller M (2013) Optimization of Gaussian process hyperparameters using Rprop. In: European symposium on artificial neural networks, computational intelligence and machine learning (ESANN 2013), Belgium, pp 24–26

  • Denison DGT, Holmes CC, Mallick BK, Smith AFM (2002) Bayesian methods for nonlinear classification and regression. Wiley, New York

  • De Iorio M, Johnson WO, Müller P, Rosner GL (2009) Bayesian nonparametric nonproportional hazards survival modeling. Biometrics 65:762–771

  • Fronczyk K, Kottas A (2010) A Bayesian nonparametric modeling framework for developmental toxicity studies. Technical report, University of California, Santa Cruz

  • Gelfand AE, Ghosh SK (1998) Model choice: a minimum posterior predictive loss approach. Biometrika 85(1):1–11

  • Gelfand AE, Sahu S (1999) Identifiability, improper priors, and Gibbs sampling for generalized linear models. J Am Stat Assoc 94:247–253

  • Gomulkiewicz R, Kirkpatrick M (1992) Quantitative genetics and the evolution of reaction norms. Evolution 46(2):390–411

  • Halbach U (1973) Life table data and population dynamics of the rotifer Brachionus calyciflorus Pallas as influenced by periodically oscillating temperature. In: Effects of temperature on ectothermic organisms. Springer, pp 217–228

  • Hassan M, Penfei D, Iqbal W, Can W, Wei F, Ba W (2014) Temperature and precipitation climatology assessment over South Asia using the regional climate model (RegCM4.3): an evaluation of the model performance. J Earth Sci Clim Change 5:214. doi:10.4172/2157-7617.1000214

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York

  • Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109

  • Hyndman RJ, Ullah MS (2007) Robust forecasting of mortality and fertility rates: a functional data approach. Comput Stat Data Anal 51(10):4942–4956

  • Kaufman CG, Sain SR (2010) Bayesian functional ANOVA modeling using Gaussian process prior distributions. Bayesian Anal 5:123–150

  • Kottas A, Krnjajić M (2009) Bayesian semiparametric modeling in quantile regression. Scand J Stat 36:297–319

  • Kottas A, Behseta S, Moorman D, Poynor V, Olson C (2012) Bayesian nonparametric analysis of neuronal intensity rates. J Neurosci Methods 203(1):241–253

  • Munch SB, Kottas A, Mangel M (2005) Bayesian nonparametric analysis of stock-recruitment relationships. Can J Fish Aquat Sci 62:1808–1821

  • Neal RM (1996) Bayesian learning for neural networks. Springer, Berlin

  • NERC Centre for Population Biology, Imperial College (2010) The global population dynamics database version 2. http://www.sw.ic.ac.uk/cpb/cpb/gpdd.html

  • O’Hagan A (1978) Curve fitting and optimal design for prediction. J R Stat Soc Ser B 40(1):1–42

  • Poynor V, Kottas A (2015) Nonparametric Bayesian inference for mean residual life functions in survival analysis. arXiv:1411.7481 [stat.ME]

  • Ramsay JO, Dalzell CJ (1991) Some tools for functional data analysis. J R Stat Soc Ser B 53(3):539–572

  • Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York

  • Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge

  • Rodriguez A, ter Horst E (2008) Bayesian dynamic density estimation. Bayesian Anal 3:339–366

  • Shi J, Wang B, Will E, West R (2012) Mixed-effects Gaussian process functional regression models with application to dose–response curve prediction. Stat Med 31(26):3165–3177

  • Shi JQ, Choi T (2011) Gaussian process regression analysis for functional data. CRC Press, Taylor and Francis Group, Boca Raton

  • Shi JQ, Wang B (2008) Curve prediction and clustering with mixtures of Gaussian process functional regression models. Stat Comput 18:267–283

  • Shi JQ, Murray-Smith R, Titterington DM (2005) Hierarchical Gaussian process mixtures for regression. Stat Comput 15:31–41

  • Shi JQ, Murray-Smith R, Titterington DM (2007) Gaussian process functional regression modeling for batch data. Biometrics 63:714–723

Acknowledgements

The authors thank Marc Mangel and the Center for Stock Assessment Research for providing an enthusiastic and stimulating work environment as well as impeccable administrative support. S. Munch gratefully acknowledges the NMFS IAM program which provided financial support for Dr. Poynor while conducting this research. The authors also thank Marc Mangel and WhoSeung Lee for their helpful comments on this manuscript.

Author information

Corresponding author

Correspondence to Stephan Munch.

Additional information

Handling Editor: Pierre Dutilleul.

Appendices

Appendix 1: Rprop algorithm for HGP

We utilized the Rprop algorithm to establish the initial values of the parameters \({{\varvec{\theta }}} = (\tau ^2, \sigma ^2, \phi , \rho )'\) for the MCMC, adapting the algorithm described by Blum and Riedmiller (2013) for the standard Gaussian process regression model to the HGP. In general, the Rprop algorithm is a gradient-based optimization method. We use it to find the parameter set that maximizes the log likelihood of the HGP model with \(R_C = R_\varSigma \) being the squared exponential function. Note that we also assume the same variance parameter for the corresponding covariance functions. As described in Sect. 2.3, we marginalize the HGP over the unknown functions \(f_i(\cdot )\) to obtain the model stated in (4) and (5). The log likelihood of the model is then given by:

$$\begin{aligned} l({{\varvec{y}}}) \propto -0.5\,{{\varvec{y}}}'\varPhi ^{-1}{{\varvec{y}}} - 0.5\log |\varPhi | \end{aligned}$$

The derivative of the log likelihood with respect to the kth parameter, \(\theta _k\), is given by:

$$\begin{aligned} \frac{\partial }{\partial \theta _k} l({{\varvec{y}}}) \propto 0.5\,\text{ tr }\left[ \left( \varPhi ^{-1}{{\varvec{y}}}{{\varvec{y}}}'\varPhi ^{-1} - \varPhi ^{-1}\right) \frac{\partial \varPhi }{\partial \theta _k}\right] \end{aligned}$$

Specifically,

$$\begin{aligned} \frac{\partial \varPhi }{\partial \tau ^2}(x_{ij},x_{i'j'})&= \rho R_C(x_{ij},x_{i'j'}) + \delta _{ii'}(1-\rho )R_\varSigma (x_{ij},x_{i'j'})\\ \frac{\partial \varPhi }{\partial \sigma ^2}(x_{ij},x_{i'j'})&= \delta _{ii'}\delta _{jj'}\\ \frac{\partial \varPhi }{\partial \phi }(x_{ij},x_{i'j'})&= -\rho \tau ^2R_C(x_{ij},x_{i'j'})(x_{ij}-x_{i'j'})^2 \\&\quad - \delta _{ii'}(1-\rho )\tau ^2R_\varSigma (x_{ij},x_{i'j'})(x_{ij}-x_{i'j'})^2\\ \frac{\partial \varPhi }{\partial \rho }(x_{ij},x_{i'j'})&= \tau ^2R_C(x_{ij},x_{i'j'}) - \delta _{ii'}\tau ^2R_\varSigma (x_{ij},x_{i'j'}) \end{aligned}$$
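The paper supplies no code, so the following Python/NumPy sketch is ours: it builds \(\varPhi \) for one-dimensional inputs with \(R_C = R_\varSigma \) the squared exponential \(R(x,x') = \exp (-\phi (x-x')^2)\), evaluates \(l({{\varvec{y}}})\), and computes \(\partial l/\partial \rho \) via the trace identity above. The function names and data layout (`x` holding all inputs stacked across populations, `pop` the population label of each observation) are assumptions for illustration.

```python
import numpy as np

def sq_exp(x1, x2, phi):
    # R(x, x') = exp(-phi (x - x')^2), evaluated pairwise
    return np.exp(-phi * (x1[:, None] - x2[None, :]) ** 2)

def hgp_cov(x, pop, tau2, sigma2, phi, rho):
    # Phi(x_ij, x_i'j') = rho tau^2 R_C + delta_ii' (1 - rho) tau^2 R_Sigma
    #                     + delta_ii' delta_jj' sigma^2, with R_C = R_Sigma
    R = sq_exp(x, x, phi)
    same_pop = (pop[:, None] == pop[None, :]).astype(float)
    return rho * tau2 * R + (1 - rho) * tau2 * R * same_pop + sigma2 * np.eye(len(x))

def log_lik(y, Phi):
    # l(y) = -0.5 y' Phi^{-1} y - 0.5 log|Phi|, up to an additive constant
    L = np.linalg.cholesky(Phi)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum()

def dloglik_drho(y, x, pop, tau2, sigma2, phi, rho):
    # dl/drho = 0.5 tr[(Phi^{-1} y y' Phi^{-1} - Phi^{-1}) dPhi/drho],
    # with dPhi/drho = tau^2 R_C - delta_ii' tau^2 R_Sigma
    Phi = hgp_cov(x, pop, tau2, sigma2, phi, rho)
    same_pop = (pop[:, None] == pop[None, :]).astype(float)
    dPhi = tau2 * sq_exp(x, x, phi) * (1 - same_pop)
    Pinv = np.linalg.inv(Phi)
    a = Pinv @ y
    return 0.5 * np.trace((np.outer(a, a) - Pinv) @ dPhi)

# Toy usage: three populations with ten observations each
rng = np.random.default_rng(1)
x, pop = rng.uniform(0, 1, 30), np.repeat(np.arange(3), 10)
Phi = hgp_cov(x, pop, 1.0, 0.1, 2.0, 0.5)
y = rng.multivariate_normal(np.zeros(30), Phi)
print(log_lik(y, Phi), dloglik_drho(y, x, pop, 1.0, 0.1, 2.0, 0.5))
```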

The sequential Rprop update for time step \(t = 1,\ldots ,T\) is as follows:

$$\begin{aligned} \theta _k^{(t+1)}&= \theta _k^{(t)} - \text{ sign }\left( \left( \frac{-\partial l({{\varvec{y}}})}{\partial \theta _k}\right) ^{(t)} \right) \varDelta ^{(t)}_k\\ \text{ where } \ \varDelta _k^{(t)}&= \left\{ \begin{array}{ll} \eta ^+ \varDelta ^{(t-1)}_k, &{}\quad \text{ if } \ \left( \frac{-\partial l({{\varvec{y}}})}{\partial \theta _k}\right) ^{(t-1)}\cdot \left( \frac{-\partial l({{\varvec{y}}})}{\partial \theta _k}\right) ^{(t)} > 0\\ \eta ^-\varDelta ^{(t-1)}_k, &{}\quad \text{ if } \ \left( \frac{-\partial l({{\varvec{y}}})}{\partial \theta _k}\right) ^{(t-1)}\cdot \left( \frac{-\partial l({{\varvec{y}}})}{\partial \theta _k}\right) ^{(t)} < 0\\ \varDelta ^{(t-1)}_k, &{}\quad \text{ otherwise } \end{array} \right. \end{aligned}$$

such that \(\eta ^+ > 1\) and \(0<\eta ^-<1\) are specified. If the sign of the partial derivative of the negative log likelihood with respect to \(\theta _k\) is unchanged from the previous step, \(\theta _k\) keeps moving in the same direction with the larger step \(\eta ^+\varDelta _k^{(t-1)}\). If the sign changes, the minimum was overshot, so \(\theta _k\) reverses direction with the smaller step \(\eta ^-\varDelta _k^{(t-1)}\). If the product of successive derivatives is zero, the step size is left unchanged; in particular, at a local minimum the derivative vanishes and \(\theta _k\) no longer moves. An initial \(\varDelta _0\) value is specified, along with bounds \(\varDelta _{ min }\) and \(\varDelta _{ max }\) that keep the step size bounded.
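A minimal sketch of this update loop under the same assumptions; the constants \(\eta ^+ = 1.2\), \(\eta ^- = 0.5\) and the \(\varDelta \) bounds are conventional Rprop defaults, not values reported in the paper. The final line exercises the loop on a quadratic surrogate so the example is self-contained; in practice `grad_neg_l` would return the gradient of \(-l({{\varvec{y}}})\) assembled from the derivatives above.

```python
import numpy as np

def rprop(grad_neg_l, theta0, n_iter=200, delta0=0.1, eta_plus=1.2,
          eta_minus=0.5, delta_min=1e-6, delta_max=50.0):
    # Sign-based descent on the negative log likelihood: grow the step while
    # the gradient sign is stable, shrink it after a sign change, else keep it.
    theta = np.array(theta0, dtype=float)
    delta = np.full_like(theta, delta0)
    g_old = np.zeros_like(theta)
    for _ in range(n_iter):
        g = grad_neg_l(theta)            # gradient of -l(y) at current theta
        prod = g * g_old                 # sign comparison with previous step
        delta = np.where(prod > 0, np.minimum(eta_plus * delta, delta_max), delta)
        delta = np.where(prod < 0, np.maximum(eta_minus * delta, delta_min), delta)
        theta -= np.sign(g) * delta
        g_old = g
    return theta

# Self-contained check on a quadratic surrogate with minimum at (3, -1)
print(rprop(lambda t: 2.0 * (t - np.array([3.0, -1.0])), np.zeros(2)))
```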

Appendix 2: Posterior sampling and predictive algorithms for HGP

This appendix describes the MCMC algorithm used to obtain posterior samples of the model parameters, \((\tau ^2, \sigma ^2, \phi , \rho )\), as well as predictive inference for new responses, \({{\varvec{y}}}^*\), over the vector of new input values, \({{\varvec{x}}}^*_i\). Recall that the model is given by

$$\begin{aligned} {{\varvec{y}}}\,|\,{{\varvec{X}}}, \rho , \phi , \tau ^2, \sigma ^2&\sim N_N({{\varvec{0}}}, \varPhi ) \\ p(\rho , \phi , \tau ^2, \sigma ^2)&= p(\rho )\,p(\phi )\,p(\tau ^2)\,p(\sigma ^2) \end{aligned}$$

The elements of \(\varPhi \) are given by:

$$\begin{aligned} Cov\left[ y_{ij},y_{i'j'}\right] = C(x_{ij}, x_{i'j'}) + \delta _{ii'}\varSigma (x_{ij}, x_{i'j'}) + \delta _{ii'}\delta _{jj'}\sigma ^2 \end{aligned}$$

where \(\delta _{ii'}\) is the Kronecker delta, equal to 1 if \(i=i'\) and 0 otherwise. We assume the same squared exponential covariance function for \(C(\cdot )\) and \(\varSigma (\cdot )\). We use a uniform prior for the correlation parameter \(\rho \), and inverse gamma priors, \(\varGamma ^{-1}(a,b)\), with mean \(b/(a-1)\) for \(a>1\), for the other three parameters: \(p(\tau ^2) \equiv \varGamma ^{-1}(2, b_\tau ), \ \ p(\sigma ^2) \equiv \varGamma ^{-1}(2, b_\sigma ), \ \ p(\phi ) \equiv \varGamma ^{-1}(2, b_\phi )\).
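For concreteness, the unnormalized log posterior targeted by the sampler can be sketched as follows, reusing the hypothetical `hgp_cov` and `log_lik` helpers from the Appendix 1 sketch; `b_tau`, `b_sigma`, and `b_phi` stand for the inverse gamma scale hyperparameters.

```python
import numpy as np
from scipy.stats import invgamma

def log_target(theta, y, x, pop, b_tau, b_sigma, b_phi):
    # Unnormalized log posterior: log likelihood plus log priors.
    # The uniform prior on rho contributes only a constant on (0, 1).
    tau2, sigma2, phi, rho = theta
    if not 0.0 < rho < 1.0:
        return -np.inf
    lp = log_lik(y, hgp_cov(x, pop, tau2, sigma2, phi, rho))
    lp += invgamma.logpdf(tau2, 2, scale=b_tau)      # p(tau^2)   = IG(2, b_tau)
    lp += invgamma.logpdf(sigma2, 2, scale=b_sigma)  # p(sigma^2) = IG(2, b_sigma)
    lp += invgamma.logpdf(phi, 2, scale=b_phi)       # p(phi)     = IG(2, b_phi)
    return lp
```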

For the parameter updates, we use the Metropolis–Hastings algorithm (Hastings 1970) as conjugacy is not available. Initial values for \(\tau ^2_0\), \(\sigma ^2_0\), \(\phi _0\), and \(\rho _0\) are obtained using Rprop (see “Appendix 1”), and posterior samples \(t = 1,\ldots ,T\) are obtained via the Metropolis–Hastings algorithm below.

Values of the parameters are proposed on the log scale for \(\tau ^2\), \(\sigma ^2\), and \(\phi \), and on the logit scale for \(\rho \), via a multivariate normal distribution:

$$\begin{aligned} \left( \log \tau ^{2(*)}, \log \sigma ^{2(*)}, \log \phi ^{(*)}, \text{ logit }\,\rho ^{(*)}\right) ' \sim N_4\left( \left( \log \tau ^{2(t)}, \log \sigma ^{2(t)}, \log \phi ^{(t)}, \text{ logit }\,\rho ^{(t)}\right) ', \ cD\right) \end{aligned}$$

where \(D\) is the proposal covariance matrix and \(c>0\) a scaling constant; this change of variables contributes the Jacobian factors \(\tau ^2\sigma ^2\phi \rho (1-\rho )\) appearing in the acceptance ratio below.

The proposed values are accepted with probability,

$$\begin{aligned} min\left\{ 1, \frac{N({{\varvec{y}}}| {{\varvec{X}}}, \rho ^{(*)}, \phi ^{(*)}, \tau ^{2(*)}, \sigma ^{2(*)})\varGamma ^{-1}(\tau ^{2(*)})\varGamma ^{-1}(\sigma ^{2(*)}) \varGamma ^{-1}(\phi ^{(*)}) \tau ^{2(*)}\sigma ^{2(*)}\phi ^{(*)}\rho ^{(*)}(1-\rho ^{(*)})}{N({{\varvec{y}}}| { {\varvec{X}}}, \rho ^{(t)}, \phi ^{(t)}, \tau ^{2(t)}, \sigma ^{2(t)})\varGamma ^{-1}(\tau ^{2(t)})\varGamma ^{-1}(\sigma ^{2(t)}) \varGamma ^{-1}(\phi ^{(t)}) \tau ^{2(t)}\sigma ^{2(t)}\phi ^{(t)}\rho ^{(t)}(1-\rho ^{(t)})} \right\} . \end{aligned}$$

Preliminary runs are made to establish the form of the proposal covariance matrix \(D\), and \(c >0\) is specified to facilitate stability and good mixing of the chain. A burn-in period is discarded and the remaining draws are thinned to obtain approximately independent posterior samples.
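A hedged sketch of a single Metropolis–Hastings step consistent with the acceptance ratio above: the proposal is a random walk on the \((\log , \log , \log , \text{ logit })\) scale with covariance \(cD\), so the ratio carries the Jacobian factors \(\tau ^2\sigma ^2\phi \rho (1-\rho )\). It reuses `log_target` from the sketch above; all names are ours.

```python
import numpy as np

def mh_step(theta, y, x, pop, b_tau, b_sigma, b_phi, c, D, rng):
    # One Metropolis-Hastings update of theta = (tau2, sigma2, phi, rho),
    # proposing on the (log, log, log, logit) scale with covariance c * D.
    tau2, sigma2, phi, rho = theta
    z = np.array([np.log(tau2), np.log(sigma2), np.log(phi),
                  np.log(rho) - np.log1p(-rho)])
    z_star = rng.multivariate_normal(z, c * D)
    rho_star = 1.0 / (1.0 + np.exp(-z_star[3]))
    theta_star = (np.exp(z_star[0]), np.exp(z_star[1]), np.exp(z_star[2]), rho_star)
    # log Jacobians log(tau2 * sigma2 * phi * rho * (1 - rho)) at the
    # proposed and current values
    jac_star = z_star[:3].sum() + np.log(rho_star) + np.log1p(-rho_star)
    jac_curr = np.log(tau2 * sigma2 * phi) + np.log(rho) + np.log1p(-rho)
    log_ratio = (log_target(theta_star, y, x, pop, b_tau, b_sigma, b_phi) + jac_star
                 - log_target(theta, y, x, pop, b_tau, b_sigma, b_phi) - jac_curr)
    return theta_star if np.log(rng.uniform()) < log_ratio else theta
```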

Predictive values of new responses for the ith population, \({{\varvec{y}}}_i^*\), over a grid of new input values, \({{\varvec{X}}}^*\), can be obtained by integrating over the posterior distribution of the model parameters:

$$\begin{aligned} p\left( {{\varvec{y}}}_i^*\,|\,{{\varvec{X}}}^*, \mathscr {D}\right)&= \int N\left( {{\varvec{y}}}_i^*;\, {{\varvec{m}}}_i^*({{\varvec{\theta }}}), {{\varvec{S}}}_i^*({{\varvec{\theta }}})\right) \,p({{\varvec{\theta }}}\,|\,\mathscr {D})\,d{{\varvec{\theta }}}\\ \text{ where } \quad {{\varvec{m}}}_i^*({{\varvec{\theta }}})&= B\varPhi ^{-1}{{\varvec{y}}}\\ {{\varvec{S}}}_i^*({{\varvec{\theta }}})&= A - B\varPhi ^{-1}B' \end{aligned}$$

where \(A(x^*_{ij},x^*_{ij'}) = C(x^*_{ij},x^*_{ij'}) + \varSigma (x^*_{ij},x^*_{ij'}) + \delta _{jj'}\sigma ^2\) and \(B(x^*_{ij}, x_{i'j'}) = C(x^*_{ij}, x_{i'j'}) + \delta _{ii'}\varSigma (x^*_{ij}, x_{i'j'})\). Therefore, for each posterior sample of the model parameters, \({{\varvec{\theta }}}^{(t)}\), we can obtain a sample of the predicted response for the ith population, \({{\varvec{y}}}^*_i\), over the vector of new input values, \({{\varvec{x}}}^*_i\):

$$\begin{aligned} {{\varvec{y}}}^{*(t)}_i \sim N\left( {{\varvec{m}}}_i^*({{\varvec{\theta }}}^{(t)}), {{\varvec{S}}}_i^*({{\varvec{\theta }}}^{(t)})\right) \end{aligned}$$
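Operationally, each retained posterior draw \({{\varvec{\theta }}}^{(t)}\) yields one predictive draw by forming \({{\varvec{m}}}_i^*\) and \({{\varvec{S}}}_i^*\) and simulating from the multivariate normal. A minimal sketch under the same assumptions (one-dimensional inputs, reusing the hypothetical `hgp_cov` and `sq_exp` helpers):

```python
import numpy as np

def predictive_draw(x_star, i, x, pop, y, theta, rng):
    # One draw of y_i* ~ N(m_i*, S_i*) for population i at new inputs x_star,
    # given a single posterior sample theta = (tau2, sigma2, phi, rho).
    tau2, sigma2, phi, rho = theta
    Phi = hgp_cov(x, pop, tau2, sigma2, phi, rho)
    # A = C + Sigma + sigma^2 I on the new inputs (all in population i)
    R_ss = sq_exp(x_star, x_star, phi)
    A = rho * tau2 * R_ss + (1 - rho) * tau2 * R_ss + sigma2 * np.eye(len(x_star))
    # B = C + delta_ii' Sigma between new and observed inputs
    R_so = sq_exp(x_star, x, phi)
    same_pop = (pop[None, :] == i).astype(float)
    B = rho * tau2 * R_so + (1 - rho) * tau2 * R_so * same_pop
    m = B @ np.linalg.solve(Phi, y)          # m_i* = B Phi^{-1} y
    S = A - B @ np.linalg.solve(Phi, B.T)    # S_i* = A - B Phi^{-1} B'
    return rng.multivariate_normal(m, S)
```

One call per retained \({{\varvec{\theta }}}^{(t)}\) yields the sample \({{\varvec{y}}}^{*(t)}_i\) above.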

About this article

Cite this article

Poynor, V., Munch, S. Combining functional data with hierarchical Gaussian process models. Environ Ecol Stat 24, 175–199 (2017). https://doi.org/10.1007/s10651-017-0366-2
