
Combining functional data with hierarchical Gaussian process models

Abstract

Gaussian process models have been used in applications ranging from machine learning to fisheries management. In the Bayesian framework, the Gaussian process is used as a prior for unknown functions, allowing the data to drive the relationship between inputs and outputs. In our research, we consider a scenario in which response and input data are available from several similar, but not necessarily identical, sources. When little is known about one or more of the populations, it may be advantageous to model all populations together. We present a hierarchical Gaussian process model with a structure that allows distinct features for each source as well as shared underlying characteristics. Key features and properties of the model are discussed and demonstrated in a number of simulation examples. The model is then applied to a data set consisting of three populations of the rotifer Brachionus calyciflorus Pallas. Specifically, we model the log growth rate of the populations using a combination of lagged population sizes. The various lag combinations are formally compared to obtain the best model inputs, and the leading hierarchical Gaussian process model is then compared with the independent Gaussian process model in terms of inferential results.

References

  • Adler RJ (1981) The geometry of random fields. Wiley, New York

  • Banerjee S, Gelfand A, Finley AO, Sang H (2008) Gaussian predictive process models for large spatial data sets. J R Stat Soc Ser B 70(4):825–848

  • Blum M, Riedmiller M (2013) Optimization of Gaussian process hyperparameters using Rprop. In: European symposium on artificial neural networks, computational intelligence and machine learning (ESANN 2013), Belgium, pp 24–26

  • Denison DGT, Holmes CC, Mallick BK, Smith AFM (2002) Bayesian methods for nonlinear classification and regression. Wiley, New York

  • De Iorio M, Johnson WO, Müller P, Rosner GL (2009) Bayesian nonparametric nonproportional hazards survival modeling. Biometrics 65:762–771

  • Fronczyk K, Kottas A (2010) A Bayesian nonparametric modeling framework for developmental toxicity studies. Technical report, University of California, Santa Cruz

  • Gelfand AE, Ghosh SK (1998) Model choice: a minimum posterior predictive loss approach. Biometrika 85(1):1–11

  • Gelfand AE, Sahu S (1999) Identifiability, improper priors, and Gibbs sampling for generalized linear models. J Am Stat Assoc 94:247–253

  • Gomulkiewicz R, Kirkpatrick M (1992) Quantitative genetics and the evolution of reaction norms. Evolution 46(2):390–411

  • Halbach U (1973) Life table data and population dynamics of the rotifer Brachionus calyciflorus Pallas as influenced by periodically oscillating temperature. In: Effects of temperature on ectothermic organisms. Springer, pp 217–228

  • Hassan M, Penfei D, Iqbal W, Can W, Wei F, Ba W (2014) Temperature and precipitation climatology assessment over South Asia using the regional climate model (RegCM4.3): an evaluation of the model performance. J Earth Sci Clim Change 5:214. doi:10.4172/2157-7617.1000214

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York

  • Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109

  • Hyndman RJ, Ullah MS (2007) Robust forecasting of mortality and fertility rates: a functional data approach. Comput Stat Data Anal 51(10):4942–4956

  • Kaufman CG, Sain SR (2010) Bayesian functional ANOVA modeling using Gaussian process prior distributions. Bayesian Anal 5:123–150

  • Kottas A, Krnjajić M (2009) Bayesian semiparametric modeling in quantile regression. Scand J Stat 36:297–319

  • Kottas A, Behseta S, Moorman D, Poynor V, Olson C (2012) Bayesian nonparametric analysis of neuronal intensity rates. J Neurosci Methods 203(1):241–253

  • Munch SB, Kottas A, Mangel M (2005) Bayesian nonparametric analysis of stock-recruitment relationships. Can J Fish Aquat Sci 62:1808–1821

  • Neal RM (1996) Bayesian learning for neural networks. Springer, Berlin

  • NERC Centre for Population Biology, Imperial College (2010) The global population dynamics database version 2. http://www.sw.ic.ac.uk/cpb/cpb/gpdd.html

  • O’Hagan A (1978) Curve fitting and optimal design for prediction. J R Stat Soc Ser B 40(1):1–42

  • Poynor V, Kottas A (2015) Nonparametric Bayesian inference for mean residual life functions in survival analysis. arXiv:1411.7481 [stat.ME]

  • Ramsay JO, Dalzell CJ (1991) Some tools for functional data analysis. J R Stat Soc Ser B 53(3):539–572

  • Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York

  • Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge

  • Rodriguez A, ter Horst E (2008) Bayesian dynamic density estimation. Bayesian Anal 3:339–366

  • Shi J, Wang B, Will E, West R (2012) Mixed-effects Gaussian process functional regression models with application to dose–response curve prediction. Stat Med 31(26):3165–3177

  • Shi JQ, Choi T (2011) Gaussian process regression analysis for functional data. CRC Press, Taylor and Francis Group, Boca Raton

  • Shi JQ, Wang B (2008) Curve prediction and clustering with mixtures of Gaussian process functional regression models. Stat Comput 18:267–283

  • Shi JQ, Murray-Smith R, Titterington DM (2005) Hierarchical Gaussian process mixtures for regression. Stat Comput 15:31–41

  • Shi JQ, Murray-Smith R, Titterington DM (2007) Gaussian process functional regression modeling for batch data. Biometrics 63:714–723

Acknowledgements

The authors thank Marc Mangel and the Center for Stock Assessment Research for providing an enthusiastic and stimulating work environment as well as impeccable administrative support. S. Munch gratefully acknowledges the NMFS IAM program which provided financial support for Dr. Poynor while conducting this research. The authors also thank Marc Mangel and WhoSeung Lee for their helpful comments on this manuscript.

Author information

Corresponding author

Correspondence to Stephan Munch.

Additional information

Handling Editor: Pierre Dutilleul.

Appendices

Appendix 1: Rprop algorithm for HGP

We utilized the Rprop algorithm to establish the initial values of the parameters \({{\varvec{\theta }}} = (\tau ^2, \sigma ^2, \phi , \rho )'\) for the MCMC, adapting the algorithm described by Blum and Riedmiller (2013) for the standard Gaussian process regression model to the HGP. In general, the Rprop algorithm is a gradient-based optimization method. We use it to find the parameter set that maximizes the log likelihood of the HGP model with \(R_C = R_\varSigma \) being the squared exponential function. Note that we also assume the same variance parameter for the corresponding covariance functions. As described in Sect. 2.3, we marginalize the HGP over the unknown functions \(f_i(\cdot )\) to obtain the model stated in (4) and (5). The log likelihood of the model is then given by:

$$\begin{aligned} l({{\varvec{y}}}) \propto -0.5\,{{\varvec{y}}}'\varPhi ^{-1}{{\varvec{y}}} - 0.5\log |\varPhi | \end{aligned}$$

The derivative of the log likelihood with respect to the kth parameter, \(\theta _k\), is given by:

$$\begin{aligned} \frac{\partial }{\partial \theta _k} l({{\varvec{y}}}) \propto 0.5\,\text{ tr }\left[ \left( \varPhi ^{-1}{{\varvec{y}}}{{\varvec{y}}}'\varPhi ^{-1} - \varPhi ^{-1}\right) \frac{\partial \varPhi }{\partial \theta _k}\right] \end{aligned}$$

Specifically,

$$\begin{aligned} \frac{\partial \varPhi }{\partial \tau ^2}(x_{ij},x_{i'j'})&= \rho R_C(x_{ij},x_{i'j'}) + \delta _{ii'}(1-\rho )R_\varSigma (x_{ij},x_{i'j'})\\ \frac{\partial \varPhi }{\partial \sigma ^2}(x_{ij},x_{i'j'})&= \delta _{ii'}\delta _{jj'}\\ \frac{\partial \varPhi }{\partial \phi }(x_{ij},x_{i'j'})&= -\rho \tau ^2R_C(x_{ij},x_{i'j'})(x_{ij}-x_{i'j'})^2 \\&\quad - \delta _{ii'}(1-\rho )\tau ^2R_\varSigma (x_{ij},x_{i'j'})(x_{ij}-x_{i'j'})^2\\ \frac{\partial \varPhi }{\partial \rho }(x_{ij},x_{i'j'})&= \tau ^2R_C(x_{ij},x_{i'j'}) - \delta _{ii'}\tau ^2R_\varSigma (x_{ij},x_{i'j'}) \end{aligned}$$
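The paper supplies no code, so the following Python/NumPy sketch is ours: it builds \(\varPhi \) for one-dimensional inputs with \(R_C = R_\varSigma \) the squared exponential \(R(x,x') = \exp (-\phi (x-x')^2)\), evaluates \(l({{\varvec{y}}})\), and computes \(\partial l/\partial \rho \) via the trace identity above. The function names and data layout (`x` holding all inputs stacked across populations, `pop` the population label of each observation) are assumptions for illustration.

```python
import numpy as np

def sq_exp(x1, x2, phi):
    # R(x, x') = exp(-phi (x - x')^2), evaluated pairwise
    return np.exp(-phi * (x1[:, None] - x2[None, :]) ** 2)

def hgp_cov(x, pop, tau2, sigma2, phi, rho):
    # Phi(x_ij, x_i'j') = rho tau^2 R_C + delta_ii' (1 - rho) tau^2 R_Sigma
    #                     + delta_ii' delta_jj' sigma^2, with R_C = R_Sigma
    R = sq_exp(x, x, phi)
    same_pop = (pop[:, None] == pop[None, :]).astype(float)
    return rho * tau2 * R + (1 - rho) * tau2 * R * same_pop + sigma2 * np.eye(len(x))

def log_lik(y, Phi):
    # l(y) = -0.5 y' Phi^{-1} y - 0.5 log|Phi|, up to an additive constant
    L = np.linalg.cholesky(Phi)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum()

def dloglik_drho(y, x, pop, tau2, sigma2, phi, rho):
    # dl/drho = 0.5 tr[(Phi^{-1} y y' Phi^{-1} - Phi^{-1}) dPhi/drho],
    # with dPhi/drho = tau^2 R_C - delta_ii' tau^2 R_Sigma
    Phi = hgp_cov(x, pop, tau2, sigma2, phi, rho)
    same_pop = (pop[:, None] == pop[None, :]).astype(float)
    dPhi = tau2 * sq_exp(x, x, phi) * (1 - same_pop)
    Pinv = np.linalg.inv(Phi)
    a = Pinv @ y
    return 0.5 * np.trace((np.outer(a, a) - Pinv) @ dPhi)

# Toy usage: three populations with ten observations each
rng = np.random.default_rng(1)
x, pop = rng.uniform(0, 1, 30), np.repeat(np.arange(3), 10)
Phi = hgp_cov(x, pop, 1.0, 0.1, 2.0, 0.5)
y = rng.multivariate_normal(np.zeros(30), Phi)
print(log_lik(y, Phi), dloglik_drho(y, x, pop, 1.0, 0.1, 2.0, 0.5))
```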

The sequential Rprop update for time step \(t = 1,\ldots ,T\) is as follows:

$$\begin{aligned} \theta _k^{(t+1)}&= \theta _k^{(t)} - \text{ sign }\left( \left( \frac{-\partial l({{\varvec{y}}})}{\partial \theta _k}\right) ^{(t)} \right) \varDelta ^{(t)}_k\\ \text{ where } \ \varDelta _k^{(t)}&= \left\{ \begin{array}{ll} \eta ^+ \varDelta ^{(t-1)}_k, &{}\quad \text{ if } \ \left( \frac{-\partial l({{\varvec{y}}})}{\partial \theta _k}\right) ^{(t-1)}\cdot \left( \frac{-\partial l({{\varvec{y}}})}{\partial \theta _k}\right) ^{(t)} > 0\\ \eta ^-\varDelta ^{(t-1)}_k, &{}\quad \text{ if } \ \left( \frac{-\partial l({{\varvec{y}}})}{\partial \theta _k}\right) ^{(t-1)}\cdot \left( \frac{-\partial l({{\varvec{y}}})}{\partial \theta _k}\right) ^{(t)} < 0\\ \varDelta ^{(t-1)}_k, &{}\quad \text{ otherwise } \end{array} \right. \end{aligned}$$

such that \(\eta ^+ > 1\) and \(0<\eta ^-<1\) are specified. If the sign of the partial derivative of the negative log likelihood with respect to \(\theta _k\) is unchanged from the previous step, \(\theta _k\) keeps moving in the same direction with the larger step \(\eta ^+\varDelta _k^{(t-1)}\). If the sign changes, the minimum was overshot, so \(\theta _k\) reverses direction with the smaller step \(\eta ^-\varDelta _k^{(t-1)}\). If the product of successive derivatives is zero, the step size is left unchanged; in particular, at a local minimum the derivative vanishes and \(\theta _k\) no longer moves. An initial \(\varDelta _0\) value is specified, along with bounds \(\varDelta _{ min }\) and \(\varDelta _{ max }\) that keep the step size bounded.
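A minimal sketch of this update loop under the same assumptions; the constants \(\eta ^+ = 1.2\), \(\eta ^- = 0.5\) and the \(\varDelta \) bounds are conventional Rprop defaults, not values reported in the paper. The final line exercises the loop on a quadratic surrogate so the example is self-contained; in practice `grad_neg_l` would return the gradient of \(-l({{\varvec{y}}})\) assembled from the derivatives above.

```python
import numpy as np

def rprop(grad_neg_l, theta0, n_iter=200, delta0=0.1, eta_plus=1.2,
          eta_minus=0.5, delta_min=1e-6, delta_max=50.0):
    # Sign-based descent on the negative log likelihood: grow the step while
    # the gradient sign is stable, shrink it after a sign change, else keep it.
    theta = np.array(theta0, dtype=float)
    delta = np.full_like(theta, delta0)
    g_old = np.zeros_like(theta)
    for _ in range(n_iter):
        g = grad_neg_l(theta)            # gradient of -l(y) at current theta
        prod = g * g_old                 # sign comparison with previous step
        delta = np.where(prod > 0, np.minimum(eta_plus * delta, delta_max), delta)
        delta = np.where(prod < 0, np.maximum(eta_minus * delta, delta_min), delta)
        theta -= np.sign(g) * delta
        g_old = g
    return theta

# Self-contained check on a quadratic surrogate with minimum at (3, -1)
print(rprop(lambda t: 2.0 * (t - np.array([3.0, -1.0])), np.zeros(2)))
```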

Appendix 2: Posterior sampling and predictive algorithms for HGP

This appendix describes the MCMC algorithm used to obtain posterior samples of the model parameters, \((\tau ^2, \sigma ^2, \phi , \rho )\), as well as predictive inference for new responses, \({{\varvec{y}}}^*\), over the vector of new input values, \({{\varvec{x}}}^*_i\). Recall that the model is given by

$$\begin{aligned} {{\varvec{y}}}\,|\,{{\varvec{X}}}, \rho , \phi , \tau ^2, \sigma ^2&\sim N_N({{\varvec{0}}}, \varPhi ) \\ p(\rho , \phi , \tau ^2, \sigma ^2)&= p(\rho )\,p(\phi )\,p(\tau ^2)\,p(\sigma ^2) \end{aligned}$$

The elements of \(\varPhi \) are given by:

$$\begin{aligned} Cov\left[ y_{ij},y_{i'j'}\right] = C(x_{ij}, x_{i'j'}) + \delta _{ii'}\varSigma (x_{ij}, x_{i'j'}) + \delta _{ii'}\delta _{jj'}\sigma ^2 \end{aligned}$$

where \(\delta _{ii'}\) is the Kronecker delta, equal to 1 if \(i=i'\) and 0 otherwise. We assume the same squared exponential covariance function for \(C(\cdot )\) and \(\varSigma (\cdot )\). We use a uniform prior for the correlation parameter \(\rho \), and inverse gamma priors, \(\varGamma ^{-1}(a,b)\), with mean \(b/(a-1)\) for \(a>1\), for the other three parameters: \(p(\tau ^2) \equiv \varGamma ^{-1}(2, b_\tau ), \ \ p(\sigma ^2) \equiv \varGamma ^{-1}(2, b_\sigma ), \ \ p(\phi ) \equiv \varGamma ^{-1}(2, b_\phi )\).
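For concreteness, the unnormalized log posterior targeted by the sampler can be sketched as follows, reusing the hypothetical `hgp_cov` and `log_lik` helpers from the Appendix 1 sketch; `b_tau`, `b_sigma`, and `b_phi` stand for the inverse gamma scale hyperparameters.

```python
import numpy as np
from scipy.stats import invgamma

def log_target(theta, y, x, pop, b_tau, b_sigma, b_phi):
    # Unnormalized log posterior: log likelihood plus log priors.
    # The uniform prior on rho contributes only a constant on (0, 1).
    tau2, sigma2, phi, rho = theta
    if not 0.0 < rho < 1.0:
        return -np.inf
    lp = log_lik(y, hgp_cov(x, pop, tau2, sigma2, phi, rho))
    lp += invgamma.logpdf(tau2, 2, scale=b_tau)      # p(tau^2)   = IG(2, b_tau)
    lp += invgamma.logpdf(sigma2, 2, scale=b_sigma)  # p(sigma^2) = IG(2, b_sigma)
    lp += invgamma.logpdf(phi, 2, scale=b_phi)       # p(phi)     = IG(2, b_phi)
    return lp
```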

For the parameter updates, we use the Metropolis–Hastings algorithm (Hastings 1970) as conjugacy is not available. Initial values for \(\tau ^2_0\), \(\sigma ^2_0\), \(\phi _0\), and \(\rho _0\) are obtained using Rprop (see “Appendix 1”), and posterior samples \(t = 1,\ldots ,T\) are obtained via the Metropolis–Hastings algorithm below.

Values of the parameters are proposed on the log scale for \(\tau ^2\), \(\sigma ^2\), and \(\phi \), and on the logit scale for \(\rho \), via a multivariate normal distribution:

$$\begin{aligned} \left( \log \tau ^{2(*)}, \log \sigma ^{2(*)}, \log \phi ^{(*)}, \text{ logit }\,\rho ^{(*)}\right) ' \sim N_4\left( \left( \log \tau ^{2(t)}, \log \sigma ^{2(t)}, \log \phi ^{(t)}, \text{ logit }\,\rho ^{(t)}\right) ', \ cD\right) \end{aligned}$$

where \(D\) is the proposal covariance matrix and \(c>0\) a scaling constant; this change of variables contributes the Jacobian factors \(\tau ^2\sigma ^2\phi \rho (1-\rho )\) appearing in the acceptance ratio below.

The proposed values are accepted with probability,

$$\begin{aligned} min\left\{ 1, \frac{N({{\varvec{y}}}| {{\varvec{X}}}, \rho ^{(*)}, \phi ^{(*)}, \tau ^{2(*)}, \sigma ^{2(*)})\varGamma ^{-1}(\tau ^{2(*)})\varGamma ^{-1}(\sigma ^{2(*)}) \varGamma ^{-1}(\phi ^{(*)}) \tau ^{2(*)}\sigma ^{2(*)}\phi ^{(*)}\rho ^{(*)}(1-\rho ^{(*)})}{N({{\varvec{y}}}| { {\varvec{X}}}, \rho ^{(t)}, \phi ^{(t)}, \tau ^{2(t)}, \sigma ^{2(t)})\varGamma ^{-1}(\tau ^{2(t)})\varGamma ^{-1}(\sigma ^{2(t)}) \varGamma ^{-1}(\phi ^{(t)}) \tau ^{2(t)}\sigma ^{2(t)}\phi ^{(t)}\rho ^{(t)}(1-\rho ^{(t)})} \right\} . \end{aligned}$$

Preliminary runs are made to establish the form of the proposal covariance matrix \(D\), and \(c >0\) is specified to facilitate stability and good mixing of the chain. A burn-in period is discarded and the remaining draws are thinned to obtain approximately independent posterior samples.
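A hedged sketch of a single Metropolis–Hastings step consistent with the acceptance ratio above: the proposal is a random walk on the \((\log , \log , \log , \text{ logit })\) scale with covariance \(cD\), so the ratio carries the Jacobian factors \(\tau ^2\sigma ^2\phi \rho (1-\rho )\). It reuses `log_target` from the sketch above; all names are ours.

```python
import numpy as np

def mh_step(theta, y, x, pop, b_tau, b_sigma, b_phi, c, D, rng):
    # One Metropolis-Hastings update of theta = (tau2, sigma2, phi, rho),
    # proposing on the (log, log, log, logit) scale with covariance c * D.
    tau2, sigma2, phi, rho = theta
    z = np.array([np.log(tau2), np.log(sigma2), np.log(phi),
                  np.log(rho) - np.log1p(-rho)])
    z_star = rng.multivariate_normal(z, c * D)
    rho_star = 1.0 / (1.0 + np.exp(-z_star[3]))
    theta_star = (np.exp(z_star[0]), np.exp(z_star[1]), np.exp(z_star[2]), rho_star)
    # log Jacobians log(tau2 * sigma2 * phi * rho * (1 - rho)) at the
    # proposed and current values
    jac_star = z_star[:3].sum() + np.log(rho_star) + np.log1p(-rho_star)
    jac_curr = np.log(tau2 * sigma2 * phi) + np.log(rho) + np.log1p(-rho)
    log_ratio = (log_target(theta_star, y, x, pop, b_tau, b_sigma, b_phi) + jac_star
                 - log_target(theta, y, x, pop, b_tau, b_sigma, b_phi) - jac_curr)
    return theta_star if np.log(rng.uniform()) < log_ratio else theta
```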

Predictive values of new responses for the ith population, \({{\varvec{y}}}_i^*\), over a grid of new input values, \({{\varvec{X}}}^*\), can be obtained by integrating over the posterior distribution of the model parameters:

$$\begin{aligned} p\left( {{\varvec{y}}}_i^*\,|\,{{\varvec{X}}}^*, \mathscr {D}\right)&= \int N\left( {{\varvec{y}}}_i^*;\, {{\varvec{m}}}_i^*({{\varvec{\theta }}}), {{\varvec{S}}}_i^*({{\varvec{\theta }}})\right) \,p({{\varvec{\theta }}}\,|\,\mathscr {D})\,d{{\varvec{\theta }}}\\ \text{ where } \quad {{\varvec{m}}}_i^*({{\varvec{\theta }}})&= B\varPhi ^{-1}{{\varvec{y}}}\\ {{\varvec{S}}}_i^*({{\varvec{\theta }}})&= A - B\varPhi ^{-1}B' \end{aligned}$$

where \(A(x^*_{ij},x^*_{ij'}) = C(x^*_{ij},x^*_{ij'}) + \varSigma (x^*_{ij},x^*_{ij'}) + \delta _{jj'}\sigma ^2\) and \(B(x^*_{ij}, x_{i'j'}) = C(x^*_{ij}, x_{i'j'}) + \delta _{ii'}\varSigma (x^*_{ij}, x_{i'j'})\). Therefore, for each posterior sample of the model parameters, \({{\varvec{\theta }}}^{(t)}\), we can obtain a sample of the predicted response for the ith population, \({{\varvec{y}}}^*_i\), over the vector of new input values, \({{\varvec{x}}}^*_i\):

$$\begin{aligned} {{\varvec{y}}}^{*(t)}_i \sim N\left( {{\varvec{m}}}_i^*({{\varvec{\theta }}}^{(t)}), {{\varvec{S}}}_i^*({{\varvec{\theta }}}^{(t)})\right) \end{aligned}$$
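Operationally, each retained posterior draw \({{\varvec{\theta }}}^{(t)}\) yields one predictive draw by forming \({{\varvec{m}}}_i^*\) and \({{\varvec{S}}}_i^*\) and simulating from the multivariate normal. A minimal sketch under the same assumptions (one-dimensional inputs, reusing the hypothetical `hgp_cov` and `sq_exp` helpers):

```python
import numpy as np

def predictive_draw(x_star, i, x, pop, y, theta, rng):
    # One draw of y_i* ~ N(m_i*, S_i*) for population i at new inputs x_star,
    # given a single posterior sample theta = (tau2, sigma2, phi, rho).
    tau2, sigma2, phi, rho = theta
    Phi = hgp_cov(x, pop, tau2, sigma2, phi, rho)
    # A = C + Sigma + sigma^2 I on the new inputs (all in population i)
    R_ss = sq_exp(x_star, x_star, phi)
    A = rho * tau2 * R_ss + (1 - rho) * tau2 * R_ss + sigma2 * np.eye(len(x_star))
    # B = C + delta_ii' Sigma between new and observed inputs
    R_so = sq_exp(x_star, x, phi)
    same_pop = (pop[None, :] == i).astype(float)
    B = rho * tau2 * R_so + (1 - rho) * tau2 * R_so * same_pop
    m = B @ np.linalg.solve(Phi, y)          # m_i* = B Phi^{-1} y
    S = A - B @ np.linalg.solve(Phi, B.T)    # S_i* = A - B Phi^{-1} B'
    return rng.multivariate_normal(m, S)
```

One call per retained \({{\varvec{\theta }}}^{(t)}\) yields the sample \({{\varvec{y}}}^{*(t)}_i\) above.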

About this article

Cite this article

Poynor, V., Munch, S. Combining functional data with hierarchical Gaussian process models. Environ Ecol Stat 24, 175–199 (2017). https://doi.org/10.1007/s10651-017-0366-2
