Abstract
Analysis of variance is a standard statistical modeling approach for comparing populations. The functional analysis setting envisions that mean functions are associated with the populations, customarily modeled using basis representations, and seeks to compare them. Here, we adopt the modeling approach of functions as realizations of stochastic processes. We extend the Gaussian process version to allow nonparametric specifications using Dirichlet process mixing. Several metrics are introduced for comparison of populations. Then we introduce a hierarchical Dirichlet process model which enables comparison of the population distributions, either directly or through functionals of interest using the foregoing metrics. The modeling is extended to allow us to switch the sampling scheme. There are still population level distributions but now we sample at levels of the functions, obtaining observations from potentially different individuals at different levels. We illustrate with both simulated data and a dataset of temperature versus depth measurements at different locations in the Atlantic Ocean.
References
Antoniak, C. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics, 2, 1152–1174.
Banerjee, S., Carlin, B., Gelfand, A. (2004). Hierarchical modeling and analysis for spatial data. Boca Raton, FL: Chapman and Hall/CRC Press.
Brumback, B., Rice, J. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association, 93, 961–980.
Cressie, N. (1993). Statistics for spatial data. New York: Wiley.
De Iorio, M., Müller, P., Rosner, G., MacEachern, S. (2004). An ANOVA model for dependent random measures. Journal of the American Statistical Association, 99, 205–215.
Dudley, R. M. (1976). Probabilities and metrics: Convergence of laws on metric spaces, with a view to statistical testing. Aarhus, Denmark: Aarhus Universitet, Matematisk Institut.
Dunson, D. (2010). Nonparametric Bayes applications to biostatistics. In N. Hjort, C. Holmes, P. Mueller, S. Walker (Eds.), Bayesian nonparametrics: Principles and practice (pp. 223–273). Cambridge, UK: Cambridge University Press.
Ferguson, T.S. (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1, 209–230.
Ferraty, F., Vieu, P. (2006). Nonparametric functional data analysis: Theory and practice. New York: Springer.
Gelfand, A., Kottas, A., MacEachern, S. (2005). Bayesian nonparametric spatial modeling with Dirichlet process mixing. Journal of the American Statistical Association, 100, 1021–1035.
Ishwaran, H., Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Annals of Statistics, 33, 730–773.
Ishwaran, H., Zarepour, M. (2002). Dirichlet prior sieves in finite normal mixtures. Statistica Sinica, 12, 941–963.
Kaufman, C., Sain, S. (2010). Bayesian functional ANOVA modeling using Gaussian process prior distributions. Bayesian Analysis, 5, 123–150.
Kent, J. (1989). Continuity properties of random fields. Annals of Probability, 17, 1432–1440.
MacEachern, S. (1999). Dependent nonparametric processes. In ASA proceedings of the section on Bayesian statistical science (pp. 50–55). Alexandria, VA: American Statistical Association.
MacLehose, R. F., Dunson, D. (2009). Nonparametric Bayes kernel-based priors for functional data analysis. Statistica Sinica, 19, 611–629.
Morris, J. S., Carroll, R. J. (2006). Wavelet-based functional mixed models. Journal of the Royal Statistical Society B, 68, 179–199.
Nguyen, X. (2010). Inference of global clusters from locally distributed data. Bayesian Analysis, 5, 817–846.
Nguyen, X. (2013a). Borrowing strength in hierarchical Bayes: Convergence of the Dirichlet base measure. arxiv.org/abs/1301.0802.
Nguyen, X. (2013b). Convergence of latent mixing measures in finite and infinite mixture models. Annals of Statistics, 41, 370–400.
Nguyen, X., Gelfand, A. (2011). The Dirichlet labeling process for clustering functional data. Statistica Sinica, 21, 1249–1289.
Petrone, S., Guindani, M., Gelfand, A. (2009). Hybrid Dirichlet processes for functional data. Journal of the Royal Statistical Society B, 71(4), 755–782.
Ramsay, J. O., Silverman, B. (2006). Functional data analysis (2nd ed.). New York: Springer.
Rappold, A., Lavine, M., Lozier, S. (2007). Subjective likelihood for the assessment of trends in the ocean’s mixed layer depth. Journal of the American Statistical Association, 102, 771–787.
Rodriguez, A., Dunson, D., Gelfand, A. (2009). Bayesian nonparametric functional data analysis through density estimation. Biometrika, 96(1), 149–162.
Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4, 639–650.
Spitzner, D., Marron, J., Essick, G. (2003). Mixed-model functional ANOVA for studying human tactile perception. Journal of the American Statistical Association, 98, 263–272.
Stein, M.L. (1999). Interpolation of spatial data: Some theory for kriging. New York: Springer-Verlag.
Teh, Y., Jordan, M., Beal, M., Blei, D. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101, 1566–1581.
Wang, N., Carroll, R., Lin, X. (2005). Efficient semiparametric marginal estimation for longitudinal/clustered data. Journal of the American Statistical Association, 100, 147–157.
Additional information
This work was partially supported by NSF grants No. 0940671 and No. 1047871 (XN).
Appendices
Appendix A: Inference of mean curves under GP prior
This section provides standard expressions for the conditional expectation and variance of population mean curves given a collection of functional data. Suppose that the data \(\varvec{Y} = \{Y_{ui}(x)\}\) are observed at the same set of levels \(x_1,\ldots , x_m\). In the following we use \(\varvec{M}\) to collect all model parameters, \(\varvec{M} = (\varvec{\mu ,C,\sigma ,\tau })\). Given \(\varvec{Y}\) and \(\varvec{M}\), the vectors \(\varvec{\theta }_u= (\theta _u(x_1),\ldots , \theta _u(x_m))\) are independent for \(u\in V\). Let \(x_{01}, \ldots ,x_{0p}\) be \(p\) levels that are either placed regularly in \(B\) or sampled uniformly from \(B\). For a given population \(u\), we need to derive the posterior distribution of both \(\varvec{\theta }_u\) and \(\varvec{\theta }_{0u} := (\theta _u(x_{01}),\ldots , \theta _u(x_{0p}))\).
Let \(\varvec{C}_u\) and \(\varvec{C}_{0u}\) be the a priori covariance matrices for \(\varvec{\theta }_u\) and \(\varvec{\theta }_{0u}\), respectively, and let \(\varvec{R}_u\) be the \(m\times p\) cross-covariance matrix between the two, as given by the GP with covariance function \(\varvec{C}\). We have
We have \(\varvec{\theta }_{0u} | \varvec{\theta }_{u}, \varvec{M} \sim N_{p}(\tilde{\varvec{m}}, \tilde{\varvec{S}})\) where
By the conditional independence relation \(\varvec{\theta }_{0u} \perp \hbox {Data} | \varvec{\theta }_u, \varvec{M}\), we have
Standard calculations yield
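For orientation, the usual Gaussian conditioning identities consistent with these definitions can be sketched as follows. Part of the notation is assumed here rather than taken from the text: \(\varvec{\mu }\) and \(\varvec{\mu }_0\) denote the prior mean evaluated at the observed levels and at the new levels, \(\varvec{m}_u\) and \(\varvec{\Sigma }_u\) are hypothetical names for the posterior mean and covariance of \(\varvec{\theta }_u\), and the observation model \(\varvec{Y}_{ui} = \varvec{\theta }_u + \varvec{\epsilon }_{ui}\) with \(\varvec{\epsilon }_{ui} \sim N_m(0, \tau _u^2 \mathbf {I}_m)\) is inferred from the Gibbs updates in this appendix.

$$\begin{aligned} \begin{pmatrix} \varvec{\theta }_u \\ \varvec{\theta }_{0u} \end{pmatrix} \Big | \varvec{M}&\sim N_{m+p}\left( \begin{pmatrix} \varvec{\mu } \\ \varvec{\mu }_0 \end{pmatrix}, \begin{pmatrix} \varvec{C}_u &amp; \varvec{R}_u \\ \varvec{R}_u^T &amp; \varvec{C}_{0u} \end{pmatrix} \right) ,\\ \tilde{\varvec{m}}&= \varvec{\mu }_0 + \varvec{R}_u^T \varvec{C}_u^{-1}(\varvec{\theta }_u - \varvec{\mu }), \qquad \tilde{\varvec{S}} = \varvec{C}_{0u} - \varvec{R}_u^T \varvec{C}_u^{-1}\varvec{R}_u,\\ \varvec{\Sigma }_u&= \left( \varvec{C}_u^{-1} + n_u \tau _u^{-2}\mathbf {I}_m\right) ^{-1}, \qquad \varvec{m}_u = \varvec{\Sigma }_u\left( \varvec{C}_u^{-1}\varvec{\mu } + \tau _u^{-2}\sum \limits _{i=1}^{n_u}\varvec{Y}_{ui}\right) ,\\ \mathbb {E}[\varvec{\theta }_{0u} | \varvec{Y}, \varvec{M}]&= \varvec{\mu }_0 + \varvec{R}_u^T \varvec{C}_u^{-1}(\varvec{m}_u - \varvec{\mu }),\\ \mathrm{Cov}[\varvec{\theta }_{0u} | \varvec{Y}, \varvec{M}]&= \tilde{\varvec{S}} + \varvec{R}_u^T \varvec{C}_u^{-1}\varvec{\Sigma }_u \varvec{C}_u^{-1}\varvec{R}_u. \end{aligned}$$

The last two lines follow from the tower property: the conditional mean \(\tilde{\varvec{m}}\) is linear in \(\varvec{\theta }_u\), so its posterior mean and covariance are obtained by substituting \(\varvec{m}_u\) and propagating \(\varvec{\Sigma }_u\).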
Finally, we need to sample \(\varvec{M} = (\varvec{\mu ,C, \tau , \sigma })\) conditionally on the data. This can be achieved via Gibbs sampling.
1. Conditional for \(\varvec{\mu }\): This is normal with covariance matrix \(\varvec{C}_\mu \) and mean \(\varvec{\mu }_\mu \) specified by
$$\begin{aligned} \varvec{C}_\mu ^{-1}&=\sum \limits _{u}\left( \varvec{C}_u+ \tau _u^2\mathbf {I}_m\right) ^{-1} + \left( 1/\sigma _\mu ^2\right) \mathbf {I}_m,\\ \varvec{C}_\mu ^{-1}\varvec{\mu }_\mu&= \sum \limits _{u}\left( \varvec{C}_u+ \tau _u^2\mathbf {I}_m\right) ^{-1}\sum \limits _{i=1}^{n_u}\varvec{Y}_{ui}. \end{aligned}$$
2. Conditional for \(\tau _u\), for each \(u\): Endow \(\tau _u^2\) with an \(\hbox {igamma}(a_{\tau _u},b_{\tau _u})\) prior. The conditional for \(\tau _u^2\) is then also \(\hbox {igamma}\), with updated parameters \(a_{\tau _u} := a_{\tau _u} + m n_u/2\) and \(b_{\tau _u} := b_{\tau _u} + \sum _{i=1}^{n_u} \Vert \varvec{Y}_{ui} - \varvec{\theta }_u\Vert ^2/2\).
3. Conditional for \(\sigma _\mu \): Endow \(\sigma _\mu ^2\) with an \(\hbox {igamma}(a_\mu ,b_\mu )\) prior. The conditional for \(\sigma _\mu ^2\) is then updated by \(a_{\mu } := a_{\mu } + m/2\) and \(b_{\mu } := b_{\mu } + \frac{1}{2} \Vert \varvec{\mu }\Vert ^2\).
4. Conditional for \(\varvec{C}_u\), for each \(u\): \(\varvec{C}_u\) is parameterized in exponential form, so that \(C_u(x_1,x_2) = \sigma _{C_u}^2 S_u(x_1,x_2)\) where \(S_u(x_1,x_2) = \exp \{-\phi _u (x_1-x_2)^2\}\). Endow \(\sigma _{C_u}^2\) with an \(\hbox {igamma}(a_{C_{u}},b_{C_{u}})\) prior, which is updated via \(a_{C_u} := a_{C_u} + m/2\) and \(b_{C_{u}} := b_{C_{u}} + \frac{1}{2}(\varvec{\theta }_u - \varvec{\mu })^T \varvec{S}_u^{-1}(\varvec{\theta }_u - \varvec{\mu })\). The parameter \(\phi _u\) is updated via a symmetric Metropolis step, with acceptance probability \(\min (1,\exp \{-\frac{1}{2\sigma _u^2} (\varvec{\theta }_u - \varvec{\mu })^T({\tilde{\varvec{S}}}_u^{-1} - \varvec{S}_u^{-1}) (\varvec{\theta }_u - \varvec{\mu })\})\), where \({\tilde{\varvec{S}}}_u\) denotes \(\varvec{S}_u\) evaluated at the proposed value of \(\phi _u\).
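The inverse-gamma update for \(\tau _u^2\) in the list above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, curves are plain Python lists observed at the same \(m\) levels, and the inverse-gamma is assumed to be parameterized by shape \(a\) and scale \(b\).

```python
import random


def igamma_update(a, b, residual_sq_sum, n_terms):
    """Conjugate update of inverse-gamma (shape, scale) parameters.

    Assumes Gaussian residuals: shape grows by n_terms / 2 and scale
    by half the residual sum of squares.
    """
    return a + n_terms / 2.0, b + residual_sq_sum / 2.0


def sample_igamma(a, b, rng):
    """Draw from inverse-gamma(a, b): if X ~ Gamma(shape=a, scale=1/b), return 1/X."""
    return 1.0 / rng.gammavariate(a, 1.0 / b)


def tau_sq_step(curves, theta, a, b, rng):
    """One Gibbs draw of tau_u^2 given the curves Y_ui and the mean curve theta_u."""
    residual_sq_sum = sum(
        (y - t) ** 2 for curve in curves for y, t in zip(curve, theta)
    )
    n_terms = sum(len(curve) for curve in curves)  # equals m * n_u
    a_new, b_new = igamma_update(a, b, residual_sq_sum, n_terms)
    return sample_igamma(a_new, b_new, rng), (a_new, b_new)
```

The same `igamma_update` helper would serve the updates for \(\sigma _\mu ^2\) and \(\sigma _{C_u}^2\), with the quadratic forms given in the list substituted for the residual sum of squares.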
Appendix B: Properties of summary metrics
Suppose that \(\varvec{\theta }\) is distributed according to a Gaussian process on a closed domain \(B \subset \mathbb {R}\) with mean \(\varvec{\mu }\) and covariance function \(\varvec{C}\). \(\varvec{C}\) can be viewed as a positive semidefinite kernel. Moreover, assume that \(\int C(x_1,x_2) d x_1 d x_2 < \infty \), and consider the integral operator \(L_{\varvec{C}}: L_2(B) \rightarrow L_2(B)\) induced by the kernel \(\varvec{C}\):
$$\begin{aligned} (L_{\varvec{C}} f)(x) = \int _B C(x,x') f(x')\, d x'. \end{aligned}$$
This is a self-adjoint, positive and compact operator with a countable system of non-negative eigenvalues \(\{\lambda _k\}_{k=1}^{\infty }\) and associated eigenfunctions \(\{\varvec{\psi }_k\}_{k=1}^{\infty }\) which form an orthonormal basis of \(L_2(B)\). By Mercer’s theorem, \(\varvec{C}\) admits the following decomposition: \(C(x,x') = \sum _{k=1}^{\infty } \lambda _k \varvec{\psi }_k(x)\varvec{\psi }_k(x')\). Here the series converges absolutely for each pair \(x,x'\) and uniformly in \(B\). For each \(k \in \mathbb {N}_+\), define
$$\begin{aligned} \eta _k = \int _B (\theta (x) - \mu (x))\, \psi _k(x)\, d x. \end{aligned}$$
By the Karhunen–Loève theorem applied to Gaussian processes, \(\varvec{\theta }\) can be written as \(\varvec{\theta } = \varvec{\mu } + \sum _{k=1}^{\infty } \eta _k \varvec{\psi }_k\), where the convergence is almost sure and uniform in \(x\). Moreover, the coefficients \(\{\eta _k\}\) are independent mean-0 Gaussian variables with variance \(\mathrm{var}(\eta _k) = \lambda _k\), for any \(k \in \mathbb {N}_+\).
It is simple to obtain that \(m_1(\varvec{\theta })\) can be expressed in terms of a sum of chi-square and normal variables:
Due to the mutual independence of \(\eta _k\)’s, we obtain that
The variance takes the form
where we have used the fact that \(\mathbb {E}\eta _k = \mathbb {E}\eta _k^3 = 0\); \(\mathbb {E}\eta _k^2 = \lambda _k\), \(\mathbb {E}\eta _k^4 = 3\lambda _k^2\). Although the \(\lambda _k\) and \(\varvec{\psi }_k\) are determined directly from \(\varvec{C}\), except for some special cases closed forms are not available. In practice one might consider sampling for the variance instead.
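The moment calculation and the suggestion to sample for the variance can be illustrated with a short sketch. It rests on an assumption not stated in this appendix: that \(m_1(\varvec{\theta }) = \int _B \theta (x)^2 dx\), so that with \(\mu _k = \int _B \mu (x)\psi _k(x) dx\) and \(\varvec{\mu }\) in the span of the retained eigenfunctions, \(m_1(\varvec{\theta }) = \sum _k (\mu _k + \eta _k)^2\). Under that assumption the stated moments of \(\eta _k\) give mean \(\sum _k (\mu _k^2 + \lambda _k)\) and variance \(\sum _k (4\lambda _k\mu _k^2 + 2\lambda _k^2)\). The function names and the finite truncation are hypothetical.

```python
import random


def m1_moments(lambdas, mu_coefs):
    """Closed-form mean and variance of sum_k (mu_k + eta_k)^2.

    Assumes eta_k are independent N(0, lambda_k), as in the
    Karhunen-Loeve expansion, truncated to len(lambdas) terms.
    """
    mean = sum(c * c for c in mu_coefs) + sum(lambdas)
    var = sum(4.0 * lam * c * c + 2.0 * lam * lam
              for lam, c in zip(lambdas, mu_coefs))
    return mean, var


def m1_monte_carlo(lambdas, mu_coefs, n_draws, rng):
    """Estimate the same mean and variance by direct simulation."""
    draws = []
    for _ in range(n_draws):
        total = 0.0
        for lam, c in zip(lambdas, mu_coefs):
            eta = rng.gauss(0.0, lam ** 0.5)  # standard deviation sqrt(lambda_k)
            total += (c + eta) ** 2
        draws.append(total)
    mean = sum(draws) / n_draws
    var = sum((d - mean) ** 2 for d in draws) / (n_draws - 1)
    return mean, var
```

When the eigen-decomposition is unavailable, the simulation route would instead draw \(\varvec{\theta }\) on a grid from the GP and evaluate the metric numerically; the truncated version above only serves to check the algebra.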
Appendix C: Decomposition of variance and correlation
First, we study the relations among random measures in the model. \(G_0\) is a random measure that varies around \(H = \hbox {GP}(\varvec{\mu }, C)\), where the variation is governed by \(\gamma \). For each group \(u\), \(G_u\) is a random measure that varies around \(G_0\), where the variation is governed by \(\alpha _0\). For each level \(x\) and group \(u\), \(Q_{ux}\) varies around \(G_u\), where the variation is governed by \(\alpha _{u}\). Because \(G_0\sim \hbox {DP}(\gamma ,H)\), by elementary properties of Dirichlet processes, for any measurable set \(A\) of functions
$$\begin{aligned} \mathbb {E}[G_0(A)] = H(A), \qquad \mathrm{var}[G_0(A)] = \frac{H(A)(1-H(A))}{\gamma + 1}. \end{aligned}$$
Turning to the random measures \(G_u\) for each \(u \in V\),
Marginalizing out \(G_0\), we have
Next, for the random measures \(Q_{ux}\) at each level \(x \in D\), for any measurable set \(A_{x}\), as before
so that
Marginalizing out \(G_0\), we have
Next, let \(A\) and \(B\) be measurable sets with respect to observations at \(x_1\) and \(x_2\), respectively. For \(\varvec{\phi } \sim H\), let \(H_{x_1}(A) = P(\phi (x_1) \in A|H)\) and \(H_{x_1,x_2}(A,B) = P(\phi (x_1) \in A; \phi (x_2) \in B|H)\). Then a similar calculation yields, for the measure \(G_0\),
For measure \(G_u\), we have
Similarly, for \(Q_{ux}\):
In all expressions above, a priori, the concentration parameters regulate the fraction of variance or correlation that is passed from one level in the Bayesian hierarchy to the next, starting from the base measure \(H\), which regulates the dependence with respect to the covariate \(x\). Last, we note that similar calculations can be carried out between populations. We omit the details.
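How the concentration parameters pass variance down the hierarchy can be made concrete with a small sketch. It uses only the standard facts that a draw \(G \sim \hbox {DP}(c, G')\) satisfies \(\mathbb {E}[G(A)|G'] = G'(A)\) and \(\mathrm{var}[G(A)|G'] = G'(A)(1-G'(A))/(c+1)\). Iterating the law of total variance then gives a prior variance of \(H(A)(1-H(A))\,(1 - \prod _l c_l/(c_l+1))\) after passing through nested Dirichlet processes with concentrations \(c_1, c_2, \ldots \). The function name is hypothetical, and this compact product form is not necessarily how the displayed expressions were written.

```python
def variance_fraction(concentrations):
    """Prior var[G(A)] divided by H(A)(1 - H(A)) for nested DPs.

    `concentrations` lists the DP concentration parameters from the top
    of the hierarchy downwards, e.g. [gamma] for G_0,
    [gamma, alpha_0] for G_u, [gamma, alpha_0, alpha_u] for Q_ux.
    """
    retained = 1.0
    for c in concentrations:
        if c <= 0:
            raise ValueError("concentration parameters must be positive")
        # Each DP layer shrinks E[G(A)(1 - G(A))] by the factor c / (c + 1).
        retained *= c / (c + 1.0)
    return 1.0 - retained
```

For a single layer this returns \(1/(\gamma +1)\), and for two layers \((\gamma + \alpha _0 + 1)/((\gamma +1)(\alpha _0+1))\). Large concentration parameters therefore keep each measure close to its parent, while small ones let it vary almost as much as a single point mass would.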
Appendix D: Posterior computation for the model in (9)
We recall and introduce key notations: \(\varvec{\phi }_k\) is a random draw from \(H\), \(\varvec{\psi }_t\) a random draw from \(G_0\), \(\varvec{\varphi }_{ur}\) a random draw from \(G_u\). Finally, \(\theta _{ui}(x)\) is a random draw from \(Q_{ux}\).
Let \(k_t\) denote the index of the \(\varvec{\phi }_k\) associated with the functional atom \(\varvec{\psi }_t\), i.e., \(\varvec{\psi }_t = \varvec{\phi }_{k_t}\). Let \(t_{ur}\) denote the index of the \(\varvec{\psi }_t\) associated with the functional atom \(\varvec{\varphi }_{ur}\) in group \(u\), i.e., \(\varvec{\varphi }_{ur} = \varvec{\psi }_{t_{ur}}\). Let \(r_{ui}^{x}\) denote the index of the \(\varvec{\varphi }_{ur}(x)\) associated with the atom \(\theta _{ui}(x)\), i.e., \(\theta _{ui}(x) = \varvec{\varphi }_{ur_{ui}^{x}}(x)\). The local and functional atoms are related by \(\theta _{ui}(x) = \varvec{\varphi }_{ur_{ui}^{x}}(x) = \varvec{\psi }_{t_{ur_{ui}^{x}}}(x) = \varvec{\phi }_{k_{t_{ur_{ui}^{x}}}}(x)\).
Recall that a priori \(G_0 \sim \hbox {DP}(\gamma ,H)\). By a standard property of the Dirichlet process, conditionally on the global factors \(\varvec{\phi }_k\) and the index vector \(\varvec{k}\), the posterior of \(G_0\) is again a DP: \([G_0 | \varvec{k}, \varvec{\phi }_{1}, \ldots , \varvec{\phi }_K] \sim \hbox {DP}(\gamma + q_{\cdot }, \frac{\gamma H + \sum _{k=1}^{K} q_k\delta _{\varvec{\phi }_k}}{\gamma + q_{\cdot }})\), where \(q_k = \#\{t: k_t = k\}\) denotes the number of \(\varvec{\psi }_t\)’s associated with \(\varvec{\phi }_k\), and \(q_{\cdot } = \sum _{k=1}^{K} q_k\). This implies an explicit representation for \(G_0\) as follows:
$$\begin{aligned} G_0 = \sum \limits _{k=1}^{K}\beta _k\delta _{\varvec{\phi }_k} + \beta _{\mathrm{new}} G_0^{\mathrm{new}}, \qquad G_0^{\mathrm{new}} \sim \hbox {DP}(\gamma , H), \qquad (\beta _1,\ldots ,\beta _K,\beta _{\mathrm{new}}) \sim \hbox {Dir}(q_1,\ldots ,q_K,\gamma ). \end{aligned}$$
Similarly, conditionally on \(G_0\), the random distributions \(G_u\) are independent across the group indices \(u\). In particular, given \(G_0\), \(\varvec{k}\), \(\varvec{t}_u\) and the \(\varvec{\phi }_k\)’s, the posterior of \(G_u\) is distributed as \([G_u | G_0, \varvec{k}, \varvec{t}, (\varvec{\phi }_k)_{k=1}^{K}] \sim \hbox {DP}(\alpha _0 + m_{u\cdot }, \frac{\alpha _0 G_0 + \sum _{k=1}^{K}m_{uk}\delta _{\varvec{\phi }_k}}{\alpha _0 + m_{u\cdot }})\), where \(m_{uk} = \#\{r: k_{t_{ur}} = k\}\), the number of \(\varphi _{ur}\) associated with \(\varvec{\phi }_k\), and \(m_{u\cdot } = \sum _{k=1}^{K} m_{uk}\). This implies the following representation for \(G_u\): \(G_u = \sum _{k=1}^{K}\pi _{uk}\delta _{\varvec{\phi }_k} + \pi _{u\mathrm{new}} G_u^{\mathrm{new}}\), where \(G_u^{\mathrm{new}} \sim \hbox {DP}(\alpha _0\beta _{\mathrm{new}}, G_0^{\mathrm{new}})\) and
Once more, conditionally on \(G_u\), the random distributions \(Q_{ux}\) are independent across levels \(x\). In particular, given \(G_u\), \(\varvec{k}\), \(\varvec{t}_u\), \(\varvec{r}_u^x\), and the \(\varvec{\phi }_k\)’s, the posterior of \(Q_{ux}\) is distributed as
where \(n_{uxk} = \#\{i: k_{t_{ur_{ui}^{x}}} = k\}\), the number of \(\theta _{ui}(x)\) associated with \(\varvec{\phi }_k(x)\), and \(n_{ux\cdot } = \sum _{k=1}^{K} n_{uxk}\). This implies the following representation for \(Q_{ux}\):
The above characterization suggests a straightforward Gibbs sampling algorithm by constructing a Markov chain for \(((\varvec{\phi }_k)_{k=1}^{K}, \varvec{k}, \varvec{t}, \varvec{r})\). To simplify the implementation by avoiding the book-keeping steps for the index variables, we instead consider a modified block Gibbs sampling algorithm that constructs a Markov chain for the count variables (e.g., \(\varvec{q},\varvec{m},\varvec{n}\)). We still need the index variable \(z_{uxi}\), which denotes the index of the global atom \(\varvec{\phi }_k\) that the local atom \(\theta _{ui}(x)\) is associated with, i.e., \(z_{uxi} = k_{t_{ur_{ui}^{x}}}\). Note that the likelihood of the data involves only the \(z_{uxi}\) variables, and that \(\varvec{n}_{ux}\) can be calculated directly in terms of the \(z_{uxi}\)’s:
We proceed to describe a block Gibbs sampler by considering a Markov chain for \((\varvec{\phi },\varvec{q},\varvec{m},\varvec{n},\varvec{z},\varvec{\beta }, \varvec{\pi },\varvec{\omega })\).
Sampling \(\varvec{\beta },\varvec{\pi }, \varvec{\omega }\). Conditional probabilities: \([\varvec{\beta }|\varvec{q},\gamma ]\times \prod _{u}[\varvec{\pi }_u|\varvec{m}_u,\varvec{\beta },\alpha _0] \times \prod _{u}\prod _{x}[\varvec{\omega }_{ux}|\varvec{n}_{ux},\varvec{\pi }_{u},\alpha _u]\) are given by Eqs. (15–17).
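The block of weight updates can be sketched as a chain of Dirichlet draws. Eqs. (15–17) are not reproduced in this appendix, so the Dirichlet parameters below are assumptions based on the usual hierarchical Dirichlet process posterior (Teh et al. 2006): \(\varvec{\beta } \sim \hbox {Dir}(q_1,\ldots ,q_K,\gamma )\), \(\varvec{\pi }_u \sim \hbox {Dir}(\alpha _0\beta _1 + m_{u1},\ldots ,\alpha _0\beta _K + m_{uK},\alpha _0\beta _{\mathrm{new}})\) and \(\varvec{\omega }_{ux} \sim \hbox {Dir}(\alpha _u\pi _{u1} + n_{ux1},\ldots ,\alpha _u\pi _{uK} + n_{uxK},\alpha _u\pi _{u\mathrm{new}})\). Function names are hypothetical, and only one group \(u\) and one level \(x\) are handled.

```python
import random


def sample_dirichlet(params, rng):
    """Draw from Dirichlet(params) by normalising independent gamma variates."""
    # A tiny floor keeps gammavariate valid if a parameter underflows to 0.
    draws = [rng.gammavariate(max(p, 1e-12), 1.0) for p in params]
    total = sum(draws)
    return [d / total for d in draws]


def sample_weights(q, m_u, n_ux, gamma, alpha_0, alpha_u, rng):
    """Draw (beta, pi_u, omega_ux); each has K + 1 entries, the last being 'new'.

    q, m_u and n_ux are length-K lists of counts for the K global atoms.
    """
    beta = sample_dirichlet(list(q) + [gamma], rng)
    pi_params = [alpha_0 * b + m for b, m in zip(beta, m_u)]
    pi_u = sample_dirichlet(pi_params + [alpha_0 * beta[-1]], rng)
    omega_params = [alpha_u * p + n for p, n in zip(pi_u, n_ux)]
    omega_ux = sample_dirichlet(omega_params + [alpha_u * pi_u[-1]], rng)
    return beta, pi_u, omega_ux
```

In a full sampler the second and third draws would be repeated for every group \(u\) and every level \(x\), reusing the same \(\varvec{\beta }\) across groups and the same \(\varvec{\pi }_u\) across levels.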
Sampling of \(\varvec{z}\). Note that a priori, \(z_{uxi}|\varvec{\omega }_{ux} \sim \varvec{\omega }_{ux}\). Let \(n_{uxk}^{-uxi}\) denote the number of data items in group \(u\) and at level \(x\), other than \(y_{ui}(x)\), that are associated with the mixture component \(k\). Then,
where \(f_{uxk^{\mathrm{new}}}^{y_{ui}(x)}(y_{ui}(x)) = \int F(y_{ui}(x)|\varvec{\phi }(x)) d H(\varvec{\phi }(x))\) is the prior density of \(y_{ui}(x)\).
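A single update of \(z_{uxi}\) can be sketched as a categorical draw over the \(K\) existing components plus one new component. The conditional probabilities are not reproduced here, so the sketch assumes the form common in direct-assignment samplers for hierarchical Dirichlet processes: weight \((n_{uxk}^{-uxi} + \alpha _u\pi _{uk})\) times the likelihood of \(y_{ui}(x)\) under \(\varvec{\phi }_k(x)\) for an existing component, and \(\alpha _u\pi _{u\mathrm{new}}\) times the prior predictive density for a new one. It further assumes a Gaussian kernel \(F\) with variance `tau_sq` and a Gaussian marginal of \(H\) at level \(x\) with mean `prior_mean` and variance `prior_var`; all names are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def z_conditional(y, phi_x, counts_minus, pi_u, alpha_u, tau_sq,
                  prior_mean, prior_var):
    """Normalised probabilities for z_uxi over K existing atoms plus a new one.

    phi_x[k] is phi_k(x); counts_minus[k] excludes the current item;
    pi_u has K + 1 entries, the last being the weight of a new atom.
    """
    weights = [
        (n + alpha_u * p) * normal_pdf(y, phi, tau_sq)
        for n, p, phi in zip(counts_minus, pi_u, phi_x)
    ]
    # New component: integrate the Gaussian kernel against the Gaussian prior.
    weights.append(alpha_u * pi_u[-1] * normal_pdf(y, prior_mean, prior_var + tau_sq))
    total = sum(weights)
    return [w / total for w in weights]


def sample_z(probs, rng):
    """Draw an index in 0..K, where K stands for the new component."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

If index \(K\) is drawn, a sampler would create a new global atom and extend the count and weight vectors accordingly; that bookkeeping is omitted here.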
Sampling of \(\varvec{m}_u\). Recall that \(m_{uk}\) is the number of functional atoms \(\varvec{\varphi }_{ur}\) associated with \(\varvec{\phi }_k\) within each group \(u\). This set of functional atoms \(\varphi _{ur}\)’s can be subdivided into disjoint subsets associated with levels \(x\in D\) when the functional atoms \(\varphi _{ur}\) are first generated. Let \(m_{uxk}\) be the number of such functional atoms corresponding to the level \(x\). To be precise,
\(m_{uxk}\) corresponds to the number of partitions among the \(n_{uxk}\) atoms \(\theta _{ui}(x)\) such that \(z_{uxi} = k\). To obtain the distribution of \(m_{uxk}\), consider the distribution of \(r_{ui}^{x}\) conditionally on \(G_u\) (i.e., \(\varvec{\pi }_u\), \(\varvec{\phi }_k\)’s). Note that given \(G_u\), the \(Q_{ux}\) are independent across \(x\)’s. For each atom \(\theta _{ui}(x)\), the probability of being assigned to an existing atom \(\varvec{\varphi }_{ur}(x)\) such that \(k_{t_{ur}} = k\) is
while the probability of being assigned to a new atom \(\varvec{\varphi }_{ur^{\mathrm{new}}}(x)\) is
where \(n_{ux\cdot r}^{-uxi} := \#\{i': r_{ui'}^{x} = r; uxi \ne uxi' \}\) is the number of data items at group \(u\) and level \(x\), other than \(y_{ui}(x)\), that are associated with \(\varvec{\varphi }_{ur}\). This implies that \(m_{uxk}\) is the number of partitions that arise in a population of \(n_{uxk}\) data items distributed according to a Dirichlet process with concentration parameter \(\alpha _u\pi _{uk}\). It was shown by Antoniak (1974) that the distribution of \(m_{uxk}\) has the form
$$\begin{aligned} P(m_{uxk} = m | n_{uxk}, \alpha _u, \pi _{uk}) = \frac{\Gamma (\alpha _u\pi _{uk})}{\Gamma (\alpha _u\pi _{uk} + n_{uxk})}\, s(n_{uxk},m)\, (\alpha _u\pi _{uk})^{m}, \end{aligned}$$
where \(s(n,m)\) are the unsigned Stirling numbers of the first kind.
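The distribution of the number of partitions can be tabulated directly from the Stirling-number recurrence \(s(n,m) = s(n-1,m-1) + (n-1)\,s(n-1,m)\). The sketch below evaluates \(P(m | n, a) = \frac{\Gamma (a)}{\Gamma (a+n)} s(n,m) a^m\) for a generic concentration \(a\) (here \(a\) would be \(\alpha _u\pi _{uk}\) and \(n\) would be \(n_{uxk}\)) and draws from it. Function names are hypothetical, and the computation is done on the log scale to avoid overflow for moderately large \(n\).

```python
import math
import random


def unsigned_stirling_first(n):
    """Return [s(n, 0), ..., s(n, n)], unsigned Stirling numbers of the first kind."""
    row = [1]  # s(0, 0) = 1
    for i in range(1, n + 1):
        new_row = [0] * (i + 1)
        for m in range(i + 1):
            left = row[m - 1] if m >= 1 else 0   # s(i-1, m-1)
            same = row[m] if m < i else 0        # s(i-1, m)
            new_row[m] = left + (i - 1) * same
        row = new_row
    return row


def antoniak_pmf(n, a):
    """Probabilities of m = 0..n partitions among n items under DP concentration a."""
    row = unsigned_stirling_first(n)
    log_norm = math.lgamma(a) - math.lgamma(a + n)
    return [
        0.0 if s == 0 else math.exp(math.log(s) + m * math.log(a) + log_norm)
        for m, s in enumerate(row)
    ]


def sample_num_partitions(n, a, rng):
    """Draw the number of partitions m from the Antoniak distribution."""
    return rng.choices(range(n + 1), weights=antoniak_pmf(n, a), k=1)[0]
```

For large counts the exact integer Stirling numbers become expensive to build. A common alternative is to simulate the Chinese restaurant process sequentially and count the tables, which gives draws from the same distribution.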
Sampling \(\varvec{q}\). The conditional distribution of \(\varvec{q}\) can be obtained in the same manner as that of \(\varvec{m}\). It can be shown that \(q_{k} = \sum _{u\in V} q_{uk}\) where \(q_{uk} = \#\{t: k_{t_{ur}} = k \;\hbox {for some}\; r\}\). Moreover, \(q_{uk}\) is the number of partitions that arise in a population of \(m_{uk}\) atoms distributed according to a Dirichlet process with concentration parameter \(\alpha _0\beta _k\):
$$\begin{aligned} P(q_{uk} = q | m_{uk}, \alpha _0, \beta _k) = \frac{\Gamma (\alpha _0\beta _k)}{\Gamma (\alpha _0\beta _k + m_{uk})}\, s(m_{uk},q)\, (\alpha _0\beta _k)^{q}. \end{aligned}$$
Sampling \(\varvec{\phi }\). The conditional distribution for \(\varvec{\phi }\) can be obtained easily. Suppose that the prior distribution \(H\) for \(\varvec{\phi }_k\) is given by a mean function \(\varvec{\mu }\) and covariance function \(\varvec{C}\), which is reduced to a covariance matrix \(\varvec{C}_k\) when restricted to a finite number of covariate values for \(x\). Then the posterior distribution for \(\varvec{\phi }_k\) is also Gaussian with mean and covariance expressions given as follows:
About this article
Cite this article
Nguyen, X., Gelfand, A.E. Bayesian nonparametric modeling for functional analysis of variance. Ann Inst Stat Math 66, 495–526 (2014). https://doi.org/10.1007/s10463-013-0436-7
DOI: https://doi.org/10.1007/s10463-013-0436-7