Bayesian nonparametric modeling for functional analysis of variance

Special issue: Bayesian Inference and Stochastic Computation

Abstract

Analysis of variance is a standard statistical modeling approach for comparing populations. The functional analysis of variance setting envisions mean functions associated with the populations, customarily modeled using basis representations, and seeks to compare them. Here, we adopt the modeling approach of functions as realizations of stochastic processes. We extend the Gaussian process version to allow nonparametric specifications using Dirichlet process mixing. Several metrics are introduced for the comparison of populations. We then introduce a hierarchical Dirichlet process model which enables comparison of the population distributions, either directly or through functionals of interest using the foregoing metrics. The modeling is extended to allow a switched sampling scheme: there are still population-level distributions, but now we sample at levels of the functions, obtaining observations from potentially different individuals at different levels. We illustrate with both simulated data and a dataset of temperature versus depth measurements at different locations in the Atlantic Ocean.


References

  • Antoniak, C. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics, 2, 1152–1174.

  • Banerjee, S., Carlin, B., Gelfand, A. (2004). Hierarchical modeling and analysis for spatial data. Boca Raton, FL: Chapman and Hall/CRC Press.

  • Brumback, B., Rice, J. (1998). Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association, 93, 961–980.

  • Cressie, N. (1993). Statistics for spatial data. New York: Wiley.

  • De Iorio, M., Müller, P., Rosner, G., MacEachern, S. (2004). An ANOVA model for dependent random measures. Journal of the American Statistical Association, 99, 205–215.

  • Dudley, R. M. (1976). Probabilities and metrics: Convergence of laws on metric spaces, with a view to statistical testing. Aarhus, Denmark: Aarhus Universitet, Matematisk Institut.

  • Dunson, D. (2010). Nonparametric Bayes applications to biostatistics. In N. Hjort, C. Holmes, P. Müller, S. Walker (Eds.), Bayesian nonparametrics: Principles and practice (pp. 223–273). Cambridge, UK: Cambridge University Press.

  • Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1, 209–230.

  • Ferraty, F., Vieu, P. (2006). Nonparametric functional data analysis: Theory and practice. New York: Springer.

  • Gelfand, A., Kottas, A., MacEachern, S. (2005). Bayesian nonparametric spatial modeling with Dirichlet process mixing. Journal of the American Statistical Association, 100, 1021–1035.

  • Ishwaran, H., Rao, J. S. (2005). Spike and slab variable selection: Bayesian and frequentist strategies. Annals of Statistics, 33, 730–773.

  • Ishwaran, H., Zarepour, M. (2002). Dirichlet prior sieves in finite normal mixtures. Statistica Sinica, 12, 941–963.

  • Kaufman, C., Sain, S. (2010). Bayesian functional ANOVA modeling using Gaussian process prior distributions. Bayesian Analysis, 5, 123–150.

  • Kent, J. (1989). Continuity properties of random fields. Annals of Probability, 17, 1432–1440.

  • MacEachern, S. (1999). Dependent nonparametric processes. In ASA proceedings of the section on Bayesian statistical science (pp. 50–55). Alexandria, VA: American Statistical Association.

  • MacLehose, R. F., Dunson, D. (2009). Nonparametric Bayes kernel-based priors for functional data analysis. Statistica Sinica, 19, 611–629.

  • Morris, J. S., Carroll, R. J. (2006). Wavelet-based functional mixed models. Journal of the Royal Statistical Society B, 68, 179–199.

  • Nguyen, X. (2010). Inference of global clusters from locally distributed data. Bayesian Analysis, 5, 817–846.

  • Nguyen, X. (2013a). Borrowing strength in hierarchical Bayes: Convergence of the Dirichlet base measure. arxiv.org/abs/1301.0802.

  • Nguyen, X. (2013b). Convergence of latent mixing measures in finite and infinite mixture models. Annals of Statistics, 41, 370–400.

  • Nguyen, X., Gelfand, A. (2011). The Dirichlet labeling process for clustering functional data. Statistica Sinica, 21, 1249–1289.

  • Petrone, S., Guindani, M., Gelfand, A. (2009). Hybrid Dirichlet processes for functional data. Journal of the Royal Statistical Society B, 71(4), 755–782.

  • Ramsay, J. O., Silverman, B. (2006). Functional data analysis (2nd ed.). New York: Springer.

  • Rappold, A., Lavine, M., Lozier, S. (2007). Subjective likelihood for the assessment of trends in the ocean's mixed layer depth. Journal of the American Statistical Association, 102, 771–787.

  • Rodriguez, A., Dunson, D., Gelfand, A. (2009). Bayesian nonparametric functional data analysis through density estimation. Biometrika, 96(1), 149–162.

  • Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4, 639–650.

  • Spitzner, D., Marron, J., Essick, G. (2003). Mixed-model functional ANOVA for studying human tactile perception. Journal of the American Statistical Association, 98, 263–272.

  • Stein, M. L. (1999). Interpolation of spatial data: Some theory for kriging. New York: Springer-Verlag.

  • Teh, Y., Jordan, M., Beal, M., Blei, D. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101, 1566–1581.

  • Wang, N., Carroll, R., Lin, X. (2005). Efficient semiparametric marginal estimation for longitudinal/clustered data. Journal of the American Statistical Association, 100, 147–157.

Author information

Correspondence to XuanLong Nguyen.

Additional information

This work was partially supported by NSF grants No. 0940671 and No. 1047871 (XN).

Appendices

Appendix A: Inference of mean curves under GP prior

This section provides standard expressions for the conditional expectation and variance of population mean curves given a collection of functional data. Suppose that the data \(\varvec{Y} = \{Y_{ui}(x)\}\) are observed at the same set of levels \(x_1,\ldots , x_m\). In the following we use \(\varvec{M}\) to collect all model parameters, \(\varvec{M} = (\varvec{\mu },\varvec{C},\varvec{\sigma },\varvec{\tau })\). Given \(\varvec{Y}\) and \(\varvec{M}\), the \(\varvec{\theta }_u= (\theta _u(x_1),\ldots , \theta _u(x_m))\) are independent across \(u\in V\). Let \(x_{01}, \ldots ,x_{0p}\) be \(p\) levels that are either placed regularly in \(B\) or sampled uniformly from \(B\). For a given population \(u\), we need to derive the posterior distribution of both \(\varvec{\theta }_u\) and \(\varvec{\theta }_{0u} := (\theta _u(x_{01}),\ldots , \theta _u(x_{0p}))\).

Let \(\varvec{C}_u, \varvec{C}_{0u}\) be the a priori covariance matrices of \(\varvec{\theta }_u\) and \(\varvec{\theta }_{0u}\), respectively, and let \(\varvec{R}_u\) be the \(m\times p\) cross-covariance matrix between the two, as given by the GP with covariance function \(\varvec{C}\). We have

$$\begin{aligned} \varvec{\theta }_{u} | \hbox {Data}, \varvec{M}&\sim N_m({\tilde{\varvec{\mu }}}_u, \tilde{\varvec{C}}_u), \hbox {where}\\ \tilde{\varvec{C}}_u^{-1}&= \varvec{C}_u^{-1} + \left( n_u/\tau _u^2\right) \mathbf {I}_{m},\\ \tilde{\varvec{C}}_u^{-1} {\tilde{\varvec{\mu }}}_u&= \varvec{C}_u^{-1}\varvec{\mu } + \left( 1/\tau _u^2\right) \sum \limits _{i=1}^{n_u} \varvec{Y}_{ui}. \end{aligned}$$

We have \(\varvec{\theta }_{0u} | \varvec{\theta }_{u}, \varvec{M} \sim N_{p}(\tilde{\varvec{m}}, \tilde{\varvec{S}})\) where

$$\begin{aligned} \tilde{\varvec{m}}&= \varvec{m}_{0u} + \varvec{R}_u^T \varvec{C}_{u}^{-1} (\varvec{\theta }_u- \varvec{\mu }),\\ \tilde{\varvec{S}}&= \varvec{C}_{0u} - \varvec{R}_u^T \varvec{C}_{u}^{-1}\varvec{R}_u, \hbox {where}\\ \varvec{m}_{0u}&= (\mu (x_{01}),\ldots ,\mu (x_{0p})). \end{aligned}$$

By the conditional independence relation \(\varvec{\theta }_{0u} \perp \hbox {Data} \mid \varvec{\theta }_u, \varvec{M}\), we have

$$\begin{aligned}{}[\varvec{\theta }_{0u}| \hbox {Data},\varvec{M}] \propto \int [\varvec{\theta }_{0u} | \varvec{\theta }_{u}, \varvec{M}] \times [\varvec{\theta }_{u} | \hbox {Data}, \varvec{M}] \mathrm{d} \varvec{\theta }_{u}. \end{aligned}$$

Standard calculations yield

$$\begin{aligned} \varvec{\theta }_{0u}| \hbox {Data},\varvec{M}&\sim N_p\left( {\tilde{\varvec{\mu }}}_{0u},\tilde{\varvec{C}}_{0u}\right) , \hbox {where}\\ {\tilde{\varvec{\mu }}}_{0u}&= \varvec{m}_{0u} + \varvec{R}_u^T \varvec{C}_u^{-1} ({\tilde{\varvec{\mu }}}_u- \varvec{\mu }),\\ \tilde{\varvec{C}}_{0u}&= {\tilde{\varvec{S}}} + \varvec{R}_u^T \varvec{C}_u^{-1} \tilde{\varvec{C}}_{u} \varvec{C}_u^{-1} \varvec{R}_u. \end{aligned}$$
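The two Gaussian updates above translate directly into linear algebra. Below is a minimal numpy sketch, assuming the squared-exponential covariance used later in this appendix; the function names are ours, and plain matrix inversion is used for readability where Cholesky solves would be preferred in practice.

```python
import numpy as np

def sq_exp_cov(x1, x2, sigma2, phi):
    # C(x, x') = sigma2 * exp(-phi * (x - x')^2), the form used in this appendix.
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-phi * d ** 2)

def posterior_mean_curve(x, x0, Y_u, mu, mu0, tau2, sigma2, phi):
    """Posterior of theta_u at the observed levels x and at new levels x0.

    Y_u : (n_u, m) array of replicate curves in population u.
    mu, mu0 : prior mean function evaluated at x and x0, respectively."""
    n_u, m = Y_u.shape
    C_u = sq_exp_cov(x, x, sigma2, phi)      # prior covariance at observed levels
    C_0u = sq_exp_cov(x0, x0, sigma2, phi)   # prior covariance at new levels
    R_u = sq_exp_cov(x, x0, sigma2, phi)     # m x p cross-covariance

    # Conjugate Gaussian update for theta_u given the data.
    C_u_inv = np.linalg.inv(C_u)
    C_tilde = np.linalg.inv(C_u_inv + (n_u / tau2) * np.eye(m))
    mu_tilde = C_tilde @ (C_u_inv @ mu + Y_u.sum(axis=0) / tau2)

    # Kriging-style predictive for theta_0u, with theta_u integrated out.
    A = R_u.T @ C_u_inv                      # p x m
    S_tilde = C_0u - A @ R_u
    mu0_post = mu0 + A @ (mu_tilde - mu)
    C0_post = S_tilde + A @ C_tilde @ A.T
    return mu_tilde, C_tilde, mu0_post, C0_post
```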

Finally, we need to sample \(\varvec{M} = (\varvec{\mu },\varvec{C}, \varvec{\tau }, \varvec{\sigma })\) conditionally on the data. This can be achieved via Gibbs sampling; a schematic implementation of these updates is sketched after the following list.

  1. Conditional for \(\varvec{\mu }\): This is normal, with covariance matrix and mean specified by

    $$\begin{aligned} \varvec{C}_\mu ^{-1}&=\sum \limits _{u}\left( \varvec{C}_u+ \left( \tau _u^2/n_u\right) \mathbf {I}_m\right) ^{-1} + \left( 1/\sigma _\mu ^2\right) \mathbf {I}_m,\\ \varvec{C}_\mu ^{-1}\varvec{\mu }_\mu&= \sum \limits _{u}\left( \varvec{C}_u+ \left( \tau _u^2/n_u\right) \mathbf {I}_m\right) ^{-1}\bar{\varvec{Y}}_{u}, \end{aligned}$$

    where \(\bar{\varvec{Y}}_u = \sum _{i=1}^{n_u}\varvec{Y}_{ui}/n_u\) denotes the group mean curve.
  2. Conditional for \(\tau _u^2\), for each \(u\): Endow \(\tau _u^2\) with an \(\hbox {igamma}(a_{\tau _u},b_{\tau _u})\) prior; then the conditional for \(\tau _u^2\) is also \(\hbox {igamma}\), with parameters updated as \(a_{\tau _u} := a_{\tau _u} + m n_u/2\) and \(b_{\tau _u} := b_{\tau _u} + \sum _{i=1}^{n_u} \Vert \varvec{Y}_{ui} - \varvec{\theta }_u\Vert ^2/2\).

  3. Conditional for \(\sigma _\mu ^2\): Endow \(\sigma _\mu ^2\) with an \(\hbox {igamma}(a_\mu ,b_\mu )\) prior; then the conditional for \(\sigma _\mu ^2\) is \(\hbox {igamma}\), with parameters updated as \(a_{\mu } := a_{\mu } + m/2\) and \(b_{\mu } := b_{\mu } + \frac{1}{2}\Vert \varvec{\mu }\Vert ^2\).

  4. Conditional for \(\varvec{C}_u\), for each \(u\): \(\varvec{C}_u\) is given the squared-exponential form \(C_u(x_1,x_2) = \sigma _{C_u}^2 S_u(x_1,x_2)\), where \(S_u(x_1,x_2) = \exp (-\phi _u (x_1-x_2)^2)\). Endow \(\sigma _{C_u}^2\) with an \(\hbox {igamma}(a_{C_{u}},b_{C_{u}})\) prior, which is updated via \(a_{C_u} := a_{C_u} + m/2\) and \(b_{C_{u}} := b_{C_{u}} + \frac{1}{2}(\varvec{\theta }_u - \varvec{\mu })^T \varvec{S}_u^{-1}(\varvec{\theta }_u - \varvec{\mu })\). \(\phi _u\) is updated via a symmetric Metropolis step: a proposed value with correlation matrix \(\tilde{\varvec{S}}_u\) is accepted with probability \(\min \{1,\exp (-\frac{1}{2\sigma _{C_u}^2} (\varvec{\theta }_u - \varvec{\mu })^T(\tilde{\varvec{S}}_u^{-1} - \varvec{S}_u^{-1}) (\varvec{\theta }_u - \varvec{\mu }))\}\).
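The following sketch illustrates updates 2 and 4, the two that are not plain Gaussian draws. It assumes a flat prior on \(\phi _u\) truncated to \((0,\infty )\) and a Gaussian random-walk proposal; note that the coded acceptance ratio also includes the Gaussian log-determinant term, which accompanies the quadratic form displayed in step 4.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_tau2(a_tau, b_tau, Y_u, theta_u):
    # Step 2: igamma(a_tau + m*n_u/2, b_tau + sum_i ||Y_ui - theta_u||^2 / 2).
    n_u, m = Y_u.shape
    a_post = a_tau + 0.5 * m * n_u
    b_post = b_tau + 0.5 * np.sum((Y_u - theta_u) ** 2)
    return 1.0 / rng.gamma(a_post, 1.0 / b_post)  # X ~ Gamma(a, 1/b) => 1/X ~ igamma(a, b)

def metropolis_phi(phi_u, theta_u, mu, sigma2_Cu, x, step=0.1):
    # Step 4: symmetric Gaussian random-walk update of the decay parameter phi_u,
    # under a flat prior on (0, infinity) (an assumption of this sketch).
    phi_prop = phi_u + step * rng.normal()
    if phi_prop <= 0.0:
        return phi_u
    d2 = (x[:, None] - x[None, :]) ** 2
    S_cur, S_prop = np.exp(-phi_u * d2), np.exp(-phi_prop * d2)
    c = theta_u - mu
    quad_cur = c @ np.linalg.solve(S_cur, c)
    quad_prop = c @ np.linalg.solve(S_prop, c)
    _, logdet_cur = np.linalg.slogdet(S_cur)
    _, logdet_prop = np.linalg.slogdet(S_prop)
    # Log ratio of N(theta_u; mu, sigma2_Cu * S_phi): the quadratic part matches
    # the expression in step 4; the slogdet term is the Gaussian normalizer.
    log_acc = -0.5 * (logdet_prop - logdet_cur) \
              - (quad_prop - quad_cur) / (2.0 * sigma2_Cu)
    return phi_prop if np.log(rng.uniform()) < log_acc else phi_u
```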

Appendix B: Properties of summary metrics

Suppose that \(\varvec{\theta }\) is distributed according to a Gaussian process on a closed domain \(B \subset \mathbb {R}\) with mean \(\varvec{\mu }\) and covariance function \(\varvec{C}\); \(\varvec{C}\) can be viewed as a positive semidefinite kernel. Moreover, assume that \(\int C(x_1,x_2) \,\mathrm{d} x_1 \mathrm{d} x_2 < \infty \), and consider the integral operator \(L_{\varvec{C}}: L_2(B) \rightarrow L_2(B)\) induced by the kernel \(\varvec{C}\):

$$\begin{aligned} L_Cf(x) = \int _B C(x,x')f(x') \mathrm{d}x'. \end{aligned}$$

This is a self-adjoint, positive, compact operator with a countable system of non-negative eigenvalues \(\{\lambda _k\}_{k=1}^{\infty }\) and associated eigenfunctions \(\{\varvec{\psi }_k\}_{k=1}^{\infty }\), which form an orthonormal basis of \(L_2(B)\). By Mercer's theorem, \(\varvec{C}\) admits the decomposition \(C(x,x') = \sum _{k=1}^{\infty } \lambda _k \varvec{\psi }_k(x)\varvec{\psi }_k(x')\), where the series converges absolutely for each pair \(x,x'\) and uniformly in \(B\). For each \(k \in \mathbb {N}_+\), define

$$\begin{aligned} \eta _k = \int _B (\theta (x)-\mu (x)) \varvec{\psi }_k(x) \mathrm{d}x. \end{aligned}$$

By the Karhunen–Loève theorem applied to Gaussian processes, \(\varvec{\theta }\) can be written as \(\varvec{\theta } = \varvec{\mu } + \sum _{k=1}^{\infty } \eta _k \varvec{\psi }_k\), where the convergence is almost sure and uniform in \(x\). Moreover, the coefficients \(\{\eta _k\}\) are independent mean-zero Gaussian variables with variances \(\mathrm{var}(\eta _k) = \lambda _k\), \(k \in \mathbb {N}_+\).

Since the \(\varvec{\psi }_k\) are orthonormal, expanding \(m_1(\varvec{\theta }) = \Vert \varvec{\theta }\Vert ^2\) shows that it can be expressed as a sum of chi-square and normal variables:

$$\begin{aligned} m_1(\varvec{\theta }) = \Vert \varvec{\mu }\Vert ^2 + \sum \limits _{k=1}^{\infty } \eta _k^2 + 2\sum \limits _{k=1}^{\infty } \eta _k\varvec{\mu }^T \varvec{\psi }_k. \end{aligned}$$

Due to the mutual independence of \(\eta _k\)’s, we obtain that

$$\begin{aligned} \mathbb {E}[m_1(\varvec{\theta })|\varvec{\mu },\varvec{C}] = \Vert \varvec{\mu }\Vert ^2 + \sum \limits _{k=1}^{\infty } \lambda _k = \Vert \varvec{\mu }\Vert ^2 + \int _{B}\varvec{C}(x,x) \,\mathrm{d}x. \end{aligned}$$

The variance takes the form

$$\begin{aligned} \mathrm{var} [m_1(\varvec{\theta })|\varvec{\mu },\varvec{C}]&= \mathbb {E}\left[ \left( \,\sum \limits _{k=1}^{\infty } \eta _k^2 + 2\sum \limits _{k=1}^{\infty }\eta _k \varvec{\mu }^T \varvec{\psi }_k - \sum \limits _{k=1}^{\infty }\lambda _k \right) ^2 \biggr |\varvec{\mu },\varvec{C}\right] \\&= \sum \limits _{k=1}^{\infty }\left( 2\lambda _k^2 + 4\lambda _k\left( \varvec{\mu }^T\varvec{\psi }_k\right) ^2\right) , \end{aligned}$$

where we have used the fact that \(\mathbb {E}\eta _k = \mathbb {E}\eta _k^3 = 0\), \(\mathbb {E}\eta _k^2 = \lambda _k\), and \(\mathbb {E}\eta _k^4 = 3\lambda _k^2\). Although the \(\lambda _k\) and \(\varvec{\psi }_k\) are determined directly by \(\varvec{C}\), closed forms are available only in special cases. In practice one might instead estimate the variance by sampling.
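To illustrate, the sketch below discretizes \(L_{\varvec{C}}\) on a grid, computes the closed-form mean and variance of \(m_1\) from the resulting eigenpairs, and checks them against Monte Carlo draws of \(\varvec{\theta }\); the domain, mean function, and kernel are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Discretize B = [0, 1]; the quadrature weight dx turns matrix algebra into L2(B) integrals.
m = 100
x = np.linspace(0.0, 1.0, m)
dx = x[1] - x[0]
mu = np.sin(2.0 * np.pi * x)                          # illustrative mean function
C = np.exp(-25.0 * (x[:, None] - x[None, :]) ** 2)    # illustrative covariance kernel

# Discrete Mercer / Karhunen-Loeve: eigenpairs of the integral operator L_C.
lam, psi = np.linalg.eigh(C * dx)
lam = np.clip(lam, 0.0, None)                         # clip tiny negative round-off
psi = psi / np.sqrt(dx)                               # normalize so int_B psi_k^2 = 1

mu_psi = (mu @ psi) * dx                              # inner products <mu, psi_k>
mean_m1 = np.sum(mu ** 2) * dx + lam.sum()            # ||mu||^2 + sum_k lambda_k
var_m1 = np.sum(2.0 * lam ** 2 + 4.0 * lam * mu_psi ** 2)

# Monte Carlo check: draw theta ~ GP(mu, C) on the grid and integrate theta^2.
L = np.linalg.cholesky(C + 1e-8 * np.eye(m))          # jitter for numerical stability
theta = mu + (L @ rng.standard_normal((m, 5000))).T
m1 = np.sum(theta ** 2, axis=1) * dx
print(mean_m1, m1.mean())                             # should agree to Monte Carlo error
print(var_m1, m1.var())
```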

Appendix C: Decomposition of variance and correlation

First, we study the relations among the random measures in the model. \(G_0\) is a random measure that varies around \(H = \hbox {GP}(\varvec{\mu }, C)\), with variation governed by \(\gamma \). For each group \(u\), \(G_u\) is a random measure that varies around \(G_0\), with variation governed by \(\alpha \). For each level \(x\) and group \(u\), \(Q_{ux}\) varies around \(G_u\), with variation governed by \(\alpha _{u}\). Because \(G_0\sim \hbox {DP}(\gamma ,H)\), elementary properties of Dirichlet processes give, for any measurable set \(A\) of functions,

$$\begin{aligned} \mathbb {E}[G_0(A)^2|H]&= \frac{1}{\gamma +1}H(A) + \frac{\gamma }{\gamma +1}H(A)^2,\\ \mathrm{var}[G_0(A)|H]&= \frac{1}{\gamma +1}(H(A) - H(A)^2). \end{aligned}$$

Turning to the random measures \(G_u\) for each \(u \in V\),

$$\begin{aligned} \mathrm{var}[G_u(A)|G_0] = \frac{1}{\alpha +1}(G_0(A) - G_0(A)^2). \end{aligned}$$

Marginalizing out \(G_0\), we have

$$\begin{aligned}&\mathrm{var}[G_u(A)|H]=\mathbb {E}[\mathrm{var}[G_u(A)|G_0]|H] + \mathrm{var}[\mathbb {E}[G_u(A)|G_0]|H]\nonumber \\&\quad =\frac{1}{\alpha +1}(H(A) - \mathbb {E}[G_0(A)^2|H]) + \mathrm{var}[G_0(A)|H]\nonumber \\&\quad =\left( \frac{1}{\gamma +1} + \frac{\gamma }{(\gamma +1)(\alpha +1)}\right) (H(A) - H(A)^2). \end{aligned}$$
(13)
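Equation (13) is easy to verify by simulation, using only the fact that the mass a Dirichlet process assigns to a fixed set is Beta-distributed: if \(G \sim \hbox {DP}(c, H_0)\), then \(G(A) \sim \hbox {Beta}(cH_0(A), c(1-H_0(A)))\). A quick Monte Carlo check, with arbitrary numeric values of \(\gamma \), \(\alpha \), and \(H(A)\):

```python
import numpy as np

rng = np.random.default_rng(2)
gamma_, alpha_ = 2.0, 5.0   # arbitrary concentration parameters
HA = 0.5                    # H(A) for some fixed measurable set A (arbitrary)

reps = 200_000
# For a fixed set A, G(A) | base ~ Beta(c * base(A), c * (1 - base(A))).
G0A = rng.beta(gamma_ * HA, gamma_ * (1.0 - HA), size=reps)   # G_0(A) | H
GuA = rng.beta(alpha_ * G0A, alpha_ * (1.0 - G0A))            # G_u(A) | G_0

theory = (1.0 / (gamma_ + 1.0)
          + gamma_ / ((gamma_ + 1.0) * (alpha_ + 1.0))) * (HA - HA ** 2)
print(GuA.var(), theory)    # should agree to Monte Carlo error
```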

Next, for the random measures \(Q_{ux}\) at each level \(x \in D\), for any measurable set \(A_{x}\), as before

$$\begin{aligned} \mathrm{var}[Q_{ux}(A_x)|G_u] = \frac{1}{\alpha _u+1} (G_u(A_x) - G_u(A_x)^2), \end{aligned}$$

so that

$$\begin{aligned} \mathrm{var}[Q_{ux}(A_x)|G_0]&= \mathbb {E}[\mathrm{var}[Q_{ux}(A_x)|G_u]|G_0] + \mathrm{var}[\mathbb {E}[Q_{ux}(A_x)|G_u]|G_0]\\&= \frac{1}{\alpha _{u}+1}\mathbb {E}[(G_u(A_x) - G_u(A_x)^2)|G_0] + \mathrm{var}[G_u(A_x)|G_0]\\&= \left( \frac{1}{\alpha +1} + \frac{\alpha }{(\alpha +1)(\alpha _u+1)}\right) (G_0(A_x)-G_0(A_x)^2). \end{aligned}$$

Marginalizing out \(G_0\), we have

$$\begin{aligned} \mathrm{var}[Q_{ux}(A_{x})|H]&= \mathbb {E}[\mathrm{var}[Q_{ux}(A_{x})|G_0]|H] + \mathrm{var}[\mathbb {E}[Q_{ux}(A_x)|G_0]|H]\nonumber \\&= \left( \frac{1}{\gamma +1} + \frac{\gamma }{(\gamma +1)(\alpha +1)} + \frac{\gamma \alpha }{(\gamma +1)(\alpha +1)(\alpha _u+1)} \right) \nonumber \\&\times (H(A_x)-H(A_x)^2). \end{aligned}$$
(14)

Next, let \(A\) and \(B\) be measurable sets with respect to observations at \(x_1\) and \(x_2\), respectively. For \(\varvec{\phi } \sim H\), let \(H_{x_1}(A) = P(\phi (x_1) \in A|H)\) and \(H_{x_1,x_2}(A,B) = P(\phi (x_1) \in A; \phi (x_2) \in B|H)\). Then a similar calculation yields, for the measure \(G_0\),

$$\begin{aligned} \mathrm{cov}[G_0(A),G_0(B)|H] = \frac{1}{\gamma +1}(H_{x_1,x_2}(A,B) - H_{x_1}(A)H_{x_2}(B)). \end{aligned}$$

For measure \(G_u\), we have

$$\begin{aligned} \mathrm{cov}(G_u(A),G_u(B)|H)&= \left( \frac{1}{\gamma +1}+ \frac{\gamma }{(\gamma +1)(\alpha +1)}\right) \\&\quad \times (H_{x_1,x_2}(A,B) - H_{x_1}(A)H_{x_2}(B)). \end{aligned}$$

Similarly, for \(Q_{ux}\):

$$\begin{aligned} \mathrm{cov}(Q_{ux}(A),Q_{ux}(B)|H)&= \left( \frac{1}{\gamma +1} + \frac{\gamma }{(\gamma +1)(\alpha +1)} + \frac{\gamma \alpha }{(\gamma +1)(\alpha +1)(\alpha _u+1)} \right) \\&\quad \times (H_{x_1,x_2}(A,B) - H_{x_1}(A)H_{x_2}(B)). \end{aligned}$$

In all of the expressions above, a priori, the concentration parameters regulate the fraction of variance or correlation that is passed from one level of the Bayesian hierarchy to the next, starting from the base measure \(H\), which regulates the dependence with respect to the covariate \(x\). Finally, we note that similar calculations can be carried out between populations; we omit the details.

Appendix D: Posterior computation for the model in (9)

We recall and introduce key notation: \(\varvec{\phi }_k\) is a random draw from \(H\), \(\varvec{\psi }_t\) is a random draw from \(G_0\), and \(\varvec{\varphi }_{ur}\) is a random draw from \(G_u\). Finally, \(\theta _{ui}(x)\) is a random draw from \(Q_{ux}\).

Let \(k_t\) denote the index of the \(\varvec{\phi }_k\) associated with the functional atom \(\varvec{\psi }_t\), i.e., \(\varvec{\psi }_t = \varvec{\phi }_{k_t}\). Let \(t_{ur}\) denote the index of the \(\varvec{\psi }_t\) associated with the functional atom \(\varvec{\varphi }_{ur}\) in group \(u\), i.e., \(\varvec{\varphi }_{ur} = \varvec{\psi }_{t_{ur}}\). Let \(r_{ui}^{x}\) denote the index of the \(\varvec{\varphi }_{ur}(x)\) associated with the atom \(\theta _{ui}(x)\), i.e., \(\theta _{ui}(x) = \varvec{\varphi }_{ur_{ui}^{x}}(x)\). The local and functional atoms are related by \(\theta _{ui}(x) = \varvec{\varphi }_{ur_{ui}^{x}}(x) = \varvec{\psi }_{t_{ur_{ui}^{x}}}(x) = \varvec{\phi }_{k_{t_{ur_{ui}^{x}}}}(x)\).

Recall that a priori \(G_0 \sim \hbox {DP}(\gamma ,H)\). By a standard property of the Dirichlet process, conditioning on the global factors \(\varvec{\phi }_k\) and the index vector \(\varvec{k}\), the posterior distribution of \(G_0\) is again a DP: \([G_0 | \varvec{k}, \varvec{\phi }_{1}, \ldots , \varvec{\phi }_K] \sim \hbox {DP}(\gamma + q_{\cdot }, \frac{\gamma H + \sum _{k=1}^{K} q_k\delta _{\varvec{\phi }_k}}{\gamma + q_{\cdot }})\), where \(q_k = \#\{t: k_t = k\}\) denotes the number of \(\varvec{\psi }_t\) associated with \(\varvec{\phi }_k\), and \(q_{\cdot } = \sum _{k=1}^{K} q_k\). This implies an explicit representation for \(G_0\) as follows:

$$\begin{aligned} G_0&= \sum \limits _{k=1}^{K}\beta _k \delta _{\varvec{\phi }_k} + \beta _{\mathrm{new}} G_0^{\mathrm{new}},\nonumber \\ \varvec{\beta }&= (\beta _1,\ldots ,\beta _K, \beta _{\mathrm{new}}) \sim \hbox {Dir}(q_1,\ldots ,q_K,\gamma ),\nonumber \\ G_0^{\mathrm{new}}&\sim \hbox {DP}(\gamma ,H). \end{aligned}$$
(15)

Similarly, conditionally on \(G_0\), the random distributions \(G_u\) are independent across the group indices \(u\). In particular, given \(G_0\), \(\varvec{k}\), \(\varvec{t}_u\) and the \(\varvec{\phi }_k\), the posterior of \(G_u\) is \([G_u | G_0, \varvec{k}, \varvec{t}_u, (\varvec{\phi }_k)_{k=1}^{K}] \sim \hbox {DP}(\alpha _0 + m_{u\cdot }, \frac{\alpha _0 G_0 + \sum _{k=1}^{K}m_{uk}\delta _{\varvec{\phi }_k}}{\alpha _0 + m_{u\cdot }})\), where \(m_{uk} = \#\{r: k_{t_{ur}} = k\}\) is the number of \(\varvec{\varphi }_{ur}\) associated with \(\varvec{\phi }_k\), and \(m_{u\cdot } = \sum _{k=1}^{K} m_{uk}\). This implies the following representation for \(G_u\): \(G_u = \sum _{k=1}^{K}\pi _{uk}\delta _{\varvec{\phi }_k} + \pi _{u\mathrm{new}} G_u^{\mathrm{new}}\), where \(G_u^{\mathrm{new}} \sim \hbox {DP}(\alpha _0\beta _{\mathrm{new}}, G_0^{\mathrm{new}})\) and

$$\begin{aligned} \varvec{\pi }_u = (\pi _{u1},\ldots ,\pi _{uK},\pi _{u\mathrm{new}}) \sim \hbox {Dir}(\alpha _0\beta _1+m_{u1},\ldots ,\alpha _0\beta _K+m_{uK},\alpha _0\beta _{\mathrm{new}}).\nonumber \\ \end{aligned}$$
(16)

Once more, conditionally on \(G_u\), the random distributions \(Q_{ux}\) are independent across levels \(x\). In particular, given \(G_u\), \(\varvec{k}\), \(\varvec{t}_u\), \(\varvec{r}_u^x\), and the \(\varvec{\phi }_k\)’s, the posterior of \(Q_{ux}\) is distributed as

$$\begin{aligned} \left[ Q_{ux} | G_u, \varvec{k}, \varvec{t}_u, \varvec{r}_u^x, (\varvec{\phi }_k)_{k=1}^{K} \right] \sim \hbox {DP}\left( \alpha _u + n_{ux\cdot }, \frac{\alpha _u G_{ux} + \sum _{k=1}^{K}n_{uxk}\delta _{\varvec{\phi }_{k}(x)}}{\alpha _u + n_{ux\cdot }}\right) , \end{aligned}$$

where \(n_{uxk} = \#\{i: k_{t_{ur_{ui}^{x}}} = k\}\), the number of \(\theta _{ui}(x)\) associated with \(\varvec{\phi }_k(x)\), and \(n_{ux\cdot } = \sum _{k=1}^{K} n_{uxk}\). This implies the following representation for \(Q_{ux}\):

$$\begin{aligned} Q_{ux}&= \sum \limits _{k=1}^{K} \omega _{uxk} \delta _{\varvec{\phi }_k(x)} + \omega _{ux\mathrm{new}}Q_u^{x\mathrm{new}},\nonumber \\ \varvec{\omega }_{ux}&= (\omega _{ux1},\ldots ,\omega _{uxK},\omega _{ux\mathrm{new}})\nonumber \\&\sim \mathrm{Dir}\left( \alpha _u\pi _{u1} + n_{ux1},\ldots , \alpha _u\pi _{uK} + n_{uxK}, \alpha _u \pi _{u\mathrm{new}}\right) ,\nonumber \\ Q_u^{x\mathrm{new}}&\sim \hbox {DP}\left( \alpha _u\pi _{u\mathrm{new}}, G_u^{x\mathrm{new}}\right) . \end{aligned}$$
(17)

The above characterization suggests a straightforward Gibbs sampling algorithm based on a Markov chain for \(((\varvec{\phi }_k)_{k=1}^{K}, \varvec{k}, \varvec{t}, \varvec{r})\). To simplify the implementation by avoiding the book-keeping of the index variables, we consider instead a modified block Gibbs sampling algorithm based on a Markov chain for the count variables (e.g., \(\varvec{q},\varvec{m},\varvec{n}\)). We still need the index variable \(z_{uxi}\), which denotes the index of the global atom \(\varvec{\phi }_k\) with which the local atom \(\theta _{ui}(x)\) is associated, i.e., \(z_{uxi} = k_{t_{ur_{ui}^{x}}}\). Note that the likelihood of the data involves only the \(z_{uxi}\) variables, and that \(\varvec{n}_{ux}\) can be calculated directly from the \(z_{uxi}\):

$$\begin{aligned} n_{uxk} = \sum \limits _{i} \mathbb {I}(z_{uxi} = k). \end{aligned}$$

We proceed to describe a block Gibbs sampler by considering a Markov chain for \((\varvec{\phi },\varvec{q},\varvec{m},\varvec{n},\varvec{z},\varvec{\beta }, \varvec{\pi },\varvec{\omega })\).

Sampling \(\varvec{\beta },\varvec{\pi }, \varvec{\omega }\). The conditional distributions factor as \([\varvec{\beta }|\varvec{q},\gamma ]\times \prod _{u}[\varvec{\pi }_u|\varvec{m}_u,\varvec{\beta },\alpha _0] \times \prod _{u}\prod _{x}[\varvec{\omega }_{ux}|\varvec{n}_{ux},\varvec{\pi }_{u},\alpha _u]\) and are given by Eqs. (15)–(17).
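As a sketch of this step, the three Dirichlet draws can be implemented directly from the count arrays; the function below, with our naming and array layout, performs one block update of \((\varvec{\beta },\varvec{\pi },\varvec{\omega })\):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_weights(q, m, n, gamma_, alpha0, alpha_u):
    """One block update of (beta, pi_u, omega_ux) from Eqs. (15)-(17).

    q : (K,) counts q_k (all positive for components in use);
    m : (U, K) counts m_uk;  n : (U, X, K) counts n_uxk;
    alpha_u : (U,) group-level concentration parameters."""
    U, X, K = n.shape
    beta = rng.dirichlet(np.append(q, gamma_))                        # Eq. (15)
    pi = np.stack([rng.dirichlet(alpha0 * beta + np.append(m[u], 0.0))
                   for u in range(U)])                                # Eq. (16)
    omega = np.stack([[rng.dirichlet(alpha_u[u] * pi[u] + np.append(n[u, x], 0.0))
                       for x in range(X)] for u in range(U)])         # Eq. (17)
    return beta, pi, omega   # last slot of each vector is the "new" mass
```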

Sampling of \(\varvec{z}\). Note that a priori \(z_{uxi}|\varvec{\omega }_{ux} \sim \varvec{\omega }_{ux}\). Let \(n_{uxk}^{-uxi}\) denote the number of data items in group \(u\) at level \(x\), excluding \(y_{ui}(x)\), associated with mixture component \(k\). Then,

$$\begin{aligned}&p\left( z_{uxi} = k|\varvec{z}^{-uxi},\varvec{\omega }, \varvec{\phi }_k, \hbox {Data}\right) \\&\quad \propto {\left\{ \begin{array}{ll} \left( \alpha _u \pi _{uk} + n_{uxk}^{-uxi}\right) F\left( y_{ui}(x)|\varvec{\phi }_k(x)\right) &{} \;\;\hbox {if}\; k \;\hbox {is a previously used component},\\ \alpha _u\pi _{u\mathrm{new}}\,f_{uxk^{\mathrm{new}}}(y_{ui}(x)) &{} \;\;\hbox {if}\; k = k^{\mathrm{new}}, \end{array}\right. } \end{aligned}$$

where \(f_{uxk^{\mathrm{new}}}(y_{ui}(x)) = \int F(y_{ui}(x)|\varvec{\phi }(x)) \,\mathrm{d} H(\varvec{\phi }(x))\) is the prior predictive density of \(y_{ui}(x)\) under a new component.
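A sketch of this update for a Gaussian kernel, \(F(\cdot |\varvec{\phi }(x)) = N(\varvec{\phi }(x), \tau _u^2)\), in which case the prior predictive under \(H = \hbox {GP}(\varvec{\mu },\varvec{C})\) is \(N(\mu (x), C(x,x) + \tau _u^2)\); the helper name and argument layout are ours:

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_z_uxi(y, phi_x, pi_u, n_minus, alpha_u, tau2, prior_mean, prior_var):
    """Draw z_uxi given everything else.

    y : scalar observation y_ui(x);  phi_x : (K,) values phi_k(x);
    pi_u : (K+1,) weights, last entry pi_u_new;  n_minus : (K,) counts n_uxk^{-uxi};
    prior_mean, prior_var : mu(x) and C(x, x) under the base measure H."""
    K = len(phi_x)
    # Existing components: (alpha_u*pi_uk + n_uxk^{-uxi}) * N(y | phi_k(x), tau2).
    lik = np.exp(-0.5 * (y - phi_x) ** 2 / tau2) / np.sqrt(2.0 * np.pi * tau2)
    w = (alpha_u * pi_u[:K] + n_minus) * lik
    # New component: alpha_u*pi_u_new times the prior predictive N(mu(x), C(x,x)+tau2).
    v = prior_var + tau2
    f_new = np.exp(-0.5 * (y - prior_mean) ** 2 / v) / np.sqrt(2.0 * np.pi * v)
    probs = np.append(w, alpha_u * pi_u[K] * f_new)
    probs /= probs.sum()
    return rng.choice(K + 1, p=probs)   # a return value of K opens a new component
```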

Sampling of \(\varvec{m}_u\). Recall that \(m_{uk}\) is the number of functional atoms \(\varvec{\varphi }_{ur}\) associated with \(\varvec{\phi }_k\) within group \(u\). This set of functional atoms can be subdivided into disjoint subsets according to the level \(x\in D\) at which each atom \(\varvec{\varphi }_{ur}\) was first generated. Let \(m_{uxk}\) be the number of such functional atoms corresponding to level \(x\). To be precise,

$$\begin{aligned} m_{uxk}&= \#\left\{ r_{ui}^{x}: z_{uxi} = k_{t_{ur_{ui}^x}} = k\; \hbox {for some}\; i\right\} ,\\ m_{uk}&= \sum \limits _{x\in D} m_{uxk}. \end{aligned}$$

\(m_{uxk}\) is the number of clusters among the \(n_{uxk}\) atoms \(\theta _{ui}(x)\) such that \(z_{uxi} = k\). To obtain the distribution of \(m_{uxk}\), consider the distribution of \(r_{ui}^{x}\) conditionally on \(G_u\) (i.e., on \(\varvec{\pi }_u\) and the \(\varvec{\phi }_k\)). Note that given \(G_u\), the \(Q_{ux}\) are independent across \(x\). For each atom \(\theta _{ui}(x)\), the probability of being assigned to an existing atom \(\varvec{\varphi }_{ur}(x)\) such that \(k_{t_{ur}} = k\) is

$$\begin{aligned} p\left( r_{ui}^{x} = r| k_{t_{ur}} = k, \varvec{r}^{-uxi},\varvec{\pi }_u\right) \propto n_{ux\cdot r}^{-uxi} \end{aligned}$$

while the probability of being assigned to a new atom \(\varvec{\varphi }_{ur^{\mathrm{new}}}(x)\) is

$$\begin{aligned} p\left( r_{ui}^{x} = r^{\mathrm{new}}|k_{t_{ur^{\mathrm{new}}}} = k, \varvec{r}^{-uxi},\varvec{\pi }_u\right) \propto \alpha _u \pi _{uk}, \end{aligned}$$

where \(n_{ux\cdot r}^{-uxi} := \#\{i': r_{ui'}^{x} = r;\; uxi' \ne uxi \}\) is the number of data items in group \(u\) at level \(x\), excluding \(y_{ui}(x)\), that are associated with \(\varvec{\varphi }_{ur}\). This implies that \(m_{uxk}\) is distributed as the number of clusters arising in a population of \(n_{uxk}\) items drawn from a Dirichlet process with concentration parameter \(\alpha _u\pi _{uk}\). It was shown by Antoniak (1974) that this distribution has the form

$$\begin{aligned} p\left( m_{uxk}=m|\varvec{z},\varvec{m}^{-uxk},\varvec{\pi }_u\right) = \frac{\Gamma (\alpha _u\pi _{uk})}{\Gamma (\alpha _u\pi _{uk} + n_{uxk})} s(n_{uxk},m)(\alpha _u\pi _{uk})^m, \end{aligned}$$

where \(s(n,m)\) denotes the unsigned Stirling numbers of the first kind.

Sampling \(\varvec{q}\). The conditional distribution of \(\varvec{q}\) can be obtained in the same manner as that of \(\varvec{m}\). It can be shown that \(q_{k} = \sum _{u\in V} q_{uk}\), where \(q_{uk} = \#\{t: k_{t} = k \;\hbox {and}\; t = t_{ur}\;\hbox {for some}\; r\}\). Moreover, \(q_{uk}\) is distributed as the number of clusters arising in a population of \(m_{uk}\) atoms drawn from a Dirichlet process with concentration parameter \(\alpha _0\beta _k\):

$$\begin{aligned} p\left( q_{uk} = q|\varvec{z},\varvec{q}^{-uk},\varvec{\beta }\right) = \frac{\Gamma (\alpha _0\beta _k)}{\Gamma (\alpha _0\beta _k+m_{uk})} s(m_{uk},q)(\alpha _0\beta _k)^q. \end{aligned}$$
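Both the \(m_{uxk}\) and \(q_{uk}\) updates therefore draw from an Antoniak distribution, which can be sampled without evaluating Stirling numbers: in the Chinese-restaurant representation, the \(i\)th draw opens a new cluster independently with probability \(c/(c+i-1)\). A minimal sketch (function name ours):

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_num_clusters(conc, n):
    """Draw from the Antoniak distribution: the number of distinct clusters
    among n draws from a DP with concentration parameter conc. In the
    Chinese-restaurant scheme, draw i starts a new cluster independently
    with probability conc/(conc + i - 1), so no Stirling numbers are needed."""
    if n == 0:
        return 0
    i = np.arange(n)   # probabilities conc/(conc + 0), ..., conc/(conc + n - 1)
    return int(np.sum(rng.uniform(size=n) < conc / (conc + i)))

# Hypothetical usage inside one Gibbs sweep:
#   m_uxk = sample_num_clusters(alpha_u * pi_uk, n_uxk)
#   q_uk  = sample_num_clusters(alpha0 * beta_k, m_uk)
```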

Sampling \(\varvec{\phi }\). The conditional distribution of \(\varvec{\phi }\) is obtained easily. Suppose that the prior distribution \(H\) for \(\varvec{\phi }_k\) is given by a mean function \(\varvec{\mu }\) and covariance function \(\varvec{C}\), which reduces to a covariance matrix \(\varvec{C}_k\) when restricted to a finite set of covariate values \(x\). Then the posterior distribution of \(\varvec{\phi }_k\) is also Gaussian, with mean and covariance given by:

$$\begin{aligned} \tilde{\varvec{C}}_k^{-1}&= \varvec{C}_k^{-1} + \sum \limits _{u\in V}\hbox {diag} \left( n_{ux_1k},\ldots , n_{ux_mk}\right) /\tau _u^2,\\ \tilde{\varvec{C}}_k^{-1}\tilde{\varvec{\mu }}_k&= \varvec{C}_k^{-1} \varvec{\mu }_k + \left( \ldots , \sum \limits _{u\in V} \sum \limits _{i=1}^{n_u} Y_{ui}(x)\,\mathbb {I}(z_{uxi} = k)/\tau _u^2, \ldots \right) ^{T}, \end{aligned}$$

where the displayed vector entry corresponds to level \(x\).
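A sketch of this Gaussian draw on a common grid of \(m\) levels, assuming for simplicity equal numbers of curves per population so that the data stack into rectangular arrays; names and layout are ours:

```python
import numpy as np

def sample_phi_k(C_k, mu_k, Y, z, k, tau2, rng):
    """Gaussian conditional draw of phi_k on a common grid of m levels.

    Y : (U, n, m) stacked data (equal group sizes assumed for simplicity);
    z : (U, m, n) indicators z_uxi;  tau2 : (U,) noise variances."""
    U, _, m = Y.shape
    prec = np.linalg.inv(C_k)                  # C_k^{-1}
    b = prec @ mu_k                            # C_k^{-1} mu_k
    for u in range(U):
        mask = (z[u].T == k)                   # (n, m): which y_ui(x) point to phi_k
        prec += np.diag(mask.sum(axis=0)) / tau2[u]   # adds diag(..., n_uxk, ...)/tau_u^2
        b += (Y[u] * mask).sum(axis=0) / tau2[u]
    cov = np.linalg.inv(prec)
    mean = cov @ b
    return mean + np.linalg.cholesky(cov) @ rng.standard_normal(m)
```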

About this article

Nguyen, X., Gelfand, A. E. Bayesian nonparametric modeling for functional analysis of variance. Annals of the Institute of Statistical Mathematics, 66, 495–526 (2014). https://doi.org/10.1007/s10463-013-0436-7