Abstract
We develop a general class of nonparametric Bayesian methods for clustering functional data by choosing a species sampling random probability measure for the distribution of the basis coefficients. When the basis functions are also treated as unknown, posterior simulation must be carried out over a high-dimensional space of semiparametric models. To address this problem, we propose a novel Metropolis-Hastings algorithm for moving between models, with a nested generalized collapsed Gibbs sampler for updating the model parameters. Focusing on Dirichlet process priors for the distribution of the basis coefficients in multivariate linear spline models, we apply the approach to clustering hormone trajectories. The approach allows both the number of clusters and the shape of the trajectories within each cluster to be unknown. The methodology can be applied broadly to accommodate uncertainty in variable selection in semiparametric Bayesian hierarchical models.
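To illustrate why a Dirichlet process prior leaves the number of clusters unknown, the sketch below draws a random partition from the Chinese restaurant process, the Pólya urn representation of the Dirichlet process. It is only an illustration of the prior on partitions and not the sampler proposed in the article; the function name `sample_crp_partition` and the concentration parameter `alpha` are assumptions introduced here.

```python
import random


def sample_crp_partition(n, alpha, seed=None):
    """Draw cluster labels for n items from a Chinese restaurant process.

    Hypothetical helper for illustration only: alpha > 0 is the
    concentration parameter of the Dirichlet process.
    """
    rng = random.Random(seed)
    labels = []  # labels[i] is the cluster index assigned to item i
    sizes = []   # sizes[k] is the current number of items in cluster k
    for _ in range(n):
        # An existing cluster k is chosen with weight sizes[k];
        # a new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, sizes


if __name__ == "__main__":
    labels, sizes = sample_crp_partition(n=30, alpha=1.0, seed=1)
    print("number of clusters:", len(sizes))
    print("cluster sizes:", sizes)
```

The number of occupied clusters is itself random and tends to grow slowly with the number of items, which is what lets the data inform how many trajectory clusters are supported.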
Acknowledgements
This research was supported by the Intramural Research Program of the NIH, and NIEHS. We would like to thank Allen Wilcox, Donna Baird and Clare Weinberg for generously providing the data and for their helpful comments on the approach. Thanks also to the Associate Editor and reviewer for their thoughtful and helpful comments.
Cite this article
Crandell, J.L., Dunson, D.B. Posterior simulation across nonparametric models for functional clustering. Sankhya B 73, 42–61 (2011). https://doi.org/10.1007/s13571-011-0014-z
DOI: https://doi.org/10.1007/s13571-011-0014-z