Posterior simulation across nonparametric models for functional clustering

Sankhya B

Abstract

By choosing a species sampling random probability measure for the distribution of the basis coefficients, a general class of nonparametric Bayesian methods for clustering of functional data is developed. Allowing the basis functions to be unknown, one faces the problem of posterior simulation over a high-dimensional space of semiparametric models. To address this problem, we propose a novel Metropolis-Hastings algorithm for moving between models, with a nested generalized collapsed Gibbs sampler for updating the model parameters. Focusing on Dirichlet process priors for the distribution of the basis coefficients in multivariate linear spline models, we apply the approach to the problem of clustering of hormone trajectories. This approach allows the number of clusters and the shape of the trajectories within each cluster to be unknown. The methodology can be applied broadly to allow uncertainty in variable selection in semiparametric Bayes hierarchical models.
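The statement that the number of clusters is left unknown under a Dirichlet process prior can be illustrated with the prior's sequential (Chinese restaurant) representation of the induced partition. The following is a minimal sketch of a draw from that prior on partitions only. It is not the authors' Metropolis-Hastings or collapsed Gibbs sampler, and it involves no data or basis functions. The function name `sample_crp_partition`, the concentration value `alpha = 1.0`, and the sample size of 50 are illustrative assumptions.

```python
import random


def sample_crp_partition(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process.

    alpha is the (assumed, illustrative) concentration parameter; it must be > 0.
    """
    labels = []
    counts = []  # counts[k] = number of items currently assigned to cluster k
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        labels.append(k)
    return labels


rng = random.Random(0)  # fixed seed so the draw is reproducible
labels = sample_crp_partition(50, 1.0, rng)
num_clusters = len(set(labels))  # random under the prior, not fixed in advance
```

Because the number of occupied clusters is itself random under this prior, it does not need to be specified in advance; larger values of `alpha` tend to produce more clusters.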

Acknowledgements

This research was supported by the Intramural Research Program of the NIH, and NIEHS. We would like to thank Allen Wilcox, Donna Baird and Clare Weinberg for generously providing the data and for their helpful comments on the approach. Thanks also to the Associate Editor and reviewer for their thoughtful and helpful comments.

Author information

Corresponding author

Correspondence to Jamie L. Crandell.

About this article

Cite this article

Crandell, J.L., Dunson, D.B. Posterior simulation across nonparametric models for functional clustering. Sankhya B 73, 42–61 (2011). https://doi.org/10.1007/s13571-011-0014-z

  • DOI: https://doi.org/10.1007/s13571-011-0014-z
