Abstract
An important byproduct of inference in discrete mixture models is an implied random partition of experimental units. In fact, such random partitions are the main inference target in many recently published applications of nonparametric Bayesian discrete mixture models. In this chapter we systematically consider the use of nonparametric Bayesian priors for inference on such random partitions. Many scientific inference problems are formalized as the related, more general problem of feature allocation, that is, inference on possibly overlapping random subsets of experimental units. We introduce some examples from bioinformatics data analysis and discuss the Pólya urn model, product partition models, model-based clustering, and the Indian buffet process prior.
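To make the contrast between a random partition and a random feature allocation concrete, here is a minimal sketch. It is not code from the chapter. It assumes the standard sequential predictive rules: the Pólya urn (Chinese restaurant) scheme with concentration `alpha` for partitions, and the Indian buffet process with mass `alpha` for feature allocations. The function names are hypothetical, and the sampler uses NumPy's `Generator`.

```python
import numpy as np


def polya_urn_partition(n, alpha, rng):
    """Sample cluster labels for n units from a Polya urn with concentration alpha."""
    labels = np.empty(n, dtype=int)
    counts = []  # counts[k] = number of units already assigned to cluster k
    for i in range(n):
        # Unit i joins existing cluster k w.p. counts[k] / (i + alpha)
        # and opens a new cluster w.p. alpha / (i + alpha).
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)  # new cluster
        else:
            counts[k] += 1
        labels[i] = k
    return labels


def indian_buffet(n, alpha, rng):
    """Sample a binary n x K feature-allocation matrix from an IBP with mass alpha."""
    dish_counts = []  # dish_counts[k] = number of earlier customers who took dish k
    rows = []
    for i in range(1, n + 1):
        # Customer i takes each existing dish k with probability m_k / i.
        row = [int(rng.random() < m / i) for m in dish_counts]
        for k, z in enumerate(row):
            dish_counts[k] += z
        # ...and then tries Poisson(alpha / i) brand-new dishes.
        k_new = int(rng.poisson(alpha / i))
        row.extend([1] * k_new)
        dish_counts.extend([1] * k_new)
        rows.append(row)
    # Earlier customers never saw later dishes, so pad their rows with zeros.
    Z = np.zeros((n, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z


rng = np.random.default_rng(0)
labels = polya_urn_partition(10, alpha=1.0, rng=rng)
Z = indian_buffet(10, alpha=2.0, rng=rng)
```

Each unit receives exactly one label in `labels`, so the subsets form a partition. In contrast, a row of `Z` may contain several ones or none, so the subsets defined by the columns of `Z` can overlap. This is the feature allocation generalization described above.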
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Müller, P., Quintana, F.A., Jara, A., Hanson, T. (2015). Clustering and Feature Allocation. In: Bayesian Nonparametric Data Analysis. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18968-0_8
DOI: https://doi.org/10.1007/978-3-319-18968-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18967-3
Online ISBN: 978-3-319-18968-0
eBook Packages: Mathematics and Statistics (R0)