Clustering and Feature Allocation

Müller, Peter; Quintana, Fernando Andrés; Jara, Alejandro; Hanson, Tim

doi:10.1007/978-3-319-18968-0_8

Part of the book series: Springer Series in Statistics (SSS)

Abstract

An important byproduct of inference in discrete mixture models is an implied random partition of experimental units. In fact, such random partitions are the main inference targets for many recently published applications of nonparametric Bayesian discrete mixture models. In this chapter we systematically consider the use of nonparametric Bayesian priors for inference on such random partitions. Many scientific inference problems are formalized as the related, more general problem of feature allocation, that is, inference on possibly overlapping random subsets of experimental units. We present some examples from the analysis of bioinformatics data and introduce the Pólya urn model, product partition models, model-based clustering, and the Indian buffet process prior.
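
To make the distinction between a random partition and a feature allocation concrete, here is a minimal Python sketch. It is not taken from the chapter. It assumes the standard sequential constructions: the Pólya urn (Chinese restaurant) scheme with total mass `alpha` for partitions, and the buffet scheme of Griffiths and Ghahramani for feature allocations. The function names, the `alpha` values, and the seed are illustrative assumptions.

```python
import math
import random


def polya_urn_partition(n, alpha, rng):
    """Sample cluster labels for n units from a Polya urn scheme (assumed form)."""
    labels = []
    sizes = []  # sizes[k] = number of units currently in cluster k
    for _ in range(n):
        # Join existing cluster k with weight sizes[k];
        # open a new cluster with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def indian_buffet_allocation(n, alpha, rng):
    """Sample a feature allocation for n units from the Indian buffet scheme."""
    counts = []  # counts[k] = number of units that have feature k
    rows = []    # rows[i] = set of features held by unit i
    for i in range(1, n + 1):
        # Unit i takes existing feature k with probability counts[k] / i.
        row = [k for k, m in enumerate(counts) if rng.random() < m / i]
        for k in row:
            counts[k] += 1
        # Unit i then adds Poisson(alpha / i) brand-new features.
        n_new = poisson(alpha / i, rng)
        row.extend(range(len(counts), len(counts) + n_new))
        counts.extend([1] * n_new)
        rows.append(set(row))
    return rows, counts


rng = random.Random(0)  # illustrative seed
labels = polya_urn_partition(10, 1.0, rng)
features, counts = indian_buffet_allocation(10, 2.0, rng)
print("partition labels:", labels)      # each unit gets exactly one cluster
print("feature allocation:", features)  # a unit may hold zero, one, or several features
```

In the partition, every unit belongs to exactly one cluster. In the feature allocation, the subsets may overlap and a unit may belong to none of them, which is the sense in which feature allocation generalizes clustering.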

Author information

Authors and Affiliations

Department of Mathematics, University of Texas at Austin, Austin, TX, USA
Peter Müller
Departamento de Estadística, Pontificia Universidad Católica, Santiago, Chile
Fernando Andrés Quintana & Alejandro Jara
Department of Statistics, University of South Carolina, Columbia, SC, USA
Tim Hanson

About this chapter

Cite this chapter

Müller, P., Quintana, F.A., Jara, A., Hanson, T. (2015). Clustering and Feature Allocation. In: Bayesian Nonparametric Data Analysis. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18968-0_8

DOI: https://doi.org/10.1007/978-3-319-18968-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18967-3
Online ISBN: 978-3-319-18968-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
