Clustering and Feature Allocation

Bayesian Nonparametric Data Analysis

Part of the book series: Springer Series in Statistics (SSS)

Abstract

An important byproduct of inference in discrete mixture models is an implied random partition of experimental units. In fact, such random partitions are the main inference targets for many recently published applications of nonparametric Bayesian discrete mixture models. In this chapter we systematically consider the use of nonparametric Bayesian priors for inference on such random partitions. Many scientific inference problems are formalized as the related, more general problem of feature allocation, that is, inference on possibly overlapping random subsets of experimental units. We introduce some examples from the analysis of bioinformatics data, and we present the Pólya urn model, product partition models, model-based clustering, and the Indian buffet process prior.
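
To illustrate how a Pólya urn scheme induces a random partition of experimental units, here is a minimal Python sketch. It is not taken from the chapter. The function name `polya_urn_partition` and the parameter `total_mass` are assumptions made for illustration, and the sketch uses the standard urn rule: a unit joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the total mass.

```python
import random


def polya_urn_partition(n, total_mass, seed=None):
    """Draw cluster labels for n units from a Polya urn scheme.

    Hypothetical helper for illustration only; total_mass must be positive.
    """
    rng = random.Random(seed)
    labels = []  # labels[i] is the cluster index assigned to unit i
    sizes = []   # sizes[j] is the current number of units in cluster j
    for _ in range(n):
        # Existing cluster j has weight sizes[j]; a new cluster has weight total_mass.
        weights = sizes + [total_mass]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[j] += 1     # join existing cluster j
        labels.append(j)
    return labels, sizes


labels, sizes = polya_urn_partition(n=20, total_mass=1.0, seed=0)
print(labels)
print(sizes)
```

Each call returns one draw of the partition: units sharing a label form a cluster. Larger values of `total_mass` tend to produce more clusters, smaller values fewer. A feature allocation would instead let each unit belong to several subsets at once, which this sketch does not cover.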

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Müller, P., Quintana, F.A., Jara, A., Hanson, T. (2015). Clustering and Feature Allocation. In: Bayesian Nonparametric Data Analysis. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18968-0_8
