A countably infinite mixture model for clustering and feature selection

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Mixture modeling is one of the most useful tools in machine learning and data mining applications. An important challenge when applying finite mixture models is the selection of the number of clusters that best describes the data. Recent developments have shown that this problem can be handled by applying non-parametric Bayesian techniques to mixture modeling. Another crucial preprocessing step for mixture learning is the selection of the most relevant features. To tackle these problems, the main approach in this paper is to encode the knowledge in a generalized Dirichlet mixture model through non-parametric Bayesian estimation and inference techniques. Specifically, we extend finite generalized Dirichlet mixture models to the infinite case, in which the number of components and relevant features need not be known a priori. This extension provides a natural representation of uncertainty regarding the challenging problem of model selection. We propose a Markov chain Monte Carlo algorithm to learn the resulting infinite mixture. Through applications involving text and image categorization, we show that infinite mixture models offer a more powerful and robust performance than classic finite mixtures for both clustering and feature selection.
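To give a sense of how an infinite mixture avoids fixing the number of components in advance, the following is a minimal, illustrative sketch of the Chinese restaurant process, the standard sequential representation of the Dirichlet process prior on cluster assignments. It is not the paper's algorithm (which is an MCMC scheme for an infinite generalized Dirichlet mixture with feature selection); the function name `crp_assignments` and the concentration parameter `alpha` are illustrative choices.

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Sample cluster assignments for n points from a Chinese restaurant
    process with concentration alpha: point i joins an existing cluster k
    with probability proportional to its current size n_k, or starts a
    new cluster with probability proportional to alpha. The number of
    clusters is therefore not fixed a priori, mirroring the prior used
    by infinite mixture models."""
    rng = random.Random(seed)
    counts = []        # counts[k] = current number of points in cluster k
    assignments = []   # assignments[i] = cluster index of point i
    for _ in range(n):
        # Unnormalised weights: one per existing cluster, plus alpha
        # for opening a brand-new cluster.
        weights = counts + [alpha]
        r = rng.uniform(0, sum(weights))
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(counts):   # the "new table" was chosen
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts

assignments, counts = crp_assignments(200, alpha=1.0)
print(len(counts), "clusters inferred for 200 points")
```

Larger values of `alpha` tend to produce more clusters (the expected number grows roughly as `alpha * log(n)`), which is how such priors express uncertainty over model size rather than committing to one.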



Author information

Correspondence to Nizar Bouguila.


About this article

Cite this article

Bouguila, N., Ziou, D. A countably infinite mixture model for clustering and feature selection. Knowl Inf Syst 33, 351–370 (2012). https://doi.org/10.1007/s10115-011-0467-4
