Abstract
Mixture modeling is one of the most useful tools in machine learning and data mining applications. An important challenge when applying finite mixture models is the selection of the number of clusters which best describes the data. Recent developments have shown that this problem can be handled by applying non-parametric Bayesian techniques to mixture modeling. Another crucial preprocessing step in mixture learning is the selection of the most relevant features. The main approach in this paper to tackling these problems consists of encoding the knowledge in a generalized Dirichlet mixture model through non-parametric Bayesian estimation and inference techniques. Specifically, we extend finite generalized Dirichlet mixture models to the infinite case, in which neither the number of components nor the relevant features need to be known a priori. This extension provides a natural representation of uncertainty regarding the challenging problem of model selection. We propose a Markov chain Monte Carlo algorithm to learn the resulting infinite mixture. Through applications involving text and image categorization, we show that infinite mixture models offer more powerful and robust performance than classic finite mixtures for both clustering and feature selection.
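To illustrate the central idea of letting the number of components remain unbounded a priori, the following is a minimal sketch of a Chinese restaurant process draw, the partition prior induced by a Dirichlet process. This is only an illustration of the non-parametric prior, not the authors' full MCMC algorithm for the infinite generalized Dirichlet mixture; the function name and the concentration parameter `alpha` are illustrative choices, not from the paper.

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample a partition of n items from the Chinese restaurant process.

    Under a Dirichlet process prior with concentration alpha, item i joins
    an existing cluster k with probability n_k / (i + alpha) and opens a
    new cluster with probability alpha / (i + alpha). The number of
    clusters therefore grows with the data rather than being fixed in
    advance, which is the property the infinite mixture model exploits.
    """
    rng = random.Random(seed)
    assignments = []  # assignments[i] = cluster index of item i
    counts = []       # counts[k] = current size of cluster k
    for i in range(n):
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                assignments.append(k)
                counts[k] += 1
                break
        else:  # no existing cluster chosen: start a new one
            assignments.append(len(counts))
            counts.append(1)
    return assignments, counts

assignments, counts = crp_partition(100, alpha=1.0)
print(len(counts))  # number of clusters induced by the prior alone
```

In a full sampler such as the one proposed in the paper, these prior probabilities would be combined with the generalized Dirichlet likelihood of each observation when resampling cluster assignments.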
Bouguila, N., Ziou, D. A countably infinite mixture model for clustering and feature selection. Knowl Inf Syst 33, 351–370 (2012). https://doi.org/10.1007/s10115-011-0467-4