Simultaneous clustering and feature selection via nonparametric Pitman–Yor process mixture models

  • Wentao Fan
  • Nizar Bouguila
Original Article


Mixture models are among the most important approaches in machine learning and can be regarded as the workhorse of generative machine learning. Most existing work considers mixtures of Gaussians. In contrast, this paper concentrates on nonparametric Bayesian models with Dirichlet-based mixtures, in particular the case in which a Pitman–Yor process prior is adopted. Two central problems with such mixtures are selecting ‘meaningful’ (relevant) features and estimating the model’s parameters. We develop an efficient algorithm for model inference based on the collapsed variational Bayes framework with a zeroth-order Taylor approximation. The merits and efficacy of the proposed nonparametric Bayesian model are demonstrated on challenging applications involving real-world data clustering and 3D object recognition.
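
To make the role of the Pitman–Yor process prior more concrete, the following is a minimal sketch of its standard stick-breaking construction, truncated to a finite number of components. It only illustrates how the prior generates mixing weights; it is not the inference algorithm of the paper. The function name and the parameter names `discount`, `concentration`, and `truncation` are assumptions made for this illustration.

```python
import random


def pitman_yor_stick_breaking(discount, concentration, truncation, seed=0):
    """Draw truncated mixing weights from a Pitman-Yor stick-breaking prior.

    Hypothetical helper for illustration only.
    Assumes 0 <= discount < 1 and concentration > -discount.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for k in range(1, truncation + 1):
        # V_k ~ Beta(1 - d, alpha + k * d); d = 0 recovers the Dirichlet process.
        v = rng.betavariate(1.0 - discount, concentration + k * discount)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # Put the leftover mass on a final catch-all component so weights sum to 1.
    weights.append(remaining)
    return weights


weights = pitman_yor_stick_breaking(discount=0.5, concentration=1.0, truncation=20)
print(len(weights), round(sum(weights), 6))
```

A larger discount tends to spread the mass over more components with a heavier, power-law-like tail, whereas a discount of zero reduces the construction to that of the Dirichlet process.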


Keywords: Mixture models, Generalized Dirichlet, Clustering, Feature selection, Pitman–Yor process, Collapsed variational Bayes



The completion of this work was supported by the National Natural Science Foundation of China (61502183) and the Natural Sciences and Engineering Research Council of Canada (NSERC).



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Technology, Huaqiao University, Xiamen, China
  2. Concordia Institute for Information Systems Engineering (CIISE), Concordia University, Montreal, Canada
