Simultaneous Clustering and Dimensionality Reduction Using Variational Bayesian Mixture Model

  • Kazuho Watanabe
  • Shotaro Akaho
  • Shinichiro Omachi
  • Masato Okada
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Exponential principal component analysis (e-PCA) provides a framework for appropriately dealing with various data types, such as binary and integer data, for which the Gaussian assumption on the data distribution is inappropriate. In this paper, we develop a simultaneous dimensionality reduction and clustering technique based on a latent variable model for e-PCA. Assuming a discrete distribution on the latent variable leads to mixture models with constraints on their parameters. We derive a learning algorithm for these mixture models based on the variational Bayes method. Although implementing the algorithm requires an intractable integration, an approximation based on Laplace's method allows us to carry out clustering on an arbitrary subspace. Numerical experiments on handwritten digit data demonstrate the method's effectiveness for extracting the structure of data as a visualization technique and its high generalization ability as a density estimation model.
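To give a concrete sense of the variational Bayes updates for a mixture model, the following is a minimal sketch for an unconstrained Bernoulli mixture with conjugate Beta/Dirichlet priors. This is an illustrative toy, not the paper's constrained subspace model: the function name `vb_bernoulli_mixture` and all hyperparameter values are assumptions, and the subspace constraint and the Laplace approximation used in the paper are omitted (the Bernoulli case is fully conjugate, so no intractable integration arises).

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)

def vb_bernoulli_mixture(X, K=2, n_iter=50, alpha0=1.0, a0=1.0, b0=1.0):
    """Variational Bayes for a Bernoulli mixture (illustrative sketch).

    X: (N, D) binary data matrix.
    Returns responsibilities R and the posterior hyperparameters.
    """
    N, D = X.shape
    # Random initial responsibilities break the symmetry between components.
    R = rng.dirichlet(np.ones(K), size=N)
    for _ in range(n_iter):
        Nk = R.sum(axis=0)                       # effective component counts
        # Conjugate posterior hyperparameters (Dirichlet over weights,
        # Beta over each Bernoulli parameter).
        alpha = alpha0 + Nk
        a = a0 + R.T @ X                         # (K, D) weighted successes
        b = b0 + R.T @ (1 - X)                   # (K, D) weighted failures
        # E-step: expected log-probabilities under the variational posterior.
        Elog_pi = digamma(alpha) - digamma(alpha.sum())
        Elog_t = digamma(a) - digamma(a + b)     # E[log theta]
        Elog_1mt = digamma(b) - digamma(a + b)   # E[log(1 - theta)]
        log_rho = Elog_pi + X @ Elog_t.T + (1 - X) @ Elog_1mt.T
        log_rho -= log_rho.max(axis=1, keepdims=True)  # for numerical stability
        R = np.exp(log_rho)
        R /= R.sum(axis=1, keepdims=True)        # normalize responsibilities
    return R, alpha, a, b

# Toy binary data: two well-separated bit-pattern clusters.
X = np.vstack([rng.random((40, 8)) < 0.9,
               rng.random((40, 8)) < 0.1]).astype(float)
R, alpha, a, b = vb_bernoulli_mixture(X, K=2)
labels = R.argmax(axis=1)
```

The same update structure (expected sufficient statistics in the E-step, hyperparameter updates in the M-step) carries over to the paper's setting, except that the constraint restricting the component parameters to a low-dimensional subspace destroys conjugacy, which is where Laplace's method enters.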


Keywords: Mixture Model · Dimensionality Reduction · Exponential Family · Latent Variable Model · Variational Free Energy




Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Kazuho Watanabe 1
  • Shotaro Akaho
  • Shinichiro Omachi
  • Masato Okada
  1. Nara Institute of Science and Technology, Ikoma, Japan
