# MLiT: mixtures of Gaussians under linear transformations


## Abstract

The curse of dimensionality hinders the effectiveness of density estimation in high-dimensional spaces. Many techniques have been proposed in the past to discover embedded, locally linear manifolds of lower dimensionality, including the mixture of principal component analyzers, the mixture of probabilistic principal component analyzers, and the mixture of factor analyzers. In this paper, we propose a novel mixture model for reducing dimensionality based on a linear transformation which is restricted neither to be orthogonal nor to be aligned along the principal directions. For experimental validation, we have used the proposed model for classification of five "hard" data sets and compared its accuracy with that of other popular classifiers. The proposed method outperformed the mixture of probabilistic principal component analyzers on four of the five data sets, with accuracy improvements ranging from 0.5 to 3.2%. Moreover, on all five data sets, the proposed method outperformed the Gaussian mixture model, with accuracy improvements ranging from 0.2 to 3.4%.
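The idea of estimating a mixture density in a lower-dimensional space reached through a generic (non-orthogonal) linear transformation can be illustrated with a minimal sketch. The code below is only an illustration of the general setting, not the MLiT estimator itself: the paper learns the transformation by regularized maximum likelihood, whereas here, for simplicity, the map `W` is obtained by least squares against known latent coordinates, and a single Gaussian (a one-component "mixture") is fitted to the transformed data. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic high-dimensional data lying near a 2-D linear manifold.
d, q, n = 10, 2, 500
latent = rng.normal(size=(n, q))
A = rng.normal(size=(q, d))
X = latent @ A + 0.05 * rng.normal(size=(n, d))

# A generic linear transformation W: R^d -> R^q, not constrained to be
# orthogonal or principal-axis aligned. Here W is found by least squares
# (an illustrative stand-in for the paper's maximum-likelihood estimate).
W, *_ = np.linalg.lstsq(X, latent, rcond=None)
Z = X @ W  # reduced q-dimensional representation, shape (n, q)

# Fit a single Gaussian to the transformed data.
mu = Z.mean(axis=0)
cov = np.cov(Z, rowvar=False)

# Average log-density of the reduced points under the fitted Gaussian.
diff = Z - mu
inv = np.linalg.inv(cov)
logdet = np.linalg.slogdet(cov)[1]
loglik = -0.5 * (q * np.log(2 * np.pi) + logdet
                 + np.einsum('ij,jk,ik->i', diff, inv, diff)).mean()
print("avg log-likelihood in reduced space:", float(loglik))
```

Density estimation in the q-dimensional space sidesteps the curse of dimensionality: the covariance has q(q+1)/2 free parameters instead of d(d+1)/2, so far fewer samples are needed for a stable estimate.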

## Keywords

Dimensionality reduction · Regularized maximum likelihood · Mixture models · Linear transformations · Object classification

## Notes

### Acknowledgments

The authors wish to thank the Australian Research Council and iOmniscient Pty Ltd, which partially supported this work under the Linkage Project funding scheme, grant LP0668325.
