Sparse representation of precision matrices used in GMMs

Abstract

The paper presents a novel precision matrix modeling technique for Gaussian Mixture Models (GMMs), based on the concept of sparse representation. The representation coefficients of each precision matrix (inverse covariance), as well as an accompanying overcomplete matrix dictionary, are learned by minimizing an appropriate functional, whose first component corresponds to the sum of Kullback-Leibler (KL) divergences between the initial and the target GMM, while the second is a sparse regularizer of the coefficients. Compared to existing alternative approaches to approximate GMM modeling, such as the popular subspace-based representation methods, the proposed model attains a notably better trade-off between representation error and computational (memory) complexity. This is achieved under the assumption that the training data in the recognition system utilizing the GMM have an inherent sparseness property, which enables the application of the proposed model and an approximate representation using only one dictionary and a significantly smaller number of coefficients. The proposed model is experimentally compared with the Subspace Precision and Mean (SPAM) model, a state-of-the-art instance of subspace-based representation models, using both data from a real Automatic Speech Recognition (ASR) system and specially designed sets of synthetic data.
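For reference, while the KL divergence between two GMMs has no closed form and must be approximated in practice (cf. [16]), the divergence between two individual d-dimensional Gaussians \(\mathcal {N}_{0}(\mu _{0},{\Sigma }_{0})\) and \(\mathcal {N}_{1}(\mu _{1},{\Sigma }_{1})\), on which such component-wise fidelity terms are built, is available in closed form (the exact functional minimized in the paper is given in the full text):

\[ D_{KL}\left (\mathcal {N}_{0}\,\|\,\mathcal {N}_{1}\right ) = \frac {1}{2}\left (\text {tr}\left ({\Sigma }_{1}^{-1}{\Sigma }_{0}\right ) + \left (\mu _{1}-\mu _{0}\right )^{T}{\Sigma }_{1}^{-1}\left (\mu _{1}-\mu _{0}\right ) - d + \ln \frac {\det {\Sigma }_{1}}{\det {\Sigma }_{0}}\right ). \]

Note that this expression depends on the second Gaussian only through its precision matrix \({\Sigma }_{1}^{-1}\) (and its determinant), which is why replacing exact precisions by sparse approximations directly affects the fidelity term.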

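To make the construction concrete, the following minimal sketch approximates a single precision matrix by a sparse combination of symmetric dictionary atoms. It is not the authors' implementation: the random dictionary, the squared Frobenius fidelity (standing in for the KL-based term above), and the generic ISTA solver are all illustrative assumptions.

```python
import numpy as np

def ista(A, b, lam, n_iter=500):
    """Minimize 0.5*||A w - b||^2 + lam*||w||_1 by iterative soft
    thresholding (a generic l1 solver used purely for illustration;
    the paper derives its own optimization of a KL-based functional)."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = w - A.T @ (A @ w - b) / L          # gradient step on the quadratic term
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft thresholding
    return w

rng = np.random.default_rng(0)
d, K = 8, 40                                   # feature dimension, dictionary size

# Hypothetical overcomplete dictionary: K random symmetric atoms
# (K exceeds the d*(d+1)/2 = 36 degrees of freedom of a symmetric matrix).
atoms = []
for _ in range(K):
    M = rng.standard_normal((d, d))
    atoms.append((M + M.T) / 2)
A = np.stack([S.ravel() for S in atoms], axis=1)   # vectorized atoms, shape (d*d, K)

B = rng.standard_normal((d, d))
P = B @ B.T + d * np.eye(d)                    # synthetic SPD precision matrix

w = ista(A, P.ravel(), lam=5.0)                # sparse representation coefficients
P_hat = sum(wk * S for wk, S in zip(w, atoms)) # reconstruction from few atoms
print("nonzero coefficients:", np.count_nonzero(w), "of", K)
print("relative error:", np.linalg.norm(P - P_hat) / np.linalg.norm(P))
```

Under the paper's sparseness assumption, only a few of the K coefficients per Gaussian survive the thresholding, so each precision matrix is stored as a handful of index-weight pairs plus the shared dictionary, rather than as d(d+1)/2 matrix entries.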

Notes

  1. A usual term in image processing when working with sparse image representations; see e.g. [1, 11, 12, 32].

  2. \(\lVert {x}\rVert _{l_{0}}\) represents the number of nonzero coefficients in the vector \(x \in {\mathbb {R}^{n}}\), while \(\lVert {x}\rVert _{l_{1}}\) represents its convex relaxation.

  3. There, a distinction was made between inner product and scalar multiplications, and the number of scalar and vector additions was also considered.

  4. The subdifferential \(\partial f(x)\) of a convex function \(f: D\subset \mathbb {R}^{d} \rightarrow \mathbb {R}\) at some \(x \in D\) is defined as the set of all \(a \in \mathbb {R}^{d}\) such that \(f(y)-f(x)\ge {\langle a\vert y-x \rangle }_{\mathbb {R}^{d}}\) for all \(y \in D\); each such \(a\) is called a subgradient.

  5. Note that each precision matrix P can be written as \(P=Q{\Lambda }Q^{T}\), where \({\Lambda }\) is a diagonal matrix whose entries are the eigenvalues of P, and Q is an orthogonal matrix from the COE; see the sampling sketch below.
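As a companion to note 5, synthetic experiments of this kind require random orthogonal matrices with prescribed spectra. A minimal sketch, assuming NumPy and following the QR-based Haar sampling described by Mezzadri [23] (the function name and the eigenvalue range are illustrative, not taken from the paper):

```python
import numpy as np

def haar_orthogonal(d, rng):
    """Draw a Haar-distributed random orthogonal matrix via the QR
    decomposition of a Gaussian matrix, with the sign correction from
    Mezzadri [23] that makes the distribution exactly uniform."""
    Z = rng.standard_normal((d, d))
    Q, R = np.linalg.qr(Z)
    # Multiply each column of Q by the sign of the corresponding
    # diagonal entry of R, so R has a positive diagonal (unique QR).
    Q *= np.sign(np.diag(R))
    return Q

rng = np.random.default_rng(1)
d = 6
Q = haar_orthogonal(d, rng)
eigvals = rng.uniform(0.5, 5.0, size=d)   # strictly positive spectrum
P = Q @ np.diag(eigvals) @ Q.T            # symmetric positive definite precision matrix
assert np.allclose(P, P.T)
assert np.all(np.linalg.eigvalsh(P) > 0)
```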

References

  1. Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322

  2. Axelrod S, Gopinath R, Olsen P (2002) Modeling with a subspace constraint on inverse covariance matrices. In: Proceedings of the ISCA international conference on spoken language processing, pp 2177–2180

  3. Axelrod S, Goel V, Gopinath RA, Olsen PA, Visweswariah K (2005) Subspace constrained Gaussian mixture models for speech recognition. IEEE Trans Speech Audio Process 13(6):1144–1160

  4. Bertolami R, Bunke H (2008) Hidden Markov model-based ensemble methods for offline handwritten text line recognition. Pattern Recog 41(11):3452–3460

  5. Boyd SP, Vandenberghe L (2004) Convex optimization. Cambridge University Press

  6. Burget L, Schwarz P, Agarwal M, Akyazi P, Feng K, Glembek O, Goel N, Karafiát M, Povey D, Rastrow A, Rose RC, Thomas S (2010) Multilingual acoustic modeling for speech recognition based on subspace Gaussian mixture models. In: Proceedings of the IEEE international conference on acoustics speech and signal processing, pp 4334–4337

  7. Cai R, Hao Z, Wen W, Wang L (2013) Regularized Gaussian mixture model based discretization for gene expression data association mining. Appl Intell 39(3):607–613

  8. Chen J, Zhang B, Cao H, Prasad R, Natarajan P (2012a) Applying discriminatively optimized feature transform for HMM-based off-line handwriting recognition. In: Proceedings of the IEEE international conference on frontiers in handwriting recognition, pp 219–224

  9. Chen L, Mao X, Wei P, Xue Y, Ishizuka M (2012b) Mandarin emotion recognition combining acoustic and emotional point information. Appl Intell 37(4):602–612

  10. Dharanipragada S, Visweswariah K (2006) Gaussian mixture models with covariances or precisions in shared multiple subspaces. IEEE Trans Speech Audio Process 14(4):1255–1266

  11. Elad M (2010) Sparse and redundant representations: from theory to applications in signal and image processing. Springer Verlag

  12. Elad M, Figueiredo MAT, Ma Y (2010) On the role of sparse and redundant representations in image processing. Proc IEEE 98(6):972–982

  13. Gales MJF (1999) Semi-tied covariance matrices for hidden Markov models. IEEE Trans Speech Audio Process 7(3):272–281

  14. Gopinath RA (1998) Maximum likelihood modeling with Gaussian distributions for classification. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, vol 2, pp 661–664

  15. Günter S, Bunke H (2004) HMM-based handwritten word recognition: on the optimization of the number of states, training iterations and Gaussian components. Pattern Recog 37(10):2069–2079

  16. Hershey JR, Olsen PA (2007) Approximating the Kullback Leibler divergence between Gaussian mixture models. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, vol 4, pp 317–320

  17. Horn RA, Johnson CR (1990) Matrix analysis. Cambridge University Press

  18. Hörster E, Lienhart R, Slaney M (2008) Continuous visual vocabulary models for pLSA-based scene recognition. In: Proceedings of the ACM international conference on content-based image and video retrieval, pp 319–328

  19. Inoue N, Shinoda K (2012) A fast and accurate video semantic-indexing system using fast MAP adaptation and GMM supervectors. IEEE Trans Multimedia 14(4):1196–1205

  20. Janev M, Pekar D, Jakovljević N, Delić V (2010) Eigenvalues driven Gaussian selection in continuous speech recognition using HMMs with full covariance matrices. Appl Intell 33(2):107–116

  21. Kannan A, Ostendorf M, Rohlicek JR (1994) Maximum likelihood clustering of Gaussian mixtures for speech recognition. IEEE Trans Speech Audio Process 2(3):453–455

  22. Liwicki M, Bunke H (2009) Combining diverse on-line and off-line systems for handwritten text line recognition. Pattern Recog 42(12):3254–3263

  23. Mezzadri F (2007) How to generate random matrices from the classical compact groups. Not Am Math Soc 54(5):592–604

  24. Nocedal J, Wright SJ (1999) Numerical optimization. Springer Verlag

  25. Olsen PA, Gopinath RA (2004) Modeling inverse covariance matrices by basis expansion. IEEE Trans Speech Audio Process 12(1):37–46

  26. Perkins S, Theiler J (2003) Online feature selection using Grafting. In: Proceedings of the IMLS international conference on machine learning, vol 20, pp 592–599

  27. Perkins S, Lacker K, Theiler J (2003) Grafting: fast, incremental feature selection by gradient descent in function space. J Mach Learn Res 3:1333–1356

  28. Popović B, Janev M, Pekar D, Jakovljević N, Gnjatović M, Sečujski M, Delić V (2012) A novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models. Appl Intell 37(3):377–389

  29. Povey D (2009) A tutorial-style introduction to subspace Gaussian mixture models for speech recognition. Tech. Rep. MSR-TR-2009-111. Microsoft Research, Redmond, WA

  30. Povey D, Burget L, Agarwal M, Akyazi P, Feng K, Ghoshal A, Glembek O, Goel NK, Karafiát M, Rastrow A, Rose RC, Schwarz P, Thomas S (2010) Subspace Gaussian mixture models for speech recognition. In: Proceedings of the IEEE international conference on acoustics speech and signal processing, pp 4330–4333

  31. Povey D, Burget L, Agarwal M, Akyazi P, Feng K, Ghoshal A, Glembek O, Goel N, Karafiát M, Rastrow A, Rose RC, Schwarz P, Thomas S (2011) The subspace Gaussian mixture model: a structured model for speech recognition. Comput Speech Lang 25(2):404–439

  32. Rubinstein R, Bruckstein AM, Elad M (2010) Dictionaries for sparse representation modeling. Proc IEEE 98(6):1045–1057

  33. Schmidt M, Fung G, Rosales R (2009) Optimization methods for \(l_{1}\)-regularization. Tech. Rep. TR-2009-19, University of British Columbia

  34. Spall JC (2003) Introduction to stochastic search and optimization: estimation, simulation, and control. Wiley

  35. Trefethen LN, Bau D (1997) Numerical linear algebra. SIAM

  36. Vanhoucke V, Sankar A (2004) Mixtures of inverse covariances. IEEE Trans Speech Audio Process 12(3):250–264

  37. Wang Y, Huo Q (2009) Modeling inverse covariance matrices by expansion of tied basis matrices for online handwritten Chinese character recognition. Pattern Recog 42(12):3296–3302

  38. Webb AR (2002) Statistical pattern recognition. Wiley

  39. Wright SJ, Nowak RD, Figueiredo MAT (2009) Sparse reconstruction by separable approximation. IEEE Trans Signal Process 57(7):2479–2493

Acknowledgements

This research work has been supported by the Ministry of Science and Technology of the Republic of Serbia, as part of projects III44003, III43002, and TR32035.

Author information

Corresponding author

Correspondence to Branko Brkljač.

About this article

Cite this article

Brkljač, B., Janev, M., Obradović, R. et al. Sparse representation of precision matrices used in GMMs. Appl Intell 41, 956–973 (2014). https://doi.org/10.1007/s10489-014-0581-6
