Applied Intelligence

Volume 41, Issue 3, pp 956–973

Sparse representation of precision matrices used in GMMs

  • Branko Brkljač
  • Marko Janev
  • Radovan Obradović
  • Danilo Rapaić
  • Nebojša Ralević
  • Vladimir Crnojević


The paper presents a novel precision matrix modeling technique for Gaussian Mixture Models (GMMs) based on the concept of sparse representation. The representation coefficients of each precision matrix (inverse covariance), together with an accompanying overcomplete matrix dictionary, are learned by minimizing an appropriate functional whose first component is the sum of Kullback-Leibler (KL) divergences between the initial and the target GMM, and whose second component is a sparse regularizer of the coefficients. Compared to existing alternative approaches to approximate GMM modeling, such as the popular subspace-based representation methods, the proposed model achieves a notably better trade-off between representation error and computational (memory) complexity. This is achieved under the assumption that the training data in the recognition system utilizing the GMM have an inherent sparseness property, which enables application of the proposed model and approximate representation using only one dictionary and a significantly smaller number of coefficients. The proposed model is experimentally compared with the Subspace Precision and Mean (SPAM) model, a state-of-the-art instance of subspace-based representation models, using both data from a real Automatic Speech Recognition (ASR) system and specially designed sets of artificially created/synthetic data.
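The core idea — expressing each precision matrix as a sparse linear combination of atoms from a shared, overcomplete matrix dictionary — can be illustrated with a minimal sketch. For simplicity the sketch below replaces the paper's KL-divergence fit term with a plain Frobenius-norm fit, so the coefficient update reduces to an ℓ1-regularized least-squares (lasso) problem solved here by ISTA; the dictionary, dimensions, and parameter values are all hypothetical toy choices, not the authors' implementation.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm (elementwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(P, D, lam=0.01, n_iter=2000):
    """ISTA for  min_w  0.5 * ||vec(P) - D w||^2 + lam * ||w||_1.

    P : (d, d) precision matrix to represent.
    D : (d*d, K) dictionary whose columns are vectorized symmetric atoms.
    Returns the sparse coefficient vector w of length K.
    """
    p = P.ravel()
    w = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ w - p)           # gradient of the quadratic fit term
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Toy dictionary: K vectorized, unit-norm random symmetric matrices.
rng = np.random.default_rng(0)
d, K = 4, 12
atoms = []
for _ in range(K):
    A = rng.standard_normal((d, d))
    S = (A + A.T) / 2                      # symmetrize
    atoms.append(S.ravel() / np.linalg.norm(S))
D = np.stack(atoms, axis=1)                # shape (d*d, K)

# Build a target matrix from only 3 atoms, then recover a sparse code for it.
w_true = np.zeros(K)
w_true[[1, 4, 7]] = [2.0, -1.5, 1.0]
P = (D @ w_true).reshape(d, d)

w_hat = sparse_code(P, D)
```

Because the target lies exactly in the span of three atoms and the regularization weight is small, the recovered `w_hat` is sparse and reconstructs `P` closely; storing such a short coefficient vector per Gaussian, plus one shared dictionary, is the memory saving the paper exploits.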


Sparse representation · Gaussian mixtures · ℓ1-regularization · Precision matrix · Speech recognition · Pattern classification



This research work has been supported by the Ministry of Science and Technology of the Republic of Serbia, as part of projects III44003, III43002 and TR32035.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Branko Brkljač (1)
  • Marko Janev (2)
  • Radovan Obradović (1)
  • Danilo Rapaić (1)
  • Nebojša Ralević (1)
  • Vladimir Crnojević (1)

  1. Faculty of Technical Sciences (FTN), University of Novi Sad, Novi Sad, Republic of Serbia
  2. Mathematical Institute of the Serbian Academy of Sciences and Arts, Belgrade, Republic of Serbia
