Gaussian Mixture Models

Yu, Dong; Deng, Li

doi:10.1007/978-1-4471-5779-3_2

Part of the book series: Signals and Communication Technology (SCT)

Abstract

In this chapter we first introduce the basic concepts of random variables and their associated distributions. We then apply these concepts to Gaussian random variables and mixture-of-Gaussian random variables. Both the scalar and the vector-valued cases are discussed, and the probability density functions of these random variables are given with their parameters specified. This introduction leads to the Gaussian mixture model (GMM), which arises when the distribution of mixture-of-Gaussian random variables is used to fit real-world data such as speech features. The GMM, as a statistical model for Fourier-spectrum-based speech features, plays an important role in the acoustic modeling of conventional speech recognition systems. We discuss some key advantages of GMMs in acoustic modeling, among them the ease with which they can be fitted to a wide range of speech features using the expectation-maximization (EM) algorithm. We describe in some detail the principle of maximum likelihood and the related EM algorithm for estimating the GMM parameters, as this is still a widely used method in speech recognition. Finally, we discuss a serious weakness of GMMs in acoustic modeling for speech recognition, which motivates the new models and methods that form the bulk of this book.
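
As a rough illustration of the maximum-likelihood fitting the abstract refers to, the following is a minimal sketch of EM for a scalar (one-dimensional) GMM. It is not taken from the chapter. The function names (`gaussian_pdf`, `log_likelihood`, `em_step`, `fit_gmm`), the variance floor, the initialisation, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the scalar mixture of Gaussians."""
    return sum(
        math.log(sum(c * gaussian_pdf(x, m, v)
                     for c, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data, weights, means, variances, var_floor=1e-6):
    """One EM iteration; returns updated (weights, means, variances)."""
    n_comp = len(weights)
    # E-step: posterior probability (responsibility) of each component
    # for each sample, given the current parameters.
    resp = []
    for x in data:
        joint = [c * gaussian_pdf(x, m, v)
                 for c, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate parameters from responsibility-weighted statistics.
    new_w, new_m, new_v = [], [], []
    for k in range(n_comp):
        nk = sum(r[k] for r in resp)  # soft count of component k
        mean_k = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var_k = sum(r[k] * (x - mean_k) ** 2 for r, x in zip(resp, data)) / nk
        new_w.append(nk / len(data))
        new_m.append(mean_k)
        new_v.append(max(var_k, var_floor))  # assumed floor against collapse
    return new_w, new_m, new_v


def fit_gmm(data, init_means, n_iter=50):
    """Run n_iter EM iterations from equal weights and unit variances."""
    k = len(init_means)
    params = ([1.0 / k] * k, list(init_means), [1.0] * k)
    for _ in range(n_iter):
        params = em_step(data, *params)
    return params


# Synthetic two-cluster data (hypothetical; stands in for real speech features).
rng = random.Random(0)
data = [rng.gauss(-2.0, 0.5) for _ in range(300)] + \
       [rng.gauss(3.0, 1.0) for _ in range(300)]
weights, means, variances = fit_gmm(data, init_means=[-1.0, 1.0])
```

On well-separated data such as this, the estimated means should settle close to the generating means, and each call to `em_step` should leave the log-likelihood no lower than before. A practical system would work with vector-valued features and covariance matrices, and would typically compute the E-step in the log domain to avoid underflow.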

Notes

1. We omit the detailed derivation of these formulae, which can be found in [1]. Related derivations for similar but more general models can be found in [2, 3, 6, 15, 18].

References

1. Bilmes, J.: A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report, TR-97-021, ICSI (1997)
2. Bilmes, J.: What HMMs can do. IEICE Trans. Inf. Syst. E89-D(3), 869–891 (2006)
3. Bishop, C.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006)
4. Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P.: Front-end factor analysis for speaker verification. IEEE Trans. Audio Speech Lang. Process. 19(4), 788–798 (2011)
5. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum-likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. Ser. B 39, 1–38 (1977)
6. Deng, L.: A generalized hidden Markov model with state-conditioned trend functions of time for the speech signal. Signal Process. 27(1), 65–78 (1992)
7. Deng, L.: Computational models for speech production. In: Computational Models of Speech Pattern Processing, pp. 199–213. Springer, New York (1999)
8. Deng, L.: Switching dynamic system models for speech articulation and acoustics. In: Mathematical Foundations of Speech and Language Processing, pp. 115–134. Springer, New York (2003)
9. Deng, L.: Dynamic Speech Models—Theory, Algorithm, and Applications. Morgan and Claypool, New York (2006)
10. Deng, L., Acero, A., Plumpe, M., Huang, X.: Large vocabulary speech recognition under adverse acoustic environment. In: Proceedings of International Conference on Spoken Language Processing (ICSLP), pp. 806–809 (2000)
11. Deng, L., Droppo, J., Acero, A.: Recursive estimation of nonstationary noise using iterative stochastic approximation for robust speech recognition. IEEE Trans. Speech Audio Process. 11, 568–580 (2003)
12. Deng, L., Droppo, J., Acero, A.: A Bayesian approach to speech feature enhancement using the dynamic cepstral prior. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. I-829–I-832 (2002)
13. Deng, L., Droppo, J., Acero, A.: Enhancement of log mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise. IEEE Trans. Speech Audio Process. 12(2), 133–143 (2004)
14. Deng, L., Kenny, P., Lennig, M., Gupta, V., Seitz, F., Mermelstein, P.: Phonemic hidden Markov models with continuous mixture output densities for large vocabulary word recognition. IEEE Trans. Acoust. Speech Signal Process. 39(7), 1677–1681 (1991)
15. Deng, L., Mark, J.: Parameter estimation for Markov modulated Poisson processes via the EM algorithm with time discretization. In: Telecommunication Systems (1993)
16. Deng, L., O’Shaughnessy, D.: Speech Processing—A Dynamic and Optimization-Oriented Approach. Marcel Dekker Inc, New York (2003)
17. Deng, L., Ramsay, G., Sun, D.: Production models as a structural basis for automatic speech recognition. Speech Commun. 33(2–3), 93–111 (1997)
18. Deng, L., Rathinavelu, C.: A Markov model containing state-conditioned second-order non-stationarity: application to speech recognition. Comput. Speech Lang. 9(1), 63–86 (1995)
19. Deng, L., Wang, K., Acero, A., Hon, H., Droppo, J., Boulis, C., Wang, Y., Jacoby, D., Mahajan, M., Chelba, C., Huang, X.: Distributed speech processing in MiPad’s multimodal user interface. IEEE Trans. Audio Speech Lang. Process. 20(9), 2409–2419 (2012)
20. Divenyi, P., Greenberg, S., Meyer, G.: Dynamics of Speech Production and Perception. IOS Press, Washington (2006)
21. Frey, B., Deng, L., Acero, A., Kristjansson, T.: Algonquin: iterating Laplace’s method to remove multiple types of acoustic distortion for robust speech recognition. In: Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH) (2000)
22. He, X., Deng, L.: Discriminative Learning for Speech Recognition: Theory and Practice. Morgan and Claypool, New York (2008)
23. Huang, X., Acero, A., Hon, H.W., et al.: Spoken Language Processing. Prentice Hall, Englewood Cliffs (2001)
24. Huang, X., Deng, L.: An overview of modern speech recognition. In: Indurkhya, N., Damerau, F.J. (eds.) Handbook of Natural Language Processing, 2nd edn. CRC Press, Taylor and Francis Group, Boca Raton, FL (2010). ISBN 978-1420085921
25. Jiang, H., Li, X.: Discriminative learning in sequential pattern recognition—a unifying review for optimization-oriented speech recognition. IEEE Signal Process. Mag. 27(3), 115–127 (2010)
26. Jiang, H., Li, X., Liu, C.: Large margin hidden Markov models for speech recognition. IEEE Trans. Audio Speech Lang. Process. 14(5), 1584–1595 (2006)
27. Juang, B.H., Levinson, S.E., Sondhi, M.M.: Maximum likelihood estimation for mixture multivariate stochastic observations of Markov chains. In: IEEE International Symposium on Information Theory, vol. 32(2), pp. 307–309 (1986)
28. Kenny, P.: Joint factor analysis of speaker and session variability: theory and algorithms. CRIM, Montreal, Report CRIM-06/08-13 (2005)
29. King, S., Frankel, J., Livescu, K., McDermott, E., Richmond, K., Wester, M.: Speech production knowledge in automatic speech recognition. J. Acoust. Soc. Am. 121, 723–742 (2007)
30. Lee, L.J., Fieguth, P., Deng, L.: A functional articulatory dynamic model for speech production. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 797–800. Salt Lake City (2001)
31. Rasmussen, C.E.: The infinite Gaussian mixture model. In: Proceedings of Neural Information Processing Systems (NIPS) (1999)
32. Reynolds, D., Rose, R.: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process. 3(1), 72–83 (1995)
33. Xiao, L., Deng, L.: A geometric perspective of large-margin training of Gaussian models. IEEE Signal Process. Mag. 27, 118–123 (2010)
34. Yin, S.C., Rose, R., Kenny, P.: A joint factor analysis approach to progressive model adaptation in text-independent speaker verification. IEEE Trans. Audio Speech Lang. Process. 15(7), 1999–2010 (2007)

Author information

Authors and Affiliations

Microsoft Research, Bothell, USA
Dong Yu
Microsoft Research, Redmond, WA, USA
Li Deng

Corresponding author

Correspondence to Dong Yu.

About this chapter

Cite this chapter

Yu, D., Deng, L. (2015). Gaussian Mixture Models. In: Automatic Speech Recognition. Signals and Communication Technology. Springer, London. https://doi.org/10.1007/978-1-4471-5779-3_2

DOI: https://doi.org/10.1007/978-1-4471-5779-3_2
Published: 12 November 2014
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5778-6
Online ISBN: 978-1-4471-5779-3
eBook Packages: Engineering, Engineering (R0)
