An Evaluation Study on Speech Feature Densities for Bayesian Estimation in Robust ASR

Cifani, Simone; Principi, Emanuele; Rotili, Rudy; Squartini, Stefano; Piazza, Francesco

doi:10.1007/978-3-642-18184-9_23


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6456)

Abstract

Bayesian estimators, in particular the Minimum Mean Square Error (MMSE) and Maximum A Posteriori (MAP) estimators, are widely used to estimate clean-speech STFT coefficients. Recently, the same approach has been applied successfully to speech feature enhancement for robust Automatic Speech/Speaker Recognition (ASR), in the Mel, log-Mel, or cepstral domain. The quality of the estimate depends directly on the assumptions made about the densities of the noise and speech coefficients. While these densities have been studied exhaustively for STFT coefficients, speech features have not received comparable attention. In this paper, we study the distribution of Mel, log-Mel, and MFCC coefficients obtained from speech segments. The histograms of the speech features are first fitted to several pdf models by means of the Chi-Square Goodness-of-Fit test; they are then modeled using a Gaussian Mixture Model (GMM). Computer simulations show that, from this perspective, log-Mel and MFCC coefficients are a more convenient choice than Mel coefficients.
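As a rough illustration of the histogram-fitting step mentioned in the abstract, the sketch below computes a Pearson chi-square statistic between a histogram and two candidate models (Gaussian and Laplacian). It is not the authors' procedure: the function names, the 20-bin equal-width histogram, the moment-based parameter fits, and the synthetic Laplacian samples standing in for a single feature coefficient are all assumptions made for the example. The GMM step is not covered.

```python
import math
import random


def gaussian_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def laplace_cdf(x, mu, b):
    """CDF of a Laplacian with location mu and scale b."""
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def chi_square_statistic(samples, cdf, n_bins=20):
    """Pearson chi-square statistic between a histogram and a model CDF."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("samples must not all be identical")
    width = (hi - lo) / n_bins
    observed = [0] * n_bins
    for x in samples:
        k = min(int((x - lo) / width), n_bins - 1)  # max value goes in last bin
        observed[k] += 1
    n = len(samples)
    stat = 0.0
    for k in range(n_bins):
        left = lo + k * width
        # Outer bins absorb the model tails so expected counts sum to n.
        p_left = 0.0 if k == 0 else cdf(left)
        p_right = 1.0 if k == n_bins - 1 else cdf(left + width)
        expected = n * (p_right - p_left)
        if expected > 0:  # skip bins the model assigns zero mass (assumption)
            stat += (observed[k] - expected) ** 2 / expected
    return stat


def fit_and_score(samples):
    """Fit each candidate model by simple estimates, return its statistic."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    median = sorted(samples)[n // 2]
    scale = sum(abs(x - median) for x in samples) / n
    return {
        "gaussian": chi_square_statistic(
            samples, lambda x: gaussian_cdf(x, mean, std)),
        "laplacian": chi_square_statistic(
            samples, lambda x: laplace_cdf(x, median, scale)),
    }


# Synthetic stand-in for one feature coefficient: Laplace(0, 1) samples,
# generated as the difference of two unit exponentials.
rng = random.Random(0)
samples = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(5000)]
scores = fit_and_score(samples)
best = min(scores, key=scores.get)  # smaller statistic = closer fit
print(scores, best)
```

A smaller statistic indicates a closer match between the histogram and the candidate density. A full goodness-of-fit test would also compare the statistic against a chi-square critical value for the appropriate degrees of freedom, which this sketch leaves out.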

Author information

Authors and Affiliations

3MediaLabs, DIBET, Università Politecnica delle Marche, Ancona, Italy
Simone Cifani, Emanuele Principi, Rudy Rotili, Stefano Squartini & Francesco Piazza

Editor information

Editors and Affiliations

Institute for Advanced Scientific Studies, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare (SA), Italy
Anna Esposito
Istituto Nazionale di Geofisica e Vulcanologia, Osservatorio Vesuviano, Via Diocleziano 328, 80124, Napoli, Italy
Antonietta M. Esposito
Dipartimento di Ingegneria dell’Informazione, Seconda Università di Napoli, Via Roma 29, 81031, Aversa (CE), Italy
Raffaele Martone
Department of Humanities and Social Sciences, Anatolia College/ACT, Kennedy Street, 55510, Pylaia, Greece
Vincent C. Müller
Department of Physics "E.R. Caianiello", University of Salerno and IIASS, International Institute for Advanced Scientific Studies, 84081, Baronissi (SA), Italy
Gaetano Scarpetta

About this chapter

Cite this chapter

Cifani, S., Principi, E., Rotili, R., Squartini, S., Piazza, F. (2011). An Evaluation Study on Speech Feature Densities for Bayesian Estimation in Robust ASR. In: Esposito, A., Esposito, A.M., Martone, R., Müller, V.C., Scarpetta, G. (eds) Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces. Theoretical and Practical Issues. Lecture Notes in Computer Science, vol 6456. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18184-9_23

DOI: https://doi.org/10.1007/978-3-642-18184-9_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18183-2
Online ISBN: 978-3-642-18184-9
eBook Packages: Computer Science, Computer Science (R0)
