Discriminative Training of Gender-Dependent Acoustic Models

Vaněk, Jan; Psutka, Josef V.; Zelinka, Jan; Pražák, Aleš; Psutka, Josef

doi:10.1007/978-3-642-04208-9_46

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5729)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

The main goal of this paper is to explore methods of gender-dependent acoustic modeling that take the possibility of an imperfectly functioning gender detector into account. Such methods are beneficial in real-time recognition tasks (e.g., real-time subtitling of meetings), where automatic gender detection may be delayed or incorrect. The aim is to minimize the impact of such errors on the correct function of the recognizer. The paper also describes a technique for unsupervised splitting of training data, which can improve gender-dependent acoustic models trained on the basis of manual markers (male/female). This approach is motivated by the fact that a significant number of "masculine" female and "feminine" male voices occur in training corpora, and by frequent errors in the manual markers.

Author information

Authors and Affiliations

Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic
Jan Vaněk, Josef V. Psutka, Jan Zelinka, Aleš Pražák & Josef Psutka

Editor information

Editors and Affiliations

University of West Bohemia at Pilsen, Czech Republic
Václav Matoušek
Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Pavel Mautner

About this paper

Cite this paper

Vaněk, J., Psutka, J.V., Zelinka, J., Pražák, A., Psutka, J. (2009). Discriminative Training of Gender-Dependent Acoustic Models. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2009. Lecture Notes in Computer Science, vol 5729. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04208-9_46

DOI: https://doi.org/10.1007/978-3-642-04208-9_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04207-2
Online ISBN: 978-3-642-04208-9
eBook Packages: Computer Science, Computer Science (R0)
