Discriminative Training of Gender-Dependent Acoustic Models

  • Conference paper
Text, Speech and Dialogue (TSD 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5729)

Abstract

The main goal of this paper is to explore methods of gender-dependent acoustic modeling that take the possibility of an imperfectly functioning gender detector into account. Such methods are beneficial in real-time recognition tasks (e.g. real-time subtitling of meetings), where automatic gender detection may be delayed or incorrect. The aim is to minimize the impact of such failures on the correct functioning of the recognizer. The paper also describes a technique for unsupervised splitting of training data, which can improve gender-dependent acoustic models trained on the basis of manual markers (male/female). This approach is motivated by two observations: a significant number of "masculine" female and "feminine" male voices occur in training corpora, and manual markers frequently contain errors.
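
The abstract does not spell out how the unsupervised splitting works, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes each utterance is summarized by a single scalar feature (here a hypothetical mean pitch in Hz), starts from the manual male/female markers, fits one Gaussian per class, and then repeatedly reassigns every utterance to the class whose model gives it the higher likelihood. The function names (`fit_gaussian`, `log_likelihood`, `resplit`) and the one-dimensional feature are illustrative assumptions; a real system would score utterances with full acoustic models.

```python
import math


def fit_gaussian(values):
    """Return (mean, variance) of a list of floats, with a small variance floor."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-3)


def log_likelihood(x, mean, var):
    """Log density of a univariate Gaussian at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def resplit(features, labels, max_iters=20):
    """Iteratively relabel utterances, starting from (possibly noisy) manual markers.

    features: one scalar per utterance (hypothetical stand-in for acoustic features)
    labels:   initial "M"/"F" markers
    """
    labels = list(labels)
    for _ in range(max_iters):
        # Re-estimate one Gaussian per class from the current split.
        models = {}
        for g in ("M", "F"):
            members = [x for x, lab in zip(features, labels) if lab == g]
            if not members:
                return labels  # a class became empty; keep the last valid split
            models[g] = fit_gaussian(members)
        # Reassign each utterance to the class that explains it better.
        new_labels = [
            max(models, key=lambda g: log_likelihood(x, *models[g]))
            for x in features
        ]
        if new_labels == labels:
            break  # converged: no utterance changed class
        labels = new_labels
    return labels
```

Called with a toy set in which one low-pitched utterance is marked "F" and one high-pitched utterance is marked "M", `resplit` moves both to the acoustically closer class. The hard reassignment is a deliberate simplification; whether the paper uses hard or soft assignment is not stated in the abstract.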




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vaněk, J., Psutka, J.V., Zelinka, J., Pražák, A., Psutka, J. (2009). Discriminative Training of Gender-Dependent Acoustic Models. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2009. Lecture Notes in Computer Science, vol 5729. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04208-9_46

  • DOI: https://doi.org/10.1007/978-3-642-04208-9_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04207-2

  • Online ISBN: 978-3-642-04208-9

  • eBook Packages: Computer Science (R0)
