Practical Considerations for Real-Time Implementation of Speech-Based Gender Detection

  • Erik Scheme
  • Eduardo Castillo-Guerra
  • Kevin Englehart
  • Arvind Kizhanatham
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4225)


This paper describes a detailed analysis and implementation of a robust gender detector for audio stream applications. The implementation, based on mel-cepstral features and a Gaussian mixture model (GMM) classifier, is designed to maximize gender classification performance in continuous speech. Evaluated on a statistically significant number of gender verifications (2136 unique speakers) drawn from the FISHER speech corpus, the detector outperforms other reported systems. The system yields high accuracy for both long and short utterances, and a confidence figure-of-merit score attached to each decision ensures reliability in continuous audio streams.
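As a rough illustration of the kind of decision rule the abstract describes, the sketch below scores a sequence of feature frames against two diagonal-covariance GMMs and returns a label together with a score. It is only a sketch under stated assumptions: the function names, the toy two-dimensional "features", every model parameter, and the use of the mean per-frame log-likelihood ratio as the confidence value are hypothetical and are not taken from the paper, which would use mel-cepstral vectors and trained models.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify_gender(frames, male_gmm, female_gmm):
    """Return (label, score); score is the magnitude of the mean per-frame
    log-likelihood ratio, used here as an assumed stand-in for a confidence."""
    llr = sum(
        gmm_log_likelihood(f, male_gmm) - gmm_log_likelihood(f, female_gmm)
        for f in frames
    ) / len(frames)
    return ("male" if llr > 0 else "female"), abs(llr)

# Toy two-dimensional models; all parameter values are invented for illustration.
MALE_GMM = [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [1.0, -1.0], [1.0, 1.0])]
FEMALE_GMM = [(0.5, [3.0, 3.0], [1.0, 1.0]), (0.5, [4.0, 2.0], [1.0, 1.0])]

frames = [[0.1, -0.2], [0.8, -0.9], [0.3, 0.1]]
label, score = classify_gender(frames, MALE_GMM, FEMALE_GMM)
```

Averaging the ratio over frames keeps the score comparable between long and short utterances; in a streaming setting, a decision could be withheld until the score exceeds some threshold, though the paper's actual confidence measure may differ.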


Gender detection, GMM classification, audio streaming



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Erik Scheme (1)
  • Eduardo Castillo-Guerra (2)
  • Kevin Englehart (1, 2)
  • Arvind Kizhanatham (3)
  1. Institute of Biomedical Engineering, University of New Brunswick, Fredericton, Canada
  2. Dept. of Electrical and Computer Engineering, University of New Brunswick, Fredericton, Canada
  3. Diaphonics Inc., Halifax, Nova Scotia, Canada
