Skip to main content
Log in

Application of combined temporal and spectral processing methods for speaker recognition under noisy, reverberant or multi-speaker environments

  • Published:
Sadhana Aims and scope Submit manuscript

Abstract

This paper presents an experimental evaluation of the combined temporal and spectral processing methods for speaker recognition task under noise, reverberation or multi-speaker environments. Automatic speaker recognition system gives good performance in controlled environments. Speech recorded in real environments by distant microphones is degraded by factors like background noise, reverberation and interfering speakers. This degradation strongly affects the performance of the speaker recognition system. Combined temporal and spectral processing (TSP) methods proposed in our earlier study are used for pre-processing to improve the speaker-specific features and hence the speaker recognition performance. Different types of degradation like background noise, reverberation and interfering speaker are considered for evaluation. The evaluation is carried out for the individual temporal processing, spectral processing and the combined TSP method. The experimental results show that the combined TSP methods give relatively higher recognition performance compared to either temporal or spectral processing alone.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Allen J, Berkley D 1979 Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65: 943–950

    Article  Google Scholar 

  • Bimbot F, Bonastre J F, Fredouille C, Gravier G, Chagnolleau M I, Meignier S, Merlin T, Garcia O J, Delacretaz P, Reynolds 2004 A tutorial on text-independent speaker verification. EURASIP J. Applied Signal process. 4: 430–451

    Article  Google Scholar 

  • Boll S 1979 Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust., Speech, Signal process. ASSP-27 113–120

    Google Scholar 

  • Campbell J P 1997 Speaker recognition: A tutorial. Proc. IEEE 85(9): 1437–1462

    Article  Google Scholar 

  • Ephraim Y, Malah D 1984 Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Trans. Acoust., Speech, Signal process. ASSP-32 1109–1121

    Google Scholar 

  • Furui S 1981 Comparison of speaker recognition methods using statistical features and dynamic features. IEEE Trans. Audio, Speech and Language process. 29(3): 342–350

    Article  Google Scholar 

  • Greenberg S, Kingsbury B E D 1997 The modulation spectrogram: in pursuit of an invariant representation of speech. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal process. Munich, Germany 1647–1650

  • Habets E A P, Gannot S, Cohen I, Sommen P C W 2008 Joint dereverberation and residual echo suppression of speech signals in noisy environments. IEEE Trans. Audio, Speech, and Language Process. 16(8): 1433–1451

    Article  Google Scholar 

  • Heck L P, Konig Y, Sönmez M K, Weintraub M 2000 Robustness to telephone handset distortion in speaker recognition by discriminative feature design. Speech Communication 31(2–3): 181–192

    Article  Google Scholar 

  • Kamath S, Loizou P 2002 A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal process. Orlando, USA

  • Krishnamoorthy P, Prasanna S R M 2007 Processing noisy speech for enhancement. J. IETE Technical Review, Special issue on spoken language processing 24: 349–355

    Google Scholar 

  • Krishnamoorthy P, Prasanna S R M 2008 Temporal and spectral processing of degraded speech. In: IEEE Proc. Int. Conf. Advanced Computing and Communications 112–118

  • Krishnamoorthy P, Prasanna S R M 2009 Reverberant speech enhancement by temporal and spectral processing. IEEE Trans. Speech, Audio and Language Process. 17(2): 253–266

    Article  Google Scholar 

  • Lebart K, Boucher J 2001 A new method based on spectral subtraction for speech dereverberation. Acta Acoustica 87: 359–366

    Google Scholar 

  • Markel J 1972 The SIFT algorithm for fundamental frequency estimation. IEEE Trans. Audio and Electroacoustics 20: 367–377

    Article  Google Scholar 

  • McAulay R, Quatieri T 1986 Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. Acoust., Speech, Signal process. ASSP-34 744–754

    Google Scholar 

  • Ming J, Hazen T, Glass J, Reynolds D 2007 Robust speaker recognition in noisy conditions. IEEE Trans. Audio, Speech, and Language process. 15(5): 1711–1723

    Article  Google Scholar 

  • Morgan D, George E, Lee L, Kay S 1997 Cochannel speaker separation by harmonic enhancement and suppression. IEEE Trans. Speech Audio process. 5: 407–424

    Article  Google Scholar 

  • Murty K S R, Yegnanarayana B 2008 Epoch extraction from speech signals. IEEE Trans. Audio, Speech, and Language Process. 16(8): 1602–1613

    Article  Google Scholar 

  • Ortega-Garcia J, Gonzalez-Rodriguez J 1996 Overview of speech enhancement techniques for automatic speaker recognition. In: Proc. Fourth Int. Conf. Spoken Language. 2: 929–932

    Article  Google Scholar 

  • Parsons T 1976 Separation of speech from interfering speech by means of harmonic selection. J. Acoust. Soc. Am. 60: 911–918

    Article  Google Scholar 

  • Picone J 1993 Signal modelling techniques in speech recognition. Proc. IEEE 81(9): 1215–1247

    Article  Google Scholar 

  • Prakash V, Hansen J 2007 In-set/out-of-set speaker recognition under sparse enrollment. IEEE Trans. Audio, Speech, and Language process. 15(7): 2044–2052

    Article  Google Scholar 

  • Prasanna S R M, Sandeep Reddy B, Krishnamoorthy P 2009 Vowel onset point detection using source, spectral peaks and modulation spectrum energies. IEEE Trans. Speech, Audio and Language Process. 17(4): 556–565

    Article  Google Scholar 

  • Prasanna S R M, Subramanian A 2005 Finding pitch markers using first order Gaussian differentiator. In: IEEE Proc. Third Int. Conf. Intelligent Sensing Information Process. 140–145

  • Prasanna S R M, Yegnanarayana B 2004 Extraction of pitch in adverse conditions. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal process. Vol. 1. Montreal, Quebec, Canada I-109–I-112

    Google Scholar 

  • Proakis J G, Manolakis D G 1996 Digital signal processing-principles, algorithms, and applications, 3rd Edition. Prentice Hall

  • Rao K, Prasanna S R M, Yegnanarayana B 2007 Determination of instants of significant excitation in speech using Hilbert envelope and group delay function. IEEE Signal process. Letters 14(10): 762–765

    Article  Google Scholar 

  • Reynolds D 1994 Experimental evaluation of features for robust speaker identification. IEEE Trans., Speech Audio process. 2(4): 639–643

    Article  Google Scholar 

  • Reynolds D A 1995 Speaker identification and verification using Gaussian mixture speaker models. Speech Communication 17: 91–108

    Article  Google Scholar 

  • Reynolds D A 2000 Speaker verification using adapted Gaussian mixture models. Digital Signal Process. 10(1–3): 19–41

    Article  Google Scholar 

  • Sankar A, Lee C-H 1996 A maximum-likelihood approach to stochastic matching for robust speech recognition. IEEE Trans. Speech Audio process. 4(3): 190–202

    Article  Google Scholar 

  • Varga A, Steeneken H J M 1993 Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effct of additive noise on speech recognition systems. Speech Communication 12(3): 247–251

    Article  Google Scholar 

  • Wu M, Wang D 2006 A two-stage algorithm for one-microphone reverberant speech enhancement. IEEE Trans. Audio, Speech, Language process. 14: 774–784

    Article  Google Scholar 

  • Yegnanarayana B, Avendano C, Hermansky H, Satyanarayana Murthy P 1999 Speech enhancement using linear prediction residual. Speech Communication 28: 25–42

    Article  Google Scholar 

  • Yegnanarayana B, Prasanna S R M, Duraiswami R, Zotkin D 2005 Processing of reverberant speech for time-delay estimation. IEEE Trans. Speech Audio process. 13: 1110–1118

    Article  Google Scholar 

  • Yegnanarayana B, Prasanna S R M, Mathew M 2003 Enhancement of speech in multispeaker environment. In: Proc. European Conf. Speech process., Technology. Geneva, Switzerland 581–584

  • Yegnanarayana B, Satyanarayana Murthy P 2000 Enhancement of reverberant speech using LP residual signal. IEEE Trans. Speech Audio process. 8: 267–281

    Article  Google Scholar 

  • Zilovic M, Ramachandran R, Mammone R 1998 Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions. IEEE Trans. Speech Audio process. 6(3): 260–267

    Article  Google Scholar 

  • Zue V, Seneff S, Glass J 1990 Speech database development at MIT: TIMIT and beyond. Speech Communication 9(4): 351–356

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to P. Krishnamoorthy.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Krishnamoorthy, P., Mahadeva Prasanna, S.R. Application of combined temporal and spectral processing methods for speaker recognition under noisy, reverberant or multi-speaker environments. Sadhana 34, 729–754 (2009). https://doi.org/10.1007/s12046-009-0043-8

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12046-009-0043-8

Keywords

Navigation