A comparative study of explicit and implicit modelling of subsegmental speaker-specific excitation source information

Sadhana

Abstract

In this paper, explicit and implicit modelling of subsegmental excitation source information are compared experimentally. For explicit modelling, the static and dynamic values of the standard Liljencrants–Fant (LF) parameters, which model the glottal flow derivative (GFD), are used. A simplified approximation method is proposed to compute these LF parameters by locating the glottal closing and opening instants; this approach significantly reduces the computation needed to implement the LF model. For implicit modelling, linear prediction (LP) residual samples taken in blocks of 5 ms with a shift of 2.5 ms are used. Speaker recognition studies are performed on the NIST-99 and NIST-03 databases. For speaker identification, implicit modelling performs significantly better than explicit modelling, whereas for speaker verification, explicit modelling appears to perform better. This suggests that explicit modelling has relatively less intra- and inter-speaker variability, while implicit modelling has more of both. What is desirable is less intra-speaker and more inter-speaker variability. Therefore, explicit modelling may be used for speaker verification and implicit modelling for speaker identification. Further, for both speaker identification and verification, explicit modelling provides relatively more complementary information to state-of-the-art vocal tract features, and this contribution is relatively more robust against noise. We therefore suggest that the explicit approach can be used to model subsegmental excitation information for speaker recognition.
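
The abstract states that explicit modelling uses both static and dynamic values of the LF parameters but does not say how the dynamic values are computed. The sketch below assumes the regression-based delta formula that is widely used for dynamic cepstral features, Δc_t = Σ_{n=1..N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1..N} n²). The input layout (one row of LF parameters per glottal cycle), the window half-width N = 2, the edge padding, and the function names `delta` and `static_plus_dynamic` are hypothetical illustration choices, not details taken from the paper.

```python
import numpy as np


def delta(static, N=2):
    """Regression-based dynamic (delta) features over a +/-N row window.

    static: array of shape (T, D). Here each row is assumed (hypothetically)
    to hold the LF parameters estimated for one glottal cycle.
    """
    static = np.asarray(static, dtype=float)
    T = static.shape[0]
    # Repeat the first/last rows so every row has N neighbours on each side.
    padded = np.pad(static, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(static)
    for n in range(1, N + 1):
        # padded[N + t + n] is static[t + n]; padded[N + t - n] is static[t - n].
        out += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return out / denom


def static_plus_dynamic(lf_params, N=2):
    """Concatenate static parameters with their deltas: shape (T, 2 * D)."""
    lf_params = np.asarray(lf_params, dtype=float)
    return np.hstack([lf_params, delta(lf_params, N)])
```

For a (T, D) matrix of per-cycle parameters, `static_plus_dynamic` returns a (T, 2D) matrix. Because the edges are padded by repetition, delta values for the first and last N rows are attenuated; other boundary treatments would be equally defensible.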

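The implicit representation can be sketched as LP inverse filtering followed by cutting the residual into overlapping subsegmental blocks. Only the 5 ms block length and 2.5 ms shift come from the abstract. The 8 kHz sampling rate (typical of telephone-band NIST data), the 10th-order autocorrelation LP analysis solved by the Levinson-Durbin recursion, the 20 ms Hamming-windowed analysis frames with a 10 ms hop, the absence of any per-block normalization, and the function names `lp_coefficients`, `lp_residual` and `subsegmental_blocks` are all assumptions made for illustration.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Autocorrelation-method LP solved with the Levinson-Durbin recursion.

    Returns a with a[0] = 1, so the prediction error is e[n] = sum_k a[k] x[n-k].
    """
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(x, fs, order=10, frame_ms=20.0, hop_ms=10.0):
    """Frame-wise LP inverse filtering: coefficients estimated on a Hamming-
    windowed frame are applied to the hop of samples where that frame starts."""
    x = np.asarray(x, dtype=float)
    n_frame = int(round(frame_ms * 1e-3 * fs))
    n_hop = int(round(hop_ms * 1e-3 * fs))
    window = np.hamming(n_frame)
    residual = np.zeros_like(x)
    for start in range(0, len(x), n_hop):
        seg = x[start : start + n_frame]
        seg = np.pad(seg, (0, n_frame - len(seg)))  # zero-pad the last frame
        a = lp_coefficients(seg * window, order)
        end = min(start + n_hop, len(x))
        lo = max(0, start - order)  # past samples act as filter memory
        filtered = np.convolve(x[lo:end], a)[: end - lo]
        residual[start:end] = filtered[start - lo :]
    return residual


def subsegmental_blocks(residual, fs, block_ms=5.0, shift_ms=2.5):
    """Cut the residual into overlapping blocks; each row of the returned
    (num_blocks, block_len) array is one raw residual-sample vector."""
    residual = np.asarray(residual, dtype=float)
    n_block = int(round(block_ms * 1e-3 * fs))
    n_shift = int(round(shift_ms * 1e-3 * fs))
    n_blocks = max(0, 1 + (len(residual) - n_block) // n_shift)
    starts = n_shift * np.arange(n_blocks)
    return residual[starts[:, None] + np.arange(n_block)[None, :]]
```

At 8 kHz each block holds 40 residual samples and consecutive blocks overlap by 20 samples, so one second of speech yields 399 blocks; under this sketch, the rows of that matrix are the vectors a speaker model would be trained on.
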
[Figures 1–11]

Author information

Correspondence to DEBADATTA PATI.

About this article

Cite this article

PATI, D., MAHADEVA PRASANNA, S.R. A comparative study of explicit and implicit modelling of subsegmental speaker-specific excitation source information. Sadhana 38, 591–620 (2013). https://doi.org/10.1007/s12046-013-0163-z

  • DOI: https://doi.org/10.1007/s12046-013-0163-z