Prosodic Features for Speaker Recognition

Mary, Leena

doi:10.1007/978-1-4614-0263-3_13

Leena Mary Ph.D.

Abstract

This chapter discusses the effectiveness of syllable-based prosodic features for speaker recognition. The term prosody refers to a collection of characteristics such as intonation, stress and timing, expressed primarily through variations in pitch, energy and duration at various levels of speech. Prosody reflects the learned or acquired speaking habits of a person and hence contributes to speaker recognition. Because prosodic features are less affected by channel mismatch and noise, they are particularly well suited for speaker forensics, a field that demands accurate identification of suspects with as few mitigating conditions as possible. The author describes a method for extracting prosodic features directly from the speech signal. In this method, speech is segmented into syllable-like regions using vowel onset points (VOPs). The locations of the VOPs serve as reference points for the extraction and representation of prosodic features. The effectiveness of the prosodic features for speaker recognition is demonstrated on the extended task of the NIST 2003 speaker recognition evaluation. Combining evidence from spectral features with that of the proposed prosodic features improves overall speaker recognition accuracy.
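
As a rough illustration of the VOP-anchored representation described above, the following minimal sketch assumes that a frame-level F0 contour, a frame-level energy contour and the VOP frame indices are already available; VOP detection and pitch tracking are not shown. The names `SyllableProsody` and `prosodic_features`, the 10 ms frame shift, the choice of features, and the convention that the last region runs to the end of the signal are all illustrative assumptions, not the chapter's actual feature set.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class SyllableProsody:
    """Hypothetical per-region prosodic summary (not the chapter's feature set)."""
    start_frame: int     # frame index of the VOP that opens the region
    duration_s: float    # distance to the next VOP (or signal end), in seconds
    mean_f0_hz: float    # mean F0 over voiced frames, 0.0 if none are voiced
    delta_f0_hz: float   # max minus min F0 over voiced frames
    mean_energy: float   # mean frame energy over the whole region


def prosodic_features(
    f0: Sequence[float],
    energy: Sequence[float],
    vop_frames: Sequence[int],
    frame_shift_s: float = 0.01,  # assumed 10 ms frame shift
) -> List[SyllableProsody]:
    """Summarise pitch, energy and duration between consecutive VOPs.

    Unvoiced frames are assumed to be marked with F0 == 0.
    """
    if len(f0) != len(energy):
        raise ValueError("f0 and energy must have the same number of frames")

    # Region boundaries: each VOP up to the next one; the last region is
    # assumed to extend to the end of the signal.
    bounds = sorted(vop_frames) + [len(f0)]
    features = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end <= start:
            continue  # skip duplicate or out-of-range VOPs
        voiced = [v for v in f0[start:end] if v > 0]
        features.append(
            SyllableProsody(
                start_frame=start,
                duration_s=(end - start) * frame_shift_s,
                mean_f0_hz=mean(voiced) if voiced else 0.0,
                delta_f0_hz=(max(voiced) - min(voiced)) if voiced else 0.0,
                mean_energy=mean(energy[start:end]),
            )
        )
    return features
```

Each returned record describes one syllable-like region, so a sequence of such records could feed any downstream speaker model; how the chapter actually models them is not specified in the abstract.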

Acknowledgement

The author thanks Prof. B. Yegnanarayana and the members of the Speech and Vision Laboratory, IIT Madras, India (2002–2006) for their support in carrying out the study described in this chapter.

Author information

Authors and Affiliations

Rajiv Gandhi Institute of Technology, 686501, Kottayam, Kerala, India
Leena Mary Ph.D.

Authors

Leena Mary Ph.D.

Corresponding author

Correspondence to Leena Mary Ph.D.

Editor information

Editors and Affiliations

Linguistic Technology Systems, 800 Palisade Ave, Apt 1809, Fort Lee, New Jersey, 07024-4121, USA
Amy Neustein
Near Indroda Circle, DA-IICT, Room 4103, Faculty Block 4, Gandhinagar, 382 007, Gujarat, India
Hemant A. Patil

About this chapter

Cite this chapter

Mary, L. (2012). Prosodic Features for Speaker Recognition. In: Neustein, A., Patil, H. (eds) Forensic Speaker Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0263-3_13

DOI: https://doi.org/10.1007/978-1-4614-0263-3_13
Published: 04 October 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-0262-6
Online ISBN: 978-1-4614-0263-3
eBook Packages: Engineering, Engineering (R0)
