Epoch-based analysis of speech signals

YEGNANARAYANA, B; GANGASHETTY, SURYAKANTH V

doi:10.1007/s12046-011-0046-0

Published: 22 November 2011

Sadhana, Volume 36, pages 651–697 (2011)

Abstract

Speech analysis is traditionally performed using short-time analysis to extract features in the time and frequency domains. The window size for the analysis is fixed somewhat arbitrarily, mainly to account for the time-varying vocal tract system during production. However, in its primary mode of excitation, speech is produced by an impulse-like excitation in each glottal cycle. Anchoring the analysis around the glottal closure instants (epochs) yields significant benefits. Epoch-based analysis helps not only to segment speech signals according to speech production characteristics, but also to analyse speech accurately. It enables extraction of important acoustic-phonetic features such as glottal vibrations, formants, and instantaneous fundamental frequency. The epoch sequence is useful for manipulating prosody in speech synthesis applications. Accurate estimation of epochs helps in characterizing voice quality features. Epoch extraction also helps in speech enhancement and multispeaker separation. In this tutorial article, the importance of epochs for speech analysis is discussed, and methods to extract the epoch information are reviewed. Applications of epoch extraction to several speech tasks are demonstrated.
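
As a hedged illustration of what anchoring the analysis at epochs can mean in practice, the Python sketch below assumes that epoch locations are already available as strictly increasing sample indices. The epoch values, the sampling rate, and the function names `instantaneous_f0` and `epoch_anchored_frames` are hypothetical and are not taken from the article; the sketch does not perform epoch extraction itself. It only shows how an instantaneous fundamental frequency can be read off the interval between successive epochs, and how a signal can be cut into one frame per glottal cycle rather than into fixed-size windows.

```python
def instantaneous_f0(epochs, fs):
    """Return (time_in_seconds, f0_in_hz) pairs, one per pair of successive epochs.

    epochs: strictly increasing sample indices of glottal closure instants
            (assumed to be given; estimating them is outside this sketch).
    fs:     sampling rate in Hz.
    """
    if fs <= 0:
        raise ValueError("sampling rate must be positive")
    track = []
    for prev, curr in zip(epochs, epochs[1:]):
        period = curr - prev  # glottal cycle length in samples
        if period <= 0:
            raise ValueError("epochs must be strictly increasing")
        track.append((curr / fs, fs / period))
    return track


def epoch_anchored_frames(signal, epochs):
    """Cut the signal into frames that each span one epoch-to-epoch interval."""
    return [signal[a:b] for a, b in zip(epochs, epochs[1:])]


# Hypothetical data: four epochs in a 400-sample signal sampled at 8000 Hz.
fs = 8000
epochs = [100, 180, 260, 344]
signal = list(range(400))

f0_track = instantaneous_f0(epochs, fs)
frames = epoch_anchored_frames(signal, epochs)

for t, f0 in f0_track:
    print(f"t = {t:.4f} s, F0 = {f0:.1f} Hz")
print([len(frame) for frame in frames])
```

Because each frame boundary coincides with an epoch, the frame length follows the local pitch period instead of a window size chosen in advance.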

Author information

Authors and Affiliations

International Institute of Information Technology, Hyderabad, 500 032, India
B YEGNANARAYANA & SURYAKANTH V GANGASHETTY

Corresponding author

Correspondence to B YEGNANARAYANA.

About this article

Cite this article

YEGNANARAYANA, B., GANGASHETTY, S.V. Epoch-based analysis of speech signals. Sadhana 36, 651–697 (2011). https://doi.org/10.1007/s12046-011-0046-0

Issue Date: October 2011
