Epoch-based analysis of speech signals

Sadhana

Abstract

Speech analysis is traditionally performed using short-time analysis to extract features in the time and frequency domains. The window size for the analysis is fixed somewhat arbitrarily, mainly to account for the time-varying vocal tract system during production. However, in its primary mode of excitation, speech is produced by an impulse-like excitation in each glottal cycle. Anchoring the analysis around the glottal closure instants (epochs) yields significant benefits. Epoch-based analysis helps not only to segment speech signals according to speech production characteristics, but also to analyse them accurately. It enables extraction of important acoustic-phonetic features such as glottal vibrations, formants and instantaneous fundamental frequency. The epoch sequence is useful for manipulating prosody in speech synthesis applications. Accurate estimation of epochs helps in characterizing voice quality features. Epoch extraction also helps in speech enhancement and multispeaker separation. In this tutorial article, the importance of epochs for speech analysis is discussed, and methods to extract the epoch information are reviewed. Some applications of epoch extraction are demonstrated.
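
The abstract lists instantaneous fundamental frequency among the features that epochs make accessible. As a minimal sketch, assuming the epoch locations are already available as sample indices, the instantaneous F0 can be taken as the reciprocal of the interval between successive epochs. The function name, the example epoch positions and the 8 kHz sampling rate are illustrative assumptions and are not taken from the article.

```python
import numpy as np


def instantaneous_f0(epoch_samples, sample_rate_hz):
    """Estimate instantaneous F0 from epoch locations (hypothetical helper).

    epoch_samples: increasing sample indices of glottal closure instants.
    sample_rate_hz: sampling rate of the speech signal in Hz.
    Returns (times_s, f0_hz); each F0 value is placed at the later epoch
    of the pair of epochs it was computed from.
    """
    epochs = np.asarray(epoch_samples, dtype=float)
    if epochs.size < 2:
        # At least two epochs are needed to measure one pitch period.
        return np.array([]), np.array([])

    # Interval between successive epochs, converted to seconds.
    periods_s = np.diff(epochs) / sample_rate_hz
    if np.any(periods_s <= 0):
        raise ValueError("epoch_samples must be strictly increasing")

    f0_hz = 1.0 / periods_s
    times_s = epochs[1:] / sample_rate_hz
    return times_s, f0_hz


# Assumed example: epochs 80 samples apart at 8 kHz, i.e. a 10 ms period.
example_epochs = [100, 180, 260, 340]
times, f0 = instantaneous_f0(example_epochs, 8000)
print(times, f0)
```

Because each value depends only on two neighbouring epochs, this estimate needs no analysis window. It is also only as reliable as the epoch locations it is given, and it is meaningful only within voiced regions.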



Author information

Corresponding author

Correspondence to B YEGNANARAYANA.

About this article

Cite this article

YEGNANARAYANA, B., GANGASHETTY, S.V. Epoch-based analysis of speech signals. Sadhana 36, 651–697 (2011). https://doi.org/10.1007/s12046-011-0046-0

  • DOI: https://doi.org/10.1007/s12046-011-0046-0
