
Group delay functions and its applications in speech technology

Sadhana

Abstract

Traditionally, the information in speech signals is represented in terms of features derived from short-time Fourier analysis. In this analysis, only features extracted from the magnitude of the Fourier transform (FT) are considered, and the phase component is ignored. The significance of the FT phase has been highlighted in several studies over the past three decades. Even so, features derived from the FT phase have not been fully exploited, because the phase is difficult both to compute and to process. The information in the short-time FT phase function can be extracted by processing the negative derivative of the FT phase with respect to frequency, i.e., the group delay function. In this paper, the properties of group delay functions are reviewed, highlighting the importance of the FT phase for representing information in the speech signal. Methods to process the group delay function are discussed that capture the characteristics of the vocal-tract system, either in the form of formants or through a modified group delay function. Applications of group delay functions in speech processing are discussed in some detail. These include segmentation of speech at syllable boundaries, which exploits the additive and high-resolution properties of group delay functions. The effectiveness of this segmentation, and of features derived from the modified group delay function, is demonstrated in applications such as language identification, speech recognition and speaker recognition. The paper thus demonstrates the need to exploit the potential of group delay functions in the development of speech systems.
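The following sketch illustrates the quantities named in the abstract. It is a minimal Python/NumPy example built on assumptions, not the authors' implementation. It computes the group delay tau(w) = -d arg X(w)/dw of a short frame without explicit phase unwrapping, using the standard identity tau(w) = (X_R Y_R + X_I Y_I)/|X(w)|^2, where Y(w) is the DFT of n*x[n]. It also includes one commonly cited form of the modified group delay function, which replaces |X(w)|^2 with a cepstrally smoothed spectrum S(w)^(2*gamma) and compresses the result as sign(tau)|tau|^alpha. Several elements are hypothetical choices made for this illustration and are not taken from the paper: the function names, the defaults for alpha, gamma and the lifter length, and the two-pole resonator used in the demonstration.

```python
import numpy as np


def group_delay(x, nfft=512):
    """Group delay tau(w) = -d arg X(w) / dw on the rfft grid, with no phase unwrapping.

    Uses tau(w) = (X_R * Y_R + X_I * Y_I) / |X(w)|^2, where Y is the DFT of n * x[n].
    Undefined (division by zero) wherever X(w) has a zero on the unit circle.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(n * x, nfft)
    return (X.real * Y.real + X.imag * Y.imag) / np.abs(X) ** 2


def modified_group_delay(x, nfft=512, alpha=0.4, gamma=0.9, n_lifter=8):
    """One common form of the modified group delay (parameter defaults are illustrative).

    tau(w)   = (X_R * Y_R + X_I * Y_I) / S(w)^(2 * gamma)
    tau_m(w) = sign(tau(w)) * |tau(w)|^alpha
    where S(w) is a cepstrally smoothed magnitude spectrum of x.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(n * x, nfft)

    # Real cepstrum of the log magnitude spectrum (a small floor avoids log(0)).
    cep = np.fft.irfft(np.log(np.abs(X) + 1e-10), nfft)
    # Low-time liftering: keep quefrencies 0..n_lifter-1 and their mirror images.
    cep[n_lifter:nfft - n_lifter + 1] = 0.0
    S = np.exp(np.fft.rfft(cep, nfft).real)

    tau = (X.real * Y.real + X.imag * Y.imag) / S ** (2 * gamma)
    return np.sign(tau) * np.abs(tau) ** alpha


def resonator_impulse_response(r, theta, length):
    """Impulse response of the all-pole resonator 1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2)."""
    h = np.zeros(length)
    for k in range(length):
        h[k] = 1.0 if k == 0 else 0.0
        if k >= 1:
            h[k] += 2.0 * r * np.cos(theta) * h[k - 1]
        if k >= 2:
            h[k] -= r * r * h[k - 2]
    return h


if __name__ == "__main__":
    a = np.array([1.0, 0.5])
    b = np.array([1.0, -0.3, 0.2])
    # Additive property: the group delay of a cascade is the sum of the group delays.
    print(np.allclose(group_delay(np.convolve(a, b)), group_delay(a) + group_delay(b)))
    # A sharp resonator with pole angle pi/4 is expected to peak near rfft bin 64 (nfft = 512).
    h = resonator_impulse_response(0.95, np.pi / 4, 400)
    print(int(np.argmax(group_delay(h, 512))))
```

As a usage example, `group_delay([0.0, 0.0, 0.0, 1.0])` returns a constant 3.0 at every frequency, because a pure delay of d samples has a group delay of d. The group delay of a convolution equals the sum of the individual group delays, which is the additive property referred to in the abstract. For a single two-pole resonator, the group delay is sharply peaked at the pole angle, and this behavior is what underlies formant extraction from group delay functions.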

Author information

Correspondence to Hema A. Murthy.

About this article

Cite this article

MURTHY, H.A., YEGNANARAYANA, B. Group delay functions and its applications in speech technology. Sadhana 36, 745–782 (2011). https://doi.org/10.1007/s12046-011-0045-1
