Epoch Extraction by Phase Modelling of Speech Signals

Vijayan, Karthika; Murty, K. Sri Rama

doi:10.1007/s00034-015-0166-6

Published: 19 September 2015

Volume 35, pages 2584–2609, (2016)

Circuits, Systems, and Signal Processing

Karthika Vijayan¹ &
K. Sri Rama Murty¹

Abstract

Epochs are the instants of significant excitation of the vocal-tract system during speech production. In this paper, we attempt to extract information about epochs from the phase spectra of speech signals. The phase spectrum of speech is modelled as the response of an allpass (AP) filter, and the resulting error signal is used for epoch extraction. The parameters of the AP model are estimated by imposing sparsity constraints on the error signal. The error signal thus obtained exhibits prominent peaks at epoch locations. The epoch candidates obtained from the error signal are refined using a dynamic programming algorithm. The performance of the proposed method is consistent across genders and is comparable with that of state-of-the-art methods.
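As a brief illustration of why an allpass filter is a natural model for a phase spectrum, the sketch below evaluates the frequency response of a second-order allpass filter and checks that its magnitude is one at every frequency, so only the phase is shaped. This is not the estimation procedure of the paper: the coefficient values and the helper name `allpass_response` are hypothetical, chosen only to demonstrate the general allpass property.

```python
import cmath
import math


def allpass_response(a, omega):
    """Frequency response of a real-coefficient allpass filter at omega (rad/sample).

    a: denominator coefficients [1, a1, ..., ap], assumed to describe a stable filter.
    The numerator is the denominator with its coefficients reversed, which is
    what forces |H(e^{j*omega})| = 1 at every frequency.
    """
    z_inv = cmath.exp(-1j * omega)
    den = sum(c * z_inv ** k for k, c in enumerate(a))
    num = sum(c * z_inv ** k for k, c in enumerate(reversed(a)))
    return num / den


# Hypothetical coefficients (poles at radius sqrt(0.5), inside the unit circle).
a = [1.0, -0.9, 0.5]

# Sample the response on a coarse grid from 0 to pi.
omegas = [math.pi * k / 8 for k in range(9)]
responses = [allpass_response(a, w) for w in omegas]

magnitudes = [abs(h) for h in responses]       # flat: all equal to 1
phases = [cmath.phase(h) for h in responses]   # varies with frequency

for w, m, p in zip(omegas, magnitudes, phases):
    print(f"omega={w:.3f}  |H|={m:.6f}  phase={p:+.4f} rad")
```

Because the magnitude response is flat, fitting such a filter to speech leaves the magnitude spectrum untouched and accounts only for phase, which is the property the abstract relies on when it models the phase spectrum as an AP filter response.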

Author information

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Technology Hyderabad, Yeddumailaram, Hyderabad, India
Karthika Vijayan & K. Sri Rama Murty

Corresponding author

Correspondence to Karthika Vijayan.

About this article

Cite this article

Vijayan, K., Murty, K.S.R. Epoch Extraction by Phase Modelling of Speech Signals. Circuits Syst Signal Process 35, 2584–2609 (2016). https://doi.org/10.1007/s00034-015-0166-6

Received: 07 April 2015
Revised: 03 September 2015
Accepted: 04 September 2015
Published: 19 September 2015
Issue Date: July 2016
DOI: https://doi.org/10.1007/s00034-015-0166-6
