Analysis of Acoustic Events in Speech Signals Using Bessel Series Expansion

Abstract

We propose an approach for analyzing and detecting acoustic events in speech signals using the Bessel series expansion. The events analyzed are the voice onset time (VOT) and the glottal closure instants (GCIs). The hypothesis is that Bessel functions, whose basis functions resemble damped sinusoids, are better suited to representing speech signals than the sinusoidal basis functions of the conventional Fourier representation. The speech signal is band-pass filtered by selecting an appropriate range of Bessel coefficients, which yields a narrow-band signal that is further decomposed into amplitude-modulated (AM) and frequency-modulated (FM) components. The discrete energy separation algorithm (DESA) is used to compute the amplitude envelope (AE) of the narrow-band AM-FM signal. Events such as the consonant and vowel beginnings in an unvoiced stop consonant-vowel (SCV) unit, as well as the GCIs, are derived by processing the AE of the signal. For VOT detection, the proposed Bessel-expansion approach is shown to perform better than one based on the conventional Fourier representation. For GCI detection, the proposed method is compared against several existing methods across various noise environments and signal-to-noise ratios.
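
As an illustration of the envelope-estimation step only, the following minimal sketch applies one DESA variant (DESA-2) to a synthetic narrow-band tone. The choice of DESA-2, the function names `teager_energy` and `desa2_envelope`, the `eps` guard, and the test-tone values are assumptions made for this example; the abstract does not state which DESA variant is used, and the Bessel-coefficient band-pass filtering is not shown here.

```python
import math


def teager_energy(x):
    """Teager-Kaiser energy psi[n] = x[n]^2 - x[n-1]*x[n+1] for interior samples.

    The returned list is aligned with x[1:-1].
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2_envelope(x, eps=1e-12):
    """Estimate the amplitude envelope |a(n)| with DESA-2.

    Uses |a(n)| ~= 2 * psi[x(n)] / sqrt(psi[y(n)]), with y(n) = x(n+1) - x(n-1).
    The returned list is aligned with x[2:-2].
    """
    psi_x = teager_energy(x)  # aligned with x[1:-1]
    # Symmetric difference signal, also aligned with x[1:-1].
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_y = teager_energy(y)  # aligned with x[2:-2]

    envelope = []
    for k, energy_y in enumerate(psi_y):
        energy_x = psi_x[k + 1]  # same sample of x as psi_y[k]
        if energy_y > eps and energy_x > 0.0:
            envelope.append(2.0 * energy_x / math.sqrt(energy_y))
        else:
            # Energies can be non-positive in noisy or silent regions;
            # this sketch simply reports zero there (an assumption).
            envelope.append(0.0)
    return envelope


# Hypothetical constant-amplitude test tone (values are not from the paper).
amp, omega = 0.7, 0.3 * math.pi  # amplitude and frequency in rad/sample
tone = [amp * math.cos(omega * n + 0.4) for n in range(64)]
envelope = desa2_envelope(tone)
```

For a constant-amplitude sinusoid the DESA-2 estimate recovers the amplitude up to floating-point error, which makes this tone a convenient sanity check. On real speech, the input would be the narrow-band signal obtained from the selected Bessel coefficients, and the envelope would typically need smoothing before event detection.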

Acknowledgements

The authors would like to thank the Department of Information Technology (DIT), Government of India, and the Defense Research and Development Organization (DRDO), Government of India, for supporting this activity through sponsored research projects. The second author would also like to thank the Academy of Finland (Finnish Centre of Excellence in Computational Inference Research COIN, 251170) and the European Community's Seventh Framework Programme (FP7/2007–2013), under grant agreement no. 287678 (Simple4All), for supporting his stay in Finland as a postdoctoral researcher.

Author information

Corresponding author

Correspondence to Chetana Prakash.

About this article

Cite this article

Prakash, C., Gowda, D.N. & Gangashetty, S.V. Analysis of Acoustic Events in Speech Signals Using Bessel Series Expansion. Circuits Syst Signal Process 32, 2915–2938 (2013). https://doi.org/10.1007/s00034-013-9596-1

Keywords

  • Bessel series expansion
  • Voice onset time
  • Glottal closure instant
  • DESA
  • AM-FM