Circuits, Systems, and Signal Processing, Volume 32, Issue 6, pp 2915–2938

Analysis of Acoustic Events in Speech Signals Using Bessel Series Expansion

  • Chetana Prakash
  • Dhananjaya N. Gowda
  • Suryakanth V. Gangashetty

Abstract

In this paper, we propose an approach for the analysis and detection of acoustic events in speech signals using the Bessel series expansion. The acoustic events analyzed are the voice onset time (VOT) and the glottal closure instants (GCIs). The hypothesis is that the Bessel functions with their damped sinusoid-like basis functions are better suited for representing the speech signals than the sinusoidal basis functions used in the conventional Fourier representation. The speech signal is band-pass filtered by choosing the appropriate range of Bessel coefficients to obtain a narrow-band signal, which is decomposed further into amplitude modulated (AM) and frequency modulated (FM) components. The discrete energy separation algorithm (DESA) is used to compute the amplitude envelope (AE) of the narrow-band AM-FM signal. Events such as the consonant and vowel beginnings in an unvoiced stop consonant vowel (SCV) and the GCIs are derived by processing the AE of the signal. The proposed approach for the detection of the VOT using the Bessel expansion is shown to perform better than the conventional Fourier representation. The performance of the proposed GCI detection method using the Bessel series expansion is compared against some of the existing methods for various noise environments and signal-to-noise ratios.
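
The amplitude-envelope step can be illustrated with a short sketch. The abstract does not say which DESA variant is used, so the Python below assumes DESA-2 from the energy separation literature. The function names `teager` and `desa2` and the `eps` guard are illustrative choices, not the authors' implementation. The input is taken to be an already band-limited signal; the band-pass step based on a range of Bessel coefficients is not reproduced here.

```python
import math


def teager(x):
    """Teager-Kaiser energy psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    Returns values for the interior samples n = 1 .. len(x) - 2.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x, eps=1e-12):
    """Estimate amplitude envelope and frequency (rad/sample) with DESA-2.

    Assumed variant (not confirmed by the source):
        y(n)   = x(n+1) - x(n-1)
        freq   = 0.5 * arccos(1 - psi[y] / (2 * psi[x]))
        |a(n)| = 2 * psi[x] / sqrt(psi[y])
    Outputs cover samples n = 2 .. len(x) - 3.
    """
    # Symmetric difference, defined for n = 1 .. N-2.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager(x)  # entry i corresponds to n = i + 1
    psi_y = teager(y)  # entry k corresponds to n = k + 2

    amp, freq = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # same time index as py
        if px <= eps or py <= eps:
            # Energy too small (or negative) for a stable estimate.
            amp.append(0.0)
            freq.append(0.0)
            continue
        # Clamp to the valid arccos domain to absorb rounding error.
        c = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freq.append(0.5 * math.acos(c))
        amp.append(2.0 * px / math.sqrt(py))
    return amp, freq


if __name__ == "__main__":
    # Pure tone with amplitude 2.0 and frequency 0.3 rad/sample.
    tone = [2.0 * math.cos(0.3 * n + 0.1) for n in range(200)]
    envelope, omega = desa2(tone)
    print(envelope[0], omega[0])
```

For a constant-amplitude sinusoid the DESA-2 relations are exact, so the estimates for the test tone equal 2.0 and 0.3 rad/sample up to floating-point rounding. On real speech the envelope would be computed from the narrow-band signal and then processed further to locate events such as the VOT or GCIs, which this sketch does not attempt.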

Keywords

  • Bessel series expansion
  • Voice onset time
  • Glottal closure instant
  • DESA
  • AM-FM

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Chetana Prakash (1)
  • Dhananjaya N. Gowda (2)
  • Suryakanth V. Gangashetty (1)
  1. International Institute of Information Technology Hyderabad, Hyderabad, India
  2. Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland
