Analysis of Acoustic Events in Speech Signals Using Bessel Series Expansion
Abstract
In this paper, we propose an approach for the analysis and detection of acoustic events in speech signals using the Bessel series expansion. The acoustic events analyzed are the voice onset time (VOT) and the glottal closure instants (GCIs). The hypothesis is that the Bessel functions with their damped sinusoid-like basis functions are better suited for representing the speech signals than the sinusoidal basis functions used in the conventional Fourier representation. The speech signal is band-pass filtered by choosing the appropriate range of Bessel coefficients to obtain a narrow-band signal, which is decomposed further into amplitude modulated (AM) and frequency modulated (FM) components. The discrete energy separation algorithm (DESA) is used to compute the amplitude envelope (AE) of the narrow-band AM-FM signal. Events such as the consonant and vowel beginnings in an unvoiced stop consonant vowel (SCV) and the GCIs are derived by processing the AE of the signal. The proposed approach for the detection of the VOT using the Bessel expansion is shown to perform better than the conventional Fourier representation. The performance of the proposed GCI detection method using the Bessel series expansion is compared against some of the existing methods for various noise environments and signal-to-noise ratios.
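The amplitude-envelope step can be illustrated with a minimal sketch. The abstract does not say which DESA variant is used, so the code below assumes DESA-2, built on the Teager-Kaiser energy operator. The function names `teager_energy` and `desa2` and the `eps` guard are illustrative choices, not taken from the paper. The Bessel-domain band-pass filtering that precedes this step is not shown, and the input is assumed to be already narrow-band.

```python
import math

def teager_energy(x):
    """Teager-Kaiser energy psi[x](n) = x(n)^2 - x(n-1) * x(n+1) for interior samples."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def desa2(x, eps=1e-12):
    """Estimate amplitude envelope and instantaneous frequency (rad/sample).

    Assumed variant: DESA-2, which uses the symmetric difference
    y(n) = x(n+1) - x(n-1). Estimates are returned for samples
    n = 2 .. len(x) - 3, so each output list has len(x) - 4 entries.
    """
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_energy(x)  # psi_x[j] belongs to sample n = j + 1
    psi_y = teager_energy(y)  # psi_y[k] belongs to sample n = k + 2
    amplitude, frequency = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # align both energies to sample n = k + 2
        if px <= eps or py <= eps:
            # Energy too small (or negative) for a reliable estimate.
            amplitude.append(0.0)
            frequency.append(0.0)
            continue
        amplitude.append(2.0 * px / math.sqrt(py))
        # Clamp guards against rounding pushing the argument outside [-1, 1].
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        frequency.append(0.5 * math.acos(arg))
    return amplitude, frequency

# Usage: a constant-amplitude sinusoid should give a flat envelope.
signal = [2.0 * math.cos(0.3 * n + 0.5) for n in range(200)]
envelope, inst_freq = desa2(signal)
```

For a pure sinusoid the estimates are exact up to rounding, and the frequency estimate is only valid below a quarter of the sampling rate. On real speech the envelope is usually smoothed before event detection; that post-processing is outside this sketch.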
Keywords
Bessel series expansion, Voice onset time, Glottal closure instant, DESA, AM-FM
Acknowledgements
The authors would like to thank the Department of Information Technology (DIT), Government of India, and the Defense Research and Development Organization (DRDO), Government of India, for supporting this activity through sponsored research projects. The second author would also like to thank The Academy of Finland (Finnish Centre of Excellence in Computational Inference Research COIN, 251170), and the European community’s seventh framework programme (FP7/2007–2013) under grant agreement no. 287678 (Simple4All) for supporting his stay in Finland as a postdoctoral researcher.