Application of non-negative frequency-weighted energy operator for vowel region detection

Abstract

This paper proposes a novel technique for detecting vowel regions in continuous speech using the envelope of the derivative of the speech signal, which is a non-negative, frequency-weighted energy operator. The method is implemented as a two-stage algorithm. The first stage analyzes the speech signal to detect vowel onset points (VOPs) and vowel end-points (VEPs) from an instantaneous energy contour obtained from the envelope of the derivative of the speech signal; the VOPs and VEPs are located with a peak-finding algorithm based on the first-order Gaussian differentiator. The second stage removes spurious vowel regions and corrects the hypothesized VOP and VEP locations using combined cues from the uniformity of epoch intervals and the strength of excitation of the speech signal. Performance is evaluated on the TIMIT acoustic-phonetic speech corpus. The proposed approach achieves a significantly higher detection rate and a lower false alarm rate than state-of-the-art methods in both clean and noisy environments.
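The first stage described above can be illustrated with a minimal Python sketch. It assumes one common formulation of the envelope-derivative operator, namely the squared magnitude of the analytic signal of the signal's derivative (the derivative squared plus its Hilbert transform squared), and treats positive and negative peaks of the Gaussian-differentiator output as VOP and VEP candidates respectively. The function names, window lengths, `sigma`, and the relative threshold are illustrative assumptions, not values taken from the paper, and the second-stage correction using epoch intervals and excitation strength is not covered.

```python
import numpy as np


def envelope_derivative_energy(x):
    """Squared envelope of the signal derivative (non-negative by construction)."""
    x = np.asarray(x, dtype=float)
    dx = np.gradient(x)  # central differences in the interior
    n = dx.size
    # Analytic signal via FFT: keep DC (and Nyquist), double positive
    # frequencies, zero negative frequencies.
    spectrum = np.fft.fft(dx)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * weights)
    return np.abs(analytic) ** 2  # dx**2 + hilbert(dx)**2


def fogd_kernel(length, sigma):
    """First-order Gaussian differentiator (derivative of a Gaussian window)."""
    t = np.arange(length) - (length - 1) / 2.0
    return -t / sigma**2 * np.exp(-t**2 / (2.0 * sigma**2))


def local_maxima(y, threshold):
    """Indices of local maxima of y that exceed the threshold."""
    mid = y[1:-1]
    is_peak = (mid > y[:-2]) & (mid >= y[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1


def vowel_boundary_candidates(x, smooth_len=161, fogd_len=801,
                              sigma=100.0, rel_threshold=0.5):
    """Return (onset, end) candidate sample indices. All defaults are assumptions."""
    energy = envelope_derivative_energy(x)
    # Smooth the instantaneous energy into a slowly varying contour.
    contour = np.convolve(energy, np.ones(smooth_len) / smooth_len, mode="same")
    # Convolving with the Gaussian differentiator gives positive peaks where
    # the contour rises and negative peaks where it falls.
    evidence = np.convolve(contour, fogd_kernel(fogd_len, sigma), mode="same")
    threshold = rel_threshold * np.max(np.abs(evidence))
    onsets = local_maxima(evidence, threshold)
    ends = local_maxima(-evidence, threshold)
    return onsets, ends


# Synthetic check: a 200 Hz tone burst between samples 2000 and 6000.
fs = 8000
n = np.arange(fs)
x = np.zeros(fs)
x[2000:6000] = np.sin(2 * np.pi * 200 * n[2000:6000] / fs)
onsets, ends = vowel_boundary_candidates(x)
```

On real speech the thresholds and window lengths would need tuning, and the candidates would still require the second-stage verification that the abstract describes.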

Keywords

Vowel onset point; Vowel end-point; Instantaneous energy contour; Envelope-derivative of the speech signal; Uniformity of epoch intervals; Strength of the excitation

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Speech Processing Lab, KCIS, International Institute of Information Technology, Hyderabad (IIIT-H), Hyderabad, India
