A detection and classification method for nasalized vowels in noise using product spectrum based cepstra

International Journal of Speech Technology

Abstract

In this paper, a method based on cepstra derived from the product spectrum is developed for the detection and classification of nasalized vowels with varying degrees of nasalization. Conventionally, features for detecting and classifying nasalized vowels are derived from the magnitude spectrum only, ignoring the phase spectrum. The product spectrum is defined from the power spectrum and the group delay function of a band-limited vowel, and thus incorporates information from both the magnitude and phase spectra. Instead of conventional mel frequency cepstral coefficients (MFCCs) derived from the power spectrum, MFCCs computed from the product spectrum, namely MFPSCCs, are fed to a linear discriminant analysis (LDA) based classifier for the detection and classification of nasalized vowels. The performance of nasalized vowel detection and classification based on state-of-the-art features, namely MFCCs and A1–P1, is compared with that of the proposed feature using not only an LDA based classifier but also a support vector machine based classifier. Detailed simulation results on the TIMIT database show that the proposed cepstral features derived from the product spectrum outperform the state-of-the-art features in detecting and classifying nasalized vowels in clean as well as various noisy conditions.
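
As a rough illustration of the feature described in the abstract, the sketch below computes a product spectrum for one frame and turns it into mel cepstra. It relies on the standard identity that the power spectrum times the group delay equals X_R·Y_R + X_I·Y_I, where X and Y are the Fourier transforms of x(n) and n·x(n). The function names, the Hamming window, the frame and FFT sizes, the number of filters and coefficients, and the flooring of non-positive filterbank outputs before the logarithm are all assumptions made for this example; the abstract does not specify them, and the band limiting step is omitted.

```python
import numpy as np


def product_spectrum(frame, n_fft=512):
    """Power spectrum times group delay for one frame (hypothetical helper)."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)      # spectrum of x(n)
    Y = np.fft.rfft(n * frame, n_fft)  # spectrum of n * x(n)
    # |X|^2 * tau(w) = X_R * Y_R + X_I * Y_I, so no division by |X|^2 is needed
    return X.real * Y.real + X.imag * Y.imag


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale (textbook design)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):       # rising edge of the triangle
            fbank[i, k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):       # falling edge of the triangle
            fbank[i, k] = (hi - k) / (hi - mid)
    return fbank


def mfpscc(frame, sample_rate=16000, n_fft=512, n_filters=24, n_ceps=13,
           floor=1e-10):
    """Mel cepstra computed from the product spectrum of one frame."""
    q = product_spectrum(frame * np.hamming(len(frame)), n_fft)
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ q
    # Assumption: the product spectrum can be negative, so floor before the log
    log_e = np.log(np.maximum(energies, floor))
    # Unnormalized DCT-II of the log filterbank outputs
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n_filters) @ log_e
```

A quick sanity check of the first function: for a unit impulse delayed by 5 samples, the power spectrum is 1 and the group delay is 5 at every frequency, so `product_spectrum` should return a constant 5. In a full pipeline, vectors such as those returned by `mfpscc` would be computed per frame and passed to an LDA or support vector machine classifier.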

Author information

Corresponding author

Correspondence to Celia Shahnaz.

About this article

Cite this article

Najnin, S., Shahnaz, C. A detection and classification method for nasalized vowels in noise using product spectrum based cepstra. Int J Speech Technol 18, 97–111 (2015). https://doi.org/10.1007/s10772-014-9225-9

  • DOI: https://doi.org/10.1007/s10772-014-9225-9
