Automated modification of consonant–vowel ratio of stops for improving speech intelligibility

International Journal of Speech Technology

Abstract

Increasing the level of consonant segments relative to the nearby vowel segments, known as consonant–vowel ratio (CVR) modification, is reported to improve speech intelligibility for listeners in noisy backgrounds and for hearing-impaired listeners. A technique is presented for real-time CVR modification of stops, using the rate of change of the spectral centroid to detect spectral transitions. Its effectiveness in improving the recognition of consonants in the presence of speech-spectrum-shaped noise is evaluated through listening tests on normal-hearing subjects. At lower SNR values, recognition scores increased by 7–21% and there was an equivalent SNR advantage of 3 dB. The technique is implemented on a DSP board based on a 16-bit fixed-point processor with on-chip FFT hardware and tested for satisfactory real-time operation.
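
The abstract names the detection cue (rate of change of the spectral centroid) but not the algorithm's details, so the following is only a minimal offline sketch of the general idea, not the authors' implementation. The frame length, hop size, difference step, threshold, and gain are all illustrative assumptions, the function names are hypothetical, and the code uses floating-point NumPy rather than the fixed-point real-time processing described above.

```python
import numpy as np

def spectral_centroid_track(x, fs, frame_len=256, hop=64):
    """Spectral centroid in Hz for each Hann-windowed frame of x."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    n_frames = 1 + (len(x) - frame_len) // hop
    centroids = np.zeros(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        total = mag.sum()
        # Silent frames keep a centroid of 0 to avoid dividing by zero.
        if total > 1e-12:
            centroids[i] = (freqs * mag).sum() / total
    return centroids

def centroid_rate(centroids, step=2):
    """Absolute centroid change over `step` frames (assumed measure)."""
    rate = np.zeros_like(centroids)
    rate[step:] = np.abs(centroids[step:] - centroids[:-step])
    return rate

def boost_transitions(x, fs, gain_db=6.0, rate_threshold=1000.0,
                      frame_len=256, hop=64):
    """Amplify frames whose centroid rate exceeds an assumed threshold.

    gain_db and rate_threshold are placeholder values, not taken
    from the article.
    """
    centroids = spectral_centroid_track(x, fs, frame_len, hop)
    rate = centroid_rate(centroids)
    gain = np.ones(len(x))
    linear_gain = 10.0 ** (gain_db / 20.0)
    for i in np.flatnonzero(rate > rate_threshold):
        gain[i * hop : i * hop + frame_len] = linear_gain
    return x * gain, rate
```

Calling `boost_transitions(x, fs)` on a signal array `x` sampled at `fs` Hz returns the modified signal together with the per-frame centroid rate, which is useful for choosing a threshold by inspection. A practical system would also need to restrict the boost to stop bursts and smooth the gain at segment edges, which this sketch does not attempt.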

Acknowledgments

The research is partly supported by a project grant under the National Programme on Perception Engineering, sponsored by the Department of Electronics & Information Technology (DEITY), Ministry of Communications & Information Technology, Government of India.

Author information

Corresponding author

Correspondence to Prem C. Pandey.

About this article

Cite this article

Jayan, A.R., Pandey, P.C. Automated modification of consonant–vowel ratio of stops for improving speech intelligibility. Int J Speech Technol 18, 113–130 (2015). https://doi.org/10.1007/s10772-014-9254-4

  • DOI: https://doi.org/10.1007/s10772-014-9254-4
