Automated modification of consonant–vowel ratio of stops for improving speech intelligibility

Jayan, A. R.; Pandey, Prem C.

doi:10.1007/s10772-014-9254-4

Published: 04 October 2014

Volume 18, pages 113–130, (2015)

International Journal of Speech Technology

Abstract

Increasing the level of the consonant segments relative to the nearby vowel segments, known as consonant–vowel ratio (CVR) modification, is reported to be effective in improving speech intelligibility for listeners in noisy backgrounds and for hearing-impaired listeners. A technique for real-time CVR modification of stops using the rate of change of spectral centroid for detection of spectral transitions is presented. Its effectiveness in improving the recognition of consonants in the presence of speech-spectrum shaped noise is evaluated by conducting listening tests on normal-hearing subjects. At lower values of SNR, there was an increase of 7–21 % in recognition scores and an equivalent SNR advantage of 3 dB. The technique is implemented on a DSP board based on a 16-bit fixed point processor with on-chip FFT hardware and tested for satisfactory real-time operation.
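
The two ideas named in the abstract, tracking the rate of change of the spectral centroid to locate spectral transitions and raising the level of a consonant segment relative to the neighbouring vowel, can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, hop size, Hann window, first-difference rate estimate, and the function names spectral_centroid_track, centroid_rate, and boost_segment are all assumptions made for illustration, and the sketch works offline on a whole signal rather than in real time.

```python
import numpy as np


def spectral_centroid_track(x, fs, frame_len=256, hop=64):
    """Return frame-centre times (s) and spectral centroid (Hz) per frame.

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    n_frames = 1 + (len(x) - frame_len) // hop
    centroids = np.zeros(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        total = mag.sum()
        # Magnitude-weighted mean frequency; 0 for (near-)silent frames.
        centroids[i] = (freqs * mag).sum() / total if total > 1e-12 else 0.0
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs
    return times, centroids


def centroid_rate(centroids, hop, fs):
    """First difference of the centroid track, scaled to Hz per second."""
    return np.diff(centroids, prepend=centroids[0]) * fs / hop


def boost_segment(x, start, stop, gain_db):
    """Return a copy of x with samples [start, stop) scaled by gain_db."""
    y = np.asarray(x, dtype=float).copy()
    y[start:stop] *= 10.0 ** (gain_db / 20.0)
    return y


if __name__ == "__main__":
    # Synthetic stand-in for a spectral transition: 300 Hz tone, then 3 kHz tone.
    fs = 16000
    t = np.arange(int(0.1 * fs)) / fs
    x = np.concatenate([np.sin(2 * np.pi * 300 * t), np.sin(2 * np.pi * 3000 * t)])
    times, c = spectral_centroid_track(x, fs)
    rate = centroid_rate(c, hop=64, fs=fs)
    peak = int(np.argmax(np.abs(rate)))
    print("largest centroid change near t =", times[peak], "s")
```

In this sketch a transition is taken to be the frame where the magnitude of the centroid rate peaks; a practical detector would also need a threshold and some way to reject transitions that are not stop bursts, neither of which is specified in the abstract.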

Acknowledgments

The research is partly supported by a project grant under the National Programme on Perception Engineering, sponsored by the Department of Electronics & Information Technology (DEITY), Ministry of Communications & Information Technology, Government of India.

Author information

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400 076, India
A. R. Jayan & Prem C. Pandey

Corresponding author

Correspondence to Prem C. Pandey.

About this article

Cite this article

Jayan, A.R., Pandey, P.C. Automated modification of consonant–vowel ratio of stops for improving speech intelligibility. Int J Speech Technol 18, 113–130 (2015). https://doi.org/10.1007/s10772-014-9254-4

Received: 12 March 2014
Accepted: 22 September 2014
Published: 04 October 2014
Issue Date: March 2015
DOI: https://doi.org/10.1007/s10772-014-9254-4
