Abstract
Increasing the level of the consonant segments relative to the nearby vowel segments, known as consonant–vowel ratio (CVR) modification, is reported to be effective in improving speech intelligibility for listeners in noisy backgrounds and for hearing-impaired listeners. A technique for real-time CVR modification of stops is presented, which uses the rate of change of the spectral centroid to detect spectral transitions. Its effectiveness in improving the recognition of consonants in the presence of speech-spectrum-shaped noise is evaluated by conducting listening tests on normal-hearing subjects. At lower values of SNR, there was an increase of 7–21% in recognition scores and an equivalent SNR advantage of 3 dB. The technique is implemented on a DSP board based on a 16-bit fixed-point processor with on-chip FFT hardware and tested for satisfactory real-time operation.
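The detection idea named in the abstract can be illustrated with a minimal offline sketch. It is not the authors' implementation: the frame length, hop size, frame step, gain value, and all function names below are assumptions chosen for illustration, since the abstract does not state them. The sketch computes a magnitude-weighted spectral centroid per frame, takes its frame-to-frame difference as a rate-of-change measure, and applies a fixed gain to a chosen segment to raise its level relative to its neighbours.

```python
import numpy as np

def spectral_centroid_track(x, fs, frame_len=256, hop=64):
    """Return frame-centre times (s) and the spectral centroid (Hz) per frame.
    frame_len and hop are illustrative values, not taken from the paper."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    n_frames = 1 + (len(x) - frame_len) // hop
    centroids = np.zeros(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        total = mag.sum()
        # Magnitude-weighted mean frequency; 0 Hz for an all-zero frame.
        centroids[i] = (freqs * mag).sum() / total if total > 0 else 0.0
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs
    return times, centroids

def centroid_rate_of_change(centroids, step=1):
    """Difference of the centroid over `step` frames (assumed measure);
    the first `step` entries are left at zero."""
    roc = np.zeros_like(centroids)
    roc[step:] = centroids[step:] - centroids[:-step]
    return roc

def apply_gain(x, start, stop, gain_db):
    """Scale samples in [start, stop) by gain_db, leaving the rest unchanged."""
    y = np.asarray(x, dtype=float).copy()
    y[start:stop] *= 10.0 ** (gain_db / 20.0)
    return y

# Synthetic stand-in for a spectral transition: a 500 Hz tone followed by
# a 3000 Hz tone, switching at t = 0.1 s.
fs = 16000
t = np.arange(3200) / fs
x = np.where(t < 0.1, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 3000 * t))

times, centroids = spectral_centroid_track(x, fs)
roc = centroid_rate_of_change(centroids)
peak_time = times[np.argmax(np.abs(roc))]  # falls close to the 0.1 s switch

# Hypothetical modification: boost 20 ms after the detected transition by 6 dB.
start = int(peak_time * fs)
y = apply_gain(x, start, start + int(0.02 * fs), gain_db=6.0)
```

On real speech the peak picking would need a threshold and some smoothing, and a real-time version would process frames as they arrive rather than the whole signal at once; both are left out here to keep the sketch short.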
Acknowledgments
The research is partly supported by a project grant under the National Programme on Perception Engineering, sponsored by the Department of Electronics & Information Technology (DEITY), Ministry of Communications & Information Technology, Government of India.
Cite this article
Jayan, A.R., Pandey, P.C. Automated modification of consonant–vowel ratio of stops for improving speech intelligibility. Int J Speech Technol 18, 113–130 (2015). https://doi.org/10.1007/s10772-014-9254-4
DOI: https://doi.org/10.1007/s10772-014-9254-4