Abstract
In this paper, we propose a consonant-vowel (CV) dependent Wiener filter for dysarthric automatic speech recognition (ASR) in noisy environments. When a conventional Wiener filter is applied to dysarthric speech in noise, it distorts the initial consonants. This is because, compared to normal speech, the spectrum of dysarthric speech at a consonant-vowel onset is much more similar to that of noise, so speech at the onset is easily removed by Wiener filtering. To mitigate this problem, the transfer function of the Wiener filter is constructed differently depending on the result of CV classification, which is performed by combining voice activity detection (VAD) and vowel onset estimation. In this work, VAD uses a statistical model-based approach, and vowel onsets are estimated from the variation of linear prediction residual signals. To demonstrate the effectiveness of the proposed CV-dependent Wiener filter for dysarthric ASR, we compare an ASR system employing the proposed method with one using a conventional Wiener filter, for groups with different degrees of disability and under different signal-to-noise ratio conditions. The ASR experiments show that the proposed Wiener filter achieves relative average word error rate reductions of 10.41%, 6.03%, and 0.94% for the mild, moderate, and severe disability groups, respectively, compared to the conventional Wiener filter.
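The idea of making the Wiener transfer function depend on the CV class can be illustrated with a minimal sketch. The abstract does not give the actual transfer function, so everything specific here is an assumption: the per-class gain floors in `GAIN_FLOOR`, the maximum-likelihood a priori SNR estimate, and the function names `wiener_gain` and `cv_dependent_gain` are hypothetical and are not taken from the paper. Frame labels are assumed to come from an upstream CV classifier (VAD plus vowel onset estimation), which is not implemented here.

```python
import numpy as np

# Hypothetical per-class gain floors (NOT from the paper). A higher floor
# means less attenuation, which protects weak consonant onsets.
GAIN_FLOOR = {"noise": 0.05, "consonant": 0.5, "vowel": 0.1}


def wiener_gain(noisy_power, noise_power, floor):
    """Wiener gain xi / (1 + xi), limited from below by `floor`.

    xi is a simple maximum-likelihood a priori SNR estimate,
    max(noisy_power / noise_power - 1, 0); this estimator is an assumption.
    """
    safe_noise = np.maximum(noise_power, 1e-12)  # avoid division by zero
    snr_prior = np.maximum(noisy_power / safe_noise - 1.0, 0.0)
    gain = snr_prior / (1.0 + snr_prior)
    return np.maximum(gain, floor)


def cv_dependent_gain(noisy_power, noise_power, frame_labels):
    """Compute a per-frame gain whose floor depends on the frame's class.

    noisy_power:  array of shape (frames, bins), noisy power spectrum
    noise_power:  array of shape (bins,), estimated noise power spectrum
    frame_labels: one of "noise", "consonant", "vowel" per frame
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    gains = np.empty_like(noisy_power)
    for t, label in enumerate(frame_labels):
        gains[t] = wiener_gain(noisy_power[t], noise_power, GAIN_FLOOR[label])
    return gains
```

In this sketch, a low-SNR bin in a frame labeled as a consonant is attenuated far less than the same bin in a frame labeled as noise, which is one plausible way to keep onset consonants from being filtered out. The enhanced spectrum would be obtained by multiplying the noisy spectrum by these gains.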
Acknowledgements
This work was supported in part by the R&D Program of MKE/KEIT (10036461, Development of an embedded key-word spotting speech recognition system individually customized for disabled persons with dysarthria) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. 2010-0023888).
Copyright information
© 2011 Springer Science+Business Media, LLC
About this paper
Cite this paper
Park, J.H., Seong, W.K., Kim, H.K. (2011). Preprocessing of Dysarthric Speech in Noise Based on CV–Dependent Wiener Filtering. In: Delgado, RC., Kobayashi, T. (eds) Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems Workshop. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1335-6_6
DOI: https://doi.org/10.1007/978-1-4614-1335-6_6
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1334-9
Online ISBN: 978-1-4614-1335-6
eBook Packages: Engineering, Engineering (R0)