
Dysarthric Speech Recognition Error Correction Using Weighted Finite State Transducers Based on Context-Dependent Pronunciation Variation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7383)

Abstract

In this paper, we propose a dysarthric speech recognition error correction method based on weighted finite state transducers (WFSTs). First, the proposed method constructs a context-dependent (CD) confusion matrix by aligning each recognized word sequence with its corresponding reference sequence at the phoneme level. However, because the dysarthric speech database is too small to cover all combinations of context-dependent phonemes, the CD confusion matrix can be underestimated. To mitigate this underestimation problem, the CD confusion matrix is interpolated with a context-independent (CI) confusion matrix. Finally, WFSTs built from the interpolated CD confusion matrix are composed with the dictionary and language model transducers in order to correct speech recognition errors. The effectiveness of the proposed method is demonstrated by performing speech recognition with the proposed error correction method incorporating the CD confusion matrix. The experiments show that the average word error rate (WER) of a speech recognition system employing the proposed error correction method with the CD confusion matrix is relatively reduced by 13.68% and 5.93%, compared to the baseline speech recognition system and the error correction method with the CI confusion matrix, respectively.
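The pipeline described above (phoneme-level alignment of recognized against reference sequences, relative-frequency confusion estimation, and interpolation of the sparse CD matrix with the CI matrix) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the triphone keying `(left, phone, right)`, and the interpolation weight `lam` are assumptions. The paper's final step, compiling the interpolated probabilities into an error transducer and composing it with dictionary and language model transducers (e.g., with an OpenFst-style toolkit), is omitted here.

```python
from collections import Counter, defaultdict

def align_phonemes(ref, hyp):
    """Levenshtein alignment of a reference and a recognized phoneme
    sequence; returns (ref_phone, hyp_phone) pairs, with None marking
    a deletion (ref phone dropped) or an insertion (spurious hyp phone)."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                          d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1)      # insertion
    # Backtrace from the bottom-right corner to recover aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    return pairs[::-1]

def confusion_matrix(pair_lists):
    """Relative-frequency estimate P(hyp_phone | ref_phone) from aligned pairs."""
    counts = defaultdict(Counter)
    for pairs in pair_lists:
        for r, h in pairs:
            counts[r][h] += 1
    return {r: {h: c / sum(cnt.values()) for h, c in cnt.items()}
            for r, cnt in counts.items()}

def interpolate_cd_ci(cd, ci, lam=0.7):
    """Smooth a sparse CD matrix, keyed by triphone context
    (left, phone, right), with the CI distribution of the centre phone:
    P = lam * P_CD + (1 - lam) * P_CI."""
    out = {}
    for (left, phone, right), dist in cd.items():
        base = ci.get(phone, {})
        support = set(dist) | set(base)
        out[(left, phone, right)] = {
            h: lam * dist.get(h, 0.0) + (1 - lam) * base.get(h, 0.0)
            for h in support}
    return out
```

For example, aligning reference `["k", "ae", "t"]` against hypothesis `["k", "ah", "t"]` yields one substitution pair `("ae", "ah")`; if the CD entry for `("k", "ae", "t")` puts all its mass on `"ah"` but the CI distribution for `"ae"` splits its mass, the interpolated entry redistributes probability toward confusions never observed in that exact context, which is the underestimation fix the abstract describes.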




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Seong, W.K., Park, J.H., Kim, H.K. (2012). Dysarthric Speech Recognition Error Correction Using Weighted Finite State Transducers Based on Context-Dependent Pronunciation Variation. In: Miesenberger, K., Karshmer, A., Penaz, P., Zagler, W. (eds) Computers Helping People with Special Needs. ICCHP 2012. Lecture Notes in Computer Science, vol 7383. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31534-3_70

  • DOI: https://doi.org/10.1007/978-3-642-31534-3_70

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31533-6

  • Online ISBN: 978-3-642-31534-3

  • eBook Packages: Computer Science, Computer Science (R0)
