AI 2003: Advances in Artificial Intelligence, pp 686-698
Effectiveness of A Direct Speech Transform Method Using Inductive Learning from Laryngectomee Speech to Normal Speech
Abstract
This paper proposes and evaluates a new direct speech transform method that operates on waveforms, converting laryngectomee speech to normal speech. Almost all conventional speech recognition systems and other speech processing systems fail to handle laryngectomee speech with satisfactory results. One major cause is the difficulty of preparing corpora: it is very hard to record a large amount of clear and intelligible utterance data because the acoustic quality depends strongly on the condition of the individual speaker.
We focus on the acoustic characteristics of speech waveforms produced by laryngectomees and transform them directly into normal speech. The proposed method handles esophageal and alaryngeal speech with the same algorithm. It is realized by learning transform rules that capture acoustic correspondences between laryngectomee and normal speech. Results of several fundamental experiments indicate promising performance for practical speech transformation.
Keywords
Esophageal speech, Alaryngeal speech, Speech transform, Transform rule, Acoustic characteristics of speech