Vowel Imitation Using Vocal Tract Model and Recurrent Neural Network
A vocal imitation system was developed using a computational model that supports the motor theory of speech perception. A critical problem in vocal imitation is how to generate speech sounds produced by adults, whose vocal tracts have physical properties (i.e., articulatory motions) differing from those of infants’ vocal tracts. To solve this problem, a model based on the motor theory of speech perception, was constructed. Applying this model enables the vocal imitation system to estimate articulatory motions for unexperienced speech sounds that have not actually been generated by the system. The system was implemented by using Recurrent Neural Network with Parametric Bias (RNNPB) and a physical vocal tract model, called Maeda model. Experimental results demonstrated that the system was sufficiently robust with respect to individual differences in speech sounds and could imitate unexperienced vowel sounds.
KeywordsSpeech Perception Vocal Tract Motor Theory Speech Sound Vowel Sound
Unable to display preview. Download preview PDF.
- 1.Liberman, A.M., Cooper, F.S., et al.: A motor theory of speech perception. In: Proc. Speech Communication Seminar, Paper-D3, Stockholm (1962)Google Scholar
- 2.Tani, J., Ito, M.: Self-organization of behavioral primitives as multiple attractor dynamics: A robot experiment. IEEE Transactions on SMC Part A 33(4), 481–488 (2003)Google Scholar
- 3.Minematsu, N., Nishimura, T., Nishinari, K., Sakuraba, K.: Theorem of the invariant structure and its derivation of speech gestalt. In: Proc. Int. Workshop on Speech Recognition and Intrinsic Variations, pp. 47–52 (2006)Google Scholar
- 5.Hickok, G., Buchsbaum, B., Humphries, C., Muftuler, T.: Auditory-motor interaction revealed by fmri. Area Spt. Journal of Cognitive Neuroscience 15(5), 673–682 (2003)Google Scholar
- 6.Yokoya, R., Ogata, T., Tani, J., Komatani, K., Okuno, H.G.: Experience based imitation using RNNPB. In: IEEE/RSJ IROS 2006 (2006)Google Scholar
- 7.Maeda, S.: Compensatory articulation during speech: Evidence from the analysis and synthesis of vocal tract shapes using an articulatory model. In: Speech production and speech modeling, pp. 131–149. Kluwer Academic Publishers, Dordrecht (1990)Google Scholar
- 8.Kitawaki, N., Itakura, F., Saito, S.: Optimum coding of transmission parameters in parcor speech analysis synthesis system. Transactions of the Institute of Electronics and Communication Engineers of Japan (IEICE) J61-A(2), 119–126 (1978)Google Scholar
- 9.Kawahara, H.: Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited. In: IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 1303–1306 (1997)Google Scholar
- 10.Jordan, M.: Attractor dynamics and parallelism in a connectionist sequential machine. In: Eighth Annual Conference of the Cognitive Science Society, Erlbaum, Hillsdale, NJ, pp. 513–546 (1986)Google Scholar
- 11.Rumelhart, D., Hinton, G., Williams, R.: Learning internal representation by error propagation. MIT Press, Cambridge (1986)Google Scholar