Vowel Imitation Using Vocal Tract Model and Recurrent Neural Network

  • Hisashi Kanda
  • Tetsuya Ogata
  • Kazunori Komatani
  • Hiroshi G. Okuno
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4985)

Abstract

A vocal imitation system was developed using a computational model that supports the motor theory of speech perception. A critical problem in vocal imitation is how to generate speech sounds produced by adults, whose vocal tracts have physical properties (i.e., articulatory motions) that differ from those of infants’ vocal tracts. To solve this problem, a model based on the motor theory of speech perception was constructed. Applying this model enables the vocal imitation system to estimate articulatory motions for unexperienced speech sounds, that is, sounds that the system has never actually generated. The system was implemented using a Recurrent Neural Network with Parametric Bias (RNNPB) and a physical vocal tract model called the Maeda model. Experimental results demonstrated that the system was sufficiently robust with respect to individual differences in speech sounds and could imitate unexperienced vowel sounds.

Keywords

Speech Perception, Vocal Tract, Motor Theory, Speech Sound, Vowel Sound
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Hisashi Kanda (1)
  • Tetsuya Ogata (1)
  • Kazunori Komatani (1)
  • Hiroshi G. Okuno (1)
  1. Graduate School of Informatics, Kyoto University, Kyoto, Japan
