Consideration of Infants’ Vocal Imitation Through Modeling Speech as Timbre-Based Melody

  • Nobuaki Minematsu
  • Tazuko Nishimura
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4914)


Infants acquire spoken language by hearing and imitating utterances, mainly those of their parents [1,2,3], but they never imitate their parents’ voices as they are. What in the voices do infants imitate? Because of their poor phonological awareness, it is difficult for them to decode an input utterance into a string of small linguistic units such as phonemes [3,4,5,6], and therefore also difficult to convert individual units into sounds with their mouths. What, then, do infants imitate acoustically? Developmental psychology claims that they extract the holistic sound pattern of an input word, called the word Gestalt [3,4,5], and reproduce it with their mouths. We address the question “What is the acoustic definition of word Gestalt?” [7]. The definition has to be speaker-invariant, because infants extract the same word Gestalt for a particular input word irrespective of who speaks that word to them. Here, we aim to answer this question by regarding speech as a timbre-based melody, a view that focuses on the holistic and speaker-invariant contrastive features embedded in an utterance.
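
The abstract does not say how holistic, speaker-invariant contrastive features are computed. The following is only a minimal sketch of the general idea, under assumptions that are not taken from the abstract: each sound event of an utterance is modeled as a one-dimensional Gaussian over a toy acoustic feature, a speaker difference is modeled as an affine map x -> scale * x + shift, and the contrast between two events is measured with the Bhattacharyya distance (chosen here only because it is unchanged by such a map). All function names and numbers are hypothetical.

```python
import math
from itertools import combinations


def bhattacharyya_1d(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two 1-D Gaussians."""
    var_sum = var_a + var_b
    mean_term = 0.25 * (mean_a - mean_b) ** 2 / var_sum
    spread_term = 0.5 * math.log(var_sum / (2.0 * math.sqrt(var_a * var_b)))
    return mean_term + spread_term


def contrast_structure(events):
    """All pairwise contrasts between the sound events of one utterance."""
    return [
        bhattacharyya_1d(*events[i], *events[j])
        for i, j in combinations(range(len(events)), 2)
    ]


def change_speaker(events, scale, shift):
    """Hypothetical speaker difference (assumption): x -> scale * x + shift.

    Means are scaled and shifted; variances are scaled by scale squared.
    """
    return [(scale * mean + shift, scale ** 2 * var) for mean, var in events]


# Toy utterance: three sound events given as (mean, variance) pairs.
utterance = [(1.0, 0.20), (2.5, 0.10), (4.0, 0.30)]
other_speaker = change_speaker(utterance, scale=1.4, shift=-0.7)

structure_a = contrast_structure(utterance)
structure_b = contrast_structure(other_speaker)

# The absolute feature values differ between the two speakers ...
absolute_values_differ = utterance != other_speaker
# ... but the pattern of contrasts is the same up to rounding error.
structures_match = all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(structure_a, structure_b)
)
```

In this toy setting the two speakers produce different absolute feature values but the same set of pairwise contrasts, much as a melody keeps its intervals when it is transposed to another key.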


Keywords: Speech Recognition, Speech Sound, Input Word, Absolute Pitch, Musical Piece
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Kuhl, P.K., Meltzoff, A.N.: Infant vocalizations in response to speech: Vocal imitation and developmental change. J. Acoust. Soc. Am. 100(4), 2425–2438 (1996)
  2. Gruhn, W.: The audio-vocal system in sound perception and learning of language and music. In: Proc. Int. Conf. on language and music as cognitive systems (2006)
  3. Hayakawa, M.: Language acquisition and motherese. In: Language, Taishukan Pub., vol. 35(9), pp. 62–67 (2006)
  4. Shaywitz, S.E.: Overcoming dyslexia. Random House (2005)
  5. Kato, M.: Phonological development and its disorders. J. Communication Disorders 20(2), 84–85 (2003)
  6. Hara, K.: Phonological disorders and phonological awareness in children. J. Communication Disorders 20(2), 98–102 (2003)
  7. Minematsu, N., Nishimura, T.: Universal and invariant representation of speech, CD-ROM of Int. Conf. Infant Study (2006),
  8. Johnson, K., Mullennix, J.W.: Talker variability in speech processing. Academic Press, London (1997)
  9.
  10. Miyamoto, K.: Making voices and watching voices. Morikawa Pub. (1995)
  11. Minematsu, N., et al.: Theorem of the invariant structure and its derivation of speech Gestalt. In: Proc. ISCA Int. Workshop on Speech Recognition and Intrinsic Variation, pp. 47–52 (2006)
  12. Minematsu, N.: Are learners myna birds to the averaged distributions of native speaker? – a note of warning from a serious speech engineer –, CD-ROM of ISCA Int. Workshop on Speech and Language Technology in Education (2007)
  13. Asakawa, S., Minematsu, N., Hirose, K.: Automatic recognition of connected vowels only using speaker-invariant representation of speech dynamics. In: Proc. InterSpeech, pp. 890–893 (2007)
  14. Qiao, Y., Asakawa, S., Minematsu, N.: Random discriminant structure analysis for continuous Japanese vowel recognition. In: Proc. Int. Workshop on Automatic Speech Recognition and Understanding, December 2007 (to appear)
  15. Taniguchi, T.: Sounds become music in mind – Introduction to music psychology –. Kitaoji Pub. (2000)
  16. Titze, I.R.: Principles of voice production. Prentice-Hall Inc., Englewood Cliffs (1994)
  17. Miyazaki, K.: How well do we understand absolute pitch? J. Acoust. Soc. Jpn. 60(11), 682–688 (2004)
  18. Minematsu, N., Asakawa, S., Hirose, K.: Linear and non-linear transformation invariant representation of information and its use for acoustic modeling of speech. In: Proc. Spring Meeting Acoust. Soc. Jpn., pp. 147–148 (2007)
  19. Jakobson, R., Lotz, J.: Notes on the French phonemic pattern. Hunter (1949)
  20. Saussure, F.: Cours de linguistique générale. In: Publié par Charles Bally et Albert Sechehaye avec la collaboration de Albert Riedlinger, Lausanne et Paris, Payot (1916)
  21. Labov, W., Ash, W., Boberg, C.: Atlas of North American English. Walter de Gruyter, Berlin (2001)
  22. Saito, D., et al.: Directional dependency of cepstrum on vocal tract length. In: Proc. Int. Conf. Acoustics, Speech, and Signal Processing (2008, submitted)
  23. Minematsu, N.: Yet another acoustic representation of speech. In: Proc. Int. Conf. Acoustics, Speech, and Signal Processing, pp. 585–588 (2004)
  24. Kawahara, T., et al.: Recent progress of open-source LVCSR engine Julius and Japanese model repository. In: Proc. Int. Conf. Spoken Language Processing, pp. 3069–3072 (2004)
  25. Asakawa, S., Minematsu, N., Hirose, K.: Multi-stream parameterization for structural speech recognition. In: Proc. Int. Conf. Acoustics, Speech, and Signal Processing (2008, submitted)
  26. Takeshima, C., Tsuzaki, M., Irino, T.: Identification of size-modulated vowel sequences and temporal characteristics of the size extraction process. IEICE Technical Report, SP2006-29, pp. 13–17 (2006)
  27. Smith, D.R., et al.: The processing and perception of size information in speech sounds. J. Acoust. Soc. Am. 117(1), 305–318 (2005)
  28. Hayashi, Y., et al.: Comparison of perceptual characteristics of scaled vowels and words. In: Proc. Spring Meeting Acoust. Soc. Jpn., pp. 473–474 (2007)
  29. Davis, R.D., Braun, E.M.: The gift of dyslexia. Perigee Trade (1997)
  30. Frith, U.: Autism: Explaining the enigma. Blackwell Pub., Malden (1992)
  31. Happé, F.: Autism: An introduction to psychological theory. UCL Press (1994)
  32. Higashida, N., Higashida, M.: Messages to all my colleagues living on the planet. Escor Pub. (2005)
  33. Nade, J.: The developing child with autism: evidences, speculations and vexed questions. In: Tutorial Session of IEEE Int. Conf. Development and Learning (2005)
  34. Asami, T.: A book on my son, Hiroshi. Nakagawa Pub., vol. 5 (2006)
  35. Trehub, S.E.: The developmental origins of musicality. Nature Neuroscience 6, 669–673 (2003)
  36. Hauser, M.D., McDermott, J.: The evolution of the music faculty: A comparative perspective. Nature Neuroscience 6, 663–668 (2003)
  37. Levitin, D.J., Rogers, S.E.: Absolute pitch: perception, coding, and controversies. Trends in Cognitive Sciences 9(1), 26–33 (2005)
  38. Kojima, S.: A search for the origins of human speech: Auditory and vocal functions of the chimpanzee. Trans Pacific Press (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Nobuaki Minematsu (1)
  • Tazuko Nishimura (2)
  1. Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Japan
  2. Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Japan
