Temporally Variable Multi-attribute Morphing of Arbitrarily Many Voices for Exploratory Research of Speech Prosody

Part of the Prosody, Phonology and Phonetics book series (PRPHPH)


Morphing provides a flexible research strategy for studying non- and paralinguistic aspects of speech. A recent extension of the morphing procedure makes it possible to interpolate and extrapolate the physical attributes of arbitrarily many utterance examples. By using utterances that represent typical instantiations of the non- or paralinguistic information in question, and by introducing systematic perturbations of trajectories in a high-dimensional space spanned by a set of indexed weights for the physical parameters of the utterances, the physical correlates of such information can be represented in terms of differential-geometrical concepts. A generalized formulation of this extended morphing framework and a few representative applications are discussed, together with comments on the limitations of the current implementation and possible solutions.
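The core idea of N-way morphing with indexed weights can be illustrated with a minimal sketch. This is not the authors' STRAIGHT-based implementation; the function name, array shapes, and the use of pre-aligned log-domain parameter trajectories are all assumptions made for illustration. Time-varying weights form an affine combination of the examples' parameters at each frame, so weights outside [0, 1] extrapolate beyond the examples:

```python
import numpy as np

def nway_morph(params, weights):
    """Blend log-domain speech parameters of N utterances (illustrative sketch).

    params  : shape (N, T, D) -- e.g. log-F0 or log spectral envelope
              trajectories, already time-aligned to T frames.
    weights : shape (N, T)    -- one weight trajectory per utterance;
              values outside [0, 1] extrapolate beyond the examples.
    Weights are normalized per frame to sum to 1, so each morphed
    frame is an affine combination of the example frames.
    """
    params = np.asarray(params, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum(axis=0, keepdims=True)        # normalize per frame
    return np.einsum('nt,ntd->td', w, params)   # per-frame weighted sum

# Three toy "utterances": constant 1-D log-F0 trajectories over 5 frames.
p = np.array([[[0.0]] * 5, [[1.0]] * 5, [[2.0]] * 5])
# Weights shift over time from utterance 0 toward utterance 2,
# giving a temporally variable morph within a single utterance.
t = np.linspace(0.0, 1.0, 5)
w = np.stack([1.0 - t, np.zeros(5), t])
morph = nway_morph(p, w)   # trajectory glides from 0.0 up to 2.0
```

A trajectory through this weight space (one weight index per utterance, varying over time) is what the abstract refers to as a path in the high-dimensional space spanned by the indexed weights.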





Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

Wakayama University, Wakayama, Japan
