Voice Transformation by Mapping the Features at Syllable Level

  • K. Sreenivasa Rao
  • R. H. Laskar
  • Shashidhar G. Koolagudi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4815)

Abstract

Voice transformation involves modifying the voice of a source speaker so that it sounds like that of a target speaker. The voice characteristics of a speaker depend on the shape of the glottal pulse (source characteristics), the shape of the vocal tract system (system characteristics), and the long-term features (prosodic or suprasegmental) of the speech signal produced by the speaker. In this paper we propose mapping functions to transform the vocal tract characteristics and intonation characteristics of the source speaker into those of the target speaker. The mapping functions are developed from features extracted at the syllable level. The shape of the vocal tract system is characterized by linear prediction coefficients, and the mapping function is realized by a five-layer feedforward neural network. The intonation characteristics (pitch contour) are mapped by associating the codebooks derived from the pitch contours of the source and target speakers. The proposed mapping functions are used in a voice transformation task. The target speaker’s speech is synthesized and evaluated using listening tests. The results of the listening tests indicate that the proposed voice transformation provides better mapping of the voice characteristics than the earlier method proposed by the author. The original and synthesized speech signals obtained using the mapping functions are available for listening at http://shilloi.iitg.ernet.in/~ksrao/result.html
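
The codebook association used for the intonation mapping can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes parallel syllable-level pitch contours resampled to a fixed length, and it uses a plain k-means (Lloyd) iteration where the paper may use a different codebook design procedure. All function names and parameters (`resample_contour`, `train_codebook`, `associate_codebooks`, `map_contour`, `n_points`) are hypothetical.

```python
import numpy as np


def resample_contour(f0, n_points=10):
    """Linearly resample a variable-length pitch contour to n_points values."""
    f0 = np.asarray(f0, dtype=float)
    old_x = np.linspace(0.0, 1.0, num=len(f0))
    new_x = np.linspace(0.0, 1.0, num=n_points)
    return np.interp(new_x, old_x, f0)


def nearest_codeword(vectors, codebook):
    """Return, for each row of vectors, the index of the closest codeword."""
    diffs = vectors[:, None, :] - codebook[None, :, :]
    return np.argmin(np.linalg.norm(diffs, axis=2), axis=1)


def train_codebook(vectors, size, n_iter=20, seed=0):
    """Plain k-means codebook (an assumed stand-in for the paper's design)."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), size=size, replace=False)
    codebook = vectors[start].copy()
    for _ in range(n_iter):
        labels = nearest_codeword(vectors, codebook)
        for k in range(size):
            members = vectors[labels == k]
            if len(members) > 0:  # keep the old codeword if a cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook


def associate_codebooks(src, tgt, src_codebook):
    """Build a target codebook: entry k is the mean of the target contours
    whose paired source contour falls in source cell k."""
    src = np.asarray(src, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    src_codebook = np.asarray(src_codebook, dtype=float)
    labels = nearest_codeword(src, src_codebook)
    tgt_codebook = src_codebook.copy()  # fallback for empty cells
    for k in range(len(src_codebook)):
        members = tgt[labels == k]
        if len(members) > 0:
            tgt_codebook[k] = members.mean(axis=0)
    return tgt_codebook


def map_contour(f0, src_codebook, tgt_codebook):
    """Quantize a source contour and return the associated target codeword."""
    src_codebook = np.asarray(src_codebook, dtype=float)
    v = resample_contour(f0, src_codebook.shape[1])[None, :]
    return np.asarray(tgt_codebook)[nearest_codeword(v, src_codebook)[0]]


if __name__ == "__main__":
    # Synthetic paired contours (Hz); purely illustrative values.
    src = np.array([[100.0, 100.0], [101.0, 99.0], [200.0, 200.0]])
    tgt = np.array([[150.0, 150.0], [152.0, 148.0], [300.0, 300.0]])
    src_cb = train_codebook(src, size=2)
    tgt_cb = associate_codebooks(src, tgt, src_cb)
    print(map_contour([199.0, 201.0], src_cb, tgt_cb))
```

In this sketch the mapping is piecewise constant: every source contour in the same codebook cell is replaced by the same target contour, so the codebook size controls how finely intonation patterns are distinguished.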

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • K. Sreenivasa Rao (1)
  • R. H. Laskar (2)
  • Shashidhar G. Koolagudi (1)
  1. School of Information Technology, IIT Kharagpur, Kharagpur-721302, West Bengal, India
  2. Department of Electrical Engineering, NIT Silchar, Silchar, Assam, India
