Mandarin Voice Conversion Using Tone Codebook Mapping
A tone codebook mapping method is proposed to improve voice conversion of Mandarin speech over the conventional approach, which deals mainly with short-time spectral envelopes. The pitch contour of a whole Mandarin syllable is used as the unit for pitch conversion. Syllable pitch contours are first extracted from the source and target utterances, then time-normalized and smoothed with a moving average filter. The preprocessed contours are classified to generate the source and target tone codebooks, and associating the two through speech alignment yields a Mandarin tone mapping codebook. Experimental results show that the proposed method delivers satisfactory voice conversion performance for Mandarin speech.
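The pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: linear interpolation for time normalization, a centered moving average, plain k-means as the classifier, and a co-occurrence count over aligned syllable pairs as the codebook association. All function names and parameters (`n_points`, `window`, `k`) are hypothetical.

```python
import random

def time_normalize(contour, n_points=10):
    """Resample a syllable pitch contour to n_points (>= 2) by linear interpolation."""
    if len(contour) == 1:
        return [float(contour[0])] * n_points
    out = []
    for i in range(n_points):
        pos = i * (len(contour) - 1) / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append((1 - frac) * contour[lo] + frac * contour[hi])
    return out

def moving_average(contour, window=3):
    """Smooth with a centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(contour)):
        seg = contour[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length contours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, vec):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j], vec))

def build_codebook(vectors, k, iters=20, seed=0):
    """Cluster contours with plain k-means (an assumed classifier)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(codebook, v)].append(v)
        for j, group in enumerate(groups):
            if group:  # keep the old codeword if its cluster is empty
                codebook[j] = [sum(col) / len(group) for col in zip(*group)]
    return codebook

def build_mapping(src_vecs, tgt_vecs, src_cb, tgt_cb):
    """Map each source codeword to the target codeword it most often
    co-occurs with across aligned (source, target) syllable pairs."""
    counts = [[0] * len(tgt_cb) for _ in src_cb]
    for s, t in zip(src_vecs, tgt_vecs):
        counts[nearest(src_cb, s)][nearest(tgt_cb, t)] += 1
    return [row.index(max(row)) for row in counts]
```

In use, each extracted contour would pass through `moving_average(time_normalize(contour))` before clustering, and a source contour would be converted by looking up its nearest source codeword and emitting the mapped target codeword. Whether the paper uses a hard one-to-one mapping like this or a weighted combination of target codewords is not stated in the abstract.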
Keywords: Pitch Contour, Target Tone, Target Speaker, Voice Conversion, Pitch Difference