Integrating coding techniques into LP-based Mandarin text-to-speech synthesis
In this paper, speech coding techniques are integrated into a Mandarin text-to-speech (TTS) system. By exploiting the intrinsic properties of Mandarin, we encode the acoustic features of 408 syllabic utterances into templates, each containing the modeling parameters needed for speech synthesis. As a result, the developed TTS system requires only 36 Kbytes to store all syllabic templates.
In the synthesis stage, the modeling parameters retrieved from the templates are modified according to the prosody estimated from a hierarchically layered model. To give an overall picture of the performance of this TTS system, we conducted listening tests and obtained 86.4% intelligibility and 97% comprehensibility. A simplified Mandarin TTS system is also implemented on an FPGA development board. The FPGA realization leads us to believe that such a TTS synthesizer can easily be incorporated into other portable devices as a voice interface.
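As a rough check on the storage figure quoted above, the average budget per syllabic template can be worked out directly. The sketch below assumes that 1 Kbyte means 1024 bytes and that the 36 Kbytes are spread evenly over the 408 templates. Neither assumption is stated in the abstract, and the function name `bytes_per_template` is hypothetical.

```python
def bytes_per_template(total_kbytes: float, n_templates: int,
                       bytes_per_kbyte: int = 1024) -> float:
    """Average storage per template, assuming an even split (an assumption)."""
    return total_kbytes * bytes_per_kbyte / n_templates


# Figures taken from the abstract: 36 Kbytes for 408 syllabic templates.
avg_binary = bytes_per_template(36, 408)          # 1 Kbyte = 1024 bytes
avg_decimal = bytes_per_template(36, 408, 1000)   # 1 Kbyte = 1000 bytes

print(f"{avg_binary:.1f} bytes per template (1024-byte Kbyte)")
print(f"{avg_decimal:.1f} bytes per template (1000-byte Kbyte)")
```

Under either convention the average comes to roughly 90 bytes per syllable, which suggests how compact the coded templates are. The actual template layout may not be uniform.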
Keywords: Text-to-speech, Speech coding, Linear prediction synthesizer