Pitch-synchronous articulatory synthesis incorporated with the inverse solution of speech production

Journal of Zhejiang University-SCIENCE A

Abstract

This paper proposes a new method for synthesizing natural-sounding speech with fewer control parameters by combining the inverse solution of speech production with pitch-synchronous articulatory synthesis. A pitch-synchronously excited Reflection-Type Line Analog (RTLA) model serves as the synthesis filter, and the synthesizer is controlled by vocal-tract (VT) area functions. Multi-rate system sampling and dynamic scattering-wave adjustment are used to handle the variable VT length and to preserve acoustic continuity. Given target formant trajectories, the dynamic VT area function, modeled with a time-variant VT length, is derived using an inverse solution of speech production. A distinguishing feature of this method is that artificially specified formant trajectories can be precisely targeted in the synthesized sounds. Experimental results show that the synthetic sounds match the formant targets well. The potential application of this method to text-to-speech conversion is also discussed.
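The synthesis side can be illustrated with a lossless Kelly-Lochbaum-style lattice driven by a Rosenberg-style glottal pulse train. Reflection-type line analogs build on this lattice structure. This is a minimal sketch and not the authors' RTLA implementation. The following are all illustrative assumptions:

- a one-sample delay per section in each direction;
- fixed glottal and lip reflection coefficients;
- the pulse-shape fractions;
- a sound speed of 35000 cm/s;
- the example area function.

The helper `sampling_rate_for_length` shows one way to read "multi-rate system sampling": with the section count held fixed, a longer tract requires a lower sampling rate. Dynamic scattering-wave adjustment and the inverse solution are not modeled here.

```python
import math


def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction between adjacent
    tube sections: r_k = (A_k - A_{k+1}) / (A_k + A_{k+1})."""
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]


def sampling_rate_for_length(n_sections, length_cm, c_cm_per_s=35000.0):
    """Sampling rate that keeps the number of sections fixed as the tract
    length changes. Assumes a one-sample delay per section in each
    direction, so each section is c / fs long and L = N * c / fs."""
    return n_sections * c_cm_per_s / length_cm


def rosenberg_pulse_train(f0, fs, n_samples, open_frac=0.4, close_frac=0.16):
    """Rosenberg-style glottal flow pulses, one starting at each pitch period.
    open_frac and close_frac are illustrative fractions of the period."""
    period = fs / f0
    t_open = open_frac * period
    t_close = close_frac * period
    pulses = []
    for n in range(n_samples):
        t = n % period  # time (in samples) since the current period began
        if t <= t_open:  # opening phase: raised-cosine rise to 1
            g = 0.5 * (1.0 - math.cos(math.pi * t / t_open))
        elif t <= t_open + t_close:  # closing phase: quarter-cosine fall to 0
            g = math.cos(math.pi * (t - t_open) / (2.0 * t_close))
        else:  # closed phase
            g = 0.0
        pulses.append(g)
    return pulses


def kelly_lochbaum(areas, source, r_glottis=0.75, r_lips=-0.9):
    """Pass `source` through a lossless Kelly-Lochbaum lattice built from the
    area function and return the wave transmitted at the lips."""
    n = len(areas)
    r = reflection_coefficients(areas)
    fwd = [0.0] * n  # forward wave that entered each section at its left end
    bwd = [0.0] * n  # backward wave that entered each section at its right end
    output = []
    for u in source:
        # After a one-sample delay, last step's entering waves reach the
        # opposite end of their section.
        output.append((1.0 + r_lips) * fwd[-1])
        new_f = [0.0] * n
        new_b = [0.0] * n
        new_f[0] = u + r_glottis * bwd[0]  # source plus glottal reflection
        for k in range(n - 1):  # scattering at the junction of sections k, k+1
            new_f[k + 1] = (1.0 + r[k]) * fwd[k] - r[k] * bwd[k + 1]
            new_b[k] = r[k] * fwd[k] + (1.0 - r[k]) * bwd[k + 1]
        new_b[-1] = r_lips * fwd[-1]  # reflection at the lips
        fwd, bwd = new_f, new_b
    return output


if __name__ == "__main__":
    # Illustrative (not measured) 8-section area function in cm^2.
    areas = [0.6, 1.0, 1.8, 2.6, 3.2, 4.0, 5.0, 4.2]
    fs = sampling_rate_for_length(len(areas), 17.5)
    source = rosenberg_pulse_train(f0=120.0, fs=fs, n_samples=int(0.05 * fs))
    speech = kelly_lochbaum(areas, source)
    print(f"fs = {fs:.0f} Hz, {len(speech)} output samples")
```

For a uniform tube, every reflection coefficient is zero. With both end reflections also set to zero, an impulse simply emerges at the lips after one sample per section. This makes a quick sanity check of the lattice wiring. The glottal and lip reflections are kept inside (-1, 1) so that the lossless interior stays stable.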

Author information

Correspondence to Yu Zhen-li.

Additional information

Project supported by NSFC (69972046) and the Zhejiang Provincial Natural Science Foundation of China (698076).

About this article

Cite this article

Yu, Z.-L., Ching, P.-C. Pitch-synchronous articulatory synthesis incorporated with the inverse solution of speech production. J. Zhejiang Univ. Sci. A 1, 388–393 (2000). https://doi.org/10.1631/jzus.2000.0388

  • DOI: https://doi.org/10.1631/jzus.2000.0388
