Abstract
This article presents comprehensive technical information about STRAIGHT and TANDEM-STRAIGHT, a widely used speech modification tool and its successor. They share the same concept: the periodic excitation found in voiced sounds is an efficient mechanism for transmitting an underlying smooth time–frequency representation. The tools are also based on the perceptual equivalence of two sets of independent Gaussian random signals. This equivalence makes it possible to intentionally discard the phase information of the input, which in turn enables flexible manipulation of parameters.
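One way to read the claim about periodic excitation is as a sampling statement: a periodic pulse train passed through a smooth response yields a spectrum that is nonzero only at the harmonics, and the values there are samples of the smooth envelope. The following is a minimal sketch of that reading under toy assumptions, not the STRAIGHT or TANDEM-STRAIGHT algorithm. The function name `harmonic_sampling_demo`, the decaying-exponential response, and all parameter values are hypothetical choices made for illustration.

```python
import cmath


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real or complex list."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def harmonic_sampling_demo(n=64, period=8, decay=0.6):
    """Return (envelope, spectrum) magnitudes for a toy periodic signal.

    Assumptions (illustrative only): the smooth response is a decaying
    exponential, the excitation is a unit pulse every `period` samples,
    and `n` is a whole number of periods so the frame is exactly periodic.
    """
    # Hypothetical smooth response standing in for the spectral envelope.
    h = [decay ** t for t in range(n)]
    # Periodic excitation: one unit pulse at the start of each period.
    pulses = [1.0 if t % period == 0 else 0.0 for t in range(n)]
    # Circular convolution keeps the signal exactly periodic in the frame.
    x = [sum(pulses[s] * h[(t - s) % n] for s in range(n)) for t in range(n)]
    envelope = [abs(v) for v in dft(h)]
    spectrum = [abs(v) for v in dft(x)]
    return envelope, spectrum


if __name__ == "__main__":
    envelope, spectrum = harmonic_sampling_demo()
    step = 64 // 8  # spacing of harmonic bins for the default arguments
    for k in range(0, 33, step):
        print(k, round(spectrum[k], 6), round(envelope[k], 6))
```

With the default arguments, the harmonic bins are the multiples of 8. At those bins the signal spectrum equals the envelope scaled by the number of pulses in the frame, and all other bins are zero up to rounding error. In this toy setting the periodic signal carries the envelope only at the harmonic positions, which is why a smooth surface has to be recovered by interpolation between them.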
Cite this article
KAWAHARA, H., MORISE, M. Technical foundations of TANDEM-STRAIGHT, a speech analysis, modification and synthesis framework. Sadhana 36, 713–727 (2011). https://doi.org/10.1007/s12046-011-0043-3