Abstract
This paper proposes a flexible method for pitch contour modification using the instants of significant excitation of the vocal tract system during speech production. The instants of significant excitation correspond to the instants of glottal closure (epochs) in the case of voiced speech, and to random excitations, such as the onset of a burst, in the case of nonvoiced speech. The instants of significant excitation are computed from the Linear Prediction (LP) residual of the speech signal using the average group-delay property of minimum-phase signals. The pitch contour is modified by manipulating the LP residual using knowledge of the instants of significant excitation. The modified residual is then used to excite a time-varying filter whose parameters are derived from the original speech signal. The synthesized speech has good perceptual quality, with no significant distortion. The proposed method is evaluated using waveforms, spectrograms and listening tests. The listening tests are performed on a voice conversion application, in which the source speaker’s pitch contour is modified by the proposed method according to the target speaker’s pitch contour. For this application, the performance of the proposed method is compared with that of the Linear Prediction Pitch Synchronous Overlap and Add (LP-PSOLA) method using listening tests.
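The analysis and synthesis framework described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it covers only the generic steps of deriving LP coefficients (autocorrelation method with the Levinson-Durbin recursion), inverse filtering to obtain the LP residual, and exciting the all-pole filter with a residual. The group-delay-based epoch extraction and the residual manipulation are not shown. The function names, the synthetic two-pole test signal, the 80-sample excitation period and the LP order of 2 are all assumptions chosen for illustration.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (zero outside the frame)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err) with a = [1, a1, ..., ap], so that the residual is
    e[n] = sum_k a[k] * x[n - k], and err is the minimum prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, a):
    """Inverse filter A(z): e[n] = sum_{k=0}^{p} a[k] * x[n - k]."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(min(p, n) + 1)) for n in range(len(x))]


def synthesize(e, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{k=1}^{p} a[k] * y[n - k]."""
    p = len(a) - 1
    y = []
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * y[n - k]
        y.append(acc)
    return y


# Hypothetical test signal: an impulse train (one impulse every 80 samples)
# passed through a stable two-pole resonator, standing in for voiced speech.
period = 80
excitation = [1.0 if n % period == 0 else 0.0 for n in range(400)]
x = synthesize(excitation, [1.0, -1.3, 0.8])

r = autocorr(x, 2)
a, err = levinson_durbin(r, 2)   # LP analysis of the frame
e = lp_residual(x, a)            # residual: the excitation estimate
y = synthesize(e, a)             # unmodified residual reproduces the frame

max_err = max(abs(u - v) for u, v in zip(x, y))
print("LP coefficients:", a)
print("max reconstruction error:", max_err)
```

In a pitch modification setting, the residual `e` would be altered around the detected epochs before being passed to `synthesize`, while the filter coefficients `a` stay tied to the original speech. The sketch passes the residual through unchanged only to show that the analysis and synthesis filters are exact inverses of each other.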
Acknowledgements
The author would like to thank the reviewers for their valuable comments and suggested corrections, which have helped greatly in improving the quality of the paper.
Cite this article
Rao, K.S. Unconstrained Pitch Contour Modification Using Instants of Significant Excitation. Circuits Syst Signal Process 31, 2133–2152 (2012). https://doi.org/10.1007/s00034-012-9428-8
DOI: https://doi.org/10.1007/s00034-012-9428-8