Unconstrained Pitch Contour Modification Using Instants of Significant Excitation

Rao, Krothapalli Sreenivasa

doi:10.1007/s00034-012-9428-8

Published: 01 May 2012

Volume 31, pages 2133–2152 (2012)

Circuits, Systems, and Signal Processing

Krothapalli Sreenivasa Rao¹

Abstract

This paper proposes a flexible method for pitch contour modification that uses the instants of significant excitation of the vocal tract system during speech production. In voiced speech, the instants of significant excitation correspond to the instants of glottal closure (epochs); in nonvoiced speech, they correspond to random excitations such as the onset of a burst. The instants are computed from the Linear Prediction (LP) residual of the speech signal by exploiting the average group-delay property of minimum-phase signals. The pitch contour is modified by manipulating the LP residual with the knowledge of these instants, and the modified residual then excites a time-varying filter whose parameters are derived from the original speech signal. The synthesized speech has good perceptual quality, without significant distortion. The proposed method is evaluated using waveforms, spectrograms, and listening tests. The listening tests are performed in a voice conversion application, where the source speaker’s pitch contour is modified by the proposed method to follow the target speaker’s pitch contour. In this application, the performance of the proposed method is compared with that of the Linear Prediction Pitch Synchronous Overlap and Add (LP-PSOLA) method.
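The abstract states that the instants are located in the LP residual through the average group delay of minimum-phase signals. The following is a minimal NumPy sketch of that idea, not the author's implementation. All function names are hypothetical, and the following choices are our own assumptions: 10th-order autocorrelation LP on 30 ms frames at 8 kHz, a 61-sample analysis window shorter than the pitch period, group delay averaged with |X(ω)|² weights (by Parseval this equals the energy centroid of the windowed segment), and an energy gate that discards zero crossings far from strong residual samples. For a window centred at n that contains one dominant excitation at m, the phase slope is φ(n) = n − m, so epochs appear as positive-going zero crossings.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Autocorrelation-method LP: returns a_1..a_p with x[n] ~ sum_k a_k x[n-k]."""
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1:len(w) + order]  # lags 0..order
    if r[0] <= 0.0:
        return np.zeros(order)  # silent frame: predict nothing
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # tiny diagonal loading (assumption, numerical safety)
    return np.linalg.solve(R, r[1:order + 1])


def lp_residual(signal, order=10, frame_len=240, hop=80):
    """Block-wise inverse filtering e[n] = x[n] - sum_k a_k x[n-k] (hypothetical helper)."""
    x = np.asarray(signal, dtype=float)
    padded = np.concatenate([np.zeros(order), x])  # padded[i] == x[i - order]
    residual = np.zeros_like(x)
    for start in range(0, len(x), hop):
        centre = start + hop // 2
        lo, hi = max(0, centre - frame_len // 2), min(len(x), centre + frame_len // 2)
        a = lp_coefficients(x[lo:hi], order)
        for n in range(start, min(start + hop, len(x))):
            past = padded[n:n + order][::-1]  # x[n-1], ..., x[n-order]
            residual[n] = x[n] - a @ past
    return residual


def phase_slope(residual, win_len=61):
    """phi[n] = window centre minus average group delay, for an (odd) window centred at n.

    Group delay Re(Y/X) is averaged with |X|^2 weights, i.e. sum Re(Y X*) / sum |X|^2.
    phi is NaN where the window holds no energy.
    """
    half = win_len // 2
    padded = np.concatenate([np.zeros(half), residual, np.zeros(half)])
    idx = np.arange(win_len)
    phi = np.full(len(residual), np.nan)
    for n in range(len(residual)):
        seg = padded[n:n + win_len]  # residual samples n-half .. n+half
        X = np.fft.fft(seg)
        Y = np.fft.fft(idx * seg)  # DFT of n*x[n]; Re(Y/X) is the group delay
        energy = np.sum(np.abs(X) ** 2)
        if energy > 1e-12:
            tau = np.sum(np.real(Y * np.conj(X))) / energy
            phi[n] = half - tau
    return phi


def detect_epochs(phi, residual, rel_gate=0.3):
    """Positive-going zero crossings of phi, kept only near strong residual samples."""
    strength = np.abs(residual)
    gate = rel_gate * strength.max()  # energy gate: our assumption, not from the paper
    epochs = []
    for n in range(1, len(phi)):
        if phi[n - 1] < 0 <= phi[n]:  # NaN comparisons are False, so silent windows are skipped
            if strength[max(0, n - 2):n + 3].max() >= gate:
                epochs.append(n)
    return np.array(epochs, dtype=int)
```

The gate matters because a window that contains only low-level residual has an essentially arbitrary centroid, which could otherwise produce spurious crossings; a real system would also need voicing decisions and a more robust gate than a global threshold.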
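Once the epochs are known, the residual can be rearranged cycle by cycle and passed through the time-varying all-pole filter derived from the original speech. The sketch below illustrates this with a deliberately simple rule that we assume for illustration only: new epochs are placed with the original intervals divided by a pitch factor, and each new cycle copies the original residual cycle active at that time, starting at its epoch, with the tail truncated (higher pitch) or zero-padded (lower pitch). The paper's actual residual manipulation may differ. The names `modify_pitch_residual` and `lp_synthesize` are hypothetical, and the LP coefficients are assumed to be given as one row per hop.

```python
import numpy as np


def modify_pitch_residual(residual, epochs, factor):
    """Re-space residual cycles so F0 is multiplied by `factor`, keeping duration (needs >= 2 epochs).

    Each new cycle copies the original cycle active at that time, starting at
    its epoch; the tail is truncated (factor > 1) or left as zeros (factor < 1).
    Samples before the first epoch and after the last cycle are copied unchanged.
    """
    residual = np.asarray(residual, dtype=float)
    epochs = np.asarray(epochs, dtype=int)
    periods = np.diff(epochs)
    periods = np.append(periods, periods[-1])  # last epoch reuses the previous period
    new_res = np.zeros_like(residual)
    new_res[:epochs[0]] = residual[:epochs[0]]
    tail = epochs[-1] + periods[-1]
    new_res[tail:] = residual[tail:]
    new_epochs = []
    t = float(epochs[0])
    while t <= epochs[-1]:
        k = np.searchsorted(epochs, t, side="right") - 1  # original cycle active at time t
        new_period = periods[k] / factor
        start = int(round(t))
        cycle = residual[epochs[k]:epochs[k] + min(int(round(new_period)), periods[k])]
        end = min(start + len(cycle), len(new_res))
        new_res[start:end] = cycle[:end - start]
        new_epochs.append(start)
        t += new_period
    return new_res, np.array(new_epochs, dtype=int)


def lp_synthesize(excitation, coeffs, hop):
    """Time-varying all-pole filter s[n] = e[n] + sum_k a_k s[n-k]; one coefficient row per hop."""
    coeffs = np.atleast_2d(np.asarray(coeffs, dtype=float))
    order = coeffs.shape[1]
    out = np.zeros(len(excitation) + order)  # leading zeros act as the initial filter state
    for n in range(len(excitation)):
        a = coeffs[min(n // hop, len(coeffs) - 1)]
        past = out[n:n + order][::-1]  # s[n-1], ..., s[n-order]
        out[n + order] = excitation[n] + a @ past
    return out[order:]
```

For a voice conversion setting like the one described above, the single `factor` would presumably be replaced by a per-cycle ratio of target to source pitch; this sketch does not attempt that.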

Similar content being viewed by others

Improving the Flexibility of Dynamic Prosody Modification Using Instants of Significant Excitation (Article, 04 September 2015)

A Pitch Estimation Method Robust to High Levels of Noise

Semi-automatic Segmentation and Marking of Pitch Contours for Prosodic Analysis

Acknowledgements

The author thanks the reviewers for their valuable comments and suggested corrections, which helped considerably in improving the quality of the paper.

Author information

Authors and Affiliations

School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, West Bengal, India
Krothapalli Sreenivasa Rao

Corresponding author

Correspondence to Krothapalli Sreenivasa Rao.

About this article

Cite this article

Rao, K.S. Unconstrained Pitch Contour Modification Using Instants of Significant Excitation. Circuits Syst Signal Process 31, 2133–2152 (2012). https://doi.org/10.1007/s00034-012-9428-8

Received: 01 August 2011
Revised: 17 April 2012
Published: 01 May 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s00034-012-9428-8
