Abstract
Control over the glottal source signal is fundamental to correctly modifying relevant voice characteristics, such as breathiness. This voice quality is strongly related to the characteristics of the source signal produced at the glottis, mainly the shape of the glottal pulse and the aspiration noise. Aspiration noise results from the turbulence of air passing through the glottis and can be represented as amplitude-modulated Gaussian noise whose modulation depends on the glottal volume velocity and glottal area. However, the dependency between the glottal signal and the noise component is usually not taken into account when transforming breathiness. In this paper, we propose a method for modelling the aspiration noise that adapts the noise to its dependency on the glottal pulse shape while producing high-quality speech. The envelope of the amplitude-modulated noise is estimated pitch-synchronously from the speech signal and then parameterized using a non-linear polynomial fitting algorithm. Finally, an asymmetric triangular window is derived from the non-linear polynomial representation so that the energy envelope of the noise has a shape closer to that of the glottal source. In the voice transformation experiments, both the proposed aspiration noise model and an acoustic glottal source model are used to transform a modal voice into a breathy one. Results show that the aspiration noise model improves the voice quality transformation compared with an excitation using only the glottal model and with an excitation that combines the glottal source model with a spectral representation of the noise component.
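The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `floor` parameter, and the polynomial degree are hypothetical, and an ordinary least-squares polynomial fit stands in for the non-linear fitting algorithm mentioned above. The sketch fits a polynomial to one period of an energy envelope, takes the location of its maximum as the peak of an asymmetric triangular window, and uses that window to modulate Gaussian noise pitch-synchronously.

```python
import numpy as np


def asymmetric_triangle(period_len, peak_pos):
    """Triangular envelope over one pitch period.

    Rises linearly from 0 to 1 at the relative position peak_pos
    (strictly between 0 and 1), then falls linearly back to 0.
    """
    if period_len < 3 or not 0.0 < peak_pos < 1.0:
        raise ValueError("need period_len >= 3 and 0 < peak_pos < 1")
    n = np.arange(period_len)
    peak = peak_pos * (period_len - 1)
    rise = n / peak
    fall = (period_len - 1 - n) / (period_len - 1 - peak)
    return np.minimum(rise, fall)


def peak_from_envelope(env_period, degree=4):
    """Relative peak position of a polynomial fitted to one envelope period.

    Assumption: a plain least-squares fit replaces the paper's
    non-linear polynomial fitting, which the abstract does not specify.
    """
    t = np.linspace(0.0, 1.0, len(env_period))
    poly = np.polynomial.Polynomial.fit(t, env_period, degree)
    fitted = poly(t)
    # Search interior samples only, so the peak stays strictly inside (0, 1).
    idx = int(np.argmax(fitted[1:-1])) + 1
    return t[idx]


def modulated_noise(pitch_marks, peak_pos, floor=0.1, seed=0):
    """Gaussian noise modulated period by period with a triangular envelope.

    pitch_marks: increasing sample indices delimiting pitch periods.
    floor: hypothetical minimum envelope level, so noise never vanishes.
    """
    rng = np.random.default_rng(seed)
    total = pitch_marks[-1]
    noise = rng.standard_normal(total)
    envelope = np.zeros(total)
    for start, end in zip(pitch_marks[:-1], pitch_marks[1:]):
        tri = asymmetric_triangle(end - start, peak_pos)
        envelope[start:end] = floor + (1.0 - floor) * tri
    return envelope * noise, envelope
```

As a usage example, `peak_from_envelope` can be applied to an envelope measured over one pitch period, and the returned position passed as `peak_pos` to `modulated_noise` together with a list of pitch marks such as `[0, 80, 160, 240]`.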
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cabral, J.P., Carson-Berndsen, J. (2013). Towards a Better Representation of the Envelope Modulation of Aspiration Noise. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science, vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_9
DOI: https://doi.org/10.1007/978-3-642-38847-7_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38846-0
Online ISBN: 978-3-642-38847-7
eBook Packages: Computer Science, Computer Science (R0)