Towards a Better Representation of the Envelope Modulation of Aspiration Noise

  • Conference paper
Advances in Nonlinear Speech Processing (NOLISP 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7911)

Abstract

Control over the glottal source signal is fundamental to correctly modifying voice characteristics such as breathiness. This voice quality is strongly related to the characteristics of the signal produced at the glottis, mainly the shape of the glottal pulse and the aspiration noise. Aspiration noise results from the turbulence of air passing through the glottis and can be represented as amplitude-modulated Gaussian noise that depends on the glottal volume velocity and glottal area. However, the dependency between the glottal signal and the noise component is usually not taken into account when transforming breathiness. In this paper, we propose a method for modelling aspiration noise that adapts the noise to the glottal pulse shape while producing high-quality speech. The envelope of the amplitude-modulated noise is estimated pitch-synchronously from the speech signal and then parameterized using a non-linear polynomial fitting algorithm. Finally, an asymmetric triangular window is derived from the non-linear polynomial representation so that the shape of the noise energy envelope is closer to that of the glottal source. In the voice transformation experiments, the proposed aspiration noise model and an acoustic glottal source model are used together to transform a modal voice into a breathy one. Results show that the aspiration noise model improves voice quality transformation compared with two alternatives: an excitation using only the glottal model, and an excitation that combines the glottal source model with a spectral representation of the noise component.
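The idea of amplitude-modulated Gaussian noise shaped by an asymmetric triangular window can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract derives the window from a polynomial fit of the estimated envelope, whereas here the peak position is simply given. The function names and the `peak_fraction`, `floor`, and `seed` parameters are hypothetical and chosen only for illustration.

```python
import random


def asymmetric_triangular_window(length, peak_pos):
    """Triangular window of `length` samples that rises to 1.0 at `peak_pos`.

    The rising and falling slopes differ whenever `peak_pos` is not centred.
    """
    if not 0 < peak_pos < length - 1:
        raise ValueError("peak_pos must lie strictly inside the window")
    window = []
    for n in range(length):
        if n <= peak_pos:
            window.append(n / peak_pos)  # rising edge
        else:
            window.append((length - 1 - n) / (length - 1 - peak_pos))  # falling edge
    return window


def modulated_noise(period_lengths, peak_fraction=0.7, floor=0.1, seed=0):
    """Gaussian noise whose amplitude follows one triangle per pitch period.

    period_lengths: pitch period lengths in samples (each at least 3).
    peak_fraction:  assumed position of the envelope peak within a period.
    floor:          assumed minimum gain, so the noise never vanishes entirely.
    """
    rng = random.Random(seed)
    signal = []
    for length in period_lengths:
        peak_pos = max(1, min(length - 2, round(peak_fraction * length)))
        envelope = asymmetric_triangular_window(length, peak_pos)
        for value in envelope:
            gain = floor + (1.0 - floor) * value
            signal.append(gain * rng.gauss(0.0, 1.0))
    return signal


# Two pitch periods of 80 and 100 samples, processed pitch-synchronously.
noise = modulated_noise([80, 100])
```

Working period by period mirrors the pitch-synchronous processing described in the abstract. In a fuller system the peak position and slopes would come from the fitted envelope rather than from a fixed fraction.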

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cabral, J.P., Carson-Berndsen, J. (2013). Towards a Better Representation of the Envelope Modulation of Aspiration Noise. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science, vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_9

  • DOI: https://doi.org/10.1007/978-3-642-38847-7_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38846-0

  • Online ISBN: 978-3-642-38847-7

  • eBook Packages: Computer Science, Computer Science (R0)
