Non-stationary Self-consistent Acoustic Objects as Atoms of Voiced Speech

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4885)

Abstract

To account for the strong non-stationarity of voiced speech and its nonlinear aero-acoustic origin, the classical source-filter model is extended to a cascaded drive-response model with a conventional linear secondary response, a synchronized and/or synchronously modulated primary response, and a non-stationary fundamental drive which plays the role of the long time-scale part of the basic time-scale separation of acoustic perception. The transmission protocol of voiced speech is assumed to be based on non-stationary acoustic objects which can be synthesized as the described secondary response and which are analysed by introducing a self-consistent (filter-stable) part-tone decomposition, suited to reconstruct the hidden fundamental drive and to confirm its topological equivalence to a glottal master oscillator. The filter-stable part-tone decomposition opens the option of a phase modulation transmission protocol of voiced speech. Aiming at communication-channel-invariant acoustic features of voiced speech, the phase modulation cues are expected to be particularly suited to extend and/or replace the classical feature vectors of phoneme and speaker recognition.
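
The idea of reading a hidden fundamental drive off the phases of part-tones can be illustrated with a small toy experiment. The sketch below is not the paper's algorithm: it assumes a synthetic signal whose harmonics are phase-locked to a single gliding fundamental phase, replaces the paper's filter bank with a plain FFT-mask band-pass, and uses fixed, hand-picked bands instead of any self-consistent filter adaptation. All function names, band edges, and parameter values (`synth_voiced`, `part_tone`, `f0_from_part_tone`, `demo`, 8 kHz sampling, a 110 to 125 Hz glide) are hypothetical choices made for illustration only.

```python
import numpy as np


def synth_voiced(fs=8000, dur=0.5, f0_start=110.0, f0_end=125.0, n_harm=5):
    """Toy voiced segment: harmonics phase-locked to one gliding fundamental phase."""
    n = int(fs * dur)
    f0 = np.linspace(f0_start, f0_end, n)       # non-stationary fundamental frequency (Hz)
    phase = 2.0 * np.pi * np.cumsum(f0) / fs    # fundamental phase, the assumed "drive"
    envelope = np.hanning(n)                    # smooth onset and offset of the segment
    harmonics = [np.cos(k * phase) / k for k in range(1, n_harm + 1)]
    return f0, envelope * np.sum(harmonics, axis=0)


def part_tone(signal, fs, f_lo, f_hi):
    """Complex (analytic) band-limited component, obtained by masking the FFT."""
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(signal.size, d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)    # keeps positive frequencies in the band only
    return 2.0 * np.fft.ifft(spectrum * mask)


def f0_from_part_tone(tone, fs, harmonic):
    """Fundamental frequency track: part-tone phase divided by its harmonic number."""
    drive_phase = np.unwrap(np.angle(tone)) / harmonic
    return np.diff(drive_phase) * fs / (2.0 * np.pi)


def demo():
    """Reconstruct the fundamental frequency track from two different part-tones."""
    fs = 8000
    f0, signal = synth_voiced(fs=fs)
    # Hand-picked bands that isolate the 2nd and 3rd harmonic of the toy signal.
    est2 = f0_from_part_tone(part_tone(signal, fs, 190.0, 290.0), fs, harmonic=2)
    est3 = f0_from_part_tone(part_tone(signal, fs, 290.0, 410.0), fs, harmonic=3)
    core = slice(f0.size // 5, 4 * f0.size // 5)  # skip the faded-out edges
    return f0[1:][core], est2[core], est3[core]
```

Calling `demo()` returns the true fundamental frequency track together with the two estimates derived from the second and third harmonic. In this idealized setting both estimates should follow the same underlying drive, which is the property that a phase-based description of voiced speech relies on; real speech would additionally require the band selection to adapt to the signal.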


Editor information

Mohamed Chetouani, Amir Hussain, Bruno Gas, Maurice Milgram, Jean-Luc Zarader

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Drepper, F.R. (2007). Non-stationary Self-consistent Acoustic Objects as Atoms of Voiced Speech. In: Chetouani, M., Hussain, A., Gas, B., Milgram, M., Zarader, JL. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77347-4_16

  • DOI: https://doi.org/10.1007/978-3-540-77347-4_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77346-7

  • Online ISBN: 978-3-540-77347-4

  • eBook Packages: Computer Science (R0)
