Abstract
Speech signals are usually analyzed using the short-time Fourier transform (STFT). The most successful features currently used in both speech recognition and speaker recognition systems are cepstral features, which in one way or another are based on the source-filter model of speech production. This model assumes that the sound source for voiced speech is localized in the larynx and that the vocal tract acts as a convolution filter for the emitted sound. However, it is well known that a significant part of the acoustic information cannot be captured by the linear source-filter model. Examples of phenomena it does not capture well include unstable airflow, turbulence, and nonlinearities arising from oscillators with time-varying masses.
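The link between the source-filter model and cepstral features can be illustrated with a minimal NumPy sketch. It is not taken from the chapter: the sampling rate, pitch period, decaying pulse train, and single two-pole resonance are all illustrative assumptions standing in for a glottal source and a vocal-tract filter. The sketch convolves the two and checks that the convolution becomes an addition in the real cepstrum, which is the property cepstral features rely on.

```python
import numpy as np

# All numeric values below are illustrative assumptions, not from the chapter.
fs = 8000          # sampling rate in Hz
N = 1024           # FFT length, long enough to hold the full linear convolution
period = 80        # pitch period in samples (100 Hz at 8 kHz)

# Source: a decaying pulse train standing in for voiced excitation at the larynx.
n_pulses, decay = 8, 0.9
source = np.zeros((n_pulses - 1) * period + 1)
source[::period] = decay ** np.arange(n_pulses)

# Filter: truncated impulse response of one two-pole resonance near 700 Hz,
# a crude stand-in for the vocal tract.
r, omega = 0.95, 2 * np.pi * 700 / fs
k = np.arange(300)
vocal_tract = r ** k * np.sin((k + 1) * omega) / np.sin(omega)

# Linear source-filter model: speech = source convolved with the filter.
speech = np.convolve(source, vocal_tract)


def real_cepstrum(x, n_fft=N):
    """Inverse FFT of the log magnitude spectrum of x."""
    log_mag = np.log(np.abs(np.fft.rfft(x, n=n_fft)))
    return np.fft.irfft(log_mag, n=n_fft)


c_speech = real_cepstrum(speech)
c_source = real_cepstrum(source)
c_filter = real_cepstrum(vocal_tract)

# Convolution in time is addition in the cepstral domain.
additive = np.allclose(c_speech, c_source + c_filter, atol=1e-8)

# The source shows up as a peak at the pitch period (in samples of quefrency).
pitch_quefrency = 20 + int(np.argmax(c_speech[20:200]))
print(additive, pitch_quefrency)
```

In this toy setting the low-quefrency part of `c_speech` is dominated by the filter and the peak near the pitch period by the source. The nonlinear effects listed in the abstract (unstable airflow, turbulence, time-varying oscillators) are exactly what such a purely linear convolution cannot represent.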
© 2012 The Author(s)
About this chapter
Cite this chapter
Holambe, R.S., Deshpande, M.S. (2012). AM-FM: Modulation and Demodulation Techniques. In: Advances in Non-Linear Modeling for Speech Processing. SpringerBriefs in Electrical and Computer Engineering. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-1505-3_5
DOI: https://doi.org/10.1007/978-1-4614-1505-3_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-1504-6
Online ISBN: 978-1-4614-1505-3
eBook Packages: Engineering, Engineering (R0)