Abstract
In 1963, Bogert, Healy, and Tukey published a chapter with one of the most unusual titles to be found in the literature of science and engineering [9.1]. In this chapter, they observed that the logarithm of the power spectrum of a signal plus its echo (a delayed and scaled replica) consists of the logarithm of the signal spectrum plus a periodic component due to the echo. They suggested that further spectrum analysis of the log spectrum could highlight this periodic component and thus lead to a new indicator of the occurrence of an echo. Specifically, they made the following observation:
In general, we find ourselves operating on the frequency side in ways customary on the time side and vice versa.
As an aid in formalizing this new point of view, they introduced a number of paraphrased words. For example, they defined the cepstrum of a signal as the power spectrum of the logarithm of the power spectrum of that signal. (In fact, they used discrete-time spectrum estimates based on the discrete Fourier transform.) Similarly, the term quefrency was introduced for the independent variable of the cepstrum [9.1].
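The echo observation and the cepstrum definition above can be illustrated with a minimal sketch. It assumes NumPy, a white-noise test signal, and a single echo; the function name power_cepstrum and the values of n0, alpha, and nfft are illustrative choices and are not taken from [9.1]. For x[n] = s[n] + alpha s[n - n0], the log power spectrum equals the log power spectrum of s plus log(1 + alpha^2 + 2 alpha cos(omega n0)), which is periodic in frequency, so a second spectrum analysis should show a peak at quefrency n0.

```python
import numpy as np


def power_cepstrum(x, nfft):
    """Power spectrum of the log power spectrum, computed with the DFT.

    Illustrative helper following the verbal definition in the text.
    """
    power = np.abs(np.fft.fft(x, nfft)) ** 2
    # A tiny floor guards against log(0); it is an implementation
    # convenience, not part of the definition.
    log_power = np.log(power + 1e-12)
    # Remove the mean so that quefrency 0 does not dominate.
    log_power = log_power - log_power.mean()
    return np.abs(np.fft.fft(log_power)) ** 2


# Assumed test setup: white noise plus one delayed, scaled replica.
rng = np.random.default_rng(0)
n_samples, n0, alpha = 1024, 100, 0.5
s = rng.standard_normal(n_samples)

x = np.zeros(n_samples + n0)
x[:n_samples] += s            # direct signal
x[n0:] += alpha * s           # echo: delayed by n0 samples, scaled by alpha

nfft = 2048                   # at least len(x), so the whole echo is captured
cepstrum = power_cepstrum(x, nfft)

# With an nfft-point DFT, quefrency index q corresponds to a delay of
# q samples. Search the lower half, skipping quefrency 0.
peak_quefrency = 1 + int(np.argmax(cepstrum[1 : nfft // 2]))
print("largest cepstral peak at quefrency", peak_quefrency)
```

In this setup the ripple contributed by the echo is far larger than the fluctuations of the log spectrum of the noise, so the largest peak is expected at the echo delay. For speech, the log spectrum of the signal itself has structure, and the later sections of the chapter are the place to look for appropriate definitions and computational approaches.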
In this chapter we will explore why the cepstrum has emerged as a central concept in digital speech processing. We will start with definitions appropriate for discrete-time signal processing and develop some of the general properties and computational approaches for the cepstrum of speech. Using this basis, we will explore the many ways that the cepstrum has been used in speech processing applications.
Abbreviations
- ASR: automatic speech recognition
- CELP: code-excited linear prediction
- DCT: discrete cosine transform
- DFT: discrete Fourier transform
- DTFT: discrete-time Fourier transform
- DoD: Department of Defense
- FFT: fast Fourier transform
- FIR: finite impulse response
- IDTFT: inverse discrete-time Fourier transform
- LPC: linear prediction coefficients
- LPC: linear predictive coding
- MFCC: mel-filter cepstral coefficient
- VQ: vector quantization
References
B.P. Bogert, M.J.R. Healy, J.W. Tukey: The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking, Proc. of the Symposium on Time Series Analysis, ed. by M. Rosenblatt (Wiley, New York 1963)
R.W. Schafer: Echo removal by discrete generalized linear filtering (MIT, Cambridge 1968), Ph.D. dissertation
A.V. Oppenheim, R.W. Schafer, T.G. Stockham Jr.: Nonlinear filtering of multiplied and convolved signals, Proc. IEEE 56(8), 1264-1291 (1968)
A.V. Oppenheim, R.W. Schafer, J.R. Buck: Discrete-Time Signal Processing (Prentice-Hall, Upper Saddle River 1999)
A.V. Oppenheim: Superposition in a Class of Nonlinear Systems (MIT, Cambridge 1964), Ph.D. dissertation, Also: MIT Research Lab. of Electronics, Cambridge, Massachusetts, Technical Report 432
J.M. Tribolet: A new phase unwrapping algorithm, IEEE Trans. Acoust. Speech ASSP-25(2), 170-177 (1977)
G.A. Sitton, C.S. Burrus, J.W. Fox, S. Treitel: Factoring very-high-degree polynomials, IEEE Signal Proc. Mag. 20(6), 27-42 (2003)
L.R. Rabiner, R.W. Schafer: Digital Processing of Speech Signals (Prentice-Hall, Englewood Cliffs 1978)
A.V. Oppenheim, R.W. Schafer: Homomorphic analysis of speech, IEEE Trans. Audio Electroacoust. AU-16, 221-228 (1968)
G.E. Kopec, A.V. Oppenheim, J.M. Tribolet: Speech analysis by homomorphic prediction, IEEE Trans. Acoust. Speech ASSP-25(1), 40-49 (1977)
A.M. Noll: Cepstrum pitch determination, J. Acoust. Soc. Am. 41(2), 293-309 (1967)
B.S. Atal, S.L. Hanauer: Speech analysis and synthesis by linear prediction of the speech wave, J. Acoust. Soc. Am. 50, 561-580 (1971)
A.V. Oppenheim: A speech analysis-synthesis system based on homomorphic filtering, J. Acoust. Soc. Am. 45(2), 458-465 (1969)
R.W. Schafer, L.R. Rabiner: System for automatic formant analysis of voiced speech, J. Acoust. Soc. Am. 47(2), 634-648 (1970)
B.S. Atal, J. Remde: A new model of LPC excitation for producing natural-sounding speech at low bit rates, Proc. IEEE ICASSP (1982), 614-617
M.R. Schroeder, B.S. Atal: Code-excited linear prediction (CELP): high-quality speech at very low bit rates, Proc. IEEE ICASSP (1985), 937-940
R.C. Rose, T.P. Barnwell III: The self excited vocoder - an alternate approach to toll quality at 4800 bps, Proc. IEEE ICASSP 11, 453-456 (1986)
J.H. Chung, R.W. Schafer: Excitation modeling in a homomorphic vocoder, Proc. IEEE ICASSP 1, 25-28 (1990)
J.H. Chung, R.W. Schafer: Performance evaluation of analysis-by-synthesis homomorphic vocoders, Proc. IEEE ICASSP 2, 117-120 (1992)
B.S. Atal, M.R. Schroeder: Predictive coding of speech signals and subjective error criterion, IEEE Trans. Acoust. Speech ASSP-27, 247-254 (1979)
T.G. Stockham Jr., T.M. Cannon, R.B. Ingebretsen: Blind deconvolution through digital signal processing, Proc. IEEE 63, 678-692 (1975)
S. Furui: Cepstral analysis technique for automatic speaker verification, IEEE Trans. Acoust. Speech ASSP-29(2), 254-272 (1981)
Y. Tohkura: A weighted cepstral distance measure for speech recognition, IEEE Trans. Acoust. Speech ASSP-35(10), 1414-1422 (1987)
B.-H. Juang, L.R. Rabiner, J.G. Wilpon: On the use of bandpass liftering in speech recognition, IEEE Trans. Acoust. Speech ASSP-35(7), 947-954 (1987)
F. Itakura, T. Umezaki: Distance measure for speech recognition based on the smoothed group delay spectrum, Proc. IEEE ICASSP 12, 1257-1260 (1987)
S.B. Davis, P. Mermelstein: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Trans. Acoust. Speech ASSP-28(4), 357-366 (1980)
P.D. Smith, M. Kucic, R. Ellis, P. Hasler, D.V. Anderson: Mel-frequency cepstrum encoding in analog floating-gate circuitry, Proc. ISCAS 2002(4), 671-674 (2002)
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Schafer, R.W. (2008). Homomorphic Systems and Cepstrum Analysis of Speech. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_9
DOI: https://doi.org/10.1007/978-3-540-49127-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)