Signal Processing and Feature Extraction
The initial stages of speech processing, discussed in Chapter 4, are commonly performed using a short-time Fourier transform (STFT) of the digitally sampled acoustic time series. Several STFT-based representations have been employed for automatic speech recognition (ASR), including linear-scale, logarithmic-scale, and logarithmic mel-scale spectra, as well as cepstral and differenced-cepstral coefficients. However, recent investigations of mammalian auditory processing have determined that the cochlea is a time-domain analyzer, and that the STFT is not always the most appropriate method of signal analysis. This chapter therefore reviews the properties and behavior of cochlear models and their importance to ASR, emphasizing the benefits gained from better models of “early” signal processing in mammals. A discussion of artificial neural network applications to conventional signal processing problems follows. The remainder of the chapter discusses how low-level “feature maps” may be created and used in ASR applications.
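To make the conventional STFT-based representations listed above concrete, here is a minimal pure-Python sketch, not taken from the chapter, of a windowed STFT followed by linear-magnitude, log-magnitude, real-cepstrum and first-difference ("delta") features. The frame length, hop size, Hamming window, natural logarithm, magnitude floor, test signal and the function names (`stft`, `real_cepstrum`, `delta`) are illustrative assumptions. Mel-scale warping is omitted for brevity, and a naive O(N²) DFT stands in for an FFT so the example needs no external libraries.

```python
import cmath
import math

# Illustrative sketch only: frame length, hop, Hamming window and log floor
# are assumed parameters, not values taken from the chapter.


def frame_signal(x, frame_len, hop):
    """Split a 1-D sequence into overlapping frames, dropping any partial tail."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def stft(x, frame_len=64, hop=32):
    """Short-time Fourier transform: Hamming-window each frame, then take its DFT."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    return [dft([s * w for s, w in zip(f, window)])
            for f in frame_signal(x, frame_len, hop)]


def log_magnitude(spectrum, floor=1e-10):
    """Logarithmic-scale magnitude spectrum (natural log, floored to avoid log(0))."""
    return [math.log(max(abs(c), floor)) for c in spectrum]


def real_cepstrum(spectrum):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum.

    For a real input frame the log magnitude is symmetric in k, so the
    inverse DFT reduces to a cosine sum and the result is real.
    """
    logmag = log_magnitude(spectrum)
    n = len(logmag)
    return [sum(logmag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
            for q in range(n)]


def delta(features):
    """First difference between consecutive frames (a simple differenced-cepstral form)."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(features, features[1:])]


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    x = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]  # 1 kHz test tone
    frames = stft(x, frame_len=64, hop=32)
    linear = [[abs(c) for c in f] for f in frames]  # linear-magnitude spectra
    ceps = [real_cepstrum(f)[:13] for f in frames]  # keep 13 low-order coefficients
    dceps = delta(ceps)                             # differenced cepstra
    print(len(frames), len(linear[0]), len(ceps[0]), len(dceps))  # prints: 7 64 13 6
```

With a 64-sample frame at the assumed 8 kHz rate, each DFT bin spans 125 Hz, so the 1 kHz tone peaks in bin 8. Truncating the cepstrum to its low-order coefficients retains the smooth spectral envelope; keeping 13 is only a common convention, not a value from the chapter.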
Keywords: Hidden Layer, Hair Cell, Auditory Nerve, Firing Pattern, Outer Hair Cell