Abstract
The paper considers the frequency-domain transformations of speech that precede extraction of the informative attributes of phonemes. A method of processing the speech spectrum is proposed that keeps recognition stable in the presence of frequency distortions and additive noise. It is based on linear bandpass filtering of the logarithmic amplitude spectrum followed by a nonlinear transformation that models the effect of lateral inhibition in the auditory analyzer.
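The two stages named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the filter or the nonlinearity, so everything concrete here is an assumption made for illustration: the bandpass filter is taken to be a difference of two Gaussian smoothers applied along the frequency axis, the nonlinear stage is taken to be half-wave rectification as a crude stand-in for lateral inhibition, and the names `gaussian_kernel`, `smooth`, `preprocess`, `narrow_sigma`, `wide_sigma`, and `floor` are hypothetical. It is not the author's algorithm.

```python
import math


def gaussian_kernel(sigma, radius):
    """Unit-sum Gaussian kernel of length 2 * radius + 1 (illustrative helper)."""
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, kernel):
    """Convolve along the frequency axis, replicating the edge values."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp to valid bins
            acc += w * values[idx]
        out.append(acc)
    return out


def preprocess(amplitude, narrow_sigma=1.0, wide_sigma=4.0, floor=1e-10):
    """Log spectrum -> bandpass along frequency -> half-wave rectification.

    ASSUMPTION: difference-of-Gaussians bandpass and rectification are
    stand-ins chosen for this sketch; the source gives no formulas.
    """
    # Logarithmic amplitude spectrum; the floor avoids log(0).
    log_spec = [math.log(max(a, floor)) for a in amplitude]
    radius = int(3 * wide_sigma)
    fine = smooth(log_spec, gaussian_kernel(narrow_sigma, radius))
    coarse = smooth(log_spec, gaussian_kernel(wide_sigma, radius))
    # Bandpass: removes the slowly varying trend and the finest ripple.
    band = [f - c for f, c in zip(fine, coarse)]
    # Nonlinear stage: keep only the parts that stand out above the
    # local surround (assumed model of lateral inhibition).
    return [max(b, 0.0) for b in band]
```

Because both kernels sum to one, a flat gain applied to the whole spectrum adds the same constant to both smoothed log spectra and cancels in their difference. In this sketch that is the simplest case of insensitivity to a frequency distortion; smooth spectral tilts are only attenuated, not removed exactly.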
Cite this article
Kolokolov, A.S. Signal Preprocessing for Speech Recognition. Automation and Remote Control 63, 494–501 (2002). https://doi.org/10.1023/A:1014714820229