Signal Preprocessing for Speech Recognition

Automation and Remote Control

Abstract

The paper considers the frequency-domain transformations of the speech signal that precede the extraction of informative phoneme features. A method of processing the speech spectrum is proposed that keeps recognition stable in the presence of frequency distortions and additive noise. It is based on linear bandpass filtering of the logarithmic amplitude spectrum, followed by a nonlinear transformation that models the effect of lateral inhibition in the auditory analyzer.
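The two stages named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the filter or the nonlinearity, so everything concrete here is an assumption made for illustration: the difference-of-Gaussians kernel (a zero-sum, hence bandpass, filter applied along the frequency axis), the half-wave rectification standing in for lateral inhibition, and the names dog_kernel, preprocess, narrow and wide. It is not the author's algorithm.

```python
import numpy as np

def dog_kernel(narrow=2.0, wide=8.0):
    """Hypothetical difference-of-Gaussians kernel (widths in FFT bins).

    Both Gaussians are normalized to unit sum, so the kernel sums to zero
    and rejects the constant component of its input (bandpass behavior).
    """
    half = int(4 * wide)
    k = np.arange(-half, half + 1)
    g_narrow = np.exp(-0.5 * (k / narrow) ** 2)
    g_wide = np.exp(-0.5 * (k / wide) ** 2)
    return g_narrow / g_narrow.sum() - g_wide / g_wide.sum()

def preprocess(frame, narrow=2.0, wide=8.0, eps=1e-10):
    """Log amplitude spectrum -> linear bandpass filter -> rectification."""
    windowed = frame * np.hanning(len(frame))
    log_spec = np.log(np.abs(np.fft.rfft(windowed)) + eps)

    # Linear stage: filter the log spectrum along the frequency axis.
    # Reflect-padding avoids artificial edges at the ends of the spectrum.
    kernel = dog_kernel(narrow, wide)
    half = len(kernel) // 2
    padded = np.pad(log_spec, half, mode="reflect")
    filtered = np.convolve(padded, kernel, mode="valid")

    # Nonlinear stage (assumed): keep only the parts that exceed their
    # local surround, a crude stand-in for lateral inhibition.
    return np.maximum(filtered, 0.0)

# Synthetic test frame: two tones plus weak noise, sampled at 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(512) / 8000.0
frame = (np.sin(2 * np.pi * 500 * t)
         + 0.5 * np.sin(2 * np.pi * 1500 * t)
         + 0.01 * rng.standard_normal(512))
features = preprocess(frame)
features_scaled = preprocess(3.0 * frame)
```

In this sketch, scaling the input by a constant adds a constant to the logarithmic spectrum, and the zero-sum kernel removes it, so features and features_scaled agree to within rounding. This covers only a frequency-independent gain change; how well slowly varying frequency distortions are suppressed depends on the kernel widths, which are arbitrary here.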

REFERENCES

  1. Picone, J.W., Signal Modeling Techniques in Speech Recognition, Proc. IEEE, 1993, vol. 81, no. 9, pp. 1215–1247.

  2. Fant, G., Acoustic Theory of Speech Production, The Hague: Mouton, 1960. Translated under the title Akusticheskaya teoriya recheobrazovaniya, Moscow: Nauka, 1964.

  3. Flanagan, J.L., Speech Analysis, Synthesis and Perception, Berlin: Springer, 1965. Translated under the title Analiz, sintez i vospriyatie rechi, Moscow: Svyaz', 1968.

  4. Stevens, K.N., Acoustic Correlates of Some Phonetic Categories, J. Acoust. Soc. Am., 1980, vol. 68, no. 3, pp. 836–842.

  5. Chistovich, L.A., Ventsov, A.V., Granstrem, M.P., et al., Physiology of Speech. Human Perception, in Rukovodstvo po fiziologii (Manual on Physiology), Leningrad: Nauka, 1976.

  6. Zwicker, E. and Terhardt, E., Analytical Expressions for Critical-Band Rate and Critical Bandwidth as a Function of Frequency, J. Acoust. Soc. Am., 1980, vol. 68, no. 5, pp. 1523–1525.

  7. Traunmüller, H., Analytical Expressions for the Tonotopic Sensory Scale, J. Acoust. Soc. Am., 1990, vol. 88, no. 1, pp. 97–100.

  8. Varshavskii, L.A. and Chistovich, L.A., Mean Spectra of the Russian Vowel Phoneme, Probl. Fiziol. Akust., 1959, vol. IV, pp. 181–186.

  9. Pirogov, A.A., On Phonetic Speech Coding, Elektrosvyaz', 1967, no. 5, pp. 24–31.

  10. Kolokolov, A.S. and Yakhno, V.P., Speaker-Independent Recognition of Isolated Voice Commands on the Basis of Auditory Models, Avtom. Telemekh., 1995, no. 8, pp. 150–157.

  11. Sachs, M.B. and Kiang, N.Y.S., Two-Tone Inhibition in Auditory Nerve Fibers, J. Acoust. Soc. Am., 1968, vol. 43, pp. 1120–1128.

  12. Lyubinskii, I.A., Pozin, N.V., and Yakhno, V.P., Analysis of the Models of Uniform Neural Layer with Lateral Connections, Avtom. Telemekh., 1967, no. 10, pp. 168–181.

  13. Kolokolov, A.S., Ob odnom metode analiza periodicheskikh signalov, iskazhennykh additivnym shumom (On a Method of Analysis of Periodic Signals in Additive Noise), Available from VINITI, 1983, Moscow, no. 6253–83.

  14. Kolokolov, A.S., Lyubinskii, I.A., and Yakhno, V.P., Improving the Signal-to-Noise Ratio by Nonlinear Filtering of the Amplitude Spectrum, 14th All-Union Symp. on Hydroacoustics, Minsk, 1986, pp. 107–109.

  15. Childers, D.G., Skinner, D.P., and Kemerait, R.C., The Cepstrum: A Guide to Processing, Proc. IEEE, 1977, vol. 65, no. 10, pp. 1428–1443.

  16. Juang, B.H., Rabiner, L.R., and Wilpon, J.G., On the Use of Bandpass Liftering in Speech Recognition, IEEE Trans. Acoust., Speech, Signal Proc., 1987, vol. 35, no. 7, pp. 947–954.

Cite this article

Kolokolov, A.S. Signal Preprocessing for Speech Recognition. Automation and Remote Control 63, 494–501 (2002). https://doi.org/10.1023/A:1014714820229

  • DOI: https://doi.org/10.1023/A:1014714820229
