Graphic Representation Method and Neural Network Recognition of Time-Frequency Vectors of Speech Information

Programming and Computer Software

Abstract

Time-frequency representations are widely used for sound analysis. On the one hand, they are convenient for visual perception of sound by a human; on the other hand, they can be analyzed automatically as sound images. This paper discusses various methods for representing sound as two-dimensional time-frequency vectors of a fixed dimension and their use in speech and speaker recognition problems. Probabilistic, distance-based, and neural-network methods for recognizing these vectors are considered, using isolated words as examples. Numerical experiments showed that the best among them is the method based on a three-layer neural network, the short-time Fourier transform, and the two-dimensional wavelet transform. For the speaker recognition problem, a distance-based recognition method employing the adaptive Hermite transform turned out to be the best.
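To make the idea of a fixed-dimension time-frequency vector concrete, the following minimal Python sketch turns signals of different lengths into vectors of the same size and compares them by Euclidean distance. It is an illustration under stated assumptions, not the authors' method: the frame length, hop size, 8 x 8 grid, cell averaging, synthetic test tones, and all function names are hypothetical choices, and the wavelet, Hermite, and neural-network stages mentioned in the abstract are not shown.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram from a naive DFT of Hann-windowed frames.

    Assumes len(signal) >= frame_len so that at least one frame exists.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames  # indexed as frames[time][frequency]


def fixed_size_vector(spec, n_time=8, n_freq=8):
    """Average the spectrogram over an n_time x n_freq grid and flatten it."""
    rows, cols = len(spec), len(spec[0])
    vector = []
    for i in range(n_time):
        r0 = i * rows // n_time
        r1 = max((i + 1) * rows // n_time, r0 + 1)
        for j in range(n_freq):
            c0 = j * cols // n_freq
            c1 = max((j + 1) * cols // n_freq, c0 + 1)
            cell = [spec[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            vector.append(sum(cell) / len(cell))
    return vector  # always n_time * n_freq values


def nearest_label(vector, templates):
    """Distance-based recognition: label of the closest stored template."""
    return min(templates, key=lambda label: math.dist(vector, templates[label]))


def tone(freq_hz, n_samples, rate=8000):
    """Synthetic stand-in for a recorded word (hypothetical test data)."""
    return [math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n_samples)]


def describe(signal):
    return fixed_size_vector(stft_magnitude(signal))


# Templates and query have different durations but equal vector sizes.
templates = {"low": describe(tone(500, 800)), "high": describe(tone(2000, 800))}
query = describe(tone(520, 1200))
print(len(query), nearest_label(query, templates))
```

Averaging over a fixed grid is only one simple way to normalize duration; it discards fine temporal detail, which is part of why a real system would follow it with a stronger transform or classifier.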

Cite this article

Zhirkov, A.O., Kortchagine, D.N., Lukin, A.S. et al. Graphic Representation Method and Neural Network Recognition of Time-Frequency Vectors of Speech Information. Programming and Computer Software 29, 210–218 (2003). https://doi.org/10.1023/A:1024970609361

  • DOI: https://doi.org/10.1023/A:1024970609361
