Graphic Representation Method and Neural Network Recognition of Time-Frequency Vectors of Speech Information

Zhirkov, A. O.; Kortchagine, D. N.; Lukin, A. S.; Krylov, A. S.; Bayakovskii, Yu. M.

doi:10.1023/A:1024970609361

Published: July 2003

Programming and Computer Software, Volume 29, pages 210–218 (2003)

Abstract

Time-frequency representations are now widely used for sound analysis. On the one hand, they make sound convenient for a human to inspect visually; on the other hand, they can be used for automatic analysis of sound images. This paper discusses several methods for representing sound as two-dimensional time-frequency vectors of a fixed dimension and their use in speech and speaker recognition problems. Probabilistic, distance-based, and neural-network methods for recognizing these vectors are considered, using separate words as examples. In numerical experiments, the best of these methods was the one based on a three-layer neural network, the short-time Fourier transform, and the two-dimensional wavelet transform. For the speaker recognition problem, a distance-based recognition method employing the adaptive Hermite transform turned out to be the best.
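To make the idea of a fixed-dimension time-frequency vector concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract gives no parameters, so the frame length, hop size, Hann window, and 16 x 16 grid are all assumptions, and simple cell averaging stands in for the two-dimensional wavelet and Hermite transforms named above. The function names `stft_magnitude` and `fixed_size_vector` are hypothetical.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude of a short-time Fourier transform using a Hann window.

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One row per frame, one column per frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))

def fixed_size_vector(spec, n_time=16, n_freq=16):
    """Average the spectrogram over an n_time x n_freq grid and flatten it.

    Cell averaging is a stand-in for the paper's transforms (assumption).
    Requires at least n_time frames and n_freq frequency bins.
    """
    t_edges = np.linspace(0, spec.shape[0], n_time + 1).astype(int)
    f_edges = np.linspace(0, spec.shape[1], n_freq + 1).astype(int)
    grid = np.empty((n_time, n_freq))
    for i in range(n_time):
        for j in range(n_freq):
            cell = spec[t_edges[i]:t_edges[i + 1], f_edges[j]:f_edges[j + 1]]
            grid[i, j] = cell.mean()
    return grid.ravel()

# Two synthetic "utterances" of different length map to vectors of equal size.
sample_rate = 8000  # Hz, assumed for the example
short = np.sin(2 * np.pi * 1000 * np.arange(sample_rate) / sample_rate)
long_ = np.sin(2 * np.pi * 1000 * np.arange(2 * sample_rate) / sample_rate)
vec_short = fixed_size_vector(stft_magnitude(short))
vec_long = fixed_size_vector(stft_magnitude(long_))
print(vec_short.shape, vec_long.shape)
```

Because every recording is reduced to the same number of components regardless of its duration, such vectors can be compared with a distance measure or fed to a network with a fixed input layer, which is the setting the abstract describes.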

Author information

Authors and Affiliations

Department of Computational Mathematics and Cybernetics, Moscow State University, Vorob'evy gory, Moscow, 119992, Russia
A. O. Zhirkov, D. N. Kortchagine, A. S. Lukin, A. S. Krylov & Yu. M. Bayakovskii

About this article

Cite this article

Zhirkov, A.O., Kortchagine, D.N., Lukin, A.S. et al. Graphic Representation Method and Neural Network Recognition of Time-Frequency Vectors of Speech Information. Programming and Computer Software 29, 210–218 (2003). https://doi.org/10.1023/A:1024970609361

Issue Date: July 2003
DOI: https://doi.org/10.1023/A:1024970609361
