Abstract
Time-frequency representations are now widely used for sound analysis. They are convenient for a human to perceive sound visually, and they can also be used for automatic analysis of sound images. This paper discusses several methods for representing sound as two-dimensional time-frequency vectors of a fixed dimension and their use in speech and speaker recognition problems. Probabilistic, distance-based, and neural-network methods for recognizing these vectors are considered, using isolated words as examples. In numerical experiments, the best of these was the method based on a three-layer neural network, the short-time Fourier transform, and the two-dimensional wavelet transform. For the speaker recognition problem, a distance-based recognition method employing the adaptive Hermite transform performed best.
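To make the idea of a fixed-dimension time-frequency vector concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Hann-windowed short-time Fourier transform, reduces the spectrogram to a fixed grid by plain block averaging (a simple stand-in for the wavelet and Hermite transforms named above), and classifies by nearest template in Euclidean distance. All function names, frame sizes, and the synthetic tones are hypothetical choices made for illustration.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram from a Hann-windowed naive DFT (one row per frame)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def fixed_size_vector(spectrogram, n_time=8, n_freq=8):
    """Average the spectrogram over an n_time x n_freq grid and flatten it.

    Assumption: block averaging replaces the transforms used in the paper.
    """
    rows, cols = len(spectrogram), len(spectrogram[0])
    vector = []
    for i in range(n_time):
        r0 = i * rows // n_time
        r1 = max((i + 1) * rows // n_time, r0 + 1)
        for j in range(n_freq):
            c0 = j * cols // n_freq
            c1 = max((j + 1) * cols // n_freq, c0 + 1)
            cells = [spectrogram[r][c]
                     for r in range(r0, r1) for c in range(c0, c1)]
            vector.append(sum(cells) / len(cells))
    return vector


def nearest_label(vector, templates):
    """Return the label of the template closest in Euclidean distance."""
    return min(templates, key=lambda label: math.dist(vector, templates[label]))


def tone(freq_hz, n_samples=1024, rate=8000):
    """Synthetic sine wave used here in place of recorded speech."""
    return [math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n_samples)]


def tone_vector(freq_hz, n_samples=1024):
    return fixed_size_vector(stft_magnitude(tone(freq_hz, n_samples)))


templates = {"low": tone_vector(500), "high": tone_vector(2500)}
# A shorter signal still maps to a vector of the same dimension.
query = tone_vector(2600, n_samples=700)
print(len(query), nearest_label(query, templates))
```

Because every signal is reduced to the same grid, inputs of different duration become directly comparable, which is what lets a distance measure or a fixed-input neural network operate on them.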
Cite this article
Zhirkov, A.O., Kortchagine, D.N., Lukin, A.S. et al. Graphic Representation Method and Neural Network Recognition of Time-Frequency Vectors of Speech Information. Programming and Computer Software 29, 210–218 (2003). https://doi.org/10.1023/A:1024970609361
DOI: https://doi.org/10.1023/A:1024970609361