Abstract
In this work, we propose the Short-MelFrequencyCepstra Time Transform (SMCTT), c_τ(t), as a source of speech information. The SMCTT describes the time behaviour of the speech signal at quefrency τ. Because the SMCTT signal, c_τ(t), is obtained from a nonlinear transformation of the speech signal, s(t), it may exhibit new properties in time, frequency, quefrency, and other domains. The goal of this work is to evaluate the SMCTT signal when it is applied to an Automatic Speech Recognition (ASR) task. Our experimental results show that the SMCTT waveform, c_τ(t), carries important information.
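The abstract does not give the exact definition of the SMCTT, so the following is only a minimal sketch under one stated assumption: that c_τ(t) is the frame-by-frame trajectory of the mel-frequency cepstral coefficient at a fixed quefrency index τ. The function names (`mel_filterbank`, `mel_cepstra`, `smctt`) and all parameter values (25 ms frames, 10 ms hop, 26 filters, 13 coefficients) are illustrative choices, not taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (hypothetical helper)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):       # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mel_cepstra(signal, sample_rate, frame_len=400, hop=160,
                n_fft=512, n_filters=26, n_ceps=13):
    """Return an array of shape (n_frames, n_ceps): one mel cepstrum per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2
    # The logarithm is the nonlinear step between s(t) and the cepstrum.
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # Unnormalised DCT-II over the filterbank axis gives the cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T

def smctt(cepstra, tau):
    """Assumed reading of c_tau(t): the time series of coefficient tau across frames."""
    return cepstra[:, tau]

# Usage on a synthetic one-second signal sampled at 16 kHz.
rng = np.random.default_rng(0)
s = rng.standard_normal(16000)
c = mel_cepstra(s, 16000)
c_tau = smctt(c, tau=3)   # one waveform per quefrency index
```

Under this reading, each quefrency index yields its own slowly sampled waveform (one value per frame), which could in turn be analysed in time or frequency like any other signal.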
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nolazco-Flores, J.A. (2005). ASR Based on the Analasys of the Short-MelFrequencyCepstra Time Transform. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science, vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_88
DOI: https://doi.org/10.1007/11579427_88
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29896-0
Online ISBN: 978-3-540-31653-4
eBook Packages: Computer Science, Computer Science (R0)