ASR Based on the Analasys of the Short-MelFrequencyCepstra Time Transform

Nolazco-Flores, Juan Arturo

doi:10.1007/11579427_88

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3789)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

In this work, we propose using the Short-MelFrequencyCepstra Time Transform (SMCTT), c_τ(t), as a source of speech information. The SMCTT captures the time properties of the mel-frequency cepstrum at quefrency τ. Because the SMCTT signal c_τ(t) is obtained from a nonlinear transformation of the speech signal s(t), it may exhibit new properties in time, frequency, quefrency, and other domains. The goal of this work is to evaluate the SMCTT signal on an Automatic Speech Recognition (ASR) task. Our experimental results show that the SMCTT waveform c_τ(t) carries important information.
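The abstract does not give the exact definition of the SMCTT, so the following is only a minimal sketch under a stated assumption: c_τ(t) is taken to be the sequence, over frame index t, of the τ-th mel-frequency cepstral coefficient of the speech signal s(t). The function names (`mel_cepstra`, `smctt`) and all parameter values (25 ms frames, 10 ms hop, 26 filters, 13 cepstra at 16 kHz) are illustrative choices, not taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i, k] = (right - k) / (right - center)
    return fbank

def mel_cepstra(signal, sample_rate=16000, frame_len=400, hop=160,
                n_fft=512, n_filters=26, n_ceps=13):
    """Return mel-frequency cepstra with shape (n_frames, n_ceps)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2 / n_fft
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    # The logarithm is the nonlinear step; the small offset avoids log(0).
    log_energy = np.log(power @ fbank.T + 1e-10)
    # Unnormalized DCT-II maps log filterbank energies to the quefrency domain.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2.0 * n_filters))
    return log_energy @ basis.T

def smctt(signal, tau, **kwargs):
    """Assumed SMCTT: trajectory over frames of cepstral coefficient `tau`."""
    return mel_cepstra(signal, **kwargs)[:, tau]

# Usage on a synthetic signal (white noise standing in for speech).
rng = np.random.default_rng(0)
s = rng.standard_normal(16000)   # one second at 16 kHz
c_tau = smctt(s, tau=1)          # one value of c_tau(t) per 10 ms frame
```

Under this reading, each quefrency index τ yields its own one-dimensional waveform sampled at the frame rate, which could then be analyzed like any other time signal.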

Author information

Authors and Affiliations

Computer Science Department, ITESM, Campus Monterrey, Av. Eugenio Garza Sada 2501 Sur, Col. Tecnológico, Monterrey, N.L., C.P. 64849, México
Juan Arturo Nolazco-Flores

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh
Tecnológico de Monterrey (ITESM), Campus Ciudad de México (CCM), Calle del Puente 222, Col. Ejidos de Huipulco, 14360 DF, Tlalpan, Mexico
Álvaro de Albornoz
Center for Intelligent Systems, Tecnológico de Monterrey, Campus Monterrey, 64849, Monterrey, N.L., Mexico
Hugo Terashima-Marín

About this paper

Cite this paper

Nolazco-Flores, J.A. (2005). ASR Based on the Analasys of the Short-MelFrequencyCepstra Time Transform. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science, vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_88

DOI: https://doi.org/10.1007/11579427_88
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29896-0
Online ISBN: 978-3-540-31653-4
eBook Packages: Computer Science, Computer Science (R0)
