ASR Based on the Analysis of the Short-MelFrequencyCepstra Time Transform

  • Conference paper
MICAI 2005: Advances in Artificial Intelligence (MICAI 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3789)

Abstract

In this work, we propose to use the Short-MelFrequencyCepstra Time Transform (SMCTT), c_τ(t), as a source of speech information. The SMCTT studies the time properties of the speech signal at quefrency τ. Since the SMCTT signal c_τ(t) comes from a nonlinear transformation of the speech signal s(t), it is a signal with potentially new properties in time, frequency, quefrency, etc. The goal of this work is to present the performance of the SMCTT signal when it is applied to an Automatic Speech Recognition (ASR) task. Our experimental results show that this SMCTT waveform, c_τ(t), carries important information.
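
To make the construction concrete, the following is a minimal sketch of one natural reading of the abstract: compute short-time mel-frequency cepstra frame by frame, then take the trajectory of the coefficient at a fixed quefrency τ as a time signal c_τ(t). The librosa toolkit, the file name, the 16 kHz / 25 ms / 10 ms analysis settings, and the choice τ = 2 are all illustrative assumptions; the abstract does not specify the author's parameters.

    import librosa

    # Load the speech signal s(t). The file name and 16 kHz sample rate
    # are illustrative assumptions, not parameters from the paper.
    s, sr = librosa.load("speech.wav", sr=16000)

    # Short-time mel-frequency cepstra: one 13-dimensional MFCC vector
    # per 25 ms frame (n_fft=400), hopped every 10 ms (hop_length=160).
    # Rows index quefrency (cepstral coefficient); columns index frames.
    C = librosa.feature.mfcc(y=s, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)

    # SMCTT at quefrency tau: the trajectory of the tau-th cepstral
    # coefficient across frames, treated as a time signal c_tau(t) in
    # its own right. tau = 2 is a hypothetical choice.
    tau = 2
    c_tau = C[tau, :]  # shape: (n_frames,)

The resulting c_τ(t) is itself a waveform, so it can in turn be analyzed or parameterized like any other signal and fed to a conventional ASR front end, which appears to be the experiment the abstract reports.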

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nolazco-Flores, J.A. (2005). ASR Based on the Analysis of the Short-MelFrequencyCepstra Time Transform. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science, vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_88

  • DOI: https://doi.org/10.1007/11579427_88

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29896-0

  • Online ISBN: 978-3-540-31653-4

  • eBook Packages: Computer Science (R0)
