Spoken Digit Recognition in Portuguese Using Line Spectral Frequencies

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7637)

Abstract

Recognition of isolated spoken digits is the core procedure for a large number of important applications, mainly telephone-based services such as dialing, airline reservation, bank transactions and price quotation, that rely on speech alone. Spoken digit recognition is generally a challenging task, since the signals last for a short period of time and some digits are often acoustically very similar to each other. The objective of this paper is to investigate the use of machine learning algorithms for digit recognition. We focus on the recognition of digits spoken in Portuguese; however, our techniques are applicable to any language. We believe that the most important task for successfully recognizing spoken digits is attribute extraction. Audio data is composed of a huge number of very weak features, and most machine learning algorithms will not be able to build accurate classifiers from them. We show that Line Spectral Frequencies (LSF) provide a set of highly predictive coefficients for digit recognition. The results are superior to those obtained with state-of-the-art methods that use Mel-Frequency Cepstrum Coefficients (MFCC) for digit recognition. In particular, we show that the choice of the right attribute extraction method is more important than the specific classification paradigm, and that the right combination of classifier and attributes can provide almost perfect accuracy.
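To make the attribute extraction step concrete, the following is a minimal sketch of how LSF can be derived from one audio frame. It is not the authors' implementation: the abstract does not state a frame length, prediction order, or root-finding method, so the function names `lpc` and `lsf`, the `grid` and `iters` parameters, and the grid-plus-bisection search are all assumptions made for illustration. The sketch computes linear prediction coefficients with the autocorrelation method and the Levinson-Durbin recursion, forms the symmetric and antisymmetric polynomials P(z) and Q(z), and locates their unit-circle roots in the open interval (0, π).

```python
import math


def lpc(frame, order):
    """Return [1, a1, ..., ap] via the autocorrelation method and Levinson-Durbin."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        refl = -acc / err  # reflection coefficient for stage m
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + refl * prev[m - k]
        a[m] = refl
        err *= 1.0 - refl * refl
    return a


def _roots_in_open_interval(f, grid, iters):
    """Find sign changes of f on a grid inside (0, pi) and refine them by bisection."""
    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]
    found = []
    for lo, hi in zip(ws, ws[1:]):
        f_lo = f(lo)
        if f_lo * f(hi) >= 0.0:
            continue  # no sign change in this cell
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            f_mid = f(mid)
            if f_lo * f_mid <= 0.0:
                hi = mid
            else:
                lo, f_lo = mid, f_mid
        found.append(0.5 * (lo + hi))
    return found


def lsf(a, grid=2048, iters=60):
    """Line spectral frequencies (radians, ascending) of A(z) = sum a[k] z^-k, a[0] = 1."""
    n = len(a)  # degree of P(z) and Q(z), i.e. order + 1
    ext = list(a) + [0.0]
    sym = [ext[k] + ext[n - k] for k in range(n + 1)]   # P(z), palindromic
    asym = [ext[k] - ext[n - k] for k in range(n + 1)]  # Q(z), antipalindromic

    # On the unit circle, P and Q reduce to real cosine and sine series
    # once the common linear-phase factor is removed.
    def p_real(w):
        return sum(c * math.cos((n / 2 - k) * w) for k, c in enumerate(sym))

    def q_real(w):
        return sum(c * math.sin((n / 2 - k) * w) for k, c in enumerate(asym))

    return sorted(_roots_in_open_interval(p_real, grid, iters)
                  + _roots_in_open_interval(q_real, grid, iters))


# Second-order example with poles at 0.5 and 0.4 (a stable filter).
print(lsf([1.0, -0.9, 0.2]))
```

For the second-order example, the two frequencies can be checked by hand: factoring P(z) and Q(z) gives arccos(0.85) and arccos(0.05). In a recognition pipeline one would typically call `lsf(lpc(frame, order))` per frame and pass the resulting vectors to a classifier. The fixed grid can miss roots that lie closer together than π/`grid`, so a production system would more likely rely on a tested signal processing library.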

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Silva, D.F., de Souza, V.M.A., Batista, G.E.A.P.A., Giusti, R. (2012). Spoken Digit Recognition in Portuguese Using Line Spectral Frequencies. In: Pavón, J., Duque-Méndez, N.D., Fuentes-Fernández, R. (eds) Advances in Artificial Intelligence – IBERAMIA 2012. IBERAMIA 2012. Lecture Notes in Computer Science, vol 7637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34654-5_25

  • DOI: https://doi.org/10.1007/978-3-642-34654-5_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34653-8

  • Online ISBN: 978-3-642-34654-5

  • eBook Packages: Computer Science, Computer Science (R0)
