Spoken Digit Recognition in Portuguese Using Line Spectral Frequencies

Silva, Diego F.; de Souza, Vinícius M. A.; Batista, Gustavo E. A. P. A.; Giusti, Rafael

doi:10.1007/978-3-642-34654-5_25

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7637)

Abstract

Recognition of isolated spoken digits is the core procedure for a large number of important applications, mainly telephone-based services such as dialing, airline reservation, bank transactions and price quotation, that rely on speech alone. Spoken digit recognition is generally a challenging task because the signals last only a short time and some digits are often acoustically very similar to each other. The objective of this paper is to investigate the use of machine learning algorithms for digit recognition. We focus on the recognition of digits spoken in Portuguese, although we note that our techniques are applicable to any language. We believe that the most important task for successfully recognizing spoken digits is attribute extraction. Audio data is composed of a huge number of very weak features, and most machine learning algorithms will not be able to build accurate classifiers from it. We show that Line Spectral Frequencies (LSF) provide a set of highly predictive coefficients for digit recognition. The results are superior to those obtained with state-of-the-art methods that use Mel-Frequency Cepstrum Coefficients (MFCC) for digit recognition. In particular, we show that the choice of the right attribute extraction method is more important than the specific classification paradigm, and that the right combination of classifier and attributes can provide almost perfect accuracy.
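
The abstract does not describe how the LSF attributes are computed, so the following is only a minimal sketch of the standard textbook construction, not the authors' pipeline. It assumes linear prediction coefficients obtained with the autocorrelation method (Levinson-Durbin recursion), and takes the LSFs as the angles of the unit-circle roots of the sum and difference polynomials built from the prediction polynomial A(z). The function names `lpc_autocorr` and `lpc_to_lsf` and the tolerance value are illustrative assumptions.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """Estimate LPC coefficients [1, a_1, ..., a_p] of one frame.

    Uses the autocorrelation method with the Levinson-Durbin recursion.
    Assumes the frame is not silent (its energy r[0] must be non-zero).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_to_lsf(a, tol=1e-6):
    """Convert LPC coefficients to line spectral frequencies in radians.

    Builds P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z),
    then keeps the root angles strictly between 0 and pi. The trivial
    roots at z = 1 and z = -1 are discarded by the tolerance check.
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    return np.sort(angles[(angles > tol) & (angles < np.pi - tol)])
```

For a frame of speech samples `frame`, `lpc_to_lsf(lpc_autocorr(frame, order=10))` would yield ten increasing values in (0, π) that could serve as one attribute vector for a classifier; the order of 10 is an assumed value, not one taken from the paper.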

Author information

Authors and Affiliations

Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Brazil
Diego F. Silva, Vinícius M. A. de Souza, Gustavo E. A. P. A. Batista & Rafael Giusti

Editor information

Editors and Affiliations

Facultad de Informática, Universidad Complutense de Madrid, c\ Profesor José García Santesmases, 28040, Madrid, Spain
Juan Pavón & Rubén Fuentes-Fernández
Universidad Nacional de Colombia, Carrera 30 No 45-03, Edificio 477, Bogotá, DC, Colombia
Néstor D. Duque-Méndez

Cite this paper

Silva, D.F., de Souza, V.M.A., Batista, G.E.A.P.A., Giusti, R. (2012). Spoken Digit Recognition in Portuguese Using Line Spectral Frequencies. In: Pavón, J., Duque-Méndez, N.D., Fuentes-Fernández, R. (eds) Advances in Artificial Intelligence – IBERAMIA 2012. IBERAMIA 2012. Lecture Notes in Computer Science, vol 7637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34654-5_25

DOI: https://doi.org/10.1007/978-3-642-34654-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34653-8
Online ISBN: 978-3-642-34654-5
eBook Packages: Computer Science, Computer Science (R0)
