Neural Network Configurations Analysis for Identification of Speech Pattern with Low Order Parameters

Lima, Priscila; Barros, Allan; Silva, Washington

doi:10.1007/978-3-319-69266-1_17

Part of the book series: Studies in Computational Intelligence (SCI, volume 751)

Included in the following conference series:

Proceedings of SAI Intelligent Systems Conference

Abstract

This work compares two neural network configurations for developing an intelligent system that recognizes speech signal patterns of numerical commands in Brazilian Portuguese. The performance of Multilayer Perceptron (MLP) and Learning Vector Quantization (LVQ) networks is evaluated during training, validation, and testing. Each speech pattern is represented by a two-dimensional time matrix obtained by encoding the mel-frequency cepstral coefficients (MFCC) with the discrete cosine transform (DCT). These patterns have a reduced set of parameters, and the network configurations under analysis use few examples of each pattern during training. Many simulations were carried out over network topologies and selected learning algorithms to determine the network structures with the best hit rates and generalization. The potential of the proposed approach is shown by comparing the results obtained with those of other classifiers, namely Gaussian Mixture Models (GMM) and Support Vector Machines (SVM).
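
The encoding step described in the abstract can be illustrated with a minimal sketch. It assumes the MFCCs are already available as a frames-by-coefficients array and that the "two-dimensional time matrix" keeps only the first few DCT coefficients of each MFCC trajectory over time. The function name `dct_time_matrix`, the `order` parameter, and the scaling by the number of frames are illustrative assumptions, not the authors' published formulation.

```python
import numpy as np

def dct_time_matrix(mfcc, order=2):
    """Compress a (frames x coeffs) MFCC array into a (coeffs x order) matrix.

    Hypothetical helper: each MFCC trajectory over time is projected onto
    the first `order` DCT-II cosine basis functions, so utterances of any
    length map to a pattern of the same fixed size.
    """
    mfcc = np.asarray(mfcc, dtype=float)
    n_frames, n_coeffs = mfcc.shape
    t = np.arange(n_frames)   # frame index 0..T-1
    n = np.arange(order)      # DCT order index 0..order-1
    # DCT-II basis cos(pi * (2t + 1) * n / (2T)), shape (order, frames)
    basis = np.cos(np.pi * (2 * t[None, :] + 1) * n[:, None] / (2 * n_frames))
    # Divide by the frame count so the scale does not grow with duration
    # (an assumed normalization).
    return (mfcc.T @ basis.T) / n_frames   # shape (coeffs, order)

# Two synthetic "utterances" of different length yield same-sized patterns.
rng = np.random.default_rng(0)
short = dct_time_matrix(rng.normal(size=(40, 12)), order=2)
long_ = dct_time_matrix(rng.normal(size=(95, 12)), order=2)
print(short.shape, long_.shape)
```

Flattening the resulting matrix gives a short, fixed-length feature vector, which is the kind of low-order input a small MLP or LVQ network can be trained on with few examples per class.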

Author information

Authors and Affiliations

Federal University of Maranhão, Portugueses Avenue, 1966, Bacanga Village, São Luís-Ma, Brazil
Priscila Lima & Allan Barros
Federal Institute of Maranhão, Getúlio Vargas Avenue, 4, Monte Castelo, São Luís-Ma, Brazil
Washington Silva

Corresponding author

Correspondence to Priscila Lima.

Editor information

Editors and Affiliations

School of Computing, Ulster University at Jordanstown, Newtownabbey, County Antrim, United Kingdom
Yaxin Bi
The Science and Information (SAI) Organization, Bradford, United Kingdom
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, United Kingdom
Rahul Bhatia

About this paper

Cite this paper

Lima, P., Barros, A., Silva, W. (2018). Neural Network Configurations Analysis for Identification of Speech Pattern with Low Order Parameters. In: Bi, Y., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2016. Studies in Computational Intelligence, vol 751. Springer, Cham. https://doi.org/10.1007/978-3-319-69266-1_17

DOI: https://doi.org/10.1007/978-3-319-69266-1_17
Published: 31 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69265-4
Online ISBN: 978-3-319-69266-1
eBook Packages: Engineering, Engineering (R0)
