Abstract
Speaker-independent automatic recognition of spoken digits requires high accuracy and robustness to noise and variability. Spiking neural networks (SNNs) are a promising model for this task because they can mimic the temporal dynamics and energy efficiency of the human auditory system. However, SNNs are difficult to train and often require complex learning algorithms. Spoken digits provide a useful benchmark for evaluating new SNN architectures, since performance on small-vocabulary tasks is an important first step before scaling up to more complex recognition scenarios. In this paper, we propose an extreme learning layer (ELL) as a simple and effective way to improve SNN learning for spoken digit recognition. The ELL is a randomly generated layer that maps the input features to the next layer and is not adjusted further. The output layer is then trained by entropy minimization. We show that the ELL boosts the performance of the SNN on the TIDIGITS benchmark dataset. Compared with several state-of-the-art methods, our approach achieves competitive results at lower computational cost and complexity. It also shows good robustness to additive noise.
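The idea of a fixed random layer followed by a trained output layer can be illustrated with a minimal, non-spiking sketch. This is not the paper's SNN. It assumes a tanh rate-based hidden layer, and it trains a softmax readout by gradient descent on cross-entropy as a stand-in for the entropy-minimization training mentioned in the abstract. The function names, layer sizes, learning rate, and synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def make_random_layer(n_inputs, n_hidden, rng):
    """Draw fixed random weights and biases; they are never trained."""
    weights = rng.normal(0.0, 1.0 / np.sqrt(n_inputs), size=(n_inputs, n_hidden))
    biases = rng.normal(0.0, 1.0, size=n_hidden)
    return weights, biases


def hidden_features(x, weights, biases):
    """Map inputs through the fixed random layer (tanh is an assumption)."""
    return np.tanh(x @ weights + biases)


def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_readout(h, labels, n_classes, lr=0.02, epochs=500):
    """Train only the output weights by gradient descent on cross-entropy."""
    targets = np.eye(n_classes)[labels]
    w_out = np.zeros((h.shape[1], n_classes))
    for _ in range(epochs):
        probs = softmax(h @ w_out)
        w_out -= lr * (h.T @ (probs - targets)) / h.shape[0]
    return w_out


def predict(x, weights, biases, w_out):
    """Return the most likely class index for each input row."""
    return np.argmax(hidden_features(x, weights, biases) @ w_out, axis=1)


def demo(seed=0):
    """Fit the readout on synthetic clusters (not TIDIGITS) and report accuracy."""
    rng = np.random.default_rng(seed)
    n_classes, n_per_class, n_inputs = 3, 40, 8
    centers = rng.normal(0.0, 3.0, size=(n_classes, n_inputs))
    x = np.vstack(
        [c + rng.normal(0.0, 0.5, size=(n_per_class, n_inputs)) for c in centers]
    )
    y = np.repeat(np.arange(n_classes), n_per_class)

    weights, biases = make_random_layer(n_inputs, 64, rng)
    frozen_copy = weights.copy()
    w_out = train_readout(hidden_features(x, weights, biases), y, n_classes)

    accuracy = float(np.mean(predict(x, weights, biases, w_out) == y))
    random_layer_unchanged = bool(np.array_equal(weights, frozen_copy))
    return accuracy, random_layer_unchanged


if __name__ == "__main__":
    print(demo())
```

Because only `w_out` is updated, training reduces to fitting a linear classifier on random features, which is the property that keeps this kind of scheme cheap to train.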
Notes
1. The internal summation with upper limit \(\infty\) indicates that the summation runs until the entire spike encoding of the digit is completed.
Acknowledgements
We thank the National University of Entre Ríos (UNER) for the support and resources provided under research and development project PID6187, and the National University of Litoral for its support under project CAID 50620190100151LI, which made this investigation possible.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Peralta, I., Odetti, N., Rufiner, H.L. (2023). Extreme Learning Layer: A Boost for Spoken Digit Recognition with Spiking Neural Networks. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_1
DOI: https://doi.org/10.1007/978-3-031-48309-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science, Computer Science (R0)