Extreme Learning Layer: A Boost for Spoken Digit Recognition with Spiking Neural Networks

  • Conference paper
Speech and Computer (SPECOM 2023)

Abstract

The automatic recognition of speaker-independent digits requires high accuracy and robustness to noise and variability. Spiking neural networks (SNNs) are a promising model for this task, as they can mimic the temporal dynamics and energy efficiency of the human auditory system. However, SNNs are difficult to train and often require complex learning algorithms. Spoken digits provide a useful benchmark task for evaluating new SNN architectures, and performance on small-vocabulary tasks is an important first step before scaling up to more complex recognition scenarios. In this paper, we propose an extreme learning layer (ELL) as a simple and effective way to improve the learning of SNNs for spoken digit recognition. The ELL is a randomly generated layer that maps the input features to the next layer without any further adjustment. The output layer is then trained by entropy minimization. We show that the ELL can boost the performance of the SNN on the TIDIGITS benchmark data set. We also compare our approach with some state-of-the-art methods, achieving competitive results at lower computational cost and complexity. The proposed approach also shows good robustness to additive noise.
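
The core idea of a fixed random layer followed by a trained output layer can be illustrated with a minimal, non-spiking sketch. This is not the authors' implementation: the abstract does not specify the neuron model, the spike encoding, or the exact training rule, so the sketch assumes a tanh hidden layer, interprets "entropy minimization" as minimizing a cross-entropy loss by gradient descent, and uses a toy XOR problem in place of TIDIGITS. All function names and parameters are hypothetical.

```python
import math
import random


def make_random_layer(n_in, n_hidden, rng):
    """Draw random weights and biases once; they are never updated afterwards."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    biases = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    return weights, biases


def hidden_features(x, layer):
    """Map an input vector through the fixed random layer (tanh is an assumption)."""
    weights, biases = layer
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores(h, W):
    return [sum(w * hi for w, hi in zip(row, h)) for row in W]


def cross_entropy(H, labels, W):
    """Mean cross-entropy of the softmax readout over the data set."""
    total = 0.0
    for h, y in zip(H, labels):
        total -= math.log(softmax(scores(h, W))[y])
    return total / len(H)


def train_readout(H, labels, n_classes, epochs=300, lr=0.05):
    """Train only the output weights by per-sample gradient descent."""
    n_hidden = len(H[0])
    W = [[0.0] * n_hidden for _ in range(n_classes)]
    for _ in range(epochs):
        for h, y in zip(H, labels):
            p = softmax(scores(h, W))
            for k in range(n_classes):
                g = p[k] - (1.0 if k == y else 0.0)  # gradient w.r.t. score k
                for j in range(n_hidden):
                    W[k][j] -= lr * g * h[j]
    return W


def predict(x, layer, W):
    s = scores(hidden_features(x, layer), W)
    return max(range(len(s)), key=s.__getitem__)


# Toy stand-in data (XOR), not speech features.
rng = random.Random(0)
layer = make_random_layer(n_in=2, n_hidden=16, rng=rng)
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 0]
H = [hidden_features(x, layer) for x in X]
W = train_readout(H, y, n_classes=2)
print("training loss:", cross_entropy(H, y, W))
print("predictions:", [predict(x, layer, W) for x in X])
```

Because the random layer is fixed, the hidden features can be computed once and only the readout weights need to be learned, which is where the reduction in training cost and complexity comes from in this kind of design.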

Notes

  1. The internal summation with the upper limit \(\infty\) indicates that the summation is performed until the entire spike encoding of the digit is completed.

Acknowledgements

We would like to express our gratitude to the National University of Entre Ríos (UNER) for the support and resources provided within the framework of the research and development project PID6187, and to the National University of Litoral for project CAID 50620190100151LI, which together enabled us to conduct this investigation.

Corresponding author

Correspondence to Hugo L. Rufiner.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Peralta, I., Odetti, N., Rufiner, H.L. (2023). Extreme Learning Layer: A Boost for Spoken Digit Recognition with Spiking Neural Networks. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_1

  • DOI: https://doi.org/10.1007/978-3-031-48309-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48308-0

  • Online ISBN: 978-3-031-48309-7

  • eBook Packages: Computer Science, Computer Science (R0)
