
An Automatic Sound Classification Framework with Non-volatile Memory

Chapter in: Emerging Non-volatile Memory Technologies

Abstract

Environmental sounds are part of our daily life. With the advancement of deep learning models and the abundance of training data, the performance of automatic sound classification (ASC) systems has improved significantly in recent years. However, their high computational cost, and hence high power consumption, remains a major hurdle for the large-scale deployment of ASC systems on mobile and wearable devices. Motivated by the observation that humans analyze complex audio scenes highly effectively while consuming little power, a biologically plausible ASC framework, namely SOM-SNN, is introduced; it combines a self-organizing map (SOM) with a spiking neural network (SNN). Emerging dense crossbar arrays of non-volatile memory (NVM) devices have been recognized as a promising approach to emulating such distributed, massively parallel, and densely connected neuromorphic computing systems. This chapter presents the general structure of this framework for sound event and speech recognition, and demonstrates its attractive computational benefits and its suitability for an NVM implementation.
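The claim that NVM crossbar arrays can emulate densely connected networks rests on in-memory vector-matrix multiplication: each device passes a current set by Ohm's law, and each column wire sums those currents by Kirchhoff's current law, so every column computes a weighted sum in parallel. The sketch below is a minimal illustration under stated assumptions, not the chapter's NVM implementation: it assumes ideal devices (no wire resistance, sneak paths, read noise, or conductance drift) and a hypothetical differential-pair mapping of signed weights onto two non-negative conductances. The function names and the `g_max` parameter are illustrative only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def column_currents(voltages: List[float], conductances: Matrix) -> List[float]:
    # Ideal crossbar read-out: device (i, j) passes I = V_i * G_ij (Ohm's law),
    # and column j sums the currents of all its devices (Kirchhoff's current law).
    if len(conductances) != len(voltages):
        raise ValueError("need one row of conductances per input voltage")
    n_cols = len(conductances[0]) if conductances else 0
    currents = [0.0] * n_cols
    for v, row in zip(voltages, conductances):
        for j, g in enumerate(row):
            currents[j] += v * g
    return currents


def to_conductance_pair(weights: Matrix, g_max: float) -> Tuple[Matrix, Matrix]:
    # Conductances cannot be negative, so each signed weight w (assumed to lie
    # in [-1, 1]) is split across two devices: G+ holds the positive part and
    # G- holds the negative part, both scaled by the maximum conductance g_max.
    for row in weights:
        for w in row:
            if abs(w) > 1.0:
                raise ValueError("weights must lie in [-1, 1]")
    g_pos = [[g_max * max(w, 0.0) for w in row] for row in weights]
    g_neg = [[g_max * max(-w, 0.0) for w in row] for row in weights]
    return g_pos, g_neg


def crossbar_matvec(voltages: List[float], weights: Matrix, g_max: float) -> List[float]:
    # The net current of each column pair, I+ - I-, equals g_max * sum_i V_i * w_ij.
    g_pos, g_neg = to_conductance_pair(weights, g_max)
    i_pos = column_currents(voltages, g_pos)
    i_neg = column_currents(voltages, g_neg)
    return [p - n for p, n in zip(i_pos, i_neg)]


if __name__ == "__main__":
    w = [[0.5, -1.0],
         [1.0, 0.25]]
    v = [0.2, 0.1]  # illustrative read voltages in volts
    print(crossbar_matvec(v, w, g_max=1e-4))  # net column currents in amperes
```

In this example the two columns compute 0.2 * 0.5 + 0.1 * 1.0 = 0.2 and 0.2 * (-1.0) + 0.1 * 0.25 = -0.175, each scaled by `g_max`. The differential pair is one common way to represent signed synaptic weights with non-negative device conductances. Real arrays add non-idealities that this sketch deliberately ignores.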



Author information

Correspondence to Jibin Wu.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Wu, J., Chua, Y., Zhang, M., Li, H., Tan, K.C. (2021). An Automatic Sound Classification Framework with Non-volatile Memory. In: Lew, W.S., Lim, G.J., Dananjaya, P.A. (eds) Emerging Non-volatile Memory Technologies. Springer, Singapore. https://doi.org/10.1007/978-981-15-6912-8_13


  • DOI: https://doi.org/10.1007/978-981-15-6912-8_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6910-4

  • Online ISBN: 978-981-15-6912-8

  • eBook Packages: Physics and Astronomy (R0)
